Commit Graph

1353082 Commits

Author SHA1 Message Date
David Sterba
6aa79c4f25 btrfs: use rb_entry_safe() where possible to simplify code
Simplify conditionally reading an rb_entry(), there's the
rb_entry_safe() helper that checks the node pointer for NULL so we don't
have to write it explicitly.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Filipe Manana
c4669e4a8b btrfs: pass a pointer to get_range_bits() to cache first search result
Allow get_range_bits() to take an extent state pointer to pointer argument
so that we can cache the first extent state record in the target range, so
that a caller can use it for subsequent operations without doing a full
tree search. Currently the only user is try_release_extent_state(), which
then does a call to __clear_extent_bit() which can use such a cached state
record.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Filipe Manana
32c523c578 btrfs: allow folios to be released while ordered extent is finishing
When the release_folio callback (from struct address_space_operations) is
invoked we don't allow the folio to be released if its range is currently
locked in the inode's io_tree, as it may indicate the folio may be needed
by the task that locked the range.

However if the range is locked because an ordered extent is finishing,
then we can safely allow the folio to be released because ordered extent
completion doesn't need to use the folio at all.

When we are under memory pressure, the kernel starts writeback of dirty
pages (folios) with the goal of releasing the pages from the page cache
after writeback completes, however this often is not possible on btrfs
because:

  * Once the writeback completes we queue the ordered extent completion;

  * Once the ordered extent completion starts, we lock the range in the
    inode's io_tree (at btrfs_finish_one_ordered());

  * If the release_folio callback is called while the folio's range is
    locked in the inode's io_tree, we don't allow the folio to be
    released, so the kernel has to try to release memory elsewhere,
    which may result in triggering more writeback or releasing other
    pages from the page cache which may be more useful to have around
    for applications.

In contrast, when the release_folio callback is invoked after writeback
finishes and before ordered extent completion starts or locks the range,
we allow the folio to be released, as well as when the release_folio
callback is invoked after ordered extent completion unlocks the range.

Improve on this by detecting if the range is locked for ordered extent
completion and if it is, allow the folio to be released. This detection
is achieved by adding a new extent flag in the io_tree that is set when
the range is locked during ordered extent completion.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Filipe Manana
cbfb4cbf45 btrfs: update comment for try_release_extent_state()
Drop reference to pages from the comment since the function is fully folio
aware and works regardless of how many pages are in the folio. Also while
at it, capitalize the first word and make it more explicit that
release_folio is a callback from struct address_space_operations.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Qu Wenruo
1e5773e0ba btrfs: prepare btrfs_punch_hole_lock_range() for large data folios
The function btrfs_punch_hole_lock_range() needs to make sure there is
no other folio in the range, thus it goes with filemap_range_has_page(),
which works pretty fine.

But if we have large folios, under the following case
filemap_range_has_page() will always return true, forcing
btrfs_punch_hole_lock_range() to do a very time consuming busy loop:

        start                            end
        |                                |
  |//|//|//|//|  |  |  |  |  |  |  |  |//|//|
   \         /                         \   /
    Folio A                            Folio B

In the above case, folio A and B contain our start/end indexes, and there
are no other folios in the range.  Thus we do not need to retry inside
btrfs_punch_hole_lock_range().

To prepare for large data folios, introduce a helper,
check_range_has_page(), which will:

- Shrink the search range towards page boundaries
  If the rounded down end (exclusive, otherwise it can underflow when @end
  is inside the folio at file offset 0) is no larger than the rounded up
  start, it means the range contains no other pages other than the ones
  covering @start and @end.

  Can return false directly in that case.

- Grab all the folios inside the range

- Skip any large folios that cover the start and end indexes

- If any other folios are found return true

- Otherwise return false

This new helper is going to handle both large folios and regular ones.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Qu Wenruo
be8ef7990c btrfs: prepare btrfs_buffered_write() for large data folios
This involves the following modifications:

- Set the order flags for __filemap_get_folio() inside
  prepare_one_folio()

  This will allow __filemap_get_folio() to create a large folio if the
  address space supports it.

- Limit the initial @write_bytes inside copy_one_range()
  If the largest folio boundary splits the initial write range, there is
  no way we can write beyond the largest folio boundary.

  This is done by a simple helper calc_write_bytes().

- Release exceeding reserved space if the folio is smaller than expected
  Which is doing the same handling when short copy happens.

All the preparations should not change the behavior when the largest
folio order is 0.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Qu Wenruo
581bb9e761 btrfs: refactor how we handle reserved space inside copy_one_range()
There are several things not ideal in copy_one_range():

- Unnecessary temporary variables
  * block_offset
  * reserve_bytes
  * dirty_blocks
  * num_blocks
  * release_bytes
  These are utilized to handle short-copy cases.

- Inconsistent handling of btrfs_delalloc_release_extents()
  There is a hidden behavior that, after reserving metadata for X bytes
  of data write, we have to call btrfs_delalloc_release_extents() with X
  once and only once.

  Calling btrfs_delalloc_release_extents(X - 4K) and
  btrfs_delalloc_release_extents(4K) will cause outstanding extents
  accounting to go wrong.

  This is because the outstanding extents mechanism is not designed to
  handle shrinking of reserved space.

Improve above situations by:

- Use a single @reserved_start and @reserved_len pair
  Now we reserve space for the initial range, and if a short copy
  happened and we need to shrink the reserved space, we can easily
  calculate the new length, and update @reserved_len.

- Introduce helpers to shrink reserved data and metadata space
  This is done by two new helpers, shrink_reserved_space() and
  btrfs_delalloc_shrink_extents().

  The later will do a better calculation if we need to modify the
  outstanding extents, and the first one will be utilized inside
  copy_one_range().

- Manually unlock, release reserved space and return if no byte is
  copied

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:40 +02:00
Filipe Manana
5c41f6010e btrfs: remove EXTENT_UPTODATE io tree flag
The EXTENT_UPTODATE io tree flag is now used only to mark ranges in the
fs_info->excluded_extents as used by super blocks and not available for
extent allocation (to prevent adding those ranges as free space in the
in memory space caches). As we can use any flag for that purpose, and
we are using EXTENT_DIRTY for the pinned extents io tree for example,
remove the EXTENT_UPTODATE flag and use instead EXTENT_DIRTY for the
excluded extents io tree.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Filipe Manana
db3f796c7c btrfs: stop searching for EXTENT_DIRTY bit in the excluded extents io tree
At btrfs_add_new_free_space() we keep searching for ranges in the excluded
extents io tree that have the EXTENT_DIRTY bit set, however we never ever
set that bit for ranges in that tree. That is a leftover from when that
function used the global freed extents trees (fs_info->freed_extents[2]),
where we used both the EXTENT_DIRTY and EXTENT_UPTODATE bits, but those
trees are gone with commit fe119a6eeb ("btrfs: switch to per-transaction
pinned extents"), which introduced the fs_info->excluded_extents io tree,
where only EXTENT_UPTODATE is set.

So remove the EXTENT_DIRTY bit search at btrfs_add_new_free_space().

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Filipe Manana
d2c41835fd btrfs: remove leftover EXTENT_UPTODATE clear from an inode's io_tree
After commit 52b029f427 ("btrfs: remove unnecessary EXTENT_UPTODATE
state in buffered I/O path") we never set EXTENT_UPTODATE in an inode's
io_tree anymore, but we still have some code attempting to clear that
bit from an inode's io_tree. Remove that code as it doesn't do anything
anymore. The sole use of the EXTENT_UPTODATE bit is for the excluded
extents io_tree (fs_info->excluded_extents), which is used to track the
locations of super blocks, so that their ranges are never marked as free,
making them unavailable for extent allocation.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Filipe Manana
5e85262e54 btrfs: fix fsync of files with no hard links not persisting deletion
If we fsync a file (or directory) that has no more hard links, because
while a process had a file descriptor open on it, the file's last hard
link was removed and then the process did an fsync against the file
descriptor, after a power failure or crash the file still exists after
replaying the log.

This behaviour is incorrect since once an inode has no more hard links
it's not accessible anymore and we insert an orphan item into its
subvolume's tree so that the deletion of all its items is not missed in
case of a power failure or crash.

So after log replay the file shouldn't exist anymore, which is also the
behaviour on ext4, xfs, f2fs and other filesystems.

Fix this by not ignoring inodes with zero hard links at
btrfs_log_inode_parent() and by committing an inode's delayed inode when
we are not doing a fast fsync (either BTRFS_INODE_COPY_EVERYTHING or
BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's runtime flags). This
last step is necessary because when removing the last hard link we don't
delete the corresponding ref (or extref) item, instead we record the
change in the inode's delayed inode with the BTRFS_DELAYED_NODE_DEL_IREF
flag, so that when the delayed inode is committed we delete the ref/extref
item from the inode's subvolume tree - otherwise the logging code will log
the last hard link and therefore upon log replay the inode is not deleted.

The base code for a fstests test case that reproduces this bug is the
following:

   . ./common/dmflakey

   _require_scratch
   _require_dm_target flakey
   _require_mknod

   _scratch_mkfs >>$seqres.full 2>&1 || _fail "mkfs failed"
   _require_metadata_journaling $SCRATCH_DEV
   _init_flakey
   _mount_flakey

   touch $SCRATCH_MNT/foo

   # Commit the current transaction and persist the file.
   _scratch_sync

   # A fifo to communicate with a background xfs_io process that will
   # fsync the file after we deleted its hard link while it's open by
   # xfs_io.
   mkfifo $SCRATCH_MNT/fifo

   tail -f $SCRATCH_MNT/fifo | \
        $XFS_IO_PROG $SCRATCH_MNT/foo >>$seqres.full &
   XFS_IO_PID=$!

   # Give some time for the xfs_io process to open a file descriptor for
   # the file.
   sleep 1

   # Now while the file is open by the xfs_io process, delete its only
   # hard link.
   rm -f $SCRATCH_MNT/foo

   # Now that it has no more hard links, make the xfs_io process fsync it.
   echo "fsync" > $SCRATCH_MNT/fifo

   # Terminate the xfs_io process so that we can unmount.
   echo "quit" > $SCRATCH_MNT/fifo
   wait $XFS_IO_PID
   unset XFS_IO_PID

   # Simulate a power failure and then mount again the filesystem to
   # replay the journal/log.
   _flakey_drop_and_remount

   # We don't expect the file to exist anymore, since it was fsynced when
   # it had no more hard links.
   [ -f $SCRATCH_MNT/foo ] && echo "file foo still exists"

   _unmount_flakey

   # success, all done
   echo "Silence is golden"
   status=0
   exit

A test case for fstests will be submitted soon.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Mark Harmstone
846b534075 btrfs: fix typo in space info explanation
There's an explanation of how space info works at the top of
fs/btrfs/space-info.c, which makes reference to a variable called
bytes_may_reserve.  There's nothing called that in the code, and wasn't
at time the comment was written; as far I can tell this is a typo, and
it should actually be bytes_may_use.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Daniel Vacek
062f3d02a2 btrfs: remove unused flag EXTENT_BUFFER_IN_TREE
This flag is set after inserting the eb to the buffer tree and cleared
on it's removal.  It was added in commit 34b41acec1 ("Btrfs: use a
bit to track if we're in the radix tree") and wanted to make use of it,
faa2dbf004 ("Btrfs: add sanity tests for new qgroup accounting
code"). Both are 10+ years old, we can remove the flag.

Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Daniel Vacek
c61660ec34 btrfs: remove unused flag EXTENT_BUFFER_CORRUPT
This flag is no longer being used.  It was added by commit a826d6dcb3
("Btrfs: check items for correctness as we search") but it's no longer
being used after commit f26c923860 ("btrfs: remove reada
infrastructure").

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Daniel Vacek
350362e95f btrfs: remove unused flag EXTENT_BUFFER_READAHEAD
This flag is no longer being used.  It was added by commit ab0fff0305
("btrfs: add READAHEAD extent buffer flag") and used in commits:

79fb65a1f6 ("Btrfs: don't call readahead hook until we have read the entire eb")
78e62c02ab ("btrfs: Remove extent_io_ops::readpage_io_failed_hook")
371cdc0700 ("btrfs: introduce subpage metadata validation check")

Finally all the code using it was removed by commit f26c923860 ("btrfs: remove
reada infrastructure").

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Daniel Vacek
40f47f6d72 btrfs: remove unused flag EXTENT_BUFFER_READ_ERR
This flag was added by commit 656f30dba7 ("Btrfs: be aware of btree
inode write errors to avoid fs corruption") but it stopped being used
after commit 046b562b20 ("btrfs: use a separate end_io handler for
read_extent_buffer").

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Qu Wenruo
ced47a4db4 btrfs: factor out the main loop of btrfs_buffered_write() into a helper
Inside the main loop of btrfs_buffered_write() we are doing a lot of
heavy lifting inside a while() loop.

This makes it pretty hard to read, factor out the content into a helper,
copy_one_range() to do the work.

This has no functional change, but with some minor variable renames,
e.g. rename all "sector" into "block".

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Qu Wenruo
af821cba72 btrfs: factor out space reservation code from btrfs_buffered_write()
Inside the main loop of btrfs_buffered_write(), we have a complex data
and metadata space reservation code, which tries to reserve space for
a COW write, if failed then fallback to check if we can do a NOCOW
write.

Factor out that part of code into a dedicated helper, reserve_space(),
to make the main loop a little easier to read.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:39 +02:00
Qu Wenruo
afe990fb59 btrfs: cleanup the reserved space inside loop of btrfs_buffered_write()
Inside the main loop of btrfs_buffered_write(), if something wrong
happened, there is a out-of-loop cleanup path to release the reserved
space.

This behavior saves some code lines, but makes it much harder to read,
as we need to check release_bytes to make sure when we need to do the
cleanup.

Factor out the cleanup part into a helper, release_reserved_space(), to
do the cleanup inside the main loop, so that we can move @release_bytes
inside the loop.

This will make later refactoring of the main loop much easier.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:38 +02:00
Qu Wenruo
563bd2b785 btrfs: remove force_page_uptodate variable from btrfs_buffered_write()
Commit c87c299776 ("btrfs: make buffered write to copy one page a
time") changed how the variable @force_page_uptodate was updated.

Before that commit the variable was only initialized to false at the
beginning of the function, and after hitting a short copy, the next
retry on the same folio would force the folio to be read from the disk.

But after the commit, the variable is always initialized to false at the
beginning of the loop's scope, causing prepare_one_folio() never to get a
true value passed in.

The change in behavior is not a huge deal, it only makes a difference
on how we handle short copies:

Old: Allow the buffer to be split

     The first short copy will be rejected, that's the same for both
     cases.

     But for the next retry, we require the folio to be read from disk.

     Then even if we hit a short copy again, since the folio is already
     uptodate, we do not need to handle partial uptodate range, and can
     continue, marking the short copied range as dirty and continue.

     This will split the buffer write into the folio as two buffered
     writes.

New: Do not allow the buffer to be split

     The first short copy will be rejected, that's the same for both
     cases.

     For the next retry, we do nothing special, thus if the short copy
     happened again, we reject it again, until either the short copy is
     gone, or we failed to fault in the buffer.

     This will mean the buffer write into the folio will either fail or
     succeed, no splitting will happen.

To me, either solution is fine, but the new one makes it simpler and
requires no special handling, so I prefer that solution.

And since @force_page_uptodate is always false when passed into
prepare_one_folio(), we can just remove the variable.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:38 +02:00
Qu Wenruo
d03e3a9370 btrfs: move block perfect compression out of experimental features
Commit 1d2fbb7f1f ("btrfs: allow compression even if the range is not
page aligned") introduced the block perfect compression for block size <
page size cases.

Before that commit, if the fs block size is smaller than page size (aka
subpage cases), compressed write is only enabled if the dirty range is
fully page aligned.

This block perfect compression support was introduced in v6.13, and has
been tested for two kernel releases.
I believe it's time to move it out of experimental features so that we
can get more tests in the real world.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15 14:30:38 +02:00
Linus Torvalds
088d13246a Merge tag 'kbuild-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Add proper pahole version dependency to CONFIG_GENDWARFKSYMS to avoid
   module loading errors

 - Fix UAPI header tests for the OpenRISC architecture

 - Add dependency on the libdw package in Debian and RPM packages

 - Disable -Wdefault-const-init-unsafe warnings on Clang

 - Make "make clean ARCH=um" also clean the arch/x86/ directory

 - Revert the use of -fmacro-prefix-map=, which causes issues with
   debugger usability

* tag 'kbuild-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: fix typos "module.builtin" to "modules.builtin"
  Revert "kbuild, rust: use -fremap-path-prefix to make paths relative"
  Revert "kbuild: make all file references relative to source root"
  kbuild: fix dependency on sorttable
  init: remove unused CONFIG_CC_CAN_LINK_STATIC
  um: let 'make clean' properly clean underlying SUBARCH as well
  kbuild: Disable -Wdefault-const-init-unsafe
  kbuild: rpm-pkg: Add (elfutils-devel or libdw-devel) to BuildRequires
  kbuild: deb-pkg: Add libdw-dev:native to Build-Depends-Arch
  usr/include: openrisc: don't HDRTEST bpf_perf_event.h
  kbuild: Require pahole <v1.28 or >v1.29 with GENDWARFKSYMS on X86
2025-05-14 22:24:17 -07:00
Linus Torvalds
546bce5792 Merge tag 'tpmdd-next-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
 "A few last minute fixes for v6.15"

* tag 'tpmdd-next-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: tis: Double the timeout B to 4s
  char: tpm: tpm-buf: Add sanity check fallback in read helpers
  tpm: Mask TPM RC in tpm2_start_auth_session()
2025-05-14 19:33:18 -07:00
Michal Suchanek
2f661f71fd tpm: tis: Double the timeout B to 4s
With some Infineon chips the timeouts in tpm_tis_send_data (both B and
C) can reach up to about 2250 ms.

Timeout C is retried since
commit de9e33df77 ("tpm, tpm_tis: Workaround failed command reception on Infineon devices")

Timeout B still needs to be extended.

The problem is most commonly encountered with context related operation
such as load context/save context. These are issued directly by the
kernel, and there is no retry logic for them.

When a filesystem is set up to use the TPM for unlocking the boot fails,
and restarting the userspace service is ineffective. This is likely
because ignoring a load context/save context result puts the real TPM
state and the TPM state expected by the kernel out of sync.

Chips known to be affected:
tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54)
Description: SLB9672
Firmware Revision: 15.22

tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22)
Firmware Revision: 7.83

tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1A, rev-id 16)
Firmware Revision: 5.63

Link: https://lore.kernel.org/linux-integrity/Z5pI07m0Muapyu9w@kitsune.suse.cz/
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-15 04:49:15 +03:00
Purva Yeshi
32d495b384 char: tpm: tpm-buf: Add sanity check fallback in read helpers
Fix Smatch-detected issue:

drivers/char/tpm/tpm-buf.c:208 tpm_buf_read_u8() error:
uninitialized symbol 'value'.
drivers/char/tpm/tpm-buf.c:225 tpm_buf_read_u16() error:
uninitialized symbol 'value'.
drivers/char/tpm/tpm-buf.c:242 tpm_buf_read_u32() error:
uninitialized symbol 'value'.

Zero-initialize the return values in tpm_buf_read_u8(), tpm_buf_read_u16(),
and tpm_buf_read_u32() to guard against uninitialized data in case of a
boundary overflow.

Add defensive initialization ensures the return values are always defined,
preventing undefined behavior if the unexpected happens.

Signed-off-by: Purva Yeshi <purvayeshi550@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-15 04:47:14 +03:00
Jarkko Sakkinen
539fbab378 tpm: Mask TPM RC in tpm2_start_auth_session()
tpm2_start_auth_session() does not mask TPM RC correctly from the callers:

[   28.766528] tpm tpm0: A TPM error (2307) occurred start auth session

Process TPM RCs inside tpm2_start_auth_session(), and map them to POSIX
error codes.

Cc: stable@vger.kernel.org # v6.10+
Fixes: 699e3efd6c ("tpm: Add HMAC session start and end functions")
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Closes: https://lore.kernel.org/linux-integrity/Z_NgdRHuTKP6JK--@gondor.apana.org.au/
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-15 04:47:14 +03:00
Linus Torvalds
74a6325597 Merge tag 'for-6.15-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix potential endless loop when discarding a block group when
   disabling discard

 - reinstate message when setting a large value of mount option 'commit'

 - fix a folio leak when async extent submission fails

* tag 'for-6.15-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: add back warning for mount option commit values exceeding 300
  btrfs: fix folio leak in submit_one_async_extent()
  btrfs: fix discard worker infinite loop after disabling discard
2025-05-14 18:39:12 -07:00
Linus Torvalds
c94d59a126 Merge tag 'trace-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix sample code that uses trace_array_printk()

   The sample code for in kernel use of trace_array (that creates an
   instance for use within the kernel) and shows how to use
   trace_array_printk() that writes into the created instance, used
   trace_printk_init_buffers(). But that function is used to initialize
   normal trace_printk() and produces the NOTICE banner which is not
   needed for use of trace_array_printk(). The function to initialize
   that is trace_array_init_printk() that takes the created trace array
   instance as a parameter.

   Update the sample code to reflect the proper usage.

 - Fix preemption count output for stacktrace event

   The tracing buffer shows the preempt count level when an event
   executes. Because writing the event itself disables preemption, this
   needs to be accounted for when recording. The stacktrace event did
   not account for this so the output of the stacktrace event showed
   preemption was disabled while the event that triggered the stacktrace
   shows preemption is enabled and this leads to confusion. Account for
   preemption being disabled for the stacktrace event.

   The same happened for stack traces triggered by function tracer.

 - Fix persistent ring buffer when trace_pipe is used

   The ring buffer swaps the reader page with the next page to read from
   the write buffer when trace_pipe is used. If there's only a page of
   data in the ring buffer, this swap will cause the "commit" pointer
   (last data written) to be on the reader page. If more data is written
   to the buffer, it is added to the reader page until it falls off back
   into the write buffer.

   If the system reboots and the commit pointer is still on the reader
   page, even if new data was written, the persistent buffer validator
   will miss finding the commit pointer because it only checks the write
   buffer and does not check the reader page. This causes the validator
   to fail the validation and clear the buffer, where the new data is
   lost.

   There was a check for this, but it checked the "head pointer", which
   was incorrect, because the "head pointer" always stays on the write
   buffer and is the next page to swap out for the reader page. Fix the
   logic to catch this case and allow the user to still read the data
   after reboot.

* tag 'trace-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Fix persistent buffer when commit page is the reader page
  ftrace: Fix preemption accounting for stacktrace filter command
  ftrace: Fix preemption accounting for stacktrace trigger command
  tracing: samples: Initialize trace_array_printk() with the correct function
2025-05-14 11:24:19 -07:00
Steven Rostedt
1d6c39c89f ring-buffer: Fix persistent buffer when commit page is the reader page
The ring buffer is made up of sub buffers (sometimes called pages as they
are by default PAGE_SIZE). It has the following "pages":

  "tail page" - this is the page that the next write will write to
  "head page" - this is the page that the reader will swap the reader page with.
  "reader page" - This belongs to the reader, where it will swap the head
                  page from the ring buffer so that the reader does not
                  race with the writer.

The writer may end up on the "reader page" if the ring buffer hasn't
written more than one page, where the "tail page" and the "head page" are
the same.

The persistent ring buffer has meta data that points to where these pages
exist so on reboot it can re-create the pointers to the cpu_buffer
descriptor. But when the commit page is on the reader page, the logic is
incorrect.

The check to see if the commit page is on the reader page checked if the
head page was the reader page, which would never happen, as the head page
is always in the ring buffer. The correct check would be to test if the
commit page is on the reader page. If that's the case, then it can exit
out early as the commit page is only on the reader page when there's only
one page of data in the buffer. There's no reason to iterate the ring
buffer pages to find the "commit page" as it is already found.

To trigger this bug:

  # echo 1 > /sys/kernel/tracing/instances/boot_mapped/events/syscalls/sys_enter_fchownat/enable
  # touch /tmp/x
  # chown sshd /tmp/x
  # reboot

On boot up, the dmesg will have:
 Ring buffer meta [0] is from previous boot!
 Ring buffer meta [1] is from previous boot!
 Ring buffer meta [2] is from previous boot!
 Ring buffer meta [3] is from previous boot!
 Ring buffer meta [4] commit page not found
 Ring buffer meta [5] is from previous boot!
 Ring buffer meta [6] is from previous boot!
 Ring buffer meta [7] is from previous boot!

Where the buffer on CPU 4 had a "commit page not found" error and that
buffer is cleared and reset causing the output to be empty and the data lost.

When it works correctly, it has:

  # cat /sys/kernel/tracing/instances/boot_mapped/trace_pipe
        <...>-1137    [004] .....   998.205323: sys_enter_fchownat: __syscall_nr=0x104 (260) dfd=0xffffff9c (4294967196) filename=(0xffffc90000a0002c) user=0x3e8 (1000) group=0xffffffff (4294967295) flag=0x0 (0

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250513115032.3e0b97f7@gandalf.local.home
Fixes: 5f3b6e839f ("ring-buffer: Validate boot range memory events")
Reported-by: Tasos Sahanidis <tasos@tasossah.com>
Tested-by: Tasos Sahanidis <tasos@tasossah.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-14 13:53:23 -04:00
pengdonglin
11aff32439 ftrace: Fix preemption accounting for stacktrace filter command
The preemption count of the stacktrace filter command to trace ksys_read
is consistently incorrect:

$ echo ksys_read:stacktrace > set_ftrace_filter

   <...>-453     [004] ...1.    38.308956: <stack trace>
=> ksys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe

The root cause is that the trace framework disables preemption when
invoking the filter command callback in function_trace_probe_call:

   preempt_disable_notrace();
   probe_ops->func(ip, parent_ip, probe_opsbe->tr, probe_ops, probe->data);
   preempt_enable_notrace();

Use tracing_gen_ctx_dec() to account for the preempt_disable_notrace(),
which will output the correct preemption count:

$ echo ksys_read:stacktrace > set_ftrace_filter

   <...>-410     [006] .....    31.420396: <stack trace>
=> ksys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe

Cc: stable@vger.kernel.org
Fixes: 36590c50b2 ("tracing: Merge irqflags + preempt counter.")
Link: https://lore.kernel.org/20250512094246.1167956-2-dolinux.peng@gmail.com
Signed-off-by: pengdonglin <dolinux.peng@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-14 13:53:23 -04:00
pengdonglin
e333332657 ftrace: Fix preemption accounting for stacktrace trigger command
When using the stacktrace trigger command to trace syscalls, the
preemption count was consistently reported as 1 when the system call
event itself had 0 (".").

For example:

root@ubuntu22-vm:/sys/kernel/tracing/events/syscalls/sys_enter_read
$ echo stacktrace > trigger
$ echo 1 > enable

    sshd-416     [002] .....   232.864910: sys_read(fd: a, buf: 556b1f3221d0, count: 8000)
    sshd-416     [002] ...1.   232.864913: <stack trace>
 => ftrace_syscall_enter
 => syscall_trace_enter
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

The root cause is that the trace framework disables preemption in __DO_TRACE before
invoking the trigger callback.

Use the tracing_gen_ctx_dec() that will accommodate for the increase of
the preemption count in __DO_TRACE when calling the callback. The result
is the accurate reporting of:

    sshd-410     [004] .....   210.117660: sys_read(fd: 4, buf: 559b725ba130, count: 40000)
    sshd-410     [004] .....   210.117662: <stack trace>
 => ftrace_syscall_enter
 => syscall_trace_enter
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

Cc: stable@vger.kernel.org
Fixes: ce33c845b0 ("tracing: Dump stacktrace trigger to the corresponding instance")
Link: https://lore.kernel.org/20250512094246.1167956-1-dolinux.peng@gmail.com
Signed-off-by: pengdonglin <dolinux.peng@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-14 13:53:09 -04:00
Linus Torvalds
1a80a098c6 Merge tag 'execve-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
 "This fixes a corner case for ASLR-disabled static-PIE brk collision
  with vdso allocations:

   - binfmt_elf: Move brk for static PIE even if ASLR disabled"

* tag 'execve-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_elf: Move brk for static PIE even if ASLR disabled
2025-05-14 09:15:16 -07:00
Linus Torvalds
00f281fd9d Merge tag 'soc-fixes-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
 "These all address issues in devicetree files:

   - The Rockchip rk3588j are now limited the same way as the vendor
     kernel, to allow room for the industrial-grade temperature ranges.

   - Seven more Rockchip fixes address minor issues with specific boards

   - Invalid clk controller references in multiple amlogic chips, plus
     one accidentally disabled audio on clock

   - Two devicetree fixes for i.MX8MP boards, both for incorrect
     regulator settings

   - A power domain change for apple laptop touchbar, fixing
     suspend/resume problems

   - An incorrect DMA controller setting for sophgo cv18xx chips"

* tag 'soc-fixes-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: amazon: Fix simple-bus node name schema warnings
  MAINTAINERS: delete email for Shiraz Hashim
  arm64: dts: imx8mp-var-som: Fix LDO5 shutdown causing SD card timeout
  arm64: dts: imx8mp: use 800MHz NoC OPP for nominal drive mode
  arm64: dts: amlogic: dreambox: fix missing clkc_audio node
  riscv: dts: sophgo: fix DMA data-width configuration for CV18xx
  arm64: dts: rockchip: fix Sige5 RTC interrupt pin
  arm64: dts: rockchip: Assign RT5616 MCLK rate on rk3588-friendlyelec-cm3588
  arm64: dts: rockchip: Align wifi node name with bindings in CB2
  arm64: dts: amlogic: g12: fix reference to unknown/untested PWM clock
  arm64: dts: amlogic: gx: fix reference to unknown/untested PWM clock
  ARM: dts: amlogic: meson8b: fix reference to unknown/untested PWM clock
  ARM: dts: amlogic: meson8: fix reference to unknown/untested PWM clock
  arm64: dts: apple: touchbar: Mark ps_dispdfr_be as always-on
  mailmap: Update email for Asahi Lina
  arm64: dts: rockchip: Fix mmc-pwrseq clock name on rock-pi-4
  arm64: dts: rockchip: Use "regulator-fixed" for btreg on px30-engicam for vcc3v3-btreg
  arm64: dts: rockchip: Add pinmuxing for eMMC on QNAP TS433
  arm64: dts: rockchip: Remove overdrive-mode OPPs from RK3588J SoC dtsi
  arm64: dts: rockchip: Allow Turing RK1 cooling fan to spin down
2025-05-14 09:11:05 -07:00
Eric Biggers
9f35e33144 x86/its: Fix build errors when CONFIG_MODULES=n
Fix several build errors when CONFIG_MODULES=n, including the following:

../arch/x86/kernel/alternative.c:195:25: error: incomplete definition of type 'struct module'
  195 |         for (int i = 0; i < mod->its_num_pages; i++) {

Fixes: 872df34d7c ("x86/its: Use dynamic thunks for indirect branches")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-13 14:36:08 -07:00
Linus Torvalds
405e6c37c8 Merge tag 'probes-fixes-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:

 - fprobe: Fix RCU warning message in list traversal

   fprobe_module_callback() using hlist_for_each_entry_rcu() traverse
   the fprobe list but it locks fprobe_mutex() instead of rcu lock
   because it is enough. So add lockdep_is_held() to avoid warning.

 - tracing: eprobe: Add missing trace_probe_log_clear for eprobe

   __trace_eprobe_create() uses trace_probe_log but forgot to clear it
   at exit. Add trace_probe_log_clear() calls.

 - tracing: probes: Fix possible race in trace_probe_log APIs

   trace_probe_log APIs are used in probe event (dynamic_events,
   kprobe_events and uprobe_events) creation. Only dynamic_events uses
   the dyn_event_ops_mutex mutex to serialize it. This makes kprobe and
   uprobe events to lock the same mutex to serialize its creation to
   avoid race in trace_probe_log APIs.

* tag 'probes-fixes-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: probes: Fix a possible race in trace_probe_log APIs
  tracing: add missing trace_probe_log_clear for eprobes
  tracing: fprobe: Fix RCU warning message in list traversal
2025-05-13 12:20:07 -07:00
Masami Hiramatsu (Google)
fd837de3c9 tracing: probes: Fix a possible race in trace_probe_log APIs
Since the shared trace_probe_log variable can be accessed and
modified via probe event create operation of kprobe_events,
uprobe_events, and dynamic_events, it should be protected.
In the dynamic_events, all operations are serialized by
`dyn_event_ops_mutex`. But kprobe_events and uprobe_events
interfaces are not serialized.

To solve this issue, introduces dyn_event_create(), which runs
create() operation under the mutex, for kprobe_events and
uprobe_events. This also uses lockdep to check the mutex is
held when using trace_probe_log* APIs.

Link: https://lore.kernel.org/all/174684868120.551552.3068655787654268804.stgit@devnote2/

Reported-by: Paul Cacheux <paulcacheux@gmail.com>
Closes: https://lore.kernel.org/all/20250510074456.805a16872b591e2971a4d221@kernel.org/
Fixes: ab105a4fb8 ("tracing: Use tracing error_log with probe events")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-05-13 22:23:34 +09:00
Linus Torvalds
e9565e23cd Merge tag 'sched_ext-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
 "A little bit invasive for rc6 but they're important fixes, pass tests
  fine and won't break anything outside sched_ext:

   - scx_bpf_cpuperf_set() calls internal functions that require the rq
     to be locked. It assumed that the BPF caller has rq locked but
     that's not always true. Fix it by tracking whether rq is currently
     held by the CPU and grabbing it if necessary

   - bpf_iter_scx_dsq_new() was leaving the DSQ iterator in an
     uninitialized state after an error. However, next() and destroy()
     can be called on an iterator which failed initialization and thus
     they always need to be initialized even after an init error. Fix by
     always initializing the iterator

   - Remove duplicate BTF_ID_FLAGS() entries"

* tag 'sched_ext-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: bpf_iter_scx_dsq_new() should always initialize iterator
  sched_ext: Fix rq lock state in hotplug ops
  sched_ext: Remove duplicate BTF_ID_FLAGS definitions
  sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()
  sched_ext: Track currently locked rq
2025-05-12 18:02:05 -07:00
Linus Torvalds
d471045e75 Merge tag 'cgroup-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "One low-risk patch to fix a cpuset bug where it over-eagerly tries to
  modify CPU affinity of kernel threads"

* tag 'cgroup-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
2025-05-12 17:58:57 -07:00
Kyoji Ogasawara
4ce2affc6e btrfs: add back warning for mount option commit values exceeding 300
The Btrfs documentation states that if the commit value is greater than
300 a warning should be issued. The warning was accidentally lost in the
new mount API update.

Fixes: 6941823cc8 ("btrfs: remove old mount API code")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12 21:39:34 +02:00
Boris Burkov
a0fd1c6098 btrfs: fix folio leak in submit_one_async_extent()
If btrfs_reserve_extent() fails while submitting an async_extent for a
compressed write, then we fail to call free_async_extent_pages() on the
async_extent and leak its folios. A likely cause for such a failure
would be btrfs_reserve_extent() failing to find a large enough
contiguous free extent for the compressed extent.

I was able to reproduce this by:

1. mount with compress-force=zstd:3
2. fallocating most of a filesystem to a big file
3. fragmenting the remaining free space
4. trying to copy in a file which zstd would generate large compressed
   extents for (vmlinux worked well for this)

Step 4. hits the memory leak and can be repeated ad nauseam to
eventually exhaust the system memory.

Fix this by detecting the case where we fallback to uncompressed
submission for a compressed async_extent and ensuring that we call
free_async_extent_pages().

Fixes: 131a821a24 ("btrfs: fallback if compressed IO fails for ENOSPC")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-developed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12 21:39:13 +02:00
Filipe Manana
54db6d1bdd btrfs: fix discard worker infinite loop after disabling discard
If the discard worker is running and there's currently only one block
group, that block group is a data block group, it's in the unused block
groups discard list and is being used (it got an extent allocated from it
after becoming unused), the worker can end up in an infinite loop if a
transaction abort happens or the async discard is disabled (during remount
or unmount for example).

This happens like this:

1) Task A, the discard worker, is at peek_discard_list() and
   find_next_block_group() returns block group X;

2) Block group X is in the unused block groups discard list (its discard
   index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past
   it become an unused block group and was added to that list, but then
   later it got an extent allocated from it, so its ->used counter is not
   zero anymore;

3) The current transaction is aborted by task B and we end up at
   __btrfs_handle_fs_error() in the transaction abort path, where we call
   btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from
   fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode
   (setting SB_RDONLY in the super block's s_flags field);

4) Task A calls __add_to_discard_list() with the goal of moving the block
   group from the unused block groups discard list into another discard
   list, but at __add_to_discard_list() we end up doing nothing because
   btrfs_run_discard_work() returns false, since the super block has
   SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set
   anymore in fs_info->flags. So block group X remains in the unused block
   groups discard list;

5) Task A then does a goto into the 'again' label, calls
   find_next_block_group() again we gets block group X again. Then it
   repeats the previous steps over and over since there are not other
   block groups in the discard lists and block group X is never moved
   out of the unused block groups discard list since
   btrfs_run_discard_work() keeps returning false and therefore
   __add_to_discard_list() doesn't move block group X out of that discard
   list.

When this happens we can get a soft lockup report like this:

  [71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97]
  [71.957] Modules linked in: xfs af_packet rfkill (...)
  [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G        W          6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa
  [71.957] Tainted: [W]=WARN
  [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs]
  [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs]
  [71.957] Code: c1 01 48 83 (...)
  [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246
  [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000
  [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad
  [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080
  [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860
  [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000
  [71.957] FS:  0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000
  [71.957] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0
  [71.957] PKRU: 55555554
  [71.957] Call Trace:
  [71.957]  <TASK>
  [71.957]  process_one_work+0x17e/0x330
  [71.957]  worker_thread+0x2ce/0x3f0
  [71.957]  ? __pfx_worker_thread+0x10/0x10
  [71.957]  kthread+0xef/0x220
  [71.957]  ? __pfx_kthread+0x10/0x10
  [71.957]  ret_from_fork+0x34/0x50
  [71.957]  ? __pfx_kthread+0x10/0x10
  [71.957]  ret_from_fork_asm+0x1a/0x30
  [71.957]  </TASK>
  [71.957] Kernel panic - not syncing: softlockup: hung tasks
  [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G        W    L     6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa
  [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP
  [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs]
  [71.992] Call Trace:
  [71.993]  <IRQ>
  [71.994]  dump_stack_lvl+0x5a/0x80
  [71.994]  panic+0x10b/0x2da
  [71.995]  watchdog_timer_fn.cold+0x9a/0xa1
  [71.996]  ? __pfx_watchdog_timer_fn+0x10/0x10
  [71.997]  __hrtimer_run_queues+0x132/0x2a0
  [71.997]  hrtimer_interrupt+0xff/0x230
  [71.998]  __sysvec_apic_timer_interrupt+0x55/0x100
  [71.999]  sysvec_apic_timer_interrupt+0x6c/0x90
  [72.000]  </IRQ>
  [72.000]  <TASK>
  [72.001]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
  [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs]
  [72.002] Code: c1 01 48 83 (...)
  [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246
  [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000
  [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad
  [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080
  [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860
  [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000
  [72.010]  ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f]
  [72.011]  process_one_work+0x17e/0x330
  [72.012]  worker_thread+0x2ce/0x3f0
  [72.013]  ? __pfx_worker_thread+0x10/0x10
  [72.014]  kthread+0xef/0x220
  [72.014]  ? __pfx_kthread+0x10/0x10
  [72.015]  ret_from_fork+0x34/0x50
  [72.015]  ? __pfx_kthread+0x10/0x10
  [72.016]  ret_from_fork_asm+0x1a/0x30
  [72.017]  </TASK>
  [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  [72.019] Rebooting in 90 seconds..

So fix this by making sure we move a block group out of the unused block
groups discard list when calling __add_to_discard_list().

Fixes: 2bee7eb8bb ("btrfs: discard one region at a time in async discard")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12 21:38:56 +02:00
Linus Torvalds
7a8bdc7fe0 Merge tag 'platform-drivers-x86-v6.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:

 - amd/pmc: Use spurious 8042 quirk with MECHREVO Wujie 14XA

 - amd/pmf:
     - Ensure Smart PC policies are valid
     - Fix memory leak when the engine fails to start

 - amd/hsmp: Make amd_hsmp and hsmp_acpi as mutually exclusive drivers

 - asus-wmi: Fix wlan_ctrl_by_user detection

 - thinkpad_acpi: Add support for NEC Lavie X1475JAS

* tag 'platform-drivers-x86-v6.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection
  platform/x86/amd/pmc: Declare quirk_spurious_8042 for MECHREVO Wujie 14XA (GX4HRXL)
  platform/x86: thinkpad_acpi: Support also NEC Lavie X1475JAS
  platform/x86/amd/hsmp: Make amd_hsmp and hsmp_acpi as mutually exclusive drivers
  drivers/platform/x86/amd: pmf: Check for invalid Smart PC Policies
  drivers/platform/x86/amd: pmf: Check for invalid sideloaded Smart PC Policies
2025-05-12 10:48:02 -07:00
Linus Torvalds
8b64199a7f Merge tag 'udf_for_v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
 "Fix a bug in UDF inode eviction leading to spewing pointless
  error messages"

* tag 'udf_for_v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Make sure i_lenExtents is uptodate on inode eviction
2025-05-12 10:23:20 -07:00
Steven Rostedt
1b0c192c92 tracing: samples: Initialize trace_array_printk() with the correct function
When using trace_array_printk() on a created instance, the correct
function to use to initialize it is:

  trace_array_init_printk()

Not

  trace_printk_init_buffer()

The former is a proper function to use, the latter is for initializing
trace_printk() and causes the NOTICE banner to be displayed.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Divya Indi <divya.indi@oracle.com>
Link: https://lore.kernel.org/20250509152657.0f6744d9@gandalf.local.home
Fixes: 89ed42495e ("tracing: Sample module to demonstrate kernel access to Ftrace instances.")
Fixes: 38ce2a9e33 ("tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-12 13:07:22 -04:00
Linus Torvalds
e238e49b18 Merge tag 'vfs-6.15-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Ensure that simple_xattr_list() always includes security.* xattrs

 - Fix eventpoll busy loop optimization when combined with timeouts

 - Disable swapon() for devices with block sizes greater than page sizes

 - Don't call errseq_set() twice during mark_buffer_write_io_error().
   Just use mapping_set_error() which takes care to not deference
   unconditionally

* tag 'vfs-6.15-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: Remove redundant errseq_set call in mark_buffer_write_io_error.
  swapfile: disable swapon for bs > ps devices
  fs/eventpoll: fix endless busy loop after timeout has expired
  fs/xattr.c: fix simple_xattr_list to always include security.* xattrs
2025-05-12 10:04:14 -07:00
Masahiro Yamada
e0cd396d89 kbuild: fix typos "module.builtin" to "modules.builtin"
The filenames in the comments do not match the actual generated files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12 15:04:52 +09:00
Thomas Weißschuh
8cf5b3f836 Revert "kbuild, rust: use -fremap-path-prefix to make paths relative"
This reverts commit dbdffaf50f.

--remap-path-prefix breaks the ability of debuggers to find the source
file corresponding to object files. As there is no simple or uniform
way to specify the source directory explicitly, this breaks developers
workflows.

Revert the unconditional usage of --remap-path-prefix, equivalent to the
same change for -ffile-prefix-map in KBUILD_CPPFLAGS.

Fixes: dbdffaf50f ("kbuild, rust: use -fremap-path-prefix to make paths relative")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12 15:04:13 +09:00
Thomas Weißschuh
020d7f1448 Revert "kbuild: make all file references relative to source root"
This reverts commit cacd22ce69.

-ffile-prefix-map breaks the ability of debuggers to find the source
file corresponding to object files. As there is no simple or uniform
way to specify the source directory explicitly, this breaks developers
workflows.

Revert the unconditional usage of -ffile-prefix-map.

Reported-by: Matthieu Baerts <matttbe@kernel.org>
Closes: https://lore.kernel.org/lkml/edc50aa7-0740-4942-8c15-96f12f2acc7e@kernel.org/
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Closes: https://lore.kernel.org/lkml/aBEttQH4kimHFScx@intel.com/
Fixes: cacd22ce69 ("kbuild: make all file references relative to source root")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12 15:04:13 +09:00
Masahiro Yamada
f0e4b333cf kbuild: fix dependency on sorttable
Commit ac4f06789b ("kbuild: Create intermediate vmlinux build with
relocations preserved") missed replacing one occurrence of "vmlinux"
that was added during the same development cycle.

Fixes: ac4f06789b ("kbuild: Create intermediate vmlinux build with relocations preserved")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
2025-05-12 15:04:09 +09:00
Masahiro Yamada
d1b99cdf22 init: remove unused CONFIG_CC_CAN_LINK_STATIC
This is a leftover from commit 98e20e5e13 ("bpfilter: remove bpfilter").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12 15:03:46 +09:00