linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 09:02:21 -04:00

Author	SHA1	Message	Date
Filipe Manana	40f2b11c1b	btrfs: don't allow log trees to consume global reserve or overcommit metadata For a fsync we never reserve space in advance, we just start a transaction without reserving space and we use an empty block reserve for a log tree. We reserve space as we need while updating a log tree, we end up in btrfs_use_block_rsv() when reserving space for the allocation of a log tree extent buffer and we attempt first to reserve without flushing, and if that fails we attempt to consume from the global reserve or overcommit metadata. This makes us consume space that may be the last resort for a transaction commit to succeed, therefore increasing the chances for a transaction abort with -ENOSPC. So make btrfs_use_block_rsv() fail if we can't reserve metadata space for a log tree extent buffer allocation without flushing, making the fsync fallback to a transaction commit and avoid using critical space that could be the only resort for a transaction commit to succeed when we are in a critical space situation. Reviewed-by: Leo Martins <loemra.dev@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:53 +02:00
Filipe Manana	574d93fc62	btrfs: be less aggressive with metadata overcommit when we can do full flushing Over the years we often get reports of some -ENOSPC failure while updating metadata that leads to a transaction abort. I have seen this happen for filesystems of all sizes and with workloads that are very user/customer specific and unable to reproduce, but Aleksandar recently reported a simple way to reproduce this with a 1G filesystem and using the bonnie++ benchmark tool. The following test script reproduces the failure: $ cat test.sh #!/bin/bash # Create and use a 1G null block device, memory backed, otherwise # the test takes a very long time. modprobe null_blk nr_devices="0" null_dev="/sys/kernel/config/nullb/nullb0" mkdir "$null_dev" size=$((1 * 1024)) # in MB echo 2 > "$null_dev/submit_queues" echo "$size" > "$null_dev/size" echo 1 > "$null_dev/memory_backed" echo 1 > "$null_dev/discard" echo 1 > "$null_dev/power" DEV=/dev/nullb0 MNT=/mnt/nullb0 mkfs.btrfs -f $DEV mount $DEV $MNT mkdir $MNT/test/ bonnie++ -d $MNT/test/ -m BTRFS -u 0 -s 256M -r 128M -b umount $MNT echo 0 > "$null_dev/power" rmdir "$null_dev" When running this bonnie++ fails in the phase where it deletes test directories and files: $ ./test.sh (...) Using uid:0, gid:0. Writing a byte at a time...done Writing intelligently...done Rewriting...done Reading a byte at a time...done Reading intelligently...done start 'em...done...done...done...done...done... Create files in sequential order...done. Stat files in sequential order...done. Delete files in sequential order...done. Create files in random order...done. Stat files in random order...done. Delete files in random order...Can't sync directory, turning off dir-sync. Can't delete file 9Bq7sr0000000338 Cleaning up test directory after error. Bonnie: drastic I/O error (rmdir): Read-only file system And in the syslog/dmesg we can see the following transaction abort trace: [161915.501506] BTRFS warning (device nullb0): Skipping commit of aborted transaction. [161915.502983] ------------[ cut here ]------------ [161915.503832] BTRFS: Transaction aborted (error -28) [161915.504748] WARNING: fs/btrfs/transaction.c:2045 at btrfs_commit_transaction+0xa21/0xd30 [btrfs], CPU#11: bonnie++/3377975 [161915.506786] Modules linked in: btrfs dm_zero dm_snapshot (...) [161915.518759] CPU: 11 UID: 0 PID: 3377975 Comm: bonnie++ Tainted: G W 6.19.0-rc7-btrfs-next-224+ #4 PREEMPT(full) [161915.520857] Tainted: [W]=WARN [161915.521405] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [161915.523414] RIP: 0010:btrfs_commit_transaction+0xa24/0xd30 [btrfs] [161915.524630] Code: 48 8b 7c 24 (...) [161915.526982] RSP: 0018:ffffd3fe8206fda8 EFLAGS: 00010292 [161915.527707] RAX: 0000000000000002 RBX: ffff8f4886d3c000 RCX: 0000000000000000 [161915.528723] RDX: 0000000002040001 RSI: 00000000ffffffe4 RDI: ffffffffc088f780 [161915.529691] RBP: ffff8f4f5adae7e0 R08: 0000000000000000 R09: ffffd3fe8206fb90 [161915.530842] R10: ffff8f4f9c1fffa8 R11: 0000000000000003 R12: 00000000ffffffe4 [161915.532027] R13: ffff8f4ef2cf2400 R14: ffff8f4f5adae708 R15: ffff8f4f62d18000 [161915.533229] FS: 00007ff93112a780(0000) GS:ffff8f4ff63ee000(0000) knlGS:0000000000000000 [161915.534611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [161915.535575] CR2: 00005571b3072000 CR3: 0000000176080005 CR4: 0000000000370ef0 [161915.536758] Call Trace: [161915.537185] <TASK> [161915.537575] btrfs_sync_file+0x431/0x530 [btrfs] [161915.538473] do_fsync+0x39/0x80 [161915.539042] __x64_sys_fsync+0xf/0x20 [161915.539750] do_syscall_64+0x50/0xf20 [161915.540396] entry_SYSCALL_64_after_hwframe+0x76/0x7e [161915.541301] RIP: 0033:0x7ff930ca49ee [161915.541904] Code: 08 0f 85 f5 (...) [161915.544830] RSP: 002b:00007ffd94291f38 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [161915.546152] RAX: ffffffffffffffda RBX: 00007ff93112a780 RCX: 00007ff930ca49ee [161915.547263] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 [161915.548383] RBP: 0000000000000dab R08: 0000000000000000 R09: 0000000000000000 [161915.549853] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd94291fb0 [161915.551196] R13: 00007ffd94292350 R14: 0000000000000001 R15: 00007ffd94292340 [161915.552161] </TASK> [161915.552457] ---[ end trace 0000000000000000 ]--- [161915.553232] BTRFS info (device nullb0 state A): dumping space info: [161915.553236] BTRFS info (device nullb0 state A): space_info DATA (sub-group id 0) has 12582912 free, is not full [161915.553239] BTRFS info (device nullb0 state A): space_info total=12582912, used=0, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 [161915.553243] BTRFS info (device nullb0 state A): space_info METADATA (sub-group id 0) has -5767168 free, is full [161915.553245] BTRFS info (device nullb0 state A): space_info total=53673984, used=6635520, pinned=46956544, reserved=16384, may_use=5767168, readonly=65536 zone_unusable=0 [161915.553251] BTRFS info (device nullb0 state A): space_info SYSTEM (sub-group id 0) has 8355840 free, is not full [161915.553254] BTRFS info (device nullb0 state A): space_info total=8388608, used=16384, pinned=16384, reserved=0, may_use=0, readonly=0 zone_unusable=0 [161915.553257] BTRFS info (device nullb0 state A): global_block_rsv: size 5767168 reserved 5767168 [161915.553261] BTRFS info (device nullb0 state A): trans_block_rsv: size 0 reserved 0 [161915.553263] BTRFS info (device nullb0 state A): chunk_block_rsv: size 0 reserved 0 [161915.553265] BTRFS info (device nullb0 state A): remap_block_rsv: size 0 reserved 0 [161915.553268] BTRFS info (device nullb0 state A): delayed_block_rsv: size 0 reserved 0 [161915.553270] BTRFS info (device nullb0 state A): delayed_refs_rsv: size 0 reserved 0 [161915.553272] BTRFS: error (device nullb0 state A) in cleanup_transaction:2045: errno=-28 No space left [161915.554463] BTRFS info (device nullb0 state EA): forced readonly The problem is that we allow for a very aggressive metadata overcommit, about 1/8th of the currently available space, even when the task attempting the reservation allows for full flushing. Over time this allows more and more tasks to overcommit without getting a transaction commit to release pinned extents, joining the same transaction and eventually lead to the transaction abort when attempting some tree update, as the extent allocator is not able to find any available metadata extent and it's not able to allocate a new metadata block group either (not enough unallocated space for that). Fix this by allowing the overcommit to be up to 1/64th of the available (unallocated) space instead and for that limit to apply to both types of full flushing, BTRFS_RESERVE_FLUSH_ALL and BTRFS_RESERVE_FLUSH_ALL_STEAL. This way we get more frequent transaction commits to release pinned extents in case our caller is in a context where full flushing is allowed. Note that the space infos dump in the dmesg/syslog right after the transaction abort give the wrong idea that we have plenty of unallocated space when the abort happened. During the bonnie++ workload we had a metadata chunk allocation attempt and it failed with -ENOSPC because at that time we had a bunch of data block groups allocated, which then became empty and got deleted by the cleaner kthread after the metadata chunk allocation failed with -ENOSPC and before the transaction abort happened and dumped the space infos. The custom tracing (some trace_printk() calls spread in strategic places) used to check that: mount-1793735 [011] ...1. 28877.261096: btrfs_add_bg_to_space_info: added bg offset 13631488 length 8388608 flags 1 to space_info->flags 1 total_bytes 8388608 bytes_used 0 bytes_may_use 0 mount-1793735 [011] ...1. 28877.261098: btrfs_add_bg_to_space_info: added bg offset 22020096 length 8388608 flags 34 to space_info->flags 2 total_bytes 8388608 bytes_used 16384 bytes_may_use 0 mount-1793735 [011] ...1. 28877.261100: btrfs_add_bg_to_space_info: added bg offset 30408704 length 53673984 flags 36 to space_info->flags 4 total_bytes 53673984 bytes_used 131072 bytes_may_use 0 These are from loading the block groups created by mkfs during mount. Then when bonnie++ starts doing its thing: kworker/u48:5-1792004 [011] ..... 28886.122050: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 kworker/u48:5-1792004 [011] ..... 28886.122053: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 927596544 kworker/u48:5-1792004 [011] ..... 28886.122055: btrfs_make_block_group: make bg offset 84082688 size 117440512 type 1 kworker/u48:5-1792004 [011] ...1. 28886.122064: btrfs_add_bg_to_space_info: added bg offset 84082688 length 117440512 flags 1 to space_info->flags 1 total_bytes 125829120 bytes_used 0 bytes_may_use 5251072 First allocation of a data block group of 112M. kworker/u48:5-1792004 [011] ..... 28886.192408: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 kworker/u48:5-1792004 [011] ..... 28886.192413: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 810156032 kworker/u48:5-1792004 [011] ..... 28886.192415: btrfs_make_block_group: make bg offset 201523200 size 117440512 type 1 kworker/u48:5-1792004 [011] ...1. 28886.192425: btrfs_add_bg_to_space_info: added bg offset 201523200 length 117440512 flags 1 to space_info->flags 1 total_bytes 243269632 bytes_used 0 bytes_may_use 122691584 Another 112M data block group allocated. kworker/u48:5-1792004 [011] ..... 28886.260935: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 kworker/u48:5-1792004 [011] ..... 28886.260941: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 692715520 kworker/u48:5-1792004 [011] ..... 28886.260943: btrfs_make_block_group: make bg offset 318963712 size 117440512 type 1 kworker/u48:5-1792004 [011] ...1. 28886.260954: btrfs_add_bg_to_space_info: added bg offset 318963712 length 117440512 flags 1 to space_info->flags 1 total_bytes 360710144 bytes_used 0 bytes_may_use 240132096 Yet another one. bonnie++-1793755 [010] ..... 28886.280407: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 bonnie++-1793755 [010] ..... 28886.280412: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 575275008 bonnie++-1793755 [010] ..... 28886.280414: btrfs_make_block_group: make bg offset 436404224 size 117440512 type 1 bonnie++-1793755 [010] ...1. 28886.280419: btrfs_add_bg_to_space_info: added bg offset 436404224 length 117440512 flags 1 to space_info->flags 1 total_bytes 478150656 bytes_used 0 bytes_may_use 268435456 One more. kworker/u48:5-1792004 [011] ..... 28886.566233: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 kworker/u48:5-1792004 [011] ..... 28886.566238: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 457834496 kworker/u48:5-1792004 [011] ..... 28886.566241: btrfs_make_block_group: make bg offset 553844736 size 117440512 type 1 kworker/u48:5-1792004 [011] ...1. 28886.566250: btrfs_add_bg_to_space_info: added bg offset 553844736 length 117440512 flags 1 to space_info->flags 1 total_bytes 595591168 bytes_used 268435456 bytes_may_use 209723392 Another one. bonnie++-1793755 [009] ..... 28886.613446: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 bonnie++-1793755 [009] ..... 28886.613451: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 340393984 bonnie++-1793755 [009] ..... 28886.613453: btrfs_make_block_group: make bg offset 671285248 size 117440512 type 1 bonnie++-1793755 [009] ...1. 28886.613458: btrfs_add_bg_to_space_info: added bg offset 671285248 length 117440512 flags 1 to space_info->flags 1 total_bytes 713031680 bytes_used 268435456 bytes_may_use 2 68435456 Another one. bonnie++-1793755 [009] ..... 28886.674953: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 bonnie++-1793755 [009] ..... 28886.674957: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 222953472 bonnie++-1793755 [009] ..... 28886.674959: btrfs_make_block_group: make bg offset 788725760 size 117440512 type 1 bonnie++-1793755 [009] ...1. 28886.674963: btrfs_add_bg_to_space_info: added bg offset 788725760 length 117440512 flags 1 to space_info->flags 1 total_bytes 830472192 bytes_used 268435456 bytes_may_use 1 34217728 Another one. bonnie++-1793755 [009] ..... 28886.674981: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 bonnie++-1793755 [009] ..... 28886.674982: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 105512960 bonnie++-1793755 [009] ..... 28886.674983: btrfs_make_block_group: make bg offset 906166272 size 105512960 type 1 bonnie++-1793755 [009] ...1. 28886.674984: btrfs_add_bg_to_space_info: added bg offset 906166272 length 105512960 flags 1 to space_info->flags 1 total_bytes 935985152 bytes_used 268435456 bytes_may_use 67108864 Another one, but a bit smaller (~100.6M) since we now have less space. bonnie++-1793758 [009] ..... 28891.962096: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 bonnie++-1793758 [009] ..... 28891.962103: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 65536 dev_extent_want 1073741824 max_avail 12582912 bonnie++-1793758 [009] ..... 28891.962105: btrfs_make_block_group: make bg offset 1011679232 size 12582912 type 1 bonnie++-1793758 [009] ...1. 28891.962114: btrfs_add_bg_to_space_info: added bg offset 1011679232 length 12582912 flags 1 to space_info->flags 1 total_bytes 948568064 bytes_used 268435456 bytes_may_use 8192 Another one, this one even smaller (12M). kworker/u48:5-1792004 [011] ..... 28892.112802: btrfs_chunk_alloc: enter first metadata chunk alloc attempt kworker/u48:5-1792004 [011] ..... 28892.112805: btrfs_create_chunk: gather_device_info 1 ctl->dev_extent_min = 131072 dev_extent_want 536870912 kworker/u48:5-1792004 [011] ..... 28892.112806: btrfs_create_chunk: gather_device_info 2 ctl->dev_extent_min = 131072 dev_extent_want 536870912 max_avail 0 536870912 is 512M, the standard 256M metadata chunk size times 2 because of the DUP profile for metadata. 'max_avail' is what find_free_dev_extent() returns to us in gather_device_info(). As a result, gather_device_info() sets ctl->ndevs to 0, making decide_stripe_size() fail with -ENOSPC, and therefore metadata chunk allocation fails while we are attempting to run delayed items during the transaction commit. kworker/u48:5-1792004 [011] ..... 28892.112807: btrfs_create_chunk: decide_stripe_size fail -ENOSPC In the syslog/dmesg pasted above, which happened after the transaction was aborted, the space info dumps did not account for all these data block groups that were allocated during bonnie++'s workload. And that is because after the metadata chunk allocation failed with -ENOSPC and before the transaction abort happened, most of the data block groups had become empty and got deleted by by the cleaner kthread - when the abort happened, we had bonnie++ in the middle of deleting the files it created. But dumping the space infos right after the metadata chunk allocation fails by adding a call to btrfs_dump_space_info_for_trans_abort() in decide_stripe_size() when it returns -ENOSPC, we get: [29972.409295] BTRFS info (device nullb0): dumping space info: [29972.409300] BTRFS info (device nullb0): space_info DATA (sub-group id 0) has 673341440 free, is not full [29972.409303] BTRFS info (device nullb0): space_info total=948568064, used=0, pinned=275226624, reserved=0, may_use=0, readonly=0 zone_unusable=0 [29972.409305] BTRFS info (device nullb0): space_info METADATA (sub-group id 0) has 3915776 free, is not full [29972.409306] BTRFS info (device nullb0): space_info total=53673984, used=163840, pinned=42827776, reserved=147456, may_use=6553600, readonly=65536 zone_unusable=0 [29972.409308] BTRFS info (device nullb0): space_info SYSTEM (sub-group id 0) has 7979008 free, is not full [29972.409310] BTRFS info (device nullb0): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=393216, readonly=0 zone_unusable=0 [29972.409311] BTRFS info (device nullb0): global_block_rsv: size 5767168 reserved 5767168 [29972.409313] BTRFS info (device nullb0): trans_block_rsv: size 0 reserved 0 [29972.409314] BTRFS info (device nullb0): chunk_block_rsv: size 393216 reserved 393216 [29972.409315] BTRFS info (device nullb0): remap_block_rsv: size 0 reserved 0 [29972.409316] BTRFS info (device nullb0): delayed_block_rsv: size 0 reserved 0 So here we see there's ~904.6M of data space, ~51.2M of metadata space and 8M of system space, making a total of 963.8M. Reported-by: Aleksandar Gerasimovski <Aleksandar.Gerasimovski@belden.com> Link: https://lore.kernel.org/linux-btrfs/SA1PR18MB56922F690C5EC2D85371408B998FA@SA1PR18MB5692.namprd18.prod.outlook.com/ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H61vZ3_+eqJ1A9po2WcgNJJjUu9MJQoYB2oDSAAecHaug@mail.gmail.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:53 +02:00
Qu Wenruo	2672a26a75	btrfs: use per-profile available space in calc_available_free_space() For the following disk layout, can_overcommit() can cause false confidence in available space: devid 1 unallocated: 1GiB devid 2 unallocated: 50GiB metadata type: RAID1 As can_overcommit() simply uses unallocated space with factor to calculate the allocatable metadata chunk size, resulting 25.5GiB available space. But in reality we can only allocate one 1GiB RAID1 chunk, the remaining 49GiB on devid 2 will never be utilized to fulfill a RAID1 chunk. This leads to various ENOSPC related transaction abort and flips the fs read-only. Now use per-profile available space in calc_available_free_space(), and only when that failed we fall back to the old factor based estimation. And for zoned devices or for the very low chance of temporary memory allocation failure, we will still fallback to factor based estimation. But I hope in reality it's very rare. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:53 +02:00
Qu Wenruo	c84053d9f7	btrfs: update per-profile available estimation This involves the following timing: - Chunk allocation - Chunk removal - After Mount - New device - Device removal - Device shrink - Device enlarge And since the function btrfs_update_per_profile_avail() will not return an error, this won't cause new error handling path. Although when btrfs_update_per_profile_avail() failed (only ENOSPC possible) it will mark the per-profile available estimation as unreliable, so later btrfs_get_per_profile_avail() will return false and require the caller to have a fallback solution. The function btrfs_update_per_profile_avail() will be executed with chunk_mutex hold, thus it will slightly slow down those involved functions, but not a lot. As all the core workload is just various u64 calculations inside a loop, without any tree search, the overhead should be acceptable even for all supported 9 profiles. For 4 disks (which exercises all 9 profiles), the execution time of that function will still be less than 10 us. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:53 +02:00
Qu Wenruo	52fead5eb8	btrfs: introduce the device layout aware per-profile available space [BUG] There is a long known bug that if metadata is using RAID1 on two disks with unbalanced sizes, there is a very high chance to hit ENOSPC related transaction abort. [CAUSE] The root cause is in the available space estimation code: - Factor based calculation Just use all unallocated space, divide by the profile factor One obvious user is can_overcommit(). This can not handle the following example: devid 1 unallocated: 1GiB devid 2 unallocated: 50GiB metadata type: RAID1 If using factor based estimation, we can use (1GiB + 50GiB) / 2 = 25.5GiB free space for metadata. Thus we can continue allocating metadata (over-commit) way beyond the 1GiB limit. But this estimation is completely wrong, in reality we can only allocate one single 1GiB RAID1 block group, thus if we continue over-commit, at one time we will hit ENOSPC at some critical path and flips the fs read-only. [SOLUTION] This patch will introduce per-profile available space estimation, which can provide chunk-allocator like behavior to give a (mostly) accurate result, with under-estimate corner cases. There are some differences between the estimation and real chunk allocator: - No consideration on hole size It's fine for most cases, as all data/metadata strips are in 1GiB size thus there should not be any hole wasting much space. And chunk allocator is able to use smaller stripes when there is really no other choice. Although in theory this means it can lead to some over-estimation, it should not cause too much hassle in the real world. The other benefit of such behavior is, we avoid dev-extent tree search completely, thus the overhead is very small. - No true balance for certain cases If we have 3 disks RAID1, and each device has 2GiB unallocated space, we can load balance the chunk allocation so that we can allocate 3GiB RAID1 chunks, and that's what chunk allocator will do. But this current estimation code is using the largest available space to do a single allocation. Meaning the estimation will be 2GiB, thus under estimate. Such under estimation is fine and after the first chunk allocation, the estimation will be updated and still give a correct 2GiB estimation. So this only means the estimation will be a little conservative, which is safer for call sites like metadata over-commit check. With that facility, for above 1GiB + 50GiB case, it will give a RAID1 estimation of 1GiB, instead of the incorrect 25.5GiB. Or for a more complex example: devid 1 unallocated: 1T devid 2 unallocated: 1T devid 3 unallocated: 10T We will get an array of: RAID10: 2T RAID1: 2T RAID1C3: 1T RAID1C4: 0 (not enough devices) DUP: 6T RAID0: 3T SINGLE: 12T RAID5: 2T RAID6: 1T [IMPLEMENTATION] And for the each profile , we go chunk allocator level calculation: The pseudo code looks like: clear_virtual_used_space_of_all_rw_devices(); do { /* * The same as chunk allocator, despite used space, * we also take virtual used space into consideration. / sort_device_with_virtual_free_space(); / * Unlike chunk allocator, we don't need to bother hole/stripe * size, so we use the smallest device to make sure we can * allocated as many stripes as regular chunk allocator / stripe_size = device_with_smallest_free->avail_space; stripe_size = min(stripe_size, to_alloc / ndevs); / * Allocate a virtual chunk, allocated virtual chunk will * increase virtual used space, allow next iteration to * properly emulate chunk allocator behavior. */ ret = alloc_virtual_chunk(stripe_size, &allocated_size); if (ret == 0) avail += allocated_size; } while (ret == 0) This minimal available space based calculation is not perfect, but the important part is, the estimation is never exceeding the real available space. This patch just introduces the infrastructure, no hooks are executed yet. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:53 +02:00
Jiasheng Jiang	08ef56661f	btrfs: zoned: remove redundant space_info lock and variable in do_allocation_zoned() In do_allocation_zoned(), the code acquires space_info->lock before block_group->lock. However, the critical section does not access or modify any members of the space_info structure. Thus, the lock is redundant as it provides no necessary synchronization here. This change simplifies the locking logic and aligns the function with other zoned paths, such as __btrfs_add_free_space_zoned(), which only rely on block_group->lock. Since the 'space_info' local variable is no longer used after removing the lock calls, it is also removed. Removing this unnecessary lock reduces contention on the global space_info lock, improving concurrency in the zoned allocation path. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:52 +02:00
Filipe Manana	6141abb7f1	btrfs: move min sys chunk array size check to validate_sys_chunk_array() We check the minimum size of the sys chunk array in btrfs_validate_super() but we have a better place for that, the helper validate_sys_chunk_array() which we use for every other sys chunk array check. So move it there, also converting the return error from -EINVAL to -EUCLEAN, which is a better fit and also consistent with the other checks. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:52 +02:00
Filipe Manana	f3da62571b	btrfs: remove duplicate system chunk array max size overflow check We check it twice, once in validate_sys_chunk_array() and then again in its caller, btrfs_validate_super(), right after it calls validate_sys_chunk_array(). So remove the duplicated check from btrfs_validate_super(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:52 +02:00
Filipe Manana	c4d30088fa	btrfs: pass boolean literals as the last argument to inc_block_group_ro() The last argument of inc_block_group_ro() is defined as a boolean, but every caller is passing an integer literal, 0 or 1 for false and true respectively. While this is not incorrect, as 0 and 1 are converted to false and true, it's less readable and somewhat awkward since the argument is defined as boolean. Replace 0 and 1 with false and true. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:52 +02:00
Naohiro Aota	1ba19a6ea9	btrfs: tests: zoned: add tests cases for zoned code Add a test function for the zoned code, for now it tests btrfs_load_block_group_by_raid_type() with various test cases. The load_zone_info_tests[] array defines the test cases. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-07 18:55:52 +02:00
Linus Torvalds	591cd656a1	Linux 7.0-rc7 v7.0-rc7	2026-04-05 15:26:23 -07:00
Linus Torvalds	85fb6da43a	Merge tag 'riscv-for-linus-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: - Fix a CONFIG_SPARSEMEM crash on RV32 by avoiding early phys_to_page() - Prevent runtime const infrastructure from being used by modules, similar to what was done for x86 - Avoid problems when shutting down ACPI systems with IOMMUs by adding a device dependency between IOMMU and devices that use it - Fix a bug where the CPU pointer masking state isn't properly reset when tagged addresses aren't enabled for a task - Fix some incorrect register assignments, and add some missing ones, in kgdb support code - Fix compilation of non-kernel code that uses the ptrace uapi header by replacing BIT() with _BITUL() - Fix compilation of the validate_v_ptrace kselftest by working around kselftest macro expansion issues * tag 'riscv-for-linus-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: ACPI: RIMT: Add dependency between iommu and devices selftests: riscv: Add braces around EXPECT_EQ() riscv: use _BITUL macro rather than BIT() in ptrace uapi and kselftests riscv: Reset pmm when PR_TAGGED_ADDR_ENABLE is not set riscv: make runtime const not usable by modules riscv: patch: Avoid early phys_to_page() riscv: kgdb: fix several debug register assignment bugs	2026-04-05 14:43:47 -07:00
Linus Torvalds	10b76a429a	Merge tag 'x86-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix kexec crash on KCOV-instrumented kernels (Aleksandr Nogikh) - Fix Geode platform driver on-stack property data use-after-return bug (Dmitry Torokhov) * tag 'x86-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/geode: Fix on-stack property data use-after-return bug x86/kexec: Disable KCOV instrumentation after load_segments()	2026-04-05 13:53:07 -07:00
Linus Torvalds	2ab99ad7fa	Merge tag 'sched-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix zero_vruntime tracking again (Peter Zijlstra) - Fix avg_vruntime() usage in sched_debug (Peter Zijlstra) * tag 'sched-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix avg_vruntime() usage sched/fair: Fix zero_vruntime tracking fix	2026-04-05 13:45:37 -07:00
Linus Torvalds	7bba6c8622	Merge tag 'perf-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: - Fix potential bad container_of() in intel_pmu_hw_config() (Ian Rogers) * tag 'perf-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix potential bad container_of in intel_pmu_hw_config	2026-04-05 13:43:26 -07:00
Linus Torvalds	1391af0364	Merge tag 'irq-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: - Fix RISC-V APLIC irqchip driver setup errors on ACPI systems (Jessica Liu) * tag 'irq-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-aplic: Restrict genpd notifier to device tree only	2026-04-05 13:40:58 -07:00
Linus Torvalds	5401b9adeb	i915: don't use a vma that didn't match the context VM In eb_lookup_vma(), the code checks that the context vm matches before incrementing the i915 vma usage count, but for the non-matching case it didn't clear the non-matching vma pointer, so it would then mistakenly be returned, causing potential UaF and refcount issues. Reported-by: Yassine Mounir <sosohero200@gmail.com> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-04-05 12:42:25 -07:00
Linus Torvalds	eb3765aa71	Merge tag 'mips-fixes_7.0_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Fix TLB uniquification for systems with TLB not initialised by firmware - Fix allocation in TLB uniquification - Fix SiByte cache initialisation - Check uart parameters from firmware on Loongson64 systems - Fix clock id mismatch for Ralink SoCs - Fix GCC version check for __mutli3 workaround * tag 'mips-fixes_7.0_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: mips: mm: Allocate tlb_vpn array atomically MIPS: mm: Rewrite TLB uniquification for the hidden bit feature MIPS: mm: Suppress TLB uniquification on EHINV hardware MIPS: Always record SEGBITS in cpu_data.vmbits MIPS: Fix the GCC version check for `__multi3' workaround MIPS: SiByte: Bring back cache initialisation mips: ralink: update CPU clock index MIPS: Loongson64: env: Check UARTs passed by LEFI cautiously	2026-04-05 11:29:07 -07:00
Linus Torvalds	1791c39014	Merge tag 'char-misc-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/iio driver fixes from Greg KH: "Here are a relativly large number of small char/misc/iio and other driver fixes for 7.0-rc7. There's a bunch, but overall they are all small fixes for issues that people have been having that I finally caught up with getting merged due to delays on my end. The "largest" change overall is just some documentation updates to the security-bugs.rst file to hopefully tell the AI tools (and any users that actually read the documentation), how to send us better security bug reports as the quantity of reports these past few weeks has increased dramatically due to tools getting better at "finding" things. Included in here are: - lots of small IIO driver fixes for issues reported in 7.0-rc - gpib driver fixes - comedi driver fixes - interconnect driver fix - nvmem driver fixes - mei driver fix - counter driver fix - binder rust driver fixes - some other small misc driver fixes All of these have been in linux-next this week with no reported issues" * tag 'char-misc-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (63 commits) Documentation: fix two typos in latest update to the security report howto Documentation: clarify the mandatory and desirable info for security reports Documentation: explain how to find maintainers addresses for security reports Documentation: minor updates to the security contacts .get_maintainer.ignore: add myself nvmem: zynqmp_nvmem: Fix buffer size in DMA and memcpy nvmem: imx: assign nvmem_cell_info::raw_len misc: fastrpc: check qcom_scm_assign_mem() return in rpmsg_probe misc: fastrpc: possible double-free of cctx->remote_heap comedi: dt2815: add hardware detection to prevent crash comedi: runflags cannot determine whether to reclaim chanlist comedi: Reinit dev->spinlock between attachments to low-level drivers comedi: me_daq: Fix potential overrun of firmware buffer comedi: me4000: Fix potential overrun of firmware buffer comedi: ni_atmio16d: Fix invalid clean-up after failed attach gpib: fix use-after-free in IO ioctl handlers gpib: lpvo_usb: fix memory leak on disconnect gpib: Fix fluke driver s390 compile issue lis3lv02d: Omit IRQF_ONESHOT if no threaded handler is provided lis3lv02d: fix kernel-doc warnings ...	2026-04-05 10:09:33 -07:00
Linus Torvalds	7a60c79bd0	Merge tag 'tty-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty fixes from Greg KH: "Here are two small tty vt fixes for 7.0-rc7 to resolve some reported issues with the resize ability of the alt screen buffer. Both of these have been in linux-next all week with no reported issues" * tag 'tty-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: resize saved unicode buffer on alt screen exit after resize vt: discard stale unicode buffer on alt screen exit after resize	2026-04-05 10:04:28 -07:00
Linus Torvalds	aea7c84f28	Merge tag 'usb-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt fixes from Greg KH: "Here are a bunch of USB and Thunderbolt fixes (most all are USB) for 7.0-rc7. More than I normally like this late in the release cycle, partly due to my recent travels, and partly due to people banging away on the USB gadget interfaces and apis more than normal (big shoutout to Android for getting the vendors to actually work upstream on this, that's a huge win overall for everyone here) Included in here are: - Small thunderbolt fix - new USB serial driver ids added - typec driver fixes - gadget driver fixes for some disconnect issues - other usb gadget driver fixes for reported problems with binding and unbinding devices as happens when a gadget device connects / disconnects from a system it is plugged into (or it switches device mode at a user's request, these things are complex little beasts...) - usb offload fixes (where USB audio tunnels through the controller while the main CPU is asleep) for when EMP spikes hit the system causing disconnects to happen (as often happens with static electricity in the winter months). This has been much reported by at least one vendor, and resolves the issues they have been seeing with this codepath. Can't wait for the "formal methods are the answer!" people to try to model that one properly... - Other small usb driver fixes for issues reported. All of these have been in linux-next this week, and before, with no reported issues, and I've personally been stressing these harder than normal on my systems here with no problems" * tag 'usb-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (39 commits) usb: gadget: f_hid: move list and spinlock inits from bind to alloc usb: host: xhci-sideband: delegate offload_usage tracking to class drivers usb: core: use dedicated spinlock for offload state usb: cdns3: gadget: fix state inconsistency on gadget init failure usb: dwc3: imx8mp: fix memory leak on probe failure path usb: gadget: f_uac1_legacy: validate control request size usb: ulpi: fix double free in ulpi_register_interface() error path usb: misc: usbio: Fix URB memory leak on submit failure USB: core: add NO_LPM quirk for Razer Kiyo Pro webcam usb: cdns3: gadget: fix NULL pointer dereference in ep_queue usb: core: phy: avoid double use of 'usb3-phy' USB: serial: option: add MeiG Smart SRM825WN usb: gadget: f_rndis: Fix net_device lifecycle with device_move usb: gadget: f_subset: Fix net_device lifecycle with device_move usb: gadget: f_eem: Fix net_device lifecycle with device_move usb: gadget: f_ecm: Fix net_device lifecycle with device_move usb: gadget: u_ncm: Add kernel-doc comments for struct f_ncm_opts usb: gadget: f_rndis: Protect RNDIS options with mutex usb: gadget: f_subset: Fix unbalanced refcnt in geth_free dt-bindings: connector: add pd-disable dependency ...	2026-04-05 10:00:26 -07:00
Sunil V L	9156585280	ACPI: RIMT: Add dependency between iommu and devices EPROBE_DEFER ensures IOMMU devices are probed before the devices that depend on them. During shutdown, however, the IOMMU may be removed first, leading to issues. To avoid this, a device link is added which enforces the correct removal order. Fixes: `8f77295525` ("ACPI: RISC-V: Add support for RIMT") Signed-off-by: Sunil V L <sunilvl@oss.qualcomm.com> Link: https://patch.msgid.link/20260303061605.722949-1-sunilvl@oss.qualcomm.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:38:03 -06:00
Charlie Jenkins	511361fe7a	selftests: riscv: Add braces around EXPECT_EQ() EXPECT_EQ() expands to multiple lines, breaking up one-line if statements. This issue was not present in the patch on the mailing list but was instead introduced by the maintainer when attempting to fix up checkpatch warnings. Add braces around EXPECT_EQ() to avoid the error even though checkpatch suggests them to be removed: validate_v_ptrace.c:626:17: error: ‘else’ without a previous ‘if’ Fixes: `3789d5eecd` ("selftests: riscv: verify syscalls discard vector context") Fixes: `30eb191c89` ("selftests: riscv: verify ptrace rejects invalid vector csr inputs") Fixes: `849f05ae1e` ("selftests: riscv: verify ptrace accepts valid vector csr values") Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com> Reviewed-and-tested-by: Sergey Matyukevich <geomatsi@gmail.com> Link: https://patch.msgid.link/20260309-fix_selftests-v2-2-9d5a553a531e@gmail.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:37:57 -06:00
Paul Walmsley	87ad7cc9aa	riscv: use _BITUL macro rather than BIT() in ptrace uapi and kselftests Fix the build of non-kernel code that includes the RISC-V ptrace uapi header, and the RISC-V validate_v_ptrace.c kselftest, by using the _BITUL() macro rather than BIT(). BIT() is not available outside the kernel. Based on patches and comments from Charlie Jenkins, Michael Neuling, and Andreas Schwab. Fixes: `30eb191c89` ("selftests: riscv: verify ptrace rejects invalid vector csr inputs") Fixes: `2af7c9cf02` ("riscv/ptrace: expose riscv CFI status and state via ptrace and in core files") Cc: Andreas Schwab <schwab@suse.de> Cc: Michael Neuling <mikey@neuling.org> Cc: Charlie Jenkins <thecharlesjenkins@gmail.com> Link: https://patch.msgid.link/20260330024248.449292-1-mikey@neuling.org Link: https://lore.kernel.org/linux-riscv/20260309-fix_selftests-v2-1-9d5a553a531e@gmail.com/ Link: https://lore.kernel.org/linux-riscv/20260309-fix_selftests-v2-3-9d5a553a531e@gmail.com/ Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:37:54 -06:00
Zishun Yi	3033b2b1e3	riscv: Reset pmm when PR_TAGGED_ADDR_ENABLE is not set In set_tagged_addr_ctrl(), when PR_TAGGED_ADDR_ENABLE is not set, pmlen is correctly set to 0, but it forgets to reset pmm. This results in the CPU pmm state not corresponding to the software pmlen state. Fix this by resetting pmm along with pmlen. Fixes: `2e17430858` ("riscv: Add support for the tagged address ABI") Signed-off-by: Zishun Yi <vulab@iscas.ac.cn> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://patch.msgid.link/20260322160022.21908-1-vulab@iscas.ac.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:37:45 -06:00
Jisheng Zhang	57f0253bc1	riscv: make runtime const not usable by modules Similar as commit `284922f4c5` ("x86: uaccess: don't use runtime-const rewriting in modules") does, make riscv's runtime const not usable by modules too, to "make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations". The reason is well explained in the above commit: "The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code." Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Link: https://patch.msgid.link/20260221023731.3476-1-jszhang@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:37:31 -06:00
Vivian Wang	6b60a128c2	riscv: patch: Avoid early phys_to_page() Similarly to commit `8d09e2d569` ("arm64: patching: avoid early page_to_phys()"), avoid using phys_to_page() for the kernel address case in patch_map(). Since this is called from apply_boot_alternatives() in setup_arch(), and commit `4267739cab` ("arch, mm: consolidate initialization of SPARSE memory model") has moved sparse_init() to after setup_arch(), phys_to_page() is not available there yet, and it panics on boot with SPARSEMEM on RV32, which does not use SPARSEMEM_VMEMMAP. Reported-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Closes: https://lore.kernel.org/r/20260223144108-dcace0b9-02e8-4b67-a7ce-f263bed36f26@linutronix.de/ Fixes: `4267739cab` ("arch, mm: consolidate initialization of SPARSE memory model") Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260310-riscv-sparsemem-alternatives-fix-v1-1-659d5dd257e2@iscas.ac.cn [pjw@kernel.org: fix the subject line to align with the patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:37:03 -06:00
Paul Walmsley	834911eb8e	riscv: kgdb: fix several debug register assignment bugs Fix several bugs in the RISC-V kgdb implementation: - The element of dbg_reg_def[] that is supposed to pertain to the S1 register embeds instead the struct pt_regs offset of the A1 register. Fix this to use the S1 register offset in struct pt_regs. - The sleeping_thread_to_gdb_regs() function copies the value of the S10 register into the gdb_regs[] array element meant for the S9 register, and copies the value of the S11 register into the array element meant for the S10 register. It also neglects to copy the value of the S11 register. Fix all of these issues. Fixes: `fe89bd2be8` ("riscv: Add KGDB support") Cc: Vincent Chen <vincent.chen@sifive.com> Link: https://patch.msgid.link/fde376f8-bcfd-bfe4-e467-07d8f7608d05@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-04-04 18:36:52 -06:00
Linus Torvalds	3aae9383f4	Merge tag 'input-for-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - new IDs for BETOP BTP-KP50B/C and Razer Wolverine V3 Pro added to xpad controller driver - another quirk for new TUXEDO InfinityBook added to i8042 - a small fixup for Synaptics RMI4 driver to properly unlock mutex when encountering an error in F54 - an update to bcm5974 touch controller driver to reliably switch into wellspring mode * tag 'input-for-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - add support for BETOP BTP-KP50B/C controller's wireless mode Input: xpad - add support for Razer Wolverine V3 Pro Input: synaptics-rmi4 - fix a locking bug in an error path Input: i8042 - add TUXEDO InfinityBook Max 16 Gen10 AMD to i8042 quirk table Input: bcm5974 - recover from failed mode switch	2026-04-04 08:24:32 -07:00
Willy Tarreau	f387e2e2b9	Documentation: fix two typos in latest update to the security report howto In previous patch "Documentation: clarify the mandatory and desirable info for security reports" I left two typos that I didn't detect in local checks. One is "get_maintainers.pl" (no 's' in the script name), and the other one is a missing closing quote after "Reported-by", which didn't have effect here but I don't know if it can break rendering elsewhere (e.g. on the public HTML page). Better fix it before it gets merged. Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260404082033.5160-1-w@1wt.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-04-04 10:38:43 +02:00
Shengyu Qu	0d9363a764	Input: xpad - add support for BETOP BTP-KP50B/C controller's wireless mode BETOP's BTP-KP50B and BTP-KP50C controller's wireless dongles are both working as standard Xbox 360 controllers. Add USB device IDs for them to xpad driver. Signed-off-by: Shengyu Qu <wiagn233@outlook.com> Link: https://patch.msgid.link/TY4PR01MB14432B4B298EA186E5F86C46B9855A@TY4PR01MB14432.jpnprd01.prod.outlook.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2026-04-03 22:37:30 -07:00
Zoltan Illes	e2b0ae529d	Input: xpad - add support for Razer Wolverine V3 Pro Add device IDs for the Razer Wolverine V3 Pro controller in both wired (0x0a57) and wireless 2.4 GHz dongle (0x0a59) modes. The controller uses the Xbox 360 protocol (vendor-specific class, subclass 93, protocol 1) on interface 0 with an identical 20-byte input report layout, so no additional processing is needed. Signed-off-by: Zoltan Illes <zoliviragh@gmail.com> Link: https://patch.msgid.link/20260329220031.1325509-1-137647604+ZlordHUN@users.noreply.github.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2026-04-03 22:37:29 -07:00
Linus Torvalds	7ca6d1cfec	Merge tag 'powerpc-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Madhavan Srinivasan: - fix iommu incorrectly bypassing DMA APIs Thanks to Dan Horak, Gaurav Batra, and Ritesh Harjani (IBM). * tag 'powerpc-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv/iommu: iommu incorrectly bypass DMA APIs	2026-04-03 20:08:25 -07:00
Linus Torvalds	3719114091	Merge tag 's390-7.0-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix a memory leak in the zcrypt driver where the AP message buffer for clear key RSA requests was allocated twice, once by the caller and again locally, causing the first allocation to never be freed - Fix the cpum_sf perf sampling rate overflow adjustment to clamp the recalculated rate to the hardware maximum, preventing exceptions on heavily loaded systems running with HZ=1000 * tag 's390-7.0-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix memory leak with CCA cards used as accelerator s390/cpum_sf: Cap sampling rate to prevent lsctl exception	2026-04-03 17:50:24 -07:00
Linus Torvalds	1523e4d567	Merge tag 'hwmon-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix temperature sensor for PRIME X670E-PRO WIFI - occ: Add missing newline, and fix potential division by zero - pmbus: - Fix device ID comparison and printing in tps53676_identify() - Add missing MODULE_IMPORT_NS("PMBUS") for ltc4286 - Check return value of page-select write in pxe1610 probe - Fix array access with zero-length block tps53679 read * tag 'hwmon-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (asus-ec-sensors) Fix T_Sensor for PRIME X670E-PRO WIFI hwmon: (occ) Fix missing newline in occ_show_extended() hwmon: (occ) Fix division by zero in occ_show_power_1() hwmon: (tps53679) Fix device ID comparison and printing in tps53676_identify() hwmon: (ltc4286) Add missing MODULE_IMPORT_NS("PMBUS") hwmon: (pxe1610) Check return value of page-select write in probe hwmon: (tps53679) Fix array access with zero-length block read	2026-04-03 17:13:59 -07:00
Linus Torvalds	631919fb12	Merge tag 'sched_ext-for-7.0-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: "These are late but both fix subtle yet critical problems and the blast radius is limited strictly to sched_ext. - Fix stale direct dispatch state in ddsp_dsq_id which can cause spurious warnings in mark_direct_dispatch() on task wakeup - Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU configs which can lead to incorrectly dispatching migration- disabled tasks to remote CPUs" * tag 'sched_ext-for-7.0-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix stale direct dispatch state in ddsp_dsq_id sched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU	2026-04-03 12:05:06 -07:00
Linus Torvalds	e41255ce7a	Merge tag 'io_uring-7.0-20260403' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - A previous fix in this release covered the case of the rings being RCU protected during resize, but it missed a few spots. This covers the rest - Fix the cBPF filters when COW'ed, introduced in this merge window - Fix for an attempt to import a zero sized buffer - Fix for a missing clamp in importing bundle buffers * tag 'io_uring-7.0-20260403' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/bpf_filters: retain COW'ed settings on parse failures io_uring: protect remaining lockless ctx->rings accesses with RCU io_uring/rsrc: reject zero-length fixed buffer import io_uring/net: fix slab-out-of-bounds read in io_bundle_nbufs()	2026-04-03 11:58:04 -07:00
Linus Torvalds	c514f73377	Merge tag 'spi-fix-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small collection of fixes, mostly probe/remove issues that are the result of Felix Gu going and auditing those areas, plus one error handling fix for the Cadence QSPI driver" * tag 'spi-fix-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: cadence-qspi: Fix exec_mem_op error handling spi: amlogic: spifc-a4: unregister ECC engine on probe failure and remove() callback spi: stm32-ospi: Fix DMA channel leak on stm32_ospi_dma_setup() failure spi: stm32-ospi: Fix reset control leak on probe error spi: stm32-ospi: Fix resource leak in remove() callback	2026-04-03 10:19:52 -07:00
Andrea Righi	7e0ffb72de	sched_ext: Fix stale direct dispatch state in ddsp_dsq_id @p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) triggering a spurious warning in mark_direct_dispatch() when the next wakeup's ops.select_cpu() calls scx_bpf_dsq_insert(), such as: WARNING: kernel/sched/ext.c:1273 at scx_dsq_insert_commit+0xcd/0x140 The root cause is that ddsp_dsq_id was only cleared in dispatch_enqueue(), which is not reached in all paths that consume or cancel a direct dispatch verdict. Fix it by clearing it at the right places: - direct_dispatch(): cache the direct dispatch state in local variables and clear it before dispatch_enqueue() on the synchronous path. For the deferred path, the direct dispatch state must remain set until process_ddsp_deferred_locals() consumes them. - process_ddsp_deferred_locals(): cache the dispatch state in local variables and clear it before calling dispatch_to_local_dsq(), which may migrate the task to another rq. - do_enqueue_task(): clear the dispatch state on the enqueue path (local/global/bypass fallbacks), where the direct dispatch verdict is ignored. - dequeue_task_scx(): clear the dispatch state after dispatch_dequeue() to handle both the deferred dispatch cancellation and the holding_cpu race, covering all cases where a pending direct dispatch is cancelled. - scx_disable_task(): clear the direct dispatch state when transitioning a task out of the current scheduler. Waking tasks may have had the direct dispatch state set by the outgoing scheduler's ops.select_cpu() and then been queued on a wake_list via ttwu_queue_wakelist(), when SCX_OPS_ALLOW_QUEUED_WAKEUP is set. Such tasks are not on the runqueue and are not iterated by scx_bypass(), so their direct dispatch state won't be cleared. Without this clear, any subsequent SCX scheduler that tries to direct dispatch the task will trigger the WARN_ON_ONCE() in mark_direct_dispatch(). Fixes: `5b26f7b920` ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches") Cc: stable@vger.kernel.org # v6.12+ Cc: Daniel Hodges <hodgesd@meta.com> Cc: Patrick Somaru <patsomaru@meta.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-03 07:14:49 -10:00
Linus Torvalds	1270605fd2	Merge tag 'pm-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a potential NULL pointer dereference in the energy model netlink interface and a potential double free in an error path in the common cpufreq governor management code: - Fix a NULL pointer dereference in the energy model netlink interface that may occur if a given perf domain ID is not recognized (Changwoo Min) - Avoid double free in the cpufreq_dbs_governor_init() error path when kobject_init_and_add() fails (Guangshuo Li)" * tag 'pm-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: governor: fix double free in cpufreq_dbs_governor_init() error path PM: EM: Fix NULL pointer dereference when perf domain ID is not found	2026-04-03 09:56:32 -07:00
Linus Torvalds	576db0f375	Merge tag 'thermal-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Address potential races between thermal zone removal and system resume that may lead to a use-after-free (in two different ways) and a potential use-after-free in the thermal zone unregistration path (Rafael Wysocki)" * tag 'thermal-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Fix thermal zone device registration error path thermal: core: Address thermal zone removal races with resume	2026-04-03 09:49:06 -07:00
Linus Torvalds	116a3308e1	Merge tag 'gpio-fixes-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix kerneldocs for gpio-timberdale and gpio-nomadik - clear the "requested" flag in error path in gpiod_request_commit() - call of_xlate() if provided when setting up shared GPIOs - handle pins shared by child firmware nodes of consumer devices - fix return value check in gpio-qixis-fpga - fix suspend on gpio-mxc - fix gpio-microchip DT bindings * tag 'gpio-fixes-for-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: dt-bindings: gpio: fix microchip #interrupt-cells gpio: shared: shorten the critical section in gpiochip_setup_shared() gpio: mxc: map Both Edge pad wakeup to Rising Edge gpio: qixis-fpga: Fix error handling for devm_regmap_init_mmio() gpio: shared: handle pins shared by child nodes of devices gpio: shared: call gpio_chip::of_xlate() if set gpiolib: clear requested flag if line is invalid gpio: nomadik: repair some kernel-doc comments gpio: timberdale: repair kernel-doc comments gpio: Fix resource leaks on errors in gpiochip_add_data_with_key()	2026-04-03 09:33:38 -07:00
Linus Torvalds	441c63ff42	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: - Implement a basic static call trampoline to fix CFI failures with the generic implementation * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Use static call trampolines when kCFI is enabled	2026-04-03 08:47:13 -07:00
Linus Torvalds	60d9212c69	Merge tag 'drm-fixes-2026-04-03' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Hopefully no Easter eggs in this bunch of fixes. Usual stuff across the amd/intel with some misc bits. Thanks to Thorsten and Alex for making sure a regression fix that was hanging around in process land finally made it in, that is probably the biggest change in here. core: - revert unplug/framebuffer fix as it caused problems - compat ioctl speculation fix bridge: - refcounting fix sysfb: - error handling fix amdgpu: - fix renoir audio regression - UserQ fixes - PASID handling fix - S4 fix for smu11 chips - Misc small fixes amdkfd: - Non-4K page fixes i915: - Fix for #12045: Huawei Matebook E (DRR-WXX): Persistent Black Screen on Boot with i915 and Gen11: Modesetting and Backlight Control Malfunction - Fix for #15826: i915: Raptor Lake-P [UHD Graphics] display flicker/corruption on eDP panel - Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDP xe: - uapi: Accept canonical GPU addresses in xe_vm_madvise_ioctl - Disallow writes to read-only VMAs - PXP fixes - Disable garbage collector work item on SVM close - void memory allocations in xe_device_declare_wedged qaic: - hang fix ast: - initialisation fix" * tag 'drm-fixes-2026-04-03' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations drm/ioc32: stop speculation on the drm_compat_ioctl path drm/sysfb: Fix efidrm error handling and memory type mismatch drm/i915/dp: Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDP drm/i915/cdclk: Do the full CDCLK dance for min_voltage_level changes drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size drm/amdgpu: Fix wait after reset sequence in S4 drm/amd/display: Fix NULL pointer dereference in dcn401_init_hw() drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB drm/amdgpu/userq: fix memory leak in MQD creation error paths drm/amd: Fix MQD and control stack alignment for non-4K drm/amdkfd: Align expected_queue_size to PAGE_SIZE drm/amdgpu: fix the idr allocation flags drm/amdgpu: validate doorbell_offset in user queue creation drm/amdgpu/pm: drop SMU driver if version not matched messages drm/xe: Avoid memory allocations in xe_device_declare_wedged() drm/xe: Disable garbage collector work item on SVM close drm/xe/pxp: Don't allow PXP on older PTL GSC FWs drm/xe/pxp: Clear restart flag in pxp_start after jumping back drm/xe/pxp: Remove incorrect handling of impossible state during suspend ...	2026-04-03 08:23:51 -07:00
Rafael J. Wysocki	744d5721d2	Merge branch 'pm-em' Fix a NULL pointer dereference in the energy model netlink interface that may occur if a given perf domain ID is not recognized (Changwoo Min). * pm-em: PM: EM: Fix NULL pointer dereference when perf domain ID is not found	2026-04-03 14:15:06 +02:00
Willy Tarreau	496fa1befb	Documentation: clarify the mandatory and desirable info for security reports A significant part of the effort of the security team consists in begging reporters for patch proposals, or asking them to provide them in regular format, and most of the time they're willing to provide this, they just didn't know that it would help. So let's add a section detailing the required and desirable contents in a security report to help reporters write more actionable reports which do not require round trips. Cc: Eric Dumazet <edumazet@google.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260403062018.31080-4-w@1wt.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-04-03 13:11:23 +02:00
Willy Tarreau	a72b832a48	Documentation: explain how to find maintainers addresses for security reports These days, 80% of the work done by the security team consists in locating the affected subsystem in a report, running get_maintainers on it, forwarding the report to these persons and responding to the reporter with them in Cc. This is a huge and unneeded overhead that we must try to lower for a better overall efficiency. This patch adds a complete section explaining how to figure the list of recipients to send the report to. Cc: Eric Dumazet <edumazet@google.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260403062018.31080-3-w@1wt.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-04-03 13:11:23 +02:00
Willy Tarreau	f2b1cbef15	Documentation: minor updates to the security contacts This clarifies the fact that the bug reporters must use a valid e-mail address to send their report, and that the security team assists developers working on a fix but doesn't always produce fixes on its own. Cc: Eric Dumazet <edumazet@google.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://patch.msgid.link/20260403062018.31080-2-w@1wt.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-04-03 13:11:23 +02:00
Dave Airlie	75f53c4b2d	Merge tag 'drm-misc-fixes-2026-04-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes A refcounting fix for bridges, revert a previous framebuffer use-after-free fix that turned out to be causing more problems, a hang fix for qaic, an initialization fix for ast, a error handling fix for sysfb, and a speculation fix for drm_compat_ioctl. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260402-vivid-perfect-caiman-ca055e@houat	2026-04-03 19:07:43 +10:00
Dave Airlie	293fa6ebd4	Merge tag 'amd-drm-fixes-7.0-2026-04-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.0-2026-04-02: amdgpu: - Fix audio regression on renoir Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260402194409.914769-1-alexander.deucher@amd.com	2026-04-03 18:43:09 +10:00

1 2 3 4 5 ...

1429747 Commits