linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-07 22:08:33 -04:00

Author	SHA1	Message	Date
David Howells	db26d62d79	netfs: Fix undifferentiation of DIO reads from unbuffered reads On cifs, "DIO reads" (specified by O_DIRECT) need to be differentiated from "unbuffered reads" (specified by cache=none in the mount parameters). The difference is flagged in the protocol and the server may behave differently: Windows Server will, for example, mandate that DIO reads are block aligned. Fix this by adding a NETFS_UNBUFFERED_READ to differentiate this from NETFS_DIO_READ, parallelling the write differentiation that already exists. cifs will then do the right thing. Fixes: `016dc8516a` ("netfs: Implement unbuffered/DIO read support") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/3444961.1747987072@warthog.procyon.org.uk Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> cc: Steve French <sfrench@samba.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-23 10:35:03 +02:00
Christian Brauner	5fddfbc0cb	Merge patch series "netfs: Miscellaneous fixes" David Howells <dhowells@redhat.com> says: Here are some miscellaneous fixes and changes for netfslib, if you could pull them: (1) Fix an oops in write-retry due to mis-resetting the I/O iterator. (2) Fix the recording of transferred bytes for short DIO reads. (3) Fix a request's work item to not require a reference, thereby avoiding the need to get rid of it in BH/IRQ context. (4) Fix waiting and waking to be consistent about the waitqueue used. * patches from https://lore.kernel.org/20250519090707.2848510-1-dhowells@redhat.com: netfs: Fix wait/wake to be consistent about the waitqueue used netfs: Fix the request's work item to not require a ref netfs: Fix setting of transferred bytes with short DIO reads netfs: Fix oops in write-retry from mis-resetting the subreq iterator Link: https://lore.kernel.org/20250519090707.2848510-1-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:35:34 +02:00
David Howells	2b1424cd13	netfs: Fix wait/wake to be consistent about the waitqueue used Fix further inconsistencies in the use of waitqueues (clear_and_wake_up_bit() vs private waitqueue). Move some of this stuff from the read and write sides into common code so that it can be done in fewer places. To make this work, async I/O needs to set NETFS_RREQ_OFFLOAD_COLLECTION to indicate that a workqueue will do the collecting and places that call the wait function need to deal with it returning the amount transferred. Fixes: `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519090707.2848510-5-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <stfrench@microsoft.com> cc: Ihor Solodrai <ihor.solodrai@pm.me> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: v9fs@lists.linux.dev cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:35:21 +02:00
David Howells	20d72b00ca	netfs: Fix the request's work item to not require a ref When the netfs_io_request struct's work item is queued, it must be supplied with a ref to the work item struct to prevent it being deallocated whilst on the queue or whilst it is being processed. This is tricky to manage as we have to get a ref before we try and queue it and then we may find it's already queued and is thus already holding a ref - in which case we have to try and get rid of the ref again. The problem comes if we're in BH or IRQ context and need to drop the ref: if netfs_put_request() reduces the count to 0, we have to do the cleanup - but the cleanup may need to wait. Fix this by adding a new work item to the request, ->cleanup_work, and dispatching that when the refcount hits zero. That can then synchronously cancel any outstanding work on the main work item before doing the cleanup. Adding a new work item also deals with another problem upstream where it's sometimes changing the work func in the put function and requeuing it - which has occasionally in the past caused the cleanup to happen incorrectly. As a bonus, this allows us to get rid of the 'was_async' parameter from a bunch of functions. This indicated whether the put function might not be permitted to sleep. Fixes: `3d3c950467` ("netfs: Provide readahead and readpage netfs helpers") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519090707.2848510-4-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <stfrench@microsoft.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:35:20 +02:00
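The deferral described in the commit above can be illustrated with a small user-space model. This is only a single-threaded sketch, not the kernel code: the struct, its fields and both function names are hypothetical, and a real implementation would use an atomic refcount and a workqueue. The point it shows is that the final put does not clean up inline (which may be impossible in BH/IRQ context) but only marks separate cleanup work as pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request; not the kernel's struct netfs_io_request. */
struct demo_request {
	int  ref;             /* reference count (plain int: single-threaded model) */
	bool cleanup_queued;  /* stands in for queueing ->cleanup_work */
	bool cleaned;         /* set once the deferred cleanup has run */
};

/* Drop a reference. On reaching zero, only schedule the cleanup; never
 * perform it here, because the caller might not be allowed to sleep. */
void demo_put_request(struct demo_request *rq)
{
	assert(rq != NULL && rq->ref > 0);
	if (--rq->ref == 0)
		rq->cleanup_queued = true;
}

/* Models the cleanup work item running later in a context that may sleep. */
void demo_run_cleanup_work(struct demo_request *rq)
{
	assert(rq != NULL);
	if (rq->cleanup_queued) {
		rq->cleanup_queued = false;
		rq->cleaned = true;
	}
}
```

In this model a caller can drop the last reference from any context, and the teardown happens only when `demo_run_cleanup_work()` is invoked afterwards.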
Paulo Alcantara	34eb98c659	netfs: Fix setting of transferred bytes with short DIO reads A netfslib request comprises an ordered stream of subrequests that, when doing an unbuffered/DIO read, are contiguous. The subrequests may be performed in parallel, but may not be fully completed. For instance, if we try and make a 256KiB DIO read from a 3-byte file with a 64KiB rsize and 256KiB bsize, netfslib will attempt to make a read of 256KiB, broken up into four 64KiB subreads, with the expectation that the first will be short and the subsequent three be completely devoid - but we do all four on the basis that the file may have been changed by a third party. The read-collection code, however, walks through all the subreqs and advances the notion of how much data has been read in the stream to the start of each subreq plus its amount transferred (which are 3, 0, 0, 0 for the example above) - which gives an amount apparently read of 3*64KiB - which is incorrect. Fix the collection code to cut short the calculation of the transferred amount with the first short subrequest in an unbuffered read; everything beyond that must be ignored as there's a hole that cannot be filled. This applies both to shortness due to hitting the EOF and shortness due to an error. This is achieved by setting a flag on the request when we collect the first short subrequest (collection is done in ascending order). This can be tested by mounting a cifs volume with rsize=65536,bsize=262144 and doing a 256k DIO read of a very small file (e.g. 3 bytes). read() should return 3, not >3. This problem came in when netfs_read_collection() set rreq->transferred to stream->transferred, even for DIO. Prior to that, netfs_rreq_assess_dio() just went over the list and added up the subreqs till it met a short one - but now the subreqs are discarded earlier. Fixes: `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item") Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr> Closes: https://lore.kernel.org/all/10bec2430ed4df68bde10ed95295d093@3xo.fr/ Signed-off-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519090707.2848510-3-dhowells@redhat.com cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:35:20 +02:00
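The arithmetic in the short-DIO-read commit above can be reproduced with a stand-alone sketch. This is not the netfslib collection code: the struct and both function names are hypothetical. It only models "advance to the start of each subrequest plus its amount transferred" against "stop at the first short subrequest".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a read subrequest; only the fields needed here. */
struct demo_read_subreq {
	size_t start;        /* offset of the subrequest within the request */
	size_t len;          /* bytes requested */
	size_t transferred;  /* bytes actually read */
};

/* Behaviour described as incorrect: every subrequest advances the total. */
size_t demo_collect_naive(const struct demo_read_subreq *s, size_t n)
{
	size_t done = 0;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		done = s[i].start + s[i].transferred;
	return done;
}

/* Behaviour described as the fix: ignore everything after the first short
 * subrequest, since the hole behind it cannot be filled. */
size_t demo_collect_until_short(const struct demo_read_subreq *s, size_t n)
{
	size_t done = 0;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		done = s[i].start + s[i].transferred;
		if (s[i].transferred < s[i].len)
			break;
	}
	return done;
}
```

For the example in the commit message (four 64KiB subrequests transferring 3, 0, 0 and 0 bytes), the naive walk ends at 3*64KiB, whereas stopping at the first short subrequest yields 3.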
David Howells	4481f7f2b3	netfs: Fix oops in write-retry from mis-resetting the subreq iterator Fix the resetting of the subrequest iterator in netfs_retry_write_stream() to use the iterator-reset function as the iterator may have been shortened by a previous retry. In such a case, the amount of data to be written by the subrequest is not "subreq->len" but "subreq->len - subreq->transferred". Without this, KASAN may see an error in iov_iter_revert(): BUG: KASAN: slab-out-of-bounds in iov_iter_revert lib/iov_iter.c:633 [inline] BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x443/0x5a0 lib/iov_iter.c:611 Read of size 4 at addr ffff88802912a0b8 by task kworker/u32:7/1147 CPU: 1 UID: 0 PID: 1147 Comm: kworker/u32:7 Not tainted 6.15.0-rc6-syzkaller-00052-g9f35e33144ae #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Workqueue: events_unbound netfs_write_collection_worker Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xc3/0x670 mm/kasan/report.c:521 kasan_report+0xe0/0x110 mm/kasan/report.c:634 iov_iter_revert lib/iov_iter.c:633 [inline] iov_iter_revert+0x443/0x5a0 lib/iov_iter.c:611 netfs_retry_write_stream fs/netfs/write_retry.c:44 [inline] netfs_retry_writes+0x166d/0x1a50 fs/netfs/write_retry.c:231 netfs_collect_write_results fs/netfs/write_collect.c:352 [inline] netfs_write_collection_worker+0x23fd/0x3830 fs/netfs/write_collect.c:374 process_one_work+0x9cf/0x1b70 kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3319 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400 kthread+0x3c2/0x780 kernel/kthread.c:464 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fixes: `cd0277ed0c` ("netfs: Use new folio_queue data type and iterator instead of xarray iter") Reported-by: syzbot+25b83a6f2c702075fcbc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=25b83a6f2c702075fcbc Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519090707.2848510-2-dhowells@redhat.com Tested-by: syzbot+25b83a6f2c702075fcbc@syzkaller.appspotmail.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:35:20 +02:00
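The write-retry fix above turns on one piece of arithmetic: after a partial write, the remaining span is "subreq->len - subreq->transferred", not "subreq->len". The sketch below models that with a toy iterator. All type and function names are hypothetical, and the toy iterator is in no way the kernel's iov_iter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a write subrequest. */
struct demo_write_subreq {
	size_t start;        /* offset of the subrequest in the source buffer */
	size_t len;          /* total bytes the subrequest covers */
	size_t transferred;  /* bytes already written by earlier attempts */
};

/* Toy iterator: a position in the buffer plus the bytes left to send. */
struct demo_iter {
	size_t pos;
	size_t count;
};

/* Reset the iterator for a retry so it covers only the unwritten tail. */
void demo_iter_reset(struct demo_iter *it, const struct demo_write_subreq *sr)
{
	assert(it != NULL && sr != NULL);
	assert(sr->transferred <= sr->len);
	it->pos   = sr->start + sr->transferred;
	it->count = sr->len - sr->transferred;
}
```

With a 4096-byte subrequest of which 1000 bytes were already written, the reset iterator covers the remaining 3096 bytes; assuming the full 4096 would overrun the span the iterator actually describes.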
Christian Brauner	e02cdc0e7f	Merge patch series "netfs: Miscellaneous cleanups" David Howells <dhowells@redhat.com> says: Here are some miscellaneous very minor cleanups for netfslib for the next merge window, primarily from Max Kellermann, if you could pull them. (1) Remove NETFS_SREQ_SEEK_DATA_READ. (2) Remove NETFS_INVALID_WRITE. (3) Remove NETFS_ICTX_WRITETHROUGH. (4) Remove NETFS_READ_HOLE_CLEAR. (5) Reorder structs to eliminate holes. (6) Remove netfs_io_request::ractl. (7) Only provide proc_link field if CONFIG_PROC_FS=y. (8) Remove folio_queue::marks3. (9) Remove NETFS_RREQ_DONT_UNLOCK_FOLIOS. (10) Remove NETFS_RREQ_BLOCKED. * patches from https://lore.kernel.org/20250519134813.2975312-1-dhowells@redhat.com: fs/netfs: remove unused flag NETFS_RREQ_BLOCKED fs/netfs: remove unused flag NETFS_RREQ_DONT_UNLOCK_FOLIOS folio_queue: remove unused field `marks3` fs/netfs: declare field `proc_link` only if CONFIG_PROC_FS=y fs/netfs: remove `netfs_io_request.ractl` fs/netfs: reorder struct fields to eliminate holes fs/netfs: remove unused enum choice NETFS_READ_HOLE_CLEAR fs/netfs: remove unused flag NETFS_ICTX_WRITETHROUGH fs/netfs: remove unused source NETFS_INVALID_WRITE fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READ Link: https://lore.kernel.org/20250519134813.2975312-1-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:43 +02:00
Max Kellermann	4b1ca12dd3	fs/netfs: remove unused flag NETFS_RREQ_BLOCKED NETFS_RREQ_BLOCKED was added by commit `016dc8516a` ("netfs: Implement unbuffered/DIO read support") but has never been used either. Without NETFS_RREQ_BLOCKED, NETFS_RREQ_NONBLOCK makes no sense, and thus can be removed as well. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-12-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:38 +02:00
Max Kellermann	67b916719a	fs/netfs: remove unused flag NETFS_RREQ_DONT_UNLOCK_FOLIOS NETFS_RREQ_DONT_UNLOCK_FOLIOS has never been used ever since it was added by commit `3d3c950467` ("netfs: Provide readahead and readpage netfs helpers"). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-11-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:38 +02:00
Max Kellermann	6bb09e5db3	folio_queue: remove unused field `marks3` The last user was removed by commit `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item"). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-10-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:38 +02:00
Max Kellermann	07c08bac93	fs/netfs: declare field `proc_link` only if CONFIG_PROC_FS=y This field is only used for the "proc" filesystem. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-9-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:38 +02:00
Max Kellermann	3dc00bca8d	fs/netfs: remove `netfs_io_request.ractl` Since this field is only used by netfs_prepare_read_iterator() when called by netfs_readahead(), we can simply pass it as parameter. This shrinks the struct from 576 to 568 bytes. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-8-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:38 +02:00
Max Kellermann	314ee7035f	fs/netfs: reorder struct fields to eliminate holes This shrinks `struct netfs_io_stream` from 104 to 96 bytes and `struct netfs_io_request` from 600 to 576 bytes. [DH: Modified as the patch to turn netfs_io_request::error into a short was removed from the set] Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-7-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:37 +02:00
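The effect of reordering fields to eliminate holes can be seen with a generic example. The two structs below are purely illustrative and unrelated to the actual netfs structures; the sizes in the comments assume a typical LP64 ABI such as x86-64 Linux.

```c
#include <assert.h>
#include <stddef.h>

/* A small member on either side of a long leaves padding around it. */
struct demo_with_holes {
	char a;   /* 1 byte, then padding up to the alignment of long */
	long b;
	char c;   /* 1 byte, then tail padding */
};            /* typically 24 bytes on LP64 */

/* Same members, with the small ones grouped after the large one. */
struct demo_reordered {
	long b;
	char a;
	char c;   /* a and c share one padded slot */
};            /* typically 16 bytes on LP64 */

_Static_assert(sizeof(struct demo_reordered) <= sizeof(struct demo_with_holes),
	       "grouping small members should not enlarge the struct");

/* Bytes saved by the reordering on the current ABI. */
size_t demo_bytes_saved(void)
{
	assert(offsetof(struct demo_reordered, b) == 0);
	return sizeof(struct demo_with_holes) - sizeof(struct demo_reordered);
}
```

On kernel builds, a tool such as pahole is commonly used to find such holes; the sketch only shows why moving fields changes `sizeof`.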
Max Kellermann	d46a7b217d	fs/netfs: remove unused enum choice NETFS_READ_HOLE_CLEAR This choice was added by commit `3a11b3a863` ("netfs: Pass more information on how to deal with a hole in the cache") but the last user was removed by commit `86b374d061` ("netfs: Remove fs/netfs/io.c"). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-6-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:37 +02:00
Max Kellermann	9fcf235e91	fs/netfs: remove unused flag NETFS_ICTX_WRITETHROUGH This flag was added by commit `41d8e7673a` ("netfs: Implement a write-through caching option") but it was never used. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-5-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:37 +02:00
Max Kellermann	9cd78ca04f	fs/netfs: remove unused source NETFS_INVALID_WRITE This enum choice was added by commit `16af134ca4` ("netfs: Extend the netfs_io_*request structs to handle writes") and its only user was later removed by commit `c245868524` ("netfs: Remove the old writeback code"). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-4-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:37 +02:00
Max Kellermann	c1a606cd75	fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READ This flag was added by commit `3d3c950467` ("netfs: Provide readahead and readpage netfs helpers") but its only user was removed by commit `86b374d061` ("netfs: Remove fs/netfs/io.c"). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519134813.2975312-3-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:34:37 +02:00
Christian Brauner	a1b4a25abb	Merge netfs API documentation updates Bring in the netfs API documentation updates which had been in the vfs-6.16.misc branch for most of this cycle. So don't needlessly rewrite the vfs-6.16.misc by dropping it from that branch and moving it to vfs-6.16.netfs. Simply merge vfs-6.16.misc into vfs-6.16.netfs. Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:30:48 +02:00
David Howells	f1745496d3	netfs: Update main API document Bring the netfs documentation up to date. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/1690127.1744208325@warthog.procyon.org.uk Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Viacheslav Dubeyko <slava@dubeyko.com> cc: Alex Markuze <amarkuze@redhat.com> cc: Timothy Day <timday@amazon.com> cc: Jonathan Corbet <corbet@lwn.net> cc: netfs@lists.linux.dev cc: linux-doc@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-11 15:23:50 +02:00
Mateusz Guzik	e45960c279	fs: unconditionally use atime_needs_update() in pick_link() Vast majority of the time the func returns false. This avoids a branch to determine whether we are in RCU mode. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250408073641.1799151-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 11:08:24 +02:00
Christian Brauner	c9b380a017	Merge patch series "fs: sort out cosmetic differences between stat funcs and add predicts" Predict fastpaths in stat and during fdput(). * patches from https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com: fs: predict not having to do anything in fdput() fs: sort out cosmetic differences between stat funcs and add predicts Link: https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 10:28:10 +02:00
Mateusz Guzik	5f3e0b4a1f	fs: predict not having to do anything in fdput() This matches the annotation in fdget(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250406235806.1637000-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 10:28:07 +02:00
Mateusz Guzik	eaec2cd167	fs: sort out cosmetic differences between stat funcs and add predicts This is a nop, but I did verify asm improves. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 10:28:07 +02:00
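The "predicts" mentioned in the two commits above are branch-prediction annotations. A minimal sketch of the idea, assuming GCC or Clang (`__builtin_expect` is a compiler builtin): the `demo_` macros mirror the usual likely()/unlikely() pattern, and the struct and function are hypothetical, not the kernel's fdput().

```c
#include <assert.h>
#include <stddef.h>

/* Hint to the compiler which way a condition usually goes. */
#define demo_likely(x)   __builtin_expect(!!(x), 1)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical file-descriptor handle for illustration only. */
struct demo_fd {
	int needs_put;  /* non-zero if a reference must be dropped */
	int puts;       /* counts how often the slow path ran */
};

/* The common case is that nothing needs to be done, so the slow path
 * is annotated as unlikely and can be laid out off the hot path. */
void demo_fdput(struct demo_fd *f)
{
	assert(f != NULL);
	if (demo_unlikely(f->needs_put))
		f->puts++;
}
```

The annotation does not change what the function computes; it only influences code layout, which matches the commit's remark that the change is a no-op apart from the generated assembly.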
Christian Brauner	9d36c5145a	Merge patch series "fs: harden anon inodes" Christian Brauner <brauner@kernel.org> says: * Anonymous inodes currently don't come with a proper mode causing issues in the kernel when we want to add useful VFS debug assert. Fix that by giving them a proper mode and masking it off when we report it to userspace which relies on them not having any mode. * Anonymous inodes currently allow to change inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock. * Port pidfs to the new anon_inode_{g,s}etattr() helpers. * Add proper tests for anonymous inode behavior. The anonymous inode specific fixes should ideally be backported to all LTS kernels. * patches from https://lore.kernel.org/20250407-work-anon_inode-v1-0-53a44c20d44e@kernel.org: selftests/filesystems: add fourth test for anonymous inodes selftests/filesystems: add third test for anonymous inodes selftests/filesystems: add second test for anonymous inodes selftests/filesystems: add first test for anonymous inodes anon_inode: raise SB_I_NODEV and SB_I_NOEXEC pidfs: use anon_inode_setattr() anon_inode: explicitly block ->setattr() pidfs: use anon_inode_getattr() anon_inode: use a proper mode internally Link: https://lore.kernel.org/20250407-work-anon_inode-v1-0-53a44c20d44e@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:15 +02:00
Christian Brauner	25a6cc9a63	selftests/filesystems: add open() test for anonymous inodes Test that anonymous inodes cannot be open()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-9-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:15 +02:00
Christian Brauner	f8ca403ae7	selftests/filesystems: add exec() test for anonymous inodes Test that anonymous inodes cannot be exec()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-8-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:14 +02:00
Christian Brauner	fcf31ec7ca	selftests/filesystems: add chmod() test for anonymous inodes Test that anonymous inodes cannot be chmod()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-7-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:14 +02:00
Christian Brauner	c784159750	selftests/filesystems: add chown() test for anonymous inodes Test that anonymous inodes cannot be chown()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-6-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:14 +02:00
Christian Brauner	1ed95281c0	anon_inode: raise SB_I_NODEV and SB_I_NOEXEC It isn't possible to execute anonymous inodes because they cannot be opened in any way after they have been created. This includes execution: execveat(fd_anon_inode, "", NULL, NULL, AT_EMPTY_PATH) Anonymous inodes have inode->f_op set to no_open_fops which sets no_open() which returns ENXIO. That means any call to do_dentry_open() which is the endpoint of the do_open_execat() will fail. There's no chance to execute an anonymous inode. Unless a given subsystem overrides it ofc. However, we should still harden this and raise SB_I_NODEV and SB_I_NOEXEC on the superblock itself so that no one gets any creative ideas. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-5-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # all LTS kernels Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:19:04 +02:00
Christian Brauner	c83b902496	pidfs: use anon_inode_setattr() So far pidfs did use its own version. Just use the generic version. We use our own wrappers because we're going to be implementing properties soon. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-4-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:19:02 +02:00
Christian Brauner	22bdf3d658	anon_inode: explicitly block ->setattr() It is currently possible to change the mode and owner of the single anonymous inode in the kernel: int main(int argc, char *argv[]) { int ret, sfd; sigset_t mask; struct signalfd_siginfo fdsi; sigemptyset(&mask); sigaddset(&mask, SIGINT); sigaddset(&mask, SIGQUIT); ret = sigprocmask(SIG_BLOCK, &mask, NULL); if (ret < 0) _exit(1); sfd = signalfd(-1, &mask, 0); if (sfd < 0) _exit(2); ret = fchown(sfd, 5555, 5555); if (ret < 0) _exit(3); ret = fchmod(sfd, 0777); if (ret < 0) _exit(3); _exit(4); } This is a bug. It's not really a meaningful one because anonymous inodes don't really figure into path lookup and they cannot be reopened via /proc/<pid>/fd/<nr> and can't be used for lookup itself. So they can only ever serve as direct references. But it is still completely bogus to allow the mode and ownership or any of the properties of the anonymous inode to be changed. Block this! Link: https://lore.kernel.org/20250407-work-anon_inode-v1-3-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # all LTS kernels Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:18:59 +02:00
Christian Brauner	37e62dafbf	pidfs: use anon_inode_getattr() So far pidfs did use its own version. Just use the generic version. We use our own wrappers because we're going to be implementing our own retrieval properties soon. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-2-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:18:56 +02:00
Christian Brauner	cfd86ef7e8	anon_inode: use a proper mode internally This allows the VFS to not trip over anonymous inodes and we can add asserts based on the mode into the vfs. When we report it to userspace we can simply hide the mode to avoid regressions. I've audited all direct callers of alloc_anon_inode() and only secretmem overrides i_mode and i_op inode operations but it already uses a regular file. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-1-53a44c20d44e@kernel.org Fixes: `af153bb63a` ("vfs: catch invalid modes in may_open()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # all LTS kernels Reported-by: syzbot+5d8e79d323a13aa0b248@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67ed3fb3.050a0220.14623d.0009.GAE@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:18:46 +02:00
David Disseldorp	418556fa57	docs: initramfs: update compression and mtime descriptions Update the document to reflect that initramfs didn't replace initrd following kernel 2.5.x. The initramfs buffer format now supports many compression types in addition to gzip, so include them in the grammar section. c_mtime use is dependent on CONFIG_INITRAMFS_PRESERVE_MTIME. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250402033949.852-2-ddiss@suse.de Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:38:01 +02:00
Linus Torvalds	0af2f6be1b	Linux 6.15-rc1 v6.15-rc1	2025-04-06 13:11:33 -07:00
Thomas Weißschuh	0efdedb335	tools/include: make uapi/linux/types.h usable from assembly The "real" linux/types.h UAPI header gracefully degrades to a NOOP when included from assembly code. Mirror this behaviour in the tools/ variant. Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the toolchain automatically. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/ Fixes: `c9fbaa8795` ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-06 12:55:31 -07:00
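The "degrades to a NOOP when included from assembly" behaviour described above relies on the `__ASSEMBLER__` macro, which GCC and Clang predefine when preprocessing assembly sources. A minimal sketch of the pattern follows; the `demo_`/`DEMO_` names are hypothetical and the body is not the contents of the real linux/types.h.

```c
#include <assert.h>

/* Header-style section: C-only declarations are hidden from assembly. */
#ifndef DEMO_TYPES_H
#define DEMO_TYPES_H

#ifndef __ASSEMBLER__
/* typedefs are not valid assembler input, so only C sees them. */
typedef unsigned int demo_u32;
#endif /* __ASSEMBLER__ */

/* Plain numeric macros remain usable from both C and assembly. */
#define DEMO_FLAG 0x1

#endif /* DEMO_TYPES_H */

/* C consumer of the header-style section above. */
demo_u32 demo_set_flag(demo_u32 v)
{
	return v | DEMO_FLAG;
}

/* Small self-check showing the typedef and the macro working together. */
void demo_check(void)
{
	assert(demo_set_flag(0) == DEMO_FLAG);
}
```

If the header part were included from a `.S` file, the preprocessor would drop the typedef and leave only `DEMO_FLAG`, so the assembler would never see C syntax.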
Linus Torvalds	710329254d	Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - support up to 8192 processors - add cpuidle governor debug telemetry, disabled by default - update default output to exclude cpuidle invocation counts - bug fixes * tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: v2025.05.06 tools/power turbostat: disable "cpuidle" invocation counters, by default tools/power turbostat: re-factor sysfs code tools/power turbostat: Restore GFX sysfs fflush() call tools/power turbostat: Document GNR UncMHz domain convention tools/power turbostat: report CoreThr per measurement interval tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192 tools/power turbostat: Add idle governor statistics reporting tools/power turbostat: Fix names matching tools/power turbostat: Allow Zero return value for some RAPL registers tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options	2025-04-06 12:32:43 -07:00
Linus Torvalds	59f392fa7c	Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fix from Vinod Koul: - add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT * tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE	2025-04-06 12:04:53 -07:00
Len Brown	03e00e373c	tools/power turbostat: v2025.05.06 Support up to 8192 processors Add cpuidle governor debug telemetry, disabled by default Update default output to exclude cpuidle invocation counts Bug fixes Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 14:49:20 -04:00
Len Brown	ec4acd3166	tools/power turbostat: disable "cpuidle" invocation counters, by default Create "pct_idle" counter group, the software notion of residency so it can now be singled out, independent of other counter groups. Create "cpuidle" group, the cpuidle invocation counts. Disable "cpuidle", by default. Create "swidle" = "cpuidle" + "pct_idle". Undocument "sysfs", the old name for "swidle", but keep it working for backwards compatibility. Create "hwidle", all the HW idle counters. Modify "idle", enabled by default: "idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle"). Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 14:29:57 -04:00
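The group relationships listed in the commit above can be written down as bitmask unions. This is only a sketch of the naming scheme, not turbostat's implementation: each group is collapsed to a single hypothetical bit (in turbostat a group covers many counters), and the lookup function is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One illustrative bit per basic group. */
enum {
	DEMO_PCT_IDLE = 1u << 0,  /* software notion of residency */
	DEMO_CPUIDLE  = 1u << 1,  /* cpuidle invocation counts */
	DEMO_HWIDLE   = 1u << 2,  /* hardware idle counters */
};

/* Map a group name to the set of basic groups it selects; 0 if unknown. */
unsigned int demo_group_mask(const char *name)
{
	assert(name != NULL);
	if (!strcmp(name, "pct_idle"))
		return DEMO_PCT_IDLE;
	if (!strcmp(name, "cpuidle"))
		return DEMO_CPUIDLE;
	if (!strcmp(name, "hwidle"))
		return DEMO_HWIDLE;
	/* "sysfs" is the old, now undocumented name for "swidle". */
	if (!strcmp(name, "swidle") || !strcmp(name, "sysfs"))
		return DEMO_CPUIDLE | DEMO_PCT_IDLE;
	/* "idle" no longer includes the cpuidle invocation counts. */
	if (!strcmp(name, "idle"))
		return DEMO_HWIDLE | DEMO_PCT_IDLE;
	return 0;
}
```

Read this way, selecting "idle" yields the hardware counters plus residency, while the invocation counts appear only when "cpuidle" or "swidle" is requested explicitly.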
Linus Torvalds	dda8887894	Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix a perf events time accounting bug" * tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix child_total_time_enabled accounting bug at task exit	2025-04-06 10:48:12 -07:00
Linus Torvalds	302deb109d	Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a nonsensical Kconfig combination - Remove an unnecessary rseq-notification * tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Eliminate useless task_work on execve sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP	2025-04-06 10:44:58 -07:00
Linus Torvalds	6f110a5e4f	Disable SLUB_TINY for build testing ... and don't error out so hard on missing module descriptions. Before commit `6c6c1fc09d` ("modpost: require a MODULE_DESCRIPTION()") we used to warn about missing module descriptions, but only when building with extra warnings (ie 'W=1'). After that commit the warning became an unconditional hard error. And it turns out not all modules have been converted despite the claims to the contrary. As reported by Damian Tometzki, the slub KUnit test didn't have a module description, and apparently nobody ever really noticed. The reason nobody noticed seems to be that the slub KUnit tests get disabled by SLUB_TINY, which also ends up disabling a lot of other code, both in tests and in slub itself. And so anybody doing full build tests didn't actually see this failure. So let's disable SLUB_TINY for build-only tests, since it clearly ends up limiting build coverage. Also turn the missing module descriptions error back into a warning, but let's keep it around for non-'W=1' builds. Reported-by: Damian Tometzki <damian@riscv-rocks.de> Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/ Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Fixes: `6c6c1fc09d` ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-06 10:00:04 -07:00
Len Brown	994633894f	tools/power turbostat: re-factor sysfs code Probe cpuidle "sysfs" residency and counts separately, since soon we will make one disabled on, and the other disabled off. Clarify that some BIC (built-in counters) are actually "groups", since we're about to re-name some of those groups. No functional change. Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 12:53:18 -04:00
Zhang Rui	f8b136ef26	tools/power turbostat: Restore GFX sysfs fflush() call Do fflush() to discard the buffered data, before each read of the graphics sysfs knobs. Fixes: `ba99a4fc8c` ("tools/power turbostat: Remove unnecessary fflush() call") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 12:36:03 -04:00
Len Brown	3ae8508663	tools/power turbostat: Document GNR UncMHz domain convention Document that on Intel Granite Rapids Systems, Uncore domains 0-2 are CPU domains, and uncore domains 3-4 are IO domains. Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 12:31:59 -04:00
Len Brown	f729775f79	tools/power turbostat: report CoreThr per measurement interval The CoreThr column displays total thermal throttling events since boot time. Change it to report events during the measurement interval. This is more useful for showing a user the current conditions. Total events since boot time are still available to the user via /sys/devices/system/cpu/cpu/thermal_throttle/ Document CoreThr on turbostat.8 Fixes: `eae97e053f` ("turbostat: Support thermal throttle count print") Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Chen Yu <yu.c.chen@intel.com>	2025-04-06 12:21:25 -04:00
Justin Ernst	eb187540d1	tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192 On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output: "turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151" A similar error appears with the use of turbostat --cpu when the inputted cpu range contains a cpu number >= 1024: # turbostat -c 1100-1151 "--cpu 1100-1151" malformed ... Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS. It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low. For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS brings support for parsing cpu numbers >= 1024. Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64. Signed-off-by: Justin Ernst <justin.ernst@hpe.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 12:14:14 -04:00
Linus Torvalds	16cd1c2657	Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are caught. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()	2025-04-06 08:35:37 -07:00
Linus Torvalds	ff0c66685d	Merge tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more irq updates from Thomas Gleixner: "A set of updates for the interrupt subsystem: - A treewide cleanup for the irq_domain code, which makes the naming consistent and gets rid of the original oddity of naming domains 'host'. This is a trivial mechanical change and is done late to ensure that all instances have been caught and new code merged post rc1 won't reintroduce new instances. - A trivial consistency fix in the migration code The recent introduction of irq_force_complete_move() in the core code, causes a problem for the nostalgia crowd who maintains ia64 out of tree. The code assumes that hierarchical interrupt domains are enabled and dereferences irq_data::parent_data unconditionally. That works in mainline because both architectures which enable that code have hierarchical domains enabled. Though it breaks the ia64 build, which enables the functionality, but does not have hierarchical domains. While it's not really a problem for mainline today, this unconditional dereference is inconsistent and trivially fixable by using the existing helper function irqd_get_parent_data(), which has the appropriate #ifdeffery in place" * tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move() irqdomain: Stop using 'host' for domain irqdomain: Rename irq_get_default_host() to irq_get_default_domain() irqdomain: Rename irq_set_default_host() to irq_set_default_domain()	2025-04-06 08:17:43 -07:00

1351140 Commits