linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-10 21:48:45 -04:00

Author	SHA1	Message	Date
Jann Horn	ecb6cc0fd8	eventpoll: fix sphinx documentation build warning Sphinx complains that ep_get_upwards_depth_proc() has a kerneldoc-style comment without documenting its parameters. This is an internal function that was not meant to show up in kernel documentation, so fix the warning by changing the comment to a non-kerneldoc one. Fixes: `22bacca48a` ("epoll: prevent creating circular epoll structures") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20250717173655.10ecdce6@canb.auug.org.au Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507171958.aMcW08Cn-lkp@intel.com/ Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/20250721-epoll-sphinx-fix-v1-1-b695c92bf009@google.com Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-23 15:00:13 +02:00
Christian Brauner	981569a06f	Merge patch series "fs: refactor write_begin/write_end and add ext4 IOCB_DONTCACHE support" 陈涛涛 Taotao Chen <chentaotao@didiglobal.com> says: This patch series refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Tested with ext4 and i915 GEM workloads. * patches from https://lore.kernel.org/20250716093559.217344-1-chentaotao@didiglobal.com: ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create Link: https://lore.kernel.org/20250716093559.217344-1-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:24 +02:00
Taotao Chen	ae21c0c0ac	ext4: support uncached buffered I/O Set FOP_DONTCACHE in ext4_file_operations to declare support for uncached buffered I/O. To handle this flag, update ext4_write_begin() and ext4_da_write_begin() to use write_begin_get_folio(), which encapsulates FGP_DONTCACHE logic based on iocb->ki_flags. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-6-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:18 +02:00
Taotao Chen	b799474b9a	mm/pagemap: add write_begin_get_folio() helper function Add write_begin_get_folio() to simplify the common folio lookup logic used by filesystem ->write_begin() implementations. This helper wraps __filemap_get_folio() with common flags such as FGP_WRITEBEGIN, conditional FGP_DONTCACHE, and set folio order based on the write length. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-5-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:18 +02:00
Taotao Chen	e9d8e2bf23	fs: change write_begin/write_end interface to take struct kiocb * Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:18 +02:00
Taotao Chen	048832a3f4	drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter Refactors shmem_pwrite() to replace the ->write_begin/end logic with a write_iter-based implementation using kiocb and iov_iter. While kernel_write() was considered, it caused about 50% performance regression. vfs_write() is not exported for kernel use. Therefore, file->f_op->write_iter() is called directly with a synchronously initialized kiocb to preserve performance and remove write_begin usage. Performance results use gem_pwrite on Intel CPU i7-10700 (average of 10 runs): - ./gem_pwrite --run-subtest bench -s 16384 Before: 0.205s, After: 0.214s - ./gem_pwrite --run-subtest bench -s 524288 Before: 6.1021s, After: 4.8047s Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-3-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:18 +02:00
Taotao Chen	e7b840fd49	drm/i915: Use kernel_write() in shmem object create Replace the write_begin/write_end loop in i915_gem_object_create_shmem_from_data() with call to kernel_write(). This function initializes shmem-backed GEM objects. kernel_write() simplifies the code by removing manual folio handling. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-2-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:18 +02:00
Jann Horn	f2e467a482	eventpoll: Fix semi-unbounded recursion Ensure that epoll instances can never form a graph deeper than EP_MAX_NESTS+1 links. Currently, ep_loop_check_proc() ensures that the graph is loop-free and does some recursion depth checks, but those recursion depth checks don't limit the depth of the resulting tree for two reasons: - They don't look upwards in the tree. - If there are multiple downwards paths of different lengths, only one of the paths is actually considered for the depth check since commit `28d82dc1c4` ("epoll: limit paths"). Essentially, the current recursion depth check in ep_loop_check_proc() just serves to prevent it from recursing too deeply while checking for loops. A more thorough check is done in reverse_path_check() after the new graph edge has already been created; this checks, among other things, that no paths going upwards from any non-epoll file with a length of more than 5 edges exist. However, this check does not apply to non-epoll files. As a result, it is possible to recurse to a depth of at least roughly 500, tested on v6.15. (I am unsure if deeper recursion is possible; and this may have changed with commit `8c44dac8ad` ("eventpoll: Fix priority inversion problem").) To fix it: 1. In ep_loop_check_proc(), note the subtree depth of each visited node, and use subtree depths for the total depth calculation even when a subtree has already been visited. 2. Add ep_get_upwards_depth_proc() for similarly determining the maximum depth of an upwards walk. 3. In ep_loop_check(), use these values to limit the total path length between epoll nodes to EP_MAX_NESTS edges. Fixes: `22bacca48a` ("epoll: prevent creating circular epoll structures") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/20250711-epoll-recursion-fix-v1-1-fb2457c33292@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-16 14:48:03 +02:00
Jan Kara	3bc4e44108	vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() evict_inodes() uses list_for_each_entry_safe() to iterate sb->s_inodes list. However, since we use i_lru list entry for our local temporary list of inodes to destroy, the inode is guaranteed to stay in sb->s_inodes list while we hold sb->s_inode_list_lock. So there is no real need for safe iteration variant and we can use list_for_each_entry() just fine. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/20250709090635.26319-2-jack@suse.cz Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-10 09:37:32 +02:00
Pankaj Raghav	25050181b6	fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable Since [1], it is possible for filesystems to have blocksize > PAGE_SIZE of the system. Remove the assumption and make the check generic for all blocksizes in generic_check_addressable(). [1] https://lore.kernel.org/linux-xfs/20240822135018.1931258-1-kernel@pankajraghav.com/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/20250630104018.213985-1-p.raghav@samsung.com Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-08 16:48:12 +02:00
Pankaj Raghav	77eb64439a	fs/buffer: remove the min and max limit checks in __getblk_slow() All filesystems will already check the max and min value of their block size during their initialization. __getblk_slow() is a very low-level function to have these checks. Remove them and only check for logical block size alignment. As this check with logical block size alignment might never trigger, add WARN_ON_ONCE() to the check. As WARN_ON_ONCE() will already print the stack, remove the call to dump_stack(). Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/20250626113223.181399-1-p.raghav@samsung.com Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-08 16:48:12 +02:00
Sasha Levin	04a2c4b451	fs: Prevent file descriptor table allocations exceeding INT_MAX When sysctl_nr_open is set to a very high value (for example, 1073741816 as set by systemd), processes attempting to use file descriptors near the limit can trigger massive memory allocation attempts that exceed INT_MAX, resulting in a WARNING in mm/slub.c: WARNING: CPU: 0 PID: 44 at mm/slub.c:5027 __kvmalloc_node_noprof+0x21a/0x288 This happens because kvmalloc_array() and kvmalloc() check if the requested size exceeds INT_MAX and emit a warning when the allocation is not flagged with __GFP_NOWARN. Specifically, when nr_open is set to 1073741816 (0x3ffffff8) and a process calls dup2(oldfd, 1073741880), the kernel attempts to allocate: - File descriptor array: 1073741880 * 8 bytes = 8,589,935,040 bytes - Multiple bitmaps: ~400MB - Total allocation size: > 8GB (exceeding INT_MAX = 2,147,483,647) Reproducer: 1. Set /proc/sys/fs/nr_open to 1073741816: # echo 1073741816 > /proc/sys/fs/nr_open 2. Run a program that uses a high file descriptor: #include <unistd.h> #include <sys/resource.h> int main() { struct rlimit rlim = {1073741824, 1073741824}; setrlimit(RLIMIT_NOFILE, &rlim); dup2(2, 1073741880); // Triggers the warning return 0; } 3. Observe WARNING in dmesg at mm/slub.c:5027 systemd commit a8b627a introduced automatic bumping of fs.nr_open to the maximum possible value. The rationale was that systems with memory control groups (memcg) no longer need separate file descriptor limits since memory is properly accounted. However, this change overlooked that: 1. The kernel's allocation functions still enforce INT_MAX as a maximum size regardless of memcg accounting 2. Programs and tests that legitimately test file descriptor limits can inadvertently trigger massive allocations 3. The resulting allocations (>8GB) are impractical and will always fail systemd's algorithm starts with INT_MAX and keeps halving the value until the kernel accepts it. On most systems, this results in nr_open being set to 1073741816 (0x3ffffff8), which is just under 1GB of file descriptors. While processes rarely use file descriptors near this limit in normal operation, certain selftests (like tools/testing/selftests/core/unshare_test.c) and programs that test file descriptor limits can trigger this issue. Fix this by adding a check in alloc_fdtable() to ensure the requested allocation size does not exceed INT_MAX. This causes the operation to fail with -EMFILE instead of triggering a kernel warning and avoids the impractical >8GB memory allocation request. Fixes: `9cfe015aa4` ("get rid of NR_OPEN and introduce a sysctl_nr_open") Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Link: https://lore.kernel.org/20250629074021.1038845-1-sashal@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-08 16:48:12 +02:00
Matthew Wilcox (Oracle)	b39f7d75dc	fs: Remove three arguments from block_write_end() block_write_end() looks like it can be used as a ->write_end() implementation. However, it can't as it does not unlock nor put the folio. Since it does not use the 'file', 'mapping' nor 'fsdata' arguments, remove them. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/20250624132130.1590285-1-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-24 15:53:40 +02:00
Ankit Chauhan	06a705356d	fs/ecryptfs: replace snprintf with sysfs_emit in show function Use sysfs_emit() instead of snprintf() in version_show() function to follow the preferred kernel API. Signed-off-by: Ankit Chauhan <ankitchauhan2065@gmail.com> Link: https://lore.kernel.org/20250619031536.19352-1-ankitchauhan2065@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-23 14:12:36 +02:00
Dmitry Antipov	2b7c9664c3	fs: annotate suspected data race between poll_schedule_timeout() and pollwake() When running almost any select()/poll() workload intense enough, KCSAN is likely to report data races around using 'triggered' flag of 'struct poll_wqueues'. For example, running 'find /' on a tty console may trigger the following: BUG: KCSAN: data-race in poll_schedule_timeout / pollwake write to 0xffffc900030cfb90 of 4 bytes by task 97 on cpu 5: pollwake+0xd1/0x130 __wake_up_common_lock+0x7f/0xd0 n_tty_receive_buf_common+0x776/0xc30 n_tty_receive_buf2+0x3d/0x60 tty_ldisc_receive_buf+0x6b/0x100 tty_port_default_receive_buf+0x63/0xa0 flush_to_ldisc+0x169/0x3c0 process_scheduled_works+0x6fe/0xf40 worker_thread+0x53b/0x7b0 kthread+0x4f8/0x590 ret_from_fork+0x28c/0x450 ret_from_fork_asm+0x1a/0x30 read to 0xffffc900030cfb90 of 4 bytes by task 5802 on cpu 4: poll_schedule_timeout+0x96/0x160 do_sys_poll+0x966/0xb30 __se_sys_ppoll+0x1c3/0x210 __x64_sys_ppoll+0x71/0x90 x64_sys_call+0x3079/0x32b0 do_syscall_64+0xfa/0x3b0 entry_SYSCALL_64_after_hwframe+0x77/0x7f According to Jan, "there's no practical issue here because it is hard to imagine how the compiler could compile the above code using some intermediate values stored into 'triggered' or multiple fetches from 'triggered'". Nevertheless, silence KCSAN by using WRITE_ONCE() in __pollwake() and READ_ONCE() in poll_schedule_timeout(), respectively. Link: https://lore.kernel.org/linux-fsdevel/bwx72orsztfjx6aoftzzkl7wle3hi4syvusuwc7x36nw6t235e@bjwrosehblty Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://lore.kernel.org/20250620063059.1800689-1-dmantipov@yandex.ru Acked-by: Marco Elver <elver@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-23 12:36:51 +02:00
Junxuan Liao	2773d282cd	docs/vfs: update references to i_mutex to i_rwsem VFS has switched to i_rwsem for ten years now (`9902af79c0`: parallel lookups actual switch to rwsem), but the VFS documentation and comments still has references to i_mutex. Signed-off-by: Junxuan Liao <ljx@cs.wisc.edu> Link: https://lore.kernel.org/72223729-5471-474a-af3c-f366691fba82@cs.wisc.edu Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-23 12:17:33 +02:00
Pankaj Raghav	6ae5812112	fs/buffer: remove comment about hard sectorsize Commit `e1defc4ff0` ("block: Do away with the notion of hardsect_size") changed hardsect_size to logical block size. The comment on top still says hardsect_size. Remove the comment as the code is pretty clear. While we are at it, format the relevant code. Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/20250618075821.111459-1-p.raghav@samsung.com Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-19 12:07:45 +02:00
RubenKelevra	ffaf1bf373	fs_context: fix parameter name in infofc() macro The macro takes a parameter called "p" but references "fc" internally. This happens to compile as long as callers pass a variable named fc, but breaks otherwise. Rename the first parameter to “fc” to match the usage and to be consistent with warnfc() / errorfc(). Fixes: `a3ff937b33` ("prefix-handling analogues of errorf() and friends") Signed-off-by: RubenKelevra <rubenkelevra@gmail.com> Link: https://lore.kernel.org/20250617230927.1790401-1-rubenkelevra@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-18 11:20:19 +02:00
NeilBrown	bc9241367a	VFS: change old_dir and new_dir in struct renamedata to dentrys all users of 'struct renamedata' have the dentry for the old and new directories, and often have no use for the inode except to store it in the renamedata. This patch changes struct renamedata to hold the dentry, rather than the inode, for the old and new directories, and changes callers to match. The names are also changed from a _dir suffix to _parent. This is consistent with other usage in namei.c and elsewhere. This results in the removal of several local variables and several dereferences of ->d_inode at the cost of adding ->d_inode dereferences to vfs_rename(). Acked-by: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/174977089072.608730.4244531834577097454@noble.neil.brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-16 16:30:45 +02:00
Al Viro	b5ba648a7d	proc_fd_getattr(): don't bother with S_ISDIR() check that thing is callable only as ->i_op->getattr() instance and only for directory inodes (/proc//fd and /proc//task/*/fd) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250615003321.GC3011112@ZenIV Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-16 16:21:07 +02:00
Al Viro	88b1de5497	don't duplicate vfs_open() in kernel_file_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250615003216.GB3011112@ZenIV Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-16 16:18:38 +02:00
Jeff Layton	d209f6e122	filelock: add new locks_wake_up_waiter() helper Currently the function that does this takes a struct file_lock, but __locks_wake_up_blocks() deals with both locks and leases. Currently this works because both file_lock and file_lease have the file_lock_core at the beginning of the struct, but it's fragile to rely on that. Add a new locks_wake_up_waiter() function and call that from __locks_wake_up_blocks(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/20250602-filelock-6-16-v1-1-7da5b2c930fd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-10 13:16:19 +02:00
Jens Axboe	dd765ba872	fs/pipe: set FMODE_NOWAIT in create_pipe_files() Rather than have the caller set the FMODE_NOWAIT flags for both output files, move it to create_pipe_files() where other f_mode flags are set anyway with stream_open(). With that, both __do_pipe_flags() and io_pipe() can remove the manual setting of the NOWAIT flags. No intended functional changes, just a code cleanup. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/1f0473f8-69f3-4eb1-aa77-3334c6a71d24@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-10 13:16:19 +02:00
Andy Shevchenko	cd95e366c9	fs/read_write: Fix spelling typo 'implemenation' --> 'implementation'. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/20250530173204.3611576-1-andriy.shevchenko@linux.intel.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-10 13:16:19 +02:00
Linus Torvalds	19272b37aa	Linux 6.16-rc1 v6.16-rc1	2025-06-08 13:44:43 -07:00
Linus Torvalds	939f15e640	Merge tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Add initial DMR support, which required smarter RAPL probe - Fix AMD MSR RAPL energy reporting - Add RAPL power limit configuration output - Minor fixes * tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: version 2025.06.08 tools/power turbostat: Add initial support for BartlettLake tools/power turbostat: Add initial support for DMR tools/power turbostat: Dump RAPL sysfs info tools/power turbostat: Avoid probing the same perf counters tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared tools/power turbostat: Clean up add perf/msr counter logic tools/power turbostat: Introduce add_msr_counter() tools/power turbostat: Remove add_msr_perf_counter_() tools/power turbostat: Remove add_cstate_perf_counter_() tools/power turbostat: Remove add_rapl_perf_counter_() tools/power turbostat: Quit early for unsupported RAPL counters tools/power turbostat: Always check rapl_joules flag tools/power turbostat: Fix AMD package-energy reporting tools/power turbostat: Fix RAPL_GFX_ALL typo tools/power turbostat: Add Android support for MSR device handling tools/power turbostat.8: pm_domain wording fix tools/power turbostat.8: fix typo: idle_pct should be pct_idle	2025-06-08 11:44:41 -07:00
Linus Torvalds	be54f8c558	Merge tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanup from Thomas Gleixner: "The delayed from_timer() API cleanup: The renaming to the timer_() namespace was delayed due massive conflicts against Linux-next. Now that everything is upstream finish the conversion" tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename from_timer() to timer_container_of()	2025-06-08 11:33:00 -07:00
Linus Torvalds	0529ef8c36	Merge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - Cure IO bitmap inconsistencies A failed fork cleans up all resources of the newly created thread via exit_thread(). exit_thread() invokes io_bitmap_exit() which does the IO bitmap cleanups, which unfortunately assume that the cleanup is related to the current task, which is obviously bogus. Make it work correctly - A lockdep fix in the resctrl code removed the clearing of the command buffer in two places, which keeps stale error messages around. Bring them back. - Remove unused trace events" * tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex x86/iopl: Cure TIF_IO_BITMAP inconsistencies x86/fpu: Remove unused trace events	2025-06-08 11:27:20 -07:00
Linus Torvalds	4710eacf8d	Merge tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Add the missing seq_file forward declaration in the timer namespace header" * tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timens: Add struct seq_file forward declaration	2025-06-08 11:25:13 -07:00
Len Brown	42fd37dcc4	tools/power turbostat: version 2025.06.08 Add initial DMR support, which required smarter RAPL probe Fix AMD MSR RAPL energy reporting Add RAPL power limit configuration output Minor fixes Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:17 -04:00
Zhang Rui	d8c0f5d973	tools/power turbostat: Add initial support for BartlettLake Add initial support for BartlettLake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:17 -04:00
Zhang Rui	83075bd59d	tools/power turbostat: Add initial support for DMR Add initial support for DMR. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	2a535d6cc3	tools/power turbostat: Dump RAPL sysfs info for example: intel-rapl:1: psys 28.0s:100W 976.0us:100W intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W intel-rapl:0/intel-rapl:0:0: core disabled intel-rapl:0/intel-rapl:0:1: uncore disabled intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W [lenb: simplified format] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> squish me Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	69078520fd	tools/power turbostat: Avoid probing the same perf counters For the RAPL package energy status counter, Intel and AMD share the same perf_subsys and perf_name, but with different MSR addresses. Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are introduced to describe this counter for different Vendors. As a result, the perf counter is probed twice, and causes a failure in in get_rapl_counters() because expected_read_size and actual_read_size don't match. Fix the problem by skipping the already probed counter. Note, this is not a perfect fix. For example, if different vendors/platforms use the same MSR value for different purpose, the code can be fooled when it probes a rapl_counter_arch_infos[] entry that does not belong to the running Vendor/Platform. In a long run, better to put rapl_counter_arch_infos[] into the platform_features so that this becomes Vendor/Platform specific. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	ff3d019e98	tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared platform_features->rapl_msrs describes the RAPL MSRs supported. While RAPL Perf counters can be exposed from different kernel backend drivers, e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver. Thus, turbostat should first blindly probe all the available RAPL Perf counters, and falls back to the RAPL MSR counters if they are listed in platform_features->rapl_msrs. With this, platforms that don't have RAPL MSRs can clear the platform_features->rapl_msrs bits and use RAPL Perf counters only. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	0362337968	tools/power turbostat: Clean up add perf/msr counter logic Increase the code readability by moving the no_perf/no_msr flag and the cai->perf_name/cai->msr sanity checks into the counter probe functions. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	1ab2e19b4c	tools/power turbostat: Introduce add_msr_counter() probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR counters and MPERF/APERF/SMI MSR counters, thus its name is misleading. Similar to add_perf_counter(), introduce add_msr_counter() to probe a counter via MSR. Introduce wrapper function add_rapl_msr_counter() at the same time to add extra check for Zero return value for specified RAPL counters. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	3403e89f97	tools/power turbostat: Remove add_msr_perf_counter_() As the only caller of add_msr_perf_counter_(), add_msr_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_msr_perf_counter_() and move all the logic to add_msr_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	4d6ced7bef	tools/power turbostat: Remove add_cstate_perf_counter_() As the only caller of add_cstate_perf_counter_(), add_cstate_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_cstate_perf_counter_() and move all the logic to add_cstate_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	c8bca955da	tools/power turbostat: Remove add_rapl_perf_counter_() As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_rapl_perf_counter_() and move all the logic to add_rapl_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	57b53787f0	tools/power turbostat: Quit early for unsupported RAPL counters Quit early for unsupported RAPL counters. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Zhang Rui	fdea6b883b	tools/power turbostat: Always check rapl_joules flag rapl_joules bit should always be checked even if platform_features->rapl_msrs is not set or no_msr flag is used. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Gautham R. Shenoy	adb49732c8	tools/power turbostat: Fix AMD package-energy reporting commit `05a2f07db8` ("tools/power turbostat: read RAPL counters via perf") that adds support to read RAPL counters via perf defines the notion of a RAPL domain_id which is set to physical_core_id on platforms which support per_core_rapl counters (Eg: AMD processors Family 17h onwards) and is set to the physical_package_id on all the other platforms. However, the physical_core_id is only unique within a package and on platforms with multiple packages more than one core can have the same physical_core_id and thus the same domain_id. (For eg, the first cores of each package have the physical_core_id = 0). This results in all these cores with the same physical_core_id using the same entry in the rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the perf-initialization for cores whose domain_ids have already been visited, cores that have the same physical_core_id always read the perf file corresponding to the physical_core_id of the first package and thus the package-energy is incorrectly reported to be the same value for different packages. Note: This issue only arises when RAPL counters are read via perf and not when they are read via MSRs since in the latter case the MSRs are read separately on each core. Fix this issue by associating each CPU with rapl_core_id which is unique across all the packages in the system. Fixes: `05a2f07db8` ("tools/power turbostat: read RAPL counters via perf") Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Kaushlendra Kumar	b4a734d383	tools/power turbostat: Fix RAPL_GFX_ALL typo Fix typo in the currently unused RAPL_GFX_ALL macro definition. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Kaushlendra Kumar	5663785ae0	tools/power turbostat: Add Android support for MSR device handling It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr, updates error messages and permission checks to reflect the Android device path, and wraps platform-specific code with #if defined(ANDROID) to ensure correct behavior on both Android and non-Android systems. These changes improve compatibility and usability of turbostat on Android devices. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Len Brown	c967900fcb	tools/power turbostat.8: pm_domain wording fix turbostat.8: clarify that uncore "domains" are Power Management domains, aka pm_domains. Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:16 -04:00
Len Brown	394c1127ab	tools/power turbostat.8: fix typo: idle_pct should be pct_idle idle_pct should be pct_idle Signed-off-by: Len Brown <len.brown@intel.com>	2025-06-08 14:10:01 -04:00
Linus Torvalds	d9864e7d15	Merge tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Thomas Gleixner: "A single fix for the x86 performance counters on Intel CPUs: The MSR offset calculations for fixed performance counters are stored at the wrong index in the configuration array causing the general purpose counter MSR offset to be overwritten, so both the general purpose and the fixed counters offsets are incorrect. Correct the array index calculation to fix that" * tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()	2025-06-08 11:07:33 -07:00
Linus Torvalds	70b7d651ca	Merge tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for the PCI/MSI code: The conversion to per device MSI domains created a MSI domain with size 1 instead of sizing it to the maximum possible number of MSI interrupts for the device. This "worked" as the subsequent allocations resized the domain, but the recent change to move the prepare() call into the domain creation path broke this works by chance mechanism. Size the domain properly at creation time" * tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Size device MSI domain with the maximum number of vectors	2025-06-08 11:02:53 -07:00
Linus Torvalds	35b574a6c2	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount fixes from Al Viro: "Various mount-related bugfixes: - split the do_move_mount() checks in subtree-of-our-ns and entire-anon cases and adapt detached mount propagation selftest for mount_setattr - allow clone_private_mount() for a path on real rootfs - fix a race in call of has_locked_children() - fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP - make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right userns - avoid false negatives in path_overmount() - don't leak MNT_LOCKED from parent to child in finish_automount() - do_change_type(): refuse to operate on unmounted/not ours mounts" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_change_type(): refuse to operate on unmounted/not ours mounts clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns selftests/mount_setattr: adapt detached mount propagation test do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases fs: allow clone_private_mount() for a path on real rootfs fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) finish_automount(): don't leak MNT_LOCKED from parent to child path_overmount(): avoid false negatives fs/fhandle.c: fix a race in call of has_locked_children()	2025-06-08 10:35:12 -07:00

1 2 3 4 5 ...

1367309 Commits