linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 06:15:06 -04:00

Author	SHA1	Message	Date
Peter Zijlstra	b33a51564e	perf: Use guard() for aux_mutex in perf_mmap() After duplicating the common code into the rb/aux branches is it possible to use a simple guard() for the aux_mutex. Making the aux branch self-contained. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.246250452@infradead.org	2025-08-15 13:13:00 +02:00
Peter Zijlstra	41b80e1d74	perf: Remove redundant aux_unlock label unlock and aux_unlock are now identical, remove the aux_unlock one. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.131293512@infradead.org	2025-08-15 13:13:00 +02:00
Peter Zijlstra	4118994b33	perf: Move common code into both rb and aux branches if (cond) { A; } else { B; } C; into if (cond) { A; C; } else { B; C; } Notably C has a success branch and both A and B have two places for success. For A (rb case), duplicate the success case because later patches will result in them no longer being identical. For B (aux case), share using goto (cleaned up later). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.016252852@infradead.org	2025-08-15 13:12:59 +02:00
Peter Zijlstra	3821f25868	perf: Merge consecutive conditionals in perf_mmap() if (cond) { A; } else { B; } if (cond) { C; } else { D; } into: if (cond) { A; C; } else { B; D; } Notably the conditions are not identical in form, but are equivalent. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.900078502@infradead.org	2025-08-15 13:12:59 +02:00
Peter Zijlstra	86a0a7c598	perf: Move perf_mmap_calc_limits() into both rb and aux branches if (cond) { A; } else { B; } C; into if (cond) { A; C; } else { B; C; } Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.781244099@infradead.org	2025-08-15 13:12:59 +02:00
Thomas Gleixner	1ea3e3b0da	perf: Split out VM accounting Similarly to the mlock limit calculation the VM accounting is required for both the ringbuffer and the AUX buffer allocations. To prepare for splitting them out into separate functions, move the accounting into a helper function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.660347811@infradead.org	2025-08-15 13:12:59 +02:00
Thomas Gleixner	81e026ca47	perf: Split out mlock limit handling To prepare for splitting the buffer allocation out into separate functions for the ring buffer and the AUX buffer, split out mlock limit handling into a helper function, which can be called from both. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.541975109@infradead.org	2025-08-15 13:12:58 +02:00
Thomas Gleixner	e8c4f6ee8e	perf: Remove redundant condition for AUX buffer size It is already checked whether the VMA size is the same as nr_pages * PAGE_SIZE, so later checking both: aux_size == vma_size && aux_size == nr_pages * PAGE_SIZE is redundant. Remove the vma_size check as nr_pages is what is actually used in the allocation function. That prepares for splitting out the buffer allocation into separate functions, so that only nr_pages needs to be handed in. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.424519320@infradead.org	2025-08-15 13:12:58 +02:00
Yunseong Kim	b64fdd422a	perf: Avoid undefined behavior from stopping/starting inactive events Calling pmu->start()/stop() on perf events in PERF_EVENT_STATE_OFF can leave event->hw.idx at -1. When PMU drivers later attempt to use this negative index as a shift exponent in bitwise operations, it leads to UBSAN shift-out-of-bounds reports. The issue is a logical flaw in how event groups handle throttling when some members are intentionally disabled. Based on the analysis and the reproducer provided by Mark Rutland (this issue on both arm64 and x86-64). The scenario unfolds as follows: 1. A group leader event is configured with a very aggressive sampling period (e.g., sample_period = 1). This causes frequent interrupts and triggers the throttling mechanism. 2. A child event in the same group is created in a disabled state (.disabled = 1). This event remains in PERF_EVENT_STATE_OFF. Since it hasn't been scheduled onto the PMU, its event->hw.idx remains initialized at -1. 3. When throttling occurs, perf_event_throttle_group() and later perf_event_unthrottle_group() iterate through all siblings, including the disabled child event. 4. perf_event_throttle()/unthrottle() are called on this inactive child event, which then call event->pmu->start()/stop(). 5. The PMU driver receives the event with hw.idx == -1 and attempts to use it as a shift exponent. e.g., in macros like PMCNTENSET(idx), leading to the UBSAN report. The throttling mechanism attempts to start/stop events that are not actively scheduled on the hardware. Move the state check into perf_event_throttle()/perf_event_unthrottle() so that inactive events are skipped entirely. This ensures only active events with a valid hw.idx are processed, preventing undefined behavior and silencing UBSAN warnings. The corrected check ensures true before proceeding with PMU operations. The problem can be reproduced with the syzkaller reproducer: Fixes: `9734e25fbf` ("perf: Fix the throttle logic for a group") Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250812181046.292382-2-ysk@kzalloc.com	2025-08-15 13:12:56 +02:00
Jiri Olsa	e4414b01c1	bpf: Check the helper function is valid in get_helper_proto kernel test robot reported verifier bug [1] where the helper func pointer could be NULL due to disabled config option. As Alexei suggested we could check on that in get_helper_proto directly. Marking tail_call helper func with BPF_PTR_POISON, because it is unused by design. [1] https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: syzbot+a9ed3d9132939852d0df@syzkaller.appspotmail.com Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250814200655.945632-1-jolsa@kernel.org Closes: https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com	2025-08-15 11:16:56 +02:00
Jesper Dangaard Brouer	2b986b9e91	bpf, cpumap: Disable page_pool direct xdp_return need larger scope When running an XDP bpf_prog on the remote CPU in cpumap code then we must disable the direct return optimization that xdp_return can perform for mem_type page_pool. This optimization assumes code is still executing under RX-NAPI of the original receiving CPU, which isn't true on this remote CPU. The cpumap code already disabled this via helpers xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(), but the scope didn't include xdp_do_flush(). When doing XDP_REDIRECT towards e.g devmap this causes the function bq_xmit_all() to run with direct return optimization enabled. This can lead to hard to find bugs. The issue only happens when bq_xmit_all() cannot ndo_xdp_xmit all frames and them frees them via xdp_return_frame_rx_napi(). Fix by expanding scope to include xdp_do_flush(). This was found by Dragos Tatulea. Fixes: `11941f8a85` ("bpf: cpumap: Implement generic cpumap") Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Reported-by: Chris Arges <carges@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Chris Arges <carges@cloudflare.com> Link: https://patch.msgid.link/175519587755.3008742.1088294435150406835.stgit@firesoul	2025-08-15 11:08:08 +02:00
Paul E. McKenney	51c285baa3	rcutorture: Delay forward-progress testing until boot completes Forward-progress testing can hog CPUs, which is not a great thing to do before boot has completed. This commit therefore makes the CPU-hotplug operations hold off until boot has completed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Paul E. McKenney	6e9c48b3e3	torture: Delay CPU-hotplug operations until boot completes CPU-hotplug operations invoke stop-machine, which can hog CPUs, which is not a great thing to do before boot has completed. This commit therefore makes the CPU-hotplug operations hold off until boot has completed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Paul E. McKenney	9a316fe3ad	rcutorture: Delay rcutorture readers and writers until boot completes The rcutorture writers and (especially) readers are the biggest CPU hogs of the bunch, so this commit therefore makes them wait until boot has completed. This makes the current setting of the boot_ended local variable dead code, so while in the area, this commit removes that as well. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Paul E. McKenney	1b0f583843	rcutorture: Suppress "Writer stall state" reports during boot When rcutorture is running on only the one boot-time CPU while that CPU is busy invoking initcall() functions, the added load is quite likely to unduly delay the RCU grace-period kthread, rcutorture readers, and much else besides. This can result in rcu_torture_stats_print() reporting rcutorture writer stalls, which are not really a bug in that environment. After all, one CPU can only do so much. This commit therefore suppresses rcutorture writer stalls while the kernel is booting, that is, while rcu_inkernel_boot_has_ended() continues returning false. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Paul E. McKenney	b930ff84f3	torture: Announce kernel boot status at torture-test startup Sometimes a given system takes surprisingly long to boot, for example, in one recent case, 70 seconds instead of three seconds. It would be good to fix these slow-boot issues, but it would also be good for the torture tests to announce that the system was still booting at the start of the test. Especially for tests that have a greater probability of false positives when run in the single-CPU boot-time environment. Yes, those tests should defend themselves, but we should also make this situation easier to diagnose. This commit therefore causes torture_print_module_parms() to print "still booting" at the end of its printk() that dumps out the values of its module parameters. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Zqiang	42d590d100	rcu: Remove local_irq_save/restore() in rcu_preempt_deferred_qs_handler() The per-CPU rcu_data structure's ->defer_qs_iw field is initialized by IRQ_WORK_INIT_HARD(), which means that the subsequent invocation of rcu_preempt_deferred_qs_handler() will always be executed with interrupts disabled. This commit therefore removes the local_irq_save/restore() operations from rcu_preempt_deferred_qs_handler() and adds a call to lockdep_assert_irqs_disabled() in order to enable lockdep to diagnose mistaken invocations of this function from interrupts-enabled code. Signed-off-by: Zqiang <qiang.zhang@linux.dev> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:25:15 -07:00
Linus Torvalds	63467137ec	Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Netfilter and IPsec. Current release - regressions: - netfilter: nft_set_pipapo: - don't return bogus extension pointer - fix null deref for empty set Current release - new code bugs: - core: prevent deadlocks when enabling NAPIs with mixed kthread config - eth: netdevsim: Fix wild pointer access in nsim_queue_free(). Previous releases - regressions: - page_pool: allow enabling recycling late, fix false positive warning - sched: ets: use old 'nbands' while purging unused classes - xfrm: - restore GSO for SW crypto - bring back device check in validate_xmit_xfrm - tls: handle data disappearing from under the TLS ULP - ptp: prevent possible ABBA deadlock in ptp_clock_freerun() - eth: - bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE - hv_netvsc: fix panic during namespace deletion with VF Previous releases - always broken: - netfilter: fix refcount leak on table dump - vsock: do not allow binding to VMADDR_PORT_ANY - sctp: linearize cloned gso packets in sctp_rcv - eth: - hibmcge: fix the division by zero issue - microchip: fix KSZ8863 reset problem" * tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) net: usb: asix_devices: add phy_mask for ax88772 mdio bus net: kcm: Fix race condition in kcm_unattach() selftests: net/forwarding: test purge of active DWRR classes net/sched: ets: use old 'nbands' while purging unused classes bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE netdevsim: Fix wild pointer access in nsim_queue_free(). net: mctp: Fix bad kfree_skb in bind lookup test netfilter: nf_tables: reject duplicate device on updates ipvs: Fix estimator kthreads preferred affinity netfilter: nft_set_pipapo: fix null deref for empty set selftests: tls: test TCP stealing data from under the TLS socket tls: handle data disappearing from under the TLS ULP ptp: prevent possible ABBA deadlock in ptp_clock_freerun() ixgbe: prevent from unwanted interface name changes devlink: let driver opt out of automatic phys_port_name generation net: prevent deadlocks when enabling NAPIs with mixed kthread config net: update NAPI threaded config even for disabled NAPIs selftests: drv-net: don't assume device has only 2 queues docs: Fix name for net.ipv4.udp_child_hash_entries riscv: dts: thead: Add APB clocks for TH1520 GMACs ...	2025-08-14 07:14:30 -07:00
Paul E. McKenney	faab3ae329	rcu: Document that rcu_barrier() hurries lazy callbacks This commit adds to the rcu_barrier() kerneldoc header stating that this function hurries lazy callbacks and that it does not normally result in additional RCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-13 15:04:06 -07:00
Chen Ridong	4c70fb2624	cpuset: remove redundant CS_ONLINE flag The CS_ONLINE flag was introduced prior to the CSS_ONLINE flag in the cpuset subsystem. Currently, the flag setting sequence is as follows: 1. cpuset_css_online() sets CS_ONLINE 2. css->flags gets CSS_ONLINE set ... 3. cgroup->kill_css sets CSS_DYING 4. cpuset_css_offline() clears CS_ONLINE 5. css->flags clears CSS_ONLINE The is_cpuset_online() check currently occurs between steps 1 and 3. However, it would be equally safe to perform this check between steps 2 and 3, as CSS_ONLINE provides the same synchronization guarantee as CS_ONLINE. Since CS_ONLINE is redundant with CSS_ONLINE and provides no additional synchronization benefits, we can safely remove it to simplify the code. Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-13 08:14:20 -10:00
Dan Carpenter	70d0085864	audit: add a missing tab Someone got a bit carried away deleting tabs. Add it back. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Paul Moore <paul@paul-moore.com>	2025-08-13 11:51:43 -04:00
Shanker Donthineni	89a2d212bd	dma/pool: Ensure DMA_DIRECT_REMAP allocations are decrypted When CONFIG_DMA_DIRECT_REMAP is enabled, atomic pool pages are remapped via dma_common_contiguous_remap() using the supplied pgprot. Currently, the mapping uses pgprot_dmacoherent(PAGE_KERNEL), which leaves the memory encrypted on systems with memory encryption enabled (e.g., ARM CCA Realms). This can cause the DMA layer to fail or crash when accessing the memory, as the underlying physical pages are not configured as expected. Fix this by requesting a decrypted mapping in the vmap() call: pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)) This ensures that atomic pool memory is consistently mapped unencrypted. Cc: stable@vger.kernel.org Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250811181759.998805-1-sdonthineni@nvidia.com	2025-08-13 11:02:10 +02:00
John Stultz	21924af67d	locking: Fix __clear_task_blocked_on() warning from __ww_mutex_wound() path The __clear_task_blocked_on() helper added a number of sanity checks ensuring we hold the mutex wait lock and that the task we are clearing blocked_on pointer (if set) matches the mutex. However, there is an edge case in the _ww_mutex_wound() logic where we need to clear the blocked_on pointer for the task that owns the mutex, not the task that is waiting on the mutex. For this case the sanity checks aren't valid, so handle this by allowing a NULL lock to skip the additional checks. K Prateek Nayak and Maarten Lankhorst also pointed out that in this case where we don't hold the owner's mutex wait_lock, we need to be a bit more careful using READ_ONCE/WRITE_ONCE in both the __clear_task_blocked_on() and __set_task_blocked_on() implementations to avoid accidentally tripping WARN_ONs if two instances race. So do that here as well. This issue was easier to miss, I realized, as the test-ww_mutex driver only exercises the wait-die class of ww_mutexes. I've sent a patch[1] to address this so the logic will be easier to test. [1]: https://lore.kernel.org/lkml/20250801023358.562525-2-jstultz@google.com/ Fixes: `a4f0b6fef4` ("locking/mutex: Add p->blocked_on wrappers for correctness checks") Closes: https://lore.kernel.org/lkml/68894443.a00a0220.26d0e1.0015.GAE@google.com/ Reported-by: syzbot+602c4720aed62576cd79@syzkaller.appspotmail.com Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250805001026.2247040-1-jstultz@google.com	2025-08-13 10:34:54 +02:00
Frederic Weisbecker	c0a23bbc98	ipvs: Fix estimator kthreads preferred affinity The estimator kthreads' affinity are defined by sysctl overwritten preferences and applied through a plain call to the scheduler's affinity API. However since the introduction of managed kthreads preferred affinity, such a practice shortcuts the kthreads core code which eventually overwrites the target to the default unbound affinity. Fix this with using the appropriate kthread's API. Fixes: `d1a8919758` ("kthread: Default affine kthread to its preferred NUMA node") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-08-13 08:34:33 +02:00
Qianfeng Rong	bf0c2a84df	bpf: Replace kvfree with kfree for kzalloc memory The 'backedge' pointer is allocated with kzalloc(), which returns physically contiguous memory. Using kvfree() to deallocate such memory is functionally safe but semantically incorrect. Replace kvfree() with kfree() to avoid unnecessary is_vmalloc_addr() check in kvfree(). Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250811123949.552885-1-rongqianfeng@vivo.com	2025-08-12 15:55:01 -07:00
Qianfeng Rong	3e2b799008	bpf: Remove redundant __GFP_NOWARN Commit `16f5dfbc85` ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT \| __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250804122731.460158-1-rongqianfeng@vivo.com	2025-08-12 14:56:04 -07:00
Martin KaFai Lau	9e293d47bf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Cross merge bpf/master after 6.17-rc1. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-08-12 12:14:02 -07:00
Thorsten Blum	8a013ec9cb	cgroup: Replace deprecated strcpy() with strscpy() strcpy() is deprecated; use strscpy() instead. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-12 08:57:03 -10:00
Andrea Righi	ddf7233fca	sched/ext: Fix invalid task state transitions on class switch When enabling a sched_ext scheduler, we may trigger invalid task state transitions, resulting in warnings like the following (which can be easily reproduced by running the hotplug selftest in a loop): sched_ext: Invalid task state transition 0 -> 3 for fish[770] WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0 ... RIP: 0010:scx_set_task_state+0x7c/0xc0 ... Call Trace: <TASK> scx_enable_task+0x11f/0x2e0 switching_to_scx+0x24/0x110 scx_enable.isra.0+0xd14/0x13d0 bpf_struct_ops_link_create+0x136/0x1a0 __sys_bpf+0x1edd/0x2c30 __x64_sys_bpf+0x21/0x30 do_syscall_64+0xbb/0x370 entry_SYSCALL_64_after_hwframe+0x77/0x7f This happens because we skip initialization for tasks that are already dead (with their usage counter set to zero), but we don't exclude them during the scheduling class transition phase. Fix this by also skipping dead tasks during class swiching, preventing invalid task state transitions. Fixes: `a8532fac7b` ("sched_ext: TASK_DEAD tasks must be switched into SCX on ops_enable") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-11 06:56:37 -10:00
Waiman Long	dfb36e4a8d	futex: Use user_write_access_begin/_end() in futex_put_value() Commit `cec199c5e3` ("futex: Implement FUTEX2_NUMA") introduced the futex_put_value() helper to write a value to the given user address. However, it uses user_read_access_begin() before the write. For architectures that differentiate between read and write accesses, like PowerPC, futex_put_value() fails with -EFAULT. Fix that by using the user_write_access_begin/user_write_access_end() pair instead. Fixes: `cec199c5e3` ("futex: Implement FUTEX2_NUMA") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250811141147.322261-1-longman@redhat.com	2025-08-11 17:53:21 +02:00
Kieran Moy	df1145b56c	audit: fix typo in auditfilter.c comment Correct the misspelling of "searching" (was "serarching") in the function documentation for audit_update_lsm_rules. Found via code inspection, no functional impact. Signed-off-by: Kieran Moy <kfatyuip@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2025-08-11 11:44:57 -04:00
Thorsten Blum	d8c09d7b55	audit: Replace deprecated strcpy() with strscpy() strcpy() is deprecated; use strscpy() instead. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Paul Moore <paul@paul-moore.com>	2025-08-11 11:44:55 -04:00
Casey Schaufler	c5055d0c8e	audit: fix indentation in audit_log_exit() Fix two indentation errors in audit_log_exit(). Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>	2025-08-11 11:44:50 -04:00
Oreoluwa Babatunde	2c223f7239	of: reserved_mem: Restructure call site for dma_contiguous_early_fixup() Restructure the call site for dma_contiguous_early_fixup() to where the reserved_mem nodes are being parsed from the DT so that dma_mmu_remap[] is populated before dma_contiguous_remap() is called. Fixes: `8a6e02d0c0` ("of: reserved_mem: Restructure how the reserved memory regions are processed") Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com> Tested-by: William Zhang <william.zhang@broadcom.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250806172421.2748302-1-oreoluwa.babatunde@oss.qualcomm.com	2025-08-11 13:05:38 +02:00
Qianfeng Rong	110aa2c74d	swiotlb: Remove redundant __GFP_NOWARN Commit `16f5dfbc85` ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT \| __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250805023222.332920-1-rongqianfeng@vivo.com	2025-08-11 11:29:38 +02:00
Petr Tesarik	9f683dfe80	dma-direct: clean up the logic in __dma_direct_alloc_pages() Convert a goto-based loop to a while() loop. To allow the simplification, return early when allocation from CMA is successful. As a bonus, this early return avoids a repeated dma_coherent_ok() check. No functional change. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250710083829.1853466-1-ptesarik@suse.com	2025-08-11 11:28:04 +02:00
Frederic Weisbecker	61399e0c54	rcu: Fix racy re-initialization of irq_work causing hangs RCU re-initializes the deferred QS irq work everytime before attempting to queue it. However there are situations where the irq work is attempted to be queued even though it is already queued. In that case re-initializing messes-up with the irq work queue that is about to be handled. The chances for that to happen are higher when the architecture doesn't support self-IPIs and irq work are then all lazy, such as with the following sequence: 1) rcu_read_unlock() is called when IRQs are disabled and there is a grace period involving blocked tasks on the node. The irq work is then initialized and queued. 2) The related tasks are unblocked and the CPU quiescent state is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE, allowing the irq work to be requeued in the future (note the previous one hasn't fired yet). 3) A new grace period starts and the node has blocked tasks. 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work is re-initialized (but it's queued! and its node is cleared) and requeued. Which means it's requeued to itself. 5) The irq work finally fires with the tick. But since it was requeued to itself, it loops and hangs. Fix this with initializing the irq work only once before the CPU boots. Fixes: `b41642c877` ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>	2025-08-11 08:43:49 +05:30
Linus Torvalds	b96ddbc5c8	Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp fixes from Borislav Petkov: - Remove an obsolete comment and fix spelling * tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: Remove obsolete comment from takedown_cpu() smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment	2025-08-10 08:51:37 +03:00
Linus Torvalds	7d2fed1f3c	Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a wrong ioremap size in mvebu-gicp - Remove yet another compile-test case for a driver which needs an additional dependency - Fix a lock inversion scenario in the IRQ unit test suite - Remove an impossible flag situation in gic-v5 - Do not iounmap resources in gic-v5 which are managed by devm - Make sure stale, left-over interrupts in mvebu-gicp are cleared on driver init - Fix a reference counting mishap in msi-lib - Fix a dereference-before-null-ptr-check case in the riscv-imsic irqchip driver * tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mvebu-gicp: Use resource_size() for ioremap() irqchip: Build IMX_MU_MSI only on ARM genirq/test: Resolve irq lock inversion warnings irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs irqchip/gic-v5: iwb: Fix iounmap probe failure path irqchip/mvebu-gicp: Clear pending interrupts on init irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select() irqchip/riscv-imsic: Don't dereference before NULL pointer check	2025-08-10 08:46:47 +03:00
Linus Torvalds	8e8f6b635f	Merge tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Prevent a futex hash leak due to different mm lifetimes * tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Move futex cleanup to __mmdrop()	2025-08-10 08:11:39 +03:00
JP Kobryn	eea51c6e3f	cgroup: avoid null de-ref in css_rstat_exit() css_rstat_exit() may be called asynchronously in scenarios where preceding calls to css_rstat_init() have not completed. One such example is this sequence below: css_create(...) { ... init_and_link_css(css, ...); err = percpu_ref_init(...); if (err) goto err_free_css; err = cgroup_idr_alloc(...); if (err) goto err_free_css; err = css_rstat_init(css, ...); if (err) goto err_free_css; ... err_free_css: INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); return ERR_PTR(err); } If any of the three goto jumps are taken, async cleanup will begin and css_rstat_exit() will be invoked on an uninitialized css->rstat_cpu. Avoid accessing the unitialized field by returning early in css_rstat_exit() if this is the case. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Suggested-by: Michal Koutný <mkoutny@suse.com> Fixes: `5da3bfa029` ("cgroup: use separate rstat trees for each subsystem") Cc: stable@vger.kernel.org # v6.16 Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-09 08:46:32 -10:00
Waiman Long	87eba5bc5a	cgroup/cpuset: Remove the unnecessary css_get/put() in cpuset_partition_write() The css_get/put() calls in cpuset_partition_write() are unnecessary as an active reference of the kernfs node will be taken which will prevent its removal and guarantee the existence of the css. Only the online check is needed. Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-09 08:42:47 -10:00
Waiman Long	150e298ae0	cgroup/cpuset: Fix a partition error with CPU hotplug It was found during testing that an invalid leaf partition with an empty effective exclusive CPU list can become a valid empty partition with no CPU afer an offline/online operation of an unrelated CPU. An empty partition root is allowed in the special case that it has no task in its cgroup and has distributed out all its CPUs to its child partitions. That is certainly not the case here. The problem is in the cpumask_subsets() test in the hotplug case (update with no new mask) of update_parent_effective_cpumask() as it also returns true if the effective exclusive CPU list is empty. Fix that by addding the cpumask_empty() test to root out this exception case. Also add the cpumask_empty() test in cpuset_hotplug_update_tasks() to avoid calling update_parent_effective_cpumask() for this special case. Fixes: `0c7f293efc` ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-09 08:42:28 -10:00
Waiman Long	65f97cc81b	cgroup/cpuset: Use static_branch_enable_cpuslocked() on cpusets_insane_config_key The following lockdep splat was observed. [ 812.359086] ============================================ [ 812.359089] WARNING: possible recursive locking detected [ 812.359097] -------------------------------------------- [ 812.359100] runtest.sh/30042 is trying to acquire lock: [ 812.359105] ffffffffa7f27420 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0xe/0x20 [ 812.359131] [ 812.359131] but task is already holding lock: [ 812.359134] ffffffffa7f27420 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_write_resmask+0x98/0xa70 : [ 812.359267] Call Trace: [ 812.359272] <TASK> [ 812.359367] cpus_read_lock+0x3c/0xe0 [ 812.359382] static_key_enable+0xe/0x20 [ 812.359389] check_insane_mems_config.part.0+0x11/0x30 [ 812.359398] cpuset_write_resmask+0x9f2/0xa70 [ 812.359411] cgroup_file_write+0x1c7/0x660 [ 812.359467] kernfs_fop_write_iter+0x358/0x530 [ 812.359479] vfs_write+0xabe/0x1250 [ 812.359529] ksys_write+0xf9/0x1d0 [ 812.359558] do_syscall_64+0x5f/0xe0 Since commit `d74b27d63a` ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order"), the ordering of cpu hotplug lock and cpuset_mutex had been reversed. That patch correctly used the cpuslocked version of the static branch API to enable cpusets_pre_enable_key and cpusets_enabled_key, but it didn't do the same for cpusets_insane_config_key. The cpusets_insane_config_key can be enabled in the check_insane_mems_config() which is called from update_nodemask() or cpuset_hotplug_update_tasks() with both cpu hotplug lock and cpuset_mutex held. Deadlock can happen with a pending hotplug event that tries to acquire the cpu hotplug write lock which will block further cpus_read_lock() attempt from check_insane_mems_config(). Fix that by switching to use static_branch_enable_cpuslocked(). Fixes: `d74b27d63a` ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-08-09 08:41:39 -10:00
Linus Torvalds	c30a13538d	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix memory leak of bpf_scc_info objects (Eduard Zingerman) - Fix a regression in the 'perf' tool caused by moving UID filtering to BPF (Ilya Leoshkevich) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: perf bpf-filter: Enable events manually libbpf: Add the ability to suppress perf event enablement bpf: Fix memory leak of bpf_scc_info objects	2025-08-09 09:03:21 +03:00
Eduard Zingerman	77620d1267	bpf: use realloc in bpf_patch_insn_data Avoid excessive vzalloc/vfree calls when patching instructions in do_misc_fixups(). bpf_patch_insn_data() uses vzalloc to allocate new memory for env->insn_aux_data for each patch as follows: struct bpf_prog *bpf_patch_insn_data(env, ...) { ... new_data = vzalloc(... O(program size) ...); ... adjust_insn_aux_data(env, new_data, ...); ... } void adjust_insn_aux_data(env, new_data, ...) { ... memcpy(new_data, env->insn_aux_data); vfree(env->insn_aux_data); env->insn_aux_data = new_data; ... } The vzalloc/vfree pair is hot in perf report collected for e.g. pyperf180 test case. It can be replaced with a call to vrealloc in order to reduce the number of actual memory allocations. This is a stop-gap solution, as bpf_patch_insn_data is still hot in the profile. More comprehansive solutions had been discussed before e.g. as in [1]. [1] https://lore.kernel.org/bpf/CAEf4BzY_E8MSL4mD0UPuuiDcbJhh9e2xQo2=5w+ppRWWiYSGvQ@mail.gmail.com/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Tested-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20250807010205.3210608-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-07 09:17:02 -07:00
Eduard Zingerman	cb070a8156	bpf: removed unused 'env' parameter from is_reg64 and insn_has_def32 Parameter 'env' is not used by is_reg64() and insn_has_def32() functions. Remove the parameter to make it clear that neither function depends on 'env' state, e.g. env->insn_aux_data. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250807010205.3210608-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-07 09:17:02 -07:00
Waiman Long	da274853fe	cpu: Remove obsolete comment from takedown_cpu() takedown_cpu() has a comment about "all preempt/rcu users must observe !cpu_active()" which is kind of meaningless in this function. This comment was originally introduced by commit `6acce3ef84` ("sched: Remove get_online_cpus() usage") when _cpu_down() was setting cpu_active_mask and synchronize_rcu()/synchronize_sched() were added after that. Later commit `40190a78f8` ("sched/hotplug: Convert cpu_[in]active notifiers to state machine") added a new CPUHP_AP_ACTIVE hotplug state to set/clear cpu_active_mask. The following commit `b2454caa89` ("sched/hotplug: Move sync_rcu to be with set_cpu_active(false)") move the synchronize_*() calls to sched_cpu_deactivate() associated with the new hotplug state, but left the comment behind. Remove this comment as it is no longer relevant in takedown_cpu(). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250729191232.664931-1-longman@redhat.com	2025-08-06 22:48:12 +02:00
Amery Hung	d87a513d09	bpf: Allow struct_ops to get map id by kdata Add bpf_struct_ops_id() to enable struct_ops implementors to use struct_ops map id as the unique id of a struct_ops in their subsystem. A subsystem that wishes to create a mapping between id and struct_ops instance pointer can update the mapping accordingly during bpf_struct_ops::reg(), unreg(), and update(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250806162540.681679-2-ameryhung@gmail.com	2025-08-06 13:39:58 -07:00
Brian Norris	5b65258229	genirq/test: Resolve irq lock inversion warnings irq_shutdown_and_deactivate() is normally called with the descriptor lock held, and interrupts disabled. Nested a few levels down, it grabs the global irq_resend_lock. Lockdep rightfully complains when interrupts are not disabled: CPU0 CPU1 ---- ---- lock(irq_resend_lock); local_irq_disable(); lock(&irq_desc_lock_class); lock(irq_resend_lock); <Interrupt> lock(&irq_desc_lock_class); ... _raw_spin_lock+0x2b/0x40 clear_irq_resend+0x14/0x70 irq_shutdown_and_deactivate+0x29/0x80 irq_shutdown_depth_test+0x1ce/0x600 kunit_try_run_case+0x90/0x120 Grab the descriptor lock and disable interrupts, to resolve the problem. Fixes: `66067c3c8a` ("genirq: Add kunit tests for depth counts") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/all/aJJONEIoIiTSDMqc@google.com Closes: https://lore.kernel.org/lkml/31a761e4-8f81-40cf-aaf5-d220ba11911c@roeck-us.net/	2025-08-06 10:29:48 +02:00

... 10 11 12 13 14 ...

49584 Commits