linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-03 03:23:11 -04:00

Author	SHA1	Message	Date
Cunlong Li	22572dbcd3	cgroup: rstat: relax NMI guard after switch to try_cmpxchg Commit `36df6e3dbd` ("cgroup: make css_rstat_updated nmi safe") used this_cpu_cmpxchg() for the lockless insertion, and therefore required both ARCH_HAVE_NMI_SAFE_CMPXCHG and ARCH_HAS_NMI_SAFE_THIS_CPU_OPS in the NMI guard: on archs without the latter, this_cpu_cmpxchg() falls back to "local_irq_save() + plain cmpxchg", and local_irq_save() cannot mask NMIs. Commit `3309b63a22` ("cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated") later replaced this_cpu_cmpxchg() with plain try_cmpxchg() to fix cross-CPU lockless-list corruption, but left the NMI guard untouched. After that switch, css_rstat_updated() no longer performs any this_cpu_*() RMW operations and only relies on the arch having NMI-safe cmpxchg, so ARCH_HAS_NMI_SAFE_THIS_CPU_OPS is no longer required in the guard. Relax the guard accordingly so that archs which have HAVE_NMI and ARCH_HAVE_NMI_SAFE_CMPXCHG but not ARCH_HAS_NMI_SAFE_THIS_CPU_OPS (e.g. sparc, powerpc on PPC64/BOOK3S) can benefit from the existing CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC path. Without this, the css is never queued in NMI on those archs, and the atomics staged by account_{slab,kmem}_nmi_safe() are not drained by flush_nmi_stats(). Fixes: `3309b63a22` ("cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated") Signed-off-by: Cunlong Li <shenxiaogll@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-20 09:44:35 -10:00
Qing Ming	8817005efb	cgroup/rstat: validate cpu before css_rstat_cpu() access css_rstat_updated() is exposed as a BPF kfunc and accepts a caller-provided cpu argument. The function uses cpu for per-cpu rstat lookups without checking whether it refers to a valid possible CPU. A BPF iter/cgroup program with CAP_BPF and CAP_PERFMON can pass an invalid cpu value. On an unfixed UBSAN_BOUNDS test kernel, cpu == 0x7fffffff triggers: UBSAN: array-index-out-of-bounds in kernel/cgroup/rstat.c:31:9 index 2147483647 is out of range for type 'long unsigned int [64]' Call Trace: css_rstat_updated bpf_iter_run_prog cgroup_iter_seq_show bpf_seq_read Add cpu validation to the BPF-facing css_rstat_updated() kfunc and move the common implementation to __css_rstat_updated() for in-kernel callers. Fixes: `a319185be9` ("cgroup: bpf: enable bpf programs to integrate with rstat") Signed-off-by: Qing Ming <a0yami@mailbox.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-18 09:31:52 -10:00
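The split described in this commit (a validating entry point in front of a trusted inner helper) can be illustrated in userspace. This is a minimal sketch under stated assumptions, not the kernel code: `NR_SLOTS`, `inner_updated()` and `checked_updated()` are hypothetical names, and a plain range check stands in for the kernel's test for a valid possible CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 64	/* hypothetical stand-in for the number of possible CPUs */

static unsigned long updated[NR_SLOTS];

/* Trusted inner helper: in-tree callers guarantee a valid index. */
static void inner_updated(int cpu)
{
	assert(cpu >= 0 && cpu < NR_SLOTS);
	updated[cpu]++;
}

/* Externally callable wrapper: rejects out-of-range indices first. */
static bool checked_updated(int cpu)
{
	if (cpu < 0 || cpu >= NR_SLOTS)
		return false;
	inner_updated(cpu);
	return true;
}
```

With this shape, a caller-supplied value such as 0x7fffffff is refused before it is ever used as an array index, while internal callers keep the unchecked fast path.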
sunshaojie	345f401666	cgroup/cpuset: Return only actually allocated CPUs during partition invalidation In update_parent_effective_cpumask() with partcmd_invalidate, the CPUs to return to the parent are computed as: adding = cpumask_and(tmp->addmask, xcpus, parent->effective_xcpus); where xcpus = user_xcpus(cs) which returns cs->exclusive_cpus (if set) or cs->cpus_allowed. When exclusive_cpus is not set, user_xcpus(cs) can contain CPUs that were never actually granted to the partition due to sibling exclusion in compute_excpus(). Consequently, the invalidation may return CPUs to the parent that remain in use by sibling partitions, causing overlapping effective_cpus and triggering the WARN_ON_ONCE(1) in generate_sched_domains(). Use cs->effective_xcpus instead, which reflects the CPUs actually granted to this partition. Reproducer (on a 4-CPU machine): cd /sys/fs/cgroup mkdir a1 b1 # a1 becomes partition root with CPUs 0-1 echo "0-1" > a1/cpuset.cpus echo "root" > a1/cpuset.cpus.partition # b1 becomes partition root with CPUs 1-2, but sibling exclusion # reduces its effective_xcpus to CPU 2 only echo "1-2" > b1/cpuset.cpus echo "root" > b1/cpuset.cpus.partition # b1 changes cpus_allowed to 0-1 -> partition invalidation echo "0-1" > b1/cpuset.cpus # Expected: CPUs 2-3 (only CPU 2 returned from b1) # Actual: CPUs 1-3 (CPU 0-1 returned, overlapping with a1) cat cpuset.cpus.effective dmesg will also show a WARNING from generate_sched_domains() reporting overlapping partition root effective_cpus. Fixes: `2a3602030d` ("cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: sunshaojie <sunshaojie@kylinos.cn> Tested-by: Chen Ridong <chenridong@huaweicloud.com> Reviewed-by: Chen Ridong <chenridong@huaweicloud.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-13 08:54:53 -10:00
Yu Miao	7d8f3158a5	selftests/cgroup: Fix error path leaks in test_percpu_basic When cg_name_indexed() returns NULL partway through the child creation loop, the code returned -1 without running cleanup_children and cleanup. That left the `parent` pathname allocation unreleased and did not remove child cgroup directories already created under the parent. Fix by jumping to cleanup_children instead of returning. When cg_create() fails, `child` (the pathname from cg_name_indexed()) was not freed before cleanup_children. Fix by freeing `child` before branching to cleanup_children. Fixes: `90631e1dea` ("kselftests: cgroup: add perpcu memory accounting test") Signed-off-by: Yu Miao <yumiao@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-13 08:40:52 -10:00
Guopeng Zhang	5dd74441cb	cgroup/cpuset: Reserve DL bandwidth only for root-domain moves cpuset_can_attach() currently adds the bandwidth of all migrating SCHED_DEADLINE tasks to sum_migrate_dl_bw. If the source and destination cpuset effective CPU masks do not overlap, the whole sum is then reserved in the destination root domain. set_cpus_allowed_dl(), however, subtracts bandwidth from the source root domain only when the affinity change really moves the task between root domains. A DL task can move between cpusets that are still in the same root domain, so including that task in sum_migrate_dl_bw can reserve destination bandwidth without a matching source-side subtraction. Share the root-domain move test with set_cpus_allowed_dl(). Keep nr_migrate_dl_tasks counting all migrating deadline tasks for cpuset DL task accounting, but add to sum_migrate_dl_bw only for tasks that need a root-domain bandwidth move. Keep using the destination cpuset effective CPU mask and leave the broader can_attach()/attach() transaction model unchanged. Fixes: `2ef269ef1a` ("cgroup/cpuset: Free DL BW in case can_attach() fails") Cc: stable@vger.kernel.org # v6.10+ Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-11 10:27:14 -10:00
Guopeng Zhang	4a39eda5fd	cgroup/cpuset: Reset DL migration state on can_attach() failure cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration state in the destination cpuset while walking the taskset. If a later task_can_attach() or security_task_setscheduler() check fails, cgroup_migrate_execute() treats cpuset as the failing subsystem and does not call cpuset_cancel_attach() for it. The partially accumulated state is then left behind and can be consumed by a later attach, corrupting cpuset DL task accounting and pending DL bandwidth accounting. Reset the pending DL migration state from the common error exit when ret is non-zero. Successful can_attach() keeps the state for cpuset_attach() or cpuset_cancel_attach(). Fixes: `2ef269ef1a` ("cgroup/cpuset: Free DL BW in case can_attach() fails") Cc: stable@vger.kernel.org # v6.10+ Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Chen Ridong <chenridong@huaweicloud.com> Reviewed-by: Waiman Long <longman@redhat.com>	2026-05-10 22:14:49 -10:00
Hongfu Li	2a3d7256fa	selftests/cgroup: Fix string comparison in write_test Use string comparison (!=) instead of numeric comparison (-ne) for cpuset values like "0-1". For example: $ [[ "0-1" != "2-3" ]] && echo "true" || echo "false" true $ [[ "0-1" -ne "2-3" ]] && echo "true" || echo "false" false Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-10 15:54:12 -10:00
Hongfu Li	e32e6f0216	selftests/cgroup: Fix cg_read_strcmp() empty string comparison cg_read_strcmp() allocated a buffer sized to strlen(expected) + 1, then passed it to read_text() which calls read(fd, buf, size-1). When comparing against an empty string (""), strlen("") = 0 gives a 1-byte buffer, and read() is asked to read 0 bytes. The file content is never actually read, so strcmp("", buf) always returns 0 regardless of the real content. This caused cg_test_proc_killed() to always report the cgroup as empty immediately, making OOM tests pass without verifying that processes were killed. Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-10 15:53:44 -10:00
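The buffer-sizing problem in this commit can be reproduced without any cgroup files. The sketch below is an assumption-laden model: `read_text_model()` is a hypothetical in-memory stand-in for read_text() that copies at most size - 1 bytes, and the "spare byte" variant is only one possible repair, not necessarily the one the commit applies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for read_text(): copies at most size - 1 bytes. */
static size_t read_text_model(const char *content, char *buf, size_t size)
{
	size_t n = strlen(content);

	assert(size > 0);
	if (n > size - 1)
		n = size - 1;
	memcpy(buf, content, n);
	buf[n] = '\0';
	return n;
}

/* Shared body: compare expected against what fits in a buffer of `size`. */
static int compare_with_size(const char *content, const char *expected,
			     size_t size)
{
	char *buf = malloc(size);
	int ret;

	if (!buf)
		return -1;
	read_text_model(content, buf, size);
	ret = strcmp(expected, buf);
	free(buf);
	return ret;
}

/* Sizing from the commit message: an empty expectation reads 0 bytes. */
static int strcmp_buggy(const char *content, const char *expected)
{
	return compare_with_size(content, expected, strlen(expected) + 1);
}

/* One possible repair: a spare byte makes extra content visible. */
static int strcmp_spare_byte(const char *content, const char *expected)
{
	return compare_with_size(content, expected, strlen(expected) + 2);
}
```

In this model `strcmp_buggy("1234\n", "")` reports a match because nothing is read, whereas the spare-byte variant sees the first byte of the content and reports a mismatch.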
Guopeng Zhang	796ad62204	cgroup/dmem: Return -ENOMEM on failed pool preallocation get_cg_pool_unlocked() handles allocation failures under dmemcg_lock by dropping the lock, preallocating a pool with GFP_KERNEL, and retrying the locked lookup and creation path. If the fallback allocation fails too, pool remains NULL. Since the loop condition is while (!pool), the function can keep retrying instead of propagating the allocation failure to the caller. Set pool to ERR_PTR(-ENOMEM) when the fallback allocation fails so the loop exits through the existing common return path. The callers already handle ERR_PTR() from get_cg_pool_unlocked(), so this restores the expected error path. Fixes: `b168ed458d` ("kernel/cgroup: Add "dmem" memory accounting cgroup") Cc: stable@vger.kernel.org # v6.14+ Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-10 15:43:46 -10:00
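The retry loop described here can be sketched in userspace to show why a NULL result from the fallback allocation must be turned into an error pointer. Everything below is illustrative: `err_ptr()`, `is_err()` and `ptr_err()` mimic the kernel's ERR_PTR helpers, and `get_pool()`, `alloc_fn`, `alloc_fail()` and `alloc_static()` are hypothetical names rather than the dmem code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct pool { int id; };
typedef struct pool *(*alloc_fn)(void);

static struct pool static_pool;
static struct pool *alloc_fail(void) { return NULL; }
static struct pool *alloc_static(void) { return &static_pool; }

/* Model of "try under lock, else preallocate unlocked and retry". */
static struct pool *get_pool(alloc_fn try_locked, alloc_fn fallback)
{
	struct pool *pool = NULL, *spare = NULL;

	assert(try_locked && fallback);
	while (!pool) {
		/* Locked path: consume the preallocated pool if present. */
		pool = spare ? spare : try_locked();
		if (pool)
			break;
		/* Lock dropped: preallocate with a sleeping allocation. */
		spare = fallback();
		if (!spare)
			pool = err_ptr(-ENOMEM);	/* ends the loop */
	}
	return pool;
}
```

Without the `err_ptr(-ENOMEM)` assignment, a failing fallback would leave `pool` NULL and the `while (!pool)` condition would keep the loop spinning.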
Chen Wandun	dde2f938d0	cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed() Since prepare_alloc_pages() unconditionally adds __GFP_HARDWALL for the fast path when cpusets are enabled, the __GFP_HARDWALL check in cpuset_current_node_allowed() causes the PF_EXITING escape path to be skipped on the first allocation attempt. This makes it unreachable in the common case, so dying tasks can get stuck in direct reclaim or even trigger OOM while trying to exit, despite being allowed to allocate from any node. Move the PF_EXITING check before __GFP_HARDWALL so that dying tasks can allocate memory from any node to exit quickly, even when cpusets are enabled. Also update the function comment to reflect the actual behavior of prepare_alloc_pages() and the corrected check ordering. Signed-off-by: Chen Wandun <chenwandun@lixiang.com> Acked-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-07 11:57:31 -10:00
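The effect of the check ordering can be shown with a small model. This is a sketch only: the flag values, `struct alloc_ctx` and both functions are hypothetical, and the remaining checks of the real cpuset_current_node_allowed() (interrupt context, OOM victims, the ancestor scan) are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's. */
#define MODEL_GFP_HARDWALL 0x1u
#define MODEL_PF_EXITING   0x2u

struct alloc_ctx {
	bool node_in_mems_allowed;
	unsigned int gfp_mask;
	unsigned int task_flags;
};

/* Old order: the hardwall check hides the exiting-task escape. */
static bool node_allowed_old(const struct alloc_ctx *c)
{
	assert(c);
	if (c->node_in_mems_allowed)
		return true;
	if (c->gfp_mask & MODEL_GFP_HARDWALL)
		return false;
	if (c->task_flags & MODEL_PF_EXITING)
		return true;
	return false;
}

/* New order: a dying task is allowed before the hardwall check. */
static bool node_allowed_new(const struct alloc_ctx *c)
{
	assert(c);
	if (c->node_in_mems_allowed)
		return true;
	if (c->task_flags & MODEL_PF_EXITING)
		return true;
	if (c->gfp_mask & MODEL_GFP_HARDWALL)
		return false;
	return false;
}
```

For an exiting task allocating with the hardwall flag set on a node outside its allowed set, the old order refuses the node and the new order permits it.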
T.J. Mercier	d8769544bd	docs: cgroup-v1: Update charge-commit section Commit `1d8f136a42` ("memcg/hugetlb: remove memcg hugetlb try-commit-cancel protocol") removed mem_cgroup_commit_charge() and mem_cgroup_cancel_charge(), but the docs still refer to those functions. There is no longer any charge cancellation. Update the docs to match the code. Signed-off-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-04 11:02:12 -10:00
Tejun Heo	93618edf75	cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated A chain of commits going back to v7.0 reworked rmdir to satisfy the controller invariant that a subsystem's ->css_offline() must not run while tasks are still doing kernel-side work in the cgroup. [1] `d245698d72` ("cgroup: Defer task cgroup unlink until after the task is done switching out") [2] `a72f73c4dd` ("cgroup: Don't expose dead tasks in cgroup") [3] `1b164b876c` ("cgroup: Wait for dying tasks to leave on rmdir") [4] `4c56a8ac68` ("cgroup: Fix cgroup_drain_dying() testing the wrong condition") [5] `13e786b64b` ("cgroup: Increment nr_dying_subsys_* from rmdir context") [1] moved task cset unlink from do_exit() to finish_task_switch() so a task's cset link drops only after the task has fully stopped scheduling. That made tasks past exit_signals() linger on cset->tasks until their final context switch, which led to a series of problems as what userspace expected to see after rmdir diverged from what the kernel needs to wait for. [2]-[5] tried to bridge that divergence: [2] filtered the exiting tasks from cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4] fixed the wait's condition; [5] made nr_dying_subsys_* visible synchronously. The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g. host PID 1 systemd reaping orphan pids that were re-parented to it during the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those pids to free, the pids can't free because PID 1 is the reaper and it's stuck in rmdir, and the system A-A deadlocks. No internal lock ordering breaks this; the wait itself is the bug. The css killing side that drove the original reorder, however, can be made cleanly asynchronous: ->css_offline() is already async, run from css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). 
The fix is to make that chain start only after all tasks have left the cgroup. rmdir's user-visible side then returns as soon as cgroup.procs and friends are empty, while ->css_offline() still runs only after the cgroup is fully drained. Verified by the original reproducer (pidns teardown + zombie reaper, runs under vng) which hangs vanilla and succeeds here, and by per-commit deterministic repros for [2], [3], [4], [5] with a boot parameter that widens the post-exit_signals() window so each state is reliably reachable. Some stress tests on top of that. cgroup_apply_control_disable() has the same shape of pre-existing race: when a controller is disabled via subtree_control, kill_css() ran synchronously while tasks past exit_signals() could still be linked to the cgroup's csets, and ->css_offline() could fire before they drained. This patch preserves the existing synchronous behavior at that call site (kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch will defer kill_css_finish() there using a per-css trigger. This seems like the right approach and I don't see problems with it. The changes are somewhat invasive but not excessively so, so backporting to -stable should be okay. If something does turn out to be wrong, the fallback is to revert the entire chain ([1]-[5]) and rework in the development branch instead. v2: Pin cgrp across the deferred destroy work with explicit cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1 wasn't actually broken (ordered cgroup_offline_wq + queue_work order in cgroup_task_dead() saved it) but the explicit ref removes the dependency on those non-obvious invariants. Also note the pre-existing cgroup_apply_control_disable() race in the description; a follow-up will defer kill_css_finish() there. 
Fixes: `1b164b876c` ("cgroup: Wait for dying tasks to leave on rmdir") Cc: stable@vger.kernel.org # v7.0+ Reported-and-tested-by: Martin Pitt <martin@piware.de> Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/ Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/ Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2026-05-04 08:52:26 -10:00
Petr Vaněk	981cd33861	docs: cgroup: fix typo 'protetion' -> 'protection' Fix a small typo in the description of the memory_hugetlb_accounting mount option. Signed-off-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-27 07:55:40 -10:00
Petr Malat	13e786b64b	cgroup: Increment nr_dying_subsys_* from rmdir context Incrementing nr_dying_subsys_* in offline_css(), which is executed by the cgroup_offline_wq worker, leads to a race where a user can see the value as 0 when reading cgroup.stat after calling rmdir but before the worker executes. This makes the user wrongly expect resources released by the removed cgroup to be available for a new assignment. Increment nr_dying_subsys_* from kill_css(), which is called from the cgroup_rmdir() context. Fixes: `ab03125268` ("cgroup: Show # of subsystem CSSes in cgroup.stat") Signed-off-by: Petr Malat <oss@malat.biz> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-23 07:37:40 -10:00
Guopeng Zhang	41d701ddc3	cgroup/cpuset: record DL BW alloc CPU for attach rollback cpuset_can_attach() allocates DL bandwidth only when migrating deadline tasks to a disjoint CPU mask, but cpuset_cancel_attach() rolls back based only on nr_migrate_dl_tasks. This makes the DL bandwidth alloc/free paths asymmetric: rollback can call dl_bw_free() even when no dl_bw_alloc() was done. Rollback also needs to undo the reservation against the same CPU/root domain that was charged. Record the CPU used by dl_bw_alloc() and use that state in cpuset_cancel_attach(). If no allocation happened, dl_bw_cpu stays at -1 and rollback skips dl_bw_free(). If allocation did happen, bandwidth is returned to the same CPU/root domain. Successful attach paths are unchanged. This only fixes failed attach rollback accounting. Fixes: `2ef269ef1a` ("cgroup/cpuset: Free DL BW in case can_attach() fails") Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-17 08:57:37 -10:00
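The symmetric alloc/free bookkeeping described in this commit can be modeled with a sentinel value. The sketch below assumes a simplified world in which each CPU index owns its own bandwidth counter; `struct attach_state`, `can_attach_model()` and `cancel_attach_model()` are hypothetical names, and only the `dl_bw_cpu == -1` convention comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4		/* hypothetical */

static long reserved_bw[NR_MODEL_CPUS];

struct attach_state {
	int nr_migrate_dl_tasks;
	long sum_migrate_dl_bw;
	int dl_bw_cpu;		/* -1 means no bandwidth was reserved */
};

static void state_reset(struct attach_state *s)
{
	s->nr_migrate_dl_tasks = 0;
	s->sum_migrate_dl_bw = 0;
	s->dl_bw_cpu = -1;
}

/* Reserve bandwidth only for a disjoint move, and remember where. */
static void can_attach_model(struct attach_state *s, int nr_tasks, long bw,
			     bool disjoint, int dest_cpu)
{
	assert(dest_cpu >= 0 && dest_cpu < NR_MODEL_CPUS);
	s->nr_migrate_dl_tasks = nr_tasks;
	s->sum_migrate_dl_bw = bw;
	if (disjoint && bw) {
		reserved_bw[dest_cpu] += bw;
		s->dl_bw_cpu = dest_cpu;
	}
}

/* Roll back only what was actually reserved, on the same CPU. */
static void cancel_attach_model(struct attach_state *s)
{
	if (s->dl_bw_cpu >= 0)
		reserved_bw[s->dl_bw_cpu] -= s->sum_migrate_dl_bw;
	state_reset(s);
}
```

Keying the rollback on `dl_bw_cpu` rather than on the task count means a cancelled attach that never reserved anything leaves every counter untouched.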
cuitao	c802f460dd	cgroup/rdma: fix integer overflow in rdmacg_try_charge() The expression `rpool->resources[index].usage + 1` is computed in int arithmetic before being assigned to s64 variable `new`. When usage equals INT_MAX (the default "max" value), the addition overflows to INT_MIN. This negative value then passes the `new > max` check incorrectly, allowing a charge that should be rejected and corrupting usage to negative. Fix by casting usage to s64 before the addition so the arithmetic is done in 64-bit. Fixes: `39d3e7584a` ("rdmacg: Added rdma cgroup controller") Signed-off-by: cuitao <cuitao@kylinos.cn> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-17 07:25:27 -10:00
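The widening fix can be shown in isolation. This is a minimal sketch, assuming a hypothetical helper `may_charge()` that mirrors the `new > max` rejection from the commit message; it is not the rdmacg code. The unfixed form, `usage + 1` evaluated in `int`, is intentionally not executed here because signed overflow is undefined behavior in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when one more unit may be charged without exceeding max. */
static bool may_charge(int usage, int max)
{
	int64_t new_usage;

	assert(usage >= 0);
	/* Widen first so that INT_MAX + 1 is computed in 64 bits. */
	new_usage = (int64_t)usage + 1;
	return !(new_usage > max);
}
```

With usage at INT_MAX and the limit at INT_MAX (the "max" default), the widened sum is INT_MAX + 1, the comparison rejects the charge, and usage can no longer wrap negative.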
Edward Adam Davis	a5b98009f1	sched/psi: fix race between file release and pressure write A potential race condition exists between pressure write and cgroup file release regarding the priv member of struct kernfs_open_file, which triggers the uaf reported in [1]. Consider the following scenario involving execution on two separate CPUs: CPU0 CPU1 ==== ==== vfs_rmdir() kernfs_iop_rmdir() cgroup_rmdir() cgroup_kn_lock_live() cgroup_destroy_locked() cgroup_addrm_files() cgroup_rm_file() kernfs_remove_by_name() kernfs_remove_by_name_ns() vfs_write() __kernfs_remove() new_sync_write() kernfs_drain() kernfs_fop_write_iter() kernfs_drain_open_files() cgroup_file_write() kernfs_release_file() pressure_write() cgroup_file_release() ctx = of->priv; kfree(ctx); of->priv = NULL; cgroup_kn_unlock() cgroup_kn_lock_live() cgroup_get(cgrp) cgroup_kn_unlock() if (ctx->psi.trigger) // here, trigger uaf for ctx, that is of->priv cgroup_rmdir() is protected by cgroup_mutex, which also safeguards the memory deallocation of of->priv performed within cgroup_file_release(). However, the operations involving of->priv executed within pressure_write() are not entirely covered by the protection of cgroup_mutex. Consequently, if the code in pressure_write(), specifically the section handling the ctx variable, executes after cgroup_file_release() has completed, a uaf vulnerability involving of->priv is triggered. Therefore, the issue can be resolved by extending the scope of the cgroup_mutex lock within pressure_write() to encompass all code paths involving of->priv, thereby properly synchronizing the race condition occurring between cgroup_file_release() and pressure_write(). Moreover, if a live kn lock can be successfully acquired while executing the pressure write operation, it indicates that the cgroup deletion process has not yet reached its final stage; consequently, the priv pointer within open_file cannot be NULL. 
Therefore, the operation to retrieve the ctx value must be moved to a point after the live kn lock has been successfully acquired. In another situation, specifically after entering cgroup_kn_lock_live() but before acquiring cgroup_mutex, there exists a different class of race condition: CPU0: write memory.pressure CPU1: write cgroup.pressure=0 =========================== ============================= kernfs_fop_write_iter() kernfs_get_active_of(of) pressure_write() cgroup_kn_lock_live(memory.pressure) cgroup_tryget(cgrp) kernfs_break_active_protection(kn) ... blocks on cgroup_mutex cgroup_pressure_write() cgroup_kn_lock_live(cgroup.pressure) cgroup_file_show(memory.pressure, false) kernfs_show(false) kernfs_drain_open_files() cgroup_file_release(of) kfree(ctx) of->priv = NULL cgroup_kn_unlock() ... acquires cgroup_mutex ctx = of->priv; // may now be NULL if (ctx->psi.trigger) // NULL dereference Consequently, of->priv may be NULL, and the pressure write needs to check for this. Now that the scope of the cgroup_mutex has been expanded, the original explicit cgroup_get/put operations are no longer necessary, because acquiring/releasing the live kn lock inherently executes a cgroup get/put operation. 
[1] BUG: KASAN: slab-use-after-free in pressure_write+0xa4/0x210 kernel/cgroup/cgroup.c:4011 Call Trace: pressure_write+0xa4/0x210 kernel/cgroup/cgroup.c:4011 cgroup_file_write+0x36f/0x790 kernel/cgroup/cgroup.c:4311 kernfs_fop_write_iter+0x3b0/0x540 fs/kernfs/file.c:352 Allocated by task 9352: cgroup_file_open+0x90/0x3a0 kernel/cgroup/cgroup.c:4256 kernfs_fop_open+0x9eb/0xcb0 fs/kernfs/file.c:724 do_dentry_open+0x83d/0x13e0 fs/open.c:949 Freed by task 9353: cgroup_file_release+0xd6/0x100 kernel/cgroup/cgroup.c:4283 kernfs_release_file fs/kernfs/file.c:764 [inline] kernfs_drain_open_files+0x392/0x720 fs/kernfs/file.c:834 kernfs_drain+0x470/0x600 fs/kernfs/dir.c:525 Fixes: `0e94682b73` ("psi: introduce psi monitor") Reported-by: syzbot+33e571025d88efd1312c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=33e571025d88efd1312c Tested-by: syzbot+33e571025d88efd1312c@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Chen Ridong <chenridong@huaweicloud.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-17 07:25:09 -10:00
Linus Torvalds	d730905bc3	Merge tag 'mips_7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - Support for Mobileye EyeQ6Lplus - Cleanups and fixes * tag 'mips_7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (30 commits) MIPS/mtd: Handle READY GPIO in generic NAND platform data MIPS/input: Move RB532 button to GPIO descriptors MIPS: validate DT bootargs before appending them MIPS: Alchemy: Remove unused forward declaration MAINTAINERS: Mobileye: Add EyeQ6Lplus files MIPS: config: add eyeq6lplus_defconfig MIPS: Add Mobileye EyeQ6Lplus evaluation board dts MIPS: Add Mobileye EyeQ6Lplus SoC dtsi clk: eyeq: Add Mobileye EyeQ6Lplus OLB clk: eyeq: Adjust PLL accuracy computation clk: eyeq: Skip post-divisor when computing PLL frequency pinctrl: eyeq5: Add Mobileye EyeQ6Lplus OLB pinctrl: eyeq5: Use match data reset: eyeq: Add Mobileye EyeQ6Lplus OLB MIPS: Add Mobileye EyeQ6Lplus support dt-bindings: soc: mobileye: Add EyeQ6Lplus OLB dt-bindings: mips: Add Mobileye EyeQ6Lplus SoC MIPS: dts: loongson64g-package: Switch to Loongson UART driver mips: pci-mt7620: rework initialization procedure mips: pci-mt7620: add more register init values ...	2026-04-17 08:53:23 -07:00
Linus Torvalds	a10e80be63	Merge tag 'alpha-for-v7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha Pull alpha updates from Magnus Lindholm: "One fix to silence pgprot_modify() compiler warnings, and one patch adding SECCOMP/SECCOMP_FILTER support together with the syscall and ptrace fixes needed for it" * tag 'alpha-for-v7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha: alpha: Define pgprot_modify to silence tautological comparison warnings alpha: add support for SECCOMP and SECCOMP_FILTER	2026-04-17 08:34:43 -07:00
Linus Torvalds	01f492e181	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. 
Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for 
splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. 
This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. 
The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. 
don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: 
KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...	2026-04-17 07:18:03 -07:00
Borislav Petkov (AMD)	e55d98e775	x86/CPU: Fix FPDSS on Zen1 Zen1's hardware divider can leave, under certain circumstances, partial results from previous operations. Those results can be leaked by another, attacker thread. Fix that with a chicken bit. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-04-17 06:04:42 -07:00
Linus Torvalds	43cfbdda5a	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several fixes: - Add missing static const - Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is turned on - Fix selftest memory leak and syzkaller splat - Fix missed -EFAULT in fault reporting write() fops - Fix a race where map/unmap with the internal IOVA allocator can unmap things it should not" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix a race with concurrent allocation and unmap iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format iommufd: Fix return value of iommufd_fault_fops_write() iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc() iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy} iommufd: vfio compatibility extension check for noiommu mode iommufd: Constify struct dma_buf_attach_ops	2026-04-16 21:21:55 -07:00
Linus Torvalds	87fe97a184	Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl Pull fwctl updates from Jason Gunthorpe: - New fwctl driver for Broadcom RDMA NICs - Bug fix for non-modular builds * tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl: fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal fwctl/bnxt_fwctl: Add documentation entries fwctl/bnxt_fwctl: Add bnxt fwctl device fwctl/bnxt_en: Create an aux device for fwctl fwctl/bnxt_en: Refactor aux bus functions to be more generic fwctl/bnxt_en: Move common definitions to include/linux/bnxt/	2026-04-16 21:15:56 -07:00
Linus Torvalds	8242c709d4	Merge tag 'soc-arm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC ARM code updates from Arnd Bergmann: "These are again very minimal updates: - A workaround for firmware on Google Nexus 10 - A fix for early debugging on OMAP1 - A rework for Microchip SoC configuration - Cleanups on OMAP2 and R-Car-Gen2" * tag 'soc-arm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: omap2: dead code cleanup in kconfig for ARCH_OMAP4 ARM: OMAP1: Fix DEBUG_LL and earlyprintk on OMAP16XX arm64: Kconfig: provide a top-level switch for Microchip platforms ARM: shmobile: rcar-gen2: Use of_phandle_args_equal() helper ARM: omap: fix all kernel-doc warnings ARM: omap2: Replace scnprintf with strscpy in omap3_cpuinfo ARM: samsung: exynos5250: Allow CPU1 to boot	2026-04-16 20:45:14 -07:00
Linus Torvalds	231d703058	Merge tag 'soc-defconfig-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC defconfig updates from Arnd Bergmann: "As usual, we enable a number of additional device drivers as loadable modules, to support the added platforms. The largest change this time is for OMAP2/3, which were not that well supported in the generic arm32 defconfig. The Tegra SoC platforms are now enabled by default in Kconfig when ARCH_TEGRA is enabled, which means the defconfig change is done at the same time as the Kconfig change here" * tag 'soc-defconfig-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits) arch/arm: Drop CONFIG_FIRMWARE_EDID from defconfig files arm64: defconfig: Enable DP83TG720 PHY driver arm64: tegra: defconfig: Drop redundant ARCH_TEGRA_foo_SOC ARM: tegra: defconfig: Drop redundant ARCH_TEGRA_foo_SOC arm64: defconfig: enable pci-pwrctrl-generic as module arm64: defconfig: Enable Lontium LT8713sx driver arm64: defconfig: Enable Qualcomm Eliza SoC display clock controller arm64: defconfig: enable IPQ5210 RDP504 base configs arm64: defconfig: Enable Milos LPASS LPI pinctrl driver arm64: defconfig: Enable Kaanapali clock controllers arm64: defconfig: Enable configs for Arduino VENTUNO Q arm64: defconfig: Enable Qualcomm Eliza basic resource providers arm64: defconfig: Enable S5KJN1 camera sensor arm64: defconfig: Enable configurations for Toradex Aquila AM69 arm64: defconfig: remove SENSORS_SA67MCU arm64: defconfig: Enable Qualcomm WCD937x headphone codec as module arm64: defconfig: Enable QCOMTEE module for QTEE-enabled Qualcomm SoCs ARM: shmobile: defconfig: Refresh for v7.0-rc1 arm: multi_v7_defconfig: Enable more OMAP 3/4 related configs ARM: multi_v7_defconfig: omap2plus_defconfig: Enable ITE IT66121 driver ...	2026-04-16 20:40:20 -07:00
Linus Torvalds	31b43c079f	Merge tag 'soc-drivers-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "The driver updates again are all over the place with many minor fixes going into platform specific code. The most notable changes are: - Support for Microchip pic64gx system controllers - Work on cleaning up devicetree bindings for SoC drivers, and converting them into the new format - Lots of smaller changes for Qualcomm SoC drivers, including support for a number of newly supported chips - reset controller API cleanups and a new driver for Cix Sky1 - Reworks of the Tegra PMC and CBB drivers, along with a change to how individual Tegra SoCs get selected in Kconfig and BPMP firmware driver updates including a refresh of the ABI header to match the version used by firmware - STM32 updates to the firewall bus driver and support for the debug bus through OP-TEE - SCMI firmware driver improvements for reliability, in particular for dealing with broken firmware interrupts - Memory driver updates for Tegra, and a patch to remove the unused Baikal T1 driver" * tag 'soc-drivers-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (193 commits) firmware: arm_ffa: Use the correct buffer size during RXTX_MAP firmware: qcom: scm: Allow QSEECOM on Lenovo IdeaCentre Mini X clk: spear: fix resource leak in clk_register_vco_pll() reset: rzv2h-usb2phy: Add support for VBUS mux controller registration reset: rzv2h-usb2phy: Convert to regmap API dt-bindings: reset: renesas,rzv2h-usb2phy: Document RZ/G3E USB2PHY reset dt-bindings: reset: renesas,rzv2h-usb2phy: Add '#mux-state-cells' property soc: microchip: add mpfs gpio interrupt mux driver dt-bindings: soc: microchip: document PolarFire SoC's gpio interrupt mux gpio: mpfs: Add interrupt support soc: qcom: ubwc: add helpers to get programmable values soc: qcom: ubwc: add helper to get min_acc length firmware: qcom: scm: Register gunyah watchdog device soc: qcom: socinfo: 
Add SoC ID for SA8650P dt-bindings: arm: qcom,ids: Add SoC ID for SA8650P firmware: qcom: scm: Allow QSEECOM on Mahua CRD soc: qcom: wcnss: simplify allocation of req soc: qcom: pd-mapper: Add support for Eliza soc: qcom: aoss: compare against normalized cooling state soc: qcom: llcc: fix v1 SB syndrome register offset ...	2026-04-16 20:34:34 -07:00
Linus Torvalds	e65f4718a5	Merge tag 'soc-dt-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC devicetree updates from Arnd Bergmann: "A number of SoC platforms are adding modernized variants of their already supported chips this time, with a total of 12 new SoCs, and two older SoCs getting removed: - Qualcomm Glymur is a compute SoC using 18 Oryon-2 CPU cores - Qualcomm Mahua is a variant of Glymur with only 12 CPU cores, but largely identical. - Qualcomm Eliza is an embedded platform for mobile phone (SM7750) and IOT (QC7790S/M) workloads - Qualcomm IPQ5210 is a wireless networking SoC using Cortex-A53 cores - Qualcomm apq8084 and ipq806x had only rudimentary support but no actual products using them, so they are now gone. - Axis ARTPEC-9 is a follow-up to the ARTPEC-8 embedded SoC, using the Samsung SoC platform but now with Cortex-A55 cores - ARM Zena is a virtual platform in FVP using Cortex-A720AE cores, with additional versions planned to be merged in the future. - ARM corstone-1000-a320 is a reference platform for IOT, using low-end Cortex-A320 cores - Microchip LAN9691 is an updated 64-bit variant of the arm32 lan966x series of networking SoCs - Microchip PIC64GX is an embedded RISC-V chip using SIFIVE U54 CPU cores - Rockchip RV1103B is the low-end 32-bit single-core vision processor - Renesas RZ/G3L (r9a08g046) is an industrial embedded chip using Cortex-A55 cores, similar to the G3E and G3S variants we already supported. - NXP S32N79 is an automotive SoC using Cortex-A78AE cores, a significant upgrade from the older S32V and S32G series These all come with at least one reference board or an initial product using these, in total there are 67 newly added boards. 
The ones for already supported SoCs are: - Two more Aspeed BMC based boards - Three older tablets based on 32-bit OMAP4 and Exynos5 SoCs - One Set-top-box based on Allwinner H6 - 22 additional industrial/embedded boards using 64-bit NXP i.MX8M or i.MX9 SoCs - 20 Qualcomm SoC based machines across all possible markets: workstation, gaming, laptop, phone, networking, reference, ... - Three more Rockchips rk35xx based boards - Four variants of the Toradex Verdin using TI AM62 Other notable bits are: - A cleanup for the 32-bit Tegra paz00 board moved the last board specific code on Tegra into equivalent dts syntax. - There continues to be a significant number of fixes for static checking of dtc syntax, but it feels like this is slowing down, hopefully getting into a state where most known issues are addressed - Additional hardware support for many existing boards across SoC families, notably Qualcomm, Broadcom, i.MX2, i.MX6, Rockchips, STM32, Mediatek, Tegra, TI and Microchip" * tag 'soc-dt-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (841 commits) arm64: dts: ti: k3: Use memory-region-names for r5f ARM: dts: imx: Add DT overlays for DH i.MX6 DHCOM SoM and boards ARM: dts: imx6sx: remove fallback compatible string fsl,imx28-lcdif ARM: dts: imx25: rename node name tcq to touchscreen ARM: dts: imx: b850v3: Disable unused usdhc4 ARM: dts: imx: b850v3: Define GPIO line names ARM: dts: imx: b850v3: Use alphabetical sorting ARM: dts: imx: bx50v3: Configure phy-mode to eliminate a warning ARM: dts: imx: bx50v3: Configure switch PHY max-speed to 100Mbps ARM: dts: imx7ulp: Add CPU clock and OPP table support ARM: dts: imx7-mba7: Deassert BOOT_EN after boot ARM: dts: tqma7: add boot phase properties ARM: dts: imx7s: add boot phase properties ARM: dts: tqma6ul[l]: correct spelling of TQ-Systems ARM: dts: mba6ulx: add boot phase properties ARM: dts: imx6ul[l]-tqma6ul[l]: add boot phase properties ARM: dts: imx6ul/imx6ull: add boot phase properties ARM: dts: 
imx6qdl-mba6: add boot phase properties ARM: dts: imx6qdl-tqma6: add boot phase properties ARM: dts: imx6qdl: add boot phase properties ...	2026-04-16 20:28:48 -07:00
Linus Torvalds	440d6635b2	Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "pid: make sub-init creation retryable" (Oleg Nesterov) Make creation of init in a new namespace more robust by clearing away some historical cruft which is no longer needed. Also some documentation fixups - "selftests/fchmodat2: Error handling and general" (Mark Brown) Fix and a cleanup for the fchmodat2() syscall selftest - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko) - "hung_task: Provide runtime reset interface for hung task detector" (Aaron Tomlin) Give administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count - "tools/getdelays: use the static UAPI headers from tools/include/uapi" (Thomas Weißschuh) Teach getdelays to use the in-kernel UAPI headers rather than the system-provided ones - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta) Several cleanups and fixups to the hardlockup detector code and its documentation - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law) A couple of small/theoretical fixes in the bch code - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo) - "cleanup the RAID5 XOR library" (Christoph Hellwig) A quite far-reaching cleanup to this code. I can't do better than to quote Christoph: "The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. 
Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead" - "lib/list_sort: Clean up list_sort() scheduling workarounds" (Kuan-Wei Chiu) Clean up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt) Fix a few bugs in the scatterlist code, add in-kernel tests for the now-fixed bugs and fix a leak in the test itself - "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" (Coiby Xu) Enable support of the LUKS-encrypted device dump target on arm64 and powerpc - "ocfs2: consolidate extent list validation into block read callbacks" (Joseph Qi) Cleanup, simplify, and make more robust ocfs2's validation of extent list fields (Kernel test robot loves mounting corrupted fs images!) 
* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits) ocfs2: validate group add input before caching ocfs2: validate bg_bits during freefrag scan ocfs2: fix listxattr handling when the buffer is full doc: watchdog: fix typos etc update Sean's email address ocfs2: use get_random_u32() where appropriate ocfs2: split transactions in dio completion to avoid credit exhaustion ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path() ocfs2: validate extent block list fields during block read ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec() ocfs2: validate dx_root extent list fields during block read ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY ocfs2: handle invalid dinode in ocfs2_group_extend .get_maintainer.ignore: add Askar ocfs2: validate bg_list extent bounds in discontig groups checkpatch: exclude forward declarations of const structs tools/accounting: handle truncated taskstats netlink messages taskstats: set version in TGID exit notifications ocfs2/heartbeat: fix slot mapping rollback leaks on error paths arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel ...	2026-04-16 20:11:56 -07:00
Linus Torvalds	0b2f2b1fc0	Merge tag 'v7.1-rc1-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client updates from Steve French: - Fix integer underflow in encrypted read - Four debug patches, adding a few tracepoints - Minor update to MAINTAINERS file (preferred server URL for cifs) - Remove the BUG_ON() calls in d_mark_tmpfile_name * tag 'v7.1-rc1-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: MAINTAINERS: change git.samba.org to https smb: client: fix integer underflow in receive_encrypted_read() smb: client: add tracepoints for deferred handle caching smb: client: add oplock level to smb3_open_done tracepoint smb: client: add tracepoint for local lock conflicts smb: client: add tracepoints for lock operations vfs: get rid of BUG_ON() in d_mark_tmpfile_name()	2026-04-16 19:14:55 -07:00
Linus Torvalds	3cd8b194bf	Merge tag 'v7.1-rc-part1-smbdirect-fixes' of git://git.samba.org/ksmbd Pull smbdirect updates from Steve French: "Move smbdirect server and client code to common directory: - temporary use of smbdirect_all_c_files.c to allow micro steps - factor out common functions into a smbdirect.ko. - convert cifs.ko to use smbdirect.ko - convert ksmbd.ko to use smbdirect.ko - let smbdirect.ko use global workqueues - move ib_client logic from ksmbd.ko into smbdirect.ko - remove smbdirect_all_c_files.c hack again - some locking and teardown related fixes on top" * tag 'v7.1-rc-part1-smbdirect-fixes' of git://git.samba.org/ksmbd: (145 commits) smb: smbdirect: let smbdirect_connection_deregister_mr_io unlock while waiting smb: smbdirect: fix the logic in smbdirect_socket_destroy_sync() without an error smb: smbdirect: fix copyright header of smbdirect.h smb: smbdirect: change smbdirect_socket_parameters.{initiator_depth,responder_resources} to __u16 smb: smbdirect: remove unused SMBDIRECT_USE_INLINE_C_FILES logic smb: server: no longer use smbdirect_socket_set_custom_workqueue() smb: client: no longer use smbdirect_socket_set_custom_workqueue() smb: smbdirect: introduce global workqueues smb: smbdirect: prepare use of dedicated workqueues for different steps smb: smbdirect: remove unused smbdirect_connection_mr_io_recovery_work() smb: smbdirect: wrap rdma_disconnect() in rdma_[un]lock_handler() smb: server: make use of smbdirect_netdev_rdma_capable_mode_type() smb: smbdirect: introduce smbdirect_netdev_rdma_capable_mode_type() smb: server: make use of smbdirect.ko smb: server: remove unused ksmbd_transport_ops.prepare() smb: server: make use of smbdirect_socket_{listen,accept}() smb: server: only use public smbdirect functions smb: server: make use of smbdirect_socket_create_accepting()/smbdirect_socket_release() smb: server: make use of smbdirect_{socket_init_accepting,connection_wait_for_connected}() smb: server: make use of 
smbdirect_connection_send_iter() and related functions ...	2026-04-16 08:25:04 -07:00
Linus Torvalds	d3d9443f8b	Merge tag 'livepatching-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching updates from Petr Mladek: - Add two new selftests * tag 'livepatching-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: selftests/livepatch: add test for module function patching selftests: livepatch: test-ftrace: livepatch a traced function	2026-04-16 08:13:27 -07:00
Linus Torvalds	090748e62f	Merge tag 'm68k-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Add support for QEMU virt-ctrl, and use it for system reset and power off on the virt platform - defconfig updates - Miscellaneous fixes and improvements * tag 'm68k-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: virt: Switch to qemu-virt-ctrl driver power: reset: Add QEMU virt-ctrl driver m68k: defconfig: Update defconfigs for v7.0-rc1 m68k: emu: Replace unbounded sprintf() in nfhd_init_one() m68k: uapi: Add ucontext.h m68k: defconfig: hp300: Enable monochrome and 16-color linux logos m68k: q40: Remove commented out code	2026-04-16 08:11:01 -07:00
Linus Torvalds	948ef73f7e	Merge tag 'efi-next-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Again not a busy cycle for EFI, just some minor tweaks and bug fixes: - Enable boot graphics resource table (BGRT) on Xen/x86 - Correct a misguided assumption in the memory attributes table sanity check - Start tagging efi_mem_reserve()'d regions as MEMBLOCK_RSRV_KERN - Some other minor fixes and cleanups" * tag 'efi-next-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/capsule-loader: fix incorrect sizeof in phys array reallocation efi: Tag memblock reservations of boot services regions as RSRV_KERN memblock: Permit existing reserved regions to be marked RSRV_KERN efi/memattr: Fix thinko in table size sanity check efi: libstub: fix type of fdt 32 and 64bit variables efi: Drop unused efi_range_is_wc() function efi: Enable BGRT loading under Xen efi: make efi_mem_type() and efi_mem_attributes() work on Xen PV	2026-04-16 08:06:25 -07:00
Linus Torvalds	f0bf3eac92	Merge tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu) - Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro) - Conversions to const struct class in support of class_create() deprecation (Jori Koolstra) - Improve selftest compiler compatibility by avoiding initializer on variable-length array (Manish Honap) - Define new uAPI for drivers supporting migration to advise user- space of new initial data for reducing target startup latency. Implemented for mlx5 vfio-pci variant driver (Yishai Hadas) - Enable vfio selftests on aarch64, not just cross-compiles reporting arm64 (Ted Logan) - Update vfio selftest driver support to include additional DSA devices (Yi Lai) - Unconditionally include debugfs root pointer in vfio device struct, avoiding a build failure seen in hisi_acc variant driver without debugfs otherwise (Arnd Bergmann) - Add support for the s390 ISM (Internal Shared Memory) device via a new variant driver. 
The device is unique in the size of its BAR space (256TiB) and lack of mmap support (Julian Ruess) - Enforce that vfio-pci drivers implement a name in their ops structure for use in sequestering SR-IOV VFs (Alex Williamson) - Prune leftover group notifier code (Paolo Bonzini) - Fix Xe vfio-pci variant driver to avoid migration support as a dependency in the reset path and missing release call (Michał Winiarski) * tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits) vfio/xe: Add a missing vfio_pci_core_release_dev() vfio/xe: Reorganize the init to decouple migration from reset vfio: remove dead notifier code vfio/pci: Require vfio_device_ops.name MAINTAINERS: add VFIO ISM PCI DRIVER section vfio/ism: Implement vfio_pci driver for ISM devices vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it vfio: unhide vdev->debug_root vfio/qat: add support for Intel QAT 420xx VFs vfio: selftests: Support DMR and GNR-D DSA devices vfio: selftests: Build tests on aarch64 vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO vfio/mlx5: consider inflight SAVE during PRE_COPY net/mlx5: Add IFC bits for migration state vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set() vfio: uapi: fix comment typo vfio: mdev: replace mtty_dev->vd_class with a const struct class ...	2026-04-16 08:01:16 -07:00
Petr Mladek	448c0f8cb7	Merge branch 'for-7.1/module-function-test' into for-linus	2026-04-16 10:33:43 +02:00
Stefan Metzmacher	d09a040c18	smb: smbdirect: let smbdirect_connection_deregister_mr_io unlock while waiting We should not hold a mutex locked during wait_for_completion(); holding a reference is enough. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Henrique Carvalho <henrique.carvalho@suse.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	25c2e34931	smb: smbdirect: fix the logic in smbdirect_socket_destroy_sync() without an error If smbdirect_socket_destroy_sync() is called and sc->first_error was not set, we should set -ESHUTDOWN; that's a better condition than doing it only implicitly with the sc->status < SMBDIRECT_SOCKET_DISCONNECTING check. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Henrique Carvalho <henrique.carvalho@suse.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	3892007f2b	smb: smbdirect: fix copyright header of smbdirect.h Everything in smbdirect.h was taken from my out of tree prototype. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Henrique Carvalho <henrique.carvalho@suse.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	735610d0ce	smb: smbdirect: change smbdirect_socket_parameters.{initiator_depth,responder_resources} to __u16 We still limit this to U8_MAX as the rdma api only uses __u8 and that's also the limit for Infiniband and RoCE*, while iWarp would be able to support larger values at the protocol level. As struct smbdirect_socket_parameters will be part of the uapi for IPPROTO_SMBDIRECT in future, change it now even if userspace sockets won't be supported yet. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Acked-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	aa43bb2c0f	smb: smbdirect: remove unused SMBDIRECT_USE_INLINE_C_FILES logic We always build as standalone module (or as part of the core kernel). This also removes unused elements from struct smbdirect_socket and unused exports. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	649c47559a	smb: server: no longer use smbdirect_socket_set_custom_workqueue() smbdirect.ko has global workqueues now, so we should use these default ones. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	73dc52d294	smb: client: no longer use smbdirect_socket_set_custom_workqueue() smbdirect.ko has global workqueues now, so we should use these default ones. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	1adde16a9e	smb: smbdirect: introduce global workqueues These will be used in future and callers should no longer use smbdirect_socket_set_custom_workqueue(). Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	e4ce1fca04	smb: smbdirect: prepare use of dedicated workqueues for different steps This is a preparation in order to have global workqueues in the smbdirect module instead of having the caller provide one. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	00ac2a4fe0	smb: smbdirect: remove unused smbdirect_connection_mr_io_recovery_work() This would actually never be used as we only move to SMBDIRECT_MR_ERROR when we directly call smbdirect_socket_schedule_cleanup(). Doing an ib_dereg_mr/ib_alloc_mr dance on working connection is not needed and it's also pointless on a broken connection as we don't reuse any ib_pd. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	a40e6f0166	smb: smbdirect: wrap rdma_disconnect() in rdma_[un]lock_handler() This might not be needed, but it controls the order of ib_drain_qp() and rdma_disconnect(). Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	33b2894e8d	smb: server: make use of smbdirect_netdev_rdma_capable_mode_type() This removes what is basically the same logic. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	81a7a3a0fa	smb: smbdirect: introduce smbdirect_netdev_rdma_capable_mode_type() This is basically a copy of ksmbd_rdma_capable_netdev() in the server, but this also prints a message when a device is renamed. The differences are: - It uses rdma_for_each_port() instead of implementing the same logic again. - It returns RDMA_NODE_{UNSPECIFIED,IB_CA,RNIC} values instead of bool Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	50bdab9ae4	smb: server: make use of smbdirect.ko This means we no longer inline the common smbdirect .c files and use the exported functions from the module instead. Note the connection specific logging is still redirected to ksmbd.ko functions via smbdirect_socket_set_logging(). We still don't use a real socket layer, but we're very close... Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	98bdc5fda9	smb: server: remove unused ksmbd_transport_ops.prepare() This is no longer needed for smbdirect. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00


1440385 Commits