linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-02 04:41:10 -04:00

Author	SHA1	Message	Date
Shengming Hu	cafe4074a7	watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() cpustat_tail indexes cpustat_util[], which is a NUM_SAMPLE_PERIODS-sized ring buffer. need_counting_irqs() currently wraps the index using NUM_HARDIRQ_REPORT, which only happens to match NUM_SAMPLE_PERIODS. Use NUM_SAMPLE_PERIODS for the wrap to keep the ring math correct even if the NUM_HARDIRQ_REPORT or NUM_SAMPLE_PERIODS changes. Link: https://lkml.kernel.org/r/tencent_7068189CB6D6689EB353F3D17BF5A5311A07@qq.com Fixes: `e9a9292e23` ("watchdog/softlockup: Report the most frequent interrupts") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Run <zhang.run@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:34 -08:00
Tycho Andersen (AMD)	0758293d5d	kho: fix doc for kho_restore_pages() This function returns NULL if kho_restore_page() returns NULL, which happens in a couple of corner cases. It never returns an error code. Link: https://lkml.kernel.org/r/20260123190506.1058669-1-tycho@kernel.org Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:34 -08:00
Pasha Tatashin	f653ff7af9	tests/liveupdate: add in-kernel liveupdate test Introduce an in-kernel test module to validate the core logic of the Live Update Orchestrator's File-Lifecycle-Bound feature. This provides a low-level, controlled environment to test FLB registration and callback invocation without requiring userspace interaction or actual kexec reboots. The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option. Link: https://lkml.kernel.org/r/20251218155752.3045808-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:33 -08:00
Pasha Tatashin	cab056f2aa	liveupdate: luo_flb: introduce File-Lifecycle-Bound global state Introduce a mechanism for managing global kernel state whose lifecycle is tied to the preservation of one or more files. This is necessary for subsystems where multiple preserved file descriptors depend on a single, shared underlying resource. An example is HugeTLB, where multiple file descriptors such as memfd and guest_memfd may rely on the state of a single HugeTLB subsystem. Preserving this state for each individual file would be redundant and incorrect. The state should be preserved only once when the first file is preserved, and restored/finished only once the last file is handled. This patch introduces File-Lifecycle-Bound (FLB) objects to solve this problem. An FLB is a global, reference-counted object with a defined set of operations: - A file handler (struct liveupdate_file_handler) declares a dependency on one or more FLBs via a new registration function, liveupdate_register_flb(). - When the first file depending on an FLB is preserved, the FLB's .preserve() callback is invoked to save the shared global state. The reference count is then incremented for each subsequent file. - Conversely, when the last file is unpreserved (before reboot) or finished (after reboot), the FLB's .unpreserve() or .finish() callback is invoked to clean up the global resource. The implementation includes: - A new set of ABI definitions (luo_flb_ser, luo_flb_head_ser) and a corresponding FDT node (luo-flb) to serialize the state of all active FLBs and pass them via Kexec Handover. - Core logic in luo_flb.c to manage FLB registration, reference counting, and the invocation of lifecycle callbacks. - An API (liveupdate_flb_get/_incoming/_outgoing) for other kernel subsystems to safely access the live object managed by an FLB, both before and after the live update. 
This framework provides the necessary infrastructure for more complex subsystems like IOMMU, VFIO, and KVM to integrate with the Live Update Orchestrator. Link: https://lkml.kernel.org/r/20251218155752.3045808-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:33 -08:00
Pasha Tatashin	6845645eef	liveupdate: luo_file: Use private list Switch LUO to use the private list iterators. Link: https://lkml.kernel.org/r/20251218155752.3045808-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:33 -08:00
Arnd Bergmann	90079798f1	delayacct: fix uapi timespec64 definition The custom definition of 'struct timespec64' is incompatible with both the kernel's internal definition and the glibc type, at least on big-endian targets that have the tv_nsec field in a different place, and the definition clashes with any userspace that also defines a timespec64 structure. Running the header check with -Wpadding enabled produces this output that warns about the incorrect padding: usr/include/linux/taskstats.h:25:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] Remove the hack and instead use the regular __kernel_timespec type that is meant to be used in uapi definitions. Link: https://lkml.kernel.org/r/20260202095906.1344100-1-arnd@kernel.org Fixes: 29b63f6eff0e ("delayacct: add timestamp of delay max") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Jiang Kun <jiang.kun2@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:32 -08:00
Pnina Feder	2e171ab29f	panic: add panic_force_cpu= parameter to redirect panic to a specific CPU Some platforms require panic handling to execute on a specific CPU for crash dump to work reliably. This can be due to firmware limitations, interrupt routing constraints, or platform-specific requirements where only a single CPU is able to safely enter the crash kernel. Add the panic_force_cpu= kernel command-line parameter to redirect panic execution to a designated CPU. When the parameter is provided, the CPU that initially triggers panic forwards the panic context to the target CPU via IPI, which then proceeds with the normal panic and kexec flow. The IPI delivery is implemented as a weak function (panic_smp_redirect_cpu) so architectures with NMI support can override it for more reliable delivery. If the specified CPU is invalid, offline, or a panic is already in progress on another CPU, the redirection is skipped and panic continues on the current CPU. [pnina.feder@mobileye.com: fix unused variable warning] Link: https://lkml.kernel.org/r/20260126122618.2967950-1-pnina.feder@mobileye.com Link: https://lkml.kernel.org/r/20260122102457.1154599-1-pnina.feder@mobileye.com Signed-off-by: Pnina Feder <pnina.feder@mobileye.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-03 08:21:26 -08:00
Evangelos Petrongonas	427b2535f5	kho: skip memoryless NUMA nodes when reserving scratch areas kho_reserve_scratch() iterates over all online NUMA nodes to allocate per-node scratch memory. On systems with memoryless NUMA nodes (nodes that have CPUs but no memory), memblock_alloc_range_nid() fails because there is no memory available on that node. This causes KHO initialization to fail and kho_enable to be set to false. Some ARM64 systems have NUMA topologies where certain nodes contain only CPUs without any associated memory. These configurations are valid and should not prevent KHO from functioning. Fix this by only counting nodes that have memory (N_MEMORY state) and skip memoryless nodes in the per-node scratch allocation loop. Link: https://lkml.kernel.org/r/20260120175913.34368-1-epetron@amazon.de Fixes: `3dc92c3114` ("kexec: add Kexec HandOver (KHO) generation helpers"). Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:08 -08:00
Vasily Gorbik	96a54b8ffc	crash_dump: fix dm_crypt keys locking and ref leak crash_load_dm_crypt_keys() reads dm-crypt volume keys from the user keyring. It uses user_key_payload_locked() without holding key->sem, which makes lockdep complain when kexec_file_load() assembles the crash image: ============================= WARNING: suspicious RCU usage ----------------------------- ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by kexec/4875. stack backtrace: Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 lockdep_rcu_suspicious.cold+0x4e/0x96 crash_load_dm_crypt_keys+0x314/0x390 bzImage64_load+0x116/0x9a0 ? __lock_acquire+0x464/0x1ba0 __do_sys_kexec_file_load+0x26a/0x4f0 do_syscall_64+0xbd/0x430 entry_SYSCALL_64_after_hwframe+0x77/0x7f In addition, the key returned by request_key() is never key_put()'d, leaking a key reference on each load attempt. Take key->sem while copying the payload and drop the key reference afterwards. Link: https://lkml.kernel.org/r/patch.git-2d4d76083a5c.your-ad-here.call-01769426386-ext-2560@work.hours Fixes: `479e58549b` ("crash_dump: store dm crypt keys in kdump reserved memory") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:08 -08:00
Mike Rapoport (Microsoft)	b50634c5e8	kho: cleanup error handling in kho_populate() * use dedicated labels for error handling instead of checking if a pointer is not null to decide if it should be unmapped * drop assignment of values to err that are only used to print a numeric error code, there are pr_warn()s for each failure already so printing a numeric error code in the next line does not add anything useful Link: https://lkml.kernel.org/r/20260122121757.575987-1-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:08 -08:00
Ondrej Mosnacek	0895a000e4	ucount: check for CAP_SYS_RESOURCE using ns_capable_noaudit() The user.* sysctls implement the ctl_table_root::permissions hook and they override the file access mode based on the CAP_SYS_RESOURCE capability (at most rwx if capable, at most r-- if not). The capability is being checked unconditionally, so if an LSM denies the capability, an audit record may be logged even when access is in fact granted. Given the logic in the set_permissions() function in kernel/ucount.c and the unfortunate way the permission checking is implemented, it doesn't seem viable to avoid false positive denials by deferring the capability check. Thus, do the same as in net_ctl_permissions() (net/sysctl_net.c) - switch from ns_capable() to ns_capable_noaudit(), so that the check never logs an audit record. Link: https://lkml.kernel.org/r/20260122140745.239428-1-omosnace@redhat.com Fixes: `dbec28460a` ("userns: Add per user namespace sysctls.") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Acked-by: Serge Hallyn <serge@hallyn.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Gladkov <legion@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:08 -08:00
Li Chen	480e1d5c64	kexec: derive purgatory entry from symbol kexec_load_purgatory() derives image->start by locating e_entry inside an SHF_EXECINSTR section. If the purgatory object contains multiple executable sections with overlapping sh_addr, the entrypoint check can match more than once and trigger a WARN. Derive the entry section from the purgatory_start symbol when present and compute image->start from its final placement. Keep the existing e_entry fallback for purgatories that do not expose the symbol. WARNING: kernel/kexec_file.c:1009 at kexec_load_purgatory+0x395/0x3c0, CPU#10: kexec/1784 Call Trace: <TASK> bzImage64_load+0x133/0xa00 __do_sys_kexec_file_load+0x2b3/0x5c0 do_syscall_64+0x81/0x610 entry_SYSCALL_64_after_hwframe+0x76/0x7e [me@linux.beauty: move helper to avoid forward declaration, per Baoquan] Link: https://lkml.kernel.org/r/20260128043511.316860-1-me@linux.beauty Link: https://lkml.kernel.org/r/20260120124005.148381-1-me@linux.beauty Fixes: `8652d44f46` ("kexec: support purgatories with .text.hot sections") Signed-off-by: Li Chen <me@linux.beauty> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Graf <graf@amazon.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Li Chen <me@linux.beauty> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ricardo Ribalda Delgado <ribalda@chromium.org> Cc: Ross Zwisler <zwisler@google.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:07 -08:00
Wang Yaxin	503efe850c	delayacct: add timestamp of delay max Problem ======= Commit `658eb5ab91` ("delayacct: add delay max to record delay peak") introduced the delay max for getdelays, which records abnormal latency peaks and helps us understand the magnitude of such delays. However, the peak latency value alone is insufficient for effective root cause analysis. Without the precise timestamp of when the peak occurred, we still lack the critical context needed to correlate it with other system events. Solution ======== To address this, we need to additionally record a precise timestamp when the maximum latency occurs. By correlating this timestamp with system logs and monitoring metrics, we can identify processes with abnormal resource usage at the same moment, which can help us to pinpoint root causes. Use Case ======== bash-4.4# ./getdelays -d -t 227 print delayacct stats ON TGID 227 CPU count real total virtual total delay total delay average delay max delay min delay max timestamp 46 188000000 192348334 4098012 0.089ms 0.429260ms 0.051205ms 2026-01-15T15:06:58 IO count delay total delay average delay max delay min delay max timestamp 0 0 0.000ms 0.000000ms 0.000000ms N/A SWAP count delay total delay average delay max delay min delay max timestamp 0 0 0.000ms 0.000000ms 0.000000ms N/A RECLAIM count delay total delay average delay max delay min delay max timestamp 0 0 0.000ms 0.000000ms 0.000000ms N/A THRASHING count delay total delay average delay max delay min delay max timestamp 0 0 0.000ms 0.000000ms 0.000000ms N/A COMPACT count delay total delay average delay max delay min delay max timestamp 0 0 0.000ms 0.000000ms 0.000000ms N/A WPCOPY count delay total delay average delay max delay min delay max timestamp 182 19413338 0.107ms 0.547353ms 0.022462ms 2026-01-15T15:05:24 IRQ count delay total delay average delay max delay min delay max timestamp 0 0 0.000ms 0.000000ms 0.000000ms N/A Link: https://lkml.kernel.org/r/20260119100241520gWubW8-5QfhSf9gjqcc_E@zte.com.cn Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:06 -08:00
Steven Rostedt	86e685ff36	tracing: remove size parameter in __trace_puts() The __trace_puts() function takes a string pointer and the size of the string itself. All users currently simply pass in the strlen() of the string it is also passing in. There's no reason to pass in the size. Instead have the __trace_puts() function do the strlen() within the function itself. This fixes a header recursion issue where using strlen() in the macro calling __trace_puts() requires adding #include <linux/string.h> in order to use strlen(). Removing the use of strlen() from the header fixes the recursion issue. Link: https://lore.kernel.org/all/aUN8Hm377C5A0ILX@yury/ Link: https://lkml.kernel.org/r/20260116042510.241009-6-ynorov@nvidia.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Yury Norov <ynorov@nvidia.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:05 -08:00
Pratyush Yadav	8f1081892d	kho: simplify page initialization in kho_restore_page() When restoring a page (from kho_restore_pages()) or folio (from kho_restore_folio()), KHO must initialize the struct page. The initialization differs slightly depending on if a folio is requested or a set of 0-order pages is requested. Conceptually, it is quite simple to understand. When restoring 0-order pages, each page gets a refcount of 1 and that's it. When restoring a folio, head page gets a refcount of 1 and tail pages get 0. kho_restore_page() tries to combine the two separate initialization flows into one piece of code. While it works fine, it is more complicated to read than it needs to be. Make the code simpler by splitting the two initialization paths into two separate functions. This improves readability by clearly showing how each type must be initialized. Link: https://lkml.kernel.org/r/20260116112217.915803-3-pratyush@kernel.org Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:04 -08:00
Pratyush Yadav	840fe43d37	kho: use unsigned long for nr_pages Patch series "kho: clean up page initialization logic", v2. This series simplifies the page initialization logic in kho_restore_page(). It was originally only a single patch [0], but on Pasha's suggestion, I added another patch to use unsigned long for nr_pages. Technically speaking, the patches aren't related and can be applied independently, but bundling them together since patch 2 relies on 1 and it is easier to manage them this way. This patch (of 2): With 4k pages, a 32-bit nr_pages can span up to 16 TiB. While it is a lot, there exist systems with terabytes of RAM. gup is also moving to using long for nr_pages. Use unsigned long and make KHO future-proof. Link: https://lkml.kernel.org/r/20260116112217.915803-1-pratyush@kernel.org Link: https://lkml.kernel.org/r/20260116112217.915803-2-pratyush@kernel.org Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 16:16:04 -08:00
Andrew Morton	2eec08ff09	Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable to pick up changes required to merge "kho: use unsigned long for nr_pages".	2026-01-31 16:12:21 -08:00
Pratyush Yadav (Google)	6ca9de3600	kho: print which scratch buffer failed to be reserved When scratch area fails to reserve, KHO prints a message indicating that. But it doesn't say which scratch failed to allocate. This can be useful information for debugging. Even more so when the failure is hard to reproduce. Along with the current message, also print which exact scratch area failed to be reserved. Link: https://lkml.kernel.org/r/20260116165416.1262531-1-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Matlack <dmatlack@google.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:15 -08:00
Finn Thain	3bb83c9109	bpf: explicitly align bpf_res_spin_lock Patch series "Align atomic storage", v7. This series adds the __aligned attribute to atomic_t and atomic64_t definitions in include/linux and include/asm-generic (respectively) to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh. This series also adds Kconfig options to enable a new run-time warning to help reveal misaligned atomic accesses on platforms which don't trap that. The performance impact is expected to vary across platforms and workloads. The measurements I made on m68k show that some workloads run faster and others slower. This patch (of 4): Align bpf_res_spin_lock to avoid a BUILD_BUG_ON() when the alignment changes, as it will do on m68k when, in a subsequent patch, the minimum alignment of the atomic_t member of struct rqspinlock gets increased from 2 to 4. Drop the BUILD_BUG_ON() as it becomes redundant. Link: https://lkml.kernel.org/r/cover.1768281748.git.fthain@linux-m68k.org Link: https://lkml.kernel.org/r/8a83876b07d1feacc024521e44059ae89abbb1ea.1768281748.git.fthain@linux-m68k.org Signed-off-by: Finn Thain <fthain@linux-m68k.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gary Guo <gary@garyguo.net> Cc: Guo Ren <guoren@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: KP Singh <kpsingh@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Sasha Levin (Microsoft) <sashal@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:14 -08:00
Mathieu Desnoyers	5e65b5ca7d	tsacct: skip all kernel threads This patch is a preparation step for HPCC, for the OOM killer improvements. I suspect that this patch is useful on its own, because it really makes no sense to sum up accounting statistics of use_mm within kernel threads which are only temporarily using those mm. When we hit acct_account_cputime within an irq handler over a kthread that happens to use a userspace mm, we end up summing up the mm's RSS into the tsk acct_rss_mem1, which eventually decays. I don't see a good rationale behind tracking the mm's rss in that way when a kthread uses a userspace mm temporarily through use_mm. It causes issues with init_mm and efi_mm which only partially initialize their mm_struct when introducing the new hierarchical percpu counters to replace RSS counters, which requires a pointer dereference when reading the approximate counter sum. The current percpu counters simply load a zeroed atomic counter, which happens to work. Skip all kernel threads in acct_account_cputime(), not just those that happen to have a NULL mm. This is a preparation step before introducing the hierarchical percpu counters. Link: https://lkml.kernel.org/r/20251224173810.648699-2-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Brown <broonie@kernel.org> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Christan König <christian.koenig@amd.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Liam R . Howlett" <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Martin Liu <liumartin@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:13 -08:00
Long Wei	25929dae28	kho: remove duplicate header file references kexec_handover_internal.h is included twice in kexec_handover.c. Remove the redundant first inclusion to eliminate the duplication. Link: https://lkml.kernel.org/r/20251216114400.2677311-1-longwei27@huawei.com Signed-off-by: Long Wei <longwei27@huawei.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: hewenliang <hewenliang4@huawei.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:13 -08:00
mingzhu.wang(王明珠)	2bbd9e1d14	kernel/fork: update obsolete use_mm references to kthread_use_mm The comment for get_task_mm() in kernel/fork.c incorrectly references the deprecated function `use_mm()`, which has been renamed to `kthread_use_mm()` in kernel/kthread.c. This patch updates the documentation to reflect the current function names, ensuring accuracy when developers refer to the kernel thread memory context API. No functional changes were introduced. Link: https://lkml.kernel.org/r/KUZPR04MB8965F954108B4DD7E8FFDB2B8F84A@KUZPR04MB8965.apcprd04.prod.outlook.com Signed-off-by: mingzhu.wang <mingzhu.wang@transsion.com> Cc: Ben Segall <bsegall@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiazi Li <jqqlijiazi@gmail.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:12 -08:00
Jason Miu	ac2d8102c4	kho: relocate vmalloc preservation structure to KHO ABI header The `struct kho_vmalloc` defines the in-memory layout for preserving vmalloc regions across kexec. This layout is a contract between kernels and part of the KHO ABI. To reflect this relationship, the related structs and helper macros are relocated to the ABI header, `include/linux/kho/abi/kexec_handover.h`. This move places the structure's definition under the protection of the KHO_FDT_COMPATIBLE version string. The structure and its components are now also documented within the ABI header to describe the contract and prevent ABI breaks. [rppt@kernel.org: update comment, per Pratyush] Link: https://lkml.kernel.org/r/aW_Mqp6HcqLwQImS@kernel.org Link: https://lkml.kernel.org/r/20260105165839.285270-6-rppt@kernel.org Signed-off-by: Jason Miu <jasonmiu@google.com> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:12 -08:00
Jason Miu	5e1ea1e27b	kho: introduce KHO FDT ABI header Introduce the `include/linux/kho/abi/kexec_handover.h` header file, which defines the stable ABI for the KHO mechanism. This header specifies how preserved data is passed between kernels using an FDT. The ABI contract includes the FDT structure, node properties, and the "kho-v1" compatible string. By centralizing these definitions, this header serves as the foundational agreement for inter-kernel communication of preserved states, ensuring forward compatibility and preventing misinterpretation of data across kexec transitions. Since the ABI definitions are now centralized in the header files, the YAML files that previously described the FDT interfaces are redundant. These redundant files have therefore been removed. Link: https://lkml.kernel.org/r/20260105165839.285270-5-rppt@kernel.org Signed-off-by: Jason Miu <jasonmiu@google.com> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:12 -08:00
Mike Rapoport (Microsoft)	a6f4e56828	kho: docs: combine concepts and FDT documentation Currently index.rst in KHO documentation looks empty and sad as it only contains links to "Kexec Handover Concepts" and "KHO FDT" chapters. Inline contents of these chapters into index.rst to provide a single coherent chapter describing KHO. While on it, drop parts of the KHO FDT description that will be superseded by addition of KHO ABI documentation. [rppt@kernel.org: fix Documentation/core-api/kho/index.rst] Link: https://lkml.kernel.org/r/aV4bnHlBXGpT_FMc@kernel.org Link: https://lkml.kernel.org/r/20260105165839.285270-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Jason Miu <jasonmiu@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:11 -08:00
Pasha Tatashin	998be0a4db	liveupdate: separate memfd support into LIVEUPDATE_MEMFD Decouple memfd preservation support from the core Live Update Orchestrator configuration. Previously, enabling CONFIG_LIVEUPDATE forced a dependency on CONFIG_SHMEM and unconditionally compiled memfd_luo.o. However, Live Update may be used for purposes that do not require memfd-backed memory preservation. Introduce CONFIG_LIVEUPDATE_MEMFD to gate memfd_luo.o. This moves the SHMEM and MEMFD_CREATE dependencies to the specific feature that needs them, allowing the base LIVEUPDATE option to be selected independently of shared memory support. Link: https://lkml.kernel.org/r/20251230161402.1542099-1-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:07:10 -08:00
Breno Leitao	bd58782995	vmcoreinfo: make hwerr_data visible for debugging If the kernel is compiled with LTO, hwerr_data symbol might be lost, and vmcoreinfo doesn't have it dumped. This is currently seen in some production kernels with LTO enabled. Remove the static qualifier from hwerr_data so that the information is still preserved when the kernel is built with LTO. Making hwerr_data a global symbol ensures its debug info survives the LTO link process and appears in kallsyms. Also document it, so it doesn't get removed in the future as suggested by akpm. Link: https://lkml.kernel.org/r/20260122-fix_vmcoreinfo-v2-1-2d6311f9e36c@debian.org Fixes: `3fa805c37d` ("vmcoreinfo: track and log recoverable hardware errors") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:49 -08:00
Andrew Morton	412a32f0e5	kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho_preserve_vmalloc() should return -ENOMEM when new_vmalloc_chunk() fails. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202601211636.IRaejjdw-lkp@intel.com/ Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:48 -08:00
Ran Xiaokai	e86436ad0a	kho: init alloc tags when restoring pages from reserved memory Memblock pages (including reserved memory) should have their allocation tags initialized to CODETAG_EMPTY via clear_page_tag_ref() before being released to the page allocator. When kho restores pages through kho_restore_page(), missing this call causes mismatched allocation/deallocation tracking and below warning message: alloc_tag was not set WARNING: include/linux/alloc_tag.h:164 at ___free_pages+0xb8/0x260, CPU#1: swapper/0/1 RIP: 0010:___free_pages+0xb8/0x260 kho_restore_vmalloc+0x187/0x2e0 kho_test_init+0x3c4/0xa30 do_one_initcall+0x62/0x2b0 kernel_init_freeable+0x25b/0x480 kernel_init+0x1a/0x1c0 ret_from_fork+0x2d1/0x360 Add missing clear_page_tag_ref() annotation in kho_restore_page() to fix this. Link: https://lkml.kernel.org/r/20260122132740.176468-1-ranxiaokai627@163.com Fixes: `fc33e4b44b` ("kexec: enable KHO support for memory preservation") Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:47 -08:00
Minu Jin	f34e19c34e	fork-comment-fix: remove ambiguous question mark in CLONE_CHILD_CLEARTID comment The current comment "Clear TID on mm_release()?" ends with a question mark, implying uncertainty about whether the TID is actually cleared in mm_release(). However, the code flow is deterministic. When a task exits, mm_release() explicitly checks 'tsk->clear_child_tid' and clears it. Since this behavior is unambiguous, remove the confusing question mark and rephrase the comment to clearly state that TID is cleared in mm_release(). Link: https://lkml.kernel.org/r/20251125000407.24470-1-s9430939@naver.com Signed-off-by: Minu Jin <s9430939@naver.com> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:23 -08:00
Petr Mladek	3b07086444	kallsyms: prevent module removal when printing module name and buildid kallsyms_lookup_buildid() copies the symbol name into the given buffer so that it can be safely read anytime later. But it just copies pointers to mod->name and mod->build_id which might get reused after the related struct module gets removed. The lifetime of struct module is synchronized using RCU. Take the rcu read lock for the entire __sprint_symbol(). Link: https://lkml.kernel.org/r/20251128135920.217303-8-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:23 -08:00
Petr Mladek	e8a1e7eaa1	kallsyms/ftrace: set module buildid in ftrace_mod_address_lookup() __sprint_symbol() might access an invalid pointer when kallsyms_lookup_buildid() returns a symbol found by ftrace_mod_address_lookup(). The ftrace lookup function must set both @modname and @modbuildid the same way as module_address_lookup(). Link: https://lkml.kernel.org/r/20251128135920.217303-7-pmladek@suse.com Fixes: `9294523e37` ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:22 -08:00
Petr Mladek	cd6735896d	kallsyms/bpf: rename __bpf_address_lookup() to bpf_address_lookup() bpf_address_lookup() has been used only in kallsyms_lookup_buildid(). It was supposed to set @modname and @modbuildid when the symbol was in a module. But it always just cleared @modname because BPF symbols were never in a module. And it did not clear @modbuildid because the pointer was not passed. The wrapper is no longer needed. Both @modname and @modbuildid are now always initialized to NULL in kallsyms_lookup_buildid(). Remove the wrapper and rename __bpf_address_lookup() to bpf_address_lookup() because this variant is used everywhere. [akpm@linux-foundation.org: fix loongarch] Link: https://lkml.kernel.org/r/20251128135920.217303-6-pmladek@suse.com Fixes: `9294523e37` ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:22 -08:00
Petr Mladek	8e81dac4cd	kallsyms: cleanup code for appending the module buildid Put the code for appending the optional "buildid" into a helper function. It makes __sprint_symbol() more readable. Also print a warning when the "modname" is set and the "buildid" isn't. It might catch a situation when some lookup function in kallsyms_lookup_buildid() does not handle the "buildid". Use pr_*_once() to avoid an infinite recursion when the function is called from printk(). The recursion is rather theoretical but it is better to be on the safe side. Link: https://lkml.kernel.org/r/20251128135920.217303-5-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:22 -08:00
Petr Mladek	acfdbb4ab2	module: add helper function for reading module_buildid() Add a helper function for reading the optional "build_id" member of struct module. It is going to be used also in ftrace_mod_address_lookup(). Use "#ifdef" instead of "#if IS_ENABLED()" to match the declaration of the optional field in struct module. Link: https://lkml.kernel.org/r/20251128135920.217303-4-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:22 -08:00
Petr Mladek	fda024fb64	kallsyms: clean up modname and modbuildid initialization in kallsyms_lookup_buildid() The @modname and @modbuildid optional return parameters are set only when the symbol is in a module. Always initialize them so that they do not need to be cleared when the symbol is not in a module. It simplifies the logic and makes the code slightly safer. Note that bpf_address_lookup() function will get updated in a separate patch. Link: https://lkml.kernel.org/r/20251128135920.217303-3-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:21 -08:00
Petr Mladek	426295ef18	kallsyms: clean up @namebuf initialization in kallsyms_lookup_buildid() Patch series "kallsyms: Prevent invalid access when showing module buildid", v3. We have seen nested crashes in __sprint_symbol(), see below. They seem to be caused by an invalid pointer to "buildid". This patchset cleans up kallsyms code related to module buildid and fixes this invalid access when printing backtraces. I made an audit of __sprint_symbol() and found several situations when the buildid might be wrong: + bpf_address_lookup() does not set @modbuildid + ftrace_mod_address_lookup() does not set @modbuildid + __sprint_symbol() does not take rcu_read_lock and the related struct module might get removed before mod->build_id is printed. This patchset solves these problems: + 1st, 2nd patches are preparatory + 3rd, 4th, 6th patches fix the above problems + 5th patch cleans up a suspicious initialization code. This is the backtrace, we have seen. But it is not really important. The problems fixed by the patchset are obvious: crash64> bt [62/2029] PID: 136151 TASK: ffff9f6c981d4000 CPU: 367 COMMAND: "btrfs" #0 [ffffbdb687635c28] machine_kexec at ffffffffb4c845b3 #1 [ffffbdb687635c80] __crash_kexec at ffffffffb4d86a6a #2 [ffffbdb687635d08] hex_string at ffffffffb51b3b61 #3 [ffffbdb687635d40] crash_kexec at ffffffffb4d87964 #4 [ffffbdb687635d50] oops_end at ffffffffb4c41fc8 #5 [ffffbdb687635d70] do_trap at ffffffffb4c3e49a #6 [ffffbdb687635db8] do_error_trap at ffffffffb4c3e6a4 #7 [ffffbdb687635df8] exc_stack_segment at ffffffffb5666b33 #8 [ffffbdb687635e20] asm_exc_stack_segment at ffffffffb5800cf9 ... This patch (of 7) The function kallsyms_lookup_buildid() initializes the given @namebuf by clearing the first and the last byte. It is not clear why. 
The 1st byte makes sense because some callers ignore the return code and expect that the buffer contains a valid string, for example: - function_stat_show() - kallsyms_lookup() - kallsyms_lookup_buildid() The initialization of the last byte does not make much sense because it can later be overwritten. Fortunately, it seems that all called functions behave correctly: - kallsyms_expand_symbol() explicitly adds the trailing '\0' at the end of the function. - All *__address_lookup() functions either use the safe strscpy() or they do not touch the buffer at all. Document the reason for clearing the first byte. And remove the useless initialization of the last byte. Link: https://lkml.kernel.org/r/20251128135920.217303-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:21 -08:00
Li RongQing	e700f5d156	watchdog: softlockup: panic when lockup duration exceeds N thresholds The softlockup_panic sysctl is currently a binary option: panic immediately or never panic on soft lockups. Panicking on any soft lockup, regardless of duration, can be overly aggressive for brief stalls that may be caused by legitimate operations. Conversely, never panicking may allow severe system hangs to persist undetected. Extend softlockup_panic to accept an integer threshold, allowing the kernel to panic only when the normalized lockup duration exceeds N watchdog threshold periods. This provides finer-grained control to distinguish between transient delays and persistent system failures. The accepted values are: - 0: Don't panic (unchanged) - 1: Panic when duration >= 1 * threshold (20s default, original behavior) - N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.) The original behavior is preserved for values 0 and 1, maintaining full backward compatibility while allowing systems to tolerate brief lockups while still catching severe, persistent hangs. [lirongqing@baidu.com: v2] Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:20 -08:00
Pnina Feder	b5bfcc1ffe	kernel/crash: handle multi-page vmcoreinfo in crash kernel copy kimage_crash_copy_vmcoreinfo() currently assumes vmcoreinfo fits in a single page. This breaks if VMCOREINFO_BYTES exceeds PAGE_SIZE. Allocate the required order of control pages and vmap all pages needed to safely copy vmcoreinfo into the crash kernel image. Link: https://lkml.kernel.org/r/20251216132801.807260-3-pnina.feder@mobileye.com Signed-off-by: Pnina Feder <pnina.feder@mobileye.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:20 -08:00
Pnina Feder	76103d1b26	kernel: vmcoreinfo: allocate vmcoreinfo_data based on VMCOREINFO_BYTES Patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE". VMCOREINFO_BYTES is defined as a configurable size, but multiple code paths implicitly assume it always fits into a single page. This series removes that assumption by allocating and mapping vmcoreinfo based on its actual size. Patch 1 updates vmcoreinfo allocation to use get_order(VMCOREINFO_BYTES). Patch 2 updates crash kernel handling to correctly allocate and map multiple pages when copying vmcoreinfo. This makes vmcoreinfo size consistent across the kernel and avoids future breakage if VMCOREINFO_BYTES grows. (No functional change when VMCOREINFO_BYTES == PAGE_SIZE.) This patch (of 2): VMCOREINFO_BYTES defines the size of vmcoreinfo data, but the current implementation assumes a single page allocation. Allocate vmcoreinfo_data using get_order(VMCOREINFO_BYTES) so that vmcoreinfo can safely grow beyond PAGE_SIZE. This avoids hidden assumptions and keeps vmcoreinfo size consistent across the kernel. Link: https://lkml.kernel.org/r/20251216132801.807260-1-pnina.feder@mobileye.com Link: https://lkml.kernel.org/r/20251216132801.807260-2-pnina.feder@mobileye.com Signed-off-by: Pnina Feder <pnina.feder@mobileye.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:20 -08:00
Alejandro Colomar	a9e5620c9a	kernel: fix off-by-one benign bugs We were wasting a byte due to an off-by-one bug. s[c]nprintf() doesn't write more than $2 bytes including the null byte, so trying to pass 'size-1' there is wasting one byte. This is essentially the same as the previous commit, in a different file. Link: https://lkml.kernel.org/r/b4a945a4d40b7104364244f616eb9fb9f1fa691f.1765449750.git.alx@kernel.org Signed-off-by: Alejandro Colomar <alx@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Christopher Bazley <chris.bazley.wg14@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Marco Elver <elver@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:19 -08:00
Randy Dunlap	24c776355f	kernel.h: drop hex.h and update all hex.h users Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:19 -08:00
Christophe JAILLET	b11052be3e	crash_dump: constify struct configfs_item_operations and configfs_group_operations 'struct configfs_item_operations' and 'configfs_group_operations' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 16339 11001 384 27724 6c4c kernel/crash_dump_dm_crypt.o After: ===== text data bss dec hex filename 16499 10841 384 27724 6c4c kernel/crash_dump_dm_crypt.o Link: https://lkml.kernel.org/r/d046ee5666d2f6b1a48ca1a222dfbd2f7c44462f.1765735035.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Coiby Xu <coxu@redhat.com> Tested-by: Coiby Xu <coxu@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:15 -08:00
Linus Torvalds	c25f2fb1f4	Merge tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: - A patch series from David Hildenbrand which fixes a few things related to hugetlb PMD sharing - The remainder are singletons, please see their changelogs for details * tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: restore per-memcg proactive reclaim with !CONFIG_NUMA mm/kfence: fix potential deadlock in reboot notifier Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode mm: do not copy page tables unnecessarily for VM_UFFD_WP mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather mm/rmap: fix two comments related to huge_pmd_unshare() mm/hugetlb: fix two comments related to huge_pmd_unshare() mm/hugetlb: fix hugetlb_pmd_shared() mm: remove unnecessary and incorrect mmap lock assert x86/kfence: avoid writing L1TF-vulnerable PTEs mm/vma: do not leak memory when .mmap_prepare swaps the file migrate: correct lock ordering for hugetlb file folios panic: only warn about deprecated panic_print on write access fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes() mm: take into account mm_cid size for mm_struct static definitions mm: rename cpu_bitmap field to flexible_array mm: add missing static initializer for init_mm::mm_cid.lock	2026-01-20 13:32:16 -08:00
Linus Torvalds	c03e9c42ae	Merge tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - minor fixes for the corner cases of the SWIOTLB pool management (Robin Murphy) * tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: Avoid allocating redundant pools mm_zone: Generalise has_managed_dma() dma/pool: Improve pool lookup	2026-01-20 10:16:18 -08:00
Gal Pressman	90f3c12324	panic: only warn about deprecated panic_print on write access The panic_print_deprecated() warning is being triggered on both read and write operations to the panic_print parameter. This causes spurious warnings when users run 'sysctl -a' to list all sysctl values, since that command reads /proc/sys/kernel/panic_print and triggers the deprecation notice. Modify the handlers to only emit the deprecation warning when the parameter is actually being set: - sysctl_panic_print_handler(): check 'write' flag before warning. - panic_print_get(): remove the deprecation call entirely. This way, users are only warned when they actively try to use the deprecated parameter, not when passively querying system state. Link: https://lkml.kernel.org/r/20260106163321.83586-1-gal@nvidia.com Fixes: `ee13240cd7` ("panic: add note that panic_print sysctl interface is deprecated") Fixes: `2683df6539` ("panic: add note that 'panic_print' parameter is deprecated") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Cc: Feng Tang <feng.tang@linux.alibaba.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-19 12:30:01 -08:00
Linus Torvalds	6f32aa9161	Merge tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Add Chen Ridong as cpuset reviewer - Add SPDX license identifiers to cgroup files that were missing them * tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: kernel: cgroup: Add LGPL-2.1 SPDX license ID to legacy_freezer.c kernel: cgroup: Add SPDX-License-Identifier lines MAINTAINERS: Add Chen Ridong as cpuset reviewer	2026-01-18 14:30:27 -08:00
Linus Torvalds	b671c1dad2	Merge tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix the update_needs_ipi() check in the hrtimer code that may result in incorrect skipping of hrtimer IPIs" * tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix softirq base check in update_needs_ipi()	2026-01-18 10:56:32 -08:00
Linus Torvalds	837c8180e3	Merge tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc deadline scheduler fixes, mainly for a new category of bugs that were discovered and fixed recently: - Fix a race condition in the DL server - Fix a DL server bug which can result in incorrectly going idle when there's work available - Fix DL server bug which triggers a WARN() due to broken get_prio_dl() logic and subsequent misbehavior - Fix double update_rq_clock() calls - Fix setscheduler() assumption about static priorities - Make sure balancing callbacks are always called - Plus a handful of preparatory commits for the fixes" * tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Use ENQUEUE_MOVE to allow priority change sched: Deadline has dynamic priority sched: Audit MOVE vs balance_callbacks sched: Fold rq-pin swizzle into __balance_callbacks() sched/deadline: Avoid double update_rq_clock() sched/deadline: Ensure get_prio_dl() is up-to-date sched/deadline: Fix server stopping with runnable tasks sched: Provide idle_rq() helper sched/deadline: Fix potential race in dl_add_task_root_domain() sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()	2026-01-18 10:17:40 -08:00
Linus Torvalds	b62ce2547f	Merge tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an error path memory leak in the energy model management code, fix a kerneldoc comment in it, and fix and revamp the energy model YNL specification added recently along with the new energy model management netlink interface (that received feedback after being added): - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout) - Fix stale description of the cost field in struct em_perf_state to reflect the current code (Yaxiong Tian) - Fix and revamp the energy model YNL specification added recently along with the energy model netlink interface (Changwoo Min)" * tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: EM: Add dump to get-perf-domains in the EM YNL spec PM: EM: Change cpus' type from string to u64 array in the EM YNL spec PM: EM: Rename em.yaml to dev-energymodel.yaml PM: EM: Fix yamllint warnings in the EM YNL spec PM: EM: Fix memory leak in em_create_pd() error path PM: EM: Fix incorrect description of the cost field in struct em_perf_state	2026-01-16 12:08:19 -08:00
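
The threshold semantics described in commit e700f5d156 ("watchdog: softlockup: panic when lockup duration exceeds N thresholds") can be illustrated with a small userspace sketch. This is not the kernel's implementation: the function name `should_panic` and its parameters are hypothetical, and durations are taken in whole seconds purely for illustration. It only encodes the rule stated in the commit message, that 0 never panics and N >= 1 panics once the lockup duration reaches N times the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not kernel code): decide whether a soft lockup of
 * duration_s seconds should panic, given the softlockup_panic setting and
 * the soft lockup threshold in seconds (20s by default per the commit).
 */
static bool should_panic(unsigned int softlockup_panic,
			 unsigned int duration_s,
			 unsigned int threshold_s)
{
	assert(threshold_s > 0);

	/* 0 keeps the old "never panic" behaviour. */
	if (softlockup_panic == 0)
		return false;

	/* N >= 1: panic once duration >= N * threshold (widened to avoid overflow). */
	return duration_s >= (unsigned long long)softlockup_panic * threshold_s;
}
```

With a 20s threshold, a setting of 2 would tolerate a 39s stall in this sketch but panic at 40s, matching the "2 = 40s" example in the commit message.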


50409 Commits
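
The reasoning in commit a9e5620c9a ("kernel: fix off-by-one benign bugs") rests on the size argument of s[c]nprintf() already counting the terminating null byte. The sketch below shows the same property with the standard C `snprintf()` in userspace; the helper name `stored_len` and the sample string are assumptions made for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a fixed 6-character string into buf, telling
 * snprintf() that `size` bytes are available, and return how many
 * characters actually ended up stored (excluding the null byte).
 */
static size_t stored_len(char *buf, size_t size)
{
	assert(size > 0);

	/* snprintf() writes at most `size` bytes, null byte included. */
	snprintf(buf, size, "%s", "abcdef");
	return strlen(buf);
}
```

For a 4-byte buffer, passing the full size stores three characters plus the terminator, while passing size - 1 stores only two, so the last byte of the buffer goes unused. That unused byte is the "wasted" byte the commit message refers to.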