linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 09:45:08 -04:00

Author	SHA1	Message	Date
Linus Torvalds	9591fdb061	Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 updates from Borislav Petkov: - Remove a bunch of asm implementing condition flags testing in KVM's emulator in favor of int3_emulate_jcc() which is written in C - Replace KVM fastops with C-based stubs which avoids problems with the fastop infra related to latter not adhering to the C ABI due to their special calling convention and, more importantly, bypassing compiler control-flow integrity checking because they're written in asm - Remove wrongly used static branches and other ugliness accumulated over time in hyperv's hypercall implementation with a proper static function call to the correct hypervisor call variant - Add some fixes and modifications to allow running FRED-enabled kernels in KVM even on non-FRED hardware - Add kCFI improvements like validating indirect calls and prepare for enabling kCFI with GCC. Add cmdline params documentation and other code cleanups - Use the single-byte 0xd6 insn as the official #UD single-byte undefined opcode instruction as agreed upon by both x86 vendors - Other smaller cleanups and touchups all over the place * tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86,retpoline: Optimize patch_retpoline() x86,ibt: Use UDB instead of 0xEA x86/cfi: Remove __noinitretpoline and __noretpoline x86/cfi: Add "debug" option to "cfi=" bootparam x86/cfi: Standardize on common "CFI:" prefix for CFI reports x86/cfi: Document the "cfi=" bootparam options x86/traps: Clarify KCFI instruction layout compiler_types.h: Move __nocfi out of compiler-specific header objtool: Validate kCFI calls x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware x86/fred: Install system vector handlers even if FRED isn't fully enabled x86/hyperv: Use direct call to hypercall-page x86/hyperv: Clean up hv_do_hypercall() KVM: x86: Remove fastops KVM: x86: Convert em_salc() to C KVM: x86: Introduce EM_ASM_3WCL KVM: x86: Introduce EM_ASM_1SRC2 KVM: x86: Introduce EM_ASM_2CL KVM: x86: Introduce EM_ASM_2W ...	2025-10-11 11:19:16 -07:00
Kees Cook	23ef9d4397	kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI The kernel's CFI implementation uses the KCFI ABI specifically, and is not strictly tied to a particular compiler. In preparation for GCC supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with associated options). Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will enable CONFIG_CFI during olddefconfig. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>	2025-09-24 14:29:14 -07:00
Peter Zijlstra	4a1e02b15a	x86,retpoline: Optimize patch_retpoline() Currently the very common retpoline: "CS CALL __x86_indirect_thunk_r11" is transformed into "CALL R11; NOP3" for eIBRS/BHI_NO parts. Similarly, paranoid fineibt has: "CALL R11; NOP". Recognise that CS stuffing can avoid the extra NOP. However, due to prefix decode penalties, make sure to not emit too many CS prefixes. Notably: "CS CALL __x86_indirect_thunk_rax" must not become "CS CS CS CS CALL RAX". Prefix decode penalties are typically many more cycles than decoding an extra NOP. Additionally, if the retpoline is a tail-call, the "JMP %\reg" should be followed by INT3 for straight-line-speculation mitigation, since emit_indirect() now has a length argument, move this into emit_indirect() such that other users (paranoid-fineibt) also do this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250902104627.GM4068168@noisy.programming.kicks-ass.net	2025-09-04 21:59:09 +02:00
Peter Zijlstra	85a2d4a890	x86,ibt: Use UDB instead of 0xEA A while ago [0] FineIBT started using the 0xEA instruction to raise #UD. All existing parts will generate #UD in 64bit mode on that instruction. However; Intel/AMD have not blessed using this instruction, it is on their 'reserved' opcode list for future use. Peter Anvin worked the committees and got use of 0xD6 blessed, it shall be called UDB (per the next SDM or so), and it being a single byte instruction is easy to slip into a single byte immediate -- as is done by this very patch. Reworking the FineIBT code to use UDB wasn't entirely trivial. Notably the FineIBT-BHI1 case ran out of bytes. In order to condense the encoding some it was required to move the hash register from R10D to EAX (thanks hpa!). Per the x86_64 ABI, RAX is used to pass the number of vector registers for vararg function calls -- something that should not happen in the kernel. More so, the kernel is built with -mskip-rax-setup, which should leave RAX completely unused, allowing its re-use. [ For BPF; while the bpf2bpf tail-call uses RAX in its calling convention, that does not use CFI and is unaffected. Only the 'regular' C->BPF transition is covered by CFI. ] The ENDBR poison value is changed from 'OSP NOP3' to 'NOPL -42(%RAX)', this is basically NOP4 but with UDB as its immediate. As such it is still a non-standard NOP value unique to prior ENDBR sites, but now also provides UDB. Per Agner Fog's optimization guide, Jcc is assumed not-taken. That is, the expected path should be the fallthrough case for improved throughput. Since the preamble now relies on the ENDBR poison to provide UDB, the code is changed to write the poison right along with the initial preamble -- this is possible because the ITS mitigation already disabled IBT over rewriting the CFI scheme. The scheme in detail: Preamble: FineIBT FineIBT-BHI1 FineIBT-BHI __cfi_\func: __cfi_\func: __cfi_\func: endbr endbr endbr subl $0x12345678, %eax subl $0x12345678, %eax subl $0x12345678, %eax jne.d32,np \func+3 cmovne %rax, %rdi cs cs call __bhi_args_N jne.d8,np \func+3 \func: \func: \func: nopl -42(%rax) nopl -42(%rax) nopl -42(%rax) Notably there are 7 bytes available after the SUBL; this enables the BHI1 case to fit without the nasty overlapping case it had previously. The !BHI case uses Jcc.d32,np to consume all 7 bytes without the need for an additional NOP, while the BHI case uses CS padding to align the CALL with the end of the preamble such that it returns to \func+0. Caller: FineIBT Paranoid-FineIBT fineibt_caller: fineibt_caller: mov $0x12345678, %eax mov $0x12345678, %eax lea -10(%r11), %r11 cmp -0x11(%r11), %eax nop5 cs lea -0x10(%r11), %r11 retpoline: retpoline: cs call __x86_indirect_thunk_r11 jne fineibt_caller+0xd call %r11 nop Notably this is before apply_retpolines() which will fix up the retpoline call -- since all parts with IBT also have eIBRS (lets ignore ITS). Typically the retpoline site is rewritten (when still intact) into: call %r11 nop3 [0] `06926c6cdb` ("x86/ibt: Optimize the FineIBT instruction sequence") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250901191307.GI4067720@noisy.programming.kicks-ass.net	2025-09-04 21:59:09 +02:00
Kees Cook	026211c40b	x86/cfi: Add "debug" option to "cfi=" bootparam Add "debug" option for "cfi=" bootparam to get details on early CFI initialization steps so future Kees can find breakage easier. Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250904034656.3670313-5-kees@kernel.org	2025-09-04 21:59:08 +02:00
Kees Cook	9f303a35d1	x86/cfi: Standardize on common "CFI:" prefix for CFI reports Use a regular "CFI:" prefix for CFI reports during alternatives setup, including reporting when nothing has happened (i.e. CONFIG_FINEIBT=n). Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250904034656.3670313-4-kees@kernel.org	2025-09-04 21:59:08 +02:00
Linus Torvalds	da23ea194d	Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: "Significant patch series in this pull request: - "mseal cleanups" (Lorenzo Stoakes) Some mseal cleaning with no intended functional change. - "Optimizations for khugepaged" (David Hildenbrand) Improve khugepaged throughput by batching PTE operations for large folios. This gain is mainly for arm64. - "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport) A bugfix, additional debug code and cleanups to the execmem code. - "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song) Bugfixes, cleanups and performance improvememnts to the mTHP swapin code" * tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits) mm: mempool: fix crash in mempool_free() for zero-minimum pools mm: correct type for vmalloc vm_flags fields mm/shmem, swap: fix major fault counting mm/shmem, swap: rework swap entry and index calculation for large swapin mm/shmem, swap: simplify swapin path and result handling mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO mm/shmem, swap: tidy up swap entry splitting mm/shmem, swap: tidy up THP swapin checks mm/shmem, swap: avoid redundant Xarray lookup during swapin x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations execmem: drop writable parameter from execmem_fill_trapping_insns() execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) execmem: move execmem_force_rw() and execmem_restore_rox() before use execmem: rework execmem_cache_free() execmem: introduce execmem_alloc_rw() execmem: drop unused execmem_update_copy() mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped mm/rmap: add anon_vma lifetime debug check mm: remove mm/io-mapping.c ...	2025-08-05 16:02:07 +03:00
Mike Rapoport (Microsoft)	838955f64a	execmem: introduce execmem_alloc_rw() Some callers of execmem_alloc() require the memory to be temporarily writable even when it is allocated from ROX cache. These callers use execemem_make_temp_rw() right after the call to execmem_alloc(). Wrap this sequence in execmem_alloc_rw() API. Link: https://lkml.kernel.org/r/20250713071730.4117334-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Sami Tolvanen	f1befc82ad	cfi: Move BPF CFI types and helpers to generic code Instead of duplicating the same code for each architecture, move the CFI type hash variables for BPF function types and related helper functions to generic CFI code, and allow architectures to override the function definitions if needed. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20250801001004.1859976-7-samitolvanen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-31 18:23:53 -07:00
Mark Rutland	5ccaeedb48	cfi: add C CFI type macro Currently x86 and riscv open-code 4 instances of the same logic to define a u32 variable with the KCFI typeid of a given function. Replace the duplicate logic with a common macro. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Maxwell Bland <mbland@motorola.com> Signed-off-by: Maxwell Bland <mbland@motorola.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Dao Huang <huangdao1@oppo.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250801001004.1859976-6-samitolvanen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-31 18:23:53 -07:00
Masami Hiramatsu (Google)	2aebf5ee43	x86/alternatives: Fix int3 handling failure from broken text_poke array Since smp_text_poke_single() does not expect there is another text_poke request is queued, it can make text_poke_array not sorted or cause a buffer overflow on the text_poke_array.vec[]. This will cause an Oops in int3 because of bsearch failing; CPU 0 CPU 1 CPU 2 ----- ----- ----- smp_text_poke_batch_add() smp_text_poke_single() <<-- Adds out of order <int3> [Fails o find address in text_poke_array ] OOPS! Or unhandled page fault because of a buffer overflow; CPU 0 CPU 1 ----- ----- smp_text_poke_batch_add() <<+ ... \| smp_text_poke_batch_add() <<-- Adds TEXT_POKE_ARRAY_MAX times. smp_text_poke_single() { __smp_text_poke_batch_add() <<-- Adds entry at TEXT_POKE_ARRAY_MAX + 1 smp_text_poke_batch_finish() [Unhandled page fault because text_poke_array.nr_entries is overwritten] BUG! } Use smp_text_poke_batch_add() instead of __smp_text_poke_batch_add() so that it correctly flush the queue if needed. Closes: https://lore.kernel.org/all/CA+G9fYsLu0roY3DV=tKyqP7FEKbOEETRvTDhnpPxJGbA=Cg+4w@mail.gmail.com/ Fixes: `c8976ade0c` ("x86/alternatives: Simplify smp_text_poke_single() by using tp_vec and existing APIs") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lkml.kernel.org/r/\ 175020512308.3582717.13631440385506146631.stgit@mhiramat.tok.corp.google.com	2025-06-18 13:59:56 +02:00
Lukas Bulwahn	3c902383f2	x86/its: Fix an ifdef typo in its_alloc() Commit `a82b26451d` ("x86/its: explicitly manage permissions for ITS pages") reworks its_alloc() and introduces a typo in an ifdef conditional, referring to CONFIG_MODULE instead of CONFIG_MODULES. Fix this typo in its_alloc(). Fixes: `a82b26451d` ("x86/its: explicitly manage permissions for ITS pages") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250616100432.22941-1-lukas.bulwahn%40redhat.com	2025-06-17 16:10:57 -07:00
Peter Zijlstra (Intel)	a82b26451d	x86/its: explicitly manage permissions for ITS pages execmem_alloc() sets permissions differently depending on the kernel configuration, CPU support for PSE and whether a page is allocated before or after mark_rodata_ro(). Add tracking for pages allocated for ITS when patching the core kernel and make sure the permissions for ITS pages are explicitly managed for both kernel and module allocations. Fixes: `872df34d7c` ("x86/its: Use dynamic thunks for indirect branches") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250603111446.2609381-5-rppt@kernel.org	2025-06-11 11:20:52 +02:00
Mike Rapoport (Microsoft)	0b0cae7119	x86/its: move its_pages array to struct mod_arch_specific The of pages with ITS thunks allocated for modules are tracked by an array in 'struct module'. Since this is very architecture specific data structure, move it to 'struct mod_arch_specific'. No functional changes. Fixes: `872df34d7c` ("x86/its: Use dynamic thunks for indirect branches") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250603111446.2609381-4-rppt@kernel.org	2025-06-11 11:20:51 +02:00
Ingo Molnar	412751aa69	Merge tag 'v6.15-rc7' into x86/core, to pick up fixes Pick up build fixes from upstream to make this tree more testable. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-21 08:45:03 +02:00
Kirill A. Shutemov	7212b58d6d	x86/mm/64: Make 5-level paging support unconditional Both Intel and AMD CPUs support 5-level paging, which is expected to become more widely adopted in the future. All major x86 Linux distributions have the feature enabled. Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com	2025-05-17 10:38:16 +02:00
Eric Biggers	9f35e33144	x86/its: Fix build errors when CONFIG_MODULES=n Fix several build errors when CONFIG_MODULES=n, including the following: ../arch/x86/kernel/alternative.c:195:25: error: incomplete definition of type 'struct module' 195 \| for (int i = 0; i < mod->its_num_pages; i++) { Fixes: `872df34d7c` ("x86/its: Use dynamic thunks for indirect branches") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-05-13 14:36:08 -07:00
Ingo Molnar	c4070e1996	Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflicts Conflicts: Documentation/admin-guide/hw-vuln/index.rst arch/x86/include/asm/cpufeatures.h arch/x86/kernel/alternative.c arch/x86/kernel/cpu/bugs.c arch/x86/kernel/cpu/common.c drivers/base/cpu.c include/linux/cpu.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-05-13 10:47:10 +02:00
Peter Zijlstra	e52c1dc745	x86/its: FineIBT-paranoid vs ITS FineIBT-paranoid was using the retpoline bytes for the paranoid check, disabling retpolines, because all parts that have IBT also have eIBRS and thus don't need no stinking retpolines. Except... ITS needs the retpolines for indirect calls must not be in the first half of a cacheline :-/ So what was the paranoid call sequence: <fineibt_paranoid_start>: 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d a: 4d 8d 5b <f0> lea -0x10(%r11), %r11 e: 75 fd jne d <fineibt_paranoid_start+0xd> 10: 41 ff d3 call %r11 13: 90 nop Now becomes: <fineibt_paranoid_start>: 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d a: 4d 8d 5b f0 lea -0x10(%r11), %r11 e: 2e e8 XX XX XX XX cs call __x86_indirect_paranoid_thunk_r11 Where the paranoid_thunk looks like: 1d: <ea> (bad) __x86_indirect_paranoid_thunk_r11: 1e: 75 fd jne 1d __x86_indirect_its_thunk_r11: 20: 41 ff eb jmp %r11 23: cc int3 [ dhansen: remove initialization to false ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:39:36 -07:00
Peter Zijlstra	872df34d7c	x86/its: Use dynamic thunks for indirect branches ITS mitigation moves the unsafe indirect branches to a safe thunk. This could degrade the prediction accuracy as the source address of indirect branches becomes same for different execution paths. To improve the predictions, and hence the performance, assign a separate thunk for each indirect callsite. This is also a defense-in-depth measure to avoid indirect branches aliasing with each other. As an example, 5000 dynamic thunks would utilize around 16 bits of the address space, thereby gaining entropy. For a BTB that uses 32 bits for indexing, dynamic thunks could provide better prediction accuracy over fixed thunks. Have ITS thunks be variable sized and use EXECMEM_MODULE_TEXT such that they are both more flexible (got to extend them later) and live in 2M TLBs, just like kernel code, avoiding undue TLB pressure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:36:58 -07:00
Pawan Gupta	ebebe30794	x86/ibt: Keep IBT disabled during alternative patching cfi_rewrite_callers() updates the fineIBT hash matching at the caller side, but except for paranoid-mode it relies on apply_retpoline() and friends for any ENDBR relocation. This could temporarily cause an indirect branch to land on a poisoned ENDBR. For instance, with para-virtualization enabled, a simple wrmsrl() could have an indirect branch pointing to native_write_msr() who's ENDBR has been relocated due to fineIBT: <wrmsrl>: push %rbp mov %rsp,%rbp mov %esi,%eax mov %rsi,%rdx shr $0x20,%rdx mov %edi,%edi mov %rax,%rsi call 0x21e65d0(%rip) # <pv_ops+0xb8> ^^^^^^^^^^^^^^^^^^^^^^^ Such an indirect call during the alternative patching could #CP if the caller is not yet* adjusted for the new target ENDBR. To prevent a false #CP, keep CET-IBT disabled until all callers are patched. Patching during the module load does not need to be guarded by IBT-disable because the module code is not executed until the patching is complete. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:33:35 -07:00
Pawan Gupta	a75bf27fe4	x86/its: Add support for ITS-safe return thunk RETs in the lower half of cacheline may be affected by ITS bug, specifically when the RSB-underflows. Use ITS-safe return thunk for such RETs. RETs that are not patched: - RET in retpoline sequence does not need to be patched, because the sequence itself fills an RSB before RET. - RET in Call Depth Tracking (CDT) thunks __x86_indirect_{call\|jump}_thunk and call_depth_return_thunk are not patched because CDT by design prevents RSB-underflow. - RETs in .init section are not reachable after init. - RETs that are explicitly marked safe with ANNOTATE_UNRET_SAFE. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:05 -07:00
Pawan Gupta	8754e67ad4	x86/its: Add support for ITS-safe indirect thunk Due to ITS, indirect branches in the lower half of a cacheline may be vulnerable to branch target injection attack. Introduce ITS-safe thunks to patch indirect branches in the lower half of cacheline with the thunk. Also thunk any eBPF generated indirect branches in emit_indirect_jump(). Below category of indirect branches are not mitigated: - Indirect branches in the .init section are not mitigated because they are discarded after boot. - Indirect branches that are explicitly marked retpoline-safe. Note that retpoline also mitigates the indirect branches against ITS. This is because the retpoline sequence fills an RSB entry before RET, and it does not suffer from RSB-underflow part of the ITS. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>	2025-05-09 13:22:04 -07:00
Peter Zijlstra	4873f494bb	x86/mm: Remove 'mm' argument from unuse_temporary_mm() again Now that unuse_temporary_mm() lives in tlb.c it can access cpu_tlbstate.loaded_mm. [ mingo: Merged it on top of x86/alternatives ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-5-mingo@kernel.org	2025-04-12 10:05:56 +02:00
Andy Lutomirski	d376972c98	x86/mm: Make use_/unuse_temporary_mm() non-static This prepares them for use outside of the alternative machinery. The code is unchanged. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-4-mingo@kernel.org	2025-04-12 10:05:52 +02:00
Peter Zijlstra	0812e096cf	x86/mm: Add 'mm' argument to unuse_temporary_mm() In commit `209954cbc7` ("x86/mm/tlb: Update mm_cpumask lazily") unuse_temporary_mm() grew the assumption that it gets used on poking_mm exclusively. While this is currently true, lets not hard code this assumption. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-2-mingo@kernel.org	2025-04-12 10:05:37 +02:00
Nikolay Borisov	23a76739d6	x86/alternatives: Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() Simplify the alternatives interface some more by moving the poke_batch_finish check into poke_batch_process and renaming the latter. The net effect is one less function name to consider when reading the code. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-54-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	4f9534719e	x86/alternatives: Add comment about noinstr expectations Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-53-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	023f42dd59	x86/alternatives: Rename 'apply_relocation()' to 'text_poke_apply_relocation()' Join the text_poke_*() API namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-52-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	dac0d75427	x86/alternatives: Update the comments in smp_text_poke_batch_process() - Capitalize 'INT3' consistently, - make it clear that 'sync cores' means an SMP sync to all CPUs, - fix typos and spelling. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-51-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	2c373ca064	x86/alternatives: Remove 'smp_text_poke_batch_flush()' It only has a single user left, merge it into smp_text_poke_batch_add() and remove the helper function. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-50-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	b1bb39185d	x86/alternatives: Move declarations of vmlinux.lds.S defined section symbols to <asm/alternative.h> Move it from the middle of a .c file next to the similar declarations of __alt_instructions[] et al. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-49-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	db5c68c88c	x86/alternatives: Simplify the #include section We accumulated lots of unnecessary header inclusions over the years, trim them. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-48-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	3c8454dfc9	x86/alternatives: Rename 'POKE_MAX_OPCODE_SIZE' to 'TEXT_POKE_MAX_OPCODE_SIZE' Join the TEXT_POKE_ namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-47-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	8036fbe5a5	x86/alternatives: Rename 'TP_ARRAY_NR_ENTRIES_MAX' to 'TEXT_POKE_ARRAY_MAX' Standardize on TEXT_POKE_ namespace for CPP constants too. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-46-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	22b9662313	x86/alternatives: Standardize on 'tpl' local variable names for 'struct smp_text_poke_loc *' There's no toilet paper in this code. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-45-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	3e6f47573e	x86/alternatives: Simplify and clean up patch_cmp() - No need to cast over to 'struct smp_text_poke_loc ', void is just fine for a binary search, - Use the canonical (a, b) input parameter nomenclature of cmp_func_t functions and rename the input parameters from (tp, elt) to (tpl_a, tpl_b). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-44-mingo@kernel.org	2025-04-11 11:01:35 +02:00
Ingo Molnar	6af9540379	x86/alternatives: Constify text_poke_addr() This will also allow the simplification of patch_cmp(). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-43-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	0e67e587e2	x86/alternatives: Simplify text_poke_addr_ordered() - Use direct 'void *' pointer comparison, there's no need to force the type to 'unsigned long'. - Remove the 'tp' local variable indirection Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-42-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	6e4955a9d7	x86/alternatives: Rename 'text_poke_sync()' to 'smp_text_poke_sync_each_cpu()' Unlike sync_core(), text_poke_sync() is a very heavy operation, as it sends an IPI to every online CPU in the system and waits for completion. Reflect this in the name. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-41-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	7fbadb50d9	x86/alternatives: Move text_poke_array completion from smp_text_poke_batch_finish() and smp_text_poke_batch_flush() to smp_text_poke_batch_process() Simplifies the code and improves code generation a bit: text data bss dec hex filename 14769 1017 4112 19898 4dba alternative.o.before 14742 1017 4112 19871 4d9f alternative.o.after Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-40-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	cca3473956	x86/alternatives: Add documentation for smp_text_poke_batch_add() Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-39-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	9647ce4652	x86/alternatives: Document 'smp_text_poke_single()' Extend the documentation to better describe its purpose. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-38-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	8a6a1b4e0e	x86/alternatives: Remove the mixed-patching restriction on smp_text_poke_single() At this point smp_text_poke_single(addr, opcode, len, emulate) is equivalent to: smp_text_poke_batch_add(addr, opcode, len, emulate); smp_text_poke_batch_finish(); So remove the restriction on mixing single-instruction patching with multi-instruction patching. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-37-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	0e351aec2b	x86/alternatives: Move the text_poke_array manipulation into text_poke_int3_loc_init() and rename it to __smp_text_poke_batch_add() This simplifies the code and code generation a bit: text data bss dec hex filename 14802 1029 4112 19943 4de7 alternative.o.before 14784 1029 4112 19925 4dd5 alternative.o.after Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-36-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	74e8e2bf95	x86/alternatives: Simplify smp_text_poke_batch_process() This function is now using the text_poke_array state exclusively, make that explicit by removing the redundant input parameters. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-34-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	8e35752f0c	x86/alternatives: Simplify smp_text_poke_int3_handler() Remove the 'desc' local variable indirection and use text_poke_array directly. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-33-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	b6a25841c1	x86/alternatives: Simplify try_get_text_poke_array() There's no need to return a pointer on success - it's always the same pointer. Return a bool instead. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-32-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	3916eec516	x86/alternatives: Rename 'put_desc()' to 'put_text_poke_array()' Just like with try_get_text_poke_array(), this name better reflects what the underlying code is doing, there's no 'descriptor' indirection anymore. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-31-mingo@kernel.org	2025-04-11 11:01:34 +02:00
Ingo Molnar	46f3d9d329	x86/alternatives: Rename 'try_get_desc()' to 'try_get_text_poke_array()' This better reflects what the underlying code is doing, there's no 'descriptor' indirection anymore. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250411054105.2341982-30-mingo@kernel.org	2025-04-11 11:01:34 +02:00

1 2 3 4 5 ...

369 Commits