read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is
bigger than the variable hv_sched_clock_offset, which is cached during
early boot, but depending on the timing this assumption may be false
when a hibernated VM starts again (the clock counter starts from 0
again) and is resuming back (Note: hv_init_tsc_clocksource() is not
called during hibernation/resume); consequently,
read_hv_sched_clock_tsc() may return a negative integer (which is
interpreted as a huge positive integer since the return type is u64)
and new kernel messages are prefixed with huge timestamps before
read_hv_sched_clock_tsc() grows big enough (which typically takes
several seconds).
Fix the issue by saving the Hyper-V clock counter just before the
suspend, and using it to correct the hv_sched_clock_offset in
resume. This makes hv tsc page based sched_clock continuous and ensures
that post resume, it starts from where it left off during suspend.
Override x86_platform.save_sched_clock_state and
x86_platform.restore_sched_clock_state routines to correct this as soon
as possible.
Note: if Invariant TSC is available, the issue doesn't happen because
1) we don't register read_hv_sched_clock_tsc() for sched clock:
See commit e5313f1c54 ("clocksource/drivers/hyper-v: Rework
clocksource and sched clock setup");
2) the common x86 code adjusts TSC similarly: see
__restore_processor_state() -> tsc_verify_tsc_adjust(true) and
x86_platform.restore_sched_clock_state().
Cc: stable@vger.kernel.org
Fixes: 1349401ff1 ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation")
Co-developed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240917053917.76787-1-namjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240917053917.76787-1-namjain@linux.microsoft.com>
Pull x86 fixes from Borislav Petkov:
- Have the Automatic IBRS setting check on AMD does not falsely fire in
the guest when it has been set already on the host
- Make sure cacheinfo structures memory is allocated to address a boot
NULL ptr dereference on Intel Meteor Lake which has different numbers
of subleafs in its CPUID(4) leaf
- Take care of the GDT restoring on the kexec path too, as expected by
the kernel
- Make sure SMP is not disabled when IO-APIC is disabled on the kernel
cmdline
- Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to
propagate changes to the kernelmode page tables, to the user portion,
in PTI
- Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups
can get lost and thus user-visible delays happen
- Make sure PKRU is properly restored with XRSTOR on AMD after a PRKU
write of 0 (WRPKRU) which will mark PKRU in its init state and thus
lose the actual buffer
* tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
x86/cacheinfo: Delete global num_cache_leaves
cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
x86/kexec: Restore GDT on return from ::preserve_context kexec
x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
x86/pkeys: Ensure updated PKRU value is XRSTOR'd
x86/pkeys: Change caller of update_pkru_in_sigframe()
Pull arm64 fixes from Catalin Marinas:
"Nothing major, some left-overs from the recent merging window (MTE,
coco) and some newly found issues like the ptrace() ones.
- MTE/hugetlbfs:
- Set VM_MTE_ALLOWED in the arch code and remove it from the core
code for hugetlbfs mappings
- Fix copy_highpage() warning when the source is a huge page but
not MTE tagged, taking the wrong small page path
- drivers/virt/coco:
- Add the pKVM and Arm CCA drivers under the arm64 maintainership
- Fix the pkvm driver to fall back to ioremap() (and warn) if the
MMIO_GUARD hypercall fails
- Keep the Arm CCA driver default 'n' rather than 'm'
- A series of fixes for the arm64 ptrace() implementation,
potentially leading to the kernel consuming uninitialised stack
variables when PTRACE_SETREGSET is invoked with a length of 0
- Fix zone_dma_limit calculation when RAM starts below 4GB and
ZONE_DMA is capped to this limit
- Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a
call to page_to_phys() (from patch_map()) which checks pfn_valid()
before vmemmap has been set up
- Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI
ops when the kernel assumes 8-bit ASIDs but running under a
hypervisor on a system that implements 16-bit ASIDs (found running
Linux under Parallels on Apple M4)
- ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it
is using the same SMMU PMCG as HIP09 and suffers from the same
errata
- Add GCS to cpucap_is_possible(), missed in the recent merge"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
arm64: ptrace: fix partial SETREGSET for NT_ARM_POE
arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
arm64: cpufeature: Add GCS to cpucap_is_possible()
coco: virt: arm64: Do not enable cca guest driver by default
arm64: mte: Fix copy_highpage() warning on hugetlb folios
arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry
drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
arm64: patching: avoid early page_to_phys()
arm64: mm: Fix zone_dma_limit calculation
arm64: mte: set VM_MTE_ALLOWED for hugetlbfs at correct place
Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".
This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.
init_cache_level() no longer needs to set num_leaves. Also, it never had to
set num_levels as it is unnecessary in x86. Keep checking for zero cache
leaves. Such condition indicates a bug.
[ bp: Cleanup. ]
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org # 6.3+
Link: https://lore.kernel.org/r/20241128002247.26726-3-ricardo.neri-calderon@linux.intel.com
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.
Test program:
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <linux/kexec.h>
#include <linux/reboot.h>
#include <sys/reboot.h>
#include <sys/syscall.h>
int main (void)
{
struct kexec_segment segment = {};
unsigned char purgatory[] = {
0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx
0xb0, 0x42, // mov $0x42, %al
0xee, // outb %al, (%dx)
0xc3, // ret
};
int ret;
segment.buf = &purgatory;
segment.bufsz = sizeof(purgatory);
segment.mem = (void *)0x400000;
segment.memsz = 0x1000;
ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
if (ret) {
perror("kexec_load");
exit(1);
}
ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
if (ret) {
perror("kexec reboot");
exit(1);
}
printf("Success\n");
return 0;
}
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
Currently gcs_set() doesn't initialize the temporary 'user_gcs'
variable, and a SETREGSET call with a length of 0, 8, or 16 will leave
some portion of this uninitialized. Consequently some arbitrary
uninitialized values may be written back to the relevant fields in task
struct, potentially leaking up to 192 bits of memory from the kernel
stack. The read is limited to a specific slot on the stack, and the
issue does not provide a write mechanism.
As gcs_set() rejects cases where user_gcs::features_enabled has bits set
other than PR_SHADOW_STACK_SUPPORTED_STATUS_MASK, a SETREGSET call with
a length of zero will randomly succeed or fail depending on the value of
the uninitialized value, it isn't possible to leak the full 192 bits.
With a length of 8 or 16, user_gcs::features_enabled can be initialized
to an accepted value, making it practical to leak 128 or 64 bits.
Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length or partial write, the
existing contents of the fields which are not written to will be
retained.
To ensure that the extraction and insertion of fields is consistent
across the GETREGSET and SETREGSET calls, new task_gcs_to_user() and
task_gcs_from_user() helpers are added, matching the style of
pac_address_keys_to_user() and pac_address_keys_from_user().
Before this patch:
| # ./gcs-test
| Attempting to write NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x0000000000000000,
| .gcspr_el0 = 0x900d900d900d900d,
| }
| SETREGSET(nt=0x410, len=24) wrote 24 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x0000000000000000,
| .gcspr_el0 = 0x900d900d900d900d,
| }
|
| Attempting partial write NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x1de7ec7edbadc0de,
| .gcspr_el0 = 0x1de7ec7edbadc0de,
| }
| SETREGSET(nt=0x410, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x000000000093e780,
| .gcspr_el0 = 0xffff800083a63d50,
| }
After this patch:
| # ./gcs-test
| Attempting to write NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x0000000000000000,
| .gcspr_el0 = 0x900d900d900d900d,
| }
| SETREGSET(nt=0x410, len=24) wrote 24 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x0000000000000000,
| .gcspr_el0 = 0x900d900d900d900d,
| }
|
| Attempting partial write NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x1de7ec7edbadc0de,
| .gcspr_el0 = 0x1de7ec7edbadc0de,
| }
| SETREGSET(nt=0x410, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
| .features_enabled = 0x0000000000000000,
| .features_locked = 0x0000000000000000,
| .gcspr_el0 = 0x900d900d900d900d,
| }
Fixes: 7ec3b57cb2 ("arm64/ptrace: Expose GCS via ptrace and core files")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently poe_set() doesn't initialize the temporary 'ctrl' variable,
and a SETREGSET call with a length of zero will leave this
uninitialized. Consequently an arbitrary value will be written back to
target->thread.por_el0, potentially leaking up to 64 bits of memory from
the kernel stack. The read is limited to a specific slot on the stack,
and the issue does not provide a write mechanism.
Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
contents of POR_EL1 will be retained.
Before this patch:
| # ./poe-test
| Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d
| SETREGSET(nt=0x40f, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
|
| Attempting to write NT_ARM_POE (zero length)
| SETREGSET(nt=0x40f, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0xffff8000839c3d50
After this patch:
| # ./poe-test
| Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d
| SETREGSET(nt=0x40f, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
|
| Attempting to write NT_ARM_POE (zero length)
| SETREGSET(nt=0x40f, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
Fixes: 1751981992 ("arm64/ptrace: add support for FEAT_POE")
Cc: <stable@vger.kernel.org> # 6.12.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently fpmr_set() doesn't initialize the temporary 'fpmr' variable,
and a SETREGSET call with a length of zero will leave this
uninitialized. Consequently an arbitrary value will be written back to
target->thread.uw.fpmr, potentially leaking up to 64 bits of memory from
the kernel stack. The read is limited to a specific slot on the stack,
and the issue does not provide a write mechanism.
Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
contents of FPMR will be retained.
Before this patch:
| # ./fpmr-test
| Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d
| SETREGSET(nt=0x40e, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
|
| Attempting to write NT_ARM_FPMR (zero length)
| SETREGSET(nt=0x40e, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0xffff800083963d50
After this patch:
| # ./fpmr-test
| Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d
| SETREGSET(nt=0x40e, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
|
| Attempting to write NT_ARM_FPMR (zero length)
| SETREGSET(nt=0x40e, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
Fixes: 4035c22ef7 ("arm64/ptrace: Expose FPMR via ptrace")
Cc: <stable@vger.kernel.org> # 6.9.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently tagged_addr_ctrl_set() doesn't initialize the temporary 'ctrl'
variable, and a SETREGSET call with a length of zero will leave this
uninitialized. Consequently tagged_addr_ctrl_set() will consume an
arbitrary value, potentially leaking up to 64 bits of memory from the
kernel stack. The read is limited to a specific slot on the stack, and
the issue does not provide a write mechanism.
As set_tagged_addr_ctrl() only accepts values where bits [63:4] zero and
rejects other values, a partial SETREGSET attempt will randomly succeed
or fail depending on the value of the uninitialized value, and the
exposure is significantly limited.
Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
value of the tagged address ctrl will be retained.
The NT_ARM_TAGGED_ADDR_CTRL regset is only visible in the
user_aarch64_view used by a native AArch64 task to manipulate another
native AArch64 task. As get_tagged_addr_ctrl() only returns an error
value when called for a compat task, tagged_addr_ctrl_get() and
tagged_addr_ctrl_set() should never observe an error value from
get_tagged_addr_ctrl(). Add a WARN_ON_ONCE() to both to indicate that
such an error would be unexpected, and error handlnig is not missing in
either case.
Fixes: 2200aa7154 ("arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset")
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The rework of possible CPUs management erroneously disabled SMP when the
IO/APIC is disabled either by the 'noapic' command line parameter or during
IO/APIC setup. SMP is possible without IO/APIC.
Remove the ioapic_is_disabled conditions from the relevant possible CPU
management code paths to restore the orgininal behaviour.
Fixes: 7c0edad364 ("x86/cpu/topology: Rework possible CPU management")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241202145905.1482-1-ffmancera@riseup.net
The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups
respectively) assume that the root page table is actually a 8KiB allocation,
with the userspace root immediately after the kernel root page table (so that
the former can enforce NX on on all the subordinate page tables, which are
actually shared).
However, users of the kernel_ident_mapping_init() code do not give it an 8KiB
allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset()
allocate only a single 4KiB page. The kexec code on x86_64 currently gets
away with it purely by chance, because it allocates 8KiB for its "control
code page" and then actually uses the first half for the PGD, then copies the
actual trampoline code into the second half only after the identmap code has
finished scribbling over it.
Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as
_PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is
exclusively for leaf PTEs.). This instructs __pti_set_user_pgtbl() not to
write to the userspace 'shadow' PGD.
Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the
actual page tables; since __pti_set_user_pgtbl() returns the value to be
written to the kernel page table, it could be filtered out. But there seems
to be no benefit to actually doing so.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/412c90a4df7aef077141d9f68d19cbe5602d6c6d.camel@infradead.org
Cc: stable@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Commit 25c17c4b55 ("hugetlb: arm64: add mte support") improved the
copy_highpage() function to update the tags in the destination hugetlb
folio. However, when the source folio isn't tagged, the code takes the
non-hugetlb path where try_page_mte_tagging() warns as the destination
is a hugetlb folio:
WARNING: CPU: 0 PID: 363 at arch/arm64/include/asm/mte.h:58 copy_highpage+0x1d4/0x2d8
[...]
pc : copy_highpage+0x1d4/0x2d8
lr : copy_highpage+0x78/0x2d8
[...]
Call trace:
copy_highpage+0x1d4/0x2d8 (P)
copy_highpage+0x78/0x2d8 (L)
copy_user_highpage+0x20/0x48
copy_user_large_folio+0x1bc/0x268
hugetlb_wp+0x190/0x860
hugetlb_fault+0xa28/0xc10
handle_mm_fault+0x2a0/0x2c0
do_page_fault+0x12c/0x578
do_mem_abort+0x4c/0xa8
el0_da+0x44/0xb0
el0t_64_sync_handler+0xc4/0x138
el0t_64_sync+0x198/0x1a0
Change the check for the tagged status of the source folio so that it
does not fall through the non-hugetlb case. In addition, only perform
the copy (for the full folio) if the source page is the folio head and
warn if the destination folio is already tagged, for symmetry with the
non-hugetlb case.
Fixes: 25c17c4b55 ("hugetlb: arm64: add mte support")
Reported-by: Sasha Levin <sashal@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/Z0STR6VLt2MCalnY@sashalap
Link: https://lore.kernel.org/r/20241204175004.906754-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linux currently sets the TCR_EL1.AS bit unconditionally during CPU
bring-up. On an 8-bit ASID CPU, this is RES0 and ignored, otherwise
16-bit ASIDs are enabled. However, if running in a VM and the hypervisor
reports 8-bit ASIDs (ID_AA64MMFR0_EL1.ASIDBits == 0) on a 16-bit ASIDs
CPU, Linux uses bits 8 to 63 as a generation number for tracking old
process ASIDs. The bottom 8 bits of this generation end up being written
to TTBR1_EL1 and also used for the ASID-based TLBI operations as the
upper 8 bits of the ASID. Following an ASID roll-over event we can have
threads of the same application with the same 8-bit ASID but different
generation numbers running on separate CPUs. Both TLB caching and the
TLBI operations will end up using different actual 16-bit ASIDs for the
same process.
A similar scenario can happen in a big.LITTLE configuration if the boot
CPU only uses 8-bit ASIDs while secondary CPUs have 16-bit ASIDs.
Ensure that the ASID generation is only tracked by bits 16 and up,
leaving bits 15:8 as 0 if the kernel uses 8-bit ASIDs. Note that
clearing TCR_EL1.AS is not sufficient since the architecture requires
that the top 8 bits of the ASID passed to TLBI instructions are 0 rather
than ignored in such configuration.
Cc: stable@vger.kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241203151941.353796-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull LoongArch fixes from Huacai Chen:
"Fix bugs about EFI screen info, hugetlb pte clear and Lockdep-RCU
splat in KVM, plus some trival cleanups"
* tag 'loongarch-fixes-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Protect kvm_io_bus_{read,write}() with SRCU
LoongArch: KVM: Protect kvm_check_requests() with SRCU
LoongArch: BPF: Adjust the parameter of emit_jirl()
LoongArch: Add architecture specific huge_pte_clear()
LoongArch/irq: Use seq_put_decimal_ull_width() for decimal values
LoongArch: Fix reserving screen info memory for above-4G firmware
When arm64 is configured with CONFIG_DEBUG_VIRTUAL=y, a warning is
printed from the patching code because patch_map(), e.g.
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/patching.c:45 patch_map.constprop.0+0x120/0xd00
| CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-rc1-00002-ge1a5d6c6be55 #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 800003c5 (Nzcv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : patch_map.constprop.0+0x120/0xd00
| lr : patch_map.constprop.0+0x120/0xd00
| sp : ffffa9bb312a79a0
| x29: ffffa9bb312a79a0 x28: 0000000000000001 x27: 0000000000000001
| x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000402e8
| x23: ffffa9bb2c94c1c8 x22: ffffa9bb2c94c000 x21: ffffa9bb222e883c
| x20: 0000000000000002 x19: ffffc1ffc100ba40 x18: ffffa9bb2cf0f21c
| x17: 0000000000000006 x16: 0000000000000000 x15: 0000000000000004
| x14: 1ffff5376625b4ac x13: ffff753766a67fb8 x12: ffff753766919cd1
| x11: 0000000000000003 x10: 1ffff5376625b4c3 x9 : 1ffff5376625b4af
| x8 : ffff753766254f0a x7 : 0000000041b58ab3 x6 : ffff753766254f18
| x5 : ffffa9bb312d9bc0 x4 : 0000000000000000 x3 : ffffa9bb29bd90e4
| x2 : 0000000000000002 x1 : ffffa9bb312d9bc0 x0 : 0000000000000000
| Call trace:
| patch_map.constprop.0+0x120/0xd00 (P)
| patch_map.constprop.0+0x120/0xd00 (L)
| __aarch64_insn_write+0xa8/0x120
| aarch64_insn_patch_text_nosync+0x4c/0xb8
| arch_jump_label_transform_queue+0x7c/0x100
| jump_label_update+0x154/0x460
| static_key_enable_cpuslocked+0x1d8/0x280
| static_key_enable+0x2c/0x48
| early_randomize_kstack_offset+0x104/0x168
| do_early_param+0xe4/0x148
| parse_args+0x3a4/0x838
| parse_early_options+0x50/0x68
| parse_early_param+0x58/0xe0
| setup_arch+0x78/0x1f0
| start_kernel+0xa0/0x530
| __primary_switched+0x8c/0xa0
| irq event stamp: 0
| hardirqs last enabled at (0): [<0000000000000000>] 0x0
| hardirqs last disabled at (0): [<0000000000000000>] 0x0
| softirqs last enabled at (0): [<0000000000000000>] 0x0
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| ---[ end trace 0000000000000000 ]---
The warning has been produced since commit:
3e25d5a49f ("asm-generic: add an optional pfn_valid check to page_to_phys")
... which added a pfn_valid() check into page_to_phys(), and at this
point in boot pfn_valid() will always return false because the vmemmap
has not yet been initialized and there are no valid mem_sections yet.
Before that commit, the arithmetic performed by page_to_phys() would
give the expected physical address, though it is somewhat dubious to use
vmemmap addresses before the vmemmap has been initialized.
Aside from kernel image addresses, all executable code should be
allocated from execmem (where all allocations will fall within the
vmalloc area), and so there's no need for the fallback case when
CONFIG_EXECMEM=n.
Simplify patch_map() accordingly, directly converting kernel image
addresses and removing the redundant fallback case.
Fixes: 3e25d5a49f ("asm-generic: add an optional pfn_valid check to page_to_phys")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241202170359.1475019-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit ba0fb44aed ("dma-mapping: replace zone_dma_bits by
zone_dma_limit") and subsequent patches changed how zone_dma_limit is
calculated to allow a reduced ZONE_DMA even when RAM starts above 4GB.
Commit 122c234ef4 ("arm64: mm: keep low RAM dma zone") further fixed
this to ensure ZONE_DMA remains below U32_MAX if RAM starts below 4GB,
especially on platforms that do not have IORT or DT description of the
device DMA ranges. While zone boundaries calculation was fixed by the
latter commit, zone_dma_limit, used to determine the GFP_DMA flag in the
core code, was not updated. This results in excessive use of GFP_DMA and
unnecessary ZONE_DMA allocations on some platforms.
Update zone_dma_limit to match the actual upper bound of ZONE_DMA.
Fixes: ba0fb44aed ("dma-mapping: replace zone_dma_bits by zone_dma_limit")
Cc: <stable@vger.kernel.org> # 6.12.x
Reported-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
Tested-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20241125171650.77424-1-yang@os.amperecomputing.com
[catalin.marinas@arm.com: some tweaking of the commit log]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When XSTATE_BV[i] is 0, and XRSTOR attempts to restore state component
'i' it ignores any value in the XSAVE buffer and instead restores the
state component's init value.
This means that if XSAVE writes XSTATE_BV[PKRU]=0 then XRSTOR will
ignore the value that update_pkru_in_sigframe() writes to the XSAVE buffer.
XSTATE_BV[PKRU] only gets written as 0 if PKRU is in its init state. On
Intel CPUs, basically never happens because the kernel usually
overwrites the init value (aside: this is why we didn't notice this bug
until now). But on AMD, the init tracker is more aggressive and will
track PKRU as being in its init state upon any wrpkru(0x0).
Unfortunately, sig_prepare_pkru() does just that: wrpkru(0x0).
This writes XSTATE_BV[PKRU]=0 which makes XRSTOR ignore the PKRU value
in the sigframe.
To fix this, always overwrite the sigframe XSTATE_BV with a value that
has XSTATE_BV[PKRU]==1. This ensures that XRSTOR will not ignore what
update_pkru_in_sigframe() wrote.
The problematic sequence of events is something like this:
Userspace does:
* wrpkru(0xffff0000) (or whatever)
* Hardware sets: XINUSE[PKRU]=1
Signal happens, kernel is entered:
* sig_prepare_pkru() => wrpkru(0x00000000)
* Hardware sets: XINUSE[PKRU]=0 (aggressive AMD init tracker)
* XSAVE writes most of XSAVE buffer, including
XSTATE_BV[PKRU]=XINUSE[PKRU]=0
* update_pkru_in_sigframe() overwrites PKRU in XSAVE buffer
... signal handling
* XRSTOR sees XSTATE_BV[PKRU]==0, ignores just-written value
from update_pkru_in_sigframe()
Fixes: 70044df250 ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE")
Suggested-by: Rudi Horn <rudi.horn@oracle.com>
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241119174520.3987538-3-aruna.ramakrishna%40oracle.com
The branch instructions beq, bne, blt, bge, bltu, bgeu and jirl belong
to the format reg2i16, but the sequence of oprand is different for the
instruction jirl. So adjust the parameter order of emit_jirl() to make
it more readable correspond with the Instruction Set Architecture manual.
Here are the instruction formats:
beq rj, rd, offs16
bne rj, rd, offs16
blt rj, rd, offs16
bge rj, rd, offs16
bltu rj, rd, offs16
bgeu rj, rd, offs16
jirl rd, rj, offs16
Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#branch-instructions
Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When executing mm selftests run_vmtests.sh, there is such an error:
BUG: Bad page state in process uffd-unit-tests pfn:00000
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff)
raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat
virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse
nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs
CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 #184
Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000
900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8
900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001
0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000
0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000
000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940
0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000
0000000000000000 90000000028f8940 ffff800000000000 0000000000000000
0000000000000000 0000000000000000 9000000000223a94 000000012001839c
00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
...
Call Trace:
[<9000000000223a94>] show_stack+0x5c/0x180
[<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0
[<900000000056aa08>] bad_page+0x1a0/0x1f0
[<9000000000574978>] free_unref_folios+0xbf0/0xd20
[<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8
[<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260
[<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0
[<9000000000547f30>] tlb_finish_mmu+0xa8/0x218
[<9000000000543cb8>] exit_mmap+0x1a0/0x360
[<9000000000247658>] __mmput+0x78/0x200
[<900000000025583c>] do_exit+0x43c/0xde8
[<9000000000256490>] do_group_exit+0x68/0x110
[<9000000000256554>] sys_exit_group+0x1c/0x20
[<9000000001c413b4>] do_syscall+0x94/0x130
[<90000000002216d8>] handle_syscall+0xb8/0x158
Disabling lock debugging due to kernel taint
BUG: non-zero pgtables_bytes on freeing mm: -16384
On LoongArch system, invalid huge pte entry should be invalid_pte_table
or a single _PAGE_HUGE bit rather than a zero value. And it should be
the same with invalid pmd entry, since pmd_none() is called by function
free_pgd_range() and pmd_none() return 0 by huge_pte_clear(). So single
_PAGE_HUGE bit is also treated as a valid pte table and free_pte_range()
will be called in free_pmd_range().
free_pmd_range()
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
if (pmd_none_or_clear_bad(pmd))
continue;
free_pte_range(tlb, pmd, addr);
} while (pmd++, addr = next, addr != end);
Here invalid_pte_table is used for both invalid huge pte entry and
pmd entry.
Cc: stable@vger.kernel.org
Fixes: 09cfefb7fa ("LoongArch: Add memory management")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Performance improvement for reading /proc/interrupts on LoongArch.
On a system with n CPUs and m interrupts, there will be n*m decimal
values yielded via seq_printf(.."%10u "..) which is less efficient than
seq_put_decimal_ull_width(), stress reading /proc/interrupts indicates
~30% performance improvement with this patch (and its friends).
Signed-off-by: David Wang <00107082@163.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Since screen_info.lfb_base is a __u32 type, an above-4G address need an
ext_lfb_base to present its higher 32bits. In init_screen_info() we can
use __screen_info_lfb_base() to handle this case for reserving screen
info memory.
Signed-off-by: Xuefeng Zhao <zhaoxuefeng@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The continual trickle of small conversion patches is grating on me, and
is really not helping. Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:
/*
* .remove_new() is a relic from a prototype conversion of .remove().
* New drivers are supposed to implement .remove(). Once all drivers are
* converted to not use .remove_new any more, it will be dropped.
*/
This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.
I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.
Then I just removed the old (sic) .remove_new member function, and this
is the end result. No more unnecessary conversion noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c component probing support from Wolfram Sang:
"Add OF component probing.
Some devices are designed and manufactured with some components having
multiple drop-in replacement options. These components are often
connected to the mainboard via ribbon cables, having the same signals
and pin assignments across all options. These may include the display
panel and touchscreen on laptops and tablets, and the trackpad on
laptops. Sometimes which component option is used in a particular
device can be detected by some firmware provided identifier, other
times that information is not available, and the kernel has to try to
probe each device.
Instead of a delicate dance between drivers and device tree quirks,
this change introduces a simple I2C component probe function. For a
given class of devices on the same I2C bus, it will go through all of
them, doing a simple I2C read transfer and see which one of them
responds. It will then enable the device that responds"
* tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: fix typo in I2C OF COMPONENT PROBER
of: base: Document prefix argument for of_get_next_child_with_prefix()
i2c: Fix whitespace style issue
arm64: dts: mediatek: mt8173-elm-hana: Mark touchscreens and trackpads as fail
platform/chrome: Introduce device tree hardware prober
i2c: of-prober: Add GPIO support to simple helpers
i2c: of-prober: Add simple helpers for regulator support
i2c: Introduce OF component probe function
of: base: Add for_each_child_of_node_with_prefix()
of: dynamic: Add of_changeset_update_prop_string
Pull irq fixes from Borislav Petkov:
- Move the ->select callback to the correct ops structure in
irq-mvebu-sei to fix some Marvell Armada platforms
- Add a workaround for Hisilicon ITS erratum 162100801 which can cause
some virtual interrupts to get lost
- More platform_driver::remove() conversion
* tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Switch back to struct platform_driver::remove()
irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
Pull x86 fixes from Borislav Petkov:
- Add a terminating zero end-element to the array describing AMD CPUs
affected by erratum 1386 so that the matching loop actually
terminates instead of going off into the weeds
- Update the boot protocol documentation to mention the fact that the
preferred address to load the kernel to is considered in the
relocatable kernel case too
- Flush the memory buffer containing the microcode patch after applying
microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns
- Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN
has been disabled
* tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Terminate the erratum_1386_microcode array
x86/Documentation: Update algo in init_size description of boot protocol
x86/microcode/AMD: Flush patch buffer mapping after application
x86/mm: Carve out INVLPG inline asm for use by others
x86/cpu: Fix PPIN initialization
Pull more kvm updates from Paolo Bonzini:
- ARM fixes
- RISC-V Svade and Svadu (accessed and dirty bit) extension support for
host and guest
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test
RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VM
dt-bindings: riscv: Add Svade and Svadu Entries
RISC-V: Add Svade and Svadu Extensions Support
KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition
KVM: arm64: vgic: Make vgic_get_irq() more robust
KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
Pull sh updates from John Paul Adrian Glaubitz:
"Two small fixes.
The first one by Huacai Chen addresses a runtime warning when
CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected
which occurs because the cpuinfo code on sh incorrectly uses NR_CPUS
when iterating CPUs instead of the runtime limit nr_cpu_ids.
A second fix by Dan Carpenter fixes a use-after-free bug in
register_intc_controller() which occurred as a result of improper
error handling in the interrupt controller driver code when
registering an interrupt controller during plat_irq_setup() on sh"
* tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: intc: Fix use-after-free bug in register_intc_controller()
sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
Pull arm64 fixes from Catalin Marinas:
- Deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE so that tests depending
on it don't run (and fail) on arm64
- Fix lockdep assert in the Arm SMMUv3 PMU driver
- Fix the port and device ID bits setting in the Arm CMN perf driver
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
perf/arm-cmn: Ensure port and device id bits are set properly
perf/arm-smmuv3: Fix lockdep assert in ->event_init()
arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests
Pull Kbuild updates from Masahiro Yamada:
- Add generic support for built-in boot DTB files
- Enable TAB cycling for dialog buttons in nconfig
- Fix issues in streamline_config.pl
- Refactor Kconfig
- Add support for Clang's AutoFDO (Automatic Feedback-Directed
Optimization)
- Add support for Clang's Propeller, a profile-guided optimization.
- Change the working directory to the external module directory for M=
builds
- Support building external modules in a separate output directory
- Enable objtool for *.mod.o and additional kernel objects
- Use lz4 instead of deprecated lz4c
- Work around a performance issue with "git describe"
- Refactor modpost
* tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits)
kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms
gitignore: Don't ignore 'tags' directory
kbuild: add dependency from vmlinux to resolve_btfids
modpost: replace tdb_hash() with hash_str()
kbuild: deb-pkg: add python3:native to build dependency
genksyms: reduce indentation in export_symbol()
modpost: improve error messages in device_id_check()
modpost: rename alias symbol for MODULE_DEVICE_TABLE()
modpost: rename variables in handle_moddevtable()
modpost: move strstarts() to modpost.h
modpost: convert do_usb_table() to a generic handler
modpost: convert do_of_table() to a generic handler
modpost: convert do_pnp_device_entry() to a generic handler
modpost: convert do_pnp_card_entries() to a generic handler
modpost: call module_alias_printf() from all do_*_entry() functions
modpost: pass (struct module *) to do_*_entry() functions
modpost: remove DEF_FIELD_ADDR_VAR() macro
modpost: deduplicate MODULE_ALIAS() for all drivers
modpost: introduce module_alias_printf() helper
modpost: remove unnecessary check in do_acpi_entry()
...
Pull UML updates from Richard Weinberger:
- Lots of cleanups, mostly from Benjamin Berg and Tiwei Bie
- Removal of unused code
- Fix for sparse warnings
- Cleanup around stub_exe()
* tag 'uml-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (68 commits)
hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio()
um: move thread info into task
um: Always dump trace for specified task in show_stack
um: vector: Do not use drvdata in release
um: net: Do not use drvdata in release
um: ubd: Do not use drvdata in release
um: ubd: Initialize ubd's disk pointer in ubd_add
um: virtio_uml: query the number of vqs if supported
um: virtio_uml: fix call_fd IRQ allocation
um: virtio_uml: send SET_MEM_TABLE message with the exact size
um: remove broken double fault detection
um: remove duplicate UM_NSEC_PER_SEC definition
um: remove file sync for stub data
um: always include kconfig.h and compiler-version.h
um: set DONTDUMP and DONTFORK flags on KASAN shadow memory
um: fix sparse warnings in signal code
um: fix sparse warnings from regset refactor
um: Remove double zero check
um: fix stub exe build with CONFIG_GCOV
um: Use os_set_pdeathsig helper in winch thread/process
...
Pull driver core updates from Greg KH:
"Here is a small set of driver core changes for 6.13-rc1.
Nothing major for this merge cycle, except for the two simple merge
conflicts are here just to make life interesting.
Included in here are:
- sysfs core changes and preparations for more sysfs api cleanups
that can come through all driver trees after -rc1 is out
- fw_devlink fixes based on many reports and debugging sessions
- list_for_each_reverse() removal, no one was using it!
- last-minute seq_printf() format string bug found and fixed in many
drivers all at once.
- minor bugfixes and changes full details in the shortlog"
* tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits)
Fix a potential abuse of seq_printf() format string in drivers
cpu: Remove spurious NULL in attribute_group definition
s390/con3215: Remove spurious NULL in attribute_group definition
perf: arm-ni: Remove spurious NULL in attribute_group definition
driver core: Constify bin_attribute definitions
sysfs: attribute_group: allow registration of const bin_attribute
firmware_loader: Fix possible resource leak in fw_log_firmware_info()
drivers: core: fw_devlink: Fix excess parameter description in docstring
driver core: class: Correct WARN() message in APIs class_(for_each|find)_device()
cacheinfo: Use of_property_present() for non-boolean properties
cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap()
drivers: core: fw_devlink: Make the error message a bit more useful
phy: tegra: xusb: Set fwnode for xusb port devices
drm: display: Set fwnode for aux bus devices
driver core: fw_devlink: Stop trying to optimize cycle detection logic
driver core: Constify attribute arguments of binary attributes
sysfs: bin_attribute: add const read/write callback variants
sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR()
sysfs: treewide: constify attribute callback of bin_attribute::llseek()
sysfs: treewide: constify attribute callback of bin_attribute::mmap()
...
Pull more s390 updates from Heiko Carstens:
- Add swap entry for hugetlbfs support
- Add PTE_MARKER support for hugetlbs mappings; this fixes a regression
(possible page fault loop) which was introduced when support for
UFFDIO_POISON for hugetlbfs was added
- Add ARCH_HAS_PREEMPT_LAZY and PREEMPT_DYNAMIC support
- Mark IRQ entries in entry code, so that stack tracers can filter out
the non-IRQ parts of stack traces. This fixes stack depot capacity
limit warnings, since without filtering the number of unique stack
traces is huge
- In PCI code fix leak of struct zpci_dev object, and fix potential
double remove of hotplug slot
- Fix pagefault_disable() / pagefault_enable() unbalance in
arch_stack_user_walk_common()
- A couple of inline assembly optimizations, more cmpxchg() to
try_cmpxchg() conversions, and removal of usages of xchg() and
cmpxchg() on one and two byte memory areas
- Various other small improvements and cleanups
* tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (27 commits)
Revert "s390/mm: Allow large pages for KASAN shadow mapping"
s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8()
s390/spinlock: Use R constraint for arch_load_niai4()
s390/spinlock: Generate shorter code for arch_spin_unlock()
s390/spinlock: Remove condition code clobber from arch_spin_unlock()
s390/spinlock: Use symbolic names in inline assemblies
s390: Support PREEMPT_DYNAMIC
s390/pci: Fix potential double remove of hotplug slot
s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails
s390/mm/hugetlbfs: Add missing includes
s390/mm: Add PTE_MARKER support for hugetlbfs mappings
s390/mm: Introduce region-third and segment table swap entries
s390/mm: Introduce region-third and segment table entry present bits
s390/mm: Rearrange region-third and segment table entry SW bits
KVM: s390: Increase size of union sca_utility to four bytes
KVM: s390: Remove one byte cmpxchg() usage
KVM: s390: Use try_cmpxchg() instead of cmpxchg() loops
s390/ap: Replace xchg() with WRITE_ONCE()
s390/mm: Allow large pages for KASAN shadow mapping
s390: Add ARCH_HAS_PREEMPT_LAZY support
...
Pull MIPS updates from Thomas Bogendoerfer:
- fix for loongson64 device tree
- add SPI nand to realtek device tree
- change clock tree for mobileye
* tag 'mips_6.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Loongson64: DTS: Really fix PCIe port nodes for ls7a
mips: dts: realtek: Add SPI NAND controller
MIPS: mobileye: eyeq6h: add OLB nodes OLB and remove fixed clocks
MIPS: mobileye: eyeq5: use OLB as provider for fixed factor clocks
Pull ARM updates from Russell King:
- add dev_is_amba() function to allow conversions during the next cycle
- improve PREEMPT_RT performance with VFP
- KASAN fixes for vmap stack
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9431/1: mm: Pair atomic_set_release() with _read_acquire()
ARM: 9430/1: entry: Do a dummy read from VMAP shadow
ARM: 9429/1: ioremap: Sync PGDs for VMALLOC shadow
ARM: 9426/1: vfp: Move sending signals outside of vfp_state_hold()ed section.
ARM: 9425/1: vfp: Use vfp_state_hold() in vfp_support_entry().
ARM: 9424/1: vfp: Use vfp_state_hold() in vfp_sync_hwstate().
ARM: 9423/1: vfp: Provide vfp_state_hold() for VFP locking.
ARM: 9415/1: amba: Add dev_is_amba() function and export it for modules
Pull sparc updates from Andreas Larsson:
- Make sparc64 compilable with clang
- Replace one-element array with flexible array member
* tag 'sparc-for-6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
sparc/vdso: Add helper function for 64-bit right shift on 32-bit target
sparc: Replace one-element array with flexible array member
sparc/build: Add SPARC target flags for compiling with clang
sparc/build: Put usage of -fcall-used* flags behind cc-option
Pull powerpc fixes from Madhavan Srinivasan:
- Fix htmldocs errors in sysfs-bus-event_source-devices-vpa-pmu
- Fix warning due to missing #size-cells on powermac
Thanks to Michael Ellerman, Yang Li, Rob Herring, and Stephen Rothwell.
* tag 'powerpc-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/prom_init: Fixup missing powermac #size-cells
docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Fix htmldocs errors
powerpc/machdep: Remove duplicated include in svm.c
This reverts commit ff123eb774.
Allowing large pages for KASAN shadow mappings isn't inherently wrong,
but adding POPULATE_KASAN_MAP_SHADOW to large_allowed() exposes an issue
in can_large_pud() and can_large_pmd().
Since commit d8073dc6bc ("s390/mm: Allow large pages only for aligned
physical addresses"), both can_large_pud() and can_large_pmd() call _pa()
to check if large page physical addresses are aligned. However, _pa()
has a side effect: it allocates memory in POPULATE_KASAN_MAP_SHADOW
mode. This results in massive memory leaks.
The proper fix would be to address both large_allowed() and _pa()'s side
effects, but for now, revert this change to avoid the leaks.
Fixes: ff123eb774 ("s390/mm: Allow large pages for KASAN shadow mapping")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add a new variant of arch_cmpxchg_niai8() which makes use of the flag
output constraint, which allows the compiler to generate slightly better
code. Also rename arch_cmpxchg_niai8() to arch_try_cmpxchg_niai8() which
reflects the purpose of the function and makes it consistent with other
"try" variants.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The load instruction used within arch_load_niai4() has a short displacement
and index register. Therefore use the R constraint to reflect this.
The used Q constraint does consider an index register.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use mvhhi instead of sth to write a zero to spinlocks. Compared to the
sth variant this avoids the load of zero to a register, and reduces
register pressure.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>