Commit Graph

1249635 Commits

Author SHA1 Message Date
Ard Biesheuvel
68aec33f8f arm64: mm: Add feature override support for LVA
Add support for overriding the VARange field of the MMFR2 CPU ID
register. This permits the associated LVA feature to be overridden early
enough for the boot code that creates the kernel mapping to take it into
account.

Given that LPA2 implies LVA, disabling the latter should disable the
former as well. So override the ID_AA64MMFR0.TGran field of the current
page size as well if it advertises support for 52-bit addressing.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-71-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:36 +00:00
Ard Biesheuvel
9cce9c6c2c arm64: mm: Handle LVA support as a CPU feature
Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.

Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.

On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.

Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:36 +00:00
Ard Biesheuvel
e0f92f0d1b arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"
This reverts commit 1682c45b92, which is no longer needed now that
we create the permanent kernel mapping directly during early boot.

This is a RINO (revert in name only) given that some of the code has
moved around, but the changes are straight-forward.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-69-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:36 +00:00
Ard Biesheuvel
ba5b0333a8 arm64: mm: omit redundant remap of kernel image
Now that the early kernel mapping is created with all the right
attributes and segment boundaries, there is no longer a need to recreate
it and switch to it. This also means we no longer have to copy the kasan
shadow or some parts of the fixmap from one set of page tables to the
other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-68-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:35 +00:00
Ard Biesheuvel
567a70c181 arm64: mm: avoid fixmap for early swapper_pg_dir updates
Early in the boot, when .rodata is still writable, we can poke
swapper_pg_dir entries directly, and there is no need to go through the
fixmap. After a future patch, we will enter the kernel with
swapper_pg_dir already active, and early swapper_pg_dir updates for
creating the fixmap page table hierarchy itself cannot go through the
fixmap for obvious reaons. So let's keep track of whether rodata is
writable, and update the descriptor directly in that case.

As the same reasoning applies to early KASAN init, make the function
noinstr as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-67-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:35 +00:00
Ard Biesheuvel
84b04d3e6b arm64: kernel: Create initial ID map from C code
The asm code that creates the initial ID map is rather intricate and
hard to follow. This is problematic because it makes adding support for
things like LPA2 or WXN more difficult than necessary. Also, it is
parameterized like the rest of the MM code to run with a configurable
number of levels, which is rather pointless, given that all AArch64 CPUs
implement support for 48-bit virtual addressing, and that many systems
exist with DRAM located outside of the 39-bit addressable range, which
is the only smaller VA size that is widely used, and we need additional
tricks to make things work in that combination.

So let's bite the bullet, and rip out all the asm macros, and fiddly
code, and replace it with a C implementation based on the newly added
routines for creating the early kernel VA mappings. And while at it,
create the initial ID map based on 48-bit virtual addressing as well,
regardless of the number of configured levels for the kernel proper.

Note that this code may execute with the MMU and caches disabled, and is
therefore not permitted to make unaligned accesses. This shouldn't
generally happen in any case for the algorithm as implemented, but to be
sure, let's pass -mstrict-align to the compiler just in case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-66-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:34 +00:00
Ard Biesheuvel
34b98e55f6 arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels
The mapping from PGD/PUD/PMD to levels and shifts is very confusing,
given that, due to folding, the shifts may be equal for different
levels, if the macros are even #define'd to begin with.

In a subsequent patch, we will modify the ID mapping code to decouple
the number of levels from the kernel's view of how these types are
folded, so prepare for this by reformulating the macros without the use
of these types.

Instead, use SWAPPER_BLOCK_SHIFT as the base quantity, and derive it
from either PAGE_SHIFT or PMD_SHIFT, which -if defined at all- are
defined unambiguously for a given page size, regardless of the number of
configured levels.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-65-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:34 +00:00
Ard Biesheuvel
e6128a8e52 arm64: mm: Use 48-bit virtual addressing for the permanent ID map
Even though we support loading kernels anywhere in 48-bit addressable
physical memory, we create the ID maps based on the number of levels
that we happened to configure for the kernel VA and user VA spaces.

The reason for this is that the PGD/PUD/PMD based classification of
translation levels, along with the associated folding when the number of
levels is less than 5, does not permit creating a page table hierarchy
of a set number of levels. This means that, for instance, on 39-bit VA
kernels we need to configure an additional level above PGD level on the
fly, and 36-bit VA kernels still only support 47-bit virtual addressing
with this trick applied.

Now that we have a separate helper to populate page table hierarchies
that does not define the levels in terms of PUDS/PMDS/etc at all, let's
reuse it to create the permanent ID map with a fixed VA size of 48 bits.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-64-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:34 +00:00
Ard Biesheuvel
97a6f43bb0 arm64: head: Move early kernel mapping routines into C code
The asm version of the kernel mapping code works fine for creating a
coarse grained identity map, but for mapping the kernel down to its
exact boundaries with the right attributes, it is not suitable. This is
why we create a preliminary RWX kernel mapping first, and then rebuild
it from scratch later on.

So let's reimplement this in C, in a way that will make it unnecessary
to create the kernel page tables yet another time in paging_init().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-63-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:33 +00:00
Ard Biesheuvel
82ca151da7 arm64: mmu: Make __cpu_replace_ttbr1() out of line
__cpu_replace_ttbr1() is a static inline, and so it gets instantiated
wherever it is used. This is not really necessary, as it is never called
on a hot path. It also has the unfortunate side effect that the symbol
idmap_cpu_replace_ttbr1 may never be referenced from kCFI enabled C
code, and this means the type id symbol may not exist either.  This will
result in a build error once we start referring to this symbol from asm
code as well. (Note that this problem only occurs when CnP, KAsan and
suspend/resume are all disabled in the Kconfig but that is a valid
config, if unusual).

So let's just move it out of line so all callers will share the same
implementation, which will reference idmap_cpu_replace_ttbr1
unconditionally.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-62-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:33 +00:00
Ard Biesheuvel
293d865f0a arm64: mm: Make kaslr_requires_kpti() a static inline
In preparation for moving the first assignment of arm64_use_ng_mappings
to an earlier stage in the boot, ensure that kaslr_requires_kpti() is
accessible without relying on the core kernel's view on whether or not
KASLR is enabled. So make it a static inline, and move the
kaslr_enabled() check out of it and into the callers, one of which will
disappear in a subsequent patch.

Once/when support for the obsolete ThunderX 1 platform is dropped, this
check reduces to a E0PD feature check on the local CPU.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-61-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:32 +00:00
Ard Biesheuvel
aa6a52b247 arm64: head: move memstart_offset_seed handling to C code
Now that we can set BSS variables from the early code running from the
ID map, we can set memstart_offset_seed directly from the C code that
derives the value instead of passing it back and forth between C and asm
code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-60-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:32 +00:00
Ard Biesheuvel
8d47b8e5c7 arm64: head: allocate more pages for the kernel mapping
In preparation for switching to an early kernel mapping routine that
maps each segment according to its precise boundaries, and with the
correct attributes, let's allocate some extra pages for page tables for
the 4k page size configuration. This is necessary because the start and
end of each segment may not be aligned to the block size, and so we'll
need an extra page table at each segment boundary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-59-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:32 +00:00
Ard Biesheuvel
a669c6a493 arm64: Add helpers to probe local CPU for PAC and BTI support
Add some helpers that will be used by the early kernel mapping code to
check feature support on the local CPU. This permits the early kernel
mapping to be created with the right attributes, removing the need for
tearing it down and recreating it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-58-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:31 +00:00
Ard Biesheuvel
9ddd9baa42 arm64: idreg-override: Create a pseudo feature for rodata=off
Add rodata=off to the set of kernel command line options that is parsed
early using the CPU feature override detection code, so we can easily
refer to it when creating the kernel mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-57-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:31 +00:00
Ard Biesheuvel
af73b9a2dd arm64: kaslr: Use feature override instead of parsing the cmdline again
The early kaslr code open codes the detection of 'nokaslr' on the kernel
command line, and this is no longer necessary now that the feature
detection code, which also looks for the same string, executes before
this code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-56-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:31 +00:00
Ard Biesheuvel
35876f35f4 arm64: cpufeature: Add helper to test for CPU feature overrides
Add some helpers to extract and apply feature overrides to the bare
idreg values. This involves inspecting the value and mask of the
specific field that we are interested in, given that an override
value/mask pair might be invalid for one field but valid for another.

Then, wire up the new helper for the hVHE test - note that we can drop
the sysreg test here, as the override will be invalid when trying to
enable hVHE on non-VHE capable hardware.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-55-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:30 +00:00
Ard Biesheuvel
8a6e40e1f6 arm64: head: move dynamic shadow call stack patching into early C runtime
Once we update the early kernel mapping code to only map the kernel once
with the right permissions, we can no longer perform code patching via
this mapping.

So move this code to an earlier stage of the boot, right after applying
the relocations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-54-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:30 +00:00
Ard Biesheuvel
dcfe969a64 arm64: head: Run feature override detection before mapping the kernel
To permit the feature overrides to be taken into account before the
KASLR init code runs and the kernel mapping is created, move the
detection code to an earlier stage in the boot.

In a subsequent patch, this will be taken advantage of by merging the
preliminary and permanent mappings of the kernel text and data into a
single one that gets created and relocated before start_kernel() is
called.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-53-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:30 +00:00
Ard Biesheuvel
30687dec5e arm64: Move feature overrides into the BSS section
In order to allow the CPU feature override detection code to run even
earlier, move the feature override global variables into BSS, which is
the only part of the static kernel image that is mapped read-write in
the initial ID map.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-52-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:29 +00:00
Ard Biesheuvel
aa99aad798 arm64: head: Clear BSS and the kernel page tables in one go
We will move the CPU feature overrides into BSS in a subsequent patch,
and this requires that BSS is zeroed before the feature override
detection code runs. So let's map BSS read-write in the ID map, and zero
it via this mapping.

Since the kernel page tables are right next to it, and also zeroed via
the ID map, let's drop the separate clear_page_tables() function, and
just zero everything in one go.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-51-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:29 +00:00
Ard Biesheuvel
9c4cd2a7d1 arm64: kernel: Remove early fdt remap code
The early FDT remap code is no longer used so let's drop it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-50-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:29 +00:00
Ard Biesheuvel
e223a44912 arm64: idreg-override: Move to early mini C runtime
We will want to parse the ID register overrides even earlier, so that we
can take them into account before creating the kernel mapping. So
migrate the code and make it work in the context of the early C runtime.
We will move the invocation to an earlier stage in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-49-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:28 +00:00
Ard Biesheuvel
734958ef0b arm64: head: move relocation handling to C code
Now that we have a mini C runtime before the kernel mapping is up, we
can move the non-trivial relocation processing code out of head.S and
reimplement it in C.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-48-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:28 +00:00
Ard Biesheuvel
a86aa72eb3 arm64: kernel: Don't rely on objcopy to make code under pi/ __init
We will add some code under pi/ that contains global variables that
should not end up in __initdata, as they will not be writable via the
initial ID map. So only rely on objcopy for making the libfdt code
__init, and use explicit annotations for the rest.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-47-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:28 +00:00
Ard Biesheuvel
48157aa392 arm64: kernel: Manage absolute relocations in code built under pi/
The mini C runtime runs before relocations are processed, and so it
cannot rely on statically initialized pointer variables.

Add a check to ensure that such code does not get introduced by
accident, by going over the relocations in each object, identifying the
ones that operate on data sections that are part of the executable
image, and raising an error if any relocations of type R_AARCH64_ABS64
exist. Note that such relocations are permitted in other places (e.g.,
debug sections) and will never occur in compiler generated code sections
when using the small code model, so only check sections that have
SHF_ALLOC set and SHF_EXECINSTR cleared.

To accommodate cases where statically initialized symbol references are
unavoidable, introduce a special case for ELF input data sections that
have ".rodata.prel64" in their names, and in these cases, instead of
rejecting any encountered ABS64 relocations, convert them into PREL64
relocations, which don't require any runtime fixups. Note that the code
in question must still be modified to deal with this, as it needs to
convert the 64-bit signed offsets into absolute addresses before use.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-46-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:28 +00:00
Ard Biesheuvel
3567fa63cb arm64: kaslr: Adjust randomization range dynamically
Currently, we base the KASLR randomization range on a rough estimate of
the available space in the upper VA region: the lower 1/4th has the
module region and the upper 1/4th has the fixmap, vmemmap and PCI I/O
ranges, and so we pick a random location in the remaining space in the
middle.

Once we enable support for 5-level paging with 4k pages, this no longer
works: the vmemmap region, being dimensioned to cover a 52-bit linear
region, takes up so much space in the upper VA region (the size of which
is based on a 48-bit VA space for compatibility with non-LVA hardware)
that the region above the vmalloc region takes up more than a quarter of
the available space.

So instead of a heuristic, let's derive the randomization range from the
actual boundaries of the vmalloc region.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-16-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:12 +00:00
Ard Biesheuvel
d432b8d57c arm64: mm: Reclaim unused vmemmap region for vmalloc use
The vmemmap array is statically sized based on the maximum supported
size of the virtual address space, but it is located inside the upper VA
region, which is statically sized based on the *minimum* supported size
of the VA space. This doesn't matter much when using 64k pages, which is
the only configuration that currently supports 52-bit virtual
addressing.

However, upcoming LPA2 support will change this picture somewhat, as in
that case, the vmemmap array will take up more than 25% of the upper VA
region when using 4k pages. Given that most of this space is never used
when running on a system that does not support 52-bit virtual
addressing, let's reclaim the unused vmemmap area in that case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-15-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:12 +00:00
Ard Biesheuvel
32697ff382 arm64: vmemmap: Avoid base2 order of struct page size to dimension region
The placement and size of the vmemmap region in the kernel virtual
address space is currently derived from the base2 order of the size of a
struct page. This makes for nicely aligned constants with lots of
leading 0xf and trailing 0x0 digits, but given that the actual struct
pages are indexed as an ordinary array, this resulting region is
severely overdimensioned when the size of a struct page is just over a
power of 2.

This doesn't matter today, but once we enable 52-bit virtual addressing
for 4k pages configurations, the vmemmap region may take up almost half
of the upper VA region with the current struct page upper bound at 64
bytes. And once we enable KMSAN or other features that push the size of
a struct page over 64 bytes, we will run out of VMALLOC space entirely.

So instead, let's derive the region size from the actual size of a
struct page, and place the entire region 1 GB from the top of the VA
space, where it still doesn't share any lower level translation table
entries with the fixmap.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-14-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:12 +00:00
Ard Biesheuvel
f9cca24441 arm64: ptdump: Discover start of vmemmap region at runtime
We will soon reclaim the part of the vmemmap region that covers VA space
that is not addressable by the hardware. To avoid confusion, ensure that
the 'vmemmap start' marker points at the start of the region that is
actually being used for the struct page array, rather than the start of
the region we set aside for it at build time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-13-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:11 +00:00
Ard Biesheuvel
34f879fbe4 arm64: ptdump: Allow all region boundaries to be defined at boot time
Rework the way the address_markers array is populated so that we can
tolerate values that are not compile time constants generally, rather
than keeping track manually of the array indexes in question, and poking
new values into them manually. This will be needed for VMALLOC_END,
which will cease to be a compile time constant after a subsequent patch.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-12-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:11 +00:00
Ard Biesheuvel
b730b0f2b1 arm64: mm: Move fixmap region above vmemmap region
Move the fixmap region above the vmemmap region, so that the start of
the vmemmap delineates the end of the region available for vmalloc and
vmap allocations and the randomized placement of the kernel and modules.

In a subsequent patch, we will take advantage of this to reclaim most of
the vmemmap area when running a 52-bit VA capable build with 52-bit
virtual addressing disabled at runtime.

Note that the existing guard region of 256 MiB covers the fixmap and PCI
I/O regions as well, so we can reduce it 8 MiB, which is what we use in
other places too.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-11-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:11 +00:00
Ard Biesheuvel
031e011d8b arm64: mm: Move PCI I/O emulation region above the vmemmap region
Move the PCI I/O region above the vmemmap region in the kernel's VA
space. This will permit us to reclaim the lower part of the vmemmap
region for vmalloc/vmap allocations when running a 52-bit VA capable
build on a 48-bit VA capable system.

Also, given that PCI_IO_START is derived from VMEMMAP_END, use that
symbolic constant directly in ptdump rather than deriving it from
VMEMMAP_START and VMEMMAP_SIZE, as those definitions will change in
subsequent patches.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-10-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:10 +00:00
Linus Torvalds
54be6c6c5a Linux 6.8-rc3 v6.8-rc3 2024-02-04 12:20:36 +00:00
Linus Torvalds
3f24fcdacd Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
  and extent handling code"

* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
  ext4: make ext4_map_blocks() distinguish delalloc only extent
  ext4: add a hole extent entry in cache after punch
  ext4: correct the hole length returned by ext4_map_blocks()
  ext4: convert to exclusive lock while inserting delalloc extents
  ext4: refactor ext4_da_map_blocks()
  ext4: remove 'needed' in trace_ext4_discard_preallocations
  ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
  ext4: remove unused return value of ext4_mb_release_group_pa
  ext4: remove unused return value of ext4_mb_release_inode_pa
  ext4: remove unused return value of ext4_mb_release
  ext4: remove unused ext4_allocation_context::ac_groups_considered
  ext4: remove unneeded return value of ext4_mb_release_context
  ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
  ext4: remove unused return value of __mb_check_buddy
  ext4: mark the group block bitmap as corrupted before reporting an error
  ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
  ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
  ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
  ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
  ...
2024-02-04 07:33:01 +00:00
Linus Torvalds
9e28c7a23b Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Five smb3 client fixes, mostly multichannel related:

   - four multichannel fixes including fix for channel allocation when
     multiple inactive channels, fix for unneeded race in channel
     deallocation, correct redundant channel scaling, and redundant
     multichannel disabling scenarios

   - add warning if max compound requests reached"

* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: increase number of PDUs allowed in a compound request
  cifs: failure to add channel on iface should bump up weight
  cifs: do not search for channel if server is terminating
  cifs: avoid redundant calls to disable multichannel
  cifs: make sure that channel scaling is done only once
2024-02-04 07:26:19 +00:00
Linus Torvalds
fc86e5c990 Merge tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:

 - Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
   attribute fork

 - Remove conditional compilation of realtime geometry validator
   functions to prevent confusing error messages from being printed on
   the console during the mount operation

* tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove conditional building of rt geometry validator functions
  xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
2024-02-04 07:22:51 +00:00
Linus Torvalds
3a0e922079 Merge tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are three tiny driver fixes for 6.8-rc3.  They include:

   - Android binder long-term bug with epoll finally being fixed

   - fastrpc driver shutdown bugfix

   - open-dice lockdep fix

  All of these have been in linux-next this week with no reported
  issues"

* tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  binder: signal epoll threads of self-work
  misc: open-dice: Fix spurious lockdep warning
  misc: fastrpc: Mark all sessions as invalid in cb_remove
2024-02-04 07:01:39 +00:00
Linus Torvalds
0214960971 Merge tag 'tty-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 6.8-rc3 that
  resolve a number of reported issues. Included in here are:

   - rs485 flag definition fix that affected the user/kernel abi in -rc1

   - max310x driver fixes

   - 8250_pci1xxxx driver off-by-one fix

   - uart_tiocmget locking race fix

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'tty-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: max310x: prevent infinite while() loop in port startup
  serial: max310x: fail probe if clock crystal is unstable
  serial: max310x: improve crystal stable clock detection
  serial: max310x: set default value when reading clock ready bit
  serial: core: Fix atomicity violation in uart_tiocmget
  serial: 8250_pci1xxxx: fix off by one in pci1xxxx_process_read_data()
  tty: serial: Fix bit order in RS485 flag definitions
2024-02-04 06:58:23 +00:00
Linus Torvalds
809be620dc Merge tag 'usb-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
 "Here are a bunch of small USB driver fixes for 6.8-rc3. Included in
  here are:

   - new usb-serial driver ids

   - new dwc3 driver id added

   - typec driver change revert

   - ncm gadget driver endian bugfix

   - xhci bugfixes for a number of reported issues

   - usb hub bugfix for alternate settings

   - ulpi driver debugfs memory leak fix

   - chipidea driver bugfix

   - usb gadget driver fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
  USB: serial: option: add Fibocom FM101-GL variant
  USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
  USB: serial: cp210x: add ID for IMST iM871A-USB
  usb: typec: tcpm: fix the PD disabled case
  usb: ucsi_acpi: Quirk to ack a connector change ack cmd
  usb: ucsi_acpi: Fix command completion handling
  usb: ucsi: Add missing ppm_lock
  usb: ulpi: Fix debugfs directory leak
  Revert "usb: typec: tcpm: fix cc role at port reset"
  usb: gadget: pch_udc: fix an Excess kernel-doc warning
  usb: f_mass_storage: forbid async queue when shutdown happen
  USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
  usb: chipidea: core: handle power lost in workqueue
  usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend
  usb: dwc3: pci: add support for the Intel Arrow Lake-H
  usb: core: Prevent null pointer dereference in update_port_device_state
  xhci: handle isoc Babble and Buffer Overrun events properly
  xhci: process isoc TD properly when there was a transaction error mid TD.
  xhci: fix off by one check when adding a secondary interrupter.
  xhci: fix possible null pointer dereference at secondary interrupter removal
  ...
2024-02-04 06:52:29 +00:00
Linus Torvalds
bdda52cc66 Merge tag 'i2c-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixlet from Wolfram Sang:
 "MAINTAINERS update to point people to the new tree for i2c host driver
  changes"

* tag 'i2c-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Update i2c host drivers repository
2024-02-04 06:47:45 +00:00
Linus Torvalds
8a0c60a0e4 Merge tag 'dmaengine-fix-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
 "Core:

   - fix return value of is_slave_direction() for D2D dma

  Driver fixes for:

   - Documentaion fixes to resolve warnings for at_hdmac driver

   - bunch of fsl driver fixes for memory leaks, and useless kfree

   - TI edma and k3 fixes for packet error and null pointer checks"

* tag 'dmaengine-fix-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: at_hdmac: add missing kernel-doc style description
  dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
  dmaengine: fsl-qdma: Remove a useless devm_kfree()
  dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA
  dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA
  dmaengine: ti: k3-udma: Report short packet errors
  dmaengine: ti: edma: Add some null pointer checks to the edma_probe
  dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools
  dmaengine: at_hdmac: fix some kernel-doc warnings
2024-02-04 06:37:38 +00:00
Linus Torvalds
843a33d63b Merge tag 'phy-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy driver fixes from Vinod Koul:

 - TI null pointer dereference

 - missing erdes mux entry in lan966x driver

 - Return of error code in renesas driver

 - Serdes init sequence and register offsets for IPQ drivers

* tag 'phy-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
  phy: lan966x: Add missing serdes mux entry
  phy: renesas: rcar-gen3-usb2: Fix returning wrong error code
  phy: qcom-qmp-usb: fix serdes init sequence for IPQ6018
  phy: qcom-qmp-usb: fix register offsets for ipq8074/ipq6018
2024-02-04 06:35:00 +00:00
Wolfram Sang
957bd221ac Merge tag 'i2c-host-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Just a maintenance patch that updates the repository where the
i2c host and muxes related patches will be collected.
2024-02-03 19:23:41 +01:00
Linus Torvalds
b555d19156 Merge tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
 "Vendor events:

   - Intel Alderlake/Sapphire Rapids metric fixes, the CPU type
     ("cpu_atom", "cpu_core") needs to be used as a prefix to be
     considered on a metric formula, detected via one of the 'perf test'
     entries.

  'perf test' fixes:

   - Fix the creation of event selector lists on 'perf test' entries, by
     initializing the sample ID flag, which is done by 'perf record', so
     this fix affects only the tests, the common case isn't affected

   - Make 'perf list' respect debug settings (-v) to fix its 'perf test'
     entry

   - Fix 'perf script' test when python support isn't enabled

   - Special case 'perf script' tests on s390, where only DWARF call
     graphs are supported and only on software events

   - Make 'perf daemon' signal test less racy

  Compiler warnings/errors:

   - Remove needless malloc(0) call in 'perf top' that triggers
     -Walloc-size

   - Fix calloc() argument order to address error introduced in gcc-14

  Build:

   - Make minimal shellcheck version to v0.6.0, avoiding the build to
     fail with older versions

  Sync kernel header copies:

   - stat.h to pick STATX_MNT_ID_UNIQUE

   - msr-index.h to pick IA32_MKTME_KEYID_PARTITIONING

   - drm.h to pick DRM_IOCTL_MODE_CLOSEFB

   - unistd.h to pick {list,stat}mount,
     lsm_{[gs]et_self_attr,list_modules} syscall numbers

   - x86 cpufeatures to pick TDX, Zen, APIC MSR fence changes

   - x86's mem{cpy,set}_64.S used in 'perf bench'

   - Also, without tooling effects: asm-generic/unaligned.h, mount.h,
     fcntl.h, kvm headers"

* tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (21 commits)
  perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources
  tools include UAPI: Sync linux/mount.h copy with the kernel sources
  perf evlist: Fix evlist__new_default() for > 1 core PMU
  tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
  tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
  tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers
  perf vendor events intel: Alderlake/sapphirerapids metric fixes
  tools headers UAPI: Sync kvm headers with the kernel sources
  perf tools: Fix calloc() arguments to address error introduced in gcc-14
  perf top: Remove needless malloc(0) call that triggers -Walloc-size
  perf build: Make minimal shellcheck version to v0.6.0
  tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
  perf test shell daemon: Make signal test less racy
  perf test shell script: Fix test for python being disabled
  perf test: Workaround debug output in list test
  perf list: Add output file option
  perf list: Switch error message to pr_err() to respect debug settings (-v)
  perf test: Fix 'perf script' tests on s390
  tools headers UAPI: Sync linux/fcntl.h with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING
  ...
2024-02-03 12:52:36 +00:00
Linus Torvalds
56897d5188 Merge tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing and eventfs fixes from Steven Rostedt:

 - Fix the return code for ring_buffer_poll_wait()

   It was returing a -EINVAL instead of EPOLLERR.

 - Zero out the tracefs_inode so that all fields are initialized.

   The ti->private could have had stale data, but instead of just
   initializing it to NULL, clear out the entire structure when it is
   allocated.

 - Fix a crash in timerlat

   The hrtimer was initialized at read and not open, but is canceled at
   close. If the file was opened and never read the close will pass a
   NULL pointer to hrtime_cancel().

 - Rewrite of eventfs.

   Linus wrote a patch series to remove the dentry references in the
   eventfs_inode and to use ref counting and more of proper VFS
   interfaces to make it work.

 - Add warning to put_ei() if ei is not set to free. That means
   something is about to free it when it shouldn't.

 - Restructure the eventfs_inode to make it more compact, and remove the
   unused llist field.

 - Remove the fsnotify*() funtions for when the inodes were being
   created in the lookup code. It doesn't make sense to notify about
   creation just because something is being looked up.

 - The inode hard link count was not accurate.

   It was being updated when a file was looked up. The inodes of
   directories were updating their parent inode hard link count every
   time the inode was created. That means if memory reclaim cleaned a
   stale directory inode and the inode was lookup up again, it would
   increment the parent inode again as well. Al Viro said to just have
   all eventfs directories have a hard link count of 1. That tells user
   space not to trust it.

* tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Keep all directory links at 1
  eventfs: Remove fsnotify*() functions from lookup()
  eventfs: Restructure eventfs_inode structure to be more condensed
  eventfs: Warn if an eventfs_inode is freed without is_freed being set
  tracing/timerlat: Move hrtimer_init to timerlat_fd open()
  eventfs: Get rid of dentry pointers without refcounts
  eventfs: Clean up dentry ops and add revalidate function
  eventfs: Remove unused d_parent pointer field
  tracefs: dentry lookup crapectomy
  tracefs: Avoid using the ei->dentry pointer unnecessarily
  eventfs: Initialize the tracefs inode properly
  tracefs: Zero out the tracefs_inode when allocating it
  ring-buffer: Clean ring_buffer_poll_wait() error return
2024-02-02 15:32:58 -08:00
Linus Torvalds
6b89b6af45 Merge tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 revert from Andreas Gruenbacher:
 "It turns out that the commit to use GL_NOBLOCK flag for non-blocking
  lookups has several issues, and not all of them have a simple fix"

* tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
2024-02-02 15:30:33 -08:00
Linus Torvalds
b1dd6c26bc Merge tag 'pci-v6.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:

 - Fix a potential deadlock that was reintroduced by an ASPM revert
   merged for v6.8 (Johan Hovold)

 - Add Manivannan Sadhasivam as PCI Endpoint maintainer (Lorenzo
   Pieralisi)

* tag 'pci-v6.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: Add Manivannan Sadhasivam as PCI Endpoint maintainer
  PCI/ASPM: Fix deadlock when enabling ASPM
2024-02-02 12:56:56 -08:00
Linus Torvalds
9c2f0338bb Merge tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm
Pul drm fixes from Dave Airlie:
 "Regular weekly fixes, mostly amdgpu and xe. One nouveau fix is a
  better fix for the deadlock and also helps with a sync race we were
  seeing.

  dma-buf:
   - heaps CMA page accounting fix

  virtio-gpu:
   - fix segment size

  xe:
   - A crash fix
   - A fix for an assert due to missing mem_acces ref
   - Only allow a single user-fence per exec / bind.
   - Some sparse warning fixes
   - Two fixes for compilation failures on various odd combinations of
     gcc / arch pointed out on LKML.
   - Fix a fragile partial allocation pointed out on LKML.
   - A sysfs ABI documentation warning fix

  amdgpu:
   - Fix reboot issue seen on some 7000 series dGPUs
   - Fix client init order for KFD
   - Misc display fixes
   - USB-C fix
   - DCN 3.5 fixes
   - Fix issues with GPU scheduler and GPU reset
   - GPU firmware loading fix
   - Misc fixes
   - GC 11.5 fix
   - VCN 4.0.5 fix
   - IH overflow fix

  amdkfd:
   - SVM fixes
   - Trap handler fix
   - Fix device permission lookup
   - Properly reserve BO before validating it

  nouveau:
   - fence/irq lock deadlock fix (second attempt)
   - gsp command size fix

* tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm: (35 commits)
  nouveau: offload fence uevents work to workqueue
  nouveau/gsp: use correct size for registry rpc.
  drm/amdgpu/pm: Use inline function for IP version check
  drm/hwmon: Fix abi doc warnings
  drm/xe: Make all GuC ABI shift values unsigned
  drm/xe/vm: Subclass userptr vmas
  drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines
  drm/xe: Don't use __user error pointers
  drm/xe: Annotate mcr_[un]lock()
  drm/xe: Only allow 1 ufence per exec / bind IOCTL
  drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms
  drm/xe: Fix crash in trace_dma_fence_init()
  drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
  drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
  drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
  drm/amdkfd: reserve the BO before validating it
  drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
  drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'
  drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'
  drm/amd: Don't init MEC2 firmware when it fails to load
  ...
2024-02-02 12:54:46 -08:00
Linus Torvalds
eab5c86d24 Merge tag 'input-for-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a fix for the fix to deal with newer laptops which get confused by
   the "GET ID" command when probing for PS/2 keyboards

 - a couple of tweaks to i8042 to handle Clevo NS70PU and Lifebook U728
   laptops

 - a change to bcm5974 to validate that the device has appropriate
   endpoints

 - an addition of new product ID to xpad driver to recognize Lenovo
   Legion Go controllers

 - a quirk to Goodix controller to deal with extra GPIO described in
   ACPI tables on some devices.

* tag 'input-for-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add Fujitsu Lifebook U728 to i8042 quirk table
  Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
  Input: atkbd - do not skip atkbd_deactivate() when skipping ATKBD_CMD_GETID
  Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
  Input: bcm5974 - check endpoint type before starting traffic
  Input: xpad - add Lenovo Legion Go controllers
  Input: goodix - accept ACPI resources with gpio_count == 3 && gpio_int_idx == 0
2024-02-02 12:52:44 -08:00