mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-12-27 10:01:39 -05:00
Patch series "Always call constructor for kernel page tables", v2.
There has been much confusion around exactly when page table
constructors/destructors (pagetable_*_[cd]tor) are supposed to be called.
They were initially introduced for user PTEs only (to support split page
table locks), then at the PMD level for the same purpose. Accounting was
added later on, starting at the PTE level and then moving to higher levels
(PMD, PUD). Finally, with my earlier series "Account page tables at all
levels" [1], the ctor/dtor is run for all levels, all the way to PGD.
I thought this was the end of the story, and it hopefully is for user
pgtables, but I was wrong as far as kernel pgtables are concerned. The
current situation there makes very little sense:
* At the PTE level, the ctor/dtor is not called (at least in the generic
implementation). Specific helpers are used for kernel pgtables at this
level (pte_{alloc,free}_kernel()) and those have never called the
ctor/dtor, most likely because they were initially irrelevant in the
kernel case.
* At all other levels, the ctor/dtor is normally called. This is
potentially wasteful at the PMD level (more on that later).
This series aims to ensure that the ctor/dtor is always called for kernel
pgtables, as it already is for user pgtables. Besides consistency, the
main motivation is to guarantee that ctor/dtor hooks are systematically
called; this makes it possible to insert hooks to protect page tables [2],
for instance. There is however an extra challenge: split locks are not
used for kernel pgtables, and it would therefore be wasteful to initialise
them (ptlock_init()).
It is worth clarifying exactly when split locks are used. They clearly
are for user pgtables, but as illustrated in commit 61444cde91 ("ARM:
8591/1: mm: use fully constructed struct pages for EFI pgd allocations"),
they also are for special page tables like efi_mm. The one case where
split locks are definitely unused is pgtables owned by init_mm; this is
consistent with the behaviour of apply_to_pte_range().
The approach chosen in this series is therefore to pass the mm associated
with the pgtables being constructed to pagetable_{pte,pmd}_ctor() (patch 1),
and skip ptlock_init() if mm == &init_mm (patches 3 and 7). This makes it
possible to call the PTE ctor/dtor from pte_{alloc,free}_kernel() without
unintended consequences (patch 3). As a result the accounting functions
are now called at all levels for kernel pgtables, and split locks are
never initialised for them.
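To make the intended behaviour concrete, here is a minimal user-space sketch of the rule described above. It is an illustration only, not the kernel implementation: model_mm, model_ptdesc, model_init_mm and model_pte_ctor() are hypothetical stand-ins invented for this example, and the two booleans merely record which steps a real ctor would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mm_struct and struct ptdesc. */
struct model_mm { int unused; };
struct model_ptdesc {
	bool ptlock_ready;	/* models "ptlock_init() was called" */
	bool accounted;		/* models "accounting was done" */
};

/* Models init_mm, the owner of regular kernel page tables. */
static struct model_mm model_init_mm;

/*
 * Models a PTE/PMD-level ctor that takes the owning mm: accounting
 * always happens, the split lock is skipped only for model_init_mm.
 * A NULL mm (special allocators) keeps the lock initialisation.
 */
static void model_pte_ctor(struct model_mm *mm, struct model_ptdesc *pt)
{
	assert(pt != NULL);

	if (mm != &model_init_mm)
		pt->ptlock_ready = true;
	pt->accounted = true;
}
```

Calling model_pte_ctor() with a user mm or with NULL sets both flags, while calling it with &model_init_mm only sets accounted, which mirrors the "account everywhere, lock only where needed" outcome this series is after.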
In configurations where ptlocks are dynamically allocated (32-bit,
PREEMPT_RT, etc.) and ARCH_ENABLE_SPLIT_PMD_PTLOCK is selected, this
series results in the removal of a kmem_cache allocation for every kernel
PMD. Additionally, for certain architectures that do not use
<asm-generic/pgalloc.h> such as s390, the same optimisation occurs at the
PTE level.
===
Things get more complicated when it comes to special pgtable allocators
(patches 8-12). All architectures need such allocators to create initial
kernel pgtables; we are not concerned with those as the ctor cannot be
called so early in the boot sequence. However, those allocators may also
be used later in the boot sequence or during normal operations. There are
two main use-cases:
1. Mapping EFI memory: efi_mm (arm, arm64, riscv)
2. arch_add_memory(): init_mm
The ctor is already explicitly run (at the PTE/PMD level) in the first
case, as required for pgtables that are not associated with init_mm.
However the same allocators may also be used for the second use-case (or
others), and this is where it gets messy. Patch 1 calls the ctor with
NULL as mm in those situations, as the actual mm isn't available.
Practically this means that ptlocks will be unconditionally initialised.
This is fine on arm: create_mapping_late() is only used for the EFI
mapping. On arm64, __create_pgd_mapping() is also used by
arch_add_memory(); patches 8, 9 and 11 ensure that ctors are called at all
levels with the appropriate mm. The situation is similar on riscv, but
propagating the mm down to the ctor would require significant refactoring.
Since the ctors are already called unconditionally there, this series
leaves riscv no worse off; patch 10 adds comments to clarify the situation.
From a cursory look at other architectures implementing arch_add_memory(),
s390 and x86 may also need a similar treatment to add constructor calls.
This is to be taken care of in a future version or as a follow-up.
===
The complications in those special pgtable allocators raise the question:
does it really make sense to treat efi_mm and init_mm differently in e.g.
apply_to_pte_range()? Maybe what we really need is a way to tell if an mm
corresponds to user memory or not, and never use split locks for non-user
mm's. Feedback and suggestions welcome!
This patch (of 12):
In preparation for calling constructors for all kernel page tables while
eliding unnecessary ptlock initialisation, let's pass down the associated
mm to the PTE/PMD level ctors. (These are the two levels where ptlocks
are used.)
In most cases the mm is already around at the point of calling the ctor so
we simply pass it down. This is however not the case for special page
table allocators:
* arch/arm/mm/mmu.c
* arch/arm64/mm/mmu.c
* arch/riscv/mm/init.c
In those cases, the page tables being allocated are either for standard
kernel memory (init_mm) or special page directories, which may not be
associated with any mm. For now let's pass NULL as mm; this will be refined
where possible in future patches.
No functional change in this patch.
Link: https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/ [1]
Link: https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@arm.com/ [2]
Link: https://lkml.kernel.org/r/20250408095222.860601-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20250408095222.860601-2-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: <x86@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
520 lines
13 KiB
C
// SPDX-License-Identifier: GPL-2.0
/*
 * linux/arch/m68k/mm/motorola.c
 *
 * Routines specific to the Motorola MMU, originally from:
 * linux/arch/m68k/init.c
 * which are Copyright (C) 1995 Hamish Macdonald
 *
 * Moved 8/20/1999 Sam Creasey
 */

#include <linux/module.h>
#include <linux/signal.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/gfp.h>

#include <asm/setup.h>
#include <linux/uaccess.h>
#include <asm/page.h>
#include <asm/pgalloc.h>
#include <asm/machdep.h>
#include <asm/io.h>
#ifdef CONFIG_ATARI
#include <asm/atari_stram.h>
#endif
#include <asm/sections.h>

#undef DEBUG

#ifndef mm_cachebits
/*
 * Bits to add to page descriptors for "normal" caching mode.
 * For 68020/030 this is 0.
 * For 68040, this is _PAGE_CACHE040 (cachable, copyback)
 */
unsigned long mm_cachebits;
EXPORT_SYMBOL(mm_cachebits);
#endif

/* Prior to calling these routines, the page should have been flushed
 * from both the cache and ATC, or the CPU might not notice that the
 * cache setting for the page has been changed. -jskov
 */
static inline void nocache_page(void *vaddr)
{
	unsigned long addr = (unsigned long)vaddr;

	if (CPU_IS_040_OR_060) {
		pte_t *ptep = virt_to_kpte(addr);

		*ptep = pte_mknocache(*ptep);
	}
}

static inline void cache_page(void *vaddr)
{
	unsigned long addr = (unsigned long)vaddr;

	if (CPU_IS_040_OR_060) {
		pte_t *ptep = virt_to_kpte(addr);

		*ptep = pte_mkcache(*ptep);
	}
}

/*
 * Motorola 680x0 user's manual recommends using uncached memory for address
 * translation tables.
 *
 * Seeing how the MMU can be external on (some of) these chips, that seems like
 * a very important recommendation to follow. Provide some helpers to combat
 * 'variation' amongst the users of this.
 */

void mmu_page_ctor(void *page)
{
	__flush_pages_to_ram(page, 1);
	flush_tlb_kernel_page(page);
	nocache_page(page);
}

void mmu_page_dtor(void *page)
{
	cache_page(page);
}

/* ++andreas: {get,free}_pointer_table rewritten to use unused fields from
   struct page instead of separately kmalloced struct. Stolen from
   arch/sparc/mm/srmmu.c ... */

typedef struct list_head ptable_desc;

static struct list_head ptable_list[3] = {
	LIST_HEAD_INIT(ptable_list[0]),
	LIST_HEAD_INIT(ptable_list[1]),
	LIST_HEAD_INIT(ptable_list[2]),
};

#define PD_PTABLE(page) ((ptable_desc *)&(virt_to_page((void *)(page))->lru))
#define PD_PAGE(ptable) (list_entry(ptable, struct page, lru))
#define PD_MARKBITS(dp) (*(unsigned int *)&PD_PAGE(dp)->index)

static const int ptable_shift[3] = {
	7+2, /* PGD */
	7+2, /* PMD */
	6+2, /* PTE */
};

#define ptable_size(type) (1U << ptable_shift[type])
#define ptable_mask(type) ((1U << (PAGE_SIZE / ptable_size(type))) - 1)

void __init init_pointer_table(void *table, int type)
{
	ptable_desc *dp;
	unsigned long ptable = (unsigned long)table;
	unsigned long page = ptable & PAGE_MASK;
	unsigned int mask = 1U << ((ptable - page)/ptable_size(type));

	dp = PD_PTABLE(page);
	if (!(PD_MARKBITS(dp) & mask)) {
		PD_MARKBITS(dp) = ptable_mask(type);
		list_add(dp, &ptable_list[type]);
	}

	PD_MARKBITS(dp) &= ~mask;
	pr_debug("init_pointer_table: %lx, %x\n", ptable, PD_MARKBITS(dp));

	/* unreserve the page so it's possible to free that page */
	__ClearPageReserved(PD_PAGE(dp));
	init_page_count(PD_PAGE(dp));

	return;
}

void *get_pointer_table(struct mm_struct *mm, int type)
{
	ptable_desc *dp = ptable_list[type].next;
	unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp);
	unsigned int tmp, off;

	/*
	 * For a pointer table for a user process address space, a
	 * table is taken from a page allocated for the purpose. Each
	 * page can hold 8 pointer tables. The page is remapped in
	 * virtual address space to be noncacheable.
	 */
	if (mask == 0) {
		void *page;
		ptable_desc *new;

		if (!(page = (void *)get_zeroed_page(GFP_KERNEL)))
			return NULL;

		switch (type) {
		case TABLE_PTE:
			/*
			 * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
			 * SMP.
			 */
			pagetable_pte_ctor(mm, virt_to_ptdesc(page));
			break;
		case TABLE_PMD:
			pagetable_pmd_ctor(mm, virt_to_ptdesc(page));
			break;
		case TABLE_PGD:
			pagetable_pgd_ctor(virt_to_ptdesc(page));
			break;
		}

		mmu_page_ctor(page);

		new = PD_PTABLE(page);
		PD_MARKBITS(new) = ptable_mask(type) - 1;
		list_add_tail(new, dp);

		return (pmd_t *)page;
	}

	for (tmp = 1, off = 0; (mask & tmp) == 0; tmp <<= 1, off += ptable_size(type))
		;
	PD_MARKBITS(dp) = mask & ~tmp;
	if (!PD_MARKBITS(dp)) {
		/* move to end of list */
		list_move_tail(dp, &ptable_list[type]);
	}
	return page_address(PD_PAGE(dp)) + off;
}

int free_pointer_table(void *table, int type)
{
	ptable_desc *dp;
	unsigned long ptable = (unsigned long)table;
	unsigned long page = ptable & PAGE_MASK;
	unsigned int mask = 1U << ((ptable - page)/ptable_size(type));

	dp = PD_PTABLE(page);
	if (PD_MARKBITS (dp) & mask)
		panic ("table already free!");

	PD_MARKBITS (dp) |= mask;

	if (PD_MARKBITS(dp) == ptable_mask(type)) {
		/* all tables in page are free, free page */
		list_del(dp);
		mmu_page_dtor((void *)page);
		pagetable_dtor(virt_to_ptdesc((void *)page));
		free_page (page);
		return 1;
	} else if (ptable_list[type].next != dp) {
		/*
		 * move this descriptor to the front of the list, since
		 * it has one or more free tables.
		 */
		list_move(dp, &ptable_list[type]);
	}
	return 0;
}

/* size of memory already mapped in head.S */
extern __initdata unsigned long m68k_init_mapped_size;

extern unsigned long availmem;

static pte_t *last_pte_table __initdata = NULL;

static pte_t * __init kernel_page_table(void)
{
	pte_t *pte_table = last_pte_table;

	if (PAGE_ALIGNED(last_pte_table)) {
		pte_table = memblock_alloc_low(PAGE_SIZE, PAGE_SIZE);
		if (!pte_table) {
			panic("%s: Failed to allocate %lu bytes align=%lx\n",
			      __func__, PAGE_SIZE, PAGE_SIZE);
		}

		clear_page(pte_table);
		mmu_page_ctor(pte_table);

		last_pte_table = pte_table;
	}

	last_pte_table += PTRS_PER_PTE;

	return pte_table;
}

static pmd_t *last_pmd_table __initdata = NULL;

static pmd_t * __init kernel_ptr_table(void)
{
	if (!last_pmd_table) {
		unsigned long pmd, last;
		int i;

		/* Find the last ptr table that was used in head.S and
		 * reuse the remaining space in that page for further
		 * ptr tables.
		 */
		last = (unsigned long)kernel_pg_dir;
		for (i = 0; i < PTRS_PER_PGD; i++) {
			pud_t *pud = (pud_t *)(&kernel_pg_dir[i]);

			if (!pud_present(*pud))
				continue;
			pmd = pgd_page_vaddr(kernel_pg_dir[i]);
			if (pmd > last)
				last = pmd;
		}

		last_pmd_table = (pmd_t *)last;
#ifdef DEBUG
		printk("kernel_ptr_init: %p\n", last_pmd_table);
#endif
	}

	last_pmd_table += PTRS_PER_PMD;
	if (PAGE_ALIGNED(last_pmd_table)) {
		last_pmd_table = memblock_alloc_low(PAGE_SIZE, PAGE_SIZE);
		if (!last_pmd_table)
			panic("%s: Failed to allocate %lu bytes align=%lx\n",
			      __func__, PAGE_SIZE, PAGE_SIZE);

		clear_page(last_pmd_table);
		mmu_page_ctor(last_pmd_table);
	}

	return last_pmd_table;
}

static void __init map_node(int node)
{
	unsigned long physaddr, virtaddr, size;
	pgd_t *pgd_dir;
	p4d_t *p4d_dir;
	pud_t *pud_dir;
	pmd_t *pmd_dir;
	pte_t *pte_dir;

	size = m68k_memory[node].size;
	physaddr = m68k_memory[node].addr;
	virtaddr = (unsigned long)phys_to_virt(physaddr);
	physaddr |= m68k_supervisor_cachemode |
		    _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY;
	if (CPU_IS_040_OR_060)
		physaddr |= _PAGE_GLOBAL040;

	while (size > 0) {
#ifdef DEBUG
		if (!(virtaddr & (PMD_SIZE-1)))
			printk ("\npa=%#lx va=%#lx ", physaddr & PAGE_MASK,
				virtaddr);
#endif
		pgd_dir = pgd_offset_k(virtaddr);
		if (virtaddr && CPU_IS_020_OR_030) {
			if (!(virtaddr & (PGDIR_SIZE-1)) &&
			    size >= PGDIR_SIZE) {
#ifdef DEBUG
				printk ("[very early term]");
#endif
				pgd_val(*pgd_dir) = physaddr;
				size -= PGDIR_SIZE;
				virtaddr += PGDIR_SIZE;
				physaddr += PGDIR_SIZE;
				continue;
			}
		}
		p4d_dir = p4d_offset(pgd_dir, virtaddr);
		pud_dir = pud_offset(p4d_dir, virtaddr);
		if (!pud_present(*pud_dir)) {
			pmd_dir = kernel_ptr_table();
#ifdef DEBUG
			printk ("[new pointer %p]", pmd_dir);
#endif
			pud_set(pud_dir, pmd_dir);
		} else
			pmd_dir = pmd_offset(pud_dir, virtaddr);

		if (CPU_IS_020_OR_030) {
			if (virtaddr) {
#ifdef DEBUG
				printk ("[early term]");
#endif
				pmd_val(*pmd_dir) = physaddr;
				physaddr += PMD_SIZE;
			} else {
				int i;
#ifdef DEBUG
				printk ("[zero map]");
#endif
				pte_dir = kernel_page_table();
				pmd_set(pmd_dir, pte_dir);

				pte_val(*pte_dir++) = 0;
				physaddr += PAGE_SIZE;
				for (i = 1; i < PTRS_PER_PTE; physaddr += PAGE_SIZE, i++)
					pte_val(*pte_dir++) = physaddr;
			}
			size -= PMD_SIZE;
			virtaddr += PMD_SIZE;
		} else {
			if (!pmd_present(*pmd_dir)) {
#ifdef DEBUG
				printk ("[new table]");
#endif
				pte_dir = kernel_page_table();
				pmd_set(pmd_dir, pte_dir);
			}
			pte_dir = pte_offset_kernel(pmd_dir, virtaddr);

			if (virtaddr) {
				if (!pte_present(*pte_dir))
					pte_val(*pte_dir) = physaddr;
			} else
				pte_val(*pte_dir) = 0;
			size -= PAGE_SIZE;
			virtaddr += PAGE_SIZE;
			physaddr += PAGE_SIZE;
		}

	}
#ifdef DEBUG
	printk("\n");
#endif
}

/*
 * Alternate definitions that are compile time constants, for
 * initializing protection_map. The cachebits are fixed later.
 */
#define PAGE_NONE_C __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
#define PAGE_SHARED_C __pgprot(_PAGE_PRESENT | _PAGE_ACCESSED)
#define PAGE_COPY_C __pgprot(_PAGE_PRESENT | _PAGE_RONLY | _PAGE_ACCESSED)
#define PAGE_READONLY_C __pgprot(_PAGE_PRESENT | _PAGE_RONLY | _PAGE_ACCESSED)

static pgprot_t protection_map[16] __ro_after_init = {
	[VM_NONE] = PAGE_NONE_C,
	[VM_READ] = PAGE_READONLY_C,
	[VM_WRITE] = PAGE_COPY_C,
	[VM_WRITE | VM_READ] = PAGE_COPY_C,
	[VM_EXEC] = PAGE_READONLY_C,
	[VM_EXEC | VM_READ] = PAGE_READONLY_C,
	[VM_EXEC | VM_WRITE] = PAGE_COPY_C,
	[VM_EXEC | VM_WRITE | VM_READ] = PAGE_COPY_C,
	[VM_SHARED] = PAGE_NONE_C,
	[VM_SHARED | VM_READ] = PAGE_READONLY_C,
	[VM_SHARED | VM_WRITE] = PAGE_SHARED_C,
	[VM_SHARED | VM_WRITE | VM_READ] = PAGE_SHARED_C,
	[VM_SHARED | VM_EXEC] = PAGE_READONLY_C,
	[VM_SHARED | VM_EXEC | VM_READ] = PAGE_READONLY_C,
	[VM_SHARED | VM_EXEC | VM_WRITE] = PAGE_SHARED_C,
	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED_C
};
DECLARE_VM_GET_PAGE_PROT

/*
 * paging_init() continues the virtual memory environment setup which
 * was begun by the code in arch/head.S.
 */
void __init paging_init(void)
{
	unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0, };
	unsigned long min_addr, max_addr;
	unsigned long addr;
	int i;

#ifdef DEBUG
	printk ("start of paging_init (%p, %lx)\n", kernel_pg_dir, availmem);
#endif

	/* Fix the cache mode in the page descriptors for the 680[46]0. */
	if (CPU_IS_040_OR_060) {
		int i;
#ifndef mm_cachebits
		mm_cachebits = _PAGE_CACHE040;
#endif
		for (i = 0; i < 16; i++)
			pgprot_val(protection_map[i]) |= _PAGE_CACHE040;
	}

	min_addr = m68k_memory[0].addr;
	max_addr = min_addr + m68k_memory[0].size - 1;
	memblock_add_node(m68k_memory[0].addr, m68k_memory[0].size, 0,
			  MEMBLOCK_NONE);
	for (i = 1; i < m68k_num_memory;) {
		if (m68k_memory[i].addr < min_addr) {
			printk("Ignoring memory chunk at 0x%lx:0x%lx before the first chunk\n",
				m68k_memory[i].addr, m68k_memory[i].size);
			printk("Fix your bootloader or use a memfile to make use of this area!\n");
			m68k_num_memory--;
			memmove(m68k_memory + i, m68k_memory + i + 1,
				(m68k_num_memory - i) * sizeof(struct m68k_mem_info));
			continue;
		}
		memblock_add_node(m68k_memory[i].addr, m68k_memory[i].size, i,
				  MEMBLOCK_NONE);
		addr = m68k_memory[i].addr + m68k_memory[i].size - 1;
		if (addr > max_addr)
			max_addr = addr;
		i++;
	}
	m68k_memoffset = min_addr - PAGE_OFFSET;
	m68k_virt_to_node_shift = fls(max_addr - min_addr) - 6;

	module_fixup(NULL, __start_fixup, __stop_fixup);
	flush_icache();

	high_memory = phys_to_virt(max_addr) + 1;

	min_low_pfn = availmem >> PAGE_SHIFT;
	max_pfn = max_low_pfn = (max_addr >> PAGE_SHIFT) + 1;

	/* Reserve kernel text/data/bss and the memory allocated in head.S */
	memblock_reserve(m68k_memory[0].addr, availmem - m68k_memory[0].addr);

	/*
	 * Map the physical memory available into the kernel virtual
	 * address space. Make sure memblock will not try to allocate
	 * pages beyond the memory we already mapped in head.S
	 */
	memblock_set_bottom_up(true);

	for (i = 0; i < m68k_num_memory; i++) {
		m68k_setup_node(i);
		map_node(i);
	}

	flush_tlb_all();

	early_memtest(min_addr, max_addr);

	/*
	 * initialize the bad page table and bad page to point
	 * to a couple of allocated pages
	 */
	empty_zero_page = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);

	/*
	 * Set up SFC/DFC registers
	 */
	set_fc(USER_DATA);

#ifdef DEBUG
	printk ("before free_area_init\n");
#endif
	for (i = 0; i < m68k_num_memory; i++)
		if (node_present_pages(i))
			node_set_state(i, N_NORMAL_MEMORY);

	max_zone_pfn[ZONE_DMA] = memblock_end_of_DRAM();
	free_area_init(max_zone_pfn);
}
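
The mark-bit bookkeeping used by get_pointer_table() and free_pointer_table() in the listing can be easier to follow in isolation. The following is a rough user-space sketch of just the bit arithmetic, assuming 4 KiB pages; every model_* name is hypothetical and invented for this illustration, and the list handling, cache-mode changes and ctor/dtor calls of the real code are deliberately left out.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Same shifts as ptable_shift[] in the listing: PGD, PMD, PTE. */
static const int model_shift[3] = { 7 + 2, 7 + 2, 6 + 2 };

/* Size in bytes of one table of the given type. */
static unsigned int model_size(int type)
{
	return 1u << model_shift[type];
}

/* One bit per table slot in a page; a set bit means "slot is free". */
static unsigned int model_mask(int type)
{
	return (1u << (MODEL_PAGE_SIZE / model_size(type))) - 1;
}

/*
 * Claim the lowest free slot and return its byte offset in the page,
 * or -1 if no slot is free (the real code allocates a new page then).
 */
static long model_alloc(unsigned int *markbits, int type)
{
	unsigned int tmp = 1;
	long off = 0;

	if (*markbits == 0)
		return -1;
	while ((*markbits & tmp) == 0) {
		tmp <<= 1;
		off += model_size(type);
	}
	*markbits &= ~tmp;
	return off;
}

/* Release a slot; returns 1 once every slot in the page is free again. */
static int model_free(unsigned int *markbits, int type, unsigned long off)
{
	unsigned int mask = 1u << (off / model_size(type));

	assert(!(*markbits & mask));	/* the real code panics here */
	*markbits |= mask;
	return *markbits == model_mask(type);
}
```

Under these assumptions a page holds 8 PGD/PMD-sized tables (mask 0xff) or 16 PTE-sized tables (mask 0xffff). A freshly allocated page starts at model_mask(type) - 1 because slot 0 is handed out immediately, which matches the PD_MARKBITS(new) assignment in get_pointer_table().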