Memory management for CRIS/MMU
------------------------------
HISTORY:

$Log: README.mm,v $
Revision 1.1  2001/12/17 13:59:27  bjornw
Initial revision

Revision 1.1  2000/07/10 16:25:21  bjornw
Initial revision

Revision 1.4  2000/01/17 02:31:59  bjornw
Added discussion of paging and VM.

Revision 1.3  1999/12/03 16:43:23  hp
Blurb about that the 3.5G-limitation is not a MMU limitation

Revision 1.2  1999/12/03 16:04:21  hp
Picky comment about not mapping the first page

Revision 1.1  1999/12/03 15:41:30  bjornw
First version of CRIS/MMU memory layout specification.


------------------------------

See the ETRAX-NG HSDD for reference.

We use a page size of 8 kbytes, as opposed to the i386 page size of 4 kbytes.

The MMU can, apart from the normal mapping of pages, also do a top-level
segmentation of the kernel memory space. We use this feature to avoid having
to use page-tables to map the physical memory into the kernel's address
space. We also use it to keep the user-mode virtual mapping in the same
map during kernel-mode, so that the kernel can easily access the corresponding
user-mode process's data.

As a comparison, the Linux/i386 2.0 puts the kernel and physical RAM at
address 0, overlapping with the user-mode virtual space, so that descriptor
registers are needed for each memory access to specify which MMU space to
map through. That changed in 2.2, putting the kernel/physical RAM at 
0xc0000000, to co-exist with the user-mode mapping. We will do something
quite similar, but with the additional complexity of having to map the
internal chip I/O registers and the flash memory area (including SRAM
and peripheral chip-selects).

The kernel-mode segmentation map:

        ------------------------                ------------------------
FFFFFFFF|                      | => cached      |                      | 
        |    kernel seg_f      |    flash       |                      |
F0000000|______________________|                |                      |
EFFFFFFF|                      | => uncached    |                      | 
        |    kernel seg_e      |    flash       |                      |
E0000000|______________________|                |        DRAM          |
DFFFFFFF|                      |  paged to any  |      Un-cached       | 
        |    kernel seg_d      |    =======>    |                      |
D0000000|______________________|                |                      |
CFFFFFFF|                      |                |                      | 
        |    kernel seg_c      |==\             |                      |
C0000000|______________________|   \            |______________________|
BFFFFFFF|                      |  uncached      |                      |
        |    kernel seg_b      |=====\=========>|       Registers      |
B0000000|______________________|      \c        |______________________|
AFFFFFFF|                      |       \a       |                      |
        |                      |        \c      | FLASH/SRAM/Peripheral|
        |                      |         \h     |______________________|
        |                      |          \e    |                      |
        |                      |           \d   |                      |
        | kernel seg_0 - seg_a |            \==>|         DRAM         | 
        |                      |                |        Cached        |
        |                      |  paged to any  |                      |
        |                      |    =======>    |______________________| 
        |                      |                |                      |
        |                      |                |        Illegal       |
        |                      |                |______________________|
        |                      |                |                      |      
        |                      |                | FLASH/SRAM/Peripheral|
00000000|______________________|                |______________________|

In user-mode it looks the same except that only the space 0-AFFFFFFF is
available. Therefore, in this model, the virtual address space per process
is limited to 0xb0000000 bytes (minus 8192 bytes, since the first page,
0..8191, is never mapped, in order to trap NULL references).

It also means that the total physical RAM that can be mapped is 256 MB
(kseg_c above). More RAM can be mapped by choosing a different segmentation
and shrinking the user-mode memory space.
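
The sizes quoted above follow directly from the segmentation. The sketch below
only redoes that arithmetic; the macro and function names are made up for this
illustration and do not come from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define CRIS_PAGE_SIZE  8192u        /* page size stated in this document */
#define SEG_SIZE        0x10000000u  /* one of 16 top-level segments: 256 MB */
#define USER_TOP        0xb0000000u  /* first address above seg_a */

/* Bytes of virtual space a process can map: everything below USER_TOP
 * except the first page, which is never mapped. */
static uint32_t user_mappable_bytes(void)
{
	return USER_TOP - CRIS_PAGE_SIZE;
}

/* Number of whole segments (seg_0 .. seg_a) visible in user mode. */
static uint32_t user_segments(void)
{
	assert(USER_TOP % SEG_SIZE == 0);  /* user space ends on a segment boundary */
	return USER_TOP / SEG_SIZE;
}
```

user_mappable_bytes() gives 0xafffe000, and user_segments() gives 11. A single
segment such as kseg_c covers SEG_SIZE bytes, which is the 256 MB RAM limit
mentioned above.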

The MMU can map all 4 GB in user mode, but doing that would mean that a
few extra instructions would be needed for each access to user mode
memory.

The kernel needs access to both cached and uncached flash. Uncached is
necessary because of the special write/erase sequences. Also, the
peripheral chip-selects are decoded from that region.

The kernel also needs its own virtual memory space. That is kseg_d. It
is used by the vmalloc() kernel function to allocate virtually contiguous
chunks of memory that the normal kmalloc physical RAM allocator cannot
provide.

The setting of the actual MMU control registers to use this layout would
be something like this:

R_MMU_KSEG = ( ( seg_f, seg     ) |   // Flash cached
               ( seg_e, seg     ) |   // Flash uncached
               ( seg_d, page    ) |   // kernel vmalloc area    
               ( seg_c, seg     ) |   // kernel linear segment
               ( seg_b, seg     ) |   // uncached on-chip registers
               ( seg_a, page    ) |
               ( seg_9, page    ) |
               ( seg_8, page    ) |
               ( seg_7, page    ) |
               ( seg_6, page    ) |
               ( seg_5, page    ) |
               ( seg_4, page    ) |
               ( seg_3, page    ) |
               ( seg_2, page    ) |
               ( seg_1, page    ) |
               ( seg_0, page    ) );

R_MMU_KBASE_HI = ( ( base_f, 0x0 ) |   // flash/sram/periph cached
                   ( base_e, 0x8 ) |   // flash/sram/periph uncached
                   ( base_d, 0x0 ) |   // don't care
                   ( base_c, 0x4 ) |   // physical RAM cached area
                   ( base_b, 0xb ) |   // uncached on-chip registers
                   ( base_a, 0x0 ) |   // don't care
                   ( base_9, 0x0 ) |   // don't care
                   ( base_8, 0x0 ) );  // don't care

R_MMU_KBASE_LO = ( ( base_7, 0x0 ) |   // don't care
                   ( base_6, 0x0 ) |   // don't care
                   ( base_5, 0x0 ) |   // don't care
                   ( base_4, 0x0 ) |   // don't care
                   ( base_3, 0x0 ) |   // don't care
                   ( base_2, 0x0 ) |   // don't care
                   ( base_1, 0x0 ) |   // don't care
                   ( base_0, 0x0 ) );  // don't care

NOTE: while setting up the MMU, we run in a non-mapped mode in the DRAM (0x40
segment) and need to set up seg_4 as a unity mapping, so that we don't get
a fault before we have had time to jump into the real kernel segment (0xc0). This
is done temporarily in head.S and fixed up by the kernel later in paging_init.
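
To make the effect of these register values concrete, here is a minimal sketch
of how a segment-mapped kernel address would turn into a physical address. The
bit layout is an assumption made for the illustration, not something this text
confirms: one bit per segment in R_MMU_KSEG (1 = seg, 0 = page), and one 4-bit
base per segment with the highest-numbered segment in the top nibble. The
names KSEG, KBASE_HI, KBASE_LO and kseg_translate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values corresponding to the settings listed above, under the assumed
 * layout: seg_f, seg_e, seg_c and seg_b are segment-mapped. */
#define KSEG      0xd800u      /* bits 15, 14, 12 and 11 set */
#define KBASE_HI  0x0804b000u  /* base_f=0x0, base_e=0x8, base_c=0x4, base_b=0xb */
#define KBASE_LO  0x00000000u  /* base_7 .. base_0: don't care */

/* Returns 1 and stores the physical address if vaddr lies in a
 * segment-mapped kernel segment; returns 0 if that segment is paged. */
static int kseg_translate(uint32_t vaddr, uint32_t *paddr)
{
	uint32_t seg = vaddr >> 28;   /* top nibble selects the segment */
	uint32_t bases, base;

	assert(paddr != NULL);
	if (!((KSEG >> seg) & 1u))
		return 0;
	bases = (seg >= 8u) ? KBASE_HI : KBASE_LO;
	base = (bases >> ((seg & 7u) * 4u)) & 0xfu;
	*paddr = (base << 28) | (vaddr & 0x0fffffffu);
	return 1;
}
```

With these values, 0xc0001000 in the kernel linear segment translates to
0x40001000 in DRAM, while any address in kseg_d is reported as paged.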


Paging - PTE's, PMD's and PGD's
-------------------------------

[ References: asm/pgtable.h, asm/page.h, asm/mmu.h ]

The paging mechanism uses virtual addresses to split a process's memory space into
pages, a page being the smallest unit that can be freely remapped in memory. On
Linux/CRIS, a page is 8192 bytes (for technical reasons not equal to 4096 as in
most other 32-bit architectures). It would be inefficient to let a virtual memory
mapping be controlled by one long table of page mappings, so it is broken down into
a 2-level structure: a Page Directory containing pointers to Page Tables, each of
which maps up to 2048 pages (8192 / sizeof(void *)). Linux can actually
handle 3-level structures as well, with a Page Middle Directory in between, but
in many cases, as here, this is folded into a two-level structure by excluding the
Middle Directory.

We'll take a look at how an address is translated while we discuss how it's handled
in the Linux kernel.

The example address is 0xd004000c; in binary this is:

31       23       15       7      0
11010000 00000100 00000000 00001100

|______| |__________||____________|
  PGD        PTE       page offset

Given the top-level Page Directory, the offset in that directory is calculated
using the upper 8 bits:

static inline pgd_t * pgd_offset(struct mm_struct * mm, unsigned long address)
{
	return mm->pgd + (address >> PGDIR_SHIFT);
}

PGDIR_SHIFT is the log2 of the amount of memory an entry in the PGD can map; in our
case it is 24, corresponding to 16 MB. This means that each entry in the PGD 
corresponds to 16 MB of virtual memory.

The pgd_t from our example will therefore be entry number 208 (0xd0) in mm->pgd.
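
The value 24 is not arbitrary: it is the page shift plus the number of index
bits one Page Table consumes. The sketch below derives it, assuming 4-byte
table entries (sizeof(void *) on the 32-bit CRIS). The helper names are
invented for this illustration; the kernel defines these values as constants
in its headers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  13u  /* log2 of the 8192-byte page */
#define PTE_BYTES   4u   /* assumed size of one page-table entry */

/* Integer log2 of a power of two. */
static unsigned ilog2_u32(uint32_t v)
{
	unsigned n = 0;

	assert(v != 0 && (v & (v - 1u)) == 0);  /* must be a power of two */
	while (v >>= 1)
		n++;
	return n;
}

/* Entries that fit in one page-sized Page Table. */
static uint32_t ptrs_per_pte(void)
{
	return (1u << PAGE_SHIFT) / PTE_BYTES;
}

/* Address bits below the PGD index: page offset plus PTE index. */
static unsigned pgdir_shift(void)
{
	return PAGE_SHIFT + ilog2_u32(ptrs_per_pte());
}

/* Entries needed in the Page Directory to cover a 32-bit space. */
static uint32_t ptrs_per_pgd(void)
{
	return 1u << (32u - pgdir_shift());
}
```

This yields 2048 entries per Page Table, a shift of 13 + 11 = 24, and 256
entries in the Page Directory, each covering 16 MB.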

Since the Middle Directory does not exist, it is a unity mapping:

static inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
{
	return (pmd_t *) dir;
}

The Page Table provides the final lookup by using bits 13 to 23 as index:

static inline pte_t * pte_offset(pmd_t * dir, unsigned long address)
{
	return (pte_t *) pmd_page(*dir) + ((address >> PAGE_SHIFT) &
					   (PTRS_PER_PTE - 1));
}

PAGE_SHIFT is the log2 of the size of a page; 13 in our case. PTRS_PER_PTE is
the number of pointers that fit in a Page Table (2048); masking with
PTRS_PER_PTE - 1 strips the PGD-part of the address and leaves the 11-bit index.

The so-far unused bits 0 to 12 are used to index linearly inside a page.
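
Putting the three steps together for the example address, the following
sketch splits a 32-bit virtual address into its PGD index, PTE index and page
offset. The struct and function names are hypothetical; only the shift and
mask values come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    13u
#define PGDIR_SHIFT   24u
#define PTRS_PER_PTE  2048u

struct vaddr_parts {
	uint32_t pgd_index;  /* bits 24..31 */
	uint32_t pte_index;  /* bits 13..23 */
	uint32_t offset;     /* bits 0..12  */
};

static struct vaddr_parts split_vaddr(uint32_t address)
{
	struct vaddr_parts p;

	p.pgd_index = address >> PGDIR_SHIFT;
	p.pte_index = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1u);
	p.offset    = address & ((1u << PAGE_SHIFT) - 1u);

	/* The three fields must reassemble into the original address. */
	assert(((p.pgd_index << PGDIR_SHIFT) |
		(p.pte_index << PAGE_SHIFT) | p.offset) == address);
	return p;
}
```

For 0xd004000c this gives PGD index 208 (0xd0), PTE index 32 and page
offset 12.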

The VM system
-------------

The kernel's own page-directory is the swapper_pg_dir, cleared in paging_init.
It contains the kernel's virtual mappings (the kernel itself is not paged - it
is mapped linearly using kseg_c as described above). Architectures without
kernel segments, like the i386, need to set up swapper_pg_dir directly in head.S
to map the kernel itself. swapper_pg_dir is pointed to by init_mm.pgd as the
init-task's PGD.

To see what support functions are used to set up a page-table, let's look at the
kernel's internal paged memory system, vmalloc/vfree.

void * vmalloc(unsigned long size)

The vmalloc-system keeps a paged segment in kernel-space at 0xd0000000. What
happens first is that a virtual address chunk is allocated to the request using
get_vm_area(size). After that, physical RAM pages are allocated and put into
the kernel's page-table using alloc_area_pages(addr, size). 

static int alloc_area_pages(unsigned long address, unsigned long size)

First the PGD entry is found using init_mm.pgd. This is passed to
alloc_area_pmd (remember the 3->2 folding). It uses pte_alloc_kernel to
check if the PGD entry points anywhere - if not, a page table page is
allocated and the PGD entry updated. Then the alloc_area_pte function is
used, just like alloc_area_pmd, to find the desired page table entry;
a physical page is allocated and the table entry updated. All of this
is repeated at the top level until the entire specified address range has
been mapped.
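
The on-demand filling of the tables can be modelled in user space. The sketch
below is a toy, not the kernel code: toy_pgd, toy_map_range and toy_lookup are
invented names, calloc() stands in for the kernel's page-table allocation, and
a simple counter stands in for the physical page allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT    13u
#define PAGE_SIZE     (1u << PAGE_SHIFT)
#define PGDIR_SHIFT   24u
#define PTRS_PER_PTE  2048u
#define PTRS_PER_PGD  256u

typedef uint32_t toy_pte;               /* 0 = not mapped, else a frame number */
static toy_pte *toy_pgd[PTRS_PER_PGD];  /* NULL = no Page Table allocated yet */
static uint32_t next_frame = 1;         /* stand-in for a physical page allocator */

/* Map [address, address + size) one page at a time, allocating a Page
 * Table the first time a PGD entry is touched. Returns 0 on success,
 * -1 if a table could not be allocated. */
static int toy_map_range(uint32_t address, uint32_t size)
{
	uint32_t end;

	assert(size <= UINT32_MAX - address);  /* range must not wrap around */
	end = address + size;

	for (; address < end; address += PAGE_SIZE) {
		toy_pte **pgd = &toy_pgd[address >> PGDIR_SHIFT];

		if (*pgd == NULL) {
			*pgd = calloc(PTRS_PER_PTE, sizeof(toy_pte));
			if (*pgd == NULL)
				return -1;
		}
		(*pgd)[(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1u)] = next_frame++;
	}
	return 0;
}

/* Return the frame number mapped at address, or 0 if nothing is mapped. */
static uint32_t toy_lookup(uint32_t address)
{
	const toy_pte *table = toy_pgd[address >> PGDIR_SHIFT];

	return table ? table[(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1u)] : 0;
}
```

Mapping three pages starting at 0xd0ffe000 crosses a 16 MB boundary, so the
model allocates two Page Tables (PGD entries 0xd0 and 0xd1) and leaves every
other PGD entry empty.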
	244