Commit | Line | Data |
---|---|---|
25c3bf8a MR |
1 | .. _pagemap: |
2 | ||
41ea9dd3 MR |
3 | ============================= |
4 | Examining Process Page Tables | |
5 | ============================= | |
ef421be7 TT |
6 | |
7 | pagemap is a set of interfaces in the kernel (available since 2.6.25) that allow | |
8 | userspace programs to examine the page tables and related information by | |
25c3bf8a | 9 | reading files in ``/proc``. |
ef421be7 | 10 | |
80ae2fdc | 11 | There are four components to pagemap: |
ef421be7 | 12 | |
25c3bf8a | 13 | * ``/proc/pid/pagemap``. This file lets a userspace process find out which |
ef421be7 TT |
14 | physical frame each virtual page is mapped to. It contains one 64-bit |
15 | value for each virtual page, containing the following data (from | |
86207d9a | 16 | ``fs/proc/task_mmu.c``, above pagemap_read): |
ef421be7 | 17 | |
c9ba78e2 | 18 | * Bits 0-54 page frame number (PFN) if present |
ef421be7 | 19 | * Bits 0-4 swap type if swapped |
c9ba78e2 | 20 | * Bits 5-54 swap offset if swapped |
e27a20f1 MR |
21 | * Bit 55 pte is soft-dirty (see |
22 | :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`) | |
83b4b0bb | 23 | * Bit 56 page exclusively mapped (since 4.2) |
fb8e37f3 PX |
24 | * Bit 57 pte is uffd-wp write-protected (since 5.13) (see |
25 | :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`) | |
f529b1bf | 26 | * Bits 58-60 zero |
83b4b0bb | 27 | * Bit 61 page is file-page or shared-anon (since 3.5) |
ef421be7 TT |
28 | * Bit 62 page swapped |
29 | * Bit 63 page present | |
30 | ||
83b4b0bb KK |
31 | Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs. |
32 | In 4.0 and 4.1 opens by unprivileged users fail with -EPERM. Starting from | |
33 | 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN. | |
34 | Reason: information about PFNs helps in exploiting the Rowhammer vulnerability. | |
35 | ||
ef421be7 TT |
36 | If the page is not present but in swap, then the PFN contains an |
37 | encoding of the swap file number and the page's offset into the | |
38 | swap. Unmapped pages return a null PFN. This allows determining | |
39 | precisely which pages are mapped (or in swap) and comparing mapped | |
40 | pages between processes. | |
41 | ||
86207d9a | 42 | Efficient users of this interface will use ``/proc/pid/maps`` to |
ef421be7 TT |
43 | determine which areas of memory are actually mapped and llseek to |
44 | skip over unmapped regions. | |
45 | ||
25c3bf8a | 46 | * ``/proc/kpagecount``. This file contains a 64-bit count of the number of |
ef421be7 TT |
47 | times each page is mapped, indexed by PFN. |
48 | ||
7f1d23e6 CH |
49 | The page-types tool in the tools/vm directory can be used to query the |
50 | number of times a page is mapped. | |
51 | ||
25c3bf8a | 52 | * ``/proc/kpageflags``. This file contains a 64-bit set of flags for each |
ef421be7 TT |
53 | page, indexed by PFN. |
54 | ||
25c3bf8a MR |
55 | The flags are (from ``fs/proc/page.c``, above kpageflags_read): |
56 | ||
57 | 0. LOCKED | |
58 | 1. ERROR | |
59 | 2. REFERENCED | |
60 | 3. UPTODATE | |
61 | 4. DIRTY | |
62 | 5. LRU | |
63 | 6. ACTIVE | |
64 | 7. SLAB | |
65 | 8. WRITEBACK | |
66 | 9. RECLAIM | |
ef421be7 | 67 | 10. BUDDY |
17e89501 WF |
68 | 11. MMAP |
69 | 12. ANON | |
70 | 13. SWAPCACHE | |
71 | 14. SWAPBACKED | |
72 | 15. COMPOUND_HEAD | |
73 | 16. COMPOUND_TAIL | |
63f8e8d2 | 74 | 17. HUGE |
17e89501 | 75 | 18. UNEVICTABLE |
253fb02d | 76 | 19. HWPOISON |
17e89501 | 77 | 20. NOPAGE |
a1bbb5ec | 78 | 21. KSM |
807f0ccf | 79 | 22. THP |
ca215086 | 80 | 23. OFFLINE |
56873f43 | 81 | 24. ZERO_PAGE |
f074a8f4 | 82 | 25. IDLE |
ca215086 | 83 | 26. PGTABLE |
17e89501 | 84 | |
25c3bf8a | 85 | * ``/proc/kpagecgroup``. This file contains a 64-bit inode number of the |
80ae2fdc VD |
86 | memory cgroup each page is charged to, indexed by PFN. Only available when |
87 | CONFIG_MEMCG is set. | |
88 | ||
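The bit layout of a ``/proc/pid/pagemap`` entry listed above can be unpacked with a few shifts and masks. The following is a minimal Python sketch, not code from the kernel tree; the names ``PagemapEntry`` and ``decode_pagemap_entry`` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional

LOW_MASK = (1 << 55) - 1  # bits 0-54: PFN, or swap type + swap offset


class PagemapEntry(NamedTuple):
    present: bool              # bit 63
    swapped: bool              # bit 62
    file_or_shared_anon: bool  # bit 61
    uffd_wp: bool              # bit 57
    exclusive: bool            # bit 56
    soft_dirty: bool           # bit 55
    pfn: Optional[int]         # bits 0-54 if present
    swap_type: Optional[int]   # bits 0-4 if swapped
    swap_offset: Optional[int] # bits 5-54 if swapped


def decode_pagemap_entry(entry: int) -> PagemapEntry:
    """Split one 64-bit pagemap value into its documented fields."""
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    low = entry & LOW_MASK
    return PagemapEntry(
        present=present,
        swapped=swapped,
        file_or_shared_anon=bool((entry >> 61) & 1),
        uffd_wp=bool((entry >> 57) & 1),
        exclusive=bool((entry >> 56) & 1),
        soft_dirty=bool((entry >> 55) & 1),
        # The PFN reads as zero without CAP_SYS_ADMIN (since 4.2).
        pfn=low if present else None,
        swap_type=(low & 0x1F) if swapped else None,
        swap_offset=(low >> 5) if swapped else None,
    )
```

The low 55 bits are interpreted either as a PFN or as a swap entry depending on bits 63 and 62, so the sketch returns ``None`` for whichever fields do not apply.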
86207d9a MR |
89 | Short descriptions of the page flags
90 | ==================================== | |
25c3bf8a MR |
91 | |
92 | 0 - LOCKED | |
86207d9a | 93 | page is being locked for exclusive access, e.g. by undergoing read/write IO |
25c3bf8a MR |
94 | 7 - SLAB |
95 | page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator | |
96 | When a compound page is used, SLUB/SLQB will only set this flag on the head | |
97 | page; SLOB will not flag it at all. | |
98 | 10 - BUDDY | |
17e89501 WF |
99 | a free memory block managed by the buddy system allocator |
100 | The buddy system organizes free memory in blocks of various orders. | |
101 | An order N block has 2^N physically contiguous pages, with the BUDDY flag | |
102 | set for and _only_ for the first page. | |
25c3bf8a | 103 | 15 - COMPOUND_HEAD |
17e89501 WF |
104 | A compound page with order N consists of 2^N physically contiguous pages. |
105 | A compound page with order 2 takes the form of "HTTT", where H denotes its | |
106 | head page and T denotes its tail page(s). The major consumers of compound | |
e27a20f1 MR |
107 | pages are hugeTLB pages |
108 | (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`), | |
109 | the SLUB etc. memory allocators and various device drivers. | |
110 | However in this interface, only huge/giga pages are made visible | |
111 | to end users. | |
25c3bf8a MR |
112 | 16 - COMPOUND_TAIL |
113 | A compound page tail (see description above). | |
114 | 17 - HUGE | |
17e89501 | 115 | this is an integral part of a HugeTLB page |
25c3bf8a | 116 | 19 - HWPOISON |
253fb02d | 117 | hardware detected memory corruption on this page: don't touch the data! |
25c3bf8a | 118 | 20 - NOPAGE |
17e89501 | 119 | no page frame exists at the requested address |
25c3bf8a | 120 | 21 - KSM |
a1bbb5ec | 121 | identical memory pages dynamically shared between one or more processes |
25c3bf8a | 122 | 22 - THP |
807f0ccf | 123 | contiguous pages which construct transparent hugepages |
ca215086 DH |
124 | 23 - OFFLINE |
125 | page is logically offline | |
25c3bf8a | 126 | 24 - ZERO_PAGE |
56873f43 | 127 | zero page for pfn_zero or huge_zero page |
25c3bf8a | 128 | 25 - IDLE |
f074a8f4 | 129 | page has not been accessed since it was marked idle (see |
e27a20f1 MR |
130 | :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`). |
131 | Note that this flag may be stale in case the page was accessed via | |
132 | a PTE. To make sure the flag is up-to-date one has to read | |
133 | ``/sys/kernel/mm/page_idle/bitmap`` first. | |
ca215086 DH |
134 | 26 - PGTABLE |
135 | page is in use as a page table | |
25c3bf8a MR |
136 | |
137 | IO related page flags | |
138 | --------------------- | |
139 | ||
140 | 1 - ERROR | |
141 | IO error occurred | |
142 | 3 - UPTODATE | |
143 | page has up-to-date data | |
144 | i.e. for file backed page: (in-memory data revision >= on-disk one) | |
145 | 4 - DIRTY | |
146 | page has been written to, hence contains new data | |
86207d9a | 147 | i.e. for file backed page: (in-memory data revision > on-disk one) |
25c3bf8a MR |
148 | 8 - WRITEBACK |
149 | page is being synced to disk | |
150 | ||
151 | LRU related page flags | |
152 | ---------------------- | |
153 | ||
154 | 5 - LRU | |
155 | page is in one of the LRU lists | |
156 | 6 - ACTIVE | |
157 | page is in the active LRU list | |
158 | 18 - UNEVICTABLE | |
159 | page is in the unevictable (non-)LRU list. It is somehow pinned and | |
86207d9a | 160 | not a candidate for LRU page reclaims, e.g. ramfs pages, |
25c3bf8a MR |
161 | shmctl(SHM_LOCK) and mlock() memory segments |
162 | 2 - REFERENCED | |
163 | page has been referenced since last LRU list enqueue/requeue | |
164 | 9 - RECLAIM | |
165 | page will be reclaimed soon after its pageout IO completes | |
166 | 11 - MMAP | |
167 | a memory mapped page | |
168 | 12 - ANON | |
169 | a memory mapped page that is not part of a file | |
170 | 13 - SWAPCACHE | |
86207d9a | 171 | page is mapped to swap space, i.e. has an associated swap entry |
25c3bf8a MR |
172 | 14 - SWAPBACKED |
173 | page is backed by swap/RAM | |
17e89501 | 174 | |
3250af19 RW |
175 | The page-types tool in the tools/vm directory can be used to query the |
176 | above flags. | |
ef421be7 | 177 | |
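When the page-types tool is not at hand, a ``/proc/kpageflags`` value can be turned into flag names using the bit numbers listed earlier. This is a small illustrative sketch; ``KPF_NAMES`` and ``decode_kpageflags`` are hypothetical names, and the table simply repeats the numbering from this document.

```python
from typing import List

# Index in this list == bit number in /proc/kpageflags.
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def decode_kpageflags(flags: int) -> List[str]:
    """Return the names of all documented flags set in a kpageflags value."""
    return [name for bit, name in enumerate(KPF_NAMES) if (flags >> bit) & 1]
```

Bits above 26 are ignored by this sketch; a kernel newer than this document may define more.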
25c3bf8a MR |
178 | Using pagemap to do something useful |
179 | ==================================== | |
ef421be7 TT |
180 | |
181 | The general procedure for using pagemap to find out about a process' memory | |
182 | usage goes like this: | |
183 | ||
25c3bf8a | 184 | 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are |
ef421be7 TT |
185 | mapped to what. |
186 | 2. Select the maps you are interested in -- all of them, or a particular | |
187 | library, or the stack or the heap, etc. | |
25c3bf8a | 188 | 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine. |
ef421be7 | 189 | 4. Read a u64 for each page from pagemap. |
25c3bf8a MR |
190 | 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``. For each PFN you |
191 | just read, seek to that entry in the file, and read the data you want. | |
ef421be7 TT |
192 | |
193 | For example, to find the "unique set size" (USS), which is the amount of | |
194 | memory that a process is using that is not shared with any other process, | |
195 | you can go through every map in the process, find the PFNs, look those up | |
196 | in kpagecount, and tally up the number of pages that are only referenced | |
197 | once. | |
198 | ||
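The USS procedure described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the caller is assumed to have CAP_SYS_ADMIN (otherwise PFNs read as zero and are skipped), and each mapping is read in one call, which may be impractical for very large mappings.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1   # bits 0-54 of a pagemap entry
PRESENT = 1 << 63          # bit 63: page present


def parse_maps_line(line):
    """Return (start, end) virtual addresses from one /proc/pid/maps line."""
    start, end = (int(x, 16) for x in line.split()[0].split("-"))
    return start, end


def unique_set_size(pid):
    """Bytes in present pages that kpagecount reports as mapped exactly once."""
    uss_pages = 0
    with open(f"/proc/{pid}/maps") as maps, \
         open(f"/proc/{pid}/pagemap", "rb", buffering=0) as pagemap, \
         open("/proc/kpagecount", "rb", buffering=0) as kpagecount:
        for line in maps:
            start, end = parse_maps_line(line)
            # One 8-byte entry per virtual page; seek skips unmapped regions.
            pagemap.seek((start // PAGE_SIZE) * 8)
            data = pagemap.read(((end - start) // PAGE_SIZE) * 8)
            data = data[: len(data) - len(data) % 8]  # tolerate short reads
            for (entry,) in struct.iter_unpack("=Q", data):
                pfn = entry & PFN_MASK
                if not (entry & PRESENT) or pfn == 0:
                    continue  # not resident, or PFN hidden from this user
                kpagecount.seek(pfn * 8)
                raw = kpagecount.read(8)
                if len(raw) == 8 and struct.unpack("=Q", raw)[0] == 1:
                    uss_pages += 1
    return uss_pages * PAGE_SIZE
```

A call such as ``unique_set_size(os.getpid())`` would report the caller's own USS when run with sufficient privileges. On kernels that provide bit 56 (page exclusively mapped), that bit may offer a similar answer without consulting ``/proc/kpagecount``.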
25c3bf8a MR |
199 | Other notes |
200 | =========== | |
ef421be7 TT |
201 | |
202 | Reading from any of the files will return -EINVAL if you are not starting | |
f884ab15 | 203 | the read on an 8-byte boundary (e.g., if you sought an odd number of bytes |
ef421be7 | 204 | into the file), or if the size of the read is not a multiple of 8 bytes. |
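One way to stay within the alignment rule above is to address these files by entry index rather than by byte offset. The helper below is a hypothetical sketch (``read_entries`` is not an existing API): it seeks to ``index * 8`` and requests a multiple of 8 bytes, so both the offset and the size are 8-byte aligned.

```python
import struct

ENTRY_SIZE = 8  # every entry in these files is one 64-bit value


def read_entries(path, index, count=1):
    """Read up to `count` 64-bit entries starting at entry number `index`."""
    # Unbuffered I/O, so the kernel sees exactly the aligned offset and size.
    with open(path, "rb", buffering=0) as f:
        f.seek(index * ENTRY_SIZE)
        data = f.read(count * ENTRY_SIZE)
    usable = len(data) - len(data) % ENTRY_SIZE  # drop any partial entry
    return [value for (value,) in struct.iter_unpack("=Q", data[:usable])]
```

For instance, ``read_entries("/proc/kpagecount", pfn)`` would return the map count for one PFN, given suitable privileges; fewer entries than requested are returned if the file ends early.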
83b4b0bb KK |
205 | |
206 | Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is | |
207 | always 12 on most architectures). Since Linux 3.11 their meaning changes | |
208 | after the first clear of soft-dirty bits. Since Linux 4.2 they are used for | |
209 | flags unconditionally. |