pagemap, from the userspace perspective
---------------------------------------

pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
userspace programs to examine the page tables and related information by
reading files in /proc.

There are four components to pagemap:

 * /proc/pid/pagemap.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   fs/proc/task_mmu.c, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bits 57-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1, opens by unprivileged users fail with -EPERM.  Starting
   from 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   Reason: information about PFNs helps in exploiting the Rowhammer
   vulnerability.

   If the page is not present but in swap, then the PFN contains an
   encoding of the swap file number and the page's offset into the
   swap.  Unmapped pages return a null PFN.  This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

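   As a minimal sketch, assuming a 64-bit entry has already been read from
   /proc/pid/pagemap, the bit layout above can be decoded in C as follows.
   The macro names, struct pagemap_info and pagemap_decode() are
   hypothetical names chosen for illustration, not the kernel's own
   definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks for the pagemap entry layout listed above (illustrative names). */
#define PM_PFN_MASK       ((UINT64_C(1) << 55) - 1) /* bits 0-54 */
#define PM_SWAP_TYPE_MASK UINT64_C(0x1f)            /* bits 0-4 */
#define PM_SOFT_DIRTY     (UINT64_C(1) << 55)
#define PM_EXCLUSIVE      (UINT64_C(1) << 56)
#define PM_FILE_OR_SHANON (UINT64_C(1) << 61)
#define PM_SWAPPED        (UINT64_C(1) << 62)
#define PM_PRESENT        (UINT64_C(1) << 63)

struct pagemap_info {
    bool present, swapped, soft_dirty, exclusive, file_or_shared_anon;
    uint64_t pfn;         /* valid if present; 0 without CAP_SYS_ADMIN (4.2+) */
    unsigned swap_type;   /* valid if swapped */
    uint64_t swap_offset; /* valid if swapped */
};

static struct pagemap_info pagemap_decode(uint64_t e)
{
    struct pagemap_info i = {0};

    i.present = (e & PM_PRESENT) != 0;
    i.swapped = (e & PM_SWAPPED) != 0;
    i.soft_dirty = (e & PM_SOFT_DIRTY) != 0;
    i.exclusive = (e & PM_EXCLUSIVE) != 0;
    i.file_or_shared_anon = (e & PM_FILE_OR_SHANON) != 0;

    if (i.present) {
        i.pfn = e & PM_PFN_MASK;
    } else if (i.swapped) {
        /* Bits 0-54 hold the swap encoding instead of a PFN. */
        i.swap_type = (unsigned)(e & PM_SWAP_TYPE_MASK);
        i.swap_offset = (e & PM_PFN_MASK) >> 5;
    }
    return i;
}
```

   An entry with neither bit 62 nor bit 63 set decodes as an unmapped page
   with all numeric fields left at zero.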
   Efficient users of this interface will use /proc/pid/maps to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * /proc/kpagecount.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

 * /proc/kpageflags.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from fs/proc/page.c, above kpageflags_read):

     0. LOCKED
     1. ERROR
     2. REFERENCED
     3. UPTODATE
     4. DIRTY
     5. LRU
     6. ACTIVE
     7. SLAB
     8. WRITEBACK
     9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. BALLOON
    24. ZERO_PAGE
    25. IDLE

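   When inspecting raw kpageflags values it can help to print the set bits
   by name.  This sketch assumes the bit numbering in the list above;
   kpageflag_names and kpageflags_format() are hypothetical names, and any
   bit above 25 is ignored rather than decoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Flag names indexed by bit number, in the order listed above. */
static const char *const kpageflag_names[] = {
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY",
    "LRU", "ACTIVE", "SLAB", "WRITEBACK", "RECLAIM",
    "BUDDY", "MMAP", "ANON", "SWAPCACHE", "SWAPBACKED",
    "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE", "HWPOISON",
    "NOPAGE", "KSM", "THP", "BALLOON", "ZERO_PAGE",
    "IDLE",
};

#define KPF_NAMED ((unsigned)(sizeof(kpageflag_names) / sizeof(kpageflag_names[0])))

/* Write the names of the set (named) bits of flags into buf, separated by
 * commas. Stops once buf is full; the last name may then be cut short. */
static void kpageflags_format(uint64_t flags, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (unsigned bit = 0; bit < KPF_NAMED; bit++) {
        if (!(flags & (UINT64_C(1) << bit)))
            continue;
        int w = snprintf(buf + used, len - used, "%s%s",
                         used ? "," : "", kpageflag_names[bit]);
        if (w < 0 || (size_t)w >= len - used)
            break;
        used += (size_t)w;
    }
}
```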
 * /proc/kpagecgroup.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN.  Only available
   when CONFIG_MEMCG is set.

Short descriptions of the page flags:

 0. LOCKED
    page is being locked for exclusive access, e.g. by undergoing read/write IO

 7. SLAB
    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
    When a compound page is used, SLUB/SLQB will only set this flag on the
    head page; SLOB will not flag it at all.

10. BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.

15. COMPOUND_HEAD
16. COMPOUND_TAIL
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
    memory allocators and various device drivers.  However, in this interface,
    only huge/giga pages are made visible to end users.

17. HUGE
    this is an integral part of a HugeTLB page

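Since tail pages follow their head page, the size of a compound page that
is visible through kpageflags can be recovered by scanning consecutive
entries.  A sketch, assuming flags[] holds kpageflags values for
consecutive PFNs; compound_pages_at() is a hypothetical name, and a
compound page running past the end of the array is reported truncated.

```c
#include <stddef.h>
#include <stdint.h>

#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16

/* If flags[i] is a compound head, return the number of pages the compound
 * page spans (the head plus the tails that follow it, "HTTT" -> 4);
 * otherwise return 0. The count stops at the next non-tail entry or at n. */
static size_t compound_pages_at(const uint64_t *flags, size_t n, size_t i)
{
    const uint64_t head = UINT64_C(1) << KPF_COMPOUND_HEAD;
    const uint64_t tail = UINT64_C(1) << KPF_COMPOUND_TAIL;
    size_t len = 1;

    if (i >= n || !(flags[i] & head))
        return 0;
    while (i + len < n && (flags[i + len] & tail))
        len++;
    return len;
}
```

For a complete order-N compound page the result should be 2^N, e.g. 512
for a 2 MiB huge page built from 4 KiB pages.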
19. HWPOISON
    hardware-detected memory corruption on this page: don't touch the data!

20. NOPAGE
    no page frame exists at the requested address

21. KSM
    identical memory pages dynamically shared between one or more processes

22. THP
    contiguous pages which make up a transparent hugepage

23. BALLOON
    balloon compaction page

24. ZERO_PAGE
    zero page for pfn_zero or huge_zero page

25. IDLE
    page has not been accessed since it was marked idle (see
    Documentation/vm/idle_page_tracking.txt).  Note that this flag may be
    stale in case the page was accessed via a PTE.  To make sure the flag
    is up-to-date one has to read /sys/kernel/mm/page_idle/bitmap first.

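Refreshing IDLE for a given page means reading the right word of the
page_idle bitmap.  The sketch below assumes the layout described in
Documentation/vm/idle_page_tracking.txt (native-endian 64-bit words, with
PFN i at bit i % 64 of word i / 64); the helper names are hypothetical.

```c
#include <stdint.h>
#include <sys/types.h>

/* Byte offset in /sys/kernel/mm/page_idle/bitmap of the word holding pfn. */
static off_t page_idle_offset(uint64_t pfn)
{
    return (off_t)((pfn / 64) * sizeof(uint64_t));
}

/* Bit position of pfn within that word. */
static unsigned page_idle_bit(uint64_t pfn)
{
    return (unsigned)(pfn % 64);
}

/* Given the word read at page_idle_offset(pfn), report whether pfn is idle. */
static int page_idle_is_set(uint64_t word, uint64_t pfn)
{
    return (int)((word >> page_idle_bit(pfn)) & 1);
}
```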
[IO related page flags]
 1. ERROR      IO error occurred
 3. UPTODATE   page has up-to-date data
               i.e. for a file-backed page: (in-memory data revision >= on-disk one)
 4. DIRTY      page has been written to, hence contains new data
               i.e. for a file-backed page: (in-memory data revision >  on-disk one)
 8. WRITEBACK  page is being synced to disk

[LRU related page flags]
 5. LRU         page is in one of the LRU lists
 6. ACTIVE      page is in the active LRU list
18. UNEVICTABLE page is in the unevictable (non-)LRU list
                It is somehow pinned and not a candidate for LRU page reclaims,
                e.g. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
 2. REFERENCED  page has been referenced since last LRU list enqueue/requeue
 9. RECLAIM     page will be reclaimed soon after its pageout IO completed
11. MMAP        a memory mapped page
12. ANON        a memory mapped page that is not part of a file
13. SWAPCACHE   page is mapped to swap space, i.e. has an associated swap entry
14. SWAPBACKED  page is backed by swap/RAM

The page-types tool in the tools/vm directory can be used to query the
above flags.

Using pagemap to do something useful:

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read /proc/pid/maps to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open /proc/pid/pagemap and seek to the pages you would like to examine.
 4. Read a u64 for each page from pagemap.
 5. Open /proc/kpagecount and/or /proc/kpageflags.  For each PFN you just
    read, seek to that entry in the file, and read the data you want.

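Steps 1 and 2 amount to parsing /proc/pid/maps, whose lines begin with a
hexadecimal start-end address range followed by the permission bits.  A
minimal sketch with hypothetical names (struct map_range,
parse_maps_line(), count_mapped_pages()); it keeps only the fields needed
to locate pagemap entries and selects every map.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields of one /proc/pid/maps line used here, e.g.
 * "00400000-0040b000 r-xp 00000000 08:01 1234   /bin/cat". */
struct map_range {
    uintptr_t start;  /* first address of the mapping */
    uintptr_t end;    /* one past its last address */
    char perms[5];    /* e.g. "r-xp" */
};

/* Returns 1 if line starts with "start-end perms" (addresses in hex). */
static int parse_maps_line(const char *line, struct map_range *m)
{
    unsigned long start, end;

    if (sscanf(line, "%lx-%lx %4s", &start, &end, m->perms) != 3)
        return 0;
    m->start = (uintptr_t)start;
    m->end = (uintptr_t)end;
    return 1;
}

/* Total number of pages covered by every mapping in the maps stream. */
static size_t count_mapped_pages(FILE *maps, size_t page_size)
{
    char line[512];
    struct map_range m;
    size_t pages = 0;

    while (fgets(line, sizeof line, maps)) {
        if (parse_maps_line(line, &m))
            pages += (m.end - m.start) / page_size;
        /* Discard the rest of an overlong line (e.g. a long path). */
        if (!strchr(line, '\n')) {
            int c;
            while ((c = fgetc(maps)) != EOF && c != '\n')
                ;
        }
    }
    return pages;
}
```

In practice the stream would come from fopen("/proc/<pid>/maps", "r"),
and each parsed range would then feed the pagemap reads of steps 3 and 4.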
For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.

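Once steps 3 to 5 have produced a pagemap entry and a kpagecount value
for each page, the USS tally itself is short.  A sketch under those
assumptions (uss_bytes() is a hypothetical name).  It needs real PFNs, so
per the notes above it must run with CAP_SYS_ADMIN on 4.0 and later.

```c
#include <stddef.h>
#include <stdint.h>

#define PM_PRESENT (UINT64_C(1) << 63) /* bit 63 of a pagemap entry */

/* Unique set size in bytes. entries[] holds the pagemap entries for every
 * page of the selected mappings; counts[i] is the /proc/kpagecount value
 * read for the PFN in entries[i] (ignored when the page is not present).
 * A page counts towards USS when it is present and mapped exactly once. */
static uint64_t uss_bytes(const uint64_t *entries, const uint64_t *counts,
                          size_t n, size_t page_size)
{
    uint64_t pages = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(entries[i] & PM_PRESENT))
            continue; /* swapped out or unmapped: no PFN to look up */
        if (counts[i] == 1)
            pages++;
    }
    return pages * page_size;
}
```

A page the same process maps twice also has a count above 1, so this
"referenced once" rule excludes it even though no other process shares it.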
Other notes:

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures).  Since Linux 3.11 their meaning changes
after the first clear of soft-dirty bits.  Since Linux 4.2 they are used for
flags unconditionally.