Commit | Line | Data |
---|---|---|
d27554d8 | 1 | |
2 | PAT (Page Attribute Table) | |
3 | ||
4 | x86 Page Attribute Table (PAT) allows for setting the memory attribute at the | |
5 | page level granularity. PAT is complementary to the MTRR settings which allow | |
6 | for setting of memory types over physical address ranges. However, PAT is | |
7 | more flexible than MTRR due to its capability to set attributes at page level | |
8 | and also due to the fact that there are no hardware limitations on number of | |
9 | such attribute settings allowed. Added flexibility comes with guidelines for | |
10 | not having memory type aliasing for the same physical memory with multiple | |
11 | virtual addresses. | |
12 | ||
13 | PAT allows for different types of memory attributes. The most commonly used | |
14 | ones that will be supported at this time are Write-back, Uncached, | |
d838270e | 15 | Write-combined, Write-through and Uncached Minus. |
d27554d8 | 16 | |
59dfc3f8 | 17 | |
18 | PAT APIs | |
19 | -------- | |
20 | ||
d27554d8 | 21 | There are many different APIs in the kernel that allow setting of memory |
22 | attributes at the page level. In order to avoid aliasing, these interfaces | |
23 | should be used thoughtfully. Below is a table of interfaces available, | |
24 | their intended usage and their memory attribute relationships. Internally, | |
25 | these APIs use a reserve_memtype()/free_memtype() interface on the physical | |
26 | address range to avoid any aliasing. | |
27 | ||
28 | ||
29 | ------------------------------------------------------------------- | |
30 | API | RAM | ACPI,... | Reserved/Holes | | |
31 | -----------------------|----------|------------|------------------| | |
32 | | | | | | |
59dfc3f8 | 33 | ioremap | -- | UC- | UC- | |
d27554d8 | 34 | | | | | |
35 | ioremap_cache | -- | WB | WB | | |
36 | | | | | | |
2f9e8973 LR | 37 | ioremap_uc | -- | UC | UC |
38 | | | | | | |
59dfc3f8 | 39 | ioremap_nocache | -- | UC- | UC- | |
d27554d8 | 40 | | | | | |
41 | ioremap_wc | -- | -- | WC | | |
42 | | | | | | |
d838270e TK | 43 | ioremap_wt | -- | -- | WT |
44 | | | | | | |
59dfc3f8 | 45 | set_memory_uc | UC- | -- | -- | |
d27554d8 | 46 | set_memory_wb | | | | |
47 | | | | | | |
48 | set_memory_wc | WC | -- | -- | | |
49 | set_memory_wb | | | | | |
50 | | | | | | |
623dffb2 TK | 51 | set_memory_wt | WT | -- | -- |
52 | set_memory_wb | | | | | |
53 | | | | | | |
59dfc3f8 | 54 | pci sysfs resource | -- | -- | UC- | |
d27554d8 | 55 | | | | | |
56 | pci sysfs resource_wc | -- | -- | WC | | |
57 | is IORESOURCE_PREFETCH| | | | | |
58 | | | | | | |
59dfc3f8 | 59 | pci proc | -- | -- | UC- | |
d27554d8 | 60 | !PCIIOC_WRITE_COMBINE | | | | |
61 | | | | | | |
62 | pci proc | -- | -- | WC | | |
63 | PCIIOC_WRITE_COMBINE | | | | | |
64 | | | | | | |
59dfc3f8 | 65 | /dev/mem | -- | WB/WC/UC- | WB/WC/UC- | |
d27554d8 | 66 | read-write | | | | |
67 | | | | | | |
59dfc3f8 | 68 | /dev/mem | -- | UC- | UC- | |
d27554d8 | 69 | mmap SYNC flag | | | | |
70 | | | | | | |
59dfc3f8 | 71 | /dev/mem | -- | WB/WC/UC- | WB/WC/UC- | |
d27554d8 | 72 | mmap !SYNC flag | |(from exist-| (from exist- | |
73 | and | | ing alias)| ing alias) | | |
74 | any alias to this area| | | | | |
75 | | | | | | |
76 | /dev/mem | -- | WB | WB | | |
77 | mmap !SYNC flag | | | | | |
78 | no alias to this area | | | | | |
79 | and | | | | | |
80 | MTRR says WB | | | | | |
81 | | | | | | |
59dfc3f8 | 82 | /dev/mem | -- | -- | UC- | |
d27554d8 | 83 | mmap !SYNC flag | | | | |
84 | no alias to this area | | | | | |
85 | and | | | | | |
86 | MTRR says !WB | | | | | |
87 | | | | | | |
88 | ------------------------------------------------------------------- | |
89 | ||
a2ced6e1 | 90 | Advanced APIs for drivers |
91 | ------------------------- | |
67bac792 | 92 | A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range, |
67fa1666 | 93 | vmf_insert_pfn |
a2ced6e1 | 94 | |
67bac792 | 95 | Drivers wanting to export some pages to userspace do it by using the mmap |
a2ced6e1 | 96 | interface and a combination of: |
97 | 1) pgprot_noncached() | |
67fa1666 | 98 | 2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn() |
a2ced6e1 | 99 | |
67bac792 | 100 | With PAT support, a new API pgprot_writecombine is being added. So, drivers can |
a2ced6e1 | 101 | continue to use the above sequence, with either pgprot_noncached() or |
102 | pgprot_writecombine() in step 1, followed by step 2. | |
103 | ||
104 | In addition, step 2 internally tracks the region as UC or WC in the memtype | |
105 | list to ensure that there is no conflicting mapping. | |
106 | ||
67bac792 | 107 | Note that this set of APIs only works with IO (non-RAM) regions. If a driver |
108 | wants to export a RAM region, it has to call set_memory_uc() or set_memory_wc() | |
a2ced6e1 | 109 | as step 0 above, track the usage of those pages, and call set_memory_wb() |
110 | before the pages are freed to the free pool. | |
111 | ||
2f9e8973 LR | 112 | MTRR effects on PAT / non-PAT systems |
113 | ------------------------------------- | |
114 | ||
115 | The following table provides the effects of using write-combining MTRRs when | |
116 | using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally | |
117 | mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will | |
118 | be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add() | |
119 | call is made should already have been ioremapped with WC attributes or PAT entries; | |
120 | this can be done by using ioremap_wc() / set_memory_wc(). Devices which | |
121 | combine areas of IO memory desired to remain uncacheable with areas where | |
122 | write-combining is desirable should consider use of ioremap_uc() followed by | |
123 | set_memory_wc() to white-list effective write-combined areas. Such use is | |
124 | nevertheless discouraged as the effective memory type is considered | |
125 | implementation defined, yet this strategy can be used as a last resort on devices | |
126 | with size-constrained regions where MTRR write-combining would | |
127 | otherwise not be effective. | |
128 | ||
129 | ---------------------------------------------------------------------- | |
130 | MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type | |
131 | ---------------------------------------------------------------------- | |
132 |                                                   Non-PAT |  PAT | |
133 |      PAT | |
134 |      |PCD | |
135 |      ||PWT | |
136 |      ||| | |
137 | WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC | |
138 | WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC | |
139 | WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC | |
140 | WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC | |
141 | ---------------------------------------------------------------------- | |
142 | ||
143 | (*) denotes implementation defined and is discouraged | |
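The MTRR table above can also be read as a pair of lookups. The following is a minimal userspace sketch, not kernel code: the enum and all function names are hypothetical and only restate the rows of the table for a range covered by a write-combining MTRR.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory types named in the table; this enum is illustrative only. */
enum mem_type { MT_WB, MT_WC, MT_UC_MINUS, MT_UC };

/* Build the 3-bit index from the table: PAT is bit 2, PCD bit 1, PWT bit 0. */
static unsigned pte_cache_index(bool pat, bool pcd, bool pwt)
{
	return ((unsigned)pat << 2) | ((unsigned)pcd << 1) | (unsigned)pwt;
}

/* "PAT" column: the type listed for indexes 000..011. */
static enum mem_type pat_entry_type(unsigned idx)
{
	static const enum mem_type pat_col[4] = {
		MT_WB, MT_WC, MT_UC_MINUS, MT_UC
	};

	assert(idx < 4);	/* the table only covers 000..011 */
	return pat_col[idx];
}

/* "Effective memory type" columns for a range covered by a WC MTRR. */
static enum mem_type effective_under_wc_mtrr(unsigned idx, bool pat_enabled)
{
	/* Rows 001 and 010 of the non-PAT column are the WC* entries. */
	static const enum mem_type non_pat[4] = { MT_WC, MT_WC, MT_WC, MT_UC };
	static const enum mem_type with_pat[4] = { MT_WC, MT_WC, MT_UC, MT_UC };

	assert(idx < 4);
	return pat_enabled ? with_pat[idx] : non_pat[idx];
}
```

For example, effective_under_wc_mtrr(pte_cache_index(false, true, false), true) yields MT_UC, matching the 010 row of the PAT column.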
a2ced6e1 | 144 | |
d27554d8 | 145 | Notes: |
146 | ||
147 | "--" in the API table above means "Not suggested usage for the API". Some of the --'s | |
148 | are strictly enforced by the kernel. Some others are not really enforced | |
149 | today, but may be enforced in the future. | |
150 | ||
151 | For ioremap and PCI access through /sys or /proc, the actual type returned | |
152 | can be more restrictive if there is any existing aliasing for that address. | |
153 | For example, if there is an existing uncached mapping, a new ioremap_wc can | |
154 | return an uncached mapping in place of the write-combined one requested. | |
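The aliasing note above can be illustrated with a toy model. This is a userspace sketch under simplifying assumptions, not the kernel's reserve_memtype() implementation: toy_reserve(), the fixed-size array and the enum are all hypothetical, and the model simply hands back the type of any existing overlapping entry instead of the requested one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mem_type { MT_WB, MT_WC, MT_UC_MINUS, MT_UC };

/* One tracked physical range, [start, end). */
struct memtype {
	uint64_t start, end;
	enum mem_type type;
};

#define MAX_ENTRIES 16
static struct memtype list[MAX_ENTRIES];
static size_t nr_entries;

/*
 * Toy stand-in for a memtype reservation (hypothetical helper).
 * Returns 0 and stores the type actually granted in *granted, or -1 if
 * the range is empty, the list is full, or the range overlaps existing
 * entries that disagree with each other.
 */
static int toy_reserve(uint64_t start, uint64_t end,
		       enum mem_type req, enum mem_type *granted)
{
	enum mem_type actual = req;
	int found = 0;

	assert(granted != NULL);
	if (start >= end || nr_entries == MAX_ENTRIES)
		return -1;

	for (size_t i = 0; i < nr_entries; i++) {
		if (list[i].start < end && start < list[i].end) {
			if (found && list[i].type != actual)
				return -1;	/* mixed existing aliases */
			actual = list[i].type;	/* inherit existing type */
			found = 1;
		}
	}

	list[nr_entries++] = (struct memtype){ start, end, actual };
	*granted = actual;
	return 0;
}
```

In this model, reserving a range as MT_UC and then requesting the same range as MT_WC grants MT_UC for the second request, which mirrors the ioremap_wc example in the note.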
155 | ||
623dffb2 TK | 156 | set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where the driver |
157 | will first make a region uc, wc or wt and switch it back to wb after use. | |
d27554d8 | 158 | |
159 | Over time writes to /proc/mtrr will be deprecated in favor of using PAT based | |
160 | interfaces. Users writing to /proc/mtrr are advised to use the above interfaces. | |
161 | ||
162 | Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access | |
163 | types. | |
164 | ||
623dffb2 | 165 | Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges. |
d27554d8 | 166 | |
59dfc3f8 | 167 | |
168 | PAT debugging | |
169 | ------------- | |
170 | ||
171 | With CONFIG_DEBUG_FS enabled, the PAT memtype list can be examined with: | |
172 | ||
173 | # mount -t debugfs debugfs /sys/kernel/debug | |
174 | # cat /sys/kernel/debug/x86/pat_memtype_list | |
175 | PAT memtype list: | |
176 | uncached-minus @ 0x7fadf000-0x7fae0000 | |
177 | uncached-minus @ 0x7fb19000-0x7fb1a000 | |
178 | uncached-minus @ 0x7fb1a000-0x7fb1b000 | |
179 | uncached-minus @ 0x7fb1b000-0x7fb1c000 | |
180 | uncached-minus @ 0x7fb1c000-0x7fb1d000 | |
181 | uncached-minus @ 0x7fb1d000-0x7fb1e000 | |
182 | uncached-minus @ 0x7fb1e000-0x7fb25000 | |
183 | uncached-minus @ 0x7fb25000-0x7fb26000 | |
184 | uncached-minus @ 0x7fb26000-0x7fb27000 | |
185 | uncached-minus @ 0x7fb27000-0x7fb28000 | |
186 | uncached-minus @ 0x7fb28000-0x7fb2e000 | |
187 | uncached-minus @ 0x7fb2e000-0x7fb2f000 | |
188 | uncached-minus @ 0x7fb2f000-0x7fb30000 | |
189 | uncached-minus @ 0x7fb31000-0x7fb32000 | |
190 | uncached-minus @ 0x80000000-0x90000000 | |
191 | ||
192 | This list shows physical address ranges and various PAT settings used to | |
193 | access those physical address ranges. | |
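When post-processing this list, each entry can be split into its type and address range. The following is a small userspace sketch, assuming the line layout shown in the sample output above; struct memtype_entry and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed entry of the memtype list. */
struct memtype_entry {
	char type[32];		/* e.g. "uncached-minus" */
	uint64_t start, end;	/* physical range as printed */
};

/*
 * Parse a line such as "uncached-minus @ 0x7fadf000-0x7fae0000".
 * Returns 0 on success and -1 if the line does not match, which is
 * the case for the "PAT memtype list:" header line.
 */
static int parse_memtype_line(const char *line, struct memtype_entry *e)
{
	unsigned long long start, end;

	assert(line != NULL && e != NULL);
	if (sscanf(line, " %31s @ %llx-%llx", e->type, &start, &end) != 3)
		return -1;
	if (end < start)
		return -1;

	e->start = start;
	e->end = end;
	return 0;
}

/* Size of the range in bytes. */
static uint64_t memtype_entry_size(const struct memtype_entry *e)
{
	return e->end - e->start;
}
```

Applied to the first sample entry, the size is 0x1000 bytes (one 4 KiB page); the last entry, 0x80000000-0x90000000, spans 0x10000000 bytes.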
194 | ||
195 | Another, more verbose way of getting PAT related debug messages is with the | |
196 | "debugpat" boot parameter. With this parameter, various debug messages are | |
197 | printed to the dmesg log. | |
198 | ||
b6350c21 TK | 199 | PAT Initialization |
200 | ------------------ | |
201 | ||
202 | The following table describes how PAT is initialized under various | |
203 | configurations. The PAT MSR must be updated by Linux in order to support WC | |
204 | and WT attributes. Otherwise, the PAT MSR has the value programmed in it | |
205 | by the firmware. Note that Xen enables the WC attribute in the PAT MSR for guests. | |
206 | ||
207 | MTRR PAT Call Sequence PAT State PAT MSR | |
208 | ========================================================= | |
209 | E E MTRR -> PAT init Enabled OS | |
210 | E D MTRR -> PAT init Disabled - | |
211 | D E MTRR -> PAT disable Disabled BIOS | |
212 | D D MTRR -> PAT disable Disabled - | |
213 | - np/E PAT -> PAT disable Disabled BIOS | |
214 | - np/D PAT -> PAT disable Disabled - | |
215 | E !P/E MTRR -> PAT init Disabled BIOS | |
216 | D !P/E MTRR -> PAT disable Disabled BIOS | |
217 | !M !P/E MTRR stub -> PAT disable Disabled BIOS | |
218 | ||
219 | Legend | |
220 | ------------------------------------------------ | |
221 | E Feature enabled in CPU | |
222 | D Feature disabled/unsupported in CPU | |
223 | np "nopat" boot option specified | |
224 | !P CONFIG_X86_PAT option unset | |
225 | !M CONFIG_MTRR option unset | |
226 | Enabled PAT state set to enabled | |
227 | Disabled PAT state set to disabled | |
228 | OS PAT initializes PAT MSR with OS setting | |
229 | BIOS PAT keeps PAT MSR with BIOS setting | |
230 |