.. _balance:

================
Memory Balancing
================

Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
well as for non __GFP_IO allocations.

The first reason why a caller may avoid reclaim is that the caller can not
sleep due to holding a spinlock or is in interrupt context. The second may
be that the caller is willing to fail the allocation without incurring the
overhead of page reclaim. This may happen for opportunistic high-order
allocation requests that have order-0 fallback options. In such cases,
the caller may also wish to avoid waking kswapd.
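The two reclaim-avoidance cases above can be sketched as a pair of predicates
over the allocation flags. The flag bits below are purely illustrative
stand-ins; the real definitions live in include/linux/gfp.h and differ in
both name and value:

```c
/* Hypothetical flag bits, for illustration only. */
#define GFP_ATOMIC_BIT          0x1u  /* caller can not sleep */
#define GFP_KSWAPD_RECLAIM_BIT  0x2u  /* caller allows waking kswapd */
#define GFP_IO_BIT              0x4u  /* caller may start I/O */

/* A caller may enter direct reclaim only if it is allowed to sleep,
 * i.e. it holds no spinlock and is not in interrupt context. */
static int can_direct_reclaim(unsigned int flags)
{
	return !(flags & GFP_ATOMIC_BIT);
}

/* An opportunistic high-order request with an order-0 fallback may
 * also clear this bit to fail fast without poking kswapd. */
static int may_wake_kswapd(unsigned int flags)
{
	return !!(flags & GFP_KSWAPD_RECLAIM_BIT);
}
```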

__GFP_IO allocation requests are made to prevent file system deadlocks.

In the absence of non-sleepable allocation requests, it seems detrimental
to be doing balancing. Page reclamation can be kicked off lazily, that
is, only when needed (aka zone free memory is 0), instead of making it
a proactive process.

That being said, the kernel should try to fulfill requests for direct
mapped pages from the direct mapped pool, instead of falling back on
the dma pool, so as to keep the dma pool filled for dma requests (atomic
or not). A similar argument applies to highmem and direct mapped pages.
OTOH, if there are a lot of free dma pages, it is preferable to satisfy
regular memory requests by allocating one from the dma pool, instead
of incurring the overhead of regular zone balancing.
In 2.2, memory balancing/page reclamation would kick off only when the
_total_ number of free pages fell below 1/64th of total memory. With the
right ratio of dma and regular memory, it is quite possible that balancing
would not be done even when the dma zone was completely empty. 2.2 has
been running on production machines of varying memory sizes, and seems to be
doing fine even with the presence of this problem. In 2.3, due to
HIGHMEM, this problem is aggravated.

In 2.3, zone balancing can be done in one of two ways: depending on the
zone size (and possibly on the size of lower class zones), we can decide
at init time how many free pages we should aim for while balancing any
zone. The good part is that, while balancing, we do not need to look at sizes
of lower class zones; the bad part is that we might do too frequent balancing
due to ignoring possibly lower usage in the lower class zones. Also,
with a slight change in the allocation routine, it is possible to reduce
the memclass() macro to a simple equality check.
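The first scheme can be sketched as follows: a per-zone free-page target is
computed once at init time from the zone size alone (using the 2.2-style
1/64th ratio), and the balancing check then compares against that fixed
target without ever consulting lower class zones. All names here are
illustrative, not kernel API:

```c
/* Sketch of scheme one: an init-time, per-zone free page target. */
struct zone {
	unsigned long size;         /* total pages in this zone */
	unsigned long free;         /* free pages right now */
	unsigned long pages_target; /* computed once at init time */
};

static void zone_init_target(struct zone *z)
{
	/* Derived from this zone's size only; lower class zones
	 * are deliberately ignored, which is the scheme's weakness. */
	z->pages_target = z->size / 64;
}

static int zone_needs_balancing(const struct zone *z)
{
	return z->free < z->pages_target;
}
```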

Another possible solution is that we balance only when the free memory
of a zone _and_ all its lower class zones falls below 1/64th of the
total memory in the zone and its lower class zones. This fixes the 2.2
balancing problem, and stays as close to 2.2 behavior as possible. Also,
the balancing algorithm works the same way on the various architectures,
which have different numbers and types of zones. If we wanted to get
fancy, we could assign different weights to free pages in different
zones in the future.
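The second scheme can be sketched as a walk over the zone and all its lower
class zones (e.g. HIGHMEM falls back to NORMAL, which falls back to DMA),
summing free and total pages and balancing only when the cumulative free
count drops below 1/64th of the cumulative size. Names are illustrative,
not kernel API:

```c
/* Sketch of scheme two: the cumulative 1/64th check. */
struct zone {
	unsigned long size; /* total pages in this zone */
	unsigned long free; /* free pages right now */
};

/* zones[0] is the lowest class zone (dma); idx is the zone the
 * allocation is being attempted from. */
static int should_balance(const struct zone *zones, int idx)
{
	unsigned long free = 0, total = 0;
	int i;

	for (i = 0; i <= idx; i++) {
		free  += zones[i].free;
		total += zones[i].size;
	}
	return free < total / 64;
}
```

Note how an empty dma zone by itself now forces balancing of a dma
allocation even when the regular zone has plenty of free pages, which is
exactly the case the 2.2 global check missed.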

Note that if the size of the regular zone is huge compared to the dma zone,
it becomes less significant to consider the free dma pages while
deciding whether to balance the regular zone. The first solution
becomes more attractive then.

The appended patch implements the second solution. It also "fixes" two
problems: first, kswapd is woken up as in 2.2 on low memory conditions
for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
so as to give a fighting chance for replace_with_highmem() to get a
HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
fall back into the regular zone. This also makes sure that HIGHMEM pages
are not leaked (for example, in situations where a HIGHMEM page is in
the swapcache but is not being used by anyone).

kswapd also needs to know about the zones it should balance. kswapd is
primarily needed in a situation where balancing can not be done,
probably because all allocation requests are coming from intr context
and all process contexts are sleeping. For 2.3, kswapd does not really
need to balance the highmem zone, since intr context does not request
highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
structure to decide whether a zone needs balancing.

Page stealing from process memory and shm is done if stealing the page would
alleviate memory pressure on any zone in the page's node that has fallen below
its watermark.

watermark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd: These
are per-zone fields, used to determine when a zone needs to be balanced. When
the number of pages falls below watermark[WMARK_MIN], the hysteresis field
low_on_memory gets set. This stays set till the number of free pages reaches
watermark[WMARK_HIGH]. When low_on_memory is set, page allocation requests will
try to free some pages in the zone (providing GFP_WAIT is set in the request).
Orthogonal to this is the decision to poke kswapd to free some zone pages.
That decision is not hysteresis based, and is done when the number of free
pages is below watermark[WMARK_LOW]; in which case zone_wake_kswapd is also set.
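The per-zone logic above can be sketched as a single update function:
low_on_memory is set when free pages drop below the MIN watermark and is
cleared only once they climb back to the HIGH watermark (hysteresis),
while the kswapd wakeup decision is an independent, purely threshold-based
test against the LOW watermark. Field and function names are illustrative:

```c
/* Sketch of the watermark / hysteresis decisions described above. */
enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

struct zone {
	unsigned long watermark[NR_WMARK];
	int low_on_memory;    /* hysteresis: set at MIN, cleared at HIGH */
	int zone_wake_kswapd; /* no hysteresis: tracks the LOW threshold */
};

static void zone_update(struct zone *z, unsigned long free_pages)
{
	if (free_pages < z->watermark[WMARK_MIN])
		z->low_on_memory = 1;
	else if (free_pages >= z->watermark[WMARK_HIGH])
		z->low_on_memory = 0;
	/* between MIN and HIGH, low_on_memory keeps its previous value */

	z->zone_wake_kswapd = free_pages < z->watermark[WMARK_LOW];
}
```

The gap between the MIN and HIGH watermarks is what prevents allocators
from flapping in and out of reclaim around a single threshold.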

(Good) Ideas that I have heard:

1. Dynamic experience should influence balancing: number of failed requests
   for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
   dma pages. (lkd@tantalophile.demon.co.uk)