Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <HTML> |
2 | <HEAD> | |
3 | <TITLE>Conservative GC Porting Directions</TITLE> | |
4 | </HEAD> | |
5 | <BODY> | |
6 | <H1>Conservative GC Porting Directions</h1> | |
7 | The collector is designed to be relatively easy to port, but is not | |
8 | portable code per se. The collector inherently has to perform operations, | |
9 | such as scanning the stack(s), that are not possible in portable C code. | |
10 | <P> | |
11 | All of the following assumes that the collector is being ported to a | |
12 | byte-addressable 32- or 64-bit machine. Currently all successful ports | |
13 | to 64-bit machines involve LP64 targets. The code base includes some | |
14 | provisions for P64 targets (notably win64), but that has not been tested. | |
15 | You are hereby discouraged from attempting a port to non-byte-addressable, | |
16 | or 8-bit, or 16-bit machines. | |
17 | <P> | |
18 | The difficulty of porting the collector varies greatly depending on the needed | |
19 | functionality. In the simplest case, only some small additions are needed | |
20 | for the <TT>include/private/gcconfig.h</tt> file. This is described in the | |
21 | following section. Later sections discuss some of the optional features, | |
22 | which typically involve more porting effort. | |
23 | <P> | |
24 | Note that the collector makes heavy use of <TT>ifdef</tt>s. Unlike | |
25 | some other software projects, we have concluded repeatedly that this is preferable | |
26 | to system dependent files, with code duplicated between the files. | |
27 | However, to keep this manageable, we do strongly believe in indenting | |
28 | <TT>ifdef</tt>s correctly (for historical reasons usually without the leading | |
29 | sharp sign). (Separate source files are of course fine if they don't result in | |
30 | code duplication.) | |
31 | <H2>Adding Platforms to <TT>gcconfig.h</tt></h2> | |
32 | If neither thread support, nor tracing of dynamic library data is required, | |
33 | these are often the only changes you will need to make. | |
34 | <P> | |
35 | The <TT>gcconfig.h</tt> file consists of three sections: | |
36 | <OL> | |
37 | <LI> A section that defines GC-internal macros | |
38 | that identify the architecture (e.g. <TT>IA64</tt> or <TT>I386</tt>) | |
39 | and operating system (e.g. <TT>LINUX</tt> or <TT>MSWIN32</tt>). | |
40 | This is usually done by testing predefined macros. By defining | |
41 | our own macros instead of using the predefined ones directly, we can | |
42 | impose a bit more consistency, and somewhat isolate ourselves from | |
43 | compiler differences. | |
44 | <P> | |
45 | It is relatively straightforward to add a new entry here. But please try | |
46 | to be consistent with the existing code. In particular, 64-bit variants | |
47 | of 32-bit architectures generally are <I>not</i> treated as a new architecture. | |
48 | Instead we explicitly test for 64-bit-ness in the few places in which it | |
49 | matters. (The notable exception here is <TT>I386</tt> and <TT>X86_64</tt>. | |
50 | This is partially historical, and partially justified by the fact that there | |
51 | are arguably more substantial architecture and ABI differences here than | |
52 | for RISC variants.) | |
53 | <P> | |
54 | On GNU-based systems, <TT>cpp -dM empty_source_file.c</tt> seems to generate | |
55 | a set of predefined macros. On some other systems, the "verbose" | |
56 | compiler option may do so, or the manual page may list them. | |
57 | <LI> | |
58 | A section that defines a small number of platform-specific macros, which are | |
59 | then used directly by the collector. For simple ports, this is where most of | |
60 | the effort is required. We describe the macros below. | |
61 | <P> | |
62 | This section contains a subsection for each architecture (enclosed in a | |
63 | suitable <TT>ifdef</tt>). Each subsection usually contains some | |
64 | architecture-dependent defines, followed by several sets of OS-dependent | |
65 | defines, again enclosed in <TT>ifdef</tt>s. | |
66 | <LI> | |
67 | A section that fills in defaults for some macros left undefined in the preceding | |
68 | section, and defines some other macros that rarely need adjustment for | |
69 | new platforms. You will typically not have to touch these. | |
70 | If you are porting to an OS that | |
71 | was previously completely unsupported, it is likely that you will | |
72 | need to add another clause to the definition of <TT>GET_MEM</tt>. | |
73 | </ol> | |
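<P>
The three-part layout described above can be sketched roughly as follows. This is only an illustration, not an excerpt from <TT>gcconfig.h</tt>: <TT>EXAMPLE_ARCH</tt>, <TT>EXAMPLE_OS</tt> and <TT>example_config_is_consistent</tt> are hypothetical names standing in for a new port, and the values chosen for the macros (which are the ones listed in this section) are assumptions that must be checked against the real target.

```c
#include <assert.h>
#include <string.h>

/* Section 1: map compiler-predefined macros to GC-internal names.
 * EXAMPLE_ARCH and EXAMPLE_OS are hypothetical placeholders for a new port. */
#if defined(__x86_64__)
# define X86_64
#else
# define EXAMPLE_ARCH
#endif
#if defined(__linux__)
# define LINUX
#else
# define EXAMPLE_OS
#endif

/* Section 2: one subsection per architecture; the OS-dependent defines
 * are nested inside it. */
#ifdef X86_64
# define MACH_TYPE "X86_64"
# ifdef __ILP32__
#   define CPP_WORDSZ 32   /* 32-bit ABI on a 64-bit architecture */
#   define ALIGNMENT 4
# else
#   define CPP_WORDSZ 64
#   define ALIGNMENT 8
# endif
# ifdef LINUX
#   define OS_TYPE "LINUX"
# endif
# ifdef EXAMPLE_OS
#   define OS_TYPE "EXAMPLE_OS"
# endif
#endif
#ifdef EXAMPLE_ARCH
# define MACH_TYPE "EXAMPLE_ARCH"
# if defined(__LP64__) || defined(_WIN64)
#   define CPP_WORDSZ 64
#   define ALIGNMENT 8
# else
#   define ALIGNMENT 4     /* CPP_WORDSZ is left to the default below */
# endif
# ifdef LINUX
#   define OS_TYPE "LINUX"
# endif
# ifdef EXAMPLE_OS
#   define OS_TYPE "EXAMPLE_OS"
# endif
#endif

/* Section 3: defaults for anything left undefined above. */
#ifndef CPP_WORDSZ
# define CPP_WORDSZ 32
#endif

/* Hypothetical sanity check; returns nonzero if the settings agree. */
static int example_config_is_consistent(void) {
    return (CPP_WORDSZ == 32 || CPP_WORDSZ == 64)
        && ALIGNMENT == CPP_WORDSZ / 8
        && strlen(MACH_TYPE) > 0
        && strlen(OS_TYPE) > 0;
}
```

Keeping the detection logic in the first part means the later parts only ever test the GC-internal names, which is the consistency argument made above.
<P>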
74 | The following macros must be defined correctly for each architecture and operating | |
75 | system: | |
76 | <DL> | |
77 | <DT><TT>MACH_TYPE</tt> | |
78 | <DD> | |
79 | Defined to a string that represents the machine architecture. Usually | |
80 | just the macro name used to identify the architecture, but enclosed in quotes. | |
81 | <DT><TT>OS_TYPE</tt> | |
82 | <DD> | |
83 | Defined to a string that represents the operating system name. Usually | |
84 | just the macro name used to identify the operating system, but enclosed in quotes. | |
85 | <DT><TT>CPP_WORDSZ</tt> | |
86 | <DD> | |
87 | The word size in bits as a constant suitable for preprocessor tests, | |
88 | i.e. without casts or sizeof expressions. Currently always defined as | |
89 | either 64 or 32. For platforms supporting both 32- and 64-bit ABIs, | |
90 | this should be conditionally defined depending on the current ABI. | |
91 | There is a default of 32. | |
92 | <DT><TT>ALIGNMENT</tt> | |
93 | <DD> | |
94 | Defined to be the largest <TT>N</tt>, such that | |
95 | all pointers are guaranteed to be aligned on <TT>N</tt>-byte boundaries. | |
96 | Defining it to be 1 will always work, but performs poorly. | |
97 | For all modern 32-bit platforms, this is 4. For all modern 64-bit | |
98 | platforms, this is 8. Whether or not X86 qualifies as a modern | |
99 | architecture here is compiler- and OS-dependent. | |
100 | <DT><TT>DATASTART</tt> | |
101 | <DD> | |
102 | The beginning of the main data segment. The collector will trace all | |
103 | memory between <TT>DATASTART</tt> and <TT>DATAEND</tt> for root pointers. | |
104 | On some platforms, this can be defined to a constant address, | |
105 | though experience has shown that to be risky. Ideally the linker will | |
106 | define a symbol (e.g. <TT>_data</tt>) whose address is the beginning | |
107 | of the data segment. Sometimes the value can be computed using | |
108 | the <TT>GC_SysVGetDataStart</tt> function. Not used if either | |
109 | the next macro is defined, or if dynamic loading is supported, and the | |
110 | dynamic loading support defines a function | |
111 | <TT>GC_register_main_static_data()</tt> which returns false. | |
112 | <DT><TT>SEARCH_FOR_DATA_START</tt> | |
113 | <DD> | |
114 | If this is defined, <TT>DATASTART</tt> will be defined to a dynamically | |
115 | computed value which is obtained by starting with the address of | |
116 | <TT>_end</tt> and walking backwards until non-addressable memory is found. | |
117 | This often works on Posix-like platforms. It makes it harder to debug | |
118 | client programs, since startup involves generating and catching a | |
119 | segmentation fault, which tends to confuse users. | |
120 | <DT><TT>DATAEND</tt> | |
121 | <DD> | |
122 | Set to the end of the main data segment. Defaults to <TT>end</tt>, | |
123 | where that is declared as an array. This works in some cases, since | |
124 | the linker introduces a suitable symbol. | |
125 | <DT><TT>DATASTART2, DATAEND2</tt> | |
126 | <DD> | |
127 | Some platforms have two discontiguous main data segments, e.g. | |
128 | for initialized and uninitialized data. If so, these two macros | |
129 | should be defined to the limits of the second main data segment. | |
130 | <DT><TT>STACK_GROWS_UP</tt> | |
131 | <DD> | |
132 | Should be defined if the stack (or thread stacks) grow towards higher | |
133 | addresses. (This appears to be true only on PA-RISC. If your architecture | |
134 | has more than one stack per thread, and is not already supported, you will | |
135 | need to do more work. Grep for "IA64" in the source for an example.) | |
136 | <DT><TT>STACKBOTTOM</tt> | |
137 | <DD> | |
138 | Defined to be the cold end of the stack, which is usually the | |
139 | highest address in the stack. It must bound the region of the | |
140 | stack that contains pointers into the GC heap. With thread support, | |
141 | this must be the cold end of the main stack, which typically | |
142 | cannot be found in the same way as the other thread stacks. | |
143 | If this is not defined and none of the following three macros | |
144 | is defined, client code must explicitly set | |
145 | <TT>GC_stackbottom</tt> to an appropriate value before calling | |
146 | <TT>GC_INIT()</tt> or any other <TT>GC_</tt> routine. | |
147 | <DT><TT>LINUX_STACKBOTTOM</tt> | |
148 | <DD> | |
149 | May be defined instead of <TT>STACKBOTTOM</tt>. | |
150 | If defined, then the cold end of the stack will be determined at run time. | |
151 | Currently we usually read it from /proc. | |
152 | <DT><TT>HEURISTIC1</tt> | |
153 | <DD> | |
154 | May be defined instead of <TT>STACKBOTTOM</tt>. | |
155 | <TT>STACK_GRAN</tt> should generally also be undefined and then redefined. | |
156 | The cold end of the stack is determined by taking an address inside | |
157 | <TT>GC_init</tt>'s frame, and rounding it up to | |
158 | the next multiple of <TT>STACK_GRAN</tt>. This works well if the stack base is | |
159 | always aligned to a large power of two. | |
160 | (<TT>STACK_GRAN</tt> is predefined to 0x1000000, which is | |
161 | rarely optimal.) | |
162 | <DT><TT>HEURISTIC2</tt> | |
163 | <DD> | |
164 | May be defined instead of <TT>STACKBOTTOM</tt>. | |
165 | The cold end of the stack is determined by taking an address inside | |
166 | <TT>GC_init</tt>'s frame, incrementing it repeatedly | |
167 | in small steps (decrement if <TT>STACK_GROWS_UP</tt>), and reading the value | |
168 | at each location. We remember the value when the first | |
169 | Segmentation violation or Bus error is signalled, round that | |
170 | to the nearest plausible page boundary, and use that as the | |
171 | stack base. | |
172 | <DT><TT>DYNAMIC_LOADING</tt> | |
173 | <DD> | |
174 | Should be defined if <TT>dyn_load.c</tt> has been updated for this | |
175 | platform and tracing of dynamic library roots is supported. | |
176 | <DT><TT>MPROTECT_VDB, PROC_VDB</tt> | |
177 | <DD> | |
178 | May be defined if the corresponding "virtual dirty bit" | |
179 | implementation in <TT>os_dep.c</tt> is usable on this platform. This | |
180 | allows incremental/generational garbage collection. | |
181 | <TT>MPROTECT_VDB</tt> identifies modified pages by | |
182 | write protecting the heap and catching faults. | |
183 | <TT>PROC_VDB</tt> uses the /proc primitives to read dirty bits. | |
184 | <DT><TT>PREFETCH, PREFETCH_FOR_WRITE</tt> | |
185 | <DD> | |
186 | The collector uses <TT>PREFETCH</tt>(<I>x</i>) to preload the cache | |
187 | with *<I>x</i>. | |
188 | This defaults to a no-op. | |
189 | <DT><TT>CLEAR_DOUBLE</tt> | |
190 | <DD> | |
191 | If <TT>CLEAR_DOUBLE</tt> is defined, then | |
192 | <TT>CLEAR_DOUBLE</tt>(x) is used as a fast way to | |
193 | clear the two words at GC_malloc-aligned address x. By default, | |
194 | word stores of 0 are used instead. | |
195 | <DT><TT>HEAP_START</tt> | |
196 | <DD> | |
197 | <TT>HEAP_START</tt> may be defined as the initial address hint for mmap-based | |
198 | allocation. | |
199 | <DT><TT>ALIGN_DOUBLE</tt> | |
200 | <DD> | |
201 | Should be defined if the architecture requires double-word alignment | |
202 | of <TT>GC_malloc</tt>ed memory, e.g. 8-byte alignment with a | |
203 | 32-bit ABI. Most modern machines are likely to require this. | |
204 | This is no longer needed for GC7 and later. | |
205 | </dl> | |
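<P>
The rounding step described for <TT>HEURISTIC1</tt> can be sketched as below. This is a minimal illustration rather than the collector's own code: the <TT>example_</tt> names are hypothetical, the granularity is assumed to be a power of two, and the stack is assumed to grow towards lower addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Granularity assumed for the stack base. The text above gives 0x1000000
 * as the predefined value of STACK_GRAN. */
#define EXAMPLE_STACK_GRAN ((uintptr_t)0x1000000)

/* Round addr up to the next multiple of gran. gran must be a power of two.
 * An address that is already a multiple of gran is returned unchanged. */
static uintptr_t example_round_up(uintptr_t addr, uintptr_t gran) {
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    return (addr + gran - 1) & ~(gran - 1);
}

/* Guess the cold end of the current stack from the address of a local
 * variable, assuming the stack grows towards lower addresses. */
static uintptr_t example_heuristic1_stack_bottom(void) {
    volatile int dummy = 0;
    return example_round_up((uintptr_t)&dummy, EXAMPLE_STACK_GRAN);
}
```

The guess is only as good as the assumed granularity: if the real stack base is not aligned to <TT>EXAMPLE_STACK_GRAN</tt>, the result overshoots the stack.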
206 | <H2>Additional requirements for a basic port</h2> | |
207 | In some cases, you may have to add additional platform-specific code | |
208 | to other files. A likely candidate is the implementation of | |
209 | <TT>GC_with_callee_saves_pushed</tt> in <TT>mach_dep.c</tt>. | |
210 | This ensures that register contents that the collector must trace | |
211 | from are copied to the stack. Typically this can be done portably, | |
212 | but on some platforms it may require assembly code, or just | |
213 | tweaking of conditional compilation tests. | |
214 | <P> | |
215 | For GC7, if your platform supports <TT>getcontext()</tt>, then defining | |
216 | the macro <TT>UNIX_LIKE</tt> for your OS in <TT>gcconfig.h</tt> | |
217 | (if it isn't defined there already) is likely to solve the problem. | |
218 | Otherwise, if you are using gcc, <TT>__builtin_unwind_init()</tt> | |
219 | will be used, and should work fine. If that is not applicable either, | |
220 | the implementation will try to use <TT>setjmp()</tt>. This will work if your | |
221 | <TT>setjmp</tt> implementation saves all possibly pointer-valued registers | |
222 | into the buffer, as opposed to trying to unwind the stack at | |
223 | <TT>longjmp</tt> time. The <TT>setjmp_test</tt> test tries to determine this, | |
224 | but often doesn't get it right. | |
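<P>
The <TT>setjmp()</tt> fallback can be sketched as follows. This is a simplified illustration under the assumption stated above, namely that <TT>setjmp</tt> stores all possibly pointer-valued registers in the buffer. The <TT>example_</tt> names are hypothetical and the real routine in <TT>mach_dep.c</tt> handles more cases.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Callback type: context points at the saved register state. */
typedef void (*example_callee_saves_fn)(void *context, void *arg);

/* Spill register contents into a buffer in this stack frame, then call fn
 * while that buffer is still live, so a stack scan started from inside fn
 * also sees the saved register values. */
static void example_with_callee_saves_pushed(example_callee_saves_fn fn,
                                             void *arg) {
    jmp_buf regs;
    memset(regs, 0, sizeof(regs)); /* avoid scanning stale buffer contents */
    (void)setjmp(regs);            /* we never longjmp back to this point */
    fn((void *)regs, arg);
}

/* Example callback: record where the register buffer lives. */
static void example_record_context(void *context, void *arg) {
    *(void **)arg = context;
}
```

For instance, calling <TT>example_with_callee_saves_pushed(example_record_context, &amp;seen)</tt> with a <TT>void *seen</tt> leaves the address of the on-stack buffer in <TT>seen</tt>.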
225 | <P> | |
226 | In GC6.x versions of the collector, tracing of registers | |
227 | was more commonly handled | |
228 | with assembly code. In GC7, this is generally to be avoided. | |
229 | <P> | |
230 | Most commonly <TT>os_dep.c</tt> will not require attention, but see below. | |
231 | <H2>Thread support</h2> | |
232 | Supporting threads requires that the collector be able to find and suspend | |
233 | all threads potentially accessing the garbage-collected heap, and locate | |
234 | any state associated with each thread that must be traced. | |
235 | <P> | |
236 | The functionality needed for thread support is generally implemented | |
237 | in one or more files specific to the particular thread interface. | |
238 | For example, somewhat portable pthread support is implemented | |
239 | in <TT>pthread_support.c</tt> and <TT>pthread_stop_world.c</tt>. | |
240 | The essential functionality consists of | |
241 | <DL> | |
242 | <DT><TT>GC_stop_world()</tt> | |
243 | <DD> | |
244 | Stops all threads which may access the garbage collected heap, other | |
245 | than the caller. | |
246 | <DT><TT>GC_start_world()</tt> | |
247 | <DD> | |
248 | Restart other threads. | |
249 | <DT><TT>GC_push_all_stacks()</tt> | |
250 | <DD> | |
251 | Push the contents of all thread stacks (or at least of pointer-containing | |
252 | regions in the thread stacks) onto the mark stack. | |
253 | </dl> | |
254 | These very often require that the garbage collector maintain its | |
255 | own data structures to track active threads. | |
256 | <P> | |
257 | In addition, <TT>LOCK</tt> and <TT>UNLOCK</tt> must be implemented | |
258 | in <TT>gc_locks.h</tt>. | |
259 | <P> | |
260 | The easiest case is probably a new pthreads platform | |
261 | on which threads can be stopped | |
262 | with signals. In this case, the changes involve: | |
263 | <OL> | |
264 | <LI>Introducing a suitable <TT>GC_</tt><I>X</i><TT>_THREADS</tt> macro, which should | |
265 | be automatically defined by <TT>gc_config_macros.h</tt> in the right cases. | |
266 | It should also result in a definition of <TT>GC_PTHREADS</tt>, as for the | |
267 | existing cases. | |
268 | <LI>For GC7+, ensuring that the <TT>atomic_ops</tt> package at least | |
269 | minimally supports the platform. | |
270 | If incremental GC is needed, or if pthread locks don't | |
271 | perform adequately as the allocation lock, you will probably need to | |
272 | ensure that a sufficient <TT>atomic_ops</tt> port | |
273 | exists for the platform to provide an atomic test and set | |
274 | operation. (Current GC7 versions require more <TT>atomic_ops</tt> | |
275 | support than necessary. This is a bug.) For earlier versions define | |
276 | <TT>GC_test_and_set</tt> in <TT>gc_locks.h</tt>. | |
277 | <LI>Making any needed adjustments to <TT>pthread_stop_world.c</tt> and | |
278 | <TT>pthread_support.c</tt>. Ideally none should be needed. In fact, | |
279 | not all of this is as well standardized as one would like, and outright | |
280 | bugs requiring workarounds are common. | |
281 | </ol> | |
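<P>
To show what an atomic test and set operation buys for the allocation lock, here is a minimal spin-lock sketch of <TT>LOCK</tt> and <TT>UNLOCK</tt>. It uses C11 <TT>&lt;stdatomic.h&gt;</tt> as a stand-in for <TT>atomic_ops</tt>, which is an assumption made for the sake of a self-contained example. The <TT>example_</tt> names are hypothetical, and a real port would normally add backoff or fall back to a blocking lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical allocation lock built on a test-and-set flag. */
static atomic_flag example_allocate_lock = ATOMIC_FLAG_INIT;

/* Spin until the flag was previously clear, i.e. until we acquired it. */
#define LOCK() \
    do { \
        while (atomic_flag_test_and_set_explicit(&example_allocate_lock, \
                                                 memory_order_acquire)) { \
            /* busy-wait; a real lock would back off or yield here */ \
        } \
    } while (0)

/* Release the lock so that another thread's test-and-set can succeed. */
#define UNLOCK() \
    atomic_flag_clear_explicit(&example_allocate_lock, memory_order_release)

static long example_alloc_count = 0;

/* Example critical section: count an allocation under the lock. */
static long example_count_allocation(void) {
    long n;
    LOCK();
    n = ++example_alloc_count;
    UNLOCK();
    return n;
}
```

The acquire and release orderings ensure that updates made inside the critical section are visible to the next thread that takes the lock.
<P>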
282 | Non-preemptive threads packages will probably require further work. Similarly | |
283 | thread-local allocation and parallel marking requires further work | |
284 | in <TT>pthread_support.c</tt>, and may require better <TT>atomic_ops</tt> | |
285 | support. | |
286 | <H2>Dynamic library support</h2> | |
287 | So long as <TT>DATASTART</tt> and <TT>DATAEND</tt> are defined correctly, | |
288 | the collector will trace memory reachable from file scope or <TT>static</tt> | |
289 | variables defined as part of the main executable. This is sufficient | |
290 | if either the program is statically linked, or if pointers to the | |
291 | garbage-collected heap are never stored in non-stack variables | |
292 | defined in dynamic libraries. | |
293 | <P> | |
294 | If dynamic library data sections must also be traced, then | |
295 | <UL> | |
296 | <LI><TT>DYNAMIC_LOADING</tt> must be defined in the appropriate section | |
297 | of <TT>gcconfig.h</tt>. | |
298 | <LI>An appropriate version of the function | |
299 | <TT>GC_register_dynamic_libraries()</tt> should be defined in | |
300 | <TT>dyn_load.c</tt>. This function should invoke | |
301 | <TT>GC_cond_add_roots(</tt><I>region_start, region_end</i><TT>, TRUE)</tt> | |
302 | on each dynamic library data section. | |
303 | </ul> | |
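<P>
The registration loop described above might look roughly like this. The segment table is a hypothetical stand-in for whatever the platform's dynamic linker exposes, and <TT>example_add_roots</tt> merely stands in for the call to <TT>GC_cond_add_roots</tt> so that the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one mapped segment of a dynamic library. */
struct example_segment {
    char *start;
    char *end;      /* one past the last byte */
    int writable;   /* nonzero for data segments */
};

/* Stand-in for GC_cond_add_roots(start, end, TRUE): it only counts bytes. */
static size_t example_root_bytes = 0;
static void example_add_roots(char *start, char *end) {
    example_root_bytes += (size_t)(end - start);
}

/* Register every writable, non-empty segment as a root region and return
 * the number of regions registered. */
static size_t example_register_dynamic_libraries(
        const struct example_segment *segs, size_t n) {
    size_t registered = 0;
    for (size_t i = 0; i < n; i++) {
        if (!segs[i].writable || segs[i].start >= segs[i].end)
            continue;   /* skip code, read-only data and empty segments */
        example_add_roots(segs[i].start, segs[i].end);
        registered++;
    }
    return registered;
}
```

Read-only segments are skipped because they cannot hold pointers to heap objects allocated after the library was loaded.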
304 | <P> | |
305 | Implementations that scan for writable data segments are error prone, particularly | |
306 | in the presence of threads. They frequently result in race conditions | |
307 | when threads exit and stacks disappear. They may also accidentally trace | |
308 | large regions of graphics memory, or mapped files. On at least | |
309 | one occasion they have been known to try to trace device memory that | |
310 | could not safely be read in the manner the GC wanted to read it. | |
311 | <P> | |
312 | It is usually safer to walk the dynamic linker data structure, especially | |
313 | if the linker exports an interface to do so. But beware of poorly documented | |
314 | locking behavior in this case. | |
315 | <H2>Incremental GC support</h2> | |
316 | For incremental and generational collection to work, <TT>os_dep.c</tt> | |
317 | must contain a suitable "virtual dirty bit" implementation, which | |
318 | allows the collector to track which heap pages (assumed to be | |
319 | a multiple of the collector's block size) have been written during | |
320 | a certain time interval. The collector provides several | |
321 | implementations, which might be adapted. The default | |
322 | (<TT>DEFAULT_VDB</tt>) is a placeholder which treats all pages | |
323 | as having been written. This ensures correctness, but renders | |
324 | incremental and generational collection essentially useless. | |
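<P>
The bookkeeping behind a virtual dirty bit implementation can be sketched as below. This is a toy model with hypothetical <TT>example_</tt> names. It assumes 4 KiB pages and a fixed-size table indexed by page number, and it leaves out the platform-specific part, which is how writes are actually detected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_PAGE_SHIFT 12   /* assumes 4 KiB pages */
#define EXAMPLE_N_PAGES 1024    /* number of slots in this toy table */

static unsigned char example_dirty[EXAMPLE_N_PAGES];    /* current interval */
static unsigned char example_snapshot[EXAMPLE_N_PAGES]; /* previous interval */

/* Map an address to a table slot. Distinct pages may share a slot. */
static size_t example_page_index(uintptr_t addr) {
    return (size_t)((addr >> EXAMPLE_PAGE_SHIFT) % EXAMPLE_N_PAGES);
}

/* Called when a write to addr is observed, e.g. from a fault handler. */
static void example_note_write(uintptr_t addr) {
    example_dirty[example_page_index(addr)] = 1;
}

/* End the current interval: publish its bits and start a fresh one. */
static void example_read_dirty(void) {
    memcpy(example_snapshot, example_dirty, sizeof(example_snapshot));
    memset(example_dirty, 0, sizeof(example_dirty));
}

/* May report a clean page as dirty (shared slot), never the reverse. */
static int example_page_was_dirty(uintptr_t addr) {
    return example_snapshot[example_page_index(addr)];
}

/* Placeholder policy: every page counts as written. */
static int example_default_page_was_dirty(uintptr_t addr) {
    (void)addr;
    return 1;
}
```

Reporting a clean page as dirty only costs extra scanning, whereas missing a written page would be a correctness bug. That is why the always-dirty placeholder is safe but useless.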
325 | <H2>Stack traces for debug support</h2> | |
326 | If stack traces in objects are needed for debug support, | |
327 | <TT>GC_save_callers</tt> and <TT>GC_print_callers</tt> must be | |
328 | implemented. | |
329 | <H2>Disclaimer</h2> | |
330 | This is an initial pass at porting guidelines. Some things | |
331 | have no doubt been overlooked. | |
332 | </body> | |
333 | </html> |