1<HTML>
2<HEAD>
3 <TITLE>Conservative GC Porting Directions</TITLE>
4</HEAD>
5<BODY>
6<H1>Conservative GC Porting Directions</h1>
7The collector is designed to be relatively easy to port, but is not
8portable code per se. The collector inherently has to perform operations,
9such as scanning the stack(s), that are not possible in portable C code.
10<P>
11All of the following assumes that the collector is being ported to a
12byte-addressable 32- or 64-bit machine. Currently all successful ports
13to 64-bit machines involve LP64 targets. The code base includes some
14provisions for P64 targets (notably win64), but that has not been tested.
15You are hereby discouraged from attempting a port to non-byte-addressable,
16or 8-bit, or 16-bit machines.
17<P>
18The difficulty of porting the collector varies greatly depending on the needed
19functionality. In the simplest case, only some small additions are needed
20for the <TT>include/private/gcconfig.h</tt> file. This is described in the
21following section. Later sections discuss some of the optional features,
22which typically involve more porting effort.
23<P>
24Note that the collector makes heavy use of <TT>ifdef</tt>s. Unlike
25some other software projects, we have concluded repeatedly that this is preferable
26to system-dependent files, with code duplicated between the files.
27However, to keep this manageable, we do strongly believe in indenting
28<TT>ifdef</tt>s correctly (for historical reasons usually without the leading
29sharp sign). (Separate source files are of course fine if they don't result in
30code duplication.)
31<H2>Adding Platforms to <TT>gcconfig.h</tt></h2>
32If neither thread support, nor tracing of dynamic library data is required,
33these are often the only changes you will need to make.
34<P>
35The <TT>gcconfig.h</tt> file consists of three sections:
36<OL>
37<LI> A section that defines GC-internal macros
38that identify the architecture (e.g. <TT>IA64</tt> or <TT>I386</tt>)
39and operating system (e.g. <TT>LINUX</tt> or <TT>MSWIN32</tt>).
40This is usually done by testing predefined macros. By defining
41our own macros instead of using the predefined ones directly, we can
42impose a bit more consistency, and somewhat isolate ourselves from
43compiler differences.
44<P>
45It is relatively straightforward to add a new entry here. But please try
46to be consistent with the existing code. In particular, 64-bit variants
47of 32-bit architectures generally are <I>not</i> treated as a new architecture.
48Instead we explicitly test for 64-bit-ness in the few places in which it
49matters. (The notable exception here is <TT>I386</tt> and <TT>X86_64</tt>.
50This is partially historical, and partially justified by the fact that there
51are arguably more substantial architecture and ABI differences here than
52for RISC variants.)
53<P>
54On GNU-based systems, <TT>cpp -dM empty_source_file.c</tt> prints
55the set of predefined macros. On some other systems, the "verbose"
56compiler option may do so, or the manual page may list them.
57<LI>
58A section that defines a small number of platform-specific macros, which are
59then used directly by the collector. For simple ports, this is where most of
60the effort is required. We describe the macros below.
61<P>
62This section contains a subsection for each architecture (enclosed in a
63suitable <TT>ifdef</tt>). Each subsection usually contains some
64architecture-dependent defines, followed by several sets of OS-dependent
65defines, again enclosed in <TT>ifdef</tt>s.
66<LI>
67A section that fills in defaults for some macros left undefined in the preceding
68section, and defines some other macros that rarely need adjustment for
69new platforms. You will typically not have to touch these.
70If you are porting to an OS that
71was previously completely unsupported, it is likely that you will
72need to add another clause to the definition of <TT>GET_MEM</tt>.
73</ol>
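<P>
The three-section layout described above can be sketched roughly as follows.
This is an illustration only, not an excerpt from the real
<TT>gcconfig.h</tt>: the compiler macros tested, the <TT>"UNKNOWN"</tt>
fallbacks, and the helper <TT>platform_description()</tt> are assumptions
made for the example. The nested directives are indented after the sharp sign,
in the style mentioned earlier.

```c
#include <assert.h>

/* Section 1: map compiler-predefined macros to internal names.          */
/* (Illustrative tests; check which macros your compiler predefines.)    */
#if defined(__x86_64__) || defined(_M_X64)
# define X86_64
#elif defined(__i386__) || defined(_M_IX86)
# define I386
#endif
#if defined(__linux__)
# define LINUX
#endif

/* Section 2: one subsection per architecture, with OS-dependent         */
/* defines nested inside it.                                             */
#ifdef X86_64
# define MACH_TYPE "X86_64"
# define CPP_WORDSZ 64
# define ALIGNMENT 8
# ifdef LINUX
#   define OS_TYPE "LINUX"
# endif
#endif
#ifdef I386
# define MACH_TYPE "I386"
# define CPP_WORDSZ 32
# define ALIGNMENT 4
# ifdef LINUX
#   define OS_TYPE "LINUX"
# endif
#endif

/* Section 3: defaults for anything left undefined above.                */
/* The "UNKNOWN" strings exist only so that this sketch compiles         */
/* everywhere; a real port would add a proper entry instead.             */
#ifndef MACH_TYPE
# define MACH_TYPE "UNKNOWN"
#endif
#ifndef OS_TYPE
# define OS_TYPE "UNKNOWN"
#endif
#ifndef CPP_WORDSZ
# define CPP_WORDSZ 32          /* the documented default */
#endif
#ifndef ALIGNMENT
# define ALIGNMENT 1            /* always works, but performs poorly */
#endif

/* Hypothetical helper: report what the configuration selected. */
const char *platform_description(void)
{
    assert(CPP_WORDSZ == 32 || CPP_WORDSZ == 64);
    return MACH_TYPE "/" OS_TYPE;
}
```

Defining internal names in the first section means the later sections never
test compiler-specific macros directly.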
74The following macros must be defined correctly for each architecture and operating
75system:
76<DL>
77<DT><TT>MACH_TYPE</tt>
78<DD>
79Defined to a string that represents the machine architecture. Usually
80just the macro name used to identify the architecture, but enclosed in quotes.
81<DT><TT>OS_TYPE</tt>
82<DD>
83Defined to a string that represents the operating system name. Usually
84just the macro name used to identify the operating system, but enclosed in quotes.
85<DT><TT>CPP_WORDSZ</tt>
86<DD>
87The word size in bits as a constant suitable for preprocessor tests,
88i.e. without casts or sizeof expressions. Currently always defined as
89either 64 or 32. For platforms supporting both 32- and 64-bit ABIs,
90this should be conditionally defined depending on the current ABI.
91There is a default of 32.
92<DT><TT>ALIGNMENT</tt>
93<DD>
94Defined to be the largest <TT>N</tt>, such that
95all pointers are guaranteed to be aligned on <TT>N</tt>-byte boundaries.
96Defining it to be 1 will always work, but will perform poorly.
97For all modern 32-bit platforms, this is 4. For all modern 64-bit
98platforms, this is 8. Whether or not X86 qualifies as a modern
99architecture here is compiler- and OS-dependent.
100<DT><TT>DATASTART</tt>
101<DD>
102The beginning of the main data segment. The collector will trace all
103memory between <TT>DATASTART</tt> and <TT>DATAEND</tt> for root pointers.
104On some platforms, this can be defined to a constant address,
105though experience has shown that to be risky. Ideally the linker will
106define a symbol (e.g. <TT>_data</tt>) whose address is the beginning
107of the data segment. Sometimes the value can be computed using
108the <TT>GC_SysVGetDataStart</tt> function. Not used if either
109the next macro is defined, or if dynamic loading is supported, and the
110dynamic loading support defines a function
111<TT>GC_register_main_static_data()</tt> which returns false.
112<DT><TT>SEARCH_FOR_DATA_START</tt>
113<DD>
114If this is defined, <TT>DATASTART</tt> will be defined to a dynamically
115computed value which is obtained by starting with the address of
116<TT>_end</tt> and walking backwards until non-addressable memory is found.
117This often works on Posix-like platforms. It makes it harder to debug
118client programs, since startup involves generating and catching a
119segmentation fault, which tends to confuse users.
120<DT><TT>DATAEND</tt>
121<DD>
122Set to the end of the main data segment. Defaults to <TT>end</tt>,
123where that is declared as an array. This works in some cases, since
124the linker introduces a suitable symbol.
125<DT><TT>DATASTART2, DATAEND2</tt>
126<DD>
127Some platforms have two discontiguous main data segments, e.g.
128for initialized and uninitialized data. If so, these two macros
129should be defined to the limits of the second main data segment.
130<DT><TT>STACK_GROWS_UP</tt>
131<DD>
132Should be defined if the stack (or thread stacks) grow towards higher
133addresses. (This appears to be true only on PA-RISC. If your architecture
134has more than one stack per thread, and is not already supported, you will
135need to do more work. Grep for "IA64" in the source for an example.)
136<DT><TT>STACKBOTTOM</tt>
137<DD>
138Defined to be the cold end of the stack, which is usually the
139highest address in the stack. It must bound the region of the
140stack that contains pointers into the GC heap. With thread support,
141this must be the cold end of the main stack, which typically
142cannot be found in the same way as the other thread stacks.
143If this is not defined and none of the following three macros
144is defined, client code must explicitly set
145<TT>GC_stackbottom</tt> to an appropriate value before calling
146<TT>GC_INIT()</tt> or any other <TT>GC_</tt> routine.
147<DT><TT>LINUX_STACKBOTTOM</tt>
148<DD>
149May be defined instead of <TT>STACKBOTTOM</tt>.
150If defined, then the cold end of the stack will be determined dynamically.
151Currently we usually read it from <TT>/proc</tt>.
152<DT><TT>HEURISTIC1</tt>
153<DD>
154May be defined instead of <TT>STACKBOTTOM</tt>.
155<TT>STACK_GRAN</tt> should generally also be undefined and then redefined.
156The cold end of the stack is determined by taking an address inside
157<TT>GC_init</tt>'s frame, and rounding it up to
158the next multiple of <TT>STACK_GRAN</tt>. This works well if the stack base is
159always aligned to a large power of two.
160(<TT>STACK_GRAN</tt> is predefined to 0x1000000, which is
161rarely optimal.)
162<DT><TT>HEURISTIC2</tt>
163<DD>
164May be defined instead of <TT>STACKBOTTOM</tt>.
165The cold end of the stack is determined by taking an address inside
166GC_init's frame, incrementing it repeatedly
167in small steps (decrement if <TT>STACK_GROWS_UP</tt>), and reading the value
168at each location. We remember the value when the first
169Segmentation violation or Bus error is signalled, round that
170to the nearest plausible page boundary, and use that as the
171stack base.
172<DT><TT>DYNAMIC_LOADING</tt>
173<DD>
174Should be defined if <TT>dyn_load.c</tt> has been updated for this
175platform and tracing of dynamic library roots is supported.
176<DT><TT>MPROTECT_VDB, PROC_VDB</tt>
177<DD>
178May be defined if the corresponding "virtual dirty bit"
179implementation in os_dep.c is usable on this platform. This
180allows incremental/generational garbage collection.
181<TT>MPROTECT_VDB</tt> identifies modified pages by
182write protecting the heap and catching faults.
183<TT>PROC_VDB</tt> uses the /proc primitives to read dirty bits.
184<DT><TT>PREFETCH, PREFETCH_FOR_WRITE</tt>
185<DD>
186The collector uses <TT>PREFETCH</tt>(<I>x</i>) to preload the cache
187with *<I>x</i>.
188This defaults to a no-op.
189<DT><TT>CLEAR_DOUBLE</tt>
190<DD>
191If <TT>CLEAR_DOUBLE</tt> is defined, then
192<TT>CLEAR_DOUBLE</tt>(x) is used as a fast way to
193clear the two words at GC_malloc-aligned address x. By default,
194word stores of 0 are used instead.
195<DT><TT>HEAP_START</tt>
196<DD>
197<TT>HEAP_START</tt> may be defined as the initial address hint for mmap-based
198allocation.
199<DT><TT>ALIGN_DOUBLE</tt>
200<DD>
201Should be defined if the architecture requires double-word alignment
202of <TT>GC_malloc</tt>ed memory, e.g. 8-byte alignment with a
20332-bit ABI. Most modern machines are likely to require this.
204This is no longer needed for GC7 and later.
205</dl>
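<P>
As an illustration of the <TT>HEURISTIC1</tt> entry above, the rounding step
might look like the following minimal sketch. The names
<TT>SKETCH_STACK_GRAN</tt> and <TT>guess_stack_bottom</tt> are made up for
this example, and rounding <I>down</i> when the stack grows up is an
assumption; the text above only describes rounding up.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the documented default for STACK_GRAN. */
#define SKETCH_STACK_GRAN ((uintptr_t)0x1000000)

/* Guess the cold end of the stack from an address inside a frame. */
/* gran must be a power of two.                                    */
uintptr_t guess_stack_bottom(uintptr_t addr_in_frame, uintptr_t gran,
                             int stack_grows_up)
{
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    if (stack_grows_up) {
        /* Assumption: the cold end lies below, so round down. */
        return addr_in_frame & ~(gran - 1);
    }
    /* Usual case: the cold end lies above, so round up. */
    return (addr_in_frame + gran - 1) & ~(gran - 1);
}
```

A caller would pass the address of one of its own local variables, cast to
<TT>uintptr_t</tt>. The result is only correct if the real stack base is
aligned to a multiple of the chosen granularity.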
206<H2>Additional requirements for a basic port</h2>
207In some cases, you may have to add additional platform-specific code
208to other files. A likely candidate is the implementation of
209<TT>GC_with_callee_saves_pushed</tt> in <TT>mach_dep.c</tt>.
210This ensures that register contents that the collector must trace
211from are copied to the stack. Typically this can be done portably,
212but on some platforms it may require assembly code, or just
213tweaking of conditional compilation tests.
214<P>
215For GC7, if your platform supports <TT>getcontext()</tt>, then defining
216the macro <TT>UNIX_LIKE</tt> for your OS in <TT>gcconfig.h</tt>
217(if it isn't defined there already) is likely to solve the problem.
218Otherwise, if you are using gcc, <TT>__builtin_unwind_init()</tt>
219will be used, and should work fine. If that is not applicable either,
220the implementation will try to use <TT>setjmp()</tt>. This will work if your
221<TT>setjmp</tt> implementation saves all possibly pointer-valued registers
222into the buffer, as opposed to trying to unwind the stack at
223<TT>longjmp</tt> time. The <TT>setjmp_test</tt> test tries to determine this,
224but often doesn't get it right.
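<P>
The <TT>setjmp()</tt> fallback described above can be sketched as follows.
This is not the collector's actual <TT>GC_with_callee_saves_pushed</tt>; the
names <TT>scan_fn</tt>, <TT>with_registers_spilled</tt> and
<TT>record_range</tt> are hypothetical. The sketch relies on the same
assumption the text states: that <TT>setjmp</tt> really stores the
callee-save registers in the buffer.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Callback that receives the bounds of the saved-register area. */
typedef void (*scan_fn)(const void *lo, const void *hi, void *client_data);

/* Save registers into a jmp_buf that lives in this stack frame, then */
/* call fn while the buffer is still live, so that a stack scan made  */
/* from inside fn would also see the saved register values.           */
void with_registers_spilled(scan_fn fn, void *client_data)
{
    jmp_buf regs;

    assert(fn != NULL);
    (void)setjmp(regs);     /* we never longjmp; only the save matters */
    fn((const char *)&regs, (const char *)&regs + sizeof regs, client_data);
}

/* Example callback: record how many bytes were handed to the scanner. */
void record_range(const void *lo, const void *hi, void *client_data)
{
    *(size_t *)client_data = (size_t)((const char *)hi - (const char *)lo);
}
```

Passing the buffer's address to the callback also keeps the compiler from
discarding the buffer as unused.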
225<P>
226In GC6.x versions of the collector, tracing of registers
227was more commonly handled
228with assembly code. In GC7, this is generally to be avoided.
229<P>
230Most commonly <TT>os_dep.c</tt> will not require attention, but see below.
231<H2>Thread support</h2>
232Supporting threads requires that the collector be able to find and suspend
233all threads potentially accessing the garbage-collected heap, and locate
234any state associated with each thread that must be traced.
235<P>
236The functionality needed for thread support is generally implemented
237in one or more files specific to the particular thread interface.
238For example, somewhat portable pthread support is implemented
239in <TT>pthread_support.c</tt> and <TT>pthread_stop_world.c</tt>.
240The essential functionality consists of
241<DL>
242<DT><TT>GC_stop_world()</tt>
243<DD>
244Stops all threads which may access the garbage collected heap, other
245than the caller.
246<DT><TT>GC_start_world()</tt>
247<DD>
248Restart other threads.
249<DT><TT>GC_push_all_stacks()</tt>
250<DD>
251Push the contents of all thread stacks (or at least of pointer-containing
252regions in the thread stacks) onto the mark stack.
253</dl>
254These very often require that the garbage collector maintain its
255own data structures to track active threads.
256<P>
257In addition, <TT>LOCK</tt> and <TT>UNLOCK</tt> must be implemented
258in <TT>gc_locks.h</tt>.
259<P>
260The easiest case is probably a new pthreads platform
261on which threads can be stopped
262with signals. In this case, the changes involve:
263<OL>
264<LI>Introducing a suitable <TT>GC_</tt><I>X</i><TT>_THREADS</tt> macro, which should
265be automatically defined by <TT>gc_config_macros.h</tt> in the right cases.
266It should also result in a definition of <TT>GC_PTHREADS</tt>, as for the
267existing cases.
268<LI>For GC7+, ensuring that the <TT>atomic_ops</tt> package at least
269minimally supports the platform.
270If incremental GC is needed, or if pthread locks don't
271perform adequately as the allocation lock, you will probably need to
272ensure that a sufficient <TT>atomic_ops</tt> port
273exists for the platform to provide an atomic test and set
274operation. (Current GC7 versions require more <TT>atomic_ops</tt>
275support than necessary. This is a bug.) For earlier versions define
276<TT>GC_test_and_set</tt> in <TT>gc_locks.h</tt>.
277<LI>Making any needed adjustments to <TT>pthread_stop_world.c</tt> and
278<TT>pthread_support.c</tt>. Ideally none should be needed. In fact,
279not all of this is as well standardized as one would like, and outright
280bugs requiring workarounds are common.
281</ol>
282Non-preemptive threads packages will probably require further work. Similarly,
283thread-local allocation and parallel marking require further work
284in <TT>pthread_support.c</tt>, and may require better <TT>atomic_ops</tt>
285support.
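<P>
To show what an atomic test and set operation buys you as an allocation lock,
here is a minimal spin-lock sketch. It uses C11 <TT>&lt;stdatomic.h&gt;</tt>
rather than the <TT>atomic_ops</tt> package, and the <TT>sketch_</tt> names
are hypothetical; it is not the collector's <TT>LOCK</tt>/<TT>UNLOCK</tt>
implementation.

```c
#include <stdatomic.h>

/* One global flag standing in for the allocation lock. */
static atomic_flag sketch_alloc_lock = ATOMIC_FLAG_INIT;

/* Try once: returns 1 if the lock was acquired, 0 if it was already held. */
int sketch_try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&sketch_alloc_lock,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. A production lock would back off */
/* or fall back to a blocking primitive instead of spinning forever. */
void sketch_lock(void)
{
    while (!sketch_try_lock()) {
        /* busy-wait */
    }
}

/* Release the lock so that another thread's test and set can succeed. */
void sketch_unlock(void)
{
    atomic_flag_clear_explicit(&sketch_alloc_lock, memory_order_release);
}
```

The acquire and release orderings ensure that writes made while holding the
lock are visible to the next thread that acquires it.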
286<H2>Dynamic library support</h2>
287So long as <TT>DATASTART</tt> and <TT>DATAEND</tt> are defined correctly,
288the collector will trace memory reachable from file scope or <TT>static</tt>
289variables defined as part of the main executable. This is sufficient
290if either the program is statically linked, or if pointers to the
291garbage-collected heap are never stored in non-stack variables
292defined in dynamic libraries.
293<P>
294If dynamic library data sections must also be traced, then
295<UL>
296<LI><TT>DYNAMIC_LOADING</tt> must be defined in the appropriate section
297of <TT>gcconfig.h</tt>.
298<LI>An appropriate version of the function
299<TT>GC_register_dynamic_libraries()</tt> should be defined in
300<TT>dyn_load.c</tt>. This function should invoke
301<TT>GC_cond_add_roots(</tt><I>region_start, region_end</i><TT>, TRUE)</tt>
302on each dynamic library data section.
303</ul>
304<P>
305Implementations that scan for writable data segments are error prone, particularly
306in the presence of threads. They frequently result in race conditions
307when threads exit and stacks disappear. They may also accidentally trace
308large regions of graphics memory, or mapped files. On at least
309one occasion they have been known to try to trace device memory that
310could not safely be read in the manner the GC wanted to read it.
311<P>
312It is usually safer to walk the dynamic linker data structure, especially
313if the linker exports an interface to do so. But beware of poorly documented
314locking behavior in this case.
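<P>
On Linux with glibc, one interface of this kind is
<TT>dl_iterate_phdr()</tt>. The sketch below walks every loaded object and
reports its writable <TT>PT_LOAD</tt> segments to a callback. The callback
type <TT>add_roots_fn</tt> and the helpers <TT>count_region</tt> and
<TT>count_writable_segments</tt> are hypothetical stand-ins; a real port
would call <TT>GC_cond_add_roots</tt> instead, and would have to decide how
to treat the main executable, which this walk also reports.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* glibc: exposes dl_iterate_phdr() */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical stand-in for the collector's root registration call. */
typedef void (*add_roots_fn)(char *start, char *end, void *client_data);

struct walk_state {
    add_roots_fn add_roots;
    void *client_data;
};

/* Called by the dynamic linker once per loaded object. */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct walk_state *st = data;
    int i;

    (void)size;
    for (i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        char *start;

        /* Only loadable, writable segments can hold mutable roots. */
        if (ph->p_type != PT_LOAD || !(ph->p_flags & PF_W))
            continue;
        start = (char *)(info->dlpi_addr + ph->p_vaddr);
        st->add_roots(start, start + ph->p_memsz, st->client_data);
    }
    return 0;                   /* zero means: keep iterating */
}

/* Example callback: just count the regions that were reported. */
void count_region(char *start, char *end, void *client_data)
{
    assert(start <= end);
    ++*(size_t *)client_data;
}

size_t count_writable_segments(void)
{
    size_t n = 0;
    struct walk_state st = { count_region, &n };

    dl_iterate_phdr(visit_object, &st);
    return n;
}
```

Because the linker itself enumerates the objects, this avoids guessing which
writable mappings are library data, though the locking caveat above still
applies.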
315<H2>Incremental GC support</h2>
316For incremental and generational collection to work, <TT>os_dep.c</tt>
317must contain a suitable "virtual dirty bit" implementation, which
318allows the collector to track which heap pages (assumed to be
319a multiple of the collector's block size) have been written during
320a certain time interval. The collector provides several
321implementations, which might be adapted. The default
322(<TT>DEFAULT_VDB</tt>) is a placeholder which treats all pages
323as having been written. This ensures correctness, but renders
324incremental and generational collection essentially useless.
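<P>
The bookkeeping behind a "virtual dirty bit" implementation can be pictured
with the following sketch. It only shows the per-page table and the contrast
with the <TT>DEFAULT_VDB</tt> behaviour; it does not show how writes are
detected (for example by write-protecting pages). All names, the page size
and the table size are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */
#define SKETCH_MAX_PAGES 1024u  /* assumed heap size in pages */

typedef struct {
    uintptr_t heap_start;
    unsigned char dirty[SKETCH_MAX_PAGES / CHAR_BIT];
} dirty_table;

/* Start a new interval with every page marked clean. */
void dirty_init(dirty_table *t, uintptr_t heap_start)
{
    t->heap_start = heap_start;
    memset(t->dirty, 0, sizeof t->dirty);
}

static size_t page_index(const dirty_table *t, uintptr_t addr)
{
    size_t idx = (size_t)((addr - t->heap_start) / SKETCH_PAGE_SIZE);

    assert(addr >= t->heap_start && idx < SKETCH_MAX_PAGES);
    return idx;
}

/* Called by whatever mechanism notices a write to the heap. */
void dirty_record_write(dirty_table *t, uintptr_t addr)
{
    size_t idx = page_index(t, addr);

    t->dirty[idx / CHAR_BIT] |= (unsigned char)(1u << (idx % CHAR_BIT));
}

/* Has the page containing addr been written since dirty_init()? */
int page_was_dirty(const dirty_table *t, uintptr_t addr)
{
    size_t idx = page_index(t, addr);

    return (t->dirty[idx / CHAR_BIT] >> (idx % CHAR_BIT)) & 1;
}

/* Placeholder behaviour: every page is reported as written. */
int page_was_dirty_default(const dirty_table *t, uintptr_t addr)
{
    (void)t;
    (void)addr;
    return 1;
}
```

With the placeholder variant the collector must rescan every page on each
cycle, which is why it is correct but gives no incremental benefit.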
325<H2>Stack traces for debug support</h2>
326If stack traces in objects are needed for debug support,
327<TT>GC_save_callers</tt> and <TT>GC_print_callers</tt> must be
328implemented.
329<H2>Disclaimer</h2>
330This is an initial pass at porting guidelines. Some things
331have no doubt been overlooked.
332</body>
333</html>