]> git.proxmox.com Git - ceph.git/blame - ceph/src/boost/tools/build/src/engine/boehm_gc/doc/gcdescr.html
bump version to 12.2.2-pve1
[ceph.git] / ceph / src / boost / tools / build / src / engine / boehm_gc / doc / gcdescr.html
CommitLineData
7c673cae
FG
1<HTML>
2<HEAD>
3 <TITLE> Conservative GC Algorithmic Overview </TITLE>
4 <AUTHOR> Hans-J. Boehm, HP Labs (Some of this was written at SGI)</author>
5</HEAD>
6<BODY>
7<H1> <I>This is under construction, and may always be.</i> </h1>
8<H1> Conservative GC Algorithmic Overview </h1>
9<P>
10This is a description of the algorithms and data structures used in our
11conservative garbage collector. I expect the level of detail to increase
12with time. For a survey of GC algorithms, see for example
13<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
14excellent paper</a>. For an overview of the collector interface,
15see <A HREF="gcinterface.html">here</a>.
16<P>
17This description is targeted primarily at someone trying to understand the
18source code. It specifically refers to variable and function names.
19It may also be useful for understanding the algorithms at a higher level.
20<P>
21The description here assumes that the collector is used in default mode.
22In particular, we assume that it used as a garbage collector, and not just
23a leak detector. We initially assume that it is used in stop-the-world,
24non-incremental mode, though the presence of the incremental collector
25will be apparent in the design.
26We assume the default finalization model, but the code affected by that
27is very localized.
28<H2> Introduction </h2>
29The garbage collector uses a modified mark-sweep algorithm. Conceptually
30it operates roughly in four phases, which are performed occasionally
31as part of a memory allocation:
32
33<OL>
34
35<LI>
36<I>Preparation</i> Each object has an associated mark bit.
37Clear all mark bits, indicating that all objects
38are potentially unreachable.
39
40<LI>
41<I>Mark phase</i> Marks all objects that can be reachable via chains of
42pointers from variables. Often the collector has no real information
43about the location of pointer variables in the heap, so it
44views all static data areas, stacks and registers as potentially containing
45pointers. Any bit patterns that represent addresses inside
46heap objects managed by the collector are viewed as pointers.
47Unless the client program has made heap object layout information
48available to the collector, any heap objects found to be reachable from
49variables are again scanned similarly.
50
51<LI>
52<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
53objects, and returns them to an appropriate free list for reuse. This is
54not really a separate phase; even in non incremental mode this is operation
55is usually performed on demand during an allocation that discovers an empty
56free list. Thus the sweep phase is very unlikely to touch a page that
57would not have been touched shortly thereafter anyway.
58
59<LI>
60<I>Finalization phase</i> Unreachable objects which had been registered
61for finalization are enqueued for finalization outside the collector.
62
63</ol>
64
65<P>
66The remaining sections describe the memory allocation data structures,
67and then the last 3 collection phases in more detail. We conclude by
68outlining some of the additional features implemented in the collector.
69
70<H2>Allocation</h2>
71The collector includes its own memory allocator. The allocator obtains
72memory from the system in a platform-dependent way. Under UNIX, it
73uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
74<P>
75Most static data used by the allocator, as well as that needed by the
76rest of the garbage collector is stored inside the
77<TT>_GC_arrays</tt> structure.
78This allows the garbage collector to easily ignore the collectors own
79data structures when it searches for root pointers. Other allocator
80and collector internal data structures are allocated dynamically
81with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
82allow for deallocation, and is therefore used only for permanent data
83structures.
84<P>
85The allocator allocates objects of different <I>kinds</i>.
86Different kinds are handled somewhat differently by certain parts
87of the garbage collector. Certain kinds are scanned for pointers,
88others are not. Some may have per-object type descriptors that
89determine pointer locations. Or a specific kind may correspond
90to one specific object layout. Two built-in kinds are uncollectable.
91One (<TT>STUBBORN</tt>) is immutable without special precautions.
92In spite of that, it is very likely that most C clients of the
93collector currently
94use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
95The <A HREF="http://gcc.gnu.org/java">gcj</a> runtime also makes
96heavy use of a kind (allocated with GC_gcj_malloc) that stores
97type information at a known offset in method tables.
98<P>
99The collector uses a two level allocator. A large block is defined to
100be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
101typically on the order of the page size.
102<P>
103Large block sizes are rounded up to
104the next multiple of <TT>HBLKSIZE</tt> and then allocated by
105<TT>GC_allochblk</tt>. Recent versions of the collector
106use an approximate best fit algorithm by keeping free lists for
107several large block sizes.
108The actual
109implementation of <TT>GC_allochblk</tt>
110is significantly complicated by black-listing issues
111(see below).
112<P>
113Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
114Each chunk is
115dedicated to only one object size and kind. The allocator maintains
116separate free lists for each size and kind of object.
117<P>
118Once a large block is split for use in smaller objects, it can only
119be used for objects of that size, unless the collector discovers a completely
120empty chunk. Completely empty chunks are restored to the appropriate
121large block free list.
122<P>
123In order to avoid allocating blocks for too many distinct object sizes,
124the collector normally does not directly allocate objects of every possible
125request size. Instead request are rounded up to one of a smaller number
126of allocated sizes, for which free lists are maintained. The exact
127allocated sizes are computed on demand, but subject to the constraint
128that they increase roughly in geometric progression. Thus objects
129requested early in the execution are likely to be allocated with exactly
130the requested size, subject to alignment constraints.
131See <TT>GC_init_size_map</tt> for details.
132<P>
133The actual size rounding operation during small object allocation is
134implemented as a table lookup in <TT>GC_size_map</tt>.
135<P>
136Both collector initialization and computation of allocated sizes are
137handled carefully so that they do not slow down the small object fast
138allocation path. An attempt to allocate before the collector is initialized,
139or before the appropriate <TT>GC_size_map</tt> entry is computed,
140will take the same path as an allocation attempt with an empty free list.
141This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
142which performs the appropriate initialization checks.
143<P>
144In non-incremental mode, we make a decision about whether to garbage collect
145whenever an allocation would otherwise have failed with the current heap size.
146If the total amount of allocation since the last collection is less than
147the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
148expand the heap. Otherwise, we initiate a garbage collection. This ensures
149that the amount of garbage collection work per allocated byte remains
150constant.
151<P>
152The above is in fact an oversimplification of the real heap expansion
153and GC triggering heuristic, which adjusts slightly for root size
154and certain kinds of
155fragmentation. In particular:
156<UL>
157<LI> Programs with a large root set size and
158little live heap memory will expand the heap to amortize the cost of
159scanning the roots.
160<LI> Versions 5.x of the collector actually collect more frequently in
161nonincremental mode. The large block allocator usually refuses to split
162large heap blocks once the garbage collection threshold is
163reached. This often has the effect of collecting well before the
164heap fills up, thus reducing fragmentation and working set size at the
165expense of GC time. Versions 6.x choose an intermediate strategy depending
166on how much large object allocation has taken place in the past.
167(If the collector is configured to unmap unused pages, versions 6.x
168use the 5.x strategy.)
169<LI> In calculating the amount of allocation since the last collection we
170give partial credit for objects we expect to be explicitly deallocated.
171Even if all objects are explicitly managed, it is often desirable to collect
172on rare occasion, since that is our only mechanism for coalescing completely
173empty chunks.
174</ul>
175<P>
176It has been suggested that this should be adjusted so that we favor
177expansion if the resulting heap still fits into physical memory.
178In many cases, that would no doubt help. But it is tricky to do this
179in a way that remains robust if multiple application are contending
180for a single pool of physical memory.
181
182<H2>Mark phase</h2>
183
184At each collection, the collector marks all objects that are
185possibly reachable from pointer variables. Since it cannot generally
186tell where pointer variables are located, it scans the following
187<I>root segments</i> for pointers:
188<UL>
189<LI>The registers. Depending on the architecture, this may be done using
190assembly code, or by calling a <TT>setjmp</tt>-like function which saves
191register contents on the stack.
192<LI>The stack(s). In the case of a single-threaded application,
193on most platforms this
194is done by scanning the memory between (an approximation of) the current
195stack pointer and <TT>GC_stackbottom</tt>. (For Itanium, the register stack
196scanned separately.) The <TT>GC_stackbottom</tt> variable is set in
197a highly platform-specific way depending on the appropriate configuration
198information in <TT>gcconfig.h</tt>. Note that the currently active
199stack needs to be scanned carefully, since callee-save registers of
200client code may appear inside collector stack frames, which may
201change during the mark process. This is addressed by scanning
202some sections of the stack "eagerly", effectively capturing a snapshot
203at one point in time.
204<LI>Static data region(s). In the simplest case, this is the region
205between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in
206<TT>gcconfig.h</tt>. However, in most cases, this will also involve
207static data regions associated with dynamic libraries. These are
208identified by the mostly platform-specific code in <TT>dyn_load.c</tt>.
209</ul>
210The marker maintains an explicit stack of memory regions that are known
211to be accessible, but that have not yet been searched for contained pointers.
212Each stack entry contains the starting address of the block to be scanned,
213as well as a descriptor of the block. If no layout information is
214available for the block, then the descriptor is simply a length.
215(For other possibilities, see <TT>gc_mark.h</tt>.)
216<P>
217At the beginning of the mark phase, all root segments
218(as described above) are pushed on the
219stack by <TT>GC_push_roots</tt>. (Registers and eagerly processed
220stack sections are processed by pushing the referenced objects instead
221of the stack section itself.) If <TT>ALL_INTERIOR_PTRS</tt> is not
222defined, then stack roots require special treatment. In this case, the
223normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
224explicitly checks for interior pointers and pushes descriptors for target
225objects.
226<P>
227The marker is structured to allow incremental marking.
228Each call to <TT>GC_mark_some</tt> performs a small amount of
229work towards marking the heap.
230It maintains
231explicit state in the form of <TT>GC_mark_state</tt>, which
232identifies a particular sub-phase. Some other pieces of state, most
233notably the mark stack, identify how much work remains to be done
234in each sub-phase. The normal progression of mark states for
235a stop-the-world collection is:
236<OL>
237<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
238objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
239be false, so the mark state is advanced to
240<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
241uncollectable objects, roots, and then mark everything reachable from them.
242<TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
243objects are pushed, and objects reachable from them are marked.
244At that point, the next call to <TT>GC_mark_some</tt> calls
245<TT>GC_push_roots</tt> to push the roots. It the advances the
246mark state to
247<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
248empty, all reachable objects are marked. Once in this state, we work
249only on emptying the mark stack. Once this is completed, the state
250changes to
251<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
252</ol>
253
254The core mark routine <TT>GC_mark_from</tt>, is called
255repeatedly by several of the sub-phases when the mark stack starts to fill
256up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
257to empty the mark stack.
258The routine is designed to only perform a limited amount of marking at
259each call, so that it can also be used by the incremental collector.
260It is fairly carefully tuned, since it usually consumes a large majority
261of the garbage collection time.
262<P>
263The fact that it perform a only a small amount of work per call also
264allows it to be used as the core routine of the parallel marker. In that
265case it is normally invoked on thread-private mark stacks instead of the
266global mark stack. More details can be found in
267<A HREF="scale.html">scale.html</a>
268<P>
269The marker correctly handles mark stack overflows. Whenever the mark stack
270overflows, the mark state is reset to <TT>MS_INVALID</tt>.
271Since there are already marked objects in the heap,
272this eventually forces a complete
273scan of the heap, searching for pointers, during which any unmarked objects
274referenced by marked objects are again pushed on the mark stack. This
275process is repeated until the mark phase completes without a stack overflow.
276Each time the stack overflows, an attempt is made to grow the mark stack.
277All pieces of the collector that push regions onto the mark stack have to be
278careful to ensure forward progress, even in case of repeated mark stack
279overflows. Every mark attempt results in additional marked objects.
280<P>
281Each mark stack entry is processed by examining all candidate pointers
282in the range described by the entry. If the region has no associated
283type information, then this typically requires that each 4-byte aligned
284quantity (8-byte aligned with 64-bit pointers) be considered a candidate
285pointer.
286<P>
287We determine whether a candidate pointer is actually the address of
288a heap block. This is done in the following steps:
289<NL>
290<LI> The candidate pointer is checked against rough heap bounds.
291These heap bounds are maintained such that all actual heap objects
292fall between them. In order to facilitate black-listing (see below)
293we also include address regions that the heap is likely to expand into.
294Most non-pointers fail this initial test.
295<LI> The candidate pointer is divided into two pieces; the most significant
296bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
297the least significant bits specify an offset within that page.
298(A hardware page may actually consist of multiple such pages.
299HBLKSIZE is usually the page size divided by a small power of two.)
300<LI>
301The page address part of the candidate pointer is looked up in a
302<A HREF="tree.html">table</a>.
303Each table entry contains either 0, indicating that the page is not part
304of the garbage collected heap, a small integer <I>n</i>, indicating
305that the page is part of large object, starting at least <I>n</i> pages
306back, or a pointer to a descriptor for the page. In the first case,
307the candidate pointer i not a true pointer and can be safely ignored.
308In the last two cases, we can obtain a descriptor for the page containing
309the beginning of the object.
310<LI>
311The starting address of the referenced object is computed.
312The page descriptor contains the size of the object(s)
313in that page, the object kind, and the necessary mark bits for those
314objects. The size information can be used to map the candidate pointer
315to the object starting address. To accelerate this process, the page header
316also contains a pointer to a precomputed map of page offsets to displacements
317from the beginning of an object. The use of this map avoids a
318potentially slow integer remainder operation in computing the object
319start address.
320<LI>
321The mark bit for the target object is checked and set. If the object
322was previously unmarked, the object is pushed on the mark stack.
323The descriptor is read from the page descriptor. (This is computed
324from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
325</nl>
326<P>
327At the end of the mark phase, mark bits for left-over free lists are cleared,
328in case a free list was accidentally marked due to a stray pointer.
329
330<H2>Sweep phase</h2>
331
332At the end of the mark phase, all blocks in the heap are examined.
333Unmarked large objects are immediately returned to the large object free list.
334Each small object page is checked to see if all mark bits are clear.
335If so, the entire page is returned to the large object free list.
336Small object pages containing some reachable object are queued for later
337sweeping, unless we determine that the page contains very little free
338space, in which case it is not examined further.
339<P>
340This initial sweep pass touches only block headers, not
341the blocks themselves. Thus it does not require significant paging, even
342if large sections of the heap are not in physical memory.
343<P>
344Nonempty small object pages are swept when an allocation attempt
345encounters an empty free list for that object size and kind.
346Pages for the correct size and kind are repeatedly swept until at
347least one empty block is found. Sweeping such a page involves
348scanning the mark bit array in the page header, and building a free
349list linked through the first words in the objects themselves.
350This does involve touching the appropriate data page, but in most cases
351it will be touched only just before it is used for allocation.
352Hence any paging is essentially unavoidable.
353<P>
354Except in the case of pointer-free objects, we maintain the invariant
355that any object in a small object free list is cleared (except possibly
356for the link field). Thus it becomes the burden of the small object
357sweep routine to clear objects. This has the advantage that we can
358easily recover from accidentally marking a free list, though that could
359also be handled by other means. The collector currently spends a fair
360amount of time clearing objects, and this approach should probably be
361revisited.
362<P>
363In most configurations, we use specialized sweep routines to handle common
364small object sizes. Since we allocate one mark bit per word, it becomes
365easier to examine the relevant mark bits if the object size divides
366the word length evenly. We also suitably unroll the inner sweep loop
367in each case. (It is conceivable that profile-based procedure cloning
368in the compiler could make this unnecessary and counterproductive. I
369know of no existing compiler to which this applies.)
370<P>
371The sweeping of small object pages could be avoided completely at the expense
372of examining mark bits directly in the allocator. This would probably
373be more expensive, since each allocation call would have to reload
374a large amount of state (e.g. next object address to be swept, position
375in mark bit table) before it could do its work. The current scheme
376keeps the allocator simple and allows useful optimizations in the sweeper.
377
378<H2>Finalization</h2>
379Both <TT>GC_register_disappearing_link</tt> and
380<TT>GC_register_finalizer</tt> add the request to a corresponding hash
381table. The hash table is allocated out of collected memory, but
382the reference to the finalizable object is hidden from the collector.
383Currently finalization requests are processed non-incrementally at the
384end of a mark cycle.
385<P>
386The collector makes an initial pass over the table of finalizable objects,
387pushing the contents of unmarked objects onto the mark stack.
388After pushing each object, the marker is invoked to mark all objects
389reachable from it. The object itself is not explicitly marked.
390This assures that objects on which a finalizer depends are neither
391collected nor finalized.
392<P>
393If in the process of marking from an object the
394object itself becomes marked, we have uncovered
395a cycle involving the object. This usually results in a warning from the
396collector. Such objects are not finalized, since it may be
397unsafe to do so. See the more detailed
398<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
399<P>
400Any objects remaining unmarked at the end of this process are added to
401a queue of objects whose finalizers can be run. Depending on collector
402configuration, finalizers are dequeued and run either implicitly during
403allocation calls, or explicitly in response to a user request.
404(Note that the former is unfortunately both the default and not generally safe.
405If finalizers perform synchronization, it may result in deadlocks.
406Nontrivial finalizers generally need to perform synchronization, and
407thus require a different collector configuration.)
408<P>
409The collector provides a mechanism for replacing the procedure that is
410used to mark through objects. This is used both to provide support for
411Java-style unordered finalization, and to ignore certain kinds of cycles,
412<I>e.g.</i> those arising from C++ implementations of virtual inheritance.
413
414<H2>Generational Collection and Dirty Bits</h2>
415We basically use the concurrent and generational GC algorithm described in
416<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
417by Boehm, Demers, and Shenker.
418<P>
419The most significant modification is that
420the collector always starts running in the allocating thread.
421There is no separate garbage collector thread. (If parallel GC is
422enabled, helper threads may also be woken up.)
423If an allocation attempt either requests a large object, or encounters
424an empty small object free list, and notices that there is a collection
425in progress, it immediately performs a small amount of marking work
426as described above.
427<P>
428This change was made both because we wanted to easily accommodate
429single-threaded environments, and because a separate GC thread requires
430very careful control over the scheduler to prevent the mutator from
431out-running the collector, and hence provoking unneeded heap growth.
432<P>
433In incremental mode, the heap is always expanded when we encounter
434insufficient space for an allocation. Garbage collection is triggered
435whenever we notice that more than
436<TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
437bytes of allocation have taken place.
438After <TT>GC_full_freq</tt> minor collections a major collection
439is started.
440<P>
441All collections initially run interrupted until a predetermined
442amount of time (50 msecs by default) has expired. If this allows
443the collection to complete entirely, we can avoid correcting
444for data structure modifications during the collection. If it does
445not complete, we return control to the mutator, and perform small
446amounts of additional GC work during those later allocations that
447cannot be satisfied from small object free lists. When marking completes,
448the set of modified pages is retrieved, and we mark once again from
449marked objects on those pages, this time with the mutator stopped.
450<P>
451We keep track of modified pages using one of several distinct mechanisms:
452<OL>
453<LI>
454Through explicit mutator cooperation. Currently this requires
455the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
456<LI>
457(<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
458catching write faults. This is
459implemented for many Unix-like systems and for win32. It is not possible
460in a few environments.
461<LI>
462(<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
463(Currently only Sun's
464Solaris supports this. Though this is considerably cleaner, performance
465may actually be better with mprotect and signals.)
466<LI>
467(<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
468case the one in Xerox PCR.
469<LI>
470(<TT>DEFAULT_VDB</tt>) By treating all pages as dirty. This is the default if
471none of the other techniques is known to be usable, and
472<TT>GC_malloc_stubborn</tt> is not used. Practical only for testing, or if
473the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
474</ol>
475
476<H2>Black-listing</h2>
477
478The collector implements <I>black-listing</i> of pages, as described
479in
480<A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/">
481Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
482<A HREF="papers/pldi93.ps.Z">here</a>.
483<P>
484During the mark phase, the collector tracks ``near misses'', i.e. attempts
485to follow a ``pointer'' to just outside the garbage-collected heap, or
486to a currently unallocated page inside the heap. Pages that have been
487the targets of such near misses are likely to be the targets of
488misidentified ``pointers'' in the future. To minimize the future
489damage caused by such misidentifications they will be allocated only to
490small pointerfree objects.
491<P>
492The collector understands two different kinds of black-listing. A
493page may be black listed for interior pointer references
494(<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
495miss from a location that requires interior pointer recognition,
496<I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
497is set. In this case, we also avoid allocating large blocks that include
498this page.
499<P>
500If the near miss came from a source that did not require interior
501pointer recognition, it is black-listed with
502<TT>GC_add_to_black_list_normal</tt>.
503A page black-listed in this way may appear inside a large object,
504so long as it is not the first page of a large object.
505<P>
506The <TT>GC_allochblk</tt> routine respects black-listing when assigning
507a block to a particular object kind and size. It occasionally
508drops (i.e. allocates and forgets) blocks that are completely black-listed
509in order to avoid excessively long large block free lists containing
510only unusable blocks. This would otherwise become an issue
511if there is low demand for small pointerfree objects.
512
513<H2>Thread support</h2>
514We support several different threading models. Unfortunately Pthreads,
515the only reasonably well standardized thread model, supports too narrow
516an interface for conservative garbage collection. There appears to be
517no completely portable way to allow the collector
518to coexist with various Pthreads
519implementations. Hence we currently support only the more
520common Pthreads implementations.
521<P>
522In particular, it is very difficult for the collector to stop all other
523threads in the system and examine the register contents. This is currently
524accomplished with very different mechanisms for some Pthreads
525implementations. The Solaris implementation temporarily disables much
526of the user-level threads implementation by stopping kernel-level threads
527("lwp"s). The Linux/HPUX/OSF1 and Irix implementations sends signals to
528individual Pthreads and has them wait in the signal handler.
529<P>
530The Linux and Irix implementations use
531only documented Pthreads calls, but rely on extensions to their semantics.
532The Linux implementation <TT>linux_threads.c</tt> relies on only very
533mild extensions to the pthreads semantics, and already supports a large number
534of other Unix-like pthreads implementations. Our goal is to make this the
535only pthread support in the collector.
536<P>
537(The Irix implementation is separate only for historical reasons and should
538clearly be merged. The current Solaris implementation probably performs
539better in the uniprocessor case, but does not support thread operations in the
540collector. Hence it cannot support the parallel marker.)
541<P>
542All implementations must
543intercept thread creation and a few other thread-specific calls to allow
544enumeration of threads and location of thread stacks. This is current
545accomplished with <TT># define</tt>'s in <TT>gc.h</tt>
546(really <TT>gc_pthread_redirects.h</tt>), or optionally
547by using ld's function call wrapping mechanism under Linux.
548<P>
549Recent versions of the collector support several facilites to enhance
550the processor-scalability and thread performance of the collector.
551These are discussed in more detail <A HREF="scale.html">here</a>.
552We briefly outline the data approach to thread-local allocation in the
553next section.
554<H2>Thread-local allocation</h2>
555If thread-local allocation is enabled, the collector keeps separate
556arrays of free lists for each thread. Thread-local allocation
557is currently only supported on a few platforms.
558<P>
559The free list arrays associated
560with each thread are only used to satisfy requests for objects that
561are both very small, and belong to one of a small number of well-known
562kinds. These currently include "normal" and pointer-free objects.
563Depending onthe configuration, "gcj" objects may also be included.
564<P>
565Thread-local free list entries contain either a pointer to the first
566element of a free list, or they contain a counter of the number of
567allocation "granules" allocated so far. Initially they contain the
568value one, i.e. a small counter value.
569<P>
570Thread-local allocation allocates directly through the global
571allocator, if the object is of a size or kind not covered by the
572local free lists.
573<P>
574If there is an appropriate local free list, the allocator checks whether it
575contains a sufficiently small counter value. If so, the counter is simply
576incremented by the counter value, and the global allocator is used.
577In this way, the initial few allocations of a given size bypass the local
578allocator. A thread that only allocates a handful of objects of a given
579size will not build up its own free list for that size. This avoids
580wasting space for unpopular objects sizes or kinds.
581<P>
582Once the counter passes a threshold, <TT>GC_malloc_many</tt> is called
583to allocate roughly <TT>HBLKSIZE</tt> space and put it on the corresponding
584local free list. Further allocations of that size and kind then use
585this free list, and no longer need to acquire the allocation lock.
586The allocation procedure is otherwise similar to the global free lists.
587The local free lists are also linked using the first word in the object.
588In most cases this means they require considerably less time.
589<P>
590Local free lists are treated buy most of the rest of the collector
591as though they were in-use reachable data. This requires some care,
592since pointer-free objects are not normally traced, and hence a special
593tracing procedure is required to mark all objects on pointer-free and
594gcj local free lists.
595<P>
596On thread exit, any remaining thread-local free list entries are
597transferred back to the global free list.
598<P>
599Note that if the collector is configured for thread-local allocation,
600GC versions before 7 do not invoke the thread-local allocator by default.
601<TT>GC_malloc</tt> only uses thread-local allocation in version 7 and later.
602In earlier versions, <TT>GC_MALLOC</tt> (all caps) may be directed
603to use thread-local allocation by defining <TT>GC_REDIRECT_TO_LOCAL</tt>
604and then include <TT>gc_local_alloc.h</tt>.
605<P>
606For some more details see <A HREF="scale.html">here</a>, and the
607technical report entitled
608<A HREF="http://www.hpl.hp.com/techreports/2000/HPL-2000-165.html">
609``Fast Multiprocessor Memory Allocation and Garbage Collection''
610</a>
611<P>
612<HR>
613<P>
614Comments are appreciated. Please send mail to
615<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or
616<A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>
617<P>
618This is a modified copy of a page written while the author was at SGI.
619The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
620</body>
621</html>