]>
Commit | Line | Data |
---|---|---|
970d7e83 LB |
1 | ===================================== |
2 | Accurate Garbage Collection with LLVM | |
3 | ===================================== | |
4 | ||
5 | .. contents:: | |
6 | :local: | |
7 | ||
8 | Introduction | |
9 | ============ | |
10 | ||
11 | Garbage collection is a widely used technique that frees the programmer from | |
12 | having to know the lifetimes of heap objects, making software easier to produce | |
13 | and maintain. Many programming languages rely on garbage collection for | |
14 | automatic memory management. There are two primary forms of garbage collection: | |
15 | conservative and accurate. | |
16 | ||
17 | Conservative garbage collection often does not require any special support from | |
18 | either the language or the compiler: it can handle non-type-safe programming | |
19 | languages (such as C/C++) and does not require any special information from the | |
20 | compiler. The `Boehm collector | |
21 | <http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a | |
22 | state-of-the-art conservative collector. | |
23 | ||
24 | Accurate garbage collection requires the ability to identify all pointers in the | |
25 | program at run-time (which requires that the source-language be type-safe in | |
26 | most cases). Identifying pointers at run-time requires compiler support to | |
27 | locate all places that hold live pointer variables at run-time, including the | |
28 | :ref:`processor stack and registers <gcroot>`. | |
29 | ||
30 | Conservative garbage collection is attractive because it does not require any | |
31 | special compiler support, but it does have problems. In particular, because the | |
32 | conservative garbage collector cannot *know* that a particular word in the | |
33 | machine is a pointer, it cannot move live objects in the heap (preventing the | |
34 | use of compacting and generational GC algorithms) and it can occasionally suffer | |
35 | from memory leaks due to integer values that happen to point to objects in the | |
36 | program. In addition, some aggressive compiler transformations can break | |
37 | conservative garbage collectors (though these seem rare in practice). | |
38 | ||
39 | Accurate garbage collectors do not suffer from any of these problems, but they | |
40 | can suffer from degraded scalar optimization of the program. In particular, | |
41 | because the runtime must be able to identify and update all pointers active in | |
42 | the program, some optimizations are less effective. In practice, however, the | |
43 | locality and performance benefits of using aggressive garbage collection | |
44 | techniques dominates any low-level losses. | |
45 | ||
46 | This document describes the mechanisms and interfaces provided by LLVM to | |
47 | support accurate garbage collection. | |
48 | ||
49 | Goals and non-goals | |
50 | ------------------- | |
51 | ||
52 | LLVM's intermediate representation provides :ref:`garbage collection intrinsics | |
53 | <gc_intrinsics>` that offer support for a broad class of collector models. For | |
54 | instance, the intrinsics permit: | |
55 | ||
56 | * semi-space collectors | |
57 | ||
58 | * mark-sweep collectors | |
59 | ||
60 | * generational collectors | |
61 | ||
62 | * reference counting | |
63 | ||
64 | * incremental collectors | |
65 | ||
66 | * concurrent collectors | |
67 | ||
68 | * cooperative collectors | |
69 | ||
70 | We hope that the primitive support built into the LLVM IR is sufficient to | |
71 | support a broad class of garbage collected languages including Scheme, ML, Java, | |
72 | C#, Perl, Python, Lua, Ruby, other scripting languages, and more. | |
73 | ||
74 | However, LLVM does not itself provide a garbage collector --- this should be | |
75 | part of your language's runtime library. LLVM provides a framework for compile | |
76 | time :ref:`code generation plugins <plugin>`. The role of these plugins is to | |
77 | generate code and data structures which conforms to the *binary interface* | |
78 | specified by the *runtime library*. This is similar to the relationship between | |
79 | LLVM and DWARF debugging info, for example. The difference primarily lies in | |
80 | the lack of an established standard in the domain of garbage collection --- thus | |
81 | the plugins. | |
82 | ||
83 | The aspects of the binary interface with which LLVM's GC support is | |
84 | concerned are: | |
85 | ||
86 | * Creation of GC-safe points within code where collection is allowed to execute | |
87 | safely. | |
88 | ||
89 | * Computation of the stack map. For each safe point in the code, object | |
90 | references within the stack frame must be identified so that the collector may | |
91 | traverse and perhaps update them. | |
92 | ||
93 | * Write barriers when storing object references to the heap. These are commonly | |
94 | used to optimize incremental scans in generational collectors. | |
95 | ||
96 | * Emission of read barriers when loading object references. These are useful | |
97 | for interoperating with concurrent collectors. | |
98 | ||
99 | There are additional areas that LLVM does not directly address: | |
100 | ||
101 | * Registration of global roots with the runtime. | |
102 | ||
103 | * Registration of stack map entries with the runtime. | |
104 | ||
105 | * The functions used by the program to allocate memory, trigger a collection, | |
106 | etc. | |
107 | ||
108 | * Computation or compilation of type maps, or registration of them with the | |
109 | runtime. These are used to crawl the heap for object references. | |
110 | ||
111 | In general, LLVM's support for GC does not include features which can be | |
112 | adequately addressed with other features of the IR and does not specify a | |
113 | particular binary interface. On the plus side, this means that you should be | |
114 | able to integrate LLVM with an existing runtime. On the other hand, it leaves a | |
115 | lot of work for the developer of a novel language. However, it's easy to get | |
116 | started quickly and scale up to a more sophisticated implementation as your | |
117 | compiler matures. | |
118 | ||
119 | Getting started | |
120 | =============== | |
121 | ||
122 | Using a GC with LLVM implies many things, for example: | |
123 | ||
124 | * Write a runtime library or find an existing one which implements a GC heap. | |
125 | ||
126 | #. Implement a memory allocator. | |
127 | ||
128 | #. Design a binary interface for the stack map, used to identify references | |
129 | within a stack frame on the machine stack.\* | |
130 | ||
131 | #. Implement a stack crawler to discover functions on the call stack.\* | |
132 | ||
133 | #. Implement a registry for global roots. | |
134 | ||
135 | #. Design a binary interface for type maps, used to identify references | |
136 | within heap objects. | |
137 | ||
138 | #. Implement a collection routine bringing together all of the above. | |
139 | ||
140 | * Emit compatible code from your compiler. | |
141 | ||
142 | * Initialization in the main function. | |
143 | ||
144 | * Use the ``gc "..."`` attribute to enable GC code generation (or | |
145 | ``F.setGC("...")``). | |
146 | ||
147 | * Use ``@llvm.gcroot`` to mark stack roots. | |
148 | ||
149 | * Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` to manipulate GC references, | |
150 | if necessary. | |
151 | ||
152 | * Allocate memory using the GC allocation routine provided by the runtime | |
153 | library. | |
154 | ||
155 | * Generate type maps according to your runtime's binary interface. | |
156 | ||
157 | * Write a compiler plugin to interface LLVM with the runtime library.\* | |
158 | ||
159 | * Lower ``@llvm.gcread`` and ``@llvm.gcwrite`` to appropriate code | |
160 | sequences.\* | |
161 | ||
162 | * Compile LLVM's stack map to the binary form expected by the runtime. | |
163 | ||
164 | * Load the plugin into the compiler. Use ``llc -load`` or link the plugin | |
165 | statically with your language's compiler.\* | |
166 | ||
167 | * Link program executables with the runtime. | |
168 | ||
169 | To help with several of these tasks (those indicated with a \*), LLVM includes a | |
170 | highly portable, built-in ShadowStack code generator. It is compiled into | |
171 | ``llc`` and works even with the interpreter and C backends. | |
172 | ||
173 | In your compiler | |
174 | ---------------- | |
175 | ||
176 | To turn the shadow stack on for your functions, first call: | |
177 | ||
178 | .. code-block:: c++ | |
179 | ||
180 | F.setGC("shadow-stack"); | |
181 | ||
182 | for each function your compiler emits. Since the shadow stack is built into | |
183 | LLVM, you do not need to load a plugin. | |
184 | ||
185 | Your compiler must also use ``@llvm.gcroot`` as documented. Don't forget to | |
186 | create a root for each intermediate value that is generated when evaluating an | |
187 | expression. In ``h(f(), g())``, the result of ``f()`` could easily be collected | |
188 | if evaluating ``g()`` triggers a collection. | |
189 | ||
190 | There's no need to use ``@llvm.gcread`` and ``@llvm.gcwrite`` over plain | |
191 | ``load`` and ``store`` for now. You will need them when switching to a more | |
192 | advanced GC. | |
193 | ||
194 | In your runtime | |
195 | --------------- | |
196 | ||
197 | The shadow stack doesn't imply a memory allocation algorithm. A semispace | |
198 | collector or building atop ``malloc`` are great places to start, and can be | |
199 | implemented with very little code. | |
200 | ||
201 | When it comes time to collect, however, your runtime needs to traverse the stack | |
202 | roots, and for this it needs to integrate with the shadow stack. Luckily, doing | |
203 | so is very simple. (This code is heavily commented to help you understand the | |
204 | data structure, but there are only 20 lines of meaningful code.) | |
205 | ||
206 | .. code-block:: c++ | |
207 | ||
208 | /// @brief The map for a single function's stack frame. One of these is | |
209 | /// compiled as constant data into the executable for each function. | |
210 | /// | |
211 | /// Storage of metadata values is elided if the %metadata parameter to | |
212 | /// @llvm.gcroot is null. | |
213 | struct FrameMap { | |
214 | int32_t NumRoots; //< Number of roots in stack frame. | |
215 | int32_t NumMeta; //< Number of metadata entries. May be < NumRoots. | |
216 | const void *Meta[0]; //< Metadata for each root. | |
217 | }; | |
218 | ||
219 | /// @brief A link in the dynamic shadow stack. One of these is embedded in | |
220 | /// the stack frame of each function on the call stack. | |
221 | struct StackEntry { | |
222 | StackEntry *Next; //< Link to next stack entry (the caller's). | |
223 | const FrameMap *Map; //< Pointer to constant FrameMap. | |
224 | void *Roots[0]; //< Stack roots (in-place array). | |
225 | }; | |
226 | ||
227 | /// @brief The head of the singly-linked list of StackEntries. Functions push | |
228 | /// and pop onto this in their prologue and epilogue. | |
229 | /// | |
230 | /// Since there is only a global list, this technique is not threadsafe. | |
231 | StackEntry *llvm_gc_root_chain; | |
232 | ||
233 | /// @brief Calls Visitor(root, meta) for each GC root on the stack. | |
234 | /// root and meta are exactly the values passed to | |
235 | /// @llvm.gcroot. | |
236 | /// | |
237 | /// Visitor could be a function to recursively mark live objects. Or it | |
238 | /// might copy them to another heap or generation. | |
239 | /// | |
240 | /// @param Visitor A function to invoke for every GC root on the stack. | |
241 | void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) { | |
242 | for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) { | |
243 | unsigned i = 0; | |
244 | ||
245 | // For roots [0, NumMeta), the metadata pointer is in the FrameMap. | |
246 | for (unsigned e = R->Map->NumMeta; i != e; ++i) | |
247 | Visitor(&R->Roots[i], R->Map->Meta[i]); | |
248 | ||
249 | // For roots [NumMeta, NumRoots), the metadata pointer is null. | |
250 | for (unsigned e = R->Map->NumRoots; i != e; ++i) | |
251 | Visitor(&R->Roots[i], NULL); | |
252 | } | |
253 | } | |
254 | ||
255 | About the shadow stack | |
256 | ---------------------- | |
257 | ||
258 | Unlike many GC algorithms which rely on a cooperative code generator to compile | |
259 | stack maps, this algorithm carefully maintains a linked list of stack roots | |
260 | [:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the | |
261 | machine stack. Maintaining this data structure is slower than using a stack map | |
262 | compiled into the executable as constant data, but has a significant portability | |
263 | advantage because it requires no special support from the target code generator, | |
264 | and does not require tricky platform-specific code to crawl the machine stack. | |
265 | ||
266 | The tradeoff for this simplicity and portability is: | |
267 | ||
268 | * High overhead per function call. | |
269 | ||
270 | * Not thread-safe. | |
271 | ||
272 | Still, it's an easy way to get started. After your compiler and runtime are up | |
273 | and running, writing a :ref:`plugin <plugin>` will allow you to take advantage | |
274 | of :ref:`more advanced GC features <collector-algos>` of LLVM in order to | |
275 | improve performance. | |
276 | ||
277 | .. _gc_intrinsics: | |
278 | ||
279 | IR features | |
280 | =========== | |
281 | ||
282 | This section describes the garbage collection facilities provided by the | |
283 | :doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these | |
284 | IR features is specified by the binary interface implemented by a :ref:`code | |
285 | generation plugin <plugin>`, not by this document. | |
286 | ||
287 | These facilities are limited to those strictly necessary; they are not intended | |
288 | to be a complete interface to any garbage collector. A program will need to | |
289 | interface with the GC library using the facilities provided by that program. | |
290 | ||
291 | Specifying GC code generation: ``gc "..."`` | |
292 | ------------------------------------------- | |
293 | ||
294 | .. code-block:: llvm | |
295 | ||
296 | define ty @name(...) gc "name" { ... | |
297 | ||
298 | The ``gc`` function attribute is used to specify the desired GC style to the | |
299 | compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``. | |
300 | ||
301 | Setting ``gc "name"`` on a function triggers a search for a matching code | |
302 | generation plugin "*name*"; it is that plugin which defines the exact nature of | |
303 | the code generated to support GC. If none is found, the compiler will raise an | |
304 | error. | |
305 | ||
306 | Specifying the GC style on a per-function basis allows LLVM to link together | |
307 | programs that use different garbage collection algorithms (or none at all). | |
308 | ||
309 | .. _gcroot: | |
310 | ||
311 | Identifying GC roots on the stack: ``llvm.gcroot`` | |
312 | -------------------------------------------------- | |
313 | ||
314 | .. code-block:: llvm | |
315 | ||
316 | void @llvm.gcroot(i8** %ptrloc, i8* %metadata) | |
317 | ||
318 | The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable | |
319 | references an object on the heap and is to be tracked for garbage collection. | |
320 | The exact impact on generated code is specified by a :ref:`compiler plugin | |
321 | <plugin>`. All calls to ``llvm.gcroot`` **must** reside inside the first basic | |
322 | block. | |
323 | ||
324 | A compiler which uses mem2reg to raise imperative code using ``alloca`` into SSA | |
325 | form need only add a call to ``@llvm.gcroot`` for those variables which a | |
326 | pointers into the GC heap. | |
327 | ||
328 | It is also important to mark intermediate values with ``llvm.gcroot``. For | |
329 | example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the | |
330 | case that ``g()`` triggers a collection. Note, that stack variables must be | |
331 | initialized and marked with ``llvm.gcroot`` in function's prologue. | |
332 | ||
333 | The first argument **must** be a value referring to an alloca instruction or a | |
334 | bitcast of an alloca. The second contains a pointer to metadata that should be | |
335 | associated with the pointer, and **must** be a constant or global value | |
336 | address. If your target collector uses tags, use a null pointer for metadata. | |
337 | ||
338 | The ``%metadata`` argument can be used to avoid requiring heap objects to have | |
339 | 'isa' pointers or tag bits. [Appel89_, Goldberg91_, Tolmach94_] If specified, | |
340 | its value will be tracked along with the location of the pointer in the stack | |
341 | frame. | |
342 | ||
343 | Consider the following fragment of Java code: | |
344 | ||
345 | .. code-block:: java | |
346 | ||
347 | { | |
348 | Object X; // A null-initialized reference to an object | |
349 | ... | |
350 | } | |
351 | ||
352 | This block (which may be located in the middle of a function or in a loop nest), | |
353 | could be compiled to this LLVM code: | |
354 | ||
355 | .. code-block:: llvm | |
356 | ||
357 | Entry: | |
358 | ;; In the entry block for the function, allocate the | |
359 | ;; stack space for X, which is an LLVM pointer. | |
360 | %X = alloca %Object* | |
361 | ||
362 | ;; Tell LLVM that the stack space is a stack root. | |
363 | ;; Java has type-tags on objects, so we pass null as metadata. | |
364 | %tmp = bitcast %Object** %X to i8** | |
365 | call void @llvm.gcroot(i8** %tmp, i8* null) | |
366 | ... | |
367 | ||
368 | ;; "CodeBlock" is the block corresponding to the start | |
369 | ;; of the scope above. | |
370 | CodeBlock: | |
371 | ;; Java null-initializes pointers. | |
372 | store %Object* null, %Object** %X | |
373 | ||
374 | ... | |
375 | ||
376 | ;; As the pointer goes out of scope, store a null value into | |
377 | ;; it, to indicate that the value is no longer live. | |
378 | store %Object* null, %Object** %X | |
379 | ... | |
380 | ||
381 | Reading and writing references in the heap | |
382 | ------------------------------------------ | |
383 | ||
384 | Some collectors need to be informed when the mutator (the program that needs | |
385 | garbage collection) either reads a pointer from or writes a pointer to a field | |
386 | of a heap object. The code fragments inserted at these points are called *read | |
387 | barriers* and *write barriers*, respectively. The amount of code that needs to | |
388 | be executed is usually quite small and not on the critical path of any | |
389 | computation, so the overall performance impact of the barrier is tolerable. | |
390 | ||
391 | Barriers often require access to the *object pointer* rather than the *derived | |
392 | pointer* (which is a pointer to the field within the object). Accordingly, | |
393 | these intrinsics take both pointers as separate arguments for completeness. In | |
394 | this snippet, ``%object`` is the object pointer, and ``%derived`` is the derived | |
395 | pointer: | |
396 | ||
397 | .. code-block:: llvm | |
398 | ||
399 | ;; An array type. | |
400 | %class.Array = type { %class.Object, i32, [0 x %class.Object*] } | |
401 | ... | |
402 | ||
403 | ;; Load the object pointer from a gcroot. | |
404 | %object = load %class.Array** %object_addr | |
405 | ||
406 | ;; Compute the derived pointer. | |
407 | %derived = getelementptr %object, i32 0, i32 2, i32 %n | |
408 | ||
409 | LLVM does not enforce this relationship between the object and derived pointer | |
410 | (although a :ref:`plugin <plugin>` might). However, it would be an unusual | |
411 | collector that violated it. | |
412 | ||
413 | The use of these intrinsics is naturally optional if the target GC does require | |
414 | the corresponding barrier. Such a GC plugin will replace the intrinsic calls | |
415 | with the corresponding ``load`` or ``store`` instruction if they are used. | |
416 | ||
417 | Write barrier: ``llvm.gcwrite`` | |
418 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
419 | ||
420 | .. code-block:: llvm | |
421 | ||
422 | void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived) | |
423 | ||
424 | For write barriers, LLVM provides the ``llvm.gcwrite`` intrinsic function. It | |
425 | has exactly the same semantics as a non-volatile ``store`` to the derived | |
426 | pointer (the third argument). The exact code generated is specified by a | |
427 | compiler :ref:`plugin <plugin>`. | |
428 | ||
429 | Many important algorithms require write barriers, including generational and | |
430 | concurrent collectors. Additionally, write barriers could be used to implement | |
431 | reference counting. | |
432 | ||
433 | Read barrier: ``llvm.gcread`` | |
434 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
435 | ||
436 | .. code-block:: llvm | |
437 | ||
438 | i8* @llvm.gcread(i8* %object, i8** %derived) | |
439 | ||
440 | For read barriers, LLVM provides the ``llvm.gcread`` intrinsic function. It has | |
441 | exactly the same semantics as a non-volatile ``load`` from the derived pointer | |
442 | (the second argument). The exact code generated is specified by a | |
443 | :ref:`compiler plugin <plugin>`. | |
444 | ||
445 | Read barriers are needed by fewer algorithms than write barriers, and may have a | |
446 | greater performance impact since pointer reads are more frequent than writes. | |
447 | ||
448 | .. _plugin: | |
449 | ||
450 | Implementing a collector plugin | |
451 | =============================== | |
452 | ||
453 | User code specifies which GC code generation to use with the ``gc`` function | |
454 | attribute or, equivalently, with the ``setGC`` method of ``Function``. | |
455 | ||
456 | To implement a GC plugin, it is necessary to subclass ``llvm::GCStrategy``, | |
457 | which can be accomplished in a few lines of boilerplate code. LLVM's | |
458 | infrastructure provides access to several important algorithms. For an | |
459 | uncontroversial collector, all that remains may be to compile LLVM's computed | |
460 | stack map to assembly code (using the binary representation expected by the | |
461 | runtime library). This can be accomplished in about 100 lines of code. | |
462 | ||
463 | This is not the appropriate place to implement a garbage collected heap or a | |
464 | garbage collector itself. That code should exist in the language's runtime | |
465 | library. The compiler plugin is responsible for generating code which conforms | |
466 | to the binary interface defined by library, most essentially the :ref:`stack map | |
467 | <stack-map>`. | |
468 | ||
469 | To subclass ``llvm::GCStrategy`` and register it with the compiler: | |
470 | ||
471 | .. code-block:: c++ | |
472 | ||
473 | // lib/MyGC/MyGC.cpp - Example LLVM GC plugin | |
474 | ||
475 | #include "llvm/CodeGen/GCStrategy.h" | |
476 | #include "llvm/CodeGen/GCMetadata.h" | |
477 | #include "llvm/Support/Compiler.h" | |
478 | ||
479 | using namespace llvm; | |
480 | ||
481 | namespace { | |
482 | class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy { | |
483 | public: | |
484 | MyGC() {} | |
485 | }; | |
486 | ||
487 | GCRegistry::Add<MyGC> | |
488 | X("mygc", "My bespoke garbage collector."); | |
489 | } | |
490 | ||
491 | This boilerplate collector does nothing. More specifically: | |
492 | ||
493 | * ``llvm.gcread`` calls are replaced with the corresponding ``load`` | |
494 | instruction. | |
495 | ||
496 | * ``llvm.gcwrite`` calls are replaced with the corresponding ``store`` | |
497 | instruction. | |
498 | ||
499 | * No safe points are added to the code. | |
500 | ||
501 | * The stack map is not compiled into the executable. | |
502 | ||
1a4d82fc | 503 | Using the LLVM makefiles, this code |
970d7e83 LB |
504 | can be compiled as a plugin using a simple makefile: |
505 | ||
506 | .. code-block:: make | |
507 | ||
508 | # lib/MyGC/Makefile | |
509 | ||
510 | LEVEL := ../.. | |
511 | LIBRARYNAME = MyGC | |
512 | LOADABLE_MODULE = 1 | |
513 | ||
514 | include $(LEVEL)/Makefile.common | |
515 | ||
516 | Once the plugin is compiled, code using it may be compiled using ``llc | |
517 | -load=MyGC.so`` (though MyGC.so may have some other platform-specific | |
518 | extension): | |
519 | ||
520 | :: | |
521 | ||
522 | $ cat sample.ll | |
523 | define void @f() gc "mygc" { | |
524 | entry: | |
1a4d82fc | 525 | ret void |
970d7e83 LB |
526 | } |
527 | $ llvm-as < sample.ll | llc -load=MyGC.so | |
528 | ||
529 | It is also possible to statically link the collector plugin into tools, such as | |
530 | a language-specific compiler front-end. | |
531 | ||
532 | .. _collector-algos: | |
533 | ||
534 | Overview of available features | |
535 | ------------------------------ | |
536 | ||
537 | ``GCStrategy`` provides a range of features through which a plugin may do useful | |
538 | work. Some of these are callbacks, some are algorithms that can be enabled, | |
539 | disabled, or customized. This matrix summarizes the supported (and planned) | |
540 | features and correlates them with the collection techniques which typically | |
541 | require them. | |
542 | ||
543 | .. |v| unicode:: 0x2714 | |
544 | :trim: | |
545 | ||
546 | .. |x| unicode:: 0x2718 | |
547 | :trim: | |
548 | ||
549 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
550 | | Algorithm | Done | Shadow | refcount | mark- | copying | incremental | threaded | concurrent | | |
551 | | | | stack | | sweep | | | | | | |
552 | +============+======+========+==========+=======+=========+=============+==========+============+ | |
553 | | stack map | |v| | | | |x| | |x| | |x| | |x| | |x| | | |
554 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
555 | | initialize | |v| | |x| | |x| | |x| | |x| | |x| | |x| | |x| | | |
556 | | roots | | | | | | | | | | |
557 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
558 | | derived | NO | | | | | | **N**\* | **N**\* | | |
559 | | pointers | | | | | | | | | | |
560 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
561 | | **custom | |v| | | | | | | | | | |
562 | | lowering** | | | | | | | | | | |
563 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
564 | | *gcroot* | |v| | |x| | |x| | | | | | | | |
565 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
566 | | *gcwrite* | |v| | | |x| | | | |x| | | |x| | | |
567 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
568 | | *gcread* | |v| | | | | | | | |x| | | |
569 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
570 | | **safe | | | | | | | | | | |
571 | | points** | | | | | | | | | | |
572 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
573 | | *in | |v| | | | |x| | |x| | |x| | |x| | |x| | | |
574 | | calls* | | | | | | | | | | |
575 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
576 | | *before | |v| | | | | | | |x| | |x| | | |
577 | | calls* | | | | | | | | | | |
578 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
579 | | *for | NO | | | | | | **N** | **N** | | |
580 | | loops* | | | | | | | | | | |
581 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
582 | | *before | |v| | | | | | | |x| | |x| | | |
583 | | escape* | | | | | | | | | | |
584 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
585 | | emit code | NO | | | | | | **N** | **N** | | |
586 | | at safe | | | | | | | | | | |
587 | | points | | | | | | | | | | |
588 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
589 | | **output** | | | | | | | | | | |
590 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
591 | | *assembly* | |v| | | | |x| | |x| | |x| | |x| | |x| | | |
592 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
593 | | *JIT* | NO | | | **?** | **?** | **?** | **?** | **?** | | |
594 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
595 | | *obj* | NO | | | **?** | **?** | **?** | **?** | **?** | | |
596 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
597 | | live | NO | | | **?** | **?** | **?** | **?** | **?** | | |
598 | | analysis | | | | | | | | | | |
599 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
600 | | register | NO | | | **?** | **?** | **?** | **?** | **?** | | |
601 | | map | | | | | | | | | | |
602 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
603 | | \* Derived pointers only pose a hasard to copying collections. | | |
604 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
605 | | **?** denotes a feature which could be utilized if available. | | |
606 | +------------+------+--------+----------+-------+---------+-------------+----------+------------+ | |
607 | ||
608 | To be clear, the collection techniques above are defined as: | |
609 | ||
610 | Shadow Stack | |
611 | The mutator carefully maintains a linked list of stack roots. | |
612 | ||
613 | Reference Counting | |
614 | The mutator maintains a reference count for each object and frees an object | |
615 | when its count falls to zero. | |
616 | ||
617 | Mark-Sweep | |
618 | When the heap is exhausted, the collector marks reachable objects starting | |
619 | from the roots, then deallocates unreachable objects in a sweep phase. | |
620 | ||
621 | Copying | |
622 | As reachability analysis proceeds, the collector copies objects from one heap | |
623 | area to another, compacting them in the process. Copying collectors enable | |
624 | highly efficient "bump pointer" allocation and can improve locality of | |
625 | reference. | |
626 | ||
627 | Incremental | |
628 | (Including generational collectors.) Incremental collectors generally have all | |
629 | the properties of a copying collector (regardless of whether the mature heap | |
630 | is compacting), but bring the added complexity of requiring write barriers. | |
631 | ||
632 | Threaded | |
633 | Denotes a multithreaded mutator; the collector must still stop the mutator | |
634 | ("stop the world") before beginning reachability analysis. Stopping a | |
635 | multithreaded mutator is a complicated problem. It generally requires highly | |
1a4d82fc | 636 | platform-specific code in the runtime, and the production of carefully |
970d7e83 LB |
637 | designed machine code at safe points. |
638 | ||
639 | Concurrent | |
640 | In this technique, the mutator and the collector run concurrently, with the | |
641 | goal of eliminating pause times. In a *cooperative* collector, the mutator | |
642 | further aids with collection should a pause occur, allowing collection to take | |
643 | advantage of multiprocessor hosts. The "stop the world" problem of threaded | |
644 | collectors is generally still present to a limited extent. Sophisticated | |
645 | marking algorithms are necessary. Read barriers may be necessary. | |
646 | ||
647 | As the matrix indicates, LLVM's garbage collection infrastructure is already | |
648 | suitable for a wide variety of collectors, but does not currently extend to | |
649 | multithreaded programs. This will be added in the future as there is | |
650 | interest. | |
651 | ||
652 | .. _stack-map: | |
653 | ||
654 | Computing stack maps | |
655 | -------------------- | |
656 | ||
657 | LLVM automatically computes a stack map. One of the most important features | |
658 | of a ``GCStrategy`` is to compile this information into the executable in | |
659 | the binary representation expected by the runtime library. | |
660 | ||
661 | The stack map consists of the location and identity of each GC root in the | |
662 | each function in the module. For each root: | |
663 | ||
664 | * ``RootNum``: The index of the root. | |
665 | ||
666 | * ``StackOffset``: The offset of the object relative to the frame pointer. | |
667 | ||
668 | * ``RootMetadata``: The value passed as the ``%metadata`` parameter to the | |
669 | ``@llvm.gcroot`` intrinsic. | |
670 | ||
671 | Also, for the function as a whole: | |
672 | ||
673 | * ``getFrameSize()``: The overall size of the function's initial stack frame, | |
674 | not accounting for any dynamic allocation. | |
675 | ||
676 | * ``roots_size()``: The count of roots in the function. | |
677 | ||
678 | To access the stack map, use ``GCFunctionMetadata::roots_begin()`` and | |
679 | -``end()`` from the :ref:`GCMetadataPrinter <assembly>`: | |
680 | ||
681 | .. code-block:: c++ | |
682 | ||
683 | for (iterator I = begin(), E = end(); I != E; ++I) { | |
684 | GCFunctionInfo *FI = *I; | |
685 | unsigned FrameSize = FI->getFrameSize(); | |
686 | size_t RootCount = FI->roots_size(); | |
687 | ||
688 | for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(), | |
689 | RE = FI->roots_end(); | |
690 | RI != RE; ++RI) { | |
691 | int RootNum = RI->Num; | |
692 | int RootStackOffset = RI->StackOffset; | |
693 | Constant *RootMetadata = RI->Metadata; | |
694 | } | |
695 | } | |
696 | ||
697 | If the ``llvm.gcroot`` intrinsic is eliminated before code generation by a | |
698 | custom lowering pass, LLVM will compute an empty stack map. This may be useful | |
699 | for collector plugins which implement reference counting or a shadow stack. | |
700 | ||
701 | .. _init-roots: | |
702 | ||
703 | Initializing roots to null: ``InitRoots`` | |
704 | ----------------------------------------- | |
705 | ||
706 | .. code-block:: c++ | |
707 | ||
708 | MyGC::MyGC() { | |
709 | InitRoots = true; | |
710 | } | |
711 | ||
712 | When set, LLVM will automatically initialize each root to ``null`` upon entry to | |
713 | the function. This prevents the GC's sweep phase from visiting uninitialized | |
714 | pointers, which will almost certainly cause it to crash. This initialization | |
715 | occurs before custom lowering, so the two may be used together. | |
716 | ||
717 | Since LLVM does not yet compute liveness information, there is no means of | |
718 | distinguishing an uninitialized stack root from an initialized one. Therefore, | |
719 | this feature should be used by all GC plugins. It is enabled by default. | |
720 | ||
721 | Custom lowering of intrinsics: ``CustomRoots``, ``CustomReadBarriers``, and ``CustomWriteBarriers`` | |
722 | --------------------------------------------------------------------------------------------------- | |
723 | ||
724 | For GCs which use barriers or unusual treatment of stack roots, these flags | |
725 | allow the collector to perform arbitrary transformations of the LLVM IR: | |
726 | ||
727 | .. code-block:: c++ | |
728 | ||
729 | class MyGC : public GCStrategy { | |
730 | public: | |
731 | MyGC() { | |
732 | CustomRoots = true; | |
733 | CustomReadBarriers = true; | |
734 | CustomWriteBarriers = true; | |
735 | } | |
736 | ||
737 | virtual bool initializeCustomLowering(Module &M); | |
738 | virtual bool performCustomLowering(Function &F); | |
739 | }; | |
740 | ||
741 | If any of these flags are set, then LLVM suppresses its default lowering for the | |
742 | corresponding intrinsics and instead calls ``performCustomLowering``. | |
743 | ||
744 | LLVM's default action for each intrinsic is as follows: | |
745 | ||
746 | * ``llvm.gcroot``: Leave it alone. The code generator must see it or the stack | |
747 | map will not be computed. | |
748 | ||
749 | * ``llvm.gcread``: Substitute a ``load`` instruction. | |
750 | ||
751 | * ``llvm.gcwrite``: Substitute a ``store`` instruction. | |
752 | ||
753 | If ``CustomReadBarriers`` or ``CustomWriteBarriers`` are specified, then | |
754 | ``performCustomLowering`` **must** eliminate the corresponding barriers. | |
755 | ||
756 | ``performCustomLowering`` must comply with the same restrictions as | |
757 | :ref:`FunctionPass::runOnFunction <writing-an-llvm-pass-runOnFunction>` | |
758 | Likewise, ``initializeCustomLowering`` has the same semantics as | |
759 | :ref:`Pass::doInitialization(Module&) | |
760 | <writing-an-llvm-pass-doInitialization-mod>` | |
761 | ||
762 | The following can be used as a template: | |
763 | ||
764 | .. code-block:: c++ | |
765 | ||
1a4d82fc JJ |
766 | #include "llvm/IR/Module.h" |
767 | #include "llvm/IR/IntrinsicInst.h" | |
970d7e83 LB |
768 | |
769 | bool MyGC::initializeCustomLowering(Module &M) { | |
770 | return false; | |
771 | } | |
772 | ||
773 | bool MyGC::performCustomLowering(Function &F) { | |
774 | bool MadeChange = false; | |
775 | ||
776 | for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) | |
777 | for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ) | |
778 | if (IntrinsicInst *CI = dyn_cast<IntrinsicInst>(II++)) | |
779 | if (Function *F = CI->getCalledFunction()) | |
780 | switch (F->getIntrinsicID()) { | |
781 | case Intrinsic::gcwrite: | |
782 | // Handle llvm.gcwrite. | |
783 | CI->eraseFromParent(); | |
784 | MadeChange = true; | |
785 | break; | |
786 | case Intrinsic::gcread: | |
787 | // Handle llvm.gcread. | |
788 | CI->eraseFromParent(); | |
789 | MadeChange = true; | |
790 | break; | |
791 | case Intrinsic::gcroot: | |
792 | // Handle llvm.gcroot. | |
793 | CI->eraseFromParent(); | |
794 | MadeChange = true; | |
795 | break; | |
796 | } | |
797 | ||
798 | return MadeChange; | |
799 | } | |
800 | ||
801 | .. _safe-points: | |
802 | ||
803 | Generating safe points: ``NeededSafePoints`` | |
804 | -------------------------------------------- | |
805 | ||
806 | LLVM can compute four kinds of safe points: | |
807 | ||
808 | .. code-block:: c++ | |
809 | ||
810 | namespace GC { | |
811 | /// PointKind - The type of a collector-safe point. | |
812 | /// | |
813 | enum PointKind { | |
814 | Loop, //< Instr is a loop (backwards branch). | |
815 | Return, //< Instr is a return instruction. | |
816 | PreCall, //< Instr is a call instruction. | |
817 | PostCall //< Instr is the return address of a call. | |
818 | }; | |
819 | } | |
820 | ||
821 | A collector can request any combination of the four by setting the | |
822 | ``NeededSafePoints`` mask: | |
823 | ||
824 | .. code-block:: c++ | |
825 | ||
826 | MyGC::MyGC() { | |
827 | NeededSafePoints = 1 << GC::Loop | |
828 | | 1 << GC::Return | |
829 | | 1 << GC::PreCall | |
830 | | 1 << GC::PostCall; | |
831 | } | |
832 | ||
833 | It can then use the following routines to access safe points. | |
834 | ||
835 | .. code-block:: c++ | |
836 | ||
837 | for (iterator I = begin(), E = end(); I != E; ++I) { | |
838 | GCFunctionInfo *MD = *I; | |
839 | size_t PointCount = MD->size(); | |
840 | ||
841 | for (GCFunctionInfo::iterator PI = MD->begin(), | |
842 | PE = MD->end(); PI != PE; ++PI) { | |
843 | GC::PointKind PointKind = PI->Kind; | |
844 | unsigned PointNum = PI->Num; | |
845 | } | |
846 | } | |
847 | ||
848 | Almost every collector requires ``PostCall`` safe points, since these correspond | |
849 | to the moments when the function is suspended during a call to a subroutine. | |
850 | ||
851 | Threaded programs generally require ``Loop`` safe points to guarantee that the | |
852 | application will reach a safe point within a bounded amount of time, even if it | |
853 | is executing a long-running loop which contains no function calls. | |
854 | ||
855 | Threaded collectors may also require ``Return`` and ``PreCall`` safe points to | |
856 | implement "stop the world" techniques using self-modifying code, where it is | |
857 | important that the program not exit the function without reaching a safe point | |
858 | (because only the topmost function has been patched). | |
859 | ||
860 | .. _assembly: | |
861 | ||
862 | Emitting assembly code: ``GCMetadataPrinter`` | |
863 | --------------------------------------------- | |
864 | ||
865 | LLVM allows a plugin to print arbitrary assembly code before and after the rest | |
866 | of a module's assembly code. At the end of the module, the GC can compile the | |
867 | LLVM stack map into assembly code. (At the beginning, this information is not | |
868 | yet computed.) | |
869 | ||
870 | Since AsmWriter and CodeGen are separate components of LLVM, a separate abstract | |
871 | base class and registry is provided for printing assembly code, the | |
872 | ``GCMetadaPrinter`` and ``GCMetadataPrinterRegistry``. The AsmWriter will look | |
873 | for such a subclass if the ``GCStrategy`` sets ``UsesMetadata``: | |
874 | ||
875 | .. code-block:: c++ | |
876 | ||
877 | MyGC::MyGC() { | |
878 | UsesMetadata = true; | |
879 | } | |
880 | ||
881 | This separation allows JIT-only clients to be smaller. | |
882 | ||
883 | Note that LLVM does not currently have analogous APIs to support code generation | |
884 | in the JIT, nor using the object writers. | |
885 | ||
886 | .. code-block:: c++ | |
887 | ||
888 | // lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer | |
889 | ||
890 | #include "llvm/CodeGen/GCMetadataPrinter.h" | |
891 | #include "llvm/Support/Compiler.h" | |
892 | ||
893 | using namespace llvm; | |
894 | ||
895 | namespace { | |
896 | class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter { | |
897 | public: | |
1a4d82fc | 898 | virtual void beginAssembly(AsmPrinter &AP); |
970d7e83 | 899 | |
1a4d82fc | 900 | virtual void finishAssembly(AsmPrinter &AP); |
970d7e83 LB |
901 | }; |
902 | ||
903 | GCMetadataPrinterRegistry::Add<MyGCPrinter> | |
904 | X("mygc", "My bespoke garbage collector."); | |
905 | } | |
906 | ||
1a4d82fc JJ |
907 | The collector should use ``AsmPrinter`` to print portable assembly code. The |
908 | collector itself contains the stack map for the entire module, and may access | |
909 | the ``GCFunctionInfo`` using its own ``begin()`` and ``end()`` methods. Here's | |
910 | a realistic example: | |
970d7e83 LB |
911 | |
912 | .. code-block:: c++ | |
913 | ||
914 | #include "llvm/CodeGen/AsmPrinter.h" | |
1a4d82fc JJ |
915 | #include "llvm/IR/Function.h" |
916 | #include "llvm/IR/DataLayout.h" | |
970d7e83 | 917 | #include "llvm/Target/TargetAsmInfo.h" |
1a4d82fc | 918 | #include "llvm/Target/TargetMachine.h" |
970d7e83 | 919 | |
1a4d82fc | 920 | void MyGCPrinter::beginAssembly(AsmPrinter &AP) { |
970d7e83 LB |
921 | // Nothing to do. |
922 | } | |
923 | ||
1a4d82fc JJ |
924 | void MyGCPrinter::finishAssembly(AsmPrinter &AP) { |
925 | MCStreamer &OS = AP.OutStreamer; | |
926 | unsigned IntPtrSize = AP.TM.getSubtargetImpl()->getDataLayout()->getPointerSize(); | |
970d7e83 LB |
927 | |
928 | // Put this in the data section. | |
1a4d82fc | 929 | OS.SwitchSection(AP.getObjFileLowering().getDataSection()); |
970d7e83 LB |
930 | |
931 | // For each function... | |
932 | for (iterator FI = begin(), FE = end(); FI != FE; ++FI) { | |
933 | GCFunctionInfo &MD = **FI; | |
934 | ||
1a4d82fc | 935 | // A compact GC layout. Emit this data structure: |
970d7e83 LB |
936 | // |
937 | // struct { | |
938 | // int32_t PointCount; | |
1a4d82fc JJ |
939 | // void *SafePointAddress[PointCount]; |
940 | // int32_t StackFrameSize; // in words | |
941 | // int32_t StackArity; | |
942 | // int32_t LiveCount; | |
943 | // int32_t LiveOffsets[LiveCount]; | |
970d7e83 LB |
944 | // } __gcmap_<FUNCTIONNAME>; |
945 | ||
946 | // Align to address width. | |
1a4d82fc | 947 | AP.EmitAlignment(IntPtrSize == 4 ? 2 : 3); |
970d7e83 LB |
948 | |
949 | // Emit PointCount. | |
1a4d82fc | 950 | OS.AddComment("safe point count"); |
970d7e83 | 951 | AP.EmitInt32(MD.size()); |
970d7e83 LB |
952 | |
953 | // And each safe point... | |
954 | for (GCFunctionInfo::iterator PI = MD.begin(), | |
1a4d82fc | 955 | PE = MD.end(); PI != PE; ++PI) { |
970d7e83 | 956 | // Emit the address of the safe point. |
1a4d82fc JJ |
957 | OS.AddComment("safe point address"); |
958 | MCSymbol *Label = PI->Label; | |
959 | AP.EmitLabelPlusOffset(Label/*Hi*/, 0/*Offset*/, 4/*Size*/); | |
960 | } | |
961 | ||
962 | // Stack information never change in safe points! Only print info from the | |
963 | // first call-site. | |
964 | GCFunctionInfo::iterator PI = MD.begin(); | |
965 | ||
966 | // Emit the stack frame size. | |
967 | OS.AddComment("stack frame size (in words)"); | |
968 | AP.EmitInt32(MD.getFrameSize() / IntPtrSize); | |
969 | ||
970 | // Emit stack arity, i.e. the number of stacked arguments. | |
971 | unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6; | |
972 | unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ? | |
973 | MD.getFunction().arg_size() - RegisteredArgs : 0; | |
974 | OS.AddComment("stack arity"); | |
975 | AP.EmitInt32(StackArity); | |
976 | ||
977 | // Emit the number of live roots in the function. | |
978 | OS.AddComment("live root count"); | |
979 | AP.EmitInt32(MD.live_size(PI)); | |
980 | ||
981 | // And for each live root... | |
982 | for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI), | |
983 | LE = MD.live_end(PI); | |
984 | LI != LE; ++LI) { | |
985 | // Emit live root's offset within the stack frame. | |
986 | OS.AddComment("stack index (offset / wordsize)"); | |
987 | AP.EmitInt32(LI->StackOffset); | |
970d7e83 LB |
988 | } |
989 | } | |
990 | } | |
991 | ||
992 | References | |
993 | ========== | |
994 | ||
995 | .. _appel89: | |
996 | ||
997 | [Appel89] Runtime Tags Aren't Necessary. Andrew W. Appel. Lisp and Symbolic | |
998 | Computation 19(7):703-705, July 1989. | |
999 | ||
1000 | .. _goldberg91: | |
1001 | ||
1002 | [Goldberg91] Tag-free garbage collection for strongly typed programming | |
1003 | languages. Benjamin Goldberg. ACM SIGPLAN PLDI'91. | |
1004 | ||
1005 | .. _tolmach94: | |
1006 | ||
1007 | [Tolmach94] Tag-free garbage collection using explicit type parameters. Andrew | |
1008 | Tolmach. Proceedings of the 1994 ACM conference on LISP and functional | |
1009 | programming. | |
1010 | ||
1011 | .. _henderson02: | |
1012 | ||
1013 | [Henderson2002] `Accurate Garbage Collection in an Uncooperative Environment | |
1014 | <http://citeseer.ist.psu.edu/henderson02accurate.html>`__ |