src/librustc_codegen_llvm/debuginfo/doc.rs

//! # Debug Info Module
//!
//! This module serves the purpose of generating debug symbols. We use LLVM's
//! [source level debugging](https://llvm.org/docs/SourceLevelDebugging.html)
//! features for generating the debug information. The general principle is
//! this:
//!
//! Given the right metadata in the LLVM IR, the LLVM code generator is able to
//! create DWARF debug symbols for the given code. The
//! [metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured
//! much like DWARF *debugging information entries* (DIE), representing type
//! information such as datatype layout, function signatures, block layout,
//! variable location and scope information, etc. It is the purpose of this
//! module to generate correct metadata and insert it into the LLVM IR.
//!
//! As the exact format of metadata trees may change between different LLVM
//! versions, we use LLVM's
//! [DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html)
//! to create metadata where possible. This will hopefully ease the adaptation
//! of this module to future LLVM versions.
//!
//! The public API of the module is a set of functions that will insert the
//! correct metadata into the LLVM IR when called with the right parameters.
//! The module is thus driven from an outside client with functions like
//! `debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`.
//!
//! Internally, the module tries to reuse already created metadata by means of
//! a cache. To get a shared metadata node when needed, just call the
//! corresponding function in this module:
//!
//!     let file_metadata = file_metadata(crate_context, path);
//!
//! The function will take care of probing the cache for an existing node for
//! that exact file path.
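//!
//! The cache probing described above can be sketched roughly as follows. This
//! is a minimal illustration, not the module's actual code: `MetadataNode`,
//! `DebugContext`, and the `nodes_created` counter are hypothetical stand-ins,
//! and the real function takes a crate context and a path as shown above.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical stand-in for a handle to an LLVM metadata node.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub struct MetadataNode(pub usize);
//!
//! /// Hypothetical, simplified stand-in for the per-crate debug context.
//! #[derive(Default)]
//! pub struct DebugContext {
//!     /// Cache of file metadata nodes, keyed by file path.
//!     created_files: HashMap<String, MetadataNode>,
//!     /// Number of nodes that actually had to be created.
//!     nodes_created: usize,
//! }
//!
//! impl DebugContext {
//!     /// Returns the shared node for `path`, creating it on first use.
//!     pub fn file_metadata(&mut self, path: &str) -> MetadataNode {
//!         if let Some(&node) = self.created_files.get(path) {
//!             return node; // cache hit: reuse the existing node
//!         }
//!         // Cache miss: "create" a new node and remember it for later calls.
//!         let node = MetadataNode(self.nodes_created);
//!         self.nodes_created += 1;
//!         self.created_files.insert(path.to_owned(), node);
//!         node
//!     }
//!
//!     /// Number of nodes created so far (cache hits do not count).
//!     pub fn nodes_created(&self) -> usize {
//!         self.nodes_created
//!     }
//! }
//! ```
//!
//! In this sketch, calling `file_metadata` twice with the same path yields the
//! same node and creates it only once.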
//!
//! All private state used by the module is stored within either the
//! `CrateDebugContext` struct (owned by the `CodegenCx`) or the
//! `FunctionDebugContext` (owned by the `FunctionCx`).
//!
//! This file consists of three conceptual sections:
//! 1. The public interface of the module
//! 2. Module-internal metadata creation functions
//! 3. Minor utility functions
//!
//!
//! ## Recursive Types
//!
//! Some kinds of types, such as structs and enums, can be recursive. That
//! means that the type definition of some type X refers to some other type
//! which in turn (transitively) refers to X. This introduces cycles into the
//! type reference graph. A naive algorithm doing an on-demand, depth-first
//! traversal of this graph when describing types can get trapped in an
//! endless loop when it reaches such a cycle.
//!
//! For example, the following simple type for a singly-linked list...
//!
//! ```
//! struct List {
//!     value: i32,
//!     tail: Option<Box<List>>,
//! }
//! ```
//!
//! will generate the following call stack with a naive DFS algorithm:
//!
//! ```text
//! describe(t = List)
//!   describe(t = i32)
//!   describe(t = Option<Box<List>>)
//!     describe(t = Box<List>)
//!       describe(t = List) // at the beginning again...
//!       ...
//! ```
//!
//! To break cycles like these, we use "forward declarations". That is, when
//! the algorithm encounters a possibly recursive type (any struct or enum), it
//! immediately creates a type description node and inserts it into the cache
//! *before* describing the members of the type. This type description is just
//! a stub (as type members are not described and added to it yet), but it
//! allows the algorithm to already refer to the type. After the stub is
//! inserted into the cache, the algorithm continues as before. If it now
//! encounters a recursive reference, it hits the cache and does not try to
//! describe the type anew.
//!
//! This behavior is encapsulated in the `RecursiveTypeDescription` enum,
//! which represents a kind of continuation, storing all state needed to
//! continue traversal at the type members after the type has been registered
//! with the cache. (This implementation approach might be a tad
//! over-engineered and may change in the future.)
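//!
//! The stub-first idea can be sketched as below. This is a simplified
//! illustration under several assumptions: types are looked up by name, `Ty`,
//! `TypeDescription`, and `Describer` are hypothetical, and plain recursion
//! is used instead of the continuation-style `RecursiveTypeDescription` enum.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical, heavily simplified type representation.
//! pub enum Ty {
//!     I32,
//!     /// A struct-like type, given by the names of its field types.
//!     Struct { fields: Vec<String> },
//! }
//!
//! /// Hypothetical description node; it is a stub until `complete` is set.
//! #[derive(Debug, Default)]
//! pub struct TypeDescription {
//!     pub members: Vec<String>,
//!     pub complete: bool,
//! }
//!
//! pub struct Describer<'a> {
//!     /// All known types, by name.
//!     types: &'a HashMap<String, Ty>,
//!     /// Descriptions created so far, stubs included.
//!     pub cache: HashMap<String, TypeDescription>,
//! }
//!
//! impl<'a> Describer<'a> {
//!     pub fn new(types: &'a HashMap<String, Ty>) -> Self {
//!         Describer { types, cache: HashMap::new() }
//!     }
//!
//!     /// Describes `name` and, transitively, its members.
//!     /// Panics if `name` is not a known type.
//!     pub fn describe(&mut self, name: &str) {
//!         if self.cache.contains_key(name) {
//!             return; // cache hit (possibly a stub): do not describe anew
//!         }
//!         let types: &'a HashMap<String, Ty> = self.types;
//!         match &types[name] {
//!             Ty::I32 => {
//!                 let done = TypeDescription { members: Vec::new(), complete: true };
//!                 self.cache.insert(name.to_owned(), done);
//!             }
//!             Ty::Struct { fields } => {
//!                 // 1. Register the stub *before* looking at any member.
//!                 self.cache.insert(name.to_owned(), TypeDescription::default());
//!                 // 2. Describe the members; a recursive reference hits the stub.
//!                 for field in fields {
//!                     self.describe(field);
//!                 }
//!                 // 3. Turn the stub into a full description.
//!                 let stub = self.cache.get_mut(name).expect("stub was inserted above");
//!                 stub.members = fields.clone();
//!                 stub.complete = true;
//!             }
//!         }
//!     }
//! }
//! ```
//!
//! With the `List` example from above registered under the names `List`,
//! `i32`, `Option<Box<List>>` and `Box<List>`, `describe("List")` terminates
//! because the inner reference to `List` finds the stub in the cache.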
//!
//!
//! ## Source Locations and Line Information
//!
//! In addition to data type descriptions, the debugging information must also
//! make it possible to map machine code locations back to source code
//! locations in order to be useful. This functionality is also handled in
//! this module. The following functions control source mappings:
//!
//! + `set_source_location()`
//! + `clear_source_location()`
//! + `start_emitting_source_locations()`
//!
//! `set_source_location()` sets the current source location. All IR
//! instructions created after a call to this function will be linked to the
//! given source location, until another location is specified with
//! `set_source_location()` or the source location is cleared with
//! `clear_source_location()`. In the latter case, subsequent IR instructions
//! will not be linked to any source location. As you can see, this is a
//! stateful API (mimicking the one in LLVM), so be careful with source
//! locations set by previous calls. It's probably best to not rely on any
//! specific state being present at a given point in code.
//!
//! One topic that deserves some extra attention is *function prologues*. At
//! the beginning of a function's machine code there are typically a few
//! instructions for loading argument values into allocas and checking if
//! there's enough stack space for the function to execute. This *prologue* is
//! not visible in the source code and LLVM puts a special PROLOGUE END marker
//! into the line table at the first non-prologue instruction of the function.
//! In order to find out where the prologue ends, LLVM looks for the first
//! instruction in the function body that is linked to a source location. So,
//! when generating prologue instructions we have to make sure that we don't
//! emit source location information until the 'real' function body begins. For
//! this reason, source location emission is disabled by default for any new
//! function being codegened and is only activated after a call to the third
//! function from the list above, `start_emitting_source_locations()`. This
//! function should be called right before regularly starting to codegen the
//! top-level block of the given function.
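//!
//! The interplay of the three functions can be modeled with a small state
//! machine. The sketch below is an assumption-laden illustration, not the
//! real implementation: `SourceLoc`, `FunctionDebugState`, and
//! `location_for_new_instruction()` are hypothetical, and no LLVM builder is
//! involved.
//!
//! ```rust
//! /// Hypothetical source position.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub struct SourceLoc {
//!     pub line: u32,
//!     pub col: u32,
//! }
//!
//! /// Hypothetical per-function debug state.
//! #[derive(Default)]
//! pub struct FunctionDebugState {
//!     /// Stays `false` while prologue instructions are being generated.
//!     source_locations_enabled: bool,
//!     /// Location that instructions created from now on are linked to.
//!     current: Option<SourceLoc>,
//! }
//!
//! impl FunctionDebugState {
//!     /// Sets the current location; has no effect during the prologue.
//!     pub fn set_source_location(&mut self, loc: SourceLoc) {
//!         self.current = if self.source_locations_enabled { Some(loc) } else { None };
//!     }
//!
//!     /// Subsequent instructions are not linked to any location.
//!     pub fn clear_source_location(&mut self) {
//!         self.current = None;
//!     }
//!
//!     /// To be called right before the top-level block is generated.
//!     pub fn start_emitting_source_locations(&mut self) {
//!         self.source_locations_enabled = true;
//!     }
//!
//!     /// Location a newly created instruction would be linked to, if any.
//!     pub fn location_for_new_instruction(&self) -> Option<SourceLoc> {
//!         self.current
//!     }
//! }
//! ```
//!
//! In this model, a location set before `start_emitting_source_locations()`
//! is dropped, so no prologue instruction ends up linked to a source
//! location.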
//!
//! There is one exception to the above rule: `llvm.dbg.declare` instructions
//! must be linked to the source location of the variable being declared. For
//! function parameters these `llvm.dbg.declare` instructions typically occur
//! in the middle of the prologue; however, they are ignored by LLVM's prologue
//! detection. `create_argument_metadata()` and related functions take care
//! of linking the `llvm.dbg.declare` instructions to the correct source
//! locations even while source location emission is still disabled, so there
//! is no need to do anything special with source location handling here.
//!
//! ## Unique Type Identification
//!
//! In order for link-time optimization to work properly, LLVM needs a unique
//! type identifier that tells it across compilation units which types are the
//! same as others. This type identifier is created by
//! `TypeMap::get_unique_type_id_of_type()` using the following algorithm:
//!
//! 1. Primitive types have their name as ID.
//! 2. Structs, enums and traits have a multipart identifier:
//!
//!    1. The first part is the SVH (strict version hash) of the crate they
//!       were originally defined in.
//!
//!    2. The second part is the `ast::NodeId` of the definition in their
//!       original crate.
//!
//!    3. The final part is a concatenation of the type IDs of their concrete
//!       type arguments if they are generic types.
//!
//! 3. Tuple, pointer, and function types are structurally identified, which
//!    means that they are equivalent if their component types are equivalent
//!    (i.e., `(i32, i32)` is the same regardless of the crate in which it is
//!    used).
//!
//! This algorithm also provides a stable ID for types that are defined in one
//! crate but instantiated from metadata within another crate. We just have to
//! take care to always map crates and `NodeId`s back to the original crate
//! context.
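//!
//! The three rules can be sketched as a recursive function. Everything here
//! is illustrative: the `Ty` enum is a hypothetical simplification, function
//! types are left out, and the textual ID format (`svh/node_id<args>`) is an
//! assumption made for this example, not the format the compiler emits.
//!
//! ```rust
//! /// Hypothetical, simplified type representation.
//! pub enum Ty {
//!     /// Primitive type, identified by its name (e.g. "i32").
//!     Primitive(&'static str),
//!     /// Struct, enum or trait: hash of the defining crate, ID of the
//!     /// definition in that crate, and concrete type arguments.
//!     Nominal { crate_svh: u64, node_id: u32, type_args: Vec<Ty> },
//!     /// Structurally identified types.
//!     Tuple(Vec<Ty>),
//!     Pointer(Box<Ty>),
//! }
//!
//! /// Builds a unique type ID following the three rules above.
//! pub fn unique_type_id(ty: &Ty) -> String {
//!     match ty {
//!         // Rule 1: the name is the ID.
//!         Ty::Primitive(name) => (*name).to_owned(),
//!         // Rule 2: crate hash, definition ID, then the IDs of the arguments.
//!         Ty::Nominal { crate_svh, node_id, type_args } => {
//!             let args: Vec<String> = type_args.iter().map(unique_type_id).collect();
//!             format!("{:x}/{}<{}>", crate_svh, node_id, args.join(","))
//!         }
//!         // Rule 3: structural identity, built from the component IDs only.
//!         Ty::Tuple(elems) => {
//!             let elems: Vec<String> = elems.iter().map(unique_type_id).collect();
//!             format!("({})", elems.join(","))
//!         }
//!         Ty::Pointer(inner) => format!("*{}", unique_type_id(inner)),
//!     }
//! }
//! ```
//!
//! Because the tuple ID contains no crate information, two `(i32, i32)`
//! values built in different crates receive the same ID, whereas two nominal
//! types differ as soon as their crate hash or definition ID differs.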
//!
//! As a side effect, these unique type IDs also help to solve a problem
//! arising from lifetime parameters. Since lifetime parameters are completely
//! omitted in debuginfo, more than one `Ty` instance may map to the same
//! debuginfo type metadata. That is, some struct `Struct<'a>` may have N
//! instantiations with different concrete substitutions for `'a`, and thus
//! there will be N `Ty` instances for the type `Struct<'a>` even though it is
//! not generic otherwise. Unfortunately, this means that we cannot use
//! `ty::type_id()` as a cheap identifier for type metadata. We have done this
//! in the past, but it led to unnecessary metadata duplication in the best
//! case and LLVM assertions in the worst. However, the unique type ID as
//! described above *can* be used as an identifier. Since it is comparatively
//! expensive to construct, though, `ty::type_id()` is still used additionally
//! as an optimization for cases where the exact same type has been seen
//! before (which is most of the time).
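//!
//! The resulting two-level lookup might look roughly like the sketch below.
//! It is only an illustration under stated assumptions: `Metadata`, this
//! `TypeMap` layout, and `metadata_for()` are hypothetical, a plain `usize`
//! stands in for the result of `ty::type_id()`, and the unique type ID is
//! passed as a closure so that it is only computed on a fast-path miss.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical handle to a piece of type metadata.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! pub struct Metadata(pub usize);
//!
//! #[derive(Default)]
//! pub struct TypeMap {
//!     /// Fast path: cheap per-instance key (stand-in for `ty::type_id()`).
//!     by_type_id: HashMap<usize, Metadata>,
//!     /// Slow path: unique type ID, in which lifetimes no longer appear.
//!     by_unique_id: HashMap<String, Metadata>,
//! }
//!
//! impl TypeMap {
//!     /// Looks up or creates metadata; `unique_id` is only invoked when the
//!     /// cheap key has not been seen before.
//!     pub fn metadata_for(
//!         &mut self,
//!         type_id: usize,
//!         unique_id: impl FnOnce() -> String,
//!     ) -> Metadata {
//!         if let Some(&md) = self.by_type_id.get(&type_id) {
//!             return md; // the exact same type has been seen before
//!         }
//!         // Either reuse the metadata of an equivalent type or create it.
//!         let next = Metadata(self.by_unique_id.len());
//!         let md = *self.by_unique_id.entry(unique_id()).or_insert(next);
//!         self.by_type_id.insert(type_id, md);
//!         md
//!     }
//! }
//! ```
//!
//! Two instances that differ only in a lifetime have different cheap keys but
//! the same unique type ID, so in this sketch they end up sharing one
//! `Metadata` value instead of producing duplicates.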