src/doc/rustc-dev-guide/src/ty.md

# The `ty` module: representing types

The `ty` module defines how the Rust compiler represents types internally. It also defines the
*typing context* (`tcx` or `TyCtxt`), which is the central data structure in the compiler.

## `ty::Ty`

When we talk about how rustc represents types, we usually refer to a type called `Ty`. There are
quite a few modules and types for `Ty` in the compiler ([Ty documentation][ty]).

[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/index.html

The specific `Ty` we are referring to is [`rustc_middle::ty::Ty`][ty_ty] (and not
[`rustc_hir::Ty`][hir_ty]). The distinction is important, so we will discuss it first before going
into the details of `ty::Ty`.

[ty_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html
[hir_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.Ty.html

## `rustc_hir::Ty` vs `ty::Ty`

The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less
the AST (see [this chapter](hir.md)): it represents the syntax that the user wrote, and is obtained
after parsing and some *desugaring*. It has a representation of types, but that representation
reflects what the user wrote to denote a type rather than what the type means.

In contrast, `ty::Ty` represents the semantics of a type, that is, the *meaning* of what the user
wrote. For example, `rustc_hir::Ty` would record the fact that a user used the name `u32` twice
in their program, but the `ty::Ty` would record the fact that both usages refer to the same type.

**Example: `fn foo(x: u32) -> u32 { }`** In this function we see that `u32` appears twice. We know
that this is the same type, i.e. the function takes an argument and returns a value of the same
type, but from the point of view of the HIR there would be two distinct type instances because they
occur in two different places in the program. That is, they have two different
[`Span`s][span] (locations).

[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html

**Example: `fn foo(x: &u32) -> &u32`** In addition, HIR might have information left out. The type
`&u32` is incomplete, since the full Rust type includes a lifetime that the user did not need to
write. The lifetime elision rules insert that information, so the result may look like
`fn foo<'a>(x: &'a u32) -> &'a u32`.

At the HIR level, these things are not spelled out and you can say the picture is rather incomplete.
However, at the `ty::Ty` level, these details are added and it is complete. Moreover, we will have
exactly one `ty::Ty` for a given type, like `u32`, and that `ty::Ty` is used for all `u32`s in the
whole program, not a specific usage, unlike `rustc_hir::Ty`.

Here is a summary:

| [`rustc_hir::Ty`][hir_ty] | [`ty::Ty`][ty_ty] |
| --- | --- |
| Describes the *syntax* of a type: what the user wrote (with some desugaring). | Describes the *semantics* of a type: the meaning of what the user wrote. |
| Each `rustc_hir::Ty` has its own spans corresponding to the appropriate place in the program. | Doesn’t correspond to a single place in the user’s program. |
| `rustc_hir::Ty` has generics and lifetimes; however, some of those lifetimes are special markers like [`LifetimeName::Implicit`][implicit]. | `ty::Ty` has the full type, including generics and lifetimes, even if the user left them out. |
| `fn foo(x: u32) -> u32 { }`: two `rustc_hir::Ty`s, one for each usage of `u32`. Each has its own `Span`, etc. `rustc_hir::Ty` doesn’t tell us that both are the same type. | `fn foo(x: u32) -> u32 { }`: one `ty::Ty` for all instances of `u32` throughout the program. `ty::Ty` tells us that both usages of `u32` mean the same type. |
| `fn foo(x: &u32) -> &u32`: two `rustc_hir::Ty`s again. Lifetimes for the references show up in the `rustc_hir::Ty`s using a special marker, [`LifetimeName::Implicit`][implicit]. | `fn foo(x: &u32) -> &u32`: a single `ty::Ty`. The `ty::Ty` has the hidden lifetime parameter. |

[implicit]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/enum.LifetimeName.html#variant.Implicit

**Order** HIR is built directly from the AST, so it happens before any `ty::Ty` is produced. After
HIR is built, some basic type inference and type checking is done. During type inference, we
figure out what the `ty::Ty` of everything is, and we also check whether the type of something is
ambiguous. The `ty::Ty` is then used for type checking, making sure everything has the
expected type. The [`astconv` module][astconv] is where the code responsible for converting a
`rustc_hir::Ty` into a `ty::Ty` is located. This conversion occurs during the type-checking phase,
but also in other parts of the compiler that want to ask questions like "what argument types does
this function expect?"

[astconv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_typeck/astconv/index.html

**How semantics drive the two instances of `Ty`** You can think of HIR as the perspective
of the type information that assumes the least. We assume two things are distinct until they are
proven to be the same thing. In other words, we know less about them, so we should assume less about
them.

Take the two `u32`s in `fn foo(x: u32) -> u32 { }`. They are syntactically two strings: `"u32"` at
line N column 20 and `"u32"` at line N column 35. We don’t know that they are the same yet. So, in
the HIR we treat them as if they are different. Later, we determine that they semantically are the
same type and that’s the `ty::Ty` we use.

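As a rough sketch of this idea, the toy model below keeps a span on each syntactic type and drops
it when lowering to a semantic type. All names here (`Span`, `HirTy`, `SemTy`, `lower`,
`two_occurrences`) are hypothetical stand-ins invented for illustration; they are not the
compiler's real data structures.

```rust
/// Toy source location (hypothetical; not `rustc_span::Span`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span { line: u32, col: u32 }

/// Toy "syntactic" type: the name the user wrote, and where they wrote it.
#[derive(Debug, PartialEq, Eq)]
struct HirTy { name: &'static str, span: Span }

/// Toy "semantic" type: what the written name means.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemTy { U32, I32 }

/// Lower a syntactic type to its meaning, discarding the span.
fn lower(hir: &HirTy) -> Option<SemTy> {
    match hir.name {
        "u32" => Some(SemTy::U32),
        "i32" => Some(SemTy::I32),
        _ => None,
    }
}

/// Two occurrences of `u32` on the same line, at columns 20 and 35.
fn two_occurrences() -> (HirTy, HirTy) {
    let line = 1;
    (
        HirTy { name: "u32", span: Span { line, col: 20 } },
        HirTy { name: "u32", span: Span { line, col: 35 } },
    )
}
```

The two `HirTy` values compare as different because their spans differ, while `lower` maps both
to the same `SemTy::U32`.
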
Consider another example: `fn foo<T>(x: T) -> u32` and suppose that someone invokes `foo::<u32>(0)`.
This means that `T` and `u32` (in this invocation) actually turn out to be the same type, so we
would eventually end up with the same `ty::Ty`, but we have distinct `rustc_hir::Ty`s.
(This is a bit over-simplified, though, since during type checking, we would check the function
generically and would still have a `T` distinct from `u32`. Later, when doing code generation,
we would always be handling "monomorphized" (fully substituted) versions of each function,
and hence we would know what `T` represents, and specifically that it is `u32`.)

Here is one more example:

```rust
mod a {
    type X = u32;
    pub fn foo(x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}
```

Here the type `X` will vary depending on context, clearly. If you look at the `rustc_hir::Ty`,
you will get back that `X` is an alias in both cases (though it will be mapped via name resolution
to distinct aliases). But if you look at the `ty::Ty` signature, it will be either `fn(u32) -> u32`
or `fn(i32) -> i32` (with type aliases fully expanded).

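The effect of alias expansion can also be observed from ordinary Rust code, outside the compiler.
The sketch below is only an illustration: the helper `signatures` is made up, and the aliases are
marked `pub` here purely to keep the example uncontroversial. Each coercion to an explicit
function-pointer type compiles only because `X` means `u32` in one module and `i32` in the other.

```rust
mod a {
    pub type X = u32;
    pub fn foo(_x: X) -> u32 { 22 }
}
mod b {
    pub type X = i32;
    pub fn foo(x: X) -> i32 { x }
}

/// Coerce each function to a pointer type written without any alias.
fn signatures() -> (fn(u32) -> u32, fn(i32) -> i32) {
    let f: fn(u32) -> u32 = a::foo; // `a::X` has been expanded to `u32`
    let g: fn(i32) -> i32 = b::foo; // `b::X` has been expanded to `i32`
    (f, g)
}
```
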
## `ty::Ty` implementation

[`rustc_middle::ty::Ty`][ty_ty] is actually a type alias to [`&TyS`][tys] (more about that later).
`TyS` (Type Structure) is where the main functionality is located. You can ignore the `TyS` struct
in general; you will basically never access it explicitly. We always pass it by reference using the
`Ty` alias. The only exception is to define inherent methods on types. In particular, `TyS` has a
[`kind`][kind] field of type [`TyKind`][tykind], which represents the key type information.
`TyKind` is a big enum which represents different kinds of types (e.g. primitives, references,
algebraic data types, generics, lifetimes, etc). `TyS` also has two more fields, `flags` and
`outer_exclusive_binder`. They are convenient hacks for efficiency and summarize information about
the type that we may want to know, but they don’t come into the picture as much here. Finally,
`ty::TyS`s are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
type. This allows us to do cheap comparisons for equality, along with the other
benefits of interning.

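To give a feel for what "summarize information about the type" can mean, here is a minimal sketch
of a structure that stores a kind next to a cached summary bit. The names `Kind`, `Type` and
`has_param` are hypothetical and chosen for illustration; they do not mirror the real layout of
`flags` or `outer_exclusive_binder`.

```rust
/// Toy type kinds (hypothetical; far fewer than the real `TyKind`).
enum Kind {
    U32,
    Param(&'static str),
    Ref(Box<Type>),
    Tuple(Vec<Type>),
}

/// Toy analogue of `TyS`: the kind plus a cached summary of it.
struct Type {
    kind: Kind,
    /// Cached answer to "does a type parameter occur anywhere in here?".
    has_param: bool,
}

impl Type {
    /// Compute the summary once, reusing the summaries already cached on
    /// the children instead of walking the whole type again.
    fn new(kind: Kind) -> Type {
        let has_param = match &kind {
            Kind::U32 => false,
            Kind::Param(_) => true,
            Kind::Ref(inner) => inner.has_param,
            Kind::Tuple(elems) => elems.iter().any(|t| t.has_param),
        };
        Type { kind, has_param }
    }
}
```

Later queries read the cached bit instead of traversing the type, which is the kind of saving
such summary fields are meant to provide.
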
[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyS.html
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyS.html#structfield.kind
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html

## Allocating and working with types

To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
that correspond mostly to the various kinds of types. For example:

```rust,ignore
let array_ty = tcx.mk_array(elem_ty, len * 2);
```

These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
allocate exactly the same type twice).

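The interning behaviour can be sketched with a toy context. Everything below is an assumption made
for illustration: `Ctxt`, `intern`, `mk_bool` and `mk_u32` are hypothetical, the `mk_array` shown
is a stand-in rather than the real method, and the sketch leaks memory with `Box::leak` where the
compiler would allocate in an arena tied to `'tcx`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Toy `TyKind` with just enough variants for the sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum TyKind {
    Bool,
    U32,
    /// `[T; n]`; the element type is itself an interned `Ty`.
    Array(Ty, u64),
}

/// Toy `TyS`: the data that each interned type carries.
#[derive(Debug, PartialEq, Eq, Hash)]
struct TyS {
    kind: TyKind,
}

/// `Ty` is only a reference to the interned structure (`'static` here
/// because this sketch leaks instead of using an arena).
type Ty = &'static TyS;

/// Hypothetical context handing out exactly one `Ty` per distinct kind.
#[derive(Default)]
struct Ctxt {
    interned: RefCell<HashMap<TyKind, Ty>>,
}

impl Ctxt {
    fn intern(&self, kind: TyKind) -> Ty {
        let mut interned = self.interned.borrow_mut();
        if let Some(&ty) = interned.get(&kind) {
            return ty; // already allocated: hand back the same pointer
        }
        let ty: Ty = Box::leak(Box::new(TyS { kind: kind.clone() }));
        interned.insert(kind, ty);
        ty
    }

    fn mk_bool(&self) -> Ty { self.intern(TyKind::Bool) }
    fn mk_u32(&self) -> Ty { self.intern(TyKind::U32) }
    fn mk_array(&self, elem: Ty, len: u64) -> Ty {
        self.intern(TyKind::Array(elem, len))
    }
}
```

Asking this toy context twice for `[u32; 4]` yields the same pointer both times, so
`std::ptr::eq` is enough to tell whether two handles denote the same interned value.
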
> N.B. Because types are interned, it is possible to compare them for equality efficiently using `==`
> – however, this is almost never what you want to do unless you happen to be hashing and looking
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
> particularly once inference is involved. If you are going to be testing for type equality, you
> probably need to start looking into the inference code to do it right.

You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
`tcx.types.char`, etc. (see [`CommonTypes`] for more).

[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/context/struct.CommonTypes.html

## `ty::TyKind` Variants

Note: `TyKind` is **NOT** the functional programming concept of *Kind*.

Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:

```rust,ignore
fn foo(x: Ty<'tcx>) {
  match x.kind {
    ...
  }
}
```

The `kind` field is of type `TyKind<'tcx>`, which is an enum defining all of the different kinds of
types in the compiler.

> N.B. Inspecting the `kind` field on types during type inference can be risky, as there may be
> inference variables and other things to consider, or sometimes types are not yet known and will
> become known later.

There are a lot of related types, and we’ll cover them in time (e.g. regions/lifetimes,
“substitutions”, etc.).

There are a bunch of variants on the `TyKind` enum, which you can see by looking at the rustdocs.
Here is a sampling:

[**Algebraic Data Types (ADTs)**][kindadt] An [*algebraic data type*][wikiadt] is a `struct`,
`enum` or `union`. Under the hood, `struct`, `enum` and `union` are actually implemented the same
way: they are all [`ty::TyKind::Adt`][kindadt]. It’s basically a user-defined type. We will talk
more about these later.

[**Foreign**][kindforeign] Corresponds to `extern type T`.

[**Str**][kindstr] Is the type `str`. When the user writes `&str`, `Str` is how we represent the
`str` part of that type.

[**Slice**][kindslice] Corresponds to `[T]`.

[**Array**][kindarray] Corresponds to `[T; n]`.

[**RawPtr**][kindrawptr] Corresponds to `*mut T` or `*const T`.

[**Ref**][kindref] `Ref` stands for safe references, `&'a mut T` or `&'a T`. `Ref` has some
associated parts: a `Ty<'tcx>`, which is the type that the reference refers to; a `Region<'tcx>`,
which is the lifetime or region of the reference; and a `Mutability`, which says whether the
reference is mutable.

[**Param**][kindparam] Represents a type parameter (e.g. the `T` in `Vec<T>`).

[**Error**][kinderr] Represents a type error somewhere so that we can print better diagnostics. We
will discuss this more later.

[**And Many More**...][kindvars]

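To show what matching on the kind looks like in practice, the sampling above can be mirrored in a
small toy enum. This is a simplified sketch, not the real `TyKind`: the names `Kind` and `render`
are hypothetical, regions are plain strings, and nested types are boxed rather than interned.

```rust
/// Toy mirror of a few of the variants listed above.
enum Kind {
    Str,
    U32,
    Slice(Box<Kind>),
    Array(Box<Kind>, u64),
    RawPtr(Box<Kind>, Mutability),
    /// Region, referent type, and mutability of a safe reference.
    Ref(&'static str, Box<Kind>, Mutability),
    Param(&'static str),
}

enum Mutability { Not, Mut }

/// Render a type the way a user would write it, by matching on its kind.
fn render(kind: &Kind) -> String {
    match kind {
        Kind::Str => "str".to_string(),
        Kind::U32 => "u32".to_string(),
        Kind::Slice(elem) => format!("[{}]", render(elem)),
        Kind::Array(elem, len) => format!("[{}; {}]", render(elem), len),
        Kind::RawPtr(to, Mutability::Not) => format!("*const {}", render(to)),
        Kind::RawPtr(to, Mutability::Mut) => format!("*mut {}", render(to)),
        Kind::Ref(region, to, Mutability::Not) => format!("&{} {}", region, render(to)),
        Kind::Ref(region, to, Mutability::Mut) => format!("&{} mut {}", region, render(to)),
        Kind::Param(name) => name.to_string(),
    }
}
```

Because the `match` is exhaustive, adding a variant to the toy enum forces `render` to be updated,
which is one reason matching on the kind is such a common pattern.
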
[wikiadt]: https://en.wikipedia.org/wiki/Algebraic_data_type
[kindadt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Adt
[kindforeign]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Foreign
[kindstr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Str
[kindslice]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Slice
[kindarray]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Array
[kindrawptr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.RawPtr
[kindref]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Ref
[kindparam]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Param
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Error
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variants

## Import conventions

Although there is no hard and fast rule, the `ty` module tends to be used like so:

```rust,ignore
use ty::{self, Ty, TyCtxt};
```

In particular, since they are so common, the `Ty` and `TyCtxt` types are imported directly. Other
types are often referenced with an explicit `ty::` prefix (e.g. `ty::TraitRef<'tcx>`). But some
modules choose to import a larger or smaller set of names explicitly.

## ADTs Representation

Let's consider the example of a type like `MyStruct<u32>`, where `MyStruct` is defined like so:

```rust,ignore
struct MyStruct<T> { x: u32, y: T }
```

The type `MyStruct<u32>` would be an instance of `TyKind::Adt`:

```rust,ignore
Adt(&'tcx AdtDef, SubstsRef<'tcx>)
//  ------------  ---------------
//  (1)           (2)
//
// (1) represents the `MyStruct` part
// (2) represents the `<u32>`, or "substitutions" / generic arguments
```

There are two parts:

- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type
  parameters. In our example, this is the `MyStruct` part *without* the argument `u32`.
    - Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`,
      they are all represented using `TyKind::Adt`.
- The [`SubstsRef`][substsref] is an interned list of values that are to be substituted for the
  generic parameters. In our example of `MyStruct<u32>`, we would end up with a list like `[u32]`.
  We’ll dig more into generics and substitutions in a little bit.

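The two parts can be modelled in a few lines. This is a minimal sketch under simplifying
assumptions: the toy `AdtDef` stores only a name, a kind and parameter names, the substitutions
are a plain `Vec` instead of an interned list, and `AdtKind` and `my_struct_u32` are hypothetical
names used only here.

```rust
/// In this sketch, structs, enums and unions share one representation and
/// differ only in this tag.
#[derive(Debug, PartialEq)]
enum AdtKind { Struct, Enum, Union }

/// Toy `AdtDef`: the definition, with no generic arguments applied.
#[derive(Debug, PartialEq)]
struct AdtDef {
    name: &'static str,
    kind: AdtKind,
    /// Names of the generic parameters, e.g. `["T"]`.
    params: Vec<&'static str>,
}

/// Toy types: one primitive plus `Adt(def, substs)`.
#[derive(Debug, PartialEq)]
enum Ty<'tcx> {
    U32,
    Adt(&'tcx AdtDef, Vec<Ty<'tcx>>),
}

/// Build `MyStruct<u32>` from the definition of `MyStruct`.
fn my_struct_u32<'tcx>(def: &'tcx AdtDef) -> Ty<'tcx> {
    Ty::Adt(def, vec![Ty::U32])
}
```

The definition is shared by reference, so `MyStruct<u32>` and any other instantiation would point
at the same `AdtDef` and differ only in their substitution lists.
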
[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html
[substsref]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/subst/type.SubstsRef.html

**`AdtDef` and `DefId`**

For every type defined in the source code, there is a unique `DefId` (see [this
chapter](hir.md#identifiers-in-the-hir)). This includes ADTs and generics. In the `MyStruct<T>`
definition we gave above, there are two `DefId`s: one for `MyStruct` and one for `T`. Notice that
the code above does not generate a new `DefId` for `u32` because it is not defined in that code (it
is only referenced).

`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. There is
essentially a one-to-one relationship between `AdtDef` and `DefId`. You can get the `AdtDef` for a
`DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. The `AdtDef`s are all interned, as you can
see from the `'tcx` lifetime on them.

[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def

## Type errors

There is a `TyKind::Error` that is produced when the user makes a type error. The idea is that
we would propagate this type and suppress other errors that come up due to it so as not to overwhelm
the user with cascading compiler error messages.

There is an **important invariant** for `TyKind::Error`. You should **never** return the 'error
type' unless you **know** that an error has already been reported to the user. This is usually
because (a) you just reported it right there or (b) you are propagating an existing `Error` type (in
which case the error should've been reported when that error type was produced).

It's important to maintain this invariant because the whole point of the `Error` type is to suppress
other errors, i.e., we don't report them. If we were to produce an `Error` type without actually
emitting an error to the user, then this could cause later errors to be suppressed, and the
compilation might inadvertently succeed!

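The invariant can be illustrated with a toy checker for addition. This is a sketch under stated
assumptions, not how rustc reports diagnostics: the `Ty` enum, the `check_add` function and the
`Vec<String>` used as an error sink are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty { U32, Bool, Error }

/// Toy checker for `lhs + rhs` over `u32`, collecting diagnostics in `errors`.
fn check_add(lhs: Ty, rhs: Ty, errors: &mut Vec<String>) -> Ty {
    match (lhs, rhs) {
        // Case (b): an operand is already `Error`, so an error was reported
        // when it was produced. Propagate it without reporting again.
        (Ty::Error, _) | (_, Ty::Error) => Ty::Error,
        (Ty::U32, Ty::U32) => Ty::U32,
        // Case (a): report the error right here, and only then return `Error`.
        (l, r) => {
            errors.push(format!("cannot add `{:?}` and `{:?}`", l, r));
            Ty::Error
        }
    }
}
```

Checking something like `(true + 1) + 2` with this function reports one error for the inner
addition; the outer addition sees an `Error` operand and stays silent, so the user is not shown a
cascade of follow-up messages.
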
Sometimes there is a third case. You believe that an error has been reported, but you believe it
would've been reported earlier in the compilation, not locally. In that case, you can invoke
[`delay_span_bug`]. This will make a note that you expect compilation to yield an error; if,
however, compilation should succeed, then it will trigger a compiler bug report.

[`delay_span_bug`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/struct.Session.html#method.delay_span_bug

## Question: Why not substitute “inside” the `AdtDef`?

Recall that we represent a generic struct with `(AdtDef, substs)`. So why bother with this scheme?

Well, the alternate way we could have chosen to represent types would be to always create a new,
fully-substituted form of the `AdtDef` where all the types are already substituted. This seems like
less of a hassle. However, the `(AdtDef, substs)` scheme has some advantages over this.

First, the `(AdtDef, substs)` scheme has an efficiency win:

```rust,ignore
struct MyStruct<T> {
  ... 100s of fields ...
}

// Want to do: MyStruct<A> ==> MyStruct<B>
```

In an example like this, we can subst from `MyStruct<A>` to `MyStruct<B>` (and so on) very cheaply,
by just replacing the one reference to `A` with `B`. But if we eagerly substituted all the fields,
that could be a lot more work because we might have to go through all of the fields in the `AdtDef`
and update all of their types.

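A small model makes the cost difference concrete. The sketch below is illustrative only: the
names `Adt`, `with_substs`, `field_ty` and `my_struct_def` are hypothetical, parameters are
referred to by index, and substitution handles just the top-level case needed here.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    U32,
    Bool,
    /// The generic parameter with this index; `T` is parameter 0.
    Param(usize),
}

/// Toy `AdtDef`: field types are stored once, in terms of the parameters.
struct AdtDef {
    fields: Vec<(&'static str, Ty)>,
}

/// Toy `(AdtDef, substs)` pair.
struct Adt<'a> {
    def: &'a AdtDef,
    substs: Vec<Ty>,
}

impl<'a> Adt<'a> {
    /// Going from `MyStruct<A>` to `MyStruct<B>` only swaps the substs;
    /// the field list in the definition is never touched.
    fn with_substs(&self, substs: Vec<Ty>) -> Adt<'a> {
        Adt { def: self.def, substs }
    }

    /// A field's concrete type is computed on demand by substituting.
    fn field_ty(&self, name: &str) -> Option<Ty> {
        let (_, ty) = self.def.fields.iter().find(|(n, _)| *n == name)?;
        Some(match ty {
            Ty::Param(i) => self.substs[*i].clone(),
            other => other.clone(),
        })
    }
}

/// The definition of `struct MyStruct<T> { x: u32, y: T }`.
fn my_struct_def() -> AdtDef {
    AdtDef { fields: vec![("x", Ty::U32), ("y", Ty::Param(0))] }
}
```

However many fields the definition has, `with_substs` does a constant amount of work, and the
cost of substitution is paid only for the fields that are actually inspected.
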
A bit more deeply, this corresponds to structs in Rust being [*nominal* types][nominal], which
means that they are defined by their *name* (and that their contents are then indexed from the
definition of that name, and not carried along “within” the type itself).

[nominal]: https://en.wikipedia.org/wiki/Nominal_type_system