# The `ty` module: representing types

<!-- toc -->

The `ty` module defines how the Rust compiler represents types internally. It also defines the
*typing context* (`tcx` or `TyCtxt`), which is the central data structure in the compiler.

## `ty::Ty`

When we talk about how rustc represents types, we usually refer to a type called `Ty`. There are
quite a few modules and types for `Ty` in the compiler ([Ty documentation][ty]).

[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/index.html

The specific `Ty` we are referring to is [`rustc_middle::ty::Ty`][ty_ty] (and not
[`rustc_hir::Ty`][hir_ty]). The distinction is important, so we will discuss it first before going
into the details of `ty::Ty`.

[ty_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html
[hir_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.Ty.html

## `rustc_hir::Ty` vs `ty::Ty`

The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less
the AST (see [this chapter](hir.md)) as it represents the
syntax that the user wrote, and is obtained after parsing and some *desugaring*. It has a
representation of types, but in reality it reflects more of what the user wrote, that is, the
syntax they used to denote that type.

In contrast, `ty::Ty` represents the semantics of a type, that is, the *meaning* of what the user
wrote. For example, `rustc_hir::Ty` would record the fact that a user used the name `u32` twice
in their program, but the `ty::Ty` would record the fact that both usages refer to the same type.

**Example: `fn foo(x: u32) -> u32 { x }`** In this function we see that `u32` appears twice. We know
that both occurrences are the same type, i.e. the function takes an argument and returns a value of
the same type, but from the point of view of the HIR there would be two distinct type instances
because these are occurring in two different places in the program. That is, they have two
different [`Span`s][span] (locations).

[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html

**Example: `fn foo(x: &u32) -> &u32`** In addition, HIR might have information left out. This type
`&u32` is incomplete, since in the full Rust type there is actually a lifetime, but we didn’t need
to write those lifetimes. There are also some elision rules that insert information. The result may
look like `fn foo<'a>(x: &'a u32) -> &'a u32`.

At the HIR level, these things are not spelled out and you can say the picture is rather incomplete.
However, at the `ty::Ty` level, these details are added and it is complete. Moreover, we will have
exactly one `ty::Ty` for a given type, like `u32`, and that `ty::Ty` is used for all `u32`s in the
whole program, not a specific usage, unlike `rustc_hir::Ty`.

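The syntax-versus-meaning split can be illustrated with a small standalone model. This is only a sketch: `Span`, `HirTy`, `SemTy`, `lower` and `find_type_mentions` are hypothetical names invented for this example and are far simpler than the real `rustc_hir::Ty` and `ty::Ty`.

```rust
/// Hypothetical stand-in for a source location (not rustc's `Span`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// Syntactic view: the text the user wrote, and where they wrote it.
#[derive(Debug, Clone, PartialEq, Eq)]
struct HirTy {
    text: String,
    span: Span,
}

/// Semantic view: what the written text means.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemTy {
    U32,
    I32,
    Bool,
}

/// Record every occurrence of `needle` in `src` as a separate syntactic node.
fn find_type_mentions(src: &str, needle: &str) -> Vec<HirTy> {
    src.match_indices(needle)
        .map(|(lo, text)| HirTy {
            text: text.to_string(),
            span: Span { lo, hi: lo + text.len() },
        })
        .collect()
}

/// Toy "lowering" step: map the written name to its meaning.
fn lower(hir: &HirTy) -> Option<SemTy> {
    match hir.text.as_str() {
        "u32" => Some(SemTy::U32),
        "i32" => Some(SemTy::I32),
        "bool" => Some(SemTy::Bool),
        _ => None,
    }
}
```

For the source text `fn foo(x: u32) -> u32 { x }`, `find_type_mentions(src, "u32")` yields two nodes that compare unequal because their spans differ, while `lower` maps both to the same `SemTy::U32`.
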
Here is a summary:

| [`rustc_hir::Ty`][hir_ty] | [`ty::Ty`][ty_ty] |
| --- | --- |
| Describes the *syntax* of a type: what the user wrote (with some desugaring). | Describes the *semantics* of a type: the meaning of what the user wrote. |
| Each `rustc_hir::Ty` has its own spans corresponding to the appropriate place in the program. | Doesn’t correspond to a single place in the user’s program. |
| `rustc_hir::Ty` has generics and lifetimes; however, some of those lifetimes are special markers like [`LifetimeName::Implicit`][implicit]. | `ty::Ty` has the full type, including generics and lifetimes, even if the user left them out. |
| `fn foo(x: u32) -> u32 { }`: two `rustc_hir::Ty`, one for each usage of `u32`. Each has its own `Span`s, etc. `rustc_hir::Ty` doesn’t tell us that both are the same type. | `fn foo(x: u32) -> u32 { }`: one `ty::Ty` for all instances of `u32` throughout the program. `ty::Ty` tells us that both usages of `u32` mean the same type. |
| `fn foo(x: &u32) -> &u32`: two `rustc_hir::Ty` again. Lifetimes for the references show up in the `rustc_hir::Ty`s using a special marker, [`LifetimeName::Implicit`][implicit]. | `fn foo(x: &u32) -> &u32`: a single `ty::Ty`. The `ty::Ty` has the hidden lifetime param. |

[implicit]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/enum.LifetimeName.html#variant.Implicit

**Order** HIR is built directly from the AST, so it happens before any `ty::Ty` is produced. After
HIR is built, some basic type inference and type checking is done. During the type inference, we
figure out what the `ty::Ty` of everything is and we also check if the type of something is
ambiguous. The `ty::Ty` is then used for type checking, to make sure everything has the
expected type. The [`astconv` module][astconv] is where the code responsible for converting a
`rustc_hir::Ty` into a `ty::Ty` is located. This occurs during the type-checking phase,
but also in other parts of the compiler that want to ask questions like "what argument types does
this function expect?"

[astconv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_typeck/astconv/index.html

**How semantics drive the two instances of `Ty`** You can think of HIR as the perspective
of the type information that assumes the least. We assume two things are distinct until they are
proven to be the same thing. In other words, we know less about them, so we should assume less about
them.

Take the two occurrences of `u32` in `fn foo(x: u32) -> u32`. They are syntactically two strings:
`"u32"` at line N column 20 and `"u32"` at line N column 35. We
don’t know that they are the same yet. So, in the HIR we treat them as if they are different. Later,
we determine that they semantically are the same type and that’s the `ty::Ty` we use.

Consider another example: `fn foo<T>(x: T) -> u32`. Suppose that someone invokes `foo::<u32>(0)`.
This means that `T` and `u32` (in this invocation) actually turn out to be the same type, so we
would eventually end up with the same `ty::Ty`, but we have distinct `rustc_hir::Ty`.
(This is a bit over-simplified, though, since during type checking, we would check the function
generically and would still have a `T` distinct from `u32`. Later, when doing code generation,
we would always be handling "monomorphized" (fully substituted) versions of each function,
and hence we would know what `T` represents (and specifically that it is `u32`).)

Here is one more example:

```rust
mod a {
    type X = u32;
    pub fn foo(x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}
```

Here the type `X` will vary depending on context, clearly. If you look at the `rustc_hir::Ty`,
you will get back that `X` is an alias in both cases (though it will be mapped via name resolution
to distinct aliases). But if you look at the `ty::Ty` signature, it will be either `fn(u32) -> u32`
or `fn(i32) -> i32` (with type aliases fully expanded).

## `ty::Ty` implementation

[`rustc_middle::ty::Ty`][ty_ty] is actually a type alias to [`&TyS`][tys].
This type, which is short for "Type Structure", is where the main functionality is located.
You can ignore the `TyS` struct in general; you will basically never access it explicitly.
We always pass it by reference using the `Ty` alias.
The only exception is to define inherent methods on types. In particular, `TyS` has a [`kind`][kind]
field of type [`TyKind`][tykind], which represents the key type information. `TyKind` is a big enum
with variants to represent many different Rust types
(e.g. primitives, references, algebraic data types, generics, lifetimes, etc).
`TyS` also has two more fields, `flags` and `outer_exclusive_binder`. They
are convenient hacks for efficiency and summarize information about the type that we may want to
know, but they don’t come into the picture as much here. Finally, `ty::TyS`s
are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
type. This allows us to do cheap comparisons for equality, along with the other
benefits of interning.

[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyS.html
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyS.html#structfield.kind
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html

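To see why interning makes equality cheap, here is a minimal sketch of the general technique. It is not how rustc implements it: `ToyKind`, `ToyTy` and `Interner` are hypothetical names, and `Box::leak` stands in for the arena allocation that the compiler actually uses.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy type structure. Child types are stored as already-interned handles.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyKind {
    U32,
    Bool,
    Ref(ToyTy),
    Slice(ToyTy),
}

/// Thin, copyable handle to an interned `ToyKind`.
#[derive(Debug, Clone, Copy)]
struct ToyTy(&'static ToyKind);

// Because every distinct kind is allocated once, comparing and hashing the
// address is enough; the pointee never has to be inspected.
impl PartialEq for ToyTy {
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}
impl Eq for ToyTy {}
impl Hash for ToyTy {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (self.0 as *const ToyKind).hash(state)
    }
}

#[derive(Default)]
struct Interner {
    map: HashMap<ToyKind, ToyTy>,
}

impl Interner {
    /// Return the unique handle for `kind`, allocating only on first sight.
    fn intern(&mut self, kind: ToyKind) -> ToyTy {
        if let Some(&ty) = self.map.get(&kind) {
            return ty;
        }
        // Simplification: leak the allocation instead of using an arena.
        let ty = ToyTy(Box::leak(Box::new(kind.clone())));
        self.map.insert(kind, ty);
        ty
    }
}
```

Interning `ToyKind::U32` twice returns handles that point at the same allocation, so `==` on the handles is a single pointer comparison, even for nested types such as `Ref(u32)`.
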
## Allocating and working with types

To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
that correspond mostly to the various kinds of types. For example:

```rust,ignore
let array_ty = tcx.mk_array(elem_ty, len * 2);
```

These methods all return a `Ty<'tcx>`. Note that the lifetime you get back is the lifetime of the
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
allocate exactly the same type twice).

> N.B.
> Because types are interned, it is possible to compare them for equality efficiently using `==`.
> However, this is almost never what you want to do unless you happen to be hashing and looking
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
> particularly once inference is involved. If you are going to be testing for type equality, you
> probably need to start looking into the inference code to do it right.

You can also find various common types in the `tcx` itself by accessing its fields:
`tcx.types.bool`, `tcx.types.char`, etc. (See [`CommonTypes`] for more.)

[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/context/struct.CommonTypes.html

## `ty::TyKind` Variants

Note: `TyKind` is **NOT** the functional programming concept of *Kind*.

Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:

```rust,ignore
fn foo(x: Ty<'tcx>) {
    match x.kind {
        ...
    }
}
```

The `kind` field is of type `TyKind<'tcx>`, which is an enum defining all of the different kinds of
types in the compiler.

> N.B. inspecting the `kind` field on types during type inference can be risky, as there may be
> inference variables and other things to consider, or sometimes types are not yet known and will
> become known later.

There are a lot of related types, and we’ll cover them in time (e.g. regions/lifetimes,
“substitutions”, etc).

There are many variants on the `TyKind` enum, which you can see by looking at its
[documentation][tykind]. Here is a sampling:

- [**Algebraic Data Types (ADTs)**][kindadt] An [*algebraic data type*][wikiadt] is a `struct`,
  `enum` or `union`. Under the hood, `struct`, `enum` and `union` are actually implemented
  the same way: they are all [`ty::TyKind::Adt`][kindadt]. It’s basically a user defined type.
  We will talk more about these later.
- [**Foreign**][kindforeign] Corresponds to `extern type T`.
- [**Str**][kindstr] Is the type `str`. When the user writes `&str`, `Str` is how we represent the
  `str` part of that type.
- [**Slice**][kindslice] Corresponds to `[T]`.
- [**Array**][kindarray] Corresponds to `[T; n]`.
- [**RawPtr**][kindrawptr] Corresponds to `*mut T` or `*const T`.
- [**Ref**][kindref] `Ref` stands for safe references, `&'a mut T` or `&'a T`. `Ref` has some
  associated parts: `Ty<'tcx>` is the type that the reference points to,
  `Region<'tcx>` is the lifetime or region of the reference, and `Mutability` records whether
  the reference is mutable.
- [**Param**][kindparam] Represents a type parameter (e.g. the `T` in `Vec<T>`).
- [**Error**][kinderr] Represents a type error somewhere so that we can print better diagnostics. We
  will discuss this more later.
- [**And many more**...][kindvars]

[wikiadt]: https://en.wikipedia.org/wiki/Algebraic_data_type
[kindadt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Adt
[kindforeign]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Foreign
[kindstr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Str
[kindslice]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Slice
[kindarray]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Array
[kindrawptr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.RawPtr
[kindref]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Ref
[kindparam]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Param
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Error
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variants

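The "match on the kind" pattern can be tried out on a toy enum that mirrors a few of the variants listed above. The sketch below is not the real `TyKind`: `ToyKind`, `Mutability` and `render` are simplified, hypothetical definitions, with boxed children and string regions standing in for interned `Ty<'tcx>` and `Region<'tcx>` values.

```rust
/// Toy stand-in for whether a pointer or reference is mutable.
#[derive(Debug, Clone, PartialEq)]
enum Mutability {
    Not,
    Mut,
}

/// Toy counterpart to a handful of `TyKind` variants.
#[derive(Debug, Clone, PartialEq)]
enum ToyKind {
    Str,
    Param(String),                         // e.g. `T`
    Slice(Box<ToyKind>),                   // `[T]`
    Array(Box<ToyKind>, u64),              // `[T; n]`
    RawPtr(Box<ToyKind>, Mutability),      // `*const T` / `*mut T`
    Ref(String, Box<ToyKind>, Mutability), // `&'a T` / `&'a mut T`
    Error,
}

/// Render a type as source-like text by matching on its kind.
fn render(kind: &ToyKind) -> String {
    match kind {
        ToyKind::Str => "str".to_string(),
        ToyKind::Param(name) => name.clone(),
        ToyKind::Slice(elem) => format!("[{}]", render(elem)),
        ToyKind::Array(elem, len) => format!("[{}; {}]", render(elem), len),
        ToyKind::RawPtr(pointee, Mutability::Mut) => format!("*mut {}", render(pointee)),
        ToyKind::RawPtr(pointee, Mutability::Not) => format!("*const {}", render(pointee)),
        ToyKind::Ref(region, pointee, Mutability::Mut) => {
            format!("&{} mut {}", region, render(pointee))
        }
        ToyKind::Ref(region, pointee, Mutability::Not) => {
            format!("&{} {}", region, render(pointee))
        }
        ToyKind::Error => "{type error}".to_string(),
    }
}
```

With these definitions, a `Ref` holding the region `'a`, the pointee `Str` and `Mutability::Not` renders as `&'a str`, which matches the description of how `&str` is split into a `Ref` part and a `Str` part.
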
## Import conventions

Although there is no hard and fast rule, the `ty` module tends to be used like so:

```rust,ignore
use ty::{self, Ty, TyCtxt};
```

In particular, since they are so common, the `Ty` and `TyCtxt` types are imported directly. Other
types are often referenced with an explicit `ty::` prefix (e.g. `ty::TraitRef<'tcx>`). But some
modules choose to import a larger or smaller set of names explicitly.

## ADTs Representation

Let's consider the example of a type like `MyStruct<u32>`, where `MyStruct` is defined like so:

```rust,ignore
struct MyStruct<T> { x: u32, y: T }
```

The type `MyStruct<u32>` would be an instance of `TyKind::Adt`:

```rust,ignore
Adt(&'tcx AdtDef, SubstsRef<'tcx>)
//  ------------  ---------------
//  (1)            (2)
//
// (1) represents the `MyStruct` part
// (2) represents the `<u32>`, or "substitutions" / generic arguments
```

There are two parts:

- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type
  parameters. In our example, this is the `MyStruct` part *without* the argument `u32`.
  (Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`,
  they are all represented using `TyKind::Adt`.)
- The [`SubstsRef`][substsref] is an interned list of values that are to be substituted for the
  generic parameters. In our example of `MyStruct<u32>`, we would end up with a list like `[u32]`.
  We’ll dig more into generics and substitutions in a little bit.

[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html
[substsref]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/subst/type.SubstsRef.html

**`AdtDef` and `DefId`**

For every type defined in the source code, there is a unique `DefId` (see [this
chapter](hir.md#identifiers-in-the-hir)). This includes ADTs and generics. In the `MyStruct<T>`
definition we gave above, there are two `DefId`s: one for `MyStruct` and one for `T`. Notice that
the code above does not generate a new `DefId` for `u32` because it is not defined in that code (it
is only referenced).

`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. There is
essentially a one-to-one relationship between `AdtDef` and `DefId`. You can get the `AdtDef` for a
`DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. `AdtDef`s are all interned, as shown
by the `'tcx` lifetime.

[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def

## Type errors

There is a `TyKind::Error` that is produced when the user makes a type error. The idea is that
we would propagate this type and suppress other errors that come up due to it so as not to overwhelm
the user with cascading compiler error messages.

There is an **important invariant** for `TyKind::Error`. The compiler should
**never** produce `Error` unless we **know** that an error has already been
reported to the user. This is usually
because (a) you just reported it right there or (b) you are propagating an existing Error type (in
which case the error should've been reported when that error type was produced).

It's important to maintain this invariant because the whole point of the `Error` type is to suppress
other errors -- i.e., we don't report them. If we were to produce an `Error` type without actually
emitting an error to the user, then this could cause later errors to be suppressed, and the
compilation might inadvertently succeed!

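The two usual cases, (a) and (b), can be sketched with a tiny expression checker. Everything here is hypothetical and unrelated to rustc's actual type checker: `ToyTy`, `Expr` and `check` exist only to show an error type being reported once and then propagated silently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyTy {
    Int,
    Bool,
    Error,
}

enum Expr {
    Int(i64),
    Bool(bool),
    /// Toy assumption: no variables are ever in scope, so every use is an error.
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// Compute the type of `expr`, pushing user-facing diagnostics into `errors`.
fn check(expr: &Expr, errors: &mut Vec<String>) -> ToyTy {
    match expr {
        Expr::Int(_) => ToyTy::Int,
        Expr::Bool(_) => ToyTy::Bool,
        Expr::Var(name) => {
            // Case (a): report the error right here, then produce `Error`.
            errors.push(format!("cannot find value `{}`", name));
            ToyTy::Error
        }
        Expr::Add(lhs, rhs) => {
            let l = check(lhs, errors);
            let r = check(rhs, errors);
            match (l, r) {
                // Case (b): an operand is already `Error`, so a diagnostic was
                // emitted earlier. Propagate without reporting again.
                (ToyTy::Error, _) | (_, ToyTy::Error) => ToyTy::Error,
                (ToyTy::Int, ToyTy::Int) => ToyTy::Int,
                _ => {
                    errors.push(format!("cannot add `{:?}` and `{:?}`", l, r));
                    ToyTy::Error
                }
            }
        }
    }
}
```

Checking `(x + 1) + 2` with this sketch produces exactly one diagnostic, for the unknown `x`; both additions see an `Error` operand and stay quiet. If `Var` returned `Error` without pushing a diagnostic, the whole expression would be rejected with no message at all, which is the failure mode the invariant guards against.
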
Sometimes there is a third case. You believe that an error has been reported, but you believe it
would've been reported earlier in the compilation, not locally. In that case, you can invoke
[`delay_span_bug`]. This will make a note that you expect compilation to yield an error. If
compilation nevertheless succeeds, it will trigger a compiler bug report.

[`delay_span_bug`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/struct.Session.html#method.delay_span_bug

For added safety, it's not actually possible to produce a `TyKind::Error` value
outside of [`rustc_middle::ty`][ty]; there is a private member of
`TyKind::Error` that prevents it from being constructable elsewhere. Instead,
one should use the [`TyCtxt::ty_error`][terr] or
[`TyCtxt::ty_error_with_message`][terrmsg] methods. These methods automatically
call `delay_span_bug` before returning an interned `Ty` of kind `Error`. If you
were already planning to use [`delay_span_bug`], then you can just pass the
span and message to [`ty_error_with_message`][terrmsg] instead to avoid
delaying a redundant span bug.

[terr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.ty_error
[terrmsg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.ty_error_with_message

## Question: Why not substitute “inside” the `AdtDef`?

Recall that we represent a generic struct with `(AdtDef, substs)`. So why bother with this scheme?

Well, the alternate way we could have chosen to represent types would be to always create a new,
fully-substituted form of the `AdtDef` where all the types are already substituted. This seems like
less of a hassle. However, the `(AdtDef, substs)` scheme has some advantages over this.

First, the `(AdtDef, substs)` scheme has an efficiency win:

```rust,ignore
struct MyStruct<T> {
  ... 100s of fields ...
}

// Want to do: MyStruct<A> ==> MyStruct<B>
```

In an example like this, we can subst from `MyStruct<A>` to `MyStruct<B>` (and so on) very cheaply,
by just replacing the one reference to `A` with `B`. But if we eagerly substituted all the fields,
that could be a lot more work because we might have to go through all of the fields in the `AdtDef`
and update all of their types.

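The efficiency argument can be made concrete with a small model in which field types are substituted lazily, only when someone asks for them. This is a sketch under simplifying assumptions, not the compiler's data structures: `FieldTy`, `ToyAdtDef`, `ToyTy`, `field_ty` and `my_struct_def` are all hypothetical names.

```rust
/// Declared type of a field, written in terms of the generic parameters.
#[derive(Debug, Clone, PartialEq)]
enum FieldTy {
    U32,
    /// Index into the list of generic arguments (`T` is parameter 0).
    Param(usize),
}

/// The shared definition: built once, never rewritten per instantiation.
#[derive(Debug, PartialEq)]
struct ToyAdtDef {
    name: &'static str,
    fields: Vec<(&'static str, FieldTy)>,
}

/// A type is either a primitive or a (definition, substs) pair.
#[derive(Debug, Clone, PartialEq)]
enum ToyTy<'a> {
    U32,
    Bool,
    Adt(&'a ToyAdtDef, Vec<ToyTy<'a>>),
}

/// Look up a field's type, substituting generic arguments on demand.
fn field_ty<'a>(ty: &ToyTy<'a>, field: &str) -> Option<ToyTy<'a>> {
    let (def, substs) = match ty {
        ToyTy::Adt(def, substs) => (def, substs),
        _ => return None,
    };
    let (_, declared) = def.fields.iter().find(|(name, _)| *name == field)?;
    match declared {
        FieldTy::U32 => Some(ToyTy::U32),
        FieldTy::Param(index) => substs.get(*index).cloned(),
    }
}

/// Toy counterpart of `struct MyStruct<T> { x: u32, y: T }`.
fn my_struct_def() -> ToyAdtDef {
    ToyAdtDef {
        name: "MyStruct",
        fields: vec![("x", FieldTy::U32), ("y", FieldTy::Param(0))],
    }
}
```

In this model, going from `MyStruct<u32>` to `MyStruct<bool>` means building a new `ToyTy::Adt` that points at the same `ToyAdtDef` with a different one-element argument list. No field is visited until `field_ty` is called, and then only the requested field is substituted.
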
A bit more deeply, this corresponds to structs in Rust being [*nominal* types][nominal], which
means that they are defined by their *name* (and that their contents are then indexed from the
definition of that name, and not carried along “within” the type itself).

[nominal]: https://en.wikipedia.org/wiki/Nominal_type_system