New upstream version 1.58.1+dfsg1
# The `ty` module: representing types

<!-- toc -->

The `ty` module defines how the Rust compiler represents types internally. It also defines the
*typing context* (`tcx` or `TyCtxt`), which is the central data structure in the compiler.

## `ty::Ty`

When we talk about how rustc represents types, we usually refer to a type called `Ty`. There are
quite a few modules and types for `Ty` in the compiler ([Ty documentation][ty]).

[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/index.html

The specific `Ty` we are referring to is [`rustc_middle::ty::Ty`][ty_ty] (and not
[`rustc_hir::Ty`][hir_ty]). The distinction is important, so we will discuss it first before going
into the details of `ty::Ty`.

[ty_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html
[hir_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.Ty.html

## `rustc_hir::Ty` vs `ty::Ty`

The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less
the AST (see [this chapter](hir.md)), as it represents the syntax that the user wrote, and it is
obtained after parsing and some *desugaring*. It has a representation of types, but that
representation reflects what the user wrote to denote a type rather than the type itself.

In contrast, `ty::Ty` represents the semantics of a type, that is, the *meaning* of what the user
wrote. For example, `rustc_hir::Ty` would record the fact that a user used the name `u32` twice
in their program, but the `ty::Ty` would record the fact that both usages refer to the same type.

**Example: `fn foo(x: u32) -> u32 { x }`** In this function we see that `u32` appears twice. We know
that both occurrences are the same type, i.e. the function takes an argument and returns a value of
the same type, but from the point of view of the HIR there are two distinct type instances because
they occur in two different places in the program. That is, they have two
different [`Span`s][span] (locations).

[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html

**Example: `fn foo(x: &u32) -> &u32`** In addition, HIR might have information left out. This type
`&u32` is incomplete, since the full Rust type includes a lifetime that the user did not need to
write. The lifetime elision rules fill in the missing information. The result may
look like `fn foo<'a>(x: &'a u32) -> &'a u32`.

At the HIR level, these things are not spelled out, so the picture is rather incomplete.
However, at the `ty::Ty` level, these details are added and the type is complete. Moreover, we will
have exactly one `ty::Ty` for a given type, like `u32`, and that `ty::Ty` is used for all `u32`s in
the whole program, not a specific usage, unlike `rustc_hir::Ty`.

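The split between "what was written where" and "what it means" can be sketched with a toy model. Every name below (`ToySpan`, `ToyHirTy`, `ToySemTy`, `lower`, `hir_types_named`) is hypothetical and far simpler than the real `rustc_hir::Ty` and `ty::Ty`; the sketch only assumes that a syntactic node carries a span and a semantic type does not.

```rust
/// A source location as a byte range (hypothetical; much simpler than `rustc_span::Span`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToySpan {
    lo: usize,
    hi: usize,
}

/// Syntactic view of a type: the name the user wrote and where they wrote it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyHirTy {
    name: &'static str,
    span: ToySpan,
}

/// Semantic view of a type: no location, only meaning.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToySemTy {
    U32,
    I32,
}

/// Lower a syntactic type to its meaning. This toy only knows two primitive names.
fn lower(hir: &ToyHirTy) -> Option<ToySemTy> {
    match hir.name {
        "u32" => Some(ToySemTy::U32),
        "i32" => Some(ToySemTy::I32),
        _ => None,
    }
}

/// Record every textual occurrence of `name` in `src` as its own syntactic node.
fn hir_types_named(src: &'static str, name: &'static str) -> Vec<ToyHirTy> {
    src.match_indices(name)
        .map(|(lo, _)| ToyHirTy { name, span: ToySpan { lo, hi: lo + name.len() } })
        .collect()
}

fn main() {
    let tys = hir_types_named("fn foo(x: u32) -> u32 { x }", "u32");
    // Two distinct syntactic nodes, because their spans differ...
    assert_eq!(tys.len(), 2);
    assert_ne!(tys[0], tys[1]);
    // ...but a single meaning.
    assert_eq!(lower(&tys[0]), lower(&tys[1]));
    println!("{:?} and {:?} both mean {:?}", tys[0].span, tys[1].span, lower(&tys[0]));
}
```
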
Here is a summary:

| [`rustc_hir::Ty`][hir_ty] | [`ty::Ty`][ty_ty] |
| --- | --- |
| Describes the *syntax* of a type: what the user wrote (with some desugaring). | Describes the *semantics* of a type: the meaning of what the user wrote. |
| Each `rustc_hir::Ty` has its own spans corresponding to the appropriate place in the program. | Doesn’t correspond to a single place in the user’s program. |
| `rustc_hir::Ty` has generics and lifetimes; however, some of those lifetimes are special markers like [`LifetimeName::Implicit`][implicit]. | `ty::Ty` has the full type, including generics and lifetimes, even if the user left them out. |
| `fn foo(x: u32) -> u32 { }`: two `rustc_hir::Ty`s, one for each usage of `u32`. Each has its own `Span`, etc. `rustc_hir::Ty` doesn’t tell us that both are the same type. | `fn foo(x: u32) -> u32 { }`: one `ty::Ty` for all instances of `u32` throughout the program. `ty::Ty` tells us that both usages of `u32` mean the same type. |
| `fn foo(x: &u32) -> &u32`: two `rustc_hir::Ty`s again. Lifetimes for the references show up in the `rustc_hir::Ty`s using a special marker, [`LifetimeName::Implicit`][implicit]. | `fn foo(x: &u32) -> &u32`: a single `ty::Ty`, which has the hidden lifetime parameter. |

[implicit]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/enum.LifetimeName.html#variant.Implicit

**Order** HIR is built directly from the AST, so it happens before any `ty::Ty` is produced. After
HIR is built, some basic type inference and type checking is done. During type inference, we
figure out what the `ty::Ty` of everything is, and we also check whether the type of something is
ambiguous. The `ty::Ty` is then used for type checking, making sure everything has the
expected type. The [`astconv` module][astconv] is where the code responsible for converting a
`rustc_hir::Ty` into a `ty::Ty` is located. This occurs during the type-checking phase,
but also in other parts of the compiler that want to ask questions like "what argument types does
this function expect?"

[astconv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_typeck/astconv/index.html

**How semantics drive the two instances of `Ty`** You can think of HIR as the perspective
of the type information that assumes the least. We assume two things are distinct until they are
proven to be the same thing. In other words, we know less about them, so we should assume less about
them.

They are syntactically two strings: `"u32"` at line N column 20 and `"u32"` at line N column 35. We
don’t know that they are the same yet. So, in the HIR we treat them as if they are different. Later,
we determine that they semantically are the same type and that’s the `ty::Ty` we use.

Consider another example: `fn foo<T>(x: T) -> u32`. Suppose that someone invokes `foo::<u32>(0)`.
This means that `T` and `u32` (in this invocation) actually turn out to be the same type, so we
would eventually end up with the same `ty::Ty`, but we have distinct `rustc_hir::Ty`s.
(This is a bit over-simplified, though, since during type checking, we would check the function
generically and would still have a `T` distinct from `u32`. Later, when doing code generation,
we would always be handling "monomorphized" (fully substituted) versions of each function,
and hence we would know what `T` represents (and specifically that it is `u32`).)

Here is one more example:

```rust
mod a {
    type X = u32;
    pub fn foo(x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}
```

Here the type `X` will vary depending on context, clearly. If you look at the `rustc_hir::Ty`,
you will get back that `X` is an alias in both cases (though it will be mapped via name resolution
to distinct aliases). But if you look at the `ty::Ty` signature, it will be either `fn(u32) -> u32`
or `fn(i32) -> i32` (with type aliases fully expanded).

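The effect of alias expansion can be observed from ordinary Rust code, without touching compiler internals. The following sketch reuses the two modules above (with the unused parameter renamed to `_x`) and coerces each function to a function pointer type spelled without the alias; it is an illustration of the observable behaviour, not of how rustc computes the signature.

```rust
mod a {
    type X = u32;
    pub fn foo(_x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}

fn main() {
    // Each coercion type-checks only because the alias `X` has been expanded
    // to the type it names inside that module.
    let f: fn(u32) -> u32 = a::foo;
    let g: fn(i32) -> i32 = b::foo;
    println!("{} {}", f(7), g(-7));
}
```
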
## `ty::Ty` implementation

[`rustc_middle::ty::Ty`][ty_ty] is actually a type alias to [`&TyS`][tys].
This type, which is short for "Type Structure", is where the main functionality is located.
You can ignore the `TyS` struct in general; you will basically never access it explicitly.
We always pass it by reference using the `Ty` alias.
The only exception is to define inherent methods on types. In particular, `TyS` has a [`kind`][kind]
field of type [`TyKind`][tykind], which represents the key type information. `TyKind` is a big enum
with variants to represent many different Rust types
(e.g. primitives, references, algebraic data types, generics, lifetimes, etc).
`TyS` also has two more fields, `flags` and `outer_exclusive_binder`. They
are convenient hacks for efficiency and summarize information about the type that we may want to
know, but they don’t come into the picture as much here. Finally, `ty::TyS`s
are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
type. This allows us to do cheap comparisons for equality, along with the other
benefits of interning.

[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyS.html
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyS.html#structfield.kind
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html

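To make the interning idea concrete, here is a minimal, self-contained sketch. It is not rustc's interner: `ToyKind`, `ToyTy`, `ToyInterner` and `same_ty` are hypothetical names, allocations are leaked so that `'static` can stand in for `'tcx`, and a `HashSet` replaces the arena-backed tables. It only shows why interning reduces type equality to a pointer comparison.

```rust
use std::collections::HashSet;

/// Toy stand-in for `TyKind`: the structural description of a type.
#[derive(Debug, PartialEq, Eq, Hash)]
enum ToyKind {
    U32,
    Bool,
    Ref(ToyTy),
}

/// Toy stand-in for `Ty`: a thin reference to an interned `ToyKind`.
type ToyTy = &'static ToyKind;

#[derive(Default)]
struct ToyInterner {
    seen: HashSet<ToyTy>,
}

impl ToyInterner {
    /// Return the unique reference for `kind`, allocating only the first time it is seen.
    fn intern(&mut self, kind: ToyKind) -> ToyTy {
        if let Some(&existing) = self.seen.get(&kind) {
            return existing;
        }
        let ty: ToyTy = Box::leak(Box::new(kind));
        self.seen.insert(ty);
        ty
    }
}

/// Equality on interned types is a pointer comparison; no structural walk is needed.
fn same_ty(a: ToyTy, b: ToyTy) -> bool {
    std::ptr::eq(a, b)
}

fn main() {
    let mut interner = ToyInterner::default();
    let u32_a = interner.intern(ToyKind::U32);
    let u32_b = interner.intern(ToyKind::U32);
    let ref_a = interner.intern(ToyKind::Ref(u32_a));
    let ref_b = interner.intern(ToyKind::Ref(u32_b));
    let boolean = interner.intern(ToyKind::Bool);

    assert!(same_ty(u32_a, u32_b));
    assert!(same_ty(ref_a, ref_b));
    assert!(!same_ty(u32_a, boolean));
    println!("interned {} distinct types", interner.seen.len());
}
```
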
## Allocating and working with types

To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
that correspond mostly to the various kinds of types. For example:

```rust,ignore
let array_ty = tcx.mk_array(elem_ty, len * 2);
```

These methods all return a `Ty<'tcx>`. Note that the lifetime you get back is the lifetime of the
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
allocate exactly the same type twice).

> N.B.
> Because types are interned, it is possible to compare them for equality efficiently using `==`.
> However, this is almost never what you want to do unless you happen to be hashing and looking
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
> particularly once inference is involved. If you are going to be testing for type equality, you
> probably need to start looking into the inference code to do it right.

You can also find various common types in the `tcx` itself by accessing its fields:
`tcx.types.bool`, `tcx.types.char`, etc. (See [`CommonTypes`] for more.)

[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/context/struct.CommonTypes.html

## `ty::TyKind` Variants

Note: `TyKind` is **NOT** the functional programming concept of *Kind*.

Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:

```rust,ignore
fn foo(x: Ty<'tcx>) {
  match x.kind {
    ...
  }
}
```

The `kind` field is of type `TyKind<'tcx>`, which is an enum defining all of the different kinds of
types in the compiler.

> N.B. inspecting the `kind` field on types during type inference can be risky, as there may be
> inference variables and other things to consider, or sometimes types are not yet known and will
> become known later.

There are a lot of related types, and we’ll cover them in time (e.g. regions/lifetimes,
“substitutions”, etc).

There are many variants on the `TyKind` enum, which you can see by looking at its
[documentation][tykind]. Here is a sampling:

- [**Algebraic Data Types (ADTs)**][kindadt] An [*algebraic data type*][wikiadt] is a `struct`,
  `enum` or `union`. Under the hood, `struct`, `enum` and `union` are actually implemented
  the same way: they are all [`ty::TyKind::Adt`][kindadt]. It’s basically a user-defined type.
  We will talk more about these later.
- [**Foreign**][kindforeign] Corresponds to `extern type T`.
- [**Str**][kindstr] Is the type `str`. When the user writes `&str`, `Str` is how we represent the
  `str` part of that type.
- [**Slice**][kindslice] Corresponds to `[T]`.
- [**Array**][kindarray] Corresponds to `[T; n]`.
- [**RawPtr**][kindrawptr] Corresponds to `*mut T` or `*const T`.
- [**Ref**][kindref] `Ref` stands for safe references, `&'a mut T` or `&'a T`. `Ref` has some
  associated parts: `Ty<'tcx>` is the type that the reference refers to, `Region<'tcx>` is the
  lifetime or region of the reference, and `Mutability` records whether the reference
  is mutable or not.
- [**Param**][kindparam] Represents a type parameter (e.g. the `T` in `Vec<T>`).
- [**Error**][kinderr] Represents a type error somewhere so that we can print better diagnostics. We
  will discuss this more later.
- [**And many more**...][kindvars]

[wikiadt]: https://en.wikipedia.org/wiki/Algebraic_data_type
[kindadt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Adt
[kindforeign]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Foreign
[kindstr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Str
[kindslice]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Slice
[kindarray]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Array
[kindrawptr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.RawPtr
[kindref]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Ref
[kindparam]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Param
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variant.Error
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.TyKind.html#variants

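The "match on the kind" pattern can be tried out on a toy enum that mirrors a few of the variants listed above. This is a sketch under simplifying assumptions: `ToyKind` and `render` are hypothetical, boxes replace interned `Ty<'tcx>` values, a string replaces `Region<'tcx>`, and a `bool` replaces `Mutability`. The payloads of the real variants differ.

```rust
/// Toy mirror of a few `TyKind` variants with simplified payloads.
#[derive(Debug)]
enum ToyKind {
    Str,
    Slice(Box<ToyKind>),
    Array(Box<ToyKind>, u64),
    RawPtr { pointee: Box<ToyKind>, mutable: bool },
    Ref { region: &'static str, pointee: Box<ToyKind>, mutable: bool },
    Param(&'static str),
    Error,
}

/// Render a toy type roughly the way a user would write it, by matching on its kind.
fn render(kind: &ToyKind) -> String {
    match kind {
        ToyKind::Str => "str".to_string(),
        ToyKind::Slice(elem) => format!("[{}]", render(elem)),
        ToyKind::Array(elem, len) => format!("[{}; {}]", render(elem), len),
        ToyKind::RawPtr { pointee, mutable } => {
            let qual = if *mutable { "mut" } else { "const" };
            format!("*{} {}", qual, render(pointee))
        }
        ToyKind::Ref { region, pointee, mutable } => {
            let qual = if *mutable { "mut " } else { "" };
            format!("&{} {}{}", region, qual, render(pointee))
        }
        ToyKind::Param(name) => name.to_string(),
        ToyKind::Error => "{type error}".to_string(),
    }
}

fn main() {
    let str_ref = ToyKind::Ref { region: "'a", pointee: Box::new(ToyKind::Str), mutable: false };
    let array = ToyKind::Array(Box::new(ToyKind::Param("T")), 4);
    println!("{}", render(&str_ref));
    println!("{}", render(&array));
}
```
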
## Import conventions

Although there is no hard and fast rule, the `ty` module tends to be used like so:

```rust,ignore
use ty::{self, Ty, TyCtxt};
```

In particular, since they are so common, the `Ty` and `TyCtxt` types are imported directly. Other
types are often referenced with an explicit `ty::` prefix (e.g. `ty::TraitRef<'tcx>`). But some
modules choose to import a larger or smaller set of names explicitly.

## ADTs Representation

Let's consider the example of a type like `MyStruct<u32>`, where `MyStruct` is defined like so:

```rust,ignore
struct MyStruct<T> { x: u32, y: T }
```

The type `MyStruct<u32>` would be an instance of `TyKind::Adt`:

```rust,ignore
Adt(&'tcx AdtDef, SubstsRef<'tcx>)
//  ------------  ---------------
//  (1)            (2)
//
//  (1) represents the `MyStruct` part
//  (2) represents the `<u32>`, or "substitutions" / generic arguments
```

There are two parts:

- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type
  parameters. In our example, this is the `MyStruct` part *without* the argument `u32`.
  (Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`,
  they are all represented using `TyKind::Adt`.)
- The [`SubstsRef`][substsref] is an interned list of values that are to be substituted for the
  generic parameters. In our example of `MyStruct<u32>`, we would end up with a list like `[u32]`.
  We’ll dig more into generics and substitutions in a little bit.

[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html
[substsref]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/subst/type.SubstsRef.html

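The two-part shape described above can be modelled in a few lines. The sketch below is an assumption-laden toy, not the compiler's data structure: `ToyAdtDef`, `ToyTy`, `mk_adt` and `same_def` are hypothetical names, and a plain `Vec` replaces the interned `SubstsRef`. It shows that `MyStruct<u32>` and `MyStruct<bool>` point at one shared definition and differ only in their argument lists.

```rust
/// Toy stand-in for `AdtDef`: the definition, with no generic arguments applied.
#[derive(Debug)]
struct ToyAdtDef {
    name: &'static str,
    num_params: usize,
}

/// Toy types: primitives, or an ADT definition plus its generic arguments.
#[derive(Debug, Clone)]
enum ToyTy<'a> {
    U32,
    Bool,
    Adt(&'a ToyAdtDef, Vec<ToyTy<'a>>),
}

/// Build `def<args...>`, checking that the argument count matches the definition.
fn mk_adt<'a>(def: &'a ToyAdtDef, args: Vec<ToyTy<'a>>) -> Option<ToyTy<'a>> {
    (args.len() == def.num_params).then(|| ToyTy::Adt(def, args))
}

/// True when both types are ADTs that point at the very same definition.
fn same_def(a: &ToyTy<'_>, b: &ToyTy<'_>) -> bool {
    match (a, b) {
        (ToyTy::Adt(d1, _), ToyTy::Adt(d2, _)) => std::ptr::eq(*d1, *d2),
        _ => false,
    }
}

fn main() {
    let my_struct = ToyAdtDef { name: "MyStruct", num_params: 1 };
    let of_u32 = mk_adt(&my_struct, vec![ToyTy::U32]).unwrap();
    let of_bool = mk_adt(&my_struct, vec![ToyTy::Bool]).unwrap();
    // One definition, two different argument lists.
    assert!(same_def(&of_u32, &of_bool));
    println!("{:?}\n{:?}", of_u32, of_bool);
}
```
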
**`AdtDef` and `DefId`**

For every type defined in the source code, there is a unique `DefId` (see [this
chapter](hir.md#identifiers-in-the-hir)). This includes ADTs and generics. In the `MyStruct<T>`
definition we gave above, there are two `DefId`s: one for `MyStruct` and one for `T`. Notice that
the code above does not generate a new `DefId` for `u32` because it is not defined in that code (it
is only referenced).

`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. There is
essentially a one-to-one relationship between `AdtDef` and `DefId`. You can get the `AdtDef` for a
`DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. `AdtDef`s are all interned, as shown
by the `'tcx` lifetime.

[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def

## Type errors

There is a `TyKind::Error` that is produced when the user makes a type error. The idea is that
we would propagate this type and suppress other errors that come up due to it so as not to overwhelm
the user with cascading compiler error messages.

There is an **important invariant** for `TyKind::Error`. The compiler should
**never** produce `Error` unless we **know** that an error has already been
reported to the user. This is usually
because (a) you just reported it right there or (b) you are propagating an existing `Error` type (in
which case the error should've been reported when that error type was produced).

It's important to maintain this invariant because the whole point of the `Error` type is to suppress
other errors, i.e., we don't report them. If we were to produce an `Error` type without actually
emitting an error to the user, then this could cause later errors to be suppressed, and the
compilation might inadvertently succeed!

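The invariant can be illustrated with a tiny expression checker. Everything here is hypothetical (`ToyTy`, `Expr`, `check`, and the error message format) and unrelated to rustc's actual type checker; the sketch only demonstrates the rule "report once where the problem is found, then propagate `Error` silently".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyTy {
    Int,
    Bool,
    Error,
}

enum Expr {
    Int(i64),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
}

/// Type-check `expr`, pushing each newly discovered problem onto `errors`.
fn check(expr: &Expr, errors: &mut Vec<String>) -> ToyTy {
    match expr {
        Expr::Int(_) => ToyTy::Int,
        Expr::Bool(_) => ToyTy::Bool,
        Expr::Add(lhs, rhs) => {
            let (l, r) = (check(lhs, errors), check(rhs, errors));
            match (l, r) {
                (ToyTy::Int, ToyTy::Int) => ToyTy::Int,
                // An operand is already `Error`: its error was reported when it was
                // produced, so stay quiet and keep propagating.
                (ToyTy::Error, _) | (_, ToyTy::Error) => ToyTy::Error,
                // A fresh mismatch: report first, and only then produce `Error`.
                _ => {
                    errors.push(format!("cannot add {:?} and {:?}", l, r));
                    ToyTy::Error
                }
            }
        }
    }
}

fn main() {
    // (1 + true) + 2
    let expr = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Bool(true)))),
        Box::new(Expr::Int(2)),
    );
    let mut errors = Vec::new();
    let ty = check(&expr, &mut errors);
    // The outer addition sees `Error + Int` and does not report a second error.
    assert_eq!(ty, ToyTy::Error);
    assert_eq!(errors.len(), 1);
    println!("{:?}", errors);
}
```
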
Sometimes there is a third case. You believe that an error has been reported, but you believe it
would've been reported earlier in the compilation, not locally. In that case, you can invoke
[`delay_span_bug`]. This will make a note that you expect compilation to yield an error. If
compilation nevertheless succeeds, it will trigger a compiler bug report.

[`delay_span_bug`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/struct.Session.html#method.delay_span_bug

For added safety, it's not actually possible to produce a `TyKind::Error` value
outside of [`rustc_middle::ty`][ty]; there is a private member of
`TyKind::Error` that prevents it from being constructible elsewhere. Instead,
one should use the [`TyCtxt::ty_error`][terr] or
[`TyCtxt::ty_error_with_message`][terrmsg] methods. These methods automatically
call `delay_span_bug` before returning an interned `Ty` of kind `Error`. If you
were already planning to use [`delay_span_bug`], then you can just pass the
span and message to [`ty_error_with_message`][terrmsg] instead to avoid
delaying a redundant span bug.

[terr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.ty_error
[terrmsg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.ty_error_with_message

## Question: Why not substitute “inside” the `AdtDef`?

Recall that we represent a generic struct with `(AdtDef, substs)`. So why bother with this scheme?

Well, the alternate way we could have chosen to represent types would be to always create a new,
fully-substituted form of the `AdtDef` where all the types are already substituted. This seems like
less of a hassle. However, the `(AdtDef, substs)` scheme has some advantages over this.

First, the `(AdtDef, substs)` scheme has an efficiency win:

```rust,ignore
struct MyStruct<T> {
  ... 100s of fields ...
}

// Want to do: MyStruct<A> ==> MyStruct<B>
```

In an example like this, we can subst from `MyStruct<A>` to `MyStruct<B>` (and so on) very cheaply,
by just replacing the one reference to `A` with `B`. But if we eagerly substituted all the fields,
that could be a lot more work because we might have to go through all of the fields in the `AdtDef`
and update all of their types.

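A small model makes the cost difference visible. In this sketch (hypothetical names `ToyTy`, `ToyAdtDef`, `subst`, `field_ty`; a slice stands in for the interned substs list), the field types are stored once in terms of parameters, and a concrete field type is computed only when someone asks for it. Switching from one instantiation to another means building a new argument list, while the definition is left untouched.

```rust
/// Toy types. `Param(i)` is the i-th generic parameter of the enclosing definition.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ToyTy {
    U32,
    Bool,
    Param(usize),
}

/// Toy stand-in for `AdtDef`: field types are stored once, in terms of parameters.
struct ToyAdtDef {
    field_tys: Vec<ToyTy>,
}

/// Replace a parameter with the matching entry of `substs`; other types are unchanged.
fn subst(ty: &ToyTy, substs: &[ToyTy]) -> ToyTy {
    match ty {
        ToyTy::Param(i) => substs[*i].clone(),
        other => other.clone(),
    }
}

/// The type of one field of `def<substs>`, computed only when asked for.
fn field_ty(def: &ToyAdtDef, substs: &[ToyTy], field: usize) -> ToyTy {
    subst(&def.field_tys[field], substs)
}

fn main() {
    // struct MyStruct<T> { x: u32, y: T }
    let my_struct = ToyAdtDef { field_tys: vec![ToyTy::U32, ToyTy::Param(0)] };

    // The two instantiations differ only in these one-element lists;
    // the definition and its field list are shared.
    let substs_a = [ToyTy::U32];
    let substs_b = [ToyTy::Bool];

    assert_eq!(field_ty(&my_struct, &substs_a, 1), ToyTy::U32);
    assert_eq!(field_ty(&my_struct, &substs_b, 1), ToyTy::Bool);
    assert_eq!(field_ty(&my_struct, &substs_b, 0), ToyTy::U32);
    println!("field `y` of the second instantiation: {:?}", field_ty(&my_struct, &substs_b, 1));
}
```
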
A bit more deeply, this corresponds to structs in Rust being [*nominal* types][nominal], which
means that they are defined by their *name* (and that their contents are then indexed from the
definition of that name, and not carried along “within” the type itself).

[nominal]: https://en.wikipedia.org/wiki/Nominal_type_system