//! The Rust Linkage Model and Symbol Names
//! =======================================
//!
//! The semantic model of Rust linkage is, broadly, that "there's no global
//! namespace" between crates. Our aim is to preserve the illusion of this
//! model despite the fact that it's not *quite* possible to implement on
//! modern linkers. We initially didn't use system linkers at all, but have
//! been convinced of their utility.
//!
//! There are a few issues to handle:
//!
//! - Linkers operate on a flat namespace, so we have to flatten names.
//!   We do this using the C++ namespace-mangling technique, which encodes a
//!   path such as `foo::bar` into a single symbol string.
//!
//! - Symbols for distinct items with the same *name* need to get different
//!   linkage-names. Examples of this are monomorphizations of functions or
//!   items within anonymous scopes that end up having the same path.
//!
//! - Symbols in different crates but with the same names "within" the crate
//!   need to get different linkage-names.
//!
//! - Symbol names should be deterministic: two consecutive runs of the
//!   compiler over the same code base should produce the same symbol names for
//!   the same items.
//!
//! - Symbol names should not depend on any global properties of the code base,
//!   so that small modifications to the code base do not result in all symbols
//!   changing. In previous versions of the compiler, symbol names incorporated
//!   the SVH (Stable Version Hash) of the crate. This scheme turned out to be
//!   infeasible when used in conjunction with incremental compilation because
//!   small code changes would invalidate all symbols generated previously.
//!
//! - Even symbols from different versions of the same crate should be able to
//!   live next to each other without conflict.
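//!
//! To make the flattening step concrete, here is a minimal sketch of the
//! C++ (Itanium) nested-name encoding. Every path component is prefixed with
//! its length, and the whole path is wrapped in `_ZN` ... `E`. The helper
//! `flatten_path` is hypothetical and only illustrates the idea. The `legacy`
//! mangler also appends a hash and escapes special characters, and the `v0`
//! mangler uses a different encoding altogether.
//!
//! ```rust
//! // Hypothetical helper: encode `a::b::c` as an Itanium-style nested name.
//! fn flatten_path(components: &[&str]) -> String {
//!     let mut out = String::from("_ZN");
//!     for component in components {
//!         // The length prefix lets a demangler split the components again.
//!         out.push_str(&component.len().to_string());
//!         out.push_str(component);
//!     }
//!     out.push('E');
//!     out
//! }
//! ```
//!
//! With this sketch, `flatten_path(&["foo", "bar"])` yields `_ZN3foo3barE`.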
//!
//! In order to fulfill the above requirements, the compiler uses the
//! following scheme:
//!
//! The main tool for avoiding naming conflicts is the incorporation of a 64-bit
//! hash value into every exported symbol name. Anything that makes a difference
//! to the symbol being named, but does not show up in the regular path, needs
//! to be fed into this hash:
//!
//! - Different monomorphizations of the same item have the same path but differ
//!   in their concrete type parameters, so these parameters are part of the
//!   data being digested for the symbol hash.
//!
//! - Rust allows items to be defined in anonymous scopes, such as in
//!   `fn foo() { { fn bar() {} } { fn bar() {} } }`. Both `bar` functions have
//!   the path `foo::bar`, since the anonymous scopes do not contribute to the
//!   path of an item. The compiler already handles this case via so-called
//!   disambiguating `DefPath`s, which use indices to distinguish items with the
//!   same name. The `DefPath`s of the functions above are thus `foo[0]::bar[0]`
//!   and `foo[0]::bar[1]`. In order to incorporate this disambiguation
//!   information into the symbol name too, these indices are fed into the
//!   symbol hash, so that the above two symbols end up with different hash
//!   values.
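//!
//! Here is a minimal sketch of feeding this extra information into a 64-bit
//! hash. It assumes a hypothetical `PathComponent` type for disambiguated
//! `DefPath` entries. `DefaultHasher` from the standard library is only a
//! stand-in. It is not what the compiler uses, and its algorithm is not
//! guaranteed to stay the same across Rust releases.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::hash::{Hash, Hasher};
//!
//! // Hypothetical `DefPath` component: a name plus its disambiguating index.
//! #[derive(Hash)]
//! struct PathComponent<'a> {
//!     name: &'a str,
//!     disambiguator: u32,
//! }
//!
//! // Hash the disambiguated path together with the concrete type arguments
//! // of the monomorphization.
//! fn symbol_hash(path: &[PathComponent<'_>], type_args: &[&str]) -> u64 {
//!     let mut hasher = DefaultHasher::new();
//!     path.hash(&mut hasher);
//!     type_args.hash(&mut hasher);
//!     hasher.finish()
//! }
//! ```
//!
//! Under this sketch, `foo[0]::bar[0]` and `foo[0]::bar[1]` hash differently
//! even though both print as `foo::bar`. The same holds for `foo::bar::<u32>`
//! and `foo::bar::<i64>`.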
//!
//! The two measures described above suffice to avoid intra-crate conflicts. In
//! order to also avoid inter-crate conflicts, two more measures are taken:
//!
//! - The name of the crate containing the symbol is prepended to the symbol
//!   name, i.e., symbols are "crate qualified". For example, a function `foo` in
//!   module `bar` in crate `baz` would get a symbol name like
//!   `baz::bar::foo::{hash}` instead of just `bar::foo::{hash}`. This avoids
//!   simple conflicts between functions from different crates.
//!
//! - In order to be able to also use symbols from two versions of the same
//!   crate (which naturally also have the same name), a stronger measure is
//!   required: the compiler accepts an arbitrary "disambiguator" value via the
//!   `-C metadata` command-line argument. This disambiguator is then fed into
//!   the symbol hash of every exported item. Consequently, the symbols in two
//!   identical crates but with different disambiguators are not in conflict
//!   with each other. This facility is mainly intended to be used by build
//!   tools like Cargo.
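//!
//! The following sketch combines both measures in the style of the legacy
//! scheme, where the hash is appended as a final path component made of `h`
//! followed by 16 hex digits. The function name, its parameters and the use
//! of `DefaultHasher` are assumptions for illustration. The real hash also
//! covers the items discussed above (type arguments, `DefPath`
//! disambiguators) and is computed differently.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::hash::{Hash, Hasher};
//!
//! // Hypothetical: build a crate-qualified symbol whose trailing hash also
//! // covers the `-C metadata` disambiguator.
//! fn crate_qualified_symbol(krate: &str, path: &[&str], disambiguator: &str) -> String {
//!     let mut hasher = DefaultHasher::new();
//!     krate.hash(&mut hasher);
//!     path.hash(&mut hasher);
//!     disambiguator.hash(&mut hasher);
//!     let hash_component = format!("h{:016x}", hasher.finish());
//!
//!     // Crate name first, then the module path, then the hash, each
//!     // length-prefixed as in the C++ nested-name encoding.
//!     let mut out = String::from("_ZN");
//!     for component in std::iter::once(krate).chain(path.iter().copied()) {
//!         out.push_str(&component.len().to_string());
//!         out.push_str(component);
//!     }
//!     out.push_str(&hash_component.len().to_string());
//!     out.push_str(&hash_component);
//!     out.push('E');
//!     out
//! }
//! ```
//!
//! Here `crate_qualified_symbol("baz", &["bar", "foo"], "1a2b")` produces a
//! symbol starting with `_ZN3baz3bar3foo17h`. Changing only the disambiguator,
//! for example to `"3c4d"`, yields a different hash and therefore a symbol
//! that does not clash with the first one.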
//!
//! A note on symbol name stability
//! -------------------------------
//! Previous versions of the compiler resorted to feeding `NodeId`s into the
//! symbol hash in order to disambiguate between items with the same path. The
//! current version of the name generation algorithm takes great care not to do
//! that, since `NodeId`s are notoriously unstable: a small change to the
//! code base will offset all `NodeId`s after the change and thus, much as using
//! the SVH in the hash, invalidate an unbounded number of symbol names. This
//! makes re-using previously compiled code for incremental compilation
//! virtually impossible. Thus, symbol hash generation relies exclusively on
//! `DefPath`s, which are much more robust in the face of changes to the code
//! base.
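//!
//! Here is a toy model of why this matters, built from two hypothetical
//! helpers. `node_ids` numbers items sequentially across the whole crate,
//! like `NodeId`s. `def_path_disambiguators` only counts earlier items with
//! the *same* name, like `DefPath` disambiguators. Real `NodeId` and `DefPath`
//! assignment is more involved. This sketch only captures the property
//! described above.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! // Global sequential numbering: inserting any item shifts all later ids.
//! fn node_ids(items: &[&str]) -> Vec<(String, usize)> {
//!     items.iter().enumerate().map(|(i, name)| (name.to_string(), i)).collect()
//! }
//!
//! // Per-name numbering: only items with the same name affect each other.
//! fn def_path_disambiguators(items: &[&str]) -> Vec<(String, u32)> {
//!     let mut seen: HashMap<&str, u32> = HashMap::new();
//!     items
//!         .iter()
//!         .map(|name| {
//!             let counter = seen.entry(*name).or_insert(0);
//!             let disambiguator = *counter;
//!             *counter += 1;
//!             (name.to_string(), disambiguator)
//!         })
//!         .collect()
//! }
//! ```
//!
//! Inserting a new item `baz` in front of `["foo", "bar"]` moves `bar` from
//! id 1 to id 2 under `node_ids`. Under `def_path_disambiguators`, `bar` keeps
//! disambiguator 0, so a hash built from the latter is unaffected.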

#![doc(html_root_url = "https://doc.rust-lang.org/nightly/")]
#![feature(never_type)]
#![feature(nll)]
#![feature(or_patterns)]
#![feature(in_band_lifetimes)]
#![recursion_limit = "256"]

#[macro_use]
extern crate rustc_middle;

use rustc_hir::def_id::{CrateNum, LOCAL_CRATE};
use rustc_hir::Node;
use rustc_middle::middle::codegen_fn_attrs::CodegenFnAttrFlags;
use rustc_middle::mir::mono::{InstantiationMode, MonoItem};
use rustc_middle::ty::query::Providers;
use rustc_middle::ty::subst::SubstsRef;
use rustc_middle::ty::{self, Instance, TyCtxt};
use rustc_session::config::SymbolManglingVersion;

use rustc_span::symbol::Symbol;

use log::debug;

mod legacy;
mod v0;

pub mod test;

/// This function computes the symbol name for the given `instance` and the
/// given instantiating crate. That is, if you know that instance X is
/// instantiated in crate Y, this is the symbol name this instance would have.
pub fn symbol_name_for_instance_in_crate(
    tcx: TyCtxt<'tcx>,
    instance: Instance<'tcx>,
    instantiating_crate: CrateNum,
) -> String {
    compute_symbol_name(tcx, instance, || instantiating_crate)
}

pub fn provide(providers: &mut Providers<'_>) {
    *providers = Providers { symbol_name: symbol_name_provider, ..*providers };
}

// The `symbol_name` query provides the symbol name for calling a given
// instance from the local crate. In particular, it will also look up the
// correct symbol name of instances from upstream crates.
fn symbol_name_provider(tcx: TyCtxt<'tcx>, instance: Instance<'tcx>) -> ty::SymbolName {
    let symbol_name = compute_symbol_name(tcx, instance, || {
        // This closure determines the instantiating crate for instances that
        // need an instantiating-crate suffix for their symbol name, in order
        // to differentiate between local copies.
        if is_generic(instance.substs) {
            // For generics we might find a re-usable upstream instance. If
            // there is one, its crate becomes the instantiating crate and we
            // link against that copy; otherwise we rely on the symbol being
            // instantiated locally.
            instance.upstream_monomorphization(tcx).unwrap_or(LOCAL_CRATE)
        } else {
            // For non-generic things that need to avoid naming conflicts, we
            // always instantiate a copy in the local crate.
            LOCAL_CRATE
        }
    });

    ty::SymbolName { name: Symbol::intern(&symbol_name) }
}

/// Computes the symbol name for the given instance. This function will call
/// `compute_instantiating_crate` if it needs to factor the instantiating crate
/// into the symbol name.
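///
/// Before any mangling happens, a few attributes can short-circuit the name.
/// The sketch below is a simplified, hypothetical model of that precedence.
/// It leaves out the plugin-registrar and proc-macro-decls special cases and
/// collapses the wasm32 import-module check into a single flag.
///
/// ```rust
/// // Hypothetical summary of the attributes consulted below.
/// struct Attrs<'a> {
///     is_foreign: bool,
///     // True only on wasm32 when the item has a wasm import module.
///     wasm_import_module: bool,
///     link_name: Option<&'a str>,
///     export_name: Option<&'a str>,
///     no_mangle: bool,
/// }
///
/// // Returns the unmangled symbol name if an early exit applies, or `None`
/// // if the item goes on to `legacy::mangle` or `v0::mangle`.
/// fn unmangled_name<'a>(item_name: &'a str, attrs: &Attrs<'a>) -> Option<&'a str> {
///     // Foreign items keep their (possibly `#[link_name]`-overridden) name,
///     // except for wasm imports with an import module.
///     if attrs.is_foreign && !attrs.wasm_import_module {
///         return Some(attrs.link_name.unwrap_or(item_name));
///     }
///     // `#[export_name]` takes precedence over `#[no_mangle]`.
///     if let Some(name) = attrs.export_name {
///         return Some(name);
///     }
///     if attrs.no_mangle {
///         return Some(item_name);
///     }
///     None
/// }
/// ```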
fn compute_symbol_name(
    tcx: TyCtxt<'tcx>,
    instance: Instance<'tcx>,
    compute_instantiating_crate: impl FnOnce() -> CrateNum,
) -> String {
    let def_id = instance.def_id();
    let substs = instance.substs;

    debug!("symbol_name(def_id={:?}, substs={:?})", def_id, substs);

    // FIXME(eddyb) Precompute a custom symbol name based on attributes.
    let is_foreign = if let Some(def_id) = def_id.as_local() {
        if tcx.plugin_registrar_fn(LOCAL_CRATE) == Some(def_id.to_def_id()) {
            let disambiguator = tcx.sess.local_crate_disambiguator();
            return tcx.sess.generate_plugin_registrar_symbol(disambiguator);
        }
        if tcx.proc_macro_decls_static(LOCAL_CRATE) == Some(def_id.to_def_id()) {
            let disambiguator = tcx.sess.local_crate_disambiguator();
            return tcx.sess.generate_proc_macro_decls_symbol(disambiguator);
        }
        let hir_id = tcx.hir().as_local_hir_id(def_id);
        match tcx.hir().get(hir_id) {
            Node::ForeignItem(_) => true,
            _ => false,
        }
    } else {
        tcx.is_foreign_item(def_id)
    };

    let attrs = tcx.codegen_fn_attrs(def_id);

    // Foreign items by default use no mangling for their symbol name. There
    // are a few exceptions to this rule, though:
    //
    // * This can be overridden with the `#[link_name]` attribute.
    //
    // * On the wasm32 targets there is a bug (or feature) in LLD [1] where the
    //   same-named symbol, when imported from different wasm modules, will get
    //   hooked up incorrectly. As a result, foreign symbols on the wasm target
    //   that have a wasm import module get mangled. Additionally, our codegen
    //   deduplicates symbols based purely on the symbol name, but for wasm this
    //   isn't quite right because the same-named symbol on wasm can come from
    //   different modules. For these reasons, if `#[link(wasm_import_module)]`
    //   is present we mangle everything on wasm, because the demangled form
    //   will show up in the `wasm-import-name` custom attribute in LLVM IR.
    //
    // [1]: https://bugs.llvm.org/show_bug.cgi?id=44316
    if is_foreign {
        if tcx.sess.target.target.arch != "wasm32"
            || !tcx.wasm_import_module_map(def_id.krate).contains_key(&def_id)
        {
            if let Some(name) = attrs.link_name {
                return name.to_string();
            }
            return tcx.item_name(def_id).to_string();
        }
    }

    if let Some(name) = attrs.export_name {
        // Use the provided name.
        return name.to_string();
    }

    if attrs.flags.contains(CodegenFnAttrFlags::NO_MANGLE) {
        // Don't mangle.
        return tcx.item_name(def_id).to_string();
    }

    let avoid_cross_crate_conflicts =
        // If this is an instance of a generic function, we also hash in
        // the ID of the instantiating crate. This avoids symbol conflicts
        // in case the same instance is emitted in two crates of the same
        // project.
        is_generic(substs) ||

        // If we're dealing with an instance of a function that's inlined from
        // another crate but we're marking it as globally shared to our
        // compilation (aka we're not making an internal copy in each of our
        // codegen units), then this symbol may become an exported (but hidden
        // visibility) symbol. This means that multiple crates may do the same
        // and we want to be sure to avoid any symbol conflicts here.
        match MonoItem::Fn(instance).instantiation_mode(tcx) {
            InstantiationMode::GloballyShared { may_conflict: true } => true,
            _ => false,
        };

    let instantiating_crate =
        if avoid_cross_crate_conflicts { Some(compute_instantiating_crate()) } else { None };

    // Pick the crate responsible for the symbol mangling version, which has to:
    // 1. be stable for each instance, whether it's being defined or imported
    // 2. obey each crate's own `-Z symbol-mangling-version`, as much as possible
    // We solve these as follows:
    // 1. because symbol names depend on both `def_id` and `instantiating_crate`,
    //    both their `CrateNum`s are stable for any given instance, so we can pick
    //    either and have a stable choice of symbol mangling version
    // 2. we favor `instantiating_crate` where possible (i.e. when `Some`)
    let mangling_version_crate = instantiating_crate.unwrap_or(def_id.krate);
    let mangling_version = if mangling_version_crate == LOCAL_CRATE {
        tcx.sess.opts.debugging_opts.symbol_mangling_version
    } else {
        tcx.symbol_mangling_version(mangling_version_crate)
    };

    match mangling_version {
        SymbolManglingVersion::Legacy => legacy::mangle(tcx, instance, instantiating_crate),
        SymbolManglingVersion::V0 => v0::mangle(tcx, instance, instantiating_crate),
    }
}

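/// Returns `true` if `substs` contains a generic argument that is not erased
/// before codegen, i.e. a type or a const. Lifetime arguments are erased, so
/// they do not make an instance generic for symbol-naming purposes.
///
/// A minimal model of this check follows. It uses a hypothetical `GenericArg`
/// enum in place of the compiler's interned substitutions.
///
/// ```rust
/// // Hypothetical stand-in for the kinds of generic arguments.
/// enum GenericArg {
///     Lifetime,
///     Type,
///     Const,
/// }
///
/// // Mirrors `substs.non_erasable_generics().next().is_some()`: any
/// // non-lifetime argument makes the instance generic.
/// fn is_generic(args: &[GenericArg]) -> bool {
///     args.iter().any(|arg| !matches!(arg, GenericArg::Lifetime))
/// }
/// ```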
fn is_generic(substs: SubstsRef<'_>) -> bool {
    substs.non_erasable_generics().next().is_some()
}