New upstream version 1.67.1+dfsg1
# Code generation attributes

The following [attributes] are used for controlling code generation.

## Optimization hints

The `cold` and `inline` [attributes] give suggestions to generate code in a
way that may be faster than what the compiler would produce without the hint.
The attributes are only hints, and may be ignored.

Both attributes can be used on [functions]. When applied to a function in a
[trait], they apply only to that function when used as a default function for
a trait implementation and not to all trait implementations. The attributes
have no effect on a trait function without a body.

### The `inline` attribute

The *`inline` [attribute]* suggests that a copy of the attributed function
should be placed in the caller, rather than generating code to call the
function where it is defined.

> ***Note***: The `rustc` compiler automatically inlines functions based on
> internal heuristics. Incorrectly inlining functions can make the program
> slower, so this attribute should be used with care.

There are three ways to use the inline attribute:

* `#[inline]` *suggests* performing an inline expansion.
* `#[inline(always)]` *suggests* that an inline expansion should always be
  performed.
* `#[inline(never)]` *suggests* that an inline expansion should never be
  performed.

> ***Note***: `#[inline]` in every form is a hint, with no *requirements*
> on the language to place a copy of the attributed function in the caller.
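
As a minimal sketch of the three forms listed above, each attribute is placed directly on the function it should influence. The function names are hypothetical and chosen only for illustration; the compiler remains free to ignore every one of these hints.

```rust
// Hypothetical helpers; only the attributes matter here.

/// Small and frequently called: suggest an inline expansion.
#[inline]
fn square(x: u32) -> u32 {
    x * x
}

/// Suggest that calls are always expanded inline.
#[inline(always)]
fn add_one(x: u32) -> u32 {
    x + 1
}

/// Suggest that calls are never expanded inline.
#[inline(never)]
fn describe(x: u32) -> String {
    format!("value = {x}")
}

fn main() {
    let y = add_one(square(3));
    println!("{}", describe(y));
}
```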

### The `cold` attribute

The *`cold` [attribute]* suggests that the attributed function is unlikely to
be called.
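
A typical use, sketched here under the assumption that the error path is rare, is to mark a failure handler as `cold` so that the common path can be laid out more favorably. `report_overflow` and `checked_sum` are hypothetical names, not part of any library.

```rust
/// Hypothetical error path that is expected to run rarely.
#[cold]
fn report_overflow(a: u32, b: u32) -> u32 {
    eprintln!("overflow adding {a} and {b}; saturating");
    u32::MAX
}

/// Adds two numbers, falling back to the cold path on overflow.
fn checked_sum(a: u32, b: u32) -> u32 {
    match a.checked_add(b) {
        Some(sum) => sum,
        None => report_overflow(a, b),
    }
}

fn main() {
    println!("{}", checked_sum(1, 2));
    println!("{}", checked_sum(u32::MAX, 1));
}
```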

## The `no_builtins` attribute

The *`no_builtins` [attribute]* may be applied at the crate level to disable
optimizing certain code patterns to invocations of library functions that are
assumed to exist.

## The `target_feature` attribute

The *`target_feature` [attribute]* may be applied to a function to
enable code generation of that function for specific platform architecture
features. It uses the [_MetaListNameValueStr_] syntax with a single key of
`enable` whose value is a string of comma-separated feature names to enable.

```rust
# #[cfg(target_feature = "avx2")]
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2() {}
```

Each [target architecture] has a set of features that may be enabled. It is an
error to specify a feature for a target architecture that the crate is not
being compiled for.

It is [undefined behavior] to call a function that is compiled with a feature
that is not supported on the current platform the code is running on, *except*
if the platform explicitly documents this to be safe.

Functions marked with `target_feature` are not inlined into a context that
does not support the given features. The `#[inline(always)]` attribute may not
be used with a `target_feature` attribute.

### Available features

The following is a list of the available feature names.

#### `x86` or `x86_64`

Executing code with unsupported features is undefined behavior on this platform.
Hence this platform requires that `#[target_feature]` is only applied to [`unsafe`
functions][unsafe function].

Feature     | Implicitly Enables | Description
------------|--------------------|-------------------
`adx`       |          | [ADX] — Multi-Precision Add-Carry Instruction Extensions
`aes`       | `sse2`   | [AES] — Advanced Encryption Standard
`avx`       | `sse4.2` | [AVX] — Advanced Vector Extensions
`avx2`      | `avx`    | [AVX2] — Advanced Vector Extensions 2
`bmi1`      |          | [BMI1] — Bit Manipulation Instruction Sets
`bmi2`      |          | [BMI2] — Bit Manipulation Instruction Sets 2
`fma`       | `avx`    | [FMA3] — Three-operand fused multiply-add
`fxsr`      |          | [`fxsave`] and [`fxrstor`] — Save and restore x87 FPU, MMX Technology, and SSE State
`lzcnt`     |          | [`lzcnt`] — Leading zeros count
`pclmulqdq` | `sse2`   | [`pclmulqdq`] — Packed carry-less multiplication quadword
`popcnt`    |          | [`popcnt`] — Count of bits set to 1
`rdrand`    |          | [`rdrand`] — Read random number
`rdseed`    |          | [`rdseed`] — Read random seed
`sha`       | `sse2`   | [SHA] — Secure Hash Algorithm
`sse`       |          | [SSE] — Streaming <abbr title="Single Instruction Multiple Data">SIMD</abbr> Extensions
`sse2`      | `sse`    | [SSE2] — Streaming SIMD Extensions 2
`sse3`      | `sse2`   | [SSE3] — Streaming SIMD Extensions 3
`sse4.1`    | `ssse3`  | [SSE4.1] — Streaming SIMD Extensions 4.1
`sse4.2`    | `sse4.1` | [SSE4.2] — Streaming SIMD Extensions 4.2
`ssse3`     | `sse3`   | [SSSE3] — Supplemental Streaming SIMD Extensions 3
`xsave`     |          | [`xsave`] — Save processor extended states
`xsavec`    |          | [`xsavec`] — Save processor extended states with compaction
`xsaveopt`  |          | [`xsaveopt`] — Save processor extended states optimized
`xsaves`    |          | [`xsaves`] — Save processor extended states supervisor

<!-- Keep links near each table to make it easier to move and update. -->

[ADX]: https://en.wikipedia.org/wiki/Intel_ADX
[AES]: https://en.wikipedia.org/wiki/AES_instruction_set
[AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
[AVX2]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX2
[BMI1]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
[BMI2]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#BMI2
[FMA3]: https://en.wikipedia.org/wiki/FMA_instruction_set
[`fxsave`]: https://www.felixcloutier.com/x86/fxsave
[`fxrstor`]: https://www.felixcloutier.com/x86/fxrstor
[`lzcnt`]: https://www.felixcloutier.com/x86/lzcnt
[`pclmulqdq`]: https://www.felixcloutier.com/x86/pclmulqdq
[`popcnt`]: https://www.felixcloutier.com/x86/popcnt
[`rdrand`]: https://en.wikipedia.org/wiki/RdRand
[`rdseed`]: https://en.wikipedia.org/wiki/RdRand
[SHA]: https://en.wikipedia.org/wiki/Intel_SHA_extensions
[SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
[SSE2]: https://en.wikipedia.org/wiki/SSE2
[SSE3]: https://en.wikipedia.org/wiki/SSE3
[SSE4.1]: https://en.wikipedia.org/wiki/SSE4#SSE4.1
[SSE4.2]: https://en.wikipedia.org/wiki/SSE4#SSE4.2
[SSSE3]: https://en.wikipedia.org/wiki/SSSE3
[`xsave`]: https://www.felixcloutier.com/x86/xsave
[`xsavec`]: https://www.felixcloutier.com/x86/xsavec
[`xsaveopt`]: https://www.felixcloutier.com/x86/xsaveopt
[`xsaves`]: https://www.felixcloutier.com/x86/xsaves

#### `aarch64`

This platform requires that `#[target_feature]` is only applied to [`unsafe`
functions][unsafe function].

Further documentation on these features can be found in the [ARM Architecture
Reference Manual], or elsewhere on [developer.arm.com].

[ARM Architecture Reference Manual]: https://developer.arm.com/documentation/ddi0487/latest
[developer.arm.com]: https://developer.arm.com

> ***Note***: The following pairs of features should both be marked as enabled
> or disabled together if used:
> - `paca` and `pacg`, which LLVM currently implements as one feature.

Feature        | Implicitly Enables | Feature Name
---------------|--------------------|-------------------
`aes`          | `neon`         | FEAT_AES - Advanced <abbr title="Single Instruction Multiple Data">SIMD</abbr> AES instructions
`bf16`         |                | FEAT_BF16 - BFloat16 instructions
`bti`          |                | FEAT_BTI - Branch Target Identification
`crc`          |                | FEAT_CRC - CRC32 checksum instructions
`dit`          |                | FEAT_DIT - Data Independent Timing instructions
`dotprod`      |                | FEAT_DotProd - Advanced SIMD Int8 dot product instructions
`dpb`          |                | FEAT_DPB - Data cache clean to point of persistence
`dpb2`         |                | FEAT_DPB2 - Data cache clean to point of deep persistence
`f32mm`        | `sve`          | FEAT_F32MM - SVE single-precision FP matrix multiply instruction
`f64mm`        | `sve`          | FEAT_F64MM - SVE double-precision FP matrix multiply instruction
`fcma`         | `neon`         | FEAT_FCMA - Floating point complex number support
`fhm`          | `fp16`         | FEAT_FHM - Half-precision FP FMLAL instructions
`flagm`        |                | FEAT_FlagM - Conditional flag manipulation
`fp16`         | `neon`         | FEAT_FP16 - Half-precision FP data processing
`frintts`      |                | FEAT_FRINTTS - Floating-point to int helper instructions
`i8mm`         |                | FEAT_I8MM - Int8 Matrix Multiplication
`jsconv`       | `neon`         | FEAT_JSCVT - JavaScript conversion instruction
`lse`          |                | FEAT_LSE - Large System Extension
`lor`          |                | FEAT_LOR - Limited Ordering Regions extension
`mte`          |                | FEAT_MTE - Memory Tagging Extension
`neon`         |                | FEAT_FP & FEAT_AdvSIMD - Floating Point and Advanced SIMD extension
`pan`          |                | FEAT_PAN - Privileged Access-Never extension
`paca`         |                | FEAT_PAuth - Pointer Authentication (address authentication)
`pacg`         |                | FEAT_PAuth - Pointer Authentication (generic authentication)
`pmuv3`        |                | FEAT_PMUv3 - Performance Monitors extension (v3)
`rand`         |                | FEAT_RNG - Random Number Generator
`ras`          |                | FEAT_RAS - Reliability, Availability and Serviceability extension
`rcpc`         |                | FEAT_LRCPC - Release consistent Processor Consistent
`rcpc2`        | `rcpc`         | FEAT_LRCPC2 - RcPc with immediate offsets
`rdm`          |                | FEAT_RDM - Rounding Double Multiply accumulate
`sb`           |                | FEAT_SB - Speculation Barrier
`sha2`         | `neon`         | FEAT_SHA1 & FEAT_SHA256 - Advanced SIMD SHA instructions
`sha3`         | `sha2`         | FEAT_SHA512 & FEAT_SHA3 - Advanced SIMD SHA instructions
`sm4`          | `neon`         | FEAT_SM3 & FEAT_SM4 - Advanced SIMD SM3/4 instructions
`spe`          |                | FEAT_SPE - Statistical Profiling Extension
`ssbs`         |                | FEAT_SSBS - Speculative Store Bypass Safe
`sve`          | `fp16`         | FEAT_SVE - Scalable Vector Extension
`sve2`         | `sve`          | FEAT_SVE2 - Scalable Vector Extension 2
`sve2-aes`     | `sve2`, `aes`  | FEAT_SVE_AES - SVE AES instructions
`sve2-sm4`     | `sve2`, `sm4`  | FEAT_SVE_SM4 - SVE SM4 instructions
`sve2-sha3`    | `sve2`, `sha3` | FEAT_SVE_SHA3 - SVE SHA3 instructions
`sve2-bitperm` | `sve2`         | FEAT_SVE_BitPerm - SVE Bit Permute
`tme`          |                | FEAT_TME - Transactional Memory Extension
`vh`           |                | FEAT_VHE - Virtualization Host Extensions

#### `wasm32` or `wasm64`

`#[target_feature]` may be used with both safe and
[`unsafe` functions][unsafe function] on Wasm platforms. It is impossible to
cause undefined behavior via the `#[target_feature]` attribute because
attempting to use instructions unsupported by the Wasm engine will fail at load
time without the risk of being interpreted in a way different from what the
compiler expected.

Feature     | Description
------------|-------------------
`simd128`   | [WebAssembly simd proposal][simd128]

[simd128]: https://github.com/webassembly/simd

### Additional information

See the [`target_feature` conditional compilation option] for selectively
enabling or disabling compilation of code based on compile-time settings. Note
that this option is not affected by the `target_feature` attribute, and is
only driven by the features enabled for the entire crate.
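
The compile-time side can be sketched with the `cfg!` macro. The function name `compiled_backend` is hypothetical. The result depends only on the features enabled for the whole crate (for example via `-C target-feature=+avx2`), not on any `#[target_feature]` attribute placed on individual functions.

```rust
/// Reports which code path was selected when the crate was compiled.
/// Hypothetical helper: the choice is fixed at compile time.
fn compiled_backend() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else {
        "baseline"
    }
}

fn main() {
    println!("compiled for: {}", compiled_backend());
}
```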

See the [`is_x86_feature_detected`] or [`is_aarch64_feature_detected`] macros
in the standard library for runtime feature detection on these platforms.
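
One common way to combine the attribute with runtime detection is a safe wrapper that checks for the feature before calling the `unsafe` function. The following is a sketch under assumptions: `sum_avx2`, `sum_scalar`, and `sum` are hypothetical names, and the function bodies are deliberately trivial so that only the dispatch pattern is shown.

```rust
// Compiled with AVX2 enabled for this one function (x86 targets only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    // The compiler may use AVX2 instructions when generating this body.
    xs.iter().sum()
}

// Portable fallback that assumes no optional features.
fn sum_scalar(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

/// Safe entry point: picks an implementation at runtime.
fn sum(xs: &[i32]) -> i32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified on the running CPU.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}

fn main() {
    println!("{}", sum(&[1, 2, 3, 4]));
}
```

Keeping the check and the `unsafe` call inside one safe function means callers cannot reach the feature-specific code on a CPU that lacks the feature.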

> Note: `rustc` has a default set of features enabled for each target and CPU.
> The CPU may be chosen with the [`-C target-cpu`] flag. Individual features
> may be enabled or disabled for an entire crate with the
> [`-C target-feature`] flag.

## The `track_caller` attribute

The `track_caller` attribute may be applied to any function with [`"Rust"` ABI][rust-abi]
with the exception of the entry point `fn main`. When applied to functions and methods in
trait declarations, the attribute applies to all implementations. If the trait provides a
default implementation with the attribute, then the attribute also applies to override implementations.
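
The trait rule can be sketched as follows. The trait `Probe` and the type `Unit` are hypothetical; the attribute appears only on the method declaration, and the implementation does not repeat it.

```rust
use std::panic::Location;

trait Probe {
    // Attribute on the declaration only; per the rule above it applies
    // to every implementation of this method.
    #[track_caller]
    fn call_site_line(&self) -> u32;
}

struct Unit;

impl Probe for Unit {
    // No `#[track_caller]` repeated here.
    fn call_site_line(&self) -> u32 {
        Location::caller().line()
    }
}

fn main() {
    // Both values are produced on the same source line, so they can be compared.
    let (observed, here) = (Unit.call_site_line(), line!());
    println!("observed line {observed}, called from line {here}");
}
```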

When applied to a function in an `extern` block the attribute must also be applied to any linked
implementations, otherwise undefined behavior results. When applied to a function which is made
available to an `extern` block, the declaration in the `extern` block must also have the attribute,
otherwise undefined behavior results.

### Behavior

Applying the attribute to a function `f` allows code within `f` to get a hint of the [`Location`] of
the "topmost" tracked call that led to `f`'s invocation. At the point of observation, an
implementation behaves as if it walks up the stack from `f`'s frame to find the nearest frame of an
*unattributed* function `outer`, and it returns the [`Location`] of the tracked call in `outer`.

```rust
#[track_caller]
fn f() {
    println!("{}", std::panic::Location::caller());
}
```

> Note: `core` provides [`core::panic::Location::caller`] for observing caller locations. It wraps
> the [`core::intrinsics::caller_location`] intrinsic implemented by `rustc`.

> Note: because the resulting `Location` is a hint, an implementation may halt its walk up the stack
> early. See [Limitations](#limitations) for important caveats.

#### Examples

When `f` is called directly by `calls_f`, code in `f` observes its callsite within `calls_f`:

```rust
# #[track_caller]
# fn f() {
#     println!("{}", std::panic::Location::caller());
# }
fn calls_f() {
    f(); // <-- f() prints this location
}
```

When `f` is called by another attributed function `g` which is in turn called by `calls_g`, code in
both `f` and `g` observes `g`'s callsite within `calls_g`:

```rust
# #[track_caller]
# fn f() {
#     println!("{}", std::panic::Location::caller());
# }
#[track_caller]
fn g() {
    println!("{}", std::panic::Location::caller());
    f();
}

fn calls_g() {
    g(); // <-- g() prints this location twice, once itself and once from f()
}
```

When `g` is called by another attributed function `h` which is in turn called by `calls_h`, all code
in `f`, `g`, and `h` observes `h`'s callsite within `calls_h`:

```rust
# #[track_caller]
# fn f() {
#     println!("{}", std::panic::Location::caller());
# }
# #[track_caller]
# fn g() {
#     println!("{}", std::panic::Location::caller());
#     f();
# }
#[track_caller]
fn h() {
    println!("{}", std::panic::Location::caller());
    g();
}

fn calls_h() {
    h(); // <-- prints this location three times, once itself, once from g(), once from f()
}
```

And so on.

### Limitations

This information is a hint and implementations are not required to preserve it.

In particular, coercing a function with `#[track_caller]` to a function pointer creates a shim which
appears to observers to have been called at the attributed function's definition site, losing actual
caller information across virtual calls. A common example of this coercion is the creation of a
trait object whose methods are attributed.
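
The difference between a direct call and a call through a function pointer can be observed with a small sketch. `caller_line` is a hypothetical helper; the exact line reported through the pointer is left unasserted here because it is only a hint, but it is not the line of the indirect call.

```rust
use std::panic::Location;

#[track_caller]
fn caller_line() -> u32 {
    Location::caller().line()
}

fn main() {
    // Direct call: the reported line is the line of the call itself.
    let (direct, here) = (caller_line(), line!());
    assert_eq!(direct, here);

    // Coercing to a function pointer goes through a shim, so the call below
    // is reported at the attributed function's definition site instead.
    let ptr: fn() -> u32 = caller_line;
    let (indirect, call_line) = (ptr(), line!());
    println!("indirect call on line {call_line} was reported as line {indirect}");
}
```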

> Note: The aforementioned shim for function pointers is necessary because `rustc` implements
> `track_caller` in a codegen context by appending an implicit parameter to the function ABI, but
> this would be unsound for an indirect call because the parameter is not a part of the function's
> type and a given function pointer type may or may not refer to a function with the attribute. The
> creation of a shim hides the implicit parameter from callers of the function pointer, preserving
> soundness.

[_MetaListNameValueStr_]: ../attributes.md#meta-item-attribute-syntax
[`-C target-cpu`]: ../../rustc/codegen-options/index.html#target-cpu
[`-C target-feature`]: ../../rustc/codegen-options/index.html#target-feature
[`is_x86_feature_detected`]: ../../std/arch/macro.is_x86_feature_detected.html
[`is_aarch64_feature_detected`]: ../../std/arch/macro.is_aarch64_feature_detected.html
[`target_feature` conditional compilation option]: ../conditional-compilation.md#target_feature
[attribute]: ../attributes.md
[attributes]: ../attributes.md
[functions]: ../items/functions.md
[target architecture]: ../conditional-compilation.md#target_arch
[trait]: ../items/traits.md
[undefined behavior]: ../behavior-considered-undefined.md
[unsafe function]: ../unsafe-keyword.md
[rust-abi]: ../items/external-blocks.md#abi
[`core::intrinsics::caller_location`]: ../../core/intrinsics/fn.caller_location.html
[`core::panic::Location::caller`]: ../../core/panic/struct.Location.html#method.caller
[`Location`]: ../../core/panic/struct.Location.html

## The `instruction_set` attribute

The *`instruction_set` attribute* may be applied to a function to enable code generation for a specific
instruction set supported by the target architecture. It uses the [_MetaListPath_] syntax and a path
composed of the architecture and instruction set to specify how to generate the code for
architectures where a single program may utilize multiple instruction sets.

The following values are available on targets for the `ARMv4` and `ARMv5te` architectures:

* `arm::a32` - Uses ARM code.
* `arm::t32` - Uses Thumb code.

<!-- ignore: arm-only -->
```rust,ignore
#[instruction_set(arm::a32)]
fn foo_arm_code() {}

#[instruction_set(arm::t32)]
fn bar_thumb_code() {}
```

[_MetaListPath_]: ../attributes.md#meta-item-attribute-syntax