Rust provides support for inline assembly via the `asm!` macro.
It can be used to embed handwritten assembly in the assembly output generated by the compiler.
Generally this should not be necessary, but it may be required where the needed performance or timing
cannot be achieved otherwise. Accessing low-level hardware primitives, e.g. in kernel code, may also demand this functionality.
> **Note**: the examples here are given in x86/x86-64 assembly, but other architectures are also supported.
10 Inline assembly is currently supported on the following architectures:
18 Let us start with the simplest possible example:
28 This will insert a NOP (no operation) instruction into the assembly generated by the compiler.
29 Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert
30 arbitrary instructions and break various invariants. The instructions to be inserted are listed
31 in the first argument of the `asm!` macro as a string literal.
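
As a minimal sketch of the simplest case described above, assuming an x86-64 target, the following wraps a single `nop` in a function. The name `run_nop` and the empty stand-in for other targets are illustrative assumptions, not part of the original text.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Executes one NOP instruction; it has no observable effect.
#[cfg(target_arch = "x86_64")]
fn run_nop() {
    // SAFETY: `nop` changes no registers, flags, or memory.
    unsafe {
        asm!("nop");
    }
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn run_nop() {}
```

Calling `run_nop()` returns normally; the only difference from an empty function is the extra instruction in the generated code.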
35 Now inserting an instruction that does nothing is rather boring. Let us do something that
36 actually acts on data:
43 asm!("mov {}, 5", out(reg) x);
48 This will write the value `5` into the `u64` variable `x`.
49 You can see that the string literal we use to specify instructions is actually a template string.
50 It is governed by the same rules as Rust [format strings][format-syntax].
51 The arguments that are inserted into the template however look a bit different than you may
52 be familiar with. First we need to specify if the variable is an input or an output of the
53 inline assembly. In this case it is an output. We declared this by writing `out`.
54 We also need to specify in what kind of register the assembly expects the variable.
55 In this case we put it in an arbitrary general purpose register by specifying `reg`.
56 The compiler will choose an appropriate register to insert into
57 the template and will read the variable from there after the inline assembly finishes executing.
[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
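
The `out(reg)` operand described above can be tried in a complete function. This is a sketch assuming an x86-64 target; the function name `load_five` and the plain-Rust stand-in for other targets are illustrative.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Writes the constant 5 into a compiler-chosen register and returns it.
#[cfg(target_arch = "x86_64")]
fn load_five() -> u64 {
    let x: u64;
    // SAFETY: the asm only writes the register bound to `x`.
    unsafe {
        // `{}` is replaced with the register the compiler picks for `x`.
        asm!("mov {}, 5", out(reg) x);
    }
    x
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn load_five() -> u64 {
    5
}
```

Note that `x` is declared without an initial value: an `out` operand counts as its initialization.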
61 Let us see another example that also uses an input:
79 This will add `5` to the input in variable `i` and write the result to variable `o`.
80 The particular way this assembly does this is first copying the value from `i` to the output,
81 and then adding `5` to it.
83 The example shows a few things:
First, we can see that `asm!` allows multiple template string arguments; each
one is treated as a separate line of assembly code, as if they were all joined
together with newlines between them. This makes it easy to format assembly code.
90 Second, we can see that inputs are declared by writing `in` instead of `out`.
Third, we can see that we can specify an argument number or name, as in any format string.
For inline assembly templates this is particularly useful, as arguments are often used more than once.
For more complex inline assembly, using this facility is generally recommended, as it improves
readability and allows reordering instructions without changing the argument order.
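
The three points above can be seen together in one function. This is a sketch assuming an x86-64 target; `add_five` and its stand-in for other targets are illustrative names. It uses two template strings, one `in` operand, and positional arguments that are each referenced by number.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Copies `i` into the output register, then adds 5 to it.
#[cfg(target_arch = "x86_64")]
fn add_five(i: u64) -> u64 {
    let o: u64;
    // SAFETY: the asm only touches the two registers bound to `o` and `i`.
    unsafe {
        asm!(
            "mov {0}, {1}", // {0} is `o`, {1} is `i`
            "add {0}, 5",
            out(reg) o,
            in(reg) i,
        );
    }
    o
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_five(i: u64) -> u64 {
    i.wrapping_add(5)
}
```

Named operands (for example `dst = out(reg) o` referenced as `{dst}`) would work the same way and tend to read better in longer blocks.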
97 We can further refine the above example to avoid the `mov` instruction:
104 asm!("add {0}, 5", inout(reg) x);
109 We can see that `inout` is used to specify an argument that is both input and output.
110 This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.
112 It is also possible to specify different variables for the input and output parts of an `inout` operand:
120 asm!("add {0}, 5", inout(reg) x => y);
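
Both `inout` forms shown above can be compared side by side. This is a sketch assuming an x86-64 target; the function names and the stand-ins for other targets are illustrative.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// `inout(reg) x`: the same variable is read and then overwritten.
#[cfg(target_arch = "x86_64")]
fn add_five_in_place(mut x: u64) -> u64 {
    // SAFETY: the asm only modifies the register bound to `x`.
    unsafe {
        asm!("add {0}, 5", inout(reg) x);
    }
    x
}

/// `inout(reg) x => y`: one register, read from `x` and written to `y`.
#[cfg(target_arch = "x86_64")]
fn add_five_into_new(x: u64) -> u64 {
    let y: u64;
    // SAFETY: the asm only modifies the single register shared by `x` and `y`.
    unsafe {
        asm!("add {0}, 5", inout(reg) x => y);
    }
    y
}

// Stand-ins so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_five_in_place(x: u64) -> u64 {
    x.wrapping_add(5)
}
#[cfg(not(target_arch = "x86_64"))]
fn add_five_into_new(x: u64) -> u64 {
    x.wrapping_add(5)
}
```

The second form leaves `x` untouched from Rust's point of view, which is convenient when the input must stay immutable.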
## Late output operands
The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`
can be written at any time, and can therefore not share its location with any other argument.
However, for best performance it is important to use as few registers as possible,
so they won't have to be saved and reloaded around the inline assembly block.
To achieve this, Rust provides a `lateout` specifier. This can be used on any output that is
written only after all inputs have been consumed.
There is also an `inlateout` variant of this specifier.
135 Here is an example where `inlateout` *cannot* be used in `release` mode or other optimized cases:
With `inlateout`, the above could still work in unoptimized builds (`Debug` mode), but it could fail in optimized builds (`release` mode or other optimized cases).

That is because in optimized builds the compiler is free to allocate the same register for inputs `b` and `c`, since it knows they have the same value. However, it must allocate a separate register for `a`, since `a` uses `inout` and not `inlateout`. If `inlateout` were used, then `a` and `c` could be allocated to the same register, in which case the first instruction would overwrite the value of `c` and cause the assembly code to produce the wrong result.

However, the following example can use `inlateout`, since the output is only modified after all input registers have been read:
166 asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
171 As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.
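
The two cases in this section can be sketched as a pair of functions, assuming an x86-64 target. The names `add_three` and `add_two` and the stand-ins for other targets are illustrative. `add_three` must use `inout` because its output is written before the last input is read; `add_two` may use `inlateout` because its single instruction reads every input before writing.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Two instructions: `a` is written while `c` is still needed, so `inout`.
#[cfg(target_arch = "x86_64")]
fn add_three(mut a: u64, b: u64, c: u64) -> u64 {
    // SAFETY: the asm only modifies the register bound to `a`.
    unsafe {
        asm!(
            "add {0}, {1}",
            "add {0}, {2}",
            inout(reg) a,
            in(reg) b,
            in(reg) c,
        );
    }
    a
}

/// One instruction: all inputs are read before the write, so `inlateout`.
#[cfg(target_arch = "x86_64")]
fn add_two(mut a: u64, b: u64) -> u64 {
    // SAFETY: the asm only modifies the register bound to `a`.
    unsafe {
        asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
    }
    a
}

// Stand-ins so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_three(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add(b).wrapping_add(c)
}
#[cfg(not(target_arch = "x86_64"))]
fn add_two(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Passing equal values, as in `add_three(4, 4, 4)`, is the situation where an incorrect `inlateout` would be most likely to misbehave under optimization.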
## Explicit register operands
175 Some instructions require that the operands be in a specific register.
176 Therefore, Rust inline assembly provides some more specific constraint specifiers.
177 While `reg` is generally available on any architecture, explicit registers are highly architecture specific. E.g. for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi` among others can be addressed by their name.
184 asm!("out 0x64, eax", in("eax") cmd);
188 In this example we call the `out` instruction to output the content of the `cmd` variable to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand we had to use the `eax` constraint specifier.
> **Note**: unlike other operand types, explicit register operands cannot be used in the template string: you can't use `{}` and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
192 Consider this example which uses the x86 `mul` instruction:
197 fn mul(a: u64, b: u64) -> u128 {
// The x86 mul instruction takes rax as an implicit input and writes
// the 128-bit result of the multiplication to rdx:rax (high:low).
207 inlateout("rax") b => lo,
212 ((hi as u128) << 64) + lo as u128
This uses the `mul` instruction to multiply two 64-bit inputs with a 128-bit result.
The only explicit operand is a register that we fill from the variable `a`.
The second operand is implicit, and must be the `rax` register, which we fill from the variable `b`.
The lower 64 bits of the result are stored in `rax`, from which we fill the variable `lo`.
The higher 64 bits are stored in `rdx`, from which we fill the variable `hi`.
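
Put together, the operands described above could look like the following sketch, assuming an x86-64 target. The operand list is reconstructed from the description in this section, and the stand-in for other targets is an illustrative addition.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Full 64x64 -> 128-bit multiplication using the x86-64 `mul` instruction.
#[cfg(target_arch = "x86_64")]
fn mul(a: u64, b: u64) -> u128 {
    let lo: u64;
    let hi: u64;
    // SAFETY: `mul` only writes rax and rdx, both declared as outputs.
    unsafe {
        asm!(
            // rax is an implicit input; the result lands in rdx (high) and rax (low).
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi,
        );
    }
    ((hi as u128) << 64) + lo as u128
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn mul(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

Both outputs are late outputs, so the register allocator is free to place `a` in `rdx`: the instruction reads its operand before it writes either result register.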
## Clobbered registers
224 In many cases inline assembly will modify state that is not needed as an output.
225 Usually this is either because we have to use a scratch register in the assembly or because instructions modify state that we don't need to further examine.
226 This state is generally referred to as being "clobbered".
227 We need to tell the compiler about this since it may need to save and restore this state around the inline assembly block.
233 // three entries of four bytes each
234 let mut name_buf = [0_u8; 12];
235 // String is stored as ascii in ebx, edx, ecx in order
236 // Because ebx is reserved, the asm needs to preserve the value of it.
237 // So we push and pop it around the main asm.
238 // (in 64 bit mode for 64 bit processors, 32 bit processors would use ebx)
245 "mov [rdi + 4], edx",
246 "mov [rdi + 8], ecx",
248 // We use a pointer to an array for storing the values to simplify
249 // the Rust code at the cost of a couple more asm instructions
250 // This is more explicit with how the asm works however, as opposed
251 // to explicit register outputs such as `out("ecx") val`
252 // The *pointer itself* is only an input even though it's written behind
253 in("rdi") name_buf.as_mut_ptr(),
254 // select cpuid 0, also specify eax as clobbered
256 // cpuid clobbers these registers too
262 let name = core::str::from_utf8(&name_buf).unwrap();
263 println!("CPU Manufacturer ID: {}", name);
267 In the example above we use the `cpuid` instruction to read the CPU manufacturer ID.
268 This instruction writes to `eax` with the maximum supported `cpuid` argument and `ebx`, `edx`, and `ecx` with the CPU manufacturer ID as ASCII bytes in that order.
Even though `eax` is never read, we still need to tell the compiler that the register has been modified so that the compiler can save any value that was in it before the asm. This is done by declaring it as an output but with `_` instead of a variable name, which indicates that the output value is to be discarded.

This code also works around the limitation that `ebx` is a register reserved by LLVM. That means that LLVM assumes that it has full control over the register and it must be restored to its original state before exiting the asm block, so it cannot be used as an input or output **except** if the compiler uses it to fulfill a general register class (e.g. `in(reg)`). This makes `reg` operands dangerous when using reserved registers, as we could unknowingly corrupt our input or output because they share the same register.

To work around this we use `rdi` to store the pointer to the output array, save `ebx` via `push`, read from `ebx` inside the asm block into the array, and then restore `ebx` to its original state via `pop`. The `push` and `pop` use the full 64-bit `rbx` version of the register to ensure that the entire register is saved. On 32-bit targets the code would instead use `ebx` in the `push`/`pop`.
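
The steps described above can be assembled into one function. This is a sketch assuming a 64-bit x86 target; the function name `cpu_vendor_bytes`, the exact operand order, and the placeholder result on other targets are assumptions made for illustration.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Returns the 12-byte vendor string reported by `cpuid` leaf 0.
#[cfg(target_arch = "x86_64")]
fn cpu_vendor_bytes() -> [u8; 12] {
    let mut name_buf = [0_u8; 12];
    // SAFETY: rdi points at 12 writable bytes, rbx is saved and restored,
    // and every other register `cpuid` writes is declared as clobbered.
    unsafe {
        asm!(
            "push rbx",           // rbx is reserved, so preserve it by hand
            "cpuid",
            "mov [rdi], ebx",     // vendor string order: ebx, edx, ecx
            "mov [rdi + 4], edx",
            "mov [rdi + 8], ecx",
            "pop rbx",
            in("rdi") name_buf.as_mut_ptr(),
            inout("eax") 0 => _,  // select leaf 0; eax is overwritten
            out("ecx") _,
            out("edx") _,
        );
    }
    name_buf
}

/// Placeholder so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn cpu_vendor_bytes() -> [u8; 12] {
    [b'?'; 12]
}
```

The bytes can then be turned into text with `core::str::from_utf8(&cpu_vendor_bytes())`; the actual string depends on the machine the code runs on.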
The `_` placeholder can also be used with a general register class to obtain a scratch register for use inside the asm code:
281 // Multiply x by 6 using shifts and adds
293 assert_eq!(x, 4 * 6);
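
A scratch register obtained this way might be used as in the following sketch, assuming an x86-64 target. The function name `times_six`, the operand names `x` and `tmp`, and the stand-in for other targets are illustrative.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Multiplies `x` by 6 using shifts and adds: x * 6 = (x << 2) + (x << 1).
#[cfg(target_arch = "x86_64")]
fn times_six(mut x: u64) -> u64 {
    // SAFETY: the asm only modifies the registers bound to `x` and `tmp`.
    unsafe {
        asm!(
            "mov {tmp}, {x}",
            "shl {tmp}, 1",   // tmp = x * 2
            "shl {x}, 2",     // x = x * 4
            "add {x}, {tmp}", // x = x * 4 + x * 2
            x = inout(reg) x,
            tmp = out(reg) _, // scratch register, value discarded
        );
    }
    x
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn times_six(x: u64) -> u64 {
    x.wrapping_mul(6)
}
```

Because `tmp` is a plain `out` rather than a `lateout`, the compiler will not place it in the same register as `x`.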
## Symbol operands and ABI clobbers
298 By default, `asm!` assumes that any register not specified as an output will have its contents preserved by the assembly code. The [`clobber_abi`] argument to `asm!` tells the compiler to automatically insert the necessary clobber operands according to the given calling convention ABI: any register which is not fully preserved in that ABI will be treated as clobbered. Multiple `clobber_abi` arguments may be provided and all clobbers from all specified ABIs will be inserted.
[`clobber_abi`]: ../../reference/inline-assembly.html#abi-clobbers
305 extern "C" fn foo(arg: i32) -> i32 {
306 println!("arg = {}", arg);
310 fn call_foo(arg: i32) -> i32 {
315 // Function pointer to call
317 // 1st argument in rdi
319 // Return value in rax
321 // Mark all registers which are not preserved by the "C" calling
322 // convention as clobbered.
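
A complete version of such a call might look like the sketch below. It assumes an x86-64 Unix-like target, where the `"C"` calling convention passes the first integer argument in `rdi` and returns in `rax`; the explicit function-pointer cast and the direct-call stand-in for other targets are additions made for illustration.

```rust
#[cfg(all(target_arch = "x86_64", unix))]
use std::arch::asm;

extern "C" fn foo(arg: i32) -> i32 {
    println!("arg = {}", arg);
    arg * 2
}

/// Calls `foo` from inline assembly using the System V x86-64 convention.
#[cfg(all(target_arch = "x86_64", unix))]
fn call_foo(arg: i32) -> i32 {
    let result: i32;
    // SAFETY: arguments and return value follow the "C" ABI, and every
    // register that ABI does not preserve is marked as clobbered.
    unsafe {
        asm!(
            "call {}",
            // Function pointer to call
            in(reg) foo as extern "C" fn(i32) -> i32,
            // 1st argument in rdi
            in("rdi") arg,
            // Return value in rax
            out("rax") result,
            // Mark all registers not preserved by the "C" ABI as clobbered.
            clobber_abi("C"),
        );
    }
    result
}

/// Stand-in for other targets: an ordinary call.
#[cfg(not(all(target_arch = "x86_64", unix)))]
fn call_foo(arg: i32) -> i32 {
    foo(arg)
}
```

Without `clobber_abi("C")`, the compiler would assume that caller-saved registers keep their values across the `call`, which the callee does not guarantee.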
## Register template modifiers
332 In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
334 By default the compiler will always choose the name that refers to the full register size (e.g. `rax` on x86-64, `eax` on x86, etc).
336 This default can be overridden by using modifiers on the template string operands, just like you would with format strings:
341 let mut x: u16 = 0xab;
344 asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
347 assert_eq!(x, 0xabab);
In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`), whose low two bytes can be addressed independently.
352 Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
353 The `h` modifier will emit the register name for the high byte of that register and the `l` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
355 If you use a smaller data type (e.g. `u16`) with an operand and forget to use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.
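
The `h` and `l` modifiers can be exercised in a small function. This is a sketch assuming an x86-64 target; the name `duplicate_low_byte` and the plain-Rust stand-in for other targets are illustrative.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Copies the low byte of `x` into its high byte.
#[cfg(target_arch = "x86_64")]
fn duplicate_low_byte(mut x: u16) -> u16 {
    // SAFETY: the asm only modifies the register bound to `x`.
    unsafe {
        // With `x` in ax this expands to `mov ah, al`.
        asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
    }
    x
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn duplicate_low_byte(x: u16) -> u16 {
    let low = x & 0x00ff;
    (low << 8) | low
}
```

For example, `duplicate_low_byte(0xab)` yields `0xabab`, since the high byte starts out as zero and is replaced by the low byte.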
## Memory address operands
359 Sometimes assembly instructions require operands passed via memory addresses/memory locations.
360 You have to manually use the memory address syntax specified by the target architecture.
361 For example, on x86/x86_64 using Intel assembly syntax, you should wrap inputs/outputs in `[]` to indicate they are memory operands:
366 fn load_fpu_control_word(control: u16) {
368 asm!("fldcw [{}]", in(reg) &control, options(nostack));
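
Because loading the FPU control word changes processor state, a side-effect-free variant of the same `[]` syntax may be easier to experiment with. The sketch below, assuming an x86-64 target, reads a `u64` through a pointer held in a register; the function name `load_via_pointer`, the operand names, and the stand-in for other targets are illustrative.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Loads the value behind `value` using a register-indirect memory operand.
#[cfg(target_arch = "x86_64")]
fn load_via_pointer(value: &u64) -> u64 {
    let result: u64;
    // SAFETY: `ptr` comes from a valid reference and is only read.
    unsafe {
        asm!(
            // `[{ptr}]` means "the memory at the address held in this register".
            "mov {dst}, [{ptr}]",
            ptr = in(reg) value as *const u64,
            dst = out(reg) result,
            options(nostack, readonly),
        );
    }
    result
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn load_via_pointer(value: &u64) -> u64 {
    *value
}
```

The `readonly` option tells the compiler that the block reads memory but never writes it.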
## Labels

Any reuse of a named label, local or otherwise, can result in an assembler or linker error or may cause other strange behavior. Reuse of a named label can happen in a variety of ways including:
- explicitly: using a label more than once in one `asm!` block, or multiple times across blocks.
- implicitly via inlining: the compiler is allowed to instantiate multiple copies of an `asm!` block, for example when the function containing it is inlined in multiple places.
- implicitly via LTO: LTO can cause code from *other crates* to be placed in the same codegen unit, and so could bring in arbitrary labels.
381 As a consequence, you should only use GNU assembler **numeric** [local labels] inside inline assembly code. Defining symbols in assembly code may lead to assembler and/or linker errors due to duplicate symbol definitions.
383 Moreover, on x86 when using the default Intel syntax, due to [an LLVM bug], you shouldn't use labels exclusively made of `0` and `1` digits, e.g. `0`, `11` or `101010`, as they may end up being interpreted as binary values. Using `options(att_syntax)` will avoid any ambiguity, but that affects the syntax of the _entire_ `asm!` block. (See [Options](#options), below, for more on `options`.)
405 This will decrement the `{0}` register value from 10 to 3, then add 2 and store it in `a`.
407 This example shows a few things:
- First, that the same number can be used as a label multiple times in the same inline block.
- Second, that when a numeric label is used as a reference (as an instruction operand, for example), the suffix “b” (“backward”) or “f” (“forward”) should be added to the numeric label. It will then refer to the nearest label defined by this number in this direction.
[local labels]: https://sourceware.org/binutils/docs/as/Symbol-Names.html#Local-Labels
[an LLVM bug]: https://bugs.llvm.org/show_bug.cgi?id=36144
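
A loop matching the description above (count down from 10 to 3, then add 2) might be written as in this sketch, assuming an x86-64 target. The function name `count_down_then_add` and the stand-in for other targets are illustrative. The label `2` is defined twice, and `2b` and `2f` select the nearest definition backward and forward.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Counts a register down from 10 to 3, then adds 2.
#[cfg(target_arch = "x86_64")]
fn count_down_then_add() -> u64 {
    let a: u64;
    // SAFETY: the asm only modifies the register bound to `a` and the flags.
    unsafe {
        asm!(
            "mov {0}, 10",
            "2:",          // first definition of label 2 (loop head)
            "sub {0}, 1",
            "cmp {0}, 3",
            "jle 2f",      // forward: leave the loop once the value is <= 3
            "jmp 2b",      // backward: repeat the loop
            "2:",          // second definition of label 2 (loop exit)
            "add {0}, 2",
            out(reg) a,
        );
    }
    a
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn count_down_then_add() -> u64 {
    3 + 2
}
```

The label `2` is used here rather than `0` or `1` so that it cannot be mistaken for a binary literal under Intel syntax.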
## Options

By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However, in many cases it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
419 Let's take our previous example of an `add` instruction:
429 inlateout(reg) a, in(reg) b,
430 options(pure, nomem, nostack),
Options can be provided as an optional final argument to the `asm!` macro. We specified three options here:
- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
- `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86-64 to avoid stack pointer adjustments.
441 These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
443 See the [reference](../../reference/inline-assembly.html) for the full list of available options and their effects.
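
The three options discussed in this section can be combined on the earlier `add` example. This is a sketch assuming an x86-64 target; the function name `pure_add` and the stand-in for other targets are illustrative.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Adds `b` to `a` in a block the optimizer may deduplicate or remove.
#[cfg(target_arch = "x86_64")]
fn pure_add(mut a: u64, b: u64) -> u64 {
    // SAFETY: the asm touches only its operand registers and the flags;
    // it reads no memory and uses no stack, as the options promise.
    unsafe {
        asm!(
            "add {0}, {1}",
            inlateout(reg) a, in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    a
}

/// Stand-in so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn pure_add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

These options are promises made to the compiler: if the assembly did touch memory or the stack despite them, the resulting behavior would be undefined.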