src/doc/rust-by-example/src/unsafe/asm.md

   1 # Inline assembly
   2
   3 Rust provides support for inline assembly via the `asm!` macro.
   4 It can be used to embed handwritten assembly in the assembly output generated by the compiler.
   5 Generally this should not be necessary, but might be where the required performance or timing
   6 cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.
   7
   8 > **Note**: the examples here are given in x86/x86-64 assembly, but other architectures are also supported.
   9
  10 Inline assembly is currently supported on the following architectures:
  11 - x86 and x86-64
  12 - ARM
  13 - AArch64
  14 - RISC-V
  15
  16 ## Basic usage
  17
  18 Let us start with the simplest possible example:
  19
  20 ```rust
  21 use std::arch::asm;
  22
  23 unsafe {
  24     asm!("nop");
  25 }
  26 ```
  27
  28 This will insert a NOP (no operation) instruction into the assembly generated by the compiler.
  29 Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert
  30 arbitrary instructions and break various invariants. The instructions to be inserted are listed
  31 in the first argument of the `asm!` macro as a string literal.
  32
  33 ## Inputs and outputs
  34
  35 Now inserting an instruction that does nothing is rather boring. Let us do something that
  36 actually acts on data:
  37
  38 ```rust
  39 use std::arch::asm;
  40
  41 let x: u64;
  42 unsafe {
  43     asm!("mov {}, 5", out(reg) x);
  44 }
  45 assert_eq!(x, 5);
  46 ```
  47
  48 This will write the value `5` into the `u64` variable `x`.
  49 You can see that the string literal we use to specify instructions is actually a template string.
  50 It is governed by the same rules as Rust [format strings][format-syntax].
  51 The arguments that are inserted into the template however look a bit different than you may
  52 be familiar with. First we need to specify if the variable is an input or an output of the
  53 inline assembly. In this case it is an output. We declared this by writing `out`.
  54 We also need to specify in what kind of register the assembly expects the variable.
  55 In this case we put it in an arbitrary general purpose register by specifying `reg`.
  56 The compiler will choose an appropriate register to insert into
  57 the template and will read the variable from there after the inline assembly finishes executing.
  58
  59 [format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
  60
  61 Let us see another example that also uses an input:
  62
  63 ```rust
  64 use std::arch::asm;
  65
  66 let i: u64 = 3;
  67 let o: u64;
  68 unsafe {
  69     asm!(
  70         "mov {0}, {1}",
  71         "add {0}, 5",
  72         out(reg) o,
  73         in(reg) i,
  74     );
  75 }
  76 assert_eq!(o, 8);
  77 ```
  78
  79 This will add `5` to the input in variable `i` and write the result to variable `o`.
  80 The particular way this assembly does this is first copying the value from `i` to the output,
  81 and then adding `5` to it.
  82
  83 The example shows a few things:
  84
  85 First, we can see that `asm!` allows multiple template string arguments; each
  86 one is treated as a separate line of assembly code, as if they were all joined
  87 together with newlines between them. This makes it easy to format assembly
  88 code.
  89
  90 Second, we can see that inputs are declared by writing `in` instead of `out`.
  91
   92 Third, we can see that we can specify an argument number or name, as in any format string.
  93 For inline assembly templates this is particularly useful as arguments are often used more than once.
  94 For more complex inline assembly using this facility is generally recommended, as it improves
  95 readability, and allows reordering instructions without changing the argument order.
  96
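As a minimal sketch of the named form, the earlier input/output example could be written with operand names instead of positions. The helper name `add_five` and the operand names `dst` and `src` are chosen here purely for illustration, and the code assumes an x86-64 target.

```rust
use std::arch::asm;

// Hypothetical helper: returns `i + 5` (wrapping on overflow).
fn add_five(i: u64) -> u64 {
    let o: u64;
    unsafe {
        asm!(
            // Named operands can be referenced any number of times,
            // independent of the order in which they are declared.
            "mov {dst}, {src}",
            "add {dst}, 5",
            dst = out(reg) o,
            src = in(reg) i,
        );
    }
    o
}
```

With these definitions, `add_five(3)` evaluates to `8`.
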
  97 We can further refine the above example to avoid the `mov` instruction:
  98
  99 ```rust
 100 use std::arch::asm;
 101
 102 let mut x: u64 = 3;
 103 unsafe {
 104     asm!("add {0}, 5", inout(reg) x);
 105 }
 106 assert_eq!(x, 8);
 107 ```
 108
 109 We can see that `inout` is used to specify an argument that is both input and output.
 110 This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.
 111
 112 It is also possible to specify different variables for the input and output parts of an `inout` operand:
 113
 114 ```rust
 115 use std::arch::asm;
 116
 117 let x: u64 = 3;
 118 let y: u64;
 119 unsafe {
 120     asm!("add {0}, 5", inout(reg) x => y);
 121 }
 122 assert_eq!(y, 8);
 123 ```
 124
 125 ## Late output operands
 126
 127 The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`
 128 can be written at any time, and can therefore not share its location with any other argument.
 129 However, to guarantee optimal performance it is important to use as few registers as possible,
 130 so they won't have to be saved and reloaded around the inline assembly block.
 131 To achieve this Rust provides a `lateout` specifier. This can be used on any output that is
 132 written only after all inputs have been consumed.
  133 There is also an `inlateout` variant of this specifier.
 134
 135 Here is an example where `inlateout` *cannot* be used in `release` mode or other optimized cases:
 136
 137 ```rust
 138 use std::arch::asm;
 139
 140 let mut a: u64 = 4;
 141 let b: u64 = 4;
 142 let c: u64 = 4;
 143 unsafe {
 144     asm!(
 145         "add {0}, {1}",
 146         "add {0}, {2}",
 147         inout(reg) a,
 148         in(reg) b,
 149         in(reg) c,
 150     );
 151 }
 152 assert_eq!(a, 12);
 153 ```
  154 If `inout` were replaced with `inlateout`, the above could still work in unoptimized cases (`Debug` mode), but in optimized builds (`release` mode or other optimized cases) it could produce the wrong result.
  155
  156 That is because in optimized cases, the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However, it must allocate a separate register for `a` since it uses `inout` and not `inlateout`. If `inlateout` were used, then `a` and `c` could be allocated to the same register, in which case the first instruction would overwrite the value of `c` and cause the assembly code to produce the wrong result.
 157
 158 However the following example can use `inlateout` since the output is only modified after all input registers have been read:
 159
 160 ```rust
 161 use std::arch::asm;
 162
 163 let mut a: u64 = 4;
 164 let b: u64 = 4;
 165 unsafe {
 166     asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
 167 }
 168 assert_eq!(a, 8);
 169 ```
 170
 171 As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.
 172
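The plain `lateout` specifier follows the same rule. The sketch below, which assumes an x86-64 target and uses a made-up helper name `add_five_late`, relies on a single `lea` instruction that reads its input and writes its output in one step, so the two operands may safely share a register.

```rust
use std::arch::asm;

// Hypothetical helper: returns `x + 5` (wrapping on overflow).
fn add_five_late(x: u64) -> u64 {
    let y: u64;
    unsafe {
        asm!(
            // `lea` computes the address expression without touching memory.
            // The input is fully read before the output is written, so
            // `lateout` is valid even if both end up in the same register.
            "lea {y}, [{x} + 5]",
            x = in(reg) x,
            y = lateout(reg) y,
            options(pure, nomem, nostack),
        );
    }
    y
}
```

With these definitions, `add_five_late(3)` evaluates to `8`.
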
 173 ## Explicit register operands
 174
 175 Some instructions require that the operands be in a specific register.
 176 Therefore, Rust inline assembly provides some more specific constraint specifiers.
 177 While `reg` is generally available on any architecture, explicit registers are highly architecture specific. E.g. for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi` among others can be addressed by their name.
 178
 179 ```rust,no_run
 180 use std::arch::asm;
 181
 182 let cmd = 0xd1;
 183 unsafe {
 184     asm!("out 0x64, eax", in("eax") cmd);
 185 }
 186 ```
 187
  188 In this example we call the `out` instruction to output the content of the `cmd` variable to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as an operand, we had to use the `eax` constraint specifier.
 189
 190 > **Note**: unlike other operand types, explicit register operands cannot be used in the template string: you can't use `{}` and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
 191
 192 Consider this example which uses the x86 `mul` instruction:
 193
 194 ```rust
 195 use std::arch::asm;
 196
 197 fn mul(a: u64, b: u64) -> u128 {
 198     let lo: u64;
 199     let hi: u64;
 200
 201     unsafe {
 202         asm!(
 203             // The x86 mul instruction takes rax as an implicit input and writes
  204             // the 128-bit result of the multiplication to rdx:rax.
 205             "mul {}",
 206             in(reg) a,
 207             inlateout("rax") b => lo,
 208             lateout("rdx") hi
 209         );
 210     }
 211
 212     ((hi as u128) << 64) + lo as u128
 213 }
 214 ```
 215
 216 This uses the `mul` instruction to multiply two 64-bit inputs with a 128-bit result.
  217 The only explicit operand is a register that we fill from the variable `a`.
 218 The second operand is implicit, and must be the `rax` register, which we fill from the variable `b`.
 219 The lower 64 bits of the result are stored in `rax` from which we fill the variable `lo`.
 220 The higher 64 bits are stored in `rdx` from which we fill the variable `hi`.
 221
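The x86 `div` instruction uses the same pair of implicit registers in the other direction. The following is a sketch only, assuming an x86-64 target; the helper name `div_rem` and the zero check are additions made for illustration.

```rust
use std::arch::asm;

// Hypothetical helper: returns (quotient, remainder) of `a / b`.
fn div_rem(a: u64, b: u64) -> (u64, u64) {
    // `div` raises a hardware exception on division by zero, so reject it here.
    assert!(b != 0, "division by zero");
    let quot: u64;
    let rem: u64;
    unsafe {
        asm!(
            // `div` divides the 128-bit value rdx:rax by its operand,
            // leaving the quotient in rax and the remainder in rdx.
            "div {divisor}",
            divisor = in(reg) b,
            inlateout("rax") a => quot,
            // The high half of the dividend is zero for a 64-bit `a`.
            inlateout("rdx") 0_u64 => rem,
            options(pure, nomem, nostack),
        );
    }
    (quot, rem)
}
```

With these definitions, `div_rem(17, 5)` evaluates to `(3, 2)`. Both implicit registers are declared as `inlateout` because `div` reads all of its inputs before it writes either result.
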
 222 ## Clobbered registers
 223
 224 In many cases inline assembly will modify state that is not needed as an output.
 225 Usually this is either because we have to use a scratch register in the assembly or because instructions modify state that we don't need to further examine.
 226 This state is generally referred to as being "clobbered".
 227 We need to tell the compiler about this since it may need to save and restore this state around the inline assembly block.
 228
 229 ```rust
 230 use std::arch::asm;
 231
 232 fn main() {
 233     // three entries of four bytes each
 234     let mut name_buf = [0_u8; 12];
 235     // String is stored as ascii in ebx, edx, ecx in order
 236     // Because ebx is reserved, the asm needs to preserve the value of it.
 237     // So we push and pop it around the main asm.
 238     // (in 64 bit mode for 64 bit processors, 32 bit processors would use ebx)
 239
 240     unsafe {
 241         asm!(
 242             "push rbx",
 243             "cpuid",
 244             "mov [rdi], ebx",
 245             "mov [rdi + 4], edx",
 246             "mov [rdi + 8], ecx",
 247             "pop rbx",
 248             // We use a pointer to an array for storing the values to simplify
 249             // the Rust code at the cost of a couple more asm instructions
 250             // This is more explicit with how the asm works however, as opposed
 251             // to explicit register outputs such as `out("ecx") val`
 252             // The *pointer itself* is only an input even though it's written behind
 253             in("rdi") name_buf.as_mut_ptr(),
 254             // select cpuid 0, also specify eax as clobbered
 255             inout("eax") 0 => _,
 256             // cpuid clobbers these registers too
 257             out("ecx") _,
 258             out("edx") _,
 259         );
 260     }
 261
 262     let name = core::str::from_utf8(&name_buf).unwrap();
 263     println!("CPU Manufacturer ID: {}", name);
 264 }
 265 ```
 266
 267 In the example above we use the `cpuid` instruction to read the CPU manufacturer ID.
  268 This instruction writes the maximum supported `cpuid` argument to `eax`, and the CPU manufacturer ID as ASCII bytes to `ebx`, `edx`, and `ecx`, in that order.
 269
 270 Even though `eax` is never read we still need to tell the compiler that the register has been modified so that the compiler can save any values that were in these registers before the asm. This is done by declaring it as an output but with `_` instead of a variable name, which indicates that the output value is to be discarded.
 271
  272 This code also works around the limitation that `ebx` is a register reserved by LLVM. That means that LLVM assumes that it has full control over the register and it must be restored to its original state before exiting the asm block, so it cannot be used as an input or output **except** if the compiler uses it to fulfill a general register class (e.g. `in(reg)`). This makes `reg` operands dangerous when using reserved registers as we could unknowingly corrupt our input or output because they share the same register.
 273
  274 To work around this we use `rdi` to store the pointer to the output array, save `ebx` via `push`, read from `ebx` inside the asm block into the array, and then restore `ebx` to its original state via `pop`. The `push` and `pop` use the full 64-bit `rbx` version of the register to ensure that the entire register is saved. On 32 bit targets the code would instead use `ebx` in the `push`/`pop`.
 275
 276 This can also be used with a general register class to obtain a scratch register for use inside the asm code:
 277
 278 ```rust
 279 use std::arch::asm;
 280
 281 // Multiply x by 6 using shifts and adds
 282 let mut x: u64 = 4;
 283 unsafe {
 284     asm!(
 285         "mov {tmp}, {x}",
 286         "shl {tmp}, 1",
 287         "shl {x}, 2",
 288         "add {x}, {tmp}",
 289         x = inout(reg) x,
 290         tmp = out(reg) _,
 291     );
 292 }
 293 assert_eq!(x, 4 * 6);
 294 ```
 295
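Discarded outputs are also a natural fit for instructions that modify their own input registers. As a sketch, assuming an x86-64 target and using a hypothetical helper named `fill`, `rep stosb` changes both `rcx` and `rdi` while it runs, so each is declared as `inout` with the final value thrown away.

```rust
use std::arch::asm;

// Hypothetical helper: sets every byte of `buf` to `val` using `rep stosb`.
fn fill(buf: &mut [u8], val: u8) {
    unsafe {
        asm!(
            // `rep stosb` stores `al` to [rdi], then advances rdi and
            // decrements rcx, repeating until rcx reaches zero.
            "rep stosb",
            // Both registers are modified by the instruction, so they are
            // declared as `inout` with the final value discarded.
            inout("rcx") buf.len() => _,
            inout("rdi") buf.as_mut_ptr() => _,
            in("al") val,
            // The asm writes to memory, so `nomem` must not be specified.
            options(nostack),
        );
    }
}
```

Calling `fill(&mut b, 0xab)` on an eight-byte array `b` leaves every element equal to `0xab`.
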
 296 ## Symbol operands and ABI clobbers
 297
  298 By default, `asm!` assumes that any register not specified as an output will have its contents preserved by the assembly code. The [`clobber_abi`] argument to `asm!` tells the compiler to automatically insert the necessary clobber operands according to the given calling convention ABI: any register which is not fully preserved in that ABI will be treated as clobbered. Multiple `clobber_abi` arguments may be provided and all clobbers from all specified ABIs will be inserted.
 299
 300 [`clobber_abi`]: ../../reference/inline-assembly.html#abi-clobbers
 301
 302 ```rust
 303 use std::arch::asm;
 304
 305 extern "C" fn foo(arg: i32) -> i32 {
 306     println!("arg = {}", arg);
 307     arg * 2
 308 }
 309
 310 fn call_foo(arg: i32) -> i32 {
 311     unsafe {
 312         let result;
 313         asm!(
 314             "call {}",
 315             // Function pointer to call
 316             in(reg) foo,
 317             // 1st argument in rdi
 318             in("rdi") arg,
 319             // Return value in rax
 320             out("rax") result,
 321             // Mark all registers which are not preserved by the "C" calling
 322             // convention as clobbered.
 323             clobber_abi("C"),
 324         );
 325         result
 326     }
 327 }
 328 ```
 329
 330 ## Register template modifiers
 331
 332 In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
 333
 334 By default the compiler will always choose the name that refers to the full register size (e.g. `rax` on x86-64, `eax` on x86, etc).
 335
 336 This default can be overridden by using modifiers on the template string operands, just like you would with format strings:
 337
 338 ```rust
 339 use std::arch::asm;
 340
 341 let mut x: u16 = 0xab;
 342
 343 unsafe {
 344     asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
 345 }
 346
 347 assert_eq!(x, 0xabab);
 348 ```
 349
 350 In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.
 351
 352 Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
 353 The `h` modifier will emit the register name for the high byte of that register and the `l` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
 354
 355 If you use a smaller data type (e.g. `u16`) with an operand and forget to use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.
 356
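For instance, a 32-bit operand on x86-64 can use the `e` modifier to select the 32-bit register name. This is a sketch under that assumption; the helper name `add_u32` is made up for illustration.

```rust
use std::arch::asm;

// Hypothetical helper: wrapping 32-bit addition.
fn add_u32(a: u32, b: u32) -> u32 {
    let mut sum = a;
    unsafe {
        asm!(
            // `:e` selects the 32-bit name (e.g. `eax`) of the allocated register.
            "add {0:e}, {1:e}",
            inout(reg) sum,
            in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    sum
}
```

With these definitions, `add_u32(2, 3)` evaluates to `5`, and `add_u32(u32::MAX, 1)` wraps around to `0`.
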
 357 ## Memory address operands
 358
 359 Sometimes assembly instructions require operands passed via memory addresses/memory locations.
 360 You have to manually use the memory address syntax specified by the target architecture.
  361 For example, on x86/x86-64 using Intel assembly syntax, you should wrap inputs/outputs in `[]` to indicate they are memory operands:
 362
 363 ```rust
 364 use std::arch::asm;
 365
 366 fn load_fpu_control_word(control: u16) {
 367     unsafe {
 368         asm!("fldcw [{}]", in(reg) &control, options(nostack));
 369     }
 370 }
 371 ```
 372
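The same bracket syntax applies when the assembly writes through a pointer. The sketch below assumes an x86-64 target; the helper name `store` and the operand names `addr` and `val` are illustrative only.

```rust
use std::arch::asm;

// Hypothetical helper: writes `val` to `*dst` through a memory operand.
fn store(dst: &mut u64, val: u64) {
    unsafe {
        asm!(
            // The brackets make `{addr}` an address rather than a value.
            "mov qword ptr [{addr}], {val}",
            addr = in(reg) dst as *mut u64,
            val = in(reg) val,
            // The asm writes to memory, so `nomem` must not be specified.
            options(nostack),
        );
    }
}
```

After `store(&mut x, 42)`, the variable `x` holds `42`.
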
 373 ## Labels
 374
 375 Any reuse of a named label, local or otherwise, can result in an assembler or linker error or may cause other strange behavior. Reuse of a named label can happen in a variety of ways including:
 376
 377 -   explicitly: using a label more than once in one `asm!` block, or multiple times across blocks.
 378 -   implicitly via inlining: the compiler is allowed to instantiate multiple copies of an `asm!` block, for example when the function containing it is inlined in multiple places.
 379 -   implicitly via LTO: LTO can cause code from *other crates* to be placed in the same codegen unit, and so could bring in arbitrary labels.
 380
 381 As a consequence, you should only use GNU assembler **numeric** [local labels] inside inline assembly code. Defining symbols in assembly code may lead to assembler and/or linker errors due to duplicate symbol definitions.
 382
 383 Moreover, on x86 when using the default Intel syntax, due to [an LLVM bug], you shouldn't use labels exclusively made of `0` and `1` digits, e.g. `0`, `11` or `101010`, as they may end up being interpreted as binary values. Using `options(att_syntax)` will avoid any ambiguity, but that affects the syntax of the _entire_ `asm!` block. (See [Options](#options), below, for more on `options`.)
 384
 385 ```rust
 386 use std::arch::asm;
 387
 388 let mut a = 0;
 389 unsafe {
 390     asm!(
 391         "mov {0}, 10",
 392         "2:",
 393         "sub {0}, 1",
 394         "cmp {0}, 3",
 395         "jle 2f",
 396         "jmp 2b",
 397         "2:",
 398         "add {0}, 2",
 399         out(reg) a
 400     );
 401 }
 402 assert_eq!(a, 5);
 403 ```
 404
 405 This will decrement the `{0}` register value from 10 to 3, then add 2 and store it in `a`.
 406
 407 This example shows a few things:
 408
 409 - First, that the same number can be used as a label multiple times in the same inline block.
  410 - Second, that when a numeric label is used as a reference (as an instruction operand, for example), the suffixes “b” (“backward”) or “f” (“forward”) should be added to the numeric label. It will then refer to the nearest label defined by this number in this direction.
 411
 412 [local labels]: https://sourceware.org/binutils/docs/as/Symbol-Names.html#Local-Labels
 413 [an LLVM bug]: https://bugs.llvm.org/show_bug.cgi?id=36144
 414
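As a further sketch of numeric local labels, the loop below sums the integers from 1 to `n`. It assumes an x86-64 target, the helper name `sum_to` is made up for illustration, and the labels `2` and `3` are chosen to avoid the digits-only-`0`-and-`1` problem described above.

```rust
use std::arch::asm;

// Hypothetical helper: returns 1 + 2 + ... + n (wrapping on overflow).
fn sum_to(n: u64) -> u64 {
    let total: u64;
    unsafe {
        asm!(
            "xor {total}, {total}",
            // Skip the loop entirely when n is zero.
            "test {n}, {n}",
            "jz 3f",
            "2:",
            "add {total}, {n}",
            "dec {n}",
            // Jump backward to the nearest `2:` while n is non-zero.
            "jnz 2b",
            "3:",
            // The counter register is modified, so its final value is discarded.
            n = inout(reg) n => _,
            total = out(reg) total,
            options(pure, nomem, nostack),
        );
    }
    total
}
```

With these definitions, `sum_to(4)` evaluates to `10` and `sum_to(0)` evaluates to `0`.
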
 415 ## Options
 416
 417 By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However, in many cases it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
 418
 419 Let's take our previous example of an `add` instruction:
 420
 421 ```rust
 422 use std::arch::asm;
 423
 424 let mut a: u64 = 4;
 425 let b: u64 = 4;
 426 unsafe {
 427     asm!(
 428         "add {0}, {1}",
 429         inlateout(reg) a, in(reg) b,
 430         options(pure, nomem, nostack),
 431     );
 432 }
 433 assert_eq!(a, 8);
 434 ```
 435
 436 Options can be provided as an optional final argument to the `asm!` macro. We specified three options here:
 437 - `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
 438 - `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
 439 - `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86-64 to avoid stack pointer adjustments.
 440
 441 These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
 442
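Options can be combined freely where they are compatible. As a sketch, the same addition could be written with `att_syntax`, the option mentioned in the Labels section; this assumes an x86-64 target, and the helper name `add_att` is made up for illustration.

```rust
use std::arch::asm;

// Hypothetical helper: wrapping 64-bit addition written in AT&T syntax.
fn add_att(a: u64, b: u64) -> u64 {
    let mut sum = a;
    unsafe {
        asm!(
            // AT&T syntax puts the source first and the destination last.
            "addq {1}, {0}",
            inlateout(reg) sum,
            in(reg) b,
            options(att_syntax, pure, nomem, nostack),
        );
    }
    sum
}
```

With these definitions, `add_att(4, 4)` evaluates to `8`.
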
 443 See the [reference](../../reference/inline-assembly.html) for the full list of available options and their effects.