]> git.proxmox.com Git - rustc.git/blob - vendor/generic-array/DESIGN.md
bump version to 1.52.1+dfsg1-1~bpo10+pve2
[rustc.git] / vendor / generic-array / DESIGN.md
1 Design and Usage Notes
2 ======================
3
4 ## Sections
5
6 1. [How it Works](#how-it-works)
7 2. [Initialization](#initialization)
8 3. [Functional Programming](#functional-programming)
9 4. [Miscellaneous Utilities](#miscellaneous-utilities)
10 5. [Safety](#safety)
11 6. [Optimization](#optimization)
12 7. [The Future](#the-future)
13
14 **NOTE**: This document uses `<details>` sections, so look out for collapsible parts with an arrow on the left.
15
16 # How it works
17
18 `generic-array` is a method of achieving fixed-length fixed-size stack-allocated generic arrays without needing const generics in stable Rust.
19
20 That is to say this:
21
22 ```rust
23 struct Foo<const N: usize> {
24 data: [i32; N],
25 }
26 ```
27
28 or anything similar is not currently supported.
29
30 However, Rust's type system is sufficiently advanced, and a "hack" for solving this was created in the form of the `typenum` crate, which recursively defines integer values in binary as nested types, and operations which can be applied to those type-numbers, such as `Add`, `Sub`, etc.
31
32 e.g. `6` would be `UInt<UInt<UInt<UTerm, B1>, B1>, B0>`
33
34 Over time, I've come to see `typenum` as less of a hack and more as an elegant solution.
35
36 The recursive binary nature of `typenum` is what makes `generic-array` possible, so:
37
38 ```rust
39 struct Foo<N: ArrayLength<i32>> {
40 data: GenericArray<i32, N>,
41 }
42 ```
43
44 is supported.
45
46 I often see questions about why `ArrayLength` requires the element type `T` in it's signature, even though it's not used in the inner `ArrayType`.
47
48 This is because `GenericArray` itself does not define the actual array. Rather, it is defined as:
49
50 ```rust
51 pub struct GenericArray<T, N: ArrayLength<T>> {
52 data: N::ArrayType,
53 }
54 ```
55
56 The trait `ArrayLength` does all the real heavy lifting for defining the data, with implementations on `UInt<N, B0>`, `UInt<N, B1>` and `UTerm`, which correspond to even, odd and zero numeric values, respectively.
57
58 `ArrayLength`'s implementations use type-level recursion to peel away each least significant bit and form sort of an opaque binary tree of contiguous data the correct physical size to store `N` elements of `T`. The tree, or block of data, is then stored inside of `GenericArray` to be reinterpreted as the array.
59
60 For example, `GenericArray<T, U6>` more or less expands to (at compile time):
61
62 <details>
63 <summary>Expand for code</summary>
64
65 ```rust
66 GenericArray {
67 // UInt<UInt<UInt<UTerm, B1>, B1>, B0>
68 data: EvenData {
69 // UInt<UInt<UTerm, B1>, B1>
70 left: OddData {
71 // UInt<UTerm, B1>
72 left: OddData {
73 left: (), // UTerm
74 right: (), // UTerm
75 data: T, // Element 0
76 },
77 // UInt<UTerm, B1>
78 right: OddData {
79 left: (), // UTerm
80 right: (), // UTerm
81 data: T, // Element 1
82 },
83 data: T // Element 2
84 },
85 // UInt<UInt<UTerm, B1>, B1>
86 right: OddData {
87 // UInt<UTerm, B1>
88 left: OddData {
89 left: (), // UTerm
90 right: (), // UTerm
91 data: T, // Element 3
92 },
93 // UInt<UTerm, B1>
94 right: OddData {
95 left: (), // UTerm
96 right: (), // UTerm
97 data: T, // Element 4
98 },
99 data: T // Element 5
100 }
101 }
102 }
103 ```
104
105 </details>
106
107 This has the added benefit of only being `log2(N)` deep, which is important for things like `Drop`, which we'll go into later.
108
109 Then, we take `data` and cast it to `*const T` or `*mut T` and use it as a slice like:
110
111 ```rust
112 unsafe {
113 slice::from_raw_parts(
114 self as *const Self as *const T,
115 N::to_usize()
116 )
117 }
118 ```
119
120 It is useful to note that because `typenum` is compile-time with nested generics, `to_usize`, even if it isn't a `const fn`, *does* expand to effectively `1 + 2 + 4 + 8 + ...` and so forth, which LLVM is smart enough to reduce to a single compile-time constant. This helps hint to the optimizers about things such as bounds checks.
121
122 So, to reiterate, we're working with a raw block of contiguous memory the correct physical size to store `N` elements of `T`. It's really no different from how normal arrays are stored.
123
124 ## Pointer Safety
125
126 Of course, casting pointers around and constructing blocks of data out of thin air is normal for C, but here in Rust we try to be a bit less prone to segfaults. Therefore, great care is taken to minimize casual `unsafe` usage and restrict `unsafe` to specific parts of the API, making heavy use those exposed safe APIs internally.
127
128 For example, the above `slice::from_raw_parts` is only used twice in the entire library, once for `&[T]` and `slice::from_raw_parts_mut` once for `&mut [T]`. Everything else goes through those slices.
129
130 # Initialization
131
132 ## Constant
133
134 "Constant" initialization, that is to say - without dynamic values, can be done via the `arr![]` macro, which works almost exactly like `vec![]`, but with an additional type parameter.
135
136 Example:
137
138 ```rust
139 let my_arr = arr![i32; 1, 2, 3, 4, 5, 6, 7, 8];
140 ```
141
142 ## Dynamic
143
144 Although some users have opted to use their own initializers, as of version `0.9` and beyond `generic-array` includes safe methods for initializing elements in the array.
145
146 The `GenericSequence` trait defines a `generate` method which can be used like so:
147
148 ```rust
149 use generic_array::{GenericArray, sequence::GenericSequence};
150
151 let squares: GenericArray<i32, U4> =
152 GenericArray::generate(|i: usize| i as i32 * 2);
153 ```
154
155 and `GenericArray` additionally implements `FromIterator`, although `from_iter` ***will*** panic if the number of elements is not *at least* `N`. It will ignore extra items.
156
157 The safety of these operations is described later.
158
159 # Functional Programming
160
161 In addition to `GenericSequence`, this crate provides a `FunctionalSequence`, which allows extremely efficient `map`, `zip` and `fold` operations on `GenericArray`s.
162
163 As described at the end of the [Optimization](#optimization) section, `FunctionalSequence` uses clever specialization tactics to provide optimized methods wherever possible, while remaining perfectly safe.
164
165 Some examples, taken from `tests/generic.rs`:
166
167 <details>
168 <summary>Expand for code</summary>
169
170 This is so extensive to show how you can build up to processing totally arbitrary sequences, but for the most part these can be used on `GenericArray` instances without much added complexity.
171
172 ```rust
173 /// Super-simple fixed-length i32 `GenericArray`s
174 pub fn generic_array_plain_zip_sum(a: GenericArray<i32, U4>, b: GenericArray<i32, U4>) -> i32 {
175 a.zip(b, |l, r| l + r)
176 .map(|x| x + 1)
177 .fold(0, |a, x| x + a)
178 }
179
180 pub fn generic_array_variable_length_zip_sum<N>(a: GenericArray<i32, N>, b: GenericArray<i32, N>) -> i32
181 where
182 N: ArrayLength<i32>,
183 {
184 a.zip(b, |l, r| l + r)
185 .map(|x| x + 1)
186 .fold(0, |a, x| x + a)
187 }
188
189 pub fn generic_array_same_type_variable_length_zip_sum<T, N>(a: GenericArray<T, N>, b: GenericArray<T, N>) -> i32
190 where
191 N: ArrayLength<T> + ArrayLength<<T as Add<T>>::Output>,
192 T: Add<T, Output=i32>,
193 {
194 a.zip(b, |l, r| l + r)
195 .map(|x| x + 1)
196 .fold(0, |a, x| x + a)
197 }
198
199 /// Complex example using fully generic `GenericArray`s with the same length.
200 ///
201 /// It's mostly just the repeated `Add` traits, which would be present in other systems anyway.
202 pub fn generic_array_zip_sum<A, B, N: ArrayLength<A> + ArrayLength<B>>(a: GenericArray<A, N>, b: GenericArray<B, N>) -> i32
203 where
204 A: Add<B>,
205 N: ArrayLength<<A as Add<B>>::Output> +
206 ArrayLength<<<A as Add<B>>::Output as Add<i32>>::Output>,
207 <A as Add<B>>::Output: Add<i32>,
208 <<A as Add<B>>::Output as Add<i32>>::Output: Add<i32, Output=i32>,
209 {
210 a.zip(b, |l, r| l + r)
211 .map(|x| x + 1)
212 .fold(0, |a, x| x + a)
213 }
214 ```
215 </details>
216
217 and if you really want to go off the deep end and support any arbitrary *`GenericSequence`*:
218
219 <details>
220 <summary>Expand for code</summary>
221
222 ```rust
223 /// Complex example function using generics to pass N-length sequences, zip them, and then map that result.
224 ///
225 /// If used with `GenericArray` specifically this isn't necessary
226 pub fn generic_sequence_zip_sum<A, B>(a: A, b: B) -> i32
227 where
228 A: FunctionalSequence<i32>, // `.zip`
229 B: FunctionalSequence<i32, Length = A::Length>, // `.zip`
230 A: MappedGenericSequence<i32, i32>, // `i32` -> `i32`
231 B: MappedGenericSequence<i32, i32, Mapped = MappedSequence<A, i32, i32>>, // `i32` -> `i32`, prove A and B can map to the same output
232 A::Item: Add<B::Item, Output = i32>, // `l + r`
233 MappedSequence<A, i32, i32>: MappedGenericSequence<i32, i32> + FunctionalSequence<i32>, // `.map`
234 SequenceItem<MappedSequence<A, i32, i32>>: Add<i32, Output=i32>, // `x + 1`
235 MappedSequence<MappedSequence<A, i32, i32>, i32, i32>: Debug, // `println!`
236 MappedSequence<MappedSequence<A, i32, i32>, i32, i32>: FunctionalSequence<i32>, // `.fold`
237 SequenceItem<MappedSequence<MappedSequence<A, i32, i32>, i32, i32>>: Add<i32, Output=i32> // `x + a`, note the order
238 {
239 let c = a.zip(b, |l, r| l + r).map(|x| x + 1);
240
241 println!("{:?}", c);
242
243 c.fold(0, |a, x| x + a)
244 }
245 ```
246
247 of course, as I stated before, that's almost never necessary, especially when you know the concrete types of all the components.
248
249 </details>
250
251 The [`numeric-array`](https://crates.io/crates/numeric-array) crate uses these to apply numeric operations across all elements in a `GenericArray`, making full use of all the optimizations described in the last section here.
252
253 # Miscellaneous Utilities
254
255 Although not usually advertised, `generic-array` contains traits for lengthening, shortening, splitting and concatenating arrays.
256
257 For example, these snippets are taken from `tests/mod.rs`:
258
259 <details>
260 <summary>Expand for code</summary>
261
262 Appending and prepending elements:
263
264 ```rust
265 use generic_array::sequence::Lengthen;
266
267 #[test]
268 fn test_append() {
269 let a = arr![i32; 1, 2, 3];
270
271 let b = a.append(4);
272
273 assert_eq!(b, arr![i32; 1, 2, 3, 4]);
274 }
275
276 #[test]
277 fn test_prepend() {
278 let a = arr![i32; 1, 2, 3];
279
280 let b = a.prepend(4);
281
282 assert_eq!(b, arr![i32; 4, 1, 2, 3]);
283 }
284 ```
285
286 Popping elements from the front of back of the array:
287
288 ```rust
289 use generic_array::sequence::Shorten;
290
291 let a = arr![i32; 1, 2, 3, 4];
292
293 let (init, last) = a.pop_back();
294
295 assert_eq!(init, arr![i32; 1, 2, 3]);
296 assert_eq!(last, 4);
297
298 let (head, tail) = a.pop_front();
299
300 assert_eq!(head, 1);
301 assert_eq!(tail, arr![i32; 2, 3, 4]);
302 ```
303
304 and of course concatenating and splitting:
305
306 ```rust
307 use generic_array::sequence::{Concat, Split};
308
309 let a = arr![i32; 1, 2];
310 let b = arr![i32; 3, 4];
311
312 let c = a.concat(b);
313
314 assert_eq!(c, arr![i32; 1, 2, 3, 4]);
315
316 let (d, e) = c.split();
317
318 assert_eq!(d, arr![i32; 1]);
319 assert_eq!(e, arr![i32; 2, 3, 4]);
320 ```
321 </details>
322
323 `Split` and `Concat` in these examples use type-inference to determine the lengths of the resulting arrays.
324
325 # Safety
326
327 As stated earlier, for raw reinterpretations such as this, safety is a must even while working with unsafe code. Great care is taken to reduce or eliminate undefined behavior.
328
329 For most of the above code examples, the biggest potential undefined behavior hasn't even been applicable for one simple reason: they were all primitive values.
330
331 The simplest way to lead into this is to post these questions:
332
333 1. What if the element type of the array implements `Drop`?
334 2. What if `GenericArray::generate` opens a bunch of files?
335 3. What if halfway through opening each of the files, one is not found?
336 4. What if the resulting error is unwrapped, causing the generation function to panic?
337
338 For a fully initialized `GenericArray`, the expanded structure as described in the [How It Works](#how-it-works) can implement `Drop` naturally, recursively dropping elements. As it is only `log2(N)` deep, the recursion is very small overall.
339
340 In fact, I tested it while writing this, the size of the array itself overflows the stack before any recursive calls to `drop` can.
341
342 However, ***partially*** initialized arrays, such as described in the above hypothetical, pose an issue where `drop` could be called on uninitialized data, which is undefined behavior.
343
344 To solve this, `GenericArray` implements two components named `ArrayBuilder` and `ArrayConsumer`, which work very similarly.
345
346 `ArrayBuilder` creates a block of wholly uninitialized memory via `mem::unintialized()`, and stores that in a `ManuallyDrop` wrapper. `ManuallyDrop` does exactly what it says on the tin, and simply doesn't drop the value unless manually requested to.
347
348 So, as we're initializing our array, `ArrayBuilder` keeps track of the current position through it, and if something happens, `ArrayBuilder` itself will iteratively and manually `drop` all currently initialized elements, ignoring any uninitialized ones, because those are just raw memory and should be ignored.
349
350 `ArrayConsumer` does almost the same, "moving" values out of the array and into something else, like user code. It uses `ptr::read` to "move" the value out, and increments a counter saying that value is no longer valid in the array.
351
352 If a panic occurs in the user code with that element, it's dropped naturally as it was moved into that scope. `ArrayConsumer` then proceeds to iteratively and manually `drop` all *remaining* elements.
353
354 Combined, these two systems provide a safe system for building and consuming `GenericArray`s. In fact, they are used extensively inside the library itself for `FromIterator`, `GenericSequence` and `FunctionalSequence`, among others.
355
356 Even `GenericArray`s implementation of `Clone` makes use of this via:
357
358 ```rust
359 impl<T: Clone, N> Clone for GenericArray<T, N>
360 where
361 N: ArrayLength<T>,
362 {
363 fn clone(&self) -> GenericArray<T, N> {
364 self.map(|x| x.clone())
365 }
366 }
367 ```
368
369 where `.map` is from the `FunctionalSequence`, and uses those builder and consumer structures to safely move and initialize values. Although, in this particular case, a consumer is not necessary as we're using references. More on how that is automatically deduced is described in the next section.
370
371 # Optimization
372
373 Rust and LLVM is smart. Crazy smart. However, it's not magic.
374
375 In my experience, most of Rust's "zero-cost" abstractions stem more from the type system, rather than explicit optimizations. Most Rust code is very easily optimizable and inlinable by design, so it can be simplified and compacted rather well, as opposed to the spaghetti code of some other languages.
376
377 Unfortunately, unless `rustc` or LLVM can "prove" things about code to simplify it, it must still be run, and can prevent further optimization.
378
379 A great example of this, and why I created the `GenericSequence` and `FunctionalSequence` traits, are iterators.
380
381 Custom iterators are slow. Not terribly slow, but slow enough to prevent some rather important optimizations.
382
383 Take `GenericArrayIter` for example:
384
385 <details>
386 <summary>Expand for code</summary>
387
388 ```rust
389 pub struct GenericArrayIter<T, N: ArrayLength<T>> {
390 array: ManuallyDrop<GenericArray<T, N>>,
391 index: usize,
392 index_back: usize,
393 }
394
395 impl<T, N> Iterator for GenericArrayIter<T, N>
396 where
397 N: ArrayLength<T>,
398 {
399 type Item = T;
400
401 #[inline]
402 fn next(&mut self) -> Option<T> {
403 if self.index < self.index_back {
404 let p = unsafe {
405 Some(ptr::read(self.array.get_unchecked(self.index)))
406 };
407
408 self.index += 1;
409
410 p
411 } else {
412 None
413 }
414 }
415
416 //and more
417 }
418 ```
419 </details>
420
421 Seems simple enough, right? Move an element out of the array with `ptr::read` and increment the index. If the iterator is dropped, the remaining elements are dropped exactly as they would with `ArrayConsumer`. `index_back` is provided for `DoubleEndedIterator`.
422
423 Unfortunately, that single `if` statement is terrible. In my mind, this is one of the biggest flaws of the iterator design. A conditional jump on a mutable variable unrelated to the data we are accessing on each call foils the optimizer and generates suboptimal code for the above iterator, even when we use `get_unchecked`.
424
425 The optimizer is unable to see that we are simply accessing memory sequentially. In fact, almost all iterators are like this. Granted, this is usually fine and, especially if they have to handle errors, it's perfectly acceptable.
426
427 However, there is one iterator in the standard library that is optimized perfectly: the slice iterator. So perfectly in fact that it allows the optimizer to do something even more special: **auto-vectorization**! We'll get to that later.
428
429 It's a bit frustrating as to *why* slice iterators can be so perfectly optimized, and it basically boils down to that the iterator itself does not own the data the slice refers to, so it uses raw pointers to the array/sequence/etc. rather than having to use an index on a stack allocated and always moving array. It can check for if the iterator is empty by comparing some `front` and `back` pointers for equality, and because those directly correspond to the position in memory of the next element, LLVM can see that and make optimizations.
430
431 So, the gist of that is: always use slice iterators where possible.
432
433 Here comes the most important part of all of this: `ArrayBuilder` and `ArrayConsumer` don't iterate the arrays themselves. Instead, we use slice iterators (immutable and mutable), with `zip` or `enumerate`, to apply operations to the entire array, incrementing the position in both `ArrayBuilder` or `ArrayConsumer` to keep track.
434
435 For example, `GenericSequence::generate` for `GenericArray` is:
436
437 <details>
438 <summary>Expand for code</summary>
439
440 ```rust
441 fn generate<F>(mut f: F) -> GenericArray<T, N>
442 where
443 F: FnMut(usize) -> T,
444 {
445 unsafe {
446 let mut destination = ArrayBuilder::new();
447
448 {
449 let (destination_iter, position) = destination.iter_position();
450
451 for (i, dst) in destination_iter.enumerate() {
452 ptr::write(dst, f(i));
453
454 *position += 1;
455 }
456 }
457
458 destination.into_inner()
459 }
460 }
461 ```
462
463 where `ArrayBuilder::iter_position` is just an internal convenience function:
464
465 ```rust
466 pub unsafe fn iter_position(&mut self) -> (slice::IterMut<T>, &mut usize) {
467 (self.array.iter_mut(), &mut self.position)
468 }
469 ```
470 </details>
471
472 Of course, this may appear to be redundant, if we're using an iterator that keeps track of the position itself, and the builder is also keeping track of the position. However, the two are decoupled.
473
474 If the generation function doesn't have a chance at panicking, and/or the array element type doesn't implement `Drop`, the optimizer deems the `Drop` implementation on `ArrayBuilder` (and `ArrayConsumer`) dead code, and therefore `position` is never actually read from, so it becomes dead code as well, and is removed.
475
476 So for simple non-`Drop`/non-panicking elements and generation functions, `generate` becomes a very simple loop that uses a slice iterator to write values to the array.
477
478 Next, let's take a look at a more complex example where this *really* shines: `.zip`
479
480 To cut down on excessively verbose code, `.zip` uses `FromIterator` for building the array, which has almost identical code to `generate`, so it will be omitted.
481
482 The first implementation of `.zip` is defined as:
483
484 <details>
485 <summary>Expand for code</summary>
486
487 ```rust
488 fn inverted_zip<B, U, F>(
489 self,
490 lhs: GenericArray<B, Self::Length>,
491 mut f: F,
492 ) -> MappedSequence<GenericArray<B, Self::Length>, B, U>
493 where
494 GenericArray<B, Self::Length>:
495 GenericSequence<B, Length = Self::Length> + MappedGenericSequence<B, U>,
496 Self: MappedGenericSequence<T, U>,
497 Self::Length: ArrayLength<B> + ArrayLength<U>,
498 F: FnMut(B, Self::Item) -> U,
499 {
500 unsafe {
501 let mut left = ArrayConsumer::new(lhs);
502 let mut right = ArrayConsumer::new(self);
503
504 let (left_array_iter, left_position) = left.iter_position();
505 let (right_array_iter, right_position) = right.iter_position();
506
507 FromIterator::from_iter(left_array_iter.zip(right_array_iter).map(|(l, r)| {
508 let left_value = ptr::read(l);
509 let right_value = ptr::read(r);
510
511 *left_position += 1;
512 *right_position += 1;
513
514 f(left_value, right_value)
515 }))
516 }
517 }
518 ```
519 </details>
520
521 The gist of this is that we have two `GenericArray` instances that need to be zipped together and mapped to a new sequence. This employs two `ArrayConsumer`s, and more or less use the same pattern as the previous example.
522
523 Again, the position values can be optimized out, and so can the slice iterator adapters.
524
525 We can go a step further with this, however.
526
527 Consider this:
528
529 ```rust
530 let a = arr![i32; 1, 3, 5, 7];
531 let b = arr![i32; 2, 4, 6, 8];
532
533 let c = a.zip(b, |l, r| l + r);
534
535 assert_eq!(c, arr![i32; 3, 7, 11, 15]);
536 ```
537
538 when compiled with:
539
540 ```
541 cargo rustc --lib --profile test --release -- -C target-cpu=native -C opt-level=3 --emit asm
542 ```
543
544 will produce assembly with the following relevant instructions taken from the entire program:
545
546 ```asm
547 ; Copy constant to register
548 vmovaps __xmm@00000007000000050000000300000001(%rip), %xmm0
549
550 ; Copy constant to register
551 vmovaps __xmm@00000008000000060000000400000002(%rip), %xmm0
552
553 ; Add the two values together
554 vpaddd 192(%rsp), %xmm0, %xmm1
555
556 ; Copy constant to register
557 vmovaps __xmm@0000000f0000000b0000000700000003(%rip), %xmm0
558
559 ; Compare result of the addition with the last constant
560 vpcmpeqb 128(%rsp), %xmm0, %xmm0
561 ```
562
563 so, aside from a bunch of obvious hygiene instructions around those selected instructions,
564 it seriously boils down that `.zip` call to a ***SINGLE*** SIMD instruction. In fact, it continues to do this for even larger arrays. Although it does fall back to individual additions for fewer than four elements, as it can't fit those into an SSE register evenly.
565
566 Using this property of auto-vectorization without sacrificing safety, I created the [`numeric-array`](https://crates.io/crates/numeric-array) crate which makes use of this to wrap `GenericArray` and implement numeric traits so that almost *all* operations can be auto-vectorized, even complex ones like fused multiple-add.
567
568 It doesn't end there, though. You may have noticed that the function name for zip above wasn't `zip`, but `inverted_zip`.
569
570 This is because `generic-array` employs a clever specialization tactic to ensure `.zip` works corrects with:
571
572 1. `a.zip(b, ...)`
573 2. `(&a).zip(b, ...)`
574 3. `(&a).zip(&b, ...)`
575 4. `a.zip(&b, ...)`
576
577 wherein `GenericSequence` and `FunctionalSequence` have default implementations of `zip` variants, with concrete implementations for `GenericArray`. As `GenericSequence` is implemented for `&GenericArray`, where calling `into_iter` on produces a slice iterator, it can use "naive" iterator adapters to the same effect, while the specialized implementations use `ArrayConsumer`.
578
579 The result is that any combination of move or reference calls to `.zip`, `.map` and `.fold` produce code that can be optimized, none of them falling back to slow non-slice iterators. All perfectly safe with the `ArrayBuilder` and `ArrayConsumer` systems.
580
581 Honestly, `GenericArray` is better than standard arrays at this point.
582
583 # The Future
584
585 If/when const generics land in stable Rust, my intention is to reorient this crate or create a new crate to provide traits and wrappers for standard arrays to provide the same safety and performance discussed above.