# [base64](https://crates.io/crates/base64)

[![](https://img.shields.io/crates/v/base64.svg)](https://crates.io/crates/base64) [![Docs](https://docs.rs/base64/badge.svg)](https://docs.rs/base64) [![CircleCI](https://circleci.com/gh/marshallpierce/rust-base64/tree/master.svg?style=shield)](https://circleci.com/gh/marshallpierce/rust-base64/tree/master) [![codecov](https://codecov.io/gh/marshallpierce/rust-base64/branch/master/graph/badge.svg)](https://codecov.io/gh/marshallpierce/rust-base64) [![unsafe forbidden](https://img.shields.io/badge/unsafe-forbidden-success.svg)](https://github.com/rust-secure-code/safety-dance/)

<a href="https://www.jetbrains.com/?from=rust-base64"><img src="/icon_CLion.svg" height="40px"/></a>

Made with CLion. Thanks to JetBrains for supporting open source!

It's base64. What more could anyone want?

This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at
multiple levels of abstraction so you can choose the level of convenience vs performance that you want,
e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6 GiB/s for a 3 KiB input),
whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is
slower (although still fast enough for almost any purpose) at 2.1 GiB/s.

See the [docs](https://docs.rs/base64) for all the details.

## FAQ

### I need to decode base64 with whitespace/null bytes/other random things interspersed in it. What should I do?

Remove non-base64 characters from your input before decoding.

If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to
strip out whatever you need removed.
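
As a minimal sketch of the `retain` approach, the snippet below strips everything outside the standard alphabet in
place. The helper names `is_base64_byte` and `strip_non_base64` are made up for illustration and are not part of this
crate, and the predicate assumes the standard alphabet, so adjust it for URL-safe input.

```rust
/// Returns true for bytes in the standard base64 alphabet, plus the `=` pad byte.
/// Assumption: standard alphabet (A-Z, a-z, 0-9, '+', '/'), not the URL-safe one.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

/// Removes every byte that is not base64 (whitespace, NULs, and so on), in place.
fn strip_non_base64(input: &mut Vec<u8>) {
    input.retain(|&b| is_base64_byte(b));
}
```

The cleaned `Vec` can then be handed to whichever decode function you prefer.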

If you have a `Read` (e.g. reading a file or network socket), there are various approaches.

- Use [iter_read](https://crates.io/crates/iter-read) together with `Read`'s `bytes()` to filter out unwanted bytes.
- Implement `Read` with a `read()` impl that delegates to your actual `Read`, and then drops any bytes you don't want.
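
The second option could look roughly like the sketch below. `FilteringReader` is a hypothetical name, not something
this crate provides, and dropping ASCII whitespace and NUL bytes is just one assumed policy; swap in whatever
predicate matches your input.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: wraps any `Read` and drops ASCII whitespace and NUL bytes.
struct FilteringReader<R> {
    inner: R,
}

impl<R: Read> Read for FilteringReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        loop {
            let n = self.inner.read(buf)?;
            if n == 0 {
                // The underlying reader is genuinely at end of input.
                return Ok(0);
            }
            // Compact the bytes we want to keep toward the front of `buf`.
            let mut kept = 0;
            for i in 0..n {
                let b = buf[i];
                if !b.is_ascii_whitespace() && b != 0 {
                    buf[kept] = b;
                    kept += 1;
                }
            }
            // If everything was filtered out, read again instead of returning
            // Ok(0), which callers would interpret as end of input.
            if kept > 0 {
                return Ok(kept);
            }
        }
    }
}
```

Looping when a whole chunk gets filtered out matters: returning `Ok(0)` there would make the caller stop early.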

### I need to line-wrap base64, e.g. for MIME/PEM.

[line-wrap](https://crates.io/crates/line-wrap) does just that.
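
To illustrate what line wrapping means here, this is a std-only sketch that does not use the `line-wrap` crate or
mirror its API. `wrap_lines` is a hypothetical helper. MIME conventionally wraps at 76 characters with CRLF and PEM at
64 characters, but the line length and line ending are left as parameters.

```rust
/// Inserts `line_ending` after every `line_len` bytes of already-encoded base64.
/// No line ending is appended after the final line.
fn wrap_lines(encoded: &str, line_len: usize, line_ending: &str) -> String {
    assert!(line_len > 0, "line length must be positive");
    // Base64 output is ASCII, so splitting on byte boundaries is safe.
    assert!(encoded.is_ascii(), "expected ASCII base64 text");

    let mut out = String::with_capacity(encoded.len() + (encoded.len() / line_len) * line_ending.len());
    for (i, chunk) in encoded.as_bytes().chunks(line_len).enumerate() {
        if i > 0 {
            out.push_str(line_ending);
        }
        out.push_str(std::str::from_utf8(chunk).expect("checked ASCII above"));
    }
    out
}
```

This allocates a new `String` for simplicity, which is fine for a sketch but not necessarily what you want on a hot
path.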

### I want canonical base64 encoding/decoding.

First, don't do this. You should no more expect Base64 to be canonical than you should expect compression algorithms to
produce canonical output across all usage in the wild (hint: they don't).
However, [people are drawn to their own destruction like moths to a flame](https://eprint.iacr.org/2022/361), so here we
are.

There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits
of the last encoded token in two or three token suffixes, and the `=` token used to inflate the suffix to a full four
tokens.

The trailing bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens,
with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we
decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set
to 1 instead of 0.
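
The arithmetic above can be sketched as a small check on the final group of tokens (with any `=` already removed).
None of these helpers exist in this crate; the names are invented for illustration and the standard alphabet is
assumed.

```rust
/// Unused low bits in the last token of a final group of `tokens` tokens.
fn unused_trailing_bits(tokens: usize) -> Option<u32> {
    match tokens {
        2 => Some(4), // 1 byte = 8 bits, 2 tokens = 12 bits
        3 => Some(2), // 2 bytes = 16 bits, 3 tokens = 18 bits
        4 => Some(0), // 3 bytes = 24 bits, 4 tokens = 24 bits
        _ => None,    // 0 or 1 tokens cannot hold a whole byte
    }
}

/// 6-bit value of a token in the standard alphabet (assumption: not URL-safe).
fn token_value(token: u8) -> Option<u8> {
    match token {
        b'A'..=b'Z' => Some(token - b'A'),
        b'a'..=b'z' => Some(token - b'a' + 26),
        b'0'..=b'9' => Some(token - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// True if the unused bits of the last token in an unpadded final group are all zero.
fn has_canonical_trailing_bits(last_group: &[u8]) -> bool {
    let unused = match unused_trailing_bits(last_group.len()) {
        Some(bits) => bits,
        None => return false,
    };
    let value = match last_group.last().and_then(|&t| token_value(t)) {
        Some(v) => v,
        None => return false,
    };
    let mask = (1u8 << unused) - 1;
    value & mask == 0
}
```

For example, `aA` and `aB` both carry the single byte `h` in their first 8 bits, but only `aA` leaves the 4 spare bits
at zero.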

The `=` pad bytes, on the other hand, are entirely a self-own by the Base64 standard. They do not affect decoding other
than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been
wasted on pointless `=` bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping
when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there
are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC,
or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the
url-safe alphabet).
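
A rough sketch of those two policies follows. `PaddingPolicy`, `canonical_pad_len`, and `padding_is_acceptable` are
hypothetical names used only for illustration, not this crate's configuration API, and the check looks only at
trailing `=` bytes rather than validating the tokens themselves.

```rust
/// Hypothetical policy switch for the two predictable padding schemes.
#[derive(Clone, Copy)]
enum PaddingPolicy {
    /// Pad with `=` to the next multiple of four bytes, as per the RFC.
    RequireCanonical,
    /// No `=` bytes at all.
    RequireNone,
}

/// Number of `=` bytes canonical padding adds for `input_len` raw input bytes.
fn canonical_pad_len(input_len: usize) -> usize {
    (3 - input_len % 3) % 3
}

/// Checks only the trailing padding of `encoded` against `policy`.
fn padding_is_acceptable(encoded: &[u8], policy: PaddingPolicy) -> bool {
    let pad = encoded.iter().rev().take_while(|&&b| b == b'=').count();
    let tokens = encoded.len() - pad;
    match policy {
        // At most two pad bytes, and they must complete a group of four.
        PaddingPolicy::RequireCanonical => pad <= 2 && encoded.len() % 4 == 0,
        // No pad bytes; a lone leftover token can never encode a whole byte.
        PaddingPolicy::RequireNone => pad == 0 && tokens % 4 != 1,
    }
}
```

With these definitions, `aA==` passes only under `RequireCanonical` and `aA` passes only under `RequireNone`.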

All `Engine` implementations must at a minimum support treating non-canonical padding of both types as an error, and
optionally may allow other behaviors.

## Rust version compatibility

The minimum supported Rust version is 1.48.0.

# Contributing

Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all
PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody
wants to chase bugs in encoding of any sort.

All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the
free time to give each PR the attention it deserves. I will get to everyone eventually!

## Developing

Benchmarks are in `benches/`.

```bash
cargo bench
```

## no_std

This crate supports `no_std`. By default the crate targets std via the `std` feature. You can disable
`default-features` to target `core` instead. In that case you lose all the functionality revolving
around `std::io`, `std::error::Error`, and heap allocations. There is an additional `alloc` feature that you can activate
to bring back support for heap allocations.
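
In `Cargo.toml`, that might look like the fragment below. The version number is only a placeholder assumption; use
whichever release you actually depend on, and drop the `features` entry if you do not want heap allocations.

```toml
[dependencies]
# Placeholder version (assumption). `default-features = false` turns off `std`;
# the optional `alloc` feature brings back the heap-allocating functionality.
base64 = { version = "0.21", default-features = false, features = ["alloc"] }
```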

## Profiling

On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. First, compile the
benchmarks with `cargo bench --no-run`.

Then run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results
easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your
CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual
full path with `cargo bench -v`; it will print out the commands it runs. If you use the exact path
that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want
to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).

```bash
sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse
```

Then analyze the results, again with perf:

```bash
sudo perf annotate -l
```

You'll see a bunch of interleaved Rust source and assembly like this. The section with `lib.rs:327` is telling us that
4.02% of samples saw the `movzbl` (a zero-extending byte load, most likely the decode table lookup) as the active
instruction. However, this percentage is not as exact as it seems due to a phenomenon called *skid*. Basically, a
consequence of how fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate,
especially in branch-heavy code.

```text
 lib.rs:322 0.70 : 10698: mov %rdi,%rax
 2.82 : 1069b: shr $0x38,%rax
 : if morsel == decode_tables::INVALID_VALUE {
 : bad_byte_index = input_index;
 : break;
 : };
 : accum = (morsel as u64) << 58;
 lib.rs:327 4.02 : 1069f: movzbl (%r9,%rax,1),%r15d
 : // fast loop of 8 bytes at a time
 : while input_index < length_of_full_chunks {
 : let mut accum: u64;
 :
 : let input_chunk = BigEndian::read_u64(&input_bytes[input_index..(input_index + 8)]);
 : morsel = decode_table[(input_chunk >> 56) as usize];
 lib.rs:322 3.68 : 106a4: cmp $0xff,%r15
 : if morsel == decode_tables::INVALID_VALUE {
 0.00 : 106ab: je 1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>
```

## Fuzzing

This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts.
To run, use an invocation like these:

```bash
cargo +nightly fuzz run roundtrip
cargo +nightly fuzz run roundtrip_no_pad
cargo +nightly fuzz run roundtrip_random_config -- -max_len=10240
cargo +nightly fuzz run decode_random
```

## License

This project is dual-licensed under MIT and Apache 2.0.