===================
BlueStore Internals
===================


Small write strategies
----------------------

* *U*: Uncompressed write of a complete, new blob.

  - write to new blob
  - kv commit

* *P*: Uncompressed partial write to an unused region of an existing
  blob.

  - write to unused chunk(s) of existing blob
  - kv commit

* *W*: WAL overwrite: commit intent to overwrite, then overwrite
  asynchronously. The overwrite must be aligned to
  chunk_size = MAX(block_size, csum_block_size).

  - kv commit
  - wal overwrite (chunk-aligned) of existing blob

* *N*: Uncompressed partial write to a new blob. The blob is initially
  sparsely utilized. Future writes to it will be either *P* or *W*.

  - write into a new (sparse) blob
  - kv commit

* *R+W*: Read the partial chunk, then do a WAL overwrite.

  - read (out to chunk boundaries)
  - kv commit
  - wal overwrite (chunk-aligned) of existing blob

* *C*: Compress data, write to new blob.

  - compress and write to new blob
  - kv commit
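
The alignment rule for *W* and the read step of *R+W* come down to a little arithmetic. The following is a minimal Python sketch, not BlueStore code: the function names (``chunk_size``, ``align_out``, ``overwrite_mode``) are hypothetical, and the choice between *W* and *R+W* is reduced to an alignment check that ignores the *P* and *N* alternatives.

```python
from typing import Tuple


def chunk_size(block_size: int, csum_block_size: int) -> int:
    """Alignment unit for WAL overwrites: MAX(block_size, csum_block_size)."""
    return max(block_size, csum_block_size)


def align_out(offset: int, length: int, chunk: int) -> Tuple[int, int]:
    """Expand [offset, offset + length) outward to chunk boundaries.

    Returns the (offset, length) of the region an R+W would have to read.
    """
    start = (offset // chunk) * chunk                # round start down
    end = -(-(offset + length) // chunk) * chunk     # round end up
    return start, end - start


def overwrite_mode(offset: int, length: int, chunk: int) -> str:
    """Simplified choice: 'W' if the write is chunk-aligned, else 'R+W'."""
    aligned = offset % chunk == 0 and length % chunk == 0
    return "W" if aligned else "R+W"
```

Under these assumptions, with a 4 KB block size and 16 KB checksum blocks, a 4 KB overwrite at offset 4096 is not chunk-aligned, so it would first read out the enclosing 16 KB chunk before the WAL overwrite.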

Possible future modes
---------------------

* *F*: Fragment lextent space by writing a small piece of data into a
  piecemeal blob (one that collects random, noncontiguous bits of data
  we need to write).

  - write to a piecemeal blob (min_alloc_size or larger, but we use just one block of it)
  - kv commit

* *X*: WAL read/modify/write on a single block (like legacy
  bluestore). No checksum.

  - kv commit
  - wal read/modify/write
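
The step lists for the current and possible future modes differ mainly in whether the kv commit comes after the data write or before a deferred (WAL) write. The sketch below transcribes the steps into a Python table to make that ordering explicit. The names ``STEPS`` and ``commits_before_data`` are hypothetical, and the step strings are labels copied from the lists, not BlueStore identifiers.

```python
# Ordered steps per mode, transcribed from the lists above.
STEPS = {
    "U": ["write to new blob", "kv commit"],
    "P": ["write to unused chunk(s) of existing blob", "kv commit"],
    "W": ["kv commit", "wal overwrite (chunk-aligned) of existing blob"],
    "N": ["write into a new (sparse) blob", "kv commit"],
    "R+W": [
        "read (out to chunk boundaries)",
        "kv commit",
        "wal overwrite (chunk-aligned) of existing blob",
    ],
    "C": ["compress and write to new blob", "kv commit"],
    # Possible future modes
    "F": ["write to a piecemeal blob", "kv commit"],
    "X": ["kv commit", "wal read/modify/write"],
}


def commits_before_data(mode: str) -> bool:
    """True if the kv commit precedes the final data write for this mode."""
    steps = STEPS[mode]
    return steps.index("kv commit") < len(steps) - 1


# Modes whose data write happens only after the kv commit.
deferred = sorted(m for m in STEPS if commits_before_data(m))
```

In this reading, *W*, *R+W*, and *X* commit first and write afterwards, while the other modes write data first and then commit.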

Mapping
-------

This very roughly maps the type of write onto what we do when we
encounter a given blob. In practice it's a bit more complicated since there
might be several blobs to consider (e.g., we might be able to *W* into one or
*P* into another), but it should communicate a rough idea of strategy.

+--------------------------+----------+--------------+-------------+--------------+---------------+
|                          | raw      | raw (cached) | csum (4 KB) | csum (16 KB) | comp (128 KB) |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 128+ KB (over)write      | U        | U            | U           | U            | C             |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 64 KB (over)write        | U        | U            | U           | U            | U or C        |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 4 KB overwrite           | W or P   | W or P       | W or P      | R+W or P     | N (F?)        |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 100 byte overwrite       | R+W or P | W or P       | R+W or P    | R+W or P     | N (F?)        |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 100 byte append          | R+W or P | W or P       | R+W or P    | R+W or P     | N (F?)        |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 4 KB clone overwrite     | P or N   | P or N       | P or N      | P or N       | N (F?)        |
+--------------------------+----------+--------------+-------------+--------------+---------------+
| 100 byte clone overwrite | P or N   | P or N       | P or N      | P or N       | N (F?)        |
+--------------------------+----------+--------------+-------------+--------------+---------------+
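
The table can also be read as a lookup from (write type, existing blob type) to a list of candidate strategies. The following is a minimal Python sketch of that reading, not how BlueStore selects a strategy. The names ``BLOB_KINDS``, ``MAPPING``, and ``candidates`` are hypothetical, and cells with two entries are assumed to be alternatives, listed in table order rather than preference order.

```python
# Column headings of the table, left to right.
BLOB_KINDS = ["raw", "raw (cached)", "csum (4 KB)", "csum (16 KB)", "comp (128 KB)"]

# One row per write kind, one entry per blob kind (same order as BLOB_KINDS).
# "N (F?)" is recorded as N only, since F is just a possible future mode.
MAPPING = {
    "128+ KB (over)write": [["U"], ["U"], ["U"], ["U"], ["C"]],
    "64 KB (over)write": [["U"], ["U"], ["U"], ["U"], ["U", "C"]],
    "4 KB overwrite": [["W", "P"], ["W", "P"], ["W", "P"], ["R+W", "P"], ["N"]],
    "100 byte overwrite": [["R+W", "P"], ["W", "P"], ["R+W", "P"], ["R+W", "P"], ["N"]],
    "100 byte append": [["R+W", "P"], ["W", "P"], ["R+W", "P"], ["R+W", "P"], ["N"]],
    "4 KB clone overwrite": [["P", "N"], ["P", "N"], ["P", "N"], ["P", "N"], ["N"]],
    "100 byte clone overwrite": [["P", "N"], ["P", "N"], ["P", "N"], ["P", "N"], ["N"]],
}


def candidates(write_kind: str, blob_kind: str) -> list:
    """Return the strategies the table lists for this write and blob kind."""
    return MAPPING[write_kind][BLOB_KINDS.index(blob_kind)]
```

For instance, ``candidates("4 KB overwrite", "csum (16 KB)")`` yields *R+W* and *P*, reflecting that a 4 KB write is smaller than the 16 KB checksum chunk and so cannot be a plain *W*.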