---
title: Bulkloading by ingesting external SST files
layout: post
author: IslamAbdelRahman
category: blog
---

## Introduction

One of the basic operations of RocksDB is writing to it. Writes happen when a user calls DB::Put(), DB::Write(), DB::Delete(), and so on. But what actually happens when you write to RocksDB? Here is a brief description:
- The user inserts a new key/value by calling DB::Put() (or DB::Write()).
- We create a new entry for the key/value in our in-memory structure (the memtable, a SkipList by default) and assign it a new sequence number.
- When the memtable exceeds a specific size (64 MB, for example), we convert it to an SST file and put this file in level 0 of our LSM-Tree.
- Later, compaction kicks in and moves data from level 0 to level 1, then from level 1 to level 2, and so on.
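The write path described in the steps above can be sketched with a toy model. This is not RocksDB code: `ToyLsm`, `ToyEntry`, `ToySstFile`, and the entry-count threshold are hypothetical names chosen for illustration, the real memtable limit is measured in bytes rather than entries, and compaction is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only; none of these types exist in RocksDB.
struct ToyEntry {
  std::string value;
  std::uint64_t sequence;  // sequence number assigned at write time
};

// A "file" here is just an immutable sorted run of entries.
using ToySstFile = std::map<std::string, ToyEntry>;

class ToyLsm {
 public:
  // Assumption: the limit is a number of entries, not a size in bytes.
  explicit ToyLsm(std::size_t memtable_limit)
      : memtable_limit_(memtable_limit) {
    assert(memtable_limit > 0);
  }

  // Insert into the in-memory structure and assign a new sequence number.
  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = ToyEntry{value, ++last_sequence_};
    // Once the memtable is full, turn it into a new level-0 file.
    if (memtable_.size() >= memtable_limit_) {
      Flush();
    }
  }

  // Move the current memtable contents into level 0 as one sorted file.
  void Flush() {
    if (memtable_.empty()) return;
    level0_.push_back(std::move(memtable_));
    memtable_.clear();  // leave the moved-from map in a known empty state
  }

  std::size_t Level0FileCount() const { return level0_.size(); }
  std::size_t MemtableEntryCount() const { return memtable_.size(); }
  std::uint64_t LastSequence() const { return last_sequence_; }

 private:
  std::size_t memtable_limit_;
  std::uint64_t last_sequence_ = 0;
  std::map<std::string, ToyEntry> memtable_;
  std::vector<ToySstFile> level0_;
};
```

In this model every key passes through the memtable and level 0 before compaction could move it further down.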

But what if we could skip these steps and add data directly to the lowest possible level? This is what bulk-loading does.

## Bulkloading

- Write all of our keys and values into an SST file outside of the DB
- Add the SST file into the LSM directly

This is bulk-loading, and in specific use cases it allows users to achieve faster data loading and lower write amplification.

Doing it is as simple as:
```cpp
#include "rocksdb/db.h"
#include "rocksdb/sst_file_writer.h"

// Path of the SST file to create outside of the DB
std::string file_path = "/home/usr/file1.sst";

Options options;
SstFileWriter sst_file_writer(EnvOptions(), options, options.comparator);
Status s = sst_file_writer.Open(file_path);
assert(s.ok());

// Insert rows into the SST file, note that inserted keys must be
// strictly increasing (based on options.comparator)
for (...) {
  s = sst_file_writer.Add(key, value);
  assert(s.ok());
}

// Finish writing and close the SST file before ingesting it
s = sst_file_writer.Finish();
assert(s.ok());

// Ingest the external SST file into the DB
s = db_->IngestExternalFile({file_path}, IngestExternalFileOptions());
assert(s.ok());
```
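Because the keys added to the SST file must be strictly increasing, unsorted input has to be sorted and de-duplicated first. The following is a minimal sketch of that preparation step, assuming the default bytewise comparator and a "last row wins" policy for duplicate keys. `KeyValue`, `IsStrictlyIncreasing`, and `PrepareForSstFile` are hypothetical helpers, not part of the RocksDB API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Returns true if every key is strictly greater than the one before it.
bool IsStrictlyIncreasing(const std::vector<KeyValue>& rows) {
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (!(rows[i - 1].first < rows[i].first)) return false;
  }
  return true;
}

// Hypothetical helper: sorts rows by key and collapses duplicate keys,
// keeping the value of the row that appeared last in the input.
std::vector<KeyValue> PrepareForSstFile(std::vector<KeyValue> rows) {
  // stable_sort preserves the input order among rows with equal keys.
  std::stable_sort(rows.begin(), rows.end(),
                   [](const KeyValue& a, const KeyValue& b) {
                     return a.first < b.first;
                   });
  std::vector<KeyValue> result;
  for (auto& row : rows) {
    if (!result.empty() && result.back().first == row.first) {
      result.back().second = std::move(row.second);  // later row wins
    } else {
      result.push_back(std::move(row));
    }
  }
  assert(IsStrictlyIncreasing(result));
  return result;
}
```

The rows returned by `PrepareForSstFile` could then be passed one by one to the writer loop in the snippet above. With a custom comparator, the sort predicate would have to use that comparator's ordering instead of `operator<`.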

You can find more details about generating SST files and ingesting them into RocksDB on this [wiki page](https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files).

## Use cases
There are multiple use cases where bulkloading could be useful, for example:
- Generating SST files in offline jobs in Hadoop, then downloading and ingesting the SST files into RocksDB
- Migrating shards between machines by dumping a key range into an SST file and loading the file on a different machine
- Migrating from a different storage engine (for example, an InnoDB to RocksDB migration in MyRocks)
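As a rough illustration of the shard-migration case, the sketch below dumps a half-open key range from a sorted store. It uses a `std::map` as a stand-in for the source database, and `DumpKeyRange` is a hypothetical helper, not a RocksDB function. With a real DB, the same loop would walk an iterator over the range instead.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copies every entry with begin <= key < end.
// Iterating a sorted container yields keys in strictly increasing order,
// which is the order an SST file writer expects.
std::vector<std::pair<std::string, std::string>> DumpKeyRange(
    const std::map<std::string, std::string>& store,
    const std::string& begin, const std::string& end) {
  assert(begin <= end);
  std::vector<std::pair<std::string, std::string>> rows;
  for (auto it = store.lower_bound(begin);
       it != store.end() && it->first < end; ++it) {
    rows.emplace_back(it->first, it->second);
  }
  return rows;
}
```

The dumped rows can be written to an SST file on the source machine and ingested on the destination machine.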