{% if include.truncate %}
{% if page.content contains '' %}
diff --git a/ceph/src/rocksdb/docs/_includes/social_plugins.html b/ceph/src/rocksdb/docs/_includes/social_plugins.html
index db4f1aecd..9b36580dc 100644
--- a/ceph/src/rocksdb/docs/_includes/social_plugins.html
+++ b/ceph/src/rocksdb/docs/_includes/social_plugins.html
@@ -21,4 +21,11 @@
 fjs.parentNode.insertBefore(js, fjs);
 }(document, 'script', 'facebook-jssdk'));
-
+
diff --git a/ceph/src/rocksdb/docs/_posts/2017-12-19-write-prepared-txn.markdown b/ceph/src/rocksdb/docs/_posts/2017-12-19-write-prepared-txn.markdown
index d592b6f7b..439b3f83c 100644
--- a/ceph/src/rocksdb/docs/_posts/2017-12-19-write-prepared-txn.markdown
+++ b/ceph/src/rocksdb/docs/_posts/2017-12-19-write-prepared-txn.markdown
@@ -7,8 +7,6 @@ category: blog

RocksDB supports both optimistic and pessimistic concurrency controls. The pessimistic transactions make use of locks to provide isolation between the transactions. The default write policy in pessimistic transactions is _WriteCommitted_, which means that the data is written to the DB, i.e., the memtable, only after the transaction is committed. This policy simplified the implementation but came with some limitations in throughput, transaction size, and variety in supported isolation levels. Below, we explain these in detail and present the other write policies, _WritePrepared_ and _WriteUnprepared_. We then dive into the design of _WritePrepared_ transactions.

-> _WritePrepared_ are to be announced as production-ready soon.
-
### WriteCommitted, Pros and Cons

With the _WriteCommitted_ write policy, the data is written to the memtable only after the transaction commits. This greatly simplifies the read path as any data that is read by other transactions can be assumed to be committed. This write policy, however, implies that the writes are buffered in memory in the meanwhile. This makes memory a bottleneck for large transactions. The delay of the commit phase in 2PC (two-phase commit) also becomes noticeable since most of the work, i.e., writing to the memtable, is done at the commit phase. When the commits of multiple transactions are done in a serial fashion, such as in the 2PC implementation of MySQL, the lengthy commit latency becomes a major contributor to lower throughput. Moreover, this write policy cannot provide weaker isolation levels, such as READ UNCOMMITTED, that could potentially provide higher throughput for some applications.

@@ -28,10 +26,16 @@ With _WritePrepared_, a transaction still buffers the writes in a write batch ob

The _CommitCache_ is a lock-free data structure that caches the recent commit entries. Looking up the entries in the cache should be enough for almost all the transactions that commit in a timely manner. When evicting the older entries from the cache, it still maintains some other data structures to cover the corner cases for transactions that take abnormally long to finish. We will cover them in the design details below.

-### Preliminary Results
-The full experimental results are to be reported soon. Here we present the improvement in tps observed in some preliminary experiments with MyRocks:
-* sysbench update-noindex: 25%
-* sysbench read-write: 7.6%
-* linkbench: 3.7%
+### Benchmark Results
+Here we present the improvements observed in MyRocks with sysbench and linkbench:
+
+| benchmark      | tps   | p95 latency | cpu/query      |
+|----------------|-------|-------------|----------------|
+| insert         | 68%   |             |                |
+| update-noindex | 30%   | 38%         |                |
+| update-index   | 61%   | 28%         |                |
+| read-write     | 6%    | 3.5%        |                |
+| read-only      | -1.2% | -1.8%       |                |
+| linkbench      | 1.9%  |             | 0.6% (overall) |
+
+Here are also the detailed results for [In-Memory Sysbench](https://gist.github.com/maysamyabandeh/bdb868091b2929a6d938615fdcf58424) and [SSD Sysbench](https://gist.github.com/maysamyabandeh/ff94f378ab48925025c34c47eff99306) courtesy of [@mdcallag](https://github.com/mdcallag). Learn more [here](https://github.com/facebook/rocksdb/wiki/WritePrepared-Transactions).
diff --git a/ceph/src/rocksdb/docs/_posts/2018-11-21-delete-range.markdown b/ceph/src/rocksdb/docs/_posts/2018-11-21-delete-range.markdown
new file mode 100644
index 000000000..96fc3562d
--- /dev/null
+++ b/ceph/src/rocksdb/docs/_posts/2018-11-21-delete-range.markdown
@@ -0,0 +1,292 @@
---
title: "DeleteRange: A New Native RocksDB Operation"
layout: post
author:
- abhimadan
- ajkr
category: blog
---
## Motivation

### Deletion patterns in LSM

Deleting a range of keys is a common pattern in RocksDB. Most systems built on top of RocksDB have multi-component key schemas, where keys sharing a common prefix are logically related. Here are some examples.

MyRocks is a MySQL fork using RocksDB as its storage engine. Each key's first four bytes identify the table or index to which that key belongs. Thus dropping a table or index involves deleting all the keys with that prefix.

Rockssandra is a Cassandra variant that uses RocksDB as its storage engine. One of its admin tool commands, `nodetool cleanup`, removes key-ranges that have been migrated to other nodes in the cluster.

Marketplace uses RocksDB to store product data. Its keys begin with a product ID, and it stores various data associated with the product in separate keys. When a product is removed, all these keys must be deleted.

When we decide what to improve, we try to find a use case that's common across users, since we want to build a generally useful system, not one that has many one-off features for individual users. The range deletion pattern is common as illustrated above, so from this perspective it's a good target for optimization.

### Existing mechanisms: challenges and opportunities

The most common pattern we see is scan-and-delete, i.e., advance an iterator through the to-be-deleted range and issue a `Delete` for each key. This is slow (it involves read I/O), so it cannot be done in any critical path. Additionally, it creates many tombstones, which slow down iterators and offer no deadline for space reclamation.

Another common pattern is using a custom compaction filter that drops keys in the deleted range(s). This deletes the range asynchronously, so it cannot be used in cases where readers must not see keys in deleted ranges. Further, it has the disadvantage of outputting tombstones to all but the bottom level. That's because compaction cannot detect whether dropping a key would cause an older version at a lower level to reappear.
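For concreteness, here is a minimal C++ sketch of the two approaches (our illustration, not code from the post): the scan-and-delete pattern described above, next to the single native `DeleteRange` call whose design the rest of this post covers. The `db` handle and the half-open key range `[begin, end)` are assumed, as is the default bytewise comparator.

```cpp
#include <memory>

#include "rocksdb/db.h"

// Scan-and-delete: advance an iterator through [begin, end) and issue a
// point Delete per key. Slow (read I/O) and leaves one tombstone per key.
rocksdb::Status ScanAndDelete(rocksdb::DB* db, const rocksdb::Slice& begin,
                              const rocksdb::Slice& end) {
  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions()));
  for (it->Seek(begin); it->Valid() && it->key().compare(end) < 0;
       it->Next()) {
    rocksdb::Status s = db->Delete(rocksdb::WriteOptions(), it->key());
    if (!s.ok()) {
      return s;
    }
  }
  return it->status();
}

// Native DeleteRange: a single, atomic, WAL-logged write covering [begin, end).
rocksdb::Status RangeDelete(rocksdb::DB* db, const rocksdb::Slice& begin,
                            const rocksdb::Slice& end) {
  return db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                         begin, end);
}
```

Both delete exactly the keys in `[begin, end)`; the difference in read I/O, tombstone count, and atomicity is what the following sections address.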
If space reclamation time is important, or it is important that the deleted range not affect iterators, the user can trigger `CompactRange` on the deleted range. This can involve arbitrarily long waits in the compaction queue, and increases write-amp. By the time it's finished, however, the range is completely gone from the LSM.

`DeleteFilesInRange` can be used prior to compacting the deleted range as long as snapshot readers do not need to access those files. It drops files that are completely contained in the deleted range. That saves write-amp because, in `CompactRange`, the file data would have to be rewritten several times before it reaches the bottom of the LSM, where tombstones can finally be dropped.

In addition to the above approaches having various drawbacks, they are quite complicated to reason about and implement. In an ideal world, deleting a range of keys would be (1) simple, i.e., a single API call; (2) synchronous, i.e., when the call finishes, the keys are guaranteed to be wiped from the DB; (3) low latency so it can be used in critical paths; and (4) a first-class operation with all the guarantees of any other write, like atomicity, crash-recovery, etc.

## v1: Getting it to work

### Where to persist them?

The first place we thought about storing range tombstones is inline with the data blocks. We could not think of a good way to do it, however, since the start of a range tombstone covering a key could be anywhere, making binary search impossible. So, we decided to investigate segregated storage.

A second solution we considered is appending to the manifest. This file is append-only, periodically compacted, and stores metadata like the level to which each SST belongs. This is tempting because it leverages an existing file, which is maintained in the background and fully read when the DB is opened. However, it conceptually violates the manifest's purpose, which is to store metadata. It also has no way to detect when a range tombstone no longer covers anything and is droppable. Further, it'd be possible for keys above a range tombstone to disappear when they have their seqnums zeroed upon compaction to the bottommost level.

A third candidate is using a separate column family. This has similar problems to the manifest approach. That is, we cannot easily detect when a range tombstone is obsolete, and seqnum zeroing can cause a key to go from above a range tombstone to below, i.e., disappearing. The upside is we can reuse logic for memory buffering, consistent reads/writes, etc.

The problems with the second and third solutions indicate a need for range tombstones to be aware of flush/compaction. An easy way to achieve this is to put them in the SST files themselves - but not in the data blocks, as explained for the first solution. So, we introduced a separate meta-block for range tombstones. This resolved the problem of when to obsolete range tombstones, as it's simple: when they're compacted to the bottom level. We also reused the LSM invariant that newer versions of a key are always at a higher level to prevent the seqnum zeroing problem. This approach has the side benefit of constraining the range tombstones seen during reads to ones in a similar key-range.
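To make the obsolescence rule concrete, here is a conceptual C++ sketch (our illustration, not RocksDB's actual compaction code; the snapshot condition is an assumption we add for completeness): a meta-block range tombstone becomes droppable once compaction carries it to the bottommost level and it is already visible to the oldest live snapshot.

```cpp
#include <cstdint>
#include <string>

// A logical range tombstone as stored in the range tombstone meta-block:
// it covers keys in [start_key, end_key) written before sequence number seq.
struct RangeTombstone {
  std::string start_key;  // inclusive
  std::string end_key;    // exclusive
  uint64_t seq;           // seqnum of the DeleteRange that created it
};

// Droppable when the compaction output is the bottommost level (there is
// nothing below left to cover) and the tombstone is visible to the oldest
// live snapshot, so no reader can still depend on the data beneath it.
// (Snapshot handling here is our simplification for illustration.)
bool CanDropRangeTombstone(const RangeTombstone& t,
                           bool output_is_bottommost_level,
                           uint64_t oldest_live_snapshot_seq) {
  return output_is_bottommost_level && t.seq <= oldest_live_snapshot_seq;
}
```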
![](/static/images/delrange/delrange_sst_blocks.png)
{: style="display: block; margin-left: auto; margin-right: auto; width: 80%"}

*When there are range tombstones in an SST, they are segregated in a separate meta-block*
{: style="text-align: center"}

![](/static/images/delrange/delrange_key_schema.png)
{: style="display: block; margin-left: auto; margin-right: auto; width: 80%"}

*Logical range tombstones (left) and their corresponding physical key-value representation (right)*
{: style="text-align: center"}

### Write path

`WriteBatch` stores range tombstones in its buffer, which are logged to the WAL and then applied to a dedicated range tombstone memtable during `Write`. Later, in the background, the range tombstone memtable and its corresponding data memtable are flushed together into a single SST with a range tombstone meta-block. SSTs periodically undergo compaction, which rewrites SSTs with point data and range tombstones dropped or merged wherever possible.

We chose to use a dedicated memtable for range tombstones. The memtable representation is always skiplist in order to minimize overhead in the usual case, in which the memtable contains zero or a small number of range tombstones. The range tombstones are segregated to a separate memtable for the same reason we segregated range tombstones in SSTs. That is, we did not know how to interleave the range tombstones with point data in a way that would let us find them for arbitrary keys they cover.

![](/static/images/delrange/delrange_write_path.png)
{: style="display: block; margin-left: auto; margin-right: auto; width: 70%"}

*Lifetime of point keys and range tombstones in RocksDB*
{: style="text-align: center"}

During flush and compaction, we chose to write out all non-obsolete range tombstones unsorted. Sorting by a single dimension is easy to implement, but doesn't bring asymptotic improvement to queries over range data. Ideally, we want to store skylines (see the “Read path” subsection below) computed over our ranges so we can binary search. However, a couple of concerns made doing this in flush and compaction feel unsatisfactory: (1) we need to store multiple skylines, one for each snapshot, which further complicates the range tombstone meta-block encoding; and (2) even if we implement this, the range tombstone memtable still needs to be linearly scanned. Given these concerns we decided to defer the collapsing work to the read side, hoping a good caching strategy could optimize this at some future point.

### Read path

In point lookups, we aggregate range tombstones in an unordered vector as we search through live memtable, immutable memtables, and then SSTs. When a key is found that matches the lookup key, we do a scan through the vector, checking whether the key is deleted.

In iterators, we aggregate range tombstones into a skyline as we visit live memtable, immutable memtables, and SSTs. The skyline is expensive to construct but fast to query for whether a key is covered. The skyline keeps track of the most recent range tombstone found to optimize `Next` and `Prev`.

|![](/static/images/delrange/delrange_uncollapsed.png) |![](/static/images/delrange/delrange_collapsed.png) |

*([Image source: Leetcode](https://leetcode.com/problems/the-skyline-problem/description/)) The skyline problem involves taking building location/height data in the unsearchable form of A and converting it to the form of B, which is binary-searchable. With overlapping range tombstones, to achieve efficient searching we need to solve an analogous problem, where the x-axis is the key-space and the y-axis is the sequence number.*
{: style="text-align: center"}
### Performance characteristics

For the v1 implementation, writes are much faster compared to the scan-and-delete (optionally within a transaction) pattern. `DeleteRange` only logs to the WAL and applies to the memtable. Logging to WAL always `fflush`es, and optionally `fsync`s or `fdatasync`s. Applying to memtable is always an in-memory operation. Since range tombstones have a dedicated skiplist memtable, the complexity of inserting is O(log(T)), where T is the number of existing buffered range tombstones.

Reading in the presence of v1 range tombstones, however, is much slower than reads in a database where scan-and-delete has happened, due to the linear scan over range tombstone memtables/meta-blocks.

Iterating in a database with v1 range tombstones is usually slower than in a scan-and-delete database, although the gap lessens as iterations grow longer. When an iterator is first created and seeked, we construct a skyline over its tombstones. This operation is O(T\*log(T)) where T is the number of tombstones found across live memtable, immutable memtable, L0 files, and one file from each of the L1+ levels. However, moving the iterator forwards or backwards is simply a constant-time operation (excluding edge cases, e.g., many range tombstones between consecutive point keys).

## v2: Making it fast

`DeleteRange`’s negative impact on read perf is a barrier to its adoption. The root cause is that range tombstones are not stored or cached in a format that can be efficiently searched. We needed to design `DeleteRange` so that we could maintain write performance while making read performance competitive with workarounds used in production (e.g., scan-and-delete).

### Representations

The key idea of the redesign is that, instead of globally collapsing range tombstones, we can locally “fragment” them for each SST file and memtable to guarantee that:

* no range tombstones overlap; and
* range tombstones are ordered by start key.

Combined, these properties make range tombstones binary searchable. This fragmentation will happen on the read path, but unlike the previous design, we can easily cache many of these range tombstone fragments on the read path.

### Write path

The write path remains unchanged.

### Read path

When an SST file is opened, its range tombstones are fragmented and cached. For point lookups, we binary search each file's fragmented range tombstones for one that covers the lookup key. Unlike the old design, once we find a tombstone, we no longer need to search for the key in lower levels, since we know that any keys on those levels will be covered (though we do still check the current level since there may be keys written after the range tombstone).

For range scans, we create iterators over all the fragmented range tombstones and store them in a list, seeking each one to cover the start key of the range scan (if possible), and query each encountered key in this structure as in the old design, advancing range tombstone iterators as necessary. In effect, we implicitly create a skyline.
This requires significantly less work on iterator creation, but since each memtable/SST has its own range tombstone iterator, querying range tombstones requires key comparisons (and possibly iterator increments) for several iterators (as opposed to v1, where we had a global collapsed representation of all range tombstones). As a result, very long range scans may become slower than before, but short range scans, the more common class of range scan, are an order of magnitude faster.

## Benchmarks

To understand the performance of this new design, we used `db_bench` to compare point lookup, short range scan, and long range scan performance across:

* the v1 DeleteRange design,
* the scan-and-delete workaround, and
* the v2 DeleteRange design.

In these benchmarks, we used a database with 5 million data keys, and 10000 range tombstones (ignoring those dropped during compaction) that were written in regular intervals after 4.5 million data keys were written. Writing the range tombstones ensures that most of them are not compacted away, and that we have more tombstones in higher levels that cover keys in lower levels, which allows the benchmarks to exercise more interesting behavior when reading deleted keys.

Point lookup benchmarks read 100000 keys from a database using `readwhilewriting`. Range scan benchmarks used `seekrandomwhilewriting` and seeked 100000 times, advancing up to 10 keys away from the seek position for short range scans and up to 1000 keys away for long range scans.

The results are summarized in the tables below, averaged over 10 runs (note that the different SHAs for the v1 benchmarks are due to a new `db_bench` flag that was added in order to compare performance with databases with no tombstones; for brevity, those results are not reported here). Also note that the block cache was large enough to hold the entire db, so the large throughput is due to limited I/Os and little time spent on decompression. The range tombstone blocks are always pinned uncompressed in memory. We believe these setup details should not affect relative performance between versions.

### Point Lookups

|Name |SHA |avg micros/op |avg ops/sec |
|---|---|---|---|
|v1 |35cd754a6 |1.3179 |759,830.90 |
|scan-del |7528130e3 |0.6036 |1,667,237.70 |
|v2 |7528130e3 |0.6128 |1,634,633.40 |

### Short Range Scans

|Name |SHA |avg micros/op |avg ops/sec |
|---|---|---|---|
|v1 |0ed738fdd |6.23 |176,562.00 |
|scan-del |PR 4677 |2.6844 |377,313.00 |
|v2 |PR 4677 |2.8226 |361,249.70 |

### Long Range Scans

|Name |SHA |avg micros/op |avg ops/sec |
|---|---|---|---|
|v1 |0ed738fdd |52.7066 |19,074.00 |
|scan-del |PR 4677 |38.0325 |26,648.60 |
|v2 |PR 4677 |41.2882 |24,714.70 |

## Future Work

Note that memtable range tombstones are fragmented on every read; for now this is acceptable, since we expect there to be relatively few range tombstones in memtables (and users can enforce this by keeping track of the number of memtable range deletions and manually flushing after it passes a threshold, as in the sketch below). In the future, a specialized data structure can be used for storing range tombstones in memory to avoid this work.

Another future optimization is to create a new format version that requires range tombstones to be stored in a fragmented form. This would save time when opening SST files, and when `max_open_files` is not -1 (i.e., files may be opened several times).
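The manual-flush workaround mentioned above can be sketched as follows (our illustration; the `RangeDeleteTracker` wrapper and its threshold of 128 are hypothetical, not part of RocksDB):

```cpp
#include "rocksdb/db.h"

// Wraps DB::DeleteRange and triggers a manual flush after a fixed number of
// range deletions, bounding how many range tombstones a memtable can hold
// (memtable tombstones are re-fragmented on every read in the v2 design).
class RangeDeleteTracker {
 public:
  explicit RangeDeleteTracker(rocksdb::DB* db, int threshold = 128)
      : db_(db), threshold_(threshold) {}

  rocksdb::Status DeleteRange(const rocksdb::Slice& begin,
                              const rocksdb::Slice& end) {
    rocksdb::Status s = db_->DeleteRange(
        rocksdb::WriteOptions(), db_->DefaultColumnFamily(), begin, end);
    if (s.ok() && ++count_ >= threshold_) {
      // Flushing moves the buffered tombstones into an SST, where their
      // fragmented form is computed once on file open and then cached.
      s = db_->Flush(rocksdb::FlushOptions());
      count_ = 0;
    }
    return s;
  }

 private:
  rocksdb::DB* db_;
  const int threshold_;
  int count_ = 0;
};
```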
## Acknowledgements

Special thanks to Peter Mattis and Nikhil Benesch from Cockroach Labs, who were early users of DeleteRange v1 in production, contributed the cleanest/most efficient v1 aggregation implementation, found and fixed bugs, and provided the initial DeleteRange v2 design and continued help.

Thanks to Huachao Huang and Jinpeng Zhang from PingCAP for early DeleteRange v1 adoption, bug reports, and fixes.
diff --git a/ceph/src/rocksdb/docs/_posts/2019-03-08-format-version-4.markdown b/ceph/src/rocksdb/docs/_posts/2019-03-08-format-version-4.markdown
new file mode 100644
index 000000000..ce657696c
--- /dev/null
+++ b/ceph/src/rocksdb/docs/_posts/2019-03-08-format-version-4.markdown
@@ -0,0 +1,36 @@
---
title: format_version 4
layout: post
author: maysamyabandeh
category: blog
---

The data blocks in RocksDB consist of a sequence of key/value pairs sorted by key, where the pairs are grouped into _restart intervals_ specified by `block_restart_interval`. Up to RocksDB version 5.14, where the latest and default value of `BlockBasedTableOptions::format_version` is 2, the format of index and data blocks is the same: index blocks use the same key format of <`user_key`,`seq`> and encode pointers to data blocks, <`offset`,`size`>, as a byte string used as the value. The only difference is that the index blocks use `index_block_restart_interval` for the size of the _restart intervals_. `format_version` 3 and 4 offer a more optimized, backward-compatible, yet forward-incompatible format for index blocks.

### Pros

Using `format_version` 4 significantly reduces the index block size, in some cases around 4-5x. This frees more space in the block cache, which can result in a higher hit rate for data and filter blocks, or offer the same performance with a smaller block cache size.

### Cons

Being _forward-incompatible_ means that if you enable `format_version` 4 you cannot downgrade to a RocksDB version lower than 5.16.

### How to use it?

- `BlockBasedTableOptions::format_version` = 4
- `BlockBasedTableOptions::index_block_restart_interval` = 16

### What is format_version 3?

(Since RocksDB 5.15) In most cases, the sequence number `seq` is not necessary for keys in the index blocks. In such cases, `format_version` 3 skips encoding the sequence number and sets `index_key_is_user_key` in TableProperties, which is used by the reader to know how to decode the index block.

### What is format_version 4?

(Since RocksDB 5.16) Changes the format of index blocks by delta-encoding the index values, which are the block handles. This saves the encoding of `BlockHandle::offset` for the non-head index entries in each restart interval. If used, `TableProperties::index_value_is_delta_encoded` is set, which is used by the reader to know how to decode the index block. The format of each key is (shared_size, non_shared_size, shared, non_shared). The format of each value, i.e., the block handle, is (offset, size) whenever the shared_size is 0, which includes the first entry in each restart point. Otherwise the format is delta-size, i.e., the size of the current block handle minus the size of the last block handle.

The index format in `format_version` 4 would be as follows:

    restart_point 0: k, v (off, sz), k, v (delta-sz), ..., k, v (delta-sz)
    restart_point 1: k, v (off, sz), k, v (delta-sz), ..., k, v (delta-sz)
    ...
    restart_point n-1: k, v (off, sz), k, v (delta-sz), ..., k, v (delta-sz)

where k is the key, v is the value, and its encoding is shown in parentheses.
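Putting the two options from the 'How to use it?' section together, here is a minimal sketch of enabling the new format (the DB path is illustrative; remember the resulting files cannot be opened by RocksDB versions older than 5.16):

```cpp
#include "rocksdb/db.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.format_version = 4;                 // delta-encoded index values
  table_options.index_block_restart_interval = 16;  // larger index restart interval

  rocksdb::Options options;
  options.create_if_missing = true;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/format_version_4_db", &db);
  // ... use the DB; newly written SSTs use the format_version 4 index format.
  delete db;
  return s.ok() ? 0 : 1;
}
```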
diff --git a/ceph/src/rocksdb/docs/_sass/_blog.scss b/ceph/src/rocksdb/docs/_sass/_blog.scss
index 74335d10b..12a73c1fc 100644
--- a/ceph/src/rocksdb/docs/_sass/_blog.scss
+++ b/ceph/src/rocksdb/docs/_sass/_blog.scss
@@ -35,11 +35,13 @@
     border-radius: 50%;
     height: 50px;
     left: 50%;
-    margin-left: -25px;
+    margin-left: auto;
+    margin-right: auto;
+    display: inline-block;
     overflow: hidden;
-    position: absolute;
+    position: static;
     top: -25px;
     width: 50px;
   }
 }
-}
\ No newline at end of file
+}
diff --git a/ceph/src/rocksdb/docs/static/images/delrange/delrange_collapsed.png b/ceph/src/rocksdb/docs/static/images/delrange/delrange_collapsed.png
new file mode 100644
index 0000000000000000000000000000000000000000..52246c2c1d655da30d203e0fdc05d5c06eb68f3a
GIT binary patch
(binary patch data for the delete-range blog-post figures omitted)
zT7Q-?bg_9Mie<;sGVgu`W7jT=p5>9XjG-`0pDOU5V{g9rRh3`hLzTdDFv<@9eyak= z7l$E)bAd!i=bPfQKOe#BuO^P~NVnVkABFoC zP}lsa`u&#O4rWEE+po5P0)qJAcn1tp`D;)AQ%y$rk7KCnlXtBQwxeK|+rES4KPaOtAIIj}XmVUt+HDTL6{7p6{cZde z?rO7hcw~^*v)gW`QQv!l9x)`2XFp;v5((s{Z@i?q%5iAqzm0A}2OiM{cg}w~6nekC zKI~1L_~J?Ux^1QKisxNQ(Y=~=-)j`|j)(d^|I9$aXDY~FOSN*QekQ%M8&VWc6 z_|SoA!~Gl-*?d0hKTQJ9@!;BXP#+r4gy{1;vpNc?7IQ$w@$vD!5~0sMtER4=7R##i z^XJdB4PA|5A||WaA~)kpQ0&VrXAkx^pDf8t*KZZ`$*ZW`ZXAGhjq!z`-gq_aH(iV^|VsU`<;TSsr!!t%tVKBV`bIU!_q!~ma(v~OmVt& z1sxv|aqT1ZSm0o0O?2bamWA<-7x6#QW;YINuU@%zH!M6X(Dr8klDqkNO3%bwrx8LT zn4K|flJd^U{TV29abZSl3B1?-Rz;G~;ojG;orYkRwGmoz7j#%@h>7SUIHsG?T)`eq}btipj07HKWQ+yu+|dQ9E>zl=xXAwJ1a&^v?_>u+cCJLLaK zXYYdsb~^6w(iyAme;FQDAFcov({&rN_`eP9pS#=y4fxoL$5i0|aqY=JrThSCB(qG( z@!~(1{aFFv^N=$ZSS$SZI+1@@0P;c#z;Q~M;_sIKdoTaG!6_VYrVGZ7g~0yjwI@&X z{~JNb@e7P>t}ilkx5PND>L_`E4p101mC~1lvxr^LxpQ3h0!>%J1hn`t0LWYqY`Y#; z&UYp-m)@xAbnC=ou|cO+*}ph&Q~w$l`L{ekQAKxg$R;?b8+`C@Ir+HbiOjMxmrfeB za0y)-Q;CpM)64B19$BGQ8%xQE*kQPF*0kMW_0&|3>0x+7OzeG1q{z9Zcg(YxlCWZz z=xdk7SR&O@m z4bf>{Bkf5}!~PPcgn_Q;S)U|l(;ev26%tO!%urC2u(bHU7D9y|=xLO7Ugt;?o&@vW zacunYZ7|_t{Fu_+BwpMe1`i1}&D`_UeW=x!XUcmI_iMwd?5dBhBrw4jA)ct!2roPx zVN8WC3Dk^YA%q$cvjlR`Je~d9Y#qa6y5lQm)$Ch(9b&E`V%7fDX5IgGzs&Q>CvcK=RGa1cVw{hJ`?>Ro zHD~wR(!v0Xmw1BPEDn?gywMSQII|L><^(5$p?e`Rgm3apj z*(jBya(T;Jtn9)Z#JflquHh)%O=iGV1Le6rj#{Xcjr8xD_ry>cZSc-my9zL3-kvED z7&}m6%FMksPwug*J`PaJI>=x1L6QISmb> zgx|)OyW7*EU(0X`hzdJ*P zp{27k?1$tVIWT9t=~k&&3!?m%Aq}anZH$R@ zR_0N+Fm}v;+OkAHRu8AJ&5QE&^(=H>^|JJkd_B|l;X#9OXBuysR@Y_A+E{@qPlvQ8 z2YzAjU3Ao=$5&~Ff@TY0j7t1;3X$rO{`N4T`b!r1EAz~?E-%{?Rhd2A?0t)AI7T zX0)4v8K^MBrM9(!mj&-}B|4ZlSJ()KXxQ^5aR`wKYX+P#8W$H*Q`UTjOP74cY-_%^ zNRa0$`a}DIH=~)Tz^aJr=lk@o2kh)O&d2%Zqqj^uDzWqe8#-KstIUKY+ay^9yxJr((^bS{SKZZp zd&Gwr-zf?fRWb4ye06-Kv*K*IgZ(VU%F(GtFET#7mXP3q5f|#LSOpoCf^ikPvndPy zXG#c_X9eBj;)WQ<)IC-?=a#K8EP*u>>8yT-H;20MRY=yrR2Jmfuj>EA@T4@zzLTpv z=x*OEGPzH@PqCNFU>gmN5o+J1y)S^$`Z#3|B<#Rndb+~`MGEAuMk`Bt5aBZ%L<7W5sd7BPLMHqj6i!4#8hth30-QJ2eG zN~k<1-9I?rk%Yo>I0zGKdqVA{dc$*GI7CuR4osrz4qbVk9DFbrZ!V!ekYm?=}#Mj~EUc=)5U@pqV`wqA${UfCG`>*tv6hsvM|jSm&Io4V8F zp4z!EUS?i&nR6p82fkx~X}r(ZF*-ng4TLXFB8=)cD?gvh0$w1rcg}{}fyCzPuEzcKCs0a&Q7*kE0;=-O*{u8{rxeErP5+9kG z8?1j6y!&!Qs7IbkIYg2B zvC#g_j7_Kwo}lRdJeZ8fdx0My`AxTOES2esy(mp(73km@)f(rfdA+bO-LKUdDQr_Y zOxiaT3R*FtYy(M`%K;>AnIO{8S5X!TXyITq7rbuyeMmOK0mT@ z?%|o4o&@nPk8AE!U%zf@nLn`>y8S6}M5Wg3GieU$Xi{avEEtUkevosn@b@?L6@Sre|E%~kkWK!*`+t=C2LHLV`At^B)6Uim+=l4jT#5< z8(EYvO<}0ASkhf{w19zQI?RjYQl8m9DduMPjY zR`WTF>M_V}4XyUj+e_o6KRv5F-<)Uo+4kKU8JK+Gf(0vb_fAq%j7}x0BK9<@(z@wK zxKhB{dj|NHHB=Q-BOdZoPVeI9JX#ck(A2@C@wc@nq)yr1M~971s~(qUX0jza zR224n(8zy_d3hJmI&Q8K>J6tkSE7Er8`sp?e;;LDel~ah6aNOUN((9`otX7rzmYhk z9xx@(sz3SDLpsS4J0U`}>)%?))Z7kIFLS>S%5M{peGYTUvnu~Vls&ReIxXeRb%?&& z{c!{prZ=PesMo9g0)nESURL0Yu@HLo&d{!u=~XW$EA>f%M|7ntvBK_}MS4s9yt4b- zOO~8GuFH>l&u^QkI6uJa$cjH4e8X(#9Wq>HzNm&yLwrYcmBZr$Dr_B}S>IKEd}%Cn zu`N}uGw&BT#x@~JRg|lpV?N;lgwHG)TyX%V4x*{yVgeR0y=Z`3rbDR{yvXv zneDCdN6L4f?jw3bQkicbA0wXM%1Wej7B3I$q6k_hxoqu2aA;p+fgzNN!*F9)R zHN%{1rJ|zwXo0w2*%X=I`ja|DUn*Zu#O5)aE8+qTJ;~6(J6B}R6XPqB%KXX1{gqOt*vZ|`xteXu>xNdxm3A-^oJls=iYuo*u zi2+kv3&VWnvi{^zXBdqNwD}$&QI>?-yDOQxA3NdVxgYwXTx*?2`H1}5KIex;6;bl_ zHQ8KOy)D_cFSo)|CwT{DJh&)G-)i81sT7>IrK|L9bAlr6uE3u_T3TwPMF6!;>i^Lt zN!^Wk|J_=aF$$O8V=WXU*3jh#OmE~@^{bgW$(BrlTgS`qnp#zscWrSNOK4~~Exh!S zB_Q_xS=xkAX}~MFjg;#Us9in0ySrT=>*mp6M#p1Z@NF}>%Bxg-E|EWOexmM{cet>b z4}kjrzxLiU9L{e0ACE3YkOV=L1d&9vXwgZOh$xBZHHg6&B8+adClS#?f-riGUWU;J zk3=86j~cxU(Pl7)-|d|9JW0;?oOl20`oB3|xbDkrbML+O+Iz3{S)aAG9k`n+&gdCn!SZy5?b11q*8s4<(I)q}+?Y3}oeu?~QpNs_z8!j6pRnyL)3i7l0 
z*Kl|3Z`zU-Ym6p>6(fGRK}F-GXx#e71&1}u=&1RzqQcElSOsY9u#ge)DejwNgttpr z>Ar1GjOhT~L1qI2s!@Gp)%Qjin&`h()N~gcL6-eDJb7w<&~O4=Zly>C&uSON`jE@5 zVi->iQ{X=OuO-6w9EipwrrnaSIc+U!@w^lyP$)^q99Kf++54p*-FC4sqj{x)ZxPAu zA-771s{v~12pTa}2iW56>Jlf{`HU%4hQiG-jsejbPGhQz?2O)YG7^9>U81(m1ViSolHBalk1aOW>1bzi$9Kp!I8rM(Z4!u4FBPN^( zEi{M2t(Uq#jHHt=KI0Su=u; zRc(4hZ4i@`w6tusX5N}F;zB4Ng0y9(AtQa5ivc;KVTwO@a;M1mc;k>3PUGMQz!It_ zIa=^}j}0X6Z9enZOg0WG*L=sRp{LS17`e|!W;Cz&GX=Hl=WO49O4j^@l1&t;#_mD& z@a-MTYFk1c+9-#~nn{*FkI7P$u8dAqZu&=^d&~r1pQ>k7gs7M7NKo1&Mn|{cHGKv7 z@5l36c^$|VP!GY=JKC5KG((hxIsMcXZ}u&4W7}^YeSq8L(aID5()-q5C@W!grT~Ne zjnw)$ktc(foE!<=LWC5{@&xntM1OZ4P}2E$VSL+BL6&VWBrU^KS{z~Z&@1snppIvY z4vm4+EfxWhLk=NYDjC>wnXd6Tz~trfjy33+M3n**+p_QIp;N~s$PL=83V(^N!53P6 z6ubuHv@Zfps-m`sw^Jt{8&*8q<&A!}GvpV2D47zTX`eF^FIA;?F=ZD$E@zy?+2gc+ zb2{E#+*V$6x!zIBsMpGIG8ch>`jXOUgZo6aSiU5?&i?Vi6t+`1^vuw+TdF8Gb;4*E zwj|HBYBbdyK71nzcli`t*^&cKFCif`mTTT$;*G3Z+XJHG3KRgm|6Z_rXSncAu(b==pPuco(&)Qhx87=BciqG zF`n^aZ0%E4?PWV(HjOC=fNSIKWZe;XKHvI@qs~1@Qf{Fl);6IKg37i*e-`M}N7?bW z5N1{3VT~2p9%X*4Dg&P!Cd$XQU;FwJd7elrrZYA;^s*ypNp9O{l<-fM1bJ;4>|g4#a#R0F&q zBmmZVc(toX(9$&=R|vL;g=UlIccAD1ukm4H$#1E1bnlcn<^57GET@1d3;Y+=>)+`;ephRp6vIUjte5(JXllB( z-BkN>FYP>VcSR1!WqyDtnI4*M#|}Z~K)O{#2??0`2Lh~DKf}JsY#>`8)E4RIk zkw0OO6u((!(ph6LPi0=u%SlxUZ_N`yh*eE?&OO;m4Pc>Z0eFy9HlNxVf|3xUlQMEx zc67jxOUf!}O81jN4!|vq1gP58$~SAh>HxqqNZ{(&5OMY08dwwSWCqh~q4`mt z^LeCK!Cg%`8gLBnRhqTX*e0|?hM|NEn8BAO2#qx+6s5IF$UPxsq3~0#a~1RpbI8h8 zx7M@EmF7T*Jlpo3I`M#ePIi!eu=}wzsBSs|>Y)T2OI@RmxmWjo+rmiXNr>-{_C{cu zqJFr(3}uy>Z(W4EruFr82WBBoeSxbYeQ+yd$*+;(X!_n^izhaZ z!^1Xo@763^i!!3)h&w0yOk6#pM=hkUBLA>51X2i`YoY z%FP6w+Z+tp9lyH?xFmWY*(Wj>X!(^kAmI4kg<%+Adi=yU^A3? zo|)-wP}{<%?}FK`{>26GVwU2TUr(T9>lI4~(^d$=klpLF?rE2i)QMX(u+^8pzdk2v zmFf4sL^AI>-E+=U5={{M?>rVxx**a+UhM}D0_*Uz;2DNPT=^}a=d3?Nw z#lpjlf#9m8@ZG3?((Q6(^!Y)2`56ap6j<+Pmg&12Ll3)C#eOP{~^csA<$Eq1jZMWBDie{6l!=dkYL6vcyQy`kv9O4{&>BGJNWMO8V+H3dF# zfn+I7FML<@`++5Z>xX3EuOfUJ90YhE9Ul%4DJ2cvS|*bqFE_mwtRKIA{j3e4rhF;- zE!j_y;Mb?#r=+Jo&p<*3JWwI?6x4kF-s{%}|BRQMVhnW8Q)jG`*Y8Ve_eZaO{b<6} zcSyz)Gp1*}(%wC>HB2H-M}5693`-}p{%_ar{wpvO&_Ek8dn%6Hai|Y?E46@32c$)M^%@_W=>;A9)(8-e~sBlY`HnIE|o;xT5POH5MR2o74 zF1ro3f9K{Bqfrk4Kwdfpo70v4X0rVQnf?10HQy`69?LP9W$M$MW1r|Ni5()avi@$v zCEY<+1QPl%XnbHtE~;Q~)cEd*uy%5JE#Ql|@n$Lh22%a?ZQr$K{zL_bD{jig@Uo1G zhx!R&CQ_oF@n0seN4+p?MVrdppeqHw;9c%dzVY8s%|8Y;OD8 zbEoa#KhGjCpGkLs1X7*q>Z`v1N53^zFa|<+mAW5*s|fxY>aQ?2@HZ)4AT*Qn+C}(3 zcL<*4=y7*;`8I=ebv= z_+_o@4es}%KQf+groQ^{@$2YPK^`8-o0~5l%x$L1wX71tP{H;qlYHvY(UwkTwq8bg z!g9VTAw%BFYWVtw%&e@h^RlAmaUmgRg%9L11hv_WtDNWhO6#>*h3srw5hX1AxBzORmbL**31u~_6?kK-K(VvE<&xG9h zuMIZn0lYZbTM?*e_+Pa01|4S!8_6Zz|Jq;zDCmlj#RvYyd;Pz1RXV+to0L26zx9&3 zN)?DKdv8$4{!LB(_qV-M0M-Wiqv!v%!K=vAMB<6_|E%ozn-Kmrc)9>!ZBT#s^1HF~ zOP^-gfiV3>HPi0z#btjH3@0hjf}Zo~Ps+=`39a#eit%&i{-+o}wd4OA#fY99VE9$R z{iVrcQRNRblkdct$s!iA*A{CtbaqJr&fSk1M=s8sO2Zi-3siXgs_g48UG)wsa7+dX z`FBHq{H_Oz5H|UL7$FK7jc+579VjBz%lUU{kc$q2Fq`;m{8BfQcU=#T-8bGz&n-9= z#MTqeFASt#IzSWgEtC6`qBA%j(^@-|T^=v;U1Hvxcg6woM=NKoUUY`c zsdhrlQP&;WuHHYj?#`1oz{bFL9VGKo+9c%E1%4P*<%E#{3D8}m7jv3_JLuop1?+J^ zgc&Of^72AfobTNSwtY@^a<^-H{EmjE9JCoP!>ECM6h89X{`$300jdj~h)4=JwFydZ z0>P!bScBS}Fe>0B+5f*~}WD04u|7l0WU?Pg!i;DRJI$lQcH8yC9$8?H(`w z^1-9`|C;jfqh1>x^krG!Rc>Yx`_I*#UUx?cA)l`Ci6uv}iTnd_X9GK0p`}?1?zjlT zUqC8fJ|y_f+lE||HM%S7z4)jn+3=p*ZrcYd0a@5HH->c)fp|=@Q?Kyb)yT;qIk!ot zSe=&(mK-_N<{WTV1P(Ik&C!7(djlkiRXf{P?uy3WpCyg=ZQp%YP!S&ER2V}9)X=0t zkpDeW*g_%sNs_2sW@vqiX{jjz-|)DZmoR^ndv9&Kz4@*BpN)#qUd<1{<}R#Dp%IUX zd4BiDAMa+W_g95P%WNW|NBtCBH10>RVj&({bt&voI}15sRv}}uJZT_7wFFV+=oT$e 
z+>n6rjS%8%o`dMIh@HI<_JN&xv7zYWVtHi5NJFLO>a?FKO9Z_WnZAJCVRLyo#^Hg?@?+wTA8xO@wY;pZB##?{_7mbU0rlJMim&UbhTa$HTMIAAc2< zz{f|TC$)TK9Y*J3nt7G-Sbab%pIhcbe4K5pMgD~P6hmB*Y4~&}WZQgNnrWBZ>9xyI zK<%%Ht`MW2ItElOZ|slLYx~$!XKRw-cVcp|;yZobOO}RO(ATayQAU^kmCy8X(Dt9| zd<)xSA`67+>Skx3)*@Lwf`pkn^y_&ZdxaxBt*m3YnAPFDO;BA#D5YoFe*DYrLsl@i z!MP7i)>&cKgr6U;fF-ENi?m{N4Tx2*j;*nbtetW|PJpoG{q*VC2% z%!DpO9&T3Sd^gtP>8C6GL(6wA(q!wMOs6P!B#@3T@~obty$Xl+$nuBGwS+A3;^D!8 za>I8I3kMtU5guR`F5U17(d5Ar@0`KlDIz7FdkqOsHCaoLk=;ufYm zKW^lsEt*`tYikODXmsKyPkl-n2LfJ=Y_P(q_(GKg&6^uEaEZs*_aonSnmLRW_Y)e9 z49ho3j<2s0@TDRt4OM-7pl+T=_V4utST7s;KPBL|Rf^Z8zUTXG?rF#lH!gWom`Q5q zwUS~6EcO>4M8z1_yWn# za$wf?``UN5ahkNzmp?15oJ69rswcBCcpOE;#;U0t0b@foo^*de6gBG$=SZm+ z)I!a9vst_CbR4yx-QMi+Sa=kVp!I5SXDdw^DrdV>QrNWIss}G-$oYt3EU))l|Itwx z_lEX~pxIhXrMnct%FJ~0+hP3;ske3y4QBdXXXyimPkpJvYfHZb`Fb@;n}8amHqTFH zwY1t+p=*Wt_Ge`LERsF?2g2hqEThBeP|c(}F64e+Jx$gwuBY(n4h_BDAF#ZA=Zb$(vMVxmr)|Qnkn3g_Cu?W$C_Rh0<)kzws;_u$Ksl-Co$HJ|7O}O7e$VQln ziCOU(&8KnDYq;iO24z}44BEh@U7$FhNkaD37^+#P8rgAR^-aaNLOn$omo>R}5}!jQ zLwOzvCvhlD)P{E3YE@QD#$MqvsM}>;yCO1LC@p&5QD*qLT6oR7KWH3T8Kjowb>#`A z^mW6Fm-}UIcoM_;Joz*Y101nVXbwLs|bPD(!bc`=W%D~;>$m6%V@%_nP&qWVQpR)_n&tY!zra4EN87sMIGJGMeN z_hRpb<@Ri6+aD$%_?pAqWvS4^CxI`JmidpMK%L30vFsYfgfNdd%1^!==iY6-c$mVD z($Ufn*&_KvZU9Eai4`9W$bf_lK?VZ6hBzqE2jZwVwzHD z^CMEnwAlAF8_5`O;wqIRU4C&-Z8Q93OYFJ_FES1e$9LOk8?R0JGza>R4X~{H{+MlG zAJKAO7+legrb_kFsF@S|jqe_VOBfh+G?8JMEneEA zk|HT9H0_`L$5}o0!|4V?MG#HZ^y?ic39(OD_GolK<&`n0pO6#MN66w3w>|5=32njU zL0Q8-aW@_Ij}h#dW3VstF_Z_6m-*QQ_(FYP?Yq3fH;W4@wUW09B%Tfq4s8G1pBUdVZ2){zE98qoyAXEDed!>%irrPlPxGDTF@3chFi zXATcotXVmEhBuZ_-iUMj(ILB&1nt7rtZw!2!2`!xZkc(g&D1d%V-$55N{aQ5~LAe=PPUL zx#)ev#{RNPEMZj9EM8pnhuKRHQv6CaE*07Rp}|gNCitq9v?;ON*7wATW_e8+1QZUU zRXfIdLAZW+B_J1q$WWiSQR7EIu`44!AZN8OW=`f8TwwebogF~$N@ck^WJ#$IM3E)d zn}7R>t9W@VV7%R9N1vDw4b3g|Q!r$xMduS$v?8S2?YQgOv;*iwir*8q7FQfMJ>lo1 zplyul-C7o=F$4F_;7*8GN!1;e3nQPd8Fl&BW<2=7Yh1Z0j+(&32VT=%XKIo8c25H-#wwDWraeQ4@|nybiHjpuOU)X_g}YqS1ROTv*pN5ms+s-e z0zsTtQfVOx*aeB8pQnk28xhh1Of?iQ*}tub>hY`;B+XQ`O`>sFtVc#ZB+Jb;Z|EG? 
zt?fEU5Td;*(!d%to-5%yR^?WL6V-Pgmz952Zil2_eHt{R!r^zq%eB9jvWcwSKHH4f?dS+|ekrr%0A#-z5D#gmDmAfrG-&!l5_NtS zp&$BNKHsgdlVM#kRUh=y^6#P-+ z35d-KQ}6CqlgON_?c4jwYjOmw62svH*w;jgBCCE{!F34Ssy}{%Wp_AzZ)n;~RIu{k z`upsnv%_zPN1XFDN>HO1D#FU@p-rP;DE(I?BVm0@8!|}ej%w10*iXsmLDOCkZ7U+u=!;era;VsjRW_`cExUNz zM_B}Hy~4T;32%;x3};y5NrPmceb9O%I9QKjM%e z@4kSjoTVE&>Zy1sTwk^t?_Vlr+`fJ0G*&PXI>Dp4oJmJ1-V8A6n6((Dww4A*h6 zx6*$7zmp2P^Vck;&Ks@tE=G`$G{{fjmSs~3zACSujM zIqfz=cb|ez60)ji@(qhKIK-0PNOtb`KaDblcvuOI$_SPr#?3mXYb+Mxpd_mELf~V2#3;xUPB0?aPSFxyox(1Dqke1rr7Tw&X{p$VdY38N z2|B&8$Vl&`QB1F-3)6QT{K?aXJJ_K8RnU|~!tx+_nw?7YRYC5f$33FxUU7^VQ`F$mpcCYL^2M(b0~q%r-`L7A6)*>SsYL?7wRaWU9AGk13~sF31yt$V*XDn*+&MU# z!SJyEH5X4C20TKQBw3%`_n54;i4&M3-lNK5stW)Axqg?*J1V?kX~# zot@N|63pQ}-WCH}>9Q*_5bd7=q3053R&G|P`vwr-PD(5L0Y;e3U6s$G$A6}VG9^Gh zAC!m*nN_Z8xVT$KhMyNri0Rq9EhNDs>{n8OR&h9QD@C2OkWZq7jx0d<5Sdm*$kJH$ zD|)V8yqU2CR?yWk{VZ9nOiEVL)Q!#aW#Jx9Z<^T)uSRlVLj_|(y<2kZl#ffBoz(*naYAp8BT`gA^EXDcNRXG=xT_jsA zX+lYl8|v^qC;eN0k#RrO;^eVQ0%XW+roP6qZdY6iv`AzQ?K3P@Tg&yHGS@iayHe|w zf@RWA;Hfdr{NXyMEUmtg_ptp`d9UJAT%Z!{FJ~ZcRHmi#j$0$)Mnz}x#~o>R7J#aM z^cKw;o-(ch8FGV;7Esff}k5;XF4?mna!*o_nN%8r*70ns2|0uht1V|dJ zUByELLqUOww%JG2uiHkmR+PJ#2aXZlC#8p`8Jb^|xgJ)GMH9W`F*`8ZReECIw+CXy zHQ9l}JDzL4{h~Z4UOlEo3H3>$uyhXqJR5aYe1uMdP*itzgo*OhG+KX=89!?%%Y#kk zDE}j=hT4D>XNZ!9-|N5HDz=HFyi~D#VNCm5qG5ikm_k8ytr%MEKH0Yx(CNeBSRo>bocnD%&CX*u) zysVF{rz_7MHPECYuwokR@@SwYtojB$l%Dz~U{E(b|5>{RaNDS;Z3UK%8+_{f$^>(^ zgV1I#Ely1GHN6D{zW53=MTwGq@w!fM0J%h|;y;#h;7>jyXn}+B{A0PBcjVSX} zJh>*1ob)a@53?60j7%~0(q7t~vV9+Ya&Q2Y%v!y$0W$X}tQQ<{#AZ#EXv@e~o69tu ze9Y{s?YGxw`YdBUcpr*%cCu`cvEq9^<*+tSEq;kooJVVzQ;G*g-WA^=)u|8KOIVm| zxu7YvW3rm&nhY^=3@CF~*lL23ZWm;U*hsT7 zo!jA{nStjjsYG05EX$6!`I-nVVu(YP|JmkOh)Gbdt5-i(FxS>A9!usBC11;-+!Sks zQ2u}dq_7_|ivTw3FMM8l!DWPHQgqw4;kYcL^DG(VesWxl8gL#*Sn>Ntj~1`sGiE=& zY6lyQ-`mSLJxx*tbC6dLkZ74Po#0^hZ?)l`xnflKq`0-a*k0(QN2a@#4VUSWwvKyM zwe3llG7HrkHT(<8go>i>nvrx4h0lo1;)Ds5$iQAQJOIb_qCc}duJE*Dg`R&_EO$W& zRQIq@wsA$yWWc>|9hkH!_)9QnNQ>xtuQ1wZ>jibp24dz~Banv$=XsEzQYx4}s$$r7 zJP|w5f7FK^E3VjtS%`rsp@nQn8lrz=Vqal|k?5OXMzlKH-G(Y>p*FvJHSV-!60b(u zW;M>5iBUWTee!ex8OELB^N?E4hfJT#AIG@dvFEkGDe{&W;?YnFN?_TJG@DpOll+dQca_xT^}A3=zrQKg^E0%G_9kkdW+Q zgOgdkRtFIJ#Uz6=NU_*x>HLgpG#BQHMEc5eS8+tVp0dk3ccngNyp514lbb8y!fJ$& zEf}uSjcu`TC3>$Z`Hes8qPPWh^T6u(LDY5FU_~Qy#uYur9-Qf%M+eQpQgvE=FcO#8 z;_lUB7IhS^ z+k)(lc5HV55unWw_=@I(p}sjXz4wh4#db9*x200M(}od%EV4`!Uv z=En|u9RP`PBHjy4vzR@2_s~?S30os!R@q(ZlsU1>qxNV=qi2xQ(}g%-I0@}cu9WDr z!>HM?Fw*B5A3C(H*>mLk`8Et5WI3Ppd~l7Uq-T55RY7S70*;qP-nL*4Qw6z#C1FxN zLN4E8vl|^s+-^*c2qJP9`)SgGhFNxx_kdE`^?W`j6%{=ZPWC$|dY)X5!D(}L#r-D! zW|8tZJ)JvIYY@eGZQqg{R?ls=l#f$=Yw3^#s|7ojp2Ju*9Z&W~&!LHaD3R8072h9nO3J-$ zzx9i4_X@b~?8^QC0mXZA&=`~+u)v$**|(hO2Q1|-^EyWq)MU-BgGS@2Fusg=xN(=Z zBOG4IjEEJ}hRs3sC0J9FecN3>9)v zI@X*DN(0AF&3>e|sWz(yP_{J(eGIQ|ewbvnQArhIZdz@Wtr36AORFp0uWNH0q2)+; ztwX05X*ca^Q^m5D9wC2{BF{p|)?^VJ$bKTEB_U_fTel3<9Jk-q8b}(A^sBPVU#4PU zsnT;fSp)met}L0A*`%4vd1q(%4Zo3cZl=B_=&Q0iZOGLghmOfq)8V!|Qp%@N>GfDa zJOfNetCZ}i-ZwnN&<*hDR;j4N)Get2qoAtWaJlHrT&9pHv&=1Wbkk%B6fp=n(sqq? 
zd7zR#SEV;lHwC~n>h|nr)`_~cyUcc@)f&&_{E{+?kh%@49M4Xsq+7&@ghA#Wokepi ziY<+ddC!!J3*mA_e)_9Z=Dg4JZmzl?8#sep`SXg?5Iz!2;uIB7n;$QcuCz^*5-|VG zYe_4GO(sXnZjPApXf};7)m#vPmlnlw)8#d_o;NLG&OALT* zwV&_&;dFO*A!B%u$L7P7yWWhS*cWr{bz^d!)~zK8CFHf;j`_5+5y~PJ*g~rMGVD9S zVnp@|b;TE(8Qlk)2gO+cTBl^)vJ61Ykcxp%LYvsjeDb)ir)KdSiV+Tbz2#g|h{-2R z8TdQVmgFI<4QIAD9HZ1*?t`#v7k*5t*0tP7!{Jc(q!s04S9bJvyi>M1KrJCpSXQsa zd7`E=5~R*vjhvDvDm)+?8E4XvwCtHI!oO(uHUk4;f$vysH zdH&d7zY~#sL*fa7xZkd`uni|{hVNJDM$wQLS5c4p{XQ=tcb}5=u(Sh`5bwybYlzN% zf+$)Wc^)h`L`(8)(R771$G1zdD39bgGvs4|zwd`T>-JYlsKTLfl*VwrKUH`|b0W(R0}W5y4I{)Lq8zKRg@SKhvJq_G;ldlk7P zJzEE7%kBr>l|6BO7KQzq+gUYb({>Ec&r z%SC!>d>kx<#$*`RpAJM^tKeU>P5H)FfU4FCYO~J)KsNtFs!(6W*Rn+-J7W7OV;b|U zi)X_ly`~yWHwpd^dj`DtkngwV^oO?6-cshFX>XQKO|_gX*dKl(%ev7ix4Yn_A2*=8 zNSc5mq!x?$(zrqUXkVdIkvQcOb6Y_GPm+wloS5hDB}h9ug#zW~ANW7#W=nev|7qS( zmn$fc6`b1ldNuSkW>o4=)0(jVuxb-AX#y`ObHs+!xL+>rw(E(Bfc8$nn^UlbypA*e z#ThZl^kEemshR%uX9_j!QWY^bjVJ_Zw6BM2l$|4hyB!rW?_=69lDHe`5_N_>C+snIXy@E} z;cpGw9EZdW=GGh4#NMFbMU&}bNZ86lDm?y^>cK>goKeAUw^;4m_dbMS zzSVN|CVePGG@!0SqH;>&L3?Oo*Mt`&=dKdBm{r!RI+b+sOMT{YgGLpsd>%WDkeg!{ zr-gNP<2zGb@$`H#@^!pf!JEv*CyyLkq0Yn-{^(ry2^pASB+U1%@c}IALbqErF`Ey* za@kv_=iG`9tQTCp{wNFx!?U|;$g#J7->$NJvL)Ce;Sh>{*ah96A72O>S2dZPlr7Sl z9!i+r-W%g7mJ@6!{+=A?73xI40D^oqsNopBHl27UW%F1?KT+_6gOP&~Lwz@^@d;w= z+A-Yew(6W2IW--pg8vy3Qu5eO3do_?WW{hh*w zjReX3Kn&A`v5@g8PaY5M?pC!6v@6K_48Bg7n(E1=ROdKM*9ZuIaBFOJ<`KM{@cPc9@=J2pfy82=MoB*7)3Qp?j(N9X#$?>oz%F?%WK=lrGH?&E}3v z(`yNb8{6E(BDTFz%ZB36d)NK$CnX3b+R?tS*-J>?=Ei>9SqJ)tj+0Us%aFr^ohJGP zhJ>}4umiHbVk~ywHYzI#AzqW!u4>#-R^RZ?oSq_i9%q!#VOgxgcw4s;XjqqsA1rcP z<=crH)s+=`;&KHCqiI1OU7ep6wP#5kA3x3~=oV|O$YPdaquz-uxJ6|qyM{hvlb0KN zasWf{d3Acudqb-19Sx%nHsXsreN_qh`dQ_klM`huJ9{F_J~F8YlXNc2kbYJ{H`rd* z557v~!I9KC`iBLnk&u2RpHDO)Wp__tVp%aKey>38A+d`1$hqqEwl^nkMw2CO*|&u6 zJ$@in6dAKMxzlT!oH;<)_YU2YmW?9|rQapIeU>PHn8cB-Vzr>O)Ee2vK(4~1{odM@ zkN@mnvq^oIHNc>rwLR_IC~SMbG#&1&FCF=MbK8E^w&C+!H~!=}lD(()w*6bJvTu16 zuZe5Ugm<)LjN1W?4>sjDJeV?kKMma)88Nd>2_2iZ#8vcGQJR_$QZ{S`DCUID4bZjr zp2?oKE~uM?O&*1;KEqBDZ6GL%|@Iq5F?88_2I-ISlYnXSaB zcW&u5=O%2)74~L-bb$99cK^NWQ2>MsP~LzQ2Ea)d!`*Vf*Ri_15zvZFin-f3bAF3$kxBlT9BCv*}2{ z$ZnpFC!!Irndyo%$IL0Z#hGC?iSS-~uY`$eJkE8yvO&GU?^sYd@V=g%(^R?Kr0*xFWZ&h^` zvRb}$6G@UBG0duW)DXc=e3B7*9ab4Uy<%lyXJi-r2m*Jp_jP91^+5_SxCLSB9o&YRSL@=TCtjI%vBT2xg(K8MtT^ zPy{5jm$#obRcXhUPLXZDNp@R;8MjAIltzBMaa3QMG>CQmAjM;3w3ZwxaaH;ArC{hq zW4mdBAGTRwcLbl%4$n@MM;ynd*+1H*xp%#n{{lKLR5neBj~c!a1+K>I09#Y^Sm@~k zmR7|ir`O-G<4AJDIU+P|Mk&<=aFSa2<#qyFONkVpc#c-2I6{I)R_!P9(kJL7!^t?l zrfKm%u}xK7k7ZO+xZVO0Sqx+MR&7G+#bqcW)%;(to=ne9{OKQ~`?L3b3CH*#y{a?n zEz5M71$1)*XSK06^>{Ym*vZ(l-Mks9Gt_fKp#7x=q*@#bWUk9|6pyAZMf}Sj0pH0FD_b%AT*nmJ1=&z z+DcU&N)>4@-j!yMRm7nPx|S6jl@jpN*!#Cc zmiNZ$_<1(TC3Ai$3VF5$^Fj>v zAuXB)S3->5~B)Fi|H-c$i>8g9jo@elm=k(Nk2awKeN|Ln%* zyEgAOIqnpRX>r;;>ljRYO?56B_>USLD4G4Ht$}4Q{FzxK|Ex#ia*z%dp()E%;>A~S zc8T%rz3tIDu-iwK-+>-|dp!c+-@I4=I)H-K!q-7pKGXMwO-9 z?5wNrAt2$QKSbuDqEit2SGWKEM!}DWWZz%waOZmAf46`?iVF=5c@=Z*HCM&~Um!#Q zsL9z@H}LN#uC|3pAR*6EJ)5KX`-y7}XAw{!5T%l6QeF$=@>cMraH9|qYeL@eqF&(*Emk23@qV}C0 zG_!4^&Ty7%p6+0IEW9#n_*l(GoU~v!F>^Fo%b9Uwr`&r7g3$%bs=!-J2 z6o)Ug7FWWgh({XVk*}_GDv4uaO*ap)B{mnBJkH@s&gs-lMo3BHI4NLiIP&;cx{-oF z&F$s%W>+g@$RZ8DALyi0__^oa$-71_arWU~WsGV<5mJ06f}$@!f__FR;u|^rL4MfY zcI;HIPGTjsLQ52}@^E%9v@E^dES;yVZwq@&h){C|GW}S>7-$a~8y_l1%8t&1|`d2;>*G}g=u-)TRhTVPI zJDEbs{;-q4#Zx!r%678LfW@iV`mBbZe*$@pVQ#+?=#sn;cjir<-NJI_6R!KnBWvbE ziEeqlX}aLq+>>(Re7dqFMYd$WCiI@;_6Up7&A6!~3e`#DC#Iu? 
zo1t_F`_d<7Wxmfnq)1qHB1(^2v0)jQTIjCVSiryv?Dnf2Lcs-iLJH9D3mt}aH7_ct zZorkSOudsmlV~RMe87lrcZ?{DDU9qS*QJT8lP<~1xui<27r~A~_JwBC#CyuICdJRf z!mdSMyN-sHwd>l%nYpp{P9~U*cuG8^IwdcD=FM~MLt9tJK^W>Jh&10xiaDR zYu}s%KGY*$EUkCx^ckxD<327+DJ}Z8<2u$|`#lM=p_mJ6Z%+|xO7!1`QY?#X$mce$ zQ*CSz7W~wvJ0S|K5X-UK7$Ec2__;Ad-Y{S{q*2)}z$1COwJMT$*l;EqR^6KFBINfk z;D5q=k0$O;5`G13m?(Q2B8HYOAK&3oTICf6qxQHEc5^koifAL*JQ|FTFqqIa>c|n- z4GJfevQh|WB`!0wJ!N2mqvw*Zj61`aBs-(P^<5Qy+Bqi zovya)WGN=k@=ukiqS^(qIMv{xRYKd%2BnerIM?6$0cIsWK9y^8Qj>SCk=yOa^2!cvbqoXp(&xqOci~w`mA!bz+g#G z^y)0S%s|aw!(@jp)lAU<-Lo9({e|pSS_peTir0aAJ-p3 z#y`t1%T%9SIb}-^QWv5bY&i2O0OPtL>P@M7Rd=7R4CQL3nEn|&(`PW?lq?(tk$Q=IS=b10^!%aPt zuO4Bl`o=`^{qdCyRN$iQ;1;@bk@mDqA&tkZjfpfFYw-UjRj325zTB*ooVMRa(ECOl zmXf+CM4M!_`^@S?z~^oQ&w;YF7ff`DoR0@k81yG}DJ3I(3CiAC{#d)WP4lm7qy0gD zkM@H6rsAcXTUd~|Dv!FeFtpP_(CEHzcQ~qTElNc4puxT9RvT}pn$`T16EEip{GSOrhClq;D3Z{QQ0@98}l)+1#(_s=FJw z_|~LKBXnqI*ZV3%h9LO2!DHsq7p+Ti&pz}oDngvbUt3KZ-aA1jN(|MDCfqa!J!>D^9ls zKMWeG6jw!J;^$d%Z|WM7@qgL2-|>?MaOd=3fCJ(z8W>(>Uedqo%)9RY; zhZi?fZH~x{wM{qfyc61Ueqhtg(YrpFM=cP39dh!SAvk{lJG2vpALCt4o}c8dfg}uh zTLO4)rqB=X;0n}St5%psqRSuFE3^PP)jsmmmziF^x?xvw;;$qH3JC@Qihtn;#dXN~ z`&&~46bK_iA|?O2{x!bhLNdfbFecr$$GygwD}w|!;=c)UuC+9B&40uHU)%O&f=Gq! z!FwIL{`xogfzeuAdLs(|Mb=BBgRj4=1dKgc@cwyH{nu&E_s;MFx2c1*4qbWV-At_zgJF*Uxp@i$K%&HrC;8_O=u%`fP7w+_hxz|LWtR_h9rz<$k8!Q1{B z5Gp}l{`aAw^~ir>w@UW1xz7cFjpq(~q1Z@!(k_C{6>-+;*Fa6LkhZr8yNUzn$Ep#B zC-CNymi};Q0k67YQ|c6Z}h8Es`KH!$HNlSX3&u1;qpA zUaq~9eluIKlnmVKmnm6IXm9T>*^Y6xtgph6(-vZ|@VjvjB~aLIviSUXLpk00p9PDN zgcQ<+yNx+)ZS+>pGkHTDe`-mcg~P{wvAIict5}{FVa5;VB`mL&4s*L3`g!a{rN7=Z zyMkIE^_Fg=U8s}OnhMXk3Un}*apg8p_97B)CHOLMA`kHV$^EVGcocCQU}vYXm2i|c z$$Xe_6}{jZ;sF-#q-v!?bosowUA4<%z46eYo7{o6wn7N3h-0CkQ?h+k(zGL%JbV{Z zVyq1R5o0XsbVZZ)R!e5(;qmu?`zz^brjDwm7H4%T+)W=Q1kZLuOH~S54gYDAM!j~l z7F48LqLrf5O33GXxuKi4*sS&3$YM(ENmiyH=Jj;_jGgH2P;xtrbhV`8TP?TC?`}GZ zJf|PUpv*5Ce=lzPhF`%nUjG4qdWtP^&A|L$Q3!2R9cV{SC&=;KvUp`?Pm+2UPR4uX z4W)v$SCu*(-l8VM!V5xPVCd*;F(pq=K1Xt#mo)1f=(UNd5Ee3d_VH)Nv53M3|G~;rFxVwm0FUlKEpZWv1jE^c%(#nF2wU6BxtzM z@@u(zVVFlSOmQw(8LzCVls78_zJi&5GO9Zfi8xkvZJ_gbfO{xo^>uIwSFcNdu;`5= zS!stf4kpP92srmzYbM~sRr2B=&;N?1G{jOJH0&z)EwH$x70BLb?9BdJEDfEo?=cxR z*ZoJL8mX;Y2{?AK$0^J^f*&7K3jBMt1h{!=y-+znt$zu+dBlU%&l{yv~L8ivXk? zB04JP0RR32oP>fy5y!&M_WHaeXfmPZ3kPeomjr##6xOQDkG1p~>N2vO(ptazZV_Y@ zzMHlSqUXrdd(H4i&IfPiApp! 
z@TyuVNFWO9$xYC4XAl1(1%csD*6n2M3Y3@1F}B44&s#XqoQ^4^kBCpqA$m=wD7)6( zc$eBP`Z{%72h~c?CX2+uxoW5g`zTs-UkjC)PfPS0L5b}LnqS=9i{Qmlkf6{<;qE8C3)0*KiqvSicRAWd4^ozN*1e5;@ak-sc-Jt1;{O> z&zdUy7Bq+pUpNtKuj7Zqon_LILFo%8@V|?(d~5?ekoxUT54Y&+_<4Hc$#);85gtoB z=-+apW`LrtBWP3y2uYgyFGjomIY=ZnFWiM8;-0Z>6l zSp|GTv54#(g}iYuyK&5!%4_3SRpFi!Ln>zgg+#{`R=%E*56k^`cJW4I_vmID0F{=7 zxH5BU`TCw|aVTVY?d7blp-s+0v5gPzmgJwN7Zij@Y3aKJv1_BJpL^%aHakM)F1x59P7DGZICqBs-;nCvm#*A zzXP;5MPXIX#1Z^QF_+imykeRFajWI<8)ON!j;9zLuoP^gA^8Y`_3%a^hU*GB9>Ab% zK=b)>4g`G%`OdbCsc$X!vP5J6&#UFVV*((N#nvPGA)s=J`Z9C@a1l6LSqd{Yd%MTn zJ3LvZ7U9AaSyr<5>Bf)6v}?OcP9Ygg@YE;kYwh{kc{=qV$05mR++SDYq5C2o-=q{;<7J!aZ!ui3eo{8cGhhu`s^$0!w&6V#nda&`&yTWO5^gKGPWFyq{JXhTmB{7cZx5lWtUrQQy^ z_be!<6hzqdKeMWd0`t$NCF_vdFBAe2D|2wda9iPXX8DUgBsfm!WRiQ}xyj1Ofc)%6 z_iBVOh=@hn#9p>45^(oZ4z4@=9GxCxLTIY>a~8E+sne(VjkV|Sssrh_9>#VCH!CHh z3z6!w00$fDmR*E9SQ~PMq|8wL&F)X=7CxTsb=!KTr@1NX|Fq%H!t3=65~T6Ms@uTf`5;l5XowtS@eb?+q0Nx`z0SMR6M(>*Qra_3u!eD$38 ziP^{E*2 zAlPz-aZ7Hb`!Z-Ml9}weuyFOjBi%Yv#+DoZOZAyGXY5XR1H-^)imH_Ph-mA6{IIvv zUNH*oRbp6;38JnSrm6_I|2_YL)SL2pp`g(LYQAbI0VBlQjN$taATEY)Qm$k z3PW;#dJT=F+E?+@B2&B01GTRYbMnUWps~n68ju!6NO0aNu7;jJmk7NonU=oMQk)K_ zUDgV1UTbAu%IUjLL6>D>KJX*f%8B%GYA6IBW!TzNQ3Ld{c+M)6;N7=Gvk*2*X=~gv zxCuR@$Pici-Kgsa>Hq4WQS2ufKY2=>s1|9p^k(4sL9Nzt3;qqO=nI=~kp@Ri8Ptxw zgNS^e#jse7{{w~OQul+0PPcw<49E)JTL~{A3OvW%5IM*T2Vj|AZ7cVk#-H~%4Z9he zuG~d(jKXsyyfk#c62V4u-_C-2yCIT#Ry^BqB`_R_llgrRRT`0)DdWl}uy}61Ak98~ zxZ}HFHjR){Vp}W(+cZ#mcDl+~bSy=1=1Ot%0zB|3!KLv!7C`mKK$+oGRjl1Cp424B z2$F7sler2Pm}ufd*9Qg}Ds$Kw(Qw`DBRl_A$7+ka4NH8Pd$K08<96cL^~kT2$!am6 z!RhInJg=@xt^uL~fSxumRSpl*f`wG~EME(;bkC{Q?xS%(!erg{evbM2LymjZB&$-S z=dnAQ$?$i;z?svgmt#4nuZNoR2Qr@$acwRNO-@&HM@;EWusvK=effYsP3)GL|3_E-@YV^P%o91X3n0f8LB#4Ro@kIwYC8Vb#gI2}e>uhRF-SIn zKee(wzsP(6Y`%k&OJBb|25m1!q{F}`FKh`g9lImRPaHWO6sF6u)k4wBRTt?p(%{~3 z1DTT|@62rFp@yE;QTOUe*}}?5_w&?XH7|>`563BTHwS~LE@J+yFhKoHD^*Lt@lWCQ zu|3dnBZ~yteYsfbp8bH4@QB`l(9+~e5L3eW_h}>{FC6RU3Qpf-i%_kc*~!{Ai-9&; zj;Y@Zt-DeYSH=Ec zl=^M2fKG10KZJU-Mw%N$?7!yj-xJ{%sFSG0fO`qB=}#EAP)ym+;4k|^=$<3UpDaC* z;V2>hNf%rnd59J`vmwA$dsrf|XgG&AO3pRR@SNUQnyj+OdIeNoVB0|IGTI z2*?K$oG%KZX9dt!AL}|ug%i-t*?2a7JoTfb_nrh)P z7X}h(jF?&`s)>@m9}_AMu>~b~S4uvIZ0ma|<>-l^^>e|j1HMM^)sN>_vj9D58)EtA z@v)rfM7fE7oQh4*w-L15O*DMh64HjlZr9$v6cj)H`=!WoLKf|^T^9=rK88A^!o7^= z7a@aU2*kIXLsG~I9%%W_NgBlP6pBQ{$eX9T6rW@9xQGtDL%MY2+kHctzh@r5Etij| zVS=5JulSiJ6}c~K?;L}QyX$NW8FqiH;WjaRQwj`VM#>ou=tyWZnvmeyhKhq5S|GPc z^m=l`mR~Y#&-HS91Vk^4ry`M+_eB^$#`qS!ulOv2FZVrvw#)H5>!sLovjOGvUX=e` zQ0ayuQb_|6Z^`;8(G}@>XM8vSvV{t2bI4=*L7YLhM^%a}1D+%jPyi0w6aUtGz{Jvg{Fauxe z1HQy8k--%zoX>z|sBTVr7F0Go$OWIPa1zuqp}DMo`0)YN8HL>nWX+0n*y(bhQ|b8b zN64gLVV}4C^zIT_qOLa?dCEMMLe>Qe)o6m_7eB4n+jbOO3T^S4KbpVyxOqSSx8zqPw$@lv!NVWVuF{Z4ZYaCU{$@Oe_C3Qae!}PV*sV z1?s5tG`HS{_3^up@T_a6hPK(`!6nnjK%F6SJhX225R-QWL_+zK6(>~sPdZKb(wS|9D zUcuu4E(JgXgqL~XH;_-8dH??XXy?byB}Wk3&B)nI7~K4t`a)koW=<&1Cax#_4xdId zmn2))2utF9SA4p0hf~&X_0NN#E%)KaBwg+T9Nxs>k?Uom9Ikx{v?r7uFoyje;5L({RYLIok$4;JYw15Rc>$^*q#H*bNI-^KWf1buP&??{adqP6Ar94NN>tbp>{w zJIKF1$&gIwY0^<=Fi)a^mrL}sjgwq*#<5Ghgd;KLG7=7#3)c38f&3%-m4+G*KC*Ir z?>~jS=c5y|u0Tn)c1Ekgke5i~1M(siWbh2i@7R-Lq#Yyyj}Gi#`leiEHE75f%aaa^ zc0)@LXutj9muLPt6tzb)&=)etH<`xC{X6b^4U{?)z0!#v*p5w`8y4J$i!bLzsy z*!b-#ddi1q(EI92mKQo*5C_K30O-pP)uQ@V$F9!cgmbk@G)CK1u^;ADM zhcOdV|KI{Q9GU(%na5JZEQ}*?BjpCvidjrh#$B{8;)Z~bF;V?Wc5avwE%3UIe0i5h z^WVY{(F@Z&qDJ0sldxB71rTl(%VxDO0}n+&#fZW4wXIL8{O1-Y0%YoYUr@AV3~6h< zAfa1lU*^Y~%q8(eUnwg-UopCWNT0CrnDMaWTo=e)ed?*~5}F1>>KZQ2pFvq{+~4qu zeKD9C2sTw!LDSsiJS4nN07bu7<;@&XFs*P{xD5PzC1HBxsBk$G%ul^SsqG^9PAjc& 
z-D62%$k=JGtVnr?X$Gfj8vs+bLYXJD{2IXb+%Sk3OMR+TRzj;AumeV)0UXSq<#mafI*G}E*-nZm=~@o-WR z2A!&rL~>Iu1h5B4lAe^L363)#8>Y6NBcuNx%)T3}r{5N!kZNS-r}+wPg@`95!vn{0L` zskhW=%Tu(%ZE~)xQ!d9v6pe#E)eHUIJ+L)yC5V^rR%&ql+fbbO{!kl`a}=twz}Y#e zAEyR#+?}nGX&7b~na_FRYe22b7sg`BJ5h=Z5L`C@q1R}l4Yh0P5-24h7t@{oqjo(NAf=3$+WS;fcc>-h^k>mT zjGH1?FHd8w&g*ZtxEhyKVuLL~C4>P=VA&tv@>YN0^&Lz)``$XCwFid&%m?k2qHXr; zbE~>AtrL`rYTd`bCBHi!$KLyN*i+(pP6RD}g(Z>aZy4N=ftSF}ilF-4KdvE0%rCU# zZr)r8KtEhd+w(qJU3=aI`__2gg*v}26ml1FgVHUdNefu1g>Q{Mpa8WhK`Pp??FDCF zmn7Fq!>it?KmAPe(H025MHwK?{UP1k!`Mi~h69v2#O^wtXU+un0|zW%I z*;DxKzKj%GK=Kml{rs^aV!MO7P|=!#y0GoUevmm4RV7MFYo#`zk0S7aKla|0ErB#b6Mfk{ zwmYtpy_UE4mTKcxd-wN0i^{bwJ^S4-*CiuYbtlz($hx42>$%e{MO(*|(Kx|aHPQ~) zwe1IE6annuCd{-Tw@ZB7-@`w;ImcZ;mIKcMkM!z=QTSoLxk$1F1~+Ddu+X zD+5X?_(_(57<6y)-ghj8o<7SxLx&lgy{VlTj@!aNQLNAYBN!A{S3Q;wH|R^XVW2O= zyr!H#RKQ*7h%3DlWE#m^2>f5WGKzf90rJ=FXPE`S_h|g5r8^R=@uc2CSySG-zz|o` zr$>+{0fa2HE< zADIk(j;Y&Va*YSEcCra+9HsGBr(G9KF~7x{Y_gm5N;aN~N{ZQxm@50F%fhQ4TyO?1 zZ*|G`Wt^SE`IERE%{TcLa6%#WabA&;@bXJ;%tXpr_?%&h)%U^TNQDY*)sfNR&&A4{ zUS*TWlYo{Kc2RzJofN=jAwiol!{SU?rP%yOCZ5?xFWh^Iiis@A2+4V^VHBLHHsbe+iHJgWlWLT3hL2s!qrMLVK{Cy6aSM3b z%jHP*6NRi^%+_0voI1vGGj0Rc!97?upoZU+6hYxBvGf+4< zJrT%pW^lm{@?Ehg1G(YCR%LoS96gwbayT*Kt|KHI5GOE}y&dboBy&iQWbwfK^6Ns4 z2)>i>x6~fHUf5tvAv*~8!=SmDbKsT(mx_RpIXY*QzliWMx{SEg}Ov?tk{&D7* z9JFxBY`8MfbG(AkAn^z50;&|wTXY3D!aaPt^hD0~)PgcpPy(rq4~^g_jnJHoyW zst2Y$J&394;&)%Az#vXyO&6S$I zM};Z`kps^GAN#A3fl6LMtVI|=)%pNo3EQil7eCj7NWXt~RN2&dIJRp)a3%>b*p|NEMP4A||o2@$nzuNkr zFA0of2k1sseO;^Ap9E|PUgS>996D0_9pG4wCwd7-@F*9KBf?d91U9OR4B$%eUt^RF zyoiG5E(LxhqnmH>ri$AF7K@;w{y05B+SfK7-qP98g{2JKc?z;#iVr{-_-h+=le^vZ zfJoOB@`zpO&Ls%KCh?9p-#1$$A|6psx+3P!Rfm9tjn@|?&2>Z~*S7&$E(eRBNpd!0 zOg&7-n%@`|kH4Z4p^wQ(5g(6v#;*xf=rqB~KnriBXIsKV0qrp%cbJb@4YGeQH;&?L zryvs=c}00&fEaEn+{1iJI*{j_%?nu(etqeXxtHoz)y3TS@PaZCgR>>=pA=*vek^04 z^Jtv*n{+3%ElGY$8URx5;kNRz*fYvsM1La}m;wjOcFYuTy-kEuidyTUHyi=aWIX*3 zQvZC@PJeAihZ-W7OaT%1Zlx3Z_c&=CZ3Xumw#n6-aULSIqxJ3qPL^WQ7v=Vqh>`24 zGiGqOmg8emjh=|F;0}Xoy1QYb(wd)aRzPOeB&_>o}1Ca1re$oTwx8^NU?SlDJ z7xxv=(^vqLh;+o{wItDpe@LJ{(! 
z3U&|%b)P@*UTphGZz;e0K4>!PbO}J23NMFQ$6?zN5tj>21i;IMQG^sjeS0T$Yc)a5 z3?P_(t;0_&7Y`QI9$!u zT?QHunb8M_cw|BDwqT6&kQ4$icB<>qsYnMahwA;6U#BY;xjcGi769q;;5@!3<`bn; z6dc<~@B+a0CxVe=ij*Ne-T+xXl&b??iq!uNaO3`pis_~L75G>~nB6A#PP;tlD+A&` zruXHC&t1RXn8=8sjY5ENVG&afKkiev1oIj?X$@ut2XUf)>V4=Oe-(5W*MDCKc)RG^y0r%g_O`?1Q5XFYOEwKcJJl5;jbF{+0R zmD`Tz^~J=^r31i=-WKgaBj8v&CmJv|a?l)|JPU0%9c4_F|1{z7#TO1SBXp_pQkqHd zwCNvmuBYd{vn<&uXJwSXN~8K?~W`UYx2e}dNn+LuAkn)>UUJ_0)=gTQ|}^*+oVAKCfqcbCPj&deZcQ+AoK z1sswTBaK)6bLNUXp7peuMs4{2)l^G7(qW?qC=`87NMeT+vAMiW_AFZC;lnw=t>jt> zn+ZGuAV|o|hzSS_J!O4TC&@$P(l&ZAfY(vtM5y-OaAhFo_(I~m#*B5@ShzkoU;*6W z>2|XnQ5Y#Bf8_vt`foS7KX*QKvvh2Rf@mTAc~E(fVjwk|1h89H{%Tu;oSiE54M7V2 z#Ox=lT=*ikgN8eETds37e=T~czEq5-n0yUdjWf))yu5|YlhuZ0iHFk7<%-OAdr}-z z86CxFfc~hIUw`D|*IF1(y$MLpOw&q2teTZomK{@FIcFi%CYt}cEI4ryas@o3ktf%V zn#Jpz6S|LHw5cAduxPX~#A74Y@N(BQ84BH7q0_dNN9A+-_En;H-dy4PeN9YLj3P$6J-3OpL3a&x$Xx2Y%SGR&0^goshQG?vD^(#Hm zWhG8!Ir%!WDoQs~BoSEUs4bxK)ZbmwIXjs;Prs&^ zuqfVN!gsysc>TwQl+pj}1?U_vTd8RJJ=G9&E?)Ul4jKUHZ7Tsd);+gDi)g^M?)9ti z0evv4v-MDww6Ob{54uKAG#YV;02&|}O72_oQiU$b$HGVEN+b>9e!WHQ1i4fD1ALDs zl#$;NdG&aIF*v&FlWZ$_163 z1`=^dXKy=a=o40u`m7l40%8m=RY4XOO9F!V{c810)prj|uv=E-bA=|igFNJDo_f`+ zJi~}#y=P0uC70=%3#yI(2oMw>0Z7DeyiVixjg>S+IE>W=sBOIVGj509v*f4Gp^#Q7 zST8_GQF!`Pzg83EU6Eiqfe1JsdouxMQ1Xy$43%8&|5*|Wl}E049=#VrBKW>}Iiy8> zZ6>Na?r34P{Vn_jwE+hP4(ci9Q^Odl`mDR*;J7ugYPX?K)ahNielk1ap;R!lSNlr0 zUrY_@zc>ds_K>$Npq5I6rqEOCIX4}{Yb!u0G`vOUvPxBDEmkb$jsG+yEx?Lk7N(a5 z+QV80^V9U5@-6Z=?EfC`yB5O-Fn_Y^;#msl<-S{5LkM~U36ds<3;kJ$0#$9u*5uFu zeOq|9N*d*}e0Jsfv9II#3e9H$!SjDDxD|G=OSPld!u!wyglu4pQ)+MEYE|J$L%} z7M(MLL!X!>^Y*x4YIc5@r#~Nif#MpNS5vv8vr04@Kg0(82nx<5*X1wTz z{Y*eR9VCym28>G9MHjL3hIt_4ed`ITA%@(Z& zeMy2h05S?e+oZb4-8=}r58i#*b}GNmW24n|!=AYK*e{`p_2z2w;sJPX)1W51lI-69 zx-qXdbnk^Gq-LPCn-e)=be@K&5 zVphiL;u!A#UCmt><19Pk2C{~J?FjM$h{=G@Gl$r2zuY-@ndb_chLxWQ)e#hk6yCin z;SiT`p$r6BSb0Do$+v~^Oy&P@I#;)cPk<=Y?pq0SkbYuq6KO?Ws2F&&7hi8Q?#@x6 z{=(IhxnD;Dkr(T|O@2uXXj>*esv=e%Y|#M-^oo`3;D6XQNVh&fQKu^CQ@&N}k01M2 zgD~6k!ZfWmdTs|!JpMEHB@)1-&$EGKFJtHFG1FTcpgu~DqpuK1*Q?untgQKe(Gu?C zEok7ftvAz?pp&3gkVP$QwU(4^tzp!qsm}%M`isI_UpRjXVlDyR;4NADun>=+it~=x zbsl(SL9PVv@~vW%PO!2*@{%6Z!U;;U+9Sf)H+Y$zy8HXGiffMjUdOZPX~H%otCUp5 z5p^kOL^S$_ovUSE%*Lg|6ZD%aVmNrMB@qRtWnG0-AIE78t628SZnQLv<^w*)y!8NNVg}* zZBS>3^62eQ6f3f6fre?na#kIhg|C8X0>;>c7xzaVHWZwz@bMqe`-xS-!dfr4Yrv_~ zYtA}H{C~s3IV#El;>1`X<$u4vsRnYWtY`C7|MOWY0xac_B>+u>b$|T8!zf8!(Cz-MMR<+LBYqhIeW^9!q3o2Cc>(I_=a0Xfgq|d z0wTFz*xR`3go1o9j zRdy|*PJ$Px-`Ubl%1nOC9e60*il#Hd-P*rYxxIWX~&)}uXGuyKIiJu7cP3#p4QSebGQdU-XrX*yhrx}^Gvf`bA?g(5y=W| zCrpZMt$K>g;1!`O9Bkp3G9>*^=gKxef49VEXwTWGRLfQNt$$!5P)$6F^ zc0^mvcVU%^yy6kl(KWtvZ;32G;>KtRv*0BqU`Uy(cFgxe`9>TSsrN zt*<*qHD+4lhtoc+Uv=yG78wJ03kD2Kz+EE`W*=~ZY)uBWLMikp(M@CN^O!?z`|>Q} zxTL3=NOzep47+H5i)H`qMCU>pQXwL0+j?+^vFpSDiy}*$z?mg!D6`O-^(ntB)T>=< zB6*2Vdl@@yC*krl9jac( z`yQ}O%VejLDiQN>r82&HXoiTPo=RKOY7nE6Do9 zwADm%ELH4B?w~x{Dv!(aajWcgxw~}2zQ9En*yrI+fa9GWw%fE%*K$J>!! 
zygjzz(Vl9tWkq;^f%E{^!dN$E@v zmlVxenb;}^g-Dlb6Bpf}n%;JK%9_685dtMev3__{mxft>yPimdogTZBtk3r&PDvz7 zp*Y*DVNj8)%XH1tyb*mBvau&XmfZ59qSlNpR$p?l2#%ha?3t&Mm#kEHW$>ZiC zwv_N(o*tv5;ZJ#|U%T;Tb7Xk1)>p3xmQ^OSzrOw~WJ=p@K##c_X-(iQ!1BAPab{P@ z$Y|~7V9Pib<2n=0OE0^hyYV*(QIDTeQe~F*&pIy)BJB*ql&AJH@t>u8Aqx_TMi1K`Qw1`5C%a@y0x{Ig$DHq7gHex=hPGJ~1bj6pv}}$@ z7hQ$csq|&uCE>sY{ zZFogE%N1?@>kBNekzB?r(Hmzd?G~{wf7@Hp3f#tz>|U~3MpjUkcl1GbygOuLbe_Ci zMo$>&Ec!M+z1Edi{U)zkS7=VcYm3iL*x%&6r-!-HTJ|JCpBIgDtzui~Grf_o>I(I0 z;be~TQAvleOas?W{FzVKoyzp@w`g~6$dZm)(f@RR15;=7V^UA8^hg-55_;RW|MbX# zEGG-v)Le1$K-sIzWoQ}!wRvAi$n-aaX`SV99OcqKI~A-M^IoY<36nabs;DTs9N9H) zYceCgjW)$(Z%?-*lkC>K+()K$ZWMf0twwiemoqFf9bnh0uCH{|P~=qX|LW+Q)7z1+ zK_MsM-J77)IKiCPg;cBURf$ski626DD)+*b*m`1Ocw+OtT0?{Q6Z7{y$@?HuJja^m z>Kslu@%pNz^^*uO4~>@|fk%&lESJ$AF!ot)i{ARjE13L`( zI5w=fYgG|0p4_mv?hF*TYhL`rD>)DX=fc-%(~eu8aL*QbQX_K^MT>^Lk(yH~H%ZyE ze#;su{glVq;$z)6+k0q$+q$@ad(5e$xy{!AdjdPKHrHIt?w&D{mcN0VVrM9cV%y_! zj-I!T9Y{fTff&Sx30bneS!|MC+~k^7Z9*H09>xUz%M;wYR-fsHvhJeeEsM3d_d3*C z^!}Pv;Yy3Gc5!c@pG=kn5uC)ktS(Rh4*)GJ1R3ajP}4a6L|tZ8C0ak&D8a2sa69Lmb%vAJ-rO=}!h*$Cwd(d>T1$(j6$behA8l|&@Dn0JSG1;Y#2t$wZkkH> z#24|(CPg)$)VwWRUxk4%THOq`?DH5Y-6kt4O$JrE-B==nDo?Wg*urjS#Jpoz_L{3LP2@O|#=~MQVjoS(5X=sv{CDGQ1mlFB z<^^MGSUq291@IqJh$3O5w|FeTMsD%Y++3Y_yCIU~psb^`X_HvLY`5Xhx20>Rr}~eJ zr(S1jIRSM`W9(fsM~4!GQao5yDwa&z7{3G;H!V6@z@230`Xm{m&L5oyUT1`PO^EqinmM#T)hsXJ7amS&5*S z_!QtG2*zdh72qP0JO(M4GyKQWUfQ~xRJ#Rf!}lG&J2RRa1=|Ofmsl9w$`VRn@3PmV zedg!Z=^&HldHlwjv#|DetZcM)f`mICL&RoPam=cW);wDu|4h;nM@gI8V>7EZv)Nb8`>wR72F(WT2+L(2Kbo`DyGa5(`JX1#gm_oRG@5#rQfdD zKQ44WtsM>|(@^6Il?sx3Ak4EY=t?omiI{wI)_LeVyVj%P-zZIs`NMbQPoi#u6ldn+ z%CJn{sad74RkY9V!v{Z!XPjKOBkV^vP6U6aHGqytZBMkE(!ML;v%uhc z_r6q6X`TF~Rr>sLto5WN){E`@7%em^`I28)5@E~F32o3)Glh2qvjU%tV;LEeeA}M)@tiGR(r|c-Cvuw z?M4Zs45%yF4cw9U*{c+)C~0R)-SP|L^DU8UW-&Z#!%rON-kOM?qB0fc#X4TpBbz&% zJsy1lO+~ThM0Tl}EU#l;mod6IEW+k?SnNJd)9m$&I?<=o{vP)R{}?+9`;w0-+8RQ$ zRh3|++UBv9^UC9RCNo+Ms29VB>4hahM1ATWD~{w?0JB>0GqOvtSLV2ziJxX2Nyl9U zzeLinH9W**LdK5SZkK#v-y5qvQhhenfBG;>a#&EhX>rq@5<|YSa}iCmIY#AeWbo`#q%oB#4{N}3NEva6F_Oj|V{TZu| z@0k3(;q>)@ti;1MOi*c~9`=6FXL3*Q^{%(-oI@D^s^}@C1v@t&5?4D)`q?a?uRH#IOPmaAGWl%(uq|$LD)s&gB?u-_$RXW7oa) z`3^!YjWsU~SqDKBeyg3f3GTEU8k=ggH+NDi**zL3ybf1W zB1N|_`EO2CJDyeH3I)DxdC;*n^ijRh_*CpHt#xvr)ZXC%l|Rl|gPeA7;9-jX5z#^S z^B*^!dv@D#SMH_l)XGRW+t!$r*}ARiTP<6t4w>fvOnt8K!hPxNCkR=0#FbC52zHnY z|7h2OntIyLSBQc@6rUnG++aKoF znS}D{R*U;_nF2y%-5m;xANq0C1U@Jc{-<^Zqw17IO&^vrZu1RF`C;Cypn5FPQiTMA z!IH{o>6f|Ti4lxweO0DzZ-EBEtk*SD?TXJi=Q<$n{zCL7k``?D5)$ZonQu9151@PG zhl=lg&i(@Cf25@SLEK|5>1X!Z>zgmGuZChRLDy{W)LgxXcu~hoaruMdZGHx|%Wllp zaA{Q6L#$|}WFq3{d8JQr54;2^8^rH%F?9vr1qh>GNNZKhma)mF2h3HZJ)+)I|G0R9 z9)%#Z8vWVhLT$|kn_}}Ixi=gJbjva470NJUC9Gq2f;Hh7AuLILBh(7{K7LeN33_sK z2y#xk76nhwJ^I}8w75)G>}{COaQSev&CXcjj|#6ItLAT|3a1HQw(kkeKFgQZ{?Gl- zg#iBZ2W`zKSlCS#l{>PqPEtyZqOhf;p1bl%{vUZ7A8a|+&YPh`-(mDWZg-S?XC;8cIngO$miqqr{gM$1hq{CriM(Yv2s!!-)Dik*XXjuK#Lk3W@yGBhmYT5us=w znc3res1)W7s8&OR4V@;3Z9c$IlT}Uqb*BQY_`7DZ_H%0uxCJ#>$+eFf7s==~iMFt` zKMc+Hm)+E#Z{)SbSwIKO6rnRnWT;6q8hoie+#3@g3B$T^L2r#>W|}Nr^Y)G zOT=4WjmT0Aiasm21gg!puiTyM349 z?RMo&tv_+fBzx-#xmgZ#zuakL_oQD)ILR9}`S%vz(e_!&!xVg+?$tT#^xidH(T*?u z#KBhzfwiZF2E1?FkGUMFYBic~$EY{8<;8OG_SC;r(A_Z&Qo=zLq=sT>ks?w=ln#;56oF6;($NH@C?zxn3^fGlgn-gQ=iBH#XPobj^L`)h zOGYwQ_85DtHTPO`{{C~$t#Sa6%DSIsb|_5+m-(@n>=zjBgRFmid<26cCJ(1uK^5*b zJvCk~jJ&-}G%!toywuk7f;v>V8eP0NaTAwrow+^^Bu*e<(f?v5r&%?M29RxZNvx>S zv8{t+rO)t5HTD%f_5&6*U6yq~8fj*G#>+C*C!AfLlRjWP=k#jB>oY$0w zV4ZcdU%$H}F4riRiG`BxjVyfQH!t=Cj>G7$RL_3N=!9&hwAD%3<9eslx+oRb0?h^H 
zj&E_Z@lg@%d}EhEy8vq2R`)$-$xMra=iM@D{cI+)l*KYN&SvJn_Q^6BEkS8$yelQ4#_HEruTc)d1ztULsv3Aa z!`lSz>DW7$;1yHt9bJUUP8nE;UUG|YPEBA=x0wa~l9E+C$emd>(CFYk?-8HQ&+(kL z+2Q;Fq-(Rg%b|Fl_n5)}$EZ!wPg`o(Go-vrv{k*mPCumqT8rhJDzrs~x z53@=6?y=0pLx7EVq&l>OUS2-aGY6fBzHSorlKo`zGfnA)XYP)d;H1SVjT&!U$b5w7 zXh~L~^z%Ljz)fMQm209v^^FF_2>J1600vP%vNbREfIYpFI9~brt^dYQ@asvqTmiMx z+j3;YGLu?4u1<+879N(G1$Iv~|D*uF{x`Sq0F`%Ul4WHFT&ma_3uBT=#}wPaBGW3Q z_^*S80HCqw(DrbV0050Q^m_rwj5sW&T7E49yku?O_q?s?_abG-Tw$>KrhAWeb4os1 zpROzK^jc`;j*u2}Jb0-s_)0IxV1Zd6Yxpww#Id_R(;u|EyvlRlye`tL)~*~@7?zu5 zp=DxdnHQbsfYxs4??l;Zy-1)O+~6(1A%Pm$k$pRvAH4>XwY4`o!DqHnOi|bsSe$uq zj`mbz$RHCEjI|+wl_s>c`(pGfuYova=7sd8J40u2vV?`9V=(SzK}_ZA z4EWVYHY8@pYx1LMeTAg!GV@zc8N@eq!%SAhcFzOh+{-!iaw_9putK=Hk!}&|>A|_N z%ozRtQYY-U3fI|GO&;I%4DiZJptDirX`g%k;ui}-A~MW7RWAhQL3G-`^xdj}2io{4 z_YTmv2xL|U)mbxAk4;(UqrEy+ZE2ZRAcbl8@oFCr{vlS=IoIWaRQ4ZlZ2WJexYX!cbOM*p5`lGxXSALBKtw3 zMBbqK1Xhv=Wm@R2>^tX6?J-X@D`zm>AE~Jbb-=41RuEGb zUIg!GJB4_w=$VGV?N;r9R5-wt$6SZ~#Ia;z+%Nt$r{-KM3wTlu8lKQRMm(WBc2Tt0 zoRL@>Rv)D|v!&m?1p^2hFYsT0u+_58u6Qu!gx?o&g`RuTs%QR~Q#uZi@MUyB484^! zvZJnHrKV%B;r37EFBoIfKJN|rh9!G`8!4Zc{5PM-$rlAJ5%2#niXDCLKeZB9 zKkLhwIk0DcHz3`xPVKvbKOShFbBR~qldF2nvujSFMuI7)5#h`{eZBLH=UzFpJ<@nl zu@GmW6$wlndPo4ZUe6DRL{xrrOY|33`>}wx^;Hp&D=pWH5xhfcR75G%)R;S!{fFQ? z$HwaH+nYI6n0Br_s5>$ynDy z#PP`tcUpeMq99FO!%Tufae-eW9U&7bYPCF}`%|Lfj)KfR|OWtKw{r#^5<~2S+((d z3n$f;bp1{nhV*ck6kgftYVnq|9T}b2#AztxoPfJ$O!!n;p`e~0g{Fey1jwzuJIJ&fojZrA2{)I;QskRKI?YgWc*N#Lyv*!Y2ow7 zM$~jPL|)(wkuTQNzhOu{aR;ebSR3au+FGvWe3O*{-EsL`{=P9zR+&v{;n6~gZU9%! zrxT$M!4+W+5A9?t5m} z#70aqqT24?5RZcBV#TM=0Ah*ch0clZR{Q+HRaGL9@bjT_cRQdy4UH^?TYGc`wU{*z zq$OkEl`mggq|1Ko_Rv200>C>kLRflSU_Ut^J_@f@-CUQH3PuKKez|&l;f24}6K+(b zBhfOVs504{bn*vsGBTJFBHG65CV=m8SLhrZ?Up`EnQWa2WRjgPC^(@Y-}_@R_GU-v zBv(+JAidlIqN8G!EjqC}ulunUw07hp$B93uPzjrjrFfkWX9pbob2(xoKfOlh-tlU3 ztn&)sNqvoTB~Lt@on8k7Kx<^`M`})0dt#>ENdm0|bn-cHyEh482BWjUiltFP)?|b1 zW4lf{)C(zSg0tAJ zHuohh&L$nefdm&^x>2^HcyfcL@y@J-;u8{4ZZX-taW^MN4_7KZ{;=(rNuSxv)lDJL1DOJwFHh0HeyG0$i0D5tllb9))dl$(p_ z3?Gf_qhmi@>*b7SG3;{i%`VrJJb%lLnY)Lo^r>665{n7xCd(>x!dj^8Q@s#UXx5PnkIVn~JtZT2_M+jR zo&568ld~_|!Xh92yj}BX=15;?;!^Ywh@4@hUwQ8<^s)FsYB))~5YTdslk$Qr7Xms% zNjk!=qYN^6;&GNj4ep8|w;QQ^P3PMi78+>>u3G#{6t0HZJ8;Dga3!_6(3&diRC*!9 z@a17aKM~7iAz6l+XXL!)N)_x$=-Ai0LDKu29zX?de838PhF2J+1Il<9QG7zLK9{!? 
zLM1u0cmgU9X$jrBt#(1H+JXH^mr6OknlN)QqP;n(`Mw741B_E{Y)pd$gI^q@7<=dq_TWRnse5Go!9s!gC}2oj8S{~=;qEf(6O-f zY=a=_nOR0)W#guah{@_&Yje?UK$9hZ*}AtiK!6R&q-~!w&!t?l$qg^i2K2^!g_Q>O zK-{7%hGS#-z7vA>qg(aab~IK3YL*I2glpcgwB@ox%K@X`Xyt&_SCfd;Ux8Ha6h_K4 z2rj}Al(zuUKOU=Q#8Z+KEEDm{&Ce;{YB22Mx560Nq#Hxw5u3ERf}k5ZMcDvdwJg5{ z&Rf&8%oM)8M;P89Q+I07h=3J8KX}ADdmC{^*|{fjFX&(|xq-I^t;&;wOiV19`|k)i z7u&I)@=&B@1wXk{>SrndvfLLGl`1&6R&m|@>$OSf5Re~k@>9}iTdBof+3LV+*XD_Q z_4=&<#x0@?hcHd%zH}S)M#4(-Z9I@O)*Qm(;X zJpuCbU66ka)?CNocD8g?i$M7n!!m^=7xUE{cpIk=3MQ{QS?7J_PzWc7jC$v$6uNyg z`6(=`V+zJ>ZF1p@5W_U8!vLsssQz~KM8E7$ji-q{H< zF)@YO?poW~iK4EwDS-~K0hRs1Ow9XV9byEl%Ygc6d9v=UM(r8I(|bSWKzbc{&HQ0f6mi%vzlb7(k9s7N#uBvdF_#81F9^UEOx8yYO@Q&x; z;XzgjPk?vs+%QPS!(+y~EhnSxacpsnEQ{L8S7e2wO1P#1I$lo5Cqfu-;v01d-8=e^ zPyAC)Uyvh*gj{4PA$c-%;%{D}dO5{X&ICv63nel(jYf}b0yVRn&t%iH6g`k7k$K8I zT9R&av805Z7~x9kyj8!)E-~$~lqk@8edhCLi7kQDnPgAPcx@6r5w%qPo-btPr@WoT4?K+6(RLyB!W&%b)|KWrchg(3xmBMUvl&oo458f zhtjqpzj}(Bvy$_lDvoRc-=id0Oj8-XwA;tmHKH&jAvKOx!H;JrPm!If8(Te3%3O@J`=qL@{Fa*g?tQqc%Y|KXn2n;8nKuv0Zb`_2DIC+|e9@xw zCA+xQf=AMD|41|sB0Kj&_cy!-cEtd`daS9JcbD}dQHyC6bt}$ZEse}Ecd5*s?LD)f2OG|66Zfl%gG0Ec6(~2~*?wDCEqu!;l8v6do z*-o>R_=-rc0-tgP;qTWGk7pFnlKL~w6?II;om!~mfyY}>?2BpLK?CRnn4m?9K>WsV z=F%25dNv=T;$7g;F5|%Wiq#lhi}<(nyaK00)3CEJkm{4M1L8I>MoSD5=54&C3OyWz|srQ zDnklW#UihnkcBhJrLM^5bqJw1BM1eYDL)Ur? zLQO)LL;c7u;ulZ+_|OZySqFP{`2PH*n<3_|0~6O03=H{FUGKuLJ|SB=zA6xlG`|g)fI+4X$TwyqEM^aIWC$jLO(Z+pIr`(kwrK zk1)D0h1>Nw4i&L9BU5`BoWZ|GcY7Xhnx70^p&pdUS*Pk%De5|T_~hEWMLs)gWmcWZ zWlmEqWe>l93W#S)NIl19W>8`O@iN4kj?0%>kDKx^yFMpuvbMKtT$$}ovss^RKQ6IW zEV==|*sZvlkN%ewxj>v;7J!fk4K(VW=%>?QaLMwtes_uWkD&QC#*)aLpr8Jh*YLhi zoA$7<5T4?>43%?ra}%yx`@}sr6)Dx>ONih-+j`}L_tuh%fPjF67{x;C>&mhDxvG*I zhmS?B>jGI1oPR<>+p~{B_ck@jd960n)&w?@S&%VFCQjPd)MZ@5hwDhx4cw8(x%5!A zFVU%9oAdn}Yi<0QN-=!Y`z`D1olFm)teK!t&kGW`;AbN4UPE72vh_ADk zSZ}%AU^XPM=`%o58_$6QUtFvBgbmCqCMfx9I68bm$KyP90=ZDYuftq_oLz0^;ce; zTklM~{?bTH=$)<0L*z*L+&+4Fc^Iw>ebzTt`tV6Od9X&`_T)RcRFi;#1?7e|gWtL$ z)i*#U(k;BxKRwX)RI~i2aIQex^V5iU);!78CQ6&;5b9y*=gIiIAW{a!!unlXyXT__ zy5|yJ8`h3`QUc~w?i2Me43F(uoGYb!E}eLsYt3kwYk-$pdrFQYK?J%qy!41nGB*5e z@--(`u~XMHogNQ4YDu}e8Nh{1)E#c~sfl4DLL6std?2f&$Ev`yAZaS+-E; z$!})tCzk8t`4oCn;u5uX=%w(rr%#`jIgV9R$q1>|QoV1WkXOMMCD>nFs_A8$*p{EM z-=d9cBr7yhl5)4+W<|_1^8F*Y%qFJyWrX6LVf{WX&QR)Lihih?@~CKU-AxWtZCJO% zL$cU^)M?k9FC7IJC5R`<5xfQ!Z(~e+t7gs`PAS)4X@=@P8DLBe!P?>9viT)LIy?28 z9g~C%_QN+>WB=EeBO7Z|&jro4411bi5bPLD5NY09_T8${>$WSbi+<9x zKGT^gGT|eTK>M9S)g3t!RpB@`Ql4B}-Ol#ssKBLY=Dp0Hg;nJA)Zs-{X3Zf5qHWQ< zN@8w{x6~we?u&Y(?7a>!TfM;NDSbvCldf0e=0M|Cb2s%jy?(yPX?M?>N_%dgho+`5DXb!&m7q-2e%+ZP1LyJ+mS{e!pU@&<&g;V2BBGFp7G= z|YH-qXZ=+lI)xYjoi~^gGkd|7Y4pZCk)#I_8UehEFW6> z14V3H+-<9l#M@VLMjS+)CLgDcoX-5Zj*&0_N`GtX(37$J54KmnCbI|)T!ZbpR0D^6 zFRI<97O?rY%G*bWn7@pqiPJVLgGPmx&+yiA!+~$N*(Z8V<}Bd{X|rbzap@R=V6Wd{$CD(!n-d&eBSF^c(1B(zJ^+?< zl%hw-Zs4sI2-U`ONDq<}N|-G{uwy1LcQ{sYLiSCyM$YU{yG*r&bL}^Ou$`ge&<>&G zj-&a{AV#+j;si-|p6O^8h)qmXj+ETBcAJHU<+-UkJ&mu_4!8Nwa4O&?Zx?bTdto47 zDOaaZ6?j@ISk)FE8yh=A6?CdKIM3!DHU={no4fd%g#UbrLQ%wF#6=*nnRi4ujUHi9p*@c;xtR?}S`;12R_dWCSUaedXQ!%EXg81lXlF zdQRfcE{kbnlao8@+~#xc{kzO1JsdsaCD?If(17 zXpGIw!!d|M-&t1$`V32;^V>)Hyl(#9vZ)41Q@Pfx5k$Su(1EtH!jByfS+12RX}Ddc{H)LJT&C9feQz->!wmN2l?lhR3b+>BLIL0a z5iNJ}|6CH`E2>$yo!Dz(;G$e=UcUE(>g4TzkLjBrkT~Sp_&4f6>I#x}(G3S^t{e7o zTu?b7(gTTy-)@Zn7yPS`UR#+>(i$+Co?>>toL4|fpdlRxRl_YO)yM9o;LQOq@IyiW zxj{oAA%(`*4ei4;UVlufX*Wreh51@RlIW#ay1#kF6vc8Rvqdh-^cx>?JEYn0`HZSx zsisI8d79*qwnj-c5)7rLR^y4p^ZjwF=oA9BCj6v~Dj&RF3mRB>F!f0+?Qi8qs5X-D z(7^%XvQVAjq!)jhUig&HyEOj#ZPuG~^&qg~jfA^y7fxi4ztdb`kSV|G8~$f)W^xE{ 
zx>>G5L)zN*HZrB!>u7Ap*T$xA(nTy3)vtGmti2>_j6vFN(mZ_1bQCWI=3r$O|jfR&TR(5RU!KHgG z!ItAxK^Vu;$}5FgeYM%y+1Pm4a{)E~zV=BwU6p^T=tHy9hxnmODRt1=62EO*B#-Nc zEqr(Ks?e(U9W4yFDhxuJ>=3)#1{lW#J?-wJyTdR5exhYrdwo6;SRtnqrvKIB`T@p+ zsz&ubo*@&Nq~2cJUxTy~Q?s2PZbLI~!L|tccNfc*{|HAn307;trLS}97vhaByj6Wv zVTaiV;<=yCa|ozvBq#7P5-T5MseV6u11=llZNzRxhs;Tn0(L@*L+Ou#!*Z7K>!QKi zk$-iE155FZaUa1gPL^imy1@aQ!I}G9fAkmrUp!q4ml1TUpKHQC7gdu79XKK?c;VI8 zNLGPw>fH7@Pae36EqdICc%zJHL)1wl@%h$Ajud5}?Km^9X=vdObO0JYzM};L)Pte) z*?%);ku}&iF?+`c`k?6uula$9_kIezCWZ~E`u6vUaCSm2v-v5CyQoGb*G9?ONH6<1gmpdk!08{Q5J&6Df}4wVsR$k6W!$U~c1k8topFPdR#qORcod zhm1-#-)p6K3!LV0)bM`6EpQM*eSt#<@!`EJBosAiGKN(Kz*DHMd5s}G7PRl?U#ax_ zday4xfkxZhwM)Wrtwn(=LrU(B13RV~gUc}X%ReL%(*DvxK>Ht?{1syTT)7YU9$YGyq^541mffgKG{QhMZX)IPm&~4FkA4l9g6R;yK0O zY<-@ZTzg@2Q1rLb#lCO>)|!EN>5zs3TdxDZQ@5|rfwLo0aLD6LMmSr~(gEZhG+Dse zQ5Oqy))6s(QThKDVk(I))hPK(KY<8{IKFK_FCVBjBqih)^;-&Wc8AliiwXlrzrud_ zDzyGAemrv#fGIZmwA9iq+M4+|DjQ{xy(i1?4 zCbVfTsE0Qmc+LBzL-$uja&44>PTk;L87}C=@2@ZCx%ERYoWSng)OX;I`owOF1%TGF zC+;x(;-nwKe2Sepi*(w`0cU4)CBVwvSMNV$4M2!x3XXWR=n+l-7=a^}I+Ca(h4Q~$ zpE^HY7yDHU@IPS%ltgt>fDJ6HT~plMo{H>EmhkRb``#u-TlIavKUb$WTPw2%VB-k@ z$w-b;QBg@(V1fze8`p`dC@YVa)QaIachadb04(?V`tC`Y-@!hQY5n$OXx)Gc7DA|T zLn>yfrKnOoF!+($|Mu-ZjSEQiu;*2nj^a9)uN!%T_FHCW+xQz2LG-F!X~sj(A|$V+ z1Mos$VAeCz6e2TpOkhI!`sJ5$&9|1uuo(lMyfz%O$1I(CvFIBJ9)>^6g1Ds@0c}VM zkEkH_9;@FRFc>bug6}6u&)=dKajd=muo^^`Wx(AK6cBi6>@}B8#g+%)E99M|tGJ6k zBN{!MmP~PVihPWY!9nUVJV9fRW^;RWYPbaCyoPuIu&7sRsWMV*u0Ufb1&~3WBDdT& z+%z6@7hll@(%$gYk$AKFu3lc%t|D8gmuix?R}BVY8U6NF6WFqmBntlFvVl{VC%1X+ z9_vuzuq8P*bqJ6-ssnwk^BD$I+MOv)HX0^B#L;9xOVz#87U zwKYjyIhEl4R%&SPmXnA=iFf{h7+u?r^FDW&9BW$Hcnl`zz4>%e+;E zeQj>uclOyY3qw|izdoTgz6BC*^9z|u=If0T&j@%|)>96|lRyS-%2tR99&7PvX@Xl%?`eka+B9MT|V6d(HVth&+0HAUJ2 zVq&~^6*JA`_i}%$|$${cuPNzgo{ z1rbriIZkIT_P(R7Tqrn5sNc*#fUhtGD@dDu#s`TplXQ?m4yA&FJes4)m(HdA(v1%y zj8z!wd>PXZkcQo{!X6W!U0N%0sy~7W}hPlbwyrQEb|(lr==x(LXysS zRV})oq$zm5sR^D7)eXfbq#_=53R>u*V%77PrVXIez*KC(`9v*scrSn5thIfJ41d~~ zWWlJAs473OG?b{xs1m;+(BfXEnFilUYbm z{!kSoaUG=2X|&A+Hv>*G#ts&p#Z__4YN^`l;S*UcLTXbe0Ornpc+Lhle9t4Vx3xTx z(vGNfGJiM=Wvu%rmiWiNdd8tXFOs9t4FZW?EEH^1v<+DH1L?5{S)ZwD^^;_bpM6uR zE!*SEKp?ZYGEv)tn?5`SNdnamwqY+?2xo!n?DRj|hM856`ZOO((LZS7CxNCFm~_u3 zlh{!AnhF=`0U)&~!X@R8bBARY7wTAA?WuFwy}iA9Lt&KcoAJ@S2GPde$nr9fTCtZC z8VG$x&z5U9SkV5?v2ZgACt$z}Pz~^uDF@XzRilm0311EefM|W0TwO|=q4h}Nh`f0fWywOK!Bx`s`mtE`|<#&is-o8 zj&mF8H>A~h11za=lr&uu2zW#dgm4iTB7jif$KUyb-~rUn)CELAv-bUekhi0a9#Qm& zM-K_fkuV*pr^AUY>HoGKLS<3^JM4@$H^(3|j``XE>3{Dz zPTW$MPXwUG?%~7JzeAlqmX)OC(Wqpvd9rzH{RH5N)NT&=Wr94r*|?8DJy-)t^xM{j z-uQx;s29MjGWj`ch~FhE*?~x*hc`F67221rj3iuUp+;*XxH$sDV=_W9&B{2IF-r*` z@=C_IIf7>(?J!*udnm00?5-=Y%dQ{oaB~C?K?2HcPvq}&VUuan z1^%h=ZD#B^Wa+yI(i;~GadQM8K-$4s@AC!V3;cdKpc;4z)!)i-6E0N%G35)5`%Mz; z3-uU~-iWDn!`TrtNISG!TzUCtL9p^Xpy$GiLEMDPH;}`r;<3ctx)J!DBVzs{-Vrg6 zIP*x%k96km5{V<7dB{0FGC)Vx{HJ~R@47}sZ|&a?7MiI2>VnDD1*JkGrB-So2LFNm zax!Qag62C?(DlM+Tsu@69>uMb0RYzGnXAM&o{+z^QmTtOW^cW-cX82PzuLuSr0AZk z8zQF+BP_3Je)eB0&nkeYCH(e%s(p4I#)FzqEIfYYq3yqLoJYj3B;$I1wY5BQge>3! zEj7|}sk`Ie%NCn?`sJUVz8JZQQ90z@9fKSvdL_9b(y>dgwYM**n&+}KTD2%V-ORZV zFJy;56d)WPb@&uT*9a5|`Ux}g+j_7?dcz-IP5OF!gg+mcN%50(7^@Zni3qCsmTC>? 
zu+1~qwhEiWwX}rfm3Mx2o;Vs!4R~K!(YusY5%dW_;g)Qfpeb(H!Nu0gOO*Z|Y^iES zGm*gqQQ`39o)GE9aR3VEOVoyY=RfnI&((eP%OCo2kiU3}z&(|@FOT<>w|w*h{VarO z`KEM1+l=BKm&N)-NRLPCfMVp21xe?Ejw^5s78oTgH5}|GGlXxRxIHGpt@AUn|4WfX z5uWlYv%CLZQ%2}7Gq&$7phJUn-9d%oZVISeDg&JYWj4JTpi(*AWwz_}#(ZvJ1|bXoFb7&{81u~Q&9(JX-8s*pt`B!Ks?bfe zw(0sh@dot@=i(bZ2*Y@?PNnEm?{;2$+7ra&8`~aC$+x)+n;*2>3B|b; zni_y5@6M)0W_O5I;Wm@t1r$KiJ9cY84dVuo z603u8&zYpfyn3hTiEA}lhV1O}nrv^!VOwK1oyihh(IWpsM1AoAXI@KNc5gtBY+_4YG)tb~il*1E?vK8lUS1p;l5_H^&)-)}Hz;NO z*ZTllCGn@ixjE@_5c8Ce>*GeO5D?hm-MWJFb!`wU-d)HS!ig$5kCyzS-;QXB6EGby z(IL_OpO&6M+P_(U)dB#C{GVR_Xvrf*_tU~2Y4}4*?MTBPA_iW-k%s@jsNs)P@R15W zQo%<7%da5p$o3rB9zcQrjYe1Lgu~K&f9_mgw$|L3`-BQWGdpt<;^P&uo#Xm-Uap`G zoUV)r7c6XKJwwNtl`Q7Y2e1zABuSWPdq>CkeJ~V57YjgoqZV%iM2_+<^wahu9Iz(@ zU-4u=-Rv0e2pD_{P`R`FtIlhv;$Ex!$7^vcR&`g9mm1Q}{} z0HORS1im$2V$s$UaQtLXOE^6=A`#oN?J!(&ktCPDZbLJKVjx{9h83sT?HJ?&TtI$tO^5`-*VH7*}&Ix?k=On1DU(`{BAbSI70bL8a-2cE;em;!0;id?y@xPoGj}KHpAHS( z69?U7U_zCMDp{vFz(l1TAgjCcd8qPLD+O0izuZofHhjWUCZ>ktIwvGXgdQqor;6^W zyyyXqx}_};<>NlbmZ9q^pIN|2 zIU=5KTT!UNuSsI=mZ1KYCfHv0(vK-Gbvx4u7KnBu==V=i*pXUQ7MRarn+M2=eO@sq zt;ctN0)lkq(zRS@Gnc+k^MJ(~Cnm3zx%m8|LNBU& zB@c{_do`o@&-}5A7KN`>JS3nalJl^O7Ac-SsGVJ-s`5xLlbug*i3(@BU$SqcVM-&n zqQ2G@Hoa~G5;*7eZmcjR3AQJsW40UQw;fakCJ9vmX6#Io@D_&uWrCU4akd-Ci&Nri zM*_szJx-%@ukc{EunI7=CG48l#%xx?g1#Tv?1^HDcNccG2g^Lyr{y=|zx4n-C*Jp2I=iLpr zI#*m@=0>J`TwY{r>oE8RAk*9^325N-jEb(wVbORi=%eNblP1+p2vuVsUJNcd}?F+k1- zZMQ@YkTHtKT3TB6fK*6N1cR97`F*>8f$12+SEM;t;&;=SFr0R5^4cbTn9W6#&%vg< zU9x29BI6!xpe-UN#aH()9EC}EACmsaHm}ZpxMX9*t}q=C!;&q>DQ9k5#WpOjz^F!8 zMOD@8Pg&LvI}5*fSzSFm+U&j%NNMn;E}Bs%R zPE~;pSJ36+RpCdRJ5{XGiqsfVXmtpUn3%Fbrcr2u<{Fl~hzWM%l}OI<8FMKxt#e!D zdIcGFUEP!Y#bYnUFCVsMVa5IvSrMx(n_W2cAqVt%QrW)|cbNIi!Ftac&g={zO^8@FJ6I6%OhTro*=s{EOKGhU3`(no)**vKZ(wyic z@|d%#$24}QDL{w$#kP6}FzpE!4N(;;;2w&gqdSwBe7Em;?eF~b#bn7OUL|`kRoRbW zr>E3`uCh6kgH4lOpiE0T3q8|}^wRg))S1bLIZ*jv9#RHyo6^Bd3Sn=Ru+BHwrgKry zXo6bCwx~%WBts^GJ9Eo#?okNtbt^En18r9qF{E^ZA~TY z5LZI+_0IyszSd1;q%emH{K?HL_4$!d<%7M>gT3f9b+7QNo=FZw)ePm*!>|p|TNVer zV|2;jurYcq3thOsN@HC|(~_m_Q@?m#2)S8RikbDNyN+q*_jRJ=mT%lgK@Z&cA|PSi z1xyFqts-rEGJvs0wo2)$me0TDANhG#zy+Q-XISmRh0gebo;cX&ZMYc@O`A-}-p*i- zerlnDpF3dR&*c9B4$!A4I_}#e_ zorBrP{n-rLz2wcg%oJ51be-}JkxWv?9;%@U+~`Nr35^^nJ}=Cq!nx!%fs@{QH@*bJ zk1D^mc0n?wV%bh*r&d&0Mo4<(E0 z<+{=ghNz(RPDSMD7PA@@xR$)FbOu!qR6+XkdgoDy>&ubNRDgh1Y3M!zj+<{~$FRpzI6rXf27oiC6+lkJhzu5|^f z4Y5FV&nnd>x7Z;?>Q}=TH-&G=_~3eZ^oj7r-+ZL*j-HePBj zWnb3-XS&%>Ty@3&A#k6*O*XA*Ko?xM>d{MWMjgtKG5pEUXMpTi(LDvQg9Ss$2 z^`mykwx8rKCpgztCoEr^x`=@Dd#{JMD&yx+S;Gum!O&7G4y9mOUsk#@RI|r(4=^jI zv~AFZ@r{OsRQXM2wZ;Ye zf0bC}FUX{UDhH7DEGwt!S4kOm0Qix@t?;ChxcU?J2muQUp{bLm#a1 zAG~1rc*e>se4SvgePS=XsQQ`k5@gQKhcTS&01Q?#L-`&wQL2rK2VVSy-$_uy(psB9 zp$Q63l3=JXHsyw`y5%vSrXPfC&~RY*{^`MP5Ck`+Znb=lY zzx^4%4pkE-4%1-SetqbJ(>`?7Ww!`wJNSA|H~2HU`4A1AlL#N|!4@zy--}*7M_*XJ zDvAaJI4mML>raKX`NtBT(qabWqh$fZq(6ozX_w#1cl`7%PAHUL2pJou50 z>M2an_~aC4gM>Hge$29cfA6c#TyudpBZ00;sUIOux=p1vvI^Ea zdGK$u$i33uC*aO?2@DS;SEsV4;%T~3R8msj#crC*tV^`MdP-S$(vtu0Mc(DcGo=E< zaj-3q=}|a?;_cTLtd+CHoBir7Dv#Ih$i=UyPPDZtEg%(8a<`j3Cp!_Hp|$$wZbr2S zsSqhe$CH2HhI}J=EEL-`Kc8d4+I2Pg4p+{-;B0;s{3fejIRod-Kxr#J(=MUQd8#GsuIeve0h_qu1&O%e(OxjnTM~%Lfu$%-zk}xn56OL`m(H85&HQ0 zdZE$i@tvKj{wTvNei!1p-6(DLcCdx`E|VOzYv7)hnhxz52p6j~fpqMNc4~s#PF^Y( z@vB*xB@xu;T&YeS6F20{d`wgm(nwH4z$iwCp0`Tfo% z>V@L?xVWcOdr5n9!~>PfND_2TK@Ue{1JhYd68^DMPLDU=>Ca4Dg<iaasfp8nrJ9xRO+5I60mazA5%7X7_+I=QG3i>;w`aiy)L>l-_{m7wy=Zj>wA^1(bBiDS#jPVxBv6jmV3{4n_BWB z!o0%5);B5&4PoxQ{A)QgG}_UfDJe|OK0c%4Z_jW0l-%oXr2{KcL+n-iZTrI_BM;1& z3Dvo|xivLQy{;V>Uq+*A;*yfeKh1A$x^kYGEPcI_RjNkVl&&$mjzSH5?(8&3@8dst 
zO%E^qj-qV}V;`R3fLbWs_T+gPAL>VfKD6(jrl>wAP}jVfc3fH+R@}y7 N?VGA{1vg9s{|{KkL?i$J literal 0 HcmV?d00001 diff --git a/ceph/src/rocksdb/docs/static/images/delrange/delrange_write_path.png b/ceph/src/rocksdb/docs/static/images/delrange/delrange_write_path.png new file mode 100644 index 0000000000000000000000000000000000000000..229dfb349ac12ff5f72581f141ea7dedee6453e8 GIT binary patch literal 109609 zcmeFZXIPWj7B;L1f}$uQO%zl{QE4I~N*6~^QLxcV5QZkwq$h-kQbuVy(!0_`N`%lu z0#cRGq=g;t6f$ZC&lXyASQ& zwr$(q>(_q0yKURf`EA>FjC1V-|M_s}%CT+Rj&8gD>*f1C+vkS4Uz+#8C1y_Ktg)$v zvDdP3f4F=Ad9m?;kbLdLgPrpAYzOsrw(|$DD;MlySCRR&{=&*{o?Wl;?WsiGd$Dq1 zZeIt?B(|qN=ay%W-)b)k`O3KR0uQi`++s?+$wg0Hb79TXa*c&h8+wA4r^66hcyv@6RfO_@6 zeWT;I?QhIqt3R38we>LHU&J-P$@5buT>XdnxL-_)$|oI|cgCC*)Qcgf8eNwWz;Se+8oML+iyHvF^g+$XKpw5+*=0M{K2g! zlK>oQ5B7Hkn2BZJ?neW~Irvv_5{xjr=AW);SYXyam7XHL<)An&u6dJ*dsAG%#c9RX z3{|*u?}+I4I&$PlkmHU^72^@gyOGi+#qV4f0{`WP{^5f1PXQ0wfPAY5+^i4>n?#C_ zRpsU~j$7Qk;LJ(YY`BtS%_yZrh4!ay5rlpT&pTJtiqFj-M&G%@=I#}|12Y4ZOR zMp`EOfXRDBr7NsAr|TY#`B1ggD3UuK(f=41Bb}R~2>swkU9L?RwD5T_>IczdDOn1f z++Xa_mGw|xBFR-VK^@_rpbGO^GZE*MG%U#LMg8{q4^fh~Y4s*iwyQVaRI=x)GPj`2 z&3jolr%N`LVJz*;M8Gk4$INQ1u>4U)mysk<*j!=l_zoxW8T4cd?2Kn?yu1X_&b%@2 zvw~x92KdGM$LfuUMq#`AgP7R@^zI8(SWgJ_(x)<2_{QqNYRP90x^-SVZ?VFudmz%1 zc=bIRvqbE!KM+$GMQjWg$!%5HT&xRvnFUR6yBw&WYltiu3GatJ`PAFcGdk`vMjz7rawn)9f@ZhN{TuiV@3&bGAr?6ix6ATHoXd}yd-mw7HumCS zk_LI#j75TiMJ_&nBFFZn$|WDa;kZ9wINH#{IR8Bgg?fD0`Vu%%xp>sGtDyp!+G6nY z4Q1GGf9Mq9qEmTKk|i$ElL^%1TA$-bAFU4ZAM)BFFt*&!Ar*P)_qg{!+qMM+?23vl z3QNmVC&CCci_xV}S!}th+RN8mA-d}H)f*a7`wlwx@WYlGZkC%mS7i3it05a&ie&wl zLZd&tFDtbY81tP>jvb?(T3Q)!M9!~ZxSGZ)7*TQI0^^ZS3XwH3Gp&294jVH!x-3*g zwm-c?MuPo?uTg5Sh{-KIOS}+D{a0k zo6GRfY2N+BSbUfSN+jkDnR(MSy;uZ1MrPTMeLnjjboV2 zJ{U|r#j)5$j#0;>oJSasBwHV*S{9b65lIbu(93)3H|d*hm%h;#ysHzdhi8|r>{Gp8 zt{#Uv$IY*N>}TuIp|>4pP3CJEXHD$vIvdamVl2dhtKhhqaixbtYIwd$$@|5dmpV=l zJ^K2KW({E@pYWdg8@r135zqTXi$Es|0fC6LlzfWro6P_6^{j1sw2o1dN9Xlu_~1hM zc!VgST+0Xq_kHHS`Ts>a}3%EZPap%0^t-kxKt=;M@WuJoMk6pvKtOha>D1T9#jQoRTc$H1RjCTeh&3vzB=phrJ;+8n+Kw>NIG;l^G*teAl>AE;k!V z$2HT#C<|39ZI#=tR^`~T{KT5AaRY7s5yJ8(m8CYrZqGHTGu$HS{Z#yNepcP zEh}wNRo>yu$mm_SR&m3OVBHFn9DGwN-l57c6Aw#Xuch}YzBnsat@l8xE!>Yaujsr#QLtDAbypz z{e`UGtYfX!k-e2-^Rq0z-!3}Ks0Q5UxwC+!Em-JdDHsy}0%`Vc2r=_Qx_Y=&g;2m< zU*opN8uzE{r+;}`$TiXY@jj#QG2Knpwq}ijnM_5fL7Ya*RT$;eWRhR@;!VtJSCc}k ztD{~mGqZZng}uUkw3C5H&>V@&?>9Xy8&0LRBd}#n!&|t-vl$@$IY#UiVx@emCvWrI zvAX^#j;^0uzqa|uE4i+G6;os{3}Y_Yt|>{$F7u95UMR1f2X61J>&W1B@sCCojCZ7V z=4LxAxAZai=9+~`#bcXMgTNRJ$Mc|&dp8OXI$(c&**g_hXP?`<2ki$Zj>p-=$*R+* z^;WWj#;3IX4Qw~D=MU`{IP-Wb!==8fcs?7duk+oFZAkAwc4MwRW}l%I+0fTO|GrrY zzt>uiAq9)ijKdOLv|7I~LMVLh>|=JKQBATohoytUoC=bKS^&8VuKofVJM^hXHJM?Ejll1IpI zB`MNkF^w3L%MYl@%hcx*DMnoLDZUf&+k5U6l=b=sRuNyGjro+PFkV!3e!_AS+dcDo z>wztOTLPMWBUwk699k13XkqkD@oC0hkqZY4^3Bnu`}Z`YxYBoJ?~cACd_Mqaapi3S z@T}aE&)|1cD}eP(NK|WFcDUib|Cr(!U({35`djm(-ujpbz60 zK1-xZx!y`vDsjcC>&l?#iZtzG=>@-<7M`E@9vd`DwSC#VmAjwk=OImWYnJ1rDc3!I zX9-L5Yp{s(DUIJynOJJ8i$FnathUt9+~jALD)?B_H`anyUilx8?_DMqHi_a=$-YA) zBNkkr#)25Z{_qQT#{*DpO1Z8Iwhmg5w^6Z;W*~4JHJ12sd%-{^SHVKhKs41^GfF$U z2&<;Rf>w+eXKim2?QUKDbPy-bV{n@D2@$ihGdI~F-)^)vD5wBVK@;W+)y+l&IzQi1 z9KK(EcO>zHtNVUEtBD~6yxF~XOK(C%-zV8UaPhilAA5cM`oAg6pHl942uz9r?fAy` z0Fgg{Yg|Cu!Y8PE(Z&Gl%H!TR4v<_+EE0el#2w9%C%lom{hni4|%uj#jU*bzaw^O*qRc& zg(=haMzrbz6rPr4b>0P`ten58!rWZpp4>{wivQ3smGsnt zd{+hyi?k|3V}^xqJ9-lXzr;YP(@+O>+$L}!#ET!(V~6rK-%3@(K$T-2NU1|Gm-g%{ zKrd8^jz3o&8`BF@R~K;`uJnwaEeWQki_zcp66sKBTMI2F>Hxg$20PR)$xX{^M&i%C z|9A5g5W}~Bpkzqm%Q4pB`@~*^X>P{JG}4z-y!(}D8}hb^!`=jVj*7d~Ot@>ot0yw* za=GJrsPs%(BlF|4JP6PLICcE$HIjz&*_#>4%qB;jk 
z9J?fuRd?sTheBQ4<_ep5PVL7bVqL2iYq>dB&6nXV5qcmLF9ymsS9~B##=Nq4%S=u~ zayQ6lmvX20{*=twqXAqy_9@SOs71@|a6Nk2>CJ(yOamDO;w6gwhnYWN?SMCYpKO1# zmA3Gw64}su>hyk;^0t5&fwNr&nd*^ke^+yXsRA)N|eoW&!4+^>RNclZFmJg-teArVju4sN-GDG7-leX#rpB33A_R z025W@b4j7)qm=9Krja@ z9IHjeLi{Ug3XYEb{F}5MjsX+QvkmRPEZ&sP*RfDp-S5$T;dEDaHn(im2c+T97HD+G zUACB}M>*|p8G_33)oX_(wN55sR4f^u=4#PaH^@JF=?{eeHXB zUA)xieh1F~*x~?A{>-?K_w2V~dh%xn+N{`3S#`w|$)#TxSNjoeZ@0Pp>Sy!p$J|T{ zlwF;FL6}>MYgl&gDt76QajqJFQ#>7R*cqbhcM@(t;^?qhKb+q7@yV|JQwMwy?R~4w zjmFbGdz&(@Cx>J^$|_5=sy;p``m;9_02a37tM-=@;L)C6zxGxeCHs?^s?ihDCb4Sg zDLn!AM~?tR<%32`J#1r?V_Gpn?Tj?7v%;Zv-A&x5I9TgPDGB)l^EXOqU1fz>90BHN zf*z!rn4M$YIX*r0I#qXkfGn`U0;M!8zGg~9-gI$Mxdf+43T6t6T|A1wZQ)A!Shdl5 z{CQxB$@)iQ>NW?w{r^x~PprNvmcZ1bnIuzo?H4-1=2CS|RTcH{&PJDd<)_E->E9AX zPX5Dq??2kH0p`}r)Pi0EkyrwoT-`J$Wa(dAfCU{UeqWcI-R7;0nmq4^YU9VjuV;46 zO~t(wt)|FNqnV>)=V>o<)}+=wU9I0`Jk!lTp9)Q zEBNpJ-;FDI8BGZGw4XiHc|(vPnIZCle@i2yqxe~Hk|Gi zQ@`=_U+ex``f1PEoNhfrYj=4Mdm2^(82ddQ2}1v3%a2cIeA78u#gt0wL}G3v1fa#! zVe)P70J`4s<;yGVax^kSk>H0~ZvNwhPG@JPjtCqBX)h@&%T9EzJgo{66Nkny;ps@` zdSq9X@5+XLNC=QOm2Ea5%EU>B@WSQFBL1%P6bY$vdyj_a1j|YJMr@v^^n2<=MNy(XE zf7*OdCE3v4o4$q@9Dsg%rQI34S7f&nONVP9D(jT#E7ePTxLD_>vSzf&>82zTj|}vB zCX=iA<8jllW|J;av2FPc7yH;UrrMBHZqQKz0^@`7{TrsumkG5&+%(gxb?tU2YLVat z=b^YG8z3nScu`X@Ov-Lb?MvcGr5)n^bEKBKvMfAdAZDQKk;rPLKG5hNmAyG6$7QT*;vs7e%i5eH$M^X`SOua8H1r=IumAiyd7DJt{DUAre+)hfsXj`!G;mn32Y z^g&6epA7&4d6x#~Z4`^tQD1oN-`8-e7oDIF^Ccz>aEAXBNonP>#w!dNWL2UFZ!NeC zeNK5Fp!N5hAsV}vqnW2q0PC=hgkI?{^KyIle|I@!nK`yZbqRkg;%U;CEP=Xmw~!P;dZH zg+qUl%`^QHC|LM0Hq()rYp=8B4joIE+XDu@W4XLtxl_cH;M7ns{a1SOcetlvS_`^E~?JdVVw?G`Oh?|~!KaWh~&WObd>bBmOdHD9e1 zfD`acv+@X5J;`exfGal&Cru$Is-tY@E_?SH+wogGra@Q=p*}>@}I>uKOU0_u86%GivX0vm1Qf7oIU-#ya2shN>0drqsbdYzm1Xb-DX^sEB zZNP&HdR~Md2Gq*zPieG{tiArHu>YP1_>krj5MwGTU#Y@Eb5vzH9f+J@HoBLHQO)9$ z$=(*DU$kNdq(Tf!$o-P{O~!wTnULc2QV@0A$c&aW%wkoFv@MO0*1U%s!>|AWfJT-B z=dztm+{lua__`|#qMN_@+;>xK2esA#i1gOgLhW=2)OgW*p<+@% zlvM%Yo+JV(VnSsq-<`8YZkoAF7KRZ464o~;zN6A`WY$F^`kMtXUC2yz_ejHi))T{q zz}C-H>|j|%`6C06TujdqPk%47IlI{C<-2i(_ibAa0?EW&GO_Tn+qr!ohp3UBru%%RDMqlB%l+ljaSSAg7blswM9fa>qe-v!RuwA9T&n>t?SZ`F^efeQa@WBCMCNLQt zFW4{~US9}gPwTt1O0Le6*&NmCGPmN_M*Kw@Q_R=u>OMMqG%3ICgt>(4j1MSv71x#R z^IDtPq5`8~`I5uL%RHAWyBT*`tIr@e%C0H;S?S~3y@PiXsjjjE@A<8IX0A|8@@on< zZ=%|)W6XW2ZCrvd5l8k&P^r=@G_yCbnlQg~wsnEZa(o5{*XzX=ZuY0XW?C9gE=ZI<-eF#yaeSbT!h{*UQZJ zbW7e-Zkh7555IgHk5nJARxe%NQw$J$Nf`WBx>>b9{DF$0ncLOTs0v~j)2I1YG z(3VgEn|x${;)st7xm3qtvb1peiSU|sk=o3W}ls<>i=IqFG8VFk_a*u)U zqc0ZX}Ktwz2hb6L(*iO?l8e|(MdgAhXS=%UZen4SaW7hH~{spwl(YYVK~Nbcm3laf&?CjKPp#oXzCQdRT^! 
GIT binary patch (binary contents omitted)
diff --git a/ceph/src/rocksdb/env/env.cc b/ceph/src/rocksdb/env/env.cc index 9b7f5e40d..fde03577d 100644 --- a/ceph/src/rocksdb/env/env.cc +++ b/ceph/src/rocksdb/env/env.cc @@ -30,6 +30,8 @@ std::string Env::PriorityToString(Env::Priority priority) { return "Low"; case Env::Priority::HIGH: return "High"; + case Env::Priority::USER: + return "User"; case Env::Priority::TOTAL: assert(false); } @@ -43,7 +45,7 @@ uint64_t Env::GetThreadID() const { Status Env::ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options) { Status s = RenameFile(old_fname, fname); if (!s.ok()) { @@ -138,6 +140,8 @@ void Logger::Logv(const InfoLogLevel log_level, const char* format, va_list ap) // are INFO level. We don't want to add extra costs to those existing // logging. Logv(format, ap); + } else if (log_level == InfoLogLevel::HEADER_LEVEL) { + LogHeader(format, ap); } else { char new_format[500]; snprintf(new_format, sizeof(new_format) - 1, "[%s] %s", @@ -242,11 +246,11 @@ void Fatal(Logger* info_log, const char* format, ...) { va_end(ap); } -void LogFlush(const shared_ptr& info_log) { +void LogFlush(const std::shared_ptr& info_log) { LogFlush(info_log.get()); } -void Log(const InfoLogLevel log_level, const shared_ptr& info_log, +void Log(const InfoLogLevel log_level, const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); @@ -254,49 +258,49 @@ void Log(const InfoLogLevel log_level, const shared_ptr& info_log, va_end(ap); } -void Header(const shared_ptr& info_log, const char* format, ...) { +void Header(const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); Headerv(info_log.get(), format, ap); va_end(ap); } -void Debug(const shared_ptr& info_log, const char* format, ...) { +void Debug(const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); Debugv(info_log.get(), format, ap); va_end(ap); } -void Info(const shared_ptr& info_log, const char* format, ...) { +void Info(const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); Infov(info_log.get(), format, ap); va_end(ap); } -void Warn(const shared_ptr& info_log, const char* format, ...) { +void Warn(const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); Warnv(info_log.get(), format, ap); va_end(ap); } -void Error(const shared_ptr& info_log, const char* format, ...) { +void Error(const std::shared_ptr& info_log, const char* format, ...)
{ va_list ap; va_start(ap, format); Errorv(info_log.get(), format, ap); va_end(ap); } -void Fatal(const shared_ptr& info_log, const char* format, ...) { +void Fatal(const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); Fatalv(info_log.get(), format, ap); va_end(ap); } -void Log(const shared_ptr& info_log, const char* format, ...) { +void Log(const std::shared_ptr& info_log, const char* format, ...) { va_list ap; va_start(ap, format); Logv(info_log.get(), format, ap); @@ -305,7 +309,7 @@ void Log(const shared_ptr& info_log, const char* format, ...) { Status WriteStringToFile(Env* env, const Slice& data, const std::string& fname, bool should_sync) { - unique_ptr file; + std::unique_ptr file; EnvOptions soptions; Status s = env->NewWritableFile(fname, &file, soptions); if (!s.ok()) { @@ -324,7 +328,7 @@ Status WriteStringToFile(Env* env, const Slice& data, const std::string& fname, Status ReadFileToString(Env* env, const std::string& fname, std::string* data) { EnvOptions soptions; data->clear(); - unique_ptr file; + std::unique_ptr file; Status s = env->NewSequentialFile(fname, &file, soptions); if (!s.ok()) { return s; diff --git a/ceph/src/rocksdb/env/env_basic_test.cc b/ceph/src/rocksdb/env/env_basic_test.cc index e05f61aa6..3efae758a 100644 --- a/ceph/src/rocksdb/env/env_basic_test.cc +++ b/ceph/src/rocksdb/env/env_basic_test.cc @@ -21,8 +21,8 @@ class NormalizingEnvWrapper : public EnvWrapper { explicit NormalizingEnvWrapper(Env* base) : EnvWrapper(base) {} // Removes . and .. from directory listing - virtual Status GetChildren(const std::string& dir, - std::vector* result) override { + Status GetChildren(const std::string& dir, + std::vector* result) override { Status status = EnvWrapper::GetChildren(dir, result); if (status.ok()) { result->erase(std::remove_if(result->begin(), result->end(), @@ -35,7 +35,7 @@ class NormalizingEnvWrapper : public EnvWrapper { } // Removes . and .. from directory listing - virtual Status GetChildrenFileAttributes( + Status GetChildrenFileAttributes( const std::string& dir, std::vector* result) override { Status status = EnvWrapper::GetChildrenFileAttributes(dir, result); if (status.ok()) { @@ -60,11 +60,9 @@ class EnvBasicTestWithParam : public testing::Test, test_dir_ = test::PerThreadDBPath(env_, "env_basic_test"); } - void SetUp() { - env_->CreateDirIfMissing(test_dir_); - } + void SetUp() override { env_->CreateDirIfMissing(test_dir_); } - void TearDown() { + void TearDown() override { std::vector files; env_->GetChildren(test_dir_, &files); for (const auto& file : files) { @@ -133,7 +131,7 @@ INSTANTIATE_TEST_CASE_P(CustomEnv, EnvMoreTestWithParam, TEST_P(EnvBasicTestWithParam, Basics) { uint64_t file_size; - unique_ptr writable_file; + std::unique_ptr writable_file; std::vector children; // Check that the directory is empty. @@ -186,8 +184,8 @@ TEST_P(EnvBasicTestWithParam, Basics) { ASSERT_EQ(0U, file_size); // Check that opening non-existent file fails. 
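The directory-listing filter in NormalizingEnvWrapper above is the standard C++ erase-remove idiom. A minimal, self-contained sketch of that idiom (standalone names, not the actual RocksDB classes):

#include <algorithm>
#include <string>
#include <vector>

// Strip the "." and ".." entries that some platforms include in a
// directory listing, in place, as NormalizingEnvWrapper::GetChildren does.
void RemoveDotEntries(std::vector<std::string>* result) {
  result->erase(std::remove_if(result->begin(), result->end(),
                               [](const std::string& s) {
                                 return s == "." || s == "..";
                               }),
                result->end());
}

Note that std::remove_if only shuffles the kept elements to the front and returns the new logical end; the trailing erase is what actually shrinks the vector.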
- unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; ASSERT_TRUE(!env_->NewSequentialFile(test_dir_ + "/non_existent", &seq_file, soptions_) .ok()); @@ -208,9 +206,9 @@ TEST_P(EnvBasicTestWithParam, Basics) { } TEST_P(EnvBasicTestWithParam, ReadWrite) { - unique_ptr writable_file; - unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr writable_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; Slice result; char scratch[100]; @@ -247,7 +245,7 @@ TEST_P(EnvBasicTestWithParam, ReadWrite) { } TEST_P(EnvBasicTestWithParam, Misc) { - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->NewWritableFile(test_dir_ + "/b", &writable_file, soptions_)); // These are no-ops, but we test they return success. @@ -266,14 +264,14 @@ TEST_P(EnvBasicTestWithParam, LargeWrite) { write_data.append(1, static_cast(i)); } - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->NewWritableFile(test_dir_ + "/f", &writable_file, soptions_)); ASSERT_OK(writable_file->Append("foo")); ASSERT_OK(writable_file->Append(write_data)); ASSERT_OK(writable_file->Close()); writable_file.reset(); - unique_ptr seq_file; + std::unique_ptr seq_file; Slice result; ASSERT_OK(env_->NewSequentialFile(test_dir_ + "/f", &seq_file, soptions_)); ASSERT_OK(seq_file->Read(3, &result, scratch)); // Read "foo". @@ -340,7 +338,7 @@ TEST_P(EnvMoreTestWithParam, GetChildren) { // if dir is a file, returns IOError ASSERT_OK(env_->CreateDir(test_dir_)); - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK( env_->NewWritableFile(test_dir_ + "/file", &writable_file, soptions_)); ASSERT_OK(writable_file->Close()); diff --git a/ceph/src/rocksdb/env/env_chroot.cc b/ceph/src/rocksdb/env/env_chroot.cc index 6a1fda8a8..8a7fb4499 100644 --- a/ceph/src/rocksdb/env/env_chroot.cc +++ b/ceph/src/rocksdb/env/env_chroot.cc @@ -38,9 +38,9 @@ class ChrootEnv : public EnvWrapper { #endif } - virtual Status NewSequentialFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options) override { + Status NewSequentialFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -49,9 +49,9 @@ class ChrootEnv : public EnvWrapper { options); } - virtual Status NewRandomAccessFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomAccessFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -60,9 +60,9 @@ class ChrootEnv : public EnvWrapper { options); } - virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -71,10 +71,10 @@ class ChrootEnv : public EnvWrapper { options); } - virtual Status ReuseWritableFile(const std::string& fname, - const std::string& old_fname, - unique_ptr* result, - const EnvOptions& options) override { + Status ReuseWritableFile(const std::string& fname, + const std::string& 
old_fname, + std::unique_ptr* result, + const EnvOptions& options) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -88,9 +88,9 @@ class ChrootEnv : public EnvWrapper { options); } - virtual Status NewRandomRWFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomRWFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -99,8 +99,8 @@ class ChrootEnv : public EnvWrapper { options); } - virtual Status NewDirectory(const std::string& dir, - unique_ptr* result) override { + Status NewDirectory(const std::string& dir, + std::unique_ptr* result) override { auto status_and_enc_path = EncodePathWithNewBasename(dir); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -108,7 +108,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::NewDirectory(status_and_enc_path.second, result); } - virtual Status FileExists(const std::string& fname) override { + Status FileExists(const std::string& fname) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -116,8 +116,8 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::FileExists(status_and_enc_path.second); } - virtual Status GetChildren(const std::string& dir, - std::vector* result) override { + Status GetChildren(const std::string& dir, + std::vector* result) override { auto status_and_enc_path = EncodePath(dir); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -125,7 +125,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::GetChildren(status_and_enc_path.second, result); } - virtual Status GetChildrenFileAttributes( + Status GetChildrenFileAttributes( const std::string& dir, std::vector* result) override { auto status_and_enc_path = EncodePath(dir); if (!status_and_enc_path.first.ok()) { @@ -135,7 +135,7 @@ class ChrootEnv : public EnvWrapper { result); } - virtual Status DeleteFile(const std::string& fname) override { + Status DeleteFile(const std::string& fname) override { auto status_and_enc_path = EncodePath(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -143,7 +143,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::DeleteFile(status_and_enc_path.second); } - virtual Status CreateDir(const std::string& dirname) override { + Status CreateDir(const std::string& dirname) override { auto status_and_enc_path = EncodePathWithNewBasename(dirname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -151,7 +151,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::CreateDir(status_and_enc_path.second); } - virtual Status CreateDirIfMissing(const std::string& dirname) override { + Status CreateDirIfMissing(const std::string& dirname) override { auto status_and_enc_path = EncodePathWithNewBasename(dirname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -159,7 +159,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::CreateDirIfMissing(status_and_enc_path.second); } - virtual Status DeleteDir(const std::string& dirname) override { + Status DeleteDir(const std::string& dirname) override { auto status_and_enc_path = EncodePath(dirname); if 
(!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -167,8 +167,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::DeleteDir(status_and_enc_path.second); } - virtual Status GetFileSize(const std::string& fname, - uint64_t* file_size) override { + Status GetFileSize(const std::string& fname, uint64_t* file_size) override { auto status_and_enc_path = EncodePath(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -176,8 +175,8 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::GetFileSize(status_and_enc_path.second, file_size); } - virtual Status GetFileModificationTime(const std::string& fname, - uint64_t* file_mtime) override { + Status GetFileModificationTime(const std::string& fname, + uint64_t* file_mtime) override { auto status_and_enc_path = EncodePath(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -186,8 +185,7 @@ class ChrootEnv : public EnvWrapper { file_mtime); } - virtual Status RenameFile(const std::string& src, - const std::string& dest) override { + Status RenameFile(const std::string& src, const std::string& dest) override { auto status_and_src_enc_path = EncodePath(src); if (!status_and_src_enc_path.first.ok()) { return status_and_src_enc_path.first; @@ -200,8 +198,7 @@ class ChrootEnv : public EnvWrapper { status_and_dest_enc_path.second); } - virtual Status LinkFile(const std::string& src, - const std::string& dest) override { + Status LinkFile(const std::string& src, const std::string& dest) override { auto status_and_src_enc_path = EncodePath(src); if (!status_and_src_enc_path.first.ok()) { return status_and_src_enc_path.first; @@ -214,7 +211,7 @@ class ChrootEnv : public EnvWrapper { status_and_dest_enc_path.second); } - virtual Status LockFile(const std::string& fname, FileLock** lock) override { + Status LockFile(const std::string& fname, FileLock** lock) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -225,7 +222,7 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::LockFile(status_and_enc_path.second, lock); } - virtual Status GetTestDirectory(std::string* path) override { + Status GetTestDirectory(std::string* path) override { // Adapted from PosixEnv's implementation since it doesn't provide a way to // create directory in the chroot. 
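Every ChrootEnv override above follows one shape: encode the incoming path against the chroot directory, fail fast if encoding failed, then delegate to the wrapped Env with the encoded path. A hedged sketch of that shape follows; Status and EncodePath here are simplified stand-ins, not the real RocksDB types:

#include <functional>
#include <string>
#include <utility>

// Simplified stand-in for rocksdb::Status.
struct Status {
  bool ok_ = true;
  bool ok() const { return ok_; }
};

// Simplified stand-in for ChrootEnv's path encoding: reject non-absolute
// paths, otherwise prepend the chroot directory.
std::pair<Status, std::string> EncodePath(const std::string& chroot_dir,
                                          const std::string& path) {
  if (path.empty() || path[0] != '/') return {Status{false}, ""};
  return {Status{true}, chroot_dir + path};
}

// The encode-check-forward skeleton shared by DeleteFile, CreateDir,
// GetFileSize, and the other overrides in the diff above.
Status ForwardWithEncodedPath(
    const std::string& chroot_dir, const std::string& path,
    const std::function<Status(const std::string&)>& op) {
  auto status_and_enc_path = EncodePath(chroot_dir, path);
  if (!status_and_enc_path.first.ok()) return status_and_enc_path.first;
  return op(status_and_enc_path.second);
}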
char buf[256]; @@ -237,8 +234,8 @@ class ChrootEnv : public EnvWrapper { return Status::OK(); } - virtual Status NewLogger(const std::string& fname, - shared_ptr* result) override { + Status NewLogger(const std::string& fname, + std::shared_ptr* result) override { auto status_and_enc_path = EncodePathWithNewBasename(fname); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; @@ -246,8 +243,8 @@ class ChrootEnv : public EnvWrapper { return EnvWrapper::NewLogger(status_and_enc_path.second, result); } - virtual Status GetAbsolutePath(const std::string& db_path, - std::string* output_path) override { + Status GetAbsolutePath(const std::string& db_path, + std::string* output_path) override { auto status_and_enc_path = EncodePath(db_path); if (!status_and_enc_path.first.ok()) { return status_and_enc_path.first; diff --git a/ceph/src/rocksdb/env/env_encryption.cc b/ceph/src/rocksdb/env/env_encryption.cc index e80796fe0..aa59e6635 100644 --- a/ceph/src/rocksdb/env/env_encryption.cc +++ b/ceph/src/rocksdb/env/env_encryption.cc @@ -8,6 +8,7 @@ #include #include #include +#include #include "rocksdb/env_encryption.h" #include "util/aligned_buffer.h" @@ -42,7 +43,7 @@ class EncryptedSequentialFile : public SequentialFile { // If an error was encountered, returns a non-OK status. // // REQUIRES: External synchronization - virtual Status Read(size_t n, Slice* result, char* scratch) override { + Status Read(size_t n, Slice* result, char* scratch) override { assert(scratch); Status status = file_->Read(n, result, scratch); if (!status.ok()) { @@ -60,7 +61,7 @@ class EncryptedSequentialFile : public SequentialFile { // file, and Skip will return OK. // // REQUIRES: External synchronization - virtual Status Skip(uint64_t n) override { + Status Skip(uint64_t n) override { auto status = file_->Skip(n); if (!status.ok()) { return status; @@ -71,26 +72,25 @@ class EncryptedSequentialFile : public SequentialFile { // Indicates the upper layers if the current SequentialFile implementation // uses direct IO. - virtual bool use_direct_io() const override { - return file_->use_direct_io(); - } + bool use_direct_io() const override { return file_->use_direct_io(); } // Use the returned alignment value to allocate // aligned buffer for Direct I/O - virtual size_t GetRequiredBufferAlignment() const override { - return file_->GetRequiredBufferAlignment(); + size_t GetRequiredBufferAlignment() const override { + return file_->GetRequiredBufferAlignment(); } // Remove any kind of caching of data from the offset to offset+length // of this file. If the length is 0, then it refers to the end of file. // If the system is not caching the file contents, then this is a noop. 
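The encrypted file wrappers that begin above all share one flow: delegate the raw read to the underlying file, then decrypt the returned bytes in place, with the wrapper tracking a logical position so the cipher stream is keyed to the correct file offset (the stream_->Decrypt calls appear in the hunks that follow). A simplified sketch of that flow, using hypothetical interfaces that stand in for RocksDB's SequentialFile and BlockAccessCipherStream:

#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the RocksDB interfaces.
struct RawFile {
  virtual bool Read(size_t n, char* scratch, size_t* bytes_read) = 0;
  virtual ~RawFile() = default;
};
struct CipherStream {
  virtual bool Decrypt(uint64_t file_offset, char* data, size_t size) = 0;
  virtual ~CipherStream() = default;
};

// Read-then-decrypt-in-place: the wrapper owns the logical position so
// decryption is keyed to the right offset, then advances past the bytes read.
class EncryptedReader {
 public:
  EncryptedReader(RawFile* file, CipherStream* stream)
      : file_(file), stream_(stream) {}

  bool Read(size_t n, char* scratch, size_t* bytes_read) {
    if (!file_->Read(n, scratch, bytes_read)) return false;
    if (!stream_->Decrypt(offset_, scratch, *bytes_read)) return false;
    offset_ += *bytes_read;  // next read decrypts from the new position
    return true;
  }

 private:
  RawFile* file_;
  CipherStream* stream_;
  uint64_t offset_ = 0;
};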
- virtual Status InvalidateCache(size_t offset, size_t length) override { + Status InvalidateCache(size_t offset, size_t length) override { return file_->InvalidateCache(offset + prefixLength_, length); } // Positioned Read for direct I/O // If Direct I/O enabled, offset, n, and scratch should be properly aligned - virtual Status PositionedRead(uint64_t offset, size_t n, Slice* result, char* scratch) override { + Status PositionedRead(uint64_t offset, size_t n, Slice* result, + char* scratch) override { assert(scratch); offset += prefixLength_; // Skip prefix auto status = file_->PositionedRead(offset, n, result, scratch); @@ -101,7 +101,6 @@ class EncryptedSequentialFile : public SequentialFile { status = stream_->Decrypt(offset, (char*)result->data(), result->size()); return status; } - }; // A file abstraction for randomly reading the contents of a file. @@ -125,7 +124,8 @@ class EncryptedRandomAccessFile : public RandomAccessFile { // // Safe for concurrent use by multiple threads. // If Direct I/O enabled, offset, n, and scratch should be aligned properly. - virtual Status Read(uint64_t offset, size_t n, Slice* result, char* scratch) const override { + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { assert(scratch); offset += prefixLength_; auto status = file_->Read(offset, n, result, scratch); @@ -137,7 +137,7 @@ class EncryptedRandomAccessFile : public RandomAccessFile { } // Readahead the file starting from offset by n bytes for caching. - virtual Status Prefetch(uint64_t offset, size_t n) override { + Status Prefetch(uint64_t offset, size_t n) override { //return Status::OK(); return file_->Prefetch(offset + prefixLength_, n); } @@ -157,30 +157,26 @@ class EncryptedRandomAccessFile : public RandomAccessFile { // a single varint. // // Note: these IDs are only valid for the duration of the process. - virtual size_t GetUniqueId(char* id, size_t max_size) const override { + size_t GetUniqueId(char* id, size_t max_size) const override { return file_->GetUniqueId(id, max_size); }; - virtual void Hint(AccessPattern pattern) override { - file_->Hint(pattern); - } + void Hint(AccessPattern pattern) override { file_->Hint(pattern); } // Indicates the upper layers if the current RandomAccessFile implementation // uses direct IO. - virtual bool use_direct_io() const override { - return file_->use_direct_io(); - } + bool use_direct_io() const override { return file_->use_direct_io(); } // Use the returned alignment value to allocate // aligned buffer for Direct I/O - virtual size_t GetRequiredBufferAlignment() const override { - return file_->GetRequiredBufferAlignment(); + size_t GetRequiredBufferAlignment() const override { + return file_->GetRequiredBufferAlignment(); } // Remove any kind of caching of data from the offset to offset+length // of this file. If the length is 0, then it refers to the end of file. // If the system is not caching the file contents, then this is a noop. - virtual Status InvalidateCache(size_t offset, size_t length) override { + Status InvalidateCache(size_t offset, size_t length) override { return file_->InvalidateCache(offset + prefixLength_, length); } }; @@ -247,16 +243,18 @@ class EncryptedWritableFile : public WritableFileWrapper { // Indicates the upper layers if the current WritableFile implementation // uses direct IO. 
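The offset arithmetic running through these wrappers comes from the file layout: the first prefixLength_ bytes of the physical file hold the encryption prefix, so every user-visible offset is shifted up by prefixLength_ and the visible file size shrinks by the same amount (compare GetFileSize and GetChildrenFileAttributes later in this diff). A toy illustration of the mapping, with hypothetical names:

#include <cassert>
#include <cstdint>

// Physical layout: [ prefix | user data ]. Logical offset L maps to
// physical offset L + prefix_len; the logical size is the physical size
// minus prefix_len.
struct PrefixMapping {
  uint64_t prefix_len;

  uint64_t ToPhysical(uint64_t logical_offset) const {
    return logical_offset + prefix_len;
  }
  uint64_t LogicalSize(uint64_t physical_size) const {
    assert(physical_size >= prefix_len);  // the prefix is always present
    return physical_size - prefix_len;
  }
};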
- virtual bool use_direct_io() const override { return file_->use_direct_io(); } + bool use_direct_io() const override { return file_->use_direct_io(); } // Use the returned alignment value to allocate // aligned buffer for Direct I/O - virtual size_t GetRequiredBufferAlignment() const override { return file_->GetRequiredBufferAlignment(); } + size_t GetRequiredBufferAlignment() const override { + return file_->GetRequiredBufferAlignment(); + } /* * Get the size of valid data in the file. */ - virtual uint64_t GetFileSize() override { + uint64_t GetFileSize() override { return file_->GetFileSize() - prefixLength_; } @@ -264,7 +262,7 @@ class EncryptedWritableFile : public WritableFileWrapper { // before closing. It is not always possible to keep track of the file // size due to whole pages writes. The behavior is undefined if called // with other writes to follow. - virtual Status Truncate(uint64_t size) override { + Status Truncate(uint64_t size) override { return file_->Truncate(size + prefixLength_); } @@ -272,7 +270,7 @@ class EncryptedWritableFile : public WritableFileWrapper { // of this file. If the length is 0, then it refers to the end of file. // If the system is not caching the file contents, then this is a noop. // This call has no effect on dirty pages in the cache. - virtual Status InvalidateCache(size_t offset, size_t length) override { + Status InvalidateCache(size_t offset, size_t length) override { return file_->InvalidateCache(offset + prefixLength_, length); } @@ -282,7 +280,7 @@ class EncryptedWritableFile : public WritableFileWrapper { // This asks the OS to initiate flushing the cached data to disk, // without waiting for completion. // Default implementation does nothing. - virtual Status RangeSync(uint64_t offset, uint64_t nbytes) override { + Status RangeSync(uint64_t offset, uint64_t nbytes) override { return file_->RangeSync(offset + prefixLength_, nbytes); } @@ -291,12 +289,12 @@ class EncryptedWritableFile : public WritableFileWrapper { // of space on devices where it can result in less file // fragmentation and/or less waste from over-zealous filesystem // pre-allocation. - virtual void PrepareWrite(size_t offset, size_t len) override { + void PrepareWrite(size_t offset, size_t len) override { file_->PrepareWrite(offset + prefixLength_, len); } // Pre-allocates space for a file. - virtual Status Allocate(uint64_t offset, uint64_t len) override { + Status Allocate(uint64_t offset, uint64_t len) override { return file_->Allocate(offset + prefixLength_, len); } }; @@ -314,17 +312,17 @@ class EncryptedRandomRWFile : public RandomRWFile { // Indicates if the class makes use of direct I/O // If false you must pass aligned buffer to Write() - virtual bool use_direct_io() const override { return file_->use_direct_io(); } + bool use_direct_io() const override { return file_->use_direct_io(); } // Use the returned alignment value to allocate // aligned buffer for Direct I/O - virtual size_t GetRequiredBufferAlignment() const override { - return file_->GetRequiredBufferAlignment(); + size_t GetRequiredBufferAlignment() const override { + return file_->GetRequiredBufferAlignment(); } // Write bytes in `data` at offset `offset`, Returns Status::OK() on success. // Pass aligned buffer when use_direct_io() returns true. 
- virtual Status Write(uint64_t offset, const Slice& data) override { + Status Write(uint64_t offset, const Slice& data) override { AlignedBuffer buf; Status status; Slice dataToWrite(data); @@ -347,7 +345,8 @@ class EncryptedRandomRWFile : public RandomRWFile { // Read up to `n` bytes starting from offset `offset` and store them in // result, provided `scratch` size should be at least `n`. // Returns Status::OK() on success. - virtual Status Read(uint64_t offset, size_t n, Slice* result, char* scratch) const override { + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { assert(scratch); offset += prefixLength_; auto status = file_->Read(offset, n, result, scratch); @@ -358,21 +357,13 @@ class EncryptedRandomRWFile : public RandomRWFile { return status; } - virtual Status Flush() override { - return file_->Flush(); - } + Status Flush() override { return file_->Flush(); } - virtual Status Sync() override { - return file_->Sync(); - } + Status Sync() override { return file_->Sync(); } - virtual Status Fsync() override { - return file_->Fsync(); - } + Status Fsync() override { return file_->Fsync(); } - virtual Status Close() override { - return file_->Close(); - } + Status Close() override { return file_->Close(); } }; // EncryptedEnv implements an Env wrapper that adds encryption to files stored on disk. @@ -384,9 +375,9 @@ class EncryptedEnv : public EnvWrapper { } // NewSequentialFile opens a file for sequential reading. - virtual Status NewSequentialFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options) override { + Status NewSequentialFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); if (options.use_mmap_reads) { return Status::InvalidArgument(); @@ -421,9 +412,9 @@ class EncryptedEnv : public EnvWrapper { } // NewRandomAccessFile opens a file for random read access. - virtual Status NewRandomAccessFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomAccessFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); if (options.use_mmap_reads) { return Status::InvalidArgument(); @@ -456,11 +447,11 @@ class EncryptedEnv : public EnvWrapper { (*result) = std::unique_ptr(new EncryptedRandomAccessFile(underlying.release(), stream.release(), prefixLength)); return Status::OK(); } - + // NewWritableFile opens a file for sequential writing. - virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); if (options.use_mmap_writes) { return Status::InvalidArgument(); @@ -504,9 +495,9 @@ class EncryptedEnv : public EnvWrapper { // returns non-OK. // // The returned file will only be accessed by one thread at a time. - virtual Status ReopenWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status ReopenWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); if (options.use_mmap_writes) { return Status::InvalidArgument(); @@ -544,10 +535,10 @@ class EncryptedEnv : public EnvWrapper { } // Reuse an existing file by renaming it and opening it as writable. 
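Write in EncryptedRandomRWFile is the mirror image of the reads above: because the cipher stream encrypts in place, the caller's plaintext is first copied to a scratch buffer, the copy is encrypted at the translated physical offset, and only the ciphertext copy is handed to the underlying file. A hedged sketch of that order of operations; the types are hypothetical, and the real code uses AlignedBuffer to satisfy direct-I/O alignment rather than a plain vector:

#include <cstddef>
#include <cstdint>
#include <vector>

struct WStream {
  virtual bool Encrypt(uint64_t file_offset, char* data, size_t size) = 0;
  virtual ~WStream() = default;
};
struct WFile {
  virtual bool WriteAt(uint64_t physical_offset, const char* data,
                       size_t size) = 0;
  virtual ~WFile() = default;
};

// Copy, encrypt the copy, write: the caller's buffer is never mutated.
bool EncryptThenWrite(WFile* file, WStream* stream, uint64_t prefix_len,
                      uint64_t logical_offset, const char* data, size_t n) {
  const uint64_t physical_offset = logical_offset + prefix_len;
  std::vector<char> buf(data, data + n);  // scratch copy of the plaintext
  if (!stream->Encrypt(physical_offset, buf.data(), n)) return false;
  return file->WriteAt(physical_offset, buf.data(), n);
}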
- virtual Status ReuseWritableFile(const std::string& fname, - const std::string& old_fname, - unique_ptr* result, - const EnvOptions& options) override { + Status ReuseWritableFile(const std::string& fname, + const std::string& old_fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); if (options.use_mmap_writes) { return Status::InvalidArgument(); @@ -589,9 +580,9 @@ class EncryptedEnv : public EnvWrapper { // *result and returns OK. On failure returns non-OK. // // The returned file will only be accessed by one thread at a time. - virtual Status NewRandomRWFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomRWFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); if (options.use_mmap_reads || options.use_mmap_writes) { return Status::InvalidArgument(); @@ -649,7 +640,8 @@ class EncryptedEnv : public EnvWrapper { // NotFound if "dir" does not exist, the calling process does not have // permission to access "dir", or if "dir" is invalid. // IOError if an IO Error was encountered - virtual Status GetChildrenFileAttributes(const std::string& dir, std::vector* result) override { + Status GetChildrenFileAttributes( + const std::string& dir, std::vector* result) override { auto status = EnvWrapper::GetChildrenFileAttributes(dir, result); if (!status.ok()) { return status; @@ -660,10 +652,10 @@ class EncryptedEnv : public EnvWrapper { it->size_bytes -= prefixLength; } return Status::OK(); - } + } // Store the size of fname in *file_size. - virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) override { + Status GetFileSize(const std::string& fname, uint64_t* file_size) override { auto status = EnvWrapper::GetFileSize(fname, file_size); if (!status.ok()) { return status; @@ -671,7 +663,7 @@ class EncryptedEnv : public EnvWrapper { size_t prefixLength = provider_->GetPrefixLength(); assert(*file_size >= prefixLength); *file_size -= prefixLength; - return Status::OK(); + return Status::OK(); } private: @@ -692,7 +684,7 @@ Status BlockAccessCipherStream::Encrypt(uint64_t fileOffset, char *data, size_t auto blockSize = BlockSize(); uint64_t blockIndex = fileOffset / blockSize; size_t blockOffset = fileOffset % blockSize; - unique_ptr blockBuffer; + std::unique_ptr blockBuffer; std::string scratch; AllocateScratch(scratch); @@ -705,8 +697,8 @@ Status BlockAccessCipherStream::Encrypt(uint64_t fileOffset, char *data, size_t // We're not encrypting a full block. // Copy data to blockBuffer if (!blockBuffer.get()) { - // Allocate buffer - blockBuffer = unique_ptr(new char[blockSize]); + // Allocate buffer + blockBuffer = std::unique_ptr(new char[blockSize]); } block = blockBuffer.get(); // Copy plain data to block buffer @@ -737,11 +729,13 @@ Status BlockAccessCipherStream::Decrypt(uint64_t fileOffset, char *data, size_t auto blockSize = BlockSize(); uint64_t blockIndex = fileOffset / blockSize; size_t blockOffset = fileOffset % blockSize; - unique_ptr blockBuffer; + std::unique_ptr blockBuffer; std::string scratch; AllocateScratch(scratch); + assert(fileOffset < dataSize); + // Decrypt individual blocks. while (1) { char *block = data; @@ -750,8 +744,8 @@ Status BlockAccessCipherStream::Decrypt(uint64_t fileOffset, char *data, size_t // We're not decrypting a full block. 
// Copy data to blockBuffer if (!blockBuffer.get()) { - // Allocate buffer - blockBuffer = unique_ptr(new char[blockSize]); + // Allocate buffer + blockBuffer = std::unique_ptr(new char[blockSize]); } block = blockBuffer.get(); // Copy encrypted data to block buffer @@ -765,6 +759,14 @@ Status BlockAccessCipherStream::Decrypt(uint64_t fileOffset, char *data, size_t // Copy decrypted data back to `data`. memmove(data, block + blockOffset, n); } + + // Simply decrementing dataSize by n could cause it to underflow, + // which will very likely make it read over the original bounds later + assert(dataSize >= n); + if (dataSize < n) { + return Status::Corruption("Cannot decrypt data at given offset"); + } + dataSize -= n; if (dataSize == 0) { return Status::OK(); @@ -882,13 +884,22 @@ size_t CTREncryptionProvider::PopulateSecretPrefixPart(char* /*prefix*/, return 0; } -Status CTREncryptionProvider::CreateCipherStream(const std::string& fname, const EnvOptions& options, Slice &prefix, unique_ptr* result) { +Status CTREncryptionProvider::CreateCipherStream( + const std::string& fname, const EnvOptions& options, Slice& prefix, + std::unique_ptr* result) { // Read plain text part of prefix. auto blockSize = cipher_.BlockSize(); uint64_t initialCounter; Slice iv; decodeCTRParameters(prefix.data(), blockSize, initialCounter, iv); + // If the prefix is smaller than twice the block size, we would below read a + // very large chunk of the file (and very likely read over the bounds) + assert(prefix.size() >= 2 * blockSize); + if (prefix.size() < 2 * blockSize) { + return Status::Corruption("Unable to read from file " + fname + ": read attempt would read beyond file bounds"); + } + // Decrypt the encrypted part of the prefix, starting from block 2 (block 0, 1 with initial counter & IV are unencrypted) CTRCipherStream cipherStream(cipher_, iv.data(), initialCounter); auto status = cipherStream.Decrypt(0, (char*)prefix.data() + (2 * blockSize), prefix.size() - (2 * blockSize)); @@ -905,8 +916,9 @@ Status CTREncryptionProvider::CreateCipherStream(const std::string& fname, const Status CTREncryptionProvider::CreateCipherStreamFromPrefix( const std::string& /*fname*/, const EnvOptions& /*options*/, uint64_t initialCounter, const Slice& iv, const Slice& /*prefix*/, - unique_ptr* result) { - (*result) = unique_ptr(new CTRCipherStream(cipher_, iv.data(), initialCounter)); + std::unique_ptr* result) { + (*result) = std::unique_ptr( + new CTRCipherStream(cipher_, iv.data(), initialCounter)); return Status::OK(); } diff --git a/ceph/src/rocksdb/env/env_hdfs.cc b/ceph/src/rocksdb/env/env_hdfs.cc index 1eaea3a1c..5acf9301c 100644 --- a/ceph/src/rocksdb/env/env_hdfs.cc +++ b/ceph/src/rocksdb/env/env_hdfs.cc @@ -11,13 +11,14 @@ #ifndef ROCKSDB_HDFS_FILE_C #define ROCKSDB_HDFS_FILE_C -#include #include #include #include +#include #include #include #include "rocksdb/status.h" +#include "util/logging.h" #include "util/string_util.h" #define HDFS_EXISTS 0 @@ -36,9 +37,11 @@ namespace { // Log error message static Status IOError(const std::string& context, int err_number) { - return (err_number == ENOSPC) ? - Status::NoSpace(context, strerror(err_number)) : - Status::IOError(context, strerror(err_number)); + return (err_number == ENOSPC) + ? Status::NoSpace(context, strerror(err_number)) + : (err_number == ENOENT) + ? Status::PathNotFound(context, strerror(err_number)) + : Status::IOError(context, strerror(err_number)); } // assume that there is one global logger for now. 
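A minimal sketch of what the new ENOENT mapping buys callers, assuming this version's rocksdb::Status exposes IsPathNotFound() alongside the PathNotFound() constructor used above, and with `env` and `fname` standing for any Env instance and file name: a missing file can now be told apart from a genuine I/O failure without parsing the message string.

std::unique_ptr<rocksdb::SequentialFile> file;
rocksdb::Status s = env->NewSequentialFile(fname, &file, rocksdb::EnvOptions());
if (s.IsPathNotFound()) {
  // Recoverable: the file is simply absent, e.g. recreate or skip it.
} else if (!s.ok()) {
  return s;  // A real I/O error; propagate it.
}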
It is not thread-safe, @@ -222,7 +225,7 @@ class HdfsWritableFile: public WritableFile { filename_.c_str()); const char* src = data.data(); size_t left = data.size(); - size_t ret = hdfsWrite(fileSys_, hfile_, src, left); + size_t ret = hdfsWrite(fileSys_, hfile_, src, static_cast(left)); ROCKS_LOG_DEBUG(mylog, "[hdfs] HdfsWritableFile Appended %s\n", filename_.c_str()); if (ret != left) { @@ -252,7 +255,8 @@ class HdfsWritableFile: public WritableFile { // This is used by HdfsLogger to write data to the debug log file virtual Status Append(const char* src, size_t size) { - if (hdfsWrite(fileSys_, hfile_, src, size) != (tSize)size) { + if (hdfsWrite(fileSys_, hfile_, src, static_cast(size)) != + static_cast(size)) { return IOError(filename_, errno); } return Status::OK(); @@ -280,11 +284,10 @@ class HdfsLogger : public Logger { Status HdfsCloseHelper() { ROCKS_LOG_DEBUG(mylog, "[hdfs] HdfsLogger closed %s\n", file_->getName().c_str()); - Status s = file_->Close(); if (mylog != nullptr && mylog == this) { mylog = nullptr; } - return s; + return Status::OK(); } protected: @@ -297,14 +300,15 @@ class HdfsLogger : public Logger { file_->getName().c_str()); } - virtual ~HdfsLogger() { + ~HdfsLogger() override { if (!closed_) { closed_ = true; HdfsCloseHelper(); } } - virtual void Logv(const char* format, va_list ap) { + using Logger::Logv; + void Logv(const char* format, va_list ap) override { const uint64_t thread_id = (*gettid_)(); // We try twice: the first time with a fixed-size stack allocated buffer, @@ -381,8 +385,8 @@ const std::string HdfsEnv::pathsep = "/"; // open a file for sequential reading Status HdfsEnv::NewSequentialFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) { + std::unique_ptr* result, + const EnvOptions& /*options*/) { result->reset(); HdfsReadableFile* f = new HdfsReadableFile(fileSys_, fname); if (f == nullptr || !f->isValid()) { @@ -396,8 +400,8 @@ Status HdfsEnv::NewSequentialFile(const std::string& fname, // open a file for random reading Status HdfsEnv::NewRandomAccessFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) { + std::unique_ptr* result, + const EnvOptions& /*options*/) { result->reset(); HdfsReadableFile* f = new HdfsReadableFile(fileSys_, fname); if (f == nullptr || !f->isValid()) { @@ -411,8 +415,8 @@ Status HdfsEnv::NewRandomAccessFile(const std::string& fname, // create a new file for writing Status HdfsEnv::NewWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) { + std::unique_ptr* result, + const EnvOptions& /*options*/) { result->reset(); Status s; HdfsWritableFile* f = new HdfsWritableFile(fileSys_, fname); @@ -430,14 +434,16 @@ class HdfsDirectory : public Directory { explicit HdfsDirectory(int fd) : fd_(fd) {} ~HdfsDirectory() {} - virtual Status Fsync() { return Status::OK(); } + Status Fsync() override { return Status::OK(); } + + int GetFd() const { return fd_; } private: int fd_; }; Status HdfsEnv::NewDirectory(const std::string& name, - unique_ptr* result) { + std::unique_ptr* result) { int value = hdfsExists(fileSys_, name.c_str()); switch (value) { case HDFS_EXISTS: @@ -475,10 +481,10 @@ Status HdfsEnv::GetChildren(const std::string& path, pHdfsFileInfo = hdfsListDirectory(fileSys_, path.c_str(), &numEntries); if (numEntries >= 0) { for(int i = 0; i < numEntries; i++) { - char* pathname = pHdfsFileInfo[i].mName; - char* filename = std::rindex(pathname, '/'); - if (filename != nullptr) { - result->push_back(filename+1); + std::string 
pathname(pHdfsFileInfo[i].mName); + size_t pos = pathname.rfind("/"); + if (std::string::npos != pos) { + result->push_back(pathname.substr(pos + 1)); } } if (pHdfsFileInfo != nullptr) { @@ -569,19 +575,17 @@ Status HdfsEnv::RenameFile(const std::string& src, const std::string& target) { return IOError(src, errno); } -Status HdfsEnv::LockFile(const std::string& fname, FileLock** lock) { +Status HdfsEnv::LockFile(const std::string& /*fname*/, FileLock** lock) { // there isn's a very good way to atomically check and create // a file via libhdfs *lock = nullptr; return Status::OK(); } -Status HdfsEnv::UnlockFile(FileLock* lock) { - return Status::OK(); -} +Status HdfsEnv::UnlockFile(FileLock* /*lock*/) { return Status::OK(); } Status HdfsEnv::NewLogger(const std::string& fname, - shared_ptr* result) { + std::shared_ptr* result) { HdfsWritableFile* f = new HdfsWritableFile(fileSys_, fname); if (f == nullptr || !f->isValid()) { delete f; @@ -610,10 +614,10 @@ Status NewHdfsEnv(Env** hdfs_env, const std::string& fsname) { // dummy placeholders used when HDFS is not available namespace rocksdb { Status HdfsEnv::NewSequentialFile(const std::string& /*fname*/, - unique_ptr* /*result*/, + std::unique_ptr* /*result*/, const EnvOptions& /*options*/) { return Status::NotSupported("Not compiled with hdfs support"); - } +} Status NewHdfsEnv(Env** /*hdfs_env*/, const std::string& /*fsname*/) { return Status::NotSupported("Not compiled with hdfs support"); diff --git a/ceph/src/rocksdb/env/env_posix.cc b/ceph/src/rocksdb/env/env_posix.cc index 34d49b9dc..387c02793 100644 --- a/ceph/src/rocksdb/env/env_posix.cc +++ b/ceph/src/rocksdb/env/env_posix.cc @@ -119,7 +119,7 @@ class PosixEnv : public Env { public: PosixEnv(); - virtual ~PosixEnv() { + ~PosixEnv() override { for (const auto tid : threads_to_join_) { pthread_join(tid, nullptr); } @@ -141,9 +141,9 @@ class PosixEnv : public Env { } } - virtual Status NewSequentialFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewSequentialFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); int fd = -1; int flags = cloexec_flags(O_RDONLY, &options); @@ -191,9 +191,9 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status NewRandomAccessFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomAccessFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); Status s; int fd; @@ -249,7 +249,7 @@ class PosixEnv : public Env { } virtual Status OpenWritableFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options, bool reopen = false) { result->reset(); @@ -332,22 +332,22 @@ class PosixEnv : public Env { return s; } - virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { return OpenWritableFile(fname, result, options, false); } - virtual Status ReopenWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status ReopenWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { return OpenWritableFile(fname, result, options, true); } - virtual Status ReuseWritableFile(const std::string& fname, - const std::string& 
old_fname, - unique_ptr* result, - const EnvOptions& options) override { + Status ReuseWritableFile(const std::string& fname, + const std::string& old_fname, + std::unique_ptr* result, + const EnvOptions& options) override { result->reset(); Status s; int fd = -1; @@ -429,9 +429,9 @@ class PosixEnv : public Env { return s; } - virtual Status NewRandomRWFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomRWFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { int fd = -1; int flags = cloexec_flags(O_RDWR, &options); @@ -453,9 +453,9 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status NewMemoryMappedFileBuffer( + Status NewMemoryMappedFileBuffer( const std::string& fname, - unique_ptr* result) override { + std::unique_ptr* result) override { int fd = -1; Status status; int flags = cloexec_flags(O_RDWR, nullptr); @@ -496,8 +496,8 @@ class PosixEnv : public Env { return status; } - virtual Status NewDirectory(const std::string& name, - unique_ptr* result) override { + Status NewDirectory(const std::string& name, + std::unique_ptr* result) override { result->reset(); int fd; int flags = cloexec_flags(0, nullptr); @@ -513,7 +513,7 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status FileExists(const std::string& fname) override { + Status FileExists(const std::string& fname) override { int result = access(fname.c_str(), F_OK); if (result == 0) { @@ -535,8 +535,8 @@ class PosixEnv : public Env { } } - virtual Status GetChildren(const std::string& dir, - std::vector* result) override { + Status GetChildren(const std::string& dir, + std::vector* result) override { result->clear(); DIR* d = opendir(dir.c_str()); if (d == nullptr) { @@ -557,7 +557,7 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status DeleteFile(const std::string& fname) override { + Status DeleteFile(const std::string& fname) override { Status result; if (unlink(fname.c_str()) != 0) { result = IOError("while unlink() file", fname, errno); @@ -565,7 +565,7 @@ class PosixEnv : public Env { return result; }; - virtual Status CreateDir(const std::string& name) override { + Status CreateDir(const std::string& name) override { Status result; if (mkdir(name.c_str(), 0755) != 0) { result = IOError("While mkdir", name, errno); @@ -573,7 +573,7 @@ class PosixEnv : public Env { return result; }; - virtual Status CreateDirIfMissing(const std::string& name) override { + Status CreateDirIfMissing(const std::string& name) override { Status result; if (mkdir(name.c_str(), 0755) != 0) { if (errno != EEXIST) { @@ -587,7 +587,7 @@ class PosixEnv : public Env { return result; }; - virtual Status DeleteDir(const std::string& name) override { + Status DeleteDir(const std::string& name) override { Status result; if (rmdir(name.c_str()) != 0) { result = IOError("file rmdir", name, errno); @@ -595,8 +595,7 @@ class PosixEnv : public Env { return result; }; - virtual Status GetFileSize(const std::string& fname, - uint64_t* size) override { + Status GetFileSize(const std::string& fname, uint64_t* size) override { Status s; struct stat sbuf; if (stat(fname.c_str(), &sbuf) != 0) { @@ -608,8 +607,8 @@ class PosixEnv : public Env { return s; } - virtual Status GetFileModificationTime(const std::string& fname, - uint64_t* file_mtime) override { + Status GetFileModificationTime(const std::string& fname, + uint64_t* file_mtime) override { struct stat s; if (stat(fname.c_str(), &s) !=0) { return 
IOError("while stat a file for modification time", fname, errno); @@ -617,8 +616,8 @@ class PosixEnv : public Env { *file_mtime = static_cast(s.st_mtime); return Status::OK(); } - virtual Status RenameFile(const std::string& src, - const std::string& target) override { + Status RenameFile(const std::string& src, + const std::string& target) override { Status result; if (rename(src.c_str(), target.c_str()) != 0) { result = IOError("While renaming a file to " + target, src, errno); @@ -626,8 +625,7 @@ class PosixEnv : public Env { return result; } - virtual Status LinkFile(const std::string& src, - const std::string& target) override { + Status LinkFile(const std::string& src, const std::string& target) override { Status result; if (link(src.c_str(), target.c_str()) != 0) { if (errno == EXDEV) { @@ -647,8 +645,8 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status AreFilesSame(const std::string& first, - const std::string& second, bool* res) override { + Status AreFilesSame(const std::string& first, const std::string& second, + bool* res) override { struct stat statbuf[2]; if (stat(first.c_str(), &statbuf[0]) != 0) { return IOError("stat file", first, errno); @@ -667,7 +665,7 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status LockFile(const std::string& fname, FileLock** lock) override { + Status LockFile(const std::string& fname, FileLock** lock) override { *lock = nullptr; Status result; @@ -713,7 +711,7 @@ class PosixEnv : public Env { return result; } - virtual Status UnlockFile(FileLock* lock) override { + Status UnlockFile(FileLock* lock) override { PosixFileLock* my_lock = reinterpret_cast(lock); Status result; mutex_lockedFiles.Lock(); @@ -731,19 +729,19 @@ class PosixEnv : public Env { return result; } - virtual void Schedule(void (*function)(void* arg1), void* arg, - Priority pri = LOW, void* tag = nullptr, - void (*unschedFunction)(void* arg) = nullptr) override; + void Schedule(void (*function)(void* arg1), void* arg, Priority pri = LOW, + void* tag = nullptr, + void (*unschedFunction)(void* arg) = nullptr) override; - virtual int UnSchedule(void* arg, Priority pri) override; + int UnSchedule(void* arg, Priority pri) override; - virtual void StartThread(void (*function)(void* arg), void* arg) override; + void StartThread(void (*function)(void* arg), void* arg) override; - virtual void WaitForJoin() override; + void WaitForJoin() override; - virtual unsigned int GetThreadPoolQueueLen(Priority pri = LOW) const override; + unsigned int GetThreadPoolQueueLen(Priority pri = LOW) const override; - virtual Status GetTestDirectory(std::string* result) override { + Status GetTestDirectory(std::string* result) override { const char* env = getenv("TEST_TMPDIR"); if (env && env[0] != '\0') { *result = env; @@ -757,8 +755,7 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status GetThreadList( - std::vector* thread_list) override { + Status GetThreadList(std::vector* thread_list) override { assert(thread_status_updater_); return thread_status_updater_->GetThreadList(thread_list); } @@ -774,12 +771,9 @@ class PosixEnv : public Env { return gettid(tid); } - virtual uint64_t GetThreadID() const override { - return gettid(pthread_self()); - } + uint64_t GetThreadID() const override { return gettid(pthread_self()); } - virtual Status GetFreeSpace(const std::string& fname, - uint64_t* free_space) override { + Status GetFreeSpace(const std::string& fname, uint64_t* free_space) override { struct statvfs sbuf; if (statvfs(fname.c_str(), &sbuf) 
< 0) { @@ -790,8 +784,8 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status NewLogger(const std::string& fname, - shared_ptr* result) override { + Status NewLogger(const std::string& fname, + std::shared_ptr* result) override { FILE* f; { IOSTATS_TIMER_GUARD(open_nanos); @@ -817,13 +811,13 @@ class PosixEnv : public Env { } } - virtual uint64_t NowMicros() override { + uint64_t NowMicros() override { struct timeval tv; gettimeofday(&tv, nullptr); return static_cast(tv.tv_sec) * 1000000 + tv.tv_usec; } - virtual uint64_t NowNanos() override { + uint64_t NowNanos() override { #if defined(OS_LINUX) || defined(OS_FREEBSD) || defined(OS_AIX) struct timespec ts; clock_gettime(CLOCK_MONOTONIC, &ts); @@ -843,9 +837,19 @@ class PosixEnv : public Env { #endif } - virtual void SleepForMicroseconds(int micros) override { usleep(micros); } + uint64_t NowCPUNanos() override { +#if defined(OS_LINUX) || defined(OS_FREEBSD) || defined(OS_AIX) || \ + defined(__MACH__) + struct timespec ts; + clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts); + return static_cast(ts.tv_sec) * 1000000000 + ts.tv_nsec; +#endif + return 0; + } + + void SleepForMicroseconds(int micros) override { usleep(micros); } - virtual Status GetHostName(char* name, uint64_t len) override { + Status GetHostName(char* name, uint64_t len) override { int ret = gethostname(name, static_cast(len)); if (ret < 0) { if (errno == EFAULT || errno == EINVAL) @@ -856,7 +860,7 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status GetCurrentTime(int64_t* unix_time) override { + Status GetCurrentTime(int64_t* unix_time) override { time_t ret = time(nullptr); if (ret == (time_t) -1) { return IOError("GetCurrentTime", "", errno); @@ -865,8 +869,8 @@ class PosixEnv : public Env { return Status::OK(); } - virtual Status GetAbsolutePath(const std::string& db_path, - std::string* output_path) override { + Status GetAbsolutePath(const std::string& db_path, + std::string* output_path) override { if (!db_path.empty() && db_path[0] == '/') { *output_path = db_path; return Status::OK(); @@ -883,28 +887,28 @@ class PosixEnv : public Env { } // Allow increasing the number of worker threads. - virtual void SetBackgroundThreads(int num, Priority pri) override { + void SetBackgroundThreads(int num, Priority pri) override { assert(pri >= Priority::BOTTOM && pri <= Priority::HIGH); thread_pools_[pri].SetBackgroundThreads(num); } - virtual int GetBackgroundThreads(Priority pri) override { + int GetBackgroundThreads(Priority pri) override { assert(pri >= Priority::BOTTOM && pri <= Priority::HIGH); return thread_pools_[pri].GetBackgroundThreads(); } - virtual Status SetAllowNonOwnerAccess(bool allow_non_owner_access) override { + Status SetAllowNonOwnerAccess(bool allow_non_owner_access) override { allow_non_owner_access_ = allow_non_owner_access; return Status::OK(); } // Allow increasing the number of worker threads. 
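A minimal sketch of the idea behind the NowCPUNanos() addition above, assuming a platform that provides CLOCK_THREAD_CPUTIME_ID: this clock advances only while the calling thread is actually on-CPU, so unlike the CLOCK_MONOTONIC-based NowNanos() it excludes time spent sleeping or blocked on I/O.

#include <time.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

int main() {
  struct timespec ts;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  uint64_t cpu_before =
      static_cast<uint64_t>(ts.tv_sec) * 1000000000 + ts.tv_nsec;
  usleep(100000);  // ~100 ms of wall time, almost no CPU time
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  uint64_t cpu_after =
      static_cast<uint64_t>(ts.tv_sec) * 1000000000 + ts.tv_nsec;
  // Prints a tiny number: the thread burned almost no CPU while asleep.
  printf("CPU ns burned during sleep: %llu\n",
         (unsigned long long)(cpu_after - cpu_before));
  return 0;
}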
- virtual void IncBackgroundThreadsIfNeeded(int num, Priority pri) override { + void IncBackgroundThreadsIfNeeded(int num, Priority pri) override { assert(pri >= Priority::BOTTOM && pri <= Priority::HIGH); thread_pools_[pri].IncBackgroundThreadsIfNeeded(num); } - virtual void LowerThreadPoolIOPriority(Priority pool = LOW) override { + void LowerThreadPoolIOPriority(Priority pool = LOW) override { assert(pool >= Priority::BOTTOM && pool <= Priority::HIGH); #ifdef OS_LINUX thread_pools_[pool].LowerIOPriority(); @@ -913,7 +917,7 @@ class PosixEnv : public Env { #endif } - virtual void LowerThreadPoolCPUPriority(Priority pool = LOW) override { + void LowerThreadPoolCPUPriority(Priority pool = LOW) override { assert(pool >= Priority::BOTTOM && pool <= Priority::HIGH); #ifdef OS_LINUX thread_pools_[pool].LowerCPUPriority(); @@ -922,7 +926,7 @@ class PosixEnv : public Env { #endif } - virtual std::string TimeToString(uint64_t secondsSince1970) override { + std::string TimeToString(uint64_t secondsSince1970) override { const time_t seconds = (time_t)secondsSince1970; struct tm t; int maxsize = 64; diff --git a/ceph/src/rocksdb/env/env_test.cc b/ceph/src/rocksdb/env/env_test.cc index eda6b9d5d..478009284 100644 --- a/ceph/src/rocksdb/env/env_test.cc +++ b/ceph/src/rocksdb/env/env_test.cc @@ -118,7 +118,7 @@ class EnvPosixTestWithParam } } - ~EnvPosixTestWithParam() { WaitThreadPoolsEmpty(); } + ~EnvPosixTestWithParam() override { WaitThreadPoolsEmpty(); } }; static void SetBool(void* ptr) { @@ -181,11 +181,11 @@ TEST_F(EnvPosixTest, DISABLED_FilePermission) { std::vector fileNames{ test::PerThreadDBPath(env_, "testfile"), test::PerThreadDBPath(env_, "testfile1")}; - unique_ptr wfile; + std::unique_ptr wfile; ASSERT_OK(env_->NewWritableFile(fileNames[0], &wfile, soptions)); ASSERT_OK(env_->NewWritableFile(fileNames[1], &wfile, soptions)); wfile.reset(); - unique_ptr rwfile; + std::unique_ptr rwfile; ASSERT_OK(env_->NewRandomRWFile(fileNames[1], &rwfile, soptions)); struct stat sb; @@ -217,7 +217,7 @@ TEST_F(EnvPosixTest, MemoryMappedFileBuffer) { std::string expected_data; std::string fname = test::PerThreadDBPath(env_, "testfile"); { - unique_ptr wfile; + std::unique_ptr wfile; const EnvOptions soptions; ASSERT_OK(env_->NewWritableFile(fname, &wfile, soptions)); @@ -812,7 +812,7 @@ class IoctlFriendlyTmpdir { #ifndef ROCKSDB_LITE TEST_F(EnvPosixTest, PositionedAppend) { - unique_ptr writable_file; + std::unique_ptr writable_file; EnvOptions options; options.use_direct_writes = true; options.use_mmap_writes = false; @@ -832,7 +832,7 @@ TEST_F(EnvPosixTest, PositionedAppend) { // The file now has 1 sector worth of a followed by a page worth of b // Verify the above - unique_ptr seq_file; + std::unique_ptr seq_file; ASSERT_OK(env_->NewSequentialFile(ift.name() + "/f", &seq_file, options)); char scratch[kPageSize * 2]; Slice result; @@ -851,10 +851,10 @@ TEST_P(EnvPosixTestWithParam, RandomAccessUniqueID) { soptions.use_direct_reads = soptions.use_direct_writes = direct_io_; IoctlFriendlyTmpdir ift; std::string fname = ift.name() + "/testfile"; - unique_ptr wfile; + std::unique_ptr wfile; ASSERT_OK(env_->NewWritableFile(fname, &wfile, soptions)); - unique_ptr file; + std::unique_ptr file; // Get Unique ID ASSERT_OK(env_->NewRandomAccessFile(fname, &file, soptions)); @@ -921,7 +921,7 @@ TEST_P(EnvPosixTestWithParam, AllocateTest) { EnvOptions soptions; soptions.use_mmap_writes = false; soptions.use_direct_reads = soptions.use_direct_writes = direct_io_; - unique_ptr wfile; + std::unique_ptr wfile; 
ASSERT_OK(env_->NewWritableFile(fname, &wfile, soptions)); // allocate 100 MB @@ -990,14 +990,14 @@ TEST_P(EnvPosixTestWithParam, RandomAccessUniqueIDConcurrent) { fnames.push_back(ift.name() + "/" + "testfile" + ToString(i)); // Create file. - unique_ptr wfile; + std::unique_ptr wfile; ASSERT_OK(env_->NewWritableFile(fnames[i], &wfile, soptions)); } // Collect and check whether the IDs are unique. std::unordered_set ids; for (const std::string fname : fnames) { - unique_ptr file; + std::unique_ptr file; std::string unique_id; ASSERT_OK(env_->NewRandomAccessFile(fname, &file, soptions)); size_t id_size = file->GetUniqueId(temp_id, MAX_ID_SIZE); @@ -1033,14 +1033,14 @@ TEST_P(EnvPosixTestWithParam, RandomAccessUniqueIDDeletes) { for (int i = 0; i < 1000; ++i) { // Create file. { - unique_ptr wfile; + std::unique_ptr wfile; ASSERT_OK(env_->NewWritableFile(fname, &wfile, soptions)); } // Get Unique ID std::string unique_id; { - unique_ptr file; + std::unique_ptr file; ASSERT_OK(env_->NewRandomAccessFile(fname, &file, soptions)); size_t id_size = file->GetUniqueId(temp_id, MAX_ID_SIZE); ASSERT_TRUE(id_size > 0); @@ -1076,7 +1076,7 @@ TEST_P(EnvPosixTestWithParam, InvalidateCache) { // Create file. { - unique_ptr wfile; + std::unique_ptr wfile; #if !defined(OS_MACOSX) && !defined(OS_WIN) && !defined(OS_SOLARIS) && !defined(OS_AIX) if (soptions.use_direct_writes) { soptions.use_direct_writes = false; @@ -1090,7 +1090,7 @@ TEST_P(EnvPosixTestWithParam, InvalidateCache) { // Random Read { - unique_ptr file; + std::unique_ptr file; auto scratch = NewAligned(kSectorSize, 0); Slice result; #if !defined(OS_MACOSX) && !defined(OS_WIN) && !defined(OS_SOLARIS) && !defined(OS_AIX) @@ -1107,7 +1107,7 @@ TEST_P(EnvPosixTestWithParam, InvalidateCache) { // Sequential Read { - unique_ptr file; + std::unique_ptr file; auto scratch = NewAligned(kSectorSize, 0); Slice result; #if !defined(OS_MACOSX) && !defined(OS_WIN) && !defined(OS_SOLARIS) && !defined(OS_AIX) @@ -1135,7 +1135,7 @@ TEST_P(EnvPosixTestWithParam, InvalidateCache) { class TestLogger : public Logger { public: using Logger::Logv; - virtual void Logv(const char* format, va_list ap) override { + void Logv(const char* format, va_list ap) override { log_count++; char new_format[550]; @@ -1217,7 +1217,7 @@ class TestLogger2 : public Logger { public: explicit TestLogger2(size_t max_log_size) : max_log_size_(max_log_size) {} using Logger::Logv; - virtual void Logv(const char* format, va_list ap) override { + void Logv(const char* format, va_list ap) override { char new_format[2000]; std::fill_n(new_format, sizeof(new_format), '2'); { @@ -1252,7 +1252,7 @@ TEST_P(EnvPosixTestWithParam, LogBufferMaxSizeTest) { TEST_P(EnvPosixTestWithParam, Preallocation) { rocksdb::SyncPoint::GetInstance()->EnableProcessing(); const std::string src = test::PerThreadDBPath(env_, "testfile"); - unique_ptr srcfile; + std::unique_ptr srcfile; EnvOptions soptions; soptions.use_direct_reads = soptions.use_direct_writes = direct_io_; #if !defined(OS_MACOSX) && !defined(OS_WIN) && !defined(OS_SOLARIS) && !defined(OS_AIX) && !defined(OS_OPENBSD) && !defined(OS_FREEBSD) @@ -1315,7 +1315,7 @@ TEST_P(EnvPosixTestWithParam, ConsistentChildrenAttributes) { for (int i = 0; i < kNumChildren; ++i) { const std::string path = test::TmpDir(env_) + "/" + "testfile_" + std::to_string(i); - unique_ptr file; + std::unique_ptr file; #if !defined(OS_MACOSX) && !defined(OS_WIN) && !defined(OS_SOLARIS) && !defined(OS_AIX) && !defined(OS_OPENBSD) && !defined(OS_FREEBSD) if (soptions.use_direct_writes) 
{ rocksdb::SyncPoint::GetInstance()->SetCallBack( @@ -1368,50 +1368,110 @@ TEST_P(EnvPosixTestWithParam, WritableFileWrapper) { inc(1); return Status::OK(); } - Status Truncate(uint64_t /*size*/) override { return Status::OK(); } - Status Close() override { inc(2); return Status::OK(); } - Status Flush() override { inc(3); return Status::OK(); } - Status Sync() override { inc(4); return Status::OK(); } - Status Fsync() override { inc(5); return Status::OK(); } - void SetIOPriority(Env::IOPriority /*pri*/) override { inc(6); } - uint64_t GetFileSize() override { inc(7); return 0; } + + Status PositionedAppend(const Slice& /*data*/, + uint64_t /*offset*/) override { + inc(2); + return Status::OK(); + } + + Status Truncate(uint64_t /*size*/) override { + inc(3); + return Status::OK(); + } + + Status Close() override { + inc(4); + return Status::OK(); + } + + Status Flush() override { + inc(5); + return Status::OK(); + } + + Status Sync() override { + inc(6); + return Status::OK(); + } + + Status Fsync() override { + inc(7); + return Status::OK(); + } + + bool IsSyncThreadSafe() const override { + inc(8); + return true; + } + + bool use_direct_io() const override { + inc(9); + return true; + } + + size_t GetRequiredBufferAlignment() const override { + inc(10); + return 0; + } + + void SetIOPriority(Env::IOPriority /*pri*/) override { inc(11); } + + Env::IOPriority GetIOPriority() override { + inc(12); + return Env::IOPriority::IO_LOW; + } + + void SetWriteLifeTimeHint(Env::WriteLifeTimeHint /*hint*/) override { + inc(13); + } + + Env::WriteLifeTimeHint GetWriteLifeTimeHint() override { + inc(14); + return Env::WriteLifeTimeHint::WLTH_NOT_SET; + } + + uint64_t GetFileSize() override { + inc(15); + return 0; + } + + void SetPreallocationBlockSize(size_t /*size*/) override { inc(16); } + void GetPreallocationStatus(size_t* /*block_size*/, size_t* /*last_allocated_block*/) override { - inc(8); + inc(17); } + size_t GetUniqueId(char* /*id*/, size_t /*max_size*/) const override { - inc(9); + inc(18); return 0; } + Status InvalidateCache(size_t /*offset*/, size_t /*length*/) override { - inc(10); + inc(19); return Status::OK(); } - protected: - Status Allocate(uint64_t /*offset*/, uint64_t /*len*/) override { - inc(11); + Status RangeSync(uint64_t /*offset*/, uint64_t /*nbytes*/) override { + inc(20); return Status::OK(); } - Status RangeSync(uint64_t /*offset*/, uint64_t /*nbytes*/) override { - inc(12); + + void PrepareWrite(size_t /*offset*/, size_t /*len*/) override { inc(21); } + + Status Allocate(uint64_t /*offset*/, uint64_t /*len*/) override { + inc(22); return Status::OK(); } public: - ~Base() { - inc(13); - } + ~Base() override { inc(23); } }; class Wrapper : public WritableFileWrapper { public: explicit Wrapper(WritableFile* target) : WritableFileWrapper(target) {} - - void CallProtectedMethods() { - Allocate(0, 0); - RangeSync(0, 0); - } }; int step = 0; @@ -1420,19 +1480,30 @@ TEST_P(EnvPosixTestWithParam, WritableFileWrapper) { Base b(&step); Wrapper w(&b); w.Append(Slice()); + w.PositionedAppend(Slice(), 0); + w.Truncate(0); w.Close(); w.Flush(); w.Sync(); w.Fsync(); + w.IsSyncThreadSafe(); + w.use_direct_io(); + w.GetRequiredBufferAlignment(); w.SetIOPriority(Env::IOPriority::IO_HIGH); + w.GetIOPriority(); + w.SetWriteLifeTimeHint(Env::WriteLifeTimeHint::WLTH_NOT_SET); + w.GetWriteLifeTimeHint(); w.GetFileSize(); + w.SetPreallocationBlockSize(0); w.GetPreallocationStatus(nullptr, nullptr); w.GetUniqueId(nullptr, 0); w.InvalidateCache(0, 0); - w.CallProtectedMethods(); + w.RangeSync(0, 
0); + w.PrepareWrite(0, 0); + w.Allocate(0, 0); } - EXPECT_EQ(14, step); + EXPECT_EQ(24, step); } TEST_P(EnvPosixTestWithParam, PosixRandomRWFile) { @@ -1567,7 +1638,7 @@ TEST_P(EnvPosixTestWithParam, PosixRandomRWFileRandomized) { const std::string path = test::PerThreadDBPath(env_, "random_rw_file_rand"); env_->DeleteFile(path); - unique_ptr file; + std::unique_ptr file; #ifdef OS_LINUX // Cannot open non-existing file. @@ -1618,15 +1689,15 @@ class TestEnv : public EnvWrapper { public: using Logger::Logv; TestLogger(TestEnv* env_ptr) : Logger() { env = env_ptr; } - ~TestLogger() { + ~TestLogger() override { if (!closed_) { CloseHelper(); } } - virtual void Logv(const char* /*format*/, va_list /*ap*/) override{}; + void Logv(const char* /*format*/, va_list /*ap*/) override{}; protected: - virtual Status CloseImpl() override { return CloseHelper(); } + Status CloseImpl() override { return CloseHelper(); } private: Status CloseHelper() { @@ -1640,8 +1711,8 @@ class TestEnv : public EnvWrapper { int GetCloseCount() { return close_count; } - virtual Status NewLogger(const std::string& /*fname*/, - shared_ptr* result) { + Status NewLogger(const std::string& /*fname*/, + std::shared_ptr* result) override { result->reset(new TestLogger(this)); return Status::OK(); } @@ -1685,8 +1756,8 @@ INSTANTIATE_TEST_CASE_P(DefaultEnvWithDirectIO, EnvPosixTestWithParam, #endif // !defined(ROCKSDB_LITE) #if !defined(ROCKSDB_LITE) && !defined(OS_WIN) -static unique_ptr chroot_env(NewChrootEnv(Env::Default(), - test::TmpDir(Env::Default()))); +static std::unique_ptr chroot_env( + NewChrootEnv(Env::Default(), test::TmpDir(Env::Default()))); INSTANTIATE_TEST_CASE_P( ChrootEnvWithoutDirectIO, EnvPosixTestWithParam, ::testing::Values(std::pair(chroot_env.get(), false))); diff --git a/ceph/src/rocksdb/env/io_posix.cc b/ceph/src/rocksdb/env/io_posix.cc index 70dd7e9b0..628ed8413 100644 --- a/ceph/src/rocksdb/env/io_posix.cc +++ b/ceph/src/rocksdb/env/io_posix.cc @@ -83,10 +83,14 @@ size_t GetLogicalBufferSize(int __attribute__((__unused__)) fd) { if (!device_dir.empty() && device_dir.back() == '/') { device_dir.pop_back(); } - // NOTE: sda3 does not have a `queue/` subdir, only the parent sda has it. + // NOTE: sda3 and nvme0n1p1 do not have a `queue/` subdir, only the parent sda + // and nvme0n1 have it. // $ ls -al '/sys/dev/block/8:3' // lrwxrwxrwx. 
1 root root 0 Jun 26 01:38 /sys/dev/block/8:3 -> // ../../block/sda/sda3 + // $ ls -al '/sys/dev/block/259:4' + // lrwxrwxrwx 1 root root 0 Jan 31 16:04 /sys/dev/block/259:4 -> + // ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 size_t parent_end = device_dir.rfind('/', device_dir.length() - 1); if (parent_end == std::string::npos) { return kDefaultPageSize; @@ -95,8 +99,11 @@ size_t GetLogicalBufferSize(int __attribute__((__unused__)) fd) { if (parent_begin == std::string::npos) { return kDefaultPageSize; } - if (device_dir.substr(parent_begin + 1, parent_end - parent_begin - 1) != - "block") { + std::string parent = + device_dir.substr(parent_begin + 1, parent_end - parent_begin - 1); + std::string child = device_dir.substr(parent_end + 1, std::string::npos); + if (parent != "block" && + (child.compare(0, 4, "nvme") || child.find('p') != std::string::npos)) { device_dir = device_dir.substr(0, parent_end); } std::string fname = device_dir + "/queue/logical_block_size"; @@ -258,7 +265,6 @@ size_t PosixHelper::GetUniqueIdFromFile(int fd, char* id, size_t max_size) { struct stat buf; int result = fstat(fd, &buf); - assert(result != -1); if (result == -1) { return 0; } @@ -818,7 +824,8 @@ Status PosixWritableFile::Close() { // but it will be nice to log these errors. int dummy __attribute__((__unused__)); dummy = ftruncate(fd_, filesize_); -#if defined(ROCKSDB_FALLOCATE_PRESENT) && !defined(TRAVIS) +#if defined(ROCKSDB_FALLOCATE_PRESENT) && defined(FALLOC_FL_PUNCH_HOLE) && \ + !defined(TRAVIS) // in some file systems, ftruncate only trims trailing space if the // new file size is smaller than the current size. Calling fallocate // with FALLOC_FL_PUNCH_HOLE flag to explicitly release these unused diff --git a/ceph/src/rocksdb/env/io_posix.h b/ceph/src/rocksdb/env/io_posix.h index 106f6df65..e6824d3e8 100644 --- a/ceph/src/rocksdb/env/io_posix.h +++ b/ceph/src/rocksdb/env/io_posix.h @@ -41,6 +41,9 @@ static Status IOError(const std::string& context, const std::string& file_name, strerror(err_number)); case ESTALE: return Status::IOError(Status::kStaleFile); + case ENOENT: + return Status::PathNotFound(IOErrorMsg(context, file_name), + strerror(err_number)); default: return Status::IOError(IOErrorMsg(context, file_name), strerror(err_number)); diff --git a/ceph/src/rocksdb/env/mock_env.cc b/ceph/src/rocksdb/env/mock_env.cc index 12c096cef..793a0837a 100644 --- a/ceph/src/rocksdb/env/mock_env.cc +++ b/ceph/src/rocksdb/env/mock_env.cc @@ -183,9 +183,9 @@ class MockSequentialFile : public SequentialFile { file_->Ref(); } - ~MockSequentialFile() { file_->Unref(); } + ~MockSequentialFile() override { file_->Unref(); } - virtual Status Read(size_t n, Slice* result, char* scratch) override { + Status Read(size_t n, Slice* result, char* scratch) override { Status s = file_->Read(pos_, n, result, scratch); if (s.ok()) { pos_ += result->size(); @@ -193,7 +193,7 @@ class MockSequentialFile : public SequentialFile { return s; } - virtual Status Skip(uint64_t n) override { + Status Skip(uint64_t n) override { if (pos_ > file_->Size()) { return Status::IOError("pos_ > file_->Size()"); } @@ -214,10 +214,10 @@ class MockRandomAccessFile : public RandomAccessFile { public: explicit MockRandomAccessFile(MemFile* file) : file_(file) { file_->Ref(); } - ~MockRandomAccessFile() { file_->Unref(); } + ~MockRandomAccessFile() override { file_->Unref(); } - virtual Status Read(uint64_t offset, size_t n, Slice* result, - char* scratch) const override { + Status Read(uint64_t offset, 
size_t n, Slice* result, + char* scratch) const override { return file_->Read(offset, n, result, scratch); } @@ -229,22 +229,22 @@ class MockRandomRWFile : public RandomRWFile { public: explicit MockRandomRWFile(MemFile* file) : file_(file) { file_->Ref(); } - ~MockRandomRWFile() { file_->Unref(); } + ~MockRandomRWFile() override { file_->Unref(); } - virtual Status Write(uint64_t offset, const Slice& data) override { + Status Write(uint64_t offset, const Slice& data) override { return file_->Write(offset, data); } - virtual Status Read(uint64_t offset, size_t n, Slice* result, - char* scratch) const override { + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { return file_->Read(offset, n, result, scratch); } - virtual Status Close() override { return file_->Fsync(); } + Status Close() override { return file_->Fsync(); } - virtual Status Flush() override { return Status::OK(); } + Status Flush() override { return Status::OK(); } - virtual Status Sync() override { return file_->Fsync(); } + Status Sync() override { return file_->Fsync(); } private: MemFile* file_; @@ -257,9 +257,9 @@ class MockWritableFile : public WritableFile { file_->Ref(); } - ~MockWritableFile() { file_->Unref(); } + ~MockWritableFile() override { file_->Unref(); } - virtual Status Append(const Slice& data) override { + Status Append(const Slice& data) override { size_t bytes_written = 0; while (bytes_written < data.size()) { auto bytes = RequestToken(data.size() - bytes_written); @@ -271,17 +271,17 @@ class MockWritableFile : public WritableFile { } return Status::OK(); } - virtual Status Truncate(uint64_t size) override { + Status Truncate(uint64_t size) override { file_->Truncate(static_cast(size)); return Status::OK(); } - virtual Status Close() override { return file_->Fsync(); } + Status Close() override { return file_->Fsync(); } - virtual Status Flush() override { return Status::OK(); } + Status Flush() override { return Status::OK(); } - virtual Status Sync() override { return file_->Fsync(); } + Status Sync() override { return file_->Fsync(); } - virtual uint64_t GetFileSize() override { return file_->Size(); } + uint64_t GetFileSize() override { return file_->Size(); } private: inline size_t RequestToken(size_t bytes) { @@ -299,7 +299,7 @@ class MockWritableFile : public WritableFile { class MockEnvDirectory : public Directory { public: - virtual Status Fsync() override { return Status::OK(); } + Status Fsync() override { return Status::OK(); } }; class MockEnvFileLock : public FileLock { @@ -319,7 +319,7 @@ class TestMemLogger : public Logger { static const uint64_t flush_every_seconds_ = 5; std::atomic_uint_fast64_t last_flush_micros_; Env* env_; - bool flush_pending_; + std::atomic flush_pending_; public: TestMemLogger(std::unique_ptr f, Env* env, @@ -330,9 +330,9 @@ class TestMemLogger : public Logger { last_flush_micros_(0), env_(env), flush_pending_(false) {} - virtual ~TestMemLogger() {} + ~TestMemLogger() override {} - virtual void Flush() override { + void Flush() override { if (flush_pending_) { flush_pending_ = false; } @@ -340,7 +340,7 @@ class TestMemLogger : public Logger { } using Logger::Logv; - virtual void Logv(const char* format, va_list ap) override { + void Logv(const char* format, va_list ap) override { // We try twice: the first time with a fixed-size stack allocated buffer, // and the second time with a much larger dynamically allocated buffer. 
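// (Each vsnprintf attempt consumes its va_list, so retrying with the larger
// heap buffer has to work from a va_copy of `ap`, released with va_end.)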
char buffer[500]; @@ -424,7 +424,7 @@ MockEnv::~MockEnv() { // Partial implementation of the Env interface. Status MockEnv::NewSequentialFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& /*soptions*/) { auto fn = NormalizePath(fname); MutexLock lock(&mutex_); @@ -441,7 +441,7 @@ Status MockEnv::NewSequentialFile(const std::string& fname, } Status MockEnv::NewRandomAccessFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& /*soptions*/) { auto fn = NormalizePath(fname); MutexLock lock(&mutex_); @@ -458,7 +458,7 @@ Status MockEnv::NewRandomAccessFile(const std::string& fname, } Status MockEnv::NewRandomRWFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& /*soptions*/) { auto fn = NormalizePath(fname); MutexLock lock(&mutex_); @@ -476,7 +476,7 @@ Status MockEnv::NewRandomRWFile(const std::string& fname, Status MockEnv::ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options) { auto s = RenameFile(old_fname, fname); if (!s.ok()) { @@ -487,7 +487,7 @@ Status MockEnv::ReuseWritableFile(const std::string& fname, } Status MockEnv::NewWritableFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& env_options) { auto fn = NormalizePath(fname); MutexLock lock(&mutex_); @@ -503,7 +503,7 @@ Status MockEnv::NewWritableFile(const std::string& fname, } Status MockEnv::NewDirectory(const std::string& /*name*/, - unique_ptr* result) { + std::unique_ptr* result) { result->reset(new MockEnvDirectory()); return Status::OK(); } @@ -660,7 +660,7 @@ Status MockEnv::LinkFile(const std::string& src, const std::string& dest) { } Status MockEnv::NewLogger(const std::string& fname, - shared_ptr* result) { + std::shared_ptr* result) { auto fn = NormalizePath(fname); MutexLock lock(&mutex_); auto iter = file_map_.find(fn); diff --git a/ceph/src/rocksdb/env/mock_env.h b/ceph/src/rocksdb/env/mock_env.h index 816256ab0..87b8deaf8 100644 --- a/ceph/src/rocksdb/env/mock_env.h +++ b/ceph/src/rocksdb/env/mock_env.h @@ -28,28 +28,28 @@ class MockEnv : public EnvWrapper { // Partial implementation of the Env interface. 
virtual Status NewSequentialFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& soptions) override; virtual Status NewRandomAccessFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& soptions) override; virtual Status NewRandomRWFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options) override; virtual Status ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options) override; virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& env_options) override; virtual Status NewDirectory(const std::string& name, - unique_ptr* result) override; + std::unique_ptr* result) override; virtual Status FileExists(const std::string& fname) override; @@ -81,7 +81,7 @@ class MockEnv : public EnvWrapper { const std::string& target) override; virtual Status NewLogger(const std::string& fname, - shared_ptr* result) override; + std::shared_ptr* result) override; virtual Status LockFile(const std::string& fname, FileLock** flock) override; diff --git a/ceph/src/rocksdb/env/mock_env_test.cc b/ceph/src/rocksdb/env/mock_env_test.cc index 19e259ccd..2daf682e7 100644 --- a/ceph/src/rocksdb/env/mock_env_test.cc +++ b/ceph/src/rocksdb/env/mock_env_test.cc @@ -20,16 +20,14 @@ class MockEnvTest : public testing::Test { MockEnvTest() : env_(new MockEnv(Env::Default())) { } - ~MockEnvTest() { - delete env_; - } + ~MockEnvTest() override { delete env_; } }; TEST_F(MockEnvTest, Corrupt) { const std::string kGood = "this is a good string, synced to disk"; const std::string kCorrupted = "this part may be corrupted"; const std::string kFileName = "/dir/f"; - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->NewWritableFile(kFileName, &writable_file, soptions_)); ASSERT_OK(writable_file->Append(kGood)); ASSERT_TRUE(writable_file->GetFileSize() == kGood.size()); @@ -37,7 +35,7 @@ TEST_F(MockEnvTest, Corrupt) { std::string scratch; scratch.resize(kGood.size() + kCorrupted.size() + 16); Slice result; - unique_ptr rand_file; + std::unique_ptr rand_file; ASSERT_OK(env_->NewRandomAccessFile(kFileName, &rand_file, soptions_)); ASSERT_OK(rand_file->Read(0, kGood.size(), &result, &(scratch[0]))); ASSERT_EQ(result.compare(kGood), 0); diff --git a/ceph/src/rocksdb/examples/Makefile b/ceph/src/rocksdb/examples/Makefile index 57cd1a75a..27a6f0f42 100644 --- a/ceph/src/rocksdb/examples/Makefile +++ b/ceph/src/rocksdb/examples/Makefile @@ -43,8 +43,11 @@ transaction_example: librocksdb transaction_example.cc options_file_example: librocksdb options_file_example.cc $(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS) +multi_processes_example: librocksdb multi_processes_example.cc + $(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS) + clean: - rm -rf ./simple_example ./column_families_example ./compact_files_example ./compaction_filter_example ./c_simple_example c_simple_example.o ./optimistic_transaction_example ./transaction_example ./options_file_example + rm -rf ./simple_example ./column_families_example ./compact_files_example ./compaction_filter_example ./c_simple_example c_simple_example.o ./optimistic_transaction_example ./transaction_example 
./options_file_example ./multi_processes_example

librocksdb:
	cd .. && $(MAKE) static_lib
diff --git a/ceph/src/rocksdb/examples/multi_processes_example.cc b/ceph/src/rocksdb/examples/multi_processes_example.cc
new file mode 100644
index 000000000..b1c1d02ba
--- /dev/null
+++ b/ceph/src/rocksdb/examples/multi_processes_example.cc
@@ -0,0 +1,395 @@
+// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
+// This source code is licensed under both the GPLv2 (found in the
+// COPYING file in the root directory) and Apache 2.0 License
+// (found in the LICENSE.Apache file in the root directory).
+
+// How to use this example
+// Open two terminals. In one of them, run `./multi_processes_example 0` to
+// start a process running the primary instance. This will create a new DB in
+// kDBPath. The process will run for a while, inserting keys into the normal
+// RocksDB database.
+// Next, go to the other terminal and run `./multi_processes_example 1` to
+// start a process running the secondary instance. This will create a secondary
+// instance following the aforementioned primary instance. This process will
+// run for a while, tailing the logs of the primary. After the process with the
+// primary instance exits, this process will keep running until you hit
+// 'CTRL+C'.
+
+#include <inttypes.h>
+#include <atomic>
+#include <cassert>
+#include <chrono>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <ctime>
+#include <string>
+#include <thread>
+#include <vector>
+
+#if defined(OS_LINUX)
+#include <dirent.h>
+#include <signal.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#endif  // OS_LINUX
+
+#include "rocksdb/db.h"
+#include "rocksdb/options.h"
+#include "rocksdb/slice.h"
+
+using rocksdb::ColumnFamilyDescriptor;
+using rocksdb::ColumnFamilyHandle;
+using rocksdb::ColumnFamilyOptions;
+using rocksdb::DB;
+using rocksdb::FlushOptions;
+using rocksdb::Iterator;
+using rocksdb::Options;
+using rocksdb::ReadOptions;
+using rocksdb::Slice;
+using rocksdb::Status;
+using rocksdb::WriteOptions;
+
+const std::string kDBPath = "/tmp/rocksdb_multi_processes_example";
+const std::string kPrimaryStatusFile =
+    "/tmp/rocksdb_multi_processes_example_primary_status";
+const uint64_t kMaxKey = 600000;
+const size_t kMaxValueLength = 256;
+const size_t kNumKeysPerFlush = 1000;
+
+const std::vector<std::string>& GetColumnFamilyNames() {
+  static std::vector<std::string> column_family_names = {
+      rocksdb::kDefaultColumnFamilyName, "pikachu"};
+  return column_family_names;
+}
+
+inline bool IsLittleEndian() {
+  uint32_t x = 1;
+  return *reinterpret_cast<char*>(&x) != 0;
+}
+
+static std::atomic<int>& ShouldSecondaryWait() {
+  static std::atomic<int> should_secondary_wait{1};
+  return should_secondary_wait;
+}
+
+// Encodes k as a big-endian byte string so that the lexicographic order of
+// the encoded keys matches the numeric order of the integers.
+static std::string Key(uint64_t k) {
+  std::string ret;
+  if (IsLittleEndian()) {
+    ret.append(reinterpret_cast<char*>(&k), sizeof(k));
+  } else {
+    char buf[sizeof(k)];
+    buf[0] = k & 0xff;
+    buf[1] = (k >> 8) & 0xff;
+    buf[2] = (k >> 16) & 0xff;
+    buf[3] = (k >> 24) & 0xff;
+    buf[4] = (k >> 32) & 0xff;
+    buf[5] = (k >> 40) & 0xff;
+    buf[6] = (k >> 48) & 0xff;
+    buf[7] = (k >> 56) & 0xff;
+    ret.append(buf, sizeof(k));
+  }
+  size_t i = 0, j = ret.size() - 1;
+  while (i < j) {
+    char tmp = ret[i];
+    ret[i] = ret[j];
+    ret[j] = tmp;
+    ++i;
+    --j;
+  }
+  return ret;
+}
+
+// Inverse of the above: decodes a big-endian byte string back to an integer.
+static uint64_t Key(std::string key) {
+  assert(key.size() == sizeof(uint64_t));
+  size_t i = 0, j = key.size() - 1;
+  while (i < j) {
+    char tmp = key[i];
+    key[i] = key[j];
+    key[j] = tmp;
+    ++i;
+    --j;
+  }
+  uint64_t ret = 0;
+  if (IsLittleEndian()) {
+    memcpy(&ret, key.c_str(), sizeof(uint64_t));
+  } else {
+    const char* buf = key.c_str();
+    ret |= static_cast<uint64_t>(buf[0]);
+    ret |= (static_cast<uint64_t>(buf[1]) << 8);
+    ret |= (static_cast<uint64_t>(buf[2]) << 16);
+    ret |= (static_cast<uint64_t>(buf[3]) << 24);
+    ret |= (static_cast<uint64_t>(buf[4]) << 32);
+    ret |= (static_cast<uint64_t>(buf[5]) << 40);
+    ret |= (static_cast<uint64_t>(buf[6]) << 48);
+    ret |= (static_cast<uint64_t>(buf[7]) << 56);
+  }
+  return ret;
+}
+
+static Slice GenerateRandomValue(const size_t max_length, char scratch[]) {
+  size_t sz = 1 + (std::rand() % max_length);
+  int rnd = std::rand();
+  for (size_t i = 0; i != sz; ++i) {
+    scratch[i] = static_cast<char>(rnd ^ i);
+  }
+  return Slice(scratch, sz);
+}
+
+static bool ShouldCloseDB() { return true; }
+
+// TODO: port this example to other systems. It should be straightforward for
+// POSIX-compliant systems.
+#if defined(OS_LINUX)
+void CreateDB() {
+  long my_pid = static_cast<long>(getpid());
+  Options options;
+  Status s = rocksdb::DestroyDB(kDBPath, options);
+  if (!s.ok()) {
+    fprintf(stderr, "[process %ld] Failed to destroy DB: %s\n", my_pid,
+            s.ToString().c_str());
+    assert(false);
+  }
+  options.create_if_missing = true;
+  DB* db = nullptr;
+  s = DB::Open(options, kDBPath, &db);
+  if (!s.ok()) {
+    fprintf(stderr, "[process %ld] Failed to open DB: %s\n", my_pid,
+            s.ToString().c_str());
+    assert(false);
+  }
+  std::vector<ColumnFamilyHandle*> handles;
+  ColumnFamilyOptions cf_opts(options);
+  for (const auto& cf_name : GetColumnFamilyNames()) {
+    if (rocksdb::kDefaultColumnFamilyName != cf_name) {
+      ColumnFamilyHandle* handle = nullptr;
+      s = db->CreateColumnFamily(cf_opts, cf_name, &handle);
+      if (!s.ok()) {
+        fprintf(stderr, "[process %ld] Failed to create CF %s: %s\n", my_pid,
+                cf_name.c_str(), s.ToString().c_str());
+        assert(false);
+      }
+      handles.push_back(handle);
+    }
+  }
+  fprintf(stdout, "[process %ld] Column families created\n", my_pid);
+  for (auto h : handles) {
+    delete h;
+  }
+  handles.clear();
+  delete db;
+}
+
+void RunPrimary() {
+  long my_pid = static_cast<long>(getpid());
+  fprintf(stdout, "[process %ld] Primary instance starts\n", my_pid);
+  CreateDB();
+  std::srand(time(nullptr));
+  DB* db = nullptr;
+  Options options;
+  options.create_if_missing = false;
+  std::vector<ColumnFamilyDescriptor> column_families;
+  for (const auto& cf_name : GetColumnFamilyNames()) {
+    column_families.push_back(ColumnFamilyDescriptor(cf_name, options));
+  }
+  std::vector<ColumnFamilyHandle*> handles;
+  WriteOptions write_opts;
+  char val_buf[kMaxValueLength] = {0};
+  uint64_t curr_key = 0;
+  while (curr_key < kMaxKey) {
+    Status s;
+    if (nullptr == db) {
+      s = DB::Open(options, kDBPath, column_families, &handles, &db);
+      if (!s.ok()) {
+        fprintf(stderr, "[process %ld] Failed to open DB: %s\n", my_pid,
+                s.ToString().c_str());
+        assert(false);
+      }
+    }
+    assert(nullptr != db);
+    assert(handles.size() == GetColumnFamilyNames().size());
+    for (auto h : handles) {
+      assert(nullptr != h);
+      for (size_t i = 0; i != kNumKeysPerFlush; ++i) {
+        // Key() returns a temporary std::string; keep it alive in a local
+        // instead of binding a Slice to it, which would dangle.
+        std::string key = Key(curr_key + static_cast<uint64_t>(i));
+        Slice value = GenerateRandomValue(kMaxValueLength, val_buf);
+        s = db->Put(write_opts, h, key, value);
+        if (!s.ok()) {
+          fprintf(stderr, "[process %ld] Failed to insert\n", my_pid);
+          assert(false);
+        }
+      }
+      s = db->Flush(FlushOptions(), h);
+      if (!s.ok()) {
+        fprintf(stderr, "[process %ld] Failed to flush\n", my_pid);
+        assert(false);
+      }
+    }
+    curr_key += static_cast<uint64_t>(kNumKeysPerFlush);
+    if (ShouldCloseDB()) {
+      for (auto h : handles) {
+        delete h;
+      }
+      handles.clear();
+      delete db;
+      db = nullptr;
+    }
+  }
+  if (nullptr != db) {
+    for (auto h : handles) {
+      delete h;
+    }
+    handles.clear();
+    delete db;
+    db = nullptr;
+  }
+  fprintf(stdout, "[process %ld] Finished adding keys\n", my_pid);
+}
+
+void secondary_instance_sigint_handler(int /*signal*/) {
+  ShouldSecondaryWait().store(0, std::memory_order_relaxed);
+  fprintf(stdout, "\n");
+  fflush(stdout);
+}
+
+void RunSecondary() {
+  ::signal(SIGINT, secondary_instance_sigint_handler);
+  long my_pid = static_cast<long>(getpid());
+  const std::string kSecondaryPath =
+      "/tmp/rocksdb_multi_processes_example_secondary";
+  // Create directory if necessary
+  if (nullptr == opendir(kSecondaryPath.c_str())) {
+    int ret =
+        mkdir(kSecondaryPath.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
+    if (ret < 0) {
+      perror("failed to create directory for secondary instance");
+      exit(0);
+    }
+  }
+  DB* db = nullptr;
+  Options options;
+  options.create_if_missing = false;
+  options.max_open_files = -1;
+  Status s = DB::OpenAsSecondary(options, kDBPath, kSecondaryPath, &db);
+  if (!s.ok()) {
+    fprintf(stderr, "[process %ld] Failed to open in secondary mode: %s\n",
+            my_pid, s.ToString().c_str());
+    assert(false);
+  } else {
+    fprintf(stdout, "[process %ld] Secondary instance starts\n", my_pid);
+  }
+
+  ReadOptions ropts;
+  ropts.verify_checksums = true;
+  ropts.total_order_seek = true;
+
+  std::vector<std::thread> test_threads;
+  test_threads.emplace_back([&]() {
+    while (1 == ShouldSecondaryWait().load(std::memory_order_relaxed)) {
+      std::unique_ptr<Iterator> iter(db->NewIterator(ropts));
+      iter->SeekToFirst();
+      size_t count = 0;
+      for (; iter->Valid(); iter->Next()) {
+        ++count;
+      }
+    }
+    fprintf(stdout, "[process %ld] Range_scan thread finished\n", my_pid);
+  });
+
+  test_threads.emplace_back([&]() {
+    std::srand(time(nullptr));
+    while (1 == ShouldSecondaryWait().load(std::memory_order_relaxed)) {
+      // As in RunPrimary, hold the encoded key in a std::string so the
+      // Slice passed to Get() does not point at a destroyed temporary.
+      std::string key = Key(std::rand() % kMaxKey);
+      std::string value;
+      db->Get(ropts, key, &value);
+    }
+    fprintf(stdout, "[process %ld] Point lookup thread finished\n", my_pid);
+  });
+
+  uint64_t curr_key = 0;
+  while (1 == ShouldSecondaryWait().load(std::memory_order_relaxed)) {
+    s = db->TryCatchUpWithPrimary();
+    if (!s.ok()) {
+      fprintf(stderr,
+              "[process %ld] error while trying to catch up with "
+              "primary %s\n",
+              my_pid, s.ToString().c_str());
+      assert(false);
+    }
+    {
+      std::unique_ptr<Iterator> iter(db->NewIterator(ropts));
+      if (!iter) {
+        fprintf(stderr, "[process %ld] Failed to create iterator\n", my_pid);
+        assert(false);
+      }
+      iter->SeekToLast();
+      if (iter->Valid()) {
+        uint64_t curr_max_key = Key(iter->key().ToString());
+        if (curr_max_key != curr_key) {
+          fprintf(stdout, "[process %ld] Observed key %" PRIu64 "\n", my_pid,
+                  curr_max_key);
+          curr_key = curr_max_key;
+        }
+      }
+    }
+    std::this_thread::sleep_for(std::chrono::seconds(1));
+  }
+  s = db->TryCatchUpWithPrimary();
+  if (!s.ok()) {
+    fprintf(stderr,
+            "[process %ld] error while trying to catch up with "
+            "primary %s\n",
+            my_pid, s.ToString().c_str());
+    assert(false);
+  }
+
+  std::vector<ColumnFamilyDescriptor> column_families;
+  for (const auto& cf_name : GetColumnFamilyNames()) {
+    column_families.push_back(ColumnFamilyDescriptor(cf_name, options));
+  }
+  std::vector<ColumnFamilyHandle*> handles;
+  DB* verification_db = nullptr;
+  s = DB::OpenForReadOnly(options, kDBPath, column_families, &handles,
+                          &verification_db);
+  assert(s.ok());
+  Iterator* iter1 = verification_db->NewIterator(ropts);
+  iter1->SeekToFirst();
+
+  Iterator* iter = db->NewIterator(ropts);
+  iter->SeekToFirst();
+  for (; iter->Valid() && iter1->Valid(); iter->Next(), iter1->Next()) {
+    if (iter->key().ToString() != iter1->key().ToString()) {
+      fprintf(stderr, "%" PRIu64 " != %" PRIu64 "\n",
+              Key(iter->key().ToString()), Key(iter1->key().ToString()));
+      assert(false);
+    } else if (iter->value().ToString() != iter1->value().ToString()) {
fprintf(stderr, "Value mismatch\n"); + assert(false); + } + } + fprintf(stdout, "[process %ld] Verification succeeded\n", my_pid); + for (auto& thr : test_threads) { + thr.join(); + } + delete iter; + delete iter1; + delete db; + delete verification_db; +} + +int main(int argc, char** argv) { + if (argc < 2) { + fprintf(stderr, "%s <0 for primary, 1 for secondary>\n", argv[0]); + return 0; + } + if (atoi(argv[1]) == 0) { + RunPrimary(); + } else { + RunSecondary(); + } + return 0; +} +#else // OS_LINUX +int main() { + fpritnf(stderr, "Not implemented.\n"); + return 0; +} +#endif // !OS_LINUX diff --git a/ceph/src/rocksdb/hdfs/env_hdfs.h b/ceph/src/rocksdb/hdfs/env_hdfs.h index b0c9e33fd..903e32ef9 100644 --- a/ceph/src/rocksdb/hdfs/env_hdfs.h +++ b/ceph/src/rocksdb/hdfs/env_hdfs.h @@ -54,110 +54,109 @@ class HdfsEnv : public Env { hdfsDisconnect(fileSys_); } - virtual Status NewSequentialFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options); + Status NewSequentialFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override; - virtual Status NewRandomAccessFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options); + Status NewRandomAccessFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override; - virtual Status NewWritableFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options); + Status NewWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override; - virtual Status NewDirectory(const std::string& name, - std::unique_ptr* result); + Status NewDirectory(const std::string& name, + std::unique_ptr* result) override; - virtual Status FileExists(const std::string& fname); + Status FileExists(const std::string& fname) override; - virtual Status GetChildren(const std::string& path, - std::vector* result); + Status GetChildren(const std::string& path, + std::vector* result) override; - virtual Status DeleteFile(const std::string& fname); + Status DeleteFile(const std::string& fname) override; - virtual Status CreateDir(const std::string& name); + Status CreateDir(const std::string& name) override; - virtual Status CreateDirIfMissing(const std::string& name); + Status CreateDirIfMissing(const std::string& name) override; - virtual Status DeleteDir(const std::string& name); + Status DeleteDir(const std::string& name) override; - virtual Status GetFileSize(const std::string& fname, uint64_t* size); + Status GetFileSize(const std::string& fname, uint64_t* size) override; - virtual Status GetFileModificationTime(const std::string& fname, - uint64_t* file_mtime); + Status GetFileModificationTime(const std::string& fname, + uint64_t* file_mtime) override; - virtual Status RenameFile(const std::string& src, const std::string& target); + Status RenameFile(const std::string& src, const std::string& target) override; - virtual Status LinkFile(const std::string& src, const std::string& target) { + Status LinkFile(const std::string& /*src*/, + const std::string& /*target*/) override { return Status::NotSupported(); // not supported } - virtual Status LockFile(const std::string& fname, FileLock** lock); + Status LockFile(const std::string& fname, FileLock** lock) override; - virtual Status UnlockFile(FileLock* lock); + Status UnlockFile(FileLock* lock) override; - virtual Status NewLogger(const std::string& fname, - std::shared_ptr* result); + Status NewLogger(const std::string& fname, + std::shared_ptr* 
diff --git a/ceph/src/rocksdb/hdfs/env_hdfs.h b/ceph/src/rocksdb/hdfs/env_hdfs.h
index b0c9e33fd..903e32ef9 100644
--- a/ceph/src/rocksdb/hdfs/env_hdfs.h
+++ b/ceph/src/rocksdb/hdfs/env_hdfs.h
@@ -54,110 +54,109 @@ class HdfsEnv : public Env {
     hdfsDisconnect(fileSys_);
   }
 
-  virtual Status NewSequentialFile(const std::string& fname,
-                                   std::unique_ptr<SequentialFile>* result,
-                                   const EnvOptions& options);
+  Status NewSequentialFile(const std::string& fname,
+                           std::unique_ptr<SequentialFile>* result,
+                           const EnvOptions& options) override;
 
-  virtual Status NewRandomAccessFile(const std::string& fname,
-                                     std::unique_ptr<RandomAccessFile>* result,
-                                     const EnvOptions& options);
+  Status NewRandomAccessFile(const std::string& fname,
+                             std::unique_ptr<RandomAccessFile>* result,
+                             const EnvOptions& options) override;
 
-  virtual Status NewWritableFile(const std::string& fname,
-                                 std::unique_ptr<WritableFile>* result,
-                                 const EnvOptions& options);
+  Status NewWritableFile(const std::string& fname,
+                         std::unique_ptr<WritableFile>* result,
+                         const EnvOptions& options) override;
 
-  virtual Status NewDirectory(const std::string& name,
-                              std::unique_ptr<Directory>* result);
+  Status NewDirectory(const std::string& name,
+                      std::unique_ptr<Directory>* result) override;
 
-  virtual Status FileExists(const std::string& fname);
+  Status FileExists(const std::string& fname) override;
 
-  virtual Status GetChildren(const std::string& path,
-                             std::vector<std::string>* result);
+  Status GetChildren(const std::string& path,
+                     std::vector<std::string>* result) override;
 
-  virtual Status DeleteFile(const std::string& fname);
+  Status DeleteFile(const std::string& fname) override;
 
-  virtual Status CreateDir(const std::string& name);
+  Status CreateDir(const std::string& name) override;
 
-  virtual Status CreateDirIfMissing(const std::string& name);
+  Status CreateDirIfMissing(const std::string& name) override;
 
-  virtual Status DeleteDir(const std::string& name);
+  Status DeleteDir(const std::string& name) override;
 
-  virtual Status GetFileSize(const std::string& fname, uint64_t* size);
+  Status GetFileSize(const std::string& fname, uint64_t* size) override;
 
-  virtual Status GetFileModificationTime(const std::string& fname,
-                                         uint64_t* file_mtime);
+  Status GetFileModificationTime(const std::string& fname,
+                                 uint64_t* file_mtime) override;
 
-  virtual Status RenameFile(const std::string& src, const std::string& target);
+  Status RenameFile(const std::string& src, const std::string& target) override;
 
-  virtual Status LinkFile(const std::string& src, const std::string& target) {
+  Status LinkFile(const std::string& /*src*/,
+                  const std::string& /*target*/) override {
     return Status::NotSupported();  // not supported
   }
 
-  virtual Status LockFile(const std::string& fname, FileLock** lock);
+  Status LockFile(const std::string& fname, FileLock** lock) override;
 
-  virtual Status UnlockFile(FileLock* lock);
+  Status UnlockFile(FileLock* lock) override;
 
-  virtual Status NewLogger(const std::string& fname,
-                           std::shared_ptr<Logger>* result);
+  Status NewLogger(const std::string& fname,
+                   std::shared_ptr<Logger>* result) override;
 
-  virtual void Schedule(void (*function)(void* arg), void* arg,
-                        Priority pri = LOW, void* tag = nullptr, void (*unschedFunction)(void* arg) = 0) {
+  void Schedule(void (*function)(void* arg), void* arg, Priority pri = LOW,
+                void* tag = nullptr,
+                void (*unschedFunction)(void* arg) = 0) override {
    posixEnv->Schedule(function, arg, pri, tag, unschedFunction);
   }
 
-  virtual int UnSchedule(void* tag, Priority pri) {
+  int UnSchedule(void* tag, Priority pri) override {
     return posixEnv->UnSchedule(tag, pri);
   }
 
-  virtual void StartThread(void (*function)(void* arg), void* arg) {
+  void StartThread(void (*function)(void* arg), void* arg) override {
     posixEnv->StartThread(function, arg);
   }
 
-  virtual void WaitForJoin() { posixEnv->WaitForJoin(); }
+  void WaitForJoin() override { posixEnv->WaitForJoin(); }
 
-  virtual unsigned int GetThreadPoolQueueLen(Priority pri = LOW) const
-      override {
+  unsigned int GetThreadPoolQueueLen(Priority pri = LOW) const override {
     return posixEnv->GetThreadPoolQueueLen(pri);
   }
 
-  virtual Status GetTestDirectory(std::string* path) {
+  Status GetTestDirectory(std::string* path) override {
     return posixEnv->GetTestDirectory(path);
   }
 
-  virtual uint64_t NowMicros() {
-    return posixEnv->NowMicros();
-  }
+  uint64_t NowMicros() override { return posixEnv->NowMicros(); }
 
-  virtual void SleepForMicroseconds(int micros) {
+  void SleepForMicroseconds(int micros) override {
     posixEnv->SleepForMicroseconds(micros);
   }
 
-  virtual Status GetHostName(char* name, uint64_t len) {
+  Status GetHostName(char* name, uint64_t len) override {
     return posixEnv->GetHostName(name, len);
   }
 
-  virtual Status GetCurrentTime(int64_t* unix_time) {
+  Status GetCurrentTime(int64_t* unix_time) override {
     return posixEnv->GetCurrentTime(unix_time);
   }
 
-  virtual Status GetAbsolutePath(const std::string& db_path,
-                                 std::string* output_path) {
+  Status GetAbsolutePath(const std::string& db_path,
+                         std::string* output_path) override {
     return posixEnv->GetAbsolutePath(db_path, output_path);
   }
 
-  virtual void SetBackgroundThreads(int number, Priority pri = LOW) {
+  void SetBackgroundThreads(int number, Priority pri = LOW) override {
    posixEnv->SetBackgroundThreads(number, pri);
   }
 
-  virtual int GetBackgroundThreads(Priority pri = LOW) {
+  int GetBackgroundThreads(Priority pri = LOW) override {
     return posixEnv->GetBackgroundThreads(pri);
   }
 
-  virtual void IncBackgroundThreadsIfNeeded(int number, Priority pri) override {
+  void IncBackgroundThreadsIfNeeded(int number, Priority pri) override {
     posixEnv->IncBackgroundThreadsIfNeeded(number, pri);
   }
 
-  virtual std::string TimeToString(uint64_t number) {
+  std::string TimeToString(uint64_t number) override {
     return posixEnv->TimeToString(number);
   }
 
@@ -166,9 +165,7 @@ class HdfsEnv : public Env {
     return (uint64_t)pthread_self();
   }
 
-  virtual uint64_t GetThreadID() const override {
-    return HdfsEnv::gettid();
-  }
+  uint64_t GetThreadID() const override { return HdfsEnv::gettid(); }
 
  private:
  std::string fsname_;  // string of the form "hdfs://hostname:port/"
@@ -206,7 +203,7 @@ class HdfsEnv : public Env {
     std::string host(parts[0]);
     std::string remaining(parts[1]);
 
-    int rem = remaining.find(pathsep);
+    int rem = static_cast<int>(remaining.find(pathsep));
     std::string portStr = (rem == 0 ? remaining : remaining.substr(0, rem));
@@ -255,23 +252,24 @@ class HdfsEnv : public Env {
   }
 
   virtual Status NewSequentialFile(const std::string& fname,
-                                   unique_ptr<SequentialFile>* result,
+                                   std::unique_ptr<SequentialFile>* result,
                                    const EnvOptions& options) override;
 
-  virtual Status NewRandomAccessFile(const std::string& /*fname*/,
-                                     unique_ptr<RandomAccessFile>* /*result*/,
-                                     const EnvOptions& /*options*/) override {
+  virtual Status NewRandomAccessFile(
+      const std::string& /*fname*/,
+      std::unique_ptr<RandomAccessFile>* /*result*/,
+      const EnvOptions& /*options*/) override {
     return notsup;
   }
 
   virtual Status NewWritableFile(const std::string& /*fname*/,
-                                 unique_ptr<WritableFile>* /*result*/,
+                                 std::unique_ptr<WritableFile>* /*result*/,
                                  const EnvOptions& /*options*/) override {
     return notsup;
   }
 
   virtual Status NewDirectory(const std::string& /*name*/,
-                              unique_ptr<Directory>* /*result*/) override {
+                              std::unique_ptr<Directory>* /*result*/) override {
     return notsup;
   }
 
@@ -328,7 +326,7 @@ class HdfsEnv : public Env {
   virtual Status UnlockFile(FileLock* /*lock*/) override { return notsup; }
 
   virtual Status NewLogger(const std::string& /*fname*/,
-                           shared_ptr<Logger>* /*result*/) override {
+                           std::shared_ptr<Logger>* /*result*/) override {
     return notsup;
   }
 
diff --git a/ceph/src/rocksdb/hdfs/setup.sh b/ceph/src/rocksdb/hdfs/setup.sh
old mode 100644
new mode 100755
index f071b7e31..ba76ec209
--- a/ceph/src/rocksdb/hdfs/setup.sh
+++ b/ceph/src/rocksdb/hdfs/setup.sh
@@ -1,8 +1,8 @@
 # shellcheck disable=SC2148
 export USE_HDFS=1
-export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:/usr/lib/hadoop/lib/native
+export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$HADOOP_HOME/lib/native
 
-export CLASSPATH=
+export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
 for f in `find /usr/lib/hadoop-hdfs | grep jar`; do export CLASSPATH=$CLASSPATH:$f; done
 for f in `find /usr/lib/hadoop | grep jar`; do export CLASSPATH=$CLASSPATH:$f; done
 for f in `find /usr/lib/hadoop/client | grep jar`; do export CLASSPATH=$CLASSPATH:$f; done
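The advanced_options.h hunk below drops `ttl` from `CompactionOptionsFIFO`; judging from the `ttl` comment rewritten later in the same header, TTL-driven FIFO deletion is now configured through the column family's own `ttl` field. A rough sketch of the post-change configuration (the values are illustrative):

```cpp
#include "rocksdb/options.h"

rocksdb::ColumnFamilyOptions MakeFifoCfOptions() {
  rocksdb::ColumnFamilyOptions cf_opts;
  cf_opts.compaction_style = rocksdb::kCompactionStyleFIFO;
  // ttl no longer rides along in the FIFO options...
  cf_opts.compaction_options_fifo = rocksdb::CompactionOptionsFIFO(
      1024 * 1024 * 1024 /* max_table_files_size: 1GB */,
      false /* allow_compaction */);
  // ...it is a column family option now: FIFO deletes files older than this.
  cf_opts.ttl = 24 * 60 * 60;  // 1 day, in seconds
  return cf_opts;
}
```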
diff --git a/ceph/src/rocksdb/include/rocksdb/advanced_options.h b/ceph/src/rocksdb/include/rocksdb/advanced_options.h
index 940a6f6b7..b7ab7c584 100644
--- a/ceph/src/rocksdb/include/rocksdb/advanced_options.h
+++ b/ceph/src/rocksdb/include/rocksdb/advanced_options.h
@@ -62,13 +62,6 @@ struct CompactionOptionsFIFO {
   // Default: 1GB
   uint64_t max_table_files_size;
 
-  // Drop files older than TTL. TTL based deletion will take precedence over
-  // size based deletion if ttl > 0.
-  // delete if sst_file_creation_time < (current_time - ttl)
-  // unit: seconds. Ex: 1 day = 1 * 24 * 60 * 60
-  // Default: 0 (disabled)
-  uint64_t ttl = 0;
-
   // If true, try to do compaction to compact smaller files into larger ones.
   // Minimum files to compact follows options.level0_file_num_compaction_trigger
   // and compaction won't trigger if average compact bytes per del file is
@@ -78,10 +71,8 @@ struct CompactionOptionsFIFO {
   bool allow_compaction = false;
 
   CompactionOptionsFIFO() : max_table_files_size(1 * 1024 * 1024 * 1024) {}
-  CompactionOptionsFIFO(uint64_t _max_table_files_size, bool _allow_compaction,
-                        uint64_t _ttl = 0)
+  CompactionOptionsFIFO(uint64_t _max_table_files_size, bool _allow_compaction)
       : max_table_files_size(_max_table_files_size),
-        ttl(_ttl),
         allow_compaction(_allow_compaction) {}
 };
 
@@ -281,6 +272,15 @@ struct AdvancedColumnFamilyOptions {
   // Dynamically changeable through SetOptions() API
   double memtable_prefix_bloom_size_ratio = 0.0;
 
+  // Enable whole key bloom filter in memtable. Note this will only take effect
+  // if memtable_prefix_bloom_size_ratio is not 0. Enabling whole key filtering
+  // can potentially reduce CPU usage for point lookups.
+  //
+  // Default: false (disabled)
+  //
+  // Dynamically changeable through SetOptions() API
+  bool memtable_whole_key_filtering = false;
+
   // Page size for huge page for the arena used by the memtable. If <=0, it
   // won't allocate from huge page but from malloc.
   // Users are responsible to reserve huge pages for it to be allocated. For
@@ -413,6 +413,7 @@ struct AdvancedColumnFamilyOptions {
   // of the level.
   // At the same time max_bytes_for_level_multiplier and
   // max_bytes_for_level_multiplier_additional are still satisfied.
+  // (When L0 is too large, we make some adjustment. See below.)
   //
   // With this option on, from an empty DB, we make last level the base level,
   // which means merging L0 data into the last level, until it exceeds
@@ -451,6 +452,29 @@ struct AdvancedColumnFamilyOptions {
   // max_bytes_for_level_base, for a more predictable LSM tree shape. It is
   // useful to limit worst-case space amplification.
   //
+  //
+  // If compaction from L0 lags behind, a special mode is turned on that
+  // prioritizes write amplification over max_bytes_for_level_multiplier and
+  // max_bytes_for_level_base. Whether L0 compaction is lagging is determined
+  // from the number of L0 files and the total L0 size: if the number of L0
+  // files is at least double level0_file_num_compaction_trigger, or the total
+  // size is at least max_bytes_for_level_base, this mode is on. The target of
+  // L1 grows to the actual data size in L0, and the targets of the other
+  // levels are then determined so that every level has the same level
+  // multiplier.
+  //
+  // For example, assume L0 size is 100MB, the size of the last level is
+  // 1600MB, max_bytes_for_level_base = 80MB, and
+  // max_bytes_for_level_multiplier = 10. Since the L0 size is larger than
+  // max_bytes_for_level_base, this is the L0-compaction-backlogged mode, so
+  // the L1 target is set to 100MB. Based on
+  // max_bytes_for_level_multiplier = 10, at least 3 non-0 levels will be
+  // needed. The level multiplier is then calculated to be 4 and the three
+  // levels' targets to be [100MB, 400MB, 1600MB].
+  //
+  // In this mode, the number of levels will be no more than in the normal
+  // mode, and the level multiplier will be lower. Write amplification will
+  // likely be reduced.
+  //
+  //
   // max_bytes_for_level_multiplier_additional is ignored with this flag on.
   //
   // Turning this feature on or off for an existing DB can cause unexpected
@@ -478,19 +502,25 @@ struct AdvancedColumnFamilyOptions {
   // threshold. But it's not guaranteed.
   // Value 0 will be sanitized.
   //
-  // Default: result.target_file_size_base * 25
+  // Default: target_file_size_base * 25
+  //
+  // Dynamically changeable through SetOptions() API
   uint64_t max_compaction_bytes = 0;
 
   // All writes will be slowed down to at least delayed_write_rate if estimated
   // bytes needed to be compacted exceed this threshold.
   //
   // Default: 64GB
+  //
+  // Dynamically changeable through SetOptions() API
   uint64_t soft_pending_compaction_bytes_limit = 64 * 1073741824ull;
 
   // All writes are stopped if estimated bytes needed to be compacted exceed
   // this threshold.
   //
   // Default: 256GB
+  //
+  // Dynamically changeable through SetOptions() API
   uint64_t hard_pending_compaction_bytes_limit = 256 * 1073741824ull;
 
   // The compaction style. Default: kCompactionStyleLevel
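The backlogged-L0 arithmetic in the comment above is easy to check by hand; the following standalone sketch (not RocksDB's actual code) reproduces the worked example, picking the fewest levels whose multiplier stays within the configured bound and then solving for the actual multiplier:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const double l1_target = 100.0;      // MB, grown to match the actual L0 size
  const double last_level = 1600.0;    // MB
  const double max_multiplier = 10.0;  // max_bytes_for_level_multiplier
  // Fewest multiplicative steps that keep the multiplier <= the bound.
  const int steps = static_cast<int>(
      std::ceil(std::log(last_level / l1_target) / std::log(max_multiplier)));
  const double multiplier = std::pow(last_level / l1_target, 1.0 / steps);
  std::vector<double> targets{l1_target};
  for (int i = 0; i < steps; ++i) {
    targets.push_back(targets.back() * multiplier);
  }
  for (double t : targets) {
    std::printf("%.0fMB ", t);  // prints: 100MB 400MB 1600MB
  }
  std::printf("\n");
  return 0;
}
```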
@@ -498,17 +528,21 @@ struct AdvancedColumnFamilyOptions {
 
   // If level compaction_style = kCompactionStyleLevel, for each level,
   // which files are prioritized to be picked to compact.
-  // Default: kByCompensatedSize
-  CompactionPri compaction_pri = kByCompensatedSize;
+  // Default: kMinOverlappingRatio
+  CompactionPri compaction_pri = kMinOverlappingRatio;
 
   // The options needed to support Universal Style compactions
+  //
+  // Dynamically changeable through SetOptions() API
+  // Dynamic change example:
+  //     SetOptions("compaction_options_universal", "{size_ratio=2;}")
   CompactionOptionsUniversal compaction_options_universal;
 
   // The options for FIFO compaction style
   //
   // Dynamically changeable through SetOptions() API
   // Dynamic change example:
-  //     SetOption("compaction_options_fifo", "{max_table_files_size=100;ttl=2;}")
+  //     SetOptions("compaction_options_fifo", "{max_table_files_size=100;}")
   CompactionOptionsFIFO compaction_options_fifo;
 
   // An iterator->Next() sequentially skips over keys with the same
@@ -578,7 +612,10 @@ struct AdvancedColumnFamilyOptions {
   bool optimize_filters_for_hits = false;
 
   // After writing every SST file, reopen it and read all the keys.
+  //
   // Default: false
+  //
+  // Dynamically changeable through SetOptions() API
   bool paranoid_file_checks = false;
 
   // In debug mode, RocksDB runs consistency checks on the LSM every time the LSM
@@ -588,18 +625,31 @@ struct AdvancedColumnFamilyOptions {
   bool force_consistency_checks = false;
 
   // Measure IO stats in compactions and flushes, if true.
+  //
   // Default: false
+  //
+  // Dynamically changeable through SetOptions() API
   bool report_bg_io_stats = false;
 
-  // Non-bottom-level files older than TTL will go through the compaction
-  // process. This needs max_open_files to be set to -1.
-  // Enabled only for level compaction for now.
+  // Files older than TTL will go through the compaction process.
+  // Supported in Level and FIFO compaction.
+  // Pre-req: This needs max_open_files to be set to -1.
+  // In Level: Non-bottom-level files older than TTL will go through the
+  //           compaction process.
+  // In FIFO: Files older than TTL will be deleted.
+  // unit: seconds. Ex: 1 day = 1 * 24 * 60 * 60
   //
   // Default: 0 (disabled)
   //
   // Dynamically changeable through SetOptions() API
   uint64_t ttl = 0;
 
+  // If this option is set then 1 in N blocks are compressed
+  // using a fast (lz4) and a slow (zstd) compression algorithm.
+  // The compressibility is reported as stats and the stored
+  // data is left uncompressed (unless compression is also requested).
+ uint64_t sample_for_compression = 0; + // Create ColumnFamilyOptions with default values for all fields AdvancedColumnFamilyOptions(); // Create ColumnFamilyOptions from Options diff --git a/ceph/src/rocksdb/include/rocksdb/c.h b/ceph/src/rocksdb/include/rocksdb/c.h index 0899ed625..05699492c 100644 --- a/ceph/src/rocksdb/include/rocksdb/c.h +++ b/ceph/src/rocksdb/include/rocksdb/c.h @@ -637,7 +637,6 @@ extern ROCKSDB_LIBRARY_API rocksdb_iterator_t* rocksdb_writebatch_wi_create_iter rocksdb_iterator_t* base_iterator, rocksdb_column_family_handle_t* cf); - /* Block based table options */ extern ROCKSDB_LIBRARY_API rocksdb_block_based_table_options_t* @@ -729,6 +728,9 @@ extern ROCKSDB_LIBRARY_API void rocksdb_options_set_cuckoo_table_factory( extern ROCKSDB_LIBRARY_API void rocksdb_set_options( rocksdb_t* db, int count, const char* const keys[], const char* const values[], char** errptr); +extern ROCKSDB_LIBRARY_API void rocksdb_set_options_cf( + rocksdb_t* db, rocksdb_column_family_handle_t* handle, int count, const char* const keys[], const char* const values[], char** errptr); + extern ROCKSDB_LIBRARY_API rocksdb_options_t* rocksdb_options_create(); extern ROCKSDB_LIBRARY_API void rocksdb_options_destroy(rocksdb_options_t*); extern ROCKSDB_LIBRARY_API void rocksdb_options_increase_parallelism( @@ -1422,6 +1424,10 @@ extern ROCKSDB_LIBRARY_API const char* rocksdb_livefiles_smallestkey( const rocksdb_livefiles_t*, int index, size_t* size); extern ROCKSDB_LIBRARY_API const char* rocksdb_livefiles_largestkey( const rocksdb_livefiles_t*, int index, size_t* size); +extern ROCKSDB_LIBRARY_API uint64_t rocksdb_livefiles_entries( + const rocksdb_livefiles_t*, int index); +extern ROCKSDB_LIBRARY_API uint64_t rocksdb_livefiles_deletions( + const rocksdb_livefiles_t*, int index); extern ROCKSDB_LIBRARY_API void rocksdb_livefiles_destroy( const rocksdb_livefiles_t*); @@ -1453,6 +1459,13 @@ extern ROCKSDB_LIBRARY_API rocksdb_transactiondb_t* rocksdb_transactiondb_open( const rocksdb_transactiondb_options_t* txn_db_options, const char* name, char** errptr); +rocksdb_transactiondb_t* rocksdb_transactiondb_open_column_families( + const rocksdb_options_t* options, + const rocksdb_transactiondb_options_t* txn_db_options, const char* name, + int num_column_families, const char** column_family_names, + const rocksdb_options_t** column_family_options, + rocksdb_column_family_handle_t** column_family_handles, char** errptr); + extern ROCKSDB_LIBRARY_API const rocksdb_snapshot_t* rocksdb_transactiondb_create_snapshot(rocksdb_transactiondb_t* txn_db); @@ -1498,6 +1511,11 @@ extern ROCKSDB_LIBRARY_API char* rocksdb_transaction_get_for_update( const char* key, size_t klen, size_t* vlen, unsigned char exclusive, char** errptr); +char* rocksdb_transaction_get_for_update_cf( + rocksdb_transaction_t* txn, const rocksdb_readoptions_t* options, + rocksdb_column_family_handle_t* column_family, const char* key, size_t klen, + size_t* vlen, unsigned char exclusive, char** errptr); + extern ROCKSDB_LIBRARY_API char* rocksdb_transactiondb_get( rocksdb_transactiondb_t* txn_db, const rocksdb_readoptions_t* options, const char* key, size_t klen, size_t* vlen, char** errptr); @@ -1532,10 +1550,19 @@ extern ROCKSDB_LIBRARY_API void rocksdb_transaction_merge( rocksdb_transaction_t* txn, const char* key, size_t klen, const char* val, size_t vlen, char** errptr); +extern ROCKSDB_LIBRARY_API void rocksdb_transaction_merge_cf( + rocksdb_transaction_t* txn, rocksdb_column_family_handle_t* column_family, + const char* key, size_t 
klen, const char* val, size_t vlen, char** errptr); + extern ROCKSDB_LIBRARY_API void rocksdb_transactiondb_merge( rocksdb_transactiondb_t* txn_db, const rocksdb_writeoptions_t* options, const char* key, size_t klen, const char* val, size_t vlen, char** errptr); +extern ROCKSDB_LIBRARY_API void rocksdb_transactiondb_merge_cf( + rocksdb_transactiondb_t* txn_db, const rocksdb_writeoptions_t* options, + rocksdb_column_family_handle_t* column_family, const char* key, size_t klen, + const char* val, size_t vlen, char** errptr); + extern ROCKSDB_LIBRARY_API void rocksdb_transaction_delete( rocksdb_transaction_t* txn, const char* key, size_t klen, char** errptr); @@ -1565,6 +1592,11 @@ extern ROCKSDB_LIBRARY_API rocksdb_iterator_t* rocksdb_transactiondb_create_iterator(rocksdb_transactiondb_t* txn_db, const rocksdb_readoptions_t* options); +extern ROCKSDB_LIBRARY_API rocksdb_iterator_t* +rocksdb_transactiondb_create_iterator_cf( + rocksdb_transactiondb_t* txn_db, const rocksdb_readoptions_t* options, + rocksdb_column_family_handle_t* column_family); + extern ROCKSDB_LIBRARY_API void rocksdb_transactiondb_close( rocksdb_transactiondb_t* txn_db); diff --git a/ceph/src/rocksdb/include/rocksdb/cache.h b/ceph/src/rocksdb/include/rocksdb/cache.h index da3b934d8..ed7790aeb 100644 --- a/ceph/src/rocksdb/include/rocksdb/cache.h +++ b/ceph/src/rocksdb/include/rocksdb/cache.h @@ -25,6 +25,7 @@ #include #include #include +#include "rocksdb/memory_allocator.h" #include "rocksdb/slice.h" #include "rocksdb/statistics.h" #include "rocksdb/status.h" @@ -33,6 +34,8 @@ namespace rocksdb { class Cache; +extern const bool kDefaultToAdaptiveMutex; + struct LRUCacheOptions { // Capacity of the cache. size_t capacity = 0; @@ -58,13 +61,32 @@ struct LRUCacheOptions { // BlockBasedTableOptions::cache_index_and_filter_blocks_with_high_priority. double high_pri_pool_ratio = 0.0; + // If non-nullptr will use this allocator instead of system allocator when + // allocating memory for cache blocks. Call this method before you start using + // the cache! + // + // Caveat: when the cache is used as block cache, the memory allocator is + // ignored when dealing with compression libraries that allocate memory + // internally (currently only XPRESS). + std::shared_ptr memory_allocator; + + // Whether to use adaptive mutexes for cache shards. Note that adaptive + // mutexes need to be supported by the platform in order for this to have any + // effect. The default value is true if RocksDB is compiled with + // -DROCKSDB_DEFAULT_TO_ADAPTIVE_MUTEX, false otherwise. + bool use_adaptive_mutex = kDefaultToAdaptiveMutex; + LRUCacheOptions() {} LRUCacheOptions(size_t _capacity, int _num_shard_bits, - bool _strict_capacity_limit, double _high_pri_pool_ratio) + bool _strict_capacity_limit, double _high_pri_pool_ratio, + std::shared_ptr _memory_allocator = nullptr, + bool _use_adaptive_mutex = kDefaultToAdaptiveMutex) : capacity(_capacity), num_shard_bits(_num_shard_bits), strict_capacity_limit(_strict_capacity_limit), - high_pri_pool_ratio(_high_pri_pool_ratio) {} + high_pri_pool_ratio(_high_pri_pool_ratio), + memory_allocator(std::move(_memory_allocator)), + use_adaptive_mutex(_use_adaptive_mutex) {} }; // Create a new cache with a fixed size capacity. The cache is sharded @@ -75,10 +97,11 @@ struct LRUCacheOptions { // high_pri_pool_pct. // num_shard_bits = -1 means it is automatically determined: every shard // will be at least 512KB and number of shard bits will not exceed 6. 
-extern std::shared_ptr<Cache> NewLRUCache(size_t capacity,
-                                          int num_shard_bits = -1,
-                                          bool strict_capacity_limit = false,
-                                          double high_pri_pool_ratio = 0.0);
+extern std::shared_ptr<Cache> NewLRUCache(
+    size_t capacity, int num_shard_bits = -1,
+    bool strict_capacity_limit = false, double high_pri_pool_ratio = 0.0,
+    std::shared_ptr<MemoryAllocator> memory_allocator = nullptr,
+    bool use_adaptive_mutex = kDefaultToAdaptiveMutex);
 
 extern std::shared_ptr<Cache> NewLRUCache(const LRUCacheOptions& cache_opts);
 
@@ -97,7 +120,8 @@ class Cache {
   // likely to get evicted than low priority entries.
   enum class Priority { HIGH, LOW };
 
-  Cache() {}
+  Cache(std::shared_ptr<MemoryAllocator> allocator = nullptr)
+      : memory_allocator_(std::move(allocator)) {}
 
   // Destroys all existing entries by calling the "deleter"
   // function that was passed via the Insert() function.
@@ -228,10 +252,14 @@ class Cache {
   virtual void TEST_mark_as_data_block(const Slice& /*key*/,
                                        size_t /*charge*/) {}
 
+  MemoryAllocator* memory_allocator() const { return memory_allocator_.get(); }
+
  private:
   // No copying allowed
   Cache(const Cache&);
   Cache& operator=(const Cache&);
+
+  std::shared_ptr<MemoryAllocator> memory_allocator_;
 };
 
 }  // namespace rocksdb
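Putting the new knobs together, constructing a cache through the extended `LRUCacheOptions` looks roughly like this (the capacity and pool ratio are arbitrary example values; the allocator is left as the system default):

```cpp
#include <memory>

#include "rocksdb/cache.h"

std::shared_ptr<rocksdb::Cache> MakeBlockCache() {
  rocksdb::LRUCacheOptions opts;
  opts.capacity = 512 * 1024 * 1024;  // 512MB
  opts.num_shard_bits = -1;           // auto-pick the shard count
  opts.strict_capacity_limit = false;
  opts.high_pri_pool_ratio = 0.5;     // half reserved for high-pri entries
  opts.memory_allocator = nullptr;    // i.e. the system allocator
  opts.use_adaptive_mutex = rocksdb::kDefaultToAdaptiveMutex;
  return rocksdb::NewLRUCache(opts);
}
```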
diff --git a/ceph/src/rocksdb/include/rocksdb/compaction_filter.h b/ceph/src/rocksdb/include/rocksdb/compaction_filter.h
index 98f86c281..5d476fb8e 100644
--- a/ceph/src/rocksdb/include/rocksdb/compaction_filter.h
+++ b/ceph/src/rocksdb/include/rocksdb/compaction_filter.h
@@ -75,14 +75,11 @@ class CompactionFilter {
   // to modify the existing_value and pass it back through new_value.
   // value_changed needs to be set to true in this case.
   //
-  // If you use snapshot feature of RocksDB (i.e. call GetSnapshot() API on a
-  // DB* object), CompactionFilter might not be very useful for you. Due to
-  // guarantees we need to maintain, compaction process will not call Filter()
-  // on any keys that were written before the latest snapshot. In other words,
-  // compaction will only call Filter() on keys written after your most recent
-  // call to GetSnapshot(). In most cases, Filter() will not be called very
-  // often. This is something we're fixing. See the discussion at:
-  // https://www.facebook.com/groups/mysqlonrocksdb/permalink/999723240091865/
+  // Note that RocksDB snapshots (i.e. call GetSnapshot() API on a
+  // DB* object) do not guarantee to preserve the state of the DB with
+  // CompactionFilter. Data seen from a snapshot might disappear after a
+  // compaction finishes. If you use snapshots, think twice about whether you
+  // want to use compaction filter and whether you are using it in a safe way.
   //
   // If multithreaded compaction is being used *and* a single CompactionFilter
   // instance was supplied via Options::compaction_filter, this method may be
@@ -135,9 +132,9 @@ class CompactionFilter {
   //
   // Caveats:
   //  - The keys are skipped even if there are snapshots containing them,
-  //    as if IgnoreSnapshots() was true; i.e. values removed
-  //    by kRemoveAndSkipUntil can disappear from a snapshot - beware
-  //    if you're using TransactionDB or DB::GetSnapshot().
+  //    i.e. values removed by kRemoveAndSkipUntil can disappear from a
+  //    snapshot - beware if you're using TransactionDB or
+  //    DB::GetSnapshot().
   //  - If value for a key was overwritten or merged into (multiple Put()s
   //    or Merge()s), and compaction filter skips this key with
   //    kRemoveAndSkipUntil, it's possible that it will remove only
@@ -176,15 +173,12 @@ class CompactionFilter {
     return Decision::kKeep;
   }
 
-  // By default, compaction will only call Filter() on keys written after the
-  // most recent call to GetSnapshot(). However, if the compaction filter
-  // overrides IgnoreSnapshots to make it return true, the compaction filter
-  // will be called even if the keys were written before the last snapshot.
-  // This behavior is to be used only when we want to delete a set of keys
-  // irrespective of snapshots. In particular, care should be taken
-  // to understand that the values of these keys will change even if we are
-  // using a snapshot.
-  virtual bool IgnoreSnapshots() const { return false; }
+  // This function is deprecated. Snapshots will always be ignored for
+  // compaction filters, because we realized that not ignoring snapshots
+  // doesn't provide the guarantee we initially thought it would provide.
+  // Repeatable reads will not be guaranteed anyway. If you override this
+  // function and return false, we will fail the compaction.
+  virtual bool IgnoreSnapshots() const { return true; }
 
   // Returns a name that identifies this compaction filter.
   // The name will be printed to LOG file on start up for diagnosis.
@@ -195,7 +189,7 @@ class CompactionFilter {
 // application to know about different compactions
 class CompactionFilterFactory {
  public:
-  virtual ~CompactionFilterFactory() { }
+  virtual ~CompactionFilterFactory() {}
 
   virtual std::unique_ptr<CompactionFilter> CreateCompactionFilter(
       const CompactionFilter::Context& context) = 0;
diff --git a/ceph/src/rocksdb/include/rocksdb/compaction_job_stats.h b/ceph/src/rocksdb/include/rocksdb/compaction_job_stats.h
index e5d8af8bd..4021fcab2 100644
--- a/ceph/src/rocksdb/include/rocksdb/compaction_job_stats.h
+++ b/ceph/src/rocksdb/include/rocksdb/compaction_job_stats.h
@@ -18,6 +18,9 @@ struct CompactionJobStats {
   // the elapsed time of this compaction in microseconds.
   uint64_t elapsed_micros;
 
+  // the elapsed CPU time of this compaction in microseconds.
+  uint64_t cpu_micros;
+
   // the number of compaction input records.
   uint64_t num_input_records;
   // the number of compaction input files.
diff --git a/ceph/src/rocksdb/include/rocksdb/comparator.h b/ceph/src/rocksdb/include/rocksdb/comparator.h
index 12e05ffee..46279f9a6 100644
--- a/ceph/src/rocksdb/include/rocksdb/comparator.h
+++ b/ceph/src/rocksdb/include/rocksdb/comparator.h
@@ -55,9 +55,8 @@ class Comparator {
   // If *start < limit, changes *start to a short string in [start,limit).
   // Simple comparator implementations may return with *start unchanged,
   // i.e., an implementation of this method that does nothing is correct.
-  virtual void FindShortestSeparator(
-      std::string* start,
-      const Slice& limit) const = 0;
+  virtual void FindShortestSeparator(std::string* start,
+                                     const Slice& limit) const = 0;
 
   // Changes *key to a short string >= *key.
   // Simple comparator implementations may return with *key unchanged,
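Given the `IgnoreSnapshots()` deprecation above, a filter written against this header can simply assume snapshots are ignored. A minimal hedged sketch (the expiry predicate is a made-up stand-in, not a RocksDB facility):

```cpp
#include <string>

#include "rocksdb/compaction_filter.h"
#include "rocksdb/slice.h"

class DropExpiredFilter : public rocksdb::CompactionFilter {
 public:
  bool Filter(int /*level*/, const rocksdb::Slice& key,
              const rocksdb::Slice& /*existing_value*/,
              std::string* /*new_value*/,
              bool* /*value_changed*/) const override {
    // Returning true drops the key; with the change above this happens even
    // if a live snapshot still covers it.
    return IsExpired(key);
  }

  // Deprecated: must stay true now; returning false fails the compaction.
  bool IgnoreSnapshots() const override { return true; }

  const char* Name() const override { return "DropExpiredFilter"; }

 private:
  // Hypothetical expiry test; a real filter would decode a timestamp here.
  bool IsExpired(const rocksdb::Slice& /*key*/) const { return false; }
};
```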
diff --git a/ceph/src/rocksdb/include/rocksdb/concurrent_task_limiter.h b/ceph/src/rocksdb/include/rocksdb/concurrent_task_limiter.h
new file mode 100644
index 000000000..2e054efda
--- /dev/null
+++ b/ceph/src/rocksdb/include/rocksdb/concurrent_task_limiter.h
@@ -0,0 +1,46 @@
+// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
+// This source code is licensed under both the GPLv2 (found in the
+// COPYING file in the root directory) and Apache 2.0 License
+// (found in the LICENSE.Apache file in the root directory).
+//
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#pragma once
+
+#include "rocksdb/env.h"
+#include "rocksdb/statistics.h"
+
+namespace rocksdb {
+
+class ConcurrentTaskLimiter {
+ public:
+  virtual ~ConcurrentTaskLimiter() {}
+
+  // Returns a name that identifies this concurrent task limiter.
+  virtual const std::string& GetName() const = 0;
+
+  // Set max concurrent tasks.
+  // limit = 0 means no new task allowed.
+  // limit < 0 means no limitation.
+  virtual void SetMaxOutstandingTask(int32_t limit) = 0;
+
+  // Reset to unlimited max concurrent tasks.
+  virtual void ResetMaxOutstandingTask() = 0;
+
+  // Returns the current outstanding task count.
+  virtual int32_t GetOutstandingTask() const = 0;
+};
+
+// Create a ConcurrentTaskLimiter that can be shared with multiple CFs
+// across RocksDB instances to control concurrent tasks.
+//
+// @param name: Name of the limiter.
+// @param limit: max concurrent tasks.
+//        limit = 0 means no new task allowed.
+//        limit < 0 means no limitation.
+extern ConcurrentTaskLimiter* NewConcurrentTaskLimiter(const std::string& name,
+                                                       int32_t limit);
+
+}  // namespace rocksdb
diff --git a/ceph/src/rocksdb/include/rocksdb/convenience.h b/ceph/src/rocksdb/include/rocksdb/convenience.h
index c6b11d032..d3cbe6016 100644
--- a/ceph/src/rocksdb/include/rocksdb/convenience.h
+++ b/ceph/src/rocksdb/include/rocksdb/convenience.h
@@ -277,15 +277,13 @@ Status GetPlainTableOptionsFromMap(
 // BlockBasedTableOptions as part of the string for block-based table factory:
 //   "write_buffer_size=1024;block_based_table_factory={block_size=4k};"
 //   "max_write_buffer_num=2"
-Status GetColumnFamilyOptionsFromString(
-    const ColumnFamilyOptions& base_options,
-    const std::string& opts_str,
-    ColumnFamilyOptions* new_options);
+Status GetColumnFamilyOptionsFromString(const ColumnFamilyOptions& base_options,
+                                        const std::string& opts_str,
+                                        ColumnFamilyOptions* new_options);
 
-Status GetDBOptionsFromString(
-    const DBOptions& base_options,
-    const std::string& opts_str,
-    DBOptions* new_options);
+Status GetDBOptionsFromString(const DBOptions& base_options,
+                              const std::string& opts_str,
+                              DBOptions* new_options);
 
 Status GetStringFromDBOptions(std::string* opts_str,
                               const DBOptions& db_options,
@@ -301,14 +299,12 @@ Status GetStringFromCompressionType(std::string* compression_str,
 std::vector<CompressionType> GetSupportedCompressions();
 
 Status GetBlockBasedTableOptionsFromString(
-    const BlockBasedTableOptions& table_options,
-    const std::string& opts_str,
+    const BlockBasedTableOptions& table_options, const std::string& opts_str,
     BlockBasedTableOptions* new_table_options);
 
-Status GetPlainTableOptionsFromString(
-    const PlainTableOptions& table_options,
-    const std::string& opts_str,
-    PlainTableOptions* new_table_options);
+Status GetPlainTableOptionsFromString(const PlainTableOptions& table_options,
+                                      const std::string& opts_str,
+                                      PlainTableOptions* new_table_options);
 
 Status GetMemTableRepFactoryFromString(
     const std::string& opts_str,
diff --git a/ceph/src/rocksdb/include/rocksdb/db.h b/ceph/src/rocksdb/include/rocksdb/db.h
index f1430bce8..b40af20e2 100644
--- a/ceph/src/rocksdb/include/rocksdb/db.h
+++ 
b/ceph/src/rocksdb/include/rocksdb/db.h @@ -52,9 +52,11 @@ struct ExternalSstFileInfo; class WriteBatch; class Env; class EventListener; +class StatsHistoryIterator; class TraceWriter; - -using std::unique_ptr; +#ifdef ROCKSDB_LITE +class CompactionJobInfo; +#endif extern const std::string kDefaultColumnFamilyName; struct ColumnFamilyDescriptor { @@ -95,16 +97,22 @@ struct Range { Slice start; Slice limit; - Range() { } - Range(const Slice& s, const Slice& l) : start(s), limit(l) { } + Range() {} + Range(const Slice& s, const Slice& l) : start(s), limit(l) {} }; struct RangePtr { const Slice* start; const Slice* limit; - RangePtr() : start(nullptr), limit(nullptr) { } - RangePtr(const Slice* s, const Slice* l) : start(s), limit(l) { } + RangePtr() : start(nullptr), limit(nullptr) {} + RangePtr(const Slice* s, const Slice* l) : start(s), limit(l) {} +}; + +struct IngestExternalFileArg { + ColumnFamilyHandle* column_family = nullptr; + std::vector external_files; + IngestExternalFileOptions options; }; // A collections of table properties objects, where @@ -123,8 +131,7 @@ class DB { // OK on success. // Stores nullptr in *dbptr and returns a non-OK status on error. // Caller should delete *dbptr when it is no longer needed. - static Status Open(const Options& options, - const std::string& name, + static Status Open(const Options& options, const std::string& name, DB** dbptr); // Open the database for read only. All DB interfaces @@ -134,9 +141,9 @@ class DB { // // Not supported in ROCKSDB_LITE, in which case the function will // return Status::NotSupported. - static Status OpenForReadOnly(const Options& options, - const std::string& name, DB** dbptr, - bool error_if_log_file_exist = false); + static Status OpenForReadOnly(const Options& options, const std::string& name, + DB** dbptr, + bool error_if_log_file_exist = false); // Open the database for read only with column families. When opening DB with // read only, you can specify only a subset of column families in the @@ -152,6 +159,54 @@ class DB { std::vector* handles, DB** dbptr, bool error_if_log_file_exist = false); + // The following OpenAsSecondary functions create a secondary instance that + // can dynamically tail the MANIFEST of a primary that must have already been + // created. User can call TryCatchUpWithPrimary to make the secondary + // instance catch up with primary (WAL tailing is NOT supported now) whenever + // the user feels necessary. Column families created by the primary after the + // secondary instance starts are currently ignored by the secondary instance. + // Column families opened by secondary and dropped by the primary will be + // dropped by secondary as well. However the user of the secondary instance + // can still access the data of such dropped column family as long as they + // do not destroy the corresponding column family handle. + // WAL tailing is not supported at present, but will arrive soon. + // + // The options argument specifies the options to open the secondary instance. + // The name argument specifies the name of the primary db that you have used + // to open the primary instance. + // The secondary_path argument points to a directory where the secondary + // instance stores its info log. + // The dbptr is an out-arg corresponding to the opened secondary instance. + // The pointer points to a heap-allocated database, and the user should + // delete it after use. + // Open DB as secondary instance with only the default column family. + // Return OK on success, non-OK on failures. 
+  static Status OpenAsSecondary(const Options& options, const std::string& name,
+                                const std::string& secondary_path, DB** dbptr);
+
+  // Open DB as secondary instance with column families. You can open a subset
+  // of column families in secondary mode.
+  // The db_options specify the database specific options.
+  // The name argument specifies the name of the primary db that you have used
+  // to open the primary instance.
+  // The secondary_path argument points to a directory where the secondary
+  // instance stores its info log.
+  // The column_families argument specifies a list of column families to open.
+  // If any of the column families does not exist, the function returns non-OK
+  // status.
+  // The handles is an out-arg corresponding to the opened database column
+  // family handles.
+  // The dbptr is an out-arg corresponding to the opened secondary instance.
+  // The pointer points to a heap-allocated database, and the caller should
+  // delete it after use. Before deleting the dbptr, the user should also
+  // delete the pointers stored in the handles vector.
+  // Return OK on success, non-OK on failures.
+  static Status OpenAsSecondary(
+      const DBOptions& db_options, const std::string& name,
+      const std::string& secondary_path,
+      const std::vector<ColumnFamilyDescriptor>& column_families,
+      std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);
+
   // Open DB with column families.
   // db_options specify database specific options
   // column_families is the vector of all column families in the database,
@@ -190,7 +245,7 @@ class DB {
       const std::string& name,
       std::vector<std::string>* column_families);
 
-  DB() { }
+  DB() {}
   virtual ~DB();
 
   // Create a column_family and return the handle of column family
@@ -287,16 +342,12 @@ class DB {
   // a non-OK status on error. It is not an error if no keys exist in the range
   // ["begin_key", "end_key").
   //
-  // This feature is currently an experimental performance optimization for
-  // deleting very large ranges of contiguous keys. Invoking it many times or on
-  // small ranges may severely degrade read performance; in particular, the
-  // resulting performance can be worse than calling Delete() for each key in
-  // the range. Note also the degraded read performance affects keys outside the
-  // deleted ranges, and affects database operations involving scans, like flush
-  // and compaction.
-  //
-  // Consider setting ReadOptions::ignore_range_deletions = true to speed
-  // up reads for key(s) that are known to be unaffected by range deletions.
+  // This feature is now usable in production, with the following caveats:
+  // 1) Accumulating many range tombstones in the memtable will degrade read
+  // performance; this can be avoided by manually flushing occasionally.
+  // 2) Limiting the maximum number of open files in the presence of range
+  // tombstones can degrade read performance. To avoid this problem, set
+  // max_open_files to -1 whenever possible.
virtual Status DeleteRange(const WriteOptions& options, ColumnFamilyHandle* column_family, const Slice& begin_key, const Slice& end_key); @@ -342,7 +393,8 @@ class DB { virtual Status Get(const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* value) = 0; - virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value) { + virtual Status Get(const ReadOptions& options, const Slice& key, + std::string* value) { return Get(options, DefaultColumnFamily(), key, value); } @@ -363,9 +415,10 @@ class DB { virtual std::vector MultiGet(const ReadOptions& options, const std::vector& keys, std::vector* values) { - return MultiGet(options, std::vector( - keys.size(), DefaultColumnFamily()), - keys, values); + return MultiGet( + options, + std::vector(keys.size(), DefaultColumnFamily()), + keys, values); } // If the key definitely does not exist in the database, then this method @@ -572,6 +625,11 @@ class DB { // log files that should be kept. static const std::string kMinLogNumberToKeep; + // "rocksdb.min-obsolete-sst-number-to-keep" - return the minimum file + // number for an obsolete SST to be kept. The max value of `uint64_t` + // will be returned if all obsolete files can be deleted. + static const std::string kMinObsoleteSstNumberToKeep; + // "rocksdb.total-sst-files-size" - returns total size (bytes) of all SST // files. // WARNING: may slow down online queries if there are too many files. @@ -670,6 +728,7 @@ class DB { // "rocksdb.current-super-version-number" // "rocksdb.estimate-live-data-size" // "rocksdb.min-log-number-to-keep" + // "rocksdb.min-obsolete-sst-number-to-keep" // "rocksdb.total-sst-files-size" // "rocksdb.live-sst-files-size" // "rocksdb.base-level" @@ -721,13 +780,10 @@ class DB { // include_flags should be of type DB::SizeApproximationFlags virtual void GetApproximateSizes(ColumnFamilyHandle* column_family, const Range* range, int n, uint64_t* sizes, - uint8_t include_flags - = INCLUDE_FILES) = 0; + uint8_t include_flags = INCLUDE_FILES) = 0; virtual void GetApproximateSizes(const Range* range, int n, uint64_t* sizes, - uint8_t include_flags - = INCLUDE_FILES) { - GetApproximateSizes(DefaultColumnFamily(), range, n, sizes, - include_flags); + uint8_t include_flags = INCLUDE_FILES) { + GetApproximateSizes(DefaultColumnFamily(), range, n, sizes, include_flags); } // The method is similar to GetApproximateSizes, except it @@ -744,8 +800,7 @@ class DB { // Deprecated versions of GetApproximateSizes ROCKSDB_DEPRECATED_FUNC virtual void GetApproximateSizes( - const Range* range, int n, uint64_t* sizes, - bool include_memtable) { + const Range* range, int n, uint64_t* sizes, bool include_memtable) { uint8_t include_flags = SizeApproximationFlags::INCLUDE_FILES; if (include_memtable) { include_flags |= SizeApproximationFlags::INCLUDE_MEMTABLES; @@ -753,9 +808,8 @@ class DB { GetApproximateSizes(DefaultColumnFamily(), range, n, sizes, include_flags); } ROCKSDB_DEPRECATED_FUNC virtual void GetApproximateSizes( - ColumnFamilyHandle* column_family, - const Range* range, int n, uint64_t* sizes, - bool include_memtable) { + ColumnFamilyHandle* column_family, const Range* range, int n, + uint64_t* sizes, bool include_memtable) { uint8_t include_flags = SizeApproximationFlags::INCLUDE_FILES; if (include_memtable) { include_flags |= SizeApproximationFlags::INCLUDE_MEMTABLES; @@ -832,18 +886,20 @@ class DB { virtual Status CompactFiles( const CompactionOptions& compact_options, ColumnFamilyHandle* column_family, - const 
std::vector& input_file_names, - const int output_level, const int output_path_id = -1, - std::vector* const output_file_names = nullptr) = 0; + const std::vector& input_file_names, const int output_level, + const int output_path_id = -1, + std::vector* const output_file_names = nullptr, + CompactionJobInfo* compaction_job_info = nullptr) = 0; virtual Status CompactFiles( const CompactionOptions& compact_options, - const std::vector& input_file_names, - const int output_level, const int output_path_id = -1, - std::vector* const output_file_names = nullptr) { + const std::vector& input_file_names, const int output_level, + const int output_path_id = -1, + std::vector* const output_file_names = nullptr, + CompactionJobInfo* compaction_job_info = nullptr) { return CompactFiles(compact_options, DefaultColumnFamily(), input_file_names, output_level, output_path_id, - output_file_names); + output_file_names, compaction_job_info); } // This function will wait until all currently running background processes @@ -900,11 +956,24 @@ class DB { virtual DBOptions GetDBOptions() const = 0; // Flush all mem-table data. + // Flush a single column family, even when atomic flush is enabled. To flush + // multiple column families, use Flush(options, column_families). virtual Status Flush(const FlushOptions& options, ColumnFamilyHandle* column_family) = 0; virtual Status Flush(const FlushOptions& options) { return Flush(options, DefaultColumnFamily()); } + // Flushes multiple column families. + // If atomic flush is not enabled, Flush(options, column_families) is + // equivalent to calling Flush(options, column_family) multiple times. + // If atomic flush is enabled, Flush(options, column_families) will flush all + // column families specified in 'column_families' up to the latest sequence + // number at the time when flush is requested. + // Note that RocksDB 5.15 and earlier may not be able to open later versions + // with atomic flush enabled. + virtual Status Flush( + const FlushOptions& options, + const std::vector& column_families) = 0; // Flush the WAL memory buffer to the file. If sync is true, it calls SyncWAL // afterwards. @@ -917,6 +986,16 @@ class DB { // Currently only works if allow_mmap_writes = false in Options. virtual Status SyncWAL() = 0; + // Lock the WAL. Also flushes the WAL after locking. + virtual Status LockWAL() { + return Status::NotSupported("LockWAL not implemented"); + } + + // Unlock the WAL. + virtual Status UnlockWAL() { + return Status::NotSupported("UnlockWAL not implemented"); + } + // The sequence number of the most recent transaction. virtual SequenceNumber GetLatestSequenceNumber() const = 0; @@ -979,9 +1058,9 @@ class DB { // cleared aggressively and the iterator might keep getting invalid before // an update is read. virtual Status GetUpdatesSince( - SequenceNumber seq_number, unique_ptr* iter, - const TransactionLogIterator::ReadOptions& - read_options = TransactionLogIterator::ReadOptions()) = 0; + SequenceNumber seq_number, std::unique_ptr* iter, + const TransactionLogIterator::ReadOptions& read_options = + TransactionLogIterator::ReadOptions()) = 0; // Windows API macro interference #undef DeleteFile @@ -1000,8 +1079,7 @@ class DB { ColumnFamilyMetaData* /*metadata*/) {} // Get the metadata of the default column family. 
-  void GetColumnFamilyMetaData(
-      ColumnFamilyMetaData* metadata) {
+  void GetColumnFamilyMetaData(ColumnFamilyMetaData* metadata) {
     GetColumnFamilyMetaData(DefaultColumnFamily(), metadata);
   }
 
@@ -1033,6 +1111,24 @@ class DB {
     return IngestExternalFile(DefaultColumnFamily(), external_files, options);
   }
 
+  // IngestExternalFiles() will ingest files for multiple column families, and
+  // record the result atomically to the MANIFEST.
+  // If this function returns OK, all column families' ingestion must succeed.
+  // If this function returns a non-OK status, or the process crashes, then
+  // none of the files will be ingested into the database after recovery.
+  // Note that it is possible for an application to observe a mixed state
+  // during the execution of this function. If the user performs range scan
+  // over the column families with iterators, an iterator on one column family
+  // may return ingested data, while an iterator on another column family
+  // returns old data. Users can use snapshot for a consistent view of data.
+  // If your db ingests multiple SST files using this API, i.e. args.size()
+  // > 1, then RocksDB 5.15 and earlier will not be able to open it.
+  //
+  // REQUIRES: each arg corresponds to a different column family: namely, for
+  // 0 <= i < j < len(args), args[i].column_family != args[j].column_family.
+  virtual Status IngestExternalFiles(
+      const std::vector<IngestExternalFileArg>& args) = 0;
+
   virtual Status VerifyChecksum() = 0;
 
   // AddFile() is deprecated, please use IngestExternalFile()
@@ -1182,6 +1278,31 @@ class DB {
   // Needed for StackableDB
   virtual DB* GetRootDB() { return this; }
 
+  // Given a time window, return an iterator for accessing stats history.
+  // User is responsible for deleting StatsHistoryIterator after use.
+  virtual Status GetStatsHistory(
+      uint64_t /*start_time*/, uint64_t /*end_time*/,
+      std::unique_ptr<StatsHistoryIterator>* /*stats_iterator*/) {
+    return Status::NotSupported("GetStatsHistory() is not implemented.");
+  }
+
+#ifndef ROCKSDB_LITE
+  // Make the secondary instance catch up with the primary by tailing and
+  // replaying the MANIFEST and WAL of the primary.
+  // Column families created by the primary after the secondary instance starts
+  // will be ignored unless the secondary instance closes and restarts with the
+  // newly created column families.
+  // Column families that exist before the secondary instance starts and are
+  // dropped by the primary afterwards will be marked as dropped. However, as
+  // long as the secondary instance does not delete the corresponding column
+  // family handles, the data of the column family is still accessible to the
+  // secondary.
+  // TODO: we will support WAL tailing soon.
+  virtual Status TryCatchUpWithPrimary() {
+    return Status::NotSupported("Supported only by secondary instance");
+  }
+#endif  // !ROCKSDB_LITE
+
  private:
   // No copying allowed
   DB(const DB&);
@@ -1192,7 +1313,7 @@ class DB {
 // Be very careful using this method.
Status DestroyDB(const std::string& name, const Options& options, const std::vector& column_families = - std::vector()); + std::vector()); #ifndef ROCKSDB_LITE // If a DB cannot be opened, you may attempt to call this method to diff --git a/ceph/src/rocksdb/include/rocksdb/env.h b/ceph/src/rocksdb/include/rocksdb/env.h index 755836461..4d3a96fe2 100644 --- a/ceph/src/rocksdb/include/rocksdb/env.h +++ b/ceph/src/rocksdb/include/rocksdb/env.h @@ -32,6 +32,13 @@ #undef GetCurrentTime #endif +#if defined(__GNUC__) || defined(__clang__) +#define ROCKSDB_PRINTF_FORMAT_ATTR(format_param, dots_param) \ + __attribute__((__format__(__printf__, format_param, dots_param))) +#else +#define ROCKSDB_PRINTF_FORMAT_ATTR(format_param, dots_param) +#endif + namespace rocksdb { class FileLock; @@ -50,24 +57,20 @@ class RateLimiter; class ThreadStatusUpdater; struct ThreadStatus; -using std::unique_ptr; -using std::shared_ptr; - const size_t kDefaultPageSize = 4 * 1024; // Options while opening a file to read/write struct EnvOptions { - // Construct with default Options EnvOptions(); // Construct from Options explicit EnvOptions(const DBOptions& options); - // If true, then use mmap to read data + // If true, then use mmap to read data bool use_mmap_reads = false; - // If true, then use mmap to write data + // If true, then use mmap to write data bool use_mmap_writes = true; // If true, then use O_DIRECT for reading data @@ -137,9 +140,8 @@ class Env { // // The returned file will only be accessed by one thread at a time. virtual Status NewSequentialFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) - = 0; + std::unique_ptr* result, + const EnvOptions& options) = 0; // Create a brand new random access read-only file with the // specified name. On success, stores a pointer to the new file in @@ -149,18 +151,17 @@ class Env { // // The returned file may be concurrently accessed by multiple threads. virtual Status NewRandomAccessFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) - = 0; + std::unique_ptr* result, + const EnvOptions& options) = 0; // These values match Linux definition // https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fcntl.h#n56 enum WriteLifeTimeHint { - WLTH_NOT_SET = 0, // No hint information set - WLTH_NONE, // No hints about write life time - WLTH_SHORT, // Data written has a short life time - WLTH_MEDIUM, // Data written has a medium life time - WLTH_LONG, // Data written has a long life time - WLTH_EXTREME, // Data written has an extremely long life time + WLTH_NOT_SET = 0, // No hint information set + WLTH_NONE, // No hints about write life time + WLTH_SHORT, // Data written has a short life time + WLTH_MEDIUM, // Data written has a medium life time + WLTH_LONG, // Data written has a long life time + WLTH_EXTREME, // Data written has an extremely long life time }; // Create an object that writes to a new file with the specified @@ -171,7 +172,7 @@ class Env { // // The returned file will only be accessed by one thread at a time. virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options) = 0; // Create an object that writes to a new file with the specified @@ -182,7 +183,7 @@ class Env { // // The returned file will only be accessed by one thread at a time. 
virtual Status ReopenWritableFile(const std::string& /*fname*/, - unique_ptr* /*result*/, + std::unique_ptr* /*result*/, const EnvOptions& /*options*/) { return Status::NotSupported(); } @@ -190,7 +191,7 @@ class Env { // Reuse an existing file by renaming it and opening it as writable. virtual Status ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options); // Open `fname` for random read and write, if file doesn't exist the file @@ -199,7 +200,7 @@ class Env { // // The returned file will only be accessed by one thread at a time. virtual Status NewRandomRWFile(const std::string& /*fname*/, - unique_ptr* /*result*/, + std::unique_ptr* /*result*/, const EnvOptions& /*options*/) { return Status::NotSupported("RandomRWFile is not implemented in this Env"); } @@ -209,7 +210,7 @@ class Env { // file in `*result`. The file must exist prior to this call. virtual Status NewMemoryMappedFileBuffer( const std::string& /*fname*/, - unique_ptr* /*result*/) { + std::unique_ptr* /*result*/) { return Status::NotSupported( "MemoryMappedFileBuffer is not implemented in this Env"); } @@ -222,7 +223,7 @@ class Env { // *result and returns OK. On failure stores nullptr in *result and // returns non-OK. virtual Status NewDirectory(const std::string& name, - unique_ptr* result) = 0; + std::unique_ptr* result) = 0; // Returns OK if the named file exists. // NotFound if the named file does not exist, @@ -321,16 +322,12 @@ class Env { virtual Status UnlockFile(FileLock* lock) = 0; // Priority for scheduling job in thread pool - enum Priority { BOTTOM, LOW, HIGH, TOTAL }; + enum Priority { BOTTOM, LOW, HIGH, USER, TOTAL }; static std::string PriorityToString(Priority priority); // Priority for requesting bytes in rate limiter scheduler - enum IOPriority { - IO_LOW = 0, - IO_HIGH = 1, - IO_TOTAL = 2 - }; + enum IOPriority { IO_LOW = 0, IO_HIGH = 1, IO_TOTAL = 2 }; // Arrange to run "(*function)(arg)" once in a background thread, in // the thread pool specified by pri. By default, jobs go to the 'LOW' @@ -370,7 +367,7 @@ class Env { // Create and return a log file for storing informational messages. virtual Status NewLogger(const std::string& fname, - shared_ptr* result) = 0; + std::shared_ptr* result) = 0; // Returns the number of micro-seconds since some fixed point in time. // It is often used as system time such as in GenericRateLimiter @@ -382,9 +379,10 @@ class Env { // Default implementation simply relies on NowMicros. // In platform-specific implementations, NowNanos() should return time points // that are MONOTONIC. - virtual uint64_t NowNanos() { - return NowMicros() * 1000; - } + virtual uint64_t NowNanos() { return NowMicros() * 1000; } + + // 0 indicates not supported. + virtual uint64_t NowCPUNanos() { return 0; } // Sleep/delay the thread for the prescribed number of micro-seconds. virtual void SleepForMicroseconds(int micros) = 0; @@ -398,7 +396,7 @@ class Env { // Get full directory name for this db. virtual Status GetAbsolutePath(const std::string& db_path, - std::string* output_path) = 0; + std::string* output_path) = 0; // The number of background worker threads of a specific thread pool // for this environment. 'LOW' is the default pool. @@ -486,6 +484,8 @@ class Env { return Status::NotSupported(); } + // If you're adding methods here, remember to add them to EnvWrapper too. + protected: // The pointer to an internal structure that will update the // status of each thread. 
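Among the Env additions above, `NowCPUNanos()` is the one most likely to show up in user code; a small hedged sketch of the intended guard-against-zero usage (the work being timed is a placeholder):

```cpp
#include <cstdio>

#include "rocksdb/env.h"

void ReportCpuTime(rocksdb::Env* env) {
  const uint64_t start = env->NowCPUNanos();
  // ... the work being measured would go here ...
  if (start != 0) {  // 0 means the platform does not support CPU-time clocks
    const uint64_t elapsed = env->NowCPUNanos() - start;
    std::fprintf(stdout, "on-cpu: %llu ns\n",
                 static_cast<unsigned long long>(elapsed));
  }
}
```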
@@ -505,7 +505,7 @@ ThreadStatusUpdater* CreateThreadStatusUpdater(); // A file abstraction for reading sequentially through a file class SequentialFile { public: - SequentialFile() { } + SequentialFile() {} virtual ~SequentialFile(); // Read up to "n" bytes from the file. "scratch[0..n-1]" may be @@ -548,13 +548,15 @@ class SequentialFile { Slice* /*result*/, char* /*scratch*/) { return Status::NotSupported(); } + + // If you're adding methods here, remember to add them to + // SequentialFileWrapper too. }; // A file abstraction for randomly reading the contents of a file. class RandomAccessFile { public: - - RandomAccessFile() { } + RandomAccessFile() {} virtual ~RandomAccessFile(); // Read up to "n" bytes from the file starting at "offset". @@ -591,8 +593,8 @@ class RandomAccessFile { // // Note: these IDs are only valid for the duration of the process. virtual size_t GetUniqueId(char* /*id*/, size_t /*max_size*/) const { - return 0; // Default implementation to prevent issues with backwards - // compatibility. + return 0; // Default implementation to prevent issues with backwards + // compatibility. }; enum AccessPattern { NORMAL, RANDOM, SEQUENTIAL, WILLNEED, DONTNEED }; @@ -613,6 +615,9 @@ class RandomAccessFile { virtual Status InvalidateCache(size_t /*offset*/, size_t /*length*/) { return Status::NotSupported("InvalidateCache not supported."); } + + // If you're adding methods here, remember to add them to + // RandomAccessFileWrapper too. }; // A file abstraction for sequential writing. The implementation @@ -621,11 +626,10 @@ class RandomAccessFile { class WritableFile { public: WritableFile() - : last_preallocated_block_(0), - preallocation_block_size_(0), - io_priority_(Env::IO_TOTAL), - write_hint_(Env::WLTH_NOT_SET) { - } + : last_preallocated_block_(0), + preallocation_block_size_(0), + io_priority_(Env::IO_TOTAL), + write_hint_(Env::WLTH_NOT_SET) {} virtual ~WritableFile(); // Append data to the end of the file @@ -653,7 +657,8 @@ class WritableFile { // // PositionedAppend() requires aligned buffer to be passed in. The alignment // required is queried via GetRequiredBufferAlignment() - virtual Status PositionedAppend(const Slice& /* data */, uint64_t /* offset */) { + virtual Status PositionedAppend(const Slice& /* data */, + uint64_t /* offset */) { return Status::NotSupported(); } @@ -664,7 +669,7 @@ class WritableFile { virtual Status Truncate(uint64_t /*size*/) { return Status::OK(); } virtual Status Close() = 0; virtual Status Flush() = 0; - virtual Status Sync() = 0; // sync data + virtual Status Sync() = 0; // sync data /* * Sync data and/or metadata as well. @@ -672,15 +677,11 @@ class WritableFile { * Override this method for environments where we need to sync * metadata as well. */ - virtual Status Fsync() { - return Sync(); - } + virtual Status Fsync() { return Sync(); } // true if Sync() and Fsync() are safe to call concurrently with Append() // and Flush(). - virtual bool IsSyncThreadSafe() const { - return false; - } + virtual bool IsSyncThreadSafe() const { return false; } // Indicates the upper layers if the current WritableFile implementation // uses direct IO. @@ -693,9 +694,7 @@ class WritableFile { * Change the priority in rate limiter if rate limiting is enabled. * If rate limiting is not enabled, this call has no effect. 
*/ - virtual void SetIOPriority(Env::IOPriority pri) { - io_priority_ = pri; - } + virtual void SetIOPriority(Env::IOPriority pri) { io_priority_ = pri; } virtual Env::IOPriority GetIOPriority() { return io_priority_; } @@ -707,9 +706,7 @@ class WritableFile { /* * Get the size of valid data in the file. */ - virtual uint64_t GetFileSize() { - return 0; - } + virtual uint64_t GetFileSize() { return 0; } /* * Get and set the default pre-allocation block size for writes to @@ -729,7 +726,7 @@ class WritableFile { // For documentation, refer to RandomAccessFile::GetUniqueId() virtual size_t GetUniqueId(char* /*id*/, size_t /*max_size*/) const { - return 0; // Default implementation to prevent issues with backwards + return 0; // Default implementation to prevent issues with backwards } // Remove any kind of caching of data from the offset to offset+length @@ -764,10 +761,10 @@ class WritableFile { // cover this write would be and Allocate to that point. const auto block_size = preallocation_block_size_; size_t new_last_preallocated_block = - (offset + len + block_size - 1) / block_size; + (offset + len + block_size - 1) / block_size; if (new_last_preallocated_block > last_preallocated_block_) { size_t num_spanned_blocks = - new_last_preallocated_block - last_preallocated_block_; + new_last_preallocated_block - last_preallocated_block_; Allocate(block_size * last_preallocated_block_, block_size * num_spanned_blocks); last_preallocated_block_ = new_last_preallocated_block; @@ -779,6 +776,9 @@ class WritableFile { return Status::OK(); } + // If you're adding methods here, remember to add them to + // WritableFileWrapper too. + protected: size_t preallocation_block_size() { return preallocation_block_size_; } @@ -790,9 +790,6 @@ class WritableFile { void operator=(const WritableFile&); protected: - friend class WritableFileWrapper; - friend class WritableFileMirror; - Env::IOPriority io_priority_; Env::WriteLifeTimeHint write_hint_; }; @@ -829,6 +826,9 @@ class RandomRWFile { virtual Status Close() = 0; + // If you're adding methods here, remember to add them to + // RandomRWFileWrapper too. + // No copying allowed RandomRWFile(const RandomRWFile&) = delete; RandomRWFile& operator=(const RandomRWFile&) = delete; @@ -837,7 +837,7 @@ class RandomRWFile { // MemoryMappedFileBuffer object represents a memory-mapped file's raw buffer. // Subclasses should release the mapping upon destruction. class MemoryMappedFileBuffer { -public: + public: MemoryMappedFileBuffer(void* _base, size_t _length) : base_(_base), length_(_length) {} @@ -848,11 +848,11 @@ public: MemoryMappedFileBuffer(const MemoryMappedFileBuffer&) = delete; MemoryMappedFileBuffer& operator=(const MemoryMappedFileBuffer&) = delete; - void* GetBase() const { return base_; } - size_t GetLen() const { return length_; } + void* GetBase() const { return base_; } + size_t GetLen() const { return length_; } -protected: - void* base_; + protected: + void* base_; const size_t length_; }; @@ -867,6 +867,9 @@ class Directory { virtual size_t GetUniqueId(char* /*id*/, size_t /*max_size*/) const { return 0; } + + // If you're adding methods here, remember to add them to + // DirectoryWrapper too. }; enum InfoLogLevel : unsigned char { @@ -909,7 +912,8 @@ class Logger { // and format. Any log with level under the internal log level // of *this (see @SetInfoLogLevel and @GetInfoLogLevel) will not be // printed. 
- virtual void Logv(const InfoLogLevel log_level, const char* format, va_list ap); + virtual void Logv(const InfoLogLevel log_level, const char* format, + va_list ap); virtual size_t GetLogFileSize() const { return kDoNotSupportGetLogFileSize; } // Flush to the OS buffers @@ -919,6 +923,8 @@ class Logger { log_level_ = log_level; } + // If you're adding methods here, remember to add them to LoggerWrapper too. + protected: virtual Status CloseImpl(); bool closed_; @@ -930,58 +936,65 @@ class Logger { InfoLogLevel log_level_; }; - // Identifies a locked file. class FileLock { public: - FileLock() { } + FileLock() {} virtual ~FileLock(); + private: // No copying allowed FileLock(const FileLock&); void operator=(const FileLock&); }; -extern void LogFlush(const shared_ptr<Logger>& info_log); +extern void LogFlush(const std::shared_ptr<Logger>& info_log); extern void Log(const InfoLogLevel log_level, - const shared_ptr<Logger>& info_log, const char* format, ...); + const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(3, 4); // a set of log functions with different log levels. -extern void Header(const shared_ptr<Logger>& info_log, const char* format, ...); -extern void Debug(const shared_ptr<Logger>& info_log, const char* format, ...); -extern void Info(const shared_ptr<Logger>& info_log, const char* format, ...); -extern void Warn(const shared_ptr<Logger>& info_log, const char* format, ...); -extern void Error(const shared_ptr<Logger>& info_log, const char* format, ...); -extern void Fatal(const shared_ptr<Logger>& info_log, const char* format, ...); +extern void Header(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Debug(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Info(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Warn(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Error(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Fatal(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); // Log the specified data to *info_log if info_log is non-nullptr. // The default info log level is InfoLogLevel::INFO_LEVEL. -extern void Log(const shared_ptr<Logger>& info_log, const char* format, ...) -# if defined(__GNUC__) || defined(__clang__) - __attribute__((__format__ (__printf__, 2, 3))) -# endif - ; +extern void Log(const std::shared_ptr<Logger>& info_log, const char* format, + ...) ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); -extern void LogFlush(Logger *info_log); +extern void LogFlush(Logger* info_log); extern void Log(const InfoLogLevel log_level, Logger* info_log, - const char* format, ...); + const char* format, ...) ROCKSDB_PRINTF_FORMAT_ATTR(3, 4); // The default info log level is InfoLogLevel::INFO_LEVEL. extern void Log(Logger* info_log, const char* format, ...) -# if defined(__GNUC__) || defined(__clang__) - __attribute__((__format__ (__printf__, 2, 3))) -# endif - ; + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); // a set of log functions with different log levels.
-extern void Header(Logger* info_log, const char* format, ...); -extern void Debug(Logger* info_log, const char* format, ...); -extern void Info(Logger* info_log, const char* format, ...); -extern void Warn(Logger* info_log, const char* format, ...); -extern void Error(Logger* info_log, const char* format, ...); -extern void Fatal(Logger* info_log, const char* format, ...); +extern void Header(Logger* info_log, const char* format, ...) + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Debug(Logger* info_log, const char* format, ...) + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Info(Logger* info_log, const char* format, ...) + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Warn(Logger* info_log, const char* format, ...) + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Error(Logger* info_log, const char* format, ...) + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); +extern void Fatal(Logger* info_log, const char* format, ...) + ROCKSDB_PRINTF_FORMAT_ATTR(2, 3); // A utility routine: write "data" to the named file. extern Status WriteStringToFile(Env* env, const Slice& data, @@ -992,50 +1005,79 @@ extern Status WriteStringToFile(Env* env, const Slice& data, extern Status ReadFileToString(Env* env, const std::string& fname, std::string* data); +// Below are helpers for wrapping most of the classes in this file. +// They forward all calls to another instance of the class. +// Useful when wrapping the default implementations. +// Typical usage is to inherit your wrapper from *Wrapper, e.g.: +// +// class MySequentialFileWrapper : public rocksdb::SequentialFileWrapper { +// public: +// MySequentialFileWrapper(rocksdb::SequentialFile* target): +// rocksdb::SequentialFileWrapper(target) {} +// Status Read(size_t n, Slice* result, char* scratch) override { +// cout << "Doing a read of size " << n << "!" << endl; +// return rocksdb::SequentialFileWrapper::Read(n, result, scratch); +// } +// // All other methods are forwarded to target_ automatically. +// }; +// +// This is often more convenient than inheriting the class directly because +// (a) Don't have to override and forward all methods - the Wrapper will +// forward everything you're not explicitly overriding. +// (b) Don't need to update the wrapper when more methods are added to the +// rocksdb class. Unless you actually want to override the behavior. +// (And unless rocksdb people forgot to update the *Wrapper class.) + // An implementation of Env that forwards all calls to another Env. // May be useful to clients who wish to override just part of the // functionality of another Env. 
class EnvWrapper : public Env { public: // Initialize an EnvWrapper that delegates all calls to *t - explicit EnvWrapper(Env* t) : target_(t) { } + explicit EnvWrapper(Env* t) : target_(t) {} ~EnvWrapper() override; // Return the target to which this Env forwards all calls Env* target() const { return target_; } // The following text is boilerplate that forwards all methods to target() - Status NewSequentialFile(const std::string& f, unique_ptr<SequentialFile>* r, + Status NewSequentialFile(const std::string& f, + std::unique_ptr<SequentialFile>* r, const EnvOptions& options) override { return target_->NewSequentialFile(f, r, options); } Status NewRandomAccessFile(const std::string& f, - unique_ptr<RandomAccessFile>* r, + std::unique_ptr<RandomAccessFile>* r, const EnvOptions& options) override { return target_->NewRandomAccessFile(f, r, options); } - Status NewWritableFile(const std::string& f, unique_ptr<WritableFile>* r, + Status NewWritableFile(const std::string& f, std::unique_ptr<WritableFile>* r, const EnvOptions& options) override { return target_->NewWritableFile(f, r, options); } Status ReopenWritableFile(const std::string& fname, - unique_ptr<WritableFile>* result, + std::unique_ptr<WritableFile>* result, const EnvOptions& options) override { return target_->ReopenWritableFile(fname, result, options); } Status ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr<WritableFile>* r, + std::unique_ptr<WritableFile>* r, const EnvOptions& options) override { return target_->ReuseWritableFile(fname, old_fname, r, options); } Status NewRandomRWFile(const std::string& fname, - unique_ptr<RandomRWFile>* result, + std::unique_ptr<RandomRWFile>* result, const EnvOptions& options) override { return target_->NewRandomRWFile(fname, result, options); } + Status NewMemoryMappedFileBuffer( + const std::string& fname, + std::unique_ptr<MemoryMappedFileBuffer>* result) override { + return target_->NewMemoryMappedFileBuffer(fname, result); + } Status NewDirectory(const std::string& name, - unique_ptr<Directory>* result) override { + std::unique_ptr<Directory>* result) override { return target_->NewDirectory(name, result); } Status FileExists(const std::string& f) override { @@ -1052,6 +1094,9 @@ class EnvWrapper : public Env { Status DeleteFile(const std::string& f) override { return target_->DeleteFile(f); } + Status Truncate(const std::string& fname, size_t size) override { + return target_->Truncate(fname, size); + } Status CreateDir(const std::string& d) override { return target_->CreateDir(d); } @@ -1113,11 +1158,12 @@ class EnvWrapper : public Env { return target_->GetTestDirectory(path); } Status NewLogger(const std::string& fname, - shared_ptr<Logger>* result) override { + std::shared_ptr<Logger>* result) override { return target_->NewLogger(fname, result); } uint64_t NowMicros() override { return target_->NowMicros(); } uint64_t NowNanos() override { return target_->NowNanos(); } + uint64_t NowCPUNanos() override { return target_->NowCPUNanos(); } void SleepForMicroseconds(int micros) override { target_->SleepForMicroseconds(micros); @@ -1167,9 +1213,7 @@ class EnvWrapper : public Env { return target_->GetThreadStatusUpdater(); } - uint64_t GetThreadID() const override { - return target_->GetThreadID(); - } + uint64_t GetThreadID() const override { return target_->GetThreadID(); } std::string GenerateUniqueId() override { return target_->GenerateUniqueId(); @@ -1200,19 +1244,69 @@ class EnvWrapper : public Env { const ImmutableDBOptions& db_options) const override { return target_->OptimizeForCompactionTableRead(env_options, db_options); } + Status GetFreeSpace(const std::string& path, uint64_t* diskfree) override { + return target_->GetFreeSpace(path, diskfree); + } private: Env* target_; };
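To make the forwarding pattern concrete, here is a minimal sketch of an EnvWrapper subclass that counts how many random-access files get opened. The class name CountingEnv is hypothetical; everything not overridden forwards to target() automatically:

#include <atomic>
#include <memory>
#include <string>
#include "rocksdb/env.h"

// Hypothetical example: count NewRandomAccessFile() calls, forward the rest.
class CountingEnv : public rocksdb::EnvWrapper {
 public:
  explicit CountingEnv(rocksdb::Env* base) : rocksdb::EnvWrapper(base) {}

  rocksdb::Status NewRandomAccessFile(
      const std::string& fname,
      std::unique_ptr<rocksdb::RandomAccessFile>* result,
      const rocksdb::EnvOptions& options) override {
    opens_.fetch_add(1, std::memory_order_relaxed);  // count the open
    return target()->NewRandomAccessFile(fname, result, options);
  }

  uint64_t opens() const { return opens_.load(std::memory_order_relaxed); }

 private:
  std::atomic<uint64_t> opens_{0};
};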
-// An implementation of WritableFile that forwards all calls to another -// WritableFile. May be useful to clients who wish to override just part of the -// functionality of another WritableFile. -// It's declared as friend of WritableFile to allow forwarding calls to -// protected virtual methods. +class SequentialFileWrapper : public SequentialFile { + public: + explicit SequentialFileWrapper(SequentialFile* target) : target_(target) {} + + Status Read(size_t n, Slice* result, char* scratch) override { + return target_->Read(n, result, scratch); + } + Status Skip(uint64_t n) override { return target_->Skip(n); } + bool use_direct_io() const override { return target_->use_direct_io(); } + size_t GetRequiredBufferAlignment() const override { + return target_->GetRequiredBufferAlignment(); + } + Status InvalidateCache(size_t offset, size_t length) override { + return target_->InvalidateCache(offset, length); + } + Status PositionedRead(uint64_t offset, size_t n, Slice* result, + char* scratch) override { + return target_->PositionedRead(offset, n, result, scratch); + } + + private: + SequentialFile* target_; +}; + +class RandomAccessFileWrapper : public RandomAccessFile { + public: + explicit RandomAccessFileWrapper(RandomAccessFile* target) + : target_(target) {} + + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { + return target_->Read(offset, n, result, scratch); + } + Status Prefetch(uint64_t offset, size_t n) override { + return target_->Prefetch(offset, n); + } + size_t GetUniqueId(char* id, size_t max_size) const override { + return target_->GetUniqueId(id, max_size); + }; + void Hint(AccessPattern pattern) override { target_->Hint(pattern); } + bool use_direct_io() const override { return target_->use_direct_io(); } + size_t GetRequiredBufferAlignment() const override { + return target_->GetRequiredBufferAlignment(); + } + Status InvalidateCache(size_t offset, size_t length) override { + return target_->InvalidateCache(offset, length); + } + + private: + RandomAccessFile* target_; +}; + class WritableFileWrapper : public WritableFile { public: - explicit WritableFileWrapper(WritableFile* t) : target_(t) { } + explicit WritableFileWrapper(WritableFile* t) : target_(t) {} Status Append(const Slice& data) override { return target_->Append(data); } Status PositionedAppend(const Slice& data, uint64_t offset) override { @@ -1224,41 +1318,127 @@ class WritableFileWrapper : public WritableFile { Status Sync() override { return target_->Sync(); } Status Fsync() override { return target_->Fsync(); } bool IsSyncThreadSafe() const override { return target_->IsSyncThreadSafe(); } + + bool use_direct_io() const override { return target_->use_direct_io(); } + + size_t GetRequiredBufferAlignment() const override { + return target_->GetRequiredBufferAlignment(); + } + void SetIOPriority(Env::IOPriority pri) override { target_->SetIOPriority(pri); } + Env::IOPriority GetIOPriority() override { return target_->GetIOPriority(); } + + void SetWriteLifeTimeHint(Env::WriteLifeTimeHint hint) override { + target_->SetWriteLifeTimeHint(hint); + } + + Env::WriteLifeTimeHint GetWriteLifeTimeHint() override { + return target_->GetWriteLifeTimeHint(); + } + uint64_t GetFileSize() override { return target_->GetFileSize(); } + + void SetPreallocationBlockSize(size_t size) override { + target_->SetPreallocationBlockSize(size); + } + void GetPreallocationStatus(size_t* block_size, size_t* last_allocated_block) override { target_->GetPreallocationStatus(block_size, 
last_allocated_block); } + size_t GetUniqueId(char* id, size_t max_size) const override { return target_->GetUniqueId(id, max_size); } + Status InvalidateCache(size_t offset, size_t length) override { return target_->InvalidateCache(offset, length); } - void SetPreallocationBlockSize(size_t size) override { - target_->SetPreallocationBlockSize(size); + Status RangeSync(uint64_t offset, uint64_t nbytes) override { + return target_->RangeSync(offset, nbytes); } + void PrepareWrite(size_t offset, size_t len) override { target_->PrepareWrite(offset, len); } - protected: Status Allocate(uint64_t offset, uint64_t len) override { return target_->Allocate(offset, len); } - Status RangeSync(uint64_t offset, uint64_t nbytes) override { - return target_->RangeSync(offset, nbytes); - } private: WritableFile* target_; }; +class RandomRWFileWrapper : public RandomRWFile { + public: + explicit RandomRWFileWrapper(RandomRWFile* target) : target_(target) {} + + bool use_direct_io() const override { return target_->use_direct_io(); } + size_t GetRequiredBufferAlignment() const override { + return target_->GetRequiredBufferAlignment(); + } + Status Write(uint64_t offset, const Slice& data) override { + return target_->Write(offset, data); + } + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { + return target_->Read(offset, n, result, scratch); + } + Status Flush() override { return target_->Flush(); } + Status Sync() override { return target_->Sync(); } + Status Fsync() override { return target_->Fsync(); } + Status Close() override { return target_->Close(); } + + private: + RandomRWFile* target_; +}; + +class DirectoryWrapper : public Directory { + public: + explicit DirectoryWrapper(Directory* target) : target_(target) {} + + Status Fsync() override { return target_->Fsync(); } + size_t GetUniqueId(char* id, size_t max_size) const override { + return target_->GetUniqueId(id, max_size); + } + + private: + Directory* target_; +}; + +class LoggerWrapper : public Logger { + public: + explicit LoggerWrapper(Logger* target) : target_(target) {} + + Status Close() override { return target_->Close(); } + void LogHeader(const char* format, va_list ap) override { + return target_->LogHeader(format, ap); + } + void Logv(const char* format, va_list ap) override { + return target_->Logv(format, ap); + } + void Logv(const InfoLogLevel log_level, const char* format, + va_list ap) override { + return target_->Logv(log_level, format, ap); + } + size_t GetLogFileSize() const override { return target_->GetLogFileSize(); } + void Flush() override { return target_->Flush(); } + InfoLogLevel GetInfoLogLevel() const override { + return target_->GetInfoLogLevel(); + } + void SetInfoLogLevel(const InfoLogLevel log_level) override { + return target_->SetInfoLogLevel(log_level); + } + + private: + Logger* target_; +}; + // Returns a new environment that stores its data in memory and delegates // all non-file-storage tasks to base_env. The caller must delete the result // when it is no longer needed. 
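For reference, typical use of the in-memory Env described above looks like the following sketch (assuming the standard DB::Open API; the path is illustrative, since the files never touch disk):

#include <memory>
#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/options.h"

int main() {
  // The base Env still provides threads and clocks; file data stays in memory.
  std::unique_ptr<rocksdb::Env> mem_env(
      rocksdb::NewMemEnv(rocksdb::Env::Default()));
  rocksdb::Options options;
  options.create_if_missing = true;
  options.env = mem_env.get();
  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/in-mem/testdb", &db);
  // ... use db ...
  delete db;  // destroy the DB before the Env it depends on
  return s.ok() ? 0 : 1;
}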
diff --git a/ceph/src/rocksdb/include/rocksdb/env_encryption.h b/ceph/src/rocksdb/include/rocksdb/env_encryption.h index 70dce616a..a80da963a 100644 --- a/ceph/src/rocksdb/include/rocksdb/env_encryption.h +++ b/ceph/src/rocksdb/include/rocksdb/env_encryption.h @@ -5,7 +5,7 @@ #pragma once -#if !defined(ROCKSDB_LITE) +#if !defined(ROCKSDB_LITE) #include @@ -15,180 +15,190 @@ namespace rocksdb { class EncryptionProvider; -// Returns an Env that encrypts data when stored on disk and decrypts data when +// Returns an Env that encrypts data when stored on disk and decrypts data when // read from disk. Env* NewEncryptedEnv(Env* base_env, EncryptionProvider* provider); -// BlockAccessCipherStream is the base class for any cipher stream that -// supports random access at block level (without requiring data from other blocks). -// E.g. CTR (Counter operation mode) supports this requirement. +// BlockAccessCipherStream is the base class for any cipher stream that +// supports random access at block level (without requiring data from other +// blocks). E.g. CTR (Counter operation mode) supports this requirement. class BlockAccessCipherStream { - public: - virtual ~BlockAccessCipherStream() {}; + public: + virtual ~BlockAccessCipherStream(){}; - // BlockSize returns the size of each block supported by this cipher stream. - virtual size_t BlockSize() = 0; + // BlockSize returns the size of each block supported by this cipher stream. + virtual size_t BlockSize() = 0; - // Encrypt one or more (partial) blocks of data at the file offset. - // Length of data is given in dataSize. - virtual Status Encrypt(uint64_t fileOffset, char *data, size_t dataSize); + // Encrypt one or more (partial) blocks of data at the file offset. + // Length of data is given in dataSize. + virtual Status Encrypt(uint64_t fileOffset, char* data, size_t dataSize); - // Decrypt one or more (partial) blocks of data at the file offset. - // Length of data is given in dataSize. - virtual Status Decrypt(uint64_t fileOffset, char *data, size_t dataSize); + // Decrypt one or more (partial) blocks of data at the file offset. + // Length of data is given in dataSize. + virtual Status Decrypt(uint64_t fileOffset, char* data, size_t dataSize); - protected: - // Allocate scratch space which is passed to EncryptBlock/DecryptBlock. - virtual void AllocateScratch(std::string&) = 0; + protected: + // Allocate scratch space which is passed to EncryptBlock/DecryptBlock. + virtual void AllocateScratch(std::string&) = 0; - // Encrypt a block of data at the given block index. - // Length of data is equal to BlockSize(); - virtual Status EncryptBlock(uint64_t blockIndex, char *data, char* scratch) = 0; + // Encrypt a block of data at the given block index. + // Length of data is equal to BlockSize(); + virtual Status EncryptBlock(uint64_t blockIndex, char* data, + char* scratch) = 0; - // Decrypt a block of data at the given block index. - // Length of data is equal to BlockSize(); - virtual Status DecryptBlock(uint64_t blockIndex, char *data, char* scratch) = 0; + // Decrypt a block of data at the given block index. + // Length of data is equal to BlockSize(); + virtual Status DecryptBlock(uint64_t blockIndex, char* data, + char* scratch) = 0; }; -// BlockCipher +// BlockCipher class BlockCipher { - public: - virtual ~BlockCipher() {}; + public: + virtual ~BlockCipher(){}; - // BlockSize returns the size of each block supported by this cipher stream. 
- virtual size_t BlockSize() = 0; + // BlockSize returns the size of each block supported by this cipher stream. + virtual size_t BlockSize() = 0; - // Encrypt a block of data. - // Length of data is equal to BlockSize(). - virtual Status Encrypt(char *data) = 0; + // Encrypt a block of data. + // Length of data is equal to BlockSize(). + virtual Status Encrypt(char* data) = 0; - // Decrypt a block of data. - // Length of data is equal to BlockSize(). - virtual Status Decrypt(char *data) = 0; + // Decrypt a block of data. + // Length of data is equal to BlockSize(). + virtual Status Decrypt(char* data) = 0; }; // Implements a BlockCipher using ROT13. // -// Note: This is a sample implementation of BlockCipher, +// Note: This is a sample implementation of BlockCipher, // it is NOT considered safe and should NOT be used in production. class ROT13BlockCipher : public BlockCipher { - private: - size_t blockSize_; - public: - ROT13BlockCipher(size_t blockSize) - : blockSize_(blockSize) {} - virtual ~ROT13BlockCipher() {}; - - // BlockSize returns the size of each block supported by this cipher stream. - virtual size_t BlockSize() override { return blockSize_; } - - // Encrypt a block of data. - // Length of data is equal to BlockSize(). - virtual Status Encrypt(char *data) override; - - // Decrypt a block of data. - // Length of data is equal to BlockSize(). - virtual Status Decrypt(char *data) override; + private: + size_t blockSize_; + + public: + ROT13BlockCipher(size_t blockSize) : blockSize_(blockSize) {} + virtual ~ROT13BlockCipher(){}; + + // BlockSize returns the size of each block supported by this cipher stream. + virtual size_t BlockSize() override { return blockSize_; } + + // Encrypt a block of data. + // Length of data is equal to BlockSize(). + virtual Status Encrypt(char* data) override; + + // Decrypt a block of data. + // Length of data is equal to BlockSize(). + virtual Status Decrypt(char* data) override; }; -// CTRCipherStream implements BlockAccessCipherStream using an -// Counter operations mode. +// CTRCipherStream implements BlockAccessCipherStream using a +// Counter operation mode. // See https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation // -// Note: This is a possible implementation of BlockAccessCipherStream, +// Note: This is a possible implementation of BlockAccessCipherStream, // it is considered suitable for use. class CTRCipherStream final : public BlockAccessCipherStream { - private: - BlockCipher& cipher_; - std::string iv_; - uint64_t initialCounter_; - public: - CTRCipherStream(BlockCipher& c, const char *iv, uint64_t initialCounter) - : cipher_(c), iv_(iv, c.BlockSize()), initialCounter_(initialCounter) {}; - virtual ~CTRCipherStream() {}; - - // BlockSize returns the size of each block supported by this cipher stream. - virtual size_t BlockSize() override { return cipher_.BlockSize(); } - - protected: - // Allocate scratch space which is passed to EncryptBlock/DecryptBlock. - virtual void AllocateScratch(std::string&) override; - - // Encrypt a block of data at the given block index. - // Length of data is equal to BlockSize(); - virtual Status EncryptBlock(uint64_t blockIndex, char *data, char *scratch) override; - - // Decrypt a block of data at the given block index.
- // Length of data is equal to BlockSize(); - virtual Status DecryptBlock(uint64_t blockIndex, char *data, char *scratch) override; + private: + BlockCipher& cipher_; + std::string iv_; + uint64_t initialCounter_; + + public: + CTRCipherStream(BlockCipher& c, const char* iv, uint64_t initialCounter) + : cipher_(c), iv_(iv, c.BlockSize()), initialCounter_(initialCounter){}; + virtual ~CTRCipherStream(){}; + + // BlockSize returns the size of each block supported by this cipher stream. + virtual size_t BlockSize() override { return cipher_.BlockSize(); } + + protected: + // Allocate scratch space which is passed to EncryptBlock/DecryptBlock. + virtual void AllocateScratch(std::string&) override; + + // Encrypt a block of data at the given block index. + // Length of data is equal to BlockSize(); + virtual Status EncryptBlock(uint64_t blockIndex, char* data, + char* scratch) override; + + // Decrypt a block of data at the given block index. + // Length of data is equal to BlockSize(); + virtual Status DecryptBlock(uint64_t blockIndex, char* data, + char* scratch) override; }; -// The encryption provider is used to create a cipher stream for a specific file. -// The returned cipher stream will be used for actual encryption/decryption -// actions. +// The encryption provider is used to create a cipher stream for a specific +// file. The returned cipher stream will be used for actual +// encryption/decryption actions. class EncryptionProvider { public: - virtual ~EncryptionProvider() {}; - - // GetPrefixLength returns the length of the prefix that is added to every file - // and used for storing encryption options. - // For optimal performance, the prefix length should be a multiple of - // the page size. - virtual size_t GetPrefixLength() = 0; - - // CreateNewPrefix initialized an allocated block of prefix memory - // for a new file. - virtual Status CreateNewPrefix(const std::string& fname, char *prefix, size_t prefixLength) = 0; - - // CreateCipherStream creates a block access cipher stream for a file given - // given name and options. - virtual Status CreateCipherStream(const std::string& fname, const EnvOptions& options, - Slice& prefix, unique_ptr<BlockAccessCipherStream>* result) = 0; + virtual ~EncryptionProvider(){}; + + // GetPrefixLength returns the length of the prefix that is added to every + // file and used for storing encryption options. For optimal performance, the + // prefix length should be a multiple of the page size. + virtual size_t GetPrefixLength() = 0; + + // CreateNewPrefix initializes an allocated block of prefix memory + // for a new file. + virtual Status CreateNewPrefix(const std::string& fname, char* prefix, + size_t prefixLength) = 0; + + // CreateCipherStream creates a block access cipher stream for a file with + // the given name and options. + virtual Status CreateCipherStream( + const std::string& fname, const EnvOptions& options, Slice& prefix, + std::unique_ptr<BlockAccessCipherStream>* result) = 0; }; -// This encryption provider uses a CTR cipher stream, with a given block cipher +// This encryption provider uses a CTR cipher stream, with a given block cipher // and IV. // -// Note: This is a possible implementation of EncryptionProvider, +// Note: This is a possible implementation of EncryptionProvider, // it is considered suitable for use, provided a safe BlockCipher is used.
class CTREncryptionProvider : public EncryptionProvider { - private: - BlockCipher& cipher_; - protected: - const static size_t defaultPrefixLength = 4096; + private: + BlockCipher& cipher_; + + protected: + const static size_t defaultPrefixLength = 4096; public: - CTREncryptionProvider(BlockCipher& c) - : cipher_(c) {}; - virtual ~CTREncryptionProvider() {} - - // GetPrefixLength returns the length of the prefix that is added to every file - // and used for storing encryption options. - // For optimal performance, the prefix length should be a multiple of - // the page size. - virtual size_t GetPrefixLength() override; - - // CreateNewPrefix initialized an allocated block of prefix memory - // for a new file. - virtual Status CreateNewPrefix(const std::string& fname, char *prefix, size_t prefixLength) override; - - // CreateCipherStream creates a block access cipher stream for a file given - // given name and options. - virtual Status CreateCipherStream(const std::string& fname, const EnvOptions& options, - Slice& prefix, unique_ptr<BlockAccessCipherStream>* result) override; - - protected: - // PopulateSecretPrefixPart initializes the data into a new prefix block - // that will be encrypted. This function will store the data in plain text. - // It will be encrypted later (before written to disk). - // Returns the amount of space (starting from the start of the prefix) - // that has been initialized. - virtual size_t PopulateSecretPrefixPart(char *prefix, size_t prefixLength, size_t blockSize); - - // CreateCipherStreamFromPrefix creates a block access cipher stream for a file given - // given name and options. The given prefix is already decrypted. - virtual Status CreateCipherStreamFromPrefix(const std::string& fname, const EnvOptions& options, - uint64_t initialCounter, const Slice& iv, const Slice& prefix, unique_ptr<BlockAccessCipherStream>* result); + CTREncryptionProvider(BlockCipher& c) : cipher_(c){}; + virtual ~CTREncryptionProvider() {} + + // GetPrefixLength returns the length of the prefix that is added to every + // file and used for storing encryption options. For optimal performance, the + // prefix length should be a multiple of the page size. + virtual size_t GetPrefixLength() override; + + // CreateNewPrefix initializes an allocated block of prefix memory + // for a new file. + virtual Status CreateNewPrefix(const std::string& fname, char* prefix, + size_t prefixLength) override; + + // CreateCipherStream creates a block access cipher stream for a file with + // the given name and options. + virtual Status CreateCipherStream( + const std::string& fname, const EnvOptions& options, Slice& prefix, + std::unique_ptr<BlockAccessCipherStream>* result) override; + + protected: + // PopulateSecretPrefixPart initializes the data into a new prefix block + // that will be encrypted. This function will store the data in plain text. + // It will be encrypted later (before written to disk). + // Returns the amount of space (starting from the start of the prefix) + // that has been initialized. + virtual size_t PopulateSecretPrefixPart(char* prefix, size_t prefixLength, + size_t blockSize); + + // CreateCipherStreamFromPrefix creates a block access cipher stream for a + // file with the given name and options. The given prefix is already decrypted.
+ virtual Status CreateCipherStreamFromPrefix( + const std::string& fname, const EnvOptions& options, + uint64_t initialCounter, const Slice& iv, const Slice& prefix, + std::unique_ptr<BlockAccessCipherStream>* result); }; } // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/filter_policy.h b/ceph/src/rocksdb/include/rocksdb/filter_policy.h index 4e1dc3bfc..5d465b782 100644 --- a/ceph/src/rocksdb/include/rocksdb/filter_policy.h +++ b/ceph/src/rocksdb/include/rocksdb/filter_policy.h @@ -19,9 +19,9 @@ #pragma once +#include #include #include -#include #include #include @@ -47,7 +47,7 @@ class FilterBitsBuilder { // Calculate num of entries fit into a space. #if defined(_MSC_VER) #pragma warning(push) -#pragma warning(disable : 4702) // unreachable code +#pragma warning(disable : 4702) // unreachable code #endif virtual int CalculateNumEntry(const uint32_t /*space*/) { #ifndef ROCKSDB_LITE @@ -102,8 +102,8 @@ class FilterPolicy { // // Warning: do not change the initial contents of *dst. Instead, // append the newly constructed filter to *dst. - virtual void CreateFilter(const Slice* keys, int n, std::string* dst) - const = 0; + virtual void CreateFilter(const Slice* keys, int n, + std::string* dst) const = 0; // "filter" contains the data appended by a preceding call to // CreateFilter() on this class. This method must return true if @@ -114,9 +114,7 @@ class FilterPolicy { // Get the FilterBitsBuilder, which is ONLY used for full filter block // It contains interface to take individual key, then generate filter - virtual FilterBitsBuilder* GetFilterBitsBuilder() const { - return nullptr; - } + virtual FilterBitsBuilder* GetFilterBitsBuilder() const { return nullptr; } // Get the FilterBitsReader, which is ONLY used for full filter block // It contains interface to tell if key can be in filter @@ -145,6 +143,6 @@ class FilterPolicy { // ignores trailing spaces, it would be incorrect to use a // FilterPolicy (like NewBloomFilterPolicy) that does not ignore // trailing spaces in keys. -extern const FilterPolicy* NewBloomFilterPolicy(int bits_per_key, - bool use_block_based_builder = true); -} +extern const FilterPolicy* NewBloomFilterPolicy( + int bits_per_key, bool use_block_based_builder = false); +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/flush_block_policy.h b/ceph/src/rocksdb/include/rocksdb/flush_block_policy.h index 5daa96762..38807249c 100644 --- a/ceph/src/rocksdb/include/rocksdb/flush_block_policy.h +++ b/ceph/src/rocksdb/include/rocksdb/flush_block_policy.h @@ -20,10 +20,9 @@ class FlushBlockPolicy { public: // Keep track of the key/value sequences and return the boolean value to // determine if table builder should flush current data block.
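Regarding the filter_policy.h hunk above: the default for use_block_based_builder flips from true to false, so NewBloomFilterPolicy(bits_per_key) now builds full-format filters by default. A sketch of the usual wiring through the standard BlockBasedTableOptions API (10 bits per key is a common, illustrative choice):

#include "rocksdb/filter_policy.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

rocksdb::Options MakeOptionsWithBloomFilter() {
  rocksdb::BlockBasedTableOptions table_options;
  // With the new default, this creates a full-format (not block-based) filter.
  table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}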
- virtual bool Update(const Slice& key, - const Slice& value) = 0; + virtual bool Update(const Slice& key, const Slice& value) = 0; - virtual ~FlushBlockPolicy() { } + virtual ~FlushBlockPolicy() {} }; class FlushBlockPolicyFactory { @@ -41,7 +40,7 @@ class FlushBlockPolicyFactory { const BlockBasedTableOptions& table_options, const BlockBuilder& data_block_builder) const = 0; - virtual ~FlushBlockPolicyFactory() { } + virtual ~FlushBlockPolicyFactory() {} }; class FlushBlockBySizePolicyFactory : public FlushBlockPolicyFactory { @@ -59,4 +58,4 @@ class FlushBlockBySizePolicyFactory : public FlushBlockPolicyFactory { const BlockBuilder& data_block_builder); }; -} // rocksdb +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/iostats_context.h b/ceph/src/rocksdb/include/rocksdb/iostats_context.h index 77a59643a..67f2b3217 100644 --- a/ceph/src/rocksdb/include/rocksdb/iostats_context.h +++ b/ceph/src/rocksdb/include/rocksdb/iostats_context.h @@ -44,6 +44,10 @@ struct IOStatsContext { uint64_t prepare_write_nanos; // time spent in Logger::Logv(). uint64_t logger_nanos; + // CPU time spent in write() and pwrite() + uint64_t cpu_write_nanos; + // CPU time spent in read() and pread() + uint64_t cpu_read_nanos; }; // Get Thread-local IOStatsContext object pointer diff --git a/ceph/src/rocksdb/include/rocksdb/ldb_tool.h b/ceph/src/rocksdb/include/rocksdb/ldb_tool.h index 0dbc65c4b..636605ff7 100644 --- a/ceph/src/rocksdb/include/rocksdb/ldb_tool.h +++ b/ceph/src/rocksdb/include/rocksdb/ldb_tool.h @@ -38,6 +38,6 @@ class LDBTool { const std::vector<ColumnFamilyDescriptor>* column_families = nullptr); }; -} // namespace rocksdb +} // namespace rocksdb #endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/listener.h b/ceph/src/rocksdb/include/rocksdb/listener.h index 46ce712dc..d4a61c20e 100644 --- a/ceph/src/rocksdb/include/rocksdb/listener.h +++ b/ceph/src/rocksdb/include/rocksdb/listener.h @@ -4,6 +4,7 @@ #pragma once +#include <chrono> #include <memory> #include <string> #include <vector> @@ -143,7 +144,24 @@ struct TableFileDeletionInfo { Status status; }; +struct FileOperationInfo { + using TimePoint = std::chrono::time_point<std::chrono::system_clock, + std::chrono::nanoseconds>; + + const std::string& path; + uint64_t offset; + size_t length; + const TimePoint& start_timestamp; + const TimePoint& finish_timestamp; + Status status; + FileOperationInfo(const std::string& _path, const TimePoint& start, + const TimePoint& finish) + : path(_path), start_timestamp(start), finish_timestamp(finish) {} +}; + struct FlushJobInfo { + // the id of the column family + uint32_t cf_id; // the name of the column family std::string cf_name; // the path to the newly created file @@ -174,9 +192,11 @@ struct FlushJobInfo { struct CompactionJobInfo { CompactionJobInfo() = default; - explicit CompactionJobInfo(const CompactionJobStats& _stats) : - stats(_stats) {} + explicit CompactionJobInfo(const CompactionJobStats& _stats) + : stats(_stats) {} + // the id of the column family where the compaction happened. + uint32_t cf_id; // the name of the column family where the compaction happened. std::string cf_name; // the status indicating whether the compaction was successful or not. @@ -224,7 +244,6 @@ struct MemTableInfo { uint64_t num_entries; // Total number of deletes in memtable uint64_t num_deletes; - }; struct ExternalFileIngestionInfo { @@ -297,6 +316,15 @@ class EventListener { // returned value. virtual void OnTableFileDeleted(const TableFileDeletionInfo& /*info*/) {} + // A callback function to RocksDB which will be called before + // RocksDB starts to compact.
The default implementation is + // a no-op. + // + // Note that this function must be implemented in a way such that + // it should not run for an extended period of time before the function + // returns. Otherwise, RocksDB may be blocked. + virtual void OnCompactionBegin(DB* /*db*/, const CompactionJobInfo& /*ci*/) {} + // A callback function for RocksDB which will be called whenever // a registered RocksDB compacts a file. The default implementation // is a no-op. @@ -350,8 +378,7 @@ class EventListener { // Note that if applications would like to use the passed reference // outside this function call, they should make copies from these // returned value. - virtual void OnMemTableSealed( - const MemTableInfo& /*info*/) {} + virtual void OnMemTableSealed(const MemTableInfo& /*info*/) {} // A callback function for RocksDB which will be called before // a column family handle is deleted. @@ -395,6 +422,18 @@ class EventListener { // returns. Otherwise, RocksDB may be blocked. virtual void OnStallConditionsChanged(const WriteStallInfo& /*info*/) {} + // A callback function for RocksDB which will be called whenever a file read + // operation finishes. + virtual void OnFileReadFinish(const FileOperationInfo& /* info */) {} + + // A callback function for RocksDB which will be called whenever a file write + // operation finishes. + virtual void OnFileWriteFinish(const FileOperationInfo& /* info */) {} + + // If true, the OnFileReadFinish and OnFileWriteFinish will be called. If + // false, then they won't be called. + virtual bool ShouldBeNotifiedOnFileIO() { return false; } + // A callback function for RocksDB which will be called just before // starting the automatic recovery process for recoverable background // errors, such as NoSpace(). The callback can suppress the automatic @@ -415,8 +454,7 @@ class EventListener { #else -class EventListener { -}; +class EventListener {}; #endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/memory_allocator.h b/ceph/src/rocksdb/include/rocksdb/memory_allocator.h new file mode 100644 index 000000000..889c0e921 --- /dev/null +++ b/ceph/src/rocksdb/include/rocksdb/memory_allocator.h @@ -0,0 +1,77 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#pragma once + +#include "rocksdb/status.h" + +#include <memory> + +namespace rocksdb { + +// MemoryAllocator is an interface that a client can implement to supply custom +// memory allocation and deallocation methods. See rocksdb/cache.h for more +// information. +// All methods should be thread-safe. +class MemoryAllocator { + public: + virtual ~MemoryAllocator() = default; + + // Name of the cache allocator, printed in the log + virtual const char* Name() const = 0; + + // Allocate a block of at least size. Has to be thread-safe. + virtual void* Allocate(size_t size) = 0; + + // Deallocate previously allocated block. Has to be thread-safe. + virtual void Deallocate(void* p) = 0; + + // Returns the memory size of the block allocated at p. The default + // implementation, which just returns the original allocation_size, is fine. + virtual size_t UsableSize(void* /*p*/, size_t allocation_size) const { + // default implementation just returns the allocation size + return allocation_size; + } +}; + +struct JemallocAllocatorOptions { + // Jemalloc tcache caches allocations by size class.
For each size class, + // it caches between 20 (for large size classes) and 200 (for small size + // classes) allocations. To reduce tcache memory usage in case the allocator + // is accessed by a large number of threads, we can control whether to cache + // an allocation by its size. + bool limit_tcache_size = false; + + // Lower bound of allocation size to use tcache, if limit_tcache_size=true. + // When used with block cache, it is recommended to set it to block_size/4. + size_t tcache_size_lower_bound = 1024; + + // Upper bound of allocation size to use tcache, if limit_tcache_size=true. + // When used with block cache, it is recommended to set it to block_size. + size_t tcache_size_upper_bound = 16 * 1024; +}; + +// Generate memory allocators which allocate through Jemalloc and utilize +// MADV_DONTDUMP through madvise to exclude cache items from core dumps. +// Applications can use the allocator with block cache to exclude block cache +// usage from core dumps. +// +// Implementation details: +// The JemallocNodumpAllocator creates a dedicated jemalloc arena, and all +// allocations of the JemallocNodumpAllocator go through the same arena. +// The memory allocator hooks memory allocation of the arena, and calls +// madvise() with the MADV_DONTDUMP flag to exclude the piece of memory from +// core dumps. A side benefit of using a single arena is reduced jemalloc +// metadata for some workloads. +// +// To mitigate mutex contention from using one single arena, jemalloc tcache +// (thread-local cache) is enabled to cache unused allocations for future use. +// The tcache normally incurs 0.5MB of extra memory usage per thread. The usage +// can be reduced by limiting the allocation sizes to cache. +extern Status NewJemallocNodumpAllocator( + JemallocAllocatorOptions& options, + std::shared_ptr<MemoryAllocator>* memory_allocator); + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/memtablerep.h b/ceph/src/rocksdb/include/rocksdb/memtablerep.h index 4c2a23e0a..328422f57 100644 --- a/ceph/src/rocksdb/include/rocksdb/memtablerep.h +++ b/ceph/src/rocksdb/include/rocksdb/memtablerep.h @@ -35,11 +35,11 @@ #pragma once -#include -#include +#include #include #include -#include +#include +#include namespace rocksdb { @@ -75,7 +75,7 @@ class MemTableRep { virtual int operator()(const char* prefix_len_key, const Slice& key) const = 0; - virtual ~KeyComparator() { } + virtual ~KeyComparator() {} }; explicit MemTableRep(Allocator* allocator) : allocator_(allocator) {} @@ -142,7 +142,7 @@ class MemTableRep { // does nothing. After MarkReadOnly() is called, this table rep will // not be written to (ie No more calls to Allocate(), Insert(), // or any writes done directly to entries accessed through the iterator.) - virtual void MarkReadOnly() { } + virtual void MarkReadOnly() {} // Notify this table rep that it has been flushed to stable storage. // By default, does nothing. @@ -150,7 +150,7 @@ class MemTableRep { // Invariant: MarkReadOnly() is called, before MarkFlushed(). // Note that this method if overridden, should not run for an extended period // of time. Otherwise, RocksDB may be blocked. - virtual void MarkFlushed() { } + virtual void MarkFlushed() {} // Look up key from the mem table, since the first key in the mem table whose // user_key matches the one given k, call the function callback_func(), with @@ -176,7 +176,7 @@ class MemTableRep { // that was allocated through the allocator. Safe to call from any thread.
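To make the MemoryAllocator interface from the memory_allocator.h hunk above concrete, here is a minimal malloc-backed sketch (the name CountingAllocator is hypothetical; a real allocator such as the one returned by NewJemallocNodumpAllocator would also report accurate usable sizes):

#include <atomic>
#include <cstdint>
#include <cstdlib>
#include "rocksdb/memory_allocator.h"

// Hypothetical example: plain malloc/free plus a count of live allocations.
class CountingAllocator : public rocksdb::MemoryAllocator {
 public:
  const char* Name() const override { return "CountingAllocator"; }
  void* Allocate(size_t size) override {
    live_.fetch_add(1, std::memory_order_relaxed);
    return std::malloc(size);
  }
  void Deallocate(void* p) override {
    live_.fetch_sub(1, std::memory_order_relaxed);
    std::free(p);
  }
  uint64_t live() const { return live_.load(std::memory_order_relaxed); }

 private:
  std::atomic<uint64_t> live_{0};
};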
virtual size_t ApproximateMemoryUsage() = 0; - virtual ~MemTableRep() { } + virtual ~MemTableRep() {} // Iteration over the contents of a skip collection class Iterator { @@ -317,16 +317,14 @@ class VectorRepFactory : public MemTableRepFactory { const size_t count_; public: - explicit VectorRepFactory(size_t count = 0) : count_(count) { } + explicit VectorRepFactory(size_t count = 0) : count_(count) {} using MemTableRepFactory::CreateMemTableRep; virtual MemTableRep* CreateMemTableRep(const MemTableRep::KeyComparator&, Allocator*, const SliceTransform*, Logger* logger) override; - virtual const char* Name() const override { - return "VectorRepFactory"; - } + virtual const char* Name() const override { return "VectorRepFactory"; } }; // This class contains a fixed array of buckets, each @@ -337,8 +335,7 @@ class VectorRepFactory : public MemTableRepFactory { // link lists in the skiplist extern MemTableRepFactory* NewHashSkipListRepFactory( size_t bucket_count = 1000000, int32_t skiplist_height = 4, - int32_t skiplist_branching_factor = 4 -); + int32_t skiplist_branching_factor = 4); // The factory is to create memtables based on a hash table: // it contains a fixed array of buckets, each pointing to either a linked list @@ -362,39 +359,5 @@ extern MemTableRepFactory* NewHashLinkListRepFactory( bool if_log_bucket_dist_when_flash = true, uint32_t threshold_use_skiplist = 256); -// This factory creates a cuckoo-hashing based mem-table representation. -// Cuckoo-hash is a closed-hash strategy, in which all key/value pairs -// are stored in the bucket array itself instead of in some data structures -// external to the bucket array. In addition, each key in cuckoo hash -// has a constant number of possible buckets in the bucket array. These -// two properties together makes cuckoo hash more memory efficient and -// a constant worst-case read time. Cuckoo hash is best suitable for -// point-lookup workload. -// -// When inserting a key / value, it first checks whether one of its possible -// buckets is empty. If so, the key / value will be inserted to that vacant -// bucket. Otherwise, one of the keys originally stored in one of these -// possible buckets will be "kicked out" and move to one of its possible -// buckets (and possibly kicks out another victim.) In the current -// implementation, such "kick-out" path is bounded. If it cannot find a -// "kick-out" path for a specific key, this key will be stored in a backup -// structure, and the current memtable to be forced to immutable. -// -// Note that currently this mem-table representation does not support -// snapshot (i.e., it only queries latest state) and iterators. In addition, -// MultiGet operation might also lose its atomicity due to the lack of -// snapshot support. -// -// Parameters: -// write_buffer_size: the write buffer size in bytes. -// average_data_size: the average size of key + value in bytes. This value -// together with write_buffer_size will be used to compute the number -// of buckets. -// hash_function_count: the number of hash functions that will be used by -// the cuckoo-hash. The number also equals to the number of possible -// buckets each key will have. 
-extern MemTableRepFactory* NewHashCuckooRepFactory( - size_t write_buffer_size, size_t average_data_size = 64, - unsigned int hash_function_count = 4); #endif // ROCKSDB_LITE } // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/merge_operator.h b/ceph/src/rocksdb/include/rocksdb/merge_operator.h index b90f3d72f..d8ddcc6a0 100644 --- a/ceph/src/rocksdb/include/rocksdb/merge_operator.h +++ b/ceph/src/rocksdb/include/rocksdb/merge_operator.h @@ -108,6 +108,23 @@ class MergeOperator { Slice& existing_operand; }; + // This function applies a stack of merge operands in chronological order + // on top of an existing value. There are two ways in which this method is + // being used: + // a) During Get() operation, it is used to calculate the final value of a key + // b) During compaction, in order to collapse some operands with the base + // value. + // + // Note: The name of the method is somewhat misleading, as both in the cases + // of Get() or compaction it may be called on a subset of operands: + // K: 0 +1 +2 +7 +4 +5 [snapshot] 2 +1 +2 + // In the example above, Get(K) operation will call FullMerge with a base + // value of 2 and operands [+1, +2]. Compaction process might decide to + // collapse the beginning of the history up to the snapshot by performing + // full Merge with base value of 0 and operands [+1, +2, +7, +4, +5]. virtual bool FullMergeV2(const MergeOperationInput& merge_in, MergeOperationOutput* merge_out) const; @@ -182,11 +199,11 @@ class MergeOperator { // consistent MergeOperator between DB opens. virtual const char* Name() const = 0; - // Determines whether the MergeOperator can be called with just a single + // Determines whether the PartialMerge can be called with just a single // merge operand. - // Override and return true for allowing a single operand. Both FullMergeV2 - // and PartialMerge/PartialMergeMulti should be overridden and implemented - // correctly to handle a single operand. + // Override and return true for allowing a single operand. PartialMerge + // and PartialMergeMulti should be overridden and implemented + // correctly to properly handle a single operand. virtual bool AllowSingleOperand() const { return false; } // Allows to control when to invoke a full merge during Get. @@ -222,13 +239,10 @@ class AssociativeMergeOperator : public MergeOperator { // returns false, it is because client specified bad data or there was // internal corruption. The client should assume that this will be treated // as an error by the library.
- virtual bool Merge(const Slice& key, - const Slice* existing_value, - const Slice& value, - std::string* new_value, + virtual bool Merge(const Slice& key, const Slice* existing_value, + const Slice& value, std::string* new_value, Logger* logger) const = 0; - private: // Default implementations of the MergeOperator functions bool FullMergeV2(const MergeOperationInput& merge_in, diff --git a/ceph/src/rocksdb/include/rocksdb/metadata.h b/ceph/src/rocksdb/include/rocksdb/metadata.h index a9773bf40..a0ab41efd 100644 --- a/ceph/src/rocksdb/include/rocksdb/metadata.h +++ b/ceph/src/rocksdb/include/rocksdb/metadata.h @@ -22,8 +22,8 @@ struct SstFileMetaData; struct ColumnFamilyMetaData { ColumnFamilyMetaData() : size(0), file_count(0), name("") {} ColumnFamilyMetaData(const std::string& _name, uint64_t _size, - const std::vector&& _levels) : - size(_size), name(_name), levels(_levels) {} + const std::vector&& _levels) + : size(_size), name(_name), levels(_levels) {} // The size of this column family in bytes, which is equal to the sum of // the file size of its "levels". @@ -39,9 +39,8 @@ struct ColumnFamilyMetaData { // The metadata that describes a level. struct LevelMetaData { LevelMetaData(int _level, uint64_t _size, - const std::vector&& _files) : - level(_level), size(_size), - files(_files) {} + const std::vector&& _files) + : level(_level), size(_size), files(_files) {} // The level which this meta data describes. const int level; @@ -63,7 +62,10 @@ struct SstFileMetaData { smallestkey(""), largestkey(""), num_reads_sampled(0), - being_compacted(false) {} + being_compacted(false), + num_entries(0), + num_deletions(0) {} + SstFileMetaData(const std::string& _file_name, const std::string& _path, size_t _size, SequenceNumber _smallest_seqno, SequenceNumber _largest_seqno, @@ -78,7 +80,9 @@ struct SstFileMetaData { smallestkey(_smallestkey), largestkey(_largestkey), num_reads_sampled(_num_reads_sampled), - being_compacted(_being_compacted) {} + being_compacted(_being_compacted), + num_entries(0), + num_deletions(0) {} // File size in bytes. size_t size; @@ -89,15 +93,19 @@ struct SstFileMetaData { SequenceNumber smallest_seqno; // Smallest sequence number in file. SequenceNumber largest_seqno; // Largest sequence number in file. - std::string smallestkey; // Smallest user defined key in the file. - std::string largestkey; // Largest user defined key in the file. - uint64_t num_reads_sampled; // How many times the file is read. + std::string smallestkey; // Smallest user defined key in the file. + std::string largestkey; // Largest user defined key in the file. + uint64_t num_reads_sampled; // How many times the file is read. bool being_compacted; // true if the file is currently being compacted. + + uint64_t num_entries; + uint64_t num_deletions; }; // The full set of metadata associated with each SST file. struct LiveFileMetaData : SstFileMetaData { std::string column_family_name; // Name of the column family - int level; // Level at which this file resides. + int level; // Level at which this file resides. 
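The merge_operator.h hunks above document FullMergeV2 and the simplified AssociativeMergeOperator::Merge signature. A sketch of an associative counter operator against that signature (the raw 8-byte encoding is an illustrative choice, not something the header prescribes):

#include <cstdint>
#include <cstring>
#include <string>
#include "rocksdb/merge_operator.h"
#include "rocksdb/slice.h"

// Hypothetical example: an associative uint64 "add" operator.
class UInt64AddOperator : public rocksdb::AssociativeMergeOperator {
 public:
  bool Merge(const rocksdb::Slice& /*key*/,
             const rocksdb::Slice* existing_value, const rocksdb::Slice& value,
             std::string* new_value,
             rocksdb::Logger* /*logger*/) const override {
    uint64_t base = 0;
    if (existing_value != nullptr && existing_value->size() == sizeof(base)) {
      std::memcpy(&base, existing_value->data(), sizeof(base));
    }
    uint64_t operand = 0;
    if (value.size() == sizeof(operand)) {
      std::memcpy(&operand, value.data(), sizeof(operand));
    }
    base += operand;  // the associative operation
    new_value->assign(reinterpret_cast<const char*>(&base), sizeof(base));
    return true;
  }
  const char* Name() const override { return "UInt64AddOperator"; }
};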
+ LiveFileMetaData() : column_family_name(), level(0) {} }; } // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/options.h b/ceph/src/rocksdb/include/rocksdb/options.h index c4597c489..f7d6dfaf5 100644 --- a/ceph/src/rocksdb/include/rocksdb/options.h +++ b/ceph/src/rocksdb/include/rocksdb/options.h @@ -10,11 +10,11 @@ #include #include -#include -#include -#include #include +#include +#include #include +#include #include "rocksdb/advanced_options.h" #include "rocksdb/comparator.h" @@ -34,6 +34,7 @@ class Cache; class CompactionFilter; class CompactionFilterFactory; class Comparator; +class ConcurrentTaskLimiter; class Env; enum InfoLogLevel : unsigned char; class SstFileManager; @@ -93,8 +94,7 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions { // an iterator, only Put() and Get() API calls // // Not supported in ROCKSDB_LITE - ColumnFamilyOptions* OptimizeForPointLookup( - uint64_t block_cache_size_mb); + ColumnFamilyOptions* OptimizeForPointLookup(uint64_t block_cache_size_mb); // Default values for some parameters in ColumnFamilyOptions are not // optimized for heavy workloads and big datasets, which means you might @@ -188,8 +188,7 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions { // Dynamically changeable through SetOptions() API size_t write_buffer_size = 64 << 20; - // Compress blocks using the specified compression algorithm. This - // parameter can be changed dynamically. + // Compress blocks using the specified compression algorithm. // // Default: kSnappyCompression, if it's supported. If snappy is not linked // with the library, the default is kNoCompression. @@ -212,6 +211,8 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions { // - kZlibCompression: Z_DEFAULT_COMPRESSION (currently -1) // - kLZ4HCCompression: 0 // - For all others, we do not specify a compression level + // + // Dynamically changeable through SetOptions() API CompressionType compression; // Compression algorithm that will be used for the bottommost level that @@ -292,6 +293,14 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions { // Default: empty std::vector cf_paths; + // Compaction concurrent thread limiter for the column family. + // If non-nullptr, use given concurrent thread limiter to control + // the max outstanding compaction tasks. Limiter can be shared with + // multiple column families across db instances. + // + // Default: nullptr + std::shared_ptr compaction_thread_limiter = nullptr; + // Create ColumnFamilyOptions with default values for all fields ColumnFamilyOptions(); // Create ColumnFamilyOptions from Options @@ -331,7 +340,6 @@ struct DbPath { DbPath(const std::string& p, uint64_t t) : path(p), target_size(t) {} }; - struct DBOptions { // The function recovers options to the option as in version 4.6. DBOptions* OldDefaults(int rocksdb_major_version = 4, @@ -406,9 +414,9 @@ struct DBOptions { std::shared_ptr info_log = nullptr; #ifdef NDEBUG - InfoLogLevel info_log_level = INFO_LEVEL; + InfoLogLevel info_log_level = INFO_LEVEL; #else - InfoLogLevel info_log_level = DEBUG_LEVEL; + InfoLogLevel info_log_level = DEBUG_LEVEL; #endif // NDEBUG // Number of open files that can be used by the DB. You may need to @@ -416,7 +424,10 @@ struct DBOptions { // files opened are always kept open. You can estimate number of files based // on target_file_size_base and target_file_size_multiplier for level-based // compaction. For universal-style compaction, you can usually set it to -1. 
+ // // Default: -1 + // + // Dynamically changeable through SetDBOptions() API. int max_open_files = -1; // If max_open_files is -1, DB will open all files on DB::Open(). You can @@ -431,7 +442,10 @@ struct DBOptions { // [sum of all write_buffer_size * max_write_buffer_number] * 4 // This option takes effect only when there are more than one column family as // otherwise the wal size is dictated by the write_buffer_size. + // // Default: 0 + // + // Dynamically changeable through SetDBOptions() API. uint64_t max_total_wal_size = 0; // If non-null, then we should collect metrics about database operations @@ -492,13 +506,23 @@ struct DBOptions { // value is 6 hours. The files that get out of scope by compaction // process will still get automatically delete on every compaction, // regardless of this setting + // + // Default: 6 hours + // + // Dynamically changeable through SetDBOptions() API. uint64_t delete_obsolete_files_period_micros = 6ULL * 60 * 60 * 1000000; // Maximum number of concurrent background jobs (compactions and flushes). + // + // Default: 2 + // + // Dynamically changeable through SetDBOptions() API. int max_background_jobs = 2; // NOT SUPPORTED ANYMORE: RocksDB automatically decides this based on the // value of max_background_jobs. This option is ignored. + // + // Dynamically changeable through SetDBOptions() API. int base_background_compactions = -1; // NOT SUPPORTED ANYMORE: RocksDB automatically decides this based on the @@ -513,7 +537,10 @@ struct DBOptions { // If you're increasing this, also consider increasing number of threads in // LOW priority thread pool. For more information, see // Env::SetBackgroundThreads + // // Default: -1 + // + // Dynamically changeable through SetDBOptions() API. int max_background_compactions = -1; // This value represents the maximum number of threads that will @@ -642,9 +669,21 @@ struct DBOptions { bool skip_log_error_on_recovery = false; // if not zero, dump rocksdb.stats to LOG every stats_dump_period_sec + // // Default: 600 (10 min) + // + // Dynamically changeable through SetDBOptions() API. unsigned int stats_dump_period_sec = 600; + // if not zero, dump rocksdb.stats to RocksDB every stats_persist_period_sec + // Default: 600 + unsigned int stats_persist_period_sec = 600; + + // if not zero, periodically take stats snapshots and store in memory, the + // memory size for stats snapshots is capped at stats_history_buffer_size + // Default: 1MB + size_t stats_history_buffer_size = 1024 * 1024; + // If set true, will hint the underlying file system that the file // access pattern is random, when a sst file is opened. // Default: true @@ -668,7 +707,7 @@ struct DBOptions { // a limit, a flush will be triggered in the next DB to which the next write // is issued. // - // If the object is only passed to on DB, the behavior is the same as + // If the object is only passed to one DB, the behavior is the same as // db_write_buffer_size. When write_buffer_manager is set, the value set will // override db_write_buffer_size. // @@ -681,12 +720,7 @@ struct DBOptions { // Specify the file access pattern once a compaction is started. // It will be applied to all input files of a compaction. // Default: NORMAL - enum AccessHint { - NONE, - NORMAL, - SEQUENTIAL, - WILLNEED - }; + enum AccessHint { NONE, NORMAL, SEQUENTIAL, WILLNEED }; AccessHint access_hint_on_compaction_start = NORMAL; // If true, always create a new file descriptor and new table reader @@ -709,6 +743,8 @@ struct DBOptions { // true. 
// // Default: 0 + // + // Dynamically changeable through SetDBOptions() API. size_t compaction_readahead_size = 0; // This is a maximum buffer size that is used by WinMmapReadableFile in @@ -735,9 +771,10 @@ struct DBOptions { // write requests if the logical sector size is unusual // // Default: 1024 * 1024 (1 MB) + // + // Dynamically changeable through SetDBOptions() API. size_t writable_file_max_buffer_size = 1024 * 1024; - // Use adaptive mutex, which spins in the user space before resorting // to kernel. This could reduce context switch when the mutex is not // heavily contended. However, if the mutex is hot, we could end up @@ -757,20 +794,27 @@ struct DBOptions { // to smooth out write I/Os over time. Users shouldn't rely on it for // persistency guarantee. // Issue one request for every bytes_per_sync written. 0 turns it off. - // Default: 0 // // You may consider using rate_limiter to regulate write rate to device. // When rate limiter is enabled, it automatically enables bytes_per_sync // to 1MB. // // This option applies to table files + // + // Default: 0, turned off + // + // Note: DOES NOT apply to WAL files. See wal_bytes_per_sync instead + // Dynamically changeable through SetDBOptions() API. uint64_t bytes_per_sync = 0; // Same as bytes_per_sync, but applies to WAL files + // // Default: 0, turned off + // + // Dynamically changeable through SetDBOptions() API. uint64_t wal_bytes_per_sync = 0; - // A vector of EventListeners which callback functions will be called + // A vector of EventListeners whose callback functions will be called // when specific RocksDB event happens. std::vector> listeners; @@ -794,6 +838,8 @@ struct DBOptions { // Unit: byte per second. // // Default: 0 + // + // Dynamically changeable through SetDBOptions() API. uint64_t delayed_write_rate = 0; // By default, a single write thread queue is maintained. The thread gets @@ -943,6 +989,28 @@ struct DBOptions { // relies on manual invocation of FlushWAL to write the WAL buffer to its // file. bool manual_wal_flush = false; + + // If true, RocksDB supports flushing multiple column families and committing + // their results atomically to MANIFEST. Note that it is not + // necessary to set atomic_flush to true if WAL is always enabled since WAL + // allows the database to be restored to the last persistent state in WAL. + // This option is useful when there are column families with writes NOT + // protected by WAL. + // For manual flush, application has to specify which column families to + // flush atomically in DB::Flush. + // For auto-triggered flush, RocksDB atomically flushes ALL column families. + // + // Currently, any WAL-enabled writes after atomic flush may be replayed + // independently if the process crashes later and tries to recover. + bool atomic_flush = false; + + // If true, ColumnFamilyHandle's and Iterator's destructors won't delete + // obsolete files directly and will instead schedule a background job + // to do it. Use it if you're destroying iterators or ColumnFamilyHandle-s + // from latency-sensitive threads. + // If set to true, takes precedence over + // ReadOptions::background_purge_on_iterator_cleanup. + bool avoid_unnecessary_blocking_io = false; }; // Options to control the behavior of a database (passed to DB::Open) @@ -1145,7 +1213,10 @@ struct WriteOptions { bool sync; // If true, writes will not first go to the write ahead log, - // and the write may got lost after a crash. + // and the write may get lost after a crash. 
The backup engine + // relies on write-ahead logs to back up the memtable, so if + // you disable write-ahead logs, you must create backups with + // flush_before_backup=true to avoid losing unflushed memtable data. // Default: false bool disableWAL; @@ -1286,8 +1357,32 @@ struct IngestExternalFileOptions { // 2. Without writing external SST file, it's possible to do checksum. // We have a plan to set this option to false by default in the future. bool write_global_seqno = true; + // Set to true if you would like to verify the checksums of each block of the + // external SST file before ingestion. + // Warning: setting this to true causes slowdown in file ingestion because + // the external SST file has to be read. + bool verify_checksums_before_ingest = false; }; -struct TraceOptions {}; +enum TraceFilterType : uint64_t { + // Trace all the operations + kTraceFilterNone = 0x0, + // Do not trace the get operations + kTraceFilterGet = 0x1 << 0, + // Do not trace the write operations + kTraceFilterWrite = 0x1 << 1 +}; + +// TraceOptions is used for StartTrace +struct TraceOptions { + // To keep the trace file from growing larger than the available storage + // space, the user can set the max trace file size in bytes. Default is 64GB + uint64_t max_trace_file_size = uint64_t{64} * 1024 * 1024 * 1024; + // Specify the trace sampling option, i.e. capture one out of every N + // requests. Defaults to 1 (capture every request). + uint64_t sampling_frequency = 1; + // Note: The filtering happens before sampling. + uint64_t filter = kTraceFilterNone; +}; } // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/perf_context.h b/ceph/src/rocksdb/include/rocksdb/perf_context.h index d3771d3f0..a1d803c2c 100644 --- a/ceph/src/rocksdb/include/rocksdb/perf_context.h +++ b/ceph/src/rocksdb/include/rocksdb/perf_context.h @@ -6,6 +6,7 @@ #pragma once #include +#include #include #include "rocksdb/perf_level.h" @@ -16,18 +17,64 @@ namespace rocksdb { // and transparently. // Use SetPerfLevel(PerfLevel::kEnableTime) to enable time stats. +// Break down performance counters by level and store per-level perf context in +// PerfContextByLevel +struct PerfContextByLevel { + // # of times bloom filter has avoided file reads, i.e., negatives. + uint64_t bloom_filter_useful = 0; + // # of times bloom FullFilter has not avoided the reads. + uint64_t bloom_filter_full_positive = 0; + // # of times bloom FullFilter has not avoided the reads and the data + // actually exists. + uint64_t bloom_filter_full_true_positive = 0; + + // total number of user keys returned (only includes keys that are found; + // does not include keys that are deleted or merged without a final Put) + uint64_t user_key_return_count; + + // total nanos spent on reading data from SST files + uint64_t get_from_table_nanos; + + uint64_t block_cache_hit_count = 0; // total number of block cache hits + uint64_t block_cache_miss_count = 0; // total number of block cache misses + + void Reset(); // reset all performance counters to zero +}; +
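A minimal sketch (not part of the patch) of how the new per-level perf context above might be used, assuming a DB whose table factory has a bloom filter configured; the calls follow the declarations in this header:

```cpp
#include <cstdio>
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"

void DumpPerLevelBloomCounters() {
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableTimeExceptForMutex);
  rocksdb::PerfContext* ctx = rocksdb::get_perf_context();
  ctx->EnablePerLevelPerfContext();  // allocates the per-level map
  ctx->Reset();
  // ... issue some Get() calls against the DB here ...
  if (ctx->level_to_perf_context != nullptr) {
    for (const auto& it : *ctx->level_to_perf_context) {
      // it.first is the level; it.second holds that level's counters.
      std::printf("L%u bloom_filter_useful=%llu\n", it.first,
                  static_cast<unsigned long long>(it.second.bloom_filter_useful));
    }
  }
  ctx->ClearPerLevelPerfContext();  // frees the map and disables collection
}
```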
struct PerfContext { + ~PerfContext(); + + PerfContext() {} + + PerfContext(const PerfContext&); + PerfContext& operator=(const PerfContext&); + PerfContext(PerfContext&&) noexcept; - void Reset(); // reset all performance counters to zero + void Reset(); // reset all performance counters to zero std::string ToString(bool exclude_zero_counters = false) const; - uint64_t user_key_comparison_count; // total number of user key comparisons - uint64_t block_cache_hit_count; // total number of block cache hits - uint64_t block_read_count; // total number of block reads (with IO) - uint64_t block_read_byte; // total number of bytes from block reads - uint64_t block_read_time; // total nanos spent on block reads - uint64_t block_checksum_time; // total nanos spent on block checksum + // enable per level perf context and allocate storage for PerfContextByLevel + void EnablePerLevelPerfContext(); + + // temporarily disable per level perf context by setting the flag to false + void DisablePerLevelPerfContext(); + + // free the space for PerfContextByLevel, also disable per level perf context + void ClearPerLevelPerfContext(); + + uint64_t user_key_comparison_count; // total number of user key comparisons + uint64_t block_cache_hit_count; // total number of block cache hits + uint64_t block_read_count; // total number of block reads (with IO) + uint64_t block_read_byte; // total number of bytes from block reads + uint64_t block_read_time; // total nanos spent on block reads + uint64_t block_cache_index_hit_count; // total number of index block hits + uint64_t index_block_read_count; // total number of index block reads + uint64_t block_cache_filter_hit_count; // total number of filter block hits + uint64_t filter_block_read_count; // total number of filter block reads + uint64_t compression_dict_block_read_count; // total number of compression + // dictionary block reads + uint64_t block_checksum_time; // total nanos spent on block checksum uint64_t block_decompress_time; // total nanos spent on block decompression uint64_t get_read_bytes; // bytes for vals returned by Get @@ -68,9 +115,9 @@ struct PerfContext { // uint64_t internal_merge_count; - uint64_t get_snapshot_time; // total nanos spent on getting snapshot - uint64_t get_from_memtable_time; // total nanos spent on querying memtables - uint64_t get_from_memtable_count; // number of mem tables queried + uint64_t get_snapshot_time; // total nanos spent on getting snapshot + uint64_t get_from_memtable_time; // total nanos spent on querying memtables + uint64_t get_from_memtable_count; // number of mem tables queried // total nanos spent after Get() finds a key uint64_t get_post_process_time; uint64_t get_from_output_files_time; // total nanos reading from output files @@ -168,10 +215,18 @@ struct PerfContext { uint64_t env_lock_file_nanos; uint64_t env_unlock_file_nanos; uint64_t env_new_logger_nanos; + + uint64_t get_cpu_nanos; + uint64_t iter_next_cpu_nanos; + uint64_t iter_prev_cpu_nanos; + uint64_t
iter_seek_cpu_nanos; + + std::map* level_to_perf_context = nullptr; + bool per_level_perf_context_enabled = false; }; // Get Thread-local PerfContext object pointer // if defined(NPERF_CONTEXT), then the pointer is not thread-local PerfContext* get_perf_context(); -} +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/perf_level.h b/ceph/src/rocksdb/include/rocksdb/perf_level.h index 218c6015f..de0a214d6 100644 --- a/ceph/src/rocksdb/include/rocksdb/perf_level.h +++ b/ceph/src/rocksdb/include/rocksdb/perf_level.h @@ -17,8 +17,11 @@ enum PerfLevel : unsigned char { kEnableCount = 2, // enable only count stats kEnableTimeExceptForMutex = 3, // Other than count stats, also enable time // stats except for mutexes - kEnableTime = 4, // enable count and time stats - kOutOfBounds = 5 // N.B. Must always be the last value! + // Other than time, also measure CPU time counters. Still don't measure + // time (neither wall time nor CPU time) for mutexes. + kEnableTimeAndCPUTimeExceptForMutex = 4, + kEnableTime = 5, // enable count and time stats + kOutOfBounds = 6 // N.B. Must always be the last value! }; // set the perf stats level for current thread diff --git a/ceph/src/rocksdb/include/rocksdb/rate_limiter.h b/ceph/src/rocksdb/include/rocksdb/rate_limiter.h index a81a3ac91..57b1169b6 100644 --- a/ceph/src/rocksdb/include/rocksdb/rate_limiter.h +++ b/ceph/src/rocksdb/include/rocksdb/rate_limiter.h @@ -81,11 +81,11 @@ class RateLimiter { // Max bytes can be granted in a single burst virtual int64_t GetSingleBurstBytes() const = 0; - // Total bytes that go though rate limiter + // Total bytes that go through rate limiter virtual int64_t GetTotalBytesThrough( const Env::IOPriority pri = Env::IO_TOTAL) const = 0; - // Total # of requests that go though rate limiter + // Total # of requests that go through rate limiter virtual int64_t GetTotalRequests( const Env::IOPriority pri = Env::IO_TOTAL) const = 0; diff --git a/ceph/src/rocksdb/include/rocksdb/slice.h b/ceph/src/rocksdb/include/rocksdb/slice.h index 9ccbdc51e..2b01e6d9a 100644 --- a/ceph/src/rocksdb/include/rocksdb/slice.h +++ b/ceph/src/rocksdb/include/rocksdb/slice.h @@ -19,9 +19,9 @@ #pragma once #include -#include #include #include +#include #include #ifdef __cpp_lib_string_view @@ -35,14 +35,14 @@ namespace rocksdb { class Slice { public: // Create an empty slice. - Slice() : data_(""), size_(0) { } + Slice() : data_(""), size_(0) {} // Create a slice that refers to d[0,n-1]. - Slice(const char* d, size_t n) : data_(d), size_(n) { } + Slice(const char* d, size_t n) : data_(d), size_(n) {} // Create a slice that refers to the contents of "s" /* implicit */ - Slice(const std::string& s) : data_(s.data()), size_(s.size()) { } + Slice(const std::string& s) : data_(s.data()), size_(s.size()) {} #ifdef __cpp_lib_string_view // Create a slice that refers to the same contents as "sv" @@ -52,9 +52,7 @@ class Slice { // Create a slice that refers to s[0,strlen(s)-1] /* implicit */ - Slice(const char* s) : data_(s) { - size_ = (s == nullptr) ? 0 : strlen(s); - } + Slice(const char* s) : data_(s) { size_ = (s == nullptr) ? 0 : strlen(s); } // Create a single slice from SliceParts using buf as storage. // buf must exist as long as the returned Slice exists. @@ -77,7 +75,10 @@ class Slice { } // Change this slice to refer to an empty array - void clear() { data_ = ""; size_ = 0; } + void clear() { + data_ = ""; + size_ = 0; + } // Drop the first "n" bytes from this slice. 
void remove_prefix(size_t n) { @@ -117,8 +118,7 @@ class Slice { // Return true iff "x" is a prefix of "*this" bool starts_with(const Slice& x) const { - return ((size_ >= x.size_) && - (memcmp(data_, x.data_, x.size_) == 0)); + return ((size_ >= x.size_) && (memcmp(data_, x.data_, x.size_) == 0)); } bool ends_with(const Slice& x) const { @@ -129,7 +129,7 @@ class Slice { // Compare two slices and returns the first byte where they differ size_t difference_offset(const Slice& b) const; - // private: make these public for rocksdbjni access + // private: make these public for rocksdbjni access const char* data_; size_t size_; @@ -202,6 +202,7 @@ class PinnableSlice : public Slice, public Cleanable { void Reset() { Cleanable::Reset(); pinned_ = false; + size_ = 0; } inline std::string* GetSelf() { return buf_; } @@ -218,8 +219,8 @@ class PinnableSlice : public Slice, public Cleanable { // A set of Slices that are virtually concatenated together. 'parts' points // to an array of Slices. The number of elements in the array is 'num_parts'. struct SliceParts { - SliceParts(const Slice* _parts, int _num_parts) : - parts(_parts), num_parts(_num_parts) { } + SliceParts(const Slice* _parts, int _num_parts) + : parts(_parts), num_parts(_num_parts) {} SliceParts() : parts(nullptr), num_parts(0) {} const Slice* parts; @@ -231,17 +232,17 @@ inline bool operator==(const Slice& x, const Slice& y) { (memcmp(x.data(), y.data(), x.size()) == 0)); } -inline bool operator!=(const Slice& x, const Slice& y) { - return !(x == y); -} +inline bool operator!=(const Slice& x, const Slice& y) { return !(x == y); } inline int Slice::compare(const Slice& b) const { assert(data_ != nullptr && b.data_ != nullptr); const size_t min_len = (size_ < b.size_) ? size_ : b.size_; int r = memcmp(data_, b.data_, min_len); if (r == 0) { - if (size_ < b.size_) r = -1; - else if (size_ > b.size_) r = +1; + if (size_ < b.size_) + r = -1; + else if (size_ > b.size_) + r = +1; } return r; } @@ -255,4 +256,4 @@ inline size_t Slice::difference_offset(const Slice& b) const { return off; } -} // namespace rocksdb \ No newline at end of file +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/slice_transform.h b/ceph/src/rocksdb/include/rocksdb/slice_transform.h index 2bbe06153..39e3d5fa1 100644 --- a/ceph/src/rocksdb/include/rocksdb/slice_transform.h +++ b/ceph/src/rocksdb/include/rocksdb/slice_transform.h @@ -28,7 +28,7 @@ class Slice; */ class SliceTransform { public: - virtual ~SliceTransform() {}; + virtual ~SliceTransform(){}; // Return the name of this transformation. 
virtual const char* Name() const = 0; @@ -98,4 +98,4 @@ extern const SliceTransform* NewCappedPrefixTransform(size_t cap_len); extern const SliceTransform* NewNoopTransform(); -} +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/sst_dump_tool.h b/ceph/src/rocksdb/include/rocksdb/sst_dump_tool.h index 021faa019..c7cc4a0fc 100644 --- a/ceph/src/rocksdb/include/rocksdb/sst_dump_tool.h +++ b/ceph/src/rocksdb/include/rocksdb/sst_dump_tool.h @@ -5,11 +5,13 @@ #ifndef ROCKSDB_LITE #pragma once +#include "rocksdb/options.h" + namespace rocksdb { class SSTDumpTool { public: - int Run(int argc, char** argv); + int Run(int argc, char** argv, Options options = Options()); }; } // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/sst_file_manager.h b/ceph/src/rocksdb/include/rocksdb/sst_file_manager.h index 1474da955..3e3ef859b 100644 --- a/ceph/src/rocksdb/include/rocksdb/sst_file_manager.h +++ b/ceph/src/rocksdb/include/rocksdb/sst_file_manager.h @@ -83,6 +83,8 @@ class SstFileManager { // Create a new SstFileManager that can be shared among multiple RocksDB // instances to track SST file and control there deletion rate. +// Although SstFileManager does not track WAL files, it still controls +// their deletion rate. // // @param env: Pointer to Env object, please see "rocksdb/env.h". // @param info_log: If not nullptr, info_log will be used to log errors. @@ -93,6 +95,7 @@ class SstFileManager { // this value is set to 1024 (1 Kb / sec) and we deleted a file of size 4 Kb // in 1 second, we will wait for another 3 seconds before we delete other // files, Set to 0 to disable deletion rate limiting. +// This option also affects the deletion rate of WAL files in the DB. // @param delete_existing_trash: Deprecated, this argument have no effect, but // if user provide trash_dir we will schedule deletes for files in the dir // @param status: If not nullptr, status will contain any errors that happened diff --git a/ceph/src/rocksdb/include/rocksdb/sst_file_reader.h b/ceph/src/rocksdb/include/rocksdb/sst_file_reader.h new file mode 100644 index 000000000..517907dd5 --- /dev/null +++ b/ceph/src/rocksdb/include/rocksdb/sst_file_reader.h @@ -0,0 +1,45 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#pragma once + +#ifndef ROCKSDB_LITE + +#include "rocksdb/iterator.h" +#include "rocksdb/options.h" +#include "rocksdb/slice.h" +#include "rocksdb/table_properties.h" + +namespace rocksdb { + +// SstFileReader is used to read sst files that are generated by DB or +// SstFileWriter. +class SstFileReader { + public: + SstFileReader(const Options& options); + + ~SstFileReader(); + + // Prepares to read from the file located at "file_path". + Status Open(const std::string& file_path); + + // Returns a new iterator over the table contents. + // Most read options provide the same control as when reading from a DB. + // If "snapshot" is nullptr, the iterator returns only the latest keys. + Iterator* NewIterator(const ReadOptions& options); + + std::shared_ptr GetTableProperties() const; + + // Verifies whether there is corruption in this table.
+ Status VerifyChecksum(); + + private: + struct Rep; + std::unique_ptr rep_; +}; + +} // namespace rocksdb + +#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/sst_file_writer.h b/ceph/src/rocksdb/include/rocksdb/sst_file_writer.h index 24bcdbd13..273c913e4 100644 --- a/ceph/src/rocksdb/include/rocksdb/sst_file_writer.h +++ b/ceph/src/rocksdb/include/rocksdb/sst_file_writer.h @@ -77,8 +77,9 @@ class SstFileWriter { // be ingested into this column_family, note that passing nullptr means that // the column_family is unknown. // If invalidate_page_cache is set to true, SstFileWriter will give the OS a - // hint that this file pages is not needed every time we write 1MB to the file. - // To use the rate limiter an io_priority smaller than IO_TOTAL can be passed. + // hint that this file pages is not needed every time we write 1MB to the + // file. To use the rate limiter an io_priority smaller than IO_TOTAL can be + // passed. SstFileWriter(const EnvOptions& env_options, const Options& options, ColumnFamilyHandle* column_family = nullptr, bool invalidate_page_cache = true, diff --git a/ceph/src/rocksdb/include/rocksdb/statistics.h b/ceph/src/rocksdb/include/rocksdb/statistics.h index c493a1824..bad1c87ec 100644 --- a/ceph/src/rocksdb/include/rocksdb/statistics.h +++ b/ceph/src/rocksdb/include/rocksdb/statistics.h @@ -8,8 +8,9 @@ #include #include #include -#include +#include #include +#include #include #include "rocksdb/status.h" @@ -21,6 +22,8 @@ namespace rocksdb { * 1. Any ticker should be added before TICKER_ENUM_MAX. * 2. Add a readable string in TickersNameMap below for the newly added ticker. * 3. Add a corresponding enum value to TickerType.java in the java API + * 4. Add the enum conversions from Java and C++ to portal.h's toJavaTickerType + * and toCppTickers */ enum Tickers : uint32_t { // total block cache misses @@ -155,7 +158,8 @@ enum Tickers : uint32_t { // Disabled by default. To enable it set stats level to kAll DB_MUTEX_WAIT_MICROS, RATE_LIMIT_DELAY_MILLIS, - NO_ITERATORS, // number of iterators currently open + // DEPRECATED number of iterators currently open + NO_ITERATORS, // Number of MultiGet calls, keys read, and bytes read NUMBER_MULTIGET_CALLS, @@ -308,7 +312,7 @@ enum Tickers : uint32_t { // # of bytes in the blob files evicted because of BlobDB is full. BLOB_DB_FIFO_BYTES_EVICTED, - // These coutners indicate a performance issue in WritePrepared transactions. + // These counters indicate a performance issue in WritePrepared transactions. // We should not seem them ticking them much. // # of times prepare_mutex_ is acquired in the fast path. TXN_PREPARE_MUTEX_OVERHEAD, @@ -319,162 +323,25 @@ enum Tickers : uint32_t { // # of times snapshot_mutex_ is acquired in the fast path. 
TXN_SNAPSHOT_MUTEX_OVERHEAD, - // Number of keys actually found in MultiGet calls (vs number requested by caller) + // Number of keys actually found in MultiGet calls (vs number requested by + // caller) // NUMBER_MULTIGET_KEYS_READ gives the number requested by caller NUMBER_MULTIGET_KEYS_FOUND, + + NO_ITERATOR_CREATED, // number of iterators created + NO_ITERATOR_DELETED, // number of iterators deleted + + BLOCK_CACHE_COMPRESSION_DICT_MISS, + BLOCK_CACHE_COMPRESSION_DICT_HIT, + BLOCK_CACHE_COMPRESSION_DICT_ADD, + BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT, + BLOCK_CACHE_COMPRESSION_DICT_BYTES_EVICT, TICKER_ENUM_MAX }; // The order of items listed in Tickers should be the same as // the order listed in TickersNameMap -const std::vector> TickersNameMap = { - {BLOCK_CACHE_MISS, "rocksdb.block.cache.miss"}, - {BLOCK_CACHE_HIT, "rocksdb.block.cache.hit"}, - {BLOCK_CACHE_ADD, "rocksdb.block.cache.add"}, - {BLOCK_CACHE_ADD_FAILURES, "rocksdb.block.cache.add.failures"}, - {BLOCK_CACHE_INDEX_MISS, "rocksdb.block.cache.index.miss"}, - {BLOCK_CACHE_INDEX_HIT, "rocksdb.block.cache.index.hit"}, - {BLOCK_CACHE_INDEX_ADD, "rocksdb.block.cache.index.add"}, - {BLOCK_CACHE_INDEX_BYTES_INSERT, "rocksdb.block.cache.index.bytes.insert"}, - {BLOCK_CACHE_INDEX_BYTES_EVICT, "rocksdb.block.cache.index.bytes.evict"}, - {BLOCK_CACHE_FILTER_MISS, "rocksdb.block.cache.filter.miss"}, - {BLOCK_CACHE_FILTER_HIT, "rocksdb.block.cache.filter.hit"}, - {BLOCK_CACHE_FILTER_ADD, "rocksdb.block.cache.filter.add"}, - {BLOCK_CACHE_FILTER_BYTES_INSERT, - "rocksdb.block.cache.filter.bytes.insert"}, - {BLOCK_CACHE_FILTER_BYTES_EVICT, "rocksdb.block.cache.filter.bytes.evict"}, - {BLOCK_CACHE_DATA_MISS, "rocksdb.block.cache.data.miss"}, - {BLOCK_CACHE_DATA_HIT, "rocksdb.block.cache.data.hit"}, - {BLOCK_CACHE_DATA_ADD, "rocksdb.block.cache.data.add"}, - {BLOCK_CACHE_DATA_BYTES_INSERT, "rocksdb.block.cache.data.bytes.insert"}, - {BLOCK_CACHE_BYTES_READ, "rocksdb.block.cache.bytes.read"}, - {BLOCK_CACHE_BYTES_WRITE, "rocksdb.block.cache.bytes.write"}, - {BLOOM_FILTER_USEFUL, "rocksdb.bloom.filter.useful"}, - {BLOOM_FILTER_FULL_POSITIVE, "rocksdb.bloom.filter.full.positive"}, - {BLOOM_FILTER_FULL_TRUE_POSITIVE, - "rocksdb.bloom.filter.full.true.positive"}, - {PERSISTENT_CACHE_HIT, "rocksdb.persistent.cache.hit"}, - {PERSISTENT_CACHE_MISS, "rocksdb.persistent.cache.miss"}, - {SIM_BLOCK_CACHE_HIT, "rocksdb.sim.block.cache.hit"}, - {SIM_BLOCK_CACHE_MISS, "rocksdb.sim.block.cache.miss"}, - {MEMTABLE_HIT, "rocksdb.memtable.hit"}, - {MEMTABLE_MISS, "rocksdb.memtable.miss"}, - {GET_HIT_L0, "rocksdb.l0.hit"}, - {GET_HIT_L1, "rocksdb.l1.hit"}, - {GET_HIT_L2_AND_UP, "rocksdb.l2andup.hit"}, - {COMPACTION_KEY_DROP_NEWER_ENTRY, "rocksdb.compaction.key.drop.new"}, - {COMPACTION_KEY_DROP_OBSOLETE, "rocksdb.compaction.key.drop.obsolete"}, - {COMPACTION_KEY_DROP_RANGE_DEL, "rocksdb.compaction.key.drop.range_del"}, - {COMPACTION_KEY_DROP_USER, "rocksdb.compaction.key.drop.user"}, - {COMPACTION_RANGE_DEL_DROP_OBSOLETE, - "rocksdb.compaction.range_del.drop.obsolete"}, - {COMPACTION_OPTIMIZED_DEL_DROP_OBSOLETE, - "rocksdb.compaction.optimized.del.drop.obsolete"}, - {COMPACTION_CANCELLED, "rocksdb.compaction.cancelled"}, - {NUMBER_KEYS_WRITTEN, "rocksdb.number.keys.written"}, - {NUMBER_KEYS_READ, "rocksdb.number.keys.read"}, - {NUMBER_KEYS_UPDATED, "rocksdb.number.keys.updated"}, - {BYTES_WRITTEN, "rocksdb.bytes.written"}, - {BYTES_READ, "rocksdb.bytes.read"}, - {NUMBER_DB_SEEK, "rocksdb.number.db.seek"}, - {NUMBER_DB_NEXT, "rocksdb.number.db.next"}, - 
{NUMBER_DB_PREV, "rocksdb.number.db.prev"}, - {NUMBER_DB_SEEK_FOUND, "rocksdb.number.db.seek.found"}, - {NUMBER_DB_NEXT_FOUND, "rocksdb.number.db.next.found"}, - {NUMBER_DB_PREV_FOUND, "rocksdb.number.db.prev.found"}, - {ITER_BYTES_READ, "rocksdb.db.iter.bytes.read"}, - {NO_FILE_CLOSES, "rocksdb.no.file.closes"}, - {NO_FILE_OPENS, "rocksdb.no.file.opens"}, - {NO_FILE_ERRORS, "rocksdb.no.file.errors"}, - {STALL_L0_SLOWDOWN_MICROS, "rocksdb.l0.slowdown.micros"}, - {STALL_MEMTABLE_COMPACTION_MICROS, "rocksdb.memtable.compaction.micros"}, - {STALL_L0_NUM_FILES_MICROS, "rocksdb.l0.num.files.stall.micros"}, - {STALL_MICROS, "rocksdb.stall.micros"}, - {DB_MUTEX_WAIT_MICROS, "rocksdb.db.mutex.wait.micros"}, - {RATE_LIMIT_DELAY_MILLIS, "rocksdb.rate.limit.delay.millis"}, - {NO_ITERATORS, "rocksdb.num.iterators"}, - {NUMBER_MULTIGET_CALLS, "rocksdb.number.multiget.get"}, - {NUMBER_MULTIGET_KEYS_READ, "rocksdb.number.multiget.keys.read"}, - {NUMBER_MULTIGET_BYTES_READ, "rocksdb.number.multiget.bytes.read"}, - {NUMBER_FILTERED_DELETES, "rocksdb.number.deletes.filtered"}, - {NUMBER_MERGE_FAILURES, "rocksdb.number.merge.failures"}, - {BLOOM_FILTER_PREFIX_CHECKED, "rocksdb.bloom.filter.prefix.checked"}, - {BLOOM_FILTER_PREFIX_USEFUL, "rocksdb.bloom.filter.prefix.useful"}, - {NUMBER_OF_RESEEKS_IN_ITERATION, "rocksdb.number.reseeks.iteration"}, - {GET_UPDATES_SINCE_CALLS, "rocksdb.getupdatessince.calls"}, - {BLOCK_CACHE_COMPRESSED_MISS, "rocksdb.block.cachecompressed.miss"}, - {BLOCK_CACHE_COMPRESSED_HIT, "rocksdb.block.cachecompressed.hit"}, - {BLOCK_CACHE_COMPRESSED_ADD, "rocksdb.block.cachecompressed.add"}, - {BLOCK_CACHE_COMPRESSED_ADD_FAILURES, - "rocksdb.block.cachecompressed.add.failures"}, - {WAL_FILE_SYNCED, "rocksdb.wal.synced"}, - {WAL_FILE_BYTES, "rocksdb.wal.bytes"}, - {WRITE_DONE_BY_SELF, "rocksdb.write.self"}, - {WRITE_DONE_BY_OTHER, "rocksdb.write.other"}, - {WRITE_TIMEDOUT, "rocksdb.write.timeout"}, - {WRITE_WITH_WAL, "rocksdb.write.wal"}, - {COMPACT_READ_BYTES, "rocksdb.compact.read.bytes"}, - {COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes"}, - {FLUSH_WRITE_BYTES, "rocksdb.flush.write.bytes"}, - {NUMBER_DIRECT_LOAD_TABLE_PROPERTIES, - "rocksdb.number.direct.load.table.properties"}, - {NUMBER_SUPERVERSION_ACQUIRES, "rocksdb.number.superversion_acquires"}, - {NUMBER_SUPERVERSION_RELEASES, "rocksdb.number.superversion_releases"}, - {NUMBER_SUPERVERSION_CLEANUPS, "rocksdb.number.superversion_cleanups"}, - {NUMBER_BLOCK_COMPRESSED, "rocksdb.number.block.compressed"}, - {NUMBER_BLOCK_DECOMPRESSED, "rocksdb.number.block.decompressed"}, - {NUMBER_BLOCK_NOT_COMPRESSED, "rocksdb.number.block.not_compressed"}, - {MERGE_OPERATION_TOTAL_TIME, "rocksdb.merge.operation.time.nanos"}, - {FILTER_OPERATION_TOTAL_TIME, "rocksdb.filter.operation.time.nanos"}, - {ROW_CACHE_HIT, "rocksdb.row.cache.hit"}, - {ROW_CACHE_MISS, "rocksdb.row.cache.miss"}, - {READ_AMP_ESTIMATE_USEFUL_BYTES, "rocksdb.read.amp.estimate.useful.bytes"}, - {READ_AMP_TOTAL_READ_BYTES, "rocksdb.read.amp.total.read.bytes"}, - {NUMBER_RATE_LIMITER_DRAINS, "rocksdb.number.rate_limiter.drains"}, - {NUMBER_ITER_SKIP, "rocksdb.number.iter.skip"}, - {BLOB_DB_NUM_PUT, "rocksdb.blobdb.num.put"}, - {BLOB_DB_NUM_WRITE, "rocksdb.blobdb.num.write"}, - {BLOB_DB_NUM_GET, "rocksdb.blobdb.num.get"}, - {BLOB_DB_NUM_MULTIGET, "rocksdb.blobdb.num.multiget"}, - {BLOB_DB_NUM_SEEK, "rocksdb.blobdb.num.seek"}, - {BLOB_DB_NUM_NEXT, "rocksdb.blobdb.num.next"}, - {BLOB_DB_NUM_PREV, "rocksdb.blobdb.num.prev"}, - {BLOB_DB_NUM_KEYS_WRITTEN, 
"rocksdb.blobdb.num.keys.written"}, - {BLOB_DB_NUM_KEYS_READ, "rocksdb.blobdb.num.keys.read"}, - {BLOB_DB_BYTES_WRITTEN, "rocksdb.blobdb.bytes.written"}, - {BLOB_DB_BYTES_READ, "rocksdb.blobdb.bytes.read"}, - {BLOB_DB_WRITE_INLINED, "rocksdb.blobdb.write.inlined"}, - {BLOB_DB_WRITE_INLINED_TTL, "rocksdb.blobdb.write.inlined.ttl"}, - {BLOB_DB_WRITE_BLOB, "rocksdb.blobdb.write.blob"}, - {BLOB_DB_WRITE_BLOB_TTL, "rocksdb.blobdb.write.blob.ttl"}, - {BLOB_DB_BLOB_FILE_BYTES_WRITTEN, "rocksdb.blobdb.blob.file.bytes.written"}, - {BLOB_DB_BLOB_FILE_BYTES_READ, "rocksdb.blobdb.blob.file.bytes.read"}, - {BLOB_DB_BLOB_FILE_SYNCED, "rocksdb.blobdb.blob.file.synced"}, - {BLOB_DB_BLOB_INDEX_EXPIRED_COUNT, - "rocksdb.blobdb.blob.index.expired.count"}, - {BLOB_DB_BLOB_INDEX_EXPIRED_SIZE, "rocksdb.blobdb.blob.index.expired.size"}, - {BLOB_DB_BLOB_INDEX_EVICTED_COUNT, - "rocksdb.blobdb.blob.index.evicted.count"}, - {BLOB_DB_BLOB_INDEX_EVICTED_SIZE, "rocksdb.blobdb.blob.index.evicted.size"}, - {BLOB_DB_GC_NUM_FILES, "rocksdb.blobdb.gc.num.files"}, - {BLOB_DB_GC_NUM_NEW_FILES, "rocksdb.blobdb.gc.num.new.files"}, - {BLOB_DB_GC_FAILURES, "rocksdb.blobdb.gc.failures"}, - {BLOB_DB_GC_NUM_KEYS_OVERWRITTEN, "rocksdb.blobdb.gc.num.keys.overwritten"}, - {BLOB_DB_GC_NUM_KEYS_EXPIRED, "rocksdb.blobdb.gc.num.keys.expired"}, - {BLOB_DB_GC_NUM_KEYS_RELOCATED, "rocksdb.blobdb.gc.num.keys.relocated"}, - {BLOB_DB_GC_BYTES_OVERWRITTEN, "rocksdb.blobdb.gc.bytes.overwritten"}, - {BLOB_DB_GC_BYTES_EXPIRED, "rocksdb.blobdb.gc.bytes.expired"}, - {BLOB_DB_GC_BYTES_RELOCATED, "rocksdb.blobdb.gc.bytes.relocated"}, - {BLOB_DB_FIFO_NUM_FILES_EVICTED, "rocksdb.blobdb.fifo.num.files.evicted"}, - {BLOB_DB_FIFO_NUM_KEYS_EVICTED, "rocksdb.blobdb.fifo.num.keys.evicted"}, - {BLOB_DB_FIFO_BYTES_EVICTED, "rocksdb.blobdb.fifo.bytes.evicted"}, - {TXN_PREPARE_MUTEX_OVERHEAD, "rocksdb.txn.overhead.mutex.prepare"}, - {TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD, - "rocksdb.txn.overhead.mutex.old.commit.map"}, - {TXN_DUPLICATE_KEY_OVERHEAD, "rocksdb.txn.overhead.duplicate.key"}, - {TXN_SNAPSHOT_MUTEX_OVERHEAD, "rocksdb.txn.overhead.mutex.snapshot"}, - {NUMBER_MULTIGET_KEYS_FOUND, "rocksdb.number.multiget.keys.found"}, -}; +extern const std::vector> TickersNameMap; /** * Keep adding histogram's here. 
@@ -488,6 +355,7 @@ enum Histograms : uint32_t { DB_GET = 0, DB_WRITE, COMPACTION_TIME, + COMPACTION_CPU_TIME, SUBCOMPACTION_SETUP_TIME, TABLE_SYNC_MICROS, COMPACTION_OUTFILE_SYNC_MICROS, @@ -557,57 +425,10 @@ enum Histograms : uint32_t { // Time spent flushing memtable to disk FLUSH_TIME, - HISTOGRAM_ENUM_MAX, // TODO(ldemailly): enforce HistogramsNameMap match + HISTOGRAM_ENUM_MAX, }; -const std::vector> HistogramsNameMap = { - {DB_GET, "rocksdb.db.get.micros"}, - {DB_WRITE, "rocksdb.db.write.micros"}, - {COMPACTION_TIME, "rocksdb.compaction.times.micros"}, - {SUBCOMPACTION_SETUP_TIME, "rocksdb.subcompaction.setup.times.micros"}, - {TABLE_SYNC_MICROS, "rocksdb.table.sync.micros"}, - {COMPACTION_OUTFILE_SYNC_MICROS, "rocksdb.compaction.outfile.sync.micros"}, - {WAL_FILE_SYNC_MICROS, "rocksdb.wal.file.sync.micros"}, - {MANIFEST_FILE_SYNC_MICROS, "rocksdb.manifest.file.sync.micros"}, - {TABLE_OPEN_IO_MICROS, "rocksdb.table.open.io.micros"}, - {DB_MULTIGET, "rocksdb.db.multiget.micros"}, - {READ_BLOCK_COMPACTION_MICROS, "rocksdb.read.block.compaction.micros"}, - {READ_BLOCK_GET_MICROS, "rocksdb.read.block.get.micros"}, - {WRITE_RAW_BLOCK_MICROS, "rocksdb.write.raw.block.micros"}, - {STALL_L0_SLOWDOWN_COUNT, "rocksdb.l0.slowdown.count"}, - {STALL_MEMTABLE_COMPACTION_COUNT, "rocksdb.memtable.compaction.count"}, - {STALL_L0_NUM_FILES_COUNT, "rocksdb.num.files.stall.count"}, - {HARD_RATE_LIMIT_DELAY_COUNT, "rocksdb.hard.rate.limit.delay.count"}, - {SOFT_RATE_LIMIT_DELAY_COUNT, "rocksdb.soft.rate.limit.delay.count"}, - {NUM_FILES_IN_SINGLE_COMPACTION, "rocksdb.numfiles.in.singlecompaction"}, - {DB_SEEK, "rocksdb.db.seek.micros"}, - {WRITE_STALL, "rocksdb.db.write.stall"}, - {SST_READ_MICROS, "rocksdb.sst.read.micros"}, - {NUM_SUBCOMPACTIONS_SCHEDULED, "rocksdb.num.subcompactions.scheduled"}, - {BYTES_PER_READ, "rocksdb.bytes.per.read"}, - {BYTES_PER_WRITE, "rocksdb.bytes.per.write"}, - {BYTES_PER_MULTIGET, "rocksdb.bytes.per.multiget"}, - {BYTES_COMPRESSED, "rocksdb.bytes.compressed"}, - {BYTES_DECOMPRESSED, "rocksdb.bytes.decompressed"}, - {COMPRESSION_TIMES_NANOS, "rocksdb.compression.times.nanos"}, - {DECOMPRESSION_TIMES_NANOS, "rocksdb.decompression.times.nanos"}, - {READ_NUM_MERGE_OPERANDS, "rocksdb.read.num.merge_operands"}, - {BLOB_DB_KEY_SIZE, "rocksdb.blobdb.key.size"}, - {BLOB_DB_VALUE_SIZE, "rocksdb.blobdb.value.size"}, - {BLOB_DB_WRITE_MICROS, "rocksdb.blobdb.write.micros"}, - {BLOB_DB_GET_MICROS, "rocksdb.blobdb.get.micros"}, - {BLOB_DB_MULTIGET_MICROS, "rocksdb.blobdb.multiget.micros"}, - {BLOB_DB_SEEK_MICROS, "rocksdb.blobdb.seek.micros"}, - {BLOB_DB_NEXT_MICROS, "rocksdb.blobdb.next.micros"}, - {BLOB_DB_PREV_MICROS, "rocksdb.blobdb.prev.micros"}, - {BLOB_DB_BLOB_FILE_WRITE_MICROS, "rocksdb.blobdb.blob.file.write.micros"}, - {BLOB_DB_BLOB_FILE_READ_MICROS, "rocksdb.blobdb.blob.file.read.micros"}, - {BLOB_DB_BLOB_FILE_SYNC_MICROS, "rocksdb.blobdb.blob.file.sync.micros"}, - {BLOB_DB_GC_MICROS, "rocksdb.blobdb.gc.micros"}, - {BLOB_DB_COMPRESSION_MICROS, "rocksdb.blobdb.compression.micros"}, - {BLOB_DB_DECOMPRESSION_MICROS, "rocksdb.blobdb.decompression.micros"}, - {FLUSH_TIME, "rocksdb.db.flush.micros"}, -}; +extern const std::vector> HistogramsNameMap; struct HistogramData { double median; @@ -620,9 +441,14 @@ struct HistogramData { double max = 0.0; uint64_t count = 0; uint64_t sum = 0; + double min = 0.0; }; -enum StatsLevel { +enum StatsLevel : uint8_t { + // Disable timer stats, and skip histogram stats + kExceptHistogramOrTimers, + // Skip timer stats + kExceptTimers, // Collect 
all stats except time inside mutex lock AND time spent on // compression. kExceptDetailedTimers, @@ -647,12 +473,29 @@ class Statistics { virtual void recordTick(uint32_t tickerType, uint64_t count = 0) = 0; virtual void setTickerCount(uint32_t tickerType, uint64_t count) = 0; virtual uint64_t getAndResetTickerCount(uint32_t tickerType) = 0; - virtual void measureTime(uint32_t histogramType, uint64_t time) = 0; + virtual void reportTimeToHistogram(uint32_t histogramType, uint64_t time) { + if (get_stats_level() <= StatsLevel::kExceptTimers) { + return; + } + recordInHistogram(histogramType, time); + } + // The function is here only for backward compatibility reasons. + // Users implementing their own Statistics class should override + // recordInHistogram() instead and leave measureTime() as it is. + virtual void measureTime(uint32_t /*histogramType*/, uint64_t /*time*/) { + // This is not supposed to be called. + assert(false); + } + virtual void recordInHistogram(uint32_t histogramType, uint64_t time) { + // measureTime() is the old and inaccurate function name. To keep + // backward compatibility, if users implement their own statistics and + // override measureTime() but not this function, we forward to + // measureTime(). + measureTime(histogramType, time); + } // Resets all ticker and histogram stats - virtual Status Reset() { - return Status::NotSupported("Not implemented"); - } + virtual Status Reset() { return Status::NotSupported("Not implemented"); } // String representation of the statistic object. virtual std::string ToString() const { @@ -660,12 +503,24 @@ class Statistics { return std::string("ToString(): not implemented"); } + virtual bool getTickerMap(std::map*) const { + // Do nothing by default + return false; + }; + // Override this function to disable particular histogram collection virtual bool HistEnabledForType(uint32_t type) const { return type < HISTOGRAM_ENUM_MAX; } + void set_stats_level(StatsLevel sl) { + stats_level_.store(sl, std::memory_order_relaxed); + } + StatsLevel get_stats_level() const { + return stats_level_.load(std::memory_order_relaxed); + } - StatsLevel stats_level_ = kExceptDetailedTimers; + private: + std::atomic stats_level_{kExceptDetailedTimers}; }; // Create a concrete DBStatistics object diff --git a/ceph/src/rocksdb/include/rocksdb/stats_history.h b/ceph/src/rocksdb/include/rocksdb/stats_history.h new file mode 100644 index 000000000..40ea51d1f --- /dev/null +++ b/ceph/src/rocksdb/include/rocksdb/stats_history.h @@ -0,0 +1,49 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// Copyright (c) 2011 The LevelDB Authors. All rights reserved. +// Use of this source code is governed by a BSD-style license that can be +// found in the LICENSE file. See the AUTHORS file for names of contributors. + +#pragma once + +#include +#include + +// #include "db/db_impl.h" +#include "rocksdb/statistics.h" +#include "rocksdb/status.h" + +namespace rocksdb { + +class DBImpl; + +class StatsHistoryIterator { + public: + StatsHistoryIterator() {} + virtual ~StatsHistoryIterator() {} + + virtual bool Valid() const = 0; + + // Moves to the next stats history record. After this call, Valid() is + // true iff the iterator was not positioned at the last entry in the source. + // REQUIRES: Valid() + virtual void Next() = 0; + + // Return the time stamp (in microseconds) when stats history is recorded. + // REQUIRES: Valid() + virtual uint64_t GetStatsTime() const = 0; + + // Return the current stats history as a std::map which specifies the + // mapping from stats name to stats value. The underlying storage + // for the returned map is valid only until the next modification of + // the iterator. + // REQUIRES: Valid() + virtual const std::map& GetStatsMap() const = 0; + + // If an error has occurred, return it. Else return an ok status. + virtual Status status() const = 0; +}; + +} // namespace rocksdb
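A rough sketch (not part of the patch) of draining this iterator. In this release the entry point, GetStatsHistory(), lives on DBImpl rather than the public DB class (note the commented-out db/db_impl.h include above), so the db_impl parameter and the exact signature below are assumptions for illustration; stats_persist_period_sec must be non-zero for any history to accumulate:

```cpp
#include <cinttypes>
#include <cstdio>
#include <limits>
#include <memory>
#include "rocksdb/stats_history.h"

void PrintStatsHistory(rocksdb::DBImpl* db_impl) {
  std::unique_ptr<rocksdb::StatsHistoryIterator> it;
  // Assumed signature: Status GetStatsHistory(
  //     uint64_t start_time, uint64_t end_time,
  //     std::unique_ptr<StatsHistoryIterator>* stats_iterator);
  db_impl->GetStatsHistory(0, std::numeric_limits<uint64_t>::max(), &it);
  for (; it != nullptr && it->Valid(); it->Next()) {
    for (const auto& kv : it->GetStatsMap()) {
      std::printf("%" PRIu64 " %s = %" PRIu64 "\n", it->GetStatsTime(),
                  kv.first.c_str(), kv.second);
    }
  }
}
```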
diff --git a/ceph/src/rocksdb/include/rocksdb/status.h b/ceph/src/rocksdb/include/rocksdb/status.h index 40b374ecf..12e8070d1 100644 --- a/ceph/src/rocksdb/include/rocksdb/status.h +++ b/ceph/src/rocksdb/include/rocksdb/status.h @@ -73,6 +73,7 @@ class Status { kStaleFile = 6, kMemoryLimit = 7, kSpaceLimit = 8, + kPathNotFound = 9, kMaxSubCode }; @@ -198,6 +199,11 @@ class Status { return Status(kIOError, kSpaceLimit, msg, msg2); } + static Status PathNotFound() { return Status(kIOError, kPathNotFound); } + static Status PathNotFound(const Slice& msg, const Slice& msg2 = Slice()) { + return Status(kIOError, kPathNotFound, msg, msg2); + } + // Returns true iff the status indicates success. bool ok() const { return code() == kOk; } @@ -266,6 +272,14 @@ class Status { return (code() == kAborted) && (subcode() == kMemoryLimit); } + // Returns true iff the status indicates a PathNotFound error + // This is caused by an I/O error returning the specific "no such file or + // directory" error condition. A PathNotFound error is an I/O error with + // a specific subcode, enabling users to take appropriate action if necessary + bool IsPathNotFound() const { + return (code() == kIOError) && (subcode() == kPathNotFound); + } + // Return a string representation of this status suitable for printing. // Returns the string "OK" for success. std::string ToString() const; @@ -291,11 +305,12 @@ class Status { static const char* CopyState(const char* s); }; -inline Status::Status(const Status& s) : code_(s.code_), subcode_(s.subcode_), sev_(s.sev_) { +inline Status::Status(const Status& s) + : code_(s.code_), subcode_(s.subcode_), sev_(s.sev_) { state_ = (s.state_ == nullptr) ? nullptr : CopyState(s.state_); } inline Status::Status(const Status& s, Severity sev) - : code_(s.code_), subcode_(s.subcode_), sev_(sev) { + : code_(s.code_), subcode_(s.subcode_), sev_(sev) { state_ = (s.state_ == nullptr) ? nullptr : CopyState(s.state_); } inline Status& Status::operator=(const Status& s) { diff --git a/ceph/src/rocksdb/include/rocksdb/table.h b/ceph/src/rocksdb/include/rocksdb/table.h index a177d1c7a..6c584375c 100644 --- a/ceph/src/rocksdb/include/rocksdb/table.h +++ b/ceph/src/rocksdb/include/rocksdb/table.h @@ -41,12 +41,11 @@ class WritableFileWriter; struct EnvOptions; struct Options; -using std::unique_ptr; - enum ChecksumType : char { kNoChecksum = 0x0, kCRC32c = 0x1, kxxHash = 0x2, + kxxHash64 = 0x3, }; // For advanced user only @@ -61,6 +60,10 @@ struct BlockBasedTableOptions { // TODO(kailiu) Temporarily disable this feature by making the default value // to be false. // + // TODO(ajkr) we need to update names of variables controlling meta-block + // caching as they should now apply to range tombstone and compression + // dictionary meta-blocks, in addition to index and filter meta-blocks. + // + // Indicating if we'd put index/filter blocks to the block cache.
// If not specified, each "table reader" object will pre-load index/filter // block during table initialization. @@ -137,6 +140,8 @@ struct BlockBasedTableOptions { // If non-NULL use the specified cache for compressed blocks. // If NULL, rocksdb will not use a compressed block cache. + // Note: though it looks similar to `block_cache`, RocksDB doesn't put the + // same type of object there. std::shared_ptr block_cache_compressed = nullptr; // Approximate size of user data packed per block. Note that the @@ -222,7 +227,7 @@ struct BlockBasedTableOptions { // Default: 0 (disabled) uint32_t read_amp_bytes_per_bit = 0; - // We currently have three versions: + // We currently have five versions: // 0 -- This version is currently written out by all RocksDB's versions by // default. Can be read by really old RocksDB's. Doesn't support changing // checksum (default is CRC32). @@ -351,13 +356,13 @@ struct PlainTableOptions { }; // -- Plain Table with prefix-only seek -// For this factory, you need to set Options.prefix_extractor properly to make it -// work. Look-up will starts with prefix hash lookup for key prefix. Inside the -// hash bucket found, a binary search is executed for hash conflicts. Finally, -// a linear search is used. +// For this factory, you need to set Options.prefix_extractor properly to make +// it work. Lookup starts with a prefix hash lookup for the key prefix. Inside +// the hash bucket found, a binary search is executed for hash conflicts. +// Finally, a linear search is used. -extern TableFactory* NewPlainTableFactory(const PlainTableOptions& options = - PlainTableOptions()); +extern TableFactory* NewPlainTableFactory( + const PlainTableOptions& options = PlainTableOptions()); struct CuckooTablePropertyNames { // The key that is used to fill empty buckets. @@ -449,10 +454,10 @@ class TableFactory { // NewTableReader() is called in three places: // (1) TableCache::FindTable() calls the function when table cache miss // and cache the table object returned. - // (2) SstFileReader (for SST Dump) opens the table and dump the table + // (2) SstFileDumper (for SST Dump) opens the table and dumps the table // contents using the iterator of the table. - // (3) DBImpl::IngestExternalFile() calls this function to read the contents of - // the sst file it's attempting to add + // (3) DBImpl::IngestExternalFile() calls this function to read the contents + // of the sst file it's attempting to add // // table_reader_options is a TableReaderOptions which contain all the // needed parameters and configuration to open the table. @@ -461,8 +466,8 @@ class TableFactory { // table_reader is the output table reader. virtual Status NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table_reader, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table_reader, bool prefetch_index_and_filter_in_cache = true) const = 0; // Return a table builder to write to a file for this table type. @@ -491,9 +496,8 @@ class TableFactory { // // If the function cannot find a way to sanitize the input DB Options, // a non-ok Status will be returned. - virtual Status SanitizeOptions( - const DBOptions& db_opts, - const ColumnFamilyOptions& cf_opts) const = 0; + virtual Status SanitizeOptions(const DBOptions& db_opts, + const ColumnFamilyOptions& cf_opts) const = 0; // Return a string that contains printable format of table configurations. // RocksDB prints configurations at DB Open().
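As a concrete illustration of the prefix-extractor requirement described in the plain-table comment above, here is a minimal options sketch (not part of the patch); the 8-byte prefix length is an arbitrary example:

```cpp
#include "rocksdb/options.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/table.h"

rocksdb::Options MakePlainTableOptions() {
  rocksdb::Options options;
  // PlainTable locates keys via a hash of their prefix, so a prefix
  // extractor must be configured before the factory is usable.
  options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));
  options.table_factory.reset(rocksdb::NewPlainTableFactory());
  options.allow_mmap_reads = true;  // plain table reads the file via mmap
  return options;
}
```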
@@ -533,7 +537,8 @@ class TableFactory { // @block_based_table_factory: block based table factory to use. If NULL, use // a default one. // @plain_table_factory: plain table factory to use. If NULL, use a default one. -// @cuckoo_table_factory: cuckoo table factory to use. If NULL, use a default one. +// @cuckoo_table_factory: cuckoo table factory to use. If NULL, use a default +// one. extern TableFactory* NewAdaptiveTableFactory( std::shared_ptr table_factory_to_write = nullptr, std::shared_ptr block_based_table_factory = nullptr, diff --git a/ceph/src/rocksdb/include/rocksdb/table_properties.h b/ceph/src/rocksdb/include/rocksdb/table_properties.h index d545e455f..70e8d2cba 100644 --- a/ceph/src/rocksdb/include/rocksdb/table_properties.h +++ b/ceph/src/rocksdb/include/rocksdb/table_properties.h @@ -40,6 +40,8 @@ struct TablePropertiesNames { static const std::string kRawValueSize; static const std::string kNumDataBlocks; static const std::string kNumEntries; + static const std::string kDeletedKeys; + static const std::string kMergeOperands; static const std::string kNumRangeDeletions; static const std::string kFormatVersion; static const std::string kFixedKeyLen; @@ -51,6 +53,7 @@ struct TablePropertiesNames { static const std::string kPrefixExtractorName; static const std::string kPropertyCollectors; static const std::string kCompression; + static const std::string kCompressionOptions; static const std::string kCreationTime; static const std::string kOldestKeyTime; }; @@ -90,6 +93,14 @@ class TablePropertiesCollector { return Add(key, value); } + // Called after each new block is cut + virtual void BlockAdd(uint64_t /* blockRawBytes */, + uint64_t /* blockCompressedBytesFast */, + uint64_t /* blockCompressedBytesSlow */) { + // Nothing to do here. Callback registers can override. + return; + } + // Finish() will be called when a table has already been built and is ready // for writing the properties block. // @params properties User will add their collected statistics to @@ -152,6 +163,10 @@ struct TableProperties { uint64_t num_data_blocks = 0; // the number of entries in this table uint64_t num_entries = 0; + // the number of deletions in the table + uint64_t num_deletions = 0; + // the number of merge operands in the table + uint64_t num_merge_operands = 0; // the number of range deletions in this table uint64_t num_range_deletions = 0; // format version, reserved for backward compatibility @@ -195,6 +210,9 @@ struct TableProperties { // The compression algo used to compress the SST files. std::string compression_name; + // Compression options used to compress the SST files. + std::string compression_options; + // user collected properties UserCollectedProperties user_collected_properties; UserCollectedProperties readable_properties; @@ -216,6 +234,10 @@ struct TableProperties { // Below is a list of non-basic properties that are collected by database // itself. Especially some properties regarding to the internal keys (which // is unknown to `table`). +// +// DEPRECATED: these properties now belong as TableProperties members. Please +// use TableProperties::num_deletions and TableProperties::num_merge_operands, +// respectively. 
extern uint64_t GetDeletedKeys(const UserCollectedProperties& props); extern uint64_t GetMergeOperands(const UserCollectedProperties& props, bool* property_present); diff --git a/ceph/src/rocksdb/include/rocksdb/thread_status.h b/ceph/src/rocksdb/include/rocksdb/thread_status.h index e7a25f190..b81c1c284 100644 --- a/ceph/src/rocksdb/include/rocksdb/thread_status.h +++ b/ceph/src/rocksdb/include/rocksdb/thread_status.h @@ -20,8 +20,7 @@ #include #include -#if !defined(ROCKSDB_LITE) && \ - !defined(NROCKSDB_THREAD_STATUS) && \ +#if !defined(ROCKSDB_LITE) && !defined(NROCKSDB_THREAD_STATUS) && \ defined(ROCKSDB_SUPPORT_THREAD_LOCAL) #define ROCKSDB_USING_THREAD_STATUS #endif @@ -43,9 +42,9 @@ struct ThreadStatus { // The type of a thread. enum ThreadType : int { HIGH_PRIORITY = 0, // RocksDB BG thread in high-pri thread pool - LOW_PRIORITY, // RocksDB BG thread in low-pri thread pool - USER, // User thread (Non-RocksDB BG thread) - BOTTOM_PRIORITY, // RocksDB BG thread in bottom-pri thread pool + LOW_PRIORITY, // RocksDB BG thread in low-pri thread pool + USER, // User thread (Non-RocksDB BG thread) + BOTTOM_PRIORITY, // RocksDB BG thread in bottom-pri thread pool NUM_THREAD_TYPES }; @@ -105,22 +104,20 @@ struct ThreadStatus { NUM_STATE_TYPES }; - ThreadStatus(const uint64_t _id, - const ThreadType _thread_type, - const std::string& _db_name, - const std::string& _cf_name, + ThreadStatus(const uint64_t _id, const ThreadType _thread_type, + const std::string& _db_name, const std::string& _cf_name, const OperationType _operation_type, const uint64_t _op_elapsed_micros, const OperationStage _operation_stage, - const uint64_t _op_props[], - const StateType _state_type) : - thread_id(_id), thread_type(_thread_type), - db_name(_db_name), - cf_name(_cf_name), - operation_type(_operation_type), - op_elapsed_micros(_op_elapsed_micros), - operation_stage(_operation_stage), - state_type(_state_type) { + const uint64_t _op_props[], const StateType _state_type) + : thread_id(_id), + thread_type(_thread_type), + db_name(_db_name), + cf_name(_cf_name), + operation_type(_operation_type), + op_elapsed_micros(_op_elapsed_micros), + operation_stage(_operation_stage), + state_type(_state_type) { for (int i = 0; i < kNumOperationProperties; ++i) { op_properties[i] = _op_props[i]; } @@ -172,23 +169,20 @@ struct ThreadStatus { static const std::string MicrosToString(uint64_t op_elapsed_time); // Obtain a human-readable string describing the specified operation stage. - static const std::string& GetOperationStageName( - OperationStage stage); + static const std::string& GetOperationStageName(OperationStage stage); // Obtain the name of the "i"th operation property of the // specified operation. - static const std::string& GetOperationPropertyName( - OperationType op_type, int i); + static const std::string& GetOperationPropertyName(OperationType op_type, + int i); // Translate the "i"th property of the specified operation given // a property value. - static std::map - InterpretOperationProperties( - OperationType op_type, const uint64_t* op_properties); + static std::map InterpretOperationProperties( + OperationType op_type, const uint64_t* op_properties); // Obtain the name of a state given its type. 
static const std::string& GetStateName(StateType state_type); }; - } // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/threadpool.h b/ceph/src/rocksdb/include/rocksdb/threadpool.h index e871ee18c..2e2f2b44f 100644 --- a/ceph/src/rocksdb/include/rocksdb/threadpool.h +++ b/ceph/src/rocksdb/include/rocksdb/threadpool.h @@ -47,7 +47,6 @@ class ThreadPool { virtual void SubmitJob(const std::function&) = 0; // This moves the function in for efficiency virtual void SubmitJob(std::function&&) = 0; - }; // NewThreadPool() is a function that could be used to create a ThreadPool diff --git a/ceph/src/rocksdb/include/rocksdb/trace_reader_writer.h b/ceph/src/rocksdb/include/rocksdb/trace_reader_writer.h index 31226487b..28919a0fa 100644 --- a/ceph/src/rocksdb/include/rocksdb/trace_reader_writer.h +++ b/ceph/src/rocksdb/include/rocksdb/trace_reader_writer.h @@ -24,6 +24,7 @@ class TraceWriter { virtual Status Write(const Slice& data) = 0; virtual Status Close() = 0; + virtual uint64_t GetFileSize() = 0; }; // TraceReader allows reading RocksDB traces from any system, one operation at diff --git a/ceph/src/rocksdb/include/rocksdb/transaction_log.h b/ceph/src/rocksdb/include/rocksdb/transaction_log.h index 1d8ef9186..80f373b24 100644 --- a/ceph/src/rocksdb/include/rocksdb/transaction_log.h +++ b/ceph/src/rocksdb/include/rocksdb/transaction_log.h @@ -5,18 +5,18 @@ #pragma once +#include +#include #include "rocksdb/status.h" #include "rocksdb/types.h" #include "rocksdb/write_batch.h" -#include -#include namespace rocksdb { class LogFile; typedef std::vector> VectorLogPtr; -enum WalFileType { +enum WalFileType { /* Indicates that WAL file is in archive directory. WAL files are moved from * the main db directory to archive directory once they are not live and stay * there until cleaned up. Files are cleaned depending on archive size @@ -27,7 +27,7 @@ enum WalFileType { /* Indicates that WAL file is live and resides in the main db directory */ kAliveLogFile = 1 -} ; +}; class LogFile { public: @@ -39,7 +39,6 @@ class LogFile { // For an archived-log-file = /archive/000003.log virtual std::string PathName() const = 0; - // Primary identifier for log file. // This is directly proportional to creation time of the log file virtual uint64_t LogNumber() const = 0; @@ -60,7 +59,7 @@ struct BatchResult { // Add empty __ctor and __dtor for the rule of five // However, preserve the original semantics and prohibit copying - // as the unique_ptr member does not copy. + // as the std::unique_ptr member does not copy. BatchResult() {} ~BatchResult() {} @@ -119,4 +118,4 @@ class TransactionLogIterator { : verify_checksums_(verify_checksums) {} }; }; -} // namespace rocksdb +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/types.h b/ceph/src/rocksdb/include/rocksdb/types.h index 0868a7415..2cd4039bd 100644 --- a/ceph/src/rocksdb/include/rocksdb/types.h +++ b/ceph/src/rocksdb/include/rocksdb/types.h @@ -15,6 +15,8 @@ namespace rocksdb { // Represents a sequence number in a WAL file. typedef uint64_t SequenceNumber; +const SequenceNumber kMinUnCommittedSeq = 1; // 0 is always committed + // User-oriented representation of internal key types. 
enum EntryType { kEntryPut, @@ -32,11 +34,9 @@ struct FullKey { SequenceNumber sequence; EntryType type; - FullKey() - : sequence(0) - {} // Intentionally left uninitialized (for speed) + FullKey() : sequence(0) {} // Intentionally left uninitialized (for speed) FullKey(const Slice& u, const SequenceNumber& seq, EntryType t) - : user_key(u), sequence(seq), type(t) { } + : user_key(u), sequence(seq), type(t) {} std::string DebugString(bool hex = false) const; void clear() { diff --git a/ceph/src/rocksdb/include/rocksdb/universal_compaction.h b/ceph/src/rocksdb/include/rocksdb/universal_compaction.h index 04e2c849f..e219694b3 100644 --- a/ceph/src/rocksdb/include/rocksdb/universal_compaction.h +++ b/ceph/src/rocksdb/include/rocksdb/universal_compaction.h @@ -16,13 +16,12 @@ namespace rocksdb { // into a single compaction run // enum CompactionStopStyle { - kCompactionStopStyleSimilarSize, // pick files of similar size - kCompactionStopStyleTotalSize // total size of picked files > next file + kCompactionStopStyleSimilarSize, // pick files of similar size + kCompactionStopStyleTotalSize // total size of picked files > next file }; class CompactionOptionsUniversal { public: - // Percentage flexibility while comparing file size. If the candidate file(s) // size is 1% smaller than the next file's size, then include next file into // this candidate set. // Default: 1 diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/backupable_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/backupable_db.h index d087f1274..7817c5649 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/backupable_db.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/backupable_db.h @@ -15,10 +15,10 @@ #endif #include -#include +#include #include +#include #include -#include #include "rocksdb/utilities/stackable_db.h" @@ -257,12 +257,13 @@ class BackupEngine { // BackupableDBOptions have to be the same as the ones used in previous // BackupEngines for the same backup directory. - static Status Open(Env* db_env, - const BackupableDBOptions& options, + static Status Open(Env* db_env, const BackupableDBOptions& options, BackupEngine** backup_engine_ptr); // same as CreateNewBackup, but stores extra application metadata // Flush will always trigger if 2PC is enabled. + // If write-ahead logs are disabled, set flush_before_backup=true to + // avoid losing unflushed key/value pairs from the memtable. virtual Status CreateNewBackupWithMetadata( DB* db, const std::string& app_metadata, bool flush_before_backup = false, std::function progress_callback = []() {}) = 0; @@ -270,6 +271,8 @@ class BackupEngine { // Captures the state of the database in the latest backup // NOT a thread safe call // Flush will always trigger if 2PC is enabled. + // If write-ahead logs are disabled, set flush_before_backup=true to + // avoid losing unflushed key/value pairs from the memtable. virtual Status CreateNewBackup(DB* db, bool flush_before_backup = false, std::function progress_callback = []() {}) { diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/date_tiered_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/date_tiered_db.h deleted file mode 100644 index f259b05a8..000000000 --- a/ceph/src/rocksdb/include/rocksdb/utilities/date_tiered_db.h +++ /dev/null @@ -1,108 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
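Regarding the flush_before_backup note added to backupable_db.h above, a hedged sketch of the intended usage (path and error handling are illustrative):

```cpp
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/utilities/backupable_db.h"

rocksdb::Status BackupNow(rocksdb::DB* db, const std::string& backup_dir) {
  rocksdb::BackupEngine* engine = nullptr;
  rocksdb::Status s = rocksdb::BackupEngine::Open(
      rocksdb::Env::Default(), rocksdb::BackupableDBOptions(backup_dir),
      &engine);
  if (!s.ok()) {
    return s;
  }
  // Force a flush first; without it, writes living only in the memtable
  // (e.g. written with the WAL disabled) would be missing from the backup.
  s = engine->CreateNewBackup(db, /*flush_before_backup=*/true);
  delete engine;
  return s;
}
```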
- -#pragma once -#ifndef ROCKSDB_LITE - -#include -#include -#include - -#include "rocksdb/db.h" - -namespace rocksdb { - -// Date tiered database is a wrapper of DB that implements -// a simplified DateTieredCompactionStrategy by using multiple column families -// as time windows. -// -// DateTieredDB provides an interface similar to DB, but it assumes that the user -// provides keys with the last 8 bytes encoded as a timestamp in seconds. DateTieredDB -// is assigned a TTL to declare when data should be deleted. -// -// DateTieredDB hides the column family layer from the standard RocksDB instance. It -// uses multiple column families to manage time series data, each containing a -// specific range of time. Column families are named by their maximum possible -// timestamp. A column family is created automatically when data newer than the -// latest timestamp of all existing column families arrives. The time range of a column -// family is configurable by `column_family_interval`. By doing this, we -// guarantee that compaction will only happen within a column family. -// -// DateTieredDB is assigned a TTL. When all data in a column family are -// expired (CF_Timestamp <= CUR_Timestamp - TTL), we directly drop the whole -// column family. -// -// TODO(jhli): This is only a simplified version of DTCS. In a complete DTCS, -// time windows can be merged over time, so that older time windows will have -// a larger time range. Also, compactions are executed only for adjacent SST files -// to guarantee there is no time overlap between SST files. - -class DateTieredDB { - public: - // Open a DateTieredDB whose name is `dbname`. - // Similar to DB::Open(), the created database object is stored in dbptr. - // - // Two parameters can be configured: `ttl` to specify the length of time that - // keys should exist in the database, and `column_family_interval` to specify - // the time range of a column family interval. - // - // Open a read-only database if read_only is set to true. - // TODO(jhli): Should use an option object that includes ttl and - // column_family_interval. - static Status Open(const Options& options, const std::string& dbname, - DateTieredDB** dbptr, int64_t ttl, - int64_t column_family_interval, bool read_only = false); - - explicit DateTieredDB() {} - - virtual ~DateTieredDB() {} - - // Wrapper for the Put method. Similar to DB::Put(), but the column family to be - // inserted into is decided by the timestamp in the key, i.e. the last 8 bytes of the user - // key. If the key is already obsolete, it will not be inserted. - // - // When a client puts a key-value pair in DateTieredDB, it assumes the last 8 bytes - // of the key are encoded as a timestamp. The timestamp is a 64-bit signed integer - // encoded as the number of seconds since 1970-01-01 00:00:00 (UTC) (same as - // Env::GetCurrentTime()). The timestamp should be encoded in big endian. - virtual Status Put(const WriteOptions& options, const Slice& key, - const Slice& val) = 0; - - // Wrapper for the Get method. Similar to DB::Get(), but the column family is decided - // by the timestamp in the key. If the key is already obsolete, it will not be found. - virtual Status Get(const ReadOptions& options, const Slice& key, - std::string* value) = 0; - - // Wrapper for the Delete method. Similar to DB::Delete(), but the column family is - // decided by the timestamp in the key. If the key is already obsolete, return NotFound - // status. - virtual Status Delete(const WriteOptions& options, const Slice& key) = 0; - - // Wrapper for the KeyMayExist method. Similar to DB::KeyMayExist(), but the column - // family is decided by the timestamp in the key.
Return false when the key is already - // obsolete. - virtual bool KeyMayExist(const ReadOptions& options, const Slice& key, - std::string* value, bool* value_found = nullptr) = 0; - - // Wrapper for the Merge method. Similar to DB::Merge(), but the column family is - // decided by the timestamp in the key. - virtual Status Merge(const WriteOptions& options, const Slice& key, - const Slice& value) = 0; - - // Create an iterator that hides low level details. This iterator internally - // merges results from all active time series column families. Note that - // column families are not deleted until all their data are obsolete, so this - // iterator can possibly access obsolete key-value pairs. - virtual Iterator* NewIterator(const ReadOptions& opts) = 0; - - // Explicitly drop column families in which all keys are obsolete. This - // process is also implicitly done in the Put() operation. - virtual Status DropObsoleteColumnFamilies() = 0; - - static const uint64_t kTSLength = sizeof(int64_t); // size of timestamp -}; - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/db_ttl.h b/ceph/src/rocksdb/include/rocksdb/utilities/db_ttl.h index b40919a0f..227796cbe 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/db_ttl.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/db_ttl.h @@ -9,8 +9,8 @@ #include #include -#include "rocksdb/utilities/stackable_db.h" #include "rocksdb/db.h" +#include "rocksdb/utilities/stackable_db.h" namespace rocksdb { @@ -60,9 +60,9 @@ class DBWithTTL : public StackableDB { DBWithTTL** dbptr, std::vector ttls, bool read_only = false); - virtual void SetTtl(int32_t ttl) = 0; + virtual void SetTtl(int32_t ttl) = 0; - virtual void SetTtl(ColumnFamilyHandle *h, int32_t ttl) = 0; + virtual void SetTtl(ColumnFamilyHandle* h, int32_t ttl) = 0; protected: explicit DBWithTTL(DB* db) : StackableDB(db) {} diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/document_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/document_db.h deleted file mode 100644 index 3668a50b9..000000000 --- a/ceph/src/rocksdb/include/rocksdb/utilities/document_db.h +++ /dev/null @@ -1,149 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#pragma once -#ifndef ROCKSDB_LITE - -#include -#include - -#include "rocksdb/utilities/stackable_db.h" -#include "rocksdb/utilities/json_document.h" -#include "rocksdb/db.h" - -namespace rocksdb { - -// IMPORTANT: DocumentDB is a work in progress. It is unstable and we might -// change the API without warning. Talk to the RocksDB team before using this in -// production ;) - -// DocumentDB is a layer on top of RocksDB that provides a very simple JSON API. -// When creating a DB, you specify a list of indexes you want to keep on your -// data. You can insert a JSON document into the DB, which is automatically -// indexed. Every document added to the DB needs to have an "_id" field, which is -// automatically indexed and is a unique primary key. All other indexes are -// non-unique. - -// NOTE: field names in the JSON are NOT allowed to start with '$' or -// contain '.'. We don't currently enforce that rule, but the code will start -// behaving badly if it is violated. - -// Cursor is what you get as a result of executing a query.
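As an aside on the SetTtl() overloads touched in db_ttl.h above, a hedged sketch of opening a TTL-wrapped DB and adjusting the TTL at runtime (path and TTL values are illustrative):

```cpp
#include <string>

#include "rocksdb/options.h"
#include "rocksdb/utilities/db_ttl.h"

rocksdb::Status OpenWithTtl(const std::string& path) {
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::DBWithTTL* db = nullptr;
  // Entries older than 24 hours become eligible for deletion on compaction.
  rocksdb::Status s = rocksdb::DBWithTTL::Open(options, path, &db, 24 * 3600);
  if (s.ok()) {
    db->SetTtl(3600);  // tighten the TTL without reopening the DB
    delete db;
  }
  return s;
}
```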
To get all -// results from a query, call Next() on a Cursor while Valid() returns true -class Cursor { - public: - Cursor() = default; - virtual ~Cursor() {} - - virtual bool Valid() const = 0; - virtual void Next() = 0; - // Lifecycle of the returned JSONDocument is until the next Next() call - virtual const JSONDocument& document() const = 0; - virtual Status status() const = 0; - - private: - // No copying allowed - Cursor(const Cursor&); - void operator=(const Cursor&); -}; - -struct DocumentDBOptions { - int background_threads = 4; - uint64_t memtable_size = 128 * 1024 * 1024; // 128 MB - uint64_t cache_size = 1 * 1024 * 1024 * 1024; // 1 GB -}; - -// TODO(icanadi) Add `JSONDocument* info` parameter to all calls that can be -// used by the caller to get more information about the call execution (number -// of dropped records, number of updated records, etc.) -class DocumentDB : public StackableDB { - public: - struct IndexDescriptor { - // Currently, you can only define an index on a single field. To specify an - // index on a field X, set index description to JSON "{X: 1}" - // Currently the value needs to be 1, which means ascending. - // In the future, we plan to also support indexes on multiple keys, where - // you could mix ascending sorting (1) with descending sorting indexes (-1) - JSONDocument* description; - std::string name; - }; - - // Open DocumentDB with specified indexes. The list of indexes has to be - // complete, i.e. include all indexes present in the DB, except the primary - // key index. - // Otherwise, Open() will return an error - static Status Open(const DocumentDBOptions& options, const std::string& name, - const std::vector& indexes, - DocumentDB** db, bool read_only = false); - - explicit DocumentDB(DB* db) : StackableDB(db) {} - - // Create a new index. It will stop all writes for the duration of the call. - // All current documents in the DB are scanned and corresponding index entries - // are created - virtual Status CreateIndex(const WriteOptions& write_options, - const IndexDescriptor& index) = 0; - - // Drop an index. Client is responsible to make sure that index is not being - // used by currently executing queries - virtual Status DropIndex(const std::string& name) = 0; - - // Insert a document to the DB. The document needs to have a primary key "_id" - // which can either be a string or an integer. Otherwise the write will fail - // with InvalidArgument. - virtual Status Insert(const WriteOptions& options, - const JSONDocument& document) = 0; - - // Deletes all documents matching a filter atomically - virtual Status Remove(const ReadOptions& read_options, - const WriteOptions& write_options, - const JSONDocument& query) = 0; - - // Does this sequence of operations: - // 1. Find all documents matching a filter - // 2. For all documents, atomically: - // 2.1. apply the update operators - // 2.2. update the secondary indexes - // - // Currently only $set update operator is supported. - // Syntax is: {$set: {key1: value1, key2: value2, etc...}} - // This operator will change a document's key1 field to value1, key2 to - // value2, etc. New values will be set even if a document didn't have an entry - // for the specified key. - // - // You can not change a primary key of a document. 
- // - // Update example: Update({id: {$gt: 5}, $index: id}, {$set: {enabled: true}}) - virtual Status Update(const ReadOptions& read_options, - const WriteOptions& write_options, - const JSONDocument& filter, - const JSONDocument& updates) = 0; - - // query has to be an array in which every element is an operator. Currently - // only $filter operator is supported. Syntax of $filter operator is: - // {$filter: {key1: condition1, key2: condition2, etc.}} where conditions can - // be either: - // 1) a single value in which case the condition is equality condition, or - // 2) a defined operators, like {$gt: 4}, which will match all documents that - // have key greater than 4. - // - // Supported operators are: - // 1) $gt -- greater than - // 2) $gte -- greater than or equal - // 3) $lt -- less than - // 4) $lte -- less than or equal - // If you want the filter to use an index, you need to specify it like this: - // {$filter: {...(conditions)..., $index: index_name}} - // - // Example query: - // * [{$filter: {name: John, age: {$gte: 18}, $index: age}}] - // will return all Johns whose age is greater or equal to 18 and it will use - // index "age" to satisfy the query. - virtual Cursor* Query(const ReadOptions& read_options, - const JSONDocument& query) = 0; -}; - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/env_librados.h b/ceph/src/rocksdb/include/rocksdb/utilities/env_librados.h index 82a1f0ba5..7be75878d 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/env_librados.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/env_librados.h @@ -172,4 +172,4 @@ class EnvLibrados : public EnvWrapper { librados::IoCtx* _GetIoctx(const std::string& prefix); friend class LibradosWritableFile; }; -} +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/env_mirror.h b/ceph/src/rocksdb/include/rocksdb/utilities/env_mirror.h index bc27cdc48..6d513fc79 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/env_mirror.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/env_mirror.h @@ -19,8 +19,8 @@ #ifndef ROCKSDB_LITE -#include #include +#include #include #include "rocksdb/env.h" @@ -31,37 +31,32 @@ class RandomAccessFileMirror; class WritableFileMirror; class EnvMirror : public EnvWrapper { - Env* a_, *b_; + Env *a_, *b_; bool free_a_, free_b_; public: - EnvMirror(Env* a, Env* b, bool free_a=false, bool free_b=false) - : EnvWrapper(a), - a_(a), - b_(b), - free_a_(free_a), - free_b_(free_b) {} + EnvMirror(Env* a, Env* b, bool free_a = false, bool free_b = false) + : EnvWrapper(a), a_(a), b_(b), free_a_(free_a), free_b_(free_b) {} ~EnvMirror() { - if (free_a_) - delete a_; - if (free_b_) - delete b_; + if (free_a_) delete a_; + if (free_b_) delete b_; } - Status NewSequentialFile(const std::string& f, unique_ptr* r, + Status NewSequentialFile(const std::string& f, + std::unique_ptr* r, const EnvOptions& options) override; Status NewRandomAccessFile(const std::string& f, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& options) override; - Status NewWritableFile(const std::string& f, unique_ptr* r, + Status NewWritableFile(const std::string& f, std::unique_ptr* r, const EnvOptions& options) override; Status ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& options) override; virtual Status NewDirectory(const std::string& name, - unique_ptr* result) override { - unique_ptr br; + std::unique_ptr* result) override { + std::unique_ptr 
br; Status as = a_->NewDirectory(name, result); Status bs = b_->NewDirectory(name, &br); assert(as == bs); @@ -156,12 +151,12 @@ class EnvMirror : public EnvWrapper { class FileLockMirror : public FileLock { public: - FileLock* a_, *b_; + FileLock *a_, *b_; FileLockMirror(FileLock* a, FileLock* b) : a_(a), b_(b) {} }; Status LockFile(const std::string& f, FileLock** l) override { - FileLock* al, *bl; + FileLock *al, *bl; Status as = a_->LockFile(f, &al); Status bs = b_->LockFile(f, &bl); assert(as == bs); diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/geo_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/geo_db.h deleted file mode 100644 index ec3cbdf26..000000000 --- a/ceph/src/rocksdb/include/rocksdb/utilities/geo_db.h +++ /dev/null @@ -1,114 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -// - -#ifndef ROCKSDB_LITE -#pragma once -#include -#include - -#include "rocksdb/utilities/stackable_db.h" -#include "rocksdb/status.h" - -namespace rocksdb { - -// -// Configurable options needed for setting up a Geo database -// -struct GeoDBOptions { - // Backup info and error messages will be written to info_log - // if non-nullptr. - // Default: nullptr - Logger* info_log; - - explicit GeoDBOptions(Logger* _info_log = nullptr):info_log(_info_log) { } -}; - -// -// A position in the earth's geoid -// -class GeoPosition { - public: - double latitude; - double longitude; - - explicit GeoPosition(double la = 0, double lo = 0) : - latitude(la), longitude(lo) { - } -}; - -// -// Description of an object on the Geoid. It is located by a GPS location, -// and is identified by the id. The value associated with this object is -// an opaque string 'value'. Different objects identified by unique id's -// can have the same gps-location associated with them. -// -class GeoObject { - public: - GeoPosition position; - std::string id; - std::string value; - - GeoObject() {} - - GeoObject(const GeoPosition& pos, const std::string& i, - const std::string& val) : - position(pos), id(i), value(val) { - } -}; - -class GeoIterator { - public: - GeoIterator() = default; - virtual ~GeoIterator() {} - virtual void Next() = 0; - virtual bool Valid() const = 0; - virtual const GeoObject& geo_object() = 0; - virtual Status status() const = 0; -}; - -// -// Stack your DB with GeoDB to be able to get geo-spatial support -// -class GeoDB : public StackableDB { - public: - // GeoDBOptions have to be the same as the ones used in a previous - // incarnation of the DB - // - // GeoDB owns the pointer `DB* db` now. You should not delete it or - // use it after the invocation of GeoDB - // GeoDB(DB* db, const GeoDBOptions& options) : StackableDB(db) {} - GeoDB(DB* db, const GeoDBOptions& /*options*/) : StackableDB(db) {} - virtual ~GeoDB() {} - - // Insert a new object into the location database. The object is - // uniquely identified by the id. If an object with the same id already - // exists in the db, then the old one is overwritten by the new - // object being inserted here. - virtual Status Insert(const GeoObject& object) = 0; - - // Retrieve the value of the object located at the specified GPS - // location and is identified by the 'id'. - virtual Status GetByPosition(const GeoPosition& pos, - const Slice& id, std::string* value) = 0; - - // Retrieve the value of the object identified by the 'id'. 
This method - // could be potentially slower than GetByPosition - virtual Status GetById(const Slice& id, GeoObject* object) = 0; - - // Delete the specified object - virtual Status Remove(const Slice& id) = 0; - - // Returns an iterator for the items within a circular radius from the - // specified gps location. If 'number_of_values' is specified, - // then the iterator is capped to that number of objects. - // The radius is specified in 'meters'. - virtual GeoIterator* SearchRadial(const GeoPosition& pos, - double radius, - int number_of_values = INT_MAX) = 0; -}; - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/json_document.h b/ceph/src/rocksdb/include/rocksdb/utilities/json_document.h deleted file mode 100644 index 5d841f951..000000000 --- a/ceph/src/rocksdb/include/rocksdb/utilities/json_document.h +++ /dev/null @@ -1,195 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -#pragma once -#ifndef ROCKSDB_LITE - -#include -#include -#include -#include -#include -#include -#include - -#include "rocksdb/slice.h" - -// We use JSONDocument for DocumentDB API -// Implementation inspired by folly::dynamic, rapidjson and fbson - -namespace fbson { - class FbsonValue; - class ObjectVal; - template - class FbsonWriterT; - class FbsonOutStream; - typedef FbsonWriterT FbsonWriter; -} // namespace fbson - -namespace rocksdb { - -// NOTE: none of this is thread-safe -class JSONDocument { - public: - // return nullptr on parse failure - static JSONDocument* ParseJSON(const char* json); - - enum Type { - kNull, - kArray, - kBool, - kDouble, - kInt64, - kObject, - kString, - }; - - /* implicit */ JSONDocument(); // null - /* implicit */ JSONDocument(bool b); - /* implicit */ JSONDocument(double d); - /* implicit */ JSONDocument(int8_t i); - /* implicit */ JSONDocument(int16_t i); - /* implicit */ JSONDocument(int32_t i); - /* implicit */ JSONDocument(int64_t i); - /* implicit */ JSONDocument(const std::string& s); - /* implicit */ JSONDocument(const char* s); - // constructs JSONDocument of specific type with default value - explicit JSONDocument(Type _type); - - JSONDocument(const JSONDocument& json_document); - - JSONDocument(JSONDocument&& json_document); - - Type type() const; - - // REQUIRES: IsObject() - bool Contains(const std::string& key) const; - // REQUIRES: IsObject() - // Returns non-owner object - JSONDocument operator[](const std::string& key) const; - - // REQUIRES: IsArray() == true || IsObject() == true - size_t Count() const; - - // REQUIRES: IsArray() - // Returns non-owner object - JSONDocument operator[](size_t i) const; - - JSONDocument& operator=(JSONDocument jsonDocument); - - bool IsNull() const; - bool IsArray() const; - bool IsBool() const; - bool IsDouble() const; - bool IsInt64() const; - bool IsObject() const; - bool IsString() const; - - // REQUIRES: IsBool() == true - bool GetBool() const; - // REQUIRES: IsDouble() == true - double GetDouble() const; - // REQUIRES: IsInt64() == true - int64_t GetInt64() const; - // REQUIRES: IsString() == true - std::string GetString() const; - - bool operator==(const JSONDocument& rhs) const; - - bool operator!=(const JSONDocument& rhs) const; - - JSONDocument Copy() const; - - bool IsOwner() const; - - std::string DebugString() const; - - private: - class 
ItemsIteratorGenerator; - - public: - // REQUIRES: IsObject() - ItemsIteratorGenerator Items() const; - - // appends serialized object to dst - void Serialize(std::string* dst) const; - // returns nullptr if Slice doesn't represent valid serialized JSONDocument - static JSONDocument* Deserialize(const Slice& src); - - private: - friend class JSONDocumentBuilder; - - JSONDocument(fbson::FbsonValue* val, bool makeCopy); - - void InitFromValue(const fbson::FbsonValue* val); - - // iteration on objects - class const_item_iterator { - private: - class Impl; - public: - typedef std::pair value_type; - explicit const_item_iterator(Impl* impl); - const_item_iterator(const_item_iterator&&); - const_item_iterator& operator++(); - bool operator!=(const const_item_iterator& other); - value_type operator*(); - ~const_item_iterator(); - private: - friend class ItemsIteratorGenerator; - std::unique_ptr it_; - }; - - class ItemsIteratorGenerator { - public: - explicit ItemsIteratorGenerator(const fbson::ObjectVal& object); - const_item_iterator begin() const; - - const_item_iterator end() const; - - private: - const fbson::ObjectVal& object_; - }; - - std::unique_ptr data_; - mutable fbson::FbsonValue* value_; - - // Our serialization format's first byte specifies the encoding version. That - // way, we can easily change our format while providing backwards - // compatibility. This constant specifies the current version of the - // serialization format - static const char kSerializationFormatVersion; -}; - -class JSONDocumentBuilder { - public: - JSONDocumentBuilder(); - - explicit JSONDocumentBuilder(fbson::FbsonOutStream* out); - - void Reset(); - - bool WriteStartArray(); - - bool WriteEndArray(); - - bool WriteStartObject(); - - bool WriteEndObject(); - - bool WriteKeyValue(const std::string& key, const JSONDocument& value); - - bool WriteJSONDocument(const JSONDocument& value); - - JSONDocument GetJSONDocument(); - - ~JSONDocumentBuilder(); - - private: - std::unique_ptr writer_; -}; - -} // namespace rocksdb - -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd.h b/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd.h index 907c9daf2..57ab88a34 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd.h @@ -96,6 +96,12 @@ class LDBCommand { ldb_options_ = ldb_options; } + const std::map& TEST_GetOptionMap() { + return option_map_; + } + + const std::vector& TEST_GetFlags() { return flags_; } + virtual bool NoDBOpen() { return false; } virtual ~LDBCommand() { CloseDB(); } diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd_execute_result.h b/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd_execute_result.h index 5ddc6feb6..85c219542 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd_execute_result.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/ldb_cmd_execute_result.h @@ -12,26 +12,28 @@ namespace rocksdb { class LDBCommandExecuteResult { -public: + public: enum State { - EXEC_NOT_STARTED = 0, EXEC_SUCCEED = 1, EXEC_FAILED = 2, + EXEC_NOT_STARTED = 0, + EXEC_SUCCEED = 1, + EXEC_FAILED = 2, }; LDBCommandExecuteResult() : state_(EXEC_NOT_STARTED), message_("") {} - LDBCommandExecuteResult(State state, std::string& msg) : - state_(state), message_(msg) {} + LDBCommandExecuteResult(State state, std::string& msg) + : state_(state), message_(msg) {} std::string ToString() { std::string ret; switch (state_) { - case EXEC_SUCCEED: - break; - case EXEC_FAILED: - 
ret.append("Failed: "); - break; - case EXEC_NOT_STARTED: - ret.append("Not started: "); + case EXEC_SUCCEED: + break; + case EXEC_FAILED: + ret.append("Failed: "); + break; + case EXEC_NOT_STARTED: + ret.append("Not started: "); } if (!message_.empty()) { ret.append(message_); @@ -44,17 +46,11 @@ public: message_ = ""; } - bool IsSucceed() { - return state_ == EXEC_SUCCEED; - } + bool IsSucceed() { return state_ == EXEC_SUCCEED; } - bool IsNotStarted() { - return state_ == EXEC_NOT_STARTED; - } + bool IsNotStarted() { return state_ == EXEC_NOT_STARTED; } - bool IsFailed() { - return state_ == EXEC_FAILED; - } + bool IsFailed() { return state_ == EXEC_FAILED; } static LDBCommandExecuteResult Succeed(std::string msg) { return LDBCommandExecuteResult(EXEC_SUCCEED, msg); @@ -64,7 +60,7 @@ public: return LDBCommandExecuteResult(EXEC_FAILED, msg); } -private: + private: State state_; std::string message_; @@ -72,4 +68,4 @@ private: bool operator!=(const LDBCommandExecuteResult&); }; -} +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/lua/rocks_lua_compaction_filter.h b/ceph/src/rocksdb/include/rocksdb/utilities/lua/rocks_lua_compaction_filter.h deleted file mode 100644 index 71dd7ee5b..000000000 --- a/ceph/src/rocksdb/include/rocksdb/utilities/lua/rocks_lua_compaction_filter.h +++ /dev/null @@ -1,189 +0,0 @@ -// Copyright (c) 2016, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#pragma once - -#if defined(LUA) && !defined(ROCKSDB_LITE) -// lua headers -extern "C" { -#include -#include -#include -} - -#include -#include -#include - -#include "rocksdb/compaction_filter.h" -#include "rocksdb/env.h" -#include "rocksdb/slice.h" -#include "rocksdb/utilities/lua/rocks_lua_custom_library.h" -#include "rocksdb/utilities/lua/rocks_lua_util.h" - -namespace rocksdb { -namespace lua { - -struct RocksLuaCompactionFilterOptions { - // The Lua script, in string form, that implements all necessary CompactionFilter - // virtual functions. The specified lua_script must implement the following - // functions, which are Name and Filter, as described below. - // - // 0. The Name function simply returns a string representing the name of - // the lua script. If there's any error in the Name function, an - // empty string will be used. - // --- Example - // function Name() - // return "DefaultLuaCompactionFilter" - // end - // - // - // 1. The script must contain a function called Filter, which implements - // CompactionFilter::Filter(), takes three input arguments, and returns - // three values as the following API: - // - // function Filter(level, key, existing_value) - // ... - // return is_filtered, is_changed, new_value - // end - // - // Note that if ignore_value is set to true, then Filter should implement - // the following API: - // - // function Filter(level, key) - // ... - // return is_filtered - // end - // - // If there is any error in the Filter() function, the input key / value - // pair will be kept. - // - // -- Input - // The function must take three arguments (integer, string, string), - // which map to "level", "key", and "existing_value" passed from - // RocksDB. - // - // -- Output - // The function must return three values (boolean, boolean, string). - // - is_filtered: if the first return value is true, then it indicates - // the input key / value pair should be filtered.
- // - is_changed: if the second return value is true, then it indicates - // the existing_value needs to be changed, and the resulting value - // is stored in the third return value. - // - new_value: if the second return value is true, then this third - // return value stores the new value of the input key / value pair. - // - // -- Examples - // -- a filter that keeps all key-value pairs - // function Filter(level, key, existing_value) - // return false, false, "" - // end - // - // -- a filter that keeps all keys and changes their values to "Rocks" - // function Filter(level, key, existing_value) - // return false, true, "Rocks" - // end - - std::string lua_script; - - // If set to true, then existing_value will not be passed to the Filter - // function, and the Filter function only needs to return a single boolean - // flag indicating whether to filter out this key or not. - // - // function Filter(level, key) - // ... - // return is_filtered - // end - bool ignore_value = false; - - // A boolean flag to determine whether to ignore snapshots. - bool ignore_snapshots = false; - - // When a non-null pointer is specified, the first "error_limit_per_filter" - // Lua-related errors of each CompactionFilter will be included - // in this log. - std::shared_ptr error_log; - - // The number of errors per CompactionFilter will be printed - // to error_log. - int error_limit_per_filter = 1; - - // A string to luaL_reg array map that allows the Lua CompactionFilter - // to use custom C libraries. The string will be used as the library - // name in Lua. - std::vector> libraries; - - /////////////////////////////////////////////////////////////////////////// - // NOT YET SUPPORTED - // The name of the Lua function in "lua_script" that implements - // CompactionFilter::FilterMergeOperand(). The function must take - // three input arguments (integer, string, string), which map to "level", - // "key", and "operand" passed from RocksDB. In addition, the - // function must return a single boolean value, indicating whether - // to filter the input key / operand. - // - // DEFAULT: the default implementation always returns false. - // @see CompactionFilter::FilterMergeOperand -}; - -class RocksLuaCompactionFilterFactory : public CompactionFilterFactory { - public: - explicit RocksLuaCompactionFilterFactory( - const RocksLuaCompactionFilterOptions opt); - - virtual ~RocksLuaCompactionFilterFactory() {} - - std::unique_ptr CreateCompactionFilter( - const CompactionFilter::Context& context) override; - - // Change the Lua script so that the next compaction after this - // function call will use the new Lua script. - void SetScript(const std::string& new_script); - - // Obtain the current Lua script - std::string GetScript(); - - const char* Name() const override; - - private: - RocksLuaCompactionFilterOptions opt_; - std::string name_; - // A lock to protect "opt_" to make it dynamically changeable. - std::mutex opt_mutex_; -}; - -// A wrapper class that invokes the Lua script to perform CompactionFilter -// functions.
-class RocksLuaCompactionFilter : public rocksdb::CompactionFilter { - public: - explicit RocksLuaCompactionFilter(const RocksLuaCompactionFilterOptions& opt) - : options_(opt), - lua_state_wrapper_(opt.lua_script, opt.libraries), - error_count_(0), - name_("") {} - - virtual bool Filter(int level, const Slice& key, const Slice& existing_value, - std::string* new_value, - bool* value_changed) const override; - // Not yet supported - virtual bool FilterMergeOperand(int /*level*/, const Slice& /*key*/, - const Slice& /*operand*/) const override { - return false; - } - virtual bool IgnoreSnapshots() const override; - virtual const char* Name() const override; - - protected: - void LogLuaError(const char* format, ...) const; - - RocksLuaCompactionFilterOptions options_; - LuaStateWrapper lua_state_wrapper_; - mutable int error_count_; - mutable std::string name_; -}; - -} // namespace lua -} // namespace rocksdb -#endif // defined(LUA) && !defined(ROCKSDB_LITE) diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/object_registry.h b/ceph/src/rocksdb/include/rocksdb/utilities/object_registry.h index b046ba7c1..86a51b92e 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/object_registry.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/object_registry.h @@ -27,8 +27,8 @@ namespace rocksdb { template T* NewCustomObject(const std::string& target, std::unique_ptr* res_guard); -// Returns a new T when called with a string. Populates the unique_ptr argument -// if granting ownership to caller. +// Returns a new T when called with a string. Populates the std::unique_ptr +// argument if granting ownership to caller. template using FactoryFunc = std::function*)>; diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/options_util.h b/ceph/src/rocksdb/include/rocksdb/utilities/options_util.h index d02c57410..d97b394ea 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/options_util.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/options_util.h @@ -33,6 +33,12 @@ namespace rocksdb { // * merge_operator // * compaction_filter // +// User can also choose to load customized comparator and/or merge_operator +// through object registry: +// * comparator needs to be registered through Registrar +// * merge operator needs to be registered through +// Registrar>. +// // For table_factory, this function further supports deserializing // BlockBasedTableFactory and its BlockBasedTableOptions except the // pointer options of BlockBasedTableOptions (flush_block_policy_factory, @@ -58,7 +64,8 @@ namespace rocksdb { Status LoadLatestOptions(const std::string& dbpath, Env* env, DBOptions* db_options, std::vector* cf_descs, - bool ignore_unknown_options = false); + bool ignore_unknown_options = false, + std::shared_ptr* cache = {}); // Similar to LoadLatestOptions, this function constructs the DBOptions // and ColumnFamilyDescriptors based on the specified RocksDB Options file. @@ -67,7 +74,8 @@ Status LoadLatestOptions(const std::string& dbpath, Env* env, Status LoadOptionsFromFile(const std::string& options_file_name, Env* env, DBOptions* db_options, std::vector* cf_descs, - bool ignore_unknown_options = false); + bool ignore_unknown_options = false, + std::shared_ptr* cache = {}); // Returns the latest options file name under the specified db path. 
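Given the new optional cache argument on LoadLatestOptions/LoadOptionsFromFile above, a hedged sketch of the intended call pattern (sizes and semantics are assumptions based on the comment; the cache pointer presumably makes the loaded table options share one block cache):

```cpp
#include <memory>
#include <string>
#include <vector>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/utilities/options_util.h"

rocksdb::Status LoadOpts(
    const std::string& dbpath, rocksdb::DBOptions* db_options,
    std::vector<rocksdb::ColumnFamilyDescriptor>* cf_descs) {
  std::shared_ptr<rocksdb::Cache> cache = rocksdb::NewLRUCache(64 << 20);
  // Passing &cache is assumed to rewire the block cache of every loaded
  // BlockBasedTableOptions to this shared instance.
  return rocksdb::LoadLatestOptions(dbpath, rocksdb::Env::Default(),
                                    db_options, cf_descs,
                                    /*ignore_unknown_options=*/false, &cache);
}
```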
Status GetLatestOptionsFileName(const std::string& dbpath, Env* env, diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/sim_cache.h b/ceph/src/rocksdb/include/rocksdb/utilities/sim_cache.h index f29fd5e8f..bc2a7bc13 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/sim_cache.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/sim_cache.h @@ -73,7 +73,8 @@ class SimCache : public Cache { // stop logging to the file automatically after reaching a specific size in // bytes, a values of 0 disable this feature virtual Status StartActivityLogging(const std::string& activity_log_file, - Env* env, uint64_t max_logging_size = 0) = 0; + Env* env, + uint64_t max_logging_size = 0) = 0; // Stop cache activity logging if any virtual void StopActivityLogging() = 0; diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/spatial_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/spatial_db.h deleted file mode 100644 index 477b77cf6..000000000 --- a/ceph/src/rocksdb/include/rocksdb/utilities/spatial_db.h +++ /dev/null @@ -1,261 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#pragma once -#ifndef ROCKSDB_LITE - -#include -#include - -#include "rocksdb/db.h" -#include "rocksdb/slice.h" -#include "rocksdb/utilities/stackable_db.h" - -namespace rocksdb { -namespace spatial { - -// NOTE: SpatialDB is experimental and we might change its API without warning. -// Please talk to us before developing against SpatialDB API. -// -// SpatialDB is a support for spatial indexes built on top of RocksDB. -// When creating a new SpatialDB, clients specifies a list of spatial indexes to -// build on their data. Each spatial index is defined by the area and -// granularity. If you're storing map data, different spatial index -// granularities can be used for different zoom levels. -// -// Each element inserted into SpatialDB has: -// * a bounding box, which determines how will the element be indexed -// * string blob, which will usually be WKB representation of the polygon -// (http://en.wikipedia.org/wiki/Well-known_text) -// * feature set, which is a map of key-value pairs, where value can be null, -// int, double, bool, string -// * a list of indexes to insert the element in -// -// Each query is executed on a single spatial index. Query guarantees that it -// will return all elements intersecting the specified bounding box, but it -// might also return some extra non-intersecting elements. 
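For the SimCache signature reflow above, a hedged usage sketch (capacities, shard bits, and the log path are illustrative):

```cpp
#include <memory>

#include "rocksdb/cache.h"
#include "rocksdb/env.h"
#include "rocksdb/utilities/sim_cache.h"

std::shared_ptr<rocksdb::SimCache> MakeSimCache() {
  // Real 1GB cache, simulated 4GB capacity: the SimCache's reported hit rate
  // estimates what a 4GB cache would achieve on the same workload.
  std::shared_ptr<rocksdb::SimCache> sim = rocksdb::NewSimCache(
      rocksdb::NewLRUCache(1ull << 30), /*sim_capacity=*/4ull << 30,
      /*num_shard_bits=*/6);
  // max_logging_size=0 disables the size cap, per the comment above.
  sim->StartActivityLogging("/tmp/sim_cache_activity.log",
                            rocksdb::Env::Default(), /*max_logging_size=*/0);
  return sim;  // plug into BlockBasedTableOptions::block_cache
}
```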
- -// Variant is a class that can be many things: null, bool, int, double or string -// It is used to store different value types in FeatureSet (see below) -struct Variant { - // Don't change the values here, they are persisted on disk - enum Type { - kNull = 0x0, - kBool = 0x1, - kInt = 0x2, - kDouble = 0x3, - kString = 0x4, - }; - - Variant() : type_(kNull) {} - /* implicit */ Variant(bool b) : type_(kBool) { data_.b = b; } - /* implicit */ Variant(uint64_t i) : type_(kInt) { data_.i = i; } - /* implicit */ Variant(double d) : type_(kDouble) { data_.d = d; } - /* implicit */ Variant(const std::string& s) : type_(kString) { - new (&data_.s) std::string(s); - } - - Variant(const Variant& v) : type_(v.type_) { Init(v, data_); } - - Variant& operator=(const Variant& v); - - Variant(Variant&& rhs) : type_(kNull) { *this = std::move(rhs); } - - Variant& operator=(Variant&& v); - - ~Variant() { Destroy(type_, data_); } - - Type type() const { return type_; } - bool get_bool() const { return data_.b; } - uint64_t get_int() const { return data_.i; } - double get_double() const { return data_.d; } - const std::string& get_string() const { return *GetStringPtr(data_); } - - bool operator==(const Variant& other) const; - bool operator!=(const Variant& other) const { return !(*this == other); } - - private: - Type type_; - - union Data { - bool b; - uint64_t i; - double d; - // Current version of MS compiler not C++11 compliant so can not put - // std::string - // however, even then we still need the rest of the maintenance. - char s[sizeof(std::string)]; - } data_; - - // Avoid type_punned aliasing problem - static std::string* GetStringPtr(Data& d) { - void* p = d.s; - return reinterpret_cast(p); - } - - static const std::string* GetStringPtr(const Data& d) { - const void* p = d.s; - return reinterpret_cast(p); - } - - static void Init(const Variant&, Data&); - - static void Destroy(Type t, Data& d) { - if (t == kString) { - using std::string; - GetStringPtr(d)->~string(); - } - } -}; - -// FeatureSet is a map of key-value pairs. One feature set is associated with -// each element in SpatialDB. It can be used to add rich data about the element. -class FeatureSet { - private: - typedef std::unordered_map map; - - public: - class iterator { - public: - /* implicit */ iterator(const map::const_iterator itr) : itr_(itr) {} - iterator& operator++() { - ++itr_; - return *this; - } - bool operator!=(const iterator& other) { return itr_ != other.itr_; } - bool operator==(const iterator& other) { return itr_ == other.itr_; } - map::value_type operator*() { return *itr_; } - - private: - map::const_iterator itr_; - }; - FeatureSet() = default; - - FeatureSet* Set(const std::string& key, const Variant& value); - bool Contains(const std::string& key) const; - // REQUIRES: Contains(key) - const Variant& Get(const std::string& key) const; - iterator Find(const std::string& key) const; - - iterator begin() const { return map_.begin(); } - iterator end() const { return map_.end(); } - - void Clear(); - size_t Size() const { return map_.size(); } - - void Serialize(std::string* output) const; - // REQUIRED: empty FeatureSet - bool Deserialize(const Slice& input); - - std::string DebugString() const; - - private: - map map_; -}; - -// BoundingBox is a helper structure for defining rectangles representing -// bounding boxes of spatial elements. 
-template -struct BoundingBox { - T min_x, min_y, max_x, max_y; - BoundingBox() = default; - BoundingBox(T _min_x, T _min_y, T _max_x, T _max_y) - : min_x(_min_x), min_y(_min_y), max_x(_max_x), max_y(_max_y) {} - - bool Intersects(const BoundingBox& a) const { - return !(min_x > a.max_x || min_y > a.max_y || a.min_x > max_x || - a.min_y > max_y); - } -}; - -struct SpatialDBOptions { - uint64_t cache_size = 1 * 1024 * 1024 * 1024LL; // 1GB - int num_threads = 16; - bool bulk_load = true; -}; - -// Cursor is used to return data from the query to the client. To get all the -// data from the query, just call Next() while Valid() is true -class Cursor { - public: - Cursor() = default; - virtual ~Cursor() {} - - virtual bool Valid() const = 0; - // REQUIRES: Valid() - virtual void Next() = 0; - - // Lifetime of the underlying storage until the next call to Next() - // REQUIRES: Valid() - virtual const Slice blob() = 0; - // Lifetime of the underlying storage until the next call to Next() - // REQUIRES: Valid() - virtual const FeatureSet& feature_set() = 0; - - virtual Status status() const = 0; - - private: - // No copying allowed - Cursor(const Cursor&); - void operator=(const Cursor&); -}; - -// SpatialIndexOptions defines a spatial index that will be built on the data -struct SpatialIndexOptions { - // Spatial indexes are referenced by names - std::string name; - // An area that is indexed. If the element is not intersecting with spatial - // index's bbox, it will not be inserted into the index - BoundingBox bbox; - // tile_bits control the granularity of the spatial index. Each dimension of - // the bbox will be split into (1 << tile_bits) tiles, so there will be a - // total of (1 << tile_bits)^2 tiles. It is recommended to configure a size of - // each tile to be approximately the size of the query on that spatial index - uint32_t tile_bits; - SpatialIndexOptions() {} - SpatialIndexOptions(const std::string& _name, - const BoundingBox& _bbox, uint32_t _tile_bits) - : name(_name), bbox(_bbox), tile_bits(_tile_bits) {} -}; - -class SpatialDB : public StackableDB { - public: - // Creates the SpatialDB with specified list of indexes. - // REQUIRED: db doesn't exist - static Status Create(const SpatialDBOptions& options, const std::string& name, - const std::vector& spatial_indexes); - - // Open the existing SpatialDB. The resulting db object will be returned - // through db parameter. - // REQUIRED: db was created using SpatialDB::Create - static Status Open(const SpatialDBOptions& options, const std::string& name, - SpatialDB** db, bool read_only = false); - - explicit SpatialDB(DB* db) : StackableDB(db) {} - - // Insert the element into the DB. Element will be inserted into specified - // spatial_indexes, based on specified bbox. - // REQUIRES: spatial_indexes.size() > 0 - virtual Status Insert(const WriteOptions& write_options, - const BoundingBox& bbox, const Slice& blob, - const FeatureSet& feature_set, - const std::vector& spatial_indexes) = 0; - - // Calling Compact() after inserting a bunch of elements should speed up - // reading. This is especially useful if you use SpatialDBOptions::bulk_load - // Num threads determines how many threads we'll use for compactions. Setting - // this to bigger number will use more IO and CPU, but finish faster - virtual Status Compact(int num_threads = 1) = 0; - - // Query the specified spatial_index. Query will return all elements that - // intersect bbox, but it may also return some extra elements. 
- virtual Cursor* Query(const ReadOptions& read_options, - const BoundingBox& bbox, - const std::string& spatial_index) = 0; -}; - -} // namespace spatial -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/stackable_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/stackable_db.h index 721203f7c..8fef9b3e8 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/stackable_db.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/stackable_db.h @@ -13,7 +13,6 @@ #undef DeleteFile #endif - namespace rocksdb { // This class contains APIs to stack rocksdb wrappers.Eg. Stack TTL over base d @@ -37,9 +36,7 @@ class StackableDB : public DB { virtual Status Close() override { return db_->Close(); } - virtual DB* GetBaseDB() { - return db_; - } + virtual DB* GetBaseDB() { return db_; } virtual DB* GetRootDB() override { return db_->GetRootDB(); } @@ -107,6 +104,12 @@ class StackableDB : public DB { return db_->IngestExternalFile(column_family, external_files, options); } + using DB::IngestExternalFiles; + virtual Status IngestExternalFiles( + const std::vector& args) override { + return db_->IngestExternalFiles(args); + } + virtual Status VerifyChecksum() override { return db_->VerifyChecksum(); } using DB::KeyMayExist; @@ -138,10 +141,8 @@ class StackableDB : public DB { return db_->Merge(options, column_family, key, value); } - - virtual Status Write(const WriteOptions& opts, WriteBatch* updates) - override { - return db_->Write(opts, updates); + virtual Status Write(const WriteOptions& opts, WriteBatch* updates) override { + return db_->Write(opts, updates); } using DB::NewIterator; @@ -157,10 +158,7 @@ class StackableDB : public DB { return db_->NewIterators(options, column_families, iterators); } - - virtual const Snapshot* GetSnapshot() override { - return db_->GetSnapshot(); - } + virtual const Snapshot* GetSnapshot() override { return db_->GetSnapshot(); } virtual void ReleaseSnapshot(const Snapshot* snapshot) override { return db_->ReleaseSnapshot(snapshot); @@ -191,12 +189,10 @@ class StackableDB : public DB { } using DB::GetApproximateSizes; - virtual void GetApproximateSizes(ColumnFamilyHandle* column_family, - const Range* r, int n, uint64_t* sizes, - uint8_t include_flags - = INCLUDE_FILES) override { - return db_->GetApproximateSizes(column_family, r, n, sizes, - include_flags); + virtual void GetApproximateSizes( + ColumnFamilyHandle* column_family, const Range* r, int n, uint64_t* sizes, + uint8_t include_flags = INCLUDE_FILES) override { + return db_->GetApproximateSizes(column_family, r, n, sizes, include_flags); } using DB::GetApproximateMemTableStats; @@ -218,12 +214,13 @@ class StackableDB : public DB { virtual Status CompactFiles( const CompactionOptions& compact_options, ColumnFamilyHandle* column_family, - const std::vector& input_file_names, - const int output_level, const int output_path_id = -1, - std::vector* const output_file_names = nullptr) override { - return db_->CompactFiles( - compact_options, column_family, input_file_names, - output_level, output_path_id, output_file_names); + const std::vector& input_file_names, const int output_level, + const int output_path_id = -1, + std::vector* const output_file_names = nullptr, + CompactionJobInfo* compaction_job_info = nullptr) override { + return db_->CompactFiles(compact_options, column_family, input_file_names, + output_level, output_path_id, output_file_names, + compaction_job_info); } virtual Status PauseBackgroundWork() override { @@ -244,24 +241,20 @@ class 
StackableDB : public DB { } using DB::MaxMemCompactionLevel; - virtual int MaxMemCompactionLevel(ColumnFamilyHandle* column_family) - override { + virtual int MaxMemCompactionLevel( + ColumnFamilyHandle* column_family) override { return db_->MaxMemCompactionLevel(column_family); } using DB::Level0StopWriteTrigger; - virtual int Level0StopWriteTrigger(ColumnFamilyHandle* column_family) - override { + virtual int Level0StopWriteTrigger( + ColumnFamilyHandle* column_family) override { return db_->Level0StopWriteTrigger(column_family); } - virtual const std::string& GetName() const override { - return db_->GetName(); - } + virtual const std::string& GetName() const override { return db_->GetName(); } - virtual Env* GetEnv() const override { - return db_->GetEnv(); - } + virtual Env* GetEnv() const override { return db_->GetEnv(); } using DB::GetOptions; virtual Options GetOptions(ColumnFamilyHandle* column_family) const override { @@ -278,13 +271,20 @@ class StackableDB : public DB { ColumnFamilyHandle* column_family) override { return db_->Flush(fopts, column_family); } - - virtual Status SyncWAL() override { - return db_->SyncWAL(); + virtual Status Flush( + const FlushOptions& fopts, + const std::vector& column_families) override { + return db_->Flush(fopts, column_families); } + virtual Status SyncWAL() override { return db_->SyncWAL(); } + virtual Status FlushWAL(bool sync) override { return db_->FlushWAL(sync); } + virtual Status LockWAL() override { return db_->LockWAL(); } + + virtual Status UnlockWAL() override { return db_->UnlockWAL(); } + #ifndef ROCKSDB_LITE virtual Status DisableFileDeletions() override { @@ -300,9 +300,8 @@ class StackableDB : public DB { db_->GetLiveFilesMetaData(metadata); } - virtual void GetColumnFamilyMetaData( - ColumnFamilyHandle *column_family, - ColumnFamilyMetaData* cf_meta) override { + virtual void GetColumnFamilyMetaData(ColumnFamilyHandle* column_family, + ColumnFamilyMetaData* cf_meta) override { db_->GetColumnFamilyMetaData(column_family, cf_meta); } @@ -310,14 +309,15 @@ class StackableDB : public DB { virtual Status GetLiveFiles(std::vector& vec, uint64_t* mfs, bool flush_memtable = true) override { - return db_->GetLiveFiles(vec, mfs, flush_memtable); + return db_->GetLiveFiles(vec, mfs, flush_memtable); } virtual SequenceNumber GetLatestSequenceNumber() const override { return db_->GetLatestSequenceNumber(); } - virtual bool SetPreserveDeletesSequenceNumber(SequenceNumber seqnum) override { + virtual bool SetPreserveDeletesSequenceNumber( + SequenceNumber seqnum) override { return db_->SetPreserveDeletesSequenceNumber(seqnum); } @@ -364,7 +364,7 @@ class StackableDB : public DB { } virtual Status GetUpdatesSince( - SequenceNumber seq_number, unique_ptr* iter, + SequenceNumber seq_number, std::unique_ptr* iter, const TransactionLogIterator::ReadOptions& read_options) override { return db_->GetUpdatesSince(seq_number, iter, read_options); } @@ -389,4 +389,4 @@ class StackableDB : public DB { std::shared_ptr shared_db_ptr_; }; -} // namespace rocksdb +} // namespace rocksdb diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/table_properties_collectors.h b/ceph/src/rocksdb/include/rocksdb/utilities/table_properties_collectors.h index c74f89bc9..bb350bcf9 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/table_properties_collectors.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/table_properties_collectors.h @@ -40,20 +40,18 @@ class CompactOnDeletionCollectorFactory private: friend std::shared_ptr - 
NewCompactOnDeletionCollectorFactory( - size_t sliding_window_size, - size_t deletion_trigger); + NewCompactOnDeletionCollectorFactory(size_t sliding_window_size, + size_t deletion_trigger); // A factory of a table property collector that marks an SST // file as need-compaction when it observes at least "D" deletion // entries in any "N" consecutive entries. // // @param sliding_window_size "N" // @param deletion_trigger "D" - CompactOnDeletionCollectorFactory( - size_t sliding_window_size, - size_t deletion_trigger) : - sliding_window_size_(sliding_window_size), - deletion_trigger_(deletion_trigger) {} + CompactOnDeletionCollectorFactory(size_t sliding_window_size, + size_t deletion_trigger) + : sliding_window_size_(sliding_window_size), + deletion_trigger_(deletion_trigger) {} std::atomic sliding_window_size_; std::atomic deletion_trigger_; @@ -69,9 +67,8 @@ class CompactOnDeletionCollectorFactory // @param deletion_trigger "D". Note that even when "N" is changed, // the specified number for "D" will not be changed. extern std::shared_ptr - NewCompactOnDeletionCollectorFactory( - size_t sliding_window_size, - size_t deletion_trigger); +NewCompactOnDeletionCollectorFactory(size_t sliding_window_size, + size_t deletion_trigger); } // namespace rocksdb #endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/transaction.h b/ceph/src/rocksdb/include/rocksdb/utilities/transaction.h index 86627d4f4..ce6724822 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/transaction.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/transaction.h @@ -208,8 +208,10 @@ class Transaction { // Read this key and ensure that this transaction will only // be able to be committed if this key is not written outside this // transaction after it has first been read (or after the snapshot if a - // snapshot is set in this transaction). The transaction behavior is the - // same regardless of whether the key exists or not. + // snapshot is set in this transaction and do_validate is true). If + // do_validate is false, ReadOptions::snapshot is expected to be nullptr so + // that GetForUpdate returns the latest committed value. The transaction + // behavior is the same regardless of whether the key exists or not. // // Note: Currently, this function will return Status::MergeInProgress // if the most recent write to the queried key in this batch is a Merge.
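A hedged sketch of the do_validate semantics described in the comment above; per that comment, the read options must not carry a snapshot when validation is skipped:

```cpp
#include <string>

#include "rocksdb/slice.h"
#include "rocksdb/utilities/transaction.h"

rocksdb::Status ReadLocked(rocksdb::Transaction* txn,
                           const rocksdb::Slice& key, std::string* value) {
  rocksdb::ReadOptions read_options;  // snapshot intentionally left nullptr
  // With do_validate=false, GetForUpdate locks the key and returns the
  // latest committed value without snapshot validation.
  return txn->GetForUpdate(read_options, key, value, /*exclusive=*/true,
                           /*do_validate=*/false);
}
```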
@@ -234,26 +236,31 @@ class Transaction { virtual Status GetForUpdate(const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, std::string* value, - bool exclusive = true) = 0; + bool exclusive = true, + const bool do_validate = true) = 0; // An overload of the above method that receives a PinnableSlice. // For backward compatibility a default implementation is provided virtual Status GetForUpdate(const ReadOptions& options, - ColumnFamilyHandle* /*column_family*/, + ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* pinnable_val, - bool /*exclusive*/ = true) { + bool exclusive = true, + const bool do_validate = true) { if (pinnable_val == nullptr) { std::string* null_str = nullptr; - return GetForUpdate(options, key, null_str); + return GetForUpdate(options, column_family, key, null_str, exclusive, + do_validate); } else { - auto s = GetForUpdate(options, key, pinnable_val->GetSelf()); + auto s = GetForUpdate(options, column_family, key, + pinnable_val->GetSelf(), exclusive, do_validate); pinnable_val->PinSelf(); return s; } } virtual Status GetForUpdate(const ReadOptions& options, const Slice& key, - std::string* value, bool exclusive = true) = 0; + std::string* value, bool exclusive = true, + const bool do_validate = true) = 0; virtual std::vector MultiGetForUpdate( const ReadOptions& options, @@ -286,6 +293,9 @@ class Transaction { // functions in WriteBatch, but will also do conflict checking on the // keys being written. // + // assume_tracked=true expects the key to be already tracked. If it is, + // ValidateSnapshot is skipped; otherwise an error is returned. + // // If this Transaction was created on an OptimisticTransactionDB, these // functions should always return Status::OK(). // @@ -298,28 +308,33 @@ class Transaction { // (See max_write_buffer_number_to_maintain) // or other errors on unexpected failures.
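A hedged sketch of the assume_tracked contract described above, whose declarations follow below: the key is first locked and tracked via GetForUpdate, so the subsequent write can safely skip a second snapshot validation.

```cpp
#include <string>

#include "rocksdb/slice.h"
#include "rocksdb/utilities/transaction.h"

rocksdb::Status ReadModifyWrite(rocksdb::Transaction* txn,
                                rocksdb::ColumnFamilyHandle* cf,
                                const rocksdb::Slice& key) {
  std::string value;
  rocksdb::Status s =
      txn->GetForUpdate(rocksdb::ReadOptions(), cf, key, &value);
  if (!s.ok() && !s.IsNotFound()) {
    return s;
  }
  value.append("!");
  // The key is already tracked by the GetForUpdate above, so validation
  // can be skipped.
  return txn->Put(cf, key, value, /*assume_tracked=*/true);
}
```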
virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key, - const Slice& value) = 0; + const Slice& value, const bool assume_tracked = false) = 0; virtual Status Put(const Slice& key, const Slice& value) = 0; virtual Status Put(ColumnFamilyHandle* column_family, const SliceParts& key, - const SliceParts& value) = 0; + const SliceParts& value, + const bool assume_tracked = false) = 0; virtual Status Put(const SliceParts& key, const SliceParts& value) = 0; virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key, - const Slice& value) = 0; + const Slice& value, + const bool assume_tracked = false) = 0; virtual Status Merge(const Slice& key, const Slice& value) = 0; - virtual Status Delete(ColumnFamilyHandle* column_family, - const Slice& key) = 0; + virtual Status Delete(ColumnFamilyHandle* column_family, const Slice& key, + const bool assume_tracked = false) = 0; virtual Status Delete(const Slice& key) = 0; virtual Status Delete(ColumnFamilyHandle* column_family, - const SliceParts& key) = 0; + const SliceParts& key, + const bool assume_tracked = false) = 0; virtual Status Delete(const SliceParts& key) = 0; virtual Status SingleDelete(ColumnFamilyHandle* column_family, - const Slice& key) = 0; + const Slice& key, + const bool assume_tracked = false) = 0; virtual Status SingleDelete(const Slice& key) = 0; virtual Status SingleDelete(ColumnFamilyHandle* column_family, - const SliceParts& key) = 0; + const SliceParts& key, + const bool assume_tracked = false) = 0; virtual Status SingleDelete(const SliceParts& key) = 0; // PutUntracked() will write a Put to the batch of operations to be committed diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/transaction_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/transaction_db.h index 3d7bc355a..6c4346ff3 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/transaction_db.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/transaction_db.h @@ -93,6 +93,16 @@ struct TransactionDBOptions { // logic in myrocks. This hack of simply not rolling back merge operands works // for the special way that myrocks uses these operands. bool rollback_merge_operands = false; + + private: + // 128 entries + size_t wp_snapshot_cache_bits = static_cast<size_t>(7); + // 8M entries, 64MB size + size_t wp_commit_cache_bits = static_cast<size_t>(23); + + friend class WritePreparedTxnDB; + friend class WritePreparedTransactionTestBase; + friend class MySQLStyleTransactionTest; }; struct TransactionOptions { @@ -117,7 +127,6 @@ struct TransactionOptions { // return 0 if // a.compare(b) returns 0. - // If positive, specifies the wait timeout in milliseconds when // a transaction attempts to lock a key.
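The intended pairing for assume_tracked, as a hedged sketch (txn and cf are assumed to come from a TransactionDB; the helper name is illustrative): the key is first locked and tracked via GetForUpdate, after which the write may skip snapshot validation:

```cpp
#include <string>

#include "rocksdb/utilities/transaction.h"

rocksdb::Status UpdateTrackedKey(rocksdb::Transaction* txn,
                                 rocksdb::ColumnFamilyHandle* cf,
                                 const rocksdb::Slice& key,
                                 const rocksdb::Slice& new_value) {
  std::string old_value;
  // Lock and track the key; NotFound is acceptable for a blind update.
  rocksdb::Status s =
      txn->GetForUpdate(rocksdb::ReadOptions(), cf, key, &old_value);
  if (!s.ok() && !s.IsNotFound()) {
    return s;
  }
  // The key is now tracked by this transaction, so ValidateSnapshot
  // can safely be skipped on the write path.
  return txn->Put(cf, key, new_value, /*assume_tracked=*/true);
}
```

The private WritePrepared knobs added to TransactionDBOptions in the same hunk are powers of two: wp_snapshot_cache_bits = 7 yields 2^7 = 128 snapshot-cache entries, and wp_commit_cache_bits = 23 yields 2^23 (about 8.4 million) commit-cache entries, which at the stated 64MB works out to 8 bytes per entry.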
// @@ -171,8 +180,8 @@ struct KeyLockInfo { struct DeadlockInfo { TransactionID m_txn_id; uint32_t m_cf_id; - std::string m_waiting_key; bool m_exclusive; + std::string m_waiting_key; }; struct DeadlockPath { diff --git a/ceph/src/rocksdb/include/rocksdb/utilities/utility_db.h b/ceph/src/rocksdb/include/rocksdb/utilities/utility_db.h index a34a63898..3008fee1a 100644 --- a/ceph/src/rocksdb/include/rocksdb/utilities/utility_db.h +++ b/ceph/src/rocksdb/include/rocksdb/utilities/utility_db.h @@ -4,12 +4,12 @@ #pragma once #ifndef ROCKSDB_LITE -#include #include +#include -#include "rocksdb/utilities/stackable_db.h" -#include "rocksdb/utilities/db_ttl.h" #include "rocksdb/db.h" +#include "rocksdb/utilities/db_ttl.h" +#include "rocksdb/utilities/stackable_db.h" namespace rocksdb { @@ -22,14 +22,12 @@ class UtilityDB { #if defined(__GNUC__) || defined(__clang__) __attribute__((deprecated)) #elif _WIN32 - __declspec(deprecated) + __declspec(deprecated) #endif - static Status OpenTtlDB(const Options& options, - const std::string& name, - StackableDB** dbptr, - int32_t ttl = 0, - bool read_only = false); + static Status + OpenTtlDB(const Options& options, const std::string& name, + StackableDB** dbptr, int32_t ttl = 0, bool read_only = false); }; -} // namespace rocksdb +} // namespace rocksdb #endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/include/rocksdb/version.h b/ceph/src/rocksdb/include/rocksdb/version.h index c24ba1d39..d72f6b649 100644 --- a/ceph/src/rocksdb/include/rocksdb/version.h +++ b/ceph/src/rocksdb/include/rocksdb/version.h @@ -4,8 +4,8 @@ // (found in the LICENSE.Apache file in the root directory). #pragma once -#define ROCKSDB_MAJOR 5 -#define ROCKSDB_MINOR 17 +#define ROCKSDB_MAJOR 6 +#define ROCKSDB_MINOR 1 #define ROCKSDB_PATCH 2 // Do not use these. We made the mistake of declaring macros starting with diff --git a/ceph/src/rocksdb/include/rocksdb/wal_filter.h b/ceph/src/rocksdb/include/rocksdb/wal_filter.h index b8be77b23..e25746dba 100644 --- a/ceph/src/rocksdb/include/rocksdb/wal_filter.h +++ b/ceph/src/rocksdb/include/rocksdb/wal_filter.h @@ -5,8 +5,8 @@ #pragma once -#include #include +#include namespace rocksdb { @@ -34,7 +34,7 @@ class WalFilter { virtual ~WalFilter() {} // Provide ColumnFamily->LogNumber map to filter - // so that filter can determine whether a log number applies to a given + // so that filter can determine whether a log number applies to a given // column family (i.e. that log hasn't been flushed to SST already for the // column family). // We also pass in name->id map as only name is known during @@ -83,8 +83,8 @@ class WalFilter { return LogRecord(batch, new_batch, batch_changed); } - // Please see the comments for LogRecord above. This function is for - // compatibility only and contains a subset of parameters. + // Please see the comments for LogRecord above. This function is for + // compatibility only and contains a subset of parameters. // New code should use the function above. 
virtual WalProcessingOption LogRecord(const WriteBatch& /*batch*/, WriteBatch* /*new_batch*/, diff --git a/ceph/src/rocksdb/include/rocksdb/write_batch.h b/ceph/src/rocksdb/include/rocksdb/write_batch.h index c40c448fd..8782d08f1 100644 --- a/ceph/src/rocksdb/include/rocksdb/write_batch.h +++ b/ceph/src/rocksdb/include/rocksdb/write_batch.h @@ -24,10 +24,10 @@ #pragma once +#include #include #include #include -#include #include "rocksdb/status.h" #include "rocksdb/write_batch_base.h" diff --git a/ceph/src/rocksdb/include/rocksdb/write_batch_base.h b/ceph/src/rocksdb/include/rocksdb/write_batch_base.h index f91332ee2..a7747a7c8 100644 --- a/ceph/src/rocksdb/include/rocksdb/write_batch_base.h +++ b/ceph/src/rocksdb/include/rocksdb/write_batch_base.h @@ -69,7 +69,7 @@ class WriteBatchBase { const SliceParts& key); virtual Status SingleDelete(const SliceParts& key); - // If the database contains mappings in the range ["begin_key", "end_key"], + // If the database contains mappings in the range ["begin_key", "end_key"), // erase them. Else do nothing. virtual Status DeleteRange(ColumnFamilyHandle* column_family, const Slice& begin_key, const Slice& end_key) = 0; diff --git a/ceph/src/rocksdb/include/rocksdb/write_buffer_manager.h b/ceph/src/rocksdb/include/rocksdb/write_buffer_manager.h index 856cf4b24..dea904c18 100644 --- a/ceph/src/rocksdb/include/rocksdb/write_buffer_manager.h +++ b/ceph/src/rocksdb/include/rocksdb/write_buffer_manager.h @@ -30,6 +30,8 @@ class WriteBufferManager { bool enabled() const { return buffer_size_ != 0; } + bool cost_to_cache() const { return cache_rep_ != nullptr; } + // Only valid if enabled() size_t memory_usage() const { return memory_used_.load(std::memory_order_relaxed); diff --git a/ceph/src/rocksdb/java/CMakeLists.txt b/ceph/src/rocksdb/java/CMakeLists.txt index 96c08b231..360951834 100644 --- a/ceph/src/rocksdb/java/CMakeLists.txt +++ b/ceph/src/rocksdb/java/CMakeLists.txt @@ -11,6 +11,9 @@ set(JNI_NATIVE_SOURCES rocksjni/compaction_filter.cc rocksjni/compaction_filter_factory.cc rocksjni/compaction_filter_factory_jnicallback.cc + rocksjni/compaction_job_info.cc + rocksjni/compaction_job_stats.cc + rocksjni/compaction_options.cc rocksjni/compaction_options_fifo.cc rocksjni/compaction_options_universal.cc rocksjni/compact_range_options.cc @@ -25,6 +28,7 @@ set(JNI_NATIVE_SOURCES rocksjni/jnicallback.cc rocksjni/loggerjnicallback.cc rocksjni/lru_cache.cc + rocksjni/memory_util.cc rocksjni/memtablejni.cc rocksjni/merge_operator.cc rocksjni/native_comparator_wrapper_test.cc @@ -32,6 +36,7 @@ set(JNI_NATIVE_SOURCES rocksjni/optimistic_transaction_options.cc rocksjni/options.cc rocksjni/options_util.cc + rocksjni/persistent_cache.cc rocksjni/ratelimiterjni.cc rocksjni/remove_emptyvalue_compactionfilterjni.cc rocksjni/restorejni.cc @@ -45,6 +50,11 @@ set(JNI_NATIVE_SOURCES rocksjni/statistics.cc rocksjni/statisticsjni.cc rocksjni/table.cc + rocksjni/table_filter.cc + rocksjni/table_filter_jnicallback.cc + rocksjni/thread_status.cc + rocksjni/trace_writer.cc + rocksjni/trace_writer_jnicallback.cc rocksjni/transaction.cc rocksjni/transaction_db.cc rocksjni/transaction_db_options.cc @@ -53,10 +63,13 @@ set(JNI_NATIVE_SOURCES rocksjni/transaction_notifier_jnicallback.cc rocksjni/transaction_options.cc rocksjni/ttl.cc + rocksjni/wal_filter.cc + rocksjni/wal_filter_jnicallback.cc rocksjni/write_batch.cc rocksjni/writebatchhandlerjnicallback.cc rocksjni/write_batch_test.cc rocksjni/write_batch_with_index.cc + rocksjni/write_buffer_manager.cc ) set(NATIVE_JAVA_CLASSES 
@@ -67,7 +80,10 @@ set(NATIVE_JAVA_CLASSES org.rocksdb.AbstractNativeReference org.rocksdb.AbstractRocksIterator org.rocksdb.AbstractSlice + org.rocksdb.AbstractTableFilter + org.rocksdb.AbstractTraceWriter org.rocksdb.AbstractTransactionNotifier + org.rocksdb.AbstractWalFilter org.rocksdb.BackupableDBOptions org.rocksdb.BackupEngine org.rocksdb.BlockBasedTableConfig @@ -78,6 +94,9 @@ set(NATIVE_JAVA_CLASSES org.rocksdb.ClockCache org.rocksdb.ColumnFamilyHandle org.rocksdb.ColumnFamilyOptions + org.rocksdb.CompactionJobInfo + org.rocksdb.CompactionJobStats + org.rocksdb.CompactionOptions org.rocksdb.CompactionOptionsFIFO org.rocksdb.CompactionOptionsUniversal org.rocksdb.CompactRangeOptions @@ -93,9 +112,11 @@ set(NATIVE_JAVA_CLASSES org.rocksdb.FlushOptions org.rocksdb.HashLinkedListMemTableConfig org.rocksdb.HashSkipListMemTableConfig + org.rocksdb.HdfsEnv org.rocksdb.IngestExternalFileOptions org.rocksdb.Logger org.rocksdb.LRUCache + org.rocksdb.MemoryUtil org.rocksdb.MemTableConfig org.rocksdb.NativeComparatorWrapper org.rocksdb.NativeLibraryLoader @@ -103,6 +124,7 @@ set(NATIVE_JAVA_CLASSES org.rocksdb.OptimisticTransactionOptions org.rocksdb.Options org.rocksdb.OptionsUtil + org.rocksdb.PersistentCache org.rocksdb.PlainTableConfig org.rocksdb.RateLimiter org.rocksdb.ReadOptions @@ -124,12 +146,15 @@ set(NATIVE_JAVA_CLASSES org.rocksdb.Statistics org.rocksdb.StringAppendOperator org.rocksdb.TableFormatConfig + org.rocksdb.ThreadStatus + org.rocksdb.TimedEnv org.rocksdb.Transaction org.rocksdb.TransactionDB org.rocksdb.TransactionDBOptions org.rocksdb.TransactionLogIterator org.rocksdb.TransactionOptions org.rocksdb.TtlDB + org.rocksdb.UInt64AddOperator org.rocksdb.VectorMemTableConfig org.rocksdb.WBWIRocksIterator org.rocksdb.WriteBatch @@ -142,6 +167,7 @@ set(NATIVE_JAVA_CLASSES org.rocksdb.SnapshotTest org.rocksdb.WriteBatchTest org.rocksdb.WriteBatchTestInternalHelper + org.rocksdb.WriteBufferManager ) include(FindJava) @@ -167,10 +193,14 @@ add_jar( src/main/java/org/rocksdb/AbstractCompactionFilter.java src/main/java/org/rocksdb/AbstractComparator.java src/main/java/org/rocksdb/AbstractImmutableNativeReference.java + src/main/java/org/rocksdb/AbstractMutableOptions.java src/main/java/org/rocksdb/AbstractNativeReference.java src/main/java/org/rocksdb/AbstractRocksIterator.java src/main/java/org/rocksdb/AbstractSlice.java + src/main/java/org/rocksdb/AbstractTableFilter.java + src/main/java/org/rocksdb/AbstractTraceWriter.java src/main/java/org/rocksdb/AbstractTransactionNotifier.java + src/main/java/org/rocksdb/AbstractWalFilter.java src/main/java/org/rocksdb/AbstractWriteBatch.java src/main/java/org/rocksdb/AccessHint.java src/main/java/org/rocksdb/AdvancedColumnFamilyOptionsInterface.java @@ -189,11 +219,16 @@ add_jar( src/main/java/org/rocksdb/ClockCache.java src/main/java/org/rocksdb/ColumnFamilyDescriptor.java src/main/java/org/rocksdb/ColumnFamilyHandle.java + src/main/java/org/rocksdb/ColumnFamilyMetaData.java src/main/java/org/rocksdb/ColumnFamilyOptionsInterface.java src/main/java/org/rocksdb/ColumnFamilyOptions.java + src/main/java/org/rocksdb/CompactionJobInfo.java + src/main/java/org/rocksdb/CompactionJobStats.java + src/main/java/org/rocksdb/CompactionOptions.java src/main/java/org/rocksdb/CompactionOptionsFIFO.java src/main/java/org/rocksdb/CompactionOptionsUniversal.java src/main/java/org/rocksdb/CompactionPriority.java + src/main/java/org/rocksdb/CompactionReason.java src/main/java/org/rocksdb/CompactRangeOptions.java 
src/main/java/org/rocksdb/CompactionStopStyle.java src/main/java/org/rocksdb/CompactionStyle.java @@ -202,6 +237,7 @@ add_jar( src/main/java/org/rocksdb/ComparatorType.java src/main/java/org/rocksdb/CompressionOptions.java src/main/java/org/rocksdb/CompressionType.java + src/main/java/org/rocksdb/DataBlockIndexType.java src/main/java/org/rocksdb/DBOptionsInterface.java src/main/java/org/rocksdb/DBOptions.java src/main/java/org/rocksdb/DbPath.java @@ -215,24 +251,39 @@ add_jar( src/main/java/org/rocksdb/FlushOptions.java src/main/java/org/rocksdb/HashLinkedListMemTableConfig.java src/main/java/org/rocksdb/HashSkipListMemTableConfig.java + src/main/java/org/rocksdb/HdfsEnv.java src/main/java/org/rocksdb/HistogramData.java src/main/java/org/rocksdb/HistogramType.java src/main/java/org/rocksdb/IndexType.java src/main/java/org/rocksdb/InfoLogLevel.java src/main/java/org/rocksdb/IngestExternalFileOptions.java + src/main/java/org/rocksdb/LevelMetaData.java + src/main/java/org/rocksdb/LiveFileMetaData.java + src/main/java/org/rocksdb/LogFile.java src/main/java/org/rocksdb/Logger.java src/main/java/org/rocksdb/LRUCache.java + src/main/java/org/rocksdb/MemoryUsageType.java + src/main/java/org/rocksdb/MemoryUtil.java src/main/java/org/rocksdb/MemTableConfig.java src/main/java/org/rocksdb/MergeOperator.java - src/main/java/org/rocksdb/MutableColumnFamilyOptionsInterface.java src/main/java/org/rocksdb/MutableColumnFamilyOptions.java + src/main/java/org/rocksdb/MutableColumnFamilyOptionsInterface.java + src/main/java/org/rocksdb/MutableDBOptions.java + src/main/java/org/rocksdb/MutableDBOptionsInterface.java + src/main/java/org/rocksdb/MutableOptionKey.java + src/main/java/org/rocksdb/MutableOptionValue.java src/main/java/org/rocksdb/NativeComparatorWrapper.java src/main/java/org/rocksdb/NativeLibraryLoader.java + src/main/java/org/rocksdb/OperationStage.java + src/main/java/org/rocksdb/OperationType.java src/main/java/org/rocksdb/OptimisticTransactionDB.java src/main/java/org/rocksdb/OptimisticTransactionOptions.java src/main/java/org/rocksdb/Options.java src/main/java/org/rocksdb/OptionsUtil.java + src/main/java/org/rocksdb/PersistentCache.java src/main/java/org/rocksdb/PlainTableConfig.java + src/main/java/org/rocksdb/Priority.java + src/main/java/org/rocksdb/Range.java src/main/java/org/rocksdb/RateLimiter.java src/main/java/org/rocksdb/RateLimiterMode.java src/main/java/org/rocksdb/ReadOptions.java @@ -248,11 +299,14 @@ add_jar( src/main/java/org/rocksdb/RocksMemEnv.java src/main/java/org/rocksdb/RocksMutableObject.java src/main/java/org/rocksdb/RocksObject.java + src/main/java/org/rocksdb/SizeApproximationFlag.java src/main/java/org/rocksdb/SkipListMemTableConfig.java src/main/java/org/rocksdb/Slice.java src/main/java/org/rocksdb/Snapshot.java src/main/java/org/rocksdb/SstFileManager.java + src/main/java/org/rocksdb/SstFileMetaData.java src/main/java/org/rocksdb/SstFileWriter.java + src/main/java/org/rocksdb/StateType.java src/main/java/org/rocksdb/StatisticsCollectorCallback.java src/main/java/org/rocksdb/StatisticsCollector.java src/main/java/org/rocksdb/Statistics.java @@ -260,8 +314,15 @@ add_jar( src/main/java/org/rocksdb/StatsLevel.java src/main/java/org/rocksdb/Status.java src/main/java/org/rocksdb/StringAppendOperator.java + src/main/java/org/rocksdb/TableFilter.java + src/main/java/org/rocksdb/TableProperties.java src/main/java/org/rocksdb/TableFormatConfig.java + src/main/java/org/rocksdb/ThreadType.java + src/main/java/org/rocksdb/ThreadStatus.java 
src/main/java/org/rocksdb/TickerType.java + src/main/java/org/rocksdb/TimedEnv.java + src/main/java/org/rocksdb/TraceOptions.java + src/main/java/org/rocksdb/TraceWriter.java src/main/java/org/rocksdb/TransactionalDB.java src/main/java/org/rocksdb/TransactionalOptions.java src/main/java/org/rocksdb/TransactionDB.java @@ -272,12 +333,16 @@ add_jar( src/main/java/org/rocksdb/TtlDB.java src/main/java/org/rocksdb/TxnDBWritePolicy.java src/main/java/org/rocksdb/VectorMemTableConfig.java + src/main/java/org/rocksdb/WalFileType.java + src/main/java/org/rocksdb/WalFilter.java + src/main/java/org/rocksdb/WalProcessingOption.java src/main/java/org/rocksdb/WALRecoveryMode.java src/main/java/org/rocksdb/WBWIRocksIterator.java src/main/java/org/rocksdb/WriteBatchInterface.java src/main/java/org/rocksdb/WriteBatch.java src/main/java/org/rocksdb/WriteBatchWithIndex.java src/main/java/org/rocksdb/WriteOptions.java + src/main/java/org/rocksdb/WriteBufferManager.java src/main/java/org/rocksdb/util/BytewiseComparator.java src/main/java/org/rocksdb/util/DirectBytewiseComparator.java src/main/java/org/rocksdb/util/Environment.java @@ -290,6 +355,7 @@ add_jar( src/test/java/org/rocksdb/RocksDBExceptionTest.java src/test/java/org/rocksdb/RocksMemoryResource.java src/test/java/org/rocksdb/SnapshotTest.java + src/main/java/org/rocksdb/UInt64AddOperator.java src/test/java/org/rocksdb/WriteBatchTest.java src/test/java/org/rocksdb/util/CapturingWriteBatchHandler.java src/test/java/org/rocksdb/util/WriteBatchGetter.java diff --git a/ceph/src/rocksdb/java/Makefile b/ceph/src/rocksdb/java/Makefile index f58fff06e..efc9d2b4e 100644 --- a/ceph/src/rocksdb/java/Makefile +++ b/ceph/src/rocksdb/java/Makefile @@ -1,7 +1,10 @@ NATIVE_JAVA_CLASSES = org.rocksdb.AbstractCompactionFilter\ org.rocksdb.AbstractCompactionFilterFactory\ org.rocksdb.AbstractSlice\ + org.rocksdb.AbstractTableFilter\ + org.rocksdb.AbstractTraceWriter\ org.rocksdb.AbstractTransactionNotifier\ + org.rocksdb.AbstractWalFilter\ org.rocksdb.BackupEngine\ org.rocksdb.BackupableDBOptions\ org.rocksdb.BlockBasedTableConfig\ @@ -12,6 +15,9 @@ NATIVE_JAVA_CLASSES = org.rocksdb.AbstractCompactionFilter\ org.rocksdb.CassandraValueMergeOperator\ org.rocksdb.ColumnFamilyHandle\ org.rocksdb.ColumnFamilyOptions\ + org.rocksdb.CompactionJobInfo\ + org.rocksdb.CompactionJobStats\ + org.rocksdb.CompactionOptions\ org.rocksdb.CompactionOptionsFIFO\ org.rocksdb.CompactionOptionsUniversal\ org.rocksdb.CompactRangeOptions\ @@ -28,14 +34,18 @@ NATIVE_JAVA_CLASSES = org.rocksdb.AbstractCompactionFilter\ org.rocksdb.IngestExternalFileOptions\ org.rocksdb.HashLinkedListMemTableConfig\ org.rocksdb.HashSkipListMemTableConfig\ + org.rocksdb.HdfsEnv\ org.rocksdb.Logger\ org.rocksdb.LRUCache\ + org.rocksdb.MemoryUsageType\ + org.rocksdb.MemoryUtil\ org.rocksdb.MergeOperator\ org.rocksdb.NativeComparatorWrapper\ org.rocksdb.OptimisticTransactionDB\ org.rocksdb.OptimisticTransactionOptions\ org.rocksdb.Options\ org.rocksdb.OptionsUtil\ + org.rocksdb.PersistentCache\ org.rocksdb.PlainTableConfig\ org.rocksdb.RateLimiter\ org.rocksdb.ReadOptions\ @@ -51,6 +61,8 @@ NATIVE_JAVA_CLASSES = org.rocksdb.AbstractCompactionFilter\ org.rocksdb.SstFileManager\ org.rocksdb.SstFileWriter\ org.rocksdb.Statistics\ + org.rocksdb.ThreadStatus\ + org.rocksdb.TimedEnv\ org.rocksdb.Transaction\ org.rocksdb.TransactionDB\ org.rocksdb.TransactionDBOptions\ @@ -60,10 +72,12 @@ NATIVE_JAVA_CLASSES = org.rocksdb.AbstractCompactionFilter\ org.rocksdb.VectorMemTableConfig\ org.rocksdb.Snapshot\ 
org.rocksdb.StringAppendOperator\ + org.rocksdb.UInt64AddOperator\ org.rocksdb.WriteBatch\ org.rocksdb.WriteBatch.Handler\ org.rocksdb.WriteOptions\ org.rocksdb.WriteBatchWithIndex\ + org.rocksdb.WriteBufferManager\ org.rocksdb.WBWIRocksIterator NATIVE_JAVA_TEST_CLASSES = org.rocksdb.RocksDBExceptionTest\ @@ -90,7 +104,10 @@ JAVA_TESTS = org.rocksdb.BackupableDBOptionsTest\ org.rocksdb.ClockCacheTest\ org.rocksdb.ColumnFamilyOptionsTest\ org.rocksdb.ColumnFamilyTest\ - org.rocksdb.CompactionFilterFactoryTest\ + org.rocksdb.CompactionFilterFactoryTest\ + org.rocksdb.CompactionJobInfoTest\ + org.rocksdb.CompactionJobStatsTest\ + org.rocksdb.CompactionOptionsTest\ org.rocksdb.CompactionOptionsFIFOTest\ org.rocksdb.CompactionOptionsUniversalTest\ org.rocksdb.CompactionPriorityTest\ @@ -103,6 +120,7 @@ JAVA_TESTS = org.rocksdb.BackupableDBOptionsTest\ org.rocksdb.DirectComparatorTest\ org.rocksdb.DirectSliceTest\ org.rocksdb.EnvOptionsTest\ + org.rocksdb.HdfsEnvTest\ org.rocksdb.IngestExternalFileOptionsTest\ org.rocksdb.util.EnvironmentTest\ org.rocksdb.FilterTest\ @@ -111,10 +129,12 @@ JAVA_TESTS = org.rocksdb.BackupableDBOptionsTest\ org.rocksdb.KeyMayExistTest\ org.rocksdb.LoggerTest\ org.rocksdb.LRUCacheTest\ + org.rocksdb.MemoryUtilTest\ org.rocksdb.MemTableTest\ org.rocksdb.MergeTest\ org.rocksdb.MixedOptionsTest\ org.rocksdb.MutableColumnFamilyOptionsTest\ + org.rocksdb.MutableDBOptionsTest\ org.rocksdb.NativeComparatorWrapperTest\ org.rocksdb.NativeLibraryLoaderTest\ org.rocksdb.OptimisticTransactionTest\ @@ -128,7 +148,7 @@ JAVA_TESTS = org.rocksdb.BackupableDBOptionsTest\ org.rocksdb.ReadOptionsTest\ org.rocksdb.RocksDBTest\ org.rocksdb.RocksDBExceptionTest\ - org.rocksdb.RocksEnvTest\ + org.rocksdb.DefaultEnvTest\ org.rocksdb.RocksIteratorTest\ org.rocksdb.RocksMemEnvTest\ org.rocksdb.util.SizeUnitTest\ @@ -136,6 +156,8 @@ JAVA_TESTS = org.rocksdb.BackupableDBOptionsTest\ org.rocksdb.SnapshotTest\ org.rocksdb.SstFileManagerTest\ org.rocksdb.SstFileWriterTest\ + org.rocksdb.TableFilterTest\ + org.rocksdb.TimedEnvTest\ org.rocksdb.TransactionTest\ org.rocksdb.TransactionDBTest\ org.rocksdb.TransactionOptionsTest\ @@ -144,6 +166,7 @@ JAVA_TESTS = org.rocksdb.BackupableDBOptionsTest\ org.rocksdb.TtlDBTest\ org.rocksdb.StatisticsTest\ org.rocksdb.StatisticsCollectorTest\ + org.rocksdb.WalFilterTest\ org.rocksdb.WALRecoveryModeTest\ org.rocksdb.WriteBatchHandlerTest\ org.rocksdb.WriteBatchTest\ diff --git a/ceph/src/rocksdb/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java b/ceph/src/rocksdb/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java index db69e58cc..67f6a5cc0 100644 --- a/ceph/src/rocksdb/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java +++ b/ceph/src/rocksdb/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java @@ -493,7 +493,7 @@ public class DbBenchmark { options.setCreateIfMissing(false); } if (useMemenv_) { - options.setEnv(new RocksMemEnv()); + options.setEnv(new RocksMemEnv(Env.getDefault())); } switch (memtable_) { case "skip_list": diff --git a/ceph/src/rocksdb/java/crossbuild/docker-build-linux-centos.sh b/ceph/src/rocksdb/java/crossbuild/docker-build-linux-centos.sh index 6353b9ea2..d894b14a2 100755 --- a/ceph/src/rocksdb/java/crossbuild/docker-build-linux-centos.sh +++ b/ceph/src/rocksdb/java/crossbuild/docker-build-linux-centos.sh @@ -1,6 +1,7 @@ #!/usr/bin/env bash set -e +#set -x rm -rf /rocksdb-local cp -r /rocksdb-host /rocksdb-local @@ -8,11 +9,19 @@ cd /rocksdb-local # Use scl devtoolset if 
available (i.e. CentOS <7) if hash scl 2>/dev/null; then - scl enable devtoolset-2 'make jclean clean' - scl enable devtoolset-2 'PORTABLE=1 make -j8 rocksdbjavastatic' + if scl --list | grep -q 'devtoolset-7'; then + scl enable devtoolset-7 'make jclean clean' + scl enable devtoolset-7 'PORTABLE=1 make -j6 rocksdbjavastatic' + elif scl --list | grep -q 'devtoolset-2'; then + scl enable devtoolset-2 'make jclean clean' + scl enable devtoolset-2 'PORTABLE=1 make -j6 rocksdbjavastatic' + else + echo "Could not find devtoolset" + exit 1; + fi else make jclean clean - PORTABLE=1 make -j8 rocksdbjavastatic + PORTABLE=1 make -j6 rocksdbjavastatic fi cp java/target/librocksdbjni-linux*.so java/target/rocksdbjni-*-linux*.jar /rocksdb-host/java/target diff --git a/ceph/src/rocksdb/java/rocksjni/compaction_filter_factory.cc b/ceph/src/rocksdb/java/rocksjni/compaction_filter_factory.cc index c2fb1b0a1..2ef0a7746 100644 --- a/ceph/src/rocksdb/java/rocksjni/compaction_filter_factory.cc +++ b/ceph/src/rocksdb/java/rocksjni/compaction_filter_factory.cc @@ -31,9 +31,8 @@ jlong Java_org_rocksdb_AbstractCompactionFilterFactory_createNewCompactionFilter * Signature: (J)V */ void Java_org_rocksdb_AbstractCompactionFilterFactory_disposeInternal( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* ptr_sptr_cff = reinterpret_cast< std::shared_ptr*>(jhandle); delete ptr_sptr_cff; - // @lint-ignore TXT4 T25377293 Grandfathered in } diff --git a/ceph/src/rocksdb/java/rocksjni/compaction_job_info.cc b/ceph/src/rocksdb/java/rocksjni/compaction_job_info.cc new file mode 100644 index 000000000..6af6efcb8 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/compaction_job_info.cc @@ -0,0 +1,222 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ for +// rocksdb::CompactionJobInfo. 
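All of the new JNI translation units added below follow the same handle-passing convention: Java stores the address of a heap-allocated C++ object in a jlong field, and every native method casts that handle back before use. A distilled sketch of the pattern (the free-function names here are illustrative, not part of the diff):

```cpp
#include <jni.h>

#include "rocksdb/listener.h"

// Construct the native object and hand its address to Java as a jlong.
jlong NewNativeHandle() {
  auto* info = new rocksdb::CompactionJobInfo();
  return reinterpret_cast<jlong>(info);
}

// Cast the handle back on each call; disposeInternal deletes it exactly once.
void DisposeNativeHandle(jlong jhandle) {
  delete reinterpret_cast<rocksdb::CompactionJobInfo*>(jhandle);
}
```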
+ +#include + +#include "include/org_rocksdb_CompactionJobInfo.h" +#include "rocksdb/listener.h" +#include "rocksjni/portal.h" + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: newCompactionJobInfo + * Signature: ()J + */ +jlong Java_org_rocksdb_CompactionJobInfo_newCompactionJobInfo( + JNIEnv*, jclass) { + auto* compact_job_info = new rocksdb::CompactionJobInfo(); + return reinterpret_cast(compact_job_info); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_CompactionJobInfo_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + delete compact_job_info; +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: columnFamilyName + * Signature: (J)[B + */ +jbyteArray Java_org_rocksdb_CompactionJobInfo_columnFamilyName( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return rocksdb::JniUtil::copyBytes( + env, compact_job_info->cf_name); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: status + * Signature: (J)Lorg/rocksdb/Status; + */ +jobject Java_org_rocksdb_CompactionJobInfo_status( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return rocksdb::StatusJni::construct( + env, compact_job_info->status); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: threadId + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobInfo_threadId( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return static_cast(compact_job_info->thread_id); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: jobId + * Signature: (J)I + */ +jint Java_org_rocksdb_CompactionJobInfo_jobId( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return static_cast(compact_job_info->job_id); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: baseInputLevel + * Signature: (J)I + */ +jint Java_org_rocksdb_CompactionJobInfo_baseInputLevel( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return static_cast(compact_job_info->base_input_level); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: outputLevel + * Signature: (J)I + */ +jint Java_org_rocksdb_CompactionJobInfo_outputLevel( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return static_cast(compact_job_info->output_level); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: inputFiles + * Signature: (J)[Ljava/lang/String; + */ +jobjectArray Java_org_rocksdb_CompactionJobInfo_inputFiles( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return rocksdb::JniUtil::toJavaStrings( + env, &compact_job_info->input_files); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: outputFiles + * Signature: (J)[Ljava/lang/String; + */ +jobjectArray Java_org_rocksdb_CompactionJobInfo_outputFiles( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return rocksdb::JniUtil::toJavaStrings( + env, &compact_job_info->output_files); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: tableProperties + * Signature: (J)Ljava/util/Map; + */ +jobject Java_org_rocksdb_CompactionJobInfo_tableProperties( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + auto* 
map = &compact_job_info->table_properties; + + jobject jhash_map = rocksdb::HashMapJni::construct( + env, static_cast(map->size())); + if (jhash_map == nullptr) { + // exception occurred + return nullptr; + } + + const rocksdb::HashMapJni::FnMapKV, jobject, jobject> fn_map_kv = + [env](const std::pair>& kv) { + jstring jkey = rocksdb::JniUtil::toJavaString(env, &(kv.first), false); + if (env->ExceptionCheck()) { + // an error occurred + return std::unique_ptr>(nullptr); + } + + jobject jtable_properties = rocksdb::TablePropertiesJni::fromCppTableProperties( + env, *(kv.second.get())); + if (env->ExceptionCheck()) { + // an error occurred + env->DeleteLocalRef(jkey); + return std::unique_ptr>(nullptr); + } + + return std::unique_ptr>( + new std::pair(static_cast(jkey), jtable_properties)); + }; + + if (!rocksdb::HashMapJni::putAll(env, jhash_map, map->begin(), map->end(), fn_map_kv)) { + // exception occurred + return nullptr; + } + + return jhash_map; +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: compactionReason + * Signature: (J)B + */ +jbyte Java_org_rocksdb_CompactionJobInfo_compactionReason( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return rocksdb::CompactionReasonJni::toJavaCompactionReason( + compact_job_info->compaction_reason); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: compression + * Signature: (J)B + */ +jbyte Java_org_rocksdb_CompactionJobInfo_compression( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + return rocksdb::CompressionTypeJni::toJavaCompressionType( + compact_job_info->compression); +} + +/* + * Class: org_rocksdb_CompactionJobInfo + * Method: stats + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobInfo_stats( + JNIEnv *, jclass, jlong jhandle) { + auto* compact_job_info = + reinterpret_cast(jhandle); + auto* stats = new rocksdb::CompactionJobStats(); + stats->Add(compact_job_info->stats); + return reinterpret_cast(stats); +} diff --git a/ceph/src/rocksdb/java/rocksjni/compaction_job_stats.cc b/ceph/src/rocksdb/java/rocksjni/compaction_job_stats.cc new file mode 100644 index 000000000..7d13dd12f --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/compaction_job_stats.cc @@ -0,0 +1,361 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ for +// rocksdb::CompactionJobStats. 
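On the native side these stats normally reach an application through an EventListener rather than direct construction; a hedged sketch of that consuming path (the listener would be registered via Options::listeners before opening the DB, and the printf is a placeholder for real logging):

```cpp
#include <cstdio>

#include "rocksdb/listener.h"

class CompactionLogger : public rocksdb::EventListener {
 public:
  void OnCompactionCompleted(rocksdb::DB* /*db*/,
                             const rocksdb::CompactionJobInfo& info) override {
    // The same fields the JNI accessors in this file expose as jlong getters.
    std::printf("compaction job %d: L%d -> L%d, %llu input records\n",
                info.job_id, info.base_input_level, info.output_level,
                static_cast<unsigned long long>(info.stats.num_input_records));
  }
};
```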
+ +#include + +#include "include/org_rocksdb_CompactionJobStats.h" +#include "rocksdb/compaction_job_stats.h" +#include "rocksjni/portal.h" + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: newCompactionJobStats + * Signature: ()J + */ +jlong Java_org_rocksdb_CompactionJobStats_newCompactionJobStats( + JNIEnv*, jclass) { + auto* compact_job_stats = new rocksdb::CompactionJobStats(); + return reinterpret_cast(compact_job_stats); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_CompactionJobStats_disposeInternal( + JNIEnv *, jobject, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + delete compact_job_stats; +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: reset + * Signature: (J)V + */ +void Java_org_rocksdb_CompactionJobStats_reset( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + compact_job_stats->Reset(); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: add + * Signature: (JJ)V + */ +void Java_org_rocksdb_CompactionJobStats_add( + JNIEnv*, jclass, jlong jhandle, jlong jother_handle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + auto* other_compact_job_stats = + reinterpret_cast(jother_handle); + compact_job_stats->Add(*other_compact_job_stats); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: elapsedMicros + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_elapsedMicros( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast(compact_job_stats->elapsed_micros); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numInputRecords + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numInputRecords( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast(compact_job_stats->num_input_records); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numInputFiles + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numInputFiles( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast(compact_job_stats->num_input_files); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numInputFilesAtOutputLevel + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numInputFilesAtOutputLevel( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_input_files_at_output_level); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numOutputRecords + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numOutputRecords( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_output_records); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numOutputFiles + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numOutputFiles( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_output_files); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: isManualCompaction + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_CompactionJobStats_isManualCompaction( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + 
reinterpret_cast(jhandle); + if (compact_job_stats->is_manual_compaction) { + return JNI_TRUE; + } else { + return JNI_FALSE; + } +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: totalInputBytes + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_totalInputBytes( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->total_input_bytes); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: totalOutputBytes + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_totalOutputBytes( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->total_output_bytes); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numRecordsReplaced + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numRecordsReplaced( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_records_replaced); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: totalInputRawKeyBytes + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_totalInputRawKeyBytes( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->total_input_raw_key_bytes); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: totalInputRawValueBytes + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_totalInputRawValueBytes( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->total_input_raw_value_bytes); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numInputDeletionRecords + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numInputDeletionRecords( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_input_deletion_records); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numExpiredDeletionRecords + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numExpiredDeletionRecords( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_expired_deletion_records); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numCorruptKeys + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numCorruptKeys( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_corrupt_keys); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: fileWriteNanos + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_fileWriteNanos( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->file_write_nanos); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: fileRangeSyncNanos + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_fileRangeSyncNanos( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->file_range_sync_nanos); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: fileFsyncNanos + * Signature: (J)J + */ 
+jlong Java_org_rocksdb_CompactionJobStats_fileFsyncNanos( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->file_fsync_nanos); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: filePrepareWriteNanos + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_filePrepareWriteNanos( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->file_prepare_write_nanos); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: smallestOutputKeyPrefix + * Signature: (J)[B + */ +jbyteArray Java_org_rocksdb_CompactionJobStats_smallestOutputKeyPrefix( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return rocksdb::JniUtil::copyBytes(env, + compact_job_stats->smallest_output_key_prefix); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: largestOutputKeyPrefix + * Signature: (J)[B + */ +jbyteArray Java_org_rocksdb_CompactionJobStats_largestOutputKeyPrefix( + JNIEnv* env, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return rocksdb::JniUtil::copyBytes(env, + compact_job_stats->largest_output_key_prefix); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numSingleDelFallthru + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numSingleDelFallthru( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_single_del_fallthru); +} + +/* + * Class: org_rocksdb_CompactionJobStats + * Method: numSingleDelMismatch + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionJobStats_numSingleDelMismatch( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_job_stats = + reinterpret_cast(jhandle); + return static_cast( + compact_job_stats->num_single_del_mismatch); +} \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/compaction_options.cc b/ceph/src/rocksdb/java/rocksjni/compaction_options.cc new file mode 100644 index 000000000..6aaabea73 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/compaction_options.cc @@ -0,0 +1,116 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ for +// rocksdb::CompactionOptions. 
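rocksdb::CompactionOptions (the compression, output_file_size_limit, and max_subcompactions fields mirrored by the setters below) is the options struct consumed by DB::CompactFiles. A minimal sketch, assuming db and the input file names were obtained elsewhere (e.g. from GetColumnFamilyMetaData), with illustrative values:

```cpp
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

rocksdb::Status CompactChosenFiles(rocksdb::DB* db,
                                   const std::vector<std::string>& input_files,
                                   int output_level) {
  rocksdb::CompactionOptions copts;
  copts.compression = rocksdb::kSnappyCompression;  // illustrative choice
  copts.output_file_size_limit = 64u << 20;         // cap outputs at 64MB
  copts.max_subcompactions = 2;
  return db->CompactFiles(copts, input_files, output_level);
}
```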
+ +#include + +#include "include/org_rocksdb_CompactionOptions.h" +#include "rocksdb/options.h" +#include "rocksjni/portal.h" + + +/* + * Class: org_rocksdb_CompactionOptions + * Method: newCompactionOptions + * Signature: ()J + */ +jlong Java_org_rocksdb_CompactionOptions_newCompactionOptions( + JNIEnv*, jclass) { + auto* compact_opts = new rocksdb::CompactionOptions(); + return reinterpret_cast(compact_opts); +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_CompactionOptions_disposeInternal( + JNIEnv *, jobject, jlong jhandle) { + auto* compact_opts = + reinterpret_cast(jhandle); + delete compact_opts; +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: compression + * Signature: (J)B + */ +jbyte Java_org_rocksdb_CompactionOptions_compression( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_opts = + reinterpret_cast(jhandle); + return rocksdb::CompressionTypeJni::toJavaCompressionType( + compact_opts->compression); +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: setCompression + * Signature: (JB)V + */ +void Java_org_rocksdb_CompactionOptions_setCompression( + JNIEnv*, jclass, jlong jhandle, jbyte jcompression_type_value) { + auto* compact_opts = + reinterpret_cast(jhandle); + compact_opts->compression = + rocksdb::CompressionTypeJni::toCppCompressionType( + jcompression_type_value); +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: outputFileSizeLimit + * Signature: (J)J + */ +jlong Java_org_rocksdb_CompactionOptions_outputFileSizeLimit( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_opts = + reinterpret_cast(jhandle); + return static_cast( + compact_opts->output_file_size_limit); +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: setOutputFileSizeLimit + * Signature: (JJ)V + */ +void Java_org_rocksdb_CompactionOptions_setOutputFileSizeLimit( + JNIEnv*, jclass, jlong jhandle, jlong joutput_file_size_limit) { + auto* compact_opts = + reinterpret_cast(jhandle); + compact_opts->output_file_size_limit = + static_cast(joutput_file_size_limit); +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: maxSubcompactions + * Signature: (J)I + */ +jint Java_org_rocksdb_CompactionOptions_maxSubcompactions( + JNIEnv*, jclass, jlong jhandle) { + auto* compact_opts = + reinterpret_cast(jhandle); + return static_cast( + compact_opts->max_subcompactions); +} + +/* + * Class: org_rocksdb_CompactionOptions + * Method: setMaxSubcompactions + * Signature: (JI)V + */ +void Java_org_rocksdb_CompactionOptions_setMaxSubcompactions( + JNIEnv*, jclass, jlong jhandle, jint jmax_subcompactions) { + auto* compact_opts = + reinterpret_cast(jhandle); + compact_opts->max_subcompactions = + static_cast(jmax_subcompactions); +} \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/compaction_options_fifo.cc b/ceph/src/rocksdb/java/rocksjni/compaction_options_fifo.cc index 95bbfc621..b7c445fd6 100644 --- a/ceph/src/rocksdb/java/rocksjni/compaction_options_fifo.cc +++ b/ceph/src/rocksdb/java/rocksjni/compaction_options_fifo.cc @@ -17,7 +17,7 @@ * Signature: ()J */ jlong Java_org_rocksdb_CompactionOptionsFIFO_newCompactionOptionsFIFO( - JNIEnv* /*env*/, jclass /*jcls*/) { + JNIEnv*, jclass) { const auto* opt = new rocksdb::CompactionOptionsFIFO(); return reinterpret_cast(opt); } @@ -28,8 +28,7 @@ jlong Java_org_rocksdb_CompactionOptionsFIFO_newCompactionOptionsFIFO( * Signature: (JJ)V */ void Java_org_rocksdb_CompactionOptionsFIFO_setMaxTableFilesSize( - JNIEnv* 
/*env*/, jobject /*jobj*/, jlong jhandle, - jlong jmax_table_files_size) { + JNIEnv*, jobject, jlong jhandle, jlong jmax_table_files_size) { auto* opt = reinterpret_cast(jhandle); opt->max_table_files_size = static_cast(jmax_table_files_size); } @@ -39,20 +38,40 @@ void Java_org_rocksdb_CompactionOptionsFIFO_setMaxTableFilesSize( * Method: maxTableFilesSize * Signature: (J)J */ -jlong Java_org_rocksdb_CompactionOptionsFIFO_maxTableFilesSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_CompactionOptionsFIFO_maxTableFilesSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->max_table_files_size); } +/* + * Class: org_rocksdb_CompactionOptionsFIFO + * Method: setAllowCompaction + * Signature: (JZ)V + */ +void Java_org_rocksdb_CompactionOptionsFIFO_setAllowCompaction( + JNIEnv*, jobject, jlong jhandle, jboolean allow_compaction) { + auto* opt = reinterpret_cast(jhandle); + opt->allow_compaction = static_cast(allow_compaction); +} + +/* + * Class: org_rocksdb_CompactionOptionsFIFO + * Method: allowCompaction + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_CompactionOptionsFIFO_allowCompaction( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->allow_compaction); +} + /* * Class: org_rocksdb_CompactionOptionsFIFO * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_CompactionOptionsFIFO_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_CompactionOptionsFIFO_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { delete reinterpret_cast(jhandle); } diff --git a/ceph/src/rocksdb/java/rocksjni/compaction_options_universal.cc b/ceph/src/rocksdb/java/rocksjni/compaction_options_universal.cc index da31bc688..7ca519885 100644 --- a/ceph/src/rocksdb/java/rocksjni/compaction_options_universal.cc +++ b/ceph/src/rocksdb/java/rocksjni/compaction_options_universal.cc @@ -18,7 +18,7 @@ * Signature: ()J */ jlong Java_org_rocksdb_CompactionOptionsUniversal_newCompactionOptionsUniversal( - JNIEnv* /*env*/, jclass /*jcls*/) { + JNIEnv*, jclass) { const auto* opt = new rocksdb::CompactionOptionsUniversal(); return reinterpret_cast(opt); } @@ -29,7 +29,7 @@ jlong Java_org_rocksdb_CompactionOptionsUniversal_newCompactionOptionsUniversal( * Signature: (JI)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setSizeRatio( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jsize_ratio) { + JNIEnv*, jobject, jlong jhandle, jint jsize_ratio) { auto* opt = reinterpret_cast(jhandle); opt->size_ratio = static_cast(jsize_ratio); } @@ -39,9 +39,8 @@ void Java_org_rocksdb_CompactionOptionsUniversal_setSizeRatio( * Method: sizeRatio * Signature: (J)I */ -jint Java_org_rocksdb_CompactionOptionsUniversal_sizeRatio(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompactionOptionsUniversal_sizeRatio( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->size_ratio); } @@ -52,7 +51,7 @@ jint Java_org_rocksdb_CompactionOptionsUniversal_sizeRatio(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setMinMergeWidth( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jmin_merge_width) { + JNIEnv*, jobject, jlong jhandle, jint jmin_merge_width) { auto* opt = reinterpret_cast(jhandle); opt->min_merge_width = static_cast(jmin_merge_width); } @@ -62,9 +61,8 @@ void 
Java_org_rocksdb_CompactionOptionsUniversal_setMinMergeWidth( * Method: minMergeWidth * Signature: (J)I */ -jint Java_org_rocksdb_CompactionOptionsUniversal_minMergeWidth(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompactionOptionsUniversal_minMergeWidth( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->min_merge_width); } @@ -75,7 +73,7 @@ jint Java_org_rocksdb_CompactionOptionsUniversal_minMergeWidth(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setMaxMergeWidth( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jmax_merge_width) { + JNIEnv*, jobject, jlong jhandle, jint jmax_merge_width) { auto* opt = reinterpret_cast(jhandle); opt->max_merge_width = static_cast(jmax_merge_width); } @@ -85,9 +83,8 @@ void Java_org_rocksdb_CompactionOptionsUniversal_setMaxMergeWidth( * Method: maxMergeWidth * Signature: (J)I */ -jint Java_org_rocksdb_CompactionOptionsUniversal_maxMergeWidth(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompactionOptionsUniversal_maxMergeWidth( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->max_merge_width); } @@ -98,8 +95,7 @@ jint Java_org_rocksdb_CompactionOptionsUniversal_maxMergeWidth(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setMaxSizeAmplificationPercent( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jmax_size_amplification_percent) { + JNIEnv*, jobject, jlong jhandle, jint jmax_size_amplification_percent) { auto* opt = reinterpret_cast(jhandle); opt->max_size_amplification_percent = static_cast(jmax_size_amplification_percent); @@ -111,7 +107,7 @@ void Java_org_rocksdb_CompactionOptionsUniversal_setMaxSizeAmplificationPercent( * Signature: (J)I */ jint Java_org_rocksdb_CompactionOptionsUniversal_maxSizeAmplificationPercent( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->max_size_amplification_percent); } @@ -122,7 +118,7 @@ jint Java_org_rocksdb_CompactionOptionsUniversal_maxSizeAmplificationPercent( * Signature: (JI)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setCompressionSizePercent( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint jcompression_size_percent) { auto* opt = reinterpret_cast(jhandle); opt->compression_size_percent = @@ -135,7 +131,7 @@ void Java_org_rocksdb_CompactionOptionsUniversal_setCompressionSizePercent( * Signature: (J)I */ jint Java_org_rocksdb_CompactionOptionsUniversal_compressionSizePercent( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->compression_size_percent); } @@ -146,7 +142,7 @@ jint Java_org_rocksdb_CompactionOptionsUniversal_compressionSizePercent( * Signature: (JB)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setStopStyle( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jbyte jstop_style_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jstop_style_value) { auto* opt = reinterpret_cast(jhandle); opt->stop_style = rocksdb::CompactionStopStyleJni::toCppCompactionStopStyle( jstop_style_value); @@ -157,9 +153,8 @@ void Java_org_rocksdb_CompactionOptionsUniversal_setStopStyle( * Method: stopStyle * Signature: (J)B */ -jbyte 
Java_org_rocksdb_CompactionOptionsUniversal_stopStyle(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_CompactionOptionsUniversal_stopStyle( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return rocksdb::CompactionStopStyleJni::toJavaCompactionStopStyle( opt->stop_style); @@ -171,8 +166,7 @@ jbyte Java_org_rocksdb_CompactionOptionsUniversal_stopStyle(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_CompactionOptionsUniversal_setAllowTrivialMove( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jallow_trivial_move) { + JNIEnv*, jobject, jlong jhandle, jboolean jallow_trivial_move) { auto* opt = reinterpret_cast(jhandle); opt->allow_trivial_move = static_cast(jallow_trivial_move); } @@ -183,7 +177,7 @@ void Java_org_rocksdb_CompactionOptionsUniversal_setAllowTrivialMove( * Signature: (J)Z */ jboolean Java_org_rocksdb_CompactionOptionsUniversal_allowTrivialMove( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return opt->allow_trivial_move; } @@ -194,6 +188,6 @@ jboolean Java_org_rocksdb_CompactionOptionsUniversal_allowTrivialMove( * Signature: (J)V */ void Java_org_rocksdb_CompactionOptionsUniversal_disposeInternal( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { delete reinterpret_cast(jhandle); } diff --git a/ceph/src/rocksdb/java/rocksjni/compression_options.cc b/ceph/src/rocksdb/java/rocksjni/compression_options.cc index a5598abe1..f0155eb33 100644 --- a/ceph/src/rocksdb/java/rocksjni/compression_options.cc +++ b/ceph/src/rocksdb/java/rocksjni/compression_options.cc @@ -17,7 +17,7 @@ * Signature: ()J */ jlong Java_org_rocksdb_CompressionOptions_newCompressionOptions( - JNIEnv* /*env*/, jclass /*jcls*/) { + JNIEnv*, jclass) { const auto* opt = new rocksdb::CompressionOptions(); return reinterpret_cast(opt); } @@ -27,10 +27,8 @@ jlong Java_org_rocksdb_CompressionOptions_newCompressionOptions( * Method: setWindowBits * Signature: (JI)V */ -void Java_org_rocksdb_CompressionOptions_setWindowBits(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint jwindow_bits) { +void Java_org_rocksdb_CompressionOptions_setWindowBits( + JNIEnv*, jobject, jlong jhandle, jint jwindow_bits) { auto* opt = reinterpret_cast(jhandle); opt->window_bits = static_cast(jwindow_bits); } @@ -40,9 +38,8 @@ void Java_org_rocksdb_CompressionOptions_setWindowBits(JNIEnv* /*env*/, * Method: windowBits * Signature: (J)I */ -jint Java_org_rocksdb_CompressionOptions_windowBits(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompressionOptions_windowBits( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->window_bits); } @@ -52,9 +49,8 @@ jint Java_org_rocksdb_CompressionOptions_windowBits(JNIEnv* /*env*/, * Method: setLevel * Signature: (JI)V */ -void Java_org_rocksdb_CompressionOptions_setLevel(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, jint jlevel) { +void Java_org_rocksdb_CompressionOptions_setLevel( + JNIEnv*, jobject, jlong jhandle, jint jlevel) { auto* opt = reinterpret_cast(jhandle); opt->level = static_cast(jlevel); } @@ -64,9 +60,8 @@ void Java_org_rocksdb_CompressionOptions_setLevel(JNIEnv* /*env*/, * Method: level * Signature: (J)I */ -jint Java_org_rocksdb_CompressionOptions_level(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompressionOptions_level( + JNIEnv*, jobject, jlong 
jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->level); } @@ -76,10 +71,8 @@ jint Java_org_rocksdb_CompressionOptions_level(JNIEnv* /*env*/, * Method: setStrategy * Signature: (JI)V */ -void Java_org_rocksdb_CompressionOptions_setStrategy(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint jstrategy) { +void Java_org_rocksdb_CompressionOptions_setStrategy( + JNIEnv*, jobject, jlong jhandle, jint jstrategy) { auto* opt = reinterpret_cast(jhandle); opt->strategy = static_cast(jstrategy); } @@ -89,9 +82,8 @@ void Java_org_rocksdb_CompressionOptions_setStrategy(JNIEnv* /*env*/, * Method: strategy * Signature: (J)I */ -jint Java_org_rocksdb_CompressionOptions_strategy(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompressionOptions_strategy( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->strategy); } @@ -101,12 +93,10 @@ jint Java_org_rocksdb_CompressionOptions_strategy(JNIEnv* /*env*/, * Method: setMaxDictBytes * Signature: (JI)V */ -void Java_org_rocksdb_CompressionOptions_setMaxDictBytes(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint jmax_dict_bytes) { +void Java_org_rocksdb_CompressionOptions_setMaxDictBytes( + JNIEnv*, jobject, jlong jhandle, jint jmax_dict_bytes) { auto* opt = reinterpret_cast(jhandle); - opt->max_dict_bytes = static_cast(jmax_dict_bytes); + opt->max_dict_bytes = static_cast(jmax_dict_bytes); } /* @@ -114,44 +104,61 @@ void Java_org_rocksdb_CompressionOptions_setMaxDictBytes(JNIEnv* /*env*/, * Method: maxDictBytes * Signature: (J)I */ -jint Java_org_rocksdb_CompressionOptions_maxDictBytes(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompressionOptions_maxDictBytes( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->max_dict_bytes); } /* * Class: org_rocksdb_CompressionOptions - * Method: setEnabled + * Method: setZstdMaxTrainBytes * Signature: (JI)V */ -void Java_org_rocksdb_CompressionOptions_setEnabled(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean jenabled) { +void Java_org_rocksdb_CompressionOptions_setZstdMaxTrainBytes( + JNIEnv*, jobject, jlong jhandle, jint jzstd_max_train_bytes) { auto* opt = reinterpret_cast(jhandle); - opt->enabled = static_cast(jenabled); + opt->zstd_max_train_bytes = static_cast(jzstd_max_train_bytes); } /* * Class: org_rocksdb_CompressionOptions - * Method: Enabled + * Method: zstdMaxTrainBytes * Signature: (J)I */ -jint Java_org_rocksdb_CompressionOptions_enabled(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_CompressionOptions_zstdMaxTrainBytes( + JNIEnv *, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->zstd_max_train_bytes); +} + +/* + * Class: org_rocksdb_CompressionOptions + * Method: setEnabled + * Signature: (JZ)V + */ +void Java_org_rocksdb_CompressionOptions_setEnabled( + JNIEnv*, jobject, jlong jhandle, jboolean jenabled) { + auto* opt = reinterpret_cast(jhandle); + opt->enabled = jenabled == JNI_TRUE; +} + +/* + * Class: org_rocksdb_CompressionOptions + * Method: enabled + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_CompressionOptions_enabled( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); - return static_cast(opt->enabled); + return static_cast(opt->enabled); } /* * Class: org_rocksdb_CompressionOptions * Method: disposeInternal * Signature: (J)V */ -void 
Java_org_rocksdb_CompressionOptions_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_CompressionOptions_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { delete reinterpret_cast(jhandle); } diff --git a/ceph/src/rocksdb/java/rocksjni/env.cc b/ceph/src/rocksdb/java/rocksjni/env.cc index 5433faf00..ed54bd36a 100644 --- a/ceph/src/rocksdb/java/rocksjni/env.cc +++ b/ceph/src/rocksdb/java/rocksjni/env.cc @@ -6,66 +6,160 @@ // This file implements the "bridge" between Java and C++ and enables // calling c++ rocksdb::Env methods from Java side. +#include +#include + +#include "portal.h" #include "rocksdb/env.h" #include "include/org_rocksdb_Env.h" +#include "include/org_rocksdb_HdfsEnv.h" #include "include/org_rocksdb_RocksEnv.h" #include "include/org_rocksdb_RocksMemEnv.h" +#include "include/org_rocksdb_TimedEnv.h" /* * Class: org_rocksdb_Env * Method: getDefaultEnvInternal * Signature: ()J */ -jlong Java_org_rocksdb_Env_getDefaultEnvInternal(JNIEnv* /*env*/, - jclass /*jclazz*/) { +jlong Java_org_rocksdb_Env_getDefaultEnvInternal( + JNIEnv*, jclass) { return reinterpret_cast(rocksdb::Env::Default()); } +/* + * Class: org_rocksdb_RocksEnv + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_RocksEnv_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* e = reinterpret_cast(jhandle); + assert(e != nullptr); + delete e; +} + /* * Class: org_rocksdb_Env * Method: setBackgroundThreads - * Signature: (JII)V + * Signature: (JIB)V */ -void Java_org_rocksdb_Env_setBackgroundThreads(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jint num, jint priority) { +void Java_org_rocksdb_Env_setBackgroundThreads( + JNIEnv*, jobject, jlong jhandle, jint jnum, jbyte jpriority_value) { auto* rocks_env = reinterpret_cast(jhandle); - switch (priority) { - case org_rocksdb_Env_FLUSH_POOL: - rocks_env->SetBackgroundThreads(num, rocksdb::Env::Priority::LOW); - break; - case org_rocksdb_Env_COMPACTION_POOL: - rocks_env->SetBackgroundThreads(num, rocksdb::Env::Priority::HIGH); - break; - } + rocks_env->SetBackgroundThreads(static_cast(jnum), + rocksdb::PriorityJni::toCppPriority(jpriority_value)); } /* - * Class: org_rocksdb_sEnv + * Class: org_rocksdb_Env + * Method: getBackgroundThreads + * Signature: (JB)I + */ +jint Java_org_rocksdb_Env_getBackgroundThreads( + JNIEnv*, jobject, jlong jhandle, jbyte jpriority_value) { + auto* rocks_env = reinterpret_cast(jhandle); + const int num = rocks_env->GetBackgroundThreads( + rocksdb::PriorityJni::toCppPriority(jpriority_value)); + return static_cast(num); +} + +/* + * Class: org_rocksdb_Env * Method: getThreadPoolQueueLen - * Signature: (JI)I + * Signature: (JB)I + */ +jint Java_org_rocksdb_Env_getThreadPoolQueueLen( + JNIEnv*, jobject, jlong jhandle, jbyte jpriority_value) { + auto* rocks_env = reinterpret_cast(jhandle); + const int queue_len = rocks_env->GetThreadPoolQueueLen( + rocksdb::PriorityJni::toCppPriority(jpriority_value)); + return static_cast(queue_len); +} + +/* + * Class: org_rocksdb_Env + * Method: incBackgroundThreadsIfNeeded + * Signature: (JIB)V + */ +void Java_org_rocksdb_Env_incBackgroundThreadsIfNeeded( + JNIEnv*, jobject, jlong jhandle, jint jnum, jbyte jpriority_value) { + auto* rocks_env = reinterpret_cast(jhandle); + rocks_env->IncBackgroundThreadsIfNeeded(static_cast(jnum), + rocksdb::PriorityJni::toCppPriority(jpriority_value)); +} + +/* + * Class: org_rocksdb_Env + * Method: lowerThreadPoolIOPriority + * Signature: (JB)V + */ +void 
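
The rewritten setBackgroundThreads no longer switches on a hard-coded pool id; it forwards a priority byte through PriorityJni to the C++ Env API. A sketch of the underlying calls, assuming the default Env:

#include "rocksdb/env.h"

rocksdb::Env* env = rocksdb::Env::Default();
env->SetBackgroundThreads(4, rocksdb::Env::Priority::LOW);          // compaction pool
env->SetBackgroundThreads(2, rocksdb::Env::Priority::HIGH);         // flush pool
env->IncBackgroundThreadsIfNeeded(8, rocksdb::Env::Priority::LOW);  // grows, never shrinks
unsigned int queued = env->GetThreadPoolQueueLen(rocksdb::Env::Priority::LOW);
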
Java_org_rocksdb_Env_lowerThreadPoolIOPriority( + JNIEnv*, jobject, jlong jhandle, jbyte jpriority_value) { + auto* rocks_env = reinterpret_cast(jhandle); + rocks_env->LowerThreadPoolIOPriority( + rocksdb::PriorityJni::toCppPriority(jpriority_value)); +} + +/* + * Class: org_rocksdb_Env + * Method: lowerThreadPoolCPUPriority + * Signature: (JB)V + */ +void Java_org_rocksdb_Env_lowerThreadPoolCPUPriority( + JNIEnv*, jobject, jlong jhandle, jbyte jpriority_value) { + auto* rocks_env = reinterpret_cast(jhandle); + rocks_env->LowerThreadPoolCPUPriority( + rocksdb::PriorityJni::toCppPriority(jpriority_value)); +} + +/* + * Class: org_rocksdb_Env + * Method: getThreadList + * Signature: (J)[Lorg/rocksdb/ThreadStatus; */ -jint Java_org_rocksdb_Env_getThreadPoolQueueLen(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jint pool_id) { +jobjectArray Java_org_rocksdb_Env_getThreadList( + JNIEnv* env, jobject, jlong jhandle) { auto* rocks_env = reinterpret_cast(jhandle); - switch (pool_id) { - case org_rocksdb_RocksEnv_FLUSH_POOL: - return rocks_env->GetThreadPoolQueueLen(rocksdb::Env::Priority::LOW); - case org_rocksdb_RocksEnv_COMPACTION_POOL: - return rocks_env->GetThreadPoolQueueLen(rocksdb::Env::Priority::HIGH); + std::vector thread_status; + rocksdb::Status s = rocks_env->GetThreadList(&thread_status); + if (!s.ok()) { + // error, throw exception + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + // object[] + const jsize len = static_cast(thread_status.size()); + jobjectArray jthread_status = + env->NewObjectArray(len, rocksdb::ThreadStatusJni::getJClass(env), nullptr); + if (jthread_status == nullptr) { + // an exception occurred + return nullptr; + } + for (jsize i = 0; i < len; ++i) { + jobject jts = + rocksdb::ThreadStatusJni::construct(env, &(thread_status[i])); + env->SetObjectArrayElement(jthread_status, i, jts); + if (env->ExceptionCheck()) { + // exception occurred + env->DeleteLocalRef(jthread_status); + return nullptr; + } } - return 0; + + return jthread_status; } /* * Class: org_rocksdb_RocksMemEnv * Method: createMemEnv - * Signature: ()J + * Signature: (J)J */ -jlong Java_org_rocksdb_RocksMemEnv_createMemEnv(JNIEnv* /*env*/, - jclass /*jclazz*/) { - return reinterpret_cast(rocksdb::NewMemEnv(rocksdb::Env::Default())); +jlong Java_org_rocksdb_RocksMemEnv_createMemEnv( + JNIEnv*, jclass, jlong jbase_env_handle) { + auto* base_env = reinterpret_cast(jbase_env_handle); + return reinterpret_cast(rocksdb::NewMemEnv(base_env)); } /* @@ -73,10 +167,68 @@ jlong Java_org_rocksdb_RocksMemEnv_createMemEnv(JNIEnv* /*env*/, * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_RocksMemEnv_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_RocksMemEnv_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* e = reinterpret_cast(jhandle); + assert(e != nullptr); + delete e; +} + +/* + * Class: org_rocksdb_HdfsEnv + * Method: createHdfsEnv + * Signature: (Ljava/lang/String;)J + */ +jlong Java_org_rocksdb_HdfsEnv_createHdfsEnv( + JNIEnv* env, jclass, jstring jfsname) { + jboolean has_exception = JNI_FALSE; + auto fsname = rocksdb::JniUtil::copyStdString(env, jfsname, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return 0; + } + rocksdb::Env* hdfs_env; + rocksdb::Status s = rocksdb::NewHdfsEnv(&hdfs_env, fsname); + if (!s.ok()) { + // error occurred + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return 0; + } + return reinterpret_cast(hdfs_env); +} + +/* + * Class: 
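
getThreadList above surfaces Env::GetThreadList, and createMemEnv now takes an explicit base Env handle instead of hard-wiring Env::Default(). Roughly, on the C++ side (thread states are fully populated only when DBOptions::enable_thread_tracking is set):

#include <vector>
#include "rocksdb/env.h"
#include "rocksdb/thread_status.h"

std::vector<rocksdb::ThreadStatus> states;
rocksdb::Status s = rocksdb::Env::Default()->GetThreadList(&states);

rocksdb::Env* mem_env = rocksdb::NewMemEnv(rocksdb::Env::Default());  // any base Env works
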
org_rocksdb_HdfsEnv + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_HdfsEnv_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { auto* e = reinterpret_cast(jhandle); assert(e != nullptr); delete e; } + +/* + * Class: org_rocksdb_TimedEnv + * Method: createTimedEnv + * Signature: (J)J + */ +jlong Java_org_rocksdb_TimedEnv_createTimedEnv( + JNIEnv*, jclass, jlong jbase_env_handle) { + auto* base_env = reinterpret_cast(jbase_env_handle); + return reinterpret_cast(rocksdb::NewTimedEnv(base_env)); +} + +/* + * Class: org_rocksdb_TimedEnv + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_TimedEnv_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* e = reinterpret_cast(jhandle); + assert(e != nullptr); + delete e; +} + diff --git a/ceph/src/rocksdb/java/rocksjni/env_options.cc b/ceph/src/rocksdb/java/rocksjni/env_options.cc index 1c0ebe374..9ed330183 100644 --- a/ceph/src/rocksdb/java/rocksjni/env_options.cc +++ b/ceph/src/rocksdb/java/rocksjni/env_options.cc @@ -32,20 +32,32 @@ * Method: newEnvOptions * Signature: ()J */ -jlong Java_org_rocksdb_EnvOptions_newEnvOptions(JNIEnv * /*env*/, - jclass /*jcls*/) { +jlong Java_org_rocksdb_EnvOptions_newEnvOptions__( + JNIEnv*, jclass) { auto *env_opt = new rocksdb::EnvOptions(); return reinterpret_cast(env_opt); } +/* + * Class: org_rocksdb_EnvOptions + * Method: newEnvOptions + * Signature: (J)J + */ +jlong Java_org_rocksdb_EnvOptions_newEnvOptions__J( + JNIEnv*, jclass, jlong jdboptions_handle) { + auto* db_options = + reinterpret_cast(jdboptions_handle); + auto* env_opt = new rocksdb::EnvOptions(*db_options); + return reinterpret_cast(env_opt); +} + /* * Class: org_rocksdb_EnvOptions * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_EnvOptions_disposeInternal(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_EnvOptions_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { auto *eo = reinterpret_cast(jhandle); assert(eo != nullptr); delete eo; @@ -53,93 +65,82 @@ void Java_org_rocksdb_EnvOptions_disposeInternal(JNIEnv * /*env*/, /* * Class: org_rocksdb_EnvOptions - * Method: setUseDirectReads + * Method: setUseMmapReads * Signature: (JZ)V */ -void Java_org_rocksdb_EnvOptions_setUseDirectReads(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean use_direct_reads) { - ENV_OPTIONS_SET_BOOL(jhandle, use_direct_reads); +void Java_org_rocksdb_EnvOptions_setUseMmapReads( + JNIEnv*, jobject, jlong jhandle, jboolean use_mmap_reads) { + ENV_OPTIONS_SET_BOOL(jhandle, use_mmap_reads); } /* * Class: org_rocksdb_EnvOptions - * Method: useDirectReads + * Method: useMmapReads * Signature: (J)Z */ -jboolean Java_org_rocksdb_EnvOptions_useDirectReads(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { - return ENV_OPTIONS_GET(jhandle, use_direct_reads); +jboolean Java_org_rocksdb_EnvOptions_useMmapReads( + JNIEnv*, jobject, jlong jhandle) { + return ENV_OPTIONS_GET(jhandle, use_mmap_reads); } /* * Class: org_rocksdb_EnvOptions - * Method: setUseDirectWrites + * Method: setUseMmapWrites * Signature: (JZ)V */ -void Java_org_rocksdb_EnvOptions_setUseDirectWrites( - JNIEnv * /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean use_direct_writes) { - ENV_OPTIONS_SET_BOOL(jhandle, use_direct_writes); +void Java_org_rocksdb_EnvOptions_setUseMmapWrites( + JNIEnv*, jobject, jlong jhandle, jboolean use_mmap_writes) { + ENV_OPTIONS_SET_BOOL(jhandle, use_mmap_writes); } /* * Class: org_rocksdb_EnvOptions - * Method: useDirectWrites + * Method: 
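
The new HdfsEnv and TimedEnv bindings wrap two existing Env factories: NewTimedEnv decorates a base Env with per-operation timing, while NewHdfsEnv can fail, hence the Status out-parameter. A sketch, assuming an HDFS-enabled RocksDB build and a placeholder namenode URI:

#include "rocksdb/env.h"

rocksdb::Env* timed_env = rocksdb::NewTimedEnv(rocksdb::Env::Default());

rocksdb::Env* hdfs_env = nullptr;
rocksdb::Status s = rocksdb::NewHdfsEnv(&hdfs_env, "hdfs://namenode:9000/");
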
useMmapWrites * Signature: (J)Z */ -jboolean Java_org_rocksdb_EnvOptions_useDirectWrites(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { - return ENV_OPTIONS_GET(jhandle, use_direct_writes); +jboolean Java_org_rocksdb_EnvOptions_useMmapWrites( + JNIEnv*, jobject, jlong jhandle) { + return ENV_OPTIONS_GET(jhandle, use_mmap_writes); } /* * Class: org_rocksdb_EnvOptions - * Method: setUseMmapReads + * Method: setUseDirectReads * Signature: (JZ)V */ -void Java_org_rocksdb_EnvOptions_setUseMmapReads(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean use_mmap_reads) { - ENV_OPTIONS_SET_BOOL(jhandle, use_mmap_reads); +void Java_org_rocksdb_EnvOptions_setUseDirectReads( + JNIEnv*, jobject, jlong jhandle, jboolean use_direct_reads) { + ENV_OPTIONS_SET_BOOL(jhandle, use_direct_reads); } /* * Class: org_rocksdb_EnvOptions - * Method: useMmapReads + * Method: useDirectReads * Signature: (J)Z */ -jboolean Java_org_rocksdb_EnvOptions_useMmapReads(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { - return ENV_OPTIONS_GET(jhandle, use_mmap_reads); +jboolean Java_org_rocksdb_EnvOptions_useDirectReads( + JNIEnv*, jobject, jlong jhandle) { + return ENV_OPTIONS_GET(jhandle, use_direct_reads); } /* * Class: org_rocksdb_EnvOptions - * Method: setUseMmapWrites + * Method: setUseDirectWrites * Signature: (JZ)V */ -void Java_org_rocksdb_EnvOptions_setUseMmapWrites(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean use_mmap_writes) { - ENV_OPTIONS_SET_BOOL(jhandle, use_mmap_writes); +void Java_org_rocksdb_EnvOptions_setUseDirectWrites( + JNIEnv*, jobject, jlong jhandle, jboolean use_direct_writes) { + ENV_OPTIONS_SET_BOOL(jhandle, use_direct_writes); } /* * Class: org_rocksdb_EnvOptions - * Method: useMmapWrites + * Method: useDirectWrites * Signature: (J)Z */ -jboolean Java_org_rocksdb_EnvOptions_useMmapWrites(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { - return ENV_OPTIONS_GET(jhandle, use_mmap_writes); +jboolean Java_org_rocksdb_EnvOptions_useDirectWrites( + JNIEnv*, jobject, jlong jhandle) { + return ENV_OPTIONS_GET(jhandle, use_direct_writes); } /* @@ -147,10 +148,8 @@ jboolean Java_org_rocksdb_EnvOptions_useMmapWrites(JNIEnv * /*env*/, * Method: setAllowFallocate * Signature: (JZ)V */ -void Java_org_rocksdb_EnvOptions_setAllowFallocate(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean allow_fallocate) { +void Java_org_rocksdb_EnvOptions_setAllowFallocate( + JNIEnv*, jobject, jlong jhandle, jboolean allow_fallocate) { ENV_OPTIONS_SET_BOOL(jhandle, allow_fallocate); } @@ -159,9 +158,8 @@ void Java_org_rocksdb_EnvOptions_setAllowFallocate(JNIEnv * /*env*/, * Method: allowFallocate * Signature: (J)Z */ -jboolean Java_org_rocksdb_EnvOptions_allowFallocate(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_EnvOptions_allowFallocate( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, allow_fallocate); } @@ -170,10 +168,8 @@ jboolean Java_org_rocksdb_EnvOptions_allowFallocate(JNIEnv * /*env*/, * Method: setSetFdCloexec * Signature: (JZ)V */ -void Java_org_rocksdb_EnvOptions_setSetFdCloexec(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean set_fd_cloexec) { +void Java_org_rocksdb_EnvOptions_setSetFdCloexec( + JNIEnv*, jobject, jlong jhandle, jboolean set_fd_cloexec) { ENV_OPTIONS_SET_BOOL(jhandle, set_fd_cloexec); } @@ -182,9 +178,8 @@ void Java_org_rocksdb_EnvOptions_setSetFdCloexec(JNIEnv * /*env*/, * Method: setFdCloexec * Signature: (J)Z */ -jboolean 
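
The hunks around here only reorder the EnvOptions accessors; the mmap and direct I/O flags themselves are unchanged. For reference, the C++ struct they mutate (the two mechanisms are alternatives and are not normally enabled together):

#include "rocksdb/env.h"

rocksdb::EnvOptions env_opts;
env_opts.use_mmap_reads = false;
env_opts.use_direct_reads = true;  // bypass the OS page cache for reads
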
Java_org_rocksdb_EnvOptions_setFdCloexec(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_EnvOptions_setFdCloexec( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, set_fd_cloexec); } @@ -193,10 +188,8 @@ jboolean Java_org_rocksdb_EnvOptions_setFdCloexec(JNIEnv * /*env*/, * Method: setBytesPerSync * Signature: (JJ)V */ -void Java_org_rocksdb_EnvOptions_setBytesPerSync(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong bytes_per_sync) { +void Java_org_rocksdb_EnvOptions_setBytesPerSync( + JNIEnv*, jobject, jlong jhandle, jlong bytes_per_sync) { ENV_OPTIONS_SET_UINT64_T(jhandle, bytes_per_sync); } @@ -205,9 +198,8 @@ void Java_org_rocksdb_EnvOptions_setBytesPerSync(JNIEnv * /*env*/, * Method: bytesPerSync * Signature: (J)J */ -jlong Java_org_rocksdb_EnvOptions_bytesPerSync(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_EnvOptions_bytesPerSync( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, bytes_per_sync); } @@ -217,8 +209,7 @@ jlong Java_org_rocksdb_EnvOptions_bytesPerSync(JNIEnv * /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_EnvOptions_setFallocateWithKeepSize( - JNIEnv * /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean fallocate_with_keep_size) { + JNIEnv*, jobject, jlong jhandle, jboolean fallocate_with_keep_size) { ENV_OPTIONS_SET_BOOL(jhandle, fallocate_with_keep_size); } @@ -227,9 +218,8 @@ void Java_org_rocksdb_EnvOptions_setFallocateWithKeepSize( * Method: fallocateWithKeepSize * Signature: (J)Z */ -jboolean Java_org_rocksdb_EnvOptions_fallocateWithKeepSize(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_EnvOptions_fallocateWithKeepSize( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, fallocate_with_keep_size); } @@ -239,8 +229,7 @@ jboolean Java_org_rocksdb_EnvOptions_fallocateWithKeepSize(JNIEnv * /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_EnvOptions_setCompactionReadaheadSize( - JNIEnv * /*env*/, jobject /*jobj*/, jlong jhandle, - jlong compaction_readahead_size) { + JNIEnv*, jobject, jlong jhandle, jlong compaction_readahead_size) { ENV_OPTIONS_SET_SIZE_T(jhandle, compaction_readahead_size); } @@ -249,9 +238,8 @@ void Java_org_rocksdb_EnvOptions_setCompactionReadaheadSize( * Method: compactionReadaheadSize * Signature: (J)J */ -jlong Java_org_rocksdb_EnvOptions_compactionReadaheadSize(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_EnvOptions_compactionReadaheadSize( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, compaction_readahead_size); } @@ -261,8 +249,7 @@ jlong Java_org_rocksdb_EnvOptions_compactionReadaheadSize(JNIEnv * /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_EnvOptions_setRandomAccessMaxBufferSize( - JNIEnv * /*env*/, jobject /*jobj*/, jlong jhandle, - jlong random_access_max_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jlong random_access_max_buffer_size) { ENV_OPTIONS_SET_SIZE_T(jhandle, random_access_max_buffer_size); } @@ -271,9 +258,8 @@ void Java_org_rocksdb_EnvOptions_setRandomAccessMaxBufferSize( * Method: randomAccessMaxBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_EnvOptions_randomAccessMaxBufferSize(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_EnvOptions_randomAccessMaxBufferSize( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, random_access_max_buffer_size); } @@ -283,8 +269,7 @@ jlong 
Java_org_rocksdb_EnvOptions_randomAccessMaxBufferSize(JNIEnv * /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_EnvOptions_setWritableFileMaxBufferSize( - JNIEnv * /*env*/, jobject /*jobj*/, jlong jhandle, - jlong writable_file_max_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jlong writable_file_max_buffer_size) { ENV_OPTIONS_SET_SIZE_T(jhandle, writable_file_max_buffer_size); } @@ -293,9 +278,8 @@ void Java_org_rocksdb_EnvOptions_setWritableFileMaxBufferSize( * Method: writableFileMaxBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_EnvOptions_writableFileMaxBufferSize(JNIEnv * /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_EnvOptions_writableFileMaxBufferSize( + JNIEnv*, jobject, jlong jhandle) { return ENV_OPTIONS_GET(jhandle, writable_file_max_buffer_size); } @@ -304,9 +288,8 @@ jlong Java_org_rocksdb_EnvOptions_writableFileMaxBufferSize(JNIEnv * /*env*/, * Method: setRateLimiter * Signature: (JJ)V */ -void Java_org_rocksdb_EnvOptions_setRateLimiter(JNIEnv * /*env*/, - jobject /*jobj*/, jlong jhandle, - jlong rl_handle) { +void Java_org_rocksdb_EnvOptions_setRateLimiter( + JNIEnv*, jobject, jlong jhandle, jlong rl_handle) { auto *sptr_rate_limiter = reinterpret_cast *>(rl_handle); auto *env_opt = reinterpret_cast(jhandle); diff --git a/ceph/src/rocksdb/java/rocksjni/ingest_external_file_options.cc b/ceph/src/rocksdb/java/rocksjni/ingest_external_file_options.cc index a26e6f6d5..e0871ff8e 100644 --- a/ceph/src/rocksdb/java/rocksjni/ingest_external_file_options.cc +++ b/ceph/src/rocksdb/java/rocksjni/ingest_external_file_options.cc @@ -17,7 +17,7 @@ * Signature: ()J */ jlong Java_org_rocksdb_IngestExternalFileOptions_newIngestExternalFileOptions__( - JNIEnv* /*env*/, jclass /*jclazz*/) { + JNIEnv*, jclass) { auto* options = new rocksdb::IngestExternalFileOptions(); return reinterpret_cast(options); } @@ -28,7 +28,7 @@ jlong Java_org_rocksdb_IngestExternalFileOptions_newIngestExternalFileOptions__( * Signature: (ZZZZ)J */ jlong Java_org_rocksdb_IngestExternalFileOptions_newIngestExternalFileOptions__ZZZZ( - JNIEnv* /*env*/, jclass /*jcls*/, jboolean jmove_files, + JNIEnv*, jclass, jboolean jmove_files, jboolean jsnapshot_consistency, jboolean jallow_global_seqno, jboolean jallow_blocking_flush) { auto* options = new rocksdb::IngestExternalFileOptions(); @@ -44,9 +44,8 @@ jlong Java_org_rocksdb_IngestExternalFileOptions_newIngestExternalFileOptions__Z * Method: moveFiles * Signature: (J)Z */ -jboolean Java_org_rocksdb_IngestExternalFileOptions_moveFiles(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_IngestExternalFileOptions_moveFiles( + JNIEnv*, jobject, jlong jhandle) { auto* options = reinterpret_cast(jhandle); return static_cast(options->move_files); @@ -58,7 +57,7 @@ jboolean Java_org_rocksdb_IngestExternalFileOptions_moveFiles(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_IngestExternalFileOptions_setMoveFiles( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jboolean jmove_files) { + JNIEnv*, jobject, jlong jhandle, jboolean jmove_files) { auto* options = reinterpret_cast(jhandle); options->move_files = static_cast(jmove_files); @@ -70,7 +69,7 @@ void Java_org_rocksdb_IngestExternalFileOptions_setMoveFiles( * Signature: (J)Z */ jboolean Java_org_rocksdb_IngestExternalFileOptions_snapshotConsistency( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* options = reinterpret_cast(jhandle); return static_cast(options->snapshot_consistency); @@ -82,8 
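
setRateLimiter above unwraps a std::shared_ptr<rocksdb::RateLimiter> handle and stores the raw pointer on EnvOptions. On the DB side the same limiter is usually installed through DBOptions; a sketch with an illustrative 10 MB/s budget:

#include "rocksdb/options.h"
#include "rocksdb/rate_limiter.h"

rocksdb::DBOptions db_opts;
db_opts.rate_limiter.reset(
    rocksdb::NewGenericRateLimiter(10 * 1024 * 1024));  // shared by flush and compaction
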
+81,7 @@ jboolean Java_org_rocksdb_IngestExternalFileOptions_snapshotConsistency( * Signature: (JZ)V */ void Java_org_rocksdb_IngestExternalFileOptions_setSnapshotConsistency( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jsnapshot_consistency) { + JNIEnv*, jobject, jlong jhandle, jboolean jsnapshot_consistency) { auto* options = reinterpret_cast(jhandle); options->snapshot_consistency = static_cast(jsnapshot_consistency); @@ -95,7 +93,7 @@ void Java_org_rocksdb_IngestExternalFileOptions_setSnapshotConsistency( * Signature: (J)Z */ jboolean Java_org_rocksdb_IngestExternalFileOptions_allowGlobalSeqNo( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* options = reinterpret_cast(jhandle); return static_cast(options->allow_global_seqno); @@ -107,8 +105,7 @@ jboolean Java_org_rocksdb_IngestExternalFileOptions_allowGlobalSeqNo( * Signature: (JZ)V */ void Java_org_rocksdb_IngestExternalFileOptions_setAllowGlobalSeqNo( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jallow_global_seqno) { + JNIEnv*, jobject, jlong jhandle, jboolean jallow_global_seqno) { auto* options = reinterpret_cast(jhandle); options->allow_global_seqno = static_cast(jallow_global_seqno); @@ -120,7 +117,7 @@ void Java_org_rocksdb_IngestExternalFileOptions_setAllowGlobalSeqNo( * Signature: (J)Z */ jboolean Java_org_rocksdb_IngestExternalFileOptions_allowBlockingFlush( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* options = reinterpret_cast(jhandle); return static_cast(options->allow_blocking_flush); @@ -132,22 +129,68 @@ jboolean Java_org_rocksdb_IngestExternalFileOptions_allowBlockingFlush( * Signature: (JZ)V */ void Java_org_rocksdb_IngestExternalFileOptions_setAllowBlockingFlush( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jallow_blocking_flush) { + JNIEnv*, jobject, jlong jhandle, jboolean jallow_blocking_flush) { auto* options = reinterpret_cast(jhandle); options->allow_blocking_flush = static_cast(jallow_blocking_flush); } +/* + * Class: org_rocksdb_IngestExternalFileOptions + * Method: ingestBehind + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_IngestExternalFileOptions_ingestBehind( + JNIEnv*, jobject, jlong jhandle) { + auto* options = + reinterpret_cast(jhandle); + return options->ingest_behind == JNI_TRUE; +} + +/* + * Class: org_rocksdb_IngestExternalFileOptions + * Method: setIngestBehind + * Signature: (JZ)V + */ +void Java_org_rocksdb_IngestExternalFileOptions_setIngestBehind( + JNIEnv*, jobject, jlong jhandle, jboolean jingest_behind) { + auto* options = + reinterpret_cast(jhandle); + options->ingest_behind = jingest_behind == JNI_TRUE; +} + +/* + * Class: org_rocksdb_IngestExternalFileOptions + * Method: writeGlobalSeqno + * Signature: (J)Z + */ +JNIEXPORT jboolean JNICALL Java_org_rocksdb_IngestExternalFileOptions_writeGlobalSeqno( + JNIEnv*, jobject, jlong jhandle) { + auto* options = + reinterpret_cast(jhandle); + return options->write_global_seqno == JNI_TRUE; +} + +/* + * Class: org_rocksdb_IngestExternalFileOptions + * Method: setWriteGlobalSeqno + * Signature: (JZ)V + */ +JNIEXPORT void JNICALL Java_org_rocksdb_IngestExternalFileOptions_setWriteGlobalSeqno( + JNIEnv*, jobject, jlong jhandle, jboolean jwrite_global_seqno) { + auto* options = + reinterpret_cast(jhandle); + options->write_global_seqno = jwrite_global_seqno == JNI_TRUE; +} + /* * Class: org_rocksdb_IngestExternalFileOptions * Method: disposeInternal * Signature: (J)V */ void 
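
ingest_behind and write_global_seqno are new pass-throughs to rocksdb::IngestExternalFileOptions. A sketch of their C++ use; db is a placeholder for an open rocksdb::DB*, and ingest_behind additionally requires opening the DB with DBOptions::allow_ingest_behind:

#include "rocksdb/db.h"

rocksdb::IngestExternalFileOptions ifo;
ifo.ingest_behind = true;        // file is placed beneath all existing data
ifo.write_global_seqno = false;  // do not rewrite the global seqno field in the file
rocksdb::Status s = db->IngestExternalFile({"/path/to/file.sst"}, ifo);
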
Java_org_rocksdb_IngestExternalFileOptions_disposeInternal( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* options = reinterpret_cast(jhandle); delete options; - // @lint-ignore TXT4 T25377293 Grandfathered in } \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/memory_util.cc b/ceph/src/rocksdb/java/rocksjni/memory_util.cc new file mode 100644 index 000000000..043850213 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/memory_util.cc @@ -0,0 +1,100 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#include +#include +#include +#include +#include + +#include "include/org_rocksdb_MemoryUtil.h" + +#include "rocksjni/portal.h" + +#include "rocksdb/utilities/memory_util.h" + + +/* + * Class: org_rocksdb_MemoryUtil + * Method: getApproximateMemoryUsageByType + * Signature: ([J[J)Ljava/util/Map; + */ +jobject Java_org_rocksdb_MemoryUtil_getApproximateMemoryUsageByType( + JNIEnv *env, jclass /*jclazz*/, jlongArray jdb_handles, jlongArray jcache_handles) { + + std::vector dbs; + jsize db_handle_count = env->GetArrayLength(jdb_handles); + if(db_handle_count > 0) { + jlong *ptr_jdb_handles = env->GetLongArrayElements(jdb_handles, nullptr); + if (ptr_jdb_handles == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + for (jsize i = 0; i < db_handle_count; i++) { + dbs.push_back(reinterpret_cast(ptr_jdb_handles[i])); + } + env->ReleaseLongArrayElements(jdb_handles, ptr_jdb_handles, JNI_ABORT); + } + + std::unordered_set cache_set; + jsize cache_handle_count = env->GetArrayLength(jcache_handles); + if(cache_handle_count > 0) { + jlong *ptr_jcache_handles = env->GetLongArrayElements(jcache_handles, nullptr); + if (ptr_jcache_handles == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + for (jsize i = 0; i < cache_handle_count; i++) { + auto *cache_ptr = + reinterpret_cast *>(ptr_jcache_handles[i]); + cache_set.insert(cache_ptr->get()); + } + env->ReleaseLongArrayElements(jcache_handles, ptr_jcache_handles, JNI_ABORT); + } + + std::map usage_by_type; + if(rocksdb::MemoryUtil::GetApproximateMemoryUsageByType(dbs, cache_set, &usage_by_type) != rocksdb::Status::OK()) { + // Non-OK status + return nullptr; + } + + jobject jusage_by_type = rocksdb::HashMapJni::construct( + env, static_cast(usage_by_type.size())); + if (jusage_by_type == nullptr) { + // exception occurred + return nullptr; + } + const rocksdb::HashMapJni::FnMapKV + fn_map_kv = + [env](const std::pair& pair) { + // Construct key + const jobject jusage_type = + rocksdb::ByteJni::valueOf(env, rocksdb::MemoryUsageTypeJni::toJavaMemoryUsageType(pair.first)); + if (jusage_type == nullptr) { + // an error occurred + return std::unique_ptr>(nullptr); + } + // Construct value + const jobject jusage_value = + rocksdb::LongJni::valueOf(env, pair.second); + if (jusage_value == nullptr) { + // an error occurred + return std::unique_ptr>(nullptr); + } + // Construct and return pointer to pair of jobjects + return std::unique_ptr>( + new std::pair(jusage_type, + jusage_value)); + }; + + if (!rocksdb::HashMapJni::putAll(env, jusage_by_type, usage_by_type.begin(), + usage_by_type.end(), fn_map_kv)) { + // exception occcurred + jusage_by_type = nullptr; + } + + return jusage_by_type; + +} diff --git 
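
The new memory_util.cc bridges rocksdb::MemoryUtil::GetApproximateMemoryUsageByType, which aggregates memtable, table reader, and cache usage across DB instances. A sketch of the wrapped call; db and block_cache are placeholders for an open DB* and a shared_ptr<rocksdb::Cache>:

#include <map>
#include "rocksdb/utilities/memory_util.h"

std::map<rocksdb::MemoryUtil::UsageType, uint64_t> usage;
rocksdb::Status s = rocksdb::MemoryUtil::GetApproximateMemoryUsageByType(
    {db}, {block_cache.get()}, &usage);
uint64_t memtables = usage[rocksdb::MemoryUtil::kMemTableTotal];
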
a/ceph/src/rocksdb/java/rocksjni/memtablejni.cc b/ceph/src/rocksdb/java/rocksjni/memtablejni.cc index effb6eda0..ad704c3b1 100644 --- a/ceph/src/rocksdb/java/rocksjni/memtablejni.cc +++ b/ceph/src/rocksdb/java/rocksjni/memtablejni.cc @@ -20,7 +20,7 @@ jlong Java_org_rocksdb_HashSkipListMemTableConfig_newMemTableFactoryHandle( JNIEnv* env, jobject /*jobj*/, jlong jbucket_count, jint jheight, jint jbranching_factor) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jbucket_count); + rocksdb::Status s = rocksdb::JniUtil::check_if_jlong_fits_size_t(jbucket_count); if (s.ok()) { return reinterpret_cast(rocksdb::NewHashSkipListRepFactory( static_cast(jbucket_count), static_cast(jheight), @@ -40,9 +40,9 @@ jlong Java_org_rocksdb_HashLinkedListMemTableConfig_newMemTableFactoryHandle( jlong jhuge_page_tlb_size, jint jbucket_entries_logging_threshold, jboolean jif_log_bucket_dist_when_flash, jint jthreshold_use_skiplist) { rocksdb::Status statusBucketCount = - rocksdb::check_if_jlong_fits_size_t(jbucket_count); + rocksdb::JniUtil::check_if_jlong_fits_size_t(jbucket_count); rocksdb::Status statusHugePageTlb = - rocksdb::check_if_jlong_fits_size_t(jhuge_page_tlb_size); + rocksdb::JniUtil::check_if_jlong_fits_size_t(jhuge_page_tlb_size); if (statusBucketCount.ok() && statusHugePageTlb.ok()) { return reinterpret_cast(rocksdb::NewHashLinkListRepFactory( static_cast(jbucket_count), @@ -63,7 +63,7 @@ jlong Java_org_rocksdb_HashLinkedListMemTableConfig_newMemTableFactoryHandle( */ jlong Java_org_rocksdb_VectorMemTableConfig_newMemTableFactoryHandle( JNIEnv* env, jobject /*jobj*/, jlong jreserved_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jreserved_size); + rocksdb::Status s = rocksdb::JniUtil::check_if_jlong_fits_size_t(jreserved_size); if (s.ok()) { return reinterpret_cast( new rocksdb::VectorRepFactory(static_cast(jreserved_size))); @@ -79,7 +79,7 @@ jlong Java_org_rocksdb_VectorMemTableConfig_newMemTableFactoryHandle( */ jlong Java_org_rocksdb_SkipListMemTableConfig_newMemTableFactoryHandle0( JNIEnv* env, jobject /*jobj*/, jlong jlookahead) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jlookahead); + rocksdb::Status s = rocksdb::JniUtil::check_if_jlong_fits_size_t(jlookahead); if (s.ok()) { return reinterpret_cast( new rocksdb::SkipListFactory(static_cast(jlookahead))); diff --git a/ceph/src/rocksdb/java/rocksjni/merge_operator.cc b/ceph/src/rocksdb/java/rocksjni/merge_operator.cc index 782153f57..e06a06f7e 100644 --- a/ceph/src/rocksdb/java/rocksjni/merge_operator.cc +++ b/ceph/src/rocksdb/java/rocksjni/merge_operator.cc @@ -13,6 +13,7 @@ #include #include "include/org_rocksdb_StringAppendOperator.h" +#include "include/org_rocksdb_UInt64AddOperator.h" #include "rocksdb/db.h" #include "rocksdb/memtablerep.h" #include "rocksdb/merge_operator.h" @@ -47,3 +48,28 @@ void Java_org_rocksdb_StringAppendOperator_disposeInternal(JNIEnv* /*env*/, reinterpret_cast*>(jhandle); delete sptr_string_append_op; // delete std::shared_ptr } + +/* + * Class: org_rocksdb_UInt64AddOperator + * Method: newSharedUInt64AddOperator + * Signature: ()J + */ +jlong Java_org_rocksdb_UInt64AddOperator_newSharedUInt64AddOperator( + JNIEnv* /*env*/, jclass /*jclazz*/) { + auto* sptr_uint64_add_op = new std::shared_ptr( + rocksdb::MergeOperators::CreateUInt64AddOperator()); + return reinterpret_cast(sptr_uint64_add_op); +} + +/* + * Class: org_rocksdb_UInt64AddOperator + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_UInt64AddOperator_disposeInternal(JNIEnv* 
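
The memtablejni.cc changes only move check_if_jlong_fits_size_t into JniUtil, but the guard is worth spelling out: jlong is always 64 bits, while size_t is 32 bits on 32-bit builds, so a Java long can silently overflow a size_t. A hypothetical stand-in for the real helper (which lives in rocksjni/portal.h):

#include <jni.h>
#include <cstdint>
#include <limits>

inline bool jlong_fits_size_t(jlong v) {  // illustrative only, not the real helper
  return v >= 0 &&
         static_cast<uint64_t>(v) <= std::numeric_limits<size_t>::max();
}
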
/*env*/, + jobject /*jobj*/, + jlong jhandle) { + auto* sptr_uint64_add_op = + reinterpret_cast*>(jhandle); + delete sptr_uint64_add_op; // delete std::shared_ptr +} diff --git a/ceph/src/rocksdb/java/rocksjni/optimistic_transaction_db.cc b/ceph/src/rocksdb/java/rocksjni/optimistic_transaction_db.cc index 27c8d3822..1505ff989 100644 --- a/ceph/src/rocksdb/java/rocksjni/optimistic_transaction_db.cc +++ b/ceph/src/rocksdb/java/rocksjni/optimistic_transaction_db.cc @@ -22,7 +22,7 @@ * Signature: (JLjava/lang/String;)J */ jlong Java_org_rocksdb_OptimisticTransactionDB_open__JLjava_lang_String_2( - JNIEnv* env, jclass /*jcls*/, jlong joptions_handle, jstring jdb_path) { + JNIEnv* env, jclass, jlong joptions_handle, jstring jdb_path) { const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); if (db_path == nullptr) { // exception thrown: OutOfMemoryError @@ -50,7 +50,7 @@ jlong Java_org_rocksdb_OptimisticTransactionDB_open__JLjava_lang_String_2( */ jlongArray Java_org_rocksdb_OptimisticTransactionDB_open__JLjava_lang_String_2_3_3B_3J( - JNIEnv* env, jclass /*jcls*/, jlong jdb_options_handle, jstring jdb_path, + JNIEnv* env, jclass, jlong jdb_options_handle, jstring jdb_path, jobjectArray jcolumn_names, jlongArray jcolumn_options_handles) { const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); if (db_path == nullptr) { @@ -150,14 +150,40 @@ Java_org_rocksdb_OptimisticTransactionDB_open__JLjava_lang_String_2_3_3B_3J( return nullptr; } +/* + * Class: org_rocksdb_OptimisticTransactionDB + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_OptimisticTransactionDB_disposeInternal( + JNIEnv *, jobject, jlong jhandle) { + auto* optimistic_txn_db = + reinterpret_cast(jhandle); + assert(optimistic_txn_db != nullptr); + delete optimistic_txn_db; +} + +/* + * Class: org_rocksdb_OptimisticTransactionDB + * Method: closeDatabase + * Signature: (J)V + */ +void Java_org_rocksdb_OptimisticTransactionDB_closeDatabase( + JNIEnv* env, jclass, jlong jhandle) { + auto* optimistic_txn_db = + reinterpret_cast(jhandle); + assert(optimistic_txn_db != nullptr); + rocksdb::Status s = optimistic_txn_db->Close(); + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); +} + /* * Class: org_rocksdb_OptimisticTransactionDB * Method: beginTransaction * Signature: (JJ)J */ jlong Java_org_rocksdb_OptimisticTransactionDB_beginTransaction__JJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle) { auto* optimistic_txn_db = reinterpret_cast(jhandle); auto* write_options = @@ -193,8 +219,8 @@ jlong Java_org_rocksdb_OptimisticTransactionDB_beginTransaction__JJJ( * Signature: (JJJ)J */ jlong Java_org_rocksdb_OptimisticTransactionDB_beginTransaction_1withOld__JJJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle, jlong jold_txn_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle, + jlong jold_txn_handle) { auto* optimistic_txn_db = reinterpret_cast(jhandle); auto* write_options = @@ -218,9 +244,8 @@ jlong Java_org_rocksdb_OptimisticTransactionDB_beginTransaction_1withOld__JJJ( * Signature: (JJJJ)J */ jlong Java_org_rocksdb_OptimisticTransactionDB_beginTransaction_1withOld__JJJJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle, jlong joptimistic_txn_options_handle, - jlong jold_txn_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle, + jlong joptimistic_txn_options_handle, jlong jold_txn_handle) { 
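
The UInt64AddOperator binding added above exposes RocksDB's built-in 64-bit counter merge operator. On the C++ side it installs like any merge operator; the header path shown has moved between RocksDB versions, so treat it as approximate:

#include "rocksdb/options.h"
#include "utilities/merge_operators.h"  // location varies across versions

rocksdb::Options opts;
opts.merge_operator = rocksdb::MergeOperators::CreateUInt64AddOperator();
// Merge("counter", <8-byte little-endian delta>) now adds instead of overwriting.
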
auto* optimistic_txn_db = reinterpret_cast(jhandle); auto* write_options = @@ -245,21 +270,9 @@ jlong Java_org_rocksdb_OptimisticTransactionDB_beginTransaction_1withOld__JJJJ( * Method: getBaseDB * Signature: (J)J */ -jlong Java_org_rocksdb_OptimisticTransactionDB_getBaseDB(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_OptimisticTransactionDB_getBaseDB( + JNIEnv*, jobject, jlong jhandle) { auto* optimistic_txn_db = reinterpret_cast(jhandle); return reinterpret_cast(optimistic_txn_db->GetBaseDB()); } - -/* - * Class: org_rocksdb_OptimisticTransactionDB - * Method: disposeInternal - * Signature: (J)V - */ -void Java_org_rocksdb_OptimisticTransactionDB_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { - delete reinterpret_cast(jhandle); -} diff --git a/ceph/src/rocksdb/java/rocksjni/options.cc b/ceph/src/rocksdb/java/rocksjni/options.cc index 9aed80e1e..12f44b5eb 100644 --- a/ceph/src/rocksdb/java/rocksjni/options.cc +++ b/ceph/src/rocksdb/java/rocksjni/options.cc @@ -22,6 +22,7 @@ #include "rocksjni/comparatorjnicallback.h" #include "rocksjni/portal.h" #include "rocksjni/statisticsjni.h" +#include "rocksjni/table_filter_jnicallback.h" #include "rocksdb/comparator.h" #include "rocksdb/convenience.h" @@ -40,7 +41,8 @@ * Method: newOptions * Signature: ()J */ -jlong Java_org_rocksdb_Options_newOptions__(JNIEnv* /*env*/, jclass /*jcls*/) { +jlong Java_org_rocksdb_Options_newOptions__( + JNIEnv*, jclass) { auto* op = new rocksdb::Options(); return reinterpret_cast(op); } @@ -50,9 +52,8 @@ jlong Java_org_rocksdb_Options_newOptions__(JNIEnv* /*env*/, jclass /*jcls*/) { * Method: newOptions * Signature: (JJ)J */ -jlong Java_org_rocksdb_Options_newOptions__JJ(JNIEnv* /*env*/, jclass /*jcls*/, - jlong jdboptions, - jlong jcfoptions) { +jlong Java_org_rocksdb_Options_newOptions__JJ( + JNIEnv*, jclass, jlong jdboptions, jlong jcfoptions) { auto* dbOpt = reinterpret_cast(jdboptions); auto* cfOpt = reinterpret_cast(jcfoptions); @@ -65,8 +66,8 @@ jlong Java_org_rocksdb_Options_newOptions__JJ(JNIEnv* /*env*/, jclass /*jcls*/, * Method: copyOptions * Signature: (J)J */ -jlong Java_org_rocksdb_Options_copyOptions(JNIEnv* /*env*/, jclass /*jcls*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_copyOptions( + JNIEnv*, jclass, jlong jhandle) { auto new_opt = new rocksdb::Options(*(reinterpret_cast(jhandle))); return reinterpret_cast(new_opt); @@ -77,8 +78,8 @@ jlong Java_org_rocksdb_Options_copyOptions(JNIEnv* /*env*/, jclass /*jcls*/, * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_Options_disposeInternal(JNIEnv* /*env*/, jobject /*jobj*/, - jlong handle) { +void Java_org_rocksdb_Options_disposeInternal( + JNIEnv*, jobject, jlong handle) { auto* op = reinterpret_cast(handle); assert(op != nullptr); delete op; @@ -89,10 +90,8 @@ void Java_org_rocksdb_Options_disposeInternal(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setIncreaseParallelism * Signature: (JI)V */ -void Java_org_rocksdb_Options_setIncreaseParallelism(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint totalThreads) { +void Java_org_rocksdb_Options_setIncreaseParallelism( + JNIEnv*, jobject, jlong jhandle, jint totalThreads) { reinterpret_cast(jhandle)->IncreaseParallelism( static_cast(totalThreads)); } @@ -102,9 +101,8 @@ void Java_org_rocksdb_Options_setIncreaseParallelism(JNIEnv* /*env*/, * Method: setCreateIfMissing * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setCreateIfMissing(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, jboolean flag) 
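
The reorganised OptimisticTransactionDB bindings (an explicit disposeInternal plus a closeDatabase that surfaces the Close() status) wrap the usual C++ lifecycle, sketched here with a placeholder path:

#include "rocksdb/utilities/optimistic_transaction_db.h"
#include "rocksdb/utilities/transaction.h"

rocksdb::Options opts;
opts.create_if_missing = true;
rocksdb::OptimisticTransactionDB* txn_db = nullptr;
rocksdb::Status s =
    rocksdb::OptimisticTransactionDB::Open(opts, "/tmp/testdb", &txn_db);
rocksdb::Transaction* txn = txn_db->BeginTransaction(rocksdb::WriteOptions());
txn->Put("key", "value");
s = txn->Commit();  // conflict detection happens here, not at Put time
delete txn;
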
{ +void Java_org_rocksdb_Options_setCreateIfMissing( + JNIEnv*, jobject, jlong jhandle, jboolean flag) { reinterpret_cast(jhandle)->create_if_missing = flag; } @@ -113,9 +111,8 @@ void Java_org_rocksdb_Options_setCreateIfMissing(JNIEnv* /*env*/, * Method: createIfMissing * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_createIfMissing(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_createIfMissing( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->create_if_missing; } @@ -124,10 +121,8 @@ jboolean Java_org_rocksdb_Options_createIfMissing(JNIEnv* /*env*/, * Method: setCreateMissingColumnFamilies * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setCreateMissingColumnFamilies(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean flag) { +void Java_org_rocksdb_Options_setCreateMissingColumnFamilies( + JNIEnv*, jobject, jlong jhandle, jboolean flag) { reinterpret_cast(jhandle)->create_missing_column_families = flag; } @@ -137,9 +132,8 @@ void Java_org_rocksdb_Options_setCreateMissingColumnFamilies(JNIEnv* /*env*/, * Method: createMissingColumnFamilies * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_createMissingColumnFamilies(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_createMissingColumnFamilies( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->create_missing_column_families; } @@ -149,10 +143,8 @@ jboolean Java_org_rocksdb_Options_createMissingColumnFamilies(JNIEnv* /*env*/, * Method: setComparatorHandle * Signature: (JI)V */ -void Java_org_rocksdb_Options_setComparatorHandle__JI(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint builtinComparator) { +void Java_org_rocksdb_Options_setComparatorHandle__JI( + JNIEnv*, jobject, jlong jhandle, jint builtinComparator) { switch (builtinComparator) { case 1: reinterpret_cast(jhandle)->comparator = @@ -170,11 +162,9 @@ void Java_org_rocksdb_Options_setComparatorHandle__JI(JNIEnv* /*env*/, * Method: setComparatorHandle * Signature: (JJB)V */ -void Java_org_rocksdb_Options_setComparatorHandle__JJB(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jopt_handle, - jlong jcomparator_handle, - jbyte jcomparator_type) { +void Java_org_rocksdb_Options_setComparatorHandle__JJB( + JNIEnv*, jobject, jlong jopt_handle, jlong jcomparator_handle, + jbyte jcomparator_type) { rocksdb::Comparator* comparator = nullptr; switch (jcomparator_type) { // JAVA_COMPARATOR @@ -203,10 +193,8 @@ void Java_org_rocksdb_Options_setComparatorHandle__JJB(JNIEnv* /*env*/, * Method: setMergeOperatorName * Signature: (JJjava/lang/String)V */ -void Java_org_rocksdb_Options_setMergeOperatorName(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jstring jop_name) { +void Java_org_rocksdb_Options_setMergeOperatorName( + JNIEnv* env, jobject, jlong jhandle, jstring jop_name) { const char* op_name = env->GetStringUTFChars(jop_name, nullptr); if (op_name == nullptr) { // exception thrown: OutOfMemoryError @@ -225,23 +213,50 @@ void Java_org_rocksdb_Options_setMergeOperatorName(JNIEnv* env, * Method: setMergeOperator * Signature: (JJjava/lang/String)V */ -void Java_org_rocksdb_Options_setMergeOperator(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jlong mergeOperatorHandle) { +void Java_org_rocksdb_Options_setMergeOperator( + JNIEnv*, jobject, jlong jhandle, jlong mergeOperatorHandle) { reinterpret_cast(jhandle)->merge_operator = *(reinterpret_cast*>( mergeOperatorHandle)); } +/* + * Class: org_rocksdb_Options + * 
Method: setCompactionFilterHandle + * Signature: (JJ)V + */ +void Java_org_rocksdb_Options_setCompactionFilterHandle( + JNIEnv*, jobject, jlong jopt_handle, + jlong jcompactionfilter_handle) { + reinterpret_cast(jopt_handle)-> + compaction_filter = reinterpret_cast + (jcompactionfilter_handle); +} + +/* + * Class: org_rocksdb_Options + * Method: setCompactionFilterFactoryHandle + * Signature: (JJ)V + */ +void JNICALL Java_org_rocksdb_Options_setCompactionFilterFactoryHandle( + JNIEnv*, jobject, jlong jopt_handle, + jlong jcompactionfilterfactory_handle) { + auto* cff_factory = + reinterpret_cast *>( + jcompactionfilterfactory_handle); + reinterpret_cast(jopt_handle)-> + compaction_filter_factory = *cff_factory; +} + /* * Class: org_rocksdb_Options * Method: setWriteBufferSize * Signature: (JJ)I */ -void Java_org_rocksdb_Options_setWriteBufferSize(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, - jlong jwrite_buffer_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jwrite_buffer_size); +void Java_org_rocksdb_Options_setWriteBufferSize( + JNIEnv* env, jobject, jlong jhandle, jlong jwrite_buffer_size) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jwrite_buffer_size); if (s.ok()) { reinterpret_cast(jhandle)->write_buffer_size = jwrite_buffer_size; @@ -250,14 +265,27 @@ void Java_org_rocksdb_Options_setWriteBufferSize(JNIEnv* env, jobject /*jobj*/, } } +/* + * Class: org_rocksdb_Options + * Method: setWriteBufferManager + * Signature: (JJ)V + */ +void Java_org_rocksdb_Options_setWriteBufferManager( + JNIEnv*, jobject, jlong joptions_handle, + jlong jwrite_buffer_manager_handle) { + auto* write_buffer_manager = + reinterpret_cast *>(jwrite_buffer_manager_handle); + reinterpret_cast(joptions_handle)->write_buffer_manager = + *write_buffer_manager; +} + /* * Class: org_rocksdb_Options * Method: writeBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_writeBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_writeBufferSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->write_buffer_size; } @@ -267,7 +295,7 @@ jlong Java_org_rocksdb_Options_writeBufferSize(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setMaxWriteBufferNumber( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint jmax_write_buffer_number) { reinterpret_cast(jhandle)->max_write_buffer_number = jmax_write_buffer_number; @@ -278,9 +306,8 @@ void Java_org_rocksdb_Options_setMaxWriteBufferNumber( * Method: setStatistics * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setStatistics(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jlong jstatistics_handle) { +void Java_org_rocksdb_Options_setStatistics( + JNIEnv*, jobject, jlong jhandle, jlong jstatistics_handle) { auto* opt = reinterpret_cast(jhandle); auto* pSptr = reinterpret_cast*>( jstatistics_handle); @@ -292,8 +319,8 @@ void Java_org_rocksdb_Options_setStatistics(JNIEnv* /*env*/, jobject /*jobj*/, * Method: statistics * Signature: (J)J */ -jlong Java_org_rocksdb_Options_statistics(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_statistics( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); std::shared_ptr sptr = opt->statistics; if (sptr == nullptr) { @@ -310,9 +337,8 @@ jlong Java_org_rocksdb_Options_statistics(JNIEnv* /*env*/, jobject /*jobj*/, * Method: maxWriteBufferNumber * Signature: (J)I */ -jint 
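
setWriteBufferManager above shares a std::shared_ptr<rocksdb::WriteBufferManager> with the options object; the manager caps total memtable memory across column families and DB instances. A minimal sketch with an illustrative 512 MB cap:

#include <memory>
#include "rocksdb/options.h"
#include "rocksdb/write_buffer_manager.h"

rocksdb::Options opts;
opts.write_buffer_manager =
    std::make_shared<rocksdb::WriteBufferManager>(512 * 1024 * 1024);
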
Java_org_rocksdb_Options_maxWriteBufferNumber(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxWriteBufferNumber( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_write_buffer_number; } @@ -321,9 +347,8 @@ jint Java_org_rocksdb_Options_maxWriteBufferNumber(JNIEnv* /*env*/, * Method: errorIfExists * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_errorIfExists(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_errorIfExists( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->error_if_exists; } @@ -332,9 +357,8 @@ jboolean Java_org_rocksdb_Options_errorIfExists(JNIEnv* /*env*/, * Method: setErrorIfExists * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setErrorIfExists(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jboolean error_if_exists) { +void Java_org_rocksdb_Options_setErrorIfExists( + JNIEnv*, jobject, jlong jhandle, jboolean error_if_exists) { reinterpret_cast(jhandle)->error_if_exists = static_cast(error_if_exists); } @@ -344,9 +368,8 @@ void Java_org_rocksdb_Options_setErrorIfExists(JNIEnv* /*env*/, * Method: paranoidChecks * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_paranoidChecks(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_paranoidChecks( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->paranoid_checks; } @@ -355,9 +378,8 @@ jboolean Java_org_rocksdb_Options_paranoidChecks(JNIEnv* /*env*/, * Method: setParanoidChecks * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setParanoidChecks(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jboolean paranoid_checks) { +void Java_org_rocksdb_Options_setParanoidChecks( + JNIEnv*, jobject, jlong jhandle, jboolean paranoid_checks) { reinterpret_cast(jhandle)->paranoid_checks = static_cast(paranoid_checks); } @@ -367,8 +389,8 @@ void Java_org_rocksdb_Options_setParanoidChecks(JNIEnv* /*env*/, * Method: setEnv * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setEnv(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jlong jenv) { +void Java_org_rocksdb_Options_setEnv( + JNIEnv*, jobject, jlong jhandle, jlong jenv) { reinterpret_cast(jhandle)->env = reinterpret_cast(jenv); } @@ -378,10 +400,8 @@ void Java_org_rocksdb_Options_setEnv(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setMaxTotalWalSize * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setMaxTotalWalSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jmax_total_wal_size) { +void Java_org_rocksdb_Options_setMaxTotalWalSize( + JNIEnv*, jobject, jlong jhandle, jlong jmax_total_wal_size) { reinterpret_cast(jhandle)->max_total_wal_size = static_cast(jmax_total_wal_size); } @@ -391,9 +411,8 @@ void Java_org_rocksdb_Options_setMaxTotalWalSize(JNIEnv* /*env*/, * Method: maxTotalWalSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_maxTotalWalSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_maxTotalWalSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_total_wal_size; } @@ -402,8 +421,8 @@ jlong Java_org_rocksdb_Options_maxTotalWalSize(JNIEnv* /*env*/, * Method: maxOpenFiles * Signature: (J)I */ -jint Java_org_rocksdb_Options_maxOpenFiles(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxOpenFiles( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_open_files; } @@ -412,9 +431,8 @@ jint 
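
The statistics getter above hands back a handle to the shared_ptr stored on Options, or 0 when none is set. Installing one from C++ is a one-liner:

#include "rocksdb/options.h"
#include "rocksdb/statistics.h"

rocksdb::Options opts;
opts.statistics = rocksdb::CreateDBStatistics();
// opts.statistics->ToString() later dumps every ticker and histogram.
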
Java_org_rocksdb_Options_maxOpenFiles(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setMaxOpenFiles * Signature: (JI)V */ -void Java_org_rocksdb_Options_setMaxOpenFiles(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jint max_open_files) { +void Java_org_rocksdb_Options_setMaxOpenFiles( + JNIEnv*, jobject, jlong jhandle, jint max_open_files) { reinterpret_cast(jhandle)->max_open_files = static_cast(max_open_files); } @@ -425,8 +443,7 @@ void Java_org_rocksdb_Options_setMaxOpenFiles(JNIEnv* /*env*/, jobject /*jobj*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setMaxFileOpeningThreads( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jmax_file_opening_threads) { + JNIEnv*, jobject, jlong jhandle, jint jmax_file_opening_threads) { reinterpret_cast(jhandle)->max_file_opening_threads = static_cast(jmax_file_opening_threads); } @@ -436,9 +453,8 @@ void Java_org_rocksdb_Options_setMaxFileOpeningThreads( * Method: maxFileOpeningThreads * Signature: (J)I */ -jint Java_org_rocksdb_Options_maxFileOpeningThreads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxFileOpeningThreads( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->max_file_opening_threads); } @@ -448,8 +464,8 @@ jint Java_org_rocksdb_Options_maxFileOpeningThreads(JNIEnv* /*env*/, * Method: useFsync * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_useFsync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_useFsync( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->use_fsync; } @@ -458,8 +474,8 @@ jboolean Java_org_rocksdb_Options_useFsync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setUseFsync * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setUseFsync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jboolean use_fsync) { +void Java_org_rocksdb_Options_setUseFsync( + JNIEnv*, jobject, jlong jhandle, jboolean use_fsync) { reinterpret_cast(jhandle)->use_fsync = static_cast(use_fsync); } @@ -469,9 +485,9 @@ void Java_org_rocksdb_Options_setUseFsync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setDbPaths * Signature: (J[Ljava/lang/String;[J)V */ -void Java_org_rocksdb_Options_setDbPaths(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jobjectArray jpaths, - jlongArray jtarget_sizes) { +void Java_org_rocksdb_Options_setDbPaths( + JNIEnv* env, jobject, jlong jhandle, jobjectArray jpaths, + jlongArray jtarget_sizes) { std::vector db_paths; jlong* ptr_jtarget_size = env->GetLongArrayElements(jtarget_sizes, nullptr); if (ptr_jtarget_size == nullptr) { @@ -515,8 +531,8 @@ void Java_org_rocksdb_Options_setDbPaths(JNIEnv* env, jobject /*jobj*/, * Method: dbPathsLen * Signature: (J)J */ -jlong Java_org_rocksdb_Options_dbPathsLen(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_dbPathsLen( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->db_paths.size()); } @@ -526,9 +542,9 @@ jlong Java_org_rocksdb_Options_dbPathsLen(JNIEnv* /*env*/, jobject /*jobj*/, * Method: dbPaths * Signature: (J[Ljava/lang/String;[J)V */ -void Java_org_rocksdb_Options_dbPaths(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jobjectArray jpaths, - jlongArray jtarget_sizes) { +void Java_org_rocksdb_Options_dbPaths( + JNIEnv* env, jobject, jlong jhandle, jobjectArray jpaths, + jlongArray jtarget_sizes) { jlong* ptr_jtarget_size = env->GetLongArrayElements(jtarget_sizes, nullptr); if (ptr_jtarget_size == 
nullptr) { // exception thrown: OutOfMemoryError @@ -565,8 +581,8 @@ void Java_org_rocksdb_Options_dbPaths(JNIEnv* env, jobject /*jobj*/, * Method: dbLogDir * Signature: (J)Ljava/lang/String */ -jstring Java_org_rocksdb_Options_dbLogDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle) { +jstring Java_org_rocksdb_Options_dbLogDir( + JNIEnv* env, jobject, jlong jhandle) { return env->NewStringUTF( reinterpret_cast(jhandle)->db_log_dir.c_str()); } @@ -576,8 +592,8 @@ jstring Java_org_rocksdb_Options_dbLogDir(JNIEnv* env, jobject /*jobj*/, * Method: setDbLogDir * Signature: (JLjava/lang/String)V */ -void Java_org_rocksdb_Options_setDbLogDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jstring jdb_log_dir) { +void Java_org_rocksdb_Options_setDbLogDir( + JNIEnv* env, jobject, jlong jhandle, jstring jdb_log_dir) { const char* log_dir = env->GetStringUTFChars(jdb_log_dir, nullptr); if (log_dir == nullptr) { // exception thrown: OutOfMemoryError @@ -592,8 +608,8 @@ void Java_org_rocksdb_Options_setDbLogDir(JNIEnv* env, jobject /*jobj*/, * Method: walDir * Signature: (J)Ljava/lang/String */ -jstring Java_org_rocksdb_Options_walDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle) { +jstring Java_org_rocksdb_Options_walDir( + JNIEnv* env, jobject, jlong jhandle) { return env->NewStringUTF( reinterpret_cast(jhandle)->wal_dir.c_str()); } @@ -603,8 +619,8 @@ jstring Java_org_rocksdb_Options_walDir(JNIEnv* env, jobject /*jobj*/, * Method: setWalDir * Signature: (JLjava/lang/String)V */ -void Java_org_rocksdb_Options_setWalDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jstring jwal_dir) { +void Java_org_rocksdb_Options_setWalDir( + JNIEnv* env, jobject, jlong jhandle, jstring jwal_dir) { const char* wal_dir = env->GetStringUTFChars(jwal_dir, nullptr); if (wal_dir == nullptr) { // exception thrown: OutOfMemoryError @@ -619,9 +635,8 @@ void Java_org_rocksdb_Options_setWalDir(JNIEnv* env, jobject /*jobj*/, * Method: deleteObsoleteFilesPeriodMicros * Signature: (J)J */ -jlong Java_org_rocksdb_Options_deleteObsoleteFilesPeriodMicros(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_deleteObsoleteFilesPeriodMicros( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->delete_obsolete_files_period_micros; } @@ -632,7 +647,7 @@ jlong Java_org_rocksdb_Options_deleteObsoleteFilesPeriodMicros(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setDeleteObsoleteFilesPeriodMicros( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jlong micros) { + JNIEnv*, jobject, jlong jhandle, jlong micros) { reinterpret_cast(jhandle) ->delete_obsolete_files_period_micros = static_cast(micros); } @@ -642,10 +657,8 @@ void Java_org_rocksdb_Options_setDeleteObsoleteFilesPeriodMicros( * Method: setBaseBackgroundCompactions * Signature: (JI)V */ -void Java_org_rocksdb_Options_setBaseBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint max) { +void Java_org_rocksdb_Options_setBaseBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle, jint max) { reinterpret_cast(jhandle)->base_background_compactions = static_cast(max); } @@ -655,9 +668,8 @@ void Java_org_rocksdb_Options_setBaseBackgroundCompactions(JNIEnv* /*env*/, * Method: baseBackgroundCompactions * Signature: (J)I */ -jint Java_org_rocksdb_Options_baseBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_baseBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) 
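
setDbPaths/dbPaths marshal a Java String[] plus long[] into the std::vector<rocksdb::DbPath> seen here, and walDir maps straight onto DBOptions::wal_dir. The equivalent C++ configuration, with placeholder mount points:

#include "rocksdb/options.h"

rocksdb::Options opts;
opts.db_paths.emplace_back("/mnt/fast_ssd", 100ull << 30);  // first ~100 GB of SST data
opts.db_paths.emplace_back("/mnt/big_hdd", 1ull << 40);     // overflow target
opts.wal_dir = "/mnt/fast_ssd/wal";
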
->base_background_compactions; } @@ -667,9 +679,8 @@ jint Java_org_rocksdb_Options_baseBackgroundCompactions(JNIEnv* /*env*/, * Method: maxBackgroundCompactions * Signature: (J)I */ -jint Java_org_rocksdb_Options_maxBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_background_compactions; } @@ -679,10 +690,8 @@ jint Java_org_rocksdb_Options_maxBackgroundCompactions(JNIEnv* /*env*/, * Method: setMaxBackgroundCompactions * Signature: (JI)V */ -void Java_org_rocksdb_Options_setMaxBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint max) { +void Java_org_rocksdb_Options_setMaxBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle, jint max) { reinterpret_cast(jhandle)->max_background_compactions = static_cast(max); } @@ -692,9 +701,8 @@ void Java_org_rocksdb_Options_setMaxBackgroundCompactions(JNIEnv* /*env*/, * Method: setMaxSubcompactions * Signature: (JI)V */ -void Java_org_rocksdb_Options_setMaxSubcompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, jint max) { +void Java_org_rocksdb_Options_setMaxSubcompactions( + JNIEnv*, jobject, jlong jhandle, jint max) { reinterpret_cast(jhandle)->max_subcompactions = static_cast(max); } @@ -704,9 +712,8 @@ void Java_org_rocksdb_Options_setMaxSubcompactions(JNIEnv* /*env*/, * Method: maxSubcompactions * Signature: (J)I */ -jint Java_org_rocksdb_Options_maxSubcompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxSubcompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_subcompactions; } @@ -715,9 +722,8 @@ jint Java_org_rocksdb_Options_maxSubcompactions(JNIEnv* /*env*/, * Method: maxBackgroundFlushes * Signature: (J)I */ -jint Java_org_rocksdb_Options_maxBackgroundFlushes(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxBackgroundFlushes( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_background_flushes; } @@ -727,8 +733,7 @@ jint Java_org_rocksdb_Options_maxBackgroundFlushes(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setMaxBackgroundFlushes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint max_background_flushes) { + JNIEnv*, jobject, jlong jhandle, jint max_background_flushes) { reinterpret_cast(jhandle)->max_background_flushes = static_cast(max_background_flushes); } @@ -738,9 +743,8 @@ void Java_org_rocksdb_Options_setMaxBackgroundFlushes( * Method: maxBackgroundJobs * Signature: (J)I */ -jint Java_org_rocksdb_Options_maxBackgroundJobs(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_maxBackgroundJobs( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_background_jobs; } @@ -749,10 +753,8 @@ jint Java_org_rocksdb_Options_maxBackgroundJobs(JNIEnv* /*env*/, * Method: setMaxBackgroundJobs * Signature: (JI)V */ -void Java_org_rocksdb_Options_setMaxBackgroundJobs(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint max_background_jobs) { +void Java_org_rocksdb_Options_setMaxBackgroundJobs( + JNIEnv*, jobject, jlong jhandle, jint max_background_jobs) { reinterpret_cast(jhandle)->max_background_jobs = static_cast(max_background_jobs); } @@ -762,8 +764,8 @@ void Java_org_rocksdb_Options_setMaxBackgroundJobs(JNIEnv* /*env*/, * Method: maxLogFileSize * Signature: (J)J */ -jlong 
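
Several of the accessors in this stretch (base_background_compactions, max_background_compactions, max_background_flushes) predate the unified job limit that is also bound here; in C++, newer code typically sets only the combined knob and lets RocksDB split it between flushes and compactions:

#include "rocksdb/options.h"

rocksdb::Options opts;
opts.max_background_jobs = 4;  // unified flush + compaction concurrency
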
Java_org_rocksdb_Options_maxLogFileSize(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_maxLogFileSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_log_file_size; } @@ -772,10 +774,9 @@ jlong Java_org_rocksdb_Options_maxLogFileSize(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setMaxLogFileSize * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setMaxLogFileSize(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, - jlong max_log_file_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(max_log_file_size); +void Java_org_rocksdb_Options_setMaxLogFileSize( + JNIEnv* env, jobject, jlong jhandle, jlong max_log_file_size) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(max_log_file_size); if (s.ok()) { reinterpret_cast(jhandle)->max_log_file_size = max_log_file_size; @@ -789,9 +790,8 @@ void Java_org_rocksdb_Options_setMaxLogFileSize(JNIEnv* env, jobject /*jobj*/, * Method: logFileTimeToRoll * Signature: (J)J */ -jlong Java_org_rocksdb_Options_logFileTimeToRoll(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_logFileTimeToRoll( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->log_file_time_to_roll; } @@ -801,9 +801,9 @@ jlong Java_org_rocksdb_Options_logFileTimeToRoll(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setLogFileTimeToRoll( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, jlong log_file_time_to_roll) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(log_file_time_to_roll); + JNIEnv* env, jobject, jlong jhandle, jlong log_file_time_to_roll) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(log_file_time_to_roll); if (s.ok()) { reinterpret_cast(jhandle)->log_file_time_to_roll = log_file_time_to_roll; @@ -817,8 +817,8 @@ void Java_org_rocksdb_Options_setLogFileTimeToRoll( * Method: keepLogFileNum * Signature: (J)J */ -jlong Java_org_rocksdb_Options_keepLogFileNum(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_keepLogFileNum( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->keep_log_file_num; } @@ -827,10 +827,9 @@ jlong Java_org_rocksdb_Options_keepLogFileNum(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setKeepLogFileNum * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setKeepLogFileNum(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, - jlong keep_log_file_num) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(keep_log_file_num); +void Java_org_rocksdb_Options_setKeepLogFileNum( + JNIEnv* env, jobject, jlong jhandle, jlong keep_log_file_num) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(keep_log_file_num); if (s.ok()) { reinterpret_cast(jhandle)->keep_log_file_num = keep_log_file_num; @@ -844,9 +843,8 @@ void Java_org_rocksdb_Options_setKeepLogFileNum(JNIEnv* env, jobject /*jobj*/, * Method: recycleLogFileNum * Signature: (J)J */ -jlong Java_org_rocksdb_Options_recycleLogFileNum(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_recycleLogFileNum( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->recycle_log_file_num; } @@ -855,11 +853,9 @@ jlong Java_org_rocksdb_Options_recycleLogFileNum(JNIEnv* /*env*/, * Method: setRecycleLogFileNum * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setRecycleLogFileNum(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jlong recycle_log_file_num) { - rocksdb::Status s = 
rocksdb::check_if_jlong_fits_size_t(recycle_log_file_num);
+void Java_org_rocksdb_Options_setRecycleLogFileNum(
+    JNIEnv* env, jobject, jlong jhandle, jlong recycle_log_file_num) {
+  auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(recycle_log_file_num);
   if (s.ok()) {
     reinterpret_cast<rocksdb::Options*>(jhandle)->recycle_log_file_num =
         recycle_log_file_num;
@@ -873,9 +869,8 @@ void Java_org_rocksdb_Options_setRecycleLogFileNum(JNIEnv* env,
  * Method: maxManifestFileSize
  * Signature: (J)J
  */
-jlong Java_org_rocksdb_Options_maxManifestFileSize(JNIEnv* /*env*/,
-                                                   jobject /*jobj*/,
-                                                   jlong jhandle) {
+jlong Java_org_rocksdb_Options_maxManifestFileSize(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)->max_manifest_file_size;
 }
@@ -883,9 +878,8 @@
  * Method: memTableFactoryName
  * Signature: (J)Ljava/lang/String
  */
-jstring Java_org_rocksdb_Options_memTableFactoryName(JNIEnv* env,
-                                                     jobject /*jobj*/,
-                                                     jlong jhandle) {
+jstring Java_org_rocksdb_Options_memTableFactoryName(
+    JNIEnv* env, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   rocksdb::MemTableRepFactory* tf = opt->memtable_factory.get();
@@ -907,8 +901,7 @@ jstring Java_org_rocksdb_Options_memTableFactoryName(JNIEnv* env,
                                                      jobject /*jobj*/,
  * Signature: (JJ)V
  */
 void Java_org_rocksdb_Options_setMaxManifestFileSize(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jlong max_manifest_file_size) {
+    JNIEnv*, jobject, jlong jhandle, jlong max_manifest_file_size) {
   reinterpret_cast<rocksdb::Options*>(jhandle)->max_manifest_file_size =
       static_cast<int64_t>(max_manifest_file_size);
 }
@@ -917,10 +910,8 @@
  * Method: setMemTableFactory
  * Signature: (JJ)V
  */
-void Java_org_rocksdb_Options_setMemTableFactory(JNIEnv* /*env*/,
-                                                 jobject /*jobj*/,
-                                                 jlong jhandle,
-                                                 jlong jfactory_handle) {
+void Java_org_rocksdb_Options_setMemTableFactory(
+    JNIEnv*, jobject, jlong jhandle, jlong jfactory_handle) {
   reinterpret_cast<rocksdb::Options*>(jhandle)->memtable_factory.reset(
       reinterpret_cast<rocksdb::MemTableRepFactory*>(jfactory_handle));
 }
@@ -930,9 +921,8 @@
  * Method: setRateLimiter
  * Signature: (JJ)V
  */
-void Java_org_rocksdb_Options_setRateLimiter(JNIEnv* /*env*/, jobject /*jobj*/,
-                                             jlong jhandle,
-                                             jlong jrate_limiter_handle) {
+void Java_org_rocksdb_Options_setRateLimiter(
+    JNIEnv*, jobject, jlong jhandle, jlong jrate_limiter_handle) {
   std::shared_ptr<rocksdb::RateLimiter>* pRateLimiter =
       reinterpret_cast<std::shared_ptr<rocksdb::RateLimiter>*>(
           jrate_limiter_handle);
@@ -945,8 +935,7 @@ void Java_org_rocksdb_Options_setRateLimiter(JNIEnv* /*env*/, jobject /*jobj*/,
  * Signature: (JJ)V
  */
 void Java_org_rocksdb_Options_setSstFileManager(
-    JNIEnv* /*env*/, jobject /*job*/, jlong jhandle,
-    jlong jsst_file_manager_handle) {
+    JNIEnv*, jobject, jlong jhandle, jlong jsst_file_manager_handle) {
   auto* sptr_sst_file_manager =
       reinterpret_cast<std::shared_ptr<rocksdb::SstFileManager>*>(
          jsst_file_manager_handle);
@@ -959,8 +948,8 @@
  * Method: setLogger
  * Signature: (JJ)V
  */
-void Java_org_rocksdb_Options_setLogger(JNIEnv* /*env*/, jobject /*jobj*/,
-                                        jlong jhandle, jlong jlogger_handle) {
+void Java_org_rocksdb_Options_setLogger(
+    JNIEnv*, jobject, jlong jhandle, jlong jlogger_handle) {
   std::shared_ptr<rocksdb::LoggerJniCallback>* pLogger =
       reinterpret_cast<std::shared_ptr<rocksdb::LoggerJniCallback>*>(
           jlogger_handle);
@@ -972,8 +961,8 @@
  * Method: setInfoLogLevel
  * Signature: (JB)V
  */
-void Java_org_rocksdb_Options_setInfoLogLevel(JNIEnv* /*env*/, jobject /*jobj*/,
-                                              jlong jhandle,
jbyte jlog_level) { +void Java_org_rocksdb_Options_setInfoLogLevel( + JNIEnv*, jobject, jlong jhandle, jbyte jlog_level) { reinterpret_cast(jhandle)->info_log_level = static_cast(jlog_level); } @@ -983,8 +972,8 @@ void Java_org_rocksdb_Options_setInfoLogLevel(JNIEnv* /*env*/, jobject /*jobj*/, * Method: infoLogLevel * Signature: (J)B */ -jbyte Java_org_rocksdb_Options_infoLogLevel(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_Options_infoLogLevel( + JNIEnv*, jobject, jlong jhandle) { return static_cast( reinterpret_cast(jhandle)->info_log_level); } @@ -994,9 +983,8 @@ jbyte Java_org_rocksdb_Options_infoLogLevel(JNIEnv* /*env*/, jobject /*jobj*/, * Method: tableCacheNumshardbits * Signature: (J)I */ -jint Java_org_rocksdb_Options_tableCacheNumshardbits(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_tableCacheNumshardbits( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->table_cache_numshardbits; } @@ -1006,8 +994,7 @@ jint Java_org_rocksdb_Options_tableCacheNumshardbits(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setTableCacheNumshardbits( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint table_cache_numshardbits) { + JNIEnv*, jobject, jlong jhandle, jint table_cache_numshardbits) { reinterpret_cast(jhandle)->table_cache_numshardbits = static_cast(table_cache_numshardbits); } @@ -1017,7 +1004,7 @@ void Java_org_rocksdb_Options_setTableCacheNumshardbits( * Signature: (JI)V */ void Java_org_rocksdb_Options_useFixedLengthPrefixExtractor( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jprefix_length) { + JNIEnv*, jobject, jlong jhandle, jint jprefix_length) { reinterpret_cast(jhandle)->prefix_extractor.reset( rocksdb::NewFixedPrefixTransform(static_cast(jprefix_length))); } @@ -1026,10 +1013,8 @@ void Java_org_rocksdb_Options_useFixedLengthPrefixExtractor( * Method: useCappedPrefixExtractor * Signature: (JI)V */ -void Java_org_rocksdb_Options_useCappedPrefixExtractor(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint jprefix_length) { +void Java_org_rocksdb_Options_useCappedPrefixExtractor( + JNIEnv*, jobject, jlong jhandle, jint jprefix_length) { reinterpret_cast(jhandle)->prefix_extractor.reset( rocksdb::NewCappedPrefixTransform(static_cast(jprefix_length))); } @@ -1039,8 +1024,8 @@ void Java_org_rocksdb_Options_useCappedPrefixExtractor(JNIEnv* /*env*/, * Method: walTtlSeconds * Signature: (J)J */ -jlong Java_org_rocksdb_Options_walTtlSeconds(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_walTtlSeconds( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->WAL_ttl_seconds; } @@ -1049,9 +1034,8 @@ jlong Java_org_rocksdb_Options_walTtlSeconds(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setWalTtlSeconds * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setWalTtlSeconds(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jlong WAL_ttl_seconds) { +void Java_org_rocksdb_Options_setWalTtlSeconds( + JNIEnv*, jobject, jlong jhandle, jlong WAL_ttl_seconds) { reinterpret_cast(jhandle)->WAL_ttl_seconds = static_cast(WAL_ttl_seconds); } @@ -1061,8 +1045,8 @@ void Java_org_rocksdb_Options_setWalTtlSeconds(JNIEnv* /*env*/, * Method: walTtlSeconds * Signature: (J)J */ -jlong Java_org_rocksdb_Options_walSizeLimitMB(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_walSizeLimitMB( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->WAL_size_limit_MB; 
} @@ -1071,9 +1055,8 @@ jlong Java_org_rocksdb_Options_walSizeLimitMB(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setWalSizeLimitMB * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setWalSizeLimitMB(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jlong WAL_size_limit_MB) { +void Java_org_rocksdb_Options_setWalSizeLimitMB( + JNIEnv*, jobject, jlong jhandle, jlong WAL_size_limit_MB) { reinterpret_cast(jhandle)->WAL_size_limit_MB = static_cast(WAL_size_limit_MB); } @@ -1083,9 +1066,8 @@ void Java_org_rocksdb_Options_setWalSizeLimitMB(JNIEnv* /*env*/, * Method: manifestPreallocationSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_manifestPreallocationSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_manifestPreallocationSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->manifest_preallocation_size; } @@ -1096,8 +1078,8 @@ jlong Java_org_rocksdb_Options_manifestPreallocationSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setManifestPreallocationSize( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, jlong preallocation_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(preallocation_size); + JNIEnv* env, jobject, jlong jhandle, jlong preallocation_size) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(preallocation_size); if (s.ok()) { reinterpret_cast(jhandle)->manifest_preallocation_size = preallocation_size; @@ -1110,11 +1092,12 @@ void Java_org_rocksdb_Options_setManifestPreallocationSize( * Method: setTableFactory * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setTableFactory(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jlong jfactory_handle) { - reinterpret_cast(jhandle)->table_factory.reset( - reinterpret_cast(jfactory_handle)); +void Java_org_rocksdb_Options_setTableFactory( + JNIEnv*, jobject, jlong jhandle, jlong jtable_factory_handle) { + auto* options = reinterpret_cast(jhandle); + auto* table_factory = + reinterpret_cast(jtable_factory_handle); + options->table_factory.reset(table_factory); } /* @@ -1122,9 +1105,8 @@ void Java_org_rocksdb_Options_setTableFactory(JNIEnv* /*env*/, jobject /*jobj*/, * Method: allowMmapReads * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_allowMmapReads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_allowMmapReads( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->allow_mmap_reads; } @@ -1133,9 +1115,8 @@ jboolean Java_org_rocksdb_Options_allowMmapReads(JNIEnv* /*env*/, * Method: setAllowMmapReads * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setAllowMmapReads(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jboolean allow_mmap_reads) { +void Java_org_rocksdb_Options_setAllowMmapReads( + JNIEnv*, jobject, jlong jhandle, jboolean allow_mmap_reads) { reinterpret_cast(jhandle)->allow_mmap_reads = static_cast(allow_mmap_reads); } @@ -1145,9 +1126,8 @@ void Java_org_rocksdb_Options_setAllowMmapReads(JNIEnv* /*env*/, * Method: allowMmapWrites * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_allowMmapWrites(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_allowMmapWrites( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->allow_mmap_writes; } @@ -1156,10 +1136,8 @@ jboolean Java_org_rocksdb_Options_allowMmapWrites(JNIEnv* /*env*/, * Method: setAllowMmapWrites * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setAllowMmapWrites(JNIEnv* /*env*/, - jobject 
/*jobj*/, - jlong jhandle, - jboolean allow_mmap_writes) { +void Java_org_rocksdb_Options_setAllowMmapWrites( + JNIEnv*, jobject, jlong jhandle, jboolean allow_mmap_writes) { reinterpret_cast(jhandle)->allow_mmap_writes = static_cast(allow_mmap_writes); } @@ -1169,9 +1147,8 @@ void Java_org_rocksdb_Options_setAllowMmapWrites(JNIEnv* /*env*/, * Method: useDirectReads * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_useDirectReads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_useDirectReads( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->use_direct_reads; } @@ -1180,9 +1157,8 @@ jboolean Java_org_rocksdb_Options_useDirectReads(JNIEnv* /*env*/, * Method: setUseDirectReads * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setUseDirectReads(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jboolean use_direct_reads) { +void Java_org_rocksdb_Options_setUseDirectReads( + JNIEnv*, jobject, jlong jhandle, jboolean use_direct_reads) { reinterpret_cast(jhandle)->use_direct_reads = static_cast(use_direct_reads); } @@ -1193,7 +1169,7 @@ void Java_org_rocksdb_Options_setUseDirectReads(JNIEnv* /*env*/, * Signature: (J)Z */ jboolean Java_org_rocksdb_Options_useDirectIoForFlushAndCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->use_direct_io_for_flush_and_compaction; } @@ -1204,7 +1180,7 @@ jboolean Java_org_rocksdb_Options_useDirectIoForFlushAndCompaction( * Signature: (JZ)V */ void Java_org_rocksdb_Options_setUseDirectIoForFlushAndCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean use_direct_io_for_flush_and_compaction) { reinterpret_cast(jhandle) ->use_direct_io_for_flush_and_compaction = @@ -1216,9 +1192,8 @@ void Java_org_rocksdb_Options_setUseDirectIoForFlushAndCompaction( * Method: setAllowFAllocate * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setAllowFAllocate(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jboolean jallow_fallocate) { +void Java_org_rocksdb_Options_setAllowFAllocate( + JNIEnv*, jobject, jlong jhandle, jboolean jallow_fallocate) { reinterpret_cast(jhandle)->allow_fallocate = static_cast(jallow_fallocate); } @@ -1228,9 +1203,8 @@ void Java_org_rocksdb_Options_setAllowFAllocate(JNIEnv* /*env*/, * Method: allowFAllocate * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_allowFAllocate(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_allowFAllocate( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->allow_fallocate); } @@ -1240,9 +1214,8 @@ jboolean Java_org_rocksdb_Options_allowFAllocate(JNIEnv* /*env*/, * Method: isFdCloseOnExec * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_isFdCloseOnExec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_isFdCloseOnExec( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->is_fd_close_on_exec; } @@ -1251,10 +1224,8 @@ jboolean Java_org_rocksdb_Options_isFdCloseOnExec(JNIEnv* /*env*/, * Method: setIsFdCloseOnExec * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setIsFdCloseOnExec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean is_fd_close_on_exec) { +void Java_org_rocksdb_Options_setIsFdCloseOnExec( + JNIEnv*, jobject, jlong jhandle, jboolean is_fd_close_on_exec) { reinterpret_cast(jhandle)->is_fd_close_on_exec = 
static_cast(is_fd_close_on_exec); } @@ -1264,9 +1235,8 @@ void Java_org_rocksdb_Options_setIsFdCloseOnExec(JNIEnv* /*env*/, * Method: statsDumpPeriodSec * Signature: (J)I */ -jint Java_org_rocksdb_Options_statsDumpPeriodSec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_statsDumpPeriodSec( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->stats_dump_period_sec; } @@ -1276,7 +1246,7 @@ jint Java_org_rocksdb_Options_statsDumpPeriodSec(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setStatsDumpPeriodSec( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint stats_dump_period_sec) { reinterpret_cast(jhandle)->stats_dump_period_sec = static_cast(stats_dump_period_sec); @@ -1287,9 +1257,8 @@ void Java_org_rocksdb_Options_setStatsDumpPeriodSec( * Method: adviseRandomOnOpen * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_adviseRandomOnOpen(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_adviseRandomOnOpen( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->advise_random_on_open; } @@ -1299,7 +1268,7 @@ jboolean Java_org_rocksdb_Options_adviseRandomOnOpen(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setAdviseRandomOnOpen( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean advise_random_on_open) { reinterpret_cast(jhandle)->advise_random_on_open = static_cast(advise_random_on_open); @@ -1311,7 +1280,7 @@ void Java_org_rocksdb_Options_setAdviseRandomOnOpen( * Signature: (JJ)V */ void Java_org_rocksdb_Options_setDbWriteBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jdb_write_buffer_size) { auto* opt = reinterpret_cast(jhandle); opt->db_write_buffer_size = static_cast(jdb_write_buffer_size); @@ -1322,9 +1291,8 @@ void Java_org_rocksdb_Options_setDbWriteBufferSize( * Method: dbWriteBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_dbWriteBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_dbWriteBufferSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->db_write_buffer_size); } @@ -1335,7 +1303,7 @@ jlong Java_org_rocksdb_Options_dbWriteBufferSize(JNIEnv* /*env*/, * Signature: (JB)V */ void Java_org_rocksdb_Options_setAccessHintOnCompactionStart( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jbyte jaccess_hint_value) { auto* opt = reinterpret_cast(jhandle); opt->access_hint_on_compaction_start = @@ -1347,9 +1315,8 @@ void Java_org_rocksdb_Options_setAccessHintOnCompactionStart( * Method: accessHintOnCompactionStart * Signature: (J)B */ -jbyte Java_org_rocksdb_Options_accessHintOnCompactionStart(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_Options_accessHintOnCompactionStart( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return rocksdb::AccessHintJni::toJavaAccessHint( opt->access_hint_on_compaction_start); @@ -1361,7 +1328,7 @@ jbyte Java_org_rocksdb_Options_accessHintOnCompactionStart(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setNewTableReaderForCompactionInputs( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean jnew_table_reader_for_compaction_inputs) { auto* opt = reinterpret_cast(jhandle); 
opt->new_table_reader_for_compaction_inputs = @@ -1374,7 +1341,7 @@ void Java_org_rocksdb_Options_setNewTableReaderForCompactionInputs( * Signature: (J)Z */ jboolean Java_org_rocksdb_Options_newTableReaderForCompactionInputs( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->new_table_reader_for_compaction_inputs); } @@ -1385,7 +1352,7 @@ jboolean Java_org_rocksdb_Options_newTableReaderForCompactionInputs( * Signature: (JJ)V */ void Java_org_rocksdb_Options_setCompactionReadaheadSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jcompaction_readahead_size) { auto* opt = reinterpret_cast(jhandle); opt->compaction_readahead_size = @@ -1397,9 +1364,8 @@ void Java_org_rocksdb_Options_setCompactionReadaheadSize( * Method: compactionReadaheadSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_compactionReadaheadSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_compactionReadaheadSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->compaction_readahead_size); } @@ -1410,8 +1376,7 @@ jlong Java_org_rocksdb_Options_compactionReadaheadSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setRandomAccessMaxBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jrandom_access_max_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jlong jrandom_access_max_buffer_size) { auto* opt = reinterpret_cast(jhandle); opt->random_access_max_buffer_size = static_cast(jrandom_access_max_buffer_size); @@ -1422,9 +1387,8 @@ void Java_org_rocksdb_Options_setRandomAccessMaxBufferSize( * Method: randomAccessMaxBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_randomAccessMaxBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_randomAccessMaxBufferSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->random_access_max_buffer_size); } @@ -1435,7 +1399,7 @@ jlong Java_org_rocksdb_Options_randomAccessMaxBufferSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setWritableFileMaxBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jwritable_file_max_buffer_size) { auto* opt = reinterpret_cast(jhandle); opt->writable_file_max_buffer_size = @@ -1447,9 +1411,8 @@ void Java_org_rocksdb_Options_setWritableFileMaxBufferSize( * Method: writableFileMaxBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_writableFileMaxBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_writableFileMaxBufferSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->writable_file_max_buffer_size); } @@ -1459,9 +1422,8 @@ jlong Java_org_rocksdb_Options_writableFileMaxBufferSize(JNIEnv* /*env*/, * Method: useAdaptiveMutex * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_useAdaptiveMutex(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_useAdaptiveMutex( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->use_adaptive_mutex; } @@ -1470,10 +1432,8 @@ jboolean Java_org_rocksdb_Options_useAdaptiveMutex(JNIEnv* /*env*/, * Method: setUseAdaptiveMutex * Signature: (JZ)V */ -void 
Java_org_rocksdb_Options_setUseAdaptiveMutex(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean use_adaptive_mutex) { +void Java_org_rocksdb_Options_setUseAdaptiveMutex( + JNIEnv*, jobject, jlong jhandle, jboolean use_adaptive_mutex) { reinterpret_cast(jhandle)->use_adaptive_mutex = static_cast(use_adaptive_mutex); } @@ -1483,8 +1443,8 @@ void Java_org_rocksdb_Options_setUseAdaptiveMutex(JNIEnv* /*env*/, * Method: bytesPerSync * Signature: (J)J */ -jlong Java_org_rocksdb_Options_bytesPerSync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_bytesPerSync( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->bytes_per_sync; } @@ -1493,9 +1453,8 @@ jlong Java_org_rocksdb_Options_bytesPerSync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setBytesPerSync * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setBytesPerSync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jlong bytes_per_sync) { +void Java_org_rocksdb_Options_setBytesPerSync( + JNIEnv*, jobject, jlong jhandle, jlong bytes_per_sync) { reinterpret_cast(jhandle)->bytes_per_sync = static_cast(bytes_per_sync); } @@ -1505,10 +1464,8 @@ void Java_org_rocksdb_Options_setBytesPerSync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setWalBytesPerSync * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setWalBytesPerSync(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jwal_bytes_per_sync) { +void Java_org_rocksdb_Options_setWalBytesPerSync( + JNIEnv*, jobject, jlong jhandle, jlong jwal_bytes_per_sync) { reinterpret_cast(jhandle)->wal_bytes_per_sync = static_cast(jwal_bytes_per_sync); } @@ -1518,9 +1475,8 @@ void Java_org_rocksdb_Options_setWalBytesPerSync(JNIEnv* /*env*/, * Method: walBytesPerSync * Signature: (J)J */ -jlong Java_org_rocksdb_Options_walBytesPerSync(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_walBytesPerSync( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->wal_bytes_per_sync); } @@ -1531,8 +1487,7 @@ jlong Java_org_rocksdb_Options_walBytesPerSync(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setEnableThreadTracking( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jenable_thread_tracking) { + JNIEnv*, jobject, jlong jhandle, jboolean jenable_thread_tracking) { auto* opt = reinterpret_cast(jhandle); opt->enable_thread_tracking = static_cast(jenable_thread_tracking); } @@ -1542,9 +1497,8 @@ void Java_org_rocksdb_Options_setEnableThreadTracking( * Method: enableThreadTracking * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_enableThreadTracking(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_enableThreadTracking( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->enable_thread_tracking); } @@ -1554,10 +1508,8 @@ jboolean Java_org_rocksdb_Options_enableThreadTracking(JNIEnv* /*env*/, * Method: setDelayedWriteRate * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setDelayedWriteRate(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jdelayed_write_rate) { +void Java_org_rocksdb_Options_setDelayedWriteRate( + JNIEnv*, jobject, jlong jhandle, jlong jdelayed_write_rate) { auto* opt = reinterpret_cast(jhandle); opt->delayed_write_rate = static_cast(jdelayed_write_rate); } @@ -1567,22 +1519,41 @@ void Java_org_rocksdb_Options_setDelayedWriteRate(JNIEnv* /*env*/, * Method: delayedWriteRate * Signature: (J)J */ 
-jlong Java_org_rocksdb_Options_delayedWriteRate(JNIEnv* /*env*/,
-                                                jobject /*jobj*/,
-                                                jlong jhandle) {
+jlong Java_org_rocksdb_Options_delayedWriteRate(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jlong>(opt->delayed_write_rate);
 }
+/*
+ * Class: org_rocksdb_Options
+ * Method: setEnablePipelinedWrite
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_Options_setEnablePipelinedWrite(
+    JNIEnv*, jobject, jlong jhandle, jboolean jenable_pipelined_write) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  opt->enable_pipelined_write = jenable_pipelined_write == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: enablePipelinedWrite
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_Options_enablePipelinedWrite(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return static_cast<jboolean>(opt->enable_pipelined_write);
+}
+
 /*
  * Class: org_rocksdb_Options
  * Method: setAllowConcurrentMemtableWrite
  * Signature: (JZ)V
  */
-void Java_org_rocksdb_Options_setAllowConcurrentMemtableWrite(JNIEnv* /*env*/,
-                                                              jobject /*jobj*/,
-                                                              jlong jhandle,
-                                                              jboolean allow) {
+void Java_org_rocksdb_Options_setAllowConcurrentMemtableWrite(
+    JNIEnv*, jobject, jlong jhandle, jboolean allow) {
   reinterpret_cast<rocksdb::Options*>(jhandle)
       ->allow_concurrent_memtable_write = static_cast<bool>(allow);
 }
@@ -1592,9 +1563,8 @@ void Java_org_rocksdb_Options_setAllowConcurrentMemtableWrite(JNIEnv* /*env*/,
  * Method: allowConcurrentMemtableWrite
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_allowConcurrentMemtableWrite(JNIEnv* /*env*/,
-                                                               jobject /*jobj*/,
-                                                               jlong jhandle) {
+jboolean Java_org_rocksdb_Options_allowConcurrentMemtableWrite(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->allow_concurrent_memtable_write;
 }
@@ -1605,7 +1575,7 @@ jboolean Java_org_rocksdb_Options_allowConcurrentMemtableWrite(JNIEnv* /*env*/,
  * Signature: (JZ)V
  */
 void Java_org_rocksdb_Options_setEnableWriteThreadAdaptiveYield(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jboolean yield) {
+    JNIEnv*, jobject, jlong jhandle, jboolean yield) {
   reinterpret_cast<rocksdb::Options*>(jhandle)
       ->enable_write_thread_adaptive_yield = static_cast<bool>(yield);
 }
@@ -1616,7 +1586,7 @@ void Java_org_rocksdb_Options_setEnableWriteThreadAdaptiveYield(
  * Signature: (J)Z
  */
 jboolean Java_org_rocksdb_Options_enableWriteThreadAdaptiveYield(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) {
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->enable_write_thread_adaptive_yield;
 }
@@ -1626,10 +1596,8 @@ jboolean Java_org_rocksdb_Options_enableWriteThreadAdaptiveYield(
  * Method: setWriteThreadMaxYieldUsec
  * Signature: (JJ)V
  */
-void Java_org_rocksdb_Options_setWriteThreadMaxYieldUsec(JNIEnv* /*env*/,
-                                                         jobject /*jobj*/,
-                                                         jlong jhandle,
-                                                         jlong max) {
+void Java_org_rocksdb_Options_setWriteThreadMaxYieldUsec(
+    JNIEnv*, jobject, jlong jhandle, jlong max) {
   reinterpret_cast<rocksdb::Options*>(jhandle)->write_thread_max_yield_usec =
       static_cast<int64_t>(max);
 }
@@ -1639,9 +1607,8 @@ void Java_org_rocksdb_Options_setWriteThreadMaxYieldUsec(JNIEnv* /*env*/,
  * Method: writeThreadMaxYieldUsec
  * Signature: (J)J
  */
-jlong Java_org_rocksdb_Options_writeThreadMaxYieldUsec(JNIEnv* /*env*/,
-                                                       jobject /*jobj*/,
-                                                       jlong jhandle) {
+jlong Java_org_rocksdb_Options_writeThreadMaxYieldUsec(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->write_thread_max_yield_usec;
 }
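The hunk above wires DBOptions::enable_pipelined_write through JNI. A minimal Java-side sketch, assuming the org.rocksdb.Options wrapper exposes the setter/getter pair named in the "Method:" comments above (the Java class itself is outside this file):

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;

    public final class PipelinedWriteExample {
      static { RocksDB.loadLibrary(); }  // loads the native library built from options.cc

      public static void main(String[] args) {
        try (final Options opts = new Options()) {
          // Crosses into Java_org_rocksdb_Options_setEnablePipelinedWrite, which
          // flips DBOptions::enable_pipelined_write on the native handle.
          opts.setEnablePipelinedWrite(true);
          assert opts.enablePipelinedWrite();
        }
      }
    }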
@@ -1651,10 +1618,8 @@ jlong Java_org_rocksdb_Options_writeThreadMaxYieldUsec(JNIEnv* /*env*/,
  * Method: setWriteThreadSlowYieldUsec
  * Signature: (JJ)V
  */
-void Java_org_rocksdb_Options_setWriteThreadSlowYieldUsec(JNIEnv* /*env*/,
-                                                          jobject /*jobj*/,
-                                                          jlong jhandle,
-                                                          jlong slow) {
+void Java_org_rocksdb_Options_setWriteThreadSlowYieldUsec(
+    JNIEnv*, jobject, jlong jhandle, jlong slow) {
   reinterpret_cast<rocksdb::Options*>(jhandle)->write_thread_slow_yield_usec =
       static_cast<int64_t>(slow);
 }
@@ -1664,9 +1629,8 @@ void Java_org_rocksdb_Options_setWriteThreadSlowYieldUsec(JNIEnv* /*env*/,
  * Method: writeThreadSlowYieldUsec
  * Signature: (J)J
  */
-jlong Java_org_rocksdb_Options_writeThreadSlowYieldUsec(JNIEnv* /*env*/,
-                                                        jobject /*jobj*/,
-                                                        jlong jhandle) {
+jlong Java_org_rocksdb_Options_writeThreadSlowYieldUsec(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->write_thread_slow_yield_usec;
 }
@@ -1677,7 +1641,7 @@ jlong Java_org_rocksdb_Options_writeThreadSlowYieldUsec(JNIEnv* /*env*/,
  * Signature: (JZ)V
  */
 void Java_org_rocksdb_Options_setSkipStatsUpdateOnDbOpen(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
+    JNIEnv*, jobject, jlong jhandle,
     jboolean jskip_stats_update_on_db_open) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->skip_stats_update_on_db_open =
@@ -1689,9 +1653,8 @@ void Java_org_rocksdb_Options_setSkipStatsUpdateOnDbOpen(
  * Method: skipStatsUpdateOnDbOpen
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_skipStatsUpdateOnDbOpen(JNIEnv* /*env*/,
-                                                          jobject /*jobj*/,
-                                                          jlong jhandle) {
+jboolean Java_org_rocksdb_Options_skipStatsUpdateOnDbOpen(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jboolean>(opt->skip_stats_update_on_db_open);
 }
@@ -1702,7 +1665,7 @@ jboolean Java_org_rocksdb_Options_skipStatsUpdateOnDbOpen(JNIEnv* /*env*/,
  * Signature: (JB)V
  */
 void Java_org_rocksdb_Options_setWalRecoveryMode(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
+    JNIEnv*, jobject, jlong jhandle,
     jbyte jwal_recovery_mode_value) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->wal_recovery_mode = rocksdb::WALRecoveryModeJni::toCppWALRecoveryMode(
@@ -1714,9 +1677,8 @@ void Java_org_rocksdb_Options_setWalRecoveryMode(
  * Method: walRecoveryMode
  * Signature: (J)B
  */
-jbyte Java_org_rocksdb_Options_walRecoveryMode(JNIEnv* /*env*/,
-                                               jobject /*jobj*/,
-                                               jlong jhandle) {
+jbyte Java_org_rocksdb_Options_walRecoveryMode(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return rocksdb::WALRecoveryModeJni::toJavaWALRecoveryMode(
       opt->wal_recovery_mode);
@@ -1727,8 +1689,8 @@
  * Method: setAllow2pc
  * Signature: (JZ)V
  */
-void Java_org_rocksdb_Options_setAllow2pc(JNIEnv* /*env*/, jobject /*jobj*/,
-                                          jlong jhandle, jboolean jallow_2pc) {
+void Java_org_rocksdb_Options_setAllow2pc(
+    JNIEnv*, jobject, jlong jhandle, jboolean jallow_2pc) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->allow_2pc = static_cast<bool>(jallow_2pc);
 }
@@ -1738,8 +1700,8 @@
  * Method: allow2pc
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_allow2pc(JNIEnv* /*env*/, jobject /*jobj*/,
-                                           jlong jhandle) {
+jboolean Java_org_rocksdb_Options_allow2pc(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jboolean>(opt->allow_2pc);
 }
@@ -1749,23 +1711,35 @@ jboolean Java_org_rocksdb_Options_allow2pc(JNIEnv* /*env*/, jobject /*jobj*/,
  * Method: setRowCache
  * Signature: (JJ)V
  */
-void Java_org_rocksdb_Options_setRowCache(JNIEnv* /*env*/, jobject /*jobj*/,
-                                          jlong jhandle,
-                                          jlong jrow_cache_handle) {
+void Java_org_rocksdb_Options_setRowCache(
+    JNIEnv*, jobject, jlong jhandle, jlong jrow_cache_handle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   auto* row_cache =
       reinterpret_cast<std::shared_ptr<rocksdb::Cache>*>(jrow_cache_handle);
   opt->row_cache = *row_cache;
 }
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: setWalFilter
+ * Signature: (JJ)V
+ */
+void Java_org_rocksdb_Options_setWalFilter(
+    JNIEnv*, jobject, jlong jhandle, jlong jwal_filter_handle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  auto* wal_filter =
+      reinterpret_cast<rocksdb::WalFilterJniCallback*>(jwal_filter_handle);
+  opt->wal_filter = wal_filter;
+}
+
 /*
  * Class: org_rocksdb_Options
  * Method: setFailIfOptionsFileError
  * Signature: (JZ)V
  */
 void Java_org_rocksdb_Options_setFailIfOptionsFileError(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jboolean jfail_if_options_file_error) {
+    JNIEnv*, jobject, jlong jhandle, jboolean jfail_if_options_file_error) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->fail_if_options_file_error =
       static_cast<bool>(jfail_if_options_file_error);
@@ -1776,9 +1750,8 @@ void Java_org_rocksdb_Options_setFailIfOptionsFileError(
  * Method: failIfOptionsFileError
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_failIfOptionsFileError(JNIEnv* /*env*/,
-                                                         jobject /*jobj*/,
-                                                         jlong jhandle) {
+jboolean Java_org_rocksdb_Options_failIfOptionsFileError(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jboolean>(opt->fail_if_options_file_error);
 }
@@ -1788,10 +1761,8 @@ jboolean Java_org_rocksdb_Options_failIfOptionsFileError(JNIEnv* /*env*/,
  * Method: setDumpMallocStats
  * Signature: (JZ)V
  */
-void Java_org_rocksdb_Options_setDumpMallocStats(JNIEnv* /*env*/,
-                                                 jobject /*jobj*/,
-                                                 jlong jhandle,
-                                                 jboolean jdump_malloc_stats) {
+void Java_org_rocksdb_Options_setDumpMallocStats(
+    JNIEnv*, jobject, jlong jhandle, jboolean jdump_malloc_stats) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->dump_malloc_stats = static_cast<bool>(jdump_malloc_stats);
 }
@@ -1801,9 +1772,8 @@ void Java_org_rocksdb_Options_setDumpMallocStats(JNIEnv* /*env*/,
  * Method: dumpMallocStats
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_dumpMallocStats(JNIEnv* /*env*/,
-                                                  jobject /*jobj*/,
-                                                  jlong jhandle) {
+jboolean Java_org_rocksdb_Options_dumpMallocStats(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jboolean>(opt->dump_malloc_stats);
 }
@@ -1814,8 +1784,7 @@
  * Signature: (JZ)V
  */
 void Java_org_rocksdb_Options_setAvoidFlushDuringRecovery(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jboolean javoid_flush_during_recovery) {
+    JNIEnv*, jobject, jlong jhandle, jboolean javoid_flush_during_recovery) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->avoid_flush_during_recovery =
       static_cast<bool>(javoid_flush_during_recovery);
@@ -1826,9 +1795,8 @@ void Java_org_rocksdb_Options_setAvoidFlushDuringRecovery(
  * Method: avoidFlushDuringRecovery
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_avoidFlushDuringRecovery(JNIEnv* /*env*/,
-                                                           jobject /*jobj*/,
-                                                           jlong jhandle) {
+jboolean Java_org_rocksdb_Options_avoidFlushDuringRecovery(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jboolean>(opt->avoid_flush_during_recovery);
 }
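The setRowCache and setWalFilter bindings added above both pass a native handle from a Java wrapper object into the Options. A short sketch of the row-cache path, assuming RocksJava's LRUCache wrapper hands its handle through Options#setRowCache (illustrative only):

    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;

    try (final LRUCache rowCache = new LRUCache(64 * 1024 * 1024);  // 64 MiB cache
         final Options opts = new Options()) {
      // Java_org_rocksdb_Options_setRowCache copies the std::shared_ptr<rocksdb::Cache>
      // behind rowCache's handle into DBOptions::row_cache.
      opts.setRowCache(rowCache);
    }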
@@ -1839,8 +1807,7 @@
  * Signature: (JZ)V
  */
 void Java_org_rocksdb_Options_setAvoidFlushDuringShutdown(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jboolean javoid_flush_during_shutdown) {
+    JNIEnv*, jobject, jlong jhandle, jboolean javoid_flush_during_shutdown) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   opt->avoid_flush_during_shutdown =
       static_cast<bool>(javoid_flush_during_shutdown);
@@ -1851,19 +1818,128 @@ void Java_org_rocksdb_Options_setAvoidFlushDuringShutdown(
  * Method: avoidFlushDuringShutdown
  * Signature: (J)Z
  */
-jboolean Java_org_rocksdb_Options_avoidFlushDuringShutdown(JNIEnv* /*env*/,
-                                                           jobject /*jobj*/,
-                                                           jlong jhandle) {
+jboolean Java_org_rocksdb_Options_avoidFlushDuringShutdown(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   return static_cast<jboolean>(opt->avoid_flush_during_shutdown);
 }
 
+/*
+ * Class: org_rocksdb_Options
+ * Method: setAllowIngestBehind
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_Options_setAllowIngestBehind(
+    JNIEnv*, jobject, jlong jhandle, jboolean jallow_ingest_behind) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  opt->allow_ingest_behind = jallow_ingest_behind == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: allowIngestBehind
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_Options_allowIngestBehind(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return static_cast<jboolean>(opt->allow_ingest_behind);
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: setPreserveDeletes
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_Options_setPreserveDeletes(
+    JNIEnv*, jobject, jlong jhandle, jboolean jpreserve_deletes) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  opt->preserve_deletes = jpreserve_deletes == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: preserveDeletes
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_Options_preserveDeletes(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return static_cast<jboolean>(opt->preserve_deletes);
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: setTwoWriteQueues
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_Options_setTwoWriteQueues(
+    JNIEnv*, jobject, jlong jhandle, jboolean jtwo_write_queues) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  opt->two_write_queues = jtwo_write_queues == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: twoWriteQueues
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_Options_twoWriteQueues(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return static_cast<jboolean>(opt->two_write_queues);
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: setManualWalFlush
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_Options_setManualWalFlush(
+    JNIEnv*, jobject, jlong jhandle, jboolean jmanual_wal_flush) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  opt->manual_wal_flush = jmanual_wal_flush == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: manualWalFlush
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_Options_manualWalFlush(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return static_cast<jboolean>(opt->manual_wal_flush);
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: setAtomicFlush
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_Options_setAtomicFlush(
+    JNIEnv*, jobject, jlong jhandle, jboolean jatomic_flush) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  opt->atomic_flush = jatomic_flush == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_Options
+ * Method: atomicFlush
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_Options_atomicFlush(
+    JNIEnv *, jobject, jlong jhandle) {
+  auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return static_cast<jboolean>(opt->atomic_flush);
+}
+
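The block above adds plain boolean pass-throughs for several newer DBOptions flags. The corresponding Java calls, assuming wrapper methods matching the JNI "Method:" names declared above (sketch only):

    import org.rocksdb.Options;

    try (final Options opts = new Options()) {
      opts.setTwoWriteQueues(true);      // DBOptions::two_write_queues
      opts.setManualWalFlush(true);      // WAL then flushes only on an explicit flushWal()
      opts.setAtomicFlush(true);         // DBOptions::atomic_flush
      opts.setAllowIngestBehind(false);  // DBOptions::allow_ingest_behind
    }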
 /*
  * Method: tableFactoryName
  * Signature: (J)Ljava/lang/String
  */
-jstring Java_org_rocksdb_Options_tableFactoryName(JNIEnv* env, jobject /*jobj*/,
-                                                  jlong jhandle) {
+jstring Java_org_rocksdb_Options_tableFactoryName(
+    JNIEnv* env, jobject, jlong jhandle) {
   auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
   rocksdb::TableFactory* tf = opt->table_factory.get();
@@ -1879,9 +1955,8 @@ jstring Java_org_rocksdb_Options_tableFactoryName(JNIEnv* env, jobject /*jobj*/,
  * Method: minWriteBufferNumberToMerge
  * Signature: (J)I
  */
-jint Java_org_rocksdb_Options_minWriteBufferNumberToMerge(JNIEnv* /*env*/,
-                                                          jobject /*jobj*/,
-                                                          jlong jhandle) {
+jint Java_org_rocksdb_Options_minWriteBufferNumberToMerge(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->min_write_buffer_number_to_merge;
 }
@@ -1892,8 +1967,7 @@ jint Java_org_rocksdb_Options_minWriteBufferNumberToMerge(JNIEnv* /*env*/,
  * Signature: (JI)V
  */
 void Java_org_rocksdb_Options_setMinWriteBufferNumberToMerge(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jint jmin_write_buffer_number_to_merge) {
+    JNIEnv*, jobject, jlong jhandle, jint jmin_write_buffer_number_to_merge) {
   reinterpret_cast<rocksdb::Options*>(jhandle)
       ->min_write_buffer_number_to_merge =
       static_cast<int>(jmin_write_buffer_number_to_merge);
@@ -1903,9 +1977,8 @@ void Java_org_rocksdb_Options_setMinWriteBufferNumberToMerge(
  * Method: maxWriteBufferNumberToMaintain
  * Signature: (J)I
  */
-jint Java_org_rocksdb_Options_maxWriteBufferNumberToMaintain(JNIEnv* /*env*/,
-                                                             jobject /*jobj*/,
-                                                             jlong jhandle) {
+jint Java_org_rocksdb_Options_maxWriteBufferNumberToMaintain(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->max_write_buffer_number_to_maintain;
 }
@@ -1916,7 +1989,7 @@ jint Java_org_rocksdb_Options_maxWriteBufferNumberToMaintain(JNIEnv* /*env*/,
  * Signature: (JI)V
  */
 void Java_org_rocksdb_Options_setMaxWriteBufferNumberToMaintain(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
+    JNIEnv*, jobject, jlong jhandle,
     jint jmax_write_buffer_number_to_maintain) {
   reinterpret_cast<rocksdb::Options*>(jhandle)
       ->max_write_buffer_number_to_maintain =
@@ -1929,8 +2002,7 @@
  * Signature: (JB)V
  */
 void Java_org_rocksdb_Options_setCompressionType(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jbyte jcompression_type_value) {
+    JNIEnv*, jobject, jlong jhandle, jbyte jcompression_type_value) {
   auto* opts = reinterpret_cast<rocksdb::Options*>(jhandle);
   opts->compression = rocksdb::CompressionTypeJni::toCppCompressionType(
       jcompression_type_value);
@@ -1941,9 +2013,8 @@
  * Method: compressionType
  * Signature: (J)B
  */
-jbyte Java_org_rocksdb_Options_compressionType(JNIEnv* /*env*/,
-                                               jobject /*jobj*/,
-                                               jlong jhandle) {
+jbyte Java_org_rocksdb_Options_compressionType(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* opts = reinterpret_cast<rocksdb::Options*>(jhandle);
   return rocksdb::CompressionTypeJni::toJavaCompressionType(opts->compression);
 }
@@ -1956,11 +2027,11 @@ jbyte Java_org_rocksdb_Options_compressionType(JNIEnv* /*env*/,
  * @param jcompression_levels A reference to a java byte array
  *     where each byte indicates a compression level
  *
- * @return A unique_ptr to the vector, or unique_ptr(nullptr) if a JNI exception
- *     occurs
+ * @return A std::unique_ptr to the vector, or std::unique_ptr(nullptr) if a JNI
+ *     exception occurs
  */
-std::unique_ptr<std::vector<rocksdb::CompressionType>>
-rocksdb_compression_vector_helper(JNIEnv* env, jbyteArray jcompression_levels) {
+std::unique_ptr<std::vector<rocksdb::CompressionType>> rocksdb_compression_vector_helper(
+    JNIEnv* env, jbyteArray jcompression_levels) {
   jsize len =
env->GetArrayLength(jcompression_levels);
   jbyte* jcompression_level =
       env->GetByteArrayElements(jcompression_levels, nullptr);
@@ -2030,8 +2101,7 @@ jbyteArray rocksdb_compression_list_helper(
  * Signature: (J[B)V
  */
 void Java_org_rocksdb_Options_setCompressionPerLevel(
-    JNIEnv* env, jobject /*jobj*/, jlong jhandle,
-    jbyteArray jcompressionLevels) {
+    JNIEnv* env, jobject, jlong jhandle, jbyteArray jcompressionLevels) {
   auto uptr_compression_levels =
       rocksdb_compression_vector_helper(env, jcompressionLevels);
   if (!uptr_compression_levels) {
@@ -2047,9 +2117,8 @@ void Java_org_rocksdb_Options_setCompressionPerLevel(
  * Method: compressionPerLevel
  * Signature: (J)[B
  */
-jbyteArray Java_org_rocksdb_Options_compressionPerLevel(JNIEnv* env,
-                                                        jobject /*jobj*/,
-                                                        jlong jhandle) {
+jbyteArray Java_org_rocksdb_Options_compressionPerLevel(
+    JNIEnv* env, jobject, jlong jhandle) {
   auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
   return rocksdb_compression_list_helper(env, options->compression_per_level);
 }
@@ -2060,8 +2129,7 @@ jbyteArray Java_org_rocksdb_Options_compressionPerLevel(JNIEnv* env,
  * Method: setBottommostCompressionType
  * Signature: (JB)V
  */
 void Java_org_rocksdb_Options_setBottommostCompressionType(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jbyte jcompression_type_value) {
+    JNIEnv*, jobject, jlong jhandle, jbyte jcompression_type_value) {
   auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
   options->bottommost_compression =
       rocksdb::CompressionTypeJni::toCppCompressionType(
@@ -2073,9 +2141,8 @@ void Java_org_rocksdb_Options_setBottommostCompressionType(
  * Method: bottommostCompressionType
  * Signature: (J)B
  */
-jbyte Java_org_rocksdb_Options_bottommostCompressionType(JNIEnv* /*env*/,
-                                                         jobject /*jobj*/,
-                                                         jlong jhandle) {
+jbyte Java_org_rocksdb_Options_bottommostCompressionType(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
   return rocksdb::CompressionTypeJni::toJavaCompressionType(
       options->bottommost_compression);
@@ -2087,7 +2154,7 @@
  * Signature: (JJ)V
  */
 void Java_org_rocksdb_Options_setBottommostCompressionOptions(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
+    JNIEnv*, jobject, jlong jhandle,
     jlong jbottommost_compression_options_handle) {
   auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
   auto* bottommost_compression_options =
@@ -2102,8 +2169,7 @@
  * Signature: (JJ)V
  */
 void Java_org_rocksdb_Options_setCompressionOptions(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jlong jcompression_options_handle) {
+    JNIEnv*, jobject, jlong jhandle, jlong jcompression_options_handle) {
   auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
   auto* compression_options = reinterpret_cast<rocksdb::CompressionOptions*>(
       jcompression_options_handle);
@@ -2115,12 +2181,12 @@
  * Method: setCompactionStyle
  * Signature: (JB)V
  */
-void Java_org_rocksdb_Options_setCompactionStyle(JNIEnv* /*env*/,
-                                                 jobject /*jobj*/,
-                                                 jlong jhandle,
-                                                 jbyte compaction_style) {
-  reinterpret_cast<rocksdb::Options*>(jhandle)->compaction_style =
-      static_cast<rocksdb::CompactionStyle>(compaction_style);
+void Java_org_rocksdb_Options_setCompactionStyle(
+    JNIEnv*, jobject, jlong jhandle, jbyte jcompaction_style) {
+  auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
+  options->compaction_style =
+      rocksdb::CompactionStyleJni::toCppCompactionStyle(
+          jcompaction_style);
 }
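The rewritten setCompactionStyle above now routes the Java enum byte through rocksdb::CompactionStyleJni instead of a raw static_cast, making the mapping between org.rocksdb.CompactionStyle values and the C++ enum explicit. A usage sketch, assuming RocksJava's CompactionStyle enum:

    import org.rocksdb.CompactionStyle;
    import org.rocksdb.Options;

    try (final Options opts = new Options()) {
      // The enum's byte value is translated by CompactionStyleJni::toCppCompactionStyle
      // in the hunk above before being stored in the native compaction_style field.
      opts.setCompactionStyle(CompactionStyle.UNIVERSAL);
    }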
 
 /*
@@ -2128,10 +2194,11 @@
  * Method: compactionStyle
  * Signature: (J)B
  */
-jbyte Java_org_rocksdb_Options_compactionStyle(JNIEnv* /*env*/,
-                                               jobject /*jobj*/,
-                                               jlong jhandle) {
-  return reinterpret_cast<rocksdb::Options*>(jhandle)->compaction_style;
+jbyte Java_org_rocksdb_Options_compactionStyle(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* options = reinterpret_cast<rocksdb::Options*>(jhandle);
+  return rocksdb::CompactionStyleJni::toJavaCompactionStyle(
+      options->compaction_style);
 }
 
 /*
@@ -2140,8 +2207,7 @@
  * Signature: (JJ)V
  */
 void Java_org_rocksdb_Options_setMaxTableFilesSizeFIFO(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jlong jmax_table_files_size) {
+    JNIEnv*, jobject, jlong jhandle, jlong jmax_table_files_size) {
   reinterpret_cast<rocksdb::Options*>(jhandle)
       ->compaction_options_fifo.max_table_files_size =
       static_cast<uint64_t>(jmax_table_files_size);
@@ -2152,9 +2218,8 @@
  * Method: maxTableFilesSizeFIFO
  * Signature: (J)J
  */
-jlong Java_org_rocksdb_Options_maxTableFilesSizeFIFO(JNIEnv* /*env*/,
-                                                     jobject /*jobj*/,
-                                                     jlong jhandle) {
+jlong Java_org_rocksdb_Options_maxTableFilesSizeFIFO(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->compaction_options_fifo.max_table_files_size;
 }
@@ -2164,8 +2229,8 @@
  * Method: numLevels
  * Signature: (J)I
  */
-jint Java_org_rocksdb_Options_numLevels(JNIEnv* /*env*/, jobject /*jobj*/,
-                                        jlong jhandle) {
+jint Java_org_rocksdb_Options_numLevels(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)->num_levels;
 }
@@ -2174,8 +2239,8 @@
  * Method: setNumLevels
  * Signature: (JI)V
  */
-void Java_org_rocksdb_Options_setNumLevels(JNIEnv* /*env*/, jobject /*jobj*/,
-                                           jlong jhandle, jint jnum_levels) {
+void Java_org_rocksdb_Options_setNumLevels(
+    JNIEnv*, jobject, jlong jhandle, jint jnum_levels) {
   reinterpret_cast<rocksdb::Options*>(jhandle)->num_levels =
       static_cast<int>(jnum_levels);
 }
@@ -2186,7 +2251,7 @@
  * Signature: (J)I
  */
 jint Java_org_rocksdb_Options_levelZeroFileNumCompactionTrigger(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) {
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->level0_file_num_compaction_trigger;
 }
@@ -2197,7 +2262,7 @@
  * Signature: (JI)V
  */
 void Java_org_rocksdb_Options_setLevelZeroFileNumCompactionTrigger(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
+    JNIEnv*, jobject, jlong jhandle,
     jint jlevel0_file_num_compaction_trigger) {
   reinterpret_cast<rocksdb::Options*>(jhandle)
       ->level0_file_num_compaction_trigger =
@@ -2209,9 +2274,8 @@
  * Method: levelZeroSlowdownWritesTrigger
  * Signature: (J)I
  */
-jint Java_org_rocksdb_Options_levelZeroSlowdownWritesTrigger(JNIEnv* /*env*/,
-                                                             jobject /*jobj*/,
-                                                             jlong jhandle) {
+jint Java_org_rocksdb_Options_levelZeroSlowdownWritesTrigger(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::Options*>(jhandle)
       ->level0_slowdown_writes_trigger;
 }
@@ -2222,8 +2286,7 @@
  * Signature: (JI)V
  */
 void Java_org_rocksdb_Options_setLevelZeroSlowdownWritesTrigger(
-    JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle,
-    jint jlevel0_slowdown_writes_trigger) {
+    JNIEnv*, jobject, jlong jhandle, jint jlevel0_slowdown_writes_trigger) {
reinterpret_cast(jhandle)->level0_slowdown_writes_trigger = static_cast(jlevel0_slowdown_writes_trigger); } @@ -2233,9 +2296,8 @@ void Java_org_rocksdb_Options_setLevelZeroSlowdownWritesTrigger( * Method: levelZeroStopWritesTrigger * Signature: (J)I */ -jint Java_org_rocksdb_Options_levelZeroStopWritesTrigger(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_levelZeroStopWritesTrigger( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_stop_writes_trigger; } @@ -2246,8 +2308,7 @@ jint Java_org_rocksdb_Options_levelZeroStopWritesTrigger(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setLevelZeroStopWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_stop_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_stop_writes_trigger) { reinterpret_cast(jhandle)->level0_stop_writes_trigger = static_cast(jlevel0_stop_writes_trigger); } @@ -2257,9 +2318,8 @@ void Java_org_rocksdb_Options_setLevelZeroStopWritesTrigger( * Method: targetFileSizeBase * Signature: (J)J */ -jlong Java_org_rocksdb_Options_targetFileSizeBase(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_targetFileSizeBase( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->target_file_size_base; } @@ -2269,8 +2329,7 @@ jlong Java_org_rocksdb_Options_targetFileSizeBase(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setTargetFileSizeBase( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jtarget_file_size_base) { + JNIEnv*, jobject, jlong jhandle, jlong jtarget_file_size_base) { reinterpret_cast(jhandle)->target_file_size_base = static_cast(jtarget_file_size_base); } @@ -2280,9 +2339,8 @@ void Java_org_rocksdb_Options_setTargetFileSizeBase( * Method: targetFileSizeMultiplier * Signature: (J)I */ -jint Java_org_rocksdb_Options_targetFileSizeMultiplier(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_targetFileSizeMultiplier( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->target_file_size_multiplier; } @@ -2293,8 +2351,7 @@ jint Java_org_rocksdb_Options_targetFileSizeMultiplier(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setTargetFileSizeMultiplier( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jtarget_file_size_multiplier) { + JNIEnv*, jobject, jlong jhandle, jint jtarget_file_size_multiplier) { reinterpret_cast(jhandle)->target_file_size_multiplier = static_cast(jtarget_file_size_multiplier); } @@ -2304,9 +2361,8 @@ void Java_org_rocksdb_Options_setTargetFileSizeMultiplier( * Method: maxBytesForLevelBase * Signature: (J)J */ -jlong Java_org_rocksdb_Options_maxBytesForLevelBase(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_maxBytesForLevelBase( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_bytes_for_level_base; } @@ -2316,8 +2372,7 @@ jlong Java_org_rocksdb_Options_maxBytesForLevelBase(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setMaxBytesForLevelBase( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jmax_bytes_for_level_base) { + JNIEnv*, jobject, jlong jhandle, jlong jmax_bytes_for_level_base) { reinterpret_cast(jhandle)->max_bytes_for_level_base = static_cast(jmax_bytes_for_level_base); } @@ -2328,7 +2383,7 @@ void Java_org_rocksdb_Options_setMaxBytesForLevelBase( * Signature: (J)Z */ jboolean 
Java_org_rocksdb_Options_levelCompactionDynamicLevelBytes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level_compaction_dynamic_level_bytes; } @@ -2339,8 +2394,7 @@ jboolean Java_org_rocksdb_Options_levelCompactionDynamicLevelBytes( * Signature: (JZ)V */ void Java_org_rocksdb_Options_setLevelCompactionDynamicLevelBytes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jenable_dynamic_level_bytes) { + JNIEnv*, jobject, jlong jhandle, jboolean jenable_dynamic_level_bytes) { reinterpret_cast(jhandle) ->level_compaction_dynamic_level_bytes = (jenable_dynamic_level_bytes); } @@ -2350,9 +2404,8 @@ void Java_org_rocksdb_Options_setLevelCompactionDynamicLevelBytes( * Method: maxBytesForLevelMultiplier * Signature: (J)D */ -jdouble Java_org_rocksdb_Options_maxBytesForLevelMultiplier(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jdouble Java_org_rocksdb_Options_maxBytesForLevelMultiplier( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_bytes_for_level_multiplier; } @@ -2363,8 +2416,7 @@ jdouble Java_org_rocksdb_Options_maxBytesForLevelMultiplier(JNIEnv* /*env*/, * Signature: (JD)V */ void Java_org_rocksdb_Options_setMaxBytesForLevelMultiplier( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jdouble jmax_bytes_for_level_multiplier) { + JNIEnv*, jobject, jlong jhandle, jdouble jmax_bytes_for_level_multiplier) { reinterpret_cast(jhandle)->max_bytes_for_level_multiplier = static_cast(jmax_bytes_for_level_multiplier); } @@ -2374,9 +2426,8 @@ void Java_org_rocksdb_Options_setMaxBytesForLevelMultiplier( * Method: maxCompactionBytes * Signature: (J)I */ -jlong Java_org_rocksdb_Options_maxCompactionBytes(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_maxCompactionBytes( + JNIEnv*, jobject, jlong jhandle) { return static_cast( reinterpret_cast(jhandle)->max_compaction_bytes); } @@ -2387,8 +2438,7 @@ jlong Java_org_rocksdb_Options_maxCompactionBytes(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setMaxCompactionBytes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jmax_compaction_bytes) { + JNIEnv*, jobject, jlong jhandle, jlong jmax_compaction_bytes) { reinterpret_cast(jhandle)->max_compaction_bytes = static_cast(jmax_compaction_bytes); } @@ -2398,8 +2448,8 @@ void Java_org_rocksdb_Options_setMaxCompactionBytes( * Method: arenaBlockSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_arenaBlockSize(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_arenaBlockSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->arena_block_size; } @@ -2408,10 +2458,9 @@ jlong Java_org_rocksdb_Options_arenaBlockSize(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setArenaBlockSize * Signature: (JJ)V */ -void Java_org_rocksdb_Options_setArenaBlockSize(JNIEnv* env, - jobject /*jobj*/, jlong jhandle, - jlong jarena_block_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jarena_block_size); +void Java_org_rocksdb_Options_setArenaBlockSize( + JNIEnv* env, jobject, jlong jhandle, jlong jarena_block_size) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(jarena_block_size); if (s.ok()) { reinterpret_cast(jhandle)->arena_block_size = jarena_block_size; @@ -2425,9 +2474,8 @@ void Java_org_rocksdb_Options_setArenaBlockSize(JNIEnv* env, * Method: disableAutoCompactions * Signature: (J)Z */ -jboolean 
Java_org_rocksdb_Options_disableAutoCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_disableAutoCompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->disable_auto_compactions; } @@ -2437,8 +2485,7 @@ jboolean Java_org_rocksdb_Options_disableAutoCompactions(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setDisableAutoCompactions( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jdisable_auto_compactions) { + JNIEnv*, jobject, jlong jhandle, jboolean jdisable_auto_compactions) { reinterpret_cast(jhandle)->disable_auto_compactions = static_cast(jdisable_auto_compactions); } @@ -2448,9 +2495,8 @@ void Java_org_rocksdb_Options_setDisableAutoCompactions( * Method: maxSequentialSkipInIterations * Signature: (J)J */ -jlong Java_org_rocksdb_Options_maxSequentialSkipInIterations(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_maxSequentialSkipInIterations( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_sequential_skip_in_iterations; } @@ -2461,7 +2507,7 @@ jlong Java_org_rocksdb_Options_maxSequentialSkipInIterations(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setMaxSequentialSkipInIterations( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jmax_sequential_skip_in_iterations) { reinterpret_cast(jhandle) ->max_sequential_skip_in_iterations = @@ -2473,9 +2519,8 @@ void Java_org_rocksdb_Options_setMaxSequentialSkipInIterations( * Method: inplaceUpdateSupport * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_inplaceUpdateSupport(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_inplaceUpdateSupport( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->inplace_update_support; } @@ -2485,8 +2530,7 @@ jboolean Java_org_rocksdb_Options_inplaceUpdateSupport(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setInplaceUpdateSupport( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jinplace_update_support) { + JNIEnv*, jobject, jlong jhandle, jboolean jinplace_update_support) { reinterpret_cast(jhandle)->inplace_update_support = static_cast(jinplace_update_support); } @@ -2496,9 +2540,8 @@ void Java_org_rocksdb_Options_setInplaceUpdateSupport( * Method: inplaceUpdateNumLocks * Signature: (J)J */ -jlong Java_org_rocksdb_Options_inplaceUpdateNumLocks(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_inplaceUpdateNumLocks( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->inplace_update_num_locks; } @@ -2508,10 +2551,9 @@ jlong Java_org_rocksdb_Options_inplaceUpdateNumLocks(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setInplaceUpdateNumLocks( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jinplace_update_num_locks) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(jinplace_update_num_locks); + JNIEnv* env, jobject, jlong jhandle, jlong jinplace_update_num_locks) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jinplace_update_num_locks); if (s.ok()) { reinterpret_cast(jhandle)->inplace_update_num_locks = jinplace_update_num_locks; @@ -2525,9 +2567,8 @@ void Java_org_rocksdb_Options_setInplaceUpdateNumLocks( * Method: memtablePrefixBloomSizeRatio * Signature: (J)I */ -jdouble Java_org_rocksdb_Options_memtablePrefixBloomSizeRatio(JNIEnv* /*env*/, - 
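[Editor's sketch] The setter hunks above (setArenaBlockSize, setInplaceUpdateNumLocks, and the other size_t-backed options) all switch from a free rocksdb::check_if_jlong_fits_size_t to the relocated rocksdb::JniUtil::check_if_jlong_fits_size_t before narrowing a jlong into size_t. Below is a minimal sketch of the guard such a helper has to perform; the Status shim and namespace are stand-ins, not the upstream types.

```cpp
// A minimal sketch (stand-in types, not the upstream implementation) of the
// guard that rocksdb::JniUtil::check_if_jlong_fits_size_t performs before a
// jlong is narrowed into a size_t-typed option field.
#include <cstddef>
#include <cstdint>
#include <limits>

namespace jni_util_sketch {

struct Status {
  bool ok_;
  bool ok() const { return ok_; }
  static Status OK() { return Status{true}; }
  static Status InvalidArgument() { return Status{false}; }
};

// jlong is a signed 64-bit integer; size_t may be narrower (32-bit JVMs).
inline Status check_if_jlong_fits_size_t(int64_t jvalue) {
  if (jvalue < 0 ||
      static_cast<uint64_t>(jvalue) >
          std::numeric_limits<std::size_t>::max()) {
    return Status::InvalidArgument();
  }
  return Status::OK();
}

}  // namespace jni_util_sketch
```

Note that each setter assigns only inside the s.ok() branch; the else branches (outside these hunks' context) are expected to raise IllegalArgumentException back into Java rather than truncate silently.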
jobject /*jobj*/, - jlong jhandle) { +jdouble Java_org_rocksdb_Options_memtablePrefixBloomSizeRatio( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->memtable_prefix_bloom_size_ratio; } @@ -2538,7 +2579,7 @@ jdouble Java_org_rocksdb_Options_memtablePrefixBloomSizeRatio(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setMemtablePrefixBloomSizeRatio( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jdouble jmemtable_prefix_bloom_size_ratio) { reinterpret_cast(jhandle) ->memtable_prefix_bloom_size_ratio = @@ -2550,8 +2591,8 @@ void Java_org_rocksdb_Options_setMemtablePrefixBloomSizeRatio( * Method: bloomLocality * Signature: (J)I */ -jint Java_org_rocksdb_Options_bloomLocality(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_bloomLocality( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->bloom_locality; } @@ -2560,9 +2601,8 @@ jint Java_org_rocksdb_Options_bloomLocality(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setBloomLocality * Signature: (JI)V */ -void Java_org_rocksdb_Options_setBloomLocality(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jint jbloom_locality) { +void Java_org_rocksdb_Options_setBloomLocality( + JNIEnv*, jobject, jlong jhandle, jint jbloom_locality) { reinterpret_cast(jhandle)->bloom_locality = static_cast(jbloom_locality); } @@ -2572,9 +2612,8 @@ void Java_org_rocksdb_Options_setBloomLocality(JNIEnv* /*env*/, * Method: maxSuccessiveMerges * Signature: (J)J */ -jlong Java_org_rocksdb_Options_maxSuccessiveMerges(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_maxSuccessiveMerges( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_successive_merges; } @@ -2584,10 +2623,9 @@ jlong Java_org_rocksdb_Options_maxSuccessiveMerges(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setMaxSuccessiveMerges( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jmax_successive_merges) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(jmax_successive_merges); + JNIEnv* env, jobject, jlong jhandle, jlong jmax_successive_merges) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jmax_successive_merges); if (s.ok()) { reinterpret_cast(jhandle)->max_successive_merges = jmax_successive_merges; @@ -2601,9 +2639,8 @@ void Java_org_rocksdb_Options_setMaxSuccessiveMerges( * Method: optimizeFiltersForHits * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_optimizeFiltersForHits(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_optimizeFiltersForHits( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->optimize_filters_for_hits; } @@ -2614,8 +2651,7 @@ jboolean Java_org_rocksdb_Options_optimizeFiltersForHits(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setOptimizeFiltersForHits( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean joptimize_filters_for_hits) { + JNIEnv*, jobject, jlong jhandle, jboolean joptimize_filters_for_hits) { reinterpret_cast(jhandle)->optimize_filters_for_hits = static_cast(joptimize_filters_for_hits); } @@ -2625,9 +2661,8 @@ void Java_org_rocksdb_Options_setOptimizeFiltersForHits( * Method: optimizeForSmallDb * Signature: (J)V */ -void Java_org_rocksdb_Options_optimizeForSmallDb(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_Options_optimizeForSmallDb( + JNIEnv*, jobject, jlong jhandle) { 
reinterpret_cast(jhandle)->OptimizeForSmallDb(); } @@ -2637,8 +2672,7 @@ void Java_org_rocksdb_Options_optimizeForSmallDb(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_optimizeForPointLookup( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong block_cache_size_mb) { + JNIEnv*, jobject, jlong jhandle, jlong block_cache_size_mb) { reinterpret_cast(jhandle)->OptimizeForPointLookup( block_cache_size_mb); } @@ -2649,8 +2683,7 @@ void Java_org_rocksdb_Options_optimizeForPointLookup( * Signature: (JJ)V */ void Java_org_rocksdb_Options_optimizeLevelStyleCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong memtable_memory_budget) { + JNIEnv*, jobject, jlong jhandle, jlong memtable_memory_budget) { reinterpret_cast(jhandle)->OptimizeLevelStyleCompaction( memtable_memory_budget); } @@ -2661,8 +2694,7 @@ void Java_org_rocksdb_Options_optimizeLevelStyleCompaction( * Signature: (JJ)V */ void Java_org_rocksdb_Options_optimizeUniversalStyleCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong memtable_memory_budget) { + JNIEnv*, jobject, jlong jhandle, jlong memtable_memory_budget) { reinterpret_cast(jhandle) ->OptimizeUniversalStyleCompaction(memtable_memory_budget); } @@ -2672,9 +2704,8 @@ void Java_org_rocksdb_Options_optimizeUniversalStyleCompaction( * Method: prepareForBulkLoad * Signature: (J)V */ -void Java_org_rocksdb_Options_prepareForBulkLoad(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_Options_prepareForBulkLoad( + JNIEnv*, jobject, jlong jhandle) { reinterpret_cast(jhandle)->PrepareForBulkLoad(); } @@ -2683,9 +2714,8 @@ void Java_org_rocksdb_Options_prepareForBulkLoad(JNIEnv* /*env*/, * Method: memtableHugePageSize * Signature: (J)J */ -jlong Java_org_rocksdb_Options_memtableHugePageSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_memtableHugePageSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->memtable_huge_page_size; } @@ -2695,10 +2725,9 @@ jlong Java_org_rocksdb_Options_memtableHugePageSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setMemtableHugePageSize( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jmemtable_huge_page_size) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(jmemtable_huge_page_size); + JNIEnv* env, jobject, jlong jhandle, jlong jmemtable_huge_page_size) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jmemtable_huge_page_size); if (s.ok()) { reinterpret_cast(jhandle)->memtable_huge_page_size = jmemtable_huge_page_size; @@ -2712,9 +2741,8 @@ void Java_org_rocksdb_Options_setMemtableHugePageSize( * Method: softPendingCompactionBytesLimit * Signature: (J)J */ -jlong Java_org_rocksdb_Options_softPendingCompactionBytesLimit(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_softPendingCompactionBytesLimit( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->soft_pending_compaction_bytes_limit; } @@ -2725,7 +2753,7 @@ jlong Java_org_rocksdb_Options_softPendingCompactionBytesLimit(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setSoftPendingCompactionBytesLimit( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jsoft_pending_compaction_bytes_limit) { reinterpret_cast(jhandle) ->soft_pending_compaction_bytes_limit = @@ -2737,9 +2765,8 @@ void Java_org_rocksdb_Options_setSoftPendingCompactionBytesLimit( * Method: 
hardPendingCompactionBytesLimit * Signature: (J)J */ -jlong Java_org_rocksdb_Options_hardPendingCompactionBytesLimit(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_Options_hardPendingCompactionBytesLimit( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->hard_pending_compaction_bytes_limit; } @@ -2750,7 +2777,7 @@ jlong Java_org_rocksdb_Options_hardPendingCompactionBytesLimit(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_Options_setHardPendingCompactionBytesLimit( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jhard_pending_compaction_bytes_limit) { reinterpret_cast(jhandle) ->hard_pending_compaction_bytes_limit = @@ -2762,9 +2789,8 @@ void Java_org_rocksdb_Options_setHardPendingCompactionBytesLimit( * Method: level0FileNumCompactionTrigger * Signature: (J)I */ -jint Java_org_rocksdb_Options_level0FileNumCompactionTrigger(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_level0FileNumCompactionTrigger( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_file_num_compaction_trigger; } @@ -2775,7 +2801,7 @@ jint Java_org_rocksdb_Options_level0FileNumCompactionTrigger(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setLevel0FileNumCompactionTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint jlevel0_file_num_compaction_trigger) { reinterpret_cast(jhandle) ->level0_file_num_compaction_trigger = @@ -2787,9 +2813,8 @@ void Java_org_rocksdb_Options_setLevel0FileNumCompactionTrigger( * Method: level0SlowdownWritesTrigger * Signature: (J)I */ -jint Java_org_rocksdb_Options_level0SlowdownWritesTrigger(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_level0SlowdownWritesTrigger( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_slowdown_writes_trigger; } @@ -2800,8 +2825,7 @@ jint Java_org_rocksdb_Options_level0SlowdownWritesTrigger(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setLevel0SlowdownWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_slowdown_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_slowdown_writes_trigger) { reinterpret_cast(jhandle)->level0_slowdown_writes_trigger = static_cast(jlevel0_slowdown_writes_trigger); } @@ -2811,9 +2835,8 @@ void Java_org_rocksdb_Options_setLevel0SlowdownWritesTrigger( * Method: level0StopWritesTrigger * Signature: (J)I */ -jint Java_org_rocksdb_Options_level0StopWritesTrigger(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_Options_level0StopWritesTrigger( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_stop_writes_trigger; } @@ -2824,8 +2847,7 @@ jint Java_org_rocksdb_Options_level0StopWritesTrigger(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_Options_setLevel0StopWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_stop_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_stop_writes_trigger) { reinterpret_cast(jhandle)->level0_stop_writes_trigger = static_cast(jlevel0_stop_writes_trigger); } @@ -2836,7 +2858,7 @@ void Java_org_rocksdb_Options_setLevel0StopWritesTrigger( * Signature: (J)[I */ jintArray Java_org_rocksdb_Options_maxBytesForLevelMultiplierAdditional( - JNIEnv* env, jobject /*jobj*/, jlong jhandle) {
auto mbflma = reinterpret_cast(jhandle) ->max_bytes_for_level_multiplier_additional; @@ -2874,7 +2896,7 @@ jintArray Java_org_rocksdb_Options_maxBytesForLevelMultiplierAdditional( * Signature: (J[I)V */ void Java_org_rocksdb_Options_setMaxBytesForLevelMultiplierAdditional( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, + JNIEnv* env, jobject, jlong jhandle, jintArray jmax_bytes_for_level_multiplier_additional) { jsize len = env->GetArrayLength(jmax_bytes_for_level_multiplier_additional); jint* additionals = env->GetIntArrayElements( @@ -2900,9 +2922,8 @@ void Java_org_rocksdb_Options_setMaxBytesForLevelMultiplierAdditional( * Method: paranoidFileChecks * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_paranoidFileChecks(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_paranoidFileChecks( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->paranoid_file_checks; } @@ -2912,8 +2933,7 @@ jboolean Java_org_rocksdb_Options_paranoidFileChecks(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_Options_setParanoidFileChecks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jparanoid_file_checks) { + JNIEnv*, jobject, jlong jhandle, jboolean jparanoid_file_checks) { reinterpret_cast(jhandle)->paranoid_file_checks = static_cast(jparanoid_file_checks); } @@ -2924,8 +2944,7 @@ void Java_org_rocksdb_Options_setParanoidFileChecks( * Signature: (JB)V */ void Java_org_rocksdb_Options_setCompactionPriority( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jbyte jcompaction_priority_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jcompaction_priority_value) { auto* opts = reinterpret_cast(jhandle); opts->compaction_pri = rocksdb::CompactionPriorityJni::toCppCompactionPriority( @@ -2937,9 +2956,8 @@ void Java_org_rocksdb_Options_setCompactionPriority( * Method: compactionPriority * Signature: (J)B */ -jbyte Java_org_rocksdb_Options_compactionPriority(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_Options_compactionPriority( + JNIEnv*, jobject, jlong jhandle) { auto* opts = reinterpret_cast(jhandle); return rocksdb::CompactionPriorityJni::toJavaCompactionPriority( opts->compaction_pri); @@ -2950,10 +2968,8 @@ jbyte Java_org_rocksdb_Options_compactionPriority(JNIEnv* /*env*/, * Method: setReportBgIoStats * Signature: (JZ)V */ -void Java_org_rocksdb_Options_setReportBgIoStats(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean jreport_bg_io_stats) { +void Java_org_rocksdb_Options_setReportBgIoStats( + JNIEnv*, jobject, jlong jhandle, jboolean jreport_bg_io_stats) { auto* opts = reinterpret_cast(jhandle); opts->report_bg_io_stats = static_cast(jreport_bg_io_stats); } @@ -2963,20 +2979,41 @@ void Java_org_rocksdb_Options_setReportBgIoStats(JNIEnv* /*env*/, * Method: reportBgIoStats * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_reportBgIoStats(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_reportBgIoStats( + JNIEnv*, jobject, jlong jhandle) { auto* opts = reinterpret_cast(jhandle); return static_cast(opts->report_bg_io_stats); } +/* + * Class: org_rocksdb_Options + * Method: setTtl + * Signature: (JJ)V + */ +void Java_org_rocksdb_Options_setTtl( + JNIEnv*, jobject, jlong jhandle, jlong jttl) { + auto* opts = reinterpret_cast(jhandle); + opts->ttl = static_cast(jttl); +} + +/* + * Class: org_rocksdb_Options + * Method: ttl + * Signature: (J)J + */ +jlong Java_org_rocksdb_Options_ttl( + JNIEnv*, jobject, jlong jhandle) { 
+ auto* opts = reinterpret_cast(jhandle); + return static_cast(opts->ttl); +} + /* * Class: org_rocksdb_Options * Method: setCompactionOptionsUniversal * Signature: (JJ)V */ void Java_org_rocksdb_Options_setCompactionOptionsUniversal( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jcompaction_options_universal_handle) { auto* opts = reinterpret_cast(jhandle); auto* opts_uni = reinterpret_cast( @@ -2990,8 +3027,7 @@ void Java_org_rocksdb_Options_setCompactionOptionsUniversal( * Signature: (JJ)V */ void Java_org_rocksdb_Options_setCompactionOptionsFIFO( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jcompaction_options_fifo_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jcompaction_options_fifo_handle) { auto* opts = reinterpret_cast(jhandle); auto* opts_fifo = reinterpret_cast( jcompaction_options_fifo_handle); @@ -3004,8 +3040,7 @@ void Java_org_rocksdb_Options_setCompactionOptionsFIFO( * Signature: (JZ)V */ void Java_org_rocksdb_Options_setForceConsistencyChecks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jforce_consistency_checks) { + JNIEnv*, jobject, jlong jhandle, jboolean jforce_consistency_checks) { auto* opts = reinterpret_cast(jhandle); opts->force_consistency_checks = static_cast(jforce_consistency_checks); } @@ -3015,9 +3050,8 @@ void Java_org_rocksdb_Options_setForceConsistencyChecks( * Method: forceConsistencyChecks * Signature: (J)Z */ -jboolean Java_org_rocksdb_Options_forceConsistencyChecks(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_Options_forceConsistencyChecks( + JNIEnv*, jobject, jlong jhandle) { auto* opts = reinterpret_cast(jhandle); return static_cast(opts->force_consistency_checks); } @@ -3031,7 +3065,7 @@ jboolean Java_org_rocksdb_Options_forceConsistencyChecks(JNIEnv* /*env*/, * Signature: ()J */ jlong Java_org_rocksdb_ColumnFamilyOptions_newColumnFamilyOptions( - JNIEnv* /*env*/, jclass /*jcls*/) { + JNIEnv*, jclass) { auto* op = new rocksdb::ColumnFamilyOptions(); return reinterpret_cast(op); } @@ -3042,19 +3076,31 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_newColumnFamilyOptions( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_copyColumnFamilyOptions( - JNIEnv* /*env*/, jclass /*jcls*/, jlong jhandle) { + JNIEnv*, jclass, jlong jhandle) { auto new_opt = new rocksdb::ColumnFamilyOptions( *(reinterpret_cast(jhandle))); return reinterpret_cast(new_opt); } +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: newColumnFamilyOptionsFromOptions + * Signature: (J)J + */ +jlong Java_org_rocksdb_ColumnFamilyOptions_newColumnFamilyOptionsFromOptions( + JNIEnv*, jclass, jlong joptions_handle) { + auto new_opt = new rocksdb::ColumnFamilyOptions( + *reinterpret_cast(joptions_handle)); + return reinterpret_cast(new_opt); +} + /* * Class: org_rocksdb_ColumnFamilyOptions * Method: getColumnFamilyOptionsFromProps * Signature: (Ljava/util/String;)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_getColumnFamilyOptionsFromProps( - JNIEnv* env, jclass /*jclazz*/, jstring jopt_string) { + JNIEnv* env, jclass, jstring jopt_string) { const char* opt_string = env->GetStringUTFChars(jopt_string, nullptr); if (opt_string == nullptr) { // exception thrown: OutOfMemoryError @@ -3084,9 +3130,8 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_getColumnFamilyOptionsFromProps( * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_ColumnFamilyOptions_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong handle) { +void 
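[Editor's sketch] The setTtl/ttl pair added above is a thin bridge to the ttl field on rocksdb::Options; nothing JNI-specific happens beyond the jlong/uint64_t casts. A native-side sketch of the field those bindings forward to (assuming the standard RocksDB headers):

```cpp
// Native-side sketch of the option the new setTtl/ttl bindings forward to.
// Only the jlong <-> uint64_t casts are JNI-specific; the rest is plain
// rocksdb::Options usage.
#include <rocksdb/options.h>

int main() {
  rocksdb::Options opts;
  opts.ttl = 60 * 60 * 24;  // seconds; data older than this becomes
                            // eligible for compaction-driven deletion
  return opts.ttl == 60 * 60 * 24 ? 0 : 1;
}
```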
Java_org_rocksdb_ColumnFamilyOptions_disposeInternal( + JNIEnv*, jobject, jlong handle) { auto* cfo = reinterpret_cast(handle); assert(cfo != nullptr); delete cfo; @@ -3097,9 +3142,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_disposeInternal(JNIEnv* /*env*/, * Method: optimizeForSmallDb * Signature: (J)V */ -void Java_org_rocksdb_ColumnFamilyOptions_optimizeForSmallDb(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_ColumnFamilyOptions_optimizeForSmallDb( + JNIEnv*, jobject, jlong jhandle) { reinterpret_cast(jhandle) ->OptimizeForSmallDb(); } @@ -3110,8 +3154,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_optimizeForSmallDb(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_optimizeForPointLookup( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong block_cache_size_mb) { + JNIEnv*, jobject, jlong jhandle, jlong block_cache_size_mb) { reinterpret_cast(jhandle) ->OptimizeForPointLookup(block_cache_size_mb); } @@ -3122,8 +3165,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_optimizeForPointLookup( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_optimizeLevelStyleCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong memtable_memory_budget) { + JNIEnv*, jobject, jlong jhandle, jlong memtable_memory_budget) { reinterpret_cast(jhandle) ->OptimizeLevelStyleCompaction(memtable_memory_budget); } @@ -3134,8 +3176,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_optimizeLevelStyleCompaction( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_optimizeUniversalStyleCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong memtable_memory_budget) { + JNIEnv*, jobject, jlong jhandle, jlong memtable_memory_budget) { reinterpret_cast(jhandle) ->OptimizeUniversalStyleCompaction(memtable_memory_budget); } @@ -3146,7 +3187,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_optimizeUniversalStyleCompaction( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setComparatorHandle__JI( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint builtinComparator) { + JNIEnv*, jobject, jlong jhandle, jint builtinComparator) { switch (builtinComparator) { case 1: reinterpret_cast(jhandle)->comparator = @@ -3165,8 +3206,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setComparatorHandle__JI( * Signature: (JJB)V */ void Java_org_rocksdb_ColumnFamilyOptions_setComparatorHandle__JJB( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jopt_handle, - jlong jcomparator_handle, jbyte jcomparator_type) { + JNIEnv*, jobject, jlong jopt_handle, jlong jcomparator_handle, + jbyte jcomparator_type) { rocksdb::Comparator* comparator = nullptr; switch (jcomparator_type) { // JAVA_COMPARATOR @@ -3196,7 +3237,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setComparatorHandle__JJB( * Signature: (JJjava/lang/String)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMergeOperatorName( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, jstring jop_name) { + JNIEnv* env, jobject, jlong jhandle, jstring jop_name) { auto* options = reinterpret_cast(jhandle); const char* op_name = env->GetStringUTFChars(jop_name, nullptr); if (op_name == nullptr) { @@ -3215,8 +3256,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMergeOperatorName( * Signature: (JJjava/lang/String)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMergeOperator( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong mergeOperatorHandle) { + JNIEnv*, jobject, jlong jhandle, jlong mergeOperatorHandle) { reinterpret_cast(jhandle)->merge_operator = 
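[Editor's sketch] The setComparatorHandle__JI hunk above switches on the builtin-comparator selector, but its context ends at the assignment, so the right-hand sides do not appear in this diff. The sketch below fills them in as an assumption: selector 1 is taken to mean the reverse bytewise comparator, anything else the default bytewise comparator.

```cpp
// Sketch of the builtin-comparator dispatch behind setComparatorHandle__JI.
// The case-1 / default mapping is an assumption (the hunk's context elides
// the assigned values), modeled on the Java BuiltinComparator enum.
#include <rocksdb/comparator.h>
#include <rocksdb/options.h>

inline void set_builtin_comparator(rocksdb::ColumnFamilyOptions* cf_opts,
                                   int builtin_comparator) {
  switch (builtin_comparator) {
    case 1:
      cf_opts->comparator = rocksdb::ReverseBytewiseComparator();
      break;
    default:
      cf_opts->comparator = rocksdb::BytewiseComparator();
      break;
  }
}
```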
*(reinterpret_cast*>( mergeOperatorHandle)); @@ -3228,8 +3268,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMergeOperator( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionFilterHandle( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jopt_handle, - jlong jcompactionfilter_handle) { + JNIEnv*, jobject, jlong jopt_handle, jlong jcompactionfilter_handle) { reinterpret_cast(jopt_handle) ->compaction_filter = reinterpret_cast(jcompactionfilter_handle); @@ -3240,9 +3279,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionFilterHandle( * Method: setCompactionFilterFactoryHandle * Signature: (JJ)V */ -void JNICALL -Java_org_rocksdb_ColumnFamilyOptions_setCompactionFilterFactoryHandle( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jopt_handle, +void Java_org_rocksdb_ColumnFamilyOptions_setCompactionFilterFactoryHandle( + JNIEnv*, jobject, jlong jopt_handle, jlong jcompactionfilterfactory_handle) { auto* cff_factory = reinterpret_cast*>( @@ -3257,9 +3295,8 @@ Java_org_rocksdb_ColumnFamilyOptions_setCompactionFilterFactoryHandle( * Signature: (JJ)I */ void Java_org_rocksdb_ColumnFamilyOptions_setWriteBufferSize( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jwrite_buffer_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jwrite_buffer_size); + JNIEnv* env, jobject, jlong jhandle, jlong jwrite_buffer_size) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(jwrite_buffer_size); if (s.ok()) { reinterpret_cast(jhandle) ->write_buffer_size = jwrite_buffer_size; @@ -3273,9 +3310,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setWriteBufferSize( * Method: writeBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_ColumnFamilyOptions_writeBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ColumnFamilyOptions_writeBufferSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->write_buffer_size; } @@ -3286,8 +3322,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_writeBufferSize(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxWriteBufferNumber( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jmax_write_buffer_number) { + JNIEnv*, jobject, jlong jhandle, jint jmax_write_buffer_number) { reinterpret_cast(jhandle) ->max_write_buffer_number = jmax_write_buffer_number; } @@ -3297,9 +3332,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxWriteBufferNumber( * Method: maxWriteBufferNumber * Signature: (J)I */ -jint Java_org_rocksdb_ColumnFamilyOptions_maxWriteBufferNumber(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_ColumnFamilyOptions_maxWriteBufferNumber( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_write_buffer_number; } @@ -3309,7 +3343,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_maxWriteBufferNumber(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMemTableFactory( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jlong jfactory_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jfactory_handle) { reinterpret_cast(jhandle) ->memtable_factory.reset( reinterpret_cast(jfactory_handle)); @@ -3321,7 +3355,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMemTableFactory( * Signature: (J)Ljava/lang/String */ jstring Java_org_rocksdb_ColumnFamilyOptions_memTableFactoryName( - JNIEnv* env, jobject /*jobj*/, jlong jhandle) { + JNIEnv* env, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); rocksdb::MemTableRepFactory* tf = 
opt->memtable_factory.get(); @@ -3342,7 +3376,7 @@ jstring Java_org_rocksdb_ColumnFamilyOptions_memTableFactoryName( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_useFixedLengthPrefixExtractor( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jprefix_length) { + JNIEnv*, jobject, jlong jhandle, jint jprefix_length) { reinterpret_cast(jhandle) ->prefix_extractor.reset( rocksdb::NewFixedPrefixTransform(static_cast(jprefix_length))); @@ -3353,7 +3387,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_useFixedLengthPrefixExtractor( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_useCappedPrefixExtractor( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jprefix_length) { + JNIEnv*, jobject, jlong jhandle, jint jprefix_length) { reinterpret_cast(jhandle) ->prefix_extractor.reset( rocksdb::NewCappedPrefixTransform(static_cast(jprefix_length))); @@ -3364,7 +3398,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_useCappedPrefixExtractor( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setTableFactory( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jlong jfactory_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jfactory_handle) { reinterpret_cast(jhandle)->table_factory.reset( reinterpret_cast(jfactory_handle)); } @@ -3373,9 +3407,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setTableFactory( * Method: tableFactoryName * Signature: (J)Ljava/lang/String */ -jstring Java_org_rocksdb_ColumnFamilyOptions_tableFactoryName(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle) { +jstring Java_org_rocksdb_ColumnFamilyOptions_tableFactoryName( + JNIEnv* env, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); rocksdb::TableFactory* tf = opt->table_factory.get(); @@ -3392,7 +3425,7 @@ jstring Java_org_rocksdb_ColumnFamilyOptions_tableFactoryName(JNIEnv* env, * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_minWriteBufferNumberToMerge( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->min_write_buffer_number_to_merge; } @@ -3403,8 +3436,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_minWriteBufferNumberToMerge( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMinWriteBufferNumberToMerge( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jmin_write_buffer_number_to_merge) { + JNIEnv*, jobject, jlong jhandle, jint jmin_write_buffer_number_to_merge) { reinterpret_cast(jhandle) ->min_write_buffer_number_to_merge = static_cast(jmin_write_buffer_number_to_merge); @@ -3416,7 +3448,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMinWriteBufferNumberToMerge( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_maxWriteBufferNumberToMaintain( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_write_buffer_number_to_maintain; } @@ -3427,7 +3459,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_maxWriteBufferNumberToMaintain( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxWriteBufferNumberToMaintain( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint jmax_write_buffer_number_to_maintain) { reinterpret_cast(jhandle) ->max_write_buffer_number_to_maintain = @@ -3440,8 +3472,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxWriteBufferNumberToMaintain( * Signature: (JB)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompressionType( - JNIEnv* /*env*/, jobject /*jobj*/, jlong 
jhandle, - jbyte jcompression_type_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jcompression_type_value) { auto* cf_opts = reinterpret_cast(jhandle); cf_opts->compression = rocksdb::CompressionTypeJni::toCppCompressionType( jcompression_type_value); @@ -3452,9 +3483,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompressionType( * Method: compressionType * Signature: (J)B */ -jbyte Java_org_rocksdb_ColumnFamilyOptions_compressionType(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_ColumnFamilyOptions_compressionType( + JNIEnv*, jobject, jlong jhandle) { auto* cf_opts = reinterpret_cast(jhandle); return rocksdb::CompressionTypeJni::toJavaCompressionType( cf_opts->compression); @@ -3466,8 +3496,7 @@ jbyte Java_org_rocksdb_ColumnFamilyOptions_compressionType(JNIEnv* /*env*/, * Signature: (J[B)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompressionPerLevel( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jbyteArray jcompressionLevels) { + JNIEnv* env, jobject, jlong jhandle, jbyteArray jcompressionLevels) { auto* options = reinterpret_cast(jhandle); auto uptr_compression_levels = rocksdb_compression_vector_helper(env, jcompressionLevels); @@ -3484,7 +3513,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompressionPerLevel( * Signature: (J)[B */ jbyteArray Java_org_rocksdb_ColumnFamilyOptions_compressionPerLevel( - JNIEnv* env, jobject /*jobj*/, jlong jhandle) { + JNIEnv* env, jobject, jlong jhandle) { auto* cf_options = reinterpret_cast(jhandle); return rocksdb_compression_list_helper(env, cf_options->compression_per_level); @@ -3496,8 +3525,7 @@ jbyteArray Java_org_rocksdb_ColumnFamilyOptions_compressionPerLevel( * Signature: (JB)V */ void Java_org_rocksdb_ColumnFamilyOptions_setBottommostCompressionType( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jbyte jcompression_type_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jcompression_type_value) { auto* cf_options = reinterpret_cast(jhandle); cf_options->bottommost_compression = rocksdb::CompressionTypeJni::toCppCompressionType( @@ -3510,7 +3538,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setBottommostCompressionType( * Signature: (J)B */ jbyte Java_org_rocksdb_ColumnFamilyOptions_bottommostCompressionType( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* cf_options = reinterpret_cast(jhandle); return rocksdb::CompressionTypeJni::toJavaCompressionType( cf_options->bottommost_compression); @@ -3521,7 +3549,7 @@ jbyte Java_org_rocksdb_ColumnFamilyOptions_bottommostCompressionType( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setBottommostCompressionOptions( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jbottommost_compression_options_handle) { auto* cf_options = reinterpret_cast(jhandle); auto* bottommost_compression_options = @@ -3536,8 +3564,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setBottommostCompressionOptions( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompressionOptions( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jcompression_options_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jcompression_options_handle) { auto* cf_options = reinterpret_cast(jhandle); auto* compression_options = reinterpret_cast( jcompression_options_handle); @@ -3550,9 +3577,10 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompressionOptions( * Signature: (JB)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionStyle( - JNIEnv* 
/*env*/, jobject /*jobj*/, jlong jhandle, jbyte compaction_style) { - reinterpret_cast(jhandle)->compaction_style = - static_cast(compaction_style); + JNIEnv*, jobject, jlong jhandle, jbyte jcompaction_style) { + auto* cf_options = reinterpret_cast(jhandle); + cf_options->compaction_style = + rocksdb::CompactionStyleJni::toCppCompactionStyle(jcompaction_style); } /* @@ -3560,11 +3588,11 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionStyle( * Method: compactionStyle * Signature: (J)B */ -jbyte Java_org_rocksdb_ColumnFamilyOptions_compactionStyle(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { - return reinterpret_cast(jhandle) - ->compaction_style; +jbyte Java_org_rocksdb_ColumnFamilyOptions_compactionStyle( + JNIEnv*, jobject, jlong jhandle) { + auto* cf_options = reinterpret_cast(jhandle); + return rocksdb::CompactionStyleJni::toJavaCompactionStyle( + cf_options->compaction_style); } /* @@ -3573,8 +3601,7 @@ jbyte Java_org_rocksdb_ColumnFamilyOptions_compactionStyle(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxTableFilesSizeFIFO( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jmax_table_files_size) { + JNIEnv*, jobject, jlong jhandle, jlong jmax_table_files_size) { reinterpret_cast(jhandle) ->compaction_options_fifo.max_table_files_size = static_cast(jmax_table_files_size); @@ -3586,7 +3613,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxTableFilesSizeFIFO( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_maxTableFilesSizeFIFO( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->compaction_options_fifo.max_table_files_size; } @@ -3596,9 +3623,8 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_maxTableFilesSizeFIFO( * Method: numLevels * Signature: (J)I */ -jint Java_org_rocksdb_ColumnFamilyOptions_numLevels(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_ColumnFamilyOptions_numLevels( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->num_levels; } @@ -3607,10 +3633,8 @@ jint Java_org_rocksdb_ColumnFamilyOptions_numLevels(JNIEnv* /*env*/, * Method: setNumLevels * Signature: (JI)V */ -void Java_org_rocksdb_ColumnFamilyOptions_setNumLevels(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint jnum_levels) { +void Java_org_rocksdb_ColumnFamilyOptions_setNumLevels( + JNIEnv*, jobject, jlong jhandle, jint jnum_levels) { reinterpret_cast(jhandle)->num_levels = static_cast(jnum_levels); } @@ -3621,7 +3645,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setNumLevels(JNIEnv* /*env*/, * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_levelZeroFileNumCompactionTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_file_num_compaction_trigger; } @@ -3632,7 +3656,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_levelZeroFileNumCompactionTrigger( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevelZeroFileNumCompactionTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint jlevel0_file_num_compaction_trigger) { reinterpret_cast(jhandle) ->level0_file_num_compaction_trigger = @@ -3645,7 +3669,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevelZeroFileNumCompactionTrigger( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_levelZeroSlowdownWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + 
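[Editor's sketch] The setCompactionStyle/compactionStyle hunks above replace a raw static_cast of the incoming byte with a round trip through CompactionStyleJni, so unrecognized byte values can be normalized instead of being reinterpreted blindly. Below is a sketch of the byte-to-enum mapping such a helper performs; the specific byte values are assumptions modeled on the Java CompactionStyle enum, not taken from this diff.

```cpp
// Sketch of the byte-to-enum normalization a CompactionStyleJni-style helper
// performs. Byte values are assumptions, not taken from this diff.
#include <rocksdb/options.h>

inline rocksdb::CompactionStyle toCppCompactionStyle(
    unsigned char jbyte_value) {
  switch (jbyte_value) {
    case 0x1:
      return rocksdb::kCompactionStyleUniversal;
    case 0x2:
      return rocksdb::kCompactionStyleFIFO;
    case 0x3:
      return rocksdb::kCompactionStyleNone;
    case 0x0:
    default:
      return rocksdb::kCompactionStyleLevel;  // fall back to level style
  }
}
```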
JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_slowdown_writes_trigger; } @@ -3656,8 +3680,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_levelZeroSlowdownWritesTrigger( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevelZeroSlowdownWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_slowdown_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_slowdown_writes_trigger) { reinterpret_cast(jhandle) ->level0_slowdown_writes_trigger = static_cast(jlevel0_slowdown_writes_trigger); @@ -3669,7 +3692,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevelZeroSlowdownWritesTrigger( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_levelZeroStopWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_stop_writes_trigger; } @@ -3680,8 +3703,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_levelZeroStopWritesTrigger( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevelZeroStopWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_stop_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_stop_writes_trigger) { reinterpret_cast(jhandle) ->level0_stop_writes_trigger = static_cast(jlevel0_stop_writes_trigger); @@ -3692,9 +3714,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevelZeroStopWritesTrigger( * Method: targetFileSizeBase * Signature: (J)J */ -jlong Java_org_rocksdb_ColumnFamilyOptions_targetFileSizeBase(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ColumnFamilyOptions_targetFileSizeBase( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->target_file_size_base; } @@ -3705,8 +3726,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_targetFileSizeBase(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setTargetFileSizeBase( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jtarget_file_size_base) { + JNIEnv*, jobject, jlong jhandle, jlong jtarget_file_size_base) { reinterpret_cast(jhandle) ->target_file_size_base = static_cast(jtarget_file_size_base); } @@ -3717,7 +3737,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setTargetFileSizeBase( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_targetFileSizeMultiplier( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->target_file_size_multiplier; } @@ -3728,8 +3748,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_targetFileSizeMultiplier( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setTargetFileSizeMultiplier( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jtarget_file_size_multiplier) { + JNIEnv*, jobject, jlong jhandle, jint jtarget_file_size_multiplier) { reinterpret_cast(jhandle) ->target_file_size_multiplier = static_cast(jtarget_file_size_multiplier); @@ -3741,7 +3760,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setTargetFileSizeMultiplier( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelBase( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_bytes_for_level_base; } @@ -3752,8 +3771,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelBase( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxBytesForLevelBase( - JNIEnv* /*env*/, jobject /*jobj*/, 
jlong jhandle, - jlong jmax_bytes_for_level_base) { + JNIEnv*, jobject, jlong jhandle, jlong jmax_bytes_for_level_base) { reinterpret_cast(jhandle) ->max_bytes_for_level_base = static_cast(jmax_bytes_for_level_base); @@ -3765,7 +3783,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxBytesForLevelBase( * Signature: (J)Z */ jboolean Java_org_rocksdb_ColumnFamilyOptions_levelCompactionDynamicLevelBytes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level_compaction_dynamic_level_bytes; } @@ -3776,8 +3794,7 @@ jboolean Java_org_rocksdb_ColumnFamilyOptions_levelCompactionDynamicLevelBytes( * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevelCompactionDynamicLevelBytes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jenable_dynamic_level_bytes) { + JNIEnv*, jobject, jlong jhandle, jboolean jenable_dynamic_level_bytes) { reinterpret_cast(jhandle) ->level_compaction_dynamic_level_bytes = (jenable_dynamic_level_bytes); } @@ -3788,7 +3805,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevelCompactionDynamicLevelBytes( * Signature: (J)D */ jdouble Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelMultiplier( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_bytes_for_level_multiplier; } @@ -3799,8 +3816,7 @@ jdouble Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelMultiplier( * Signature: (JD)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxBytesForLevelMultiplier( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jdouble jmax_bytes_for_level_multiplier) { + JNIEnv*, jobject, jlong jhandle, jdouble jmax_bytes_for_level_multiplier) { reinterpret_cast(jhandle) ->max_bytes_for_level_multiplier = static_cast(jmax_bytes_for_level_multiplier); @@ -3811,9 +3827,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxBytesForLevelMultiplier( * Method: maxCompactionBytes * Signature: (J)I */ -jlong Java_org_rocksdb_ColumnFamilyOptions_maxCompactionBytes(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ColumnFamilyOptions_maxCompactionBytes( + JNIEnv*, jobject, jlong jhandle) { return static_cast( reinterpret_cast(jhandle) ->max_compaction_bytes); @@ -3825,8 +3840,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_maxCompactionBytes(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxCompactionBytes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jmax_compaction_bytes) { + JNIEnv*, jobject, jlong jhandle, jlong jmax_compaction_bytes) { reinterpret_cast(jhandle) ->max_compaction_bytes = static_cast(jmax_compaction_bytes); } @@ -3836,9 +3850,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxCompactionBytes( * Method: arenaBlockSize * Signature: (J)J */ -jlong Java_org_rocksdb_ColumnFamilyOptions_arenaBlockSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ColumnFamilyOptions_arenaBlockSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->arena_block_size; } @@ -3849,8 +3862,8 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_arenaBlockSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setArenaBlockSize( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, jlong jarena_block_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(jarena_block_size); + JNIEnv* env, jobject, jlong jhandle, jlong jarena_block_size) { + auto s = 
rocksdb::JniUtil::check_if_jlong_fits_size_t(jarena_block_size); if (s.ok()) { reinterpret_cast(jhandle)->arena_block_size = jarena_block_size; @@ -3865,7 +3878,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setArenaBlockSize( * Signature: (J)Z */ jboolean Java_org_rocksdb_ColumnFamilyOptions_disableAutoCompactions( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->disable_auto_compactions; } @@ -3876,8 +3889,7 @@ jboolean Java_org_rocksdb_ColumnFamilyOptions_disableAutoCompactions( * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setDisableAutoCompactions( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jdisable_auto_compactions) { + JNIEnv*, jobject, jlong jhandle, jboolean jdisable_auto_compactions) { reinterpret_cast(jhandle) ->disable_auto_compactions = static_cast(jdisable_auto_compactions); } @@ -3888,7 +3900,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setDisableAutoCompactions( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_maxSequentialSkipInIterations( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_sequential_skip_in_iterations; } @@ -3899,7 +3911,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_maxSequentialSkipInIterations( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxSequentialSkipInIterations( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jmax_sequential_skip_in_iterations) { reinterpret_cast(jhandle) ->max_sequential_skip_in_iterations = @@ -3912,7 +3924,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxSequentialSkipInIterations( * Signature: (J)Z */ jboolean Java_org_rocksdb_ColumnFamilyOptions_inplaceUpdateSupport( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->inplace_update_support; } @@ -3923,8 +3935,7 @@ jboolean Java_org_rocksdb_ColumnFamilyOptions_inplaceUpdateSupport( * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setInplaceUpdateSupport( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jinplace_update_support) { + JNIEnv*, jobject, jlong jhandle, jboolean jinplace_update_support) { reinterpret_cast(jhandle) ->inplace_update_support = static_cast(jinplace_update_support); } @@ -3935,7 +3946,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setInplaceUpdateSupport( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_inplaceUpdateNumLocks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->inplace_update_num_locks; } @@ -3946,10 +3957,9 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_inplaceUpdateNumLocks( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setInplaceUpdateNumLocks( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jinplace_update_num_locks) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(jinplace_update_num_locks); + JNIEnv* env, jobject, jlong jhandle, jlong jinplace_update_num_locks) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jinplace_update_num_locks); if (s.ok()) { reinterpret_cast(jhandle) ->inplace_update_num_locks = jinplace_update_num_locks; @@ -3964,7 +3974,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setInplaceUpdateNumLocks( * Signature: (J)I */ jdouble Java_org_rocksdb_ColumnFamilyOptions_memtablePrefixBloomSizeRatio( - 
JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->memtable_prefix_bloom_size_ratio; } @@ -3975,7 +3985,7 @@ jdouble Java_org_rocksdb_ColumnFamilyOptions_memtablePrefixBloomSizeRatio( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMemtablePrefixBloomSizeRatio( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jdouble jmemtable_prefix_bloom_size_ratio) { reinterpret_cast(jhandle) ->memtable_prefix_bloom_size_ratio = @@ -3987,9 +3997,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMemtablePrefixBloomSizeRatio( * Method: bloomLocality * Signature: (J)I */ -jint Java_org_rocksdb_ColumnFamilyOptions_bloomLocality(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_ColumnFamilyOptions_bloomLocality( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->bloom_locality; } @@ -4000,7 +4009,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_bloomLocality(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setBloomLocality( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jint jbloom_locality) { + JNIEnv*, jobject, jlong jhandle, jint jbloom_locality) { reinterpret_cast(jhandle)->bloom_locality = static_cast(jbloom_locality); } @@ -4010,9 +4019,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setBloomLocality( * Method: maxSuccessiveMerges * Signature: (J)J */ -jlong Java_org_rocksdb_ColumnFamilyOptions_maxSuccessiveMerges(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ColumnFamilyOptions_maxSuccessiveMerges( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_successive_merges; } @@ -4023,10 +4031,9 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_maxSuccessiveMerges(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxSuccessiveMerges( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jmax_successive_merges) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(jmax_successive_merges); + JNIEnv* env, jobject, jlong jhandle, jlong jmax_successive_merges) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jmax_successive_merges); if (s.ok()) { reinterpret_cast(jhandle) ->max_successive_merges = jmax_successive_merges; @@ -4041,7 +4048,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxSuccessiveMerges( * Signature: (J)Z */ jboolean Java_org_rocksdb_ColumnFamilyOptions_optimizeFiltersForHits( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->optimize_filters_for_hits; } @@ -4052,8 +4059,7 @@ jboolean Java_org_rocksdb_ColumnFamilyOptions_optimizeFiltersForHits( * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setOptimizeFiltersForHits( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean joptimize_filters_for_hits) { + JNIEnv*, jobject, jlong jhandle, jboolean joptimize_filters_for_hits) { reinterpret_cast(jhandle) ->optimize_filters_for_hits = static_cast(joptimize_filters_for_hits); @@ -4065,7 +4071,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setOptimizeFiltersForHits( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_memtableHugePageSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->memtable_huge_page_size; } @@ -4076,10 +4082,9 @@ jlong 
Java_org_rocksdb_ColumnFamilyOptions_memtableHugePageSize( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMemtableHugePageSize( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong jmemtable_huge_page_size) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(jmemtable_huge_page_size); + JNIEnv* env, jobject, jlong jhandle, jlong jmemtable_huge_page_size) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(jmemtable_huge_page_size); if (s.ok()) { reinterpret_cast(jhandle) ->memtable_huge_page_size = jmemtable_huge_page_size; @@ -4094,7 +4099,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMemtableHugePageSize( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_softPendingCompactionBytesLimit( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->soft_pending_compaction_bytes_limit; } @@ -4105,7 +4110,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_softPendingCompactionBytesLimit( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setSoftPendingCompactionBytesLimit( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jsoft_pending_compaction_bytes_limit) { reinterpret_cast(jhandle) ->soft_pending_compaction_bytes_limit = @@ -4118,7 +4123,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setSoftPendingCompactionBytesLimit( * Signature: (J)J */ jlong Java_org_rocksdb_ColumnFamilyOptions_hardPendingCompactionBytesLimit( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->hard_pending_compaction_bytes_limit; } @@ -4129,7 +4134,7 @@ jlong Java_org_rocksdb_ColumnFamilyOptions_hardPendingCompactionBytesLimit( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setHardPendingCompactionBytesLimit( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jhard_pending_compaction_bytes_limit) { reinterpret_cast(jhandle) ->hard_pending_compaction_bytes_limit = @@ -4142,7 +4147,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setHardPendingCompactionBytesLimit( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_level0FileNumCompactionTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_file_num_compaction_trigger; } @@ -4153,7 +4158,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_level0FileNumCompactionTrigger( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevel0FileNumCompactionTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jint jlevel0_file_num_compaction_trigger) { reinterpret_cast(jhandle) ->level0_file_num_compaction_trigger = @@ -4166,7 +4171,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevel0FileNumCompactionTrigger( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_level0SlowdownWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_slowdown_writes_trigger; } @@ -4177,8 +4182,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_level0SlowdownWritesTrigger( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevel0SlowdownWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_slowdown_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_slowdown_writes_trigger) { 
reinterpret_cast(jhandle) ->level0_slowdown_writes_trigger = static_cast(jlevel0_slowdown_writes_trigger); @@ -4190,7 +4194,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevel0SlowdownWritesTrigger( * Signature: (J)I */ jint Java_org_rocksdb_ColumnFamilyOptions_level0StopWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->level0_stop_writes_trigger; } @@ -4201,8 +4205,7 @@ jint Java_org_rocksdb_ColumnFamilyOptions_level0StopWritesTrigger( * Signature: (JI)V */ void Java_org_rocksdb_ColumnFamilyOptions_setLevel0StopWritesTrigger( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jlevel0_stop_writes_trigger) { + JNIEnv*, jobject, jlong jhandle, jint jlevel0_stop_writes_trigger) { reinterpret_cast(jhandle) ->level0_stop_writes_trigger = static_cast(jlevel0_stop_writes_trigger); @@ -4213,9 +4216,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setLevel0StopWritesTrigger( * Method: maxBytesForLevelMultiplierAdditional * Signature: (J)[I */ -jintArray -Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelMultiplierAdditional( - JNIEnv* env, jobject /*jobj*/, jlong jhandle) { +jintArray Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelMultiplierAdditional( + JNIEnv* env, jobject, jlong jhandle) { auto mbflma = reinterpret_cast(jhandle) ->max_bytes_for_level_multiplier_additional; @@ -4252,7 +4254,7 @@ Java_org_rocksdb_ColumnFamilyOptions_maxBytesForLevelMultiplierAdditional( * Signature: (J[I)V */ void Java_org_rocksdb_ColumnFamilyOptions_setMaxBytesForLevelMultiplierAdditional( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, + JNIEnv* env, jobject, jlong jhandle, jintArray jmax_bytes_for_level_multiplier_additional) { jsize len = env->GetArrayLength(jmax_bytes_for_level_multiplier_additional); jint* additionals = @@ -4279,7 +4281,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setMaxBytesForLevelMultiplierAdditiona * Signature: (J)Z */ jboolean Java_org_rocksdb_ColumnFamilyOptions_paranoidFileChecks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->paranoid_file_checks; } @@ -4290,8 +4292,7 @@ jboolean Java_org_rocksdb_ColumnFamilyOptions_paranoidFileChecks( * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setParanoidFileChecks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jparanoid_file_checks) { + JNIEnv*, jobject, jlong jhandle, jboolean jparanoid_file_checks) { reinterpret_cast(jhandle) ->paranoid_file_checks = static_cast(jparanoid_file_checks); } @@ -4302,8 +4303,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setParanoidFileChecks( * Signature: (JB)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionPriority( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jbyte jcompaction_priority_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jcompaction_priority_value) { auto* cf_opts = reinterpret_cast(jhandle); cf_opts->compaction_pri = rocksdb::CompactionPriorityJni::toCppCompactionPriority( @@ -4315,9 +4315,8 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionPriority( * Method: compactionPriority * Signature: (J)B */ -jbyte Java_org_rocksdb_ColumnFamilyOptions_compactionPriority(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_ColumnFamilyOptions_compactionPriority( + JNIEnv*, jobject, jlong jhandle) { auto* cf_opts = reinterpret_cast(jhandle); return rocksdb::CompactionPriorityJni::toJavaCompactionPriority( cf_opts->compaction_pri); 
@@ -4329,8 +4328,7 @@ jbyte Java_org_rocksdb_ColumnFamilyOptions_compactionPriority(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setReportBgIoStats( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jreport_bg_io_stats) { + JNIEnv*, jobject, jlong jhandle, jboolean jreport_bg_io_stats) { auto* cf_opts = reinterpret_cast(jhandle); cf_opts->report_bg_io_stats = static_cast(jreport_bg_io_stats); } @@ -4340,20 +4338,41 @@ void Java_org_rocksdb_ColumnFamilyOptions_setReportBgIoStats( * Method: reportBgIoStats * Signature: (J)Z */ -jboolean Java_org_rocksdb_ColumnFamilyOptions_reportBgIoStats(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ColumnFamilyOptions_reportBgIoStats( + JNIEnv*, jobject, jlong jhandle) { auto* cf_opts = reinterpret_cast(jhandle); return static_cast(cf_opts->report_bg_io_stats); } +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: setTtl + * Signature: (JJ)V + */ +void Java_org_rocksdb_ColumnFamilyOptions_setTtl( + JNIEnv*, jobject, jlong jhandle, jlong jttl) { + auto* cf_opts = reinterpret_cast(jhandle); + cf_opts->ttl = static_cast(jttl); +} + +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: ttl + * Signature: (J)J + */ +JNIEXPORT jlong JNICALL Java_org_rocksdb_ColumnFamilyOptions_ttl( + JNIEnv*, jobject, jlong jhandle) { + auto* cf_opts = reinterpret_cast(jhandle); + return static_cast(cf_opts->ttl); +} + /* * Class: org_rocksdb_ColumnFamilyOptions * Method: setCompactionOptionsUniversal * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionOptionsUniversal( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jlong jcompaction_options_universal_handle) { auto* cf_opts = reinterpret_cast(jhandle); auto* opts_uni = reinterpret_cast( @@ -4367,8 +4386,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionOptionsUniversal( * Signature: (JJ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionOptionsFIFO( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jcompaction_options_fifo_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jcompaction_options_fifo_handle) { auto* cf_opts = reinterpret_cast(jhandle); auto* opts_fifo = reinterpret_cast( jcompaction_options_fifo_handle); @@ -4381,8 +4399,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setCompactionOptionsFIFO( * Signature: (JZ)V */ void Java_org_rocksdb_ColumnFamilyOptions_setForceConsistencyChecks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jforce_consistency_checks) { + JNIEnv*, jobject, jlong jhandle, jboolean jforce_consistency_checks) { auto* cf_opts = reinterpret_cast(jhandle); cf_opts->force_consistency_checks = static_cast(jforce_consistency_checks); @@ -4394,7 +4411,7 @@ void Java_org_rocksdb_ColumnFamilyOptions_setForceConsistencyChecks( * Signature: (J)Z */ jboolean Java_org_rocksdb_ColumnFamilyOptions_forceConsistencyChecks( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* cf_opts = reinterpret_cast(jhandle); return static_cast(cf_opts->force_consistency_checks); } @@ -4407,7 +4424,8 @@ jboolean Java_org_rocksdb_ColumnFamilyOptions_forceConsistencyChecks( * Method: newDBOptions * Signature: ()J */ -jlong Java_org_rocksdb_DBOptions_newDBOptions(JNIEnv* /*env*/, jclass /*jcls*/) { +jlong Java_org_rocksdb_DBOptions_newDBOptions( + JNIEnv*, jclass) { auto* dbop = new rocksdb::DBOptions(); return reinterpret_cast(dbop); } @@ -4417,21 +4435,32 @@ jlong 
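// [Editor's sketch] The new setTtl/ttl bindings above forward to the native
// rocksdb::ColumnFamilyOptions::ttl field, in seconds. A minimal native-side
// usage, assuming this RocksDB version's semantics (0 disables TTL-driven
// compaction; exact behavior depends on the compaction style):
#include <rocksdb/options.h>
void ttl_example() {
  rocksdb::ColumnFamilyOptions cf_opts;
  cf_opts.ttl = 60 * 60 * 24 * 7;  // compact away data older than ~7 days
}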
Java_org_rocksdb_DBOptions_newDBOptions(JNIEnv* /*env*/, jclass /*jcls*/) * Method: copyDBOptions * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_copyDBOptions(JNIEnv* /*env*/, jclass /*jcls*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_copyDBOptions( + JNIEnv*, jclass, jlong jhandle) { auto new_opt = new rocksdb::DBOptions(*(reinterpret_cast(jhandle))); return reinterpret_cast(new_opt); } +/* + * Class: org_rocksdb_DBOptions + * Method: newDBOptionsFromOptions + * Signature: (J)J + */ +jlong Java_org_rocksdb_DBOptions_newDBOptionsFromOptions( + JNIEnv*, jclass, jlong joptions_handle) { + auto new_opt = + new rocksdb::DBOptions(*reinterpret_cast(joptions_handle)); + return reinterpret_cast(new_opt); +} + /* * Class: org_rocksdb_DBOptions * Method: getDBOptionsFromProps * Signature: (Ljava/util/String;)J */ -jlong Java_org_rocksdb_DBOptions_getDBOptionsFromProps(JNIEnv* env, - jclass /*jclazz*/, - jstring jopt_string) { +jlong Java_org_rocksdb_DBOptions_getDBOptionsFromProps( + JNIEnv* env, jclass, jstring jopt_string) { const char* opt_string = env->GetStringUTFChars(jopt_string, nullptr); if (opt_string == nullptr) { // exception thrown: OutOfMemoryError @@ -4461,9 +4490,8 @@ jlong Java_org_rocksdb_DBOptions_getDBOptionsFromProps(JNIEnv* env, * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_DBOptions_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong handle) { +void Java_org_rocksdb_DBOptions_disposeInternal( + JNIEnv*, jobject, jlong handle) { auto* dbo = reinterpret_cast(handle); assert(dbo != nullptr); delete dbo; @@ -4474,9 +4502,8 @@ void Java_org_rocksdb_DBOptions_disposeInternal(JNIEnv* /*env*/, * Method: optimizeForSmallDb * Signature: (J)V */ -void Java_org_rocksdb_DBOptions_optimizeForSmallDb(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_DBOptions_optimizeForSmallDb( + JNIEnv*, jobject, jlong jhandle) { reinterpret_cast(jhandle)->OptimizeForSmallDb(); } @@ -4485,8 +4512,8 @@ void Java_org_rocksdb_DBOptions_optimizeForSmallDb(JNIEnv* /*env*/, * Method: setEnv * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setEnv(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jlong jenv_handle) { +void Java_org_rocksdb_DBOptions_setEnv( + JNIEnv*, jobject, jlong jhandle, jlong jenv_handle) { reinterpret_cast(jhandle)->env = reinterpret_cast(jenv_handle); } @@ -4496,10 +4523,8 @@ void Java_org_rocksdb_DBOptions_setEnv(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setIncreaseParallelism * Signature: (JI)V */ -void Java_org_rocksdb_DBOptions_setIncreaseParallelism(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint totalThreads) { +void Java_org_rocksdb_DBOptions_setIncreaseParallelism( + JNIEnv*, jobject, jlong jhandle, jint totalThreads) { reinterpret_cast(jhandle)->IncreaseParallelism( static_cast(totalThreads)); } @@ -4509,10 +4534,8 @@ void Java_org_rocksdb_DBOptions_setIncreaseParallelism(JNIEnv* /*env*/, * Method: setCreateIfMissing * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setCreateIfMissing(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean flag) { +void Java_org_rocksdb_DBOptions_setCreateIfMissing( + JNIEnv*, jobject, jlong jhandle, jboolean flag) { reinterpret_cast(jhandle)->create_if_missing = flag; } @@ -4521,9 +4544,8 @@ void Java_org_rocksdb_DBOptions_setCreateIfMissing(JNIEnv* /*env*/, * Method: createIfMissing * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_createIfMissing(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean 
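// [Editor's sketch] newDBOptionsFromOptions above relies on the native
// rocksdb::DBOptions(const rocksdb::Options&) constructor, which copies only
// the DB-wide part out of the combined Options object:
#include <rocksdb/options.h>
void db_options_from_options_example() {
  rocksdb::Options options;             // DBOptions + ColumnFamilyOptions in one
  options.create_if_missing = true;
  rocksdb::DBOptions db_only(options);  // keeps only the DB-wide settings
  // db_only.create_if_missing is now true
}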
Java_org_rocksdb_DBOptions_createIfMissing( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->create_if_missing; } @@ -4532,10 +4554,8 @@ jboolean Java_org_rocksdb_DBOptions_createIfMissing(JNIEnv* /*env*/, * Method: setCreateMissingColumnFamilies * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setCreateMissingColumnFamilies(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean flag) { +void Java_org_rocksdb_DBOptions_setCreateMissingColumnFamilies( + JNIEnv*, jobject, jlong jhandle, jboolean flag) { reinterpret_cast(jhandle) ->create_missing_column_families = flag; } @@ -4546,7 +4566,7 @@ void Java_org_rocksdb_DBOptions_setCreateMissingColumnFamilies(JNIEnv* /*env*/, * Signature: (J)Z */ jboolean Java_org_rocksdb_DBOptions_createMissingColumnFamilies( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->create_missing_column_families; } @@ -4556,10 +4576,8 @@ jboolean Java_org_rocksdb_DBOptions_createMissingColumnFamilies( * Method: setErrorIfExists * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setErrorIfExists(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean error_if_exists) { +void Java_org_rocksdb_DBOptions_setErrorIfExists( + JNIEnv*, jobject, jlong jhandle, jboolean error_if_exists) { reinterpret_cast(jhandle)->error_if_exists = static_cast(error_if_exists); } @@ -4569,9 +4587,8 @@ void Java_org_rocksdb_DBOptions_setErrorIfExists(JNIEnv* /*env*/, * Method: errorIfExists * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_errorIfExists(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_errorIfExists( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->error_if_exists; } @@ -4580,10 +4597,8 @@ jboolean Java_org_rocksdb_DBOptions_errorIfExists(JNIEnv* /*env*/, * Method: setParanoidChecks * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setParanoidChecks(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean paranoid_checks) { +void Java_org_rocksdb_DBOptions_setParanoidChecks( + JNIEnv*, jobject, jlong jhandle, jboolean paranoid_checks) { reinterpret_cast(jhandle)->paranoid_checks = static_cast(paranoid_checks); } @@ -4593,9 +4608,8 @@ void Java_org_rocksdb_DBOptions_setParanoidChecks(JNIEnv* /*env*/, * Method: paranoidChecks * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_paranoidChecks(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_paranoidChecks( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->paranoid_checks; } @@ -4604,9 +4618,8 @@ jboolean Java_org_rocksdb_DBOptions_paranoidChecks(JNIEnv* /*env*/, * Method: setRateLimiter * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setRateLimiter(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jlong jrate_limiter_handle) { +void Java_org_rocksdb_DBOptions_setRateLimiter( + JNIEnv*, jobject, jlong jhandle, jlong jrate_limiter_handle) { std::shared_ptr* pRateLimiter = reinterpret_cast*>( jrate_limiter_handle); @@ -4619,8 +4632,7 @@ void Java_org_rocksdb_DBOptions_setRateLimiter(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setSstFileManager( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jsst_file_manager_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jsst_file_manager_handle) { auto* sptr_sst_file_manager = reinterpret_cast*>( jsst_file_manager_handle); @@ -4633,8 +4645,8 @@ void 
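// [Editor's sketch] All of these bindings follow the same handle convention:
// the Java object stores a raw native pointer in a jlong, and each native
// function reinterpret_casts it back. A stripped-down illustration of the
// pattern (names here are illustrative, not from the patch):
#include <jni.h>
struct NativeOptions { bool create_if_missing = false; };
extern "C" jlong Java_example_Options_newHandle(JNIEnv*, jclass) {
  return reinterpret_cast<jlong>(new NativeOptions());  // ownership -> Java side
}
extern "C" void Java_example_Options_dispose(JNIEnv*, jobject, jlong handle) {
  delete reinterpret_cast<NativeOptions*>(handle);  // Java calls this exactly once
}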
Java_org_rocksdb_DBOptions_setSstFileManager( * Method: setLogger * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setLogger(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jlong jlogger_handle) { +void Java_org_rocksdb_DBOptions_setLogger( + JNIEnv*, jobject, jlong jhandle, jlong jlogger_handle) { std::shared_ptr* pLogger = reinterpret_cast*>( jlogger_handle); @@ -4646,9 +4658,8 @@ void Java_org_rocksdb_DBOptions_setLogger(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setInfoLogLevel * Signature: (JB)V */ -void Java_org_rocksdb_DBOptions_setInfoLogLevel(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jbyte jlog_level) { +void Java_org_rocksdb_DBOptions_setInfoLogLevel( + JNIEnv*, jobject, jlong jhandle, jbyte jlog_level) { reinterpret_cast(jhandle)->info_log_level = static_cast(jlog_level); } @@ -4658,8 +4669,8 @@ void Java_org_rocksdb_DBOptions_setInfoLogLevel(JNIEnv* /*env*/, * Method: infoLogLevel * Signature: (J)B */ -jbyte Java_org_rocksdb_DBOptions_infoLogLevel(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_DBOptions_infoLogLevel( + JNIEnv*, jobject, jlong jhandle) { return static_cast( reinterpret_cast(jhandle)->info_log_level); } @@ -4669,10 +4680,8 @@ jbyte Java_org_rocksdb_DBOptions_infoLogLevel(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setMaxTotalWalSize * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setMaxTotalWalSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jmax_total_wal_size) { +void Java_org_rocksdb_DBOptions_setMaxTotalWalSize( + JNIEnv*, jobject, jlong jhandle, jlong jmax_total_wal_size) { reinterpret_cast(jhandle)->max_total_wal_size = static_cast(jmax_total_wal_size); } @@ -4682,9 +4691,8 @@ void Java_org_rocksdb_DBOptions_setMaxTotalWalSize(JNIEnv* /*env*/, * Method: maxTotalWalSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_maxTotalWalSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_maxTotalWalSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_total_wal_size; } @@ -4693,9 +4701,8 @@ jlong Java_org_rocksdb_DBOptions_maxTotalWalSize(JNIEnv* /*env*/, * Method: setMaxOpenFiles * Signature: (JI)V */ -void Java_org_rocksdb_DBOptions_setMaxOpenFiles(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jint max_open_files) { +void Java_org_rocksdb_DBOptions_setMaxOpenFiles( + JNIEnv*, jobject, jlong jhandle, jint max_open_files) { reinterpret_cast(jhandle)->max_open_files = static_cast(max_open_files); } @@ -4705,8 +4712,8 @@ void Java_org_rocksdb_DBOptions_setMaxOpenFiles(JNIEnv* /*env*/, * Method: maxOpenFiles * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_maxOpenFiles(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_maxOpenFiles( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_open_files; } @@ -4716,8 +4723,7 @@ jint Java_org_rocksdb_DBOptions_maxOpenFiles(JNIEnv* /*env*/, jobject /*jobj*/, * Signature: (JI)V */ void Java_org_rocksdb_DBOptions_setMaxFileOpeningThreads( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jmax_file_opening_threads) { + JNIEnv*, jobject, jlong jhandle, jint jmax_file_opening_threads) { reinterpret_cast(jhandle)->max_file_opening_threads = static_cast(jmax_file_opening_threads); } @@ -4727,9 +4733,8 @@ void Java_org_rocksdb_DBOptions_setMaxFileOpeningThreads( * Method: maxFileOpeningThreads * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_maxFileOpeningThreads(JNIEnv* /*env*/, - jobject /*jobj*/, 
- jlong jhandle) { +jint Java_org_rocksdb_DBOptions_maxFileOpeningThreads( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->max_file_opening_threads); } @@ -4739,9 +4744,8 @@ jint Java_org_rocksdb_DBOptions_maxFileOpeningThreads(JNIEnv* /*env*/, * Method: setStatistics * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setStatistics(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jlong jstatistics_handle) { +void Java_org_rocksdb_DBOptions_setStatistics( + JNIEnv*, jobject, jlong jhandle, jlong jstatistics_handle) { auto* opt = reinterpret_cast(jhandle); auto* pSptr = reinterpret_cast*>( jstatistics_handle); @@ -4753,8 +4757,8 @@ void Java_org_rocksdb_DBOptions_setStatistics(JNIEnv* /*env*/, jobject /*jobj*/, * Method: statistics * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_statistics(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_statistics( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); std::shared_ptr sptr = opt->statistics; if (sptr == nullptr) { @@ -4771,8 +4775,8 @@ jlong Java_org_rocksdb_DBOptions_statistics(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setUseFsync * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setUseFsync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jboolean use_fsync) { +void Java_org_rocksdb_DBOptions_setUseFsync( + JNIEnv*, jobject, jlong jhandle, jboolean use_fsync) { reinterpret_cast(jhandle)->use_fsync = static_cast(use_fsync); } @@ -4782,8 +4786,8 @@ void Java_org_rocksdb_DBOptions_setUseFsync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: useFsync * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_useFsync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_useFsync( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->use_fsync; } @@ -4792,9 +4796,9 @@ jboolean Java_org_rocksdb_DBOptions_useFsync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setDbPaths * Signature: (J[Ljava/lang/String;[J)V */ -void Java_org_rocksdb_DBOptions_setDbPaths(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jobjectArray jpaths, - jlongArray jtarget_sizes) { +void Java_org_rocksdb_DBOptions_setDbPaths( + JNIEnv* env, jobject, jlong jhandle, jobjectArray jpaths, + jlongArray jtarget_sizes) { std::vector db_paths; jlong* ptr_jtarget_size = env->GetLongArrayElements(jtarget_sizes, nullptr); if (ptr_jtarget_size == nullptr) { @@ -4838,8 +4842,8 @@ void Java_org_rocksdb_DBOptions_setDbPaths(JNIEnv* env, jobject /*jobj*/, * Method: dbPathsLen * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_dbPathsLen(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_dbPathsLen( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->db_paths.size()); } @@ -4849,9 +4853,9 @@ jlong Java_org_rocksdb_DBOptions_dbPathsLen(JNIEnv* /*env*/, jobject /*jobj*/, * Method: dbPaths * Signature: (J[Ljava/lang/String;[J)V */ -void Java_org_rocksdb_DBOptions_dbPaths(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jobjectArray jpaths, - jlongArray jtarget_sizes) { +void Java_org_rocksdb_DBOptions_dbPaths( + JNIEnv* env, jobject, jlong jhandle, jobjectArray jpaths, + jlongArray jtarget_sizes) { jlong* ptr_jtarget_size = env->GetLongArrayElements(jtarget_sizes, nullptr); if (ptr_jtarget_size == nullptr) { // exception thrown: OutOfMemoryError @@ -4888,9 +4892,8 @@ void Java_org_rocksdb_DBOptions_dbPaths(JNIEnv* env, 
jobject /*jobj*/, * Method: setDbLogDir * Signature: (JLjava/lang/String)V */ -void Java_org_rocksdb_DBOptions_setDbLogDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, - jstring jdb_log_dir) { +void Java_org_rocksdb_DBOptions_setDbLogDir( + JNIEnv* env, jobject, jlong jhandle, jstring jdb_log_dir) { const char* log_dir = env->GetStringUTFChars(jdb_log_dir, nullptr); if (log_dir == nullptr) { // exception thrown: OutOfMemoryError @@ -4906,8 +4909,8 @@ void Java_org_rocksdb_DBOptions_setDbLogDir(JNIEnv* env, jobject /*jobj*/, * Method: dbLogDir * Signature: (J)Ljava/lang/String */ -jstring Java_org_rocksdb_DBOptions_dbLogDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle) { +jstring Java_org_rocksdb_DBOptions_dbLogDir( + JNIEnv* env, jobject, jlong jhandle) { return env->NewStringUTF( reinterpret_cast(jhandle)->db_log_dir.c_str()); } @@ -4917,8 +4920,8 @@ jstring Java_org_rocksdb_DBOptions_dbLogDir(JNIEnv* env, jobject /*jobj*/, * Method: setWalDir * Signature: (JLjava/lang/String)V */ -void Java_org_rocksdb_DBOptions_setWalDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jstring jwal_dir) { +void Java_org_rocksdb_DBOptions_setWalDir( + JNIEnv* env, jobject, jlong jhandle, jstring jwal_dir) { const char* wal_dir = env->GetStringUTFChars(jwal_dir, 0); reinterpret_cast(jhandle)->wal_dir.assign(wal_dir); env->ReleaseStringUTFChars(jwal_dir, wal_dir); @@ -4929,8 +4932,8 @@ void Java_org_rocksdb_DBOptions_setWalDir(JNIEnv* env, jobject /*jobj*/, * Method: walDir * Signature: (J)Ljava/lang/String */ -jstring Java_org_rocksdb_DBOptions_walDir(JNIEnv* env, jobject /*jobj*/, - jlong jhandle) { +jstring Java_org_rocksdb_DBOptions_walDir( + JNIEnv* env, jobject, jlong jhandle) { return env->NewStringUTF( reinterpret_cast(jhandle)->wal_dir.c_str()); } @@ -4941,7 +4944,7 @@ jstring Java_org_rocksdb_DBOptions_walDir(JNIEnv* env, jobject /*jobj*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setDeleteObsoleteFilesPeriodMicros( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jlong micros) { + JNIEnv*, jobject, jlong jhandle, jlong micros) { reinterpret_cast(jhandle) ->delete_obsolete_files_period_micros = static_cast(micros); } @@ -4952,7 +4955,7 @@ void Java_org_rocksdb_DBOptions_setDeleteObsoleteFilesPeriodMicros( * Signature: (J)J */ jlong Java_org_rocksdb_DBOptions_deleteObsoleteFilesPeriodMicros( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->delete_obsolete_files_period_micros; } @@ -4962,10 +4965,8 @@ jlong Java_org_rocksdb_DBOptions_deleteObsoleteFilesPeriodMicros( * Method: setBaseBackgroundCompactions * Signature: (JI)V */ -void Java_org_rocksdb_DBOptions_setBaseBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint max) { +void Java_org_rocksdb_DBOptions_setBaseBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle, jint max) { reinterpret_cast(jhandle)->base_background_compactions = static_cast(max); } @@ -4975,9 +4976,8 @@ void Java_org_rocksdb_DBOptions_setBaseBackgroundCompactions(JNIEnv* /*env*/, * Method: baseBackgroundCompactions * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_baseBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_baseBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->base_background_compactions; } @@ -4987,10 +4987,8 @@ jint Java_org_rocksdb_DBOptions_baseBackgroundCompactions(JNIEnv* /*env*/, * Method: setMaxBackgroundCompactions * 
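// [Editor's sketch] setDbLogDir/setWalDir above use the JNI string pattern
// seen throughout this file: GetStringUTFChars may return nullptr with a
// pending OutOfMemoryError, in which case the binding returns early and lets
// the exception propagate. The general shape of that pattern:
#include <jni.h>
#include <string>
static bool copy_jstring(JNIEnv* env, jstring jstr, std::string* out) {
  const char* chars = env->GetStringUTFChars(jstr, nullptr);
  if (chars == nullptr) {
    return false;  // exception (OutOfMemoryError) already pending in the JVM
  }
  out->assign(chars);
  env->ReleaseStringUTFChars(jstr, chars);  // always release the UTF buffer
  return true;
}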
Signature: (JI)V */ -void Java_org_rocksdb_DBOptions_setMaxBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint max) { +void Java_org_rocksdb_DBOptions_setMaxBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle, jint max) { reinterpret_cast(jhandle)->max_background_compactions = static_cast(max); } @@ -5000,9 +4998,8 @@ void Java_org_rocksdb_DBOptions_setMaxBackgroundCompactions(JNIEnv* /*env*/, * Method: maxBackgroundCompactions * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_maxBackgroundCompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_maxBackgroundCompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->max_background_compactions; } @@ -5012,9 +5009,8 @@ jint Java_org_rocksdb_DBOptions_maxBackgroundCompactions(JNIEnv* /*env*/, * Method: setMaxSubcompactions * Signature: (JI)V */ -void Java_org_rocksdb_DBOptions_setMaxSubcompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, jint max) { +void Java_org_rocksdb_DBOptions_setMaxSubcompactions( + JNIEnv*, jobject, jlong jhandle, jint max) { reinterpret_cast(jhandle)->max_subcompactions = static_cast(max); } @@ -5024,9 +5020,8 @@ void Java_org_rocksdb_DBOptions_setMaxSubcompactions(JNIEnv* /*env*/, * Method: maxSubcompactions * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_maxSubcompactions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_maxSubcompactions( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_subcompactions; } @@ -5036,8 +5031,7 @@ jint Java_org_rocksdb_DBOptions_maxSubcompactions(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_DBOptions_setMaxBackgroundFlushes( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint max_background_flushes) { + JNIEnv*, jobject, jlong jhandle, jint max_background_flushes) { reinterpret_cast(jhandle)->max_background_flushes = static_cast(max_background_flushes); } @@ -5047,9 +5041,8 @@ void Java_org_rocksdb_DBOptions_setMaxBackgroundFlushes( * Method: maxBackgroundFlushes * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_maxBackgroundFlushes(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_maxBackgroundFlushes( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_background_flushes; } @@ -5058,10 +5051,8 @@ jint Java_org_rocksdb_DBOptions_maxBackgroundFlushes(JNIEnv* /*env*/, * Method: setMaxBackgroundJobs * Signature: (JI)V */ -void Java_org_rocksdb_DBOptions_setMaxBackgroundJobs(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jint max_background_jobs) { +void Java_org_rocksdb_DBOptions_setMaxBackgroundJobs( + JNIEnv*, jobject, jlong jhandle, jint max_background_jobs) { reinterpret_cast(jhandle)->max_background_jobs = static_cast(max_background_jobs); } @@ -5071,9 +5062,8 @@ void Java_org_rocksdb_DBOptions_setMaxBackgroundJobs(JNIEnv* /*env*/, * Method: maxBackgroundJobs * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_maxBackgroundJobs(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_maxBackgroundJobs( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_background_jobs; } @@ -5082,11 +5072,9 @@ jint Java_org_rocksdb_DBOptions_maxBackgroundJobs(JNIEnv* /*env*/, * Method: setMaxLogFileSize * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setMaxLogFileSize(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jlong max_log_file_size) { - 
rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(max_log_file_size); +void Java_org_rocksdb_DBOptions_setMaxLogFileSize( + JNIEnv* env, jobject, jlong jhandle, jlong max_log_file_size) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(max_log_file_size); if (s.ok()) { reinterpret_cast(jhandle)->max_log_file_size = max_log_file_size; @@ -5100,9 +5088,8 @@ void Java_org_rocksdb_DBOptions_setMaxLogFileSize(JNIEnv* env, * Method: maxLogFileSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_maxLogFileSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_maxLogFileSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_log_file_size; } @@ -5112,10 +5099,9 @@ jlong Java_org_rocksdb_DBOptions_maxLogFileSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setLogFileTimeToRoll( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong log_file_time_to_roll) { - rocksdb::Status s = - rocksdb::check_if_jlong_fits_size_t(log_file_time_to_roll); + JNIEnv* env, jobject, jlong jhandle, jlong log_file_time_to_roll) { + auto s = + rocksdb::JniUtil::check_if_jlong_fits_size_t(log_file_time_to_roll); if (s.ok()) { reinterpret_cast(jhandle)->log_file_time_to_roll = log_file_time_to_roll; @@ -5129,9 +5115,8 @@ void Java_org_rocksdb_DBOptions_setLogFileTimeToRoll( * Method: logFileTimeToRoll * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_logFileTimeToRoll(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_logFileTimeToRoll( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->log_file_time_to_roll; } @@ -5140,11 +5125,9 @@ jlong Java_org_rocksdb_DBOptions_logFileTimeToRoll(JNIEnv* /*env*/, * Method: setKeepLogFileNum * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setKeepLogFileNum(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jlong keep_log_file_num) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(keep_log_file_num); +void Java_org_rocksdb_DBOptions_setKeepLogFileNum( + JNIEnv* env, jobject, jlong jhandle, jlong keep_log_file_num) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(keep_log_file_num); if (s.ok()) { reinterpret_cast(jhandle)->keep_log_file_num = keep_log_file_num; @@ -5158,9 +5141,8 @@ void Java_org_rocksdb_DBOptions_setKeepLogFileNum(JNIEnv* env, * Method: keepLogFileNum * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_keepLogFileNum(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_keepLogFileNum( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->keep_log_file_num; } @@ -5170,9 +5152,8 @@ jlong Java_org_rocksdb_DBOptions_keepLogFileNum(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setRecycleLogFileNum( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong recycle_log_file_num) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(recycle_log_file_num); + JNIEnv* env, jobject, jlong jhandle, jlong recycle_log_file_num) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(recycle_log_file_num); if (s.ok()) { reinterpret_cast(jhandle)->recycle_log_file_num = recycle_log_file_num; @@ -5186,9 +5167,8 @@ void Java_org_rocksdb_DBOptions_setRecycleLogFileNum( * Method: recycleLogFileNum * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_recycleLogFileNum(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_recycleLogFileNum( + JNIEnv*, jobject, jlong 
jhandle) { return reinterpret_cast(jhandle)->recycle_log_file_num; } @@ -5198,8 +5178,7 @@ jlong Java_org_rocksdb_DBOptions_recycleLogFileNum(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setMaxManifestFileSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong max_manifest_file_size) { + JNIEnv*, jobject, jlong jhandle, jlong max_manifest_file_size) { reinterpret_cast(jhandle)->max_manifest_file_size = static_cast(max_manifest_file_size); } @@ -5209,9 +5188,8 @@ void Java_org_rocksdb_DBOptions_setMaxManifestFileSize( * Method: maxManifestFileSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_maxManifestFileSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_maxManifestFileSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->max_manifest_file_size; } @@ -5221,8 +5199,7 @@ jlong Java_org_rocksdb_DBOptions_maxManifestFileSize(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_DBOptions_setTableCacheNumshardbits( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint table_cache_numshardbits) { + JNIEnv*, jobject, jlong jhandle, jint table_cache_numshardbits) { reinterpret_cast(jhandle)->table_cache_numshardbits = static_cast(table_cache_numshardbits); } @@ -5232,9 +5209,8 @@ void Java_org_rocksdb_DBOptions_setTableCacheNumshardbits( * Method: tableCacheNumshardbits * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_tableCacheNumshardbits(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_tableCacheNumshardbits( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->table_cache_numshardbits; } @@ -5244,10 +5220,8 @@ jint Java_org_rocksdb_DBOptions_tableCacheNumshardbits(JNIEnv* /*env*/, * Method: setWalTtlSeconds * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setWalTtlSeconds(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong WAL_ttl_seconds) { +void Java_org_rocksdb_DBOptions_setWalTtlSeconds( + JNIEnv*, jobject, jlong jhandle, jlong WAL_ttl_seconds) { reinterpret_cast(jhandle)->WAL_ttl_seconds = static_cast(WAL_ttl_seconds); } @@ -5257,9 +5231,8 @@ void Java_org_rocksdb_DBOptions_setWalTtlSeconds(JNIEnv* /*env*/, * Method: walTtlSeconds * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_walTtlSeconds(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_walTtlSeconds( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->WAL_ttl_seconds; } @@ -5268,10 +5241,8 @@ jlong Java_org_rocksdb_DBOptions_walTtlSeconds(JNIEnv* /*env*/, * Method: setWalSizeLimitMB * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setWalSizeLimitMB(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong WAL_size_limit_MB) { +void Java_org_rocksdb_DBOptions_setWalSizeLimitMB( + JNIEnv*, jobject, jlong jhandle, jlong WAL_size_limit_MB) { reinterpret_cast(jhandle)->WAL_size_limit_MB = static_cast(WAL_size_limit_MB); } @@ -5281,9 +5252,8 @@ void Java_org_rocksdb_DBOptions_setWalSizeLimitMB(JNIEnv* /*env*/, * Method: walTtlSeconds * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_walSizeLimitMB(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_walSizeLimitMB( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->WAL_size_limit_MB; } @@ -5293,9 +5263,8 @@ jlong Java_org_rocksdb_DBOptions_walSizeLimitMB(JNIEnv* /*env*/, * Signature: (JJ)V */ void 
Java_org_rocksdb_DBOptions_setManifestPreallocationSize( - JNIEnv* env, jobject /*jobj*/, jlong jhandle, - jlong preallocation_size) { - rocksdb::Status s = rocksdb::check_if_jlong_fits_size_t(preallocation_size); + JNIEnv* env, jobject, jlong jhandle, jlong preallocation_size) { + auto s = rocksdb::JniUtil::check_if_jlong_fits_size_t(preallocation_size); if (s.ok()) { reinterpret_cast(jhandle) ->manifest_preallocation_size = preallocation_size; @@ -5309,9 +5278,8 @@ void Java_org_rocksdb_DBOptions_setManifestPreallocationSize( * Method: manifestPreallocationSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_manifestPreallocationSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_manifestPreallocationSize( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->manifest_preallocation_size; } @@ -5321,9 +5289,8 @@ jlong Java_org_rocksdb_DBOptions_manifestPreallocationSize(JNIEnv* /*env*/, * Method: useDirectReads * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_useDirectReads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_useDirectReads( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->use_direct_reads; } @@ -5332,10 +5299,8 @@ jboolean Java_org_rocksdb_DBOptions_useDirectReads(JNIEnv* /*env*/, * Method: setUseDirectReads * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setUseDirectReads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean use_direct_reads) { +void Java_org_rocksdb_DBOptions_setUseDirectReads( + JNIEnv*, jobject, jlong jhandle, jboolean use_direct_reads) { reinterpret_cast(jhandle)->use_direct_reads = static_cast(use_direct_reads); } @@ -5346,7 +5311,7 @@ void Java_org_rocksdb_DBOptions_setUseDirectReads(JNIEnv* /*env*/, * Signature: (J)Z */ jboolean Java_org_rocksdb_DBOptions_useDirectIoForFlushAndCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->use_direct_io_for_flush_and_compaction; } @@ -5357,7 +5322,7 @@ jboolean Java_org_rocksdb_DBOptions_useDirectIoForFlushAndCompaction( * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setUseDirectIoForFlushAndCompaction( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean use_direct_io_for_flush_and_compaction) { reinterpret_cast(jhandle) ->use_direct_io_for_flush_and_compaction = @@ -5369,10 +5334,8 @@ void Java_org_rocksdb_DBOptions_setUseDirectIoForFlushAndCompaction( * Method: setAllowFAllocate * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setAllowFAllocate(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean jallow_fallocate) { +void Java_org_rocksdb_DBOptions_setAllowFAllocate( + JNIEnv*, jobject, jlong jhandle, jboolean jallow_fallocate) { reinterpret_cast(jhandle)->allow_fallocate = static_cast(jallow_fallocate); } @@ -5382,9 +5345,8 @@ void Java_org_rocksdb_DBOptions_setAllowFAllocate(JNIEnv* /*env*/, * Method: allowFAllocate * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_allowFAllocate(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_allowFAllocate( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->allow_fallocate); } @@ -5394,10 +5356,8 @@ jboolean Java_org_rocksdb_DBOptions_allowFAllocate(JNIEnv* /*env*/, * Method: setAllowMmapReads * Signature: (JZ)V */ -void 
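// [Editor's sketch] Several hunks above move check_if_jlong_fits_size_t into
// rocksdb::JniUtil. The helper's job is to reject jlong values that cannot be
// represented in size_t (relevant on 32-bit builds). A plausible minimal
// implementation, assuming Status-based error reporting as in the callers;
// this is a guess at the helper's body, not the patch's actual code:
#include <cstdint>
#include <limits>
#include <rocksdb/status.h>
static rocksdb::Status check_if_jlong_fits_size_t_sketch(int64_t jvalue) {
  if (jvalue < 0 ||
      static_cast<uint64_t>(jvalue) > std::numeric_limits<size_t>::max()) {
    return rocksdb::Status::InvalidArgument("jlong does not fit in size_t");
  }
  return rocksdb::Status::OK();
}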
Java_org_rocksdb_DBOptions_setAllowMmapReads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean allow_mmap_reads) { +void Java_org_rocksdb_DBOptions_setAllowMmapReads( + JNIEnv*, jobject, jlong jhandle, jboolean allow_mmap_reads) { reinterpret_cast(jhandle)->allow_mmap_reads = static_cast(allow_mmap_reads); } @@ -5407,9 +5367,8 @@ void Java_org_rocksdb_DBOptions_setAllowMmapReads(JNIEnv* /*env*/, * Method: allowMmapReads * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_allowMmapReads(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_allowMmapReads( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->allow_mmap_reads; } @@ -5418,10 +5377,8 @@ jboolean Java_org_rocksdb_DBOptions_allowMmapReads(JNIEnv* /*env*/, * Method: setAllowMmapWrites * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setAllowMmapWrites(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jboolean allow_mmap_writes) { +void Java_org_rocksdb_DBOptions_setAllowMmapWrites( + JNIEnv*, jobject, jlong jhandle, jboolean allow_mmap_writes) { reinterpret_cast(jhandle)->allow_mmap_writes = static_cast(allow_mmap_writes); } @@ -5431,9 +5388,8 @@ void Java_org_rocksdb_DBOptions_setAllowMmapWrites(JNIEnv* /*env*/, * Method: allowMmapWrites * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_allowMmapWrites(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_allowMmapWrites( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->allow_mmap_writes; } @@ -5443,8 +5399,7 @@ jboolean Java_org_rocksdb_DBOptions_allowMmapWrites(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setIsFdCloseOnExec( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean is_fd_close_on_exec) { + JNIEnv*, jobject, jlong jhandle, jboolean is_fd_close_on_exec) { reinterpret_cast(jhandle)->is_fd_close_on_exec = static_cast(is_fd_close_on_exec); } @@ -5454,9 +5409,8 @@ void Java_org_rocksdb_DBOptions_setIsFdCloseOnExec( * Method: isFdCloseOnExec * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_isFdCloseOnExec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_isFdCloseOnExec( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->is_fd_close_on_exec; } @@ -5466,8 +5420,7 @@ jboolean Java_org_rocksdb_DBOptions_isFdCloseOnExec(JNIEnv* /*env*/, * Signature: (JI)V */ void Java_org_rocksdb_DBOptions_setStatsDumpPeriodSec( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint stats_dump_period_sec) { + JNIEnv*, jobject, jlong jhandle, jint stats_dump_period_sec) { reinterpret_cast(jhandle)->stats_dump_period_sec = static_cast(stats_dump_period_sec); } @@ -5477,9 +5430,8 @@ void Java_org_rocksdb_DBOptions_setStatsDumpPeriodSec( * Method: statsDumpPeriodSec * Signature: (J)I */ -jint Java_org_rocksdb_DBOptions_statsDumpPeriodSec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jint Java_org_rocksdb_DBOptions_statsDumpPeriodSec( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->stats_dump_period_sec; } @@ -5489,8 +5441,7 @@ jint Java_org_rocksdb_DBOptions_statsDumpPeriodSec(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setAdviseRandomOnOpen( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean advise_random_on_open) { + JNIEnv*, jobject, jlong jhandle, jboolean advise_random_on_open) { reinterpret_cast(jhandle)->advise_random_on_open = 
static_cast(advise_random_on_open); } @@ -5500,9 +5451,8 @@ void Java_org_rocksdb_DBOptions_setAdviseRandomOnOpen( * Method: adviseRandomOnOpen * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_adviseRandomOnOpen(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_adviseRandomOnOpen( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->advise_random_on_open; } @@ -5512,20 +5462,32 @@ jboolean Java_org_rocksdb_DBOptions_adviseRandomOnOpen(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setDbWriteBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jdb_write_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jlong jdb_write_buffer_size) { auto* opt = reinterpret_cast(jhandle); opt->db_write_buffer_size = static_cast(jdb_write_buffer_size); } +/* + * Class: org_rocksdb_DBOptions + * Method: setWriteBufferManager + * Signature: (JJ)V + */ +void Java_org_rocksdb_DBOptions_setWriteBufferManager( + JNIEnv*, jobject, jlong jdb_options_handle, + jlong jwrite_buffer_manager_handle) { + auto* write_buffer_manager = + reinterpret_cast *>(jwrite_buffer_manager_handle); + reinterpret_cast(jdb_options_handle)->write_buffer_manager = + *write_buffer_manager; +} + /* * Class: org_rocksdb_DBOptions * Method: dbWriteBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_dbWriteBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_dbWriteBufferSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->db_write_buffer_size); } @@ -5536,8 +5498,7 @@ jlong Java_org_rocksdb_DBOptions_dbWriteBufferSize(JNIEnv* /*env*/, * Signature: (JB)V */ void Java_org_rocksdb_DBOptions_setAccessHintOnCompactionStart( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jbyte jaccess_hint_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jaccess_hint_value) { auto* opt = reinterpret_cast(jhandle); opt->access_hint_on_compaction_start = rocksdb::AccessHintJni::toCppAccessHint(jaccess_hint_value); @@ -5548,9 +5509,8 @@ void Java_org_rocksdb_DBOptions_setAccessHintOnCompactionStart( * Method: accessHintOnCompactionStart * Signature: (J)B */ -jbyte Java_org_rocksdb_DBOptions_accessHintOnCompactionStart(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_DBOptions_accessHintOnCompactionStart( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return rocksdb::AccessHintJni::toJavaAccessHint( opt->access_hint_on_compaction_start); @@ -5562,7 +5522,7 @@ jbyte Java_org_rocksdb_DBOptions_accessHintOnCompactionStart(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setNewTableReaderForCompactionInputs( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean jnew_table_reader_for_compaction_inputs) { auto* opt = reinterpret_cast(jhandle); opt->new_table_reader_for_compaction_inputs = @@ -5575,7 +5535,7 @@ void Java_org_rocksdb_DBOptions_setNewTableReaderForCompactionInputs( * Signature: (J)Z */ jboolean Java_org_rocksdb_DBOptions_newTableReaderForCompactionInputs( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->new_table_reader_for_compaction_inputs); } @@ -5586,8 +5546,7 @@ jboolean Java_org_rocksdb_DBOptions_newTableReaderForCompactionInputs( * Signature: (JJ)V */ void 
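// [Editor's sketch] setWriteBufferManager above stores a
// std::shared_ptr<rocksdb::WriteBufferManager> into DBOptions. On the native
// side the manager caps total memtable memory and may be shared by several
// DB instances:
#include <memory>
#include <rocksdb/options.h>
#include <rocksdb/write_buffer_manager.h>
void write_buffer_manager_example() {
  auto wbm = std::make_shared<rocksdb::WriteBufferManager>(
      512 * 1024 * 1024);  // cap all memtables at 512 MiB
  rocksdb::DBOptions db_opts;
  db_opts.write_buffer_manager = wbm;  // the same manager can serve many DBs
}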
Java_org_rocksdb_DBOptions_setCompactionReadaheadSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jcompaction_readahead_size) { + JNIEnv*, jobject, jlong jhandle, jlong jcompaction_readahead_size) { auto* opt = reinterpret_cast(jhandle); opt->compaction_readahead_size = static_cast(jcompaction_readahead_size); @@ -5598,9 +5557,8 @@ void Java_org_rocksdb_DBOptions_setCompactionReadaheadSize( * Method: compactionReadaheadSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_compactionReadaheadSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_compactionReadaheadSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->compaction_readahead_size); } @@ -5611,8 +5569,7 @@ jlong Java_org_rocksdb_DBOptions_compactionReadaheadSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setRandomAccessMaxBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jrandom_access_max_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jlong jrandom_access_max_buffer_size) { auto* opt = reinterpret_cast(jhandle); opt->random_access_max_buffer_size = static_cast(jrandom_access_max_buffer_size); @@ -5623,9 +5580,8 @@ void Java_org_rocksdb_DBOptions_setRandomAccessMaxBufferSize( * Method: randomAccessMaxBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_randomAccessMaxBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_randomAccessMaxBufferSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->random_access_max_buffer_size); } @@ -5636,8 +5592,7 @@ jlong Java_org_rocksdb_DBOptions_randomAccessMaxBufferSize(JNIEnv* /*env*/, * Signature: (JJ)V */ void Java_org_rocksdb_DBOptions_setWritableFileMaxBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwritable_file_max_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jlong jwritable_file_max_buffer_size) { auto* opt = reinterpret_cast(jhandle); opt->writable_file_max_buffer_size = static_cast(jwritable_file_max_buffer_size); @@ -5648,9 +5603,8 @@ void Java_org_rocksdb_DBOptions_setWritableFileMaxBufferSize( * Method: writableFileMaxBufferSize * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_writableFileMaxBufferSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_writableFileMaxBufferSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->writable_file_max_buffer_size); } @@ -5661,8 +5615,7 @@ jlong Java_org_rocksdb_DBOptions_writableFileMaxBufferSize(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setUseAdaptiveMutex( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean use_adaptive_mutex) { + JNIEnv*, jobject, jlong jhandle, jboolean use_adaptive_mutex) { reinterpret_cast(jhandle)->use_adaptive_mutex = static_cast(use_adaptive_mutex); } @@ -5672,9 +5625,8 @@ void Java_org_rocksdb_DBOptions_setUseAdaptiveMutex( * Method: useAdaptiveMutex * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_useAdaptiveMutex(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_useAdaptiveMutex( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->use_adaptive_mutex; } @@ -5683,9 +5635,8 @@ jboolean Java_org_rocksdb_DBOptions_useAdaptiveMutex(JNIEnv* /*env*/, * Method: setBytesPerSync * Signature: (JJ)V */ -void 
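// [Editor's sketch] The direct-I/O, readahead, and buffer-size bindings above
// map onto plain DBOptions fields; a native-side combination that commonly
// goes together (values are illustrative only):
#include <rocksdb/options.h>
void direct_io_example() {
  rocksdb::DBOptions db_opts;
  db_opts.use_direct_reads = true;                        // O_DIRECT for user reads
  db_opts.use_direct_io_for_flush_and_compaction = true;  // O_DIRECT for background I/O
  db_opts.compaction_readahead_size = 2 * 1024 * 1024;    // readahead for compaction inputs
  db_opts.writable_file_max_buffer_size = 1024 * 1024;    // write-buffer cap per file
}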
Java_org_rocksdb_DBOptions_setBytesPerSync(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jlong bytes_per_sync) { +void Java_org_rocksdb_DBOptions_setBytesPerSync( + JNIEnv*, jobject, jlong jhandle, jlong bytes_per_sync) { reinterpret_cast(jhandle)->bytes_per_sync = static_cast(bytes_per_sync); } @@ -5695,8 +5646,8 @@ void Java_org_rocksdb_DBOptions_setBytesPerSync(JNIEnv* /*env*/, * Method: bytesPerSync * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_bytesPerSync(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_bytesPerSync( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->bytes_per_sync; } @@ -5705,10 +5656,8 @@ jlong Java_org_rocksdb_DBOptions_bytesPerSync(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setWalBytesPerSync * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setWalBytesPerSync(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jwal_bytes_per_sync) { +void Java_org_rocksdb_DBOptions_setWalBytesPerSync( + JNIEnv*, jobject, jlong jhandle, jlong jwal_bytes_per_sync) { reinterpret_cast(jhandle)->wal_bytes_per_sync = static_cast(jwal_bytes_per_sync); } @@ -5718,60 +5667,76 @@ void Java_org_rocksdb_DBOptions_setWalBytesPerSync(JNIEnv* /*env*/, * Method: walBytesPerSync * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_walBytesPerSync(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_walBytesPerSync( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->wal_bytes_per_sync); } /* * Class: org_rocksdb_DBOptions - * Method: setEnableThreadTracking + * Method: setDelayedWriteRate + * Signature: (JJ)V + */ +void Java_org_rocksdb_DBOptions_setDelayedWriteRate( + JNIEnv*, jobject, jlong jhandle, jlong jdelayed_write_rate) { + auto* opt = reinterpret_cast(jhandle); + opt->delayed_write_rate = static_cast(jdelayed_write_rate); +} + +/* + * Class: org_rocksdb_DBOptions + * Method: delayedWriteRate + * Signature: (J)J + */ +jlong Java_org_rocksdb_DBOptions_delayedWriteRate( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->delayed_write_rate); +} + +/* + * Class: org_rocksdb_DBOptions + * Method: setEnablePipelinedWrite * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setEnableThreadTracking( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jenable_thread_tracking) { +void Java_org_rocksdb_DBOptions_setEnablePipelinedWrite( + JNIEnv*, jobject, jlong jhandle, jboolean jenable_pipelined_write) { auto* opt = reinterpret_cast(jhandle); - opt->enable_thread_tracking = static_cast(jenable_thread_tracking); + opt->enable_pipelined_write = jenable_pipelined_write == JNI_TRUE; } /* * Class: org_rocksdb_DBOptions - * Method: enableThreadTracking + * Method: enablePipelinedWrite * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_enableThreadTracking(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_enablePipelinedWrite( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); - return static_cast(opt->enable_thread_tracking); + return static_cast(opt->enable_pipelined_write); } /* * Class: org_rocksdb_DBOptions - * Method: setDelayedWriteRate - * Signature: (JJ)V + * Method: setEnableThreadTracking + * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setDelayedWriteRate(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jdelayed_write_rate) { +void 
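// [Editor's sketch] The reordered hunks above add setDelayedWriteRate and
// setEnablePipelinedWrite. Note the boolean idiom the new code adopts:
// `jflag == JNI_TRUE` rather than static_cast<bool>(jflag). Native-side
// equivalents of the two options (values illustrative):
#include <rocksdb/options.h>
void write_path_example() {
  rocksdb::DBOptions db_opts;
  db_opts.delayed_write_rate = 16 * 1024 * 1024;  // bytes/sec while write-stalled
  db_opts.enable_pipelined_write = true;          // overlap WAL append and memtable insert
}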
Java_org_rocksdb_DBOptions_setEnableThreadTracking( + JNIEnv*, jobject, jlong jhandle, jboolean jenable_thread_tracking) { auto* opt = reinterpret_cast(jhandle); - opt->delayed_write_rate = static_cast(jdelayed_write_rate); + opt->enable_thread_tracking = jenable_thread_tracking == JNI_TRUE; } /* * Class: org_rocksdb_DBOptions - * Method: delayedWriteRate - * Signature: (J)J + * Method: enableThreadTracking + * Signature: (J)Z */ -jlong Java_org_rocksdb_DBOptions_delayedWriteRate(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_enableThreadTracking( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); - return static_cast(opt->delayed_write_rate); + return static_cast(opt->enable_thread_tracking); } /* @@ -5780,7 +5745,7 @@ jlong Java_org_rocksdb_DBOptions_delayedWriteRate(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setAllowConcurrentMemtableWrite( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jboolean allow) { + JNIEnv*, jobject, jlong jhandle, jboolean allow) { reinterpret_cast(jhandle) ->allow_concurrent_memtable_write = static_cast(allow); } @@ -5791,7 +5756,7 @@ void Java_org_rocksdb_DBOptions_setAllowConcurrentMemtableWrite( * Signature: (J)Z */ jboolean Java_org_rocksdb_DBOptions_allowConcurrentMemtableWrite( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->allow_concurrent_memtable_write; } @@ -5802,7 +5767,7 @@ jboolean Java_org_rocksdb_DBOptions_allowConcurrentMemtableWrite( * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setEnableWriteThreadAdaptiveYield( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, jboolean yield) { + JNIEnv*, jobject, jlong jhandle, jboolean yield) { reinterpret_cast(jhandle) ->enable_write_thread_adaptive_yield = static_cast(yield); } @@ -5813,7 +5778,7 @@ void Java_org_rocksdb_DBOptions_setEnableWriteThreadAdaptiveYield( * Signature: (J)Z */ jboolean Java_org_rocksdb_DBOptions_enableWriteThreadAdaptiveYield( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->enable_write_thread_adaptive_yield; } @@ -5823,10 +5788,8 @@ jboolean Java_org_rocksdb_DBOptions_enableWriteThreadAdaptiveYield( * Method: setWriteThreadMaxYieldUsec * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setWriteThreadMaxYieldUsec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong max) { +void Java_org_rocksdb_DBOptions_setWriteThreadMaxYieldUsec( + JNIEnv*, jobject, jlong jhandle, jlong max) { reinterpret_cast(jhandle)->write_thread_max_yield_usec = static_cast(max); } @@ -5836,9 +5799,8 @@ void Java_org_rocksdb_DBOptions_setWriteThreadMaxYieldUsec(JNIEnv* /*env*/, * Method: writeThreadMaxYieldUsec * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_writeThreadMaxYieldUsec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_writeThreadMaxYieldUsec( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->write_thread_max_yield_usec; } @@ -5848,10 +5810,8 @@ jlong Java_org_rocksdb_DBOptions_writeThreadMaxYieldUsec(JNIEnv* /*env*/, * Method: setWriteThreadSlowYieldUsec * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setWriteThreadSlowYieldUsec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong slow) { +void Java_org_rocksdb_DBOptions_setWriteThreadSlowYieldUsec( + JNIEnv*, jobject, jlong jhandle, jlong slow) { 
reinterpret_cast(jhandle)->write_thread_slow_yield_usec = static_cast(slow); } @@ -5861,9 +5821,8 @@ void Java_org_rocksdb_DBOptions_setWriteThreadSlowYieldUsec(JNIEnv* /*env*/, * Method: writeThreadSlowYieldUsec * Signature: (J)J */ -jlong Java_org_rocksdb_DBOptions_writeThreadSlowYieldUsec(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_DBOptions_writeThreadSlowYieldUsec( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->write_thread_slow_yield_usec; } @@ -5874,8 +5833,7 @@ jlong Java_org_rocksdb_DBOptions_writeThreadSlowYieldUsec(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setSkipStatsUpdateOnDbOpen( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jskip_stats_update_on_db_open) { + JNIEnv*, jobject, jlong jhandle, jboolean jskip_stats_update_on_db_open) { auto* opt = reinterpret_cast(jhandle); opt->skip_stats_update_on_db_open = static_cast(jskip_stats_update_on_db_open); @@ -5886,9 +5844,8 @@ void Java_org_rocksdb_DBOptions_setSkipStatsUpdateOnDbOpen( * Method: skipStatsUpdateOnDbOpen * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_skipStatsUpdateOnDbOpen(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_skipStatsUpdateOnDbOpen( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->skip_stats_update_on_db_open); } @@ -5899,8 +5856,7 @@ jboolean Java_org_rocksdb_DBOptions_skipStatsUpdateOnDbOpen(JNIEnv* /*env*/, * Signature: (JB)V */ void Java_org_rocksdb_DBOptions_setWalRecoveryMode( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jbyte jwal_recovery_mode_value) { + JNIEnv*, jobject, jlong jhandle, jbyte jwal_recovery_mode_value) { auto* opt = reinterpret_cast(jhandle); opt->wal_recovery_mode = rocksdb::WALRecoveryModeJni::toCppWALRecoveryMode( jwal_recovery_mode_value); @@ -5911,9 +5867,8 @@ void Java_org_rocksdb_DBOptions_setWalRecoveryMode( * Method: walRecoveryMode * Signature: (J)B */ -jbyte Java_org_rocksdb_DBOptions_walRecoveryMode(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_DBOptions_walRecoveryMode( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return rocksdb::WALRecoveryModeJni::toJavaWALRecoveryMode( opt->wal_recovery_mode); @@ -5924,9 +5879,8 @@ jbyte Java_org_rocksdb_DBOptions_walRecoveryMode(JNIEnv* /*env*/, * Method: setAllow2pc * Signature: (JZ)V */ -void Java_org_rocksdb_DBOptions_setAllow2pc(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jboolean jallow_2pc) { +void Java_org_rocksdb_DBOptions_setAllow2pc( + JNIEnv*, jobject, jlong jhandle, jboolean jallow_2pc) { auto* opt = reinterpret_cast(jhandle); opt->allow_2pc = static_cast(jallow_2pc); } @@ -5936,8 +5890,8 @@ void Java_org_rocksdb_DBOptions_setAllow2pc(JNIEnv* /*env*/, jobject /*jobj*/, * Method: allow2pc * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_allow2pc(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_allow2pc( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->allow_2pc); } @@ -5947,23 +5901,34 @@ jboolean Java_org_rocksdb_DBOptions_allow2pc(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setRowCache * Signature: (JJ)V */ -void Java_org_rocksdb_DBOptions_setRowCache(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jlong jrow_cache_handle) { +void Java_org_rocksdb_DBOptions_setRowCache( + JNIEnv*, jobject, jlong jhandle, jlong 
jrow_cache_handle) { auto* opt = reinterpret_cast(jhandle); auto* row_cache = reinterpret_cast*>(jrow_cache_handle); opt->row_cache = *row_cache; } +/* + * Class: org_rocksdb_DBOptions + * Method: setWalFilter + * Signature: (JJ)V + */ +void Java_org_rocksdb_DBOptions_setWalFilter( + JNIEnv*, jobject, jlong jhandle, jlong jwal_filter_handle) { + auto* opt = reinterpret_cast(jhandle); + auto* wal_filter = + reinterpret_cast(jwal_filter_handle); + opt->wal_filter = wal_filter; +} + /* * Class: org_rocksdb_DBOptions * Method: setFailIfOptionsFileError * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setFailIfOptionsFileError( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jfail_if_options_file_error) { + JNIEnv*, jobject, jlong jhandle, jboolean jfail_if_options_file_error) { auto* opt = reinterpret_cast(jhandle); opt->fail_if_options_file_error = static_cast(jfail_if_options_file_error); @@ -5974,9 +5939,8 @@ void Java_org_rocksdb_DBOptions_setFailIfOptionsFileError( * Method: failIfOptionsFileError * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_failIfOptionsFileError(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_failIfOptionsFileError( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->fail_if_options_file_error); } @@ -5987,8 +5951,7 @@ jboolean Java_org_rocksdb_DBOptions_failIfOptionsFileError(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setDumpMallocStats( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jdump_malloc_stats) { + JNIEnv*, jobject, jlong jhandle, jboolean jdump_malloc_stats) { auto* opt = reinterpret_cast(jhandle); opt->dump_malloc_stats = static_cast(jdump_malloc_stats); } @@ -5998,9 +5961,8 @@ void Java_org_rocksdb_DBOptions_setDumpMallocStats( * Method: dumpMallocStats * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_dumpMallocStats(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_dumpMallocStats( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->dump_malloc_stats); } @@ -6011,8 +5973,7 @@ jboolean Java_org_rocksdb_DBOptions_dumpMallocStats(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setAvoidFlushDuringRecovery( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean javoid_flush_during_recovery) { + JNIEnv*, jobject, jlong jhandle, jboolean javoid_flush_during_recovery) { auto* opt = reinterpret_cast(jhandle); opt->avoid_flush_during_recovery = static_cast(javoid_flush_during_recovery); @@ -6023,21 +5984,129 @@ void Java_org_rocksdb_DBOptions_setAvoidFlushDuringRecovery( * Method: avoidFlushDuringRecovery * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_avoidFlushDuringRecovery(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_avoidFlushDuringRecovery( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->avoid_flush_during_recovery); } +/* + * Class: org_rocksdb_DBOptions + * Method: setAllowIngestBehind + * Signature: (JZ)V + */ +void Java_org_rocksdb_DBOptions_setAllowIngestBehind( + JNIEnv*, jobject, jlong jhandle, jboolean jallow_ingest_behind) { + auto* opt = reinterpret_cast(jhandle); + opt->allow_ingest_behind = jallow_ingest_behind == JNI_TRUE; +} + +/* + * Class: org_rocksdb_DBOptions + * Method: allowIngestBehind + * Signature: (J)Z + */ +jboolean 
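// [Editor's sketch] setWalFilter above hands DBOptions a raw
// rocksdb::WalFilter* (the options struct does not take ownership, so the
// Java side must keep the filter object alive). A minimal no-op filter,
// assuming the WalFilter interface of this RocksDB version:
#include <rocksdb/wal_filter.h>
class NoopWalFilter : public rocksdb::WalFilter {
 public:
  WalProcessingOption LogRecord(const rocksdb::WriteBatch& /*batch*/,
                                rocksdb::WriteBatch* /*new_batch*/,
                                bool* /*batch_changed*/) const override {
    return WalProcessingOption::kContinueProcessing;  // keep every WAL record
  }
  const char* Name() const override { return "NoopWalFilter"; }
};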
Java_org_rocksdb_DBOptions_allowIngestBehind( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->allow_ingest_behind); +} + +/* + * Class: org_rocksdb_DBOptions + * Method: setPreserveDeletes + * Signature: (JZ)V + */ +void Java_org_rocksdb_DBOptions_setPreserveDeletes( + JNIEnv*, jobject, jlong jhandle, jboolean jpreserve_deletes) { + auto* opt = reinterpret_cast(jhandle); + opt->preserve_deletes = jpreserve_deletes == JNI_TRUE; +} + +/* + * Class: org_rocksdb_DBOptions + * Method: preserveDeletes + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_DBOptions_preserveDeletes( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->preserve_deletes); +} + +/* + * Class: org_rocksdb_DBOptions + * Method: setTwoWriteQueues + * Signature: (JZ)V + */ +void Java_org_rocksdb_DBOptions_setTwoWriteQueues( + JNIEnv*, jobject, jlong jhandle, jboolean jtwo_write_queues) { + auto* opt = reinterpret_cast(jhandle); + opt->two_write_queues = jtwo_write_queues == JNI_TRUE; +} + +/* + * Class: org_rocksdb_DBOptions + * Method: twoWriteQueues + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_DBOptions_twoWriteQueues( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->two_write_queues); +} + +/* + * Class: org_rocksdb_DBOptions + * Method: setManualWalFlush + * Signature: (JZ)V + */ +void Java_org_rocksdb_DBOptions_setManualWalFlush( + JNIEnv*, jobject, jlong jhandle, jboolean jmanual_wal_flush) { + auto* opt = reinterpret_cast(jhandle); + opt->manual_wal_flush = jmanual_wal_flush == JNI_TRUE; +} + +/* + * Class: org_rocksdb_DBOptions + * Method: manualWalFlush + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_DBOptions_manualWalFlush( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->manual_wal_flush); +} + +/* + * Class: org_rocksdb_DBOptions + * Method: setAtomicFlush + * Signature: (JZ)V + */ +void Java_org_rocksdb_DBOptions_setAtomicFlush( + JNIEnv*, jobject, jlong jhandle, jboolean jatomic_flush) { + auto* opt = reinterpret_cast(jhandle); + opt->atomic_flush = jatomic_flush == JNI_TRUE; +} + +/* + * Class: org_rocksdb_DBOptions + * Method: atomicFlush + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_DBOptions_atomicFlush( + JNIEnv *, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->atomic_flush); +} + /* * Class: org_rocksdb_DBOptions * Method: setAvoidFlushDuringShutdown * Signature: (JZ)V */ void Java_org_rocksdb_DBOptions_setAvoidFlushDuringShutdown( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean javoid_flush_during_shutdown) { + JNIEnv*, jobject, jlong jhandle, jboolean javoid_flush_during_shutdown) { auto* opt = reinterpret_cast(jhandle); opt->avoid_flush_during_shutdown = static_cast(javoid_flush_during_shutdown); @@ -6048,9 +6117,8 @@ void Java_org_rocksdb_DBOptions_setAvoidFlushDuringShutdown( * Method: avoidFlushDuringShutdown * Signature: (J)Z */ -jboolean Java_org_rocksdb_DBOptions_avoidFlushDuringShutdown(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_DBOptions_avoidFlushDuringShutdown( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->avoid_flush_during_shutdown); } @@ -6063,8 +6131,8 @@ jboolean Java_org_rocksdb_DBOptions_avoidFlushDuringShutdown(JNIEnv* /*env*/, * Method: newWriteOptions * Signature: ()J */ -jlong 
Java_org_rocksdb_WriteOptions_newWriteOptions(JNIEnv* /*env*/, - jclass /*jcls*/) { +jlong Java_org_rocksdb_WriteOptions_newWriteOptions( + JNIEnv*, jclass) { auto* op = new rocksdb::WriteOptions(); return reinterpret_cast(op); } @@ -6074,9 +6142,8 @@ jlong Java_org_rocksdb_WriteOptions_newWriteOptions(JNIEnv* /*env*/, * Method: copyWriteOptions * Signature: (J)J */ -jlong Java_org_rocksdb_WriteOptions_copyWriteOptions(JNIEnv* /*env*/, - jclass /*jcls*/, - jlong jhandle) { +jlong Java_org_rocksdb_WriteOptions_copyWriteOptions( + JNIEnv*, jclass, jlong jhandle) { auto new_opt = new rocksdb::WriteOptions( *(reinterpret_cast(jhandle))); return reinterpret_cast(new_opt); @@ -6087,9 +6154,8 @@ jlong Java_org_rocksdb_WriteOptions_copyWriteOptions(JNIEnv* /*env*/, * Method: disposeInternal * Signature: ()V */ -void Java_org_rocksdb_WriteOptions_disposeInternal(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle) { +void Java_org_rocksdb_WriteOptions_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { auto* write_options = reinterpret_cast(jhandle); assert(write_options != nullptr); delete write_options; @@ -6100,9 +6166,8 @@ void Java_org_rocksdb_WriteOptions_disposeInternal(JNIEnv* /*env*/, * Method: setSync * Signature: (JZ)V */ -void Java_org_rocksdb_WriteOptions_setSync(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle, jboolean jflag) { +void Java_org_rocksdb_WriteOptions_setSync( + JNIEnv*, jobject, jlong jhandle, jboolean jflag) { reinterpret_cast(jhandle)->sync = jflag; } @@ -6111,9 +6176,8 @@ void Java_org_rocksdb_WriteOptions_setSync(JNIEnv* /*env*/, * Method: sync * Signature: (J)Z */ -jboolean Java_org_rocksdb_WriteOptions_sync(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle) { +jboolean Java_org_rocksdb_WriteOptions_sync( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->sync; } @@ -6122,10 +6186,8 @@ jboolean Java_org_rocksdb_WriteOptions_sync(JNIEnv* /*env*/, * Method: setDisableWAL * Signature: (JZ)V */ -void Java_org_rocksdb_WriteOptions_setDisableWAL(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle, - jboolean jflag) { +void Java_org_rocksdb_WriteOptions_setDisableWAL( + JNIEnv*, jobject, jlong jhandle, jboolean jflag) { reinterpret_cast(jhandle)->disableWAL = jflag; } @@ -6134,9 +6196,8 @@ void Java_org_rocksdb_WriteOptions_setDisableWAL(JNIEnv* /*env*/, * Method: disableWAL * Signature: (J)Z */ -jboolean Java_org_rocksdb_WriteOptions_disableWAL(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle) { +jboolean Java_org_rocksdb_WriteOptions_disableWAL( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->disableWAL; } @@ -6146,7 +6207,7 @@ jboolean Java_org_rocksdb_WriteOptions_disableWAL(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_WriteOptions_setIgnoreMissingColumnFamilies( - JNIEnv* /*env*/, jobject /*jwrite_options*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean jignore_missing_column_families) { reinterpret_cast(jhandle) ->ignore_missing_column_families = @@ -6159,7 +6220,7 @@ void Java_org_rocksdb_WriteOptions_setIgnoreMissingColumnFamilies( * Signature: (J)Z */ jboolean Java_org_rocksdb_WriteOptions_ignoreMissingColumnFamilies( - JNIEnv* /*env*/, jobject /*jwrite_options*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->ignore_missing_column_families; } @@ -6169,10 +6230,8 @@ jboolean Java_org_rocksdb_WriteOptions_ignoreMissingColumnFamilies( * Method: setNoSlowdown * Signature: (JZ)V */ -void 
Java_org_rocksdb_WriteOptions_setNoSlowdown(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle, - jboolean jno_slowdown) { +void Java_org_rocksdb_WriteOptions_setNoSlowdown( + JNIEnv*, jobject, jlong jhandle, jboolean jno_slowdown) { reinterpret_cast(jhandle)->no_slowdown = static_cast(jno_slowdown); } @@ -6182,12 +6241,32 @@ void Java_org_rocksdb_WriteOptions_setNoSlowdown(JNIEnv* /*env*/, * Method: noSlowdown * Signature: (J)Z */ -jboolean Java_org_rocksdb_WriteOptions_noSlowdown(JNIEnv* /*env*/, - jobject /*jwrite_options*/, - jlong jhandle) { +jboolean Java_org_rocksdb_WriteOptions_noSlowdown( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->no_slowdown; } +/* + * Class: org_rocksdb_WriteOptions + * Method: setLowPri + * Signature: (JZ)V + */ +void Java_org_rocksdb_WriteOptions_setLowPri( + JNIEnv*, jobject, jlong jhandle, jboolean jlow_pri) { + reinterpret_cast(jhandle)->low_pri = + static_cast(jlow_pri); +} + +/* + * Class: org_rocksdb_WriteOptions + * Method: lowPri + * Signature: (J)Z + */ +jboolean Java_org_rocksdb_WriteOptions_lowPri( + JNIEnv*, jobject, jlong jhandle) { + return reinterpret_cast(jhandle)->low_pri; +} + ///////////////////////////////////////////////////////////////////// // rocksdb::ReadOptions @@ -6196,19 +6275,32 @@ jboolean Java_org_rocksdb_WriteOptions_noSlowdown(JNIEnv* /*env*/, * Method: newReadOptions * Signature: ()J */ -jlong Java_org_rocksdb_ReadOptions_newReadOptions(JNIEnv* /*env*/, - jclass /*jcls*/) { +jlong Java_org_rocksdb_ReadOptions_newReadOptions__( + JNIEnv*, jclass) { auto* read_options = new rocksdb::ReadOptions(); return reinterpret_cast(read_options); } +/* + * Class: org_rocksdb_ReadOptions + * Method: newReadOptions + * Signature: (ZZ)J + */ +jlong Java_org_rocksdb_ReadOptions_newReadOptions__ZZ( + JNIEnv*, jclass, jboolean jverify_checksums, jboolean jfill_cache) { + auto* read_options = + new rocksdb::ReadOptions(static_cast(jverify_checksums), + static_cast(jfill_cache)); + return reinterpret_cast(read_options); +} + /* * Class: org_rocksdb_ReadOptions * Method: copyReadOptions * Signature: (J)J */ -jlong Java_org_rocksdb_ReadOptions_copyReadOptions(JNIEnv* /*env*/, jclass /*jcls*/, - jlong jhandle) { +jlong Java_org_rocksdb_ReadOptions_copyReadOptions( + JNIEnv*, jclass, jlong jhandle) { auto new_opt = new rocksdb::ReadOptions( *(reinterpret_cast(jhandle))); return reinterpret_cast(new_opt); @@ -6219,9 +6311,8 @@ jlong Java_org_rocksdb_ReadOptions_copyReadOptions(JNIEnv* /*env*/, jclass /*jcl * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_ReadOptions_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_ReadOptions_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { auto* read_options = reinterpret_cast(jhandle); assert(read_options != nullptr); delete read_options; @@ -6233,8 +6324,7 @@ void Java_org_rocksdb_ReadOptions_disposeInternal(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_ReadOptions_setVerifyChecksums( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jverify_checksums) { + JNIEnv*, jobject, jlong jhandle, jboolean jverify_checksums) { reinterpret_cast(jhandle)->verify_checksums = static_cast(jverify_checksums); } @@ -6244,9 +6334,8 @@ void Java_org_rocksdb_ReadOptions_setVerifyChecksums( * Method: verifyChecksums * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_verifyChecksums(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean 
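// (Re the two newReadOptions entry points above: JNI disambiguates overloaded
// native methods by appending "__" plus the encoded argument signature, so the
// Java-side ReadOptions() and ReadOptions(boolean, boolean) constructors bind
// to Java_org_rocksdb_ReadOptions_newReadOptions__ and
// Java_org_rocksdb_ReadOptions_newReadOptions__ZZ respectively; "Z" is the JNI
// type code for boolean, matching the ()J and (ZZ)J signature comments.)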
Java_org_rocksdb_ReadOptions_verifyChecksums( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->verify_checksums; } @@ -6255,9 +6344,8 @@ jboolean Java_org_rocksdb_ReadOptions_verifyChecksums(JNIEnv* /*env*/, * Method: setFillCache * Signature: (JZ)V */ -void Java_org_rocksdb_ReadOptions_setFillCache(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jboolean jfill_cache) { +void Java_org_rocksdb_ReadOptions_setFillCache( + JNIEnv*, jobject, jlong jhandle, jboolean jfill_cache) { reinterpret_cast(jhandle)->fill_cache = static_cast(jfill_cache); } @@ -6267,9 +6355,8 @@ void Java_org_rocksdb_ReadOptions_setFillCache(JNIEnv* /*env*/, * Method: fillCache * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_fillCache(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_fillCache( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->fill_cache; } @@ -6278,8 +6365,8 @@ jboolean Java_org_rocksdb_ReadOptions_fillCache(JNIEnv* /*env*/, * Method: setTailing * Signature: (JZ)V */ -void Java_org_rocksdb_ReadOptions_setTailing(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jboolean jtailing) { +void Java_org_rocksdb_ReadOptions_setTailing( + JNIEnv*, jobject, jlong jhandle, jboolean jtailing) { reinterpret_cast(jhandle)->tailing = static_cast(jtailing); } @@ -6289,8 +6376,8 @@ void Java_org_rocksdb_ReadOptions_setTailing(JNIEnv* /*env*/, jobject /*jobj*/, * Method: tailing * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_tailing(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_tailing( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->tailing; } @@ -6299,8 +6386,8 @@ jboolean Java_org_rocksdb_ReadOptions_tailing(JNIEnv* /*env*/, jobject /*jobj*/, * Method: managed * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_managed(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_managed( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->managed; } @@ -6309,8 +6396,8 @@ jboolean Java_org_rocksdb_ReadOptions_managed(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setManaged * Signature: (JZ)V */ -void Java_org_rocksdb_ReadOptions_setManaged(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jboolean jmanaged) { +void Java_org_rocksdb_ReadOptions_setManaged( + JNIEnv*, jobject, jlong jhandle, jboolean jmanaged) { reinterpret_cast(jhandle)->managed = static_cast(jmanaged); } @@ -6320,9 +6407,8 @@ void Java_org_rocksdb_ReadOptions_setManaged(JNIEnv* /*env*/, jobject /*jobj*/, * Method: totalOrderSeek * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_totalOrderSeek(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_totalOrderSeek( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->total_order_seek; } @@ -6332,8 +6418,7 @@ jboolean Java_org_rocksdb_ReadOptions_totalOrderSeek(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_ReadOptions_setTotalOrderSeek( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jtotal_order_seek) { + JNIEnv*, jobject, jlong jhandle, jboolean jtotal_order_seek) { reinterpret_cast(jhandle)->total_order_seek = static_cast(jtotal_order_seek); } @@ -6343,9 +6428,8 @@ void Java_org_rocksdb_ReadOptions_setTotalOrderSeek( * Method: prefixSameAsStart * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_prefixSameAsStart(JNIEnv* /*env*/, - jobject /*jobj*/, - 
jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_prefixSameAsStart( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->prefix_same_as_start; } @@ -6355,8 +6439,7 @@ jboolean Java_org_rocksdb_ReadOptions_prefixSameAsStart(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_ReadOptions_setPrefixSameAsStart( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jprefix_same_as_start) { + JNIEnv*, jobject, jlong jhandle, jboolean jprefix_same_as_start) { reinterpret_cast(jhandle)->prefix_same_as_start = static_cast(jprefix_same_as_start); } @@ -6366,8 +6449,8 @@ void Java_org_rocksdb_ReadOptions_setPrefixSameAsStart( * Method: pinData * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_pinData(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_pinData( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle)->pin_data; } @@ -6376,9 +6459,8 @@ jboolean Java_org_rocksdb_ReadOptions_pinData(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setPinData * Signature: (JZ)V */ -void Java_org_rocksdb_ReadOptions_setPinData(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, - jboolean jpin_data) { +void Java_org_rocksdb_ReadOptions_setPinData( + JNIEnv*, jobject, jlong jhandle, jboolean jpin_data) { reinterpret_cast(jhandle)->pin_data = static_cast(jpin_data); } @@ -6389,7 +6471,7 @@ void Java_org_rocksdb_ReadOptions_setPinData(JNIEnv* /*env*/, jobject /*jobj*/, * Signature: (J)Z */ jboolean Java_org_rocksdb_ReadOptions_backgroundPurgeOnIteratorCleanup( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->background_purge_on_iterator_cleanup); } @@ -6400,7 +6482,7 @@ jboolean Java_org_rocksdb_ReadOptions_backgroundPurgeOnIteratorCleanup( * Signature: (JZ)V */ void Java_org_rocksdb_ReadOptions_setBackgroundPurgeOnIteratorCleanup( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, + JNIEnv*, jobject, jlong jhandle, jboolean jbackground_purge_on_iterator_cleanup) { auto* opt = reinterpret_cast(jhandle); opt->background_purge_on_iterator_cleanup = @@ -6412,9 +6494,8 @@ void Java_org_rocksdb_ReadOptions_setBackgroundPurgeOnIteratorCleanup( * Method: readaheadSize * Signature: (J)J */ -jlong Java_org_rocksdb_ReadOptions_readaheadSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ReadOptions_readaheadSize( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->readahead_size); } @@ -6424,22 +6505,42 @@ jlong Java_org_rocksdb_ReadOptions_readaheadSize(JNIEnv* /*env*/, * Method: setReadaheadSize * Signature: (JJ)V */ -void Java_org_rocksdb_ReadOptions_setReadaheadSize(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jlong jreadahead_size) { +void Java_org_rocksdb_ReadOptions_setReadaheadSize( + JNIEnv*, jobject, jlong jhandle, jlong jreadahead_size) { auto* opt = reinterpret_cast(jhandle); opt->readahead_size = static_cast(jreadahead_size); } +/* + * Class: org_rocksdb_ReadOptions + * Method: maxSkippableInternalKeys + * Signature: (J)J + */ +jlong Java_org_rocksdb_ReadOptions_maxSkippableInternalKeys( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->max_skippable_internal_keys); +} + +/* + * Class: org_rocksdb_ReadOptions + * Method: setMaxSkippableInternalKeys + * Signature: (JJ)V + */ +void Java_org_rocksdb_ReadOptions_setMaxSkippableInternalKeys( + JNIEnv*, jobject, 
jlong jhandle, jlong jmax_skippable_internal_keys) { + auto* opt = reinterpret_cast(jhandle); + opt->max_skippable_internal_keys = + static_cast(jmax_skippable_internal_keys); +} + /* * Class: org_rocksdb_ReadOptions * Method: ignoreRangeDeletions * Signature: (J)Z */ -jboolean Java_org_rocksdb_ReadOptions_ignoreRangeDeletions(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ReadOptions_ignoreRangeDeletions( + JNIEnv*, jobject, jlong jhandle) { auto* opt = reinterpret_cast(jhandle); return static_cast(opt->ignore_range_deletions); } @@ -6450,8 +6551,7 @@ jboolean Java_org_rocksdb_ReadOptions_ignoreRangeDeletions(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_ReadOptions_setIgnoreRangeDeletions( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean jignore_range_deletions) { + JNIEnv*, jobject, jlong jhandle, jboolean jignore_range_deletions) { auto* opt = reinterpret_cast(jhandle); opt->ignore_range_deletions = static_cast(jignore_range_deletions); } @@ -6461,8 +6561,8 @@ void Java_org_rocksdb_ReadOptions_setIgnoreRangeDeletions( * Method: setSnapshot * Signature: (JJ)V */ -void Java_org_rocksdb_ReadOptions_setSnapshot(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jlong jsnapshot) { +void Java_org_rocksdb_ReadOptions_setSnapshot( + JNIEnv*, jobject, jlong jhandle, jlong jsnapshot) { reinterpret_cast(jhandle)->snapshot = reinterpret_cast(jsnapshot); } @@ -6472,8 +6572,8 @@ void Java_org_rocksdb_ReadOptions_setSnapshot(JNIEnv* /*env*/, jobject /*jobj*/, * Method: snapshot * Signature: (J)J */ -jlong Java_org_rocksdb_ReadOptions_snapshot(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ReadOptions_snapshot( + JNIEnv*, jobject, jlong jhandle) { auto& snapshot = reinterpret_cast(jhandle)->snapshot; return reinterpret_cast(snapshot); } @@ -6483,8 +6583,8 @@ jlong Java_org_rocksdb_ReadOptions_snapshot(JNIEnv* /*env*/, jobject /*jobj*/, * Method: readTier * Signature: (J)B */ -jbyte Java_org_rocksdb_ReadOptions_readTier(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_ReadOptions_readTier( + JNIEnv*, jobject, jlong jhandle) { return static_cast( reinterpret_cast(jhandle)->read_tier); } @@ -6494,8 +6594,8 @@ jbyte Java_org_rocksdb_ReadOptions_readTier(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setReadTier * Signature: (JB)V */ -void Java_org_rocksdb_ReadOptions_setReadTier(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle, jbyte jread_tier) { +void Java_org_rocksdb_ReadOptions_setReadTier( + JNIEnv*, jobject, jlong jhandle, jbyte jread_tier) { reinterpret_cast(jhandle)->read_tier = static_cast(jread_tier); } @@ -6506,8 +6606,7 @@ void Java_org_rocksdb_ReadOptions_setReadTier(JNIEnv* /*env*/, jobject /*jobj*/, * Signature: (JJ)I */ void Java_org_rocksdb_ReadOptions_setIterateUpperBound( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jupper_bound_slice_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jupper_bound_slice_handle) { reinterpret_cast(jhandle)->iterate_upper_bound = reinterpret_cast(jupper_bound_slice_handle); } @@ -6517,14 +6616,71 @@ void Java_org_rocksdb_ReadOptions_setIterateUpperBound( * Method: iterateUpperBound * Signature: (J)J */ -jlong Java_org_rocksdb_ReadOptions_iterateUpperBound(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jlong Java_org_rocksdb_ReadOptions_iterateUpperBound( + JNIEnv*, jobject, jlong jhandle) { auto& upper_bound_slice_handle = reinterpret_cast(jhandle)->iterate_upper_bound; return 
reinterpret_cast(upper_bound_slice_handle); } +/* + * Class: org_rocksdb_ReadOptions + * Method: setIterateLowerBound + * Signature: (JJ)I + */ +void Java_org_rocksdb_ReadOptions_setIterateLowerBound( + JNIEnv*, jobject, jlong jhandle, jlong jlower_bound_slice_handle) { + reinterpret_cast(jhandle)->iterate_lower_bound = + reinterpret_cast(jlower_bound_slice_handle); +} + +/* + * Class: org_rocksdb_ReadOptions + * Method: iterateLowerBound + * Signature: (J)J + */ +jlong Java_org_rocksdb_ReadOptions_iterateLowerBound( + JNIEnv*, jobject, jlong jhandle) { + auto& lower_bound_slice_handle = + reinterpret_cast(jhandle)->iterate_lower_bound; + return reinterpret_cast(lower_bound_slice_handle); +} + +/* + * Class: org_rocksdb_ReadOptions + * Method: setTableFilter + * Signature: (JJ)V + */ +void Java_org_rocksdb_ReadOptions_setTableFilter( + JNIEnv*, jobject, jlong jhandle, jlong jjni_table_filter_handle) { + auto* opt = reinterpret_cast(jhandle); + auto* jni_table_filter = + reinterpret_cast(jjni_table_filter_handle); + opt->table_filter = jni_table_filter->GetTableFilterFunction(); +} + +/* + * Class: org_rocksdb_ReadOptions + * Method: setIterStartSeqnum + * Signature: (JJ)V + */ +void Java_org_rocksdb_ReadOptions_setIterStartSeqnum( + JNIEnv*, jobject, jlong jhandle, jlong jiter_start_seqnum) { + auto* opt = reinterpret_cast(jhandle); + opt->iter_start_seqnum = static_cast(jiter_start_seqnum); +} + +/* + * Class: org_rocksdb_ReadOptions + * Method: iterStartSeqnum + * Signature: (J)J + */ +jlong Java_org_rocksdb_ReadOptions_iterStartSeqnum( + JNIEnv*, jobject, jlong jhandle) { + auto* opt = reinterpret_cast(jhandle); + return static_cast(opt->iter_start_seqnum); +} + ///////////////////////////////////////////////////////////////////// // rocksdb::ComparatorOptions @@ -6533,8 +6689,8 @@ jlong Java_org_rocksdb_ReadOptions_iterateUpperBound(JNIEnv* /*env*/, * Method: newComparatorOptions * Signature: ()J */ -jlong Java_org_rocksdb_ComparatorOptions_newComparatorOptions(JNIEnv* /*env*/, - jclass /*jcls*/) { +jlong Java_org_rocksdb_ComparatorOptions_newComparatorOptions( + JNIEnv*, jclass) { auto* comparator_opt = new rocksdb::ComparatorJniCallbackOptions(); return reinterpret_cast(comparator_opt); } @@ -6544,9 +6700,8 @@ jlong Java_org_rocksdb_ComparatorOptions_newComparatorOptions(JNIEnv* /*env*/, * Method: useAdaptiveMutex * Signature: (J)Z */ -jboolean Java_org_rocksdb_ComparatorOptions_useAdaptiveMutex(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +jboolean Java_org_rocksdb_ComparatorOptions_useAdaptiveMutex( + JNIEnv*, jobject, jlong jhandle) { return reinterpret_cast(jhandle) ->use_adaptive_mutex; } @@ -6557,8 +6712,7 @@ jboolean Java_org_rocksdb_ComparatorOptions_useAdaptiveMutex(JNIEnv* /*env*/, * Signature: (JZ)V */ void Java_org_rocksdb_ComparatorOptions_setUseAdaptiveMutex( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jboolean juse_adaptive_mutex) { + JNIEnv*, jobject, jlong jhandle, jboolean juse_adaptive_mutex) { reinterpret_cast(jhandle) ->use_adaptive_mutex = static_cast(juse_adaptive_mutex); } @@ -6568,9 +6722,8 @@ void Java_org_rocksdb_ComparatorOptions_setUseAdaptiveMutex( * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_ComparatorOptions_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_ComparatorOptions_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { auto* comparator_opt = reinterpret_cast(jhandle); assert(comparator_opt != nullptr); @@ -6585,8 +6738,8 @@ void 
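// (The ReadOptions additions above expose iterate_lower_bound as a raw Slice
// handle, bind ReadOptions::table_filter to the std::function produced by the
// Java-backed TableFilterJniCallback::GetTableFilterFunction(), and pass
// iter_start_seqnum through as a sequence number.)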
Java_org_rocksdb_ComparatorOptions_disposeInternal(JNIEnv* /*env*/,
 * Method: newFlushOptions
 * Signature: ()J
 */
-jlong Java_org_rocksdb_FlushOptions_newFlushOptions(JNIEnv* /*env*/,
-                                                    jclass /*jcls*/) {
+jlong Java_org_rocksdb_FlushOptions_newFlushOptions(
+    JNIEnv*, jclass) {
   auto* flush_opt = new rocksdb::FlushOptions();
   return reinterpret_cast<jlong>(flush_opt);
 }
@@ -6596,10 +6749,8 @@ jlong Java_org_rocksdb_FlushOptions_newFlushOptions(JNIEnv* /*env*/,
 * Method: setWaitForFlush
 * Signature: (JZ)V
 */
-void Java_org_rocksdb_FlushOptions_setWaitForFlush(JNIEnv* /*env*/,
-                                                   jobject /*jobj*/,
-                                                   jlong jhandle,
-                                                   jboolean jwait) {
+void Java_org_rocksdb_FlushOptions_setWaitForFlush(
+    JNIEnv*, jobject, jlong jhandle, jboolean jwait) {
   reinterpret_cast<rocksdb::FlushOptions*>(jhandle)->wait =
       static_cast<bool>(jwait);
 }
@@ -6609,20 +6760,40 @@ void Java_org_rocksdb_FlushOptions_setWaitForFlush(JNIEnv* /*env*/,
 * Method: waitForFlush
 * Signature: (J)Z
 */
-jboolean Java_org_rocksdb_FlushOptions_waitForFlush(JNIEnv* /*env*/,
-                                                    jobject /*jobj*/,
-                                                    jlong jhandle) {
+jboolean Java_org_rocksdb_FlushOptions_waitForFlush(
+    JNIEnv*, jobject, jlong jhandle) {
   return reinterpret_cast<rocksdb::FlushOptions*>(jhandle)->wait;
 }

+/*
+ * Class: org_rocksdb_FlushOptions
+ * Method: setAllowWriteStall
+ * Signature: (JZ)V
+ */
+void Java_org_rocksdb_FlushOptions_setAllowWriteStall(
+    JNIEnv*, jobject, jlong jhandle, jboolean jallow_write_stall) {
+  auto* flush_options = reinterpret_cast<rocksdb::FlushOptions*>(jhandle);
+  flush_options->allow_write_stall = jallow_write_stall == JNI_TRUE;
+}
+
+/*
+ * Class: org_rocksdb_FlushOptions
+ * Method: allowWriteStall
+ * Signature: (J)Z
+ */
+jboolean Java_org_rocksdb_FlushOptions_allowWriteStall(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* flush_options = reinterpret_cast<rocksdb::FlushOptions*>(jhandle);
+  return static_cast<jboolean>(flush_options->allow_write_stall);
+}
+
 /*
  * Class: org_rocksdb_FlushOptions
  * Method: disposeInternal
  * Signature: (J)V
  */
-void Java_org_rocksdb_FlushOptions_disposeInternal(JNIEnv* /*env*/,
-                                                   jobject /*jobj*/,
-                                                   jlong jhandle) {
+void Java_org_rocksdb_FlushOptions_disposeInternal(
+    JNIEnv*, jobject, jlong jhandle) {
   auto* flush_opt = reinterpret_cast<rocksdb::FlushOptions*>(jhandle);
   assert(flush_opt != nullptr);
   delete flush_opt;
diff --git a/ceph/src/rocksdb/java/rocksjni/options_util.cc b/ceph/src/rocksdb/java/rocksjni/options_util.cc
index 2e057c407..7dd007845 100644
--- a/ceph/src/rocksdb/java/rocksjni/options_util.cc
+++ b/ceph/src/rocksdb/java/rocksjni/options_util.cc
@@ -7,6 +7,7 @@
 // calling C++ rocksdb::OptionsUtil methods from Java side.
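The options_util.cc hunks below replace each GetStringUTFChars/ReleaseStringUTFChars pair with rocksdb::JniUtil::copyStdString, which copies the jstring into a std::string and reports failure through an out-parameter, so early returns no longer have to pair up with a release call. As a minimal sketch of what such a helper looks like (illustrative only; the real implementation lives in rocksjni/portal.h):

// Sketch: copy a Java String into a std::string. On failure (e.g. an
// OutOfMemoryError from GetStringUTFChars), set *has_exception to JNI_TRUE
// and return an empty string.
static std::string copyStdString(JNIEnv* env, jstring js,
                                 jboolean* has_exception) {
  const char* utf = env->GetStringUTFChars(js, nullptr);
  if (utf == nullptr) {
    // exception thrown: OutOfMemoryError
    *has_exception = JNI_TRUE;
    return std::string();
  }
  std::string copy(utf);  // deep copy, safe to keep after the release below
  env->ReleaseStringUTFChars(js, utf);
  *has_exception = JNI_FALSE;
  return copy;
}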
 #include <jni.h>
+#include <string>

 #include "include/org_rocksdb_OptionsUtil.h"
@@ -56,19 +57,23 @@ void build_column_family_descriptor_list(
 void Java_org_rocksdb_OptionsUtil_loadLatestOptions(
     JNIEnv* env, jclass /*jcls*/, jstring jdbpath, jlong jenv_handle,
     jlong jdb_opts_handle, jobject jcfds, jboolean ignore_unknown_options) {
-  const char* db_path = env->GetStringUTFChars(jdbpath, nullptr);
+  jboolean has_exception = JNI_FALSE;
+  auto db_path = rocksdb::JniUtil::copyStdString(env, jdbpath, &has_exception);
+  if (has_exception == JNI_TRUE) {
+    // exception occurred
+    return;
+  }
   std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
   rocksdb::Status s = rocksdb::LoadLatestOptions(
       db_path, reinterpret_cast<rocksdb::Env*>(jenv_handle),
       reinterpret_cast<rocksdb::DBOptions*>(jdb_opts_handle), &cf_descs,
       ignore_unknown_options);
-  env->ReleaseStringUTFChars(jdbpath, db_path);
-
   if (!s.ok()) {
+    // error, raise an exception
     rocksdb::RocksDBExceptionJni::ThrowNew(env, s);
+  } else {
+    build_column_family_descriptor_list(env, jcfds, cf_descs);
   }
-
-  build_column_family_descriptor_list(env, jcfds, cf_descs);
 }

 /*
@@ -79,19 +84,23 @@ void Java_org_rocksdb_OptionsUtil_loadLatestOptions(
 void Java_org_rocksdb_OptionsUtil_loadOptionsFromFile(
     JNIEnv* env, jclass /*jcls*/, jstring jopts_file_name, jlong jenv_handle,
     jlong jdb_opts_handle, jobject jcfds, jboolean ignore_unknown_options) {
-  const char* opts_file_name = env->GetStringUTFChars(jopts_file_name, nullptr);
+  jboolean has_exception = JNI_FALSE;
+  auto opts_file_name = rocksdb::JniUtil::copyStdString(env, jopts_file_name, &has_exception);
+  if (has_exception == JNI_TRUE) {
+    // exception occurred
+    return;
+  }
   std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
   rocksdb::Status s = rocksdb::LoadOptionsFromFile(
       opts_file_name, reinterpret_cast<rocksdb::Env*>(jenv_handle),
       reinterpret_cast<rocksdb::DBOptions*>(jdb_opts_handle), &cf_descs,
       ignore_unknown_options);
-  env->ReleaseStringUTFChars(jopts_file_name, opts_file_name);
-
   if (!s.ok()) {
+    // error, raise an exception
     rocksdb::RocksDBExceptionJni::ThrowNew(env, s);
+  } else {
+    build_column_family_descriptor_list(env, jcfds, cf_descs);
   }
-
-  build_column_family_descriptor_list(env, jcfds, cf_descs);
 }

 /*
@@ -101,14 +110,21 @@ void Java_org_rocksdb_OptionsUtil_loadOptionsFromFile(
 */
 jstring Java_org_rocksdb_OptionsUtil_getLatestOptionsFileName(
     JNIEnv* env, jclass /*jcls*/, jstring jdbpath, jlong jenv_handle) {
-  const char* db_path = env->GetStringUTFChars(jdbpath, nullptr);
+  jboolean has_exception = JNI_FALSE;
+  auto db_path = rocksdb::JniUtil::copyStdString(env, jdbpath, &has_exception);
+  if (has_exception == JNI_TRUE) {
+    // exception occurred
+    return nullptr;
+  }
   std::string options_file_name;
-  if (db_path != nullptr) {
-    rocksdb::GetLatestOptionsFileName(
-        db_path, reinterpret_cast<rocksdb::Env*>(jenv_handle),
-        &options_file_name);
+  rocksdb::Status s = rocksdb::GetLatestOptionsFileName(
+      db_path, reinterpret_cast<rocksdb::Env*>(jenv_handle),
+      &options_file_name);
+  if (!s.ok()) {
+    // error, raise an exception
+    rocksdb::RocksDBExceptionJni::ThrowNew(env, s);
+    return nullptr;
+  } else {
+    return env->NewStringUTF(options_file_name.c_str());
   }
-  env->ReleaseStringUTFChars(jdbpath, db_path);
-
-  return env->NewStringUTF(options_file_name.c_str());
 }
diff --git a/ceph/src/rocksdb/java/rocksjni/persistent_cache.cc b/ceph/src/rocksdb/java/rocksjni/persistent_cache.cc
new file mode 100644
index 000000000..2b6fc60ba
--- /dev/null
+++ b/ceph/src/rocksdb/java/rocksjni/persistent_cache.cc
@@ -0,0 +1,53 @@
+// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
+// This source code is licensed under both the GPLv2 (found in the
+// COPYING file in the root directory) and Apache 2.0 License
+// (found in the LICENSE.Apache file in the root directory).
+//
+// This file implements the "bridge" between Java and C++ for
+// rocksdb::PersistentCache.
+
+#include <jni.h>
+#include <string>
+
+#include "include/org_rocksdb_PersistentCache.h"
+#include "rocksdb/persistent_cache.h"
+#include "loggerjnicallback.h"
+#include "portal.h"
+
+/*
+ * Class: org_rocksdb_PersistentCache
+ * Method: newPersistentCache
+ * Signature: (JLjava/lang/String;JJZ)J
+ */
+jlong Java_org_rocksdb_PersistentCache_newPersistentCache(
+    JNIEnv* env, jclass, jlong jenv_handle, jstring jpath,
+    jlong jsz, jlong jlogger_handle, jboolean joptimized_for_nvm) {
+  auto* rocks_env = reinterpret_cast<rocksdb::Env*>(jenv_handle);
+  jboolean has_exception = JNI_FALSE;
+  std::string path = rocksdb::JniUtil::copyStdString(env, jpath, &has_exception);
+  if (has_exception == JNI_TRUE) {
+    return 0;
+  }
+  auto* logger =
+      reinterpret_cast<std::shared_ptr<rocksdb::Logger>*>(jlogger_handle);
+  auto* cache = new std::shared_ptr<rocksdb::PersistentCache>(nullptr);
+  rocksdb::Status s = rocksdb::NewPersistentCache(
+      rocks_env, path, static_cast<uint64_t>(jsz), *logger,
+      static_cast<bool>(joptimized_for_nvm), cache);
+  if (!s.ok()) {
+    rocksdb::RocksDBExceptionJni::ThrowNew(env, s);
+  }
+  return reinterpret_cast<jlong>(cache);
+}
+
+/*
+ * Class: org_rocksdb_PersistentCache
+ * Method: disposeInternal
+ * Signature: (J)V
+ */
+void Java_org_rocksdb_PersistentCache_disposeInternal(
+    JNIEnv*, jobject, jlong jhandle) {
+  auto* cache =
+      reinterpret_cast<std::shared_ptr<rocksdb::PersistentCache>*>(jhandle);
+  delete cache;  // delete std::shared_ptr
+}
diff --git a/ceph/src/rocksdb/java/rocksjni/portal.h b/ceph/src/rocksdb/java/rocksjni/portal.h
index a0d1846a6..70e67653e 100644
--- a/ceph/src/rocksdb/java/rocksjni/portal.h
+++ b/ceph/src/rocksdb/java/rocksjni/portal.h
@@ -10,11 +10,12 @@
 #ifndef JAVA_ROCKSJNI_PORTAL_H_
 #define JAVA_ROCKSJNI_PORTAL_H_
+#include #include -#include #include #include #include +#include #include #include #include
@@ -25,13 +26,18 @@
 #include "rocksdb/filter_policy.h"
 #include "rocksdb/rate_limiter.h"
 #include "rocksdb/status.h"
+#include "rocksdb/table.h"
 #include "rocksdb/utilities/backupable_db.h"
+#include "rocksdb/utilities/memory_util.h"
 #include "rocksdb/utilities/transaction_db.h"
 #include "rocksdb/utilities/write_batch_with_index.h"
 #include "rocksjni/compaction_filter_factory_jnicallback.h"
 #include "rocksjni/comparatorjnicallback.h"
 #include "rocksjni/loggerjnicallback.h"
+#include "rocksjni/table_filter_jnicallback.h"
+#include "rocksjni/trace_writer_jnicallback.h"
 #include "rocksjni/transaction_notifier_jnicallback.h"
+#include "rocksjni/wal_filter_jnicallback.h"
 #include "rocksjni/writebatchhandlerjnicallback.h"

 // Remove macro on windows
@@ -41,15 +47,6 @@
 namespace rocksdb {

-// Detect if jlong overflows size_t
-inline Status check_if_jlong_fits_size_t(const jlong& jvalue) {
-  Status s = Status::OK();
-  if (static_cast<uint64_t>(jvalue) > std::numeric_limits<size_t>::max()) {
-    s = Status::InvalidArgument(Slice("jlong overflows 32 bit value."));
-  }
-  return s;
-}
-
 class JavaClass {
  public:
  /**
@@ -158,11 +155,12 @@ template <class DERIVED> class JavaException : public JavaClass {
   }
 };

-// The portal class for org.rocksdb.RocksDB
-class RocksDBJni : public RocksDBNativeClass<rocksdb::DB*, RocksDBJni> {
+// The portal class for java.lang.IllegalArgumentException
+class IllegalArgumentExceptionJni :
+    public JavaException<IllegalArgumentExceptionJni> {
  public:
  /**
-   * Get the Java Class org.rocksdb.RocksDB
+   * Get the Java Class java.lang.IllegalArgumentException
  *
  * @param env A pointer to the
Java environment * @@ -171,7 +169,34 @@ class RocksDBJni : public RocksDBNativeClass { * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/RocksDB"); + return JavaException::getJClass(env, "java/lang/IllegalArgumentException"); + } + + /** + * Create and throw a Java IllegalArgumentException with the provided status + * + * If s.ok() == true, then this function will not throw any exception. + * + * @param env A pointer to the Java environment + * @param s The status for the exception + * + * @return true if an exception was thrown, false otherwise + */ + static bool ThrowNew(JNIEnv* env, const Status& s) { + assert(!s.ok()); + if (s.ok()) { + return false; + } + + // get the IllegalArgumentException class + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + std::cerr << "IllegalArgumentExceptionJni::ThrowNew/class - Error: unexpected exception!" << std::endl; + return env->ExceptionCheck(); + } + + return JavaException::ThrowNew(env, s.ToString()); } }; @@ -472,6 +497,100 @@ class StatusJni : public RocksDBNativeClass { } } + static std::unique_ptr toCppStatus( + const jbyte jcode_value, const jbyte jsub_code_value) { + std::unique_ptr status; + switch (jcode_value) { + case 0x0: + //Ok + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::OK())); + break; + case 0x1: + //NotFound + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::NotFound( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x2: + //Corruption + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::Corruption( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x3: + //NotSupported + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::NotSupported( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x4: + //InvalidArgument + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::InvalidArgument( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x5: + //IOError + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::IOError( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x6: + //MergeInProgress + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::MergeInProgress( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x7: + //Incomplete + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::Incomplete( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x8: + //ShutdownInProgress + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::ShutdownInProgress( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x9: + //TimedOut + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::TimedOut( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0xA: + //Aborted + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::Aborted( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0xB: + //Busy + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::Busy( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0xC: + //Expired + status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::Expired( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0xD: + //TryAgain + 
status = std::unique_ptr( + new rocksdb::Status(rocksdb::Status::TryAgain( + rocksdb::SubCodeJni::toCppSubCode(jsub_code_value)))); + break; + case 0x7F: + default: + return nullptr; + } + return status; + } + // Returns the equivalent rocksdb::Status for the Java org.rocksdb.Status static std::unique_ptr toCppStatus(JNIEnv* env, const jobject jstatus) { jmethodID mid_code = getCodeMethod(env); @@ -513,14 +632,14 @@ class StatusJni : public RocksDBNativeClass { return nullptr; } - jbyte jsubCode_value = 0x0; // None + jbyte jsub_code_value = 0x0; // None if (jsubCode != nullptr) { jmethodID mid_subCode_value = rocksdb::SubCodeJni::getValueMethod(env); if (mid_subCode_value == nullptr) { // exception occurred return nullptr; } - jsubCode_value =env->CallByteMethod(jsubCode, mid_subCode_value); + jsub_code_value = env->CallByteMethod(jsubCode, mid_subCode_value); if (env->ExceptionCheck()) { // exception occurred if (jcode != nullptr) { @@ -547,68 +666,8 @@ class StatusJni : public RocksDBNativeClass { return nullptr; } - std::unique_ptr status; - switch (jcode_value) { - case 0x0: - //Ok - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::OK())); - break; - case 0x1: - //NotFound - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::NotFound(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x2: - //Corruption - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::Corruption(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x3: - //NotSupported - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::NotSupported(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x4: - //InvalidArgument - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::InvalidArgument(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x5: - //IOError - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::IOError(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x6: - //MergeInProgress - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::MergeInProgress(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x7: - //Incomplete - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::Incomplete(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x8: - //ShutdownInProgress - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::ShutdownInProgress(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x9: - //TimedOut - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::TimedOut(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0xA: - //Aborted - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::Aborted(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0xB: - //Busy - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::Busy(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0xC: - //Expired - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::Expired(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0xD: - //TryAgain - status = std::unique_ptr(new rocksdb::Status(rocksdb::Status::TryAgain(rocksdb::SubCodeJni::toCppSubCode(jsubCode_value)))); - break; - case 0x7F: - default: - return nullptr; - } + std::unique_ptr status = + toCppStatus(jcode_value, jsub_code_value); // delete all local refs if (jstate != nullptr) { @@ -679,7 +738,6 @@ class RocksDBExceptionJni : * @return 
true if an exception was thrown, false otherwise */ static bool ThrowNew(JNIEnv* env, const Status& s) { - assert(!s.ok()); if (s.ok()) { return false; } @@ -894,12 +952,11 @@ class RocksDBExceptionJni : } }; -// The portal class for java.lang.IllegalArgumentException -class IllegalArgumentExceptionJni : - public JavaException { +// The portal class for java.util.List +class ListJni : public JavaClass { public: /** - * Get the Java Class java.lang.IllegalArgumentException + * Get the Java Class java.util.List * * @param env A pointer to the Java environment * @@ -907,45 +964,25 @@ class IllegalArgumentExceptionJni : * ClassFormatError, ClassCircularityError, NoClassDefFoundError, * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jclass getJClass(JNIEnv* env) { - return JavaException::getJClass(env, "java/lang/IllegalArgumentException"); + static jclass getListClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/util/List"); } /** - * Create and throw a Java IllegalArgumentException with the provided status - * - * If s.ok() == true, then this function will not throw any exception. + * Get the Java Class java.util.ArrayList * * @param env A pointer to the Java environment - * @param s The status for the exception * - * @return true if an exception was thrown, false otherwise + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static bool ThrowNew(JNIEnv* env, const Status& s) { - assert(!s.ok()); - if (s.ok()) { - return false; - } - - // get the IllegalArgumentException class - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - std::cerr << "IllegalArgumentExceptionJni::ThrowNew/class - Error: unexpected exception!" 
<< std::endl; - return env->ExceptionCheck(); - } - - return JavaException::ThrowNew(env, s.ToString()); + static jclass getArrayListClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/util/ArrayList"); } -}; - -// The portal class for org.rocksdb.Options -class OptionsJni : public RocksDBNativeClass< - rocksdb::Options*, OptionsJni> { - public: /** - * Get the Java Class org.rocksdb.Options + * Get the Java Class java.util.Iterator * * @param env A pointer to the Java environment * @@ -953,87 +990,119 @@ class OptionsJni : public RocksDBNativeClass< * ClassFormatError, ClassCircularityError, NoClassDefFoundError, * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/Options"); + static jclass getIteratorClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/util/Iterator"); } -}; -// The portal class for org.rocksdb.DBOptions -class DBOptionsJni : public RocksDBNativeClass< - rocksdb::DBOptions*, DBOptionsJni> { - public: /** - * Get the Java Class org.rocksdb.DBOptions + * Get the Java Method: List#iterator * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/DBOptions"); + static jmethodID getIteratorMethod(JNIEnv* env) { + jclass jlist_clazz = getListClass(env); + if(jlist_clazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = + env->GetMethodID(jlist_clazz, "iterator", "()Ljava/util/Iterator;"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.ColumnFamilyOptions -class ColumnFamilyOptionsJni - : public RocksDBNativeClass { - public: /** - * Get the Java Class org.rocksdb.ColumnFamilyOptions + * Get the Java Method: Iterator#hasNext * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/ColumnFamilyOptions"); + static jmethodID getHasNextMethod(JNIEnv* env) { + jclass jiterator_clazz = getIteratorClass(env); + if(jiterator_clazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jiterator_clazz, "hasNext", "()Z"); + assert(mid != nullptr); + return mid; } /** - * Create a new Java org.rocksdb.ColumnFamilyOptions object with the same - * properties as the provided C++ rocksdb::ColumnFamilyOptions object + * Get the Java Method: Iterator#next * * @param env A pointer to the Java environment - * @param cfoptions A pointer to rocksdb::ColumnFamilyOptions object * - * @return A reference to a Java org.rocksdb.ColumnFamilyOptions object, or - * nullptr if an an exception occurs + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jobject construct(JNIEnv* env, const ColumnFamilyOptions* cfoptions) 
{ - auto* cfo = new rocksdb::ColumnFamilyOptions(*cfoptions); - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { + static jmethodID getNextMethod(JNIEnv* env) { + jclass jiterator_clazz = getIteratorClass(env); + if(jiterator_clazz == nullptr) { // exception occurred accessing class return nullptr; } - jmethodID mid = env->GetMethodID(jclazz, "", "(J)V"); - if (mid == nullptr) { - // exception thrown: NoSuchMethodException or OutOfMemoryError + static jmethodID mid = + env->GetMethodID(jiterator_clazz, "next", "()Ljava/lang/Object;"); + assert(mid != nullptr); + return mid; + } + + /** + * Get the Java Method: ArrayList constructor + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getArrayListConstructorMethodId(JNIEnv* env) { + jclass jarray_list_clazz = getArrayListClass(env); + if(jarray_list_clazz == nullptr) { + // exception occurred accessing class return nullptr; } + static jmethodID mid = + env->GetMethodID(jarray_list_clazz, "", "(I)V"); + assert(mid != nullptr); + return mid; + } - jobject jcfd = env->NewObject(jclazz, mid, reinterpret_cast(cfo)); - if (env->ExceptionCheck()) { + /** + * Get the Java Method: List#add + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getListAddMethodId(JNIEnv* env) { + jclass jlist_clazz = getListClass(env); + if(jlist_clazz == nullptr) { + // exception occurred accessing class return nullptr; } - return jcfd; + static jmethodID mid = + env->GetMethodID(jlist_clazz, "add", "(Ljava/lang/Object;)Z"); + assert(mid != nullptr); + return mid; } }; -// The portal class for org.rocksdb.WriteOptions -class WriteOptionsJni : public RocksDBNativeClass< - rocksdb::WriteOptions*, WriteOptionsJni> { +// The portal class for java.lang.Byte +class ByteJni : public JavaClass { public: /** - * Get the Java Class org.rocksdb.WriteOptions + * Get the Java Class java.lang.Byte * * @param env A pointer to the Java environment * @@ -1042,16 +1111,11 @@ class WriteOptionsJni : public RocksDBNativeClass< * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/WriteOptions"); + return JavaClass::getJClass(env, "java/lang/Byte"); } -}; -// The portal class for org.rocksdb.ReadOptions -class ReadOptionsJni : public RocksDBNativeClass< - rocksdb::ReadOptions*, ReadOptionsJni> { - public: /** - * Get the Java Class org.rocksdb.ReadOptions + * Get the Java Class byte[] * * @param env A pointer to the Java environment * @@ -1059,66 +1123,87 @@ class ReadOptionsJni : public RocksDBNativeClass< * ClassFormatError, ClassCircularityError, NoClassDefFoundError, * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/ReadOptions"); + static jclass getArrayJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "[B"); } -}; -// The portal class for org.rocksdb.WriteBatch -class WriteBatchJni : public RocksDBNativeClass< - rocksdb::WriteBatch*, WriteBatchJni> { - public: /** - * Get the Java Class org.rocksdb.WriteBatch + * Creates a new 2-dimensional Java Byte Array byte[][] * * @param env A pointer to the Java environment + * @param len The size of the first dimension * - * @return The Java Class or 
nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return A reference to the Java byte[][] or nullptr if an exception occurs */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/WriteBatch"); + static jobjectArray new2dByteArray(JNIEnv* env, const jsize len) { + jclass clazz = getArrayJClass(env); + if(clazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + return env->NewObjectArray(len, clazz, nullptr); } /** - * Create a new Java org.rocksdb.WriteBatch object + * Get the Java Method: Byte#byteValue * * @param env A pointer to the Java environment - * @param wb A pointer to rocksdb::WriteBatch object * - * @return A reference to a Java org.rocksdb.WriteBatch object, or - * nullptr if an an exception occurs + * @return The Java Method ID or nullptr if the class or method id could not + * be retrieved */ - static jobject construct(JNIEnv* env, const WriteBatch* wb) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { + static jmethodID getByteValueMethod(JNIEnv* env) { + jclass clazz = getJClass(env); + if(clazz == nullptr) { // exception occurred accessing class return nullptr; } - jmethodID mid = env->GetMethodID(jclazz, "", "(J)V"); + static jmethodID mid = env->GetMethodID(clazz, "byteValue", "()B"); + assert(mid != nullptr); + return mid; + } + + /** + * Calls the Java Method: Byte#valueOf, returning a constructed Byte jobject + * + * @param env A pointer to the Java environment + * + * @return A constructing Byte object or nullptr if the class or method id could not + * be retrieved, or an exception occurred + */ + static jobject valueOf(JNIEnv* env, jbyte jprimitive_byte) { + jclass clazz = getJClass(env); + if (clazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = + env->GetStaticMethodID(clazz, "valueOf", "(B)Ljava/lang/Byte;"); if (mid == nullptr) { // exception thrown: NoSuchMethodException or OutOfMemoryError return nullptr; } - jobject jwb = env->NewObject(jclazz, mid, reinterpret_cast(wb)); + const jobject jbyte_obj = + env->CallStaticObjectMethod(clazz, mid, jprimitive_byte); if (env->ExceptionCheck()) { + // exception occurred return nullptr; } - return jwb; + return jbyte_obj; } + }; -// The portal class for org.rocksdb.WriteBatch.Handler -class WriteBatchHandlerJni : public RocksDBNativeClass< - const rocksdb::WriteBatchHandlerJniCallback*, - WriteBatchHandlerJni> { +// The portal class for java.lang.Integer +class IntegerJni : public JavaClass { public: /** - * Get the Java Class org.rocksdb.WriteBatch.Handler + * Get the Java Class java.lang.Integer * * @param env A pointer to the Java environment * @@ -1127,719 +1212,1752 @@ class WriteBatchHandlerJni : public RocksDBNativeClass< * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/WriteBatch$Handler"); + return JavaClass::getJClass(env, "java/lang/Integer"); } - /** - * Get the Java Method: WriteBatch.Handler#put - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getPutCfMethodId(JNIEnv* env) { + static jobject valueOf(JNIEnv* env, jint jprimitive_int) { jclass jclazz = getJClass(env); - if(jclazz == nullptr) { + if (jclazz == 
nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "put", "(I[B[B)V"); - assert(mid != nullptr); - return mid; - } + jmethodID mid = + env->GetStaticMethodID(jclazz, "valueOf", "(I)Ljava/lang/Integer;"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } - /** - * Get the Java Method: WriteBatch.Handler#put - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getPutMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class + const jobject jinteger_obj = + env->CallStaticObjectMethod(jclazz, mid, jprimitive_int); + if (env->ExceptionCheck()) { + // exception occurred return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "put", "([B[B)V"); - assert(mid != nullptr); - return mid; + return jinteger_obj; } +}; +// The portal class for java.lang.Long +class LongJni : public JavaClass { + public: /** - * Get the Java Method: WriteBatch.Handler#merge + * Get the Java Class java.lang.Long * * @param env A pointer to the Java environment * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jmethodID getMergeCfMethodId(JNIEnv* env) { + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/lang/Long"); + } + + static jobject valueOf(JNIEnv* env, jlong jprimitive_long) { jclass jclazz = getJClass(env); - if(jclazz == nullptr) { + if (jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "merge", "(I[B[B)V"); - assert(mid != nullptr); - return mid; - } + jmethodID mid = + env->GetStaticMethodID(jclazz, "valueOf", "(J)Ljava/lang/Long;"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } - /** - * Get the Java Method: WriteBatch.Handler#merge - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getMergeMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class + const jobject jlong_obj = + env->CallStaticObjectMethod(jclazz, mid, jprimitive_long); + if (env->ExceptionCheck()) { + // exception occurred return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "merge", "([B[B)V"); - assert(mid != nullptr); - return mid; + return jlong_obj; } +}; +// The portal class for java.lang.StringBuilder +class StringBuilderJni : public JavaClass { + public: /** - * Get the Java Method: WriteBatch.Handler#delete + * Get the Java Class java.lang.StringBuilder * * @param env A pointer to the Java environment * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jmethodID getDeleteCfMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - 
// exception occurred accessing class - return nullptr; - } - - static jmethodID mid = env->GetMethodID(jclazz, "delete", "(I[B)V"); - assert(mid != nullptr); - return mid; + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/lang/StringBuilder"); } /** - * Get the Java Method: WriteBatch.Handler#delete + * Get the Java Method: StringBuilder#append * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getDeleteMethodId(JNIEnv* env) { + static jmethodID getListAddMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "delete", "([B)V"); + static jmethodID mid = + env->GetMethodID(jclazz, "append", + "(Ljava/lang/String;)Ljava/lang/StringBuilder;"); assert(mid != nullptr); return mid; } /** - * Get the Java Method: WriteBatch.Handler#singleDelete + * Appends a C-style string to a StringBuilder * * @param env A pointer to the Java environment + * @param jstring_builder Reference to a java.lang.StringBuilder + * @param c_str A C-style string to append to the StringBuilder * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved + * @return A reference to the updated StringBuilder, or a nullptr if + * an exception occurs */ - static jmethodID getSingleDeleteCfMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } - - static jmethodID mid = env->GetMethodID(jclazz, "singleDelete", "(I[B)V"); - assert(mid != nullptr); - return mid; - } - - /** - * Get the Java Method: WriteBatch.Handler#singleDelete - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getSingleDeleteMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class + static jobject append(JNIEnv* env, jobject jstring_builder, + const char* c_str) { + jmethodID mid = getListAddMethodId(env); + if(mid == nullptr) { + // exception occurred accessing class or method return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "singleDelete", "([B)V"); - assert(mid != nullptr); - return mid; - } - - /** - * Get the Java Method: WriteBatch.Handler#deleteRange - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getDeleteRangeCfMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if (jclazz == nullptr) { - // exception occurred accessing class + jstring new_value_str = env->NewStringUTF(c_str); + if(new_value_str == nullptr) { + // exception thrown: OutOfMemoryError return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "deleteRange", "(I[B[B)V"); - assert(mid != nullptr); - return mid; - } - - /** - * Get the Java Method: WriteBatch.Handler#deleteRange - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getDeleteRangeMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if (jclazz == nullptr) { - // exception occurred accessing class + jobject jresult_string_builder = + 
env->CallObjectMethod(jstring_builder, mid, new_value_str); + if(env->ExceptionCheck()) { + // exception occurred + env->DeleteLocalRef(new_value_str); return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "deleteRange", "([B[B)V"); - assert(mid != nullptr); - return mid; + return jresult_string_builder; } +}; - /** - * Get the Java Method: WriteBatch.Handler#logData - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getLogDataMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; +// various utility functions for working with RocksDB and JNI +class JniUtil { + public: + /** + * Detect if jlong overflows size_t + * + * @param jvalue the jlong value + * + * @return + */ + inline static Status check_if_jlong_fits_size_t(const jlong& jvalue) { + Status s = Status::OK(); + if (static_cast(jvalue) > std::numeric_limits::max()) { + s = Status::InvalidArgument(Slice("jlong overflows 32 bit value.")); + } + return s; } - static jmethodID mid = env->GetMethodID(jclazz, "logData", "([B)V"); - assert(mid != nullptr); - return mid; - } - - /** - * Get the Java Method: WriteBatch.Handler#putBlobIndex - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getPutBlobIndexCfMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } + /** + * Obtains a reference to the JNIEnv from + * the JVM + * + * If the current thread is not attached to the JavaVM + * then it will be attached so as to retrieve the JNIEnv + * + * If a thread is attached, it must later be manually + * released by calling JavaVM::DetachCurrentThread. 
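+   * For example, a native callback might pair the two calls like this
+   * (a minimal sketch; m_jvm is an assumed cached JavaVM*, not a member
+   * of this class):
+   * <pre>
+   *   jboolean attached = JNI_FALSE;
+   *   JNIEnv* env = rocksdb::JniUtil::getJniEnv(m_jvm, &attached);
+   *   if (env == nullptr) {
+   *     return;  // fatal: a JNIEnv could not be obtained
+   *   }
+   *   // ... call back into Java through env ...
+   *   rocksdb::JniUtil::releaseJniEnv(m_jvm, attached);
+   * </pre>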
+   * This can be handled by always matching calls to this
+   * function with calls to {@link JniUtil::releaseJniEnv(JavaVM*, jboolean)}
+   *
+   * @param jvm (IN) A pointer to the JavaVM instance
+   * @param attached (OUT) A pointer to a boolean which
+   *     will be set to JNI_TRUE if we had to attach the thread
+   *
+   * @return A pointer to the JNIEnv or nullptr if a fatal error
+   *     occurs and the JNIEnv cannot be retrieved
+   */
+  static JNIEnv* getJniEnv(JavaVM* jvm, jboolean* attached) {
+    assert(jvm != nullptr);
-      static jmethodID mid = env->GetMethodID(jclazz, "putBlobIndex", "(I[B[B)V");
-      assert(mid != nullptr);
-      return mid;
-    }
+    JNIEnv *env;
+    const jint env_rs = jvm->GetEnv(reinterpret_cast<void**>(&env),
+        JNI_VERSION_1_2);
-    /**
-     * Get the Java Method: WriteBatch.Handler#markBeginPrepare
-     *
-     * @param env A pointer to the Java environment
-     *
-     * @return The Java Method ID or nullptr if the class or method id could not
-     *     be retieved
-     */
-    static jmethodID getMarkBeginPrepareMethodId(JNIEnv* env) {
-      jclass jclazz = getJClass(env);
-      if(jclazz == nullptr) {
-        // exception occurred accessing class
-        return nullptr;
+    if(env_rs == JNI_OK) {
+      // current thread is already attached, return the JNIEnv
+      *attached = JNI_FALSE;
+      return env;
+    } else if(env_rs == JNI_EDETACHED) {
+      // current thread is not attached, attempt to attach
+      const jint rs_attach = jvm->AttachCurrentThread(reinterpret_cast<void**>(&env), NULL);
+      if(rs_attach == JNI_OK) {
+        *attached = JNI_TRUE;
+        return env;
+      } else {
+        // error, could not attach the thread
+        std::cerr << "JniUtil::getJniEnv - Fatal: could not attach current thread to JVM!" << std::endl;
+        return nullptr;
+      }
+    } else if(env_rs == JNI_EVERSION) {
+      // error, JDK does not support JNI_VERSION_1_2+
+      std::cerr << "JniUtil::getJniEnv - Fatal: JDK does not support JNI_VERSION_1_2" << std::endl;
+      return nullptr;
+    } else {
+      std::cerr << "JniUtil::getJniEnv - Fatal: Unknown error: env_rs=" << env_rs << std::endl;
+      return nullptr;
+    }
   }
-    static jmethodID mid = env->GetMethodID(jclazz, "markBeginPrepare", "()V");
-    assert(mid != nullptr);
-    return mid;
-  }
-
-  /**
-   * Get the Java Method: WriteBatch.Handler#markEndPrepare
-   *
-   * @param env A pointer to the Java environment
-   *
-   * @return The Java Method ID or nullptr if the class or method id could not
-   *     be retieved
-   */
-  static jmethodID getMarkEndPrepareMethodId(JNIEnv* env) {
-    jclass jclazz = getJClass(env);
-    if(jclazz == nullptr) {
-      // exception occurred accessing class
-      return nullptr;
+  /**
+   * Counterpart to {@link JniUtil::getJniEnv(JavaVM*, jboolean*)}
+   *
+   * Detaches the current thread from the JVM if it was previously
+   * attached
+   *
+   * @param jvm (IN) A pointer to the JavaVM instance
+   * @param attached (IN) JNI_TRUE if we previously had to attach the thread
+   *     to the JavaVM to get the JNIEnv
+   */
+  static void releaseJniEnv(JavaVM* jvm, jboolean& attached) {
+    assert(jvm != nullptr);
+    if(attached == JNI_TRUE) {
+      const jint rs_detach = jvm->DetachCurrentThread();
+      assert(rs_detach == JNI_OK);
+      if(rs_detach != JNI_OK) {
+        std::cerr << "JniUtil::getJniEnv - Warn: Unable to detach current thread from JVM!"
<< std::endl; + } + } } - static jmethodID mid = env->GetMethodID(jclazz, "markEndPrepare", "([B)V"); - assert(mid != nullptr); - return mid; - } - - /** - * Get the Java Method: WriteBatch.Handler#markNoop - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getMarkNoopMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; + /** + * Copies a Java String[] to a C++ std::vector + * + * @param env (IN) A pointer to the java environment + * @param jss (IN) The Java String array to copy + * @param has_exception (OUT) will be set to JNI_TRUE + * if an OutOfMemoryError or ArrayIndexOutOfBoundsException + * exception occurs + * + * @return A std::vector containing copies of the Java strings + */ + static std::vector copyStrings(JNIEnv* env, + jobjectArray jss, jboolean* has_exception) { + return rocksdb::JniUtil::copyStrings(env, jss, + env->GetArrayLength(jss), has_exception); } - static jmethodID mid = env->GetMethodID(jclazz, "markNoop", "(Z)V"); - assert(mid != nullptr); - return mid; - } - - /** - * Get the Java Method: WriteBatch.Handler#markRollback - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getMarkRollbackMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } + /** + * Copies a Java String[] to a C++ std::vector + * + * @param env (IN) A pointer to the java environment + * @param jss (IN) The Java String array to copy + * @param jss_len (IN) The length of the Java String array to copy + * @param has_exception (OUT) will be set to JNI_TRUE + * if an OutOfMemoryError or ArrayIndexOutOfBoundsException + * exception occurs + * + * @return A std::vector containing copies of the Java strings + */ + static std::vector copyStrings(JNIEnv* env, + jobjectArray jss, const jsize jss_len, jboolean* has_exception) { + std::vector strs; + strs.reserve(jss_len); + for (jsize i = 0; i < jss_len; i++) { + jobject js = env->GetObjectArrayElement(jss, i); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + *has_exception = JNI_TRUE; + return strs; + } - static jmethodID mid = env->GetMethodID(jclazz, "markRollback", "([B)V"); - assert(mid != nullptr); - return mid; - } + jstring jstr = static_cast(js); + const char* str = env->GetStringUTFChars(jstr, nullptr); + if(str == nullptr) { + // exception thrown: OutOfMemoryError + env->DeleteLocalRef(js); + *has_exception = JNI_TRUE; + return strs; + } - /** - * Get the Java Method: WriteBatch.Handler#markCommit - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getMarkCommitMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } + strs.push_back(std::string(str)); - static jmethodID mid = env->GetMethodID(jclazz, "markCommit", "([B)V"); - assert(mid != nullptr); - return mid; - } + env->ReleaseStringUTFChars(jstr, str); + env->DeleteLocalRef(js); + } - /** - * Get the Java Method: WriteBatch.Handler#shouldContinue - * - * @param env A pointer to the Java environment - * - * @return The Java Method 
ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getContinueMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; + *has_exception = JNI_FALSE; + return strs; } - static jmethodID mid = env->GetMethodID(jclazz, "shouldContinue", "()Z"); - assert(mid != nullptr); - return mid; - } -}; - -class WriteBatchSavePointJni : public JavaClass { - public: - /** - * Get the Java Class org.rocksdb.WriteBatch.SavePoint - * - * @param env A pointer to the Java environment - * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown - */ - static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "org/rocksdb/WriteBatch$SavePoint"); - } + /** + * Copies a jstring to a C-style null-terminated byte string + * and releases the original jstring + * + * The jstring is copied as UTF-8 + * + * If an exception occurs, then JNIEnv::ExceptionCheck() + * will have been called + * + * @param env (IN) A pointer to the java environment + * @param js (IN) The java string to copy + * @param has_exception (OUT) will be set to JNI_TRUE + * if an OutOfMemoryError exception occurs + * + * @return A pointer to the copied string, or a + * nullptr if has_exception == JNI_TRUE + */ + static std::unique_ptr copyString(JNIEnv* env, jstring js, + jboolean* has_exception) { + const char *utf = env->GetStringUTFChars(js, nullptr); + if(utf == nullptr) { + // exception thrown: OutOfMemoryError + env->ExceptionCheck(); + *has_exception = JNI_TRUE; + return nullptr; + } else if(env->ExceptionCheck()) { + // exception thrown + env->ReleaseStringUTFChars(js, utf); + *has_exception = JNI_TRUE; + return nullptr; + } - /** - * Get the Java Method: HistogramData constructor - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getConstructorMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; + const jsize utf_len = env->GetStringUTFLength(js); + std::unique_ptr str(new char[utf_len + 1]); // Note: + 1 is needed for the c_str null terminator + std::strcpy(str.get(), utf); + env->ReleaseStringUTFChars(js, utf); + *has_exception = JNI_FALSE; + return str; } - static jmethodID mid = env->GetMethodID(jclazz, "", "(JJJ)V"); - assert(mid != nullptr); - return mid; - } + /** + * Copies a jstring to a std::string + * and releases the original jstring + * + * If an exception occurs, then JNIEnv::ExceptionCheck() + * will have been called + * + * @param env (IN) A pointer to the java environment + * @param js (IN) The java string to copy + * @param has_exception (OUT) will be set to JNI_TRUE + * if an OutOfMemoryError exception occurs + * + * @return A std:string copy of the jstring, or an + * empty std::string if has_exception == JNI_TRUE + */ + static std::string copyStdString(JNIEnv* env, jstring js, + jboolean* has_exception) { + const char *utf = env->GetStringUTFChars(js, nullptr); + if(utf == nullptr) { + // exception thrown: OutOfMemoryError + env->ExceptionCheck(); + *has_exception = JNI_TRUE; + return std::string(); + } else if(env->ExceptionCheck()) { + // exception thrown + env->ReleaseStringUTFChars(js, utf); + *has_exception = JNI_TRUE; + return 
std::string(); + } - /** - * Create a new Java org.rocksdb.WriteBatch.SavePoint object - * - * @param env A pointer to the Java environment - * @param savePoint A pointer to rocksdb::WriteBatch::SavePoint object - * - * @return A reference to a Java org.rocksdb.WriteBatch.SavePoint object, or - * nullptr if an an exception occurs - */ - static jobject construct(JNIEnv* env, const SavePoint &save_point) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; + std::string name(utf); + env->ReleaseStringUTFChars(js, utf); + *has_exception = JNI_FALSE; + return name; } - jmethodID mid = getConstructorMethodId(env); - if (mid == nullptr) { - // exception thrown: NoSuchMethodException or OutOfMemoryError - return nullptr; + /** + * Copies bytes from a std::string to a jByteArray + * + * @param env A pointer to the java environment + * @param bytes The bytes to copy + * + * @return the Java byte[], or nullptr if an exception occurs + * + * @throws RocksDBException thrown + * if memory size to copy exceeds general java specific array size limitation. + */ + static jbyteArray copyBytes(JNIEnv* env, std::string bytes) { + return createJavaByteArrayWithSizeCheck(env, bytes.c_str(), bytes.size()); } - jobject jsave_point = env->NewObject(jclazz, mid, - static_cast(save_point.size), - static_cast(save_point.count), - static_cast(save_point.content_flags)); - if (env->ExceptionCheck()) { - return nullptr; - } + /** + * Given a Java byte[][] which is an array of java.lang.Strings + * where each String is a byte[], the passed function `string_fn` + * will be called on each String, the result is the collected by + * calling the passed function `collector_fn` + * + * @param env (IN) A pointer to the java environment + * @param jbyte_strings (IN) A Java array of Strings expressed as bytes + * @param string_fn (IN) A transform function to call for each String + * @param collector_fn (IN) A collector which is called for the result + * of each `string_fn` + * @param has_exception (OUT) will be set to JNI_TRUE + * if an ArrayIndexOutOfBoundsException or OutOfMemoryError + * exception occurs + */ + template static void byteStrings(JNIEnv* env, + jobjectArray jbyte_strings, + std::function string_fn, + std::function collector_fn, + jboolean *has_exception) { + const jsize jlen = env->GetArrayLength(jbyte_strings); - return jsave_point; - } -}; + for(jsize i = 0; i < jlen; i++) { + jobject jbyte_string_obj = env->GetObjectArrayElement(jbyte_strings, i); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + *has_exception = JNI_TRUE; // signal error + return; + } -// The portal class for org.rocksdb.WriteBatchWithIndex -class WriteBatchWithIndexJni : public RocksDBNativeClass< - rocksdb::WriteBatchWithIndex*, WriteBatchWithIndexJni> { - public: - /** - * Get the Java Class org.rocksdb.WriteBatchWithIndex + jbyteArray jbyte_string_ary = + reinterpret_cast(jbyte_string_obj); + T result = byteString(env, jbyte_string_ary, string_fn, has_exception); + + env->DeleteLocalRef(jbyte_string_obj); + + if(*has_exception == JNI_TRUE) { + // exception thrown: OutOfMemoryError + return; + } + + collector_fn(i, result); + } + + *has_exception = JNI_FALSE; + } + + /** + * Given a Java String which is expressed as a Java Byte Array byte[], + * the passed function `string_fn` will be called on the String + * and the result returned + * + * @param env (IN) A pointer to the java environment + * @param jbyte_string_ary (IN) A Java String 
expressed in bytes + * @param string_fn (IN) A transform function to call on the String + * @param has_exception (OUT) will be set to JNI_TRUE + * if an OutOfMemoryError exception occurs + */ + template static T byteString(JNIEnv* env, + jbyteArray jbyte_string_ary, + std::function string_fn, + jboolean* has_exception) { + const jsize jbyte_string_len = env->GetArrayLength(jbyte_string_ary); + return byteString(env, jbyte_string_ary, jbyte_string_len, string_fn, + has_exception); + } + + /** + * Given a Java String which is expressed as a Java Byte Array byte[], + * the passed function `string_fn` will be called on the String + * and the result returned + * + * @param env (IN) A pointer to the java environment + * @param jbyte_string_ary (IN) A Java String expressed in bytes + * @param jbyte_string_len (IN) The length of the Java String + * expressed in bytes + * @param string_fn (IN) A transform function to call on the String + * @param has_exception (OUT) will be set to JNI_TRUE + * if an OutOfMemoryError exception occurs + */ + template static T byteString(JNIEnv* env, + jbyteArray jbyte_string_ary, const jsize jbyte_string_len, + std::function string_fn, + jboolean* has_exception) { + jbyte* jbyte_string = + env->GetByteArrayElements(jbyte_string_ary, nullptr); + if(jbyte_string == nullptr) { + // exception thrown: OutOfMemoryError + *has_exception = JNI_TRUE; + return nullptr; // signal error + } + + T result = + string_fn(reinterpret_cast(jbyte_string), jbyte_string_len); + + env->ReleaseByteArrayElements(jbyte_string_ary, jbyte_string, JNI_ABORT); + + *has_exception = JNI_FALSE; + return result; + } + + /** + * Converts a std::vector to a Java byte[][] where each Java String + * is expressed as a Java Byte Array byte[]. + * + * @param env A pointer to the java environment + * @param strings A vector of Strings + * + * @return A Java array of Strings expressed as bytes, + * or nullptr if an exception is thrown + */ + static jobjectArray stringsBytes(JNIEnv* env, std::vector strings) { + jclass jcls_ba = ByteJni::getArrayJClass(env); + if(jcls_ba == nullptr) { + // exception occurred + return nullptr; + } + + const jsize len = static_cast(strings.size()); + + jobjectArray jbyte_strings = env->NewObjectArray(len, jcls_ba, nullptr); + if(jbyte_strings == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + for (jsize i = 0; i < len; i++) { + std::string *str = &strings[i]; + const jsize str_len = static_cast(str->size()); + + jbyteArray jbyte_string_ary = env->NewByteArray(str_len); + if(jbyte_string_ary == nullptr) { + // exception thrown: OutOfMemoryError + env->DeleteLocalRef(jbyte_strings); + return nullptr; + } + + env->SetByteArrayRegion( + jbyte_string_ary, 0, str_len, + const_cast(reinterpret_cast(str->c_str()))); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jbyte_string_ary); + env->DeleteLocalRef(jbyte_strings); + return nullptr; + } + + env->SetObjectArrayElement(jbyte_strings, i, jbyte_string_ary); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + // or ArrayStoreException + env->DeleteLocalRef(jbyte_string_ary); + env->DeleteLocalRef(jbyte_strings); + return nullptr; + } + + env->DeleteLocalRef(jbyte_string_ary); + } + + return jbyte_strings; + } + + /** + * Converts a std::vector to a Java String[]. 
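+   * For example (a minimal sketch; the names are illustrative only):
+   * <pre>
+   *   std::vector<std::string> cf_names = {"default", "meta"};
+   *   jobjectArray jcf_names =
+   *       rocksdb::JniUtil::toJavaStrings(env, &cf_names);
+   *   if (jcf_names == nullptr) {
+   *     return nullptr;  // a Java exception is pending
+   *   }
+   * </pre>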
+ * + * @param env A pointer to the java environment + * @param strings A vector of Strings + * + * @return A Java array of Strings, + * or nullptr if an exception is thrown + */ + static jobjectArray toJavaStrings(JNIEnv* env, + const std::vector* strings) { + jclass jcls_str = env->FindClass("java/lang/String"); + if(jcls_str == nullptr) { + // exception occurred + return nullptr; + } + + const jsize len = static_cast(strings->size()); + + jobjectArray jstrings = env->NewObjectArray(len, jcls_str, nullptr); + if(jstrings == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + for (jsize i = 0; i < len; i++) { + const std::string *str = &((*strings)[i]); + jstring js = rocksdb::JniUtil::toJavaString(env, str); + if (js == nullptr) { + env->DeleteLocalRef(jstrings); + return nullptr; + } + + env->SetObjectArrayElement(jstrings, i, js); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + // or ArrayStoreException + env->DeleteLocalRef(js); + env->DeleteLocalRef(jstrings); + return nullptr; + } + } + + return jstrings; + } + + /** + * Creates a Java UTF String from a C++ std::string + * + * @param env A pointer to the java environment + * @param string the C++ std::string + * @param treat_empty_as_null true if empty strings should be treated as null + * + * @return the Java UTF string, or nullptr if the provided string + * is null (or empty and treat_empty_as_null is set), or if an + * exception occurs allocating the Java String. + */ + static jstring toJavaString(JNIEnv* env, const std::string* string, + const bool treat_empty_as_null = false) { + if (string == nullptr) { + return nullptr; + } + + if (treat_empty_as_null && string->empty()) { + return nullptr; + } + + return env->NewStringUTF(string->c_str()); + } + + /** + * Copies bytes to a new jByteArray with the check of java array size limitation. + * + * @param bytes pointer to memory to copy to a new jByteArray + * @param size number of bytes to copy + * + * @return the Java byte[], or nullptr if an exception occurs + * + * @throws RocksDBException thrown + * if memory size to copy exceeds general java array size limitation to avoid overflow. 
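+   *
+   * A usage sketch (illustrative values only):
+   * <pre>
+   *   const char data[] = "value";
+   *   jbyteArray jdata =
+   *       rocksdb::JniUtil::createJavaByteArrayWithSizeCheck(env, data, 5);
+   * </pre>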
+ */ + static jbyteArray createJavaByteArrayWithSizeCheck(JNIEnv* env, const char* bytes, const size_t size) { + // Limitation for java array size is vm specific + // In general it cannot exceed Integer.MAX_VALUE (2^31 - 1) + // Current HotSpot VM limitation for array size is Integer.MAX_VALUE - 5 (2^31 - 1 - 5) + // It means that the next call to env->NewByteArray can still end with + // OutOfMemoryError("Requested array size exceeds VM limit") coming from VM + static const size_t MAX_JARRAY_SIZE = (static_cast(1)) << 31; + if(size > MAX_JARRAY_SIZE) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, "Requested array size exceeds VM limit"); + return nullptr; + } + + const jsize jlen = static_cast(size); + jbyteArray jbytes = env->NewByteArray(jlen); + if(jbytes == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + env->SetByteArrayRegion(jbytes, 0, jlen, + const_cast(reinterpret_cast(bytes))); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jbytes); + return nullptr; + } + + return jbytes; + } + + /** + * Copies bytes from a rocksdb::Slice to a jByteArray + * + * @param env A pointer to the java environment + * @param bytes The bytes to copy + * + * @return the Java byte[] or nullptr if an exception occurs + * + * @throws RocksDBException thrown + * if memory size to copy exceeds general java specific array size limitation. + */ + static jbyteArray copyBytes(JNIEnv* env, const Slice& bytes) { + return createJavaByteArrayWithSizeCheck(env, bytes.data(), bytes.size()); + } + + /* + * Helper for operations on a key and value + * for example WriteBatch->Put + * + * TODO(AR) could be used for RocksDB->Put etc. + */ + static std::unique_ptr kv_op( + std::function op, + JNIEnv* env, jobject /*jobj*/, + jbyteArray jkey, jint jkey_len, + jbyteArray jvalue, jint jvalue_len) { + jbyte* key = env->GetByteArrayElements(jkey, nullptr); + if(env->ExceptionCheck()) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + jbyte* value = env->GetByteArrayElements(jvalue, nullptr); + if(env->ExceptionCheck()) { + // exception thrown: OutOfMemoryError + if(key != nullptr) { + env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); + } + return nullptr; + } + + rocksdb::Slice key_slice(reinterpret_cast(key), jkey_len); + rocksdb::Slice value_slice(reinterpret_cast(value), + jvalue_len); + + auto status = op(key_slice, value_slice); + + if(value != nullptr) { + env->ReleaseByteArrayElements(jvalue, value, JNI_ABORT); + } + if(key != nullptr) { + env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); + } + + return std::unique_ptr(new rocksdb::Status(status)); + } + + /* + * Helper for operations on a key + * for example WriteBatch->Delete + * + * TODO(AR) could be used for RocksDB->Delete etc. 
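+   *
+   * A minimal usage sketch (wb is an assumed rocksdb::WriteBatch*; env,
+   * jobj, jkey and jkey_len come from the enclosing JNI function):
+   * <pre>
+   *   auto del = [&wb](rocksdb::Slice key) {
+   *     return wb->Delete(key);
+   *   };
+   *   std::unique_ptr<rocksdb::Status> s =
+   *       rocksdb::JniUtil::k_op(del, env, jobj, jkey, jkey_len);
+   * </pre>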
+ */ + static std::unique_ptr k_op( + std::function op, + JNIEnv* env, jobject /*jobj*/, + jbyteArray jkey, jint jkey_len) { + jbyte* key = env->GetByteArrayElements(jkey, nullptr); + if(env->ExceptionCheck()) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + rocksdb::Slice key_slice(reinterpret_cast(key), jkey_len); + + auto status = op(key_slice); + + if(key != nullptr) { + env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); + } + + return std::unique_ptr(new rocksdb::Status(status)); + } + + /* + * Helper for operations on a value + * for example WriteBatchWithIndex->GetFromBatch + */ + static jbyteArray v_op( + std::function op, + JNIEnv* env, jbyteArray jkey, jint jkey_len) { + jbyte* key = env->GetByteArrayElements(jkey, nullptr); + if(env->ExceptionCheck()) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + rocksdb::Slice key_slice(reinterpret_cast(key), jkey_len); + + std::string value; + rocksdb::Status s = op(key_slice, &value); + + if(key != nullptr) { + env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); + } + + if (s.IsNotFound()) { + return nullptr; + } + + if (s.ok()) { + jbyteArray jret_value = + env->NewByteArray(static_cast(value.size())); + if(jret_value == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + env->SetByteArrayRegion(jret_value, 0, static_cast(value.size()), + const_cast(reinterpret_cast(value.c_str()))); + if(env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + if(jret_value != nullptr) { + env->DeleteLocalRef(jret_value); + } + return nullptr; + } + + return jret_value; + } + + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + /** + * Creates a vector of C++ pointers from + * a Java array of C++ pointer addresses. + * + * @param env (IN) A pointer to the java environment + * @param pointers (IN) A Java array of C++ pointer addresses + * @param has_exception (OUT) will be set to JNI_TRUE + * if an ArrayIndexOutOfBoundsException or OutOfMemoryError + * exception occurs. + * + * @return A vector of C++ pointers. + */ + template static std::vector fromJPointers( + JNIEnv* env, jlongArray jptrs, jboolean *has_exception) { + const jsize jptrs_len = env->GetArrayLength(jptrs); + std::vector ptrs; + jlong* jptr = env->GetLongArrayElements(jptrs, nullptr); + if (jptr == nullptr) { + // exception thrown: OutOfMemoryError + *has_exception = JNI_TRUE; + return ptrs; + } + ptrs.reserve(jptrs_len); + for (jsize i = 0; i < jptrs_len; i++) { + ptrs.push_back(reinterpret_cast(jptr[i])); + } + env->ReleaseLongArrayElements(jptrs, jptr, JNI_ABORT); + return ptrs; + } + + /** + * Creates a Java array of C++ pointer addresses + * from a vector of C++ pointers. + * + * @param env (IN) A pointer to the java environment + * @param pointers (IN) A vector of C++ pointers + * @param has_exception (OUT) will be set to JNI_TRUE + * if an ArrayIndexOutOfBoundsException or OutOfMemoryError + * exception occurs + * + * @return Java array of C++ pointer addresses. 
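+   *
+   * For example (sketch; the handles would be populated elsewhere):
+   * <pre>
+   *   std::vector<rocksdb::ColumnFamilyHandle*> handles;
+   *   jboolean has_exception = JNI_FALSE;
+   *   jlongArray jhandles =
+   *       rocksdb::JniUtil::toJPointers(env, handles, &has_exception);
+   *   if (has_exception == JNI_TRUE) {
+   *     return nullptr;
+   *   }
+   * </pre>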
+ */ + template static jlongArray toJPointers(JNIEnv* env, + const std::vector &pointers, + jboolean *has_exception) { + const jsize len = static_cast(pointers.size()); + std::unique_ptr results(new jlong[len]); + std::transform(pointers.begin(), pointers.end(), results.get(), [](T* pointer) -> jlong { + return reinterpret_cast(pointer); + }); + + jlongArray jpointers = env->NewLongArray(len); + if (jpointers == nullptr) { + // exception thrown: OutOfMemoryError + *has_exception = JNI_TRUE; + return nullptr; + } + + env->SetLongArrayRegion(jpointers, 0, len, results.get()); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + *has_exception = JNI_TRUE; + env->DeleteLocalRef(jpointers); + return nullptr; + } + + *has_exception = JNI_FALSE; + + return jpointers; + } +}; + +class MapJni : public JavaClass { + public: + /** + * Get the Java Class java.util.Map + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/util/Map"); + } + + /** + * Get the Java Method: Map#put + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getMapPutMethodId(JNIEnv* env) { + jclass jlist_clazz = getJClass(env); + if(jlist_clazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = + env->GetMethodID(jlist_clazz, "put", "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;"); + assert(mid != nullptr); + return mid; + } +}; + +class HashMapJni : public JavaClass { + public: + /** + * Get the Java Class java.util.HashMap + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "java/util/HashMap"); + } + + /** + * Create a new Java java.util.HashMap object. 
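+   * For example (sketch), sized from a C++ source map:
+   * <pre>
+   *   std::map<std::string, std::string> opts;  // populated elsewhere
+   *   jobject jmap = rocksdb::HashMapJni::construct(
+   *       env, static_cast<uint32_t>(opts.size()));
+   * </pre>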
+ * + * @param env A pointer to the Java environment + * + * @return A reference to a Java java.util.HashMap object, or + * nullptr if an an exception occurs + */ + static jobject construct(JNIEnv* env, const uint32_t initial_capacity = 16) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + jmethodID mid = env->GetMethodID(jclazz, "", "(I)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } + + jobject jhash_map = env->NewObject(jclazz, mid, static_cast(initial_capacity)); + if (env->ExceptionCheck()) { + return nullptr; + } + + return jhash_map; + } + + /** + * A function which maps a std::pair to a std::pair + * + * @return Either a pointer to a std::pair, or nullptr + * if an error occurs during the mapping + */ + template + using FnMapKV = std::function> (const std::pair&)>; + + // template ::value_type, std::pair>::value, int32_t>::type = 0> + // static void putAll(JNIEnv* env, const jobject jhash_map, I iterator, const FnMapKV &fn_map_kv) { + /** + * Returns true if it succeeds, false if an error occurs + */ + template + static bool putAll(JNIEnv* env, const jobject jhash_map, iterator_type iterator, iterator_type end, const FnMapKV &fn_map_kv) { + const jmethodID jmid_put = rocksdb::MapJni::getMapPutMethodId(env); + if (jmid_put == nullptr) { + return false; + } + + for (auto it = iterator; it != end; ++it) { + const std::unique_ptr> result = fn_map_kv(*it); + if (result == nullptr) { + // an error occurred during fn_map_kv + return false; + } + env->CallObjectMethod(jhash_map, jmid_put, result->first, result->second); + if (env->ExceptionCheck()) { + // exception occurred + env->DeleteLocalRef(result->second); + env->DeleteLocalRef(result->first); + return false; + } + + // release local references + env->DeleteLocalRef(result->second); + env->DeleteLocalRef(result->first); + } + + return true; + } + + /** + * Creates a java.util.Map from a std::map + * + * @param env A pointer to the Java environment + * @param map the Cpp map + * + * @return a reference to the Java java.util.Map object, or nullptr if an exception occcurred + */ + static jobject fromCppMap(JNIEnv* env, const std::map* map) { + if (map == nullptr) { + return nullptr; + } + + jobject jhash_map = construct(env, static_cast(map->size())); + if (jhash_map == nullptr) { + // exception occurred + return nullptr; + } + + const rocksdb::HashMapJni::FnMapKV fn_map_kv = + [env](const std::pair& kv) { + jstring jkey = rocksdb::JniUtil::toJavaString(env, &(kv.first), false); + if (env->ExceptionCheck()) { + // an error occurred + return std::unique_ptr>(nullptr); + } + + jstring jvalue = rocksdb::JniUtil::toJavaString(env, &(kv.second), true); + if (env->ExceptionCheck()) { + // an error occurred + env->DeleteLocalRef(jkey); + return std::unique_ptr>(nullptr); + } + + return std::unique_ptr>(new std::pair(static_cast(jkey), static_cast(jvalue))); + }; + + if (!putAll(env, jhash_map, map->begin(), map->end(), fn_map_kv)) { + // exception occurred + return nullptr; + } + + return jhash_map; + } + + /** + * Creates a java.util.Map from a std::map + * + * @param env A pointer to the Java environment + * @param map the Cpp map + * + * @return a reference to the Java java.util.Map object, or nullptr if an exception occcurred + */ + static jobject fromCppMap(JNIEnv* env, const std::map* map) { + if (map == nullptr) { + return nullptr; + } + + if (map == nullptr) { + return nullptr; + } + + 
jobject jhash_map = construct(env, static_cast(map->size())); + if (jhash_map == nullptr) { + // exception occurred + return nullptr; + } + + const rocksdb::HashMapJni::FnMapKV fn_map_kv = + [env](const std::pair& kv) { + jstring jkey = rocksdb::JniUtil::toJavaString(env, &(kv.first), false); + if (env->ExceptionCheck()) { + // an error occurred + return std::unique_ptr>(nullptr); + } + + jobject jvalue = rocksdb::IntegerJni::valueOf(env, static_cast(kv.second)); + if (env->ExceptionCheck()) { + // an error occurred + env->DeleteLocalRef(jkey); + return std::unique_ptr>(nullptr); + } + + return std::unique_ptr>(new std::pair(static_cast(jkey), jvalue)); + }; + + if (!putAll(env, jhash_map, map->begin(), map->end(), fn_map_kv)) { + // exception occurred + return nullptr; + } + + return jhash_map; + } + + /** + * Creates a java.util.Map from a std::map + * + * @param env A pointer to the Java environment + * @param map the Cpp map + * + * @return a reference to the Java java.util.Map object, or nullptr if an exception occcurred + */ + static jobject fromCppMap(JNIEnv* env, const std::map* map) { + if (map == nullptr) { + return nullptr; + } + + jobject jhash_map = construct(env, static_cast(map->size())); + if (jhash_map == nullptr) { + // exception occurred + return nullptr; + } + + const rocksdb::HashMapJni::FnMapKV fn_map_kv = + [env](const std::pair& kv) { + jstring jkey = rocksdb::JniUtil::toJavaString(env, &(kv.first), false); + if (env->ExceptionCheck()) { + // an error occurred + return std::unique_ptr>(nullptr); + } + + jobject jvalue = rocksdb::LongJni::valueOf(env, static_cast(kv.second)); + if (env->ExceptionCheck()) { + // an error occurred + env->DeleteLocalRef(jkey); + return std::unique_ptr>(nullptr); + } + + return std::unique_ptr>(new std::pair(static_cast(jkey), jvalue)); + }; + + if (!putAll(env, jhash_map, map->begin(), map->end(), fn_map_kv)) { + // exception occurred + return nullptr; + } + + return jhash_map; + } + + /** + * Creates a java.util.Map from a std::map + * + * @param env A pointer to the Java environment + * @param map the Cpp map + * + * @return a reference to the Java java.util.Map object, or nullptr if an exception occcurred + */ + static jobject fromCppMap(JNIEnv* env, const std::map* map) { + if (map == nullptr) { + return nullptr; + } + + jobject jhash_map = construct(env, static_cast(map->size())); + if (jhash_map == nullptr) { + // exception occurred + return nullptr; + } + + const rocksdb::HashMapJni::FnMapKV fn_map_kv = + [env](const std::pair& kv) { + jobject jkey = rocksdb::IntegerJni::valueOf(env, static_cast(kv.first)); + if (env->ExceptionCheck()) { + // an error occurred + return std::unique_ptr>(nullptr); + } + + jobject jvalue = rocksdb::LongJni::valueOf(env, static_cast(kv.second)); + if (env->ExceptionCheck()) { + // an error occurred + env->DeleteLocalRef(jkey); + return std::unique_ptr>(nullptr); + } + + return std::unique_ptr>(new std::pair(static_cast(jkey), jvalue)); + }; + + if (!putAll(env, jhash_map, map->begin(), map->end(), fn_map_kv)) { + // exception occurred + return nullptr; + } + + return jhash_map; + } +}; + +// The portal class for org.rocksdb.RocksDB +class RocksDBJni : public RocksDBNativeClass { + public: + /** + * Get the Java Class org.rocksdb.RocksDB + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static 
jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/RocksDB"); + } +}; + +// The portal class for org.rocksdb.Options +class OptionsJni : public RocksDBNativeClass< + rocksdb::Options*, OptionsJni> { + public: + /** + * Get the Java Class org.rocksdb.Options + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/Options"); + } +}; + +// The portal class for org.rocksdb.DBOptions +class DBOptionsJni : public RocksDBNativeClass< + rocksdb::DBOptions*, DBOptionsJni> { + public: + /** + * Get the Java Class org.rocksdb.DBOptions + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/DBOptions"); + } +}; + +// The portal class for org.rocksdb.ColumnFamilyOptions +class ColumnFamilyOptionsJni + : public RocksDBNativeClass { + public: + /** + * Get the Java Class org.rocksdb.ColumnFamilyOptions + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/ColumnFamilyOptions"); + } + + /** + * Create a new Java org.rocksdb.ColumnFamilyOptions object with the same + * properties as the provided C++ rocksdb::ColumnFamilyOptions object + * + * @param env A pointer to the Java environment + * @param cfoptions A pointer to rocksdb::ColumnFamilyOptions object + * + * @return A reference to a Java org.rocksdb.ColumnFamilyOptions object, or + * nullptr if an an exception occurs + */ + static jobject construct(JNIEnv* env, const ColumnFamilyOptions* cfoptions) { + auto* cfo = new rocksdb::ColumnFamilyOptions(*cfoptions); + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + jmethodID mid = env->GetMethodID(jclazz, "", "(J)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } + + jobject jcfd = env->NewObject(jclazz, mid, reinterpret_cast(cfo)); + if (env->ExceptionCheck()) { + return nullptr; + } + + return jcfd; + } +}; + +// The portal class for org.rocksdb.WriteOptions +class WriteOptionsJni : public RocksDBNativeClass< + rocksdb::WriteOptions*, WriteOptionsJni> { + public: + /** + * Get the Java Class org.rocksdb.WriteOptions + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/WriteOptions"); + } +}; + +// The portal class for org.rocksdb.ReadOptions +class ReadOptionsJni : public RocksDBNativeClass< + rocksdb::ReadOptions*, ReadOptionsJni> { + public: + /** + * Get the 
Java Class org.rocksdb.ReadOptions + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/ReadOptions"); + } +}; + +// The portal class for org.rocksdb.WriteBatch +class WriteBatchJni : public RocksDBNativeClass< + rocksdb::WriteBatch*, WriteBatchJni> { + public: + /** + * Get the Java Class org.rocksdb.WriteBatch + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/WriteBatch"); + } + + /** + * Create a new Java org.rocksdb.WriteBatch object + * + * @param env A pointer to the Java environment + * @param wb A pointer to rocksdb::WriteBatch object + * + * @return A reference to a Java org.rocksdb.WriteBatch object, or + * nullptr if an an exception occurs + */ + static jobject construct(JNIEnv* env, const WriteBatch* wb) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + jmethodID mid = env->GetMethodID(jclazz, "", "(J)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } + + jobject jwb = env->NewObject(jclazz, mid, reinterpret_cast(wb)); + if (env->ExceptionCheck()) { + return nullptr; + } + + return jwb; + } +}; + +// The portal class for org.rocksdb.WriteBatch.Handler +class WriteBatchHandlerJni : public RocksDBNativeClass< + const rocksdb::WriteBatchHandlerJniCallback*, + WriteBatchHandlerJni> { + public: + /** + * Get the Java Class org.rocksdb.WriteBatch.Handler + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/WriteBatch$Handler"); + } + + /** + * Get the Java Method: WriteBatch.Handler#put + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getPutCfMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "put", "(I[B[B)V"); + assert(mid != nullptr); + return mid; + } + + /** + * Get the Java Method: WriteBatch.Handler#put + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getPutMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "put", "([B[B)V"); + assert(mid != nullptr); + return mid; + } + + /** + * Get the Java Method: WriteBatch.Handler#merge + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or 
nullptr if the class or method id could not + * be retieved + */ + static jmethodID getMergeCfMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "merge", "(I[B[B)V"); + assert(mid != nullptr); + return mid; + } + + /** + * Get the Java Method: WriteBatch.Handler#merge * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/WriteBatchWithIndex"); + static jmethodID getMergeMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "merge", "([B[B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.HistogramData -class HistogramDataJni : public JavaClass { - public: /** - * Get the Java Class org.rocksdb.HistogramData + * Get the Java Method: WriteBatch.Handler#delete * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "org/rocksdb/HistogramData"); + static jmethodID getDeleteCfMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "delete", "(I[B)V"); + assert(mid != nullptr); + return mid; } /** - * Get the Java Method: HistogramData constructor + * Get the Java Method: WriteBatch.Handler#delete * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getConstructorMethodId(JNIEnv* env) { + static jmethodID getDeleteMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "", "(DDDDD)V"); + static jmethodID mid = env->GetMethodID(jclazz, "delete", "([B)V"); assert(mid != nullptr); return mid; } -}; -// The portal class for org.rocksdb.BackupableDBOptions -class BackupableDBOptionsJni : public RocksDBNativeClass< - rocksdb::BackupableDBOptions*, BackupableDBOptionsJni> { - public: /** - * Get the Java Class org.rocksdb.BackupableDBOptions + * Get the Java Method: WriteBatch.Handler#singleDelete * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/BackupableDBOptions"); + static jmethodID 
getSingleDeleteCfMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "singleDelete", "(I[B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.BackupEngine -class BackupEngineJni : public RocksDBNativeClass< - rocksdb::BackupEngine*, BackupEngineJni> { - public: /** - * Get the Java Class org.rocksdb.BackupableEngine + * Get the Java Method: WriteBatch.Handler#singleDelete * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/BackupEngine"); + static jmethodID getSingleDeleteMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "singleDelete", "([B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.RocksIterator -class IteratorJni : public RocksDBNativeClass< - rocksdb::Iterator*, IteratorJni> { - public: /** - * Get the Java Class org.rocksdb.RocksIterator + * Get the Java Method: WriteBatch.Handler#deleteRange * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/RocksIterator"); + static jmethodID getDeleteRangeCfMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "deleteRange", "(I[B[B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.Filter -class FilterJni : public RocksDBNativeClass< - std::shared_ptr*, FilterJni> { - public: /** - * Get the Java Class org.rocksdb.Filter + * Get the Java Method: WriteBatch.Handler#deleteRange * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/Filter"); + static jmethodID getDeleteRangeMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "deleteRange", "([B[B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.ColumnFamilyHandle -class ColumnFamilyHandleJni : public RocksDBNativeClass< - rocksdb::ColumnFamilyHandle*, ColumnFamilyHandleJni> { - public: /** - * Get the Java Class 
org.rocksdb.ColumnFamilyHandle + * Get the Java Method: WriteBatch.Handler#logData * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/ColumnFamilyHandle"); + static jmethodID getLogDataMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "logData", "([B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.FlushOptions -class FlushOptionsJni : public RocksDBNativeClass< - rocksdb::FlushOptions*, FlushOptionsJni> { - public: /** - * Get the Java Class org.rocksdb.FlushOptions + * Get the Java Method: WriteBatch.Handler#putBlobIndex * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/FlushOptions"); + static jmethodID getPutBlobIndexCfMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "putBlobIndex", "(I[B[B)V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.ComparatorOptions -class ComparatorOptionsJni : public RocksDBNativeClass< - rocksdb::ComparatorJniCallbackOptions*, ComparatorOptionsJni> { - public: /** - * Get the Java Class org.rocksdb.ComparatorOptions + * Get the Java Method: WriteBatch.Handler#markBeginPrepare * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/ComparatorOptions"); + static jmethodID getMarkBeginPrepareMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "markBeginPrepare", "()V"); + assert(mid != nullptr); + return mid; } -}; -// The portal class for org.rocksdb.AbstractCompactionFilterFactory -class AbstractCompactionFilterFactoryJni : public RocksDBNativeClass< - const rocksdb::CompactionFilterFactoryJniCallback*, - AbstractCompactionFilterFactoryJni> { - public: /** - * Get the Java Class org.rocksdb.AbstractCompactionFilterFactory + * Get the Java Method: WriteBatch.Handler#markEndPrepare * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is 
thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getMarkEndPrepareMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "markEndPrepare", "([B)V"); + assert(mid != nullptr); + return mid; + } + + /** + * Get the Java Method: WriteBatch.Handler#markNoop + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/AbstractCompactionFilterFactory"); + static jmethodID getMarkNoopMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "markNoop", "(Z)V"); + assert(mid != nullptr); + return mid; } /** - * Get the Java Method: AbstractCompactionFilterFactory#name + * Get the Java Method: WriteBatch.Handler#markRollback * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getNameMethodId(JNIEnv* env) { + static jmethodID getMarkRollbackMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID( - jclazz, "name", "()Ljava/lang/String;"); + static jmethodID mid = env->GetMethodID(jclazz, "markRollback", "([B)V"); assert(mid != nullptr); return mid; } /** - * Get the Java Method: AbstractCompactionFilterFactory#createCompactionFilter + * Get the Java Method: WriteBatch.Handler#markCommit * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getCreateCompactionFilterMethodId(JNIEnv* env) { + static jmethodID getMarkCommitMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, - "createCompactionFilter", - "(ZZ)J"); + static jmethodID mid = env->GetMethodID(jclazz, "markCommit", "([B)V"); assert(mid != nullptr); return mid; } -}; - -// The portal class for org.rocksdb.AbstractTransactionNotifier -class AbstractTransactionNotifierJni : public RocksDBNativeClass< - const rocksdb::TransactionNotifierJniCallback*, - AbstractTransactionNotifierJni> { - public: - static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, - "org/rocksdb/AbstractTransactionNotifier"); - } - // Get the java method `snapshotCreated` - // of org.rocksdb.AbstractTransactionNotifier. 
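The getters above all follow the same lazily-cached jmethodID pattern; what the diff does not show is the consuming side. Below is a minimal sketch of how a native WriteBatch::Handler callback would forward markCommit into Java through one of these cached IDs. The portal class name (WriteBatchHandlerJni) and the callback skeleton (ExampleHandlerCallback, m_env, m_jcallback_obj) are illustrative assumptions, not part of this diff; JniUtil::copyBytes is the helper defined later in this header.

// Illustrative sketch only -- not part of this diff.
class ExampleHandlerCallback : public rocksdb::WriteBatch::Handler {
 public:
  ExampleHandlerCallback(JNIEnv* env, jobject jcallback_obj)
      : m_env(env), m_jcallback_obj(jcallback_obj) {}

  rocksdb::Status MarkCommit(const rocksdb::Slice& xid) override {
    jmethodID mid = rocksdb::WriteBatchHandlerJni::getMarkCommitMethodId(m_env);
    if (mid == nullptr) {
      // the ClassNotFound/NoSuchMethod error is already pending in Java
      return rocksdb::Status::Aborted();
    }

    // signature "([B)V": the xid is passed as a single byte[]
    jbyteArray jxid = rocksdb::JniUtil::copyBytes(m_env, xid.ToString());
    if (jxid == nullptr) {
      return rocksdb::Status::Aborted();  // OutOfMemoryError pending
    }

    m_env->CallVoidMethod(m_jcallback_obj, mid, jxid);
    m_env->DeleteLocalRef(jxid);
    return m_env->ExceptionCheck() ? rocksdb::Status::Aborted()
                                   : rocksdb::Status::OK();
  }

 private:
  JNIEnv* m_env;            // valid for the calling thread only
  jobject m_jcallback_obj;  // global ref to the Java WriteBatch.Handler
};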
-  static jmethodID getSnapshotCreatedMethodId(JNIEnv* env) {
+  /**
+   * Get the Java Method: WriteBatch.Handler#shouldContinue
+   *
+   * @param env A pointer to the Java environment
+   *
+   * @return The Java Method ID or nullptr if the class or method id could not
+   *     be retrieved
+   */
+  static jmethodID getContinueMethodId(JNIEnv* env) {
     jclass jclazz = getJClass(env);
     if(jclazz == nullptr) {
       // exception occurred accessing class
       return nullptr;
     }

-    static jmethodID mid = env->GetMethodID(jclazz, "snapshotCreated", "(J)V");
+    static jmethodID mid = env->GetMethodID(jclazz, "shouldContinue", "()Z");
     assert(mid != nullptr);
     return mid;
   }
 };

-// The portal class for org.rocksdb.AbstractComparator
-class AbstractComparatorJni : public RocksDBNativeClass<
-    const rocksdb::BaseComparatorJniCallback*,
-    AbstractComparatorJni> {
+class WriteBatchSavePointJni : public JavaClass {
  public:
   /**
-   * Get the Java Class org.rocksdb.AbstractComparator
+   * Get the Java Class org.rocksdb.WriteBatch.SavePoint
    *
    * @param env A pointer to the Java environment
    *
@@ -1848,104 +2966,125 @@ class AbstractComparatorJni : public RocksDBNativeClass<
    *     OutOfMemoryError or ExceptionInInitializerError exceptions is thrown
    */
   static jclass getJClass(JNIEnv* env) {
-    return RocksDBNativeClass::getJClass(env,
-        "org/rocksdb/AbstractComparator");
+    return JavaClass::getJClass(env, "org/rocksdb/WriteBatch$SavePoint");
   }

   /**
-   * Get the Java Method: Comparator#name
+   * Get the Java Method: WriteBatch.SavePoint constructor
    *
    * @param env A pointer to the Java environment
    *
    * @return The Java Method ID or nullptr if the class or method id could not
    *     be retrieved
    */
-  static jmethodID getNameMethodId(JNIEnv* env) {
+  static jmethodID getConstructorMethodId(JNIEnv* env) {
     jclass jclazz = getJClass(env);
     if(jclazz == nullptr) {
       // exception occurred accessing class
       return nullptr;
     }

-    static jmethodID mid =
-        env->GetMethodID(jclazz, "name", "()Ljava/lang/String;");
+    static jmethodID mid = env->GetMethodID(jclazz, "<init>", "(JJJ)V");
     assert(mid != nullptr);
     return mid;
   }

   /**
-   * Get the Java Method: Comparator#compare
+   * Create a new Java org.rocksdb.WriteBatch.SavePoint object
    *
    * @param env A pointer to the Java environment
+   * @param save_point A reference to a rocksdb::WriteBatch::SavePoint object
    *
-   * @return The Java Method ID or nullptr if the class or method id could not
-   *     be retrieved
+   * @return A reference to a Java org.rocksdb.WriteBatch.SavePoint object, or
+   *     nullptr if an exception occurs
    */
-  static jmethodID getCompareMethodId(JNIEnv* env) {
+  static jobject construct(JNIEnv* env, const SavePoint &save_point) {
     jclass jclazz = getJClass(env);
     if(jclazz == nullptr) {
       // exception occurred accessing class
       return nullptr;
     }

-    static jmethodID mid =
-        env->GetMethodID(jclazz, "compare",
-            "(Lorg/rocksdb/AbstractSlice;Lorg/rocksdb/AbstractSlice;)I");
-    assert(mid != nullptr);
-    return mid;
+    jmethodID mid = getConstructorMethodId(env);
+    if (mid == nullptr) {
+      // exception thrown: NoSuchMethodException or OutOfMemoryError
+      return nullptr;
+    }
+
+    jobject jsave_point = env->NewObject(jclazz, mid,
+        static_cast<jlong>(save_point.size),
+        static_cast<jlong>(save_point.count),
+        static_cast<jlong>(save_point.content_flags));
+    if (env->ExceptionCheck()) {
+      return nullptr;
+    }
+
+    return jsave_point;
   }
+};

+// The portal class for org.rocksdb.WriteBatchWithIndex
+class WriteBatchWithIndexJni : public RocksDBNativeClass<
+    rocksdb::WriteBatchWithIndex*, WriteBatchWithIndexJni> {
+ public:
   /**
-   *
Get the Java Class org.rocksdb.WriteBatchWithIndex * * @param env A pointer to the Java environment * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jmethodID getFindShortestSeparatorMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/WriteBatchWithIndex"); + } +}; - static jmethodID mid = - env->GetMethodID(jclazz, "findShortestSeparator", - "(Ljava/lang/String;Lorg/rocksdb/AbstractSlice;)Ljava/lang/String;"); - assert(mid != nullptr); - return mid; +// The portal class for org.rocksdb.HistogramData +class HistogramDataJni : public JavaClass { + public: + /** + * Get the Java Class org.rocksdb.HistogramData + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "org/rocksdb/HistogramData"); } /** - * Get the Java Method: Comparator#findShortSuccessor + * Get the Java Method: HistogramData constructor * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getFindShortSuccessorMethodId(JNIEnv* env) { + static jmethodID getConstructorMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = - env->GetMethodID(jclazz, "findShortSuccessor", - "(Ljava/lang/String;)Ljava/lang/String;"); + static jmethodID mid = env->GetMethodID(jclazz, "", "(DDDDDDJJD)V"); assert(mid != nullptr); return mid; } }; -// The portal class for org.rocksdb.AbstractSlice -class AbstractSliceJni : public NativeRocksMutableObject< - const rocksdb::Slice*, AbstractSliceJni> { +// The portal class for org.rocksdb.BackupableDBOptions +class BackupableDBOptionsJni : public RocksDBNativeClass< + rocksdb::BackupableDBOptions*, BackupableDBOptionsJni> { public: /** - * Get the Java Class org.rocksdb.AbstractSlice + * Get the Java Class org.rocksdb.BackupableDBOptions * * @param env A pointer to the Java environment * @@ -1954,16 +3093,17 @@ class AbstractSliceJni : public NativeRocksMutableObject< * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/AbstractSlice"); + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/BackupableDBOptions"); } }; -// The portal class for org.rocksdb.Slice -class SliceJni : public NativeRocksMutableObject< - const rocksdb::Slice*, AbstractSliceJni> { +// The portal class for org.rocksdb.BackupEngine +class BackupEngineJni : public RocksDBNativeClass< + rocksdb::BackupEngine*, BackupEngineJni> { public: /** - * Get the Java Class org.rocksdb.Slice + * Get the Java Class org.rocksdb.BackupableEngine * * @param env A pointer to the Java environment * @@ -1972,45 +3112,34 @@ class SliceJni : public NativeRocksMutableObject< * OutOfMemoryError or 
ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/Slice"); + return RocksDBNativeClass::getJClass(env, "org/rocksdb/BackupEngine"); } +}; +// The portal class for org.rocksdb.RocksIterator +class IteratorJni : public RocksDBNativeClass< + rocksdb::Iterator*, IteratorJni> { + public: /** - * Constructs a Slice object + * Get the Java Class org.rocksdb.RocksIterator * * @param env A pointer to the Java environment * - * @return A reference to a Java Slice object, or a nullptr if an - * exception occurs + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jobject construct0(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } - - static jmethodID mid = env->GetMethodID(jclazz, "", "()V"); - if(mid == nullptr) { - // exception occurred accessing method - return nullptr; - } - - jobject jslice = env->NewObject(jclazz, mid); - if(env->ExceptionCheck()) { - return nullptr; - } - - return jslice; + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/RocksIterator"); } }; -// The portal class for org.rocksdb.DirectSlice -class DirectSliceJni : public NativeRocksMutableObject< - const rocksdb::Slice*, AbstractSliceJni> { +// The portal class for org.rocksdb.Filter +class FilterJni : public RocksDBNativeClass< + std::shared_ptr*, FilterJni> { public: /** - * Get the Java Class org.rocksdb.DirectSlice + * Get the Java Class org.rocksdb.Filter * * @param env A pointer to the Java environment * @@ -2019,44 +3148,53 @@ class DirectSliceJni : public NativeRocksMutableObject< * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return RocksDBNativeClass::getJClass(env, "org/rocksdb/DirectSlice"); + return RocksDBNativeClass::getJClass(env, "org/rocksdb/Filter"); } +}; +// The portal class for org.rocksdb.ColumnFamilyHandle +class ColumnFamilyHandleJni : public RocksDBNativeClass< + rocksdb::ColumnFamilyHandle*, ColumnFamilyHandleJni> { + public: /** - * Constructs a DirectSlice object + * Get the Java Class org.rocksdb.ColumnFamilyHandle * * @param env A pointer to the Java environment * - * @return A reference to a Java DirectSlice object, or a nullptr if an - * exception occurs + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jobject construct0(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } - - static jmethodID mid = env->GetMethodID(jclazz, "", "()V"); - if(mid == nullptr) { - // exception occurred accessing method - return nullptr; - } - - jobject jdirect_slice = env->NewObject(jclazz, mid); - if(env->ExceptionCheck()) { - return nullptr; - } + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/ColumnFamilyHandle"); + } +}; - return jdirect_slice; +// The portal class for org.rocksdb.FlushOptions +class FlushOptionsJni : public RocksDBNativeClass< + rocksdb::FlushOptions*, FlushOptionsJni> { + public: + /** + * Get the Java Class org.rocksdb.FlushOptions + * + * @param env A pointer to 
the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/FlushOptions"); } }; -// The portal class for java.util.List -class ListJni : public JavaClass { +// The portal class for org.rocksdb.ComparatorOptions +class ComparatorOptionsJni : public RocksDBNativeClass< + rocksdb::ComparatorJniCallbackOptions*, ComparatorOptionsJni> { public: /** - * Get the Java Class java.util.List + * Get the Java Class org.rocksdb.ComparatorOptions * * @param env A pointer to the Java environment * @@ -2064,12 +3202,18 @@ class ListJni : public JavaClass { * ClassFormatError, ClassCircularityError, NoClassDefFoundError, * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jclass getListClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/util/List"); + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/ComparatorOptions"); } +}; +// The portal class for org.rocksdb.AbstractCompactionFilterFactory +class AbstractCompactionFilterFactoryJni : public RocksDBNativeClass< + const rocksdb::CompactionFilterFactoryJniCallback*, + AbstractCompactionFilterFactoryJni> { + public: /** - * Get the Java Class java.util.ArrayList + * Get the Java Class org.rocksdb.AbstractCompactionFilterFactory * * @param env A pointer to the Java environment * @@ -2077,132 +3221,193 @@ class ListJni : public JavaClass { * ClassFormatError, ClassCircularityError, NoClassDefFoundError, * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jclass getArrayListClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/util/ArrayList"); + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/AbstractCompactionFilterFactory"); } /** - * Get the Java Class java.util.Iterator + * Get the Java Method: AbstractCompactionFilterFactory#name * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getIteratorClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/util/Iterator"); + static jmethodID getNameMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID( + jclazz, "name", "()Ljava/lang/String;"); + assert(mid != nullptr); + return mid; } /** - * Get the Java Method: List#iterator + * Get the Java Method: AbstractCompactionFilterFactory#createCompactionFilter * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getIteratorMethod(JNIEnv* env) { - jclass jlist_clazz = getListClass(env); - if(jlist_clazz == nullptr) { + static jmethodID getCreateCompactionFilterMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = - env->GetMethodID(jlist_clazz, "iterator", "()Ljava/util/Iterator;"); + 
static jmethodID mid = env->GetMethodID(jclazz, + "createCompactionFilter", + "(ZZ)J"); + assert(mid != nullptr); + return mid; + } +}; + +// The portal class for org.rocksdb.AbstractTransactionNotifier +class AbstractTransactionNotifierJni : public RocksDBNativeClass< + const rocksdb::TransactionNotifierJniCallback*, + AbstractTransactionNotifierJni> { + public: + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/AbstractTransactionNotifier"); + } + + // Get the java method `snapshotCreated` + // of org.rocksdb.AbstractTransactionNotifier. + static jmethodID getSnapshotCreatedMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + static jmethodID mid = env->GetMethodID(jclazz, "snapshotCreated", "(J)V"); assert(mid != nullptr); return mid; } +}; +// The portal class for org.rocksdb.AbstractComparator +class AbstractComparatorJni : public RocksDBNativeClass< + const rocksdb::BaseComparatorJniCallback*, + AbstractComparatorJni> { + public: /** - * Get the Java Method: Iterator#hasNext + * Get the Java Class org.rocksdb.AbstractComparator + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/AbstractComparator"); + } + + /** + * Get the Java Method: Comparator#name * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getHasNextMethod(JNIEnv* env) { - jclass jiterator_clazz = getIteratorClass(env); - if(jiterator_clazz == nullptr) { + static jmethodID getNameMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jiterator_clazz, "hasNext", "()Z"); + static jmethodID mid = + env->GetMethodID(jclazz, "name", "()Ljava/lang/String;"); assert(mid != nullptr); return mid; } /** - * Get the Java Method: Iterator#next + * Get the Java Method: Comparator#compare * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getNextMethod(JNIEnv* env) { - jclass jiterator_clazz = getIteratorClass(env); - if(jiterator_clazz == nullptr) { + static jmethodID getCompareMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } static jmethodID mid = - env->GetMethodID(jiterator_clazz, "next", "()Ljava/lang/Object;"); + env->GetMethodID(jclazz, "compare", + "(Lorg/rocksdb/AbstractSlice;Lorg/rocksdb/AbstractSlice;)I"); assert(mid != nullptr); return mid; } /** - * Get the Java Method: ArrayList constructor + * Get the Java Method: Comparator#findShortestSeparator * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getArrayListConstructorMethodId(JNIEnv* env) { - jclass jarray_list_clazz = getArrayListClass(env); - if(jarray_list_clazz == nullptr) { + static jmethodID getFindShortestSeparatorMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + 
if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } + static jmethodID mid = - env->GetMethodID(jarray_list_clazz, "", "(I)V"); + env->GetMethodID(jclazz, "findShortestSeparator", + "(Ljava/lang/String;Lorg/rocksdb/AbstractSlice;)Ljava/lang/String;"); assert(mid != nullptr); return mid; } /** - * Get the Java Method: List#add + * Get the Java Method: Comparator#findShortSuccessor * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retieved */ - static jmethodID getListAddMethodId(JNIEnv* env) { - jclass jlist_clazz = getListClass(env); - if(jlist_clazz == nullptr) { + static jmethodID getFindShortSuccessorMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } static jmethodID mid = - env->GetMethodID(jlist_clazz, "add", "(Ljava/lang/Object;)Z"); + env->GetMethodID(jclazz, "findShortSuccessor", + "(Ljava/lang/String;)Ljava/lang/String;"); assert(mid != nullptr); return mid; } }; -// The portal class for java.lang.Byte -class ByteJni : public JavaClass { +// The portal class for org.rocksdb.AbstractSlice +class AbstractSliceJni : public NativeRocksMutableObject< + const rocksdb::Slice*, AbstractSliceJni> { public: /** - * Get the Java Class java.lang.Byte + * Get the Java Class org.rocksdb.AbstractSlice * * @param env A pointer to the Java environment * @@ -2211,11 +3416,16 @@ class ByteJni : public JavaClass { * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/lang/Byte"); + return RocksDBNativeClass::getJClass(env, "org/rocksdb/AbstractSlice"); } +}; +// The portal class for org.rocksdb.Slice +class SliceJni : public NativeRocksMutableObject< + const rocksdb::Slice*, AbstractSliceJni> { + public: /** - * Get the Java Class byte[] + * Get the Java Class org.rocksdb.Slice * * @param env A pointer to the Java environment * @@ -2223,54 +3433,46 @@ class ByteJni : public JavaClass { * ClassFormatError, ClassCircularityError, NoClassDefFoundError, * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ - static jclass getArrayJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "[B"); + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, "org/rocksdb/Slice"); } /** - * Creates a new 2-dimensional Java Byte Array byte[][] + * Constructs a Slice object * * @param env A pointer to the Java environment - * @param len The size of the first dimension * - * @return A reference to the Java byte[][] or nullptr if an exception occurs + * @return A reference to a Java Slice object, or a nullptr if an + * exception occurs */ - static jobjectArray new2dByteArray(JNIEnv* env, const jsize len) { - jclass clazz = getArrayJClass(env); - if(clazz == nullptr) { + static jobject construct0(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - return env->NewObjectArray(len, clazz, nullptr); - } + static jmethodID mid = env->GetMethodID(jclazz, "", "()V"); + if(mid == nullptr) { + // exception occurred accessing method + return nullptr; + } - /** - * Get the Java Method: Byte#byteValue - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getByteValueMethod(JNIEnv* env) { - 
jclass clazz = getJClass(env); - if(clazz == nullptr) { - // exception occurred accessing class + jobject jslice = env->NewObject(jclazz, mid); + if(env->ExceptionCheck()) { return nullptr; } - static jmethodID mid = env->GetMethodID(clazz, "byteValue", "()B"); - assert(mid != nullptr); - return mid; + return jslice; } }; -// The portal class for java.lang.StringBuilder -class StringBuilderJni : public JavaClass { - public: +// The portal class for org.rocksdb.DirectSlice +class DirectSliceJni : public NativeRocksMutableObject< + const rocksdb::Slice*, AbstractSliceJni> { + public: /** - * Get the Java Class java.lang.StringBuilder + * Get the Java Class org.rocksdb.DirectSlice * * @param env A pointer to the Java environment * @@ -2279,64 +3481,36 @@ class StringBuilderJni : public JavaClass { * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/lang/StringBuilder"); - } - - /** - * Get the Java Method: StringBuilder#append - * - * @param env A pointer to the Java environment - * - * @return The Java Method ID or nullptr if the class or method id could not - * be retieved - */ - static jmethodID getListAddMethodId(JNIEnv* env) { - jclass jclazz = getJClass(env); - if(jclazz == nullptr) { - // exception occurred accessing class - return nullptr; - } - - static jmethodID mid = - env->GetMethodID(jclazz, "append", - "(Ljava/lang/String;)Ljava/lang/StringBuilder;"); - assert(mid != nullptr); - return mid; + return RocksDBNativeClass::getJClass(env, "org/rocksdb/DirectSlice"); } /** - * Appends a C-style string to a StringBuilder + * Constructs a DirectSlice object * * @param env A pointer to the Java environment - * @param jstring_builder Reference to a java.lang.StringBuilder - * @param c_str A C-style string to append to the StringBuilder - * - * @return A reference to the updated StringBuilder, or a nullptr if - * an exception occurs - */ - static jobject append(JNIEnv* env, jobject jstring_builder, - const char* c_str) { - jmethodID mid = getListAddMethodId(env); - if(mid == nullptr) { - // exception occurred accessing class or method + * + * @return A reference to a Java DirectSlice object, or a nullptr if an + * exception occurs + */ + static jobject construct0(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class return nullptr; } - jstring new_value_str = env->NewStringUTF(c_str); - if(new_value_str == nullptr) { - // exception thrown: OutOfMemoryError + static jmethodID mid = env->GetMethodID(jclazz, "", "()V"); + if(mid == nullptr) { + // exception occurred accessing method return nullptr; } - jobject jresult_string_builder = - env->CallObjectMethod(jstring_builder, mid, new_value_str); + jobject jdirect_slice = env->NewObject(jclazz, mid); if(env->ExceptionCheck()) { - // exception occurred - env->DeleteLocalRef(new_value_str); return nullptr; } - return jresult_string_builder; + return jdirect_slice; } }; @@ -3345,9 +4519,98 @@ class TickerTypeJni { return 0x5D; case rocksdb::Tickers::NUMBER_MULTIGET_KEYS_FOUND: return 0x5E; + case rocksdb::Tickers::NO_ITERATOR_CREATED: + // -0x01 to fixate the new value that incorrectly changed TICKER_ENUM_MAX. 
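An aside on the negative codes that begin here: the Java API transports ticker ids as jbyte, a signed 8-bit type, so only 0x00 through 0x7F are usable as non-negative values, and 0x5F stays pinned to TICKER_ENUM_MAX for compatibility with the already-released minor version. New tickers therefore occupy 0x60 through 0x7F and then wrap to -0x01, -0x02, and so on. Whatever the codes, the two mappings must round-trip exactly; a minimal sketch of such a check follows (the entry-point names toJavaTickerType and toCppTickers follow this class's naming convention and are assumed, since the signatures fall outside this hunk):

// Sketch: a wrapped (negative) jbyte code must survive the
// C++ -> Java -> C++ round trip unchanged.
static void CheckTickerRoundTrip() {
  const rocksdb::Tickers t = rocksdb::Tickers::TXN_SNAPSHOT_MUTEX_OVERHEAD;
  const jbyte j = rocksdb::TickerTypeJni::toJavaTickerType(t);  // yields -0x0C
  assert(rocksdb::TickerTypeJni::toCppTickers(j) == t);
}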
+ return -0x01; + case rocksdb::Tickers::NO_ITERATOR_DELETED: + return 0x60; + case rocksdb::Tickers::COMPACTION_OPTIMIZED_DEL_DROP_OBSOLETE: + return 0x61; + case rocksdb::Tickers::COMPACTION_CANCELLED: + return 0x62; + case rocksdb::Tickers::BLOOM_FILTER_FULL_POSITIVE: + return 0x63; + case rocksdb::Tickers::BLOOM_FILTER_FULL_TRUE_POSITIVE: + return 0x64; + case rocksdb::Tickers::BLOB_DB_NUM_PUT: + return 0x65; + case rocksdb::Tickers::BLOB_DB_NUM_WRITE: + return 0x66; + case rocksdb::Tickers::BLOB_DB_NUM_GET: + return 0x67; + case rocksdb::Tickers::BLOB_DB_NUM_MULTIGET: + return 0x68; + case rocksdb::Tickers::BLOB_DB_NUM_SEEK: + return 0x69; + case rocksdb::Tickers::BLOB_DB_NUM_NEXT: + return 0x6A; + case rocksdb::Tickers::BLOB_DB_NUM_PREV: + return 0x6B; + case rocksdb::Tickers::BLOB_DB_NUM_KEYS_WRITTEN: + return 0x6C; + case rocksdb::Tickers::BLOB_DB_NUM_KEYS_READ: + return 0x6D; + case rocksdb::Tickers::BLOB_DB_BYTES_WRITTEN: + return 0x6E; + case rocksdb::Tickers::BLOB_DB_BYTES_READ: + return 0x6F; + case rocksdb::Tickers::BLOB_DB_WRITE_INLINED: + return 0x70; + case rocksdb::Tickers::BLOB_DB_WRITE_INLINED_TTL: + return 0x71; + case rocksdb::Tickers::BLOB_DB_WRITE_BLOB: + return 0x72; + case rocksdb::Tickers::BLOB_DB_WRITE_BLOB_TTL: + return 0x73; + case rocksdb::Tickers::BLOB_DB_BLOB_FILE_BYTES_WRITTEN: + return 0x74; + case rocksdb::Tickers::BLOB_DB_BLOB_FILE_BYTES_READ: + return 0x75; + case rocksdb::Tickers::BLOB_DB_BLOB_FILE_SYNCED: + return 0x76; + case rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EXPIRED_COUNT: + return 0x77; + case rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EXPIRED_SIZE: + return 0x78; + case rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EVICTED_COUNT: + return 0x79; + case rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EVICTED_SIZE: + return 0x7A; + case rocksdb::Tickers::BLOB_DB_GC_NUM_FILES: + return 0x7B; + case rocksdb::Tickers::BLOB_DB_GC_NUM_NEW_FILES: + return 0x7C; + case rocksdb::Tickers::BLOB_DB_GC_FAILURES: + return 0x7D; + case rocksdb::Tickers::BLOB_DB_GC_NUM_KEYS_OVERWRITTEN: + return 0x7E; + case rocksdb::Tickers::BLOB_DB_GC_NUM_KEYS_EXPIRED: + return 0x7F; + case rocksdb::Tickers::BLOB_DB_GC_NUM_KEYS_RELOCATED: + return -0x02; + case rocksdb::Tickers::BLOB_DB_GC_BYTES_OVERWRITTEN: + return -0x03; + case rocksdb::Tickers::BLOB_DB_GC_BYTES_EXPIRED: + return -0x04; + case rocksdb::Tickers::BLOB_DB_GC_BYTES_RELOCATED: + return -0x05; + case rocksdb::Tickers::BLOB_DB_FIFO_NUM_FILES_EVICTED: + return -0x06; + case rocksdb::Tickers::BLOB_DB_FIFO_NUM_KEYS_EVICTED: + return -0x07; + case rocksdb::Tickers::BLOB_DB_FIFO_BYTES_EVICTED: + return -0x08; + case rocksdb::Tickers::TXN_PREPARE_MUTEX_OVERHEAD: + return -0x09; + case rocksdb::Tickers::TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD: + return -0x0A; + case rocksdb::Tickers::TXN_DUPLICATE_KEY_OVERHEAD: + return -0x0B; + case rocksdb::Tickers::TXN_SNAPSHOT_MUTEX_OVERHEAD: + return -0x0C; case rocksdb::Tickers::TICKER_ENUM_MAX: + // 0x5F for backwards compatibility on current minor version. return 0x5F; - default: // undefined/default return 0x0; @@ -3548,7 +4811,97 @@ class TickerTypeJni { return rocksdb::Tickers::NUMBER_ITER_SKIP; case 0x5E: return rocksdb::Tickers::NUMBER_MULTIGET_KEYS_FOUND; + case -0x01: + // -0x01 to fixate the new value that incorrectly changed TICKER_ENUM_MAX. 
+      return rocksdb::Tickers::NO_ITERATOR_CREATED;
+    case 0x60:
+      return rocksdb::Tickers::NO_ITERATOR_DELETED;
+    case 0x61:
+      return rocksdb::Tickers::COMPACTION_OPTIMIZED_DEL_DROP_OBSOLETE;
+    case 0x62:
+      return rocksdb::Tickers::COMPACTION_CANCELLED;
+    case 0x63:
+      return rocksdb::Tickers::BLOOM_FILTER_FULL_POSITIVE;
+    case 0x64:
+      return rocksdb::Tickers::BLOOM_FILTER_FULL_TRUE_POSITIVE;
+    case 0x65:
+      return rocksdb::Tickers::BLOB_DB_NUM_PUT;
+    case 0x66:
+      return rocksdb::Tickers::BLOB_DB_NUM_WRITE;
+    case 0x67:
+      return rocksdb::Tickers::BLOB_DB_NUM_GET;
+    case 0x68:
+      return rocksdb::Tickers::BLOB_DB_NUM_MULTIGET;
+    case 0x69:
+      return rocksdb::Tickers::BLOB_DB_NUM_SEEK;
+    case 0x6A:
+      return rocksdb::Tickers::BLOB_DB_NUM_NEXT;
+    case 0x6B:
+      return rocksdb::Tickers::BLOB_DB_NUM_PREV;
+    case 0x6C:
+      return rocksdb::Tickers::BLOB_DB_NUM_KEYS_WRITTEN;
+    case 0x6D:
+      return rocksdb::Tickers::BLOB_DB_NUM_KEYS_READ;
+    case 0x6E:
+      return rocksdb::Tickers::BLOB_DB_BYTES_WRITTEN;
+    case 0x6F:
+      return rocksdb::Tickers::BLOB_DB_BYTES_READ;
+    case 0x70:
+      return rocksdb::Tickers::BLOB_DB_WRITE_INLINED;
+    case 0x71:
+      return rocksdb::Tickers::BLOB_DB_WRITE_INLINED_TTL;
+    case 0x72:
+      return rocksdb::Tickers::BLOB_DB_WRITE_BLOB;
+    case 0x73:
+      return rocksdb::Tickers::BLOB_DB_WRITE_BLOB_TTL;
+    case 0x74:
+      return rocksdb::Tickers::BLOB_DB_BLOB_FILE_BYTES_WRITTEN;
+    case 0x75:
+      return rocksdb::Tickers::BLOB_DB_BLOB_FILE_BYTES_READ;
+    case 0x76:
+      return rocksdb::Tickers::BLOB_DB_BLOB_FILE_SYNCED;
+    case 0x77:
+      return rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EXPIRED_COUNT;
+    case 0x78:
+      return rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EXPIRED_SIZE;
+    case 0x79:
+      return rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EVICTED_COUNT;
+    case 0x7A:
+      return rocksdb::Tickers::BLOB_DB_BLOB_INDEX_EVICTED_SIZE;
+    case 0x7B:
+      return rocksdb::Tickers::BLOB_DB_GC_NUM_FILES;
+    case 0x7C:
+      return rocksdb::Tickers::BLOB_DB_GC_NUM_NEW_FILES;
+    case 0x7D:
+      return rocksdb::Tickers::BLOB_DB_GC_FAILURES;
+    case 0x7E:
+      return rocksdb::Tickers::BLOB_DB_GC_NUM_KEYS_OVERWRITTEN;
+    case 0x7F:
+      return rocksdb::Tickers::BLOB_DB_GC_NUM_KEYS_EXPIRED;
+    case -0x02:
+      return rocksdb::Tickers::BLOB_DB_GC_NUM_KEYS_RELOCATED;
+    case -0x03:
+      return rocksdb::Tickers::BLOB_DB_GC_BYTES_OVERWRITTEN;
+    case -0x04:
+      return rocksdb::Tickers::BLOB_DB_GC_BYTES_EXPIRED;
+    case -0x05:
+      return rocksdb::Tickers::BLOB_DB_GC_BYTES_RELOCATED;
+    case -0x06:
+      return rocksdb::Tickers::BLOB_DB_FIFO_NUM_FILES_EVICTED;
+    case -0x07:
+      return rocksdb::Tickers::BLOB_DB_FIFO_NUM_KEYS_EVICTED;
+    case -0x08:
+      return rocksdb::Tickers::BLOB_DB_FIFO_BYTES_EVICTED;
+    case -0x09:
+      return rocksdb::Tickers::TXN_PREPARE_MUTEX_OVERHEAD;
+    case -0x0A:
+      return rocksdb::Tickers::TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD;
+    case -0x0B:
+      return rocksdb::Tickers::TXN_DUPLICATE_KEY_OVERHEAD;
+    case -0x0C:
+      return rocksdb::Tickers::TXN_SNAPSHOT_MUTEX_OVERHEAD;
    case 0x5F:
+      // 0x5F for backwards compatibility on current minor version.
      return rocksdb::Tickers::TICKER_ENUM_MAX;

    default:
@@ -3628,10 +4981,40 @@ class HistogramTypeJni {
        return 0x1D;
      case rocksdb::Histograms::READ_NUM_MERGE_OPERANDS:
        return 0x1E;
+      // 0x20 to skip 0x1F so HISTOGRAM_ENUM_MAX remains unchanged for minor version compatibility.
      case rocksdb::Histograms::FLUSH_TIME:
-        return 0x1F;
-      case rocksdb::Histograms::HISTOGRAM_ENUM_MAX:
         return 0x20;
+      case rocksdb::Histograms::BLOB_DB_KEY_SIZE:
+        return 0x21;
+      case rocksdb::Histograms::BLOB_DB_VALUE_SIZE:
+        return 0x22;
+      case rocksdb::Histograms::BLOB_DB_WRITE_MICROS:
+        return 0x23;
+      case rocksdb::Histograms::BLOB_DB_GET_MICROS:
+        return 0x24;
+      case rocksdb::Histograms::BLOB_DB_MULTIGET_MICROS:
+        return 0x25;
+      case rocksdb::Histograms::BLOB_DB_SEEK_MICROS:
+        return 0x26;
+      case rocksdb::Histograms::BLOB_DB_NEXT_MICROS:
+        return 0x27;
+      case rocksdb::Histograms::BLOB_DB_PREV_MICROS:
+        return 0x28;
+      case rocksdb::Histograms::BLOB_DB_BLOB_FILE_WRITE_MICROS:
+        return 0x29;
+      case rocksdb::Histograms::BLOB_DB_BLOB_FILE_READ_MICROS:
+        return 0x2A;
+      case rocksdb::Histograms::BLOB_DB_BLOB_FILE_SYNC_MICROS:
+        return 0x2B;
+      case rocksdb::Histograms::BLOB_DB_GC_MICROS:
+        return 0x2C;
+      case rocksdb::Histograms::BLOB_DB_COMPRESSION_MICROS:
+        return 0x2D;
+      case rocksdb::Histograms::BLOB_DB_DECOMPRESSION_MICROS:
+        return 0x2E;
+      case rocksdb::Histograms::HISTOGRAM_ENUM_MAX:
+        // 0x1F for backwards compatibility on current minor version.
+        return 0x1F;

      default:
        // undefined/default
@@ -3705,9 +5088,39 @@ class HistogramTypeJni {
        return rocksdb::Histograms::DECOMPRESSION_TIMES_NANOS;
      case 0x1E:
        return rocksdb::Histograms::READ_NUM_MERGE_OPERANDS;
-      case 0x1F:
-        return rocksdb::Histograms::FLUSH_TIME;
+      // 0x20 to skip 0x1F so HISTOGRAM_ENUM_MAX remains unchanged for minor version compatibility.
      case 0x20:
+        return rocksdb::Histograms::FLUSH_TIME;
+      case 0x21:
+        return rocksdb::Histograms::BLOB_DB_KEY_SIZE;
+      case 0x22:
+        return rocksdb::Histograms::BLOB_DB_VALUE_SIZE;
+      case 0x23:
+        return rocksdb::Histograms::BLOB_DB_WRITE_MICROS;
+      case 0x24:
+        return rocksdb::Histograms::BLOB_DB_GET_MICROS;
+      case 0x25:
+        return rocksdb::Histograms::BLOB_DB_MULTIGET_MICROS;
+      case 0x26:
+        return rocksdb::Histograms::BLOB_DB_SEEK_MICROS;
+      case 0x27:
+        return rocksdb::Histograms::BLOB_DB_NEXT_MICROS;
+      case 0x28:
+        return rocksdb::Histograms::BLOB_DB_PREV_MICROS;
+      case 0x29:
+        return rocksdb::Histograms::BLOB_DB_BLOB_FILE_WRITE_MICROS;
+      case 0x2A:
+        return rocksdb::Histograms::BLOB_DB_BLOB_FILE_READ_MICROS;
+      case 0x2B:
+        return rocksdb::Histograms::BLOB_DB_BLOB_FILE_SYNC_MICROS;
+      case 0x2C:
+        return rocksdb::Histograms::BLOB_DB_GC_MICROS;
+      case 0x2D:
+        return rocksdb::Histograms::BLOB_DB_COMPRESSION_MICROS;
+      case 0x2E:
+        return rocksdb::Histograms::BLOB_DB_DECOMPRESSION_MICROS;
+      case 0x1F:
+        // 0x1F for backwards compatibility on current minor version.
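The same reservation trick as the ticker table, in miniature: FLUSH_TIME, which would otherwise have taken 0x1F, now encodes as 0x20 so that 0x1F keeps decoding to HISTOGRAM_ENUM_MAX on the current minor version, exactly as 0x5F is held for TICKER_ENUM_MAX above. A compatibility check one could write (illustrative sketch; the method names toJavaHistogramsType and toCppHistograms are assumed, as the signatures fall outside this hunk):

// Sketch: the remapped FLUSH_TIME id and the reserved 0x1F slot.
static void CheckHistogramCompat() {
  assert(rocksdb::HistogramTypeJni::toJavaHistogramsType(
             rocksdb::Histograms::FLUSH_TIME) == 0x20);
  assert(rocksdb::HistogramTypeJni::toCppHistograms(0x1F) ==
         rocksdb::Histograms::HISTOGRAM_ENUM_MAX);
}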
return rocksdb::Histograms::HISTOGRAM_ENUM_MAX; default: @@ -3795,6 +5208,48 @@ class RateLimiterModeJni { } }; +// The portal class for org.rocksdb.MemoryUsageType +class MemoryUsageTypeJni { +public: + // Returns the equivalent org.rocksdb.MemoryUsageType for the provided + // C++ rocksdb::MemoryUtil::UsageType enum + static jbyte toJavaMemoryUsageType( + const rocksdb::MemoryUtil::UsageType& usage_type) { + switch(usage_type) { + case rocksdb::MemoryUtil::UsageType::kMemTableTotal: + return 0x0; + case rocksdb::MemoryUtil::UsageType::kMemTableUnFlushed: + return 0x1; + case rocksdb::MemoryUtil::UsageType::kTableReadersTotal: + return 0x2; + case rocksdb::MemoryUtil::UsageType::kCacheTotal: + return 0x3; + default: + // undefined: use kNumUsageTypes + return 0x4; + } + } + + // Returns the equivalent C++ rocksdb::MemoryUtil::UsageType enum for the + // provided Java org.rocksdb.MemoryUsageType + static rocksdb::MemoryUtil::UsageType toCppMemoryUsageType( + jbyte usage_type) { + switch(usage_type) { + case 0x0: + return rocksdb::MemoryUtil::UsageType::kMemTableTotal; + case 0x1: + return rocksdb::MemoryUtil::UsageType::kMemTableUnFlushed; + case 0x2: + return rocksdb::MemoryUtil::UsageType::kTableReadersTotal; + case 0x3: + return rocksdb::MemoryUtil::UsageType::kCacheTotal; + default: + // undefined/default: use kNumUsageTypes + return rocksdb::MemoryUtil::UsageType::kNumUsageTypes; + } + } +}; + // The portal class for org.rocksdb.Transaction class TransactionJni : public JavaClass { public: @@ -4123,704 +5578,1399 @@ class DeadlockPathJni : public JavaClass { } }; -// various utility functions for working with RocksDB and JNI -class JniUtil { +class AbstractTableFilterJni : public RocksDBNativeClass { public: - /** - * Obtains a reference to the JNIEnv from - * the JVM - * - * If the current thread is not attached to the JavaVM - * then it will be attached so as to retrieve the JNIEnv - * - * If a thread is attached, it must later be manually - * released by calling JavaVM::DetachCurrentThread. 
- * This can be handled by always matching calls to this - * function with calls to {@link JniUtil::releaseJniEnv(JavaVM*, jboolean)} - * - * @param jvm (IN) A pointer to the JavaVM instance - * @param attached (OUT) A pointer to a boolean which - * will be set to JNI_TRUE if we had to attach the thread - * - * @return A pointer to the JNIEnv or nullptr if a fatal error - * occurs and the JNIEnv cannot be retrieved - */ - static JNIEnv* getJniEnv(JavaVM* jvm, jboolean* attached) { - assert(jvm != nullptr); + /** + * Get the Java Method: TableFilter#filter(TableProperties) + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getFilterMethod(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } - JNIEnv *env; - const jint env_rs = jvm->GetEnv(reinterpret_cast(&env), - JNI_VERSION_1_2); + static jmethodID mid = + env->GetMethodID(jclazz, "filter", "(Lorg/rocksdb/TableProperties;)Z"); + assert(mid != nullptr); + return mid; + } - if(env_rs == JNI_OK) { - // current thread is already attached, return the JNIEnv - *attached = JNI_FALSE; - return env; - } else if(env_rs == JNI_EDETACHED) { - // current thread is not attached, attempt to attach - const jint rs_attach = jvm->AttachCurrentThread(reinterpret_cast(&env), NULL); - if(rs_attach == JNI_OK) { - *attached = JNI_TRUE; - return env; - } else { - // error, could not attach the thread - std::cerr << "JniUtil::getJinEnv - Fatal: could not attach current thread to JVM!" << std::endl; - return nullptr; - } - } else if(env_rs == JNI_EVERSION) { - // error, JDK does not support JNI_VERSION_1_2+ - std::cerr << "JniUtil::getJinEnv - Fatal: JDK does not support JNI_VERSION_1_2" << std::endl; - return nullptr; - } else { - std::cerr << "JniUtil::getJinEnv - Fatal: Unknown error: env_rs=" << env_rs << std::endl; - return nullptr; - } + private: + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "org/rocksdb/TableFilter"); + } +}; + +class TablePropertiesJni : public JavaClass { + public: + /** + * Create a new Java org.rocksdb.TableProperties object. + * + * @param env A pointer to the Java environment + * @param table_properties A Cpp table properties object + * + * @return A reference to a Java org.rocksdb.TableProperties object, or + * nullptr if an an exception occurs + */ + static jobject fromCppTableProperties(JNIEnv* env, const rocksdb::TableProperties& table_properties) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; } - /** - * Counterpart to {@link JniUtil::getJniEnv(JavaVM*, jboolean*)} - * - * Detachess the current thread from the JVM if it was previously - * attached - * - * @param jvm (IN) A pointer to the JavaVM instance - * @param attached (IN) JNI_TRUE if we previously had to attach the thread - * to the JavaVM to get the JNIEnv - */ - static void releaseJniEnv(JavaVM* jvm, jboolean& attached) { - assert(jvm != nullptr); - if(attached == JNI_TRUE) { - const jint rs_detach = jvm->DetachCurrentThread(); - assert(rs_detach == JNI_OK); - if(rs_detach != JNI_OK) { - std::cerr << "JniUtil::getJinEnv - Warn: Unable to detach current thread from JVM!" 
<< std::endl; - } - } + jmethodID mid = env->GetMethodID(jclazz, "", "(JJJJJJJJJJJJJJJJJJJ[BLjava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/util/Map;Ljava/util/Map;Ljava/util/Map;)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } + + jbyteArray jcolumn_family_name = rocksdb::JniUtil::copyBytes(env, table_properties.column_family_name); + if (jcolumn_family_name == nullptr) { + // exception occurred creating java string + return nullptr; + } + + jstring jfilter_policy_name = rocksdb::JniUtil::toJavaString(env, &table_properties.filter_policy_name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + return nullptr; + } + + jstring jcomparator_name = rocksdb::JniUtil::toJavaString(env, &table_properties.comparator_name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfilter_policy_name); + return nullptr; + } + + jstring jmerge_operator_name = rocksdb::JniUtil::toJavaString(env, &table_properties.merge_operator_name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfilter_policy_name); + env->DeleteLocalRef(jcomparator_name); + return nullptr; + } + + jstring jprefix_extractor_name = rocksdb::JniUtil::toJavaString(env, &table_properties.prefix_extractor_name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfilter_policy_name); + env->DeleteLocalRef(jcomparator_name); + env->DeleteLocalRef(jmerge_operator_name); + return nullptr; + } + + jstring jproperty_collectors_names = rocksdb::JniUtil::toJavaString(env, &table_properties.property_collectors_names, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfilter_policy_name); + env->DeleteLocalRef(jcomparator_name); + env->DeleteLocalRef(jmerge_operator_name); + env->DeleteLocalRef(jprefix_extractor_name); + return nullptr; + } + + jstring jcompression_name = rocksdb::JniUtil::toJavaString(env, &table_properties.compression_name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfilter_policy_name); + env->DeleteLocalRef(jcomparator_name); + env->DeleteLocalRef(jmerge_operator_name); + env->DeleteLocalRef(jprefix_extractor_name); + env->DeleteLocalRef(jproperty_collectors_names); + return nullptr; + } + + // Map + jobject juser_collected_properties = rocksdb::HashMapJni::fromCppMap(env, &table_properties.user_collected_properties); + if (env->ExceptionCheck()) { + // exception occurred creating java map + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfilter_policy_name); + env->DeleteLocalRef(jcomparator_name); + env->DeleteLocalRef(jmerge_operator_name); + env->DeleteLocalRef(jprefix_extractor_name); + env->DeleteLocalRef(jproperty_collectors_names); + env->DeleteLocalRef(jcompression_name); + return nullptr; + } + + // Map + jobject jreadable_properties = rocksdb::HashMapJni::fromCppMap(env, &table_properties.readable_properties); + if (env->ExceptionCheck()) { + // exception occurred creating java map + 
      env->DeleteLocalRef(jcolumn_family_name);
+      env->DeleteLocalRef(jfilter_policy_name);
+      env->DeleteLocalRef(jcomparator_name);
+      env->DeleteLocalRef(jmerge_operator_name);
+      env->DeleteLocalRef(jprefix_extractor_name);
+      env->DeleteLocalRef(jproperty_collectors_names);
+      env->DeleteLocalRef(jcompression_name);
+      env->DeleteLocalRef(juser_collected_properties);
+      return nullptr;
+    }
+
+    // Map<String, Long>
+    jobject jproperties_offsets = rocksdb::HashMapJni::fromCppMap(env, &table_properties.properties_offsets);
+    if (env->ExceptionCheck()) {
+      // exception occurred creating java map
+      env->DeleteLocalRef(jcolumn_family_name);
+      env->DeleteLocalRef(jfilter_policy_name);
+      env->DeleteLocalRef(jcomparator_name);
+      env->DeleteLocalRef(jmerge_operator_name);
+      env->DeleteLocalRef(jprefix_extractor_name);
+      env->DeleteLocalRef(jproperty_collectors_names);
+      env->DeleteLocalRef(jcompression_name);
+      env->DeleteLocalRef(juser_collected_properties);
+      env->DeleteLocalRef(jreadable_properties);
+      return nullptr;
+    }
+
+    jobject jtable_properties = env->NewObject(jclazz, mid,
+        static_cast<jlong>(table_properties.data_size),
+        static_cast<jlong>(table_properties.index_size),
+        static_cast<jlong>(table_properties.index_partitions),
+        static_cast<jlong>(table_properties.top_level_index_size),
+        static_cast<jlong>(table_properties.index_key_is_user_key),
+        static_cast<jlong>(table_properties.index_value_is_delta_encoded),
+        static_cast<jlong>(table_properties.filter_size),
+        static_cast<jlong>(table_properties.raw_key_size),
+        static_cast<jlong>(table_properties.raw_value_size),
+        static_cast<jlong>(table_properties.num_data_blocks),
+        static_cast<jlong>(table_properties.num_entries),
+        static_cast<jlong>(table_properties.num_deletions),
+        static_cast<jlong>(table_properties.num_merge_operands),
+        static_cast<jlong>(table_properties.num_range_deletions),
+        static_cast<jlong>(table_properties.format_version),
+        static_cast<jlong>(table_properties.fixed_key_len),
+        static_cast<jlong>(table_properties.column_family_id),
+        static_cast<jlong>(table_properties.creation_time),
+        static_cast<jlong>(table_properties.oldest_key_time),
+        jcolumn_family_name,
+        jfilter_policy_name,
+        jcomparator_name,
+        jmerge_operator_name,
+        jprefix_extractor_name,
+        jproperty_collectors_names,
+        jcompression_name,
+        juser_collected_properties,
+        jreadable_properties,
+        jproperties_offsets
+    );
+
+    if (env->ExceptionCheck()) {
+      return nullptr;
+    }
+
+    return jtable_properties;
+  }
+
+ private:
+  static jclass getJClass(JNIEnv* env) {
+    return JavaClass::getJClass(env, "org/rocksdb/TableProperties");
+  }
+};
+
+class ColumnFamilyDescriptorJni : public JavaClass {
+ public:
+  /**
+   * Get the Java Class org.rocksdb.ColumnFamilyDescriptor
+   *
+   * @param env A pointer to the Java environment
+   *
+   * @return The Java Class or nullptr if one of the
+   *     ClassFormatError, ClassCircularityError, NoClassDefFoundError,
+   *     OutOfMemoryError or ExceptionInInitializerError exceptions is thrown
+   */
+  static jclass getJClass(JNIEnv* env) {
+    return JavaClass::getJClass(env, "org/rocksdb/ColumnFamilyDescriptor");
+  }
+
+  /**
+   * Create a new Java org.rocksdb.ColumnFamilyDescriptor object with the same
+   * properties as the provided C++ rocksdb::ColumnFamilyDescriptor object
+   *
+   * @param env A pointer to the Java environment
+   * @param cfd A pointer to a rocksdb::ColumnFamilyDescriptor object
+   *
+   * @return A reference to a Java org.rocksdb.ColumnFamilyDescriptor object, or
+   *     nullptr if an exception occurs
+   */
+  static jobject construct(JNIEnv* env, ColumnFamilyDescriptor* cfd) {
+    jbyteArray jcf_name = JniUtil::copyBytes(env, cfd->name);
+    jobject cfopts =
ColumnFamilyOptionsJni::construct(env, &(cfd->options)); + + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; } - /** - * Copies a Java String[] to a C++ std::vector - * - * @param env (IN) A pointer to the java environment - * @param jss (IN) The Java String array to copy - * @param has_exception (OUT) will be set to JNI_TRUE - * if an OutOfMemoryError or ArrayIndexOutOfBoundsException - * exception occurs - * - * @return A std::vector containing copies of the Java strings - */ - static std::vector copyStrings(JNIEnv* env, - jobjectArray jss, jboolean* has_exception) { - return rocksdb::JniUtil::copyStrings(env, jss, - env->GetArrayLength(jss), has_exception); + jmethodID mid = env->GetMethodID(jclazz, "", + "([BLorg/rocksdb/ColumnFamilyOptions;)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + env->DeleteLocalRef(jcf_name); + return nullptr; } - /** - * Copies a Java String[] to a C++ std::vector - * - * @param env (IN) A pointer to the java environment - * @param jss (IN) The Java String array to copy - * @param jss_len (IN) The length of the Java String array to copy - * @param has_exception (OUT) will be set to JNI_TRUE - * if an OutOfMemoryError or ArrayIndexOutOfBoundsException - * exception occurs - * - * @return A std::vector containing copies of the Java strings - */ - static std::vector copyStrings(JNIEnv* env, - jobjectArray jss, const jsize jss_len, jboolean* has_exception) { - std::vector strs; - for (jsize i = 0; i < jss_len; i++) { - jobject js = env->GetObjectArrayElement(jss, i); - if(env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - *has_exception = JNI_TRUE; - return strs; - } + jobject jcfd = env->NewObject(jclazz, mid, jcf_name, cfopts); + if (env->ExceptionCheck()) { + env->DeleteLocalRef(jcf_name); + return nullptr; + } - jstring jstr = static_cast(js); - const char* str = env->GetStringUTFChars(jstr, nullptr); - if(str == nullptr) { - // exception thrown: OutOfMemoryError - env->DeleteLocalRef(js); - *has_exception = JNI_TRUE; - return strs; - } + return jcfd; + } - strs.push_back(std::string(str)); + /** + * Get the Java Method: ColumnFamilyDescriptor#columnFamilyName + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getColumnFamilyNameMethod(JNIEnv* env) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } - env->ReleaseStringUTFChars(jstr, str); - env->DeleteLocalRef(js); - } + static jmethodID mid = env->GetMethodID(jclazz, "columnFamilyName", "()[B"); + assert(mid != nullptr); + return mid; + } - *has_exception = JNI_FALSE; - return strs; + /** + * Get the Java Method: ColumnFamilyDescriptor#columnFamilyOptions + * + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved + */ + static jmethodID getColumnFamilyOptionsMethod(JNIEnv* env) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; } - /** - * Copies a jstring to a C-style null-terminated byte string - * and releases the original jstring - * - * The jstring is copied as UTF-8 - * - * If an exception occurs, then JNIEnv::ExceptionCheck() - * will have been called - * - * @param env (IN) A pointer to the java 
environment - * @param js (IN) The java string to copy - * @param has_exception (OUT) will be set to JNI_TRUE - * if an OutOfMemoryError exception occurs - * - * @return A pointer to the copied string, or a - * nullptr if has_exception == JNI_TRUE - */ - static std::unique_ptr copyString(JNIEnv* env, jstring js, - jboolean* has_exception) { - const char *utf = env->GetStringUTFChars(js, nullptr); - if(utf == nullptr) { - // exception thrown: OutOfMemoryError - env->ExceptionCheck(); - *has_exception = JNI_TRUE; - return nullptr; - } else if(env->ExceptionCheck()) { - // exception thrown - env->ReleaseStringUTFChars(js, utf); - *has_exception = JNI_TRUE; - return nullptr; - } + static jmethodID mid = env->GetMethodID( + jclazz, "columnFamilyOptions", "()Lorg/rocksdb/ColumnFamilyOptions;"); + assert(mid != nullptr); + return mid; + } +}; - const jsize utf_len = env->GetStringUTFLength(js); - std::unique_ptr str(new char[utf_len + 1]); // Note: + 1 is needed for the c_str null terminator - std::strcpy(str.get(), utf); - env->ReleaseStringUTFChars(js, utf); - *has_exception = JNI_FALSE; - return str; - } +// The portal class for org.rocksdb.IndexType +class IndexTypeJni { + public: + // Returns the equivalent org.rocksdb.IndexType for the provided + // C++ rocksdb::IndexType enum + static jbyte toJavaIndexType( + const rocksdb::BlockBasedTableOptions::IndexType& index_type) { + switch(index_type) { + case rocksdb::BlockBasedTableOptions::IndexType::kBinarySearch: + return 0x0; + case rocksdb::BlockBasedTableOptions::IndexType::kHashSearch: + return 0x1; + case rocksdb::BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch: + return 0x2; + default: + return 0x7F; // undefined + } + } - /** - * Copies a jstring to a std::string - * and releases the original jstring - * - * If an exception occurs, then JNIEnv::ExceptionCheck() - * will have been called - * - * @param env (IN) A pointer to the java environment - * @param js (IN) The java string to copy - * @param has_exception (OUT) will be set to JNI_TRUE - * if an OutOfMemoryError exception occurs - * - * @return A std:string copy of the jstring, or an - * empty std::string if has_exception == JNI_TRUE - */ - static std::string copyStdString(JNIEnv* env, jstring js, - jboolean* has_exception) { - const char *utf = env->GetStringUTFChars(js, nullptr); - if(utf == nullptr) { - // exception thrown: OutOfMemoryError - env->ExceptionCheck(); - *has_exception = JNI_TRUE; - return std::string(); - } else if(env->ExceptionCheck()) { - // exception thrown - env->ReleaseStringUTFChars(js, utf); - *has_exception = JNI_TRUE; - return std::string(); - } + // Returns the equivalent C++ rocksdb::IndexType enum for the + // provided Java org.rocksdb.IndexType + static rocksdb::BlockBasedTableOptions::IndexType toCppIndexType( + jbyte jindex_type) { + switch(jindex_type) { + case 0x0: + return rocksdb::BlockBasedTableOptions::IndexType::kBinarySearch; + case 0x1: + return rocksdb::BlockBasedTableOptions::IndexType::kHashSearch; + case 0x2: + return rocksdb::BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch; + default: + // undefined/default + return rocksdb::BlockBasedTableOptions::IndexType::kBinarySearch; + } + } +}; - std::string name(utf); - env->ReleaseStringUTFChars(js, utf); - *has_exception = JNI_FALSE; - return name; - } +// The portal class for org.rocksdb.DataBlockIndexType +class DataBlockIndexTypeJni { + public: + // Returns the equivalent org.rocksdb.DataBlockIndexType for the provided + // C++ rocksdb::DataBlockIndexType enum + static 
jbyte toJavaDataBlockIndexType( + const rocksdb::BlockBasedTableOptions::DataBlockIndexType& index_type) { + switch(index_type) { + case rocksdb::BlockBasedTableOptions::DataBlockIndexType::kDataBlockBinarySearch: + return 0x0; + case rocksdb::BlockBasedTableOptions::DataBlockIndexType::kDataBlockBinaryAndHash: + return 0x1; + default: + return 0x7F; // undefined + } + } - /** - * Copies bytes from a std::string to a jByteArray - * - * @param env A pointer to the java environment - * @param bytes The bytes to copy - * - * @return the Java byte[] or nullptr if an exception occurs - * - * @throws RocksDBException thrown - * if memory size to copy exceeds general java specific array size limitation. - */ - static jbyteArray copyBytes(JNIEnv* env, std::string bytes) { - return createJavaByteArrayWithSizeCheck(env, bytes.c_str(), bytes.size()); - } + // Returns the equivalent C++ rocksdb::DataBlockIndexType enum for the + // provided Java org.rocksdb.DataBlockIndexType + static rocksdb::BlockBasedTableOptions::DataBlockIndexType toCppDataBlockIndexType( + jbyte jindex_type) { + switch(jindex_type) { + case 0x0: + return rocksdb::BlockBasedTableOptions::DataBlockIndexType::kDataBlockBinarySearch; + case 0x1: + return rocksdb::BlockBasedTableOptions::DataBlockIndexType::kDataBlockBinaryAndHash; + default: + // undefined/default + return rocksdb::BlockBasedTableOptions::DataBlockIndexType::kDataBlockBinarySearch; + } + } +}; - /** - * Given a Java byte[][] which is an array of java.lang.Strings - * where each String is a byte[], the passed function `string_fn` - * will be called on each String, the result is the collected by - * calling the passed function `collector_fn` - * - * @param env (IN) A pointer to the java environment - * @param jbyte_strings (IN) A Java array of Strings expressed as bytes - * @param string_fn (IN) A transform function to call for each String - * @param collector_fn (IN) A collector which is called for the result - * of each `string_fn` - * @param has_exception (OUT) will be set to JNI_TRUE - * if an ArrayIndexOutOfBoundsException or OutOfMemoryError - * exception occurs - */ - template static void byteStrings(JNIEnv* env, - jobjectArray jbyte_strings, - std::function string_fn, - std::function collector_fn, - jboolean *has_exception) { - const jsize jlen = env->GetArrayLength(jbyte_strings); +// The portal class for org.rocksdb.ChecksumType +class ChecksumTypeJni { + public: + // Returns the equivalent org.rocksdb.ChecksumType for the provided + // C++ rocksdb::ChecksumType enum + static jbyte toJavaChecksumType( + const rocksdb::ChecksumType& checksum_type) { + switch(checksum_type) { + case rocksdb::ChecksumType::kNoChecksum: + return 0x0; + case rocksdb::ChecksumType::kCRC32c: + return 0x1; + case rocksdb::ChecksumType::kxxHash: + return 0x2; + case rocksdb::ChecksumType::kxxHash64: + return 0x3; + default: + return 0x7F; // undefined + } + } + + // Returns the equivalent C++ rocksdb::ChecksumType enum for the + // provided Java org.rocksdb.ChecksumType + static rocksdb::ChecksumType toCppChecksumType( + jbyte jchecksum_type) { + switch(jchecksum_type) { + case 0x0: + return rocksdb::ChecksumType::kNoChecksum; + case 0x1: + return rocksdb::ChecksumType::kCRC32c; + case 0x2: + return rocksdb::ChecksumType::kxxHash; + case 0x3: + return rocksdb::ChecksumType::kxxHash64; + default: + // undefined/default + return rocksdb::ChecksumType::kCRC32c; + } + } +}; - for(jsize i = 0; i < jlen; i++) { - jobject jbyte_string_obj = env->GetObjectArrayElement(jbyte_strings, i); - 
if(env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - *has_exception = JNI_TRUE; // signal error - return; - } +// The portal class for org.rocksdb.Priority +class PriorityJni { + public: + // Returns the equivalent org.rocksdb.Priority for the provided + // C++ rocksdb::Env::Priority enum + static jbyte toJavaPriority( + const rocksdb::Env::Priority& priority) { + switch(priority) { + case rocksdb::Env::Priority::BOTTOM: + return 0x0; + case rocksdb::Env::Priority::LOW: + return 0x1; + case rocksdb::Env::Priority::HIGH: + return 0x2; + case rocksdb::Env::Priority::TOTAL: + return 0x3; + default: + return 0x7F; // undefined + } + } - jbyteArray jbyte_string_ary = - reinterpret_cast<jbyteArray>(jbyte_string_obj); - T result = byteString(env, jbyte_string_ary, string_fn, has_exception); + // Returns the equivalent C++ rocksdb::Env::Priority enum for the + // provided Java org.rocksdb.Priority + static rocksdb::Env::Priority toCppPriority( + jbyte jpriority) { + switch(jpriority) { + case 0x0: + return rocksdb::Env::Priority::BOTTOM; + case 0x1: + return rocksdb::Env::Priority::LOW; + case 0x2: + return rocksdb::Env::Priority::HIGH; + case 0x3: + return rocksdb::Env::Priority::TOTAL; + default: + // undefined/default + return rocksdb::Env::Priority::LOW; + } + } +}; - env->DeleteLocalRef(jbyte_string_obj); +// The portal class for org.rocksdb.ThreadType +class ThreadTypeJni { + public: + // Returns the equivalent org.rocksdb.ThreadType for the provided + // C++ rocksdb::ThreadStatus::ThreadType enum + static jbyte toJavaThreadType( + const rocksdb::ThreadStatus::ThreadType& thread_type) { + switch(thread_type) { + case rocksdb::ThreadStatus::ThreadType::HIGH_PRIORITY: + return 0x0; + case rocksdb::ThreadStatus::ThreadType::LOW_PRIORITY: + return 0x1; + case rocksdb::ThreadStatus::ThreadType::USER: + return 0x2; + case rocksdb::ThreadStatus::ThreadType::BOTTOM_PRIORITY: + return 0x3; + default: + return 0x7F; // undefined + } + } - if(*has_exception == JNI_TRUE) { - // exception thrown: OutOfMemoryError - return; - } + // Returns the equivalent C++ rocksdb::ThreadStatus::ThreadType enum for the + // provided Java org.rocksdb.ThreadType + static rocksdb::ThreadStatus::ThreadType toCppThreadType( + jbyte jthread_type) { + switch(jthread_type) { + case 0x0: + return rocksdb::ThreadStatus::ThreadType::HIGH_PRIORITY; + case 0x1: + return rocksdb::ThreadStatus::ThreadType::LOW_PRIORITY; + case 0x2: + return rocksdb::ThreadStatus::ThreadType::USER; + case 0x3: + return rocksdb::ThreadStatus::ThreadType::BOTTOM_PRIORITY; + default: + // undefined/default + return rocksdb::ThreadStatus::ThreadType::LOW_PRIORITY; + } + } +}; - collector_fn(i, result); - } +// The portal class for org.rocksdb.OperationType +class OperationTypeJni { + public: + // Returns the equivalent org.rocksdb.OperationType for the provided + // C++ rocksdb::ThreadStatus::OperationType enum + static jbyte toJavaOperationType( + const rocksdb::ThreadStatus::OperationType& operation_type) { + switch(operation_type) { + case rocksdb::ThreadStatus::OperationType::OP_UNKNOWN: + return 0x0; + case rocksdb::ThreadStatus::OperationType::OP_COMPACTION: + return 0x1; + case rocksdb::ThreadStatus::OperationType::OP_FLUSH: + return 0x2; + default: + return 0x7F; // undefined + } + } - *has_exception = JNI_FALSE; - } + // Returns the equivalent C++ rocksdb::ThreadStatus::OperationType enum for the + // provided Java org.rocksdb.OperationType + static rocksdb::ThreadStatus::OperationType toCppOperationType( + 
switch(joperation_type) { + case 0x0: + return rocksdb::ThreadStatus::OperationType::OP_UNKNOWN; + case 0x1: + return rocksdb::ThreadStatus::OperationType::OP_COMPACTION; + case 0x2: + return rocksdb::ThreadStatus::OperationType::OP_FLUSH; + default: + // undefined/default + return rocksdb::ThreadStatus::OperationType::OP_UNKNOWN; + } + } +}; - /** - * Given a Java String which is expressed as a Java Byte Array byte[], - * the passed function `string_fn` will be called on the String - * and the result returned - * - * @param env (IN) A pointer to the java environment - * @param jbyte_string_ary (IN) A Java String expressed in bytes - * @param string_fn (IN) A transform function to call on the String - * @param has_exception (OUT) will be set to JNI_TRUE - * if an OutOfMemoryError exception occurs - */ - template static T byteString(JNIEnv* env, - jbyteArray jbyte_string_ary, - std::function string_fn, - jboolean* has_exception) { - const jsize jbyte_string_len = env->GetArrayLength(jbyte_string_ary); - return byteString(env, jbyte_string_ary, jbyte_string_len, string_fn, - has_exception); - } +// The portal class for org.rocksdb.OperationStage +class OperationStageJni { + public: + // Returns the equivalent org.rocksdb.OperationStage for the provided + // C++ rocksdb::ThreadStatus::OperationStage enum + static jbyte toJavaOperationStage( + const rocksdb::ThreadStatus::OperationStage& operation_stage) { + switch(operation_stage) { + case rocksdb::ThreadStatus::OperationStage::STAGE_UNKNOWN: + return 0x0; + case rocksdb::ThreadStatus::OperationStage::STAGE_FLUSH_RUN: + return 0x1; + case rocksdb::ThreadStatus::OperationStage::STAGE_FLUSH_WRITE_L0: + return 0x2; + case rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_PREPARE: + return 0x3; + case rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_RUN: + return 0x4; + case rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_PROCESS_KV: + return 0x5; + case rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_INSTALL: + return 0x6; + case rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_SYNC_FILE: + return 0x7; + case rocksdb::ThreadStatus::OperationStage::STAGE_PICK_MEMTABLES_TO_FLUSH: + return 0x8; + case rocksdb::ThreadStatus::OperationStage::STAGE_MEMTABLE_ROLLBACK: + return 0x9; + case rocksdb::ThreadStatus::OperationStage::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS: + return 0xA; + default: + return 0x7F; // undefined + } + } - /** - * Given a Java String which is expressed as a Java Byte Array byte[], - * the passed function `string_fn` will be called on the String - * and the result returned - * - * @param env (IN) A pointer to the java environment - * @param jbyte_string_ary (IN) A Java String expressed in bytes - * @param jbyte_string_len (IN) The length of the Java String - * expressed in bytes - * @param string_fn (IN) A transform function to call on the String - * @param has_exception (OUT) will be set to JNI_TRUE - * if an OutOfMemoryError exception occurs - */ - template static T byteString(JNIEnv* env, - jbyteArray jbyte_string_ary, const jsize jbyte_string_len, - std::function string_fn, - jboolean* has_exception) { - jbyte* jbyte_string = - env->GetByteArrayElements(jbyte_string_ary, nullptr); - if(jbyte_string == nullptr) { - // exception thrown: OutOfMemoryError - *has_exception = JNI_TRUE; - return nullptr; // signal error - } + // Returns the equivalent C++ rocksdb::ThreadStatus::OperationStage enum for the + // provided Java org.rocksdb.OperationStage + static rocksdb::ThreadStatus::OperationStage 
toCppOperationStage( + jbyte joperation_stage) { + switch(joperation_stage) { + case 0x0: + return rocksdb::ThreadStatus::OperationStage::STAGE_UNKNOWN; + case 0x1: + return rocksdb::ThreadStatus::OperationStage::STAGE_FLUSH_RUN; + case 0x2: + return rocksdb::ThreadStatus::OperationStage::STAGE_FLUSH_WRITE_L0; + case 0x3: + return rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_PREPARE; + case 0x4: + return rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_RUN; + case 0x5: + return rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_PROCESS_KV; + case 0x6: + return rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_INSTALL; + case 0x7: + return rocksdb::ThreadStatus::OperationStage::STAGE_COMPACTION_SYNC_FILE; + case 0x8: + return rocksdb::ThreadStatus::OperationStage::STAGE_PICK_MEMTABLES_TO_FLUSH; + case 0x9: + return rocksdb::ThreadStatus::OperationStage::STAGE_MEMTABLE_ROLLBACK; + case 0xA: + return rocksdb::ThreadStatus::OperationStage::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS; + default: + // undefined/default + return rocksdb::ThreadStatus::OperationStage::STAGE_UNKNOWN; + } + } +}; - T result = - string_fn(reinterpret_cast<char*>(jbyte_string), jbyte_string_len); +// The portal class for org.rocksdb.StateType +class StateTypeJni { + public: + // Returns the equivalent org.rocksdb.StateType for the provided + // C++ rocksdb::ThreadStatus::StateType enum + static jbyte toJavaStateType( + const rocksdb::ThreadStatus::StateType& state_type) { + switch(state_type) { + case rocksdb::ThreadStatus::StateType::STATE_UNKNOWN: + return 0x0; + case rocksdb::ThreadStatus::StateType::STATE_MUTEX_WAIT: + return 0x1; + default: + return 0x7F; // undefined + } + } - env->ReleaseByteArrayElements(jbyte_string_ary, jbyte_string, JNI_ABORT); + // Returns the equivalent C++ rocksdb::ThreadStatus::StateType enum for the + // provided Java org.rocksdb.StateType + static rocksdb::ThreadStatus::StateType toCppStateType( + jbyte jstate_type) { + switch(jstate_type) { + case 0x0: + return rocksdb::ThreadStatus::StateType::STATE_UNKNOWN; + case 0x1: + return rocksdb::ThreadStatus::StateType::STATE_MUTEX_WAIT; + default: + // undefined/default + return rocksdb::ThreadStatus::StateType::STATE_UNKNOWN; + } + } +}; - *has_exception = JNI_FALSE; - return result; +// The portal class for org.rocksdb.ThreadStatus +class ThreadStatusJni : public JavaClass { + public: + /** + * Get the Java Class org.rocksdb.ThreadStatus + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, + "org/rocksdb/ThreadStatus"); + } + + /** + * Create a new Java org.rocksdb.ThreadStatus object with the same + * properties as the provided C++ rocksdb::ThreadStatus object + * + * @param env A pointer to the Java environment + * @param thread_status A pointer to rocksdb::ThreadStatus object + * + * @return A reference to a Java org.rocksdb.ThreadStatus object, or + * nullptr if an exception occurs + */ + static jobject construct(JNIEnv* env, + const rocksdb::ThreadStatus* thread_status) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; } - /** - * Converts a std::vector to a Java byte[][] where each Java String - * is expressed as a Java Byte Array byte[]. 
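+    // Note: the "<init>" descriptor looked up below,
+    // "(JBLjava/lang/String;Ljava/lang/String;BJB[JB)V", must stay in
+    // lock-step with the Java-side ThreadStatus constructor; it packs, in
+    // order: thread_id (J), thread_type (B), db_name and cf_name (String),
+    // operation_type (B), op_elapsed_micros (J), operation_stage (B),
+    // op_properties (long[]) and state_type (B), matching the NewObject
+    // call that follows.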
- * - * @param env A pointer to the java environment - * @param strings A vector of Strings - * - * @return A Java array of Strings expressed as bytes - */ - static jobjectArray stringsBytes(JNIEnv* env, std::vector strings) { - jclass jcls_ba = ByteJni::getArrayJClass(env); - if(jcls_ba == nullptr) { - // exception occurred + jmethodID mid = env->GetMethodID(jclazz, "", "(JBLjava/lang/String;Ljava/lang/String;BJB[JB)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } + + jstring jdb_name = + JniUtil::toJavaString(env, &(thread_status->db_name), true); + if (env->ExceptionCheck()) { + // an error occurred return nullptr; - } + } - const jsize len = static_cast(strings.size()); + jstring jcf_name = + JniUtil::toJavaString(env, &(thread_status->cf_name), true); + if (env->ExceptionCheck()) { + // an error occurred + env->DeleteLocalRef(jdb_name); + return nullptr; + } - jobjectArray jbyte_strings = env->NewObjectArray(len, jcls_ba, nullptr); - if(jbyte_strings == nullptr) { + // long[] + const jsize len = static_cast(rocksdb::ThreadStatus::kNumOperationProperties); + jlongArray joperation_properties = + env->NewLongArray(len); + if (joperation_properties == nullptr) { + // an exception occurred + env->DeleteLocalRef(jdb_name); + env->DeleteLocalRef(jcf_name); + return nullptr; + } + jlong *body = env->GetLongArrayElements(joperation_properties, nullptr); + if (body == nullptr) { // exception thrown: OutOfMemoryError + env->DeleteLocalRef(jdb_name); + env->DeleteLocalRef(jcf_name); + env->DeleteLocalRef(joperation_properties); return nullptr; - } + } + for (size_t i = 0; i < len; ++i) { + body[i] = static_cast(thread_status->op_properties[i]); + } + env->ReleaseLongArrayElements(joperation_properties, body, 0); + + jobject jcfd = env->NewObject(jclazz, mid, + static_cast(thread_status->thread_id), + ThreadTypeJni::toJavaThreadType(thread_status->thread_type), + jdb_name, + jcf_name, + OperationTypeJni::toJavaOperationType(thread_status->operation_type), + static_cast(thread_status->op_elapsed_micros), + OperationStageJni::toJavaOperationStage(thread_status->operation_stage), + joperation_properties, + StateTypeJni::toJavaStateType(thread_status->state_type)); + if (env->ExceptionCheck()) { + // exception occurred + env->DeleteLocalRef(jdb_name); + env->DeleteLocalRef(jcf_name); + env->DeleteLocalRef(joperation_properties); + return nullptr; + } - for (jsize i = 0; i < len; i++) { - std::string *str = &strings[i]; - const jsize str_len = static_cast(str->size()); + // cleanup + env->DeleteLocalRef(jdb_name); + env->DeleteLocalRef(jcf_name); + env->DeleteLocalRef(joperation_properties); - jbyteArray jbyte_string_ary = env->NewByteArray(str_len); - if(jbyte_string_ary == nullptr) { - // exception thrown: OutOfMemoryError - env->DeleteLocalRef(jbyte_strings); - return nullptr; - } + return jcfd; + } +}; + +// The portal class for org.rocksdb.CompactionStyle +class CompactionStyleJni { + public: + // Returns the equivalent org.rocksdb.CompactionStyle for the provided + // C++ rocksdb::CompactionStyle enum + static jbyte toJavaCompactionStyle( + const rocksdb::CompactionStyle& compaction_style) { + switch(compaction_style) { + case rocksdb::CompactionStyle::kCompactionStyleLevel: + return 0x0; + case rocksdb::CompactionStyle::kCompactionStyleUniversal: + return 0x1; + case rocksdb::CompactionStyle::kCompactionStyleFIFO: + return 0x2; + case rocksdb::CompactionStyle::kCompactionStyleNone: + return 0x3; + default: + return 0x7F; // 
undefined + } + } + + // Returns the equivalent C++ rocksdb::CompactionStyle enum for the + // provided Java org.rocksdb.CompactionStyle + static rocksdb::CompactionStyle toCppCompactionStyle( + jbyte jcompaction_style) { + switch(jcompaction_style) { + case 0x0: + return rocksdb::CompactionStyle::kCompactionStyleLevel; + case 0x1: + return rocksdb::CompactionStyle::kCompactionStyleUniversal; + case 0x2: + return rocksdb::CompactionStyle::kCompactionStyleFIFO; + case 0x3: + return rocksdb::CompactionStyle::kCompactionStyleNone; + default: + // undefined/default + return rocksdb::CompactionStyle::kCompactionStyleLevel; + } + } +}; + +// The portal class for org.rocksdb.CompactionReason +class CompactionReasonJni { + public: + // Returns the equivalent org.rocksdb.CompactionReason for the provided + // C++ rocksdb::CompactionReason enum + static jbyte toJavaCompactionReason( + const rocksdb::CompactionReason& compaction_reason) { + switch(compaction_reason) { + case rocksdb::CompactionReason::kUnknown: + return 0x0; + case rocksdb::CompactionReason::kLevelL0FilesNum: + return 0x1; + case rocksdb::CompactionReason::kLevelMaxLevelSize: + return 0x2; + case rocksdb::CompactionReason::kUniversalSizeAmplification: + return 0x3; + case rocksdb::CompactionReason::kUniversalSizeRatio: + return 0x4; + case rocksdb::CompactionReason::kUniversalSortedRunNum: + return 0x5; + case rocksdb::CompactionReason::kFIFOMaxSize: + return 0x6; + case rocksdb::CompactionReason::kFIFOReduceNumFiles: + return 0x7; + case rocksdb::CompactionReason::kFIFOTtl: + return 0x8; + case rocksdb::CompactionReason::kManualCompaction: + return 0x9; + case rocksdb::CompactionReason::kFilesMarkedForCompaction: + return 0x10; + case rocksdb::CompactionReason::kBottommostFiles: + return 0x0A; + case rocksdb::CompactionReason::kTtl: + return 0x0B; + case rocksdb::CompactionReason::kFlush: + return 0x0C; + case rocksdb::CompactionReason::kExternalSstIngestion: + return 0x0D; + default: + return 0x7F; // undefined + } + } + + // Returns the equivalent C++ rocksdb::CompactionReason enum for the + // provided Java org.rocksdb.CompactionReason + static rocksdb::CompactionReason toCppCompactionReason( + jbyte jcompaction_reason) { + switch(jcompaction_reason) { + case 0x0: + return rocksdb::CompactionReason::kUnknown; + case 0x1: + return rocksdb::CompactionReason::kLevelL0FilesNum; + case 0x2: + return rocksdb::CompactionReason::kLevelMaxLevelSize; + case 0x3: + return rocksdb::CompactionReason::kUniversalSizeAmplification; + case 0x4: + return rocksdb::CompactionReason::kUniversalSizeRatio; + case 0x5: + return rocksdb::CompactionReason::kUniversalSortedRunNum; + case 0x6: + return rocksdb::CompactionReason::kFIFOMaxSize; + case 0x7: + return rocksdb::CompactionReason::kFIFOReduceNumFiles; + case 0x8: + return rocksdb::CompactionReason::kFIFOTtl; + case 0x9: + return rocksdb::CompactionReason::kManualCompaction; + case 0x10: + return rocksdb::CompactionReason::kFilesMarkedForCompaction; + case 0x0A: + return rocksdb::CompactionReason::kBottommostFiles; + case 0x0B: + return rocksdb::CompactionReason::kTtl; + case 0x0C: + return rocksdb::CompactionReason::kFlush; + case 0x0D: + return rocksdb::CompactionReason::kExternalSstIngestion; + default: + // undefined/default + return rocksdb::CompactionReason::kUnknown; + } + } +}; - env->SetByteArrayRegion( - jbyte_string_ary, 0, str_len, - const_cast(reinterpret_cast(str->c_str()))); - if(env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - 
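+// Illustrative sketch (hypothetical helper, not part of the upstream patch):
+// a compaction-listener up-call would surface the reason to Java as the raw
+// jbyte produced here, which the Java enum decodes by value. Note the mapping
+// is value-based rather than contiguous (kFilesMarkedForCompaction uses 0x10
+// while later reasons resume at 0x0A); only agreement with the Java-side
+// constants matters.
+inline jbyte example_compaction_reason_to_java_sketch(
+    const rocksdb::CompactionReason& compaction_reason) {
+  return rocksdb::CompactionReasonJni::toJavaCompactionReason(
+      compaction_reason);
+}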
env->DeleteLocalRef(jbyte_string_ary); - env->DeleteLocalRef(jbyte_strings); - return nullptr; - } +// The portal class for org.rocksdb.WalFileType +class WalFileTypeJni { + public: + // Returns the equivalent org.rocksdb.WalFileType for the provided + // C++ rocksdb::WalFileType enum + static jbyte toJavaWalFileType( + const rocksdb::WalFileType& wal_file_type) { + switch(wal_file_type) { + case rocksdb::WalFileType::kArchivedLogFile: + return 0x0; + case rocksdb::WalFileType::kAliveLogFile: + return 0x1; + default: + return 0x7F; // undefined + } + } - env->SetObjectArrayElement(jbyte_strings, i, jbyte_string_ary); - if(env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - // or ArrayStoreException - env->DeleteLocalRef(jbyte_string_ary); - env->DeleteLocalRef(jbyte_strings); - return nullptr; - } + // Returns the equivalent C++ rocksdb::WalFileType enum for the + // provided Java org.rocksdb.WalFileType + static rocksdb::WalFileType toCppWalFileType( + jbyte jwal_file_type) { + switch(jwal_file_type) { + case 0x0: + return rocksdb::WalFileType::kArchivedLogFile; + case 0x1: + return rocksdb::WalFileType::kAliveLogFile; + default: + // undefined/default + return rocksdb::WalFileType::kAliveLogFile; + } + } +}; - env->DeleteLocalRef(jbyte_string_ary); - } +class LogFileJni : public JavaClass { + public: + /** + * Create a new Java org.rocksdb.LogFile object. + * + * @param env A pointer to the Java environment + * @param log_file A Cpp log file object + * + * @return A reference to a Java org.rocksdb.LogFile object, or + * nullptr if an an exception occurs + */ + static jobject fromCppLogFile(JNIEnv* env, rocksdb::LogFile* log_file) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } - return jbyte_strings; + jmethodID mid = env->GetMethodID(jclazz, "", "(Ljava/lang/String;JBJJ)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; } - - /** - * Copies bytes to a new jByteArray with the check of java array size limitation. - * - * @param bytes pointer to memory to copy to a new jByteArray - * @param size number of bytes to copy - * - * @return the Java byte[] or nullptr if an exception occurs - * - * @throws RocksDBException thrown - * if memory size to copy exceeds general java array size limitation to avoid overflow. 
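+    // The "<init>" descriptor "(Ljava/lang/String;JBJJ)V" corresponds, in
+    // order, to the arguments passed to NewObject below: PathName() as a
+    // String, LogNumber() as a long, the WalFileType byte from
+    // WalFileTypeJni::toJavaWalFileType(), then StartSequence() and
+    // SizeFileBytes() as longs.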
- */ - static jbyteArray createJavaByteArrayWithSizeCheck(JNIEnv* env, const char* bytes, const size_t size) { - // Limitation for java array size is vm specific - // In general it cannot exceed Integer.MAX_VALUE (2^31 - 1) - // Current HotSpot VM limitation for array size is Integer.MAX_VALUE - 5 (2^31 - 1 - 5) - // It means that the next call to env->NewByteArray can still end with - // OutOfMemoryError("Requested array size exceeds VM limit") coming from VM - static const size_t MAX_JARRAY_SIZE = (static_cast(1)) << 31; - if(size > MAX_JARRAY_SIZE) { - rocksdb::RocksDBExceptionJni::ThrowNew(env, "Requested array size exceeds VM limit"); - return nullptr; - } - - const jsize jlen = static_cast(size); - jbyteArray jbytes = env->NewByteArray(jlen); - if(jbytes == nullptr) { - // exception thrown: OutOfMemoryError - return nullptr; - } - - env->SetByteArrayRegion(jbytes, 0, jlen, - const_cast(reinterpret_cast(bytes))); - if(env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - env->DeleteLocalRef(jbytes); - return nullptr; - } - return jbytes; + std::string path_name = log_file->PathName(); + jstring jpath_name = rocksdb::JniUtil::toJavaString(env, &path_name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + return nullptr; } - /** - * Copies bytes from a rocksdb::Slice to a jByteArray - * - * @param env A pointer to the java environment - * @param bytes The bytes to copy - * - * @return the Java byte[] or nullptr if an exception occurs - * - * @throws RocksDBException thrown - * if memory size to copy exceeds general java specific array size limitation. - */ - static jbyteArray copyBytes(JNIEnv* env, const Slice& bytes) { - return createJavaByteArrayWithSizeCheck(env, bytes.data(), bytes.size()); + jobject jlog_file = env->NewObject(jclazz, mid, + jpath_name, + static_cast(log_file->LogNumber()), + rocksdb::WalFileTypeJni::toJavaWalFileType(log_file->Type()), + static_cast(log_file->StartSequence()), + static_cast(log_file->SizeFileBytes()) + ); + + if (env->ExceptionCheck()) { + env->DeleteLocalRef(jpath_name); + return nullptr; } - /* - * Helper for operations on a key and value - * for example WriteBatch->Put - * - * TODO(AR) could be used for RocksDB->Put etc. - */ - static std::unique_ptr kv_op( - std::function op, - JNIEnv* env, jobject /*jobj*/, - jbyteArray jkey, jint jkey_len, - jbyteArray jvalue, jint jvalue_len) { - jbyte* key = env->GetByteArrayElements(jkey, nullptr); - if(env->ExceptionCheck()) { - // exception thrown: OutOfMemoryError - return nullptr; - } + // cleanup + env->DeleteLocalRef(jpath_name); - jbyte* value = env->GetByteArrayElements(jvalue, nullptr); - if(env->ExceptionCheck()) { - // exception thrown: OutOfMemoryError - if(key != nullptr) { - env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); - } - return nullptr; - } + return jlog_file; + } - rocksdb::Slice key_slice(reinterpret_cast(key), jkey_len); - rocksdb::Slice value_slice(reinterpret_cast(value), - jvalue_len); + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "org/rocksdb/LogFile"); + } +}; - auto status = op(key_slice, value_slice); +class LiveFileMetaDataJni : public JavaClass { + public: + /** + * Create a new Java org.rocksdb.LiveFileMetaData object. 
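+   * The constructor descriptor
+   * "([BILjava/lang/String;Ljava/lang/String;JJJ[B[BJZJJ)V" mirrors the
+   * SstFileMetaData fields with the column family name (byte[]) and the
+   * level (int) prepended.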
+ * + * @param env A pointer to the Java environment + * @param live_file_meta_data A Cpp live file meta data object + * + * @return A reference to a Java org.rocksdb.LiveFileMetaData object, or + * nullptr if an an exception occurs + */ + static jobject fromCppLiveFileMetaData(JNIEnv* env, + rocksdb::LiveFileMetaData* live_file_meta_data) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } - if(value != nullptr) { - env->ReleaseByteArrayElements(jvalue, value, JNI_ABORT); - } - if(key != nullptr) { - env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); - } + jmethodID mid = env->GetMethodID(jclazz, "", "([BILjava/lang/String;Ljava/lang/String;JJJ[B[BJZJJ)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } - return std::unique_ptr(new rocksdb::Status(status)); + jbyteArray jcolumn_family_name = rocksdb::JniUtil::copyBytes( + env, live_file_meta_data->column_family_name); + if (jcolumn_family_name == nullptr) { + // exception occurred creating java byte array + return nullptr; } - /* - * Helper for operations on a key - * for example WriteBatch->Delete - * - * TODO(AR) could be used for RocksDB->Delete etc. - */ - static std::unique_ptr k_op( - std::function op, - JNIEnv* env, jobject /*jobj*/, - jbyteArray jkey, jint jkey_len) { - jbyte* key = env->GetByteArrayElements(jkey, nullptr); - if(env->ExceptionCheck()) { - // exception thrown: OutOfMemoryError - return nullptr; - } + jstring jfile_name = rocksdb::JniUtil::toJavaString( + env, &live_file_meta_data->name, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + return nullptr; + } - rocksdb::Slice key_slice(reinterpret_cast(key), jkey_len); + jstring jpath = rocksdb::JniUtil::toJavaString( + env, &live_file_meta_data->db_path, true); + if (env->ExceptionCheck()) { + // exception occurred creating java string + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfile_name); + return nullptr; + } - auto status = op(key_slice); + jbyteArray jsmallest_key = rocksdb::JniUtil::copyBytes( + env, live_file_meta_data->smallestkey); + if (jsmallest_key == nullptr) { + // exception occurred creating java byte array + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + return nullptr; + } - if(key != nullptr) { - env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); - } + jbyteArray jlargest_key = rocksdb::JniUtil::copyBytes( + env, live_file_meta_data->largestkey); + if (jlargest_key == nullptr) { + // exception occurred creating java byte array + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + env->DeleteLocalRef(jsmallest_key); + return nullptr; + } - return std::unique_ptr(new rocksdb::Status(status)); + jobject jlive_file_meta_data = env->NewObject(jclazz, mid, + jcolumn_family_name, + static_cast(live_file_meta_data->level), + jfile_name, + jpath, + static_cast(live_file_meta_data->size), + static_cast(live_file_meta_data->smallest_seqno), + static_cast(live_file_meta_data->largest_seqno), + jsmallest_key, + jlargest_key, + static_cast(live_file_meta_data->num_reads_sampled), + static_cast(live_file_meta_data->being_compacted), + static_cast(live_file_meta_data->num_entries), + static_cast(live_file_meta_data->num_deletions) + ); + + if (env->ExceptionCheck()) { + env->DeleteLocalRef(jcolumn_family_name); 
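+      // Every local reference created above must also be released on this
+      // error path: JNI locals are only reclaimed when the native frame
+      // returns to Java, so leaking them on each early exit could overflow
+      // the JVM's local-reference table under heavy use.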
+ env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + env->DeleteLocalRef(jsmallest_key); + env->DeleteLocalRef(jlargest_key); + return nullptr; } - /* - * Helper for operations on a value - * for example WriteBatchWithIndex->GetFromBatch - */ - static jbyteArray v_op( - std::function op, - JNIEnv* env, jbyteArray jkey, jint jkey_len) { - jbyte* key = env->GetByteArrayElements(jkey, nullptr); - if(env->ExceptionCheck()) { - // exception thrown: OutOfMemoryError - return nullptr; - } + // cleanup + env->DeleteLocalRef(jcolumn_family_name); + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + env->DeleteLocalRef(jsmallest_key); + env->DeleteLocalRef(jlargest_key); - rocksdb::Slice key_slice(reinterpret_cast(key), jkey_len); + return jlive_file_meta_data; + } - std::string value; - rocksdb::Status s = op(key_slice, &value); + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "org/rocksdb/LiveFileMetaData"); + } +}; - if(key != nullptr) { - env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); - } +class SstFileMetaDataJni : public JavaClass { + public: + /** + * Create a new Java org.rocksdb.SstFileMetaData object. + * + * @param env A pointer to the Java environment + * @param sst_file_meta_data A Cpp sst file meta data object + * + * @return A reference to a Java org.rocksdb.SstFileMetaData object, or + * nullptr if an an exception occurs + */ + static jobject fromCppSstFileMetaData(JNIEnv* env, + const rocksdb::SstFileMetaData* sst_file_meta_data) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } - if (s.IsNotFound()) { - return nullptr; - } + jmethodID mid = env->GetMethodID(jclazz, "", "(Ljava/lang/String;Ljava/lang/String;JJJ[B[BJZJJ)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } - if (s.ok()) { - jbyteArray jret_value = - env->NewByteArray(static_cast(value.size())); - if(jret_value == nullptr) { - // exception thrown: OutOfMemoryError - return nullptr; - } + jstring jfile_name = rocksdb::JniUtil::toJavaString( + env, &sst_file_meta_data->name, true); + if (jfile_name == nullptr) { + // exception occurred creating java byte array + return nullptr; + } - env->SetByteArrayRegion(jret_value, 0, static_cast(value.size()), - const_cast(reinterpret_cast(value.c_str()))); - if(env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - if(jret_value != nullptr) { - env->DeleteLocalRef(jret_value); - } - return nullptr; - } + jstring jpath = rocksdb::JniUtil::toJavaString( + env, &sst_file_meta_data->db_path, true); + if (jpath == nullptr) { + // exception occurred creating java byte array + env->DeleteLocalRef(jfile_name); + return nullptr; + } - return jret_value; - } + jbyteArray jsmallest_key = rocksdb::JniUtil::copyBytes( + env, sst_file_meta_data->smallestkey); + if (jsmallest_key == nullptr) { + // exception occurred creating java byte array + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + return nullptr; + } - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + jbyteArray jlargest_key = rocksdb::JniUtil::copyBytes( + env, sst_file_meta_data->largestkey); + if (jlargest_key == nullptr) { + // exception occurred creating java byte array + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + env->DeleteLocalRef(jsmallest_key); + return nullptr; + } + + jobject jsst_file_meta_data = env->NewObject(jclazz, mid, + jfile_name, + jpath, + 
static_cast(sst_file_meta_data->size), + static_cast(sst_file_meta_data->smallest_seqno), + static_cast(sst_file_meta_data->largest_seqno), + jsmallest_key, + jlargest_key, + static_cast(sst_file_meta_data->num_reads_sampled), + static_cast(sst_file_meta_data->being_compacted), + static_cast(sst_file_meta_data->num_entries), + static_cast(sst_file_meta_data->num_deletions) + ); + + if (env->ExceptionCheck()) { + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + env->DeleteLocalRef(jsmallest_key); + env->DeleteLocalRef(jlargest_key); return nullptr; } + + // cleanup + env->DeleteLocalRef(jfile_name); + env->DeleteLocalRef(jpath); + env->DeleteLocalRef(jsmallest_key); + env->DeleteLocalRef(jlargest_key); + + return jsst_file_meta_data; + } + + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "org/rocksdb/SstFileMetaData"); + } }; -class ColumnFamilyDescriptorJni : public JavaClass { +class LevelMetaDataJni : public JavaClass { public: /** - * Get the Java Class org.rocksdb.ColumnFamilyDescriptor + * Create a new Java org.rocksdb.LevelMetaData object. * * @param env A pointer to the Java environment + * @param level_meta_data A Cpp level meta data object * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return A reference to a Java org.rocksdb.LevelMetaData object, or + * nullptr if an an exception occurs */ + static jobject fromCppLevelMetaData(JNIEnv* env, + const rocksdb::LevelMetaData* level_meta_data) { + jclass jclazz = getJClass(env); + if (jclazz == nullptr) { + // exception occurred accessing class + return nullptr; + } + + jmethodID mid = env->GetMethodID(jclazz, "", "(IJ[Lorg/rocksdb/SstFileMetaData;)V"); + if (mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return nullptr; + } + + const jsize jlen = + static_cast(level_meta_data->files.size()); + jobjectArray jfiles = env->NewObjectArray(jlen, SstFileMetaDataJni::getJClass(env), nullptr); + if (jfiles == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + jsize i = 0; + for (auto it = level_meta_data->files.begin(); + it != level_meta_data->files.end(); ++it) { + jobject jfile = SstFileMetaDataJni::fromCppSstFileMetaData(env, &(*it)); + if (jfile == nullptr) { + // exception occurred + env->DeleteLocalRef(jfiles); + return nullptr; + } + env->SetObjectArrayElement(jfiles, i++, jfile); + } + + jobject jlevel_meta_data = env->NewObject(jclazz, mid, + static_cast(level_meta_data->level), + static_cast(level_meta_data->size), + jfiles + ); + + if (env->ExceptionCheck()) { + env->DeleteLocalRef(jfiles); + return nullptr; + } + + // cleanup + env->DeleteLocalRef(jfiles); + + return jlevel_meta_data; + } + static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "org/rocksdb/ColumnFamilyDescriptor"); + return JavaClass::getJClass(env, "org/rocksdb/LevelMetaData"); } +}; +class ColumnFamilyMetaDataJni : public JavaClass { + public: /** - * Create a new Java org.rocksdb.ColumnFamilyDescriptor object with the same - * properties as the provided C++ rocksdb::ColumnFamilyDescriptor object + * Create a new Java org.rocksdb.ColumnFamilyMetaData object. 
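+   * The returned object aggregates the column family's size, file count and
+   * name together with one LevelMetaData entry per level, each built via
+   * LevelMetaDataJni::fromCppLevelMetaData below.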
* * @param env A pointer to the Java environment - * @param cfd A pointer to rocksdb::ColumnFamilyDescriptor object + * @param column_family_meta_data A Cpp column family meta data object * - * @return A reference to a Java org.rocksdb.ColumnFamilyDescriptor object, or + * @return A reference to a Java org.rocksdb.ColumnFamilyMetaData object, or * nullptr if an exception occurs */ - static jobject construct(JNIEnv* env, ColumnFamilyDescriptor* cfd) { - jbyteArray jcf_name = JniUtil::copyBytes(env, cfd->name); - jobject cfopts = ColumnFamilyOptionsJni::construct(env, &(cfd->options)); - + static jobject fromCppColumnFamilyMetaData(JNIEnv* env, + const rocksdb::ColumnFamilyMetaData* column_family_meta_data) { jclass jclazz = getJClass(env); if (jclazz == nullptr) { // exception occurred accessing class return nullptr; } - jmethodID mid = env->GetMethodID(jclazz, "<init>", - "([BLorg/rocksdb/ColumnFamilyOptions;)V"); + jmethodID mid = env->GetMethodID(jclazz, "<init>", "(JJ[B[Lorg/rocksdb/LevelMetaData;)V"); if (mid == nullptr) { // exception thrown: NoSuchMethodException or OutOfMemoryError - env->DeleteLocalRef(jcf_name); return nullptr; } - jobject jcfd = env->NewObject(jclazz, mid, jcf_name, cfopts); + jbyteArray jname = rocksdb::JniUtil::copyBytes( + env, column_family_meta_data->name); + if (jname == nullptr) { + // exception occurred creating java byte array + return nullptr; + } + + const jsize jlen = + static_cast<jsize>(column_family_meta_data->levels.size()); + jobjectArray jlevels = env->NewObjectArray(jlen, LevelMetaDataJni::getJClass(env), nullptr); + if(jlevels == nullptr) { + // exception thrown: OutOfMemoryError + env->DeleteLocalRef(jname); + return nullptr; + } + + jsize i = 0; + for (auto it = column_family_meta_data->levels.begin(); + it != column_family_meta_data->levels.end(); ++it) { + jobject jlevel = LevelMetaDataJni::fromCppLevelMetaData(env, &(*it)); + if (jlevel == nullptr) { + // exception occurred + env->DeleteLocalRef(jname); + env->DeleteLocalRef(jlevels); + return nullptr; + } + env->SetObjectArrayElement(jlevels, i++, jlevel); + } + + jobject jcolumn_family_meta_data = env->NewObject(jclazz, mid, + static_cast<jlong>(column_family_meta_data->size), + static_cast<jlong>(column_family_meta_data->file_count), + jname, + jlevels + ); + if (env->ExceptionCheck()) { - env->DeleteLocalRef(jcf_name); + env->DeleteLocalRef(jname); + env->DeleteLocalRef(jlevels); return nullptr; } - return jcfd; + // cleanup + env->DeleteLocalRef(jname); + env->DeleteLocalRef(jlevels); + + return jcolumn_family_meta_data; + } + + static jclass getJClass(JNIEnv* env) { + return JavaClass::getJClass(env, "org/rocksdb/ColumnFamilyMetaData"); } +}; +// The portal class for org.rocksdb.AbstractTraceWriter +class AbstractTraceWriterJni : public RocksDBNativeClass< + const rocksdb::TraceWriterJniCallback*, + AbstractTraceWriterJni> { + public: /** - * Get the Java Method: ColumnFamilyDescriptor#columnFamilyName + * Get the Java Class org.rocksdb.AbstractTraceWriter + * + * @param env A pointer to the Java environment + * + * @return The Java Class or nullptr if one of the + * ClassFormatError, ClassCircularityError, NoClassDefFoundError, + * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + */ + static jclass getJClass(JNIEnv* env) { + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/AbstractTraceWriter"); + } + + /** + * Get the Java Method: AbstractTraceWriter#write * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retrieved */
- static jmethodID getColumnFamilyNameMethod(JNIEnv* env) { + static jmethodID getWriteProxyMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); - if (jclazz == nullptr) { + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = env->GetMethodID(jclazz, "columnFamilyName", "()[B"); + static jmethodID mid = env->GetMethodID( + jclazz, "writeProxy", "(J)S"); assert(mid != nullptr); return mid; } /** - * Get the Java Method: ColumnFamilyDescriptor#columnFamilyOptions + * Get the Java Method: AbstractTraceWriter#closeWriter * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retrieved */ - static jmethodID getColumnFamilyOptionsMethod(JNIEnv* env) { + static jmethodID getCloseWriterProxyMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); - if (jclazz == nullptr) { + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } static jmethodID mid = env->GetMethodID( - jclazz, "columnFamilyOptions", "()Lorg/rocksdb/ColumnFamilyOptions;"); + jclazz, "closeWriterProxy", "()S"); assert(mid != nullptr); return mid; } -}; - -class MapJni : public JavaClass { - public: - /** - * Get the Java Class java.util.Map - * - * @param env A pointer to the Java environment - * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown - */ - static jclass getClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/util/Map"); - } /** - * Get the Java Method: Map#put + * Get the Java Method: AbstractTraceWriter#getFileSize * * @param env A pointer to the Java environment * * @return The Java Method ID or nullptr if the class or method id could not * be retrieved */ - static jmethodID getMapPutMethodId(JNIEnv* env) { - jclass jlist_clazz = getClass(env); - if(jlist_clazz == nullptr) { + static jmethodID getGetFileSizeMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - static jmethodID mid = - env->GetMethodID(jlist_clazz, "put", "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;"); + static jmethodID mid = env->GetMethodID( + jclazz, "getFileSize", "()J"); assert(mid != nullptr); return mid; } }; -class HashMapJni : public JavaClass { +// The portal class for org.rocksdb.AbstractWalFilter +class AbstractWalFilterJni : public RocksDBNativeClass< + const rocksdb::WalFilterJniCallback*, + AbstractWalFilterJni> { public: /** - * Get the Java Class java.util.HashMap + * Get the Java Class org.rocksdb.AbstractWalFilter * * @param env A pointer to the Java environment * @@ -4829,120 +6979,114 @@ class HashMapJni : public JavaClass { * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown */ static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/util/HashMap"); + return RocksDBNativeClass::getJClass(env, + "org/rocksdb/AbstractWalFilter"); } /** - * Create a new Java java.util.HashMap object. 
+ * Get the Java Method: AbstractWalFilter#columnFamilyLogNumberMap * * @param env A pointer to the Java environment * - * @return A reference to a Java java.util.HashMap object, or - * nullptr if an an exception occurs + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jobject construct(JNIEnv* env, const uint32_t initial_capacity = 16) { + static jmethodID getColumnFamilyLogNumberMapMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); - if (jclazz == nullptr) { + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - jmethodID mid = env->GetMethodID(jclazz, "", "(I)V"); - if (mid == nullptr) { - // exception thrown: NoSuchMethodException or OutOfMemoryError - return nullptr; - } - - jobject jhash_map = env->NewObject(jclazz, mid, static_cast(initial_capacity)); - if (env->ExceptionCheck()) { - return nullptr; - } - - return jhash_map; + static jmethodID mid = env->GetMethodID( + jclazz, "columnFamilyLogNumberMap", + "(Ljava/util/Map;Ljava/util/Map;)V"); + assert(mid != nullptr); + return mid; } /** - * A function which maps a std::pair to a std::pair + * Get the Java Method: AbstractTraceWriter#logRecordFoundProxy * - * @return Either a pointer to a std::pair, or nullptr - * if an error occurs during the mapping - */ - template - using FnMapKV = std::function> (const std::pair&)>; - - // template ::value_type, std::pair>::value, int32_t>::type = 0> - // static void putAll(JNIEnv* env, const jobject jhash_map, I iterator, const FnMapKV &fn_map_kv) { - /** - * Returns true if it succeeds, false if an error occurs + * @param env A pointer to the Java environment + * + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - template - static bool putAll(JNIEnv* env, const jobject jhash_map, iterator_type iterator, iterator_type end, const FnMapKV &fn_map_kv) { - const jmethodID jmid_put = rocksdb::MapJni::getMapPutMethodId(env); - if (jmid_put == nullptr) { - return false; - } - - for (auto it = iterator; it != end; ++it) { - const std::unique_ptr> result = fn_map_kv(*it); - if (result == nullptr) { - // an error occurred during fn_map_kv - return false; - } - env->CallObjectMethod(jhash_map, jmid_put, result->first, result->second); - if (env->ExceptionCheck()) { - // exception occurred - env->DeleteLocalRef(result->second); - env->DeleteLocalRef(result->first); - return false; - } - - // release local references - env->DeleteLocalRef(result->second); - env->DeleteLocalRef(result->first); + static jmethodID getLogRecordFoundProxyMethodId(JNIEnv* env) { + jclass jclazz = getJClass(env); + if(jclazz == nullptr) { + // exception occurred accessing class + return nullptr; } - return true; + static jmethodID mid = env->GetMethodID( + jclazz, "logRecordFoundProxy", "(JLjava/lang/String;JJ)S"); + assert(mid != nullptr); + return mid; } -}; -class LongJni : public JavaClass { - public: /** - * Get the Java Class java.lang.Long + * Get the Java Method: AbstractTraceWriter#name * * @param env A pointer to the Java environment * - * @return The Java Class or nullptr if one of the - * ClassFormatError, ClassCircularityError, NoClassDefFoundError, - * OutOfMemoryError or ExceptionInInitializerError exceptions is thrown + * @return The Java Method ID or nullptr if the class or method id could not + * be retieved */ - static jclass getJClass(JNIEnv* env) { - return JavaClass::getJClass(env, "java/lang/Long"); - } - - static jobject valueOf(JNIEnv* env, jlong jprimitive_long) { + 
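+  // Illustrative sketch (hypothetical callback body, not upstream code): a
+  // native WalFilter callback is expected to up-call through this method ID
+  // roughly as
+  //   jshort jresult = env->CallShortMethod(jwal_filter_obj,
+  //       AbstractWalFilterJni::getLogRecordFoundProxyMethodId(env),
+  //       static_cast<jlong>(log_number), jlog_file_name,
+  //       jbatch_handle, jnew_batch_handle);
+  // matching the "(JLjava/lang/String;JJ)S" descriptor above; the returned
+  // short presumably packs the WalProcessingOption decision together with the
+  // batch-changed flag from WalFilter::LogRecordFound.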
static jmethodID getNameMethodId(JNIEnv* env) { jclass jclazz = getJClass(env); - if (jclazz == nullptr) { + if(jclazz == nullptr) { // exception occurred accessing class return nullptr; } - jmethodID mid = - env->GetStaticMethodID(jclazz, "valueOf", "(J)Ljava/lang/Long;"); - if (mid == nullptr) { - // exception thrown: NoSuchMethodException or OutOfMemoryError - return nullptr; - } + static jmethodID mid = env->GetMethodID( + jclazz, "name", "()Ljava/lang/String;"); + assert(mid != nullptr); + return mid; + } +}; - const jobject jlong_obj = - env->CallStaticObjectMethod(jclazz, mid, jprimitive_long); - if (env->ExceptionCheck()) { - // exception occurred - return nullptr; - } +// The portal class for org.rocksdb.WalProcessingOption +class WalProcessingOptionJni { + public: + // Returns the equivalent org.rocksdb.WalProcessingOption for the provided + // C++ rocksdb::WalFilter::WalProcessingOption enum + static jbyte toJavaWalProcessingOption( + const rocksdb::WalFilter::WalProcessingOption& wal_processing_option) { + switch(wal_processing_option) { + case rocksdb::WalFilter::WalProcessingOption::kContinueProcessing: + return 0x0; + case rocksdb::WalFilter::WalProcessingOption::kIgnoreCurrentRecord: + return 0x1; + case rocksdb::WalFilter::WalProcessingOption::kStopReplay: + return 0x2; + case rocksdb::WalFilter::WalProcessingOption::kCorruptedRecord: + return 0x3; + default: + return 0x7F; // undefined + } + } - return jlong_obj; - } + // Returns the equivalent C++ rocksdb::WalFilter::WalProcessingOption enum for + // the provided Java org.rocksdb.WalProcessingOption + static rocksdb::WalFilter::WalProcessingOption toCppWalProcessingOption( + jbyte jwal_processing_option) { + switch(jwal_processing_option) { + case 0x0: + return rocksdb::WalFilter::WalProcessingOption::kContinueProcessing; + case 0x1: + return rocksdb::WalFilter::WalProcessingOption::kIgnoreCurrentRecord; + case 0x2: + return rocksdb::WalFilter::WalProcessingOption::kStopReplay; + case 0x3: + return rocksdb::WalFilter::WalProcessingOption::kCorruptedRecord; + default: + // undefined/default + return rocksdb::WalFilter::WalProcessingOption::kCorruptedRecord; + } + } }; } // namespace rocksdb #endif // JAVA_ROCKSJNI_PORTAL_H_ diff --git a/ceph/src/rocksdb/java/rocksjni/rocksjni.cc b/ceph/src/rocksdb/java/rocksjni/rocksjni.cc index 6e50c32f7..53224232c 100644 --- a/ceph/src/rocksdb/java/rocksjni/rocksjni.cc +++ b/ceph/src/rocksdb/java/rocksjni/rocksjni.cc @@ -27,8 +27,6 @@ #undef min #endif -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::Open jlong rocksdb_open_helper( JNIEnv* env, jlong jopt_handle, jstring jdb_path, std::function(jopt_handle); - std::vector handles; + std::vector cf_handles; rocksdb::DB* db = nullptr; - rocksdb::Status s = open_fn(*opt, db_path, column_families, &handles, &db); + rocksdb::Status s = open_fn(*opt, db_path, column_families, &cf_handles, &db); // we have now finished with db_path env->ReleaseStringUTFChars(jdb_path, db_path); // check if open operation was successful - if (s.ok()) { - const jsize resultsLen = 1 + len_cols; // db handle + column family handles - std::unique_ptr results = - std::unique_ptr(new jlong[resultsLen]); - results[0] = reinterpret_cast(db); - for (int i = 1; i <= len_cols; i++) { - results[i] = reinterpret_cast(handles[i - 1]); - } + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } - jlongArray jresults = env->NewLongArray(resultsLen); - if (jresults == nullptr) { - // exception 
thrown: OutOfMemoryError - return nullptr; - } + const jsize resultsLen = 1 + len_cols; // db handle + column family handles + std::unique_ptr results = + std::unique_ptr(new jlong[resultsLen]); + results[0] = reinterpret_cast(db); + for (int i = 1; i <= len_cols; i++) { + results[i] = reinterpret_cast(cf_handles[i - 1]); + } - env->SetLongArrayRegion(jresults, 0, resultsLen, results.get()); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - env->DeleteLocalRef(jresults); - return nullptr; - } + jlongArray jresults = env->NewLongArray(resultsLen); + if (jresults == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } - return jresults; - } else { - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + env->SetLongArrayRegion(jresults, 0, resultsLen, results.get()); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jresults); return nullptr; } + + return jresults; } /* @@ -174,7 +170,7 @@ jlongArray rocksdb_open_helper( * Signature: (JLjava/lang/String;[[B[J)[J */ jlongArray Java_org_rocksdb_RocksDB_openROnly__JLjava_lang_String_2_3_3B_3J( - JNIEnv* env, jclass /*jcls*/, jlong jopt_handle, jstring jdb_path, + JNIEnv* env, jclass, jlong jopt_handle, jstring jdb_path, jobjectArray jcolumn_names, jlongArray jcolumn_options) { return rocksdb_open_helper( env, jopt_handle, jdb_path, jcolumn_names, jcolumn_options, @@ -192,7 +188,7 @@ jlongArray Java_org_rocksdb_RocksDB_openROnly__JLjava_lang_String_2_3_3B_3J( * Signature: (JLjava/lang/String;[[B[J)[J */ jlongArray Java_org_rocksdb_RocksDB_open__JLjava_lang_String_2_3_3B_3J( - JNIEnv* env, jclass /*jcls*/, jlong jopt_handle, jstring jdb_path, + JNIEnv* env, jclass, jlong jopt_handle, jstring jdb_path, jobjectArray jcolumn_names, jlongArray jcolumn_options) { return rocksdb_open_helper( env, jopt_handle, jdb_path, jcolumn_names, jcolumn_options, @@ -203,18 +199,38 @@ jlongArray Java_org_rocksdb_RocksDB_open__JLjava_lang_String_2_3_3B_3J( rocksdb::DB::Open); } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::ListColumnFamilies +/* + * Class: org_rocksdb_RocksDB + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_RocksDB_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* db = reinterpret_cast(jhandle); + assert(db != nullptr); + delete db; +} + +/* + * Class: org_rocksdb_RocksDB + * Method: closeDatabase + * Signature: (J)V + */ +void Java_org_rocksdb_RocksDB_closeDatabase( + JNIEnv* env, jclass, jlong jhandle) { + auto* db = reinterpret_cast(jhandle); + assert(db != nullptr); + rocksdb::Status s = db->Close(); + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); +} /* * Class: org_rocksdb_RocksDB * Method: listColumnFamilies * Signature: (JLjava/lang/String;)[[B */ -jobjectArray Java_org_rocksdb_RocksDB_listColumnFamilies(JNIEnv* env, - jclass /*jclazz*/, - jlong jopt_handle, - jstring jdb_path) { +jobjectArray Java_org_rocksdb_RocksDB_listColumnFamilies( + JNIEnv* env, jclass, jlong jopt_handle, jstring jdb_path) { std::vector column_family_names; const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); if (db_path == nullptr) { @@ -234,17 +250,211 @@ jobjectArray Java_org_rocksdb_RocksDB_listColumnFamilies(JNIEnv* env, return jcolumn_family_names; } +/* + * Class: org_rocksdb_RocksDB + * Method: createColumnFamily + * Signature: (J[BIJ)J + */ +jlong Java_org_rocksdb_RocksDB_createColumnFamily( + JNIEnv* env, jobject, jlong jhandle, jbyteArray jcf_name, 
+ jint jcf_name_len, jlong jcf_options_handle) { + auto* db = reinterpret_cast(jhandle); + jboolean has_exception = JNI_FALSE; + const std::string cf_name = + rocksdb::JniUtil::byteString(env, jcf_name, jcf_name_len, + [](const char* str, const size_t len) { + return std::string(str, len); + }, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return 0; + } + auto* cf_options = + reinterpret_cast(jcf_options_handle); + rocksdb::ColumnFamilyHandle *cf_handle; + rocksdb::Status s = db->CreateColumnFamily(*cf_options, cf_name, &cf_handle); + if (!s.ok()) { + // error occurred + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return 0; + } + return reinterpret_cast(cf_handle); +} + +/* + * Class: org_rocksdb_RocksDB + * Method: createColumnFamilies + * Signature: (JJ[[B)[J + */ +jlongArray Java_org_rocksdb_RocksDB_createColumnFamilies__JJ_3_3B( + JNIEnv* env, jobject, jlong jhandle, jlong jcf_options_handle, + jobjectArray jcf_names) { + auto* db = reinterpret_cast(jhandle); + auto* cf_options = + reinterpret_cast(jcf_options_handle); + jboolean has_exception = JNI_FALSE; + std::vector cf_names; + rocksdb::JniUtil::byteStrings(env, jcf_names, + [](const char* str, const size_t len) { + return std::string(str, len); + }, + [&cf_names](const size_t, std::string str) { + cf_names.push_back(str); + }, + &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return nullptr; + } + + std::vector cf_handles; + rocksdb::Status s = db->CreateColumnFamilies(*cf_options, cf_names, &cf_handles); + if (!s.ok()) { + // error occurred + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + jlongArray jcf_handles = rocksdb::JniUtil::toJPointers( + env, cf_handles, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return nullptr; + } + return jcf_handles; +} + +/* + * Class: org_rocksdb_RocksDB + * Method: createColumnFamilies + * Signature: (J[J[[B)[J + */ +jlongArray Java_org_rocksdb_RocksDB_createColumnFamilies__J_3J_3_3B( + JNIEnv* env, jobject, jlong jhandle, jlongArray jcf_options_handles, + jobjectArray jcf_names) { + auto* db = reinterpret_cast(jhandle); + const jsize jlen = env->GetArrayLength(jcf_options_handles); + std::vector cf_descriptors; + cf_descriptors.reserve(jlen); + + jboolean jcf_options_handles_is_copy = JNI_FALSE; + jlong *jcf_options_handles_elems = env->GetLongArrayElements(jcf_options_handles, &jcf_options_handles_is_copy); + if(jcf_options_handles_elems == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + // extract the column family descriptors + jboolean has_exception = JNI_FALSE; + for (jsize i = 0; i < jlen; i++) { + auto* cf_options = reinterpret_cast( + jcf_options_handles_elems[i]); + jbyteArray jcf_name = static_cast( + env->GetObjectArrayElement(jcf_names, i)); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->ReleaseLongArrayElements(jcf_options_handles, jcf_options_handles_elems, JNI_ABORT); + return nullptr; + } + const std::string cf_name = + rocksdb::JniUtil::byteString(env, jcf_name, + [](const char* str, const size_t len) { + return std::string(str, len); + }, + &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + env->DeleteLocalRef(jcf_name); + env->ReleaseLongArrayElements(jcf_options_handles, jcf_options_handles_elems, JNI_ABORT); + return nullptr; + } + + cf_descriptors.push_back(rocksdb::ColumnFamilyDescriptor(cf_name, *cf_options)); + + env->DeleteLocalRef(jcf_name); + 
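+    // Releasing the jcf_name local reference on every iteration keeps the
+    // local-reference table bounded when many column families are created in
+    // one call; only the std::string copy is retained in cf_descriptors.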
} + + std::vector cf_handles; + rocksdb::Status s = db->CreateColumnFamilies(cf_descriptors, &cf_handles); + + env->ReleaseLongArrayElements(jcf_options_handles, jcf_options_handles_elems, JNI_ABORT); + + if (!s.ok()) { + // error occurred + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + jlongArray jcf_handles = rocksdb::JniUtil::toJPointers( + env, cf_handles, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return nullptr; + } + return jcf_handles; +} + +/* + * Class: org_rocksdb_RocksDB + * Method: dropColumnFamily + * Signature: (JJ)V; + */ +void Java_org_rocksdb_RocksDB_dropColumnFamily( + JNIEnv* env, jobject, jlong jdb_handle, + jlong jcf_handle) { + auto* db_handle = reinterpret_cast(jdb_handle); + auto* cf_handle = reinterpret_cast(jcf_handle); + rocksdb::Status s = db_handle->DropColumnFamily(cf_handle); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } +} + +/* + * Class: org_rocksdb_RocksDB + * Method: dropColumnFamilies + * Signature: (J[J)V + */ +void Java_org_rocksdb_RocksDB_dropColumnFamilies( + JNIEnv* env, jobject, jlong jdb_handle, + jlongArray jcolumn_family_handles) { + auto* db_handle = reinterpret_cast(jdb_handle); + + std::vector cf_handles; + if (jcolumn_family_handles != nullptr) { + const jsize len_cols = env->GetArrayLength(jcolumn_family_handles); + + jlong* jcfh = env->GetLongArrayElements(jcolumn_family_handles, nullptr); + if (jcfh == nullptr) { + // exception thrown: OutOfMemoryError + return; + } + + for (jsize i = 0; i < len_cols; i++) { + auto* cf_handle = reinterpret_cast(jcfh[i]); + cf_handles.push_back(cf_handle); + } + env->ReleaseLongArrayElements(jcolumn_family_handles, jcfh, JNI_ABORT); + } + + rocksdb::Status s = db_handle->DropColumnFamilies(cf_handles); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } +} + ////////////////////////////////////////////////////////////////////////////// // rocksdb::DB::Put /** * @return true if the put succeeded, false if a Java Exception was thrown */ -bool rocksdb_put_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::WriteOptions& write_options, - rocksdb::ColumnFamilyHandle* cf_handle, jbyteArray jkey, - jint jkey_off, jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { +bool rocksdb_put_helper( + JNIEnv* env, rocksdb::DB* db, + const rocksdb::WriteOptions& write_options, + rocksdb::ColumnFamilyHandle* cf_handle, jbyteArray jkey, + jint jkey_off, jint jkey_len, jbyteArray jval, + jint jval_off, jint jval_len) { jbyte* key = new jbyte[jkey_len]; env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); if (env->ExceptionCheck()) { @@ -290,17 +500,15 @@ bool rocksdb_put_helper(JNIEnv* env, rocksdb::DB* db, * Method: put * Signature: (J[BII[BII)V */ -void Java_org_rocksdb_RocksDB_put__J_3BII_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { +void Java_org_rocksdb_RocksDB_put__J_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { auto* db = reinterpret_cast(jdb_handle); static const rocksdb::WriteOptions default_write_options = rocksdb::WriteOptions(); - rocksdb_put_helper(env, db, default_write_options, nullptr, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len); + jkey_len, jval, jval_off, jval_len); } /* @@ -308,19 +516,18 @@ void Java_org_rocksdb_RocksDB_put__J_3BII_3BII(JNIEnv* env, jobject 
/* @@ -308,19 +516,18 @@ void Java_org_rocksdb_RocksDB_put__J_3BII_3BII(JNIEnv* env, jobject /*jdb*/, * Method: put * Signature: (J[BII[BIIJ)V */ -void Java_org_rocksdb_RocksDB_put__J_3BII_3BIIJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len, - jlong jcf_handle) { +void Java_org_rocksdb_RocksDB_put__J_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len, + jlong jcf_handle) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); static const rocksdb::WriteOptions default_write_options = rocksdb::WriteOptions(); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { rocksdb_put_helper(env, db, default_write_options, cf_handle, jkey, - jkey_off, jkey_len, jval, jval_off, jval_len); + jkey_off, jkey_len, jval, jval_off, jval_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); @@ -332,18 +539,16 @@ void Java_org_rocksdb_RocksDB_put__J_3BII_3BIIJ(JNIEnv* env, jobject /*jdb*/, * Method: put * Signature: (JJ[BII[BII)V */ -void Java_org_rocksdb_RocksDB_put__JJ_3BII_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jwrite_options_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { +void Java_org_rocksdb_RocksDB_put__JJ_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jlong jwrite_options_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); auto* write_options = reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); - rocksdb_put_helper(env, db, *write_options, nullptr, jkey, jkey_off, jkey_len, - jval, jval_off, jval_len); + jval, jval_off, jval_len); } /* @@ -352,16 +557,17 @@ void Java_org_rocksdb_RocksDB_put__JJ_3BII_3BII(JNIEnv* env, jobject /*jdb*/, * Signature: (JJ[BII[BIIJ)V */ void Java_org_rocksdb_RocksDB_put__JJ_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len, jlong jcf_handle) { + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len, + jlong jcf_handle) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); auto* write_options = reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { rocksdb_put_helper(env, db, *write_options, cf_handle, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len); + jkey_len, jval, jval_off, jval_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); @@ -369,1174 +575,1148 @@ void Java_org_rocksdb_RocksDB_put__JJ_3BII_3BIIJ( } ////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::Write -/* - * Class: org_rocksdb_RocksDB - * Method: write0 - * Signature: (JJJ)V - */ -void Java_org_rocksdb_RocksDB_write0(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jwrite_options_handle, - jlong jwb_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); - auto* wb = reinterpret_cast<rocksdb::WriteBatch*>(jwb_handle); - - rocksdb::Status s = db->Write(*write_options, wb); - - if (!s.ok()) { - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - } -} +// rocksdb::DB::Delete() -/*
- * Class: org_rocksdb_RocksDB - * Method: write1 - * Signature: (JJJ)V +/** + * @return true if the delete succeeded, false if a Java Exception was thrown */ -void Java_org_rocksdb_RocksDB_write1(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jwrite_options_handle, - jlong jwbwi_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); - auto* wbwi = reinterpret_cast<rocksdb::WriteBatchWithIndex*>(jwbwi_handle); - auto* wb = wbwi->GetWriteBatch(); - - rocksdb::Status s = db->Write(*write_options, wb); - - if (!s.ok()) { - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - } -} - -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::KeyMayExist -jboolean key_may_exist_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::ReadOptions& read_opt, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, - jobject jstring_builder, bool* has_exception) { +bool rocksdb_delete_helper( + JNIEnv* env, rocksdb::DB* db, const rocksdb::WriteOptions& write_options, + rocksdb::ColumnFamilyHandle* cf_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len) { jbyte* key = new jbyte[jkey_len]; env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); if (env->ExceptionCheck()) { // exception thrown: ArrayIndexOutOfBoundsException delete[] key; - *has_exception = true; return false; } - rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); - std::string value; - bool value_found = false; - bool keyMayExist; + rocksdb::Status s; if (cf_handle != nullptr) { - keyMayExist = - db->KeyMayExist(read_opt, cf_handle, key_slice, &value, &value_found); + s = db->Delete(write_options, cf_handle, key_slice); } else { - keyMayExist = db->KeyMayExist(read_opt, key_slice, &value, &value_found); + // backwards compatibility + s = db->Delete(write_options, key_slice); } // cleanup delete[] key; - // extract the value - if (value_found && !value.empty()) { - jobject jresult_string_builder = - rocksdb::StringBuilderJni::append(env, jstring_builder, value.c_str()); - if (jresult_string_builder == nullptr) { - *has_exception = true; - return false; - } + if (s.ok()) { + return true; } - *has_exception = false; - return static_cast<jboolean>(keyMayExist); + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return false; } /* * Class: org_rocksdb_RocksDB - * Method: keyMayExist - * Signature: (J[BIILjava/lang/StringBuilder;)Z + * Method: delete + * Signature: (J[BII)V */ -jboolean Java_org_rocksdb_RocksDB_keyMayExist__J_3BIILjava_lang_StringBuilder_2( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jkey, - jint jkey_off, jint jkey_len, jobject jstring_builder) { +void Java_org_rocksdb_RocksDB_delete__J_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - bool has_exception = false; - return key_may_exist_helper(env, db, rocksdb::ReadOptions(), nullptr, jkey, - jkey_off, jkey_len, jstring_builder, - &has_exception); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + rocksdb_delete_helper(env, db, default_write_options, nullptr, jkey, jkey_off, + jkey_len); } /* * Class: org_rocksdb_RocksDB - * Method: keyMayExist - * Signature: (J[BIIJLjava/lang/StringBuilder;)Z + * Method: delete + * Signature: (J[BIIJ)V */ -jboolean -Java_org_rocksdb_RocksDB_keyMayExist__J_3BIIJLjava_lang_StringBuilder_2( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jkey, - jint jkey_off, jint
jkey_len, jlong jcf_handle, jobject jstring_builder) { +void Java_org_rocksdb_RocksDB_delete__J_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jlong jcf_handle) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - bool has_exception = false; - return key_may_exist_helper(env, db, rocksdb::ReadOptions(), cf_handle, - jkey, jkey_off, jkey_len, jstring_builder, - &has_exception); + rocksdb_delete_helper(env, db, default_write_options, cf_handle, jkey, + jkey_off, jkey_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - return true; } } /* * Class: org_rocksdb_RocksDB - * Method: keyMayExist - * Signature: (JJ[BIILjava/lang/StringBuilder;)Z + * Method: delete + * Signature: (JJ[BII)V */ -jboolean -Java_org_rocksdb_RocksDB_keyMayExist__JJ_3BIILjava_lang_StringBuilder_2( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jread_options_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jobject jstring_builder) { +void Java_org_rocksdb_RocksDB_delete__JJ_3BII( + JNIEnv* env, jobject, + jlong jdb_handle, + jlong jwrite_options, + jbyteArray jkey, jint jkey_off, jint jkey_len) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto& read_options = - *reinterpret_cast<rocksdb::ReadOptions*>(jread_options_handle); - bool has_exception = false; - return key_may_exist_helper(env, db, read_options, nullptr, jkey, jkey_off, - jkey_len, jstring_builder, &has_exception); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); + rocksdb_delete_helper(env, db, *write_options, nullptr, jkey, jkey_off, + jkey_len); } /* * Class: org_rocksdb_RocksDB - * Method: keyMayExist - * Signature: (JJ[BIIJLjava/lang/StringBuilder;)Z + * Method: delete + * Signature: (JJ[BIIJ)V */ -jboolean -Java_org_rocksdb_RocksDB_keyMayExist__JJ_3BIIJLjava_lang_StringBuilder_2( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jread_options_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jlong jcf_handle, - jobject jstring_builder) { +void Java_org_rocksdb_RocksDB_delete__JJ_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options, + jbyteArray jkey, jint jkey_off, jint jkey_len, jlong jcf_handle) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto& read_options = - *reinterpret_cast<rocksdb::ReadOptions*>(jread_options_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - bool has_exception = false; - return key_may_exist_helper(env, db, read_options, cf_handle, jkey, - jkey_off, jkey_len, jstring_builder, - &has_exception); + rocksdb_delete_helper(env, db, *write_options, cf_handle, jkey, jkey_off, + jkey_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); } }
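// A note on the exported symbols in the Delete block above: javah mangles
// overloaded native methods by appending the JNI-encoded parameter list, so
// Java_org_rocksdb_RocksDB_delete__JJ_3BIIJ binds the Java-side native method
// delete(long, long, byte[], int, int, long): J encodes long, _3B encodes
// byte[], I encodes int. The C++ operation each wrapper performs reduces to
// the following sketch, assuming an open rocksdb::DB* `db` (illustrative
// only, not part of this patch):
//
//   rocksdb::Slice key_slice("key1");
//   rocksdb::Status s = db->Delete(rocksdb::WriteOptions(), key_slice);
//   // on !s.ok(), rocksdb_delete_helper raises a Java RocksDBException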
////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::Get - -jbyteArray rocksdb_get_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::ReadOptions& read_opt, - rocksdb::ColumnFamilyHandle* column_family_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len) { - jbyte* key = new jbyte[jkey_len]; - env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - delete[] key; - return nullptr; +// rocksdb::DB::SingleDelete() +/** + * @return true if the single delete succeeded, false if a Java Exception + * was thrown + */ +bool rocksdb_single_delete_helper( + JNIEnv* env, rocksdb::DB* db, + const rocksdb::WriteOptions& write_options, + rocksdb::ColumnFamilyHandle* cf_handle, + jbyteArray jkey, jint jkey_len) { + jbyte* key = env->GetByteArrayElements(jkey, nullptr); + if (key == nullptr) { + // exception thrown: OutOfMemoryError + return false; } - rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); - std::string value; rocksdb::Status s; - if (column_family_handle != nullptr) { - s = db->Get(read_opt, column_family_handle, key_slice, &value); + if (cf_handle != nullptr) { + s = db->SingleDelete(write_options, cf_handle, key_slice); } else { // backwards compatibility - s = db->Get(read_opt, key_slice, &value); + s = db->SingleDelete(write_options, key_slice); } - // cleanup - delete[] key; - - if (s.IsNotFound()) { - return nullptr; - } + // trigger java unref on key and value. + // by passing JNI_ABORT, it will simply release the reference without + // copying the result back to the java byte array. + env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); if (s.ok()) { - jbyteArray jret_value = rocksdb::JniUtil::copyBytes(env, value); - if (jret_value == nullptr) { - // exception occurred - return nullptr; - } - return jret_value; + return true; } rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return nullptr; + return false; } /* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (J[BII)[B + * Method: singleDelete + * Signature: (J[BI)V */ -jbyteArray Java_org_rocksdb_RocksDB_get__J_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len) { - return rocksdb_get_helper(env, reinterpret_cast<rocksdb::DB*>(jdb_handle), - rocksdb::ReadOptions(), nullptr, jkey, jkey_off, - jkey_len); +void Java_org_rocksdb_RocksDB_singleDelete__J_3BI( + JNIEnv* env, jobject, + jlong jdb_handle, + jbyteArray jkey, + jint jkey_len) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + rocksdb_single_delete_helper(env, db, default_write_options, nullptr, + jkey, jkey_len); } /* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (J[BIIJ)[B + * Method: singleDelete + * Signature: (J[BIJ)V */ -jbyteArray Java_org_rocksdb_RocksDB_get__J_3BIIJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, - jlong jcf_handle) { - auto db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); +void Java_org_rocksdb_RocksDB_singleDelete__J_3BIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_len, jlong jcf_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - return rocksdb_get_helper(env, db_handle, rocksdb::ReadOptions(), cf_handle, - jkey, jkey_off, jkey_len); + rocksdb_single_delete_helper(env, db, default_write_options, cf_handle, + jkey, jkey_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); } }
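// SingleDelete, wired up above, is not a drop-in replacement for Delete: per
// the documented RocksDB contract it is only well-defined when the key was
// Put exactly once since the last deletion and has not been overwritten. In
// exchange, the tombstone can be dropped together with the matching Put during
// compaction instead of lingering until the bottom level. A C++ sketch of the
// intended pattern, assuming an open rocksdb::DB* `db` (illustrative only,
// not part of this patch):
//
//   rocksdb::WriteOptions wo;
//   db->Put(wo, "k", "v");        // exactly one Put of "k"
//   db->SingleDelete(wo, "k");    // safe: cancels that single Put
//
// The wrappers that follow add the WriteOptions-handle variants of the same
// call.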
/* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (JJ[BII)[B + * Method: singleDelete + * Signature: (JJ[BIJ)V */ -jbyteArray Java_org_rocksdb_RocksDB_get__JJ_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jropt_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len) { - return rocksdb_get_helper( - env, reinterpret_cast<rocksdb::DB*>(jdb_handle), - *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), nullptr, jkey, - jkey_off, jkey_len); +void Java_org_rocksdb_RocksDB_singleDelete__JJ_3BI( + JNIEnv* env, jobject, jlong jdb_handle, + jlong jwrite_options, + jbyteArray jkey, + jint jkey_len) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); + rocksdb_single_delete_helper(env, db, *write_options, nullptr, jkey, + jkey_len); } /* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (JJ[BIIJ)[B + * Method: singleDelete + * Signature: (JJ[BIJ)V */ -jbyteArray Java_org_rocksdb_RocksDB_get__JJ_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jropt_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jlong jcf_handle) { - auto* db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto& ro_opt = *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle); +void Java_org_rocksdb_RocksDB_singleDelete__JJ_3BIJ( + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options, + jbyteArray jkey, jint jkey_len, jlong jcf_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - return rocksdb_get_helper(env, db_handle, ro_opt, cf_handle, jkey, jkey_off, - jkey_len); + rocksdb_single_delete_helper(env, db, *write_options, cf_handle, jkey, + jkey_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - return nullptr; } } -jint rocksdb_get_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::ReadOptions& read_options, - rocksdb::ColumnFamilyHandle* column_family_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, - jbyteArray jval, jint jval_off, jint jval_len, - bool* has_exception) { - static const int kNotFound = -1; - static const int kStatusError = -2; - - jbyte* key = new jbyte[jkey_len]; - env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); +////////////////////////////////////////////////////////////////////////////// +// rocksdb::DB::DeleteRange() +/** + * @return true if the delete range succeeded, false if a Java Exception + * was thrown + */ +bool rocksdb_delete_range_helper( + JNIEnv* env, rocksdb::DB* db, + const rocksdb::WriteOptions& write_options, + rocksdb::ColumnFamilyHandle* cf_handle, + jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, + jbyteArray jend_key, jint jend_key_off, jint jend_key_len) { + jbyte* begin_key = new jbyte[jbegin_key_len]; + env->GetByteArrayRegion(jbegin_key, jbegin_key_off, jbegin_key_len, + begin_key); if (env->ExceptionCheck()) { - // exception thrown: OutOfMemoryError - delete[] key; - *has_exception = true; - return kStatusError; + // exception thrown: ArrayIndexOutOfBoundsException + delete[] begin_key; + return false; } - rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); + rocksdb::Slice begin_key_slice(reinterpret_cast<char*>(begin_key), + jbegin_key_len); - // TODO(yhchiang): we might save one memory allocation here by adding - // a DB::Get() function which takes preallocated jbyte* as input.
- std::string cvalue; - rocksdb::Status s; - if (column_family_handle != nullptr) { - s = db->Get(read_options, column_family_handle, key_slice, &cvalue); - } else { - // backwards compatibility - s = db->Get(read_options, key_slice, &cvalue); + jbyte* end_key = new jbyte[jend_key_len]; + env->GetByteArrayRegion(jend_key, jend_key_off, jend_key_len, end_key); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + delete[] begin_key; + delete[] end_key; + return false; } + rocksdb::Slice end_key_slice(reinterpret_cast<char*>(end_key), jend_key_len); - // cleanup - delete[] key; + rocksdb::Status s = + db->DeleteRange(write_options, cf_handle, begin_key_slice, end_key_slice); - if (s.IsNotFound()) { - *has_exception = false; - return kNotFound; - } else if (!s.ok()) { - *has_exception = true; - // Here since we are throwing a Java exception from c++ side. - // As a result, c++ does not know calling this function will in fact - // throwing an exception. As a result, the execution flow will - // not stop here, and codes after this throw will still be - // executed. - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + // cleanup + delete[] begin_key; + delete[] end_key; - // Return a dummy const value to avoid compilation error, although - // java side might not have a chance to get the return value :) - return kStatusError; + if (s.ok()) { + return true; } - const jint cvalue_len = static_cast<jint>(cvalue.size()); - const jint length = std::min(jval_len, cvalue_len); - - env->SetByteArrayRegion( - jval, jval_off, length, - const_cast<jbyte*>(reinterpret_cast<const jbyte*>(cvalue.c_str()))); - if (env->ExceptionCheck()) { - // exception thrown: OutOfMemoryError - *has_exception = true; - return kStatusError; - } + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return false; +} - *has_exception = false; - return cvalue_len; +/* + * Class: org_rocksdb_RocksDB + * Method: deleteRange + * Signature:
(J[BII[BII)V - */ -void Java_org_rocksdb_RocksDB_deleteRange__J_3BII_3BII( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin_key, - jint jbegin_key_off, jint jbegin_key_len, jbyteArray jend_key, - jint jend_key_off, jint jend_key_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - rocksdb_delete_range_helper(env, db, default_write_options, nullptr, - jbegin_key, jbegin_key_off, jbegin_key_len, - jend_key, jend_key_off, jend_key_len); + */ +void Java_org_rocksdb_RocksDB_deleteRange__J_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, + jbyteArray jend_key, jint jend_key_off, jint jend_key_len) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + rocksdb_delete_range_helper(env, db, default_write_options, nullptr, + jbegin_key, jbegin_key_off, jbegin_key_len, + jend_key, jend_key_off, jend_key_len); } /* * Class: org_rocksdb_RocksDB - * Method: deleteRange - * Signature: (J[BII[BIIJ)V + * Method: deleteRange + * Signature: (J[BII[BIIJ)V */ -void Java_org_rocksdb_RocksDB_deleteRange__J_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin_key, - jint jbegin_key_off, jint jbegin_key_len, jbyteArray jend_key, - jint jend_key_off, jint jend_key_len, jlong jcf_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); - if (cf_handle != nullptr) { - rocksdb_delete_range_helper(env, db, default_write_options, cf_handle, - jbegin_key, jbegin_key_off, jbegin_key_len, - jend_key, jend_key_off, jend_key_len); - } else { - rocksdb::RocksDBExceptionJni::ThrowNew( - env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - } +void Java_org_rocksdb_RocksDB_deleteRange__J_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, + jbyteArray jend_key, jint jend_key_off, jint jend_key_len, + jlong jcf_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); + if (cf_handle != nullptr) { + rocksdb_delete_range_helper(env, db, default_write_options, cf_handle, + jbegin_key, jbegin_key_off, jbegin_key_len, + jend_key, jend_key_off, jend_key_len); + } else { + rocksdb::RocksDBExceptionJni::ThrowNew( + env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); }
keys_to_free.clear(); } -/** - * cf multi get - * - * @return byte[][] of values or nullptr if an exception occurs +/* + * Class: org_rocksdb_RocksDB + * Method: deleteRange + * Signature: (JJ[BII[BII)V */ -jobjectArray multi_get_helper(JNIEnv* env, jobject /*jdb*/, rocksdb::DB* db, - const rocksdb::ReadOptions& rOpt, - jobjectArray jkeys, jintArray jkey_offs, - jintArray jkey_lens, - jlongArray jcolumn_family_handles) { - std::vector<rocksdb::ColumnFamilyHandle*> cf_handles; - if (jcolumn_family_handles != nullptr) { - const jsize len_cols = env->GetArrayLength(jcolumn_family_handles); - - jlong* jcfh = env->GetLongArrayElements(jcolumn_family_handles, nullptr); - if (jcfh == nullptr) { - // exception thrown: OutOfMemoryError - return nullptr; - } - - for (jsize i = 0; i < len_cols; i++) { - auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcfh[i]); - cf_handles.push_back(cf_handle); - } - env->ReleaseLongArrayElements(jcolumn_family_handles, jcfh, JNI_ABORT); - } +void Java_org_rocksdb_RocksDB_deleteRange__JJ_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options, + jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, + jbyteArray jend_key, jint jend_key_off, jint jend_key_len) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); + rocksdb_delete_range_helper(env, db, *write_options, nullptr, jbegin_key, + jbegin_key_off, jbegin_key_len, jend_key, + jend_key_off, jend_key_len); +} - const jsize len_keys = env->GetArrayLength(jkeys); - if (env->EnsureLocalCapacity(len_keys) != 0) { - // exception thrown: OutOfMemoryError - return nullptr; +/* + * Class: org_rocksdb_RocksDB + * Method: deleteRange + * Signature: (JJ[BII[BIIJ)V + */ +void Java_org_rocksdb_RocksDB_deleteRange__JJ_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options, + jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, + jbyteArray jend_key, jint jend_key_off, jint jend_key_len, + jlong jcf_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); + auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); + if (cf_handle != nullptr) { + rocksdb_delete_range_helper(env, db, *write_options, cf_handle, + jbegin_key, jbegin_key_off, jbegin_key_len, + jend_key, jend_key_off, jend_key_len); + } else { + rocksdb::RocksDBExceptionJni::ThrowNew( + env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); } +} - jint* jkey_off = env->GetIntArrayElements(jkey_offs, nullptr); - if (jkey_off == nullptr) { - // exception thrown: OutOfMemoryError - return nullptr; - } +////////////////////////////////////////////////////////////////////////////// +// rocksdb::DB::Merge - jint* jkey_len = env->GetIntArrayElements(jkey_lens, nullptr); - if (jkey_len == nullptr) { - // exception thrown: OutOfMemoryError - env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); - return nullptr; +/** + * @return true if the merge succeeded, false if a Java Exception was thrown + */ +bool rocksdb_merge_helper( + JNIEnv* env, rocksdb::DB* db, const rocksdb::WriteOptions& write_options, + rocksdb::ColumnFamilyHandle* cf_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { + jbyte* key = new jbyte[jkey_len]; + env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + delete[] key; + return false; } + rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); - std::vector<rocksdb::Slice> keys; -
std::vector<std::pair<jbyte*, jobject>> keys_to_free; - for (jsize i = 0; i < len_keys; i++) { - jobject jkey = env->GetObjectArrayElement(jkeys, i); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - env->ReleaseIntArrayElements(jkey_lens, jkey_len, JNI_ABORT); - env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); - multi_get_helper_release_keys(env, keys_to_free); - return nullptr; - } - - jbyteArray jkey_ba = reinterpret_cast<jbyteArray>(jkey); - - const jint len_key = jkey_len[i]; - jbyte* key = new jbyte[len_key]; - env->GetByteArrayRegion(jkey_ba, jkey_off[i], len_key, key); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - delete[] key; - env->DeleteLocalRef(jkey); - env->ReleaseIntArrayElements(jkey_lens, jkey_len, JNI_ABORT); - env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); - multi_get_helper_release_keys(env, keys_to_free); - return nullptr; - } - - rocksdb::Slice key_slice(reinterpret_cast<char*>(key), len_key); - keys.push_back(key_slice); - - keys_to_free.push_back(std::pair<jbyte*, jobject>(key, jkey)); + jbyte* value = new jbyte[jval_len]; + env->GetByteArrayRegion(jval, jval_off, jval_len, value); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + delete[] value; + delete[] key; + return false; } + rocksdb::Slice value_slice(reinterpret_cast<char*>(value), jval_len); - // cleanup jkey_off and jken_len - env->ReleaseIntArrayElements(jkey_lens, jkey_len, JNI_ABORT); - env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); - - std::vector<std::string> values; - std::vector<rocksdb::Status> s; - if (cf_handles.size() == 0) { - s = db->MultiGet(rOpt, keys, &values); + rocksdb::Status s; + if (cf_handle != nullptr) { + s = db->Merge(write_options, cf_handle, key_slice, value_slice); } else { - s = db->MultiGet(rOpt, cf_handles, keys, &values); - } - - // free up allocated byte arrays - multi_get_helper_release_keys(env, keys_to_free); - - // prepare the results - jobjectArray jresults = - rocksdb::ByteJni::new2dByteArray(env, static_cast<jsize>(s.size())); - if (jresults == nullptr) { - // exception occurred - return nullptr; + s = db->Merge(write_options, key_slice, value_slice); } - // TODO(AR) it is not clear to me why EnsureLocalCapacity is needed for the - // loop as we cleanup references with env->DeleteLocalRef(jentry_value); - if (env->EnsureLocalCapacity(static_cast<jint>(s.size())) != 0) { - // exception thrown: OutOfMemoryError - return nullptr; - } - // add to the jresults - for (std::vector<rocksdb::Status>::size_type i = 0; i != s.size(); i++) { - if (s[i].ok()) { - std::string* value = &values[i]; - const jsize jvalue_len = static_cast<jsize>(value->size()); - jbyteArray jentry_value = env->NewByteArray(jvalue_len); - if (jentry_value == nullptr) { - // exception thrown: OutOfMemoryError - return nullptr; - } - - env->SetByteArrayRegion( - jentry_value, 0, static_cast<jsize>(jvalue_len), - const_cast<jbyte*>(reinterpret_cast<const jbyte*>(value->c_str()))); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - env->DeleteLocalRef(jentry_value); - return nullptr; - } - - env->SetObjectArrayElement(jresults, static_cast<jsize>(i), jentry_value); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - env->DeleteLocalRef(jentry_value); - return nullptr; - } + // cleanup + delete[] value; + delete[] key; - env->DeleteLocalRef(jentry_value); - } + if (s.ok()) { + return true; } - return jresults; -} - -/* - * Class: org_rocksdb_RocksDB - * Method: multiGet - * Signature: (J[[B[I[I)[[B - */ -jobjectArray
Java_org_rocksdb_RocksDB_multiGet__J_3_3B_3I_3I( - JNIEnv* env, jobject jdb, jlong jdb_handle, jobjectArray jkeys, - jintArray jkey_offs, jintArray jkey_lens) { - return multi_get_helper(env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), - rocksdb::ReadOptions(), jkeys, jkey_offs, jkey_lens, - nullptr); -} - -/* - * Class: org_rocksdb_RocksDB - * Method: multiGet - * Signature: (J[[B[I[I[J)[[B - */ -jobjectArray Java_org_rocksdb_RocksDB_multiGet__J_3_3B_3I_3I_3J( - JNIEnv* env, jobject jdb, jlong jdb_handle, jobjectArray jkeys, - jintArray jkey_offs, jintArray jkey_lens, - jlongArray jcolumn_family_handles) { - return multi_get_helper(env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), - rocksdb::ReadOptions(), jkeys, jkey_offs, jkey_lens, - jcolumn_family_handles); + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return false; } /* * Class: org_rocksdb_RocksDB - * Method: multiGet - * Signature: (JJ[[B[I[I)[[B + * Method: merge + * Signature: (J[BII[BII)V */ -jobjectArray Java_org_rocksdb_RocksDB_multiGet__JJ_3_3B_3I_3I( - JNIEnv* env, jobject jdb, jlong jdb_handle, jlong jropt_handle, - jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens) { - return multi_get_helper( - env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), - *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), jkeys, jkey_offs, - jkey_lens, nullptr); +void Java_org_rocksdb_RocksDB_merge__J_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + rocksdb_merge_helper(env, db, default_write_options, nullptr, jkey, jkey_off, + jkey_len, jval, jval_off, jval_len); } /* * Class: org_rocksdb_RocksDB - * Method: multiGet - * Signature: (JJ[[B[I[I[J)[[B + * Method: merge + * Signature: (J[BII[BIIJ)V */ -jobjectArray Java_org_rocksdb_RocksDB_multiGet__JJ_3_3B_3I_3I_3J( - JNIEnv* env, jobject jdb, jlong jdb_handle, jlong jropt_handle, - jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens, - jlongArray jcolumn_family_handles) { - return multi_get_helper( - env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), - *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), jkeys, jkey_offs, - jkey_lens, jcolumn_family_handles); +void Java_org_rocksdb_RocksDB_merge__J_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len, + jlong jcf_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + static const rocksdb::WriteOptions default_write_options = + rocksdb::WriteOptions(); + auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); + if (cf_handle != nullptr) { + rocksdb_merge_helper(env, db, default_write_options, cf_handle, jkey, + jkey_off, jkey_len, jval, jval_off, jval_len); + } else { + rocksdb::RocksDBExceptionJni::ThrowNew( + env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + } }
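// The Merge wrappers above and below forward to DB::Merge, which is only
// meaningful when a merge operator has been configured on the database;
// without one RocksDB returns a non-ok status that these helpers surface as a
// Java RocksDBException. A C++ sketch of the required setup (illustrative
// only, not part of this patch; `MyAppendOperator` is a hypothetical
// rocksdb::MergeOperator implementation):
//
//   rocksdb::Options opts;
//   opts.create_if_missing = true;
//   opts.merge_operator = std::make_shared<MyAppendOperator>();
//   rocksdb::DB* db;
//   rocksdb::DB::Open(opts, "/tmp/testdb", &db);
//   db->Merge(rocksdb::WriteOptions(), "k", "a");
//   db->Merge(rocksdb::WriteOptions(), "k", "b");  // combined on read/compact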
/* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (J[BII[BII)I + * Method: merge + * Signature: (JJ[BII[BII)V */ -jint Java_org_rocksdb_RocksDB_get__J_3BII_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { - bool has_exception = false; - return rocksdb_get_helper(env, reinterpret_cast<rocksdb::DB*>(jdb_handle), - rocksdb::ReadOptions(), nullptr, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len, &has_exception); +void Java_org_rocksdb_RocksDB_merge__JJ_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); + rocksdb_merge_helper(env, db, *write_options, nullptr, jkey, jkey_off, + jkey_len, jval, jval_off, jval_len); } /* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (J[BII[BIIJ)I + * Method: merge + * Signature: (JJ[BII[BIIJ)V */ -jint Java_org_rocksdb_RocksDB_get__J_3BII_3BIIJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len, - jlong jcf_handle) { - auto* db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); +void Java_org_rocksdb_RocksDB_merge__JJ_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, jlong jwrite_options_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len, jlong jcf_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - bool has_exception = false; - return rocksdb_get_helper(env, db_handle, rocksdb::ReadOptions(), cf_handle, - jkey, jkey_off, jkey_len, jval, jval_off, - jval_len, &has_exception); + rocksdb_merge_helper(env, db, *write_options, cf_handle, jkey, jkey_off, + jkey_len, jval, jval_off, jval_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - // will never be evaluated - return 0; } } +jlong rocksdb_iterator_helper(rocksdb::DB* db, + rocksdb::ReadOptions read_options, + rocksdb::ColumnFamilyHandle* cf_handle) { + rocksdb::Iterator* iterator = nullptr; + if (cf_handle != nullptr) { + iterator = db->NewIterator(read_options, cf_handle); + } else { + iterator = db->NewIterator(read_options); + } + return reinterpret_cast<jlong>(iterator); +} + +////////////////////////////////////////////////////////////////////////////// +// rocksdb::DB::Write /* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (JJ[BII[BII)I + * Method: write0 + * Signature: (JJJ)V */ -jint Java_org_rocksdb_RocksDB_get__JJ_3BII_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jropt_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { - bool has_exception = false; - return rocksdb_get_helper( - env, reinterpret_cast<rocksdb::DB*>(jdb_handle), - *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), nullptr, jkey, - jkey_off, jkey_len, jval, jval_off, jval_len, &has_exception); +void Java_org_rocksdb_RocksDB_write0( + JNIEnv* env, jobject, jlong jdb_handle, + jlong jwrite_options_handle, jlong jwb_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); + auto* wb = reinterpret_cast<rocksdb::WriteBatch*>(jwb_handle); + + rocksdb::Status s = db->Write(*write_options, wb); + + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } } /* * Class: org_rocksdb_RocksDB - * Method: get - * Signature: (JJ[BII[BIIJ)I + * Method: write1 + * Signature: (JJJ)V */ -jint Java_org_rocksdb_RocksDB_get__JJ_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jropt_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len, jlong jcf_handle) { - auto* db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto& ro_opt = *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle); -
auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); - if (cf_handle != nullptr) { - bool has_exception = false; - return rocksdb_get_helper(env, db_handle, ro_opt, cf_handle, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len, - &has_exception); - } else { - rocksdb::RocksDBExceptionJni::ThrowNew( - env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - // will never be evaluated - return 0; +void Java_org_rocksdb_RocksDB_write1( + JNIEnv* env, jobject, jlong jdb_handle, + jlong jwrite_options_handle, jlong jwbwi_handle) { + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto* write_options = + reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); + auto* wbwi = reinterpret_cast<rocksdb::WriteBatchWithIndex*>(jwbwi_handle); + auto* wb = wbwi->GetWriteBatch(); + + rocksdb::Status s = db->Write(*write_options, wb); + + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); } } ////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::Delete() +// rocksdb::DB::Get -/** - * @return true if the delete succeeded, false if a Java Exception was thrown - */ -bool rocksdb_delete_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::WriteOptions& write_options, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len) { +jbyteArray rocksdb_get_helper( + JNIEnv* env, rocksdb::DB* db, + const rocksdb::ReadOptions& read_opt, + rocksdb::ColumnFamilyHandle* column_family_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len) { jbyte* key = new jbyte[jkey_len]; env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); if (env->ExceptionCheck()) { // exception thrown: ArrayIndexOutOfBoundsException delete[] key; - return false; + return nullptr; } + rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); + std::string value; rocksdb::Status s; - if (cf_handle != nullptr) { - s = db->Delete(write_options, cf_handle, key_slice); + if (column_family_handle != nullptr) { + s = db->Get(read_opt, column_family_handle, key_slice, &value); } else { // backwards compatibility - s = db->Delete(write_options, key_slice); + s = db->Get(read_opt, key_slice, &value); } // cleanup delete[] key; + if (s.IsNotFound()) { + return nullptr; + } + if (s.ok()) { - return true; + jbyteArray jret_value = rocksdb::JniUtil::copyBytes(env, value); + if (jret_value == nullptr) { + // exception occurred + return nullptr; + } + return jret_value; } rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return false; + return nullptr; } /* * Class: org_rocksdb_RocksDB - * Method: delete - * Signature: (J[BII)V + * Method: get + * Signature: (J[BII)[B */ -void Java_org_rocksdb_RocksDB_delete__J_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, jbyteArray jkey, - jint jkey_off, jint jkey_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - rocksdb_delete_helper(env, db, default_write_options, nullptr, jkey, jkey_off, - jkey_len); +jbyteArray Java_org_rocksdb_RocksDB_get__J_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len) { + return rocksdb_get_helper(env, reinterpret_cast<rocksdb::DB*>(jdb_handle), + rocksdb::ReadOptions(), nullptr, jkey, jkey_off, jkey_len); } /* * Class: org_rocksdb_RocksDB - * Method: delete - * Signature: (J[BIIJ)V + * Method: get + * Signature: (J[BIIJ)[B */ -void Java_org_rocksdb_RocksDB_delete__J_3BIIJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, jbyteArray jkey, - jint jkey_off, jint jkey_len, - jlong jcf_handle) { - auto* db =
reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); +jbyteArray Java_org_rocksdb_RocksDB_get__J_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, jlong jcf_handle) { + auto db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - rocksdb_delete_helper(env, db, default_write_options, cf_handle, jkey, - jkey_off, jkey_len); + return rocksdb_get_helper(env, db_handle, rocksdb::ReadOptions(), cf_handle, + jkey, jkey_off, jkey_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + return nullptr; } } /* * Class: org_rocksdb_RocksDB - * Method: delete - * Signature: (JJ[BII)V + * Method: get + * Signature: (JJ[BII)[B */ -void Java_org_rocksdb_RocksDB_delete__JJ_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jwrite_options, - jbyteArray jkey, jint jkey_off, - jint jkey_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); - rocksdb_delete_helper(env, db, *write_options, nullptr, jkey, jkey_off, - jkey_len); +jbyteArray Java_org_rocksdb_RocksDB_get__JJ_3BII( + JNIEnv* env, jobject, + jlong jdb_handle, jlong jropt_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len) { + return rocksdb_get_helper( + env, reinterpret_cast<rocksdb::DB*>(jdb_handle), + *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), nullptr, jkey, + jkey_off, jkey_len); } /* * Class: org_rocksdb_RocksDB - * Method: delete - * Signature: (JJ[BIIJ)V + * Method: get + * Signature: (JJ[BIIJ)[B */ -void Java_org_rocksdb_RocksDB_delete__JJ_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options, +jbyteArray Java_org_rocksdb_RocksDB_get__JJ_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, jlong jropt_handle, jbyteArray jkey, jint jkey_off, jint jkey_len, jlong jcf_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); + auto* db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto& ro_opt = *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - rocksdb_delete_helper(env, db, *write_options, cf_handle, jkey, jkey_off, - jkey_len); + return rocksdb_get_helper( + env, db_handle, ro_opt, cf_handle, jkey, jkey_off, jkey_len); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + return nullptr; } } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::SingleDelete() -/** - * @return true if the single delete succeeded, false if a Java Exception - * was thrown - */ -bool rocksdb_single_delete_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::WriteOptions& write_options, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jkey, jint jkey_len) { - jbyte* key = env->GetByteArrayElements(jkey, nullptr); - if (key == nullptr) { +jint rocksdb_get_helper( + JNIEnv* env, rocksdb::DB* db, const rocksdb::ReadOptions& read_options, + rocksdb::ColumnFamilyHandle* column_family_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len, + bool* has_exception) { + static const int kNotFound = -1; + static const int kStatusError = -2; + + jbyte* key = new jbyte[jkey_len]; + env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); + if
(env->ExceptionCheck()) { // exception thrown: ArrayIndexOutOfBoundsException - return false; + delete[] key; + *has_exception = true; + return kStatusError; } rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); + // TODO(yhchiang): we might save one memory allocation here by adding + // a DB::Get() function which takes preallocated jbyte* as input. + std::string cvalue; rocksdb::Status s; - if (cf_handle != nullptr) { - s = db->SingleDelete(write_options, cf_handle, key_slice); + if (column_family_handle != nullptr) { + s = db->Get(read_options, column_family_handle, key_slice, &cvalue); } else { // backwards compatibility - s = db->SingleDelete(write_options, key_slice); + s = db->Get(read_options, key_slice, &cvalue); } - // trigger java unref on key and value. - // by passing JNI_ABORT, it will simply release the reference without - // copying the result back to the java byte array. - env->ReleaseByteArrayElements(jkey, key, JNI_ABORT); + // cleanup + delete[] key; - if (s.ok()) { - return true; + if (s.IsNotFound()) { + *has_exception = false; + return kNotFound; + } else if (!s.ok()) { + *has_exception = true; + // Here we are throwing a Java exception from the c++ side, and c++ + // does not know that calling this function will in fact throw an + // exception. As a result, the execution flow will not stop here, and + // code after this throw will still be executed. + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + + // Return a dummy const value to avoid compilation error, although + // java side might not have a chance to get the return value :) + return kStatusError; + } + + const jint cvalue_len = static_cast<jint>(cvalue.size()); + const jint length = std::min(jval_len, cvalue_len); + + env->SetByteArrayRegion( + jval, jval_off, length, + const_cast<jbyte*>(reinterpret_cast<const jbyte*>(cvalue.c_str()))); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + *has_exception = true; + return kStatusError; } - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return false; + *has_exception = false; + return cvalue_len; } + /* * Class: org_rocksdb_RocksDB - * Method: singleDelete - * Signature: (J[BI)V + * Method: get + * Signature: (J[BII[BII)I */ -void Java_org_rocksdb_RocksDB_singleDelete__J_3BI(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, - jint jkey_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - rocksdb_single_delete_helper(env, db, default_write_options, nullptr, jkey, - jkey_len); +jint Java_org_rocksdb_RocksDB_get__J_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { + bool has_exception = false; + return rocksdb_get_helper(env, reinterpret_cast<rocksdb::DB*>(jdb_handle), + rocksdb::ReadOptions(), nullptr, jkey, jkey_off, + jkey_len, jval, jval_off, jval_len, &has_exception); } /* * Class: org_rocksdb_RocksDB - * Method: singleDelete - * Signature: (J[BIJ)V + * Method: get + * Signature: (J[BII[BIIJ)I */ -void Java_org_rocksdb_RocksDB_singleDelete__J_3BIJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, - jint jkey_len, - jlong jcf_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); +jint Java_org_rocksdb_RocksDB_get__J_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint
jval_len, + jlong jcf_handle) { + auto* db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - rocksdb_single_delete_helper(env, db, default_write_options, cf_handle, - jkey, jkey_len); + bool has_exception = false; + return rocksdb_get_helper(env, db_handle, rocksdb::ReadOptions(), cf_handle, + jkey, jkey_off, jkey_len, jval, jval_off, + jval_len, &has_exception); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + // will never be evaluated + return 0; } } /* * Class: org_rocksdb_RocksDB - * Method: singleDelete - * Signature: (JJ[BIJ)V + * Method: get + * Signature: (JJ[BII[BII)I */ -void Java_org_rocksdb_RocksDB_singleDelete__JJ_3BI(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jwrite_options, - jbyteArray jkey, - jint jkey_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); - rocksdb_single_delete_helper(env, db, *write_options, nullptr, jkey, - jkey_len); +jint Java_org_rocksdb_RocksDB_get__JJ_3BII_3BII( + JNIEnv* env, jobject, jlong jdb_handle, jlong jropt_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len) { + bool has_exception = false; + return rocksdb_get_helper( + env, reinterpret_cast<rocksdb::DB*>(jdb_handle), + *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), nullptr, jkey, + jkey_off, jkey_len, jval, jval_off, jval_len, &has_exception); } /* * Class: org_rocksdb_RocksDB - * Method: singleDelete - * Signature: (JJ[BIJ)V + * Method: get + * Signature: (JJ[BII[BIIJ)I */ -void Java_org_rocksdb_RocksDB_singleDelete__JJ_3BIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options, - jbyteArray jkey, jint jkey_len, jlong jcf_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); +jint Java_org_rocksdb_RocksDB_get__JJ_3BII_3BIIJ( + JNIEnv* env, jobject, jlong jdb_handle, jlong jropt_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jbyteArray jval, jint jval_off, jint jval_len, + jlong jcf_handle) { + auto* db_handle = reinterpret_cast<rocksdb::DB*>(jdb_handle); + auto& ro_opt = *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - rocksdb_single_delete_helper(env, db, *write_options, cf_handle, jkey, - jkey_len); + bool has_exception = false; + return rocksdb_get_helper(env, db_handle, ro_opt, + jkey, jkey_off, jkey_len, + jval, jval_off, jval_len, + &has_exception); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + // will never be evaluated + return 0; } } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::DeleteRange() +inline void multi_get_helper_release_keys( + JNIEnv* env, std::vector<std::pair<jbyte*, jobject>>& keys_to_free) { + auto end = keys_to_free.end(); + for (auto it = keys_to_free.begin(); it != end; ++it) { + delete[] it->first; + env->DeleteLocalRef(it->second); + } + keys_to_free.clear(); +} +
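// Bookkeeping note on the helper above: multi_get_helper (below) copies each
// Java key into a `new jbyte[]` buffer and records the (buffer, local
// reference) pair in `keys_to_free`; the helper releases both on every exit
// path. The local reference matters because GetObjectArrayElement creates one
// per key and a JNI frame only guarantees a small number of live local
// references, which is why multi_get_helper calls EnsureLocalCapacity before
// looping over the keys.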
/** - * @return true if the delete range succeeded, false if a Java Exception - * was thrown + * cf multi get + * + * @return byte[][] of values or nullptr if an exception occurs */ -bool rocksdb_delete_range_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::WriteOptions& write_options, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jbegin_key, jint jbegin_key_off, - jint jbegin_key_len, jbyteArray jend_key, - jint jend_key_off, jint jend_key_len) { - jbyte* begin_key = new jbyte[jbegin_key_len]; - env->GetByteArrayRegion(jbegin_key, jbegin_key_off, jbegin_key_len, - begin_key); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - delete[] begin_key; - return false; +jobjectArray multi_get_helper( + JNIEnv* env, jobject, rocksdb::DB* db, const rocksdb::ReadOptions& rOpt, + jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens, + jlongArray jcolumn_family_handles) { + std::vector<rocksdb::ColumnFamilyHandle*> cf_handles; + if (jcolumn_family_handles != nullptr) { + const jsize len_cols = env->GetArrayLength(jcolumn_family_handles); + + jlong* jcfh = env->GetLongArrayElements(jcolumn_family_handles, nullptr); + if (jcfh == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + for (jsize i = 0; i < len_cols; i++) { + auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcfh[i]); + cf_handles.push_back(cf_handle); + } + env->ReleaseLongArrayElements(jcolumn_family_handles, jcfh, JNI_ABORT); } - rocksdb::Slice begin_key_slice(reinterpret_cast<char*>(begin_key), - jbegin_key_len); - jbyte* end_key = new jbyte[jend_key_len]; - env->GetByteArrayRegion(jend_key, jend_key_off, jend_key_len, end_key); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - delete[] begin_key; - delete[] end_key; - return false; + const jsize len_keys = env->GetArrayLength(jkeys); + if (env->EnsureLocalCapacity(len_keys) != 0) { + // exception thrown: OutOfMemoryError + return nullptr; } - rocksdb::Slice end_key_slice(reinterpret_cast<char*>(end_key), jend_key_len); - rocksdb::Status s = - db->DeleteRange(write_options, cf_handle, begin_key_slice, end_key_slice); + jint* jkey_off = env->GetIntArrayElements(jkey_offs, nullptr); + if (jkey_off == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } - // cleanup - delete[] begin_key; - delete[] end_key; + jint* jkey_len = env->GetIntArrayElements(jkey_lens, nullptr); + if (jkey_len == nullptr) { + // exception thrown: OutOfMemoryError + env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); + return nullptr; + } - if (s.ok()) { - return true; + std::vector<rocksdb::Slice> keys; + std::vector<std::pair<jbyte*, jobject>> keys_to_free; + for (jsize i = 0; i < len_keys; i++) { + jobject jkey = env->GetObjectArrayElement(jkeys, i); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->ReleaseIntArrayElements(jkey_lens, jkey_len, JNI_ABORT); + env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); + multi_get_helper_release_keys(env, keys_to_free); + return nullptr; + } + + jbyteArray jkey_ba = reinterpret_cast<jbyteArray>(jkey); + + const jint len_key = jkey_len[i]; + jbyte* key = new jbyte[len_key]; + env->GetByteArrayRegion(jkey_ba, jkey_off[i], len_key, key); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + delete[] key; + env->DeleteLocalRef(jkey); + env->ReleaseIntArrayElements(jkey_lens, jkey_len, JNI_ABORT); + env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); + multi_get_helper_release_keys(env, keys_to_free); + return nullptr; + } + + rocksdb::Slice key_slice(reinterpret_cast<char*>(key), len_key); + keys.push_back(key_slice); + + keys_to_free.push_back(std::pair<jbyte*, jobject>(key, jkey)); } - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return false; -} + // cleanup jkey_off and jkey_len + env->ReleaseIntArrayElements(jkey_lens, jkey_len, JNI_ABORT); + env->ReleaseIntArrayElements(jkey_offs, jkey_off, JNI_ABORT); -/* - * Class: org_rocksdb_RocksDB - * Method: deleteRange - * Signature:
(J[BII[BII)V - */ -void Java_org_rocksdb_RocksDB_deleteRange__J_3BII_3BII( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin_key, - jint jbegin_key_off, jint jbegin_key_len, jbyteArray jend_key, - jint jend_key_off, jint jend_key_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - rocksdb_delete_range_helper(env, db, default_write_options, nullptr, - jbegin_key, jbegin_key_off, jbegin_key_len, - jend_key, jend_key_off, jend_key_len); + std::vector<std::string> values; + std::vector<rocksdb::Status> s; + if (cf_handles.size() == 0) { + s = db->MultiGet(rOpt, keys, &values); + } else { + s = db->MultiGet(rOpt, cf_handles, keys, &values); + } + + // free up allocated byte arrays + multi_get_helper_release_keys(env, keys_to_free); + + // prepare the results + jobjectArray jresults = + rocksdb::ByteJni::new2dByteArray(env, static_cast<jsize>(s.size())); + if (jresults == nullptr) { + // exception occurred + return nullptr; + } + + // TODO(AR) it is not clear to me why EnsureLocalCapacity is needed for the + // loop as we cleanup references with env->DeleteLocalRef(jentry_value); + if (env->EnsureLocalCapacity(static_cast<jint>(s.size())) != 0) { + // exception thrown: OutOfMemoryError + return nullptr; + } + // add to the jresults + for (std::vector<rocksdb::Status>::size_type i = 0; i != s.size(); i++) { + if (s[i].ok()) { + std::string* value = &values[i]; + const jsize jvalue_len = static_cast<jsize>(value->size()); + jbyteArray jentry_value = env->NewByteArray(jvalue_len); + if (jentry_value == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + env->SetByteArrayRegion( + jentry_value, 0, static_cast<jsize>(jvalue_len), + const_cast<jbyte*>(reinterpret_cast<const jbyte*>(value->c_str()))); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jentry_value); + return nullptr; + } + + env->SetObjectArrayElement(jresults, static_cast<jsize>(i), jentry_value); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jentry_value); + return nullptr; + } + + env->DeleteLocalRef(jentry_value); + } + } + + return jresults; } /* * Class: org_rocksdb_RocksDB - * Method: deleteRange - * Signature: (J[BII[BIIJ)V + * Method: multiGet + * Signature: (J[[B[I[I)[[B */ -void Java_org_rocksdb_RocksDB_deleteRange__J_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin_key, - jint jbegin_key_off, jint jbegin_key_len, jbyteArray jend_key, - jint jend_key_off, jint jend_key_len, jlong jcf_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); - if (cf_handle != nullptr) { - rocksdb_delete_range_helper(env, db, default_write_options, cf_handle, - jbegin_key, jbegin_key_off, jbegin_key_len, - jend_key, jend_key_off, jend_key_len); - } else { - rocksdb::RocksDBExceptionJni::ThrowNew( - env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - } +jobjectArray Java_org_rocksdb_RocksDB_multiGet__J_3_3B_3I_3I( + JNIEnv* env, jobject jdb, jlong jdb_handle, + jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens) { + return multi_get_helper(env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), + rocksdb::ReadOptions(), jkeys, jkey_offs, jkey_lens, + nullptr); } /* * Class: org_rocksdb_RocksDB - * Method: deleteRange - * Signature: (JJ[BII[BII)V + * Method: multiGet + * Signature: (J[[B[I[I[J)[[B */ -void
Java_org_rocksdb_RocksDB_deleteRange__JJ_3BII_3BII( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options, - jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, - jbyteArray jend_key, jint jend_key_off, jint jend_key_len) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); - rocksdb_delete_range_helper(env, db, *write_options, nullptr, jbegin_key, - jbegin_key_off, jbegin_key_len, jend_key, - jend_key_off, jend_key_len); +jobjectArray Java_org_rocksdb_RocksDB_multiGet__J_3_3B_3I_3I_3J( + JNIEnv* env, jobject jdb, jlong jdb_handle, + jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens, + jlongArray jcolumn_family_handles) { + return multi_get_helper(env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), + rocksdb::ReadOptions(), jkeys, jkey_offs, jkey_lens, + jcolumn_family_handles); } /* * Class: org_rocksdb_RocksDB - * Method: deleteRange - * Signature: (JJ[BII[BIIJ)V + * Method: multiGet + * Signature: (JJ[[B[I[I)[[B */ -void Java_org_rocksdb_RocksDB_deleteRange__JJ_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options, - jbyteArray jbegin_key, jint jbegin_key_off, jint jbegin_key_len, - jbyteArray jend_key, jint jend_key_off, jint jend_key_len, - jlong jcf_handle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options); - auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); - if (cf_handle != nullptr) { - rocksdb_delete_range_helper(env, db, *write_options, cf_handle, jbegin_key, - jbegin_key_off, jbegin_key_len, jend_key, - jend_key_off, jend_key_len); - } else { - rocksdb::RocksDBExceptionJni::ThrowNew( - env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); - } +jobjectArray Java_org_rocksdb_RocksDB_multiGet__JJ_3_3B_3I_3I( + JNIEnv* env, jobject jdb, jlong jdb_handle, jlong jropt_handle, + jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens) { + return multi_get_helper( + env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), + *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), jkeys, jkey_offs, + jkey_lens, nullptr); } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::Merge - -/** - * @return true if the merge succeeded, false if a Java Exception was thrown +/* + * Class: org_rocksdb_RocksDB + * Method: multiGet + * Signature: (JJ[[B[I[I[J)[[B */ -bool rocksdb_merge_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::WriteOptions& write_options, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, - jbyteArray jval, jint jval_off, jint jval_len) { +jobjectArray Java_org_rocksdb_RocksDB_multiGet__JJ_3_3B_3I_3I_3J( + JNIEnv* env, jobject jdb, jlong jdb_handle, jlong jropt_handle, + jobjectArray jkeys, jintArray jkey_offs, jintArray jkey_lens, + jlongArray jcolumn_family_handles) { + return multi_get_helper( + env, jdb, reinterpret_cast<rocksdb::DB*>(jdb_handle), + *reinterpret_cast<rocksdb::ReadOptions*>(jropt_handle), jkeys, jkey_offs, + jkey_lens, jcolumn_family_handles); +} + +////////////////////////////////////////////////////////////////////////////// +// rocksdb::DB::KeyMayExist +jboolean key_may_exist_helper(JNIEnv* env, rocksdb::DB* db, + const rocksdb::ReadOptions& read_opt, + rocksdb::ColumnFamilyHandle* cf_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jobject jstring_builder, bool* has_exception) { jbyte* key = new jbyte[jkey_len]; env->GetByteArrayRegion(jkey, jkey_off, jkey_len, key); if (env->ExceptionCheck()) { // exception thrown: ArrayIndexOutOfBoundsException
delete[] key; + *has_exception = true; return false; } - rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); - jbyte* value = new jbyte[jval_len]; - env->GetByteArrayRegion(jval, jval_off, jval_len, value); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - delete[] value; - delete[] key; - return false; - } - rocksdb::Slice value_slice(reinterpret_cast<char*>(value), jval_len); + rocksdb::Slice key_slice(reinterpret_cast<char*>(key), jkey_len); - rocksdb::Status s; + std::string value; + bool value_found = false; + bool keyMayExist; if (cf_handle != nullptr) { - s = db->Merge(write_options, cf_handle, key_slice, value_slice); + keyMayExist = + db->KeyMayExist(read_opt, cf_handle, key_slice, &value, &value_found); } else { - s = db->Merge(write_options, key_slice, value_slice); + keyMayExist = db->KeyMayExist(read_opt, key_slice, &value, &value_found); } // cleanup - delete[] value; delete[] key; - if (s.ok()) { - return true; + // extract the value + if (value_found && !value.empty()) { + jobject jresult_string_builder = + rocksdb::StringBuilderJni::append(env, jstring_builder, value.c_str()); + if (jresult_string_builder == nullptr) { + *has_exception = true; + return false; + } } - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return false; + *has_exception = false; + return static_cast<jboolean>(keyMayExist); } /* * Class: org_rocksdb_RocksDB - * Method: merge - * Signature: (J[BII[BII)V + * Method: keyMayExist + * Signature: (J[BIILjava/lang/StringBuilder;)Z */ -void Java_org_rocksdb_RocksDB_merge__J_3BII_3BII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jkey, jint jkey_off, - jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { +jboolean Java_org_rocksdb_RocksDB_keyMayExist__J_3BIILjava_lang_StringBuilder_2( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, jobject jstring_builder) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); - - rocksdb_merge_helper(env, db, default_write_options, nullptr, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len); + bool has_exception = false; + return key_may_exist_helper(env, db, rocksdb::ReadOptions(), nullptr, jkey, + jkey_off, jkey_len, jstring_builder, &has_exception); } /* * Class: org_rocksdb_RocksDB - * Method: merge - * Signature: (J[BII[BIIJ)V + * Method: keyMayExist + * Signature: (J[BIIJLjava/lang/StringBuilder;)Z */ -void Java_org_rocksdb_RocksDB_merge__J_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jkey, - jint jkey_off, jint jkey_len, jbyteArray jval, jint jval_off, jint jval_len, - jlong jcf_handle) { +jboolean +Java_org_rocksdb_RocksDB_keyMayExist__J_3BIIJLjava_lang_StringBuilder_2( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, + jlong jcf_handle, jobject jstring_builder) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - static const rocksdb::WriteOptions default_write_options = - rocksdb::WriteOptions(); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - rocksdb_merge_helper(env, db, default_write_options, cf_handle, jkey, - jkey_off, jkey_len, jval, jval_off, jval_len); + bool has_exception = false; + return key_may_exist_helper(env, db, rocksdb::ReadOptions(), cf_handle, + jkey, jkey_off, jkey_len, jstring_builder, + &has_exception); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + return
true; } } /* * Class: org_rocksdb_RocksDB - * Method: merge - * Signature: (JJ[BII[BII)V + * Method: keyMayExist + * Signature: (JJ[BIILjava/lang/StringBuilder;)Z */ -void Java_org_rocksdb_RocksDB_merge__JJ_3BII_3BII( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len) { +jboolean +Java_org_rocksdb_RocksDB_keyMayExist__JJ_3BIILjava_lang_StringBuilder_2( + JNIEnv* env, jobject, jlong jdb_handle, jlong jread_options_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, jobject jstring_builder) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); - - rocksdb_merge_helper(env, db, *write_options, nullptr, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len); + auto& read_options = + *reinterpret_cast<rocksdb::ReadOptions*>(jread_options_handle); + bool has_exception = false; + return key_may_exist_helper(env, db, read_options, nullptr, jkey, jkey_off, + jkey_len, jstring_builder, &has_exception); } /* * Class: org_rocksdb_RocksDB - * Method: merge - * Signature: (JJ[BII[BIIJ)V + * Method: keyMayExist + * Signature: (JJ[BIIJLjava/lang/StringBuilder;)Z */ -void Java_org_rocksdb_RocksDB_merge__JJ_3BII_3BIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jwrite_options_handle, - jbyteArray jkey, jint jkey_off, jint jkey_len, jbyteArray jval, - jint jval_off, jint jval_len, jlong jcf_handle) { +jboolean +Java_org_rocksdb_RocksDB_keyMayExist__JJ_3BIIJLjava_lang_StringBuilder_2( + JNIEnv* env, jobject, jlong jdb_handle, jlong jread_options_handle, + jbyteArray jkey, jint jkey_off, jint jkey_len, jlong jcf_handle, + jobject jstring_builder) { auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* write_options = - reinterpret_cast<rocksdb::WriteOptions*>(jwrite_options_handle); + auto& read_options = + *reinterpret_cast<rocksdb::ReadOptions*>(jread_options_handle); auto* cf_handle = reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); if (cf_handle != nullptr) { - rocksdb_merge_helper(env, db, *write_options, cf_handle, jkey, jkey_off, - jkey_len, jval, jval_off, jval_len); + bool has_exception = false; + return key_may_exist_helper(env, db, read_options, cf_handle, jkey, + jkey_off, jkey_len, jstring_builder, &has_exception); } else { rocksdb::RocksDBExceptionJni::ThrowNew( env, rocksdb::Status::InvalidArgument("Invalid ColumnFamilyHandle.")); + return true; } } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::~DB() - -/* - * Class: org_rocksdb_RocksDB - * Method: disposeInternal - * Signature: (J)V - */ -void Java_org_rocksdb_RocksDB_disposeInternal(JNIEnv* /*env*/, - jobject /*java_db*/, - jlong jhandle) { - auto* db = reinterpret_cast<rocksdb::DB*>(jhandle); - assert(db != nullptr); - delete db; -} - -jlong rocksdb_iterator_helper(rocksdb::DB* db, - rocksdb::ReadOptions read_options, - rocksdb::ColumnFamilyHandle* cf_handle) { - rocksdb::Iterator* iterator = nullptr; - if (cf_handle != nullptr) { - iterator = db->NewIterator(read_options, cf_handle); - } else { - iterator = db->NewIterator(read_options); - } - return reinterpret_cast<jlong>(iterator); -} - /* * Class: org_rocksdb_RocksDB * Method: iterator * Signature: (J)J */ -jlong Java_org_rocksdb_RocksDB_iterator__J(JNIEnv* /*env*/, jobject /*jdb*/, - jlong db_handle) { +jlong Java_org_rocksdb_RocksDB_iterator__J( + JNIEnv*, jobject, jlong db_handle) { auto* db = reinterpret_cast<rocksdb::DB*>(db_handle); return rocksdb_iterator_helper(db, rocksdb::ReadOptions(), nullptr); } @@ -1546,9 +1726,8 @@ jlong
Java_org_rocksdb_RocksDB_iterator__J(JNIEnv* /*env*/, jobject /*jdb*/, * Method: iterator * Signature: (JJ)J */ -jlong Java_org_rocksdb_RocksDB_iterator__JJ(JNIEnv* /*env*/, jobject /*jdb*/, - jlong db_handle, - jlong jread_options_handle) { +jlong Java_org_rocksdb_RocksDB_iterator__JJ( + JNIEnv*, jobject, jlong db_handle, jlong jread_options_handle) { auto* db = reinterpret_cast(db_handle); auto& read_options = *reinterpret_cast(jread_options_handle); @@ -1560,9 +1739,8 @@ jlong Java_org_rocksdb_RocksDB_iterator__JJ(JNIEnv* /*env*/, jobject /*jdb*/, * Method: iteratorCF * Signature: (JJ)J */ -jlong Java_org_rocksdb_RocksDB_iteratorCF__JJ(JNIEnv* /*env*/, jobject /*jdb*/, - jlong db_handle, - jlong jcf_handle) { +jlong Java_org_rocksdb_RocksDB_iteratorCF__JJ( + JNIEnv*, jobject, jlong db_handle, jlong jcf_handle) { auto* db = reinterpret_cast(db_handle); auto* cf_handle = reinterpret_cast(jcf_handle); return rocksdb_iterator_helper(db, rocksdb::ReadOptions(), cf_handle); @@ -1573,10 +1751,9 @@ jlong Java_org_rocksdb_RocksDB_iteratorCF__JJ(JNIEnv* /*env*/, jobject /*jdb*/, * Method: iteratorCF * Signature: (JJJ)J */ -jlong Java_org_rocksdb_RocksDB_iteratorCF__JJJ(JNIEnv* /*env*/, jobject /*jdb*/, - jlong db_handle, - jlong jcf_handle, - jlong jread_options_handle) { +jlong Java_org_rocksdb_RocksDB_iteratorCF__JJJ( + JNIEnv*, jobject, + jlong db_handle, jlong jcf_handle, jlong jread_options_handle) { auto* db = reinterpret_cast(db_handle); auto* cf_handle = reinterpret_cast(jcf_handle); auto& read_options = @@ -1589,10 +1766,10 @@ jlong Java_org_rocksdb_RocksDB_iteratorCF__JJJ(JNIEnv* /*env*/, jobject /*jdb*/, * Method: iterators * Signature: (J[JJ)[J */ -jlongArray Java_org_rocksdb_RocksDB_iterators(JNIEnv* env, jobject /*jdb*/, - jlong db_handle, - jlongArray jcolumn_family_handles, - jlong jread_options_handle) { +jlongArray Java_org_rocksdb_RocksDB_iterators( + JNIEnv* env, jobject, jlong db_handle, + jlongArray jcolumn_family_handles, + jlong jread_options_handle) { auto* db = reinterpret_cast(db_handle); auto& read_options = *reinterpret_cast(jread_options_handle); @@ -1643,76 +1820,12 @@ jlongArray Java_org_rocksdb_RocksDB_iterators(JNIEnv* env, jobject /*jdb*/, } } -/* - * Class: org_rocksdb_RocksDB - * Method: getDefaultColumnFamily - * Signature: (J)J - */ -jlong Java_org_rocksdb_RocksDB_getDefaultColumnFamily(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jdb_handle) { - auto* db_handle = reinterpret_cast(jdb_handle); - auto* cf_handle = db_handle->DefaultColumnFamily(); - return reinterpret_cast(cf_handle); -} - -/* - * Class: org_rocksdb_RocksDB - * Method: createColumnFamily - * Signature: (J[BJ)J - */ -jlong Java_org_rocksdb_RocksDB_createColumnFamily(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jbyteArray jcolumn_name, - jlong jcolumn_options) { - rocksdb::ColumnFamilyHandle* handle; - jboolean has_exception = JNI_FALSE; - std::string column_name = rocksdb::JniUtil::byteString( - env, jcolumn_name, - [](const char* str, const size_t len) { return std::string(str, len); }, - &has_exception); - if (has_exception == JNI_TRUE) { - // exception occurred - return 0; - } - - auto* db_handle = reinterpret_cast(jdb_handle); - auto* cfOptions = - reinterpret_cast(jcolumn_options); - - rocksdb::Status s = - db_handle->CreateColumnFamily(*cfOptions, column_name, &handle); - - if (s.ok()) { - return reinterpret_cast(handle); - } - - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return 0; -} - -/* - * Class: org_rocksdb_RocksDB - * Method: dropColumnFamily - * Signature: (JJ)V; - 
*/ -void Java_org_rocksdb_RocksDB_dropColumnFamily(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jcf_handle) { - auto* cf_handle = reinterpret_cast(jcf_handle); - auto* db_handle = reinterpret_cast(jdb_handle); - rocksdb::Status s = db_handle->DropColumnFamily(cf_handle); - if (!s.ok()) { - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - } -} - /* * Method: getSnapshot * Signature: (J)J */ -jlong Java_org_rocksdb_RocksDB_getSnapshot(JNIEnv* /*env*/, jobject /*jdb*/, - jlong db_handle) { +jlong Java_org_rocksdb_RocksDB_getSnapshot( + JNIEnv*, jobject, jlong db_handle) { auto* db = reinterpret_cast(db_handle); const rocksdb::Snapshot* snapshot = db->GetSnapshot(); return reinterpret_cast(snapshot); @@ -1722,9 +1835,9 @@ jlong Java_org_rocksdb_RocksDB_getSnapshot(JNIEnv* /*env*/, jobject /*jdb*/, * Method: releaseSnapshot * Signature: (JJ)V */ -void Java_org_rocksdb_RocksDB_releaseSnapshot(JNIEnv* /*env*/, jobject /*jdb*/, - jlong db_handle, - jlong snapshot_handle) { +void Java_org_rocksdb_RocksDB_releaseSnapshot( + JNIEnv*, jobject, jlong db_handle, + jlong snapshot_handle) { auto* db = reinterpret_cast(db_handle); auto* snapshot = reinterpret_cast(snapshot_handle); db->ReleaseSnapshot(snapshot); @@ -1732,51 +1845,30 @@ void Java_org_rocksdb_RocksDB_releaseSnapshot(JNIEnv* /*env*/, jobject /*jdb*/, /* * Class: org_rocksdb_RocksDB - * Method: getProperty0 - * Signature: (JLjava/lang/String;I)Ljava/lang/String; - */ -jstring Java_org_rocksdb_RocksDB_getProperty0__JLjava_lang_String_2I( - JNIEnv* env, jobject /*jdb*/, jlong db_handle, jstring jproperty, - jint jproperty_len) { - const char* property = env->GetStringUTFChars(jproperty, nullptr); - if (property == nullptr) { - // exception thrown: OutOfMemoryError - return nullptr; - } - rocksdb::Slice property_slice(property, jproperty_len); - - auto* db = reinterpret_cast(db_handle); - std::string property_value; - bool retCode = db->GetProperty(property_slice, &property_value); - env->ReleaseStringUTFChars(jproperty, property); - - if (retCode) { - return env->NewStringUTF(property_value.c_str()); - } - - rocksdb::RocksDBExceptionJni::ThrowNew(env, rocksdb::Status::NotFound()); - return nullptr; -} - -/* - * Class: org_rocksdb_RocksDB - * Method: getProperty0 + * Method: getProperty * Signature: (JJLjava/lang/String;I)Ljava/lang/String; */ -jstring Java_org_rocksdb_RocksDB_getProperty0__JJLjava_lang_String_2I( - JNIEnv* env, jobject /*jdb*/, jlong db_handle, jlong jcf_handle, +jstring Java_org_rocksdb_RocksDB_getProperty( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, jstring jproperty, jint jproperty_len) { const char* property = env->GetStringUTFChars(jproperty, nullptr); if (property == nullptr) { // exception thrown: OutOfMemoryError return nullptr; } - rocksdb::Slice property_slice(property, jproperty_len); + rocksdb::Slice property_name(property, jproperty_len); + + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } - auto* db = reinterpret_cast(db_handle); - auto* cf_handle = reinterpret_cast(jcf_handle); std::string property_value; - bool retCode = db->GetProperty(cf_handle, property_slice, &property_value); + bool retCode = db->GetProperty(cf_handle, property_name, &property_value); env->ReleaseStringUTFChars(jproperty, property); if (retCode) { @@ -1789,51 +1881,66 @@ jstring Java_org_rocksdb_RocksDB_getProperty0__JJLjava_lang_String_2I( /* * Class: 
org_rocksdb_RocksDB - * Method: getLongProperty - * Signature: (JLjava/lang/String;I)L; + * Method: getMapProperty + * Signature: (JJLjava/lang/String;I)Ljava/util/Map; */ -jlong Java_org_rocksdb_RocksDB_getLongProperty__JLjava_lang_String_2I( - JNIEnv* env, jobject /*jdb*/, jlong db_handle, jstring jproperty, - jint jproperty_len) { - const char* property = env->GetStringUTFChars(jproperty, nullptr); +jobject Java_org_rocksdb_RocksDB_getMapProperty( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, + jstring jproperty, jint jproperty_len) { + const char* property = env->GetStringUTFChars(jproperty, nullptr); if (property == nullptr) { // exception thrown: OutOfMemoryError - return 0; + return nullptr; + } + rocksdb::Slice property_name(property, jproperty_len); + + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); } - rocksdb::Slice property_slice(property, jproperty_len); - auto* db = reinterpret_cast(db_handle); - uint64_t property_value = 0; - bool retCode = db->GetIntProperty(property_slice, &property_value); + std::map property_value; + bool retCode = db->GetMapProperty(cf_handle, property_name, &property_value); env->ReleaseStringUTFChars(jproperty, property); if (retCode) { - return property_value; + return rocksdb::HashMapJni::fromCppMap(env, &property_value); } rocksdb::RocksDBExceptionJni::ThrowNew(env, rocksdb::Status::NotFound()); - return 0; + return nullptr; } /* * Class: org_rocksdb_RocksDB * Method: getLongProperty - * Signature: (JJLjava/lang/String;I)L; + * Signature: (JJLjava/lang/String;I)J */ -jlong Java_org_rocksdb_RocksDB_getLongProperty__JJLjava_lang_String_2I( - JNIEnv* env, jobject /*jdb*/, jlong db_handle, jlong jcf_handle, +jlong Java_org_rocksdb_RocksDB_getLongProperty( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, jstring jproperty, jint jproperty_len) { const char* property = env->GetStringUTFChars(jproperty, nullptr); if (property == nullptr) { // exception thrown: OutOfMemoryError return 0; } - rocksdb::Slice property_slice(property, jproperty_len); + rocksdb::Slice property_name(property, jproperty_len); + + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } - auto* db = reinterpret_cast(db_handle); - auto* cf_handle = reinterpret_cast(jcf_handle); uint64_t property_value; - bool retCode = db->GetIntProperty(cf_handle, property_slice, &property_value); + bool retCode = db->GetIntProperty(cf_handle, property_name, &property_value); env->ReleaseStringUTFChars(jproperty, property); if (retCode) { @@ -1844,21 +1951,33 @@ jlong Java_org_rocksdb_RocksDB_getLongProperty__JJLjava_lang_String_2I( return 0; } +/* + * Class: org_rocksdb_RocksDB + * Method: resetStats + * Signature: (J)V + */ +void Java_org_rocksdb_RocksDB_resetStats( + JNIEnv *, jobject, jlong jdb_handle) { + auto* db = reinterpret_cast(jdb_handle); + db->ResetStats(); +} + /* * Class: org_rocksdb_RocksDB * Method: getAggregatedLongProperty * Signature: (JLjava/lang/String;I)J */ jlong Java_org_rocksdb_RocksDB_getAggregatedLongProperty( - JNIEnv* env, jobject, jlong db_handle, jstring jproperty, jint jproperty_len) { + JNIEnv* env, jobject, jlong db_handle, + jstring jproperty, jint jproperty_len) { const char* property = env->GetStringUTFChars(jproperty, nullptr); 
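// GetStringUTFChars returns nullptr (with an OutOfMemoryError pending) on
// failure; on success the UTF buffer must later be handed back via
// ReleaseStringUTFChars, which every exit path below does.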
if (property == nullptr) { return 0; } - rocksdb::Slice property_slice(property, jproperty_len); + rocksdb::Slice property_name(property, jproperty_len); auto* db = reinterpret_cast<rocksdb::DB*>(db_handle); uint64_t property_value = 0; - bool retCode = db->GetAggregatedIntProperty(property_slice, &property_value); + bool retCode = db->GetAggregatedIntProperty(property_name, &property_value); env->ReleaseStringUTFChars(jproperty, property); if (retCode) { @@ -1869,278 +1988,576 @@ jlong Java_org_rocksdb_RocksDB_getAggregatedLongProperty( return 0; }
+/* + * Class: org_rocksdb_RocksDB + * Method: getApproximateSizes + * Signature: (JJ[JB)[J + */ +jlongArray Java_org_rocksdb_RocksDB_getApproximateSizes( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, + jlongArray jrange_slice_handles, jbyte jinclude_flags) { + const jsize jlen = env->GetArrayLength(jrange_slice_handles); + const size_t range_count = jlen / 2;
-////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::Flush
+ jboolean jranges_is_copy = JNI_FALSE; + jlong* jranges = env->GetLongArrayElements(jrange_slice_handles, + &jranges_is_copy); + if (jranges == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + }
-void rocksdb_flush_helper(JNIEnv* env, rocksdb::DB* db, - const rocksdb::FlushOptions& flush_options, - rocksdb::ColumnFamilyHandle* column_family_handle) { - rocksdb::Status s; - if (column_family_handle != nullptr) { - s = db->Flush(flush_options, column_family_handle); + // each range is a (start, limit) pair of Slice handles, so advance two + // handles per iteration and track the target range index separately + auto ranges = std::unique_ptr<rocksdb::Range[]>( + new rocksdb::Range[range_count]); + for (jsize i = 0, j = 0; i < jlen; ++i) { + auto* start = reinterpret_cast<rocksdb::Slice*>(jranges[i]); + auto* limit = reinterpret_cast<rocksdb::Slice*>(jranges[++i]); + ranges.get()[j++] = rocksdb::Range(*start, *limit); + } + + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); } else { - s = db->Flush(flush_options); + cf_handle = + reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); } - if (!s.ok()) { - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + + auto sizes = std::unique_ptr<uint64_t[]>(new uint64_t[range_count]); + db->GetApproximateSizes(cf_handle, ranges.get(), + static_cast<int>(range_count), sizes.get(), + static_cast<uint8_t>(jinclude_flags)); + + // release LongArrayElements + env->ReleaseLongArrayElements(jrange_slice_handles, jranges, JNI_ABORT); + + // prepare results + auto results = std::unique_ptr<jlong[]>(new jlong[range_count]); + for (size_t i = 0; i < range_count; ++i) { + results.get()[i] = static_cast<jlong>(sizes.get()[i]); + } + + const jsize jrange_count = jlen / 2; + jlongArray jresults = env->NewLongArray(jrange_count); + if (jresults == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + env->SetLongArrayRegion(jresults, 0, jrange_count, results.get()); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jresults); + return nullptr; } + + return jresults; }
 /* * Class: org_rocksdb_RocksDB - * Method: flush - * Signature: (JJ)V + * Method: getApproximateMemTableStats + * Signature: (JJJJ)[J */ -void Java_org_rocksdb_RocksDB_flush__JJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jflush_options) { +jlongArray Java_org_rocksdb_RocksDB_getApproximateMemTableStats( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, + jlong jstartHandle, jlong jlimitHandle) { + auto* start = reinterpret_cast<rocksdb::Slice*>(jstartHandle); + auto* limit = reinterpret_cast<rocksdb::Slice*>(jlimitHandle); + const rocksdb::Range range(*start, *limit); + auto* db = reinterpret_cast<rocksdb::DB*>(jdb_handle); - auto* flush_options = - reinterpret_cast<rocksdb::FlushOptions*>(jflush_options); - rocksdb_flush_helper(env, db, *flush_options, nullptr); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast<rocksdb::ColumnFamilyHandle*>(jcf_handle); + } + + uint64_t count = 0; + uint64_t sizes = 0; + db->GetApproximateMemTableStats(cf_handle, range, &count, &sizes); + + // prepare results: exactly two entries, the estimated entry count and + // the estimated byte size + jlong results[2] = { + static_cast<jlong>(count), + static_cast<jlong>(sizes)}; + + jlongArray jsizes = env->NewLongArray(2); + if (jsizes == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + env->SetLongArrayRegion(jsizes, 0, 2, results); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jsizes); + return nullptr; + } + + return jsizes; }
 /* * Class: org_rocksdb_RocksDB - * Method: flush - * Signature: (JJJ)V + * Method: compactRange + * Signature: (J[BI[BIJJ)V */ -void Java_org_rocksdb_RocksDB_flush__JJJ(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, jlong jflush_options, - jlong jcf_handle) { +void Java_org_rocksdb_RocksDB_compactRange( + JNIEnv* env, jobject, jlong jdb_handle, + jbyteArray jbegin, jint jbegin_len, + jbyteArray jend, jint jend_len, + jlong jcompact_range_opts_handle, + jlong jcf_handle) { + jboolean has_exception = JNI_FALSE; + + std::string str_begin; + if (jbegin_len > 0) { + str_begin = rocksdb::JniUtil::byteString<std::string>(env, jbegin, jbegin_len, + [](const char* str, const size_t len) { + return std::string(str, len); + }, + &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return; + } + } + + std::string str_end; + if (jend_len > 0) { + str_end = rocksdb::JniUtil::byteString<std::string>(env, jend, jend_len, + [](const char* str, const size_t len) { + return std::string(str, len); + }, + &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return; + } + } + + rocksdb::CompactRangeOptions* compact_range_opts = nullptr; + if (jcompact_range_opts_handle == 0) { + // NOTE: we DO own the pointer! + compact_range_opts = new rocksdb::CompactRangeOptions(); + } else { + // NOTE: we do NOT own the pointer!
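// (the Java-side CompactRangeOptions object owns that native handle and is
// responsible for disposing of it; only the locally allocated default above
// is ours to delete once the call completes)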
+ compact_range_opts = + reinterpret_cast(jcompact_range_opts_handle); + } + auto* db = reinterpret_cast(jdb_handle); - auto* flush_options = - reinterpret_cast(jflush_options); - auto* cf_handle = reinterpret_cast(jcf_handle); - rocksdb_flush_helper(env, db, *flush_options, cf_handle); -} -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::CompactRange - Full + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } -void rocksdb_compactrange_helper(JNIEnv* env, rocksdb::DB* db, - rocksdb::ColumnFamilyHandle* cf_handle, - jboolean jreduce_level, jint jtarget_level, - jint jtarget_path_id) { rocksdb::Status s; - rocksdb::CompactRangeOptions compact_options; - compact_options.change_level = jreduce_level; - compact_options.target_level = jtarget_level; - compact_options.target_path_id = static_cast(jtarget_path_id); - if (cf_handle != nullptr) { - s = db->CompactRange(compact_options, cf_handle, nullptr, nullptr); + if (jbegin_len > 0 || jend_len > 0) { + const rocksdb::Slice begin(str_begin); + const rocksdb::Slice end(str_end); + s = db->CompactRange(*compact_range_opts, cf_handle, &begin, &end); } else { - // backwards compatibility - s = db->CompactRange(compact_options, nullptr, nullptr); + s = db->CompactRange(*compact_range_opts, cf_handle, nullptr, nullptr); } - if (s.ok()) { - return; + if (jcompact_range_opts_handle == 0) { + delete compact_range_opts; } rocksdb::RocksDBExceptionJni::ThrowNew(env, s); } /* * Class: org_rocksdb_RocksDB - * Method: compactRange0 - * Signature: (JZII)V + * Method: setOptions + * Signature: (JJ[Ljava/lang/String;[Ljava/lang/String;)V */ -void Java_org_rocksdb_RocksDB_compactRange0__JZII(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jboolean jreduce_level, - jint jtarget_level, - jint jtarget_path_id) { +void Java_org_rocksdb_RocksDB_setOptions( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, + jobjectArray jkeys, jobjectArray jvalues) { + const jsize len = env->GetArrayLength(jkeys); + assert(len == env->GetArrayLength(jvalues)); + + std::unordered_map options_map; + for (jsize i = 0; i < len; i++) { + jobject jobj_key = env->GetObjectArrayElement(jkeys, i); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + return; + } + + jobject jobj_value = env->GetObjectArrayElement(jvalues, i); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jobj_key); + return; + } + + jboolean has_exception = JNI_FALSE; + std::string s_key = + rocksdb::JniUtil::copyStdString( + env, reinterpret_cast(jobj_key), &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + env->DeleteLocalRef(jobj_value); + env->DeleteLocalRef(jobj_key); + return; + } + + std::string s_value = + rocksdb::JniUtil::copyStdString( + env, reinterpret_cast(jobj_value), &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + env->DeleteLocalRef(jobj_value); + env->DeleteLocalRef(jobj_key); + return; + } + + options_map[s_key] = s_value; + + env->DeleteLocalRef(jobj_key); + env->DeleteLocalRef(jobj_value); + } + auto* db = reinterpret_cast(jdb_handle); - rocksdb_compactrange_helper(env, db, nullptr, jreduce_level, jtarget_level, - jtarget_path_id); + auto* cf_handle = reinterpret_cast(jcf_handle); + auto s = db->SetOptions(cf_handle, options_map); + if (!s.ok()) { + 
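// map the failed rocksdb::Status onto a thrown Java RocksDBException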
rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } } /* * Class: org_rocksdb_RocksDB - * Method: compactRange - * Signature: (JZIIJ)V + * Method: setDBOptions + * Signature: (J[Ljava/lang/String;[Ljava/lang/String;)V */ -void Java_org_rocksdb_RocksDB_compactRange__JZIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jboolean jreduce_level, - jint jtarget_level, jint jtarget_path_id, jlong jcf_handle) { +void Java_org_rocksdb_RocksDB_setDBOptions( + JNIEnv* env, jobject, jlong jdb_handle, + jobjectArray jkeys, jobjectArray jvalues) { + const jsize len = env->GetArrayLength(jkeys); + assert(len == env->GetArrayLength(jvalues)); + + std::unordered_map options_map; + for (jsize i = 0; i < len; i++) { + jobject jobj_key = env->GetObjectArrayElement(jkeys, i); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + return; + } + + jobject jobj_value = env->GetObjectArrayElement(jvalues, i); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + env->DeleteLocalRef(jobj_key); + return; + } + + jboolean has_exception = JNI_FALSE; + std::string s_key = + rocksdb::JniUtil::copyStdString( + env, reinterpret_cast(jobj_key), &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + env->DeleteLocalRef(jobj_value); + env->DeleteLocalRef(jobj_key); + return; + } + + std::string s_value = + rocksdb::JniUtil::copyStdString( + env, reinterpret_cast(jobj_value), &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + env->DeleteLocalRef(jobj_value); + env->DeleteLocalRef(jobj_key); + return; + } + + options_map[s_key] = s_value; + + env->DeleteLocalRef(jobj_key); + env->DeleteLocalRef(jobj_value); + } + auto* db = reinterpret_cast(jdb_handle); - auto* cf_handle = reinterpret_cast(jcf_handle); - rocksdb_compactrange_helper(env, db, cf_handle, jreduce_level, jtarget_level, - jtarget_path_id); + auto s = db->SetDBOptions(options_map); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::CompactRange - Range - -/** - * @return true if the compact range succeeded, false if a Java Exception - * was thrown +/* + * Class: org_rocksdb_RocksDB + * Method: compactFiles + * Signature: (JJJ[Ljava/lang/String;IIJ)[Ljava/lang/String; */ -bool rocksdb_compactrange_helper(JNIEnv* env, rocksdb::DB* db, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jbegin, jint jbegin_len, - jbyteArray jend, jint jend_len, - const rocksdb::CompactRangeOptions& compact_options) { - jbyte* begin = env->GetByteArrayElements(jbegin, nullptr); - if (begin == nullptr) { - // exception thrown: OutOfMemoryError - return false; +jobjectArray Java_org_rocksdb_RocksDB_compactFiles( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcompaction_opts_handle, + jlong jcf_handle, jobjectArray jinput_file_names, jint joutput_level, + jint joutput_path_id, jlong jcompaction_job_info_handle) { + jboolean has_exception = JNI_FALSE; + const std::vector input_file_names = + rocksdb::JniUtil::copyStrings(env, jinput_file_names, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return nullptr; } - jbyte* end = env->GetByteArrayElements(jend, nullptr); - if (end == nullptr) { - // exception thrown: OutOfMemoryError - env->ReleaseByteArrayElements(jbegin, begin, JNI_ABORT); - return false; + auto* compaction_opts = + reinterpret_cast(jcompaction_opts_handle); + auto* db = reinterpret_cast(jdb_handle); + 
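// a zero column-family handle from the Java side is the conventional
// signal, used throughout these bindings, to fall back to the database's
// default column family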
rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); } - const rocksdb::Slice begin_slice(reinterpret_cast(begin), jbegin_len); - const rocksdb::Slice end_slice(reinterpret_cast(end), jend_len); + rocksdb::CompactionJobInfo* compaction_job_info = nullptr; + if (jcompaction_job_info_handle != 0) { + compaction_job_info = + reinterpret_cast(jcompaction_job_info_handle); + } - rocksdb::Status s; - if (cf_handle != nullptr) { - s = db->CompactRange(compact_options, cf_handle, &begin_slice, &end_slice); - } else { - // backwards compatibility - s = db->CompactRange(compact_options, &begin_slice, &end_slice); + std::vector output_file_names; + auto s = db->CompactFiles(*compaction_opts, cf_handle, input_file_names, + static_cast(joutput_level), static_cast(joutput_path_id), + &output_file_names, compaction_job_info); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; } - env->ReleaseByteArrayElements(jend, end, JNI_ABORT); - env->ReleaseByteArrayElements(jbegin, begin, JNI_ABORT); + return rocksdb::JniUtil::toJavaStrings(env, &output_file_names); +} - if (s.ok()) { - return true; +/* + * Class: org_rocksdb_RocksDB + * Method: pauseBackgroundWork + * Signature: (J)V + */ +void Java_org_rocksdb_RocksDB_pauseBackgroundWork( + JNIEnv* env, jobject, jlong jdb_handle) { + auto* db = reinterpret_cast(jdb_handle); + auto s = db->PauseBackgroundWork(); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); } +} - rocksdb::RocksDBExceptionJni::ThrowNew(env, s); - return false; +/* + * Class: org_rocksdb_RocksDB + * Method: continueBackgroundWork + * Signature: (J)V + */ +void Java_org_rocksdb_RocksDB_continueBackgroundWork( + JNIEnv* env, jobject, jlong jdb_handle) { + auto* db = reinterpret_cast(jdb_handle); + auto s = db->ContinueBackgroundWork(); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } } -/** - * @return true if the compact range succeeded, false if a Java Exception - * was thrown +/* + * Class: org_rocksdb_RocksDB + * Method: enableAutoCompaction + * Signature: (J[J)V + */ +void Java_org_rocksdb_RocksDB_enableAutoCompaction( + JNIEnv* env, jobject, jlong jdb_handle, jlongArray jcf_handles) { + auto* db = reinterpret_cast(jdb_handle); + jboolean has_exception = JNI_FALSE; + const std::vector cf_handles = + rocksdb::JniUtil::fromJPointers(env, jcf_handles, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return; + } + db->EnableAutoCompaction(cf_handles); +} + +/* + * Class: org_rocksdb_RocksDB + * Method: numberLevels + * Signature: (JJ)I + */ +jint Java_org_rocksdb_RocksDB_numberLevels( + JNIEnv*, jobject, jlong jdb_handle, jlong jcf_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + return static_cast(db->NumberLevels(cf_handle)); +} + +/* + * Class: org_rocksdb_RocksDB + * Method: maxMemCompactionLevel + * Signature: (JJ)I */ -bool rocksdb_compactrange_helper(JNIEnv* env, rocksdb::DB* db, - rocksdb::ColumnFamilyHandle* cf_handle, - jbyteArray jbegin, jint jbegin_len, - jbyteArray jend, jint jend_len, - jboolean jreduce_level, jint jtarget_level, - jint jtarget_path_id) { - rocksdb::CompactRangeOptions compact_options; - compact_options.change_level = jreduce_level; - compact_options.target_level = jtarget_level; - 
compact_options.target_path_id = static_cast(jtarget_path_id); +jint Java_org_rocksdb_RocksDB_maxMemCompactionLevel( + JNIEnv*, jobject, jlong jdb_handle, jlong jcf_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + return static_cast(db->MaxMemCompactionLevel(cf_handle)); +} - return rocksdb_compactrange_helper(env, db, cf_handle, jbegin, jbegin_len, - jend, jend_len, compact_options); +/* + * Class: org_rocksdb_RocksDB + * Method: level0StopWriteTrigger + * Signature: (JJ)I + */ +jint Java_org_rocksdb_RocksDB_level0StopWriteTrigger( + JNIEnv*, jobject, jlong jdb_handle, jlong jcf_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + return static_cast(db->Level0StopWriteTrigger(cf_handle)); } /* * Class: org_rocksdb_RocksDB - * Method: compactRange0 - * Signature: (J[BI[BIZII)V + * Method: getName + * Signature: (J)Ljava/lang/String; */ -void Java_org_rocksdb_RocksDB_compactRange0__J_3BI_3BIZII( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin, - jint jbegin_len, jbyteArray jend, jint jend_len, jboolean jreduce_level, - jint jtarget_level, jint jtarget_path_id) { +jstring Java_org_rocksdb_RocksDB_getName( + JNIEnv* env, jobject, jlong jdb_handle) { auto* db = reinterpret_cast(jdb_handle); - rocksdb_compactrange_helper(env, db, nullptr, jbegin, jbegin_len, jend, - jend_len, jreduce_level, jtarget_level, - jtarget_path_id); + std::string name = db->GetName(); + return rocksdb::JniUtil::toJavaString(env, &name, false); } /* * Class: org_rocksdb_RocksDB - * Method: compactRange - * Signature: (JJ[BI[BIZII)V + * Method: getEnv + * Signature: (J)J */ -void Java_org_rocksdb_RocksDB_compactRange__J_3BI_3BIZIIJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin, - jint jbegin_len, jbyteArray jend, jint jend_len, jboolean jreduce_level, - jint jtarget_level, jint jtarget_path_id, jlong jcf_handle) { +jlong Java_org_rocksdb_RocksDB_getEnv( + JNIEnv*, jobject, jlong jdb_handle) { auto* db = reinterpret_cast(jdb_handle); - auto* cf_handle = reinterpret_cast(jcf_handle); - rocksdb_compactrange_helper(env, db, cf_handle, jbegin, jbegin_len, jend, - jend_len, jreduce_level, jtarget_level, - jtarget_path_id); + return reinterpret_cast(db->GetEnv()); } - -void Java_org_rocksdb_RocksDB_compactRange__J_3BI_3BIJJ( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jbyteArray jbegin, - jint jbegin_len, jbyteArray jend, jint jend_len, - jlong jcompact_options_handle, jlong jcf_handle) { +/* + * Class: org_rocksdb_RocksDB + * Method: flush + * Signature: (JJ[J)V + */ +void Java_org_rocksdb_RocksDB_flush( + JNIEnv* env, jobject, jlong jdb_handle, jlong jflush_opts_handle, + jlongArray jcf_handles) { auto* db = reinterpret_cast(jdb_handle); - auto* cf_handle = reinterpret_cast(jcf_handle); - auto* compact_options = reinterpret_cast(jcompact_options_handle); - - rocksdb_compactrange_helper(env, db, cf_handle, jbegin, jbegin_len, jend, - jend_len, *compact_options); + auto* flush_opts = + reinterpret_cast(jflush_opts_handle); + std::vector cf_handles; + if (jcf_handles == nullptr) { + cf_handles.push_back(db->DefaultColumnFamily()); + } else { + jboolean has_exception = JNI_FALSE; + cf_handles = + rocksdb::JniUtil::fromJPointers( + env, 
jcf_handles, &has_exception); + if (has_exception) { + // exception occurred + return; + } + } + auto s = db->Flush(*flush_opts, cf_handles); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } } - -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::PauseBackgroundWork - /* * Class: org_rocksdb_RocksDB - * Method: pauseBackgroundWork - * Signature: (J)V + * Method: flushWal + * Signature: (JZ)V */ -void Java_org_rocksdb_RocksDB_pauseBackgroundWork(JNIEnv* env, jobject /*jobj*/, - jlong jdb_handle) { +void Java_org_rocksdb_RocksDB_flushWal( + JNIEnv* env, jobject, jlong jdb_handle, jboolean jsync) { auto* db = reinterpret_cast(jdb_handle); - auto s = db->PauseBackgroundWork(); + auto s = db->FlushWAL(jsync == JNI_TRUE); if (!s.ok()) { rocksdb::RocksDBExceptionJni::ThrowNew(env, s); } } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::ContinueBackgroundWork - /* * Class: org_rocksdb_RocksDB - * Method: continueBackgroundWork + * Method: syncWal * Signature: (J)V */ -void Java_org_rocksdb_RocksDB_continueBackgroundWork(JNIEnv* env, - jobject /*jobj*/, - jlong jdb_handle) { +void Java_org_rocksdb_RocksDB_syncWal( + JNIEnv* env, jobject, jlong jdb_handle) { auto* db = reinterpret_cast(jdb_handle); - auto s = db->ContinueBackgroundWork(); + auto s = db->SyncWAL(); if (!s.ok()) { rocksdb::RocksDBExceptionJni::ThrowNew(env, s); } } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::GetLatestSequenceNumber - /* * Class: org_rocksdb_RocksDB * Method: getLatestSequenceNumber * Signature: (J)V */ -jlong Java_org_rocksdb_RocksDB_getLatestSequenceNumber(JNIEnv* /*env*/, - jobject /*jdb*/, - jlong jdb_handle) { +jlong Java_org_rocksdb_RocksDB_getLatestSequenceNumber( + JNIEnv*, jobject, jlong jdb_handle) { auto* db = reinterpret_cast(jdb_handle); return db->GetLatestSequenceNumber(); } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB enable/disable file deletions +/* + * Class: org_rocksdb_RocksDB + * Method: setPreserveDeletesSequenceNumber + * Signature: (JJ)Z + */ +jboolean JNICALL Java_org_rocksdb_RocksDB_setPreserveDeletesSequenceNumber( + JNIEnv*, jobject, jlong jdb_handle, jlong jseq_number) { + auto* db = reinterpret_cast(jdb_handle); + if (db->SetPreserveDeletesSequenceNumber( + static_cast(jseq_number))) { + return JNI_TRUE; + } else { + return JNI_FALSE; + } +} /* * Class: org_rocksdb_RocksDB - * Method: enableFileDeletions + * Method: disableFileDeletions * Signature: (J)V */ -void Java_org_rocksdb_RocksDB_disableFileDeletions(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle) { +void Java_org_rocksdb_RocksDB_disableFileDeletions( + JNIEnv* env, jobject, jlong jdb_handle) { auto* db = reinterpret_cast(jdb_handle); rocksdb::Status s = db->DisableFileDeletions(); if (!s.ok()) { @@ -2153,9 +2570,8 @@ void Java_org_rocksdb_RocksDB_disableFileDeletions(JNIEnv* env, jobject /*jdb*/, * Method: enableFileDeletions * Signature: (JZ)V */ -void Java_org_rocksdb_RocksDB_enableFileDeletions(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jboolean jforce) { +void Java_org_rocksdb_RocksDB_enableFileDeletions( + JNIEnv* env, jobject, jlong jdb_handle, jboolean jforce) { auto* db = reinterpret_cast(jdb_handle); rocksdb::Status s = db->EnableFileDeletions(jforce); if (!s.ok()) { @@ -2163,17 +2579,84 @@ void Java_org_rocksdb_RocksDB_enableFileDeletions(JNIEnv* env, jobject /*jdb*/, } } 
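// The WAL and file-deletion entry points above all share one shape: unwrap
// the jlong handle into a rocksdb::DB*, invoke the corresponding DB API,
// and surface any non-OK rocksdb::Status as a Java RocksDBException. A
// minimal sketch of that shared pattern, as a hypothetical helper that is
// not part of this patch and uses only APIs already present in this file:
static void throw_if_not_ok(JNIEnv* env, const rocksdb::Status& s) {
  // ThrowNew is a no-op for an ok() status-free path; only a failure is
  // converted into a pending Java exception
  if (!s.ok()) {
    rocksdb::RocksDBExceptionJni::ThrowNew(env, s);
  }
}
// With such a helper, Java_org_rocksdb_RocksDB_syncWal above would reduce
// to a single line: throw_if_not_ok(env, db->SyncWAL());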
-////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::GetUpdatesSince +/* + * Class: org_rocksdb_RocksDB + * Method: getLiveFiles + * Signature: (JZ)[Ljava/lang/String; + */ +jobjectArray Java_org_rocksdb_RocksDB_getLiveFiles( + JNIEnv* env, jobject, jlong jdb_handle, jboolean jflush_memtable) { + auto* db = reinterpret_cast(jdb_handle); + std::vector live_files; + uint64_t manifest_file_size = 0; + auto s = db->GetLiveFiles( + live_files, &manifest_file_size, jflush_memtable == JNI_TRUE); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + // append the manifest_file_size to the vector + // for passing back to java + live_files.push_back(std::to_string(manifest_file_size)); + + return rocksdb::JniUtil::toJavaStrings(env, &live_files); +} + +/* + * Class: org_rocksdb_RocksDB + * Method: getSortedWalFiles + * Signature: (J)[Lorg/rocksdb/LogFile; + */ +jobjectArray Java_org_rocksdb_RocksDB_getSortedWalFiles( + JNIEnv* env, jobject, jlong jdb_handle) { + auto* db = reinterpret_cast(jdb_handle); + std::vector> sorted_wal_files; + auto s = db->GetSortedWalFiles(sorted_wal_files); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + // convert to Java type + const jsize jlen = static_cast(sorted_wal_files.size()); + jobjectArray jsorted_wal_files = env->NewObjectArray( + jlen, rocksdb::LogFileJni::getJClass(env), nullptr); + if(jsorted_wal_files == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + + jsize i = 0; + for (auto it = sorted_wal_files.begin(); it != sorted_wal_files.end(); ++it) { + jobject jlog_file = rocksdb::LogFileJni::fromCppLogFile(env, it->get()); + if (jlog_file == nullptr) { + // exception occurred + env->DeleteLocalRef(jsorted_wal_files); + return nullptr; + } + + env->SetObjectArrayElement(jsorted_wal_files, i++, jlog_file); + if (env->ExceptionCheck()) { + // exception occurred + env->DeleteLocalRef(jlog_file); + env->DeleteLocalRef(jsorted_wal_files); + return nullptr; + } + + env->DeleteLocalRef(jlog_file); + } + + return jsorted_wal_files; +} /* * Class: org_rocksdb_RocksDB * Method: getUpdatesSince * Signature: (JJ)J */ -jlong Java_org_rocksdb_RocksDB_getUpdatesSince(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, - jlong jsequence_number) { +jlong Java_org_rocksdb_RocksDB_getUpdatesSince( + JNIEnv* env, jobject, jlong jdb_handle, jlong jsequence_number) { auto* db = reinterpret_cast(jdb_handle); rocksdb::SequenceNumber sequence_number = static_cast(jsequence_number); @@ -2189,68 +2672,86 @@ jlong Java_org_rocksdb_RocksDB_getUpdatesSince(JNIEnv* env, jobject /*jdb*/, /* * Class: org_rocksdb_RocksDB - * Method: setOptions - * Signature: (JJ[Ljava/lang/String;[Ljava/lang/String;)V + * Method: deleteFile + * Signature: (JLjava/lang/String;)V */ -void Java_org_rocksdb_RocksDB_setOptions(JNIEnv* env, jobject /*jdb*/, - jlong jdb_handle, jlong jcf_handle, - jobjectArray jkeys, - jobjectArray jvalues) { - const jsize len = env->GetArrayLength(jkeys); - assert(len == env->GetArrayLength(jvalues)); - - std::unordered_map options_map; - for (jsize i = 0; i < len; i++) { - jobject jobj_key = env->GetObjectArrayElement(jkeys, i); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - return; - } - - jobject jobj_value = env->GetObjectArrayElement(jvalues, i); - if (env->ExceptionCheck()) { - // exception thrown: ArrayIndexOutOfBoundsException - env->DeleteLocalRef(jobj_key); - return; - } +void 
Java_org_rocksdb_RocksDB_deleteFile( + JNIEnv* env, jobject, jlong jdb_handle, jstring jname) { + auto* db = reinterpret_cast(jdb_handle); + jboolean has_exception = JNI_FALSE; + std::string name = + rocksdb::JniUtil::copyStdString(env, jname, &has_exception); + if (has_exception == JNI_TRUE) { + // exception occurred + return; + } + db->DeleteFile(name); +} - jstring jkey = reinterpret_cast(jobj_key); - jstring jval = reinterpret_cast(jobj_value); +/* + * Class: org_rocksdb_RocksDB + * Method: getLiveFilesMetaData + * Signature: (J)[Lorg/rocksdb/LiveFileMetaData; + */ +jobjectArray Java_org_rocksdb_RocksDB_getLiveFilesMetaData( + JNIEnv* env, jobject, jlong jdb_handle) { + auto* db = reinterpret_cast(jdb_handle); + std::vector live_files_meta_data; + db->GetLiveFilesMetaData(&live_files_meta_data); + + // convert to Java type + const jsize jlen = static_cast(live_files_meta_data.size()); + jobjectArray jlive_files_meta_data = env->NewObjectArray( + jlen, rocksdb::LiveFileMetaDataJni::getJClass(env), nullptr); + if(jlive_files_meta_data == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } - const char* key = env->GetStringUTFChars(jkey, nullptr); - if (key == nullptr) { - // exception thrown: OutOfMemoryError - env->DeleteLocalRef(jobj_value); - env->DeleteLocalRef(jobj_key); - return; + jsize i = 0; + for (auto it = live_files_meta_data.begin(); it != live_files_meta_data.end(); ++it) { + jobject jlive_file_meta_data = + rocksdb::LiveFileMetaDataJni::fromCppLiveFileMetaData(env, &(*it)); + if (jlive_file_meta_data == nullptr) { + // exception occurred + env->DeleteLocalRef(jlive_files_meta_data); + return nullptr; } - const char* value = env->GetStringUTFChars(jval, nullptr); - if (value == nullptr) { - // exception thrown: OutOfMemoryError - env->ReleaseStringUTFChars(jkey, key); - env->DeleteLocalRef(jobj_value); - env->DeleteLocalRef(jobj_key); - return; + env->SetObjectArrayElement(jlive_files_meta_data, i++, jlive_file_meta_data); + if (env->ExceptionCheck()) { + // exception occurred + env->DeleteLocalRef(jlive_file_meta_data); + env->DeleteLocalRef(jlive_files_meta_data); + return nullptr; } - std::string s_key(key); - std::string s_value(value); - options_map[s_key] = s_value; - - env->ReleaseStringUTFChars(jkey, key); - env->ReleaseStringUTFChars(jval, value); - env->DeleteLocalRef(jobj_key); - env->DeleteLocalRef(jobj_value); + env->DeleteLocalRef(jlive_file_meta_data); } - auto* db = reinterpret_cast(jdb_handle); - auto* cf_handle = reinterpret_cast(jcf_handle); - db->SetOptions(cf_handle, options_map); + return jlive_files_meta_data; } -////////////////////////////////////////////////////////////////////////////// -// rocksdb::DB::IngestExternalFile +/* + * Class: org_rocksdb_RocksDB + * Method: getColumnFamilyMetaData + * Signature: (JJ)Lorg/rocksdb/ColumnFamilyMetaData; + */ +jobject Java_org_rocksdb_RocksDB_getColumnFamilyMetaData( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + rocksdb::ColumnFamilyMetaData cf_metadata; + db->GetColumnFamilyMetaData(cf_handle, &cf_metadata); + return rocksdb::ColumnFamilyMetaDataJni::fromCppColumnFamilyMetaData( + env, &cf_metadata); +} /* * Class: org_rocksdb_RocksDB @@ -2258,7 +2759,7 @@ void Java_org_rocksdb_RocksDB_setOptions(JNIEnv* env, jobject /*jdb*/, * Signature: 
(JJ[Ljava/lang/String;IJ)V */ void Java_org_rocksdb_RocksDB_ingestExternalFile( - JNIEnv* env, jobject /*jdb*/, jlong jdb_handle, jlong jcf_handle, + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, jobjectArray jfile_path_list, jint jfile_path_list_len, jlong jingest_external_file_options_handle) { jboolean has_exception = JNI_FALSE; @@ -2281,14 +2782,249 @@ void Java_org_rocksdb_RocksDB_ingestExternalFile( } } +/* + * Class: org_rocksdb_RocksDB + * Method: verifyChecksum + * Signature: (J)V + */ +void Java_org_rocksdb_RocksDB_verifyChecksum( + JNIEnv* env, jobject, jlong jdb_handle) { + auto* db = reinterpret_cast(jdb_handle); + auto s = db->VerifyChecksum(); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } +} + +/* + * Class: org_rocksdb_RocksDB + * Method: getDefaultColumnFamily + * Signature: (J)J + */ +jlong Java_org_rocksdb_RocksDB_getDefaultColumnFamily( + JNIEnv*, jobject, jlong jdb_handle) { + auto* db_handle = reinterpret_cast(jdb_handle); + auto* cf_handle = db_handle->DefaultColumnFamily(); + return reinterpret_cast(cf_handle); +} + +/* + * Class: org_rocksdb_RocksDB + * Method: getPropertiesOfAllTables + * Signature: (JJ)Ljava/util/Map; + */ +jobject Java_org_rocksdb_RocksDB_getPropertiesOfAllTables( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + rocksdb::TablePropertiesCollection table_properties_collection; + auto s = db->GetPropertiesOfAllTables(cf_handle, + &table_properties_collection); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } + + // convert to Java type + jobject jhash_map = rocksdb::HashMapJni::construct( + env, static_cast(table_properties_collection.size())); + if (jhash_map == nullptr) { + // exception occurred + return nullptr; + } + + const rocksdb::HashMapJni::FnMapKV, jobject, jobject> fn_map_kv = + [env](const std::pair>& kv) { + jstring jkey = rocksdb::JniUtil::toJavaString(env, &(kv.first), false); + if (env->ExceptionCheck()) { + // an error occurred + return std::unique_ptr>(nullptr); + } + + jobject jtable_properties = rocksdb::TablePropertiesJni::fromCppTableProperties(env, *(kv.second.get())); + if (jtable_properties == nullptr) { + // an error occurred + env->DeleteLocalRef(jkey); + return std::unique_ptr>(nullptr); + } + + return std::unique_ptr>(new std::pair(static_cast(jkey), static_cast(jtable_properties))); + }; + + if (!rocksdb::HashMapJni::putAll(env, jhash_map, table_properties_collection.begin(), table_properties_collection.end(), fn_map_kv)) { + // exception occurred + return nullptr; + } + + return jhash_map; +} + +/* + * Class: org_rocksdb_RocksDB + * Method: getPropertiesOfTablesInRange + * Signature: (JJ[J)Ljava/util/Map; + */ +jobject Java_org_rocksdb_RocksDB_getPropertiesOfTablesInRange( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle, + jlongArray jrange_slice_handles) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + const jsize jlen = env->GetArrayLength(jrange_slice_handles); + jboolean jrange_slice_handles_is_copy = JNI_FALSE; + jlong *jrange_slice_handle = env->GetLongArrayElements( + jrange_slice_handles, &jrange_slice_handles_is_copy); + if (jrange_slice_handle == 
nullptr) { + // exception occurred + return nullptr; + } + + const size_t ranges_len = static_cast(jlen / 2); + auto ranges = std::unique_ptr(new rocksdb::Range[ranges_len]); + for (jsize i = 0, j = 0; i < jlen; ++i) { + auto* start = reinterpret_cast( + jrange_slice_handle[i]); + auto* limit = reinterpret_cast( + jrange_slice_handle[++i]); + ranges[j++] = rocksdb::Range(*start, *limit); + } + + rocksdb::TablePropertiesCollection table_properties_collection; + auto s = db->GetPropertiesOfTablesInRange( + cf_handle, ranges.get(), ranges_len, &table_properties_collection); + if (!s.ok()) { + // error occurred + env->ReleaseLongArrayElements(jrange_slice_handles, jrange_slice_handle, JNI_ABORT); + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + // cleanup + env->ReleaseLongArrayElements(jrange_slice_handles, jrange_slice_handle, JNI_ABORT); + + return jrange_slice_handles; +} + +/* + * Class: org_rocksdb_RocksDB + * Method: suggestCompactRange + * Signature: (JJ)[J + */ +jlongArray Java_org_rocksdb_RocksDB_suggestCompactRange( + JNIEnv* env, jobject, jlong jdb_handle, jlong jcf_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + auto* begin = new rocksdb::Slice(); + auto* end = new rocksdb::Slice(); + auto s = db->SuggestCompactRange(cf_handle, begin, end); + if (!s.ok()) { + // error occurred + delete begin; + delete end; + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + return nullptr; + } + + jlongArray jslice_handles = env->NewLongArray(2); + if (jslice_handles == nullptr) { + // exception thrown: OutOfMemoryError + delete begin; + delete end; + return nullptr; + } + + jlong slice_handles[2]; + slice_handles[0] = reinterpret_cast(begin); + slice_handles[1] = reinterpret_cast(end); + env->SetLongArrayRegion(jslice_handles, 0, 2, slice_handles); + if (env->ExceptionCheck()) { + // exception thrown: ArrayIndexOutOfBoundsException + delete begin; + delete end; + env->DeleteLocalRef(jslice_handles); + return nullptr; + } + + return jslice_handles; +} + +/* + * Class: org_rocksdb_RocksDB + * Method: promoteL0 + * Signature: (JJI)V + */ +void Java_org_rocksdb_RocksDB_promoteL0( + JNIEnv*, jobject, jlong jdb_handle, jlong jcf_handle, jint jtarget_level) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::ColumnFamilyHandle* cf_handle; + if (jcf_handle == 0) { + cf_handle = db->DefaultColumnFamily(); + } else { + cf_handle = + reinterpret_cast(jcf_handle); + } + db->PromoteL0(cf_handle, static_cast(jtarget_level)); +} + +/* + * Class: org_rocksdb_RocksDB + * Method: startTrace + * Signature: (JJJ)V + */ +void Java_org_rocksdb_RocksDB_startTrace( + JNIEnv* env, jobject, jlong jdb_handle, jlong jmax_trace_file_size, + jlong jtrace_writer_jnicallback_handle) { + auto* db = reinterpret_cast(jdb_handle); + rocksdb::TraceOptions trace_options; + trace_options.max_trace_file_size = + static_cast(jmax_trace_file_size); + // transfer ownership of trace writer from Java to C++ + auto trace_writer = std::unique_ptr( + reinterpret_cast( + jtrace_writer_jnicallback_handle)); + auto s = db->StartTrace(trace_options, std::move(trace_writer)); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } +} + +/* + * Class: org_rocksdb_RocksDB + * Method: endTrace + * Signature: (J)V + */ +JNIEXPORT void JNICALL Java_org_rocksdb_RocksDB_endTrace( + JNIEnv* env, jobject, jlong jdb_handle) { + auto* db = 
reinterpret_cast(jdb_handle); + auto s = db->EndTrace(); + if (!s.ok()) { + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + } +} + /* * Class: org_rocksdb_RocksDB * Method: destroyDB * Signature: (Ljava/lang/String;J)V */ -void Java_org_rocksdb_RocksDB_destroyDB(JNIEnv* env, jclass /*jcls*/, - jstring jdb_path, - jlong joptions_handle) { +void Java_org_rocksdb_RocksDB_destroyDB( + JNIEnv* env, jclass, jstring jdb_path, jlong joptions_handle) { const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); if (db_path == nullptr) { // exception thrown: OutOfMemoryError diff --git a/ceph/src/rocksdb/java/rocksjni/sst_file_manager.cc b/ceph/src/rocksdb/java/rocksjni/sst_file_manager.cc index c83ea00ef..3df3c9966 100644 --- a/ceph/src/rocksdb/java/rocksjni/sst_file_manager.cc +++ b/ceph/src/rocksdb/java/rocksjni/sst_file_manager.cc @@ -129,6 +129,8 @@ jobject Java_org_rocksdb_SstFileManager_getTrackedFiles(JNIEnv* env, reinterpret_cast*>(jhandle); auto tracked_files = sptr_sst_file_manager->get()->GetTrackedFiles(); + //TODO(AR) could refactor to share code with rocksdb::HashMapJni::fromCppMap(env, tracked_files); + const jobject jtracked_files = rocksdb::HashMapJni::construct( env, static_cast(tracked_files.size())); if (jtracked_files == nullptr) { @@ -136,7 +138,7 @@ jobject Java_org_rocksdb_SstFileManager_getTrackedFiles(JNIEnv* env, return nullptr; } - const rocksdb::HashMapJni::FnMapKV + const rocksdb::HashMapJni::FnMapKV fn_map_kv = [env](const std::pair& pair) { const jstring jtracked_file_path = diff --git a/ceph/src/rocksdb/java/rocksjni/statistics.cc b/ceph/src/rocksdb/java/rocksjni/statistics.cc index dd58cf60c..ae7ad5352 100644 --- a/ceph/src/rocksdb/java/rocksjni/statistics.cc +++ b/ceph/src/rocksdb/java/rocksjni/statistics.cc @@ -20,8 +20,10 @@ * Method: newStatistics * Signature: ()J */ -jlong Java_org_rocksdb_Statistics_newStatistics__(JNIEnv* env, jclass jcls) { - return Java_org_rocksdb_Statistics_newStatistics___3BJ(env, jcls, nullptr, 0); +jlong Java_org_rocksdb_Statistics_newStatistics__( + JNIEnv* env, jclass jcls) { + return Java_org_rocksdb_Statistics_newStatistics___3BJ( + env, jcls, nullptr, 0); } /* @@ -40,10 +42,10 @@ jlong Java_org_rocksdb_Statistics_newStatistics__J( * Method: newStatistics * Signature: ([B)J */ -jlong Java_org_rocksdb_Statistics_newStatistics___3B(JNIEnv* env, jclass jcls, - jbyteArray jhistograms) { - return Java_org_rocksdb_Statistics_newStatistics___3BJ(env, jcls, jhistograms, - 0); +jlong Java_org_rocksdb_Statistics_newStatistics___3B( + JNIEnv* env, jclass jcls, jbyteArray jhistograms) { + return Java_org_rocksdb_Statistics_newStatistics___3BJ( + env, jcls, jhistograms, 0); } /* @@ -52,8 +54,7 @@ jlong Java_org_rocksdb_Statistics_newStatistics___3B(JNIEnv* env, jclass jcls, * Signature: ([BJ)J */ jlong Java_org_rocksdb_Statistics_newStatistics___3BJ( - JNIEnv* env, jclass /*jcls*/, jbyteArray jhistograms, - jlong jother_statistics_handle) { + JNIEnv* env, jclass, jbyteArray jhistograms, jlong jother_statistics_handle) { std::shared_ptr* pSptr_other_statistics = nullptr; if (jother_statistics_handle > 0) { pSptr_other_statistics = @@ -97,9 +98,8 @@ jlong Java_org_rocksdb_Statistics_newStatistics___3BJ( * Method: disposeInternal * Signature: (J)V */ -void Java_org_rocksdb_Statistics_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_Statistics_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { if (jhandle > 0) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); @@ -112,13 +112,13 @@ void 
Java_org_rocksdb_Statistics_disposeInternal(JNIEnv* /*env*/, * Method: statsLevel * Signature: (J)B */ -jbyte Java_org_rocksdb_Statistics_statsLevel(JNIEnv* /*env*/, jobject /*jobj*/, - jlong jhandle) { +jbyte Java_org_rocksdb_Statistics_statsLevel( + JNIEnv*, jobject, jlong jhandle) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); return rocksdb::StatsLevelJni::toJavaStatsLevel( - pSptr_statistics->get()->stats_level_); + pSptr_statistics->get()->get_stats_level()); } /* @@ -126,14 +126,13 @@ jbyte Java_org_rocksdb_Statistics_statsLevel(JNIEnv* /*env*/, jobject /*jobj*/, * Method: setStatsLevel * Signature: (JB)V */ -void Java_org_rocksdb_Statistics_setStatsLevel(JNIEnv* /*env*/, - jobject /*jobj*/, jlong jhandle, - jbyte jstats_level) { +void Java_org_rocksdb_Statistics_setStatsLevel( + JNIEnv*, jobject, jlong jhandle, jbyte jstats_level) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); auto stats_level = rocksdb::StatsLevelJni::toCppStatsLevel(jstats_level); - pSptr_statistics->get()->stats_level_ = stats_level; + pSptr_statistics->get()->set_stats_level(stats_level); } /* @@ -141,15 +140,14 @@ void Java_org_rocksdb_Statistics_setStatsLevel(JNIEnv* /*env*/, * Method: getTickerCount * Signature: (JB)J */ -jlong Java_org_rocksdb_Statistics_getTickerCount(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jbyte jticker_type) { +jlong Java_org_rocksdb_Statistics_getTickerCount( + JNIEnv*, jobject, jlong jhandle, jbyte jticker_type) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); auto ticker = rocksdb::TickerTypeJni::toCppTickers(jticker_type); - return pSptr_statistics->get()->getTickerCount(ticker); + uint64_t count = pSptr_statistics->get()->getTickerCount(ticker); + return static_cast(count); } /* @@ -157,10 +155,8 @@ jlong Java_org_rocksdb_Statistics_getTickerCount(JNIEnv* /*env*/, * Method: getAndResetTickerCount * Signature: (JB)J */ -jlong Java_org_rocksdb_Statistics_getAndResetTickerCount(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle, - jbyte jticker_type) { +jlong Java_org_rocksdb_Statistics_getAndResetTickerCount( + JNIEnv*, jobject, jlong jhandle, jbyte jticker_type) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); @@ -173,17 +169,16 @@ jlong Java_org_rocksdb_Statistics_getAndResetTickerCount(JNIEnv* /*env*/, * Method: getHistogramData * Signature: (JB)Lorg/rocksdb/HistogramData; */ -jobject Java_org_rocksdb_Statistics_getHistogramData(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jbyte jhistogram_type) { +jobject Java_org_rocksdb_Statistics_getHistogramData( + JNIEnv* env, jobject, jlong jhandle, jbyte jhistogram_type) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); - rocksdb::HistogramData - data; // TODO(AR) perhaps better to construct a Java Object Wrapper that - // uses ptr to C++ `new HistogramData` + // TODO(AR) perhaps better to construct a Java Object Wrapper that + // uses ptr to C++ `new HistogramData` + rocksdb::HistogramData data; + auto histogram = rocksdb::HistogramTypeJni::toCppHistograms(jhistogram_type); pSptr_statistics->get()->histogramData( static_cast(histogram), &data); @@ -202,7 +197,8 @@ jobject Java_org_rocksdb_Statistics_getHistogramData(JNIEnv* env, return env->NewObject(jclazz, mid, data.median, data.percentile95, data.percentile99, data.average, - data.standard_deviation); + data.standard_deviation, data.max, 
data.count, + data.sum, data.min); } /* @@ -210,10 +206,8 @@ jobject Java_org_rocksdb_Statistics_getHistogramData(JNIEnv* env, * Method: getHistogramString * Signature: (JB)Ljava/lang/String; */ -jstring Java_org_rocksdb_Statistics_getHistogramString(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jbyte jhistogram_type) { +jstring Java_org_rocksdb_Statistics_getHistogramString( + JNIEnv* env, jobject, jlong jhandle, jbyte jhistogram_type) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); @@ -227,8 +221,8 @@ jstring Java_org_rocksdb_Statistics_getHistogramString(JNIEnv* env, * Method: reset * Signature: (J)V */ -void Java_org_rocksdb_Statistics_reset(JNIEnv* env, jobject /*jobj*/, - jlong jhandle) { +void Java_org_rocksdb_Statistics_reset( + JNIEnv* env, jobject, jlong jhandle) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); @@ -243,8 +237,8 @@ void Java_org_rocksdb_Statistics_reset(JNIEnv* env, jobject /*jobj*/, * Method: toString * Signature: (J)Ljava/lang/String; */ -jstring Java_org_rocksdb_Statistics_toString(JNIEnv* env, jobject /*jobj*/, - jlong jhandle) { +jstring Java_org_rocksdb_Statistics_toString( + JNIEnv* env, jobject, jlong jhandle) { auto* pSptr_statistics = reinterpret_cast*>(jhandle); assert(pSptr_statistics != nullptr); diff --git a/ceph/src/rocksdb/java/rocksjni/statisticsjni.cc b/ceph/src/rocksdb/java/rocksjni/statisticsjni.cc index 3ac1e5b41..f59ace4df 100644 --- a/ceph/src/rocksdb/java/rocksjni/statisticsjni.cc +++ b/ceph/src/rocksdb/java/rocksjni/statisticsjni.cc @@ -10,25 +10,23 @@ namespace rocksdb { - StatisticsJni::StatisticsJni(std::shared_ptr stats) - : StatisticsImpl(stats, false), m_ignore_histograms() { - } +StatisticsJni::StatisticsJni(std::shared_ptr stats) + : StatisticsImpl(stats), m_ignore_histograms() {} - StatisticsJni::StatisticsJni(std::shared_ptr stats, - const std::set ignore_histograms) : StatisticsImpl(stats, false), - m_ignore_histograms(ignore_histograms) { - } +StatisticsJni::StatisticsJni(std::shared_ptr stats, + const std::set ignore_histograms) + : StatisticsImpl(stats), m_ignore_histograms(ignore_histograms) {} - bool StatisticsJni::HistEnabledForType(uint32_t type) const { - if (type >= HISTOGRAM_ENUM_MAX) { - return false; - } - - if (m_ignore_histograms.count(type) > 0) { - return false; - } +bool StatisticsJni::HistEnabledForType(uint32_t type) const { + if (type >= HISTOGRAM_ENUM_MAX) { + return false; + } - return true; + if (m_ignore_histograms.count(type) > 0) { + return false; } + + return true; +} // @lint-ignore TXT4 T25377293 Grandfathered in }; \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/table.cc b/ceph/src/rocksdb/java/rocksjni/table.cc index 5f5f8cd2a..1ccc550ab 100644 --- a/ceph/src/rocksdb/java/rocksjni/table.cc +++ b/ceph/src/rocksdb/java/rocksjni/table.cc @@ -9,6 +9,7 @@ #include #include "include/org_rocksdb_BlockBasedTableConfig.h" #include "include/org_rocksdb_PlainTableConfig.h" +#include "portal.h" #include "rocksdb/cache.h" #include "rocksdb/filter_policy.h" @@ -37,61 +38,102 @@ jlong Java_org_rocksdb_PlainTableConfig_newTableFactoryHandle( /* * Class: org_rocksdb_BlockBasedTableConfig * Method: newTableFactoryHandle - * Signature: (ZJIJJIIZIZZZJIBBI)J + * Signature: (ZZZZBBDBZJJJJIIIJZZJZZIIZZJIJI)J */ jlong Java_org_rocksdb_BlockBasedTableConfig_newTableFactoryHandle( - JNIEnv * /*env*/, jobject /*jobj*/, jboolean no_block_cache, - jlong block_cache_size, jint block_cache_num_shardbits, jlong 
jblock_cache, - jlong block_size, jint block_size_deviation, jint block_restart_interval, - jboolean whole_key_filtering, jlong jfilter_policy, - jboolean cache_index_and_filter_blocks, - jboolean pin_l0_filter_and_index_blocks_in_cache, - jboolean hash_index_allow_collision, jlong block_cache_compressed_size, - jint block_cache_compressd_num_shard_bits, jbyte jchecksum_type, - jbyte jindex_type, jint jformat_version) { + JNIEnv*, jobject, jboolean jcache_index_and_filter_blocks, + jboolean jcache_index_and_filter_blocks_with_high_priority, + jboolean jpin_l0_filter_and_index_blocks_in_cache, + jboolean jpin_top_level_index_and_filter, jbyte jindex_type_value, + jbyte jdata_block_index_type_value, + jdouble jdata_block_hash_table_util_ratio, jbyte jchecksum_type_value, + jboolean jno_block_cache, jlong jblock_cache_handle, + jlong jpersistent_cache_handle, + jlong jblock_cache_compressed_handle, jlong jblock_size, + jint jblock_size_deviation, jint jblock_restart_interval, + jint jindex_block_restart_interval, jlong jmetadata_block_size, + jboolean jpartition_filters, jboolean juse_delta_encoding, + jlong jfilter_policy_handle, jboolean jwhole_key_filtering, + jboolean jverify_compression, jint jread_amp_bytes_per_bit, + jint jformat_version, jboolean jenable_index_compression, + jboolean jblock_align, jlong jblock_cache_size, + jint jblock_cache_num_shard_bits, jlong jblock_cache_compressed_size, + jint jblock_cache_compressed_num_shard_bits) { rocksdb::BlockBasedTableOptions options; - options.no_block_cache = no_block_cache; - - if (!no_block_cache) { - if (jblock_cache > 0) { + options.cache_index_and_filter_blocks = + static_cast(jcache_index_and_filter_blocks); + options.cache_index_and_filter_blocks_with_high_priority = + static_cast(jcache_index_and_filter_blocks_with_high_priority); + options.pin_l0_filter_and_index_blocks_in_cache = + static_cast(jpin_l0_filter_and_index_blocks_in_cache); + options.pin_top_level_index_and_filter = + static_cast(jpin_top_level_index_and_filter); + options.index_type = + rocksdb::IndexTypeJni::toCppIndexType(jindex_type_value); + options.data_block_index_type = + rocksdb::DataBlockIndexTypeJni::toCppDataBlockIndexType( + jdata_block_index_type_value); + options.data_block_hash_table_util_ratio = + static_cast(jdata_block_hash_table_util_ratio); + options.checksum = + rocksdb::ChecksumTypeJni::toCppChecksumType(jchecksum_type_value); + options.no_block_cache = static_cast(jno_block_cache); + if (options.no_block_cache) { + options.block_cache = nullptr; + } else { + if (jblock_cache_handle > 0) { std::shared_ptr *pCache = - reinterpret_cast *>(jblock_cache); + reinterpret_cast *>(jblock_cache_handle); options.block_cache = *pCache; - } else if (block_cache_size > 0) { - if (block_cache_num_shardbits > 0) { - options.block_cache = - rocksdb::NewLRUCache(block_cache_size, block_cache_num_shardbits); + } else if (jblock_cache_size > 0) { + if (jblock_cache_num_shard_bits > 0) { + options.block_cache = rocksdb::NewLRUCache( + static_cast(jblock_cache_size), + static_cast(jblock_cache_num_shard_bits)); } else { - options.block_cache = rocksdb::NewLRUCache(block_cache_size); + options.block_cache = rocksdb::NewLRUCache( + static_cast(jblock_cache_size)); } } } - options.block_size = block_size; - options.block_size_deviation = block_size_deviation; - options.block_restart_interval = block_restart_interval; - options.whole_key_filtering = whole_key_filtering; - if (jfilter_policy > 0) { - std::shared_ptr *pFilterPolicy = - reinterpret_cast *>( - 
jfilter_policy); - options.filter_policy = *pFilterPolicy; + if (jpersistent_cache_handle > 0) { + std::shared_ptr *pCache = + reinterpret_cast *>(jpersistent_cache_handle); + options.persistent_cache = *pCache; } - options.cache_index_and_filter_blocks = cache_index_and_filter_blocks; - options.pin_l0_filter_and_index_blocks_in_cache = - pin_l0_filter_and_index_blocks_in_cache; - options.hash_index_allow_collision = hash_index_allow_collision; - if (block_cache_compressed_size > 0) { - if (block_cache_compressd_num_shard_bits > 0) { - options.block_cache = rocksdb::NewLRUCache( - block_cache_compressed_size, block_cache_compressd_num_shard_bits); + if (jblock_cache_compressed_handle > 0) { + std::shared_ptr *pCache = + reinterpret_cast *>(jblock_cache_compressed_handle); + options.block_cache_compressed = *pCache; + } else if (jblock_cache_compressed_size > 0) { + if (jblock_cache_compressed_num_shard_bits > 0) { + options.block_cache_compressed = rocksdb::NewLRUCache( + static_cast(jblock_cache_compressed_size), + static_cast(jblock_cache_compressed_num_shard_bits)); } else { - options.block_cache = rocksdb::NewLRUCache(block_cache_compressed_size); + options.block_cache_compressed = rocksdb::NewLRUCache( + static_cast(jblock_cache_compressed_size)); } } - options.checksum = static_cast(jchecksum_type); - options.index_type = - static_cast(jindex_type); - options.format_version = jformat_version; + options.block_size = static_cast(jblock_size); + options.block_size_deviation = static_cast(jblock_size_deviation); + options.block_restart_interval = static_cast(jblock_restart_interval); + options.index_block_restart_interval = static_cast(jindex_block_restart_interval); + options.metadata_block_size = static_cast(jmetadata_block_size); + options.partition_filters = static_cast(jpartition_filters); + options.use_delta_encoding = static_cast(juse_delta_encoding); + if (jfilter_policy_handle > 0) { + std::shared_ptr *pFilterPolicy = + reinterpret_cast *>( + jfilter_policy_handle); + options.filter_policy = *pFilterPolicy; + } + options.whole_key_filtering = static_cast(jwhole_key_filtering); + options.verify_compression = static_cast(jverify_compression); + options.read_amp_bytes_per_bit = static_cast(jread_amp_bytes_per_bit); + options.format_version = static_cast(jformat_version); + options.enable_index_compression = static_cast(jenable_index_compression); + options.block_align = static_cast(jblock_align); return reinterpret_cast(rocksdb::NewBlockBasedTableFactory(options)); } diff --git a/ceph/src/rocksdb/java/rocksjni/table_filter.cc b/ceph/src/rocksdb/java/rocksjni/table_filter.cc new file mode 100644 index 000000000..e5b355621 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/table_filter.cc @@ -0,0 +1,25 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ for +// org.rocksdb.AbstractTableFilter. 
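A minimal Java sketch of how this new table-filter bridge is driven, assuming the API shape implied by the rest of this patch (an AbstractTableFilter base class, a TableProperties accessor such as getNumEntries(), and a ReadOptions#setTableFilter setter); the class name is illustrative only:

import org.rocksdb.AbstractTableFilter;
import org.rocksdb.TableProperties;

// Returning true asks RocksDB to scan the table; returning false skips it.
public class NonEmptyTableFilter extends AbstractTableFilter {
  @Override
  public boolean filter(final TableProperties tableProperties) {
    // only scan tables that contain at least one entry
    return tableProperties.getNumEntries() > 0;
  }
}

// Possible wiring (assumed setter):
//   readOptions.setTableFilter(new NonEmptyTableFilter());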
+ +#include <jni.h> +#include <memory> + +#include "include/org_rocksdb_AbstractTableFilter.h" +#include "rocksjni/table_filter_jnicallback.h" + +/* + * Class: org_rocksdb_AbstractTableFilter + * Method: createNewTableFilter + * Signature: ()J + */ +jlong Java_org_rocksdb_AbstractTableFilter_createNewTableFilter( + JNIEnv* env, jobject jtable_filter) { + auto* table_filter_jnicallback = + new rocksdb::TableFilterJniCallback(env, jtable_filter); + return reinterpret_cast<jlong>(table_filter_jnicallback); +} \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/table_filter_jnicallback.cc b/ceph/src/rocksdb/java/rocksjni/table_filter_jnicallback.cc new file mode 100644 index 000000000..680c01445 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/table_filter_jnicallback.cc @@ -0,0 +1,62 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the callback "bridge" between Java and C++ for +// rocksdb::TableFilter. + +#include "rocksjni/table_filter_jnicallback.h" +#include "rocksjni/portal.h" + +namespace rocksdb { +TableFilterJniCallback::TableFilterJniCallback( + JNIEnv* env, jobject jtable_filter) + : JniCallback(env, jtable_filter) { + m_jfilter_methodid = + AbstractTableFilterJni::getFilterMethod(env); + if(m_jfilter_methodid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } + + // create the function reference + /* + Note the JNI ENV must be obtained/released + on each call to the function itself as + it may be called from multiple threads + */ + m_table_filter_function = [this](const rocksdb::TableProperties& table_properties) { + jboolean attached_thread = JNI_FALSE; + JNIEnv* thread_env = getJniEnv(&attached_thread); + assert(thread_env != nullptr); + + // create a Java TableProperties object + jobject jtable_properties = TablePropertiesJni::fromCppTableProperties(thread_env, table_properties); + if (jtable_properties == nullptr) { + // exception thrown from fromCppTableProperties + thread_env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return false; + } + + jboolean result = thread_env->CallBooleanMethod(m_jcallback_obj, m_jfilter_methodid, jtable_properties); + if (thread_env->ExceptionCheck()) { + // exception thrown from CallBooleanMethod + thread_env->DeleteLocalRef(jtable_properties); + thread_env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return false; + } + + // ok... cleanup and then return + releaseJniEnv(attached_thread); + return static_cast<bool>(result); + }; +} + +std::function<bool(const rocksdb::TableProperties&)> TableFilterJniCallback::GetTableFilterFunction() { + return m_table_filter_function; +} + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/java/rocksjni/table_filter_jnicallback.h b/ceph/src/rocksdb/java/rocksjni/table_filter_jnicallback.h new file mode 100644 index 000000000..39a0c90e0 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/table_filter_jnicallback.h @@ -0,0 +1,34 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the callback "bridge" between Java and C++ for +// rocksdb::TableFilter. 
+ +#ifndef JAVA_ROCKSJNI_TABLE_FILTER_JNICALLBACK_H_ +#define JAVA_ROCKSJNI_TABLE_FILTER_JNICALLBACK_H_ + +#include <jni.h> +#include <functional> +#include <memory> + +#include "rocksdb/table_properties.h" +#include "rocksjni/jnicallback.h" + +namespace rocksdb { + +class TableFilterJniCallback : public JniCallback { + public: + TableFilterJniCallback( + JNIEnv* env, jobject jtable_filter); + std::function<bool(const rocksdb::TableProperties&)> GetTableFilterFunction(); + + private: + jmethodID m_jfilter_methodid; + std::function<bool(const rocksdb::TableProperties&)> m_table_filter_function; +}; + +} //namespace rocksdb + +#endif // JAVA_ROCKSJNI_TABLE_FILTER_JNICALLBACK_H_ diff --git a/ceph/src/rocksdb/java/rocksjni/thread_status.cc b/ceph/src/rocksdb/java/rocksjni/thread_status.cc new file mode 100644 index 000000000..f70d515a5 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/thread_status.cc @@ -0,0 +1,121 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ and enables +// calling C++ rocksdb::ThreadStatus methods from the Java side. + +#include <jni.h> + +#include "portal.h" +#include "include/org_rocksdb_ThreadStatus.h" +#include "rocksdb/thread_status.h" + +/* + * Class: org_rocksdb_ThreadStatus + * Method: getThreadTypeName + * Signature: (B)Ljava/lang/String; + */ +jstring Java_org_rocksdb_ThreadStatus_getThreadTypeName( + JNIEnv* env, jclass, jbyte jthread_type_value) { + auto name = rocksdb::ThreadStatus::GetThreadTypeName( + rocksdb::ThreadTypeJni::toCppThreadType(jthread_type_value)); + return rocksdb::JniUtil::toJavaString(env, &name, true); +} + +/* + * Class: org_rocksdb_ThreadStatus + * Method: getOperationName + * Signature: (B)Ljava/lang/String; + */ +jstring Java_org_rocksdb_ThreadStatus_getOperationName( + JNIEnv* env, jclass, jbyte joperation_type_value) { + auto name = rocksdb::ThreadStatus::GetOperationName( + rocksdb::OperationTypeJni::toCppOperationType(joperation_type_value)); + return rocksdb::JniUtil::toJavaString(env, &name, true); +} + +/* + * Class: org_rocksdb_ThreadStatus + * Method: microsToStringNative + * Signature: (J)Ljava/lang/String; + */ +jstring Java_org_rocksdb_ThreadStatus_microsToStringNative( + JNIEnv* env, jclass, jlong jmicros) { + auto str = + rocksdb::ThreadStatus::MicrosToString(static_cast<uint64_t>(jmicros)); + return rocksdb::JniUtil::toJavaString(env, &str, true); +} + +/* + * Class: org_rocksdb_ThreadStatus + * Method: getOperationStageName + * Signature: (B)Ljava/lang/String; + */ +jstring Java_org_rocksdb_ThreadStatus_getOperationStageName( + JNIEnv* env, jclass, jbyte joperation_stage_value) { + auto name = rocksdb::ThreadStatus::GetOperationStageName( + rocksdb::OperationStageJni::toCppOperationStage(joperation_stage_value)); + return rocksdb::JniUtil::toJavaString(env, &name, true); +} + +/* + * Class: org_rocksdb_ThreadStatus + * Method: getOperationPropertyName + * Signature: (BI)Ljava/lang/String; + */ +jstring Java_org_rocksdb_ThreadStatus_getOperationPropertyName( + JNIEnv* env, jclass, jbyte joperation_type_value, jint jindex) { + auto name = rocksdb::ThreadStatus::GetOperationPropertyName( + rocksdb::OperationTypeJni::toCppOperationType(joperation_type_value), + static_cast<int>(jindex)); + return rocksdb::JniUtil::toJavaString(env, &name, true); +} + +/* + * Class: org_rocksdb_ThreadStatus + * Method: interpretOperationProperties + * Signature: (B[J)Ljava/util/Map; + */ +jobject 
Java_org_rocksdb_ThreadStatus_interpretOperationProperties( + JNIEnv* env, jclass, jbyte joperation_type_value, + jlongArray joperation_properties) { + + // convert joperation_properties + const jsize len = env->GetArrayLength(joperation_properties); + const std::unique_ptr<uint64_t[]> op_properties(new uint64_t[len]); + jlong* jop = env->GetLongArrayElements(joperation_properties, nullptr); + if (jop == nullptr) { + // exception thrown: OutOfMemoryError + return nullptr; + } + for (jsize i = 0; i < len; i++) { + op_properties[i] = static_cast<uint64_t>(jop[i]); + } + env->ReleaseLongArrayElements(joperation_properties, jop, JNI_ABORT); + + // call the function + auto result = rocksdb::ThreadStatus::InterpretOperationProperties( + rocksdb::OperationTypeJni::toCppOperationType(joperation_type_value), + op_properties.get()); + jobject jresult = rocksdb::HashMapJni::fromCppMap(env, &result); + if (env->ExceptionCheck()) { + // exception occurred + return nullptr; + } + + return jresult; +} + +/* + * Class: org_rocksdb_ThreadStatus + * Method: getStateName + * Signature: (B)Ljava/lang/String; + */ +jstring Java_org_rocksdb_ThreadStatus_getStateName( + JNIEnv* env, jclass, jbyte jstate_type_value) { + auto name = rocksdb::ThreadStatus::GetStateName( + rocksdb::StateTypeJni::toCppStateType(jstate_type_value)); + return rocksdb::JniUtil::toJavaString(env, &name, true); +} \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/trace_writer.cc b/ceph/src/rocksdb/java/rocksjni/trace_writer.cc new file mode 100644 index 000000000..5d47cfcb3 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/trace_writer.cc @@ -0,0 +1,23 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ for +// rocksdb::TraceWriter. + +#include <jni.h> + +#include "include/org_rocksdb_AbstractTraceWriter.h" +#include "rocksjni/trace_writer_jnicallback.h" + +/* + * Class: org_rocksdb_AbstractTraceWriter + * Method: createNewTraceWriter + * Signature: ()J + */ +jlong Java_org_rocksdb_AbstractTraceWriter_createNewTraceWriter( + JNIEnv* env, jobject jobj) { + auto* trace_writer = new rocksdb::TraceWriterJniCallback(env, jobj); + return reinterpret_cast<jlong>(trace_writer); +} diff --git a/ceph/src/rocksdb/java/rocksjni/trace_writer_jnicallback.cc b/ceph/src/rocksdb/java/rocksjni/trace_writer_jnicallback.cc new file mode 100644 index 000000000..d547fb3f8 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/trace_writer_jnicallback.cc @@ -0,0 +1,115 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the callback "bridge" between Java and C++ for +// rocksdb::TraceWriter. 
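The proxy methods this callback invokes pack a Status into a jshort (code in the high byte, sub-code in the low byte), so a Java TraceWriter signals failure simply by throwing RocksDBException. A minimal sketch, assuming the interface shape implied by AbstractTraceWriter later in this patch (write(Slice), closeWriter() and getFileSize()); the class name is illustrative:

import java.io.ByteArrayOutputStream;

import org.rocksdb.AbstractTraceWriter;
import org.rocksdb.RocksDBException;
import org.rocksdb.Slice;

// Illustrative writer that buffers trace records in memory rather than a file.
public class InMemoryTraceWriter extends AbstractTraceWriter {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  @Override
  public void write(final Slice data) throws RocksDBException {
    // Slice#data() copies the record contents out of native memory
    final byte[] bytes = data.data();
    buffer.write(bytes, 0, bytes.length);
  }

  @Override
  public void closeWriter() throws RocksDBException {
    // nothing to release for an in-memory buffer
  }

  @Override
  public long getFileSize() {
    return buffer.size();
  }
}

Such a writer would be handed to the tracing entry points (the EndTrace() binding appears earlier in this patch); the exact Java method names for starting a trace are not shown here and remain an assumption.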
+ +#include "rocksjni/trace_writer_jnicallback.h" +#include "rocksjni/portal.h" + +namespace rocksdb { +TraceWriterJniCallback::TraceWriterJniCallback( + JNIEnv* env, jobject jtrace_writer) + : JniCallback(env, jtrace_writer) { + m_jwrite_proxy_methodid = + AbstractTraceWriterJni::getWriteProxyMethodId(env); + if(m_jwrite_proxy_methodid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } + + m_jclose_writer_proxy_methodid = + AbstractTraceWriterJni::getCloseWriterProxyMethodId(env); + if(m_jclose_writer_proxy_methodid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } + + m_jget_file_size_methodid = + AbstractTraceWriterJni::getGetFileSizeMethodId(env); + if(m_jget_file_size_methodid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } +} + +Status TraceWriterJniCallback::Write(const Slice& data) { + jboolean attached_thread = JNI_FALSE; + JNIEnv* env = getJniEnv(&attached_thread); + if (env == nullptr) { + return Status::IOError("Unable to attach JNI Environment"); + } + + jshort jstatus = env->CallShortMethod(m_jcallback_obj, + m_jwrite_proxy_methodid, + &data); + + if(env->ExceptionCheck()) { + // exception thrown from CallShortMethod + env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return Status::IOError("Unable to call AbstractTraceWriter#writeProxy(long)"); + } + + // unpack status code and status sub-code from jstatus + jbyte jcode_value = (jstatus >> 8) & 0xFF; + jbyte jsub_code_value = jstatus & 0xFF; + std::unique_ptr s = StatusJni::toCppStatus(jcode_value, jsub_code_value); + + releaseJniEnv(attached_thread); + + return Status(*s); +} + +Status TraceWriterJniCallback::Close() { + jboolean attached_thread = JNI_FALSE; + JNIEnv* env = getJniEnv(&attached_thread); + if (env == nullptr) { + return Status::IOError("Unable to attach JNI Environment"); + } + + jshort jstatus = env->CallShortMethod(m_jcallback_obj, + m_jclose_writer_proxy_methodid); + + if(env->ExceptionCheck()) { + // exception thrown from CallShortMethod + env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return Status::IOError("Unable to call AbstractTraceWriter#closeWriterProxy()"); + } + + // unpack status code and status sub-code from jstatus + jbyte code_value = (jstatus >> 8) & 0xFF; + jbyte sub_code_value = jstatus & 0xFF; + std::unique_ptr s = StatusJni::toCppStatus(code_value, sub_code_value); + + releaseJniEnv(attached_thread); + + return Status(*s); +} + +uint64_t TraceWriterJniCallback::GetFileSize() { + jboolean attached_thread = JNI_FALSE; + JNIEnv* env = getJniEnv(&attached_thread); + if (env == nullptr) { + return 0; + } + + jlong jfile_size = env->CallLongMethod(m_jcallback_obj, + m_jget_file_size_methodid); + + if(env->ExceptionCheck()) { + // exception thrown from CallLongMethod + env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return 0; + } + + releaseJniEnv(attached_thread); + + return static_cast(jfile_size); +} + +} // namespace rocksdb \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/trace_writer_jnicallback.h b/ceph/src/rocksdb/java/rocksjni/trace_writer_jnicallback.h new file mode 100644 index 000000000..610b6c465 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/trace_writer_jnicallback.h @@ -0,0 +1,36 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the callback "bridge" between Java and C++ for +// rocksdb::TraceWriter. + +#ifndef JAVA_ROCKSJNI_TRACE_WRITER_JNICALLBACK_H_ +#define JAVA_ROCKSJNI_TRACE_WRITER_JNICALLBACK_H_ + +#include +#include + +#include "rocksdb/trace_reader_writer.h" +#include "rocksjni/jnicallback.h" + +namespace rocksdb { + +class TraceWriterJniCallback : public JniCallback, public TraceWriter { + public: + TraceWriterJniCallback( + JNIEnv* env, jobject jtrace_writer); + virtual Status Write(const Slice& data); + virtual Status Close(); + virtual uint64_t GetFileSize(); + + private: + jmethodID m_jwrite_proxy_methodid; + jmethodID m_jclose_writer_proxy_methodid; + jmethodID m_jget_file_size_methodid; +}; + +} //namespace rocksdb + +#endif // JAVA_ROCKSJNI_TRACE_WRITER_JNICALLBACK_H_ diff --git a/ceph/src/rocksdb/java/rocksjni/transaction.cc b/ceph/src/rocksdb/java/rocksjni/transaction.cc index a29736df2..04eb654df 100644 --- a/ceph/src/rocksdb/java/rocksjni/transaction.cc +++ b/ceph/src/rocksdb/java/rocksjni/transaction.cc @@ -418,20 +418,20 @@ jobjectArray Java_org_rocksdb_Transaction_multiGet__JJ_3_3B( /* * Class: org_rocksdb_Transaction * Method: getForUpdate - * Signature: (JJ[BIJZ)[B + * Signature: (JJ[BIJZZ)[B */ -jbyteArray Java_org_rocksdb_Transaction_getForUpdate__JJ_3BIJZ( +jbyteArray Java_org_rocksdb_Transaction_getForUpdate__JJ_3BIJZZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jlong jread_options_handle, jbyteArray jkey, jint jkey_part_len, jlong jcolumn_family_handle, - jboolean jexclusive) { + jboolean jexclusive, jboolean jdo_validate) { auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); auto* txn = reinterpret_cast(jhandle); FnGet fn_get_for_update = std::bind( + const rocksdb::Slice&, std::string*, bool, bool)>( &rocksdb::Transaction::GetForUpdate, txn, _1, column_family_handle, _2, - _3, jexclusive); + _3, jexclusive, jdo_validate); return txn_get_helper(env, fn_get_for_update, jread_options_handle, jkey, jkey_part_len); } @@ -439,15 +439,17 @@ jbyteArray Java_org_rocksdb_Transaction_getForUpdate__JJ_3BIJZ( /* * Class: org_rocksdb_Transaction * Method: getForUpdate - * Signature: (JJ[BIZ)[B + * Signature: (JJ[BIZZ)[B */ -jbyteArray Java_org_rocksdb_Transaction_getForUpdate__JJ_3BIZ( +jbyteArray Java_org_rocksdb_Transaction_getForUpdate__JJ_3BIZZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jlong jread_options_handle, - jbyteArray jkey, jint jkey_part_len, jboolean jexclusive) { + jbyteArray jkey, jint jkey_part_len, jboolean jexclusive, + jboolean jdo_validate) { auto* txn = reinterpret_cast(jhandle); FnGet fn_get_for_update = std::bind( - &rocksdb::Transaction::GetForUpdate, txn, _1, _2, _3, jexclusive); + const rocksdb::ReadOptions&, const rocksdb::Slice&, std::string*, bool, + bool)>(&rocksdb::Transaction::GetForUpdate, txn, _1, _2, _3, jexclusive, + jdo_validate); return txn_get_helper(env, fn_get_for_update, jread_options_handle, jkey, jkey_part_len); } @@ -568,19 +570,20 @@ void txn_write_kv_helper(JNIEnv* env, const FnWriteKV& fn_write_kv, /* * Class: org_rocksdb_Transaction * Method: put - * Signature: (J[BI[BIJ)V + * Signature: (J[BI[BIJZ)V */ -void Java_org_rocksdb_Transaction_put__J_3BI_3BIJ( +void Java_org_rocksdb_Transaction_put__J_3BI_3BIJZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jbyteArray jkey, jint jkey_part_len, jbyteArray 
jval, jint jval_len, - jlong jcolumn_family_handle) { + jlong jcolumn_family_handle, jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteKV fn_put = std::bind(&rocksdb::Transaction::Put, txn, - column_family_handle, _1, _2); + const rocksdb::Slice&, bool)>(&rocksdb::Transaction::Put, txn, + column_family_handle, _1, _2, + jassume_tracked); txn_write_kv_helper(env, fn_put, jkey, jkey_part_len, jval, jval_len); } @@ -706,20 +709,21 @@ void txn_write_kv_parts_helper(JNIEnv* env, /* * Class: org_rocksdb_Transaction * Method: put - * Signature: (J[[BI[[BIJ)V + * Signature: (J[[BI[[BIJZ)V */ -void Java_org_rocksdb_Transaction_put__J_3_3BI_3_3BIJ( +void Java_org_rocksdb_Transaction_put__J_3_3BI_3_3BIJZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jobjectArray jkey_parts, jint jkey_parts_len, jobjectArray jvalue_parts, jint jvalue_parts_len, - jlong jcolumn_family_handle) { + jlong jcolumn_family_handle, jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteKVParts fn_put_parts = std::bind(&rocksdb::Transaction::Put, txn, - column_family_handle, _1, _2); + const rocksdb::SliceParts&, bool)>(&rocksdb::Transaction::Put, txn, + column_family_handle, _1, _2, + jassume_tracked); txn_write_kv_parts_helper(env, fn_put_parts, jkey_parts, jkey_parts_len, jvalue_parts, jvalue_parts_len); } @@ -744,19 +748,20 @@ void Java_org_rocksdb_Transaction_put__J_3_3BI_3_3BI( /* * Class: org_rocksdb_Transaction * Method: merge - * Signature: (J[BI[BIJ)V + * Signature: (J[BI[BIJZ)V */ -void Java_org_rocksdb_Transaction_merge__J_3BI_3BIJ( +void Java_org_rocksdb_Transaction_merge__J_3BI_3BIJZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jbyteArray jkey, jint jkey_part_len, jbyteArray jval, jint jval_len, - jlong jcolumn_family_handle) { + jlong jcolumn_family_handle, jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteKV fn_merge = std::bind(&rocksdb::Transaction::Merge, txn, - column_family_handle, _1, _2); + const rocksdb::Slice&, bool)>(&rocksdb::Transaction::Merge, txn, + column_family_handle, _1, _2, + jassume_tracked); txn_write_kv_helper(env, fn_merge, jkey, jkey_part_len, jval, jval_len); } @@ -803,18 +808,18 @@ void txn_write_k_helper(JNIEnv* env, const FnWriteK& fn_write_k, /* * Class: org_rocksdb_Transaction * Method: delete - * Signature: (J[BIJ)V + * Signature: (J[BIJZ)V */ -void Java_org_rocksdb_Transaction_delete__J_3BIJ(JNIEnv* env, jobject /*jobj*/, - jlong jhandle, jbyteArray jkey, - jint jkey_part_len, - jlong jcolumn_family_handle) { +void Java_org_rocksdb_Transaction_delete__J_3BIJZ( + JNIEnv* env, jobject /*jobj*/, jlong jhandle, jbyteArray jkey, + jint jkey_part_len, jlong jcolumn_family_handle, jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteK fn_delete = std::bind( - &rocksdb::Transaction::Delete, txn, column_family_handle, _1); + rocksdb::ColumnFamilyHandle*, const rocksdb::Slice&, bool)>( + &rocksdb::Transaction::Delete, txn, column_family_handle, _1, + jassume_tracked); txn_write_k_helper(env, fn_delete, jkey, jkey_part_len); } @@ -892,18 +897,20 @@ void txn_write_k_parts_helper(JNIEnv* env, /* * Class: org_rocksdb_Transaction * Method: delete - * Signature: (J[[BIJ)V + * Signature: (J[[BIJZ)V */ -void 
Java_org_rocksdb_Transaction_delete__J_3_3BIJ( +void Java_org_rocksdb_Transaction_delete__J_3_3BIJZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jobjectArray jkey_parts, - jint jkey_parts_len, jlong jcolumn_family_handle) { + jint jkey_parts_len, jlong jcolumn_family_handle, + jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteKParts fn_delete_parts = std::bind( - &rocksdb::Transaction::Delete, txn, column_family_handle, _1); + rocksdb::ColumnFamilyHandle*, const rocksdb::SliceParts&, bool)>( + &rocksdb::Transaction::Delete, txn, column_family_handle, _1, + jassume_tracked); txn_write_k_parts_helper(env, fn_delete_parts, jkey_parts, jkey_parts_len); } @@ -926,18 +933,19 @@ void Java_org_rocksdb_Transaction_delete__J_3_3BI(JNIEnv* env, jobject /*jobj*/, /* * Class: org_rocksdb_Transaction * Method: singleDelete - * Signature: (J[BIJ)V + * Signature: (J[BIJZ)V */ -void Java_org_rocksdb_Transaction_singleDelete__J_3BIJ( +void Java_org_rocksdb_Transaction_singleDelete__J_3BIJZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jbyteArray jkey, - jint jkey_part_len, jlong jcolumn_family_handle) { + jint jkey_part_len, jlong jcolumn_family_handle, jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteK fn_single_delete = std::bind( - &rocksdb::Transaction::SingleDelete, txn, column_family_handle, _1); + rocksdb::ColumnFamilyHandle*, const rocksdb::Slice&, bool)>( + &rocksdb::Transaction::SingleDelete, txn, column_family_handle, _1, + jassume_tracked); txn_write_k_helper(env, fn_single_delete, jkey, jkey_part_len); } @@ -961,18 +969,20 @@ void Java_org_rocksdb_Transaction_singleDelete__J_3BI(JNIEnv* env, /* * Class: org_rocksdb_Transaction * Method: singleDelete - * Signature: (J[[BIJ)V + * Signature: (J[[BIJZ)V */ -void Java_org_rocksdb_Transaction_singleDelete__J_3_3BIJ( +void Java_org_rocksdb_Transaction_singleDelete__J_3_3BIJZ( JNIEnv* env, jobject /*jobj*/, jlong jhandle, jobjectArray jkey_parts, - jint jkey_parts_len, jlong jcolumn_family_handle) { + jint jkey_parts_len, jlong jcolumn_family_handle, + jboolean jassume_tracked) { auto* txn = reinterpret_cast(jhandle); auto* column_family_handle = reinterpret_cast(jcolumn_family_handle); FnWriteKParts fn_single_delete_parts = std::bind( - &rocksdb::Transaction::SingleDelete, txn, column_family_handle, _1); + rocksdb::ColumnFamilyHandle*, const rocksdb::SliceParts&, bool)>( + &rocksdb::Transaction::SingleDelete, txn, column_family_handle, _1, + jassume_tracked); txn_write_k_parts_helper(env, fn_single_delete_parts, jkey_parts, jkey_parts_len); } diff --git a/ceph/src/rocksdb/java/rocksjni/transaction_db.cc b/ceph/src/rocksdb/java/rocksjni/transaction_db.cc index 1914a2422..c2c40bf10 100644 --- a/ceph/src/rocksdb/java/rocksjni/transaction_db.cc +++ b/ceph/src/rocksdb/java/rocksjni/transaction_db.cc @@ -25,7 +25,7 @@ * Signature: (JJLjava/lang/String;)J */ jlong Java_org_rocksdb_TransactionDB_open__JJLjava_lang_String_2( - JNIEnv* env, jclass /*jcls*/, jlong joptions_handle, + JNIEnv* env, jclass, jlong joptions_handle, jlong jtxn_db_options_handle, jstring jdb_path) { auto* options = reinterpret_cast(joptions_handle); auto* txn_db_options = @@ -54,7 +54,7 @@ jlong Java_org_rocksdb_TransactionDB_open__JJLjava_lang_String_2( * Signature: (JJLjava/lang/String;[[B[J)[J */ jlongArray Java_org_rocksdb_TransactionDB_open__JJLjava_lang_String_2_3_3B_3J( - JNIEnv* env, jclass 
/*jcls*/, jlong jdb_options_handle, + JNIEnv* env, jclass, jlong jdb_options_handle, jlong jtxn_db_options_handle, jstring jdb_path, jobjectArray jcolumn_names, jlongArray jcolumn_options_handles) { const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); @@ -151,14 +151,38 @@ jlongArray Java_org_rocksdb_TransactionDB_open__JJLjava_lang_String_2_3_3B_3J( } } +/* + * Class: org_rocksdb_TransactionDB + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_TransactionDB_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* txn_db = reinterpret_cast(jhandle); + assert(txn_db != nullptr); + delete txn_db; +} + +/* + * Class: org_rocksdb_TransactionDB + * Method: closeDatabase + * Signature: (J)V + */ +void Java_org_rocksdb_TransactionDB_closeDatabase( + JNIEnv* env, jclass, jlong jhandle) { + auto* txn_db = reinterpret_cast(jhandle); + assert(txn_db != nullptr); + rocksdb::Status s = txn_db->Close(); + rocksdb::RocksDBExceptionJni::ThrowNew(env, s); +} + /* * Class: org_rocksdb_TransactionDB * Method: beginTransaction * Signature: (JJ)J */ jlong Java_org_rocksdb_TransactionDB_beginTransaction__JJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle) { auto* txn_db = reinterpret_cast(jhandle); auto* write_options = reinterpret_cast(jwrite_options_handle); @@ -172,8 +196,8 @@ jlong Java_org_rocksdb_TransactionDB_beginTransaction__JJ( * Signature: (JJJ)J */ jlong Java_org_rocksdb_TransactionDB_beginTransaction__JJJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle, jlong jtxn_options_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle, + jlong jtxn_options_handle) { auto* txn_db = reinterpret_cast(jhandle); auto* write_options = reinterpret_cast(jwrite_options_handle); @@ -190,8 +214,8 @@ jlong Java_org_rocksdb_TransactionDB_beginTransaction__JJJ( * Signature: (JJJ)J */ jlong Java_org_rocksdb_TransactionDB_beginTransaction_1withOld__JJJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle, jlong jold_txn_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle, + jlong jold_txn_handle) { auto* txn_db = reinterpret_cast(jhandle); auto* write_options = reinterpret_cast(jwrite_options_handle); @@ -214,9 +238,8 @@ jlong Java_org_rocksdb_TransactionDB_beginTransaction_1withOld__JJJ( * Signature: (JJJJ)J */ jlong Java_org_rocksdb_TransactionDB_beginTransaction_1withOld__JJJJ( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jlong jwrite_options_handle, jlong jtxn_options_handle, - jlong jold_txn_handle) { + JNIEnv*, jobject, jlong jhandle, jlong jwrite_options_handle, + jlong jtxn_options_handle, jlong jold_txn_handle) { auto* txn_db = reinterpret_cast(jhandle); auto* write_options = reinterpret_cast(jwrite_options_handle); @@ -239,10 +262,8 @@ jlong Java_org_rocksdb_TransactionDB_beginTransaction_1withOld__JJJJ( * Method: getTransactionByName * Signature: (JLjava/lang/String;)J */ -jlong Java_org_rocksdb_TransactionDB_getTransactionByName(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle, - jstring jname) { +jlong Java_org_rocksdb_TransactionDB_getTransactionByName( + JNIEnv* env, jobject, jlong jhandle, jstring jname) { auto* txn_db = reinterpret_cast(jhandle); const char* name = env->GetStringUTFChars(jname, nullptr); if (name == nullptr) { @@ -260,7 +281,7 @@ jlong Java_org_rocksdb_TransactionDB_getTransactionByName(JNIEnv* env, * Signature: (J)[J */ jlongArray 
Java_org_rocksdb_TransactionDB_getAllPreparedTransactions( - JNIEnv* env, jobject /*jobj*/, jlong jhandle) { + JNIEnv* env, jobject, jlong jhandle) { auto* txn_db = reinterpret_cast(jhandle); std::vector txns; txn_db->GetAllPreparedTransactions(&txns); @@ -294,9 +315,8 @@ jlongArray Java_org_rocksdb_TransactionDB_getAllPreparedTransactions( * Method: getLockStatusData * Signature: (J)Ljava/util/Map; */ -jobject Java_org_rocksdb_TransactionDB_getLockStatusData(JNIEnv* env, - jobject /*jobj*/, - jlong jhandle) { +jobject Java_org_rocksdb_TransactionDB_getLockStatusData( + JNIEnv* env, jobject, jlong jhandle) { auto* txn_db = reinterpret_cast(jhandle); const std::unordered_multimap lock_status_data = txn_db->GetLockStatusData(); @@ -307,7 +327,7 @@ jobject Java_org_rocksdb_TransactionDB_getLockStatusData(JNIEnv* env, return nullptr; } - const rocksdb::HashMapJni::FnMapKV + const rocksdb::HashMapJni::FnMapKV fn_map_kv = [env]( const std::pair& @@ -427,19 +447,7 @@ jobjectArray Java_org_rocksdb_TransactionDB_getDeadlockInfoBuffer( * Signature: (JI)V */ void Java_org_rocksdb_TransactionDB_setDeadlockInfoBufferSize( - JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle, - jint jdeadlock_info_buffer_size) { + JNIEnv*, jobject, jlong jhandle, jint jdeadlock_info_buffer_size) { auto* txn_db = reinterpret_cast(jhandle); txn_db->SetDeadlockInfoBufferSize(jdeadlock_info_buffer_size); } - -/* - * Class: org_rocksdb_TransactionDB - * Method: disposeInternal - * Signature: (J)V - */ -void Java_org_rocksdb_TransactionDB_disposeInternal(JNIEnv* /*env*/, - jobject /*jobj*/, - jlong jhandle) { - delete reinterpret_cast(jhandle); -} diff --git a/ceph/src/rocksdb/java/rocksjni/ttl.cc b/ceph/src/rocksdb/java/rocksjni/ttl.cc index 597332e52..4b071e7b3 100644 --- a/ceph/src/rocksdb/java/rocksjni/ttl.cc +++ b/ceph/src/rocksdb/java/rocksjni/ttl.cc @@ -23,9 +23,9 @@ * Method: open * Signature: (JLjava/lang/String;IZ)J */ -jlong Java_org_rocksdb_TtlDB_open(JNIEnv* env, jclass /*jcls*/, - jlong joptions_handle, jstring jdb_path, - jint jttl, jboolean jread_only) { +jlong Java_org_rocksdb_TtlDB_open( + JNIEnv* env, jclass, jlong joptions_handle, jstring jdb_path, jint jttl, + jboolean jread_only) { const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); if (db_path == nullptr) { // exception thrown: OutOfMemoryError @@ -53,11 +53,10 @@ jlong Java_org_rocksdb_TtlDB_open(JNIEnv* env, jclass /*jcls*/, * Method: openCF * Signature: (JLjava/lang/String;[[B[J[IZ)[J */ -jlongArray Java_org_rocksdb_TtlDB_openCF(JNIEnv* env, jclass /*jcls*/, - jlong jopt_handle, jstring jdb_path, - jobjectArray jcolumn_names, - jlongArray jcolumn_options, - jintArray jttls, jboolean jread_only) { +jlongArray Java_org_rocksdb_TtlDB_openCF( + JNIEnv* env, jclass, jlong jopt_handle, jstring jdb_path, + jobjectArray jcolumn_names, jlongArray jcolumn_options, + jintArray jttls, jboolean jread_only) { const char* db_path = env->GetStringUTFChars(jdb_path, nullptr); if (db_path == nullptr) { // exception thrown: OutOfMemoryError @@ -147,13 +146,40 @@ jlongArray Java_org_rocksdb_TtlDB_openCF(JNIEnv* env, jclass /*jcls*/, } } +/* + * Class: org_rocksdb_TtlDB + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_TtlDB_disposeInternal( + JNIEnv*, jobject, jlong jhandle) { + auto* ttl_db = reinterpret_cast(jhandle); + assert(ttl_db != nullptr); + delete ttl_db; +} + +/* + * Class: org_rocksdb_TtlDB + * Method: closeDatabase + * Signature: (J)V + */ +void Java_org_rocksdb_TtlDB_closeDatabase( + JNIEnv* /* env */, jclass, jlong /* 
jhandle */) { + //auto* ttl_db = reinterpret_cast(jhandle); + //assert(ttl_db != nullptr); + //rocksdb::Status s = ttl_db->Close(); + //rocksdb::RocksDBExceptionJni::ThrowNew(env, s); + + //TODO(AR) this is disabled until https://github.com/facebook/rocksdb/issues/4818 is resolved! +} + /* * Class: org_rocksdb_TtlDB * Method: createColumnFamilyWithTtl * Signature: (JLorg/rocksdb/ColumnFamilyDescriptor;[BJI)J; */ jlong Java_org_rocksdb_TtlDB_createColumnFamilyWithTtl( - JNIEnv* env, jobject /*jobj*/, jlong jdb_handle, jbyteArray jcolumn_name, + JNIEnv* env, jobject, jlong jdb_handle, jbyteArray jcolumn_name, jlong jcolumn_options, jint jttl) { jbyte* cfname = env->GetByteArrayElements(jcolumn_name, nullptr); if (cfname == nullptr) { diff --git a/ceph/src/rocksdb/java/rocksjni/wal_filter.cc b/ceph/src/rocksdb/java/rocksjni/wal_filter.cc new file mode 100644 index 000000000..c74e54252 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/wal_filter.cc @@ -0,0 +1,23 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the "bridge" between Java and C++ for +// rocksdb::WalFilter. + +#include + +#include "include/org_rocksdb_AbstractWalFilter.h" +#include "rocksjni/wal_filter_jnicallback.h" + +/* + * Class: org_rocksdb_AbstractWalFilter + * Method: createNewWalFilter + * Signature: ()J + */ +jlong Java_org_rocksdb_AbstractWalFilter_createNewWalFilter( + JNIEnv* env, jobject jobj) { + auto* wal_filter = new rocksdb::WalFilterJniCallback(env, jobj); + return reinterpret_cast(wal_filter); +} \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/wal_filter_jnicallback.cc b/ceph/src/rocksdb/java/rocksjni/wal_filter_jnicallback.cc new file mode 100644 index 000000000..8fd909258 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/wal_filter_jnicallback.cc @@ -0,0 +1,144 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the callback "bridge" between Java and C++ for +// rocksdb::WalFilter. 
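A sketch of the Java counterpart this callback drives. The generic map parameters and the CONTINUE_PROCESSING constant are assumptions mirroring the C++ WalFilter API (std::map of uint32_t to uint64_t, std::map of std::string to uint32_t, and kContinueProcessing); only the logRecordFound signature is confirmed by the javadoc later in this patch:

import java.util.Map;

import org.rocksdb.AbstractWalFilter;
import org.rocksdb.WalFilter;
import org.rocksdb.WalProcessingOption;
import org.rocksdb.WriteBatch;

// Illustrative pass-through filter: keeps every WAL record unchanged.
public class PassThroughWalFilter extends AbstractWalFilter {
  @Override
  public void columnFamilyLogNumberMap(
      final Map<Integer, Long> cfLognumberMap,
      final Map<String, Integer> cfNameIdMap) {
    // a real filter could remember which log numbers are already flushed
  }

  @Override
  public WalFilter.LogRecordFoundResult logRecordFound(
      final long logNumber, final String logFileName,
      final WriteBatch batch, final WriteBatch newBatch) {
    // CONTINUE_PROCESSING keeps the record; batchChanged is false
    // because the batch was not rewritten
    return new WalFilter.LogRecordFoundResult(
        WalProcessingOption.CONTINUE_PROCESSING, false);
  }

  @Override
  public String name() {
    return "PassThroughWalFilter";
  }
}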
+ +#include "rocksjni/wal_filter_jnicallback.h" +#include "rocksjni/portal.h" + +namespace rocksdb { +WalFilterJniCallback::WalFilterJniCallback( + JNIEnv* env, jobject jwal_filter) + : JniCallback(env, jwal_filter) { + // Note: The name of a WalFilter will not change during it's lifetime, + // so we cache it in a global var + jmethodID jname_mid = AbstractWalFilterJni::getNameMethodId(env); + if(jname_mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } + jstring jname = (jstring)env->CallObjectMethod(m_jcallback_obj, jname_mid); + if(env->ExceptionCheck()) { + // exception thrown + return; + } + jboolean has_exception = JNI_FALSE; + m_name = JniUtil::copyString(env, jname, + &has_exception); // also releases jname + if (has_exception == JNI_TRUE) { + // exception thrown + return; + } + + m_column_family_log_number_map_mid = + AbstractWalFilterJni::getColumnFamilyLogNumberMapMethodId(env); + if(m_column_family_log_number_map_mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } + + m_log_record_found_proxy_mid = + AbstractWalFilterJni::getLogRecordFoundProxyMethodId(env); + if(m_log_record_found_proxy_mid == nullptr) { + // exception thrown: NoSuchMethodException or OutOfMemoryError + return; + } +} + +void WalFilterJniCallback::ColumnFamilyLogNumberMap( + const std::map& cf_lognumber_map, + const std::map& cf_name_id_map) { + jboolean attached_thread = JNI_FALSE; + JNIEnv* env = getJniEnv(&attached_thread); + if (env == nullptr) { + return; + } + + jobject jcf_lognumber_map = + rocksdb::HashMapJni::fromCppMap(env, &cf_lognumber_map); + if (jcf_lognumber_map == nullptr) { + // exception occurred + env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return; + } + + jobject jcf_name_id_map = + rocksdb::HashMapJni::fromCppMap(env, &cf_name_id_map); + if (jcf_name_id_map == nullptr) { + // exception occurred + env->ExceptionDescribe(); // print out exception to stderr + env->DeleteLocalRef(jcf_lognumber_map); + releaseJniEnv(attached_thread); + return; + } + + env->CallVoidMethod(m_jcallback_obj, + m_column_family_log_number_map_mid, + jcf_lognumber_map, + jcf_name_id_map); + + env->DeleteLocalRef(jcf_lognumber_map); + env->DeleteLocalRef(jcf_name_id_map); + + if(env->ExceptionCheck()) { + // exception thrown from CallVoidMethod + env->ExceptionDescribe(); // print out exception to stderr + } + + releaseJniEnv(attached_thread); +} + + WalFilter::WalProcessingOption WalFilterJniCallback::LogRecordFound( + unsigned long long log_number, const std::string& log_file_name, + const WriteBatch& batch, WriteBatch* new_batch, bool* batch_changed) { + jboolean attached_thread = JNI_FALSE; + JNIEnv* env = getJniEnv(&attached_thread); + if (env == nullptr) { + return WalFilter::WalProcessingOption::kCorruptedRecord; + } + + jstring jlog_file_name = JniUtil::toJavaString(env, &log_file_name); + if (jlog_file_name == nullptr) { + // exception occcurred + env->ExceptionDescribe(); // print out exception to stderr + releaseJniEnv(attached_thread); + return WalFilter::WalProcessingOption::kCorruptedRecord; + } + + jshort jlog_record_found_result = env->CallShortMethod(m_jcallback_obj, + m_log_record_found_proxy_mid, + static_cast(log_number), + jlog_file_name, + reinterpret_cast(&batch), + reinterpret_cast(new_batch)); + + env->DeleteLocalRef(jlog_file_name); + + if (env->ExceptionCheck()) { + // exception thrown from CallShortMethod + env->ExceptionDescribe(); // print out 
exception to stderr + releaseJniEnv(attached_thread); + return WalFilter::WalProcessingOption::kCorruptedRecord; + } + + // unpack WalProcessingOption and batch_changed from jlog_record_found_result + jbyte jwal_processing_option_value = (jlog_record_found_result >> 8) & 0xFF; + jbyte jbatch_changed_value = jlog_record_found_result & 0xFF; + + releaseJniEnv(attached_thread); + + *batch_changed = jbatch_changed_value == JNI_TRUE; + + return WalProcessingOptionJni::toCppWalProcessingOption( + jwal_processing_option_value); +} + +const char* WalFilterJniCallback::Name() const { + return m_name.get(); +} + +} // namespace rocksdb \ No newline at end of file diff --git a/ceph/src/rocksdb/java/rocksjni/wal_filter_jnicallback.h b/ceph/src/rocksdb/java/rocksjni/wal_filter_jnicallback.h new file mode 100644 index 000000000..df6394cef --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/wal_filter_jnicallback.h @@ -0,0 +1,42 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// This file implements the callback "bridge" between Java and C++ for +// rocksdb::WalFilter. + +#ifndef JAVA_ROCKSJNI_WAL_FILTER_JNICALLBACK_H_ +#define JAVA_ROCKSJNI_WAL_FILTER_JNICALLBACK_H_ + +#include +#include +#include +#include + +#include "rocksdb/wal_filter.h" +#include "rocksjni/jnicallback.h" + +namespace rocksdb { + +class WalFilterJniCallback : public JniCallback, public WalFilter { + public: + WalFilterJniCallback( + JNIEnv* env, jobject jwal_filter); + virtual void ColumnFamilyLogNumberMap( + const std::map& cf_lognumber_map, + const std::map& cf_name_id_map); + virtual WalFilter::WalProcessingOption LogRecordFound( + unsigned long long log_number, const std::string& log_file_name, + const WriteBatch& batch, WriteBatch* new_batch, bool* batch_changed); + virtual const char* Name() const; + + private: + std::unique_ptr m_name; + jmethodID m_column_family_log_number_map_mid; + jmethodID m_log_record_found_proxy_mid; +}; + +} //namespace rocksdb + +#endif // JAVA_ROCKSJNI_WAL_FILTER_JNICALLBACK_H_ diff --git a/ceph/src/rocksdb/java/rocksjni/write_buffer_manager.cc b/ceph/src/rocksdb/java/rocksjni/write_buffer_manager.cc new file mode 100644 index 000000000..043f69031 --- /dev/null +++ b/ceph/src/rocksdb/java/rocksjni/write_buffer_manager.cc @@ -0,0 +1,38 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
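This bridge only constructs and disposes the native object, so the interesting part is the Java-side wiring. A minimal usage sketch, assuming the RocksJava wrappers implied here (a WriteBufferManager(long, Cache) constructor backed by the newWriteBufferManager JNI below, plus an Options#setWriteBufferManager setter); the database path is illustrative:

import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBufferManager;

public class WriteBufferManagerExample {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // charge up to 128 MB of memtable memory against a shared 1 GB cache
    try (final Cache cache = new LRUCache(1024L * 1024L * 1024L);
         final WriteBufferManager writeBufferManager =
             new WriteBufferManager(128L * 1024L * 1024L, cache);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setWriteBufferManager(writeBufferManager);
         final RocksDB db = RocksDB.open(options, "/tmp/wbm-example")) {
      db.put("key".getBytes(), "value".getBytes());
    }
  }
}

Note that the native side wraps the manager in a std::shared_ptr, which is what allows one manager to be shared across multiple databases.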
+ +#include + +#include "include/org_rocksdb_WriteBufferManager.h" + +#include "rocksdb/cache.h" +#include "rocksdb/write_buffer_manager.h" + +/* + * Class: org_rocksdb_WriteBufferManager + * Method: newWriteBufferManager + * Signature: (JJ)J + */ +jlong Java_org_rocksdb_WriteBufferManager_newWriteBufferManager( + JNIEnv* /*env*/, jclass /*jclazz*/, jlong jbuffer_size, jlong jcache_handle) { + auto* cache_ptr = + reinterpret_cast *>(jcache_handle); + auto* write_buffer_manager = new std::shared_ptr( + std::make_shared(jbuffer_size, *cache_ptr)); + return reinterpret_cast(write_buffer_manager); +} + +/* + * Class: org_rocksdb_WriteBufferManager + * Method: disposeInternal + * Signature: (J)V + */ +void Java_org_rocksdb_WriteBufferManager_disposeInternal( + JNIEnv* /*env*/, jobject /*jobj*/, jlong jhandle) { + auto* write_buffer_manager = + reinterpret_cast *>(jhandle); + assert(write_buffer_manager != nullptr); + delete write_buffer_manager; +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractCompactionFilterFactory.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractCompactionFilterFactory.java index b970263eb..380b4461d 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractCompactionFilterFactory.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractCompactionFilterFactory.java @@ -57,6 +57,8 @@ public abstract class AbstractCompactionFilterFactory, + K extends MutableOptionKey> { + + private final Map> options = new LinkedHashMap<>(); + + protected abstract U self(); + + /** + * Get all of the possible keys + * + * @return A map of all keys, indexed by name. + */ + protected abstract Map allKeys(); + + /** + * Construct a sub-class instance of {@link AbstractMutableOptions}. + * + * @param keys the keys + * @param values the values + * + * @return an instance of the options. 
+ */ + protected abstract T build(final String[] keys, final String[] values); + + public T build() { + final String keys[] = new String[options.size()]; + final String values[] = new String[options.size()]; + + int i = 0; + for (final Map.Entry> option : options.entrySet()) { + keys[i] = option.getKey().name(); + values[i] = option.getValue().asString(); + i++; + } + + return build(keys, values); + } + + protected U setDouble( + final K key, final double value) { + if (key.getValueType() != MutableOptionKey.ValueType.DOUBLE) { + throw new IllegalArgumentException( + key + " does not accept a double value"); + } + options.put(key, MutableOptionValue.fromDouble(value)); + return self(); + } + + protected double getDouble(final K key) + throws NoSuchElementException, NumberFormatException { + final MutableOptionValue value = options.get(key); + if(value == null) { + throw new NoSuchElementException(key.name() + " has not been set"); + } + return value.asDouble(); + } + + protected U setLong( + final K key, final long value) { + if(key.getValueType() != MutableOptionKey.ValueType.LONG) { + throw new IllegalArgumentException( + key + " does not accept a long value"); + } + options.put(key, MutableOptionValue.fromLong(value)); + return self(); + } + + protected long getLong(final K key) + throws NoSuchElementException, NumberFormatException { + final MutableOptionValue value = options.get(key); + if(value == null) { + throw new NoSuchElementException(key.name() + " has not been set"); + } + return value.asLong(); + } + + protected U setInt( + final K key, final int value) { + if(key.getValueType() != MutableOptionKey.ValueType.INT) { + throw new IllegalArgumentException( + key + " does not accept an integer value"); + } + options.put(key, MutableOptionValue.fromInt(value)); + return self(); + } + + protected int getInt(final K key) + throws NoSuchElementException, NumberFormatException { + final MutableOptionValue value = options.get(key); + if(value == null) { + throw new NoSuchElementException(key.name() + " has not been set"); + } + return value.asInt(); + } + + protected U setBoolean( + final K key, final boolean value) { + if(key.getValueType() != MutableOptionKey.ValueType.BOOLEAN) { + throw new IllegalArgumentException( + key + " does not accept a boolean value"); + } + options.put(key, MutableOptionValue.fromBoolean(value)); + return self(); + } + + protected boolean getBoolean(final K key) + throws NoSuchElementException, NumberFormatException { + final MutableOptionValue value = options.get(key); + if(value == null) { + throw new NoSuchElementException(key.name() + " has not been set"); + } + return value.asBoolean(); + } + + protected U setIntArray( + final K key, final int[] value) { + if(key.getValueType() != MutableOptionKey.ValueType.INT_ARRAY) { + throw new IllegalArgumentException( + key + " does not accept an int array value"); + } + options.put(key, MutableOptionValue.fromIntArray(value)); + return self(); + } + + protected int[] getIntArray(final K key) + throws NoSuchElementException, NumberFormatException { + final MutableOptionValue value = options.get(key); + if(value == null) { + throw new NoSuchElementException(key.name() + " has not been set"); + } + return value.asIntArray(); + } + + protected > U setEnum( + final K key, final N value) { + if(key.getValueType() != MutableOptionKey.ValueType.ENUM) { + throw new IllegalArgumentException( + key + " does not accept a Enum value"); + } + options.put(key, MutableOptionValue.fromEnum(value)); + return self(); + } + + 
+    protected <N extends Enum<N>> N getEnum(final K key)
+        throws NoSuchElementException, NumberFormatException {
+      final MutableOptionValue<?> value = options.get(key);
+      if (value == null) {
+        throw new NoSuchElementException(key.name() + " has not been set");
+      }
+
+      if (!(value instanceof MutableOptionValue.MutableOptionEnumValue)) {
+        throw new NoSuchElementException(key.name() + " is not of Enum type");
+      }
+
+      return ((MutableOptionValue.MutableOptionEnumValue<N>) value).asObject();
+    }
+
+    public U fromString(
+        final String keyStr, final String valueStr)
+        throws IllegalArgumentException {
+      Objects.requireNonNull(keyStr);
+      Objects.requireNonNull(valueStr);
+
+      final K key = allKeys().get(keyStr);
+      if (key == null) {
+        throw new IllegalArgumentException("Unknown key: " + keyStr);
+      }
+      switch (key.getValueType()) {
+        case DOUBLE:
+          return setDouble(key, Double.parseDouble(valueStr));
+
+        case LONG:
+          return setLong(key, Long.parseLong(valueStr));
+
+        case INT:
+          return setInt(key, Integer.parseInt(valueStr));
+
+        case BOOLEAN:
+          return setBoolean(key, Boolean.parseBoolean(valueStr));
+
+        case INT_ARRAY:
+          final String[] strInts = valueStr
+              .trim().split(INT_ARRAY_INT_SEPARATOR);
+          if (strInts == null || strInts.length == 0) {
+            throw new IllegalArgumentException(
+                "int array value is not correctly formatted");
+          }
+
+          final int[] value = new int[strInts.length];
+          int i = 0;
+          for (final String strInt : strInts) {
+            value[i++] = Integer.parseInt(strInt);
+          }
+          return setIntArray(key, value);
+      }
+
+      throw new IllegalStateException(
+          key + " has unknown value type: " + key.getValueType());
+    }
+  }
+}
diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractTableFilter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractTableFilter.java
new file mode 100644
index 000000000..627e1ae1f
--- /dev/null
+++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractTableFilter.java
@@ -0,0 +1,19 @@
+package org.rocksdb;
+
+/**
+ * Base class for Table Filters.
+ */
+public abstract class AbstractTableFilter
+    extends RocksCallbackObject implements TableFilter {
+
+  protected AbstractTableFilter() {
+    super();
+  }
+
+  @Override
+  protected long initializeNative(final long... nativeParameterHandles) {
+    return createNewTableFilter();
+  }
+
+  private native long createNewTableFilter();
+}
diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractTraceWriter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractTraceWriter.java
new file mode 100644
index 000000000..806709b1f
--- /dev/null
+++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractTraceWriter.java
@@ -0,0 +1,70 @@
+// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
+// This source code is licensed under both the GPLv2 (found in the
+// COPYING file in the root directory) and Apache 2.0 License
+// (found in the LICENSE.Apache file in the root directory).
+
+package org.rocksdb;
+
+/**
+ * Base class for TraceWriters.
+ */
+public abstract class AbstractTraceWriter
+    extends RocksCallbackObject implements TraceWriter {
+
+  @Override
+  protected long initializeNative(final long... nativeParameterHandles) {
+    return createNewTraceWriter();
+  }
+
+  /**
+   * Called from JNI, proxy for {@link TraceWriter#write(Slice)}.
+   *
+   * @param sliceHandle the native handle of the slice (which we do not own)
+   *
+   * @return short (2 bytes) where the first byte is the
+   *     {@link Status.Code#getValue()} and the second byte is the
+   *     {@link Status.SubCode#getValue()}.
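+   *
+   * <p>A worked example of the packing, assuming the usual code values from
+   * {@code Status} (e.g. {@code Code.IOError} == 0x5 and
+   * {@code SubCode.None} == 0x0): {@code (0x5 << 8) | 0x0 == 0x0500}.</p>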
+ */ + private short writeProxy(final long sliceHandle) { + try { + write(new Slice(sliceHandle)); + return statusToShort(Status.Code.Ok, Status.SubCode.None); + } catch (final RocksDBException e) { + return statusToShort(e.getStatus()); + } + } + + /** + * Called from JNI, proxy for {@link TraceWriter#closeWriter()}. + * + * @return short (2 bytes) where the first byte is the + * {@link Status.Code#getValue()} and the second byte is the + * {@link Status.SubCode#getValue()}. + */ + private short closeWriterProxy() { + try { + closeWriter(); + return statusToShort(Status.Code.Ok, Status.SubCode.None); + } catch (final RocksDBException e) { + return statusToShort(e.getStatus()); + } + } + + private static short statusToShort(/*@Nullable*/ final Status status) { + final Status.Code code = status != null && status.getCode() != null + ? status.getCode() + : Status.Code.IOError; + final Status.SubCode subCode = status != null && status.getSubCode() != null + ? status.getSubCode() + : Status.SubCode.None; + return statusToShort(code, subCode); + } + + private static short statusToShort(final Status.Code code, + final Status.SubCode subCode) { + short result = (short)(code.getValue() << 8); + return (short)(result | subCode.getValue()); + } + + private native long createNewTraceWriter(); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractWalFilter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractWalFilter.java new file mode 100644 index 000000000..d525045c6 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AbstractWalFilter.java @@ -0,0 +1,49 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Base class for WAL Filters. + */ +public abstract class AbstractWalFilter + extends RocksCallbackObject implements WalFilter { + + @Override + protected long initializeNative(final long... nativeParameterHandles) { + return createNewWalFilter(); + } + + /** + * Called from JNI, proxy for + * {@link WalFilter#logRecordFound(long, String, WriteBatch, WriteBatch)}. + * + * @param logNumber the log handle. + * @param logFileName the log file name + * @param batchHandle the native handle of a WriteBatch (which we do not own) + * @param newBatchHandle the native handle of a + * new WriteBatch (which we do not own) + * + * @return short (2 bytes) where the first byte is the + * {@link WalFilter.LogRecordFoundResult#walProcessingOption} + * {@link WalFilter.LogRecordFoundResult#batchChanged}. + */ + private short logRecordFoundProxy(final long logNumber, + final String logFileName, final long batchHandle, + final long newBatchHandle) { + final LogRecordFoundResult logRecordFoundResult = logRecordFound( + logNumber, logFileName, new WriteBatch(batchHandle), + new WriteBatch(newBatchHandle)); + return logRecordFoundResultToShort(logRecordFoundResult); + } + + private static short logRecordFoundResultToShort( + final LogRecordFoundResult logRecordFoundResult) { + short result = (short)(logRecordFoundResult.walProcessingOption.getValue() << 8); + return (short)(result | (logRecordFoundResult.batchChanged ? 
1 : 0)); + } + + private native long createNewWalFilter(); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java index 092fe3784..3ec467123 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java @@ -434,4 +434,32 @@ public interface AdvancedMutableColumnFamilyOptionsInterface * @return true if reporting is enabled */ boolean reportBgIoStats(); + + /** + * Non-bottom-level files older than TTL will go through the compaction + * process. This needs {@link MutableDBOptionsInterface#maxOpenFiles()} to be + * set to -1. + * + * Enabled only for level compaction for now. + * + * Default: 0 (disabled) + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}. + * + * @param ttl the time-to-live. + * + * @return the reference to the current options. + */ + T setTtl(final long ttl); + + /** + * Get the TTL for Non-bottom-level files that will go through the compaction + * process. + * + * See {@link #setTtl(long)}. + * + * @return the time-to-live. + */ + long ttl(); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BackupEngine.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BackupEngine.java index f22e8901c..a028edea0 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BackupEngine.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BackupEngine.java @@ -65,7 +65,10 @@ public class BackupEngine extends RocksObject implements AutoCloseable { * When false, the Backup Engine will not issue a * flush before starting the backup. In that case, * the backup will also include log files - * corresponding to live memtables. The backup will + * corresponding to live memtables. If writes have + * been performed with the write ahead log disabled, + * set flushBeforeBackup to true to prevent those + * writes from being lost. Otherwise, the backup will * always be consistent with the current state of the * database regardless of the flushBeforeBackup * parameter. @@ -95,7 +98,10 @@ public class BackupEngine extends RocksObject implements AutoCloseable { * When false, the Backup Engine will not issue a * flush before starting the backup. In that case, * the backup will also include log files - * corresponding to live memtables. The backup will + * corresponding to live memtables. If writes have + * been performed with the write ahead log disabled, + * set flushBeforeBackup to true to prevent those + * writes from being lost. Otherwise, the backup will * always be consistent with the current state of the * database regardless of the flushBeforeBackup * parameter. diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java index 2dbbc64d3..7a4ff14bf 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java @@ -9,67 +9,252 @@ package org.rocksdb; * * BlockBasedTable is a RocksDB's default SST file format. 
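 *
 * A hedged construction sketch (the {@code Options} wiring shown is an
 * assumption, not part of this class):
 * <pre>
 *   final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
 *       .setBlockSize(16 * 1024)
 *       .setCacheIndexAndFilterBlocks(true);
 *   final Options options = new Options()
 *       .setTableFormatConfig(tableConfig);
 * </pre>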
*/ +//TODO(AR) should be renamed BlockBasedTableOptions public class BlockBasedTableConfig extends TableFormatConfig { public BlockBasedTableConfig() { - noBlockCache_ = false; - blockCacheSize_ = 8 * 1024 * 1024; - blockCacheNumShardBits_ = 0; - blockCache_ = null; - blockSize_ = 4 * 1024; - blockSizeDeviation_ = 10; - blockRestartInterval_ = 16; - wholeKeyFiltering_ = true; - filter_ = null; - cacheIndexAndFilterBlocks_ = false; - pinL0FilterAndIndexBlocksInCache_ = false; - hashIndexAllowCollision_ = true; - blockCacheCompressedSize_ = 0; - blockCacheCompressedNumShardBits_ = 0; - checksumType_ = ChecksumType.kCRC32c; - indexType_ = IndexType.kBinarySearch; - formatVersion_ = 0; + //TODO(AR) flushBlockPolicyFactory + cacheIndexAndFilterBlocks = false; + cacheIndexAndFilterBlocksWithHighPriority = false; + pinL0FilterAndIndexBlocksInCache = false; + pinTopLevelIndexAndFilter = true; + indexType = IndexType.kBinarySearch; + dataBlockIndexType = DataBlockIndexType.kDataBlockBinarySearch; + dataBlockHashTableUtilRatio = 0.75; + checksumType = ChecksumType.kCRC32c; + noBlockCache = false; + blockCache = null; + persistentCache = null; + blockCacheCompressed = null; + blockSize = 4 * 1024; + blockSizeDeviation = 10; + blockRestartInterval = 16; + indexBlockRestartInterval = 1; + metadataBlockSize = 4096; + partitionFilters = false; + useDeltaEncoding = true; + filterPolicy = null; + wholeKeyFiltering = true; + verifyCompression = true; + readAmpBytesPerBit = 0; + formatVersion = 2; + enableIndexCompression = true; + blockAlign = false; + + // NOTE: ONLY used if blockCache == null + blockCacheSize = 8 * 1024 * 1024; + blockCacheNumShardBits = 0; + + // NOTE: ONLY used if blockCacheCompressed == null + blockCacheCompressedSize = 0; + blockCacheCompressedNumShardBits = 0; } /** - * Disable block cache. If this is set to true, - * then no block cache should be used, and the block_cache should - * point to a {@code nullptr} object. - * Default: false + * Indicating if we'd put index/filter blocks to the block cache. + * If not specified, each "table reader" object will pre-load index/filter + * block during table initialization. * - * @param noBlockCache if use block cache + * @return if index and filter blocks should be put in block cache. + */ + public boolean cacheIndexAndFilterBlocks() { + return cacheIndexAndFilterBlocks; + } + + /** + * Indicating if we'd put index/filter blocks to the block cache. + * If not specified, each "table reader" object will pre-load index/filter + * block during table initialization. + * + * @param cacheIndexAndFilterBlocks and filter blocks should be put in block cache. * @return the reference to the current config. */ - public BlockBasedTableConfig setNoBlockCache(final boolean noBlockCache) { - noBlockCache_ = noBlockCache; + public BlockBasedTableConfig setCacheIndexAndFilterBlocks( + final boolean cacheIndexAndFilterBlocks) { + this.cacheIndexAndFilterBlocks = cacheIndexAndFilterBlocks; return this; } /** - * @return if block cache is disabled + * Indicates if index and filter blocks will be treated as high-priority in the block cache. + * See note below about applicability. If not specified, defaults to false. + * + * @return if index and filter blocks will be treated as high-priority. */ - public boolean noBlockCache() { - return noBlockCache_; + public boolean cacheIndexAndFilterBlocksWithHighPriority() { + return cacheIndexAndFilterBlocksWithHighPriority; } /** - * Set the amount of cache in bytes that will be used by RocksDB. 
- * If cacheSize is non-positive, then cache will not be used. - * DEFAULT: 8M + * If true, cache index and filter blocks with high priority. If set to true, + * depending on implementation of block cache, index and filter blocks may be + * less likely to be evicted than data blocks. * - * @param blockCacheSize block cache size in bytes + * @param cacheIndexAndFilterBlocksWithHighPriority if index and filter blocks + * will be treated as high-priority. * @return the reference to the current config. */ - public BlockBasedTableConfig setBlockCacheSize(final long blockCacheSize) { - blockCacheSize_ = blockCacheSize; + public BlockBasedTableConfig setCacheIndexAndFilterBlocksWithHighPriority( + final boolean cacheIndexAndFilterBlocksWithHighPriority) { + this.cacheIndexAndFilterBlocksWithHighPriority = cacheIndexAndFilterBlocksWithHighPriority; return this; } /** - * @return block cache size in bytes + * Indicating if we'd like to pin L0 index/filter blocks to the block cache. + If not specified, defaults to false. + * + * @return if L0 index and filter blocks should be pinned to the block cache. */ - public long blockCacheSize() { - return blockCacheSize_; + public boolean pinL0FilterAndIndexBlocksInCache() { + return pinL0FilterAndIndexBlocksInCache; + } + + /** + * Indicating if we'd like to pin L0 index/filter blocks to the block cache. + If not specified, defaults to false. + * + * @param pinL0FilterAndIndexBlocksInCache pin blocks in block cache + * @return the reference to the current config. + */ + public BlockBasedTableConfig setPinL0FilterAndIndexBlocksInCache( + final boolean pinL0FilterAndIndexBlocksInCache) { + this.pinL0FilterAndIndexBlocksInCache = pinL0FilterAndIndexBlocksInCache; + return this; + } + + /** + * Indicates if top-level index and filter blocks should be pinned. + * + * @return if top-level index and filter blocks should be pinned. + */ + public boolean pinTopLevelIndexAndFilter() { + return pinTopLevelIndexAndFilter; + } + + /** + * If cacheIndexAndFilterBlocks is true and the below is true, then + * the top-level index of partitioned filter and index blocks are stored in + * the cache, but a reference is held in the "table reader" object so the + * blocks are pinned and only evicted from cache when the table reader is + * freed. This is not limited to l0 in LSM tree. + * + * @param pinTopLevelIndexAndFilter if top-level index and filter blocks should be pinned. + * @return the reference to the current config. + */ + public BlockBasedTableConfig setPinTopLevelIndexAndFilter(final boolean pinTopLevelIndexAndFilter) { + this.pinTopLevelIndexAndFilter = pinTopLevelIndexAndFilter; + return this; + } + + /** + * Get the index type. + * + * @return the currently set index type + */ + public IndexType indexType() { + return indexType; + } + + /** + * Sets the index type to used with this table. + * + * @param indexType {@link org.rocksdb.IndexType} value + * @return the reference to the current option. + */ + public BlockBasedTableConfig setIndexType( + final IndexType indexType) { + this.indexType = indexType; + return this; + } + + /** + * Get the data block index type. + * + * @return the currently set data block index type + */ + public DataBlockIndexType dataBlockIndexType() { + return dataBlockIndexType; + } + + /** + * Sets the data block index type to used with this table. + * + * @param dataBlockIndexType {@link org.rocksdb.DataBlockIndexType} value + * @return the reference to the current option. 
+ */ + public BlockBasedTableConfig setDataBlockIndexType( + final DataBlockIndexType dataBlockIndexType) { + this.dataBlockIndexType = dataBlockIndexType; + return this; + } + + /** + * Get the #entries/#buckets. It is valid only when {@link #dataBlockIndexType()} is + * {@link DataBlockIndexType#kDataBlockBinaryAndHash}. + * + * @return the #entries/#buckets. + */ + public double dataBlockHashTableUtilRatio() { + return dataBlockHashTableUtilRatio; + } + + /** + * Set the #entries/#buckets. It is valid only when {@link #dataBlockIndexType()} is + * {@link DataBlockIndexType#kDataBlockBinaryAndHash}. + * + * @param dataBlockHashTableUtilRatio #entries/#buckets + * @return the reference to the current option. + */ + public BlockBasedTableConfig setDataBlockHashTableUtilRatio( + final double dataBlockHashTableUtilRatio) { + this.dataBlockHashTableUtilRatio = dataBlockHashTableUtilRatio; + return this; + } + + /** + * Get the checksum type to be used with this table. + * + * @return the currently set checksum type + */ + public ChecksumType checksumType() { + return checksumType; + } + + /** + * Sets + * + * @param checksumType {@link org.rocksdb.ChecksumType} value. + * @return the reference to the current option. + */ + public BlockBasedTableConfig setChecksumType( + final ChecksumType checksumType) { + this.checksumType = checksumType; + return this; + } + + /** + * Determine if the block cache is disabled. + * + * @return if block cache is disabled + */ + public boolean noBlockCache() { + return noBlockCache; + } + + /** + * Disable block cache. If this is set to true, + * then no block cache should be used, and the {@link #setBlockCache(Cache)} + * should point to a {@code null} object. + * + * Default: false + * + * @param noBlockCache if use block cache + * @return the reference to the current config. + */ + public BlockBasedTableConfig setNoBlockCache(final boolean noBlockCache) { + this.noBlockCache = noBlockCache; + return this; } /** @@ -82,42 +267,68 @@ public class BlockBasedTableConfig extends TableFormatConfig { * {@link org.rocksdb.Cache} instance can be re-used in multiple options * instances. * - * @param cache {@link org.rocksdb.Cache} Cache java instance (e.g. LRUCache). + * @param blockCache {@link org.rocksdb.Cache} Cache java instance + * (e.g. LRUCache). + * * @return the reference to the current config. */ - public BlockBasedTableConfig setBlockCache(final Cache cache) { - blockCache_ = cache; + public BlockBasedTableConfig setBlockCache(final Cache blockCache) { + this.blockCache = blockCache; return this; } /** - * Controls the number of shards for the block cache. - * This is applied only if cacheSize is set to non-negative. + * Use the specified persistent cache. * - * @param blockCacheNumShardBits the number of shard bits. The resulting - * number of shards would be 2 ^ numShardBits. Any negative - * number means use default settings." - * @return the reference to the current option. + * If {@code !null} use the specified cache for pages read from device, + * otherwise no page cache is used. + * + * @param persistentCache the persistent cache + * + * @return the reference to the current config. */ - public BlockBasedTableConfig setCacheNumShardBits( - final int blockCacheNumShardBits) { - blockCacheNumShardBits_ = blockCacheNumShardBits; + public BlockBasedTableConfig setPersistentCache( + final PersistentCache persistentCache) { + this.persistentCache = persistentCache; return this; } /** - * Returns the number of shard bits used in the block cache. 
- * The resulting number of shards would be 2 ^ (returned value). - * Any negative number means use default settings. + * Use the specified cache for compressed blocks. * - * @return the number of shard bits used in the block cache. + * If {@code null}, RocksDB will not use a compressed block cache. + * + * Note: though it looks similar to {@link #setBlockCache(Cache)}, RocksDB + * doesn't put the same type of object there. + * + * {@link org.rocksdb.Cache} should not be disposed before options instances + * using this cache is disposed. + * + * {@link org.rocksdb.Cache} instance can be re-used in multiple options + * instances. + * + * @param blockCacheCompressed {@link org.rocksdb.Cache} Cache java instance + * (e.g. LRUCache). + * + * @return the reference to the current config. */ - public int cacheNumShardBits() { - return blockCacheNumShardBits_; + public BlockBasedTableConfig setBlockCacheCompressed( + final Cache blockCacheCompressed) { + this.blockCacheCompressed = blockCacheCompressed; + return this; + } + + /** + * Get the approximate size of user data packed per block. + * + * @return block size in bytes + */ + public long blockSize() { + return blockSize; } /** - * Approximate size of user data packed per block. Note that the + * Approximate size of user data packed per block. Note that the * block size specified here corresponds to uncompressed data. The * actual size of the unit read from disk may be smaller if * compression is enabled. This parameter can be changed dynamically. @@ -127,23 +338,24 @@ public class BlockBasedTableConfig extends TableFormatConfig { * @return the reference to the current config. */ public BlockBasedTableConfig setBlockSize(final long blockSize) { - blockSize_ = blockSize; + this.blockSize = blockSize; return this; } /** - * @return block size in bytes + * @return the hash table ratio. */ - public long blockSize() { - return blockSize_; + public int blockSizeDeviation() { + return blockSizeDeviation; } /** * This is used to close a block before it reaches the configured - * 'block_size'. If the percentage of free space in the current block is less - * than this specified number and adding a new record to the block will - * exceed the configured block size, then this block will be closed and the - * new record will be written to the next block. + * {@link #blockSize()}. If the percentage of free space in the current block + * is less than this specified number and adding a new record to the block + * will exceed the configured block size, then this block will be closed and + * the new record will be written to the next block. + * * Default is 10. * * @param blockSizeDeviation the deviation to block size allowed @@ -151,55 +363,120 @@ public class BlockBasedTableConfig extends TableFormatConfig { */ public BlockBasedTableConfig setBlockSizeDeviation( final int blockSizeDeviation) { - blockSizeDeviation_ = blockSizeDeviation; + this.blockSizeDeviation = blockSizeDeviation; return this; } /** - * @return the hash table ratio. + * Get the block restart interval. + * + * @return block restart interval */ - public int blockSizeDeviation() { - return blockSizeDeviation_; + public int blockRestartInterval() { + return blockRestartInterval; } /** - * Set block restart interval + * Set the block restart interval. * * @param restartInterval block restart interval. * @return the reference to the current config. 
*/ public BlockBasedTableConfig setBlockRestartInterval( final int restartInterval) { - blockRestartInterval_ = restartInterval; + blockRestartInterval = restartInterval; return this; } /** - * @return block restart interval + * Get the index block restart interval. + * + * @return index block restart interval */ - public int blockRestartInterval() { - return blockRestartInterval_; + public int indexBlockRestartInterval() { + return indexBlockRestartInterval; } /** - * If true, place whole keys in the filter (not just prefixes). - * This must generally be true for gets to be efficient. - * Default: true + * Set the index block restart interval * - * @param wholeKeyFiltering if enable whole key filtering + * @param restartInterval index block restart interval. * @return the reference to the current config. */ - public BlockBasedTableConfig setWholeKeyFiltering( - final boolean wholeKeyFiltering) { - wholeKeyFiltering_ = wholeKeyFiltering; + public BlockBasedTableConfig setIndexBlockRestartInterval( + final int restartInterval) { + indexBlockRestartInterval = restartInterval; return this; } /** - * @return if whole key filtering is enabled + * Get the block size for partitioned metadata. + * + * @return block size for partitioned metadata. */ - public boolean wholeKeyFiltering() { - return wholeKeyFiltering_; + public long metadataBlockSize() { + return metadataBlockSize; + } + + /** + * Set block size for partitioned metadata. + * + * @param metadataBlockSize Partitioned metadata block size. + * @return the reference to the current config. + */ + public BlockBasedTableConfig setMetadataBlockSize( + final long metadataBlockSize) { + this.metadataBlockSize = metadataBlockSize; + return this; + } + + /** + * Indicates if we're using partitioned filters. + * + * @return if we're using partition filters. + */ + public boolean partitionFilters() { + return partitionFilters; + } + + /** + * Use partitioned full filters for each SST file. This option is incompatible + * with block-based filters. + * + * Defaults to false. + * + * @param partitionFilters use partition filters. + * @return the reference to the current config. + */ + public BlockBasedTableConfig setPartitionFilters(final boolean partitionFilters) { + this.partitionFilters = partitionFilters; + return this; + } + + /** + * Determine if delta encoding is being used to compress block keys. + * + * @return true if delta encoding is enabled, false otherwise. + */ + public boolean useDeltaEncoding() { + return useDeltaEncoding; + } + + /** + * Use delta encoding to compress keys in blocks. + * + * NOTE: {@link ReadOptions#pinData()} requires this option to be disabled. + * + * Default: true + * + * @param useDeltaEncoding true to enable delta encoding + * + * @return the reference to the current config. + */ + public BlockBasedTableConfig setUseDeltaEncoding( + final boolean useDeltaEncoding) { + this.useDeltaEncoding = useDeltaEncoding; + return this; } /** @@ -212,87 +489,274 @@ public class BlockBasedTableConfig extends TableFormatConfig { * {@link org.rocksdb.Filter} instance can be re-used in multiple options * instances. * - * @param filter {@link org.rocksdb.Filter} Filter Policy java instance. + * @param filterPolicy {@link org.rocksdb.Filter} Filter Policy java instance. * @return the reference to the current config. 
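 *
 * A hedged sketch of plugging in a bloom filter policy (10 bits per key is
 * an assumed, commonly used setting, not a recommendation from this file):
 * <pre>
 *   tableConfig.setFilterPolicy(new BloomFilter(10, false));
 * </pre>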
*/ + public BlockBasedTableConfig setFilterPolicy( + final Filter filterPolicy) { + this.filterPolicy = filterPolicy; + return this; + } + + /* + * @deprecated Use {@link #setFilterPolicy(Filter)} + */ + @Deprecated public BlockBasedTableConfig setFilter( final Filter filter) { - filter_ = filter; + return setFilterPolicy(filter); + } + + /** + * Determine if whole keys as opposed to prefixes are placed in the filter. + * + * @return if whole key filtering is enabled + */ + public boolean wholeKeyFiltering() { + return wholeKeyFiltering; + } + + /** + * If true, place whole keys in the filter (not just prefixes). + * This must generally be true for gets to be efficient. + * Default: true + * + * @param wholeKeyFiltering if enable whole key filtering + * @return the reference to the current config. + */ + public BlockBasedTableConfig setWholeKeyFiltering( + final boolean wholeKeyFiltering) { + this.wholeKeyFiltering = wholeKeyFiltering; return this; } /** - * Indicating if we'd put index/filter blocks to the block cache. - If not specified, each "table reader" object will pre-load index/filter - block during table initialization. + * Returns true when compression verification is enabled. * - * @return if index and filter blocks should be put in block cache. + * See {@link #setVerifyCompression(boolean)}. + * + * @return true if compression verification is enabled. */ - public boolean cacheIndexAndFilterBlocks() { - return cacheIndexAndFilterBlocks_; + public boolean verifyCompression() { + return verifyCompression; } /** - * Indicating if we'd put index/filter blocks to the block cache. - If not specified, each "table reader" object will pre-load index/filter - block during table initialization. + * Verify that decompressing the compressed block gives back the input. This + * is a verification mode that we use to detect bugs in compression + * algorithms. + * + * @param verifyCompression true to enable compression verification. * - * @param cacheIndexAndFilterBlocks and filter blocks should be put in block cache. * @return the reference to the current config. */ - public BlockBasedTableConfig setCacheIndexAndFilterBlocks( - final boolean cacheIndexAndFilterBlocks) { - cacheIndexAndFilterBlocks_ = cacheIndexAndFilterBlocks; + public BlockBasedTableConfig setVerifyCompression( + final boolean verifyCompression) { + this.verifyCompression = verifyCompression; return this; } /** - * Indicating if we'd like to pin L0 index/filter blocks to the block cache. - If not specified, defaults to false. + * Get the Read amplification bytes per-bit. * - * @return if L0 index and filter blocks should be pinned to the block cache. + * See {@link #setReadAmpBytesPerBit(int)}. + * + * @return the bytes per-bit. */ - public boolean pinL0FilterAndIndexBlocksInCache() { - return pinL0FilterAndIndexBlocksInCache_; + public int readAmpBytesPerBit() { + return readAmpBytesPerBit; } /** - * Indicating if we'd like to pin L0 index/filter blocks to the block cache. - If not specified, defaults to false. + * Set the Read amplification bytes per-bit. + * + * If used, For every data block we load into memory, we will create a bitmap + * of size ((block_size / `read_amp_bytes_per_bit`) / 8) bytes. This bitmap + * will be used to figure out the percentage we actually read of the blocks. 
+ * + * When this feature is used Tickers::READ_AMP_ESTIMATE_USEFUL_BYTES and + * Tickers::READ_AMP_TOTAL_READ_BYTES can be used to calculate the + * read amplification using this formula + * (READ_AMP_TOTAL_READ_BYTES / READ_AMP_ESTIMATE_USEFUL_BYTES) + * + * value => memory usage (percentage of loaded blocks memory) + * 1 => 12.50 % + * 2 => 06.25 % + * 4 => 03.12 % + * 8 => 01.56 % + * 16 => 00.78 % + * + * Note: This number must be a power of 2, if not it will be sanitized + * to be the next lowest power of 2, for example a value of 7 will be + * treated as 4, a value of 19 will be treated as 16. + * + * Default: 0 (disabled) + * + * @param readAmpBytesPerBit the bytes per-bit * - * @param pinL0FilterAndIndexBlocksInCache pin blocks in block cache * @return the reference to the current config. */ - public BlockBasedTableConfig setPinL0FilterAndIndexBlocksInCache( - final boolean pinL0FilterAndIndexBlocksInCache) { - pinL0FilterAndIndexBlocksInCache_ = pinL0FilterAndIndexBlocksInCache; + public BlockBasedTableConfig setReadAmpBytesPerBit(final int readAmpBytesPerBit) { + this.readAmpBytesPerBit = readAmpBytesPerBit; return this; } /** - * Influence the behavior when kHashSearch is used. - if false, stores a precise prefix to block range mapping - if true, does not store prefix and allows prefix hash collision - (less memory consumption) + * Get the format version. + * See {@link #setFormatVersion(int)}. * - * @return if hash collisions should be allowed. + * @return the currently configured format version. */ - public boolean hashIndexAllowCollision() { - return hashIndexAllowCollision_; + public int formatVersion() { + return formatVersion; } /** - * Influence the behavior when kHashSearch is used. - if false, stores a precise prefix to block range mapping - if true, does not store prefix and allows prefix hash collision - (less memory consumption) + *
-   * @param hashIndexAllowCollision points out if hash collisions should be allowed.
+   * <p>We currently have five versions:</p>
+   *
+   * <ul>
+   * <li><b>0</b> - This version is currently written
+   * out by all RocksDB's versions by default. Can be read by really old
+   * RocksDB's. Doesn't support changing checksum (default is CRC32).</li>
+   * <li><b>1</b> - Can be read by RocksDB's versions since 3.0.
+   * Supports non-default checksum, like xxHash. It is written by RocksDB when
+   * BlockBasedTableOptions::checksum is something other than kCRC32c. (version
+   * 0 is silently upconverted)</li>
+   * <li><b>2</b> - Can be read by RocksDB's versions since 3.10.
+   * Changes the way we encode compressed blocks with LZ4, BZip2 and Zlib
+   * compression. If you don't plan to run RocksDB before version 3.10,
+   * you should probably use this.</li>
+   * <li><b>3</b> - Can be read by RocksDB's versions since 5.15. Changes the way we
+   * encode the keys in index blocks. If you don't plan to run RocksDB before
+   * version 5.15, you should probably use this.
+   * This option only affects newly written tables. When reading existing
+   * tables, the information about version is read from the footer.</li>
+   * <li><b>4</b> - Can be read by RocksDB's versions since 5.16. Changes the way we
+   * encode the values in index blocks. If you don't plan to run RocksDB before
+   * version 5.16 and you are using index_block_restart_interval &gt; 1, you should
+   * probably use this as it would reduce the index size.</li>
+   * </ul>
+   *
+   * <p>This option only affects newly written tables. When reading existing
+   * tables, the information about version is read from the footer.</p>
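+   *
+   * <p>For example (a hedged sketch): a store that never needs to be read by
+   * RocksDB older than 5.16 and uses index_block_restart_interval &gt; 1
+   * would pick the newest version:</p>
+   * <pre>
+   *   tableConfig.setFormatVersion(4);
+   * </pre>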
+ * + * @param formatVersion integer representing the version to be used. + * + * @return the reference to the current option. + */ + public BlockBasedTableConfig setFormatVersion( + final int formatVersion) { + assert(formatVersion >= 0 && formatVersion <= 4); + this.formatVersion = formatVersion; + return this; + } + + /** + * Determine if index compression is enabled. + * + * See {@link #setEnableIndexCompression(boolean)}. + * + * @return true if index compression is enabled, false otherwise + */ + public boolean enableIndexCompression() { + return enableIndexCompression; + } + + /** + * Store index blocks on disk in compressed format. + * + * Changing this option to false will avoid the overhead of decompression + * if index blocks are evicted and read back. + * + * @param enableIndexCompression true to enable index compression, + * false to disable + * + * @return the reference to the current option. + */ + public BlockBasedTableConfig setEnableIndexCompression( + final boolean enableIndexCompression) { + this.enableIndexCompression = enableIndexCompression; + return this; + } + + /** + * Determines whether data blocks are aligned on the lesser of page size + * and block size. + * + * @return true if data blocks are aligned on the lesser of page size + * and block size. + */ + public boolean blockAlign() { + return blockAlign; + } + + /** + * Set whether data blocks should be aligned on the lesser of page size + * and block size. + * + * @param blockAlign true to align data blocks on the lesser of page size + * and block size. + * + * @return the reference to the current option. + */ + public BlockBasedTableConfig setBlockAlign(final boolean blockAlign) { + this.blockAlign = blockAlign; + return this; + } + + + /** + * Get the size of the cache in bytes that will be used by RocksDB. + * + * @return block cache size in bytes + */ + @Deprecated + public long blockCacheSize() { + return blockCacheSize; + } + + /** + * Set the size of the cache in bytes that will be used by RocksDB. + * If cacheSize is non-positive, then cache will not be used. + * DEFAULT: 8M + * + * @param blockCacheSize block cache size in bytes * @return the reference to the current config. + * + * @deprecated Use {@link #setBlockCache(Cache)}. */ - public BlockBasedTableConfig setHashIndexAllowCollision( - final boolean hashIndexAllowCollision) { - hashIndexAllowCollision_ = hashIndexAllowCollision; + @Deprecated + public BlockBasedTableConfig setBlockCacheSize(final long blockCacheSize) { + this.blockCacheSize = blockCacheSize; + return this; + } + + /** + * Returns the number of shard bits used in the block cache. + * The resulting number of shards would be 2 ^ (returned value). + * Any negative number means use default settings. + * + * @return the number of shard bits used in the block cache. + */ + @Deprecated + public int cacheNumShardBits() { + return blockCacheNumShardBits; + } + + /** + * Controls the number of shards for the block cache. + * This is applied only if cacheSize is set to non-negative. + * + * @param blockCacheNumShardBits the number of shard bits. The resulting + * number of shards would be 2 ^ numShardBits. Any negative + * number means use default settings." + * @return the reference to the current option. + * + * @deprecated Use {@link #setBlockCache(Cache)}. 
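+   *     For example (a hedged sketch), size and shard the cache explicitly
+   *     instead, and share it across options instances:
+   *     <pre>
+   *       tableConfig.setBlockCache(new LRUCache(64 * 1024 * 1024, 6));
+   *     </pre>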
+ */ + @Deprecated + public BlockBasedTableConfig setCacheNumShardBits( + final int blockCacheNumShardBits) { + this.blockCacheNumShardBits = blockCacheNumShardBits; return this; } @@ -302,8 +766,9 @@ public class BlockBasedTableConfig extends TableFormatConfig { * * @return size of compressed block cache. */ + @Deprecated public long blockCacheCompressedSize() { - return blockCacheCompressedSize_; + return blockCacheCompressedSize; } /** @@ -312,10 +777,13 @@ public class BlockBasedTableConfig extends TableFormatConfig { * * @param blockCacheCompressedSize of compressed block cache. * @return the reference to the current config. + * + * @deprecated Use {@link #setBlockCacheCompressed(Cache)}. */ + @Deprecated public BlockBasedTableConfig setBlockCacheCompressedSize( final long blockCacheCompressedSize) { - blockCacheCompressedSize_ = blockCacheCompressedSize; + this.blockCacheCompressedSize = blockCacheCompressedSize; return this; } @@ -327,8 +795,9 @@ public class BlockBasedTableConfig extends TableFormatConfig { * number of shards would be 2 ^ numShardBits. Any negative * number means use default settings. */ + @Deprecated public int blockCacheCompressedNumShardBits() { - return blockCacheCompressedNumShardBits_; + return blockCacheCompressedNumShardBits; } /** @@ -339,134 +808,166 @@ public class BlockBasedTableConfig extends TableFormatConfig { * number of shards would be 2 ^ numShardBits. Any negative * number means use default settings." * @return the reference to the current option. + * + * @deprecated Use {@link #setBlockCacheCompressed(Cache)}. */ + @Deprecated public BlockBasedTableConfig setBlockCacheCompressedNumShardBits( final int blockCacheCompressedNumShardBits) { - blockCacheCompressedNumShardBits_ = blockCacheCompressedNumShardBits; - return this; - } - - /** - * Sets the checksum type to be used with this table. - * - * @param checksumType {@link org.rocksdb.ChecksumType} value. - * @return the reference to the current option. - */ - public BlockBasedTableConfig setChecksumType( - final ChecksumType checksumType) { - checksumType_ = checksumType; + this.blockCacheCompressedNumShardBits = blockCacheCompressedNumShardBits; return this; } /** + * Influence the behavior when kHashSearch is used. + * if false, stores a precise prefix to block range mapping + * if true, does not store prefix and allows prefix hash collision + * (less memory consumption) * - * @return the currently set checksum type - */ - public ChecksumType checksumType() { - return checksumType_; - } - - /** - * Sets the index type to used with this table. + * @return if hash collisions should be allowed. * - * @param indexType {@link org.rocksdb.IndexType} value - * @return the reference to the current option. + * @deprecated This option is now deprecated. No matter what value it + * is set to, it will behave as + * if {@link #hashIndexAllowCollision()} == true. */ - public BlockBasedTableConfig setIndexType( - final IndexType indexType) { - indexType_ = indexType; - return this; + @Deprecated + public boolean hashIndexAllowCollision() { + return true; } /** + * Influence the behavior when kHashSearch is used. + * if false, stores a precise prefix to block range mapping + * if true, does not store prefix and allows prefix hash collision + * (less memory consumption) * - * @return the currently set index type - */ - public IndexType indexType() { - return indexType_; - } - - /** - *
-   * <p>We currently have three versions:</p>
-   *
-   * <ul>
-   * <li><b>0</b> - This version is currently written
-   * out by all RocksDB's versions by default. Can be read by really old
-   * RocksDB's. Doesn't support changing checksum (default is CRC32).</li>
-   * <li><b>1</b> - Can be read by RocksDB's versions since 3.0.
-   * Supports non-default checksum, like xxHash. It is written by RocksDB when
-   * BlockBasedTableOptions::checksum is something other than kCRC32c. (version
-   * 0 is silently upconverted)</li>
-   * <li><b>2</b> - Can be read by RocksDB's versions since 3.10.
-   * Changes the way we encode compressed blocks with LZ4, BZip2 and Zlib
-   * compression. If you don't plan to run RocksDB before version 3.10,
-   * you should probably use this.</li>
-   * </ul>
-   *
-   * <p>This option only affects newly written tables. When reading existing
-   * tables, the information about version is read from the footer.</p>
+   * @param hashIndexAllowCollision points out if hash collisions should be allowed.
+ * @return the reference to the current config. * - * @param formatVersion integer representing the version to be used. - * @return the reference to the current option. + * @deprecated This option is now deprecated. No matter what value it + * is set to, it will behave as + * if {@link #hashIndexAllowCollision()} == true. */ - public BlockBasedTableConfig setFormatVersion( - final int formatVersion) { - assert(formatVersion >= 0 && formatVersion <= 2); - formatVersion_ = formatVersion; + @Deprecated + public BlockBasedTableConfig setHashIndexAllowCollision( + final boolean hashIndexAllowCollision) { + // no-op return this; } - /** - * - * @return the currently configured format version. - * See also: {@link #setFormatVersion(int)}. - */ - public int formatVersion() { - return formatVersion_; - } - + @Override protected long newTableFactoryHandle() { + final long filterPolicyHandle; + if (filterPolicy != null) { + filterPolicyHandle = filterPolicy.nativeHandle_; + } else { + filterPolicyHandle = 0; + } + final long blockCacheHandle; + if (blockCache != null) { + blockCacheHandle = blockCache.nativeHandle_; + } else { + blockCacheHandle = 0; + } - @Override protected long newTableFactoryHandle() { - long filterHandle = 0; - if (filter_ != null) { - filterHandle = filter_.nativeHandle_; + final long persistentCacheHandle; + if (persistentCache != null) { + persistentCacheHandle = persistentCache.nativeHandle_; + } else { + persistentCacheHandle = 0; } - long blockCacheHandle = 0; - if (blockCache_ != null) { - blockCacheHandle = blockCache_.nativeHandle_; + final long blockCacheCompressedHandle; + if (blockCacheCompressed != null) { + blockCacheCompressedHandle = blockCacheCompressed.nativeHandle_; + } else { + blockCacheCompressedHandle = 0; } - return newTableFactoryHandle(noBlockCache_, blockCacheSize_, blockCacheNumShardBits_, - blockCacheHandle, blockSize_, blockSizeDeviation_, blockRestartInterval_, - wholeKeyFiltering_, filterHandle, cacheIndexAndFilterBlocks_, - pinL0FilterAndIndexBlocksInCache_, hashIndexAllowCollision_, blockCacheCompressedSize_, - blockCacheCompressedNumShardBits_, checksumType_.getValue(), indexType_.getValue(), - formatVersion_); - } - - private native long newTableFactoryHandle(boolean noBlockCache, long blockCacheSize, - int blockCacheNumShardBits, long blockCacheHandle, long blockSize, int blockSizeDeviation, - int blockRestartInterval, boolean wholeKeyFiltering, long filterPolicyHandle, - boolean cacheIndexAndFilterBlocks, boolean pinL0FilterAndIndexBlocksInCache, - boolean hashIndexAllowCollision, long blockCacheCompressedSize, - int blockCacheCompressedNumShardBits, byte checkSumType, byte indexType, int formatVersion); - - private boolean cacheIndexAndFilterBlocks_; - private boolean pinL0FilterAndIndexBlocksInCache_; - private IndexType indexType_; - private boolean hashIndexAllowCollision_; - private ChecksumType checksumType_; - private boolean noBlockCache_; - private long blockSize_; - private long blockCacheSize_; - private int blockCacheNumShardBits_; - private Cache blockCache_; - private long blockCacheCompressedSize_; - private int blockCacheCompressedNumShardBits_; - private int blockSizeDeviation_; - private int blockRestartInterval_; - private Filter filter_; - private boolean wholeKeyFiltering_; - private int formatVersion_; + return newTableFactoryHandle(cacheIndexAndFilterBlocks, + cacheIndexAndFilterBlocksWithHighPriority, + pinL0FilterAndIndexBlocksInCache, pinTopLevelIndexAndFilter, + indexType.getValue(), dataBlockIndexType.getValue(), 
+ dataBlockHashTableUtilRatio, checksumType.getValue(), noBlockCache, + blockCacheHandle, persistentCacheHandle, blockCacheCompressedHandle, + blockSize, blockSizeDeviation, blockRestartInterval, + indexBlockRestartInterval, metadataBlockSize, partitionFilters, + useDeltaEncoding, filterPolicyHandle, wholeKeyFiltering, + verifyCompression, readAmpBytesPerBit, formatVersion, + enableIndexCompression, blockAlign, + blockCacheSize, blockCacheNumShardBits, + blockCacheCompressedSize, blockCacheCompressedNumShardBits); + } + + private native long newTableFactoryHandle( + final boolean cacheIndexAndFilterBlocks, + final boolean cacheIndexAndFilterBlocksWithHighPriority, + final boolean pinL0FilterAndIndexBlocksInCache, + final boolean pinTopLevelIndexAndFilter, + final byte indexTypeValue, + final byte dataBlockIndexTypeValue, + final double dataBlockHashTableUtilRatio, + final byte checksumTypeValue, + final boolean noBlockCache, + final long blockCacheHandle, + final long persistentCacheHandle, + final long blockCacheCompressedHandle, + final long blockSize, + final int blockSizeDeviation, + final int blockRestartInterval, + final int indexBlockRestartInterval, + final long metadataBlockSize, + final boolean partitionFilters, + final boolean useDeltaEncoding, + final long filterPolicyHandle, + final boolean wholeKeyFiltering, + final boolean verifyCompression, + final int readAmpBytesPerBit, + final int formatVersion, + final boolean enableIndexCompression, + final boolean blockAlign, + + @Deprecated final long blockCacheSize, + @Deprecated final int blockCacheNumShardBits, + + @Deprecated final long blockCacheCompressedSize, + @Deprecated final int blockCacheCompressedNumShardBits + ); + + //TODO(AR) flushBlockPolicyFactory + private boolean cacheIndexAndFilterBlocks; + private boolean cacheIndexAndFilterBlocksWithHighPriority; + private boolean pinL0FilterAndIndexBlocksInCache; + private boolean pinTopLevelIndexAndFilter; + private IndexType indexType; + private DataBlockIndexType dataBlockIndexType; + private double dataBlockHashTableUtilRatio; + private ChecksumType checksumType; + private boolean noBlockCache; + private Cache blockCache; + private PersistentCache persistentCache; + private Cache blockCacheCompressed; + private long blockSize; + private int blockSizeDeviation; + private int blockRestartInterval; + private int indexBlockRestartInterval; + private long metadataBlockSize; + private boolean partitionFilters; + private boolean useDeltaEncoding; + private Filter filterPolicy; + private boolean wholeKeyFiltering; + private boolean verifyCompression; + private int readAmpBytesPerBit; + private int formatVersion; + private boolean enableIndexCompression; + private boolean blockAlign; + + // NOTE: ONLY used if blockCache == null + @Deprecated private long blockCacheSize; + @Deprecated private int blockCacheNumShardBits; + + // NOTE: ONLY used if blockCacheCompressed == null + @Deprecated private long blockCacheCompressedSize; + @Deprecated private int blockCacheCompressedNumShardBits; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyMetaData.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyMetaData.java new file mode 100644 index 000000000..191904017 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyMetaData.java @@ -0,0 +1,70 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.Arrays; +import java.util.List; + +/** + * The metadata that describes a column family. + */ +public class ColumnFamilyMetaData { + private final long size; + private final long fileCount; + private final byte[] name; + private final LevelMetaData[] levels; + + /** + * Called from JNI C++ + */ + private ColumnFamilyMetaData( + final long size, + final long fileCount, + final byte[] name, + final LevelMetaData[] levels) { + this.size = size; + this.fileCount = fileCount; + this.name = name; + this.levels = levels; + } + + /** + * The size of this column family in bytes, which is equal to the sum of + * the file size of its {@link #levels()}. + * + * @return the size of this column family + */ + public long size() { + return size; + } + + /** + * The number of files in this column family. + * + * @return the number of files + */ + public long fileCount() { + return fileCount; + } + + /** + * The name of the column family. + * + * @return the name + */ + public byte[] name() { + return name; + } + + /** + * The metadata of all levels in this column family. + * + * @return the levels metadata + */ + public List levels() { + return Arrays.asList(levels); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java index 3cdf9569b..e57752463 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java @@ -50,9 +50,19 @@ public class ColumnFamilyOptions extends RocksObject this.compactionFilterFactory_ = other.compactionFilterFactory_; this.compactionOptionsUniversal_ = other.compactionOptionsUniversal_; this.compactionOptionsFIFO_ = other.compactionOptionsFIFO_; + this.bottommostCompressionOptions_ = other.bottommostCompressionOptions_; this.compressionOptions_ = other.compressionOptions_; } + /** + * Constructor from Options + * + * @param options The options. + */ + public ColumnFamilyOptions(final Options options) { + super(newColumnFamilyOptionsFromOptions(options.nativeHandle_)); + } + /** *
<p>
Constructor to be used by * {@link #getColumnFamilyOptionsFromProps(java.util.Properties)}, @@ -186,22 +196,7 @@ public class ColumnFamilyOptions extends RocksObject return this; } - /** - * A single CompactionFilter instance to call into during compaction. - * Allows an application to modify/delete a key-value during background - * compaction. - * - * If the client requires a new compaction filter to be used for different - * compaction runs, it can specify call - * {@link #setCompactionFilterFactory(AbstractCompactionFilterFactory)} - * instead. - * - * The client should specify only set one of the two. - * {@link #setCompactionFilter(AbstractCompactionFilter)} takes precedence - * over {@link #setCompactionFilterFactory(AbstractCompactionFilterFactory)} - * if the client specifies both. - */ - //TODO(AR) need to set a note on the concurrency of the compaction filter used from this method + @Override public ColumnFamilyOptions setCompactionFilter( final AbstractCompactionFilter> compactionFilter) { @@ -210,15 +205,13 @@ public class ColumnFamilyOptions extends RocksObject return this; } - /** - * This is a factory that provides {@link AbstractCompactionFilter} objects - * which allow an application to modify/delete a key-value during background - * compaction. - * - * A new filter will be created on each compaction run. If multithreaded - * compaction is being used, each created CompactionFilter will only be used - * from a single thread and so does not need to be thread-safe. - */ + @Override + public AbstractCompactionFilter> compactionFilter() { + assert (isOwningHandle()); + return compactionFilter_; + } + + @Override public ColumnFamilyOptions setCompactionFilterFactory(final AbstractCompactionFilterFactory> compactionFilterFactory) { assert (isOwningHandle()); setCompactionFilterFactoryHandle(nativeHandle_, compactionFilterFactory.nativeHandle_); @@ -226,6 +219,12 @@ public class ColumnFamilyOptions extends RocksObject return this; } + @Override + public AbstractCompactionFilterFactory> compactionFilterFactory() { + assert (isOwningHandle()); + return compactionFilterFactory_; + } + @Override public ColumnFamilyOptions setWriteBufferSize(final long writeBufferSize) { assert(isOwningHandle()); @@ -329,6 +328,20 @@ public class ColumnFamilyOptions extends RocksObject bottommostCompressionType(nativeHandle_)); } + @Override + public ColumnFamilyOptions setBottommostCompressionOptions( + final CompressionOptions bottommostCompressionOptions) { + setBottommostCompressionOptions(nativeHandle_, + bottommostCompressionOptions.nativeHandle_); + this.bottommostCompressionOptions_ = bottommostCompressionOptions; + return this; + } + + @Override + public CompressionOptions bottommostCompressionOptions() { + return this.bottommostCompressionOptions_; + } + @Override public ColumnFamilyOptions setCompressionOptions( final CompressionOptions compressionOptions) { @@ -493,7 +506,7 @@ public class ColumnFamilyOptions extends RocksObject @Override public CompactionStyle compactionStyle() { - return CompactionStyle.values()[compactionStyle(nativeHandle_)]; + return CompactionStyle.fromValue(compactionStyle(nativeHandle_)); } @Override @@ -762,6 +775,17 @@ public class ColumnFamilyOptions extends RocksObject return reportBgIoStats(nativeHandle_); } + @Override + public ColumnFamilyOptions setTtl(final long ttl) { + setTtl(nativeHandle_, ttl); + return this; + } + + @Override + public long ttl() { + return ttl(nativeHandle_); + } + @Override public ColumnFamilyOptions setCompactionOptionsUniversal( 
final CompactionOptionsUniversal compactionOptionsUniversal) { @@ -804,7 +828,9 @@ public class ColumnFamilyOptions extends RocksObject String optString); private static native long newColumnFamilyOptions(); - private static native long copyColumnFamilyOptions(long handle); + private static native long copyColumnFamilyOptions(final long handle); + private static native long newColumnFamilyOptionsFromOptions( + final long optionsHandle); @Override protected final native void disposeInternal(final long handle); private native void optimizeForSmallDb(final long handle); @@ -840,6 +866,8 @@ public class ColumnFamilyOptions extends RocksObject private native void setBottommostCompressionType(long handle, byte bottommostCompressionType); private native byte bottommostCompressionType(long handle); + private native void setBottommostCompressionOptions(final long handle, + final long bottommostCompressionOptionsHandle); private native void setCompressionOptions(long handle, long compressionOptionsHandle); private native void useFixedLengthPrefixExtractor( @@ -947,6 +975,8 @@ public class ColumnFamilyOptions extends RocksObject private native void setReportBgIoStats(final long handle, final boolean reportBgIoStats); private native boolean reportBgIoStats(final long handle); + private native void setTtl(final long handle, final long ttl); + private native long ttl(final long handle); private native void setCompactionOptionsUniversal(final long handle, final long compactionOptionsUniversalHandle); private native void setCompactionOptionsFIFO(final long handle, @@ -961,10 +991,11 @@ public class ColumnFamilyOptions extends RocksObject private TableFormatConfig tableFormatConfig_; private AbstractComparator> comparator_; private AbstractCompactionFilter> compactionFilter_; - AbstractCompactionFilterFactory> + private AbstractCompactionFilterFactory> compactionFilterFactory_; private CompactionOptionsUniversal compactionOptionsUniversal_; private CompactionOptionsFIFO compactionOptionsFIFO_; + private CompressionOptions bottommostCompressionOptions_; private CompressionOptions compressionOptions_; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptionsInterface.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptionsInterface.java index 5cb68b461..f88a21af2 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptionsInterface.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ColumnFamilyOptionsInterface.java @@ -151,6 +151,60 @@ public interface ColumnFamilyOptionsInterface */ T setMergeOperator(MergeOperator mergeOperator); + /** + * A single CompactionFilter instance to call into during compaction. + * Allows an application to modify/delete a key-value during background + * compaction. + * + * If the client requires a new compaction filter to be used for different + * compaction runs, it can specify call + * {@link #setCompactionFilterFactory(AbstractCompactionFilterFactory)} + * instead. + * + * The client should specify only set one of the two. + * {@link #setCompactionFilter(AbstractCompactionFilter)} takes precedence + * over {@link #setCompactionFilterFactory(AbstractCompactionFilterFactory)} + * if the client specifies both. + * + * If multithreaded compaction is being used, the supplied CompactionFilter + * instance may be used from different threads concurrently and so should be thread-safe. + * + * @param compactionFilter {@link AbstractCompactionFilter} instance. + * @return the instance of the current object. 
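+   *
+   * <p>A hedged sketch, using the bundled RemoveEmptyValueCompactionFilter
+   * (any {@link AbstractCompactionFilter} subclass is set the same way):</p>
+   * <pre>
+   *   columnFamilyOptions.setCompactionFilter(
+   *       new RemoveEmptyValueCompactionFilter());
+   * </pre>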
+ */ + T setCompactionFilter( + final AbstractCompactionFilter> compactionFilter); + + /** + * Accessor for the CompactionFilter instance in use. + * + * @return Reference to the CompactionFilter, or null if one hasn't been set. + */ + AbstractCompactionFilter> compactionFilter(); + + /** + * This is a factory that provides {@link AbstractCompactionFilter} objects + * which allow an application to modify/delete a key-value during background + * compaction. + * + * A new filter will be created on each compaction run. If multithreaded + * compaction is being used, each created CompactionFilter will only be used + * from a single thread and so does not need to be thread-safe. + * + * @param compactionFilterFactory {@link AbstractCompactionFilterFactory} instance. + * @return the instance of the current object. + */ + T setCompactionFilterFactory( + final AbstractCompactionFilterFactory> + compactionFilterFactory); + + /** + * Accessor for the CompactionFilterFactory instance in use. + * + * @return Reference to the CompactionFilterFactory, or null if one hasn't been set. + */ + AbstractCompactionFilterFactory> compactionFilterFactory(); + /** * This prefix-extractor uses the first n bytes of a key as its prefix. * @@ -345,6 +399,28 @@ public interface ColumnFamilyOptionsInterface */ CompressionType bottommostCompressionType(); + /** + * Set the options for compression algorithms used by + * {@link #bottommostCompressionType()} if it is enabled. + * + * To enable it, please see the definition of + * {@link CompressionOptions}. + * + * @param compressionOptions the bottom most compression options. + * + * @return the reference of the current options. + */ + T setBottommostCompressionOptions( + final CompressionOptions compressionOptions); + + /** + * Get the bottom most compression options. + * + * See {@link #setBottommostCompressionOptions(CompressionOptions)}. + * + * @return the bottom most compression options. + */ + CompressionOptions bottommostCompressionOptions(); /** * Set the different options for compression algorithms diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactRangeOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactRangeOptions.java index e8c892110..c07bd96a5 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactRangeOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactRangeOptions.java @@ -88,26 +88,6 @@ public class CompactRangeOptions extends RocksObject { return this; } - - /** - * Returns the policy for compacting the bottommost level - * @return The BottommostLevelCompaction policy - */ - public BottommostLevelCompaction bottommostLevelCompaction() { - return BottommostLevelCompaction.fromRocksId(bottommostLevelCompaction(nativeHandle_)); - } - - /** - * Sets the policy for compacting the bottommost level - * - * @param bottommostLevelCompaction The policy for compacting the bottommost level - * @return This CompactRangeOptions - */ - public CompactRangeOptions setBottommostLevelCompaction(final BottommostLevelCompaction bottommostLevelCompaction) { - setBottommostLevelCompaction(nativeHandle_, bottommostLevelCompaction.getValue()); - return this; - } - /** * Returns whether compacted files will be moved to the minimum level capable of holding the data or given level * (specified non-negative target_level). 
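The differing thread-safety contracts above (a single filter shared across concurrent compactions versus one fresh filter per compaction run) are easiest to see in a short sketch. This is illustrative only: it assumes the `RemoveEmptyValueCompactionFilter` that ships with RocksJava, and `EmptyValueFilterFactory` is a hypothetical name.

```java
import org.rocksdb.AbstractCompactionFilter;
import org.rocksdb.AbstractCompactionFilterFactory;
import org.rocksdb.RemoveEmptyValueCompactionFilter;

// A factory that hands every compaction run its own filter instance.
// Per the factory contract, each created filter is only ever used from a
// single thread, so the filter itself need not be thread-safe (unlike a
// filter registered directly via setCompactionFilter).
public class EmptyValueFilterFactory
    extends AbstractCompactionFilterFactory<RemoveEmptyValueCompactionFilter> {

  @Override
  public RemoveEmptyValueCompactionFilter createCompactionFilter(
      final AbstractCompactionFilter.Context context) {
    return new RemoveEmptyValueCompactionFilter();
  }

  @Override
  public String name() {
    return "EmptyValueFilterFactory";
  }
}
```

The factory would then be registered with `new ColumnFamilyOptions().setCompactionFilterFactory(new EmptyValueFilterFactory())`.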
@@ -170,6 +150,25 @@ public class CompactRangeOptions extends RocksObject { return this; } + /** + * Returns the policy for compacting the bottommost level + * @return The BottommostLevelCompaction policy + */ + public BottommostLevelCompaction bottommostLevelCompaction() { + return BottommostLevelCompaction.fromRocksId(bottommostLevelCompaction(nativeHandle_)); + } + + /** + * Sets the policy for compacting the bottommost level + * + * @param bottommostLevelCompaction The policy for compacting the bottommost level + * @return This CompactRangeOptions + */ + public CompactRangeOptions setBottommostLevelCompaction(final BottommostLevelCompaction bottommostLevelCompaction) { + setBottommostLevelCompaction(nativeHandle_, bottommostLevelCompaction.getValue()); + return this; + } + /** * If true, compaction will execute immediately even if doing so would cause the DB to * enter write stall mode. Otherwise, it'll sleep until load is low enough. @@ -212,22 +211,27 @@ public class CompactRangeOptions extends RocksObject { } private native static long newCompactRangeOptions(); + @Override protected final native void disposeInternal(final long handle); + private native boolean exclusiveManualCompaction(final long handle); - private native void setExclusiveManualCompaction(final long handle, final boolean exclusive_manual_compaction); - private native int bottommostLevelCompaction(final long handle); - private native void setBottommostLevelCompaction(final long handle, final int bottommostLevelCompaction); + private native void setExclusiveManualCompaction(final long handle, + final boolean exclusive_manual_compaction); private native boolean changeLevel(final long handle); - private native void setChangeLevel(final long handle, final boolean changeLevel); + private native void setChangeLevel(final long handle, + final boolean changeLevel); private native int targetLevel(final long handle); - private native void setTargetLevel(final long handle, final int targetLevel); + private native void setTargetLevel(final long handle, + final int targetLevel); private native int targetPathId(final long handle); - private native void setTargetPathId(final long handle, final int /* uint32_t */ targetPathId); + private native void setTargetPathId(final long handle, + final int targetPathId); + private native int bottommostLevelCompaction(final long handle); + private native void setBottommostLevelCompaction(final long handle, + final int bottommostLevelCompaction); private native boolean allowWriteStall(final long handle); - private native void setAllowWriteStall(final long handle, final boolean allowWriteStall); - private native void setMaxSubcompactions(final long handle, final int /* uint32_t */ maxSubcompactions); + private native void setAllowWriteStall(final long handle, + final boolean allowWriteStall); + private native void setMaxSubcompactions(final long handle, + final int maxSubcompactions); private native int maxSubcompactions(final long handle); - - @Override - protected final native void disposeInternal(final long handle); - } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionJobInfo.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionJobInfo.java new file mode 100644 index 000000000..8b59edc91 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionJobInfo.java @@ -0,0 +1,159 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.Arrays; +import java.util.List; +import java.util.Map; + +public class CompactionJobInfo extends RocksObject { + + public CompactionJobInfo() { + super(newCompactionJobInfo()); + } + + /** + * Private as called from JNI C++ + */ + private CompactionJobInfo(final long nativeHandle) { + super(nativeHandle); + } + + /** + * Get the name of the column family where the compaction happened. + * + * @return the name of the column family + */ + public byte[] columnFamilyName() { + return columnFamilyName(nativeHandle_); + } + + /** + * Get the status indicating whether the compaction was successful or not. + * + * @return the status + */ + public Status status() { + return status(nativeHandle_); + } + + /** + * Get the id of the thread that completed this compaction job. + * + * @return the id of the thread + */ + public long threadId() { + return threadId(nativeHandle_); + } + + /** + * Get the job id, which is unique in the same thread. + * + * @return the job id + */ + public int jobId() { + return jobId(nativeHandle_); + } + + /** + * Get the smallest input level of the compaction. + * + * @return the input level + */ + public int baseInputLevel() { + return baseInputLevel(nativeHandle_); + } + + /** + * Get the output level of the compaction. + * + * @return the output level + */ + public int outputLevel() { + return outputLevel(nativeHandle_); + } + + /** + * Get the names of the compaction input files. + * + * @return the names of the input files. + */ + public List inputFiles() { + return Arrays.asList(inputFiles(nativeHandle_)); + } + + /** + * Get the names of the compaction output files. + * + * @return the names of the output files. + */ + public List outputFiles() { + return Arrays.asList(outputFiles(nativeHandle_)); + } + + /** + * Get the table properties for the input and output tables. + * + * The map is keyed by values from {@link #inputFiles()} and + * {@link #outputFiles()}. + * + * @return the table properties + */ + public Map tableProperties() { + return tableProperties(nativeHandle_); + } + + /** + * Get the reason for running the compaction. + * + * @return the reason. + */ + public CompactionReason compactionReason() { + return CompactionReason.fromValue(compactionReason(nativeHandle_)); + } + + /** + * Get the compression algorithm used for output files. + * + * @return the compression algorithm + */ + public CompressionType compression() { + return CompressionType.getCompressionType(compression(nativeHandle_)); + } + + /** + * Get detailed information about this compaction. + * + * @return the detailed information, or null if not available.
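Since `CompactionJobInfo` is new in this change, a minimal usage sketch may help: an instance can be passed to `RocksDB#compactFiles` (the overload added alongside this class) to capture what a manual compaction did. `db` and `inputFileNames` are hypothetical placeholders, e.g. L0 file names taken from `db.getLiveFilesMetaData()`.

```java
import java.util.List;
import org.rocksdb.*;

public class CompactFilesExample {
  // Compacts the given files to L1 and prints what the job did.
  static void compactAndReport(final RocksDB db,
      final List<String> inputFileNames) throws RocksDBException {
    try (final CompactionOptions compactionOptions = new CompactionOptions();
         final CompactionJobInfo jobInfo = new CompactionJobInfo()) {
      final List<String> outputs = db.compactFiles(compactionOptions,
          db.getDefaultColumnFamily(), inputFileNames,
          /* outputLevel */ 1, /* outputPathId, -1 = let RocksDB pick */ -1,
          jobInfo);

      System.out.println("outputs: " + outputs);
      System.out.println("reason:  " + jobInfo.compactionReason());

      final CompactionJobStats stats = jobInfo.stats(); // may be null
      if (stats != null) {
        System.out.println("bytes in/out: " + stats.totalInputBytes()
            + "/" + stats.totalOutputBytes());
      }
    }
  }
}
```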
+ */ + public /* @Nullable */ CompactionJobStats stats() { + final long statsHandle = stats(nativeHandle_); + if (statsHandle == 0) { + return null; + } + + return new CompactionJobStats(statsHandle); + } + + + private static native long newCompactionJobInfo(); + @Override protected native void disposeInternal(final long handle); + + private static native byte[] columnFamilyName(final long handle); + private static native Status status(final long handle); + private static native long threadId(final long handle); + private static native int jobId(final long handle); + private static native int baseInputLevel(final long handle); + private static native int outputLevel(final long handle); + private static native String[] inputFiles(final long handle); + private static native String[] outputFiles(final long handle); + private static native Map tableProperties( + final long handle); + private static native byte compactionReason(final long handle); + private static native byte compression(final long handle); + private static native long stats(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionJobStats.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionJobStats.java new file mode 100644 index 000000000..3d53b5565 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionJobStats.java @@ -0,0 +1,295 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +public class CompactionJobStats extends RocksObject { + + public CompactionJobStats() { + super(newCompactionJobStats()); + } + + /** + * Private as called from JNI C++ + */ + CompactionJobStats(final long nativeHandle) { + super(nativeHandle); + } + + /** + * Reset the stats. + */ + public void reset() { + reset(nativeHandle_); + } + + /** + * Aggregate the CompactionJobStats from another instance with this one. + * + * @param compactionJobStats another instance of stats. + */ + public void add(final CompactionJobStats compactionJobStats) { + add(nativeHandle_, compactionJobStats.nativeHandle_); + } + + /** + * Get the elapsed time in micro of this compaction. + * + * @return the elapsed time in micro of this compaction. + */ + public long elapsedMicros() { + return elapsedMicros(nativeHandle_); + } + + /** + * Get the number of compaction input records. + * + * @return the number of compaction input records. + */ + public long numInputRecords() { + return numInputRecords(nativeHandle_); + } + + /** + * Get the number of compaction input files. + * + * @return the number of compaction input files. + */ + public long numInputFiles() { + return numInputFiles(nativeHandle_); + } + + /** + * Get the number of compaction input files at the output level. + * + * @return the number of compaction input files at the output level. + */ + public long numInputFilesAtOutputLevel() { + return numInputFilesAtOutputLevel(nativeHandle_); + } + + /** + * Get the number of compaction output records. + * + * @return the number of compaction output records. + */ + public long numOutputRecords() { + return numOutputRecords(nativeHandle_); + } + + /** + * Get the number of compaction output files. + * + * @return the number of compaction output files. 
+ */ + public long numOutputFiles() { + return numOutputFiles(nativeHandle_); + } + + /** + * Determine if the compaction is a manual compaction. + * + * @return true if the compaction is a manual compaction, false otherwise. + */ + public boolean isManualCompaction() { + return isManualCompaction(nativeHandle_); + } + + /** + * Get the size of the compaction input in bytes. + * + * @return the size of the compaction input in bytes. + */ + public long totalInputBytes() { + return totalInputBytes(nativeHandle_); + } + + /** + * Get the size of the compaction output in bytes. + * + * @return the size of the compaction output in bytes. + */ + public long totalOutputBytes() { + return totalOutputBytes(nativeHandle_); + } + + /** + * Get the number of records being replaced by a newer record associated + * with the same key. + * + * This could be a new value or a deletion entry for that key so this field + * sums up all updated and deleted keys. + * + * @return the number of records being replaced by a newer record associated + * with the same key. + */ + public long numRecordsReplaced() { + return numRecordsReplaced(nativeHandle_); + } + + /** + * Get the sum of the uncompressed input keys in bytes. + * + * @return the sum of the uncompressed input keys in bytes. + */ + public long totalInputRawKeyBytes() { + return totalInputRawKeyBytes(nativeHandle_); + } + + /** + * Get the sum of the uncompressed input values in bytes. + * + * @return the sum of the uncompressed input values in bytes. + */ + public long totalInputRawValueBytes() { + return totalInputRawValueBytes(nativeHandle_); + } + + /** + * Get the number of deletion entries before compaction. + * + * Deletion entries can disappear after compaction because they have expired. + * + * @return the number of deletion entries before compaction. + */ + public long numInputDeletionRecords() { + return numInputDeletionRecords(nativeHandle_); + } + + /** + * Get the number of deletion records that were found obsolete and discarded + * because it is not possible to delete any more keys with this entry + * (i.e. all possible deletions resulting from it have been completed). + * + * @return the number of deletion records that were found obsolete and + * discarded. + */ + public long numExpiredDeletionRecords() { + return numExpiredDeletionRecords(nativeHandle_); + } + + /** + * Get the number of corrupt keys (ParseInternalKey returned false when + * applied to the key) encountered and written out. + * + * @return the number of corrupt keys. + */ + public long numCorruptKeys() { + return numCorruptKeys(nativeHandle_); + } + + /** + * Get the time spent on the file's Append() call. + * + * Only populated if {@link ColumnFamilyOptions#reportBgIoStats()} is set. + * + * @return the time spent on the file's Append() call. + */ + public long fileWriteNanos() { + return fileWriteNanos(nativeHandle_); + } + + /** + * Get the time spent on sync file range. + * + * Only populated if {@link ColumnFamilyOptions#reportBgIoStats()} is set. + * + * @return the time spent on sync file range. + */ + public long fileRangeSyncNanos() { + return fileRangeSyncNanos(nativeHandle_); + } + + /** + * Get the time spent on file fsync. + * + * Only populated if {@link ColumnFamilyOptions#reportBgIoStats()} is set. + * + * @return the time spent on file fsync. + */ + public long fileFsyncNanos() { + return fileFsyncNanos(nativeHandle_); + } + + /** + * Get the time spent on preparing file write (fallocate, etc.) + * + * Only populated if {@link ColumnFamilyOptions#reportBgIoStats()} is set. + * + * @return the time spent on preparing file write (fallocate, etc.). + */ + public long filePrepareWriteNanos() { + return filePrepareWriteNanos(nativeHandle_); + } + + /** + * Get the smallest output key prefix. + * + * @return the smallest output key prefix. + */ + public byte[] smallestOutputKeyPrefix() { + return smallestOutputKeyPrefix(nativeHandle_); + } + + /** + * Get the largest output key prefix. + * + * @return the largest output key prefix. + */ + public byte[] largestOutputKeyPrefix() { + return largestOutputKeyPrefix(nativeHandle_); + } + + /** + * Get the number of single-deletes which do not meet a put. + * + * @return number of single-deletes which do not meet a put. + */ + @Experimental("Performance optimization for a very specific workload") + public long numSingleDelFallthru() { + return numSingleDelFallthru(nativeHandle_); + } + + /** + * Get the number of single-deletes which meet something other than a put. + * + * @return the number of single-deletes which meet something other than a put. + */ + @Experimental("Performance optimization for a very specific workload") + public long numSingleDelMismatch() { + return numSingleDelMismatch(nativeHandle_); + } + + private static native long newCompactionJobStats(); + @Override protected native void disposeInternal(final long handle); + + + private static native void reset(final long handle); + private static native void add(final long handle, + final long compactionJobStatsHandle); + private static native long elapsedMicros(final long handle); + private static native long numInputRecords(final long handle); + private static native long numInputFiles(final long handle); + private static native long numInputFilesAtOutputLevel(final long handle); + private static native long numOutputRecords(final long handle); + private static native long numOutputFiles(final long handle); + private static native boolean isManualCompaction(final long handle); + private static native long totalInputBytes(final long handle); + private static native long totalOutputBytes(final long handle); + private static native long numRecordsReplaced(final long handle); + private static native long totalInputRawKeyBytes(final long handle); + private static native long totalInputRawValueBytes(final long handle); + private static native long numInputDeletionRecords(final long handle); + private static native long numExpiredDeletionRecords(final long handle); + private static native long numCorruptKeys(final long handle); + private static native long fileWriteNanos(final long handle); + private static native long fileRangeSyncNanos(final long handle); + private static native long fileFsyncNanos(final long handle); + private static native long filePrepareWriteNanos(final long handle); + private static native byte[] smallestOutputKeyPrefix(final long handle); + private static native byte[] largestOutputKeyPrefix(final long handle); + private static native long numSingleDelFallthru(final long handle); + private static native long numSingleDelMismatch(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptions.java new file mode 100644 index 000000000..2c7e391fb --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptions.java @@ -0,0 +1,121 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.List; + +/** + * CompactionOptions are used in + * {@link RocksDB#compactFiles(CompactionOptions, ColumnFamilyHandle, List, int, int, CompactionJobInfo)} + * calls. + */ +public class CompactionOptions extends RocksObject { + + public CompactionOptions() { + super(newCompactionOptions()); + } + + /** + * Get the compaction output compression type. + * + * See {@link #setCompression(CompressionType)}. + * + * @return the compression type. + */ + public CompressionType compression() { + return CompressionType.getCompressionType( + compression(nativeHandle_)); + } + + /** + * Set the compaction output compression type. + * + * Default: snappy + * + * If set to {@link CompressionType#DISABLE_COMPRESSION_OPTION}, + * RocksDB will choose compression type according to the + * {@link ColumnFamilyOptions#compressionType()}, taking into account + * the output level if {@link ColumnFamilyOptions#compressionPerLevel()} + * is specified. + * + * @param compression the compression type to use for compaction output. + * + * @return the instance of the current Options. + */ + public CompactionOptions setCompression(final CompressionType compression) { + setCompression(nativeHandle_, compression.getValue()); + return this; + } + + /** + * Get the compaction output file size limit. + * + * See {@link #setOutputFileSizeLimit(long)}. + * + * @return the file size limit. + */ + public long outputFileSizeLimit() { + return outputFileSizeLimit(nativeHandle_); + } + + /** + * Compaction will create files of size {@link #outputFileSizeLimit()}. + * + * Default: 2^64-1, which means that compaction will create a single file + * + * @param outputFileSizeLimit the size limit + * + * @return the instance of the current Options. + */ + public CompactionOptions setOutputFileSizeLimit( + final long outputFileSizeLimit) { + setOutputFileSizeLimit(nativeHandle_, outputFileSizeLimit); + return this; + } + + /** + * Get the maximum number of threads that will concurrently perform a + * compaction job. + * + * @return the maximum number of threads. + */ + public int maxSubcompactions() { + return maxSubcompactions(nativeHandle_); + } + + /** + * This value represents the maximum number of threads that will + * concurrently perform a compaction job by breaking it into multiple, + * smaller ones that are run simultaneously. + * + * Default: 0 (i.e. no subcompactions) + * + * If > 0, it will replace the option in + * {@link DBOptions#maxSubcompactions()} for this compaction. + * + * @param maxSubcompactions The maximum number of threads that will + * concurrently perform a compaction job + * + * @return the instance of the current Options. 
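As a quick illustration of the three `CompactionOptions` knobs documented here (output compression, output file size limit, subcompactions), a minimal sketch; the values are arbitrary, not recommendations:

```java
import org.rocksdb.CompactionOptions;
import org.rocksdb.CompressionType;

public class CompactionOptionsExample {
  static CompactionOptions build() {
    return new CompactionOptions()
        .setCompression(CompressionType.LZ4_COMPRESSION) // force LZ4 for the outputs
        .setOutputFileSizeLimit(64L * 1024 * 1024)       // cut output files at ~64 MB
        .setMaxSubcompactions(4);  // overrides DBOptions#maxSubcompactions() for this job
  }
}
```

The resulting object would be passed to `RocksDB#compactFiles(...)` as shown in the earlier sketch.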
+ */ + public CompactionOptions setMaxSubcompactions(final int maxSubcompactions) { + setMaxSubcompactions(nativeHandle_, maxSubcompactions); + return this; + } + + private static native long newCompactionOptions(); + @Override protected final native void disposeInternal(final long handle); + + private static native byte compression(final long handle); + private static native void setCompression(final long handle, + final byte compressionTypeValue); + private static native long outputFileSizeLimit(final long handle); + private static native void setOutputFileSizeLimit(final long handle, + final long outputFileSizeLimit); + private static native int maxSubcompactions(final long handle); + private static native void setMaxSubcompactions(final long handle, + final int maxSubcompactions); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptionsFIFO.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptionsFIFO.java index f79580780..4c8d6545c 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptionsFIFO.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionOptionsFIFO.java @@ -42,9 +42,48 @@ public class CompactionOptionsFIFO extends RocksObject { return maxTableFilesSize(nativeHandle_); } - private native void setMaxTableFilesSize(long handle, long maxTableFilesSize); - private native long maxTableFilesSize(long handle); + /** + * If true, try to do compaction to compact smaller files into larger ones. + * Minimum files to compact follows options.level0_file_num_compaction_trigger + * and compaction won't trigger if average compact bytes per del file is + * larger than options.write_buffer_size. This is to protect large files + * from being compacted again. + * + * Default: false + * + * @param allowCompaction true to allow intra-L0 compaction + * + * @return the reference to the current options. + */ + public CompactionOptionsFIFO setAllowCompaction( + final boolean allowCompaction) { + setAllowCompaction(nativeHandle_, allowCompaction); + return this; + } + + + /** + * Check if intra-L0 compaction is enabled. + * When enabled, we try to compact smaller files into larger ones. + * + * See {@link #setAllowCompaction(boolean)}. + * + * Default: false + * + * @return true if intra-L0 compaction is enabled, false otherwise. + */ + public boolean allowCompaction() { + return allowCompaction(nativeHandle_); + } + private native static long newCompactionOptionsFIFO(); @Override protected final native void disposeInternal(final long handle); + + private native void setMaxTableFilesSize(final long handle, + final long maxTableFilesSize); + private native long maxTableFilesSize(final long handle); + private native void setAllowCompaction(final long handle, + final boolean allowCompaction); + private native boolean allowCompaction(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionReason.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionReason.java new file mode 100644 index 000000000..f18c48122 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionReason.java @@ -0,0 +1,115 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
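To connect the new `allowCompaction` flag above to the existing options plumbing, a minimal sketch of a FIFO-style column family; the sizes are illustrative:

```java
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompactionOptionsFIFO;
import org.rocksdb.CompactionStyle;

public class FifoCfExample {
  static ColumnFamilyOptions build() {
    final CompactionOptionsFIFO fifoOptions = new CompactionOptionsFIFO()
        .setMaxTableFilesSize(1024L * 1024 * 1024) // keep at most ~1 GB of SSTs
        .setAllowCompaction(true);                 // opt in to intra-L0 compaction
    return new ColumnFamilyOptions()
        .setCompactionStyle(CompactionStyle.FIFO)
        .setCompactionOptionsFIFO(fifoOptions);
  }
}
```

The returned options would then be used in a `ColumnFamilyDescriptor` when opening the database.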
+ +package org.rocksdb; + +public enum CompactionReason { + kUnknown((byte)0x0), + + /** + * [Level] number of L0 files > level0_file_num_compaction_trigger + */ + kLevelL0FilesNum((byte)0x1), + + /** + * [Level] total size of level > MaxBytesForLevel() + */ + kLevelMaxLevelSize((byte)0x2), + + /** + * [Universal] Compacting for size amplification + */ + kUniversalSizeAmplification((byte)0x3), + + /** + * [Universal] Compacting for size ratio + */ + kUniversalSizeRatio((byte)0x4), + + /** + * [Universal] number of sorted runs > level0_file_num_compaction_trigger + */ + kUniversalSortedRunNum((byte)0x5), + + /** + * [FIFO] total size > max_table_files_size + */ + kFIFOMaxSize((byte)0x6), + + /** + * [FIFO] reduce number of files. + */ + kFIFOReduceNumFiles((byte)0x7), + + /** + * [FIFO] files with creation time < (current_time - interval) + */ + kFIFOTtl((byte)0x8), + + /** + * Manual compaction + */ + kManualCompaction((byte)0x9), + + /** + * DB::SuggestCompactRange() marked files for compaction + */ + kFilesMarkedForCompaction((byte)0x10), + + /** + * [Level] Automatic compaction within bottommost level to cleanup duplicate + * versions of same user key, usually due to a released snapshot. + */ + kBottommostFiles((byte)0x0A), + + /** + * Compaction based on TTL + */ + kTtl((byte)0x0B), + + /** + * According to the comments in flush_job.cc, RocksDB treats flush as + * a level 0 compaction in internal stats. + */ + kFlush((byte)0x0C), + + /** + * Compaction caused by external sst file ingestion + */ + kExternalSstIngestion((byte)0x0D); + + private final byte value; + + CompactionReason(final byte value) { + this.value = value; + } + + /** + * Get the internal representation value. + * + * @return the internal representation value + */ + byte getValue() { + return value; + } + + /** + * Get the CompactionReason from the internal representation value. + * + * @return the compaction reason. + * + * @throws IllegalArgumentException if the value is unknown. + */ + static CompactionReason fromValue(final byte value) { + for (final CompactionReason compactionReason : CompactionReason.values()) { + if(compactionReason.value == value) { + return compactionReason; + } + } + + throw new IllegalArgumentException( + "Illegal value provided for CompactionReason: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionStyle.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionStyle.java index 5e13363c4..b24bbf850 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionStyle.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompactionStyle.java @@ -5,6 +5,8 @@ package org.rocksdb; +import java.util.List; + /** * Enum CompactionStyle * @@ -21,6 +23,9 @@ package org.rocksdb; * compaction strategy. It is suited for keeping event log data with * very low overhead (query log for example). It periodically deletes * the old data, so it's basically a TTL compaction style. + *
<li>NONE - Disable background compaction. + * Compaction jobs are submitted via + * {@link RocksDB#compactFiles(CompactionOptions, ColumnFamilyHandle, List, int, int, CompactionJobInfo)}.</li> * </ul> * * @see
    */ public enum CompactionStyle { - LEVEL((byte) 0), - UNIVERSAL((byte) 1), - FIFO((byte) 2); + LEVEL((byte) 0x0), + UNIVERSAL((byte) 0x1), + FIFO((byte) 0x2), + NONE((byte) 0x3); - private final byte value_; + private final byte value; - private CompactionStyle(byte value) { - value_ = value; + CompactionStyle(final byte value) { + this.value = value; } /** - * Returns the byte value of the enumerations value + * Get the internal representation value. * - * @return byte representation + * @return the internal representation value. */ + //TODO(AR) should be made package-private public byte getValue() { - return value_; + return value; + } + + /** + * Get the Compaction style from the internal representation value. + * + * @param value the internal representation value. + * + * @return the Compaction style + * + * @throws IllegalArgumentException if the value does not match a + * CompactionStyle + */ + static CompactionStyle fromValue(final byte value) + throws IllegalArgumentException { + for (final CompactionStyle compactionStyle : CompactionStyle.values()) { + if (compactionStyle.value == value) { + return compactionStyle; + } + } + throw new IllegalArgumentException("Unknown value for CompactionStyle: " + + value); } } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompressionOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompressionOptions.java index 4927770e5..a9072bbb9 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompressionOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/CompressionOptions.java @@ -71,6 +71,67 @@ public class CompressionOptions extends RocksObject { return maxDictBytes(nativeHandle_); } + /** + * Maximum size of training data passed to zstd's dictionary trainer. Using + * zstd's dictionary trainer can achieve even better compression ratio + * improvements than using {@link #setMaxDictBytes(int)} alone. + * + * The training data will be used to generate a dictionary + * of {@link #maxDictBytes()}. + * + * Default: 0. + * + * @param zstdMaxTrainBytes Maximum bytes to use for training ZStd. + * + * @return the reference to the current options + */ + public CompressionOptions setZStdMaxTrainBytes(final int zstdMaxTrainBytes) { + setZstdMaxTrainBytes(nativeHandle_, zstdMaxTrainBytes); + return this; + } + + /** + * Maximum size of training data passed to zstd's dictionary trainer. + * + * @return Maximum bytes to use for training ZStd + */ + public int zstdMaxTrainBytes() { + return zstdMaxTrainBytes(nativeHandle_); + } + + /** + * When the compression options are set by the user, it will be set to "true". + * For bottommost_compression_opts, to enable it, user must set enabled=true. + * Otherwise, bottommost compression will use compression_opts as default + * compression options. + * + * For compression_opts, if compression_opts.enabled=false, it is still + * used as compression options for compression process. + * + * Default: false. + * + * @param enabled true to use these compression options + * for the bottommost_compression_opts, false otherwise + * + * @return the reference to the current options + */ + public CompressionOptions setEnabled(final boolean enabled) { + setEnabled(nativeHandle_, enabled); + return this; + } + + /** + * Determine whether these compression options + * are used for the bottommost_compression_opts. 
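A hedged sketch of how the new `CompressionOptions` fields compose: dictionary-compressing only the bottommost level with ZSTD, with `setEnabled(true)` required for the bottommost options to take effect, as described above. Sizes are illustrative.

```java
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionOptions;
import org.rocksdb.CompressionType;

public class BottommostZstdExample {
  static ColumnFamilyOptions build() {
    final CompressionOptions bottommostOptions = new CompressionOptions()
        .setMaxDictBytes(16 * 1024)            // 16 KiB dictionary per SST
        .setZStdMaxTrainBytes(100 * 16 * 1024) // ~100 dictionary-sized training samples
        .setEnabled(true);                     // without this, compression_opts is used

    return new ColumnFamilyOptions()
        .setBottommostCompressionType(CompressionType.ZSTD_COMPRESSION)
        .setBottommostCompressionOptions(bottommostOptions);
  }
}
```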
+ * + * @return true if these compression options are used + * for the bottommost_compression_opts, false otherwise + */ + public boolean enabled() { + return enabled(nativeHandle_); + } + + private native static long newCompressionOptions(); @Override protected final native void disposeInternal(final long handle); @@ -82,4 +143,9 @@ public class CompressionOptions extends RocksObject { private native int strategy(final long handle); private native void setMaxDictBytes(final long handle, final int maxDictBytes); private native int maxDictBytes(final long handle); + private native void setZstdMaxTrainBytes(final long handle, + final int zstdMaxTrainBytes); + private native int zstdMaxTrainBytes(final long handle); + private native void setEnabled(final long handle, final boolean enabled); + private native boolean enabled(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptions.java index c32329388..e2c4c02b3 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptions.java @@ -15,8 +15,9 @@ import java.util.*; * If {@link #dispose()} function is not called, then it will be GC'd * automatically and native resources will be released as part of the process. */ -public class DBOptions - extends RocksObject implements DBOptionsInterface { +public class DBOptions extends RocksObject + implements DBOptionsInterface, + MutableDBOptionsInterface { static { RocksDB.loadLibrary(); } @@ -46,6 +47,17 @@ public class DBOptions this.numShardBits_ = other.numShardBits_; this.rateLimiter_ = other.rateLimiter_; this.rowCache_ = other.rowCache_; + this.walFilter_ = other.walFilter_; + this.writeBufferManager_ = other.writeBufferManager_; + } + + /** + * Constructor from Options + * + * @param options The options. 
+ */ + public DBOptions(final Options options) { + super(newDBOptionsFromOptions(options.nativeHandle_)); } /** @@ -130,18 +142,6 @@ public class DBOptions return createMissingColumnFamilies(nativeHandle_); } - @Override - public DBOptions setEnv(final Env env) { - setEnv(nativeHandle_, env.nativeHandle_); - this.env_ = env; - return this; - } - - @Override - public Env getEnv() { - return env_; - } - @Override public DBOptions setErrorIfExists( final boolean errorIfExists) { @@ -170,6 +170,18 @@ public class DBOptions return paranoidChecks(nativeHandle_); } + @Override + public DBOptions setEnv(final Env env) { + setEnv(nativeHandle_, env.nativeHandle_); + this.env_ = env; + return this; + } + + @Override + public Env getEnv() { + return env_; + } + @Override public DBOptions setRateLimiter(final RateLimiter rateLimiter) { assert(isOwningHandle()); @@ -285,8 +297,8 @@ public class DBOptions assert(isOwningHandle()); final int len = dbPaths.size(); - final String paths[] = new String[len]; - final long targetSizes[] = new long[len]; + final String[] paths = new String[len]; + final long[] targetSizes = new long[len]; int i = 0; for(final DbPath dbPath : dbPaths) { @@ -304,8 +316,8 @@ public class DBOptions if(len == 0) { return Collections.emptyList(); } else { - final String paths[] = new String[len]; - final long targetSizes[] = new long[len]; + final String[] paths = new String[len]; + final long[] targetSizes = new long[len]; dbPaths(nativeHandle_, paths, targetSizes); @@ -359,6 +371,19 @@ public class DBOptions return deleteObsoleteFilesPeriodMicros(nativeHandle_); } + @Override + public DBOptions setMaxBackgroundJobs(final int maxBackgroundJobs) { + assert(isOwningHandle()); + setMaxBackgroundJobs(nativeHandle_, maxBackgroundJobs); + return this; + } + + @Override + public int maxBackgroundJobs() { + assert(isOwningHandle()); + return maxBackgroundJobs(nativeHandle_); + } + @Override public void setBaseBackgroundCompactions( final int baseBackgroundCompactions) { @@ -387,9 +412,10 @@ public class DBOptions } @Override - public void setMaxSubcompactions(final int maxSubcompactions) { + public DBOptions setMaxSubcompactions(final int maxSubcompactions) { assert(isOwningHandle()); setMaxSubcompactions(nativeHandle_, maxSubcompactions); + return this; } @Override @@ -412,19 +438,6 @@ public class DBOptions return maxBackgroundFlushes(nativeHandle_); } - @Override - public DBOptions setMaxBackgroundJobs(final int maxBackgroundJobs) { - assert(isOwningHandle()); - setMaxBackgroundJobs(nativeHandle_, maxBackgroundJobs); - return this; - } - - @Override - public int maxBackgroundJobs() { - assert(isOwningHandle()); - return maxBackgroundJobs(nativeHandle_); - } - @Override public DBOptions setMaxLogFileSize(final long maxLogFileSize) { assert(isOwningHandle()); @@ -550,73 +563,73 @@ public class DBOptions } @Override - public DBOptions setUseDirectReads( - final boolean useDirectReads) { + public DBOptions setAllowMmapReads( + final boolean allowMmapReads) { assert(isOwningHandle()); - setUseDirectReads(nativeHandle_, useDirectReads); + setAllowMmapReads(nativeHandle_, allowMmapReads); return this; } @Override - public boolean useDirectReads() { + public boolean allowMmapReads() { assert(isOwningHandle()); - return useDirectReads(nativeHandle_); + return allowMmapReads(nativeHandle_); } @Override - public DBOptions setUseDirectIoForFlushAndCompaction( - final boolean useDirectIoForFlushAndCompaction) { + public DBOptions setAllowMmapWrites( + final boolean allowMmapWrites) { 
assert(isOwningHandle()); - setUseDirectIoForFlushAndCompaction(nativeHandle_, - useDirectIoForFlushAndCompaction); + setAllowMmapWrites(nativeHandle_, allowMmapWrites); return this; } @Override - public boolean useDirectIoForFlushAndCompaction() { + public boolean allowMmapWrites() { assert(isOwningHandle()); - return useDirectIoForFlushAndCompaction(nativeHandle_); + return allowMmapWrites(nativeHandle_); } @Override - public DBOptions setAllowFAllocate(final boolean allowFAllocate) { + public DBOptions setUseDirectReads( + final boolean useDirectReads) { assert(isOwningHandle()); - setAllowFAllocate(nativeHandle_, allowFAllocate); + setUseDirectReads(nativeHandle_, useDirectReads); return this; } @Override - public boolean allowFAllocate() { + public boolean useDirectReads() { assert(isOwningHandle()); - return allowFAllocate(nativeHandle_); + return useDirectReads(nativeHandle_); } @Override - public DBOptions setAllowMmapReads( - final boolean allowMmapReads) { + public DBOptions setUseDirectIoForFlushAndCompaction( + final boolean useDirectIoForFlushAndCompaction) { assert(isOwningHandle()); - setAllowMmapReads(nativeHandle_, allowMmapReads); + setUseDirectIoForFlushAndCompaction(nativeHandle_, + useDirectIoForFlushAndCompaction); return this; } @Override - public boolean allowMmapReads() { + public boolean useDirectIoForFlushAndCompaction() { assert(isOwningHandle()); - return allowMmapReads(nativeHandle_); + return useDirectIoForFlushAndCompaction(nativeHandle_); } @Override - public DBOptions setAllowMmapWrites( - final boolean allowMmapWrites) { + public DBOptions setAllowFAllocate(final boolean allowFAllocate) { assert(isOwningHandle()); - setAllowMmapWrites(nativeHandle_, allowMmapWrites); + setAllowFAllocate(nativeHandle_, allowFAllocate); return this; } @Override - public boolean allowMmapWrites() { + public boolean allowFAllocate() { assert(isOwningHandle()); - return allowMmapWrites(nativeHandle_); + return allowFAllocate(nativeHandle_); } @Override @@ -667,6 +680,20 @@ public class DBOptions return this; } + @Override + public DBOptions setWriteBufferManager(final WriteBufferManager writeBufferManager) { + assert(isOwningHandle()); + setWriteBufferManager(nativeHandle_, writeBufferManager.nativeHandle_); + this.writeBufferManager_ = writeBufferManager; + return this; + } + + @Override + public WriteBufferManager writeBufferManager() { + assert(isOwningHandle()); + return this.writeBufferManager_; + } + @Override public long dbWriteBufferSize() { assert(isOwningHandle()); @@ -780,6 +807,33 @@ public class DBOptions return walBytesPerSync(nativeHandle_); } + //TODO(AR) NOW +// @Override +// public DBOptions setListeners(final List listeners) { +// assert(isOwningHandle()); +// final long[] eventListenerHandlers = new long[listeners.size()]; +// for (int i = 0; i < eventListenerHandlers.length; i++) { +// eventListenerHandlers[i] = listeners.get(i).nativeHandle_; +// } +// setEventListeners(nativeHandle_, eventListenerHandlers); +// return this; +// } +// +// @Override +// public Collection listeners() { +// assert(isOwningHandle()); +// final long[] eventListenerHandlers = listeners(nativeHandle_); +// if (eventListenerHandlers == null || eventListenerHandlers.length == 0) { +// return Collections.emptyList(); +// } +// +// final List eventListeners = new ArrayList<>(); +// for (final long eventListenerHandle : eventListenerHandlers) { +// eventListeners.add(new EventListener(eventListenerHandle)); //TODO(AR) check ownership is set to false! 
+// } +// return eventListeners; +// } + @Override public DBOptions setEnableThreadTracking(final boolean enableThreadTracking) { assert(isOwningHandle()); @@ -805,6 +859,19 @@ public class DBOptions return delayedWriteRate(nativeHandle_); } + @Override + public DBOptions setEnablePipelinedWrite(final boolean enablePipelinedWrite) { + assert(isOwningHandle()); + setEnablePipelinedWrite(nativeHandle_, enablePipelinedWrite); + return this; + } + + @Override + public boolean enablePipelinedWrite() { + assert(isOwningHandle()); + return enablePipelinedWrite(nativeHandle_); + } + @Override public DBOptions setAllowConcurrentMemtableWrite( final boolean allowConcurrentMemtableWrite) { @@ -906,6 +973,20 @@ public class DBOptions return this.rowCache_; } + @Override + public DBOptions setWalFilter(final AbstractWalFilter walFilter) { + assert(isOwningHandle()); + setWalFilter(nativeHandle_, walFilter.nativeHandle_); + this.walFilter_ = walFilter; + return this; + } + + @Override + public WalFilter walFilter() { + assert(isOwningHandle()); + return this.walFilter_; + } + @Override public DBOptions setFailIfOptionsFileError(final boolean failIfOptionsFileError) { assert(isOwningHandle()); @@ -958,6 +1039,69 @@ public class DBOptions return avoidFlushDuringShutdown(nativeHandle_); } + @Override + public DBOptions setAllowIngestBehind(final boolean allowIngestBehind) { + assert(isOwningHandle()); + setAllowIngestBehind(nativeHandle_, allowIngestBehind); + return this; + } + + @Override + public boolean allowIngestBehind() { + assert(isOwningHandle()); + return allowIngestBehind(nativeHandle_); + } + + @Override + public DBOptions setPreserveDeletes(final boolean preserveDeletes) { + assert(isOwningHandle()); + setPreserveDeletes(nativeHandle_, preserveDeletes); + return this; + } + + @Override + public boolean preserveDeletes() { + assert(isOwningHandle()); + return preserveDeletes(nativeHandle_); + } + + @Override + public DBOptions setTwoWriteQueues(final boolean twoWriteQueues) { + assert(isOwningHandle()); + setTwoWriteQueues(nativeHandle_, twoWriteQueues); + return this; + } + + @Override + public boolean twoWriteQueues() { + assert(isOwningHandle()); + return twoWriteQueues(nativeHandle_); + } + + @Override + public DBOptions setManualWalFlush(final boolean manualWalFlush) { + assert(isOwningHandle()); + setManualWalFlush(nativeHandle_, manualWalFlush); + return this; + } + + @Override + public boolean manualWalFlush() { + assert(isOwningHandle()); + return manualWalFlush(nativeHandle_); + } + + @Override + public DBOptions setAtomicFlush(final boolean atomicFlush) { + setAtomicFlush(nativeHandle_, atomicFlush); + return this; + } + + @Override + public boolean atomicFlush() { + return atomicFlush(nativeHandle_); + } + static final int DEFAULT_NUM_SHARD_BITS = -1; @@ -976,8 +1120,9 @@ public class DBOptions private static native long getDBOptionsFromProps( String optString); - private native static long newDBOptions(); - private native static long copyDBOptions(long handle); + private static native long newDBOptions(); + private static native long copyDBOptions(final long handle); + private static native long newDBOptionsFromOptions(final long optionsHandle); @Override protected final native void disposeInternal(final long handle); private native void optimizeForSmallDb(final long handle); @@ -1087,6 +1232,8 @@ public class DBOptions private native boolean adviseRandomOnOpen(long handle); private native void setDbWriteBufferSize(final long handle, final long dbWriteBufferSize); + private native 
void setWriteBufferManager(final long dbOptionsHandle, + final long writeBufferManagerHandle); private native long dbWriteBufferSize(final long handle); private native void setAccessHintOnCompactionStart(final long handle, final byte accessHintOnCompactionStart); @@ -1116,6 +1263,9 @@ public class DBOptions private native boolean enableThreadTracking(long handle); private native void setDelayedWriteRate(long handle, long delayedWriteRate); private native long delayedWriteRate(long handle); + private native void setEnablePipelinedWrite(final long handle, + final boolean enablePipelinedWrite); + private native boolean enablePipelinedWrite(final long handle); private native void setAllowConcurrentMemtableWrite(long handle, boolean allowConcurrentMemtableWrite); private native boolean allowConcurrentMemtableWrite(long handle); @@ -1138,7 +1288,9 @@ public class DBOptions final boolean allow2pc); private native boolean allow2pc(final long handle); private native void setRowCache(final long handle, - final long row_cache_handle); + final long rowCacheHandle); + private native void setWalFilter(final long handle, + final long walFilterHandle); private native void setFailIfOptionsFileError(final long handle, final boolean failIfOptionsFileError); private native boolean failIfOptionsFileError(final long handle); @@ -1151,6 +1303,21 @@ public class DBOptions private native void setAvoidFlushDuringShutdown(final long handle, final boolean avoidFlushDuringShutdown); private native boolean avoidFlushDuringShutdown(final long handle); + private native void setAllowIngestBehind(final long handle, + final boolean allowIngestBehind); + private native boolean allowIngestBehind(final long handle); + private native void setPreserveDeletes(final long handle, + final boolean preserveDeletes); + private native boolean preserveDeletes(final long handle); + private native void setTwoWriteQueues(final long handle, + final boolean twoWriteQueues); + private native boolean twoWriteQueues(final long handle); + private native void setManualWalFlush(final long handle, + final boolean manualWalFlush); + private native boolean manualWalFlush(final long handle); + private native void setAtomicFlush(final long handle, + final boolean atomicFlush); + private native boolean atomicFlush(final long handle); // instance variables // NOTE: If you add new member variables, please update the copy constructor above! @@ -1158,4 +1325,6 @@ public class DBOptions private int numShardBits_; private RateLimiter rateLimiter_; private Cache rowCache_; + private WalFilter walFilter_; + private WriteBufferManager writeBufferManager_; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptionsInterface.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptionsInterface.java index 7c406eaf8..af9aa179b 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptionsInterface.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DBOptionsInterface.java @@ -174,6 +174,7 @@ public interface DBOptionsInterface { * first db_path (db_name if db_paths is empty). * * @param sstFileManager The SST File Manager for the db. + * @return the instance of the current object. */ T setSstFileManager(SstFileManager sstFileManager); @@ -205,35 +206,9 @@ public interface DBOptionsInterface { InfoLogLevel infoLogLevel(); /** - * Number of open files that can be used by the DB. You may need to - * increase this if your database has a large working set. Value -1 means - * files opened are always kept open. 
You can estimate number of files based - * on {@code target_file_size_base} and {@code target_file_size_multiplier} - * for level-based compaction. For universal-style compaction, you can usually - * set it to -1. - * Default: 5000 - * - * @param maxOpenFiles the maximum number of open files. - * @return the instance of the current object. - */ - T setMaxOpenFiles(int maxOpenFiles); - - /** - * Number of open files that can be used by the DB. You may need to - * increase this if your database has a large working set. Value -1 means - * files opened are always kept open. You can estimate number of files based - * on {@code target_file_size_base} and {@code target_file_size_multiplier} - * for level-based compaction. For universal-style compaction, you can usually - * set it to -1. - * - * @return the maximum number of open files. - */ - int maxOpenFiles(); - - /** - * If {@link #maxOpenFiles()} is -1, DB will open all files on DB::Open(). You - * can use this option to increase the number of threads used to open the - * files. + * If {@link MutableDBOptionsInterface#maxOpenFiles()} is -1, DB will open + * all files on DB::Open(). You can use this option to increase the number + * of threads used to open the files. * * Default: 16 * @@ -245,9 +220,9 @@ public interface DBOptionsInterface { T setMaxFileOpeningThreads(int maxFileOpeningThreads); /** - * If {@link #maxOpenFiles()} is -1, DB will open all files on DB::Open(). You - * can use this option to increase the number of threads used to open the - * files. + * If {@link MutableDBOptionsInterface#maxOpenFiles()} is -1, DB will open all + * files on DB::Open(). You can use this option to increase the number of + * threads used to open the files. * * Default: 16 * @@ -255,36 +230,6 @@ public interface DBOptionsInterface { */ int maxFileOpeningThreads(); - /** - *
<p>Once write-ahead logs exceed this size, we will start forcing the - * flush of column families whose memtables are backed by the oldest live - * WAL file (i.e. the ones that are causing all the space amplification). - * </p> - * <p> - * If set to 0 (default), we will dynamically choose the WAL size limit to - * be [sum of all write_buffer_size * max_write_buffer_number] * 2 - * </p> - * <p> - * This option takes effect only when there are more than one column family as - * otherwise the wal size is dictated by the write_buffer_size. - * </p> - * <p> - * Default: 0 - * </p> - * - * @param maxTotalWalSize max total wal size. - * @return the instance of the current object. - */ - T setMaxTotalWalSize(long maxTotalWalSize); - - /** - * <p>Returns the max total wal size. Once write-ahead logs exceed this size, - * we will start forcing the flush of column families whose memtables are - * backed by the oldest live WAL file (i.e. the ones that are causing all - * the space amplification).</p> - * - * <p>If set to 0 (default), we will dynamically choose the WAL size limit - * to be [sum of all write_buffer_size * max_write_buffer_number] * 2 - * </p> - * - * @return max total wal size - */ - long maxTotalWalSize(); - /** * <p>
    Sets the statistics object which collects metrics about database operations. * Statistics objects should not be shared between DB instances as @@ -465,59 +410,6 @@ public interface DBOptionsInterface { */ long deleteObsoleteFilesPeriodMicros(); - /** - * Suggested number of concurrent background compaction jobs, submitted to - * the default LOW priority thread pool. - * Default: 1 - * - * @param baseBackgroundCompactions Suggested number of background compaction - * jobs - * - * @deprecated Use {@link #setMaxBackgroundJobs(int)} - */ - void setBaseBackgroundCompactions(int baseBackgroundCompactions); - - /** - * Suggested number of concurrent background compaction jobs, submitted to - * the default LOW priority thread pool. - * Default: 1 - * - * @return Suggested number of background compaction jobs - */ - int baseBackgroundCompactions(); - - /** - * Specifies the maximum number of concurrent background compaction jobs, - * submitted to the default LOW priority thread pool. - * If you're increasing this, also consider increasing number of threads in - * LOW priority thread pool. For more information, see - * Default: 1 - * - * @param maxBackgroundCompactions the maximum number of background - * compaction jobs. - * @return the instance of the current object. - * - * @see RocksEnv#setBackgroundThreads(int) - * @see RocksEnv#setBackgroundThreads(int, int) - * @see #maxBackgroundFlushes() - */ - T setMaxBackgroundCompactions(int maxBackgroundCompactions); - - /** - * Returns the maximum number of concurrent background compaction jobs, - * submitted to the default LOW priority thread pool. - * When increasing this number, we may also want to consider increasing - * number of threads in LOW priority thread pool. - * Default: 1 - * - * @return the maximum number of concurrent background compaction jobs. - * @see RocksEnv#setBackgroundThreads(int) - * @see RocksEnv#setBackgroundThreads(int, int) - * - * @deprecated Use {@link #setMaxBackgroundJobs(int)} - */ - int maxBackgroundCompactions(); - /** * This value represents the maximum number of threads that will * concurrently perform a compaction job by breaking it into multiple, @@ -526,8 +418,10 @@ public interface DBOptionsInterface { * * @param maxSubcompactions The maximum number of threads that will * concurrently perform a compaction job + * + * @return the instance of the current object. */ - void setMaxSubcompactions(int maxSubcompactions); + T setMaxSubcompactions(int maxSubcompactions); /** * This value represents the maximum number of threads that will @@ -550,11 +444,12 @@ public interface DBOptionsInterface { * @return the instance of the current object. * * @see RocksEnv#setBackgroundThreads(int) - * @see RocksEnv#setBackgroundThreads(int, int) - * @see #maxBackgroundCompactions() + * @see RocksEnv#setBackgroundThreads(int, Priority) + * @see MutableDBOptionsInterface#maxBackgroundCompactions() * - * @deprecated Use {@link #setMaxBackgroundJobs(int)} + * @deprecated Use {@link MutableDBOptionsInterface#setMaxBackgroundJobs(int)} */ + @Deprecated T setMaxBackgroundFlushes(int maxBackgroundFlushes); /** @@ -565,29 +460,11 @@ public interface DBOptionsInterface { * * @return the maximum number of concurrent background flush jobs. * @see RocksEnv#setBackgroundThreads(int) - * @see RocksEnv#setBackgroundThreads(int, int) + * @see RocksEnv#setBackgroundThreads(int, Priority) */ + @Deprecated int maxBackgroundFlushes(); - /** - * Specifies the maximum number of concurrent background jobs (both flushes - * and compactions combined). 
- * Default: 2 - * - * @param maxBackgroundJobs number of max concurrent background jobs - * @return the instance of the current object. - */ - T setMaxBackgroundJobs(int maxBackgroundJobs); - - /** - * Returns the maximum number of concurrent background jobs (both flushes - * and compactions combined). - * Default: 2 - * - * @return the maximum number of concurrent background jobs. - */ - int maxBackgroundJobs(); - /** * Specifies the maximum size of a info log file. If the current log file * is larger than `max_log_file_size`, a new info log file will @@ -937,23 +814,6 @@ public interface DBOptionsInterface { */ boolean isFdCloseOnExec(); - /** - * if not zero, dump rocksdb.stats to LOG every stats_dump_period_sec - * Default: 600 (10 minutes) - * - * @param statsDumpPeriodSec time interval in seconds. - * @return the instance of the current object. - */ - T setStatsDumpPeriodSec(int statsDumpPeriodSec); - - /** - * If not zero, dump rocksdb.stats to LOG every stats_dump_period_sec - * Default: 600 (10 minutes) - * - * @return time interval in seconds. - */ - int statsDumpPeriodSec(); - /** * If set true, will hint the underlying file system that the file * access pattern is random, when a sst file is opened. @@ -991,6 +851,28 @@ public interface DBOptionsInterface { */ T setDbWriteBufferSize(long dbWriteBufferSize); + /** + * Use passed {@link WriteBufferManager} to control memory usage across + * multiple column families and/or DB instances. + * + * Check + * https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager + * for more details on when to use it + * + * @param writeBufferManager The WriteBufferManager to use + * @return the reference of the current options. + */ + T setWriteBufferManager(final WriteBufferManager writeBufferManager); + + /** + * Reference to {@link WriteBufferManager} used by it.
    + * + * Default: null (Disabled) + * + * @return a reference to WriteBufferManager + */ + WriteBufferManager writeBufferManager(); + /** * Amount of data to build up in memtables across all column * families before writing to disk. @@ -1066,36 +948,6 @@ public interface DBOptionsInterface { */ boolean newTableReaderForCompactionInputs(); - /** - * If non-zero, we perform bigger reads when doing compaction. If you're - * running RocksDB on spinning disks, you should set this to at least 2MB. - * - * That way RocksDB's compaction is doing sequential instead of random reads. - * When non-zero, we also force {@link #newTableReaderForCompactionInputs()} - * to true. - * - * Default: 0 - * - * @param compactionReadaheadSize The compaction read-ahead size - * - * @return the reference to the current options. - */ - T setCompactionReadaheadSize(final long compactionReadaheadSize); - - /** - * If non-zero, we perform bigger reads when doing compaction. If you're - * running RocksDB on spinning disks, you should set this to at least 2MB. - * - * That way RocksDB's compaction is doing sequential instead of random reads. - * When non-zero, we also force {@link #newTableReaderForCompactionInputs()} - * to true. - * - * Default: 0 - * - * @return The compaction read-ahead size - */ - long compactionReadaheadSize(); - /** * This is a maximum buffer size that is used by WinMmapReadableFile in * unbuffered disk I/O mode. We need to maintain an aligned buffer for @@ -1103,7 +955,8 @@ public interface DBOptionsInterface { * for bigger requests allocate one shot buffers. In unbuffered mode we * always bypass read-ahead buffer at ReadaheadRandomAccessFile * When read-ahead is required we then make use of - * {@link #compactionReadaheadSize()} value and always try to read ahead. + * {@link MutableDBOptionsInterface#compactionReadaheadSize()} value and + * always try to read ahead. * With read-ahead we always pre-allocate buffer to the size instead of * growing it up to a limit. * @@ -1128,9 +981,9 @@ public interface DBOptionsInterface { * for bigger requests allocate one shot buffers. In unbuffered mode we * always bypass read-ahead buffer at ReadaheadRandomAccessFile * When read-ahead is required we then make use of - * {@link #compactionReadaheadSize()} value and always try to read ahead. - * With read-ahead we always pre-allocate buffer to the size instead of - * growing it up to a limit. + * {@link MutableDBOptionsInterface#compactionReadaheadSize()} value and + * always try to read ahead. With read-ahead we always pre-allocate buffer + * to the size instead of growing it up to a limit. * * This option is currently honored only on Windows * @@ -1143,30 +996,6 @@ public interface DBOptionsInterface { */ long randomAccessMaxBufferSize(); - /** - * This is the maximum buffer size that is used by WritableFileWriter. - * On Windows, we need to maintain an aligned buffer for writes. - * We allow the buffer to grow until it's size hits the limit. - * - * Default: 1024 * 1024 (1 MB) - * - * @param writableFileMaxBufferSize the maximum buffer size - * - * @return the reference to the current options. - */ - T setWritableFileMaxBufferSize(long writableFileMaxBufferSize); - - /** - * This is the maximum buffer size that is used by WritableFileWriter. - * On Windows, we need to maintain an aligned buffer for writes. - * We allow the buffer to grow until it's size hits the limit. 
- * - * Default: 1024 * 1024 (1 MB) - * - * @return the maximum buffer size - */ - long writableFileMaxBufferSize(); - /** * Use adaptive mutex, which spins in the user space before resorting * to kernel. This could reduce context switch when the mutex is not * heavily contended. But if the mutex is hot, we could end up wasting spin * time. * Default: false * * @param useAdaptiveMutex true if adaptive mutex is used. * @return the instance of the current object. */ @@ -1190,45 +1019,24 @@ public interface DBOptionsInterface { */ boolean useAdaptiveMutex(); - /** - * Allows OS to incrementally sync files to disk while they are being - * written, asynchronously, in the background. - * Issue one request for every bytes_per_sync written. 0 turns it off. - * Default: 0 - * - * @param bytesPerSync size in bytes - * @return the instance of the current object. - */ - T setBytesPerSync(long bytesPerSync); - - /** - * Allows OS to incrementally sync files to disk while they are being - * written, asynchronously, in the background. - * Issue one request for every bytes_per_sync written. 0 turns it off. - * Default: 0 - * - * @return size in bytes - */ - long bytesPerSync(); - - /** - * Same as {@link #setBytesPerSync(long)} , but applies to WAL files - * - * Default: 0, turned off - * - * @param walBytesPerSync size in bytes - * @return the instance of the current object. - */ - T setWalBytesPerSync(long walBytesPerSync); - - /** - * Same as {@link #bytesPerSync()} , but applies to WAL files - * - * Default: 0, turned off - * - * @return size in bytes - */ - long walBytesPerSync(); + //TODO(AR) NOW +// /** +// * Sets the {@link EventListener}s whose callback functions +// * will be called when specific RocksDB event happens. +// * +// * @param listeners the listeners who should be notified on various events. +// * +// * @return the instance of the current object. +// */ +// T setListeners(final List listeners); +// +// /** +// * Gets the {@link EventListener}s whose callback functions +// * will be called when specific RocksDB event happens. +// * +// * @return a collection of Event listeners. +// */ +// Collection listeners(); /** * If true, then the status of the threads involved in this DB will @@ -1253,40 +1061,33 @@ public interface DBOptionsInterface { boolean enableThreadTracking(); /** - * The limited write rate to DB if - * {@link ColumnFamilyOptions#softPendingCompactionBytesLimit()} or - * {@link ColumnFamilyOptions#level0SlowdownWritesTrigger()} is triggered, - * or we are writing to the last mem table allowed and we allow more than 3 - * mem tables. It is calculated using size of user write requests before - * compression. RocksDB may decide to slow down more if the compaction still - * gets behind further. + * By default, a single write thread queue is maintained. The thread that gets + * to the head of the queue becomes the write batch group leader and is responsible + * for writing to the WAL and memtable for the batch group. * - * Unit: bytes per second. + * If {@link #enablePipelinedWrite()} is true, a separate write thread queue is + * maintained for the WAL write and the memtable write. A write thread first enters the WAL + * writer queue and then the memtable writer queue. A pending thread on the WAL + * writer queue thus only has to wait for previous writers to finish their + * WAL writing, but not their memtable writing. Enabling the feature may improve + * write throughput and reduce latency of the prepare phase of two-phase + * commit. * - * Default: 16MB/s + * Default: false * - * @param delayedWriteRate the rate in bytes per second + * @param enablePipelinedWrite true to enable pipelined writes * * @return the reference to the current options.
*/ - T setDelayedWriteRate(long delayedWriteRate); + T setEnablePipelinedWrite(final boolean enablePipelinedWrite); /** - * The limited write rate to DB if - * {@link ColumnFamilyOptions#softPendingCompactionBytesLimit()} or - * {@link ColumnFamilyOptions#level0SlowdownWritesTrigger()} is triggered, - * or we are writing to the last mem table allowed and we allow more than 3 - * mem tables. It is calculated using size of user write requests before - * compression. RocksDB may decide to slow down more if the compaction still - * gets behind further. - * - * Unit: bytes per second. - * - * Default: 16MB/s + * Returns true if pipelined writes are enabled. + * See {@link #setEnablePipelinedWrite(boolean)}. * - * @return the rate in bytes per second + * @return true if pipelined writes are enabled, false otherwise. */ - long delayedWriteRate(); + boolean enablePipelinedWrite(); /** * If true, allow multi-writers to update mem tables in parallel. @@ -1488,6 +1289,27 @@ public interface DBOptionsInterface { */ Cache rowCache(); + /** + * A filter object supplied to be invoked while processing write-ahead-logs + * (WALs) during recovery. The filter provides a way to inspect log + * records, ignoring a particular record or skipping replay. + * The filter is invoked at startup and is currently invoked from a + * single thread. + * + * @param walFilter the filter for processing WALs during recovery. + * + * @return the reference to the current options. + */ + T setWalFilter(final AbstractWalFilter walFilter); + + /** + * Gets the filter for processing WALs during recovery. + * See {@link #setWalFilter(AbstractWalFilter)}. + * + * @return the filter used for processing WALs during recovery. + */ + WalFilter walFilter(); + /** * If true, then DB::Open / CreateColumnFamily / DropColumnFamily * / SetOptions will fail if options file is not detected or properly @@ -1566,35 +1388,126 @@ public interface DBOptionsInterface { boolean avoidFlushDuringRecovery(); /** - * By default RocksDB will flush all memtables on DB close if there are - * unpersisted data (i.e. with WAL disabled) The flush can be skip to speedup - * DB close. Unpersisted data WILL BE LOST. + * Set this option to true during creation of the database if you want + * to be able to ingest behind (call IngestExternalFile() skipping keys + * that already exist, rather than overwriting matching keys). + * Setting this option to true will affect 2 things: + * 1) Disable some internal optimizations around SST file compression. + * 2) Reserve the bottom-most level for ingested files only. + * Note that num_levels should be >= 3 if this option is turned on. * * DEFAULT: false * - * Dynamically changeable through - * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)} - * API. + * @param allowIngestBehind true to allow ingest behind, false to disallow. + * + * @return the reference to the current options. + */ + T setAllowIngestBehind(final boolean allowIngestBehind); + + /** + * Returns true if ingest behind is allowed. + * See {@link #setAllowIngestBehind(boolean)}. + * + * @return true if ingest behind is allowed, false otherwise. + */ + boolean allowIngestBehind(); + + /** + * Needed to support differential snapshots. + * If set to true then the DB will only process deletes with a sequence number + * less than what was set by SetPreserveDeletesSequenceNumber(uint64_t ts). + * Clients are responsible for periodically calling SetPreserveDeletesSequenceNumber + * to advance the cutoff time.
If SetPreserveDeletesSequenceNumber is never called and preserve_deletes + * is set to true, NO deletes will ever be processed. + * At the moment this only keeps normal deletes, SingleDeletes will + * not be preserved. + * + * DEFAULT: false + * + * @param preserveDeletes true to preserve deletes. + * + * @return the reference to the current options. + */ + T setPreserveDeletes(final boolean preserveDeletes); + + /** + * Returns true if deletes are preserved. + * See {@link #setPreserveDeletes(boolean)}. + * + * @return true if deletes are preserved, false otherwise. + */ + boolean preserveDeletes(); + + /** + * If enabled, it uses two queues for writes, one for the ones with + * disable_memtable and one for the ones that also write to memtable. This + * allows the memtable writes not to lag behind other writes. It can be used + * to optimize MySQL 2PC in which only the commits, which are serial, write to + * memtable. + * + * DEFAULT: false * - * @param avoidFlushDuringShutdown true if we should avoid flush during - * shutdown + * @param twoWriteQueues true to enable two write queues, false otherwise. * * @return the reference to the current options. */ - T setAvoidFlushDuringShutdown(boolean avoidFlushDuringShutdown); + T setTwoWriteQueues(final boolean twoWriteQueues); + + /** + * Returns true if two write queues are enabled. + * + * @return true if two write queues are enabled, false otherwise. + */ + boolean twoWriteQueues(); /** - * By default RocksDB will flush all memtables on DB close if there are - * unpersisted data (i.e. with WAL disabled) The flush can be skip to speedup - * DB close. Unpersisted data WILL BE LOST. + * If true, the WAL is not flushed automatically after each write. Instead it + * relies on manual invocation of FlushWAL to write the WAL buffer to its + * file. * * DEFAULT: false * - * Dynamically changeable through - * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)} - * API. + * @param manualWalFlush true to disable automatic WAL flushing, + * false otherwise. + * + * @return the reference to the current options. + */ + T setManualWalFlush(final boolean manualWalFlush); + + /** + * Returns true if automatic WAL flushing is disabled. + * See {@link #setManualWalFlush(boolean)}. + * + * @return true if automatic WAL flushing is disabled, false otherwise. + */ + boolean manualWalFlush(); + + /** + * If true, RocksDB supports flushing multiple column families and committing + * their results atomically to MANIFEST. Note that it is not + * necessary to set atomic_flush to true if WAL is always enabled since WAL + * allows the database to be restored to the last persistent state in WAL. + * This option is useful when there are column families with writes NOT + * protected by WAL. + * For a manual flush, the application has to specify which column families to + * flush atomically in {@link RocksDB#flush(FlushOptions, List)}. + * For an auto-triggered flush, RocksDB atomically flushes ALL column families. + * + * Currently, any WAL-enabled writes after atomic flush may be replayed + * independently if the process crashes later and tries to recover. + * + * @param atomicFlush true to enable atomic flush of multiple column families. + * + * @return the reference to the current options. + */ + T setAtomicFlush(final boolean atomicFlush); + + /** + * Determine if atomic flush of multiple column families is enabled. + * + * See {@link #setAtomicFlush(boolean)}. + * - * @return true if we should avoid flush during shutdown + * @return true if atomic flush is enabled.
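To show how the new write-path options compose, here is a hedged sketch of hypothetical caller code (with manual WAL flushing the application is assumed to invoke `RocksDB#flushWal` itself; `setTwoWriteQueues`, aimed at serial-commit 2PC workloads, is left at its default here):

```java
import org.rocksdb.DBOptions;

public class WritePathOptions {
  public static DBOptions configure() {
    return new DBOptions()
        .setCreateIfMissing(true)
        // Separate WAL and memtable writer queues (pipelined writes).
        .setEnablePipelinedWrite(true)
        // Buffer WAL writes; the application must invoke FlushWAL
        // (RocksDB#flushWal in this API) to persist the WAL buffer.
        .setManualWalFlush(true)
        // Commit flushes of multiple column families atomically to MANIFEST.
        .setAtomicFlush(true);
  }
}
```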
*/ - boolean avoidFlushDuringShutdown(); + boolean atomicFlush(); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DataBlockIndexType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DataBlockIndexType.java new file mode 100644 index 000000000..513e5b429 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/DataBlockIndexType.java @@ -0,0 +1,32 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + + +/** + * DataBlockIndexType used in conjunction with BlockBasedTable. + */ +public enum DataBlockIndexType { + /** + * traditional block type + */ + kDataBlockBinarySearch((byte)0x0), + + /** + * additional hash index + */ + kDataBlockBinaryAndHash((byte)0x1); + + private final byte value; + + DataBlockIndexType(final byte value) { + this.value = value; + } + + byte getValue() { + return value; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Env.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Env.java index a46f06178..d7658f239 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Env.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Env.java @@ -5,12 +5,23 @@ package org.rocksdb; +import java.util.Arrays; +import java.util.List; + /** * Base class for all Env implementations in RocksDB. */ public abstract class Env extends RocksObject { - public static final int FLUSH_POOL = 0; - public static final int COMPACTION_POOL = 1; + + private static final Env DEFAULT_ENV = new RocksEnv(getDefaultEnvInternal()); + static { + /** + * The Ownership of the Default Env belongs to C++ + * and so we disown the native handle here so that + * we cannot accidentally free it from Java. + */ + DEFAULT_ENV.disOwnNativeHandle(); + } /** *

    Returns the default environment suitable for the current operating @@ -18,13 +29,13 @@ public abstract class Env extends RocksObject { * *

The result of {@code getDefault()} is a singleton whose ownership * belongs to rocksdb c++. As a result, the returned RocksEnv will not - * have the ownership of its c++ resource, and calling its dispose() + * have the ownership of its c++ resource, and calling its dispose()/close() * will be a no-op.

    * * @return the default {@link org.rocksdb.RocksEnv} instance. */ public static Env getDefault() { - return default_env_; + return DEFAULT_ENV; } /** @@ -32,27 +43,36 @@ public abstract class Env extends RocksObject { * for this environment.

    *

    Default number: 1

    * - * @param num the number of threads + * @param number the number of threads * * @return current {@link RocksEnv} instance. */ - public Env setBackgroundThreads(final int num) { - return setBackgroundThreads(num, FLUSH_POOL); + public Env setBackgroundThreads(final int number) { + return setBackgroundThreads(number, Priority.LOW); + } + + /** + *

    Gets the number of background worker threads of the pool + * for this environment.

+ * + * @return the number of threads in the pool with the given priority. + */ + public int getBackgroundThreads(final Priority priority) { return getBackgroundThreads(nativeHandle_, priority.getValue()); } /** *

    Sets the number of background worker threads of the specified thread * pool for this environment.

    * - * @param num the number of threads - * @param poolID the id to specified a thread pool. Should be either - * FLUSH_POOL or COMPACTION_POOL. + * @param number the number of threads + * @param priority the priority id of a specified thread pool. * *

    Default number: 1

    * @return current {@link RocksEnv} instance. */ - public Env setBackgroundThreads(final int num, final int poolID) { - setBackgroundThreads(nativeHandle_, num, poolID); + public Env setBackgroundThreads(final int number, final Priority priority) { + setBackgroundThreads(nativeHandle_, number, priority.getValue()); return this; } @@ -60,33 +80,75 @@ public abstract class Env extends RocksObject { *

    Returns the length of the queue associated with the specified * thread pool.
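A small sketch of the Priority-based pool API that replaces the removed FLUSH_POOL / COMPACTION_POOL integer ids (hypothetical usage; the `Priority` enum with `LOW` and `HIGH` values is assumed from elsewhere in this patch):

```java
import org.rocksdb.Env;
import org.rocksdb.Priority;
import org.rocksdb.RocksDB;

public class EnvThreadPools {
  public static void main(final String[] args) {
    RocksDB.loadLibrary();
    final Env env = Env.getDefault();
    env.setBackgroundThreads(4, Priority.LOW)    // compaction pool
        .setBackgroundThreads(2, Priority.HIGH); // flush pool
    System.out.println(
        "LOW pool queue length: " + env.getThreadPoolQueueLen(Priority.LOW));
    // Env.getDefault() is owned by the C++ side; it need not be closed.
  }
}
```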

    * - * @param poolID the id to specified a thread pool. Should be either - * FLUSH_POOL or COMPACTION_POOL. + * @param priority the priority id of a specified thread pool. * * @return the thread pool queue length. */ - public int getThreadPoolQueueLen(final int poolID) { - return getThreadPoolQueueLen(nativeHandle_, poolID); + public int getThreadPoolQueueLen(final Priority priority) { + return getThreadPoolQueueLen(nativeHandle_, priority.getValue()); } + /** + * Enlarge number of background worker threads of a specific thread pool + * for this environment if it is smaller than specified. 'LOW' is the default + * pool. + * + * @param number the number of threads. + * + * @return current {@link RocksEnv} instance. + */ + public Env incBackgroundThreadsIfNeeded(final int number, + final Priority priority) { + incBackgroundThreadsIfNeeded(nativeHandle_, number, priority.getValue()); + return this; + } - protected Env(final long nativeHandle) { - super(nativeHandle); + /** + * Lower IO priority for threads from the specified pool. + * + * @param priority the priority id of a specified thread pool. + */ + public Env lowerThreadPoolIOPriority(final Priority priority) { + lowerThreadPoolIOPriority(nativeHandle_, priority.getValue()); + return this; } - static { - default_env_ = new RocksEnv(getDefaultEnvInternal()); + /** + * Lower CPU priority for threads from the specified pool. + * + * @param priority the priority id of a specified thread pool. + */ + public Env lowerThreadPoolCPUPriority(final Priority priority) { + lowerThreadPoolCPUPriority(nativeHandle_, priority.getValue()); + return this; } /** - *

    The static default Env. The ownership of its native handle - * belongs to rocksdb c++ and is not able to be released on the Java - * side.

    + * Returns the status of all threads that belong to the current Env. + * + * @return the status of all threads belong to this env. */ - static Env default_env_; + public List getThreadList() throws RocksDBException { + return Arrays.asList(getThreadList(nativeHandle_)); + } + + Env(final long nativeHandle) { + super(nativeHandle); + } private static native long getDefaultEnvInternal(); private native void setBackgroundThreads( - long handle, int num, int priority); - private native int getThreadPoolQueueLen(long handle, int poolID); + final long handle, final int number, final byte priority); + private native int getBackgroundThreads(final long handle, + final byte priority); + private native int getThreadPoolQueueLen(final long handle, + final byte priority); + private native void incBackgroundThreadsIfNeeded(final long handle, + final int number, final byte priority); + private native void lowerThreadPoolIOPriority(final long handle, + final byte priority); + private native void lowerThreadPoolCPUPriority(final long handle, + final byte priority); + private native ThreadStatus[] getThreadList(final long handle) + throws RocksDBException; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/EnvOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/EnvOptions.java index 2bca0355e..6baddb310 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/EnvOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/EnvOptions.java @@ -5,203 +5,362 @@ package org.rocksdb; +/** + * Options while opening a file to read/write + */ public class EnvOptions extends RocksObject { static { RocksDB.loadLibrary(); } + /** + * Construct with default Options + */ public EnvOptions() { super(newEnvOptions()); } - public EnvOptions setUseOsBuffer(final boolean useOsBuffer) { - setUseOsBuffer(nativeHandle_, useOsBuffer); - return this; - } - - public boolean useOsBuffer() { - assert(isOwningHandle()); - return useOsBuffer(nativeHandle_); + /** + * Construct from {@link DBOptions}. + * + * @param dbOptions the database options. + */ + public EnvOptions(final DBOptions dbOptions) { + super(newEnvOptions(dbOptions.nativeHandle_)); } + /** + * Enable/Disable memory mapped reads. + * + * Default: false + * + * @param useMmapReads true to enable memory mapped reads, false to disable. + * + * @return the reference to these options. + */ public EnvOptions setUseMmapReads(final boolean useMmapReads) { setUseMmapReads(nativeHandle_, useMmapReads); return this; } + /** + * Determine if memory mapped reads are in-use. + * + * @return true if memory mapped reads are in-use, false otherwise. + */ public boolean useMmapReads() { assert(isOwningHandle()); return useMmapReads(nativeHandle_); } + /** + * Enable/Disable memory mapped Writes. + * + * Default: true + * + * @param useMmapWrites true to enable memory mapped writes, false to disable. + * + * @return the reference to these options. + */ public EnvOptions setUseMmapWrites(final boolean useMmapWrites) { setUseMmapWrites(nativeHandle_, useMmapWrites); return this; } + /** + * Determine if memory mapped writes are in-use. + * + * @return true if memory mapped writes are in-use, false otherwise. + */ public boolean useMmapWrites() { assert(isOwningHandle()); return useMmapWrites(nativeHandle_); } + /** + * Enable/Disable direct reads, i.e. {@code O_DIRECT}. + * + * Default: false + * + * @param useDirectReads true to enable direct reads, false to disable. + * + * @return the reference to these options. 
+ */ public EnvOptions setUseDirectReads(final boolean useDirectReads) { setUseDirectReads(nativeHandle_, useDirectReads); return this; } + /** + * Determine if direct reads are in-use. + * + * @return true if direct reads are in-use, false otherwise. + */ public boolean useDirectReads() { assert(isOwningHandle()); return useDirectReads(nativeHandle_); } + /** + * Enable/Disable direct writes, i.e. {@code O_DIRECT}. + * + * Default: false + * + * @param useDirectWrites true to enable direct writes, false to disable. + * + * @return the reference to these options. + */ public EnvOptions setUseDirectWrites(final boolean useDirectWrites) { setUseDirectWrites(nativeHandle_, useDirectWrites); return this; } + /** + * Determine if direct writes are in-use. + * + * @return true if direct writes are in-use, false otherwise. + */ public boolean useDirectWrites() { assert(isOwningHandle()); return useDirectWrites(nativeHandle_); } + /** + * Enable/Disable fallocate calls. + * + * Default: true + * + * If false, {@code fallocate()} calls are bypassed. + * + * @param allowFallocate true to enable fallocate calls, false to disable. + * + * @return the reference to these options. + */ public EnvOptions setAllowFallocate(final boolean allowFallocate) { setAllowFallocate(nativeHandle_, allowFallocate); return this; } + /** + * Determine if fallocate calls are used. + * + * @return true if fallocate calls are used, false otherwise. + */ public boolean allowFallocate() { assert(isOwningHandle()); return allowFallocate(nativeHandle_); } + /** + * Enable/Disable the {@code FD_CLOEXEC} bit when opening file descriptors. + * + * Default: true + * + * @param setFdCloexec true to enable the {@code FD_CLOEXEC} bit, + * false to disable. + * + * @return the reference to these options. + */ public EnvOptions setSetFdCloexec(final boolean setFdCloexec) { setSetFdCloexec(nativeHandle_, setFdCloexec); return this; } + /** + * Determine if the {@code FD_CLOEXEC} bit is set when opening file + * descriptors. + * + * @return true if the {@code FD_CLOEXEC} bit is enabled, false otherwise. + */ public boolean setFdCloexec() { assert(isOwningHandle()); return setFdCloexec(nativeHandle_); } + /** + * Allows OS to incrementally sync files to disk while they are being + * written, in the background. Issue one request for every + * {@code bytesPerSync} written. + * + * Default: 0 + * + * @param bytesPerSync 0 to disable, otherwise the number of bytes. + * + * @return the reference to these options. + */ public EnvOptions setBytesPerSync(final long bytesPerSync) { setBytesPerSync(nativeHandle_, bytesPerSync); return this; } + /** + * Get the number of incremental bytes per sync written in the background. + * + * @return 0 if disabled, otherwise the number of bytes. + */ public long bytesPerSync() { assert(isOwningHandle()); return bytesPerSync(nativeHandle_); } - public EnvOptions setFallocateWithKeepSize(final boolean fallocateWithKeepSize) { + /** + * If true, we will preallocate the file with {@code FALLOC_FL_KEEP_SIZE} + * flag, which means that file size won't change as part of preallocation. + * If false, preallocation will also change the file size. This option will + * improve the performance in workloads where you sync the data on every + * write. By default, we set it to true for MANIFEST writes and false for + * WAL writes. + * + * @param fallocateWithKeepSize true to preallocate, false otherwise. + * + * @return the reference to these options.
+ */ + public EnvOptions setFallocateWithKeepSize( + final boolean fallocateWithKeepSize) { setFallocateWithKeepSize(nativeHandle_, fallocateWithKeepSize); return this; } + /** + * Determine if file is preallocated. + * + * @return true if the file is preallocated, false otherwise. + */ public boolean fallocateWithKeepSize() { assert(isOwningHandle()); return fallocateWithKeepSize(nativeHandle_); } - public EnvOptions setCompactionReadaheadSize(final long compactionReadaheadSize) { + /** + * See {@link DBOptions#setCompactionReadaheadSize(long)}. + * + * @param compactionReadaheadSize the compaction read-ahead size. + * + * @return the reference to these options. + */ + public EnvOptions setCompactionReadaheadSize( + final long compactionReadaheadSize) { setCompactionReadaheadSize(nativeHandle_, compactionReadaheadSize); return this; } + /** + * See {@link DBOptions#compactionReadaheadSize()}. + * + * @return the compaction read-ahead size. + */ public long compactionReadaheadSize() { assert(isOwningHandle()); return compactionReadaheadSize(nativeHandle_); } - public EnvOptions setRandomAccessMaxBufferSize(final long randomAccessMaxBufferSize) { + /** + * See {@link DBOptions#setRandomAccessMaxBufferSize(long)}. + * + * @param randomAccessMaxBufferSize the max buffer size for random access. + * + * @return the reference to these options. + */ + public EnvOptions setRandomAccessMaxBufferSize( + final long randomAccessMaxBufferSize) { setRandomAccessMaxBufferSize(nativeHandle_, randomAccessMaxBufferSize); return this; } + /** + * See {@link DBOptions#randomAccessMaxBufferSize()}. + * + * @return the max buffer size for random access. + */ public long randomAccessMaxBufferSize() { assert(isOwningHandle()); return randomAccessMaxBufferSize(nativeHandle_); } - public EnvOptions setWritableFileMaxBufferSize(final long writableFileMaxBufferSize) { + /** + * See {@link DBOptions#setWritableFileMaxBufferSize(long)}. + * + * @param writableFileMaxBufferSize the max buffer size. + * + * @return the reference to these options. + */ + public EnvOptions setWritableFileMaxBufferSize( + final long writableFileMaxBufferSize) { setWritableFileMaxBufferSize(nativeHandle_, writableFileMaxBufferSize); return this; } + /** + * See {@link DBOptions#writableFileMaxBufferSize()}. + * + * @return the max buffer size. + */ public long writableFileMaxBufferSize() { assert(isOwningHandle()); return writableFileMaxBufferSize(nativeHandle_); } + /** + * Set the write rate limiter for flush and compaction. + * + * @param rateLimiter the rate limiter. + * + * @return the reference to these options. + */ public EnvOptions setRateLimiter(final RateLimiter rateLimiter) { this.rateLimiter = rateLimiter; setRateLimiter(nativeHandle_, rateLimiter.nativeHandle_); return this; } + /** + * Get the write rate limiter for flush and compaction. + * + * @return the rate limiter. 
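A hedged sketch tying several of these `EnvOptions` setters together (illustrative values only; the `EnvOptions(DBOptions)` constructor added above and the `RateLimiter(long)` constructor are assumed):

```java
import org.rocksdb.DBOptions;
import org.rocksdb.EnvOptions;
import org.rocksdb.RateLimiter;

public class EnvOptionsSketch {
  public static EnvOptions configure(final DBOptions dbOptions) {
    // 64 MiB/s budget for flush and compaction writes (illustrative).
    final RateLimiter rateLimiter = new RateLimiter(64 * 1024 * 1024);
    return new EnvOptions(dbOptions)  // derive settings from the DB options
        .setUseDirectReads(true)      // O_DIRECT reads
        .setBytesPerSync(1024 * 1024) // sync incrementally in the background
        .setRateLimiter(rateLimiter);
  }
}
```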
+ */ public RateLimiter rateLimiter() { assert(isOwningHandle()); return rateLimiter; } private native static long newEnvOptions(); - + private native static long newEnvOptions(final long dboptions_handle); @Override protected final native void disposeInternal(final long handle); - private native void setUseOsBuffer(final long handle, final boolean useOsBuffer); - - private native boolean useOsBuffer(final long handle); - - private native void setUseMmapReads(final long handle, final boolean useMmapReads); - + private native void setUseMmapReads(final long handle, + final boolean useMmapReads); private native boolean useMmapReads(final long handle); - - private native void setUseMmapWrites(final long handle, final boolean useMmapWrites); - + private native void setUseMmapWrites(final long handle, + final boolean useMmapWrites); private native boolean useMmapWrites(final long handle); - - private native void setUseDirectReads(final long handle, final boolean useDirectReads); - + private native void setUseDirectReads(final long handle, + final boolean useDirectReads); private native boolean useDirectReads(final long handle); - - private native void setUseDirectWrites(final long handle, final boolean useDirectWrites); - + private native void setUseDirectWrites(final long handle, + final boolean useDirectWrites); private native boolean useDirectWrites(final long handle); - - private native void setAllowFallocate(final long handle, final boolean allowFallocate); - + private native void setAllowFallocate(final long handle, + final boolean allowFallocate); private native boolean allowFallocate(final long handle); - - private native void setSetFdCloexec(final long handle, final boolean setFdCloexec); - + private native void setSetFdCloexec(final long handle, + final boolean setFdCloexec); private native boolean setFdCloexec(final long handle); - - private native void setBytesPerSync(final long handle, final long bytesPerSync); - + private native void setBytesPerSync(final long handle, + final long bytesPerSync); private native long bytesPerSync(final long handle); - private native void setFallocateWithKeepSize( final long handle, final boolean fallocateWithKeepSize); - private native boolean fallocateWithKeepSize(final long handle); - private native void setCompactionReadaheadSize( final long handle, final long compactionReadaheadSize); - private native long compactionReadaheadSize(final long handle); - private native void setRandomAccessMaxBufferSize( final long handle, final long randomAccessMaxBufferSize); - private native long randomAccessMaxBufferSize(final long handle); - private native void setWritableFileMaxBufferSize( final long handle, final long writableFileMaxBufferSize); - private native long writableFileMaxBufferSize(final long handle); - - private native void setRateLimiter(final long handle, final long rateLimiterHandle); - + private native void setRateLimiter(final long handle, + final long rateLimiterHandle); private RateLimiter rateLimiter; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Filter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Filter.java index 011be2085..7f490cf59 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Filter.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Filter.java @@ -12,6 +12,7 @@ package org.rocksdb; * number of disk seeks form a handful to a single disk seek per * DB::Get() call. 
*/ +//TODO(AR) should be renamed FilterPolicy public abstract class Filter extends RocksObject { protected Filter(final long nativeHandle) { diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/FlushOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/FlushOptions.java index ce54a528b..760b515fd 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/FlushOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/FlushOptions.java @@ -1,3 +1,8 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + package org.rocksdb; /** @@ -41,9 +46,45 @@ public class FlushOptions extends RocksObject { return waitForFlush(nativeHandle_); } + /** + * Set to true so that the flush proceeds immediately even if it means + * writes will stall for the duration of the flush. + * + * Set to false so that the operation will wait until it's possible to do + * the flush without causing a stall, or until the required flush is performed by + * someone else (foreground call or background thread). + * + * Default: false + * + * @param allowWriteStall true to allow writes to stall for flush, false + * otherwise. + * + * @return instance of current FlushOptions. + */ + public FlushOptions setAllowWriteStall(final boolean allowWriteStall) { + assert(isOwningHandle()); + setAllowWriteStall(nativeHandle_, allowWriteStall); + return this; + } + + /** + * Returns true if writes are allowed to stall for flushes to complete, false + * otherwise. + * + * @return true if writes are allowed to stall for flushes + */ + public boolean allowWriteStall() { + assert(isOwningHandle()); + return allowWriteStall(nativeHandle_); + } + private native static long newFlushOptions(); @Override protected final native void disposeInternal(final long handle); - private native void setWaitForFlush(long handle, - boolean wait); - private native boolean waitForFlush(long handle); + + private native void setWaitForFlush(final long handle, + final boolean wait); + private native boolean waitForFlush(final long handle); + private native void setAllowWriteStall(final long handle, + final boolean allowWriteStall); + private native boolean allowWriteStall(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HdfsEnv.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HdfsEnv.java new file mode 100644 index 000000000..4d8d3bff6 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HdfsEnv.java @@ -0,0 +1,27 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * HDFS environment. + */ +public class HdfsEnv extends Env { + + /** +

Creates a new environment for use with HDFS.

    + * + *

    The caller must delete the result when it is + * no longer needed.

    + * + * @param fsName the HDFS as a string in the form "hdfs://hostname:port/" + */ + public HdfsEnv(final String fsName) { + super(createHdfsEnv(fsName)); + } + + private static native long createHdfsEnv(final String fsName); + @Override protected final native void disposeInternal(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramData.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramData.java index 11798eb59..81d890883 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramData.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramData.java @@ -11,15 +11,30 @@ public class HistogramData { private final double percentile99_; private final double average_; private final double standardDeviation_; + private final double max_; + private final long count_; + private final long sum_; + private final double min_; + + public HistogramData(final double median, final double percentile95, + final double percentile99, final double average, + final double standardDeviation) { + this(median, percentile95, percentile99, average, standardDeviation, 0.0, 0, 0, 0.0); + } public HistogramData(final double median, final double percentile95, final double percentile99, final double average, - final double standardDeviation) { + final double standardDeviation, final double max, final long count, + final long sum, final double min) { median_ = median; percentile95_ = percentile95; percentile99_ = percentile99; average_ = average; standardDeviation_ = standardDeviation; + min_ = min; + max_ = max; + count_ = count; + sum_ = sum; } public double getMedian() { @@ -41,4 +56,20 @@ public class HistogramData { public double getStandardDeviation() { return standardDeviation_; } + + public double getMax() { + return max_; + } + + public long getCount() { + return count_; + } + + public long getSum() { + return sum_; + } + + public double getMin() { + return min_; + } } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramType.java index 2d95f5149..ab97a4d25 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramType.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/HistogramType.java @@ -84,6 +84,82 @@ public enum HistogramType { READ_NUM_MERGE_OPERANDS((byte) 0x1E), + /** + * Time spent flushing memtable to disk. + */ + FLUSH_TIME((byte) 0x20), + + /** + * Size of keys written to BlobDB. + */ + BLOB_DB_KEY_SIZE((byte) 0x21), + + /** + * Size of values written to BlobDB. + */ + BLOB_DB_VALUE_SIZE((byte) 0x22), + + /** + * BlobDB Put/PutWithTTL/PutUntil/Write latency. + */ + BLOB_DB_WRITE_MICROS((byte) 0x23), + + /** + * BlobDB Get lagency. + */ + BLOB_DB_GET_MICROS((byte) 0x24), + + /** + * BlobDB MultiGet latency. + */ + BLOB_DB_MULTIGET_MICROS((byte) 0x25), + + /** + * BlobDB Seek/SeekToFirst/SeekToLast/SeekForPrev latency. + */ + BLOB_DB_SEEK_MICROS((byte) 0x26), + + /** + * BlobDB Next latency. + */ + BLOB_DB_NEXT_MICROS((byte) 0x27), + + /** + * BlobDB Prev latency. + */ + BLOB_DB_PREV_MICROS((byte) 0x28), + + /** + * Blob file write latency. + */ + BLOB_DB_BLOB_FILE_WRITE_MICROS((byte) 0x29), + + /** + * Blob file read latency. + */ + BLOB_DB_BLOB_FILE_READ_MICROS((byte) 0x2A), + + /** + * Blob file sync latency. + */ + BLOB_DB_BLOB_FILE_SYNC_MICROS((byte) 0x2B), + + /** + * BlobDB garbage collection time. + */ + BLOB_DB_GC_MICROS((byte) 0x2C), + + /** + * BlobDB compression time. 
+ */ + BLOB_DB_COMPRESSION_MICROS((byte) 0x2D), + + /** + * BlobDB decompression time. + */ + BLOB_DB_DECOMPRESSION_MICROS((byte) 0x2E), + + // 0x1F for backwards compatibility on current minor version. HISTOGRAM_ENUM_MAX((byte) 0x1F); private final byte value; @@ -92,6 +168,12 @@ public enum HistogramType { this.value = value; } + /** + * @deprecated + * Exposes internal value of native enum mappings. This method will be marked private in the + * next major release. + */ + @Deprecated public byte getValue() { return value; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IndexType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IndexType.java index e0c113d39..04e481465 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IndexType.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IndexType.java @@ -33,7 +33,7 @@ public enum IndexType { return value_; } - private IndexType(byte value) { + IndexType(byte value) { value_ = value; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java index 734369181..a6a308daa 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/IngestExternalFileOptions.java @@ -7,7 +7,8 @@ package org.rocksdb; import java.util.List; /** - * IngestExternalFileOptions is used by {@link RocksDB#ingestExternalFile(ColumnFamilyHandle, List, IngestExternalFileOptions)} + * IngestExternalFileOptions is used by + * {@link RocksDB#ingestExternalFile(ColumnFamilyHandle, List, IngestExternalFileOptions)}. */ public class IngestExternalFileOptions extends RocksObject { @@ -41,9 +42,12 @@ public class IngestExternalFileOptions extends RocksObject { * Can be set to true to move the files instead of copying them. * * @param moveFiles true if files should be moved instead of copied + * + * @return the reference to the current IngestExternalFileOptions. */ - public void setMoveFiles(final boolean moveFiles) { + public IngestExternalFileOptions setMoveFiles(final boolean moveFiles) { setMoveFiles(nativeHandle_, moveFiles); + return this; } /** @@ -61,9 +65,13 @@ public class IngestExternalFileOptions extends RocksObject { * that where created before the file was ingested. * * @param snapshotConsistency true if snapshot consistency is required + * + * @return the reference to the current IngestExternalFileOptions. */ - public void setSnapshotConsistency(final boolean snapshotConsistency) { + public IngestExternalFileOptions setSnapshotConsistency( + final boolean snapshotConsistency) { setSnapshotConsistency(nativeHandle_, snapshotConsistency); + return this; } /** @@ -81,9 +89,13 @@ public class IngestExternalFileOptions extends RocksObject { * will fail if the file key range overlaps with existing keys or tombstones in the DB. * * @param allowGlobalSeqNo true if global seq numbers are required + * + * @return the reference to the current IngestExternalFileOptions. */ - public void setAllowGlobalSeqNo(final boolean allowGlobalSeqNo) { + public IngestExternalFileOptions setAllowGlobalSeqNo( + final boolean allowGlobalSeqNo) { setAllowGlobalSeqNo(nativeHandle_, allowGlobalSeqNo); + return this; } /** @@ -101,15 +113,100 @@ public class IngestExternalFileOptions extends RocksObject { * (memtable flush required), IngestExternalFile will fail. 
* * @param allowBlockingFlush true if blocking flushes are allowed + * + * @return the reference to the current IngestExternalFileOptions. */ - public void setAllowBlockingFlush(final boolean allowBlockingFlush) { + public IngestExternalFileOptions setAllowBlockingFlush( + final boolean allowBlockingFlush) { setAllowBlockingFlush(nativeHandle_, allowBlockingFlush); + return this; + } + + /** + * Returns true if duplicate keys in the file being ingested are + * to be skipped rather than overwriting existing data under that key. + * + * @return true if duplicate keys in the file being ingested are to be + * skipped, false otherwise. + */ + public boolean ingestBehind() { + return ingestBehind(nativeHandle_); + } + + /** + * Set to true if you would like duplicate keys in the file being ingested + * to be skipped rather than overwriting existing data under that key. + * + * Use case: back-filling some historical data into the database without + * overwriting an existing newer version of the data. + * + * This option can only be used if the DB has been running + * with DBOptions#allowIngestBehind() == true since the dawn of time. + * + * All files will be ingested at the bottommost level with seqno=0. + * + * Default: false + * + * @param ingestBehind true if you would like duplicate keys in the file being + * ingested to be skipped. + * + * @return the reference to the current IngestExternalFileOptions. + */ + public IngestExternalFileOptions setIngestBehind(final boolean ingestBehind) { + setIngestBehind(nativeHandle_, ingestBehind); + return this; + } + + /** + * Returns true if the global_seqno is written to a given offset + * in the external SST file for backward compatibility. + * + * See {@link #setWriteGlobalSeqno(boolean)}. + * + * @return true if the global_seqno is written to a given offset, + * false otherwise. + */ + public boolean writeGlobalSeqno() { + return writeGlobalSeqno(nativeHandle_); + } + + /** + * Set to true if you would like to write the global_seqno to a given offset + * in the external SST file for backward compatibility. + * + * Older versions of RocksDB write the global_seqno to a given offset within + * the ingested SST files, and new versions of RocksDB do not. + * + * If you ingest an external SST using a new version of RocksDB and would like + * to be able to downgrade to an older version of RocksDB, you should set + * {@link #writeGlobalSeqno()} to true. + * + * If your service is just starting to use the new RocksDB, we recommend that + * you set this option to false, which brings two benefits: + * 1. No extra random write for global_seqno during ingestion. + * 2. Without writing to the external SST file, it's possible to do checksum verification. + * + * We have a plan to set this option to false by default in the future. + * + * Default: true + * + * @param writeGlobalSeqno true to write the global_seqno to a given offset, + * false otherwise + * + * @return the reference to the current IngestExternalFileOptions.
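A short sketch of the fluent style these setters now permit, including the new ingest-behind and global_seqno options (hypothetical path and handles):

```java
import java.util.Arrays;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class IngestBehindSketch {
  public static void ingest(final RocksDB db, final ColumnFamilyHandle cf)
      throws RocksDBException {
    try (final IngestExternalFileOptions opts = new IngestExternalFileOptions()
             .setMoveFiles(true)        // move rather than copy the SST
             .setIngestBehind(true)     // DB must have allowIngestBehind=true
             .setWriteGlobalSeqno(false)) {
      db.ingestExternalFile(cf, Arrays.asList("/path/to/data.sst"), opts);
    }
  }
}
```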
+ */ + public IngestExternalFileOptions setWriteGlobalSeqno( + final boolean writeGlobalSeqno) { + setWriteGlobalSeqno(nativeHandle_, writeGlobalSeqno); + return this; } private native static long newIngestExternalFileOptions(); private native static long newIngestExternalFileOptions( final boolean moveFiles, final boolean snapshotConsistency, final boolean allowGlobalSeqNo, final boolean allowBlockingFlush); + @Override protected final native void disposeInternal(final long handle); + private native boolean moveFiles(final long handle); private native void setMoveFiles(final long handle, final boolean move_files); private native boolean snapshotConsistency(final long handle); @@ -121,5 +218,10 @@ public class IngestExternalFileOptions extends RocksObject { private native boolean allowBlockingFlush(final long handle); private native void setAllowBlockingFlush(final long handle, final boolean allowBlockingFlush); - @Override protected final native void disposeInternal(final long handle); + private native boolean ingestBehind(final long handle); + private native void setIngestBehind(final long handle, + final boolean ingestBehind); + private native boolean writeGlobalSeqno(final long handle); + private native void setWriteGlobalSeqno(final long handle, + final boolean writeGlobalSeqNo); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LevelMetaData.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LevelMetaData.java new file mode 100644 index 000000000..c5685098b --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LevelMetaData.java @@ -0,0 +1,56 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.Arrays; +import java.util.List; + +/** + * The metadata that describes a level. + */ +public class LevelMetaData { + private final int level; + private final long size; + private final SstFileMetaData[] files; + + /** + * Called from JNI C++ + */ + private LevelMetaData(final int level, final long size, + final SstFileMetaData[] files) { + this.level = level; + this.size = size; + this.files = files; + } + + /** + * The level which this meta data describes. + * + * @return the level + */ + public int level() { + return level; + } + + /** + * The size of this level in bytes, which is equal to the sum of + * the file size of its {@link #files()}. + * + * @return the size + */ + public long size() { + return size; + } + + /** + * The metadata of all sst files in this level. + * + * @return the metadata of the files + */ + public List files() { + return Arrays.asList(files); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LiveFileMetaData.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LiveFileMetaData.java new file mode 100644 index 000000000..35d883e18 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LiveFileMetaData.java @@ -0,0 +1,55 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The full set of metadata associated with each SST file. 
+ */ +public class LiveFileMetaData extends SstFileMetaData { + private final byte[] columnFamilyName; + private final int level; + + /** + * Called from JNI C++ + */ + private LiveFileMetaData( + final byte[] columnFamilyName, + final int level, + final String fileName, + final String path, + final long size, + final long smallestSeqno, + final long largestSeqno, + final byte[] smallestKey, + final byte[] largestKey, + final long numReadsSampled, + final boolean beingCompacted, + final long numEntries, + final long numDeletions) { + super(fileName, path, size, smallestSeqno, largestSeqno, smallestKey, + largestKey, numReadsSampled, beingCompacted, numEntries, numDeletions); + this.columnFamilyName = columnFamilyName; + this.level = level; + } + + /** + * Get the name of the column family. + * + * @return the name of the column family + */ + public byte[] columnFamilyName() { + return columnFamilyName; + } + + /** + * Get the level at which this file resides. + * + * @return the level at which the file resides. + */ + public int level() { + return level; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LogFile.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LogFile.java new file mode 100644 index 000000000..ef24a6427 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/LogFile.java @@ -0,0 +1,75 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +public class LogFile { + private final String pathName; + private final long logNumber; + private final WalFileType type; + private final long startSequence; + private final long sizeFileBytes; + + /** + * Called from JNI C++ + */ + private LogFile(final String pathName, final long logNumber, + final byte walFileTypeValue, final long startSequence, + final long sizeFileBytes) { + this.pathName = pathName; + this.logNumber = logNumber; + this.type = WalFileType.fromValue(walFileTypeValue); + this.startSequence = startSequence; + this.sizeFileBytes = sizeFileBytes; + } + + /** + * Returns log file's pathname relative to the main db dir + * Eg. For a live-log-file = /000003.log + * For an archived-log-file = /archive/000003.log + * + * @return log file's pathname + */ + public String pathName() { + return pathName; + } + + /** + * Primary identifier for log file. + * This is directly proportional to creation time of the log file + * + * @return the log number + */ + public long logNumber() { + return logNumber; + } + + /** + * Log file can be either alive or archived. + * + * @return the type of the log file. + */ + public WalFileType type() { + return type; + } + + /** + * Starting sequence number of writebatch written in this log file. + * + * @return the stating sequence number + */ + public long startSequence() { + return startSequence; + } + + /** + * Size of log file on disk in Bytes. + * + * @return size of log file + */ + public long sizeFileBytes() { + return sizeFileBytes; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MemoryUsageType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MemoryUsageType.java new file mode 100644 index 000000000..6010ce7af --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MemoryUsageType.java @@ -0,0 +1,72 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * MemoryUsageType + * + *

The value will be used as a key to indicate the type of memory usage + * described.

    + */ +public enum MemoryUsageType { + /** + * Memory usage of all the mem-tables. + */ + kMemTableTotal((byte) 0), + /** + * Memory usage of those un-flushed mem-tables. + */ + kMemTableUnFlushed((byte) 1), + /** + * Memory usage of all the table readers. + */ + kTableReadersTotal((byte) 2), + /** + * Memory usage by Cache. + */ + kCacheTotal((byte) 3), + /** + * Max usage types - copied to keep 1:1 with native. + */ + kNumUsageTypes((byte) 4); + + /** + * Returns the byte value of the enumerations value + * + * @return byte representation + */ + public byte getValue() { + return value_; + } + + /** + *

    Get the MemoryUsageType enumeration value by + * passing the byte identifier to this method.

    + * + * @param byteIdentifier of MemoryUsageType. + * + * @return MemoryUsageType instance. + * + * @throws IllegalArgumentException if the usage type for the byteIdentifier + * cannot be found + */ + public static MemoryUsageType getMemoryUsageType(final byte byteIdentifier) { + for (final MemoryUsageType memoryUsageType : MemoryUsageType.values()) { + if (memoryUsageType.getValue() == byteIdentifier) { + return memoryUsageType; + } + } + + throw new IllegalArgumentException( + "Illegal value provided for MemoryUsageType."); + } + + MemoryUsageType(byte value) { + value_ = value; + } + + private final byte value_; +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MemoryUtil.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MemoryUtil.java new file mode 100644 index 000000000..52b2175e6 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MemoryUtil.java @@ -0,0 +1,60 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.*; + +/** + * JNI passthrough for MemoryUtil. + */ +public class MemoryUtil { + + /** + *

    Returns the approximate memory usage of different types in the input + * list of DBs and Cache set. For instance, in the output map the key + * kMemTableTotal will be associated with the memory + * usage of all the mem-tables from all the input rocksdb instances.

    + * + *

    Note that for memory usage inside Cache class, we will + * only report the usage of the input "cache_set" without + * including those Cache usage inside the input list "dbs" + * of DBs.
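A minimal sketch of querying this API for one open DB and one shared cache (hypothetical; `db` and `cache` are assumed to already exist):

```java
import java.util.Collections;
import java.util.Map;
import org.rocksdb.Cache;
import org.rocksdb.MemoryUsageType;
import org.rocksdb.MemoryUtil;
import org.rocksdb.RocksDB;

public class MemoryUsageSketch {
  public static void report(final RocksDB db, final Cache cache) {
    final Map<MemoryUsageType, Long> usage =
        MemoryUtil.getApproximateMemoryUsageByType(
            Collections.singletonList(db), Collections.singleton(cache));
    System.out.println("memtable bytes: "
        + usage.get(MemoryUsageType.kMemTableTotal));
    System.out.println("cache bytes:    "
        + usage.get(MemoryUsageType.kCacheTotal));
  }
}
```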

    + * + * @param dbs List of dbs to collect memory usage for. + * @param caches Set of caches to collect memory usage for. + * @return Map from {@link MemoryUsageType} to memory usage as a {@link Long}. + */ + public static Map getApproximateMemoryUsageByType(final List dbs, final Set caches) { + int dbCount = (dbs == null) ? 0 : dbs.size(); + int cacheCount = (caches == null) ? 0 : caches.size(); + long[] dbHandles = new long[dbCount]; + long[] cacheHandles = new long[cacheCount]; + if (dbCount > 0) { + ListIterator dbIter = dbs.listIterator(); + while (dbIter.hasNext()) { + dbHandles[dbIter.nextIndex()] = dbIter.next().nativeHandle_; + } + } + if (cacheCount > 0) { + // NOTE: This index handling is super ugly but I couldn't get a clean way to track both the + // index and the iterator simultaneously within a Set. + int i = 0; + for (Cache cache : caches) { + cacheHandles[i] = cache.nativeHandle_; + i++; + } + } + Map byteOutput = getApproximateMemoryUsageByType(dbHandles, cacheHandles); + Map output = new HashMap<>(); + for(Map.Entry longEntry : byteOutput.entrySet()) { + output.put(MemoryUsageType.getMemoryUsageType(longEntry.getKey()), longEntry.getValue()); + } + return output; + } + + private native static Map getApproximateMemoryUsageByType(final long[] dbHandles, + final long[] cacheHandles); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java index 3585318db..1d9ca0817 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java @@ -7,27 +7,20 @@ package org.rocksdb; import java.util.*; -public class MutableColumnFamilyOptions { - private final static String KEY_VALUE_PAIR_SEPARATOR = ";"; - private final static char KEY_VALUE_SEPARATOR = '='; - private final static String INT_ARRAY_INT_SEPARATOR = ","; - - private final String[] keys; - private final String[] values; - - // user must use builder pattern, or parser - private MutableColumnFamilyOptions(final String keys[], - final String values[]) { - this.keys = keys; - this.values = values; - } - - String[] getKeys() { - return keys; - } +public class MutableColumnFamilyOptions + extends AbstractMutableOptions { - String[] getValues() { - return values; + /** + * User must use builder pattern, or parser. + * + * @param keys the keys + * @param values the values + * + * See {@link #builder()} and {@link #parse(String)}. 
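A brief sketch of both construction paths named above, applied to a live DB via `RocksDB#setOptions` (hypothetical values; 67108864 bytes is 64 MiB):

```java
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.MutableColumnFamilyOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class MutableCfOptionsSketch {
  public static void tune(final RocksDB db, final ColumnFamilyHandle cf)
      throws RocksDBException {
    // Builder pattern:
    final MutableColumnFamilyOptions viaBuilder =
        MutableColumnFamilyOptions.builder()
            .setWriteBufferSize(64 * 1024 * 1024)
            .setDisableAutoCompactions(false)
            .build();
    db.setOptions(cf, viaBuilder);

    // Equivalent "key=value;key=value" string form:
    final MutableColumnFamilyOptions viaParser =
        MutableColumnFamilyOptions.parse(
                "write_buffer_size=67108864;disable_auto_compactions=false")
            .build();
    db.setOptions(cf, viaParser);
  }
}
```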
+ */ + private MutableColumnFamilyOptions(final String[] keys, + final String[] values) { + super(keys, values); } /** @@ -60,7 +53,7 @@ public class MutableColumnFamilyOptions { final MutableColumnFamilyOptionsBuilder builder = new MutableColumnFamilyOptionsBuilder(); - final String options[] = str.trim().split(KEY_VALUE_PAIR_SEPARATOR); + final String[] options = str.trim().split(KEY_VALUE_PAIR_SEPARATOR); for(final String option : options) { final int equalsOffset = option.indexOf(KEY_VALUE_SEPARATOR); if(equalsOffset <= 0) { @@ -69,12 +62,12 @@ public class MutableColumnFamilyOptions { } final String key = option.substring(0, equalsOffset); - if(key == null || key.isEmpty()) { + if(key.isEmpty()) { throw new IllegalArgumentException("options string is invalid"); } final String value = option.substring(equalsOffset + 1); - if(value == null || value.isEmpty()) { + if(value.isEmpty()) { throw new IllegalArgumentException("options string is invalid"); } @@ -84,37 +77,7 @@ public class MutableColumnFamilyOptions { return builder; } - /** - * Returns a string representation - * of MutableColumnFamilyOptions which is - * suitable for consumption by {@link #parse(String)} - * - * @return String representation of MutableColumnFamilyOptions - */ - @Override - public String toString() { - final StringBuilder buffer = new StringBuilder(); - for(int i = 0; i < keys.length; i++) { - buffer - .append(keys[i]) - .append(KEY_VALUE_SEPARATOR) - .append(values[i]); - - if(i + 1 < keys.length) { - buffer.append(KEY_VALUE_PAIR_SEPARATOR); - } - } - return buffer.toString(); - } - - public enum ValueType { - DOUBLE, - LONG, - INT, - BOOLEAN, - INT_ARRAY, - ENUM - } + private interface MutableColumnFamilyOptionKey extends MutableOptionKey {} public enum MemtableOption implements MutableColumnFamilyOptionKey { write_buffer_size(ValueType.LONG), @@ -153,7 +116,8 @@ public class MutableColumnFamilyOptions { target_file_size_multiplier(ValueType.INT), max_bytes_for_level_base(ValueType.LONG), max_bytes_for_level_multiplier(ValueType.INT), - max_bytes_for_level_multiplier_additional(ValueType.INT_ARRAY); + max_bytes_for_level_multiplier_additional(ValueType.INT_ARRAY), + ttl(ValueType.LONG); private final ValueType valueType; CompactionOption(final ValueType valueType) { @@ -183,356 +147,9 @@ public class MutableColumnFamilyOptions { } } - private interface MutableColumnFamilyOptionKey { - String name(); - ValueType getValueType(); - } - - private static abstract class MutableColumnFamilyOptionValue { - protected final T value; - - MutableColumnFamilyOptionValue(final T value) { - this.value = value; - } - - abstract double asDouble() throws NumberFormatException; - abstract long asLong() throws NumberFormatException; - abstract int asInt() throws NumberFormatException; - abstract boolean asBoolean() throws IllegalStateException; - abstract int[] asIntArray() throws IllegalStateException; - abstract String asString(); - abstract T asObject(); - } - - private static class MutableColumnFamilyOptionStringValue - extends MutableColumnFamilyOptionValue { - MutableColumnFamilyOptionStringValue(final String value) { - super(value); - } - - @Override - double asDouble() throws NumberFormatException { - return Double.parseDouble(value); - } - - @Override - long asLong() throws NumberFormatException { - return Long.parseLong(value); - } - - @Override - int asInt() throws NumberFormatException { - return Integer.parseInt(value); - } - - @Override - boolean asBoolean() throws IllegalStateException { - return 
Boolean.parseBoolean(value); - } - - @Override - int[] asIntArray() throws IllegalStateException { - throw new IllegalStateException("String is not applicable as int[]"); - } - - @Override - String asString() { - return value; - } - - @Override - String asObject() { - return value; - } - } - - private static class MutableColumnFamilyOptionDoubleValue - extends MutableColumnFamilyOptionValue { - MutableColumnFamilyOptionDoubleValue(final double value) { - super(value); - } - - @Override - double asDouble() { - return value; - } - - @Override - long asLong() throws NumberFormatException { - return value.longValue(); - } - - @Override - int asInt() throws NumberFormatException { - if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { - throw new NumberFormatException( - "double value lies outside the bounds of int"); - } - return value.intValue(); - } - - @Override - boolean asBoolean() throws IllegalStateException { - throw new IllegalStateException( - "double is not applicable as boolean"); - } - - @Override - int[] asIntArray() throws IllegalStateException { - if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { - throw new NumberFormatException( - "double value lies outside the bounds of int"); - } - return new int[] { value.intValue() }; - } - - @Override - String asString() { - return Double.toString(value); - } - - @Override - Double asObject() { - return value; - } - } - - private static class MutableColumnFamilyOptionLongValue - extends MutableColumnFamilyOptionValue { - MutableColumnFamilyOptionLongValue(final long value) { - super(value); - } - - @Override - double asDouble() { - if(value > Double.MAX_VALUE || value < Double.MIN_VALUE) { - throw new NumberFormatException( - "long value lies outside the bounds of int"); - } - return value.doubleValue(); - } - - @Override - long asLong() throws NumberFormatException { - return value; - } - - @Override - int asInt() throws NumberFormatException { - if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { - throw new NumberFormatException( - "long value lies outside the bounds of int"); - } - return value.intValue(); - } - - @Override - boolean asBoolean() throws IllegalStateException { - throw new IllegalStateException( - "long is not applicable as boolean"); - } - - @Override - int[] asIntArray() throws IllegalStateException { - if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { - throw new NumberFormatException( - "long value lies outside the bounds of int"); - } - return new int[] { value.intValue() }; - } - - @Override - String asString() { - return Long.toString(value); - } - - @Override - Long asObject() { - return value; - } - } - - private static class MutableColumnFamilyOptionIntValue - extends MutableColumnFamilyOptionValue { - MutableColumnFamilyOptionIntValue(final int value) { - super(value); - } - - @Override - double asDouble() { - if(value > Double.MAX_VALUE || value < Double.MIN_VALUE) { - throw new NumberFormatException("int value lies outside the bounds of int"); - } - return value.doubleValue(); - } - - @Override - long asLong() throws NumberFormatException { - return value; - } - - @Override - int asInt() throws NumberFormatException { - return value; - } - - @Override - boolean asBoolean() throws IllegalStateException { - throw new IllegalStateException("int is not applicable as boolean"); - } - - @Override - int[] asIntArray() throws IllegalStateException { - return new int[] { value }; - } - - @Override - String asString() { - return Integer.toString(value); - } - - @Override - 
Integer asObject() { - return value; - } - } - - private static class MutableColumnFamilyOptionBooleanValue - extends MutableColumnFamilyOptionValue { - MutableColumnFamilyOptionBooleanValue(final boolean value) { - super(value); - } - - @Override - double asDouble() { - throw new NumberFormatException("boolean is not applicable as double"); - } - - @Override - long asLong() throws NumberFormatException { - throw new NumberFormatException("boolean is not applicable as Long"); - } - - @Override - int asInt() throws NumberFormatException { - throw new NumberFormatException("boolean is not applicable as int"); - } - - @Override - boolean asBoolean() { - return value; - } - - @Override - int[] asIntArray() throws IllegalStateException { - throw new IllegalStateException("boolean is not applicable as int[]"); - } - - @Override - String asString() { - return Boolean.toString(value); - } - - @Override - Boolean asObject() { - return value; - } - } - - private static class MutableColumnFamilyOptionIntArrayValue - extends MutableColumnFamilyOptionValue { - MutableColumnFamilyOptionIntArrayValue(final int[] value) { - super(value); - } - - @Override - double asDouble() { - throw new NumberFormatException("int[] is not applicable as double"); - } - - @Override - long asLong() throws NumberFormatException { - throw new NumberFormatException("int[] is not applicable as Long"); - } - - @Override - int asInt() throws NumberFormatException { - throw new NumberFormatException("int[] is not applicable as int"); - } - - @Override - boolean asBoolean() { - throw new NumberFormatException("int[] is not applicable as boolean"); - } - - @Override - int[] asIntArray() throws IllegalStateException { - return value; - } - - @Override - String asString() { - final StringBuilder builder = new StringBuilder(); - for(int i = 0; i < value.length; i++) { - builder.append(Integer.toString(i)); - if(i + 1 < value.length) { - builder.append(INT_ARRAY_INT_SEPARATOR); - } - } - return builder.toString(); - } - - @Override - int[] asObject() { - return value; - } - } - - private static class MutableColumnFamilyOptionEnumValue> - extends MutableColumnFamilyOptionValue { - - MutableColumnFamilyOptionEnumValue(final T value) { - super(value); - } - - @Override - double asDouble() throws NumberFormatException { - throw new NumberFormatException("Enum is not applicable as double"); - } - - @Override - long asLong() throws NumberFormatException { - throw new NumberFormatException("Enum is not applicable as long"); - } - - @Override - int asInt() throws NumberFormatException { - throw new NumberFormatException("Enum is not applicable as int"); - } - - @Override - boolean asBoolean() throws IllegalStateException { - throw new NumberFormatException("Enum is not applicable as boolean"); - } - - @Override - int[] asIntArray() throws IllegalStateException { - throw new NumberFormatException("Enum is not applicable as int[]"); - } - - @Override - String asString() { - return value.name(); - } - - @Override - T asObject() { - return value; - } - } - public static class MutableColumnFamilyOptionsBuilder - implements MutableColumnFamilyOptionsInterface { + extends AbstractMutableOptionsBuilder + implements MutableColumnFamilyOptionsInterface { private final static Map ALL_KEYS_LOOKUP = new HashMap<>(); static { @@ -549,179 +166,24 @@ public class MutableColumnFamilyOptions { } } - private final Map> options = new LinkedHashMap<>(); - - public MutableColumnFamilyOptions build() { - final String keys[] = new String[options.size()]; - final 
String values[] = new String[options.size()]; - - int i = 0; - for(final Map.Entry> option : options.entrySet()) { - keys[i] = option.getKey().name(); - values[i] = option.getValue().asString(); - i++; - } - - return new MutableColumnFamilyOptions(keys, values); - } - - private MutableColumnFamilyOptionsBuilder setDouble( - final MutableColumnFamilyOptionKey key, final double value) { - if(key.getValueType() != ValueType.DOUBLE) { - throw new IllegalArgumentException( - key + " does not accept a double value"); - } - options.put(key, new MutableColumnFamilyOptionDoubleValue(value)); - return this; - } - - private double getDouble(final MutableColumnFamilyOptionKey key) - throws NoSuchElementException, NumberFormatException { - final MutableColumnFamilyOptionValue value = options.get(key); - if(value == null) { - throw new NoSuchElementException(key.name() + " has not been set"); - } - return value.asDouble(); - } - - private MutableColumnFamilyOptionsBuilder setLong( - final MutableColumnFamilyOptionKey key, final long value) { - if(key.getValueType() != ValueType.LONG) { - throw new IllegalArgumentException( - key + " does not accept a long value"); - } - options.put(key, new MutableColumnFamilyOptionLongValue(value)); - return this; - } - - private long getLong(final MutableColumnFamilyOptionKey key) - throws NoSuchElementException, NumberFormatException { - final MutableColumnFamilyOptionValue value = options.get(key); - if(value == null) { - throw new NoSuchElementException(key.name() + " has not been set"); - } - return value.asLong(); - } - - private MutableColumnFamilyOptionsBuilder setInt( - final MutableColumnFamilyOptionKey key, final int value) { - if(key.getValueType() != ValueType.INT) { - throw new IllegalArgumentException( - key + " does not accept an integer value"); - } - options.put(key, new MutableColumnFamilyOptionIntValue(value)); - return this; - } - - private int getInt(final MutableColumnFamilyOptionKey key) - throws NoSuchElementException, NumberFormatException { - final MutableColumnFamilyOptionValue value = options.get(key); - if(value == null) { - throw new NoSuchElementException(key.name() + " has not been set"); - } - return value.asInt(); - } - - private MutableColumnFamilyOptionsBuilder setBoolean( - final MutableColumnFamilyOptionKey key, final boolean value) { - if(key.getValueType() != ValueType.BOOLEAN) { - throw new IllegalArgumentException( - key + " does not accept a boolean value"); - } - options.put(key, new MutableColumnFamilyOptionBooleanValue(value)); - return this; - } - - private boolean getBoolean(final MutableColumnFamilyOptionKey key) - throws NoSuchElementException, NumberFormatException { - final MutableColumnFamilyOptionValue value = options.get(key); - if(value == null) { - throw new NoSuchElementException(key.name() + " has not been set"); - } - return value.asBoolean(); + private MutableColumnFamilyOptionsBuilder() { + super(); } - private MutableColumnFamilyOptionsBuilder setIntArray( - final MutableColumnFamilyOptionKey key, final int[] value) { - if(key.getValueType() != ValueType.INT_ARRAY) { - throw new IllegalArgumentException( - key + " does not accept an int array value"); - } - options.put(key, new MutableColumnFamilyOptionIntArrayValue(value)); - return this; - } - - private int[] getIntArray(final MutableColumnFamilyOptionKey key) - throws NoSuchElementException, NumberFormatException { - final MutableColumnFamilyOptionValue value = options.get(key); - if(value == null) { - throw new NoSuchElementException(key.name() + " has 
not been set"); - } - return value.asIntArray(); - } - - private > MutableColumnFamilyOptionsBuilder setEnum( - final MutableColumnFamilyOptionKey key, final T value) { - if(key.getValueType() != ValueType.ENUM) { - throw new IllegalArgumentException( - key + " does not accept a Enum value"); - } - options.put(key, new MutableColumnFamilyOptionEnumValue(value)); + @Override + protected MutableColumnFamilyOptionsBuilder self() { return this; - } - private > T getEnum(final MutableColumnFamilyOptionKey key) - throws NoSuchElementException, NumberFormatException { - final MutableColumnFamilyOptionValue value = options.get(key); - if(value == null) { - throw new NoSuchElementException(key.name() + " has not been set"); - } - - if(!(value instanceof MutableColumnFamilyOptionEnumValue)) { - throw new NoSuchElementException(key.name() + " is not of Enum type"); - } - - return ((MutableColumnFamilyOptionEnumValue)value).asObject(); + @Override + protected Map allKeys() { + return ALL_KEYS_LOOKUP; } - public MutableColumnFamilyOptionsBuilder fromString(final String keyStr, - final String valueStr) throws IllegalArgumentException { - Objects.requireNonNull(keyStr); - Objects.requireNonNull(valueStr); - - final MutableColumnFamilyOptionKey key = ALL_KEYS_LOOKUP.get(keyStr); - switch(key.getValueType()) { - case DOUBLE: - return setDouble(key, Double.parseDouble(valueStr)); - - case LONG: - return setLong(key, Long.parseLong(valueStr)); - - case INT: - return setInt(key, Integer.parseInt(valueStr)); - - case BOOLEAN: - return setBoolean(key, Boolean.parseBoolean(valueStr)); - - case INT_ARRAY: - final String[] strInts = valueStr - .trim().split(INT_ARRAY_INT_SEPARATOR); - if(strInts == null || strInts.length == 0) { - throw new IllegalArgumentException( - "int array value is not correctly formatted"); - } - - final int value[] = new int[strInts.length]; - int i = 0; - for(final String strInt : strInts) { - value[i++] = Integer.parseInt(strInt); - } - return setIntArray(key, value); - } - - throw new IllegalStateException( - key + " has unknown value type: " + key.getValueType()); + @Override + protected MutableColumnFamilyOptions build(final String[] keys, + final String[] values) { + return new MutableColumnFamilyOptions(keys, values); } @Override @@ -993,5 +455,15 @@ public class MutableColumnFamilyOptions { public boolean reportBgIoStats() { return getBoolean(MiscOption.report_bg_io_stats); } + + @Override + public MutableColumnFamilyOptionsBuilder setTtl(final long ttl) { + return setLong(CompactionOption.ttl, ttl); + } + + @Override + public long ttl() { + return getLong(CompactionOption.ttl); + } } } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableDBOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableDBOptions.java new file mode 100644 index 000000000..328f7f979 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableDBOptions.java @@ -0,0 +1,286 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.HashMap; +import java.util.Map; +import java.util.Objects; + +public class MutableDBOptions extends AbstractMutableOptions { + + /** + * User must use builder pattern, or parser. + * + * @param keys the keys + * @param values the values + * + * See {@link #builder()} and {@link #parse(String)}. 
+ */ + private MutableDBOptions(final String[] keys, final String[] values) { + super(keys, values); + } + + /** + * Creates a builder which allows you + * to set MutableDBOptions in a fluent + * manner + * + * @return A builder for MutableDBOptions + */ + public static MutableDBOptionsBuilder builder() { + return new MutableDBOptionsBuilder(); + } + + /** + * Parses a String representation of MutableDBOptions + * + * The format is: key1=value1;key2=value2;key3=value3 etc. + * + * For int[] values, each int should be separated by a comma, e.g. + * + * key1=value1;intArrayKey1=1,2,3 + * + * @param str The string representation of the mutable DB options + * + * @return A builder for the mutable DB options + */ + public static MutableDBOptionsBuilder parse(final String str) { + Objects.requireNonNull(str); + + final MutableDBOptionsBuilder builder = + new MutableDBOptionsBuilder(); + + final String[] options = str.trim().split(KEY_VALUE_PAIR_SEPARATOR); + for(final String option : options) { + final int equalsOffset = option.indexOf(KEY_VALUE_SEPARATOR); + if(equalsOffset <= 0) { + throw new IllegalArgumentException( + "options string has an invalid key=value pair"); + } + + final String key = option.substring(0, equalsOffset); + if(key.isEmpty()) { + throw new IllegalArgumentException("options string is invalid"); + } + + final String value = option.substring(equalsOffset + 1); + if(value.isEmpty()) { + throw new IllegalArgumentException("options string is invalid"); + } + + builder.fromString(key, value); + } + + return builder; + } + + private interface MutableDBOptionKey extends MutableOptionKey {} + + public enum DBOption implements MutableDBOptionKey { + max_background_jobs(ValueType.INT), + base_background_compactions(ValueType.INT), + max_background_compactions(ValueType.INT), + avoid_flush_during_shutdown(ValueType.BOOLEAN), + writable_file_max_buffer_size(ValueType.LONG), + delayed_write_rate(ValueType.LONG), + max_total_wal_size(ValueType.LONG), + delete_obsolete_files_period_micros(ValueType.LONG), + stats_dump_period_sec(ValueType.INT), + max_open_files(ValueType.INT), + bytes_per_sync(ValueType.LONG), + wal_bytes_per_sync(ValueType.LONG), + compaction_readahead_size(ValueType.LONG); + + private final ValueType valueType; + DBOption(final ValueType valueType) { + this.valueType = valueType; + } + + @Override + public ValueType getValueType() { + return valueType; + } + } + + public static class MutableDBOptionsBuilder + extends AbstractMutableOptionsBuilder + implements MutableDBOptionsInterface { + + private final static Map ALL_KEYS_LOOKUP = new HashMap<>(); + static { + for(final MutableDBOptionKey key : DBOption.values()) { + ALL_KEYS_LOOKUP.put(key.name(), key); + } + } + + private MutableDBOptionsBuilder() { + super(); + } + + @Override + protected MutableDBOptionsBuilder self() { + return this; + } + + @Override + protected Map allKeys() { + return ALL_KEYS_LOOKUP; + } + + @Override + protected MutableDBOptions build(final String[] keys, + final String[] values) { + return new MutableDBOptions(keys, values); + } + + @Override + public MutableDBOptionsBuilder setMaxBackgroundJobs( + final int maxBackgroundJobs) { + return setInt(DBOption.max_background_jobs, maxBackgroundJobs); + } + + @Override + public int maxBackgroundJobs() { + return getInt(DBOption.max_background_jobs); + } + + @Override + public void setBaseBackgroundCompactions( + final int baseBackgroundCompactions) { + setInt(DBOption.base_background_compactions, + baseBackgroundCompactions); + } + + @Override
public int baseBackgroundCompactions() { + return getInt(DBOption.base_background_compactions); + } + + @Override + public MutableDBOptionsBuilder setMaxBackgroundCompactions( + final int maxBackgroundCompactions) { + return setInt(DBOption.max_background_compactions, + maxBackgroundCompactions); + } + + @Override + public int maxBackgroundCompactions() { + return getInt(DBOption.max_background_compactions); + } + + @Override + public MutableDBOptionsBuilder setAvoidFlushDuringShutdown( + final boolean avoidFlushDuringShutdown) { + return setBoolean(DBOption.avoid_flush_during_shutdown, + avoidFlushDuringShutdown); + } + + @Override + public boolean avoidFlushDuringShutdown() { + return getBoolean(DBOption.avoid_flush_during_shutdown); + } + + @Override + public MutableDBOptionsBuilder setWritableFileMaxBufferSize( + final long writableFileMaxBufferSize) { + return setLong(DBOption.writable_file_max_buffer_size, + writableFileMaxBufferSize); + } + + @Override + public long writableFileMaxBufferSize() { + return getLong(DBOption.writable_file_max_buffer_size); + } + + @Override + public MutableDBOptionsBuilder setDelayedWriteRate( + final long delayedWriteRate) { + return setLong(DBOption.delayed_write_rate, + delayedWriteRate); + } + + @Override + public long delayedWriteRate() { + return getLong(DBOption.delayed_write_rate); + } + + @Override + public MutableDBOptionsBuilder setMaxTotalWalSize( + final long maxTotalWalSize) { + return setLong(DBOption.max_total_wal_size, maxTotalWalSize); + } + + @Override + public long maxTotalWalSize() { + return getLong(DBOption.max_total_wal_size); + } + + @Override + public MutableDBOptionsBuilder setDeleteObsoleteFilesPeriodMicros( + final long micros) { + return setLong(DBOption.delete_obsolete_files_period_micros, micros); + } + + @Override + public long deleteObsoleteFilesPeriodMicros() { + return getLong(DBOption.delete_obsolete_files_period_micros); + } + + @Override + public MutableDBOptionsBuilder setStatsDumpPeriodSec( + final int statsDumpPeriodSec) { + return setInt(DBOption.stats_dump_period_sec, statsDumpPeriodSec); + } + + @Override + public int statsDumpPeriodSec() { + return getInt(DBOption.stats_dump_period_sec); + } + + @Override + public MutableDBOptionsBuilder setMaxOpenFiles(final int maxOpenFiles) { + return setInt(DBOption.max_open_files, maxOpenFiles); + } + + @Override + public int maxOpenFiles() { + return getInt(DBOption.max_open_files); + } + + @Override + public MutableDBOptionsBuilder setBytesPerSync(final long bytesPerSync) { + return setLong(DBOption.bytes_per_sync, bytesPerSync); + } + + @Override + public long bytesPerSync() { + return getLong(DBOption.bytes_per_sync); + } + + @Override + public MutableDBOptionsBuilder setWalBytesPerSync( + final long walBytesPerSync) { + return setLong(DBOption.wal_bytes_per_sync, walBytesPerSync); + } + + @Override + public long walBytesPerSync() { + return getLong(DBOption.wal_bytes_per_sync); + } + + @Override + public MutableDBOptionsBuilder setCompactionReadaheadSize( + final long compactionReadaheadSize) { + return setLong(DBOption.compaction_readahead_size, + compactionReadaheadSize); + } + + @Override + public long compactionReadaheadSize() { + return getLong(DBOption.compaction_readahead_size); + } + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableDBOptionsInterface.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableDBOptionsInterface.java new file mode 100644 index 000000000..5fe3215b3 --- /dev/null +++ 
b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableDBOptionsInterface.java @@ -0,0 +1,336 @@ +package org.rocksdb; + +public interface MutableDBOptionsInterface<T extends MutableDBOptionsInterface<T>> { + + /** + * Specifies the maximum number of concurrent background jobs (both flushes + * and compactions combined). + * Default: 2 + * + * @param maxBackgroundJobs number of max concurrent background jobs + * @return the instance of the current object. + */ + T setMaxBackgroundJobs(int maxBackgroundJobs); + + /** + * Returns the maximum number of concurrent background jobs (both flushes + * and compactions combined). + * Default: 2 + * + * @return the maximum number of concurrent background jobs. + */ + int maxBackgroundJobs(); + + /** + * Suggested number of concurrent background compaction jobs, submitted to + * the default LOW priority thread pool. + * Default: 1 + * + * @param baseBackgroundCompactions Suggested number of background compaction + * jobs + * + * @deprecated Use {@link #setMaxBackgroundJobs(int)} + */ + @Deprecated + void setBaseBackgroundCompactions(int baseBackgroundCompactions); + + /** + * Suggested number of concurrent background compaction jobs, submitted to + * the default LOW priority thread pool. + * Default: 1 + * + * @return Suggested number of background compaction jobs + */ + int baseBackgroundCompactions(); + + /** + * Specifies the maximum number of concurrent background compaction jobs, + * submitted to the default LOW priority thread pool. + * If you're increasing this, also consider increasing the number of threads + * in the LOW priority thread pool. + * Default: 1 + * + * @param maxBackgroundCompactions the maximum number of background + * compaction jobs. + * @return the instance of the current object. + * + * @see RocksEnv#setBackgroundThreads(int) + * @see RocksEnv#setBackgroundThreads(int, Priority) + * @see DBOptionsInterface#maxBackgroundFlushes() + */ + T setMaxBackgroundCompactions(int maxBackgroundCompactions); + + /** + * Returns the maximum number of concurrent background compaction jobs, + * submitted to the default LOW priority thread pool. + * When increasing this number, we may also want to consider increasing the + * number of threads in the LOW priority thread pool. + * Default: 1 + * + * @return the maximum number of concurrent background compaction jobs. + * @see RocksEnv#setBackgroundThreads(int) + * @see RocksEnv#setBackgroundThreads(int, Priority) + * + * @deprecated Use {@link #setMaxBackgroundJobs(int)} + */ + @Deprecated + int maxBackgroundCompactions(); + + /** + * By default RocksDB will flush all memtables on DB close if there is + * unpersisted data (i.e. with WAL disabled). The flush can be skipped to + * speed up DB close. Unpersisted data WILL BE LOST. + * + * DEFAULT: false + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)} + * API. + * + * @param avoidFlushDuringShutdown true if we should avoid flush during + * shutdown + * + * @return the reference to the current options. + */ + T setAvoidFlushDuringShutdown(boolean avoidFlushDuringShutdown); + + /** + * By default RocksDB will flush all memtables on DB close if there is + * unpersisted data (i.e. with WAL disabled). The flush can be skipped to + * speed up DB close. Unpersisted data WILL BE LOST. + * + * DEFAULT: false + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)} + * API.
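Because these options are mutable at runtime, they are meant to be applied to a live instance. A hedged sketch follows, assuming an open `RocksDB` handle named `db` and a `RocksDB#setDBOptions` entry point shipping alongside `MutableDBOptions` (both assumptions, neither shown in this diff):

```java
// Raise the background-job budget without reopening the database.
// 'db' is a hypothetical, already-open RocksDB instance, and setDBOptions
// is assumed to be the MutableDBOptions counterpart of
// RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions).
// It throws RocksDBException on failure.
db.setDBOptions(MutableDBOptions.builder()
    .setMaxBackgroundJobs(4)
    .build());
```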
+ * + * @return true if we should avoid flush during shutdown + */ + boolean avoidFlushDuringShutdown(); + + /** + * This is the maximum buffer size that is used by WritableFileWriter. + * On Windows, we need to maintain an aligned buffer for writes. + * We allow the buffer to grow until its size hits the limit. + * + * Default: 1024 * 1024 (1 MB) + * + * @param writableFileMaxBufferSize the maximum buffer size + * + * @return the reference to the current options. + */ + T setWritableFileMaxBufferSize(long writableFileMaxBufferSize); + + /** + * This is the maximum buffer size that is used by WritableFileWriter. + * On Windows, we need to maintain an aligned buffer for writes. + * We allow the buffer to grow until its size hits the limit. + * + * Default: 1024 * 1024 (1 MB) + * + * @return the maximum buffer size + */ + long writableFileMaxBufferSize(); + + /** + * The limited write rate to the DB if + * {@link ColumnFamilyOptions#softPendingCompactionBytesLimit()} or + * {@link ColumnFamilyOptions#level0SlowdownWritesTrigger()} is triggered, + * or we are writing to the last mem table allowed and we allow more than 3 + * mem tables. It is calculated using the size of user write requests before + * compression. RocksDB may decide to slow down more if compaction still + * falls further behind. + * + * Unit: bytes per second. + * + * Default: 16MB/s + * + * @param delayedWriteRate the rate in bytes per second + * + * @return the reference to the current options. + */ + T setDelayedWriteRate(long delayedWriteRate); + + /** + * The limited write rate to the DB if + * {@link ColumnFamilyOptions#softPendingCompactionBytesLimit()} or + * {@link ColumnFamilyOptions#level0SlowdownWritesTrigger()} is triggered, + * or we are writing to the last mem table allowed and we allow more than 3 + * mem tables. It is calculated using the size of user write requests before + * compression. RocksDB may decide to slow down more if compaction still + * falls further behind. + * + * Unit: bytes per second. + * + * Default: 16MB/s + * + * @return the rate in bytes per second + */ + long delayedWriteRate(); + + /** + *
    Once write-ahead logs exceed this size, we will start forcing the + * flush of column families whose memtables are backed by the oldest live + * WAL file (i.e. the ones that are causing all the space amplification). + *
    + *
    If set to 0 (default), we will dynamically choose the WAL size limit to + * be [sum of all write_buffer_size * max_write_buffer_number] * 2
    + *
This option takes effect only when there is more than one column family, as + * otherwise the WAL size is dictated by the write_buffer_size.
    + *
    Default: 0
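As a worked example of the formula above: with two column families, each using write_buffer_size = 64 MB and max_write_buffer_number = 3, the dynamically chosen limit would be (2 × 64 MB × 3) × 2 = 768 MB.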
    + * + * @param maxTotalWalSize max total wal size. + * @return the instance of the current object. + */ + T setMaxTotalWalSize(long maxTotalWalSize); + + /** + *
Returns the max total WAL size. Once write-ahead logs exceed this size, + * we will start forcing the flush of column families whose memtables are + * backed by the oldest live WAL file (i.e. the ones that are causing all + * the space amplification).
    + * + *
    If set to 0 (default), we will dynamically choose the WAL size limit + * to be [sum of all write_buffer_size * max_write_buffer_number] * 2 + *
+ * + * @return max total WAL size + */ + long maxTotalWalSize(); + + /** + * The periodicity when obsolete files get deleted. The default + * value is 6 hours. The files that get out of scope by the compaction + * process will still get automatically deleted on every compaction, + * regardless of this setting. + * + * @param micros the time interval in micros + * @return the instance of the current object. + */ + T setDeleteObsoleteFilesPeriodMicros(long micros); + + /** + * The periodicity when obsolete files get deleted. The default + * value is 6 hours. The files that get out of scope by the compaction + * process will still get automatically deleted on every compaction, + * regardless of this setting. + * + * @return the time interval in micros when obsolete files will be deleted. + */ + long deleteObsoleteFilesPeriodMicros(); + + /** + * If not zero, dump rocksdb.stats to LOG every stats_dump_period_sec + * Default: 600 (10 minutes) + * + * @param statsDumpPeriodSec time interval in seconds. + * @return the instance of the current object. + */ + T setStatsDumpPeriodSec(int statsDumpPeriodSec); + + /** + * If not zero, dump rocksdb.stats to LOG every stats_dump_period_sec + * Default: 600 (10 minutes) + * + * @return time interval in seconds. + */ + int statsDumpPeriodSec(); + + /** + * Number of open files that can be used by the DB. You may need to + * increase this if your database has a large working set. Value -1 means + * files opened are always kept open. You can estimate the number of files + * based on {@code target_file_size_base} and {@code target_file_size_multiplier} + * for level-based compaction. For universal-style compaction, you can usually + * set it to -1. + * Default: 5000 + * + * @param maxOpenFiles the maximum number of open files. + * @return the instance of the current object. + */ + T setMaxOpenFiles(int maxOpenFiles); + + /** + * Number of open files that can be used by the DB. You may need to + * increase this if your database has a large working set. Value -1 means + * files opened are always kept open. You can estimate the number of files + * based on {@code target_file_size_base} and {@code target_file_size_multiplier} + * for level-based compaction. For universal-style compaction, you can usually + * set it to -1. + * + * @return the maximum number of open files. + */ + int maxOpenFiles(); + + /** + * Allows OS to incrementally sync files to disk while they are being + * written, asynchronously, in the background. + * Issue one request for every bytes_per_sync written. 0 turns it off. + * Default: 0 + * + * @param bytesPerSync size in bytes + * @return the instance of the current object. + */ + T setBytesPerSync(long bytesPerSync); + + /** + * Allows OS to incrementally sync files to disk while they are being + * written, asynchronously, in the background. + * Issue one request for every bytes_per_sync written. 0 turns it off. + * Default: 0 + * + * @return size in bytes + */ + long bytesPerSync(); + + /** + * Same as {@link #setBytesPerSync(long)}, but applies to WAL files + * + * Default: 0, turned off + * + * @param walBytesPerSync size in bytes + * @return the instance of the current object. + */ + T setWalBytesPerSync(long walBytesPerSync); + + /** + * Same as {@link #bytesPerSync()}, but applies to WAL files + * + * Default: 0, turned off + * + * @return size in bytes + */ + long walBytesPerSync(); + + + /** + * If non-zero, we perform bigger reads when doing compaction. If you're + * running RocksDB on spinning disks, you should set this to at least 2MB.
+ * + * That way RocksDB's compaction is doing sequential instead of random reads. + * When non-zero, we also force + * {@link DBOptionsInterface#newTableReaderForCompactionInputs()} to true. + * + * Default: 0 + * + * @param compactionReadaheadSize The compaction read-ahead size + * + * @return the reference to the current options. + */ + T setCompactionReadaheadSize(final long compactionReadaheadSize); + + /** + * If non-zero, we perform bigger reads when doing compaction. If you're + * running RocksDB on spinning disks, you should set this to at least 2MB. + * + * That way RocksDB's compaction is doing sequential instead of random reads. + * When non-zero, we also force + * {@link DBOptionsInterface#newTableReaderForCompactionInputs()} to true. + * + * Default: 0 + * + * @return The compaction read-ahead size + */ + long compactionReadaheadSize(); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableOptionKey.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableOptionKey.java new file mode 100644 index 000000000..7402471ff --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableOptionKey.java @@ -0,0 +1,15 @@ +package org.rocksdb; + +public interface MutableOptionKey { + enum ValueType { + DOUBLE, + LONG, + INT, + BOOLEAN, + INT_ARRAY, + ENUM + } + + String name(); + ValueType getValueType(); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableOptionValue.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableOptionValue.java new file mode 100644 index 000000000..3727f7c1f --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/MutableOptionValue.java @@ -0,0 +1,375 @@ +package org.rocksdb; + +import static org.rocksdb.AbstractMutableOptions.INT_ARRAY_INT_SEPARATOR; + +public abstract class MutableOptionValue { + + abstract double asDouble() throws NumberFormatException; + abstract long asLong() throws NumberFormatException; + abstract int asInt() throws NumberFormatException; + abstract boolean asBoolean() throws IllegalStateException; + abstract int[] asIntArray() throws IllegalStateException; + abstract String asString(); + abstract T asObject(); + + private static abstract class MutableOptionValueObject + extends MutableOptionValue { + protected final T value; + + private MutableOptionValueObject(final T value) { + this.value = value; + } + + @Override T asObject() { + return value; + } + } + + static MutableOptionValue fromString(final String s) { + return new MutableOptionStringValue(s); + } + + static MutableOptionValue fromDouble(final double d) { + return new MutableOptionDoubleValue(d); + } + + static MutableOptionValue fromLong(final long d) { + return new MutableOptionLongValue(d); + } + + static MutableOptionValue fromInt(final int i) { + return new MutableOptionIntValue(i); + } + + static MutableOptionValue fromBoolean(final boolean b) { + return new MutableOptionBooleanValue(b); + } + + static MutableOptionValue fromIntArray(final int[] ix) { + return new MutableOptionIntArrayValue(ix); + } + + static > MutableOptionValue fromEnum(final N value) { + return new MutableOptionEnumValue<>(value); + } + + static class MutableOptionStringValue + extends MutableOptionValueObject { + MutableOptionStringValue(final String value) { + super(value); + } + + @Override + double asDouble() throws NumberFormatException { + return Double.parseDouble(value); + } + + @Override + long asLong() throws NumberFormatException { + return Long.parseLong(value); + } + + @Override + int asInt() throws 
NumberFormatException { + return Integer.parseInt(value); + } + + @Override + boolean asBoolean() throws IllegalStateException { + return Boolean.parseBoolean(value); + } + + @Override + int[] asIntArray() throws IllegalStateException { + throw new IllegalStateException("String is not applicable as int[]"); + } + + @Override + String asString() { + return value; + } + } + + static class MutableOptionDoubleValue + extends MutableOptionValue { + private final double value; + MutableOptionDoubleValue(final double value) { + this.value = value; + } + + @Override + double asDouble() { + return value; + } + + @Override + long asLong() throws NumberFormatException { + return Double.valueOf(value).longValue(); + } + + @Override + int asInt() throws NumberFormatException { + if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { + throw new NumberFormatException( + "double value lies outside the bounds of int"); + } + return Double.valueOf(value).intValue(); + } + + @Override + boolean asBoolean() throws IllegalStateException { + throw new IllegalStateException( + "double is not applicable as boolean"); + } + + @Override + int[] asIntArray() throws IllegalStateException { + if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { + throw new NumberFormatException( + "double value lies outside the bounds of int"); + } + return new int[] { Double.valueOf(value).intValue() }; + } + + @Override + String asString() { + return String.valueOf(value); + } + + @Override + Double asObject() { + return value; + } + } + + static class MutableOptionLongValue + extends MutableOptionValue { + private final long value; + + MutableOptionLongValue(final long value) { + this.value = value; + } + + @Override + double asDouble() { + if(value > Double.MAX_VALUE || value < Double.MIN_VALUE) { + throw new NumberFormatException( + "long value lies outside the bounds of int"); + } + return Long.valueOf(value).doubleValue(); + } + + @Override + long asLong() throws NumberFormatException { + return value; + } + + @Override + int asInt() throws NumberFormatException { + if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { + throw new NumberFormatException( + "long value lies outside the bounds of int"); + } + return Long.valueOf(value).intValue(); + } + + @Override + boolean asBoolean() throws IllegalStateException { + throw new IllegalStateException( + "long is not applicable as boolean"); + } + + @Override + int[] asIntArray() throws IllegalStateException { + if(value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) { + throw new NumberFormatException( + "long value lies outside the bounds of int"); + } + return new int[] { Long.valueOf(value).intValue() }; + } + + @Override + String asString() { + return String.valueOf(value); + } + + @Override + Long asObject() { + return value; + } + } + + static class MutableOptionIntValue + extends MutableOptionValue { + private final int value; + + MutableOptionIntValue(final int value) { + this.value = value; + } + + @Override + double asDouble() { + if(value > Double.MAX_VALUE || value < Double.MIN_VALUE) { + throw new NumberFormatException("int value lies outside the bounds of int"); + } + return Integer.valueOf(value).doubleValue(); + } + + @Override + long asLong() throws NumberFormatException { + return value; + } + + @Override + int asInt() throws NumberFormatException { + return value; + } + + @Override + boolean asBoolean() throws IllegalStateException { + throw new IllegalStateException("int is not applicable as boolean"); + } + + @Override + int[] 
asIntArray() throws IllegalStateException { + return new int[] { value }; + } + + @Override + String asString() { + return String.valueOf(value); + } + + @Override + Integer asObject() { + return value; + } + } + + static class MutableOptionBooleanValue + extends MutableOptionValue { + private final boolean value; + + MutableOptionBooleanValue(final boolean value) { + this.value = value; + } + + @Override + double asDouble() { + throw new NumberFormatException("boolean is not applicable as double"); + } + + @Override + long asLong() throws NumberFormatException { + throw new NumberFormatException("boolean is not applicable as Long"); + } + + @Override + int asInt() throws NumberFormatException { + throw new NumberFormatException("boolean is not applicable as int"); + } + + @Override + boolean asBoolean() { + return value; + } + + @Override + int[] asIntArray() throws IllegalStateException { + throw new IllegalStateException("boolean is not applicable as int[]"); + } + + @Override + String asString() { + return String.valueOf(value); + } + + @Override + Boolean asObject() { + return value; + } + } + + static class MutableOptionIntArrayValue + extends MutableOptionValueObject { + MutableOptionIntArrayValue(final int[] value) { + super(value); + } + + @Override + double asDouble() { + throw new NumberFormatException("int[] is not applicable as double"); + } + + @Override + long asLong() throws NumberFormatException { + throw new NumberFormatException("int[] is not applicable as Long"); + } + + @Override + int asInt() throws NumberFormatException { + throw new NumberFormatException("int[] is not applicable as int"); + } + + @Override + boolean asBoolean() { + throw new NumberFormatException("int[] is not applicable as boolean"); + } + + @Override + int[] asIntArray() throws IllegalStateException { + return value; + } + + @Override + String asString() { + final StringBuilder builder = new StringBuilder(); + for(int i = 0; i < value.length; i++) { + builder.append(i); + if(i + 1 < value.length) { + builder.append(INT_ARRAY_INT_SEPARATOR); + } + } + return builder.toString(); + } + } + + static class MutableOptionEnumValue> + extends MutableOptionValueObject { + + MutableOptionEnumValue(final T value) { + super(value); + } + + @Override + double asDouble() throws NumberFormatException { + throw new NumberFormatException("Enum is not applicable as double"); + } + + @Override + long asLong() throws NumberFormatException { + throw new NumberFormatException("Enum is not applicable as long"); + } + + @Override + int asInt() throws NumberFormatException { + throw new NumberFormatException("Enum is not applicable as int"); + } + + @Override + boolean asBoolean() throws IllegalStateException { + throw new NumberFormatException("Enum is not applicable as boolean"); + } + + @Override + int[] asIntArray() throws IllegalStateException { + throw new NumberFormatException("Enum is not applicable as int[]"); + } + + @Override + String asString() { + return value.name(); + } + } + +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OperationStage.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OperationStage.java new file mode 100644 index 000000000..6ac0a15a2 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OperationStage.java @@ -0,0 +1,59 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The operation stage. + */ +public enum OperationStage { + STAGE_UNKNOWN((byte)0x0), + STAGE_FLUSH_RUN((byte)0x1), + STAGE_FLUSH_WRITE_L0((byte)0x2), + STAGE_COMPACTION_PREPARE((byte)0x3), + STAGE_COMPACTION_RUN((byte)0x4), + STAGE_COMPACTION_PROCESS_KV((byte)0x5), + STAGE_COMPACTION_INSTALL((byte)0x6), + STAGE_COMPACTION_SYNC_FILE((byte)0x7), + STAGE_PICK_MEMTABLES_TO_FLUSH((byte)0x8), + STAGE_MEMTABLE_ROLLBACK((byte)0x9), + STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS((byte)0xA); + + private final byte value; + + OperationStage(final byte value) { + this.value = value; + } + + /** + * Get the internal representation value. + * + * @return the internal representation value. + */ + byte getValue() { + return value; + } + + /** + * Get the Operation stage from the internal representation value. + * + * @param value the internal representation value. + * + * @return the operation stage + * + * @throws IllegalArgumentException if the value does not match + * an OperationStage + */ + static OperationStage fromValue(final byte value) + throws IllegalArgumentException { + for (final OperationStage threadType : OperationStage.values()) { + if (threadType.value == value) { + return threadType; + } + } + throw new IllegalArgumentException( + "Unknown value for OperationStage: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OperationType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OperationType.java new file mode 100644 index 000000000..7cc9b65cd --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OperationType.java @@ -0,0 +1,54 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The type used to refer to a thread operation. + * + * A thread operation describes high-level action of a thread, + * examples include compaction and flush. + */ +public enum OperationType { + OP_UNKNOWN((byte)0x0), + OP_COMPACTION((byte)0x1), + OP_FLUSH((byte)0x2); + + private final byte value; + + OperationType(final byte value) { + this.value = value; + } + + /** + * Get the internal representation value. + * + * @return the internal representation value. + */ + byte getValue() { + return value; + } + + /** + * Get the Operation type from the internal representation value. + * + * @param value the internal representation value. 
+ * + * @return the operation type + * + * @throws IllegalArgumentException if the value does not match + * an OperationType + */ + static OperationType fromValue(final byte value) + throws IllegalArgumentException { + for (final OperationType threadType : OperationType.values()) { + if (threadType.value == value) { + return threadType; + } + } + throw new IllegalArgumentException( + "Unknown value for OperationType: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OptimisticTransactionDB.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OptimisticTransactionDB.java index 1610dc739..267cab1de 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OptimisticTransactionDB.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/OptimisticTransactionDB.java @@ -94,6 +94,54 @@ public class OptimisticTransactionDB extends RocksDB return otdb; } + + /** + * This is similar to {@link #close()} except that it + * throws an exception if any error occurs. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. + * + * @throws RocksDBException if an error occurs whilst closing. + */ + public void closeE() throws RocksDBException { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } finally { + disposeInternal(); + } + } + } + + /** + * This is similar to {@link #closeE()} except that it + * silently ignores any errors. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. 
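A brief sketch of the close contract described above, assuming an open `OptimisticTransactionDB` named `otdb` (an assumption for illustration): the WAL is synced explicitly first, since neither close variant fsyncs it.

```java
try {
  otdb.syncWal();  // closeE()/close() do not fsync the WAL themselves
  otdb.closeE();   // unlike close(), surfaces failures as RocksDBException
} catch (final RocksDBException e) {
  // handle or log the failed sync or close
}
```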
+ */ + @Override + public void close() { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } catch (final RocksDBException e) { + // silently ignore the error report + } finally { + disposeInternal(); + } + } + } + @Override public Transaction beginTransaction(final WriteOptions writeOptions) { return new Transaction(this, beginTransaction(nativeHandle_, @@ -155,10 +203,14 @@ public class OptimisticTransactionDB extends RocksDB return db; } + @Override protected final native void disposeInternal(final long handle); + protected static native long open(final long optionsHandle, final String path) throws RocksDBException; protected static native long[] open(final long handle, final String path, final byte[][] columnFamilyNames, final long[] columnFamilyOptions); + private native static void closeDatabase(final long handle) + throws RocksDBException; private native long beginTransaction(final long handle, final long writeOptionsHandle); private native long beginTransaction(final long handle, @@ -171,5 +223,4 @@ public class OptimisticTransactionDB extends RocksDB final long optimisticTransactionOptionsHandle, final long oldTransactionHandle); private native long getBaseDB(final long handle); - @Override protected final native void disposeInternal(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Options.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Options.java index cac4fc5a3..5831b1e29 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Options.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Options.java @@ -19,7 +19,9 @@ import java.util.List; * automatically and native resources will be released as part of the process. */ public class Options extends RocksObject - implements DBOptionsInterface<Options>, ColumnFamilyOptionsInterface<Options>, + implements DBOptionsInterface<Options>, + MutableDBOptionsInterface<Options>, + ColumnFamilyOptionsInterface<Options>, MutableColumnFamilyOptionsInterface<Options> { static { RocksDB.loadLibrary(); @@ -66,10 +68,13 @@ public class Options extends RocksObject this.tableFormatConfig_ = other.tableFormatConfig_; this.rateLimiter_ = other.rateLimiter_; this.comparator_ = other.comparator_; + this.compactionFilter_ = other.compactionFilter_; + this.compactionFilterFactory_ = other.compactionFilterFactory_; this.compactionOptionsUniversal_ = other.compactionOptionsUniversal_; this.compactionOptionsFIFO_ = other.compactionOptionsFIFO_; this.compressionOptions_ = other.compressionOptions_; this.rowCache_ = other.rowCache_; + this.writeBufferManager_ = other.writeBufferManager_; } @Override @@ -213,6 +218,35 @@ return this; } + @Override + public Options setCompactionFilter( + final AbstractCompactionFilter<? super AbstractSlice<?>> + compactionFilter) { + setCompactionFilterHandle(nativeHandle_, compactionFilter.nativeHandle_); + compactionFilter_ = compactionFilter; + return this; + } + + @Override + public AbstractCompactionFilter<? super AbstractSlice<?>> compactionFilter() { + assert (isOwningHandle()); + return compactionFilter_; + } + + @Override + public Options setCompactionFilterFactory(final AbstractCompactionFilterFactory<? extends AbstractCompactionFilter<? super AbstractSlice<?>>> compactionFilterFactory) { + assert (isOwningHandle()); + setCompactionFilterFactoryHandle(nativeHandle_, compactionFilterFactory.nativeHandle_); + compactionFilterFactory_ = compactionFilterFactory; + return this; + } + + @Override + public AbstractCompactionFilterFactory<? extends AbstractCompactionFilter<? super AbstractSlice<?>>> compactionFilterFactory() { + assert (isOwningHandle()); + return compactionFilterFactory_; + } + + @Override public
Options setWriteBufferSize(final long writeBufferSize) { assert(isOwningHandle()); @@ -440,9 +474,10 @@ public class Options extends RocksObject } @Override - public void setMaxSubcompactions(final int maxSubcompactions) { + public Options setMaxSubcompactions(final int maxSubcompactions) { assert(isOwningHandle()); setMaxSubcompactions(nativeHandle_, maxSubcompactions); + return this; } @Override @@ -724,6 +759,20 @@ public class Options extends RocksObject } @Override + public Options setWriteBufferManager(final WriteBufferManager writeBufferManager) { + assert(isOwningHandle()); + setWriteBufferManager(nativeHandle_, writeBufferManager.nativeHandle_); + this.writeBufferManager_ = writeBufferManager; + return this; + } + + @Override + public WriteBufferManager writeBufferManager() { + assert(isOwningHandle()); + return this.writeBufferManager_; + } + + @Override public long dbWriteBufferSize() { assert(isOwningHandle()); return dbWriteBufferSize(nativeHandle_); @@ -859,6 +908,17 @@ public class Options extends RocksObject return delayedWriteRate(nativeHandle_); } + @Override + public Options setEnablePipelinedWrite(final boolean enablePipelinedWrite) { + setEnablePipelinedWrite(nativeHandle_, enablePipelinedWrite); + return this; + } + + @Override + public boolean enablePipelinedWrite() { + return enablePipelinedWrite(nativeHandle_); + } + @Override public Options setAllowConcurrentMemtableWrite( final boolean allowConcurrentMemtableWrite) { @@ -960,6 +1020,20 @@ public class Options extends RocksObject return this.rowCache_; } + @Override + public Options setWalFilter(final AbstractWalFilter walFilter) { + assert(isOwningHandle()); + setWalFilter(nativeHandle_, walFilter.nativeHandle_); + this.walFilter_ = walFilter; + return this; + } + + @Override + public WalFilter walFilter() { + assert(isOwningHandle()); + return this.walFilter_; + } + @Override public Options setFailIfOptionsFileError(final boolean failIfOptionsFileError) { assert(isOwningHandle()); @@ -1012,6 +1086,58 @@ public class Options extends RocksObject return avoidFlushDuringShutdown(nativeHandle_); } + @Override + public Options setAllowIngestBehind(final boolean allowIngestBehind) { + assert(isOwningHandle()); + setAllowIngestBehind(nativeHandle_, allowIngestBehind); + return this; + } + + @Override + public boolean allowIngestBehind() { + assert(isOwningHandle()); + return allowIngestBehind(nativeHandle_); + } + + @Override + public Options setPreserveDeletes(final boolean preserveDeletes) { + assert(isOwningHandle()); + setPreserveDeletes(nativeHandle_, preserveDeletes); + return this; + } + + @Override + public boolean preserveDeletes() { + assert(isOwningHandle()); + return preserveDeletes(nativeHandle_); + } + + @Override + public Options setTwoWriteQueues(final boolean twoWriteQueues) { + assert(isOwningHandle()); + setTwoWriteQueues(nativeHandle_, twoWriteQueues); + return this; + } + + @Override + public boolean twoWriteQueues() { + assert(isOwningHandle()); + return twoWriteQueues(nativeHandle_); + } + + @Override + public Options setManualWalFlush(final boolean manualWalFlush) { + assert(isOwningHandle()); + setManualWalFlush(nativeHandle_, manualWalFlush); + return this; + } + + @Override + public boolean manualWalFlush() { + assert(isOwningHandle()); + return manualWalFlush(nativeHandle_); + } + @Override public MemTableConfig memTableConfig() { return this.memTableConfig_; @@ -1148,6 +1274,20 @@ public class Options extends RocksObject bottommostCompressionType(nativeHandle_)); } + @Override + public 
Options setBottommostCompressionOptions( + final CompressionOptions bottommostCompressionOptions) { + setBottommostCompressionOptions(nativeHandle_, + bottommostCompressionOptions.nativeHandle_); + this.bottommostCompressionOptions_ = bottommostCompressionOptions; + return this; + } + + @Override + public CompressionOptions bottommostCompressionOptions() { + return this.bottommostCompressionOptions_; + } + @Override public Options setCompressionOptions( final CompressionOptions compressionOptions) { @@ -1163,7 +1303,7 @@ public class Options extends RocksObject @Override public CompactionStyle compactionStyle() { - return CompactionStyle.values()[compactionStyle(nativeHandle_)]; + return CompactionStyle.fromValue(compactionStyle(nativeHandle_)); } @Override @@ -1535,6 +1675,17 @@ public class Options extends RocksObject return reportBgIoStats(nativeHandle_); } + @Override + public Options setTtl(final long ttl) { + setTtl(nativeHandle_, ttl); + return this; + } + + @Override + public long ttl() { + return ttl(nativeHandle_); + } + @Override public Options setCompactionOptionsUniversal( final CompactionOptionsUniversal compactionOptionsUniversal) { @@ -1573,6 +1724,17 @@ public class Options extends RocksObject return forceConsistencyChecks(nativeHandle_); } + @Override + public Options setAtomicFlush(final boolean atomicFlush) { + setAtomicFlush(nativeHandle_, atomicFlush); + return this; + } + + @Override + public boolean atomicFlush() { + return atomicFlush(nativeHandle_); + } + private native static long newOptions(); private native static long newOptions(long dbOptHandle, long cfOptHandle); @@ -1690,6 +1852,8 @@ public class Options extends RocksObject private native boolean adviseRandomOnOpen(long handle); private native void setDbWriteBufferSize(final long handle, final long dbWriteBufferSize); + private native void setWriteBufferManager(final long handle, + final long writeBufferManagerHandle); private native long dbWriteBufferSize(final long handle); private native void setAccessHintOnCompactionStart(final long handle, final byte accessHintOnCompactionStart); @@ -1719,6 +1883,9 @@ public class Options extends RocksObject private native boolean enableThreadTracking(long handle); private native void setDelayedWriteRate(long handle, long delayedWriteRate); private native long delayedWriteRate(long handle); + private native void setEnablePipelinedWrite(final long handle, + final boolean pipelinedWrite); + private native boolean enablePipelinedWrite(final long handle); private native void setAllowConcurrentMemtableWrite(long handle, boolean allowConcurrentMemtableWrite); private native boolean allowConcurrentMemtableWrite(long handle); @@ -1741,7 +1908,9 @@ public class Options extends RocksObject final boolean allow2pc); private native boolean allow2pc(final long handle); private native void setRowCache(final long handle, - final long row_cache_handle); + final long rowCacheHandle); + private native void setWalFilter(final long handle, + final long walFilterHandle); private native void setFailIfOptionsFileError(final long handle, final boolean failIfOptionsFileError); private native boolean failIfOptionsFileError(final long handle); @@ -1754,6 +1923,19 @@ public class Options extends RocksObject private native void setAvoidFlushDuringShutdown(final long handle, final boolean avoidFlushDuringShutdown); private native boolean avoidFlushDuringShutdown(final long handle); + private native void setAllowIngestBehind(final long handle, + final boolean allowIngestBehind); + private native 
boolean allowIngestBehind(final long handle); + private native void setPreserveDeletes(final long handle, + final boolean preserveDeletes); + private native boolean preserveDeletes(final long handle); + private native void setTwoWriteQueues(final long handle, + final boolean twoWriteQueues); + private native boolean twoWriteQueues(final long handle); + private native void setManualWalFlush(final long handle, + final boolean manualWalFlush); + private native boolean manualWalFlush(final long handle); + // CF native handles private native void optimizeForSmallDb(final long handle); @@ -1770,6 +1952,10 @@ public class Options extends RocksObject long handle, String name); private native void setMergeOperator( long handle, long mergeOperatorHandle); + private native void setCompactionFilterHandle( + long handle, long compactionFilterHandle); + private native void setCompactionFilterFactoryHandle( + long handle, long compactionFilterFactoryHandle); private native void setWriteBufferSize(long handle, long writeBufferSize) throws IllegalArgumentException; private native long writeBufferSize(long handle); @@ -1787,6 +1973,8 @@ public class Options extends RocksObject private native void setBottommostCompressionType(long handle, byte bottommostCompressionType); private native byte bottommostCompressionType(long handle); + private native void setBottommostCompressionOptions(final long handle, + final long bottommostCompressionOptionsHandle); private native void setCompressionOptions(long handle, long compressionOptionsHandle); private native void useFixedLengthPrefixExtractor( @@ -1890,6 +2078,8 @@ public class Options extends RocksObject private native void setReportBgIoStats(final long handle, final boolean reportBgIoStats); private native boolean reportBgIoStats(final long handle); + private native void setTtl(final long handle, final long ttl); + private native long ttl(final long handle); private native void setCompactionOptionsUniversal(final long handle, final long compactionOptionsUniversalHandle); private native void setCompactionOptionsFIFO(final long handle, @@ -1897,6 +2087,9 @@ public class Options extends RocksObject private native void setForceConsistencyChecks(final long handle, final boolean forceConsistencyChecks); private native boolean forceConsistencyChecks(final long handle); + private native void setAtomicFlush(final long handle, + final boolean atomicFlush); + private native boolean atomicFlush(final long handle); // instance variables // NOTE: If you add new member variables, please update the copy constructor above! 
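To illustrate the new Options plumbing above, here is a minimal sketch wiring several of the newly exposed setters together. It assumes the WriteBufferManager(bufferSize, cache) constructor added elsewhere in this patch; the sizes, TTL value, and chaining style are illustrative, not recommendations from the patch itself.

import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteBufferManager;

public class OptionsSketch {
  public static void main(final String[] args) {
    RocksDB.loadLibrary();
    try (final Cache cache = new LRUCache(64 * 1024 * 1024);
         // assumed ctor from this patch: (buffer size in bytes, backing cache)
         final WriteBufferManager wbm =
             new WriteBufferManager(32 * 1024 * 1024, cache);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setWriteBufferManager(wbm)    // cap total memtable memory
             .setEnablePipelinedWrite(true) // pipeline WAL and memtable writes
             .setTtl(24 * 60 * 60)          // compaction-driven TTL, in seconds
             .setAtomicFlush(true)) {       // flush column families atomically
      // pass options to RocksDB.open(...) as usual
    }
  }
}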
@@ -1905,8 +2098,14 @@ public class Options extends RocksObject private TableFormatConfig tableFormatConfig_; private RateLimiter rateLimiter_; private AbstractComparator> comparator_; + private AbstractCompactionFilter> compactionFilter_; + private AbstractCompactionFilterFactory> + compactionFilterFactory_; private CompactionOptionsUniversal compactionOptionsUniversal_; private CompactionOptionsFIFO compactionOptionsFIFO_; + private CompressionOptions bottommostCompressionOptions_; private CompressionOptions compressionOptions_; private Cache rowCache_; + private WalFilter walFilter_; + private WriteBufferManager writeBufferManager_; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/PersistentCache.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/PersistentCache.java new file mode 100644 index 000000000..aed565297 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/PersistentCache.java @@ -0,0 +1,26 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Persistent cache for caching IO pages on a persistent medium. The + * cache is specifically designed for persistent read cache. + */ +public class PersistentCache extends RocksObject { + + public PersistentCache(final Env env, final String path, final long size, + final Logger logger, final boolean optimizedForNvm) + throws RocksDBException { + super(newPersistentCache(env.nativeHandle_, path, size, + logger.nativeHandle_, optimizedForNvm)); + } + + private native static long newPersistentCache(final long envHandle, + final String path, final long size, final long loggerHandle, + final boolean optimizedForNvm) throws RocksDBException; + + @Override protected final native void disposeInternal(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Priority.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Priority.java new file mode 100644 index 000000000..34a56edcb --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Priority.java @@ -0,0 +1,49 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The Thread Pool priority. + */ +public enum Priority { + BOTTOM((byte) 0x0), + LOW((byte) 0x1), + HIGH((byte)0x2), + TOTAL((byte)0x3); + + private final byte value; + + Priority(final byte value) { + this.value = value; + } + + /** + *
Returns the byte value of the enumeration value.
    + * + * @return byte representation + */ + byte getValue() { + return value; + } + + /** + * Get Priority by byte value. + * + * @param value byte representation of Priority. + * + * @return {@link org.rocksdb.Priority} instance. + * @throws java.lang.IllegalArgumentException if an invalid + * value is provided. + */ + static Priority getPriority(final byte value) { + for (final Priority priority : Priority.values()) { + if (priority.getValue() == value){ + return priority; + } + } + throw new IllegalArgumentException("Illegal value provided for Priority."); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Range.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Range.java new file mode 100644 index 000000000..74c85e5f0 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Range.java @@ -0,0 +1,19 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Range from start to limit. + */ +public class Range { + final Slice start; + final Slice limit; + + public Range(final Slice start, final Slice limit) { + this.start = start; + this.limit = limit; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RateLimiter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RateLimiter.java index 4d6cb2129..c2b8a0fd9 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RateLimiter.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RateLimiter.java @@ -193,9 +193,9 @@ public class RateLimiter extends RocksObject { } /** - *
Total bytes that go though rate limiter.
+ *
Total bytes that go through rate limiter.
    * - * @return total bytes that go though rate limiter. + * @return total bytes that go through rate limiter. */ public long getTotalBytesThrough() { assert(isOwningHandle()); @@ -203,9 +203,9 @@ public class RateLimiter extends RocksObject { } /** - *
Total # of requests that go though rate limiter.
+ *
Total # of requests that go through rate limiter.
    * - * @return total # of requests that go though rate limiter. + * @return total # of requests that go through rate limiter. */ public long getTotalRequests() { assert(isOwningHandle()); diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ReadOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ReadOptions.java index be8aec6b3..8353e0fe8 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ReadOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ReadOptions.java @@ -16,6 +16,15 @@ public class ReadOptions extends RocksObject { super(newReadOptions()); } + /** + * @param verifyChecksums verification will be performed on every read + * when set to true + * @param fillCache if true, then fill-cache behavior will be performed. + */ + public ReadOptions(final boolean verifyChecksums, final boolean fillCache) { + super(newReadOptions(verifyChecksums, fillCache)); + } + /** * Copy constructor. * @@ -26,7 +35,8 @@ public class ReadOptions extends RocksObject { */ public ReadOptions(ReadOptions other) { super(copyReadOptions(other.nativeHandle_)); - iterateUpperBoundSlice_ = other.iterateUpperBoundSlice_; + this.iterateLowerBoundSlice_ = other.iterateLowerBoundSlice_; + this.iterateUpperBoundSlice_ = other.iterateUpperBoundSlice_; } /** @@ -181,8 +191,12 @@ public class ReadOptions extends RocksObject { /** * Returns whether managed iterators will be used. * - * @return the setting of whether managed iterators will be used, by default false + * @return the setting of whether managed iterators will be used, + * by default false + * + * @deprecated This options is not used anymore. */ + @Deprecated public boolean managed() { assert(isOwningHandle()); return managed(nativeHandle_); @@ -195,7 +209,10 @@ public class ReadOptions extends RocksObject { * * @param managed if true, then managed iterators will be enabled. * @return the reference to the current ReadOptions. + * + * @deprecated This options is not used anymore. */ + @Deprecated public ReadOptions setManaged(final boolean managed) { assert(isOwningHandle()); setManaged(nativeHandle_, managed); @@ -237,7 +254,6 @@ public class ReadOptions extends RocksObject { return prefixSameAsStart(nativeHandle_); } - /** * Enforce that the iterator only iterates over the same prefix as the seek. * This option is effective only for prefix seeks, i.e. prefix_extractor is @@ -345,6 +361,37 @@ public class ReadOptions extends RocksObject { return this; } + /** + * A threshold for the number of keys that can be skipped before failing an + * iterator seek as incomplete. + * + * @return the number of keys that can be skipped + * before failing an iterator seek as incomplete. + */ + public long maxSkippableInternalKeys() { + assert(isOwningHandle()); + return maxSkippableInternalKeys(nativeHandle_); + } + + /** + * A threshold for the number of keys that can be skipped before failing an + * iterator seek as incomplete. The default value of 0 should be used to + * never fail a request as incomplete, even on skipping too many keys. + * + * Default: 0 + * + * @param maxSkippableInternalKeys the number of keys that can be skipped + * before failing an iterator seek as incomplete. + * + * @return the reference to the current ReadOptions. 
+ */ + public ReadOptions setMaxSkippableInternalKeys( + final long maxSkippableInternalKeys) { + assert(isOwningHandle()); + setMaxSkippableInternalKeys(nativeHandle_, maxSkippableInternalKeys); + return this; + } + /** * If true, keys deleted using the DeleteRange() API will be visible to * readers until they are naturally deleted during compaction. This improves @@ -377,14 +424,63 @@ public class ReadOptions extends RocksObject { } /** - * Defines the extent upto which the forward iterator can returns entries. - * Once the bound is reached, Valid() will be false. iterate_upper_bound - * is exclusive ie the bound value is not a valid entry. If - * iterator_extractor is not null, the Seek target and iterator_upper_bound + * Defines the smallest key at which the backward + * iterator can return an entry. Once the bound is passed, + * {@link RocksIterator#isValid()} will be false. + * + * The lower bound is inclusive i.e. the bound value is a valid + * entry. + * + * If prefix_extractor is not null, the Seek target and `iterate_lower_bound` + * need to have the same prefix. This is because ordering is not guaranteed + * outside of prefix domain. + * + * Default: null + * + * @param iterateLowerBound Slice representing the upper bound + * @return the reference to the current ReadOptions. + */ + public ReadOptions setIterateLowerBound(final Slice iterateLowerBound) { + assert(isOwningHandle()); + if (iterateLowerBound != null) { + // Hold onto a reference so it doesn't get garbage collected out from under us. + iterateLowerBoundSlice_ = iterateLowerBound; + setIterateLowerBound(nativeHandle_, iterateLowerBoundSlice_.getNativeHandle()); + } + return this; + } + + /** + * Returns the smallest key at which the backward + * iterator can return an entry. + * + * The lower bound is inclusive i.e. the bound value is a valid entry. + * + * @return the smallest key, or null if there is no lower bound defined. + */ + public Slice iterateLowerBound() { + assert(isOwningHandle()); + final long lowerBoundSliceHandle = iterateLowerBound(nativeHandle_); + if (lowerBoundSliceHandle != 0) { + // Disown the new slice - it's owned by the C++ side of the JNI boundary + // from the perspective of this method. + return new Slice(lowerBoundSliceHandle, false); + } + return null; + } + + /** + * Defines the extent up to which the forward iterator + * can returns entries. Once the bound is reached, + * {@link RocksIterator#isValid()} will be false. + * + * The upper bound is exclusive i.e. the bound value is not a valid entry. + * + * If iterator_extractor is not null, the Seek target and iterate_upper_bound * need to have the same prefix. This is because ordering is not guaranteed - * outside of prefix domain. There is no lower bound on the iterator. + * outside of prefix domain. * - * Default: nullptr + * Default: null * * @param iterateUpperBound Slice representing the upper bound * @return the reference to the current ReadOptions. @@ -392,7 +488,7 @@ public class ReadOptions extends RocksObject { public ReadOptions setIterateUpperBound(final Slice iterateUpperBound) { assert(isOwningHandle()); if (iterateUpperBound != null) { - // Hold onto a reference so it doesn't get garbaged collected out from under us. + // Hold onto a reference so it doesn't get garbage collected out from under us. 
iterateUpperBoundSlice_ = iterateUpperBound; setIterateUpperBound(nativeHandle_, iterateUpperBoundSlice_.getNativeHandle()); } @@ -400,21 +496,16 @@ public class ReadOptions extends RocksObject { } /** - * Defines the extent upto which the forward iterator can returns entries. - * Once the bound is reached, Valid() will be false. iterate_upper_bound - * is exclusive ie the bound value is not a valid entry. If - * iterator_extractor is not null, the Seek target and iterator_upper_bound - * need to have the same prefix. This is because ordering is not guaranteed - * outside of prefix domain. There is no lower bound on the iterator. + * Returns the largest key at which the forward + * iterator can return an entry. * - * Default: nullptr + * The upper bound is exclusive i.e. the bound value is not a valid entry. * - * @return Slice representing current iterate_upper_bound setting, or null if - * one does not exist. + * @return the largest key, or null if there is no upper bound defined. */ public Slice iterateUpperBound() { assert(isOwningHandle()); - long upperBoundSliceHandle = iterateUpperBound(nativeHandle_); + final long upperBoundSliceHandle = iterateUpperBound(nativeHandle_); if (upperBoundSliceHandle != 0) { // Disown the new slice - it's owned by the C++ side of the JNI boundary // from the perspective of this method. @@ -423,18 +514,71 @@ public class ReadOptions extends RocksObject { return null; } + /** + * A callback to determine whether relevant keys for this scan exist in a + * given table based on the table's properties. The callback is passed the + * properties of each table during iteration. If the callback returns false, + * the table will not be scanned. This option only affects Iterators and has + * no impact on point lookups. + * + * Default: null (every table will be scanned) + * + * @param tableFilter the table filter for the callback. + * + * @return the reference to the current ReadOptions. + */ + public ReadOptions setTableFilter(final AbstractTableFilter tableFilter) { + assert(isOwningHandle()); + setTableFilter(nativeHandle_, tableFilter.nativeHandle_); + return this; + } + + /** + * Needed to support differential snapshots. Has 2 effects: + * 1) Iterator will skip all internal keys with seqnum < iter_start_seqnum + * 2) if this param > 0 iterator will return INTERNAL keys instead of user + * keys; e.g. return tombstones as well. + * + * Default: 0 (don't filter by seqnum, return user keys) + * + * @param startSeqnum the starting sequence number. + * + * @return the reference to the current ReadOptions. + */ + public ReadOptions setIterStartSeqnum(final long startSeqnum) { + assert(isOwningHandle()); + setIterStartSeqnum(nativeHandle_, startSeqnum); + return this; + } + + /** + * Returns the starting Sequence Number of any iterator. + * See {@link #setIterStartSeqnum(long)}. + * + * @return the starting sequence number of any iterator. + */ + public long iterStartSeqnum() { + assert(isOwningHandle()); + return iterStartSeqnum(nativeHandle_); + } + // instance variables // NOTE: If you add new member variables, please update the copy constructor above! // - // Hold a reference to any iterate upper bound that was set on this object - // until we're destroyed or it's overwritten. That way the caller can freely - // leave scope without us losing the Java Slice object, which during close() - // would also reap its associated rocksdb::Slice native object since it's - // possibly (likely) to be an owning handle. 
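A small sketch of the inclusive-lower/exclusive-upper bound semantics documented above: this scan covers [user:a, user:z). The path and key prefix are illustrative; keeping the Slice objects open inside the try block mirrors the reference-holding the setters perform internally.

import org.rocksdb.*;

public class BoundedScan {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/bounded-scan-db");
         final Slice lower = new Slice("user:a");   // inclusive
         final Slice upper = new Slice("user:z");   // exclusive
         final ReadOptions readOptions = new ReadOptions()
             .setIterateLowerBound(lower)
             .setIterateUpperBound(upper);
         final RocksIterator it = db.newIterator(readOptions)) {
      // isValid() turns false once a bound is reached
      for (it.seekToFirst(); it.isValid(); it.next()) {
        System.out.println(new String(it.key()));
      }
    }
  }
}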
- protected Slice iterateUpperBoundSlice_; + // Hold a reference to any iterate lower or upper bound that was set on this + // object until we're destroyed or it's overwritten. That way the caller can + // freely leave scope without us losing the Java Slice object, which during + // close() would also reap its associated rocksdb::Slice native object since + // it's possibly (likely) to be an owning handle. + private Slice iterateLowerBoundSlice_; + private Slice iterateUpperBoundSlice_; private native static long newReadOptions(); + private native static long newReadOptions(final boolean verifyChecksums, + final boolean fillCache); private native static long copyReadOptions(long handle); + @Override protected final native void disposeInternal(final long handle); + private native boolean verifyChecksums(long handle); private native void setVerifyChecksums(long handle, boolean verifyChecksums); private native boolean fillCache(long handle); @@ -459,13 +603,20 @@ public class ReadOptions extends RocksObject { private native long readaheadSize(final long handle); private native void setReadaheadSize(final long handle, final long readaheadSize); + private native long maxSkippableInternalKeys(final long handle); + private native void setMaxSkippableInternalKeys(final long handle, + final long maxSkippableInternalKeys); private native boolean ignoreRangeDeletions(final long handle); private native void setIgnoreRangeDeletions(final long handle, final boolean ignoreRangeDeletions); private native void setIterateUpperBound(final long handle, final long upperBoundSliceHandle); private native long iterateUpperBound(final long handle); - - @Override protected final native void disposeInternal(final long handle); - + private native void setIterateLowerBound(final long handle, + final long lowerBoundSliceHandle); + private native long iterateLowerBound(final long handle); + private native void setTableFilter(final long handle, + final long tableFilterHandle); + private native void setIterStartSeqnum(final long handle, final long seqNum); + private native long iterStartSeqnum(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksDB.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksDB.java index 38be3333f..b93a51e28 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksDB.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksDB.java @@ -7,7 +7,6 @@ package org.rocksdb; import java.util.*; import java.io.IOException; -import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicReference; import org.rocksdb.util.Environment; @@ -64,8 +63,8 @@ public class RocksDB extends RocksObject { NativeLibraryLoader.getInstance().loadLibrary(tmpDir); } catch (IOException e) { libraryLoaded.set(LibraryState.NOT_LOADED); - throw new RuntimeException("Unable to load the RocksDB shared library" - + e); + throw new RuntimeException("Unable to load the RocksDB shared library", + e); } libraryLoaded.set(LibraryState.LOADED); @@ -139,6 +138,15 @@ public class RocksDB extends RocksObject { } } + /** + * Private constructor. 
+ * + * @param nativeHandle The native handle of the C++ RocksDB object + */ + protected RocksDB(final long nativeHandle) { + super(nativeHandle); + } + /** * The factory constructor of RocksDB that opens a RocksDB instance given * the path to the database using the default options w/ createIfMissing @@ -153,9 +161,7 @@ public class RocksDB extends RocksObject { * @see Options#setCreateIfMissing(boolean) */ public static RocksDB open(final String path) throws RocksDBException { - // This allows to use the rocksjni default Options instead of - // the c++ one. - Options options = new Options(); + final Options options = new Options(); options.setCreateIfMissing(true); return open(options, path); } @@ -193,9 +199,7 @@ public class RocksDB extends RocksObject { final List columnFamilyDescriptors, final List columnFamilyHandles) throws RocksDBException { - // This allows to use the rocksjni default Options instead of - // the c++ one. - DBOptions options = new DBOptions(); + final DBOptions options = new DBOptions(); return open(options, path, columnFamilyDescriptors, columnFamilyHandles); } @@ -418,6 +422,54 @@ public class RocksDB extends RocksObject { return db; } + + /** + * This is similar to {@link #close()} except that it + * throws an exception if any error occurs. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. + * + * @throws RocksDBException if an error occurs whilst closing. + */ + public void closeE() throws RocksDBException { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } finally { + disposeInternal(); + } + } + } + + /** + * This is similar to {@link #closeE()} except that it + * silently ignores any errors. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. + */ + @Override + public void close() { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } catch (final RocksDBException e) { + // silently ignore the error report + } finally { + disposeInternal(); + } + } + } + /** * Static method to determine all available column families for a * rocksdb database identified by path @@ -435,10 +487,108 @@ public class RocksDB extends RocksObject { path)); } - protected void storeOptionsInstance(DBOptionsInterface options) { - options_ = options; + /** + * Creates a new column family with the name columnFamilyName and + * allocates a ColumnFamilyHandle within an internal structure. + * The ColumnFamilyHandle is automatically disposed with DB disposal. + * + * @param columnFamilyDescriptor column family to be created. + * @return {@link org.rocksdb.ColumnFamilyHandle} instance. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. 
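A sketch of the close()/closeE() split introduced here: closeE() propagates failures observed while closing, whereas close() swallows them; neither variant syncs the WAL. The path and key are illustrative.

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class CloseSketch {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true)) {
      final RocksDB db = RocksDB.open(options, "/tmp/closee-example-db");
      try {
        db.put("k".getBytes(), "v".getBytes());
      } finally {
        try {
          db.closeE();          // surfaces errors raised while closing
        } catch (final RocksDBException e) {
          // close() would have silently ignored this
          System.err.println("close failed: " + e);
        }
      }
    }
  }
}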
+ */ + public ColumnFamilyHandle createColumnFamily( + final ColumnFamilyDescriptor columnFamilyDescriptor) + throws RocksDBException { + return new ColumnFamilyHandle(this, createColumnFamily(nativeHandle_, + columnFamilyDescriptor.getName(), + columnFamilyDescriptor.getName().length, + columnFamilyDescriptor.getOptions().nativeHandle_)); + } + + /** + * Bulk create column families with the same column family options. + * + * @param columnFamilyOptions the options for the column families. + * @param columnFamilyNames the names of the column families. + * + * @return the handles to the newly created column families. + */ + public List createColumnFamilies( + final ColumnFamilyOptions columnFamilyOptions, + final List columnFamilyNames) throws RocksDBException { + final byte[][] cfNames = columnFamilyNames.toArray( + new byte[0][]); + final long[] cfHandles = createColumnFamilies(nativeHandle_, + columnFamilyOptions.nativeHandle_, cfNames); + final List columnFamilyHandles = + new ArrayList<>(cfHandles.length); + for (int i = 0; i < cfHandles.length; i++) { + columnFamilyHandles.add(new ColumnFamilyHandle(this, cfHandles[i])); + } + return columnFamilyHandles; + } + + /** + * Bulk create column families with the same column family options. + * + * @param columnFamilyDescriptors the descriptions of the column families. + * + * @return the handles to the newly created column families. + */ + public List createColumnFamilies( + final List columnFamilyDescriptors) + throws RocksDBException { + final long[] cfOptsHandles = new long[columnFamilyDescriptors.size()]; + final byte[][] cfNames = new byte[columnFamilyDescriptors.size()][]; + for (int i = 0; i < columnFamilyDescriptors.size(); i++) { + final ColumnFamilyDescriptor columnFamilyDescriptor + = columnFamilyDescriptors.get(i); + cfOptsHandles[i] = columnFamilyDescriptor.getOptions().nativeHandle_; + cfNames[i] = columnFamilyDescriptor.getName(); + } + final long[] cfHandles = createColumnFamilies(nativeHandle_, + cfOptsHandles, cfNames); + final List columnFamilyHandles = + new ArrayList<>(cfHandles.length); + for (int i = 0; i < cfHandles.length; i++) { + columnFamilyHandles.add(new ColumnFamilyHandle(this, cfHandles[i])); + } + return columnFamilyHandles; + } + + /** + * Drops the column family specified by {@code columnFamilyHandle}. This call + * only records a drop record in the manifest and prevents the column + * family from flushing and compacting. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public void dropColumnFamily(final ColumnFamilyHandle columnFamilyHandle) + throws RocksDBException { + dropColumnFamily(nativeHandle_, columnFamilyHandle.nativeHandle_); + } + + // Bulk drop column families. This call only records drop records in the + // manifest and prevents the column families from flushing and compacting. + // In case of error, the request may succeed partially. User may call + // ListColumnFamilies to check the result. + public void dropColumnFamilies( + final List columnFamilies) throws RocksDBException { + final long[] cfHandles = new long[columnFamilies.size()]; + for (int i = 0; i < columnFamilies.size(); i++) { + cfHandles[i] = columnFamilies.get(i).nativeHandle_; + } + dropColumnFamilies(nativeHandle_, cfHandles); } + //TODO(AR) what about DestroyColumnFamilyHandle + /** * Set the database entry for "key" to "value". 
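The bulk column-family helpers above can be exercised as in the following sketch: create several families sharing one ColumnFamilyOptions, then drop them in bulk. The family names and path are illustrative.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import org.rocksdb.*;

public class ColumnFamilyBulk {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/cf-bulk-db");
         final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions()) {
      final List<ColumnFamilyHandle> handles = db.createColumnFamilies(cfOpts,
          Arrays.asList("users".getBytes(StandardCharsets.UTF_8),
                        "orders".getBytes(StandardCharsets.UTF_8)));
      // ... use the handles ...
      db.dropColumnFamilies(handles); // records drop markers in the manifest
      for (final ColumnFamilyHandle handle : handles) {
        handle.close();
      }
    }
  }
}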
* @@ -453,6 +603,32 @@ public class RocksDB extends RocksObject { put(nativeHandle_, key, 0, key.length, value, 0, value.length); } + /** + * Set the database entry for "key" to "value". + * + * @param key The specified key to be inserted + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value associated with the specified key + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) + * + * @throws RocksDBException thrown if errors happens in underlying native + * library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds + */ + public void put(final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) + throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + put(nativeHandle_, key, offset, len, value, vOffset, vLen); + } + /** * Set the database entry for "key" to "value" in the specified * column family. @@ -473,6 +649,37 @@ public class RocksDB extends RocksObject { columnFamilyHandle.nativeHandle_); } + /** + * Set the database entry for "key" to "value" in the specified + * column family. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key The specified key to be inserted + * @param offset the offset of the "key" array to be used, must + * be non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value associated with the specified key + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) + * + * @throws RocksDBException thrown if errors happens in underlying native + * library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds + */ + public void put(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) + throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + put(nativeHandle_, key, offset, len, value, vOffset, vLen, + columnFamilyHandle.nativeHandle_); + } + /** * Set the database entry for "key" to "value". * @@ -489,6 +696,35 @@ public class RocksDB extends RocksObject { key, 0, key.length, value, 0, value.length); } + /** + * Set the database entry for "key" to "value". + * + * @param writeOpts {@link org.rocksdb.WriteOptions} instance. 
+ * @param key The specified key to be inserted + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value associated with the specified key + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds + */ + public void put(final WriteOptions writeOpts, + final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) + throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + put(nativeHandle_, writeOpts.nativeHandle_, + key, offset, len, value, vOffset, vLen); + } + /** * Set the database entry for "key" to "value" for the specified * column family. @@ -513,1034 +749,1611 @@ public class RocksDB extends RocksObject { } /** - * If the key definitely does not exist in the database, then this method - * returns false, else true. + * Set the database entry for "key" to "value" for the specified + * column family. * - * This check is potentially lighter-weight than invoking DB::Get(). One way - * to make this lighter weight is to avoid doing any IOs. + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param writeOpts {@link org.rocksdb.WriteOptions} instance. + * @param key The specified key to be inserted + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value associated with the specified key + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) * - * @param key byte array of a key to search for - * @param value StringBuilder instance which is a out parameter if a value is - * found in block-cache. - * @return boolean value indicating if key does not exist or might exist. + * @throws RocksDBException thrown if error happens in underlying + * native library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds */ - public boolean keyMayExist(final byte[] key, final StringBuilder value) { - return keyMayExist(nativeHandle_, key, 0, key.length, value); + public void put(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpts, + final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) + throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + put(nativeHandle_, writeOpts.nativeHandle_, key, offset, len, value, + vOffset, vLen, columnFamilyHandle.nativeHandle_); } /** - * If the key definitely does not exist in the database, then this method - * returns false, else true. + * Remove the database entry (if any) for "key". 
Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * This check is potentially lighter-weight than invoking DB::Get(). One way - * to make this lighter weight is to avoid doing any IOs. + * @param key Key to delete within database * - * @param columnFamilyHandle {@link ColumnFamilyHandle} instance - * @param key byte array of a key to search for - * @param value StringBuilder instance which is a out parameter if a value is - * found in block-cache. - * @return boolean value indicating if key does not exist or might exist. + * @throws RocksDBException thrown if error happens in underlying + * native library. + * + * @deprecated Use {@link #delete(byte[])} */ - public boolean keyMayExist(final ColumnFamilyHandle columnFamilyHandle, - final byte[] key, final StringBuilder value) { - return keyMayExist(nativeHandle_, key, 0, key.length, - columnFamilyHandle.nativeHandle_, value); + @Deprecated + public void remove(final byte[] key) throws RocksDBException { + delete(key); } /** - * If the key definitely does not exist in the database, then this method - * returns false, else true. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * This check is potentially lighter-weight than invoking DB::Get(). One way - * to make this lighter weight is to avoid doing any IOs. + * @param key Key to delete within database * - * @param readOptions {@link ReadOptions} instance - * @param key byte array of a key to search for - * @param value StringBuilder instance which is a out parameter if a value is - * found in block-cache. - * @return boolean value indicating if key does not exist or might exist. + * @throws RocksDBException thrown if error happens in underlying + * native library. */ - public boolean keyMayExist(final ReadOptions readOptions, - final byte[] key, final StringBuilder value) { - return keyMayExist(nativeHandle_, readOptions.nativeHandle_, - key, 0, key.length, value); + public void delete(final byte[] key) throws RocksDBException { + delete(nativeHandle_, key, 0, key.length); } /** - * If the key definitely does not exist in the database, then this method - * returns false, else true. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * This check is potentially lighter-weight than invoking DB::Get(). One way - * to make this lighter weight is to avoid doing any IOs. + * @param key Key to delete within database + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be + * non-negative and no larger than ("key".length - offset) * - * @param readOptions {@link ReadOptions} instance - * @param columnFamilyHandle {@link ColumnFamilyHandle} instance - * @param key byte array of a key to search for - * @param value StringBuilder instance which is a out parameter if a value is - * found in block-cache. - * @return boolean value indicating if key does not exist or might exist. + * @throws RocksDBException thrown if error happens in underlying + * native library. 
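The new offset/length variants of put() and delete() read the key and value from sub-ranges of the given arrays, avoiding a copy. A minimal sketch, with illustrative buffer contents:

import org.rocksdb.*;

public class OffsetPut {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/offset-put-db")) {
      final byte[] buf = "xxkey1value1".getBytes();
      // key = buf[2..6) = "key1", value = buf[6..12) = "value1"
      db.put(buf, 2, 4, buf, 6, 6);
      // delete via the same key sub-range; out-of-range offsets fail the
      // bounds check and throw IndexOutOfBoundsException before native code
      db.delete(buf, 2, 4);
    }
  }
}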
*/ - public boolean keyMayExist(final ReadOptions readOptions, - final ColumnFamilyHandle columnFamilyHandle, final byte[] key, - final StringBuilder value) { - return keyMayExist(nativeHandle_, readOptions.nativeHandle_, - key, 0, key.length, columnFamilyHandle.nativeHandle_, - value); + public void delete(final byte[] key, final int offset, final int len) + throws RocksDBException { + delete(nativeHandle_, key, offset, len); } /** - * Apply the specified updates to the database. + * Remove the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * @param writeOpts WriteOptions instance - * @param updates WriteBatch instance + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying * native library. + * + * @deprecated Use {@link #delete(ColumnFamilyHandle, byte[])} */ - public void write(final WriteOptions writeOpts, final WriteBatch updates) - throws RocksDBException { - write0(nativeHandle_, writeOpts.nativeHandle_, updates.nativeHandle_); + @Deprecated + public void remove(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key) throws RocksDBException { + delete(columnFamilyHandle, key); } /** - * Apply the specified updates to the database. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * @param writeOpts WriteOptions instance - * @param updates WriteBatchWithIndex instance + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public void write(final WriteOptions writeOpts, - final WriteBatchWithIndex updates) throws RocksDBException { - write1(nativeHandle_, writeOpts.nativeHandle_, updates.nativeHandle_); + public void delete(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key) throws RocksDBException { + delete(nativeHandle_, key, 0, key.length, columnFamilyHandle.nativeHandle_); } /** - * Add merge operand for key/value pair. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * @param key the specified key to be merged. - * @param value the value to be merged with the current value for - * the specified key. + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key Key to delete within database + * @param offset the offset of the "key" array to be used, + * must be non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("value".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public void merge(final byte[] key, final byte[] value) + public void delete(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, final int offset, final int len) throws RocksDBException { - merge(nativeHandle_, key, 0, key.length, value, 0, value.length); + delete(nativeHandle_, key, offset, len, columnFamilyHandle.nativeHandle_); } /** - * Add merge operand for key/value pair in a ColumnFamily. + * Remove the database entry (if any) for "key". 
Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * @param columnFamilyHandle {@link ColumnFamilyHandle} instance - * @param key the specified key to be merged. - * @param value the value to be merged with the current value for - * the specified key. + * @param writeOpt WriteOptions to be used with delete operation + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying * native library. + * + * @deprecated Use {@link #delete(WriteOptions, byte[])} */ - public void merge(final ColumnFamilyHandle columnFamilyHandle, - final byte[] key, final byte[] value) throws RocksDBException { - merge(nativeHandle_, key, 0, key.length, value, 0, value.length, - columnFamilyHandle.nativeHandle_); + @Deprecated + public void remove(final WriteOptions writeOpt, final byte[] key) + throws RocksDBException { + delete(writeOpt, key); } /** - * Add merge operand for key/value pair. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * @param writeOpts {@link WriteOptions} for this write. - * @param key the specified key to be merged. - * @param value the value to be merged with the current value for - * the specified key. + * @param writeOpt WriteOptions to be used with delete operation + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public void merge(final WriteOptions writeOpts, final byte[] key, - final byte[] value) throws RocksDBException { - merge(nativeHandle_, writeOpts.nativeHandle_, - key, 0, key.length, value, 0, value.length); + public void delete(final WriteOptions writeOpt, final byte[] key) + throws RocksDBException { + delete(nativeHandle_, writeOpt.nativeHandle_, key, 0, key.length); } /** - * Add merge operand for key/value pair. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * - * @param columnFamilyHandle {@link ColumnFamilyHandle} instance - * @param writeOpts {@link WriteOptions} for this write. - * @param key the specified key to be merged. - * @param value the value to be merged with the current value for - * the specified key. + * @param writeOpt WriteOptions to be used with delete operation + * @param key Key to delete within database + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be + * non-negative and no larger than ("key".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. 
*/ - public void merge(final ColumnFamilyHandle columnFamilyHandle, - final WriteOptions writeOpts, final byte[] key, - final byte[] value) throws RocksDBException { - merge(nativeHandle_, writeOpts.nativeHandle_, - key, 0, key.length, value, 0, value.length, - columnFamilyHandle.nativeHandle_); + public void delete(final WriteOptions writeOpt, final byte[] key, + final int offset, final int len) throws RocksDBException { + delete(nativeHandle_, writeOpt.nativeHandle_, key, offset, len); } - // TODO(AR) we should improve the #get() API, returning -1 (RocksDB.NOT_FOUND) is not very nice - // when we could communicate better status into, also the C++ code show that -2 could be returned - /** - * Get the value associated with the specified key within column family* - * @param key the key to retrieve the value. - * @param value the out-value to receive the retrieved value. - * @return The size of the actual value that matches the specified - * {@code key} in byte. If the return value is greater than the - * length of {@code value}, then it indicates that the size of the - * input buffer {@code value} is insufficient and partial result will - * be returned. RocksDB.NOT_FOUND will be returned if the value not - * found. + * Remove the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param writeOpt WriteOptions to be used with delete operation + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying * native library. + * + * @deprecated Use {@link #delete(ColumnFamilyHandle, WriteOptions, byte[])} */ - public int get(final byte[] key, final byte[] value) throws RocksDBException { - return get(nativeHandle_, key, 0, key.length, value, 0, value.length); + @Deprecated + public void remove(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpt, final byte[] key) throws RocksDBException { + delete(columnFamilyHandle, writeOpt, key); } /** - * Get the value associated with the specified key within column family. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} * instance - * @param key the key to retrieve the value. - * @param value the out-value to receive the retrieved value. - * @return The size of the actual value that matches the specified - * {@code key} in byte. If the return value is greater than the - * length of {@code value}, then it indicates that the size of the - * input buffer {@code value} is insufficient and partial result will - * be returned. RocksDB.NOT_FOUND will be returned if the value not - * found. + * @param writeOpt WriteOptions to be used with delete operation + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying * native library. 
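For the merge() calls being reorganized here, a usage sketch: merge() appends an operand for a key and only makes sense with a merge operator configured, here the bundled StringAppendOperator. Path and keys are illustrative.

import org.rocksdb.*;

public class MergeSketch {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final StringAppendOperator stringAppend = new StringAppendOperator();
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setMergeOperator(stringAppend);
         final RocksDB db = RocksDB.open(options, "/tmp/merge-example-db")) {
      db.put("tags".getBytes(), "red".getBytes());
      db.merge("tags".getBytes(), "blue".getBytes());
      // prints "red,blue" with the operator's default delimiter
      System.out.println(new String(db.get("tags".getBytes())));
    }
  }
}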
*/ - public int get(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, - final byte[] value) throws RocksDBException, IllegalArgumentException { - return get(nativeHandle_, key, 0, key.length, value, 0, value.length, + public void delete(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpt, final byte[] key) + throws RocksDBException { + delete(nativeHandle_, writeOpt.nativeHandle_, key, 0, key.length, columnFamilyHandle.nativeHandle_); } /** - * Get the value associated with the specified key. - * - * @param opt {@link org.rocksdb.ReadOptions} instance. - * @param key the key to retrieve the value. - * @param value the out-value to receive the retrieved value. - * @return The size of the actual value that matches the specified - * {@code key} in byte. If the return value is greater than the - * length of {@code value}, then it indicates that the size of the - * input buffer {@code value} is insufficient and partial result will - * be returned. RocksDB.NOT_FOUND will be returned if the value not - * found. - * - * @throws RocksDBException thrown if error happens in underlying - * native library. - */ - public int get(final ReadOptions opt, final byte[] key, - final byte[] value) throws RocksDBException { - return get(nativeHandle_, opt.nativeHandle_, - key, 0, key.length, value, 0, value.length); - } - /** - * Get the value associated with the specified key within column family. + * Delete the database entry (if any) for "key". Returns OK on + * success, and a non-OK status on error. It is not an error if "key" + * did not exist in the database. * * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} * instance - * @param opt {@link org.rocksdb.ReadOptions} instance. - * @param key the key to retrieve the value. - * @param value the out-value to receive the retrieved value. - * @return The size of the actual value that matches the specified - * {@code key} in byte. If the return value is greater than the - * length of {@code value}, then it indicates that the size of the - * input buffer {@code value} is insufficient and partial result will - * be returned. RocksDB.NOT_FOUND will be returned if the value not - * found. + * @param writeOpt WriteOptions to be used with delete operation + * @param key Key to delete within database + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be + * non-negative and no larger than ("key".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public int get(final ColumnFamilyHandle columnFamilyHandle, - final ReadOptions opt, final byte[] key, final byte[] value) - throws RocksDBException { - return get(nativeHandle_, opt.nativeHandle_, key, 0, key.length, value, - 0, value.length, columnFamilyHandle.nativeHandle_); + public void delete(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpt, final byte[] key, final int offset, + final int len) throws RocksDBException { + delete(nativeHandle_, writeOpt.nativeHandle_, key, offset, len, + columnFamilyHandle.nativeHandle_); } /** - * The simplified version of get which returns a new byte array storing - * the value associated with the specified input key if any. null will be - * returned if the specified key is not found. + * Remove the database entry for {@code key}. Requires that the key exists + * and was not overwritten. 
It is not an error if the key did not exist + * in the database. * - * @param key the key retrieve the value. - * @return a byte array storing the value associated with the input key if - * any. null if it does not find the specified key. + * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple + * times), then the result of calling SingleDelete() on this key is undefined. + * SingleDelete() only behaves correctly if there has been only one Put() + * for this key since the previous call to SingleDelete() for this key. + * + * This feature is currently an experimental performance optimization + * for a very specific workload. It is up to the caller to ensure that + * SingleDelete is only used for a key that is not deleted using Delete() or + * written using Merge(). Mixing SingleDelete operations with Deletes and + * Merges can result in undefined behavior. + * + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - public byte[] get(final byte[] key) throws RocksDBException { - return get(nativeHandle_, key, 0, key.length); + @Experimental("Performance optimization for a very specific workload") + public void singleDelete(final byte[] key) throws RocksDBException { + singleDelete(nativeHandle_, key, key.length); } /** - * The simplified version of get which returns a new byte array storing - * the value associated with the specified input key if any. null will be - * returned if the specified key is not found. + * Remove the database entry for {@code key}. Requires that the key exists + * and was not overwritten. It is not an error if the key did not exist + * in the database. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param key the key retrieve the value. - * @return a byte array storing the value associated with the input key if - * any. null if it does not find the specified key. + * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple + * times), then the result of calling SingleDelete() on this key is undefined. + * SingleDelete() only behaves correctly if there has been only one Put() + * for this key since the previous call to SingleDelete() for this key. + * + * This feature is currently an experimental performance optimization + * for a very specific workload. It is up to the caller to ensure that + * SingleDelete is only used for a key that is not deleted using Delete() or + * written using Merge(). Mixing SingleDelete operations with Deletes and + * Merges can result in undefined behavior. + * + * @param columnFamilyHandle The column family to delete the key from + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - public byte[] get(final ColumnFamilyHandle columnFamilyHandle, + @Experimental("Performance optimization for a very specific workload") + public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, final byte[] key) throws RocksDBException { - return get(nativeHandle_, key, 0, key.length, + singleDelete(nativeHandle_, key, key.length, columnFamilyHandle.nativeHandle_); } /** - * The simplified version of get which returns a new byte array storing - * the value associated with the specified input key if any. null will be - * returned if the specified key is not found. + * Remove the database entry for {@code key}. Requires that the key exists + * and was not overwritten. 
It is not an error if the key did not exist + * in the database. * - * @param key the key retrieve the value. - * @param opt Read options. - * @return a byte array storing the value associated with the input key if - * any. null if it does not find the specified key. + * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple + * times), then the result of calling SingleDelete() on this key is undefined. + * SingleDelete() only behaves correctly if there has been only one Put() + * for this key since the previous call to SingleDelete() for this key. * - * @throws RocksDBException thrown if error happens in underlying - * native library. - */ - public byte[] get(final ReadOptions opt, final byte[] key) - throws RocksDBException { - return get(nativeHandle_, opt.nativeHandle_, key, 0, key.length); - } - - /** - * The simplified version of get which returns a new byte array storing - * the value associated with the specified input key if any. null will be - * returned if the specified key is not found. + * This feature is currently an experimental performance optimization + * for a very specific workload. It is up to the caller to ensure that + * SingleDelete is only used for a key that is not deleted using Delete() or + * written using Merge(). Mixing SingleDelete operations with Deletes and + * Merges can result in undefined behavior. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param key the key retrieve the value. - * @param opt Read options. - * @return a byte array storing the value associated with the input key if - * any. null if it does not find the specified key. + * Note: consider setting {@link WriteOptions#setSync(boolean)} true. + * + * @param writeOpt Write options for the delete + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - public byte[] get(final ColumnFamilyHandle columnFamilyHandle, - final ReadOptions opt, final byte[] key) throws RocksDBException { - return get(nativeHandle_, opt.nativeHandle_, key, 0, key.length, - columnFamilyHandle.nativeHandle_); + @Experimental("Performance optimization for a very specific workload") + public void singleDelete(final WriteOptions writeOpt, final byte[] key) + throws RocksDBException { + singleDelete(nativeHandle_, writeOpt.nativeHandle_, key, key.length); } /** - * Returns a map of keys for which values were found in DB. + * Remove the database entry for {@code key}. Requires that the key exists + * and was not overwritten. It is not an error if the key did not exist + * in the database. * - * @param keys List of keys for which values need to be retrieved. - * @return Map where key of map is the key passed by user and value for map - * entry is the corresponding value in DB. + * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple + * times), then the result of calling SingleDelete() on this key is undefined. + * SingleDelete() only behaves correctly if there has been only one Put() + * for this key since the previous call to SingleDelete() for this key. + * + * This feature is currently an experimental performance optimization + * for a very specific workload. It is up to the caller to ensure that + * SingleDelete is only used for a key that is not deleted using Delete() or + * written using Merge(). Mixing SingleDelete operations with Deletes and + * Merges can result in undefined behavior. 
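As a concrete illustration of this contract, here is a minimal, hypothetical sketch of the one safe usage pattern; the path, key, and class name are made up for the example:

// Illustrative sketch only: the one safe SingleDelete pattern is a single
// Put followed by a single SingleDelete for the same key.
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SingleDeleteSketch {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/singledelete-sketch")) {
      final byte[] key = "k1".getBytes();
      db.put(key, "v1".getBytes()); // exactly one Put for this key
      db.singleDelete(key);         // removes that single version
      // Calling put(key, ...) twice before singleDelete(key) would make the
      // outcome undefined, per the contract documented above.
    }
  }
}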
+ * + * Note: consider setting {@link WriteOptions#setSync(boolean)} true. + * + * @param columnFamilyHandle The column family to delete the key from + * @param writeOpt Write options for the delete + * @param key Key to delete within database * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - public Map multiGet(final List keys) - throws RocksDBException { - assert(keys.size() != 0); - - final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); - final int keyOffsets[] = new int[keysArray.length]; - final int keyLengths[] = new int[keysArray.length]; - for(int i = 0; i < keyLengths.length; i++) { - keyLengths[i] = keysArray[i].length; - } - - final byte[][] values = multiGet(nativeHandle_, keysArray, keyOffsets, - keyLengths); - - final Map keyValueMap = - new HashMap<>(computeCapacityHint(values.length)); - for(int i = 0; i < values.length; i++) { - if(values[i] == null) { - continue; - } - - keyValueMap.put(keys.get(i), values[i]); - } - - return keyValueMap; + @Experimental("Performance optimization for a very specific workload") + public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpt, final byte[] key) throws RocksDBException { + singleDelete(nativeHandle_, writeOpt.nativeHandle_, key, key.length, + columnFamilyHandle.nativeHandle_); } - private static int computeCapacityHint(final int estimatedNumberOfItems) { - // Default load factor for HashMap is 0.75, so N * 1.5 will be at the load - // limit. We add +1 for a buffer. - return (int)Math.ceil(estimatedNumberOfItems * 1.5 + 1.0); - } /** - * Returns a map of keys for which values were found in DB. - *
<p>
- * Note: Every key needs to have a related column family name in
- * {@code columnFamilyHandleList}.
- * </p>
    + * Removes the database entries in the range ["beginKey", "endKey"), i.e., + * including "beginKey" and excluding "endKey". a non-OK status on error. It + * is not an error if no keys exist in the range ["beginKey", "endKey"). * - * @param columnFamilyHandleList {@link java.util.List} containing - * {@link org.rocksdb.ColumnFamilyHandle} instances. - * @param keys List of keys for which values need to be retrieved. - * @return Map where key of map is the key passed by user and value for map - * entry is the corresponding value in DB. + * Delete the database entry (if any) for "key". Returns OK on success, and a + * non-OK status on error. It is not an error if "key" did not exist in the + * database. * - * @throws RocksDBException thrown if error happens in underlying - * native library. - * @throws IllegalArgumentException thrown if the size of passed keys is not - * equal to the amount of passed column family handles. + * @param beginKey First key to delete within database (inclusive) + * @param endKey Last key to delete within database (exclusive) + * + * @throws RocksDBException thrown if error happens in underlying native + * library. */ - public Map multiGet( - final List columnFamilyHandleList, - final List keys) throws RocksDBException, - IllegalArgumentException { - assert(keys.size() != 0); - // Check if key size equals cfList size. If not a exception must be - // thrown. If not a Segmentation fault happens. - if (keys.size() != columnFamilyHandleList.size()) { - throw new IllegalArgumentException( - "For each key there must be a ColumnFamilyHandle."); - } - final long[] cfHandles = new long[columnFamilyHandleList.size()]; - for (int i = 0; i < columnFamilyHandleList.size(); i++) { - cfHandles[i] = columnFamilyHandleList.get(i).nativeHandle_; - } - - final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); - final int keyOffsets[] = new int[keysArray.length]; - final int keyLengths[] = new int[keysArray.length]; - for(int i = 0; i < keyLengths.length; i++) { - keyLengths[i] = keysArray[i].length; - } - - final byte[][] values = multiGet(nativeHandle_, keysArray, keyOffsets, - keyLengths, cfHandles); - - final Map keyValueMap = - new HashMap<>(computeCapacityHint(values.length)); - for(int i = 0; i < values.length; i++) { - if (values[i] == null) { - continue; - } - keyValueMap.put(keys.get(i), values[i]); - } - return keyValueMap; + public void deleteRange(final byte[] beginKey, final byte[] endKey) + throws RocksDBException { + deleteRange(nativeHandle_, beginKey, 0, beginKey.length, endKey, 0, + endKey.length); } /** - * Returns a map of keys for which values were found in DB. + * Removes the database entries in the range ["beginKey", "endKey"), i.e., + * including "beginKey" and excluding "endKey". a non-OK status on error. It + * is not an error if no keys exist in the range ["beginKey", "endKey"). * - * @param opt Read options. - * @param keys of keys for which values need to be retrieved. - * @return Map where key of map is the key passed by user and value for map - * entry is the corresponding value in DB. + * Delete the database entry (if any) for "key". Returns OK on success, and a + * non-OK status on error. It is not an error if "key" did not exist in the + * database. * - * @throws RocksDBException thrown if error happens in underlying - * native library. 
+ * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} instance + * @param beginKey First key to delete within database (inclusive) + * @param endKey Last key to delete within database (exclusive) + * + * @throws RocksDBException thrown if error happens in underlying native + * library. */ - public Map multiGet(final ReadOptions opt, - final List keys) throws RocksDBException { - assert(keys.size() != 0); - - final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); - final int keyOffsets[] = new int[keysArray.length]; - final int keyLengths[] = new int[keysArray.length]; - for(int i = 0; i < keyLengths.length; i++) { - keyLengths[i] = keysArray[i].length; - } - - final byte[][] values = multiGet(nativeHandle_, opt.nativeHandle_, - keysArray, keyOffsets, keyLengths); - - final Map keyValueMap = - new HashMap<>(computeCapacityHint(values.length)); - for(int i = 0; i < values.length; i++) { - if(values[i] == null) { - continue; - } - - keyValueMap.put(keys.get(i), values[i]); - } - - return keyValueMap; + public void deleteRange(final ColumnFamilyHandle columnFamilyHandle, + final byte[] beginKey, final byte[] endKey) throws RocksDBException { + deleteRange(nativeHandle_, beginKey, 0, beginKey.length, endKey, 0, + endKey.length, columnFamilyHandle.nativeHandle_); } /** - * Returns a map of keys for which values were found in DB. - *
<p>
- * Note: Every key needs to have a related column family name in
- * {@code columnFamilyHandleList}.
- * </p>
    + * Removes the database entries in the range ["beginKey", "endKey"), i.e., + * including "beginKey" and excluding "endKey". a non-OK status on error. It + * is not an error if no keys exist in the range ["beginKey", "endKey"). * - * @param opt Read options. - * @param columnFamilyHandleList {@link java.util.List} containing - * {@link org.rocksdb.ColumnFamilyHandle} instances. - * @param keys of keys for which values need to be retrieved. - * @return Map where key of map is the key passed by user and value for map - * entry is the corresponding value in DB. + * Delete the database entry (if any) for "key". Returns OK on success, and a + * non-OK status on error. It is not an error if "key" did not exist in the + * database. + * + * @param writeOpt WriteOptions to be used with delete operation + * @param beginKey First key to delete within database (inclusive) + * @param endKey Last key to delete within database (exclusive) * * @throws RocksDBException thrown if error happens in underlying - * native library. - * @throws IllegalArgumentException thrown if the size of passed keys is not - * equal to the amount of passed column family handles. + * native library. */ - public Map multiGet(final ReadOptions opt, - final List columnFamilyHandleList, - final List keys) throws RocksDBException { - assert(keys.size() != 0); - // Check if key size equals cfList size. If not a exception must be - // thrown. If not a Segmentation fault happens. - if (keys.size()!=columnFamilyHandleList.size()){ - throw new IllegalArgumentException( - "For each key there must be a ColumnFamilyHandle."); - } - final long[] cfHandles = new long[columnFamilyHandleList.size()]; - for (int i = 0; i < columnFamilyHandleList.size(); i++) { - cfHandles[i] = columnFamilyHandleList.get(i).nativeHandle_; - } - - final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); - final int keyOffsets[] = new int[keysArray.length]; - final int keyLengths[] = new int[keysArray.length]; - for(int i = 0; i < keyLengths.length; i++) { - keyLengths[i] = keysArray[i].length; - } - - final byte[][] values = multiGet(nativeHandle_, opt.nativeHandle_, - keysArray, keyOffsets, keyLengths, cfHandles); - - final Map keyValueMap - = new HashMap<>(computeCapacityHint(values.length)); - for(int i = 0; i < values.length; i++) { - if(values[i] == null) { - continue; - } - keyValueMap.put(keys.get(i), values[i]); - } - - return keyValueMap; + public void deleteRange(final WriteOptions writeOpt, final byte[] beginKey, + final byte[] endKey) throws RocksDBException { + deleteRange(nativeHandle_, writeOpt.nativeHandle_, beginKey, 0, + beginKey.length, endKey, 0, endKey.length); } /** - * Remove the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Removes the database entries in the range ["beginKey", "endKey"), i.e., + * including "beginKey" and excluding "endKey". a non-OK status on error. It + * is not an error if no keys exist in the range ["beginKey", "endKey"). * - * @param key Key to delete within database + * Delete the database entry (if any) for "key". Returns OK on success, and a + * non-OK status on error. It is not an error if "key" did not exist in the + * database. * - * @throws RocksDBException thrown if error happens in underlying - * native library. 
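To show what the new range deletions replace, here is a hedged fragment of the scan-free pattern; it assumes an already-open RocksDB handle `db` passed in by the caller, and the key names are illustrative:

// Illustrative fragment: one deleteRange call replaces a scan-and-delete
// loop over the range. The end key is exclusive.
void dropUserRange(final RocksDB db) throws RocksDBException {
  final byte[] begin = "user.0000".getBytes();
  final byte[] end = "user.9999".getBytes();
  db.deleteRange(begin, end); // removes [begin, end); "user.9999" survives
}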
+ * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} instance + * @param writeOpt WriteOptions to be used with delete operation + * @param beginKey First key to delete within database (included) + * @param endKey Last key to delete within database (excluded) * - * @deprecated Use {@link #delete(byte[])} + * @throws RocksDBException thrown if error happens in underlying native + * library. */ - @Deprecated - public void remove(final byte[] key) throws RocksDBException { - delete(key); + public void deleteRange(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpt, final byte[] beginKey, final byte[] endKey) + throws RocksDBException { + deleteRange(nativeHandle_, writeOpt.nativeHandle_, beginKey, 0, + beginKey.length, endKey, 0, endKey.length, + columnFamilyHandle.nativeHandle_); } + /** - * Delete the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair. * - * @param key Key to delete within database + * @param key the specified key to be merged. + * @param value the value to be merged with the current value for the + * specified key. * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public void delete(final byte[] key) throws RocksDBException { - delete(nativeHandle_, key, 0, key.length); + public void merge(final byte[] key, final byte[] value) + throws RocksDBException { + merge(nativeHandle_, key, 0, key.length, value, 0, value.length); } /** - * Remove the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param key Key to delete within database + * @param key the specified key to be merged. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value to be merged with the current value for the + * specified key. + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and must be non-negative and no larger than + * ("value".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. - * - * @deprecated Use {@link #delete(ColumnFamilyHandle, byte[])} + * @throws IndexOutOfBoundsException if an offset or length is out of bounds */ - @Deprecated - public void remove(final ColumnFamilyHandle columnFamilyHandle, - final byte[] key) throws RocksDBException { - delete(columnFamilyHandle, key); + public void merge(final byte[] key, int offset, int len, final byte[] value, + final int vOffset, final int vLen) throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + merge(nativeHandle_, key, offset, len, value, vOffset, vLen); } /** - * Delete the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair in a ColumnFamily. 
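Since merge() is only meaningful when the database was opened with a merge operator, here is a minimal sketch using the bundled StringAppendOperator; the path, keys, and class name are illustrative:

// Illustrative sketch: StringAppendOperator concatenates merge operands
// with ',' when the key is read back.
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.StringAppendOperator;

public class MergeSketch {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setMergeOperator(new StringAppendOperator());
         final RocksDB db = RocksDB.open(options, "/tmp/merge-sketch")) {
      final byte[] key = "colors".getBytes();
      db.merge(key, "red".getBytes());
      db.merge(key, "blue".getBytes());
      // The operands are combined on read: get(key) yields "red,blue".
    }
  }
}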
* - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param key Key to delete within database + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param key the specified key to be merged. + * @param value the value to be merged with the current value for + * the specified key. * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public void delete(final ColumnFamilyHandle columnFamilyHandle, - final byte[] key) throws RocksDBException { - delete(nativeHandle_, key, 0, key.length, columnFamilyHandle.nativeHandle_); + public void merge(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, final byte[] value) throws RocksDBException { + merge(nativeHandle_, key, 0, key.length, value, 0, value.length, + columnFamilyHandle.nativeHandle_); } /** - * Remove the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair in a ColumnFamily. * - * @param writeOpt WriteOptions to be used with delete operation - * @param key Key to delete within database + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param key the specified key to be merged. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value to be merged with the current value for + * the specified key. + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * must be non-negative and no larger than ("value".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds + */ + public void merge(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, final int offset, final int len, final byte[] value, + final int vOffset, final int vLen) throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + merge(nativeHandle_, key, offset, len, value, vOffset, vLen, + columnFamilyHandle.nativeHandle_); + } + + /** + * Add merge operand for key/value pair. * - * @deprecated Use {@link #delete(WriteOptions, byte[])} + * @param writeOpts {@link WriteOptions} for this write. + * @param key the specified key to be merged. + * @param value the value to be merged with the current value for + * the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. */ - @Deprecated - public void remove(final WriteOptions writeOpt, final byte[] key) - throws RocksDBException { - delete(writeOpt, key); + public void merge(final WriteOptions writeOpts, final byte[] key, + final byte[] value) throws RocksDBException { + merge(nativeHandle_, writeOpts.nativeHandle_, + key, 0, key.length, value, 0, value.length); } /** - * Delete the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair. 
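For the WriteOptions overloads, a short illustrative fragment; it assumes an open handle `db` with a merge operator configured as above, and shows one way to request a synced WAL write for a single merge:

// Illustrative fragment: WriteOptions lets one write opt into a synced WAL.
try (final WriteOptions writeOpts = new WriteOptions().setSync(true)) {
  db.merge(writeOpts, "colors".getBytes(), "green".getBytes());
}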
* - * @param writeOpt WriteOptions to be used with delete operation - * @param key Key to delete within database + * @param writeOpts {@link WriteOptions} for this write. + * @param key the specified key to be merged. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("value".length - offset) + * @param value the value to be merged with the current value for + * the specified key. + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds */ - public void delete(final WriteOptions writeOpt, final byte[] key) + public void merge(final WriteOptions writeOpts, + final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) throws RocksDBException { - delete(nativeHandle_, writeOpt.nativeHandle_, key, 0, key.length); + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + merge(nativeHandle_, writeOpts.nativeHandle_, + key, offset, len, value, vOffset, vLen); } /** - * Remove the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param writeOpt WriteOptions to be used with delete operation - * @param key Key to delete within database + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param writeOpts {@link WriteOptions} for this write. + * @param key the specified key to be merged. + * @param value the value to be merged with the current value for the + * specified key. * * @throws RocksDBException thrown if error happens in underlying * native library. - * - * @deprecated Use {@link #delete(ColumnFamilyHandle, WriteOptions, byte[])} */ - @Deprecated - public void remove(final ColumnFamilyHandle columnFamilyHandle, - final WriteOptions writeOpt, final byte[] key) + public void merge(final ColumnFamilyHandle columnFamilyHandle, + final WriteOptions writeOpts, final byte[] key, final byte[] value) throws RocksDBException { - delete(columnFamilyHandle, writeOpt, key); + merge(nativeHandle_, writeOpts.nativeHandle_, + key, 0, key.length, value, 0, value.length, + columnFamilyHandle.nativeHandle_); } /** - * Delete the database entry (if any) for "key". Returns OK on - * success, and a non-OK status on error. It is not an error if "key" - * did not exist in the database. + * Add merge operand for key/value pair. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param writeOpt WriteOptions to be used with delete operation - * @param key Key to delete within database + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param writeOpts {@link WriteOptions} for this write. + * @param key the specified key to be merged. 
+ * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the value to be merged with the current value for + * the specified key. + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) * * @throws RocksDBException thrown if error happens in underlying * native library. + * @throws IndexOutOfBoundsException if an offset or length is out of bounds */ - public void delete(final ColumnFamilyHandle columnFamilyHandle, - final WriteOptions writeOpt, final byte[] key) + public void merge( + final ColumnFamilyHandle columnFamilyHandle, final WriteOptions writeOpts, + final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) throws RocksDBException { - delete(nativeHandle_, writeOpt.nativeHandle_, key, 0, key.length, + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + merge(nativeHandle_, writeOpts.nativeHandle_, + key, offset, len, value, vOffset, vLen, columnFamilyHandle.nativeHandle_); } /** - * Remove the database entry for {@code key}. Requires that the key exists - * and was not overwritten. It is not an error if the key did not exist - * in the database. + * Apply the specified updates to the database. * - * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple - * times), then the result of calling SingleDelete() on this key is undefined. - * SingleDelete() only behaves correctly if there has been only one Put() - * for this key since the previous call to SingleDelete() for this key. + * @param writeOpts WriteOptions instance + * @param updates WriteBatch instance * - * This feature is currently an experimental performance optimization - * for a very specific workload. It is up to the caller to ensure that - * SingleDelete is only used for a key that is not deleted using Delete() or - * written using Merge(). Mixing SingleDelete operations with Deletes and - * Merges can result in undefined behavior. + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public void write(final WriteOptions writeOpts, final WriteBatch updates) + throws RocksDBException { + write0(nativeHandle_, writeOpts.nativeHandle_, updates.nativeHandle_); + } + + /** + * Apply the specified updates to the database. * - * @param key Key to delete within database + * @param writeOpts WriteOptions instance + * @param updates WriteBatchWithIndex instance * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - @Experimental("Performance optimization for a very specific workload") - public void singleDelete(final byte[] key) throws RocksDBException { - singleDelete(nativeHandle_, key, key.length); + public void write(final WriteOptions writeOpts, + final WriteBatchWithIndex updates) throws RocksDBException { + write1(nativeHandle_, writeOpts.nativeHandle_, updates.nativeHandle_); } + // TODO(AR) we should improve the #get() API, returning -1 (RocksDB.NOT_FOUND) is not very nice + // when we could communicate better status into, also the C++ code show that -2 could be returned + /** - * Remove the database entry for {@code key}. 
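A brief sketch of the write() path added here, batching several updates into one atomic write; it assumes an open handle `db`, enclosing code that handles RocksDBException, and that WriteBatch exposes the same remove() to delete() rename seen above:

// Illustrative fragment: a WriteBatch applies all of its operations
// atomically via a single write() call.
try (final WriteBatch batch = new WriteBatch();
     final WriteOptions writeOpts = new WriteOptions()) {
  batch.put("k1".getBytes(), "v1".getBytes());
  batch.delete("k2".getBytes());
  db.write(writeOpts, batch); // all-or-nothing
}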
Requires that the key exists - * and was not overwritten. It is not an error if the key did not exist - * in the database. - * - * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple - * times), then the result of calling SingleDelete() on this key is undefined. - * SingleDelete() only behaves correctly if there has been only one Put() - * for this key since the previous call to SingleDelete() for this key. + * Get the value associated with the specified key within column family* * - * This feature is currently an experimental performance optimization - * for a very specific workload. It is up to the caller to ensure that - * SingleDelete is only used for a key that is not deleted using Delete() or - * written using Merge(). Mixing SingleDelete operations with Deletes and - * Merges can result in undefined behavior. + * @param key the key to retrieve the value. + * @param value the out-value to receive the retrieved value. * - * @param columnFamilyHandle The column family to delete the key from - * @param key Key to delete within database + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - @Experimental("Performance optimization for a very specific workload") - public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, - final byte[] key) throws RocksDBException { - singleDelete(nativeHandle_, key, key.length, - columnFamilyHandle.nativeHandle_); + public int get(final byte[] key, final byte[] value) throws RocksDBException { + return get(nativeHandle_, key, 0, key.length, value, 0, value.length); } /** - * Remove the database entry for {@code key}. Requires that the key exists - * and was not overwritten. It is not an error if the key did not exist - * in the database. - * - * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple - * times), then the result of calling SingleDelete() on this key is undefined. - * SingleDelete() only behaves correctly if there has been only one Put() - * for this key since the previous call to SingleDelete() for this key. - * - * This feature is currently an experimental performance optimization - * for a very specific workload. It is up to the caller to ensure that - * SingleDelete is only used for a key that is not deleted using Delete() or - * written using Merge(). Mixing SingleDelete operations with Deletes and - * Merges can result in undefined behavior. + * Get the value associated with the specified key within column family* * - * Note: consider setting {@link WriteOptions#setSync(boolean)} true. + * @param key the key to retrieve the value. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the out-value to receive the retrieved value. 
+ * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "value".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and and no larger than ("value".length - offset) * - * @param writeOpt Write options for the delete - * @param key Key to delete within database + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. */ - @Experimental("Performance optimization for a very specific workload") - public void singleDelete(final WriteOptions writeOpt, final byte[] key) + public int get(final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) throws RocksDBException { - singleDelete(nativeHandle_, writeOpt.nativeHandle_, key, key.length); + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + return get(nativeHandle_, key, offset, len, value, vOffset, vLen); } /** - * Remove the database entry for {@code key}. Requires that the key exists - * and was not overwritten. It is not an error if the key did not exist - * in the database. - * - * If a key is overwritten (by calling {@link #put(byte[], byte[])} multiple - * times), then the result of calling SingleDelete() on this key is undefined. - * SingleDelete() only behaves correctly if there has been only one Put() - * for this key since the previous call to SingleDelete() for this key. - * - * This feature is currently an experimental performance optimization - * for a very specific workload. It is up to the caller to ensure that - * SingleDelete is only used for a key that is not deleted using Delete() or - * written using Merge(). Mixing SingleDelete operations with Deletes and - * Merges can result in undefined behavior. - * - * Note: consider setting {@link WriteOptions#setSync(boolean)} true. + * Get the value associated with the specified key within column family. * - * @param columnFamilyHandle The column family to delete the key from - * @param writeOpt Write options for the delete - * @param key Key to delete within database + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key the key to retrieve the value. + * @param value the out-value to receive the retrieved value. + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. * * @throws RocksDBException thrown if error happens in underlying - * native library. + * native library. 
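A hedged fragment showing how a caller might interpret the status-code get() documented above; it assumes an open handle `db` and enclosing code that handles RocksDBException:

// Illustrative fragment: get(key, value) returns the full value size, so a
// result larger than the buffer signals truncation, and RocksDB.NOT_FOUND
// signals a miss.
final byte[] buffer = new byte[64];
final int size = db.get("k1".getBytes(), buffer);
if (size == RocksDB.NOT_FOUND) {
  // key absent
} else if (size > buffer.length) {
  // buffer too small: only the first buffer.length bytes were copied
}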
*/
- @Experimental("Performance optimization for a very specific workload")
- public void singleDelete(final ColumnFamilyHandle columnFamilyHandle,
- final WriteOptions writeOpt, final byte[] key) throws RocksDBException {
- singleDelete(nativeHandle_, writeOpt.nativeHandle_, key, key.length,
+ public int get(final ColumnFamilyHandle columnFamilyHandle, final byte[] key,
+ final byte[] value) throws RocksDBException, IllegalArgumentException {
+ return get(nativeHandle_, key, 0, key.length, value, 0, value.length,
columnFamilyHandle.nativeHandle_);
}

/**
- * DB implementations can export properties about their state
- * via this method on a per column family level.
+ * Get the value associated with the specified key within column family.
*
- *
<p>If {@code property} is a valid property understood by this DB
- * implementation, fills {@code value} with its current value and
- * returns true. Otherwise returns false.</p>
+ * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle}
+ * instance
+ * @param key the key to retrieve the value.
+ * @param offset the offset of the "key" array to be used, must be
+ * non-negative and no larger than "key".length
+ * @param len the length of the "key" array to be used, must be non-negative
+ * and no larger than ("key".length - offset)
+ * @param value the out-value to receive the retrieved value.
+ * @param vOffset the offset of the "value" array to be used, must be
+ * non-negative and no larger than "value".length
+ * @param vLen the length of the "value" array to be used, must be
+ * non-negative and no larger than ("value".length - vOffset)
*
- *
<p>Valid property names include:
- * <ul>
- * <li>"rocksdb.num-files-at-level<N>" - return the number of files at
- *     level <N>, where <N> is an ASCII representation of a level
- *     number (e.g. "0").</li>
- * <li>"rocksdb.stats" - returns a multi-line string that describes statistics
- *     about the internal operation of the DB.</li>
- * <li>"rocksdb.sstables" - returns a multi-line string that describes all
- *     of the sstables that make up the db contents.</li>
- * </ul>
    + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public int get(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, + final int offset, final int len, final byte[] value, final int vOffset, + final int vLen) throws RocksDBException, IllegalArgumentException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + return get(nativeHandle_, key, offset, len, value, vOffset, vLen, + columnFamilyHandle.nativeHandle_); + } + + /** + * Get the value associated with the specified key. + * + * @param opt {@link org.rocksdb.ReadOptions} instance. + * @param key the key to retrieve the value. + * @param value the out-value to receive the retrieved value. + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public int get(final ReadOptions opt, final byte[] key, + final byte[] value) throws RocksDBException { + return get(nativeHandle_, opt.nativeHandle_, + key, 0, key.length, value, 0, value.length); + } + + /** + * Get the value associated with the specified key. + * + * @param opt {@link org.rocksdb.ReadOptions} instance. + * @param key the key to retrieve the value. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param value the out-value to receive the retrieved value. + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, must be + * non-negative and no larger than ("value".length - offset) + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public int get(final ReadOptions opt, final byte[] key, final int offset, + final int len, final byte[] value, final int vOffset, final int vLen) + throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + return get(nativeHandle_, opt.nativeHandle_, + key, offset, len, value, vOffset, vLen); + } + + /** + * Get the value associated with the specified key within column family. * * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} * instance - * @param property to be fetched. 
See above for examples - * @return property value + * @param opt {@link org.rocksdb.ReadOptions} instance. + * @param key the key to retrieve the value. + * @param value the out-value to receive the retrieved value. + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public int get(final ColumnFamilyHandle columnFamilyHandle, + final ReadOptions opt, final byte[] key, final byte[] value) + throws RocksDBException { + return get(nativeHandle_, opt.nativeHandle_, key, 0, key.length, value, + 0, value.length, columnFamilyHandle.nativeHandle_); + } + + /** + * Get the value associated with the specified key within column family. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param opt {@link org.rocksdb.ReadOptions} instance. + * @param key the key to retrieve the value. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be + * non-negative and and no larger than ("key".length - offset) + * @param value the out-value to receive the retrieved value. + * @param vOffset the offset of the "value" array to be used, must be + * non-negative and no longer than "key".length + * @param vLen the length of the "value" array to be used, and must be + * non-negative and no larger than ("value".length - offset) + * @return The size of the actual value that matches the specified + * {@code key} in byte. If the return value is greater than the + * length of {@code value}, then it indicates that the size of the + * input buffer {@code value} is insufficient and partial result will + * be returned. RocksDB.NOT_FOUND will be returned if the value not + * found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public int get(final ColumnFamilyHandle columnFamilyHandle, + final ReadOptions opt, final byte[] key, final int offset, final int len, + final byte[] value, final int vOffset, final int vLen) + throws RocksDBException { + checkBounds(offset, len, key.length); + checkBounds(vOffset, vLen, value.length); + return get(nativeHandle_, opt.nativeHandle_, key, offset, len, value, + vOffset, vLen, columnFamilyHandle.nativeHandle_); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param key the key retrieve the value. + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final byte[] key) throws RocksDBException { + return get(nativeHandle_, key, 0, key.length); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param key the key retrieve the value. 
+ * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final byte[] key, final int offset, + final int len) throws RocksDBException { + checkBounds(offset, len, key.length); + return get(nativeHandle_, key, offset, len); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key the key retrieve the value. + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key) throws RocksDBException { + return get(nativeHandle_, key, 0, key.length, + columnFamilyHandle.nativeHandle_); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key the key retrieve the value. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, final int offset, final int len) + throws RocksDBException { + checkBounds(offset, len, key.length); + return get(nativeHandle_, key, offset, len, + columnFamilyHandle.nativeHandle_); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param key the key retrieve the value. + * @param opt Read options. + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final ReadOptions opt, final byte[] key) + throws RocksDBException { + return get(nativeHandle_, opt.nativeHandle_, key, 0, key.length); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param key the key retrieve the value. 
+ * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param opt Read options. + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final ReadOptions opt, final byte[] key, final int offset, + final int len) throws RocksDBException { + checkBounds(offset, len, key.length); + return get(nativeHandle_, opt.nativeHandle_, key, offset, len); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key the key retrieve the value. + * @param opt Read options. + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final ColumnFamilyHandle columnFamilyHandle, + final ReadOptions opt, final byte[] key) throws RocksDBException { + return get(nativeHandle_, opt.nativeHandle_, key, 0, key.length, + columnFamilyHandle.nativeHandle_); + } + + /** + * The simplified version of get which returns a new byte array storing + * the value associated with the specified input key if any. null will be + * returned if the specified key is not found. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key the key retrieve the value. + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than ("key".length - offset) + * @param opt Read options. + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] get(final ColumnFamilyHandle columnFamilyHandle, + final ReadOptions opt, final byte[] key, final int offset, final int len) + throws RocksDBException { + checkBounds(offset, len, key.length); + return get(nativeHandle_, opt.nativeHandle_, key, offset, len, + columnFamilyHandle.nativeHandle_); + } + + /** + * Returns a map of keys for which values were found in DB. + * + * @param keys List of keys for which values need to be retrieved. + * @return Map where key of map is the key passed by user and value for map + * entry is the corresponding value in DB. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + * + * @deprecated Consider {@link #multiGetAsList(List)} instead. 
+ */ + @Deprecated + public Map multiGet(final List keys) + throws RocksDBException { + assert(keys.size() != 0); + + final byte[][] keysArray = keys.toArray(new byte[0][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + final byte[][] values = multiGet(nativeHandle_, keysArray, keyOffsets, + keyLengths); + + final Map keyValueMap = + new HashMap<>(computeCapacityHint(values.length)); + for(int i = 0; i < values.length; i++) { + if(values[i] == null) { + continue; + } + + keyValueMap.put(keys.get(i), values[i]); + } + + return keyValueMap; + } + + /** + * Returns a map of keys for which values were found in DB. + *
<p>
+ * Note: Every key needs to have a related column family name in
+ * {@code columnFamilyHandleList}.
+ * </p>
    + * + * @param columnFamilyHandleList {@link java.util.List} containing + * {@link org.rocksdb.ColumnFamilyHandle} instances. + * @param keys List of keys for which values need to be retrieved. + * @return Map where key of map is the key passed by user and value for map + * entry is the corresponding value in DB. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + * @throws IllegalArgumentException thrown if the size of passed keys is not + * equal to the amount of passed column family handles. + * + * @deprecated Consider {@link #multiGetAsList(List, List)} instead. + */ + @Deprecated + public Map multiGet( + final List columnFamilyHandleList, + final List keys) throws RocksDBException, + IllegalArgumentException { + assert(keys.size() != 0); + // Check if key size equals cfList size. If not a exception must be + // thrown. If not a Segmentation fault happens. + if (keys.size() != columnFamilyHandleList.size()) { + throw new IllegalArgumentException( + "For each key there must be a ColumnFamilyHandle."); + } + final long[] cfHandles = new long[columnFamilyHandleList.size()]; + for (int i = 0; i < columnFamilyHandleList.size(); i++) { + cfHandles[i] = columnFamilyHandleList.get(i).nativeHandle_; + } + + final byte[][] keysArray = keys.toArray(new byte[0][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + final byte[][] values = multiGet(nativeHandle_, keysArray, keyOffsets, + keyLengths, cfHandles); + + final Map keyValueMap = + new HashMap<>(computeCapacityHint(values.length)); + for(int i = 0; i < values.length; i++) { + if (values[i] == null) { + continue; + } + keyValueMap.put(keys.get(i), values[i]); + } + return keyValueMap; + } + + /** + * Returns a map of keys for which values were found in DB. + * + * @param opt Read options. + * @param keys of keys for which values need to be retrieved. + * @return Map where key of map is the key passed by user and value for map + * entry is the corresponding value in DB. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + * + * @deprecated Consider {@link #multiGetAsList(ReadOptions, List)} instead. + */ + @Deprecated + public Map multiGet(final ReadOptions opt, + final List keys) throws RocksDBException { + assert(keys.size() != 0); + + final byte[][] keysArray = keys.toArray(new byte[0][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + final byte[][] values = multiGet(nativeHandle_, opt.nativeHandle_, + keysArray, keyOffsets, keyLengths); + + final Map keyValueMap = + new HashMap<>(computeCapacityHint(values.length)); + for(int i = 0; i < values.length; i++) { + if(values[i] == null) { + continue; + } + + keyValueMap.put(keys.get(i), values[i]); + } + + return keyValueMap; + } + + /** + * Returns a map of keys for which values were found in DB. + *
<p>
+ * Note: Every key needs to have a related column family name in
+ * {@code columnFamilyHandleList}.
+ * </p>
    + * + * @param opt Read options. + * @param columnFamilyHandleList {@link java.util.List} containing + * {@link org.rocksdb.ColumnFamilyHandle} instances. + * @param keys of keys for which values need to be retrieved. + * @return Map where key of map is the key passed by user and value for map + * entry is the corresponding value in DB. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + * @throws IllegalArgumentException thrown if the size of passed keys is not + * equal to the amount of passed column family handles. + * + * @deprecated Consider {@link #multiGetAsList(ReadOptions, List, List)} + * instead. + */ + @Deprecated + public Map multiGet(final ReadOptions opt, + final List columnFamilyHandleList, + final List keys) throws RocksDBException { + assert(keys.size() != 0); + // Check if key size equals cfList size. If not a exception must be + // thrown. If not a Segmentation fault happens. + if (keys.size()!=columnFamilyHandleList.size()){ + throw new IllegalArgumentException( + "For each key there must be a ColumnFamilyHandle."); + } + final long[] cfHandles = new long[columnFamilyHandleList.size()]; + for (int i = 0; i < columnFamilyHandleList.size(); i++) { + cfHandles[i] = columnFamilyHandleList.get(i).nativeHandle_; + } + + final byte[][] keysArray = keys.toArray(new byte[0][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + final byte[][] values = multiGet(nativeHandle_, opt.nativeHandle_, + keysArray, keyOffsets, keyLengths, cfHandles); + + final Map keyValueMap + = new HashMap<>(computeCapacityHint(values.length)); + for(int i = 0; i < values.length; i++) { + if(values[i] == null) { + continue; + } + keyValueMap.put(keys.get(i), values[i]); + } + + return keyValueMap; + } + + /** + * Takes a list of keys, and returns a list of values for the given list of + * keys. List will contain null for keys which could not be found. + * + * @param keys List of keys for which values need to be retrieved. + * @return List of values for the given list of keys. List will contain + * null for keys which could not be found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public List multiGetAsList(final List keys) + throws RocksDBException { + assert(keys.size() != 0); + + final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + return Arrays.asList(multiGet(nativeHandle_, keysArray, keyOffsets, + keyLengths)); + } + + /** + * Returns a list of values for the given list of keys. List will contain + * null for keys which could not be found. + *
<p>
+ * Note: Every key needs to have a related column family name in
+ * {@code columnFamilyHandleList}.
+ * </p>
    + * + * @param columnFamilyHandleList {@link java.util.List} containing + * {@link org.rocksdb.ColumnFamilyHandle} instances. + * @param keys List of keys for which values need to be retrieved. + * @return List of values for the given list of keys. List will contain + * null for keys which could not be found. + * + * @throws RocksDBException thrown if error happens in underlying + * native library. + * @throws IllegalArgumentException thrown if the size of passed keys is not + * equal to the amount of passed column family handles. + */ + public List multiGetAsList( + final List columnFamilyHandleList, + final List keys) throws RocksDBException, + IllegalArgumentException { + assert(keys.size() != 0); + // Check if key size equals cfList size. If not a exception must be + // thrown. If not a Segmentation fault happens. + if (keys.size() != columnFamilyHandleList.size()) { + throw new IllegalArgumentException( + "For each key there must be a ColumnFamilyHandle."); + } + final long[] cfHandles = new long[columnFamilyHandleList.size()]; + for (int i = 0; i < columnFamilyHandleList.size(); i++) { + cfHandles[i] = columnFamilyHandleList.get(i).nativeHandle_; + } + + final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + return Arrays.asList(multiGet(nativeHandle_, keysArray, keyOffsets, + keyLengths, cfHandles)); + } + + /** + * Returns a list of values for the given list of keys. List will contain + * null for keys which could not be found. + * + * @param opt Read options. + * @param keys of keys for which values need to be retrieved. + * @return List of values for the given list of keys. List will contain + * null for keys which could not be found. * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public String getProperty(final ColumnFamilyHandle columnFamilyHandle, - final String property) throws RocksDBException { - return getProperty0(nativeHandle_, columnFamilyHandle.nativeHandle_, - property, property.length()); + public List multiGetAsList(final ReadOptions opt, + final List keys) throws RocksDBException { + assert(keys.size() != 0); + + final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + return Arrays.asList(multiGet(nativeHandle_, opt.nativeHandle_, + keysArray, keyOffsets, keyLengths)); } /** - * Removes the database entries in the range ["beginKey", "endKey"), i.e., - * including "beginKey" and excluding "endKey". a non-OK status on error. It - * is not an error if no keys exist in the range ["beginKey", "endKey"). - * - * Delete the database entry (if any) for "key". Returns OK on success, and a - * non-OK status on error. It is not an error if "key" did not exist in the - * database. + * Returns a list of values for the given list of keys. List will contain + * null for keys which could not be found. + *

+ * Note: Every key needs to have a related column family handle in + * {@code columnFamilyHandleList}; the handle at index {@code i} is used + * for the key at index {@code i}. + *

    * - * @param beginKey - * First key to delete within database (included) - * @param endKey - * Last key to delete within database (excluded) + * @param opt Read options. + * @param columnFamilyHandleList {@link java.util.List} containing + * {@link org.rocksdb.ColumnFamilyHandle} instances. + * @param keys of keys for which values need to be retrieved. + * @return List of values for the given list of keys. List will contain + * null for keys which could not be found. * - * @throws RocksDBException - * thrown if error happens in underlying native library. + * @throws RocksDBException thrown if error happens in underlying + * native library. + * @throws IllegalArgumentException thrown if the size of passed keys is not + * equal to the amount of passed column family handles. */ - public void deleteRange(final byte[] beginKey, final byte[] endKey) throws RocksDBException { - deleteRange(nativeHandle_, beginKey, 0, beginKey.length, endKey, 0, endKey.length); + public List multiGetAsList(final ReadOptions opt, + final List columnFamilyHandleList, + final List keys) throws RocksDBException { + assert(keys.size() != 0); + // Check if key size equals cfList size. If not a exception must be + // thrown. If not a Segmentation fault happens. + if (keys.size()!=columnFamilyHandleList.size()){ + throw new IllegalArgumentException( + "For each key there must be a ColumnFamilyHandle."); + } + final long[] cfHandles = new long[columnFamilyHandleList.size()]; + for (int i = 0; i < columnFamilyHandleList.size(); i++) { + cfHandles[i] = columnFamilyHandleList.get(i).nativeHandle_; + } + + final byte[][] keysArray = keys.toArray(new byte[keys.size()][]); + final int keyOffsets[] = new int[keysArray.length]; + final int keyLengths[] = new int[keysArray.length]; + for(int i = 0; i < keyLengths.length; i++) { + keyLengths[i] = keysArray[i].length; + } + + return Arrays.asList(multiGet(nativeHandle_, opt.nativeHandle_, + keysArray, keyOffsets, keyLengths, cfHandles)); } /** - * Removes the database entries in the range ["beginKey", "endKey"), i.e., - * including "beginKey" and excluding "endKey". a non-OK status on error. It - * is not an error if no keys exist in the range ["beginKey", "endKey"). - * - * Delete the database entry (if any) for "key". Returns OK on success, and a - * non-OK status on error. It is not an error if "key" did not exist in the - * database. + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * @param columnFamilyHandle - * {@link org.rocksdb.ColumnFamilyHandle} instance - * @param beginKey - * First key to delete within database (included) - * @param endKey - * Last key to delete within database (excluded) + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @throws RocksDBException - * thrown if error happens in underlying native library. + * @param key byte array of a key to search for + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. + * @return boolean value indicating if key does not exist or might exist. 
*/ - public void deleteRange(final ColumnFamilyHandle columnFamilyHandle, final byte[] beginKey, - final byte[] endKey) throws RocksDBException { - deleteRange(nativeHandle_, beginKey, 0, beginKey.length, endKey, 0, endKey.length, - columnFamilyHandle.nativeHandle_); + public boolean keyMayExist(final byte[] key, final StringBuilder value) { + return keyMayExist(nativeHandle_, key, 0, key.length, value); } /** - * Removes the database entries in the range ["beginKey", "endKey"), i.e., - * including "beginKey" and excluding "endKey". a non-OK status on error. It - * is not an error if no keys exist in the range ["beginKey", "endKey"). + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * Delete the database entry (if any) for "key". Returns OK on success, and a - * non-OK status on error. It is not an error if "key" did not exist in the - * database. + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @param writeOpt - * WriteOptions to be used with delete operation - * @param beginKey - * First key to delete within database (included) - * @param endKey - * Last key to delete within database (excluded) + * @param key byte array of a key to search for + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than "key".length + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. * - * @throws RocksDBException - * thrown if error happens in underlying native library. + * @return boolean value indicating if key does not exist or might exist. */ - public void deleteRange(final WriteOptions writeOpt, final byte[] beginKey, final byte[] endKey) - throws RocksDBException { - deleteRange(nativeHandle_, writeOpt.nativeHandle_, beginKey, 0, beginKey.length, endKey, 0, - endKey.length); + public boolean keyMayExist(final byte[] key, final int offset, final int len, + final StringBuilder value) { + checkBounds(offset, len, key.length); + return keyMayExist(nativeHandle_, key, offset, len, value); } /** - * Removes the database entries in the range ["beginKey", "endKey"), i.e., - * including "beginKey" and excluding "endKey". a non-OK status on error. It - * is not an error if no keys exist in the range ["beginKey", "endKey"). - * - * Delete the database entry (if any) for "key". Returns OK on success, and a - * non-OK status on error. It is not an error if "key" did not exist in the - * database. + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param writeOpt - * WriteOptions to be used with delete operation - * @param beginKey - * First key to delete within database (included) - * @param endKey - * Last key to delete within database (excluded) + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @throws RocksDBException - * thrown if error happens in underlying native library. + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param key byte array of a key to search for + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. 
+ * @return boolean value indicating if key does not exist or might exist. */ - public void deleteRange(final ColumnFamilyHandle columnFamilyHandle, final WriteOptions writeOpt, - final byte[] beginKey, final byte[] endKey) throws RocksDBException { - deleteRange(nativeHandle_, writeOpt.nativeHandle_, beginKey, 0, beginKey.length, endKey, 0, - endKey.length, columnFamilyHandle.nativeHandle_); + public boolean keyMayExist(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, final StringBuilder value) { + return keyMayExist(nativeHandle_, key, 0, key.length, + columnFamilyHandle.nativeHandle_, value); } /** - * DB implementations can export properties about their state - * via this method. If "property" is a valid property understood by this - * DB implementation, fills "*value" with its current value and returns - * true. Otherwise returns false. - * - *

    Valid property names include: - *

      - *
    • "rocksdb.num-files-at-level<N>" - return the number of files at - * level <N>, where <N> is an ASCII representation of a level - * number (e.g. "0").
    • - *
    • "rocksdb.stats" - returns a multi-line string that describes statistics - * about the internal operation of the DB.
    • - *
    • "rocksdb.sstables" - returns a multi-line string that describes all - * of the sstables that make up the db contents.
    • - *
    + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * @param property to be fetched. See above for examples - * @return property value + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @throws RocksDBException thrown if error happens in underlying - * native library. + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param key byte array of a key to search for + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than "key".length + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. + * @return boolean value indicating if key does not exist or might exist. */ - public String getProperty(final String property) throws RocksDBException { - return getProperty0(nativeHandle_, property, property.length()); + public boolean keyMayExist(final ColumnFamilyHandle columnFamilyHandle, + final byte[] key, int offset, int len, final StringBuilder value) { + checkBounds(offset, len, key.length); + return keyMayExist(nativeHandle_, key, offset, len, + columnFamilyHandle.nativeHandle_, value); } /** - *
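// Illustrative sketch of a keyMayExist() fast path, assuming an open
// RocksDB `db` and a byte[] `key`: only a false result is definitive,
// true merely means the key may exist.
final StringBuilder cachedValue = new StringBuilder();
if (!db.keyMayExist(key, cachedValue)) {
  // definitely absent; the full get() can be skipped
} else if (cachedValue.length() > 0) {
  // the value was found in the block cache and copied out
} else {
  final byte[] value = db.get(key); // may still be null (false positive)
}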

    Similar to GetProperty(), but only works for a subset of properties - * whose return value is a numerical value. Return the value as long.

    - * - *

    Note: As the returned property is of type - * {@code uint64_t} on C++ side the returning value can be negative - * because Java supports in Java 7 only signed long values.

    - * - *

    Java 7: To mitigate the problem of the non - * existent unsigned long tpye, values should be encapsulated using - * {@link java.math.BigInteger} to reflect the correct value. The correct - * behavior is guaranteed if {@code 2^64} is added to negative values.

    - * - *

    Java 8: In Java 8 the value should be treated as - * unsigned long using provided methods of type {@link Long}.

    - * - * @param property to be fetched. + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * @return numerical property value. + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @throws RocksDBException if an error happens in the underlying native code. + * @param readOptions {@link ReadOptions} instance + * @param key byte array of a key to search for + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. + * @return boolean value indicating if key does not exist or might exist. */ - public long getLongProperty(final String property) throws RocksDBException { - return getLongProperty(nativeHandle_, property, property.length()); + public boolean keyMayExist(final ReadOptions readOptions, + final byte[] key, final StringBuilder value) { + return keyMayExist(nativeHandle_, readOptions.nativeHandle_, + key, 0, key.length, value); } /** - *

    Similar to GetProperty(), but only works for a subset of properties - * whose return value is a numerical value. Return the value as long.

    - * - *

    Note: As the returned property is of type - * {@code uint64_t} on C++ side the returning value can be negative - * because Java supports in Java 7 only signed long values.

    - * - *

    Java 7: To mitigate the problem of the non - * existent unsigned long tpye, values should be encapsulated using - * {@link java.math.BigInteger} to reflect the correct value. The correct - * behavior is guaranteed if {@code 2^64} is added to negative values.

    - * - *

    Java 8: In Java 8 the value should be treated as - * unsigned long using provided methods of type {@link Long}.

    - * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance - * @param property to be fetched. + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * @return numerical property value + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @throws RocksDBException if an error happens in the underlying native code. + * @param readOptions {@link ReadOptions} instance + * @param key byte array of a key to search for + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than "key".length + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. + * @return boolean value indicating if key does not exist or might exist. */ - public long getLongProperty(final ColumnFamilyHandle columnFamilyHandle, - final String property) throws RocksDBException { - return getLongProperty(nativeHandle_, columnFamilyHandle.nativeHandle_, - property, property.length()); + public boolean keyMayExist(final ReadOptions readOptions, + final byte[] key, final int offset, final int len, + final StringBuilder value) { + checkBounds(offset, len, key.length); + return keyMayExist(nativeHandle_, readOptions.nativeHandle_, + key, offset, len, value); } - /** - *

    Return sum of the getLongProperty of all the column families

    - * - *

    Note: As the returned property is of type - * {@code uint64_t} on C++ side the returning value can be negative - * because Java supports in Java 7 only signed long values.

    - * - *

    Java 7: To mitigate the problem of the non - * existent unsigned long tpye, values should be encapsulated using - * {@link java.math.BigInteger} to reflect the correct value. The correct - * behavior is guaranteed if {@code 2^64} is added to negative values.

    + /** + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - *

    Java 8: In Java 8 the value should be treated as - * unsigned long using provided methods of type {@link Long}.

    + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @param property to be fetched. + * @param readOptions {@link ReadOptions} instance + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param key byte array of a key to search for + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. + * @return boolean value indicating if key does not exist or might exist. + */ + public boolean keyMayExist(final ReadOptions readOptions, + final ColumnFamilyHandle columnFamilyHandle, final byte[] key, + final StringBuilder value) { + return keyMayExist(nativeHandle_, readOptions.nativeHandle_, + key, 0, key.length, columnFamilyHandle.nativeHandle_, + value); + } + + /** + * If the key definitely does not exist in the database, then this method + * returns false, else true. * - * @return numerical property value + * This check is potentially lighter-weight than invoking DB::Get(). One way + * to make this lighter weight is to avoid doing any IOs. * - * @throws RocksDBException if an error happens in the underlying native code. + * @param readOptions {@link ReadOptions} instance + * @param columnFamilyHandle {@link ColumnFamilyHandle} instance + * @param key byte array of a key to search for + * @param offset the offset of the "key" array to be used, must be + * non-negative and no larger than "key".length + * @param len the length of the "key" array to be used, must be non-negative + * and no larger than "key".length + * @param value StringBuilder instance which is a out parameter if a value is + * found in block-cache. + * @return boolean value indicating if key does not exist or might exist. */ - public long getAggregatedLongProperty(final String property) throws RocksDBException { - return getAggregatedLongProperty(nativeHandle_, property, property.length()); + public boolean keyMayExist(final ReadOptions readOptions, + final ColumnFamilyHandle columnFamilyHandle, final byte[] key, + final int offset, final int len, final StringBuilder value) { + checkBounds(offset, len, key.length); + return keyMayExist(nativeHandle_, readOptions.nativeHandle_, + key, offset, len, columnFamilyHandle.nativeHandle_, + value); } /** @@ -1577,37 +2390,6 @@ public class RocksDB extends RocksObject { readOptions.nativeHandle_)); } - /** - *

    Return a handle to the current DB state. Iterators created with - * this handle will all observe a stable snapshot of the current DB - * state. The caller must call ReleaseSnapshot(result) when the - * snapshot is no longer needed.

    - * - *

    nullptr will be returned if the DB fails to take a snapshot or does - * not support snapshot.

    - * - * @return Snapshot {@link Snapshot} instance - */ - public Snapshot getSnapshot() { - long snapshotHandle = getSnapshot(nativeHandle_); - if (snapshotHandle != 0) { - return new Snapshot(snapshotHandle); - } - return null; - } - - /** - * Release a previously acquired snapshot. The caller must not - * use "snapshot" after this call. - * - * @param snapshot {@link Snapshot} instance - */ - public void releaseSnapshot(final Snapshot snapshot) { - if (snapshot != null) { - releaseSnapshot(nativeHandle_, snapshot.nativeHandle_); - } - } - /** *

    Return a heap-allocated iterator over the contents of the * database. The result of newIterator() is initially invalid @@ -1702,88 +2484,331 @@ public class RocksDB extends RocksObject { return iterators; } + /** - * Gets the handle for the default column family + *

Return a handle to the current DB state. Iterators created with + * this handle will all observe a stable snapshot of the current DB + * state. The caller must call {@link #releaseSnapshot(Snapshot)} when the + * snapshot is no longer needed.

    * - * @return The handle of the default column family + *

{@code null} will be returned if the DB fails to take a snapshot or does + * not support snapshots.
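// Illustrative sketch of snapshot-scoped reads, assuming an open RocksDB
// `db` and a byte[] `key`; releaseSnapshot() is always paired with
// getSnapshot() so that obsolete files are not pinned indefinitely.
final Snapshot snapshot = db.getSnapshot();
if (snapshot != null) {
  try (final ReadOptions readOptions =
           new ReadOptions().setSnapshot(snapshot)) {
    final byte[] value = db.get(readOptions, key); // state as of snapshot
  } finally {
    db.releaseSnapshot(snapshot);
  }
}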

    + * + * @return Snapshot {@link Snapshot} instance */ - public ColumnFamilyHandle getDefaultColumnFamily() { - final ColumnFamilyHandle cfHandle = new ColumnFamilyHandle(this, - getDefaultColumnFamily(nativeHandle_)); - cfHandle.disOwnNativeHandle(); - return cfHandle; + public Snapshot getSnapshot() { + long snapshotHandle = getSnapshot(nativeHandle_); + if (snapshotHandle != 0) { + return new Snapshot(snapshotHandle); + } + return null; } /** - * Creates a new column family with the name columnFamilyName and - * allocates a ColumnFamilyHandle within an internal structure. - * The ColumnFamilyHandle is automatically disposed with DB disposal. + * Release a previously acquired snapshot. * - * @param columnFamilyDescriptor column family to be created. - * @return {@link org.rocksdb.ColumnFamilyHandle} instance. + * The caller must not use "snapshot" after this call. + * + * @param snapshot {@link Snapshot} instance + */ + public void releaseSnapshot(final Snapshot snapshot) { + if (snapshot != null) { + releaseSnapshot(nativeHandle_, snapshot.nativeHandle_); + } + } + + /** + * DB implements can export properties about their state + * via this method on a per column family level. + * + *

    If {@code property} is a valid property understood by this DB + * implementation, fills {@code value} with its current value and + * returns true. Otherwise returns false.

    + * + *

    Valid property names include: + *

      + *
    • "rocksdb.num-files-at-level<N>" - return the number of files at + * level <N>, where <N> is an ASCII representation of a level + * number (e.g. "0").
    • + *
    • "rocksdb.stats" - returns a multi-line string that describes statistics + * about the internal operation of the DB.
    • + *
    • "rocksdb.sstables" - returns a multi-line string that describes all + * of the sstables that make up the db contents.
    • + *
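// Illustrative sketch of the property getters, assuming an open RocksDB
// `db` and a ColumnFamilyHandle `cf` (null selects the default column
// family); the property names are taken from the list above.
final String cfStats = db.getProperty(cf, "rocksdb.stats");
final String l0Files = db.getProperty("rocksdb.num-files-at-level0");
final Map<String, String> cfStatsMap =
    db.getMapProperty(cf, "rocksdb.cfstats");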
    + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family. + * @param property to be fetched. See above for examples + * @return property value * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public ColumnFamilyHandle createColumnFamily( - final ColumnFamilyDescriptor columnFamilyDescriptor) - throws RocksDBException { - return new ColumnFamilyHandle(this, createColumnFamily(nativeHandle_, - columnFamilyDescriptor.columnFamilyName(), - columnFamilyDescriptor.columnFamilyOptions().nativeHandle_)); + public String getProperty( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final String property) throws RocksDBException { + return getProperty(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + property, property.length()); } /** - * Drops the column family identified by columnFamilyName. Internal - * handles to this column family will be disposed. If the column family - * is not known removal will fail. + * DB implementations can export properties about their state + * via this method. If "property" is a valid property understood by this + * DB implementation, fills "*value" with its current value and returns + * true. Otherwise returns false. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance + *

    Valid property names include: + *

      + *
    • "rocksdb.num-files-at-level<N>" - return the number of files at + * level <N>, where <N> is an ASCII representation of a level + * number (e.g. "0").
    • + *
    • "rocksdb.stats" - returns a multi-line string that describes statistics + * about the internal operation of the DB.
    • + *
    • "rocksdb.sstables" - returns a multi-line string that describes all + * of the sstables that make up the db contents.
    • + *
    + * + * @param property to be fetched. See above for examples + * @return property value * * @throws RocksDBException thrown if error happens in underlying * native library. */ - public void dropColumnFamily(final ColumnFamilyHandle columnFamilyHandle) - throws RocksDBException, IllegalArgumentException { - // throws RocksDBException if something goes wrong - dropColumnFamily(nativeHandle_, columnFamilyHandle.nativeHandle_); - // After the drop the native handle is not valid anymore - columnFamilyHandle.disOwnNativeHandle(); + public String getProperty(final String property) throws RocksDBException { + return getProperty(null, property); } + /** - *

    Flush all memory table data.

    + * Gets a property map. * - *

    Note: it must be ensured that the FlushOptions instance - * is not GC'ed before this method finishes. If the wait parameter is - * set to false, flush processing is asynchronous.

    + * @param property to be fetched. * - * @param flushOptions {@link org.rocksdb.FlushOptions} instance. - * @throws RocksDBException thrown if an error occurs within the native - * part of the library. + * @return the property map + * + * @throws RocksDBException if an error happens in the underlying native code. */ - public void flush(final FlushOptions flushOptions) + public Map getMapProperty(final String property) + throws RocksDBException { + return getMapProperty(null, property); + } + + /** + * Gets a property map. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family. + * @param property to be fetched. + * + * @return the property map + * + * @throws RocksDBException if an error happens in the underlying native code. + */ + public Map getMapProperty( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final String property) throws RocksDBException { + return getMapProperty(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + property, property.length()); + } + + /** + *

Similar to GetProperty(), but only works for a subset of properties + * whose return value is a numerical value. Returns the value as a long.

    + * + *

Note: As the returned property is of type + * {@code uint64_t} on the C++ side, the returned value can be negative + * because Java 7 only supports signed long values.

    + * + *

Java 7: To mitigate the problem of the + * non-existent unsigned long type, values should be encapsulated using + * {@link java.math.BigInteger} to reflect the correct value. The correct + * behavior is guaranteed if {@code 2^64} is added to negative values.

    + * + *

Java 8: In Java 8 the value should be treated as an + * unsigned long using the provided methods of {@link Long}.

    + * + * @param property to be fetched. + * + * @return numerical property value. + * + * @throws RocksDBException if an error happens in the underlying native code. + */ + public long getLongProperty(final String property) throws RocksDBException { + return getLongProperty(null, property); + } + + /** + *
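// Illustrative sketch of the unsigned handling described above, assuming
// an open RocksDB `db`; "rocksdb.estimate-num-keys" is one such numerical
// property.
final long raw = db.getLongProperty("rocksdb.estimate-num-keys");
// Java 8+: reinterpret the 64 bits as an unsigned value.
final String asUnsigned = Long.toUnsignedString(raw);
// Java 7: add 2^64 to negative values via BigInteger.
java.math.BigInteger unsigned = java.math.BigInteger.valueOf(raw);
if (raw < 0) {
  unsigned = unsigned.add(java.math.BigInteger.ONE.shiftLeft(64));
}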

Similar to GetProperty(), but only works for a subset of properties + * whose return value is a numerical value. Returns the value as a long.

    + * + *

Note: As the returned property is of type + * {@code uint64_t} on the C++ side, the returned value can be negative + * because Java 7 only supports signed long values.

    + * + *

Java 7: To mitigate the problem of the + * non-existent unsigned long type, values should be encapsulated using + * {@link java.math.BigInteger} to reflect the correct value. The correct + * behavior is guaranteed if {@code 2^64} is added to negative values.

    + * + *

Java 8: In Java 8 the value should be treated as an + * unsigned long using the provided methods of {@link Long}.

    + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family + * @param property to be fetched. + * + * @return numerical property value + * + * @throws RocksDBException if an error happens in the underlying native code. + */ + public long getLongProperty( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final String property) throws RocksDBException { + return getLongProperty(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + property, property.length()); + } + + /** + * Reset internal stats for DB and all column families. + * + * Note this doesn't reset {@link Options#statistics()} as it is not + * owned by DB. + */ + public void resetStats() throws RocksDBException { + resetStats(nativeHandle_); + } + + /** + *

Returns the sum of the getLongProperty values of all the column families

    + * + *

Note: As the returned property is of type + * {@code uint64_t} on the C++ side, the returned value can be negative + * because Java 7 only supports signed long values.

    + * + *

Java 7: To mitigate the problem of the + * non-existent unsigned long type, values should be encapsulated using + * {@link java.math.BigInteger} to reflect the correct value. The correct + * behavior is guaranteed if {@code 2^64} is added to negative values.

    + * + *

Java 8: In Java 8 the value should be treated as an + * unsigned long using the provided methods of {@link Long}.

    + * + * @param property to be fetched. + * + * @return numerical property value + * + * @throws RocksDBException if an error happens in the underlying native code. + */ + public long getAggregatedLongProperty(final String property) throws RocksDBException { - flush(nativeHandle_, flushOptions.nativeHandle_); + return getAggregatedLongProperty(nativeHandle_, property, + property.length()); + } + + /** + * Get the approximate file system space used by keys in each range. + * + * Note that the returned sizes measure file system space usage, so + * if the user data compresses by a factor of ten, the returned + * sizes will be one-tenth the size of the corresponding user data size. + * + * If {@code sizeApproximationFlags} defines whether the returned size + * should include the recently written data in the mem-tables (if + * the mem-table type supports it), data serialized to disk, or both. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family + * @param ranges the ranges over which to approximate sizes + * @param sizeApproximationFlags flags to determine what to include in the + * approximation. + * + * @return the sizes + */ + public long[] getApproximateSizes( + /*@Nullable*/ final ColumnFamilyHandle columnFamilyHandle, + final List ranges, + final SizeApproximationFlag... sizeApproximationFlags) { + + byte flags = 0x0; + for (final SizeApproximationFlag sizeApproximationFlag + : sizeApproximationFlags) { + flags |= sizeApproximationFlag.getValue(); + } + + return getApproximateSizes(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + toRangeSliceHandles(ranges), flags); + } + + /** + * Get the approximate file system space used by keys in each range for + * the default column family. + * + * Note that the returned sizes measure file system space usage, so + * if the user data compresses by a factor of ten, the returned + * sizes will be one-tenth the size of the corresponding user data size. + * + * If {@code sizeApproximationFlags} defines whether the returned size + * should include the recently written data in the mem-tables (if + * the mem-table type supports it), data serialized to disk, or both. + * + * @param ranges the ranges over which to approximate sizes + * @param sizeApproximationFlags flags to determine what to include in the + * approximation. + * + * @return the sizes. + */ + public long[] getApproximateSizes(final List ranges, + final SizeApproximationFlag... sizeApproximationFlags) { + return getApproximateSizes(null, ranges, sizeApproximationFlags); + } + + public static class CountAndSize { + public final long count; + public final long size; + + public CountAndSize(final long count, final long size) { + this.count = count; + this.size = size; + } + } + + /** + * This method is similar to + * {@link #getApproximateSizes(ColumnFamilyHandle, List, SizeApproximationFlag...)}, + * except that it returns approximate number of records and size in memtables. + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family + * @param range the ranges over which to get the memtable stats + * + * @return the count and size for the range + */ + public CountAndSize getApproximateMemTableStats( + /*@Nullable*/ final ColumnFamilyHandle columnFamilyHandle, + final Range range) { + final long[] result = getApproximateMemTableStats(nativeHandle_, + columnFamilyHandle == null ? 
0 : columnFamilyHandle.nativeHandle_, + range.start.getNativeHandle(), + range.limit.getNativeHandle()); + return new CountAndSize(result[0], result[1]); } /** - *
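// Illustrative sketch of the size-approximation calls above, assuming an
// open RocksDB `db`; Slice, Range and SizeApproximationFlag are org.rocksdb
// types and the key bounds are examples only.
try (final Slice start = new Slice("a");
     final Slice limit = new Slice("z")) {
  final Range range = new Range(start, limit);
  final long[] sizes = db.getApproximateSizes(Arrays.asList(range),
      SizeApproximationFlag.INCLUDE_FILES,
      SizeApproximationFlag.INCLUDE_MEMTABLES);
  final RocksDB.CountAndSize memtableStats =
      db.getApproximateMemTableStats(null, range); // null = default family
}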

    Flush all memory table data.

    + * This method is similar to + * {@link #getApproximateSizes(ColumnFamilyHandle, List, SizeApproximationFlag...)}, + * except that it returns approximate number of records and size in memtables. * - *

    Note: it must be ensured that the FlushOptions instance - * is not GC'ed before this method finishes. If the wait parameter is - * set to false, flush processing is asynchronous.

    + * @param range the ranges over which to get the memtable stats * - * @param flushOptions {@link org.rocksdb.FlushOptions} instance. - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} instance. - * @throws RocksDBException thrown if an error occurs within the native - * part of the library. + * @return the count and size for the range */ - public void flush(final FlushOptions flushOptions, - final ColumnFamilyHandle columnFamilyHandle) throws RocksDBException { - flush(nativeHandle_, flushOptions.nativeHandle_, - columnFamilyHandle.nativeHandle_); + public CountAndSize getApproximateMemTableStats( + final Range range) { + return getApproximateMemTableStats(null, range); } /** @@ -1803,7 +2828,40 @@ public class RocksDB extends RocksObject { * part of the library. */ public void compactRange() throws RocksDBException { - compactRange0(nativeHandle_, false, -1, 0); + compactRange(null); + } + + /** + *

    Range compaction of column family.

    + *

    Note: After the database has been compacted, + * all data will have been pushed down to the last level containing + * any data.

    + * + *

    See also

    + *
      + *
    • + * {@link #compactRange(ColumnFamilyHandle, boolean, int, int)} + *
    • + *
    • + * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} + *
    • + *
    • + * {@link #compactRange(ColumnFamilyHandle, byte[], byte[], + * boolean, int, int)} + *
    • + *
    + * + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family. + * + * @throws RocksDBException thrown if an error occurs within the native + * part of the library. + */ + public void compactRange( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle) + throws RocksDBException { + compactRange(nativeHandle_, null, -1, null, -1, 0, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); } /** @@ -1827,45 +2885,44 @@ public class RocksDB extends RocksObject { */ public void compactRange(final byte[] begin, final byte[] end) throws RocksDBException { - compactRange0(nativeHandle_, begin, begin.length, end, - end.length, false, -1, 0); + compactRange(null, begin, end); } /** - *
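// Illustrative sketch of the compactRange() overloads, assuming an open
// RocksDB `db` and a non-null ColumnFamilyHandle `cf`; key bounds are
// examples only and a null bound means "unbounded on that side".
db.compactRange(cf); // compact the whole key space of `cf`
db.compactRange(cf, "a".getBytes(StandardCharsets.UTF_8),
    "m".getBytes(StandardCharsets.UTF_8)); // only ["a", "m")
try (final CompactRangeOptions options = new CompactRangeOptions()
         .setChangeLevel(true)
         .setTargetLevel(4)) {
  db.compactRange(cf, null, null, options); // full range, pushed to level 4
}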

    Range compaction of database.

    + *

    Range compaction of column family.

    *

    Note: After the database has been compacted, * all data will have been pushed down to the last level containing * any data.

    * - *

    Compaction outputs should be placed in options.db_paths - * [target_path_id]. Behavior is undefined if target_path_id is - * out of range.

    - * *

    See also

    *
      - *
    • {@link #compactRange()}
    • - *
    • {@link #compactRange(byte[], byte[])}
    • - *
    • {@link #compactRange(byte[], byte[], boolean, int, int)}
    • + *
    • {@link #compactRange(ColumnFamilyHandle)}
    • + *
    • + * {@link #compactRange(ColumnFamilyHandle, boolean, int, int)} + *
    • + *
    • + * {@link #compactRange(ColumnFamilyHandle, byte[], byte[], + * boolean, int, int)} + *
    • *
    * - * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} instead - * - * @param reduce_level reduce level after compaction - * @param target_level target level to compact to - * @param target_path_id the target path id of output path + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family. + * @param begin start of key range (included in range) + * @param end end of key range (excluded from range) * * @throws RocksDBException thrown if an error occurs within the native * part of the library. */ - @Deprecated - public void compactRange(final boolean reduce_level, - final int target_level, final int target_path_id) - throws RocksDBException { - compactRange0(nativeHandle_, reduce_level, - target_level, target_path_id); + public void compactRange( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final byte[] begin, final byte[] end) throws RocksDBException { + compactRange(nativeHandle_, + begin, begin == null ? -1 : begin.length, + end, end == null ? -1 : end.length, + 0, columnFamilyHandle == null ? 0: columnFamilyHandle.nativeHandle_); } - /** *

    Range compaction of database.

    *

    Note: After the database has been compacted, @@ -1879,27 +2936,23 @@ public class RocksDB extends RocksObject { *

    See also

    *
      *
    • {@link #compactRange()}
    • - *
    • {@link #compactRange(boolean, int, int)}
    • *
    • {@link #compactRange(byte[], byte[])}
    • + *
    • {@link #compactRange(byte[], byte[], boolean, int, int)}
    • *
    * * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} instead * - * @param begin start of key range (included in range) - * @param end end of key range (excluded from range) - * @param reduce_level reduce level after compaction - * @param target_level target level to compact to - * @param target_path_id the target path id of output path + * @param changeLevel reduce level after compaction + * @param targetLevel target level to compact to + * @param targetPathId the target path id of output path * * @throws RocksDBException thrown if an error occurs within the native * part of the library. */ @Deprecated - public void compactRange(final byte[] begin, final byte[] end, - final boolean reduce_level, final int target_level, - final int target_path_id) throws RocksDBException { - compactRange0(nativeHandle_, begin, begin.length, end, end.length, - reduce_level, target_level, target_path_id); + public void compactRange(final boolean changeLevel, final int targetLevel, + final int targetPathId) throws RocksDBException { + compactRange(null, changeLevel, targetLevel, targetPathId); } /** @@ -1908,11 +2961,13 @@ public class RocksDB extends RocksObject { * all data will have been pushed down to the last level containing * any data.

    * + *

    Compaction outputs should be placed in options.db_paths + * [target_path_id]. Behavior is undefined if target_path_id is + * out of range.

    + * *

    See also

    *
      - *
    • - * {@link #compactRange(ColumnFamilyHandle, boolean, int, int)} - *
    • + *
    • {@link #compactRange(ColumnFamilyHandle)}
    • *
    • * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} *
    • @@ -1922,16 +2977,67 @@ public class RocksDB extends RocksObject { * *
    * + * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} instead + * * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance. + * instance, or null for the default column family. + * @param changeLevel reduce level after compaction + * @param targetLevel target level to compact to + * @param targetPathId the target path id of output path * * @throws RocksDBException thrown if an error occurs within the native * part of the library. */ - public void compactRange(final ColumnFamilyHandle columnFamilyHandle) + @Deprecated + public void compactRange( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final boolean changeLevel, final int targetLevel, final int targetPathId) throws RocksDBException { - compactRange(nativeHandle_, false, -1, 0, - columnFamilyHandle.nativeHandle_); + final CompactRangeOptions options = new CompactRangeOptions(); + options.setChangeLevel(changeLevel); + options.setTargetLevel(targetLevel); + options.setTargetPathId(targetPathId); + compactRange(nativeHandle_, + null, -1, + null, -1, + options.nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + } + + /** + *

    Range compaction of database.

    + *

    Note: After the database has been compacted, + * all data will have been pushed down to the last level containing + * any data.

    + * + *

    Compaction outputs should be placed in options.db_paths + * [target_path_id]. Behavior is undefined if target_path_id is + * out of range.

    + * + *

    See also

    + *
      + *
    • {@link #compactRange()}
    • + *
    • {@link #compactRange(boolean, int, int)}
    • + *
    • {@link #compactRange(byte[], byte[])}
    • + *
    + * + * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} + * instead + * + * @param begin start of key range (included in range) + * @param end end of key range (excluded from range) + * @param changeLevel reduce level after compaction + * @param targetLevel target level to compact to + * @param targetPathId the target path id of output path + * + * @throws RocksDBException thrown if an error occurs within the native + * part of the library. + */ + @Deprecated + public void compactRange(final byte[] begin, final byte[] end, + final boolean changeLevel, final int targetLevel, + final int targetPathId) throws RocksDBException { + compactRange(null, begin, end, changeLevel, targetLevel, targetPathId); } /** @@ -1940,6 +3046,10 @@ public class RocksDB extends RocksObject { * all data will have been pushed down to the last level containing * any data.

    * + *

    Compaction outputs should be placed in options.db_paths + * [target_path_id]. Behavior is undefined if target_path_id is + * out of range.

    + * *

    See also

    *
      *
    • {@link #compactRange(ColumnFamilyHandle)}
    • @@ -1947,26 +3057,40 @@ public class RocksDB extends RocksObject { * {@link #compactRange(ColumnFamilyHandle, boolean, int, int)} * *
    • - * {@link #compactRange(ColumnFamilyHandle, byte[], byte[], - * boolean, int, int)} + * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} *
    • *
    * + * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} instead + * * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} * instance. * @param begin start of key range (included in range) * @param end end of key range (excluded from range) + * @param changeLevel reduce level after compaction + * @param targetLevel target level to compact to + * @param targetPathId the target path id of output path * * @throws RocksDBException thrown if an error occurs within the native * part of the library. */ - public void compactRange(final ColumnFamilyHandle columnFamilyHandle, - final byte[] begin, final byte[] end) throws RocksDBException { - compactRange(nativeHandle_, begin, begin.length, end, end.length, - false, -1, 0, columnFamilyHandle.nativeHandle_); + @Deprecated + public void compactRange( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final byte[] begin, final byte[] end, final boolean changeLevel, + final int targetLevel, final int targetPathId) + throws RocksDBException { + final CompactRangeOptions options = new CompactRangeOptions(); + options.setChangeLevel(changeLevel); + options.setTargetLevel(targetLevel); + options.setTargetPathId(targetPathId); + compactRange(nativeHandle_, + begin, begin == null ? -1 : begin.length, + end, end == null ? -1 : end.length, + options.nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); } - /** *

    Range compaction of column family.

    *

    Note: After the database has been compacted, @@ -1982,115 +3106,325 @@ public class RocksDB extends RocksObject { * part of the library. */ public void compactRange(final ColumnFamilyHandle columnFamilyHandle, - final byte[] begin, final byte[] end, CompactRangeOptions compactRangeOptions) throws RocksDBException { - compactRange(nativeHandle_, begin, begin.length, end, end.length, - compactRangeOptions.nativeHandle_, columnFamilyHandle.nativeHandle_); + final byte[] begin, final byte[] end, + final CompactRangeOptions compactRangeOptions) throws RocksDBException { + compactRange(nativeHandle_, + begin, begin == null ? -1 : begin.length, + end, end == null ? -1 : end.length, + compactRangeOptions.nativeHandle_, columnFamilyHandle.nativeHandle_); } /** - *

    Range compaction of column family.

    - *

    Note: After the database has been compacted, - * all data will have been pushed down to the last level containing - * any data.

    + * Change the options for the column family handle. * - *

    Compaction outputs should be placed in options.db_paths - * [target_path_id]. Behavior is undefined if target_path_id is - * out of range.

    + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance, or null for the default column family. + * @param mutableColumnFamilyOptions the options. + */ + public void setOptions( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle, + final MutableColumnFamilyOptions mutableColumnFamilyOptions) + throws RocksDBException { + setOptions(nativeHandle_, columnFamilyHandle.nativeHandle_, + mutableColumnFamilyOptions.getKeys(), + mutableColumnFamilyOptions.getValues()); + } + + /** + * Change the options for the default column family handle. * - *

    See also

    - *
      - *
    • {@link #compactRange(ColumnFamilyHandle)}
    • - *
    • - * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} - *
    • - *
    • - * {@link #compactRange(ColumnFamilyHandle, byte[], byte[], - * boolean, int, int)} - *
    • - *
    + * @param mutableColumnFamilyOptions the options. + */ + public void setOptions( + final MutableColumnFamilyOptions mutableColumnFamilyOptions) + throws RocksDBException { + setOptions(null, mutableColumnFamilyOptions); + } + + /** + * Set the options for the column family handle. * - * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} instead + * @param mutableDBoptions the options. + */ + public void setDBOptions(final MutableDBOptions mutableDBoptions) + throws RocksDBException { + setDBOptions(nativeHandle_, + mutableDBoptions.getKeys(), + mutableDBoptions.getValues()); + } + + /** + * Takes nputs a list of files specified by file names and + * compacts them to the specified level. + * + * Note that the behavior is different from + * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} + * in that CompactFiles() performs the compaction job using the CURRENT + * thread. + * + * @param compactionOptions compaction options + * @param inputFileNames the name of the files to compact + * @param outputLevel the level to which they should be compacted + * @param outputPathId the id of the output path, or -1 + * @param compactionJobInfo the compaction job info, this parameter + * will be updated with the info from compacting the files, + * can just be null if you don't need it. + */ + public List compactFiles( + final CompactionOptions compactionOptions, + final List inputFileNames, + final int outputLevel, + final int outputPathId, + /* @Nullable */ final CompactionJobInfo compactionJobInfo) + throws RocksDBException { + return compactFiles(compactionOptions, null, inputFileNames, outputLevel, + outputPathId, compactionJobInfo); + } + + /** + * Takes a list of files specified by file names and + * compacts them to the specified level. + * + * Note that the behavior is different from + * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} + * in that CompactFiles() performs the compaction job using the CURRENT + * thread. + * + * @param compactionOptions compaction options + * @param columnFamilyHandle columnFamilyHandle, or null for the + * default column family + * @param inputFileNames the name of the files to compact + * @param outputLevel the level to which they should be compacted + * @param outputPathId the id of the output path, or -1 + * @param compactionJobInfo the compaction job info, this parameter + * will be updated with the info from compacting the files, + * can just be null if you don't need it. + */ + public List compactFiles( + final CompactionOptions compactionOptions, + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle, + final List inputFileNames, + final int outputLevel, + final int outputPathId, + /* @Nullable */ final CompactionJobInfo compactionJobInfo) + throws RocksDBException { + return Arrays.asList(compactFiles(nativeHandle_, compactionOptions.nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + inputFileNames.toArray(new String[0]), + outputLevel, + outputPathId, + compactionJobInfo == null ? 0 : compactionJobInfo.nativeHandle_)); + } + + /** + * This function will wait until all currently running background processes + * finish. 
After it returns, no background process will be run until + * {@link #continueBackgroundWork()} is called + * + * @throws RocksDBException If an error occurs when pausing background work + */ + public void pauseBackgroundWork() throws RocksDBException { + pauseBackgroundWork(nativeHandle_); + } + + /** + * Resumes background work which was suspended by + * previously calling {@link #pauseBackgroundWork()} + * + * @throws RocksDBException If an error occurs when resuming background work + */ + public void continueBackgroundWork() throws RocksDBException { + continueBackgroundWork(nativeHandle_); + } + + /** + * Enable automatic compactions for the given column + * families if they were previously disabled. + * + * The function will first set the + * {@link ColumnFamilyOptions#disableAutoCompactions()} option for each + * column family to false, after which it will schedule a flush/compaction. + * + * NOTE: Setting disableAutoCompactions to 'false' through + * {@link #setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)} + * does NOT schedule a flush/compaction afterwards, and only changes the + * parameter itself within the column family option. + * + * @param columnFamilyHandles the column family handles + */ + public void enableAutoCompaction( + final List columnFamilyHandles) + throws RocksDBException { + enableAutoCompaction(nativeHandle_, + toNativeHandleList(columnFamilyHandles)); + } + + /** + * Number of levels used for this DB. + * + * @return the number of levels + */ + public int numberLevels() { + return numberLevels(null); + } + + /** + * Number of levels used for a column family in this DB. + * + * @param columnFamilyHandle the column family handle, or null + * for the default column family + * + * @return the number of levels + */ + public int numberLevels(/* @Nullable */final ColumnFamilyHandle columnFamilyHandle) { + return numberLevels(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + } + + /** + * Maximum level to which a new compacted memtable is pushed if it + * does not create overlap. + */ + public int maxMemCompactionLevel() { + return maxMemCompactionLevel(null); + } + + /** + * Maximum level to which a new compacted memtable is pushed if it + * does not create overlap. + * + * @param columnFamilyHandle the column family handle + */ + public int maxMemCompactionLevel( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle) { + return maxMemCompactionLevel(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + } + + /** + * Number of files in level-0 that would stop writes. + */ + public int level0StopWriteTrigger() { + return level0StopWriteTrigger(null); + } + + /** + * Number of files in level-0 that would stop writes. + * + * @param columnFamilyHandle the column family handle + */ + public int level0StopWriteTrigger( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle) { + return level0StopWriteTrigger(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + } + + /** + * Get DB name -- the exact same name that was provided as an argument to + * as path to {@link #open(Options, String)}. 
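// Illustrative sketch of quiescing background work, assuming an open
// RocksDB `db`; the try/finally pairing guarantees compactions and flushes
// are re-enabled even if the critical section fails.
db.pauseBackgroundWork();
try {
  // e.g. copy the DB directory at the filesystem level
} finally {
  db.continueBackgroundWork();
}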
+ * + * @return the DB name + */ + public String getName() { + return getName(nativeHandle_); + } + + /** + * Get the Env object from the DB + * + * @return the env + */ + public Env getEnv() { + final long envHandle = getEnv(nativeHandle_); + if (envHandle == Env.getDefault().nativeHandle_) { + return Env.getDefault(); + } else { + final Env env = new RocksEnv(envHandle); + env.disOwnNativeHandle(); // we do not own the Env! + return env; + } + } + + /** + *

    Flush all memory table data.

    + * + *

    Note: it must be ensured that the FlushOptions instance + * is not GC'ed before this method finishes. If the wait parameter is + * set to false, flush processing is asynchronous.

    + * + * @param flushOptions {@link org.rocksdb.FlushOptions} instance. + * @throws RocksDBException thrown if an error occurs within the native + * part of the library. + */ + public void flush(final FlushOptions flushOptions) + throws RocksDBException { + flush(flushOptions, (List) null); + } + + /** + *
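// Illustrative sketch of the flush() overloads, assuming an open RocksDB
// `db` and ColumnFamilyHandles `cfA`/`cfB`; setWaitForFlush(true) makes the
// call block until the memtables have been persisted.
try (final FlushOptions flushOptions =
         new FlushOptions().setWaitForFlush(true)) {
  db.flush(flushOptions);                          // default column family
  db.flush(flushOptions, Arrays.asList(cfA, cfB)); // atomic if enabled
}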

    Flush all memory table data.

    * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance. - * @param reduce_level reduce level after compaction - * @param target_level target level to compact to - * @param target_path_id the target path id of output path + *

    Note: it must be ensured that the FlushOptions instance + * is not GC'ed before this method finishes. If the wait parameter is + * set to false, flush processing is asynchronous.

    * + * @param flushOptions {@link org.rocksdb.FlushOptions} instance. + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} instance. * @throws RocksDBException thrown if an error occurs within the native * part of the library. */ - @Deprecated - public void compactRange(final ColumnFamilyHandle columnFamilyHandle, - final boolean reduce_level, final int target_level, - final int target_path_id) throws RocksDBException { - compactRange(nativeHandle_, reduce_level, target_level, - target_path_id, columnFamilyHandle.nativeHandle_); + public void flush(final FlushOptions flushOptions, + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle) + throws RocksDBException { + flush(flushOptions, + columnFamilyHandle == null ? null : Arrays.asList(columnFamilyHandle)); } /** - *

    Range compaction of column family.

    - *

    Note: After the database has been compacted, - * all data will have been pushed down to the last level containing - * any data.

    - * - *

    Compaction outputs should be placed in options.db_paths - * [target_path_id]. Behavior is undefined if target_path_id is - * out of range.

    - * - *

    See also

    - *
      - *
    • {@link #compactRange(ColumnFamilyHandle)}
    • - *
    • - * {@link #compactRange(ColumnFamilyHandle, boolean, int, int)} - *
    • - *
    • - * {@link #compactRange(ColumnFamilyHandle, byte[], byte[])} - *
    • - *
    + * Flushes multiple column families. * - * @deprecated Use {@link #compactRange(ColumnFamilyHandle, byte[], byte[], CompactRangeOptions)} instead + * If atomic flush is not enabled, this is equivalent to calling + * {@link #flush(FlushOptions, ColumnFamilyHandle)} multiple times. * - * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} - * instance. - * @param begin start of key range (included in range) - * @param end end of key range (excluded from range) - * @param reduce_level reduce level after compaction - * @param target_level target level to compact to - * @param target_path_id the target path id of output path + * If atomic flush is enabled, this will flush all column families + * specified up to the latest sequence number at the time when flush is + * requested. * + * @param flushOptions {@link org.rocksdb.FlushOptions} instance. + * @param columnFamilyHandles column family handles. * @throws RocksDBException thrown if an error occurs within the native * part of the library. */ - @Deprecated - public void compactRange(final ColumnFamilyHandle columnFamilyHandle, - final byte[] begin, final byte[] end, final boolean reduce_level, - final int target_level, final int target_path_id) + public void flush(final FlushOptions flushOptions, + /* @Nullable */ final List columnFamilyHandles) throws RocksDBException { - compactRange(nativeHandle_, begin, begin.length, end, end.length, - reduce_level, target_level, target_path_id, - columnFamilyHandle.nativeHandle_); + flush(nativeHandle_, flushOptions.nativeHandle_, + toNativeHandleList(columnFamilyHandles)); } /** - * This function will wait until all currently running background processes - * finish. After it returns, no background process will be run until - * {@link #continueBackgroundWork()} is called + * Flush the WAL memory buffer to the file. If {@code sync} is true, + * it calls {@link #syncWal()} afterwards. * - * @throws RocksDBException If an error occurs when pausing background work + * @param sync true to also fsync to disk. */ - public void pauseBackgroundWork() throws RocksDBException { - pauseBackgroundWork(nativeHandle_); + public void flushWal(final boolean sync) throws RocksDBException { + flushWal(nativeHandle_, sync); } /** - * Resumes backround work which was suspended by - * previously calling {@link #pauseBackgroundWork()} + * Sync the WAL. * - * @throws RocksDBException If an error occurs when resuming background work + * Note that {@link #write(WriteOptions, WriteBatch)} followed by + * {@link #syncWal()} is not exactly the same as + * {@link #write(WriteOptions, WriteBatch)} with + * {@link WriteOptions#sync()} set to true; In the latter case the changes + * won't be visible until the sync is done. + * + * Currently only works if {@link Options#allowMmapWrites()} is set to false. */ - public void continueBackgroundWork() throws RocksDBException { - continueBackgroundWork(nativeHandle_); + public void syncWal() throws RocksDBException { + syncWal(nativeHandle_); } /** @@ -2103,6 +3437,25 @@ public class RocksDB extends RocksObject { return getLatestSequenceNumber(nativeHandle_); } + /** + * Instructs DB to preserve deletes with sequence numbers >= sequenceNumber. + * + * Has no effect if DBOptions#preserveDeletes() is set to false. + * + * This function assumes that user calls this function with monotonically + * increasing seqnums (otherwise we can't guarantee that a particular delete + * hasn't been already processed). 
+ * + * @param sequenceNumber the minimum sequence number to preserve + * + * @return true if the value was successfully updated, + * false if the user attempted to call it with + * sequenceNumber <= current value. + */ + public boolean setPreserveDeletesSequenceNumber(final long sequenceNumber) { + return setPreserveDeletesSequenceNumber(nativeHandle_, sequenceNumber); + }
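A hedged sketch of how the flush and WAL methods documented above fit together; `db`, `cfA` and `cfB` stand for an already-open RocksDB instance and two of its column family handles, assumptions rather than names from this patch:

    // Flush two column families in one call; with atomic flush enabled in
    // DBOptions, both are flushed consistently up to the same sequence number.
    try (final FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
      db.flush(flushOptions, java.util.Arrays.asList(cfA, cfB)); // assumed handles
    }
    // Flush the WAL buffer and fsync it in one call; flushWal(true) behaves
    // like flushWal(false) followed by syncWal().
    db.flushWal(true);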
+ /** * Prevent file deletions. Compactions will continue to occur, * but no obsolete files will be deleted. Calling this multiple @@ -2140,6 +3493,78 @@ public class RocksDB extends RocksObject { enableFileDeletions(nativeHandle_, force); } + public static class LiveFiles { + /** + * The valid size of the manifest file. The manifest file is an ever-growing + * file, but only the portion specified here is valid for this snapshot. + */ + public final long manifestFileSize; + + /** + * The files are relative to the {@link #getName()} directory and are not + * absolute paths. Despite being relative paths, the file names begin + * with "/". + */ + public final List<String> files; + + LiveFiles(final long manifestFileSize, final List<String> files) { + this.manifestFileSize = manifestFileSize; + this.files = files; + } + } + + /** + * Retrieve the list of all files in the database after flushing the memtable. + * + * See {@link #getLiveFiles(boolean)}. + * + * @return the live files + */ + public LiveFiles getLiveFiles() throws RocksDBException { + return getLiveFiles(true); + } + + /** + * Retrieve the list of all files in the database. + * + * In case you have multiple column families, even if {@code flushMemtable} + * is true, you still need to call {@link #getSortedWalFiles()} + * after {@link #getLiveFiles(boolean)} to compensate for new data that + * arrived in already-flushed column families while other column families + * were flushing. + * + * NOTE: Calling {@link #getLiveFiles(boolean)} followed by + * {@link #getSortedWalFiles()} can generate a lossless backup. + * + * @param flushMemtable set to true to flush before recording the live + * files. Setting it to false is useful when we don't want to wait for a flush, + * which may in turn have to wait for compaction to complete, taking an + * indeterminate time. + * + * @return the live files + */ + public LiveFiles getLiveFiles(final boolean flushMemtable) + throws RocksDBException { + final String[] result = getLiveFiles(nativeHandle_, flushMemtable); + if (result == null) { + return null; + } + final String[] files = Arrays.copyOf(result, result.length - 1); + final long manifestFileSize = Long.parseLong(result[result.length - 1]); + + return new LiveFiles(manifestFileSize, Arrays.asList(files)); + } + + /** + * Retrieve the sorted list of all WAL files, earliest file first. + * + * @return the log files + */ + public List<LogFile> getSortedWalFiles() throws RocksDBException { + final LogFile[] logFiles = getSortedWalFiles(nativeHandle_); + return Arrays.asList(logFiles); + }
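A minimal sketch of the lossless-backup recipe from the NOTE above, assuming an open `db`; `copyToBackup` is a hypothetical helper, not part of this patch:

    db.disableFileDeletions(); // keep captured files from being deleted mid-backup
    try {
      // SST and manifest files first (flushing the memtable), then the WAL files,
      // so writes that land in between are still covered by the WAL copy.
      final RocksDB.LiveFiles liveFiles = db.getLiveFiles(true);
      final java.util.List<LogFile> walFiles = db.getSortedWalFiles();
      copyToBackup(liveFiles.files, liveFiles.manifestFileSize, walFiles); // hypothetical
    } finally {
      db.enableFileDeletions(false);
    }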
+ /** * Returns an iterator that is positioned at a write-batch containing * seq_number. If the sequence number is non-existent, it returns an iterator * @@ -2163,21 +3588,46 @@ public class RocksDB extends RocksObject { getUpdatesSince(nativeHandle_, sequenceNumber)); } - public void setOptions(final ColumnFamilyHandle columnFamilyHandle, - final MutableColumnFamilyOptions mutableColumnFamilyOptions) - throws RocksDBException { - setOptions(nativeHandle_, columnFamilyHandle.nativeHandle_, - mutableColumnFamilyOptions.getKeys(), - mutableColumnFamilyOptions.getValues()); + /** + * Delete the named file from the db directory and update the internal state to + * reflect that. Supports deletion of SST and log files only. 'name' must be a + * path relative to the db directory, e.g. 000001.sst, /archive/000003.log + * + * @param name the file name + */ + public void deleteFile(final String name) throws RocksDBException { + deleteFile(nativeHandle_, name); } - private long[] toNativeHandleList(final List objectList) { - final int len = objectList.size(); - final long[] handleList = new long[len]; - for (int i = 0; i < len; i++) { - handleList[i] = objectList.get(i).nativeHandle_; - } - return handleList; + /** + * Gets the metadata of all table files. + * + * @return table files metadata. + */ + public List<LiveFileMetaData> getLiveFilesMetaData() { + return Arrays.asList(getLiveFilesMetaData(nativeHandle_)); + } + + /** + * Obtains the meta data of the specified column family of the DB. + * + * @param columnFamilyHandle the column family + * + * @return the column family metadata + */ + public ColumnFamilyMetaData getColumnFamilyMetaData( + /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle) { + return getColumnFamilyMetaData(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + } + + /** + * Obtains the meta data of the default column family of the DB. + * + * @return the column family metadata + */ + public ColumnFamilyMetaData getColumnFamilyMetaData() { + return getColumnFamilyMetaData(null); } /** @@ -2201,7 +3651,7 @@ public class RocksDB extends RocksObject { final IngestExternalFileOptions ingestExternalFileOptions) throws RocksDBException { ingestExternalFile(nativeHandle_, getDefaultColumnFamily().nativeHandle_, - filePathList.toArray(new String[filePathList.size()]), + filePathList.toArray(new String[0]), filePathList.size(), ingestExternalFileOptions.nativeHandle_); } @@ -2228,10 +3678,162 @@ public class RocksDB extends RocksObject { final IngestExternalFileOptions ingestExternalFileOptions) throws RocksDBException { ingestExternalFile(nativeHandle_, columnFamilyHandle.nativeHandle_, - filePathList.toArray(new String[filePathList.size()]), + filePathList.toArray(new String[0]), filePathList.size(), ingestExternalFileOptions.nativeHandle_); } + /** + * Verify the checksums of the DB's data. + * + * @throws RocksDBException if a checksum is not valid + */ + public void verifyChecksum() throws RocksDBException { + verifyChecksum(nativeHandle_); + } + + /** + * Gets the handle for the default column family. + * + * @return The handle of the default column family + */ + public ColumnFamilyHandle getDefaultColumnFamily() { + final ColumnFamilyHandle cfHandle = new ColumnFamilyHandle(this, + getDefaultColumnFamily(nativeHandle_)); + cfHandle.disOwnNativeHandle(); + return cfHandle; + }
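To make the ingestion path above concrete, a small sketch; the SST path is an illustrative assumption, and the file would typically have been produced beforehand with an SstFileWriter:

    // Ingest a pre-built, key-ordered SST file into the default column family.
    try (final IngestExternalFileOptions ingestOptions =
             new IngestExternalFileOptions().setMoveFiles(true)) {
      db.ingestExternalFile(
          java.util.Collections.singletonList("/tmp/bulk-000001.sst"), // assumed path
          ingestOptions);
    }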
+ + /** + * Get the properties of all tables. + * + * @param columnFamilyHandle the column family handle, or null for the default + * column family. + * + * @return the properties + */ + public Map<String, TableProperties> getPropertiesOfAllTables( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle) + throws RocksDBException { + return getPropertiesOfAllTables(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + } + + /** + * Get the properties of all tables in the default column family. + * + * @return the properties + */ + public Map<String, TableProperties> getPropertiesOfAllTables() + throws RocksDBException { + return getPropertiesOfAllTables(null); + } + + /** + * Get the properties of tables in range. + * + * @param columnFamilyHandle the column family handle, or null for the default + * column family. + * @param ranges the ranges over which to get the table properties + * + * @return the properties + */ + public Map<String, TableProperties> getPropertiesOfTablesInRange( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle, + final List<Range> ranges) throws RocksDBException { + return getPropertiesOfTablesInRange(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + toRangeSliceHandles(ranges)); + } + + /** + * Get the properties of tables in range for the default column family. + * + * @param ranges the ranges over which to get the table properties + * + * @return the properties + */ + public Map<String, TableProperties> getPropertiesOfTablesInRange( + final List<Range> ranges) throws RocksDBException { + return getPropertiesOfTablesInRange(null, ranges); + } + + /** + * Suggest the range to compact. + * + * @param columnFamilyHandle the column family handle, or null for the default + * column family. + * + * @return the suggested range. + */ + public Range suggestCompactRange( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle) + throws RocksDBException { + final long[] rangeSliceHandles = suggestCompactRange(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_); + return new Range(new Slice(rangeSliceHandles[0]), + new Slice(rangeSliceHandles[1])); + } + + /** + * Suggest the range to compact for the default column family. + * + * @return the suggested range. + */ + public Range suggestCompactRange() + throws RocksDBException { + return suggestCompactRange(null); + } + + /** + * Promote L0 files up to a target level. + * + * @param columnFamilyHandle the column family handle, + * or null for the default column family. + * @param targetLevel the level to promote the L0 files to. + */ + public void promoteL0( + /* @Nullable */final ColumnFamilyHandle columnFamilyHandle, + final int targetLevel) throws RocksDBException { + promoteL0(nativeHandle_, + columnFamilyHandle == null ? 0 : columnFamilyHandle.nativeHandle_, + targetLevel); + } + + /** + * Promote L0 files for the default column family. + * + * @param targetLevel the level to promote the L0 files to. + */ + public void promoteL0(final int targetLevel) + throws RocksDBException { + promoteL0(null, targetLevel); + } + + /** + * Trace DB operations. + * + * Use {@link #endTrace()} to stop tracing. + * + * @param traceOptions the options + * @param traceWriter the trace writer + */ + public void startTrace(final TraceOptions traceOptions, + final AbstractTraceWriter traceWriter) throws RocksDBException { + startTrace(nativeHandle_, traceOptions.getMaxTraceFileSize(), + traceWriter.nativeHandle_); + /** + * NOTE: {@link #startTrace(long, long, long)} transfers the ownership + * from Java to C++, so we must disown the native handle here. + */ + traceWriter.disOwnNativeHandle(); + }
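A hedged sketch of the tracing API just shown; `MyTraceWriter` stands in for any user-supplied AbstractTraceWriter subclass and `runWorkload` for an arbitrary workload, neither of which is defined in this patch:

    // Record DB operations to the writer until endTrace() is called.
    final TraceOptions traceOptions = new TraceOptions(64L * 1024 * 1024); // max trace file size
    db.startTrace(traceOptions, new MyTraceWriter()); // C++ takes ownership of the writer
    runWorkload(db);
    db.endTrace();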
+ + /** + * Stop tracing DB operations. + * + * See {@link #startTrace(TraceOptions, AbstractTraceWriter)} */ + public void endTrace() throws RocksDBException { + endTrace(nativeHandle_); + } + /** * Static method to destroy the contents of the specified database. * Be very careful using this method. @@ -2247,17 +3849,47 @@ public class RocksDB extends RocksObject { destroyDB(path, options.nativeHandle_); } - /** - * Private constructor. - * - * @param nativeHandle The native handle of the C++ RocksDB object - */ - protected RocksDB(final long nativeHandle) { - super(nativeHandle); + private /* @Nullable */ long[] toNativeHandleList( + /* @Nullable */ final List<? extends RocksObject> objectList) { + if (objectList == null) { + return null; + } + final int len = objectList.size(); + final long[] handleList = new long[len]; + for (int i = 0; i < len; i++) { + handleList[i] = objectList.get(i).nativeHandle_; + } + return handleList; + } + + private static long[] toRangeSliceHandles(final List<Range> ranges) { + final long[] rangeSliceHandles = new long[ranges.size() * 2]; + for (int i = 0, j = 0; i < ranges.size(); i++) { + final Range range = ranges.get(i); + rangeSliceHandles[j++] = range.start.getNativeHandle(); + rangeSliceHandles[j++] = range.limit.getNativeHandle(); + } + return rangeSliceHandles; + } + + protected void storeOptionsInstance(DBOptionsInterface options) { + options_ = options; + } + + private static void checkBounds(int offset, int len, int size) { + if ((offset | len | (offset + len) | (size - (offset + len))) < 0) { + throw new IndexOutOfBoundsException(String.format("offset(%d), len(%d), size(%d)", offset, len, size)); + } + } + + private static int computeCapacityHint(final int estimatedNumberOfItems) { + // Default load factor for HashMap is 0.75, so a capacity of N * 1.5 keeps + // N items within the load limit. We add +1 for a buffer.
+ return (int)Math.ceil(estimatedNumberOfItems * 1.5 + 1.0); } // native methods - protected native static long open(final long optionsHandle, + private native static long open(final long optionsHandle, final String path) throws RocksDBException; /** @@ -2272,11 +3904,11 @@ public class RocksDB extends RocksObject { * * @throws RocksDBException thrown if the database could not be opened */ - protected native static long[] open(final long optionsHandle, + private native static long[] open(final long optionsHandle, final String path, final byte[][] columnFamilyNames, final long[] columnFamilyOptions) throws RocksDBException; - protected native static long openROnly(final long optionsHandle, + private native static long openROnly(final long optionsHandle, final String path) throws RocksDBException; /** @@ -2291,175 +3923,258 @@ public class RocksDB extends RocksObject { * * @throws RocksDBException thrown if the database could not be opened */ - protected native static long[] openROnly(final long optionsHandle, + private native static long[] openROnly(final long optionsHandle, final String path, final byte[][] columnFamilyNames, final long[] columnFamilyOptions ) throws RocksDBException; - protected native static byte[][] listColumnFamilies(long optionsHandle, - String path) throws RocksDBException; - protected native void put(long handle, byte[] key, int keyOffset, - int keyLength, byte[] value, int valueOffset, int valueLength) + @Override protected native void disposeInternal(final long handle); + + private native static void closeDatabase(final long handle) + throws RocksDBException; + private native static byte[][] listColumnFamilies(final long optionsHandle, + final String path) throws RocksDBException; + private native long createColumnFamily(final long handle, + final byte[] columnFamilyName, final int columnFamilyNamelen, + final long columnFamilyOptions) throws RocksDBException; + private native long[] createColumnFamilies(final long handle, + final long columnFamilyOptionsHandle, final byte[][] columnFamilyNames) + throws RocksDBException; + private native long[] createColumnFamilies(final long handle, + final long columnFamilyOptionsHandles[], final byte[][] columnFamilyNames) + throws RocksDBException; + private native void dropColumnFamily( + final long handle, final long cfHandle) throws RocksDBException; + private native void dropColumnFamilies(final long handle, + final long[] cfHandles) throws RocksDBException; + //TODO(AR) best way to express DestroyColumnFamilyHandle? ...maybe in ColumnFamilyHandle? 
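The checkBounds guard shown a little earlier protects the offset/length overloads whose native counterparts are declared next. A usage sketch with purely illustrative buffers and offsets:

    // Write key bytes [7, 11) and value bytes [5, 10) without copying sub-arrays;
    // checkBounds rejects windows that fall outside either array.
    final byte[] keyBuf = "prefix:key1".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    final byte[] valBuf = "abcd:value".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    db.put(keyBuf, 7, 4, valBuf, 5, 5); // stores "key1" -> "value"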
+ private native void put(final long handle, final byte[] key, + final int keyOffset, final int keyLength, final byte[] value, + final int valueOffset, int valueLength) throws RocksDBException; + private native void put(final long handle, final byte[] key, final int keyOffset, + final int keyLength, final byte[] value, final int valueOffset, + final int valueLength, final long cfHandle) throws RocksDBException; + private native void put(final long handle, final long writeOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final byte[] value, final int valueOffset, final int valueLength) + throws RocksDBException; + private native void put(final long handle, final long writeOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final byte[] value, final int valueOffset, final int valueLength, + final long cfHandle) throws RocksDBException; + private native void delete(final long handle, final byte[] key, + final int keyOffset, final int keyLength) throws RocksDBException; + private native void delete(final long handle, final byte[] key, + final int keyOffset, final int keyLength, final long cfHandle) + throws RocksDBException; + private native void delete(final long handle, final long writeOptHandle, + final byte[] key, final int keyOffset, final int keyLength) + throws RocksDBException; + private native void delete(final long handle, final long writeOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final long cfHandle) throws RocksDBException; + private native void singleDelete( + final long handle, final byte[] key, final int keyLen) + throws RocksDBException; + private native void singleDelete( + final long handle, final byte[] key, final int keyLen, + final long cfHandle) throws RocksDBException; + private native void singleDelete( + final long handle, final long writeOptHandle, final byte[] key, + final int keyLen) throws RocksDBException; + private native void singleDelete( + final long handle, final long writeOptHandle, + final byte[] key, final int keyLen, final long cfHandle) throws RocksDBException; - protected native void put(long handle, byte[] key, int keyOffset, - int keyLength, byte[] value, int valueOffset, int valueLength, - long cfHandle) throws RocksDBException; - protected native void put(long handle, long writeOptHandle, byte[] key, - int keyOffset, int keyLength, byte[] value, int valueOffset, - int valueLength) throws RocksDBException; - protected native void put(long handle, long writeOptHandle, byte[] key, - int keyOffset, int keyLength, byte[] value, int valueOffset, - int valueLength, long cfHandle) throws RocksDBException; - protected native void write0(final long handle, long writeOptHandle, - long wbHandle) throws RocksDBException; - protected native void write1(final long handle, long writeOptHandle, - long wbwiHandle) throws RocksDBException; - protected native boolean keyMayExist(final long handle, final byte[] key, + private native void deleteRange(final long handle, final byte[] beginKey, + final int beginKeyOffset, final int beginKeyLength, final byte[] endKey, + final int endKeyOffset, final int endKeyLength) throws RocksDBException; + private native void deleteRange(final long handle, final byte[] beginKey, + final int beginKeyOffset, final int beginKeyLength, final byte[] endKey, + final int endKeyOffset, final int endKeyLength, final long cfHandle) + throws RocksDBException; + private native void deleteRange(final long handle, final long writeOptHandle, + final byte[] beginKey, final 
int beginKeyOffset, final int beginKeyLength, + final byte[] endKey, final int endKeyOffset, final int endKeyLength) + throws RocksDBException; + private native void deleteRange( + final long handle, final long writeOptHandle, final byte[] beginKey, + final int beginKeyOffset, final int beginKeyLength, final byte[] endKey, + final int endKeyOffset, final int endKeyLength, final long cfHandle) + throws RocksDBException; + private native void merge(final long handle, final byte[] key, + final int keyOffset, final int keyLength, final byte[] value, + final int valueOffset, final int valueLength) throws RocksDBException; + private native void merge(final long handle, final byte[] key, + final int keyOffset, final int keyLength, final byte[] value, + final int valueOffset, final int valueLength, final long cfHandle) + throws RocksDBException; + private native void merge(final long handle, final long writeOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final byte[] value, final int valueOffset, final int valueLength) + throws RocksDBException; + private native void merge(final long handle, final long writeOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final byte[] value, final int valueOffset, final int valueLength, + final long cfHandle) throws RocksDBException; + private native void write0(final long handle, final long writeOptHandle, + final long wbHandle) throws RocksDBException; + private native void write1(final long handle, final long writeOptHandle, + final long wbwiHandle) throws RocksDBException; + private native int get(final long handle, final byte[] key, + final int keyOffset, final int keyLength, final byte[] value, + final int valueOffset, final int valueLength) throws RocksDBException; + private native int get(final long handle, final byte[] key, + final int keyOffset, final int keyLength, byte[] value, + final int valueOffset, final int valueLength, final long cfHandle) + throws RocksDBException; + private native int get(final long handle, final long readOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final byte[] value, final int valueOffset, final int valueLength) + throws RocksDBException; + private native int get(final long handle, final long readOptHandle, + final byte[] key, final int keyOffset, final int keyLength, + final byte[] value, final int valueOffset, final int valueLength, + final long cfHandle) throws RocksDBException; + private native byte[] get(final long handle, byte[] key, final int keyOffset, + final int keyLength) throws RocksDBException; + private native byte[] get(final long handle, final byte[] key, + final int keyOffset, final int keyLength, final long cfHandle) + throws RocksDBException; + private native byte[] get(final long handle, final long readOptHandle, + final byte[] key, final int keyOffset, final int keyLength) + throws RocksDBException; + private native byte[] get(final long handle, + final long readOptHandle, final byte[] key, final int keyOffset, + final int keyLength, final long cfHandle) throws RocksDBException; + private native byte[][] multiGet(final long dbHandle, final byte[][] keys, + final int[] keyOffsets, final int[] keyLengths); + private native byte[][] multiGet(final long dbHandle, final byte[][] keys, + final int[] keyOffsets, final int[] keyLengths, + final long[] columnFamilyHandles); + private native byte[][] multiGet(final long dbHandle, final long rOptHandle, + final byte[][] keys, final int[] keyOffsets, final int[] keyLengths); + private 
native byte[][] multiGet(final long dbHandle, final long rOptHandle, + final byte[][] keys, final int[] keyOffsets, final int[] keyLengths, + final long[] columnFamilyHandles); + private native boolean keyMayExist(final long handle, final byte[] key, final int keyOffset, final int keyLength, final StringBuilder stringBuilder); - protected native boolean keyMayExist(final long handle, final byte[] key, + private native boolean keyMayExist(final long handle, final byte[] key, final int keyOffset, final int keyLength, final long cfHandle, final StringBuilder stringBuilder); - protected native boolean keyMayExist(final long handle, + private native boolean keyMayExist(final long handle, final long optionsHandle, final byte[] key, final int keyOffset, final int keyLength, final StringBuilder stringBuilder); - protected native boolean keyMayExist(final long handle, + private native boolean keyMayExist(final long handle, final long optionsHandle, final byte[] key, final int keyOffset, final int keyLength, final long cfHandle, final StringBuilder stringBuilder); - protected native void merge(long handle, byte[] key, int keyOffset, - int keyLength, byte[] value, int valueOffset, int valueLength) + private native long iterator(final long handle); + private native long iterator(final long handle, final long readOptHandle); + private native long iteratorCF(final long handle, final long cfHandle); + private native long iteratorCF(final long handle, final long cfHandle, + final long readOptHandle); + private native long[] iterators(final long handle, + final long[] columnFamilyHandles, final long readOptHandle) throws RocksDBException; - protected native void merge(long handle, byte[] key, int keyOffset, - int keyLength, byte[] value, int valueOffset, int valueLength, - long cfHandle) throws RocksDBException; - protected native void merge(long handle, long writeOptHandle, byte[] key, - int keyOffset, int keyLength, byte[] value, int valueOffset, - int valueLength) throws RocksDBException; - protected native void merge(long handle, long writeOptHandle, byte[] key, - int keyOffset, int keyLength, byte[] value, int valueOffset, - int valueLength, long cfHandle) throws RocksDBException; - protected native int get(long handle, byte[] key, int keyOffset, - int keyLength, byte[] value, int valueOffset, int valueLength) + private native long getSnapshot(final long nativeHandle); + private native void releaseSnapshot( + final long nativeHandle, final long snapshotHandle); + private native String getProperty(final long nativeHandle, + final long cfHandle, final String property, final int propertyLength) throws RocksDBException; - protected native int get(long handle, byte[] key, int keyOffset, - int keyLength, byte[] value, int valueOffset, int valueLength, - long cfHandle) throws RocksDBException; - protected native int get(long handle, long readOptHandle, byte[] key, - int keyOffset, int keyLength, byte[] value, int valueOffset, - int valueLength) throws RocksDBException; - protected native int get(long handle, long readOptHandle, byte[] key, - int keyOffset, int keyLength, byte[] value, int valueOffset, - int valueLength, long cfHandle) throws RocksDBException; - protected native byte[][] multiGet(final long dbHandle, final byte[][] keys, - final int[] keyOffsets, final int[] keyLengths); - protected native byte[][] multiGet(final long dbHandle, final byte[][] keys, - final int[] keyOffsets, final int[] keyLengths, - final long[] columnFamilyHandles); - protected native byte[][] multiGet(final long dbHandle, 
final long rOptHandle, - final byte[][] keys, final int[] keyOffsets, final int[] keyLengths); - protected native byte[][] multiGet(final long dbHandle, final long rOptHandle, - final byte[][] keys, final int[] keyOffsets, final int[] keyLengths, - final long[] columnFamilyHandles); - protected native byte[] get(long handle, byte[] key, int keyOffset, - int keyLength) throws RocksDBException; - protected native byte[] get(long handle, byte[] key, int keyOffset, - int keyLength, long cfHandle) throws RocksDBException; - protected native byte[] get(long handle, long readOptHandle, - byte[] key, int keyOffset, int keyLength) throws RocksDBException; - protected native byte[] get(long handle, long readOptHandle, byte[] key, - int keyOffset, int keyLength, long cfHandle) throws RocksDBException; - protected native void delete(long handle, byte[] key, int keyOffset, - int keyLength) throws RocksDBException; - protected native void delete(long handle, byte[] key, int keyOffset, - int keyLength, long cfHandle) throws RocksDBException; - protected native void delete(long handle, long writeOptHandle, byte[] key, - int keyOffset, int keyLength) throws RocksDBException; - protected native void delete(long handle, long writeOptHandle, byte[] key, - int keyOffset, int keyLength, long cfHandle) throws RocksDBException; - protected native void singleDelete( - long handle, byte[] key, int keyLen) throws RocksDBException; - protected native void singleDelete( - long handle, byte[] key, int keyLen, long cfHandle) + private native Map getMapProperty(final long nativeHandle, + final long cfHandle, final String property, final int propertyLength) throws RocksDBException; - protected native void singleDelete( - long handle, long writeOptHandle, - byte[] key, int keyLen) throws RocksDBException; - protected native void singleDelete( - long handle, long writeOptHandle, - byte[] key, int keyLen, long cfHandle) throws RocksDBException; - protected native void deleteRange(long handle, byte[] beginKey, int beginKeyOffset, - int beginKeyLength, byte[] endKey, int endKeyOffset, int endKeyLength) + private native long getLongProperty(final long nativeHandle, + final long cfHandle, final String property, final int propertyLength) throws RocksDBException; - protected native void deleteRange(long handle, byte[] beginKey, int beginKeyOffset, - int beginKeyLength, byte[] endKey, int endKeyOffset, int endKeyLength, long cfHandle) + private native void resetStats(final long nativeHandle) throws RocksDBException; - protected native void deleteRange(long handle, long writeOptHandle, byte[] beginKey, - int beginKeyOffset, int beginKeyLength, byte[] endKey, int endKeyOffset, int endKeyLength) + private native long getAggregatedLongProperty(final long nativeHandle, + final String property, int propertyLength) throws RocksDBException; + private native long[] getApproximateSizes(final long nativeHandle, + final long columnFamilyHandle, final long[] rangeSliceHandles, + final byte includeFlags); + private final native long[] getApproximateMemTableStats( + final long nativeHandle, final long columnFamilyHandle, + final long rangeStartSliceHandle, final long rangeLimitSliceHandle); + private native void compactRange(final long handle, + /* @Nullable */ final byte[] begin, final int beginLen, + /* @Nullable */ final byte[] end, final int endLen, + final long compactRangeOptHandle, final long cfHandle) throws RocksDBException; - protected native void deleteRange(long handle, long writeOptHandle, byte[] beginKey, - int beginKeyOffset, int 
beginKeyLength, byte[] endKey, int endKeyOffset, int endKeyLength, - long cfHandle) throws RocksDBException; - protected native String getProperty0(long nativeHandle, - String property, int propertyLength) throws RocksDBException; - protected native String getProperty0(long nativeHandle, long cfHandle, - String property, int propertyLength) throws RocksDBException; - protected native long getLongProperty(long nativeHandle, String property, - int propertyLength) throws RocksDBException; - protected native long getLongProperty(long nativeHandle, long cfHandle, - String property, int propertyLength) throws RocksDBException; - protected native long getAggregatedLongProperty(long nativeHandle, String property, - int propertyLength) throws RocksDBException; - protected native long iterator(long handle); - protected native long iterator(long handle, long readOptHandle); - protected native long iteratorCF(long handle, long cfHandle); - protected native long iteratorCF(long handle, long cfHandle, - long readOptHandle); - protected native long[] iterators(final long handle, - final long[] columnFamilyHandles, final long readOptHandle) + private native void setOptions(final long handle, final long cfHandle, + final String[] keys, final String[] values) throws RocksDBException; + private native void setDBOptions(final long handle, + final String[] keys, final String[] values) throws RocksDBException; + private native String[] compactFiles(final long handle, + final long compactionOptionsHandle, + final long columnFamilyHandle, + final String[] inputFileNames, + final int outputLevel, + final int outputPathId, + final long compactionJobInfoHandle) throws RocksDBException; + private native void pauseBackgroundWork(final long handle) throws RocksDBException; - protected native long getSnapshot(long nativeHandle); - protected native void releaseSnapshot( - long nativeHandle, long snapshotHandle); - @Override protected native void disposeInternal(final long handle); - private native long getDefaultColumnFamily(long handle); - private native long createColumnFamily(final long handle, - final byte[] columnFamilyName, final long columnFamilyOptions) + private native void continueBackgroundWork(final long handle) throws RocksDBException; - private native void dropColumnFamily(long handle, long cfHandle) + private native void enableAutoCompaction(final long handle, + final long[] columnFamilyHandles) throws RocksDBException; + private native int numberLevels(final long handle, + final long columnFamilyHandle); + private native int maxMemCompactionLevel(final long handle, + final long columnFamilyHandle); + private native int level0StopWriteTrigger(final long handle, + final long columnFamilyHandle); + private native String getName(final long handle); + private native long getEnv(final long handle); + private native void flush(final long handle, final long flushOptHandle, + /* @Nullable */ final long[] cfHandles) throws RocksDBException; + private native void flushWal(final long handle, final boolean sync) throws RocksDBException; - private native void flush(long handle, long flushOptHandle) + private native void syncWal(final long handle) throws RocksDBException; + private native long getLatestSequenceNumber(final long handle); + private native boolean setPreserveDeletesSequenceNumber(final long handle, + final long sequenceNumber); + private native void disableFileDeletions(long handle) throws RocksDBException; + private native void enableFileDeletions(long handle, boolean force) throws RocksDBException; - 
private native void flush(long handle, long flushOptHandle, long cfHandle) + private native String[] getLiveFiles(final long handle, + final boolean flushMemtable) throws RocksDBException; + private native LogFile[] getSortedWalFiles(final long handle) throws RocksDBException; - private native void compactRange0(long handle, boolean reduce_level, - int target_level, int target_path_id) throws RocksDBException; - private native void compactRange0(long handle, byte[] begin, int beginLen, - byte[] end, int endLen, boolean reduce_level, int target_level, - int target_path_id) throws RocksDBException; - private native void compactRange(long handle, byte[] begin, int beginLen, - byte[] end, int endLen, long compactRangeOptHandle, long cfHandle) - throws RocksDBException; - private native void compactRange(long handle, boolean reduce_level, - int target_level, int target_path_id, long cfHandle) + private native long getUpdatesSince(final long handle, + final long sequenceNumber) throws RocksDBException; + private native void deleteFile(final long handle, final String name) throws RocksDBException; - private native void compactRange(long handle, byte[] begin, int beginLen, - byte[] end, int endLen, boolean reduce_level, int target_level, - int target_path_id, long cfHandle) throws RocksDBException; - private native void pauseBackgroundWork(long handle) throws RocksDBException; - private native void continueBackgroundWork(long handle) throws RocksDBException; - private native long getLatestSequenceNumber(long handle); - private native void disableFileDeletions(long handle) throws RocksDBException; - private native void enableFileDeletions(long handle, boolean force) + private native LiveFileMetaData[] getLiveFilesMetaData(final long handle); + private native ColumnFamilyMetaData getColumnFamilyMetaData( + final long handle, final long columnFamilyHandle); + private native void ingestExternalFile(final long handle, + final long columnFamilyHandle, final String[] filePathList, + final int filePathListLen, final long ingestExternalFileOptionsHandle) throws RocksDBException; - private native long getUpdatesSince(long handle, long sequenceNumber) + private native void verifyChecksum(final long handle) throws RocksDBException; + private native long getDefaultColumnFamily(final long handle); + private native Map<String, TableProperties> getPropertiesOfAllTables( + final long handle, final long columnFamilyHandle) throws RocksDBException; + private native Map<String, TableProperties> getPropertiesOfTablesInRange( + final long handle, final long columnFamilyHandle, + final long[] rangeSliceHandles); + private native long[] suggestCompactRange(final long handle, + final long columnFamilyHandle) throws RocksDBException; + private native void promoteL0(final long handle, + final long columnFamilyHandle, final int targetLevel) throws RocksDBException; - private native void setOptions(long handle, long cfHandle, String[] keys, - String[] values) throws RocksDBException; - private native void ingestExternalFile(long handle, long cfHandle, - String[] filePathList, int filePathListLen, - long ingest_external_file_options_handle) throws RocksDBException; + private native void startTrace(final long handle, final long maxTraceFileSize, + final long traceWriterHandle) throws RocksDBException; + private native void endTrace(final long handle) throws RocksDBException; + + private native static void destroyDB(final String path, final long optionsHandle) throws RocksDBException; + protected DBOptionsInterface options_; }
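A hedged sketch of the dynamic-options path backed by the setOptions native declared above; `cfHandle` is an assumed column family handle, not a name from this patch:

    // Raise one column family's write buffer size at runtime, without reopening the DB.
    db.setOptions(cfHandle,
        MutableColumnFamilyOptions.builder()
            .setWriteBufferSize(64L * 1024 * 1024)
            .build());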
diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksEnv.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksEnv.java index 8fe61fd45..b3681d77d 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksEnv.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksEnv.java @@ -25,19 +25,8 @@ public class RocksEnv extends Env { */ RocksEnv(final long handle) { super(handle); - disOwnNativeHandle(); } - /** - *
    The helper function of {@link #dispose()} which all subclasses of - * {@link RocksObject} must implement to release their associated C++ - * resource.
    - * - *
    Note: this class is used to use the default - * RocksEnv with RocksJava. The default env allocation is managed - * by C++.
    - */ @Override - protected final void disposeInternal(final long handle) { - } + protected native final void disposeInternal(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksIteratorInterface.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksIteratorInterface.java index 7ce31509e..a5a9eb88d 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksIteratorInterface.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksIteratorInterface.java @@ -41,7 +41,7 @@ public interface RocksIteratorInterface { void seekToLast(); /** - *
    Position at the first entry in the source whose key is that or + *
    Position at the first entry in the source whose key is at or * past target.
    * *
    The iterator is valid after this call if the source contains diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksMemEnv.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksMemEnv.java index d18d0ceb9..0afa5f662 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksMemEnv.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/RocksMemEnv.java @@ -6,22 +6,34 @@ package org.rocksdb; /** - * RocksDB memory environment. + * Memory environment. */ +//TODO(AR) rename to MemEnv public class RocksMemEnv extends Env { /** - *
    Creates a new RocksDB environment that stores its data + *
    Creates a new environment that stores its data * in memory and delegates all non-file-storage tasks to - * base_env. The caller must delete the result when it is + * {@code baseEnv}.
    + * + *
    The caller must delete the result when it is * no longer needed.
    * - *
    {@code *base_env} must remain live while the result is in use.
    + * @param baseEnv the base environment, + * must remain live while the result is in use. + */ + public RocksMemEnv(final Env baseEnv) { + super(createMemEnv(baseEnv.nativeHandle_)); + } + + /** + * @deprecated Use {@link #RocksMemEnv(Env)}. */ + @Deprecated public RocksMemEnv() { - super(createMemEnv()); + this(Env.getDefault()); } - private static native long createMemEnv(); + private static native long createMemEnv(final long baseEnvHandle); @Override protected final native void disposeInternal(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SizeApproximationFlag.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SizeApproximationFlag.java new file mode 100644 index 000000000..7807e7c83 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SizeApproximationFlag.java @@ -0,0 +1,30 @@ +package org.rocksdb; + +import java.util.List; + +/** + * Flags for + * {@link RocksDB#getApproximateSizes(ColumnFamilyHandle, List, SizeApproximationFlag...)} + * that specify whether memtable stats should be included, + * or file stats approximation or both. + */ +public enum SizeApproximationFlag { + NONE((byte)0x0), + INCLUDE_MEMTABLES((byte)0x1), + INCLUDE_FILES((byte)0x2); + + private final byte value; + + SizeApproximationFlag(final byte value) { + this.value = value; + } + + /** + * Get the internal byte representation. + * + * @return the internal representation. + */ + byte getValue() { + return value; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Slice.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Slice.java index 08a940c3f..50d9f7652 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Slice.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Slice.java @@ -55,7 +55,8 @@ public class Slice extends AbstractSlice { * Slice instances using a handle.
* * @param nativeHandle address of native instance. - * @param owningNativeHandle whether to own this reference from the C++ side or not + * @param owningNativeHandle true if the Java side owns the memory pointed to + * by this reference, false if ownership belongs to the C++ side */ Slice(final long nativeHandle, final boolean owningNativeHandle) { super(); diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileManager.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileManager.java index f1dfc516e..8805410aa 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileManager.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileManager.java @@ -28,6 +28,8 @@ public final class SstFileManager extends RocksObject { * instances to track SST files and control their deletion rate. * * @param env the environment. + * + * @throws RocksDBException thrown if an error occurs in the underlying native library. */ public SstFileManager(final Env env) throws RocksDBException { this(env, null); @@ -39,6 +41,8 @@ public final class SstFileManager extends RocksObject { * * @param env the environment. * @param logger if not null, the logger will be used to log errors. + * + * @throws RocksDBException thrown if an error occurs in the underlying native library. */ public SstFileManager(final Env env, /*@Nullable*/ final Logger logger) throws RocksDBException { @@ -57,6 +61,8 @@ public final class SstFileManager extends RocksObject { * this value is set to 1024 (1 Kb / sec) and we deleted a file of size * 4 Kb in 1 second, we will wait for another 3 seconds before we delete * other files. Set to 0 to disable deletion rate limiting. + * + * @throws RocksDBException thrown if an error occurs in the underlying native library. */ public SstFileManager(final Env env, /*@Nullable*/ final Logger logger, final long rateBytesPerSec) throws RocksDBException { @@ -78,6 +84,8 @@ public final class SstFileManager extends RocksObject { * @param maxTrashDbRatio if the trash size constitutes more than this * fraction of the total DB size we will start deleting new files passed * to DeleteScheduler immediately. + * + * @throws RocksDBException thrown if an error occurs in the underlying native library. */ public SstFileManager(final Env env, /*@Nullable*/ final Logger logger, final long rateBytesPerSec, final double maxTrashDbRatio) @@ -104,6 +112,8 @@ public final class SstFileManager extends RocksObject { * @param bytesMaxDeleteChunk if a single file is larger than delete chunk, * ftruncate the file by this size each time, rather than dropping the whole * file. 0 means to always delete the whole file. + * + * @throws RocksDBException thrown if an error occurs in the underlying native library. */ public SstFileManager(final Env env, /*@Nullable*/final Logger logger, final long rateBytesPerSec, final double maxTrashDbRatio,
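A sketch of wiring the rate-limited deletion behaviour described above into a database; the path and the 64 MB/s rate are illustrative assumptions:

    // Trickle obsolete SST file deletions out at no more than 64 MB/s.
    try (final SstFileManager sstFileManager =
             new SstFileManager(Env.getDefault(), null, 64L * 1024 * 1024);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setSstFileManager(sstFileManager);
         final RocksDB db = RocksDB.open(options, "/tmp/rate-limited-db")) {
      // Obsolete files produced by flushes/compactions are now deleted gradually.
    }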
diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileMetaData.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileMetaData.java new file mode 100644 index 000000000..52e984dff --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/SstFileMetaData.java @@ -0,0 +1,150 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The metadata that describes an SST file. + */ +public class SstFileMetaData { + private final String fileName; + private final String path; + private final long size; + private final long smallestSeqno; + private final long largestSeqno; + private final byte[] smallestKey; + private final byte[] largestKey; + private final long numReadsSampled; + private final boolean beingCompacted; + private final long numEntries; + private final long numDeletions; + + /** + * Called from JNI C++. + */ + protected SstFileMetaData( + final String fileName, + final String path, + final long size, + final long smallestSeqno, + final long largestSeqno, + final byte[] smallestKey, + final byte[] largestKey, + final long numReadsSampled, + final boolean beingCompacted, + final long numEntries, + final long numDeletions) { + this.fileName = fileName; + this.path = path; + this.size = size; + this.smallestSeqno = smallestSeqno; + this.largestSeqno = largestSeqno; + this.smallestKey = smallestKey; + this.largestKey = largestKey; + this.numReadsSampled = numReadsSampled; + this.beingCompacted = beingCompacted; + this.numEntries = numEntries; + this.numDeletions = numDeletions; + } + + /** + * Get the name of the file. + * + * @return the name of the file. + */ + public String fileName() { + return fileName; + } + + /** + * Get the full path where the file is located. + * + * @return the full path + */ + public String path() { + return path; + } + + /** + * Get the file size in bytes. + * + * @return file size + */ + public long size() { + return size; + } + + /** + * Get the smallest sequence number in the file. + * + * @return the smallest sequence number + */ + public long smallestSeqno() { + return smallestSeqno; + } + + /** + * Get the largest sequence number in the file. + * + * @return the largest sequence number + */ + public long largestSeqno() { + return largestSeqno; + } + + /** + * Get the smallest user-defined key in the file. + * + * @return the smallest user-defined key + */ + public byte[] smallestKey() { + return smallestKey; + } + + /** + * Get the largest user-defined key in the file. + * + * @return the largest user-defined key + */ + public byte[] largestKey() { + return largestKey; + } + + /** + * Get the number of times the file has been read. + * + * @return the number of times the file has been read + */ + public long numReadsSampled() { + return numReadsSampled; + } + + /** + * Returns true if the file is currently being compacted. + * + * @return true if the file is currently being compacted, false otherwise. + */ + public boolean beingCompacted() { + return beingCompacted; + } + + /** + * Get the number of entries. + * + * @return the number of entries. + */ + public long numEntries() { + return numEntries; + } + + /** + * Get the number of deletions. + * + * @return the number of deletions. + */ + public long numDeletions() { + return numDeletions; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StateType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StateType.java new file mode 100644 index 000000000..803456bb2 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StateType.java @@ -0,0 +1,53 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The type used to refer to a thread state.
+ * + * A state describes lower-level action of a thread + * such as reading / writing a file or waiting for a mutex. + */ +public enum StateType { + STATE_UNKNOWN((byte)0x0), + STATE_MUTEX_WAIT((byte)0x1); + + private final byte value; + + StateType(final byte value) { + this.value = value; + } + + /** + * Get the internal representation value. + * + * @return the internal representation value. + */ + byte getValue() { + return value; + } + + /** + * Get the State type from the internal representation value. + * + * @param value the internal representation value. + * + * @return the state type + * + * @throws IllegalArgumentException if the value does not match + * a StateType + */ + static StateType fromValue(final byte value) + throws IllegalArgumentException { + for (final StateType threadType : StateType.values()) { + if (threadType.value == value) { + return threadType; + } + } + throw new IllegalArgumentException( + "Unknown value for StateType: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatisticsCollector.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatisticsCollector.java index 48cf8af88..fb3f57150 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatisticsCollector.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatisticsCollector.java @@ -93,9 +93,9 @@ public class StatisticsCollector { statsCallback.histogramCallback(histogramType, histogramData); } } - - Thread.sleep(_statsCollectionInterval); } + + Thread.sleep(_statsCollectionInterval); } catch (final InterruptedException e) { Thread.currentThread().interrupt(); diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatsLevel.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatsLevel.java index cc2a87c6a..58504b84a 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatsLevel.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/StatsLevel.java @@ -60,6 +60,6 @@ public enum StatsLevel { } } throw new IllegalArgumentException( - "Illegal value provided for InfoLogLevel."); + "Illegal value provided for StatsLevel."); } } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TableFilter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TableFilter.java new file mode 100644 index 000000000..45605063b --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TableFilter.java @@ -0,0 +1,20 @@ +package org.rocksdb; + +/** + * Filter for iterating a table. + */ +public interface TableFilter { + + /** + * A callback to determine whether relevant keys for this scan exist in a + * given table based on the table's properties. The callback is passed the + * properties of each table during iteration. If the callback returns false, + * the table will not be scanned. This option only affects Iterators and has + * no impact on point lookups. + * + * @param tableProperties the table properties. + * + * @return true if the table should be scanned, false otherwise. + */ + boolean filter(final TableProperties tableProperties); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TableProperties.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TableProperties.java new file mode 100644 index 000000000..5fe98da67 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TableProperties.java @@ -0,0 +1,365 @@ +package org.rocksdb; + +import java.util.Map; + +/** + * TableProperties contains read-only properties of its associated + * table. 
+ */ +public class TableProperties { + private final long dataSize; + private final long indexSize; + private final long indexPartitions; + private final long topLevelIndexSize; + private final long indexKeyIsUserKey; + private final long indexValueIsDeltaEncoded; + private final long filterSize; + private final long rawKeySize; + private final long rawValueSize; + private final long numDataBlocks; + private final long numEntries; + private final long numDeletions; + private final long numMergeOperands; + private final long numRangeDeletions; + private final long formatVersion; + private final long fixedKeyLen; + private final long columnFamilyId; + private final long creationTime; + private final long oldestKeyTime; + private final byte[] columnFamilyName; + private final String filterPolicyName; + private final String comparatorName; + private final String mergeOperatorName; + private final String prefixExtractorName; + private final String propertyCollectorsNames; + private final String compressionName; + private final Map userCollectedProperties; + private final Map readableProperties; + private final Map propertiesOffsets; + + /** + * Access is private as this will only be constructed from + * C++ via JNI. + */ + private TableProperties(final long dataSize, final long indexSize, + final long indexPartitions, final long topLevelIndexSize, + final long indexKeyIsUserKey, final long indexValueIsDeltaEncoded, + final long filterSize, final long rawKeySize, final long rawValueSize, + final long numDataBlocks, final long numEntries, final long numDeletions, + final long numMergeOperands, final long numRangeDeletions, + final long formatVersion, final long fixedKeyLen, + final long columnFamilyId, final long creationTime, + final long oldestKeyTime, final byte[] columnFamilyName, + final String filterPolicyName, final String comparatorName, + final String mergeOperatorName, final String prefixExtractorName, + final String propertyCollectorsNames, final String compressionName, + final Map userCollectedProperties, + final Map readableProperties, + final Map propertiesOffsets) { + this.dataSize = dataSize; + this.indexSize = indexSize; + this.indexPartitions = indexPartitions; + this.topLevelIndexSize = topLevelIndexSize; + this.indexKeyIsUserKey = indexKeyIsUserKey; + this.indexValueIsDeltaEncoded = indexValueIsDeltaEncoded; + this.filterSize = filterSize; + this.rawKeySize = rawKeySize; + this.rawValueSize = rawValueSize; + this.numDataBlocks = numDataBlocks; + this.numEntries = numEntries; + this.numDeletions = numDeletions; + this.numMergeOperands = numMergeOperands; + this.numRangeDeletions = numRangeDeletions; + this.formatVersion = formatVersion; + this.fixedKeyLen = fixedKeyLen; + this.columnFamilyId = columnFamilyId; + this.creationTime = creationTime; + this.oldestKeyTime = oldestKeyTime; + this.columnFamilyName = columnFamilyName; + this.filterPolicyName = filterPolicyName; + this.comparatorName = comparatorName; + this.mergeOperatorName = mergeOperatorName; + this.prefixExtractorName = prefixExtractorName; + this.propertyCollectorsNames = propertyCollectorsNames; + this.compressionName = compressionName; + this.userCollectedProperties = userCollectedProperties; + this.readableProperties = readableProperties; + this.propertiesOffsets = propertiesOffsets; + } + + /** + * Get the total size of all data blocks. + * + * @return the total size of all data blocks. + */ + public long getDataSize() { + return dataSize; + } + + /** + * Get the size of index block. 
+ * + * @return the size of index block. + */ + public long getIndexSize() { + return indexSize; + } + + /** + * Get the total number of index partitions + * if {@link IndexType#kTwoLevelIndexSearch} is used. + * + * @return the total number of index partitions. + */ + public long getIndexPartitions() { + return indexPartitions; + } + + /** + * Size of the top-level index + * if {@link IndexType#kTwoLevelIndexSearch} is used. + * + * @return the size of the top-level index. + */ + public long getTopLevelIndexSize() { + return topLevelIndexSize; + } + + /** + * Whether the index key is user key. + * Otherwise it includes 8 byte of sequence + * number added by internal key format. + * + * @return the index key + */ + public long getIndexKeyIsUserKey() { + return indexKeyIsUserKey; + } + + /** + * Whether delta encoding is used to encode the index values. + * + * @return whether delta encoding is used to encode the index values. + */ + public long getIndexValueIsDeltaEncoded() { + return indexValueIsDeltaEncoded; + } + + /** + * Get the size of filter block. + * + * @return the size of filter block. + */ + public long getFilterSize() { + return filterSize; + } + + /** + * Get the total raw key size. + * + * @return the total raw key size. + */ + public long getRawKeySize() { + return rawKeySize; + } + + /** + * Get the total raw value size. + * + * @return the total raw value size. + */ + public long getRawValueSize() { + return rawValueSize; + } + + /** + * Get the number of blocks in this table. + * + * @return the number of blocks in this table. + */ + public long getNumDataBlocks() { + return numDataBlocks; + } + + /** + * Get the number of entries in this table. + * + * @return the number of entries in this table. + */ + public long getNumEntries() { + return numEntries; + } + + /** + * Get the number of deletions in the table. + * + * @return the number of deletions in the table. + */ + public long getNumDeletions() { + return numDeletions; + } + + /** + * Get the number of merge operands in the table. + * + * @return the number of merge operands in the table. + */ + public long getNumMergeOperands() { + return numMergeOperands; + } + + /** + * Get the number of range deletions in this table. + * + * @return the number of range deletions in this table. + */ + public long getNumRangeDeletions() { + return numRangeDeletions; + } + + /** + * Get the format version, reserved for backward compatibility. + * + * @return the format version. + */ + public long getFormatVersion() { + return formatVersion; + } + + /** + * Get the length of the keys. + * + * @return 0 when the key is variable length, otherwise number of + * bytes for each key. + */ + public long getFixedKeyLen() { + return fixedKeyLen; + } + + /** + * Get the ID of column family for this SST file, + * corresponding to the column family identified by + * {@link #getColumnFamilyName()}. + * + * @return the id of the column family. + */ + public long getColumnFamilyId() { + return columnFamilyId; + } + + /** + * The time when the SST file was created. + * Since SST files are immutable, this is equivalent + * to last modified time. + * + * @return the created time. + */ + public long getCreationTime() { + return creationTime; + } + + /** + * Get the timestamp of the earliest key. + * + * @return 0 means unknown, otherwise the timestamp. + */ + public long getOldestKeyTime() { + return oldestKeyTime; + } + + /** + * Get the name of the column family with which this + * SST file is associated. 
+ * + * @return the name of the column family, or null if the + * column family is unknown. + */ + /*@Nullable*/ public byte[] getColumnFamilyName() { + return columnFamilyName; + } + + /** + * Get the name of the filter policy used in this table. + * + * @return the name of the filter policy, or null if + * no filter policy is used. + */ + /*@Nullable*/ public String getFilterPolicyName() { + return filterPolicyName; + } + + /** + * Get the name of the comparator used in this table. + * + * @return the name of the comparator. + */ + public String getComparatorName() { + return comparatorName; + } + + /** + * Get the name of the merge operator used in this table. + * + * @return the name of the merge operator, or null if no merge operator + * is used. + */ + /*@Nullable*/ public String getMergeOperatorName() { + return mergeOperatorName; + } + + /** + * Get the name of the prefix extractor used in this table. + * + * @return the name of the prefix extractor, or null if no prefix + * extractor is used. + */ + /*@Nullable*/ public String getPrefixExtractorName() { + return prefixExtractorName; + } + + /** + * Get the names of the property collectors factories used in this table. + * + * @return the names of the property collector factories separated + * by commas, e.g. {collector_name[1]},{collector_name[2]},... + */ + public String getPropertyCollectorsNames() { + return propertyCollectorsNames; + } + + /** + * Get the name of the compression algorithm used to compress the SST files. + * + * @return the name of the compression algorithm. + */ + public String getCompressionName() { + return compressionName; + } + + /** + * Get the user collected properties. + * + * @return the user collected properties. + */ + public Map getUserCollectedProperties() { + return userCollectedProperties; + } + + /** + * Get the readable properties. + * + * @return the readable properties. + */ + public Map getReadableProperties() { + return readableProperties; + } + + /** + * The offset of the value of each property in the file. + * + * @return the offset of each property. + */ + public Map getPropertiesOffsets() { + return propertiesOffsets; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ThreadStatus.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ThreadStatus.java new file mode 100644 index 000000000..062df5889 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ThreadStatus.java @@ -0,0 +1,224 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
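
The TableProperties getters above are plain accessors over values marshalled from C++ through JNI; instances are only ever constructed natively. A minimal sketch of dumping the properties of every SST file, assuming this build also exposes the RocksDB#getPropertiesOfAllTables() binding and with an illustrative DB path:

import java.util.Map;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TableProperties;

public class DumpTableProperties {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/props-demo")) {
      // One entry per SST file of the default column family, keyed by path.
      // getPropertiesOfAllTables() is assumed available in this version.
      final Map<String, TableProperties> props = db.getPropertiesOfAllTables();
      for (final Map.Entry<String, TableProperties> e : props.entrySet()) {
        final TableProperties p = e.getValue();
        System.out.printf("%s: entries=%d data=%d index=%d filter=%d%n",
            e.getKey(), p.getNumEntries(), p.getDataSize(),
            p.getIndexSize(), p.getFilterSize());
      }
    }
  }
}
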
+ +package org.rocksdb; + +import java.util.Map; + +public class ThreadStatus { + private final long threadId; + private final ThreadType threadType; + private final String dbName; + private final String cfName; + private final OperationType operationType; + private final long operationElapsedTime; // microseconds + private final OperationStage operationStage; + private final long operationProperties[]; + private final StateType stateType; + + /** + * Invoked from C++ via JNI + */ + private ThreadStatus(final long threadId, + final byte threadTypeValue, + final String dbName, + final String cfName, + final byte operationTypeValue, + final long operationElapsedTime, + final byte operationStageValue, + final long[] operationProperties, + final byte stateTypeValue) { + this.threadId = threadId; + this.threadType = ThreadType.fromValue(threadTypeValue); + this.dbName = dbName; + this.cfName = cfName; + this.operationType = OperationType.fromValue(operationTypeValue); + this.operationElapsedTime = operationElapsedTime; + this.operationStage = OperationStage.fromValue(operationStageValue); + this.operationProperties = operationProperties; + this.stateType = StateType.fromValue(stateTypeValue); + } + + /** + * Get the unique ID of the thread. + * + * @return the thread id + */ + public long getThreadId() { + return threadId; + } + + /** + * Get the type of the thread. + * + * @return the type of the thread. + */ + public ThreadType getThreadType() { + return threadType; + } + + /** + * The name of the DB instance that the thread is currently + * involved with. + * + * @return the name of the db, or null if the thread is not involved + * in any DB operation. + */ + /* @Nullable */ public String getDbName() { + return dbName; + } + + /** + * The name of the Column Family that the thread is currently + * involved with. + * + * @return the name of the db, or null if the thread is not involved + * in any column Family operation. + */ + /* @Nullable */ public String getCfName() { + return cfName; + } + + /** + * Get the operation (high-level action) that the current thread is involved + * with. + * + * @return the operation + */ + public OperationType getOperationType() { + return operationType; + } + + /** + * Get the elapsed time of the current thread operation in microseconds. + * + * @return the elapsed time + */ + public long getOperationElapsedTime() { + return operationElapsedTime; + } + + /** + * Get the current stage where the thread is involved in the current + * operation. + * + * @return the current stage of the current operation + */ + public OperationStage getOperationStage() { + return operationStage; + } + + /** + * Get the list of properties that describe some details about the current + * operation. + * + * Each field in might have different meanings for different operations. + * + * @return the properties + */ + public long[] getOperationProperties() { + return operationProperties; + } + + /** + * Get the state (lower-level action) that the current thread is involved + * with. + * + * @return the state + */ + public StateType getStateType() { + return stateType; + } + + /** + * Get the name of the thread type. + * + * @param threadType the thread type + * + * @return the name of the thread type. + */ + public static String getThreadTypeName(final ThreadType threadType) { + return getThreadTypeName(threadType.getValue()); + } + + /** + * Get the name of an operation given its type. + * + * @param operationType the type of operation. + * + * @return the name of the operation. 
+ */ + public static String getOperationName(final OperationType operationType) { + return getOperationName(operationType.getValue()); + } + + public static String microsToString(final long operationElapsedTime) { + return microsToStringNative(operationElapsedTime); + } + + /** + * Obtain a human-readable string describing the specified operation stage. + * + * @param operationStage the stage of the operation. + * + * @return the description of the operation stage. + */ + public static String getOperationStageName( + final OperationStage operationStage) { + return getOperationStageName(operationStage.getValue()); + } + + /** + * Obtain the name of the "i"th operation property of the + * specified operation. + * + * @param operationType the operation type. + * @param i the index of the operation property. + * + * @return the name of the operation property + */ + public static String getOperationPropertyName( + final OperationType operationType, final int i) { + return getOperationPropertyName(operationType.getValue(), i); + } + + /** + * Translate the "i"th property of the specified operation given + * a property value. + * + * @param operationType the operation type. + * @param operationProperties the operation properties. + * + * @return the property values. + */ + public static Map interpretOperationProperties( + final OperationType operationType, final long[] operationProperties) { + return interpretOperationProperties(operationType.getValue(), + operationProperties); + } + + /** + * Obtain the name of a state given its type. + * + * @param stateType the state type. + * + * @return the name of the state. + */ + public static String getStateName(final StateType stateType) { + return getStateName(stateType.getValue()); + } + + private static native String getThreadTypeName(final byte threadTypeValue); + private static native String getOperationName(final byte operationTypeValue); + private static native String microsToStringNative( + final long operationElapsedTime); + private static native String getOperationStageName( + final byte operationStageTypeValue); + private static native String getOperationPropertyName( + final byte operationTypeValue, final int i); + private static native MapinterpretOperationProperties( + final byte operationTypeValue, final long[] operationProperties); + private static native String getStateName(final byte stateTypeValue); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ThreadType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ThreadType.java new file mode 100644 index 000000000..cc329f442 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/ThreadType.java @@ -0,0 +1,65 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * The type of a thread. + */ +public enum ThreadType { + /** + * RocksDB BG thread in high-pri thread pool. + */ + HIGH_PRIORITY((byte)0x0), + + /** + * RocksDB BG thread in low-pri thread pool. + */ + LOW_PRIORITY((byte)0x1), + + /** + * User thread (Non-RocksDB BG thread). + */ + USER((byte)0x2), + + /** + * RocksDB BG thread in bottom-pri thread pool + */ + BOTTOM_PRIORITY((byte)0x3); + + private final byte value; + + ThreadType(final byte value) { + this.value = value; + } + + /** + * Get the internal representation value. 
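
Stepping back from the enum plumbing for a moment: ThreadStatus objects are produced natively, not constructed by users, and are typically obtained by polling the environment. A sketch, assuming the companion Env#getThreadList() binding shipped with the same update (treat it as illustrative if your build lacks it):

import java.util.List;
import org.rocksdb.Env;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.ThreadStatus;

public class ThreadStatusDump {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // Snapshot the state of all RocksDB-managed threads (assumed binding).
    final List<ThreadStatus> threads = Env.getDefault().getThreadList();
    for (final ThreadStatus ts : threads) {
      System.out.printf("thread=%d type=%s op=%s elapsed=%s%n",
          ts.getThreadId(), ts.getThreadType(),
          ThreadStatus.getOperationName(ts.getOperationType()),
          ThreadStatus.microsToString(ts.getOperationElapsedTime()));
    }
  }
}
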
+ * + * @return the internal representation value. + */ + byte getValue() { + return value; + } + + /** + * Get the Thread type from the internal representation value. + * + * @param value the internal representation value. + * + * @return the thread type + * + * @throws IllegalArgumentException if the value does not match a ThreadType + */ + static ThreadType fromValue(final byte value) + throws IllegalArgumentException { + for (final ThreadType threadType : ThreadType.values()) { + if (threadType.value == value) { + return threadType; + } + } + throw new IllegalArgumentException("Unknown value for ThreadType: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TickerType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TickerType.java index fdcf62ff8..551e366dc 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TickerType.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TickerType.java @@ -5,6 +5,16 @@ package org.rocksdb; +/** + * The logical mapping of tickers defined in rocksdb::Tickers. + * + * Java byte value mappings don't align 1:1 to the c++ values. c++ rocksdb::Tickers enumeration type + * is uint32_t and java org.rocksdb.TickerType is byte, this causes mapping issues when + * rocksdb::Tickers value is greater then 127 (0x7F) for jbyte jni interface as range greater is not + * available. Without breaking interface in minor versions, value mappings for + * org.rocksdb.TickerType leverage full byte range [-128 (-0x80), (0x7F)]. Newer tickers added + * should descend into negative values until TICKER_ENUM_MAX reaches -128 (-0x80). + */ public enum TickerType { /** @@ -304,7 +314,8 @@ public enum TickerType { RATE_LIMIT_DELAY_MILLIS((byte) 0x37), /** - * Number of iterators currently open. + * Number of iterators created. + * */ NO_ITERATORS((byte) 0x38), @@ -475,8 +486,238 @@ public enum TickerType { */ NUMBER_MULTIGET_KEYS_FOUND((byte) 0x5E), - TICKER_ENUM_MAX((byte) 0x5F); + // -0x01 to fixate the new value that incorrectly changed TICKER_ENUM_MAX + /** + * Number of iterators created. + */ + NO_ITERATOR_CREATED((byte) -0x01), + + /** + * Number of iterators deleted. + */ + NO_ITERATOR_DELETED((byte) 0x60), + + /** + * Deletions obsoleted before bottom level due to file gap optimization. + */ + COMPACTION_OPTIMIZED_DEL_DROP_OBSOLETE((byte) 0x61), + + /** + * If a compaction was cancelled in sfm to prevent ENOSPC + */ + COMPACTION_CANCELLED((byte) 0x62), + + /** + * # of times bloom FullFilter has not avoided the reads. + */ + BLOOM_FILTER_FULL_POSITIVE((byte) 0x63), + + /** + * # of times bloom FullFilter has not avoided the reads and data actually + * exist. + */ + BLOOM_FILTER_FULL_TRUE_POSITIVE((byte) 0x64), + + /** + * BlobDB specific stats + * # of Put/PutTTL/PutUntil to BlobDB. + */ + BLOB_DB_NUM_PUT((byte) 0x65), + + /** + * # of Write to BlobDB. + */ + BLOB_DB_NUM_WRITE((byte) 0x66), + + /** + * # of Get to BlobDB. + */ + BLOB_DB_NUM_GET((byte) 0x67), + + /** + * # of MultiGet to BlobDB. + */ + BLOB_DB_NUM_MULTIGET((byte) 0x68), + + /** + * # of Seek/SeekToFirst/SeekToLast/SeekForPrev to BlobDB iterator. + */ + BLOB_DB_NUM_SEEK((byte) 0x69), + + /** + * # of Next to BlobDB iterator. + */ + BLOB_DB_NUM_NEXT((byte) 0x6A), + + /** + * # of Prev to BlobDB iterator. + */ + BLOB_DB_NUM_PREV((byte) 0x6B), + + /** + * # of keys written to BlobDB. + */ + BLOB_DB_NUM_KEYS_WRITTEN((byte) 0x6C), + + /** + * # of keys read from BlobDB. 
+ */ + BLOB_DB_NUM_KEYS_READ((byte) 0x6D), + + /** + * # of bytes (key + value) written to BlobDB. + */ + BLOB_DB_BYTES_WRITTEN((byte) 0x6E), + + /** + * # of bytes (keys + value) read from BlobDB. + */ + BLOB_DB_BYTES_READ((byte) 0x6F), + + /** + * # of keys written by BlobDB as non-TTL inlined value. + */ + BLOB_DB_WRITE_INLINED((byte) 0x70), + + /** + * # of keys written by BlobDB as TTL inlined value. + */ + BLOB_DB_WRITE_INLINED_TTL((byte) 0x71), + + /** + * # of keys written by BlobDB as non-TTL blob value. + */ + BLOB_DB_WRITE_BLOB((byte) 0x72), + + /** + * # of keys written by BlobDB as TTL blob value. + */ + BLOB_DB_WRITE_BLOB_TTL((byte) 0x73), + + /** + * # of bytes written to blob file. + */ + BLOB_DB_BLOB_FILE_BYTES_WRITTEN((byte) 0x74), + + /** + * # of bytes read from blob file. + */ + BLOB_DB_BLOB_FILE_BYTES_READ((byte) 0x75), + + /** + * # of times a blob files being synced. + */ + BLOB_DB_BLOB_FILE_SYNCED((byte) 0x76), + + /** + * # of blob index evicted from base DB by BlobDB compaction filter because + * of expiration. + */ + BLOB_DB_BLOB_INDEX_EXPIRED_COUNT((byte) 0x77), + + /** + * Size of blob index evicted from base DB by BlobDB compaction filter + * because of expiration. + */ + BLOB_DB_BLOB_INDEX_EXPIRED_SIZE((byte) 0x78), + /** + * # of blob index evicted from base DB by BlobDB compaction filter because + * of corresponding file deleted. + */ + BLOB_DB_BLOB_INDEX_EVICTED_COUNT((byte) 0x79), + + /** + * Size of blob index evicted from base DB by BlobDB compaction filter + * because of corresponding file deleted. + */ + BLOB_DB_BLOB_INDEX_EVICTED_SIZE((byte) 0x7A), + + /** + * # of blob files being garbage collected. + */ + BLOB_DB_GC_NUM_FILES((byte) 0x7B), + + /** + * # of blob files generated by garbage collection. + */ + BLOB_DB_GC_NUM_NEW_FILES((byte) 0x7C), + + /** + * # of BlobDB garbage collection failures. + */ + BLOB_DB_GC_FAILURES((byte) 0x7D), + + /** + * # of keys drop by BlobDB garbage collection because they had been + * overwritten. + */ + BLOB_DB_GC_NUM_KEYS_OVERWRITTEN((byte) 0x7E), + + /** + * # of keys drop by BlobDB garbage collection because of expiration. + */ + BLOB_DB_GC_NUM_KEYS_EXPIRED((byte) 0x7F), + + /** + * # of keys relocated to new blob file by garbage collection. + */ + BLOB_DB_GC_NUM_KEYS_RELOCATED((byte) -0x02), + + /** + * # of bytes drop by BlobDB garbage collection because they had been + * overwritten. + */ + BLOB_DB_GC_BYTES_OVERWRITTEN((byte) -0x03), + + /** + * # of bytes drop by BlobDB garbage collection because of expiration. + */ + BLOB_DB_GC_BYTES_EXPIRED((byte) -0x04), + + /** + * # of bytes relocated to new blob file by garbage collection. + */ + BLOB_DB_GC_BYTES_RELOCATED((byte) -0x05), + + /** + * # of blob files evicted because of BlobDB is full. + */ + BLOB_DB_FIFO_NUM_FILES_EVICTED((byte) -0x06), + + /** + * # of keys in the blob files evicted because of BlobDB is full. + */ + BLOB_DB_FIFO_NUM_KEYS_EVICTED((byte) -0x07), + + /** + * # of bytes in the blob files evicted because of BlobDB is full. + */ + BLOB_DB_FIFO_BYTES_EVICTED((byte) -0x08), + + /** + * These counters indicate a performance issue in WritePrepared transactions. + * We should not seem them ticking them much. + * # of times prepare_mutex_ is acquired in the fast path. + */ + TXN_PREPARE_MUTEX_OVERHEAD((byte) -0x09), + + /** + * # of times old_commit_map_mutex_ is acquired in the fast path. + */ + TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD((byte) -0x0A), + + /** + * # of times we checked a batch for duplicate keys. 
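
As a usage sketch for these ticker constants: counts are read through a Statistics object attached to the options. NUMBER_KEYS_WRITTEN stands in for any of the tickers above, and the DB path is illustrative:

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Statistics;
import org.rocksdb.TickerType;

public class TickerDemo {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Statistics stats = new Statistics();
         final Options options =
             new Options().setCreateIfMissing(true).setStatistics(stats);
         final RocksDB db = RocksDB.open(options, "/tmp/ticker-demo")) {
      db.put("k".getBytes(), "v".getBytes());
      // Cumulative count for the ticker since the statistics were created.
      System.out.println("keys written: "
          + stats.getTickerCount(TickerType.NUMBER_KEYS_WRITTEN));
    }
  }
}
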
+ */ + TXN_DUPLICATE_KEY_OVERHEAD((byte) -0x0B), + + /** + * # of times snapshot_mutex_ is acquired in the fast path. + */ + TXN_SNAPSHOT_MUTEX_OVERHEAD((byte) -0x0C), + + TICKER_ENUM_MAX((byte) 0x5F); private final byte value; @@ -484,6 +725,13 @@ public enum TickerType { this.value = value; } + /** + * @deprecated Exposes internal value of native enum mappings. + * This method will be marked package private in the next major release. + * + * @return the internal representation + */ + @Deprecated public byte getValue() { return value; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TimedEnv.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TimedEnv.java new file mode 100644 index 000000000..dc8b5d6ef --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TimedEnv.java @@ -0,0 +1,30 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Timed environment. + */ +public class TimedEnv extends Env { + + /** + *
<p>Creates a new environment that measures function call times for + * filesystem operations, reporting results to variables in PerfContext.</p> + * + * + * <p>The caller must delete the result when it is + * no longer needed.</p>
    + * + * @param baseEnv the base environment, + * must remain live while the result is in use. + */ + public TimedEnv(final Env baseEnv) { + super(createTimedEnv(baseEnv.nativeHandle_)); + } + + private static native long createTimedEnv(final long baseEnvHandle); + @Override protected final native void disposeInternal(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TraceOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TraceOptions.java new file mode 100644 index 000000000..657b263c6 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TraceOptions.java @@ -0,0 +1,32 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * TraceOptions is used for + * {@link RocksDB#startTrace(TraceOptions, AbstractTraceWriter)}. + */ +public class TraceOptions { + private final long maxTraceFileSize; + + public TraceOptions() { + this.maxTraceFileSize = 64 * 1024 * 1024 * 1024; // 64 GB + } + + public TraceOptions(final long maxTraceFileSize) { + this.maxTraceFileSize = maxTraceFileSize; + } + + /** + * To avoid the trace file size grows large than the storage space, + * user can set the max trace file size in Bytes. Default is 64GB + * + * @return the max trace size + */ + public long getMaxTraceFileSize() { + return maxTraceFileSize; + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TraceWriter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TraceWriter.java new file mode 100644 index 000000000..cb0234e9b --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TraceWriter.java @@ -0,0 +1,36 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * TraceWriter allows exporting RocksDB traces to any system, + * one operation at a time. + */ +public interface TraceWriter { + + /** + * Write the data. + * + * @param data the data + * + * @throws RocksDBException if an error occurs whilst writing. + */ + void write(final Slice data) throws RocksDBException; + + /** + * Close the writer. + * + * @throws RocksDBException if an error occurs whilst closing the writer. + */ + void closeWriter() throws RocksDBException; + + /** + * Get the size of the file that this writer is writing to. + * + * @return the file size + */ + long getFileSize(); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Transaction.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Transaction.java index c619bb105..96f1143d4 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Transaction.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/Transaction.java @@ -433,6 +433,33 @@ public class Transaction extends RocksObject { * @param key the key to retrieve the value for. * @param exclusive true if the transaction should have exclusive access to * the key, otherwise false for shared access. + * @param do_validate true if it should validate the snapshot before doing the read + * + * @return a byte array storing the value associated with the input key if + * any. null if it does not find the specified key. 
+ * + * @throws RocksDBException thrown if error happens in underlying + * native library. + */ + public byte[] getForUpdate(final ReadOptions readOptions, + final ColumnFamilyHandle columnFamilyHandle, final byte[] key, final boolean exclusive, + final boolean do_validate) throws RocksDBException { + assert (isOwningHandle()); + return getForUpdate(nativeHandle_, readOptions.nativeHandle_, key, key.length, + columnFamilyHandle.nativeHandle_, exclusive, do_validate); + } + + /** + * Same as + * {@link #getForUpdate(ReadOptions, ColumnFamilyHandle, byte[], boolean, boolean)} + * with do_validate=true. + * + * @param readOptions Read options. + * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle} + * instance + * @param key the key to retrieve the value for. + * @param exclusive true if the transaction should have exclusive access to + * the key, otherwise false for shared access. * * @return a byte array storing the value associated with the input key if * any. null if it does not find the specified key. @@ -444,8 +471,8 @@ public class Transaction extends RocksObject { final ColumnFamilyHandle columnFamilyHandle, final byte[] key, final boolean exclusive) throws RocksDBException { assert(isOwningHandle()); - return getForUpdate(nativeHandle_, readOptions.nativeHandle_, key, - key.length, columnFamilyHandle.nativeHandle_, exclusive); + return getForUpdate(nativeHandle_, readOptions.nativeHandle_, key, key.length, + columnFamilyHandle.nativeHandle_, exclusive, true /*do_validate*/); } /** @@ -495,8 +522,8 @@ public class Transaction extends RocksObject { public byte[] getForUpdate(final ReadOptions readOptions, final byte[] key, final boolean exclusive) throws RocksDBException { assert(isOwningHandle()); - return getForUpdate(nativeHandle_, readOptions.nativeHandle_, key, - key.length, exclusive); + return getForUpdate( + nativeHandle_, readOptions.nativeHandle_, key, key.length, exclusive, true /*do_validate*/); } /** @@ -635,11 +662,23 @@ public class Transaction extends RocksObject { * @throws RocksDBException when one of the TransactionalDB conditions * described above occurs, or in the case of an unexpected error */ + public void put(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, final byte[] value, + final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + put(nativeHandle_, key, key.length, value, value.length, columnFamilyHandle.nativeHandle_, + assume_tracked); + } + + /* + * Same as + * {@link #put(ColumnFamilyHandle, byte[], byte[], boolean)} + * with assume_tracked=false. 
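
To make the new do_validate and assume_tracked parameters concrete, here is a sketch of the read-modify-write pattern getForUpdate() serves, using only overloads shown in this diff plus the long-standing open/beginTransaction/commit calls; the path and key are illustrative:

import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Transaction;
import org.rocksdb.TransactionDB;
import org.rocksdb.TransactionDBOptions;
import org.rocksdb.WriteOptions;

public class GetForUpdateDemo {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final TransactionDBOptions txnOpts = new TransactionDBOptions();
         final TransactionDB db =
             TransactionDB.open(options, txnOpts, "/tmp/txn-demo");
         final WriteOptions writeOptions = new WriteOptions();
         final ReadOptions readOptions = new ReadOptions();
         final Transaction txn = db.beginTransaction(writeOptions)) {
      // Lock "counter" exclusively; this overload validates the read
      // against the snapshot (do_validate=true).
      final byte[] value =
          txn.getForUpdate(readOptions, "counter".getBytes(), true /* exclusive */);
      final long next = value == null ? 1L : Long.parseLong(new String(value)) + 1L;
      txn.put("counter".getBytes(), Long.toString(next).getBytes());
      txn.commit();
    }
  }
}
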
+ */ public void put(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, final byte[] value) throws RocksDBException { assert(isOwningHandle()); - put(nativeHandle_, key, key.length, value, value.length, - columnFamilyHandle.nativeHandle_); + put(nativeHandle_, key, key.length, value, value.length, columnFamilyHandle.nativeHandle_, + /*assume_tracked*/ false); } /** @@ -683,12 +722,24 @@ public class Transaction extends RocksObject { * @throws RocksDBException when one of the TransactionalDB conditions * described above occurs, or in the case of an unexpected error */ + public void put(final ColumnFamilyHandle columnFamilyHandle, final byte[][] keyParts, + final byte[][] valueParts, final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + put(nativeHandle_, keyParts, keyParts.length, valueParts, valueParts.length, + columnFamilyHandle.nativeHandle_, assume_tracked); + } + + /* + * Same as + * {@link #put(ColumnFamilyHandle, byte[][], byte[][], boolean)} + * with assume_tracked=false. + */ public void put(final ColumnFamilyHandle columnFamilyHandle, final byte[][] keyParts, final byte[][] valueParts) throws RocksDBException { assert(isOwningHandle()); put(nativeHandle_, keyParts, keyParts.length, valueParts, valueParts.length, - columnFamilyHandle.nativeHandle_); + columnFamilyHandle.nativeHandle_, /*assume_tracked*/ false); } //TODO(AR) refactor if we implement org.rocksdb.SliceParts in future @@ -733,11 +784,23 @@ public class Transaction extends RocksObject { * @throws RocksDBException when one of the TransactionalDB conditions * described above occurs, or in the case of an unexpected error */ + public void merge(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, + final byte[] value, final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + merge(nativeHandle_, key, key.length, value, value.length, columnFamilyHandle.nativeHandle_, + assume_tracked); + } + + /* + * Same as + * {@link #merge(ColumnFamilyHandle, byte[], byte[], boolean)} + * with assume_tracked=false. + */ public void merge(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, final byte[] value) throws RocksDBException { assert(isOwningHandle()); - merge(nativeHandle_, key, key.length, value, value.length, - columnFamilyHandle.nativeHandle_); + merge(nativeHandle_, key, key.length, value, value.length, columnFamilyHandle.nativeHandle_, + /*assume_tracked*/ false); } /** @@ -790,10 +853,22 @@ public class Transaction extends RocksObject { * @throws RocksDBException when one of the TransactionalDB conditions * described above occurs, or in the case of an unexpected error */ + public void delete(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, + final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + delete(nativeHandle_, key, key.length, columnFamilyHandle.nativeHandle_, assume_tracked); + } + + /* + * Same as + * {@link #delete(ColumnFamilyHandle, byte[], boolean)} + * with assume_tracked=false. 
+ */ public void delete(final ColumnFamilyHandle columnFamilyHandle, final byte[] key) throws RocksDBException { assert(isOwningHandle()); - delete(nativeHandle_, key, key.length, columnFamilyHandle.nativeHandle_); + delete(nativeHandle_, key, key.length, columnFamilyHandle.nativeHandle_, + /*assume_tracked*/ false); } /** @@ -834,11 +909,23 @@ public class Transaction extends RocksObject { * @throws RocksDBException when one of the TransactionalDB conditions * described above occurs, or in the case of an unexpected error */ + public void delete(final ColumnFamilyHandle columnFamilyHandle, final byte[][] keyParts, + final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + delete( + nativeHandle_, keyParts, keyParts.length, columnFamilyHandle.nativeHandle_, assume_tracked); + } + + /* + * Same as + * {@link #delete(ColumnFamilyHandle, byte[][], boolean)} + * with assume_tracked=false. + */ public void delete(final ColumnFamilyHandle columnFamilyHandle, final byte[][] keyParts) throws RocksDBException { assert(isOwningHandle()); - delete(nativeHandle_, keyParts, keyParts.length, - columnFamilyHandle.nativeHandle_); + delete(nativeHandle_, keyParts, keyParts.length, columnFamilyHandle.nativeHandle_, + /*assume_tracked*/ false); } //TODO(AR) refactor if we implement org.rocksdb.SliceParts in future @@ -880,11 +967,23 @@ public class Transaction extends RocksObject { * described above occurs, or in the case of an unexpected error */ @Experimental("Performance optimization for a very specific workload") - public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, - final byte[] key) throws RocksDBException { + public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, final byte[] key, + final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + singleDelete(nativeHandle_, key, key.length, columnFamilyHandle.nativeHandle_, assume_tracked); + } + + /* + * Same as + * {@link #singleDelete(ColumnFamilyHandle, byte[], boolean)} + * with assume_tracked=false. + */ + @Experimental("Performance optimization for a very specific workload") + public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, final byte[] key) + throws RocksDBException { assert(isOwningHandle()); - singleDelete(nativeHandle_, key, key.length, - columnFamilyHandle.nativeHandle_); + singleDelete(nativeHandle_, key, key.length, columnFamilyHandle.nativeHandle_, + /*assume_tracked*/ false); } /** @@ -927,11 +1026,24 @@ public class Transaction extends RocksObject { * described above occurs, or in the case of an unexpected error */ @Experimental("Performance optimization for a very specific workload") - public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, - final byte[][] keyParts) throws RocksDBException { + public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, final byte[][] keyParts, + final boolean assume_tracked) throws RocksDBException { + assert (isOwningHandle()); + singleDelete( + nativeHandle_, keyParts, keyParts.length, columnFamilyHandle.nativeHandle_, assume_tracked); + } + + /* + * Same as + * {@link #singleDelete(ColumnFamilyHandle, byte[][], boolean)} + * with assume_tracked=false. 
+ */ + @Experimental("Performance optimization for a very specific workload") + public void singleDelete(final ColumnFamilyHandle columnFamilyHandle, final byte[][] keyParts) + throws RocksDBException { assert(isOwningHandle()); - singleDelete(nativeHandle_, keyParts, keyParts.length, - columnFamilyHandle.nativeHandle_); + singleDelete(nativeHandle_, keyParts, keyParts.length, columnFamilyHandle.nativeHandle_, + /*assume_tracked*/ false); } //TODO(AR) refactor if we implement org.rocksdb.SliceParts in future @@ -1642,13 +1754,12 @@ public class Transaction extends RocksObject { private native byte[][] multiGet(final long handle, final long readOptionsHandle, final byte[][] keys) throws RocksDBException; - private native byte[] getForUpdate(final long handle, - final long readOptionsHandle, final byte key[], final int keyLength, - final long columnFamilyHandle, final boolean exclusive) + private native byte[] getForUpdate(final long handle, final long readOptionsHandle, + final byte key[], final int keyLength, final long columnFamilyHandle, final boolean exclusive, + final boolean do_validate) throws RocksDBException; + private native byte[] getForUpdate(final long handle, final long readOptionsHandle, + final byte key[], final int keyLen, final boolean exclusive, final boolean do_validate) throws RocksDBException; - private native byte[] getForUpdate(final long handle, - final long readOptionsHandle, final byte key[], final int keyLen, - final boolean exclusive) throws RocksDBException; private native byte[][] multiGetForUpdate(final long handle, final long readOptionsHandle, final byte[][] keys, final long[] columnFamilyHandles) throws RocksDBException; @@ -1659,42 +1770,38 @@ public class Transaction extends RocksObject { final long readOptionsHandle); private native long getIterator(final long handle, final long readOptionsHandle, final long columnFamilyHandle); - private native void put(final long handle, final byte[] key, - final int keyLength, final byte[] value, final int valueLength, - final long columnFamilyHandle) throws RocksDBException; + private native void put(final long handle, final byte[] key, final int keyLength, + final byte[] value, final int valueLength, final long columnFamilyHandle, + final boolean assume_tracked) throws RocksDBException; private native void put(final long handle, final byte[] key, final int keyLength, final byte[] value, final int valueLength) throws RocksDBException; - private native void put(final long handle, final byte[][] keys, - final int keysLength, final byte[][] values, final int valuesLength, - final long columnFamilyHandle) throws RocksDBException; + private native void put(final long handle, final byte[][] keys, final int keysLength, + final byte[][] values, final int valuesLength, final long columnFamilyHandle, + final boolean assume_tracked) throws RocksDBException; private native void put(final long handle, final byte[][] keys, final int keysLength, final byte[][] values, final int valuesLength) throws RocksDBException; - private native void merge(final long handle, final byte[] key, - final int keyLength, final byte[] value, final int valueLength, - final long columnFamilyHandle) throws RocksDBException; + private native void merge(final long handle, final byte[] key, final int keyLength, + final byte[] value, final int valueLength, final long columnFamilyHandle, + final boolean assume_tracked) throws RocksDBException; private native void merge(final long handle, final byte[] key, final int keyLength, final byte[] value, final int 
valueLength) throws RocksDBException; - private native void delete(final long handle, final byte[] key, - final int keyLength, final long columnFamilyHandle) - throws RocksDBException; + private native void delete(final long handle, final byte[] key, final int keyLength, + final long columnFamilyHandle, final boolean assume_tracked) throws RocksDBException; private native void delete(final long handle, final byte[] key, final int keyLength) throws RocksDBException; - private native void delete(final long handle, final byte[][] keys, - final int keysLength, final long columnFamilyHandle) - throws RocksDBException; + private native void delete(final long handle, final byte[][] keys, final int keysLength, + final long columnFamilyHandle, final boolean assume_tracked) throws RocksDBException; private native void delete(final long handle, final byte[][] keys, final int keysLength) throws RocksDBException; - private native void singleDelete(final long handle, final byte[] key, - final int keyLength, final long columnFamilyHandle) - throws RocksDBException; + private native void singleDelete(final long handle, final byte[] key, final int keyLength, + final long columnFamilyHandle, final boolean assume_tracked) throws RocksDBException; private native void singleDelete(final long handle, final byte[] key, final int keyLength) throws RocksDBException; - private native void singleDelete(final long handle, final byte[][] keys, - final int keysLength, final long columnFamilyHandle) - throws RocksDBException; + private native void singleDelete(final long handle, final byte[][] keys, final int keysLength, + final long columnFamilyHandle, final boolean assume_tracked) throws RocksDBException; private native void singleDelete(final long handle, final byte[][] keys, final int keysLength) throws RocksDBException; private native void putUntracked(final long handle, final byte[] key, diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TransactionDB.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TransactionDB.java index fcecf3faf..a1a09cf96 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TransactionDB.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TransactionDB.java @@ -104,6 +104,53 @@ public class TransactionDB extends RocksDB return tdb; } + /** + * This is similar to {@link #close()} except that it + * throws an exception if any error occurs. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. + * + * @throws RocksDBException if an error occurs whilst closing. + */ + public void closeE() throws RocksDBException { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } finally { + disposeInternal(); + } + } + } + + /** + * This is similar to {@link #closeE()} except that it + * silently ignores any errors. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. 
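
A short usage note on the close()/closeE() split introduced here: a shutdown path that wants close-time errors surfaced, rather than swallowed, can pair syncWal() with closeE(). A minimal sketch, with an illustrative path:

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TransactionDB;
import org.rocksdb.TransactionDBOptions;

public class CleanShutdown {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    final Options options = new Options().setCreateIfMissing(true);
    final TransactionDBOptions txnOpts = new TransactionDBOptions();
    final TransactionDB db = TransactionDB.open(options, txnOpts, "/tmp/close-demo");
    db.syncWal();  // close()/closeE() do not fsync the WAL themselves
    db.closeE();   // unlike close(), propagates any close-time error
    txnOpts.close();
    options.close();
  }
}
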
+ */ + @Override + public void close() { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } catch (final RocksDBException e) { + // silently ignore the error report + } finally { + disposeInternal(); + } + } + } + @Override public Transaction beginTransaction(final WriteOptions writeOptions) { return new Transaction(this, beginTransaction(nativeHandle_, @@ -327,12 +374,16 @@ public class TransactionDB extends RocksDB this.transactionDbOptions_ = transactionDbOptions; } + @Override protected final native void disposeInternal(final long handle); + private static native long open(final long optionsHandle, final long transactionDbOptionsHandle, final String path) throws RocksDBException; private static native long[] open(final long dbOptionsHandle, final long transactionDbOptionsHandle, final String path, final byte[][] columnFamilyNames, final long[] columnFamilyOptions); + private native static void closeDatabase(final long handle) + throws RocksDBException; private native long beginTransaction(final long handle, final long writeOptionsHandle); private native long beginTransaction(final long handle, @@ -350,5 +401,4 @@ public class TransactionDB extends RocksDB private native DeadlockPath[] getDeadlockInfoBuffer(final long handle); private native void setDeadlockInfoBufferSize(final long handle, final int targetSize); - @Override protected final native void disposeInternal(final long handle); } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TtlDB.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TtlDB.java index 740f51268..26eee4a87 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TtlDB.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/TtlDB.java @@ -139,6 +139,55 @@ public class TtlDB extends RocksDB { return ttlDB; } + /** + *
<p>Close the TtlDB instance and release resource.</p>
    + * + * This is similar to {@link #close()} except that it + * throws an exception if any error occurs. + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. + * + * @throws RocksDBException if an error occurs whilst closing. + */ + public void closeE() throws RocksDBException { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } finally { + disposeInternal(); + } + } + } + + /** + *
<p>Close the TtlDB instance and release resource.</p>
    + * + * + * This will not fsync the WAL files. + * If syncing is required, the caller must first call {@link #syncWal()} + * or {@link #write(WriteOptions, WriteBatch)} using an empty write batch + * with {@link WriteOptions#setSync(boolean)} set to true. + * + * See also {@link #close()}. + */ + @Override + public void close() { + if (owningHandle_.compareAndSet(true, false)) { + try { + closeDatabase(nativeHandle_); + } catch (final RocksDBException e) { + // silently ignore the error report + } finally { + disposeInternal(); + } + } + } + /** *
<p>
    Creates a new ttl based column family with a name defined * in given ColumnFamilyDescriptor and allocates a @@ -160,22 +209,8 @@ public class TtlDB extends RocksDB { final int ttl) throws RocksDBException { return new ColumnFamilyHandle(this, createColumnFamilyWithTtl(nativeHandle_, - columnFamilyDescriptor.columnFamilyName(), - columnFamilyDescriptor.columnFamilyOptions().nativeHandle_, ttl)); - } - - /** - *
<p>Close the TtlDB instance and release resource.</p> - * - * <p>Internally, TtlDB owns the {@code rocksdb::DB} pointer - * to its associated {@link org.rocksdb.RocksDB}. The release - * of that RocksDB pointer is handled in the destructor of the - * c++ {@code rocksdb::TtlDB} and should be transparent to - * Java developers.</p>
    - */ - @Override - public void close() { - super.close(); + columnFamilyDescriptor.getName(), + columnFamilyDescriptor.getOptions().nativeHandle_, ttl)); } /** @@ -193,10 +228,7 @@ public class TtlDB extends RocksDB { super(nativeHandle); } - @Override protected void finalize() throws Throwable { - close(); //TODO(AR) revisit here when implementing AutoCloseable - super.finalize(); - } + @Override protected native void disposeInternal(final long handle); private native static long open(final long optionsHandle, final String db_path, final int ttl, final boolean readOnly) @@ -208,4 +240,6 @@ public class TtlDB extends RocksDB { private native long createColumnFamilyWithTtl(final long handle, final byte[] columnFamilyName, final long columnFamilyOptions, int ttl) throws RocksDBException; + private native static void closeDatabase(final long handle) + throws RocksDBException; } diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/UInt64AddOperator.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/UInt64AddOperator.java new file mode 100644 index 000000000..cce9b298d --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/UInt64AddOperator.java @@ -0,0 +1,19 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Uint64AddOperator is a merge operator that accumlates a long + * integer value. + */ +public class UInt64AddOperator extends MergeOperator { + public UInt64AddOperator() { + super(newSharedUInt64AddOperator()); + } + + private native static long newSharedUInt64AddOperator(); + @Override protected final native void disposeInternal(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalFileType.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalFileType.java new file mode 100644 index 000000000..fed27ed11 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalFileType.java @@ -0,0 +1,55 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +public enum WalFileType { + /** + * Indicates that WAL file is in archive directory. WAL files are moved from + * the main db directory to archive directory once they are not live and stay + * there until cleaned up. Files are cleaned depending on archive size + * (Options::WAL_size_limit_MB) and time since last cleaning + * (Options::WAL_ttl_seconds). + */ + kArchivedLogFile((byte)0x0), + + /** + * Indicates that WAL file is live and resides in the main db directory + */ + kAliveLogFile((byte)0x1); + + private final byte value; + + WalFileType(final byte value) { + this.value = value; + } + + /** + * Get the internal representation value. + * + * @return the internal representation value + */ + byte getValue() { + return value; + } + + /** + * Get the WalFileType from the internal representation value. + * + * @return the wal file type. + * + * @throws IllegalArgumentException if the value is unknown. 
+ */ + static WalFileType fromValue(final byte value) { + for (final WalFileType walFileType : WalFileType.values()) { + if(walFileType.value == value) { + return walFileType; + } + } + + throw new IllegalArgumentException( + "Illegal value provided for WalFileType: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalFilter.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalFilter.java new file mode 100644 index 000000000..37e36213a --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalFilter.java @@ -0,0 +1,87 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import java.util.Map; + +/** + * WALFilter allows an application to inspect write-ahead-log (WAL) + * records or modify their processing on recovery. + */ +public interface WalFilter { + + /** + * Provide ColumnFamily->LogNumber map to filter + * so that filter can determine whether a log number applies to a given + * column family (i.e. that log hasn't been flushed to SST already for the + * column family). + * + * We also pass in name>id map as only name is known during + * recovery (as handles are opened post-recovery). + * while write batch callbacks happen in terms of column family id. + * + * @param cfLognumber column_family_id to lognumber map + * @param cfNameId column_family_name to column_family_id map + */ + void columnFamilyLogNumberMap(final Map cfLognumber, + final Map cfNameId); + + /** + * LogRecord is invoked for each log record encountered for all the logs + * during replay on logs on recovery. This method can be used to: + * * inspect the record (using the batch parameter) + * * ignoring current record + * (by returning WalProcessingOption::kIgnoreCurrentRecord) + * * reporting corrupted record + * (by returning WalProcessingOption::kCorruptedRecord) + * * stop log replay + * (by returning kStop replay) - please note that this implies + * discarding the logs from current record onwards. + * + * @param logNumber log number of the current log. + * Filter might use this to determine if the log + * record is applicable to a certain column family. + * @param logFileName log file name - only for informational purposes + * @param batch batch encountered in the log during recovery + * @param newBatch new batch to populate if filter wants to change + * the batch (for example to filter some records out, or alter some + * records). Please note that the new batch MUST NOT contain + * more records than original, else recovery would be failed. + * + * @return Processing option for the current record. + */ + LogRecordFoundResult logRecordFound(final long logNumber, + final String logFileName, final WriteBatch batch, + final WriteBatch newBatch); + + class LogRecordFoundResult { + public static LogRecordFoundResult CONTINUE_UNCHANGED = + new LogRecordFoundResult(WalProcessingOption.CONTINUE_PROCESSING, false); + + final WalProcessingOption walProcessingOption; + final boolean batchChanged; + + /** + * @param walProcessingOption the processing option + * @param batchChanged Whether batch was changed by the filter. + * It must be set to true if newBatch was populated, + * else newBatch has no effect. 
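
A sketch of a minimal WalFilter implementation against this interface. The class name and counter are illustrative, and the map type parameters are an assumption about the underlying API rather than something this diff spells out:

import java.util.Map;
import org.rocksdb.WalFilter;
import org.rocksdb.WriteBatch;

// Counts replayed WAL records while leaving recovery behaviour unchanged.
public class CountingWalFilter implements WalFilter {
  private int records = 0;

  @Override
  public void columnFamilyLogNumberMap(final Map<Integer, Long> cfLognumber,
      final Map<String, Integer> cfNameId) {
    // No per-column-family bookkeeping needed for this sketch.
  }

  @Override
  public LogRecordFoundResult logRecordFound(final long logNumber,
      final String logFileName, final WriteBatch batch, final WriteBatch newBatch) {
    records++;  // inspect `batch` here if needed
    return LogRecordFoundResult.CONTINUE_UNCHANGED;  // batch left unmodified
  }

  @Override
  public String name() {
    return "CountingWalFilter";
  }
}
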
+ */ + public LogRecordFoundResult(final WalProcessingOption walProcessingOption, + final boolean batchChanged) { + this.walProcessingOption = walProcessingOption; + this.batchChanged = batchChanged; + } + } + + /** + * Returns a name that identifies this WAL filter. + * The name will be printed to LOG file on start up for diagnosis. + * + * @return the name + */ + String name(); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalProcessingOption.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalProcessingOption.java new file mode 100644 index 000000000..889602edc --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WalProcessingOption.java @@ -0,0 +1,54 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +public enum WalProcessingOption { + /** + * Continue processing as usual. + */ + CONTINUE_PROCESSING((byte)0x0), + + /** + * Ignore the current record but continue processing of log(s). + */ + IGNORE_CURRENT_RECORD((byte)0x1), + + /** + * Stop replay of logs and discard logs. + * Logs won't be replayed on subsequent recovery. + */ + STOP_REPLAY((byte)0x2), + + /** + * Corrupted record detected by filter. + */ + CORRUPTED_RECORD((byte)0x3); + + private final byte value; + + WalProcessingOption(final byte value) { + this.value = value; + } + + /** + * Get the internal representation. + * + * @return the internal representation. + */ + byte getValue() { + return value; + } + + public static WalProcessingOption fromValue(final byte value) { + for (final WalProcessingOption walProcessingOption : WalProcessingOption.values()) { + if (walProcessingOption.value == value) { + return walProcessingOption; + } + } + throw new IllegalArgumentException( + "Illegal value provided for WalProcessingOption: " + value); + } +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchInterface.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchInterface.java index 21c8b6fae..e0999e21b 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchInterface.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchInterface.java @@ -23,6 +23,7 @@ public interface WriteBatchInterface { * * @param key the specified key to be inserted. * @param value the value associated with the specified key. + * @throws RocksDBException thrown if error happens in underlying native library. */ void put(byte[] key, byte[] value) throws RocksDBException; @@ -34,6 +35,7 @@ public interface WriteBatchInterface { * instance * @param key the specified key to be inserted. * @param value the value associated with the specified key. + * @throws RocksDBException thrown if error happens in underlying native library. */ void put(ColumnFamilyHandle columnFamilyHandle, byte[] key, byte[] value) throws RocksDBException; @@ -45,6 +47,7 @@ public interface WriteBatchInterface { * @param key the specified key to be merged. * @param value the value to be merged with the current value for * the specified key. + * @throws RocksDBException thrown if error happens in underlying native library. */ void merge(byte[] key, byte[] value) throws RocksDBException; @@ -56,6 +59,7 @@ public interface WriteBatchInterface { * @param key the specified key to be merged. 
* @param value the value to be merged with the current value for * the specified key. + * @throws RocksDBException thrown if error happens in underlying native library. */ void merge(ColumnFamilyHandle columnFamilyHandle, byte[] key, byte[] value) throws RocksDBException; @@ -66,6 +70,7 @@ public interface WriteBatchInterface { * @param key Key to delete within database * * @deprecated Use {@link #delete(byte[])} + * @throws RocksDBException thrown if error happens in underlying native library. */ @Deprecated void remove(byte[] key) throws RocksDBException; @@ -77,6 +82,7 @@ public interface WriteBatchInterface { * @param key Key to delete within database * * @deprecated Use {@link #delete(ColumnFamilyHandle, byte[])} + * @throws RocksDBException thrown if error happens in underlying native library. */ @Deprecated void remove(ColumnFamilyHandle columnFamilyHandle, byte[] key) @@ -86,6 +92,7 @@ public interface WriteBatchInterface { *
<p>If the database contains a mapping for "key", erase it. Else do nothing.</p>
    * * @param key Key to delete within database + * @throws RocksDBException thrown if error happens in underlying native library. */ void delete(byte[] key) throws RocksDBException; @@ -94,6 +101,7 @@ public interface WriteBatchInterface { * * @param columnFamilyHandle {@link ColumnFamilyHandle} instance * @param key Key to delete within database + * @throws RocksDBException thrown if error happens in underlying native library. */ void delete(ColumnFamilyHandle columnFamilyHandle, byte[] key) throws RocksDBException; @@ -161,6 +169,7 @@ public interface WriteBatchInterface { * First key to delete within database (included) * @param endKey * Last key to delete within database (excluded) + * @throws RocksDBException thrown if error happens in underlying native library. */ void deleteRange(byte[] beginKey, byte[] endKey) throws RocksDBException; @@ -178,6 +187,7 @@ public interface WriteBatchInterface { * First key to delete within database (included) * @param endKey * Last key to delete within database (excluded) + * @throws RocksDBException thrown if error happens in underlying native library. */ void deleteRange(ColumnFamilyHandle columnFamilyHandle, byte[] beginKey, byte[] endKey) throws RocksDBException; @@ -195,6 +205,7 @@ public interface WriteBatchInterface { * replication. * * @param blob binary object to be inserted + * @throws RocksDBException thrown if error happens in underlying native library. */ void putLogData(byte[] blob) throws RocksDBException; diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchWithIndex.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchWithIndex.java index 2c0350837..2ad91042d 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchWithIndex.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBatchWithIndex.java @@ -129,12 +129,10 @@ public class WriteBatchWithIndex extends AbstractWriteBatch { public RocksIterator newIteratorWithBase( final ColumnFamilyHandle columnFamilyHandle, final RocksIterator baseIterator) { - RocksIterator iterator = new RocksIterator( - baseIterator.parent_, - iteratorWithBase(nativeHandle_, - columnFamilyHandle.nativeHandle_, - baseIterator.nativeHandle_)); - //when the iterator is deleted it will also delete the baseIterator + RocksIterator iterator = new RocksIterator(baseIterator.parent_, + iteratorWithBase( + nativeHandle_, columnFamilyHandle.nativeHandle_, baseIterator.nativeHandle_)); + // when the iterator is deleted it will also delete the baseIterator baseIterator.disOwnNativeHandle(); return iterator; } @@ -151,8 +149,7 @@ public class WriteBatchWithIndex extends AbstractWriteBatch { * point-in-timefrom baseIterator and modifications made in this write batch. 
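
To ground the ownership comment (the returned iterator takes over and deletes the base iterator), a sketch of reading the merged view of a batch overlaid on the base DB; the path and keys are illustrative:

import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;
import org.rocksdb.WriteBatchWithIndex;

public class WbwiDemo {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/wbwi-demo");
         final WriteBatchWithIndex wbwi = new WriteBatchWithIndex(true);
         final ReadOptions readOptions = new ReadOptions()) {
      db.put("a".getBytes(), "base".getBytes());
      wbwi.put("b".getBytes(), "batched".getBytes());
      // The merged iterator owns the base iterator and deletes it on close.
      try (final RocksIterator it =
               wbwi.newIteratorWithBase(db.newIterator(readOptions))) {
        for (it.seekToFirst(); it.isValid(); it.next()) {
          System.out.println(new String(it.key()) + " -> " + new String(it.value()));
        }
      }
    }
  }
}
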
*/ public RocksIterator newIteratorWithBase(final RocksIterator baseIterator) { - return newIteratorWithBase(baseIterator.parent_.getDefaultColumnFamily(), - baseIterator); + return newIteratorWithBase(baseIterator.parent_.getDefaultColumnFamily(), baseIterator); } /** @@ -295,8 +292,8 @@ public class WriteBatchWithIndex extends AbstractWriteBatch { final boolean overwriteKey); private native long iterator0(final long handle); private native long iterator1(final long handle, final long cfHandle); - private native long iteratorWithBase(final long handle, - final long baseIteratorHandle, final long cfHandle); + private native long iteratorWithBase( + final long handle, final long baseIteratorHandle, final long cfHandle); private native byte[] getFromBatch(final long handle, final long optHandle, final byte[] key, final int keyLen); private native byte[] getFromBatch(final long handle, final long optHandle, diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBufferManager.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBufferManager.java new file mode 100644 index 000000000..b244aa952 --- /dev/null +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteBufferManager.java @@ -0,0 +1,33 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Java wrapper over native write_buffer_manager class + */ +public class WriteBufferManager extends RocksObject { + static { + RocksDB.loadLibrary(); + } + + /** + * Construct a new instance of WriteBufferManager. + * + * Check + * https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager + * for more details on when to use it + * + * @param bufferSizeBytes buffer size(in bytes) to use for native write_buffer_manager + * @param cache cache whose memory should be bounded by this write buffer manager + */ + public WriteBufferManager(final long bufferSizeBytes, final Cache cache){ + super(newWriteBufferManager(bufferSizeBytes, cache.nativeHandle_)); + } + + private native static long newWriteBufferManager(final long bufferSizeBytes, final long cacheHandle); + @Override + protected native void disposeInternal(final long handle); +} diff --git a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteOptions.java b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteOptions.java index db662aa50..71789ed1f 100644 --- a/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteOptions.java +++ b/ceph/src/rocksdb/java/src/main/java/org/rocksdb/WriteOptions.java @@ -90,7 +90,10 @@ public class WriteOptions extends RocksObject { /** * If true, writes will not first go to the write ahead log, - * and the write may got lost after a crash. + * and the write may got lost after a crash. The backup engine + * relies on write-ahead logs to back up the memtable, so if + * you disable write-ahead logs, you must create backups with + * flush_before_backup=true to avoid losing unflushed memtable data. * * @param flag a boolean flag to specify whether to disable * write-ahead-log on writes. @@ -103,7 +106,10 @@ public class WriteOptions extends RocksObject { /** * If true, writes will not first go to the write ahead log, - * and the write may got lost after a crash. + * and the write may got lost after a crash. 
The backup engine + * relies on write-ahead logs to back up the memtable, so if + * you disable write-ahead logs, you must create backups with + * flush_before_backup=true to avoid losing unflushed memtable data. * * @return boolean value indicating if WAL is disabled. */ @@ -163,8 +169,41 @@ return noSlowdown(nativeHandle_); } + /** + * If true, this write request is of lower priority if compaction is + * behind. In the case that {@link #noSlowdown()} == true, the request + * will be cancelled immediately with {@link Status.Code#Incomplete} returned. + * Otherwise, it will be slowed down. The slowdown value is determined by + * RocksDB to guarantee it introduces minimal impact on high-priority writes. + * + * Default: false + * + * @param lowPri true if the write request should be of lower priority than + * compactions which are behind. + * + * @return the instance of the current WriteOptions. + */ + public WriteOptions setLowPri(final boolean lowPri) { + setLowPri(nativeHandle_, lowPri); + return this; + } + + /** + * Returns true if this write request is of lower priority if compaction is + * behind. + * + * See {@link #setLowPri(boolean)}. + * + * @return true if this write request is of lower priority, false otherwise. + */ + public boolean lowPri() { + return lowPri(nativeHandle_); + } + private native static long newWriteOptions(); private native static long copyWriteOptions(long handle); + @Override protected final native void disposeInternal(final long handle); + private native void setSync(long handle, boolean flag); private native boolean sync(long handle); private native void setDisableWAL(long handle, boolean flag); @@ -175,5 +214,6 @@ private native void setNoSlowdown(final long handle, final boolean noSlowdown); private native boolean noSlowdown(final long handle); - @Override protected final native void disposeInternal(final long handle); + private native void setLowPri(final long handle, final boolean lowPri); + private native boolean lowPri(final long handle); } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BackupableDBOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BackupableDBOptionsTest.java index c223014fd..0b4992184 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BackupableDBOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BackupableDBOptionsTest.java @@ -45,7 +45,7 @@ public class BackupableDBOptionsTest { assertThat(backupableDBOptions.backupEnv()). 
isNull(); - try(final Env env = new RocksMemEnv()) { + try(final Env env = new RocksMemEnv(Env.getDefault())) { backupableDBOptions.setBackupEnv(env); assertThat(backupableDBOptions.backupEnv()) .isEqualTo(env); diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BlockBasedTableConfigTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BlockBasedTableConfigTest.java index 2b15b69f8..fe9f86325 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BlockBasedTableConfigTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/BlockBasedTableConfigTest.java @@ -6,6 +6,7 @@ package org.rocksdb; import org.junit.ClassRule; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; @@ -22,23 +23,94 @@ public class BlockBasedTableConfigTest { @Rule public TemporaryFolder dbFolder = new TemporaryFolder(); + @Test + public void cacheIndexAndFilterBlocks() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); + assertThat(blockBasedTableConfig.cacheIndexAndFilterBlocks()). + isTrue(); + + } + + @Test + public void cacheIndexAndFilterBlocksWithHighPriority() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true); + assertThat(blockBasedTableConfig.cacheIndexAndFilterBlocksWithHighPriority()). + isTrue(); + } + + @Test + public void pinL0FilterAndIndexBlocksInCache() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setPinL0FilterAndIndexBlocksInCache(true); + assertThat(blockBasedTableConfig.pinL0FilterAndIndexBlocksInCache()). + isTrue(); + } + + @Test + public void pinTopLevelIndexAndFilter() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setPinTopLevelIndexAndFilter(false); + assertThat(blockBasedTableConfig.pinTopLevelIndexAndFilter()). + isFalse(); + } + + @Test + public void indexType() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + assertThat(IndexType.values().length).isEqualTo(3); + blockBasedTableConfig.setIndexType(IndexType.kHashSearch); + assertThat(blockBasedTableConfig.indexType()).isEqualTo( + IndexType.kHashSearch); + assertThat(IndexType.valueOf("kBinarySearch")).isNotNull(); + blockBasedTableConfig.setIndexType(IndexType.valueOf("kBinarySearch")); + assertThat(blockBasedTableConfig.indexType()).isEqualTo( + IndexType.kBinarySearch); + } + + @Test + public void dataBlockIndexType() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setDataBlockIndexType(DataBlockIndexType.kDataBlockBinaryAndHash); + assertThat(blockBasedTableConfig.dataBlockIndexType()).isEqualTo( + DataBlockIndexType.kDataBlockBinaryAndHash); + blockBasedTableConfig.setDataBlockIndexType(DataBlockIndexType.kDataBlockBinarySearch); + assertThat(blockBasedTableConfig.dataBlockIndexType()).isEqualTo( + DataBlockIndexType.kDataBlockBinarySearch); + } + + @Test + public void checksumType() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + assertThat(ChecksumType.values().length).isEqualTo(3); + assertThat(ChecksumType.valueOf("kxxHash")). 
+ isEqualTo(ChecksumType.kxxHash); + blockBasedTableConfig.setChecksumType(ChecksumType.kNoChecksum); + blockBasedTableConfig.setChecksumType(ChecksumType.kxxHash); + assertThat(blockBasedTableConfig.checksumType()).isEqualTo( + ChecksumType.kxxHash); + } + @Test public void noBlockCache() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); blockBasedTableConfig.setNoBlockCache(true); assertThat(blockBasedTableConfig.noBlockCache()).isTrue(); } @Test - public void blockCacheSize() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setBlockCacheSize(8 * 1024); - assertThat(blockBasedTableConfig.blockCacheSize()). - isEqualTo(8 * 1024); + public void blockCache() { + try ( + final Cache cache = new LRUCache(17 * 1024 * 1024); + final Options options = new Options().setTableFormatConfig( + new BlockBasedTableConfig().setBlockCache(cache))) { + assertThat(options.tableFactoryName()).isEqualTo("BlockBasedTable"); + } } @Test - public void sharedBlockCache() throws RocksDBException { + public void blockCacheIntegration() throws RocksDBException { try (final Cache cache = new LRUCache(8 * 1024 * 1024); final Statistics statistics = new Statistics()) { for (int shard = 0; shard < 8; shard++) { @@ -63,148 +135,259 @@ public class BlockBasedTableConfigTest { } } @Test - public void blockSizeDeviation() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setBlockSizeDeviation(12); - assertThat(blockBasedTableConfig.blockSizeDeviation()). - isEqualTo(12); + public void persistentCache() throws RocksDBException { + try (final DBOptions dbOptions = new DBOptions(). + setInfoLogLevel(InfoLogLevel.INFO_LEVEL). + setCreateIfMissing(true); + final Logger logger = new Logger(dbOptions) { + @Override + protected void log(final InfoLogLevel infoLogLevel, final String logMsg) { + System.out.println(infoLogLevel.name() + ": " + logMsg); + } + }) { + try (final PersistentCache persistentCache = + new PersistentCache(Env.getDefault(), dbFolder.getRoot().getPath(), 1024 * 1024 * 100, logger, false); + final Options options = new Options().setTableFormatConfig( + new BlockBasedTableConfig().setPersistentCache(persistentCache))) { + assertThat(options.tableFactoryName()).isEqualTo("BlockBasedTable"); + } + } } @Test - public void blockRestartInterval() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setBlockRestartInterval(15); - assertThat(blockBasedTableConfig.blockRestartInterval()). - isEqualTo(15); + public void blockCacheCompressed() { + try (final Cache cache = new LRUCache(17 * 1024 * 1024); + final Options options = new Options().setTableFormatConfig( + new BlockBasedTableConfig().setBlockCacheCompressed(cache))) { + assertThat(options.tableFactoryName()).isEqualTo("BlockBasedTable"); + } } + @Ignore("See issue: https://github.com/facebook/rocksdb/issues/4822") @Test - public void wholeKeyFiltering() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setWholeKeyFiltering(false); - assertThat(blockBasedTableConfig.wholeKeyFiltering()). 
- isFalse(); - } + public void blockCacheCompressedIntegration() throws RocksDBException { + final byte[] key1 = "some-key1".getBytes(StandardCharsets.UTF_8); + final byte[] key2 = "some-key1".getBytes(StandardCharsets.UTF_8); + final byte[] key3 = "some-key1".getBytes(StandardCharsets.UTF_8); + final byte[] key4 = "some-key1".getBytes(StandardCharsets.UTF_8); + final byte[] value = "some-value".getBytes(StandardCharsets.UTF_8); - @Test - public void cacheIndexAndFilterBlocks() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setCacheIndexAndFilterBlocks(true); - assertThat(blockBasedTableConfig.cacheIndexAndFilterBlocks()). - isTrue(); + try (final Cache compressedCache = new LRUCache(8 * 1024 * 1024); + final Statistics statistics = new Statistics()) { + + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig() + .setNoBlockCache(true) + .setBlockCache(null) + .setBlockCacheCompressed(compressedCache) + .setFormatVersion(4); + try (final Options options = new Options() + .setCreateIfMissing(true) + .setStatistics(statistics) + .setTableFormatConfig(blockBasedTableConfig)) { + + for (int shard = 0; shard < 8; shard++) { + try (final FlushOptions flushOptions = new FlushOptions(); + final WriteOptions writeOptions = new WriteOptions(); + final ReadOptions readOptions = new ReadOptions(); + final RocksDB db = + RocksDB.open(options, dbFolder.getRoot().getAbsolutePath() + "/" + shard)) { + + db.put(writeOptions, key1, value); + db.put(writeOptions, key2, value); + db.put(writeOptions, key3, value); + db.put(writeOptions, key4, value); + db.flush(flushOptions); + + db.get(readOptions, key1); + db.get(readOptions, key2); + db.get(readOptions, key3); + db.get(readOptions, key4); + + assertThat(statistics.getTickerCount(TickerType.BLOCK_CACHE_COMPRESSED_ADD)).isEqualTo(shard + 1); + } + } + } + } } @Test - public void hashIndexAllowCollision() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setHashIndexAllowCollision(false); - assertThat(blockBasedTableConfig.hashIndexAllowCollision()). - isFalse(); + public void blockSize() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockSize(10); + assertThat(blockBasedTableConfig.blockSize()).isEqualTo(10); } @Test - public void blockCacheCompressedSize() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setBlockCacheCompressedSize(40); - assertThat(blockBasedTableConfig.blockCacheCompressedSize()). - isEqualTo(40); + public void blockSizeDeviation() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockSizeDeviation(12); + assertThat(blockBasedTableConfig.blockSizeDeviation()). + isEqualTo(12); } @Test - public void checksumType() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - assertThat(ChecksumType.values().length).isEqualTo(3); - assertThat(ChecksumType.valueOf("kxxHash")). 
- isEqualTo(ChecksumType.kxxHash); - blockBasedTableConfig.setChecksumType(ChecksumType.kNoChecksum); - blockBasedTableConfig.setChecksumType(ChecksumType.kxxHash); - assertThat(blockBasedTableConfig.checksumType().equals( - ChecksumType.kxxHash)); + public void blockRestartInterval() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockRestartInterval(15); + assertThat(blockBasedTableConfig.blockRestartInterval()). + isEqualTo(15); } @Test - public void indexType() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - assertThat(IndexType.values().length).isEqualTo(3); - blockBasedTableConfig.setIndexType(IndexType.kHashSearch); - assertThat(blockBasedTableConfig.indexType().equals( - IndexType.kHashSearch)); - assertThat(IndexType.valueOf("kBinarySearch")).isNotNull(); - blockBasedTableConfig.setIndexType(IndexType.valueOf("kBinarySearch")); - assertThat(blockBasedTableConfig.indexType().equals( - IndexType.kBinarySearch)); + public void indexBlockRestartInterval() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setIndexBlockRestartInterval(15); + assertThat(blockBasedTableConfig.indexBlockRestartInterval()). + isEqualTo(15); } @Test - public void blockCacheCompressedNumShardBits() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setBlockCacheCompressedNumShardBits(4); - assertThat(blockBasedTableConfig.blockCacheCompressedNumShardBits()). - isEqualTo(4); + public void metadataBlockSize() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setMetadataBlockSize(1024); + assertThat(blockBasedTableConfig.metadataBlockSize()). + isEqualTo(1024); } @Test - public void cacheNumShardBits() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setCacheNumShardBits(5); - assertThat(blockBasedTableConfig.cacheNumShardBits()). - isEqualTo(5); + public void partitionFilters() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setPartitionFilters(true); + assertThat(blockBasedTableConfig.partitionFilters()). + isTrue(); } @Test - public void blockSize() { - BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); - blockBasedTableConfig.setBlockSize(10); - assertThat(blockBasedTableConfig.blockSize()).isEqualTo(10); + public void useDeltaEncoding() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setUseDeltaEncoding(false); + assertThat(blockBasedTableConfig.useDeltaEncoding()). + isFalse(); } - @Test - public void blockBasedTableWithFilter() { + public void blockBasedTableWithFilterPolicy() { try(final Options options = new Options() .setTableFormatConfig(new BlockBasedTableConfig() - .setFilter(new BloomFilter(10)))) { + .setFilterPolicy(new BloomFilter(10)))) { assertThat(options.tableFactoryName()). isEqualTo("BlockBasedTable"); } } @Test - public void blockBasedTableWithoutFilter() { + public void blockBasedTableWithoutFilterPolicy() { try(final Options options = new Options().setTableFormatConfig( - new BlockBasedTableConfig().setFilter(null))) { + new BlockBasedTableConfig().setFilterPolicy(null))) { assertThat(options.tableFactoryName()). 
isEqualTo("BlockBasedTable"); } } @Test - public void blockBasedTableWithBlockCache() { - try (final Options options = new Options().setTableFormatConfig( - new BlockBasedTableConfig().setBlockCache(new LRUCache(17 * 1024 * 1024)))) { - assertThat(options.tableFactoryName()).isEqualTo("BlockBasedTable"); - } + public void wholeKeyFiltering() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setWholeKeyFiltering(false); + assertThat(blockBasedTableConfig.wholeKeyFiltering()). + isFalse(); + } + + @Test + public void verifyCompression() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setVerifyCompression(true); + assertThat(blockBasedTableConfig.verifyCompression()). + isTrue(); + } + + @Test + public void readAmpBytesPerBit() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setReadAmpBytesPerBit(2); + assertThat(blockBasedTableConfig.readAmpBytesPerBit()). + isEqualTo(2); } @Test - public void blockBasedTableFormatVersion() { - BlockBasedTableConfig config = new BlockBasedTableConfig(); - for (int version=0; version<=2; version++) { - config.setFormatVersion(version); - assertThat(config.formatVersion()).isEqualTo(version); + public void formatVersion() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + for (int version = 0; version < 5; version++) { + blockBasedTableConfig.setFormatVersion(version); + assertThat(blockBasedTableConfig.formatVersion()).isEqualTo(version); } } @Test(expected = AssertionError.class) - public void blockBasedTableFormatVersionFailNegative() { - BlockBasedTableConfig config = new BlockBasedTableConfig(); - config.setFormatVersion(-1); + public void formatVersionFailNegative() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setFormatVersion(-1); } @Test(expected = AssertionError.class) - public void blockBasedTableFormatVersionFailIllegalVersion() { - BlockBasedTableConfig config = new BlockBasedTableConfig(); - config.setFormatVersion(3); + public void formatVersionFailIllegalVersion() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setFormatVersion(99); + } + + @Test + public void enableIndexCompression() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setEnableIndexCompression(false); + assertThat(blockBasedTableConfig.enableIndexCompression()). + isFalse(); + } + + @Test + public void blockAlign() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockAlign(true); + assertThat(blockBasedTableConfig.blockAlign()). + isTrue(); + } + + @Deprecated + @Test + public void hashIndexAllowCollision() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setHashIndexAllowCollision(false); + assertThat(blockBasedTableConfig.hashIndexAllowCollision()). + isTrue(); // NOTE: setHashIndexAllowCollision should do nothing! + } + + @Deprecated + @Test + public void blockCacheSize() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockCacheSize(8 * 1024); + assertThat(blockBasedTableConfig.blockCacheSize()). 
+ isEqualTo(8 * 1024); + } + + @Deprecated + @Test + public void blockCacheNumShardBits() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setCacheNumShardBits(5); + assertThat(blockBasedTableConfig.cacheNumShardBits()). + isEqualTo(5); + } + + @Deprecated + @Test + public void blockCacheCompressedSize() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockCacheCompressedSize(40); + assertThat(blockBasedTableConfig.blockCacheCompressedSize()). + isEqualTo(40); + } + + @Deprecated + @Test + public void blockCacheCompressedNumShardBits() { + final BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig(); + blockBasedTableConfig.setBlockCacheCompressedNumShardBits(4); + assertThat(blockBasedTableConfig.blockCacheCompressedNumShardBits()). + isEqualTo(4); } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyOptionsTest.java index 43c17d52e..2cd8f0de9 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyOptionsTest.java @@ -7,6 +7,7 @@ package org.rocksdb; import org.junit.ClassRule; import org.junit.Test; +import org.rocksdb.test.RemoveEmptyValueCompactionFilterFactory; import java.util.ArrayList; import java.util.List; @@ -463,6 +464,23 @@ public class ColumnFamilyOptionsTest { } } + @Test + public void bottommostCompressionOptions() { + try (final ColumnFamilyOptions columnFamilyOptions = + new ColumnFamilyOptions(); + final CompressionOptions bottommostCompressionOptions = + new CompressionOptions() + .setMaxDictBytes(123)) { + + columnFamilyOptions.setBottommostCompressionOptions( + bottommostCompressionOptions); + assertThat(columnFamilyOptions.bottommostCompressionOptions()) + .isEqualTo(bottommostCompressionOptions); + assertThat(columnFamilyOptions.bottommostCompressionOptions() + .maxDictBytes()).isEqualTo(123); + } + } + @Test public void compressionOptions() { try (final ColumnFamilyOptions columnFamilyOptions @@ -541,6 +559,15 @@ public class ColumnFamilyOptionsTest { } } + @Test + public void ttl() { + try (final ColumnFamilyOptions options = new ColumnFamilyOptions()) { + options.setTtl(1000 * 60); + assertThat(options.ttl()). 
+ isEqualTo(1000 * 60); + } + } + @Test public void compactionOptionsUniversal() { try (final ColumnFamilyOptions opt = new ColumnFamilyOptions(); @@ -576,4 +603,23 @@ public class ColumnFamilyOptionsTest { isEqualTo(booleanValue); } } + + @Test + public void compactionFilter() { + try(final ColumnFamilyOptions options = new ColumnFamilyOptions(); + final RemoveEmptyValueCompactionFilter cf = new RemoveEmptyValueCompactionFilter()) { + options.setCompactionFilter(cf); + assertThat(options.compactionFilter()).isEqualTo(cf); + } + } + + @Test + public void compactionFilterFactory() { + try(final ColumnFamilyOptions options = new ColumnFamilyOptions(); + final RemoveEmptyValueCompactionFilterFactory cff = new RemoveEmptyValueCompactionFilterFactory()) { + options.setCompactionFilterFactory(cff); + assertThat(options.compactionFilterFactory()).isEqualTo(cff); + } + } + } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyTest.java index 0b943ac96..84815b476 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ColumnFamilyTest.java @@ -229,6 +229,7 @@ public class ColumnFamilyTest { new ColumnFamilyOptions())); db.put(tmpColumnFamilyHandle, "key".getBytes(), "value".getBytes()); db.dropColumnFamily(tmpColumnFamilyHandle); + assertThat(tmpColumnFamilyHandle.isOwningHandle()).isTrue(); } finally { if (tmpColumnFamilyHandle != null) { tmpColumnFamilyHandle.close(); @@ -240,6 +241,46 @@ public class ColumnFamilyTest { } } + @Test + public void createWriteDropColumnFamilies() throws RocksDBException { + final List cfDescriptors = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor("new_cf".getBytes())); + final List columnFamilyHandleList = new ArrayList<>(); + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true) + .setCreateMissingColumnFamilies(true); + final RocksDB db = RocksDB.open(options, + dbFolder.getRoot().getAbsolutePath(), cfDescriptors, + columnFamilyHandleList)) { + ColumnFamilyHandle tmpColumnFamilyHandle = null; + ColumnFamilyHandle tmpColumnFamilyHandle2 = null; + try { + tmpColumnFamilyHandle = db.createColumnFamily( + new ColumnFamilyDescriptor("tmpCF".getBytes(), + new ColumnFamilyOptions())); + tmpColumnFamilyHandle2 = db.createColumnFamily( + new ColumnFamilyDescriptor("tmpCF2".getBytes(), + new ColumnFamilyOptions())); + db.put(tmpColumnFamilyHandle, "key".getBytes(), "value".getBytes()); + db.put(tmpColumnFamilyHandle2, "key".getBytes(), "value".getBytes()); + db.dropColumnFamilies(Arrays.asList(tmpColumnFamilyHandle, tmpColumnFamilyHandle2)); + assertThat(tmpColumnFamilyHandle.isOwningHandle()).isTrue(); + assertThat(tmpColumnFamilyHandle2.isOwningHandle()).isTrue(); + } finally { + if (tmpColumnFamilyHandle != null) { + tmpColumnFamilyHandle.close(); + } + if (tmpColumnFamilyHandle2 != null) { + tmpColumnFamilyHandle2.close(); + } + for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandleList) { + columnFamilyHandle.close(); + } + } + } + } + @Test public void writeBatch() throws RocksDBException { try (final StringAppendOperator stringAppendOperator = new StringAppendOperator(); @@ -378,6 +419,50 @@ public class ColumnFamilyTest { } } + @Test + public void multiGetAsList() throws RocksDBException { + final List cfDescriptors = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new 
ColumnFamilyDescriptor("new_cf".getBytes())); + final List columnFamilyHandleList = new ArrayList<>(); + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true) + .setCreateMissingColumnFamilies(true); + final RocksDB db = RocksDB.open(options, + dbFolder.getRoot().getAbsolutePath(), + cfDescriptors, columnFamilyHandleList)) { + try { + db.put(columnFamilyHandleList.get(0), "key".getBytes(), + "value".getBytes()); + db.put(columnFamilyHandleList.get(1), "newcfkey".getBytes(), + "value".getBytes()); + + final List keys = Arrays.asList(new byte[][]{ + "key".getBytes(), "newcfkey".getBytes() + }); + List retValues = db.multiGetAsList(columnFamilyHandleList, + keys); + assertThat(retValues.size()).isEqualTo(2); + assertThat(new String(retValues.get(0))) + .isEqualTo("value"); + assertThat(new String(retValues.get(1))) + .isEqualTo("value"); + retValues = db.multiGetAsList(new ReadOptions(), columnFamilyHandleList, + keys); + assertThat(retValues.size()).isEqualTo(2); + assertThat(new String(retValues.get(0))) + .isEqualTo("value"); + assertThat(new String(retValues.get(1))) + .isEqualTo("value"); + } finally { + for (final ColumnFamilyHandle columnFamilyHandle : + columnFamilyHandleList) { + columnFamilyHandle.close(); + } + } + } + } + @Test public void properties() throws RocksDBException { final List cfDescriptors = Arrays.asList( diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionFilterFactoryTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionFilterFactoryTest.java index e90307b0d..efa29b1d9 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionFilterFactoryTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionFilterFactoryTest.java @@ -8,6 +8,7 @@ package org.rocksdb; import org.junit.Rule; import org.junit.Test; import org.junit.rules.TemporaryFolder; +import org.rocksdb.test.RemoveEmptyValueCompactionFilterFactory; import java.util.ArrayList; import java.util.Arrays; @@ -63,16 +64,4 @@ public class CompactionFilterFactoryTest { } } } - - private static class RemoveEmptyValueCompactionFilterFactory extends AbstractCompactionFilterFactory { - @Override - public RemoveEmptyValueCompactionFilter createCompactionFilter(final AbstractCompactionFilter.Context context) { - return new RemoveEmptyValueCompactionFilter(); - } - - @Override - public String name() { - return "RemoveEmptyValueCompactionFilterFactory"; - } - } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionJobInfoTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionJobInfoTest.java new file mode 100644 index 000000000..6c920439c --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionJobInfoTest.java @@ -0,0 +1,114 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
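The `multiGetAsList` test added above fetches several keys, optionally across column families, in one native call. A compact sketch of the simpler single-column-family overload (DB path and keys hypothetical; error handling reduced to a `throws` clause):

```java
import java.util.Arrays;
import java.util.List;
import org.rocksdb.*;

public class MultiGetExample {
  static { RocksDB.loadLibrary(); }

  public static void main(final String[] args) throws RocksDBException {
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/multiget-example")) {
      db.put("k1".getBytes(), "v1".getBytes());
      db.put("k2".getBytes(), "v2".getBytes());

      final List<byte[]> keys =
          Arrays.asList("k1".getBytes(), "k2".getBytes(), "missing".getBytes());

      // One call for all keys; a null entry means the key was not found.
      final List<byte[]> values = db.multiGetAsList(keys);
      for (int i = 0; i < keys.size(); i++) {
        final byte[] value = values.get(i);
        System.out.println(new String(keys.get(i)) + " -> "
            + (value == null ? "(not found)" : new String(value)));
      }
    }
  }
}
```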
+ +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +public class CompactionJobInfoTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Test + public void columnFamilyName() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.columnFamilyName()) + .isEmpty(); + } + } + + @Test + public void status() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.status().getCode()) + .isEqualTo(Status.Code.Ok); + } + } + + @Test + public void threadId() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.threadId()) + .isEqualTo(0); + } + } + + @Test + public void jobId() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.jobId()) + .isEqualTo(0); + } + } + + @Test + public void baseInputLevel() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.baseInputLevel()) + .isEqualTo(0); + } + } + + @Test + public void outputLevel() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.outputLevel()) + .isEqualTo(0); + } + } + + @Test + public void inputFiles() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.inputFiles()) + .isEmpty(); + } + } + + @Test + public void outputFiles() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.outputFiles()) + .isEmpty(); + } + } + + @Test + public void tableProperties() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.tableProperties()) + .isEmpty(); + } + } + + @Test + public void compactionReason() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.compactionReason()) + .isEqualTo(CompactionReason.kUnknown); + } + } + + @Test + public void compression() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.compression()) + .isEqualTo(CompressionType.NO_COMPRESSION); + } + } + + @Test + public void stats() { + try (final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) { + assertThat(compactionJobInfo.stats()) + .isNotNull(); + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionJobStatsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionJobStatsTest.java new file mode 100644 index 000000000..7be7226da --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionJobStatsTest.java @@ -0,0 +1,196 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
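The `CompactionJobInfo` tests above only check the defaults of a freshly constructed object; in practice the object is filled in by a manual compaction. A sketch, assuming the `RocksDB.compactFiles(CompactionOptions, List<String>, int, int, CompactionJobInfo)` overload that ships alongside these classes (DB path hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import org.rocksdb.*;

public class CompactFilesExample {
  static { RocksDB.loadLibrary(); }

  public static void main(final String[] args) throws RocksDBException {
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setDisableAutoCompactions(true); // keep the live-file list stable
         final RocksDB db = RocksDB.open(options, "/tmp/compact-files-example");
         final FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
      db.put("k1".getBytes(), "v1".getBytes());
      db.flush(flushOptions);
      db.put("k2".getBytes(), "v2".getBytes());
      db.flush(flushOptions);

      // Collect the names of all live SST files as compaction input.
      final List<String> inputFileNames = new ArrayList<>();
      for (final LiveFileMetaData meta : db.getLiveFilesMetaData()) {
        inputFileNames.add(meta.fileName());
      }

      try (final CompactionOptions compactionOptions = new CompactionOptions();
           final CompactionJobInfo compactionJobInfo = new CompactionJobInfo()) {
        db.compactFiles(compactionOptions, inputFileNames,
            1 /* outputLevel */, -1 /* outputPathId: let RocksDB choose */,
            compactionJobInfo);
        System.out.println("compacted to level " + compactionJobInfo.outputLevel()
            + ", output files: " + compactionJobInfo.outputFiles());
      }
    }
  }
}
```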
+ +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +public class CompactionJobStatsTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Test + public void reset() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + compactionJobStats.reset(); + assertThat(compactionJobStats.elapsedMicros()).isEqualTo(0); + } + } + + @Test + public void add() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats(); + final CompactionJobStats otherCompactionJobStats = new CompactionJobStats()) { + compactionJobStats.add(otherCompactionJobStats); + } + } + + @Test + public void elapsedMicros() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.elapsedMicros()).isEqualTo(0); + } + } + + @Test + public void numInputRecords() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numInputRecords()).isEqualTo(0); + } + } + + @Test + public void numInputFiles() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numInputFiles()).isEqualTo(0); + } + } + + @Test + public void numInputFilesAtOutputLevel() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numInputFilesAtOutputLevel()).isEqualTo(0); + } + } + + @Test + public void numOutputRecords() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numOutputRecords()).isEqualTo(0); + } + } + + @Test + public void numOutputFiles() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numOutputFiles()).isEqualTo(0); + } + } + + @Test + public void isManualCompaction() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.isManualCompaction()).isFalse(); + } + } + + @Test + public void totalInputBytes() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.totalInputBytes()).isEqualTo(0); + } + } + + @Test + public void totalOutputBytes() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.totalOutputBytes()).isEqualTo(0); + } + } + + + @Test + public void numRecordsReplaced() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numRecordsReplaced()).isEqualTo(0); + } + } + + @Test + public void totalInputRawKeyBytes() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.totalInputRawKeyBytes()).isEqualTo(0); + } + } + + @Test + public void totalInputRawValueBytes() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.totalInputRawValueBytes()).isEqualTo(0); + } + } + + @Test + public void numInputDeletionRecords() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numInputDeletionRecords()).isEqualTo(0); + } + } + + @Test + public void numExpiredDeletionRecords() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + 
assertThat(compactionJobStats.numExpiredDeletionRecords()).isEqualTo(0); + } + } + + @Test + public void numCorruptKeys() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numCorruptKeys()).isEqualTo(0); + } + } + + @Test + public void fileWriteNanos() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.fileWriteNanos()).isEqualTo(0); + } + } + + @Test + public void fileRangeSyncNanos() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.fileRangeSyncNanos()).isEqualTo(0); + } + } + + @Test + public void fileFsyncNanos() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.fileFsyncNanos()).isEqualTo(0); + } + } + + @Test + public void filePrepareWriteNanos() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.filePrepareWriteNanos()).isEqualTo(0); + } + } + + @Test + public void smallestOutputKeyPrefix() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.smallestOutputKeyPrefix()).isEmpty(); + } + } + + @Test + public void largestOutputKeyPrefix() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.largestOutputKeyPrefix()).isEmpty(); + } + } + + @Test + public void numSingleDelFallthru() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numSingleDelFallthru()).isEqualTo(0); + } + } + + @Test + public void numSingleDelMismatch() { + try (final CompactionJobStats compactionJobStats = new CompactionJobStats()) { + assertThat(compactionJobStats.numSingleDelMismatch()).isEqualTo(0); + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsFIFOTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsFIFOTest.java index 370a28e81..841615e67 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsFIFOTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsFIFOTest.java @@ -18,9 +18,18 @@ public class CompactionOptionsFIFOTest { @Test public void maxTableFilesSize() { final long size = 500 * 1024 * 1026; - try(final CompactionOptionsFIFO opt = new CompactionOptionsFIFO()) { + try (final CompactionOptionsFIFO opt = new CompactionOptionsFIFO()) { opt.setMaxTableFilesSize(size); assertThat(opt.maxTableFilesSize()).isEqualTo(size); } } + + @Test + public void allowCompaction() { + final boolean allowCompaction = true; + try (final CompactionOptionsFIFO opt = new CompactionOptionsFIFO()) { + opt.setAllowCompaction(allowCompaction); + assertThat(opt.allowCompaction()).isEqualTo(allowCompaction); + } + } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsTest.java new file mode 100644 index 000000000..b1726e866 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompactionOptionsTest.java @@ -0,0 +1,52 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
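The new `allowCompaction` option above complements `maxTableFilesSize` for FIFO-style column families. A sketch of wiring both into an `Options` instance (sizes and path hypothetical):

```java
import org.rocksdb.*;

public class FifoCompactionExample {
  static { RocksDB.loadLibrary(); }

  public static void main(final String[] args) throws RocksDBException {
    try (final CompactionOptionsFIFO fifoOptions = new CompactionOptionsFIFO()
             .setMaxTableFilesSize(512 * 1024 * 1024) // drop oldest files past 512 MiB
             .setAllowCompaction(true);               // permit compaction of small files
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setCompactionStyle(CompactionStyle.FIFO)
             .setCompactionOptionsFIFO(fifoOptions);
         final RocksDB db = RocksDB.open(options, "/tmp/fifo-example")) {
      db.put("key".getBytes(), "value".getBytes());
    }
  }
}
```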
+ +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +public class CompactionOptionsTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Test + public void compression() { + try (final CompactionOptions compactionOptions = new CompactionOptions()) { + assertThat(compactionOptions.compression()) + .isEqualTo(CompressionType.SNAPPY_COMPRESSION); + compactionOptions.setCompression(CompressionType.NO_COMPRESSION); + assertThat(compactionOptions.compression()) + .isEqualTo(CompressionType.NO_COMPRESSION); + } + } + + @Test + public void outputFileSizeLimit() { + final long mb250 = 1024 * 1024 * 250; + try (final CompactionOptions compactionOptions = new CompactionOptions()) { + assertThat(compactionOptions.outputFileSizeLimit()) + .isEqualTo(-1); + compactionOptions.setOutputFileSizeLimit(mb250); + assertThat(compactionOptions.outputFileSizeLimit()) + .isEqualTo(mb250); + } + } + + @Test + public void maxSubcompactions() { + try (final CompactionOptions compactionOptions = new CompactionOptions()) { + assertThat(compactionOptions.maxSubcompactions()) + .isEqualTo(0); + compactionOptions.setMaxSubcompactions(9); + assertThat(compactionOptions.maxSubcompactions()) + .isEqualTo(9); + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompressionOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompressionOptionsTest.java index c49224ca3..116552c32 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompressionOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/CompressionOptionsTest.java @@ -50,4 +50,22 @@ public class CompressionOptionsTest { assertThat(opt.maxDictBytes()).isEqualTo(maxDictBytes); } } + + @Test + public void zstdMaxTrainBytes() { + final int zstdMaxTrainBytes = 999; + try(final CompressionOptions opt = new CompressionOptions()) { + opt.setZStdMaxTrainBytes(zstdMaxTrainBytes); + assertThat(opt.zstdMaxTrainBytes()).isEqualTo(zstdMaxTrainBytes); + } + } + + @Test + public void enabled() { + try(final CompressionOptions opt = new CompressionOptions()) { + assertThat(opt.enabled()).isFalse(); + opt.setEnabled(true); + assertThat(opt.enabled()).isTrue(); + } + } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DBOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DBOptionsTest.java index 453639d57..e6ebc46cd 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DBOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DBOptionsTest.java @@ -424,6 +424,26 @@ public class DBOptionsTest { } } + @Test + public void setWriteBufferManager() throws RocksDBException { + try (final DBOptions opt = new DBOptions(); + final Cache cache = new LRUCache(1 * 1024 * 1024); + final WriteBufferManager writeBufferManager = new WriteBufferManager(2000l, cache)) { + opt.setWriteBufferManager(writeBufferManager); + assertThat(opt.writeBufferManager()).isEqualTo(writeBufferManager); + } + } + + @Test + public void setWriteBufferManagerWithZeroBufferSize() throws RocksDBException { + try (final DBOptions opt = new DBOptions(); + final Cache cache = new LRUCache(1 * 1024 * 1024); + final WriteBufferManager writeBufferManager = new WriteBufferManager(0l, cache)) { + opt.setWriteBufferManager(writeBufferManager); + assertThat(opt.writeBufferManager()).isEqualTo(writeBufferManager); + } + } + @Test public void accessHintOnCompactionStart() { 
try(final DBOptions opt = new DBOptions()) { @@ -514,6 +534,15 @@ public class DBOptionsTest { } } + @Test + public void enablePipelinedWrite() { + try(final DBOptions opt = new DBOptions()) { + assertThat(opt.enablePipelinedWrite()).isFalse(); + opt.setEnablePipelinedWrite(true); + assertThat(opt.enablePipelinedWrite()).isTrue(); + } + } + @Test public void allowConcurrentMemtableWrite() { try (final DBOptions opt = new DBOptions()) { @@ -595,6 +624,38 @@ public class DBOptionsTest { } } + @Test + public void walFilter() { + try (final DBOptions opt = new DBOptions()) { + assertThat(opt.walFilter()).isNull(); + + try (final AbstractWalFilter walFilter = new AbstractWalFilter() { + @Override + public void columnFamilyLogNumberMap( + final Map cfLognumber, + final Map cfNameId) { + // no-op + } + + @Override + public LogRecordFoundResult logRecordFound(final long logNumber, + final String logFileName, final WriteBatch batch, + final WriteBatch newBatch) { + return new LogRecordFoundResult( + WalProcessingOption.CONTINUE_PROCESSING, false); + } + + @Override + public String name() { + return "test-wal-filter"; + } + }) { + opt.setWalFilter(walFilter); + assertThat(opt.walFilter()).isEqualTo(walFilter); + } + } + } + @Test public void failIfOptionsFileError() { try (final DBOptions opt = new DBOptions()) { @@ -631,6 +692,51 @@ public class DBOptionsTest { } } + @Test + public void allowIngestBehind() { + try (final DBOptions opt = new DBOptions()) { + assertThat(opt.allowIngestBehind()).isFalse(); + opt.setAllowIngestBehind(true); + assertThat(opt.allowIngestBehind()).isTrue(); + } + } + + @Test + public void preserveDeletes() { + try (final DBOptions opt = new DBOptions()) { + assertThat(opt.preserveDeletes()).isFalse(); + opt.setPreserveDeletes(true); + assertThat(opt.preserveDeletes()).isTrue(); + } + } + + @Test + public void twoWriteQueues() { + try (final DBOptions opt = new DBOptions()) { + assertThat(opt.twoWriteQueues()).isFalse(); + opt.setTwoWriteQueues(true); + assertThat(opt.twoWriteQueues()).isTrue(); + } + } + + @Test + public void manualWalFlush() { + try (final DBOptions opt = new DBOptions()) { + assertThat(opt.manualWalFlush()).isFalse(); + opt.setManualWalFlush(true); + assertThat(opt.manualWalFlush()).isTrue(); + } + } + + @Test + public void atomicFlush() { + try (final DBOptions opt = new DBOptions()) { + assertThat(opt.atomicFlush()).isFalse(); + opt.setAtomicFlush(true); + assertThat(opt.atomicFlush()).isTrue(); + } + } + @Test public void rateLimiter() { try(final DBOptions options = new DBOptions(); diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DefaultEnvTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DefaultEnvTest.java new file mode 100644 index 000000000..9e4f04387 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/DefaultEnvTest.java @@ -0,0 +1,113 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
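`WriteBufferManager` (introduced earlier in this patch) caps aggregate memtable memory, and the `DBOptionsTest` additions above wire it in through `setWriteBufferManager`. A sketch, assuming `Options` exposes the same setter as `DBOptions` (sizes and path hypothetical):

```java
import org.rocksdb.*;

public class WriteBufferManagerExample {
  static { RocksDB.loadLibrary(); }

  public static void main(final String[] args) throws RocksDBException {
    // A single cache and manager can bound the memtable memory of
    // one or more DB instances; memtable charges land in the cache.
    try (final Cache cache = new LRUCache(64 * 1024 * 1024);
         final WriteBufferManager writeBufferManager =
             new WriteBufferManager(32 * 1024 * 1024, cache);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setWriteBufferManager(writeBufferManager);
         final RocksDB db = RocksDB.open(options, "/tmp/wbm-example")) {
      db.put("key".getBytes(), "value".getBytes());
    }
  }
}
```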
+ +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.util.Collection; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +public class DefaultEnvTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + @Test + public void backgroundThreads() { + try (final Env defaultEnv = RocksEnv.getDefault()) { + defaultEnv.setBackgroundThreads(5, Priority.BOTTOM); + assertThat(defaultEnv.getBackgroundThreads(Priority.BOTTOM)).isEqualTo(5); + + defaultEnv.setBackgroundThreads(5); + assertThat(defaultEnv.getBackgroundThreads(Priority.LOW)).isEqualTo(5); + + defaultEnv.setBackgroundThreads(5, Priority.LOW); + assertThat(defaultEnv.getBackgroundThreads(Priority.LOW)).isEqualTo(5); + + defaultEnv.setBackgroundThreads(5, Priority.HIGH); + assertThat(defaultEnv.getBackgroundThreads(Priority.HIGH)).isEqualTo(5); + } + } + + @Test + public void threadPoolQueueLen() { + try (final Env defaultEnv = RocksEnv.getDefault()) { + assertThat(defaultEnv.getThreadPoolQueueLen(Priority.BOTTOM)).isEqualTo(0); + assertThat(defaultEnv.getThreadPoolQueueLen(Priority.LOW)).isEqualTo(0); + assertThat(defaultEnv.getThreadPoolQueueLen(Priority.HIGH)).isEqualTo(0); + } + } + + @Test + public void incBackgroundThreadsIfNeeded() { + try (final Env defaultEnv = RocksEnv.getDefault()) { + defaultEnv.incBackgroundThreadsIfNeeded(20, Priority.BOTTOM); + assertThat(defaultEnv.getBackgroundThreads(Priority.BOTTOM)).isGreaterThanOrEqualTo(20); + + defaultEnv.incBackgroundThreadsIfNeeded(20, Priority.LOW); + assertThat(defaultEnv.getBackgroundThreads(Priority.LOW)).isGreaterThanOrEqualTo(20); + + defaultEnv.incBackgroundThreadsIfNeeded(20, Priority.HIGH); + assertThat(defaultEnv.getBackgroundThreads(Priority.HIGH)).isGreaterThanOrEqualTo(20); + } + } + + @Test + public void lowerThreadPoolIOPriority() { + try (final Env defaultEnv = RocksEnv.getDefault()) { + defaultEnv.lowerThreadPoolIOPriority(Priority.BOTTOM); + + defaultEnv.lowerThreadPoolIOPriority(Priority.LOW); + + defaultEnv.lowerThreadPoolIOPriority(Priority.HIGH); + } + } + + @Test + public void lowerThreadPoolCPUPriority() { + try (final Env defaultEnv = RocksEnv.getDefault()) { + defaultEnv.lowerThreadPoolCPUPriority(Priority.BOTTOM); + + defaultEnv.lowerThreadPoolCPUPriority(Priority.LOW); + + defaultEnv.lowerThreadPoolCPUPriority(Priority.HIGH); + } + } + + @Test + public void threadList() throws RocksDBException { + try (final Env defaultEnv = RocksEnv.getDefault()) { + final Collection threadList = defaultEnv.getThreadList(); + assertThat(threadList.size()).isGreaterThan(0); + } + } + + @Test + public void threadList_integration() throws RocksDBException { + try (final Env env = RocksEnv.getDefault(); + final Options opt = new Options() + .setCreateIfMissing(true) + .setCreateMissingColumnFamilies(true) + .setEnv(env)) { + // open database + try (final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + + final List threadList = env.getThreadList(); + assertThat(threadList.size()).isGreaterThan(0); + } + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/EnvOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/EnvOptionsTest.java index 9933b1e1d..9be61b7d7 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/EnvOptionsTest.java +++ 
b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/EnvOptionsTest.java @@ -18,6 +18,18 @@ public class EnvOptionsTest { public static final Random rand = PlatformRandomHelper.getPlatformSpecificRandomFactory(); + @Test + public void dbOptionsConstructor() { + final long compactionReadaheadSize = 4 * 1024 * 1024; + try (final DBOptions dbOptions = new DBOptions() + .setCompactionReadaheadSize(compactionReadaheadSize)) { + try (final EnvOptions envOptions = new EnvOptions(dbOptions)) { + assertThat(envOptions.compactionReadaheadSize()) + .isEqualTo(compactionReadaheadSize); + } + } + } + @Test public void useMmapReads() { try (final EnvOptions envOptions = new EnvOptions()) { diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/FlushOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/FlushOptionsTest.java new file mode 100644 index 000000000..f90ae911d --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/FlushOptionsTest.java @@ -0,0 +1,31 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import org.junit.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +public class FlushOptionsTest { + + @Test + public void waitForFlush() { + try (final FlushOptions flushOptions = new FlushOptions()) { + assertThat(flushOptions.waitForFlush()).isTrue(); + flushOptions.setWaitForFlush(false); + assertThat(flushOptions.waitForFlush()).isFalse(); + } + } + + @Test + public void allowWriteStall() { + try (final FlushOptions flushOptions = new FlushOptions()) { + assertThat(flushOptions.allowWriteStall()).isFalse(); + flushOptions.setAllowWriteStall(true); + assertThat(flushOptions.allowWriteStall()).isTrue(); + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/HdfsEnvTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/HdfsEnvTest.java new file mode 100644 index 000000000..3a91c5cad --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/HdfsEnvTest.java @@ -0,0 +1,45 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
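The new `FlushOptionsTest` above covers `waitForFlush` and `allowWriteStall`; here is a small sketch of a manual flush using both (path hypothetical):

```java
import org.rocksdb.*;

public class ManualFlushExample {
  static { RocksDB.loadLibrary(); }

  public static void main(final String[] args) throws RocksDBException {
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/flush-example");
         final FlushOptions flushOptions = new FlushOptions()
             .setWaitForFlush(true)       // block until the memtable is persisted
             .setAllowWriteStall(true)) { // may stall writes to schedule the flush
      db.put("key".getBytes(), "value".getBytes());
      db.flush(flushOptions);
    }
  }
}
```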
+ +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import static java.nio.charset.StandardCharsets.UTF_8; + +public class HdfsEnvTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + // expect org.rocksdb.RocksDBException: Not compiled with hdfs support + @Test(expected = RocksDBException.class) + public void construct() throws RocksDBException { + try (final Env env = new HdfsEnv("hdfs://localhost:5000")) { + // no-op + } + } + + // expect org.rocksdb.RocksDBException: Not compiled with hdfs support + @Test(expected = RocksDBException.class) + public void construct_integration() throws RocksDBException { + try (final Env env = new HdfsEnv("hdfs://localhost:5000"); + final Options options = new Options() + .setCreateIfMissing(true) + .setEnv(env); + ) { + try (final RocksDB db = RocksDB.open(options, dbFolder.getRoot().getPath())) { + db.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + } + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/InfoLogLevelTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/InfoLogLevelTest.java index 48ecfa16a..b215dd17f 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/InfoLogLevelTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/InfoLogLevelTest.java @@ -27,6 +27,7 @@ public class InfoLogLevelTest { try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath())) { db.put("key".getBytes(), "value".getBytes()); + db.flush(new FlushOptions().setWaitForFlush(true)); assertThat(getLogContentsWithoutHeader()).isNotEmpty(); } } @@ -93,7 +94,7 @@ public class InfoLogLevelTest { int first_non_header = lines.length; // Identify the last line of the header for (int i = lines.length - 1; i >= 0; --i) { - if (lines[i].indexOf("Options.") >= 0 && lines[i].indexOf(':') >= 0) { + if (lines[i].indexOf("DB pointer") >= 0) { first_non_header = i + 1; break; } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/IngestExternalFileOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/IngestExternalFileOptionsTest.java index 83e0dd17a..a3973ccd9 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/IngestExternalFileOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/IngestExternalFileOptionsTest.java @@ -84,4 +84,24 @@ public class IngestExternalFileOptionsTest { assertThat(options.allowBlockingFlush()).isEqualTo(allowBlockingFlush); } } + + @Test + public void ingestBehind() { + try (final IngestExternalFileOptions options = + new IngestExternalFileOptions()) { + assertThat(options.ingestBehind()).isFalse(); + options.setIngestBehind(true); + assertThat(options.ingestBehind()).isTrue(); + } + } + + @Test + public void writeGlobalSeqno() { + try (final IngestExternalFileOptions options = + new IngestExternalFileOptions()) { + assertThat(options.writeGlobalSeqno()).isTrue(); + options.setWriteGlobalSeqno(false); + assertThat(options.writeGlobalSeqno()).isFalse(); + } + } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/KeyMayExistTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/KeyMayExistTest.java index 8092270eb..577fe2ead 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/KeyMayExistTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/KeyMayExistTest.java @@ -48,12 +48,33 @@ public class 
KeyMayExistTest { assertThat(exists).isTrue(); assertThat(retValue.toString()).isEqualTo("value"); + // Slice key + StringBuilder builder = new StringBuilder("prefix"); + int offset = builder.toString().length(); + builder.append("slice key 0"); + int len = builder.toString().length() - offset; + builder.append("suffix"); + + byte[] sliceKey = builder.toString().getBytes(); + byte[] sliceValue = "slice value 0".getBytes(); + db.put(sliceKey, offset, len, sliceValue, 0, sliceValue.length); + + retValue = new StringBuilder(); + exists = db.keyMayExist(sliceKey, offset, len, retValue); + assertThat(exists).isTrue(); + assertThat(retValue.toString().getBytes()).isEqualTo(sliceValue); + // Test without column family but with readOptions try (final ReadOptions readOptions = new ReadOptions()) { retValue = new StringBuilder(); exists = db.keyMayExist(readOptions, "key".getBytes(), retValue); assertThat(exists).isTrue(); assertThat(retValue.toString()).isEqualTo("value"); + + retValue = new StringBuilder(); + exists = db.keyMayExist(readOptions, sliceKey, offset, len, retValue); + assertThat(exists).isTrue(); + assertThat(retValue.toString().getBytes()).isEqualTo(sliceValue); } // Test with column family @@ -63,6 +84,13 @@ public class KeyMayExistTest { assertThat(exists).isTrue(); assertThat(retValue.toString()).isEqualTo("value"); + // Test slice key with column family + retValue = new StringBuilder(); + exists = db.keyMayExist(columnFamilyHandleList.get(0), sliceKey, offset, len, + retValue); + assertThat(exists).isTrue(); + assertThat(retValue.toString().getBytes()).isEqualTo(sliceValue); + // Test with column family and readOptions try (final ReadOptions readOptions = new ReadOptions()) { retValue = new StringBuilder(); @@ -71,11 +99,23 @@ public class KeyMayExistTest { retValue); assertThat(exists).isTrue(); assertThat(retValue.toString()).isEqualTo("value"); + + // Test slice key with column family and read options + retValue = new StringBuilder(); + exists = db.keyMayExist(readOptions, + columnFamilyHandleList.get(0), sliceKey, offset, len, + retValue); + assertThat(exists).isTrue(); + assertThat(retValue.toString().getBytes()).isEqualTo(sliceValue); } // KeyMayExist in CF1 must return false assertThat(db.keyMayExist(columnFamilyHandleList.get(1), "key".getBytes(), retValue)).isFalse(); + + // slice key + assertThat(db.keyMayExist(columnFamilyHandleList.get(1), + sliceKey, 1, 3, retValue)).isFalse(); } finally { for (final ColumnFamilyHandle columnFamilyHandle : columnFamilyHandleList) { diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MemoryUtilTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MemoryUtilTest.java new file mode 100644 index 000000000..73fcc87c3 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MemoryUtilTest.java @@ -0,0 +1,143 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
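The `keyMayExist` slice overloads exercised above let callers reference a key inside a larger buffer without copying it out first. A compact sketch (path, buffer, and offsets hypothetical):

```java
import org.rocksdb.*;

public class KeyMayExistSliceExample {
  static { RocksDB.loadLibrary(); }

  public static void main(final String[] args) throws RocksDBException {
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/kme-example")) {
      final byte[] buffer = "prefixkey-1suffix".getBytes();
      final int offset = 6; // start of "key-1" inside the buffer
      final int len = 5;    // length of "key-1"
      final byte[] value = "value".getBytes();

      // Store using only the [offset, offset + len) slice as the key.
      db.put(buffer, offset, len, value, 0, value.length);

      // keyMayExist answers from the memtable and filter blocks where it can;
      // when it returns true, retValue may already hold the value.
      final StringBuilder retValue = new StringBuilder();
      if (db.keyMayExist(buffer, offset, len, retValue)) {
        System.out.println("may exist, possible value: " + retValue);
      }
    }
  }
}
```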
+ +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.nio.charset.StandardCharsets; +import java.util.*; + +import static org.assertj.core.api.Assertions.assertThat; + +public class MemoryUtilTest { + + private static final String MEMTABLE_SIZE = "rocksdb.size-all-mem-tables"; + private static final String UNFLUSHED_MEMTABLE_SIZE = "rocksdb.cur-size-all-mem-tables"; + private static final String TABLE_READERS = "rocksdb.estimate-table-readers-mem"; + + private final byte[] key = "some-key".getBytes(StandardCharsets.UTF_8); + private final byte[] value = "some-value".getBytes(StandardCharsets.UTF_8); + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Rule public TemporaryFolder dbFolder1 = new TemporaryFolder(); + @Rule public TemporaryFolder dbFolder2 = new TemporaryFolder(); + + /** + * Test MemoryUtil.getApproximateMemoryUsageByType before and after a put + get + */ + @Test + public void getApproximateMemoryUsageByType() throws RocksDBException { + try (final Cache cache = new LRUCache(8 * 1024 * 1024); + final Options options = + new Options() + .setCreateIfMissing(true) + .setTableFormatConfig(new BlockBasedTableConfig().setBlockCache(cache)); + final FlushOptions flushOptions = + new FlushOptions().setWaitForFlush(true); + final RocksDB db = + RocksDB.open(options, dbFolder1.getRoot().getAbsolutePath())) { + + List dbs = new ArrayList<>(1); + dbs.add(db); + Set caches = new HashSet<>(1); + caches.add(cache); + Map usage = MemoryUtil.getApproximateMemoryUsageByType(dbs, caches); + + assertThat(usage.get(MemoryUsageType.kMemTableTotal)).isEqualTo( + db.getAggregatedLongProperty(MEMTABLE_SIZE)); + assertThat(usage.get(MemoryUsageType.kMemTableUnFlushed)).isEqualTo( + db.getAggregatedLongProperty(UNFLUSHED_MEMTABLE_SIZE)); + assertThat(usage.get(MemoryUsageType.kTableReadersTotal)).isEqualTo( + db.getAggregatedLongProperty(TABLE_READERS)); + assertThat(usage.get(MemoryUsageType.kCacheTotal)).isEqualTo(0); + + db.put(key, value); + db.flush(flushOptions); + db.get(key); + + usage = MemoryUtil.getApproximateMemoryUsageByType(dbs, caches); + assertThat(usage.get(MemoryUsageType.kMemTableTotal)).isGreaterThan(0); + assertThat(usage.get(MemoryUsageType.kMemTableTotal)).isEqualTo( + db.getAggregatedLongProperty(MEMTABLE_SIZE)); + assertThat(usage.get(MemoryUsageType.kMemTableUnFlushed)).isGreaterThan(0); + assertThat(usage.get(MemoryUsageType.kMemTableUnFlushed)).isEqualTo( + db.getAggregatedLongProperty(UNFLUSHED_MEMTABLE_SIZE)); + assertThat(usage.get(MemoryUsageType.kTableReadersTotal)).isGreaterThan(0); + assertThat(usage.get(MemoryUsageType.kTableReadersTotal)).isEqualTo( + db.getAggregatedLongProperty(TABLE_READERS)); + assertThat(usage.get(MemoryUsageType.kCacheTotal)).isGreaterThan(0); + + } + } + + /** + * Test MemoryUtil.getApproximateMemoryUsageByType with null inputs + */ + @Test + public void getApproximateMemoryUsageByTypeNulls() throws RocksDBException { + Map usage = MemoryUtil.getApproximateMemoryUsageByType(null, null); + + assertThat(usage.get(MemoryUsageType.kMemTableTotal)).isEqualTo(null); + assertThat(usage.get(MemoryUsageType.kMemTableUnFlushed)).isEqualTo(null); + assertThat(usage.get(MemoryUsageType.kTableReadersTotal)).isEqualTo(null); + assertThat(usage.get(MemoryUsageType.kCacheTotal)).isEqualTo(null); + } + + /** + * Test MemoryUtil.getApproximateMemoryUsageByType with two DBs and two caches + */ + 
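+ // For reference, the buckets returned by getApproximateMemoryUsageByType
+ // (all values are approximations):
+ // kMemTableTotal - memory used by all memtables,
+ // kMemTableUnFlushed - memory used by memtables not yet flushed,
+ // kTableReadersTotal - memory used by table readers, e.g. index and
+ // filter blocks held outside the block cache,
+ // kCacheTotal - memory used by the supplied block caches.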
@Test + public void getApproximateMemoryUsageByTypeMultiple() throws RocksDBException { + try (final Cache cache1 = new LRUCache(1 * 1024 * 1024); + final Options options1 = + new Options() + .setCreateIfMissing(true) + .setTableFormatConfig(new BlockBasedTableConfig().setBlockCache(cache1)); + final RocksDB db1 = + RocksDB.open(options1, dbFolder1.getRoot().getAbsolutePath()); + final Cache cache2 = new LRUCache(1 * 1024 * 1024); + final Options options2 = + new Options() + .setCreateIfMissing(true) + .setTableFormatConfig(new BlockBasedTableConfig().setBlockCache(cache2)); + final RocksDB db2 = + RocksDB.open(options2, dbFolder2.getRoot().getAbsolutePath()); + final FlushOptions flushOptions = + new FlushOptions().setWaitForFlush(true); + + ) { + List dbs = new ArrayList<>(1); + dbs.add(db1); + dbs.add(db2); + Set caches = new HashSet<>(1); + caches.add(cache1); + caches.add(cache2); + + for (RocksDB db: dbs) { + db.put(key, value); + db.flush(flushOptions); + db.get(key); + } + + Map usage = MemoryUtil.getApproximateMemoryUsageByType(dbs, caches); + assertThat(usage.get(MemoryUsageType.kMemTableTotal)).isEqualTo( + db1.getAggregatedLongProperty(MEMTABLE_SIZE) + db2.getAggregatedLongProperty(MEMTABLE_SIZE)); + assertThat(usage.get(MemoryUsageType.kMemTableUnFlushed)).isEqualTo( + db1.getAggregatedLongProperty(UNFLUSHED_MEMTABLE_SIZE) + db2.getAggregatedLongProperty(UNFLUSHED_MEMTABLE_SIZE)); + assertThat(usage.get(MemoryUsageType.kTableReadersTotal)).isEqualTo( + db1.getAggregatedLongProperty(TABLE_READERS) + db2.getAggregatedLongProperty(TABLE_READERS)); + assertThat(usage.get(MemoryUsageType.kCacheTotal)).isGreaterThan(0); + + } + } + +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MergeTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MergeTest.java index 73b90869c..554698476 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MergeTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MergeTest.java @@ -5,6 +5,7 @@ package org.rocksdb; +import java.nio.ByteBuffer; import java.util.Arrays; import java.util.List; import java.util.ArrayList; @@ -44,6 +45,38 @@ public class MergeTest { } } + private byte[] longToByteArray(long l) { + ByteBuffer buf = ByteBuffer.allocate(Long.SIZE / Byte.SIZE); + buf.putLong(l); + return buf.array(); + } + + private long longFromByteArray(byte[] a) { + ByteBuffer buf = ByteBuffer.allocate(Long.SIZE / Byte.SIZE); + buf.put(a); + buf.flip(); + return buf.getLong(); + } + + @Test + public void uint64AddOption() + throws InterruptedException, RocksDBException { + try (final Options opt = new Options() + .setCreateIfMissing(true) + .setMergeOperatorName("uint64add"); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + // writing (long)100 under key + db.put("key".getBytes(), longToByteArray(100)); + // merge (long)1 under key + db.merge("key".getBytes(), longToByteArray(1)); + + final byte[] value = db.get("key".getBytes()); + final long longValue = longFromByteArray(value); + assertThat(longValue).isEqualTo(101); + } + } + @Test public void cFStringOption() throws InterruptedException, RocksDBException { @@ -86,6 +119,48 @@ public class MergeTest { } } + @Test + public void cFUInt64AddOption() + throws InterruptedException, RocksDBException { + + try (final ColumnFamilyOptions cfOpt1 = new ColumnFamilyOptions() + .setMergeOperatorName("uint64add"); + final ColumnFamilyOptions cfOpt2 = new ColumnFamilyOptions() + .setMergeOperatorName("uint64add") + ) { + final List cfDescriptors = 
Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpt1), + new ColumnFamilyDescriptor("new_cf".getBytes(), cfOpt2) + ); + + final List columnFamilyHandleList = new ArrayList<>(); + try (final DBOptions opt = new DBOptions() + .setCreateIfMissing(true) + .setCreateMissingColumnFamilies(true); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath(), cfDescriptors, + columnFamilyHandleList)) { + try { + // writing (long)100 under key + db.put(columnFamilyHandleList.get(1), + "cfkey".getBytes(), longToByteArray(100)); + // merge (long)1 under key + db.merge(columnFamilyHandleList.get(1), + "cfkey".getBytes(), longToByteArray(1)); + + byte[] value = db.get(columnFamilyHandleList.get(1), + "cfkey".getBytes()); + long longValue = longFromByteArray(value); + assertThat(longValue).isEqualTo(101); + } finally { + for (final ColumnFamilyHandle handle : columnFamilyHandleList) { + handle.close(); + } + } + } + } + } + @Test public void operatorOption() throws InterruptedException, RocksDBException { @@ -108,6 +183,28 @@ public class MergeTest { } } + @Test + public void uint64AddOperatorOption() + throws InterruptedException, RocksDBException { + try (final UInt64AddOperator uint64AddOperator = new UInt64AddOperator(); + final Options opt = new Options() + .setCreateIfMissing(true) + .setMergeOperator(uint64AddOperator); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + // Writing (long)100 under key + db.put("key".getBytes(), longToByteArray(100)); + + // merge (long)1 under key + db.merge("key".getBytes(), longToByteArray(1)); + + final byte[] value = db.get("key".getBytes()); + final long longValue = longFromByteArray(value); + + assertThat(longValue).isEqualTo(101); + } + } + @Test public void cFOperatorOption() throws InterruptedException, RocksDBException { @@ -170,6 +267,68 @@ public class MergeTest { } } + @Test + public void cFUInt64AddOperatorOption() + throws InterruptedException, RocksDBException { + try (final UInt64AddOperator uint64AddOperator = new UInt64AddOperator(); + final ColumnFamilyOptions cfOpt1 = new ColumnFamilyOptions() + .setMergeOperator(uint64AddOperator); + final ColumnFamilyOptions cfOpt2 = new ColumnFamilyOptions() + .setMergeOperator(uint64AddOperator) + ) { + final List cfDescriptors = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpt1), + new ColumnFamilyDescriptor("new_cf".getBytes(), cfOpt2) + ); + final List columnFamilyHandleList = new ArrayList<>(); + try (final DBOptions opt = new DBOptions() + .setCreateIfMissing(true) + .setCreateMissingColumnFamilies(true); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath(), cfDescriptors, + columnFamilyHandleList) + ) { + try { + // writing (long)100 under key + db.put(columnFamilyHandleList.get(1), + "cfkey".getBytes(), longToByteArray(100)); + // merge (long)1 under key + db.merge(columnFamilyHandleList.get(1), + "cfkey".getBytes(), longToByteArray(1)); + byte[] value = db.get(columnFamilyHandleList.get(1), + "cfkey".getBytes()); + long longValue = longFromByteArray(value); + + // Test also with createColumnFamily + try (final ColumnFamilyOptions cfHandleOpts = + new ColumnFamilyOptions() + .setMergeOperator(uint64AddOperator); + final ColumnFamilyHandle cfHandle = + db.createColumnFamily( + new ColumnFamilyDescriptor("new_cf2".getBytes(), + cfHandleOpts)) + ) { + // writing (long)200 under cfkey2 + db.put(cfHandle, "cfkey2".getBytes(), longToByteArray(200)); + // merge
(long)50 under cfkey2 + db.merge(cfHandle, new WriteOptions(), "cfkey2".getBytes(), + longToByteArray(50)); + value = db.get(cfHandle, "cfkey2".getBytes()); + long longValueTmpCf = longFromByteArray(value); + + assertThat(longValue).isEqualTo(101); + assertThat(longValueTmpCf).isEqualTo(250); + } + } finally { + for (final ColumnFamilyHandle columnFamilyHandle : + columnFamilyHandleList) { + columnFamilyHandle.close(); + } + } + } + } + } + @Test public void operatorGcBehaviour() throws RocksDBException { @@ -182,7 +341,6 @@ public class MergeTest { //no-op } - // test reuse try (final Options opt = new Options() .setMergeOperator(stringAppendOperator); @@ -213,6 +371,48 @@ public class MergeTest { } } + @Test + public void uint64AddOperatorGcBehaviour() + throws RocksDBException { + try (final UInt64AddOperator uint64AddOperator = new UInt64AddOperator()) { + try (final Options opt = new Options() + .setCreateIfMissing(true) + .setMergeOperator(uint64AddOperator); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + //no-op + } + + // test reuse + try (final Options opt = new Options() + .setMergeOperator(uint64AddOperator); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + //no-op + } + + // test param init + try (final UInt64AddOperator uint64AddOperator2 = new UInt64AddOperator(); + final Options opt = new Options() + .setMergeOperator(uint64AddOperator2); + final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + //no-op + } + + // test replace one with another merge operator instance + try (final Options opt = new Options() + .setMergeOperator(uint64AddOperator); + final UInt64AddOperator newUInt64AddOperator = new UInt64AddOperator()) { + opt.setMergeOperator(newUInt64AddOperator); + try (final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath())) { + //no-op + } + } + } + } + @Test public void emptyStringInSetMergeOperatorByName() { try (final Options opt = new Options() diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MutableDBOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MutableDBOptionsTest.java new file mode 100644 index 000000000..1ce3e1177 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/MutableDBOptionsTest.java @@ -0,0 +1,84 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
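For context on the new MutableDBOptionsTest below: MutableDBOptions describes only the subset of DB options that can be changed on a live database, and RocksDB#setDBOptions applies such a change set without reopening the database. A standalone sketch, not part of the patch; the path and option values are illustrative only.

import org.rocksdb.*;

public class MutableDBOptionsDemo {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/mutable-opts-demo")) {
      // Build only the options to change; everything else is left untouched.
      final MutableDBOptions changes = MutableDBOptions.builder()
          .setBytesPerSync(1024 * 1024)
          .setMaxBackgroundJobs(4)
          .build();
      db.setDBOptions(changes); // applied to the running DB, no reopen needed
    }
  }
}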
+package org.rocksdb; + +import org.junit.Test; +import org.rocksdb.MutableDBOptions.MutableDBOptionsBuilder; + +import java.util.NoSuchElementException; + +import static org.assertj.core.api.Assertions.assertThat; + +public class MutableDBOptionsTest { + + @Test + public void builder() { + final MutableDBOptionsBuilder builder = + MutableDBOptions.builder(); + builder + .setBytesPerSync(1024 * 1024 * 7) + .setMaxBackgroundJobs(5) + .setAvoidFlushDuringShutdown(false); + + assertThat(builder.bytesPerSync()).isEqualTo(1024 * 1024 * 7); + assertThat(builder.maxBackgroundJobs()).isEqualTo(5); + assertThat(builder.avoidFlushDuringShutdown()).isEqualTo(false); + } + + @Test(expected = NoSuchElementException.class) + public void builder_getWhenNotSet() { + final MutableDBOptionsBuilder builder = + MutableDBOptions.builder(); + + builder.bytesPerSync(); + } + + @Test + public void builder_build() { + final MutableDBOptions options = MutableDBOptions + .builder() + .setBytesPerSync(1024 * 1024 * 7) + .setMaxBackgroundJobs(5) + .build(); + + assertThat(options.getKeys().length).isEqualTo(2); + assertThat(options.getValues().length).isEqualTo(2); + assertThat(options.getKeys()[0]) + .isEqualTo( + MutableDBOptions.DBOption.bytes_per_sync.name()); + assertThat(options.getValues()[0]).isEqualTo("7340032"); + assertThat(options.getKeys()[1]) + .isEqualTo( + MutableDBOptions.DBOption.max_background_jobs.name()); + assertThat(options.getValues()[1]).isEqualTo("5"); + } + + @Test + public void mutableColumnFamilyOptions_toString() { + final String str = MutableDBOptions + .builder() + .setMaxOpenFiles(99) + .setDelayedWriteRate(789) + .setAvoidFlushDuringShutdown(true) + .build() + .toString(); + + assertThat(str).isEqualTo("max_open_files=99;delayed_write_rate=789;" + + "avoid_flush_during_shutdown=true"); + } + + @Test + public void mutableColumnFamilyOptions_parse() { + final String str = "max_open_files=99;delayed_write_rate=789;" + + "avoid_flush_during_shutdown=true"; + + final MutableDBOptionsBuilder builder = + MutableDBOptions.parse(str); + + assertThat(builder.maxOpenFiles()).isEqualTo(99); + assertThat(builder.delayedWriteRate()).isEqualTo(789); + assertThat(builder.avoidFlushDuringShutdown()).isEqualTo(true); + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/OptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/OptionsTest.java index 7f7679d73..e27a33d7d 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/OptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/OptionsTest.java @@ -6,13 +6,11 @@ package org.rocksdb; import java.nio.file.Paths; -import java.util.ArrayList; -import java.util.Collections; -import java.util.List; -import java.util.Random; +import java.util.*; import org.junit.ClassRule; import org.junit.Test; +import org.rocksdb.test.RemoveEmptyValueCompactionFilterFactory; import static org.assertj.core.api.Assertions.assertThat; @@ -645,6 +643,26 @@ public class OptionsTest { } } + @Test + public void setWriteBufferManager() throws RocksDBException { + try (final Options opt = new Options(); + final Cache cache = new LRUCache(1 * 1024 * 1024); + final WriteBufferManager writeBufferManager = new WriteBufferManager(2000l, cache)) { + opt.setWriteBufferManager(writeBufferManager); + assertThat(opt.writeBufferManager()).isEqualTo(writeBufferManager); + } + } + + @Test + public void setWriteBufferManagerWithZeroBufferSize() throws RocksDBException { + try (final Options opt = new Options(); + final Cache cache = new LRUCache(1 * 
1024 * 1024); + final WriteBufferManager writeBufferManager = new WriteBufferManager(0l, cache)) { + opt.setWriteBufferManager(writeBufferManager); + assertThat(opt.writeBufferManager()).isEqualTo(writeBufferManager); + } + } + @Test public void accessHintOnCompactionStart() { try (final Options opt = new Options()) { @@ -735,6 +753,15 @@ public class OptionsTest { } } + @Test + public void enablePipelinedWrite() { + try(final Options opt = new Options()) { + assertThat(opt.enablePipelinedWrite()).isFalse(); + opt.setEnablePipelinedWrite(true); + assertThat(opt.enablePipelinedWrite()).isTrue(); + } + } + @Test public void allowConcurrentMemtableWrite() { try (final Options opt = new Options()) { @@ -816,6 +843,38 @@ public class OptionsTest { } } + @Test + public void walFilter() { + try (final Options opt = new Options()) { + assertThat(opt.walFilter()).isNull(); + + try (final AbstractWalFilter walFilter = new AbstractWalFilter() { + @Override + public void columnFamilyLogNumberMap( + final Map cfLognumber, + final Map cfNameId) { + // no-op + } + + @Override + public LogRecordFoundResult logRecordFound(final long logNumber, + final String logFileName, final WriteBatch batch, + final WriteBatch newBatch) { + return new LogRecordFoundResult( + WalProcessingOption.CONTINUE_PROCESSING, false); + } + + @Override + public String name() { + return "test-wal-filter"; + } + }) { + opt.setWalFilter(walFilter); + assertThat(opt.walFilter()).isEqualTo(walFilter); + } + } + } + @Test public void failIfOptionsFileError() { try (final Options opt = new Options()) { @@ -852,6 +911,52 @@ public class OptionsTest { } } + + @Test + public void allowIngestBehind() { + try (final Options opt = new Options()) { + assertThat(opt.allowIngestBehind()).isFalse(); + opt.setAllowIngestBehind(true); + assertThat(opt.allowIngestBehind()).isTrue(); + } + } + + @Test + public void preserveDeletes() { + try (final Options opt = new Options()) { + assertThat(opt.preserveDeletes()).isFalse(); + opt.setPreserveDeletes(true); + assertThat(opt.preserveDeletes()).isTrue(); + } + } + + @Test + public void twoWriteQueues() { + try (final Options opt = new Options()) { + assertThat(opt.twoWriteQueues()).isFalse(); + opt.setTwoWriteQueues(true); + assertThat(opt.twoWriteQueues()).isTrue(); + } + } + + @Test + public void manualWalFlush() { + try (final Options opt = new Options()) { + assertThat(opt.manualWalFlush()).isFalse(); + opt.setManualWalFlush(true); + assertThat(opt.manualWalFlush()).isTrue(); + } + } + + @Test + public void atomicFlush() { + try (final Options opt = new Options()) { + assertThat(opt.atomicFlush()).isFalse(); + opt.setAtomicFlush(true); + assertThat(opt.atomicFlush()).isTrue(); + } + } + @Test public void env() { try (final Options options = new Options(); @@ -945,6 +1050,20 @@ public class OptionsTest { } } + @Test + public void bottommostCompressionOptions() { + try (final Options options = new Options(); + final CompressionOptions bottommostCompressionOptions = new CompressionOptions() + .setMaxDictBytes(123)) { + + options.setBottommostCompressionOptions(bottommostCompressionOptions); + assertThat(options.bottommostCompressionOptions()) + .isEqualTo(bottommostCompressionOptions); + assertThat(options.bottommostCompressionOptions().maxDictBytes()) + .isEqualTo(123); + } + } + @Test public void compressionOptions() { try (final Options options = new Options(); @@ -1087,6 +1206,15 @@ public class OptionsTest { } } + @Test + public void ttl() { + try (final Options options = new Options()) { + 
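+ // Note: ttl maps to the C++ option ColumnFamilyOptions::ttl and is
+ // expressed in seconds, so the value below is 60,000 seconds rather
+ // than one minute of milliseconds.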
options.setTtl(1000 * 60); + assertThat(options.ttl()). + isEqualTo(1000 * 60); + } + } + @Test public void compactionOptionsUniversal() { try (final Options options = new Options(); @@ -1122,4 +1250,23 @@ public class OptionsTest { isEqualTo(booleanValue); } } + + @Test + public void compactionFilter() { + try(final Options options = new Options(); + final RemoveEmptyValueCompactionFilter cf = new RemoveEmptyValueCompactionFilter()) { + options.setCompactionFilter(cf); + assertThat(options.compactionFilter()).isEqualTo(cf); + } + } + + @Test + public void compactionFilterFactory() { + try(final Options options = new Options(); + final RemoveEmptyValueCompactionFilterFactory cff = new RemoveEmptyValueCompactionFilterFactory()) { + options.setCompactionFilterFactory(cff); + assertThat(options.compactionFilterFactory()).isEqualTo(cff); + } + } + } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ReadOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ReadOptionsTest.java index f7d799909..9708cd0b1 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ReadOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/ReadOptionsTest.java @@ -24,6 +24,30 @@ public class ReadOptionsTest { @Rule public ExpectedException exception = ExpectedException.none(); + @Test + public void altConstructor() { + try (final ReadOptions opt = new ReadOptions(true, true)) { + assertThat(opt.verifyChecksums()).isTrue(); + assertThat(opt.fillCache()).isTrue(); + } + } + + @Test + public void copyConstructor() { + try (final ReadOptions opt = new ReadOptions()) { + opt.setVerifyChecksums(false); + opt.setFillCache(false); + opt.setIterateUpperBound(buildRandomSlice()); + opt.setIterateLowerBound(buildRandomSlice()); + try (final ReadOptions other = new ReadOptions(opt)) { + assertThat(opt.verifyChecksums()).isEqualTo(other.verifyChecksums()); + assertThat(opt.fillCache()).isEqualTo(other.fillCache()); + assertThat(Arrays.equals(opt.iterateUpperBound().data(), other.iterateUpperBound().data())).isTrue(); + assertThat(Arrays.equals(opt.iterateLowerBound().data(), other.iterateLowerBound().data())).isTrue(); + } + } + } + @Test public void verifyChecksum() { try (final ReadOptions opt = new ReadOptions()) { @@ -145,15 +169,36 @@ public class ReadOptionsTest { } @Test - public void copyConstructor() { + public void iterateLowerBound() { try (final ReadOptions opt = new ReadOptions()) { - opt.setVerifyChecksums(false); - opt.setFillCache(false); - opt.setIterateUpperBound(buildRandomSlice()); - ReadOptions other = new ReadOptions(opt); - assertThat(opt.verifyChecksums()).isEqualTo(other.verifyChecksums()); - assertThat(opt.fillCache()).isEqualTo(other.fillCache()); - assertThat(Arrays.equals(opt.iterateUpperBound().data(), other.iterateUpperBound().data())).isTrue(); + Slice lowerBound = buildRandomSlice(); + opt.setIterateLowerBound(lowerBound); + assertThat(Arrays.equals(lowerBound.data(), opt.iterateLowerBound().data())).isTrue(); + } + } + + @Test + public void iterateLowerBoundNull() { + try (final ReadOptions opt = new ReadOptions()) { + assertThat(opt.iterateLowerBound()).isNull(); + } + } + + @Test + public void tableFilter() { + try (final ReadOptions opt = new ReadOptions(); + final AbstractTableFilter allTablesFilter = new AllTablesFilter()) { + opt.setTableFilter(allTablesFilter); + } + } + + @Test + public void iterStartSeqnum() { + try (final ReadOptions opt = new ReadOptions()) { + assertThat(opt.iterStartSeqnum()).isEqualTo(0); + + opt.setIterStartSeqnum(10); + 
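+ // Per the C++ ReadOptions::iter_start_seqnum documentation: when non-zero,
+ // iterators skip entries whose sequence number is below the given value
+ // and return internal keys rather than user keys (used for differential
+ // snapshots); 0, the default, disables this behaviour.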
assertThat(opt.iterStartSeqnum()).isEqualTo(10); } } @@ -237,6 +282,22 @@ public class ReadOptionsTest { } } + @Test + public void failSetIterateLowerBoundUninitialized() { + try (final ReadOptions readOptions = + setupUninitializedReadOptions(exception)) { + readOptions.setIterateLowerBound(null); + } + } + + @Test + public void failIterateLowerBoundUninitialized() { + try (final ReadOptions readOptions = + setupUninitializedReadOptions(exception)) { + readOptions.iterateLowerBound(); + } + } + private ReadOptions setupUninitializedReadOptions( ExpectedException exception) { final ReadOptions readOptions = new ReadOptions(); @@ -252,4 +313,10 @@ public class ReadOptionsTest { return new Slice(sliceBytes); } + private static class AllTablesFilter extends AbstractTableFilter { + @Override + public boolean filter(final TableProperties tableProperties) { + return true; + } + } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksDBTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksDBTest.java index 158b8d56a..a7d7fee14 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksDBTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksDBTest.java @@ -4,15 +4,14 @@ // (found in the LICENSE.Apache file in the root directory). package org.rocksdb; -import org.junit.Assume; -import org.junit.ClassRule; -import org.junit.Rule; -import org.junit.Test; +import org.junit.*; import org.junit.rules.ExpectedException; import org.junit.rules.TemporaryFolder; +import java.nio.ByteBuffer; import java.util.*; +import static java.nio.charset.StandardCharsets.UTF_8; import static org.assertj.core.api.Assertions.assertThat; import static org.junit.Assert.fail; @@ -60,6 +59,130 @@ public class RocksDBTest { } } + @Test + public void createColumnFamily() throws RocksDBException { + final byte[] col1Name = "col1".getBytes(UTF_8); + + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); + final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions() + ) { + try (final ColumnFamilyHandle col1 = + db.createColumnFamily(new ColumnFamilyDescriptor(col1Name, cfOpts))) { + assertThat(col1).isNotNull(); + assertThat(col1.getName()).isEqualTo(col1Name); + } + } + + final List cfHandles = new ArrayList<>(); + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath(), + Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor(col1Name)), + cfHandles)) { + try { + assertThat(cfHandles.size()).isEqualTo(2); + assertThat(cfHandles.get(1)).isNotNull(); + assertThat(cfHandles.get(1).getName()).isEqualTo(col1Name); + } finally { + for (final ColumnFamilyHandle cfHandle : + cfHandles) { + cfHandle.close(); + } + } + } + } + + + @Test + public void createColumnFamilies() throws RocksDBException { + final byte[] col1Name = "col1".getBytes(UTF_8); + final byte[] col2Name = "col2".getBytes(UTF_8); + + List cfHandles; + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); + final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions() + ) { + cfHandles = + db.createColumnFamilies(cfOpts, Arrays.asList(col1Name, col2Name)); + try { + assertThat(cfHandles).isNotNull(); + assertThat(cfHandles.size()).isEqualTo(2); + assertThat(cfHandles.get(0).getName()).isEqualTo(col1Name); + assertThat(cfHandles.get(1).getName()).isEqualTo(col2Name); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + + cfHandles = new ArrayList<>(); + try (final 
RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath(), + Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor(col1Name), + new ColumnFamilyDescriptor(col2Name)), + cfHandles)) { + try { + assertThat(cfHandles.size()).isEqualTo(3); + assertThat(cfHandles.get(1)).isNotNull(); + assertThat(cfHandles.get(1).getName()).isEqualTo(col1Name); + assertThat(cfHandles.get(2)).isNotNull(); + assertThat(cfHandles.get(2).getName()).isEqualTo(col2Name); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + + @Test + public void createColumnFamiliesfromDescriptors() throws RocksDBException { + final byte[] col1Name = "col1".getBytes(UTF_8); + final byte[] col2Name = "col2".getBytes(UTF_8); + + List cfHandles; + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); + final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions() + ) { + cfHandles = + db.createColumnFamilies(Arrays.asList( + new ColumnFamilyDescriptor(col1Name, cfOpts), + new ColumnFamilyDescriptor(col2Name, cfOpts))); + try { + assertThat(cfHandles).isNotNull(); + assertThat(cfHandles.size()).isEqualTo(2); + assertThat(cfHandles.get(0).getName()).isEqualTo(col1Name); + assertThat(cfHandles.get(1).getName()).isEqualTo(col2Name); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + + cfHandles = new ArrayList<>(); + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath(), + Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor(col1Name), + new ColumnFamilyDescriptor(col2Name)), + cfHandles)) { + try { + assertThat(cfHandles.size()).isEqualTo(3); + assertThat(cfHandles.get(1)).isNotNull(); + assertThat(cfHandles.get(1).getName()).isEqualTo(col1Name); + assertThat(cfHandles.get(2)).isNotNull(); + assertThat(cfHandles.get(2).getName()).isEqualTo(col2Name); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + @Test public void put() throws RocksDBException { try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); @@ -70,6 +193,57 @@ public class RocksDBTest { "value".getBytes()); assertThat(db.get("key2".getBytes())).isEqualTo( "12345678".getBytes()); + + + // put + Segment key3 = sliceSegment("key3"); + Segment key4 = sliceSegment("key4"); + Segment value0 = sliceSegment("value 0"); + Segment value1 = sliceSegment("value 1"); + db.put(key3.data, key3.offset, key3.len, value0.data, value0.offset, value0.len); + db.put(opt, key4.data, key4.offset, key4.len, value1.data, value1.offset, value1.len); + + // compare + Assert.assertTrue(value0.isSamePayload(db.get(key3.data, key3.offset, key3.len))); + Assert.assertTrue(value1.isSamePayload(db.get(key4.data, key4.offset, key4.len))); + } + } + + private static Segment sliceSegment(String key) { + ByteBuffer rawKey = ByteBuffer.allocate(key.length() + 4); + rawKey.put((byte)0); + rawKey.put((byte)0); + rawKey.put(key.getBytes()); + + return new Segment(rawKey.array(), 2, key.length()); + } + + private static class Segment { + final byte[] data; + final int offset; + final int len; + + public boolean isSamePayload(byte[] value) { + if (value == null) { + return false; + } + if (value.length != len) { + return false; + } + + for (int i = 0; i < value.length; i++) { + if (data[i + offset] != value[i]) { + return false; + } + } + + return true; + } + + public Segment(byte[] value, 
int offset, int len) { + this.data = value; + this.offset = offset; + this.len = len; } } @@ -217,6 +391,41 @@ public class RocksDBTest { } } + @Test + public void multiGetAsList() throws RocksDBException, InterruptedException { + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); + final ReadOptions rOpt = new ReadOptions()) { + db.put("key1".getBytes(), "value".getBytes()); + db.put("key2".getBytes(), "12345678".getBytes()); + List lookupKeys = new ArrayList<>(); + lookupKeys.add("key1".getBytes()); + lookupKeys.add("key2".getBytes()); + List results = db.multiGetAsList(lookupKeys); + assertThat(results).isNotNull(); + assertThat(results).hasSize(lookupKeys.size()); + assertThat(results). + containsExactly("value".getBytes(), "12345678".getBytes()); + // test same method with ReadOptions + results = db.multiGetAsList(rOpt, lookupKeys); + assertThat(results).isNotNull(); + assertThat(results). + contains("value".getBytes(), "12345678".getBytes()); + + // remove existing key + lookupKeys.remove(1); + // add non existing key + lookupKeys.add("key3".getBytes()); + results = db.multiGetAsList(lookupKeys); + assertThat(results).isNotNull(); + assertThat(results). + containsExactly("value".getBytes(), null); + // test same call with readOptions + results = db.multiGetAsList(rOpt, lookupKeys); + assertThat(results).isNotNull(); + assertThat(results).contains("value".getBytes()); + } + } + @Test public void merge() throws RocksDBException { try (final StringAppendOperator stringAppendOperator = new StringAppendOperator(); @@ -242,6 +451,18 @@ public class RocksDBTest { db.merge(wOpt, "key2".getBytes(), "xxxx".getBytes()); assertThat(db.get("key2".getBytes())).isEqualTo( "xxxx".getBytes()); + + Segment key3 = sliceSegment("key3"); + Segment key4 = sliceSegment("key4"); + Segment value0 = sliceSegment("value 0"); + Segment value1 = sliceSegment("value 1"); + + db.merge(key3.data, key3.offset, key3.len, value0.data, value0.offset, value0.len); + db.merge(wOpt, key4.data, key4.offset, key4.len, value1.data, value1.offset, value1.len); + + // compare + Assert.assertTrue(value0.isSamePayload(db.get(key3.data, key3.offset, key3.len))); + Assert.assertTrue(value1.isSamePayload(db.get(key4.data, key4.offset, key4.len))); } } @@ -259,6 +480,18 @@ public class RocksDBTest { db.delete(wOpt, "key2".getBytes()); assertThat(db.get("key1".getBytes())).isNull(); assertThat(db.get("key2".getBytes())).isNull(); + + + Segment key3 = sliceSegment("key3"); + Segment key4 = sliceSegment("key4"); + db.put("key3".getBytes(), "key3 value".getBytes()); + db.put("key4".getBytes(), "key4 value".getBytes()); + + db.delete(key3.data, key3.offset, key3.len); + db.delete(wOpt, key4.data, key4.offset, key4.len); + + assertThat(db.get("key3".getBytes())).isNull(); + assertThat(db.get("key4".getBytes())).isNull(); } } @@ -292,8 +525,7 @@ public class RocksDBTest { @Test public void deleteRange() throws RocksDBException { - try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); - final WriteOptions wOpt = new WriteOptions()) { + try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath())) { db.put("key1".getBytes(), "value".getBytes()); db.put("key2".getBytes(), "12345678".getBytes()); db.put("key3".getBytes(), "abcdefg".getBytes()); @@ -822,4 +1054,501 @@ public class RocksDBTest { } } } + + @Ignore("This test crashes. 
Re-enable after fixing.") + @Test + public void getApproximateSizes() throws RocksDBException { + final byte key1[] = "key1".getBytes(UTF_8); + final byte key2[] = "key2".getBytes(UTF_8); + final byte key3[] = "key3".getBytes(UTF_8); + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.put(key1, key1); + db.put(key2, key2); + db.put(key3, key3); + + final long[] sizes = db.getApproximateSizes( + Arrays.asList( + new Range(new Slice(key1), new Slice(key2)), + new Range(new Slice(key2), new Slice(key3)) + ), + SizeApproximationFlag.INCLUDE_FILES, + SizeApproximationFlag.INCLUDE_MEMTABLES); + + assertThat(sizes.length).isEqualTo(2); + assertThat(sizes[0]).isEqualTo(0); + assertThat(sizes[1]).isGreaterThanOrEqualTo(1); + } + } + } + + @Test + public void getApproximateMemTableStats() throws RocksDBException { + final byte key1[] = "key1".getBytes(UTF_8); + final byte key2[] = "key2".getBytes(UTF_8); + final byte key3[] = "key3".getBytes(UTF_8); + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.put(key1, key1); + db.put(key2, key2); + db.put(key3, key3); + + final RocksDB.CountAndSize stats = + db.getApproximateMemTableStats( + new Range(new Slice(key1), new Slice(key3))); + + assertThat(stats).isNotNull(); + assertThat(stats.count).isGreaterThan(1); + assertThat(stats.size).isGreaterThan(1); + } + } + } + + @Ignore("TODO(AR) re-enable when ready!") + @Test + public void compactFiles() throws RocksDBException { + final int kTestKeySize = 16; + final int kTestValueSize = 984; + final int kEntrySize = kTestKeySize + kTestValueSize; + final int kEntriesPerBuffer = 100; + final int writeBufferSize = kEntrySize * kEntriesPerBuffer; + final byte[] cfName = "pikachu".getBytes(UTF_8); + + try (final Options options = new Options() + .setCreateIfMissing(true) + .setWriteBufferSize(writeBufferSize) + .setCompactionStyle(CompactionStyle.LEVEL) + .setTargetFileSizeBase(writeBufferSize) + .setMaxBytesForLevelBase(writeBufferSize * 2) + .setLevel0StopWritesTrigger(2) + .setMaxBytesForLevelMultiplier(2) + .setCompressionType(CompressionType.NO_COMPRESSION) + .setMaxSubcompactions(4)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath); + final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions(options)) { + db.createColumnFamily(new ColumnFamilyDescriptor(cfName, + cfOptions)).close(); + } + + try (final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions(options)) { + final List cfDescriptors = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOptions), + new ColumnFamilyDescriptor(cfName, cfOptions) + ); + final List cfHandles = new ArrayList<>(); + try (final DBOptions dbOptions = new DBOptions(options); + final RocksDB db = RocksDB.open(dbOptions, dbPath, cfDescriptors, + cfHandles); + ) { + try (final FlushOptions flushOptions = new FlushOptions() + .setWaitForFlush(true) + .setAllowWriteStall(true); + final CompactionOptions compactionOptions = new CompactionOptions()) { + final Random rnd = new Random(301); + for (int key = 64 * kEntriesPerBuffer; key >= 0; --key) { + final byte[] value = new byte[kTestValueSize]; + rnd.nextBytes(value); + db.put(cfHandles.get(1), Integer.toString(key).getBytes(UTF_8), + 
value); + } + db.flush(flushOptions, cfHandles); + + final RocksDB.LiveFiles liveFiles = db.getLiveFiles(); + final List compactedFiles = + db.compactFiles(compactionOptions, cfHandles.get(1), + liveFiles.files, 1, -1, null); + assertThat(compactedFiles).isNotEmpty(); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + } + } + + @Test + public void enableAutoCompaction() throws RocksDBException { + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true)) { + final List cfDescs = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY) + ); + final List cfHandles = new ArrayList<>(); + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath, cfDescs, cfHandles)) { + try { + db.enableAutoCompaction(cfHandles); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + } + + @Test + public void numberLevels() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + assertThat(db.numberLevels()).isEqualTo(7); + } + } + } + + @Test + public void maxMemCompactionLevel() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + assertThat(db.maxMemCompactionLevel()).isEqualTo(0); + } + } + } + + @Test + public void level0StopWriteTrigger() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + assertThat(db.level0StopWriteTrigger()).isEqualTo(36); + } + } + } + + @Test + public void getName() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + assertThat(db.getName()).isEqualTo(dbPath); + } + } + } + + @Test + public void getEnv() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + assertThat(db.getEnv()).isEqualTo(Env.getDefault()); + } + } + } + + @Test + public void flush() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath); + final FlushOptions flushOptions = new FlushOptions()) { + db.flush(flushOptions); + } + } + } + + @Test + public void flushWal() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.flushWal(true); + } + } + } + + @Test + public void syncWal() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.syncWal(); + } + } + } + + @Test + public void 
setPreserveDeletesSequenceNumber() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + assertThat(db.setPreserveDeletesSequenceNumber(db.getLatestSequenceNumber())) + .isFalse(); + } + } + } + + @Test + public void getLiveFiles() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + final RocksDB.LiveFiles livefiles = db.getLiveFiles(true); + assertThat(livefiles).isNotNull(); + assertThat(livefiles.manifestFileSize).isEqualTo(13); + assertThat(livefiles.files.size()).isEqualTo(3); + assertThat(livefiles.files.get(0)).isEqualTo("/CURRENT"); + assertThat(livefiles.files.get(1)).isEqualTo("/MANIFEST-000001"); + assertThat(livefiles.files.get(2)).isEqualTo("/OPTIONS-000005"); + } + } + } + + @Test + public void getSortedWalFiles() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + final List logFiles = db.getSortedWalFiles(); + assertThat(logFiles).isNotNull(); + assertThat(logFiles.size()).isEqualTo(1); + assertThat(logFiles.get(0).type()) + .isEqualTo(WalFileType.kAliveLogFile); + } + } + } + + @Test + public void deleteFile() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.deleteFile("unknown"); + } + } + } + + @Test + public void getLiveFilesMetaData() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + final List liveFilesMetaData + = db.getLiveFilesMetaData(); + assertThat(liveFilesMetaData).isEmpty(); + } + } + } + + @Test + public void getColumnFamilyMetaData() throws RocksDBException { + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true)) { + final List cfDescs = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY) + ); + final List cfHandles = new ArrayList<>(); + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath, cfDescs, cfHandles)) { + db.put(cfHandles.get(0), "key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + try { + final ColumnFamilyMetaData cfMetadata = + db.getColumnFamilyMetaData(cfHandles.get(0)); + assertThat(cfMetadata).isNotNull(); + assertThat(cfMetadata.name()).isEqualTo(RocksDB.DEFAULT_COLUMN_FAMILY); + assertThat(cfMetadata.levels().size()).isEqualTo(7); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + } + + @Test + public void verifyChecksum() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.verifyChecksum(); + } + } + } + + @Test + public void 
getPropertiesOfAllTables() throws RocksDBException { + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true)) { + final List cfDescs = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY) + ); + final List cfHandles = new ArrayList<>(); + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath, cfDescs, cfHandles)) { + db.put(cfHandles.get(0), "key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + try { + final Map properties = + db.getPropertiesOfAllTables(cfHandles.get(0)); + assertThat(properties).isNotNull(); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + } + + @Test + public void getPropertiesOfTablesInRange() throws RocksDBException { + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true)) { + final List cfDescs = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY) + ); + final List cfHandles = new ArrayList<>(); + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath, cfDescs, cfHandles)) { + db.put(cfHandles.get(0), "key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + db.put(cfHandles.get(0), "key2".getBytes(UTF_8), "value2".getBytes(UTF_8)); + db.put(cfHandles.get(0), "key3".getBytes(UTF_8), "value3".getBytes(UTF_8)); + try { + final Range range = new Range( + new Slice("key1".getBytes(UTF_8)), + new Slice("key3".getBytes(UTF_8))); + final Map properties = + db.getPropertiesOfTablesInRange( + cfHandles.get(0), Arrays.asList(range)); + assertThat(properties).isNotNull(); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + } + + @Test + public void suggestCompactRange() throws RocksDBException { + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true)) { + final List cfDescs = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY) + ); + final List cfHandles = new ArrayList<>(); + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath, cfDescs, cfHandles)) { + db.put(cfHandles.get(0), "key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + db.put(cfHandles.get(0), "key2".getBytes(UTF_8), "value2".getBytes(UTF_8)); + db.put(cfHandles.get(0), "key3".getBytes(UTF_8), "value3".getBytes(UTF_8)); + try { + final Range range = db.suggestCompactRange(cfHandles.get(0)); + assertThat(range).isNotNull(); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + } + } + } + } + + @Test + public void promoteL0() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + db.promoteL0(2); + } + } + } + + @Test + public void startTrace() throws RocksDBException { + try (final Options options = new Options().setCreateIfMissing(true)) { + final String dbPath = dbFolder.getRoot().getAbsolutePath(); + try (final RocksDB db = RocksDB.open(options, dbPath)) { + final TraceOptions traceOptions = new TraceOptions(); + + try (final InMemoryTraceWriter traceWriter = new InMemoryTraceWriter()) { + db.startTrace(traceOptions, traceWriter); + + db.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + + db.endTrace(); + + final List writes = traceWriter.getWrites(); + 
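+ // Each collected Slice holds one serialized trace record in RocksDB's
+ // internal binary format; a non-empty list confirms the traced operations
+ // were forwarded to the in-memory writer.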
assertThat(writes.size()).isGreaterThan(0); + } + } + } + } + + @Test + public void setDBOptions() throws RocksDBException { + try (final DBOptions options = new DBOptions() + .setCreateIfMissing(true) + .setCreateMissingColumnFamilies(true); + final ColumnFamilyOptions new_cf_opts = new ColumnFamilyOptions() + .setWriteBufferSize(4096)) { + + final List columnFamilyDescriptors = + Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor("new_cf".getBytes(), new_cf_opts)); + + // open database + final List columnFamilyHandles = new ArrayList<>(); + try (final RocksDB db = RocksDB.open(options, + dbFolder.getRoot().getAbsolutePath(), columnFamilyDescriptors, columnFamilyHandles)) { + try { + final MutableDBOptions mutableOptions = + MutableDBOptions.builder() + .setBytesPerSync(1024 * 1027 * 7) + .setAvoidFlushDuringShutdown(false) + .build(); + + db.setDBOptions(mutableOptions); + } finally { + for (final ColumnFamilyHandle handle : columnFamilyHandles) { + handle.close(); + } + } + } + } + } + + private static class InMemoryTraceWriter extends AbstractTraceWriter { + private final List writes = new ArrayList<>(); + private volatile boolean closed = false; + + @Override + public void write(final Slice slice) { + if (closed) { + return; + } + final byte[] data = slice.data(); + final byte[] dataCopy = new byte[data.length]; + System.arraycopy(data, 0, dataCopy, 0, data.length); + writes.add(dataCopy); + } + + @Override + public void closeWriter() { + closed = true; + } + + @Override + public long getFileSize() { + long size = 0; + for (int i = 0; i < writes.size(); i++) { + size += writes.get(i).length; + } + return size; + } + + public List getWrites() { + return writes; + } + } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksEnvTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksEnvTest.java deleted file mode 100644 index dfb796107..000000000 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksEnvTest.java +++ /dev/null @@ -1,39 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -package org.rocksdb; - -import org.junit.ClassRule; -import org.junit.Test; - -import static org.assertj.core.api.Assertions.assertThat; - -public class RocksEnvTest { - - @ClassRule - public static final RocksMemoryResource rocksMemoryResource = - new RocksMemoryResource(); - - @Test - public void rocksEnv() { - try (final Env rocksEnv = RocksEnv.getDefault()) { - rocksEnv.setBackgroundThreads(5); - // default rocksenv will always return zero for flush pool - // no matter what was set via setBackgroundThreads - assertThat(rocksEnv.getThreadPoolQueueLen(RocksEnv.FLUSH_POOL)). - isEqualTo(0); - rocksEnv.setBackgroundThreads(5, RocksEnv.FLUSH_POOL); - // default rocksenv will always return zero for flush pool - // no matter what was set via setBackgroundThreads - assertThat(rocksEnv.getThreadPoolQueueLen(RocksEnv.FLUSH_POOL)). - isEqualTo(0); - rocksEnv.setBackgroundThreads(5, RocksEnv.COMPACTION_POOL); - // default rocksenv will always return zero for compaction pool - // no matter what was set via setBackgroundThreads - assertThat(rocksEnv.getThreadPoolQueueLen(RocksEnv.COMPACTION_POOL)). 
- isEqualTo(0); - } - } -} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksMemEnvTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksMemEnvTest.java index 04fae2e95..8e429d4ec 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksMemEnvTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/RocksMemEnvTest.java @@ -33,7 +33,7 @@ public class RocksMemEnvTest { "baz".getBytes() }; - try (final Env env = new RocksMemEnv(); + try (final Env env = new RocksMemEnv(Env.getDefault()); final Options options = new Options() .setCreateIfMissing(true) .setEnv(env); @@ -107,7 +107,7 @@ public class RocksMemEnvTest { "baz".getBytes() }; - try (final Env env = new RocksMemEnv(); + try (final Env env = new RocksMemEnv(Env.getDefault()); final Options options = new Options() .setCreateIfMissing(true) .setEnv(env); @@ -136,7 +136,7 @@ public class RocksMemEnvTest { @Test(expected = RocksDBException.class) public void createIfMissingFalse() throws RocksDBException { - try (final Env env = new RocksMemEnv(); + try (final Env env = new RocksMemEnv(Env.getDefault()); final Options options = new Options() .setCreateIfMissing(false) .setEnv(env); diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/StatisticsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/StatisticsTest.java index 2103c2fc7..fbd255bdb 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/StatisticsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/StatisticsTest.java @@ -96,6 +96,14 @@ public class StatisticsTest { final HistogramData histogramData = statistics.getHistogramData(HistogramType.BYTES_PER_READ); assertThat(histogramData).isNotNull(); assertThat(histogramData.getAverage()).isGreaterThan(0); + assertThat(histogramData.getMedian()).isGreaterThan(0); + assertThat(histogramData.getPercentile95()).isGreaterThan(0); + assertThat(histogramData.getPercentile99()).isGreaterThan(0); + assertThat(histogramData.getStandardDeviation()).isEqualTo(0.00); + assertThat(histogramData.getMax()).isGreaterThan(0); + assertThat(histogramData.getCount()).isGreaterThan(0); + assertThat(histogramData.getSum()).isGreaterThan(0); + assertThat(histogramData.getMin()).isGreaterThan(0); } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TableFilterTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TableFilterTest.java new file mode 100644 index 000000000..862696763 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TableFilterTest.java @@ -0,0 +1,105 @@ +package org.rocksdb; + +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.assertj.core.api.Assertions.assertThat; + +public class TableFilterTest { + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + @Test + public void readOptions() throws RocksDBException { + try (final DBOptions opt = new DBOptions(). + setCreateIfMissing(true). 
+ setCreateMissingColumnFamilies(true); + final ColumnFamilyOptions new_cf_opts = new ColumnFamilyOptions() + ) { + final List columnFamilyDescriptors = + Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor("new_cf".getBytes(), new_cf_opts) + ); + + final List columnFamilyHandles = new ArrayList<>(); + + // open database + try (final RocksDB db = RocksDB.open(opt, + dbFolder.getRoot().getAbsolutePath(), + columnFamilyDescriptors, + columnFamilyHandles)) { + + try (final CfNameCollectionTableFilter cfNameCollectingTableFilter = + new CfNameCollectionTableFilter(); + final FlushOptions flushOptions = + new FlushOptions().setWaitForFlush(true); + final ReadOptions readOptions = + new ReadOptions().setTableFilter(cfNameCollectingTableFilter)) { + + db.put(columnFamilyHandles.get(0), + "key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + db.put(columnFamilyHandles.get(0), + "key2".getBytes(UTF_8), "value2".getBytes(UTF_8)); + db.put(columnFamilyHandles.get(0), + "key3".getBytes(UTF_8), "value3".getBytes(UTF_8)); + db.put(columnFamilyHandles.get(1), + "key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + db.put(columnFamilyHandles.get(1), + "key2".getBytes(UTF_8), "value2".getBytes(UTF_8)); + db.put(columnFamilyHandles.get(1), + "key3".getBytes(UTF_8), "value3".getBytes(UTF_8)); + + db.flush(flushOptions, columnFamilyHandles); + + try (final RocksIterator iterator = + db.newIterator(columnFamilyHandles.get(0), readOptions)) { + iterator.seekToFirst(); + while (iterator.isValid()) { + iterator.key(); + iterator.value(); + iterator.next(); + } + } + + try (final RocksIterator iterator = + db.newIterator(columnFamilyHandles.get(1), readOptions)) { + iterator.seekToFirst(); + while (iterator.isValid()) { + iterator.key(); + iterator.value(); + iterator.next(); + } + } + + assertThat(cfNameCollectingTableFilter.cfNames.size()).isEqualTo(2); + assertThat(cfNameCollectingTableFilter.cfNames.get(0)) + .isEqualTo(RocksDB.DEFAULT_COLUMN_FAMILY); + assertThat(cfNameCollectingTableFilter.cfNames.get(1)) + .isEqualTo("new_cf".getBytes(UTF_8)); + } finally { + for (final ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) { + columnFamilyHandle.close(); + } + } + } + } + } + + private static class CfNameCollectionTableFilter extends AbstractTableFilter { + private final List cfNames = new ArrayList<>(); + + @Override + public boolean filter(final TableProperties tableProperties) { + cfNames.add(tableProperties.getColumnFamilyName()); + return true; + } + } +} diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TimedEnvTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TimedEnvTest.java new file mode 100644 index 000000000..2eb5eea82 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TimedEnvTest.java @@ -0,0 +1,43 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
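For context on the new TimedEnvTest below: TimedEnv wraps another Env and adds timing instrumentation to the wrapped Env's operations, which makes it useful when profiling file-system activity. A standalone sketch, not part of the patch; the DB path is illustrative only.

import static java.nio.charset.StandardCharsets.UTF_8;

import org.rocksdb.*;

public class TimedEnvDemo {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // Wrap the default Env so its calls are timed.
    try (final Env env = new TimedEnv(Env.getDefault());
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setEnv(env);
         final RocksDB db = RocksDB.open(options, "/tmp/timed-env-demo")) {
      db.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8));
    }
  }
}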
diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TimedEnvTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TimedEnvTest.java new file mode 100644 index 000000000..2eb5eea82 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/TimedEnvTest.java @@ -0,0 +1,43 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import static java.nio.charset.StandardCharsets.UTF_8; + +public class TimedEnvTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + @Test + public void construct() throws RocksDBException { + try (final Env env = new TimedEnv(Env.getDefault())) { + // no-op + } + } + + @Test + public void construct_integration() throws RocksDBException { + try (final Env env = new TimedEnv(Env.getDefault()); + final Options options = new Options() + .setCreateIfMissing(true) + .setEnv(env); + ) { + try (final RocksDB db = RocksDB.open(options, dbFolder.getRoot().getPath())) { + db.put("key1".getBytes(UTF_8), "value1".getBytes(UTF_8)); + } + } + } +}
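Both `TimedEnv` and `RocksMemEnv` (whose constructor, per the earlier hunks, now takes the base `Env` it wraps) are decorators over another `Env`, so their constructors suggest they can be stacked. A sketch under that assumption (the stacking itself is illustrative, not something the tests above do):

```java
import org.rocksdb.Env;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksMemEnv;
import org.rocksdb.TimedEnv;

public class EnvWrappersExample {
  public static void main(final String[] args) throws RocksDBException {
    // An in-memory env layered over the default env, with timing on top.
    // Each wrapper takes a base Env; with RocksMemEnv the path below is
    // virtual and nothing is written to disk.
    try (final Env env = new TimedEnv(new RocksMemEnv(Env.getDefault()));
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setEnv(env);
         final RocksDB db = RocksDB.open(options, "/in-memory/db")) {
      db.put("k".getBytes(), "v".getBytes());
    }
  }
}
```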
diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WalFilterTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WalFilterTest.java new file mode 100644 index 000000000..aeb49165d --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WalFilterTest.java @@ -0,0 +1,164 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +import org.junit.ClassRule; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.Map; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.rocksdb.util.TestUtil.*; + +public class WalFilterTest { + + @ClassRule + public static final RocksMemoryResource rocksMemoryResource = + new RocksMemoryResource(); + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + @Test + public void walFilter() throws RocksDBException { + // Create 3 batches with two keys each + final byte[][][] batchKeys = { + new byte[][] { + u("key1"), + u("key2") + }, + new byte[][] { + u("key3"), + u("key4") + }, + new byte[][] { + u("key5"), + u("key6") + } + + }; + + final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList( + new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), + new ColumnFamilyDescriptor(u("pikachu")) + ); + final List<ColumnFamilyHandle> cfHandles = new ArrayList<>(); + + // Test with all WAL processing options + for (final WalProcessingOption option : WalProcessingOption.values()) { + try (final Options options = optionsForLogIterTest(); + final DBOptions dbOptions = new DBOptions(options) + .setCreateMissingColumnFamilies(true); + final RocksDB db = RocksDB.open(dbOptions, + dbFolder.getRoot().getAbsolutePath(), + cfDescriptors, cfHandles)) { + try (final WriteOptions writeOptions = new WriteOptions()) { + // Write given keys in given batches + for (int i = 0; i < batchKeys.length; i++) { + final WriteBatch batch = new WriteBatch(); + for (int j = 0; j < batchKeys[i].length; j++) { + batch.put(cfHandles.get(0), batchKeys[i][j], dummyString(1024)); + } + db.write(writeOptions, batch); + } + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + cfHandles.clear(); + } + } + + // Create a test filter that would apply wal_processing_option at the first + // record + final int applyOptionForRecordIndex = 1; + try (final TestableWalFilter walFilter = + new TestableWalFilter(option, applyOptionForRecordIndex)) { + + try (final Options options = optionsForLogIterTest(); + final DBOptions dbOptions = new DBOptions(options) + .setWalFilter(walFilter)) { + + try (final RocksDB db = RocksDB.open(dbOptions, + dbFolder.getRoot().getAbsolutePath(), + cfDescriptors, cfHandles)) { + + try { + assertThat(walFilter.logNumbers).isNotEmpty(); + assertThat(walFilter.logFileNames).isNotEmpty(); + } finally { + for (final ColumnFamilyHandle cfHandle : cfHandles) { + cfHandle.close(); + } + cfHandles.clear(); + } + } catch (final RocksDBException e) { + if (option != WalProcessingOption.CORRUPTED_RECORD) { + // exception is expected when CORRUPTED_RECORD! + throw e; + } + } + } + } + } + } + + + private static class TestableWalFilter extends AbstractWalFilter { + private final WalProcessingOption walProcessingOption; + private final int applyOptionForRecordIndex; + Map<Integer, Long> cfLognumber; + Map<String, Integer> cfNameId; + final List<Long> logNumbers = new ArrayList<>(); + final List<String> logFileNames = new ArrayList<>(); + private int currentRecordIndex = 0; + + public TestableWalFilter(final WalProcessingOption walProcessingOption, + final int applyOptionForRecordIndex) { + super(); + this.walProcessingOption = walProcessingOption; + this.applyOptionForRecordIndex = applyOptionForRecordIndex; + } + + @Override + public void columnFamilyLogNumberMap(final Map<Integer, Long> cfLognumber, + final Map<String, Integer> cfNameId) { + this.cfLognumber = cfLognumber; + this.cfNameId = cfNameId; + } + + @Override + public LogRecordFoundResult logRecordFound( + final long logNumber, final String logFileName, final WriteBatch batch, + final WriteBatch newBatch) { + + logNumbers.add(logNumber); + logFileNames.add(logFileName); + + final WalProcessingOption optionToReturn; + if (currentRecordIndex == applyOptionForRecordIndex) { + optionToReturn = walProcessingOption; + } + else { + optionToReturn = WalProcessingOption.CONTINUE_PROCESSING; + } + + currentRecordIndex++; + + return new LogRecordFoundResult(optionToReturn, false); + } + + @Override + public String name() { + return "testable-wal-filter"; + } + }
+}
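The test above shows the full `AbstractWalFilter` contract: `columnFamilyLogNumberMap`, `logRecordFound` returning a `LogRecordFoundResult`, and `name`. A sketch of a more purposeful filter that halts WAL replay once recovery reaches a cut-off log number; `STOP_REPLAY` is assumed to mirror the C++ `WalProcessingOption` enum, as the test's iteration over `WalProcessingOption.values()` suggests, and the class and field names here are hypothetical:

```java
import java.util.Map;

import org.rocksdb.AbstractWalFilter;
import org.rocksdb.WalProcessingOption;
import org.rocksdb.WriteBatch;

// Hypothetical filter: replay WAL records only up to maxLogNumber.
public class StopAfterLogFilter extends AbstractWalFilter {
  private final long maxLogNumber; // cut-off, chosen by the caller

  public StopAfterLogFilter(final long maxLogNumber) {
    this.maxLogNumber = maxLogNumber;
  }

  @Override
  public void columnFamilyLogNumberMap(final Map<Integer, Long> cfLognumber,
      final Map<String, Integer> cfNameId) {
    // no per-column-family state needed for this sketch
  }

  @Override
  public LogRecordFoundResult logRecordFound(final long logNumber,
      final String logFileName, final WriteBatch batch,
      final WriteBatch newBatch) {
    final WalProcessingOption option = logNumber > maxLogNumber
        ? WalProcessingOption.STOP_REPLAY
        : WalProcessingOption.CONTINUE_PROCESSING;
    // second argument: we did not rewrite the batch into newBatch
    return new LogRecordFoundResult(option, false);
  }

  @Override
  public String name() {
    return "stop-after-log-filter";
  }
}
```

As in the test, it would be installed with `new DBOptions(...).setWalFilter(...)` before opening the database.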
diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchTest.java index 1e3e50b7e..92bec3dcf 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchTest.java @@ -180,6 +180,7 @@ public class WriteBatchTest { @Test public void deleteRange() throws RocksDBException { try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); + final WriteBatch batch = new WriteBatch(); final WriteOptions wOpt = new WriteOptions()) { db.put("key1".getBytes(), "value".getBytes()); db.put("key2".getBytes(), "12345678".getBytes()); @@ -190,9 +191,8 @@ assertThat(db.get("key3".getBytes())).isEqualTo("abcdefg".getBytes()); assertThat(db.get("key4".getBytes())).isEqualTo("xyz".getBytes()); - WriteBatch batch = new WriteBatch(); batch.deleteRange("key2".getBytes(), "key4".getBytes()); - db.write(new WriteOptions(), batch); + db.write(wOpt, batch); assertThat(db.get("key1".getBytes())).isEqualTo("value".getBytes()); assertThat(db.get("key2".getBytes())).isNull(); @@ -399,7 +399,7 @@ } @Test - public void hasEndrepareRange() throws RocksDBException { + public void hasEndPrepareRange() throws RocksDBException { try (final WriteBatch batch = new WriteBatch()) { assertThat(batch.hasEndPrepare()).isFalse(); } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchWithIndexTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchWithIndexTest.java index 061af2b8f..fcef00a39 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchWithIndexTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteBatchWithIndexTest.java @@ -47,7 +47,6 @@ public class WriteBatchWithIndexTest { try (final WriteBatchWithIndex wbwi = new WriteBatchWithIndex(true); final RocksIterator base = db.newIterator(); final RocksIterator it = wbwi.newIteratorWithBase(base)) { - it.seek(k1); assertThat(it.isValid()).isTrue(); assertThat(it.key()).isEqualTo(k1); @@ -105,7 +104,7 @@ } @Test - public void write_writeBatchWithIndex() throws RocksDBException { + public void writeBatchWithIndex() throws RocksDBException { try (final Options options = new Options().setCreateIfMissing(true); final RocksDB db = RocksDB.open(options, dbFolder.getRoot().getAbsolutePath())) { @@ -115,11 +114,12 @@ final byte[] k2 = "key2".getBytes(); final byte[] v2 = "value2".getBytes(); - try (final WriteBatchWithIndex wbwi = new WriteBatchWithIndex()) { + try (final WriteBatchWithIndex wbwi = new WriteBatchWithIndex(); + final WriteOptions wOpt = new WriteOptions()) { wbwi.put(k1, v1); wbwi.put(k2, v2); - db.write(new WriteOptions(), wbwi); + db.write(wOpt, wbwi); } assertThat(db.get(k1)).isEqualTo(v1); @@ -421,8 +421,8 @@ final ReadOptions readOptions, final WriteBatchWithIndex wbwi, final String skey) { final byte[] key = skey.getBytes(); - try(final RocksIterator baseIterator = db.newIterator(readOptions); - final RocksIterator iterator = wbwi.newIteratorWithBase(baseIterator)) { + try (final RocksIterator baseIterator = db.newIterator(readOptions); + final RocksIterator iterator = wbwi.newIteratorWithBase(baseIterator)) { iterator.seek(key); // Arrays.equals(key, iterator.key()) ensures an exact match in Rocks, @@ -513,6 +513,7 @@ @Test public void deleteRange() throws RocksDBException { try (final RocksDB db = RocksDB.open(dbFolder.getRoot().getAbsolutePath()); + final WriteBatch batch = new WriteBatch(); final WriteOptions wOpt = new WriteOptions()) { db.put("key1".getBytes(), "value".getBytes()); db.put("key2".getBytes(), "12345678".getBytes()); @@ -523,9 +524,8 @@ assertThat(db.get("key3".getBytes())).isEqualTo("abcdefg".getBytes()); assertThat(db.get("key4".getBytes())).isEqualTo("xyz".getBytes()); - WriteBatch batch = new WriteBatch(); batch.deleteRange("key2".getBytes(), "key4".getBytes()); - db.write(new WriteOptions(), batch); + db.write(wOpt, batch); assertThat(db.get("key1".getBytes())).isEqualTo("value".getBytes()); assertThat(db.get("key2".getBytes())).isNull(); diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteOptionsTest.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteOptionsTest.java index 27071e8f2..00c1d7239 100644 --- a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteOptionsTest.java +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/WriteOptionsTest.java @@ -45,6 +45,11 @@ public class WriteOptionsTest { assertThat(writeOptions.noSlowdown()).isTrue(); writeOptions.setNoSlowdown(false); assertThat(writeOptions.noSlowdown()).isFalse(); + + writeOptions.setLowPri(true); +
assertThat(writeOptions.lowPri()).isTrue(); + writeOptions.setLowPri(false); + assertThat(writeOptions.lowPri()).isFalse(); } } diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/test/RemoveEmptyValueCompactionFilterFactory.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/test/RemoveEmptyValueCompactionFilterFactory.java new file mode 100644 index 000000000..11ffedf31 --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/test/RemoveEmptyValueCompactionFilterFactory.java @@ -0,0 +1,20 @@ +package org.rocksdb.test; + +import org.rocksdb.AbstractCompactionFilter; +import org.rocksdb.AbstractCompactionFilterFactory; +import org.rocksdb.RemoveEmptyValueCompactionFilter; + +/** + * Simple CompactionFilterFactory class used in tests. Generates RemoveEmptyValueCompactionFilters. + */ +public class RemoveEmptyValueCompactionFilterFactory extends AbstractCompactionFilterFactory<RemoveEmptyValueCompactionFilter> { + @Override + public RemoveEmptyValueCompactionFilter createCompactionFilter(final AbstractCompactionFilter.Context context) { + return new RemoveEmptyValueCompactionFilter(); + } + + @Override + public String name() { + return "RemoveEmptyValueCompactionFilterFactory"; + } +}
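A factory like this takes effect once registered on the column family options. A sketch of the wiring, assuming `setCompactionFilterFactory` is exposed on `ColumnFamilyOptions` in this RocksJava build:

```java
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;
import org.rocksdb.Options;
import org.rocksdb.test.RemoveEmptyValueCompactionFilterFactory;

public class CompactionFilterFactoryExample {
  public static void main(final String[] args) {
    try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions()
             .setCompactionFilterFactory(
                 new RemoveEmptyValueCompactionFilterFactory());
         final DBOptions dbOpts = new DBOptions().setCreateIfMissing(true);
         final Options options = new Options(dbOpts, cfOpts)) {
      // Open a RocksDB instance with `options`: during compactions RocksDB
      // calls createCompactionFilter() and drops entries with empty values.
    }
  }
}
```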
diff --git a/ceph/src/rocksdb/java/src/test/java/org/rocksdb/util/TestUtil.java b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/util/TestUtil.java new file mode 100644 index 000000000..12b3bbbbd --- /dev/null +++ b/ceph/src/rocksdb/java/src/test/java/org/rocksdb/util/TestUtil.java @@ -0,0 +1,72 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb.util; + +import org.rocksdb.CompactionPriority; +import org.rocksdb.Options; +import org.rocksdb.WALRecoveryMode; + +import java.util.Random; + +import static java.nio.charset.StandardCharsets.UTF_8; + +/** + * General test utilities. + */ +public class TestUtil { + + /** + * Get the options for log iteration tests. + * + * @return the options + */ + public static Options optionsForLogIterTest() { + return defaultOptions() + .setCreateIfMissing(true) + .setWalTtlSeconds(1000); + } + + /** + * Get the default options. + * + * @return the options + */ + public static Options defaultOptions() { + return new Options() + .setWriteBufferSize(4090 * 4096) + .setTargetFileSizeBase(2 * 1024 * 1024) + .setMaxBytesForLevelBase(10 * 1024 * 1024) + .setMaxOpenFiles(5000) + .setWalRecoveryMode(WALRecoveryMode.TolerateCorruptedTailRecords) + .setCompactionPriority(CompactionPriority.ByCompensatedSize); + } + + private static final Random random = new Random(); + + /** + * Generate a random string of bytes. + * + * @param len the length of the string to generate. + * + * @return the random string of bytes + */ + public static byte[] dummyString(final int len) { + final byte[] str = new byte[len]; + random.nextBytes(str); + return str; + } + + /** + * Convert a UTF-8 String to a byte array. + * + * @param str the string + * + * @return the byte array. */ + public static byte[] u(final String str) { + return str.getBytes(UTF_8); + } +} diff --git a/ceph/src/rocksdb/memtable/alloc_tracker.cc b/ceph/src/rocksdb/memtable/alloc_tracker.cc index 9889cc423..a1fa4938c 100644 --- a/ceph/src/rocksdb/memtable/alloc_tracker.cc +++ b/ceph/src/rocksdb/memtable/alloc_tracker.cc @@ -24,7 +24,8 @@ AllocTracker::~AllocTracker() { FreeMem(); } void AllocTracker::Allocate(size_t bytes) { assert(write_buffer_manager_ != nullptr); - if (write_buffer_manager_->enabled()) { + if (write_buffer_manager_->enabled() || + write_buffer_manager_->cost_to_cache()) { bytes_allocated_.fetch_add(bytes, std::memory_order_relaxed); write_buffer_manager_->ReserveMem(bytes); } @@ -32,7 +33,8 @@ void AllocTracker::Allocate(size_t bytes) { void AllocTracker::DoneAllocating() { if (write_buffer_manager_ != nullptr && !done_allocating_) { - if (write_buffer_manager_->enabled()) { + if (write_buffer_manager_->enabled() || + write_buffer_manager_->cost_to_cache()) { write_buffer_manager_->ScheduleFreeMem( bytes_allocated_.load(std::memory_order_relaxed)); } else { @@ -47,7 +49,8 @@ void AllocTracker::FreeMem() { DoneAllocating(); } if (write_buffer_manager_ != nullptr && !freed_) { - if (write_buffer_manager_->enabled()) { + if (write_buffer_manager_->enabled() || + write_buffer_manager_->cost_to_cache()) { write_buffer_manager_->FreeMem( bytes_allocated_.load(std::memory_order_relaxed)); } else {
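The `alloc_tracker.cc` change above routes memtable allocations through the `WriteBufferManager` not only when its byte limit is enabled but also when it is configured to charge memory to a block cache (`cost_to_cache()`). On the Java side the corresponding setup would look roughly as follows; this assumes a RocksJava build that exposes `org.rocksdb.WriteBufferManager`, which not all versions of this era do:

```java
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class WriteBufferManagerExample {
  public static void main(final String[] args) {
    try (final Cache cache = new LRUCache(128 << 20);
         // Assumed wiring: cap memtable memory at 64 MB and charge it
         // against the block cache, which is the configuration that the
         // cost_to_cache() branch in AllocTracker above accounts for.
         final WriteBufferManager writeBufferManager =
             new WriteBufferManager(64 << 20, cache);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setWriteBufferManager(writeBufferManager)) {
      // open the DB with `options` as usual
    }
  }
}
```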
diff --git a/ceph/src/rocksdb/memtable/hash_cuckoo_rep.cc b/ceph/src/rocksdb/memtable/hash_cuckoo_rep.cc deleted file mode 100644 index aa6e3dbf3..000000000 --- a/ceph/src/rocksdb/memtable/hash_cuckoo_rep.cc +++ /dev/null @@ -1,661 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -// - -#ifndef ROCKSDB_LITE -#include "memtable/hash_cuckoo_rep.h" - -#include <algorithm> -#include <atomic> -#include <limits> -#include <memory> -#include <queue> -#include <string> -#include <vector> - -#include "db/memtable.h" -#include "memtable/skiplist.h" -#include "memtable/stl_wrappers.h" -#include "port/port.h" -#include "rocksdb/memtablerep.h" -#include "util/murmurhash.h" - -namespace rocksdb { -namespace { - -// the default maximum size of the cuckoo path searching queue -static const int kCuckooPathMaxSearchSteps = 100; - -struct CuckooStep { - static const int kNullStep = -1; - // the bucket id in the cuckoo array. - int bucket_id_; - // index of cuckoo-step array that points to its previous step, - // -1 if it the beginning step. - int prev_step_id_; - // the depth of the current step. - unsigned int depth_; - - CuckooStep() : bucket_id_(-1), prev_step_id_(kNullStep), depth_(1) {} - - CuckooStep(CuckooStep&& o) = default; - - CuckooStep& operator=(CuckooStep&& rhs) { - bucket_id_ = std::move(rhs.bucket_id_); - prev_step_id_ = std::move(rhs.prev_step_id_); - depth_ = std::move(rhs.depth_); - return *this; - } - - CuckooStep(const CuckooStep&) = delete; - CuckooStep& operator=(const CuckooStep&) = delete; - - CuckooStep(int bucket_id, int prev_step_id, int depth) - : bucket_id_(bucket_id), prev_step_id_(prev_step_id), depth_(depth) {} -}; - -class HashCuckooRep : public MemTableRep { - public: - explicit HashCuckooRep(const MemTableRep::KeyComparator& compare, - Allocator* allocator, const size_t bucket_count, - const unsigned int hash_func_count, - const size_t approximate_entry_size) - : MemTableRep(allocator), - compare_(compare), - allocator_(allocator), - bucket_count_(bucket_count), - approximate_entry_size_(approximate_entry_size), - cuckoo_path_max_depth_(kDefaultCuckooPathMaxDepth), - occupied_count_(0), - hash_function_count_(hash_func_count), - backup_table_(nullptr) { - char* mem = reinterpret_cast<char*>( - allocator_->Allocate(sizeof(std::atomic<char*>) * bucket_count_)); - cuckoo_array_ = new (mem) std::atomic<char*>[bucket_count_]; - for (unsigned int bid = 0; bid < bucket_count_; ++bid) { - cuckoo_array_[bid].store(nullptr, std::memory_order_relaxed); - } - - cuckoo_path_ = reinterpret_cast<int*>( - allocator_->Allocate(sizeof(int) * (cuckoo_path_max_depth_ + 1))); - is_nearly_full_ = false; - } - - // return false, indicating HashCuckooRep does not support merge operator. - virtual bool IsMergeOperatorSupported() const override { return false; } - - // return false, indicating HashCuckooRep does not support snapshot. - virtual bool IsSnapshotSupported() const override { return false; } - - // Returns true iff an entry that compares equal to key is in the collection. - virtual bool Contains(const char* internal_key) const override; - - virtual ~HashCuckooRep() override {} - - // Insert the specified key (internal_key) into the mem-table. Assertion - // fails if - // the current mem-table already contains the specified key. - virtual void Insert(KeyHandle handle) override; - - // This function returns bucket_count_ * approximate_entry_size_ when any - // of the followings happen to disallow further write operations: - // 1. when the fullness reaches kMaxFullnes. - // 2. when the backup_table_ is used. - // - // otherwise, this function will always return 0. - virtual size_t ApproximateMemoryUsage() override { - if (is_nearly_full_) { - return bucket_count_ * approximate_entry_size_; - } - return 0; - } - - virtual void Get(const LookupKey& k, void* callback_args, - bool (*callback_func)(void* arg, - const char* entry)) override; - - class Iterator : public MemTableRep::Iterator { - std::shared_ptr<std::vector<const char*>> bucket_; - std::vector<const char*>::const_iterator mutable cit_; - const KeyComparator& compare_; - std::string tmp_; // For passing to EncodeKey - bool mutable sorted_; - void DoSort() const; - - public: - explicit Iterator(std::shared_ptr<std::vector<const char*>> bucket, - const KeyComparator& compare); - - // Initialize an iterator over the specified collection. - // The returned iterator is not valid. - // explicit Iterator(const MemTableRep* collection); - virtual ~Iterator() override{}; - - // Returns true iff the iterator is positioned at a valid node. - virtual bool Valid() const override; - - // Returns the key at the current position.
- // REQUIRES: Valid() - virtual const char* key() const override; - - // Advances to the next position. - // REQUIRES: Valid() - virtual void Next() override; - - // Advances to the previous position. - // REQUIRES: Valid() - virtual void Prev() override; - - // Advance to the first entry with a key >= target - virtual void Seek(const Slice& user_key, const char* memtable_key) override; - - // Retreat to the last entry with a key <= target - virtual void SeekForPrev(const Slice& user_key, - const char* memtable_key) override; - - // Position at the first entry in collection. - // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToFirst() override; - - // Position at the last entry in collection. - // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToLast() override; - }; - - struct CuckooStepBuffer { - CuckooStepBuffer() : write_index_(0), read_index_(0) {} - ~CuckooStepBuffer() {} - - int write_index_; - int read_index_; - CuckooStep steps_[kCuckooPathMaxSearchSteps]; - - CuckooStep& NextWriteBuffer() { return steps_[write_index_++]; } - - inline const CuckooStep& ReadNext() { return steps_[read_index_++]; } - - inline bool HasNewWrite() { return write_index_ > read_index_; } - - inline void reset() { - write_index_ = 0; - read_index_ = 0; - } - - inline bool IsFull() { return write_index_ >= kCuckooPathMaxSearchSteps; } - - // returns the number of steps that has been read - inline int ReadCount() { return read_index_; } - - // returns the number of steps that has been written to the buffer. - inline int WriteCount() { return write_index_; } - }; - - private: - const MemTableRep::KeyComparator& compare_; - // the pointer to Allocator to allocate memory, immutable after construction. - Allocator* const allocator_; - // the number of hash bucket in the hash table. - const size_t bucket_count_; - // approximate size of each entry - const size_t approximate_entry_size_; - // the maxinum depth of the cuckoo path. - const unsigned int cuckoo_path_max_depth_; - // the current number of entries in cuckoo_array_ which has been occupied. - size_t occupied_count_; - // the current number of hash functions used in the cuckoo hash. - unsigned int hash_function_count_; - // the backup MemTableRep to handle the case where cuckoo hash cannot find - // a vacant bucket for inserting the key of a put request. - std::shared_ptr<MemTableRep> backup_table_; - // the array to store pointers, pointing to the actual data. - std::atomic<char*>* cuckoo_array_; - // a buffer to store cuckoo path - int* cuckoo_path_; - // a boolean flag indicating whether the fullness of bucket array - // reaches the point to make the current memtable immutable. - bool is_nearly_full_; - - // the default maximum depth of the cuckoo path. - static const unsigned int kDefaultCuckooPathMaxDepth = 10; - - CuckooStepBuffer step_buffer_; - - // returns the bucket id assogied to the input slice based on the - unsigned int GetHash(const Slice& slice, const int hash_func_id) const { - // the seeds used in the Murmur hash to produce different hash functions. - static const int kMurmurHashSeeds[HashCuckooRepFactory::kMaxHashCount] = { - 545609244, 1769731426, 763324157, 13099088, 592422103, - 1899789565, 248369300, 1984183468, 1613664382, 1491157517}; - return static_cast<unsigned int>( - MurmurHash(slice.data(), static_cast<int>(slice.size()), - kMurmurHashSeeds[hash_func_id]) % - bucket_count_); - } - - // A cuckoo path is a sequence of bucket ids, where each id points to a - // location of cuckoo_array_.
This path describes the displacement sequence - // of entries in order to store the desired data specified by the input user - // key. The path starts from one of the locations associated with the - // specified user key and ends at a vacant space in the cuckoo array. This - // function will update the cuckoo_path. - // - // @return true if it found a cuckoo path. - bool FindCuckooPath(const char* internal_key, const Slice& user_key, - int* cuckoo_path, size_t* cuckoo_path_length, - int initial_hash_id = 0); - - // Perform quick insert by checking whether there is a vacant bucket in one - // of the possible locations of the input key. If so, then the function will - // return true and the key will be stored in that vacant bucket. - // - // This function is a helper function of FindCuckooPath that discovers the - // first possible steps of a cuckoo path. It begins by first computing - // the possible locations of the input keys (and stores them in bucket_ids.) - // Then, if one of its possible locations is vacant, then the input key will - // be stored in that vacant space and the function will return true. - // Otherwise, the function will return false indicating a complete search - // of cuckoo-path is needed. - bool QuickInsert(const char* internal_key, const Slice& user_key, - int bucket_ids[], const int initial_hash_id); - - // Returns the pointer to the internal iterator to the buckets where buckets - // are sorted according to the user specified KeyComparator. Note that - // any insert after this function call may affect the sorted nature of - // the returned iterator. - virtual MemTableRep::Iterator* GetIterator(Arena* arena) override { - std::vector<const char*> compact_buckets; - for (unsigned int bid = 0; bid < bucket_count_; ++bid) { - const char* bucket = cuckoo_array_[bid].load(std::memory_order_relaxed); - if (bucket != nullptr) { - compact_buckets.push_back(bucket); - } - } - MemTableRep* backup_table = backup_table_.get(); - if (backup_table != nullptr) { - std::unique_ptr<MemTableRep::Iterator> iter(backup_table->GetIterator()); - for (iter->SeekToFirst(); iter->Valid(); iter->Next()) { - compact_buckets.push_back(iter->key()); - } - } - if (arena == nullptr) { - return new Iterator( - std::shared_ptr<std::vector<const char*>>( - new std::vector<const char*>(std::move(compact_buckets))), - compare_); - } else { - auto mem = arena->AllocateAligned(sizeof(Iterator)); - return new (mem) Iterator( - std::shared_ptr<std::vector<const char*>>( - new std::vector<const char*>(std::move(compact_buckets))), - compare_); - } - } -}; - -void HashCuckooRep::Get(const LookupKey& key, void* callback_args, - bool (*callback_func)(void* arg, const char* entry)) { - Slice user_key = key.user_key(); - for (unsigned int hid = 0; hid < hash_function_count_; ++hid) { - const char* bucket = - cuckoo_array_[GetHash(user_key, hid)].load(std::memory_order_acquire); - if (bucket != nullptr) { - Slice bucket_user_key = UserKey(bucket); - if (user_key == bucket_user_key) { - callback_func(callback_args, bucket); - break; - } - } else { - // as Put() always stores at the vacant bucket located by the - // hash function with the smallest possible id, when we first - // find a vacant bucket in Get(), that means a miss.
- break; - } - } - MemTableRep* backup_table = backup_table_.get(); - if (backup_table != nullptr) { - backup_table->Get(key, callback_args, callback_func); - } -} - -void HashCuckooRep::Insert(KeyHandle handle) { - static const float kMaxFullness = 0.90f; - - auto* key = static_cast<char*>(handle); - int initial_hash_id = 0; - size_t cuckoo_path_length = 0; - auto user_key = UserKey(key); - // find cuckoo path - if (FindCuckooPath(key, user_key, cuckoo_path_, &cuckoo_path_length, - initial_hash_id) == false) { - // if true, then we can't find a vacant bucket for this key even we - // have used up all the hash functions. Then use a backup memtable to - // store such key, which will further make this mem-table become - // immutable. - if (backup_table_.get() == nullptr) { - VectorRepFactory factory(10); - backup_table_.reset( - factory.CreateMemTableRep(compare_, allocator_, nullptr, nullptr)); - is_nearly_full_ = true; - } - backup_table_->Insert(key); - return; - } - // when reaching this point, means the insert can be done successfully. - occupied_count_++; - if (occupied_count_ >= bucket_count_ * kMaxFullness) { - is_nearly_full_ = true; - } - - // perform kickout process if the length of cuckoo path > 1. - if (cuckoo_path_length == 0) return; - - // the cuckoo path stores the kickout path in reverse order. - // so the kickout or displacement is actually performed - // in reverse order, which avoids false-negatives on read - // by moving each key involved in the cuckoo path to the new - // location before replacing it. - for (size_t i = 1; i < cuckoo_path_length; ++i) { - int kicked_out_bid = cuckoo_path_[i - 1]; - int current_bid = cuckoo_path_[i]; - // since we only allow one writer at a time, it is safe to do relaxed read. - cuckoo_array_[kicked_out_bid] - .store(cuckoo_array_[current_bid].load(std::memory_order_relaxed), - std::memory_order_release); - } - int insert_key_bid = cuckoo_path_[cuckoo_path_length - 1]; - cuckoo_array_[insert_key_bid].store(key, std::memory_order_release); -} - -bool HashCuckooRep::Contains(const char* internal_key) const { - auto user_key = UserKey(internal_key); - for (unsigned int hid = 0; hid < hash_function_count_; ++hid) { - const char* stored_key = - cuckoo_array_[GetHash(user_key, hid)].load(std::memory_order_acquire); - if (stored_key != nullptr) { - if (compare_(internal_key, stored_key) == 0) { - return true; - } - } - } - return false; -} - -bool HashCuckooRep::QuickInsert(const char* internal_key, const Slice& user_key, - int bucket_ids[], const int initial_hash_id) { - int cuckoo_bucket_id = -1; - - // Below does the followings: - // 0. Calculate all possible locations of the input key. - // 1. Check if there is a bucket having same user_key as the input does. - // 2. If there exists such bucket, then replace this bucket by the newly - // insert data and return. This step also performs duplication check. - // 3. If no such bucket exists but exists a vacant bucket, then insert the - // input data into it. - // 4. If step 1 to 3 all fail, then return false. - for (unsigned int hid = initial_hash_id; hid < hash_function_count_; ++hid) { - bucket_ids[hid] = GetHash(user_key, hid); - // since only one PUT is allowed at a time, and this is part of the PUT - // operation, so we can safely perform relaxed load.
- const char* stored_key = - cuckoo_array_[bucket_ids[hid]].load(std::memory_order_relaxed); - if (stored_key == nullptr) { - if (cuckoo_bucket_id == -1) { - cuckoo_bucket_id = bucket_ids[hid]; - } - } else { - const auto bucket_user_key = UserKey(stored_key); - if (bucket_user_key.compare(user_key) == 0) { - cuckoo_bucket_id = bucket_ids[hid]; - assert(cuckoo_bucket_id != -1); - break; - } - } - } - - if (cuckoo_bucket_id != -1) { - cuckoo_array_[cuckoo_bucket_id].store(const_cast<char*>(internal_key), - std::memory_order_release); - return true; - } - - return false; -} - -// Perform pre-check and find the shortest cuckoo path. A cuckoo path -// is a displacement sequence for inserting the specified input key. -// -// @return true if it successfully found a vacant space or cuckoo-path. -// If the return value is true but the length of cuckoo_path is zero, -// then it indicates that a vacant bucket or an bucket with matched user -// key with the input is found, and a quick insertion is done. -bool HashCuckooRep::FindCuckooPath(const char* internal_key, - const Slice& user_key, int* cuckoo_path, - size_t* cuckoo_path_length, - const int initial_hash_id) { - int bucket_ids[HashCuckooRepFactory::kMaxHashCount]; - *cuckoo_path_length = 0; - - if (QuickInsert(internal_key, user_key, bucket_ids, initial_hash_id)) { - return true; - } - // If this step is reached, then it means: - // 1. no vacant bucket in any of the possible locations of the input key. - // 2. none of the possible locations of the input key has the same user - // key as the input `internal_key`. - - // the front and back indices for the step_queue_ - step_buffer_.reset(); - - for (unsigned int hid = initial_hash_id; hid < hash_function_count_; ++hid) { - /// CuckooStep& current_step = step_queue_[front_pos++]; - CuckooStep& current_step = step_buffer_.NextWriteBuffer(); - current_step.bucket_id_ = bucket_ids[hid]; - current_step.prev_step_id_ = CuckooStep::kNullStep; - current_step.depth_ = 1; - } - - while (step_buffer_.HasNewWrite()) { - int step_id = step_buffer_.read_index_; - const CuckooStep& step = step_buffer_.ReadNext(); - // Since it's a BFS process, then the first step with its depth deeper - // than the maximum allowed depth indicates all the remaining steps - // in the step buffer queue will all exceed the maximum depth. - // Return false immediately indicating we can't find a vacant bucket - // for the input key before the maximum allowed depth. - if (step.depth_ >= cuckoo_path_max_depth_) { - return false; - } - // again, we can perform no barrier load safely here as the current - // thread is the only writer. - Slice bucket_user_key = - UserKey(cuckoo_array_[step.bucket_id_].load(std::memory_order_relaxed)); - if (step.prev_step_id_ != CuckooStep::kNullStep) { - if (bucket_user_key == user_key) { - // then there is a loop in the current path, stop discovering this path. - continue; - } - } - // if the current bucket stores at its nth location, then we only consider - // its mth location where m > n. This property makes sure that all reads - // will not miss if we do have data associated to the query key. - // - // The n and m in the above statement is the start_hid and hid in the code. - unsigned int start_hid = hash_function_count_; - for (unsigned int hid = 0; hid < hash_function_count_; ++hid) { - bucket_ids[hid] = GetHash(bucket_user_key, hid); - if (step.bucket_id_ == bucket_ids[hid]) { - start_hid = hid; - } - } - // must found a bucket which is its current "home".
- assert(start_hid != hash_function_count_); - - // explore all possible next steps from the current step. - for (unsigned int hid = start_hid + 1; hid < hash_function_count_; ++hid) { - CuckooStep& next_step = step_buffer_.NextWriteBuffer(); - next_step.bucket_id_ = bucket_ids[hid]; - next_step.prev_step_id_ = step_id; - next_step.depth_ = step.depth_ + 1; - // once a vacant bucket is found, trace back all its previous steps - // to generate a cuckoo path. - if (cuckoo_array_[next_step.bucket_id_].load(std::memory_order_relaxed) == - nullptr) { - // store the last step in the cuckoo path. Note that cuckoo_path - // stores steps in reverse order. This allows us to move keys along - // the cuckoo path by storing each key to the new place first before - // removing it from the old place. This property ensures reads will - // not missed due to moving keys along the cuckoo path. - cuckoo_path[(*cuckoo_path_length)++] = next_step.bucket_id_; - int depth; - for (depth = step.depth_; depth > 0 && step_id != CuckooStep::kNullStep; - depth--) { - const CuckooStep& prev_step = step_buffer_.steps_[step_id]; - cuckoo_path[(*cuckoo_path_length)++] = prev_step.bucket_id_; - step_id = prev_step.prev_step_id_; - } - assert(depth == 0 && step_id == CuckooStep::kNullStep); - return true; - } - if (step_buffer_.IsFull()) { - // if true, then it reaches maxinum number of cuckoo search steps. - return false; - } - } - } - - // tried all possible paths but still not unable to find a cuckoo path - // which path leads to a vacant bucket. - return false; -} - -HashCuckooRep::Iterator::Iterator( - std::shared_ptr<std::vector<const char*>> bucket, - const KeyComparator& compare) - : bucket_(bucket), - cit_(bucket_->end()), - compare_(compare), - sorted_(false) {} - -void HashCuckooRep::Iterator::DoSort() const { - if (!sorted_) { - std::sort(bucket_->begin(), bucket_->end(), - stl_wrappers::Compare(compare_)); - cit_ = bucket_->begin(); - sorted_ = true; - } -} - -// Returns true iff the iterator is positioned at a valid node. -bool HashCuckooRep::Iterator::Valid() const { - DoSort(); - return cit_ != bucket_->end(); -} - -// Returns the key at the current position. -// REQUIRES: Valid() -const char* HashCuckooRep::Iterator::key() const { - assert(Valid()); - return *cit_; -} - -// Advances to the next position. -// REQUIRES: Valid() -void HashCuckooRep::Iterator::Next() { - assert(Valid()); - if (cit_ == bucket_->end()) { - return; - } - ++cit_; -} - -// Advances to the previous position. -// REQUIRES: Valid() -void HashCuckooRep::Iterator::Prev() { - assert(Valid()); - if (cit_ == bucket_->begin()) { - // If you try to go back from the first element, the iterator should be - // invalidated. So we set it to past-the-end. This means that you can - // treat the container circularly. - cit_ = bucket_->end(); - } else { - --cit_; - } -} - -// Advance to the first entry with a key >= target -void HashCuckooRep::Iterator::Seek(const Slice& user_key, - const char* memtable_key) { - DoSort(); - // Do binary search to find first value not less than the target - const char* encoded_key = - (memtable_key != nullptr) ? memtable_key : EncodeKey(&tmp_, user_key); - cit_ = std::equal_range(bucket_->begin(), bucket_->end(), encoded_key, - [this](const char* a, const char* b) { - return compare_(a, b) < 0; - }).first; -} - -// Retreat to the last entry with a key <= target -void HashCuckooRep::Iterator::SeekForPrev(const Slice& /*user_key*/, - const char* /*memtable_key*/) { - assert(false); -} - -// Position at the first entry in collection.
-// Final state of iterator is Valid() iff collection is not empty. -void HashCuckooRep::Iterator::SeekToFirst() { - DoSort(); - cit_ = bucket_->begin(); -} - -// Position at the last entry in collection. -// Final state of iterator is Valid() iff collection is not empty. -void HashCuckooRep::Iterator::SeekToLast() { - DoSort(); - cit_ = bucket_->end(); - if (bucket_->size() != 0) { - --cit_; - } -} - -} // anom namespace - -MemTableRep* HashCuckooRepFactory::CreateMemTableRep( - const MemTableRep::KeyComparator& compare, Allocator* allocator, - const SliceTransform* /*transform*/, Logger* /*logger*/) { - // The estimated average fullness. The write performance of any close hash - // degrades as the fullness of the mem-table increases. Setting kFullness - // to a value around 0.7 can better avoid write performance degradation while - // keeping efficient memory usage. - static const float kFullness = 0.7f; - size_t pointer_size = sizeof(std::atomic<char*>); - assert(write_buffer_size_ >= (average_data_size_ + pointer_size)); - size_t bucket_count = - static_cast<size_t>( - (write_buffer_size_ / (average_data_size_ + pointer_size)) / kFullness + - 1); - unsigned int hash_function_count = hash_function_count_; - if (hash_function_count < 2) { - hash_function_count = 2; - } - if (hash_function_count > kMaxHashCount) { - hash_function_count = kMaxHashCount; - } - return new HashCuckooRep(compare, allocator, bucket_count, - hash_function_count, - static_cast<size_t>( - (average_data_size_ + pointer_size) / kFullness) - ); -} - -MemTableRepFactory* NewHashCuckooRepFactory(size_t write_buffer_size, - size_t average_data_size, - unsigned int hash_function_count) { - return new HashCuckooRepFactory(write_buffer_size, average_data_size, - hash_function_count); -} - -} // namespace rocksdb -#endif // ROCKSDB_LITE
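For readers unfamiliar with the memtable being removed here: `HashCuckooRep::Insert`/`FindCuckooPath` above implement classic cuckoo hashing, where a key with no vacant candidate bucket evicts a resident, which in turn moves to one of its own alternative buckets, and so on until a vacancy is found. A schematic, greatly simplified sketch of that displacement idea; this is illustrative code with hypothetical names, not RocksDB code, and the real implementation instead searches the path breadth-first and moves keys in reverse order so that concurrent readers never miss a key:

```java
// Illustrative cuckoo insertion with bounded displacement.
public final class MiniCuckooTable {
  private static final int MAX_KICKS = 10; // cf. kDefaultCuckooPathMaxDepth
  private final String[] buckets;

  public MiniCuckooTable(final int capacity) {
    buckets = new String[capacity];
  }

  // Two simple seeded hash functions standing in for the Murmur variants.
  private int hash(final String key, final int seed) {
    return Math.floorMod(key.hashCode() * 31 + seed, buckets.length);
  }

  // Returns false when no vacant bucket is reachable within MAX_KICKS;
  // HashCuckooRep handles that case by spilling into a backup memtable
  // and marking itself nearly full.
  public boolean insert(String key) {
    for (int kick = 0; kick < MAX_KICKS; kick++) {
      for (int seed = 0; seed < 2; seed++) {
        final int b = hash(key, seed);
        if (buckets[b] == null) {
          buckets[b] = key;
          return true;
        }
      }
      // Every candidate bucket is occupied: evict one resident and retry
      // with the evicted key, i.e. one step along a "cuckoo path".
      final int victim = hash(key, kick % 2);
      final String displaced = buckets[victim];
      buckets[victim] = key;
      key = displaced;
    }
    return false;
  }
}
```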
diff --git a/ceph/src/rocksdb/memtable/hash_cuckoo_rep.h b/ceph/src/rocksdb/memtable/hash_cuckoo_rep.h deleted file mode 100644 index 800696e93..000000000 --- a/ceph/src/rocksdb/memtable/hash_cuckoo_rep.h +++ /dev/null @@ -1,44 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -// Copyright (c) 2011 The LevelDB Authors. All rights reserved. -// Use of this source code is governed by a BSD-style license that can be -// found in the LICENSE file. See the AUTHORS file for names of contributors. - -#pragma once -#ifndef ROCKSDB_LITE -#include "port/port.h" -#include "rocksdb/slice_transform.h" -#include "rocksdb/memtablerep.h" - -namespace rocksdb { - -class HashCuckooRepFactory : public MemTableRepFactory { - public: - // maxinum number of hash functions used in the cuckoo hash. - static const unsigned int kMaxHashCount = 10; - - explicit HashCuckooRepFactory(size_t write_buffer_size, - size_t average_data_size, - unsigned int hash_function_count) - : write_buffer_size_(write_buffer_size), - average_data_size_(average_data_size), - hash_function_count_(hash_function_count) {} - - virtual ~HashCuckooRepFactory() {} - - using MemTableRepFactory::CreateMemTableRep; - virtual MemTableRep* CreateMemTableRep( - const MemTableRep::KeyComparator& compare, Allocator* allocator, - const SliceTransform* transform, Logger* logger) override; - - virtual const char* Name() const override { return "HashCuckooRepFactory"; } - - private: - size_t write_buffer_size_; - size_t average_data_size_; - const unsigned int hash_function_count_; -}; -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/memtable/hash_linklist_rep.cc b/ceph/src/rocksdb/memtable/hash_linklist_rep.cc index b23a9f5e5..878d23383 100644 --- a/ceph/src/rocksdb/memtable/hash_linklist_rep.cc +++ b/ceph/src/rocksdb/memtable/hash_linklist_rep.cc @@ -17,7 +17,7 @@ #include "rocksdb/slice.h" #include "rocksdb/slice_transform.h" #include "util/arena.h" -#include "util/murmurhash.h" +#include "util/hash.h" namespace rocksdb { namespace { @@ -168,24 +168,23 @@ class HashLinkListRep : public MemTableRep { int bucket_entries_logging_threshold, bool if_log_bucket_dist_when_flash); - virtual KeyHandle Allocate(const size_t len, char** buf) override; + KeyHandle Allocate(const size_t len, char** buf) override; - virtual void Insert(KeyHandle handle) override; + void Insert(KeyHandle handle) override; - virtual bool Contains(const char* key) const override; + bool Contains(const char* key) const override; - virtual size_t ApproximateMemoryUsage() override; + size_t ApproximateMemoryUsage() override; - virtual void Get(const LookupKey& k, void* callback_args, - bool (*callback_func)(void* arg, - const char* entry)) override; + void Get(const LookupKey& k, void* callback_args, + bool (*callback_func)(void* arg, const char* entry)) override; - virtual ~HashLinkListRep(); + ~HashLinkListRep() override; - virtual MemTableRep::Iterator* GetIterator(Arena* arena = nullptr) override; + MemTableRep::Iterator* GetIterator(Arena* arena = nullptr) override; - virtual MemTableRep::Iterator* GetDynamicPrefixIterator( - Arena* arena = nullptr) override; + MemTableRep::Iterator* GetDynamicPrefixIterator( + Arena* arena = nullptr) override; private: friend class DynamicIterator; @@ -219,7 +218,7 @@ class HashLinkListRep : public MemTableRep { } size_t GetHash(const Slice& slice) const { - return MurmurHash(slice.data(), static_cast<int>(slice.size()), 0) % + return NPHash64(slice.data(), static_cast<size_t>(slice.size()), 0) % bucket_size_; } @@ -265,36 +264,34 @@ class HashLinkListRep : public MemTableRep { explicit FullListIterator(MemtableSkipList* list, Allocator* allocator) : iter_(list), full_list_(list), allocator_(allocator) {} - virtual ~FullListIterator() { - } + ~FullListIterator() override {} // Returns true iff the iterator is positioned at a valid node. - virtual bool Valid() const override { return iter_.Valid(); } + bool Valid() const override { return iter_.Valid(); } // Returns the key at the current position. // REQUIRES: Valid() - virtual const char* key() const override { + const char* key() const override { assert(Valid()); return iter_.key(); } // Advances to the next position.
// REQUIRES: Valid() - virtual void Next() override { + void Next() override { assert(Valid()); iter_.Next(); } // Advances to the previous position. // REQUIRES: Valid() - virtual void Prev() override { + void Prev() override { assert(Valid()); iter_.Prev(); } // Advance to the first entry with a key >= target - virtual void Seek(const Slice& internal_key, - const char* memtable_key) override { + void Seek(const Slice& internal_key, const char* memtable_key) override { const char* encoded_key = (memtable_key != nullptr) ? memtable_key : EncodeKey(&tmp_, internal_key); @@ -302,8 +299,8 @@ class HashLinkListRep : public MemTableRep { } // Retreat to the last entry with a key <= target - virtual void SeekForPrev(const Slice& internal_key, - const char* memtable_key) override { + void SeekForPrev(const Slice& internal_key, + const char* memtable_key) override { const char* encoded_key = (memtable_key != nullptr) ? memtable_key : EncodeKey(&tmp_, internal_key); @@ -312,11 +309,12 @@ class HashLinkListRep : public MemTableRep { // Position at the first entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToFirst() override { iter_.SeekToFirst(); } + void SeekToFirst() override { iter_.SeekToFirst(); } // Position at the last entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToLast() override { iter_.SeekToLast(); } + void SeekToLast() override { iter_.SeekToLast(); } + private: MemtableSkipList::Iterator iter_; // To destruct with the iterator. @@ -333,43 +331,43 @@ class HashLinkListRep : public MemTableRep { head_(head), node_(nullptr) {} - virtual ~LinkListIterator() {} + ~LinkListIterator() override {} // Returns true iff the iterator is positioned at a valid node. - virtual bool Valid() const override { return node_ != nullptr; } + bool Valid() const override { return node_ != nullptr; } // Returns the key at the current position. // REQUIRES: Valid() - virtual const char* key() const override { + const char* key() const override { assert(Valid()); return node_->key; } // Advances to the next position. // REQUIRES: Valid() - virtual void Next() override { + void Next() override { assert(Valid()); node_ = node_->Next(); } // Advances to the previous position. // REQUIRES: Valid() - virtual void Prev() override { + void Prev() override { // Prefix iterator does not support total order. // We simply set the iterator to invalid state Reset(nullptr); } // Advance to the first entry with a key >= target - virtual void Seek(const Slice& internal_key, - const char* /*memtable_key*/) override { + void Seek(const Slice& internal_key, + const char* /*memtable_key*/) override { node_ = hash_link_list_rep_->FindGreaterOrEqualInBucket(head_, internal_key); } // Retreat to the last entry with a key <= target - virtual void SeekForPrev(const Slice& /*internal_key*/, - const char* /*memtable_key*/) override { + void SeekForPrev(const Slice& /*internal_key*/, + const char* /*memtable_key*/) override { // Since we do not support Prev() // We simply do not support SeekForPrev Reset(nullptr); @@ -377,7 +375,7 @@ class HashLinkListRep : public MemTableRep { // Position at the first entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToFirst() override { + void SeekToFirst() override { // Prefix iterator does not support total order. 
// We simply set the iterator to invalid state Reset(nullptr); @@ -385,7 +383,7 @@ class HashLinkListRep : public MemTableRep { // Position at the last entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToLast() override { + void SeekToLast() override { // Prefix iterator does not support total order. // We simply set the iterator to invalid state Reset(nullptr); @@ -414,7 +412,7 @@ class HashLinkListRep : public MemTableRep { memtable_rep_(memtable_rep) {} // Advance to the first entry with a key >= target - virtual void Seek(const Slice& k, const char* memtable_key) override { + void Seek(const Slice& k, const char* memtable_key) override { auto transformed = memtable_rep_.GetPrefix(k); auto* bucket = memtable_rep_.GetBucket(transformed); @@ -443,21 +441,21 @@ class HashLinkListRep : public MemTableRep { } } - virtual bool Valid() const override { + bool Valid() const override { if (skip_list_iter_) { return skip_list_iter_->Valid(); } return HashLinkListRep::LinkListIterator::Valid(); } - virtual const char* key() const override { + const char* key() const override { if (skip_list_iter_) { return skip_list_iter_->key(); } return HashLinkListRep::LinkListIterator::key(); } - virtual void Next() override { + void Next() override { if (skip_list_iter_) { skip_list_iter_->Next(); } else { @@ -476,19 +474,19 @@ class HashLinkListRep : public MemTableRep { // instantiating an empty bucket over which to iterate. public: EmptyIterator() { } - virtual bool Valid() const override { return false; } - virtual const char* key() const override { + bool Valid() const override { return false; } + const char* key() const override { assert(false); return nullptr; } - virtual void Next() override {} - virtual void Prev() override {} - virtual void Seek(const Slice& /*user_key*/, - const char* /*memtable_key*/) override {} - virtual void SeekForPrev(const Slice& /*user_key*/, - const char* /*memtable_key*/) override {} - virtual void SeekToFirst() override {} - virtual void SeekToLast() override {} + void Next() override {} + void Prev() override {} + void Seek(const Slice& /*user_key*/, + const char* /*memtable_key*/) override {} + void SeekForPrev(const Slice& /*user_key*/, + const char* /*memtable_key*/) override {} + void SeekToFirst() override {} + void SeekToLast() override {} private: }; diff --git a/ceph/src/rocksdb/memtable/hash_skiplist_rep.cc b/ceph/src/rocksdb/memtable/hash_skiplist_rep.cc index 93082b1ec..d02919cd4 100644 --- a/ceph/src/rocksdb/memtable/hash_skiplist_rep.cc +++ b/ceph/src/rocksdb/memtable/hash_skiplist_rep.cc @@ -28,21 +28,20 @@ class HashSkipListRep : public MemTableRep { size_t bucket_size, int32_t skiplist_height, int32_t skiplist_branching_factor); - virtual void Insert(KeyHandle handle) override; + void Insert(KeyHandle handle) override; - virtual bool Contains(const char* key) const override; + bool Contains(const char* key) const override; - virtual size_t ApproximateMemoryUsage() override; + size_t ApproximateMemoryUsage() override; - virtual void Get(const LookupKey& k, void* callback_args, - bool (*callback_func)(void* arg, - const char* entry)) override; + void Get(const LookupKey& k, void* callback_args, + bool (*callback_func)(void* arg, const char* entry)) override; - virtual ~HashSkipListRep(); + ~HashSkipListRep() override; - virtual MemTableRep::Iterator* GetIterator(Arena* arena = nullptr) override; + MemTableRep::Iterator* GetIterator(Arena* arena = nullptr) override; - virtual MemTableRep::Iterator* 
GetDynamicPrefixIterator( + MemTableRep::Iterator* GetDynamicPrefixIterator( Arena* arena = nullptr) override; private: @@ -85,7 +84,7 @@ class HashSkipListRep : public MemTableRep { Arena* arena = nullptr) : list_(list), iter_(list), own_list_(own_list), arena_(arena) {} - virtual ~Iterator() { + ~Iterator() override { // if we own the list, we should also delete it if (own_list_) { assert(list_ != nullptr); @@ -94,34 +93,31 @@ } // Returns true iff the iterator is positioned at a valid node. - virtual bool Valid() const override { - return list_ != nullptr && iter_.Valid(); - } + bool Valid() const override { return list_ != nullptr && iter_.Valid(); } // Returns the key at the current position. // REQUIRES: Valid() - virtual const char* key() const override { + const char* key() const override { assert(Valid()); return iter_.key(); } // Advances to the next position. // REQUIRES: Valid() - virtual void Next() override { + void Next() override { assert(Valid()); iter_.Next(); } // Advances to the previous position. // REQUIRES: Valid() - virtual void Prev() override { + void Prev() override { assert(Valid()); iter_.Prev(); } // Advance to the first entry with a key >= target - virtual void Seek(const Slice& internal_key, - const char* memtable_key) override { + void Seek(const Slice& internal_key, const char* memtable_key) override { if (list_ != nullptr) { const char* encoded_key = (memtable_key != nullptr) ? @@ -131,15 +127,15 @@ } // Retreat to the last entry with a key <= target - virtual void SeekForPrev(const Slice& /*internal_key*/, - const char* /*memtable_key*/) override { + void SeekForPrev(const Slice& /*internal_key*/, + const char* /*memtable_key*/) override { // not supported assert(false); } // Position at the first entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToFirst() override { + void SeekToFirst() override { if (list_ != nullptr) { iter_.SeekToFirst(); } @@ -147,11 +143,12 @@ } // Position at the last entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToLast() override { + void SeekToLast() override { if (list_ != nullptr) { iter_.SeekToLast(); } } + protected: void Reset(Bucket* list) { if (own_list_) { @@ -168,7 +165,7 @@ Bucket* list_; Bucket::Iterator iter_; // here we track if we own list_. If we own it, we are also - // responsible for it's cleaning. This is a poor man's shared_ptr + // responsible for it's cleaning. This is a poor man's std::shared_ptr bool own_list_; std::unique_ptr<Arena> arena_; std::string tmp_; // For passing to EncodeKey @@ -181,7 +178,7 @@ memtable_rep_(memtable_rep) {} // Advance to the first entry with a key >= target - virtual void Seek(const Slice& k, const char* memtable_key) override { + void Seek(const Slice& k, const char* memtable_key) override { auto transformed = memtable_rep_.transform_->Transform(ExtractUserKey(k)); Reset(memtable_rep_.GetBucket(transformed)); HashSkipListRep::Iterator::Seek(k, memtable_key); @@ -189,7 +186,7 @@ // Position at the first entry in collection. // Final state of iterator is Valid() iff collection is not empty.
- virtual void SeekToFirst() override { + void SeekToFirst() override { // Prefix iterator does not support total order. // We simply set the iterator to invalid state Reset(nullptr); @@ -197,11 +194,12 @@ } // Position at the last entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToLast() override { + void SeekToLast() override { // Prefix iterator does not support total order. // We simply set the iterator to invalid state Reset(nullptr); } + private: // the underlying memtable const HashSkipListRep& memtable_rep_; @@ -212,19 +210,19 @@ // instantiating an empty bucket over which to iterate. public: EmptyIterator() { } - virtual bool Valid() const override { return false; } - virtual const char* key() const override { + bool Valid() const override { return false; } + const char* key() const override { assert(false); return nullptr; } - virtual void Next() override {} - virtual void Prev() override {} - virtual void Seek(const Slice& /*internal_key*/, - const char* /*memtable_key*/) override {} - virtual void SeekForPrev(const Slice& /*internal_key*/, - const char* /*memtable_key*/) override {} - virtual void SeekToFirst() override {} - virtual void SeekToLast() override {} + void Next() override {} + void Prev() override {} + void Seek(const Slice& /*internal_key*/, + const char* /*memtable_key*/) override {} + void SeekForPrev(const Slice& /*internal_key*/, + const char* /*memtable_key*/) override {} + void SeekToFirst() override {} + void SeekToLast() override {} private: }; diff --git a/ceph/src/rocksdb/memtable/memtablerep_bench.cc b/ceph/src/rocksdb/memtable/memtablerep_bench.cc index 0c74bb61d..51ff11a01 100644 --- a/ceph/src/rocksdb/memtable/memtablerep_bench.cc +++ b/ceph/src/rocksdb/memtable/memtablerep_bench.cc @@ -95,17 +95,8 @@ DEFINE_int32( threshold_use_skiplist, 256, "threshold_use_skiplist parameter to pass into NewHashLinkListRepFactory"); -DEFINE_int64( - write_buffer_size, 256, - "write_buffer_size parameter to pass into NewHashCuckooRepFactory"); - -DEFINE_int64( - average_data_size, 64, - "average_data_size parameter to pass into NewHashCuckooRepFactory"); - -DEFINE_int64( - hash_function_count, 4, - "hash_function_count parameter to pass into NewHashCuckooRepFactory"); +DEFINE_int64(write_buffer_size, 256, + "write_buffer_size parameter to pass into WriteBufferManager"); DEFINE_int32( num_threads, 1, @@ -607,12 +598,6 @@ int main(int argc, char** argv) { FLAGS_if_log_bucket_dist_when_flash, FLAGS_threshold_use_skiplist)); options.prefix_extractor.reset( rocksdb::NewFixedPrefixTransform(FLAGS_prefix_length)); - } else if (FLAGS_memtablerep == "cuckoo") { - factory.reset(rocksdb::NewHashCuckooRepFactory( - FLAGS_write_buffer_size, FLAGS_average_data_size, - static_cast<unsigned int>(FLAGS_hash_function_count))); - options.prefix_extractor.reset( - rocksdb::NewFixedPrefixTransform(FLAGS_prefix_length)); #endif // ROCKSDB_LITE } else { fprintf(stdout, "Unknown memtablerep: %s\n", FLAGS_memtablerep.c_str()); diff --git a/ceph/src/rocksdb/memtable/skiplistrep.cc b/ceph/src/rocksdb/memtable/skiplistrep.cc index 1e56e1a98..32870b127 100644 --- a/ceph/src/rocksdb/memtable/skiplistrep.cc +++ b/ceph/src/rocksdb/memtable/skiplistrep.cc @@ -27,57 +27,55 @@ public: transform_(transform), lookahead_(lookahead) {} - virtual KeyHandle Allocate(const size_t len, char** buf) override { + KeyHandle Allocate(const size_t len, char** buf) override { *buf =
skip_list_.AllocateKey(len); return static_cast<KeyHandle>(*buf); - } + } // Insert key into the list. // REQUIRES: nothing that compares equal to key is currently in the list. - virtual void Insert(KeyHandle handle) override { - skip_list_.Insert(static_cast<char*>(handle)); - } + void Insert(KeyHandle handle) override { + skip_list_.Insert(static_cast<char*>(handle)); + } - virtual bool InsertKey(KeyHandle handle) override { - return skip_list_.Insert(static_cast<char*>(handle)); - } + bool InsertKey(KeyHandle handle) override { + return skip_list_.Insert(static_cast<char*>(handle)); + } - virtual void InsertWithHint(KeyHandle handle, void** hint) override { - skip_list_.InsertWithHint(static_cast<char*>(handle), hint); - } + void InsertWithHint(KeyHandle handle, void** hint) override { + skip_list_.InsertWithHint(static_cast<char*>(handle), hint); + } - virtual bool InsertKeyWithHint(KeyHandle handle, void** hint) override { - return skip_list_.InsertWithHint(static_cast<char*>(handle), hint); - } + bool InsertKeyWithHint(KeyHandle handle, void** hint) override { + return skip_list_.InsertWithHint(static_cast<char*>(handle), hint); + } - virtual void InsertConcurrently(KeyHandle handle) override { - skip_list_.InsertConcurrently(static_cast<char*>(handle)); - } + void InsertConcurrently(KeyHandle handle) override { + skip_list_.InsertConcurrently(static_cast<char*>(handle)); + } - virtual bool InsertKeyConcurrently(KeyHandle handle) override { - return skip_list_.InsertConcurrently(static_cast<char*>(handle)); - } + bool InsertKeyConcurrently(KeyHandle handle) override { + return skip_list_.InsertConcurrently(static_cast<char*>(handle)); + } // Returns true iff an entry that compares equal to key is in the list. - virtual bool Contains(const char* key) const override { - return skip_list_.Contains(key); - } - - virtual size_t ApproximateMemoryUsage() override { - // All memory is allocated through allocator; nothing to report here - return 0; - } - - virtual void Get(const LookupKey& k, void* callback_args, - bool (*callback_func)(void* arg, - const char* entry)) override { - SkipListRep::Iterator iter(&skip_list_); - Slice dummy_slice; - for (iter.Seek(dummy_slice, k.memtable_key().data()); - iter.Valid() && callback_func(callback_args, iter.key()); - iter.Next()) { - } - } + bool Contains(const char* key) const override { + return skip_list_.Contains(key); + } + + size_t ApproximateMemoryUsage() override { + // All memory is allocated through allocator; nothing to report here + return 0; + } + + void Get(const LookupKey& k, void* callback_args, + bool (*callback_func)(void* arg, const char* entry)) override { + SkipListRep::Iterator iter(&skip_list_); + Slice dummy_slice; + for (iter.Seek(dummy_slice, k.memtable_key().data()); + iter.Valid() && callback_func(callback_args, iter.key()); iter.Next()) { + } + } uint64_t ApproximateNumEntries(const Slice& start_ikey, const Slice& end_ikey) override { @@ -88,7 +86,7 @@ public: return (end_count >= start_count) ? (end_count - start_count) : 0; } - virtual ~SkipListRep() override { } + ~SkipListRep() override {} // Iteration over the contents of a skip list class Iterator : public MemTableRep::Iterator { @@ -101,34 +99,25 @@ public: const InlineSkipList<const MemTableRep::KeyComparator&>* list) : iter_(list) {} - virtual ~Iterator() override { } + ~Iterator() override {} // Returns true iff the iterator is positioned at a valid node. - virtual bool Valid() const override { - return iter_.Valid(); - } + bool Valid() const override { return iter_.Valid(); } // Returns the key at the current position. 
// REQUIRES: Valid() - virtual const char* key() const override { - return iter_.key(); - } + const char* key() const override { return iter_.key(); } // Advances to the next position. // REQUIRES: Valid() - virtual void Next() override { - iter_.Next(); - } + void Next() override { iter_.Next(); } // Advances to the previous position. // REQUIRES: Valid() - virtual void Prev() override { - iter_.Prev(); - } + void Prev() override { iter_.Prev(); } // Advance to the first entry with a key >= target - virtual void Seek(const Slice& user_key, const char* memtable_key) - override { + void Seek(const Slice& user_key, const char* memtable_key) override { if (memtable_key != nullptr) { iter_.Seek(memtable_key); } else { @@ -137,8 +126,7 @@ public: } // Retreat to the last entry with a key <= target - virtual void SeekForPrev(const Slice& user_key, - const char* memtable_key) override { + void SeekForPrev(const Slice& user_key, const char* memtable_key) override { if (memtable_key != nullptr) { iter_.SeekForPrev(memtable_key); } else { @@ -148,15 +136,12 @@ public: // Position at the first entry in list. // Final state of iterator is Valid() iff list is not empty. - virtual void SeekToFirst() override { - iter_.SeekToFirst(); - } + void SeekToFirst() override { iter_.SeekToFirst(); } // Position at the last entry in list. // Final state of iterator is Valid() iff list is not empty. - virtual void SeekToLast() override { - iter_.SeekToLast(); - } + void SeekToLast() override { iter_.SeekToLast(); } + protected: std::string tmp_; // For passing to EncodeKey }; @@ -170,18 +155,16 @@ public: explicit LookaheadIterator(const SkipListRep& rep) : rep_(rep), iter_(&rep_.skip_list_), prev_(iter_) {} - virtual ~LookaheadIterator() override {} + ~LookaheadIterator() override {} - virtual bool Valid() const override { - return iter_.Valid(); - } + bool Valid() const override { return iter_.Valid(); } - virtual const char *key() const override { + const char* key() const override { assert(Valid()); return iter_.key(); } - virtual void Next() override { + void Next() override { assert(Valid()); bool advance_prev = true; @@ -206,14 +189,13 @@ public: iter_.Next(); } - virtual void Prev() override { + void Prev() override { assert(Valid()); iter_.Prev(); prev_ = iter_; } - virtual void Seek(const Slice& internal_key, const char *memtable_key) - override { + void Seek(const Slice& internal_key, const char* memtable_key) override { const char *encoded_key = (memtable_key != nullptr) ? memtable_key : EncodeKey(&tmp_, internal_key); @@ -236,8 +218,8 @@ public: prev_ = iter_; } - virtual void SeekForPrev(const Slice& internal_key, - const char* memtable_key) override { + void SeekForPrev(const Slice& internal_key, + const char* memtable_key) override { const char* encoded_key = (memtable_key != nullptr) ? memtable_key : EncodeKey(&tmp_, internal_key); @@ -245,12 +227,12 @@ public: prev_ = iter_; } - virtual void SeekToFirst() override { + void SeekToFirst() override { iter_.SeekToFirst(); prev_ = iter_; } - virtual void SeekToLast() override { + void SeekToLast() override { iter_.SeekToLast(); prev_ = iter_; } @@ -264,7 +246,7 @@ public: InlineSkipList::Iterator prev_; }; - virtual MemTableRep::Iterator* GetIterator(Arena* arena = nullptr) override { + MemTableRep::Iterator* GetIterator(Arena* arena = nullptr) override { if (lookahead_ > 0) { void *mem = arena ? 
arena->AllocateAligned(sizeof(SkipListRep::LookaheadIterator)) diff --git a/ceph/src/rocksdb/memtable/stl_wrappers.h b/ceph/src/rocksdb/memtable/stl_wrappers.h index 19fa15148..0287f4f8f 100644 --- a/ceph/src/rocksdb/memtable/stl_wrappers.h +++ b/ceph/src/rocksdb/memtable/stl_wrappers.h @@ -11,7 +11,6 @@ #include "rocksdb/memtablerep.h" #include "rocksdb/slice.h" #include "util/coding.h" -#include "util/murmurhash.h" namespace rocksdb { namespace stl_wrappers { diff --git a/ceph/src/rocksdb/memtable/vectorrep.cc b/ceph/src/rocksdb/memtable/vectorrep.cc index 378b29624..827ab8a5d 100644 --- a/ceph/src/rocksdb/memtable/vectorrep.cc +++ b/ceph/src/rocksdb/memtable/vectorrep.cc @@ -31,20 +31,19 @@ class VectorRep : public MemTableRep { // single buffer and pass that in as the parameter to Insert) // REQUIRES: nothing that compares equal to key is currently in the // collection. - virtual void Insert(KeyHandle handle) override; + void Insert(KeyHandle handle) override; // Returns true iff an entry that compares equal to key is in the collection. - virtual bool Contains(const char* key) const override; + bool Contains(const char* key) const override; - virtual void MarkReadOnly() override; + void MarkReadOnly() override; - virtual size_t ApproximateMemoryUsage() override; + size_t ApproximateMemoryUsage() override; - virtual void Get(const LookupKey& k, void* callback_args, - bool (*callback_func)(void* arg, - const char* entry)) override; + void Get(const LookupKey& k, void* callback_args, + bool (*callback_func)(void* arg, const char* entry)) override; - virtual ~VectorRep() override { } + ~VectorRep() override {} class Iterator : public MemTableRep::Iterator { class VectorRep* vrep_; @@ -62,41 +61,40 @@ class VectorRep : public MemTableRep { // Initialize an iterator over the specified collection. // The returned iterator is not valid. // explicit Iterator(const MemTableRep* collection); - virtual ~Iterator() override { }; + ~Iterator() override{}; // Returns true iff the iterator is positioned at a valid node. - virtual bool Valid() const override; + bool Valid() const override; // Returns the key at the current position. // REQUIRES: Valid() - virtual const char* key() const override; + const char* key() const override; // Advances to the next position. // REQUIRES: Valid() - virtual void Next() override; + void Next() override; // Advances to the previous position. // REQUIRES: Valid() - virtual void Prev() override; + void Prev() override; // Advance to the first entry with a key >= target - virtual void Seek(const Slice& user_key, const char* memtable_key) override; + void Seek(const Slice& user_key, const char* memtable_key) override; // Advance to the first entry with a key <= target - virtual void SeekForPrev(const Slice& user_key, - const char* memtable_key) override; + void SeekForPrev(const Slice& user_key, const char* memtable_key) override; // Position at the first entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToFirst() override; + void SeekToFirst() override; // Position at the last entry in collection. // Final state of iterator is Valid() iff collection is not empty. - virtual void SeekToLast() override; + void SeekToLast() override; }; // Return an iterator over the keys in this representation. 
- virtual MemTableRep::Iterator* GetIterator(Arena* arena) override; + MemTableRep::Iterator* GetIterator(Arena* arena) override; private: friend class Iterator; diff --git a/ceph/src/rocksdb/memtable/write_buffer_manager.cc b/ceph/src/rocksdb/memtable/write_buffer_manager.cc index 21b18c8f7..7f2e664ab 100644 --- a/ceph/src/rocksdb/memtable/write_buffer_manager.cc +++ b/ceph/src/rocksdb/memtable/write_buffer_manager.cc @@ -79,7 +79,7 @@ WriteBufferManager::~WriteBufferManager() { void WriteBufferManager::ReserveMemWithCache(size_t mem) { #ifndef ROCKSDB_LITE assert(cache_rep_ != nullptr); - // Use a mutex to protect various data structures. Can be optimzied to a + // Use a mutex to protect various data structures. Can be optimized to a // lock-free solution if it ends up with a performance bottleneck. std::lock_guard<std::mutex> lock(cache_rep_->cache_mutex_); @@ -102,14 +102,14 @@ void WriteBufferManager::ReserveMemWithCache(size_t mem) { void WriteBufferManager::FreeMemWithCache(size_t mem) { #ifndef ROCKSDB_LITE assert(cache_rep_ != nullptr); - // Use a mutex to protect various data structures. Can be optimzied to a + // Use a mutex to protect various data structures. Can be optimized to a // lock-free solution if it ends up with a performance bottleneck. std::lock_guard<std::mutex> lock(cache_rep_->cache_mutex_); size_t new_mem_used = memory_used_.load(std::memory_order_relaxed) - mem; memory_used_.store(new_mem_used, std::memory_order_relaxed); // Gradually shrink memory costed in the block cache if the actual // usage is less than 3/4 of what we reserve from the block cache. - // We do this becausse: + // We do this because: // 1. we don't pay the cost of the block cache immediately a memtable is // freed, as block cache insert is expensive; // 2. eventually, if we walk away from a temporary memtable size increase, diff --git a/ceph/src/rocksdb/monitoring/histogram.cc b/ceph/src/rocksdb/monitoring/histogram.cc index d5ac07c90..4bc7139d3 100644 --- a/ceph/src/rocksdb/monitoring/histogram.cc +++ b/ceph/src/rocksdb/monitoring/histogram.cc @@ -237,6 +237,7 @@ void HistogramStat::Data(HistogramData * const data) const { data->standard_deviation = StandardDeviation(); data->count = num(); data->sum = sum(); + data->min = static_cast<double>(min()); } void HistogramImpl::Clear() { diff --git a/ceph/src/rocksdb/monitoring/instrumented_mutex.cc b/ceph/src/rocksdb/monitoring/instrumented_mutex.cc index 2255b35ae..7b61bcf4f 100644 --- a/ceph/src/rocksdb/monitoring/instrumented_mutex.cc +++ b/ceph/src/rocksdb/monitoring/instrumented_mutex.cc @@ -12,7 +12,7 @@ namespace rocksdb { namespace { Statistics* stats_for_report(Env* env, Statistics* stats) { if (env != nullptr && stats != nullptr && - stats->stats_level_ > kExceptTimeForMutex) { + stats->get_stats_level() > kExceptTimeForMutex) { return stats; } else { return nullptr; diff --git a/ceph/src/rocksdb/monitoring/iostats_context_imp.h b/ceph/src/rocksdb/monitoring/iostats_context_imp.h index 8af64f1fa..23c2088ca 100644 --- a/ceph/src/rocksdb/monitoring/iostats_context_imp.h +++ b/ceph/src/rocksdb/monitoring/iostats_context_imp.h @@ -37,6 +37,13 @@ extern __thread IOStatsContext iostats_context; PerfStepTimer iostats_step_timer_##metric(&(iostats_context.metric)); \ iostats_step_timer_##metric.Start(); +// Declare and set start time of the timer +#define IOSTATS_CPU_TIMER_GUARD(metric, env) \ + PerfStepTimer iostats_step_timer_##metric( \ + &(iostats_context.metric), env, true, \ + PerfLevel::kEnableTimeAndCPUTimeExceptForMutex); \ + iostats_step_timer_##metric.Start(); + 
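// Editor's illustrative sketch (not part of the patch): how the new
// IOSTATS_CPU_TIMER_GUARD macro above is meant to be used. The counter name
// `cpu_write_nanos` and the surrounding function are assumptions for
// illustration only.
//
//   void TimedWrite(rocksdb::Env* env) {
//     // Charges CPU time (via Env::NowCPUNanos) rather than wall-clock time
//     // to iostats_context.cpu_write_nanos. The guard adds the elapsed CPU
//     // nanoseconds when it goes out of scope, and only records anything if
//     // the current perf_level is at least
//     // PerfLevel::kEnableTimeAndCPUTimeExceptForMutex.
//     IOSTATS_CPU_TIMER_GUARD(cpu_write_nanos, env);
//     // ... perform the write ...
//   }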
#else // ROCKSDB_SUPPORT_THREAD_LOCAL #define IOSTATS_ADD(metric, value) @@ -48,5 +55,6 @@ extern __thread IOStatsContext iostats_context; #define IOSTATS(metric) 0 #define IOSTATS_TIMER_GUARD(metric) +#define IOSTATS_CPU_TIMER_GUARD(metric, env) #endif // ROCKSDB_SUPPORT_THREAD_LOCAL diff --git a/ceph/src/rocksdb/monitoring/perf_context.cc b/ceph/src/rocksdb/monitoring/perf_context.cc index 9bba841f8..40b0b215c 100644 --- a/ceph/src/rocksdb/monitoring/perf_context.cc +++ b/ceph/src/rocksdb/monitoring/perf_context.cc @@ -15,7 +15,7 @@ PerfContext perf_context; #if defined(OS_SOLARIS) __thread PerfContext perf_context_; #else -__thread PerfContext perf_context; +thread_local PerfContext perf_context; #endif #endif @@ -31,6 +31,300 @@ PerfContext* get_perf_context() { #endif } +PerfContext::~PerfContext() { +#if !defined(NPERF_CONTEXT) && defined(ROCKSDB_SUPPORT_THREAD_LOCAL) && !defined(OS_SOLARIS) + ClearPerLevelPerfContext(); +#endif +} + +PerfContext::PerfContext(const PerfContext& other) { +#ifndef NPERF_CONTEXT + user_key_comparison_count = other.user_key_comparison_count; + block_cache_hit_count = other.block_cache_hit_count; + block_read_count = other.block_read_count; + block_read_byte = other.block_read_byte; + block_read_time = other.block_read_time; + block_cache_index_hit_count = other.block_cache_index_hit_count; + index_block_read_count = other.index_block_read_count; + block_cache_filter_hit_count = other.block_cache_filter_hit_count; + filter_block_read_count = other.filter_block_read_count; + compression_dict_block_read_count = other.compression_dict_block_read_count; + block_checksum_time = other.block_checksum_time; + block_decompress_time = other.block_decompress_time; + get_read_bytes = other.get_read_bytes; + multiget_read_bytes = other.multiget_read_bytes; + iter_read_bytes = other.iter_read_bytes; + internal_key_skipped_count = other.internal_key_skipped_count; + internal_delete_skipped_count = other.internal_delete_skipped_count; + internal_recent_skipped_count = other.internal_recent_skipped_count; + internal_merge_count = other.internal_merge_count; + write_wal_time = other.write_wal_time; + get_snapshot_time = other.get_snapshot_time; + get_from_memtable_time = other.get_from_memtable_time; + get_from_memtable_count = other.get_from_memtable_count; + get_post_process_time = other.get_post_process_time; + get_from_output_files_time = other.get_from_output_files_time; + seek_on_memtable_time = other.seek_on_memtable_time; + seek_on_memtable_count = other.seek_on_memtable_count; + next_on_memtable_count = other.next_on_memtable_count; + prev_on_memtable_count = other.prev_on_memtable_count; + seek_child_seek_time = other.seek_child_seek_time; + seek_child_seek_count = other.seek_child_seek_count; + seek_min_heap_time = other.seek_min_heap_time; + seek_internal_seek_time = other.seek_internal_seek_time; + find_next_user_entry_time = other.find_next_user_entry_time; + write_pre_and_post_process_time = other.write_pre_and_post_process_time; + write_memtable_time = other.write_memtable_time; + write_delay_time = other.write_delay_time; + write_thread_wait_nanos = other.write_thread_wait_nanos; + write_scheduling_flushes_compactions_time = + other.write_scheduling_flushes_compactions_time; + db_mutex_lock_nanos = other.db_mutex_lock_nanos; + db_condition_wait_nanos = other.db_condition_wait_nanos; + merge_operator_time_nanos = other.merge_operator_time_nanos; + read_index_block_nanos = other.read_index_block_nanos; + read_filter_block_nanos = 
other.read_filter_block_nanos; + new_table_block_iter_nanos = other.new_table_block_iter_nanos; + new_table_iterator_nanos = other.new_table_iterator_nanos; + block_seek_nanos = other.block_seek_nanos; + find_table_nanos = other.find_table_nanos; + bloom_memtable_hit_count = other.bloom_memtable_hit_count; + bloom_memtable_miss_count = other.bloom_memtable_miss_count; + bloom_sst_hit_count = other.bloom_sst_hit_count; + bloom_sst_miss_count = other.bloom_sst_miss_count; + key_lock_wait_time = other.key_lock_wait_time; + key_lock_wait_count = other.key_lock_wait_count; + + env_new_sequential_file_nanos = other.env_new_sequential_file_nanos; + env_new_random_access_file_nanos = other.env_new_random_access_file_nanos; + env_new_writable_file_nanos = other.env_new_writable_file_nanos; + env_reuse_writable_file_nanos = other.env_reuse_writable_file_nanos; + env_new_random_rw_file_nanos = other.env_new_random_rw_file_nanos; + env_new_directory_nanos = other.env_new_directory_nanos; + env_file_exists_nanos = other.env_file_exists_nanos; + env_get_children_nanos = other.env_get_children_nanos; + env_get_children_file_attributes_nanos = + other.env_get_children_file_attributes_nanos; + env_delete_file_nanos = other.env_delete_file_nanos; + env_create_dir_nanos = other.env_create_dir_nanos; + env_create_dir_if_missing_nanos = other.env_create_dir_if_missing_nanos; + env_delete_dir_nanos = other.env_delete_dir_nanos; + env_get_file_size_nanos = other.env_get_file_size_nanos; + env_get_file_modification_time_nanos = + other.env_get_file_modification_time_nanos; + env_rename_file_nanos = other.env_rename_file_nanos; + env_link_file_nanos = other.env_link_file_nanos; + env_lock_file_nanos = other.env_lock_file_nanos; + env_unlock_file_nanos = other.env_unlock_file_nanos; + env_new_logger_nanos = other.env_new_logger_nanos; + get_cpu_nanos = other.get_cpu_nanos; + iter_next_cpu_nanos = other.iter_next_cpu_nanos; + iter_prev_cpu_nanos = other.iter_prev_cpu_nanos; + iter_seek_cpu_nanos = other.iter_seek_cpu_nanos; + if (per_level_perf_context_enabled && level_to_perf_context != nullptr) { + ClearPerLevelPerfContext(); + } + if (other.level_to_perf_context != nullptr) { + level_to_perf_context = new std::map(); + *level_to_perf_context = *other.level_to_perf_context; + } + per_level_perf_context_enabled = other.per_level_perf_context_enabled; +#endif +} + +PerfContext::PerfContext(PerfContext&& other) noexcept { +#ifndef NPERF_CONTEXT + user_key_comparison_count = other.user_key_comparison_count; + block_cache_hit_count = other.block_cache_hit_count; + block_read_count = other.block_read_count; + block_read_byte = other.block_read_byte; + block_read_time = other.block_read_time; + block_cache_index_hit_count = other.block_cache_index_hit_count; + index_block_read_count = other.index_block_read_count; + block_cache_filter_hit_count = other.block_cache_filter_hit_count; + filter_block_read_count = other.filter_block_read_count; + compression_dict_block_read_count = other.compression_dict_block_read_count; + block_checksum_time = other.block_checksum_time; + block_decompress_time = other.block_decompress_time; + get_read_bytes = other.get_read_bytes; + multiget_read_bytes = other.multiget_read_bytes; + iter_read_bytes = other.iter_read_bytes; + internal_key_skipped_count = other.internal_key_skipped_count; + internal_delete_skipped_count = other.internal_delete_skipped_count; + internal_recent_skipped_count = other.internal_recent_skipped_count; + internal_merge_count = other.internal_merge_count; + 
write_wal_time = other.write_wal_time; + get_snapshot_time = other.get_snapshot_time; + get_from_memtable_time = other.get_from_memtable_time; + get_from_memtable_count = other.get_from_memtable_count; + get_post_process_time = other.get_post_process_time; + get_from_output_files_time = other.get_from_output_files_time; + seek_on_memtable_time = other.seek_on_memtable_time; + seek_on_memtable_count = other.seek_on_memtable_count; + next_on_memtable_count = other.next_on_memtable_count; + prev_on_memtable_count = other.prev_on_memtable_count; + seek_child_seek_time = other.seek_child_seek_time; + seek_child_seek_count = other.seek_child_seek_count; + seek_min_heap_time = other.seek_min_heap_time; + seek_internal_seek_time = other.seek_internal_seek_time; + find_next_user_entry_time = other.find_next_user_entry_time; + write_pre_and_post_process_time = other.write_pre_and_post_process_time; + write_memtable_time = other.write_memtable_time; + write_delay_time = other.write_delay_time; + write_thread_wait_nanos = other.write_thread_wait_nanos; + write_scheduling_flushes_compactions_time = + other.write_scheduling_flushes_compactions_time; + db_mutex_lock_nanos = other.db_mutex_lock_nanos; + db_condition_wait_nanos = other.db_condition_wait_nanos; + merge_operator_time_nanos = other.merge_operator_time_nanos; + read_index_block_nanos = other.read_index_block_nanos; + read_filter_block_nanos = other.read_filter_block_nanos; + new_table_block_iter_nanos = other.new_table_block_iter_nanos; + new_table_iterator_nanos = other.new_table_iterator_nanos; + block_seek_nanos = other.block_seek_nanos; + find_table_nanos = other.find_table_nanos; + bloom_memtable_hit_count = other.bloom_memtable_hit_count; + bloom_memtable_miss_count = other.bloom_memtable_miss_count; + bloom_sst_hit_count = other.bloom_sst_hit_count; + bloom_sst_miss_count = other.bloom_sst_miss_count; + key_lock_wait_time = other.key_lock_wait_time; + key_lock_wait_count = other.key_lock_wait_count; + + env_new_sequential_file_nanos = other.env_new_sequential_file_nanos; + env_new_random_access_file_nanos = other.env_new_random_access_file_nanos; + env_new_writable_file_nanos = other.env_new_writable_file_nanos; + env_reuse_writable_file_nanos = other.env_reuse_writable_file_nanos; + env_new_random_rw_file_nanos = other.env_new_random_rw_file_nanos; + env_new_directory_nanos = other.env_new_directory_nanos; + env_file_exists_nanos = other.env_file_exists_nanos; + env_get_children_nanos = other.env_get_children_nanos; + env_get_children_file_attributes_nanos = + other.env_get_children_file_attributes_nanos; + env_delete_file_nanos = other.env_delete_file_nanos; + env_create_dir_nanos = other.env_create_dir_nanos; + env_create_dir_if_missing_nanos = other.env_create_dir_if_missing_nanos; + env_delete_dir_nanos = other.env_delete_dir_nanos; + env_get_file_size_nanos = other.env_get_file_size_nanos; + env_get_file_modification_time_nanos = + other.env_get_file_modification_time_nanos; + env_rename_file_nanos = other.env_rename_file_nanos; + env_link_file_nanos = other.env_link_file_nanos; + env_lock_file_nanos = other.env_lock_file_nanos; + env_unlock_file_nanos = other.env_unlock_file_nanos; + env_new_logger_nanos = other.env_new_logger_nanos; + get_cpu_nanos = other.get_cpu_nanos; + iter_next_cpu_nanos = other.iter_next_cpu_nanos; + iter_prev_cpu_nanos = other.iter_prev_cpu_nanos; + iter_seek_cpu_nanos = other.iter_seek_cpu_nanos; + if (per_level_perf_context_enabled && level_to_perf_context != nullptr) { + ClearPerLevelPerfContext(); + } 
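// (Editor's note, illustrative: unlike the copy constructor above, which
// allocates a fresh map and deep-copies *other.level_to_perf_context, the
// move constructor below simply steals the pointer and nulls it out in
// `other`, so no per-level entries are duplicated.)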
+ if (other.level_to_perf_context != nullptr) { + level_to_perf_context = other.level_to_perf_context; + other.level_to_perf_context = nullptr; + } + per_level_perf_context_enabled = other.per_level_perf_context_enabled; +#endif +} + +// TODO(Zhongyi): reduce code duplication between copy constructor and +// assignment operator +PerfContext& PerfContext::operator=(const PerfContext& other) { +#ifndef NPERF_CONTEXT + user_key_comparison_count = other.user_key_comparison_count; + block_cache_hit_count = other.block_cache_hit_count; + block_read_count = other.block_read_count; + block_read_byte = other.block_read_byte; + block_read_time = other.block_read_time; + block_cache_index_hit_count = other.block_cache_index_hit_count; + index_block_read_count = other.index_block_read_count; + block_cache_filter_hit_count = other.block_cache_filter_hit_count; + filter_block_read_count = other.filter_block_read_count; + compression_dict_block_read_count = other.compression_dict_block_read_count; + block_checksum_time = other.block_checksum_time; + block_decompress_time = other.block_decompress_time; + get_read_bytes = other.get_read_bytes; + multiget_read_bytes = other.multiget_read_bytes; + iter_read_bytes = other.iter_read_bytes; + internal_key_skipped_count = other.internal_key_skipped_count; + internal_delete_skipped_count = other.internal_delete_skipped_count; + internal_recent_skipped_count = other.internal_recent_skipped_count; + internal_merge_count = other.internal_merge_count; + write_wal_time = other.write_wal_time; + get_snapshot_time = other.get_snapshot_time; + get_from_memtable_time = other.get_from_memtable_time; + get_from_memtable_count = other.get_from_memtable_count; + get_post_process_time = other.get_post_process_time; + get_from_output_files_time = other.get_from_output_files_time; + seek_on_memtable_time = other.seek_on_memtable_time; + seek_on_memtable_count = other.seek_on_memtable_count; + next_on_memtable_count = other.next_on_memtable_count; + prev_on_memtable_count = other.prev_on_memtable_count; + seek_child_seek_time = other.seek_child_seek_time; + seek_child_seek_count = other.seek_child_seek_count; + seek_min_heap_time = other.seek_min_heap_time; + seek_internal_seek_time = other.seek_internal_seek_time; + find_next_user_entry_time = other.find_next_user_entry_time; + write_pre_and_post_process_time = other.write_pre_and_post_process_time; + write_memtable_time = other.write_memtable_time; + write_delay_time = other.write_delay_time; + write_thread_wait_nanos = other.write_thread_wait_nanos; + write_scheduling_flushes_compactions_time = + other.write_scheduling_flushes_compactions_time; + db_mutex_lock_nanos = other.db_mutex_lock_nanos; + db_condition_wait_nanos = other.db_condition_wait_nanos; + merge_operator_time_nanos = other.merge_operator_time_nanos; + read_index_block_nanos = other.read_index_block_nanos; + read_filter_block_nanos = other.read_filter_block_nanos; + new_table_block_iter_nanos = other.new_table_block_iter_nanos; + new_table_iterator_nanos = other.new_table_iterator_nanos; + block_seek_nanos = other.block_seek_nanos; + find_table_nanos = other.find_table_nanos; + bloom_memtable_hit_count = other.bloom_memtable_hit_count; + bloom_memtable_miss_count = other.bloom_memtable_miss_count; + bloom_sst_hit_count = other.bloom_sst_hit_count; + bloom_sst_miss_count = other.bloom_sst_miss_count; + key_lock_wait_time = other.key_lock_wait_time; + key_lock_wait_count = other.key_lock_wait_count; + + env_new_sequential_file_nanos = 
other.env_new_sequential_file_nanos; + env_new_random_access_file_nanos = other.env_new_random_access_file_nanos; + env_new_writable_file_nanos = other.env_new_writable_file_nanos; + env_reuse_writable_file_nanos = other.env_reuse_writable_file_nanos; + env_new_random_rw_file_nanos = other.env_new_random_rw_file_nanos; + env_new_directory_nanos = other.env_new_directory_nanos; + env_file_exists_nanos = other.env_file_exists_nanos; + env_get_children_nanos = other.env_get_children_nanos; + env_get_children_file_attributes_nanos = + other.env_get_children_file_attributes_nanos; + env_delete_file_nanos = other.env_delete_file_nanos; + env_create_dir_nanos = other.env_create_dir_nanos; + env_create_dir_if_missing_nanos = other.env_create_dir_if_missing_nanos; + env_delete_dir_nanos = other.env_delete_dir_nanos; + env_get_file_size_nanos = other.env_get_file_size_nanos; + env_get_file_modification_time_nanos = + other.env_get_file_modification_time_nanos; + env_rename_file_nanos = other.env_rename_file_nanos; + env_link_file_nanos = other.env_link_file_nanos; + env_lock_file_nanos = other.env_lock_file_nanos; + env_unlock_file_nanos = other.env_unlock_file_nanos; + env_new_logger_nanos = other.env_new_logger_nanos; + get_cpu_nanos = other.get_cpu_nanos; + iter_next_cpu_nanos = other.iter_next_cpu_nanos; + iter_prev_cpu_nanos = other.iter_prev_cpu_nanos; + iter_seek_cpu_nanos = other.iter_seek_cpu_nanos; + if (per_level_perf_context_enabled && level_to_perf_context != nullptr) { + ClearPerLevelPerfContext(); + } + if (other.level_to_perf_context != nullptr) { + level_to_perf_context = new std::map(); + *level_to_perf_context = *other.level_to_perf_context; + } + per_level_perf_context_enabled = other.per_level_perf_context_enabled; +#endif + return *this; +} + void PerfContext::Reset() { #ifndef NPERF_CONTEXT user_key_comparison_count = 0; @@ -38,6 +332,11 @@ void PerfContext::Reset() { block_read_count = 0; block_read_byte = 0; block_read_time = 0; + block_cache_index_hit_count = 0; + index_block_read_count = 0; + block_cache_filter_hit_count = 0; + filter_block_read_count = 0; + compression_dict_block_read_count = 0; block_checksum_time = 0; block_decompress_time = 0; get_read_bytes = 0; @@ -104,6 +403,15 @@ void PerfContext::Reset() { env_lock_file_nanos = 0; env_unlock_file_nanos = 0; env_new_logger_nanos = 0; + get_cpu_nanos = 0; + iter_next_cpu_nanos = 0; + iter_prev_cpu_nanos = 0; + iter_seek_cpu_nanos = 0; + if (per_level_perf_context_enabled && level_to_perf_context) { + for (auto& kv : *level_to_perf_context) { + kv.second.Reset(); + } + } #endif } @@ -112,6 +420,27 @@ void PerfContext::Reset() { ss << #counter << " = " << counter << ", "; \ } +#define PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(counter) \ + if (per_level_perf_context_enabled && \ + level_to_perf_context) { \ + ss << #counter << " = "; \ + for (auto& kv : *level_to_perf_context) { \ + if (!exclude_zero_counters || (kv.second.counter > 0)) { \ + ss << kv.second.counter << "@level" << kv.first << ", "; \ + } \ + } \ + } + +void PerfContextByLevel::Reset() { +#ifndef NPERF_CONTEXT + bloom_filter_useful = 0; + bloom_filter_full_positive = 0; + bloom_filter_full_true_positive = 0; + block_cache_hit_count = 0; + block_cache_miss_count = 0; +#endif +} + std::string PerfContext::ToString(bool exclude_zero_counters) const { #ifdef NPERF_CONTEXT return ""; @@ -122,6 +451,11 @@ std::string PerfContext::ToString(bool exclude_zero_counters) const { PERF_CONTEXT_OUTPUT(block_read_count); PERF_CONTEXT_OUTPUT(block_read_byte); 
PERF_CONTEXT_OUTPUT(block_read_time); + PERF_CONTEXT_OUTPUT(block_cache_index_hit_count); + PERF_CONTEXT_OUTPUT(index_block_read_count); + PERF_CONTEXT_OUTPUT(block_cache_filter_hit_count); + PERF_CONTEXT_OUTPUT(filter_block_read_count); + PERF_CONTEXT_OUTPUT(compression_dict_block_read_count); PERF_CONTEXT_OUTPUT(block_checksum_time); PERF_CONTEXT_OUTPUT(block_decompress_time); PERF_CONTEXT_OUTPUT(get_read_bytes); @@ -186,8 +520,37 @@ std::string PerfContext::ToString(bool exclude_zero_counters) const { PERF_CONTEXT_OUTPUT(env_lock_file_nanos); PERF_CONTEXT_OUTPUT(env_unlock_file_nanos); PERF_CONTEXT_OUTPUT(env_new_logger_nanos); + PERF_CONTEXT_OUTPUT(get_cpu_nanos); + PERF_CONTEXT_OUTPUT(iter_next_cpu_nanos); + PERF_CONTEXT_OUTPUT(iter_prev_cpu_nanos); + PERF_CONTEXT_OUTPUT(iter_seek_cpu_nanos); + PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_useful); + PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_full_positive); + PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_full_true_positive); + PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(block_cache_hit_count); + PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(block_cache_miss_count); return ss.str(); #endif } +void PerfContext::EnablePerLevelPerfContext() { + if (level_to_perf_context == nullptr) { + level_to_perf_context = new std::map(); + } + per_level_perf_context_enabled = true; +} + +void PerfContext::DisablePerLevelPerfContext(){ + per_level_perf_context_enabled = false; +} + +void PerfContext::ClearPerLevelPerfContext(){ + if (level_to_perf_context != nullptr) { + level_to_perf_context->clear(); + delete level_to_perf_context; + level_to_perf_context = nullptr; + } + per_level_perf_context_enabled = false; +} + } diff --git a/ceph/src/rocksdb/monitoring/perf_context_imp.h b/ceph/src/rocksdb/monitoring/perf_context_imp.h index cfcded1c9..e0ff8afc5 100644 --- a/ceph/src/rocksdb/monitoring/perf_context_imp.h +++ b/ceph/src/rocksdb/monitoring/perf_context_imp.h @@ -16,7 +16,7 @@ extern PerfContext perf_context; extern __thread PerfContext perf_context_; #define perf_context (*get_perf_context()) #else -extern __thread PerfContext perf_context; +extern thread_local PerfContext perf_context; #endif #endif @@ -41,12 +41,25 @@ extern __thread PerfContext perf_context; PerfStepTimer perf_step_timer_##metric(&(perf_context.metric)); \ perf_step_timer_##metric.Start(); -#define PERF_CONDITIONAL_TIMER_FOR_MUTEX_GUARD(metric, condition, stats, \ - ticker_type) \ - PerfStepTimer perf_step_timer_##metric(&(perf_context.metric), true, stats, \ - ticker_type); \ - if (condition) { \ - perf_step_timer_##metric.Start(); \ +// Declare and set start time of the timer +#define PERF_TIMER_GUARD_WITH_ENV(metric, env) \ + PerfStepTimer perf_step_timer_##metric(&(perf_context.metric), env); \ + perf_step_timer_##metric.Start(); + +// Declare and set start time of the timer +#define PERF_CPU_TIMER_GUARD(metric, env) \ + PerfStepTimer perf_step_timer_##metric( \ + &(perf_context.metric), env, true, \ + PerfLevel::kEnableTimeAndCPUTimeExceptForMutex); \ + perf_step_timer_##metric.Start(); + +#define PERF_CONDITIONAL_TIMER_FOR_MUTEX_GUARD(metric, condition, stats, \ + ticker_type) \ + PerfStepTimer perf_step_timer_##metric(&(perf_context.metric), nullptr, \ + false, PerfLevel::kEnableTime, stats, \ + ticker_type); \ + if (condition) { \ + perf_step_timer_##metric.Start(); \ } // Update metric with time elapsed since last START. 
start time is reset @@ -59,6 +72,22 @@ extern __thread PerfContext perf_context; perf_context.metric += value; \ } +// Increase metric value +#define PERF_COUNTER_BY_LEVEL_ADD(metric, value, level) \ + if (perf_level >= PerfLevel::kEnableCount && \ + perf_context.per_level_perf_context_enabled && \ + perf_context.level_to_perf_context) { \ + if ((*(perf_context.level_to_perf_context)).find(level) != \ + (*(perf_context.level_to_perf_context)).end()) { \ + (*(perf_context.level_to_perf_context))[level].metric += value; \ + } \ + else { \ + PerfContextByLevel empty_context; \ + (*(perf_context.level_to_perf_context))[level] = empty_context; \ + (*(perf_context.level_to_perf_context))[level].metric += value; \ + } \ + } \ + #endif } diff --git a/ceph/src/rocksdb/monitoring/perf_step_timer.h b/ceph/src/rocksdb/monitoring/perf_step_timer.h index 246d6eb75..6501bd54a 100644 --- a/ceph/src/rocksdb/monitoring/perf_step_timer.h +++ b/ceph/src/rocksdb/monitoring/perf_step_timer.h @@ -12,14 +12,15 @@ namespace rocksdb { class PerfStepTimer { public: - explicit PerfStepTimer(uint64_t* metric, bool for_mutex = false, - Statistics* statistics = nullptr, - uint32_t ticker_type = 0) - : perf_counter_enabled_( - perf_level >= PerfLevel::kEnableTime || - (!for_mutex && perf_level >= kEnableTimeExceptForMutex)), - env_((perf_counter_enabled_ || statistics != nullptr) ? Env::Default() - : nullptr), + explicit PerfStepTimer( + uint64_t* metric, Env* env = nullptr, bool use_cpu_time = false, + PerfLevel enable_level = PerfLevel::kEnableTimeExceptForMutex, + Statistics* statistics = nullptr, uint32_t ticker_type = 0) + : perf_counter_enabled_(perf_level >= enable_level), + use_cpu_time_(use_cpu_time), + env_((perf_counter_enabled_ || statistics != nullptr) + ? ((env != nullptr) ? 
env : Env::Default()) + : nullptr), start_(0), metric_(metric), statistics_(statistics), @@ -31,13 +32,21 @@ class PerfStepTimer { void Start() { if (perf_counter_enabled_ || statistics_ != nullptr) { - start_ = env_->NowNanos(); + start_ = time_now(); + } + } + + uint64_t time_now() { + if (!use_cpu_time_) { + return env_->NowNanos(); + } else { + return env_->NowCPUNanos(); } } void Measure() { if (start_) { - uint64_t now = env_->NowNanos(); + uint64_t now = time_now(); *metric_ += now - start_; start_ = now; } @@ -45,7 +54,7 @@ class PerfStepTimer { void Stop() { if (start_) { - uint64_t duration = env_->NowNanos() - start_; + uint64_t duration = time_now() - start_; if (perf_counter_enabled_) { *metric_ += duration; } @@ -59,6 +68,7 @@ class PerfStepTimer { private: const bool perf_counter_enabled_; + const bool use_cpu_time_; Env* const env_; uint64_t start_; uint64_t* metric_; diff --git a/ceph/src/rocksdb/monitoring/statistics.cc b/ceph/src/rocksdb/monitoring/statistics.cc index 59ce3d9e0..adb8cbfed 100644 --- a/ceph/src/rocksdb/monitoring/statistics.cc +++ b/ceph/src/rocksdb/monitoring/statistics.cc @@ -17,13 +17,225 @@ namespace rocksdb { +// The order of items listed in Tickers should be the same as +// the order listed in TickersNameMap +const std::vector> TickersNameMap = { + {BLOCK_CACHE_MISS, "rocksdb.block.cache.miss"}, + {BLOCK_CACHE_HIT, "rocksdb.block.cache.hit"}, + {BLOCK_CACHE_ADD, "rocksdb.block.cache.add"}, + {BLOCK_CACHE_ADD_FAILURES, "rocksdb.block.cache.add.failures"}, + {BLOCK_CACHE_INDEX_MISS, "rocksdb.block.cache.index.miss"}, + {BLOCK_CACHE_INDEX_HIT, "rocksdb.block.cache.index.hit"}, + {BLOCK_CACHE_INDEX_ADD, "rocksdb.block.cache.index.add"}, + {BLOCK_CACHE_INDEX_BYTES_INSERT, "rocksdb.block.cache.index.bytes.insert"}, + {BLOCK_CACHE_INDEX_BYTES_EVICT, "rocksdb.block.cache.index.bytes.evict"}, + {BLOCK_CACHE_FILTER_MISS, "rocksdb.block.cache.filter.miss"}, + {BLOCK_CACHE_FILTER_HIT, "rocksdb.block.cache.filter.hit"}, + {BLOCK_CACHE_FILTER_ADD, "rocksdb.block.cache.filter.add"}, + {BLOCK_CACHE_FILTER_BYTES_INSERT, + "rocksdb.block.cache.filter.bytes.insert"}, + {BLOCK_CACHE_FILTER_BYTES_EVICT, "rocksdb.block.cache.filter.bytes.evict"}, + {BLOCK_CACHE_DATA_MISS, "rocksdb.block.cache.data.miss"}, + {BLOCK_CACHE_DATA_HIT, "rocksdb.block.cache.data.hit"}, + {BLOCK_CACHE_DATA_ADD, "rocksdb.block.cache.data.add"}, + {BLOCK_CACHE_DATA_BYTES_INSERT, "rocksdb.block.cache.data.bytes.insert"}, + {BLOCK_CACHE_BYTES_READ, "rocksdb.block.cache.bytes.read"}, + {BLOCK_CACHE_BYTES_WRITE, "rocksdb.block.cache.bytes.write"}, + {BLOOM_FILTER_USEFUL, "rocksdb.bloom.filter.useful"}, + {BLOOM_FILTER_FULL_POSITIVE, "rocksdb.bloom.filter.full.positive"}, + {BLOOM_FILTER_FULL_TRUE_POSITIVE, + "rocksdb.bloom.filter.full.true.positive"}, + {PERSISTENT_CACHE_HIT, "rocksdb.persistent.cache.hit"}, + {PERSISTENT_CACHE_MISS, "rocksdb.persistent.cache.miss"}, + {SIM_BLOCK_CACHE_HIT, "rocksdb.sim.block.cache.hit"}, + {SIM_BLOCK_CACHE_MISS, "rocksdb.sim.block.cache.miss"}, + {MEMTABLE_HIT, "rocksdb.memtable.hit"}, + {MEMTABLE_MISS, "rocksdb.memtable.miss"}, + {GET_HIT_L0, "rocksdb.l0.hit"}, + {GET_HIT_L1, "rocksdb.l1.hit"}, + {GET_HIT_L2_AND_UP, "rocksdb.l2andup.hit"}, + {COMPACTION_KEY_DROP_NEWER_ENTRY, "rocksdb.compaction.key.drop.new"}, + {COMPACTION_KEY_DROP_OBSOLETE, "rocksdb.compaction.key.drop.obsolete"}, + {COMPACTION_KEY_DROP_RANGE_DEL, "rocksdb.compaction.key.drop.range_del"}, + {COMPACTION_KEY_DROP_USER, "rocksdb.compaction.key.drop.user"}, + {COMPACTION_RANGE_DEL_DROP_OBSOLETE, + 
"rocksdb.compaction.range_del.drop.obsolete"}, + {COMPACTION_OPTIMIZED_DEL_DROP_OBSOLETE, + "rocksdb.compaction.optimized.del.drop.obsolete"}, + {COMPACTION_CANCELLED, "rocksdb.compaction.cancelled"}, + {NUMBER_KEYS_WRITTEN, "rocksdb.number.keys.written"}, + {NUMBER_KEYS_READ, "rocksdb.number.keys.read"}, + {NUMBER_KEYS_UPDATED, "rocksdb.number.keys.updated"}, + {BYTES_WRITTEN, "rocksdb.bytes.written"}, + {BYTES_READ, "rocksdb.bytes.read"}, + {NUMBER_DB_SEEK, "rocksdb.number.db.seek"}, + {NUMBER_DB_NEXT, "rocksdb.number.db.next"}, + {NUMBER_DB_PREV, "rocksdb.number.db.prev"}, + {NUMBER_DB_SEEK_FOUND, "rocksdb.number.db.seek.found"}, + {NUMBER_DB_NEXT_FOUND, "rocksdb.number.db.next.found"}, + {NUMBER_DB_PREV_FOUND, "rocksdb.number.db.prev.found"}, + {ITER_BYTES_READ, "rocksdb.db.iter.bytes.read"}, + {NO_FILE_CLOSES, "rocksdb.no.file.closes"}, + {NO_FILE_OPENS, "rocksdb.no.file.opens"}, + {NO_FILE_ERRORS, "rocksdb.no.file.errors"}, + {STALL_L0_SLOWDOWN_MICROS, "rocksdb.l0.slowdown.micros"}, + {STALL_MEMTABLE_COMPACTION_MICROS, "rocksdb.memtable.compaction.micros"}, + {STALL_L0_NUM_FILES_MICROS, "rocksdb.l0.num.files.stall.micros"}, + {STALL_MICROS, "rocksdb.stall.micros"}, + {DB_MUTEX_WAIT_MICROS, "rocksdb.db.mutex.wait.micros"}, + {RATE_LIMIT_DELAY_MILLIS, "rocksdb.rate.limit.delay.millis"}, + {NO_ITERATORS, "rocksdb.num.iterators"}, + {NUMBER_MULTIGET_CALLS, "rocksdb.number.multiget.get"}, + {NUMBER_MULTIGET_KEYS_READ, "rocksdb.number.multiget.keys.read"}, + {NUMBER_MULTIGET_BYTES_READ, "rocksdb.number.multiget.bytes.read"}, + {NUMBER_FILTERED_DELETES, "rocksdb.number.deletes.filtered"}, + {NUMBER_MERGE_FAILURES, "rocksdb.number.merge.failures"}, + {BLOOM_FILTER_PREFIX_CHECKED, "rocksdb.bloom.filter.prefix.checked"}, + {BLOOM_FILTER_PREFIX_USEFUL, "rocksdb.bloom.filter.prefix.useful"}, + {NUMBER_OF_RESEEKS_IN_ITERATION, "rocksdb.number.reseeks.iteration"}, + {GET_UPDATES_SINCE_CALLS, "rocksdb.getupdatessince.calls"}, + {BLOCK_CACHE_COMPRESSED_MISS, "rocksdb.block.cachecompressed.miss"}, + {BLOCK_CACHE_COMPRESSED_HIT, "rocksdb.block.cachecompressed.hit"}, + {BLOCK_CACHE_COMPRESSED_ADD, "rocksdb.block.cachecompressed.add"}, + {BLOCK_CACHE_COMPRESSED_ADD_FAILURES, + "rocksdb.block.cachecompressed.add.failures"}, + {WAL_FILE_SYNCED, "rocksdb.wal.synced"}, + {WAL_FILE_BYTES, "rocksdb.wal.bytes"}, + {WRITE_DONE_BY_SELF, "rocksdb.write.self"}, + {WRITE_DONE_BY_OTHER, "rocksdb.write.other"}, + {WRITE_TIMEDOUT, "rocksdb.write.timeout"}, + {WRITE_WITH_WAL, "rocksdb.write.wal"}, + {COMPACT_READ_BYTES, "rocksdb.compact.read.bytes"}, + {COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes"}, + {FLUSH_WRITE_BYTES, "rocksdb.flush.write.bytes"}, + {NUMBER_DIRECT_LOAD_TABLE_PROPERTIES, + "rocksdb.number.direct.load.table.properties"}, + {NUMBER_SUPERVERSION_ACQUIRES, "rocksdb.number.superversion_acquires"}, + {NUMBER_SUPERVERSION_RELEASES, "rocksdb.number.superversion_releases"}, + {NUMBER_SUPERVERSION_CLEANUPS, "rocksdb.number.superversion_cleanups"}, + {NUMBER_BLOCK_COMPRESSED, "rocksdb.number.block.compressed"}, + {NUMBER_BLOCK_DECOMPRESSED, "rocksdb.number.block.decompressed"}, + {NUMBER_BLOCK_NOT_COMPRESSED, "rocksdb.number.block.not_compressed"}, + {MERGE_OPERATION_TOTAL_TIME, "rocksdb.merge.operation.time.nanos"}, + {FILTER_OPERATION_TOTAL_TIME, "rocksdb.filter.operation.time.nanos"}, + {ROW_CACHE_HIT, "rocksdb.row.cache.hit"}, + {ROW_CACHE_MISS, "rocksdb.row.cache.miss"}, + {READ_AMP_ESTIMATE_USEFUL_BYTES, "rocksdb.read.amp.estimate.useful.bytes"}, + {READ_AMP_TOTAL_READ_BYTES, 
"rocksdb.read.amp.total.read.bytes"}, + {NUMBER_RATE_LIMITER_DRAINS, "rocksdb.number.rate_limiter.drains"}, + {NUMBER_ITER_SKIP, "rocksdb.number.iter.skip"}, + {BLOB_DB_NUM_PUT, "rocksdb.blobdb.num.put"}, + {BLOB_DB_NUM_WRITE, "rocksdb.blobdb.num.write"}, + {BLOB_DB_NUM_GET, "rocksdb.blobdb.num.get"}, + {BLOB_DB_NUM_MULTIGET, "rocksdb.blobdb.num.multiget"}, + {BLOB_DB_NUM_SEEK, "rocksdb.blobdb.num.seek"}, + {BLOB_DB_NUM_NEXT, "rocksdb.blobdb.num.next"}, + {BLOB_DB_NUM_PREV, "rocksdb.blobdb.num.prev"}, + {BLOB_DB_NUM_KEYS_WRITTEN, "rocksdb.blobdb.num.keys.written"}, + {BLOB_DB_NUM_KEYS_READ, "rocksdb.blobdb.num.keys.read"}, + {BLOB_DB_BYTES_WRITTEN, "rocksdb.blobdb.bytes.written"}, + {BLOB_DB_BYTES_READ, "rocksdb.blobdb.bytes.read"}, + {BLOB_DB_WRITE_INLINED, "rocksdb.blobdb.write.inlined"}, + {BLOB_DB_WRITE_INLINED_TTL, "rocksdb.blobdb.write.inlined.ttl"}, + {BLOB_DB_WRITE_BLOB, "rocksdb.blobdb.write.blob"}, + {BLOB_DB_WRITE_BLOB_TTL, "rocksdb.blobdb.write.blob.ttl"}, + {BLOB_DB_BLOB_FILE_BYTES_WRITTEN, "rocksdb.blobdb.blob.file.bytes.written"}, + {BLOB_DB_BLOB_FILE_BYTES_READ, "rocksdb.blobdb.blob.file.bytes.read"}, + {BLOB_DB_BLOB_FILE_SYNCED, "rocksdb.blobdb.blob.file.synced"}, + {BLOB_DB_BLOB_INDEX_EXPIRED_COUNT, + "rocksdb.blobdb.blob.index.expired.count"}, + {BLOB_DB_BLOB_INDEX_EXPIRED_SIZE, "rocksdb.blobdb.blob.index.expired.size"}, + {BLOB_DB_BLOB_INDEX_EVICTED_COUNT, + "rocksdb.blobdb.blob.index.evicted.count"}, + {BLOB_DB_BLOB_INDEX_EVICTED_SIZE, "rocksdb.blobdb.blob.index.evicted.size"}, + {BLOB_DB_GC_NUM_FILES, "rocksdb.blobdb.gc.num.files"}, + {BLOB_DB_GC_NUM_NEW_FILES, "rocksdb.blobdb.gc.num.new.files"}, + {BLOB_DB_GC_FAILURES, "rocksdb.blobdb.gc.failures"}, + {BLOB_DB_GC_NUM_KEYS_OVERWRITTEN, "rocksdb.blobdb.gc.num.keys.overwritten"}, + {BLOB_DB_GC_NUM_KEYS_EXPIRED, "rocksdb.blobdb.gc.num.keys.expired"}, + {BLOB_DB_GC_NUM_KEYS_RELOCATED, "rocksdb.blobdb.gc.num.keys.relocated"}, + {BLOB_DB_GC_BYTES_OVERWRITTEN, "rocksdb.blobdb.gc.bytes.overwritten"}, + {BLOB_DB_GC_BYTES_EXPIRED, "rocksdb.blobdb.gc.bytes.expired"}, + {BLOB_DB_GC_BYTES_RELOCATED, "rocksdb.blobdb.gc.bytes.relocated"}, + {BLOB_DB_FIFO_NUM_FILES_EVICTED, "rocksdb.blobdb.fifo.num.files.evicted"}, + {BLOB_DB_FIFO_NUM_KEYS_EVICTED, "rocksdb.blobdb.fifo.num.keys.evicted"}, + {BLOB_DB_FIFO_BYTES_EVICTED, "rocksdb.blobdb.fifo.bytes.evicted"}, + {TXN_PREPARE_MUTEX_OVERHEAD, "rocksdb.txn.overhead.mutex.prepare"}, + {TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD, + "rocksdb.txn.overhead.mutex.old.commit.map"}, + {TXN_DUPLICATE_KEY_OVERHEAD, "rocksdb.txn.overhead.duplicate.key"}, + {TXN_SNAPSHOT_MUTEX_OVERHEAD, "rocksdb.txn.overhead.mutex.snapshot"}, + {NUMBER_MULTIGET_KEYS_FOUND, "rocksdb.number.multiget.keys.found"}, + {NO_ITERATOR_CREATED, "rocksdb.num.iterator.created"}, + {NO_ITERATOR_DELETED, "rocksdb.num.iterator.deleted"}, + {BLOCK_CACHE_COMPRESSION_DICT_MISS, + "rocksdb.block.cache.compression.dict.miss"}, + {BLOCK_CACHE_COMPRESSION_DICT_HIT, + "rocksdb.block.cache.compression.dict.hit"}, + {BLOCK_CACHE_COMPRESSION_DICT_ADD, + "rocksdb.block.cache.compression.dict.add"}, + {BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT, + "rocksdb.block.cache.compression.dict.bytes.insert"}, + {BLOCK_CACHE_COMPRESSION_DICT_BYTES_EVICT, + "rocksdb.block.cache.compression.dict.bytes.evict"}, +}; + +const std::vector> HistogramsNameMap = { + {DB_GET, "rocksdb.db.get.micros"}, + {DB_WRITE, "rocksdb.db.write.micros"}, + {COMPACTION_TIME, "rocksdb.compaction.times.micros"}, + {COMPACTION_CPU_TIME, "rocksdb.compaction.times.cpu_micros"}, + 
{SUBCOMPACTION_SETUP_TIME, "rocksdb.subcompaction.setup.times.micros"}, + {TABLE_SYNC_MICROS, "rocksdb.table.sync.micros"}, + {COMPACTION_OUTFILE_SYNC_MICROS, "rocksdb.compaction.outfile.sync.micros"}, + {WAL_FILE_SYNC_MICROS, "rocksdb.wal.file.sync.micros"}, + {MANIFEST_FILE_SYNC_MICROS, "rocksdb.manifest.file.sync.micros"}, + {TABLE_OPEN_IO_MICROS, "rocksdb.table.open.io.micros"}, + {DB_MULTIGET, "rocksdb.db.multiget.micros"}, + {READ_BLOCK_COMPACTION_MICROS, "rocksdb.read.block.compaction.micros"}, + {READ_BLOCK_GET_MICROS, "rocksdb.read.block.get.micros"}, + {WRITE_RAW_BLOCK_MICROS, "rocksdb.write.raw.block.micros"}, + {STALL_L0_SLOWDOWN_COUNT, "rocksdb.l0.slowdown.count"}, + {STALL_MEMTABLE_COMPACTION_COUNT, "rocksdb.memtable.compaction.count"}, + {STALL_L0_NUM_FILES_COUNT, "rocksdb.num.files.stall.count"}, + {HARD_RATE_LIMIT_DELAY_COUNT, "rocksdb.hard.rate.limit.delay.count"}, + {SOFT_RATE_LIMIT_DELAY_COUNT, "rocksdb.soft.rate.limit.delay.count"}, + {NUM_FILES_IN_SINGLE_COMPACTION, "rocksdb.numfiles.in.singlecompaction"}, + {DB_SEEK, "rocksdb.db.seek.micros"}, + {WRITE_STALL, "rocksdb.db.write.stall"}, + {SST_READ_MICROS, "rocksdb.sst.read.micros"}, + {NUM_SUBCOMPACTIONS_SCHEDULED, "rocksdb.num.subcompactions.scheduled"}, + {BYTES_PER_READ, "rocksdb.bytes.per.read"}, + {BYTES_PER_WRITE, "rocksdb.bytes.per.write"}, + {BYTES_PER_MULTIGET, "rocksdb.bytes.per.multiget"}, + {BYTES_COMPRESSED, "rocksdb.bytes.compressed"}, + {BYTES_DECOMPRESSED, "rocksdb.bytes.decompressed"}, + {COMPRESSION_TIMES_NANOS, "rocksdb.compression.times.nanos"}, + {DECOMPRESSION_TIMES_NANOS, "rocksdb.decompression.times.nanos"}, + {READ_NUM_MERGE_OPERANDS, "rocksdb.read.num.merge_operands"}, + {BLOB_DB_KEY_SIZE, "rocksdb.blobdb.key.size"}, + {BLOB_DB_VALUE_SIZE, "rocksdb.blobdb.value.size"}, + {BLOB_DB_WRITE_MICROS, "rocksdb.blobdb.write.micros"}, + {BLOB_DB_GET_MICROS, "rocksdb.blobdb.get.micros"}, + {BLOB_DB_MULTIGET_MICROS, "rocksdb.blobdb.multiget.micros"}, + {BLOB_DB_SEEK_MICROS, "rocksdb.blobdb.seek.micros"}, + {BLOB_DB_NEXT_MICROS, "rocksdb.blobdb.next.micros"}, + {BLOB_DB_PREV_MICROS, "rocksdb.blobdb.prev.micros"}, + {BLOB_DB_BLOB_FILE_WRITE_MICROS, "rocksdb.blobdb.blob.file.write.micros"}, + {BLOB_DB_BLOB_FILE_READ_MICROS, "rocksdb.blobdb.blob.file.read.micros"}, + {BLOB_DB_BLOB_FILE_SYNC_MICROS, "rocksdb.blobdb.blob.file.sync.micros"}, + {BLOB_DB_GC_MICROS, "rocksdb.blobdb.gc.micros"}, + {BLOB_DB_COMPRESSION_MICROS, "rocksdb.blobdb.compression.micros"}, + {BLOB_DB_DECOMPRESSION_MICROS, "rocksdb.blobdb.decompression.micros"}, + {FLUSH_TIME, "rocksdb.db.flush.micros"}, +}; + std::shared_ptr<Statistics> CreateDBStatistics() { - return std::make_shared<StatisticsImpl>(nullptr, false); + return std::make_shared<StatisticsImpl>(nullptr); } -StatisticsImpl::StatisticsImpl(std::shared_ptr<Statistics> stats, - bool enable_internal_stats) - : stats_(std::move(stats)), enable_internal_stats_(enable_internal_stats) {} +StatisticsImpl::StatisticsImpl(std::shared_ptr<Statistics> stats) + : stats_(std::move(stats)) {} StatisticsImpl::~StatisticsImpl() {} @@ -33,10 +245,7 @@ uint64_t StatisticsImpl::getTickerCount(uint32_t tickerType) const { } uint64_t StatisticsImpl::getTickerCountLocked(uint32_t tickerType) const { - assert( - enable_internal_stats_ ? 
- tickerType < INTERNAL_TICKER_ENUM_MAX : - tickerType < TICKER_ENUM_MAX); + assert(tickerType < TICKER_ENUM_MAX); uint64_t res = 0; for (size_t core_idx = 0; core_idx < per_core_stats_.Size(); ++core_idx) { res += per_core_stats_.AccessAtCore(core_idx)->tickers_[tickerType]; @@ -52,10 +261,7 @@ void StatisticsImpl::histogramData(uint32_t histogramType, std::unique_ptr<HistogramImpl> StatisticsImpl::getHistogramImplLocked( uint32_t histogramType) const { - assert( - enable_internal_stats_ ? - histogramType < INTERNAL_HISTOGRAM_ENUM_MAX : - histogramType < HISTOGRAM_ENUM_MAX); + assert(histogramType < HISTOGRAM_ENUM_MAX); std::unique_ptr<HistogramImpl> res_hist(new HistogramImpl()); for (size_t core_idx = 0; core_idx < per_core_stats_.Size(); ++core_idx) { res_hist->Merge( @@ -80,8 +286,7 @@ void StatisticsImpl::setTickerCount(uint32_t tickerType, uint64_t count) { } void StatisticsImpl::setTickerCountLocked(uint32_t tickerType, uint64_t count) { - assert(enable_internal_stats_ ? tickerType < INTERNAL_TICKER_ENUM_MAX - : tickerType < TICKER_ENUM_MAX); + assert(tickerType < TICKER_ENUM_MAX); for (size_t core_idx = 0; core_idx < per_core_stats_.Size(); ++core_idx) { if (core_idx == 0) { per_core_stats_.AccessAtCore(core_idx)->tickers_[tickerType] = count; @@ -95,8 +300,7 @@ uint64_t StatisticsImpl::getAndResetTickerCount(uint32_t tickerType) { uint64_t sum = 0; { MutexLock lock(&aggregate_lock_); - assert(enable_internal_stats_ ? tickerType < INTERNAL_TICKER_ENUM_MAX - : tickerType < TICKER_ENUM_MAX); + assert(tickerType < TICKER_ENUM_MAX); for (size_t core_idx = 0; core_idx < per_core_stats_.Size(); ++core_idx) { sum += per_core_stats_.AccessAtCore(core_idx)->tickers_[tickerType].exchange( @@ -110,10 +314,7 @@ uint64_t StatisticsImpl::getAndResetTickerCount(uint32_t tickerType) { } void StatisticsImpl::recordTick(uint32_t tickerType, uint64_t count) { - assert( - enable_internal_stats_ ? - tickerType < INTERNAL_TICKER_ENUM_MAX : - tickerType < TICKER_ENUM_MAX); + assert(tickerType < TICKER_ENUM_MAX); per_core_stats_.Access()->tickers_[tickerType].fetch_add( count, std::memory_order_relaxed); if (stats_ && tickerType < TICKER_ENUM_MAX) { @@ -121,14 +322,14 @@ void StatisticsImpl::recordTick(uint32_t tickerType, uint64_t count) { } } -void StatisticsImpl::measureTime(uint32_t histogramType, uint64_t value) { - assert( - enable_internal_stats_ ? 
- histogramType < INTERNAL_HISTOGRAM_ENUM_MAX : - histogramType < HISTOGRAM_ENUM_MAX); +void StatisticsImpl::recordInHistogram(uint32_t histogramType, uint64_t value) { + assert(histogramType < HISTOGRAM_ENUM_MAX); + if (get_stats_level() <= StatsLevel::kExceptHistogramOrTimers) { + return; + } per_core_stats_.Access()->histograms_[histogramType].Add(value); if (stats_ && histogramType < HISTOGRAM_ENUM_MAX) { - stats_->measureTime(histogramType, value); + stats_->recordInHistogram(histogramType, value); } } @@ -157,41 +358,50 @@ std::string StatisticsImpl::ToString() const { std::string res; res.reserve(20000); for (const auto& t : TickersNameMap) { - if (t.first < TICKER_ENUM_MAX || enable_internal_stats_) { - char buffer[kTmpStrBufferSize]; - snprintf(buffer, kTmpStrBufferSize, "%s COUNT : %" PRIu64 "\n", - t.second.c_str(), getTickerCountLocked(t.first)); - res.append(buffer); - } + assert(t.first < TICKER_ENUM_MAX); + char buffer[kTmpStrBufferSize]; + snprintf(buffer, kTmpStrBufferSize, "%s COUNT : %" PRIu64 "\n", + t.second.c_str(), getTickerCountLocked(t.first)); + res.append(buffer); } for (const auto& h : HistogramsNameMap) { - if (h.first < HISTOGRAM_ENUM_MAX || enable_internal_stats_) { - char buffer[kTmpStrBufferSize]; - HistogramData hData; - getHistogramImplLocked(h.first)->Data(&hData); - // don't handle failures - buffer should always be big enough and arguments - // should be provided correctly - int ret = snprintf( - buffer, kTmpStrBufferSize, - "%s P50 : %f P95 : %f P99 : %f P100 : %f COUNT : %" PRIu64 " SUM : %" - PRIu64 "\n", h.second.c_str(), hData.median, hData.percentile95, - hData.percentile99, hData.max, hData.count, hData.sum); - if (ret < 0 || ret >= kTmpStrBufferSize) { - assert(false); - continue; - } - res.append(buffer); + assert(h.first < HISTOGRAM_ENUM_MAX); + char buffer[kTmpStrBufferSize]; + HistogramData hData; + getHistogramImplLocked(h.first)->Data(&hData); + // don't handle failures - buffer should always be big enough and arguments + // should be provided correctly + int ret = + snprintf(buffer, kTmpStrBufferSize, + "%s P50 : %f P95 : %f P99 : %f P100 : %f COUNT : %" PRIu64 + " SUM : %" PRIu64 "\n", + h.second.c_str(), hData.median, hData.percentile95, + hData.percentile99, hData.max, hData.count, hData.sum); + if (ret < 0 || ret >= kTmpStrBufferSize) { + assert(false); + continue; } + res.append(buffer); } res.shrink_to_fit(); return res; } -bool StatisticsImpl::HistEnabledForType(uint32_t type) const { - if (LIKELY(!enable_internal_stats_)) { - return type < HISTOGRAM_ENUM_MAX; +bool StatisticsImpl::getTickerMap( + std::map* stats_map) const { + assert(stats_map); + if (!stats_map) return false; + stats_map->clear(); + MutexLock lock(&aggregate_lock_); + for (const auto& t : TickersNameMap) { + assert(t.first < TICKER_ENUM_MAX); + (*stats_map)[t.second.c_str()] = getTickerCountLocked(t.first); } return true; } +bool StatisticsImpl::HistEnabledForType(uint32_t type) const { + return type < HISTOGRAM_ENUM_MAX; +} + } // namespace rocksdb diff --git a/ceph/src/rocksdb/monitoring/statistics.h b/ceph/src/rocksdb/monitoring/statistics.h index 4427c8c54..952bf8cb4 100644 --- a/ceph/src/rocksdb/monitoring/statistics.h +++ b/ceph/src/rocksdb/monitoring/statistics.h @@ -6,9 +6,10 @@ #pragma once #include "rocksdb/statistics.h" -#include #include +#include #include +#include #include "monitoring/histogram.h" #include "port/likely.h" @@ -41,8 +42,7 @@ enum HistogramsInternal : uint32_t { class StatisticsImpl : public Statistics { public: - 
StatisticsImpl(std::shared_ptr<Statistics> stats, - bool enable_internal_stats); + StatisticsImpl(std::shared_ptr<Statistics> stats); virtual ~StatisticsImpl(); virtual uint64_t getTickerCount(uint32_t ticker_type) const override; @@ -53,17 +53,24 @@ class StatisticsImpl : public Statistics { virtual void setTickerCount(uint32_t ticker_type, uint64_t count) override; virtual uint64_t getAndResetTickerCount(uint32_t ticker_type) override; virtual void recordTick(uint32_t ticker_type, uint64_t count) override; - virtual void measureTime(uint32_t histogram_type, uint64_t value) override; + // The function is implemented for now for backward compatibility reasons. + // In case a user explicitly calls it, for example, they may have a wrapped + // Statistics object, passing the call to recordTick() into here, nothing + // will break. + void measureTime(uint32_t histogramType, uint64_t time) override { + recordInHistogram(histogramType, time); + } + virtual void recordInHistogram(uint32_t histogram_type, + uint64_t value) override; virtual Status Reset() override; virtual std::string ToString() const override; + virtual bool getTickerMap(std::map<std::string, uint64_t>*) const override; virtual bool HistEnabledForType(uint32_t type) const override; private: // If non-nullptr, forwards updates to the object pointed to by `stats_`. std::shared_ptr<Statistics> stats_; - // TODO(ajkr): clean this up since there are no internal stats anymore - bool enable_internal_stats_; // Synchronizes anything that operates across other cores' local data, // such that operations like Reset() can be performed atomically. mutable port::Mutex aggregate_lock_; @@ -100,10 +107,17 @@ class StatisticsImpl : public Statistics { }; // Utility functions -inline void MeasureTime(Statistics* statistics, uint32_t histogram_type, - uint64_t value) { +inline void RecordInHistogram(Statistics* statistics, uint32_t histogram_type, + uint64_t value) { + if (statistics) { + statistics->recordInHistogram(histogram_type, value); + } +} + +inline void RecordTimeToHistogram(Statistics* statistics, + uint32_t histogram_type, uint64_t value) { if (statistics) { - statistics->measureTime(histogram_type, value); + statistics->reportTimeToHistogram(histogram_type, value); } } diff --git a/ceph/src/rocksdb/monitoring/statistics_test.cc b/ceph/src/rocksdb/monitoring/statistics_test.cc index 43aacde9c..a77022bfb 100644 --- a/ceph/src/rocksdb/monitoring/statistics_test.cc +++ b/ceph/src/rocksdb/monitoring/statistics_test.cc @@ -16,7 +16,7 @@ class StatisticsTest : public testing::Test {}; // Sanity check to make sure that contents and order of TickersNameMap // match Tickers enum -TEST_F(StatisticsTest, Sanity) { +TEST_F(StatisticsTest, SanityTickers) { EXPECT_EQ(static_cast<size_t>(Tickers::TICKER_ENUM_MAX), TickersNameMap.size()); @@ -26,6 +26,18 @@ TEST_F(StatisticsTest, Sanity) { } } +// Sanity check to make sure that contents and order of HistogramsNameMap +// match Histograms enum +TEST_F(StatisticsTest, SanityHistograms) { + EXPECT_EQ(static_cast<size_t>(Histograms::HISTOGRAM_ENUM_MAX), + HistogramsNameMap.size()); + + for (uint32_t h = 0; h < Histograms::HISTOGRAM_ENUM_MAX; h++) { + auto pair = HistogramsNameMap[static_cast<size_t>(h)]; + ASSERT_EQ(pair.first, h) << "Mismatch at " << pair.second; + } +} + } // namespace rocksdb int main(int argc, char** argv) { diff --git a/ceph/src/rocksdb/options/cf_options.cc b/ceph/src/rocksdb/options/cf_options.cc index 37ef71065..6957e150f 100644 --- a/ceph/src/rocksdb/options/cf_options.cc +++ b/ceph/src/rocksdb/options/cf_options.cc @@ -17,6 +17,7 @@ #include "port/port.h" #include 
"rocksdb/env.h" #include "rocksdb/options.h" +#include "rocksdb/concurrent_task_limiter.h" namespace rocksdb { @@ -75,7 +76,8 @@ ImmutableCFOptions::ImmutableCFOptions(const ImmutableDBOptions& db_options, max_subcompactions(db_options.max_subcompactions), memtable_insert_with_hint_prefix_extractor( cf_options.memtable_insert_with_hint_prefix_extractor.get()), - cf_paths(cf_options.cf_paths) {} + cf_paths(cf_options.cf_paths), + compaction_thread_limiter(cf_options.compaction_thread_limiter) {} // Multiple two operands. If they overflow, return op1. uint64_t MultiplyCheckOverflow(uint64_t op1, double op2) { @@ -133,6 +135,8 @@ void MutableCFOptions::Dump(Logger* log) const { arena_block_size); ROCKS_LOG_INFO(log, " memtable_prefix_bloom_ratio: %f", memtable_prefix_bloom_size_ratio); + ROCKS_LOG_INFO(log, " memtable_whole_key_filtering: %d", + memtable_whole_key_filtering); ROCKS_LOG_INFO(log, " memtable_huge_page_size: %" ROCKSDB_PRIszt, memtable_huge_page_size); @@ -214,8 +218,6 @@ void MutableCFOptions::Dump(Logger* log) const { // FIFO Compaction Options ROCKS_LOG_INFO(log, "compaction_options_fifo.max_table_files_size : %" PRIu64, compaction_options_fifo.max_table_files_size); - ROCKS_LOG_INFO(log, "compaction_options_fifo.ttl : %" PRIu64, - compaction_options_fifo.ttl); ROCKS_LOG_INFO(log, "compaction_options_fifo.allow_compaction : %d", compaction_options_fifo.allow_compaction); } diff --git a/ceph/src/rocksdb/options/cf_options.h b/ceph/src/rocksdb/options/cf_options.h index 1658bf427..fed144e4c 100644 --- a/ceph/src/rocksdb/options/cf_options.h +++ b/ceph/src/rocksdb/options/cf_options.h @@ -18,7 +18,7 @@ namespace rocksdb { // ImmutableCFOptions is a data struct used by RocksDB internal. It contains a // subset of Options that should not be changed during the entire lifetime // of DB. Raw pointers defined in this struct do not have ownership to the data -// they point to. Options contains shared_ptr to these data. +// they point to. Options contains std::shared_ptr to these data. 
struct ImmutableCFOptions { ImmutableCFOptions(); explicit ImmutableCFOptions(const Options& options); @@ -120,6 +120,8 @@ struct ImmutableCFOptions { const SliceTransform* memtable_insert_with_hint_prefix_extractor; std::vector cf_paths; + + std::shared_ptr compaction_thread_limiter; }; struct MutableCFOptions { @@ -129,6 +131,7 @@ struct MutableCFOptions { arena_block_size(options.arena_block_size), memtable_prefix_bloom_size_ratio( options.memtable_prefix_bloom_size_ratio), + memtable_whole_key_filtering(options.memtable_whole_key_filtering), memtable_huge_page_size(options.memtable_huge_page_size), max_successive_merges(options.max_successive_merges), inplace_update_num_locks(options.inplace_update_num_locks), @@ -156,7 +159,8 @@ struct MutableCFOptions { options.max_sequential_skip_in_iterations), paranoid_file_checks(options.paranoid_file_checks), report_bg_io_stats(options.report_bg_io_stats), - compression(options.compression) { + compression(options.compression), + sample_for_compression(options.sample_for_compression) { RefreshDerivedOptions(options.num_levels, options.compaction_style); } @@ -165,6 +169,7 @@ struct MutableCFOptions { max_write_buffer_number(0), arena_block_size(0), memtable_prefix_bloom_size_ratio(0), + memtable_whole_key_filtering(false), memtable_huge_page_size(0), max_successive_merges(0), inplace_update_num_locks(0), @@ -185,7 +190,8 @@ struct MutableCFOptions { max_sequential_skip_in_iterations(0), paranoid_file_checks(false), report_bg_io_stats(false), - compression(Snappy_Supported() ? kSnappyCompression : kNoCompression) {} + compression(Snappy_Supported() ? kSnappyCompression : kNoCompression), + sample_for_compression(0) {} explicit MutableCFOptions(const Options& options); @@ -211,6 +217,7 @@ struct MutableCFOptions { int max_write_buffer_number; size_t arena_block_size; double memtable_prefix_bloom_size_ratio; + bool memtable_whole_key_filtering; size_t memtable_huge_page_size; size_t max_successive_merges; size_t inplace_update_num_locks; @@ -238,6 +245,7 @@ struct MutableCFOptions { bool paranoid_file_checks; bool report_bg_io_stats; CompressionType compression; + uint64_t sample_for_compression; // Derived options // Per-level target file size. diff --git a/ceph/src/rocksdb/options/db_options.cc b/ceph/src/rocksdb/options/db_options.cc index fd3cdcccd..f24705cb7 100644 --- a/ceph/src/rocksdb/options/db_options.cc +++ b/ceph/src/rocksdb/options/db_options.cc @@ -85,7 +85,9 @@ ImmutableDBOptions::ImmutableDBOptions(const DBOptions& options) allow_ingest_behind(options.allow_ingest_behind), preserve_deletes(options.preserve_deletes), two_write_queues(options.two_write_queues), - manual_wal_flush(options.manual_wal_flush) { + manual_wal_flush(options.manual_wal_flush), + atomic_flush(options.atomic_flush), + avoid_unnecessary_blocking_io(options.avoid_unnecessary_blocking_io) { } void ImmutableDBOptions::Dump(Logger* log) const { @@ -178,7 +180,7 @@ void ImmutableDBOptions::Dump(Logger* log) const { log, " Options.sst_file_manager.rate_bytes_per_sec: %" PRIi64, sst_file_manager ? 
sst_file_manager->GetDeleteRateBytesPerSecond() : 0); ROCKS_LOG_HEADER(log, " Options.wal_recovery_mode: %d", - wal_recovery_mode); + static_cast(wal_recovery_mode)); ROCKS_LOG_HEADER(log, " Options.enable_thread_tracking: %d", enable_thread_tracking); ROCKS_LOG_HEADER(log, " Options.enable_pipelined_write: %d", @@ -195,7 +197,8 @@ void ImmutableDBOptions::Dump(Logger* log) const { write_thread_slow_yield_usec); if (row_cache) { ROCKS_LOG_HEADER( - log, " Options.row_cache: %" PRIu64, + log, + " Options.row_cache: %" ROCKSDB_PRIszt, row_cache->GetCapacity()); } else { ROCKS_LOG_HEADER(log, @@ -216,6 +219,10 @@ void ImmutableDBOptions::Dump(Logger* log) const { two_write_queues); ROCKS_LOG_HEADER(log, " Options.manual_wal_flush: %d", manual_wal_flush); + ROCKS_LOG_HEADER(log, " Options.atomic_flush: %d", atomic_flush); + ROCKS_LOG_HEADER(log, + " Options.avoid_unnecessary_blocking_io: %d", + avoid_unnecessary_blocking_io); } MutableDBOptions::MutableDBOptions() @@ -228,6 +235,8 @@ MutableDBOptions::MutableDBOptions() max_total_wal_size(0), delete_obsolete_files_period_micros(6ULL * 60 * 60 * 1000000), stats_dump_period_sec(600), + stats_persist_period_sec(600), + stats_history_buffer_size(1024 * 1024), max_open_files(-1), bytes_per_sync(0), wal_bytes_per_sync(0), @@ -244,6 +253,8 @@ MutableDBOptions::MutableDBOptions(const DBOptions& options) delete_obsolete_files_period_micros( options.delete_obsolete_files_period_micros), stats_dump_period_sec(options.stats_dump_period_sec), + stats_persist_period_sec(options.stats_persist_period_sec), + stats_history_buffer_size(options.stats_history_buffer_size), max_open_files(options.max_open_files), bytes_per_sync(options.bytes_per_sync), wal_bytes_per_sync(options.wal_bytes_per_sync), @@ -268,6 +279,12 @@ void MutableDBOptions::Dump(Logger* log) const { delete_obsolete_files_period_micros); ROCKS_LOG_HEADER(log, " Options.stats_dump_period_sec: %u", stats_dump_period_sec); + ROCKS_LOG_HEADER(log, " Options.stats_persist_period_sec: %d", + stats_persist_period_sec); + ROCKS_LOG_HEADER( + log, + " Options.stats_history_buffer_size: %" ROCKSDB_PRIszt, + stats_history_buffer_size); ROCKS_LOG_HEADER(log, " Options.max_open_files: %d", max_open_files); ROCKS_LOG_HEADER(log, diff --git a/ceph/src/rocksdb/options/db_options.h b/ceph/src/rocksdb/options/db_options.h index 107d35c87..283cf7d35 100644 --- a/ceph/src/rocksdb/options/db_options.h +++ b/ceph/src/rocksdb/options/db_options.h @@ -78,6 +78,8 @@ struct ImmutableDBOptions { bool preserve_deletes; bool two_write_queues; bool manual_wal_flush; + bool atomic_flush; + bool avoid_unnecessary_blocking_io; }; struct MutableDBOptions { @@ -96,6 +98,8 @@ struct MutableDBOptions { uint64_t max_total_wal_size; uint64_t delete_obsolete_files_period_micros; unsigned int stats_dump_period_sec; + unsigned int stats_persist_period_sec; + size_t stats_history_buffer_size; int max_open_files; uint64_t bytes_per_sync; uint64_t wal_bytes_per_sync; diff --git a/ceph/src/rocksdb/options/options.cc b/ceph/src/rocksdb/options/options.cc index 17798accb..2c9954581 100644 --- a/ceph/src/rocksdb/options/options.cc +++ b/ceph/src/rocksdb/options/options.cc @@ -51,6 +51,7 @@ AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions(const Options& options) inplace_callback(options.inplace_callback), memtable_prefix_bloom_size_ratio( options.memtable_prefix_bloom_size_ratio), + memtable_whole_key_filtering(options.memtable_whole_key_filtering), memtable_huge_page_size(options.memtable_huge_page_size), 
memtable_insert_with_hint_prefix_extractor( options.memtable_insert_with_hint_prefix_extractor), @@ -86,7 +87,8 @@ AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions(const Options& options) paranoid_file_checks(options.paranoid_file_checks), force_consistency_checks(options.force_consistency_checks), report_bg_io_stats(options.report_bg_io_stats), - ttl(options.ttl) { + ttl(options.ttl), + sample_for_compression(options.sample_for_compression) { assert(memtable_factory.get() != nullptr); if (max_bytes_for_level_multiplier_additional.size() < static_cast(num_levels)) { @@ -171,12 +173,12 @@ void ColumnFamilyOptions::Dump(Logger* log) const { ROCKS_LOG_HEADER( log, " Options.bottommost_compression_opts.max_dict_bytes: " - "%" ROCKSDB_PRIszt, + "%" PRIu32, bottommost_compression_opts.max_dict_bytes); ROCKS_LOG_HEADER( log, " Options.bottommost_compression_opts.zstd_max_train_bytes: " - "%" ROCKSDB_PRIszt, + "%" PRIu32, bottommost_compression_opts.zstd_max_train_bytes); ROCKS_LOG_HEADER( log, " Options.bottommost_compression_opts.enabled: %s", @@ -189,11 +191,11 @@ void ColumnFamilyOptions::Dump(Logger* log) const { compression_opts.strategy); ROCKS_LOG_HEADER( log, - " Options.compression_opts.max_dict_bytes: %" ROCKSDB_PRIszt, + " Options.compression_opts.max_dict_bytes: %" PRIu32, compression_opts.max_dict_bytes); ROCKS_LOG_HEADER(log, " Options.compression_opts.zstd_max_train_bytes: " - "%" ROCKSDB_PRIszt, + "%" PRIu32, compression_opts.zstd_max_train_bytes); ROCKS_LOG_HEADER(log, " Options.compression_opts.enabled: %s", @@ -306,8 +308,6 @@ void ColumnFamilyOptions::Dump(Logger* log) const { ROCKS_LOG_HEADER(log, "Options.compaction_options_fifo.allow_compaction: %d", compaction_options_fifo.allow_compaction); - ROCKS_LOG_HEADER(log, "Options.compaction_options_fifo.ttl: %" PRIu64, - compaction_options_fifo.ttl); std::string collector_names; for (const auto& collector_factory : table_properties_collector_factories) { collector_names.append(collector_factory->Name()); @@ -327,6 +327,9 @@ void ColumnFamilyOptions::Dump(Logger* log) const { ROCKS_LOG_HEADER( log, " Options.memtable_prefix_bloom_size_ratio: %f", memtable_prefix_bloom_size_ratio); + ROCKS_LOG_HEADER(log, + " Options.memtable_whole_key_filtering: %d", + memtable_whole_key_filtering); ROCKS_LOG_HEADER(log, " Options.memtable_huge_page_size: %" ROCKSDB_PRIszt, memtable_huge_page_size); @@ -347,7 +350,8 @@ void ColumnFamilyOptions::Dump(Logger* log) const { force_consistency_checks); ROCKS_LOG_HEADER(log, " Options.report_bg_io_stats: %d", report_bg_io_stats); - ROCKS_LOG_HEADER(log, " Options.ttl: %d", ttl); + ROCKS_LOG_HEADER(log, " Options.ttl: %" PRIu64, + ttl); } // ColumnFamilyOptions::Dump void Options::Dump(Logger* log) const { @@ -439,6 +443,10 @@ DBOptions* DBOptions::OldDefaults(int rocksdb_major_version, ColumnFamilyOptions* ColumnFamilyOptions::OldDefaults( int rocksdb_major_version, int rocksdb_minor_version) { + if (rocksdb_major_version < 5 || + (rocksdb_major_version == 5 && rocksdb_minor_version <= 18)) { + compaction_pri = CompactionPri::kByCompensatedSize; + } if (rocksdb_major_version < 4 || (rocksdb_major_version == 4 && rocksdb_minor_version < 7)) { write_buffer_size = 4 << 20; @@ -452,7 +460,6 @@ ColumnFamilyOptions* ColumnFamilyOptions::OldDefaults( } else if (rocksdb_major_version == 5 && rocksdb_minor_version < 2) { level0_stop_writes_trigger = 30; } - compaction_pri = CompactionPri::kByCompensatedSize; return this; } diff --git a/ceph/src/rocksdb/options/options_helper.cc 
b/ceph/src/rocksdb/options/options_helper.cc index f4c59ff06..9facf6e94 100644 --- a/ceph/src/rocksdb/options/options_helper.cc +++ b/ceph/src/rocksdb/options/options_helper.cc @@ -19,6 +19,7 @@ #include "rocksdb/rate_limiter.h" #include "rocksdb/slice_transform.h" #include "rocksdb/table.h" +#include "rocksdb/utilities/object_registry.h" #include "table/block_based_table_factory.h" #include "table/plain_table_factory.h" #include "util/cast_util.h" @@ -79,6 +80,10 @@ DBOptions BuildDBOptions(const ImmutableDBOptions& immutable_db_options, options.allow_fallocate = immutable_db_options.allow_fallocate; options.is_fd_close_on_exec = immutable_db_options.is_fd_close_on_exec; options.stats_dump_period_sec = mutable_db_options.stats_dump_period_sec; + options.stats_persist_period_sec = + mutable_db_options.stats_persist_period_sec; + options.stats_history_buffer_size = + mutable_db_options.stats_history_buffer_size; options.advise_random_on_open = immutable_db_options.advise_random_on_open; options.db_write_buffer_size = immutable_db_options.db_write_buffer_size; options.write_buffer_manager = immutable_db_options.write_buffer_manager; @@ -126,6 +131,9 @@ DBOptions BuildDBOptions(const ImmutableDBOptions& immutable_db_options, immutable_db_options.preserve_deletes; options.two_write_queues = immutable_db_options.two_write_queues; options.manual_wal_flush = immutable_db_options.manual_wal_flush; + options.atomic_flush = immutable_db_options.atomic_flush; + options.avoid_unnecessary_blocking_io = + immutable_db_options.avoid_unnecessary_blocking_io; return options; } @@ -141,6 +149,8 @@ ColumnFamilyOptions BuildColumnFamilyOptions( cf_opts.arena_block_size = mutable_cf_options.arena_block_size; cf_opts.memtable_prefix_bloom_size_ratio = mutable_cf_options.memtable_prefix_bloom_size_ratio; + cf_opts.memtable_whole_key_filtering = + mutable_cf_options.memtable_whole_key_filtering; cf_opts.memtable_huge_page_size = mutable_cf_options.memtable_huge_page_size; cf_opts.max_successive_merges = mutable_cf_options.max_successive_merges; cf_opts.inplace_update_num_locks = @@ -186,6 +196,7 @@ ColumnFamilyOptions BuildColumnFamilyOptions( cf_opts.paranoid_file_checks = mutable_cf_options.paranoid_file_checks; cf_opts.report_bg_io_stats = mutable_cf_options.report_bg_io_stats; cf_opts.compression = mutable_cf_options.compression; + cf_opts.sample_for_compression = mutable_cf_options.sample_for_compression; cf_opts.table_factory = options.table_factory; // TODO(yhchiang): find some way to handle the following derived options @@ -215,7 +226,8 @@ std::map std::unordered_map OptionsHelper::checksum_type_string_map = {{"kNoChecksum", kNoChecksum}, {"kCRC32c", kCRC32c}, - {"kxxHash", kxxHash}}; + {"kxxHash", kxxHash}, + {"kxxHash64", kxxHash64}}; std::unordered_map OptionsHelper::compression_type_string_map = { @@ -231,6 +243,9 @@ std::unordered_map {"kDisableCompressionOption", kDisableCompressionOption}}; #ifndef ROCKSDB_LITE +const std::string kNameComparator = "comparator"; +const std::string kNameMergeOperator = "merge_operator"; + template Status GetStringFromStruct( std::string* opt_string, const T& options, @@ -446,6 +461,12 @@ bool ParseOptionHelper(char* opt_address, const OptionType& opt_type, case OptionType::kInt: *reinterpret_cast(opt_address) = ParseInt(value); break; + case OptionType::kInt32T: + *reinterpret_cast(opt_address) = ParseInt32(value); + break; + case OptionType::kInt64T: + PutUnaligned(reinterpret_cast(opt_address), ParseInt64(value)); + break; case OptionType::kVectorInt: 
*reinterpret_cast<std::vector<int>*>(opt_address) = ParseVectorInt(value); break; @@ -555,6 +576,16 @@ bool SerializeSingleOptionHelper(const char* opt_address, case OptionType::kInt: *value = ToString(*(reinterpret_cast<const int*>(opt_address))); break; + case OptionType::kInt32T: + *value = ToString(*(reinterpret_cast<const int32_t*>(opt_address))); + break; + case OptionType::kInt64T: + { + int64_t v; + GetUnaligned(reinterpret_cast<const int64_t*>(opt_address), &v); + *value = ToString(v); + } + break; case OptionType::kVectorInt: return SerializeIntVector( *reinterpret_cast<const std::vector<int>*>(opt_address), value); @@ -988,6 +1019,26 @@ Status ParseColumnFamilyOption(const std::string& name, return s; } } else { + if (name == kNameComparator) { + // Try to get comparator from object registry first. + std::unique_ptr<const Comparator> comp_guard; + const Comparator* comp = + NewCustomObject<const Comparator>(value, &comp_guard); + // Only support static comparator for now. + if (comp != nullptr && !comp_guard) { + new_options->comparator = comp; + } + } else if (name == kNameMergeOperator) { + // Try to get merge operator from object registry first. + std::unique_ptr<std::shared_ptr<MergeOperator>> mo_guard; + std::shared_ptr<MergeOperator>* mo = + NewCustomObject<std::shared_ptr<MergeOperator>>(value, &mo_guard); + // The registry hands back a pointer to a shared_ptr; copy it into + // the target options. + if (mo != nullptr) { + new_options->merge_operator = *mo; + } + } + auto iter = cf_options_type_info.find(name); if (iter == cf_options_type_info.end()) { return Status::InvalidArgument( @@ -1491,6 +1542,14 @@ std::unordered_map<std::string, OptionTypeInfo> {offsetof(struct DBOptions, stats_dump_period_sec), OptionType::kUInt, OptionVerificationType::kNormal, true, offsetof(struct MutableDBOptions, stats_dump_period_sec)}}, + {"stats_persist_period_sec", + {offsetof(struct DBOptions, stats_persist_period_sec), + OptionType::kUInt, OptionVerificationType::kNormal, true, + offsetof(struct MutableDBOptions, stats_persist_period_sec)}}, + {"stats_history_buffer_size", + {offsetof(struct DBOptions, stats_history_buffer_size), + OptionType::kSizeT, OptionVerificationType::kNormal, true, + offsetof(struct MutableDBOptions, stats_history_buffer_size)}}, {"fail_if_options_file_error", {offsetof(struct DBOptions, fail_if_options_file_error), OptionType::kBoolean, OptionVerificationType::kNormal, false, 0}}, @@ -1554,7 +1613,16 @@ std::unordered_map<std::string, OptionTypeInfo> offsetof(struct ImmutableDBOptions, manual_wal_flush)}}, {"seq_per_batch", {0, OptionType::kBoolean, OptionVerificationType::kDeprecated, false, - 0}}}; + 0}}, + {"atomic_flush", + {offsetof(struct DBOptions, atomic_flush), OptionType::kBoolean, + OptionVerificationType::kNormal, false, + offsetof(struct ImmutableDBOptions, atomic_flush)}}, + {"avoid_unnecessary_blocking_io", + {offsetof(struct DBOptions, avoid_unnecessary_blocking_io), + OptionType::kBoolean, OptionVerificationType::kNormal, false, + offsetof(struct ImmutableDBOptions, avoid_unnecessary_blocking_io)}} + }; std::unordered_map<std::string, BlockBasedTableOptions::IndexType> OptionsHelper::block_base_table_index_type_string_map = { @@ -1795,6 +1863,10 @@ std::unordered_map<std::string, OptionTypeInfo> {"memtable_prefix_bloom_probes", {0, OptionType::kUInt32T, OptionVerificationType::kDeprecated, true, 0}}, + {"memtable_whole_key_filtering", + {offset_of(&ColumnFamilyOptions::memtable_whole_key_filtering), + OptionType::kBoolean, OptionVerificationType::kNormal, true, + offsetof(struct MutableCFOptions, memtable_whole_key_filtering)}}, {"min_partial_merge_operands", {0, OptionType::kUInt32T, OptionVerificationType::kDeprecated, true, 0}}, @@ -1835,7 +1907,7 @@ std::unordered_map<std::string, OptionTypeInfo> {offset_of(&ColumnFamilyOptions::bottommost_compression), OptionType::kCompressionType, OptionVerificationType::kNormal, false, 0}}, -
{"comparator", + {kNameComparator, {offset_of(&ColumnFamilyOptions::comparator), OptionType::kComparator, OptionVerificationType::kByName, false, 0}}, {"prefix_extractor", @@ -1863,7 +1935,7 @@ std::unordered_map {offset_of(&ColumnFamilyOptions::compaction_filter_factory), OptionType::kCompactionFilterFactory, OptionVerificationType::kByName, false, 0}}, - {"merge_operator", + {kNameMergeOperator, {offset_of(&ColumnFamilyOptions::merge_operator), OptionType::kMergeOperator, OptionVerificationType::kByNameAllowFromNull, false, 0}}, @@ -1887,7 +1959,11 @@ std::unordered_map {"ttl", {offset_of(&ColumnFamilyOptions::ttl), OptionType::kUInt64T, OptionVerificationType::kNormal, true, - offsetof(struct MutableCFOptions, ttl)}}}; + offsetof(struct MutableCFOptions, ttl)}}, + {"sample_for_compression", + {offset_of(&ColumnFamilyOptions::sample_for_compression), + OptionType::kUInt64T, OptionVerificationType::kNormal, true, + offsetof(struct MutableCFOptions, sample_for_compression)}}}; std::unordered_map OptionsHelper::fifo_compaction_options_type_info = { @@ -1896,9 +1972,9 @@ std::unordered_map OptionType::kUInt64T, OptionVerificationType::kNormal, true, offsetof(struct CompactionOptionsFIFO, max_table_files_size)}}, {"ttl", - {offset_of(&CompactionOptionsFIFO::ttl), OptionType::kUInt64T, - OptionVerificationType::kNormal, true, - offsetof(struct CompactionOptionsFIFO, ttl)}}, + {0, OptionType::kUInt64T, + OptionVerificationType::kDeprecated, false, + 0}}, {"allow_compaction", {offset_of(&CompactionOptionsFIFO::allow_compaction), OptionType::kBoolean, OptionVerificationType::kNormal, true, diff --git a/ceph/src/rocksdb/options/options_helper.h b/ceph/src/rocksdb/options/options_helper.h index 016a0a1af..1d3d880a6 100644 --- a/ceph/src/rocksdb/options/options_helper.h +++ b/ceph/src/rocksdb/options/options_helper.h @@ -47,6 +47,8 @@ Status GetTableFactoryFromMap( enum class OptionType { kBoolean, kInt, + kInt32T, + kInt64T, kVectorInt, kUInt, kUInt32T, diff --git a/ceph/src/rocksdb/options/options_parser.cc b/ceph/src/rocksdb/options/options_parser.cc index f9144b67d..2a85fa534 100644 --- a/ceph/src/rocksdb/options/options_parser.cc +++ b/ceph/src/rocksdb/options/options_parser.cc @@ -48,7 +48,7 @@ Status PersistRocksDBOptions(const DBOptions& db_opt, if (!s.ok()) { return s; } - unique_ptr writable; + std::unique_ptr writable; writable.reset(new WritableFileWriter(std::move(wf), file_name, EnvOptions(), nullptr /* statistics */)); @@ -500,6 +500,16 @@ bool AreEqualOptions( case OptionType::kInt: return (*reinterpret_cast(offset1) == *reinterpret_cast(offset2)); + case OptionType::kInt32T: + return (*reinterpret_cast(offset1) == + *reinterpret_cast(offset2)); + case OptionType::kInt64T: + { + int64_t v1, v2; + GetUnaligned(reinterpret_cast(offset1), &v1); + GetUnaligned(reinterpret_cast(offset2), &v2); + return (v1 == v2); + } case OptionType::kVectorInt: return (*reinterpret_cast*>(offset1) == *reinterpret_cast*>(offset2)); @@ -574,7 +584,7 @@ bool AreEqualOptions( CompactionOptionsFIFO rhs = *reinterpret_cast(offset2); if (lhs.max_table_files_size == rhs.max_table_files_size && - lhs.ttl == rhs.ttl && lhs.allow_compaction == rhs.allow_compaction) { + lhs.allow_compaction == rhs.allow_compaction) { return true; } return false; diff --git a/ceph/src/rocksdb/options/options_settable_test.cc b/ceph/src/rocksdb/options/options_settable_test.cc index ded152ba9..3a6bd6a88 100644 --- a/ceph/src/rocksdb/options/options_settable_test.cc +++ b/ceph/src/rocksdb/options/options_settable_test.cc @@ -266,6 
+266,8 @@ TEST_F(OptionsSettableTest, DBOptionsAllFieldsSettable) { "manifest_preallocation_size=1222;" "allow_mmap_writes=false;" "stats_dump_period_sec=70127;" + "stats_persist_period_sec=54321;" + "stats_history_buffer_size=14159;" "allow_fallocate=true;" "allow_mmap_reads=false;" "use_direct_reads=false;" @@ -291,7 +293,9 @@ TEST_F(OptionsSettableTest, DBOptionsAllFieldsSettable) { "concurrent_prepare=false;" "two_write_queues=false;" "manual_wal_flush=false;" - "seq_per_batch=false;", + "seq_per_batch=false;" + "atomic_flush=false;" + "avoid_unnecessary_blocking_io=false", new_options)); ASSERT_EQ(unset_bytes_base, NumUnsetBytes(new_options_ptr, sizeof(DBOptions), @@ -350,6 +354,8 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) { sizeof(std::shared_ptr)}, {offset_of(&ColumnFamilyOptions::cf_paths), sizeof(std::vector)}, + {offset_of(&ColumnFamilyOptions::compaction_thread_limiter), + sizeof(std::shared_ptr)}, }; char* options_ptr = new char[sizeof(ColumnFamilyOptions)]; @@ -388,6 +394,7 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) { options->soft_rate_limit = 0; options->purge_redundant_kvs_while_flush = false; options->max_mem_compaction_level = 0; + options->compaction_filter = nullptr; char* new_options_ptr = new char[sizeof(ColumnFamilyOptions)]; ColumnFamilyOptions* new_options = @@ -431,6 +438,7 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) { "max_write_buffer_number_to_maintain=84;" "merge_operator=aabcxehazrMergeOperator;" "memtable_prefix_bloom_size_ratio=0.4642;" + "memtable_whole_key_filtering=true;" "memtable_insert_with_hint_prefix_extractor=rocksdb.CappedPrefix.13;" "paranoid_file_checks=true;" "force_consistency_checks=true;" @@ -444,7 +452,8 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) { "disable_auto_compactions=false;" "report_bg_io_stats=true;" "ttl=60;" - "compaction_options_fifo={max_table_files_size=3;ttl=100;allow_" + "sample_for_compression=0;" + "compaction_options_fifo={max_table_files_size=3;allow_" "compaction=false;};", new_options)); diff --git a/ceph/src/rocksdb/options/options_test.cc b/ceph/src/rocksdb/options/options_test.cc index 6dc94af5b..586e5697c 100644 --- a/ceph/src/rocksdb/options/options_test.cc +++ b/ceph/src/rocksdb/options/options_test.cc @@ -21,15 +21,18 @@ #include "options/options_helper.h" #include "options/options_parser.h" #include "options/options_sanity_check.h" +#include "port/port.h" #include "rocksdb/cache.h" #include "rocksdb/convenience.h" #include "rocksdb/memtablerep.h" #include "rocksdb/utilities/leveldb_options.h" +#include "rocksdb/utilities/object_registry.h" #include "util/random.h" #include "util/stderr_logger.h" #include "util/string_util.h" #include "util/testharness.h" #include "util/testutil.h" +#include "utilities/merge_operators/bytesxor.h" #ifndef GFLAGS bool FLAGS_enable_print = false; @@ -90,6 +93,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { {"compaction_measure_io_stats", "false"}, {"inplace_update_num_locks", "25"}, {"memtable_prefix_bloom_size_ratio", "0.26"}, + {"memtable_whole_key_filtering", "true"}, {"memtable_huge_page_size", "28"}, {"bloom_locality", "29"}, {"max_successive_merges", "30"}, @@ -127,6 +131,8 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { {"is_fd_close_on_exec", "true"}, {"skip_log_error_on_recovery", "false"}, {"stats_dump_period_sec", "46"}, + {"stats_persist_period_sec", "57"}, + {"stats_history_buffer_size", "69"}, {"advise_random_on_open", "true"}, {"use_adaptive_mutex", "false"}, 
{"new_table_reader_for_compaction_inputs", "true"}, @@ -195,6 +201,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { ASSERT_EQ(new_cf_opt.inplace_update_support, true); ASSERT_EQ(new_cf_opt.inplace_update_num_locks, 25U); ASSERT_EQ(new_cf_opt.memtable_prefix_bloom_size_ratio, 0.26); + ASSERT_EQ(new_cf_opt.memtable_whole_key_filtering, true); ASSERT_EQ(new_cf_opt.memtable_huge_page_size, 28U); ASSERT_EQ(new_cf_opt.bloom_locality, 29U); ASSERT_EQ(new_cf_opt.max_successive_merges, 30U); @@ -260,6 +267,8 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { ASSERT_EQ(new_db_opt.is_fd_close_on_exec, true); ASSERT_EQ(new_db_opt.skip_log_error_on_recovery, false); ASSERT_EQ(new_db_opt.stats_dump_period_sec, 46U); + ASSERT_EQ(new_db_opt.stats_persist_period_sec, 57U); + ASSERT_EQ(new_db_opt.stats_history_buffer_size, 69U); ASSERT_EQ(new_db_opt.advise_random_on_open, true); ASSERT_EQ(new_db_opt.use_adaptive_mutex, false); ASSERT_EQ(new_db_opt.new_table_reader_for_compaction_inputs, true); @@ -328,6 +337,34 @@ TEST_F(OptionsTest, GetColumnFamilyOptionsFromStringTest) { &new_cf_opt)); ASSERT_OK(RocksDBOptionsParser::VerifyCFOptions(base_cf_opt, new_cf_opt)); + // Comparator from object registry + std::string kCompName = "reverse_comp"; + static Registrar test_reg_a( + kCompName, [](const std::string& /*name*/, + std::unique_ptr* /*comparator_guard*/) { + return ReverseBytewiseComparator(); + }); + + ASSERT_OK(GetColumnFamilyOptionsFromString( + base_cf_opt, "comparator=" + kCompName + ";", &new_cf_opt)); + ASSERT_EQ(new_cf_opt.comparator, ReverseBytewiseComparator()); + + // MergeOperator from object registry + std::unique_ptr bxo(new BytesXOROperator()); + std::string kMoName = bxo->Name(); + static Registrar> test_reg_b( + kMoName, [](const std::string& /*name*/, + std::unique_ptr>* + merge_operator_guard) { + merge_operator_guard->reset( + new std::shared_ptr(new BytesXOROperator())); + return merge_operator_guard->get(); + }); + + ASSERT_OK(GetColumnFamilyOptionsFromString( + base_cf_opt, "merge_operator=" + kMoName + ";", &new_cf_opt)); + ASSERT_EQ(kMoName, std::string(new_cf_opt.merge_operator->Name())); + // Wrong key/value pair ASSERT_NOK(GetColumnFamilyOptionsFromString(base_cf_opt, "write_buffer_size=13;max_write_buffer_number;", &new_cf_opt)); @@ -704,8 +741,8 @@ TEST_F(OptionsTest, GetMemTableRepFactoryFromString) { &new_mem_factory)); ASSERT_NOK(GetMemTableRepFactoryFromString("cuckoo", &new_mem_factory)); - ASSERT_OK(GetMemTableRepFactoryFromString("cuckoo:1024", &new_mem_factory)); - ASSERT_EQ(std::string(new_mem_factory->Name()), "HashCuckooRepFactory"); + // CuckooHash memtable is already removed. 
+ ASSERT_NOK(GetMemTableRepFactoryFromString("cuckoo:1024", &new_mem_factory)); ASSERT_NOK(GetMemTableRepFactoryFromString("bad_factory", &new_mem_factory)); } @@ -1528,6 +1565,7 @@ TEST_F(OptionsParserTest, DifferentDefault) { const std::string kOptionsFileName = "test-persisted-options.ini"; ColumnFamilyOptions cf_level_opts; + ASSERT_EQ(CompactionPri::kMinOverlappingRatio, cf_level_opts.compaction_pri); cf_level_opts.OptimizeLevelStyleCompaction(); ColumnFamilyOptions cf_univ_opts; @@ -1597,6 +1635,14 @@ TEST_F(OptionsParserTest, DifferentDefault) { Options old_default_opts; old_default_opts.OldDefaults(5, 2); ASSERT_EQ(16 * 1024U * 1024U, old_default_opts.delayed_write_rate); + ASSERT_TRUE(old_default_opts.compaction_pri == + CompactionPri::kByCompensatedSize); + } + { + Options old_default_opts; + old_default_opts.OldDefaults(5, 18); + ASSERT_TRUE(old_default_opts.compaction_pri == + CompactionPri::kByCompensatedSize); } Options small_opts; @@ -1798,6 +1844,18 @@ bool IsEscapedString(const std::string& str) { } } // namespace +TEST_F(OptionsParserTest, IntegerParsing) { + ASSERT_EQ(ParseUint64("18446744073709551615"), 18446744073709551615U); + ASSERT_EQ(ParseUint32("4294967295"), 4294967295U); + ASSERT_EQ(ParseSizeT("18446744073709551615"), 18446744073709551615U); + ASSERT_EQ(ParseInt64("9223372036854775807"), 9223372036854775807U); + ASSERT_EQ(ParseInt64("-9223372036854775808"), port::kMinInt64); + ASSERT_EQ(ParseInt32("2147483647"), 2147483647U); + ASSERT_EQ(ParseInt32("-2147483648"), port::kMinInt32); + ASSERT_EQ(ParseInt("-32767"), -32767); + ASSERT_EQ(ParseDouble("-1.234567"), -1.234567); +} + TEST_F(OptionsParserTest, EscapeOptionString) { ASSERT_EQ(UnescapeOptionString( "This is a test string with \\# \\: and \\\\ escape chars."), diff --git a/ceph/src/rocksdb/port/jemalloc_helper.h b/ceph/src/rocksdb/port/jemalloc_helper.h new file mode 100644 index 000000000..0c216face --- /dev/null +++ b/ceph/src/rocksdb/port/jemalloc_helper.h @@ -0,0 +1,53 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#pragma once + +#ifdef ROCKSDB_JEMALLOC +#ifdef __FreeBSD__ +#include <malloc_np.h> +#else +#include <jemalloc/jemalloc.h> +#endif + +#ifndef JEMALLOC_CXX_THROW +#define JEMALLOC_CXX_THROW +#endif + +// Declare non-standard jemalloc APIs as weak symbols. We can null-check these +// symbols to detect whether jemalloc is linked with the binary.
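+//
+// A minimal sketch of the weak-symbol technique relied on below: when no
+// strong definition is linked into the binary, a weak function's address
+// evaluates to null, so a plain pointer check detects jemalloc at runtime:
+//
+//   extern "C" void* mallocx(size_t, int) __attribute__((__weak__));
+//   bool jemalloc_linked = (mallocx != nullptr);  // false without jemalloc
+//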
+extern "C" void* mallocx(size_t, int) __attribute__((__weak__)); +extern "C" void* rallocx(void*, size_t, int) __attribute__((__weak__)); +extern "C" size_t xallocx(void*, size_t, size_t, int) __attribute__((__weak__)); +extern "C" size_t sallocx(const void*, int) __attribute__((__weak__)); +extern "C" void dallocx(void*, int) __attribute__((__weak__)); +extern "C" void sdallocx(void*, size_t, int) __attribute__((__weak__)); +extern "C" size_t nallocx(size_t, int) __attribute__((__weak__)); +extern "C" int mallctl(const char*, void*, size_t*, void*, size_t) + __attribute__((__weak__)); +extern "C" int mallctlnametomib(const char*, size_t*, size_t*) + __attribute__((__weak__)); +extern "C" int mallctlbymib(const size_t*, size_t, void*, size_t*, void*, + size_t) __attribute__((__weak__)); +extern "C" void malloc_stats_print(void (*)(void*, const char*), void*, + const char*) __attribute__((__weak__)); +extern "C" size_t malloc_usable_size(JEMALLOC_USABLE_SIZE_CONST void*) + JEMALLOC_CXX_THROW __attribute__((__weak__)); + +// Check if Jemalloc is linked with the binary. Note the main program might be +// using a different memory allocator even this method return true. +// It is loosely based on folly::usingJEMalloc(), minus the check that actually +// allocate memory and see if it is through jemalloc, to handle the dlopen() +// case: +// https://github.com/facebook/folly/blob/76cf8b5841fb33137cfbf8b224f0226437c855bc/folly/memory/Malloc.h#L147 +static inline bool HasJemalloc() { + return mallocx != nullptr && rallocx != nullptr && xallocx != nullptr && + sallocx != nullptr && dallocx != nullptr && sdallocx != nullptr && + nallocx != nullptr && mallctl != nullptr && + mallctlnametomib != nullptr && mallctlbymib != nullptr && + malloc_stats_print != nullptr && malloc_usable_size != nullptr; +} + +#endif // ROCKSDB_JEMALLOC diff --git a/ceph/src/rocksdb/port/dirent.h b/ceph/src/rocksdb/port/port_dirent.h similarity index 100% rename from ceph/src/rocksdb/port/dirent.h rename to ceph/src/rocksdb/port/port_dirent.h diff --git a/ceph/src/rocksdb/port/port_posix.cc b/ceph/src/rocksdb/port/port_posix.cc index 0377d448f..80081e480 100644 --- a/ceph/src/rocksdb/port/port_posix.cc +++ b/ceph/src/rocksdb/port/port_posix.cc @@ -25,6 +25,21 @@ #include "util/logging.h" namespace rocksdb { + +// We want to give users opportunity to default all the mutexes to adaptive if +// not specified otherwise. This enables a quick way to conduct various +// performance related experiements. +// +// NB! Support for adaptive mutexes is turned on by definining +// ROCKSDB_PTHREAD_ADAPTIVE_MUTEX during the compilation. If you use RocksDB +// build environment then this happens automatically; otherwise it's up to the +// consumer to define the identifier. 
+#ifdef ROCKSDB_DEFAULT_TO_ADAPTIVE_MUTEX +extern const bool kDefaultToAdaptiveMutex = true; +#else +extern const bool kDefaultToAdaptiveMutex = false; +#endif + namespace port { static int PthreadCall(const char* label, int result) { diff --git a/ceph/src/rocksdb/port/port_posix.h b/ceph/src/rocksdb/port/port_posix.h index 2d2a7a79c..63d7239fe 100644 --- a/ceph/src/rocksdb/port/port_posix.h +++ b/ceph/src/rocksdb/port/port_posix.h @@ -82,13 +82,18 @@ #endif namespace rocksdb { + +extern const bool kDefaultToAdaptiveMutex; + namespace port { // For use at db/file_indexer.h kLevelMaxIndex const uint32_t kMaxUint32 = std::numeric_limits::max(); const int kMaxInt32 = std::numeric_limits::max(); +const int kMinInt32 = std::numeric_limits::min(); const uint64_t kMaxUint64 = std::numeric_limits::max(); const int64_t kMaxInt64 = std::numeric_limits::max(); +const int64_t kMinInt64 = std::numeric_limits::min(); const size_t kMaxSizet = std::numeric_limits::max(); static const bool kLittleEndian = PLATFORM_IS_LITTLE_ENDIAN; @@ -98,19 +103,7 @@ class CondVar; class Mutex { public: -// We want to give users opportunity to default all the mutexes to adaptive if -// not specified otherwise. This enables a quick way to conduct various -// performance related experiements. -// -// NB! Support for adaptive mutexes is turned on by definining -// ROCKSDB_PTHREAD_ADAPTIVE_MUTEX during the compilation. If you use RocksDB -// build environment then this happens automatically; otherwise it's up to the -// consumer to define the identifier. -#ifdef ROCKSDB_DEFAULT_TO_ADAPTIVE_MUTEX - explicit Mutex(bool adaptive = true); -#else - explicit Mutex(bool adaptive = false); -#endif + explicit Mutex(bool adaptive = kDefaultToAdaptiveMutex); ~Mutex(); void Lock(); diff --git a/ceph/src/rocksdb/port/win/env_win.cc b/ceph/src/rocksdb/port/win/env_win.cc index 723a273f0..9abb14d67 100644 --- a/ceph/src/rocksdb/port/win/env_win.cc +++ b/ceph/src/rocksdb/port/win/env_win.cc @@ -24,7 +24,7 @@ #include "rocksdb/slice.h" #include "port/port.h" -#include "port/dirent.h" +#include "port/port_dirent.h" #include "port/win/win_logger.h" #include "port/win/io_win.h" @@ -48,7 +48,8 @@ ThreadStatusUpdater* CreateThreadStatusUpdater() { namespace { -static const size_t kSectorSize = 512; // Sector size used when physical sector size could not be obtained from device. +// Sector size used when physical sector size cannot be obtained from device. 
+static const size_t kSectorSize = 512; // RAII helpers for HANDLEs const auto CloseHandleFunc = [](HANDLE h) { ::CloseHandle(h); }; @@ -69,10 +70,11 @@ void WinthreadCall(const char* label, std::error_code result) { namespace port { WinEnvIO::WinEnvIO(Env* hosted_env) - : hosted_env_(hosted_env), + : hosted_env_(hosted_env), page_size_(4 * 1024), allocation_granularity_(page_size_), perf_counter_frequency_(0), + nano_seconds_per_period_(0), GetSystemTimePreciseAsFileTime_(NULL) { SYSTEM_INFO sinfo; @@ -87,12 +89,17 @@ WinEnvIO::WinEnvIO(Env* hosted_env) ret = QueryPerformanceFrequency(&qpf); assert(ret == TRUE); perf_counter_frequency_ = qpf.QuadPart; + + if (std::nano::den % perf_counter_frequency_ == 0) { + nano_seconds_per_period_ = std::nano::den / perf_counter_frequency_; + } } HMODULE module = GetModuleHandle("kernel32.dll"); if (module != NULL) { - GetSystemTimePreciseAsFileTime_ = (FnGetSystemTimePreciseAsFileTime)GetProcAddress( - module, "GetSystemTimePreciseAsFileTime"); + GetSystemTimePreciseAsFileTime_ = + (FnGetSystemTimePreciseAsFileTime)GetProcAddress( + module, "GetSystemTimePreciseAsFileTime"); } } @@ -102,11 +109,12 @@ WinEnvIO::~WinEnvIO() { Status WinEnvIO::DeleteFile(const std::string& fname) { Status result; - BOOL ret = DeleteFileA(fname.c_str()); + BOOL ret = RX_DeleteFile(RX_FN(fname).c_str()); + if(!ret) { auto lastError = GetLastError(); result = IOErrorFromWindowsError("Failed to delete: " + fname, - lastError); + lastError); } return result; @@ -114,7 +122,7 @@ Status WinEnvIO::DeleteFile(const std::string& fname) { Status WinEnvIO::Truncate(const std::string& fname, size_t size) { Status s; - int result = truncate(fname.c_str(), size); + int result = rocksdb::port::Truncate(fname, size); if (result != 0) { s = IOError("Failed to truncate: " + fname, errno); } @@ -132,8 +140,8 @@ Status WinEnvIO::GetCurrentTime(int64_t* unix_time) { } Status WinEnvIO::NewSequentialFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options) { + std::unique_ptr* result, + const EnvOptions& options) { Status s; result->reset(); @@ -151,17 +159,17 @@ Status WinEnvIO::NewSequentialFile(const std::string& fname, { IOSTATS_TIMER_GUARD(open_nanos); - hFile = CreateFileA( - fname.c_str(), GENERIC_READ, - FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, - OPEN_EXISTING, // Original fopen mode is "rb" - fileFlags, NULL); + hFile = RX_CreateFile( + RX_FN(fname).c_str(), GENERIC_READ, + FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, + OPEN_EXISTING, // Original fopen mode is "rb" + fileFlags, NULL); } if (INVALID_HANDLE_VALUE == hFile) { auto lastError = GetLastError(); s = IOErrorFromWindowsError("Failed to open NewSequentialFile" + fname, - lastError); + lastError); } else { result->reset(new WinSequentialFile(fname, hFile, options)); } @@ -169,8 +177,8 @@ Status WinEnvIO::NewSequentialFile(const std::string& fname, } Status WinEnvIO::NewRandomAccessFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options) { + std::unique_ptr* result, + const EnvOptions& options) { result->reset(); Status s; @@ -189,16 +197,16 @@ Status WinEnvIO::NewRandomAccessFile(const std::string& fname, HANDLE hFile = 0; { IOSTATS_TIMER_GUARD(open_nanos); - hFile = - CreateFileA(fname.c_str(), GENERIC_READ, - FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, - NULL, OPEN_EXISTING, fileFlags, NULL); + hFile = RX_CreateFile( + RX_FN(fname).c_str(), GENERIC_READ, + FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, + NULL, 
OPEN_EXISTING, fileFlags, NULL); } if (INVALID_HANDLE_VALUE == hFile) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "NewRandomAccessFile failed to Create/Open: " + fname, lastError); + "NewRandomAccessFile failed to Create/Open: " + fname, lastError); } UniqueCloseHandlePtr fileGuard(hFile, CloseHandleFunc); @@ -214,55 +222,57 @@ Status WinEnvIO::NewRandomAccessFile(const std::string& fname, // Will not map empty files if (fileSize == 0) { return IOError( - "NewRandomAccessFile failed to map empty file: " + fname, EINVAL); + "NewRandomAccessFile failed to map empty file: " + fname, EINVAL); } - HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY, - 0, // Whole file at its present length - 0, - NULL); // Mapping name + HANDLE hMap = RX_CreateFileMapping(hFile, NULL, PAGE_READONLY, + 0, // At its present length + 0, + NULL); // Mapping name if (!hMap) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "Failed to create file mapping for NewRandomAccessFile: " + fname, - lastError); + "Failed to create file mapping for NewRandomAccessFile: " + fname, + lastError); } UniqueCloseHandlePtr mapGuard(hMap, CloseHandleFunc); const void* mapped_region = MapViewOfFileEx(hMap, FILE_MAP_READ, - 0, // High DWORD of access start - 0, // Low DWORD - static_cast(fileSize), - NULL); // Let the OS choose the mapping + 0, // High DWORD of access start + 0, // Low DWORD + static_cast(fileSize), + NULL); // Let the OS choose the mapping if (!mapped_region) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "Failed to MapViewOfFile for NewRandomAccessFile: " + fname, - lastError); + "Failed to MapViewOfFile for NewRandomAccessFile: " + fname, + lastError); } result->reset(new WinMmapReadableFile(fname, hFile, hMap, mapped_region, - static_cast(fileSize))); + static_cast(fileSize))); mapGuard.release(); fileGuard.release(); } } else { - result->reset(new WinRandomAccessFile(fname, hFile, - std::max(GetSectorSize(fname), page_size_), options)); + result->reset(new WinRandomAccessFile(fname, hFile, + std::max(GetSectorSize(fname), + page_size_), + options)); fileGuard.release(); } return s; } Status WinEnvIO::OpenWritableFile(const std::string& fname, - std::unique_ptr* result, - const EnvOptions& options, - bool reopen) { + std::unique_ptr* result, + const EnvOptions& options, + bool reopen) { const size_t c_BufferCapacity = 64 * 1024; @@ -302,20 +312,21 @@ Status WinEnvIO::OpenWritableFile(const std::string& fname, HANDLE hFile = 0; { IOSTATS_TIMER_GUARD(open_nanos); - hFile = CreateFileA( - fname.c_str(), - desired_access, // Access desired - shared_mode, - NULL, // Security attributes - creation_disposition, // Posix env says (reopen) ? (O_CREATE | O_APPEND) : O_CREAT | O_TRUNC - fileFlags, // Flags - NULL); // Template File + hFile = RX_CreateFile( + RX_FN(fname).c_str(), + desired_access, // Access desired + shared_mode, + NULL, // Security attributes + // Posix env says (reopen) ? 
(O_CREATE | O_APPEND) : O_CREAT | O_TRUNC + creation_disposition, + fileFlags, // Flags + NULL); // Template File } if (INVALID_HANDLE_VALUE == hFile) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "Failed to create a NewWriteableFile: " + fname, lastError); + "Failed to create a NewWriteableFile: " + fname, lastError); } // We will start writing at the end, appending @@ -326,7 +337,8 @@ Status WinEnvIO::OpenWritableFile(const std::string& fname, if (!ret) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "Failed to create a ReopenWritableFile move to the end: " + fname, lastError); + "Failed to create a ReopenWritableFile move to the end: " + fname, + lastError); } } @@ -334,18 +346,21 @@ // We usually do not use mmapping on SSD and thus we pass memory // page_size result->reset(new WinMmapFile(fname, hFile, page_size_, - allocation_granularity_, local_options)); + allocation_granularity_, local_options)); } else { // Here we want the buffer allocation to be aligned by the SSD page size // and to be a multiple of it - result->reset(new WinWritableFile(fname, hFile, std::max(GetSectorSize(fname), GetPageSize()), - c_BufferCapacity, local_options)); + result->reset(new WinWritableFile(fname, hFile, + std::max(GetSectorSize(fname), + GetPageSize()), + c_BufferCapacity, local_options)); } return s; } Status WinEnvIO::NewRandomRWFile(const std::string & fname, - std::unique_ptr<RandomRWFile>* result, const EnvOptions & options) { + std::unique_ptr<RandomRWFile>* result, + const EnvOptions & options) { Status s; @@ -366,13 +381,13 @@ Status WinEnvIO::NewRandomRWFile(const std::string & fname, { IOSTATS_TIMER_GUARD(open_nanos); hFile = - CreateFileA(fname.c_str(), - desired_access, - shared_mode, - NULL, // Security attributes - creation_disposition, - file_flags, - NULL); + RX_CreateFile(RX_FN(fname).c_str(), + desired_access, + shared_mode, + NULL, // Security attributes + creation_disposition, + file_flags, + NULL); } if (INVALID_HANDLE_VALUE == hFile) { @@ -382,15 +397,18 @@ Status WinEnvIO::NewRandomRWFile(const std::string & fname, } UniqueCloseHandlePtr fileGuard(hFile, CloseHandleFunc); - result->reset(new WinRandomRWFile(fname, hFile, std::max(GetSectorSize(fname), GetPageSize()), - options)); + result->reset(new WinRandomRWFile(fname, hFile, + std::max(GetSectorSize(fname), + GetPageSize()), + options)); fileGuard.release(); return s; } -Status WinEnvIO::NewMemoryMappedFileBuffer(const std::string & fname, - std::unique_ptr<MemoryMappedFileBuffer>* result) { +Status WinEnvIO::NewMemoryMappedFileBuffer( + const std::string & fname, + std::unique_ptr<MemoryMappedFileBuffer>* result) { Status s; result->reset(); @@ -399,19 +417,19 @@ Status WinEnvIO::NewMemoryMappedFileBuffer(const std::string & fname, HANDLE hFile = INVALID_HANDLE_VALUE; { IOSTATS_TIMER_GUARD(open_nanos); - hFile = CreateFileA( - fname.c_str(), GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, - NULL, - OPEN_EXISTING, // Open only if it exists - fileFlags, - NULL); + hFile = RX_CreateFile( + RX_FN(fname).c_str(), GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, + NULL, + OPEN_EXISTING, // Open only if it exists + fileFlags, + NULL); } if (INVALID_HANDLE_VALUE == hFile) { auto lastError = GetLastError(); - s = IOErrorFromWindowsError("Failed to open NewMemoryMappedFileBuffer: " + fname, - lastError); + s = IOErrorFromWindowsError( + "Failed to open NewMemoryMappedFileBuffer: " + fname, lastError); return s; }
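// A note on the RX_* wrappers (RX_CreateFile, RX_FN, FN_TO_RX, RX_FILESTRING,
// ...) used throughout these Windows changes: they abstract over the narrow
// (ANSI) and wide (UTF-16) filename APIs. A sketch of the idea, assuming the
// ROCKSDB_WINDOWS_UTF8_FILENAMES build flag is what selects the wide variants:
//
//   #ifdef ROCKSDB_WINDOWS_UTF8_FILENAMES
//   #define RX_CreateFile CreateFileW  // RX_FN converts UTF-8 to UTF-16
//   #else
//   #define RX_CreateFile CreateFileA  // RX_FN passes the string through
//   #endif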
UniqueCloseHandlePtr fileGuard(hFile, CloseHandleFunc); @@ -423,43 +441,44 @@ Status WinEnvIO::NewMemoryMappedFileBuffer(const std::string & fname, } // Will not map empty files if (fileSize == 0) { - return Status::NotSupported("NewMemoryMappedFileBuffer can not map zero length files: " + fname); + return Status::NotSupported( + "NewMemoryMappedFileBuffer can not map zero length files: " + fname); } // size_t is 32-bit with 32-bit builds if (fileSize > std::numeric_limits::max()) { return Status::NotSupported( - "The specified file size does not fit into 32-bit memory addressing: " + fname); + "The specified file size does not fit into 32-bit memory addressing: " + + fname); } - HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READWRITE, - 0, // Whole file at its present length - 0, - NULL); // Mapping name + HANDLE hMap = RX_CreateFileMapping(hFile, NULL, PAGE_READWRITE, + 0, // Whole file at its present length + 0, + NULL); // Mapping name if (!hMap) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "Failed to create file mapping for NewMemoryMappedFileBuffer: " + fname, - lastError); + "Failed to create file mapping for: " + fname, lastError); } UniqueCloseHandlePtr mapGuard(hMap, CloseHandleFunc); void* base = MapViewOfFileEx(hMap, FILE_MAP_WRITE, - 0, // High DWORD of access start - 0, // Low DWORD - static_cast(fileSize), - NULL); // Let the OS choose the mapping + 0, // High DWORD of access start + 0, // Low DWORD + static_cast(fileSize), + NULL); // Let the OS choose the mapping if (!base) { auto lastError = GetLastError(); return IOErrorFromWindowsError( - "Failed to MapViewOfFile for NewMemoryMappedFileBuffer: " + fname, - lastError); + "Failed to MapViewOfFile for NewMemoryMappedFileBuffer: " + fname, + lastError); } - result->reset(new WinMemoryMappedBuffer(hFile, hMap, - base, static_cast(fileSize))); + result->reset(new WinMemoryMappedBuffer(hFile, hMap, base, + static_cast(fileSize))); mapGuard.release(); fileGuard.release(); @@ -468,14 +487,14 @@ Status WinEnvIO::NewMemoryMappedFileBuffer(const std::string & fname, } Status WinEnvIO::NewDirectory(const std::string& name, - std::unique_ptr* result) { + std::unique_ptr* result) { Status s; // Must be nullptr on failure result->reset(); if (!DirExists(name)) { s = IOErrorFromWindowsError( - "open folder: " + name, ERROR_DIRECTORY); + "open folder: " + name, ERROR_DIRECTORY); return s; } @@ -483,18 +502,18 @@ Status WinEnvIO::NewDirectory(const std::string& name, // 0 - for access means read metadata { IOSTATS_TIMER_GUARD(open_nanos); - handle = ::CreateFileA(name.c_str(), 0, - FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, - OPEN_EXISTING, - FILE_FLAG_BACKUP_SEMANTICS, // make opening folders possible - NULL); + handle = RX_CreateFile( + RX_FN(name).c_str(), 0, + FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, + OPEN_EXISTING, + FILE_FLAG_BACKUP_SEMANTICS, // make opening folders possible + NULL); } if (INVALID_HANDLE_VALUE == handle) { auto lastError = GetLastError(); - s = IOErrorFromWindowsError( - "open folder: " + name, lastError); + s = IOErrorFromWindowsError("open folder: " + name, lastError); return s; } @@ -509,8 +528,8 @@ Status WinEnvIO::FileExists(const std::string& fname) { // which is consistent with _access() impl on windows // but can be added WIN32_FILE_ATTRIBUTE_DATA attrs; - if (FALSE == GetFileAttributesExA(fname.c_str(), GetFileExInfoStandard, - &attrs)) { + if (FALSE == RX_GetFileAttributesEx(RX_FN(fname).c_str(), + GetFileExInfoStandard, &attrs)) { 
auto lastError = GetLastError(); switch (lastError) { case ERROR_ACCESS_DENIED: @@ -521,7 +540,7 @@ Status WinEnvIO::FileExists(const std::string& fname) { break; default: s = IOErrorFromWindowsError("Unexpected error for: " + fname, - lastError); + lastError); break; } } @@ -529,22 +548,24 @@ } Status WinEnvIO::GetChildren(const std::string& dir, - std::vector<std::string>* result) { + std::vector<std::string>* result) { Status status; result->clear(); std::vector<std::string> output; - WIN32_FIND_DATA data; + RX_WIN32_FIND_DATA data; + memset(&data, 0, sizeof(data)); std::string pattern(dir); pattern.append("\\").append("*"); - HANDLE handle = ::FindFirstFileExA(pattern.c_str(), - FindExInfoBasic, // Do not want alternative name - &data, - FindExSearchNameMatch, - NULL, // lpSearchFilter - 0); + HANDLE handle = RX_FindFirstFileEx(RX_FN(pattern).c_str(), + // Do not want alternative name + FindExInfoBasic, + &data, + FindExSearchNameMatch, + NULL, // lpSearchFilter + 0); if (handle == INVALID_HANDLE_VALUE) { auto lastError = GetLastError(); @@ -557,7 +578,7 @@ Status WinEnvIO::GetChildren(const std::string& dir, break; default: status = IOErrorFromWindowsError( - "Failed to GetChhildren for: " + dir, lastError); + "Failed to GetChildren for: " + dir, lastError); } return status; } @@ -572,8 +593,9 @@ Status WinEnvIO::GetChildren(const std::string& dir, data.cFileName[MAX_PATH - 1] = 0; while (true) { - output.emplace_back(data.cFileName); - BOOL ret =- ::FindNextFileA(handle, &data); + auto x = RX_FILESTRING(data.cFileName, RX_FNLEN(data.cFileName)); + output.emplace_back(FN_TO_RX(x)); + BOOL ret = RX_FindNextFile(handle, &data); // If the function fails the return value is zero // and non-zero otherwise. Not TRUE or FALSE. if (ret == FALSE) { @@ -588,8 +610,7 @@ Status WinEnvIO::GetChildren(const std::string& dir, Status WinEnvIO::CreateDir(const std::string& name) { Status result; - - BOOL ret = CreateDirectoryA(name.c_str(), NULL); + BOOL ret = RX_CreateDirectory(RX_FN(name).c_str(), NULL); if (!ret) { auto lastError = GetLastError(); result = IOErrorFromWindowsError( @@ -606,15 +627,15 @@ Status WinEnvIO::CreateDirIfMissing(const std::string& name) { return result; } - BOOL ret = CreateDirectoryA(name.c_str(), NULL); + BOOL ret = RX_CreateDirectory(RX_FN(name).c_str(), NULL); if (!ret) { auto lastError = GetLastError(); if (lastError != ERROR_ALREADY_EXISTS) { result = IOErrorFromWindowsError( - "Failed to create a directory: " + name, lastError); + "Failed to create a directory: " + name, lastError); } else { result = - Status::IOError(name + ": exists but is not a directory"); + Status::IOError(name + ": exists but is not a directory"); } } return result; @@ -622,10 +643,11 @@ Status WinEnvIO::DeleteDir(const std::string& name) { Status result; - BOOL ret = RemoveDirectoryA(name.c_str()); + BOOL ret = RX_RemoveDirectory(RX_FN(name).c_str()); if (!ret) { auto lastError = GetLastError(); - result = IOErrorFromWindowsError("Failed to remove dir: " + name, lastError); + result = IOErrorFromWindowsError("Failed to remove dir: " + name, + lastError); } return result; } @@ -635,7 +657,8 @@ Status WinEnvIO::GetFileSize(const std::string& fname, Status s; WIN32_FILE_ATTRIBUTE_DATA attrs; - if (GetFileAttributesExA(fname.c_str(), GetFileExInfoStandard, &attrs)) { + if (RX_GetFileAttributesEx(RX_FN(fname).c_str(), GetFileExInfoStandard, + &attrs)) { ULARGE_INTEGER file_size; file_size.HighPart =
attrs.nFileSizeHigh; file_size.LowPart = attrs.nFileSizeLow; @@ -670,7 +693,8 @@ Status WinEnvIO::GetFileModificationTime(const std::string& fname, Status s; WIN32_FILE_ATTRIBUTE_DATA attrs; - if (GetFileAttributesExA(fname.c_str(), GetFileExInfoStandard, &attrs)) { + if (RX_GetFileAttributesEx(RX_FN(fname).c_str(), GetFileExInfoStandard, + &attrs)) { *file_mtime = FileTimeToUnixTime(attrs.ftLastWriteTime); } else { auto lastError = GetLastError(); @@ -688,7 +712,8 @@ Status WinEnvIO::RenameFile(const std::string& src, // rename() is not capable of replacing the existing file as on Linux // so use OS API directly - if (!MoveFileExA(src.c_str(), target.c_str(), MOVEFILE_REPLACE_EXISTING)) { + if (!RX_MoveFileEx(RX_FN(src).c_str(), RX_FN(target).c_str(), + MOVEFILE_REPLACE_EXISTING)) { DWORD lastError = GetLastError(); std::string text("Failed to rename: "); @@ -704,7 +729,7 @@ Status WinEnvIO::LinkFile(const std::string& src, const std::string& target) { Status result; - if (!CreateHardLinkA(target.c_str(), src.c_str(), NULL)) { + if (!RX_CreateHardLink(RX_FN(target).c_str(), RX_FN(src).c_str(), NULL)) { DWORD lastError = GetLastError(); if (lastError == ERROR_NOT_SAME_DEVICE) { return Status::NotSupported("No cross FS links allowed"); @@ -721,8 +746,9 @@ Status WinEnvIO::LinkFile(const std::string& src, Status WinEnvIO::NumFileLinks(const std::string& fname, uint64_t* count) { Status s; - HANDLE handle = ::CreateFileA( - fname.c_str(), 0, FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, + HANDLE handle = RX_CreateFile( + RX_FN(fname).c_str(), 0, + FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL); if (INVALID_HANDLE_VALUE == handle) { @@ -758,60 +784,59 @@ Status WinEnvIO::AreFilesSame(const std::string& first, } // 0 - for access means read metadata - HANDLE file_1 = ::CreateFileA(first.c_str(), 0, - FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, - OPEN_EXISTING, - FILE_FLAG_BACKUP_SEMANTICS, // make opening folders possible - NULL); + HANDLE file_1 = RX_CreateFile( + RX_FN(first).c_str(), 0, + FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, + OPEN_EXISTING, + FILE_FLAG_BACKUP_SEMANTICS, // make opening folders possible + NULL); if (INVALID_HANDLE_VALUE == file_1) { auto lastError = GetLastError(); - s = IOErrorFromWindowsError( - "open file: " + first, lastError); + s = IOErrorFromWindowsError("open file: " + first, lastError); return s; } UniqueCloseHandlePtr g_1(file_1, CloseHandleFunc); - HANDLE file_2 = ::CreateFileA(second.c_str(), 0, - FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_FLAG_BACKUP_SEMANTICS, // make opening folders possible - NULL); + HANDLE file_2 = RX_CreateFile( + RX_FN(second).c_str(), 0, + FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_BACKUP_SEMANTICS, // make opening folders possible + NULL); if (INVALID_HANDLE_VALUE == file_2) { auto lastError = GetLastError(); - s = IOErrorFromWindowsError( - "open file: " + second, lastError); + s = IOErrorFromWindowsError("open file: " + second, lastError); return s; } UniqueCloseHandlePtr g_2(file_2, CloseHandleFunc); FILE_ID_INFO FileInfo_1; BOOL result = GetFileInformationByHandleEx(file_1, FileIdInfo, &FileInfo_1, - sizeof(FileInfo_1)); + sizeof(FileInfo_1)); if (!result) { auto lastError = GetLastError(); - s = IOErrorFromWindowsError( - "stat file: " + first, lastError); + s = IOErrorFromWindowsError("stat file: " + first, lastError); 
return s; } FILE_ID_INFO FileInfo_2; result = GetFileInformationByHandleEx(file_2, FileIdInfo, &FileInfo_2, - sizeof(FileInfo_2)); + sizeof(FileInfo_2)); if (!result) { auto lastError = GetLastError(); - s = IOErrorFromWindowsError( - "stat file: " + second, lastError); + s = IOErrorFromWindowsError("stat file: " + second, lastError); return s; } if (FileInfo_1.VolumeSerialNumber == FileInfo_2.VolumeSerialNumber) { - *res = (0 == memcmp(FileInfo_1.FileId.Identifier, FileInfo_2.FileId.Identifier, - sizeof(FileInfo_1.FileId.Identifier))); + *res = (0 == memcmp(FileInfo_1.FileId.Identifier, + FileInfo_2.FileId.Identifier, + sizeof(FileInfo_1.FileId.Identifier))); } else { *res = false; } @@ -820,7 +845,7 @@ Status WinEnvIO::AreFilesSame(const std::string& first, } Status WinEnvIO::LockFile(const std::string& lockFname, - FileLock** lock) { + FileLock** lock) { assert(lock != nullptr); *lock = NULL; @@ -835,15 +860,16 @@ Status WinEnvIO::LockFile(const std::string& lockFname, HANDLE hFile = 0; { IOSTATS_TIMER_GUARD(open_nanos); - hFile = CreateFileA(lockFname.c_str(), (GENERIC_READ | GENERIC_WRITE), - ExclusiveAccessON, NULL, CREATE_ALWAYS, - FILE_ATTRIBUTE_NORMAL, NULL); + hFile = RX_CreateFile(RX_FN(lockFname).c_str(), + (GENERIC_READ | GENERIC_WRITE), + ExclusiveAccessON, NULL, CREATE_ALWAYS, + FILE_ATTRIBUTE_NORMAL, NULL); } if (INVALID_HANDLE_VALUE == hFile) { auto lastError = GetLastError(); result = IOErrorFromWindowsError( - "Failed to create lock file: " + lockFname, lastError); + "Failed to create lock file: " + lockFname, lastError); } else { *lock = new WinFileLock(hFile); } @@ -890,7 +916,7 @@ Status WinEnvIO::GetTestDirectory(std::string* result) { } Status WinEnvIO::NewLogger(const std::string& fname, - std::shared_ptr<Logger>* result) { + std::shared_ptr<Logger>* result) { Status s; result->reset(); @@ -898,15 +924,15 @@ Status WinEnvIO::NewLogger(const std::string& fname, HANDLE hFile = 0; { IOSTATS_TIMER_GUARD(open_nanos); - hFile = CreateFileA( - fname.c_str(), GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_DELETE, // In RocksDb log files are - // renamed and deleted before - // they are closed. This enables - // doing so. - NULL, - CREATE_ALWAYS, // Original fopen mode is "w" - FILE_ATTRIBUTE_NORMAL, NULL); + hFile = RX_CreateFile( + RX_FN(fname).c_str(), GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_DELETE, // In RocksDb log files are + // renamed and deleted before + // they are closed. This enables + // doing so.
+ NULL, + CREATE_ALWAYS, // Original fopen mode is "w" + FILE_ATTRIBUTE_NORMAL, NULL); } if (INVALID_HANDLE_VALUE == hFile) { @@ -953,21 +979,29 @@ uint64_t WinEnvIO::NowMicros() { return li.QuadPart; } using namespace std::chrono; - return duration_cast<microseconds>(system_clock::now().time_since_epoch()).count(); + return duration_cast<microseconds>( + high_resolution_clock::now().time_since_epoch()).count(); } uint64_t WinEnvIO::NowNanos() { - // all std::chrono clocks on windows have the same resolution that is only - // good enough for microseconds but not nanoseconds - // On Windows 8 and Windows 2012 Server - // GetSystemTimePreciseAsFileTime(&current_time) can be used - LARGE_INTEGER li; - QueryPerformanceCounter(&li); - // Convert to nanoseconds first to avoid loss of precision - // and divide by frequency - li.QuadPart *= std::nano::den; - li.QuadPart /= perf_counter_frequency_; - return li.QuadPart; + if (nano_seconds_per_period_ != 0) { + // all std::chrono clocks on windows have the same resolution that is only + // good enough for microseconds but not nanoseconds + // On Windows 8 and Windows 2012 Server + // GetSystemTimePreciseAsFileTime(&current_time) can be used + LARGE_INTEGER li; + QueryPerformanceCounter(&li); + // Convert performance counter to nanoseconds by precomputed ratio. + // Directly multiply nano::den with li.QuadPart causes overflow. + // Only do this when nano::den is divisible by perf_counter_frequency_, + // which most likely is the case in reality. If it's not, fall back to + // high_resolution_clock, which may be less precise under old compilers. + li.QuadPart *= nano_seconds_per_period_; + return li.QuadPart; + } + using namespace std::chrono; + return duration_cast<nanoseconds>( + high_resolution_clock::now().time_since_epoch()).count(); } Status WinEnvIO::GetHostName(char* name, uint64_t len) { @@ -986,32 +1020,32 @@ Status WinEnvIO::GetHostName(char* name, uint64_t len) { } Status WinEnvIO::GetAbsolutePath(const std::string& db_path, - std::string* output_path) { - + std::string* output_path) { // Check if we already have an absolute path // For test compatibility we will consider starting slash as an // absolute path if ((!db_path.empty() && (db_path[0] == '\\' || db_path[0] == '/')) || - !PathIsRelativeA(db_path.c_str())) { + !RX_PathIsRelative(RX_FN(db_path).c_str())) { *output_path = db_path; return Status::OK(); } - std::string result; + RX_FILESTRING result; result.resize(MAX_PATH); // Hopefully nothing changes the current directory while we do this // however _getcwd also suffers from the same limitation - DWORD len = GetCurrentDirectoryA(MAX_PATH, &result[0]); + DWORD len = RX_GetCurrentDirectory(MAX_PATH, &result[0]); if (len == 0) { auto lastError = GetLastError(); return IOErrorFromWindowsError("Failed to get current working directory", - lastError); + lastError); } result.resize(len); + std::string res = FN_TO_RX(result); - result.swap(*output_path); + res.swap(*output_path); return Status::OK(); } @@ -1031,8 +1065,8 @@ std::string WinEnvIO::TimeToString(uint64_t secondsSince1970) { char* p = &result[0]; int len = snprintf(p, maxsize, "%04d/%02d/%02d-%02d:%02d:%02d ", - t.tm_year + 1900, t.tm_mon + 1, t.tm_mday, t.tm_hour, - t.tm_min, t.tm_sec); + t.tm_year + 1900, t.tm_mon + 1, t.tm_mday, t.tm_hour, + t.tm_min, t.tm_sec); assert(len > 0); result.resize(len); @@ -1042,7 +1076,7 @@ std::string WinEnvIO::TimeToString(uint64_t secondsSince1970) { } EnvOptions WinEnvIO::OptimizeForLogWrite(const EnvOptions& env_options, - const DBOptions& db_options) const { + const DBOptions& db_options) const {
EnvOptions optimized(env_options); // These two the same as default optimizations optimized.bytes_per_sync = db_options.wal_bytes_per_sync; @@ -1058,7 +1092,7 @@ EnvOptions WinEnvIO::OptimizeForLogWrite(const EnvOptions& env_options, } EnvOptions WinEnvIO::OptimizeForManifestWrite( - const EnvOptions& env_options) const { + const EnvOptions& env_options) const { EnvOptions optimized(env_options); optimized.use_mmap_writes = false; optimized.use_direct_reads = false; @@ -1066,7 +1100,7 @@ EnvOptions WinEnvIO::OptimizeForManifestWrite( } EnvOptions WinEnvIO::OptimizeForManifestRead( - const EnvOptions& env_options) const { + const EnvOptions& env_options) const { EnvOptions optimized(env_options); optimized.use_mmap_writes = false; optimized.use_direct_reads = false; @@ -1076,7 +1110,8 @@ EnvOptions WinEnvIO::OptimizeForManifestRead( // Returns true iff the named directory exists and is a directory. bool WinEnvIO::DirExists(const std::string& dname) { WIN32_FILE_ATTRIBUTE_DATA attrs; - if (GetFileAttributesExA(dname.c_str(), GetFileExInfoStandard, &attrs)) { + if (RX_GetFileAttributesEx(RX_FN(dname).c_str(), + GetFileExInfoStandard, &attrs)) { return 0 != (attrs.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY); } return false; @@ -1085,7 +1120,7 @@ bool WinEnvIO::DirExists(const std::string& dname) { size_t WinEnvIO::GetSectorSize(const std::string& fname) { size_t sector_size = kSectorSize; - if (PathIsRelativeA(fname.c_str())) { + if (RX_PathIsRelative(RX_FN(fname).c_str())) { return sector_size; } @@ -1098,9 +1133,8 @@ size_t WinEnvIO::GetSectorSize(const std::string& fname) { return sector_size; } - HANDLE hDevice = CreateFile(devicename, 0, 0, - nullptr, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, nullptr); + HANDLE hDevice = CreateFile(devicename, 0, 0, nullptr, OPEN_EXISTING, + FILE_ATTRIBUTE_NORMAL, nullptr); if (hDevice == INVALID_HANDLE_VALUE) { return sector_size; @@ -1114,8 +1148,10 @@ size_t WinEnvIO::GetSectorSize(const std::string& fname) { DWORD output_bytes = 0; BOOL ret = DeviceIoControl(hDevice, IOCTL_STORAGE_QUERY_PROPERTY, - &spropertyquery, sizeof(spropertyquery), output_buffer, - sizeof(STORAGE_ACCESS_ALIGNMENT_DESCRIPTOR), &output_bytes, nullptr); + &spropertyquery, sizeof(spropertyquery), + output_buffer, + sizeof(STORAGE_ACCESS_ALIGNMENT_DESCRIPTOR), + &output_bytes, nullptr); if (ret) { sector_size = ((STORAGE_ACCESS_ALIGNMENT_DESCRIPTOR *)output_buffer)->BytesPerLogicalSector; @@ -1141,7 +1177,8 @@ size_t WinEnvIO::GetSectorSize(const std::string& fname) { //////////////////////////////////////////////////////////////////////// // WinEnvThreads -WinEnvThreads::WinEnvThreads(Env* hosted_env) : hosted_env_(hosted_env), thread_pools_(Env::Priority::TOTAL) { +WinEnvThreads::WinEnvThreads(Env* hosted_env) + : hosted_env_(hosted_env), thread_pools_(Env::Priority::TOTAL) { for (int pool_id = 0; pool_id < Env::Priority::TOTAL; ++pool_id) { thread_pools_[pool_id].SetThreadPriority( @@ -1160,8 +1197,9 @@ WinEnvThreads::~WinEnvThreads() { } } -void WinEnvThreads::Schedule(void(*function)(void*), void* arg, Env::Priority pri, - void* tag, void(*unschedFunction)(void* arg)) { +void WinEnvThreads::Schedule(void(*function)(void*), void* arg, + Env::Priority pri, void* tag, + void(*unschedFunction)(void* arg)) { assert(pri >= Env::Priority::BOTTOM && pri <= Env::Priority::HIGH); thread_pools_[pri].Schedule(function, arg, tag, unschedFunction); } @@ -1256,8 +1294,7 @@ WinEnv::~WinEnv() { delete thread_status_updater_; } -Status WinEnv::GetThreadList( - std::vector<ThreadStatus>* thread_list) { +Status
WinEnv::GetThreadList(std::vector<ThreadStatus>* thread_list) { assert(thread_status_updater_); return thread_status_updater_->GetThreadList(thread_list); } @@ -1275,14 +1312,14 @@ Status WinEnv::GetCurrentTime(int64_t* unix_time) { } Status WinEnv::NewSequentialFile(const std::string& fname, - std::unique_ptr<SequentialFile>* result, - const EnvOptions& options) { + std::unique_ptr<SequentialFile>* result, + const EnvOptions& options) { return winenv_io_.NewSequentialFile(fname, result, options); } Status WinEnv::NewRandomAccessFile(const std::string& fname, - std::unique_ptr<RandomAccessFile>* result, - const EnvOptions& options) { + std::unique_ptr<RandomAccessFile>* result, + const EnvOptions& options) { return winenv_io_.NewRandomAccessFile(fname, result, options); } @@ -1293,22 +1330,25 @@ Status WinEnv::NewWritableFile(const std::string& fname, } Status WinEnv::ReopenWritableFile(const std::string& fname, - std::unique_ptr<WritableFile>* result, const EnvOptions& options) { + std::unique_ptr<WritableFile>* result, + const EnvOptions& options) { return winenv_io_.OpenWritableFile(fname, result, options, true); } Status WinEnv::NewRandomRWFile(const std::string & fname, - std::unique_ptr<RandomRWFile>* result, const EnvOptions & options) { + std::unique_ptr<RandomRWFile>* result, + const EnvOptions & options) { return winenv_io_.NewRandomRWFile(fname, result, options); } -Status WinEnv::NewMemoryMappedFileBuffer(const std::string& fname, - std::unique_ptr<MemoryMappedFileBuffer>* result) { +Status WinEnv::NewMemoryMappedFileBuffer( + const std::string& fname, + std::unique_ptr<MemoryMappedFileBuffer>* result) { return winenv_io_.NewMemoryMappedFileBuffer(fname, result); } Status WinEnv::NewDirectory(const std::string& name, - std::unique_ptr<Directory>* result) { + std::unique_ptr<Directory>* result) { return winenv_io_.NewDirectory(name, result); } @@ -1317,7 +1357,7 @@ Status WinEnv::FileExists(const std::string& fname) { } Status WinEnv::GetChildren(const std::string& dir, - std::vector<std::string>* result) { + std::vector<std::string>* result) { return winenv_io_.GetChildren(dir, result); } @@ -1334,22 +1374,22 @@ Status WinEnv::DeleteDir(const std::string& name) { } Status WinEnv::GetFileSize(const std::string& fname, - uint64_t* size) { + uint64_t* size) { return winenv_io_.GetFileSize(fname, size); } Status WinEnv::GetFileModificationTime(const std::string& fname, - uint64_t* file_mtime) { + uint64_t* file_mtime) { return winenv_io_.GetFileModificationTime(fname, file_mtime); } Status WinEnv::RenameFile(const std::string& src, - const std::string& target) { + const std::string& target) { return winenv_io_.RenameFile(src, target); } Status WinEnv::LinkFile(const std::string& src, - const std::string& target) { + const std::string& target) { return winenv_io_.LinkFile(src, target); } @@ -1358,7 +1398,7 @@ Status WinEnv::NumFileLinks(const std::string& fname, uint64_t* count) { } Status WinEnv::AreFilesSame(const std::string& first, - const std::string& second, bool* res) { + const std::string& second, bool* res) { return winenv_io_.AreFilesSame(first, second, res); } @@ -1376,7 +1416,7 @@ Status WinEnv::GetTestDirectory(std::string* result) { } Status WinEnv::NewLogger(const std::string& fname, - std::shared_ptr<Logger>* result) { + std::shared_ptr<Logger>* result) { return winenv_io_.NewLogger(fname, result); } @@ -1402,8 +1442,8 @@ std::string WinEnv::TimeToString(uint64_t secondsSince1970) { } void WinEnv::Schedule(void(*function)(void*), void* arg, Env::Priority pri, - void* tag, - void(*unschedFunction)(void* arg)) { + void* tag, + void(*unschedFunction)(void* arg)) { return winenv_threads_.Schedule(function, arg, pri, tag, unschedFunction); } @@ -1445,17 +1485,17 @@ void WinEnv::IncBackgroundThreadsIfNeeded(int num,
Env::Priority pri) { } EnvOptions WinEnv::OptimizeForManifestRead( - const EnvOptions& env_options) const { + const EnvOptions& env_options) const { return winenv_io_.OptimizeForManifestRead(env_options); } EnvOptions WinEnv::OptimizeForLogWrite(const EnvOptions& env_options, - const DBOptions& db_options) const { + const DBOptions& db_options) const { return winenv_io_.OptimizeForLogWrite(env_options, db_options); } EnvOptions WinEnv::OptimizeForManifestWrite( - const EnvOptions& env_options) const { + const EnvOptions& env_options) const { return winenv_io_.OptimizeForManifestWrite(env_options); } diff --git a/ceph/src/rocksdb/port/win/env_win.h b/ceph/src/rocksdb/port/win/env_win.h index 81b323a71..7a4d48de2 100644 --- a/ceph/src/rocksdb/port/win/env_win.h +++ b/ceph/src/rocksdb/port/win/env_win.h @@ -47,8 +47,7 @@ public: WinEnvThreads& operator=(const WinEnvThreads&) = delete; void Schedule(void(*function)(void*), void* arg, Env::Priority pri, - void* tag, - void(*unschedFunction)(void* arg)); + void* tag, void(*unschedFunction)(void* arg)); int UnSchedule(void* arg, Env::Priority pri); @@ -72,8 +71,8 @@ public: private: - Env* hosted_env_; - mutable std::mutex mu_; + Env* hosted_env_; + mutable std::mutex mu_; std::vector<ThreadPool> thread_pools_; std::vector<WindowsThread> threads_to_join_; @@ -94,35 +93,35 @@ public: virtual Status GetCurrentTime(int64_t* unix_time); virtual Status NewSequentialFile(const std::string& fname, - std::unique_ptr<SequentialFile>* result, - const EnvOptions& options); + std::unique_ptr<SequentialFile>* result, + const EnvOptions& options); // Helper for NewWritable and ReopenWritableFile virtual Status OpenWritableFile(const std::string& fname, - std::unique_ptr<WritableFile>* result, - const EnvOptions& options, - bool reopen); + std::unique_ptr<WritableFile>* result, + const EnvOptions& options, + bool reopen); virtual Status NewRandomAccessFile(const std::string& fname, - std::unique_ptr<RandomAccessFile>* result, - const EnvOptions& options); + std::unique_ptr<RandomAccessFile>* result, + const EnvOptions& options); // The returned file will only be accessed by one thread at a time.
virtual Status NewRandomRWFile(const std::string& fname, - unique_ptr<RandomRWFile>* result, - const EnvOptions& options); + std::unique_ptr<RandomRWFile>* result, + const EnvOptions& options); virtual Status NewMemoryMappedFileBuffer( - const std::string& fname, - std::unique_ptr<MemoryMappedFileBuffer>* result); + const std::string& fname, + std::unique_ptr<MemoryMappedFileBuffer>* result); virtual Status NewDirectory(const std::string& name, - std::unique_ptr<Directory>* result); + std::unique_ptr<Directory>* result); virtual Status FileExists(const std::string& fname); virtual Status GetChildren(const std::string& dir, - std::vector<std::string>* result); + std::vector<std::string>* result); virtual Status CreateDir(const std::string& name); @@ -130,35 +129,31 @@ public: virtual Status DeleteDir(const std::string& name); - virtual Status GetFileSize(const std::string& fname, - uint64_t* size); + virtual Status GetFileSize(const std::string& fname, uint64_t* size); static uint64_t FileTimeToUnixTime(const FILETIME& ftTime); virtual Status GetFileModificationTime(const std::string& fname, - uint64_t* file_mtime); + uint64_t* file_mtime); - virtual Status RenameFile(const std::string& src, - const std::string& target); + virtual Status RenameFile(const std::string& src, const std::string& target); - virtual Status LinkFile(const std::string& src, - const std::string& target); + virtual Status LinkFile(const std::string& src, const std::string& target); virtual Status NumFileLinks(const std::string& /*fname*/, uint64_t* /*count*/); virtual Status AreFilesSame(const std::string& first, - const std::string& second, bool* res); + const std::string& second, bool* res); - virtual Status LockFile(const std::string& lockFname, - FileLock** lock); + virtual Status LockFile(const std::string& lockFname, FileLock** lock); virtual Status UnlockFile(FileLock* lock); virtual Status GetTestDirectory(std::string* result); virtual Status NewLogger(const std::string& fname, - std::shared_ptr<Logger>* result); + std::shared_ptr<Logger>* result); virtual uint64_t NowMicros(); @@ -167,18 +162,18 @@ public: virtual Status GetHostName(char* name, uint64_t len); virtual Status GetAbsolutePath(const std::string& db_path, - std::string* output_path); + std::string* output_path); virtual std::string TimeToString(uint64_t secondsSince1970); virtual EnvOptions OptimizeForLogWrite(const EnvOptions& env_options, - const DBOptions& db_options) const; + const DBOptions& db_options) const; virtual EnvOptions OptimizeForManifestWrite( - const EnvOptions& env_options) const; + const EnvOptions& env_options) const; virtual EnvOptions OptimizeForManifestRead( - const EnvOptions& env_options) const; + const EnvOptions& env_options) const; size_t GetPageSize() const { return page_size_; } @@ -194,10 +189,11 @@ private: typedef VOID(WINAPI * FnGetSystemTimePreciseAsFileTime)(LPFILETIME); - Env* hosted_env_; - size_t page_size_; - size_t allocation_granularity_; - uint64_t perf_counter_frequency_; + Env* hosted_env_; + size_t page_size_; + size_t allocation_granularity_; + uint64_t perf_counter_frequency_; + uint64_t nano_seconds_per_period_; FnGetSystemTimePreciseAsFileTime GetSystemTimePreciseAsFileTime_; }; @@ -214,12 +210,12 @@ public: Status GetCurrentTime(int64_t* unix_time) override; Status NewSequentialFile(const std::string& fname, - std::unique_ptr<SequentialFile>* result, - const EnvOptions& options) override; + std::unique_ptr<SequentialFile>* result, + const EnvOptions& options) override; Status NewRandomAccessFile(const std::string& fname, - std::unique_ptr<RandomAccessFile>* result, - const EnvOptions& options) override; + std::unique_ptr<RandomAccessFile>* result, + const EnvOptions& options) override; Status
NewWritableFile(const std::string& fname, std::unique_ptr<WritableFile>* result, @@ -233,25 +229,25 @@ public: // // The returned file will only be accessed by one thread at a time. Status ReopenWritableFile(const std::string& fname, - std::unique_ptr<WritableFile>* result, - const EnvOptions& options) override; + std::unique_ptr<WritableFile>* result, + const EnvOptions& options) override; // The returned file will only be accessed by one thread at a time. Status NewRandomRWFile(const std::string& fname, - std::unique_ptr<RandomRWFile>* result, - const EnvOptions& options) override; + std::unique_ptr<RandomRWFile>* result, + const EnvOptions& options) override; Status NewMemoryMappedFileBuffer( - const std::string& fname, - std::unique_ptr<MemoryMappedFileBuffer>* result) override; + const std::string& fname, + std::unique_ptr<MemoryMappedFileBuffer>* result) override; Status NewDirectory(const std::string& name, - std::unique_ptr<Directory>* result) override; + std::unique_ptr<Directory>* result) override; Status FileExists(const std::string& fname) override; Status GetChildren(const std::string& dir, - std::vector<std::string>* result) override; + std::vector<std::string>* result) override; Status CreateDir(const std::string& name) override; @@ -260,31 +256,30 @@ public: Status DeleteDir(const std::string& name) override; Status GetFileSize(const std::string& fname, - uint64_t* size) override; + uint64_t* size) override; Status GetFileModificationTime(const std::string& fname, - uint64_t* file_mtime) override; + uint64_t* file_mtime) override; Status RenameFile(const std::string& src, - const std::string& target) override; + const std::string& target) override; Status LinkFile(const std::string& src, - const std::string& target) override; + const std::string& target) override; Status NumFileLinks(const std::string& fname, uint64_t* count) override; Status AreFilesSame(const std::string& first, - const std::string& second, bool* res) override; + const std::string& second, bool* res) override; - Status LockFile(const std::string& lockFname, - FileLock** lock) override; + Status LockFile(const std::string& lockFname, FileLock** lock) override; Status UnlockFile(FileLock* lock) override; Status GetTestDirectory(std::string* result) override; Status NewLogger(const std::string& fname, - std::shared_ptr<Logger>* result) override; + std::shared_ptr<Logger>* result) override; uint64_t NowMicros() override; @@ -293,16 +288,14 @@ public: Status GetHostName(char* name, uint64_t len) override; Status GetAbsolutePath(const std::string& db_path, - std::string* output_path) override; + std::string* output_path) override; std::string TimeToString(uint64_t secondsSince1970) override; - Status GetThreadList( - std::vector<ThreadStatus>* thread_list) override; + Status GetThreadList(std::vector<ThreadStatus>* thread_list) override; void Schedule(void(*function)(void*), void* arg, Env::Priority pri, - void* tag, - void(*unschedFunction)(void* arg)) override; + void* tag, void(*unschedFunction)(void* arg)) override; int UnSchedule(void* arg, Env::Priority pri) override; @@ -323,18 +316,18 @@ public: void IncBackgroundThreadsIfNeeded(int num, Env::Priority pri) override; EnvOptions OptimizeForManifestRead( - const EnvOptions& env_options) const override; + const EnvOptions& env_options) const override; EnvOptions OptimizeForLogWrite(const EnvOptions& env_options, - const DBOptions& db_options) const override; + const DBOptions& db_options) const override; EnvOptions OptimizeForManifestWrite( - const EnvOptions& env_options) const override; + const EnvOptions& env_options) const override; private: - WinEnvIO winenv_io_; + WinEnvIO winenv_io_; WinEnvThreads winenv_threads_; }; diff --git
a/ceph/src/rocksdb/port/win/io_win.h b/ceph/src/rocksdb/port/win/io_win.h index 3b08c394f..1c9d803b1 100644 --- a/ceph/src/rocksdb/port/win/io_win.h +++ b/ceph/src/rocksdb/port/win/io_win.h @@ -27,7 +27,9 @@ std::string GetWindowsErrSz(DWORD err); inline Status IOErrorFromWindowsError(const std::string& context, DWORD err) { return ((err == ERROR_HANDLE_DISK_FULL) || (err == ERROR_DISK_FULL)) ? Status::NoSpace(context, GetWindowsErrSz(err)) - : Status::IOError(context, GetWindowsErrSz(err)); + : ((err == ERROR_FILE_NOT_FOUND) || (err == ERROR_PATH_NOT_FOUND)) + ? Status::PathNotFound(context, GetWindowsErrSz(err)) + : Status::IOError(context, GetWindowsErrSz(err)); } inline Status IOErrorFromLastWindowsError(const std::string& context) { @@ -37,7 +39,9 @@ inline Status IOErrorFromLastWindowsError(const std::string& context) { inline Status IOError(const std::string& context, int err_number) { return (err_number == ENOSPC) ? Status::NoSpace(context, strerror(err_number)) - : Status::IOError(context, strerror(err_number)); + : (err_number == ENOENT) + ? Status::PathNotFound(context, strerror(err_number)) + : Status::IOError(context, strerror(err_number)); } class WinFileData; @@ -58,7 +62,7 @@ class WinFileData { protected: const std::string filename_; HANDLE hFile_; - // If true, the I/O issued would be direct I/O which the buffer + // If true, the I/O issued would be direct I/O which the buffer // will need to be aligned (not sure there is a guarantee that the buffer // passed in is aligned). const bool use_direct_io_; @@ -426,9 +430,7 @@ public: class WinDirectory : public Directory { HANDLE handle_; public: - explicit - WinDirectory(HANDLE h) noexcept : - handle_(h) { + explicit WinDirectory(HANDLE h) noexcept : handle_(h) { assert(handle_ != INVALID_HANDLE_VALUE); } ~WinDirectory() { diff --git a/ceph/src/rocksdb/port/win/port_win.cc b/ceph/src/rocksdb/port/win/port_win.cc index 75b4ec6de..03ba6ef42 100644 --- a/ceph/src/rocksdb/port/win/port_win.cc +++ b/ceph/src/rocksdb/port/win/port_win.cc @@ -14,7 +14,7 @@ #include "port/win/port_win.h" #include -#include "port/dirent.h" +#include "port/port_dirent.h" #include "port/sys_time.h" #include @@ -26,11 +26,33 @@ #include #include +#ifdef ROCKSDB_WINDOWS_UTF8_FILENAMES +// utf8 <-> utf16 +#include <string> +#include <locale> +#include <codecvt> +#endif + #include "util/logging.h" namespace rocksdb { + +extern const bool kDefaultToAdaptiveMutex = false; + namespace port { +#ifdef ROCKSDB_WINDOWS_UTF8_FILENAMES +std::string utf16_to_utf8(const std::wstring& utf16) { + std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert; + return convert.to_bytes(utf16); +} + +std::wstring utf8_to_utf16(const std::string& utf8) { + std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter; + return converter.from_bytes(utf8); +} +#endif + void gettimeofday(struct timeval* tv, struct timezone* /* tz */) { using namespace std::chrono; @@ -110,7 +132,7 @@ void InitOnce(OnceType* once, void (*initializer)()) { struct DIR { HANDLE handle_; bool firstread_; - WIN32_FIND_DATA data_; + RX_WIN32_FIND_DATA data_; dirent entry_; DIR() : handle_(INVALID_HANDLE_VALUE), @@ -137,7 +159,7 @@ DIR* opendir(const char* name) { std::unique_ptr<DIR> dir(new DIR); - dir->handle_ = ::FindFirstFileExA(pattern.c_str(), + dir->handle_ = RX_FindFirstFileEx(RX_FN(pattern).c_str(), FindExInfoBasic, // Do not want alternative name &dir->data_, FindExSearchNameMatch, @@ -148,8 +170,9 @@ DIR* opendir(const char* name) { return nullptr; } + RX_FILESTRING x(dir->data_.cFileName, RX_FNLEN(dir->data_.cFileName)); strcpy_s(dir->entry_.d_name,
sizeof(dir->entry_.d_name), - dir->data_.cFileName); + FN_TO_RX(x).c_str()); return dir.release(); } @@ -165,14 +188,15 @@ struct dirent* readdir(DIR* dirp) { return &dirp->entry_; } - auto ret = ::FindNextFileA(dirp->handle_, &dirp->data_); + auto ret = RX_FindNextFile(dirp->handle_, &dirp->data_); if (ret == 0) { return nullptr; } + RX_FILESTRING x(dirp->data_.cFileName, RX_FNLEN(dirp->data_.cFileName)); strcpy_s(dirp->entry_.d_name, sizeof(dirp->entry_.d_name), - dirp->data_.cFileName); + FN_TO_RX(x).c_str()); return &dirp->entry_; } @@ -182,11 +206,15 @@ int closedir(DIR* dirp) { return 0; } -int truncate(const char* path, int64_t len) { +int truncate(const char* path, int64_t length) { if (path == nullptr) { errno = EFAULT; return -1; } + return rocksdb::port::Truncate(path, length); +} + +int Truncate(std::string path, int64_t len) { if (len < 0) { errno = EINVAL; @@ -194,7 +222,7 @@ int truncate(const char* path, int64_t len) { } HANDLE hFile = - CreateFile(path, GENERIC_READ | GENERIC_WRITE, + RX_CreateFile(RX_FN(path).c_str(), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, // Security attrs OPEN_EXISTING, // Truncate existing file only diff --git a/ceph/src/rocksdb/port/win/port_win.h b/ceph/src/rocksdb/port/win/port_win.h index 41ccea68d..de41cdc7f 100644 --- a/ceph/src/rocksdb/port/win/port_win.h +++ b/ceph/src/rocksdb/port/win/port_win.h @@ -78,6 +78,8 @@ namespace rocksdb { #define PREFETCH(addr, rw, locality) +extern const bool kDefaultToAdaptiveMutex; + namespace port { // VS < 2015 @@ -93,7 +95,9 @@ namespace port { // For use at db/file_indexer.h kLevelMaxIndex const uint32_t kMaxUint32 = UINT32_MAX; const int kMaxInt32 = INT32_MAX; +const int kMinInt32 = INT32_MIN; const int64_t kMaxInt64 = INT64_MAX; +const int64_t kMinInt64 = INT64_MIN; const uint64_t kMaxUint64 = UINT64_MAX; #ifdef _WIN64 @@ -109,8 +113,10 @@ const size_t kMaxSizet = UINT_MAX; // For use at db/file_indexer.h kLevelMaxIndex const uint32_t kMaxUint32 = std::numeric_limits<uint32_t>::max(); const int kMaxInt32 = std::numeric_limits<int>::max(); +const int kMinInt32 = std::numeric_limits<int>::min(); const uint64_t kMaxUint64 = std::numeric_limits<uint64_t>::max(); const int64_t kMaxInt64 = std::numeric_limits<int64_t>::max(); +const int64_t kMinInt64 = std::numeric_limits<int64_t>::min(); const size_t kMaxSizet = std::numeric_limits<size_t>::max(); @@ -123,7 +129,7 @@ class CondVar; class Mutex { public: - /* implicit */ Mutex(bool adaptive = false) + /* implicit */ Mutex(bool adaptive = kDefaultToAdaptiveMutex) #ifndef NDEBUG : locked_(false) #endif @@ -327,11 +333,62 @@ inline void* pthread_getspecific(pthread_key_t key) { // using C-runtime to implement. Note, this does not // fill space with zeros in case the file is extended.
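// Illustration only (not part of the original patch): how a UTF-8 path is
// expected to flow through the RX_* filename macros defined further down in
// this header when ROCKSDB_WINDOWS_UTF8_FILENAMES is set. RX_FN() widens the
// UTF-8 bytes via port::utf8_to_utf16(), the W-suffixed Win32 API is called,
// and FN_TO_RX() narrows results back to UTF-8:
//
//   std::string utf8_path = "logs/\xD0\xB6.txt";  // arbitrary UTF-8 bytes
//   RX_FILESTRING native = RX_FN(utf8_path);      // std::wstring in this mode
//   BOOL ok = RX_DeleteFile(native.c_str());      // resolves to DeleteFileW
//   std::string round_trip = FN_TO_RX(native);    // back to UTF-8
//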
int truncate(const char* path, int64_t length); +int Truncate(std::string path, int64_t length); void Crash(const std::string& srcfile, int srcline); extern int GetMaxOpenFiles(); +std::string utf16_to_utf8(const std::wstring& utf16); +std::wstring utf8_to_utf16(const std::string& utf8); } // namespace port + +#ifdef ROCKSDB_WINDOWS_UTF8_FILENAMES + +#define RX_FILESTRING std::wstring +#define RX_FN(a) rocksdb::port::utf8_to_utf16(a) +#define FN_TO_RX(a) rocksdb::port::utf16_to_utf8(a) +#define RX_FNLEN(a) ::wcslen(a) + +#define RX_DeleteFile DeleteFileW +#define RX_CreateFile CreateFileW +#define RX_CreateFileMapping CreateFileMappingW +#define RX_GetFileAttributesEx GetFileAttributesExW +#define RX_FindFirstFileEx FindFirstFileExW +#define RX_FindNextFile FindNextFileW +#define RX_WIN32_FIND_DATA WIN32_FIND_DATAW +#define RX_CreateDirectory CreateDirectoryW +#define RX_RemoveDirectory RemoveDirectoryW +#define RX_GetFileAttributesEx GetFileAttributesExW +#define RX_MoveFileEx MoveFileExW +#define RX_CreateHardLink CreateHardLinkW +#define RX_PathIsRelative PathIsRelativeW +#define RX_GetCurrentDirectory GetCurrentDirectoryW + +#else + +#define RX_FILESTRING std::string +#define RX_FN(a) a +#define FN_TO_RX(a) a +#define RX_FNLEN(a) strlen(a) + +#define RX_DeleteFile DeleteFileA +#define RX_CreateFile CreateFileA +#define RX_CreateFileMapping CreateFileMappingA +#define RX_GetFileAttributesEx GetFileAttributesExA +#define RX_FindFirstFileEx FindFirstFileExA +#define RX_CreateDirectory CreateDirectoryA +#define RX_FindNextFile FindNextFileA +#define RX_WIN32_FIND_DATA WIN32_FIND_DATA +#define RX_CreateDirectory CreateDirectoryA +#define RX_RemoveDirectory RemoveDirectoryA +#define RX_GetFileAttributesEx GetFileAttributesExA +#define RX_MoveFileEx MoveFileExA +#define RX_CreateHardLink CreateHardLinkA +#define RX_PathIsRelative PathIsRelativeA +#define RX_GetCurrentDirectory GetCurrentDirectoryA + +#endif + using port::pthread_key_t; using port::pthread_key_create; using port::pthread_key_delete; diff --git a/ceph/src/rocksdb/port/win/win_thread.cc b/ceph/src/rocksdb/port/win/win_thread.cc index b48af2370..9a976e2c6 100644 --- a/ceph/src/rocksdb/port/win/win_thread.cc +++ b/ceph/src/rocksdb/port/win/win_thread.cc @@ -40,7 +40,7 @@ struct WindowsThread::Data { void WindowsThread::Init(std::function&& func) { data_ = std::make_shared(std::move(func)); - // We create another instance of shared_ptr to get an additional ref + // We create another instance of std::shared_ptr to get an additional ref // since we may detach and destroy this instance before the threadproc // may start to run. 
We choose to allocate this additional ref on the heap // so we do not need to synchronize and allow this thread to proceed diff --git a/ceph/src/rocksdb/src.mk b/ceph/src/rocksdb/src.mk index 9a0ce92ba..55b4e3427 100644 --- a/ceph/src/rocksdb/src.mk +++ b/ceph/src/rocksdb/src.mk @@ -11,6 +11,7 @@ LIB_SOURCES = \ db/compaction_iterator.cc \ db/compaction_job.cc \ db/compaction_picker.cc \ + db/compaction_picker_fifo.cc \ db/compaction_picker_universal.cc \ db/convenience.cc \ db/db_filesnapshot.cc \ @@ -21,6 +22,7 @@ LIB_SOURCES = \ db/db_impl_files.cc \ db/db_impl_open.cc \ db/db_impl_readonly.cc \ + db/db_impl_secondary.cc \ db/db_impl_write.cc \ db/db_info_dumper.cc \ db/db_iter.cc \ @@ -33,6 +35,7 @@ LIB_SOURCES = \ db/flush_job.cc \ db/flush_scheduler.cc \ db/forward_iterator.cc \ + db/in_memory_stats_history.cc \ db/internal_stats.cc \ db/logs_with_prep_tracker.cc \ db/log_reader.cc \ @@ -43,6 +46,7 @@ LIB_SOURCES = \ db/merge_helper.cc \ db/merge_operator.cc \ db/range_del_aggregator.cc \ + db/range_tombstone_fragmenter.cc \ db/repair.cc \ db/snapshot_impl.cc \ db/table_cache.cc \ @@ -64,7 +68,6 @@ LIB_SOURCES = \ env/io_posix.cc \ env/mock_env.cc \ memtable/alloc_tracker.cc \ - memtable/hash_cuckoo_rep.cc \ memtable/hash_linklist_rep.cc \ memtable/hash_skiplist_rep.cc \ memtable/skiplistrep.cc \ @@ -120,6 +123,7 @@ LIB_SOURCES = \ table/plain_table_index.cc \ table/plain_table_key_coding.cc \ table/plain_table_reader.cc \ + table/sst_file_reader.cc \ table/sst_file_writer.cc \ table/table_properties.cc \ table/two_level_iterator.cc \ @@ -133,6 +137,7 @@ LIB_SOURCES = \ util/comparator.cc \ util/compression_context_cache.cc \ util/concurrent_arena.cc \ + util/concurrent_task_limiter_impl.cc \ util/crc32c.cc \ util/delete_scheduler.cc \ util/dynamic_bloom.cc \ @@ -142,6 +147,7 @@ LIB_SOURCES = \ util/filename.cc \ util/filter_policy.cc \ util/hash.cc \ + util/jemalloc_nodump_allocator.cc \ util/log_buffer.cc \ util/murmurhash.cc \ util/random.cc \ @@ -172,16 +178,10 @@ LIB_SOURCES = \ utilities/checkpoint/checkpoint_impl.cc \ utilities/compaction_filters/remove_emptyvalue_compactionfilter.cc \ utilities/convenience/info_log_finder.cc \ - utilities/date_tiered/date_tiered_db_impl.cc \ utilities/debug.cc \ - utilities/document/document_db.cc \ - utilities/document/json_document.cc \ - utilities/document/json_document_builder.cc \ utilities/env_mirror.cc \ utilities/env_timed.cc \ - utilities/geodb/geodb_impl.cc \ utilities/leveldb_options/leveldb_options.cc \ - utilities/lua/rocks_lua_compaction_filter.cc \ utilities/memory/memory_util.cc \ utilities/merge_operators/max.cc \ utilities/merge_operators/put.cc \ @@ -196,9 +196,7 @@ LIB_SOURCES = \ utilities/persistent_cache/block_cache_tier_metadata.cc \ utilities/persistent_cache/persistent_cache_tier.cc \ utilities/persistent_cache/volatile_tier_impl.cc \ - utilities/redis/redis_lists.cc \ utilities/simulator_cache/sim_cache.cc \ - utilities/spatialdb/spatial_db.cc \ utilities/table_properties_collectors/compact_on_deletion_collector.cc \ utilities/trace/file_trace_reader_writer.cc \ utilities/transactions/optimistic_transaction.cc \ @@ -244,11 +242,6 @@ MOCK_LIB_SOURCES = \ BENCH_LIB_SOURCES = \ tools/db_bench_tool.cc \ -EXP_LIB_SOURCES = \ - utilities/col_buf_decoder.cc \ - utilities/col_buf_encoder.cc \ - utilities/column_aware_encoding_util.cc - TEST_LIB_SOURCES = \ db/db_test_util.cc \ util/testharness.cc \ @@ -287,6 +280,7 @@ MAIN_SOURCES = \ db/db_options_test.cc \ db/db_properties_test.cc \ db/db_range_del_test.cc \ + 
db/db_secondary_test.cc \ db/db_sst_test.cc \ db/db_statistics_test.cc \ db/db_table_properties_test.cc \ @@ -325,10 +319,10 @@ MAIN_SOURCES = \ db/persistent_cache_test.cc \ db/plain_table_db_test.cc \ db/prefix_test.cc \ - db/redis_test.cc \ db/repair_test.cc \ db/range_del_aggregator_test.cc \ db/range_del_aggregator_bench.cc \ + db/range_tombstone_fragmenter_test.cc \ db/table_properties_collector_test.cc \ db/util_merge_operators_test.cc \ db/version_builder_test.cc \ @@ -357,6 +351,7 @@ MAIN_SOURCES = \ table/data_block_hash_index_test.cc \ table/full_filter_block_test.cc \ table/merger_test.cc \ + table/sst_file_reader_test.cc \ table/table_reader_bench.cc \ table/table_test.cc \ third-party/gtest-1.7.0/fused-src/gtest/gtest-all.cc \ @@ -390,21 +385,12 @@ MAIN_SOURCES = \ utilities/cassandra/cassandra_row_merge_test.cc \ utilities/cassandra/cassandra_serialize_test.cc \ utilities/checkpoint/checkpoint_test.cc \ - utilities/column_aware_encoding_exp.cc \ - utilities/column_aware_encoding_test.cc \ - utilities/date_tiered/date_tiered_test.cc \ - utilities/document/document_db_test.cc \ - utilities/document/json_document_test.cc \ - utilities/geodb/geodb_test.cc \ - utilities/lua/rocks_lua_test.cc \ utilities/memory/memory_test.cc \ utilities/merge_operators/string_append/stringappend_test.cc \ utilities/object_registry_test.cc \ utilities/option_change_migration/option_change_migration_test.cc \ utilities/options/options_util_test.cc \ - utilities/redis/redis_lists_test.cc \ utilities/simulator_cache/sim_cache_test.cc \ - utilities/spatialdb/spatial_db_test.cc \ utilities/table_properties_collectors/compact_on_deletion_collector_test.cc \ utilities/transactions/optimistic_transaction_test.cc \ utilities/transactions/transaction_test.cc \ @@ -419,10 +405,13 @@ JNI_NATIVE_SOURCES = \ java/rocksjni/checkpoint.cc \ java/rocksjni/clock_cache.cc \ java/rocksjni/columnfamilyhandle.cc \ + java/rocksjni/compact_range_options.cc \ java/rocksjni/compaction_filter.cc \ java/rocksjni/compaction_filter_factory.cc \ java/rocksjni/compaction_filter_factory_jnicallback.cc \ - java/rocksjni/compact_range_options.cc \ + java/rocksjni/compaction_job_info.cc \ + java/rocksjni/compaction_job_stats.cc \ + java/rocksjni/compaction_options.cc \ java/rocksjni/compaction_options_fifo.cc \ java/rocksjni/compaction_options_universal.cc \ java/rocksjni/comparator.cc \ @@ -437,12 +426,14 @@ JNI_NATIVE_SOURCES = \ java/rocksjni/loggerjnicallback.cc \ java/rocksjni/lru_cache.cc \ java/rocksjni/memtablejni.cc \ + java/rocksjni/memory_util.cc \ java/rocksjni/merge_operator.cc \ java/rocksjni/native_comparator_wrapper_test.cc \ java/rocksjni/optimistic_transaction_db.cc \ java/rocksjni/optimistic_transaction_options.cc \ java/rocksjni/options.cc \ java/rocksjni/options_util.cc \ + java/rocksjni/persistent_cache.cc \ java/rocksjni/ratelimiterjni.cc \ java/rocksjni/remove_emptyvalue_compactionfilterjni.cc \ java/rocksjni/cassandra_compactionfilterjni.cc \ @@ -458,6 +449,11 @@ JNI_NATIVE_SOURCES = \ java/rocksjni/statistics.cc \ java/rocksjni/statisticsjni.cc \ java/rocksjni/table.cc \ + java/rocksjni/table_filter.cc \ + java/rocksjni/table_filter_jnicallback.cc \ + java/rocksjni/thread_status.cc \ + java/rocksjni/trace_writer.cc \ + java/rocksjni/trace_writer_jnicallback.cc \ java/rocksjni/transaction.cc \ java/rocksjni/transaction_db.cc \ java/rocksjni/transaction_options.cc \ @@ -466,7 +462,10 @@ JNI_NATIVE_SOURCES = \ java/rocksjni/transaction_notifier.cc \ java/rocksjni/transaction_notifier_jnicallback.cc \ 
java/rocksjni/ttl.cc \ + java/rocksjni/wal_filter.cc \ + java/rocksjni/wal_filter_jnicallback.cc \ java/rocksjni/write_batch.cc \ java/rocksjni/writebatchhandlerjnicallback.cc \ java/rocksjni/write_batch_test.cc \ - java/rocksjni/write_batch_with_index.cc + java/rocksjni/write_batch_with_index.cc \ + java/rocksjni/write_buffer_manager.cc diff --git a/ceph/src/rocksdb/table/adaptive_table_factory.cc b/ceph/src/rocksdb/table/adaptive_table_factory.cc index 0a3e9415a..bbba3b919 100644 --- a/ceph/src/rocksdb/table/adaptive_table_factory.cc +++ b/ceph/src/rocksdb/table/adaptive_table_factory.cc @@ -42,8 +42,8 @@ extern const uint64_t kCuckooTableMagicNumber; Status AdaptiveTableFactory::NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr<RandomAccessFileReader>&& file, uint64_t file_size, - unique_ptr<TableReader>* table, + std::unique_ptr<RandomAccessFileReader>&& file, uint64_t file_size, + std::unique_ptr<TableReader>* table, bool /*prefetch_index_and_filter_in_cache*/) const { Footer footer; auto s = ReadFooterFromFile(file.get(), nullptr /* prefetch_buffer */, diff --git a/ceph/src/rocksdb/table/adaptive_table_factory.h b/ceph/src/rocksdb/table/adaptive_table_factory.h index 00af6a76e..5534c8b37 100644 --- a/ceph/src/rocksdb/table/adaptive_table_factory.h +++ b/ceph/src/rocksdb/table/adaptive_table_factory.h @@ -14,7 +14,6 @@ namespace rocksdb { struct EnvOptions; -using std::unique_ptr; class Status; class RandomAccessFile; class WritableFile; @@ -35,8 +34,8 @@ class AdaptiveTableFactory : public TableFactory { Status NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr<RandomAccessFileReader>&& file, uint64_t file_size, - unique_ptr<TableReader>* table, + std::unique_ptr<RandomAccessFileReader>&& file, uint64_t file_size, + std::unique_ptr<TableReader>* table, bool prefetch_index_and_filter_in_cache = true) const override; TableBuilder* NewTableBuilder( diff --git a/ceph/src/rocksdb/table/block.cc b/ceph/src/rocksdb/table/block.cc index c8247828e..7c83ebb64 100644 --- a/ceph/src/rocksdb/table/block.cc +++ b/ceph/src/rocksdb/table/block.cc @@ -63,6 +63,39 @@ struct DecodeEntry { } }; +// Helper routine: similar to DecodeEntry but does not have assertions. +// Instead, returns nullptr so that caller can detect and report failure. +struct CheckAndDecodeEntry { + inline const char* operator()(const char* p, const char* limit, + uint32_t* shared, uint32_t* non_shared, + uint32_t* value_length) { + // We need 2 bytes for shared and non_shared size. We also need one more + // byte either for value size or the actual value in case of value delta + // encoding.
+ if (limit - p < 3) { + return nullptr; + } + *shared = reinterpret_cast<const unsigned char*>(p)[0]; + *non_shared = reinterpret_cast<const unsigned char*>(p)[1]; + *value_length = reinterpret_cast<const unsigned char*>(p)[2]; + if ((*shared | *non_shared | *value_length) < 128) { + // Fast path: all three values are encoded in one byte each + p += 3; + } else { + if ((p = GetVarint32Ptr(p, limit, shared)) == nullptr) return nullptr; + if ((p = GetVarint32Ptr(p, limit, non_shared)) == nullptr) return nullptr; + if ((p = GetVarint32Ptr(p, limit, value_length)) == nullptr) { + return nullptr; + } + } + + if (static_cast<uint32_t>(limit - p) < (*non_shared + *value_length)) { + return nullptr; + } + return p; + } +}; + struct DecodeKey { inline const char* operator()(const char* p, const char* limit, uint32_t* shared, uint32_t* non_shared) { @@ -96,7 +129,12 @@ struct DecodeKeyV4 { void DataBlockIter::Next() { assert(Valid()); - ParseNextDataKey(); + ParseNextDataKey<DecodeEntry>(); +} + +void DataBlockIter::NextOrReport() { + assert(Valid()); + ParseNextDataKey<CheckAndDecodeEntry>(); } void IndexBlockIter::Next() { @@ -179,7 +217,7 @@ void DataBlockIter::Prev() { SeekToRestartPoint(restart_index_); do { - if (!ParseNextDataKey()) { + if (!ParseNextDataKey<DecodeEntry>()) { break; } Slice current_key = key(); @@ -218,7 +256,7 @@ void DataBlockIter::Seek(const Slice& target) { // Linear search (within restart block) for first key >= target while (true) { - if (!ParseNextDataKey() || Compare(key_, seek_key) >= 0) { + if (!ParseNextDataKey<DecodeEntry>() || Compare(key_, seek_key) >= 0) { return; } } @@ -297,7 +335,7 @@ bool DataBlockIter::SeekForGetImpl(const Slice& target) { // // TODO(fwu): check the left and right boundary of the restart interval // to avoid linear seek a target key that is out of range. - if (!ParseNextDataKey(limit) || Compare(key_, target) >= 0) { + if (!ParseNextDataKey<DecodeEntry>(limit) || Compare(key_, target) >= 0) { // we stop at the first potential matching user key.
break; } @@ -391,7 +429,7 @@ void DataBlockIter::SeekForPrev(const Slice& target) { SeekToRestartPoint(index); // Linear search (within restart block) for first key >= seek_key - while (ParseNextDataKey() && Compare(key_, seek_key) < 0) { + while (ParseNextDataKey<DecodeEntry>() && Compare(key_, seek_key) < 0) { } if (!Valid()) { SeekToLast(); @@ -407,7 +445,15 @@ void DataBlockIter::SeekForPrev(const Slice& target) { SeekToFirst() { if (data_ == nullptr) { // Not init yet return; } SeekToRestartPoint(0); - ParseNextDataKey(); + ParseNextDataKey<DecodeEntry>(); +} + +void DataBlockIter::SeekToFirstOrReport() { + if (data_ == nullptr) { // Not init yet + return; + } + SeekToRestartPoint(0); + ParseNextDataKey<CheckAndDecodeEntry>(); } void IndexBlockIter::SeekToFirst() { @@ -423,7 +469,7 @@ void DataBlockIter::SeekToLast() { return; } SeekToRestartPoint(num_restarts_ - 1); - while (ParseNextDataKey() && NextEntryOffset() < restarts_) { + while (ParseNextDataKey<DecodeEntry>() && NextEntryOffset() < restarts_) { // Keep skipping } } @@ -447,6 +493,7 @@ void BlockIter::CorruptionError() { value_.clear(); } +template <typename DecodeEntryFunc> bool DataBlockIter::ParseNextDataKey(const char* limit) { current_ = NextEntryOffset(); const char* p = data_ + current_; @@ -463,7 +510,7 @@ bool DataBlockIter::ParseNextDataKey(const char* limit) { // Decode next entry uint32_t shared, non_shared, value_length; - p = DecodeEntry()(p, limit, &shared, &non_shared, &value_length); + p = DecodeEntryFunc()(p, limit, &shared, &non_shared, &value_length); if (p == nullptr || key_.Size() < shared) { CorruptionError(); return false; @@ -722,13 +769,13 @@ bool IndexBlockIter::PrefixSeek(const Slice& target, uint32_t* index) { if (num_blocks == 0) { current_ = restarts_; return false; - } else { + } else { return BinaryBlockIndexSeek(seek_key, block_ids, 0, num_blocks - 1, index); } } uint32_t Block::NumRestarts() const { - assert(size_ >= 2*sizeof(uint32_t)); + assert(size_ >= 2 * sizeof(uint32_t)); uint32_t block_footer = DecodeFixed32(data_ + size_ - sizeof(uint32_t)); uint32_t num_restarts = block_footer; if (size_ > kMaxBlockSizeSupportedByHashIndex) { @@ -781,45 +828,43 @@ Block::Block(BlockContents&& contents, SequenceNumber _global_seqno, size_ = 0; // Error marker } else { // Should only decode restart points for uncompressed blocks - if (compression_type() == kNoCompression) { - num_restarts_ = NumRestarts(); - switch (IndexType()) { - case BlockBasedTableOptions::kDataBlockBinarySearch: - restart_offset_ = static_cast<uint32_t>(size_) - - (1 + num_restarts_) * sizeof(uint32_t); - if (restart_offset_ > size_ - sizeof(uint32_t)) { - // The size is too small for NumRestarts() and therefore - // restart_offset_ wrapped around. - size_ = 0; - } + num_restarts_ = NumRestarts(); + switch (IndexType()) { + case BlockBasedTableOptions::kDataBlockBinarySearch: + restart_offset_ = static_cast<uint32_t>(size_) - + (1 + num_restarts_) * sizeof(uint32_t); + if (restart_offset_ > size_ - sizeof(uint32_t)) { + // The size is too small for NumRestarts() and therefore + // restart_offset_ wrapped around.
+ size_ = 0; + } + break; + case BlockBasedTableOptions::kDataBlockBinaryAndHash: + if (size_ < sizeof(uint32_t) /* block footer */ + + sizeof(uint16_t) /* NUM_BUCK */) { + size_ = 0; break; - case BlockBasedTableOptions::kDataBlockBinaryAndHash: - if (size_ < sizeof(uint32_t) /* block footer */ + - sizeof(uint16_t) /* NUM_BUCK */) { - size_ = 0; - break; - } - - uint16_t map_offset; - data_block_hash_index_.Initialize( - contents.data.data(), - static_cast<uint16_t>(contents.data.size() - - sizeof(uint32_t)), /*chop off - NUM_RESTARTS*/ - &map_offset); - - restart_offset_ = map_offset - num_restarts_ * sizeof(uint32_t); - - if (restart_offset_ > map_offset) { - // map_offset is too small for NumRestarts() and - // therefore restart_offset_ wrapped around. - size_ = 0; - break; - } + } + + uint16_t map_offset; + data_block_hash_index_.Initialize( + contents.data.data(), + static_cast<uint16_t>(contents.data.size() - + sizeof(uint32_t)), /*chop off + NUM_RESTARTS*/ + &map_offset); + + restart_offset_ = map_offset - num_restarts_ * sizeof(uint32_t); + + if (restart_offset_ > map_offset) { + // map_offset is too small for NumRestarts() and + // therefore restart_offset_ wrapped around. + size_ = 0; + break; - default: - size_ = 0; // Error marker - } + } + break; + default: + size_ = 0; // Error marker } } if (read_amp_bytes_per_bit != 0 && statistics && size_ != 0) { @@ -834,6 +879,7 @@ DataBlockIter* Block::NewIterator(const Comparator* cmp, const Comparator* ucmp, bool /*total_order_seek*/, bool /*key_includes_seq*/, bool /*value_is_full*/, + bool block_contents_pinned, BlockPrefixIndex* /*prefix_index*/) { DataBlockIter* ret_iter; if (iter != nullptr) { @@ -852,7 +898,7 @@ DataBlockIter* Block::NewIterator(const Comparator* cmp, const Comparator* ucmp, } else { ret_iter->Initialize( cmp, ucmp, data_, restart_offset_, num_restarts_, global_seqno_, - read_amp_bitmap_.get(), cachable(), + read_amp_bitmap_.get(), block_contents_pinned, data_block_hash_index_.Valid() ? &data_block_hash_index_ : nullptr); if (read_amp_bitmap_) { if (read_amp_bitmap_->GetStatistics() != stats) { @@ -870,6 +916,7 @@ IndexBlockIter* Block::NewIterator(const Comparator* cmp, const Comparator* ucmp, IndexBlockIter* iter, Statistics* /*stats*/, bool total_order_seek, bool key_includes_seq, bool value_is_full, + bool block_contents_pinned, BlockPrefixIndex* prefix_index) { IndexBlockIter* ret_iter; if (iter != nullptr) { @@ -890,7 +937,8 @@ IndexBlockIter* Block::NewIterator(const Comparator* cmp, total_order_seek ?
nullptr : prefix_index; ret_iter->Initialize(cmp, ucmp, data_, restart_offset_, num_restarts_, prefix_index_ptr, key_includes_seq, value_is_full, - cachable(), nullptr /* data_block_hash_index */); + block_contents_pinned, + nullptr /* data_block_hash_index */); } return ret_iter; diff --git a/ceph/src/rocksdb/table/block.h b/ceph/src/rocksdb/table/block.h index 83900b56f..737874abd 100644 --- a/ceph/src/rocksdb/table/block.h +++ b/ceph/src/rocksdb/table/block.h @@ -53,8 +53,8 @@ class BlockReadAmpBitmap { : bitmap_(nullptr), bytes_per_bit_pow_(0), statistics_(statistics), - rnd_( - Random::GetTLSInstance()->Uniform(static_cast<uint32_t>(bytes_per_bit))) { + rnd_(Random::GetTLSInstance()->Uniform( + static_cast<uint32_t>(bytes_per_bit))) { TEST_SYNC_POINT_CALLBACK("BlockReadAmpBitmap:rnd", &rnd_); assert(block_size > 0 && bytes_per_bit > 0); @@ -64,8 +64,7 @@ class BlockReadAmpBitmap { } // num_bits_needed = ceil(block_size / bytes_per_bit) - size_t num_bits_needed = - ((block_size - 1) >> bytes_per_bit_pow_) + 1; + size_t num_bits_needed = ((block_size - 1) >> bytes_per_bit_pow_) + 1; assert(num_bits_needed > 0); // bitmap_size = ceil(num_bits_needed / kBitsPerEntry) @@ -153,14 +152,12 @@ class Block { size_t size() const { return size_; } const char* data() const { return data_; } - bool cachable() const { return contents_.cachable; } // The additional memory space taken by the block data. size_t usable_size() const { return contents_.usable_size(); } uint32_t NumRestarts() const; + bool own_bytes() const { return contents_.own_bytes(); } + BlockBasedTableOptions::DataBlockIndexType IndexType() const; - CompressionType compression_type() const { - return contents_.compression_type; - } // If comparator is InternalKeyComparator, user_comparator is its user // comparator; they are equal otherwise. @@ -170,7 +167,7 @@ class Block { // // key_includes_seq, default true, means that the keys are in internal key // format. - // value_is_full, default ture, means that no delta encoding is + // value_is_full, default true, means that no delta encoding is // applied to values. // // NewIterator @@ -180,6 +177,14 @@ class Block { // If `prefix_index` is not nullptr this block will do hash lookup for the key // prefix. If total_order_seek is true, prefix_index_ is ignored. // + // If `block_contents_pinned` is true, the caller will guarantee that when + // the cleanup functions are transferred from the iterator to other + // classes, e.g. PinnableSlice, the pointer to the bytes will still be + // valid. Either the iterator holds cache handle or ownership of some resource + // and release them in a release function, or caller is sure that the data + // will not go away (for example, it's from mmapped file which will not be + // closed). + // // NOTE: for the hash based lookup, if a key prefix doesn't match any key, // the iterator will simply be set as "invalid", rather than returning // the key that is just past the target key. @@ -188,7 +193,8 @@ class Block { const Comparator* comparator, const Comparator* user_comparator, TBlockIter* iter = nullptr, Statistics* stats = nullptr, bool total_order_seek = true, bool key_includes_seq = true, - bool value_is_full = true, BlockPrefixIndex* prefix_index = nullptr); + bool value_is_full = true, bool block_contents_pinned = false, + BlockPrefixIndex* prefix_index = nullptr); // Report an approximation of how much memory has been used.
size_t ApproximateMemoryUsage() const; @@ -197,9 +203,9 @@ class Block { private: BlockContents contents_; - const char* data_; // contents_.data.data() - size_t size_; // contents_.data.size() - uint32_t restart_offset_; // Offset in data_ of restart array + const char* data_; // contents_.data.data() + size_t size_; // contents_.data.size() + uint32_t restart_offset_; // Offset in data_ of restart array uint32_t num_restarts_; std::unique_ptr<BlockReadAmpBitmap> read_amp_bitmap_; // All keys in the block will have seqno = global_seqno_, regardless of @@ -219,8 +225,8 @@ class BlockIter : public InternalIteratorBase<TValue> { void InitializeBase(const Comparator* comparator, const char* data, uint32_t restarts, uint32_t num_restarts, SequenceNumber global_seqno, bool block_contents_pinned) { - assert(data_ == nullptr); // Ensure it is called only once - assert(num_restarts > 0); // Ensure the param is valid + assert(data_ == nullptr); // Ensure it is called only once + assert(num_restarts > 0); // Ensure the param is valid comparator_ = comparator; data_ = data; @@ -288,14 +294,16 @@ class BlockIter : public InternalIteratorBase<TValue> { // Index of restart block in which current_ or current_-1 falls uint32_t restart_index_; - uint32_t restarts_; // Offset of restart array (list of fixed32) + uint32_t restarts_; // Offset of restart array (list of fixed32) // current_ is offset in data_ of current entry. >= restarts_ if !Valid uint32_t current_; IterKey key_; Slice value_; Status status_; bool key_pinned_; - // whether the block data is guaranteed to outlive this iterator + // Whether the block data is guaranteed to outlive this iterator, and + // as long as the cleanup functions are transferred to another class, + // e.g. PinnableSlice, the pointer to the bytes will still be valid. bool block_contents_pinned_; SequenceNumber global_seqno_; @@ -386,8 +394,18 @@ class DataBlockIter final : public BlockIter<Slice> { virtual void Next() override; + // Try to advance to the next entry in the block. If there is data corruption + // or error, report it to the caller instead of aborting the process. May + // incur higher CPU overhead because we need to perform check on every entry. + void NextOrReport(); + virtual void SeekToFirst() override; + // Try to seek to the first entry in the block. If there is data corruption + // or error, report it to caller instead of aborting the process. May incur + // higher CPU overhead because we need to perform check on every entry. + void SeekToFirstOrReport(); + virtual void SeekToLast() override; void Invalidate(Status s) { @@ -430,6 +448,7 @@ class DataBlockIter final : public BlockIter<Slice> { DataBlockHashIndex* data_block_hash_index_; const Comparator* user_comparator_; + template <typename DecodeEntryFunc> inline bool ParseNextDataKey(const char* limit = nullptr); inline int Compare(const IterKey& ikey, const Slice& b) const { @@ -449,7 +468,7 @@ class IndexBlockIter final : public BlockIter<BlockHandle> { } // key_includes_seq, default true, means that the keys are in internal key // format. - // value_is_full, default ture, means that no delta encoding is + // value_is_full, default true, means that no delta encoding is
IndexBlockIter(const Comparator* comparator, const Comparator* user_comparator, const char* data, @@ -528,8 +547,7 @@ class IndexBlockIter final : public BlockIter<BlockHandle> { bool PrefixSeek(const Slice& target, uint32_t* index); bool BinaryBlockIndexSeek(const Slice& target, uint32_t* block_ids, - uint32_t left, uint32_t right, - uint32_t* index); + uint32_t left, uint32_t right, uint32_t* index); inline int CompareBlockKey(uint32_t block_index, const Slice& target); inline int Compare(const Slice& a, const Slice& b) const { diff --git a/ceph/src/rocksdb/table/block_based_filter_block.cc b/ceph/src/rocksdb/table/block_based_filter_block.cc index a4b406869..81087b243 100644 --- a/ceph/src/rocksdb/table/block_based_filter_block.cc +++ b/ceph/src/rocksdb/table/block_based_filter_block.cc @@ -53,7 +53,6 @@ void AppendItem(std::string* props, const TKey& key, const std::string& value) { } } // namespace - // See doc/table_format.txt for an explanation of the filter block format. // Generate new filter every 2KB of data diff --git a/ceph/src/rocksdb/table/block_based_filter_block.h b/ceph/src/rocksdb/table/block_based_filter_block.h index 96a75e361..d1ff58546 100644 --- a/ceph/src/rocksdb/table/block_based_filter_block.h +++ b/ceph/src/rocksdb/table/block_based_filter_block.h @@ -15,8 +15,8 @@ #include #include -#include #include +#include #include #include "rocksdb/options.h" #include "rocksdb/slice.h" @@ -26,7 +26,6 @@ namespace rocksdb { - // A BlockBasedFilterBlockBuilder is used to construct all of the filters for a // particular Table. It generates a single string which is stored as // a special block in the Table. @@ -36,7 +35,7 @@ namespace rocksdb { class BlockBasedFilterBlockBuilder : public FilterBlockBuilder { public: BlockBasedFilterBlockBuilder(const SliceTransform* prefix_extractor, - const BlockBasedTableOptions& table_opt); + const BlockBasedTableOptions& table_opt); virtual bool IsBlockBased() override { return true; } virtual void StartBlock(uint64_t block_offset) override; @@ -66,7 +65,7 @@ class BlockBasedFilterBlockBuilder : public FilterBlockBuilder { std::string result_; // Filter data computed so far std::vector<Slice> tmp_entries_; // policy_->CreateFilter() argument std::vector<uint32_t> filter_offsets_; - size_t num_added_; // Number of keys added + size_t num_added_; // Number of keys added // No copying allowed BlockBasedFilterBlockBuilder(const BlockBasedFilterBlockBuilder&); diff --git a/ceph/src/rocksdb/table/block_based_filter_block_test.cc b/ceph/src/rocksdb/table/block_based_filter_block_test.cc index 8de857f4e..6b352b2f6 100644 --- a/ceph/src/rocksdb/table/block_based_filter_block_test.cc +++ b/ceph/src/rocksdb/table/block_based_filter_block_test.cc @@ -21,18 +21,16 @@ namespace rocksdb { // For testing: emit an array with one hash value per key class TestHashFilter : public FilterPolicy { public: - virtual const char* Name() const override { return "TestHashFilter"; } + const char* Name() const override { return "TestHashFilter"; } - virtual void CreateFilter(const Slice* keys, int n, - std::string* dst) const override { + void CreateFilter(const Slice* keys, int n, std::string* dst) const override { for (int i = 0; i < n; i++) { uint32_t h = Hash(keys[i].data(), keys[i].size(), 1); PutFixed32(dst, h); } } - virtual bool KeyMayMatch(const Slice& key, - const Slice& filter) const override { + bool KeyMayMatch(const Slice& key, const Slice& filter) const override { uint32_t h = Hash(key.data(), key.size(), 1); for (unsigned int i = 0; i + 4 <= filter.size(); i += 4) { if (h ==
DecodeFixed32(filter.data() + i)) { @@ -55,7 +53,7 @@ class FilterBlockTest : public testing::Test { TEST_F(FilterBlockTest, EmptyBuilder) { BlockBasedFilterBlockBuilder builder(nullptr, table_options_); - BlockContents block(builder.Finish(), false, kNoCompression); + BlockContents block(builder.Finish()); ASSERT_EQ("\\x00\\x00\\x00\\x00\\x0b", EscapeString(block.data)); BlockBasedFilterBlockReader reader(nullptr, table_options_, true, std::move(block), nullptr); @@ -75,7 +73,7 @@ TEST_F(FilterBlockTest, SingleChunk) { builder.StartBlock(300); builder.Add("hello"); ASSERT_EQ(5, builder.NumAdded()); - BlockContents block(builder.Finish(), false, kNoCompression); + BlockContents block(builder.Finish()); BlockBasedFilterBlockReader reader(nullptr, table_options_, true, std::move(block), nullptr); ASSERT_TRUE(reader.KeyMayMatch("foo", nullptr, 100)); @@ -107,7 +105,7 @@ TEST_F(FilterBlockTest, MultiChunk) { builder.Add("box"); builder.Add("hello"); - BlockContents block(builder.Finish(), false, kNoCompression); + BlockContents block(builder.Finish()); BlockBasedFilterBlockReader reader(nullptr, table_options_, true, std::move(block), nullptr); @@ -146,13 +144,13 @@ class BlockBasedFilterBlockTest : public testing::Test { table_options_.filter_policy.reset(NewBloomFilterPolicy(10)); } - ~BlockBasedFilterBlockTest() {} + ~BlockBasedFilterBlockTest() override {} }; TEST_F(BlockBasedFilterBlockTest, BlockBasedEmptyBuilder) { - FilterBlockBuilder* builder = new BlockBasedFilterBlockBuilder( - nullptr, table_options_); - BlockContents block(builder->Finish(), false, kNoCompression); + FilterBlockBuilder* builder = + new BlockBasedFilterBlockBuilder(nullptr, table_options_); + BlockContents block(builder->Finish()); ASSERT_EQ("\\x00\\x00\\x00\\x00\\x0b", EscapeString(block.data)); FilterBlockReader* reader = new BlockBasedFilterBlockReader( nullptr, table_options_, true, std::move(block), nullptr); @@ -164,8 +162,8 @@ TEST_F(BlockBasedFilterBlockTest, BlockBasedEmptyBuilder) { } TEST_F(BlockBasedFilterBlockTest, BlockBasedSingleChunk) { - FilterBlockBuilder* builder = new BlockBasedFilterBlockBuilder( - nullptr, table_options_); + FilterBlockBuilder* builder = + new BlockBasedFilterBlockBuilder(nullptr, table_options_); builder->StartBlock(100); builder->Add("foo"); builder->Add("bar"); @@ -174,7 +172,7 @@ TEST_F(BlockBasedFilterBlockTest, BlockBasedSingleChunk) { builder->Add("box"); builder->StartBlock(300); builder->Add("hello"); - BlockContents block(builder->Finish(), false, kNoCompression); + BlockContents block(builder->Finish()); FilterBlockReader* reader = new BlockBasedFilterBlockReader( nullptr, table_options_, true, std::move(block), nullptr); ASSERT_TRUE(reader->KeyMayMatch("foo", nullptr, 100)); @@ -190,8 +188,8 @@ TEST_F(BlockBasedFilterBlockTest, BlockBasedSingleChunk) { } TEST_F(BlockBasedFilterBlockTest, BlockBasedMultiChunk) { - FilterBlockBuilder* builder = new BlockBasedFilterBlockBuilder( - nullptr, table_options_); + FilterBlockBuilder* builder = + new BlockBasedFilterBlockBuilder(nullptr, table_options_); // First filter builder->StartBlock(0); @@ -210,7 +208,7 @@ TEST_F(BlockBasedFilterBlockTest, BlockBasedMultiChunk) { builder->Add("box"); builder->Add("hello"); - BlockContents block(builder->Finish(), false, kNoCompression); + BlockContents block(builder->Finish()); FilterBlockReader* reader = new BlockBasedFilterBlockReader( nullptr, table_options_, true, std::move(block), nullptr); diff --git a/ceph/src/rocksdb/table/block_based_table_builder.cc 
b/ceph/src/rocksdb/table/block_based_table_builder.cc index 59c385d65..479311f5b 100644 --- a/ceph/src/rocksdb/table/block_based_table_builder.cc +++ b/ceph/src/rocksdb/table/block_based_table_builder.cc @@ -42,6 +42,7 @@ #include "util/coding.h" #include "util/compression.h" #include "util/crc32c.h" +#include "util/memory_allocator.h" #include "util/stop_watch.h" #include "util/string_util.h" #include "util/xxhash.h" @@ -79,9 +80,11 @@ FilterBlockBuilder* CreateFilterBlockBuilder( // until index builder actully cuts the partition, we take the lower bound // as partition size. assert(table_opt.block_size_deviation <= 100); - auto partition_size = static_cast( - ((table_opt.metadata_block_size * - (100 - table_opt.block_size_deviation)) + 99) / 100); + auto partition_size = + static_cast(((table_opt.metadata_block_size * + (100 - table_opt.block_size_deviation)) + + 99) / + 100); partition_size = std::max(partition_size, static_cast(1)); return new PartitionedFilterBlockBuilder( mopt.prefix_extractor.get(), table_opt.whole_key_filtering, @@ -100,83 +103,105 @@ bool GoodCompressionRatio(size_t compressed_size, size_t raw_size) { return compressed_size < raw_size - (raw_size / 8u); } -} // namespace - -// format_version is the block format as defined in include/rocksdb/table.h -Slice CompressBlock(const Slice& raw, const CompressionContext& compression_ctx, - CompressionType* type, uint32_t format_version, - std::string* compressed_output) { - *type = compression_ctx.type(); - if (compression_ctx.type() == kNoCompression) { - return raw; - } - +bool CompressBlockInternal(const Slice& raw, + const CompressionInfo& compression_info, + uint32_t format_version, + std::string* compressed_output) { // Will return compressed block contents if (1) the compression method is // supported in this platform and (2) the compression rate is "good enough". - switch (compression_ctx.type()) { + switch (compression_info.type()) { case kSnappyCompression: - if (Snappy_Compress(compression_ctx, raw.data(), raw.size(), - compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; // fall back to no compression. + return Snappy_Compress(compression_info, raw.data(), raw.size(), + compressed_output); case kZlibCompression: - if (Zlib_Compress( - compression_ctx, - GetCompressFormatForVersion(kZlibCompression, format_version), - raw.data(), raw.size(), compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; // fall back to no compression. + return Zlib_Compress( + compression_info, + GetCompressFormatForVersion(kZlibCompression, format_version), + raw.data(), raw.size(), compressed_output); case kBZip2Compression: - if (BZip2_Compress( - compression_ctx, - GetCompressFormatForVersion(kBZip2Compression, format_version), - raw.data(), raw.size(), compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; // fall back to no compression. + return BZip2_Compress( + compression_info, + GetCompressFormatForVersion(kBZip2Compression, format_version), + raw.data(), raw.size(), compressed_output); case kLZ4Compression: - if (LZ4_Compress( - compression_ctx, - GetCompressFormatForVersion(kLZ4Compression, format_version), - raw.data(), raw.size(), compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; // fall back to no compression. 
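GoodCompressionRatio above is the gate that decides whether the compressed output is actually used: the block must shrink by more than one eighth or the caller stores it raw. A tiny worked sketch follows (the predicate matches the diff; the example numbers are ours).

#include <cstddef>

// Keep the compressed form only if it saves more than 1/8 (12.5%) of the
// raw size; otherwise fall back to storing the block uncompressed.
static bool GoodCompressionRatioSketch(size_t compressed_size,
                                       size_t raw_size) {
  return compressed_size < raw_size - (raw_size / 8u);
}

// Example: for a 4096-byte raw block the cutoff is 4096 - 512 = 3584,
// so the compressed block must be 3583 bytes or smaller to be kept.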
+ return LZ4_Compress( + compression_info, + GetCompressFormatForVersion(kLZ4Compression, format_version), + raw.data(), raw.size(), compressed_output); case kLZ4HCCompression: - if (LZ4HC_Compress( - compression_ctx, - GetCompressFormatForVersion(kLZ4HCCompression, format_version), - raw.data(), raw.size(), compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; // fall back to no compression. + return LZ4HC_Compress( + compression_info, + GetCompressFormatForVersion(kLZ4HCCompression, format_version), + raw.data(), raw.size(), compressed_output); case kXpressCompression: - if (XPRESS_Compress(raw.data(), raw.size(), - compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; + return XPRESS_Compress(raw.data(), raw.size(), compressed_output); case kZSTD: case kZSTDNotFinalCompression: - if (ZSTD_Compress(compression_ctx, raw.data(), raw.size(), - compressed_output) && - GoodCompressionRatio(compressed_output->size(), raw.size())) { - return *compressed_output; - } - break; // fall back to no compression. - default: {} // Do not recognize this compression type + return ZSTD_Compress(compression_info, raw.data(), raw.size(), + compressed_output); + default: + // Do not recognize this compression type + return false; + } +} + +} // namespace + +// format_version is the block format as defined in include/rocksdb/table.h +Slice CompressBlock(const Slice& raw, const CompressionInfo& info, + CompressionType* type, uint32_t format_version, + bool do_sample, std::string* compressed_output, + std::string* sampled_output_fast, + std::string* sampled_output_slow) { + *type = info.type(); + + if (info.type() == kNoCompression && !info.SampleForCompression()) { + return raw; } - // Compression method is not supported, or not good compression ratio, so just - // fall back to uncompressed form. + // If requested, we sample one in every N blocks with a + // fast and slow compression algorithm and report the stats. + // The users can use these stats to decide if it is worthwhile + // enabling compression and they also get a hint about which + // compression algorithm will be beneficial. + if (do_sample && info.SampleForCompression() && + Random::GetTLSInstance()->OneIn((int)info.SampleForCompression()) && + sampled_output_fast && sampled_output_slow) { + // Sampling with a fast compression algorithm + if (LZ4_Supported() || Snappy_Supported()) { + CompressionType c = + LZ4_Supported() ? kLZ4Compression : kSnappyCompression; + CompressionContext context(c); + CompressionOptions options; + CompressionInfo info_tmp(options, context, + CompressionDict::GetEmptyDict(), c, + info.SampleForCompression()); + + CompressBlockInternal(raw, info_tmp, format_version, sampled_output_fast); + } + + // Sampling with a slow but high-compression algorithm + if (ZSTD_Supported() || Zlib_Supported()) { + CompressionType c = ZSTD_Supported() ?
kZSTD : kZlibCompression; + CompressionContext context(c); + CompressionOptions options; + CompressionInfo info_tmp(options, context, + CompressionDict::GetEmptyDict(), c, + info.SampleForCompression()); + CompressBlockInternal(raw, info_tmp, format_version, sampled_output_slow); + } + } + + // Actually compress the data + if (*type != kNoCompression) { + if (CompressBlockInternal(raw, info, format_version, compressed_output) && + GoodCompressionRatio(compressed_output->size(), raw.size())) { + return *compressed_output; + } + } + + // Compression method is not supported, or not good + // compression ratio, so just fall back to uncompressed form. *type = kNoCompression; return raw; } @@ -209,14 +234,22 @@ class BlockBasedTableBuilder::BlockBasedTablePropertiesCollector whole_key_filtering_(whole_key_filtering), prefix_filtering_(prefix_filtering) {} - virtual Status InternalAdd(const Slice& /*key*/, const Slice& /*value*/, - uint64_t /*file_size*/) override { + Status InternalAdd(const Slice& /*key*/, const Slice& /*value*/, + uint64_t /*file_size*/) override { // Intentionally left blank. Have no interest in collecting stats for // individual key/value pairs. return Status::OK(); } - virtual Status Finish(UserCollectedProperties* properties) override { + virtual void BlockAdd(uint64_t /* blockRawBytes */, + uint64_t /* blockCompressedBytesFast */, + uint64_t /* blockCompressedBytesSlow */) override { + // Intentionally left blank. No interest in collecting stats for + // blocks. + return; + } + + Status Finish(UserCollectedProperties* properties) override { std::string val; PutFixed32(&val, static_cast(index_type_)); properties->insert({BlockBasedTablePropertyNames::kIndexType, val}); @@ -228,11 +261,11 @@ class BlockBasedTableBuilder::BlockBasedTablePropertiesCollector } // The name of the properties collector can be used for debugging purpose. - virtual const char* Name() const override { + const char* Name() const override { return "BlockBasedTablePropertiesCollector"; } - virtual UserCollectedProperties GetReadableProperties() const override { + UserCollectedProperties GetReadableProperties() const override { // Intentionally left blank. return UserCollectedProperties(); } @@ -253,6 +286,13 @@ struct BlockBasedTableBuilder::Rep { Status status; size_t alignment; BlockBuilder data_block; + // Buffers uncompressed data blocks and keys to replay later. Needed when + // compression dictionary is enabled so we can finalize the dictionary before + // compressing any data blocks. + // TODO(ajkr): ideally we don't buffer all keys and all uncompressed data + // blocks as it's redundant, but it's easier to implement for now. + std::vector>> + data_block_and_keys_buffers; BlockBuilder range_del_block; InternalKeySliceTransform internal_prefix_transform; @@ -260,13 +300,43 @@ struct BlockBasedTableBuilder::Rep { PartitionedIndexBuilder* p_index_builder_ = nullptr; std::string last_key; - // Compression dictionary or nullptr - const std::string* compression_dict; + CompressionType compression_type; + uint64_t sample_for_compression; + CompressionOptions compression_opts; + std::unique_ptr compression_dict; CompressionContext compression_ctx; std::unique_ptr verify_ctx; + std::unique_ptr verify_dict; + + size_t data_begin_offset = 0; + TableProperties props; - bool closed = false; // Either Finish() or Abandon() has been called. + // States of the builder. + // + // - `kBuffered`: This is the initial state where zero or more data blocks are + // accumulated uncompressed in-memory. 
From this state, call + // `EnterUnbuffered()` to finalize the compression dictionary if enabled, + // compress/write out any buffered blocks, and proceed to the `kUnbuffered` + // state. + // + // - `kUnbuffered`: This is the state when compression dictionary is finalized + // either because it wasn't enabled in the first place or it's been created + // from sampling previously buffered data. In this state, blocks are simply + // compressed/written out as they fill up. From this state, call `Finish()` + // to complete the file (write meta-blocks, etc.), or `Abandon()` to delete + // the partially created file. + // + // - `kClosed`: This indicates either `Finish()` or `Abandon()` has been + // called, so the table builder is no longer usable. We must be in this + // state by the time the destructor runs. + enum class State { + kBuffered, + kUnbuffered, + kClosed, + }; + State state; + const bool use_delta_encoding_for_index_values; std::unique_ptr filter_builder; char compressed_cache_key_prefix[BlockBasedTable::kMaxCacheKeyPrefixSize]; @@ -280,6 +350,7 @@ struct BlockBasedTableBuilder::Rep { const std::string& column_family_name; uint64_t creation_time = 0; uint64_t oldest_key_time = 0; + const uint64_t target_file_size; std::vector> table_properties_collectors; @@ -290,10 +361,10 @@ struct BlockBasedTableBuilder::Rep { int_tbl_prop_collector_factories, uint32_t _column_family_id, WritableFileWriter* f, const CompressionType _compression_type, - const CompressionOptions& _compression_opts, - const std::string* _compression_dict, const bool skip_filters, + const uint64_t _sample_for_compression, + const CompressionOptions& _compression_opts, const bool skip_filters, const std::string& _column_family_name, const uint64_t _creation_time, - const uint64_t _oldest_key_time) + const uint64_t _oldest_key_time, const uint64_t _target_file_size) : ioptions(_ioptions), moptions(_moptions), table_options(table_opt), @@ -312,8 +383,14 @@ struct BlockBasedTableBuilder::Rep { table_options.data_block_hash_table_util_ratio), range_del_block(1 /* block_restart_interval */), internal_prefix_transform(_moptions.prefix_extractor.get()), - compression_dict(_compression_dict), - compression_ctx(_compression_type, _compression_opts), + compression_type(_compression_type), + sample_for_compression(_sample_for_compression), + compression_opts(_compression_opts), + compression_dict(), + compression_ctx(_compression_type), + verify_dict(), + state((_compression_opts.max_dict_bytes > 0) ? 
State::kBuffered + : State::kUnbuffered), use_delta_encoding_for_index_values(table_opt.format_version >= 4 && !table_opt.block_align), compressed_cache_key_prefix_size(0), @@ -323,7 +400,8 @@ struct BlockBasedTableBuilder::Rep { column_family_id(_column_family_id), column_family_name(_column_family_name), creation_time(_creation_time), - oldest_key_time(_oldest_key_time) { + oldest_key_time(_oldest_key_time), + target_file_size(_target_file_size) { if (table_options.index_type == BlockBasedTableOptions::kTwoLevelIndexSearch) { p_index_builder_ = PartitionedIndexBuilder::CreateIndexBuilder( @@ -354,7 +432,7 @@ struct BlockBasedTableBuilder::Rep { _moptions.prefix_extractor != nullptr)); if (table_options.verify_compression) { verify_ctx.reset(new UncompressionContext(UncompressionContext::NoCache(), - compression_ctx.type())); + compression_type)); } } @@ -372,10 +450,10 @@ BlockBasedTableBuilder::BlockBasedTableBuilder( int_tbl_prop_collector_factories, uint32_t column_family_id, WritableFileWriter* file, const CompressionType compression_type, - const CompressionOptions& compression_opts, - const std::string* compression_dict, const bool skip_filters, + const uint64_t sample_for_compression, + const CompressionOptions& compression_opts, const bool skip_filters, const std::string& column_family_name, const uint64_t creation_time, - const uint64_t oldest_key_time) { + const uint64_t oldest_key_time, const uint64_t target_file_size) { BlockBasedTableOptions sanitized_table_options(table_options); if (sanitized_table_options.format_version == 0 && sanitized_table_options.checksum != kCRC32c) { @@ -388,11 +466,11 @@ BlockBasedTableBuilder::BlockBasedTableBuilder( sanitized_table_options.format_version = 1; } - rep_ = - new Rep(ioptions, moptions, sanitized_table_options, internal_comparator, - int_tbl_prop_collector_factories, column_family_id, file, - compression_type, compression_opts, compression_dict, - skip_filters, column_family_name, creation_time, oldest_key_time); + rep_ = new Rep( + ioptions, moptions, sanitized_table_options, internal_comparator, + int_tbl_prop_collector_factories, column_family_id, file, + compression_type, sample_for_compression, compression_opts, skip_filters, + column_family_name, creation_time, oldest_key_time, target_file_size); if (rep_->filter_builder != nullptr) { rep_->filter_builder->StartBlock(0); @@ -406,25 +484,33 @@ BlockBasedTableBuilder::BlockBasedTableBuilder( } BlockBasedTableBuilder::~BlockBasedTableBuilder() { - assert(rep_->closed); // Catch errors where caller forgot to call Finish() + // Catch errors where caller forgot to call Finish() + assert(rep_->state == Rep::State::kClosed); delete rep_; } void BlockBasedTableBuilder::Add(const Slice& key, const Slice& value) { Rep* r = rep_; - assert(!r->closed); + assert(rep_->state != Rep::State::kClosed); if (!ok()) return; ValueType value_type = ExtractValueType(key); if (IsValueType(value_type)) { - if (r->props.num_entries > 0) { +#ifndef NDEBUG + if (r->props.num_entries > r->props.num_range_deletions) { assert(r->internal_comparator.Compare(key, Slice(r->last_key)) > 0); } +#endif // NDEBUG auto should_flush = r->flush_block_policy->Update(key, value); if (should_flush) { assert(!r->data_block.empty()); Flush(); + if (r->state == Rep::State::kBuffered && + r->data_begin_offset > r->target_file_size) { + EnterUnbuffered(); + } + // Add item to index block. // We do not emit the index entry for a block until we have seen the // first key for the next data block. 
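The `kBuffered`/`kUnbuffered`/`kClosed` states documented above drive both `Add()` and `Flush()`. Here is a condensed lifecycle sketch under the same state names; the member names and the trigger condition mirror the diff, everything else is simplified.

#include <cstdint>

enum class State { kBuffered, kUnbuffered, kClosed };

struct BuilderLifecycleSketch {
  State state;
  uint64_t data_begin_offset = 0;  // bytes of data blocks buffered so far
  const uint64_t target_file_size;

  // Buffering is only needed when a compression dictionary is requested.
  BuilderLifecycleSketch(bool dict_enabled, uint64_t target)
      : state(dict_enabled ? State::kBuffered : State::kUnbuffered),
        target_file_size(target) {}

  void OnDataBlockFlushed(uint64_t block_bytes) {
    if (state == State::kBuffered) {
      data_begin_offset += block_bytes;
      // Enough material buffered to train the dictionary: switch over.
      if (data_begin_offset > target_file_size) {
        EnterUnbuffered();
      }
    }
  }

  // Train the dictionary and replay buffered blocks (elided), then flip.
  void EnterUnbuffered() { state = State::kUnbuffered; }

  void Finish() {
    if (state == State::kBuffered) EnterUnbuffered();
    state = State::kClosed;  // also the terminal state for Abandon()
  }
};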
This allows us to use shorter @@ -433,52 +519,61 @@ void BlockBasedTableBuilder::Add(const Slice& key, const Slice& value) { // "the r" as the key for the index block entry since it is >= all // entries in the first block and < all entries in subsequent // blocks. - if (ok()) { + if (ok() && r->state == Rep::State::kUnbuffered) { r->index_builder->AddIndexEntry(&r->last_key, &key, r->pending_handle); } } // Note: PartitionedFilterBlockBuilder requires key being added to filter // builder after being added to index builder. - if (r->filter_builder != nullptr) { + if (r->state == Rep::State::kUnbuffered && r->filter_builder != nullptr) { r->filter_builder->Add(ExtractUserKey(key)); } r->last_key.assign(key.data(), key.size()); r->data_block.Add(key, value); - r->props.num_entries++; - r->props.raw_key_size += key.size(); - r->props.raw_value_size += value.size(); - - r->index_builder->OnKeyAdded(key); + if (r->state == Rep::State::kBuffered) { + // Buffer keys to be replayed during `Finish()` once compression + // dictionary has been finalized. + if (r->data_block_and_keys_buffers.empty() || should_flush) { + r->data_block_and_keys_buffers.emplace_back(); + } + r->data_block_and_keys_buffers.back().second.emplace_back(key.ToString()); + } else { + r->index_builder->OnKeyAdded(key); + } NotifyCollectTableCollectorsOnAdd(key, value, r->offset, r->table_properties_collectors, r->ioptions.info_log); } else if (value_type == kTypeRangeDeletion) { r->range_del_block.Add(key, value); - ++r->props.num_range_deletions; - r->props.raw_key_size += key.size(); - r->props.raw_value_size += value.size(); NotifyCollectTableCollectorsOnAdd(key, value, r->offset, r->table_properties_collectors, r->ioptions.info_log); } else { assert(false); } + + r->props.num_entries++; + r->props.raw_key_size += key.size(); + r->props.raw_value_size += value.size(); + if (value_type == kTypeDeletion || value_type == kTypeSingleDeletion) { + r->props.num_deletions++; + } else if (value_type == kTypeRangeDeletion) { + r->props.num_deletions++; + r->props.num_range_deletions++; + } else if (value_type == kTypeMerge) { + r->props.num_merge_operands++; + } } void BlockBasedTableBuilder::Flush() { Rep* r = rep_; - assert(!r->closed); + assert(rep_->state != Rep::State::kClosed); if (!ok()) return; if (r->data_block.empty()) return; WriteBlock(&r->data_block, &r->pending_handle, true /* is_data_block */); - if (r->filter_builder != nullptr) { - r->filter_builder->StartBlock(r->offset); - } - r->props.data_size = r->offset; - ++r->props.num_data_blocks; } void BlockBasedTableBuilder::WriteBlock(BlockBuilder* block, @@ -498,42 +593,64 @@ void BlockBasedTableBuilder::WriteBlock(const Slice& raw_block_contents, assert(ok()); Rep* r = rep_; - auto type = r->compression_ctx.type(); + auto type = r->compression_type; + uint64_t sample_for_compression = r->sample_for_compression; Slice block_contents; bool abort_compression = false; - StopWatchNano timer(r->ioptions.env, - ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics)); + StopWatchNano timer( + r->ioptions.env, + ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics)); + + if (r->state == Rep::State::kBuffered) { + assert(is_data_block); + assert(!r->data_block_and_keys_buffers.empty()); + r->data_block_and_keys_buffers.back().first = raw_block_contents.ToString(); + r->data_begin_offset += r->data_block_and_keys_buffers.back().first.size(); + return; + } if (raw_block_contents.size() < kCompressionSizeLimit) { - Slice compression_dict; - if (is_data_block 
&& r->compression_dict && r->compression_dict->size()) { - r->compression_ctx.dict() = *r->compression_dict; - if (r->table_options.verify_compression) { - assert(r->verify_ctx != nullptr); - r->verify_ctx->dict() = *r->compression_dict; - } + const CompressionDict* compression_dict; + if (!is_data_block || r->compression_dict == nullptr) { + compression_dict = &CompressionDict::GetEmptyDict(); } else { - // Clear dictionary - r->compression_ctx.dict() = Slice(); - if (r->table_options.verify_compression) { - assert(r->verify_ctx != nullptr); - r->verify_ctx->dict() = Slice(); - } + compression_dict = r->compression_dict.get(); } - - block_contents = - CompressBlock(raw_block_contents, r->compression_ctx, &type, - r->table_options.format_version, &r->compressed_output); + assert(compression_dict != nullptr); + CompressionInfo compression_info(r->compression_opts, r->compression_ctx, + *compression_dict, type, + sample_for_compression); + + std::string sampled_output_fast; + std::string sampled_output_slow; + block_contents = CompressBlock( + raw_block_contents, compression_info, &type, + r->table_options.format_version, is_data_block /* do_sample */, + &r->compressed_output, &sampled_output_fast, &sampled_output_slow); + + // notify collectors on block add + NotifyCollectTableCollectorsOnBlockAdd( + r->table_properties_collectors, raw_block_contents.size(), + sampled_output_fast.size(), sampled_output_slow.size()); // Some of the compression algorithms are known to be unreliable. If // the verify_compression flag is set then try to de-compress the // compressed data and compare to the input. if (type != kNoCompression && r->table_options.verify_compression) { // Retrieve the uncompressed contents into a new buffer + const UncompressionDict* verify_dict; + if (!is_data_block || r->verify_dict == nullptr) { + verify_dict = &UncompressionDict::GetEmptyDict(); + } else { + verify_dict = r->verify_dict.get(); + } + assert(verify_dict != nullptr); BlockContents contents; + UncompressionInfo uncompression_info(*r->verify_ctx, *verify_dict, + r->compression_type); Status stat = UncompressBlockContentsForCompressionType( - *r->verify_ctx, block_contents.data(), block_contents.size(), + uncompression_info, block_contents.data(), block_contents.size(), &contents, r->table_options.format_version, r->ioptions); if (stat.ok()) { @@ -565,16 +682,25 @@ void BlockBasedTableBuilder::WriteBlock(const Slice& raw_block_contents, block_contents = raw_block_contents; } else if (type != kNoCompression) { if (ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics)) { - MeasureTime(r->ioptions.statistics, COMPRESSION_TIMES_NANOS, - timer.ElapsedNanos()); + RecordTimeToHistogram(r->ioptions.statistics, COMPRESSION_TIMES_NANOS, + timer.ElapsedNanos()); } - MeasureTime(r->ioptions.statistics, BYTES_COMPRESSED, - raw_block_contents.size()); + RecordInHistogram(r->ioptions.statistics, BYTES_COMPRESSED, + raw_block_contents.size()); RecordTick(r->ioptions.statistics, NUMBER_BLOCK_COMPRESSED); + } else if (type != r->compression_type) { + RecordTick(r->ioptions.statistics, NUMBER_BLOCK_NOT_COMPRESSED); } WriteRawBlock(block_contents, type, handle, is_data_block); r->compressed_output.clear(); + if (is_data_block) { + if (r->filter_builder != nullptr) { + r->filter_builder->StartBlock(r->offset); + } + r->props.data_size = r->offset; + ++r->props.num_data_blocks; + } } void BlockBasedTableBuilder::WriteRawBlock(const Slice& block_contents, @@ -609,9 +735,25 @@ void BlockBasedTableBuilder::WriteRawBlock(const 
Slice& block_contents, EncodeFixed32(trailer_without_type, XXH32_digest(xxh)); break; } + case kxxHash64: { + XXH64_state_t* const state = XXH64_createState(); + XXH64_reset(state, 0); + XXH64_update(state, block_contents.data(), + static_cast(block_contents.size())); + XXH64_update(state, trailer, 1); // Extend to cover block type + EncodeFixed32( + trailer_without_type, + static_cast(XXH64_digest(state) & // lower 32 bits + uint64_t{0xffffffff})); + XXH64_freeState(state); + break; + } } assert(r->status.ok()); + TEST_SYNC_POINT_CALLBACK( + "BlockBasedTableBuilder::WriteRawBlock:TamperWithChecksum", + static_cast(trailer)); r->status = r->file->Append(Slice(trailer, kBlockTrailerSize)); if (r->status.ok()) { r->status = InsertBlockInCache(block_contents, type, handle); @@ -632,13 +774,11 @@ void BlockBasedTableBuilder::WriteRawBlock(const Slice& block_contents, } } -Status BlockBasedTableBuilder::status() const { - return rep_->status; -} +Status BlockBasedTableBuilder::status() const { return rep_->status; } -static void DeleteCachedBlock(const Slice& /*key*/, void* value) { - Block* block = reinterpret_cast(value); - delete block; +static void DeleteCachedBlockContents(const Slice& /*key*/, void* value) { + BlockContents* bc = reinterpret_cast(value); + delete bc; } // @@ -651,28 +791,31 @@ Status BlockBasedTableBuilder::InsertBlockInCache(const Slice& block_contents, Cache* block_cache_compressed = r->table_options.block_cache_compressed.get(); if (type != kNoCompression && block_cache_compressed != nullptr) { - size_t size = block_contents.size(); - std::unique_ptr ubuf(new char[size + 1]); + auto ubuf = + AllocateBlock(size + 1, block_cache_compressed->memory_allocator()); memcpy(ubuf.get(), block_contents.data(), size); ubuf[size] = type; - BlockContents results(std::move(ubuf), size, true, type); - - Block* block = new Block(std::move(results), kDisableGlobalSequenceNumber); + BlockContents* block_contents_to_cache = + new BlockContents(std::move(ubuf), size); +#ifndef NDEBUG + block_contents_to_cache->is_raw_block = true; +#endif // NDEBUG // make cache key by appending the file offset to the cache prefix id char* end = EncodeVarint64( - r->compressed_cache_key_prefix + - r->compressed_cache_key_prefix_size, - handle->offset()); - Slice key(r->compressed_cache_key_prefix, static_cast - (end - r->compressed_cache_key_prefix)); + r->compressed_cache_key_prefix + r->compressed_cache_key_prefix_size, + handle->offset()); + Slice key(r->compressed_cache_key_prefix, + static_cast(end - r->compressed_cache_key_prefix)); // Insert into compressed block cache. - block_cache_compressed->Insert(key, block, block->ApproximateMemoryUsage(), - &DeleteCachedBlock); + block_cache_compressed->Insert( + key, block_contents_to_cache, + block_contents_to_cache->ApproximateMemoryUsage(), + &DeleteCachedBlockContents); // Invalidate OS cache. r->file->InvalidateCache(static_cast(r->offset), size); @@ -780,7 +923,9 @@ void BlockBasedTableBuilder::WritePropertiesBlock( ? rep_->ioptions.merge_operator->Name() : "nullptr"; rep_->props.compression_name = - CompressionTypeToString(rep_->compression_ctx.type()); + CompressionTypeToString(rep_->compression_type); + rep_->props.compression_options = + CompressionOptionsToString(rep_->compression_opts); rep_->props.prefix_extractor_name = rep_->moptions.prefix_extractor != nullptr ? 
rep_->moptions.prefix_extractor->Name() @@ -823,17 +968,36 @@ void BlockBasedTableBuilder::WritePropertiesBlock( &properties_block_handle); } if (ok()) { +#ifndef NDEBUG + { + uint64_t props_block_offset = properties_block_handle.offset(); + uint64_t props_block_size = properties_block_handle.size(); + TEST_SYNC_POINT_CALLBACK( + "BlockBasedTableBuilder::WritePropertiesBlock:GetPropsBlockOffset", + &props_block_offset); + TEST_SYNC_POINT_CALLBACK( + "BlockBasedTableBuilder::WritePropertiesBlock:GetPropsBlockSize", + &props_block_size); + } +#endif // !NDEBUG meta_index_builder->Add(kPropertiesBlock, properties_block_handle); } } void BlockBasedTableBuilder::WriteCompressionDictBlock( MetaIndexBuilder* meta_index_builder) { - if (rep_->compression_dict && rep_->compression_dict->size()) { + if (rep_->compression_dict != nullptr && + rep_->compression_dict->GetRawDict().size()) { BlockHandle compression_dict_block_handle; if (ok()) { - WriteRawBlock(*rep_->compression_dict, kNoCompression, + WriteRawBlock(rep_->compression_dict->GetRawDict(), kNoCompression, &compression_dict_block_handle); +#ifndef NDEBUG + Slice compression_dict = rep_->compression_dict->GetRawDict(); + TEST_SYNC_POINT_CALLBACK( + "BlockBasedTableBuilder::WriteCompressionDictBlock:RawDict", + &compression_dict); +#endif // NDEBUG } if (ok()) { meta_index_builder->Add(kCompressionDictBlock, @@ -852,13 +1016,106 @@ void BlockBasedTableBuilder::WriteRangeDelBlock( } } +void BlockBasedTableBuilder::WriteFooter(BlockHandle& metaindex_block_handle, + BlockHandle& index_block_handle) { + Rep* r = rep_; + // No need to write out new footer if we're using default checksum. + // We're writing legacy magic number because we want old versions of RocksDB + // be able to read files generated with new release (just in case if + // somebody wants to roll back after an upgrade) + // TODO(icanadi) at some point in the future, when we're absolutely sure + // nobody will roll back to RocksDB 2.x versions, retire the legacy magic + // number and always write new table files with new magic number + bool legacy = (r->table_options.format_version == 0); + // this is guaranteed by BlockBasedTableBuilder's constructor + assert(r->table_options.checksum == kCRC32c || + r->table_options.format_version != 0); + Footer footer( + legacy ? kLegacyBlockBasedTableMagicNumber : kBlockBasedTableMagicNumber, + r->table_options.format_version); + footer.set_metaindex_handle(metaindex_block_handle); + footer.set_index_handle(index_block_handle); + footer.set_checksum(r->table_options.checksum); + std::string footer_encoding; + footer.EncodeTo(&footer_encoding); + assert(r->status.ok()); + r->status = r->file->Append(footer_encoding); + if (r->status.ok()) { + r->offset += footer_encoding.size(); + } +} + +void BlockBasedTableBuilder::EnterUnbuffered() { + Rep* r = rep_; + assert(r->state == Rep::State::kBuffered); + r->state = Rep::State::kUnbuffered; + const size_t kSampleBytes = r->compression_opts.zstd_max_train_bytes > 0 + ? 
r->compression_opts.zstd_max_train_bytes + : r->compression_opts.max_dict_bytes; + Random64 generator{r->creation_time}; + std::string compression_dict_samples; + std::vector compression_dict_sample_lens; + if (!r->data_block_and_keys_buffers.empty()) { + while (compression_dict_samples.size() < kSampleBytes) { + size_t rand_idx = + generator.Uniform(r->data_block_and_keys_buffers.size()); + size_t copy_len = + std::min(kSampleBytes - compression_dict_samples.size(), + r->data_block_and_keys_buffers[rand_idx].first.size()); + compression_dict_samples.append( + r->data_block_and_keys_buffers[rand_idx].first, 0, copy_len); + compression_dict_sample_lens.emplace_back(copy_len); + } + } + + // final data block flushed, now we can generate dictionary from the samples. + // OK if compression_dict_samples is empty, we'll just get empty dictionary. + std::string dict; + if (r->compression_opts.zstd_max_train_bytes > 0) { + dict = ZSTD_TrainDictionary(compression_dict_samples, + compression_dict_sample_lens, + r->compression_opts.max_dict_bytes); + } else { + dict = std::move(compression_dict_samples); + } + r->compression_dict.reset(new CompressionDict(dict, r->compression_type, + r->compression_opts.level)); + r->verify_dict.reset(new UncompressionDict( + dict, r->compression_type == kZSTD || + r->compression_type == kZSTDNotFinalCompression)); + + for (size_t i = 0; ok() && i < r->data_block_and_keys_buffers.size(); ++i) { + const auto& data_block = r->data_block_and_keys_buffers[i].first; + auto& keys = r->data_block_and_keys_buffers[i].second; + assert(!data_block.empty()); + assert(!keys.empty()); + + for (const auto& key : keys) { + if (r->filter_builder != nullptr) { + r->filter_builder->Add(ExtractUserKey(key)); + } + r->index_builder->OnKeyAdded(key); + } + WriteBlock(Slice(data_block), &r->pending_handle, true /* is_data_block */); + if (ok() && i + 1 < r->data_block_and_keys_buffers.size()) { + Slice first_key_in_next_block = + r->data_block_and_keys_buffers[i + 1].second.front(); + Slice* first_key_in_next_block_ptr = &first_key_in_next_block; + r->index_builder->AddIndexEntry(&keys.back(), first_key_in_next_block_ptr, + r->pending_handle); + } + } + r->data_block_and_keys_buffers.clear(); +} + Status BlockBasedTableBuilder::Finish() { Rep* r = rep_; + assert(r->state != Rep::State::kClosed); bool empty_data_block = r->data_block.empty(); Flush(); - assert(!r->closed); - r->closed = true; - + if (r->state == Rep::State::kBuffered) { + EnterUnbuffered(); + } // To make sure properties block is able to keep the accurate size of index // block, we will finish writing all index entries first. if (ok() && !empty_data_block) { @@ -866,13 +1123,14 @@ Status BlockBasedTableBuilder::Finish() { &r->last_key, nullptr /* no next data block */, r->pending_handle); } - // Write meta blocks and metaindex block with the following order. + // Write meta blocks, metaindex block and footer in the following order. // 1. [meta block: filter] // 2. [meta block: index] // 3. [meta block: compression dictionary] // 4. [meta block: range deletion tombstone] // 5. [meta block: properties] // 6. [metaindex block] + // 7. Footer BlockHandle metaindex_block_handle, index_block_handle; MetaIndexBuilder meta_index_builder; WriteFilterBlock(&meta_index_builder); @@ -885,51 +1143,23 @@ Status BlockBasedTableBuilder::Finish() { WriteRawBlock(meta_index_builder.Finish(), kNoCompression, &metaindex_block_handle); } - - // Write footer if (ok()) { - // No need to write out new footer if we're using default checksum. 
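The sampling loop in `EnterUnbuffered()` above reads more easily in isolation. Below is a compact sketch of the same idea, with std::mt19937_64 standing in for RocksDB's Random64 and a stubbed TrainDict standing in for ZSTD_TrainDictionary (both substitutions are ours).

#include <algorithm>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Placeholder for ZSTD_TrainDictionary (ZDICT_trainFromBuffer in zstd);
// a real trainer distills a dictionary from the concatenated samples.
static std::string TrainDict(const std::string& samples,
                             const std::vector<size_t>& /*sample_lens*/) {
  return samples.substr(0, std::min<size_t>(samples.size(), 16));
}

// Draw prefixes from randomly chosen buffered blocks until sample_bytes of
// material is collected, then either train a dictionary from the samples
// or, when training is disabled, use the raw samples directly.
std::string BuildCompressionDict(const std::vector<std::string>& blocks,
                                 size_t sample_bytes, bool train,
                                 uint64_t seed) {
  std::string samples;
  std::vector<size_t> sample_lens;
  std::mt19937_64 rng(seed);
  while (!blocks.empty() && samples.size() < sample_bytes) {
    const std::string& b = blocks[rng() % blocks.size()];
    size_t copy_len = std::min(sample_bytes - samples.size(), b.size());
    if (copy_len == 0) break;  // drew an empty block; keep the sketch simple
    samples.append(b, 0, copy_len);
    sample_lens.push_back(copy_len);
  }
  return train ? TrainDict(samples, sample_lens) : samples;
}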
- // We're writing legacy magic number because we want old versions of RocksDB - // be able to read files generated with new release (just in case if - // somebody wants to roll back after an upgrade) - // TODO(icanadi) at some point in the future, when we're absolutely sure - // nobody will roll back to RocksDB 2.x versions, retire the legacy magic - // number and always write new table files with new magic number - bool legacy = (r->table_options.format_version == 0); - // this is guaranteed by BlockBasedTableBuilder's constructor - assert(r->table_options.checksum == kCRC32c || - r->table_options.format_version != 0); - Footer footer(legacy ? kLegacyBlockBasedTableMagicNumber - : kBlockBasedTableMagicNumber, - r->table_options.format_version); - footer.set_metaindex_handle(metaindex_block_handle); - footer.set_index_handle(index_block_handle); - footer.set_checksum(r->table_options.checksum); - std::string footer_encoding; - footer.EncodeTo(&footer_encoding); - assert(r->status.ok()); - r->status = r->file->Append(footer_encoding); - if (r->status.ok()) { - r->offset += footer_encoding.size(); - } + WriteFooter(metaindex_block_handle, index_block_handle); } - + r->state = Rep::State::kClosed; return r->status; } void BlockBasedTableBuilder::Abandon() { - Rep* r = rep_; - assert(!r->closed); - r->closed = true; + assert(rep_->state != Rep::State::kClosed); + rep_->state = Rep::State::kClosed; } uint64_t BlockBasedTableBuilder::NumEntries() const { return rep_->props.num_entries; } -uint64_t BlockBasedTableBuilder::FileSize() const { - return rep_->offset; -} +uint64_t BlockBasedTableBuilder::FileSize() const { return rep_->offset; } bool BlockBasedTableBuilder::NeedCompact() const { for (const auto& collector : rep_->table_properties_collectors) { diff --git a/ceph/src/rocksdb/table/block_based_table_builder.h b/ceph/src/rocksdb/table/block_based_table_builder.h index eba0bb7c1..b10494e7b 100644 --- a/ceph/src/rocksdb/table/block_based_table_builder.h +++ b/ceph/src/rocksdb/table/block_based_table_builder.h @@ -37,8 +37,6 @@ class BlockBasedTableBuilder : public TableBuilder { // Create a builder that will store the contents of the table it is // building in *file. Does not close the file. It is up to the // caller to close the file after calling Finish(). - // @param compression_dict Data for presetting the compression library's - // dictionary, or nullptr. BlockBasedTableBuilder( const ImmutableCFOptions& ioptions, const MutableCFOptions& moptions, const BlockBasedTableOptions& table_options, @@ -47,10 +45,10 @@ class BlockBasedTableBuilder : public TableBuilder { int_tbl_prop_collector_factories, uint32_t column_family_id, WritableFileWriter* file, const CompressionType compression_type, - const CompressionOptions& compression_opts, - const std::string* compression_dict, const bool skip_filters, + const uint64_t sample_for_compression, + const CompressionOptions& compression_opts, const bool skip_filters, const std::string& column_family_name, const uint64_t creation_time = 0, - const uint64_t oldest_key_time = 0); + const uint64_t oldest_key_time = 0, const uint64_t target_file_size = 0); // REQUIRES: Either Finish() or Abandon() has been called. ~BlockBasedTableBuilder(); @@ -94,6 +92,11 @@ class BlockBasedTableBuilder : public TableBuilder { private: bool ok() const { return status().ok(); } + // Transition state from buffered to unbuffered. See `Rep::State` API comment + // for details of the states. 
+ // REQUIRES: `rep_->state == kBuffered` + void EnterUnbuffered(); + // Call block's Finish() method // and then write the compressed block contents to file. void WriteBlock(BlockBuilder* block, BlockHandle* handle, bool is_data_block); @@ -114,6 +117,8 @@ class BlockBasedTableBuilder : public TableBuilder { void WritePropertiesBlock(MetaIndexBuilder* meta_index_builder); void WriteCompressionDictBlock(MetaIndexBuilder* meta_index_builder); void WriteRangeDelBlock(MetaIndexBuilder* meta_index_builder); + void WriteFooter(BlockHandle& metaindex_block_handle, + BlockHandle& index_block_handle); struct Rep; class BlockBasedTablePropertiesCollectorFactory; @@ -131,8 +136,10 @@ class BlockBasedTableBuilder : public TableBuilder { const uint64_t kCompressionSizeLimit = std::numeric_limits::max(); }; -Slice CompressBlock(const Slice& raw, const CompressionContext& compression_ctx, +Slice CompressBlock(const Slice& raw, const CompressionInfo& info, CompressionType* type, uint32_t format_version, - std::string* compressed_output); + bool do_sample, std::string* compressed_output, + std::string* sampled_output_fast, + std::string* sampled_output_slow); } // namespace rocksdb diff --git a/ceph/src/rocksdb/table/block_based_table_factory.cc b/ceph/src/rocksdb/table/block_based_table_factory.cc index 485aed870..cda8d1e27 100644 --- a/ceph/src/rocksdb/table/block_based_table_factory.cc +++ b/ceph/src/rocksdb/table/block_based_table_factory.cc @@ -194,8 +194,8 @@ BlockBasedTableFactory::BlockBasedTableFactory( Status BlockBasedTableFactory::NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table_reader, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table_reader, bool prefetch_index_and_filter_in_cache) const { return BlockBasedTable::Open( table_reader_options.ioptions, table_reader_options.env_options, @@ -214,12 +214,13 @@ TableBuilder* BlockBasedTableFactory::NewTableBuilder( table_options_, table_builder_options.internal_comparator, table_builder_options.int_tbl_prop_collector_factories, column_family_id, file, table_builder_options.compression_type, + table_builder_options.sample_for_compression, table_builder_options.compression_opts, - table_builder_options.compression_dict, table_builder_options.skip_filters, table_builder_options.column_family_name, table_builder_options.creation_time, - table_builder_options.oldest_key_time); + table_builder_options.oldest_key_time, + table_builder_options.target_file_size); return table_builder; } @@ -296,6 +297,12 @@ std::string BlockBasedTableFactory::GetPrintableTableOptions() const { snprintf(buffer, kBufferSize, " index_type: %d\n", table_options_.index_type); ret.append(buffer); + snprintf(buffer, kBufferSize, " data_block_index_type: %d\n", + table_options_.data_block_index_type); + ret.append(buffer); + snprintf(buffer, kBufferSize, " data_block_hash_table_util_ratio: %lf\n", + table_options_.data_block_hash_table_util_ratio); + ret.append(buffer); snprintf(buffer, kBufferSize, " hash_index_allow_collision: %d\n", table_options_.hash_index_allow_collision); ret.append(buffer); diff --git a/ceph/src/rocksdb/table/block_based_table_factory.h b/ceph/src/rocksdb/table/block_based_table_factory.h index b30bd6232..100bb0bc4 100644 --- a/ceph/src/rocksdb/table/block_based_table_factory.h +++ b/ceph/src/rocksdb/table/block_based_table_factory.h @@ -23,7 +23,6 @@ namespace rocksdb { struct EnvOptions; -using std::unique_ptr; class BlockBasedTableBuilder; // A class used to 
track actual bytes written from the tail in the recent SST @@ -53,8 +52,8 @@ class BlockBasedTableFactory : public TableFactory { Status NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table_reader, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table_reader, bool prefetch_index_and_filter_in_cache = true) const override; TableBuilder* NewTableBuilder( diff --git a/ceph/src/rocksdb/table/block_based_table_reader.cc b/ceph/src/rocksdb/table/block_based_table_reader.cc index 9f2e02d68..dc2d4263e 100644 --- a/ceph/src/rocksdb/table/block_based_table_reader.cc +++ b/ceph/src/rocksdb/table/block_based_table_reader.cc @@ -46,17 +46,18 @@ #include "monitoring/perf_context_imp.h" #include "util/coding.h" +#include "util/crc32c.h" #include "util/file_reader_writer.h" #include "util/stop_watch.h" #include "util/string_util.h" #include "util/sync_point.h" +#include "util/xxhash.h" namespace rocksdb { extern const uint64_t kBlockBasedTableMagicNumber; extern const std::string kHashIndexPrefixesBlock; extern const std::string kHashIndexPrefixesMetadataBlock; -using std::unique_ptr; typedef BlockBasedTable::IndexReader IndexReader; @@ -72,19 +73,21 @@ namespace { // The only relevant option is options.verify_checksums for now. // On failure return non-OK. // On success fill *result and return OK - caller owns *result -// @param compression_dict Data for presetting the compression library's +// @param uncompression_dict Data for presetting the compression library's // dictionary. Status ReadBlockFromFile( RandomAccessFileReader* file, FilePrefetchBuffer* prefetch_buffer, const Footer& footer, const ReadOptions& options, const BlockHandle& handle, std::unique_ptr* result, const ImmutableCFOptions& ioptions, - bool do_uncompress, const Slice& compression_dict, + bool do_uncompress, bool maybe_compressed, + const UncompressionDict& uncompression_dict, const PersistentCacheOptions& cache_options, SequenceNumber global_seqno, - size_t read_amp_bytes_per_bit, const bool immortal_file = false) { + size_t read_amp_bytes_per_bit, MemoryAllocator* memory_allocator) { BlockContents contents; BlockFetcher block_fetcher(file, prefetch_buffer, footer, options, handle, &contents, ioptions, do_uncompress, - compression_dict, cache_options, immortal_file); + maybe_compressed, uncompression_dict, + cache_options, memory_allocator); Status s = block_fetcher.ReadBlockContents(); if (s.ok()) { result->reset(new Block(std::move(contents), global_seqno, @@ -94,6 +97,20 @@ Status ReadBlockFromFile( return s; } +inline MemoryAllocator* GetMemoryAllocator( + const BlockBasedTableOptions& table_options) { + return table_options.block_cache.get() + ? table_options.block_cache->memory_allocator() + : nullptr; +} + +inline MemoryAllocator* GetMemoryAllocatorForCompressedBlock( + const BlockBasedTableOptions& table_options) { + return table_options.block_cache_compressed.get() + ? table_options.block_cache_compressed->memory_allocator() + : nullptr; +} + // Delete the resource that is held by the iterator. template void DeleteHeldResource(void* arg, void* /*ignored*/) { @@ -109,6 +126,7 @@ void DeleteCachedEntry(const Slice& /*key*/, void* value) { void DeleteCachedFilterEntry(const Slice& key, void* value); void DeleteCachedIndexEntry(const Slice& key, void* value); +void DeleteCachedUncompressionDictEntry(const Slice& key, void* value); // Release the cached entry and decrement its ref count. 
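The deleters declared above all follow one pattern: the block cache stores type-erased `void*` values, so each entry type registers a deleter that casts back to the concrete type before deleting. The following self-contained sketch shows the pattern; SimpleCache is a stand-in with the same key/value/deleter shape as Cache::Insert, not the real API.

#include <string>
#include <unordered_map>

struct SimpleCache {
  using Deleter = void (*)(const std::string& key, void* value);
  struct Entry {
    void* value;
    Deleter deleter;  // knows the value's real type
  };
  std::unordered_map<std::string, Entry> map;

  void Insert(const std::string& key, void* value, Deleter deleter) {
    Erase(key);
    map[key] = Entry{value, deleter};
  }
  void Erase(const std::string& key) {
    auto it = map.find(key);
    if (it == map.end()) return;
    it->second.deleter(it->first, it->second.value);  // typed destruction
    map.erase(it);
  }
  ~SimpleCache() {
    for (auto& kv : map) kv.second.deleter(kv.first, kv.second.value);
  }
};

// Mirrors DeleteCachedEntry<TValue>: cast the erased pointer back.
template <class TValue>
void DeleteCachedEntrySketch(const std::string& /*key*/, void* value) {
  delete reinterpret_cast<TValue*>(value);
}

// Usage: cache.Insert("key", new std::string("payload"),
//                     &DeleteCachedEntrySketch<std::string>);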
void ReleaseCachedEntry(void* arg, void* h) { @@ -136,7 +154,7 @@ Slice GetCacheKeyFromOffset(const char* cache_key_prefix, } Cache::Handle* GetEntryFromCache(Cache* block_cache, const Slice& key, - Tickers block_cache_miss_ticker, + int level, Tickers block_cache_miss_ticker, Tickers block_cache_hit_ticker, uint64_t* block_cache_miss_stats, uint64_t* block_cache_hit_stats, @@ -145,6 +163,8 @@ Cache::Handle* GetEntryFromCache(Cache* block_cache, const Slice& key, auto cache_handle = block_cache->Lookup(key, statistics); if (cache_handle != nullptr) { PERF_COUNTER_ADD(block_cache_hit_count, 1); + PERF_COUNTER_BY_LEVEL_ADD(block_cache_hit_count, 1, + static_cast(level)); if (get_context != nullptr) { // overall cache hit get_context->get_context_stats_.num_cache_hit++; @@ -162,6 +182,8 @@ Cache::Handle* GetEntryFromCache(Cache* block_cache, const Slice& key, RecordTick(statistics, block_cache_hit_ticker); } } else { + PERF_COUNTER_BY_LEVEL_ADD(block_cache_miss_count, 1, + static_cast(level)); if (get_context != nullptr) { // overall cache miss get_context->get_context_stats_.num_cache_miss++; @@ -215,13 +237,15 @@ class PartitionIndexReader : public IndexReader, public Cleanable { IndexReader** index_reader, const PersistentCacheOptions& cache_options, const int level, const bool index_key_includes_seq, - const bool index_value_is_full) { + const bool index_value_is_full, + MemoryAllocator* memory_allocator) { std::unique_ptr index_block; auto s = ReadBlockFromFile( file, prefetch_buffer, footer, ReadOptions(), index_handle, &index_block, ioptions, true /* decompress */, - Slice() /*compression dict*/, cache_options, - kDisableGlobalSequenceNumber, 0 /* read_amp_bytes_per_bit */); + true /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options, kDisableGlobalSequenceNumber, + 0 /* read_amp_bytes_per_bit */, memory_allocator); if (s.ok()) { *index_reader = new PartitionIndexReader( @@ -233,12 +257,14 @@ class PartitionIndexReader : public IndexReader, public Cleanable { } // return a two-level iterator: first level is on the partition index - virtual InternalIteratorBase* NewIterator( + InternalIteratorBase* NewIterator( IndexBlockIter* /*iter*/ = nullptr, bool /*dont_care*/ = true, bool fill_cache = true) override { Statistics* kNullStats = nullptr; // Filters are already checked before seeking the index if (!partition_map_.empty()) { + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`. return NewTwoLevelIterator( new BlockBasedTable::PartitionedIndexIteratorState( table_, &partition_map_, index_key_includes_seq_, @@ -250,6 +276,8 @@ class PartitionIndexReader : public IndexReader, public Cleanable { auto ro = ReadOptions(); ro.fill_cache = fill_cache; bool kIsIndex = true; + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`. return new BlockBasedTableIterator( table_, ro, *icomparator_, index_block_->NewIterator( @@ -264,12 +292,14 @@ class PartitionIndexReader : public IndexReader, public Cleanable { // in its destructor. } - virtual void CacheDependencies(bool pin) override { + void CacheDependencies(bool pin) override { // Before read partitions, prefetch them to avoid lots of IOs auto rep = table_->rep_; IndexBlockIter biter; BlockHandle handle; Statistics* kNullStats = nullptr; + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`.
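The `level` parameter threaded into GetEntryFromCache above exists purely for accounting: one lookup now feeds the global tickers, the per-query GetContext stats, and per-LSM-level perf counters. A reduced sketch of the per-level bookkeeping follows (the counter layout is an illustrative stand-in, not RocksDB's PerfContext).

#include <array>
#include <cstdint>

constexpr int kNumLevelsTracked = 7;  // assumption for the sketch

struct CacheLookupStatsSketch {
  uint64_t global_hits = 0;
  uint64_t global_misses = 0;
  std::array<uint64_t, kNumLevelsTracked> hits_by_level{};
  std::array<uint64_t, kNumLevelsTracked> misses_by_level{};

  // Called once per block-cache lookup with the LSM level of the file.
  void Record(bool hit, int level) {
    (hit ? global_hits : global_misses)++;
    if (level >= 0 && level < kNumLevelsTracked) {
      (hit ? hits_by_level : misses_by_level)[level]++;
    }
  }
};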
index_block_->NewIterator( icomparator_, icomparator_->user_comparator(), &biter, kNullStats, true, index_key_includes_seq_, index_value_is_full_); @@ -305,16 +335,13 @@ class PartitionIndexReader : public IndexReader, public Cleanable { for (; biter.Valid(); biter.Next()) { handle = biter.value(); BlockBasedTable::CachableEntry block; - Slice compression_dict; - if (rep->compression_dict_block) { - compression_dict = rep->compression_dict_block->data; - } const bool is_index = true; // TODO: Support counter batch update for partitioned index and // filter blocks - s = table_->MaybeLoadDataBlockToCache( - prefetch_buffer.get(), rep, ro, handle, compression_dict, &block, - is_index, nullptr /* get_context */); + s = table_->MaybeReadBlockAndLoadToCache( + prefetch_buffer.get(), rep, ro, handle, + UncompressionDict::GetEmptyDict(), &block, is_index, + nullptr /* get_context */); assert(s.ok() || block.value == nullptr); if (s.ok() && block.value != nullptr) { @@ -333,12 +360,10 @@ class PartitionIndexReader : public IndexReader, public Cleanable { } } - virtual size_t size() const override { return index_block_->size(); } - virtual size_t usable_size() const override { - return index_block_->usable_size(); - } + size_t size() const override { return index_block_->size(); } + size_t usable_size() const override { return index_block_->usable_size(); } - virtual size_t ApproximateMemoryUsage() const override { + size_t ApproximateMemoryUsage() const override { assert(index_block_); size_t usage = index_block_->ApproximateMemoryUsage(); #ifdef ROCKSDB_MALLOC_USABLE_SIZE @@ -388,13 +413,15 @@ class BinarySearchIndexReader : public IndexReader { IndexReader** index_reader, const PersistentCacheOptions& cache_options, const bool index_key_includes_seq, - const bool index_value_is_full) { + const bool index_value_is_full, + MemoryAllocator* memory_allocator) { std::unique_ptr index_block; auto s = ReadBlockFromFile( file, prefetch_buffer, footer, ReadOptions(), index_handle, &index_block, ioptions, true /* decompress */, - Slice() /*compression dict*/, cache_options, - kDisableGlobalSequenceNumber, 0 /* read_amp_bytes_per_bit */); + true /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options, kDisableGlobalSequenceNumber, + 0 /* read_amp_bytes_per_bit */, memory_allocator); if (s.ok()) { *index_reader = new BinarySearchIndexReader( @@ -405,21 +432,21 @@ class BinarySearchIndexReader : public IndexReader { return s; } - virtual InternalIteratorBase* NewIterator( + InternalIteratorBase* NewIterator( IndexBlockIter* iter = nullptr, bool /*dont_care*/ = true, bool /*dont_care*/ = true) override { Statistics* kNullStats = nullptr; + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`.
return index_block_->NewIterator( icomparator_, icomparator_->user_comparator(), iter, kNullStats, true, index_key_includes_seq_, index_value_is_full_); } - virtual size_t size() const override { return index_block_->size(); } - virtual size_t usable_size() const override { - return index_block_->usable_size(); - } + size_t size() const override { return index_block_->size(); } + size_t usable_size() const override { return index_block_->usable_size(); } - virtual size_t ApproximateMemoryUsage() const override { + size_t ApproximateMemoryUsage() const override { assert(index_block_); size_t usage = index_block_->ApproximateMemoryUsage(); #ifdef ROCKSDB_MALLOC_USABLE_SIZE @@ -458,13 +485,15 @@ class HashIndexReader : public IndexReader { InternalIterator* meta_index_iter, IndexReader** index_reader, bool /*hash_index_allow_collision*/, const PersistentCacheOptions& cache_options, - const bool index_key_includes_seq, const bool index_value_is_full) { + const bool index_key_includes_seq, const bool index_value_is_full, + MemoryAllocator* memory_allocator) { std::unique_ptr index_block; auto s = ReadBlockFromFile( file, prefetch_buffer, footer, ReadOptions(), index_handle, &index_block, ioptions, true /* decompress */, - Slice() /*compression dict*/, cache_options, - kDisableGlobalSequenceNumber, 0 /* read_amp_bytes_per_bit */); + true /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options, kDisableGlobalSequenceNumber, + 0 /* read_amp_bytes_per_bit */, memory_allocator); if (!s.ok()) { return s; @@ -497,13 +526,13 @@ class HashIndexReader : public IndexReader { return Status::OK(); } - Slice dummy_comp_dict; // Read contents for the blocks BlockContents prefixes_contents; BlockFetcher prefixes_block_fetcher( file, prefetch_buffer, footer, ReadOptions(), prefixes_handle, - &prefixes_contents, ioptions, true /* decompress */, - dummy_comp_dict /*compression dict*/, cache_options); + &prefixes_contents, ioptions, true /*decompress*/, + true /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options, memory_allocator); s = prefixes_block_fetcher.ReadBlockContents(); if (!s.ok()) { return s; @@ -511,8 +540,9 @@ class HashIndexReader : public IndexReader { BlockContents prefixes_meta_contents; BlockFetcher prefixes_meta_block_fetcher( file, prefetch_buffer, footer, ReadOptions(), prefixes_meta_handle, - &prefixes_meta_contents, ioptions, true /* decompress */, - dummy_comp_dict /*compression dict*/, cache_options); + &prefixes_meta_contents, ioptions, true /*decompress*/, + true /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options, memory_allocator); s = prefixes_meta_block_fetcher.ReadBlockContents(); if (!s.ok()) { // TODO: log error @@ -530,22 +560,22 @@ class HashIndexReader : public IndexReader { return Status::OK(); } - virtual InternalIteratorBase* NewIterator( + InternalIteratorBase* NewIterator( IndexBlockIter* iter = nullptr, bool total_order_seek = true, bool /*dont_care*/ = true) override { Statistics* kNullStats = nullptr; + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`.
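HashIndexReader built above layers a prefix hash index (BlockPrefixIndex, fed by the two `prefixes` meta blocks) over the ordinary binary-searchable index block, so a point lookup can jump straight to a candidate restart interval. The sketch below is conceptual only; an unordered_map stands in for the real compact open-addressed structure, and collision handling is elided.

#include <cstdint>
#include <string>
#include <unordered_map>

struct PrefixIndexSketch {
  // prefix -> restart interval where keys with that prefix may live
  std::unordered_map<std::string, uint32_t> interval_of_prefix;

  // Returns true and sets *index on a hit; a miss lets the caller fall
  // back to (or skip) the binary search over restart points.
  bool Seek(const std::string& prefix, uint32_t* index) const {
    auto it = interval_of_prefix.find(prefix);
    if (it == interval_of_prefix.end()) return false;
    *index = it->second;
    return true;
  }
};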
return index_block_->NewIterator( icomparator_, icomparator_->user_comparator(), iter, kNullStats, total_order_seek, index_key_includes_seq_, index_value_is_full_, - prefix_index_.get()); + false /* block_contents_pinned */, prefix_index_.get()); } - virtual size_t size() const override { return index_block_->size(); } - virtual size_t usable_size() const override { - return index_block_->usable_size(); - } + size_t size() const override { return index_block_->size(); } + size_t usable_size() const override { return index_block_->usable_size(); } - virtual size_t ApproximateMemoryUsage() const override { + size_t ApproximateMemoryUsage() const override { assert(index_block_); size_t usage = index_block_->ApproximateMemoryUsage(); usage += prefixes_contents_.usable_size(); @@ -572,8 +602,7 @@ class HashIndexReader : public IndexReader { assert(index_block_ != nullptr); } - ~HashIndexReader() { - } + ~HashIndexReader() override {} std::unique_ptr index_block_; std::unique_ptr prefix_index_; @@ -606,9 +635,8 @@ void BlockBasedTable::SetupCacheKeyPrefix(Rep* rep, uint64_t file_size) { } } -void BlockBasedTable::GenerateCachePrefix(Cache* cc, - RandomAccessFile* file, char* buffer, size_t* size) { - +void BlockBasedTable::GenerateCachePrefix(Cache* cc, RandomAccessFile* file, + char* buffer, size_t* size) { // generate an id from the file *size = file->GetUniqueId(buffer, kMaxCacheKeyPrefixSize); @@ -620,9 +648,8 @@ void BlockBasedTable::GenerateCachePrefix(Cache* cc, -void BlockBasedTable::GenerateCachePrefix(Cache* cc, - WritableFile* file, char* buffer, size_t* size) { - +void BlockBasedTable::GenerateCachePrefix(Cache* cc, WritableFile* file, + char* buffer, size_t* size) { // generate an id from the file *size = file->GetUniqueId(buffer, kMaxCacheKeyPrefixSize); @@ -696,17 +723,24 @@ Status GetGlobalSequenceNumber(const TableProperties& table_properties, if (seqno_pos != props.end()) { global_seqno = DecodeFixed64(seqno_pos->second.c_str()); } - if (global_seqno != 0 && global_seqno != largest_seqno) { - std::array msg_buf; - snprintf(msg_buf.data(), msg_buf.max_size(), - "An external sst file with version %u have global seqno property " - "with value %s, while largest seqno in the file is %llu", - version, seqno_pos->second.c_str(), - static_cast(largest_seqno)); - return Status::Corruption(msg_buf.data()); + // SstTableReader opens the table reader with kMaxSequenceNumber as largest_seqno + // to denote it is unknown.
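The hunk just below rewrites the validation this comment describes: a known largest_seqno fills in an unset global seqno property, and any disagreement is corruption. A compact restatement of the rule in plain logic (no RocksDB types; the 56-bit max matches SequenceNumber's width):

#include <cstdint>

constexpr uint64_t kMaxSeq = (1ull << 56) - 1;  // stand-in for kMaxSequenceNumber

// Returns false on the mismatch case the real code reports as
// Status::Corruption; otherwise *out_seqno receives the effective seqno.
bool ReconcileGlobalSeqno(uint64_t stored_global_seqno, uint64_t largest_seqno,
                          uint64_t* out_seqno) {
  if (largest_seqno < kMaxSeq) {  // caller actually knows the largest seqno
    if (stored_global_seqno == 0) {
      stored_global_seqno = largest_seqno;  // property unset: inherit it
    }
    if (stored_global_seqno != largest_seqno) {
      return false;  // stored property contradicts the file's contents
    }
  }
  *out_seqno = stored_global_seqno;
  return true;
}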
+ if (largest_seqno < kMaxSequenceNumber) { + if (global_seqno == 0) { + global_seqno = largest_seqno; + } + if (global_seqno != largest_seqno) { + std::array msg_buf; + snprintf( + msg_buf.data(), msg_buf.max_size(), + "An external sst file with version %u have global seqno property " + "with value %s, while largest seqno in the file is %llu", + version, seqno_pos->second.c_str(), + static_cast(largest_seqno)); + return Status::Corruption(msg_buf.data()); + } } - global_seqno = largest_seqno; - *seqno = largest_seqno; + *seqno = global_seqno; if (global_seqno > kMaxSequenceNumber) { std::array msg_buf; @@ -737,9 +771,9 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, const EnvOptions& env_options, const BlockBasedTableOptions& table_options, const InternalKeyComparator& internal_comparator, - unique_ptr&& file, + std::unique_ptr&& file, uint64_t file_size, - unique_ptr* table_reader, + std::unique_ptr* table_reader, const SliceTransform* prefix_extractor, const bool prefetch_index_and_filter_in_cache, const bool skip_filters, const int level, @@ -748,49 +782,25 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, TailPrefetchStats* tail_prefetch_stats) { table_reader->reset(); + Status s; Footer footer; - std::unique_ptr prefetch_buffer; // prefetch both index and filters, down to all partitions const bool prefetch_all = prefetch_index_and_filter_in_cache || level == 0; const bool preload_all = !table_options.cache_index_and_filter_blocks; - size_t tail_prefetch_size = 0; - if (tail_prefetch_stats != nullptr) { - // Multiple threads may get a 0 (no history) when running in parallel, - // but it will get cleared after the first of them finishes. - tail_prefetch_size = tail_prefetch_stats->GetSuggestedPrefetchSize(); - } - if (tail_prefetch_size == 0) { - // Before read footer, readahead backwards to prefetch data. Do more - // readahead if we're going to read index/filter. - // TODO: This may incorrectly select small readahead in case partitioned - // index/filter is enabled and top-level partition pinning is enabled. - // That's because we need to issue readahead before we read the properties, - // at which point we don't yet know the index type. - tail_prefetch_size = prefetch_all || preload_all ? 512 * 1024 : 4 * 1024; - } - size_t prefetch_off; - size_t prefetch_len; - if (file_size < tail_prefetch_size) { - prefetch_off = 0; - prefetch_len = static_cast(file_size); - } else { - prefetch_off = static_cast(file_size - tail_prefetch_size); - prefetch_len = tail_prefetch_size; - } - TEST_SYNC_POINT_CALLBACK("BlockBasedTable::Open::TailPrefetchLen", - &tail_prefetch_size); - Status s; - // TODO should not have this special logic in the future. - if (!file->use_direct_io()) { - prefetch_buffer.reset(new FilePrefetchBuffer(nullptr, 0, 0, false, true)); - s = file->Prefetch(prefetch_off, prefetch_len); - } else { - prefetch_buffer.reset(new FilePrefetchBuffer(nullptr, 0, 0, true, true)); - s = prefetch_buffer->Prefetch(file.get(), prefetch_off, prefetch_len); - } + s = PrefetchTail(file.get(), file_size, tail_prefetch_stats, prefetch_all, + preload_all, &prefetch_buffer); + + // Read in the following order: + // 1. Footer + // 2. [metaindex block] + // 3. [meta block: properties] + // 4. [meta block: range deletion tombstone] + // 5. [meta block: compression dictionary] + // 6. [meta block: index] + // 7. 
[meta block: filter] s = ReadFooterFromFile(file.get(), prefetch_buffer.get(), file_size, &footer, kBlockBasedTableMagicNumber); if (!s.ok()) { @@ -807,7 +817,7 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, // raw pointer will be used to create HashIndexReader, whose reset may // access a dangling pointer. Rep* rep = new BlockBasedTable::Rep(ioptions, env_options, table_options, - internal_comparator, skip_filters, + internal_comparator, skip_filters, level, immortal_table); rep->file = std::move(file); rep->footer = footer; @@ -818,16 +828,16 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, rep->internal_prefix_transform.reset( new InternalKeySliceTransform(prefix_extractor)); SetupCacheKeyPrefix(rep, file_size); - unique_ptr new_table(new BlockBasedTable(rep)); + std::unique_ptr new_table(new BlockBasedTable(rep)); // page cache options rep->persistent_cache_options = PersistentCacheOptions(rep->table_options.persistent_cache, std::string(rep->persistent_cache_key_prefix, rep->persistent_cache_key_prefix_size), - rep->ioptions.statistics); + rep->ioptions.statistics); - // Read meta index + // Read metaindex std::unique_ptr meta; std::unique_ptr meta_iter; s = ReadMetaBlock(rep, prefetch_buffer.get(), &meta, &meta_iter); @@ -835,38 +845,147 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, return s; } - // Find filter handle and filter type - if (rep->filter_policy) { - for (auto filter_type : - {Rep::FilterType::kFullFilter, Rep::FilterType::kPartitionedFilter, - Rep::FilterType::kBlockFilter}) { - std::string prefix; - switch (filter_type) { - case Rep::FilterType::kFullFilter: - prefix = kFullFilterBlockPrefix; - break; - case Rep::FilterType::kPartitionedFilter: - prefix = kPartitionedFilterBlockPrefix; - break; - case Rep::FilterType::kBlockFilter: - prefix = kFilterBlockPrefix; - break; - default: - assert(0); - } - std::string filter_block_key = prefix; - filter_block_key.append(rep->filter_policy->Name()); - if (FindMetaBlock(meta_iter.get(), filter_block_key, &rep->filter_handle) - .ok()) { - rep->filter_type = filter_type; - break; - } + s = ReadPropertiesBlock(rep, prefetch_buffer.get(), meta_iter.get(), + largest_seqno); + if (!s.ok()) { + return s; + } + s = ReadRangeDelBlock(rep, prefetch_buffer.get(), meta_iter.get(), + internal_comparator); + if (!s.ok()) { + return s; + } + s = PrefetchIndexAndFilterBlocks(rep, prefetch_buffer.get(), meta_iter.get(), + new_table.get(), prefix_extractor, + prefetch_all, table_options, level, + prefetch_index_and_filter_in_cache); + + if (s.ok()) { + // Update tail prefetch stats + assert(prefetch_buffer.get() != nullptr); + if (tail_prefetch_stats != nullptr) { + assert(prefetch_buffer->min_offset_read() < file_size); + tail_prefetch_stats->RecordEffectiveSize( + static_cast(file_size) - prefetch_buffer->min_offset_read()); } + + *table_reader = std::move(new_table); + } + + return s; +} + +Status BlockBasedTable::PrefetchTail( + RandomAccessFileReader* file, uint64_t file_size, + TailPrefetchStats* tail_prefetch_stats, const bool prefetch_all, + const bool preload_all, + std::unique_ptr* prefetch_buffer) { + size_t tail_prefetch_size = 0; + if (tail_prefetch_stats != nullptr) { + // Multiple threads may get a 0 (no history) when running in parallel, + // but it will get cleared after the first of them finishes. + tail_prefetch_size = tail_prefetch_stats->GetSuggestedPrefetchSize(); + } + if (tail_prefetch_size == 0) { + // Before read footer, readahead backwards to prefetch data. 
Do more + // readahead if we're going to read index/filter. + // TODO: This may incorrectly select small readahead in case partitioned + // index/filter is enabled and top-level partition pinning is enabled. + // That's because we need to issue readahead before we read the properties, + // at which point we don't yet know the index type. + tail_prefetch_size = prefetch_all || preload_all ? 512 * 1024 : 4 * 1024; + } + size_t prefetch_off; + size_t prefetch_len; + if (file_size < tail_prefetch_size) { + prefetch_off = 0; + prefetch_len = static_cast(file_size); + } else { + prefetch_off = static_cast(file_size - tail_prefetch_size); + prefetch_len = tail_prefetch_size; + } + TEST_SYNC_POINT_CALLBACK("BlockBasedTable::Open::TailPrefetchLen", + &tail_prefetch_size); + Status s; + // TODO should not have this special logic in the future. + if (!file->use_direct_io()) { + prefetch_buffer->reset(new FilePrefetchBuffer(nullptr, 0, 0, false, true)); + s = file->Prefetch(prefetch_off, prefetch_len); + } else { + prefetch_buffer->reset(new FilePrefetchBuffer(nullptr, 0, 0, true, true)); + s = (*prefetch_buffer)->Prefetch(file, prefetch_off, prefetch_len); } + return s; +} - // Read the properties +Status VerifyChecksum(const ChecksumType type, const char* buf, size_t len, + uint32_t expected) { + Status s; + uint32_t actual = 0; + switch (type) { + case kNoChecksum: + break; + case kCRC32c: + expected = crc32c::Unmask(expected); + actual = crc32c::Value(buf, len); + break; + case kxxHash: + actual = XXH32(buf, static_cast(len), 0); + break; + case kxxHash64: + actual = static_cast(XXH64(buf, static_cast(len), 0) & + uint64_t{0xffffffff}); + break; + default: + s = Status::Corruption("unknown checksum type"); + } + if (s.ok() && actual != expected) { + s = Status::Corruption("properties block checksum mismatched"); + } + return s; +} + +Status BlockBasedTable::TryReadPropertiesWithGlobalSeqno( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, const Slice& handle_value, + TableProperties** table_properties) { + assert(table_properties != nullptr); + // If this is an external SST file ingested with write_global_seqno set to + // true, then we expect the checksum mismatch because checksum was written + // by SstFileWriter, but its global seqno in the properties block may have + // been changed during ingestion. In this case, we read the properties + // block, copy it to a memory buffer, change the global seqno to its + // original value, i.e. 0, and verify the checksum again. 
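A side note on the `VerifyChecksum` helper above: stored CRC32c values are masked, which is why the helper calls `crc32c::Unmask` on the expected value before comparing. A minimal standalone sketch of the masking scheme, mirroring the constants in LevelDB/RocksDB's util/crc32c.h (the local `Mask`/`Unmask` names are stand-ins):

```cpp
#include <cassert>
#include <cstdint>

// Stored checksums are rotated and offset so that a CRC computed over data
// that itself embeds CRCs stays well distributed.
static const uint32_t kMaskDelta = 0xa282ead8ul;

static uint32_t Mask(uint32_t crc) {
  // Rotate right by 15 bits and add a constant.
  return ((crc >> 15) | (crc << 17)) + kMaskDelta;
}

static uint32_t Unmask(uint32_t masked_crc) {
  uint32_t rot = masked_crc - kMaskDelta;
  return ((rot >> 17) | (rot << 15));
}

int main() {
  uint32_t crc = 0x1234abcdu;
  assert(Unmask(Mask(crc)) == crc);  // masking round-trips for any value
  return 0;
}
```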
+ BlockHandle props_block_handle; + CacheAllocationPtr tmp_buf; + Status s = ReadProperties(handle_value, rep->file.get(), prefetch_buffer, + rep->footer, rep->ioptions, table_properties, + false /* verify_checksum */, &props_block_handle, + &tmp_buf, false /* compression_type_missing */, + nullptr /* memory_allocator */); + if (s.ok() && tmp_buf) { + const auto seqno_pos_iter = + (*table_properties) + ->properties_offsets.find( + ExternalSstFilePropertyNames::kGlobalSeqno); + size_t block_size = props_block_handle.size(); + if (seqno_pos_iter != (*table_properties)->properties_offsets.end()) { + uint64_t global_seqno_offset = seqno_pos_iter->second; + EncodeFixed64( + tmp_buf.get() + global_seqno_offset - props_block_handle.offset(), 0); + } + uint32_t value = DecodeFixed32(tmp_buf.get() + block_size + 1); + s = rocksdb::VerifyChecksum(rep->footer.checksum(), tmp_buf.get(), + block_size + 1, value); + } + return s; +} + +Status BlockBasedTable::ReadPropertiesBlock( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, InternalIterator* meta_iter, + const SequenceNumber largest_seqno) { bool found_properties_block = true; - s = SeekToPropertiesBlock(meta_iter.get(), &found_properties_block); + Status s; + s = SeekToPropertiesBlock(meta_iter, &found_properties_block); if (!s.ok()) { ROCKS_LOG_WARN(rep->ioptions.info_log, @@ -876,9 +995,20 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, s = meta_iter->status(); TableProperties* table_properties = nullptr; if (s.ok()) { - s = ReadProperties(meta_iter->value(), rep->file.get(), - prefetch_buffer.get(), rep->footer, rep->ioptions, - &table_properties, false /* compression_type_missing */); + s = ReadProperties( + meta_iter->value(), rep->file.get(), prefetch_buffer, rep->footer, + rep->ioptions, &table_properties, true /* verify_checksum */, + nullptr /* ret_block_handle */, nullptr /* ret_block_contents */, + false /* compression_type_missing */, nullptr /* memory_allocator */); + } + + if (s.IsCorruption()) { + s = TryReadPropertiesWithGlobalSeqno( + rep, prefetch_buffer, meta_iter->value(), &table_properties); + } + std::unique_ptr props_guard; + if (table_properties != nullptr) { + props_guard.reset(table_properties); } if (!s.ok()) { @@ -888,9 +1018,14 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, s.ToString().c_str()); } else { assert(table_properties != nullptr); - rep->table_properties.reset(table_properties); + rep->table_properties.reset(props_guard.release()); rep->blocks_maybe_compressed = rep->table_properties->compression_name != CompressionTypeToString(kNoCompression); + rep->blocks_definitely_zstd_compressed = + (rep->table_properties->compression_name == + CompressionTypeToString(kZSTD) || + rep->table_properties->compression_name == + CompressionTypeToString(kZSTDNotFinalCompression)); } } else { ROCKS_LOG_ERROR(rep->ioptions.info_log, @@ -903,40 +1038,6 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, } #endif // ROCKSDB_LITE - // Read the compression dictionary meta block - bool found_compression_dict; - BlockHandle compression_dict_handle; - s = SeekToCompressionDictBlock(meta_iter.get(), &found_compression_dict, - &compression_dict_handle); - if (!s.ok()) { - ROCKS_LOG_WARN( - rep->ioptions.info_log, - "Error when seeking to compression dictionary block from file: %s", - s.ToString().c_str()); - } else if (found_compression_dict && !compression_dict_handle.IsNull()) { - // TODO(andrewkr): Add to block cache if cache_index_and_filter_blocks is - // true. 
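The `block_size + 1` in the re-verification inside `TryReadPropertiesWithGlobalSeqno` above is not arbitrary: the per-block checksum covers the payload plus the one-byte compression-type tag that precedes the stored 4-byte checksum in the block trailer. A small illustrative sketch (helper names are hypothetical; assumes a little-endian host):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative layout of a block as stored in an SST file:
//   [payload: block_size bytes][compression type: 1 byte][checksum: 4 bytes]
// The checksum covers the payload *and* the type byte, hence block_size + 1.
static void PutFixed64LE(char* dst, uint64_t v) {
  std::memcpy(dst, &v, sizeof(v));  // assumes a little-endian host
}

static uint32_t GetFixed32LE(const char* src) {
  uint32_t v;
  std::memcpy(&v, src, sizeof(v));  // assumes a little-endian host
  return v;
}

int main() {
  const std::size_t block_size = 64;
  std::vector<char> block(block_size + 1 + 4, 0);
  // Rewrite a fixed64 field inside the payload in place, the way the global
  // seqno is reset to 0 above; a real reader would then recompute the
  // checksum over block_size + 1 bytes and compare it with the stored value.
  PutFixed64LE(block.data() + 16, 0);
  uint32_t stored = GetFixed32LE(block.data() + block_size + 1);
  return stored == 0 ? 0 : 1;  // placeholder; real code compares CRCs
}
```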
- std::unique_ptr compression_dict_cont{new BlockContents()}; - PersistentCacheOptions cache_options; - ReadOptions read_options; - read_options.verify_checksums = false; - BlockFetcher compression_block_fetcher( - rep->file.get(), prefetch_buffer.get(), rep->footer, read_options, - compression_dict_handle, compression_dict_cont.get(), rep->ioptions, false /* decompress */, - Slice() /*compression dict*/, cache_options); - s = compression_block_fetcher.ReadBlockContents(); - - if (!s.ok()) { - ROCKS_LOG_WARN( - rep->ioptions.info_log, - "Encountered error while reading data from compression dictionary " - "block %s", - s.ToString().c_str()); - } else { - rep->compression_dict_block = std::move(compression_dict_cont); - } - } - // Read the table properties, if provided. if (rep->table_properties) { rep->whole_key_filtering &= @@ -951,35 +1052,119 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, &(rep->global_seqno)); if (!s.ok()) { ROCKS_LOG_ERROR(rep->ioptions.info_log, "%s", s.ToString().c_str()); - return s; } } + return s; +} - // Read the range del meta block +Status BlockBasedTable::ReadRangeDelBlock( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, InternalIterator* meta_iter, + const InternalKeyComparator& internal_comparator) { + Status s; bool found_range_del_block; - s = SeekToRangeDelBlock(meta_iter.get(), &found_range_del_block, - &rep->range_del_handle); + BlockHandle range_del_handle; + s = SeekToRangeDelBlock(meta_iter, &found_range_del_block, &range_del_handle); if (!s.ok()) { ROCKS_LOG_WARN( rep->ioptions.info_log, "Error when seeking to range delete tombstones block from file: %s", s.ToString().c_str()); - } else { - if (found_range_del_block && !rep->range_del_handle.IsNull()) { - ReadOptions read_options; - s = MaybeLoadDataBlockToCache( - prefetch_buffer.get(), rep, read_options, rep->range_del_handle, - Slice() /* compression_dict */, &rep->range_del_entry, - false /* is_index */, nullptr /* get_context */); - if (!s.ok()) { - ROCKS_LOG_WARN( - rep->ioptions.info_log, - "Encountered error while reading data from range del block %s", - s.ToString().c_str()); + } else if (found_range_del_block && !range_del_handle.IsNull()) { + ReadOptions read_options; + std::unique_ptr iter(NewDataBlockIterator( + rep, read_options, range_del_handle, nullptr /* input_iter */, + false /* is_index */, true /* key_includes_seq */, + true /* index_key_is_full */, nullptr /* get_context */, Status(), + prefetch_buffer)); + assert(iter != nullptr); + s = iter->status(); + if (!s.ok()) { + ROCKS_LOG_WARN( + rep->ioptions.info_log, + "Encountered error while reading data from range del block %s", + s.ToString().c_str()); + } else { + rep->fragmented_range_dels = + std::make_shared(std::move(iter), + internal_comparator); + } + } + return s; +} + +Status BlockBasedTable::ReadCompressionDictBlock( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, + std::unique_ptr* compression_dict_block) { + assert(compression_dict_block != nullptr); + Status s; + if (!rep->compression_dict_handle.IsNull()) { + std::unique_ptr compression_dict_cont{new BlockContents()}; + PersistentCacheOptions cache_options; + ReadOptions read_options; + read_options.verify_checksums = true; + BlockFetcher compression_block_fetcher( + rep->file.get(), prefetch_buffer, rep->footer, read_options, + rep->compression_dict_handle, compression_dict_cont.get(), + rep->ioptions, false /* decompress */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), cache_options); + s = 
compression_block_fetcher.ReadBlockContents(); + + if (!s.ok()) { + ROCKS_LOG_WARN( + rep->ioptions.info_log, + "Encountered error while reading data from compression dictionary " + "block %s", + s.ToString().c_str()); + } else { + *compression_dict_block = std::move(compression_dict_cont); + } + } + return s; +} + +Status BlockBasedTable::PrefetchIndexAndFilterBlocks( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, InternalIterator* meta_iter, + BlockBasedTable* new_table, const SliceTransform* prefix_extractor, + bool prefetch_all, const BlockBasedTableOptions& table_options, + const int level, const bool prefetch_index_and_filter_in_cache) { + Status s; + + // Find filter handle and filter type + if (rep->filter_policy) { + for (auto filter_type : + {Rep::FilterType::kFullFilter, Rep::FilterType::kPartitionedFilter, + Rep::FilterType::kBlockFilter}) { + std::string prefix; + switch (filter_type) { + case Rep::FilterType::kFullFilter: + prefix = kFullFilterBlockPrefix; + break; + case Rep::FilterType::kPartitionedFilter: + prefix = kPartitionedFilterBlockPrefix; + break; + case Rep::FilterType::kBlockFilter: + prefix = kFilterBlockPrefix; + break; + default: + assert(0); + } + std::string filter_block_key = prefix; + filter_block_key.append(rep->filter_policy->Name()); + if (FindMetaBlock(meta_iter, filter_block_key, &rep->filter_handle) + .ok()) { + rep->filter_type = filter_type; + break; } } } + { + // Find compression dictionary handle + bool found_compression_dict; + s = SeekToCompressionDictBlock(meta_iter, &found_compression_dict, + &rep->compression_dict_handle); + } + bool need_upper_bound_check = PrefixExtractorChanged(rep->table_properties.get(), prefix_extractor); @@ -1007,8 +1192,9 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, pin_all || (table_options.pin_top_level_index_and_filter && rep->filter_type == Rep::FilterType::kPartitionedFilter); // pre-fetching of blocks is turned on - // Will use block cache for index/filter blocks access + // Will use block cache for meta-blocks access // Always prefetch index and filter for level 0 + // TODO(ajkr): also prefetch compression dictionary block if (table_options.cache_index_and_filter_blocks) { assert(table_options.block_cache != nullptr); if (prefetch_index) { @@ -1019,10 +1205,12 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, bool disable_prefix_seek = rep->index_type == BlockBasedTableOptions::kHashSearch && need_upper_bound_check; - unique_ptr> iter( - new_table->NewIndexIterator(ReadOptions(), disable_prefix_seek, - nullptr, &index_entry)); - s = iter->status(); + if (s.ok()) { + std::unique_ptr> iter( + new_table->NewIndexIterator(ReadOptions(), disable_prefix_seek, + nullptr, &index_entry)); + s = iter->status(); + } if (s.ok()) { // This is the first call to NewIndexIterator() since we're in Open(). // On success it should give us ownership of the `CachableEntry` by @@ -1056,12 +1244,15 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, } } } else { - // If we don't use block cache for index/filter blocks access, we'll - // pre-load these blocks, which will kept in member variables in Rep - // and with a same life-time as this table object. + // If we don't use block cache for meta-block access, we'll pre-load these + // blocks, which will be kept in member variables in Rep with the same + // lifetime as this table object.
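For reference, the metaindex keys probed by the filter lookup loop above are simply `prefix + filter_policy->Name()`. A standalone sketch, with prefix literals mirroring the constants defined in block_based_table_reader.cc and the name reported by the built-in bloom policy:

```cpp
#include <iostream>
#include <string>

int main() {
  // Per-filter-type metaindex prefixes, as in block_based_table_reader.cc.
  const std::string kFullFilterBlockPrefix = "fullfilter.";
  const std::string kPartitionedFilterBlockPrefix = "partitionedfilter.";
  const std::string kFilterBlockPrefix = "filter.";
  // Name() of the stock bloom filter policy.
  const std::string policy = "rocksdb.BuiltinBloomFilter";
  for (const auto& prefix : {kFullFilterBlockPrefix,
                             kPartitionedFilterBlockPrefix,
                             kFilterBlockPrefix}) {
    // The first key found in the metaindex decides rep->filter_type.
    std::cout << prefix + policy << "\n";
  }
  return 0;
}
```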
IndexReader* index_reader = nullptr; - s = new_table->CreateIndexReader(prefetch_buffer.get(), &index_reader, - meta_iter.get(), level); + if (s.ok()) { + s = new_table->CreateIndexReader(prefetch_buffer, &index_reader, + meta_iter, level); + } + std::unique_ptr compression_dict_block; if (s.ok()) { rep->index_reader.reset(index_reader); // The partitions of partitioned index are always stored in cache. They @@ -1074,9 +1265,9 @@ Status BlockBasedTable::Open(const ImmutableCFOptions& ioptions, // Set filter block if (rep->filter_policy) { const bool is_a_filter_partition = true; - auto filter = new_table->ReadFilter( - prefetch_buffer.get(), rep->filter_handle, !is_a_filter_partition, - rep->table_prefix_extractor.get()); + auto filter = new_table->ReadFilter(prefetch_buffer, rep->filter_handle, + !is_a_filter_partition, + rep->table_prefix_extractor.get()); rep->filter.reset(filter); // Refer to the comment above about partitioned indexes always being // cached if (pin_all) { filter->CacheDependencies(pin_all, rep->table_prefix_extractor.get()); } } + s = ReadCompressionDictBlock(rep, prefetch_buffer, + &compression_dict_block); } else { delete index_reader; } - } - - if (s.ok()) { - assert(prefetch_buffer.get() != nullptr); - if (tail_prefetch_stats != nullptr) { - assert(prefetch_buffer->min_offset_read() < file_size); - tail_prefetch_stats->RecordEffectiveSize( - static_cast(file_size) - prefetch_buffer->min_offset_read()); + if (s.ok() && !rep->compression_dict_handle.IsNull()) { + assert(compression_dict_block != nullptr); + // TODO(ajkr): find a way to avoid the `compression_dict_block` data copy + rep->uncompression_dict.reset(new UncompressionDict( + compression_dict_block->data.ToString(), + rep->blocks_definitely_zstd_compressed, rep->ioptions.statistics)); } - *table_reader = std::move(new_table); } - return s; } @@ -1133,6 +1322,9 @@ size_t BlockBasedTable::ApproximateMemoryUsage() const { if (rep_->index_reader) { usage += rep_->index_reader->ApproximateMemoryUsage(); } + if (rep_->uncompression_dict) { + usage += rep_->uncompression_dict->ApproximateMemoryUsage(); + } return usage; } @@ -1148,9 +1340,10 @@ Status BlockBasedTable::ReadMetaBlock(Rep* rep, Status s = ReadBlockFromFile( rep->file.get(), prefetch_buffer, rep->footer, ReadOptions(), rep->footer.metaindex_handle(), &meta, rep->ioptions, - true /* decompress */, Slice() /*compression dict*/, - rep->persistent_cache_options, kDisableGlobalSequenceNumber, - 0 /* read_amp_bytes_per_bit */); + true /* decompress */, true /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), rep->persistent_cache_options, + kDisableGlobalSequenceNumber, 0 /* read_amp_bytes_per_bit */, + GetMemoryAllocator(rep->table_options)); if (!s.ok()) { ROCKS_LOG_ERROR(rep->ioptions.info_log, @@ -1169,20 +1362,20 @@ Status BlockBasedTable::ReadMetaBlock(Rep* rep, Status BlockBasedTable::GetDataBlockFromCache( const Slice& block_cache_key, const Slice& compressed_block_cache_key, - Cache* block_cache, Cache* block_cache_compressed, - const ImmutableCFOptions& ioptions, const ReadOptions& read_options, - BlockBasedTable::CachableEntry* block, uint32_t format_version, - const Slice& compression_dict, size_t read_amp_bytes_per_bit, bool is_index, - GetContext* get_context) { + Cache* block_cache, Cache* block_cache_compressed, Rep* rep, + const ReadOptions& read_options, + BlockBasedTable::CachableEntry* block, + const UncompressionDict& uncompression_dict, size_t
read_amp_bytes_per_bit, + bool is_index, GetContext* get_context) { Status s; - Block* compressed_block = nullptr; + BlockContents* compressed_block = nullptr; Cache::Handle* block_cache_compressed_handle = nullptr; - Statistics* statistics = ioptions.statistics; + Statistics* statistics = rep->ioptions.statistics; // Lookup uncompressed cache first if (block_cache != nullptr) { block->cache_handle = GetEntryFromCache( - block_cache, block_cache_key, + block_cache, block_cache_key, rep->level, is_index ? BLOCK_CACHE_INDEX_MISS : BLOCK_CACHE_DATA_MISS, is_index ? BLOCK_CACHE_INDEX_HIT : BLOCK_CACHE_DATA_HIT, get_context @@ -1220,32 +1413,35 @@ Status BlockBasedTable::GetDataBlockFromCache( // found compressed block RecordTick(statistics, BLOCK_CACHE_COMPRESSED_HIT); - compressed_block = reinterpret_cast( + compressed_block = reinterpret_cast( block_cache_compressed->Value(block_cache_compressed_handle)); - assert(compressed_block->compression_type() != kNoCompression); + CompressionType compression_type = compressed_block->get_compression_type(); + assert(compression_type != kNoCompression); // Retrieve the uncompressed contents into a new buffer BlockContents contents; - UncompressionContext uncompresssion_ctx(compressed_block->compression_type(), - compression_dict); - s = UncompressBlockContents(uncompresssion_ctx, compressed_block->data(), - compressed_block->size(), &contents, - format_version, ioptions); + UncompressionContext context(compression_type); + UncompressionInfo info(context, uncompression_dict, compression_type); + s = UncompressBlockContents(info, compressed_block->data.data(), + compressed_block->data.size(), &contents, + rep->table_options.format_version, rep->ioptions, + GetMemoryAllocator(rep->table_options)); // Insert uncompressed block into block cache if (s.ok()) { block->value = - new Block(std::move(contents), compressed_block->global_seqno(), + new Block(std::move(contents), rep->get_global_seqno(is_index), read_amp_bytes_per_bit, statistics); // uncompressed block - assert(block->value->compression_type() == kNoCompression); - if (block_cache != nullptr && block->value->cachable() && + if (block_cache != nullptr && block->value->own_bytes() && read_options.fill_cache) { size_t charge = block->value->ApproximateMemoryUsage(); s = block_cache->Insert(block_cache_key, block->value, charge, &DeleteCachedEntry, &(block->cache_handle)); +#ifndef NDEBUG block_cache->TEST_mark_as_data_block(block_cache_key, charge); +#endif // NDEBUG if (s.ok()) { if (get_context != nullptr) { get_context->get_context_stats_.num_cache_add++; @@ -1290,64 +1486,77 @@ Status BlockBasedTable::PutDataBlockToCache( const Slice& block_cache_key, const Slice& compressed_block_cache_key, Cache* block_cache, Cache* block_cache_compressed, const ReadOptions& /*read_options*/, const ImmutableCFOptions& ioptions, - CachableEntry* block, Block* raw_block, uint32_t format_version, - const Slice& compression_dict, size_t read_amp_bytes_per_bit, bool is_index, - Cache::Priority priority, GetContext* get_context) { - assert(raw_block->compression_type() == kNoCompression || + CachableEntry* cached_block, BlockContents* raw_block_contents, + CompressionType raw_block_comp_type, uint32_t format_version, + const UncompressionDict& uncompression_dict, SequenceNumber seq_no, + size_t read_amp_bytes_per_bit, MemoryAllocator* memory_allocator, + bool is_index, Cache::Priority priority, GetContext* get_context) { + assert(raw_block_comp_type == kNoCompression || block_cache_compressed != nullptr); Status s; // 
Retrieve the uncompressed contents into a new buffer - BlockContents contents; + BlockContents uncompressed_block_contents; Statistics* statistics = ioptions.statistics; - if (raw_block->compression_type() != kNoCompression) { - UncompressionContext uncompression_ctx(raw_block->compression_type(), - compression_dict); - s = UncompressBlockContents(uncompression_ctx, raw_block->data(), - raw_block->size(), &contents, format_version, - ioptions); + if (raw_block_comp_type != kNoCompression) { + UncompressionContext context(raw_block_comp_type); + UncompressionInfo info(context, uncompression_dict, raw_block_comp_type); + s = UncompressBlockContents(info, raw_block_contents->data.data(), + raw_block_contents->data.size(), + &uncompressed_block_contents, format_version, + ioptions, memory_allocator); } if (!s.ok()) { - delete raw_block; return s; } - if (raw_block->compression_type() != kNoCompression) { - block->value = new Block(std::move(contents), raw_block->global_seqno(), - read_amp_bytes_per_bit, - statistics); // uncompressed block + if (raw_block_comp_type != kNoCompression) { + cached_block->value = new Block(std::move(uncompressed_block_contents), + seq_no, read_amp_bytes_per_bit, + statistics); // uncompressed block } else { - block->value = raw_block; - raw_block = nullptr; + cached_block->value = + new Block(std::move(*raw_block_contents), seq_no, + read_amp_bytes_per_bit, ioptions.statistics); } // Insert compressed block into compressed block cache. // Release the hold on the compressed cache entry immediately. - if (block_cache_compressed != nullptr && raw_block != nullptr && - raw_block->cachable()) { - s = block_cache_compressed->Insert(compressed_block_cache_key, raw_block, - raw_block->ApproximateMemoryUsage(), - &DeleteCachedEntry); + if (block_cache_compressed != nullptr && + raw_block_comp_type != kNoCompression && raw_block_contents != nullptr && + raw_block_contents->own_bytes()) { +#ifndef NDEBUG + assert(raw_block_contents->is_raw_block); +#endif // NDEBUG + + // We cannot directly put raw_block_contents because this could point to + // an object in the stack. + BlockContents* block_cont_for_comp_cache = + new BlockContents(std::move(*raw_block_contents)); + s = block_cache_compressed->Insert( + compressed_block_cache_key, block_cont_for_comp_cache, + block_cont_for_comp_cache->ApproximateMemoryUsage(), + &DeleteCachedEntry); if (s.ok()) { // Avoid the following code to delete this cached block. 
- raw_block = nullptr; RecordTick(statistics, BLOCK_CACHE_COMPRESSED_ADD); } else { RecordTick(statistics, BLOCK_CACHE_COMPRESSED_ADD_FAILURES); + delete block_cont_for_comp_cache; } } - delete raw_block; // insert into uncompressed block cache - assert((block->value->compression_type() == kNoCompression)); - if (block_cache != nullptr && block->value->cachable()) { - size_t charge = block->value->ApproximateMemoryUsage(); - s = block_cache->Insert(block_cache_key, block->value, charge, - &DeleteCachedEntry, &(block->cache_handle), - priority); + if (block_cache != nullptr && cached_block->value->own_bytes()) { + size_t charge = cached_block->value->ApproximateMemoryUsage(); + s = block_cache->Insert(block_cache_key, cached_block->value, charge, + &DeleteCachedEntry, + &(cached_block->cache_handle), priority); +#ifndef NDEBUG block_cache->TEST_mark_as_data_block(block_cache_key, charge); +#endif // NDEBUG if (s.ok()) { - assert(block->cache_handle != nullptr); + assert(cached_block->cache_handle != nullptr); if (get_context != nullptr) { get_context->get_context_stats_.num_cache_add++; get_context->get_context_stats_.num_cache_bytes_write += charge; @@ -1373,12 +1582,12 @@ Status BlockBasedTable::PutDataBlockToCache( RecordTick(statistics, BLOCK_CACHE_DATA_BYTES_INSERT, charge); } } - assert(reinterpret_cast( - block_cache->Value(block->cache_handle)) == block->value); + assert(reinterpret_cast(block_cache->Value( + cached_block->cache_handle)) == cached_block->value); } else { RecordTick(statistics, BLOCK_CACHE_ADD_FAILURES); - delete block->value; - block->value = nullptr; + delete cached_block->value; + cached_block->value = nullptr; } } @@ -1397,12 +1606,11 @@ FilterBlockReader* BlockBasedTable::ReadFilter( } BlockContents block; - Slice dummy_comp_dict; - - BlockFetcher block_fetcher(rep->file.get(), prefetch_buffer, rep->footer, - ReadOptions(), filter_handle, &block, - rep->ioptions, false /* decompress */, - dummy_comp_dict, rep->persistent_cache_options); + BlockFetcher block_fetcher( + rep->file.get(), prefetch_buffer, rep->footer, ReadOptions(), + filter_handle, &block, rep->ioptions, false /* decompress */, + false /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + rep->persistent_cache_options, GetMemoryAllocator(rep->table_options)); Status s = block_fetcher.ReadBlockContents(); if (!s.ok()) { @@ -1495,7 +1703,8 @@ BlockBasedTable::CachableEntry BlockBasedTable::GetFilter( Statistics* statistics = rep_->ioptions.statistics; auto cache_handle = GetEntryFromCache( - block_cache, key, BLOCK_CACHE_FILTER_MISS, BLOCK_CACHE_FILTER_HIT, + block_cache, key, rep_->level, BLOCK_CACHE_FILTER_MISS, + BLOCK_CACHE_FILTER_HIT, get_context ? &get_context->get_context_stats_.num_cache_filter_miss : nullptr, get_context ? &get_context->get_context_stats_.num_cache_filter_hit @@ -1504,8 +1713,9 @@ BlockBasedTable::CachableEntry BlockBasedTable::GetFilter( FilterBlockReader* filter = nullptr; if (cache_handle != nullptr) { - filter = reinterpret_cast( - block_cache->Value(cache_handle)); + PERF_COUNTER_ADD(block_cache_filter_hit_count, 1); + filter = + reinterpret_cast(block_cache->Value(cache_handle)); } else if (no_io) { // Do not invoke any io. return CachableEntry(); @@ -1520,6 +1730,7 @@ BlockBasedTable::CachableEntry BlockBasedTable::GetFilter( ? 
Cache::Priority::HIGH : Cache::Priority::LOW); if (s.ok()) { + PERF_COUNTER_ADD(filter_block_read_count, 1); if (get_context != nullptr) { get_context->get_context_stats_.num_cache_add++; get_context->get_context_stats_.num_cache_bytes_write += usage; @@ -1540,7 +1751,85 @@ BlockBasedTable::CachableEntry BlockBasedTable::GetFilter( } } - return { filter, cache_handle }; + return {filter, cache_handle}; +} + +BlockBasedTable::CachableEntry +BlockBasedTable::GetUncompressionDict(Rep* rep, + FilePrefetchBuffer* prefetch_buffer, + bool no_io, GetContext* get_context) { + if (!rep->table_options.cache_index_and_filter_blocks) { + // block cache is either disabled or not used for meta-blocks. In either + // case, BlockBasedTableReader is the owner of the uncompression dictionary. + return {rep->uncompression_dict.get(), nullptr /* cache handle */}; + } + if (rep->compression_dict_handle.IsNull()) { + return {nullptr, nullptr}; + } + char cache_key_buf[kMaxCacheKeyPrefixSize + kMaxVarint64Length]; + auto cache_key = + GetCacheKey(rep->cache_key_prefix, rep->cache_key_prefix_size, + rep->compression_dict_handle, cache_key_buf); + auto cache_handle = GetEntryFromCache( + rep->table_options.block_cache.get(), cache_key, rep->level, + BLOCK_CACHE_COMPRESSION_DICT_MISS, BLOCK_CACHE_COMPRESSION_DICT_HIT, + get_context + ? &get_context->get_context_stats_.num_cache_compression_dict_miss + : nullptr, + get_context + ? &get_context->get_context_stats_.num_cache_compression_dict_hit + : nullptr, + rep->ioptions.statistics, get_context); + UncompressionDict* dict = nullptr; + if (cache_handle != nullptr) { + dict = reinterpret_cast( + rep->table_options.block_cache->Value(cache_handle)); + } else if (no_io) { + // Do not invoke any io. + } else { + std::unique_ptr compression_dict_block; + Status s = + ReadCompressionDictBlock(rep, prefetch_buffer, &compression_dict_block); + size_t usage = 0; + if (s.ok()) { + assert(compression_dict_block != nullptr); + // TODO(ajkr): find a way to avoid the `compression_dict_block` data copy + dict = new UncompressionDict(compression_dict_block->data.ToString(), + rep->blocks_definitely_zstd_compressed, + rep->ioptions.statistics); + usage = dict->ApproximateMemoryUsage(); + s = rep->table_options.block_cache->Insert( + cache_key, dict, usage, &DeleteCachedUncompressionDictEntry, + &cache_handle, + rep->table_options.cache_index_and_filter_blocks_with_high_priority + ? Cache::Priority::HIGH + : Cache::Priority::LOW); + } + if (s.ok()) { + PERF_COUNTER_ADD(compression_dict_block_read_count, 1); + if (get_context != nullptr) { + get_context->get_context_stats_.num_cache_add++; + get_context->get_context_stats_.num_cache_bytes_write += usage; + get_context->get_context_stats_.num_cache_compression_dict_add++; + get_context->get_context_stats_ + .num_cache_compression_dict_bytes_insert += usage; + } else { + RecordTick(rep->ioptions.statistics, BLOCK_CACHE_ADD); + RecordTick(rep->ioptions.statistics, BLOCK_CACHE_BYTES_WRITE, usage); + RecordTick(rep->ioptions.statistics, BLOCK_CACHE_COMPRESSION_DICT_ADD); + RecordTick(rep->ioptions.statistics, + BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT, usage); + } + } else { + // There should be no way to get here if block cache insertion succeeded. + // Though it is still possible something failed earlier. 
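The flow of `GetUncompressionDict` above is a classic read-through cache: look the value up first, and on a miss (when I/O is permitted) load it, insert it, and hand back a handle. A minimal sketch of that pattern, deliberately not using the real RocksDB `Cache` API:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Not the RocksDB Cache API: the shared_ptr stands in for the cache handle
// the caller must eventually release.
template <typename T>
class ReadThroughCache {
 public:
  std::shared_ptr<T> Get(const std::string& key, bool no_io,
                         const std::function<std::shared_ptr<T>()>& load) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      return it->second;         // cache hit
    }
    if (no_io) {
      return nullptr;            // kBlockCacheTier: never touch the file
    }
    auto value = load();         // miss: read the block from the file
    if (value != nullptr) {
      map_.emplace(key, value);  // insert so later readers hit
    }
    return value;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<T>> map_;
};

int main() {
  ReadThroughCache<std::string> cache;
  auto dict = cache.Get("dict-key", /*no_io=*/false, [] {
    return std::make_shared<std::string>("raw dictionary bytes");
  });
  return dict != nullptr ? 0 : 1;
}
```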
+ RecordTick(rep->ioptions.statistics, BLOCK_CACHE_ADD_FAILURES); + delete dict; + dict = nullptr; + assert(cache_handle == nullptr); + } + } + return {dict, cache_handle}; } // disable_prefix_seek should be set to true when prefix_extractor found in SST @@ -1551,12 +1840,16 @@ InternalIteratorBase* BlockBasedTable::NewIndexIterator( GetContext* get_context) { // index reader has already been pre-populated. if (rep_->index_reader) { + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`. return rep_->index_reader->NewIterator( input_iter, read_options.total_order_seek || disable_prefix_seek, read_options.fill_cache); } // we have a pinned index block if (rep_->index_entry.IsSet()) { + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`. return rep_->index_entry.value->NewIterator( input_iter, read_options.total_order_seek || disable_prefix_seek, read_options.fill_cache); @@ -1572,7 +1865,8 @@ InternalIteratorBase* BlockBasedTable::NewIndexIterator( rep_->dummy_index_reader_offset, cache_key); Statistics* statistics = rep_->ioptions.statistics; auto cache_handle = GetEntryFromCache( - block_cache, key, BLOCK_CACHE_INDEX_MISS, BLOCK_CACHE_INDEX_HIT, + block_cache, key, rep_->level, BLOCK_CACHE_INDEX_MISS, + BLOCK_CACHE_INDEX_HIT, get_context ? &get_context->get_context_stats_.num_cache_index_miss : nullptr, get_context ? &get_context->get_context_stats_.num_cache_index_hit : nullptr, @@ -1591,6 +1885,7 @@ InternalIteratorBase* BlockBasedTable::NewIndexIterator( IndexReader* index_reader = nullptr; if (cache_handle != nullptr) { + PERF_COUNTER_ADD(block_cache_index_hit_count, 1); index_reader = reinterpret_cast(block_cache->Value(cache_handle)); } else { @@ -1620,6 +1915,7 @@ InternalIteratorBase* BlockBasedTable::NewIndexIterator( RecordTick(statistics, BLOCK_CACHE_ADD); RecordTick(statistics, BLOCK_CACHE_BYTES_WRITE, charge); } + PERF_COUNTER_ADD(index_block_read_count, 1); RecordTick(statistics, BLOCK_CACHE_INDEX_ADD); RecordTick(statistics, BLOCK_CACHE_INDEX_BYTES_INSERT, charge); } else { @@ -1635,10 +1931,11 @@ InternalIteratorBase* BlockBasedTable::NewIndexIterator( return NewErrorInternalIterator(s); } } - } assert(cache_handle); + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`. auto* iter = index_reader->NewIterator( input_iter, read_options.total_order_seek || disable_prefix_seek); @@ -1665,55 +1962,75 @@ TBlockIter* BlockBasedTable::NewDataBlockIterator( FilePrefetchBuffer* prefetch_buffer) { PERF_TIMER_GUARD(new_table_block_iter_nanos); - const bool no_io = (ro.read_tier == kBlockCacheTier); Cache* block_cache = rep->table_options.block_cache.get(); CachableEntry block; - Slice compression_dict; - if (s.ok()) { - if (rep->compression_dict_block) { - compression_dict = rep->compression_dict_block->data; - } - s = MaybeLoadDataBlockToCache(prefetch_buffer, rep, ro, handle, - compression_dict, &block, is_index, - get_context); - } - TBlockIter* iter; - if (input_iter != nullptr) { - iter = input_iter; - } else { - iter = new TBlockIter; - } - // Didn't get any data from block caches.
- if (s.ok() && block.value == nullptr) { - if (no_io) { - // Could not read from block_cache and can't do IO - iter->Invalidate(Status::Incomplete("no blocking io")); - return iter; - } - std::unique_ptr block_value; - { - StopWatch sw(rep->ioptions.env, rep->ioptions.statistics, - READ_BLOCK_GET_MICROS); - s = ReadBlockFromFile( - rep->file.get(), prefetch_buffer, rep->footer, ro, handle, - &block_value, rep->ioptions, rep->blocks_maybe_compressed, - compression_dict, rep->persistent_cache_options, - is_index ? kDisableGlobalSequenceNumber : rep->global_seqno, - rep->table_options.read_amp_bytes_per_bit, rep->immortal_table); - } + { + const bool no_io = (ro.read_tier == kBlockCacheTier); + auto uncompression_dict_storage = + GetUncompressionDict(rep, prefetch_buffer, no_io, get_context); + const UncompressionDict& uncompression_dict = + uncompression_dict_storage.value == nullptr + ? UncompressionDict::GetEmptyDict() + : *uncompression_dict_storage.value; if (s.ok()) { - block.value = block_value.release(); + s = MaybeReadBlockAndLoadToCache(prefetch_buffer, rep, ro, handle, + uncompression_dict, &block, is_index, + get_context); + } + + if (input_iter != nullptr) { + iter = input_iter; + } else { + iter = new TBlockIter; + } + // Didn't get any data from block caches. + if (s.ok() && block.value == nullptr) { + if (no_io) { + // Could not read from block_cache and can't do IO + iter->Invalidate(Status::Incomplete("no blocking io")); + return iter; + } + std::unique_ptr block_value; + { + StopWatch sw(rep->ioptions.env, rep->ioptions.statistics, + READ_BLOCK_GET_MICROS); + s = ReadBlockFromFile( + rep->file.get(), prefetch_buffer, rep->footer, ro, handle, + &block_value, rep->ioptions, + rep->blocks_maybe_compressed /*do_decompress*/, + rep->blocks_maybe_compressed, uncompression_dict, + rep->persistent_cache_options, + is_index ? kDisableGlobalSequenceNumber : rep->global_seqno, + rep->table_options.read_amp_bytes_per_bit, + GetMemoryAllocator(rep->table_options)); + } + if (s.ok()) { + block.value = block_value.release(); + } } + // TODO(ajkr): also pin compression dictionary block when + // `pin_l0_filter_and_index_blocks_in_cache == true`. + uncompression_dict_storage.Release(block_cache); } if (s.ok()) { assert(block.value != nullptr); const bool kTotalOrderSeek = true; + // Block contents are pinned and it is still pinned after the iterator + // is destroyed as long as cleanup functions are moved to another object, + // when: + // 1. block cache handle is set to be released in cleanup function, or + // 2. it's pointing to immortal source. If own_bytes is true then we are + // not reading data from the original source, whether immortal or not. + // Otherwise, the block is pinned iff the source is immortal. 
+ bool block_contents_pinned = + (block.cache_handle != nullptr || + (!block.value->own_bytes() && rep->immortal_table)); iter = block.value->NewIterator( &rep->internal_comparator, rep->internal_comparator.user_comparator(), iter, rep->ioptions.statistics, kTotalOrderSeek, key_includes_seq, - index_key_is_full); + index_key_is_full, block_contents_pinned); if (block.cache_handle != nullptr) { iter->RegisterCleanup(&ReleaseCachedEntry, block_cache, block.cache_handle); @@ -1722,7 +2039,7 @@ TBlockIter* BlockBasedTable::NewDataBlockIterator( // insert a dummy record to block cache to track the memory usage Cache::Handle* cache_handle; // There are two other types of cache keys: 1) SST cache key added in - // `MaybeLoadDataBlockToCache` 2) dummy cache key added in + // `MaybeReadBlockAndLoadToCache` 2) dummy cache key added in // `write_buffer_manager`. Use longer prefix (41 bytes) to differentiate // from SST cache key(31 bytes), and use non-zero prefix to // differentiate from `write_buffer_manager` @@ -1758,25 +2075,28 @@ TBlockIter* BlockBasedTable::NewDataBlockIterator( return iter; } -Status BlockBasedTable::MaybeLoadDataBlockToCache( +Status BlockBasedTable::MaybeReadBlockAndLoadToCache( FilePrefetchBuffer* prefetch_buffer, Rep* rep, const ReadOptions& ro, - const BlockHandle& handle, Slice compression_dict, + const BlockHandle& handle, const UncompressionDict& uncompression_dict, CachableEntry* block_entry, bool is_index, GetContext* get_context) { assert(block_entry != nullptr); const bool no_io = (ro.read_tier == kBlockCacheTier); Cache* block_cache = rep->table_options.block_cache.get(); + + // No point to cache compressed blocks if it never goes away Cache* block_cache_compressed = - rep->table_options.block_cache_compressed.get(); + rep->immortal_table ? nullptr + : rep->table_options.block_cache_compressed.get(); + // First, try to get the block from the cache + // // If either block cache is enabled, we'll try to read from it. Status s; + char cache_key[kMaxCacheKeyPrefixSize + kMaxVarint64Length]; + char compressed_cache_key[kMaxCacheKeyPrefixSize + kMaxVarint64Length]; + Slice key /* key to the block cache */; + Slice ckey /* key to the compressed block cache */; if (block_cache != nullptr || block_cache_compressed != nullptr) { - Statistics* statistics = rep->ioptions.statistics; - char cache_key[kMaxCacheKeyPrefixSize + kMaxVarint64Length]; - char compressed_cache_key[kMaxCacheKeyPrefixSize + kMaxVarint64Length]; - Slice key, /* key to the block cache */ - ckey /* key to the compressed block cache */; - // create key for block cache if (block_cache != nullptr) { key = GetCacheKey(rep->cache_key_prefix, rep->cache_key_prefix_size, @@ -1789,30 +2109,42 @@ Status BlockBasedTable::MaybeLoadDataBlockToCache( compressed_cache_key); } - s = GetDataBlockFromCache( - key, ckey, block_cache, block_cache_compressed, rep->ioptions, ro, - block_entry, rep->table_options.format_version, compression_dict, - rep->table_options.read_amp_bytes_per_bit, is_index, get_context); + s = GetDataBlockFromCache(key, ckey, block_cache, block_cache_compressed, + rep, ro, block_entry, uncompression_dict, + rep->table_options.read_amp_bytes_per_bit, + is_index, get_context); + // Can't find the block from the cache. If I/O is allowed, read from the + // file. 
if (block_entry->value == nullptr && !no_io && ro.fill_cache) { - std::unique_ptr raw_block; + Statistics* statistics = rep->ioptions.statistics; + bool do_decompress = + block_cache_compressed == nullptr && rep->blocks_maybe_compressed; + CompressionType raw_block_comp_type; + BlockContents raw_block_contents; { StopWatch sw(rep->ioptions.env, statistics, READ_BLOCK_GET_MICROS); - s = ReadBlockFromFile( + BlockFetcher block_fetcher( rep->file.get(), prefetch_buffer, rep->footer, ro, handle, - &raw_block, rep->ioptions, - block_cache_compressed == nullptr && rep->blocks_maybe_compressed, - compression_dict, rep->persistent_cache_options, - is_index ? kDisableGlobalSequenceNumber : rep->global_seqno, - rep->table_options.read_amp_bytes_per_bit, rep->immortal_table); + &raw_block_contents, rep->ioptions, + do_decompress /* do uncompress */, rep->blocks_maybe_compressed, + uncompression_dict, rep->persistent_cache_options, + GetMemoryAllocator(rep->table_options), + GetMemoryAllocatorForCompressedBlock(rep->table_options)); + s = block_fetcher.ReadBlockContents(); + raw_block_comp_type = block_fetcher.get_compression_type(); } if (s.ok()) { + SequenceNumber seq_no = rep->get_global_seqno(is_index); + // If filling cache is allowed and a cache is configured, try to put the + // block into the cache. s = PutDataBlockToCache( key, ckey, block_cache, block_cache_compressed, ro, rep->ioptions, - block_entry, raw_block.release(), rep->table_options.format_version, - compression_dict, rep->table_options.read_amp_bytes_per_bit, - is_index, + block_entry, &raw_block_contents, raw_block_comp_type, + rep->table_options.format_version, uncompression_dict, seq_no, + rep->table_options.read_amp_bytes_per_bit, + GetMemoryAllocator(rep->table_options), is_index, is_index && rep->table_options .cache_index_and_filter_blocks_with_high_priority ? Cache::Priority::HIGH : Cache::Priority::LOW); @@ -1855,6 +2187,8 @@ BlockBasedTable::PartitionedIndexIteratorState::NewSecondaryIterator( RecordTick(rep->ioptions.statistics, BLOCK_CACHE_BYTES_READ, block_cache->GetUsage(block->second.cache_handle)); Statistics* kNullStats = nullptr; + // We don't return pinned data from index blocks, so no need + // to set `block_contents_pinned`.
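The `key`/`ckey` values used above come from `GetCacheKey`, which appends the varint64-encoded block offset to a per-file prefix so that every block in a file gets a distinct, stable cache key. A standalone sketch of that encoding (the prefix bytes are illustrative; `EncodeVarint64` follows the scheme in util/coding.h):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

static char* EncodeVarint64(char* dst, uint64_t v) {
  unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
  while (v >= 128) {
    *(ptr++) = static_cast<unsigned char>((v & 127) | 128);
    v >>= 7;
  }
  *(ptr++) = static_cast<unsigned char>(v);
  return reinterpret_cast<char*>(ptr);
}

int main() {
  char buf[64];
  const std::string prefix = "per-file-unique-id";  // illustrative bytes
  std::memcpy(buf, prefix.data(), prefix.size());
  const uint64_t block_offset = 4096;  // taken from the BlockHandle
  char* end = EncodeVarint64(buf + prefix.size(), block_offset);
  std::string cache_key(buf, static_cast<std::size_t>(end - buf));
  return cache_key.size() > prefix.size() ? 0 : 1;
}
```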
return block->second.value->NewIterator( &rep->internal_comparator, rep->internal_comparator.user_comparator(), nullptr, kNullStats, true, index_key_includes_seq_, index_key_is_full_); @@ -1933,7 +2267,7 @@ bool BlockBasedTable::PrefixMayMatch( // Then, try find it within each block // we already know prefix_extractor and prefix_extractor_name must match // because `CheckPrefixMayMatch` first checks `check_filter_ == true` - unique_ptr> iiter( + std::unique_ptr> iiter( NewIndexIterator(no_io_read_options, /* need_upper_bound_check */ false)); iiter->Seek(internal_prefix); @@ -2015,9 +2349,8 @@ void BlockBasedTableIterator::Seek(const Slice& target) { assert( !block_iter_.Valid() || (key_includes_seq_ && icomp_.Compare(target, block_iter_.key()) <= 0) || - (!key_includes_seq_ && - icomp_.user_comparator()->Compare(ExtractUserKey(target), - block_iter_.key()) <= 0)); + (!key_includes_seq_ && user_comparator_.Compare(ExtractUserKey(target), + block_iter_.key()) <= 0)); } template @@ -2184,9 +2517,8 @@ void BlockBasedTableIterator::FindKeyForward() { bool reached_upper_bound = (read_options_.iterate_upper_bound != nullptr && block_iter_points_to_real_block_ && block_iter_.Valid() && - icomp_.user_comparator()->Compare(ExtractUserKey(block_iter_.key()), - *read_options_.iterate_upper_bound) >= - 0); + user_comparator_.Compare(ExtractUserKey(block_iter_.key()), + *read_options_.iterate_upper_bound) >= 0); TEST_SYNC_POINT_CALLBACK( "BlockBasedTable::BlockEntryIteratorState::KeyReachedUpperBound", &reached_upper_bound); @@ -2235,7 +2567,7 @@ InternalIterator* BlockBasedTable::NewIterator( !skip_filters && !read_options.total_order_seek && prefix_extractor != nullptr, need_upper_bound_check, prefix_extractor, kIsNotIndex, - true /*key_includes_seq*/, for_compaction); + true /*key_includes_seq*/, true /*index_key_is_full*/, for_compaction); } else { auto* mem = arena->AllocateAligned(sizeof(BlockBasedTableIterator)); @@ -2245,37 +2577,21 @@ InternalIterator* BlockBasedTable::NewIterator( !skip_filters && !read_options.total_order_seek && prefix_extractor != nullptr, need_upper_bound_check, prefix_extractor, kIsNotIndex, - true /*key_includes_seq*/, for_compaction); + true /*key_includes_seq*/, true /*index_key_is_full*/, for_compaction); } } -InternalIterator* BlockBasedTable::NewRangeTombstoneIterator( +FragmentedRangeTombstoneIterator* BlockBasedTable::NewRangeTombstoneIterator( const ReadOptions& read_options) { - if (rep_->range_del_handle.IsNull()) { - // The block didn't exist, nullptr indicates no range tombstones. + if (rep_->fragmented_range_dels == nullptr) { return nullptr; } - if (rep_->range_del_entry.cache_handle != nullptr) { - // We have a handle to an uncompressed block cache entry that's held for - // this table's lifetime. Increment its refcount before returning an - // iterator based on it since the returned iterator may outlive this table - // reader. 
- assert(rep_->range_del_entry.value != nullptr); - Cache* block_cache = rep_->table_options.block_cache.get(); - assert(block_cache != nullptr); - if (block_cache->Ref(rep_->range_del_entry.cache_handle)) { - auto iter = rep_->range_del_entry.value->NewIterator( - &rep_->internal_comparator, - rep_->internal_comparator.user_comparator()); - iter->RegisterCleanup(&ReleaseCachedEntry, block_cache, - rep_->range_del_entry.cache_handle); - return iter; - } + SequenceNumber snapshot = kMaxSequenceNumber; + if (read_options.snapshot != nullptr) { + snapshot = read_options.snapshot->GetSequenceNumber(); } - // The meta-block exists but isn't in uncompressed block cache (maybe - // because it is disabled), so go through the full lookup process. - return NewDataBlockIterator(rep_, read_options, - rep_->range_del_handle); + return new FragmentedRangeTombstoneIterator( + rep_->fragmented_range_dels, rep_->internal_comparator, snapshot); } bool BlockBasedTable::FullFilterKeyMayMatch( @@ -2302,6 +2618,7 @@ bool BlockBasedTable::FullFilterKeyMayMatch( } if (may_match) { RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_POSITIVE); + PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_full_positive, 1, rep_->level); } return may_match; } @@ -2326,6 +2643,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key, if (!FullFilterKeyMayMatch(read_options, filter, key, no_io, prefix_extractor)) { RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_USEFUL); + PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_useful, 1, rep_->level); } else { IndexBlockIter iiter_on_stack; // if prefix_extractor found in block differs from options, disable @@ -2358,12 +2676,14 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key, // TODO: think about interaction with Merge. If a user key cannot // cross one data block, we should be fine. 
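The snapshot handling in the new `NewRangeTombstoneIterator` above reduces to picking the sequence number at which reads are pinned. A minimal sketch, using a stand-in `Snapshot` type and RocksDB's 56-bit `kMaxSequenceNumber` as the "see everything" default:

```cpp
#include <cstdint>

using SequenceNumber = uint64_t;
// RocksDB sequence numbers are 56-bit; the maximum acts as "read the latest".
static const SequenceNumber kMaxSequenceNumber = (0x1ull << 56) - 1;

struct Snapshot {  // stand-in for rocksdb::Snapshot
  SequenceNumber seq;
  SequenceNumber GetSequenceNumber() const { return seq; }
};

static SequenceNumber ReadSeqForIterator(const Snapshot* snapshot) {
  SequenceNumber read_seq = kMaxSequenceNumber;  // default: every fragment visible
  if (snapshot != nullptr) {
    read_seq = snapshot->GetSequenceNumber();    // pin reads to the snapshot
  }
  return read_seq;
}

int main() {
  Snapshot snap{42};
  bool ok = ReadSeqForIterator(&snap) == 42 &&
            ReadSeqForIterator(nullptr) == kMaxSequenceNumber;
  return ok ? 0 : 1;
}
```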
RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_USEFUL); + PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_useful, 1, rep_->level); break; } else { DataBlockIter biter; NewDataBlockIterator( rep_, read_options, iiter->value(), &biter, false, - true /* key_includes_seq */, get_context); + true /* key_includes_seq */, true /* index_key_is_full */, + get_context); if (read_options.read_tier == kBlockCacheTier && biter.status().IsIncomplete()) { @@ -2410,6 +2730,8 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key, } if (matched && filter != nullptr && !filter->IsBlockBased()) { RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_TRUE_POSITIVE); + PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_full_true_positive, 1, + rep_->level); } if (s.ok()) { s = iiter->status(); @@ -2489,7 +2811,7 @@ Status BlockBasedTable::VerifyChecksum() { std::unique_ptr meta_iter; s = ReadMetaBlock(rep_, nullptr /* prefetch buffer */, &meta, &meta_iter); if (s.ok()) { - s = VerifyChecksumInBlocks(meta_iter.get()); + s = VerifyChecksumInMetaBlocks(meta_iter.get()); if (!s.ok()) { return s; } @@ -2523,12 +2845,11 @@ Status BlockBasedTable::VerifyChecksumInBlocks( } BlockHandle handle = index_iter->value(); BlockContents contents; - Slice dummy_comp_dict; - BlockFetcher block_fetcher(rep_->file.get(), nullptr /* prefetch buffer */, - rep_->footer, ReadOptions(), handle, &contents, - rep_->ioptions, false /* decompress */, - dummy_comp_dict /*compression dict*/, - rep_->persistent_cache_options); + BlockFetcher block_fetcher( + rep_->file.get(), nullptr /* prefetch buffer */, rep_->footer, + ReadOptions(), handle, &contents, rep_->ioptions, + false /* decompress */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options); s = block_fetcher.ReadBlockContents(); if (!s.ok()) { break; @@ -2537,7 +2858,7 @@ Status BlockBasedTable::VerifyChecksumInBlocks( return s; } -Status BlockBasedTable::VerifyChecksumInBlocks( +Status BlockBasedTable::VerifyChecksumInMetaBlocks( InternalIteratorBase* index_iter) { Status s; for (index_iter->SeekToFirst(); index_iter->Valid(); index_iter->Next()) { @@ -2549,13 +2870,19 @@ Status BlockBasedTable::VerifyChecksumInBlocks( Slice input = index_iter->value(); s = handle.DecodeFrom(&input); BlockContents contents; - Slice dummy_comp_dict; - BlockFetcher block_fetcher(rep_->file.get(), nullptr /* prefetch buffer */, - rep_->footer, ReadOptions(), handle, &contents, - rep_->ioptions, false /* decompress */, - dummy_comp_dict /*compression dict*/, - rep_->persistent_cache_options); + BlockFetcher block_fetcher( + rep_->file.get(), nullptr /* prefetch buffer */, rep_->footer, + ReadOptions(), handle, &contents, rep_->ioptions, + false /* decompress */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options); s = block_fetcher.ReadBlockContents(); + if (s.IsCorruption() && index_iter->key() == kPropertiesBlock) { + TableProperties* table_properties; + s = TryReadPropertiesWithGlobalSeqno(rep_, nullptr /* prefetch_buffer */, + index_iter->value(), + &table_properties); + delete table_properties; + } if (!s.ok()) { break; } @@ -2582,12 +2909,24 @@ bool BlockBasedTable::TEST_KeyInCache(const ReadOptions& options, Slice ckey; Status s; - s = GetDataBlockFromCache( - cache_key, ckey, block_cache, nullptr, rep_->ioptions, options, &block, - rep_->table_options.format_version, - rep_->compression_dict_block ? 
rep_->compression_dict_block->data - : Slice(), - 0 /* read_amp_bytes_per_bit */); + if (!rep_->compression_dict_handle.IsNull()) { + std::unique_ptr compression_dict_block; + s = ReadCompressionDictBlock(rep_, nullptr /* prefetch_buffer */, + &compression_dict_block); + if (s.ok()) { + assert(compression_dict_block != nullptr); + UncompressionDict uncompression_dict( + compression_dict_block->data.ToString(), + rep_->blocks_definitely_zstd_compressed); + s = GetDataBlockFromCache(cache_key, ckey, block_cache, nullptr, rep_, + options, &block, uncompression_dict, + 0 /* read_amp_bytes_per_bit */); + } + } else { + s = GetDataBlockFromCache( + cache_key, ckey, block_cache, nullptr, rep_, options, &block, + UncompressionDict::GetEmptyDict(), 0 /* read_amp_bytes_per_bit */); + } assert(s.ok()); bool in_cache = block.value != nullptr; if (in_cache) { @@ -2644,7 +2983,8 @@ Status BlockBasedTable::CreateIndexReader( rep_->table_properties == nullptr || rep_->table_properties->index_key_is_user_key == 0, rep_->table_properties == nullptr || - rep_->table_properties->index_value_is_delta_encoded == 0); + rep_->table_properties->index_value_is_delta_encoded == 0, + GetMemoryAllocator(rep_->table_options)); } case BlockBasedTableOptions::kBinarySearch: { return BinarySearchIndexReader::Create( @@ -2653,7 +2993,8 @@ Status BlockBasedTable::CreateIndexReader( rep_->table_properties == nullptr || rep_->table_properties->index_key_is_user_key == 0, rep_->table_properties == nullptr || - rep_->table_properties->index_value_is_delta_encoded == 0); + rep_->table_properties->index_value_is_delta_encoded == 0, + GetMemoryAllocator(rep_->table_options)); } case BlockBasedTableOptions::kHashSearch: { std::unique_ptr meta_guard; @@ -2675,7 +3016,8 @@ Status BlockBasedTable::CreateIndexReader( rep_->table_properties == nullptr || rep_->table_properties->index_key_is_user_key == 0, rep_->table_properties == nullptr || - rep_->table_properties->index_value_is_delta_encoded == 0); + rep_->table_properties->index_value_is_delta_encoded == 0, + GetMemoryAllocator(rep_->table_options)); } meta_index_iter = meta_iter_guard.get(); } @@ -2688,7 +3030,8 @@ Status BlockBasedTable::CreateIndexReader( rep_->table_properties == nullptr || rep_->table_properties->index_key_is_user_key == 0, rep_->table_properties == nullptr || - rep_->table_properties->index_value_is_delta_encoded == 0); + rep_->table_properties->index_value_is_delta_encoded == 0, + GetMemoryAllocator(rep_->table_options)); } default: { std::string error_message = @@ -2699,7 +3042,7 @@ Status BlockBasedTable::CreateIndexReader( } uint64_t BlockBasedTable::ApproximateOffsetOf(const Slice& key) { - unique_ptr> index_iter( + std::unique_ptr> index_iter( NewIndexIterator(ReadOptions())); index_iter->Seek(key); @@ -2853,11 +3196,11 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file, BlockHandle handle; if (FindMetaBlock(meta_iter.get(), filter_block_key, &handle).ok()) { BlockContents block; - Slice dummy_comp_dict; BlockFetcher block_fetcher( rep_->file.get(), nullptr /* prefetch_buffer */, rep_->footer, ReadOptions(), handle, &block, rep_->ioptions, - false /*decompress*/, dummy_comp_dict /*compression dict*/, + false /*decompress*/, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options); s = block_fetcher.ReadBlockContents(); if (!s.ok()) { @@ -2886,8 +3229,15 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file, } // Output compression dictionary - if (rep_->compression_dict_block != nullptr) { - auto 
compression_dict = rep_->compression_dict_block->data; + if (!rep_->compression_dict_handle.IsNull()) { + std::unique_ptr compression_dict_block; + s = ReadCompressionDictBlock(rep_, nullptr /* prefetch_buffer */, + &compression_dict_block); + if (!s.ok()) { + return s; + } + assert(compression_dict_block != nullptr); + auto compression_dict = compression_dict_block->data; out_file->Append( "Compression Dictionary:\n" "--------------------------------------\n"); @@ -2925,22 +3275,36 @@ void BlockBasedTable::Close() { if (rep_->closed) { return; } - rep_->filter_entry.Release(rep_->table_options.block_cache.get()); - rep_->index_entry.Release(rep_->table_options.block_cache.get()); - rep_->range_del_entry.Release(rep_->table_options.block_cache.get()); - // cleanup index and filter blocks to avoid accessing dangling pointer + + Cache* const cache = rep_->table_options.block_cache.get(); + + rep_->filter_entry.Release(cache); + rep_->index_entry.Release(cache); + + // cleanup index, filter, and compression dictionary blocks + // to avoid accessing dangling pointers if (!rep_->table_options.no_block_cache) { char cache_key[kMaxCacheKeyPrefixSize + kMaxVarint64Length]; + // Get the filter block key auto key = GetCacheKey(rep_->cache_key_prefix, rep_->cache_key_prefix_size, rep_->filter_handle, cache_key); - rep_->table_options.block_cache.get()->Erase(key); + cache->Erase(key); + // Get the index block key key = GetCacheKeyFromOffset(rep_->cache_key_prefix, rep_->cache_key_prefix_size, rep_->dummy_index_reader_offset, cache_key); - rep_->table_options.block_cache.get()->Erase(key); + cache->Erase(key); + + if (!rep_->compression_dict_handle.IsNull()) { + // Get the compression dictionary block key + key = GetCacheKey(rep_->cache_key_prefix, rep_->cache_key_prefix_size, + rep_->compression_dict_handle, cache_key); + cache->Erase(key); + } } + rep_->closed = true; } @@ -3131,6 +3495,13 @@ void DeleteCachedIndexEntry(const Slice& /*key*/, void* value) { delete index_reader; } +void DeleteCachedUncompressionDictEntry(const Slice& /*key*/, void* value) { + UncompressionDict* dict = reinterpret_cast(value); + RecordTick(dict->statistics(), BLOCK_CACHE_COMPRESSION_DICT_BYTES_EVICT, + dict->ApproximateMemoryUsage()); + delete dict; +} + } // anonymous namespace } // namespace rocksdb diff --git a/ceph/src/rocksdb/table/block_based_table_reader.h b/ceph/src/rocksdb/table/block_based_table_reader.h index 3cada0c2c..f0b5cdb1b 100644 --- a/ceph/src/rocksdb/table/block_based_table_reader.h +++ b/ceph/src/rocksdb/table/block_based_table_reader.h @@ -16,6 +16,7 @@ #include #include +#include "db/range_tombstone_fragmenter.h" #include "options/cf_options.h" #include "rocksdb/options.h" #include "rocksdb/persistent_cache.h" @@ -32,6 +33,7 @@ #include "table/two_level_iterator.h" #include "util/coding.h" #include "util/file_reader_writer.h" +#include "util/user_comparator_wrapper.h" namespace rocksdb { @@ -52,8 +54,6 @@ struct EnvOptions; struct ReadOptions; class GetContext; -using std::unique_ptr; - typedef std::vector> KVPairBlock; // A Table is a sorted map from strings to strings. 
Tables are @@ -88,8 +88,9 @@ class BlockBasedTable : public TableReader { const EnvOptions& env_options, const BlockBasedTableOptions& table_options, const InternalKeyComparator& internal_key_comparator, - unique_ptr&& file, - uint64_t file_size, unique_ptr* table_reader, + std::unique_ptr&& file, + uint64_t file_size, + std::unique_ptr* table_reader, const SliceTransform* prefix_extractor = nullptr, bool prefetch_index_and_filter_in_cache = true, bool skip_filters = false, int level = -1, @@ -112,7 +113,7 @@ class BlockBasedTable : public TableReader { bool skip_filters = false, bool for_compaction = false) override; - InternalIterator* NewRangeTombstoneIterator( + FragmentedRangeTombstoneIterator* NewRangeTombstoneIterator( const ReadOptions& read_options) override; // @param skip_filters Disables loading/accessing the filter block @@ -255,13 +256,11 @@ class BlockBasedTable : public TableReader { // @param block_entry value is set to the uncompressed block if found. If // in uncompressed block cache, also sets cache_handle to reference that // block. - static Status MaybeLoadDataBlockToCache(FilePrefetchBuffer* prefetch_buffer, - Rep* rep, const ReadOptions& ro, - const BlockHandle& handle, - Slice compression_dict, - CachableEntry* block_entry, - bool is_index = false, - GetContext* get_context = nullptr); + static Status MaybeReadBlockAndLoadToCache( + FilePrefetchBuffer* prefetch_buffer, Rep* rep, const ReadOptions& ro, + const BlockHandle& handle, const UncompressionDict& uncompression_dict, + CachableEntry* block_entry, bool is_index = false, + GetContext* get_context = nullptr); // For the following two functions: // if `no_io == true`, we will not try to read filter/index from sst file @@ -275,6 +274,10 @@ const bool is_a_filter_partition, bool no_io, GetContext* get_context, const SliceTransform* prefix_extractor = nullptr) const; + static CachableEntry GetUncompressionDict( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, bool no_io, + GetContext* get_context); + // Get the iterator from the index reader. // If input_iter is not set, return new Iterator // If input_iter is set, update it and return it as Iterator @@ -295,15 +298,16 @@ // block_cache_compressed. // On success, Status::OK will be returned and @block will be populated with // pointer to the block as well as its block handle. - // @param compression_dict Data for presetting the compression library's + // @param uncompression_dict Data for presetting the compression library's // dictionary. static Status GetDataBlockFromCache( const Slice& block_cache_key, const Slice& compressed_block_cache_key, - Cache* block_cache, Cache* block_cache_compressed, - const ImmutableCFOptions& ioptions, const ReadOptions& read_options, - BlockBasedTable::CachableEntry* block, uint32_t format_version, - const Slice& compression_dict, size_t read_amp_bytes_per_bit, - bool is_index = false, GetContext* get_context = nullptr); + Cache* block_cache, Cache* block_cache_compressed, Rep* rep, + const ReadOptions& read_options, + BlockBasedTable::CachableEntry* block, + const UncompressionDict& uncompression_dict, + size_t read_amp_bytes_per_bit, bool is_index = false, + GetContext* get_context = nullptr); // Put a raw block (maybe compressed) to the corresponding block caches.
// This method will perform decompression against raw_block if needed and then @@ -311,16 +315,18 @@ class BlockBasedTable : public TableReader { // On success, Status::OK will be returned; also @block will be populated with // uncompressed block and its cache handle. // - // REQUIRES: raw_block is heap-allocated. PutDataBlockToCache() will be - // responsible for releasing its memory if error occurs. - // @param compression_dict Data for presetting the compression library's + // Allocated memory managed by raw_block_contents will be transferred to + // PutDataBlockToCache(). After the call, the object will be invalid. + // @param uncompression_dict Data for presetting the compression library's // dictionary. static Status PutDataBlockToCache( const Slice& block_cache_key, const Slice& compressed_block_cache_key, Cache* block_cache, Cache* block_cache_compressed, const ReadOptions& read_options, const ImmutableCFOptions& ioptions, - CachableEntry* block, Block* raw_block, uint32_t format_version, - const Slice& compression_dict, size_t read_amp_bytes_per_bit, + CachableEntry* block, BlockContents* raw_block_contents, + CompressionType raw_block_comp_type, uint32_t format_version, + const UncompressionDict& uncompression_dict, SequenceNumber seq_no, + size_t read_amp_bytes_per_bit, MemoryAllocator* memory_allocator, bool is_index = false, Cache::Priority pri = Cache::Priority::LOW, GetContext* get_context = nullptr); @@ -349,12 +355,36 @@ class BlockBasedTable : public TableReader { const Slice& user_key, const bool no_io, const SliceTransform* prefix_extractor = nullptr) const; - // Read the meta block from sst. + static Status PrefetchTail( + RandomAccessFileReader* file, uint64_t file_size, + TailPrefetchStats* tail_prefetch_stats, const bool prefetch_all, + const bool preload_all, + std::unique_ptr* prefetch_buffer); static Status ReadMetaBlock(Rep* rep, FilePrefetchBuffer* prefetch_buffer, std::unique_ptr* meta_block, std::unique_ptr* iter); - - Status VerifyChecksumInBlocks(InternalIteratorBase* index_iter); + static Status TryReadPropertiesWithGlobalSeqno( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, const Slice& handle_value, + TableProperties** table_properties); + static Status ReadPropertiesBlock(Rep* rep, + FilePrefetchBuffer* prefetch_buffer, + InternalIterator* meta_iter, + const SequenceNumber largest_seqno); + static Status ReadRangeDelBlock( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, + InternalIterator* meta_iter, + const InternalKeyComparator& internal_comparator); + static Status ReadCompressionDictBlock( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, + std::unique_ptr* compression_dict_block); + static Status PrefetchIndexAndFilterBlocks( + Rep* rep, FilePrefetchBuffer* prefetch_buffer, + InternalIterator* meta_iter, BlockBasedTable* new_table, + const SliceTransform* prefix_extractor, bool prefetch_all, + const BlockBasedTableOptions& table_options, const int level, + const bool prefetch_index_and_filter_in_cache); + + Status VerifyChecksumInMetaBlocks(InternalIteratorBase* index_iter); Status VerifyChecksumInBlocks(InternalIteratorBase* index_iter); // Create the filter from the filter block. 
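[Note: the declarations above replace the old monolithic open path with staged helpers. The sketch below shows one plausible wiring, for illustration only; it is not the actual BlockBasedTable::Open body. The inputs (file, file_size, tail_prefetch_stats, the prefetch flags, largest_seqno, icomp, prefix_extractor, new_table) are assumed to come from the caller, and error handling is condensed. ReadCompressionDictBlock is omitted because the dictionary is fetched on demand, e.g. via GetUncompressionDict.]

    std::unique_ptr<FilePrefetchBuffer> prefetch_buffer;
    std::unique_ptr<Block> metaindex;
    std::unique_ptr<InternalIterator> meta_iter;
    // 1. Read the file tail once so the meta-block reads below are served
    //    from memory rather than issuing separate I/Os.
    Status s = PrefetchTail(file, file_size, tail_prefetch_stats, prefetch_all,
                            preload_all, &prefetch_buffer);
    // 2. Decode the footer and open an iterator over the metaindex block.
    if (s.ok()) {
      s = ReadMetaBlock(rep, prefetch_buffer.get(), &metaindex, &meta_iter);
    }
    // 3. Properties first: they settle global-seqno handling and whether the
    //    data blocks are known to be ZSTD-compressed.
    if (s.ok()) {
      s = ReadPropertiesBlock(rep, prefetch_buffer.get(), meta_iter.get(),
                              largest_seqno);
    }
    // 4. Fragment the range-deletion meta-block up front for the new
    //    FragmentedRangeTombstoneIterator read path.
    if (s.ok()) {
      s = ReadRangeDelBlock(rep, prefetch_buffer.get(), meta_iter.get(), icomp);
    }
    // 5. Optionally warm (and possibly pin) the index and filter blocks.
    if (s.ok()) {
      s = PrefetchIndexAndFilterBlocks(
          rep, prefetch_buffer.get(), meta_iter.get(), new_table,
          prefix_extractor, prefetch_all, table_options, level,
          prefetch_index_and_filter_in_cache);
    }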
@@ -366,10 +396,10 @@ class BlockBasedTable : public TableReader { static void SetupCacheKeyPrefix(Rep* rep, uint64_t file_size); // Generate a cache key prefix from the file - static void GenerateCachePrefix(Cache* cc, - RandomAccessFile* file, char* buffer, size_t* size); - static void GenerateCachePrefix(Cache* cc, - WritableFile* file, char* buffer, size_t* size); + static void GenerateCachePrefix(Cache* cc, RandomAccessFile* file, + char* buffer, size_t* size); + static void GenerateCachePrefix(Cache* cc, WritableFile* file, char* buffer, + size_t* size); // Helper functions for DumpTable() Status DumpIndexBlock(WritableFile* out_file); @@ -431,7 +461,7 @@ struct BlockBasedTable::Rep { Rep(const ImmutableCFOptions& _ioptions, const EnvOptions& _env_options, const BlockBasedTableOptions& _table_opt, const InternalKeyComparator& _internal_comparator, bool skip_filters, - const bool _immortal_table) + int _level, const bool _immortal_table) : ioptions(_ioptions), env_options(_env_options), table_options(_table_opt), @@ -442,8 +472,8 @@ struct BlockBasedTable::Rep { hash_index_allow_collision(false), whole_key_filtering(_table_opt.whole_key_filtering), prefix_filtering(true), - range_del_handle(BlockHandle::NullBlockHandle()), global_seqno(kDisableGlobalSequenceNumber), + level(_level), immortal_table(_immortal_table) {} const ImmutableCFOptions& ioptions; @@ -452,7 +482,7 @@ struct BlockBasedTable::Rep { const FilterPolicy* const filter_policy; const InternalKeyComparator& internal_comparator; Status status; - unique_ptr file; + std::unique_ptr file; char cache_key_prefix[kMaxCacheKeyPrefixSize]; size_t cache_key_prefix_size = 0; char persistent_cache_key_prefix[kMaxCacheKeyPrefixSize]; @@ -465,11 +495,15 @@ struct BlockBasedTable::Rep { // Footer contains the fixed table information Footer footer; - // index_reader and filter will be populated and used only when - // options.block_cache is nullptr; otherwise we will get the index block via - // the block cache. - unique_ptr index_reader; - unique_ptr filter; + // `index_reader`, `filter`, and `uncompression_dict` will be populated (i.e., + // non-nullptr) and used only when options.block_cache is nullptr or when + // `cache_index_and_filter_blocks == false`. Otherwise, we will get the index, + // filter, and compression dictionary blocks via the block cache. In that case + // `dummy_index_reader_offset`, `filter_handle`, and `compression_dict_handle` + // are used to lookup these meta-blocks in block cache. + std::unique_ptr index_reader; + std::unique_ptr filter; + std::unique_ptr uncompression_dict; enum class FilterType { kNoFilter, @@ -479,13 +513,9 @@ struct BlockBasedTable::Rep { }; FilterType filter_type; BlockHandle filter_handle; + BlockHandle compression_dict_handle; std::shared_ptr table_properties; - // Block containing the data for the compression dictionary. We take ownership - // for the entire block struct, even though we only use its Slice member. This - // is easier because the Slice member depends on the continued existence of - // another member ("allocation"). - std::unique_ptr compression_dict_block; BlockBasedTableOptions::IndexType index_type; bool hash_index_allow_collision; bool whole_key_filtering; @@ -494,7 +524,7 @@ struct BlockBasedTable::Rep { // module should not be relying on db module. However to make things easier // and compatible with existing code, we introduce a wrapper that allows // block to extract prefix without knowing if a key is internal or not. 
- unique_ptr internal_prefix_transform; + std::unique_ptr internal_prefix_transform; std::shared_ptr table_prefix_extractor; // only used in level 0 files when pin_l0_filter_and_index_blocks_in_cache is @@ -505,10 +535,7 @@ struct BlockBasedTable::Rep { // push flush them out, hence they're pinned CachableEntry filter_entry; CachableEntry index_entry; - // range deletion meta-block is pinned through reader's lifetime when LRU - // cache is enabled. - CachableEntry range_del_entry; - BlockHandle range_del_handle; + std::shared_ptr fragmented_range_dels; // If global_seqno is used, all Keys in this file will have the same // seqno with value `global_seqno`. @@ -517,12 +544,26 @@ struct BlockBasedTable::Rep { // and every key have it's own seqno. SequenceNumber global_seqno; + // the level when the table is opened, could potentially change when trivial + // move is involved + int level; + // If false, blocks in this file are definitely all uncompressed. Knowing this // before reading individual blocks enables certain optimizations. bool blocks_maybe_compressed = true; + // If true, data blocks in this file are definitely ZSTD compressed. If false + // they might not be. When false we skip creating a ZSTD digested + // uncompression dictionary. Even if we get a false negative, things should + // still work, just not as quickly. + bool blocks_definitely_zstd_compressed = false; + bool closed = false; const bool immortal_table; + + SequenceNumber get_global_seqno(bool is_index) const { + return is_index ? kDisableGlobalSequenceNumber : global_seqno; + } }; template @@ -540,6 +581,7 @@ class BlockBasedTableIterator : public InternalIteratorBase { : table_(table), read_options_(read_options), icomp_(icomp), + user_comparator_(icomp.user_comparator()), index_iter_(index_iter), pinned_iters_mgr_(nullptr), block_iter_points_to_real_block_(false), @@ -636,6 +678,7 @@ class BlockBasedTableIterator : public InternalIteratorBase { BlockBasedTable* table_; const ReadOptions read_options_; const InternalKeyComparator& icomp_; + UserComparatorWrapper user_comparator_; InternalIteratorBase* index_iter_; PinnedIteratorsManager* pinned_iters_mgr_; TBlockIter block_iter_; diff --git a/ceph/src/rocksdb/table/block_builder.cc b/ceph/src/rocksdb/table/block_builder.cc index 37b407ea5..c14b4f6d3 100644 --- a/ceph/src/rocksdb/table/block_builder.cc +++ b/ceph/src/rocksdb/table/block_builder.cc @@ -64,14 +64,14 @@ BlockBuilder::BlockBuilder( assert(0); } assert(block_restart_interval_ >= 1); - restarts_.push_back(0); // First restart point is at offset 0 + restarts_.push_back(0); // First restart point is at offset 0 estimate_ = sizeof(uint32_t) + sizeof(uint32_t); } void BlockBuilder::Reset() { buffer_.clear(); restarts_.clear(); - restarts_.push_back(0); // First restart point is at offset 0 + restarts_.push_back(0); // First restart point is at offset 0 estimate_ = sizeof(uint32_t) + sizeof(uint32_t); counter_ = 0; finished_ = false; @@ -81,8 +81,8 @@ void BlockBuilder::Reset() { } } -size_t BlockBuilder::EstimateSizeAfterKV(const Slice& key, const Slice& value) - const { +size_t BlockBuilder::EstimateSizeAfterKV(const Slice& key, + const Slice& value) const { size_t estimate = CurrentSizeEstimate(); // Note: this is an imprecise estimate as it accounts for the whole key size // instead of non-shared key size. 
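[Note: for the estimator reformatted in the hunk above and the one that follows, the arithmetic is: raw key/value bytes, plus a varint for the shared-prefix length, a varint for the key length, usually a varint for the value length, and one extra uint32_t restart offset every block_restart_interval entries. A self-contained approximation follows; the flattened parameters are illustrative stand-ins, not the actual BlockBuilder members.]

    #include <cstddef>
    #include <cstdint>

    // One byte per 7 payload bits, as in util/coding.h varints.
    static size_t VarintLength(uint64_t v) {
      size_t len = 1;
      while (v >= 128) {
        v >>= 7;
        ++len;
      }
      return len;
    }

    // Rough stand-in for BlockBuilder::EstimateSizeAfterKV: the block size
    // after appending (key, value). It over-counts the whole key because the
    // shared prefix is unknown until Add() actually runs.
    size_t EstimateAfterKV(size_t current_size, size_t key_size,
                           size_t value_size, int counter,
                           int block_restart_interval) {
      size_t estimate = current_size + key_size + value_size;
      if (counter >= block_restart_interval) {
        estimate += sizeof(uint32_t);  // a new restart entry
      }
      estimate += sizeof(int32_t);  // (over)estimate of the shared-prefix varint
      estimate += VarintLength(key_size);    // varint for key length
      estimate += VarintLength(value_size);  // varint for value length
      return estimate;
    }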
@@ -95,13 +95,13 @@ size_t BlockBuilder::EstimateSizeAfterKV(const Slice& key, const Slice& value) : value.size() / 2; if (counter_ >= block_restart_interval_) { - estimate += sizeof(uint32_t); // a new restart entry. + estimate += sizeof(uint32_t); // a new restart entry. } - estimate += sizeof(int32_t); // varint for shared prefix length. + estimate += sizeof(int32_t); // varint for shared prefix length. // Note: this is an imprecise estimate as we will have two encoded sizes, one // for the shared key and one for the non-shared key. - estimate += VarintLength(key.size()); // varint for key length. + estimate += VarintLength(key.size()); // varint for key length. if (!use_value_delta_encoding_ || (counter_ >= block_restart_interval_)) { estimate += VarintLength(value.size()); // varint for value length. } diff --git a/ceph/src/rocksdb/table/block_builder.h b/ceph/src/rocksdb/table/block_builder.h index 3b7fc1768..0576279f5 100644 --- a/ceph/src/rocksdb/table/block_builder.h +++ b/ceph/src/rocksdb/table/block_builder.h @@ -54,23 +54,21 @@ class BlockBuilder { size_t EstimateSizeAfterKV(const Slice& key, const Slice& value) const; // Return true iff no entries have been added since the last Reset() - bool empty() const { - return buffer_.empty(); - } + bool empty() const { return buffer_.empty(); } private: - const int block_restart_interval_; + const int block_restart_interval_; // TODO(myabandeh): put it into a separate IndexBlockBuilder - const bool use_delta_encoding_; + const bool use_delta_encoding_; // Refer to BlockIter::DecodeCurrentValue for format of delta encoded values const bool use_value_delta_encoding_; - std::string buffer_; // Destination buffer + std::string buffer_; // Destination buffer std::vector restarts_; // Restart points - size_t estimate_; - int counter_; // Number of entries emitted since restart - bool finished_; // Has Finish() been called?
+ std::string last_key_; DataBlockHashIndexBuilder data_block_hash_index_builder_; }; diff --git a/ceph/src/rocksdb/table/block_fetcher.cc b/ceph/src/rocksdb/table/block_fetcher.cc index ea97066ec..1f209210c 100644 --- a/ceph/src/rocksdb/table/block_fetcher.cc +++ b/ceph/src/rocksdb/table/block_fetcher.cc @@ -9,29 +9,29 @@ #include "table/block_fetcher.h" -#include #include +#include #include "monitoring/perf_context_imp.h" #include "monitoring/statistics.h" #include "rocksdb/env.h" #include "table/block.h" #include "table/block_based_table_reader.h" -#include "table/persistent_cache_helper.h" #include "table/format.h" +#include "table/persistent_cache_helper.h" #include "util/coding.h" #include "util/compression.h" #include "util/crc32c.h" #include "util/file_reader_writer.h" #include "util/logging.h" +#include "util/memory_allocator.h" #include "util/stop_watch.h" #include "util/string_util.h" #include "util/xxhash.h" namespace rocksdb { -inline -void BlockFetcher::CheckBlockChecksum() { +inline void BlockFetcher::CheckBlockChecksum() { // Check the crc of the type and the block contents if (read_options_.verify_checksums) { const char* data = slice_.data(); // Pointer to where Read put the data @@ -48,6 +48,11 @@ void BlockFetcher::CheckBlockChecksum() { case kxxHash: actual = XXH32(data, static_cast(block_size_) + 1, 0); break; + case kxxHash64: + actual = static_cast( + XXH64(data, static_cast(block_size_) + 1, 0) & + uint64_t{0xffffffff}); + break; default: status_ = Status::Corruption( "unknown checksum type " + ToString(footer_.checksum()) + " in " + @@ -63,8 +68,7 @@ void BlockFetcher::CheckBlockChecksum() { } } -inline -bool BlockFetcher::TryGetUncompressBlockFromPersistentCache() { +inline bool BlockFetcher::TryGetUncompressBlockFromPersistentCache() { if (cache_options_.persistent_cache && !cache_options_.persistent_cache->IsCompressed()) { Status status = PersistentCacheHelper::LookupUncompressedPage( @@ -85,8 +89,7 @@ bool BlockFetcher::TryGetUncompressBlockFromPersistentCache() { return false; } -inline -bool BlockFetcher::TryGetFromPrefetchBuffer() { +inline bool BlockFetcher::TryGetFromPrefetchBuffer() { if (prefetch_buffer_ != nullptr && prefetch_buffer_->TryReadFromCache( handle_.offset(), @@ -102,14 +105,15 @@ bool BlockFetcher::TryGetFromPrefetchBuffer() { return got_from_prefetch_buffer_; } -inline -bool BlockFetcher::TryGetCompressedBlockFromPersistentCache() { +inline bool BlockFetcher::TryGetCompressedBlockFromPersistentCache() { if (cache_options_.persistent_cache && cache_options_.persistent_cache->IsCompressed()) { // lookup uncompressed cache mode p-cache + std::unique_ptr raw_data; status_ = PersistentCacheHelper::LookupRawPage( - cache_options_, handle_, &heap_buf_, block_size_ + kBlockTrailerSize); + cache_options_, handle_, &raw_data, block_size_ + kBlockTrailerSize); if (status_.ok()) { + heap_buf_ = CacheAllocationPtr(raw_data.release()); used_buf_ = heap_buf_.get(); slice_ = Slice(heap_buf_.get(), block_size_); return true; @@ -123,22 +127,25 @@ bool BlockFetcher::TryGetCompressedBlockFromPersistentCache() { return false; } -inline -void BlockFetcher::PrepareBufferForBlockFromFile() { +inline void BlockFetcher::PrepareBufferForBlockFromFile() { // cache miss read from device if (do_uncompress_ && block_size_ + kBlockTrailerSize < kDefaultStackBufferSize) { // If we've got a small enough hunk of data, read it in to the // trivially allocated stack buffer instead of needing a full malloc() used_buf_ = &stack_buf_[0]; + } else if (maybe_compressed_ && 
!do_uncompress_) { + compressed_buf_ = AllocateBlock(block_size_ + kBlockTrailerSize, + memory_allocator_compressed_); + used_buf_ = compressed_buf_.get(); } else { - heap_buf_.reset(new char[block_size_ + kBlockTrailerSize]); + heap_buf_ = + AllocateBlock(block_size_ + kBlockTrailerSize, memory_allocator_); used_buf_ = heap_buf_.get(); } } -inline -void BlockFetcher::InsertCompressedBlockToPersistentCacheIfNeeded() { +inline void BlockFetcher::InsertCompressedBlockToPersistentCacheIfNeeded() { if (status_.ok() && read_options_.fill_cache && cache_options_.persistent_cache && cache_options_.persistent_cache->IsCompressed()) { @@ -148,8 +155,7 @@ void BlockFetcher::InsertCompressedBlockToPersistentCacheIfNeeded() { } } -inline -void BlockFetcher::InsertUncompressedBlockToPersistentCacheIfNeeded() { +inline void BlockFetcher::InsertUncompressedBlockToPersistentCacheIfNeeded() { if (status_.ok() && !got_from_prefetch_buffer_ && read_options_.fill_cache && cache_options_.persistent_cache && !cache_options_.persistent_cache->IsCompressed()) { @@ -159,29 +165,44 @@ void BlockFetcher::InsertUncompressedBlockToPersistentCacheIfNeeded() { } } -inline -void BlockFetcher::GetBlockContents() { +inline void BlockFetcher::CopyBufferToHeap() { + assert(used_buf_ != heap_buf_.get()); + heap_buf_ = AllocateBlock(block_size_ + kBlockTrailerSize, memory_allocator_); + memcpy(heap_buf_.get(), used_buf_, block_size_ + kBlockTrailerSize); +} + +inline void BlockFetcher::GetBlockContents() { if (slice_.data() != used_buf_) { // the slice content is not the buffer provided - *contents_ = BlockContents(Slice(slice_.data(), block_size_), - immortal_source_, compression_type); + *contents_ = BlockContents(Slice(slice_.data(), block_size_)); } else { // page can be either uncompressed or compressed, the buffer either stack // or heap provided. 
Refer to https://github.com/facebook/rocksdb/pull/4096 if (got_from_prefetch_buffer_ || used_buf_ == &stack_buf_[0]) { - assert(used_buf_ != heap_buf_.get()); - heap_buf_.reset(new char[block_size_ + kBlockTrailerSize]); - memcpy(heap_buf_.get(), used_buf_, block_size_ + kBlockTrailerSize); + CopyBufferToHeap(); + } else if (used_buf_ == compressed_buf_.get()) { + if (compression_type_ == kNoCompression && + memory_allocator_ != memory_allocator_compressed_) { + CopyBufferToHeap(); + } else { + heap_buf_ = std::move(compressed_buf_); + } } - *contents_ = BlockContents(std::move(heap_buf_), block_size_, true, - compression_type); + *contents_ = BlockContents(std::move(heap_buf_), block_size_); } +#ifndef NDEBUG + contents_->is_raw_block = true; +#endif } Status BlockFetcher::ReadBlockContents() { block_size_ = static_cast(handle_.size()); if (TryGetUncompressBlockFromPersistentCache()) { + compression_type_ = kNoCompression; +#ifndef NDEBUG + contents_->is_raw_block = true; +#endif // NDEBUG return Status::OK(); } if (TryGetFromPrefetchBuffer()) { @@ -222,15 +243,16 @@ Status BlockFetcher::ReadBlockContents() { PERF_TIMER_GUARD(block_decompress_time); - compression_type = - static_cast(slice_.data()[block_size_]); + compression_type_ = get_block_compression_type(slice_.data(), block_size_); - if (do_uncompress_ && compression_type != kNoCompression) { + if (do_uncompress_ && compression_type_ != kNoCompression) { // compressed page, uncompress, update cache - UncompressionContext uncompression_ctx(compression_type, compression_dict_); - status_ = - UncompressBlockContents(uncompression_ctx, slice_.data(), block_size_, - contents_, footer_.version(), ioptions_); + UncompressionContext context(compression_type_); + UncompressionInfo info(context, uncompression_dict_, compression_type_); + status_ = UncompressBlockContents(info, slice_.data(), block_size_, + contents_, footer_.version(), ioptions_, + memory_allocator_); + compression_type_ = kNoCompression; } else { GetBlockContents(); } diff --git a/ceph/src/rocksdb/table/block_fetcher.h b/ceph/src/rocksdb/table/block_fetcher.h index 9e0d2448d..b5fee9415 100644 --- a/ceph/src/rocksdb/table/block_fetcher.h +++ b/ceph/src/rocksdb/table/block_fetcher.h @@ -10,6 +10,7 @@ #pragma once #include "table/block.h" #include "table/format.h" +#include "util/memory_allocator.h" namespace rocksdb { class BlockFetcher { @@ -18,15 +19,17 @@ class BlockFetcher { // The only relevant option is options.verify_checksums for now. // On failure return non-OK. // On success fill *result and return OK - caller owns *result - // @param compression_dict Data for presetting the compression library's + // @param uncompression_dict Data for presetting the compression library's // dictionary. 
BlockFetcher(RandomAccessFileReader* file, FilePrefetchBuffer* prefetch_buffer, const Footer& footer, const ReadOptions& read_options, const BlockHandle& handle, BlockContents* contents, const ImmutableCFOptions& ioptions, - bool do_uncompress, const Slice& compression_dict, + bool do_uncompress, bool maybe_compressed, + const UncompressionDict& uncompression_dict, const PersistentCacheOptions& cache_options, - const bool immortal_source = false) + MemoryAllocator* memory_allocator = nullptr, + MemoryAllocator* memory_allocator_compressed = nullptr) : file_(file), prefetch_buffer_(prefetch_buffer), footer_(footer), @@ -35,10 +38,13 @@ class BlockFetcher { contents_(contents), ioptions_(ioptions), do_uncompress_(do_uncompress), - immortal_source_(immortal_source), - compression_dict_(compression_dict), - cache_options_(cache_options) {} + maybe_compressed_(maybe_compressed), + uncompression_dict_(uncompression_dict), + cache_options_(cache_options), + memory_allocator_(memory_allocator), + memory_allocator_compressed_(memory_allocator_compressed) {} Status ReadBlockContents(); + CompressionType get_compression_type() const { return compression_type_; } private: static const uint32_t kDefaultStackBufferSize = 5000; @@ -51,17 +57,20 @@ class BlockFetcher { BlockContents* contents_; const ImmutableCFOptions& ioptions_; bool do_uncompress_; - const bool immortal_source_; - const Slice& compression_dict_; + bool maybe_compressed_; + const UncompressionDict& uncompression_dict_; const PersistentCacheOptions& cache_options_; + MemoryAllocator* memory_allocator_; + MemoryAllocator* memory_allocator_compressed_; Status status_; Slice slice_; char* used_buf_ = nullptr; size_t block_size_; - std::unique_ptr heap_buf_; + CacheAllocationPtr heap_buf_; + CacheAllocationPtr compressed_buf_; char stack_buf_[kDefaultStackBufferSize]; bool got_from_prefetch_buffer_ = false; - rocksdb::CompressionType compression_type; + rocksdb::CompressionType compression_type_; // return true if found bool TryGetUncompressBlockFromPersistentCache(); @@ -69,6 +78,8 @@ class BlockFetcher { bool TryGetFromPrefetchBuffer(); bool TryGetCompressedBlockFromPersistentCache(); void PrepareBufferForBlockFromFile(); + // Copy content from used_buf_ to new heap buffer. 
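+ // Needed when the block currently sits in the stack buffer, the prefetch + // buffer, or the compressed-read buffer with a mismatched allocator, and + // ownership must be handed off via heap_buf_ (see GetBlockContents()).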
+ void CopyBufferToHeap(); void GetBlockContents(); void InsertCompressedBlockToPersistentCacheIfNeeded(); void InsertUncompressedBlockToPersistentCacheIfNeeded(); diff --git a/ceph/src/rocksdb/table/block_prefix_index.cc b/ceph/src/rocksdb/table/block_prefix_index.cc index df37b5fc2..67c749d4c 100644 --- a/ceph/src/rocksdb/table/block_prefix_index.cc +++ b/ceph/src/rocksdb/table/block_prefix_index.cc @@ -41,9 +41,7 @@ inline uint32_t PrefixToBucket(const Slice& prefix, uint32_t num_buckets) { const uint32_t kNoneBlock = 0x7FFFFFFF; const uint32_t kBlockArrayMask = 0x80000000; -inline bool IsNone(uint32_t block_id) { - return block_id == kNoneBlock; -} +inline bool IsNone(uint32_t block_id) { return block_id == kNoneBlock; } inline bool IsBlockId(uint32_t block_id) { return (block_id & kBlockArrayMask) == 0; @@ -74,10 +72,9 @@ class BlockPrefixIndex::Builder { explicit Builder(const SliceTransform* internal_prefix_extractor) : internal_prefix_extractor_(internal_prefix_extractor) {} - void Add(const Slice& key_prefix, uint32_t start_block, - uint32_t num_blocks) { + void Add(const Slice& key_prefix, uint32_t start_block, uint32_t num_blocks) { PrefixRecord* record = reinterpret_cast( - arena_.AllocateAligned(sizeof(PrefixRecord))); + arena_.AllocateAligned(sizeof(PrefixRecord))); record->prefix = key_prefix; record->start_block = start_block; record->end_block = start_block + num_blocks - 1; @@ -169,7 +166,6 @@ class BlockPrefixIndex::Builder { Arena arena_; }; - Status BlockPrefixIndex::Create(const SliceTransform* internal_prefix_extractor, const Slice& prefixes, const Slice& prefix_meta, BlockPrefixIndex** prefix_index) { @@ -191,7 +187,7 @@ Status BlockPrefixIndex::Create(const SliceTransform* internal_prefix_extractor, } if (pos + prefix_size > prefixes.size()) { s = Status::Corruption( - "Corrupted prefix meta block: size inconsistency."); + "Corrupted prefix meta block: size inconsistency."); break; } Slice prefix(prefixes.data() + pos, prefix_size); @@ -211,8 +207,7 @@ Status BlockPrefixIndex::Create(const SliceTransform* internal_prefix_extractor, return s; } -uint32_t BlockPrefixIndex::GetBlocks(const Slice& key, - uint32_t** blocks) { +uint32_t BlockPrefixIndex::GetBlocks(const Slice& key, uint32_t** blocks) { Slice prefix = internal_prefix_extractor_->Transform(key); uint32_t bucket = PrefixToBucket(prefix, num_buckets_); @@ -226,7 +221,7 @@ uint32_t BlockPrefixIndex::GetBlocks(const Slice& key, } else { uint32_t index = DecodeIndex(block_id); assert(index < num_block_array_buffer_entries_); - *blocks = &block_array_buffer_[index+1]; + *blocks = &block_array_buffer_[index + 1]; uint32_t num_blocks = block_array_buffer_[index]; assert(num_blocks > 1); assert(index + num_blocks < num_block_array_buffer_entries_); diff --git a/ceph/src/rocksdb/table/block_prefix_index.h b/ceph/src/rocksdb/table/block_prefix_index.h index dd4282d17..105606db2 100644 --- a/ceph/src/rocksdb/table/block_prefix_index.h +++ b/ceph/src/rocksdb/table/block_prefix_index.h @@ -19,7 +19,6 @@ class SliceTransform; // that index block. class BlockPrefixIndex { public: - // Maps a key to a list of data blocks that could potentially contain // the key, based on the prefix. 
// Returns the total number of relevant blocks, 0 means the key does @@ -28,7 +27,7 @@ class BlockPrefixIndex { size_t ApproximateMemoryUsage() const { return sizeof(BlockPrefixIndex) + - (num_block_array_buffer_entries_ + num_buckets_) * sizeof(uint32_t); + (num_block_array_buffer_entries_ + num_buckets_) * sizeof(uint32_t); } // Create hash index by reading from the metadata blocks. @@ -48,8 +47,7 @@ class BlockPrefixIndex { friend Builder; BlockPrefixIndex(const SliceTransform* internal_prefix_extractor, - uint32_t num_buckets, - uint32_t* buckets, + uint32_t num_buckets, uint32_t* buckets, uint32_t num_block_array_buffer_entries, uint32_t* block_array_buffer) : internal_prefix_extractor_(internal_prefix_extractor), diff --git a/ceph/src/rocksdb/table/block_test.cc b/ceph/src/rocksdb/table/block_test.cc index 0ca6ec3f6..3e0ff3eab 100644 --- a/ceph/src/rocksdb/table/block_test.cc +++ b/ceph/src/rocksdb/table/block_test.cc @@ -12,13 +12,13 @@ #include #include "db/dbformat.h" -#include "db/write_batch_internal.h" #include "db/memtable.h" +#include "db/write_batch_internal.h" #include "rocksdb/db.h" #include "rocksdb/env.h" #include "rocksdb/iterator.h" -#include "rocksdb/table.h" #include "rocksdb/slice_transform.h" +#include "rocksdb/table.h" #include "table/block.h" #include "table/block_builder.h" #include "table/format.h" @@ -28,7 +28,7 @@ namespace rocksdb { -static std::string RandomString(Random* rnd, int len) { +static std::string RandomString(Random *rnd, int len) { std::string r; test::RandomString(rnd, len, &r); return r; @@ -117,15 +117,13 @@ TEST_F(BlockTest, SimpleTest) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); // read contents of block sequentially int count = 0; InternalIterator *iter = reader.NewIterator(options.comparator, options.comparator); - for (iter->SeekToFirst();iter->Valid(); count++, iter->Next()) { - + for (iter->SeekToFirst(); iter->Valid(); count++, iter->Next()) { // read kv from block Slice k = iter->key(); Slice v = iter->value(); @@ -140,7 +138,6 @@ TEST_F(BlockTest, SimpleTest) { iter = reader.NewIterator(options.comparator, options.comparator); for (int i = 0; i < num_records; i++) { - // find a random key in the lookaside array int index = rnd.Uniform(num_records); Slice k(keys[index]); @@ -188,7 +185,6 @@ TEST_F(BlockTest, ValueDeltaEncodingTest) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); const bool kTotalOrderSeek = true; @@ -247,7 +243,6 @@ BlockContents GetBlockContents(std::unique_ptr *builder, BlockContents contents; contents.data = rawblock; - contents.cachable = false; return contents; } @@ -257,8 +252,7 @@ void CheckBlockContents(BlockContents contents, const int max_key, const std::vector &values) { const size_t prefix_size = 6; // create block reader - BlockContents contents_ref(contents.data, contents.cachable, - contents.compression_type); + BlockContents contents_ref(contents.data); Block reader1(std::move(contents), kDisableGlobalSequenceNumber); Block reader2(std::move(contents_ref), kDisableGlobalSequenceNumber); @@ -379,9 +373,9 @@ class BlockReadAmpBitmapSlowAndAccurate { TEST_F(BlockTest, BlockReadAmpBitmap) { uint32_t pin_offset = 0; SyncPoint::GetInstance()->SetCallBack( - "BlockReadAmpBitmap:rnd", [&pin_offset](void* arg) { - pin_offset = *(static_cast(arg)); - }); + 
"BlockReadAmpBitmap:rnd", [&pin_offset](void *arg) { + pin_offset = *(static_cast(arg)); + }); SyncPoint::GetInstance()->EnableProcessing(); std::vector block_sizes = { 1, // 1 byte @@ -447,11 +441,11 @@ TEST_F(BlockTest, BlockReadAmpBitmap) { size_t total_bits = 0; for (size_t bit_idx = 0; bit_idx < needed_bits; bit_idx++) { total_bits += read_amp_slow_and_accurate.IsPinMarked( - bit_idx * kBytesPerBit + pin_offset); + bit_idx * kBytesPerBit + pin_offset); } size_t expected_estimate_useful = total_bits * kBytesPerBit; size_t got_estimate_useful = - stats->getTickerCount(READ_AMP_ESTIMATE_USEFUL_BYTES); + stats->getTickerCount(READ_AMP_ESTIMATE_USEFUL_BYTES); ASSERT_EQ(expected_estimate_useful, got_estimate_useful); } } @@ -486,7 +480,6 @@ TEST_F(BlockTest, BlockWithReadAmpBitmap) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = true; Block reader(std::move(contents), kDisableGlobalSequenceNumber, kBytesPerBit, stats.get()); @@ -521,7 +514,6 @@ TEST_F(BlockTest, BlockWithReadAmpBitmap) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = true; Block reader(std::move(contents), kDisableGlobalSequenceNumber, kBytesPerBit, stats.get()); @@ -558,7 +550,6 @@ TEST_F(BlockTest, BlockWithReadAmpBitmap) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = true; Block reader(std::move(contents), kDisableGlobalSequenceNumber, kBytesPerBit, stats.get()); diff --git a/ceph/src/rocksdb/table/bloom_block.h b/ceph/src/rocksdb/table/bloom_block.h index 9ff610bad..483fa25d9 100644 --- a/ceph/src/rocksdb/table/bloom_block.h +++ b/ceph/src/rocksdb/table/bloom_block.h @@ -15,8 +15,7 @@ class BloomBlockBuilder { public: static const std::string kBloomBlock; - explicit BloomBlockBuilder(uint32_t num_probes = 6) - : bloom_(num_probes, nullptr) {} + explicit BloomBlockBuilder(uint32_t num_probes = 6) : bloom_(num_probes) {} void SetTotalBits(Allocator* allocator, uint32_t total_bits, uint32_t locality, size_t huge_page_tlb_size, diff --git a/ceph/src/rocksdb/table/cuckoo_table_builder.cc b/ceph/src/rocksdb/table/cuckoo_table_builder.cc index 7d9842a95..f590e6ad4 100644 --- a/ceph/src/rocksdb/table/cuckoo_table_builder.cc +++ b/ceph/src/rocksdb/table/cuckoo_table_builder.cc @@ -289,6 +289,7 @@ Status CuckooTableBuilder::Finish() { } } properties_.num_entries = num_entries_; + properties_.num_deletions = num_entries_ - num_values_; properties_.fixed_key_len = key_size_; properties_.user_collected_properties[ CuckooTablePropertyNames::kValueLength].assign( diff --git a/ceph/src/rocksdb/table/cuckoo_table_builder_test.cc b/ceph/src/rocksdb/table/cuckoo_table_builder_test.cc index 27eacf6ec..c1e350327 100644 --- a/ceph/src/rocksdb/table/cuckoo_table_builder_test.cc +++ b/ceph/src/rocksdb/table/cuckoo_table_builder_test.cc @@ -43,8 +43,15 @@ class CuckooBuilderTest : public testing::Test { std::string expected_unused_bucket, uint64_t expected_table_size, uint32_t expected_num_hash_func, bool expected_is_last_level, uint32_t expected_cuckoo_block_size = 1) { + uint64_t num_deletions = 0; + for (const auto& key : keys) { + ParsedInternalKey parsed; + if (ParseInternalKey(key, &parsed) && parsed.type == kTypeDeletion) { + num_deletions++; + } + } // Read file - unique_ptr read_file; + std::unique_ptr read_file; ASSERT_OK(env_->NewRandomAccessFile(fname, &read_file, env_options_)); uint64_t read_file_size; ASSERT_OK(env_->GetFileSize(fname, &read_file_size)); @@ -56,7 +63,7 @@ class 
CuckooBuilderTest : public testing::Test { // Assert Table Properties. TableProperties* props = nullptr; - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(read_file), fname)); ASSERT_OK(ReadTableProperties(file_reader.get(), read_file_size, kCuckooTableMagicNumber, ioptions, @@ -90,6 +97,7 @@ class CuckooBuilderTest : public testing::Test { ASSERT_EQ(expected_is_last_level, is_last_level_found); ASSERT_EQ(props->num_entries, keys.size()); + ASSERT_EQ(props->num_deletions, num_deletions); ASSERT_EQ(props->fixed_key_len, keys.empty() ? 0 : keys[0].size()); ASSERT_EQ(props->data_size, expected_unused_bucket.size() * (expected_table_size + expected_cuckoo_block_size - 1)); @@ -126,9 +134,10 @@ class CuckooBuilderTest : public testing::Test { } } - std::string GetInternalKey(Slice user_key, bool zero_seqno) { + std::string GetInternalKey(Slice user_key, bool zero_seqno, + ValueType type = kTypeValue) { IterKey ikey; - ikey.SetInternalKey(user_key, zero_seqno ? 0 : 1000, kTypeValue); + ikey.SetInternalKey(user_key, zero_seqno ? 0 : 1000, type); return ikey.GetInternalKey().ToString(); } @@ -152,10 +161,10 @@ class CuckooBuilderTest : public testing::Test { }; TEST_F(CuckooBuilderTest, SuccessWithEmptyFile) { - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("EmptyFile"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, 4, 100, BytewiseComparator(), 1, false, false, @@ -169,50 +178,57 @@ TEST_F(CuckooBuilderTest, SuccessWithEmptyFile) { } TEST_F(CuckooBuilderTest, WriteSuccessNoCollisionFullKey) { - uint32_t num_hash_fun = 4; - std::vector user_keys = {"key01", "key02", "key03", "key04"}; - std::vector values = {"v01", "v02", "v03", "v04"}; - // Need to have a temporary variable here as VS compiler does not currently - // support operator= with initializer_list as a parameter - std::unordered_map> hm = { - {user_keys[0], {0, 1, 2, 3}}, - {user_keys[1], {1, 2, 3, 4}}, - {user_keys[2], {2, 3, 4, 5}}, - {user_keys[3], {3, 4, 5, 6}}}; - hash_map = std::move(hm); - - std::vector expected_locations = {0, 1, 2, 3}; - std::vector keys; - for (auto& user_key : user_keys) { - keys.push_back(GetInternalKey(user_key, false)); - } - uint64_t expected_table_size = GetExpectedTableSize(keys.size()); - - unique_ptr writable_file; - fname = test::PerThreadDBPath("NoCollisionFullKey"); - ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( - new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); - CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, - 100, BytewiseComparator(), 1, false, false, - GetSliceHash, 0 /* column_family_id */, - kDefaultColumnFamilyName); - ASSERT_OK(builder.status()); - for (uint32_t i = 0; i < user_keys.size(); i++) { - builder.Add(Slice(keys[i]), Slice(values[i])); - ASSERT_EQ(builder.NumEntries(), i + 1); + for (auto type : {kTypeValue, kTypeDeletion}) { + uint32_t num_hash_fun = 4; + std::vector user_keys = {"key01", "key02", "key03", "key04"}; + std::vector values; + if (type == kTypeValue) { + values = {"v01", "v02", "v03", "v04"}; + } else { + values = {"", "", "", ""}; + } + // Need to have a temporary variable here as VS compiler does not currently + // support operator= with initializer_list as a 
parameter + std::unordered_map> hm = { + {user_keys[0], {0, 1, 2, 3}}, + {user_keys[1], {1, 2, 3, 4}}, + {user_keys[2], {2, 3, 4, 5}}, + {user_keys[3], {3, 4, 5, 6}}}; + hash_map = std::move(hm); + + std::vector expected_locations = {0, 1, 2, 3}; + std::vector keys; + for (auto& user_key : user_keys) { + keys.push_back(GetInternalKey(user_key, false, type)); + } + uint64_t expected_table_size = GetExpectedTableSize(keys.size()); + + std::unique_ptr writable_file; + fname = test::PerThreadDBPath("NoCollisionFullKey"); + ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); + std::unique_ptr file_writer( + new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); + CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, + 100, BytewiseComparator(), 1, false, false, + GetSliceHash, 0 /* column_family_id */, + kDefaultColumnFamilyName); ASSERT_OK(builder.status()); + for (uint32_t i = 0; i < user_keys.size(); i++) { + builder.Add(Slice(keys[i]), Slice(values[i])); + ASSERT_EQ(builder.NumEntries(), i + 1); + ASSERT_OK(builder.status()); + } + size_t bucket_size = keys[0].size() + values[0].size(); + ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); + ASSERT_OK(builder.Finish()); + ASSERT_OK(file_writer->Close()); + ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); + + std::string expected_unused_bucket = GetInternalKey("key00", true); + expected_unused_bucket += std::string(values[0].size(), 'a'); + CheckFileContents(keys, values, expected_locations, expected_unused_bucket, + expected_table_size, 2, false); } - size_t bucket_size = keys[0].size() + values[0].size(); - ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); - ASSERT_OK(builder.Finish()); - ASSERT_OK(file_writer->Close()); - ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); - - std::string expected_unused_bucket = GetInternalKey("key00", true); - expected_unused_bucket += std::string(values[0].size(), 'a'); - CheckFileContents(keys, values, expected_locations, - expected_unused_bucket, expected_table_size, 2, false); } TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionFullKey) { @@ -236,10 +252,10 @@ TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionFullKey) { } uint64_t expected_table_size = GetExpectedTableSize(keys.size()); - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("WithCollisionFullKey"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 100, BytewiseComparator(), 1, false, false, @@ -284,11 +300,11 @@ TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionAndCuckooBlock) { } uint64_t expected_table_size = GetExpectedTableSize(keys.size()); - unique_ptr writable_file; + std::unique_ptr writable_file; uint32_t cuckoo_block_size = 2; fname = test::PerThreadDBPath("WithCollisionFullKey2"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder( file_writer.get(), kHashTableRatio, num_hash_fun, 100, @@ -338,10 +354,10 @@ TEST_F(CuckooBuilderTest, WithCollisionPathFullKey) { } uint64_t expected_table_size = GetExpectedTableSize(keys.size()); - unique_ptr writable_file; + 
std::unique_ptr writable_file; fname = test::PerThreadDBPath("WithCollisionPathFullKey"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 100, BytewiseComparator(), 1, false, false, @@ -388,10 +404,10 @@ TEST_F(CuckooBuilderTest, WithCollisionPathFullKeyAndCuckooBlock) { } uint64_t expected_table_size = GetExpectedTableSize(keys.size()); - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("WithCollisionPathFullKeyAndCuckooBlock"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 100, BytewiseComparator(), 2, false, false, @@ -431,10 +447,10 @@ TEST_F(CuckooBuilderTest, WriteSuccessNoCollisionUserKey) { std::vector expected_locations = {0, 1, 2, 3}; uint64_t expected_table_size = GetExpectedTableSize(user_keys.size()); - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("NoCollisionUserKey"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 100, BytewiseComparator(), 1, false, false, @@ -475,10 +491,10 @@ TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionUserKey) { std::vector expected_locations = {0, 1, 2, 3}; uint64_t expected_table_size = GetExpectedTableSize(user_keys.size()); - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("WithCollisionUserKey"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 100, BytewiseComparator(), 1, false, false, @@ -521,10 +537,10 @@ TEST_F(CuckooBuilderTest, WithCollisionPathUserKey) { std::vector expected_locations = {0, 1, 3, 4, 2}; uint64_t expected_table_size = GetExpectedTableSize(user_keys.size()); - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("WithCollisionPathUserKey"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 2, BytewiseComparator(), 1, false, false, @@ -566,10 +582,10 @@ TEST_F(CuckooBuilderTest, FailWhenCollisionPathTooLong) { }; hash_map = std::move(hm); - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("WithCollisionPathUserKey"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 2, BytewiseComparator(), 1, false, false, @@ -594,10 +610,10 @@ TEST_F(CuckooBuilderTest, FailWhenSameKeyInserted) 
{ uint32_t num_hash_fun = 4; std::string user_key = "repeatedkey"; - unique_ptr writable_file; + std::unique_ptr writable_file; fname = test::PerThreadDBPath("FailWhenSameKeyInserted"); ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, EnvOptions())); CuckooTableBuilder builder(file_writer.get(), kHashTableRatio, num_hash_fun, 100, BytewiseComparator(), 1, false, false, diff --git a/ceph/src/rocksdb/table/cuckoo_table_factory.cc b/ceph/src/rocksdb/table/cuckoo_table_factory.cc index 84d22468e..74d18d512 100644 --- a/ceph/src/rocksdb/table/cuckoo_table_factory.cc +++ b/ceph/src/rocksdb/table/cuckoo_table_factory.cc @@ -14,7 +14,7 @@ namespace rocksdb { Status CuckooTableFactory::NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, + std::unique_ptr&& file, uint64_t file_size, std::unique_ptr* table, bool /*prefetch_index_and_filter_in_cache*/) const { std::unique_ptr new_reader(new CuckooTableReader( diff --git a/ceph/src/rocksdb/table/cuckoo_table_factory.h b/ceph/src/rocksdb/table/cuckoo_table_factory.h index a96635de5..eb3c5e517 100644 --- a/ceph/src/rocksdb/table/cuckoo_table_factory.h +++ b/ceph/src/rocksdb/table/cuckoo_table_factory.h @@ -60,8 +60,8 @@ class CuckooTableFactory : public TableFactory { Status NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table, bool prefetch_index_and_filter_in_cache = true) const override; TableBuilder* NewTableBuilder( diff --git a/ceph/src/rocksdb/table/cuckoo_table_reader.cc b/ceph/src/rocksdb/table/cuckoo_table_reader.cc index be7b1ffa9..f4df2467f 100644 --- a/ceph/src/rocksdb/table/cuckoo_table_reader.cc +++ b/ceph/src/rocksdb/table/cuckoo_table_reader.cc @@ -197,7 +197,7 @@ void CuckooTableReader::Prepare(const Slice& key) { class CuckooTableIterator : public InternalIterator { public: explicit CuckooTableIterator(CuckooTableReader* reader); - ~CuckooTableIterator() {} + ~CuckooTableIterator() override {} bool Valid() const override; void SeekToFirst() override; void SeekToLast() override; diff --git a/ceph/src/rocksdb/table/cuckoo_table_reader_test.cc b/ceph/src/rocksdb/table/cuckoo_table_reader_test.cc index 36083c547..74fb52e6c 100644 --- a/ceph/src/rocksdb/table/cuckoo_table_reader_test.cc +++ b/ceph/src/rocksdb/table/cuckoo_table_reader_test.cc @@ -95,7 +95,7 @@ class CuckooReaderTest : public testing::Test { const Comparator* ucomp = BytewiseComparator()) { std::unique_ptr writable_file; ASSERT_OK(env->NewWritableFile(fname, &writable_file, env_options)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, env_options)); CuckooTableBuilder builder( @@ -115,7 +115,7 @@ class CuckooReaderTest : public testing::Test { // Check reader now. 
std::unique_ptr read_file; ASSERT_OK(env->NewRandomAccessFile(fname, &read_file, env_options)); - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(read_file), fname)); const ImmutableCFOptions ioptions(options); CuckooTableReader reader(ioptions, std::move(file_reader), file_size, ucomp, @@ -144,7 +144,7 @@ class CuckooReaderTest : public testing::Test { void CheckIterator(const Comparator* ucomp = BytewiseComparator()) { std::unique_ptr read_file; ASSERT_OK(env->NewRandomAccessFile(fname, &read_file, env_options)); - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(read_file), fname)); const ImmutableCFOptions ioptions(options); CuckooTableReader reader(ioptions, std::move(file_reader), file_size, ucomp, @@ -323,7 +323,7 @@ TEST_F(CuckooReaderTest, WhenKeyNotFound) { CreateCuckooFileAndCheckReader(); std::unique_ptr read_file; ASSERT_OK(env->NewRandomAccessFile(fname, &read_file, env_options)); - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(read_file), fname)); const ImmutableCFOptions ioptions(options); CuckooTableReader reader(ioptions, std::move(file_reader), file_size, ucmp, @@ -411,7 +411,7 @@ void WriteFile(const std::vector& keys, std::unique_ptr writable_file; ASSERT_OK(env->NewWritableFile(fname, &writable_file, env_options)); - unique_ptr file_writer( + std::unique_ptr file_writer( new WritableFileWriter(std::move(writable_file), fname, env_options)); CuckooTableBuilder builder( file_writer.get(), hash_ratio, 64, 1000, test::Uint64Comparator(), 5, @@ -432,7 +432,7 @@ void WriteFile(const std::vector& keys, env->GetFileSize(fname, &file_size); std::unique_ptr read_file; ASSERT_OK(env->NewRandomAccessFile(fname, &read_file, env_options)); - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(read_file), fname)); const ImmutableCFOptions ioptions(options); @@ -464,7 +464,7 @@ void ReadKeys(uint64_t num, uint32_t batch_size) { env->GetFileSize(fname, &file_size); std::unique_ptr read_file; ASSERT_OK(env->NewRandomAccessFile(fname, &read_file, env_options)); - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(read_file), fname)); const ImmutableCFOptions ioptions(options); diff --git a/ceph/src/rocksdb/table/data_block_hash_index_test.cc b/ceph/src/rocksdb/table/data_block_hash_index_test.cc index dc62917f2..11226648e 100644 --- a/ceph/src/rocksdb/table/data_block_hash_index_test.cc +++ b/ceph/src/rocksdb/table/data_block_hash_index_test.cc @@ -7,12 +7,14 @@ #include #include +#include "db/table_properties_collector.h" #include "rocksdb/slice.h" #include "table/block.h" #include "table/block_based_table_reader.h" #include "table/block_builder.h" #include "table/data_block_hash_index.h" #include "table/get_context.h" +#include "table/table_builder.h" #include "util/testharness.h" #include "util/testutil.h" @@ -282,7 +284,6 @@ TEST(DataBlockHashIndex, BlockRestartIndexExceedMax) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); ASSERT_EQ(reader.IndexType(), @@ -305,7 +306,6 @@ TEST(DataBlockHashIndex, BlockRestartIndexExceedMax) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); ASSERT_EQ(reader.IndexType(), @@ -337,7 +337,6 @@ 
TEST(DataBlockHashIndex, BlockSizeExceedMax) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); ASSERT_EQ(reader.IndexType(), @@ -362,7 +361,6 @@ TEST(DataBlockHashIndex, BlockSizeExceedMax) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); // the index type have fallen back to binary when build finish. @@ -390,7 +388,6 @@ TEST(DataBlockHashIndex, BlockTestSingleKey) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); const InternalKeyComparator icmp(BytewiseComparator()); @@ -472,7 +469,6 @@ TEST(DataBlockHashIndex, BlockTestLarge) { // create block reader BlockContents contents; contents.data = rawblock; - contents.cachable = false; Block reader(std::move(contents), kDisableGlobalSequenceNumber); const InternalKeyComparator icmp(BytewiseComparator()); @@ -540,9 +536,9 @@ TEST(DataBlockHashIndex, BlockTestLarge) { void TestBoundary(InternalKey& ik1, std::string& v1, InternalKey& ik2, std::string& v2, InternalKey& seek_ikey, GetContext& get_context, Options& options) { - unique_ptr file_writer; - unique_ptr file_reader; - unique_ptr table_reader; + std::unique_ptr file_writer; + std::unique_ptr file_reader; + std::unique_ptr table_reader; int level_ = -1; std::vector keys; @@ -555,16 +551,16 @@ void TestBoundary(InternalKey& ik1, std::string& v1, InternalKey& ik2, soptions.use_mmap_reads = ioptions.allow_mmap_reads; file_writer.reset( test::GetWritableFileWriter(new test::StringSink(), "" /* don't care */)); - unique_ptr builder; + std::unique_ptr builder; std::vector> int_tbl_prop_collector_factories; std::string column_family_name; builder.reset(ioptions.table_factory->NewTableBuilder( TableBuilderOptions(ioptions, moptions, internal_comparator, &int_tbl_prop_collector_factories, - options.compression, CompressionOptions(), - nullptr /* compression_dict */, - false /* skip_filters */, column_family_name, level_), + options.compression, options.sample_for_compression, + CompressionOptions(), false /* skip_filters */, + column_family_name, level_), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer.get())); diff --git a/ceph/src/rocksdb/table/flush_block_policy.cc b/ceph/src/rocksdb/table/flush_block_policy.cc index d2a4b9627..1b1675828 100644 --- a/ceph/src/rocksdb/table/flush_block_policy.cc +++ b/ceph/src/rocksdb/table/flush_block_policy.cc @@ -30,8 +30,7 @@ class FlushBlockBySizePolicy : public FlushBlockPolicy { align_(align), data_block_builder_(data_block_builder) {} - virtual bool Update(const Slice& key, - const Slice& value) override { + bool Update(const Slice& key, const Slice& value) override { // it makes no sense to flush when the data block is empty if (data_block_builder_.empty()) { return false; diff --git a/ceph/src/rocksdb/table/format.cc b/ceph/src/rocksdb/table/format.cc index 16d959c3d..476db85f7 100644 --- a/ceph/src/rocksdb/table/format.cc +++ b/ceph/src/rocksdb/table/format.cc @@ -9,8 +9,8 @@ #include "table/format.h" -#include #include +#include #include "monitoring/perf_context_imp.h" #include "monitoring/statistics.h" @@ -24,6 +24,7 @@ #include "util/crc32c.h" #include "util/file_reader_writer.h" #include "util/logging.h" +#include "util/memory_allocator.h" #include "util/stop_watch.h" 
#include "util/string_util.h" #include "util/xxhash.h" @@ -44,7 +45,7 @@ const uint64_t kPlainTableMagicNumber = 0; bool ShouldReportDetailedTime(Env* env, Statistics* stats) { return env != nullptr && stats != nullptr && - stats->stats_level_ > kExceptDetailedTimers; + stats->get_stats_level() > kExceptDetailedTimers; } void BlockHandle::EncodeTo(std::string* dst) const { @@ -55,8 +56,7 @@ void BlockHandle::EncodeTo(std::string* dst) const { } Status BlockHandle::DecodeFrom(Slice* input) { - if (GetVarint64(input, &offset_) && - GetVarint64(input, &size_)) { + if (GetVarint64(input, &offset_) && GetVarint64(input, &size_)) { return Status::OK(); } else { // reset in case failure after partially decoding @@ -158,7 +158,7 @@ Status Footer::DecodeFrom(Slice* input) { assert(input != nullptr); assert(input->size() >= kMinEncodedLength); - const char *magic_ptr = + const char* magic_ptr = input->data() + input->size() - kMagicNumberLengthByte; const uint32_t magic_lo = DecodeFixed32(magic_ptr); const uint32_t magic_hi = DecodeFixed32(magic_ptr + 4); @@ -233,9 +233,10 @@ Status ReadFooterFromFile(RandomAccessFileReader* file, uint64_t file_size, Footer* footer, uint64_t enforce_table_magic_number) { if (file_size < Footer::kMinEncodedLength) { - return Status::Corruption( - "file is too short (" + ToString(file_size) + " bytes) to be an " - "sstable: " + file->file_name()); + return Status::Corruption("file is too short (" + ToString(file_size) + + " bytes) to be an " + "sstable: " + + file->file_name()); } char footer_space[Footer::kMaxEncodedLength]; @@ -256,9 +257,10 @@ Status ReadFooterFromFile(RandomAccessFileReader* file, // Check that we actually read the whole footer from the file. It may be // that size isn't correct. if (footer_input.size() < Footer::kMinEncodedLength) { - return Status::Corruption( - "file is too short (" + ToString(file_size) + " bytes) to be an " - "sstable" + file->file_name()); + return Status::Corruption("file is too short (" + ToString(file_size) + + " bytes) to be an " + "sstable" + + file->file_name()); } s = footer->DecodeFrom(&footer_input); @@ -268,119 +270,121 @@ Status ReadFooterFromFile(RandomAccessFileReader* file, if (enforce_table_magic_number != 0 && enforce_table_magic_number != footer->table_magic_number()) { return Status::Corruption( - "Bad table magic number: expected " - + ToString(enforce_table_magic_number) + ", found " - + ToString(footer->table_magic_number()) - + " in " + file->file_name()); + "Bad table magic number: expected " + + ToString(enforce_table_magic_number) + ", found " + + ToString(footer->table_magic_number()) + " in " + file->file_name()); } return Status::OK(); } Status UncompressBlockContentsForCompressionType( - const UncompressionContext& uncompression_ctx, const char* data, size_t n, + const UncompressionInfo& uncompression_info, const char* data, size_t n, BlockContents* contents, uint32_t format_version, - const ImmutableCFOptions& ioptions) { - std::unique_ptr ubuf; + const ImmutableCFOptions& ioptions, MemoryAllocator* allocator) { + CacheAllocationPtr ubuf; - assert(uncompression_ctx.type() != kNoCompression && + assert(uncompression_info.type() != kNoCompression && "Invalid compression type"); - StopWatchNano timer(ioptions.env, - ShouldReportDetailedTime(ioptions.env, ioptions.statistics)); + StopWatchNano timer(ioptions.env, ShouldReportDetailedTime( + ioptions.env, ioptions.statistics)); int decompress_size = 0; - switch (uncompression_ctx.type()) { + switch (uncompression_info.type()) { case 
kSnappyCompression: { size_t ulength = 0; static char snappy_corrupt_msg[] = - "Snappy not supported or corrupted Snappy compressed block contents"; + "Snappy not supported or corrupted Snappy compressed block contents"; if (!Snappy_GetUncompressedLength(data, n, &ulength)) { return Status::Corruption(snappy_corrupt_msg); } - ubuf.reset(new char[ulength]); + ubuf = AllocateBlock(ulength, allocator); if (!Snappy_Uncompress(data, n, ubuf.get())) { return Status::Corruption(snappy_corrupt_msg); } - *contents = BlockContents(std::move(ubuf), ulength, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), ulength); break; } case kZlibCompression: - ubuf.reset(Zlib_Uncompress( - uncompression_ctx, data, n, &decompress_size, - GetCompressFormatForVersion(kZlibCompression, format_version))); + ubuf = Zlib_Uncompress( + uncompression_info, data, n, &decompress_size, + GetCompressFormatForVersion(kZlibCompression, format_version), + allocator); if (!ubuf) { static char zlib_corrupt_msg[] = - "Zlib not supported or corrupted Zlib compressed block contents"; + "Zlib not supported or corrupted Zlib compressed block contents"; return Status::Corruption(zlib_corrupt_msg); } - *contents = - BlockContents(std::move(ubuf), decompress_size, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), decompress_size); break; case kBZip2Compression: - ubuf.reset(BZip2_Uncompress( + ubuf = BZip2_Uncompress( data, n, &decompress_size, - GetCompressFormatForVersion(kBZip2Compression, format_version))); + GetCompressFormatForVersion(kBZip2Compression, format_version), + allocator); if (!ubuf) { static char bzip2_corrupt_msg[] = - "Bzip2 not supported or corrupted Bzip2 compressed block contents"; + "Bzip2 not supported or corrupted Bzip2 compressed block contents"; return Status::Corruption(bzip2_corrupt_msg); } - *contents = - BlockContents(std::move(ubuf), decompress_size, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), decompress_size); break; case kLZ4Compression: - ubuf.reset(LZ4_Uncompress( - uncompression_ctx, data, n, &decompress_size, - GetCompressFormatForVersion(kLZ4Compression, format_version))); + ubuf = LZ4_Uncompress( + uncompression_info, data, n, &decompress_size, + GetCompressFormatForVersion(kLZ4Compression, format_version), + allocator); if (!ubuf) { static char lz4_corrupt_msg[] = - "LZ4 not supported or corrupted LZ4 compressed block contents"; + "LZ4 not supported or corrupted LZ4 compressed block contents"; return Status::Corruption(lz4_corrupt_msg); } - *contents = - BlockContents(std::move(ubuf), decompress_size, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), decompress_size); break; case kLZ4HCCompression: - ubuf.reset(LZ4_Uncompress( - uncompression_ctx, data, n, &decompress_size, - GetCompressFormatForVersion(kLZ4HCCompression, format_version))); + ubuf = LZ4_Uncompress( + uncompression_info, data, n, &decompress_size, + GetCompressFormatForVersion(kLZ4HCCompression, format_version), + allocator); if (!ubuf) { static char lz4hc_corrupt_msg[] = - "LZ4HC not supported or corrupted LZ4HC compressed block contents"; + "LZ4HC not supported or corrupted LZ4HC compressed block contents"; return Status::Corruption(lz4hc_corrupt_msg); } - *contents = - BlockContents(std::move(ubuf), decompress_size, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), decompress_size); break; case kXpressCompression: + // XPRESS allocates memory internally, thus no support for custom + // allocator. 
ubuf.reset(XPRESS_Uncompress(data, n, &decompress_size)); if (!ubuf) { static char xpress_corrupt_msg[] = - "XPRESS not supported or corrupted XPRESS compressed block contents"; + "XPRESS not supported or corrupted XPRESS compressed block " + "contents"; return Status::Corruption(xpress_corrupt_msg); } - *contents = - BlockContents(std::move(ubuf), decompress_size, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), decompress_size); break; case kZSTD: case kZSTDNotFinalCompression: - ubuf.reset(ZSTD_Uncompress(uncompression_ctx, data, n, &decompress_size)); + ubuf = ZSTD_Uncompress(uncompression_info, data, n, &decompress_size, + allocator); if (!ubuf) { static char zstd_corrupt_msg[] = "ZSTD not supported or corrupted ZSTD compressed block contents"; return Status::Corruption(zstd_corrupt_msg); } - *contents = - BlockContents(std::move(ubuf), decompress_size, true, kNoCompression); + *contents = BlockContents(std::move(ubuf), decompress_size); break; default: return Status::Corruption("bad block type"); } - if(ShouldReportDetailedTime(ioptions.env, ioptions.statistics)){ - MeasureTime(ioptions.statistics, DECOMPRESSION_TIMES_NANOS, - timer.ElapsedNanos()); + if (ShouldReportDetailedTime(ioptions.env, ioptions.statistics)) { + RecordTimeToHistogram(ioptions.statistics, DECOMPRESSION_TIMES_NANOS, + timer.ElapsedNanos()); } - MeasureTime(ioptions.statistics, BYTES_DECOMPRESSED, contents->data.size()); + RecordTimeToHistogram(ioptions.statistics, BYTES_DECOMPRESSED, + contents->data.size()); RecordTick(ioptions.statistics, NUMBER_BLOCK_DECOMPRESSED); return Status::OK(); @@ -393,14 +397,16 @@ Status UncompressBlockContentsForCompressionType( // buffer is returned via 'result' and it is upto the caller to // free this buffer. // format_version is the block format as defined in include/rocksdb/table.h -Status UncompressBlockContents(const UncompressionContext& uncompression_ctx, +Status UncompressBlockContents(const UncompressionInfo& uncompression_info, const char* data, size_t n, BlockContents* contents, uint32_t format_version, - const ImmutableCFOptions& ioptions) { + const ImmutableCFOptions& ioptions, + MemoryAllocator* allocator) { assert(data[n] != kNoCompression); - assert(data[n] == uncompression_ctx.type()); - return UncompressBlockContentsForCompressionType( - uncompression_ctx, data, n, contents, format_version, ioptions); + assert(data[n] == uncompression_info.type()); + return UncompressBlockContentsForCompressionType(uncompression_info, data, n, + contents, format_version, + ioptions, allocator); } } // namespace rocksdb diff --git a/ceph/src/rocksdb/table/format.h b/ceph/src/rocksdb/table/format.h index 6e0e99c1c..f58588505 100644 --- a/ceph/src/rocksdb/table/format.h +++ b/ceph/src/rocksdb/table/format.h @@ -26,6 +26,7 @@ #include "port/port.h" // noexcept #include "table/persistent_cache_options.h" #include "util/file_reader_writer.h" +#include "util/memory_allocator.h" namespace rocksdb { @@ -75,8 +76,8 @@ class BlockHandle { static const BlockHandle kNullBlockHandle; }; -inline uint32_t GetCompressFormatForVersion( - CompressionType compression_type, uint32_t version) { +inline uint32_t GetCompressFormatForVersion(CompressionType compression_type, + uint32_t version) { #ifdef NDEBUG (void)compression_type; #endif @@ -188,28 +189,50 @@ Status ReadFooterFromFile(RandomAccessFileReader* file, // 1-byte type + 32-bit crc static const size_t kBlockTrailerSize = 5; +inline CompressionType get_block_compression_type(const char* block_data, + size_t block_size) { + 
return static_cast(block_data[block_size]); +} + struct BlockContents { - Slice data; // Actual contents of data - bool cachable; // True iff data can be cached - CompressionType compression_type; - std::unique_ptr allocation; + Slice data; // Actual contents of data + CacheAllocationPtr allocation; + +#ifndef NDEBUG + // Whether the block is a raw block, which contains compression type + // byte. It is only used for assertion. + bool is_raw_block = false; +#endif // NDEBUG - BlockContents() : cachable(false), compression_type(kNoCompression) {} + BlockContents() {} - BlockContents(const Slice& _data, bool _cachable, - CompressionType _compression_type) - : data(_data), cachable(_cachable), compression_type(_compression_type) {} + BlockContents(const Slice& _data) : data(_data) {} + + BlockContents(CacheAllocationPtr&& _data, size_t _size) + : data(_data.get(), _size), allocation(std::move(_data)) {} + + BlockContents(std::unique_ptr&& _data, size_t _size) + : data(_data.get(), _size) { + allocation.reset(_data.release()); + } - BlockContents(std::unique_ptr&& _data, size_t _size, bool _cachable, - CompressionType _compression_type) - : data(_data.get(), _size), - cachable(_cachable), - compression_type(_compression_type), - allocation(std::move(_data)) {} + bool own_bytes() const { return allocation.get() != nullptr; } + + // It's the caller's responsibility to make sure that this is + // for raw block contents, which contains the compression + // byte in the end. + CompressionType get_compression_type() const { + assert(is_raw_block); + return get_block_compression_type(data.data(), data.size()); + } // The additional memory space taken by the block data. size_t usable_size() const { if (allocation.get() != nullptr) { + auto allocator = allocation.get_deleter().allocator; + if (allocator) { + return allocator->UsableSize(allocation.get(), data.size()); + } #ifdef ROCKSDB_MALLOC_USABLE_SIZE return malloc_usable_size(allocation.get()); #else @@ -220,15 +243,20 @@ struct BlockContents { } } + size_t ApproximateMemoryUsage() const { + return usable_size() + sizeof(*this); + } + BlockContents(BlockContents&& other) ROCKSDB_NOEXCEPT { *this = std::move(other); } BlockContents& operator=(BlockContents&& other) { data = std::move(other.data); - cachable = other.cachable; - compression_type = other.compression_type; allocation = std::move(other.allocation); +#ifndef NDEBUG + is_raw_block = other.is_raw_block; +#endif // NDEBUG return *this; } }; @@ -249,18 +277,20 @@ extern Status ReadBlockContents( // free this buffer. // For description of compress_format_version and possible values, see // util/compression.h -extern Status UncompressBlockContents( - const UncompressionContext& uncompression_ctx, const char* data, size_t n, - BlockContents* contents, uint32_t compress_format_version, - const ImmutableCFOptions& ioptions); +extern Status UncompressBlockContents(const UncompressionInfo& info, + const char* data, size_t n, + BlockContents* contents, + uint32_t compress_format_version, + const ImmutableCFOptions& ioptions, + MemoryAllocator* allocator = nullptr); // This is an extension to UncompressBlockContents that accepts // a specific compression type. This is used by un-wrapped blocks // with no compression header. 
extern Status UncompressBlockContentsForCompressionType( - const UncompressionContext& uncompression_ctx, const char* data, size_t n, + const UncompressionInfo& info, const char* data, size_t n, BlockContents* contents, uint32_t compress_format_version, - const ImmutableCFOptions& ioptions); + const ImmutableCFOptions& ioptions, MemoryAllocator* allocator = nullptr); // Implementation details follow. Clients should ignore, diff --git a/ceph/src/rocksdb/table/full_filter_block_test.cc b/ceph/src/rocksdb/table/full_filter_block_test.cc index b2d81eee3..f01ae52bf 100644 --- a/ceph/src/rocksdb/table/full_filter_block_test.cc +++ b/ceph/src/rocksdb/table/full_filter_block_test.cc @@ -20,12 +20,12 @@ class TestFilterBitsBuilder : public FilterBitsBuilder { explicit TestFilterBitsBuilder() {} // Add Key to filter - virtual void AddKey(const Slice& key) override { + void AddKey(const Slice& key) override { hash_entries_.push_back(Hash(key.data(), key.size(), 1)); } // Generate the filter using the keys that are added - virtual Slice Finish(std::unique_ptr<const char[]>* buf) override { + Slice Finish(std::unique_ptr<const char[]>* buf) override { uint32_t len = static_cast<uint32_t>(hash_entries_.size()) * 4; char* data = new char[len]; for (size_t i = 0; i < hash_entries_.size(); i++) { @@ -45,7 +45,7 @@ class TestFilterBitsReader : public FilterBitsReader { explicit TestFilterBitsReader(const Slice& contents) : data_(contents.data()), len_(static_cast<uint32_t>(contents.size())) {} - virtual bool MayMatch(const Slice& entry) override { + bool MayMatch(const Slice& entry) override { uint32_t h = Hash(entry.data(), entry.size(), 1); for (size_t i = 0; i + 4 <= len_; i += 4) { if (h == DecodeFixed32(data_ + i)) { @@ -63,18 +63,16 @@ class TestHashFilter : public FilterPolicy { public: - virtual const char* Name() const override { return "TestHashFilter"; } + const char* Name() const override { return "TestHashFilter"; } - virtual void CreateFilter(const Slice* keys, int n, - std::string* dst) const override { + void CreateFilter(const Slice* keys, int n, std::string* dst) const override { for (int i = 0; i < n; i++) { uint32_t h = Hash(keys[i].data(), keys[i].size(), 1); PutFixed32(dst, h); } } - virtual bool KeyMayMatch(const Slice& key, - const Slice& filter) const override { + bool KeyMayMatch(const Slice& key, const Slice& filter) const override { uint32_t h = Hash(key.data(), key.size(), 1); for (unsigned int i = 0; i + 4 <= filter.size(); i += 4) { if (h == DecodeFixed32(filter.data() + i)) { @@ -84,12 +82,11 @@ class TestHashFilter : public FilterPolicy { return false; } - virtual FilterBitsBuilder* GetFilterBitsBuilder() const override { + FilterBitsBuilder* GetFilterBitsBuilder() const override { return new TestFilterBitsBuilder(); } - virtual FilterBitsReader* GetFilterBitsReader(const Slice& contents) - const override { + FilterBitsReader* GetFilterBitsReader(const Slice& contents) const override { return new TestFilterBitsReader(contents); } }; @@ -145,7 +142,7 @@ class FullFilterBlockTest : public testing::Test { table_options_.filter_policy.reset(NewBloomFilterPolicy(10, false)); } - ~FullFilterBlockTest() {} + ~FullFilterBlockTest() override {} }; TEST_F(FullFilterBlockTest, EmptyBuilder) { diff --git a/ceph/src/rocksdb/table/get_context.cc b/ceph/src/rocksdb/table/get_context.cc index 0aa75b607..24c9ba7d5 100644 --- a/ceph/src/rocksdb/table/get_context.cc +++ b/ceph/src/rocksdb/table/get_context.cc @@ -43,7 +43,7 @@ GetContext::GetContext(const Comparator* ucmp, Statistics*
statistics, GetState init_state, const Slice& user_key, PinnableSlice* pinnable_val, bool* value_found, MergeContext* merge_context, - RangeDelAggregator* _range_del_agg, Env* env, + SequenceNumber* _max_covering_tombstone_seq, Env* env, SequenceNumber* seq, PinnedIteratorsManager* _pinned_iters_mgr, ReadCallback* callback, bool* is_blob_index) @@ -56,7 +56,7 @@ GetContext::GetContext(const Comparator* ucmp, pinnable_val_(pinnable_val), value_found_(value_found), merge_context_(merge_context), - range_del_agg_(_range_del_agg), + max_covering_tombstone_seq_(_max_covering_tombstone_seq), env_(env), seq_(seq), replay_log_(nullptr), @@ -107,6 +107,10 @@ void GetContext::ReportCounters() { RecordTick(statistics_, BLOCK_CACHE_FILTER_HIT, get_context_stats_.num_cache_filter_hit); } + if (get_context_stats_.num_cache_compression_dict_hit > 0) { + RecordTick(statistics_, BLOCK_CACHE_COMPRESSION_DICT_HIT, + get_context_stats_.num_cache_compression_dict_hit); + } if (get_context_stats_.num_cache_index_miss > 0) { RecordTick(statistics_, BLOCK_CACHE_INDEX_MISS, get_context_stats_.num_cache_index_miss); @@ -119,6 +123,10 @@ void GetContext::ReportCounters() { RecordTick(statistics_, BLOCK_CACHE_DATA_MISS, get_context_stats_.num_cache_data_miss); } + if (get_context_stats_.num_cache_compression_dict_miss > 0) { + RecordTick(statistics_, BLOCK_CACHE_COMPRESSION_DICT_MISS, + get_context_stats_.num_cache_compression_dict_miss); + } if (get_context_stats_.num_cache_bytes_read > 0) { RecordTick(statistics_, BLOCK_CACHE_BYTES_READ, get_context_stats_.num_cache_bytes_read); @@ -158,6 +166,14 @@ void GetContext::ReportCounters() { RecordTick(statistics_, BLOCK_CACHE_FILTER_BYTES_INSERT, get_context_stats_.num_cache_filter_bytes_insert); } + if (get_context_stats_.num_cache_compression_dict_add > 0) { + RecordTick(statistics_, BLOCK_CACHE_COMPRESSION_DICT_ADD, + get_context_stats_.num_cache_compression_dict_add); + } + if (get_context_stats_.num_cache_compression_dict_bytes_insert > 0) { + RecordTick(statistics_, BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT, + get_context_stats_.num_cache_compression_dict_bytes_insert); + } } bool GetContext::SaveValue(const ParsedInternalKey& parsed_key, @@ -185,7 +201,8 @@ bool GetContext::SaveValue(const ParsedInternalKey& parsed_key, auto type = parsed_key.type; // Key matches. 
Process it if ((type == kTypeValue || type == kTypeMerge || type == kTypeBlobIndex) && - range_del_agg_ != nullptr && range_del_agg_->ShouldDelete(parsed_key)) { + max_covering_tombstone_seq_ != nullptr && + *max_covering_tombstone_seq_ > parsed_key.sequence) { type = kTypeRangeDeletion; } switch (type) { @@ -204,6 +221,8 @@ bool GetContext::SaveValue(const ParsedInternalKey& parsed_key, // If the backing resources for the value are provided, pin them pinnable_val_->PinSlice(value, value_pinner); } else { + TEST_SYNC_POINT_CALLBACK("GetContext::SaveValue::PinSelf", this); + // Otherwise copy the value pinnable_val_->PinSelf(value); } diff --git a/ceph/src/rocksdb/table/get_context.h b/ceph/src/rocksdb/table/get_context.h index 066be104b..d7d0e9808 100644 --- a/ceph/src/rocksdb/table/get_context.h +++ b/ceph/src/rocksdb/table/get_context.h @@ -6,7 +6,6 @@ #pragma once #include #include "db/merge_context.h" -#include "db/range_del_aggregator.h" #include "db/read_callback.h" #include "rocksdb/env.h" #include "rocksdb/statistics.h" @@ -22,9 +21,11 @@ struct GetContextStats { uint64_t num_cache_index_hit = 0; uint64_t num_cache_data_hit = 0; uint64_t num_cache_filter_hit = 0; + uint64_t num_cache_compression_dict_hit = 0; uint64_t num_cache_index_miss = 0; uint64_t num_cache_filter_miss = 0; uint64_t num_cache_data_miss = 0; + uint64_t num_cache_compression_dict_miss = 0; uint64_t num_cache_bytes_read = 0; uint64_t num_cache_miss = 0; uint64_t num_cache_add = 0; @@ -35,6 +36,8 @@ struct GetContextStats { uint64_t num_cache_data_bytes_insert = 0; uint64_t num_cache_filter_add = 0; uint64_t num_cache_filter_bytes_insert = 0; + uint64_t num_cache_compression_dict_add = 0; + uint64_t num_cache_compression_dict_bytes_insert = 0; }; class GetContext { @@ -52,8 +55,9 @@ class GetContext { GetContext(const Comparator* ucmp, const MergeOperator* merge_operator, Logger* logger, Statistics* statistics, GetState init_state, const Slice& user_key, PinnableSlice* value, bool* value_found, - MergeContext* merge_context, RangeDelAggregator* range_del_agg, - Env* env, SequenceNumber* seq = nullptr, + MergeContext* merge_context, + SequenceNumber* max_covering_tombstone_seq, Env* env, + SequenceNumber* seq = nullptr, PinnedIteratorsManager* _pinned_iters_mgr = nullptr, ReadCallback* callback = nullptr, bool* is_blob_index = nullptr); @@ -76,7 +80,9 @@ class GetContext { GetState State() const { return state_; } - RangeDelAggregator* range_del_agg() { return range_del_agg_; } + SequenceNumber* max_covering_tombstone_seq() { + return max_covering_tombstone_seq_; + } PinnedIteratorsManager* pinned_iters_mgr() { return pinned_iters_mgr_; } @@ -111,7 +117,7 @@ class GetContext { PinnableSlice* pinnable_val_; bool* value_found_; // Is value set correctly? 
Used by KeyMayExist MergeContext* merge_context_; - RangeDelAggregator* range_del_agg_; + SequenceNumber* max_covering_tombstone_seq_; Env* env_; // If a key is found, seq_ will be set to the SequenceNumber of most recent // write to the key or kMaxSequenceNumber if unknown diff --git a/ceph/src/rocksdb/table/iterator.cc b/ceph/src/rocksdb/table/iterator.cc index 97c47fb28..0475b9d13 100644 --- a/ceph/src/rocksdb/table/iterator.cc +++ b/ceph/src/rocksdb/table/iterator.cc @@ -103,20 +103,20 @@ Status Iterator::GetProperty(std::string prop_name, std::string* prop) { *prop = "0"; return Status::OK(); } - return Status::InvalidArgument("Undentified property."); + return Status::InvalidArgument("Unidentified property."); } namespace { class EmptyIterator : public Iterator { public: explicit EmptyIterator(const Status& s) : status_(s) { } - virtual bool Valid() const override { return false; } - virtual void Seek(const Slice& /*target*/) override {} - virtual void SeekForPrev(const Slice& /*target*/) override {} - virtual void SeekToFirst() override {} - virtual void SeekToLast() override {} - virtual void Next() override { assert(false); } - virtual void Prev() override { assert(false); } + bool Valid() const override { return false; } + void Seek(const Slice& /*target*/) override {} + void SeekForPrev(const Slice& /*target*/) override {} + void SeekToFirst() override {} + void SeekToLast() override {} + void Next() override { assert(false); } + void Prev() override { assert(false); } Slice key() const override { assert(false); return Slice(); @@ -125,7 +125,7 @@ class EmptyIterator : public Iterator { assert(false); return Slice(); } - virtual Status status() const override { return status_; } + Status status() const override { return status_; } private: Status status_; @@ -135,13 +135,13 @@ template <class TValue> class EmptyInternalIterator : public InternalIteratorBase<TValue> { public: explicit EmptyInternalIterator(const Status& s) : status_(s) {} - virtual bool Valid() const override { return false; } - virtual void Seek(const Slice& /*target*/) override {} - virtual void SeekForPrev(const Slice& /*target*/) override {} - virtual void SeekToFirst() override {} - virtual void SeekToLast() override {} - virtual void Next() override { assert(false); } - virtual void Prev() override { assert(false); } + bool Valid() const override { return false; } + void Seek(const Slice& /*target*/) override {} + void SeekForPrev(const Slice& /*target*/) override {} + void SeekToFirst() override {} + void SeekToLast() override {} + void Next() override { assert(false); } + void Prev() override { assert(false); } Slice key() const override { assert(false); return Slice(); @@ -150,16 +150,14 @@ class EmptyInternalIterator : public InternalIteratorBase<TValue> { assert(false); return TValue(); } - virtual Status status() const override { return status_; } + Status status() const override { return status_; } private: Status status_; }; } // namespace -Iterator* NewEmptyIterator() { - return new EmptyIterator(Status::OK()); -} +Iterator* NewEmptyIterator() { return new EmptyIterator(Status::OK()); } Iterator* NewErrorIterator(const Status& status) { return new EmptyIterator(status); @@ -180,7 +178,7 @@ InternalIteratorBase<TValue>* NewErrorInternalIterator(const Status& status, if (arena == nullptr) { return NewErrorInternalIterator<TValue>(status); } else { - auto mem = arena->AllocateAligned(sizeof(EmptyIterator)); + auto mem = arena->AllocateAligned(sizeof(EmptyInternalIterator<TValue>)); return new (mem) EmptyInternalIterator<TValue>(status); } } @@ -201,7 +199,7 @@
InternalIteratorBase<TValue>* NewEmptyInternalIterator(Arena* arena) { if (arena == nullptr) { return NewEmptyInternalIterator<TValue>(); } else { - auto mem = arena->AllocateAligned(sizeof(EmptyIterator)); + auto mem = arena->AllocateAligned(sizeof(EmptyInternalIterator<TValue>)); return new (mem) EmptyInternalIterator<TValue>(Status::OK()); } } diff --git a/ceph/src/rocksdb/table/merger_test.cc b/ceph/src/rocksdb/table/merger_test.cc index f20ed187a..1b04d0657 100644 --- a/ceph/src/rocksdb/table/merger_test.cc +++ b/ceph/src/rocksdb/table/merger_test.cc @@ -19,7 +19,7 @@ class MergerTest : public testing::Test { rnd_(3), merging_iterator_(nullptr), single_iterator_(nullptr) {} - ~MergerTest() = default; + ~MergerTest() override = default; std::vector<std::string> GenerateStrings(size_t len, int string_len) { std::vector<std::string> ret; diff --git a/ceph/src/rocksdb/table/merging_iterator.cc b/ceph/src/rocksdb/table/merging_iterator.cc index 744de37da..bd4a186b3 100644 --- a/ceph/src/rocksdb/table/merging_iterator.cc +++ b/ceph/src/rocksdb/table/merging_iterator.cc @@ -83,19 +83,17 @@ class MergingIterator : public InternalIterator { } } - virtual ~MergingIterator() { + ~MergingIterator() override { for (auto& child : children_) { child.DeleteIter(is_arena_mode_); } } - virtual bool Valid() const override { - return current_ != nullptr && status_.ok(); - } + bool Valid() const override { return current_ != nullptr && status_.ok(); } - virtual Status status() const override { return status_; } + Status status() const override { return status_; } - virtual void SeekToFirst() override { + void SeekToFirst() override { ClearHeaps(); status_ = Status::OK(); for (auto& child : children_) { @@ -111,7 +109,7 @@ class MergingIterator : public InternalIterator { current_ = CurrentForward(); } - virtual void SeekToLast() override { + void SeekToLast() override { ClearHeaps(); InitMaxHeap(); status_ = Status::OK(); @@ -128,7 +126,7 @@ class MergingIterator : public InternalIterator { current_ = CurrentReverse(); } - virtual void Seek(const Slice& target) override { + void Seek(const Slice& target) override { ClearHeaps(); status_ = Status::OK(); for (auto& child : children_) { @@ -153,7 +151,7 @@ class MergingIterator : public InternalIterator { } } - virtual void SeekForPrev(const Slice& target) override { + void SeekForPrev(const Slice& target) override { ClearHeaps(); InitMaxHeap(); status_ = Status::OK(); @@ -180,7 +178,7 @@ class MergingIterator : public InternalIterator { } } - virtual void Next() override { + void Next() override { assert(Valid()); // Ensure that all children are positioned after key(). @@ -214,7 +212,7 @@ class MergingIterator : public InternalIterator { current_ = CurrentForward(); } - virtual void Prev() override { + void Prev() override { assert(Valid()); // Ensure that all children are positioned before key().
// If we are moving in the reverse direction, it is already @@ -273,31 +271,30 @@ class MergingIterator : public InternalIterator { current_ = CurrentReverse(); } - virtual Slice key() const override { + Slice key() const override { assert(Valid()); return current_->key(); } - virtual Slice value() const override { + Slice value() const override { assert(Valid()); return current_->value(); } - virtual void SetPinnedItersMgr( - PinnedIteratorsManager* pinned_iters_mgr) override { + void SetPinnedItersMgr(PinnedIteratorsManager* pinned_iters_mgr) override { pinned_iters_mgr_ = pinned_iters_mgr; for (auto& child : children_) { child.SetPinnedItersMgr(pinned_iters_mgr); } } - virtual bool IsKeyPinned() const override { + bool IsKeyPinned() const override { assert(Valid()); return pinned_iters_mgr_ && pinned_iters_mgr_->PinningEnabled() && current_->IsKeyPinned(); } - virtual bool IsValuePinned() const override { + bool IsValuePinned() const override { assert(Valid()); return pinned_iters_mgr_ && pinned_iters_mgr_->PinningEnabled() && current_->IsValuePinned(); diff --git a/ceph/src/rocksdb/table/meta_blocks.cc b/ceph/src/rocksdb/table/meta_blocks.cc index 256730bfa..57111cfeb 100644 --- a/ceph/src/rocksdb/table/meta_blocks.cc +++ b/ceph/src/rocksdb/table/meta_blocks.cc @@ -79,6 +79,8 @@ void PropertyBlockBuilder::AddTableProperty(const TableProperties& props) { Add(TablePropertiesNames::kIndexValueIsDeltaEncoded, props.index_value_is_delta_encoded); Add(TablePropertiesNames::kNumEntries, props.num_entries); + Add(TablePropertiesNames::kDeletedKeys, props.num_deletions); + Add(TablePropertiesNames::kMergeOperands, props.num_merge_operands); Add(TablePropertiesNames::kNumRangeDeletions, props.num_range_deletions); Add(TablePropertiesNames::kNumDataBlocks, props.num_data_blocks); Add(TablePropertiesNames::kFilterSize, props.filter_size); @@ -113,6 +115,9 @@ void PropertyBlockBuilder::AddTableProperty(const TableProperties& props) { if (!props.compression_name.empty()) { Add(TablePropertiesNames::kCompression, props.compression_name); } + if (!props.compression_options.empty()) { + Add(TablePropertiesNames::kCompressionOptions, props.compression_options); + } } Slice PropertyBlockBuilder::Finish() { @@ -149,6 +154,16 @@ bool NotifyCollectTableCollectorsOnAdd( return all_succeeded; } +void NotifyCollectTableCollectorsOnBlockAdd( + const std::vector>& collectors, + const uint64_t blockRawBytes, const uint64_t blockCompressedBytesFast, + const uint64_t blockCompressedBytesSlow) { + for (auto& collector : collectors) { + collector->BlockAdd(blockRawBytes, blockCompressedBytesFast, + blockCompressedBytesSlow); + } +} + bool NotifyCollectTableCollectorsOnFinish( const std::vector>& collectors, Logger* info_log, PropertyBlockBuilder* builder) { @@ -172,8 +187,11 @@ bool NotifyCollectTableCollectorsOnFinish( Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, FilePrefetchBuffer* prefetch_buffer, const Footer& footer, const ImmutableCFOptions& ioptions, - TableProperties** table_properties, - bool compression_type_missing) { + TableProperties** table_properties, bool verify_checksum, + BlockHandle* ret_block_handle, + CacheAllocationPtr* verification_buf, + bool /*compression_type_missing*/, + MemoryAllocator* memory_allocator) { assert(table_properties); Slice v = handle_value; @@ -184,20 +202,17 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, BlockContents block_contents; ReadOptions read_options; - read_options.verify_checksums = false; + 
read_options.verify_checksums = verify_checksum; Status s; - Slice compression_dict; PersistentCacheOptions cache_options; BlockFetcher block_fetcher( file, prefetch_buffer, footer, read_options, handle, &block_contents, - ioptions, false /* decompress */, compression_dict, cache_options); + ioptions, false /* decompress */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), cache_options, memory_allocator); s = block_fetcher.ReadBlockContents(); - // override compression_type when table file is known to contain undefined - // value at compression type marker - if (compression_type_missing) { - block_contents.compression_type = kNoCompression; - } + // property block is never compressed. Need to add uncompress logic if we are + // to compress it. if (!s.ok()) { return s; @@ -229,6 +244,10 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, {TablePropertiesNames::kNumDataBlocks, &new_table_properties->num_data_blocks}, {TablePropertiesNames::kNumEntries, &new_table_properties->num_entries}, + {TablePropertiesNames::kDeletedKeys, + &new_table_properties->num_deletions}, + {TablePropertiesNames::kMergeOperands, + &new_table_properties->num_merge_operands}, {TablePropertiesNames::kNumRangeDeletions, &new_table_properties->num_range_deletions}, {TablePropertiesNames::kFormatVersion, @@ -244,16 +263,19 @@ }; std::string last_key; - for (iter.SeekToFirst(); iter.Valid(); iter.Next()) { + for (iter.SeekToFirstOrReport(); iter.Valid(); iter.NextOrReport()) { s = iter.status(); if (!s.ok()) { break; } auto key = iter.key().ToString(); - // properties block is strictly sorted with no duplicate key. - assert(last_key.empty() || - BytewiseComparator()->Compare(key, last_key) > 0); + // properties block should be strictly sorted with no duplicate key.
+ if (!last_key.empty() && + BytewiseComparator()->Compare(key, last_key) <= 0) { + s = Status::Corruption("properties unsorted"); + break; + } last_key = key; auto raw_val = iter.value(); @@ -263,6 +285,12 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, {key, handle.offset() + iter.ValueOffset()}); if (pos != predefined_uint64_properties.end()) { + if (key == TablePropertiesNames::kDeletedKeys || + key == TablePropertiesNames::kMergeOperands) { + // Insert in user-collected properties for API backwards compatibility + new_table_properties->user_collected_properties.insert( + {key, raw_val.ToString()}); + } // handle predefined rocksdb properties uint64_t val; if (!GetVarint64(&raw_val, &val)) { @@ -288,6 +316,8 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, new_table_properties->property_collectors_names = raw_val.ToString(); } else if (key == TablePropertiesNames::kCompression) { new_table_properties->compression_name = raw_val.ToString(); + } else if (key == TablePropertiesNames::kCompressionOptions) { + new_table_properties->compression_options = raw_val.ToString(); } else { // handle user-collected properties new_table_properties->user_collected_properties.insert( @@ -296,6 +326,16 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, } if (s.ok()) { *table_properties = new_table_properties; + if (ret_block_handle != nullptr) { + *ret_block_handle = handle; + } + if (verification_buf != nullptr) { + size_t len = handle.size() + kBlockTrailerSize; + *verification_buf = rocksdb::AllocateBlock(len, memory_allocator); + if (verification_buf->get() != nullptr) { + memcpy(verification_buf->get(), block_contents.data.data(), len); + } + } } else { delete new_table_properties; } @@ -305,9 +345,10 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size, uint64_t table_magic_number, - const ImmutableCFOptions &ioptions, + const ImmutableCFOptions& ioptions, TableProperties** properties, - bool compression_type_missing) { + bool compression_type_missing, + MemoryAllocator* memory_allocator) { // -- Read metaindex block Footer footer; auto s = ReadFooterFromFile(file, nullptr /* prefetch_buffer */, file_size, @@ -320,22 +361,19 @@ Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size, BlockContents metaindex_contents; ReadOptions read_options; read_options.verify_checksums = false; - Slice compression_dict; PersistentCacheOptions cache_options; BlockFetcher block_fetcher( file, nullptr /* prefetch_buffer */, footer, read_options, metaindex_handle, &metaindex_contents, ioptions, false /* decompress */, - compression_dict, cache_options); + false /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options, memory_allocator); s = block_fetcher.ReadBlockContents(); if (!s.ok()) { return s; } - // override compression_type when table file is known to contain undefined - // value at compression type marker - if (compression_type_missing) { - metaindex_contents.compression_type = kNoCompression; - } + // property blocks are never compressed. Need to add uncompress logic if we + // are to compress it. 
Block metaindex_block(std::move(metaindex_contents), kDisableGlobalSequenceNumber); std::unique_ptr<InternalIterator> meta_iter( @@ -351,8 +389,11 @@ Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size, TableProperties table_properties; if (found_properties_block == true) { - s = ReadProperties(meta_iter->value(), file, nullptr /* prefetch_buffer */, - footer, ioptions, properties, compression_type_missing); + s = ReadProperties( + meta_iter->value(), file, nullptr /* prefetch_buffer */, footer, + ioptions, properties, false /* verify_checksum */, + nullptr /* ret_block_handle */, nullptr /* verification_buf */, + compression_type_missing, memory_allocator); } else { s = Status::NotFound(); } @@ -375,10 +416,11 @@ Status FindMetaBlock(InternalIterator* meta_index_iter, Status FindMetaBlock(RandomAccessFileReader* file, uint64_t file_size, uint64_t table_magic_number, - const ImmutableCFOptions &ioptions, + const ImmutableCFOptions& ioptions, const std::string& meta_block_name, BlockHandle* block_handle, - bool compression_type_missing) { + bool /*compression_type_missing*/, + MemoryAllocator* memory_allocator) { Footer footer; auto s = ReadFooterFromFile(file, nullptr /* prefetch_buffer */, file_size, &footer, table_magic_number); @@ -390,21 +432,18 @@ Status FindMetaBlock(RandomAccessFileReader* file, uint64_t file_size, BlockContents metaindex_contents; ReadOptions read_options; read_options.verify_checksums = false; - Slice compression_dict; PersistentCacheOptions cache_options; BlockFetcher block_fetcher( file, nullptr /* prefetch_buffer */, footer, read_options, metaindex_handle, &metaindex_contents, ioptions, - false /* do decompression */, compression_dict, cache_options); + false /* do decompression */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), cache_options, memory_allocator); s = block_fetcher.ReadBlockContents(); if (!s.ok()) { return s; } - // override compression_type when table file is known to contain undefined - // value at compression type marker - if (compression_type_missing) { - metaindex_contents.compression_type = kNoCompression; - } + // meta blocks are never compressed. Need to add uncompress logic if we are to + // compress it.
Block metaindex_block(std::move(metaindex_contents), kDisableGlobalSequenceNumber); @@ -420,7 +459,8 @@ Status ReadMetaBlock(RandomAccessFileReader* file, uint64_t table_magic_number, const ImmutableCFOptions& ioptions, const std::string& meta_block_name, - BlockContents* contents, bool compression_type_missing) { + BlockContents* contents, bool /*compression_type_missing*/, + MemoryAllocator* memory_allocator) { Status status; Footer footer; status = ReadFooterFromFile(file, prefetch_buffer, file_size, &footer, @@ -434,22 +474,19 @@ Status ReadMetaBlock(RandomAccessFileReader* file, BlockContents metaindex_contents; ReadOptions read_options; read_options.verify_checksums = false; - Slice compression_dict; PersistentCacheOptions cache_options; BlockFetcher block_fetcher(file, prefetch_buffer, footer, read_options, metaindex_handle, &metaindex_contents, ioptions, - false /* decompress */, compression_dict, - cache_options); + false /* decompress */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), cache_options, + memory_allocator); status = block_fetcher.ReadBlockContents(); if (!status.ok()) { return status; } - // override compression_type when table file is known to contain undefined - // value at compression type marker - if (compression_type_missing) { - metaindex_contents.compression_type = kNoCompression; - } + // meta block is never compressed. Need to add uncompress logic if we are to + // compress it. // Finding metablock Block metaindex_block(std::move(metaindex_contents), @@ -469,7 +506,8 @@ Status ReadMetaBlock(RandomAccessFileReader* file, // Reading metablock BlockFetcher block_fetcher2( file, prefetch_buffer, footer, read_options, block_handle, contents, - ioptions, false /* decompress */, compression_dict, cache_options); + ioptions, false /* decompress */, false /*maybe_compressed*/, + UncompressionDict::GetEmptyDict(), cache_options, memory_allocator); return block_fetcher2.ReadBlockContents(); } diff --git a/ceph/src/rocksdb/table/meta_blocks.h b/ceph/src/rocksdb/table/meta_blocks.h index a18c8edc4..6efd1225e 100644 --- a/ceph/src/rocksdb/table/meta_blocks.h +++ b/ceph/src/rocksdb/table/meta_blocks.h @@ -11,12 +11,13 @@ #include "db/builder.h" #include "db/table_properties_collector.h" -#include "util/kv_map.h" #include "rocksdb/comparator.h" +#include "rocksdb/memory_allocator.h" #include "rocksdb/options.h" #include "rocksdb/slice.h" #include "table/block_builder.h" #include "table/format.h" +#include "util/kv_map.h" namespace rocksdb { @@ -82,6 +83,11 @@ bool NotifyCollectTableCollectorsOnAdd( const std::vector>& collectors, Logger* info_log); +void NotifyCollectTableCollectorsOnBlockAdd( + const std::vector>& collectors, + uint64_t blockRawBytes, uint64_t blockCompressedBytesFast, + uint64_t blockCompressedBytesSlow); + // NotifyCollectTableCollectorsOnAdd() triggers the `Finish` event for all // property collectors. The collected properties will be added to `builder`. 
bool NotifyCollectTableCollectorsOnFinish( @@ -95,8 +101,11 @@ bool NotifyCollectTableCollectorsOnFinish( Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, FilePrefetchBuffer* prefetch_buffer, const Footer& footer, const ImmutableCFOptions& ioptions, - TableProperties** table_properties, - bool compression_type_missing = false); + TableProperties** table_properties, bool verify_checksum, + BlockHandle* block_handle, + CacheAllocationPtr* verification_buf, + bool compression_type_missing = false, + MemoryAllocator* memory_allocator = nullptr); // Directly read the properties from the properties block of a plain table. // @returns a status to indicate if the operation succeeded. On success, @@ -108,9 +117,10 @@ Status ReadProperties(const Slice& handle_value, RandomAccessFileReader* file, // `ReadProperties`, `FindMetaBlock`, and `ReadMetaBlock` Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size, uint64_t table_magic_number, - const ImmutableCFOptions &ioptions, + const ImmutableCFOptions& ioptions, TableProperties** properties, - bool compression_type_missing = false); + bool compression_type_missing = false, + MemoryAllocator* memory_allocator = nullptr); // Find the meta block from the meta index block. Status FindMetaBlock(InternalIterator* meta_index_iter, @@ -120,10 +130,11 @@ Status FindMetaBlock(InternalIterator* meta_index_iter, // Find the meta block Status FindMetaBlock(RandomAccessFileReader* file, uint64_t file_size, uint64_t table_magic_number, - const ImmutableCFOptions &ioptions, + const ImmutableCFOptions& ioptions, const std::string& meta_block_name, BlockHandle* block_handle, - bool compression_type_missing = false); + bool compression_type_missing = false, + MemoryAllocator* memory_allocator = nullptr); // Read the specified meta block with name meta_block_name // from `file` and initialize `contents` with contents of this block. 
@@ -134,6 +145,7 @@ Status ReadMetaBlock(RandomAccessFileReader* file, const ImmutableCFOptions& ioptions, const std::string& meta_block_name, BlockContents* contents, - bool compression_type_missing = false); + bool compression_type_missing = false, + MemoryAllocator* memory_allocator = nullptr); } // namespace rocksdb diff --git a/ceph/src/rocksdb/table/mock_table.cc b/ceph/src/rocksdb/table/mock_table.cc index a5473b30b..65a436169 100644 --- a/ceph/src/rocksdb/table/mock_table.cc +++ b/ceph/src/rocksdb/table/mock_table.cc @@ -60,8 +60,8 @@ MockTableFactory::MockTableFactory() : next_id_(1) {} Status MockTableFactory::NewTableReader( const TableReaderOptions& /*table_reader_options*/, - unique_ptr&& file, uint64_t /*file_size*/, - unique_ptr* table_reader, + std::unique_ptr&& file, uint64_t /*file_size*/, + std::unique_ptr* table_reader, bool /*prefetch_index_and_filter_in_cache*/) const { uint32_t id = GetIDFromFile(file.get()); diff --git a/ceph/src/rocksdb/table/mock_table.h b/ceph/src/rocksdb/table/mock_table.h index 92cf87370..2f123a963 100644 --- a/ceph/src/rocksdb/table/mock_table.h +++ b/ceph/src/rocksdb/table/mock_table.h @@ -157,8 +157,8 @@ class MockTableFactory : public TableFactory { const char* Name() const override { return "MockTable"; } Status NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table_reader, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table_reader, bool prefetch_index_and_filter_in_cache = true) const override; TableBuilder* NewTableBuilder( const TableBuilderOptions& table_builder_options, diff --git a/ceph/src/rocksdb/table/partitioned_filter_block_test.cc b/ceph/src/rocksdb/table/partitioned_filter_block_test.cc index 0b11c0df2..8068f14d8 100644 --- a/ceph/src/rocksdb/table/partitioned_filter_block_test.cc +++ b/ceph/src/rocksdb/table/partitioned_filter_block_test.cc @@ -27,24 +27,24 @@ class MockedBlockBasedTable : public BlockBasedTable { rep->cache_key_prefix_size = 10; } - virtual CachableEntry GetFilter( + CachableEntry GetFilter( FilePrefetchBuffer*, const BlockHandle& filter_blk_handle, const bool /* unused */, bool /* unused */, GetContext* /* unused */, const SliceTransform* prefix_extractor) const override { Slice slice = slices[filter_blk_handle.offset()]; auto obj = new FullFilterBlockReader( - prefix_extractor, true, BlockContents(slice, false, kNoCompression), + prefix_extractor, true, BlockContents(slice), rep_->table_options.filter_policy->GetFilterBitsReader(slice), nullptr); return {obj, nullptr}; } - virtual FilterBlockReader* ReadFilter( + FilterBlockReader* ReadFilter( FilePrefetchBuffer*, const BlockHandle& filter_blk_handle, const bool /* unused */, const SliceTransform* prefix_extractor) const override { Slice slice = slices[filter_blk_handle.offset()]; auto obj = new FullFilterBlockReader( - prefix_extractor, true, BlockContents(slice, false, kNoCompression), + prefix_extractor, true, BlockContents(slice), rep_->table_options.filter_policy->GetFilterBitsReader(slice), nullptr); return obj; } @@ -67,7 +67,7 @@ class PartitionedFilterBlockTest } std::shared_ptr cache_; - ~PartitionedFilterBlockTest() {} + ~PartitionedFilterBlockTest() override {} const std::string keys[4] = {"afoo", "bar", "box", "hello"}; const std::string missing_keys[2] = {"missing", "other"}; @@ -147,10 +147,10 @@ class PartitionedFilterBlockTest const bool kImmortal = true; table.reset(new MockedBlockBasedTable( new BlockBasedTable::Rep(ioptions, env_options, 
table_options_, icomp, - !kSkipFilters, !kImmortal))); + !kSkipFilters, 0, !kImmortal))); auto reader = new PartitionedFilterBlockReader( - prefix_extractor, true, BlockContents(slice, false, kNoCompression), - nullptr, nullptr, icomp, table.get(), pib->seperator_is_key_plus_seq(), + prefix_extractor, true, BlockContents(slice), nullptr, nullptr, icomp, + table.get(), pib->seperator_is_key_plus_seq(), !pib->get_use_value_delta_encoding()); return reader; } diff --git a/ceph/src/rocksdb/table/persistent_cache_helper.cc b/ceph/src/rocksdb/table/persistent_cache_helper.cc index 103f57c80..4e90697a6 100644 --- a/ceph/src/rocksdb/table/persistent_cache_helper.cc +++ b/ceph/src/rocksdb/table/persistent_cache_helper.cc @@ -29,12 +29,9 @@ void PersistentCacheHelper::InsertUncompressedPage( const BlockContents& contents) { assert(cache_options.persistent_cache); assert(!cache_options.persistent_cache->IsCompressed()); - if (!contents.cachable || contents.compression_type != kNoCompression) { - // We shouldn't cache this. Either - // (1) content is not cacheable - // (2) content is compressed - return; - } + // Precondition: + // (1) content is cacheable + // (2) content is not compressed // construct the page key char cache_key[BlockBasedTable::kMaxCacheKeyPrefixSize + kMaxVarint64Length]; @@ -109,8 +106,7 @@ Status PersistentCacheHelper::LookupUncompressedPage( // update stats RecordTick(cache_options.statistics, PERSISTENT_CACHE_HIT); // construct result and return - *contents = - BlockContents(std::move(data), size, false /*cacheable*/, kNoCompression); + *contents = BlockContents(std::move(data), size); return Status::OK(); } diff --git a/ceph/src/rocksdb/table/plain_table_builder.cc b/ceph/src/rocksdb/table/plain_table_builder.cc index 717635cc1..453b6c768 100644 --- a/ceph/src/rocksdb/table/plain_table_builder.cc +++ b/ceph/src/rocksdb/table/plain_table_builder.cc @@ -166,6 +166,12 @@ void PlainTableBuilder::Add(const Slice& key, const Slice& value) { properties_.num_entries++; properties_.raw_key_size += key.size(); properties_.raw_value_size += value.size(); + if (internal_key.type == kTypeDeletion || + internal_key.type == kTypeSingleDeletion) { + properties_.num_deletions++; + } else if (internal_key.type == kTypeMerge) { + properties_.num_merge_operands++; + } // notify property collectors NotifyCollectTableCollectorsOnAdd( diff --git a/ceph/src/rocksdb/table/plain_table_factory.cc b/ceph/src/rocksdb/table/plain_table_factory.cc index b88a689d4..a6e59c142 100644 --- a/ceph/src/rocksdb/table/plain_table_factory.cc +++ b/ceph/src/rocksdb/table/plain_table_factory.cc @@ -19,15 +19,16 @@ namespace rocksdb { Status PlainTableFactory::NewTableReader( const TableReaderOptions& table_reader_options, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table, bool /*prefetch_index_and_filter_in_cache*/) const { return PlainTableReader::Open( table_reader_options.ioptions, table_reader_options.env_options, table_reader_options.internal_comparator, std::move(file), file_size, table, table_options_.bloom_bits_per_key, table_options_.hash_table_ratio, table_options_.index_sparseness, table_options_.huge_page_tlb_size, - table_options_.full_scan_mode, table_reader_options.prefix_extractor); + table_options_.full_scan_mode, table_reader_options.immortal, + table_reader_options.prefix_extractor); } TableBuilder* PlainTableFactory::NewTableBuilder( @@ -146,15 +147,8 @@ Status GetMemTableRepFactoryFromString( mem_factory = new 
VectorRepFactory(); } } else if (opts_list[0] == "cuckoo") { - // Expecting format - // cuckoo: - if (2 == len) { - size_t write_buffer_size = ParseSizeT(opts_list[1]); - mem_factory = NewHashCuckooRepFactory(write_buffer_size); - } else if (1 == len) { - return Status::InvalidArgument("Can't parse memtable_factory option ", - opts_str); - } + return Status::NotSupported( + "cuckoo hash memtable is not supported anymore."); } else { return Status::InvalidArgument("Unrecognized memtable_factory option ", opts_str); diff --git a/ceph/src/rocksdb/table/plain_table_factory.h b/ceph/src/rocksdb/table/plain_table_factory.h index f540a92b8..990df482e 100644 --- a/ceph/src/rocksdb/table/plain_table_factory.h +++ b/ceph/src/rocksdb/table/plain_table_factory.h @@ -17,7 +17,6 @@ namespace rocksdb { struct EnvOptions; -using std::unique_ptr; class Status; class RandomAccessFile; class WritableFile; @@ -149,8 +148,8 @@ class PlainTableFactory : public TableFactory { const char* Name() const override { return "PlainTable"; } Status NewTableReader(const TableReaderOptions& table_reader_options, - unique_ptr&& file, - uint64_t file_size, unique_ptr* table, + std::unique_ptr&& file, + uint64_t file_size, std::unique_ptr* table, bool prefetch_index_and_filter_in_cache) const override; TableBuilder* NewTableBuilder( diff --git a/ceph/src/rocksdb/table/plain_table_index.cc b/ceph/src/rocksdb/table/plain_table_index.cc index 39a6b53d6..437409239 100644 --- a/ceph/src/rocksdb/table/plain_table_index.cc +++ b/ceph/src/rocksdb/table/plain_table_index.cc @@ -203,7 +203,7 @@ Slice PlainTableIndexBuilder::FillIndexes( assert(sub_index_offset == sub_index_size_); ROCKS_LOG_DEBUG(ioptions_.info_log, - "hash table size: %d, suffix_map length %" ROCKSDB_PRIszt, + "hash table size: %" PRIu32 ", suffix_map length %" PRIu32, index_size_, sub_index_size_); return Slice(allocated, GetTotalSize()); } diff --git a/ceph/src/rocksdb/table/plain_table_key_coding.h b/ceph/src/rocksdb/table/plain_table_key_coding.h index 321e0aed5..9a27ad06b 100644 --- a/ceph/src/rocksdb/table/plain_table_key_coding.h +++ b/ceph/src/rocksdb/table/plain_table_key_coding.h @@ -114,7 +114,7 @@ class PlainTableFileReader { }; // Keep buffers for two recent reads. 
- std::array, 2> buffers_; + std::array, 2> buffers_; uint32_t num_buf_; Status status_; diff --git a/ceph/src/rocksdb/table/plain_table_reader.cc b/ceph/src/rocksdb/table/plain_table_reader.cc index 4f6c99f94..b0c6dcf07 100644 --- a/ceph/src/rocksdb/table/plain_table_reader.cc +++ b/ceph/src/rocksdb/table/plain_table_reader.cc @@ -54,7 +54,7 @@ inline uint32_t GetFixed32Element(const char* base, size_t offset) { class PlainTableIterator : public InternalIterator { public: explicit PlainTableIterator(PlainTableReader* table, bool use_prefix_seek); - ~PlainTableIterator(); + ~PlainTableIterator() override; bool Valid() const override; @@ -91,21 +91,20 @@ class PlainTableIterator : public InternalIterator { }; extern const uint64_t kPlainTableMagicNumber; -PlainTableReader::PlainTableReader(const ImmutableCFOptions& ioptions, - unique_ptr&& file, - const EnvOptions& storage_options, - const InternalKeyComparator& icomparator, - EncodingType encoding_type, - uint64_t file_size, - const TableProperties* table_properties, - const SliceTransform* prefix_extractor) +PlainTableReader::PlainTableReader( + const ImmutableCFOptions& ioptions, + std::unique_ptr&& file, + const EnvOptions& storage_options, const InternalKeyComparator& icomparator, + EncodingType encoding_type, uint64_t file_size, + const TableProperties* table_properties, + const SliceTransform* prefix_extractor) : internal_comparator_(icomparator), encoding_type_(encoding_type), full_scan_mode_(false), user_key_len_(static_cast(table_properties->fixed_key_len)), prefix_extractor_(prefix_extractor), enable_bloom_(false), - bloom_(6, nullptr), + bloom_(6), file_info_(std::move(file), storage_options, static_cast(table_properties->data_size)), ioptions_(ioptions), @@ -118,10 +117,11 @@ PlainTableReader::~PlainTableReader() { Status PlainTableReader::Open( const ImmutableCFOptions& ioptions, const EnvOptions& env_options, const InternalKeyComparator& internal_comparator, - unique_ptr&& file, uint64_t file_size, - unique_ptr* table_reader, const int bloom_bits_per_key, + std::unique_ptr&& file, uint64_t file_size, + std::unique_ptr* table_reader, const int bloom_bits_per_key, double hash_table_ratio, size_t index_sparseness, size_t huge_page_tlb_size, - bool full_scan_mode, const SliceTransform* prefix_extractor) { + bool full_scan_mode, const bool immortal_table, + const SliceTransform* prefix_extractor) { if (file_size > PlainTableIndex::kMaxFileSize) { return Status::NotSupported("File is too large for PlainTableReader!"); } @@ -182,6 +182,10 @@ Status PlainTableReader::Open( new_reader->full_scan_mode_ = true; } + if (immortal_table && new_reader->file_info_.is_mmap_mode) { + new_reader->dummy_cleanable_.reset(new Cleanable()); + } + *table_reader = std::move(new_reader); return s; } @@ -202,7 +206,8 @@ InternalIterator* PlainTableReader::NewIterator( } Status PlainTableReader::PopulateIndexRecordList( - PlainTableIndexBuilder* index_builder, vector* prefix_hashes) { + PlainTableIndexBuilder* index_builder, + std::vector* prefix_hashes) { Slice prev_key_prefix_slice; std::string prev_key_prefix_buf; uint32_t pos = data_start_offset_; @@ -252,10 +257,9 @@ Status PlainTableReader::PopulateIndexRecordList( return s; } -void PlainTableReader::AllocateAndFillBloom(int bloom_bits_per_key, - int num_prefixes, - size_t huge_page_tlb_size, - vector* prefix_hashes) { +void PlainTableReader::AllocateAndFillBloom( + int bloom_bits_per_key, int num_prefixes, size_t huge_page_tlb_size, + std::vector* prefix_hashes) { if (!IsTotalOrderMode()) { 
uint32_t bloom_total_bits = num_prefixes * bloom_bits_per_key; if (bloom_total_bits > 0) { @@ -267,7 +271,7 @@ void PlainTableReader::AllocateAndFillBloom(int bloom_bits_per_key, } } -void PlainTableReader::FillBloom(vector* prefix_hashes) { +void PlainTableReader::FillBloom(std::vector* prefix_hashes) { assert(bloom_.IsInitialized()); for (auto prefix_hash : *prefix_hashes) { bloom_.AddHash(prefix_hash); @@ -599,7 +603,8 @@ Status PlainTableReader::Get(const ReadOptions& /*ro*/, const Slice& target, // can we enable the fast path? if (internal_comparator_.Compare(found_key, parsed_target) >= 0) { bool dont_care __attribute__((__unused__)); - if (!get_context->SaveValue(found_key, found_value, &dont_care)) { + if (!get_context->SaveValue(found_key, found_value, &dont_care, + dummy_cleanable_.get())) { break; } } diff --git a/ceph/src/rocksdb/table/plain_table_reader.h b/ceph/src/rocksdb/table/plain_table_reader.h index df08a98fa..022886b72 100644 --- a/ceph/src/rocksdb/table/plain_table_reader.h +++ b/ceph/src/rocksdb/table/plain_table_reader.h @@ -39,18 +39,15 @@ class InternalKeyComparator; class PlainTableKeyDecoder; class GetContext; -using std::unique_ptr; -using std::unordered_map; -using std::vector; extern const uint32_t kPlainTableVariableLength; struct PlainTableReaderFileInfo { bool is_mmap_mode; Slice file_data; uint32_t data_end_offset; - unique_ptr file; + std::unique_ptr file; - PlainTableReaderFileInfo(unique_ptr&& _file, + PlainTableReaderFileInfo(std::unique_ptr&& _file, const EnvOptions& storage_options, uint32_t _data_size_offset) : is_mmap_mode(storage_options.use_mmap_reads), @@ -71,11 +68,11 @@ class PlainTableReader: public TableReader { static Status Open(const ImmutableCFOptions& ioptions, const EnvOptions& env_options, const InternalKeyComparator& internal_comparator, - unique_ptr&& file, - uint64_t file_size, unique_ptr* table, + std::unique_ptr&& file, + uint64_t file_size, std::unique_ptr* table, const int bloom_bits_per_key, double hash_table_ratio, size_t index_sparseness, size_t huge_page_tlb_size, - bool full_scan_mode, + bool full_scan_mode, const bool immortal_table = false, const SliceTransform* prefix_extractor = nullptr); InternalIterator* NewIterator(const ReadOptions&, @@ -104,7 +101,7 @@ class PlainTableReader: public TableReader { } PlainTableReader(const ImmutableCFOptions& ioptions, - unique_ptr&& file, + std::unique_ptr&& file, const EnvOptions& env_options, const InternalKeyComparator& internal_comparator, EncodingType encoding_type, uint64_t file_size, @@ -153,10 +150,11 @@ class PlainTableReader: public TableReader { DynamicBloom bloom_; PlainTableReaderFileInfo file_info_; Arena arena_; - std::unique_ptr index_block_alloc_; - std::unique_ptr bloom_block_alloc_; + CacheAllocationPtr index_block_alloc_; + CacheAllocationPtr bloom_block_alloc_; const ImmutableCFOptions& ioptions_; + std::unique_ptr dummy_cleanable_; uint64_t file_size_; std::shared_ptr table_properties_; @@ -201,14 +199,14 @@ class PlainTableReader: public TableReader { // If bloom_ is not null, all the keys' full-key hash will be added to the // bloom filter. 
Status PopulateIndexRecordList(PlainTableIndexBuilder* index_builder, - vector* prefix_hashes); + std::vector* prefix_hashes); // Internal helper function to allocate memory for bloom filter and fill it void AllocateAndFillBloom(int bloom_bits_per_key, int num_prefixes, size_t huge_page_tlb_size, - vector* prefix_hashes); + std::vector* prefix_hashes); - void FillBloom(vector* prefix_hashes); + void FillBloom(std::vector* prefix_hashes); // Read the key and value at `offset` to parameters for keys, the and // `seekable`. diff --git a/ceph/src/rocksdb/table/sst_file_reader.cc b/ceph/src/rocksdb/table/sst_file_reader.cc new file mode 100644 index 000000000..54408bb50 --- /dev/null +++ b/ceph/src/rocksdb/table/sst_file_reader.cc @@ -0,0 +1,87 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#ifndef ROCKSDB_LITE + +#include "rocksdb/sst_file_reader.h" + +#include "db/db_iter.h" +#include "db/dbformat.h" +#include "options/cf_options.h" +#include "table/get_context.h" +#include "table/table_builder.h" +#include "table/table_reader.h" +#include "util/file_reader_writer.h" + +namespace rocksdb { + +struct SstFileReader::Rep { + Options options; + EnvOptions soptions; + ImmutableCFOptions ioptions; + MutableCFOptions moptions; + + std::unique_ptr table_reader; + + Rep(const Options& opts) + : options(opts), + soptions(options), + ioptions(options), + moptions(ColumnFamilyOptions(options)) {} +}; + +SstFileReader::SstFileReader(const Options& options) : rep_(new Rep(options)) {} + +SstFileReader::~SstFileReader() {} + +Status SstFileReader::Open(const std::string& file_path) { + auto r = rep_.get(); + Status s; + uint64_t file_size = 0; + std::unique_ptr file; + std::unique_ptr file_reader; + s = r->options.env->GetFileSize(file_path, &file_size); + if (s.ok()) { + s = r->options.env->NewRandomAccessFile(file_path, &file, r->soptions); + } + if (s.ok()) { + file_reader.reset(new RandomAccessFileReader(std::move(file), file_path)); + } + if (s.ok()) { + TableReaderOptions t_opt(r->ioptions, r->moptions.prefix_extractor.get(), + r->soptions, r->ioptions.internal_comparator); + // Allow open file with global sequence number for backward compatibility. + t_opt.largest_seqno = kMaxSequenceNumber; + s = r->options.table_factory->NewTableReader(t_opt, std::move(file_reader), + file_size, &r->table_reader); + } + return s; +} + +Iterator* SstFileReader::NewIterator(const ReadOptions& options) { + auto r = rep_.get(); + auto sequence = options.snapshot != nullptr + ? 
options.snapshot->GetSequenceNumber() + : kMaxSequenceNumber; + auto internal_iter = + r->table_reader->NewIterator(options, r->moptions.prefix_extractor.get()); + return NewDBIterator(r->options.env, options, r->ioptions, r->moptions, + r->ioptions.user_comparator, internal_iter, sequence, + r->moptions.max_sequential_skip_in_iterations, + nullptr /* read_callback */); +} + +std::shared_ptr SstFileReader::GetTableProperties() + const { + return rep_->table_reader->GetTableProperties(); +} + +Status SstFileReader::VerifyChecksum() { + return rep_->table_reader->VerifyChecksum(); +} + +} // namespace rocksdb + +#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/table/sst_file_reader_test.cc b/ceph/src/rocksdb/table/sst_file_reader_test.cc new file mode 100644 index 000000000..51bc975af --- /dev/null +++ b/ceph/src/rocksdb/table/sst_file_reader_test.cc @@ -0,0 +1,174 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#ifndef ROCKSDB_LITE + +#include + +#include "rocksdb/db.h" +#include "rocksdb/sst_file_reader.h" +#include "rocksdb/sst_file_writer.h" +#include "table/sst_file_writer_collectors.h" +#include "util/testharness.h" +#include "util/testutil.h" +#include "utilities/merge_operators.h" + +namespace rocksdb { + +std::string EncodeAsString(uint64_t v) { + char buf[16]; + snprintf(buf, sizeof(buf), "%08" PRIu64, v); + return std::string(buf); +} + +std::string EncodeAsUint64(uint64_t v) { + std::string dst; + PutFixed64(&dst, v); + return dst; +} + +class SstFileReaderTest : public testing::Test { + public: + SstFileReaderTest() { + options_.merge_operator = MergeOperators::CreateUInt64AddOperator(); + sst_name_ = test::PerThreadDBPath("sst_file"); + } + + ~SstFileReaderTest() { + Status s = Env::Default()->DeleteFile(sst_name_); + assert(s.ok()); + } + + void CreateFile(const std::string& file_name, + const std::vector& keys) { + SstFileWriter writer(soptions_, options_); + ASSERT_OK(writer.Open(file_name)); + for (size_t i = 0; i + 2 < keys.size(); i += 3) { + ASSERT_OK(writer.Put(keys[i], keys[i])); + ASSERT_OK(writer.Merge(keys[i + 1], EncodeAsUint64(i + 1))); + ASSERT_OK(writer.Delete(keys[i + 2])); + } + ASSERT_OK(writer.Finish()); + } + + void CheckFile(const std::string& file_name, + const std::vector& keys, + bool check_global_seqno = false) { + ReadOptions ropts; + SstFileReader reader(options_); + ASSERT_OK(reader.Open(file_name)); + ASSERT_OK(reader.VerifyChecksum()); + std::unique_ptr iter(reader.NewIterator(ropts)); + iter->SeekToFirst(); + for (size_t i = 0; i + 2 < keys.size(); i += 3) { + ASSERT_TRUE(iter->Valid()); + ASSERT_EQ(iter->key().compare(keys[i]), 0); + ASSERT_EQ(iter->value().compare(keys[i]), 0); + iter->Next(); + ASSERT_TRUE(iter->Valid()); + ASSERT_EQ(iter->key().compare(keys[i + 1]), 0); + ASSERT_EQ(iter->value().compare(EncodeAsUint64(i + 1)), 0); + iter->Next(); + } + ASSERT_FALSE(iter->Valid()); + if (check_global_seqno) { + auto properties = reader.GetTableProperties(); + ASSERT_TRUE(properties); + auto& user_properties = properties->user_collected_properties; + ASSERT_TRUE( + user_properties.count(ExternalSstFilePropertyNames::kGlobalSeqno)); + } + } + + void CreateFileAndCheck(const std::vector& keys) { + CreateFile(sst_name_, keys); + CheckFile(sst_name_, keys); + } + + protected: + Options options_; + EnvOptions soptions_; + 
std::string sst_name_; +}; + +const uint64_t kNumKeys = 100; + +TEST_F(SstFileReaderTest, Basic) { + std::vector<std::string> keys; + for (uint64_t i = 0; i < kNumKeys; i++) { + keys.emplace_back(EncodeAsString(i)); + } + CreateFileAndCheck(keys); +} + +TEST_F(SstFileReaderTest, Uint64Comparator) { + options_.comparator = test::Uint64Comparator(); + std::vector<std::string> keys; + for (uint64_t i = 0; i < kNumKeys; i++) { + keys.emplace_back(EncodeAsUint64(i)); + } + CreateFileAndCheck(keys); +} + +TEST_F(SstFileReaderTest, ReadFileWithGlobalSeqno) { + std::vector<std::string> keys; + for (uint64_t i = 0; i < kNumKeys; i++) { + keys.emplace_back(EncodeAsString(i)); + } + // Generate an SST file. + CreateFile(sst_name_, keys); + + // Ingest the file into a DB to assign it a global sequence number. + Options options; + options.create_if_missing = true; + std::string db_name = test::PerThreadDBPath("test_db"); + DB* db; + ASSERT_OK(DB::Open(options, db_name, &db)); + // Bump sequence number. + ASSERT_OK(db->Put(WriteOptions(), keys[0], "foo")); + ASSERT_OK(db->Flush(FlushOptions())); + // Ingest the file. + IngestExternalFileOptions ingest_options; + ingest_options.write_global_seqno = true; + ASSERT_OK(db->IngestExternalFile({sst_name_}, ingest_options)); + std::vector<std::string> live_files; + uint64_t manifest_file_size = 0; + ASSERT_OK(db->GetLiveFiles(live_files, &manifest_file_size)); + // Get the ingested file. + std::string ingested_file; + for (auto& live_file : live_files) { + if (live_file.substr(live_file.size() - 4, std::string::npos) == ".sst") { + if (ingested_file.empty() || ingested_file < live_file) { + ingested_file = live_file; + } + } + } + ASSERT_FALSE(ingested_file.empty()); + delete db; + + // Verify the file can be opened and read by SstFileReader. + CheckFile(db_name + ingested_file, keys, true /* check_global_seqno */); + + // Cleanup.
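// [Editorial aside] A minimal usage sketch for the SstFileReader API added
// above, mirroring what the tests exercise (Open, VerifyChecksum,
// NewIterator, GetTableProperties). The file path is hypothetical and error
// handling is trimmed to bare Status checks.
#include <memory>
#include <rocksdb/options.h>
#include <rocksdb/sst_file_reader.h>
void InspectSst() {
  rocksdb::Options options;
  rocksdb::SstFileReader reader(options);
  rocksdb::Status s = reader.Open("/tmp/example.sst");  // hypothetical path
  if (!s.ok()) return;
  s = reader.VerifyChecksum();  // whole-file checksum pass, as in CheckFile()
  if (!s.ok()) return;
  std::unique_ptr<rocksdb::Iterator> it(
      reader.NewIterator(rocksdb::ReadOptions()));
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // it->key() / it->value() come back through the usual Iterator interface.
  }
  // Table-level metadata, including any global sequence number property.
  auto props = reader.GetTableProperties();
  (void)props;
}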
+ ASSERT_OK(DestroyDB(db_name, options)); +} + +} // namespace rocksdb + +int main(int argc, char** argv) { + ::testing::InitGoogleTest(&argc, argv); + return RUN_ALL_TESTS(); +} + +#else +#include + +int main(int /*argc*/, char** /*argv*/) { + fprintf(stderr, + "SKIPPED as SstFileReader is not supported in ROCKSDB_LITE\n"); + return 0; +} + +#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/table/sst_file_writer.cc b/ceph/src/rocksdb/table/sst_file_writer.cc index e0c4c3189..b9a7273e0 100644 --- a/ceph/src/rocksdb/table/sst_file_writer.cc +++ b/ceph/src/rocksdb/table/sst_file_writer.cc @@ -202,6 +202,8 @@ Status SstFileWriter::Open(const std::string& file_path) { compression_type = r->mutable_cf_options.compression; compression_opts = r->ioptions.compression_opts; } + uint64_t sample_for_compression = + r->mutable_cf_options.sample_for_compression; std::vector> int_tbl_prop_collector_factories; @@ -234,11 +236,12 @@ Status SstFileWriter::Open(const std::string& file_path) { TableBuilderOptions table_builder_options( r->ioptions, r->mutable_cf_options, r->internal_comparator, - &int_tbl_prop_collector_factories, compression_type, compression_opts, - nullptr /* compression_dict */, r->skip_filters, r->column_family_name, - unknown_level); - r->file_writer.reset( - new WritableFileWriter(std::move(sst_file), file_path, r->env_options)); + &int_tbl_prop_collector_factories, compression_type, + sample_for_compression, compression_opts, r->skip_filters, + r->column_family_name, unknown_level); + r->file_writer.reset(new WritableFileWriter( + std::move(sst_file), file_path, r->env_options, r->ioptions.env, + nullptr /* stats */, r->ioptions.listeners)); // TODO(tec) : If table_factory is using compressed block cache, we will // be adding the external sst file blocks into it, which is wasteful. diff --git a/ceph/src/rocksdb/table/sst_file_writer_collectors.h b/ceph/src/rocksdb/table/sst_file_writer_collectors.h index 89e0970d8..e1827939f 100644 --- a/ceph/src/rocksdb/table/sst_file_writer_collectors.h +++ b/ceph/src/rocksdb/table/sst_file_writer_collectors.h @@ -5,6 +5,8 @@ #pragma once #include +#include "db/dbformat.h" +#include "db/table_properties_collector.h" #include "rocksdb/types.h" #include "util/string_util.h" @@ -33,6 +35,14 @@ class SstFileWriterPropertiesCollector : public IntTblPropCollector { return Status::OK(); } + virtual void BlockAdd(uint64_t /* blockRawBytes */, + uint64_t /* blockCompressedBytesFast */, + uint64_t /* blockCompressedBytesSlow */) override { + // Intentionally left blank. No interest in collecting stats for + // blocks. 
+ return; + } + virtual Status Finish(UserCollectedProperties* properties) override { // File version std::string version_val; diff --git a/ceph/src/rocksdb/table/table_builder.h b/ceph/src/rocksdb/table/table_builder.h index 0665fac82..20d9a55f2 100644 --- a/ceph/src/rocksdb/table/table_builder.h +++ b/ceph/src/rocksdb/table/table_builder.h @@ -73,37 +73,38 @@ struct TableBuilderOptions { const InternalKeyComparator& _internal_comparator, const std::vector>* _int_tbl_prop_collector_factories, - CompressionType _compression_type, - const CompressionOptions& _compression_opts, - const std::string* _compression_dict, bool _skip_filters, + CompressionType _compression_type, uint64_t _sample_for_compression, + const CompressionOptions& _compression_opts, bool _skip_filters, const std::string& _column_family_name, int _level, - const uint64_t _creation_time = 0, const int64_t _oldest_key_time = 0) + const uint64_t _creation_time = 0, const int64_t _oldest_key_time = 0, + const uint64_t _target_file_size = 0) : ioptions(_ioptions), moptions(_moptions), internal_comparator(_internal_comparator), int_tbl_prop_collector_factories(_int_tbl_prop_collector_factories), compression_type(_compression_type), + sample_for_compression(_sample_for_compression), compression_opts(_compression_opts), - compression_dict(_compression_dict), skip_filters(_skip_filters), column_family_name(_column_family_name), level(_level), creation_time(_creation_time), - oldest_key_time(_oldest_key_time) {} + oldest_key_time(_oldest_key_time), + target_file_size(_target_file_size) {} const ImmutableCFOptions& ioptions; const MutableCFOptions& moptions; const InternalKeyComparator& internal_comparator; const std::vector>* int_tbl_prop_collector_factories; CompressionType compression_type; + uint64_t sample_for_compression; const CompressionOptions& compression_opts; - // Data for presetting the compression library's dictionary, or nullptr. - const std::string* compression_dict; bool skip_filters; // only used by BlockBasedTableBuilder const std::string& column_family_name; int level; // what level this table/file is on, -1 for "not set, don't know" const uint64_t creation_time; const int64_t oldest_key_time; + const uint64_t target_file_size; }; // TableBuilder provides the interface used to build a Table diff --git a/ceph/src/rocksdb/table/table_properties.cc b/ceph/src/rocksdb/table/table_properties.cc index 207a64191..b7aaea481 100644 --- a/ceph/src/rocksdb/table/table_properties.cc +++ b/ceph/src/rocksdb/table/table_properties.cc @@ -78,6 +78,9 @@ std::string TableProperties::ToString( AppendProperty(result, "# data blocks", num_data_blocks, prop_delim, kv_delim); AppendProperty(result, "# entries", num_entries, prop_delim, kv_delim); + AppendProperty(result, "# deletions", num_deletions, prop_delim, kv_delim); + AppendProperty(result, "# merge operands", num_merge_operands, prop_delim, + kv_delim); AppendProperty(result, "# range deletions", num_range_deletions, prop_delim, kv_delim); @@ -150,6 +153,11 @@ std::string TableProperties::ToString( compression_name.empty() ? std::string("N/A") : compression_name, prop_delim, kv_delim); + AppendProperty( + result, "SST file compression options", + compression_options.empty() ? 
std::string("N/A") : compression_options, + prop_delim, kv_delim); + AppendProperty(result, "creation time", creation_time, prop_delim, kv_delim); AppendProperty(result, "time stamp of earliest key", oldest_key_time, @@ -170,6 +178,8 @@ void TableProperties::Add(const TableProperties& tp) { raw_value_size += tp.raw_value_size; num_data_blocks += tp.num_data_blocks; num_entries += tp.num_entries; + num_deletions += tp.num_deletions; + num_merge_operands += tp.num_merge_operands; num_range_deletions += tp.num_range_deletions; } @@ -195,6 +205,9 @@ const std::string TablePropertiesNames::kNumDataBlocks = "rocksdb.num.data.blocks"; const std::string TablePropertiesNames::kNumEntries = "rocksdb.num.entries"; +const std::string TablePropertiesNames::kDeletedKeys = "rocksdb.deleted.keys"; +const std::string TablePropertiesNames::kMergeOperands = + "rocksdb.merge.operands"; const std::string TablePropertiesNames::kNumRangeDeletions = "rocksdb.num.range-deletions"; const std::string TablePropertiesNames::kFilterPolicy = @@ -215,6 +228,8 @@ const std::string TablePropertiesNames::kPrefixExtractorName = const std::string TablePropertiesNames::kPropertyCollectors = "rocksdb.property.collectors"; const std::string TablePropertiesNames::kCompression = "rocksdb.compression"; +const std::string TablePropertiesNames::kCompressionOptions = + "rocksdb.compression_options"; const std::string TablePropertiesNames::kCreationTime = "rocksdb.creation.time"; const std::string TablePropertiesNames::kOldestKeyTime = "rocksdb.oldest.key.time"; diff --git a/ceph/src/rocksdb/table/table_reader.h b/ceph/src/rocksdb/table/table_reader.h index 505b5ba1f..a5f15e130 100644 --- a/ceph/src/rocksdb/table/table_reader.h +++ b/ceph/src/rocksdb/table/table_reader.h @@ -9,6 +9,7 @@ #pragma once #include +#include "db/range_tombstone_fragmenter.h" #include "rocksdb/slice_transform.h" #include "table/internal_iterator.h" @@ -44,7 +45,7 @@ class TableReader { bool skip_filters = false, bool for_compaction = false) = 0; - virtual InternalIterator* NewRangeTombstoneIterator( + virtual FragmentedRangeTombstoneIterator* NewRangeTombstoneIterator( const ReadOptions& /*read_options*/) { return nullptr; } diff --git a/ceph/src/rocksdb/table/table_reader_bench.cc b/ceph/src/rocksdb/table/table_reader_bench.cc index 4032c4a5a..a9b75715b 100644 --- a/ceph/src/rocksdb/table/table_reader_bench.cc +++ b/ceph/src/rocksdb/table/table_reader_bench.cc @@ -86,9 +86,9 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options, const ImmutableCFOptions ioptions(opts); const ColumnFamilyOptions cfo(opts); const MutableCFOptions moptions(cfo); - unique_ptr file_writer; + std::unique_ptr file_writer; if (!through_db) { - unique_ptr file; + std::unique_ptr file; env->NewWritableFile(file_name, &file, env_options); std::vector > @@ -100,8 +100,8 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options, tb = opts.table_factory->NewTableBuilder( TableBuilderOptions( ioptions, moptions, ikc, &int_tbl_prop_collector_factories, - CompressionType::kNoCompression, CompressionOptions(), - nullptr /* compression_dict */, false /* skip_filters */, + CompressionType::kNoCompression, 0 /* sample_for_compression */, + CompressionOptions(), false /* skip_filters */, kDefaultColumnFamilyName, unknown_level), 0 /* column_family_id */, file_writer.get()); } else { @@ -127,9 +127,9 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options, db->Flush(FlushOptions()); } - unique_ptr table_reader; + std::unique_ptr table_reader; if (!through_db) { - 
unique_ptr raf; + std::unique_ptr raf; s = env->NewRandomAccessFile(file_name, &raf, env_options); if (!s.ok()) { fprintf(stderr, "Create File Error: %s\n", s.ToString().c_str()); @@ -137,7 +137,7 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options, } uint64_t file_size; env->GetFileSize(file_name, &file_size); - unique_ptr file_reader( + std::unique_ptr file_reader( new RandomAccessFileReader(std::move(raf), file_name)); s = opts.table_factory->NewTableReader( TableReaderOptions(ioptions, moptions.prefix_extractor.get(), @@ -170,12 +170,12 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options, if (!through_db) { PinnableSlice value; MergeContext merge_context; - RangeDelAggregator range_del_agg(ikc, {} /* snapshots */); + SequenceNumber max_covering_tombstone_seq = 0; GetContext get_context(ioptions.user_comparator, ioptions.merge_operator, ioptions.info_log, ioptions.statistics, GetContext::kNotFound, Slice(key), &value, nullptr, &merge_context, - &range_del_agg, env); + &max_covering_tombstone_seq, env); s = table_reader->Get(read_options, key, &get_context, nullptr); } else { s = db->Get(read_options, key, &result); diff --git a/ceph/src/rocksdb/table/table_test.cc b/ceph/src/rocksdb/table/table_test.cc index 26383fa81..f217fe50a 100644 --- a/ceph/src/rocksdb/table/table_test.cc +++ b/ceph/src/rocksdb/table/table_test.cc @@ -65,17 +65,17 @@ namespace { // DummyPropertiesCollector used to test BlockBasedTableProperties class DummyPropertiesCollector : public TablePropertiesCollector { public: - const char* Name() const { return ""; } + const char* Name() const override { return ""; } - Status Finish(UserCollectedProperties* /*properties*/) { + Status Finish(UserCollectedProperties* /*properties*/) override { return Status::OK(); } - Status Add(const Slice& /*user_key*/, const Slice& /*value*/) { + Status Add(const Slice& /*user_key*/, const Slice& /*value*/) override { return Status::OK(); } - virtual UserCollectedProperties GetReadableProperties() const { + UserCollectedProperties GetReadableProperties() const override { return UserCollectedProperties{}; } }; @@ -83,21 +83,21 @@ class DummyPropertiesCollector : public TablePropertiesCollector { class DummyPropertiesCollectorFactory1 : public TablePropertiesCollectorFactory { public: - virtual TablePropertiesCollector* CreateTablePropertiesCollector( - TablePropertiesCollectorFactory::Context /*context*/) { + TablePropertiesCollector* CreateTablePropertiesCollector( + TablePropertiesCollectorFactory::Context /*context*/) override { return new DummyPropertiesCollector(); } - const char* Name() const { return "DummyPropertiesCollector1"; } + const char* Name() const override { return "DummyPropertiesCollector1"; } }; class DummyPropertiesCollectorFactory2 : public TablePropertiesCollectorFactory { public: - virtual TablePropertiesCollector* CreateTablePropertiesCollector( - TablePropertiesCollectorFactory::Context /*context*/) { + TablePropertiesCollector* CreateTablePropertiesCollector( + TablePropertiesCollectorFactory::Context /*context*/) override { return new DummyPropertiesCollector(); } - const char* Name() const { return "DummyPropertiesCollector2"; } + const char* Name() const override { return "DummyPropertiesCollector2"; } }; // Return reverse of "key". 
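// [Editorial aside] The Dummy collectors above exercise the
// TablePropertiesCollector extension point. A sketch of a non-trivial
// collector under the same interface the diff shows (class and property
// names here are illustrative only):
#include <string>
#include <rocksdb/table_properties.h>
class KeyCountCollector : public rocksdb::TablePropertiesCollector {
 public:
  const char* Name() const override { return "KeyCountCollector"; }
  rocksdb::Status Add(const rocksdb::Slice& /*user_key*/,
                      const rocksdb::Slice& /*value*/) override {
    ++num_keys_;  // called once per key/value pair fed to the table builder
    return rocksdb::Status::OK();
  }
  rocksdb::Status Finish(rocksdb::UserCollectedProperties* props) override {
    // Persisted into the SST file's user_collected_properties map.
    (*props)["example.key-count"] = std::to_string(num_keys_);
    return rocksdb::Status::OK();
  }
  rocksdb::UserCollectedProperties GetReadableProperties() const override {
    return {{"example.key-count", std::to_string(num_keys_)}};
  }
 private:
  uint64_t num_keys_ = 0;
};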
@@ -110,23 +110,23 @@ std::string Reverse(const Slice& key) { class ReverseKeyComparator : public Comparator { public: - virtual const char* Name() const override { + const char* Name() const override { return "rocksdb.ReverseBytewiseComparator"; } - virtual int Compare(const Slice& a, const Slice& b) const override { + int Compare(const Slice& a, const Slice& b) const override { return BytewiseComparator()->Compare(Reverse(a), Reverse(b)); } - virtual void FindShortestSeparator(std::string* start, - const Slice& limit) const override { + void FindShortestSeparator(std::string* start, + const Slice& limit) const override { std::string s = Reverse(*start); std::string l = Reverse(limit); BytewiseComparator()->FindShortestSeparator(&s, l); *start = Reverse(s); } - virtual void FindShortSuccessor(std::string* key) const override { + void FindShortSuccessor(std::string* key) const override { std::string s = Reverse(*key); BytewiseComparator()->FindShortSuccessor(&s); *key = Reverse(s); @@ -212,15 +212,13 @@ class BlockConstructor: public Constructor { : Constructor(cmp), comparator_(cmp), block_(nullptr) { } - ~BlockConstructor() { - delete block_; - } - virtual Status FinishImpl( - const Options& /*options*/, const ImmutableCFOptions& /*ioptions*/, - const MutableCFOptions& /*moptions*/, - const BlockBasedTableOptions& table_options, - const InternalKeyComparator& /*internal_comparator*/, - const stl_wrappers::KVMap& kv_map) override { + ~BlockConstructor() override { delete block_; } + Status FinishImpl(const Options& /*options*/, + const ImmutableCFOptions& /*ioptions*/, + const MutableCFOptions& /*moptions*/, + const BlockBasedTableOptions& table_options, + const InternalKeyComparator& /*internal_comparator*/, + const stl_wrappers::KVMap& kv_map) override { delete block_; block_ = nullptr; BlockBuilder builder(table_options.block_restart_interval); @@ -232,11 +230,10 @@ class BlockConstructor: public Constructor { data_ = builder.Finish().ToString(); BlockContents contents; contents.data = data_; - contents.cachable = false; block_ = new Block(std::move(contents), kDisableGlobalSequenceNumber); return Status::OK(); } - virtual InternalIterator* NewIterator( + InternalIterator* NewIterator( const SliceTransform* /*prefix_extractor*/) const override { return block_->NewIterator(comparator_, comparator_); } @@ -255,32 +252,32 @@ class KeyConvertingIterator : public InternalIterator { explicit KeyConvertingIterator(InternalIterator* iter, bool arena_mode = false) : iter_(iter), arena_mode_(arena_mode) {} - virtual ~KeyConvertingIterator() { + ~KeyConvertingIterator() override { if (arena_mode_) { iter_->~InternalIterator(); } else { delete iter_; } } - virtual bool Valid() const override { return iter_->Valid() && status_.ok(); } - virtual void Seek(const Slice& target) override { + bool Valid() const override { return iter_->Valid() && status_.ok(); } + void Seek(const Slice& target) override { ParsedInternalKey ikey(target, kMaxSequenceNumber, kTypeValue); std::string encoded; AppendInternalKey(&encoded, ikey); iter_->Seek(encoded); } - virtual void SeekForPrev(const Slice& target) override { + void SeekForPrev(const Slice& target) override { ParsedInternalKey ikey(target, kMaxSequenceNumber, kTypeValue); std::string encoded; AppendInternalKey(&encoded, ikey); iter_->SeekForPrev(encoded); } - virtual void SeekToFirst() override { iter_->SeekToFirst(); } - virtual void SeekToLast() override { iter_->SeekToLast(); } - virtual void Next() override { iter_->Next(); } - virtual void Prev() 
override { iter_->Prev(); } + void SeekToFirst() override { iter_->SeekToFirst(); } + void SeekToLast() override { iter_->SeekToLast(); } + void Next() override { iter_->Next(); } + void Prev() override { iter_->Prev(); } - virtual Slice key() const override { + Slice key() const override { assert(Valid()); ParsedInternalKey parsed_key; if (!ParseInternalKey(iter_->key(), &parsed_key)) { @@ -290,8 +287,8 @@ class KeyConvertingIterator : public InternalIterator { return parsed_key.user_key; } - virtual Slice value() const override { return iter_->value(); } - virtual Status status() const override { + Slice value() const override { return iter_->value(); } + Status status() const override { return status_.ok() ? iter_->status() : status_; } @@ -313,28 +310,27 @@ class TableConstructor: public Constructor { : Constructor(cmp), convert_to_internal_key_(convert_to_internal_key), level_(level) {} - ~TableConstructor() { Reset(); } + ~TableConstructor() override { Reset(); } - virtual Status FinishImpl(const Options& options, - const ImmutableCFOptions& ioptions, - const MutableCFOptions& moptions, - const BlockBasedTableOptions& /*table_options*/, - const InternalKeyComparator& internal_comparator, - const stl_wrappers::KVMap& kv_map) override { + Status FinishImpl(const Options& options, const ImmutableCFOptions& ioptions, + const MutableCFOptions& moptions, + const BlockBasedTableOptions& /*table_options*/, + const InternalKeyComparator& internal_comparator, + const stl_wrappers::KVMap& kv_map) override { Reset(); soptions.use_mmap_reads = ioptions.allow_mmap_reads; file_writer_.reset(test::GetWritableFileWriter(new test::StringSink(), "" /* don't care */)); - unique_ptr builder; + std::unique_ptr builder; std::vector> int_tbl_prop_collector_factories; std::string column_family_name; builder.reset(ioptions.table_factory->NewTableBuilder( - TableBuilderOptions( - ioptions, moptions, internal_comparator, - &int_tbl_prop_collector_factories, options.compression, - CompressionOptions(), nullptr /* compression_dict */, - false /* skip_filters */, column_family_name, level_), + TableBuilderOptions(ioptions, moptions, internal_comparator, + &int_tbl_prop_collector_factories, + options.compression, options.sample_for_compression, + options.compression_opts, false /* skip_filters */, + column_family_name, level_), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer_.get())); @@ -369,7 +365,7 @@ class TableConstructor: public Constructor { &table_reader_); } - virtual InternalIterator* NewIterator( + InternalIterator* NewIterator( const SliceTransform* prefix_extractor) const override { ReadOptions ro; InternalIterator* iter = table_reader_->NewIterator(ro, prefix_extractor); @@ -402,7 +398,7 @@ class TableConstructor: public Constructor { virtual TableReader* GetTableReader() { return table_reader_.get(); } - virtual bool AnywayDeleteIterator() const override { + bool AnywayDeleteIterator() const override { return convert_to_internal_key_; } @@ -423,9 +419,9 @@ class TableConstructor: public Constructor { } uint64_t uniq_id_; - unique_ptr file_writer_; - unique_ptr file_reader_; - unique_ptr table_reader_; + std::unique_ptr file_writer_; + std::unique_ptr file_reader_; + std::unique_ptr table_reader_; bool convert_to_internal_key_; int level_; @@ -450,15 +446,12 @@ class MemTableConstructor: public Constructor { wb, kMaxSequenceNumber, 0 /* column_family_id */); memtable_->Ref(); } - ~MemTableConstructor() { - delete memtable_->Unref(); - } - virtual Status FinishImpl( - const 
Options&, const ImmutableCFOptions& ioptions, - const MutableCFOptions& /*moptions*/, - const BlockBasedTableOptions& /*table_options*/, - const InternalKeyComparator& /*internal_comparator*/, - const stl_wrappers::KVMap& kv_map) override { + ~MemTableConstructor() override { delete memtable_->Unref(); } + Status FinishImpl(const Options&, const ImmutableCFOptions& ioptions, + const MutableCFOptions& /*moptions*/, + const BlockBasedTableOptions& /*table_options*/, + const InternalKeyComparator& /*internal_comparator*/, + const stl_wrappers::KVMap& kv_map) override { delete memtable_->Unref(); ImmutableCFOptions mem_ioptions(ioptions); memtable_ = new MemTable(internal_comparator_, mem_ioptions, @@ -472,15 +465,15 @@ class MemTableConstructor: public Constructor { } return Status::OK(); } - virtual InternalIterator* NewIterator( + InternalIterator* NewIterator( const SliceTransform* /*prefix_extractor*/) const override { return new KeyConvertingIterator( memtable_->NewIterator(ReadOptions(), &arena_), true); } - virtual bool AnywayDeleteIterator() const override { return true; } + bool AnywayDeleteIterator() const override { return true; } - virtual bool IsArenaMode() const override { return true; } + bool IsArenaMode() const override { return true; } private: mutable Arena arena_; @@ -494,21 +487,19 @@ class MemTableConstructor: public Constructor { class InternalIteratorFromIterator : public InternalIterator { public: explicit InternalIteratorFromIterator(Iterator* it) : it_(it) {} - virtual bool Valid() const override { return it_->Valid(); } - virtual void Seek(const Slice& target) override { it_->Seek(target); } - virtual void SeekForPrev(const Slice& target) override { - it_->SeekForPrev(target); - } - virtual void SeekToFirst() override { it_->SeekToFirst(); } - virtual void SeekToLast() override { it_->SeekToLast(); } - virtual void Next() override { it_->Next(); } - virtual void Prev() override { it_->Prev(); } + bool Valid() const override { return it_->Valid(); } + void Seek(const Slice& target) override { it_->Seek(target); } + void SeekForPrev(const Slice& target) override { it_->SeekForPrev(target); } + void SeekToFirst() override { it_->SeekToFirst(); } + void SeekToLast() override { it_->SeekToLast(); } + void Next() override { it_->Next(); } + void Prev() override { it_->Prev(); } Slice key() const override { return it_->key(); } Slice value() const override { return it_->value(); } - virtual Status status() const override { return it_->status(); } + Status status() const override { return it_->status(); } private: - unique_ptr it_; + std::unique_ptr it_; }; class DBConstructor: public Constructor { @@ -519,15 +510,13 @@ class DBConstructor: public Constructor { db_ = nullptr; NewDB(); } - ~DBConstructor() { - delete db_; - } - virtual Status FinishImpl( - const Options& /*options*/, const ImmutableCFOptions& /*ioptions*/, - const MutableCFOptions& /*moptions*/, - const BlockBasedTableOptions& /*table_options*/, - const InternalKeyComparator& /*internal_comparator*/, - const stl_wrappers::KVMap& kv_map) override { + ~DBConstructor() override { delete db_; } + Status FinishImpl(const Options& /*options*/, + const ImmutableCFOptions& /*ioptions*/, + const MutableCFOptions& /*moptions*/, + const BlockBasedTableOptions& /*table_options*/, + const InternalKeyComparator& /*internal_comparator*/, + const stl_wrappers::KVMap& kv_map) override { delete db_; db_ = nullptr; NewDB(); @@ -539,12 +528,12 @@ class DBConstructor: public Constructor { return Status::OK(); } - virtual 
InternalIterator* NewIterator( + InternalIterator* NewIterator( const SliceTransform* /*prefix_extractor*/) const override { return new InternalIteratorFromIterator(db_->NewIterator(ReadOptions())); } - virtual DB* db() const override { return db_; } + DB* db() const override { return db_; } private: void NewDB() { @@ -680,9 +669,9 @@ class FixedOrLessPrefixTransform : public SliceTransform { prefix_len_(prefix_len) { } - virtual const char* Name() const override { return "rocksdb.FixedPrefix"; } + const char* Name() const override { return "rocksdb.FixedPrefix"; } - virtual Slice Transform(const Slice& src) const override { + Slice Transform(const Slice& src) const override { assert(InDomain(src)); if (src.size() < prefix_len_) { return src; @@ -690,14 +679,12 @@ class FixedOrLessPrefixTransform : public SliceTransform { return Slice(src.data(), prefix_len_); } - virtual bool InDomain(const Slice& /*src*/) const override { return true; } + bool InDomain(const Slice& /*src*/) const override { return true; } - virtual bool InRange(const Slice& dst) const override { + bool InRange(const Slice& dst) const override { return (dst.size() <= prefix_len_); } - virtual bool FullLengthEnabled(size_t* /*len*/) const override { - return false; - } + bool FullLengthEnabled(size_t* /*len*/) const override { return false; } }; class HarnessTest : public testing::Test { @@ -806,7 +793,7 @@ class HarnessTest : public testing::Test { moptions_ = MutableCFOptions(options_); } - ~HarnessTest() { delete constructor_; } + ~HarnessTest() override { delete constructor_; } void Add(const std::string& key, const std::string& value) { constructor_->Add(key, value); @@ -1024,7 +1011,7 @@ class HarnessTest : public testing::Test { WriteBufferManager write_buffer_; bool support_prev_; bool only_support_prefix_seek_; - shared_ptr internal_comparator_; + std::shared_ptr internal_comparator_; }; static bool Between(uint64_t val, uint64_t low, uint64_t high) { @@ -1141,7 +1128,7 @@ TEST_P(BlockBasedTableTest, BasicBlockBasedTableProperties) { Options options; options.compression = kNoCompression; options.statistics = CreateDBStatistics(); - options.statistics->stats_level_ = StatsLevel::kAll; + options.statistics->set_stats_level(StatsLevel::kAll); BlockBasedTableOptions table_options = GetBlockBasedTableOptions(); table_options.block_restart_interval = 1; options.table_factory.reset(NewBlockBasedTableFactory(table_options)); @@ -1189,7 +1176,7 @@ uint64_t BlockBasedTableTest::IndexUncompressedHelper(bool compressed) { Options options; options.compression = kSnappyCompression; options.statistics = CreateDBStatistics(); - options.statistics->stats_level_ = StatsLevel::kAll; + options.statistics->set_stats_level(StatsLevel::kAll); BlockBasedTableOptions table_options = GetBlockBasedTableOptions(); table_options.block_restart_interval = 1; table_options.enable_index_compression = compressed; @@ -1278,6 +1265,13 @@ TEST_P(BlockBasedTableTest, RangeDelBlock) { std::vector keys = {"1pika", "2chu"}; std::vector vals = {"p", "c"}; + std::vector expected_tombstones = { + {"1pika", "2chu", 0}, + {"2chu", "c", 1}, + {"2chu", "c", 0}, + {"c", "p", 0}, + }; + for (int i = 0; i < 2; i++) { RangeTombstone t(keys[i], vals[i], i); std::pair p = t.Serialize(); @@ -1310,14 +1304,15 @@ TEST_P(BlockBasedTableTest, RangeDelBlock) { ASSERT_FALSE(iter->Valid()); iter->SeekToFirst(); ASSERT_TRUE(iter->Valid()); - for (int i = 0; i < 2; i++) { + for (size_t i = 0; i < expected_tombstones.size(); i++) { ASSERT_TRUE(iter->Valid()); ParsedInternalKey 
parsed_key; ASSERT_TRUE(ParseInternalKey(iter->key(), &parsed_key)); RangeTombstone t(parsed_key, iter->value()); - ASSERT_EQ(t.start_key_, keys[i]); - ASSERT_EQ(t.end_key_, vals[i]); - ASSERT_EQ(t.seq_, i); + const auto& expected_t = expected_tombstones[i]; + ASSERT_EQ(t.start_key_, expected_t.start_key_); + ASSERT_EQ(t.end_key_, expected_t.end_key_); + ASSERT_EQ(t.seq_, expected_t.seq_); iter->Next(); } ASSERT_TRUE(!iter->Valid()); @@ -1385,8 +1380,8 @@ void PrefetchRange(TableConstructor* c, Options* opt, // prefetch auto* table_reader = dynamic_cast(c->GetTableReader()); Status s; - unique_ptr begin, end; - unique_ptr i_begin, i_end; + std::unique_ptr begin, end; + std::unique_ptr i_begin, i_end; if (key_begin != nullptr) { if (c->ConvertToInternalKey()) { i_begin.reset(new InternalKey(key_begin, kMaxSequenceNumber, kTypeValue)); @@ -1417,7 +1412,7 @@ TEST_P(BlockBasedTableTest, PrefetchTest) { // The purpose of this test is to test the prefetching operation built into // BlockBasedTable. Options opt; - unique_ptr ikc; + std::unique_ptr ikc; ikc.reset(new test::PlainInternalKeyComparator(opt.comparator)); opt.compression = kNoCompression; BlockBasedTableOptions table_options = GetBlockBasedTableOptions(); @@ -2009,7 +2004,7 @@ TEST_P(BlockBasedTableTest, FilterBlockInBlockCache) { // -- PART 1: Open with regular block cache. // Since block_cache is disabled, no cache activities will be involved. - unique_ptr iter; + std::unique_ptr iter; int64_t last_cache_bytes_read = 0; // At first, no block will be accessed. @@ -2280,10 +2275,10 @@ class MockCache : public LRUCache { double high_pri_pool_ratio) : LRUCache(capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio) {} - virtual Status Insert(const Slice& key, void* value, size_t charge, - void (*deleter)(const Slice& key, void* value), - Handle** handle = nullptr, - Priority priority = Priority::LOW) override { + Status Insert(const Slice& key, void* value, size_t charge, + void (*deleter)(const Slice& key, void* value), + Handle** handle = nullptr, + Priority priority = Priority::LOW) override { // Replace the deleter with our own so that we keep track of data blocks // erased from the cache deleters_[key.ToString()] = deleter; @@ -2291,8 +2286,7 @@ class MockCache : public LRUCache { priority); } // This is called by the application right after inserting a data block - virtual void TEST_mark_as_data_block(const Slice& key, - size_t charge) override { + void TEST_mark_as_data_block(const Slice& key, size_t charge) override { marked_data_in_cache_[key.ToString()] = charge; marked_size_ += charge; } @@ -2323,93 +2317,122 @@ std::map MockCache::marked_data_in_cache_; // table is closed. This test makes sure that the only items remains in the // cache after the table is closed are raw data blocks. TEST_P(BlockBasedTableTest, NoObjectInCacheAfterTableClose) { + std::vector compression_types{kNoCompression}; + + // The following are the compression library versions supporting compression + // dictionaries. See the test case CacheCompressionDict in the + // DBBlockCacheTest suite. 
+#ifdef ZLIB + compression_types.push_back(kZlibCompression); +#endif // ZLIB +#if LZ4_VERSION_NUMBER >= 10400 + compression_types.push_back(kLZ4Compression); + compression_types.push_back(kLZ4HCCompression); +#endif // LZ4_VERSION_NUMBER >= 10400 +#if ZSTD_VERSION_NUMBER >= 500 + compression_types.push_back(kZSTD); +#endif // ZSTD_VERSION_NUMBER >= 500 + for (int level: {-1, 0, 1, 10}) { - for (auto index_type : - {BlockBasedTableOptions::IndexType::kBinarySearch, + for (auto index_type : + {BlockBasedTableOptions::IndexType::kBinarySearch, BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch}) { - for (bool block_based_filter : {true, false}) { - for (bool partition_filter : {true, false}) { - if (partition_filter && - (block_based_filter || - index_type != - BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch)) { - continue; - } - for (bool index_and_filter_in_cache : {true, false}) { - for (bool pin_l0 : {true, false}) { - for (bool pin_top_level : {true, false}) { - if (pin_l0 && !index_and_filter_in_cache) { - continue; - } - // Create a table - Options opt; - unique_ptr ikc; - ikc.reset(new test::PlainInternalKeyComparator(opt.comparator)); - opt.compression = kNoCompression; - BlockBasedTableOptions table_options = - GetBlockBasedTableOptions(); - table_options.block_size = 1024; - table_options.index_type = - BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch; - table_options.pin_l0_filter_and_index_blocks_in_cache = pin_l0; - table_options.pin_top_level_index_and_filter = pin_top_level; - table_options.partition_filters = partition_filter; - table_options.cache_index_and_filter_blocks = - index_and_filter_in_cache; - // big enough so we don't ever lose cached values. - table_options.block_cache = std::shared_ptr( - new MockCache(16 * 1024 * 1024, 4, false, 0.0)); - table_options.filter_policy.reset( - rocksdb::NewBloomFilterPolicy(10, block_based_filter)); - opt.table_factory.reset(NewBlockBasedTableFactory(table_options)); - - bool convert_to_internal_key = false; - TableConstructor c(BytewiseComparator(), convert_to_internal_key, - level); - std::string user_key = "k01"; - std::string key = - InternalKey(user_key, 0, kTypeValue).Encode().ToString(); - c.Add(key, "hello"); - std::vector keys; - stl_wrappers::KVMap kvmap; - const ImmutableCFOptions ioptions(opt); - const MutableCFOptions moptions(opt); - c.Finish(opt, ioptions, moptions, table_options, *ikc, &keys, - &kvmap); - - // Doing a read to make index/filter loaded into the cache - auto table_reader = - dynamic_cast(c.GetTableReader()); - PinnableSlice value; - GetContext get_context(opt.comparator, nullptr, nullptr, nullptr, - GetContext::kNotFound, user_key, &value, - nullptr, nullptr, nullptr, nullptr); - InternalKey ikey(user_key, 0, kTypeValue); - auto s = table_reader->Get(ReadOptions(), key, &get_context, - moptions.prefix_extractor.get()); - ASSERT_EQ(get_context.State(), GetContext::kFound); - ASSERT_STREQ(value.data(), "hello"); - - // Close the table - c.ResetTableReader(); - - auto usage = table_options.block_cache->GetUsage(); - auto pinned_usage = table_options.block_cache->GetPinnedUsage(); - // The only usage must be for marked data blocks - ASSERT_EQ(usage, MockCache::marked_size_); - // There must be some pinned data since PinnableSlice has not - // released them yet - ASSERT_GT(pinned_usage, 0); - // Release pinnable slice reousrces - value.Reset(); - pinned_usage = table_options.block_cache->GetPinnedUsage(); - ASSERT_EQ(pinned_usage, 0); + for (bool block_based_filter : {true, false}) { + 
for (bool partition_filter : {true, false}) { + if (partition_filter && + (block_based_filter || + index_type != + BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch)) { + continue; } + for (bool index_and_filter_in_cache : {true, false}) { + for (bool pin_l0 : {true, false}) { + for (bool pin_top_level : {true, false}) { + if (pin_l0 && !index_and_filter_in_cache) { + continue; + } + + for (auto compression_type : compression_types) { + for (uint32_t max_dict_bytes : {0, 1 << 14}) { + if (compression_type == kNoCompression && max_dict_bytes) + continue; + + // Create a table + Options opt; + std::unique_ptr<InternalKeyComparator> ikc; + ikc.reset(new test::PlainInternalKeyComparator( + opt.comparator)); + opt.compression = compression_type; + opt.compression_opts.max_dict_bytes = max_dict_bytes; + BlockBasedTableOptions table_options = + GetBlockBasedTableOptions(); + table_options.block_size = 1024; + table_options.index_type = index_type; + table_options.pin_l0_filter_and_index_blocks_in_cache = + pin_l0; + table_options.pin_top_level_index_and_filter = + pin_top_level; + table_options.partition_filters = partition_filter; + table_options.cache_index_and_filter_blocks = + index_and_filter_in_cache; + // big enough so we don't ever lose cached values. + table_options.block_cache = std::make_shared<MockCache>( + 16 * 1024 * 1024, 4, false, 0.0); + table_options.filter_policy.reset( + rocksdb::NewBloomFilterPolicy(10, block_based_filter)); + opt.table_factory.reset(NewBlockBasedTableFactory( + table_options)); + + bool convert_to_internal_key = false; + TableConstructor c(BytewiseComparator(), + convert_to_internal_key, level); + std::string user_key = "k01"; + std::string key = + InternalKey(user_key, 0, kTypeValue).Encode().ToString(); + c.Add(key, "hello"); + std::vector<std::string> keys; + stl_wrappers::KVMap kvmap; + const ImmutableCFOptions ioptions(opt); + const MutableCFOptions moptions(opt); + c.Finish(opt, ioptions, moptions, table_options, *ikc, + &keys, &kvmap); + + // Doing a read to make index/filter loaded into the cache + auto table_reader = + dynamic_cast<BlockBasedTable*>(c.GetTableReader()); + PinnableSlice value; + GetContext get_context(opt.comparator, nullptr, nullptr, + nullptr, GetContext::kNotFound, user_key, &value, + nullptr, nullptr, nullptr, nullptr); + InternalKey ikey(user_key, 0, kTypeValue); + auto s = table_reader->Get(ReadOptions(), key, &get_context, + moptions.prefix_extractor.get()); + ASSERT_EQ(get_context.State(), GetContext::kFound); + ASSERT_STREQ(value.data(), "hello"); + + // Close the table + c.ResetTableReader(); + + auto usage = table_options.block_cache->GetUsage(); + auto pinned_usage = + table_options.block_cache->GetPinnedUsage(); + // The only usage must be for marked data blocks + ASSERT_EQ(usage, MockCache::marked_size_); + // There must be some pinned data since PinnableSlice has + // not released them yet + ASSERT_GT(pinned_usage, 0); + // Release pinnable slice resources + value.Reset(); + pinned_usage = table_options.block_cache->GetPinnedUsage(); + ASSERT_EQ(pinned_usage, 0); + } + } + } + } } } } } - } } // level } @@ -2419,7 +2442,7 @@ TEST_P(BlockBasedTableTest, BlockCacheLeak) { // unique ID from the file.
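// [Editorial aside] A condensed sketch of the cache-accounting invariant the
// rewritten test above checks: a PinnableSlice holding a cached block keeps
// bytes pinned in the block cache until Reset() releases the handle. The
// function name is illustrative; `cache` stands for any rocksdb::Cache such
// as the block cache configured above.
#include <cassert>
#include <memory>
#include <rocksdb/cache.h>
#include <rocksdb/slice.h>
void PinnedUsageDropsAfterReset(const std::shared_ptr<rocksdb::Cache>& cache,
                                rocksdb::PinnableSlice* value) {
  const size_t before = cache->GetPinnedUsage();  // > 0 while `value` pins a block
  value->Reset();                                 // release the cache handle
  // Pinned bytes drop once nothing references the block (0 in the test).
  assert(cache->GetPinnedUsage() <= before);
}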
Options opt; - unique_ptr ikc; + std::unique_ptr ikc; ikc.reset(new test::PlainInternalKeyComparator(opt.comparator)); opt.compression = kNoCompression; BlockBasedTableOptions table_options = GetBlockBasedTableOptions(); @@ -2442,7 +2465,7 @@ TEST_P(BlockBasedTableTest, BlockCacheLeak) { const MutableCFOptions moptions(opt); c.Finish(opt, ioptions, moptions, table_options, *ikc, &keys, &kvmap); - unique_ptr iter( + std::unique_ptr iter( c.NewIterator(moptions.prefix_extractor.get())); iter->SeekToFirst(); while (iter->Valid()) { @@ -2477,6 +2500,78 @@ TEST_P(BlockBasedTableTest, BlockCacheLeak) { c.ResetTableReader(); } +namespace { +class CustomMemoryAllocator : public MemoryAllocator { + public: + const char* Name() const override { return "CustomMemoryAllocator"; } + + void* Allocate(size_t size) override { + ++numAllocations; + auto ptr = new char[size + 16]; + memcpy(ptr, "memory_allocator_", 16); // mangle first 16 bytes + return reinterpret_cast(ptr + 16); + } + void Deallocate(void* p) override { + ++numDeallocations; + char* ptr = reinterpret_cast(p) - 16; + delete[] ptr; + } + + std::atomic numAllocations; + std::atomic numDeallocations; +}; +} // namespace + +TEST_P(BlockBasedTableTest, MemoryAllocator) { + auto custom_memory_allocator = std::make_shared(); + { + Options opt; + std::unique_ptr ikc; + ikc.reset(new test::PlainInternalKeyComparator(opt.comparator)); + opt.compression = kNoCompression; + BlockBasedTableOptions table_options; + table_options.block_size = 1024; + LRUCacheOptions lruOptions; + lruOptions.memory_allocator = custom_memory_allocator; + lruOptions.capacity = 16 * 1024 * 1024; + lruOptions.num_shard_bits = 4; + table_options.block_cache = NewLRUCache(std::move(lruOptions)); + opt.table_factory.reset(NewBlockBasedTableFactory(table_options)); + + TableConstructor c(BytewiseComparator(), + true /* convert_to_internal_key_ */); + c.Add("k01", "hello"); + c.Add("k02", "hello2"); + c.Add("k03", std::string(10000, 'x')); + c.Add("k04", std::string(200000, 'x')); + c.Add("k05", std::string(300000, 'x')); + c.Add("k06", "hello3"); + c.Add("k07", std::string(100000, 'x')); + std::vector keys; + stl_wrappers::KVMap kvmap; + const ImmutableCFOptions ioptions(opt); + const MutableCFOptions moptions(opt); + c.Finish(opt, ioptions, moptions, table_options, *ikc, &keys, &kvmap); + + std::unique_ptr iter( + c.NewIterator(moptions.prefix_extractor.get())); + iter->SeekToFirst(); + while (iter->Valid()) { + iter->key(); + iter->value(); + iter->Next(); + } + ASSERT_OK(iter->status()); + } + + // out of scope, block cache should have been deleted, all allocations + // deallocated + EXPECT_EQ(custom_memory_allocator->numAllocations.load(), + custom_memory_allocator->numDeallocations.load()); + // make sure that allocations actually happened through the cache allocator + EXPECT_GT(custom_memory_allocator->numAllocations.load(), 0); +} + TEST_P(BlockBasedTableTest, NewIndexIteratorLeak) { // A regression test to avoid data race described in // https://github.com/facebook/rocksdb/issues/1267 @@ -2550,7 +2645,7 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) { PlainTableFactory factory(plain_table_options); test::StringSink sink; - unique_ptr file_writer( + std::unique_ptr file_writer( test::GetWritableFileWriter(new test::StringSink(), "" /* don't care */)); Options options; const ImmutableCFOptions ioptions(options); @@ -2563,7 +2658,7 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) { std::unique_ptr builder(factory.NewTableBuilder( TableBuilderOptions( ioptions, 
moptions, ikc, &int_tbl_prop_collector_factories, - kNoCompression, CompressionOptions(), nullptr /* compression_dict */, + kNoCompression, 0 /* sample_for_compression */, CompressionOptions(), false /* skip_filters */, column_family_name, unknown_level), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer.get())); @@ -2579,7 +2674,7 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) { test::StringSink* ss = static_cast(file_writer->writable_file()); - unique_ptr file_reader( + std::unique_ptr file_reader( test::GetRandomAccessFileReader( new test::StringSource(ss->contents(), 72242, true))); @@ -2658,9 +2753,9 @@ static void DoCompressionTest(CompressionType comp) { ASSERT_TRUE(Between(c.ApproximateOffsetOf("abc"), 0, 0)); ASSERT_TRUE(Between(c.ApproximateOffsetOf("k01"), 0, 0)); ASSERT_TRUE(Between(c.ApproximateOffsetOf("k02"), 0, 0)); - ASSERT_TRUE(Between(c.ApproximateOffsetOf("k03"), 2000, 3000)); - ASSERT_TRUE(Between(c.ApproximateOffsetOf("k04"), 2000, 3000)); - ASSERT_TRUE(Between(c.ApproximateOffsetOf("xyz"), 4000, 6100)); + ASSERT_TRUE(Between(c.ApproximateOffsetOf("k03"), 2000, 3500)); + ASSERT_TRUE(Between(c.ApproximateOffsetOf("k04"), 2000, 3500)); + ASSERT_TRUE(Between(c.ApproximateOffsetOf("xyz"), 4000, 6500)); c.ResetTableReader(); } @@ -2706,6 +2801,7 @@ TEST_F(GeneralTableTest, ApproximateOffsetOfCompressed) { } } +#ifndef ROCKSDB_VALGRIND_RUN // RandomizedHarnessTest is very slow for certain combination of arguments // Split into 8 pieces to reduce the time individual tests take. TEST_F(HarnessTest, Randomized1) { @@ -2789,6 +2885,7 @@ TEST_F(HarnessTest, RandomizedLongDB) { ASSERT_GT(files, 0); } #endif // ROCKSDB_LITE +#endif // ROCKSDB_VALGRIND_RUN class MemTableTest : public testing::Test {}; @@ -2824,7 +2921,8 @@ TEST_F(MemTableTest, Simple) { iter = memtable->NewIterator(ReadOptions(), &arena); arena_iter_guard.set(iter); } else { - iter = memtable->NewRangeTombstoneIterator(ReadOptions()); + iter = memtable->NewRangeTombstoneIterator( + ReadOptions(), kMaxSequenceNumber /* read_seq */); iter_guard.reset(iter); } if (iter == nullptr) { @@ -2924,6 +3022,26 @@ TEST_F(HarnessTest, FooterTests) { ASSERT_EQ(decoded_footer.index_handle().size(), index.size()); ASSERT_EQ(decoded_footer.version(), 1U); } + { + // xxhash64 block based + std::string encoded; + Footer footer(kBlockBasedTableMagicNumber, 1); + BlockHandle meta_index(10, 5), index(20, 15); + footer.set_metaindex_handle(meta_index); + footer.set_index_handle(index); + footer.set_checksum(kxxHash64); + footer.EncodeTo(&encoded); + Footer decoded_footer; + Slice encoded_slice(encoded); + decoded_footer.DecodeFrom(&encoded_slice); + ASSERT_EQ(decoded_footer.table_magic_number(), kBlockBasedTableMagicNumber); + ASSERT_EQ(decoded_footer.checksum(), kxxHash64); + ASSERT_EQ(decoded_footer.metaindex_handle().offset(), meta_index.offset()); + ASSERT_EQ(decoded_footer.metaindex_handle().size(), meta_index.size()); + ASSERT_EQ(decoded_footer.index_handle().offset(), index.offset()); + ASSERT_EQ(decoded_footer.index_handle().size(), index.size()); + ASSERT_EQ(decoded_footer.version(), 1U); + } // Plain table is not supported in ROCKSDB_LITE #ifndef ROCKSDB_LITE { @@ -3063,7 +3181,7 @@ TEST_P(IndexBlockRestartIntervalTest, IndexBlockRestartInterval) { class PrefixTest : public testing::Test { public: PrefixTest() : testing::Test() {} - ~PrefixTest() {} + ~PrefixTest() override {} }; namespace { @@ -3151,7 +3269,7 @@ TEST_F(PrefixTest, PrefixAndWholeKeyTest) { TEST_P(BlockBasedTableTest, 
DISABLED_TableWithGlobalSeqno) { BlockBasedTableOptions bbto = GetBlockBasedTableOptions(); test::StringSink* sink = new test::StringSink(); - unique_ptr file_writer( + std::unique_ptr file_writer( test::GetWritableFileWriter(sink, "" /* don't care */)); Options options; options.table_factory.reset(NewBlockBasedTableFactory(bbto)); @@ -3167,7 +3285,7 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) { std::unique_ptr builder(options.table_factory->NewTableBuilder( TableBuilderOptions(ioptions, moptions, ikc, &int_tbl_prop_collector_factories, kNoCompression, - CompressionOptions(), nullptr /* compression_dict */, + 0 /* sample_for_compression */, CompressionOptions(), false /* skip_filters */, column_family_name, -1), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer.get())); @@ -3189,7 +3307,7 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) { // Helper function to get version, global_seqno, global_seqno_offset std::function GetVersionAndGlobalSeqno = [&]() { - unique_ptr file_reader( + std::unique_ptr file_reader( test::GetRandomAccessFileReader( new test::StringSource(ss_rw.contents(), 73342, true))); @@ -3218,9 +3336,9 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) { }; // Helper function to get the contents of the table InternalIterator - unique_ptr table_reader; + std::unique_ptr table_reader; std::function GetTableInternalIter = [&]() { - unique_ptr file_reader( + std::unique_ptr file_reader( test::GetRandomAccessFileReader( new test::StringSource(ss_rw.contents(), 73342, true))); @@ -3333,7 +3451,7 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) { BlockBasedTableOptions bbto = GetBlockBasedTableOptions(); bbto.block_align = true; test::StringSink* sink = new test::StringSink(); - unique_ptr file_writer( + std::unique_ptr file_writer( test::GetWritableFileWriter(sink, "" /* don't care */)); Options options; options.compression = kNoCompression; @@ -3347,7 +3465,7 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) { std::unique_ptr builder(options.table_factory->NewTableBuilder( TableBuilderOptions(ioptions, moptions, ikc, &int_tbl_prop_collector_factories, kNoCompression, - CompressionOptions(), nullptr /* compression_dict */, + 0 /* sample_for_compression */, CompressionOptions(), false /* skip_filters */, column_family_name, -1), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer.get())); @@ -3365,7 +3483,7 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) { file_writer->Flush(); test::RandomRWStringSink ss_rw(sink); - unique_ptr file_reader( + std::unique_ptr file_reader( test::GetRandomAccessFileReader( new test::StringSource(ss_rw.contents(), 73342, true))); @@ -3423,7 +3541,7 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) { BlockBasedTableOptions bbto = GetBlockBasedTableOptions(); bbto.block_align = true; test::StringSink* sink = new test::StringSink(); - unique_ptr file_writer( + std::unique_ptr file_writer( test::GetWritableFileWriter(sink, "" /* don't care */)); Options options; @@ -3440,7 +3558,7 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) { std::unique_ptr builder(options.table_factory->NewTableBuilder( TableBuilderOptions(ioptions, moptions, ikc, &int_tbl_prop_collector_factories, kNoCompression, - CompressionOptions(), nullptr /* compression_dict */, + 0 /* sample_for_compression */, CompressionOptions(), false /* skip_filters */, column_family_name, -1), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer.get())); @@ -3458,7 
+3576,7 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) { file_writer->Flush(); test::RandomRWStringSink ss_rw(sink); - unique_ptr file_reader( + std::unique_ptr file_reader( test::GetRandomAccessFileReader( new test::StringSource(ss_rw.contents(), 73342, true))); @@ -3474,13 +3592,13 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) { BlockContents* contents) { ReadOptions read_options; read_options.verify_checksums = false; - Slice compression_dict; PersistentCacheOptions cache_options; - BlockFetcher block_fetcher(file, nullptr /* prefetch_buffer */, footer, - read_options, handle, contents, ioptions, - false /* decompress */, compression_dict, - cache_options); + BlockFetcher block_fetcher( + file, nullptr /* prefetch_buffer */, footer, read_options, handle, + contents, ioptions, false /* decompress */, + false /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + cache_options); ASSERT_OK(block_fetcher.ReadBlockContents()); }; @@ -3561,12 +3679,12 @@ TEST_P(BlockBasedTableTest, PropertiesMetaBlockLast) { // read metaindex auto metaindex_handle = footer.metaindex_handle(); BlockContents metaindex_contents; - Slice compression_dict; PersistentCacheOptions pcache_opts; BlockFetcher block_fetcher( table_reader.get(), nullptr /* prefetch_buffer */, footer, ReadOptions(), metaindex_handle, &metaindex_contents, ioptions, false /* decompress */, - compression_dict, pcache_opts); + false /*maybe_compressed*/, UncompressionDict::GetEmptyDict(), + pcache_opts, nullptr /*memory_allocator*/); ASSERT_OK(block_fetcher.ReadBlockContents()); Block metaindex_block(std::move(metaindex_contents), kDisableGlobalSequenceNumber); diff --git a/ceph/src/rocksdb/table/two_level_iterator.cc b/ceph/src/rocksdb/table/two_level_iterator.cc index 58ab61c69..a8f617dee 100644 --- a/ceph/src/rocksdb/table/two_level_iterator.cc +++ b/ceph/src/rocksdb/table/two_level_iterator.cc @@ -25,29 +25,29 @@ class TwoLevelIndexIterator : public InternalIteratorBase { TwoLevelIteratorState* state, InternalIteratorBase* first_level_iter); - virtual ~TwoLevelIndexIterator() { + ~TwoLevelIndexIterator() override { first_level_iter_.DeleteIter(false /* is_arena_mode */); second_level_iter_.DeleteIter(false /* is_arena_mode */); delete state_; } - virtual void Seek(const Slice& target) override; - virtual void SeekForPrev(const Slice& target) override; - virtual void SeekToFirst() override; - virtual void SeekToLast() override; - virtual void Next() override; - virtual void Prev() override; + void Seek(const Slice& target) override; + void SeekForPrev(const Slice& target) override; + void SeekToFirst() override; + void SeekToLast() override; + void Next() override; + void Prev() override; - virtual bool Valid() const override { return second_level_iter_.Valid(); } - virtual Slice key() const override { + bool Valid() const override { return second_level_iter_.Valid(); } + Slice key() const override { assert(Valid()); return second_level_iter_.key(); } - virtual BlockHandle value() const override { + BlockHandle value() const override { assert(Valid()); return second_level_iter_.value(); } - virtual Status status() const override { + Status status() const override { if (!first_level_iter_.status().ok()) { assert(second_level_iter_.iter() == nullptr); return first_level_iter_.status(); @@ -58,10 +58,10 @@ class TwoLevelIndexIterator : public InternalIteratorBase { return status_; } } - virtual void SetPinnedItersMgr( + void SetPinnedItersMgr( PinnedIteratorsManager* /*pinned_iters_mgr*/) override {} - 
virtual bool IsKeyPinned() const override { return false; } - virtual bool IsValuePinned() const override { return false; } + bool IsKeyPinned() const override { return false; } + bool IsValuePinned() const override { return false; } private: void SaveError(const Status& s) { diff --git a/ceph/src/rocksdb/third-party/fbson/COMMIT.md b/ceph/src/rocksdb/third-party/fbson/COMMIT.md deleted file mode 100644 index b38b5424d..000000000 --- a/ceph/src/rocksdb/third-party/fbson/COMMIT.md +++ /dev/null @@ -1,5 +0,0 @@ -fbson commit: -https://github.com/facebook/mysql-5.6/commit/55ef9ff25c934659a70b4094e9b406c48e9dd43d - -# TODO. -* Had to convert zero sized array to [1] sized arrays due to the fact that MS Compiler complains about it not being standard. At some point need to contribute this change back to MySql where this code was taken from. diff --git a/ceph/src/rocksdb/third-party/fbson/FbsonDocument.h b/ceph/src/rocksdb/third-party/fbson/FbsonDocument.h deleted file mode 100644 index c69fcb45f..000000000 --- a/ceph/src/rocksdb/third-party/fbson/FbsonDocument.h +++ /dev/null @@ -1,890 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -/* - * This header defines FbsonDocument, FbsonKeyValue, and various value classes - * which are derived from FbsonValue, and a forward iterator for container - * values - essentially everything that is related to FBSON binary data - * structures. - * - * Implementation notes: - * - * None of the classes in this header file can be instantiated directly (i.e. - * you cannot create a FbsonKeyValue or FbsonValue object - all constructors - * are declared non-public). We use the classes as wrappers on the packed FBSON - * bytes (serialized), and cast the classes (types) to the underlying packed - * byte array. - * - * For the same reason, we cannot define any FBSON value class to be virtual, - * since we never call constructors, and will not instantiate vtbl and vptrs. - * - * Therefore, the classes are defined as packed structures (i.e. no data - * alignment and padding), and the private member variables of the classes are - * defined precisely in the same order as the FBSON spec. This ensures we - * access the packed FBSON bytes correctly. - * - * The packed structures are highly optimized for in-place operations with low - * overhead. The reads (and in-place writes) are performed directly on packed - * bytes. There is no memory allocation at all at runtime. - * - * For updates/writes of values that will expand the original FBSON size, the - * write will fail, and the caller needs to handle buffer increase. - * - * ** Iterator ** - * Both ObjectVal class and ArrayVal class have iterator type that you can use - * to declare an iterator on a container object to go through the key-value - * pairs or value list. The iterator has both non-const and const types. - * - * Note: iterators are forward direction only. - * - * ** Query ** - * Querying into containers is through the member functions find (for key/value - * pairs) and get (for array elements), and is in streaming style. We don't - * need to read/scan the whole FBSON packed bytes in order to return results. - * Once the key/index is found, we will stop search. 
You can use text to query - * both objects and array (for array, text will be converted to integer index), - * and use index to retrieve from array. Array index is 0-based. - * - * ** External dictionary ** - * During query processing, you can also pass a callback function, so the - * search will first try to check if the key string exists in the dictionary. - * If so, search will be based on the id instead of the key string. - * - * @author Tian Xia - */ - -#pragma once - -#include -#include -#include - -namespace fbson { - -#pragma pack(push, 1) - -#define FBSON_VER 1 - -// forward declaration -class FbsonValue; -class ObjectVal; - -/* - * FbsonDocument is the main object that accesses and queries FBSON packed - * bytes. NOTE: FbsonDocument only allows object container as the top level - * FBSON value. However, you can use the static method "createValue" to get any - * FbsonValue object from the packed bytes. - * - * FbsonDocument object also dereferences to an object container value - * (ObjectVal) once FBSON is loaded. - * - * ** Load ** - * FbsonDocument is usable after loading packed bytes (memory location) into - * the object. We only need the header and first few bytes of the payload after - * header to verify the FBSON. - * - * Note: creating an FbsonDocument (through createDocument) does not allocate - * any memory. The document object is an efficient wrapper on the packed bytes - * which is accessed directly. - * - * ** Query ** - * Query is through dereferencing into ObjectVal. - */ -class FbsonDocument { - public: - // create an FbsonDocument object from FBSON packed bytes - static FbsonDocument* createDocument(const char* pb, uint32_t size); - - // create an FbsonValue from FBSON packed bytes - static FbsonValue* createValue(const char* pb, uint32_t size); - - uint8_t version() { return header_.ver_; } - - FbsonValue* getValue() { return ((FbsonValue*)payload_); } - - ObjectVal* operator->() { return ((ObjectVal*)payload_); } - - const ObjectVal* operator->() const { return ((const ObjectVal*)payload_); } - - private: - /* - * FbsonHeader class defines FBSON header (internal to FbsonDocument). - * - * Currently it only contains version information (1-byte). We may expand the - * header to include checksum of the FBSON binary for more security. - */ - struct FbsonHeader { - uint8_t ver_; - } header_; - - char payload_[1]; - - FbsonDocument(); - - FbsonDocument(const FbsonDocument&) = delete; - FbsonDocument& operator=(const FbsonDocument&) = delete; -}; - -/* - * FbsonFwdIteratorT implements FBSON's iterator template. - * - * Note: it is an FORWARD iterator only due to the design of FBSON format. 
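To make the load-and-query flow described above concrete, here is a minimal sketch against the API shown in this header. The helper name `ReadCount` and the field name `"count"` are illustrative only; the calls (`createDocument`, `operator->`, `find`, the `Int32Val` cast) are the ones declared in the header.

```cpp
#include "FbsonDocument.h"  // the header being removed above

// Sketch: wrap packed FBSON bytes (no allocation happens) and read a
// hypothetical int32 field named "count" from the top-level object.
int32_t ReadCount(const char* packed, uint32_t size) {
  fbson::FbsonDocument* doc =
      fbson::FbsonDocument::createDocument(packed, size);
  if (doc == nullptr) {
    return -1;  // bad header, wrong version, or size mismatch
  }
  // Dereferencing the document yields the top-level ObjectVal.
  fbson::FbsonValue* v = (*doc)->find("count");
  if (v == nullptr || !v->isInt32()) {
    return -1;
  }
  // C-style casts onto the packed bytes, in the style of the header itself.
  return ((fbson::Int32Val*)v)->val();
}
```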
- */ -template <typename Iter_Type> -class FbsonFwdIteratorT { - typedef Iter_Type iterator; - typedef typename std::iterator_traits<iterator>::pointer pointer; - typedef typename std::iterator_traits<iterator>::reference reference; - - public: - explicit FbsonFwdIteratorT(const iterator& i) : current_(i) {} - - // allow non-const to const iterator conversion (same container type) - template <typename T> - FbsonFwdIteratorT(const FbsonFwdIteratorT<T>& rhs) - : current_(rhs.base()) {} - - bool operator==(const FbsonFwdIteratorT& rhs) const { - return (current_ == rhs.current_); - } - - bool operator!=(const FbsonFwdIteratorT& rhs) const { - return !operator==(rhs); - } - - bool operator<(const FbsonFwdIteratorT& rhs) const { - return (current_ < rhs.current_); - } - - bool operator>(const FbsonFwdIteratorT& rhs) const { return !operator<(rhs); } - - FbsonFwdIteratorT& operator++() { - current_ = (iterator)(((char*)current_) + current_->numPackedBytes()); - return *this; - } - - FbsonFwdIteratorT operator++(int) { - auto tmp = *this; - current_ = (iterator)(((char*)current_) + current_->numPackedBytes()); - return tmp; - } - - explicit operator pointer() { return current_; } - - reference operator*() const { return *current_; } - - pointer operator->() const { return current_; } - - iterator base() const { return current_; } - - private: - iterator current_; -}; - -typedef int (*hDictInsert)(const char* key, unsigned len); -typedef int (*hDictFind)(const char* key, unsigned len); - -/* - * FbsonType defines 10 primitive types and 2 container types, as described - * below. - * - * primitive_value ::= - * 0x00 //null value (0 byte) - * | 0x01 //boolean true (0 byte) - * | 0x02 //boolean false (0 byte) - * | 0x03 int8 //char/int8 (1 byte) - * | 0x04 int16 //int16 (2 bytes) - * | 0x05 int32 //int32 (4 bytes) - * | 0x06 int64 //int64 (8 bytes) - * | 0x07 double //floating point (8 bytes) - * | 0x08 string //variable length string - * | 0x09 binary //variable length binary - * - * container ::= - * 0x0A int32 key_value_list //object, int32 is the total bytes of the object - * | 0x0B int32 value_list //array, int32 is the total bytes of the array - */ -enum class FbsonType : char { - T_Null = 0x00, - T_True = 0x01, - T_False = 0x02, - T_Int8 = 0x03, - T_Int16 = 0x04, - T_Int32 = 0x05, - T_Int64 = 0x06, - T_Double = 0x07, - T_String = 0x08, - T_Binary = 0x09, - T_Object = 0x0A, - T_Array = 0x0B, - NUM_TYPES, -}; - -typedef std::underlying_type<FbsonType>::type FbsonTypeUnder; - -/* - * FbsonKeyValue class defines FBSON key type, as described below. - * - * key ::= - * 0x00 int8 //1-byte dictionary id - * | int8 (byte*) //int8 (>0) is the size of the key string - * - * value ::= primitive_value | container - * - * FbsonKeyValue can be either an id mapping to the key string in an external - * dictionary, or it is the original key string. Whether to read an id or a - * string is decided by the first byte (size_). - * - * Note: a key object must be followed by a value object. Therefore, a key - * object implicitly refers to a key-value pair, and you can get the value - * object right after the key object. The function numPackedBytes hence - * indicates the total size of the key-value pair, so that we will be able to go - * to the next pair from the key. - * - * ** Dictionary size ** - * By default, the dictionary size is 255 (1-byte). Users can define - * "USE_LARGE_DICT" to increase the dictionary size to 65535 (2-byte).
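A short sketch of walking the key-value pairs this comment describes, distinguishing dictionary-id keys (`klen() == 0`) from inline key strings; the helper name is hypothetical.

```cpp
#include <cstdio>

// Sketch: iterate an object's packed key-value pairs. Key strings are not
// null-terminated, so they must be printed with an explicit length.
void DumpKeys(fbson::ObjectVal* obj) {
  for (auto it = obj->begin(); it != obj->end(); ++it) {
    if (it->klen() == 0) {
      // 0 length means the key is stored as a dictionary id.
      printf("key id: %d\n", (int)it->getKeyId());
    } else {
      printf("key: %.*s\n", (int)it->klen(), it->getKeyStr());
    }
    fbson::FbsonValue* val = it->value();  // value sits right after the key
    (void)val;
  }
}
```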
- */ -class FbsonKeyValue { - public: -#ifdef USE_LARGE_DICT - static const int sMaxKeyId = 65535; - typedef uint16_t keyid_type; -#else - static const int sMaxKeyId = 255; - typedef uint8_t keyid_type; -#endif // #ifdef USE_LARGE_DICT - - static const uint8_t sMaxKeyLen = 64; - - // size of the key. 0 indicates it is stored as id - uint8_t klen() const { return size_; } - - // get the key string. Note the string may not be null terminated. - const char* getKeyStr() const { return key_.str_; } - - keyid_type getKeyId() const { return key_.id_; } - - unsigned int keyPackedBytes() const { - return size_ ? (sizeof(size_) + size_) - : (sizeof(size_) + sizeof(keyid_type)); - } - - FbsonValue* value() const { - return (FbsonValue*)(((char*)this) + keyPackedBytes()); - } - - // size of the total packed bytes (key+value) - unsigned int numPackedBytes() const; - - private: - uint8_t size_; - - union key_ { - keyid_type id_; - char str_[1]; - } key_; - - FbsonKeyValue(); -}; - -/* - * FbsonValue is the base class of all FBSON types. It contains only one member - * variable - type info, which can be retrieved by member functions is[Type]() - * or type(). - */ -class FbsonValue { - public: - static const uint32_t sMaxValueLen = 1 << 24; // 16M - - bool isNull() const { return (type_ == FbsonType::T_Null); } - bool isTrue() const { return (type_ == FbsonType::T_True); } - bool isFalse() const { return (type_ == FbsonType::T_False); } - bool isInt8() const { return (type_ == FbsonType::T_Int8); } - bool isInt16() const { return (type_ == FbsonType::T_Int16); } - bool isInt32() const { return (type_ == FbsonType::T_Int32); } - bool isInt64() const { return (type_ == FbsonType::T_Int64); } - bool isDouble() const { return (type_ == FbsonType::T_Double); } - bool isString() const { return (type_ == FbsonType::T_String); } - bool isBinary() const { return (type_ == FbsonType::T_Binary); } - bool isObject() const { return (type_ == FbsonType::T_Object); } - bool isArray() const { return (type_ == FbsonType::T_Array); } - - FbsonType type() const { return type_; } - - // size of the total packed bytes - unsigned int numPackedBytes() const; - - // size of the value in bytes - unsigned int size() const; - - // get the raw byte array of the value - const char* getValuePtr() const; - - // find the FBSON value by a key path string (null terminated) - FbsonValue* findPath(const char* key_path, - const char* delim = ".", - hDictFind handler = nullptr) { - return findPath(key_path, (unsigned int)strlen(key_path), delim, handler); - } - - // find the FBSON value by a key path string (with length) - FbsonValue* findPath(const char* key_path, - unsigned int len, - const char* delim, - hDictFind handler); - - protected: - FbsonType type_; // type info - - FbsonValue(); -}; - -/* - * NumerValT is the template class (derived from FbsonValue) of all number - * types (integers and double). 
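The number classes below support the in-place updates mentioned earlier; a minimal sketch (helper name hypothetical): `setVal` only succeeds when the stored type matches, because the packed bytes cannot change size.

```cpp
// Sketch: bump an int32 value in place inside the packed document.
bool BumpCounter(fbson::FbsonValue* v) {
  if (v == nullptr || !v->isInt32()) {
    return false;  // type mismatch would require a different payload size
  }
  fbson::Int32Val* iv = (fbson::Int32Val*)v;
  return iv->setVal(iv->val() + 1);  // rewrites the 4 payload bytes in place
}
```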
- */ -template <typename T> -class NumberValT : public FbsonValue { - public: - T val() const { return num_; } - - unsigned int numPackedBytes() const { return sizeof(FbsonValue) + sizeof(T); } - - // catch-all for unknown specializations of the template class - bool setVal(T /*value*/) { return false; } - - private: - T num_; - - NumberValT(); -}; - -typedef NumberValT<int8_t> Int8Val; - -// override setVal for Int8Val -template <> -inline bool Int8Val::setVal(int8_t value) { - if (!isInt8()) { - return false; - } - - num_ = value; - return true; -} - -typedef NumberValT<int16_t> Int16Val; - -// override setVal for Int16Val -template <> -inline bool Int16Val::setVal(int16_t value) { - if (!isInt16()) { - return false; - } - - num_ = value; - return true; -} - -typedef NumberValT<int32_t> Int32Val; - -// override setVal for Int32Val -template <> -inline bool Int32Val::setVal(int32_t value) { - if (!isInt32()) { - return false; - } - - num_ = value; - return true; -} - -typedef NumberValT<int64_t> Int64Val; - -// override setVal for Int64Val -template <> -inline bool Int64Val::setVal(int64_t value) { - if (!isInt64()) { - return false; - } - - num_ = value; - return true; -} - -typedef NumberValT<double> DoubleVal; - -// override setVal for DoubleVal -template <> -inline bool DoubleVal::setVal(double value) { - if (!isDouble()) { - return false; - } - - num_ = value; - return true; -} - -/* - * BlobVal is the base class (derived from FbsonValue) for string and binary - * types. The size_ indicates the total bytes of the payload_. - */ -class BlobVal : public FbsonValue { - public: - // size of the blob payload only - unsigned int getBlobLen() const { return size_; } - - // return the blob as byte array - const char* getBlob() const { return payload_; } - - // size of the total packed bytes - unsigned int numPackedBytes() const { - return sizeof(FbsonValue) + sizeof(size_) + size_; - } - - protected: - uint32_t size_; - char payload_[1]; - - // set new blob bytes - bool internalSetVal(const char* blob, uint32_t blobSize) { - // if we cannot fit the new blob, fail the operation - if (blobSize > size_) { - return false; - } - - memcpy(payload_, blob, blobSize); - - // Set the rest of the bytes to 0. Note we cannot change the size_ of the - // current payload, as all values are packed. - memset(payload_ + blobSize, 0, size_ - blobSize); - - return true; - } - - BlobVal(); - - private: - // Disable as this class can only be allocated dynamically - BlobVal(const BlobVal&) = delete; - BlobVal& operator=(const BlobVal&) = delete; -}; - -/* - * Binary type - */ -class BinaryVal : public BlobVal { - public: - bool setVal(const char* blob, uint32_t blobSize) { - if (!isBinary()) { - return false; - } - - return internalSetVal(blob, blobSize); - } - - private: - BinaryVal(); -}; - -/* - * String type - * Note: FBSON string may not be a c-string (NULL-terminated) - */ -class StringVal : public BlobVal { - public: - bool setVal(const char* str, uint32_t blobSize) { - if (!isString()) { - return false; - } - - return internalSetVal(str, blobSize); - } - - private: - StringVal(); -}; - -/* - * ContainerVal is the base class (derived from FbsonValue) for object and - * array types. The size_ indicates the total bytes of the payload_.
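As the design notes at the top of the header say, a write that would expand a packed value fails; here is a sketch of the string case (helper name hypothetical).

```cpp
// Sketch: overwrite a packed string in place. Fails if the new contents are
// longer than the existing payload; on success the leftover bytes are
// zero-filled rather than resized.
bool OverwriteString(fbson::FbsonValue* v, const char* s, uint32_t len) {
  if (v == nullptr || !v->isString()) {
    return false;
  }
  return ((fbson::StringVal*)v)->setVal(s, len);
}
```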
- */ -class ContainerVal : public FbsonValue { - public: - // size of the container payload only - unsigned int getContainerSize() const { return size_; } - - // return the container payload as byte array - const char* getPayload() const { return payload_; } - - // size of the total packed bytes - unsigned int numPackedBytes() const { - return sizeof(FbsonValue) + sizeof(size_) + size_; - } - - protected: - uint32_t size_; - char payload_[1]; - - ContainerVal(); - - ContainerVal(const ContainerVal&) = delete; - ContainerVal& operator=(const ContainerVal&) = delete; -}; - -/* - * Object type - */ -class ObjectVal : public ContainerVal { - public: - // find the FBSON value by a key string (null terminated) - FbsonValue* find(const char* key, hDictFind handler = nullptr) const { - if (!key) - return nullptr; - - return find(key, (unsigned int)strlen(key), handler); - } - - // find the FBSON value by a key string (with length) - FbsonValue* find(const char* key, - unsigned int klen, - hDictFind handler = nullptr) const { - if (!key || !klen) - return nullptr; - - int key_id = -1; - if (handler && (key_id = handler(key, klen)) >= 0) { - return find(key_id); - } - - return internalFind(key, klen); - } - - // find the FBSON value by a key dictionary ID - FbsonValue* find(int key_id) const { - if (key_id < 0 || key_id > FbsonKeyValue::sMaxKeyId) - return nullptr; - - const char* pch = payload_; - const char* fence = payload_ + size_; - - while (pch < fence) { - FbsonKeyValue* pkey = (FbsonKeyValue*)(pch); - if (!pkey->klen() && key_id == pkey->getKeyId()) { - return pkey->value(); - } - pch += pkey->numPackedBytes(); - } - - assert(pch == fence); - - return nullptr; - } - - typedef FbsonKeyValue value_type; - typedef value_type* pointer; - typedef const value_type* const_pointer; - typedef FbsonFwdIteratorT iterator; - typedef FbsonFwdIteratorT const_iterator; - - iterator begin() { return iterator((pointer)payload_); } - - const_iterator begin() const { return const_iterator((pointer)payload_); } - - iterator end() { return iterator((pointer)(payload_ + size_)); } - - const_iterator end() const { - return const_iterator((pointer)(payload_ + size_)); - } - - private: - FbsonValue* internalFind(const char* key, unsigned int klen) const { - const char* pch = payload_; - const char* fence = payload_ + size_; - - while (pch < fence) { - FbsonKeyValue* pkey = (FbsonKeyValue*)(pch); - if (klen == pkey->klen() && strncmp(key, pkey->getKeyStr(), klen) == 0) { - return pkey->value(); - } - pch += pkey->numPackedBytes(); - } - - assert(pch == fence); - - return nullptr; - } - - private: - ObjectVal(); -}; - -/* - * Array type - */ -class ArrayVal : public ContainerVal { - public: - // get the FBSON value at index - FbsonValue* get(int idx) const { - if (idx < 0) - return nullptr; - - const char* pch = payload_; - const char* fence = payload_ + size_; - - while (pch < fence && idx-- > 0) - pch += ((FbsonValue*)pch)->numPackedBytes(); - - if (idx == -1) - return (FbsonValue*)pch; - else { - assert(pch == fence); - return nullptr; - } - } - - // Get number of elements in array - unsigned int numElem() const { - const char* pch = payload_; - const char* fence = payload_ + size_; - - unsigned int num = 0; - while (pch < fence) { - ++num; - pch += ((FbsonValue*)pch)->numPackedBytes(); - } - - assert(pch == fence); - - return num; - } - - typedef FbsonValue value_type; - typedef value_type* pointer; - typedef const value_type* const_pointer; - typedef FbsonFwdIteratorT iterator; - typedef FbsonFwdIteratorT 
const_iterator; - - iterator begin() { return iterator((pointer)payload_); } - - const_iterator begin() const { return const_iterator((pointer)payload_); } - - iterator end() { return iterator((pointer)(payload_ + size_)); } - - const_iterator end() const { - return const_iterator((pointer)(payload_ + size_)); - } - - private: - ArrayVal(); -}; - -inline FbsonDocument* FbsonDocument::createDocument(const char* pb, - uint32_t size) { - if (!pb || size < sizeof(FbsonHeader) + sizeof(FbsonValue)) { - return nullptr; - } - - FbsonDocument* doc = (FbsonDocument*)pb; - if (doc->header_.ver_ != FBSON_VER) { - return nullptr; - } - - FbsonValue* val = (FbsonValue*)doc->payload_; - if (!val->isObject() || size != sizeof(FbsonHeader) + val->numPackedBytes()) { - return nullptr; - } - - return doc; -} - -inline FbsonValue* FbsonDocument::createValue(const char* pb, uint32_t size) { - if (!pb || size < sizeof(FbsonHeader) + sizeof(FbsonValue)) { - return nullptr; - } - - FbsonDocument* doc = (FbsonDocument*)pb; - if (doc->header_.ver_ != FBSON_VER) { - return nullptr; - } - - FbsonValue* val = (FbsonValue*)doc->payload_; - if (size != sizeof(FbsonHeader) + val->numPackedBytes()) { - return nullptr; - } - - return val; -} - -inline unsigned int FbsonKeyValue::numPackedBytes() const { - unsigned int ks = keyPackedBytes(); - FbsonValue* val = (FbsonValue*)(((char*)this) + ks); - return ks + val->numPackedBytes(); -} - -// Poor man's "virtual" function FbsonValue::numPackedBytes -inline unsigned int FbsonValue::numPackedBytes() const { - switch (type_) { - case FbsonType::T_Null: - case FbsonType::T_True: - case FbsonType::T_False: { - return sizeof(type_); - } - - case FbsonType::T_Int8: { - return sizeof(type_) + sizeof(int8_t); - } - case FbsonType::T_Int16: { - return sizeof(type_) + sizeof(int16_t); - } - case FbsonType::T_Int32: { - return sizeof(type_) + sizeof(int32_t); - } - case FbsonType::T_Int64: { - return sizeof(type_) + sizeof(int64_t); - } - case FbsonType::T_Double: { - return sizeof(type_) + sizeof(double); - } - case FbsonType::T_String: - case FbsonType::T_Binary: { - return ((BlobVal*)(this))->numPackedBytes(); - } - - case FbsonType::T_Object: - case FbsonType::T_Array: { - return ((ContainerVal*)(this))->numPackedBytes(); - } - default: - return 0; - } -} - -inline unsigned int FbsonValue::size() const { - switch (type_) { - case FbsonType::T_Int8: { - return sizeof(int8_t); - } - case FbsonType::T_Int16: { - return sizeof(int16_t); - } - case FbsonType::T_Int32: { - return sizeof(int32_t); - } - case FbsonType::T_Int64: { - return sizeof(int64_t); - } - case FbsonType::T_Double: { - return sizeof(double); - } - case FbsonType::T_String: - case FbsonType::T_Binary: { - return ((BlobVal*)(this))->getBlobLen(); - } - - case FbsonType::T_Object: - case FbsonType::T_Array: { - return ((ContainerVal*)(this))->getContainerSize(); - } - case FbsonType::T_Null: - case FbsonType::T_True: - case FbsonType::T_False: - default: - return 0; - } -} - -inline const char* FbsonValue::getValuePtr() const { - switch (type_) { - case FbsonType::T_Int8: - case FbsonType::T_Int16: - case FbsonType::T_Int32: - case FbsonType::T_Int64: - case FbsonType::T_Double: - return ((char*)this) + sizeof(FbsonType); - - case FbsonType::T_String: - case FbsonType::T_Binary: - return ((BlobVal*)(this))->getBlob(); - - case FbsonType::T_Object: - case FbsonType::T_Array: - return ((ContainerVal*)(this))->getPayload(); - - case FbsonType::T_Null: - case FbsonType::T_True: - case FbsonType::T_False: - default: - return 
nullptr; - } -} - -inline FbsonValue* FbsonValue::findPath(const char* key_path, - unsigned int kp_len, - const char* delim = ".", - hDictFind handler = nullptr) { - if (!key_path || !kp_len) - return nullptr; - - if (!delim) - delim = "."; // default delimiter - - FbsonValue* pval = this; - const char* fence = key_path + kp_len; - char idx_buf[21]; // buffer to parse array index (integer value) - - while (pval && key_path < fence) { - const char* key = key_path; - unsigned int klen = 0; - // find the current key - for (; key_path != fence && *key_path != *delim; ++key_path, ++klen) - ; - - if (!klen) - return nullptr; - - switch (pval->type_) { - case FbsonType::T_Object: { - pval = ((ObjectVal*)pval)->find(key, klen, handler); - break; - } - - case FbsonType::T_Array: { - // parse string into an integer (array index) - if (klen >= sizeof(idx_buf)) - return nullptr; - - memcpy(idx_buf, key, klen); - idx_buf[klen] = 0; - - char* end = nullptr; - int index = (int)strtol(idx_buf, &end, 10); - if (end && !*end) - pval = ((fbson::ArrayVal*)pval)->get(index); - else - // incorrect index string - return nullptr; - break; - } - - default: - return nullptr; - } - - // skip the delimiter - if (key_path < fence) { - ++key_path; - if (key_path == fence) - // we have a trailing delimiter at the end - return nullptr; - } - } - - return pval; -} - -#pragma pack(pop) - -} // namespace fbson diff --git a/ceph/src/rocksdb/third-party/fbson/FbsonJsonParser.h b/ceph/src/rocksdb/third-party/fbson/FbsonJsonParser.h deleted file mode 100644 index f4b8ed251..000000000 --- a/ceph/src/rocksdb/third-party/fbson/FbsonJsonParser.h +++ /dev/null @@ -1,742 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -/* - * This file defines FbsonJsonParserT (template) and FbsonJsonParser. - * - * FbsonJsonParserT is a template class which implements a JSON parser. - * FbsonJsonParserT parses JSON text, and serialize it to FBSON binary format - * by using FbsonWriterT object. By default, FbsonJsonParserT creates a new - * FbsonWriterT object with an output stream object. However, you can also - * pass in your FbsonWriterT or any stream object that implements some basic - * interface of std::ostream (see FbsonStream.h). - * - * FbsonJsonParser specializes FbsonJsonParserT with FbsonOutStream type (see - * FbsonStream.h). So unless you want to provide own a different output stream - * type, use FbsonJsonParser object. - * - * ** Parsing JSON ** - * FbsonJsonParserT parses JSON string, and directly serializes into FBSON - * packed bytes. There are three ways to parse a JSON string: (1) using - * c-string, (2) using string with len, (3) using std::istream object. You can - * use custome streambuf to redirect output. FbsonOutBuffer is a streambuf used - * internally if the input is raw character buffer. - * - * You can reuse an FbsonJsonParserT object to parse/serialize multiple JSON - * strings, and the previous FBSON will be overwritten. - * - * If parsing fails (returned false), the error code will be set to one of - * FbsonErrType, and can be retrieved by calling getErrorCode(). - * - * ** External dictionary ** - * During parsing a JSON string, you can pass a callback function to map a key - * string to an id, and store the dictionary id in FBSON to save space. 
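A minimal parse example, assuming the output-stream plumbing shown further down in this header (the helper name is hypothetical):

```cpp
#include <string>

// Sketch: parse UTF-8 JSON into FBSON packed bytes, then reopen the result
// as a document to confirm it is well formed.
bool JsonToFbson(const std::string& json) {
  fbson::FbsonJsonParser parser;
  if (!parser.parse(json)) {
    return false;  // parser.getErrorCode() reports the FbsonErrType
  }
  fbson::FbsonOutStream* out = parser.getWriter().getOutput();
  fbson::FbsonDocument* doc = fbson::FbsonDocument::createDocument(
      out->getBuffer(), (uint32_t)out->getSize());
  return doc != nullptr;
}
```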
The - * purpose of using an external dictionary is more towards a collection of - * documents (which has common keys) rather than a single document, so that - * space saving will be significant. - * - * ** Endianness ** - * Note: FBSON serialization doesn't assume endianness of the server. However - * you will need to ensure that the endianness at the reader side is the same - * as that at the writer side (if they are on different machines). Otherwise, - * proper conversion is needed when a number value is returned to the - * caller/writer. - * - * @author Tian Xia - */ - -#pragma once - -#include -#include -#include "FbsonDocument.h" -#include "FbsonWriter.h" - -namespace fbson { - -const char* const kJsonDelim = " ,]}\t\r\n"; -const char* const kWhiteSpace = " \t\n\r"; - -/* - * Error codes - */ -enum class FbsonErrType { - E_NONE = 0, - E_INVALID_VER, - E_EMPTY_STR, - E_OUTPUT_FAIL, - E_INVALID_DOCU, - E_INVALID_VALUE, - E_INVALID_KEY, - E_INVALID_STR, - E_INVALID_OBJ, - E_INVALID_ARR, - E_INVALID_HEX, - E_INVALID_OCTAL, - E_INVALID_DECIMAL, - E_INVALID_EXPONENT, - E_HEX_OVERFLOW, - E_OCTAL_OVERFLOW, - E_DECIMAL_OVERFLOW, - E_DOUBLE_OVERFLOW, - E_EXPONENT_OVERFLOW, -}; - -/* - * Template FbsonJsonParserT - */ -template -class FbsonJsonParserT { - public: - FbsonJsonParserT() : err_(FbsonErrType::E_NONE) {} - - explicit FbsonJsonParserT(OS_TYPE& os) - : writer_(os), err_(FbsonErrType::E_NONE) {} - - // parse a UTF-8 JSON string - bool parse(const std::string& str, hDictInsert handler = nullptr) { - return parse(str.c_str(), (unsigned int)str.size(), handler); - } - - // parse a UTF-8 JSON c-style string (NULL terminated) - bool parse(const char* c_str, hDictInsert handler = nullptr) { - return parse(c_str, (unsigned int)strlen(c_str), handler); - } - - // parse a UTF-8 JSON string with length - bool parse(const char* pch, unsigned int len, hDictInsert handler = nullptr) { - if (!pch || len == 0) { - err_ = FbsonErrType::E_EMPTY_STR; - return false; - } - - FbsonInBuffer sb(pch, len); - std::istream in(&sb); - return parse(in, handler); - } - - // parse UTF-8 JSON text from an input stream - bool parse(std::istream& in, hDictInsert handler = nullptr) { - bool res = false; - - // reset output stream - writer_.reset(); - - trim(in); - - if (in.peek() == '{') { - in.ignore(); - res = parseObject(in, handler); - } else if (in.peek() == '[') { - in.ignore(); - res = parseArray(in, handler); - } else { - err_ = FbsonErrType::E_INVALID_DOCU; - } - - trim(in); - if (res && !in.eof()) { - err_ = FbsonErrType::E_INVALID_DOCU; - return false; - } - - return res; - } - - FbsonWriterT& getWriter() { return writer_; } - - FbsonErrType getErrorCode() { return err_; } - - // clear error code - void clearErr() { err_ = FbsonErrType::E_NONE; } - - private: - // parse a JSON object (comma-separated list of key-value pairs) - bool parseObject(std::istream& in, hDictInsert handler) { - if (!writer_.writeStartObject()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - trim(in); - - if (in.peek() == '}') { - in.ignore(); - // empty object - if (!writer_.writeEndObject()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - return true; - } - - while (in.good()) { - if (in.get() != '"') { - err_ = FbsonErrType::E_INVALID_KEY; - return false; - } - - if (!parseKVPair(in, handler)) { - return false; - } - - trim(in); - - char ch = in.get(); - if (ch == '}') { - // end of the object - if (!writer_.writeEndObject()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - return true; - } 
else if (ch != ',') { - err_ = FbsonErrType::E_INVALID_OBJ; - return false; - } - - trim(in); - } - - err_ = FbsonErrType::E_INVALID_OBJ; - return false; - } - - // parse a JSON array (comma-separated list of values) - bool parseArray(std::istream& in, hDictInsert handler) { - if (!writer_.writeStartArray()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - trim(in); - - if (in.peek() == ']') { - in.ignore(); - // empty array - if (!writer_.writeEndArray()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - return true; - } - - while (in.good()) { - if (!parseValue(in, handler)) { - return false; - } - - trim(in); - - char ch = in.get(); - if (ch == ']') { - // end of the array - if (!writer_.writeEndArray()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - return true; - } else if (ch != ',') { - err_ = FbsonErrType::E_INVALID_ARR; - return false; - } - - trim(in); - } - - err_ = FbsonErrType::E_INVALID_ARR; - return false; - } - - // parse a key-value pair, separated by ":" - bool parseKVPair(std::istream& in, hDictInsert handler) { - if (parseKey(in, handler) && parseValue(in, handler)) { - return true; - } - - return false; - } - - // parse a key (must be string) - bool parseKey(std::istream& in, hDictInsert handler) { - char key[FbsonKeyValue::sMaxKeyLen]; - int i = 0; - while (in.good() && in.peek() != '"' && i < FbsonKeyValue::sMaxKeyLen) { - key[i++] = in.get(); - } - - if (!in.good() || in.peek() != '"' || i == 0) { - err_ = FbsonErrType::E_INVALID_KEY; - return false; - } - - in.ignore(); // discard '"' - - int key_id = -1; - if (handler) { - key_id = handler(key, i); - } - - if (key_id < 0) { - writer_.writeKey(key, i); - } else { - writer_.writeKey(key_id); - } - - trim(in); - - if (in.get() != ':') { - err_ = FbsonErrType::E_INVALID_OBJ; - return false; - } - - return true; - } - - // parse a value - bool parseValue(std::istream& in, hDictInsert handler) { - bool res = false; - - trim(in); - - switch (in.peek()) { - case 'N': - case 'n': { - in.ignore(); - res = parseNull(in); - break; - } - case 'T': - case 't': { - in.ignore(); - res = parseTrue(in); - break; - } - case 'F': - case 'f': { - in.ignore(); - res = parseFalse(in); - break; - } - case '"': { - in.ignore(); - res = parseString(in); - break; - } - case '{': { - in.ignore(); - res = parseObject(in, handler); - break; - } - case '[': { - in.ignore(); - res = parseArray(in, handler); - break; - } - default: { - res = parseNumber(in); - break; - } - } - - return res; - } - - // parse NULL value - bool parseNull(std::istream& in) { - if (tolower(in.get()) == 'u' && tolower(in.get()) == 'l' && - tolower(in.get()) == 'l') { - writer_.writeNull(); - return true; - } - - err_ = FbsonErrType::E_INVALID_VALUE; - return false; - } - - // parse TRUE value - bool parseTrue(std::istream& in) { - if (tolower(in.get()) == 'r' && tolower(in.get()) == 'u' && - tolower(in.get()) == 'e') { - writer_.writeBool(true); - return true; - } - - err_ = FbsonErrType::E_INVALID_VALUE; - return false; - } - - // parse FALSE value - bool parseFalse(std::istream& in) { - if (tolower(in.get()) == 'a' && tolower(in.get()) == 'l' && - tolower(in.get()) == 's' && tolower(in.get()) == 'e') { - writer_.writeBool(false); - return true; - } - - err_ = FbsonErrType::E_INVALID_VALUE; - return false; - } - - // parse a string - bool parseString(std::istream& in) { - if (!writer_.writeStartString()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - bool escaped = false; - char buffer[4096]; // write 4KB at a 
time - int nread = 0; - while (in.good()) { - char ch = in.get(); - if (ch != '"' || escaped) { - buffer[nread++] = ch; - if (nread == 4096) { - // flush buffer - if (!writer_.writeString(buffer, nread)) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - nread = 0; - } - // set/reset escape - if (ch == '\\' || escaped) { - escaped = !escaped; - } - } else { - // write all remaining bytes in the buffer - if (nread > 0) { - if (!writer_.writeString(buffer, nread)) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - } - // end writing string - if (!writer_.writeEndString()) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - return true; - } - } - - err_ = FbsonErrType::E_INVALID_STR; - return false; - } - - // parse a number - // Number format can be hex, octal, or decimal (including float). - // Only decimal can have (+/-) sign prefix. - bool parseNumber(std::istream& in) { - bool ret = false; - switch (in.peek()) { - case '0': { - in.ignore(); - - if (in.peek() == 'x' || in.peek() == 'X') { - in.ignore(); - ret = parseHex(in); - } else if (in.peek() == '.') { - in.ignore(); - ret = parseDouble(in, 0, 0, 1); - } else { - ret = parseOctal(in); - } - - break; - } - case '-': { - in.ignore(); - ret = parseDecimal(in, -1); - break; - } - case '+': - in.ignore(); -#if defined(__clang__) - [[clang::fallthrough]]; -#elif defined(__GNUC__) && __GNUC__ >= 7 - [[gnu::fallthrough]]; -#endif - default: - ret = parseDecimal(in, 1); - break; - } - - return ret; - } - - // parse a number in hex format - bool parseHex(std::istream& in) { - uint64_t val = 0; - int num_digits = 0; - char ch = tolower(in.peek()); - while (in.good() && !strchr(kJsonDelim, ch) && (++num_digits) <= 16) { - if (ch >= '0' && ch <= '9') { - val = (val << 4) + (ch - '0'); - } else if (ch >= 'a' && ch <= 'f') { - val = (val << 4) + (ch - 'a' + 10); - } else { // unrecognized hex digit - err_ = FbsonErrType::E_INVALID_HEX; - return false; - } - - in.ignore(); - ch = tolower(in.peek()); - } - - int size = 0; - if (num_digits <= 2) { - size = writer_.writeInt8((int8_t)val); - } else if (num_digits <= 4) { - size = writer_.writeInt16((int16_t)val); - } else if (num_digits <= 8) { - size = writer_.writeInt32((int32_t)val); - } else if (num_digits <= 16) { - size = writer_.writeInt64(val); - } else { - err_ = FbsonErrType::E_HEX_OVERFLOW; - return false; - } - - if (size == 0) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - return true; - } - - // parse a number in octal format - bool parseOctal(std::istream& in) { - int64_t val = 0; - char ch = in.peek(); - while (in.good() && !strchr(kJsonDelim, ch)) { - if (ch >= '0' && ch <= '7') { - val = val * 8 + (ch - '0'); - } else { - err_ = FbsonErrType::E_INVALID_OCTAL; - return false; - } - - // check if the number overflows - if (val < 0) { - err_ = FbsonErrType::E_OCTAL_OVERFLOW; - return false; - } - - in.ignore(); - ch = in.peek(); - } - - int size = 0; - if (val <= std::numeric_limits::max()) { - size = writer_.writeInt8((int8_t)val); - } else if (val <= std::numeric_limits::max()) { - size = writer_.writeInt16((int16_t)val); - } else if (val <= std::numeric_limits::max()) { - size = writer_.writeInt32((int32_t)val); - } else { // val <= INT64_MAX - size = writer_.writeInt64(val); - } - - if (size == 0) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - return true; - } - - // parse a number in decimal (including float) - bool parseDecimal(std::istream& in, int sign) { - int64_t val = 0; - int precision = 0; - - char ch = 0; - 
while (in.good() && (ch = in.peek()) == '0') - in.ignore(); - - while (in.good() && !strchr(kJsonDelim, ch)) { - if (ch >= '0' && ch <= '9') { - val = val * 10 + (ch - '0'); - ++precision; - } else if (ch == '.') { - // note we don't pop out '.' - return parseDouble(in, static_cast(val), precision, sign); - } else { - err_ = FbsonErrType::E_INVALID_DECIMAL; - return false; - } - - in.ignore(); - - // if the number overflows int64_t, first parse it as double iff we see a - // decimal point later. Otherwise, will treat it as overflow - if (val < 0 && val > std::numeric_limits::min()) { - return parseDouble(in, static_cast(val), precision, sign); - } - - ch = in.peek(); - } - - if (sign < 0) { - val = -val; - } - - int size = 0; - if (val >= std::numeric_limits::min() && - val <= std::numeric_limits::max()) { - size = writer_.writeInt8((int8_t)val); - } else if (val >= std::numeric_limits::min() && - val <= std::numeric_limits::max()) { - size = writer_.writeInt16((int16_t)val); - } else if (val >= std::numeric_limits::min() && - val <= std::numeric_limits::max()) { - size = writer_.writeInt32((int32_t)val); - } else { // val <= INT64_MAX - size = writer_.writeInt64(val); - } - - if (size == 0) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - return true; - } - - // parse IEEE745 double precision: - // Significand precision length - 15 - // Maximum exponent value - 308 - // - // "If a decimal string with at most 15 significant digits is converted to - // IEEE 754 double precision representation and then converted back to a - // string with the same number of significant digits, then the final string - // should match the original" - bool parseDouble(std::istream& in, double val, int precision, int sign) { - int integ = precision; - int frac = 0; - bool is_frac = false; - - char ch = in.peek(); - if (ch == '.') { - is_frac = true; - in.ignore(); - ch = in.peek(); - } - - int exp = 0; - while (in.good() && !strchr(kJsonDelim, ch)) { - if (ch >= '0' && ch <= '9') { - if (precision < 15) { - val = val * 10 + (ch - '0'); - if (is_frac) { - ++frac; - } else { - ++integ; - } - ++precision; - } else if (!is_frac) { - ++exp; - } - } else if (ch == 'e' || ch == 'E') { - in.ignore(); - int exp2; - if (!parseExponent(in, exp2)) { - return false; - } - - exp += exp2; - // check if exponent overflows - if (exp > 308 || exp < -308) { - err_ = FbsonErrType::E_EXPONENT_OVERFLOW; - return false; - } - - is_frac = true; - break; - } - - in.ignore(); - ch = in.peek(); - } - - if (!is_frac) { - err_ = FbsonErrType::E_DECIMAL_OVERFLOW; - return false; - } - - val *= std::pow(10, exp - frac); - if (std::isnan(val) || std::isinf(val)) { - err_ = FbsonErrType::E_DOUBLE_OVERFLOW; - return false; - } - - if (sign < 0) { - val = -val; - } - - if (writer_.writeDouble(val) == 0) { - err_ = FbsonErrType::E_OUTPUT_FAIL; - return false; - } - - return true; - } - - // parse the exponent part of a double number - bool parseExponent(std::istream& in, int& exp) { - bool neg = false; - - char ch = in.peek(); - if (ch == '+') { - in.ignore(); - ch = in.peek(); - } else if (ch == '-') { - neg = true; - in.ignore(); - ch = in.peek(); - } - - exp = 0; - while (in.good() && !strchr(kJsonDelim, ch)) { - if (ch >= '0' && ch <= '9') { - exp = exp * 10 + (ch - '0'); - } else { - err_ = FbsonErrType::E_INVALID_EXPONENT; - return false; - } - - if (exp > 308) { - err_ = FbsonErrType::E_EXPONENT_OVERFLOW; - return false; - } - - in.ignore(); - ch = in.peek(); - } - - if (neg) { - exp = -exp; - } - - return true; - } - - void 
trim(std::istream& in) { - while (in.good() && strchr(kWhiteSpace, in.peek())) { - in.ignore(); - } - } - - private: - FbsonWriterT writer_; - FbsonErrType err_; -}; - -typedef FbsonJsonParserT FbsonJsonParser; - -} // namespace fbson diff --git a/ceph/src/rocksdb/third-party/fbson/FbsonStream.h b/ceph/src/rocksdb/third-party/fbson/FbsonStream.h deleted file mode 100644 index b20cb1c3b..000000000 --- a/ceph/src/rocksdb/third-party/fbson/FbsonStream.h +++ /dev/null @@ -1,179 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -/* - * This header file defines FbsonInBuffer and FbsonOutStream classes. - * - * ** Input Buffer ** - * FbsonInBuffer is a customer input buffer to wrap raw character buffer. Its - * object instances are used to create std::istream objects interally. - * - * ** Output Stream ** - * FbsonOutStream is a custom output stream classes, to contain the FBSON - * serialized binary. The class is conveniently used to specialize templates of - * FbsonParser and FbsonWriter. - * - * @author Tian Xia - */ - -#pragma once - -#ifndef __STDC_FORMAT_MACROS -#define __STDC_FORMAT_MACROS -#endif - -#if defined OS_WIN && !defined snprintf -#define snprintf _snprintf -#endif - -#include -#include - -namespace fbson { - -// lengths includes sign -#define MAX_INT_DIGITS 11 -#define MAX_INT64_DIGITS 20 -#define MAX_DOUBLE_DIGITS 23 // 1(sign)+16(significant)+1(decimal)+5(exponent) - -/* - * FBSON's implementation of input buffer - */ -class FbsonInBuffer : public std::streambuf { - public: - FbsonInBuffer(const char* str, uint32_t len) { - // this is read buffer and the str will not be changed - // so we use const_cast (ugly!) to remove constness - char* pch(const_cast(str)); - setg(pch, pch, pch + len); - } -}; - -/* - * FBSON's implementation of output stream. - * - * This is a wrapper of a char buffer. By default, the buffer capacity is 1024 - * bytes. We will double the buffer if realloc is needed for writes. 
- */ -class FbsonOutStream : public std::ostream { - public: - explicit FbsonOutStream(uint32_t capacity = 1024) - : std::ostream(nullptr), - head_(nullptr), - size_(0), - capacity_(capacity), - alloc_(true) { - if (capacity_ == 0) { - capacity_ = 1024; - } - - head_ = (char*)malloc(capacity_); - } - - FbsonOutStream(char* buffer, uint32_t capacity) - : std::ostream(nullptr), - head_(buffer), - size_(0), - capacity_(capacity), - alloc_(false) { - assert(buffer && capacity_ > 0); - } - - ~FbsonOutStream() { - if (alloc_) { - free(head_); - } - } - - void put(char c) { write(&c, 1); } - - void write(const char* c_str) { write(c_str, (uint32_t)strlen(c_str)); } - - void write(const char* bytes, uint32_t len) { - if (len == 0) - return; - - if (size_ + len > capacity_) { - realloc(len); - } - - memcpy(head_ + size_, bytes, len); - size_ += len; - } - - // write the integer to string - void write(int i) { - // snprintf automatically adds a NULL, so we need one more char - if (size_ + MAX_INT_DIGITS + 1 > capacity_) { - realloc(MAX_INT_DIGITS + 1); - } - - int len = snprintf(head_ + size_, MAX_INT_DIGITS + 1, "%d", i); - assert(len > 0); - size_ += len; - } - - // write the 64bit integer to string - void write(int64_t l) { - // snprintf automatically adds a NULL, so we need one more char - if (size_ + MAX_INT64_DIGITS + 1 > capacity_) { - realloc(MAX_INT64_DIGITS + 1); - } - - int len = snprintf(head_ + size_, MAX_INT64_DIGITS + 1, "%" PRIi64, l); - assert(len > 0); - size_ += len; - } - - // write the double to string - void write(double d) { - // snprintf automatically adds a NULL, so we need one more char - if (size_ + MAX_DOUBLE_DIGITS + 1 > capacity_) { - realloc(MAX_DOUBLE_DIGITS + 1); - } - - int len = snprintf(head_ + size_, MAX_DOUBLE_DIGITS + 1, "%.15g", d); - assert(len > 0); - size_ += len; - } - - pos_type tellp() const { return size_; } - - void seekp(pos_type pos) { size_ = (uint32_t)pos; } - - const char* getBuffer() const { return head_; } - - pos_type getSize() const { return tellp(); } - - private: - void realloc(uint32_t len) { - assert(capacity_ > 0); - - capacity_ *= 2; - while (capacity_ < size_ + len) { - capacity_ *= 2; - } - - if (alloc_) { - char* new_buf = (char*)::realloc(head_, capacity_); - assert(new_buf); - head_ = new_buf; - } else { - char* new_buf = (char*)::malloc(capacity_); - assert(new_buf); - memcpy(new_buf, head_, size_); - head_ = new_buf; - alloc_ = true; - } - } - - private: - char* head_; - uint32_t size_; - uint32_t capacity_; - bool alloc_; -}; - -} // namespace fbson diff --git a/ceph/src/rocksdb/third-party/fbson/FbsonUtil.h b/ceph/src/rocksdb/third-party/fbson/FbsonUtil.h deleted file mode 100644 index 70ac6cb2b..000000000 --- a/ceph/src/rocksdb/third-party/fbson/FbsonUtil.h +++ /dev/null @@ -1,160 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -/* - * This header file defines miscellaneous utility classes. - * - * @author Tian Xia - */ - -#pragma once - -#include -#include "FbsonDocument.h" - -namespace fbson { - -#define OUT_BUF_SIZE 1024 - -/* - * FbsonToJson converts an FbsonValue object to a JSON string. 
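Typical use of the converter, sketched below (helper name hypothetical); note the returned pointer aliases a buffer owned by the `FbsonToJson` object.

```cpp
#include <cstdio>

// Sketch: render any FbsonValue back to a null-terminated JSON string.
void PrintAsJson(const fbson::FbsonValue* pval) {
  fbson::FbsonToJson to_json;
  printf("%s\n", to_json.json(pval));
}
```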
- */ -class FbsonToJson { - public: - FbsonToJson() : os_(buffer_, OUT_BUF_SIZE) {} - - // get json string - const char* json(const FbsonValue* pval) { - os_.clear(); - os_.seekp(0); - - if (pval) { - intern_json(pval); - } - - os_.put(0); - return os_.getBuffer(); - } - - private: - // recursively convert FbsonValue - void intern_json(const FbsonValue* val) { - switch (val->type()) { - case FbsonType::T_Null: { - os_.write("null", 4); - break; - } - case FbsonType::T_True: { - os_.write("true", 4); - break; - } - case FbsonType::T_False: { - os_.write("false", 5); - break; - } - case FbsonType::T_Int8: { - os_.write(((Int8Val*)val)->val()); - break; - } - case FbsonType::T_Int16: { - os_.write(((Int16Val*)val)->val()); - break; - } - case FbsonType::T_Int32: { - os_.write(((Int32Val*)val)->val()); - break; - } - case FbsonType::T_Int64: { - os_.write(((Int64Val*)val)->val()); - break; - } - case FbsonType::T_Double: { - os_.write(((DoubleVal*)val)->val()); - break; - } - case FbsonType::T_String: { - os_.put('"'); - os_.write(((StringVal*)val)->getBlob(), ((StringVal*)val)->getBlobLen()); - os_.put('"'); - break; - } - case FbsonType::T_Binary: { - os_.write("\"", 9); - os_.write(((BinaryVal*)val)->getBlob(), ((BinaryVal*)val)->getBlobLen()); - os_.write("\"", 9); - break; - } - case FbsonType::T_Object: { - object_to_json((ObjectVal*)val); - break; - } - case FbsonType::T_Array: { - array_to_json((ArrayVal*)val); - break; - } - default: - break; - } - } - - // convert object - void object_to_json(const ObjectVal* val) { - os_.put('{'); - - auto iter = val->begin(); - auto iter_fence = val->end(); - - while (iter < iter_fence) { - // write key - if (iter->klen()) { - os_.put('"'); - os_.write(iter->getKeyStr(), iter->klen()); - os_.put('"'); - } else { - os_.write(iter->getKeyId()); - } - os_.put(':'); - - // convert value - intern_json(iter->value()); - - ++iter; - if (iter != iter_fence) { - os_.put(','); - } - } - - assert(iter == iter_fence); - - os_.put('}'); - } - - // convert array to json - void array_to_json(const ArrayVal* val) { - os_.put('['); - - auto iter = val->begin(); - auto iter_fence = val->end(); - - while (iter != iter_fence) { - // convert value - intern_json((const FbsonValue*)iter); - ++iter; - if (iter != iter_fence) { - os_.put(','); - } - } - - assert(iter == iter_fence); - - os_.put(']'); - } - - private: - FbsonOutStream os_; - char buffer_[OUT_BUF_SIZE]; -}; - -} // namespace fbson diff --git a/ceph/src/rocksdb/third-party/fbson/FbsonWriter.h b/ceph/src/rocksdb/third-party/fbson/FbsonWriter.h deleted file mode 100644 index e5010fade..000000000 --- a/ceph/src/rocksdb/third-party/fbson/FbsonWriter.h +++ /dev/null @@ -1,434 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -/* - * This file defines FbsonWriterT (template) and FbsonWriter. - * - * FbsonWriterT is a template class which implements an FBSON serializer. - * Users call various write functions of FbsonWriterT object to write values - * directly to FBSON packed bytes. All write functions of value or key return - * the number of bytes written to FBSON, or 0 if there is an error. To write an - * object, an array, or a string, you must call writeStart[..] before writing - * values or key, and call writeEnd[..] after finishing at the end. 
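A minimal sketch of the writeStart/writeEnd protocol described above, building the document {"num": 7}; error checks on the returned sizes are omitted, and the helper name is hypothetical.

```cpp
// Sketch: hand-build a one-field object with the writer API.
void BuildSmallDoc() {
  fbson::FbsonWriter writer;   // FbsonWriterT specialized on FbsonOutStream
  writer.writeStartObject();   // first call also emits the 1-byte header
  writer.writeKey("num", 3);   // inline key string with explicit length
  writer.writeInt32(7);
  writer.writeEndObject();     // back-patches the object's size field
  fbson::FbsonOutStream* out = writer.getOutput();
  // out->getBuffer() / out->getSize() now hold the packed document.
  (void)out;
}
```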
- * - * By default, an FbsonWriterT object creates an output stream buffer. - * Alternatively, you can also pass any output stream object to a writer, as - * long as the stream object implements some basic functions of std::ostream - * (such as FbsonOutStream, see FbsonStream.h). - * - * FbsonWriter specializes FbsonWriterT with FbsonOutStream type (see - * FbsonStream.h). So unless you want to provide own a different output stream - * type, use FbsonParser object. - * - * @author Tian Xia - */ - -#pragma once - -#include -#include "FbsonDocument.h" -#include "FbsonStream.h" - -// conversion' conversion from 'type1' to 'type2', possible loss of data -// Can not restore at the header end as the warnings are emitted at the point of -// template instantiation -#if defined(_MSC_VER) -#pragma warning(disable : 4244) -#endif - -namespace fbson { - -template -class FbsonWriterT { - public: - FbsonWriterT() - : alloc_(true), hasHdr_(false), kvState_(WS_Value), str_pos_(0) { - os_ = new OS_TYPE(); - } - - explicit FbsonWriterT(OS_TYPE& os) - : os_(&os), - alloc_(false), - hasHdr_(false), - kvState_(WS_Value), - str_pos_(0) {} - - ~FbsonWriterT() { - if (alloc_) { - delete os_; - } - } - - void reset() { - os_->clear(); - os_->seekp(0); - hasHdr_ = false; - kvState_ = WS_Value; - for (; !stack_.empty(); stack_.pop()) - ; - } - - // write a key string (or key id if an external dict is provided) - uint32_t writeKey(const char* key, - uint8_t len, - hDictInsert handler = nullptr) { - if (len && !stack_.empty() && verifyKeyState()) { - int key_id = -1; - if (handler) { - key_id = handler(key, len); - } - - uint32_t size = sizeof(uint8_t); - if (key_id < 0) { - os_->put(len); - os_->write(key, len); - size += len; - } else if (key_id <= FbsonKeyValue::sMaxKeyId) { - FbsonKeyValue::keyid_type idx = key_id; - os_->put(0); - os_->write((char*)&idx, sizeof(FbsonKeyValue::keyid_type)); - size += sizeof(FbsonKeyValue::keyid_type); - } else { // key id overflow - assert(0); - return 0; - } - - kvState_ = WS_Key; - return size; - } - - return 0; - } - - // write a key id - uint32_t writeKey(FbsonKeyValue::keyid_type idx) { - if (!stack_.empty() && verifyKeyState()) { - os_->put(0); - os_->write((char*)&idx, sizeof(FbsonKeyValue::keyid_type)); - kvState_ = WS_Key; - return sizeof(uint8_t) + sizeof(FbsonKeyValue::keyid_type); - } - - return 0; - } - - uint32_t writeNull() { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Null); - kvState_ = WS_Value; - return sizeof(FbsonValue); - } - - return 0; - } - - uint32_t writeBool(bool b) { - if (!stack_.empty() && verifyValueState()) { - if (b) { - os_->put((FbsonTypeUnder)FbsonType::T_True); - } else { - os_->put((FbsonTypeUnder)FbsonType::T_False); - } - - kvState_ = WS_Value; - return sizeof(FbsonValue); - } - - return 0; - } - - uint32_t writeInt8(int8_t v) { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Int8); - os_->put(v); - kvState_ = WS_Value; - return sizeof(Int8Val); - } - - return 0; - } - - uint32_t writeInt16(int16_t v) { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Int16); - os_->write((char*)&v, sizeof(int16_t)); - kvState_ = WS_Value; - return sizeof(Int16Val); - } - - return 0; - } - - uint32_t writeInt32(int32_t v) { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Int32); - os_->write((char*)&v, sizeof(int32_t)); - kvState_ = WS_Value; - return sizeof(Int32Val); - } - - return 0; - } - - 
uint32_t writeInt64(int64_t v) { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Int64); - os_->write((char*)&v, sizeof(int64_t)); - kvState_ = WS_Value; - return sizeof(Int64Val); - } - - return 0; - } - - uint32_t writeDouble(double v) { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Double); - os_->write((char*)&v, sizeof(double)); - kvState_ = WS_Value; - return sizeof(DoubleVal); - } - - return 0; - } - - // must call writeStartString before writing a string val - bool writeStartString() { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_String); - str_pos_ = os_->tellp(); - - // fill the size bytes with 0 for now - uint32_t size = 0; - os_->write((char*)&size, sizeof(uint32_t)); - - kvState_ = WS_String; - return true; - } - - return false; - } - - // finish writing a string val - bool writeEndString() { - if (kvState_ == WS_String) { - std::streampos cur_pos = os_->tellp(); - int32_t size = (int32_t)(cur_pos - str_pos_ - sizeof(uint32_t)); - assert(size >= 0); - - os_->seekp(str_pos_); - os_->write((char*)&size, sizeof(uint32_t)); - os_->seekp(cur_pos); - - kvState_ = WS_Value; - return true; - } - - return false; - } - - uint32_t writeString(const char* str, uint32_t len) { - if (kvState_ == WS_String) { - os_->write(str, len); - return len; - } - - return 0; - } - - uint32_t writeString(char ch) { - if (kvState_ == WS_String) { - os_->put(ch); - return 1; - } - - return 0; - } - - // must call writeStartBinary before writing a binary val - bool writeStartBinary() { - if (!stack_.empty() && verifyValueState()) { - os_->put((FbsonTypeUnder)FbsonType::T_Binary); - str_pos_ = os_->tellp(); - - // fill the size bytes with 0 for now - uint32_t size = 0; - os_->write((char*)&size, sizeof(uint32_t)); - - kvState_ = WS_Binary; - return true; - } - - return false; - } - - // finish writing a binary val - bool writeEndBinary() { - if (kvState_ == WS_Binary) { - std::streampos cur_pos = os_->tellp(); - int32_t size = (int32_t)(cur_pos - str_pos_ - sizeof(uint32_t)); - assert(size >= 0); - - os_->seekp(str_pos_); - os_->write((char*)&size, sizeof(uint32_t)); - os_->seekp(cur_pos); - - kvState_ = WS_Value; - return true; - } - - return false; - } - - uint32_t writeBinary(const char* bin, uint32_t len) { - if (kvState_ == WS_Binary) { - os_->write(bin, len); - return len; - } - - return 0; - } - - // must call writeStartObject before writing an object val - bool writeStartObject() { - if (stack_.empty() || verifyValueState()) { - if (stack_.empty()) { - // if this is a new FBSON, write the header - if (!hasHdr_) { - writeHeader(); - } else - return false; - } - - os_->put((FbsonTypeUnder)FbsonType::T_Object); - // save the size position - stack_.push(WriteInfo({WS_Object, os_->tellp()})); - - // fill the size bytes with 0 for now - uint32_t size = 0; - os_->write((char*)&size, sizeof(uint32_t)); - - kvState_ = WS_Value; - return true; - } - - return false; - } - - // finish writing an object val - bool writeEndObject() { - if (!stack_.empty() && stack_.top().state == WS_Object && - kvState_ == WS_Value) { - WriteInfo& ci = stack_.top(); - std::streampos cur_pos = os_->tellp(); - int32_t size = (int32_t)(cur_pos - ci.sz_pos - sizeof(uint32_t)); - assert(size >= 0); - - os_->seekp(ci.sz_pos); - os_->write((char*)&size, sizeof(uint32_t)); - os_->seekp(cur_pos); - stack_.pop(); - - return true; - } - - return false; - } - - // must call writeStartArray before writing an array val - 
bool writeStartArray() { - if (stack_.empty() || verifyValueState()) { - if (stack_.empty()) { - // if this is a new FBSON, write the header - if (!hasHdr_) { - writeHeader(); - } else - return false; - } - - os_->put((FbsonTypeUnder)FbsonType::T_Array); - // save the size position - stack_.push(WriteInfo({WS_Array, os_->tellp()})); - - // fill the size bytes with 0 for now - uint32_t size = 0; - os_->write((char*)&size, sizeof(uint32_t)); - - kvState_ = WS_Value; - return true; - } - - return false; - } - - // finish writing an array val - bool writeEndArray() { - if (!stack_.empty() && stack_.top().state == WS_Array && - kvState_ == WS_Value) { - WriteInfo& ci = stack_.top(); - std::streampos cur_pos = os_->tellp(); - int32_t size = (int32_t)(cur_pos - ci.sz_pos - sizeof(uint32_t)); - assert(size >= 0); - - os_->seekp(ci.sz_pos); - os_->write((char*)&size, sizeof(uint32_t)); - os_->seekp(cur_pos); - stack_.pop(); - - return true; - } - - return false; - } - - OS_TYPE* getOutput() { return os_; } - - private: - // verify we are in the right state before writing a value - bool verifyValueState() { - assert(!stack_.empty()); - return (stack_.top().state == WS_Object && kvState_ == WS_Key) || - (stack_.top().state == WS_Array && kvState_ == WS_Value); - } - - // verify we are in the right state before writing a key - bool verifyKeyState() { - assert(!stack_.empty()); - return stack_.top().state == WS_Object && kvState_ == WS_Value; - } - - void writeHeader() { - os_->put(FBSON_VER); - hasHdr_ = true; - } - - private: - enum WriteState { - WS_NONE, - WS_Array, - WS_Object, - WS_Key, - WS_Value, - WS_String, - WS_Binary, - }; - - struct WriteInfo { - WriteState state; - std::streampos sz_pos; - }; - - private: - OS_TYPE* os_; - bool alloc_; - bool hasHdr_; - WriteState kvState_; // key or value state - std::streampos str_pos_; - std::stack<WriteInfo> stack_; -}; - -typedef FbsonWriterT<FbsonOutStream> FbsonWriter; - -} // namespace fbson diff --git a/ceph/src/rocksdb/tools/analyze_txn_stress_test.sh b/ceph/src/rocksdb/tools/analyze_txn_stress_test.sh new file mode 100755 index 000000000..808260608 --- /dev/null +++ b/ceph/src/rocksdb/tools/analyze_txn_stress_test.sh @@ -0,0 +1,76 @@ +#!/bin/bash +# Usage: +# 1. Enable ROCKS_LOG_DETAILS in util/logging.h +# 2. Run ./transaction_test --gtest_filter="MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/*" --gtest_break_on_failure +# 3. SET=1 # 2 or 3 +# 4. LOG=/dev/shm/transaction_testdb_8600601584148590297/LOG +# 5. grep RandomTransactionVerify $LOG | cut -d' ' -f 12 | sort -n # to find verify snapshots +# 6. vn=1345 +# 7. vn_1=1340 +# 8. . tools/analyze_txn_stress_test.sh +echo Input params: +# The rocksdb LOG path +echo $LOG +# Snapshot at which we got RandomTransactionVerify failure +echo $vn +# The snapshot before that where RandomTransactionVerify passed +echo $vn_1 +# The stress tests use 3 sets, one or more of which might have shown inconsistent results. 
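+# Example session (the values below are illustrative, not from a real run): +# SET=2 +# LOG=/dev/shm/transaction_testdb_XXXX/LOG +# vn=1345 +# vn_1=1340 +# . tools/analyze_txn_stress_test.sh +# The script replays the deltas committed between the two snapshots and flags +# any key whose value read at $vn is not explained by its value at $vn_1 plus +# those deltas.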
+SET=${SET-1} # 1 or 2 or 3 +echo Checking set number $SET + +# Find the txns that committed between the two snapshots, and gather the changes they made in /tmp/changes.txt +# 2019/02/28-15:25:51.655477 7fffec9ff700 [DEBUG] [ilities/transactions/write_prepared_txn_db.cc:416] Txn 68497 Committing with 68498 +grep Committing $LOG | awk '{if ($9 <= vn && $9 > vn_1) print $0}' vn=$vn vn_1=${vn_1} > /tmp/txn.txt +# 2019/02/28-15:25:49.046464 7fffe81f5700 [DEBUG] [il/transaction_test_util.cc:216] Commit of 65541 OK (txn12936193128775589751-9089) +for i in `cat /tmp/txn.txt | awk '{print $6}'`; do grep "Commit of $i " $LOG; done > /tmp/names.txt +for n in `cat /tmp/names.txt | awk '{print $9}'`; do grep $n $LOG; done > /tmp/changes.txt +echo "Sum of the changes:" +cat /tmp/changes.txt | grep Insert | awk '{print $12}' | cut -d= -f1 | cut -d+ -f2 | awk '{sum+=$1} END{print sum}' + +# Gather read values at each snapshot +# 2019/02/28-15:25:51.655926 7fffebbff700 [DEBUG] [il/transaction_test_util.cc:347] VerifyRead at 67972 (67693): 000230 value: 15983 +grep "VerifyRead at ${vn_1} (.*): 000${SET}" $LOG | cut -d' ' -f 9- > /tmp/va.txt +grep "VerifyRead at ${vn} (.*): 000${SET}" $LOG | cut -d' ' -f 9- > /tmp/vb.txt + +# For each key in the 2nd snapshot, find the value read by the 1st, apply the adds, and see if the results match. +IFS=$'\n' +for l in `cat /tmp/vb.txt`; +do + grep $l /tmp/va.txt > /dev/null ; + if [[ $? -ne 0 ]]; then + #echo $l + k=`echo $l | awk '{print $1}'`; + v=`echo $l | awk '{print $3}'`; + # 2019/02/28-15:25:19.350111 7fffe81f5700 [DEBUG] [il/transaction_test_util.cc:194] Insert (txn12936193128775589751-2298) OK snap: 16289 key:000219 value: 3772+95=3867 + exp=`grep "\<$k\>" /tmp/changes.txt | tail -1 | cut -d= -f2`; + if [[ $v -ne $exp ]]; then echo $l; fi + else + k=`echo $l | awk '{print $1}'`; + grep "\<$k\>" /tmp/changes.txt + fi; +done + +# Check that all the keys read in the 1st snapshot are still visible in the 2nd +for l in `cat /tmp/va.txt`; +do + k=`echo $l | awk '{print $1}'`; + grep "\<$k\>" /tmp/vb.txt > /dev/null + if [[ $? -ne 0 ]]; then + echo missing key $k + fi +done + +# The following found a bug in ValidateSnapshot. It checks if the adds on each key match up. +grep Insert /tmp/changes.txt | cut -d' ' -f 10 | sort | uniq > /tmp/keys.txt +for k in `cat /tmp/keys.txt`; +do + grep "\<$k\>" /tmp/changes.txt > /tmp/adds.txt; + # 2019/02/28-15:25:19.350111 7fffe81f5700 [DEBUG] [il/transaction_test_util.cc:194] Insert (txn12936193128775589751-2298) OK snap: 16289 key:000219 value: 3772+95=3867 + START=`head -1 /tmp/adds.txt | cut -d' ' -f 12 | cut -d+ -f1` + END=`tail -1 /tmp/adds.txt | cut -d' ' -f 12 | cut -d= -f2` + ADDS=`cat /tmp/adds.txt | grep Insert | awk '{print $12}' | cut -d= -f1 | cut -d+ -f2 | awk '{sum+=$1} END{print sum}'` + EXP=$((START+ADDS)) + # If first + all the adds != last then there was an issue with ValidateSnapshot. + if [[ $END -ne $EXP ]]; then echo inconsistent txn: $k $START+$ADDS=$END; cat /tmp/adds.txt; return 1; fi +done diff --git a/ceph/src/rocksdb/tools/benchmark.sh b/ceph/src/rocksdb/tools/benchmark.sh index 6d0920490..31df59cd7 100755 --- a/ceph/src/rocksdb/tools/benchmark.sh +++ b/ceph/src/rocksdb/tools/benchmark.sh @@ -20,6 +20,7 @@ fi K=1024 M=$((1024 * K)) G=$((1024 * M)) +T=$((1024 * G)) if [ -z $DB_DIR ]; then echo "DB_DIR is not defined" @@ -44,16 +45,16 @@ if [ ! 
-z $DB_BENCH_NO_SYNC ]; then syncval="0"; fi -num_threads=${NUM_THREADS:-16} +num_threads=${NUM_THREADS:-64} mb_written_per_sec=${MB_WRITE_PER_SEC:-0} # Only for tests that do range scans num_nexts_per_seek=${NUM_NEXTS_PER_SEEK:-10} -cache_size=${CACHE_SIZE:-$((1 * G))} +cache_size=${CACHE_SIZE:-$((17179869184))} compression_max_dict_bytes=${COMPRESSION_MAX_DICT_BYTES:-0} -compression_type=${COMPRESSION_TYPE:-snappy} +compression_type=${COMPRESSION_TYPE:-zstd} duration=${DURATION:-0} -num_keys=${NUM_KEYS:-$((1 * G))} +num_keys=${NUM_KEYS:-8000000000} key_size=${KEY_SIZE:-20} value_size=${VALUE_SIZE:-400} block_size=${BLOCK_SIZE:-8192} @@ -99,7 +100,6 @@ const_params=" l0_config=" --level0_file_num_compaction_trigger=4 \ - --level0_slowdown_writes_trigger=12 \ --level0_stop_writes_trigger=20" if [ $duration -gt 0 ]; then @@ -108,30 +108,35 @@ fi params_w="$const_params \ $l0_config \ - --max_background_jobs=20 \ - --max_write_buffer_number=8" + --max_background_compactions=16 \ + --max_write_buffer_number=8 \ + --max_background_flushes=7" params_bulkload="$const_params \ - --max_background_jobs=20 \ + --max_background_compactions=16 \ --max_write_buffer_number=8 \ + --allow_concurrent_memtable_write=false \ + --max_background_flushes=7 \ --level0_file_num_compaction_trigger=$((10 * M)) \ --level0_slowdown_writes_trigger=$((10 * M)) \ --level0_stop_writes_trigger=$((10 * M))" +params_fillseq="$params_w \ + --allow_concurrent_memtable_write=false" # # Tune values for level and universal compaction. # For universal compaction, these level0_* options mean total sorted of runs in # LSM. In level-based compaction, it means number of L0 files. # params_level_compact="$const_params \ - --max_background_jobs=16 \ + --max_background_flushes=4 \ --max_write_buffer_number=4 \ --level0_file_num_compaction_trigger=4 \ --level0_slowdown_writes_trigger=16 \ --level0_stop_writes_trigger=20" params_univ_compact="$const_params \ - --max_background_jobs=20 \ + --max_background_flushes=4 \ --max_write_buffer_number=4 \ --level0_file_num_compaction_trigger=8 \ --level0_slowdown_writes_trigger=16 \ @@ -151,8 +156,8 @@ function summarize_result { stall_pct=$( grep "^Cumulative stall" $test_out| tail -1 | awk '{ print $5 }' ) ops_sec=$( grep ^${bench_name} $test_out | awk '{ print $5 }' ) mb_sec=$( grep ^${bench_name} $test_out | awk '{ print $7 }' ) - lo_wgb=$( grep "^ L0" $test_out | tail -1 | awk '{ print $8 }' ) - sum_wgb=$( grep "^ Sum" $test_out | tail -1 | awk '{ print $8 }' ) + lo_wgb=$( grep "^ L0" $test_out | tail -1 | awk '{ print $9 }' ) + sum_wgb=$( grep "^ Sum" $test_out | tail -1 | awk '{ print $9 }' ) sum_size=$( grep "^ Sum" $test_out | tail -1 | awk '{ printf "%.1f", $3 / 1024.0 }' ) wamp=$( echo "scale=1; $sum_wgb / $lo_wgb" | bc ) wmb_ps=$( echo "scale=1; ( $sum_wgb * 1024.0 ) / $uptime" | bc ) @@ -232,7 +237,7 @@ function run_manual_compaction_worker { --memtablerep=vector \ --allow_concurrent_memtable_write=false \ --disable_wal=1 \ - --max_background_jobs=$4 \ + --max_background_compactions=$4 \ --seed=$( date +%s ) \ 2>&1 | tee -a $fillrandom_output_file" @@ -276,7 +281,7 @@ function run_univ_compaction { # Define a set of benchmarks. 
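# The two arrays below are paired index-by-index: run $i uses ${subcompactions[$i]} subcompactions together with a budget of ${max_background_compactions[$i]} background compactions, so the first run is 1 subcompaction with 16 background compactions and the last is 16 subcompactions with 2.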
subcompactions=(1 2 4 8 16) - max_background_jobs=(20 20 10 5 4) + max_background_compactions=(16 16 8 4 2) i=0 total=${#subcompactions[@]} @@ -285,7 +290,7 @@ function run_univ_compaction { while [ "$i" -lt "$total" ] do run_manual_compaction_worker $io_stats $compaction_style ${subcompactions[$i]} \ - ${max_background_jobs[$i]} + ${max_background_compactions[$i]} ((i++)) done } @@ -311,7 +316,7 @@ function run_fillseq { cmd="./db_bench --benchmarks=fillseq \ --use_existing_db=0 \ --sync=0 \ - $params_w \ + $params_fillseq \ --min_level_to_compress=0 \ --threads=1 \ --memtablerep=vector \ @@ -465,6 +470,12 @@ for job in ${jobs[@]}; do elif [ $job = fillseq_enable_wal ]; then run_fillseq 0 elif [ $job = overwrite ]; then + syncval="0" + params_w="$params_w \ + --writes=125000000 \ + --subcompactions=4 \ + --soft_pending_compaction_bytes_limit=$((1 * T)) \ + --hard_pending_compaction_bytes_limit=$((4 * T)) " run_change overwrite elif [ $job = updaterandom ]; then run_change updaterandom diff --git a/ceph/src/rocksdb/tools/check_format_compatible.sh b/ceph/src/rocksdb/tools/check_format_compatible.sh index 5959fb832..a849b9e7e 100755 --- a/ceph/src/rocksdb/tools/check_format_compatible.sh +++ b/ceph/src/rocksdb/tools/check_format_compatible.sh @@ -56,7 +56,7 @@ declare -a backward_compatible_checkout_objs=("2.2.fb.branch" "2.3.fb.branch" "2 declare -a forward_compatible_checkout_objs=("3.10.fb" "3.11.fb" "3.12.fb" "3.13.fb" "4.0.fb" "4.1.fb" "4.2.fb" "4.3.fb" "4.4.fb" "4.5.fb" "4.6.fb" "4.7.fb" "4.8.fb" "4.9.fb" "4.10.fb" "4.11.fb" "4.12.fb" "4.13.fb" "5.0.fb" "5.1.fb" "5.2.fb" "5.3.fb" "5.4.fb" "5.5.fb" "5.6.fb" "5.7.fb" "5.8.fb" "5.9.fb" "5.10.fb") declare -a forward_compatible_with_options_checkout_objs=("5.11.fb" "5.12.fb" "5.13.fb" "5.14.fb") declare -a checkout_objs=(${backward_compatible_checkout_objs[@]} ${forward_compatible_checkout_objs[@]} ${forward_compatible_with_options_checkout_objs[@]}) -declare -a extern_sst_ingestion_compatible_checkout_objs=("5.14.fb" "5.15.fb") +declare -a extern_sst_ingestion_compatible_checkout_objs=("5.14.fb" "5.15.fb" "5.16.fb" "5.17.fb" "5.18.fb") generate_db() { diff --git a/ceph/src/rocksdb/tools/db_bench_tool.cc b/ceph/src/rocksdb/tools/db_bench_tool.cc index e3560d6fa..0cb4e0eb2 100644 --- a/ceph/src/rocksdb/tools/db_bench_tool.cc +++ b/ceph/src/rocksdb/tools/db_bench_tool.cc @@ -21,6 +21,7 @@ #endif #include #include +#include #include #include #include @@ -33,10 +34,12 @@ #include #include "db/db_impl.h" +#include "db/malloc_stats.h" #include "db/version_set.h" #include "hdfs/env_hdfs.h" #include "monitoring/histogram.h" #include "monitoring/statistics.h" +#include "options/cf_options.h" #include "port/port.h" #include "port/stack_trace.h" #include "rocksdb/cache.h" @@ -45,7 +48,6 @@ #include "rocksdb/filter_policy.h" #include "rocksdb/memtablerep.h" #include "rocksdb/options.h" -#include "options/cf_options.h" #include "rocksdb/perf_context.h" #include "rocksdb/persistent_cache.h" #include "rocksdb/rate_limiter.h" @@ -101,6 +103,7 @@ DEFINE_string( "compact," "compactall," "multireadrandom," + "mixgraph," "readseq," "readtocache," "readreverse," @@ -248,6 +251,10 @@ DEFINE_bool(reverse_iterator, false, "When true use Prev rather than Next for iterators that do " "Seek and then Next"); +DEFINE_int64(max_scan_distance, 0, + "Used to define iterate_upper_bound (or iterate_lower_bound " + "if FLAGS_reverse_iterator is set to true) when value is nonzero"); + DEFINE_bool(use_uint64_comparator, false, "use Uint64 user comparator"); 
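+// Rough sketch of how --max_scan_distance is applied (illustrative pseudo-key math; the real code uses GenerateKeyFromInt): with --max_scan_distance=N and seek position P, a forward scan sets options.iterate_upper_bound to key(min(FLAGS_num, P + N)), while a scan with --reverse_iterator sets options.iterate_lower_bound to key(max(0, P - N)).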
DEFINE_int64(batch_size, 1, "Batch size"); @@ -507,6 +514,8 @@ DEFINE_int32(bloom_bits, -1, "Bloom filter bits per key. Negative means" DEFINE_double(memtable_bloom_size_ratio, 0, "Ratio of memtable size used for bloom filter. 0 means no bloom " "filter."); +DEFINE_bool(memtable_whole_key_filtering, false, + "Try to use whole key bloom filter in memtables."); DEFINE_bool(memtable_use_huge_page, false, "Try to use huge page in memtables."); @@ -514,6 +523,14 @@ DEFINE_bool(use_existing_db, false, "If true, do not destroy the existing" " database. If you set this flag and also specify a benchmark that" " wants a fresh database, that benchmark will fail."); +DEFINE_bool(use_existing_keys, false, + "If true, uses existing keys in the DB, " + "rather than generating new ones. This involves some startup " + "latency to load all keys into memory. It is supported for the " + "same read/overwrite benchmarks as `-use_existing_db=true`, which " + "must also be set for this flag to be enabled. When this flag is " + "set, the value for `-num` will be ignored."); + DEFINE_bool(show_table_properties, false, "If true, then per-level table" " properties will be printed on every stats-interval when" @@ -551,6 +568,8 @@ DEFINE_bool(verify_checksum, true, " from storage"); DEFINE_bool(statistics, false, "Database statistics"); +DEFINE_int32(stats_level, rocksdb::StatsLevel::kExceptDetailedTimers, + "stats level for statistics"); DEFINE_string(statistics_string, "", "Serialized statistics string"); static class std::shared_ptr dbstats; @@ -640,9 +659,11 @@ DEFINE_bool(optimize_filters_for_hits, false, DEFINE_uint64(delete_obsolete_files_period_micros, 0, "Ignored. Left here for backward compatibility"); +DEFINE_int64(writes_before_delete_range, 0, + "Number of writes before DeleteRange is called regularly."); + DEFINE_int64(writes_per_range_tombstone, 0, - "Number of writes between range " - "tombstones"); + "Number of writes between range tombstones"); DEFINE_int64(range_tombstone_width, 100, "Number of keys in tombstone's range"); @@ -687,6 +708,7 @@ DEFINE_string( "RocksDB options related command-line arguments, all other arguments " "that are related to RocksDB options will be ignored:\n" "\t--use_existing_db\n" + "\t--use_existing_keys\n" "\t--statistics\n" "\t--row_cache_size\n" "\t--row_cache_numshardbits\n" @@ -779,6 +801,8 @@ DEFINE_string(compression_type, "snappy", static enum rocksdb::CompressionType FLAGS_compression_type_e = rocksdb::kSnappyCompression; +DEFINE_int64(sample_for_compression, 0, "Sample every N block for compression"); + DEFINE_int32(compression_level, rocksdb::CompressionOptions().level, "Compression level. The meaning of this value is library-" "dependent. If unset, we try to use the default for the library " @@ -925,6 +949,52 @@ DEFINE_uint64( "If non-zero, db_bench will rate-limit the writes going into RocksDB. 
This " "is the global rate in bytes/second."); +// the parameters of mix_graph +DEFINE_double(key_dist_a, 0.0, + "The parameter 'a' of key access distribution model " + "f(x)=a*x^b"); +DEFINE_double(key_dist_b, 0.0, + "The parameter 'b' of key access distribution model " + "f(x)=a*x^b"); +DEFINE_double(value_theta, 0.0, + "The parameter 'theta' of Generized Pareto Distribution " + "f(x)=(1/sigma)*(1+k*(x-theta)/sigma)^-(1/k+1)"); +DEFINE_double(value_k, 0.0, + "The parameter 'k' of Generized Pareto Distribution " + "f(x)=(1/sigma)*(1+k*(x-theta)/sigma)^-(1/k+1)"); +DEFINE_double(value_sigma, 0.0, + "The parameter 'theta' of Generized Pareto Distribution " + "f(x)=(1/sigma)*(1+k*(x-theta)/sigma)^-(1/k+1)"); +DEFINE_double(iter_theta, 0.0, + "The parameter 'theta' of Generized Pareto Distribution " + "f(x)=(1/sigma)*(1+k*(x-theta)/sigma)^-(1/k+1)"); +DEFINE_double(iter_k, 0.0, + "The parameter 'k' of Generized Pareto Distribution " + "f(x)=(1/sigma)*(1+k*(x-theta)/sigma)^-(1/k+1)"); +DEFINE_double(iter_sigma, 0.0, + "The parameter 'sigma' of Generized Pareto Distribution " + "f(x)=(1/sigma)*(1+k*(x-theta)/sigma)^-(1/k+1)"); +DEFINE_double(mix_get_ratio, 1.0, + "The ratio of Get queries of mix_graph workload"); +DEFINE_double(mix_put_ratio, 0.0, + "The ratio of Put queries of mix_graph workload"); +DEFINE_double(mix_seek_ratio, 0.0, + "The ratio of Seek queries of mix_graph workload"); +DEFINE_int64(mix_max_scan_len, 10000, "The max scan length of Iterator"); +DEFINE_int64(mix_ave_kv_size, 512, + "The average key-value size of this workload"); +DEFINE_int64(mix_max_value_size, 1024, "The max value size of this workload"); +DEFINE_double( + sine_mix_rate_noise, 0.0, + "Add the noise ratio to the sine rate, it is between 0.0 and 1.0"); +DEFINE_bool(sine_mix_rate, false, + "Enable the sine QPS control on the mix workload"); +DEFINE_uint64( + sine_mix_rate_interval_milliseconds, 10000, + "Interval of which the sine wave read_rate_limit is recalculated"); +DEFINE_int64(mix_accesses, -1, + "The total query accesses of mix_graph workload"); + DEFINE_uint64( benchmark_read_rate_limit, 0, "If non-zero, db_bench will rate-limit the reads from RocksDB. 
This " @@ -935,6 +1005,9 @@ DEFINE_uint64(max_compaction_bytes, rocksdb::Options().max_compaction_bytes, #ifndef ROCKSDB_LITE DEFINE_bool(readonly, false, "Run read only benchmarks."); + +DEFINE_bool(print_malloc_stats, false, + "Print malloc stats to stdout after benchmarks finish."); #endif // ROCKSDB_LITE DEFINE_bool(disable_auto_compactions, false, "Do not auto trigger compactions"); @@ -1035,13 +1108,18 @@ DEFINE_bool(identity_as_first_hash, false, "the first hash function of cuckoo " DEFINE_bool(dump_malloc_stats, true, "Dump malloc stats in LOG "); DEFINE_uint64(stats_dump_period_sec, rocksdb::Options().stats_dump_period_sec, "Gap between printing stats to log in seconds"); +DEFINE_uint64(stats_persist_period_sec, + rocksdb::Options().stats_persist_period_sec, + "Gap between persisting stats in seconds"); +DEFINE_uint64(stats_history_buffer_size, + rocksdb::Options().stats_history_buffer_size, + "Max number of stats snapshots to keep in memory"); enum RepFactory { kSkipList, kPrefixHash, kVectorRep, kHashLinkedList, - kCuckoo }; static enum RepFactory StringToRepFactory(const char* ctype) { @@ -1055,8 +1133,6 @@ static enum RepFactory StringToRepFactory(const char* ctype) { return kVectorRep; else if (!strcasecmp(ctype, "hash_linkedlist")) return kHashLinkedList; - else if (!strcasecmp(ctype, "cuckoo")) - return kCuckoo; fprintf(stdout, "Cannot parse memreptable %s\n", ctype); return kSkipList; @@ -1137,19 +1213,20 @@ class ReportFileOpEnv : public EnvWrapper { counters_.bytes_written_ = 0; } - Status NewSequentialFile(const std::string& f, unique_ptr* r, + Status NewSequentialFile(const std::string& f, + std::unique_ptr* r, const EnvOptions& soptions) override { class CountingFile : public SequentialFile { private: - unique_ptr target_; + std::unique_ptr target_; ReportFileOpCounters* counters_; public: - CountingFile(unique_ptr&& target, + CountingFile(std::unique_ptr&& target, ReportFileOpCounters* counters) : target_(std::move(target)), counters_(counters) {} - virtual Status Read(size_t n, Slice* result, char* scratch) override { + Status Read(size_t n, Slice* result, char* scratch) override { counters_->read_counter_.fetch_add(1, std::memory_order_relaxed); Status rv = target_->Read(n, result, scratch); counters_->bytes_read_.fetch_add(result->size(), @@ -1157,7 +1234,7 @@ class ReportFileOpEnv : public EnvWrapper { return rv; } - virtual Status Skip(uint64_t n) override { return target_->Skip(n); } + Status Skip(uint64_t n) override { return target_->Skip(n); } }; Status s = target()->NewSequentialFile(f, r, soptions); @@ -1169,19 +1246,19 @@ class ReportFileOpEnv : public EnvWrapper { } Status NewRandomAccessFile(const std::string& f, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& soptions) override { class CountingFile : public RandomAccessFile { private: - unique_ptr target_; + std::unique_ptr target_; ReportFileOpCounters* counters_; public: - CountingFile(unique_ptr&& target, + CountingFile(std::unique_ptr&& target, ReportFileOpCounters* counters) : target_(std::move(target)), counters_(counters) {} - virtual Status Read(uint64_t offset, size_t n, Slice* result, - char* scratch) const override { + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { counters_->read_counter_.fetch_add(1, std::memory_order_relaxed); Status rv = target_->Read(offset, n, result, scratch); counters_->bytes_read_.fetch_add(result->size(), @@ -1198,15 +1275,15 @@ class ReportFileOpEnv : public EnvWrapper { return s; } - Status NewWritableFile(const 
std::string& f, unique_ptr* r, + Status NewWritableFile(const std::string& f, std::unique_ptr* r, const EnvOptions& soptions) override { class CountingFile : public WritableFile { private: - unique_ptr target_; + std::unique_ptr target_; ReportFileOpCounters* counters_; public: - CountingFile(unique_ptr&& target, + CountingFile(std::unique_ptr&& target, ReportFileOpCounters* counters) : target_(std::move(target)), counters_(counters) {} @@ -1772,7 +1849,7 @@ class Stats { fprintf(stdout, "%-12s : %11.3f micros/op %ld ops/sec;%s%s\n", name.ToString().c_str(), - elapsed * 1e6 / done_, + seconds_ * 1e6 / done_, (long)throughput, (extra.empty() ? "" : " "), extra.c_str()); @@ -1968,12 +2045,15 @@ class Benchmark { int prefix_size_; int64_t keys_per_prefix_; int64_t entries_per_batch_; + int64_t writes_before_delete_range_; int64_t writes_per_range_tombstone_; int64_t range_tombstone_width_; int64_t max_num_range_tombstones_; WriteOptions write_options_; Options open_options_; // keep options around to properly destroy db later +#ifndef ROCKSDB_LITE TraceOptions trace_options_; +#endif int64_t reads_; int64_t deletes_; double read_random_exp_range_; @@ -1982,25 +2062,28 @@ class Benchmark { int64_t merge_keys_; bool report_file_operations_; bool use_blob_db_; + std::vector keys_; class ErrorHandlerListener : public EventListener { public: +#ifndef ROCKSDB_LITE ErrorHandlerListener() : mutex_(), cv_(&mutex_), no_auto_recovery_(false), recovery_complete_(false) {} - ~ErrorHandlerListener() {} + ~ErrorHandlerListener() override {} void OnErrorRecoveryBegin(BackgroundErrorReason /*reason*/, - Status /*bg_error*/, bool* auto_recovery) { + Status /*bg_error*/, + bool* auto_recovery) override { if (*auto_recovery && no_auto_recovery_) { *auto_recovery = false; } } - void OnErrorRecoveryCompleted(Status /*old_bg_error*/) { + void OnErrorRecoveryCompleted(Status /*old_bg_error*/) override { InstrumentedMutexLock l(&mutex_); recovery_complete_ = true; cv_.SignalAll(); @@ -2025,6 +2108,10 @@ class Benchmark { InstrumentedCondVar cv_; bool no_auto_recovery_; bool recovery_complete_; +#else // ROCKSDB_LITE + bool WaitForRecovery(uint64_t /*abs_time_us*/) { return true; } + void EnableAutoRecovery(bool /*enable*/) {} +#endif // ROCKSDB_LITE }; std::shared_ptr listener_; @@ -2037,28 +2124,28 @@ class Benchmark { return true; } - inline bool CompressSlice(const CompressionContext& compression_ctx, + inline bool CompressSlice(const CompressionInfo& compression_info, const Slice& input, std::string* compressed) { bool ok = true; switch (FLAGS_compression_type_e) { case rocksdb::kSnappyCompression: - ok = Snappy_Compress(compression_ctx, input.data(), input.size(), + ok = Snappy_Compress(compression_info, input.data(), input.size(), compressed); break; case rocksdb::kZlibCompression: - ok = Zlib_Compress(compression_ctx, 2, input.data(), input.size(), + ok = Zlib_Compress(compression_info, 2, input.data(), input.size(), compressed); break; case rocksdb::kBZip2Compression: - ok = BZip2_Compress(compression_ctx, 2, input.data(), input.size(), + ok = BZip2_Compress(compression_info, 2, input.data(), input.size(), compressed); break; case rocksdb::kLZ4Compression: - ok = LZ4_Compress(compression_ctx, 2, input.data(), input.size(), + ok = LZ4_Compress(compression_info, 2, input.data(), input.size(), compressed); break; case rocksdb::kLZ4HCCompression: - ok = LZ4HC_Compress(compression_ctx, 2, input.data(), input.size(), + ok = LZ4HC_Compress(compression_info, 2, input.data(), input.size(), compressed); break; case 
rocksdb::kXpressCompression: @@ -2066,7 +2153,7 @@ class Benchmark { input.size(), compressed); break; case rocksdb::kZSTD: - ok = ZSTD_Compress(compression_ctx, input.data(), input.size(), + ok = ZSTD_Compress(compression_info, input.data(), input.size(), compressed); break; default: @@ -2110,6 +2197,8 @@ class Benchmark { auto compression = CompressionTypeToString(FLAGS_compression_type_e); fprintf(stdout, "Compression: %s\n", compression.c_str()); + fprintf(stdout, "Compression sampling rate: %" PRId64 "\n", + FLAGS_sample_for_compression); switch (FLAGS_rep_factory) { case kPrefixHash: @@ -2124,9 +2213,6 @@ class Benchmark { case kHashLinkedList: fprintf(stdout, "Memtablerep: hash_linkedlist\n"); break; - case kCuckoo: - fprintf(stdout, "Memtablerep: cuckoo\n"); - break; } fprintf(stdout, "Perf Level: %d\n", FLAGS_perf_level); @@ -2149,10 +2235,12 @@ class Benchmark { const int len = FLAGS_block_size; std::string input_str(len, 'y'); std::string compressed; - CompressionContext compression_ctx(FLAGS_compression_type_e, - Options().compression_opts); - bool result = - CompressSlice(compression_ctx, Slice(input_str), &compressed); + CompressionOptions opts; + CompressionContext context(FLAGS_compression_type_e); + CompressionInfo info(opts, context, CompressionDict::GetEmptyDict(), + FLAGS_compression_type_e, + FLAGS_sample_for_compression); + bool result = CompressSlice(info, Slice(input_str), &compressed); if (!result) { fprintf(stdout, "WARNING: %s compression is not enabled\n", @@ -2253,13 +2341,13 @@ class Benchmark { class KeepFilter : public CompactionFilter { public: - virtual bool Filter(int /*level*/, const Slice& /*key*/, - const Slice& /*value*/, std::string* /*new_value*/, - bool* /*value_changed*/) const override { + bool Filter(int /*level*/, const Slice& /*key*/, const Slice& /*value*/, + std::string* /*new_value*/, + bool* /*value_changed*/) const override { return false; } - virtual const char* Name() const override { return "KeepFilter"; } + const char* Name() const override { return "KeepFilter"; } }; std::shared_ptr NewCache(int64_t capacity) { @@ -2397,6 +2485,13 @@ class Benchmark { // | key 00000 | // ---------------------------- void GenerateKeyFromInt(uint64_t v, int64_t num_keys, Slice* key) { + if (!keys_.empty()) { + assert(FLAGS_use_existing_keys); + assert(keys_.size() == static_cast(num_keys)); + assert(v < static_cast(num_keys)); + *key = keys_[v]; + return; + } char* start = const_cast(key->data()); char* pos = start; if (keys_per_prefix_ > 0) { @@ -2495,6 +2590,7 @@ void VerifyDBFromDB(std::string& truth_db_name) { value_size_ = FLAGS_value_size; key_size_ = FLAGS_key_size; entries_per_batch_ = FLAGS_batch_size; + writes_before_delete_range_ = FLAGS_writes_before_delete_range; writes_per_range_tombstone_ = FLAGS_writes_per_range_tombstone; range_tombstone_width_ = FLAGS_range_tombstone_width; max_num_range_tombstones_ = FLAGS_max_num_range_tombstones; @@ -2611,6 +2707,8 @@ void VerifyDBFromDB(std::string& truth_db_name) { fprintf(stderr, "entries_per_batch = %" PRIi64 "\n", entries_per_batch_); method = &Benchmark::MultiReadRandom; + } else if (name == "mixgraph") { + method = &Benchmark::MixGraph; } else if (name == "readmissing") { ++key_size_; method = &Benchmark::ReadRandom; @@ -2849,6 +2947,7 @@ void VerifyDBFromDB(std::string& truth_db_name) { } SetPerfLevel(static_cast (shared->perf_level)); + perf_context.EnablePerLevelPerfContext(); thread->stats.Start(thread->tid); (arg->bm->*(arg->method))(thread); thread->stats.Stop(); @@ -3004,13 +3103,15 
@@ void VerifyDBFromDB(std::string& truth_db_name) { int64_t produced = 0; bool ok = true; std::string compressed; - CompressionContext compression_ctx(FLAGS_compression_type_e, - Options().compression_opts); - + CompressionOptions opts; + CompressionContext context(FLAGS_compression_type_e); + CompressionInfo info(opts, context, CompressionDict::GetEmptyDict(), + FLAGS_compression_type_e, + FLAGS_sample_for_compression); // Compress 1G while (ok && bytes < int64_t(1) << 30) { compressed.clear(); - ok = CompressSlice(compression_ctx, input, &compressed); + ok = CompressSlice(info, input, &compressed); produced += compressed.size(); bytes += input.size(); thread->stats.FinishedOps(nullptr, nullptr, 1, kCompress); @@ -3032,15 +3133,21 @@ void VerifyDBFromDB(std::string& truth_db_name) { Slice input = gen.Generate(FLAGS_block_size); std::string compressed; + CompressionContext compression_ctx(FLAGS_compression_type_e); + CompressionOptions compression_opts; + CompressionInfo compression_info( + compression_opts, compression_ctx, CompressionDict::GetEmptyDict(), + FLAGS_compression_type_e, FLAGS_sample_for_compression); UncompressionContext uncompression_ctx(FLAGS_compression_type_e); - CompressionContext compression_ctx(FLAGS_compression_type_e, - Options().compression_opts); + UncompressionInfo uncompression_info(uncompression_ctx, + UncompressionDict::GetEmptyDict(), + FLAGS_compression_type_e); - bool ok = CompressSlice(compression_ctx, input, &compressed); + bool ok = CompressSlice(compression_info, input, &compressed); int64_t bytes = 0; int decompress_size; while (ok && bytes < 1024 * 1048576) { - char *uncompressed = nullptr; + CacheAllocationPtr uncompressed; switch (FLAGS_compression_type_e) { case rocksdb::kSnappyCompression: { // get size and allocate here to make comparison fair @@ -3050,45 +3157,44 @@ void VerifyDBFromDB(std::string& truth_db_name) { ok = false; break; } - uncompressed = new char[ulength]; + uncompressed = AllocateBlock(ulength, nullptr); ok = Snappy_Uncompress(compressed.data(), compressed.size(), - uncompressed); + uncompressed.get()); break; } case rocksdb::kZlibCompression: - uncompressed = Zlib_Uncompress(uncompression_ctx, compressed.data(), + uncompressed = Zlib_Uncompress(uncompression_info, compressed.data(), compressed.size(), &decompress_size, 2); - ok = uncompressed != nullptr; + ok = uncompressed.get() != nullptr; break; case rocksdb::kBZip2Compression: uncompressed = BZip2_Uncompress(compressed.data(), compressed.size(), &decompress_size, 2); - ok = uncompressed != nullptr; + ok = uncompressed.get() != nullptr; break; case rocksdb::kLZ4Compression: - uncompressed = LZ4_Uncompress(uncompression_ctx, compressed.data(), + uncompressed = LZ4_Uncompress(uncompression_info, compressed.data(), compressed.size(), &decompress_size, 2); - ok = uncompressed != nullptr; + ok = uncompressed.get() != nullptr; break; case rocksdb::kLZ4HCCompression: - uncompressed = LZ4_Uncompress(uncompression_ctx, compressed.data(), + uncompressed = LZ4_Uncompress(uncompression_info, compressed.data(), compressed.size(), &decompress_size, 2); - ok = uncompressed != nullptr; + ok = uncompressed.get() != nullptr; break; case rocksdb::kXpressCompression: - uncompressed = XPRESS_Uncompress(compressed.data(), compressed.size(), - &decompress_size); - ok = uncompressed != nullptr; + uncompressed.reset(XPRESS_Uncompress( + compressed.data(), compressed.size(), &decompress_size)); + ok = uncompressed.get() != nullptr; break; case rocksdb::kZSTD: - uncompressed = 
ZSTD_Uncompress(uncompression_ctx, compressed.data(), + uncompressed = ZSTD_Uncompress(uncompression_info, compressed.data(), compressed.size(), &decompress_size); - ok = uncompressed != nullptr; + ok = uncompressed.get() != nullptr; break; default: ok = false; } - delete[] uncompressed; bytes += input.size(); thread->stats.FinishedOps(nullptr, nullptr, 1, kUncompress); } @@ -3153,9 +3259,10 @@ void VerifyDBFromDB(std::string& truth_db_name) { options.use_direct_io_for_flush_and_compaction = FLAGS_use_direct_io_for_flush_and_compaction; #ifndef ROCKSDB_LITE + options.ttl = FLAGS_fifo_compaction_ttl; options.compaction_options_fifo = CompactionOptionsFIFO( FLAGS_fifo_compaction_max_table_files_size_mb * 1024 * 1024, - FLAGS_fifo_compaction_allow_compaction, FLAGS_fifo_compaction_ttl); + FLAGS_fifo_compaction_allow_compaction); #endif // ROCKSDB_LITE if (FLAGS_prefix_size != 0) { options.prefix_extractor.reset( @@ -3173,6 +3280,7 @@ void VerifyDBFromDB(std::string& truth_db_name) { } options.memtable_huge_page_size = FLAGS_memtable_use_huge_page ? 2048 : 0; options.memtable_prefix_bloom_size_ratio = FLAGS_memtable_bloom_size_ratio; + options.memtable_whole_key_filtering = FLAGS_memtable_whole_key_filtering; if (FLAGS_memtable_insert_with_hint_prefix_size > 0) { options.memtable_insert_with_hint_prefix_extractor.reset( NewCappedPrefixTransform( @@ -3219,10 +3327,6 @@ void VerifyDBFromDB(std::string& truth_db_name) { new VectorRepFactory ); break; - case kCuckoo: - options.memtable_factory.reset(NewHashCuckooRepFactory( - options.write_buffer_size, FLAGS_key_size + FLAGS_value_size)); - break; #else default: fprintf(stderr, "Only skip list is supported in lite mode\n"); @@ -3390,6 +3494,7 @@ void VerifyDBFromDB(std::string& truth_db_name) { options.level0_slowdown_writes_trigger = FLAGS_level0_slowdown_writes_trigger; options.compression = FLAGS_compression_type_e; + options.sample_for_compression = FLAGS_sample_for_compression; options.WAL_ttl_seconds = FLAGS_wal_ttl_seconds; options.WAL_size_limit_MB = FLAGS_wal_size_limit_MB; options.max_total_wal_size = FLAGS_max_total_wal_size; @@ -3492,6 +3597,10 @@ void VerifyDBFromDB(std::string& truth_db_name) { options.dump_malloc_stats = FLAGS_dump_malloc_stats; options.stats_dump_period_sec = static_cast(FLAGS_stats_dump_period_sec); + options.stats_persist_period_sec = + static_cast(FLAGS_stats_persist_period_sec); + options.stats_history_buffer_size = + static_cast(FLAGS_stats_history_buffer_size); options.compression_opts.level = FLAGS_compression_level; options.compression_opts.max_dict_bytes = FLAGS_compression_max_dict_bytes; @@ -3569,6 +3678,19 @@ void VerifyDBFromDB(std::string& truth_db_name) { options.compaction_filter = new KeepFilter(); fprintf(stdout, "A noop compaction filter is used\n"); } + + if (FLAGS_use_existing_keys) { + // Only work on single database + assert(db_.db != nullptr); + ReadOptions read_opts; + read_opts.total_order_seek = true; + Iterator* iter = db_.db->NewIterator(read_opts); + for (iter->SeekToFirst(); iter->Valid(); iter->Next()) { + keys_.emplace_back(iter->key().ToString()); + } + delete iter; + FLAGS_num = keys_.size(); + } } void Open(Options* opts) { @@ -3876,9 +3998,13 @@ void VerifyDBFromDB(std::string& truth_db_name) { bytes += value_size_ + key_size_; ++num_written; if (writes_per_range_tombstone_ > 0 && - num_written / writes_per_range_tombstone_ <= + num_written > writes_before_delete_range_ && + (num_written - writes_before_delete_range_) / + writes_per_range_tombstone_ <= max_num_range_tombstones_ 
&& - num_written % writes_per_range_tombstone_ == 0) { + (num_written - writes_before_delete_range_) % + writes_per_range_tombstone_ == + 0) { int64_t begin_num = key_gens[id]->Next(); if (FLAGS_expand_range_tombstones) { for (int64_t offset = 0; offset < range_tombstone_width_; @@ -4229,7 +4355,7 @@ void VerifyDBFromDB(std::string& truth_db_name) { } if (levelMeta.level == 0) { for (auto& fileMeta : levelMeta.files) { - fprintf(stdout, "Level[%d]: %s(size: %" PRIu64 " bytes)\n", + fprintf(stdout, "Level[%d]: %s(size: %" ROCKSDB_PRIszt " bytes)\n", levelMeta.level, fileMeta.name.c_str(), fileMeta.size); } } else { @@ -4509,6 +4635,255 @@ void VerifyDBFromDB(std::string& truth_db_name) { thread->stats.AddMessage(msg); } + // The inverse function of the Pareto CDF + int64_t ParetoCdfInversion(double u, double theta, double k, double sigma) { + double ret; + if (k == 0.0) { + ret = theta - sigma * std::log(u); + } else { + ret = theta + sigma * (std::pow(u, -1 * k) - 1) / k; + } + return static_cast<int64_t>(ceil(ret)); + } + // inversion of y=a*x^b + int64_t PowerCdfInversion(double u, double a, double b) { + double ret; + ret = std::pow((u / a), (1 / b)); + return static_cast<int64_t>(ceil(ret)); + } + + // Add noise to the QPS + double AddNoise(double origin, double noise_ratio) { + if (noise_ratio < 0.0 || noise_ratio > 1.0) { + return origin; + } + int band_int = static_cast<int>(FLAGS_sine_a); + double delta = (rand() % band_int - band_int / 2) * noise_ratio; + if (origin + delta < 0) { + return origin; + } else { + return (origin + delta); + } + } + + // decide the query type + // 0 Get, 1 Put, 2 Seek, 3 SeekForPrev, 4 Delete, 5 SingleDelete, 6 merge + class QueryDecider { + public: + std::vector<int> type_; + std::vector<double> ratio_; + int range_; + + QueryDecider() {} + ~QueryDecider() {} + + Status Initiate(std::vector<double> ratio_input) { + int range_max = 1000; + double sum = 0.0; + for (auto& ratio : ratio_input) { + sum += ratio; + } + range_ = 0; + for (auto& ratio : ratio_input) { + range_ += static_cast<int>(ceil(range_max * (ratio / sum))); + type_.push_back(range_); + ratio_.push_back(ratio / sum); + } + return Status::OK(); + } + + int GetType(int64_t rand_num) { + if (rand_num < 0) { + rand_num = rand_num * (-1); + } + assert(range_ != 0); + int pos = static_cast<int>(rand_num % range_); + for (int i = 0; i < static_cast<int>(type_.size()); i++) { + if (pos < type_[i]) { + return i; + } + } + return 0; + } + }; + + // The graph workload mixed with Get, Put, Iterator + void MixGraph(ThreadState* thread) { + int64_t read = 0; // including single gets and Next of iterators + int64_t gets = 0; + int64_t puts = 0; + int64_t found = 0; + int64_t seek = 0; + int64_t seek_found = 0; + int64_t bytes = 0; + const int64_t default_value_max = 1 * 1024 * 1024; + int64_t value_max = default_value_max; + int64_t scan_len_max = FLAGS_mix_max_scan_len; + double write_rate = 1000000.0; + double read_rate = 1000000.0; + std::vector<double> ratio{FLAGS_mix_get_ratio, FLAGS_mix_put_ratio, + FLAGS_mix_seek_ratio}; + char value_buffer[default_value_max]; + QueryDecider query; + RandomGenerator gen; + Status s; + if (value_max > FLAGS_mix_max_value_size) { + value_max = FLAGS_mix_max_value_size; + } + + ReadOptions options(FLAGS_verify_checksum, true); + std::unique_ptr<const char[]> key_guard; + Slice key = AllocateKey(&key_guard); + PinnableSlice pinnable_val; + query.Initiate(ratio); + + // initialize the qps rate limiters + if (FLAGS_sine_a != 0 || FLAGS_sine_d != 0) { + thread->shared->read_rate_limiter.reset(NewGenericRateLimiter( + read_rate, 100000 /* 
refill_period_us */, 10 /* fairness */, + RateLimiter::Mode::kReadsOnly)); + thread->shared->write_rate_limiter.reset( + NewGenericRateLimiter(write_rate)); + } + + Duration duration(FLAGS_duration, reads_); + while (!duration.Done(1)) { + DBWithColumnFamilies* db_with_cfh = SelectDBWithCfh(thread); + int64_t rand_v, key_rand, key_seed; + rand_v = GetRandomKey(&thread->rand) % FLAGS_num; + double u = static_cast<double>(rand_v) / FLAGS_num; + key_seed = PowerCdfInversion(u, FLAGS_key_dist_a, FLAGS_key_dist_b); + Random64 rand(key_seed); + key_rand = static_cast<int64_t>(rand.Next()) % FLAGS_num; + GenerateKeyFromInt(key_rand, FLAGS_num, &key); + int query_type = query.GetType(rand_v); + + // change the qps + uint64_t now = FLAGS_env->NowMicros(); + uint64_t usecs_since_last; + if (now > thread->stats.GetSineInterval()) { + usecs_since_last = now - thread->stats.GetSineInterval(); + } else { + usecs_since_last = 0; + } + + if (usecs_since_last > + (FLAGS_sine_mix_rate_interval_milliseconds * uint64_t{1000})) { + double usecs_since_start = + static_cast<double>(now - thread->stats.GetStart()); + thread->stats.ResetSineInterval(); + double mix_rate_with_noise = AddNoise( + SineRate(usecs_since_start / 1000000.0), FLAGS_sine_mix_rate_noise); + read_rate = mix_rate_with_noise * (query.ratio_[0] + query.ratio_[2]); + write_rate = + mix_rate_with_noise * query.ratio_[1] * FLAGS_mix_ave_kv_size; + + thread->shared->write_rate_limiter.reset( + NewGenericRateLimiter(write_rate)); + thread->shared->read_rate_limiter.reset(NewGenericRateLimiter( + read_rate, + FLAGS_sine_mix_rate_interval_milliseconds * uint64_t{1000}, 10, + RateLimiter::Mode::kReadsOnly)); + } + // Start the query + if (query_type == 0) { + // the Get query + gets++; + read++; + if (FLAGS_num_column_families > 1) { + s = db_with_cfh->db->Get(options, db_with_cfh->GetCfh(key_rand), key, + &pinnable_val); + } else { + pinnable_val.Reset(); + s = db_with_cfh->db->Get(options, + db_with_cfh->db->DefaultColumnFamily(), key, + &pinnable_val); + } + + if (s.ok()) { + found++; + bytes += key.size() + pinnable_val.size(); + } else if (!s.IsNotFound()) { + fprintf(stderr, "Get returned an error: %s\n", s.ToString().c_str()); + abort(); + } + + if (thread->shared->read_rate_limiter.get() != nullptr && + read % 256 == 255) { + thread->shared->read_rate_limiter->Request( + 256, Env::IO_HIGH, nullptr /* stats */, + RateLimiter::OpType::kRead); + } + thread->stats.FinishedOps(db_with_cfh, db_with_cfh->db, 1, kRead); + } else if (query_type == 1) { + // the Put query + puts++; + int64_t value_size = ParetoCdfInversion( + u, FLAGS_value_theta, FLAGS_value_k, FLAGS_value_sigma); + if (value_size < 0) { + value_size = 10; + } else if (value_size > value_max) { + value_size = value_size % value_max; + } + s = db_with_cfh->db->Put( + write_options_, key, + gen.Generate(static_cast<unsigned int>(value_size))); + if (!s.ok()) { + fprintf(stderr, "put error: %s\n", s.ToString().c_str()); + exit(1); + } + + if (thread->shared->write_rate_limiter) { + thread->shared->write_rate_limiter->Request( + key.size() + value_size, Env::IO_HIGH, nullptr /*stats*/, + RateLimiter::OpType::kWrite); + } + thread->stats.FinishedOps(db_with_cfh, db_with_cfh->db, 1, kWrite); + } else if (query_type == 2) { + // Seek query + if (db_with_cfh->db != nullptr) { + Iterator* single_iter = nullptr; + single_iter = db_with_cfh->db->NewIterator(options); + if (single_iter != nullptr) { + single_iter->Seek(key); + seek++; + read++; + if (single_iter->Valid() && single_iter->key().compare(key) == 0) { + seek_found++; + } + 
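// The scan length is drawn from the same Generalized Pareto model as the + // value sizes: ParetoCdfInversion() maps the uniform sample u through the + // inverse CDF x = theta + sigma * (u^(-k) - 1) / k (theta - sigma * ln(u) + // when k == 0), and the modulo below caps the result at --mix_max_scan_len. + 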
int64_t scan_length = + ParetoCdfInversion(u, FLAGS_iter_theta, FLAGS_iter_k, + FLAGS_iter_sigma) % + scan_len_max; + for (int64_t j = 0; j < scan_length && single_iter->Valid(); j++) { + Slice value = single_iter->value(); + memcpy(value_buffer, value.data(), + std::min(value.size(), sizeof(value_buffer))); + bytes += single_iter->key().size() + single_iter->value().size(); + single_iter->Next(); + assert(single_iter->status().ok()); + } + } + delete single_iter; + } + thread->stats.FinishedOps(db_with_cfh, db_with_cfh->db, 1, kSeek); + } + } + char msg[256]; + snprintf(msg, sizeof(msg), + "( Gets:%" PRIu64 " Puts:%" PRIu64 " Seek:%" PRIu64 " of %" PRIu64 + " in %" PRIu64 " found)\n", + gets, puts, seek, found, read); + + thread->stats.AddBytes(bytes); + thread->stats.AddMessage(msg); + + if (FLAGS_perf_level > rocksdb::PerfLevel::kDisable) { + thread->stats.AddMessage(std::string("PERF_CONTEXT:\n") + + get_perf_context()->ToString()); + } + } + void IteratorCreation(ThreadState* thread) { Duration duration(FLAGS_duration, reads_); ReadOptions options(FLAGS_verify_checksum, true); @@ -4548,9 +4923,31 @@ void VerifyDBFromDB(std::string& truth_db_name) { std::unique_ptr key_guard; Slice key = AllocateKey(&key_guard); + std::unique_ptr upper_bound_key_guard; + Slice upper_bound = AllocateKey(&upper_bound_key_guard); + std::unique_ptr lower_bound_key_guard; + Slice lower_bound = AllocateKey(&lower_bound_key_guard); + Duration duration(FLAGS_duration, reads_); char value_buffer[256]; while (!duration.Done(1)) { + int64_t seek_pos = thread->rand.Next() % FLAGS_num; + GenerateKeyFromInt((uint64_t)seek_pos, FLAGS_num, &key); + if (FLAGS_max_scan_distance != 0) { + if (FLAGS_reverse_iterator) { + GenerateKeyFromInt( + static_cast(std::max( + static_cast(0), seek_pos - FLAGS_max_scan_distance)), + FLAGS_num, &lower_bound); + options.iterate_lower_bound = &lower_bound; + } else { + GenerateKeyFromInt( + (uint64_t)std::min(FLAGS_num, seek_pos + FLAGS_max_scan_distance), + FLAGS_num, &upper_bound); + options.iterate_upper_bound = &upper_bound; + } + } + if (!FLAGS_use_tailing_iterator) { if (db_.db != nullptr) { delete single_iter; @@ -4571,7 +4968,6 @@ void VerifyDBFromDB(std::string& truth_db_name) { iter_to_use = multi_iters[thread->rand.Next() % multi_iters.size()]; } - GenerateKeyFromInt(thread->rand.Next() % FLAGS_num, FLAGS_num, &key); iter_to_use->Seek(key); read++; if (iter_to_use->Valid() && iter_to_use->key().compare(key) == 0) { @@ -5668,7 +6064,7 @@ void VerifyDBFromDB(std::string& truth_db_name) { void Replay(ThreadState* /*thread*/, DBWithColumnFamilies* db_with_cfh) { Status s; - unique_ptr trace_reader; + std::unique_ptr trace_reader; s = NewFileTraceReader(FLAGS_env, EnvOptions(), FLAGS_trace_file, &trace_reader); if (!s.ok()) { @@ -5723,6 +6119,9 @@ int db_bench_tool(int argc, char** argv) { if (FLAGS_statistics) { dbstats = rocksdb::CreateDBStatistics(); } + if (dbstats) { + dbstats->set_stats_level(static_cast(FLAGS_stats_level)); + } FLAGS_compaction_pri_e = (rocksdb::CompactionPri)FLAGS_compaction_pri; std::vector fanout = rocksdb::StringSplit( @@ -5752,6 +6151,13 @@ int db_bench_tool(int argc, char** argv) { } } #endif // ROCKSDB_LITE + if (FLAGS_use_existing_keys && !FLAGS_use_existing_db) { + fprintf(stderr, + "`-use_existing_db` must be true for `-use_existing_keys` to be " + "settable\n"); + exit(1); + } + if (!FLAGS_hdfs.empty()) { FLAGS_env = new rocksdb::HdfsEnv(FLAGS_hdfs); } @@ -5796,6 +6202,15 @@ int db_bench_tool(int argc, char** argv) { rocksdb::Benchmark benchmark; 
benchmark.Run(); + +#ifndef ROCKSDB_LITE + if (FLAGS_print_malloc_stats) { + std::string stats_string; + rocksdb::DumpMallocStats(&stats_string); + fprintf(stdout, "Malloc stats:\n%s\n", stats_string.c_str()); + } +#endif // ROCKSDB_LITE + return 0; } } // namespace rocksdb diff --git a/ceph/src/rocksdb/tools/db_bench_tool_test.cc b/ceph/src/rocksdb/tools/db_bench_tool_test.cc index 67426066e..1b19de5f1 100644 --- a/ceph/src/rocksdb/tools/db_bench_tool_test.cc +++ b/ceph/src/rocksdb/tools/db_bench_tool_test.cc @@ -248,6 +248,7 @@ const std::string options_file_content = R"OPTIONS_FILE( verify_checksums_in_compaction=true merge_operator=nullptr memtable_prefix_bloom_bits=0 + memtable_whole_key_filtering=true paranoid_file_checks=false inplace_update_num_locks=10000 optimize_filters_for_hits=false @@ -279,7 +280,7 @@ const std::string options_file_content = R"OPTIONS_FILE( TEST_F(DBBenchTest, OptionsFileFromFile) { const std::string kOptionsFileName = test_path_ + "/OPTIONS_flash"; - unique_ptr writable; + std::unique_ptr writable; ASSERT_OK(Env::Default()->NewWritableFile(kOptionsFileName, &writable, EnvOptions())); ASSERT_OK(writable->Append(options_file_content)); diff --git a/ceph/src/rocksdb/tools/db_crashtest.py b/ceph/src/rocksdb/tools/db_crashtest.py index 59528128b..dbabb2b4f 100644 --- a/ceph/src/rocksdb/tools/db_crashtest.py +++ b/ceph/src/rocksdb/tools/db_crashtest.py @@ -15,6 +15,9 @@ import argparse # default_params < {blackbox,whitebox}_default_params < # simple_default_params < # {blackbox,whitebox}_simple_default_params < args +# for enable_atomic_flush: +# default_params < {blackbox,whitebox}_default_params < +# atomic_flush_params < args expected_values_file = tempfile.NamedTemporaryFile() @@ -23,12 +26,14 @@ default_params = { "block_size": 16384, "cache_size": 1048576, "checkpoint_one_in": 1000000, + "compression_type": "snappy", "compression_max_dict_bytes": lambda: 16384 * random.randint(0, 1), "compression_zstd_max_train_bytes": lambda: 65536 * random.randint(0, 1), "clear_column_family_one_in": 0, "compact_files_one_in": 1000000, "compact_range_one_in": 1000000, - "delpercent": 5, + "delpercent": 4, + "delrangepercent": 1, "destroy_db_initially": 0, "enable_pipelined_write": lambda: random.randint(0, 1), "expected_values_path": expected_values_file.name, @@ -43,6 +48,7 @@ default_params = { "prefixpercent": 5, "progress_reports": 0, "readpercent": 45, + "recycle_log_file_num": lambda: random.randint(0, 1), "reopen": 20, "snapshot_hold_ops": 100000, "subcompactions": lambda: random.randint(1, 4), @@ -122,6 +128,15 @@ blackbox_simple_default_params = { whitebox_simple_default_params = {} +atomic_flush_params = { + "disable_wal": 1, + "reopen": 0, + "test_atomic_flush": 1, + # use small value for write_buffer_size so that RocksDB triggers flush + # more frequently + "write_buffer_size": 1024 * 1024, +} + def finalize_and_sanitize(src_params): dest_params = dict([(k, v() if callable(v) else v) @@ -135,6 +150,9 @@ def finalize_and_sanitize(src_params): dest_params["db"]): dest_params["use_direct_io_for_flush_and_compaction"] = 0 dest_params["use_direct_reads"] = 0 + if dest_params.get("test_batches_snapshots") == 1: + dest_params["delpercent"] += dest_params["delrangepercent"] + dest_params["delrangepercent"] = 0 return dest_params @@ -152,6 +170,8 @@ def gen_cmd_params(args): params.update(blackbox_simple_default_params) if args.test_type == 'whitebox': params.update(whitebox_simple_default_params) + if args.enable_atomic_flush: + params.update(atomic_flush_params) for 
k, v in vars(args).items(): if v is not None: @@ -164,7 +184,7 @@ def gen_cmd(params, unknown_params): '--{0}={1}'.format(k, v) for k, v in finalize_and_sanitize(params).items() if k not in set(['test_type', 'simple', 'duration', 'interval', - 'random_kill_odd']) + 'random_kill_odd', 'enable_atomic_flush']) and v is not None] + unknown_params return cmd @@ -356,6 +376,7 @@ def main(): db_stress multiple times") parser.add_argument("test_type", choices=["blackbox", "whitebox"]) parser.add_argument("--simple", action="store_true") + parser.add_argument("--enable_atomic_flush", action='store_true') all_params = dict(default_params.items() + blackbox_default_params.items() diff --git a/ceph/src/rocksdb/tools/db_repl_stress.cc b/ceph/src/rocksdb/tools/db_repl_stress.cc index 5901b9777..c640b5945 100644 --- a/ceph/src/rocksdb/tools/db_repl_stress.cc +++ b/ceph/src/rocksdb/tools/db_repl_stress.cc @@ -67,7 +67,7 @@ struct ReplicationThread { static void ReplicationThreadBody(void* arg) { ReplicationThread* t = reinterpret_cast(arg); DB* db = t->db; - unique_ptr iter; + std::unique_ptr iter; SequenceNumber currentSeqNum = 1; while (!t->stop.load(std::memory_order_acquire)) { iter.reset(); diff --git a/ceph/src/rocksdb/tools/db_stress.cc b/ceph/src/rocksdb/tools/db_stress.cc index 45a7c9a0d..7f8c4b53f 100644 --- a/ceph/src/rocksdb/tools/db_stress.cc +++ b/ceph/src/rocksdb/tools/db_stress.cc @@ -100,6 +100,8 @@ DEFINE_uint64(seed, 2341234, "Seed for PRNG"); static const bool FLAGS_seed_dummy __attribute__((__unused__)) = RegisterFlagValidator(&FLAGS_seed, &ValidateUint32Range); +DEFINE_bool(read_only, false, "True if open DB in read-only mode during tests"); + DEFINE_int64(max_key, 1 * KB* KB, "Max number of key/values to place in database"); @@ -133,6 +135,13 @@ DEFINE_bool(test_batches_snapshots, false, "\t(b) No long validation at the end (more speed up)\n" "\t(c) Test snapshot and atomicity of batch writes"); +DEFINE_bool(atomic_flush, false, + "If set, enables atomic flush in the options.\n"); + +DEFINE_bool(test_atomic_flush, false, + "If set, runs the stress test dedicated to verifying atomic flush " + "functionality. 
Setting this implies `atomic_flush=true`.\n"); + DEFINE_int32(threads, 32, "Number of concurrent threads to run."); DEFINE_int32(ttl, -1, @@ -201,6 +210,10 @@ DEFINE_double(memtable_prefix_bloom_size_ratio, "creates prefix blooms for memtables, each with size " "`write_buffer_size * memtable_prefix_bloom_size_ratio`."); +DEFINE_bool(memtable_whole_key_filtering, + rocksdb::Options().memtable_whole_key_filtering, + "Enable whole key filtering in memtables."); + DEFINE_int32(open_files, rocksdb::Options().max_open_files, "Maximum number of files to keep open at the same time " "(use default if == 0)"); @@ -366,6 +379,9 @@ extern std::vector rocksdb_kill_prefix_blacklist; DEFINE_bool(disable_wal, false, "If true, do not write WAL for write."); +DEFINE_uint64(recycle_log_file_num, rocksdb::Options().recycle_log_file_num, + "Number of old WAL files to keep around for later recycling"); + DEFINE_int64(target_file_size_base, rocksdb::Options().target_file_size_base, "Target level-1 file size for compaction"); @@ -790,46 +806,36 @@ class Stats { } } - void AddBytesForWrites(int nwrites, size_t nbytes) { + void AddBytesForWrites(long nwrites, size_t nbytes) { writes_ += nwrites; bytes_ += nbytes; } - void AddGets(int ngets, int nfounds) { + void AddGets(long ngets, long nfounds) { founds_ += nfounds; gets_ += ngets; } - void AddPrefixes(int nprefixes, int count) { + void AddPrefixes(long nprefixes, long count) { prefixes_ += nprefixes; iterator_size_sums_ += count; } - void AddIterations(int n) { - iterations_ += n; - } + void AddIterations(long n) { iterations_ += n; } - void AddDeletes(int n) { - deletes_ += n; - } + void AddDeletes(long n) { deletes_ += n; } void AddSingleDeletes(size_t n) { single_deletes_ += n; } - void AddRangeDeletions(int n) { - range_deletions_ += n; - } + void AddRangeDeletions(long n) { range_deletions_ += n; } - void AddCoveredByRangeDeletions(int n) { - covered_by_range_deletions_ += n; - } + void AddCoveredByRangeDeletions(long n) { covered_by_range_deletions_ += n; } - void AddErrors(int n) { - errors_ += n; - } + void AddErrors(long n) { errors_ += n; } - void AddNumCompactFilesSucceed(int n) { num_compact_files_succeed_ += n; } + void AddNumCompactFilesSucceed(long n) { num_compact_files_succeed_ += n; } - void AddNumCompactFilesFailed(int n) { num_compact_files_failed_ += n; } + void AddNumCompactFilesFailed(long n) { num_compact_files_failed_ += n; } void Report(const char* name) { std::string extra; @@ -948,7 +954,7 @@ class SharedState { if (status.ok()) { status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size); } - unique_ptr wfile; + std::unique_ptr wfile; if (status.ok() && size == 0) { const EnvOptions soptions; status = FLAGS_env->NewWritableFile(FLAGS_expected_values_path, &wfile, @@ -1289,7 +1295,8 @@ class DbStressListener : public EventListener { } assert(info.job_id > 0 || FLAGS_compact_files_one_in > 0); if (info.status.ok() && info.file_size > 0) { - assert(info.table_properties.data_size > 0); + assert(info.table_properties.data_size > 0 || + info.table_properties.num_range_deletions > 0); assert(info.table_properties.raw_key_size > 0); assert(info.table_properties.num_entries > 0); } @@ -1386,7 +1393,8 @@ class StressTest { txn_db_(nullptr), #endif new_column_family_name_(1), - num_times_reopened_(0) { + num_times_reopened_(0), + db_preload_finished_(false) { if (FLAGS_destroy_db_initially) { std::vector files; FLAGS_env->GetChildren(FLAGS_db, &files); @@ -1395,7 +1403,14 @@ class StressTest { FLAGS_env->DeleteFile(FLAGS_db + "/" + 
files[i]); } } - DestroyDB(FLAGS_db, Options()); + Options options; + options.env = FLAGS_env; + Status s = DestroyDB(FLAGS_db, options); + if (!s.ok()) { + fprintf(stderr, "Cannot destroy original db: %s\n", + s.ToString().c_str()); + exit(1); + } } } @@ -1513,6 +1528,13 @@ class StressTest { Open(); BuildOptionsTable(); SharedState shared(this); + + if (FLAGS_read_only) { + now = FLAGS_env->NowMicros(); + fprintf(stdout, "%s Preloading db with %" PRIu64 " KVs\n", + FLAGS_env->TimeToString(now / 1000000).c_str(), FLAGS_max_key); + PreloadDbAndReopenAsReadOnly(FLAGS_max_key, &shared); + } uint32_t n = shared.GetNumThreads(); now = FLAGS_env->NowMicros(); @@ -1743,6 +1765,9 @@ class StressTest { } } if (snap_state.key_vec != nullptr) { + // When `prefix_extractor` is set, seeking to beginning and scanning + // across prefixes are only supported with `total_order_seek` set. + ropt.total_order_seek = true; std::unique_ptr iterator(db->NewIterator(ropt)); std::unique_ptr> tmp_bitvec(new std::vector(FLAGS_max_key)); for (iterator->SeekToFirst(); iterator->Valid(); iterator->Next()) { @@ -1760,6 +1785,93 @@ class StressTest { return Status::OK(); } + // Currently PreloadDb has to be single-threaded. + void PreloadDbAndReopenAsReadOnly(int64_t number_of_keys, + SharedState* shared) { + WriteOptions write_opts; + write_opts.disableWAL = FLAGS_disable_wal; + if (FLAGS_sync) { + write_opts.sync = true; + } + char value[100]; + int cf_idx = 0; + Status s; + for (auto cfh : column_families_) { + for (int64_t k = 0; k != number_of_keys; ++k) { + std::string key_str = Key(k); + Slice key = key_str; + size_t sz = GenerateValue(0 /*value_base*/, value, sizeof(value)); + Slice v(value, sz); + shared->Put(cf_idx, k, 0, true /* pending */); + + if (FLAGS_use_merge) { + if (!FLAGS_use_txn) { + s = db_->Merge(write_opts, cfh, key, v); + } else { +#ifndef ROCKSDB_LITE + Transaction* txn; + s = NewTxn(write_opts, &txn); + if (s.ok()) { + s = txn->Merge(cfh, key, v); + if (s.ok()) { + s = CommitTxn(txn); + } + } +#endif + } + } else { + if (!FLAGS_use_txn) { + s = db_->Put(write_opts, cfh, key, v); + } else { +#ifndef ROCKSDB_LITE + Transaction* txn; + s = NewTxn(write_opts, &txn); + if (s.ok()) { + s = txn->Put(cfh, key, v); + if (s.ok()) { + s = CommitTxn(txn); + } + } +#endif + } + } + + shared->Put(cf_idx, k, 0, false /* pending */); + if (!s.ok()) { + break; + } + } + if (!s.ok()) { + break; + } + ++cf_idx; + } + if (s.ok()) { + s = db_->Flush(FlushOptions(), column_families_); + } + if (s.ok()) { + for (auto cf : column_families_) { + delete cf; + } + column_families_.clear(); + delete db_; + db_ = nullptr; +#ifndef ROCKSDB_LITE + txn_db_ = nullptr; +#endif + + db_preload_finished_.store(true); + auto now = FLAGS_env->NowMicros(); + fprintf(stdout, "%s Reopening database in read-only\n", + FLAGS_env->TimeToString(now / 1000000).c_str()); + // Reopen as read-only, can ignore all options related to updates + Open(); + } else { + fprintf(stderr, "Failed to preload db"); + exit(1); + } + } + Status SetOptions(ThreadState* thread) { assert(FLAGS_set_options_one_in > 0); std::unordered_map opts; @@ -1847,8 +1959,7 @@ class StressTest { if (thread->shared->AllVotedReopen()) { thread->shared->GetStressTest()->Reopen(); thread->shared->GetCondVar()->SignalAll(); - } - else { + } else { thread->shared->GetCondVar()->Wait(); } // Commenting this out as we don't want to reset stats on each open. 
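The hunk above makes Reopen() a vote-then-signal barrier: every thread votes, the last voter reopens the DB and signals, and the rest block on the shared condition variable until it does. A minimal self-contained sketch of that pattern, with a simplified ReopenBarrier standing in for the test's SharedState (the class and names are illustrative, not RocksDB code):

```c++
#include <condition_variable>
#include <functional>
#include <mutex>

// One-shot barrier: the last thread to arrive runs the action and
// releases everyone else, mirroring AllVotedReopen()/SignalAll()/Wait().
class ReopenBarrier {
 public:
  explicit ReopenBarrier(int num_threads) : remaining_(num_threads) {}

  void ArriveAndMaybeRun(const std::function<void()>& reopen) {
    std::unique_lock<std::mutex> lock(mu_);
    if (--remaining_ == 0) {
      reopen();          // last voter does the reopen
      done_ = true;
      cv_.notify_all();  // wake the waiting threads
    } else {
      cv_.wait(lock, [this] { return done_; });
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int remaining_;
  bool done_ = false;
};
```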
@@ -1870,49 +1981,6 @@ class StressTest { MaybeClearOneColumnFamily(thread); #ifndef ROCKSDB_LITE - if (FLAGS_checkpoint_one_in > 0 && - thread->rand.Uniform(FLAGS_checkpoint_one_in) == 0) { - std::string checkpoint_dir = - FLAGS_db + "/.checkpoint" + ToString(thread->tid); - DestroyDB(checkpoint_dir, Options()); - Checkpoint* checkpoint; - Status s = Checkpoint::Create(db_, &checkpoint); - if (s.ok()) { - s = checkpoint->CreateCheckpoint(checkpoint_dir); - } - std::vector files; - if (s.ok()) { - s = FLAGS_env->GetChildren(checkpoint_dir, &files); - } - DestroyDB(checkpoint_dir, Options()); - delete checkpoint; - if (!s.ok()) { - printf("A checkpoint operation failed with: %s\n", - s.ToString().c_str()); - } - } - - if (FLAGS_backup_one_in > 0 && - thread->rand.Uniform(FLAGS_backup_one_in) == 0) { - std::string backup_dir = FLAGS_db + "/.backup" + ToString(thread->tid); - BackupableDBOptions backup_opts(backup_dir); - BackupEngine* backup_engine = nullptr; - Status s = BackupEngine::Open(FLAGS_env, backup_opts, &backup_engine); - if (s.ok()) { - s = backup_engine->CreateNewBackup(db_); - } - if (s.ok()) { - s = backup_engine->PurgeOldBackups(0 /* num_backups_to_keep */); - } - if (!s.ok()) { - printf("A BackupEngine operation failed with: %s\n", - s.ToString().c_str()); - } - if (backup_engine != nullptr) { - delete backup_engine; - } - } - if (FLAGS_compact_files_one_in > 0 && thread->rand.Uniform(FLAGS_compact_files_one_in) == 0) { auto* random_cf = @@ -1975,15 +2043,6 @@ class StressTest { auto column_family = column_families_[rand_column_family]; - if (FLAGS_flush_one_in > 0 && - thread->rand.Uniform(FLAGS_flush_one_in) == 0) { - FlushOptions flush_opts; - Status status = db_->Flush(flush_opts, column_family); - if (!status.ok()) { - fprintf(stdout, "Unable to perform Flush(): %s\n", status.ToString().c_str()); - } - } - if (FLAGS_compact_range_one_in > 0 && thread->rand.Uniform(FLAGS_compact_range_one_in) == 0) { int64_t end_key_num; @@ -2007,6 +2066,21 @@ class StressTest { std::vector rand_column_families = GenerateColumnFamilies(FLAGS_column_families, rand_column_family); + + if (FLAGS_flush_one_in > 0 && + thread->rand.Uniform(FLAGS_flush_one_in) == 0) { + FlushOptions flush_opts; + std::vector cfhs; + std::for_each( + rand_column_families.begin(), rand_column_families.end(), + [this, &cfhs](int k) { cfhs.push_back(column_families_[k]); }); + Status status = db_->Flush(flush_opts, cfhs); + if (!status.ok()) { + fprintf(stdout, "Unable to perform Flush(): %s\n", + status.ToString().c_str()); + } + } + std::vector rand_keys = GenerateKeys(rand_key); if (FLAGS_ingest_external_file_one_in > 0 && @@ -2014,6 +2088,23 @@ class StressTest { TestIngestExternalFile(thread, rand_column_families, rand_keys, lock); } + if (FLAGS_backup_one_in > 0 && + thread->rand.Uniform(FLAGS_backup_one_in) == 0) { + Status s = TestBackupRestore(thread, rand_column_families, rand_keys); + if (!s.ok()) { + VerificationAbort(shared, "Backup/restore gave inconsistent state", + s); + } + } + + if (FLAGS_checkpoint_one_in > 0 && + thread->rand.Uniform(FLAGS_checkpoint_one_in) == 0) { + Status s = TestCheckpoint(thread, rand_column_families, rand_keys); + if (!s.ok()) { + VerificationAbort(shared, "Checkpoint gave inconsistent state", s); + } + } + if (FLAGS_acquire_snapshot_one_in > 0 && thread->rand.Uniform(FLAGS_acquire_snapshot_one_in) == 0) { auto snapshot = db_->GetSnapshot(); @@ -2029,6 +2120,9 @@ class StressTest { if (FLAGS_compare_full_db_state_snapshot && (thread->tid == 0)) { key_vec = new 
std::vector(FLAGS_max_key); + // When `prefix_extractor` is set, seeking to beginning and scanning + // across prefixes are only supported with `total_order_seek` set. + ropt.total_order_seek = true; std::unique_ptr iterator(db_->NewIterator(ropt)); for (iterator->SeekToFirst(); iterator->Valid(); iterator->Next()) { uint64_t key_val; @@ -2199,6 +2293,190 @@ class StressTest { return s; } +#ifdef ROCKSDB_LITE + virtual Status TestBackupRestore( + ThreadState* /* thread */, + const std::vector& /* rand_column_families */, + const std::vector& /* rand_keys */) { + assert(false); + fprintf(stderr, + "RocksDB lite does not support " + "TestBackupRestore\n"); + std::terminate(); + } + + virtual Status TestCheckpoint( + ThreadState* /* thread */, + const std::vector& /* rand_column_families */, + const std::vector& /* rand_keys */) { + assert(false); + fprintf(stderr, + "RocksDB lite does not support " + "TestCheckpoint\n"); + std::terminate(); + } +#else // ROCKSDB_LITE + virtual Status TestBackupRestore(ThreadState* thread, + const std::vector& rand_column_families, + const std::vector& rand_keys) { + // Note the column families chosen by `rand_column_families` cannot be + // dropped while the locks for `rand_keys` are held. So we should not have + // to worry about accessing those column families throughout this function. + assert(rand_column_families.size() == rand_keys.size()); + std::string backup_dir = FLAGS_db + "/.backup" + ToString(thread->tid); + std::string restore_dir = FLAGS_db + "/.restore" + ToString(thread->tid); + BackupableDBOptions backup_opts(backup_dir); + BackupEngine* backup_engine = nullptr; + Status s = BackupEngine::Open(FLAGS_env, backup_opts, &backup_engine); + if (s.ok()) { + s = backup_engine->CreateNewBackup(db_); + } + if (s.ok()) { + delete backup_engine; + backup_engine = nullptr; + s = BackupEngine::Open(FLAGS_env, backup_opts, &backup_engine); + } + if (s.ok()) { + s = backup_engine->RestoreDBFromLatestBackup(restore_dir /* db_dir */, + restore_dir /* wal_dir */); + } + if (s.ok()) { + s = backup_engine->PurgeOldBackups(0 /* num_backups_to_keep */); + } + DB* restored_db = nullptr; + std::vector restored_cf_handles; + if (s.ok()) { + Options restore_options(options_); + restore_options.listeners.clear(); + std::vector cf_descriptors; + // TODO(ajkr): `column_family_names_` is not safe to access here when + // `clear_column_family_one_in != 0`. But we can't easily switch to + // `ListColumnFamilies` to get names because it won't necessarily give + // the same order as `column_family_names_`. 
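The backup half of TestBackupRestore above is the standard BackupEngine lifecycle; condensed into one helper it looks roughly like this (the helper name and directory arguments are placeholders):

    #include <rocksdb/db.h>
    #include <rocksdb/env.h>
    #include <rocksdb/utilities/backupable_db.h>
    #include <string>

    // Condensed BackupEngine round trip mirroring TestBackupRestore;
    // backup_dir/restore_dir are caller-supplied placeholder paths.
    rocksdb::Status BackupAndRestore(rocksdb::DB* db,
                                     const std::string& backup_dir,
                                     const std::string& restore_dir) {
      rocksdb::BackupEngine* engine = nullptr;
      rocksdb::Status s = rocksdb::BackupEngine::Open(
          rocksdb::Env::Default(), rocksdb::BackupableDBOptions(backup_dir),
          &engine);
      if (s.ok()) {
        s = engine->CreateNewBackup(db);  // snapshot the live DB
      }
      if (s.ok()) {
        // Materialize the latest backup as an independent DB directory.
        s = engine->RestoreDBFromLatestBackup(restore_dir /* db_dir */,
                                              restore_dir /* wal_dir */);
      }
      if (s.ok()) {
        s = engine->PurgeOldBackups(0 /* num_backups_to_keep */);
      }
      delete engine;
      return s;
    }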
+ assert(FLAGS_clear_column_family_one_in == 0); + for (auto name : column_family_names_) { + cf_descriptors.emplace_back(name, ColumnFamilyOptions(restore_options)); + } + s = DB::Open(DBOptions(restore_options), restore_dir, cf_descriptors, + &restored_cf_handles, &restored_db); + } + // for simplicity, currently only verifies existence/non-existence of a few + // keys + for (size_t i = 0; s.ok() && i < rand_column_families.size(); ++i) { + std::string key_str = Key(rand_keys[i]); + Slice key = key_str; + std::string restored_value; + Status get_status = restored_db->Get( + ReadOptions(), restored_cf_handles[rand_column_families[i]], key, + &restored_value); + bool exists = + thread->shared->Exists(rand_column_families[i], rand_keys[i]); + if (get_status.ok()) { + if (!exists) { + s = Status::Corruption( + "key exists in restore but not in original db"); + } + } else if (get_status.IsNotFound()) { + if (exists) { + s = Status::Corruption( + "key exists in original db but not in restore"); + } + } else { + s = get_status; + } + } + if (backup_engine != nullptr) { + delete backup_engine; + backup_engine = nullptr; + } + if (restored_db != nullptr) { + for (auto* cf_handle : restored_cf_handles) { + restored_db->DestroyColumnFamilyHandle(cf_handle); + } + delete restored_db; + restored_db = nullptr; + } + if (!s.ok()) { + printf("A backup/restore operation failed with: %s\n", + s.ToString().c_str()); + } + return s; + } + + virtual Status TestCheckpoint(ThreadState* thread, + const std::vector& rand_column_families, + const std::vector& rand_keys) { + // Note the column families chosen by `rand_column_families` cannot be + // dropped while the locks for `rand_keys` are held. So we should not have + // to worry about accessing those column families throughout this function. + assert(rand_column_families.size() == rand_keys.size()); + std::string checkpoint_dir = + FLAGS_db + "/.checkpoint" + ToString(thread->tid); + DestroyDB(checkpoint_dir, Options()); + Checkpoint* checkpoint = nullptr; + Status s = Checkpoint::Create(db_, &checkpoint); + if (s.ok()) { + s = checkpoint->CreateCheckpoint(checkpoint_dir); + } + std::vector cf_handles; + DB* checkpoint_db = nullptr; + if (s.ok()) { + delete checkpoint; + checkpoint = nullptr; + Options options(options_); + options.listeners.clear(); + std::vector cf_descs; + // TODO(ajkr): `column_family_names_` is not safe to access here when + // `clear_column_family_one_in != 0`. But we can't easily switch to + // `ListColumnFamilies` to get names because it won't necessarily give + // the same order as `column_family_names_`. 
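The checkpoint path exercised above needs only two calls before the verification step; a minimal sketch (the helper name is illustrative, and checkpoint_dir must not exist beforehand):

    #include <rocksdb/db.h>
    #include <rocksdb/utilities/checkpoint.h>
    #include <string>

    // Minimal checkpoint, mirroring the Checkpoint::Create /
    // CreateCheckpoint pair used by TestCheckpoint.
    rocksdb::Status TakeCheckpoint(rocksdb::DB* db,
                                   const std::string& checkpoint_dir) {
      rocksdb::Checkpoint* checkpoint = nullptr;
      rocksdb::Status s = rocksdb::Checkpoint::Create(db, &checkpoint);
      if (s.ok()) {
        // SST files are hard-linked where possible, so this is cheap on a
        // single filesystem; the result can be opened like any other DB.
        s = checkpoint->CreateCheckpoint(checkpoint_dir);
      }
      delete checkpoint;
      return s;
    }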
+ if (FLAGS_clear_column_family_one_in == 0) { + for (const auto& name : column_family_names_) { + cf_descs.emplace_back(name, ColumnFamilyOptions(options)); + } + s = DB::OpenForReadOnly(DBOptions(options), checkpoint_dir, cf_descs, + &cf_handles, &checkpoint_db); + } + } + if (checkpoint_db != nullptr) { + for (size_t i = 0; s.ok() && i < rand_column_families.size(); ++i) { + std::string key_str = Key(rand_keys[i]); + Slice key = key_str; + std::string value; + Status get_status = checkpoint_db->Get( + ReadOptions(), cf_handles[rand_column_families[i]], key, &value); + bool exists = + thread->shared->Exists(rand_column_families[i], rand_keys[i]); + if (get_status.ok()) { + if (!exists) { + s = Status::Corruption( + "key exists in checkpoint but not in original db"); + } + } else if (get_status.IsNotFound()) { + if (exists) { + s = Status::Corruption( + "key exists in original db but not in checkpoint"); + } + } else { + s = get_status; + } + } + for (auto cfh : cf_handles) { + delete cfh; + } + cf_handles.clear(); + delete checkpoint_db; + checkpoint_db = nullptr; + } + DestroyDB(checkpoint_dir, Options()); + if (!s.ok()) { + fprintf(stderr, "A checkpoint operation failed with: %s\n", + s.ToString().c_str()); + } + return s; + } +#endif // ROCKSDB_LITE + void VerificationAbort(SharedState* shared, std::string msg, Status s) const { printf("Verification failed: %s. Status is %s\n", msg.c_str(), s.ToString().c_str()); @@ -2218,6 +2496,10 @@ class StressTest { fprintf(stdout, "Format version : %d\n", FLAGS_format_version); fprintf(stdout, "TransactionDB : %s\n", FLAGS_use_txn ? "true" : "false"); + fprintf(stdout, "Read only mode : %s\n", + FLAGS_read_only ? "true" : "false"); + fprintf(stdout, "Atomic flush : %s\n", + FLAGS_atomic_flush ? "true" : "false"); fprintf(stdout, "Column families : %d\n", FLAGS_column_families); if (!FLAGS_test_batches_snapshots) { fprintf(stdout, "Clear CFs one in : %d\n", @@ -2315,6 +2597,8 @@ class StressTest { FLAGS_max_write_buffer_number_to_maintain; options_.memtable_prefix_bloom_size_ratio = FLAGS_memtable_prefix_bloom_size_ratio; + options_.memtable_whole_key_filtering = + FLAGS_memtable_whole_key_filtering; options_.max_background_compactions = FLAGS_max_background_compactions; options_.max_background_flushes = FLAGS_max_background_flushes; options_.compaction_style = @@ -2331,6 +2615,8 @@ class StressTest { options_.use_direct_reads = FLAGS_use_direct_reads; options_.use_direct_io_for_flush_and_compaction = FLAGS_use_direct_io_for_flush_and_compaction; + options_.recycle_log_file_num = + static_cast(FLAGS_recycle_log_file_num); options_.target_file_size_base = FLAGS_target_file_size_base; options_.target_file_size_multiplier = FLAGS_target_file_size_multiplier; options_.max_bytes_for_level_base = FLAGS_max_bytes_for_level_base; @@ -2363,6 +2649,7 @@ class StressTest { FLAGS_universal_max_merge_width; options_.compaction_options_universal.max_size_amplification_percent = FLAGS_universal_max_size_amplification_percent; + options_.atomic_flush = FLAGS_atomic_flush; } else { #ifdef ROCKSDB_LITE fprintf(stderr, "--options_file not supported in lite mode\n"); @@ -2484,8 +2771,13 @@ class StressTest { new DbStressListener(FLAGS_db, options_.db_paths, cf_descriptors)); options_.create_missing_column_families = true; if (!FLAGS_use_txn) { - s = DB::Open(DBOptions(options_), FLAGS_db, cf_descriptors, - &column_families_, &db_); + if (db_preload_finished_.load() && FLAGS_read_only) { + s = DB::OpenForReadOnly(DBOptions(options_), FLAGS_db, cf_descriptors, + 
&column_families_, &db_); + } else { + s = DB::Open(DBOptions(options_), FLAGS_db, cf_descriptors, + &column_families_, &db_); + } } else { #ifndef ROCKSDB_LITE TransactionDBOptions txn_db_options; @@ -2570,6 +2862,7 @@ class StressTest { int num_times_reopened_; std::unordered_map> options_table_; std::vector options_index_; + std::atomic db_preload_finished_; }; class NonBatchedOpsStressTest : public StressTest { @@ -2594,7 +2887,7 @@ class NonBatchedOpsStressTest : public StressTest { } if (!thread->rand.OneIn(2)) { // Use iterator to verify this range - unique_ptr iter( + std::unique_ptr iter( db_->NewIterator(options, column_families_[cf])); iter->Seek(Key(start)); for (auto i = start; i < end; i++) { @@ -2733,16 +3026,15 @@ class NonBatchedOpsStressTest : public StressTest { } Iterator* iter = db_->NewIterator(ro_copy, cfh); - int64_t count = 0; + long count = 0; for (iter->Seek(prefix); iter->Valid() && iter->key().starts_with(prefix); iter->Next()) { ++count; } - assert(count <= - (static_cast(1) << ((8 - FLAGS_prefix_size) * 8))); + assert(count <= (static_cast(1) << ((8 - FLAGS_prefix_size) * 8))); Status s = iter->status(); if (iter->status().ok()) { - thread->stats.AddPrefixes(1, static_cast(count)); + thread->stats.AddPrefixes(1, count); } else { thread->stats.AddErrors(1); } @@ -3272,7 +3564,7 @@ class BatchedOpsStressTest : public StressTest { iters[i]->Seek(prefix_slices[i]); } - int count = 0; + long count = 0; while (iters[0]->Valid() && iters[0]->key().starts_with(prefix_slices[0])) { count++; std::string values[10]; @@ -3327,6 +3619,334 @@ class BatchedOpsStressTest : public StressTest { virtual void VerifyDb(ThreadState* /* thread */) const {} }; +class AtomicFlushStressTest : public StressTest { + public: + AtomicFlushStressTest() : batch_id_(0) {} + + virtual ~AtomicFlushStressTest() {} + + virtual Status TestPut(ThreadState* thread, WriteOptions& write_opts, + const ReadOptions& /* read_opts */, + const std::vector& rand_column_families, + const std::vector& rand_keys, + char (&value)[100], + std::unique_ptr& /* lock */) { + std::string key_str = Key(rand_keys[0]); + Slice key = key_str; + uint64_t value_base = batch_id_.fetch_add(1); + size_t sz = + GenerateValue(static_cast(value_base), value, sizeof(value)); + Slice v(value, sz); + WriteBatch batch; + for (auto cf : rand_column_families) { + ColumnFamilyHandle* cfh = column_families_[cf]; + if (FLAGS_use_merge) { + batch.Merge(cfh, key, v); + } else { /* !FLAGS_use_merge */ + batch.Put(cfh, key, v); + } + } + Status s = db_->Write(write_opts, &batch); + if (!s.ok()) { + fprintf(stderr, "multi put or merge error: %s\n", s.ToString().c_str()); + thread->stats.AddErrors(1); + } else { + auto num = static_cast(rand_column_families.size()); + thread->stats.AddBytesForWrites(num, (sz + 1) * num); + } + + return s; + } + + virtual Status TestDelete(ThreadState* thread, WriteOptions& write_opts, + const std::vector& rand_column_families, + const std::vector& rand_keys, + std::unique_ptr& /* lock */) { + std::string key_str = Key(rand_keys[0]); + Slice key = key_str; + WriteBatch batch; + for (auto cf : rand_column_families) { + ColumnFamilyHandle* cfh = column_families_[cf]; + batch.Delete(cfh, key); + } + Status s = db_->Write(write_opts, &batch); + if (!s.ok()) { + fprintf(stderr, "multidel error: %s\n", s.ToString().c_str()); + thread->stats.AddErrors(1); + } else { + thread->stats.AddDeletes(static_cast(rand_column_families.size())); + } + return s; + } + + virtual Status TestDeleteRange(ThreadState* thread, 
WriteOptions& write_opts,
+                                 const std::vector<int>& rand_column_families,
+                                 const std::vector<int64_t>& rand_keys,
+                                 std::unique_ptr<MutexLock>& /* lock */) {
+    int64_t rand_key = rand_keys[0];
+    auto shared = thread->shared;
+    int64_t max_key = shared->GetMaxKey();
+    if (rand_key > max_key - FLAGS_range_deletion_width) {
+      rand_key =
+          thread->rand.Next() % (max_key - FLAGS_range_deletion_width + 1);
+    }
+    std::string key_str = Key(rand_key);
+    Slice key = key_str;
+    std::string end_key_str = Key(rand_key + FLAGS_range_deletion_width);
+    Slice end_key = end_key_str;
+    WriteBatch batch;
+    for (auto cf : rand_column_families) {
+      ColumnFamilyHandle* cfh = column_families_[cf];
+      batch.DeleteRange(cfh, key, end_key);
+    }
+    Status s = db_->Write(write_opts, &batch);
+    if (!s.ok()) {
+      fprintf(stderr, "multi del range error: %s\n", s.ToString().c_str());
+      thread->stats.AddErrors(1);
+    } else {
+      thread->stats.AddRangeDeletions(
+          static_cast<long>(rand_column_families.size()));
+    }
+    return s;
+  }
+
+  virtual void TestIngestExternalFile(
+      ThreadState* /* thread */,
+      const std::vector<int>& /* rand_column_families */,
+      const std::vector<int64_t>& /* rand_keys */,
+      std::unique_ptr<MutexLock>& /* lock */) {
+    assert(false);
+    fprintf(stderr,
+            "AtomicFlushStressTest does not support TestIngestExternalFile "
+            "because it's not possible to verify the result\n");
+    std::terminate();
+  }
+
+  virtual Status TestGet(ThreadState* thread, const ReadOptions& readoptions,
+                         const std::vector<int>& rand_column_families,
+                         const std::vector<int64_t>& rand_keys) {
+    std::string key_str = Key(rand_keys[0]);
+    Slice key = key_str;
+    auto cfh =
+        column_families_[rand_column_families[thread->rand.Next() %
+                                              rand_column_families.size()]];
+    std::string from_db;
+    Status s = db_->Get(readoptions, cfh, key, &from_db);
+    if (s.ok()) {
+      thread->stats.AddGets(1, 1);
+    } else if (s.IsNotFound()) {
+      thread->stats.AddGets(1, 0);
+    } else {
+      thread->stats.AddErrors(1);
+    }
+    return s;
+  }
+
+  virtual Status TestPrefixScan(ThreadState* thread,
+                                const ReadOptions& readoptions,
+                                const std::vector<int>& rand_column_families,
+                                const std::vector<int64_t>& rand_keys) {
+    std::string key_str = Key(rand_keys[0]);
+    Slice key = key_str;
+    Slice prefix = Slice(key.data(), FLAGS_prefix_size);
+
+    std::string upper_bound;
+    Slice ub_slice;
+    ReadOptions ro_copy = readoptions;
+    if (thread->rand.OneIn(2) && GetNextPrefix(prefix, &upper_bound)) {
+      ub_slice = Slice(upper_bound);
+      ro_copy.iterate_upper_bound = &ub_slice;
+    }
+    auto cfh =
+        column_families_[rand_column_families[thread->rand.Next() %
+                                              rand_column_families.size()]];
+    Iterator* iter = db_->NewIterator(ro_copy, cfh);
+    long count = 0;
+    for (iter->Seek(prefix); iter->Valid() && iter->key().starts_with(prefix);
+         iter->Next()) {
+      ++count;
+    }
+    assert(count <= (static_cast<long>(1) << ((8 - FLAGS_prefix_size) * 8)));
+    Status s = iter->status();
+    if (s.ok()) {
+      thread->stats.AddPrefixes(1, count);
+    } else {
+      thread->stats.AddErrors(1);
+    }
+    delete iter;
+    return s;
+  }
+
+#ifdef ROCKSDB_LITE
+  virtual Status TestCheckpoint(
+      ThreadState* /* thread */,
+      const std::vector<int>& /* rand_column_families */,
+      const std::vector<int64_t>& /* rand_keys */) {
+    assert(false);
+    fprintf(stderr,
+            "RocksDB lite does not support "
+            "TestCheckpoint\n");
+    std::terminate();
+  }
+#else
+  virtual Status TestCheckpoint(
+      ThreadState* thread, const std::vector<int>& /* rand_column_families */,
+      const std::vector<int64_t>& /* rand_keys */) {
+    std::string checkpoint_dir =
+        FLAGS_db + "/.checkpoint" + ToString(thread->tid);
+    DestroyDB(checkpoint_dir, Options());
+ Checkpoint* checkpoint = nullptr; + Status s = Checkpoint::Create(db_, &checkpoint); + if (s.ok()) { + s = checkpoint->CreateCheckpoint(checkpoint_dir); + } + std::vector cf_handles; + DB* checkpoint_db = nullptr; + if (s.ok()) { + delete checkpoint; + checkpoint = nullptr; + Options options(options_); + options.listeners.clear(); + std::vector cf_descs; + // TODO(ajkr): `column_family_names_` is not safe to access here when + // `clear_column_family_one_in != 0`. But we can't easily switch to + // `ListColumnFamilies` to get names because it won't necessarily give + // the same order as `column_family_names_`. + if (FLAGS_clear_column_family_one_in == 0) { + for (const auto& name : column_family_names_) { + cf_descs.emplace_back(name, ColumnFamilyOptions(options)); + } + s = DB::OpenForReadOnly(DBOptions(options), checkpoint_dir, cf_descs, + &cf_handles, &checkpoint_db); + } + } + if (checkpoint_db != nullptr) { + for (auto cfh : cf_handles) { + delete cfh; + } + cf_handles.clear(); + delete checkpoint_db; + checkpoint_db = nullptr; + } + DestroyDB(checkpoint_dir, Options()); + if (!s.ok()) { + fprintf(stderr, "A checkpoint operation failed with: %s\n", + s.ToString().c_str()); + } + return s; + } +#endif // !ROCKSDB_LITE + + virtual void VerifyDb(ThreadState* thread) const { + ReadOptions options(FLAGS_verify_checksum, true); + // We must set total_order_seek to true because we are doing a SeekToFirst + // on a column family whose memtables may support (by default) prefix-based + // iterator. In this case, NewIterator with options.total_order_seek being + // false returns a prefix-based iterator. Calling SeekToFirst using this + // iterator causes the iterator to become invalid. That means we cannot + // iterate the memtable using this iterator any more, although the memtable + // contains the most up-to-date key-values. + options.total_order_seek = true; + assert(thread != nullptr); + auto shared = thread->shared; + std::vector > iters(column_families_.size()); + for (size_t i = 0; i != column_families_.size(); ++i) { + iters[i].reset(db_->NewIterator(options, column_families_[i])); + } + for (auto& iter : iters) { + iter->SeekToFirst(); + } + size_t num = column_families_.size(); + assert(num == iters.size()); + std::vector statuses(num, Status::OK()); + do { + size_t valid_cnt = 0; + size_t idx = 0; + for (auto& iter : iters) { + if (iter->Valid()) { + ++valid_cnt; + } else { + statuses[idx] = iter->status(); + } + ++idx; + } + if (valid_cnt == 0) { + Status status; + for (size_t i = 0; i != num; ++i) { + const auto& s = statuses[i]; + if (!s.ok()) { + status = s; + fprintf(stderr, "Iterator on cf %s has error: %s\n", + column_families_[i]->GetName().c_str(), + s.ToString().c_str()); + shared->SetVerificationFailure(); + } + } + if (status.ok()) { + fprintf(stdout, "Finished scanning all column families.\n"); + } + break; + } else if (valid_cnt != iters.size()) { + for (size_t i = 0; i != num; ++i) { + if (!iters[i]->Valid()) { + if (statuses[i].ok()) { + fprintf(stderr, "Finished scanning cf %s\n", + column_families_[i]->GetName().c_str()); + } else { + fprintf(stderr, "Iterator on cf %s has error: %s\n", + column_families_[i]->GetName().c_str(), + statuses[i].ToString().c_str()); + } + } else { + fprintf(stderr, "cf %s has remaining data to scan\n", + column_families_[i]->GetName().c_str()); + } + } + shared->SetVerificationFailure(); + break; + } + // If the program reaches here, then all column families' iterators are + // still valid. 
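VerifyDb's comment above is worth restating: with a prefix extractor configured, a default iterator is prefix-based, so SeekToFirst() and scanning across prefixes are unsupported; a full scan must set total_order_seek. A sketch of that pattern (CountAllKeys is an illustrative helper, not part of the patch):

    #include <rocksdb/db.h>
    #include <memory>

    // Full-column-family scan under a prefix extractor; without
    // total_order_seek the iterator would be prefix-based and become
    // invalid instead of walking the whole key space.
    long CountAllKeys(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cfh) {
      rocksdb::ReadOptions ropts;
      ropts.total_order_seek = true;  // bypass prefix-based iteration
      std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ropts, cfh));
      long count = 0;
      for (it->SeekToFirst(); it->Valid(); it->Next()) {
        ++count;
      }
      return it->status().ok() ? count : -1;
    }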
+ Slice key; + Slice value; + for (size_t i = 0; i != num; ++i) { + if (i == 0) { + key = iters[i]->key(); + value = iters[i]->value(); + } else { + if (key.compare(iters[i]->key()) != 0) { + fprintf(stderr, "Verification failed\n"); + fprintf(stderr, "cf%s: %s => %s\n", + column_families_[0]->GetName().c_str(), + key.ToString(true /* hex */).c_str(), + value.ToString(/* hex */).c_str()); + fprintf(stderr, "cf%s: %s => %s\n", + column_families_[i]->GetName().c_str(), + iters[i]->key().ToString(true /* hex */).c_str(), + iters[i]->value().ToString(true /* hex */).c_str()); + shared->SetVerificationFailure(); + } + } + } + for (auto& iter : iters) { + iter->Next(); + } + } while (true); + } + + virtual std::vector GenerateColumnFamilies( + const int /* num_column_families */, int /* rand_column_family */) const { + std::vector ret; + int num = static_cast(column_families_.size()); + int k = 0; + std::generate_n(back_inserter(ret), num, [&k]() -> int { return k++; }); + return ret; + } + + private: + std::atomic batch_id_; +}; + } // namespace rocksdb int main(int argc, char** argv) { @@ -3415,6 +4035,26 @@ int main(int argc, char** argv) { "Error: nooverwritepercent must be 0 when using file ingestion\n"); exit(1); } + if (FLAGS_clear_column_family_one_in > 0 && FLAGS_backup_one_in > 0) { + fprintf(stderr, + "Error: clear_column_family_one_in must be 0 when using backup\n"); + exit(1); + } + if (FLAGS_test_atomic_flush) { + FLAGS_atomic_flush = true; + } + if (FLAGS_read_only) { + if (FLAGS_writepercent != 0 || FLAGS_delpercent != 0 || + FLAGS_delrangepercent != 0) { + fprintf(stderr, "Error: updates are not supported in read only mode\n"); + exit(1); + } else if (FLAGS_checkpoint_one_in > 0 && + FLAGS_clear_column_family_one_in > 0) { + fprintf(stdout, + "Warn: checkpoint won't be validated since column families may " + "be dropped.\n"); + } + } // Choose a location for the test database if none given with --db= if (FLAGS_db.empty()) { @@ -3428,7 +4068,9 @@ int main(int argc, char** argv) { rocksdb_kill_prefix_blacklist = SplitString(FLAGS_kill_prefix_blacklist); std::unique_ptr stress; - if (FLAGS_test_batches_snapshots) { + if (FLAGS_test_atomic_flush) { + stress.reset(new rocksdb::AtomicFlushStressTest()); + } else if (FLAGS_test_batches_snapshots) { stress.reset(new rocksdb::BatchedOpsStressTest()); } else { stress.reset(new rocksdb::NonBatchedOpsStressTest()); diff --git a/ceph/src/rocksdb/tools/ldb_cmd.cc b/ceph/src/rocksdb/tools/ldb_cmd.cc index 4b6f6f4d8..e106bfbb2 100644 --- a/ceph/src/rocksdb/tools/ldb_cmd.cc +++ b/ceph/src/rocksdb/tools/ldb_cmd.cc @@ -16,7 +16,7 @@ #include "db/dbformat.h" #include "db/log_reader.h" #include "db/write_batch_internal.h" -#include "port/dirent.h" +#include "port/port_dirent.h" #include "rocksdb/cache.h" #include "rocksdb/table_properties.h" #include "rocksdb/utilities/backupable_db.h" @@ -81,10 +81,12 @@ const char* LDBCommand::DELIM = " ==> "; namespace { -void DumpWalFile(std::string wal_file, bool print_header, bool print_values, - bool is_write_committed, LDBCommandExecuteResult* exec_state); +void DumpWalFile(Options options, std::string wal_file, bool print_header, + bool print_values, bool is_write_committed, + LDBCommandExecuteResult* exec_state); -void DumpSstFile(std::string filename, bool output_hex, bool show_properties); +void DumpSstFile(Options options, std::string filename, bool output_hex, + bool show_properties); }; LDBCommand* LDBCommand::InitFromCmdLineArgs( @@ -130,12 +132,19 @@ LDBCommand* LDBCommand::InitFromCmdLineArgs( for 
(const auto& arg : args) { if (arg[0] == '-' && arg[1] == '-'){ std::vector splits = StringSplit(arg, '='); + // --option_name=option_value if (splits.size() == 2) { std::string optionKey = splits[0].substr(OPTION_PREFIX.size()); parsed_params.option_map[optionKey] = splits[1]; - } else { + } else if (splits.size() == 1) { + // --flag_name std::string optionKey = splits[0].substr(OPTION_PREFIX.size()); parsed_params.flags.push_back(optionKey); + } else { + // --option_name=option_value, option_value contains '=' + std::string optionKey = splits[0].substr(OPTION_PREFIX.size()); + parsed_params.option_map[optionKey] = + arg.substr(splits[0].length() + 1); } } else { cmdTokens.push_back(arg); @@ -330,6 +339,12 @@ void LDBCommand::OpenDB() { db_ = nullptr; return; } + if (options_.env->FileExists(options_.wal_dir).IsNotFound()) { + options_.wal_dir = db_path_; + fprintf( + stderr, + "wal_dir loaded from the option file doesn't exist. Ignore it.\n"); + } } options_ = PrepareOptionsForOpenDB(); if (!exec_state_.IsNotStarted()) { @@ -928,8 +943,8 @@ void DBLoaderCommand::DoCommand() { namespace { -void DumpManifestFile(std::string file, bool verbose, bool hex, bool json) { - Options options; +void DumpManifestFile(Options options, std::string file, bool verbose, bool hex, + bool json) { EnvOptions sopt; std::string dbname("dummy"); std::shared_ptr tc(NewLRUCache(options.max_open_files - 10, @@ -1030,7 +1045,7 @@ void ManifestDumpCommand::DoCommand() { printf("Processing Manifest file %s\n", manifestfile.c_str()); } - DumpManifestFile(manifestfile, verbose_, is_key_hex_, json_); + DumpManifestFile(options_, manifestfile, verbose_, is_key_hex_, json_); if (verbose_) { printf("Processing Manifest file %s done\n", manifestfile.c_str()); @@ -1425,14 +1440,15 @@ void DBDumperCommand::DoCommand() { switch (type) { case kLogFile: // TODO(myabandeh): allow configuring is_write_commited - DumpWalFile(path_, /* print_header_ */ true, /* print_values_ */ true, - true /* is_write_commited */, &exec_state_); + DumpWalFile(options_, path_, /* print_header_ */ true, + /* print_values_ */ true, true /* is_write_commited */, + &exec_state_); break; case kTableFile: - DumpSstFile(path_, is_key_hex_, /* show_properties */ true); + DumpSstFile(options_, path_, is_key_hex_, /* show_properties */ true); break; case kDescriptorFile: - DumpManifestFile(path_, /* verbose_ */ false, is_key_hex_, + DumpManifestFile(options_, path_, /* verbose_ */ false, is_key_hex_, /* json_ */ false); break; default: @@ -1460,7 +1476,9 @@ void DBDumperCommand::DoDumpCommand() { } // Setup key iterator - Iterator* iter = db_->NewIterator(ReadOptions(), GetCfHandle()); + ReadOptions scan_read_opts; + scan_read_opts.total_order_seek = true; + Iterator* iter = db_->NewIterator(scan_read_opts, GetCfHandle()); Status st = iter->status(); if (!st.ok()) { exec_state_ = @@ -1860,7 +1878,7 @@ void ChangeCompactionStyleCommand::DoCommand() { namespace { struct StdErrReporter : public log::Reader::Reporter { - virtual void Corruption(size_t /*bytes*/, const Status& s) override { + void Corruption(size_t /*bytes*/, const Status& s) override { std::cerr << "Corruption detected in log file " << s.ToString() << "\n"; } }; @@ -1885,74 +1903,71 @@ class InMemoryHandler : public WriteBatch::Handler { } } - virtual Status PutCF(uint32_t cf, const Slice& key, - const Slice& value) override { + Status PutCF(uint32_t cf, const Slice& key, const Slice& value) override { row_ << "PUT(" << cf << ") : "; commonPutMerge(key, value); return Status::OK(); } - virtual 
Status MergeCF(uint32_t cf, const Slice& key, - const Slice& value) override { + Status MergeCF(uint32_t cf, const Slice& key, const Slice& value) override { row_ << "MERGE(" << cf << ") : "; commonPutMerge(key, value); return Status::OK(); } - virtual Status MarkNoop(bool) - override { + Status MarkNoop(bool) override { row_ << "NOOP "; return Status::OK(); } - virtual Status DeleteCF(uint32_t cf, const Slice& key) override { + Status DeleteCF(uint32_t cf, const Slice& key) override { row_ << "DELETE(" << cf << ") : "; row_ << LDBCommand::StringToHex(key.ToString()) << " "; return Status::OK(); } - virtual Status SingleDeleteCF(uint32_t cf, const Slice& key) override { + Status SingleDeleteCF(uint32_t cf, const Slice& key) override { row_ << "SINGLE_DELETE(" << cf << ") : "; row_ << LDBCommand::StringToHex(key.ToString()) << " "; return Status::OK(); } - virtual Status DeleteRangeCF(uint32_t cf, const Slice& begin_key, - const Slice& end_key) override { + Status DeleteRangeCF(uint32_t cf, const Slice& begin_key, + const Slice& end_key) override { row_ << "DELETE_RANGE(" << cf << ") : "; row_ << LDBCommand::StringToHex(begin_key.ToString()) << " "; row_ << LDBCommand::StringToHex(end_key.ToString()) << " "; return Status::OK(); } - virtual Status MarkBeginPrepare(bool unprepare) override { + Status MarkBeginPrepare(bool unprepare) override { row_ << "BEGIN_PREPARE("; row_ << (unprepare ? "true" : "false") << ") "; return Status::OK(); } - virtual Status MarkEndPrepare(const Slice& xid) override { + Status MarkEndPrepare(const Slice& xid) override { row_ << "END_PREPARE("; row_ << LDBCommand::StringToHex(xid.ToString()) << ") "; return Status::OK(); } - virtual Status MarkRollback(const Slice& xid) override { + Status MarkRollback(const Slice& xid) override { row_ << "ROLLBACK("; row_ << LDBCommand::StringToHex(xid.ToString()) << ") "; return Status::OK(); } - virtual Status MarkCommit(const Slice& xid) override { + Status MarkCommit(const Slice& xid) override { row_ << "COMMIT("; row_ << LDBCommand::StringToHex(xid.ToString()) << ") "; return Status::OK(); } - virtual ~InMemoryHandler() {} + ~InMemoryHandler() override {} protected: - virtual bool WriteAfterCommit() const override { return write_after_commit_; } + bool WriteAfterCommit() const override { return write_after_commit_; } private: std::stringstream& row_; @@ -1960,16 +1975,17 @@ class InMemoryHandler : public WriteBatch::Handler { bool write_after_commit_; }; -void DumpWalFile(std::string wal_file, bool print_header, bool print_values, - bool is_write_committed, LDBCommandExecuteResult* exec_state) { - Env* env_ = Env::Default(); - EnvOptions soptions; - unique_ptr wal_file_reader; +void DumpWalFile(Options options, std::string wal_file, bool print_header, + bool print_values, bool is_write_committed, + LDBCommandExecuteResult* exec_state) { + Env* env = options.env; + EnvOptions soptions(options); + std::unique_ptr wal_file_reader; Status status; { - unique_ptr file; - status = env_->NewSequentialFile(wal_file, &file, soptions); + std::unique_ptr file; + status = env->NewSequentialFile(wal_file, &file, soptions); if (status.ok()) { wal_file_reader.reset( new SequentialFileReader(std::move(file), wal_file)); @@ -1997,9 +2013,8 @@ void DumpWalFile(std::string wal_file, bool print_header, bool print_values, // bogus input, carry on as best we can log_number = 0; } - DBOptions db_options; - log::Reader reader(db_options.info_log, std::move(wal_file_reader), - &reporter, true /* checksum */, log_number); + log::Reader 
reader(options.info_log, std::move(wal_file_reader), &reporter, + true /* checksum */, log_number); std::string scratch; WriteBatch batch; Slice record; @@ -2078,8 +2093,8 @@ void WALDumperCommand::Help(std::string& ret) { } void WALDumperCommand::DoCommand() { - DumpWalFile(wal_file_, print_header_, print_values_, is_write_committed_, - &exec_state_); + DumpWalFile(options_, wal_file_, print_header_, print_values_, + is_write_committed_, &exec_state_); } // ---------------------------------------------------------------------------- @@ -2318,7 +2333,9 @@ void ScanCommand::DoCommand() { } int num_keys_scanned = 0; - Iterator* it = db_->NewIterator(ReadOptions(), GetCfHandle()); + ReadOptions scan_read_opts; + scan_read_opts.total_order_seek = true; + Iterator* it = db_->NewIterator(scan_read_opts, GetCfHandle()); if (start_key_specified_) { it->Seek(start_key_); } else { @@ -2835,7 +2852,8 @@ void RestoreCommand::DoCommand() { namespace { -void DumpSstFile(std::string filename, bool output_hex, bool show_properties) { +void DumpSstFile(Options options, std::string filename, bool output_hex, + bool show_properties) { std::string from_key; std::string to_key; if (filename.length() <= 4 || @@ -2844,8 +2862,9 @@ void DumpSstFile(std::string filename, bool output_hex, bool show_properties) { return; } // no verification - rocksdb::SstFileReader reader(filename, false, output_hex); - Status st = reader.ReadSequential(true, std::numeric_limits::max(), false, // has_from + rocksdb::SstFileDumper dumper(options, filename, false, output_hex); + Status st = dumper.ReadSequential(true, std::numeric_limits::max(), + false, // has_from from_key, false, // has_to to_key); if (!st.ok()) { @@ -2859,21 +2878,17 @@ void DumpSstFile(std::string filename, bool output_hex, bool show_properties) { std::shared_ptr table_properties_from_reader; - st = reader.ReadTableProperties(&table_properties_from_reader); + st = dumper.ReadTableProperties(&table_properties_from_reader); if (!st.ok()) { std::cerr << filename << ": " << st.ToString() << ". 
Try to use initial table properties" << std::endl; - table_properties = reader.GetInitTableProperties(); + table_properties = dumper.GetInitTableProperties(); } else { table_properties = table_properties_from_reader.get(); } if (table_properties != nullptr) { std::cout << std::endl << "Table Properties:" << std::endl; std::cout << table_properties->ToString("\n") << std::endl; - std::cout << "# deleted keys: " - << rocksdb::GetDeletedKeys( - table_properties->user_collected_properties) - << std::endl; } } } @@ -2913,7 +2928,7 @@ void DBFileDumperCommand::DoCommand() { manifest_filename.resize(manifest_filename.size() - 1); std::string manifest_filepath = db_->GetName() + "/" + manifest_filename; std::cout << manifest_filepath << std::endl; - DumpManifestFile(manifest_filepath, false, false, false); + DumpManifestFile(options_, manifest_filepath, false, false, false); std::cout << std::endl; std::cout << "SST Files" << std::endl; @@ -2924,7 +2939,7 @@ void DBFileDumperCommand::DoCommand() { std::string filename = fileMetadata.db_path + fileMetadata.name; std::cout << filename << " level:" << fileMetadata.level << std::endl; std::cout << "------------------------------" << std::endl; - DumpSstFile(filename, false, true); + DumpSstFile(options_, filename, false, true); std::cout << std::endl; } std::cout << std::endl; @@ -2941,7 +2956,7 @@ void DBFileDumperCommand::DoCommand() { std::string filename = db_->GetOptions().wal_dir + wal->PathName(); std::cout << filename << std::endl; // TODO(myabandeh): allow configuring is_write_commited - DumpWalFile(filename, true, true, true /* is_write_commited */, + DumpWalFile(options_, filename, true, true, true /* is_write_commited */, &exec_state_); } } diff --git a/ceph/src/rocksdb/tools/ldb_cmd_test.cc b/ceph/src/rocksdb/tools/ldb_cmd_test.cc index 824b44c08..3b7099533 100644 --- a/ceph/src/rocksdb/tools/ldb_cmd_test.cc +++ b/ceph/src/rocksdb/tools/ldb_cmd_test.cc @@ -12,6 +12,8 @@ using std::string; using std::vector; using std::map; +namespace rocksdb { + class LdbCmdTest : public testing::Test {}; TEST_F(LdbCmdTest, HexToString) { @@ -47,6 +49,77 @@ TEST_F(LdbCmdTest, HexToStringBadInputs) { } } +TEST_F(LdbCmdTest, MemEnv) { + std::unique_ptr env(NewMemEnv(Env::Default())); + Options opts; + opts.env = env.get(); + opts.create_if_missing = true; + + DB* db = nullptr; + std::string dbname = test::TmpDir(); + ASSERT_OK(DB::Open(opts, dbname, &db)); + + WriteOptions wopts; + for (int i = 0; i < 100; i++) { + char buf[16]; + snprintf(buf, sizeof(buf), "%08d", i); + ASSERT_OK(db->Put(wopts, buf, buf)); + } + FlushOptions fopts; + fopts.wait = true; + ASSERT_OK(db->Flush(fopts)); + + delete db; + + char arg1[] = "./ldb"; + char arg2[1024]; + snprintf(arg2, sizeof(arg2), "--db=%s", dbname.c_str()); + char arg3[] = "dump_live_files"; + char* argv[] = {arg1, arg2, arg3}; + + rocksdb::LDBTool tool; + tool.Run(3, argv, opts); +} + +TEST_F(LdbCmdTest, OptionParsing) { + // test parsing flags + { + std::vector args; + args.push_back("scan"); + args.push_back("--ttl"); + args.push_back("--timestamp"); + LDBCommand* command = rocksdb::LDBCommand::InitFromCmdLineArgs( + args, Options(), LDBOptions(), nullptr); + const std::vector flags = command->TEST_GetFlags(); + EXPECT_EQ(flags.size(), 2); + EXPECT_EQ(flags[0], "ttl"); + EXPECT_EQ(flags[1], "timestamp"); + delete command; + } + // test parsing options which contains equal sign in the option value + { + std::vector args; + args.push_back("scan"); + args.push_back("--db=/dev/shm/ldbtest/"); + args.push_back( + 
"--from='abcd/efg/hijk/lmn/" + "opq:__rst.uvw.xyz?a=3+4+bcd+efghi&jk=lm_no&pq=rst-0&uv=wx-8&yz=a&bcd_" + "ef=gh.ijk'"); + LDBCommand* command = rocksdb::LDBCommand::InitFromCmdLineArgs( + args, Options(), LDBOptions(), nullptr); + const std::map option_map = + command->TEST_GetOptionMap(); + EXPECT_EQ(option_map.at("db"), "/dev/shm/ldbtest/"); + EXPECT_EQ(option_map.at("from"), + "'abcd/efg/hijk/lmn/" + "opq:__rst.uvw.xyz?a=3+4+bcd+efghi&jk=lm_no&pq=rst-0&uv=wx-8&yz=" + "a&bcd_ef=gh.ijk'"); + delete command; + } +} + +} // namespace rocksdb + int main(int argc, char** argv) { ::testing::InitGoogleTest(&argc, argv); return RUN_ALL_TESTS(); diff --git a/ceph/src/rocksdb/tools/sst_dump_test.cc b/ceph/src/rocksdb/tools/sst_dump_test.cc index beab224d1..6bf3e3b97 100644 --- a/ceph/src/rocksdb/tools/sst_dump_test.cc +++ b/ceph/src/rocksdb/tools/sst_dump_test.cc @@ -38,24 +38,18 @@ static std::string MakeValue(int i) { return key.Encode().ToString(); } -void createSST(const std::string& file_name, - const BlockBasedTableOptions& table_options) { - std::shared_ptr tf; - tf.reset(new rocksdb::BlockBasedTableFactory(table_options)); - - unique_ptr file; - Env* env = Env::Default(); - EnvOptions env_options; +void createSST(const Options& opts, const std::string& file_name) { + Env* env = opts.env; + EnvOptions env_options(opts); ReadOptions read_options; - Options opts; const ImmutableCFOptions imoptions(opts); const MutableCFOptions moptions(opts); rocksdb::InternalKeyComparator ikc(opts.comparator); - unique_ptr tb; + std::unique_ptr tb; + std::unique_ptr file; ASSERT_OK(env->NewWritableFile(file_name, &file, env_options)); - opts.table_factory = tf; std::vector > int_tbl_prop_collector_factories; std::unique_ptr file_writer( @@ -65,9 +59,9 @@ void createSST(const std::string& file_name, tb.reset(opts.table_factory->NewTableBuilder( TableBuilderOptions( imoptions, moptions, ikc, &int_tbl_prop_collector_factories, - CompressionType::kNoCompression, CompressionOptions(), - nullptr /* compression_dict */, false /* skip_filters */, - column_family_name, unknown_level), + CompressionType::kNoCompression, 0 /* sample_for_compression */, + CompressionOptions(), false /* skip_filters */, column_family_name, + unknown_level), TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, file_writer.get())); @@ -80,8 +74,8 @@ void createSST(const std::string& file_name, file_writer->Close(); } -void cleanup(const std::string& file_name) { - Env* env = Env::Default(); +void cleanup(const Options& opts, const std::string& file_name) { + Env* env = opts.env; env->DeleteFile(file_name); std::string outfile_name = file_name.substr(0, file_name.length() - 4); outfile_name.append("_dump.txt"); @@ -94,11 +88,9 @@ class SSTDumpToolTest : public testing::Test { std::string testDir_; public: - BlockBasedTableOptions table_options_; - SSTDumpToolTest() { testDir_ = test::TmpDir(); } - ~SSTDumpToolTest() {} + ~SSTDumpToolTest() override {} std::string MakeFilePath(const std::string& file_name) const { std::string path(testDir_); @@ -119,88 +111,121 @@ class SSTDumpToolTest : public testing::Test { }; TEST_F(SSTDumpToolTest, EmptyFilter) { + Options opts; std::string file_path = MakeFilePath("rocksdb_sst_test.sst"); - createSST(file_path, table_options_); + createSST(opts, file_path); char* usage[3]; PopulateCommandArgs(file_path, "--command=raw", usage); rocksdb::SSTDumpTool tool; - ASSERT_TRUE(!tool.Run(3, usage)); + ASSERT_TRUE(!tool.Run(3, usage, opts)); - cleanup(file_path); + cleanup(opts, file_path); for (int i 
= 0; i < 3; i++) { delete[] usage[i]; } } TEST_F(SSTDumpToolTest, FilterBlock) { - table_options_.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, true)); + Options opts; + BlockBasedTableOptions table_opts; + table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, true)); + opts.table_factory.reset(new BlockBasedTableFactory(table_opts)); std::string file_path = MakeFilePath("rocksdb_sst_test.sst"); - createSST(file_path, table_options_); + createSST(opts, file_path); char* usage[3]; PopulateCommandArgs(file_path, "--command=raw", usage); rocksdb::SSTDumpTool tool; - ASSERT_TRUE(!tool.Run(3, usage)); + ASSERT_TRUE(!tool.Run(3, usage, opts)); - cleanup(file_path); + cleanup(opts, file_path); for (int i = 0; i < 3; i++) { delete[] usage[i]; } } TEST_F(SSTDumpToolTest, FullFilterBlock) { - table_options_.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); + Options opts; + BlockBasedTableOptions table_opts; + table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); + opts.table_factory.reset(new BlockBasedTableFactory(table_opts)); std::string file_path = MakeFilePath("rocksdb_sst_test.sst"); - createSST(file_path, table_options_); + createSST(opts, file_path); char* usage[3]; PopulateCommandArgs(file_path, "--command=raw", usage); rocksdb::SSTDumpTool tool; - ASSERT_TRUE(!tool.Run(3, usage)); + ASSERT_TRUE(!tool.Run(3, usage, opts)); - cleanup(file_path); + cleanup(opts, file_path); for (int i = 0; i < 3; i++) { delete[] usage[i]; } } TEST_F(SSTDumpToolTest, GetProperties) { - table_options_.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); + Options opts; + BlockBasedTableOptions table_opts; + table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); + opts.table_factory.reset(new BlockBasedTableFactory(table_opts)); std::string file_path = MakeFilePath("rocksdb_sst_test.sst"); - createSST(file_path, table_options_); + createSST(opts, file_path); char* usage[3]; PopulateCommandArgs(file_path, "--show_properties", usage); rocksdb::SSTDumpTool tool; - ASSERT_TRUE(!tool.Run(3, usage)); + ASSERT_TRUE(!tool.Run(3, usage, opts)); - cleanup(file_path); + cleanup(opts, file_path); for (int i = 0; i < 3; i++) { delete[] usage[i]; } } TEST_F(SSTDumpToolTest, CompressedSizes) { - table_options_.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); + Options opts; + BlockBasedTableOptions table_opts; + table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); + opts.table_factory.reset(new BlockBasedTableFactory(table_opts)); std::string file_path = MakeFilePath("rocksdb_sst_test.sst"); - createSST(file_path, table_options_); + createSST(opts, file_path); char* usage[3]; PopulateCommandArgs(file_path, "--command=recompress", usage); rocksdb::SSTDumpTool tool; - ASSERT_TRUE(!tool.Run(3, usage)); + ASSERT_TRUE(!tool.Run(3, usage, opts)); + + cleanup(opts, file_path); + for (int i = 0; i < 3; i++) { + delete[] usage[i]; + } +} + +TEST_F(SSTDumpToolTest, MemEnv) { + std::unique_ptr env(NewMemEnv(Env::Default())); + Options opts; + opts.env = env.get(); + std::string file_path = MakeFilePath("rocksdb_sst_test.sst"); + createSST(opts, file_path); + + char* usage[3]; + PopulateCommandArgs(file_path, "--command=verify_checksum", usage); + + rocksdb::SSTDumpTool tool; + ASSERT_TRUE(!tool.Run(3, usage, opts)); - cleanup(file_path); + cleanup(opts, file_path); for (int i = 0; i < 3; i++) { delete[] usage[i]; } } + } // namespace rocksdb int main(int argc, char** argv) { diff --git 
a/ceph/src/rocksdb/tools/sst_dump_tool.cc b/ceph/src/rocksdb/tools/sst_dump_tool.cc index 6ca56aad9..5cbbfc385 100644 --- a/ceph/src/rocksdb/tools/sst_dump_tool.cc +++ b/ceph/src/rocksdb/tools/sst_dump_tool.cc @@ -43,12 +43,14 @@ namespace rocksdb { -SstFileReader::SstFileReader(const std::string& file_path, bool verify_checksum, +SstFileDumper::SstFileDumper(const Options& options, + const std::string& file_path, bool verify_checksum, bool output_hex) : file_name_(file_path), read_num_(0), verify_checksum_(verify_checksum), output_hex_(output_hex), + options_(options), ioptions_(options_), moptions_(ColumnFamilyOptions(options_)), internal_comparator_(BytewiseComparator()) { @@ -74,7 +76,7 @@ static const std::vector> {CompressionType::kXpressCompression, "kXpressCompression"}, {CompressionType::kZSTD, "kZSTD"}}; -Status SstFileReader::GetTableReader(const std::string& file_path) { +Status SstFileDumper::GetTableReader(const std::string& file_path) { // Warning about 'magic_number' being uninitialized shows up only in UBsan // builds. Though access is guarded by 's.ok()' checks, fix the issue to // avoid any warnings. @@ -83,7 +85,7 @@ Status SstFileReader::GetTableReader(const std::string& file_path) { // read table magic number Footer footer; - unique_ptr file; + std::unique_ptr file; uint64_t file_size = 0; Status s = options_.env->NewRandomAccessFile(file_path, &file, soptions_); if (s.ok()) { @@ -123,10 +125,10 @@ Status SstFileReader::GetTableReader(const std::string& file_path) { return s; } -Status SstFileReader::NewTableReader( +Status SstFileDumper::NewTableReader( const ImmutableCFOptions& /*ioptions*/, const EnvOptions& /*soptions*/, const InternalKeyComparator& /*internal_comparator*/, uint64_t file_size, - unique_ptr* /*table_reader*/) { + std::unique_ptr* /*table_reader*/) { // We need to turn off pre-fetching of index and filter nodes for // BlockBasedTable if (BlockBasedTableFactory::kName == options_.table_factory->Name()) { @@ -143,12 +145,12 @@ Status SstFileReader::NewTableReader( std::move(file_), file_size, &table_reader_); } -Status SstFileReader::VerifyChecksum() { +Status SstFileDumper::VerifyChecksum() { return table_reader_->VerifyChecksum(); } -Status SstFileReader::DumpTable(const std::string& out_filename) { - unique_ptr out_file; +Status SstFileDumper::DumpTable(const std::string& out_filename) { + std::unique_ptr out_file; Env* env = Env::Default(); env->NewWritableFile(out_filename, &out_file, soptions_); Status s = table_reader_->DumpTable(out_file.get(), @@ -157,23 +159,23 @@ Status SstFileReader::DumpTable(const std::string& out_filename) { return s; } -uint64_t SstFileReader::CalculateCompressedTableSize( +uint64_t SstFileDumper::CalculateCompressedTableSize( const TableBuilderOptions& tb_options, size_t block_size) { - unique_ptr out_file; - unique_ptr env(NewMemEnv(Env::Default())); + std::unique_ptr out_file; + std::unique_ptr env(NewMemEnv(Env::Default())); env->NewWritableFile(testFileName, &out_file, soptions_); - unique_ptr dest_writer; + std::unique_ptr dest_writer; dest_writer.reset( new WritableFileWriter(std::move(out_file), testFileName, soptions_)); BlockBasedTableOptions table_options; table_options.block_size = block_size; BlockBasedTableFactory block_based_tf(table_options); - unique_ptr table_builder; + std::unique_ptr table_builder; table_builder.reset(block_based_tf.NewTableBuilder( tb_options, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, dest_writer.get())); - unique_ptr iter(table_reader_->NewIterator( + 
std::unique_ptr iter(table_reader_->NewIterator( ReadOptions(), moptions_.prefix_extractor.get())); for (iter->SeekToFirst(); iter->Valid(); iter->Next()) { if (!iter->status().ok()) { @@ -192,7 +194,7 @@ uint64_t SstFileReader::CalculateCompressedTableSize( return size; } -int SstFileReader::ShowAllCompressionSizes( +int SstFileDumper::ShowAllCompressionSizes( size_t block_size, const std::vector>& compression_types) { @@ -214,7 +216,7 @@ int SstFileReader::ShowAllCompressionSizes( int unknown_level = -1; TableBuilderOptions tb_opts( imoptions, moptions, ikc, &block_based_table_factories, i.first, - compress_opt, nullptr /* compression_dict */, + 0 /* sample_for_compression */, compress_opt, false /* skip_filters */, column_family_name, unknown_level); uint64_t file_size = CalculateCompressedTableSize(tb_opts, block_size); fprintf(stdout, "Compression: %s", i.second); @@ -226,7 +228,7 @@ int SstFileReader::ShowAllCompressionSizes( return 0; } -Status SstFileReader::ReadTableProperties(uint64_t table_magic_number, +Status SstFileDumper::ReadTableProperties(uint64_t table_magic_number, RandomAccessFileReader* file, uint64_t file_size) { TableProperties* table_properties = nullptr; @@ -240,7 +242,7 @@ Status SstFileReader::ReadTableProperties(uint64_t table_magic_number, return s; } -Status SstFileReader::SetTableOptionsByMagicNumber( +Status SstFileDumper::SetTableOptionsByMagicNumber( uint64_t table_magic_number) { assert(table_properties_); if (table_magic_number == kBlockBasedTableMagicNumber || @@ -283,7 +285,7 @@ Status SstFileReader::SetTableOptionsByMagicNumber( return Status::OK(); } -Status SstFileReader::SetOldTableOptions() { +Status SstFileDumper::SetOldTableOptions() { assert(table_properties_ == nullptr); options_.table_factory = std::make_shared(); fprintf(stdout, "Sst file format: block-based(old version)\n"); @@ -291,7 +293,7 @@ Status SstFileReader::SetOldTableOptions() { return Status::OK(); } -Status SstFileReader::ReadSequential(bool print_kv, uint64_t read_num, +Status SstFileDumper::ReadSequential(bool print_kv, uint64_t read_num, bool has_from, const std::string& from_key, bool has_to, const std::string& to_key, bool use_from_as_prefix) { @@ -348,7 +350,7 @@ Status SstFileReader::ReadSequential(bool print_kv, uint64_t read_num, return ret; } -Status SstFileReader::ReadTableProperties( +Status SstFileDumper::ReadTableProperties( std::shared_ptr* table_properties) { if (!table_reader_) { return init_result_; @@ -417,7 +419,7 @@ void print_help() { } // namespace -int SSTDumpTool::Run(int argc, char** argv) { +int SSTDumpTool::Run(int argc, char** argv, Options options) { const char* dir_or_file = nullptr; uint64_t read_num = std::numeric_limits::max(); std::string command; @@ -545,7 +547,7 @@ int SSTDumpTool::Run(int argc, char** argv) { } std::vector filenames; - rocksdb::Env* env = rocksdb::Env::Default(); + rocksdb::Env* env = options.env; rocksdb::Status st = env->GetChildren(dir_or_file, &filenames); bool dir = true; if (!st.ok()) { @@ -570,16 +572,16 @@ int SSTDumpTool::Run(int argc, char** argv) { filename = std::string(dir_or_file) + "/" + filename; } - rocksdb::SstFileReader reader(filename, verify_checksum, + rocksdb::SstFileDumper dumper(options, filename, verify_checksum, output_hex); - if (!reader.getStatus().ok()) { + if (!dumper.getStatus().ok()) { fprintf(stderr, "%s: %s\n", filename.c_str(), - reader.getStatus().ToString().c_str()); + dumper.getStatus().ToString().c_str()); continue; } if (command == "recompress") { - reader.ShowAllCompressionSizes( + 
dumper.ShowAllCompressionSizes( set_block_size ? block_size : 16384, compression_types.empty() ? kCompressions : compression_types); return 0; @@ -589,7 +591,7 @@ int SSTDumpTool::Run(int argc, char** argv) { std::string out_filename = filename.substr(0, filename.length() - 4); out_filename.append("_dump.txt"); - st = reader.DumpTable(out_filename); + st = dumper.DumpTable(out_filename); if (!st.ok()) { fprintf(stderr, "%s: %s\n", filename.c_str(), st.ToString().c_str()); exit(1); @@ -601,7 +603,7 @@ int SSTDumpTool::Run(int argc, char** argv) { // scan all files in give file path. if (command == "" || command == "scan" || command == "check") { - st = reader.ReadSequential( + st = dumper.ReadSequential( command == "scan", read_num > 0 ? (read_num - total_read) : read_num, has_from || use_from_as_prefix, from_key, has_to, to_key, use_from_as_prefix); @@ -609,14 +611,14 @@ int SSTDumpTool::Run(int argc, char** argv) { fprintf(stderr, "%s: %s\n", filename.c_str(), st.ToString().c_str()); } - total_read += reader.GetReadNumber(); + total_read += dumper.GetReadNumber(); if (read_num > 0 && total_read > read_num) { break; } } if (command == "verify") { - st = reader.VerifyChecksum(); + st = dumper.VerifyChecksum(); if (!st.ok()) { fprintf(stderr, "%s is corrupted: %s\n", filename.c_str(), st.ToString().c_str()); @@ -631,11 +633,11 @@ int SSTDumpTool::Run(int argc, char** argv) { std::shared_ptr table_properties_from_reader; - st = reader.ReadTableProperties(&table_properties_from_reader); + st = dumper.ReadTableProperties(&table_properties_from_reader); if (!st.ok()) { fprintf(stderr, "%s: %s\n", filename.c_str(), st.ToString().c_str()); fprintf(stderr, "Try to use initial table properties\n"); - table_properties = reader.GetInitTableProperties(); + table_properties = dumper.GetInitTableProperties(); } else { table_properties = table_properties_from_reader.get(); } @@ -646,19 +648,6 @@ int SSTDumpTool::Run(int argc, char** argv) { "------------------------------\n" " %s", table_properties->ToString("\n ", ": ").c_str()); - fprintf(stdout, "# deleted keys: %" PRIu64 "\n", - rocksdb::GetDeletedKeys( - table_properties->user_collected_properties)); - - bool property_present; - uint64_t merge_operands = rocksdb::GetMergeOperands( - table_properties->user_collected_properties, &property_present); - if (property_present) { - fprintf(stdout, " # merge operands: %" PRIu64 "\n", - merge_operands); - } else { - fprintf(stdout, " # merge operands: UNKNOWN\n"); - } } total_num_files += 1; total_num_data_blocks += table_properties->num_data_blocks; diff --git a/ceph/src/rocksdb/tools/sst_dump_tool_imp.h b/ceph/src/rocksdb/tools/sst_dump_tool_imp.h index ca60dd93c..846738a40 100644 --- a/ceph/src/rocksdb/tools/sst_dump_tool_imp.h +++ b/ceph/src/rocksdb/tools/sst_dump_tool_imp.h @@ -15,10 +15,10 @@ namespace rocksdb { -class SstFileReader { +class SstFileDumper { public: - explicit SstFileReader(const std::string& file_name, bool verify_checksum, - bool output_hex); + explicit SstFileDumper(const Options& options, const std::string& file_name, + bool verify_checksum, bool output_hex); Status ReadSequential(bool print_kv, uint64_t read_num, bool has_from, const std::string& from_key, bool has_to, @@ -57,7 +57,7 @@ class SstFileReader { const EnvOptions& soptions, const InternalKeyComparator& internal_comparator, uint64_t file_size, - unique_ptr* table_reader); + std::unique_ptr* table_reader); std::string file_name_; uint64_t read_num_; @@ -70,13 +70,13 @@ class SstFileReader { Options options_; Status 
init_result_;

-  unique_ptr<TableReader> table_reader_;
-  unique_ptr<RandomAccessFileReader> file_;
+  std::unique_ptr<TableReader> table_reader_;
+  std::unique_ptr<RandomAccessFileReader> file_;
   const ImmutableCFOptions ioptions_;
   const MutableCFOptions moptions_;
   InternalKeyComparator internal_comparator_;
-  unique_ptr<TableProperties> table_properties_;
+  std::unique_ptr<TableProperties> table_properties_;
 };
 }  // namespace rocksdb
diff --git a/ceph/src/rocksdb/tools/trace_analyzer_test.cc b/ceph/src/rocksdb/tools/trace_analyzer_test.cc
index 768f789cc..b2cc777d5 100644
--- a/ceph/src/rocksdb/tools/trace_analyzer_test.cc
+++ b/ceph/src/rocksdb/tools/trace_analyzer_test.cc
@@ -50,7 +50,7 @@ class TraceAnalyzerTest : public testing::Test {
     dbname_ = test_path_ + "/db";
   }

-  ~TraceAnalyzerTest() {}
+  ~TraceAnalyzerTest() override {}

   void GenerateTrace(std::string trace_path) {
     Options options;
diff --git a/ceph/src/rocksdb/tools/trace_analyzer_tool.cc b/ceph/src/rocksdb/tools/trace_analyzer_tool.cc
index 7915322f0..a01869252 100644
--- a/ceph/src/rocksdb/tools/trace_analyzer_tool.cc
+++ b/ceph/src/rocksdb/tools/trace_analyzer_tool.cc
@@ -78,6 +78,11 @@ DEFINE_bool(output_time_series, false,
             "such that we can have the time series data of the queries \n"
             "File name: ---time_series.txt\n"
             "Format:[type_id time_in_sec access_keyid].");
+DEFINE_bool(try_process_corrupted_trace, false,
+            "By default, trace_analyzer will exit if the trace file is "
+            "corrupted due to unexpected tracing cases. If this option "
+            "is enabled, trace_analyzer will stop reading the trace file "
+            "and start analyzing the read-in data.");
 DEFINE_int32(output_prefix_cut, 0,
             "The number of bytes as prefix to cut the keys.\n"
             "If it is enabled, it will generate the following:\n"
@@ -139,7 +144,7 @@ DEFINE_bool(no_key, false,
 DEFINE_bool(print_overall_stats, true,
             " Print the stats of the whole trace, "
             "like total requests, keys, and etc.");
-DEFINE_bool(print_key_distribution, false, "Print the key size distribution.");
+DEFINE_bool(output_key_distribution, false, "Print the key size distribution.");
 DEFINE_bool(
     output_value_distribution, false,
     "Out put the value size distribution, only available for Put and Merge.\n"
@@ -158,6 +163,9 @@ DEFINE_int32(value_interval, 8,
             "To output the value distribution, we need to set the value "
             "intervals and make the statistic of the value size distribution "
             "in different intervals. The default is 8.");
+DEFINE_double(sample_ratio, 1.0,
+              "If the trace size is extremely huge or the user wants to "
+              "sample the trace when analyzing, the sample ratio can be set "
+              "in (0, 1.0].");

 namespace rocksdb {

@@ -276,9 +284,17 @@ TraceAnalyzer::TraceAnalyzer(std::string& trace_path, std::string& output_path,
   total_access_keys_ = 0;
   total_gets_ = 0;
   total_writes_ = 0;
+  trace_create_time_ = 0;
   begin_time_ = 0;
   end_time_ = 0;
   time_series_start_ = 0;
+  cur_time_sec_ = 0;
+  if (FLAGS_sample_ratio > 1.0 || FLAGS_sample_ratio <= 0) {
+    sample_max_ = 1;
+  } else {
+    sample_max_ = static_cast<uint32_t>(1.0 / FLAGS_sample_ratio);
+  }
+
   ta_.resize(kTaTypeNum);
   ta_[0].type_name = "get";
   if (FLAGS_analyze_get) {
@@ -328,6 +344,9 @@
   } else {
     ta_[7].enabled = false;
   }
+  for (int i = 0; i < kTaTypeNum; i++) {
+    ta_[i].sample_count = 0;
+  }
 }

 TraceAnalyzer::~TraceAnalyzer() {}
@@ -363,6 +382,13 @@ Status TraceAnalyzer::PrepareProcessing() {
     if (!s.ok()) {
       return s;
     }
+
+    qps_stats_name =
+        output_path_ + "/" + FLAGS_output_prefix + "-cf_qps_stats.txt";
+    s = env_->NewWritableFile(qps_stats_name, &cf_qps_f_, env_options_);
+    if (!s.ok()) {
+      return s;
+    }
   }
   return Status::OK();
 }
@@ -422,6 +448,7 @@ Status TraceAnalyzer::StartProcessing() {
     fprintf(stderr, "Cannot read the header\n");
     return s;
   }
+  trace_create_time_ = header.ts;
   if (FLAGS_output_time_series) {
     time_series_start_ = header.ts;
   }
@@ -521,7 +548,7 @@ Status TraceAnalyzer::MakeStatistics() {
       }

       // Generate the key size distribution data
-      if (FLAGS_print_key_distribution) {
+      if (FLAGS_output_key_distribution) {
         if (stat.second.a_key_size_stats.find(record.first.size()) ==
             stat.second.a_key_size_stats.end()) {
           stat.second.a_key_size_stats[record.first.size()] = 1;
@@ -565,17 +592,31 @@
     // find the medium of the key size
     uint64_t k_count = 0;
+    bool get_mid = false;
     for (auto& record : stat.second.a_key_size_stats) {
       k_count += record.second;
-      if (k_count >= stat.second.a_key_mid) {
+      if (!get_mid && k_count >= stat.second.a_key_mid) {
         stat.second.a_key_mid = record.first;
-        break;
+        get_mid = true;
+      }
+      if (FLAGS_output_key_distribution && stat.second.a_key_size_f) {
+        ret = sprintf(buffer_, "%" PRIu64 " %" PRIu64 "\n", record.first,
+                      record.second);
+        if (ret < 0) {
+          return Status::IOError("Format output failed");
+        }
+        std::string printout(buffer_);
+        s = stat.second.a_key_size_f->Append(printout);
+        if (!s.ok()) {
+          fprintf(stderr, "Write key size distribution file failed\n");
+          return s;
+        }
       }
     }

     // output the value size distribution
     uint64_t v_begin = 0, v_end = 0, v_count = 0;
-    bool get_mid = false;
+    get_mid = false;
     for (auto& record : stat.second.a_value_size_stats) {
       v_begin = v_end;
       v_end = (record.first + 1) * FLAGS_value_interval;
@@ -740,6 +781,9 @@ Status TraceAnalyzer::MakeStatisticCorrelation(TraceStats& stats,

 // Process the statistics of QPS
 Status TraceAnalyzer::MakeStatisticQPS() {
+  if (begin_time_ == 0) {
+    begin_time_ = trace_create_time_;
+  }
   uint32_t duration =
       static_cast<uint32_t>((end_time_ - begin_time_) / 1000000);
   int ret;
@@ -818,6 +862,32 @@
       stat.second.a_ave_qps = (static_cast<double>(cf_qps_sum)) / duration;
     }

+    // Output the accessed unique key number change over time
+    if (stat.second.a_key_num_f) {
+      uint64_t cur_uni_key =
+          static_cast<uint64_t>(stat.second.a_key_stats.size());
+      double cur_ratio = 0.0;
+      uint64_t cur_num = 0;
+      for (uint32_t i = 0; i < duration; i++) {
+        auto
find_time = stat.second.uni_key_num.find(i); + if (find_time != stat.second.uni_key_num.end()) { + cur_ratio = (static_cast(find_time->second)) / cur_uni_key; + cur_num = find_time->second; + } + ret = sprintf(buffer_, "%" PRIu64 " %.12f\n", cur_num, cur_ratio); + if (ret < 0) { + return Status::IOError("Format the output failed"); + } + std::string printout(buffer_); + s = stat.second.a_key_num_f->Append(printout); + if (!s.ok()) { + fprintf(stderr, + "Write accessed unique key number change file failed\n"); + return s; + } + } + } + // output the prefix of top k access peak if (FLAGS_output_prefix_cut > 0 && stat.second.a_top_qps_prefix_f) { while (!stat.second.top_k_qps_sec.empty()) { @@ -882,6 +952,33 @@ Status TraceAnalyzer::MakeStatisticQPS() { } } + if (cf_qps_f_) { + int cfs_size = static_cast(cfs_.size()); + uint32_t v; + for (uint32_t i = 0; i < duration; i++) { + for (int cf = 0; cf < cfs_size; cf++) { + if (cfs_[cf].cf_qps.find(i) != cfs_[cf].cf_qps.end()) { + v = cfs_[cf].cf_qps[i]; + } else { + v = 0; + } + if (cf < cfs_size - 1) { + ret = sprintf(buffer_, "%u ", v); + } else { + ret = sprintf(buffer_, "%u\n", v); + } + if (ret < 0) { + return Status::IOError("Format the output failed"); + } + std::string printout(buffer_); + s = cf_qps_f_->Append(printout); + if (!s.ok()) { + return s; + } + } + } + } + qps_peak_ = qps_peak; for (int type = 0; type <= kTaTypeNum; type++) { if (duration == 0) { @@ -1010,7 +1107,7 @@ Status TraceAnalyzer::ReProcessing() { } // Make the statistics fo the key size distribution - if (FLAGS_print_key_distribution) { + if (FLAGS_output_key_distribution) { if (cfs_[cf_id].w_key_size_stats.find(input_key.size()) == cfs_[cf_id].w_key_size_stats.end()) { cfs_[cf_id].w_key_size_stats[input_key.size()] = 1; @@ -1129,6 +1226,11 @@ Status TraceAnalyzer::KeyStatsInsertion(const uint32_t& type, tmp_qps_map[prefix] = 1; ta_[type].stats[cf_id].a_qps_prefix_stats[time_in_sec] = tmp_qps_map; } + if (time_in_sec != cur_time_sec_) { + ta_[type].stats[cf_id].uni_key_num[cur_time_sec_] = + static_cast(ta_[type].stats[cf_id].a_key_stats.size()); + cur_time_sec_ = time_in_sec; + } } else { found_stats->second.a_count++; found_stats->second.a_key_size_sqsum += MultiplyCheckOverflow( @@ -1149,6 +1251,11 @@ Status TraceAnalyzer::KeyStatsInsertion(const uint32_t& type, s = StatsUnitCorrelationUpdate(found_key->second, type, ts, key); } } + if (time_in_sec != cur_time_sec_) { + found_stats->second.uni_key_num[cur_time_sec_] = + static_cast(found_stats->second.a_key_stats.size()); + cur_time_sec_ = time_in_sec; + } auto found_value = found_stats->second.a_value_size_stats.find(dist_value_size); @@ -1189,6 +1296,10 @@ Status TraceAnalyzer::KeyStatsInsertion(const uint32_t& type, cfs_[cf_id] = cf_unit; } + if (FLAGS_output_qps_stats) { + cfs_[cf_id].cf_qps[time_in_sec]++; + } + if (FLAGS_output_time_series) { TraceUnit trace_u; trace_u.type = type; @@ -1251,6 +1362,9 @@ Status TraceAnalyzer::OpenStatsOutputFiles(const std::string& type, if (FLAGS_output_key_stats) { s = CreateOutputFile(type, new_stats.cf_name, "accessed_key_stats.txt", &new_stats.a_key_f); + s = CreateOutputFile(type, new_stats.cf_name, + "accessed_unique_key_num_change.txt", + &new_stats.a_key_num_f); if (!FLAGS_key_space_dir.empty()) { s = CreateOutputFile(type, new_stats.cf_name, "whole_key_stats.txt", &new_stats.w_key_f); @@ -1289,6 +1403,12 @@ Status TraceAnalyzer::OpenStatsOutputFiles(const std::string& type, &new_stats.a_value_size_f); } + if (FLAGS_output_key_distribution) { + s = 
CreateOutputFile(type, new_stats.cf_name, + "accessed_key_size_distribution.txt", + &new_stats.a_key_size_f); + } + if (FLAGS_output_qps_stats) { s = CreateOutputFile(type, new_stats.cf_name, "qps_stats.txt", &new_stats.a_qps_f); @@ -1328,6 +1448,10 @@ void TraceAnalyzer::CloseOutputFiles() { stat.second.a_key_f->Close(); } + if (stat.second.a_key_num_f) { + stat.second.a_key_num_f->Close(); + } + if (stat.second.a_count_dist_f) { stat.second.a_count_dist_f->Close(); } @@ -1340,6 +1464,10 @@ void TraceAnalyzer::CloseOutputFiles() { stat.second.a_value_size_f->Close(); } + if (stat.second.a_key_size_f) { + stat.second.a_key_size_f->Close(); + } + if (stat.second.a_qps_f) { stat.second.a_qps_f->Close(); } @@ -1373,6 +1501,15 @@ Status TraceAnalyzer::HandleGet(uint32_t column_family_id, } } + if (ta_[TraceOperationType::kGet].sample_count >= sample_max_) { + ta_[TraceOperationType::kGet].sample_count = 0; + } + if (ta_[TraceOperationType::kGet].sample_count > 0) { + ta_[TraceOperationType::kGet].sample_count++; + return Status::OK(); + } + ta_[TraceOperationType::kGet].sample_count++; + if (!ta_[TraceOperationType::kGet].enabled) { return Status::OK(); } @@ -1400,6 +1537,15 @@ Status TraceAnalyzer::HandlePut(uint32_t column_family_id, const Slice& key, } } + if (ta_[TraceOperationType::kPut].sample_count >= sample_max_) { + ta_[TraceOperationType::kPut].sample_count = 0; + } + if (ta_[TraceOperationType::kPut].sample_count > 0) { + ta_[TraceOperationType::kPut].sample_count++; + return Status::OK(); + } + ta_[TraceOperationType::kPut].sample_count++; + if (!ta_[TraceOperationType::kPut].enabled) { return Status::OK(); } @@ -1424,6 +1570,15 @@ Status TraceAnalyzer::HandleDelete(uint32_t column_family_id, } } + if (ta_[TraceOperationType::kDelete].sample_count >= sample_max_) { + ta_[TraceOperationType::kDelete].sample_count = 0; + } + if (ta_[TraceOperationType::kDelete].sample_count > 0) { + ta_[TraceOperationType::kDelete].sample_count++; + return Status::OK(); + } + ta_[TraceOperationType::kDelete].sample_count++; + if (!ta_[TraceOperationType::kDelete].enabled) { return Status::OK(); } @@ -1448,6 +1603,15 @@ Status TraceAnalyzer::HandleSingleDelete(uint32_t column_family_id, } } + if (ta_[TraceOperationType::kSingleDelete].sample_count >= sample_max_) { + ta_[TraceOperationType::kSingleDelete].sample_count = 0; + } + if (ta_[TraceOperationType::kSingleDelete].sample_count > 0) { + ta_[TraceOperationType::kSingleDelete].sample_count++; + return Status::OK(); + } + ta_[TraceOperationType::kSingleDelete].sample_count++; + if (!ta_[TraceOperationType::kSingleDelete].enabled) { return Status::OK(); } @@ -1473,6 +1637,15 @@ Status TraceAnalyzer::HandleDeleteRange(uint32_t column_family_id, } } + if (ta_[TraceOperationType::kRangeDelete].sample_count >= sample_max_) { + ta_[TraceOperationType::kRangeDelete].sample_count = 0; + } + if (ta_[TraceOperationType::kRangeDelete].sample_count > 0) { + ta_[TraceOperationType::kRangeDelete].sample_count++; + return Status::OK(); + } + ta_[TraceOperationType::kRangeDelete].sample_count++; + if (!ta_[TraceOperationType::kRangeDelete].enabled) { return Status::OK(); } @@ -1499,6 +1672,15 @@ Status TraceAnalyzer::HandleMerge(uint32_t column_family_id, const Slice& key, } } + if (ta_[TraceOperationType::kMerge].sample_count >= sample_max_) { + ta_[TraceOperationType::kMerge].sample_count = 0; + } + if (ta_[TraceOperationType::kMerge].sample_count > 0) { + ta_[TraceOperationType::kMerge].sample_count++; + return Status::OK(); + } + 
ta_[TraceOperationType::kMerge].sample_count++; + if (!ta_[TraceOperationType::kMerge].enabled) { return Status::OK(); } @@ -1535,6 +1717,15 @@ Status TraceAnalyzer::HandleIter(uint32_t column_family_id, } } + if (ta_[type].sample_count >= sample_max_) { + ta_[type].sample_count = 0; + } + if (ta_[type].sample_count > 0) { + ta_[type].sample_count++; + return Status::OK(); + } + ta_[type].sample_count++; + if (!ta_[type].enabled) { return Status::OK(); } @@ -1596,6 +1787,8 @@ void TraceAnalyzer::PrintStatistics() { ta_[type].total_succ_access += stat.a_succ_count; printf("*********************************************************\n"); printf("colume family id: %u\n", stat.cf_id); + printf("Total number of queries to this cf by %s: %" PRIu64 "\n", + ta_[type].type_name.c_str(), stat.a_count); printf("Total unique keys in this cf: %" PRIu64 "\n", total_a_keys); printf("Average key size: %f key size medium: %" PRIu64 " Key size Variation: %f\n", @@ -1642,15 +1835,6 @@ void TraceAnalyzer::PrintStatistics() { } } - // print the key size distribution - if (FLAGS_print_key_distribution) { - printf("The key size distribution\n"); - for (auto& record : stat.a_key_size_stats) { - printf("key_size %" PRIu64 " nums: %" PRIu64 "\n", record.first, - record.second); - } - } - // print the operation correlations if (!FLAGS_print_correlation.empty()) { for (int correlation = 0; @@ -1700,6 +1884,8 @@ void TraceAnalyzer::PrintStatistics() { printf("Average QPS per second: %f Peak QPS: %u\n", qps_ave_[kTaTypeNum], qps_peak_[kTaTypeNum]); } + printf("The statistics related to query number need to times: %u\n", + sample_max_); printf("Total_requests: %" PRIu64 " Total_accessed_keys: %" PRIu64 " Total_gets: %" PRIu64 " Total_write_batch: %" PRIu64 "\n", total_requests_, total_access_keys_, total_gets_, total_writes_); @@ -1762,7 +1948,7 @@ int trace_analyzer_tool(int argc, char** argv) { } s = analyzer->StartProcessing(); - if (!s.ok()) { + if (!s.ok() && !FLAGS_try_process_corrupted_trace) { fprintf(stderr, "%s\n", s.getState()); fprintf(stderr, "Cannot processing the trace\n"); exit(1); diff --git a/ceph/src/rocksdb/tools/trace_analyzer_tool.h b/ceph/src/rocksdb/tools/trace_analyzer_tool.h index ac9f42f1c..be96f5005 100644 --- a/ceph/src/rocksdb/tools/trace_analyzer_tool.h +++ b/ceph/src/rocksdb/tools/trace_analyzer_tool.h @@ -115,12 +115,15 @@ struct TraceStats { top_k_qps_sec; std::list time_series; std::vector> correlation_output; + std::map uni_key_num; std::unique_ptr time_series_f; std::unique_ptr a_key_f; std::unique_ptr a_count_dist_f; std::unique_ptr a_prefix_cut_f; std::unique_ptr a_value_size_f; + std::unique_ptr a_key_size_f; + std::unique_ptr a_key_num_f; std::unique_ptr a_qps_f; std::unique_ptr a_top_qps_prefix_f; std::unique_ptr w_key_f; @@ -140,6 +143,7 @@ struct TypeUnit { uint64_t total_keys; uint64_t total_access; uint64_t total_succ_access; + uint32_t sample_count; std::map stats; TypeUnit() = default; ~TypeUnit() = default; @@ -155,6 +159,7 @@ struct CfUnit { uint64_t a_count; // the total keys in this cf that are accessed std::map w_key_size_stats; // whole key space key size // statistic this cf + std::map cf_qps; }; class TraceAnalyzer { @@ -204,11 +209,15 @@ class TraceAnalyzer { uint64_t total_access_keys_; uint64_t total_gets_; uint64_t total_writes_; + uint64_t trace_create_time_; uint64_t begin_time_; uint64_t end_time_; uint64_t time_series_start_; + uint32_t sample_max_; + uint32_t cur_time_sec_; std::unique_ptr trace_sequence_f_; // readable trace std::unique_ptr qps_f_; // 
overall qps + std::unique_ptr cf_qps_f_; // The qps of each CF> std::unique_ptr wkey_input_f_; std::vector ta_; // The main statistic collecting data structure std::map cfs_; // All the cf_id appears in this trace; diff --git a/ceph/src/rocksdb/util/auto_roll_logger_test.cc b/ceph/src/rocksdb/util/auto_roll_logger_test.cc index 5a6b3abc1..ab9e05958 100644 --- a/ceph/src/rocksdb/util/auto_roll_logger_test.cc +++ b/ceph/src/rocksdb/util/auto_roll_logger_test.cc @@ -28,13 +28,13 @@ namespace { class NoSleepEnv : public EnvWrapper { public: NoSleepEnv(Env* base) : EnvWrapper(base) {} - virtual void SleepForMicroseconds(int micros) override { + void SleepForMicroseconds(int micros) override { fake_time_ += static_cast(micros); } - virtual uint64_t NowNanos() override { return fake_time_ * 1000; } + uint64_t NowNanos() override { return fake_time_ * 1000; } - virtual uint64_t NowMicros() override { return fake_time_; } + uint64_t NowMicros() override { return fake_time_; } private: uint64_t fake_time_ = 6666666666; @@ -230,7 +230,7 @@ TEST_F(AutoRollLoggerTest, CompositeRollByTimeAndSizeLogger) { TEST_F(AutoRollLoggerTest, CreateLoggerFromOptions) { DBOptions options; NoSleepEnv nse(Env::Default()); - shared_ptr logger; + std::shared_ptr logger; // Normal logger ASSERT_OK(CreateLoggerFromOptions(kTestDir, options, &logger)); @@ -273,7 +273,7 @@ TEST_F(AutoRollLoggerTest, CreateLoggerFromOptions) { TEST_F(AutoRollLoggerTest, LogFlushWhileRolling) { DBOptions options; - shared_ptr logger; + std::shared_ptr logger; InitTestDb(); options.max_log_file_size = 1024 * 5; @@ -452,12 +452,12 @@ TEST_F(AutoRollLoggerTest, LogHeaderTest) { if (test_num == 0) { // Log some headers explicitly using Header() for (size_t i = 0; i < MAX_HEADERS; i++) { - Header(&logger, "%s %d", HEADER_STR.c_str(), i); + Header(&logger, "%s %" ROCKSDB_PRIszt, HEADER_STR.c_str(), i); } } else if (test_num == 1) { // HEADER_LEVEL should make this behave like calling Header() for (size_t i = 0; i < MAX_HEADERS; i++) { - ROCKS_LOG_HEADER(&logger, "%s %d", HEADER_STR.c_str(), i); + ROCKS_LOG_HEADER(&logger, "%s %" ROCKSDB_PRIszt, HEADER_STR.c_str(), i); } } diff --git a/ceph/src/rocksdb/util/autovector.h b/ceph/src/rocksdb/util/autovector.h index b5c847124..5843fa8a1 100644 --- a/ceph/src/rocksdb/util/autovector.h +++ b/ceph/src/rocksdb/util/autovector.h @@ -179,15 +179,16 @@ class autovector { typedef std::reverse_iterator reverse_iterator; typedef std::reverse_iterator const_reverse_iterator; - autovector() = default; + autovector() : values_(reinterpret_cast(buf_)) {} - autovector(std::initializer_list init_list) { + autovector(std::initializer_list init_list) + : values_(reinterpret_cast(buf_)) { for (const T& item : init_list) { push_back(item); } } - ~autovector() = default; + ~autovector() { clear(); } // -- Immutable operations // Indicate if all data resides in in-stack data structure. @@ -203,10 +204,18 @@ class autovector { void resize(size_type n) { if (n > kSize) { vect_.resize(n - kSize); + while (num_stack_items_ < kSize) { + new ((void*)(&values_[num_stack_items_++])) value_type(); + } num_stack_items_ = kSize; } else { vect_.clear(); - num_stack_items_ = n; + while (num_stack_items_ < n) { + new ((void*)(&values_[num_stack_items_++])) value_type(); + } + while (num_stack_items_ > n) { + values_[--num_stack_items_].~value_type(); + } } } @@ -214,12 +223,18 @@ class autovector { const_reference operator[](size_type n) const { assert(n < size()); - return n < kSize ? 
values_[n] : vect_[n - kSize]; + if (n < kSize) { + return values_[n]; + } + return vect_[n - kSize]; } reference operator[](size_type n) { assert(n < size()); - return n < kSize ? values_[n] : vect_[n - kSize]; + if (n < kSize) { + return values_[n]; + } + return vect_[n - kSize]; } const_reference at(size_type n) const { @@ -255,6 +270,7 @@ class autovector { // -- Mutable Operations void push_back(T&& item) { if (num_stack_items_ < kSize) { + new ((void*)(&values_[num_stack_items_])) value_type(); values_[num_stack_items_++] = std::move(item); } else { vect_.push_back(item); @@ -263,6 +279,7 @@ class autovector { void push_back(const T& item) { if (num_stack_items_ < kSize) { + new ((void*)(&values_[num_stack_items_])) value_type(); values_[num_stack_items_++] = item; } else { vect_.push_back(item); @@ -271,7 +288,12 @@ class autovector { template void emplace_back(Args&&... args) { - push_back(value_type(args...)); + if (num_stack_items_ < kSize) { + new ((void*)(&values_[num_stack_items_++])) + value_type(std::forward(args)...); + } else { + vect_.emplace_back(std::forward(args)...); + } } void pop_back() { @@ -279,12 +301,14 @@ class autovector { if (!vect_.empty()) { vect_.pop_back(); } else { - --num_stack_items_; + values_[--num_stack_items_].~value_type(); } } void clear() { - num_stack_items_ = 0; + while (num_stack_items_ > 0) { + values_[--num_stack_items_].~value_type(); + } vect_.clear(); } @@ -318,13 +342,17 @@ class autovector { private: size_type num_stack_items_ = 0; // current number of items - value_type values_[kSize]; // the first `kSize` items + alignas(alignof( + value_type)) char buf_[kSize * + sizeof(value_type)]; // the first `kSize` items + pointer values_; // used only if there are more than `kSize` items. std::vector vect_; }; template autovector& autovector::assign(const autovector& other) { + values_ = reinterpret_cast(buf_); // copy the internal vector vect_.assign(other.vect_.begin(), other.vect_.end()); diff --git a/ceph/src/rocksdb/util/bloom.cc b/ceph/src/rocksdb/util/bloom.cc index a20533341..9c05f7107 100644 --- a/ceph/src/rocksdb/util/bloom.cc +++ b/ceph/src/rocksdb/util/bloom.cc @@ -172,9 +172,9 @@ class FullFilterBitsReader : public FilterBitsReader { } } - ~FullFilterBitsReader() {} + ~FullFilterBitsReader() override {} - virtual bool MayMatch(const Slice& entry) override { + bool MayMatch(const Slice& entry) override { if (data_len_ <= 5) { // remain same with original filter return false; } @@ -274,15 +274,11 @@ class BloomFilterPolicy : public FilterPolicy { initialize(); } - ~BloomFilterPolicy() { - } + ~BloomFilterPolicy() override {} - virtual const char* Name() const override { - return "rocksdb.BuiltinBloomFilter"; - } + const char* Name() const override { return "rocksdb.BuiltinBloomFilter"; } - virtual void CreateFilter(const Slice* keys, int n, - std::string* dst) const override { + void CreateFilter(const Slice* keys, int n, std::string* dst) const override { // Compute bloom filter size (in both bits and bytes) size_t bits = n * bits_per_key_; @@ -310,8 +306,7 @@ class BloomFilterPolicy : public FilterPolicy { } } - virtual bool KeyMayMatch(const Slice& key, - const Slice& bloom_filter) const override { + bool KeyMayMatch(const Slice& key, const Slice& bloom_filter) const override { const size_t len = bloom_filter.size(); if (len < 2) return false; @@ -337,7 +332,7 @@ class BloomFilterPolicy : public FilterPolicy { return true; } - virtual FilterBitsBuilder* GetFilterBitsBuilder() const override { + FilterBitsBuilder* 
GetFilterBitsBuilder() const override { if (use_block_based_builder_) { return nullptr; } @@ -345,8 +340,7 @@ class BloomFilterPolicy : public FilterPolicy { return new FullFilterBitsBuilder(bits_per_key_, num_probes_); } - virtual FilterBitsReader* GetFilterBitsReader(const Slice& contents) - const override { + FilterBitsReader* GetFilterBitsReader(const Slice& contents) const override { return new FullFilterBitsReader(contents); } diff --git a/ceph/src/rocksdb/util/bloom_test.cc b/ceph/src/rocksdb/util/bloom_test.cc index bbf1d3ae9..4b25e9b6c 100644 --- a/ceph/src/rocksdb/util/bloom_test.cc +++ b/ceph/src/rocksdb/util/bloom_test.cc @@ -63,9 +63,7 @@ class BloomTest : public testing::Test { BloomTest() : policy_( NewBloomFilterPolicy(FLAGS_bits_per_key)) {} - ~BloomTest() { - delete policy_; - } + ~BloomTest() override { delete policy_; } void Reset() { keys_.clear(); @@ -192,9 +190,7 @@ class FullBloomTest : public testing::Test { Reset(); } - ~FullBloomTest() { - delete policy_; - } + ~FullBloomTest() override { delete policy_; } FullFilterBitsBuilder* GetFullFilterBitsBuilder() { return dynamic_cast(bits_builder_.get()); diff --git a/ceph/src/rocksdb/util/compaction_job_stats_impl.cc b/ceph/src/rocksdb/util/compaction_job_stats_impl.cc index 612af6f27..a1ebc8b96 100644 --- a/ceph/src/rocksdb/util/compaction_job_stats_impl.cc +++ b/ceph/src/rocksdb/util/compaction_job_stats_impl.cc @@ -11,6 +11,7 @@ namespace rocksdb { void CompactionJobStats::Reset() { elapsed_micros = 0; + cpu_micros = 0; num_input_records = 0; num_input_files = 0; @@ -45,6 +46,7 @@ void CompactionJobStats::Reset() { void CompactionJobStats::Add(const CompactionJobStats& stats) { elapsed_micros += stats.elapsed_micros; + cpu_micros += stats.cpu_micros; num_input_records += stats.num_input_records; num_input_files += stats.num_input_files; diff --git a/ceph/src/rocksdb/util/comparator.cc b/ceph/src/rocksdb/util/comparator.cc index c1a129639..b42c23725 100644 --- a/ceph/src/rocksdb/util/comparator.cc +++ b/ceph/src/rocksdb/util/comparator.cc @@ -22,20 +22,16 @@ class BytewiseComparatorImpl : public Comparator { public: BytewiseComparatorImpl() { } - virtual const char* Name() const override { - return "leveldb.BytewiseComparator"; - } + const char* Name() const override { return "leveldb.BytewiseComparator"; } - virtual int Compare(const Slice& a, const Slice& b) const override { + int Compare(const Slice& a, const Slice& b) const override { return a.compare(b); } - virtual bool Equal(const Slice& a, const Slice& b) const override { - return a == b; - } + bool Equal(const Slice& a, const Slice& b) const override { return a == b; } - virtual void FindShortestSeparator(std::string* start, - const Slice& limit) const override { + void FindShortestSeparator(std::string* start, + const Slice& limit) const override { // Find length of common prefix size_t min_length = std::min(start->size(), limit.size()); size_t diff_index = 0; @@ -85,7 +81,7 @@ class BytewiseComparatorImpl : public Comparator { } } - virtual void FindShortSuccessor(std::string* key) const override { + void FindShortSuccessor(std::string* key) const override { // Find first character that can be incremented size_t n = key->size(); for (size_t i = 0; i < n; i++) { @@ -99,8 +95,8 @@ class BytewiseComparatorImpl : public Comparator { // *key is a run of 0xffs. Leave it alone. 
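  // As a worked sketch of the rule above: scan for the first byte that is
  // not 0xff, increment it, and truncate the key just after it, so "abc"
  // becomes "b" and "\xff\xffq" becomes "\xff\xffr"; a key that is all
  // 0xff bytes has no shorter successor and is returned unchanged.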
} - virtual bool IsSameLengthImmediateSuccessor(const Slice& s, - const Slice& t) const override { + bool IsSameLengthImmediateSuccessor(const Slice& s, + const Slice& t) const override { if (s.size() != t.size() || s.size() == 0) { return false; } @@ -125,7 +121,7 @@ class BytewiseComparatorImpl : public Comparator { } } - virtual bool CanKeysWithDifferentByteContentsBeEqual() const override { + bool CanKeysWithDifferentByteContentsBeEqual() const override { return false; } }; @@ -134,11 +130,11 @@ class ReverseBytewiseComparatorImpl : public BytewiseComparatorImpl { public: ReverseBytewiseComparatorImpl() { } - virtual const char* Name() const override { + const char* Name() const override { return "rocksdb.ReverseBytewiseComparator"; } - virtual int Compare(const Slice& a, const Slice& b) const override { + int Compare(const Slice& a, const Slice& b) const override { return -a.compare(b); } @@ -193,7 +189,7 @@ class ReverseBytewiseComparatorImpl : public BytewiseComparatorImpl { // Don't do anything for simplicity. } - virtual bool CanKeysWithDifferentByteContentsBeEqual() const override { + bool CanKeysWithDifferentByteContentsBeEqual() const override { return false; } }; diff --git a/ceph/src/rocksdb/util/compression.h b/ceph/src/rocksdb/util/compression.h index e918e14fb..b901ceb35 100644 --- a/ceph/src/rocksdb/util/compression.h +++ b/ceph/src/rocksdb/util/compression.h @@ -11,11 +11,21 @@ #include #include +#ifdef ROCKSDB_MALLOC_USABLE_SIZE +#ifdef OS_FREEBSD +#include +#else // OS_FREEBSD +#include +#endif // OS_FREEBSD +#endif // ROCKSDB_MALLOC_USABLE_SIZE #include #include "rocksdb/options.h" +#include "rocksdb/table.h" #include "util/coding.h" #include "util/compression_context_cache.h" +#include "util/memory_allocator.h" +#include "util/string_util.h" #ifdef SNAPPY #include @@ -52,6 +62,16 @@ ZSTD_customMem GetJeZstdAllocationOverrides(); #endif // defined(ROCKSDB_JEMALLOC) && defined(OS_WIN) && // defined(ZSTD_STATIC_LINKING_ONLY) +// We require `ZSTD_sizeof_DDict` and `ZSTD_createDDict_byReference` to use +// `ZSTD_DDict`. The former was introduced in v1.0.0 and the latter was +// introduced in v1.1.3. But an important bug fix for `ZSTD_sizeof_DDict` came +// in v1.1.4, so that is the version we require. As of today's latest version +// (v1.3.8), they are both still in the experimental API, which means they are +// only exported when the compiler flag `ZSTD_STATIC_LINKING_ONLY` is set. +#if defined(ZSTD_STATIC_LINKING_ONLY) && ZSTD_VERSION_NUMBER >= 10104 +#define ROCKSDB_ZSTD_DDICT +#endif // defined(ZSTD_STATIC_LINKING_ONLY) && ZSTD_VERSION_NUMBER >= 10104 + // Cached data represents a portion that can be re-used // If, in the future we have more than one native context to // cache we can arrange this as a tuple @@ -133,16 +153,147 @@ class ZSTDUncompressCachedData { namespace rocksdb { -// Instantiate this class and pass it to the uncompression API below +// Holds dictionary and related data, like ZSTD's digested compression +// dictionary. 
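The CompressionDict type added below exists so the dictionary is digested once (via ZSTD_createCDict) and then reused for every block. A hedged sketch of how such a pre-digested dictionary is consumed, assuming zstd v0.7.0+ is linked (names other than the ZSTD_* API are illustrative):

#include <string>
#include <zstd.h>

// Compress one block with a pre-digested dictionary. The ZSTD_CDict is
// built once, e.g. ZSTD_createCDict(dict.data(), dict.size(), level),
// and shared across blocks, which is the point of the struct below.
std::string CompressWithCDictSketch(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict,
                                    const std::string& input) {
  std::string out(ZSTD_compressBound(input.size()), '\0');
  size_t n = ZSTD_compress_usingCDict(cctx, &out[0], out.size(),
                                      input.data(), input.size(), cdict);
  if (ZSTD_isError(n)) {
    return std::string();  // caller falls back to the raw dictionary path
  }
  out.resize(n);
  return out;
}

Digesting amortizes the dictionary-processing cost across all blocks in a file, which is why the struct below keeps the ZSTD_CDict alongside the raw bytes.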
+struct CompressionDict { +#if ZSTD_VERSION_NUMBER >= 700 + ZSTD_CDict* zstd_cdict_ = nullptr; +#endif // ZSTD_VERSION_NUMBER >= 700 + std::string dict_; + + public: +#if ZSTD_VERSION_NUMBER >= 700 + CompressionDict(std::string dict, CompressionType type, int level) { +#else // ZSTD_VERSION_NUMBER >= 700 + CompressionDict(std::string dict, CompressionType /*type*/, int /*level*/) { +#endif // ZSTD_VERSION_NUMBER >= 700 + dict_ = std::move(dict); +#if ZSTD_VERSION_NUMBER >= 700 + zstd_cdict_ = nullptr; + if (!dict_.empty() && (type == kZSTD || type == kZSTDNotFinalCompression)) { + if (level == CompressionOptions::kDefaultCompressionLevel) { + // 3 is the value of ZSTD_CLEVEL_DEFAULT (not exposed publicly), see + // https://github.com/facebook/zstd/issues/1148 + level = 3; + } + // Should be safe (but slower) if below call fails as we'll use the + // raw dictionary to compress. + zstd_cdict_ = ZSTD_createCDict(dict_.data(), dict_.size(), level); + assert(zstd_cdict_ != nullptr); + } +#endif // ZSTD_VERSION_NUMBER >= 700 + } + + ~CompressionDict() { +#if ZSTD_VERSION_NUMBER >= 700 + size_t res = 0; + if (zstd_cdict_ != nullptr) { + res = ZSTD_freeCDict(zstd_cdict_); + } + assert(res == 0); // Last I checked they can't fail + (void)res; // prevent unused var warning +#endif // ZSTD_VERSION_NUMBER >= 700 + } + +#if ZSTD_VERSION_NUMBER >= 700 + const ZSTD_CDict* GetDigestedZstdCDict() const { return zstd_cdict_; } +#endif // ZSTD_VERSION_NUMBER >= 700 + + Slice GetRawDict() const { return dict_; } + + static const CompressionDict& GetEmptyDict() { + static CompressionDict empty_dict{}; + return empty_dict; + } + + CompressionDict() = default; + // Disable copy/move + CompressionDict(const CompressionDict&) = delete; + CompressionDict& operator=(const CompressionDict&) = delete; + CompressionDict(CompressionDict&&) = delete; + CompressionDict& operator=(CompressionDict&&) = delete; +}; + +// Holds dictionary and related data, like ZSTD's digested uncompression +// dictionary. +struct UncompressionDict { +#ifdef ROCKSDB_ZSTD_DDICT + ZSTD_DDict* zstd_ddict_; +#endif // ROCKSDB_ZSTD_DDICT + // Block containing the data for the compression dictionary. It may be + // redundant with the data held in `zstd_ddict_`. + std::string dict_; + // This `Statistics` pointer is intended to be used upon block cache eviction, + // so only needs to be populated on `UncompressionDict`s that'll be inserted + // into block cache. 
+ Statistics* statistics_; + +#ifdef ROCKSDB_ZSTD_DDICT + UncompressionDict(std::string dict, bool using_zstd, + Statistics* _statistics = nullptr) { +#else // ROCKSDB_ZSTD_DDICT + UncompressionDict(std::string dict, bool /*using_zstd*/, + Statistics* _statistics = nullptr) { +#endif // ROCKSDB_ZSTD_DDICT + dict_ = std::move(dict); + statistics_ = _statistics; +#ifdef ROCKSDB_ZSTD_DDICT + zstd_ddict_ = nullptr; + if (!dict_.empty() && using_zstd) { + zstd_ddict_ = ZSTD_createDDict_byReference(dict_.data(), dict_.size()); + assert(zstd_ddict_ != nullptr); + } +#endif // ROCKSDB_ZSTD_DDICT + } + + ~UncompressionDict() { +#ifdef ROCKSDB_ZSTD_DDICT + size_t res = 0; + if (zstd_ddict_ != nullptr) { + res = ZSTD_freeDDict(zstd_ddict_); + } + assert(res == 0); // Last I checked they can't fail + (void)res; // prevent unused var warning +#endif // ROCKSDB_ZSTD_DDICT + } + +#ifdef ROCKSDB_ZSTD_DDICT + const ZSTD_DDict* GetDigestedZstdDDict() const { return zstd_ddict_; } +#endif // ROCKSDB_ZSTD_DDICT + + Slice GetRawDict() const { return dict_; } + + static const UncompressionDict& GetEmptyDict() { + static UncompressionDict empty_dict{}; + return empty_dict; + } + + Statistics* statistics() const { return statistics_; } + + size_t ApproximateMemoryUsage() { + size_t usage = 0; + usage += sizeof(struct UncompressionDict); +#ifdef ROCKSDB_ZSTD_DDICT + usage += ZSTD_sizeof_DDict(zstd_ddict_); +#endif // ROCKSDB_ZSTD_DDICT + usage += dict_.size(); + return usage; + } + + UncompressionDict() = default; + // Disable copy/move + UncompressionDict(const CompressionDict&) = delete; + UncompressionDict& operator=(const CompressionDict&) = delete; + UncompressionDict(CompressionDict&&) = delete; + UncompressionDict& operator=(CompressionDict&&) = delete; +}; + class CompressionContext { private: - const CompressionType type_; - const CompressionOptions opts_; - Slice dict_; #if defined(ZSTD) && (ZSTD_VERSION_NUMBER >= 500) ZSTD_CCtx* zstd_ctx_ = nullptr; - void CreateNativeContext() { - if (type_ == kZSTD || type_ == kZSTDNotFinalCompression) { + void CreateNativeContext(CompressionType type) { + if (type == kZSTD || type == kZSTDNotFinalCompression) { #ifdef ROCKSDB_ZSTD_CUSTOM_MEM zstd_ctx_ = ZSTD_createCCtx_advanced(port::GetJeZstdAllocationOverrides()); @@ -160,57 +311,67 @@ class CompressionContext { public: // callable inside ZSTD_Compress ZSTD_CCtx* ZSTDPreallocCtx() const { - assert(type_ == kZSTD || type_ == kZSTDNotFinalCompression); + assert(zstd_ctx_ != nullptr); return zstd_ctx_; } + #else // ZSTD && (ZSTD_VERSION_NUMBER >= 500) private: - void CreateNativeContext() {} + void CreateNativeContext(CompressionType /* type */) {} void DestroyNativeContext() {} #endif // ZSTD && (ZSTD_VERSION_NUMBER >= 500) public: - explicit CompressionContext(CompressionType comp_type) : type_(comp_type) { - CreateNativeContext(); - } - CompressionContext(CompressionType comp_type, const CompressionOptions& opts, - const Slice& comp_dict = Slice()) - : type_(comp_type), opts_(opts), dict_(comp_dict) { - CreateNativeContext(); + explicit CompressionContext(CompressionType type) { + CreateNativeContext(type); } ~CompressionContext() { DestroyNativeContext(); } CompressionContext(const CompressionContext&) = delete; CompressionContext& operator=(const CompressionContext&) = delete; +}; + +class CompressionInfo { + const CompressionOptions& opts_; + const CompressionContext& context_; + const CompressionDict& dict_; + const CompressionType type_; + const uint64_t sample_for_compression_; + + public: + 
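+  // CompressionInfo is a read-only bundle: the options, the per-call native
+  // context, the (possibly digested) dictionary, the compression type, and
+  // the sampling rate travel together, so the *_Compress helpers further
+  // below take a single argument instead of five.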
CompressionInfo(const CompressionOptions& _opts, + const CompressionContext& _context, + const CompressionDict& _dict, CompressionType _type, + uint64_t _sample_for_compression) + : opts_(_opts), + context_(_context), + dict_(_dict), + type_(_type), + sample_for_compression_(_sample_for_compression) {} const CompressionOptions& options() const { return opts_; } + const CompressionContext& context() const { return context_; } + const CompressionDict& dict() const { return dict_; } CompressionType type() const { return type_; } - const Slice& dict() const { return dict_; } - Slice& dict() { return dict_; } + uint64_t SampleForCompression() const { return sample_for_compression_; } }; -// Instantiate this class and pass it to the uncompression API below class UncompressionContext { private: - CompressionType type_; - Slice dict_; CompressionContextCache* ctx_cache_ = nullptr; ZSTDUncompressCachedData uncomp_cached_data_; public: struct NoCache {}; // Do not use context cache, used by TableBuilder - UncompressionContext(NoCache, CompressionType comp_type) : type_(comp_type) {} - explicit UncompressionContext(CompressionType comp_type) - : UncompressionContext(comp_type, Slice()) {} - UncompressionContext(CompressionType comp_type, const Slice& comp_dict) - : type_(comp_type), dict_(comp_dict) { - if (type_ == kZSTD || type_ == kZSTDNotFinalCompression) { + UncompressionContext(NoCache, CompressionType /* type */) {} + + explicit UncompressionContext(CompressionType type) { + if (type == kZSTD || type == kZSTDNotFinalCompression) { ctx_cache_ = CompressionContextCache::Instance(); uncomp_cached_data_ = ctx_cache_->GetCachedZSTDUncompressData(); } } ~UncompressionContext() { - if ((type_ == kZSTD || type_ == kZSTDNotFinalCompression) && - uncomp_cached_data_.GetCacheIndex() != -1) { + if (uncomp_cached_data_.GetCacheIndex() != -1) { assert(ctx_cache_ != nullptr); ctx_cache_->ReturnCachedZSTDUncompressData( uncomp_cached_data_.GetCacheIndex()); @@ -222,9 +383,21 @@ class UncompressionContext { ZSTDUncompressCachedData::ZSTDNativeContext GetZSTDContext() const { return uncomp_cached_data_.Get(); } +}; + +class UncompressionInfo { + const UncompressionContext& context_; + const UncompressionDict& dict_; + const CompressionType type_; + + public: + UncompressionInfo(const UncompressionContext& _context, + const UncompressionDict& _dict, CompressionType _type) + : context_(_context), dict_(_dict), type_(_type) {} + + const UncompressionContext& context() const { return context_; } + const UncompressionDict& dict() const { return dict_; } CompressionType type() const { return type_; } - const Slice& dict() const { return dict_; } - Slice& dict() { return dict_; } }; inline bool Snappy_Supported() { @@ -336,6 +509,31 @@ inline std::string CompressionTypeToString(CompressionType compression_type) { } } +inline std::string CompressionOptionsToString( + CompressionOptions& compression_options) { + std::string result; + result.reserve(512); + result.append("window_bits=") + .append(ToString(compression_options.window_bits)) + .append("; "); + result.append("level=") + .append(ToString(compression_options.level)) + .append("; "); + result.append("strategy=") + .append(ToString(compression_options.strategy)) + .append("; "); + result.append("max_dict_bytes=") + .append(ToString(compression_options.max_dict_bytes)) + .append("; "); + result.append("zstd_max_train_bytes=") + .append(ToString(compression_options.zstd_max_train_bytes)) + .append("; "); + result.append("enabled=") + 
.append(ToString(compression_options.enabled)) + .append("; "); + return result; +} + // compress_format_version can have two values: // 1 -- decompressed sizes for BZip2 and Zlib are not included in the compressed // block. Also, decompressed sizes for LZ4 are encoded in platform-dependent @@ -343,9 +541,8 @@ inline std::string CompressionTypeToString(CompressionType compression_type) { // 2 -- Zlib, BZip2 and LZ4 encode decompressed size as Varint32 just before the // start of compressed block. Snappy format is the same as version 1. -inline bool Snappy_Compress(const CompressionContext& /*ctx*/, - const char* input, size_t length, - ::std::string* output) { +inline bool Snappy_Compress(const CompressionInfo& /*info*/, const char* input, + size_t length, ::std::string* output) { #ifdef SNAPPY output->resize(snappy::MaxCompressedLength(length)); size_t outlen; @@ -410,7 +607,7 @@ inline bool GetDecompressedSizeInfo(const char** input_data, // header in varint32 format // @param compression_dict Data for presetting the compression library's // dictionary. -inline bool Zlib_Compress(const CompressionContext& ctx, +inline bool Zlib_Compress(const CompressionInfo& info, uint32_t compress_format_version, const char* input, size_t length, ::std::string* output) { #ifdef ZLIB @@ -435,24 +632,25 @@ inline bool Zlib_Compress(const CompressionContext& ctx, // The default value is 8. See zconf.h for more details. static const int memLevel = 8; int level; - if (ctx.options().level == CompressionOptions::kDefaultCompressionLevel) { + if (info.options().level == CompressionOptions::kDefaultCompressionLevel) { level = Z_DEFAULT_COMPRESSION; } else { - level = ctx.options().level; + level = info.options().level; } z_stream _stream; memset(&_stream, 0, sizeof(z_stream)); - int st = deflateInit2(&_stream, level, Z_DEFLATED, ctx.options().window_bits, - memLevel, ctx.options().strategy); + int st = deflateInit2(&_stream, level, Z_DEFLATED, info.options().window_bits, + memLevel, info.options().strategy); if (st != Z_OK) { return false; } - if (ctx.dict().size()) { + Slice compression_dict = info.dict().GetRawDict(); + if (compression_dict.size()) { // Initialize the compression library's dictionary - st = deflateSetDictionary(&_stream, - reinterpret_cast(ctx.dict().data()), - static_cast(ctx.dict().size())); + st = deflateSetDictionary( + &_stream, reinterpret_cast(compression_dict.data()), + static_cast(compression_dict.size())); if (st != Z_OK) { deflateEnd(&_stream); return false; @@ -480,7 +678,7 @@ inline bool Zlib_Compress(const CompressionContext& ctx, deflateEnd(&_stream); return compressed; #else - (void)ctx; + (void)info; (void)compress_format_version; (void)input; (void)length; @@ -495,11 +693,10 @@ inline bool Zlib_Compress(const CompressionContext& ctx, // header in varint32 format // @param compression_dict Data for presetting the compression library's // dictionary. 
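The preset-dictionary handshake used by Zlib_Compress above must be mirrored on both sides: deflateSetDictionary before compressing, inflateSetDictionary when inflate asks for it. A simplified one-shot sketch of the compression side, using default zlib window settings rather than the raw deflate with negative windowBits used in this file:

#include <cstring>
#include <string>
#include <zlib.h>

// One-shot deflate with a preset dictionary; Zlib_Compress above does the
// same dance via deflateInit2 with custom window bits and memLevel.
bool DeflateWithDictSketch(const std::string& dict, const std::string& in,
                           std::string* out) {
  z_stream s;
  memset(&s, 0, sizeof(s));
  if (deflateInit(&s, Z_DEFAULT_COMPRESSION) != Z_OK) {
    return false;
  }
  deflateSetDictionary(&s, reinterpret_cast<const Bytef*>(dict.data()),
                       static_cast<uInt>(dict.size()));
  out->resize(deflateBound(&s, static_cast<uLong>(in.size())));
  s.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(in.data()));
  s.avail_in = static_cast<uInt>(in.size());
  s.next_out = reinterpret_cast<Bytef*>(&(*out)[0]);
  s.avail_out = static_cast<uInt>(out->size());
  const bool ok = (deflate(&s, Z_FINISH) == Z_STREAM_END);
  out->resize(out->size() - s.avail_out);
  deflateEnd(&s);
  return ok;
}

The decompression side, shown in the function below, loads the identical bytes with inflateSetDictionary.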
-inline char* Zlib_Uncompress(const UncompressionContext& ctx, - const char* input_data, size_t input_length, - int* decompress_size, - uint32_t compress_format_version, - int windowBits = -14) { +inline CacheAllocationPtr Zlib_Uncompress( + const UncompressionInfo& info, const char* input_data, size_t input_length, + int* decompress_size, uint32_t compress_format_version, + MemoryAllocator* allocator = nullptr, int windowBits = -14) { #ifdef ZLIB uint32_t output_len = 0; if (compress_format_version == 2) { @@ -528,11 +725,12 @@ inline char* Zlib_Uncompress(const UncompressionContext& ctx, return nullptr; } - if (ctx.dict().size()) { + Slice compression_dict = info.dict().GetRawDict(); + if (compression_dict.size()) { // Initialize the compression library's dictionary - st = inflateSetDictionary(&_stream, - reinterpret_cast(ctx.dict().data()), - static_cast(ctx.dict().size())); + st = inflateSetDictionary( + &_stream, reinterpret_cast(compression_dict.data()), + static_cast(compression_dict.size())); if (st != Z_OK) { return nullptr; } @@ -541,9 +739,9 @@ inline char* Zlib_Uncompress(const UncompressionContext& ctx, _stream.next_in = (Bytef*)input_data; _stream.avail_in = static_cast(input_length); - char* output = new char[output_len]; + auto output = AllocateBlock(output_len, allocator); - _stream.next_out = (Bytef*)output; + _stream.next_out = (Bytef*)output.get(); _stream.avail_out = static_cast(output_len); bool done = false; @@ -561,19 +759,17 @@ inline char* Zlib_Uncompress(const UncompressionContext& ctx, size_t old_sz = output_len; uint32_t output_len_delta = output_len / 5; output_len += output_len_delta < 10 ? 10 : output_len_delta; - char* tmp = new char[output_len]; - memcpy(tmp, output, old_sz); - delete[] output; - output = tmp; + auto tmp = AllocateBlock(output_len, allocator); + memcpy(tmp.get(), output.get(), old_sz); + output = std::move(tmp); // Set more output. 
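 // (The pair of lines below retargets next_out at the regrown buffer: with
 // the output now held in a smart pointer, the write cursor is computed from
 // output.get() + old_sz instead of pointer arithmetic on a raw char*.)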
- _stream.next_out = (Bytef*)(output + old_sz); + _stream.next_out = (Bytef*)(output.get() + old_sz); _stream.avail_out = static_cast(output_len - old_sz); break; } case Z_BUF_ERROR: default: - delete[] output; inflateEnd(&_stream); return nullptr; } @@ -585,11 +781,12 @@ inline char* Zlib_Uncompress(const UncompressionContext& ctx, inflateEnd(&_stream); return output; #else - (void)ctx; + (void)info; (void)input_data; (void)input_length; (void)decompress_size; (void)compress_format_version; + (void)allocator; (void)windowBits; return nullptr; #endif @@ -599,7 +796,7 @@ inline char* Zlib_Uncompress(const UncompressionContext& ctx, // block header // compress_format_version == 2 -- decompressed size is included in the block // header in varint32 format -inline bool BZip2_Compress(const CompressionContext& /*ctx*/, +inline bool BZip2_Compress(const CompressionInfo& /*info*/, uint32_t compress_format_version, const char* input, size_t length, ::std::string* output) { #ifdef BZIP2 @@ -660,9 +857,9 @@ inline bool BZip2_Compress(const CompressionContext& /*ctx*/, // block header // compress_format_version == 2 -- decompressed size is included in the block // header in varint32 format -inline char* BZip2_Uncompress(const char* input_data, size_t input_length, - int* decompress_size, - uint32_t compress_format_version) { +inline CacheAllocationPtr BZip2_Uncompress( + const char* input_data, size_t input_length, int* decompress_size, + uint32_t compress_format_version, MemoryAllocator* allocator = nullptr) { #ifdef BZIP2 uint32_t output_len = 0; if (compress_format_version == 2) { @@ -690,9 +887,9 @@ inline char* BZip2_Uncompress(const char* input_data, size_t input_length, _stream.next_in = (char*)input_data; _stream.avail_in = static_cast(input_length); - char* output = new char[output_len]; + auto output = AllocateBlock(output_len, allocator); - _stream.next_out = (char*)output; + _stream.next_out = (char*)output.get(); _stream.avail_out = static_cast(output_len); bool done = false; @@ -709,18 +906,16 @@ inline char* BZip2_Uncompress(const char* input_data, size_t input_length, assert(compress_format_version != 2); uint32_t old_sz = output_len; output_len = output_len * 1.2; - char* tmp = new char[output_len]; - memcpy(tmp, output, old_sz); - delete[] output; - output = tmp; + auto tmp = AllocateBlock(output_len, allocator); + memcpy(tmp.get(), output.get(), old_sz); + output = std::move(tmp); // Set more output. - _stream.next_out = (char*)(output + old_sz); + _stream.next_out = (char*)(output.get() + old_sz); _stream.avail_out = static_cast(output_len - old_sz); break; } default: - delete[] output; BZ2_bzDecompressEnd(&_stream); return nullptr; } @@ -736,6 +931,7 @@ inline char* BZip2_Uncompress(const char* input_data, size_t input_length, (void)input_length; (void)decompress_size; (void)compress_format_version; + (void)allocator; return nullptr; #endif } @@ -746,7 +942,7 @@ inline char* BZip2_Uncompress(const char* input_data, size_t input_length, // header in varint32 format // @param compression_dict Data for presetting the compression library's // dictionary. 
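The hunks above replace raw new[]/delete[] output buffers with a CacheAllocationPtr obtained from AllocateBlock, so a custom MemoryAllocator can own decompressed blocks destined for the block cache. A simplified model of that pattern; SketchAllocator and the other names here are illustrative stand-ins, not RocksDB types:

#include <cstddef>
#include <functional>
#include <memory>

// Illustrative stand-in for MemoryAllocator.
struct SketchAllocator {
  virtual ~SketchAllocator() {}
  virtual char* Allocate(size_t n) = 0;
  virtual void Deallocate(char* p) = 0;
};

// A pointer that remembers how to free itself, like CacheAllocationPtr.
using SketchAllocationPtr =
    std::unique_ptr<char[], std::function<void(char*)>>;

SketchAllocationPtr AllocateBlockSketch(size_t size, SketchAllocator* a) {
  if (a == nullptr) {
    // No custom allocator configured: plain new[], freed with delete[],
    // mirroring AllocateBlock(size, nullptr).
    return SketchAllocationPtr(new char[size], [](char* p) { delete[] p; });
  }
  return SketchAllocationPtr(a->Allocate(size),
                             [a](char* p) { a->Deallocate(p); });
}

Growing the buffer then becomes allocate-copy-move, exactly the shape of the Zlib and BZip2 hunks above: auto tmp = AllocateBlockSketch(new_len, a); memcpy(tmp.get(), out.get(), old_len); out = std::move(tmp);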
-inline bool LZ4_Compress(const CompressionContext& ctx, +inline bool LZ4_Compress(const CompressionInfo& info, uint32_t compress_format_version, const char* input, size_t length, ::std::string* output) { #ifdef LZ4 @@ -774,9 +970,10 @@ inline bool LZ4_Compress(const CompressionContext& ctx, int outlen; #if LZ4_VERSION_NUMBER >= 10400 // r124+ LZ4_stream_t* stream = LZ4_createStream(); - if (ctx.dict().size()) { - LZ4_loadDict(stream, ctx.dict().data(), - static_cast(ctx.dict().size())); + Slice compression_dict = info.dict().GetRawDict(); + if (compression_dict.size()) { + LZ4_loadDict(stream, compression_dict.data(), + static_cast(compression_dict.size())); } #if LZ4_VERSION_NUMBER >= 10700 // r129+ outlen = @@ -791,6 +988,7 @@ inline bool LZ4_Compress(const CompressionContext& ctx, #else // up to r123 outlen = LZ4_compress_limitedOutput(input, &(*output)[output_header_len], static_cast(length), compress_bound); + (void)ctx; #endif // LZ4_VERSION_NUMBER >= 10400 if (outlen == 0) { @@ -799,7 +997,7 @@ inline bool LZ4_Compress(const CompressionContext& ctx, output->resize(static_cast(output_header_len + outlen)); return true; #else // LZ4 - (void)ctx; + (void)info; (void)compress_format_version; (void)input; (void)length; @@ -814,10 +1012,12 @@ inline bool LZ4_Compress(const CompressionContext& ctx, // header in varint32 format // @param compression_dict Data for presetting the compression library's // dictionary. -inline char* LZ4_Uncompress(const UncompressionContext& ctx, - const char* input_data, size_t input_length, - int* decompress_size, - uint32_t compress_format_version) { +inline CacheAllocationPtr LZ4_Uncompress(const UncompressionInfo& info, + const char* input_data, + size_t input_length, + int* decompress_size, + uint32_t compress_format_version, + MemoryAllocator* allocator = nullptr) { #ifdef LZ4 uint32_t output_len = 0; if (compress_format_version == 2) { @@ -837,35 +1037,37 @@ inline char* LZ4_Uncompress(const UncompressionContext& ctx, input_data += 8; } - char* output = new char[output_len]; + auto output = AllocateBlock(output_len, allocator); #if LZ4_VERSION_NUMBER >= 10400 // r124+ LZ4_streamDecode_t* stream = LZ4_createStreamDecode(); - if (ctx.dict().size()) { - LZ4_setStreamDecode(stream, ctx.dict().data(), - static_cast(ctx.dict().size())); + Slice compression_dict = info.dict().GetRawDict(); + if (compression_dict.size()) { + LZ4_setStreamDecode(stream, compression_dict.data(), + static_cast(compression_dict.size())); } *decompress_size = LZ4_decompress_safe_continue( - stream, input_data, output, static_cast(input_length), + stream, input_data, output.get(), static_cast(input_length), static_cast(output_len)); LZ4_freeStreamDecode(stream); #else // up to r123 - *decompress_size = - LZ4_decompress_safe(input_data, output, static_cast(input_length), - static_cast(output_len)); + *decompress_size = LZ4_decompress_safe(input_data, output.get(), + static_cast(input_length), + static_cast(output_len)); + (void)ctx; #endif // LZ4_VERSION_NUMBER >= 10400 if (*decompress_size < 0) { - delete[] output; return nullptr; } assert(*decompress_size == static_cast(output_len)); return output; #else // LZ4 - (void)ctx; + (void)info; (void)input_data; (void)input_length; (void)decompress_size; (void)compress_format_version; + (void)allocator; return nullptr; #endif } @@ -876,7 +1078,7 @@ inline char* LZ4_Uncompress(const UncompressionContext& ctx, // header in varint32 format // @param compression_dict Data for presetting the compression library's // dictionary. 
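LZ4 needs the same two-sided discipline: the compressor primes an LZ4_stream_t with LZ4_loadDict, and the decompressor must prime an LZ4_streamDecode_t with identical bytes. A minimal decompression-side sketch against the r124+ streaming API assumed by the hunks above:

#include <lz4.h>
#include <string>

// Decompress a block produced by an LZ4 stream that was primed with
// LZ4_loadDict(dict) on the compression side, mirroring LZ4_Uncompress.
int DecompressWithDictSketch(const std::string& dict, const char* in,
                             int in_len, char* out, int out_cap) {
  LZ4_streamDecode_t* stream = LZ4_createStreamDecode();
  LZ4_setStreamDecode(stream, dict.data(), static_cast<int>(dict.size()));
  const int n =
      LZ4_decompress_safe_continue(stream, in, out, in_len, out_cap);
  LZ4_freeStreamDecode(stream);
  return n;  // negative on corrupt input, else the decompressed length
}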
-inline bool LZ4HC_Compress(const CompressionContext& ctx, +inline bool LZ4HC_Compress(const CompressionInfo& info, uint32_t compress_format_version, const char* input, size_t length, ::std::string* output) { #ifdef LZ4 @@ -903,17 +1105,18 @@ inline bool LZ4HC_Compress(const CompressionContext& ctx, int outlen; int level; - if (ctx.options().level == CompressionOptions::kDefaultCompressionLevel) { + if (info.options().level == CompressionOptions::kDefaultCompressionLevel) { level = 0; // lz4hc.h says any value < 1 will be sanitized to default } else { - level = ctx.options().level; + level = info.options().level; } #if LZ4_VERSION_NUMBER >= 10400 // r124+ LZ4_streamHC_t* stream = LZ4_createStreamHC(); LZ4_resetStreamHC(stream, level); + Slice compression_dict = info.dict().GetRawDict(); const char* compression_dict_data = - ctx.dict().size() > 0 ? ctx.dict().data() : nullptr; - size_t compression_dict_size = ctx.dict().size(); + compression_dict.size() > 0 ? compression_dict.data() : nullptr; + size_t compression_dict_size = compression_dict.size(); LZ4_loadDictHC(stream, compression_dict_data, static_cast(compression_dict_size)); @@ -944,7 +1147,7 @@ inline bool LZ4HC_Compress(const CompressionContext& ctx, output->resize(static_cast(output_header_len + outlen)); return true; #else // LZ4 - (void)ctx; + (void)info; (void)compress_format_version; (void)input; (void)length; @@ -978,9 +1181,7 @@ inline char* XPRESS_Uncompress(const char* /*input_data*/, } #endif -// @param compression_dict Data for presetting the compression library's -// dictionary. -inline bool ZSTD_Compress(const CompressionContext& ctx, const char* input, +inline bool ZSTD_Compress(const CompressionInfo& info, const char* input, size_t length, ::std::string* output) { #ifdef ZSTD if (length > std::numeric_limits::max()) { @@ -995,19 +1196,29 @@ inline bool ZSTD_Compress(const CompressionContext& ctx, const char* input, output->resize(static_cast(output_header_len + compressBound)); size_t outlen = 0; int level; - if (ctx.options().level == CompressionOptions::kDefaultCompressionLevel) { + if (info.options().level == CompressionOptions::kDefaultCompressionLevel) { // 3 is the value of ZSTD_CLEVEL_DEFAULT (not exposed publicly), see // https://github.com/facebook/zstd/issues/1148 level = 3; } else { - level = ctx.options().level; + level = info.options().level; } #if ZSTD_VERSION_NUMBER >= 500 // v0.5.0+ - ZSTD_CCtx* context = ctx.ZSTDPreallocCtx(); + ZSTD_CCtx* context = info.context().ZSTDPreallocCtx(); assert(context != nullptr); - outlen = ZSTD_compress_usingDict(context, &(*output)[output_header_len], - compressBound, input, length, - ctx.dict().data(), ctx.dict().size(), level); +#if ZSTD_VERSION_NUMBER >= 700 // v0.7.0+ + if (info.dict().GetDigestedZstdCDict() != nullptr) { + outlen = ZSTD_compress_usingCDict(context, &(*output)[output_header_len], + compressBound, input, length, + info.dict().GetDigestedZstdCDict()); + } +#endif // ZSTD_VERSION_NUMBER >= 700 + if (outlen == 0) { + outlen = ZSTD_compress_usingDict(context, &(*output)[output_header_len], + compressBound, input, length, + info.dict().GetRawDict().data(), + info.dict().GetRawDict().size(), level); + } #else // up to v0.4.x outlen = ZSTD_compress(&(*output)[output_header_len], compressBound, input, length, level); @@ -1018,7 +1229,7 @@ inline bool ZSTD_Compress(const CompressionContext& ctx, const char* input, output->resize(output_header_len + outlen); return true; #else // ZSTD - (void)ctx; + (void)info; (void)input; (void)length; (void)output; @@ 
-1028,9 +1239,9 @@ inline bool ZSTD_Compress(const CompressionContext& ctx, const char* input, // @param compression_dict Data for presetting the compression library's // dictionary. -inline char* ZSTD_Uncompress(const UncompressionContext& ctx, - const char* input_data, size_t input_length, - int* decompress_size) { +inline CacheAllocationPtr ZSTD_Uncompress( + const UncompressionInfo& info, const char* input_data, size_t input_length, + int* decompress_size, MemoryAllocator* allocator = nullptr) { #ifdef ZSTD uint32_t output_len = 0; if (!compression::GetDecompressedSizeInfo(&input_data, &input_length, @@ -1038,30 +1249,52 @@ inline char* ZSTD_Uncompress(const UncompressionContext& ctx, return nullptr; } - char* output = new char[output_len]; - size_t actual_output_length; + auto output = AllocateBlock(output_len, allocator); + size_t actual_output_length = 0; #if ZSTD_VERSION_NUMBER >= 500 // v0.5.0+ - ZSTD_DCtx* context = ctx.GetZSTDContext(); + ZSTD_DCtx* context = info.context().GetZSTDContext(); assert(context != nullptr); - actual_output_length = ZSTD_decompress_usingDict( - context, output, output_len, input_data, input_length, ctx.dict().data(), - ctx.dict().size()); +#ifdef ROCKSDB_ZSTD_DDICT + if (info.dict().GetDigestedZstdDDict() != nullptr) { + actual_output_length = ZSTD_decompress_usingDDict( + context, output.get(), output_len, input_data, input_length, + info.dict().GetDigestedZstdDDict()); + } +#endif // ROCKSDB_ZSTD_DDICT + if (actual_output_length == 0) { + actual_output_length = ZSTD_decompress_usingDict( + context, output.get(), output_len, input_data, input_length, + info.dict().GetRawDict().data(), info.dict().GetRawDict().size()); + } #else // up to v0.4.x + (void)info; actual_output_length = - ZSTD_decompress(output, output_len, input_data, input_length); + ZSTD_decompress(output.get(), output_len, input_data, input_length); #endif // ZSTD_VERSION_NUMBER >= 500 assert(actual_output_length == output_len); *decompress_size = static_cast(actual_output_length); return output; #else // ZSTD - (void)ctx; + (void)info; (void)input_data; (void)input_length; (void)decompress_size; + (void)allocator; return nullptr; #endif } +inline bool ZSTD_TrainDictionarySupported() { +#ifdef ZSTD + // Dictionary trainer is available since v0.6.1 for static linking, but not + // available for dynamic linking until v1.1.3. For now we enable the feature + // in v1.1.3+ only. + return (ZSTD_versionNumber() >= 10103); +#else + return false; +#endif +} + inline std::string ZSTD_TrainDictionary(const std::string& samples, const std::vector& sample_lens, size_t max_dict_bytes) { @@ -1069,6 +1302,10 @@ inline std::string ZSTD_TrainDictionary(const std::string& samples, // available for dynamic linking until v1.1.3. For now we enable the feature // in v1.1.3+ only. #if ZSTD_VERSION_NUMBER >= 10103 // v1.1.3+ + assert(samples.empty() == sample_lens.empty()); + if (samples.empty()) { + return ""; + } std::string dict_data(max_dict_bytes, '\0'); size_t dict_len = ZDICT_trainFromBuffer( &dict_data[0], max_dict_bytes, &samples[0], &sample_lens[0], diff --git a/ceph/src/rocksdb/util/concurrent_task_limiter_impl.cc b/ceph/src/rocksdb/util/concurrent_task_limiter_impl.cc new file mode 100644 index 000000000..e1ce4bef7 --- /dev/null +++ b/ceph/src/rocksdb/util/concurrent_task_limiter_impl.cc @@ -0,0 +1,67 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// Copyright (c) 2011 The LevelDB Authors. All rights reserved. +// Use of this source code is governed by a BSD-style license that can be +// found in the LICENSE file. See the AUTHORS file for names of contributors. + +#include "util/concurrent_task_limiter_impl.h" +#include "rocksdb/concurrent_task_limiter.h" + +namespace rocksdb { + +ConcurrentTaskLimiterImpl::ConcurrentTaskLimiterImpl( + const std::string& name, int32_t max_outstanding_task) + : name_(name), + max_outstanding_tasks_{max_outstanding_task}, + outstanding_tasks_{0} { + +} + +ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl() { + assert(outstanding_tasks_ == 0); +} + +const std::string& ConcurrentTaskLimiterImpl::GetName() const { + return name_; +} + +void ConcurrentTaskLimiterImpl::SetMaxOutstandingTask(int32_t limit) { + max_outstanding_tasks_.store(limit, std::memory_order_relaxed); +} + +void ConcurrentTaskLimiterImpl::ResetMaxOutstandingTask() { + max_outstanding_tasks_.store(-1, std::memory_order_relaxed); +} + +int32_t ConcurrentTaskLimiterImpl::GetOutstandingTask() const { + return outstanding_tasks_.load(std::memory_order_relaxed); +} + +std::unique_ptr ConcurrentTaskLimiterImpl::GetToken( + bool force) { + int32_t limit = max_outstanding_tasks_.load(std::memory_order_relaxed); + int32_t tasks = outstanding_tasks_.load(std::memory_order_relaxed); + // force = true, bypass the throttle. + // limit < 0 means unlimited tasks. + while (force || limit < 0 || tasks < limit) { + if (outstanding_tasks_.compare_exchange_weak(tasks, tasks + 1)) { + return std::unique_ptr(new TaskLimiterToken(this)); + } + } + return nullptr; +} + +ConcurrentTaskLimiter* NewConcurrentTaskLimiter( + const std::string& name, int32_t limit) { + return new ConcurrentTaskLimiterImpl(name, limit); +} + +TaskLimiterToken::~TaskLimiterToken() { + --limiter_->outstanding_tasks_; + assert(limiter_->outstanding_tasks_ >= 0); +} + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/util/concurrent_task_limiter_impl.h b/ceph/src/rocksdb/util/concurrent_task_limiter_impl.h new file mode 100644 index 000000000..515f1481e --- /dev/null +++ b/ceph/src/rocksdb/util/concurrent_task_limiter_impl.h @@ -0,0 +1,68 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// +// Copyright (c) 2011 The LevelDB Authors. All rights reserved. +// Use of this source code is governed by a BSD-style license that can be +// found in the LICENSE file. See the AUTHORS file for names of contributors. + +#pragma once +#include +#include + +#include "rocksdb/env.h" +#include "rocksdb/concurrent_task_limiter.h" + +namespace rocksdb { + +class TaskLimiterToken; + +class ConcurrentTaskLimiterImpl : public ConcurrentTaskLimiter { + public: + explicit ConcurrentTaskLimiterImpl(const std::string& name, + int32_t max_outstanding_task); + + virtual ~ConcurrentTaskLimiterImpl(); + + virtual const std::string& GetName() const override; + + virtual void SetMaxOutstandingTask(int32_t limit) override; + + virtual void ResetMaxOutstandingTask() override; + + virtual int32_t GetOutstandingTask() const override; + + // Request token for adding a new task. 
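+  // (Internally admission is lock-free: GetToken in the .cc above retries a
+  // compare-and-swap that bumps outstanding_tasks_ from `tasks` to
+  // `tasks + 1`, succeeding only while the task count is under the limit.)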
+ // If force == true, it requests a token bypassing throttle. + // Returns nullptr if it got throttled. + virtual std::unique_ptr GetToken(bool force); + + private: + friend class TaskLimiterToken; + + std::string name_; + std::atomic max_outstanding_tasks_; + std::atomic outstanding_tasks_; + + // No copying allowed + ConcurrentTaskLimiterImpl(const ConcurrentTaskLimiterImpl&) = delete; + ConcurrentTaskLimiterImpl& operator=( + const ConcurrentTaskLimiterImpl&) = delete; +}; + +class TaskLimiterToken { + public: + explicit TaskLimiterToken(ConcurrentTaskLimiterImpl* limiter) + : limiter_(limiter) {} + ~TaskLimiterToken(); + + private: + ConcurrentTaskLimiterImpl* limiter_; + + // no copying allowed + TaskLimiterToken(const TaskLimiterToken&) = delete; + void operator=(const TaskLimiterToken&) = delete; +}; + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/util/delete_scheduler.cc b/ceph/src/rocksdb/util/delete_scheduler.cc index 1d51055a3..f5ee28448 100644 --- a/ceph/src/rocksdb/util/delete_scheduler.cc +++ b/ceph/src/rocksdb/util/delete_scheduler.cc @@ -52,11 +52,12 @@ DeleteScheduler::~DeleteScheduler() { } Status DeleteScheduler::DeleteFile(const std::string& file_path, - const std::string& dir_to_sync) { + const std::string& dir_to_sync, + const bool force_bg) { Status s; - if (rate_bytes_per_sec_.load() <= 0 || + if (rate_bytes_per_sec_.load() <= 0 || (!force_bg && total_trash_size_.load() > - sst_file_manager_->GetTotalSize() * max_trash_db_ratio_.load()) { + sst_file_manager_->GetTotalSize() * max_trash_db_ratio_.load())) { // Rate limiting is disabled or trash size makes up more than // max_trash_db_ratio_ (default 25%) of the total DB size TEST_SYNC_POINT("DeleteScheduler::DeleteFile"); @@ -275,7 +276,7 @@ Status DeleteScheduler::DeleteTrashFile(const std::string& path_in_trash, Status my_status = env_->NumFileLinks(path_in_trash, &num_hard_links); if (my_status.ok()) { if (num_hard_links == 1) { - unique_ptr wf; + std::unique_ptr wf; my_status = env_->ReopenWritableFile(path_in_trash, &wf, EnvOptions()); if (my_status.ok()) { diff --git a/ceph/src/rocksdb/util/delete_scheduler.h b/ceph/src/rocksdb/util/delete_scheduler.h index cbd13ecef..29b70517b 100644 --- a/ceph/src/rocksdb/util/delete_scheduler.h +++ b/ceph/src/rocksdb/util/delete_scheduler.h @@ -46,8 +46,11 @@ class DeleteScheduler { rate_bytes_per_sec_.store(bytes_per_sec); } - // Mark file as trash directory and schedule it's deletion - Status DeleteFile(const std::string& fname, const std::string& dir_to_sync); + // Mark file as trash directory and schedule it's deletion. If force_bg is + // set, it forces the file to always be deleted in the background thread, + // except when rate limiting is disabled + Status DeleteFile(const std::string& fname, const std::string& dir_to_sync, + const bool force_bg = false); // Wait for all files being deleteing in the background to finish or for // destructor to be called. 
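Illustrative use of the new `force_bg` parameter added above (the wrapper `ScheduleBackgroundDelete` is invented for this sketch):

```cpp
#include <string>
#include "util/delete_scheduler.h"

// With force_bg=true the file is queued for background deletion even when
// trash already exceeds max_trash_db_ratio; if rate limiting is disabled,
// DeleteFile still falls back to deleting immediately.
rocksdb::Status ScheduleBackgroundDelete(rocksdb::DeleteScheduler* scheduler,
                                         const std::string& fname,
                                         const std::string& dir_to_sync) {
  return scheduler->DeleteFile(fname, dir_to_sync, /*force_bg=*/true);
}
```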
diff --git a/ceph/src/rocksdb/util/delete_scheduler_test.cc b/ceph/src/rocksdb/util/delete_scheduler_test.cc
index bfd9954de..0d8e354b9 100644
--- a/ceph/src/rocksdb/util/delete_scheduler_test.cc
+++ b/ceph/src/rocksdb/util/delete_scheduler_test.cc
@@ -38,7 +38,7 @@ class DeleteSchedulerTest : public testing::Test {
     }
   }

-  ~DeleteSchedulerTest() {
+  ~DeleteSchedulerTest() override {
     rocksdb::SyncPoint::GetInstance()->DisableProcessing();
     rocksdb::SyncPoint::GetInstance()->LoadDependency({});
     rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
diff --git a/ceph/src/rocksdb/util/duplicate_detector.h b/ceph/src/rocksdb/util/duplicate_detector.h
index 5879a5240..40a1cbd12 100644
--- a/ceph/src/rocksdb/util/duplicate_detector.h
+++ b/ceph/src/rocksdb/util/duplicate_detector.h
@@ -56,11 +56,11 @@ class DuplicateDetector {
         db_->immutable_db_options().info_log,
         "Recovering an entry from the dropped column family %" PRIu32
         ". WAL must must have been emptied before dropping the column "
-        "family");
+        "family", cf);
 #ifndef ROCKSDB_LITE
     throw std::runtime_error(
-        "Recovering an entry from the dropped column family %" PRIu32
-        ". WAL must must have been flushed before dropping the column "
+        "Recovering an entry from a dropped column family. "
+        "WAL must have been flushed before dropping the column "
         "family");
 #endif
     return;
diff --git a/ceph/src/rocksdb/util/dynamic_bloom.cc b/ceph/src/rocksdb/util/dynamic_bloom.cc
index 635dd98af..8e90efd89 100644
--- a/ceph/src/rocksdb/util/dynamic_bloom.cc
+++ b/ceph/src/rocksdb/util/dynamic_bloom.cc
@@ -32,20 +32,13 @@ uint32_t GetTotalBitsForLocality(uint32_t total_bits) {
 DynamicBloom::DynamicBloom(Allocator* allocator, uint32_t total_bits,
                            uint32_t locality, uint32_t num_probes,
-                           uint32_t (*hash_func)(const Slice& key),
-                           size_t huge_page_tlb_size,
-                           Logger* logger)
-    : DynamicBloom(num_probes, hash_func) {
+                           size_t huge_page_tlb_size, Logger* logger)
+    : DynamicBloom(num_probes) {
   SetTotalBits(allocator, total_bits, locality, huge_page_tlb_size, logger);
 }

-DynamicBloom::DynamicBloom(uint32_t num_probes,
-                           uint32_t (*hash_func)(const Slice& key))
-    : kTotalBits(0),
-      kNumBlocks(0),
-      kNumProbes(num_probes),
-      hash_func_(hash_func == nullptr ?
&BloomHash : hash_func), - data_(nullptr) {} +DynamicBloom::DynamicBloom(uint32_t num_probes) + : kTotalBits(0), kNumBlocks(0), kNumProbes(num_probes), data_(nullptr) {} void DynamicBloom::SetRawData(unsigned char* raw_data, uint32_t total_bits, uint32_t num_blocks) { diff --git a/ceph/src/rocksdb/util/dynamic_bloom.h b/ceph/src/rocksdb/util/dynamic_bloom.h index 398222b1d..654bc9ad5 100644 --- a/ceph/src/rocksdb/util/dynamic_bloom.h +++ b/ceph/src/rocksdb/util/dynamic_bloom.h @@ -10,6 +10,7 @@ #include "rocksdb/slice.h" #include "port/port.h" +#include "util/hash.h" #include #include @@ -35,12 +36,10 @@ class DynamicBloom { explicit DynamicBloom(Allocator* allocator, uint32_t total_bits, uint32_t locality = 0, uint32_t num_probes = 6, - uint32_t (*hash_func)(const Slice& key) = nullptr, size_t huge_page_tlb_size = 0, Logger* logger = nullptr); - explicit DynamicBloom(uint32_t num_probes = 6, - uint32_t (*hash_func)(const Slice& key) = nullptr); + explicit DynamicBloom(uint32_t num_probes = 6); void SetTotalBits(Allocator* allocator, uint32_t total_bits, uint32_t locality, size_t huge_page_tlb_size, @@ -86,7 +85,6 @@ class DynamicBloom { uint32_t kNumBlocks; const uint32_t kNumProbes; - uint32_t (*hash_func_)(const Slice& key); std::atomic* data_; // or_func(ptr, mask) should effect *ptr |= mask with the appropriate @@ -95,10 +93,10 @@ class DynamicBloom { void AddHash(uint32_t hash, const OrFunc& or_func); }; -inline void DynamicBloom::Add(const Slice& key) { AddHash(hash_func_(key)); } +inline void DynamicBloom::Add(const Slice& key) { AddHash(BloomHash(key)); } inline void DynamicBloom::AddConcurrently(const Slice& key) { - AddHashConcurrently(hash_func_(key)); + AddHashConcurrently(BloomHash(key)); } inline void DynamicBloom::AddHash(uint32_t hash) { @@ -122,7 +120,7 @@ inline void DynamicBloom::AddHashConcurrently(uint32_t hash) { } inline bool DynamicBloom::MayContain(const Slice& key) const { - return (MayContainHash(hash_func_(key))); + return (MayContainHash(BloomHash(key))); } #if defined(_MSC_VER) diff --git a/ceph/src/rocksdb/util/event_logger_test.cc b/ceph/src/rocksdb/util/event_logger_test.cc index 13b639442..4bcf30ff5 100644 --- a/ceph/src/rocksdb/util/event_logger_test.cc +++ b/ceph/src/rocksdb/util/event_logger_test.cc @@ -15,7 +15,7 @@ class EventLoggerTest : public testing::Test {}; class StringLogger : public Logger { public: using Logger::Logv; - virtual void Logv(const char* format, va_list ap) override { + void Logv(const char* format, va_list ap) override { vsnprintf(buffer_, sizeof(buffer_), format, ap); } char* buffer() { return buffer_; } diff --git a/ceph/src/rocksdb/util/fault_injection_test_env.cc b/ceph/src/rocksdb/util/fault_injection_test_env.cc index 3b3dbbe99..9cad23871 100644 --- a/ceph/src/rocksdb/util/fault_injection_test_env.cc +++ b/ceph/src/rocksdb/util/fault_injection_test_env.cc @@ -29,12 +29,12 @@ std::string GetDirName(const std::string filename) { // A basic file truncation function suitable for this test. 
Status Truncate(Env* env, const std::string& filename, uint64_t length) { - unique_ptr orig_file; + std::unique_ptr orig_file; const EnvOptions options; Status s = env->NewSequentialFile(filename, &orig_file, options); if (!s.ok()) { - fprintf(stderr, "Cannot truncate file %s: %s\n", filename.c_str(), - s.ToString().c_str()); + fprintf(stderr, "Cannot open file %s for truncation: %s\n", + filename.c_str(), s.ToString().c_str()); return s; } @@ -46,7 +46,7 @@ Status Truncate(Env* env, const std::string& filename, uint64_t length) { #endif if (s.ok()) { std::string tmp_name = GetDirName(filename) + "/truncate.tmp"; - unique_ptr tmp_file; + std::unique_ptr tmp_file; s = env->NewWritableFile(tmp_name, &tmp_file, options); if (s.ok()) { s = tmp_file->Append(result); @@ -103,7 +103,7 @@ Status TestDirectory::Fsync() { } TestWritableFile::TestWritableFile(const std::string& fname, - unique_ptr&& f, + std::unique_ptr&& f, FaultInjectionTestEnv* env) : state_(fname), target_(std::move(f)), @@ -126,6 +126,7 @@ Status TestWritableFile::Append(const Slice& data) { Status s = target_->Append(data); if (s.ok()) { state_.pos_ += data.size(); + env_->WritableFileAppended(state_); } return s; } @@ -153,12 +154,13 @@ Status TestWritableFile::Sync() { } // No need to actual sync. state_.pos_at_last_sync_ = state_.pos_; + env_->WritableFileSynced(state_); return Status::OK(); } Status FaultInjectionTestEnv::NewDirectory(const std::string& name, - unique_ptr* result) { - unique_ptr r; + std::unique_ptr* result) { + std::unique_ptr r; Status s = target()->NewDirectory(name, &r); assert(s.ok()); if (!s.ok()) { @@ -168,9 +170,9 @@ Status FaultInjectionTestEnv::NewDirectory(const std::string& name, return Status::OK(); } -Status FaultInjectionTestEnv::NewWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& soptions) { +Status FaultInjectionTestEnv::NewWritableFile( + const std::string& fname, std::unique_ptr* result, + const EnvOptions& soptions) { if (!IsFilesystemActive()) { return GetError(); } @@ -197,6 +199,27 @@ Status FaultInjectionTestEnv::NewWritableFile(const std::string& fname, return s; } +Status FaultInjectionTestEnv::ReopenWritableFile( + const std::string& fname, std::unique_ptr* result, + const EnvOptions& soptions) { + if (!IsFilesystemActive()) { + return GetError(); + } + Status s = target()->ReopenWritableFile(fname, result, soptions); + if (s.ok()) { + result->reset(new TestWritableFile(fname, std::move(*result), this)); + // WritableFileWriter* file is opened + // again then it will be truncated - so forget our saved state. 
+ UntrackFile(fname); + MutexLock l(&mutex_); + open_files_.insert(fname); + auto dir_and_name = GetDirAndName(fname); + auto& list = dir_to_new_files_since_last_sync_[dir_and_name.first]; + list.insert(dir_and_name.second); + } + return s; +} + Status FaultInjectionTestEnv::NewRandomAccessFile( const std::string& fname, std::unique_ptr* result, const EnvOptions& soptions) { @@ -256,6 +279,28 @@ void FaultInjectionTestEnv::WritableFileClosed(const FileState& state) { } } +void FaultInjectionTestEnv::WritableFileSynced(const FileState& state) { + MutexLock l(&mutex_); + if (open_files_.find(state.filename_) != open_files_.end()) { + if (db_file_state_.find(state.filename_) == db_file_state_.end()) { + db_file_state_.insert(std::make_pair(state.filename_, state)); + } else { + db_file_state_[state.filename_] = state; + } + } +} + +void FaultInjectionTestEnv::WritableFileAppended(const FileState& state) { + MutexLock l(&mutex_); + if (open_files_.find(state.filename_) != open_files_.end()) { + if (db_file_state_.find(state.filename_) == db_file_state_.end()) { + db_file_state_.insert(std::make_pair(state.filename_, state)); + } else { + db_file_state_[state.filename_] = state; + } + } +} + // For every file that is not fully synced, make a call to `func` with // FileState of the file as the parameter. Status FaultInjectionTestEnv::DropFileData( diff --git a/ceph/src/rocksdb/util/fault_injection_test_env.h b/ceph/src/rocksdb/util/fault_injection_test_env.h index 563986e29..7c5a080f7 100644 --- a/ceph/src/rocksdb/util/fault_injection_test_env.h +++ b/ceph/src/rocksdb/util/fault_injection_test_env.h @@ -56,7 +56,7 @@ struct FileState { class TestWritableFile : public WritableFile { public: explicit TestWritableFile(const std::string& fname, - unique_ptr&& f, + std::unique_ptr&& f, FaultInjectionTestEnv* env); virtual ~TestWritableFile(); virtual Status Append(const Slice& data) override; @@ -77,7 +77,7 @@ class TestWritableFile : public WritableFile { private: FileState state_; - unique_ptr target_; + std::unique_ptr target_; bool writable_file_opened_; FaultInjectionTestEnv* env_; }; @@ -94,7 +94,7 @@ class TestDirectory : public Directory { private: FaultInjectionTestEnv* env_; std::string dirname_; - unique_ptr dir_; + std::unique_ptr dir_; }; class FaultInjectionTestEnv : public EnvWrapper { @@ -104,12 +104,16 @@ class FaultInjectionTestEnv : public EnvWrapper { virtual ~FaultInjectionTestEnv() {} Status NewDirectory(const std::string& name, - unique_ptr* result) override; + std::unique_ptr* result) override; Status NewWritableFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& soptions) override; + Status ReopenWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& soptions) override; + Status NewRandomAccessFile(const std::string& fname, std::unique_ptr* result, const EnvOptions& soptions) override; @@ -131,6 +135,10 @@ class FaultInjectionTestEnv : public EnvWrapper { void WritableFileClosed(const FileState& state); + void WritableFileSynced(const FileState& state); + + void WritableFileAppended(const FileState& state); + // For every file that is not fully synced, make a call to `func` with // FileState of the file as the parameter. 
Status DropFileData(std::function func); diff --git a/ceph/src/rocksdb/util/file_reader_writer.cc b/ceph/src/rocksdb/util/file_reader_writer.cc index cd09f7122..9a818cb0f 100644 --- a/ceph/src/rocksdb/util/file_reader_writer.cc +++ b/ceph/src/rocksdb/util/file_reader_writer.cc @@ -77,6 +77,7 @@ Status RandomAccessFileReader::Read(uint64_t offset, size_t n, Slice* result, StopWatch sw(env_, stats_, hist_type_, (stats_ != nullptr) ? &elapsed : nullptr, true /*overwrite*/, true /*delay_enabled*/); + auto prev_perf_level = GetPerfLevel(); IOSTATS_TIMER_GUARD(read_nanos); if (use_direct_io()) { #ifndef ROCKSDB_LITE @@ -98,8 +99,24 @@ Status RandomAccessFileReader::Read(uint64_t offset, size_t n, Slice* result, allowed = read_size; } Slice tmp; - s = file_->Read(aligned_offset + buf.CurrentSize(), allowed, &tmp, - buf.Destination()); + + FileOperationInfo::TimePoint start_ts; + uint64_t orig_offset = 0; + if (ShouldNotifyListeners()) { + start_ts = std::chrono::system_clock::now(); + orig_offset = aligned_offset + buf.CurrentSize(); + } + { + IOSTATS_CPU_TIMER_GUARD(cpu_read_nanos, env_); + s = file_->Read(aligned_offset + buf.CurrentSize(), allowed, &tmp, + buf.Destination()); + } + if (ShouldNotifyListeners()) { + auto finish_ts = std::chrono::system_clock::now(); + NotifyOnFileReadFinish(orig_offset, tmp.size(), start_ts, finish_ts, + s); + } + buf.Size(buf.CurrentSize() + tmp.size()); if (!s.ok() || tmp.size() < allowed) { break; @@ -131,7 +148,25 @@ Status RandomAccessFileReader::Read(uint64_t offset, size_t n, Slice* result, allowed = n; } Slice tmp_result; - s = file_->Read(offset + pos, allowed, &tmp_result, scratch + pos); + +#ifndef ROCKSDB_LITE + FileOperationInfo::TimePoint start_ts; + if (ShouldNotifyListeners()) { + start_ts = std::chrono::system_clock::now(); + } +#endif + { + IOSTATS_CPU_TIMER_GUARD(cpu_read_nanos, env_); + s = file_->Read(offset + pos, allowed, &tmp_result, scratch + pos); + } +#ifndef ROCKSDB_LITE + if (ShouldNotifyListeners()) { + auto finish_ts = std::chrono::system_clock::now(); + NotifyOnFileReadFinish(offset + pos, tmp_result.size(), start_ts, + finish_ts, s); + } +#endif + if (res_scratch == nullptr) { // we can't simply use `scratch` because reads of mmap'd files return // data in a different buffer. @@ -148,10 +183,12 @@ Status RandomAccessFileReader::Read(uint64_t offset, size_t n, Slice* result, *result = Slice(res_scratch, s.ok() ? 
pos : 0); } IOSTATS_ADD_IF_POSITIVE(bytes_read, result->size()); + SetPerfLevel(prev_perf_level); } if (stats_ != nullptr && file_read_hist_ != nullptr) { file_read_hist_->Add(elapsed); } + return s; } @@ -301,7 +338,9 @@ Status WritableFileWriter::Flush() { if (buf_.CurrentSize() > 0) { if (use_direct_io()) { #ifndef ROCKSDB_LITE - s = WriteDirect(); + if (pending_sync_) { + s = WriteDirect(); + } #endif // !ROCKSDB_LITE } else { s = WriteBuffered(buf_.BufferStart(), buf_.CurrentSize()); @@ -379,11 +418,14 @@ Status WritableFileWriter::SyncInternal(bool use_fsync) { Status s; IOSTATS_TIMER_GUARD(fsync_nanos); TEST_SYNC_POINT("WritableFileWriter::SyncInternal:0"); + auto prev_perf_level = GetPerfLevel(); + IOSTATS_CPU_TIMER_GUARD(cpu_write_nanos, env_); if (use_fsync) { s = writable_file_->Fsync(); } else { s = writable_file_->Sync(); } + SetPerfLevel(prev_perf_level); return s; } @@ -414,7 +456,27 @@ Status WritableFileWriter::WriteBuffered(const char* data, size_t size) { { IOSTATS_TIMER_GUARD(write_nanos); TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend"); - s = writable_file_->Append(Slice(src, allowed)); + +#ifndef ROCKSDB_LITE + FileOperationInfo::TimePoint start_ts; + uint64_t old_size = writable_file_->GetFileSize(); + if (ShouldNotifyListeners()) { + start_ts = std::chrono::system_clock::now(); + old_size = next_write_offset_; + } +#endif + { + auto prev_perf_level = GetPerfLevel(); + IOSTATS_CPU_TIMER_GUARD(cpu_write_nanos, env_); + s = writable_file_->Append(Slice(src, allowed)); + SetPerfLevel(prev_perf_level); + } +#ifndef ROCKSDB_LITE + if (ShouldNotifyListeners()) { + auto finish_ts = std::chrono::system_clock::now(); + NotifyOnFileWriteFinish(old_size, allowed, start_ts, finish_ts, s); + } +#endif if (!s.ok()) { return s; } @@ -477,8 +539,16 @@ Status WritableFileWriter::WriteDirect() { { IOSTATS_TIMER_GUARD(write_nanos); TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend"); + FileOperationInfo::TimePoint start_ts; + if (ShouldNotifyListeners()) { + start_ts = std::chrono::system_clock::now(); + } // direct writes must be positional s = writable_file_->PositionedAppend(Slice(src, size), write_offset); + if (ShouldNotifyListeners()) { + auto finish_ts = std::chrono::system_clock::now(); + NotifyOnFileWriteFinish(write_offset, size, start_ts, finish_ts, s); + } if (!s.ok()) { buf_.Size(file_advance + leftover_tail); return s; @@ -525,96 +595,93 @@ class ReadaheadRandomAccessFile : public RandomAccessFile { ReadaheadRandomAccessFile& operator=(const ReadaheadRandomAccessFile&) = delete; - virtual Status Read(uint64_t offset, size_t n, Slice* result, - char* scratch) const override { - - if (n + alignment_ >= readahead_size_) { - return file_->Read(offset, n, result, scratch); - } - - std::unique_lock lk(lock_); - - size_t cached_len = 0; - // Check if there is a cache hit, means that [offset, offset + n) is either - // completely or partially in the buffer - // If it's completely cached, including end of file case when offset + n is - // greater than EOF, return - if (TryReadFromCache(offset, n, &cached_len, scratch) && - (cached_len == n || - // End of file - buffer_.CurrentSize() < readahead_size_)) { - *result = Slice(scratch, cached_len); - return Status::OK(); - } - size_t advanced_offset = static_cast(offset + cached_len); - // In the case of cache hit advanced_offset is already aligned, means that - // chunk_offset equals to advanced_offset - size_t chunk_offset = TruncateToPageBoundary(alignment_, advanced_offset); - Slice readahead_result; - - Status s = 
ReadIntoBuffer(chunk_offset, readahead_size_); - if (s.ok()) { - // In the case of cache miss, i.e. when cached_len equals 0, an offset can - // exceed the file end position, so the following check is required - if (advanced_offset < chunk_offset + buffer_.CurrentSize()) { - // In the case of cache miss, the first chunk_padding bytes in buffer_ - // are - // stored for alignment only and must be skipped - size_t chunk_padding = advanced_offset - chunk_offset; - auto remaining_len = - std::min(buffer_.CurrentSize() - chunk_padding, n - cached_len); - memcpy(scratch + cached_len, buffer_.BufferStart() + chunk_padding, - remaining_len); - *result = Slice(scratch, cached_len + remaining_len); - } else { - *result = Slice(scratch, cached_len); - } - } - return s; - } - - virtual Status Prefetch(uint64_t offset, size_t n) override { - if (n < readahead_size_) { - // Don't allow smaller prefetches than the configured `readahead_size_`. - // `Read()` assumes a smaller prefetch buffer indicates EOF was reached. - return Status::OK(); - } - size_t offset_ = static_cast(offset); - size_t prefetch_offset = TruncateToPageBoundary(alignment_, offset_); - if (prefetch_offset == buffer_offset_) { - return Status::OK(); - } - return ReadIntoBuffer(prefetch_offset, - Roundup(offset_ + n, alignment_) - prefetch_offset); - } - - virtual size_t GetUniqueId(char* id, size_t max_size) const override { - return file_->GetUniqueId(id, max_size); - } - - virtual void Hint(AccessPattern pattern) override { file_->Hint(pattern); } - - virtual Status InvalidateCache(size_t offset, size_t length) override { - return file_->InvalidateCache(offset, length); - } - - virtual bool use_direct_io() const override { - return file_->use_direct_io(); - } - - private: - bool TryReadFromCache(uint64_t offset, size_t n, size_t* cached_len, - char* scratch) const { - if (offset < buffer_offset_ || - offset >= buffer_offset_ + buffer_.CurrentSize()) { - *cached_len = 0; - return false; - } - uint64_t offset_in_buffer = offset - buffer_offset_; - *cached_len = std::min( - buffer_.CurrentSize() - static_cast(offset_in_buffer), n); - memcpy(scratch, buffer_.BufferStart() + offset_in_buffer, *cached_len); - return true; + Status Read(uint64_t offset, size_t n, Slice* result, + char* scratch) const override { + if (n + alignment_ >= readahead_size_) { + return file_->Read(offset, n, result, scratch); + } + + std::unique_lock lk(lock_); + + size_t cached_len = 0; + // Check if there is a cache hit, means that [offset, offset + n) is either + // completely or partially in the buffer + // If it's completely cached, including end of file case when offset + n is + // greater than EOF, return + if (TryReadFromCache(offset, n, &cached_len, scratch) && + (cached_len == n || + // End of file + buffer_.CurrentSize() < readahead_size_)) { + *result = Slice(scratch, cached_len); + return Status::OK(); + } + size_t advanced_offset = static_cast(offset + cached_len); + // In the case of cache hit advanced_offset is already aligned, means that + // chunk_offset equals to advanced_offset + size_t chunk_offset = TruncateToPageBoundary(alignment_, advanced_offset); + Slice readahead_result; + + Status s = ReadIntoBuffer(chunk_offset, readahead_size_); + if (s.ok()) { + // In the case of cache miss, i.e. 
when cached_len equals 0, an offset can + // exceed the file end position, so the following check is required + if (advanced_offset < chunk_offset + buffer_.CurrentSize()) { + // In the case of cache miss, the first chunk_padding bytes in buffer_ + // are + // stored for alignment only and must be skipped + size_t chunk_padding = advanced_offset - chunk_offset; + auto remaining_len = + std::min(buffer_.CurrentSize() - chunk_padding, n - cached_len); + memcpy(scratch + cached_len, buffer_.BufferStart() + chunk_padding, + remaining_len); + *result = Slice(scratch, cached_len + remaining_len); + } else { + *result = Slice(scratch, cached_len); + } + } + return s; + } + + Status Prefetch(uint64_t offset, size_t n) override { + if (n < readahead_size_) { + // Don't allow smaller prefetches than the configured `readahead_size_`. + // `Read()` assumes a smaller prefetch buffer indicates EOF was reached. + return Status::OK(); + } + size_t offset_ = static_cast(offset); + size_t prefetch_offset = TruncateToPageBoundary(alignment_, offset_); + if (prefetch_offset == buffer_offset_) { + return Status::OK(); + } + return ReadIntoBuffer(prefetch_offset, + Roundup(offset_ + n, alignment_) - prefetch_offset); + } + + size_t GetUniqueId(char* id, size_t max_size) const override { + return file_->GetUniqueId(id, max_size); + } + + void Hint(AccessPattern pattern) override { file_->Hint(pattern); } + + Status InvalidateCache(size_t offset, size_t length) override { + return file_->InvalidateCache(offset, length); + } + + bool use_direct_io() const override { return file_->use_direct_io(); } + +private: + bool TryReadFromCache(uint64_t offset, size_t n, size_t* cached_len, + char* scratch) const { + if (offset < buffer_offset_ || + offset >= buffer_offset_ + buffer_.CurrentSize()) { + *cached_len = 0; + return false; + } + uint64_t offset_in_buffer = offset - buffer_offset_; + *cached_len = std::min( + buffer_.CurrentSize() - static_cast(offset_in_buffer), n); + memcpy(scratch, buffer_.BufferStart() + offset_in_buffer, *cached_len); + return true; } Status ReadIntoBuffer(uint64_t offset, size_t n) const { @@ -753,7 +820,7 @@ std::unique_ptr NewReadaheadRandomAccessFile( } Status NewWritableFile(Env* env, const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options) { Status s = env->NewWritableFile(fname, result, options); TEST_KILL_RANDOM("NewWritableFile:0", rocksdb_kill_odds * REDUCE_ODDS2); diff --git a/ceph/src/rocksdb/util/file_reader_writer.h b/ceph/src/rocksdb/util/file_reader_writer.h index a2c90f2b3..4451f8b81 100644 --- a/ceph/src/rocksdb/util/file_reader_writer.h +++ b/ceph/src/rocksdb/util/file_reader_writer.h @@ -12,6 +12,7 @@ #include #include "port/port.h" #include "rocksdb/env.h" +#include "rocksdb/listener.h" #include "rocksdb/rate_limiter.h" #include "util/aligned_buffer.h" #include "util/sync_point.h" @@ -62,6 +63,24 @@ class SequentialFileReader { class RandomAccessFileReader { private: +#ifndef ROCKSDB_LITE + void NotifyOnFileReadFinish(uint64_t offset, size_t length, + const FileOperationInfo::TimePoint& start_ts, + const FileOperationInfo::TimePoint& finish_ts, + const Status& status) const { + FileOperationInfo info(file_name_, start_ts, finish_ts); + info.offset = offset; + info.length = length; + info.status = status; + + for (auto& listener : listeners_) { + listener->OnFileReadFinish(info); + } + } +#endif // ROCKSDB_LITE + + bool ShouldNotifyListeners() const { return !listeners_.empty(); } + std::unique_ptr file_; std::string file_name_; 
Env* env_; @@ -70,16 +89,15 @@ class RandomAccessFileReader { HistogramImpl* file_read_hist_; RateLimiter* rate_limiter_; bool for_compaction_; + std::vector> listeners_; public: - explicit RandomAccessFileReader(std::unique_ptr&& raf, - std::string _file_name, - Env* env = nullptr, - Statistics* stats = nullptr, - uint32_t hist_type = 0, - HistogramImpl* file_read_hist = nullptr, - RateLimiter* rate_limiter = nullptr, - bool for_compaction = false) + explicit RandomAccessFileReader( + std::unique_ptr&& raf, std::string _file_name, + Env* env = nullptr, Statistics* stats = nullptr, uint32_t hist_type = 0, + HistogramImpl* file_read_hist = nullptr, + RateLimiter* rate_limiter = nullptr, bool for_compaction = false, + const std::vector>& listeners = {}) : file_(std::move(raf)), file_name_(std::move(_file_name)), env_(env), @@ -87,7 +105,19 @@ class RandomAccessFileReader { hist_type_(hist_type), file_read_hist_(file_read_hist), rate_limiter_(rate_limiter), - for_compaction_(for_compaction) {} + for_compaction_(for_compaction), + listeners_() { +#ifndef ROCKSDB_LITE + std::for_each(listeners.begin(), listeners.end(), + [this](const std::shared_ptr& e) { + if (e->ShouldBeNotifiedOnFileIO()) { + listeners_.emplace_back(e); + } + }); +#else // !ROCKSDB_LITE + (void)listeners; +#endif + } RandomAccessFileReader(RandomAccessFileReader&& o) ROCKSDB_NOEXCEPT { *this = std::move(o); @@ -124,8 +154,27 @@ class RandomAccessFileReader { // Use posix write to write data to a file. class WritableFileWriter { private: +#ifndef ROCKSDB_LITE + void NotifyOnFileWriteFinish(uint64_t offset, size_t length, + const FileOperationInfo::TimePoint& start_ts, + const FileOperationInfo::TimePoint& finish_ts, + const Status& status) { + FileOperationInfo info(file_name_, start_ts, finish_ts); + info.offset = offset; + info.length = length; + info.status = status; + + for (auto& listener : listeners_) { + listener->OnFileWriteFinish(info); + } + } +#endif // ROCKSDB_LITE + + bool ShouldNotifyListeners() const { return !listeners_.empty(); } + std::unique_ptr writable_file_; std::string file_name_; + Env* env_; AlignedBuffer buf_; size_t max_buffer_size_; // Actually written data size can be used for truncate @@ -142,13 +191,17 @@ class WritableFileWriter { uint64_t bytes_per_sync_; RateLimiter* rate_limiter_; Statistics* stats_; + std::vector> listeners_; public: - WritableFileWriter(std::unique_ptr&& file, - const std::string& _file_name, const EnvOptions& options, - Statistics* stats = nullptr) + WritableFileWriter( + std::unique_ptr&& file, const std::string& _file_name, + const EnvOptions& options, Env* env = nullptr, + Statistics* stats = nullptr, + const std::vector>& listeners = {}) : writable_file_(std::move(file)), file_name_(_file_name), + env_(env), buf_(), max_buffer_size_(options.writable_file_max_buffer_size), filesize_(0), @@ -159,11 +212,22 @@ class WritableFileWriter { last_sync_size_(0), bytes_per_sync_(options.bytes_per_sync), rate_limiter_(options.rate_limiter), - stats_(stats) { + stats_(stats), + listeners_() { TEST_SYNC_POINT_CALLBACK("WritableFileWriter::WritableFileWriter:0", reinterpret_cast(max_buffer_size_)); buf_.Alignment(writable_file_->GetRequiredBufferAlignment()); buf_.AllocateNewBuffer(std::min((size_t)65536, max_buffer_size_)); +#ifndef ROCKSDB_LITE + std::for_each(listeners.begin(), listeners.end(), + [this](const std::shared_ptr& e) { + if (e->ShouldBeNotifiedOnFileIO()) { + listeners_.emplace_back(e); + } + }); +#else // !ROCKSDB_LITE + (void)listeners; +#endif } 
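A sketch of a listener that would receive the notifications wired up above; the class name `FileIoTracer` is invented, while `ShouldBeNotifiedOnFileIO`, `OnFileReadFinish`, and `OnFileWriteFinish` are the hooks this patch actually invokes:

```cpp
#include "rocksdb/listener.h"

class FileIoTracer : public rocksdb::EventListener {
 public:
  // Only listeners returning true here are copied into the
  // RandomAccessFileReader / WritableFileWriter listener lists.
  bool ShouldBeNotifiedOnFileIO() override { return true; }

  void OnFileReadFinish(const rocksdb::FileOperationInfo& info) override {
    // info carries the offset, length, timestamps, and status populated
    // by NotifyOnFileReadFinish() above.
  }

  void OnFileWriteFinish(const rocksdb::FileOperationInfo& info) override {
    // Same payload for writes, emitted from WriteBuffered()/WriteDirect().
  }
};
```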
WritableFileWriter(const WritableFileWriter&) = delete; @@ -254,7 +318,7 @@ class FilePrefetchBuffer { }; extern Status NewWritableFile(Env* env, const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& options); bool ReadOneLine(std::istringstream* iss, SequentialFile* seq_file, std::string* output, bool* has_data, Status* result); diff --git a/ceph/src/rocksdb/util/file_reader_writer_test.cc b/ceph/src/rocksdb/util/file_reader_writer_test.cc index 3ca44ecc0..6a7ea6d7d 100644 --- a/ceph/src/rocksdb/util/file_reader_writer_test.cc +++ b/ceph/src/rocksdb/util/file_reader_writer_test.cc @@ -20,13 +20,13 @@ TEST_F(WritableFileWriterTest, RangeSync) { class FakeWF : public WritableFile { public: explicit FakeWF() : size_(0), last_synced_(0) {} - ~FakeWF() {} + ~FakeWF() override {} Status Append(const Slice& data) override { size_ += data.size(); return Status::OK(); } - virtual Status Truncate(uint64_t /*size*/) override { return Status::OK(); } + Status Truncate(uint64_t /*size*/) override { return Status::OK(); } Status Close() override { EXPECT_GE(size_, last_synced_ + kMb); EXPECT_LT(size_, last_synced_ + 2 * kMb); @@ -71,8 +71,8 @@ TEST_F(WritableFileWriterTest, RangeSync) { EnvOptions env_options; env_options.bytes_per_sync = kMb; - unique_ptr wf(new FakeWF); - unique_ptr writer( + std::unique_ptr wf(new FakeWF); + std::unique_ptr writer( new WritableFileWriter(std::move(wf), "" /* don't care */, env_options)); Random r(301); std::unique_ptr large_buf(new char[10 * kMb]); @@ -97,7 +97,7 @@ TEST_F(WritableFileWriterTest, IncrementalBuffer) { : file_data_(_file_data), use_direct_io_(_use_direct_io), no_flush_(_no_flush) {} - ~FakeWF() {} + ~FakeWF() override {} Status Append(const Slice& data) override { file_data_->append(data.data(), data.size()); @@ -113,7 +113,7 @@ TEST_F(WritableFileWriterTest, IncrementalBuffer) { return Status::OK(); } - virtual Status Truncate(uint64_t size) override { + Status Truncate(uint64_t size) override { file_data_->resize(size); return Status::OK(); } @@ -147,14 +147,14 @@ TEST_F(WritableFileWriterTest, IncrementalBuffer) { env_options.writable_file_max_buffer_size = (attempt < kNumAttempts / 2) ? 
512 * 1024 : 700 * 1024; std::string actual; - unique_ptr wf(new FakeWF(&actual, + std::unique_ptr wf(new FakeWF(&actual, #ifndef ROCKSDB_LITE - attempt % 2 == 1, + attempt % 2 == 1, #else - false, + false, #endif - no_flush)); - unique_ptr writer(new WritableFileWriter( + no_flush)); + std::unique_ptr writer(new WritableFileWriter( std::move(wf), "" /* don't care */, env_options)); std::string target; @@ -183,7 +183,7 @@ TEST_F(WritableFileWriterTest, AppendStatusReturn) { public: explicit FakeWF() : use_direct_io_(false), io_error_(false) {} - virtual bool use_direct_io() const override { return use_direct_io_; } + bool use_direct_io() const override { return use_direct_io_; } Status Append(const Slice& /*data*/) override { if (io_error_) { return Status::IOError("Fake IO error"); @@ -206,9 +206,9 @@ TEST_F(WritableFileWriterTest, AppendStatusReturn) { bool use_direct_io_; bool io_error_; }; - unique_ptr wf(new FakeWF()); + std::unique_ptr wf(new FakeWF()); wf->Setuse_direct_io(true); - unique_ptr writer( + std::unique_ptr writer( new WritableFileWriter(std::move(wf), "" /* don't care */, EnvOptions())); ASSERT_OK(writer->Append(std::string(2 * kMb, 'a'))); @@ -226,7 +226,7 @@ class ReadaheadRandomAccessFileTest static std::vector GetReadaheadSizeList() { return {1lu << 12, 1lu << 16}; } - virtual void SetUp() override { + void SetUp() override { readahead_size_ = GetParam(); scratch_.reset(new char[2 * readahead_size_]); ResetSourceStr(); diff --git a/ceph/src/rocksdb/util/file_util.cc b/ceph/src/rocksdb/util/file_util.cc index aa2994b1e..ba1b4744b 100644 --- a/ceph/src/rocksdb/util/file_util.cc +++ b/ceph/src/rocksdb/util/file_util.cc @@ -19,16 +19,16 @@ Status CopyFile(Env* env, const std::string& source, const std::string& destination, uint64_t size, bool use_fsync) { const EnvOptions soptions; Status s; - unique_ptr src_reader; - unique_ptr dest_writer; + std::unique_ptr src_reader; + std::unique_ptr dest_writer; { - unique_ptr srcfile; + std::unique_ptr srcfile; s = env->NewSequentialFile(source, &srcfile, soptions); if (!s.ok()) { return s; } - unique_ptr destfile; + std::unique_ptr destfile; s = env->NewWritableFile(destination, &destfile, soptions); if (!s.ok()) { return s; @@ -71,9 +71,9 @@ Status CreateFile(Env* env, const std::string& destination, const std::string& contents, bool use_fsync) { const EnvOptions soptions; Status s; - unique_ptr dest_writer; + std::unique_ptr dest_writer; - unique_ptr destfile; + std::unique_ptr destfile; s = env->NewWritableFile(destination, &destfile, soptions); if (!s.ok()) { return s; @@ -87,19 +87,22 @@ Status CreateFile(Env* env, const std::string& destination, return dest_writer->Sync(use_fsync); } -Status DeleteSSTFile(const ImmutableDBOptions* db_options, - const std::string& fname, const std::string& dir_to_sync) { +Status DeleteDBFile(const ImmutableDBOptions* db_options, + const std::string& fname, const std::string& dir_to_sync, + const bool force_bg) { #ifndef ROCKSDB_LITE - auto sfm = + SstFileManagerImpl* sfm = static_cast(db_options->sst_file_manager.get()); if (sfm) { - return sfm->ScheduleFileDeletion(fname, dir_to_sync); + return sfm->ScheduleFileDeletion(fname, dir_to_sync, force_bg); } else { return db_options->env->DeleteFile(fname); } #else (void)dir_to_sync; + (void)force_bg; // SstFileManager is not supported in ROCKSDB_LITE + // Delete file immediately return db_options->env->DeleteFile(fname); #endif } diff --git a/ceph/src/rocksdb/util/file_util.h b/ceph/src/rocksdb/util/file_util.h index 5c05c9def..c3b365c8b 100644 
--- a/ceph/src/rocksdb/util/file_util.h
+++ b/ceph/src/rocksdb/util/file_util.h
@@ -10,6 +10,7 @@
 #include "rocksdb/env.h"
 #include "rocksdb/status.h"
 #include "rocksdb/types.h"
+#include "util/filename.h"
 namespace rocksdb {
 // use_fsync maps to options.use_fsync, which determines the way that
@@ -21,8 +22,9 @@ extern Status CopyFile(Env* env, const std::string& source,
 extern Status CreateFile(Env* env, const std::string& destination,
                          const std::string& contents, bool use_fsync);

-extern Status DeleteSSTFile(const ImmutableDBOptions* db_options,
-                            const std::string& fname,
-                            const std::string& path_to_sync);
+extern Status DeleteDBFile(const ImmutableDBOptions* db_options,
+                           const std::string& fname,
+                           const std::string& path_to_sync,
+                           const bool force_bg = false);

 } // namespace rocksdb
diff --git a/ceph/src/rocksdb/util/filelock_test.cc b/ceph/src/rocksdb/util/filelock_test.cc
index de3cdd00c..f8721b590 100644
--- a/ceph/src/rocksdb/util/filelock_test.cc
+++ b/ceph/src/rocksdb/util/filelock_test.cc
@@ -25,8 +25,7 @@ class LockTest : public testing::Test {
     current_ = this;
   }

-  ~LockTest() {
-  }
+  ~LockTest() override {}

   Status LockFile(FileLock** db_lock) {
     return env_->LockFile(file_, db_lock);
diff --git a/ceph/src/rocksdb/util/hash.h b/ceph/src/rocksdb/util/hash.h
index 4a13f4564..ed42b0894 100644
--- a/ceph/src/rocksdb/util/hash.h
+++ b/ceph/src/rocksdb/util/hash.h
@@ -14,19 +14,36 @@
 #include
 #include "rocksdb/slice.h"
+#include "util/murmurhash.h"
 namespace rocksdb {
+// Non-persistent hash. Only used for in-memory data structures.
+// The hash results are subject to change.
+extern uint64_t NPHash64(const char* data, size_t n, uint32_t seed);
+
 extern uint32_t Hash(const char* data, size_t n, uint32_t seed);

 inline uint32_t BloomHash(const Slice& key) {
   return Hash(key.data(), key.size(), 0xbc9f1d34);
 }

+inline uint64_t GetSliceNPHash64(const Slice& s) {
+  return NPHash64(s.data(), s.size(), 0);
+}
+
 inline uint32_t GetSliceHash(const Slice& s) {
   return Hash(s.data(), s.size(), 397);
 }

+inline uint64_t NPHash64(const char* data, size_t n, uint32_t seed) {
+  // Right now murmurhash2B is used. It should be able to be freely
+  // changed to a better hash, without worrying about backward
+  // compatibility issues.
+  return MURMUR_HASH(data, static_cast(n),
+                     static_cast(seed));
+}
+
 // std::hash compatible interface.
 struct SliceHasher {
   uint32_t operator()(const Slice& s) const { return GetSliceHash(s); }
diff --git a/ceph/src/rocksdb/util/heap.h b/ceph/src/rocksdb/util/heap.h
index 4d5894134..6093c20e2 100644
--- a/ceph/src/rocksdb/util/heap.h
+++ b/ceph/src/rocksdb/util/heap.h
@@ -92,9 +92,9 @@ class BinaryHeap {
     reset_root_cmp_cache();
   }

-  bool empty() const {
-    return data_.empty();
-  }
+  bool empty() const { return data_.empty(); }
+
+  size_t size() const { return data_.size(); }

   void reset_root_cmp_cache() { root_cmp_cache_ = port::kMaxSizet; }
diff --git a/ceph/src/rocksdb/util/jemalloc_nodump_allocator.cc b/ceph/src/rocksdb/util/jemalloc_nodump_allocator.cc
new file mode 100644
index 000000000..cdd08e932
--- /dev/null
+++ b/ceph/src/rocksdb/util/jemalloc_nodump_allocator.cc
@@ -0,0 +1,206 @@
+// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
+// This source code is licensed under both the GPLv2 (found in the
+// COPYING file in the root directory) and Apache 2.0 License
+// (found in the LICENSE.Apache file in the root directory).
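A short sketch of the 64-bit non-persistent hash helpers added to util/hash.h above (`HashForInMemoryIndex` is an invented name):

```cpp
#include <cstdint>
#include "rocksdb/slice.h"
#include "util/hash.h"

// NPHash64 results may change between releases, so they must never be
// persisted to disk; use Hash()/GetSliceHash() for anything on-disk.
uint64_t HashForInMemoryIndex(const rocksdb::Slice& key) {
  return rocksdb::GetSliceNPHash64(key);  // NPHash64(key.data(), key.size(), 0)
}
```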
+ +#include "util/jemalloc_nodump_allocator.h" + +#include +#include + +#include "port/likely.h" +#include "port/port.h" +#include "util/string_util.h" + +namespace rocksdb { + +#ifdef ROCKSDB_JEMALLOC_NODUMP_ALLOCATOR + +std::atomic JemallocNodumpAllocator::original_alloc_{nullptr}; + +JemallocNodumpAllocator::JemallocNodumpAllocator( + JemallocAllocatorOptions& options, + std::unique_ptr&& arena_hooks, unsigned arena_index) + : options_(options), + arena_hooks_(std::move(arena_hooks)), + arena_index_(arena_index), + tcache_(&JemallocNodumpAllocator::DestroyThreadSpecificCache) {} + +int JemallocNodumpAllocator::GetThreadSpecificCache(size_t size) { + // We always enable tcache. The only corner case is when there are a ton of + // threads accessing with low frequency, then it could consume a lot of + // memory (may reach # threads * ~1MB) without bringing too much benefit. + if (options_.limit_tcache_size && (size <= options_.tcache_size_lower_bound || + size > options_.tcache_size_upper_bound)) { + return MALLOCX_TCACHE_NONE; + } + unsigned* tcache_index = reinterpret_cast(tcache_.Get()); + if (UNLIKELY(tcache_index == nullptr)) { + // Instantiate tcache. + tcache_index = new unsigned(0); + size_t tcache_index_size = sizeof(unsigned); + int ret = + mallctl("tcache.create", tcache_index, &tcache_index_size, nullptr, 0); + if (ret != 0) { + // No good way to expose the error. Silently disable tcache. + delete tcache_index; + return MALLOCX_TCACHE_NONE; + } + tcache_.Reset(static_cast(tcache_index)); + } + return MALLOCX_TCACHE(*tcache_index); +} + +void* JemallocNodumpAllocator::Allocate(size_t size) { + int tcache_flag = GetThreadSpecificCache(size); + return mallocx(size, MALLOCX_ARENA(arena_index_) | tcache_flag); +} + +void JemallocNodumpAllocator::Deallocate(void* p) { + // Obtain tcache. + size_t size = 0; + if (options_.limit_tcache_size) { + size = malloc_usable_size(p); + } + int tcache_flag = GetThreadSpecificCache(size); + // No need to pass arena index to dallocx(). Jemalloc will find arena index + // from its own metadata. + dallocx(p, tcache_flag); +} + +void* JemallocNodumpAllocator::Alloc(extent_hooks_t* extent, void* new_addr, + size_t size, size_t alignment, bool* zero, + bool* commit, unsigned arena_ind) { + extent_alloc_t* original_alloc = + original_alloc_.load(std::memory_order_relaxed); + assert(original_alloc != nullptr); + void* result = original_alloc(extent, new_addr, size, alignment, zero, commit, + arena_ind); + if (result != nullptr) { + int ret = madvise(result, size, MADV_DONTDUMP); + if (ret != 0) { + fprintf( + stderr, + "JemallocNodumpAllocator failed to set MADV_DONTDUMP, error code: %d", + ret); + assert(false); + } + } + return result; +} + +Status JemallocNodumpAllocator::DestroyArena(unsigned arena_index) { + assert(arena_index != 0); + std::string key = "arena." + ToString(arena_index) + ".destroy"; + int ret = mallctl(key.c_str(), nullptr, 0, nullptr, 0); + if (ret != 0) { + return Status::Incomplete("Failed to destroy jemalloc arena, error code: " + + ToString(ret)); + } + return Status::OK(); +} + +void JemallocNodumpAllocator::DestroyThreadSpecificCache(void* ptr) { + assert(ptr != nullptr); + unsigned* tcache_index = static_cast(ptr); + size_t tcache_index_size = sizeof(unsigned); + int ret __attribute__((__unused__)) = + mallctl("tcache.destroy", nullptr, 0, tcache_index, tcache_index_size); + // Silently ignore error. 
+ assert(ret == 0); + delete tcache_index; +} + +JemallocNodumpAllocator::~JemallocNodumpAllocator() { + // Destroy tcache before destroying arena. + autovector tcache_list; + tcache_.Scrape(&tcache_list, nullptr); + for (void* tcache_index : tcache_list) { + DestroyThreadSpecificCache(tcache_index); + } + // Destroy arena. Silently ignore error. + Status s __attribute__((__unused__)) = DestroyArena(arena_index_); + assert(s.ok()); +} + +size_t JemallocNodumpAllocator::UsableSize(void* p, + size_t /*allocation_size*/) const { + return malloc_usable_size(static_cast(p)); +} +#endif // ROCKSDB_JEMALLOC_NODUMP_ALLOCATOR + +Status NewJemallocNodumpAllocator( + JemallocAllocatorOptions& options, + std::shared_ptr* memory_allocator) { + *memory_allocator = nullptr; + Status unsupported = Status::NotSupported( + "JemallocNodumpAllocator only available with jemalloc version >= 5 " + "and MADV_DONTDUMP is available."); +#ifndef ROCKSDB_JEMALLOC_NODUMP_ALLOCATOR + (void)options; + return unsupported; +#else + if (!HasJemalloc()) { + return unsupported; + } + if (memory_allocator == nullptr) { + return Status::InvalidArgument("memory_allocator must be non-null."); + } + if (options.limit_tcache_size && + options.tcache_size_lower_bound >= options.tcache_size_upper_bound) { + return Status::InvalidArgument( + "tcache_size_lower_bound larger or equal to tcache_size_upper_bound."); + } + + // Create arena. + unsigned arena_index = 0; + size_t arena_index_size = sizeof(arena_index); + int ret = + mallctl("arenas.create", &arena_index, &arena_index_size, nullptr, 0); + if (ret != 0) { + return Status::Incomplete("Failed to create jemalloc arena, error code: " + + ToString(ret)); + } + assert(arena_index != 0); + + // Read existing hooks. + std::string key = "arena." + ToString(arena_index) + ".extent_hooks"; + extent_hooks_t* hooks; + size_t hooks_size = sizeof(hooks); + ret = mallctl(key.c_str(), &hooks, &hooks_size, nullptr, 0); + if (ret != 0) { + JemallocNodumpAllocator::DestroyArena(arena_index); + return Status::Incomplete("Failed to read existing hooks, error code: " + + ToString(ret)); + } + + // Store existing alloc. + extent_alloc_t* original_alloc = hooks->alloc; + extent_alloc_t* expected = nullptr; + bool success = + JemallocNodumpAllocator::original_alloc_.compare_exchange_strong( + expected, original_alloc); + if (!success && original_alloc != expected) { + JemallocNodumpAllocator::DestroyArena(arena_index); + return Status::Incomplete("Original alloc conflict."); + } + + // Set the custom hook. + std::unique_ptr new_hooks(new extent_hooks_t(*hooks)); + new_hooks->alloc = &JemallocNodumpAllocator::Alloc; + extent_hooks_t* hooks_ptr = new_hooks.get(); + ret = mallctl(key.c_str(), nullptr, nullptr, &hooks_ptr, sizeof(hooks_ptr)); + if (ret != 0) { + JemallocNodumpAllocator::DestroyArena(arena_index); + return Status::Incomplete("Failed to set custom hook, error code: " + + ToString(ret)); + } + + // Create cache allocator. + memory_allocator->reset( + new JemallocNodumpAllocator(options, std::move(new_hooks), arena_index)); + return Status::OK(); +#endif // ROCKSDB_JEMALLOC_NODUMP_ALLOCATOR +} + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/util/jemalloc_nodump_allocator.h b/ceph/src/rocksdb/util/jemalloc_nodump_allocator.h new file mode 100644 index 000000000..e93c12237 --- /dev/null +++ b/ceph/src/rocksdb/util/jemalloc_nodump_allocator.h @@ -0,0 +1,79 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
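How a caller might obtain the allocator created above; `MakeNodumpBlockCache` is invented for illustration, and wiring the allocator into `LRUCacheOptions::memory_allocator` assumes the block-cache plumbing added elsewhere in this release:

```cpp
#include <memory>
#include "rocksdb/cache.h"
#include "rocksdb/memory_allocator.h"

rocksdb::Status MakeNodumpBlockCache(std::shared_ptr<rocksdb::Cache>* cache) {
  rocksdb::JemallocAllocatorOptions jopts;  // tcache limiting off by default
  std::shared_ptr<rocksdb::MemoryAllocator> allocator;
  rocksdb::Status s = rocksdb::NewJemallocNodumpAllocator(jopts, &allocator);
  if (!s.ok()) {
    return s;  // e.g. jemalloc < 5 or MADV_DONTDUMP unavailable
  }
  rocksdb::LRUCacheOptions copts;
  copts.capacity = 1 << 30;  // cache contents excluded from core dumps
  copts.memory_allocator = allocator;
  *cache = rocksdb::NewLRUCache(copts);
  return rocksdb::Status::OK();
}
```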
+// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +#pragma once + +#include +#include + +#include "port/jemalloc_helper.h" +#include "port/port.h" +#include "rocksdb/memory_allocator.h" +#include "util/core_local.h" +#include "util/thread_local.h" + +#if defined(ROCKSDB_JEMALLOC) && defined(ROCKSDB_PLATFORM_POSIX) + +#include + +#if (JEMALLOC_VERSION_MAJOR >= 5) && defined(MADV_DONTDUMP) +#define ROCKSDB_JEMALLOC_NODUMP_ALLOCATOR + +namespace rocksdb { + +class JemallocNodumpAllocator : public MemoryAllocator { + public: + JemallocNodumpAllocator(JemallocAllocatorOptions& options, + std::unique_ptr&& arena_hooks, + unsigned arena_index); + ~JemallocNodumpAllocator(); + + const char* Name() const override { return "JemallocNodumpAllocator"; } + void* Allocate(size_t size) override; + void Deallocate(void* p) override; + size_t UsableSize(void* p, size_t allocation_size) const override; + + private: + friend Status NewJemallocNodumpAllocator( + JemallocAllocatorOptions& options, + std::shared_ptr* memory_allocator); + + // Custom alloc hook to replace jemalloc default alloc. + static void* Alloc(extent_hooks_t* extent, void* new_addr, size_t size, + size_t alignment, bool* zero, bool* commit, + unsigned arena_ind); + + // Destroy arena on destruction of the allocator, or on failure. + static Status DestroyArena(unsigned arena_index); + + // Destroy tcache on destruction of the allocator, or thread exit. + static void DestroyThreadSpecificCache(void* ptr); + + // Get or create tcache. Return flag suitable to use with `mallocx`: + // either MALLOCX_TCACHE_NONE or MALLOCX_TCACHE(tc). + int GetThreadSpecificCache(size_t size); + + // A function pointer to jemalloc default alloc. Use atomic to make sure + // NewJemallocNodumpAllocator is thread-safe. + // + // Hack: original_alloc_ needs to be static for Alloc() to access it. + // alloc needs to be static to pass to jemalloc as function pointer. + static std::atomic original_alloc_; + + const JemallocAllocatorOptions options_; + + // Custom hooks has to outlive corresponding arena. + const std::unique_ptr arena_hooks_; + + // Arena index. + const unsigned arena_index_; + + // Hold thread-local tcache index. 
+ ThreadLocalPtr tcache_; +}; + +} // namespace rocksdb +#endif // (JEMALLOC_VERSION_MAJOR >= 5) && MADV_DONTDUMP +#endif // ROCKSDB_JEMALLOC && ROCKSDB_PLATFORM_POSIX diff --git a/ceph/src/rocksdb/util/kv_map.h b/ceph/src/rocksdb/util/kv_map.h index 784a244ae..d5ba3307f 100644 --- a/ceph/src/rocksdb/util/kv_map.h +++ b/ceph/src/rocksdb/util/kv_map.h @@ -10,7 +10,6 @@ #include "rocksdb/comparator.h" #include "rocksdb/slice.h" #include "util/coding.h" -#include "util/murmurhash.h" namespace rocksdb { namespace stl_wrappers { diff --git a/ceph/src/rocksdb/util/log_write_bench.cc b/ceph/src/rocksdb/util/log_write_bench.cc index b4e12b948..5c9b3e84b 100644 --- a/ceph/src/rocksdb/util/log_write_bench.cc +++ b/ceph/src/rocksdb/util/log_write_bench.cc @@ -35,9 +35,9 @@ void RunBenchmark() { Env* env = Env::Default(); EnvOptions env_options = env->OptimizeForLogWrite(EnvOptions()); env_options.bytes_per_sync = FLAGS_bytes_per_sync; - unique_ptr file; + std::unique_ptr file; env->NewWritableFile(file_name, &file, env_options); - unique_ptr writer; + std::unique_ptr writer; writer.reset(new WritableFileWriter(std::move(file), env_options)); std::string record; diff --git a/ceph/src/rocksdb/util/logging.h b/ceph/src/rocksdb/util/logging.h index 992e0018d..a4ef31bd6 100644 --- a/ceph/src/rocksdb/util/logging.h +++ b/ceph/src/rocksdb/util/logging.h @@ -11,40 +11,51 @@ // with macros. #pragma once -#include "port/port.h" // Helper macros that include information about file name and line number -#define STRINGIFY(x) #x -#define TOSTRING(x) STRINGIFY(x) -#define PREPEND_FILE_LINE(FMT) ("[" __FILE__ ":" TOSTRING(__LINE__) "] " FMT) +#define ROCKS_LOG_STRINGIFY(x) #x +#define ROCKS_LOG_TOSTRING(x) ROCKS_LOG_STRINGIFY(x) +#define ROCKS_LOG_PREPEND_FILE_LINE(FMT) ("[%s:" ROCKS_LOG_TOSTRING(__LINE__) "] " FMT) + +inline const char* RocksLogShorterFileName(const char* file) +{ + // 15 is the length of "util/logging.h". + // If the name of this file changed, please change this number, too. + return file + (sizeof(__FILE__) > 15 ? sizeof(__FILE__) - 15 : 0); +} // Don't inclide file/line info in HEADER level -#define ROCKS_LOG_HEADER(LGR, FMT, ...) \ +#define ROCKS_LOG_HEADER(LGR, FMT, ...) \ rocksdb::Log(InfoLogLevel::HEADER_LEVEL, LGR, FMT, ##__VA_ARGS__) -#define ROCKS_LOG_DEBUG(LGR, FMT, ...) \ - rocksdb::Log(InfoLogLevel::DEBUG_LEVEL, LGR, PREPEND_FILE_LINE(FMT), \ - ##__VA_ARGS__) +#define ROCKS_LOG_DEBUG(LGR, FMT, ...) \ + rocksdb::Log(InfoLogLevel::DEBUG_LEVEL, LGR, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) + +#define ROCKS_LOG_INFO(LGR, FMT, ...) \ + rocksdb::Log(InfoLogLevel::INFO_LEVEL, LGR, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) -#define ROCKS_LOG_INFO(LGR, FMT, ...) \ - rocksdb::Log(InfoLogLevel::INFO_LEVEL, LGR, PREPEND_FILE_LINE(FMT), \ - ##__VA_ARGS__) +#define ROCKS_LOG_WARN(LGR, FMT, ...) \ + rocksdb::Log(InfoLogLevel::WARN_LEVEL, LGR, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) -#define ROCKS_LOG_WARN(LGR, FMT, ...) \ - rocksdb::Log(InfoLogLevel::WARN_LEVEL, LGR, PREPEND_FILE_LINE(FMT), \ - ##__VA_ARGS__) +#define ROCKS_LOG_ERROR(LGR, FMT, ...) \ + rocksdb::Log(InfoLogLevel::ERROR_LEVEL, LGR, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) -#define ROCKS_LOG_ERROR(LGR, FMT, ...) \ - rocksdb::Log(InfoLogLevel::ERROR_LEVEL, LGR, PREPEND_FILE_LINE(FMT), \ - ##__VA_ARGS__) +#define ROCKS_LOG_FATAL(LGR, FMT, ...) 
\ + rocksdb::Log(InfoLogLevel::FATAL_LEVEL, LGR, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) -#define ROCKS_LOG_FATAL(LGR, FMT, ...) \ - rocksdb::Log(InfoLogLevel::FATAL_LEVEL, LGR, PREPEND_FILE_LINE(FMT), \ - ##__VA_ARGS__) +#define ROCKS_LOG_BUFFER(LOG_BUF, FMT, ...) \ + rocksdb::LogToBuffer(LOG_BUF, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) -#define ROCKS_LOG_BUFFER(LOG_BUF, FMT, ...) \ - rocksdb::LogToBuffer(LOG_BUF, PREPEND_FILE_LINE(FMT), ##__VA_ARGS__) +#define ROCKS_LOG_BUFFER_MAX_SZ(LOG_BUF, MAX_LOG_SIZE, FMT, ...) \ + rocksdb::LogToBuffer(LOG_BUF, MAX_LOG_SIZE, ROCKS_LOG_PREPEND_FILE_LINE(FMT), \ + RocksLogShorterFileName(__FILE__), ##__VA_ARGS__) -#define ROCKS_LOG_BUFFER_MAX_SZ(LOG_BUF, MAX_LOG_SIZE, FMT, ...) \ - rocksdb::LogToBuffer(LOG_BUF, MAX_LOG_SIZE, PREPEND_FILE_LINE(FMT), \ - ##__VA_ARGS__) +#define ROCKS_LOG_DETAILS(LGR, FMT, ...) \ + ; // due to overhead by default skip such lines +// ROCKS_LOG_DEBUG(LGR, FMT, ##__VA_ARGS__) diff --git a/ceph/src/rocksdb/util/memory_allocator.h b/ceph/src/rocksdb/util/memory_allocator.h new file mode 100644 index 000000000..99a7241d0 --- /dev/null +++ b/ceph/src/rocksdb/util/memory_allocator.h @@ -0,0 +1,38 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// + +#pragma once + +#include "rocksdb/memory_allocator.h" + +namespace rocksdb { + +struct CustomDeleter { + CustomDeleter(MemoryAllocator* a = nullptr) : allocator(a) {} + + void operator()(char* ptr) const { + if (allocator) { + allocator->Deallocate(reinterpret_cast(ptr)); + } else { + delete[] ptr; + } + } + + MemoryAllocator* allocator; +}; + +using CacheAllocationPtr = std::unique_ptr; + +inline CacheAllocationPtr AllocateBlock(size_t size, + MemoryAllocator* allocator) { + if (allocator) { + auto block = reinterpret_cast(allocator->Allocate(size)); + return CacheAllocationPtr(block, allocator); + } + return CacheAllocationPtr(new char[size]); +} + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/util/mock_time_env.h b/ceph/src/rocksdb/util/mock_time_env.h new file mode 100644 index 000000000..feada4777 --- /dev/null +++ b/ceph/src/rocksdb/util/mock_time_env.h @@ -0,0 +1,45 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). 
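An expansion sketch for the reworked logging macros above (`LogFlushExample` is invented; the expansion shown in the comment is approximate):

```cpp
#include "rocksdb/env.h"
#include "util/logging.h"

void LogFlushExample(rocksdb::Logger* logger) {
  ROCKS_LOG_INFO(logger, "flushed %d memtables", 2);
  // Roughly expands to:
  //   rocksdb::Log(InfoLogLevel::INFO_LEVEL, logger,
  //                "[%s:<LINE>] flushed %d memtables",
  //                RocksLogShorterFileName(__FILE__), 2);
}
```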
+ +#pragma once + +#include "rocksdb/env.h" + +namespace rocksdb { + +class MockTimeEnv : public EnvWrapper { + public: + explicit MockTimeEnv(Env* base) : EnvWrapper(base) {} + + virtual Status GetCurrentTime(int64_t* time) override { + assert(time != nullptr); + assert(current_time_ <= + static_cast(std::numeric_limits::max())); + *time = static_cast(current_time_); + return Status::OK(); + } + + virtual uint64_t NowMicros() override { + assert(current_time_ <= std::numeric_limits::max() / 1000000); + return current_time_ * 1000000; + } + + virtual uint64_t NowNanos() override { + assert(current_time_ <= std::numeric_limits::max() / 1000000000); + return current_time_ * 1000000000; + } + + uint64_t RealNowMicros() { return target()->NowMicros(); } + + void set_current_time(uint64_t time) { + assert(time >= current_time_); + current_time_ = time; + } + + private: + std::atomic current_time_{0}; +}; + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/util/repeatable_thread.h b/ceph/src/rocksdb/util/repeatable_thread.h index 34164ca56..967cc4994 100644 --- a/ceph/src/rocksdb/util/repeatable_thread.h +++ b/ceph/src/rocksdb/util/repeatable_thread.h @@ -10,6 +10,7 @@ #include "port/port.h" #include "rocksdb/env.h" +#include "util/mock_time_env.h" #include "util/mutexlock.h" namespace rocksdb { @@ -24,6 +25,7 @@ class RepeatableThread { env_(env), delay_us_(delay_us), initial_delay_us_(initial_delay_us), + mutex_(env), cond_var_(&mutex_), running_(true), #ifndef NDEBUG @@ -35,7 +37,7 @@ class RepeatableThread { void cancel() { { - MutexLock l(&mutex_); + InstrumentedMutexLock l(&mutex_); if (!running_) { return; } @@ -45,6 +47,8 @@ class RepeatableThread { thread_.join(); } + bool IsRunning() { return running_; } + ~RepeatableThread() { cancel(); } #ifndef NDEBUG @@ -55,7 +59,7 @@ class RepeatableThread { // // Note: only support one caller of this method. void TEST_WaitForRun(std::function callback = nullptr) { - MutexLock l(&mutex_); + InstrumentedMutexLock l(&mutex_); while (!waiting_) { cond_var_.Wait(); } @@ -72,7 +76,7 @@ class RepeatableThread { private: bool wait(uint64_t delay) { - MutexLock l(&mutex_); + InstrumentedMutexLock l(&mutex_); if (running_ && delay > 0) { uint64_t wait_until = env_->NowMicros() + delay; #ifndef NDEBUG @@ -111,7 +115,7 @@ class RepeatableThread { function_(); #ifndef NDEBUG { - MutexLock l(&mutex_); + InstrumentedMutexLock l(&mutex_); run_count_++; cond_var_.SignalAll(); } @@ -127,8 +131,8 @@ class RepeatableThread { // Mutex lock should be held when accessing running_, waiting_ // and run_count_. - port::Mutex mutex_; - port::CondVar cond_var_; + InstrumentedMutex mutex_; + InstrumentedCondVar cond_var_; bool running_; #ifndef NDEBUG // RepeatableThread waiting for timeout. 
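An illustrative pairing of `MockTimeEnv` with `RepeatableThread`, mirroring the test that follows (`MockClockSketch` is an invented name):

```cpp
#include <atomic>
#include <memory>
#include "rocksdb/env.h"
#include "util/mock_time_env.h"
#include "util/repeatable_thread.h"

void MockClockSketch() {
  std::unique_ptr<rocksdb::MockTimeEnv> env(
      new rocksdb::MockTimeEnv(rocksdb::Env::Default()));
  env->set_current_time(0);  // seconds
  std::atomic<int> count{0};
  // Fires once per simulated second after a one-second initial delay.
  rocksdb::RepeatableThread thread([&] { count++; }, "rt_sketch", env.get(),
                                   1000000 /* delay_us */,
                                   1000000 /* initial_delay_us */);
  env->set_current_time(1);  // advance the mock clock toward the first run
  thread.cancel();
}
```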
diff --git a/ceph/src/rocksdb/util/repeatable_thread_test.cc b/ceph/src/rocksdb/util/repeatable_thread_test.cc index dec437da3..ee853c105 100644 --- a/ceph/src/rocksdb/util/repeatable_thread_test.cc +++ b/ceph/src/rocksdb/util/repeatable_thread_test.cc @@ -8,6 +8,7 @@ #include "db/db_test_util.h" #include "util/repeatable_thread.h" +#include "util/sync_point.h" #include "util/testharness.h" class RepeatableThreadTest : public testing::Test { @@ -56,6 +57,35 @@ TEST_F(RepeatableThreadTest, MockEnvTest) { constexpr int kIteration = 3; mock_env_->set_current_time(0); // in seconds std::atomic count{0}; + +#if defined(OS_MACOSX) && !defined(NDEBUG) + rocksdb::SyncPoint::GetInstance()->DisableProcessing(); + rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks(); + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "InstrumentedCondVar::TimedWaitInternal", [&](void* arg) { + // Obtain the current (real) time in seconds and add 1000 extra seconds + // to ensure that RepeatableThread::wait invokes TimedWait with a time + // greater than (real) current time. This is to prevent the TimedWait + // function from returning immediately without sleeping and releasing + // the mutex on certain platforms, e.g. OS X. If TimedWait returns + // immediately, the mutex will not be released, and + // RepeatableThread::TEST_WaitForRun never has a chance to execute the + // callback which, in this case, updates the result returned by + // mock_env->NowMicros. Consequently, RepeatableThread::wait cannot + // break out of the loop, causing test to hang. The extra 1000 seconds + // is a best-effort approach because there seems no reliable and + // deterministic way to provide the aforementioned guarantee. By the + // time RepeatableThread::wait is called, it is no guarantee that the + // delay + mock_env->NowMicros will be greater than the current real + // time. However, 1000 seconds should be sufficient in most cases. + uint64_t time_us = *reinterpret_cast(arg); + if (time_us < mock_env_->RealNowMicros()) { + *reinterpret_cast(arg) = mock_env_->RealNowMicros() + 1000; + } + }); + rocksdb::SyncPoint::GetInstance()->EnableProcessing(); +#endif // OS_MACOSX && !NDEBUG + rocksdb::RepeatableThread thread([&] { count++; }, "rt_test", mock_env_.get(), 1 * kSecond, 1 * kSecond); for (int i = 1; i <= kIteration; i++) { diff --git a/ceph/src/rocksdb/util/slice.cc b/ceph/src/rocksdb/util/slice.cc index ed8c48169..5e23ae0a3 100644 --- a/ceph/src/rocksdb/util/slice.cc +++ b/ceph/src/rocksdb/util/slice.cc @@ -32,27 +32,27 @@ class FixedPrefixTransform : public SliceTransform { // the class implementation itself. name_("rocksdb.FixedPrefix." 
+ ToString(prefix_len_)) {} - virtual const char* Name() const override { return name_.c_str(); } + const char* Name() const override { return name_.c_str(); } - virtual Slice Transform(const Slice& src) const override { + Slice Transform(const Slice& src) const override { assert(InDomain(src)); return Slice(src.data(), prefix_len_); } - virtual bool InDomain(const Slice& src) const override { + bool InDomain(const Slice& src) const override { return (src.size() >= prefix_len_); } - virtual bool InRange(const Slice& dst) const override { + bool InRange(const Slice& dst) const override { return (dst.size() == prefix_len_); } - virtual bool FullLengthEnabled(size_t* len) const override { + bool FullLengthEnabled(size_t* len) const override { *len = prefix_len_; return true; } - virtual bool SameResultWhenAppended(const Slice& prefix) const override { + bool SameResultWhenAppended(const Slice& prefix) const override { return InDomain(prefix); } }; @@ -72,25 +72,25 @@ class CappedPrefixTransform : public SliceTransform { // the class implementation itself. name_("rocksdb.CappedPrefix." + ToString(cap_len_)) {} - virtual const char* Name() const override { return name_.c_str(); } + const char* Name() const override { return name_.c_str(); } - virtual Slice Transform(const Slice& src) const override { + Slice Transform(const Slice& src) const override { assert(InDomain(src)); return Slice(src.data(), std::min(cap_len_, src.size())); } - virtual bool InDomain(const Slice& /*src*/) const override { return true; } + bool InDomain(const Slice& /*src*/) const override { return true; } - virtual bool InRange(const Slice& dst) const override { + bool InRange(const Slice& dst) const override { return (dst.size() <= cap_len_); } - virtual bool FullLengthEnabled(size_t* len) const override { + bool FullLengthEnabled(size_t* len) const override { *len = cap_len_; return true; } - virtual bool SameResultWhenAppended(const Slice& prefix) const override { + bool SameResultWhenAppended(const Slice& prefix) const override { return prefix.size() >= cap_len_; } }; @@ -99,15 +99,15 @@ class NoopTransform : public SliceTransform { public: explicit NoopTransform() { } - virtual const char* Name() const override { return "rocksdb.Noop"; } + const char* Name() const override { return "rocksdb.Noop"; } - virtual Slice Transform(const Slice& src) const override { return src; } + Slice Transform(const Slice& src) const override { return src; } - virtual bool InDomain(const Slice& /*src*/) const override { return true; } + bool InDomain(const Slice& /*src*/) const override { return true; } - virtual bool InRange(const Slice& /*dst*/) const override { return true; } + bool InRange(const Slice& /*dst*/) const override { return true; } - virtual bool SameResultWhenAppended(const Slice& /*prefix*/) const override { + bool SameResultWhenAppended(const Slice& /*prefix*/) const override { return false; } }; diff --git a/ceph/src/rocksdb/util/slice_transform_test.cc b/ceph/src/rocksdb/util/slice_transform_test.cc index ddbb9f4bf..f91675cce 100644 --- a/ceph/src/rocksdb/util/slice_transform_test.cc +++ b/ceph/src/rocksdb/util/slice_transform_test.cc @@ -24,7 +24,7 @@ TEST_F(SliceTransformTest, CapPrefixTransform) { std::string s; s = "abcdefge"; - unique_ptr transform; + std::unique_ptr transform; transform.reset(NewCappedPrefixTransform(6)); ASSERT_EQ(transform->Transform(s).ToString(), "abcdef"); @@ -57,7 +57,7 @@ class SliceTransformDBTest : public testing::Test { EXPECT_OK(DestroyDB(dbname_, last_options_)); } - 
~SliceTransformDBTest() {
+  ~SliceTransformDBTest() override {
     delete db_;
     EXPECT_OK(DestroyDB(dbname_, last_options_));
   }
@@ -115,7 +115,7 @@ TEST_F(SliceTransformDBTest, CapPrefix) {
   ASSERT_OK(db()->Put(wo, "foo3", "bar3"));
   ASSERT_OK(db()->Flush(fo));
-  unique_ptr<Iterator> iter(db()->NewIterator(ro));
+  std::unique_ptr<Iterator> iter(db()->NewIterator(ro));
   iter->Seek("foo");
   ASSERT_OK(iter->status());
diff --git a/ceph/src/rocksdb/util/sst_file_manager_impl.cc b/ceph/src/rocksdb/util/sst_file_manager_impl.cc
index ee1394bc9..6a770b106 100644
--- a/ceph/src/rocksdb/util/sst_file_manager_impl.cc
+++ b/ceph/src/rocksdb/util/sst_file_manager_impl.cc
@@ -5,6 +5,11 @@
 #include "util/sst_file_manager_impl.h"
 
+#ifndef __STDC_FORMAT_MACROS
+#define __STDC_FORMAT_MACROS
+#endif
+
+#include <inttypes.h>
 #include
 
 #include "db/db_impl.h"
@@ -189,8 +194,11 @@ bool SstFileManagerImpl::EnoughRoomForCompaction(
     needed_headroom -= in_progress_files_size_;
     if (free_space < needed_headroom + size_added_by_compaction) {
       // We hit the condition of not enough disk space
-      ROCKS_LOG_ERROR(logger_, "free space [%d bytes] is less than " "needed headroom [%d bytes]\n", free_space, needed_headroom);
+      ROCKS_LOG_ERROR(logger_,
+                      "free space [%" PRIu64
+                      " bytes] is less than " "needed headroom [%" ROCKSDB_PRIszt " bytes]\n",
+                      free_space, needed_headroom);
       return false;
     }
   }
@@ -266,17 +274,22 @@ void SstFileManagerImpl::ClearError() {
     // now
     if (bg_err_.severity() == Status::Severity::kHardError) {
       if (free_space < reserved_disk_buffer_) {
-        ROCKS_LOG_ERROR(logger_, "free space [%d bytes] is less than " "required disk buffer [%d bytes]\n", free_space,
-            reserved_disk_buffer_);
+        ROCKS_LOG_ERROR(logger_,
+                        "free space [%" PRIu64
+                        " bytes] is less than " "required disk buffer [%" PRIu64 " bytes]\n",
+                        free_space, reserved_disk_buffer_);
         ROCKS_LOG_ERROR(logger_, "Cannot clear hard error\n");
         s = Status::NoSpace();
       }
     } else if (bg_err_.severity() == Status::Severity::kSoftError) {
       if (free_space < free_space_trigger_) {
-        ROCKS_LOG_WARN(logger_, "free space [%d bytes] is less than " "free space for compaction trigger [%d bytes]\n", free_space,
-            free_space_trigger_);
+        ROCKS_LOG_WARN(logger_,
+                       "free space [%" PRIu64
+                       " bytes] is less than " "free space for compaction trigger [%" PRIu64
+                       " bytes]\n",
+                       free_space, free_space_trigger_);
         ROCKS_LOG_WARN(logger_, "Cannot clear soft error\n");
         s = Status::NoSpace();
       }
@@ -402,8 +415,11 @@ bool SstFileManagerImpl::CancelErrorRecovery(ErrorHandler* handler) {
 }
 
 Status SstFileManagerImpl::ScheduleFileDeletion(
-    const std::string& file_path, const std::string& path_to_sync) {
-  return delete_scheduler_.DeleteFile(file_path, path_to_sync);
+    const std::string& file_path, const std::string& path_to_sync,
+    const bool force_bg) {
+  TEST_SYNC_POINT("SstFileManagerImpl::ScheduleFileDeletion");
+  return delete_scheduler_.DeleteFile(file_path, path_to_sync,
+                                      force_bg);
 }
 
 void SstFileManagerImpl::WaitForEmptyTrash() {
diff --git a/ceph/src/rocksdb/util/sst_file_manager_impl.h b/ceph/src/rocksdb/util/sst_file_manager_impl.h
index d11035df8..211b4fa71 100644
--- a/ceph/src/rocksdb/util/sst_file_manager_impl.h
+++ b/ceph/src/rocksdb/util/sst_file_manager_impl.h
@@ -111,9 +111,12 @@ class SstFileManagerImpl : public SstFileManager {
   // not guaranteed
   bool CancelErrorRecovery(ErrorHandler* db);
 
-  // Mark file as trash and schedule it's deletion.
+  // Mark file as trash and schedule its deletion. If force_bg is set, it
+  // forces the file to be deleted in the background regardless of DB size,
+  // except when rate-limited deletion is disabled
   virtual Status ScheduleFileDeletion(const std::string& file_path,
-                                      const std::string& dir_to_sync);
+                                      const std::string& dir_to_sync,
+                                      const bool force_bg = false);
 
   // Wait for all files being deleteing in the background to finish or for
   // destructor to be called.
diff --git a/ceph/src/rocksdb/util/status.cc b/ceph/src/rocksdb/util/status.cc
index 5b3dcf8e9..c66bf6f8e 100644
--- a/ceph/src/rocksdb/util/status.cc
+++ b/ceph/src/rocksdb/util/status.cc
@@ -41,7 +41,8 @@ static const char* msgs[static_cast<size_t>(Status::kMaxSubCode)] = {
     "Deadlock",                  // kDeadlock
     "Stale file handle",         // kStaleFile
     "Memory limit reached",      // kMemoryLimit
-    "Space limit reached"        // kSpaceLimit
+    "Space limit reached",       // kSpaceLimit
+    "No such file or directory", // kPathNotFound
 };
 
 Status::Status(Code _code, SubCode _subcode, const Slice& msg,
diff --git a/ceph/src/rocksdb/util/stop_watch.h b/ceph/src/rocksdb/util/stop_watch.h
index b018eb1d6..afa708e37 100644
--- a/ceph/src/rocksdb/util/stop_watch.h
+++ b/ceph/src/rocksdb/util/stop_watch.h
@@ -22,7 +22,10 @@ class StopWatch {
         hist_type_(hist_type),
         elapsed_(elapsed),
         overwrite_(overwrite),
-        stats_enabled_(statistics && statistics->HistEnabledForType(hist_type)),
+        stats_enabled_(statistics &&
+                       statistics->get_stats_level() >=
+                           StatsLevel::kExceptTimers &&
+                       statistics->HistEnabledForType(hist_type)),
         delay_enabled_(delay_enabled),
         total_delay_(0),
         delay_start_time_(0),
@@ -41,9 +44,10 @@ class StopWatch {
       *elapsed_ -= total_delay_;
     }
     if (stats_enabled_) {
-      statistics_->measureTime(hist_type_,
-          (elapsed_ != nullptr) ? *elapsed_ :
-          (env_->NowMicros() - start_time_));
+      statistics_->reportTimeToHistogram(
+          hist_type_, (elapsed_ != nullptr) ?
*elapsed_ + : (env_->NowMicros() - start_time_)); } } diff --git a/ceph/src/rocksdb/util/string_util.cc b/ceph/src/rocksdb/util/string_util.cc index f3581105e..26e6759ac 100644 --- a/ceph/src/rocksdb/util/string_util.cc +++ b/ceph/src/rocksdb/util/string_util.cc @@ -21,6 +21,7 @@ #include #include #include "rocksdb/env.h" +#include "port/port.h" #include "rocksdb/slice.h" namespace rocksdb { @@ -276,6 +277,15 @@ uint32_t ParseUint32(const std::string& value) { } } +int32_t ParseInt32(const std::string& value) { + int64_t num = ParseInt64(value); + if (num <= port::kMaxInt32 && num >= port::kMinInt32) { + return static_cast(num); + } else { + throw std::out_of_range(value); + } +} + #endif uint64_t ParseUint64(const std::string& value) { @@ -303,6 +313,31 @@ uint64_t ParseUint64(const std::string& value) { return num; } +int64_t ParseInt64(const std::string& value) { + size_t endchar; +#ifndef CYGWIN + int64_t num = std::stoll(value.c_str(), &endchar); +#else + char* endptr; + int64_t num = std::strtoll(value.c_str(), &endptr, 0); + endchar = endptr - value.c_str(); +#endif + + if (endchar < value.length()) { + char c = value[endchar]; + if (c == 'k' || c == 'K') + num <<= 10LL; + else if (c == 'm' || c == 'M') + num <<= 20LL; + else if (c == 'g' || c == 'G') + num <<= 30LL; + else if (c == 't' || c == 'T') + num <<= 40LL; + } + + return num; +} + int ParseInt(const std::string& value) { size_t endchar; #ifndef CYGWIN diff --git a/ceph/src/rocksdb/util/string_util.h b/ceph/src/rocksdb/util/string_util.h index b2bca40ac..6e125ddfa 100644 --- a/ceph/src/rocksdb/util/string_util.h +++ b/ceph/src/rocksdb/util/string_util.h @@ -109,12 +109,17 @@ std::string trim(const std::string& str); bool ParseBoolean(const std::string& type, const std::string& value); uint32_t ParseUint32(const std::string& value); + +int32_t ParseInt32(const std::string& value); #endif uint64_t ParseUint64(const std::string& value); int ParseInt(const std::string& value); + +int64_t ParseInt64(const std::string& value); + double ParseDouble(const std::string& value); size_t ParseSizeT(const std::string& value); diff --git a/ceph/src/rocksdb/util/sync_point.cc b/ceph/src/rocksdb/util/sync_point.cc index ce0fa0a97..4599c256d 100644 --- a/ceph/src/rocksdb/util/sync_point.cc +++ b/ceph/src/rocksdb/util/sync_point.cc @@ -17,9 +17,7 @@ SyncPoint* SyncPoint::GetInstance() { return &sync_point; } -SyncPoint::SyncPoint() : - impl_(new Data) { -} +SyncPoint::SyncPoint() : impl_(new Data) {} SyncPoint:: ~SyncPoint() { delete impl_; diff --git a/ceph/src/rocksdb/util/testutil.cc b/ceph/src/rocksdb/util/testutil.cc index 0983f759c..ec95d107e 100644 --- a/ceph/src/rocksdb/util/testutil.cc +++ b/ceph/src/rocksdb/util/testutil.cc @@ -87,11 +87,9 @@ class Uint64ComparatorImpl : public Comparator { public: Uint64ComparatorImpl() {} - virtual const char* Name() const override { - return "rocksdb.Uint64Comparator"; - } + const char* Name() const override { return "rocksdb.Uint64Comparator"; } - virtual int Compare(const Slice& a, const Slice& b) const override { + int Compare(const Slice& a, const Slice& b) const override { assert(a.size() == sizeof(uint64_t) && b.size() == sizeof(uint64_t)); const uint64_t* left = reinterpret_cast(a.data()); const uint64_t* right = reinterpret_cast(b.data()); @@ -108,14 +106,12 @@ class Uint64ComparatorImpl : public Comparator { } } - virtual void FindShortestSeparator(std::string* /*start*/, - const Slice& /*limit*/) const override { + void FindShortestSeparator(std::string* /*start*/, + const Slice& 
/*limit*/) const override { return; } - virtual void FindShortSuccessor(std::string* /*key*/) const override { - return; - } + void FindShortSuccessor(std::string* /*key*/) const override { return; } }; } // namespace @@ -126,19 +122,19 @@ const Comparator* Uint64Comparator() { WritableFileWriter* GetWritableFileWriter(WritableFile* wf, const std::string& fname) { - unique_ptr file(wf); + std::unique_ptr file(wf); return new WritableFileWriter(std::move(file), fname, EnvOptions()); } RandomAccessFileReader* GetRandomAccessFileReader(RandomAccessFile* raf) { - unique_ptr file(raf); + std::unique_ptr file(raf); return new RandomAccessFileReader(std::move(file), "[test RandomAccessFileReader]"); } SequentialFileReader* GetSequentialFileReader(SequentialFile* se, const std::string& fname) { - unique_ptr file(se); + std::unique_ptr file(se); return new SequentialFileReader(std::move(file), fname); } @@ -310,6 +306,7 @@ void RandomInitCFOptions(ColumnFamilyOptions* cf_opt, Random* rnd) { cf_opt->purge_redundant_kvs_while_flush = rnd->Uniform(2); cf_opt->force_consistency_checks = rnd->Uniform(2); cf_opt->compaction_options_fifo.allow_compaction = rnd->Uniform(2); + cf_opt->memtable_whole_key_filtering = rnd->Uniform(2); // double options cf_opt->hard_rate_limit = static_cast(rnd->Uniform(10000)) / 13; @@ -355,7 +352,6 @@ void RandomInitCFOptions(ColumnFamilyOptions* cf_opt, Random* rnd) { cf_opt->target_file_size_base * rnd->Uniform(100); cf_opt->compaction_options_fifo.max_table_files_size = uint_max + rnd->Uniform(10000); - cf_opt->compaction_options_fifo.ttl = uint_max + rnd->Uniform(10000); // unsigned int options cf_opt->rate_limit_delay_max_milliseconds = rnd->Uniform(10000); @@ -401,5 +397,21 @@ Status DestroyDir(Env* env, const std::string& dir) { return s; } +bool IsDirectIOSupported(Env* env, const std::string& dir) { + EnvOptions env_options; + env_options.use_mmap_writes = false; + env_options.use_direct_writes = true; + std::string tmp = TempFileName(dir, 999); + Status s; + { + std::unique_ptr file; + s = env->NewWritableFile(tmp, &file, env_options); + } + if (s.ok()) { + s = env->DeleteFile(tmp); + } + return s.ok(); +} + } // namespace test } // namespace rocksdb diff --git a/ceph/src/rocksdb/util/testutil.h b/ceph/src/rocksdb/util/testutil.h index c16c0cbe5..2aab3df72 100644 --- a/ceph/src/rocksdb/util/testutil.h +++ b/ceph/src/rocksdb/util/testutil.h @@ -64,7 +64,7 @@ class ErrorEnv : public EnvWrapper { num_writable_file_errors_(0) { } virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, + std::unique_ptr* result, const EnvOptions& soptions) override { result->reset(); if (writable_file_error_) { @@ -554,7 +554,7 @@ class StringEnv : public EnvWrapper { const Status WriteToNewFile(const std::string& file_name, const std::string& content) { - unique_ptr r; + std::unique_ptr r; auto s = NewWritableFile(file_name, &r, EnvOptions()); if (!s.ok()) { return s; @@ -567,7 +567,8 @@ class StringEnv : public EnvWrapper { } // The following text is boilerplate that forwards all methods to target() - Status NewSequentialFile(const std::string& f, unique_ptr* r, + Status NewSequentialFile(const std::string& f, + std::unique_ptr* r, const EnvOptions& /*options*/) override { auto iter = files_.find(f); if (iter == files_.end()) { @@ -577,11 +578,11 @@ class StringEnv : public EnvWrapper { return Status::OK(); } Status NewRandomAccessFile(const std::string& /*f*/, - unique_ptr* /*r*/, + std::unique_ptr* /*r*/, const EnvOptions& /*options*/) override { return 
Status::NotSupported(); } - Status NewWritableFile(const std::string& f, unique_ptr* r, + Status NewWritableFile(const std::string& f, std::unique_ptr* r, const EnvOptions& /*options*/) override { auto iter = files_.find(f); if (iter != files_.end()) { @@ -591,7 +592,7 @@ class StringEnv : public EnvWrapper { return Status::OK(); } virtual Status NewDirectory(const std::string& /*name*/, - unique_ptr* /*result*/) override { + std::unique_ptr* /*result*/) override { return Status::NotSupported(); } Status FileExists(const std::string& f) override { @@ -747,5 +748,7 @@ std::string RandomName(Random* rnd, const size_t len); Status DestroyDir(Env* env, const std::string& dir); +bool IsDirectIOSupported(Env* env, const std::string& dir); + } // namespace test } // namespace rocksdb diff --git a/ceph/src/rocksdb/util/thread_local.cc b/ceph/src/rocksdb/util/thread_local.cc index dea2002a0..7346eff11 100644 --- a/ceph/src/rocksdb/util/thread_local.cc +++ b/ceph/src/rocksdb/util/thread_local.cc @@ -204,7 +204,7 @@ extern "C" { // The linker must not discard thread_callback_on_exit. (We force a reference // to this variable with a linker /include:symbol pragma to ensure that.) If // this variable is discarded, the OnThreadExit function will never be called. -#ifdef _WIN64 +#ifndef _X86_ // .CRT section is merged with .rdata on x64 so it must be constant data. #pragma const_seg(".CRT$XLB") @@ -219,7 +219,7 @@ const PIMAGE_TLS_CALLBACK p_thread_callback_on_exit = #pragma comment(linker, "/include:_tls_used") #pragma comment(linker, "/include:p_thread_callback_on_exit") -#else // _WIN64 +#else // _X86_ #pragma data_seg(".CRT$XLB") PIMAGE_TLS_CALLBACK p_thread_callback_on_exit = wintlscleanup::WinOnThreadExit; @@ -229,7 +229,7 @@ PIMAGE_TLS_CALLBACK p_thread_callback_on_exit = wintlscleanup::WinOnThreadExit; #pragma comment(linker, "/INCLUDE:__tls_used") #pragma comment(linker, "/INCLUDE:_p_thread_callback_on_exit") -#endif // _WIN64 +#endif // _X86_ #else // https://github.com/couchbase/gperftools/blob/master/src/windows/port.cc diff --git a/ceph/src/rocksdb/util/thread_operation.h b/ceph/src/rocksdb/util/thread_operation.h index 025392b59..f1827da0a 100644 --- a/ceph/src/rocksdb/util/thread_operation.h +++ b/ceph/src/rocksdb/util/thread_operation.h @@ -70,7 +70,7 @@ static OperationStageInfo global_op_stage_table[] = { {ThreadStatus::STAGE_MEMTABLE_ROLLBACK, "MemTableList::RollbackMemtableFlush"}, {ThreadStatus::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS, - "MemTableList::InstallMemtableFlushResults"}, + "MemTableList::TryInstallMemtableFlushResults"}, }; // The structure that describes a state. diff --git a/ceph/src/rocksdb/util/threadpool_imp.cc b/ceph/src/rocksdb/util/threadpool_imp.cc index d850b7c9e..acac0063b 100644 --- a/ceph/src/rocksdb/util/threadpool_imp.cc +++ b/ceph/src/rocksdb/util/threadpool_imp.cc @@ -188,7 +188,7 @@ void ThreadPoolImpl::Impl::BGThread(size_t thread_id) { bool low_cpu_priority = false; while (true) { -// Wait until there is an item that is ready to run + // Wait until there is an item that is ready to run std::unique_lock lock(mu_); // Stop waiting if the thread needs to do work or needs to terminate. 
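A note on the IsDirectIOSupported helper added to testutil above: it probes a directory by opening (and immediately deleting) a scratch file with use_direct_writes enabled, so callers can skip direct-I/O test cases on filesystems that reject O_DIRECT, such as tmpfs. A usage sketch; the probe directory is an arbitrary choice here, not anything mandated by the patch:

```cpp
#include <iostream>
#include <string>
#include "rocksdb/env.h"
#include "util/testutil.h"

int main() {
  rocksdb::Env* env = rocksdb::Env::Default();
  std::string dir = "/tmp/rocksdb_dio_probe";  // hypothetical scratch dir
  env->CreateDirIfMissing(dir);

  if (!rocksdb::test::IsDirectIOSupported(env, dir)) {
    std::cout << "direct I/O unsupported here; skipping direct-I/O cases\n";
    return 0;
  }

  rocksdb::EnvOptions opts;
  opts.use_direct_writes = true;  // safe now that the probe succeeded
  // ... open files / run the direct-I/O test cases ...
  return 0;
}
```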
while (!exit_all_threads_ && !IsLastExcessiveThread(thread_id) && @@ -198,7 +198,7 @@ void ThreadPoolImpl::Impl::BGThread(size_t thread_id) { if (exit_all_threads_) { // mechanism to let BG threads exit safely - if(!wait_for_jobs_to_complete_ || + if (!wait_for_jobs_to_complete_ || queue_.empty()) { break; } @@ -292,6 +292,9 @@ void* ThreadPoolImpl::Impl::BGThreadWrapper(void* arg) { case Env::Priority::BOTTOM: thread_type = ThreadStatus::BOTTOM_PRIORITY; break; + case Env::Priority::USER: + thread_type = ThreadStatus::USER; + break; case Env::Priority::TOTAL: assert(false); return nullptr; diff --git a/ceph/src/rocksdb/util/timer_queue.h b/ceph/src/rocksdb/util/timer_queue.h index f068ffefb..bd8a4f850 100644 --- a/ceph/src/rocksdb/util/timer_queue.h +++ b/ceph/src/rocksdb/util/timer_queue.h @@ -22,8 +22,6 @@ #pragma once -#include "port/port.h" - #include #include #include @@ -33,6 +31,9 @@ #include #include +#include "port/port.h" +#include "util/sync_point.h" + // Allows execution of handlers at a specified time in the future // Guarantees: // - All handlers are executed ONCE, even if cancelled (aborted parameter will @@ -48,7 +49,13 @@ class TimerQueue { public: TimerQueue() : m_th(&TimerQueue::run, this) {} - ~TimerQueue() { + ~TimerQueue() { shutdown(); } + + // This function is not thread-safe. + void shutdown() { + if (closed_) { + return; + } cancelAll(); // Abusing the timer queue to trigger the shutdown. add(0, [this](bool) { @@ -56,6 +63,7 @@ class TimerQueue { return std::make_pair(false, 0); }); m_th.join(); + closed_ = true; } // Adds a new timer @@ -67,6 +75,7 @@ class TimerQueue { WorkItem item; Clock::time_point tp = Clock::now(); item.end = tp + std::chrono::milliseconds(milliseconds); + TEST_SYNC_POINT_CALLBACK("TimeQueue::Add:item.end", &item.end); item.period = milliseconds; item.handler = std::move(handler); @@ -217,4 +226,5 @@ class TimerQueue { std::vector& getContainer() { return this->c; } } m_items; rocksdb::port::Thread m_th; + bool closed_ = false; }; diff --git a/ceph/src/rocksdb/util/trace_replay.cc b/ceph/src/rocksdb/util/trace_replay.cc index cd2e3ee95..28160b292 100644 --- a/ceph/src/rocksdb/util/trace_replay.cc +++ b/ceph/src/rocksdb/util/trace_replay.cc @@ -16,6 +16,8 @@ namespace rocksdb { +const std::string kTraceMagic = "feedcafedeadbeef"; + namespace { void EncodeCFAndKey(std::string* dst, uint32_t cf_id, const Slice& key) { PutFixed32(dst, cf_id); @@ -29,45 +31,88 @@ void DecodeCFAndKey(std::string& buffer, uint32_t* cf_id, Slice* key) { } } // namespace -Tracer::Tracer(Env* env, std::unique_ptr&& trace_writer) - : env_(env), trace_writer_(std::move(trace_writer)) { +Tracer::Tracer(Env* env, const TraceOptions& trace_options, + std::unique_ptr&& trace_writer) + : env_(env), + trace_options_(trace_options), + trace_writer_(std::move(trace_writer)), + trace_request_count_ (0) { WriteHeader(); } Tracer::~Tracer() { trace_writer_.reset(); } Status Tracer::Write(WriteBatch* write_batch) { + TraceType trace_type = kTraceWrite; + if (ShouldSkipTrace(trace_type)) { + return Status::OK(); + } Trace trace; trace.ts = env_->NowMicros(); - trace.type = kTraceWrite; + trace.type = trace_type; trace.payload = write_batch->Data(); return WriteTrace(trace); } Status Tracer::Get(ColumnFamilyHandle* column_family, const Slice& key) { + TraceType trace_type = kTraceGet; + if (ShouldSkipTrace(trace_type)) { + return Status::OK(); + } Trace trace; trace.ts = env_->NowMicros(); - trace.type = kTraceGet; + trace.type = trace_type; EncodeCFAndKey(&trace.payload, 
column_family->GetID(), key); return WriteTrace(trace); } Status Tracer::IteratorSeek(const uint32_t& cf_id, const Slice& key) { + TraceType trace_type = kTraceIteratorSeek; + if (ShouldSkipTrace(trace_type)) { + return Status::OK(); + } Trace trace; trace.ts = env_->NowMicros(); - trace.type = kTraceIteratorSeek; + trace.type = trace_type; EncodeCFAndKey(&trace.payload, cf_id, key); return WriteTrace(trace); } Status Tracer::IteratorSeekForPrev(const uint32_t& cf_id, const Slice& key) { + TraceType trace_type = kTraceIteratorSeekForPrev; + if (ShouldSkipTrace(trace_type)) { + return Status::OK(); + } Trace trace; trace.ts = env_->NowMicros(); - trace.type = kTraceIteratorSeekForPrev; + trace.type = trace_type; EncodeCFAndKey(&trace.payload, cf_id, key); return WriteTrace(trace); } +bool Tracer::ShouldSkipTrace(const TraceType& trace_type) { + if (IsTraceFileOverMax()) { + return true; + } + if ((trace_options_.filter & kTraceFilterGet + && trace_type == kTraceGet) + || (trace_options_.filter & kTraceFilterWrite + && trace_type == kTraceWrite)) { + return true; + } + ++trace_request_count_; + if (trace_request_count_ < trace_options_.sampling_frequency) { + return true; + } + trace_request_count_ = 0; + return false; +} + +bool Tracer::IsTraceFileOverMax() { + uint64_t trace_file_size = trace_writer_->GetFileSize(); + return (trace_file_size > trace_options_.max_trace_file_size); +} + Status Tracer::WriteHeader() { std::ostringstream s; s << kTraceMagic << "\t" @@ -103,7 +148,7 @@ Status Tracer::WriteTrace(const Trace& trace) { Status Tracer::Close() { return WriteFooter(); } Replayer::Replayer(DB* db, const std::vector& handles, - unique_ptr&& reader) + std::unique_ptr&& reader) : trace_reader_(std::move(reader)) { assert(db != nullptr); db_ = static_cast(db->GetRootDB()); diff --git a/ceph/src/rocksdb/util/trace_replay.h b/ceph/src/rocksdb/util/trace_replay.h index b324696f0..749ea2f64 100644 --- a/ceph/src/rocksdb/util/trace_replay.h +++ b/ceph/src/rocksdb/util/trace_replay.h @@ -10,6 +10,7 @@ #include #include "rocksdb/env.h" +#include "rocksdb/options.h" #include "rocksdb/trace_reader_writer.h" namespace rocksdb { @@ -21,7 +22,7 @@ class DBImpl; class Slice; class WriteBatch; -const std::string kTraceMagic = "feedcafedeadbeef"; +extern const std::string kTraceMagic; const unsigned int kTraceTimestampSize = 8; const unsigned int kTraceTypeSize = 1; const unsigned int kTracePayloadLengthSize = 4; @@ -55,13 +56,15 @@ struct Trace { // Trace RocksDB operations using a TraceWriter. class Tracer { public: - Tracer(Env* env, std::unique_ptr&& trace_writer); + Tracer(Env* env, const TraceOptions& trace_options, + std::unique_ptr&& trace_writer); ~Tracer(); Status Write(WriteBatch* write_batch); Status Get(ColumnFamilyHandle* cfname, const Slice& key); Status IteratorSeek(const uint32_t& cf_id, const Slice& key); Status IteratorSeekForPrev(const uint32_t& cf_id, const Slice& key); + bool IsTraceFileOverMax(); Status Close(); @@ -69,9 +72,12 @@ class Tracer { Status WriteHeader(); Status WriteFooter(); Status WriteTrace(const Trace& trace); + bool ShouldSkipTrace(const TraceType& type); Env* env_; - unique_ptr trace_writer_; + TraceOptions trace_options_; + std::unique_ptr trace_writer_; + uint64_t trace_request_count_; }; // Replay RocksDB operations from a trace. 
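The TraceOptions plumbing above gates every Tracer entry point on ShouldSkipTrace: the filter bits drop whole operation categories, IsTraceFileOverMax stops tracing once the file reaches max_trace_file_size, and the request counter keeps exactly one request out of every sampling_frequency. A standalone paraphrase of just the sampling arithmetic (the Sampler type is ours, not a RocksDB type):

```cpp
#include <cstdint>
#include <iostream>

// Mirrors Tracer::ShouldSkipTrace's counter logic: with
// sampling_frequency == n, one of every n requests is recorded,
// and the counter resets each time a request is kept.
struct Sampler {
  uint64_t sampling_frequency;
  uint64_t count = 0;

  bool ShouldSkip() {
    ++count;
    if (count < sampling_frequency) {
      return true;   // skip this request
    }
    count = 0;
    return false;    // record this request
  }
};

int main() {
  Sampler s{4};  // keep 1 in 4
  int kept = 0;
  for (int i = 0; i < 100; ++i) {
    if (!s.ShouldSkip()) kept++;
  }
  std::cout << kept << "\n";  // 25
  return 0;
}
```

Note that a frequency of 0 or 1 makes the `count < sampling_frequency` test always false, so every request is traced, which matches the default behavior.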
diff --git a/ceph/src/rocksdb/util/transaction_test_util.cc b/ceph/src/rocksdb/util/transaction_test_util.cc
index 633391891..30cff11e1 100644
--- a/ceph/src/rocksdb/util/transaction_test_util.cc
+++ b/ceph/src/rocksdb/util/transaction_test_util.cc
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <random>
 #include
 #include
@@ -20,6 +21,10 @@
 #include "rocksdb/utilities/optimistic_transaction_db.h"
 #include "rocksdb/utilities/transaction.h"
 #include "rocksdb/utilities/transaction_db.h"
+
+#include "db/dbformat.h"
+#include "db/snapshot_impl.h"
+#include "util/logging.h"
 #include "util/random.h"
 #include "util/string_util.h"
@@ -27,13 +32,15 @@ namespace rocksdb {
 RandomTransactionInserter::RandomTransactionInserter(
     Random64* rand, const WriteOptions& write_options,
-    const ReadOptions& read_options, uint64_t num_keys, uint16_t num_sets)
+    const ReadOptions& read_options, uint64_t num_keys, uint16_t num_sets,
+    const uint64_t cmt_delay_ms, const uint64_t first_id)
     : rand_(rand),
       write_options_(write_options),
       read_options_(read_options),
       num_keys_(num_keys),
       num_sets_(num_sets),
-      txn_id_(0) {}
+      txn_id_(first_id),
+      cmt_delay_ms_(cmt_delay_ms) {}
 
 RandomTransactionInserter::~RandomTransactionInserter() {
   if (txn_ != nullptr) {
@@ -50,17 +57,18 @@ bool RandomTransactionInserter::TransactionDBInsert(
   std::hash<std::thread::id> hasher;
   char name[64];
-  snprintf(name, 64, "txn%" ROCKSDB_PRIszt "-%d",
+  snprintf(name, 64, "txn%" ROCKSDB_PRIszt "-%" PRIu64,
            hasher(std::this_thread::get_id()), txn_id_++);
   assert(strlen(name) < 64 - 1);
-  txn_->SetName(name);
+  assert(txn_->SetName(name).ok());
 
-  bool take_snapshot = rand_->OneIn(2);
+  // Take a snapshot if set_snapshot was not set, or with 50% chance otherwise
+  bool take_snapshot = txn_->GetSnapshot() == nullptr || rand_->OneIn(2);
   if (take_snapshot) {
     txn_->SetSnapshot();
     read_options_.snapshot = txn_->GetSnapshot();
   }
-  auto res = DoInsert(nullptr, txn_, false);
+  auto res = DoInsert(db, txn_, false);
   if (take_snapshot) {
     read_options_.snapshot = nullptr;
   }
@@ -73,7 +81,7 @@ bool RandomTransactionInserter::OptimisticTransactionDBInsert(
   optimistic_txn_ =
       db->BeginTransaction(write_options_, txn_options, optimistic_txn_);
-  return DoInsert(nullptr, optimistic_txn_, true);
+  return DoInsert(db, optimistic_txn_, true);
 }
 
 bool RandomTransactionInserter::DBInsert(DB* db) {
@@ -135,8 +143,7 @@ bool RandomTransactionInserter::DoInsert(DB* db, Transaction* txn,
   std::vector<uint16_t> set_vec(num_sets_);
   std::iota(set_vec.begin(), set_vec.end(), static_cast<uint16_t>(0));
-  std::random_shuffle(set_vec.begin(), set_vec.end(),
-                      [&](uint64_t r) { return rand_->Uniform(r); });
+  std::shuffle(set_vec.begin(), set_vec.end(), std::random_device{});
 
   // For each set, pick a key at random and increment it
   for (uint16_t set_i : set_vec) {
@@ -178,20 +185,41 @@ bool RandomTransactionInserter::DoInsert(DB* db, Transaction* txn,
       }
       bytes_inserted_ += key.size() + sum.size();
     }
+    if (txn != nullptr) {
+      ROCKS_LOG_DEBUG(db->GetDBOptions().info_log,
+                      "Insert (%s) %s snap: %" PRIu64 " key:%s value: %" PRIu64
+                      "+%" PRIu64 "=%" PRIu64,
+                      txn->GetName().c_str(), s.ToString().c_str(),
+                      txn->GetSnapshot()->GetSequenceNumber(), full_key.c_str(),
+                      int_value, incr, int_value + incr);
+    }
   }
 
   if (s.ok()) {
     if (txn != nullptr) {
-      if (!is_optimistic && !rand_->OneIn(10)) {
-        // also try commit without prpare
+      bool with_prepare = !is_optimistic && !rand_->OneIn(10);
+      if (with_prepare) {
+        // Also try commit without prepare
         s = txn->Prepare();
         assert(s.ok());
+        ROCKS_LOG_DEBUG(db->GetDBOptions().info_log,
+                        "Prepare of
%" PRIu64 " %s (%s)", txn->GetId(), + s.ToString().c_str(), txn->GetName().c_str()); + db->GetDBOptions().env->SleepForMicroseconds( + static_cast(cmt_delay_ms_ * 1000)); } if (!rand_->OneIn(20)) { s = txn->Commit(); + assert(!with_prepare || s.ok()); + ROCKS_LOG_DEBUG(db->GetDBOptions().info_log, + "Commit of %" PRIu64 " %s (%s)", txn->GetId(), + s.ToString().c_str(), txn->GetName().c_str()); } else { // Also try 5% rollback s = txn->Rollback(); + ROCKS_LOG_DEBUG(db->GetDBOptions().info_log, + "Rollback %" PRIu64 " %s %s", txn->GetId(), + txn->GetName().c_str(), s.ToString().c_str()); assert(s.ok()); } assert(is_optimistic || s.ok()); @@ -228,6 +256,8 @@ bool RandomTransactionInserter::DoInsert(DB* db, Transaction* txn, } else { if (txn != nullptr) { assert(txn->Rollback().ok()); + ROCKS_LOG_DEBUG(db->GetDBOptions().info_log, "Error %s for txn %s", + s.ToString().c_str(), txn->GetName().c_str()); } } @@ -246,7 +276,11 @@ bool RandomTransactionInserter::DoInsert(DB* db, Transaction* txn, // Verify that the sum of the keys in each set are equal Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets, uint64_t num_keys_per_set, - bool take_snapshot, Random64* rand) { + bool take_snapshot, Random64* rand, + uint64_t delay_ms) { + // delay_ms is the delay between taking a snapshot and doing the reads. It + // emulates reads from a long-running backup job. + assert(delay_ms == 0 || take_snapshot); uint64_t prev_total = 0; uint32_t prev_i = 0; bool prev_assigned = false; @@ -254,14 +288,14 @@ Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets, ReadOptions roptions; if (take_snapshot) { roptions.snapshot = db->GetSnapshot(); + db->GetDBOptions().env->SleepForMicroseconds( + static_cast(delay_ms * 1000)); } std::vector set_vec(num_sets); std::iota(set_vec.begin(), set_vec.end(), static_cast(0)); - if (rand) { - std::random_shuffle(set_vec.begin(), set_vec.end(), - [&](uint64_t r) { return rand->Uniform(r); }); - } + std::shuffle(set_vec.begin(), set_vec.end(), std::random_device{}); + // For each set of keys with the same prefix, sum all the values for (uint16_t set_i : set_vec) { // Five digits (since the largest uint16_t is 65535) plus the NUL @@ -273,7 +307,9 @@ Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets, // Use either point lookup or iterator. Point lookups are slower so we use // it less often. - if (num_keys_per_set != 0 && rand && rand->OneIn(10)) { // use point lookup + const bool use_point_lookup = + num_keys_per_set != 0 && rand && rand->OneIn(10); + if (use_point_lookup) { ReadOptions read_options; for (uint64_t k = 0; k < num_keys_per_set; k++) { std::string dont_care; @@ -301,17 +337,37 @@ Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets, value.ToString().c_str()); return Status::Corruption(); } + ROCKS_LOG_DEBUG( + db->GetDBOptions().info_log, + "VerifyRead at %" PRIu64 " (%" PRIu64 "): %.*s value: %" PRIu64, + roptions.snapshot ? roptions.snapshot->GetSequenceNumber() : 0ul, + roptions.snapshot + ? ((SnapshotImpl*)roptions.snapshot)->min_uncommitted_ + : 0ul, + static_cast(key.size()), key.data(), int_value); total += int_value; } delete iter; } if (prev_assigned && total != prev_total) { + db->GetDBOptions().info_log->Flush(); fprintf(stdout, - "RandomTransactionVerify found inconsistent totals. " - "Set[%" PRIu32 "]: %" PRIu64 ", Set[%" PRIu32 "]: %" PRIu64 " \n", - prev_i, prev_total, set_i, total); + "RandomTransactionVerify found inconsistent totals using " + "pointlookup? 
%d " + "Set[%" PRIu32 "]: %" PRIu64 ", Set[%" PRIu32 "]: %" PRIu64 + " at snapshot %" PRIu64 "\n", + use_point_lookup, prev_i, prev_total, set_i, total, + roptions.snapshot ? roptions.snapshot->GetSequenceNumber() : 0ul); + fflush(stdout); return Status::Corruption(); + } else { + ROCKS_LOG_DEBUG( + db->GetDBOptions().info_log, + "RandomTransactionVerify pass pointlookup? %d total: %" PRIu64 + " snap: %" PRIu64, + use_point_lookup, total, + roptions.snapshot ? roptions.snapshot->GetSequenceNumber() : 0ul); } prev_total = total; prev_i = set_i; diff --git a/ceph/src/rocksdb/util/transaction_test_util.h b/ceph/src/rocksdb/util/transaction_test_util.h index 414a4267e..1aa4196ab 100644 --- a/ceph/src/rocksdb/util/transaction_test_util.h +++ b/ceph/src/rocksdb/util/transaction_test_util.h @@ -35,10 +35,13 @@ class RandomTransactionInserter { public: // num_keys is the number of keys in each set. // num_sets is the number of sets of keys. + // cmt_delay_ms is the delay between prepare (if there is any) and commit + // first_id is the id of the first transaction explicit RandomTransactionInserter( Random64* rand, const WriteOptions& write_options = WriteOptions(), const ReadOptions& read_options = ReadOptions(), uint64_t num_keys = 1000, - uint16_t num_sets = 3); + uint16_t num_sets = 3, const uint64_t cmt_delay_ms = 0, + const uint64_t first_id = 0); ~RandomTransactionInserter(); @@ -76,7 +79,8 @@ class RandomTransactionInserter { // Returns OK if Invariant is true. static Status Verify(DB* db, uint16_t num_sets, uint64_t num_keys_per_set = 0, - bool take_snapshot = false, Random64* rand = nullptr); + bool take_snapshot = false, Random64* rand = nullptr, + uint64_t delay_ms = 0); // Returns the status of the previous Insert operation Status GetLastStatus() { return last_status_; } @@ -116,7 +120,9 @@ class RandomTransactionInserter { Transaction* txn_ = nullptr; Transaction* optimistic_txn_ = nullptr; - std::atomic txn_id_; + uint64_t txn_id_; + // The delay between ::Prepare and ::Commit + const uint64_t cmt_delay_ms_; bool DoInsert(DB* db, Transaction* txn, bool is_optimistic); }; diff --git a/ceph/src/rocksdb/util/user_comparator_wrapper.h b/ceph/src/rocksdb/util/user_comparator_wrapper.h new file mode 100644 index 000000000..43797709c --- /dev/null +++ b/ceph/src/rocksdb/util/user_comparator_wrapper.h @@ -0,0 +1,65 @@ +// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). +// Copyright (c) 2011 The LevelDB Authors. All rights reserved. +// Use of this source code is governed by a BSD-style license that can be +// found in the LICENSE file. See the AUTHORS file for names of contributors. + +#pragma once + +#include "monitoring/perf_context_imp.h" +#include "rocksdb/comparator.h" + +namespace rocksdb { + +// Wrapper of user comparator, with auto increment to +// perf_context.user_key_comparison_count. 
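Since every internal key comparison funnels through the wrapper defined next, this per-thread perf_context tally is a cheap way to measure comparison volume for a workload. A sketch of reading the counter through the public perf headers as we recall them (rocksdb/perf_context.h and rocksdb/perf_level.h); the path and the tiny workload are arbitrary:

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cmp_count_demo", &db);
  assert(s.ok());

  // Enable counting (timers not needed) and reset the per-thread counters.
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableCount);
  rocksdb::get_perf_context()->Reset();

  db->Put(rocksdb::WriteOptions(), "key1", "v1");
  std::string value;
  db->Get(rocksdb::ReadOptions(), "key1", &value);

  // Incremented by UserComparatorWrapper on every Compare/Equal call.
  std::cout << "user key comparisons: "
            << rocksdb::get_perf_context()->user_key_comparison_count << "\n";
  delete db;
  return 0;
}
```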
+class UserComparatorWrapper final : public Comparator {
+ public:
+  explicit UserComparatorWrapper(const Comparator* const user_cmp)
+      : user_comparator_(user_cmp) {}
+
+  ~UserComparatorWrapper() = default;
+
+  const Comparator* user_comparator() const { return user_comparator_; }
+
+  int Compare(const Slice& a, const Slice& b) const override {
+    PERF_COUNTER_ADD(user_key_comparison_count, 1);
+    return user_comparator_->Compare(a, b);
+  }
+
+  bool Equal(const Slice& a, const Slice& b) const override {
+    PERF_COUNTER_ADD(user_key_comparison_count, 1);
+    return user_comparator_->Equal(a, b);
+  }
+
+  const char* Name() const override { return user_comparator_->Name(); }
+
+  void FindShortestSeparator(std::string* start,
+                             const Slice& limit) const override {
+    return user_comparator_->FindShortestSeparator(start, limit);
+  }
+
+  void FindShortSuccessor(std::string* key) const override {
+    return user_comparator_->FindShortSuccessor(key);
+  }
+
+  const Comparator* GetRootComparator() const override {
+    return user_comparator_->GetRootComparator();
+  }
+
+  bool IsSameLengthImmediateSuccessor(const Slice& s,
+                                      const Slice& t) const override {
+    return user_comparator_->IsSameLengthImmediateSuccessor(s, t);
+  }
+
+  bool CanKeysWithDifferentByteContentsBeEqual() const override {
+    return user_comparator_->CanKeysWithDifferentByteContentsBeEqual();
+  }
+
+ private:
+  const Comparator* user_comparator_;
+};
+
+} // namespace rocksdb
diff --git a/ceph/src/rocksdb/util/vector_iterator.h b/ceph/src/rocksdb/util/vector_iterator.h
new file mode 100644
index 000000000..da60eb229
--- /dev/null
+++ b/ceph/src/rocksdb/util/vector_iterator.h
@@ -0,0 +1,100 @@
+#pragma once
+
+#include <algorithm>
+#include <string>
+#include <vector>
+
+#include "db/dbformat.h"
+#include "rocksdb/iterator.h"
+#include "rocksdb/slice.h"
+#include "table/internal_iterator.h"
+
+namespace rocksdb {
+
+// Iterator over a vector of keys/values
+class VectorIterator : public InternalIterator {
+ public:
+  VectorIterator(std::vector<std::string> keys, std::vector<std::string> values,
+                 const InternalKeyComparator* icmp)
+      : keys_(std::move(keys)),
+        values_(std::move(values)),
+        indexed_cmp_(icmp, &keys_),
+        current_(keys_.size()) {
+    assert(keys_.size() == values_.size());
+
+    indices_.reserve(keys_.size());
+    for (size_t i = 0; i < keys_.size(); i++) {
+      indices_.push_back(i);
+    }
+    std::sort(indices_.begin(), indices_.end(), indexed_cmp_);
+  }
+
+  virtual bool Valid() const override {
+    return !indices_.empty() && current_ < indices_.size();
+  }
+
+  virtual void SeekToFirst() override { current_ = 0; }
+  virtual void SeekToLast() override { current_ = indices_.size() - 1; }
+
+  virtual void Seek(const Slice& target) override {
+    current_ = std::lower_bound(indices_.begin(), indices_.end(), target,
+                                indexed_cmp_) -
+               indices_.begin();
+  }
+
+  virtual void SeekForPrev(const Slice& target) override {
+    current_ = std::lower_bound(indices_.begin(), indices_.end(), target,
+                                indexed_cmp_) -
+               indices_.begin();
+    if (!Valid()) {
+      SeekToLast();
+    } else {
+      Prev();
+    }
+  }
+
+  virtual void Next() override { current_++; }
+  virtual void Prev() override { current_--; }
+
+  virtual Slice key() const override {
+    return Slice(keys_[indices_[current_]]);
+  }
+  virtual Slice value() const override {
+    return Slice(values_[indices_[current_]]);
+  }
+
+  virtual Status status() const override { return Status::OK(); }
+
+  virtual bool IsKeyPinned() const override { return true; }
+  virtual bool IsValuePinned() const override { return true; }
+
+ private:
+  struct IndexedKeyComparator {
IndexedKeyComparator(const InternalKeyComparator* c, + const std::vector* ks) + : cmp(c), keys(ks) {} + + bool operator()(size_t a, size_t b) const { + return cmp->Compare((*keys)[a], (*keys)[b]) < 0; + } + + bool operator()(size_t a, const Slice& b) const { + return cmp->Compare((*keys)[a], b) < 0; + } + + bool operator()(const Slice& a, size_t b) const { + return cmp->Compare(a, (*keys)[b]) < 0; + } + + const InternalKeyComparator* cmp; + const std::vector* keys; + }; + + std::vector keys_; + std::vector values_; + IndexedKeyComparator indexed_cmp_; + std::vector indices_; + size_t current_; +}; + +} // namespace rocksdb diff --git a/ceph/src/rocksdb/util/xxhash.cc b/ceph/src/rocksdb/util/xxhash.cc index 4bce61a48..2ec95a636 100644 --- a/ceph/src/rocksdb/util/xxhash.cc +++ b/ceph/src/rocksdb/util/xxhash.cc @@ -34,6 +34,39 @@ You can contact the author at : //************************************** // Tuning parameters //************************************** +/*!XXH_FORCE_MEMORY_ACCESS : + * By default, access to unaligned memory is controlled by `memcpy()`, which is + * safe and portable. Unfortunately, on some target/compiler combinations, the + * generated assembly is sub-optimal. The below switch allow to select different + * access method for improved performance. Method 0 (default) : use `memcpy()`. + * Safe and portable. Method 1 : `__packed` statement. It depends on compiler + * extension (ie, not portable). This method is safe if your compiler supports + * it, and *generally* as fast or faster than `memcpy`. Method 2 : direct + * access. This method doesn't depend on compiler but violate C standard. It can + * generate buggy code on targets which do not support unaligned memory + * accesses. But in some circumstances, it's the only known way to get the most + * performance (ie GCC + ARMv6) See http://stackoverflow.com/a/32095106/646947 + * for details. Prefer these methods in priority order (0 > 1 > 2) + */ + +#include "util/util.h" + +#ifndef XXH_FORCE_MEMORY_ACCESS /* can be defined externally, on command line \ + for example */ +#if defined(__GNUC__) && \ + (defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \ + defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || \ + defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__)) +#define XXH_FORCE_MEMORY_ACCESS 2 +#elif (defined(__INTEL_COMPILER) && !defined(_WIN32)) || \ + (defined(__GNUC__) && \ + (defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || \ + defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || \ + defined(__ARM_ARCH_7S__))) +#define XXH_FORCE_MEMORY_ACCESS 1 +#endif +#endif + // Unaligned memory access is automatically enabled for "common" CPU, such as x86. // For others CPU, the compiler will be more cautious, and insert extra code to ensure aligned access is respected. // If you know your target CPU supports unaligned memory access, you want to force this option manually to improve performance. @@ -58,6 +91,21 @@ You can contact the author at : // This option has no impact on Little_Endian CPU. #define XXH_FORCE_NATIVE_FORMAT 0 +/*!XXH_FORCE_ALIGN_CHECK : + * This is a minor performance trick, only useful with lots of very small keys. + * It means : check for aligned/unaligned input. + * The check costs one initial branch per hash; + * set it to 0 when the input is guaranteed to be aligned, + * or when alignment doesn't matter for performance. 
+ */ +#ifndef XXH_FORCE_ALIGN_CHECK /* can be defined externally */ +#if defined(__i386) || defined(_M_IX86) || defined(__x86_64__) || \ + defined(_M_X64) +#define XXH_FORCE_ALIGN_CHECK 0 +#else +#define XXH_FORCE_ALIGN_CHECK 1 +#endif +#endif //************************************** // Compiler Specific Options @@ -91,7 +139,7 @@ FORCE_INLINE void XXH_free (void* p) { free(p); } // for memcpy() #include FORCE_INLINE void* XXH_memcpy(void* dest, const void* src, size_t size) { return memcpy(dest,src,size); } - +#include /* assert */ namespace rocksdb { //************************************** @@ -134,6 +182,34 @@ typedef struct _U32_S { U32 v; } _PACKED U32_S; #define A32(x) (((U32_S *)(x))->v) +#if (defined(XXH_FORCE_MEMORY_ACCESS) && (XXH_FORCE_MEMORY_ACCESS == 2)) + +/* Force direct memory access. Only works on CPU which support unaligned memory + * access in hardware */ +static U32 XXH_read32(const void* memPtr) { return *(const U32*)memPtr; } + +#elif (defined(XXH_FORCE_MEMORY_ACCESS) && (XXH_FORCE_MEMORY_ACCESS == 1)) + +/* __pack instructions are safer, but compiler specific, hence potentially + * problematic for some compilers */ +/* currently only defined for gcc and icc */ +typedef union { + U32 u32; +} __attribute__((packed)) unalign; +static U32 XXH_read32(const void* ptr) { return ((const unalign*)ptr)->u32; } + +#else + +/* portable and safe solution. Generally efficient. + * see : http://stackoverflow.com/a/32095106/646947 + */ +static U32 XXH_read32(const void* memPtr) { + U32 val; + memcpy(&val, memPtr, sizeof(val)); + return val; +} + +#endif /* XXH_FORCE_DIRECT_MEMORY_ACCESS */ //*************************************** // Compiler-specific Functions and Macros @@ -143,8 +219,10 @@ typedef struct _U32_S { U32 v; } _PACKED U32_S; // Note : although _rotl exists for minGW (GCC under windows), performance seems poor #if defined(_MSC_VER) # define XXH_rotl32(x,r) _rotl(x,r) +#define XXH_rotl64(x, r) _rotl64(x, r) #else # define XXH_rotl32(x,r) ((x << r) | (x >> (32 - r))) +#define XXH_rotl64(x, r) ((x << r) | (x >> (64 - r))) #endif #if defined(_MSC_VER) // Visual Studio @@ -199,12 +277,25 @@ FORCE_INLINE U32 XXH_readLE32_align(const U32* ptr, XXH_endianess endian, XXH_al return endian==XXH_littleEndian ? *ptr : XXH_swap32(*ptr); } -FORCE_INLINE U32 XXH_readLE32(const U32* ptr, XXH_endianess endian) { return XXH_readLE32_align(ptr, endian, XXH_unaligned); } +FORCE_INLINE U32 XXH_readLE32_align(const void* ptr, XXH_endianess endian, + XXH_alignment align) { + if (align == XXH_unaligned) + return endian == XXH_littleEndian ? XXH_read32(ptr) + : XXH_swap32(XXH_read32(ptr)); + else + return endian == XXH_littleEndian ? *(const U32*)ptr + : XXH_swap32(*(const U32*)ptr); +} +FORCE_INLINE U32 XXH_readLE32(const U32* ptr, XXH_endianess endian) { + return XXH_readLE32_align(ptr, endian, XXH_unaligned); +} //**************************** // Simple Hash Functions //**************************** +#define XXH_get32bits(p) XXH_readLE32_align(p, endian, align) + FORCE_INLINE U32 XXH32_endian_align(const void* input, int len, U32 seed, XXH_endianess endian, XXH_alignment align) { const BYTE* p = (const BYTE*)input; @@ -476,4 +567,508 @@ U32 XXH32_digest (void* state_in) return h32; } +/* ******************************************************************* + * 64-bit hash functions + *********************************************************************/ + + #if (defined(XXH_FORCE_MEMORY_ACCESS) && (XXH_FORCE_MEMORY_ACCESS==2)) + + /* Force direct memory access. 
Only works on CPU which support unaligned memory access in hardware */ + static U64 XXH_read64(const void* memPtr) { return *(const U64*) memPtr; } + + #elif (defined(XXH_FORCE_MEMORY_ACCESS) && (XXH_FORCE_MEMORY_ACCESS==1)) + + /* __pack instructions are safer, but compiler specific, hence potentially problematic for some compilers */ + /* currently only defined for gcc and icc */ + typedef union { U32 u32; U64 u64; } __attribute__((packed)) unalign64; + static U64 XXH_read64(const void* ptr) { return ((const unalign64*)ptr)->u64; } + + #else + + /* portable and safe solution. Generally efficient. + * see : http://stackoverflow.com/a/32095106/646947 + */ + + static U64 XXH_read64(const void* memPtr) + { + U64 val; + memcpy(&val, memPtr, sizeof(val)); + return val; + } +#endif /* XXH_FORCE_DIRECT_MEMORY_ACCESS */ + +#if defined(_MSC_VER) /* Visual Studio */ +#define XXH_swap64 _byteswap_uint64 +#elif XXH_GCC_VERSION >= 403 +#define XXH_swap64 __builtin_bswap64 +#else +static U64 XXH_swap64(U64 x) { + return ((x << 56) & 0xff00000000000000ULL) | + ((x << 40) & 0x00ff000000000000ULL) | + ((x << 24) & 0x0000ff0000000000ULL) | + ((x << 8) & 0x000000ff00000000ULL) | + ((x >> 8) & 0x00000000ff000000ULL) | + ((x >> 24) & 0x0000000000ff0000ULL) | + ((x >> 40) & 0x000000000000ff00ULL) | + ((x >> 56) & 0x00000000000000ffULL); +} +#endif + +FORCE_INLINE U64 XXH_readLE64_align(const void* ptr, XXH_endianess endian, + XXH_alignment align) { + if (align == XXH_unaligned) + return endian == XXH_littleEndian ? XXH_read64(ptr) + : XXH_swap64(XXH_read64(ptr)); + else + return endian == XXH_littleEndian ? *(const U64*)ptr + : XXH_swap64(*(const U64*)ptr); +} + +FORCE_INLINE U64 XXH_readLE64(const void* ptr, XXH_endianess endian) { + return XXH_readLE64_align(ptr, endian, XXH_unaligned); +} + +static U64 XXH_readBE64(const void* ptr) { + return XXH_CPU_LITTLE_ENDIAN ? 
XXH_swap64(XXH_read64(ptr)) : XXH_read64(ptr); +} + +/*====== xxh64 ======*/ + +static const U64 PRIME64_1 = + 11400714785074694791ULL; /* 0b1001111000110111011110011011000110000101111010111100101010000111 + */ +static const U64 PRIME64_2 = + 14029467366897019727ULL; /* 0b1100001010110010101011100011110100100111110101001110101101001111 + */ +static const U64 PRIME64_3 = + 1609587929392839161ULL; /* 0b0001011001010110011001111011000110011110001101110111100111111001 + */ +static const U64 PRIME64_4 = + 9650029242287828579ULL; /* 0b1000010111101011110010100111011111000010101100101010111001100011 + */ +static const U64 PRIME64_5 = + 2870177450012600261ULL; /* 0b0010011111010100111010110010111100010110010101100110011111000101 + */ + +static U64 XXH64_round(U64 acc, U64 input) { + acc += input * PRIME64_2; + acc = XXH_rotl64(acc, 31); + acc *= PRIME64_1; + return acc; +} + +static U64 XXH64_mergeRound(U64 acc, U64 val) { + val = XXH64_round(0, val); + acc ^= val; + acc = acc * PRIME64_1 + PRIME64_4; + return acc; +} + +static U64 XXH64_avalanche(U64 h64) { + h64 ^= h64 >> 33; + h64 *= PRIME64_2; + h64 ^= h64 >> 29; + h64 *= PRIME64_3; + h64 ^= h64 >> 32; + return h64; +} + +#define XXH_get64bits(p) XXH_readLE64_align(p, endian, align) + +static U64 XXH64_finalize(U64 h64, const void* ptr, size_t len, + XXH_endianess endian, XXH_alignment align) { + const BYTE* p = (const BYTE*)ptr; + +#define PROCESS1_64 \ + h64 ^= (*p++) * PRIME64_5; \ + h64 = XXH_rotl64(h64, 11) * PRIME64_1; + +#define PROCESS4_64 \ + h64 ^= (U64)(XXH_get32bits(p)) * PRIME64_1; \ + p += 4; \ + h64 = XXH_rotl64(h64, 23) * PRIME64_2 + PRIME64_3; + +#define PROCESS8_64 \ + { \ + U64 const k1 = XXH64_round(0, XXH_get64bits(p)); \ + p += 8; \ + h64 ^= k1; \ + h64 = XXH_rotl64(h64, 27) * PRIME64_1 + PRIME64_4; \ + } + + switch (len & 31) { + case 24: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 16: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 8: + PROCESS8_64; + return XXH64_avalanche(h64); + + case 28: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 20: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 12: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 4: + PROCESS4_64; + return XXH64_avalanche(h64); + + case 25: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 17: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 9: + PROCESS8_64; + PROCESS1_64; + return XXH64_avalanche(h64); + + case 29: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 21: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 13: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 5: + PROCESS4_64; + PROCESS1_64; + return XXH64_avalanche(h64); + + case 26: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 18: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 10: + PROCESS8_64; + PROCESS1_64; + PROCESS1_64; + return XXH64_avalanche(h64); + + case 30: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 22: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 14: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 6: + PROCESS4_64; + PROCESS1_64; + PROCESS1_64; + return XXH64_avalanche(h64); + + case 27: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 19: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 11: + PROCESS8_64; + PROCESS1_64; + PROCESS1_64; + PROCESS1_64; + return 
XXH64_avalanche(h64); + + case 31: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 23: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 15: + PROCESS8_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 7: + PROCESS4_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 3: + PROCESS1_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 2: + PROCESS1_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 1: + PROCESS1_64; + FALLTHROUGH_INTENDED; + /* fallthrough */ + case 0: + return XXH64_avalanche(h64); + } + + /* impossible to reach */ + assert(0); + return 0; /* unreachable, but some compilers complain without it */ +} + +FORCE_INLINE U64 XXH64_endian_align(const void* input, size_t len, U64 seed, + XXH_endianess endian, XXH_alignment align) { + const BYTE* p = (const BYTE*)input; + const BYTE* bEnd = p + len; + U64 h64; + +#if defined(XXH_ACCEPT_NULL_INPUT_POINTER) && \ + (XXH_ACCEPT_NULL_INPUT_POINTER >= 1) + if (p == NULL) { + len = 0; + bEnd = p = (const BYTE*)(size_t)32; + } +#endif + + if (len >= 32) { + const BYTE* const limit = bEnd - 32; + U64 v1 = seed + PRIME64_1 + PRIME64_2; + U64 v2 = seed + PRIME64_2; + U64 v3 = seed + 0; + U64 v4 = seed - PRIME64_1; + + do { + v1 = XXH64_round(v1, XXH_get64bits(p)); + p += 8; + v2 = XXH64_round(v2, XXH_get64bits(p)); + p += 8; + v3 = XXH64_round(v3, XXH_get64bits(p)); + p += 8; + v4 = XXH64_round(v4, XXH_get64bits(p)); + p += 8; + } while (p <= limit); + + h64 = XXH_rotl64(v1, 1) + XXH_rotl64(v2, 7) + XXH_rotl64(v3, 12) + + XXH_rotl64(v4, 18); + h64 = XXH64_mergeRound(h64, v1); + h64 = XXH64_mergeRound(h64, v2); + h64 = XXH64_mergeRound(h64, v3); + h64 = XXH64_mergeRound(h64, v4); + + } else { + h64 = seed + PRIME64_5; + } + + h64 += (U64)len; + + return XXH64_finalize(h64, p, len, endian, align); +} + +unsigned long long XXH64(const void* input, size_t len, + unsigned long long seed) { +#if 0 + /* Simple version, good for code maintenance, but unfortunately slow for small inputs */ + XXH64_state_t state; + XXH64_reset(&state, seed); + XXH64_update(&state, input, len); + return XXH64_digest(&state); +#else + XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN; + + if (XXH_FORCE_ALIGN_CHECK) { + if ((((size_t)input) & 7) == + 0) { /* Input is aligned, let's leverage the speed advantage */ + if ((endian_detected == XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT) + return XXH64_endian_align(input, len, seed, XXH_littleEndian, + XXH_aligned); + else + return XXH64_endian_align(input, len, seed, XXH_bigEndian, XXH_aligned); + } + } + + if ((endian_detected == XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT) + return XXH64_endian_align(input, len, seed, XXH_littleEndian, + XXH_unaligned); + else + return XXH64_endian_align(input, len, seed, XXH_bigEndian, XXH_unaligned); +#endif +} + +/*====== Hash Streaming ======*/ + +XXH64_state_t* XXH64_createState(void) { + return (XXH64_state_t*)XXH_malloc(sizeof(XXH64_state_t)); +} +XXH_errorcode XXH64_freeState(XXH64_state_t* statePtr) { + XXH_free(statePtr); + return XXH_OK; +} + +void XXH64_copyState(XXH64_state_t* dstState, const XXH64_state_t* srcState) { + memcpy(dstState, srcState, sizeof(*dstState)); +} + +XXH_errorcode XXH64_reset(XXH64_state_t* statePtr, unsigned long long seed) { + XXH64_state_t state; /* using a local state to memcpy() in order to avoid + strict-aliasing warnings */ + memset(&state, 0, sizeof(state)); + state.v1 = seed + PRIME64_1 + PRIME64_2; + state.v2 = seed + PRIME64_2; + state.v3 = seed + 0; + state.v4 = seed - 
PRIME64_1; + /* do not write into reserved, planned to be removed in a future version */ + memcpy(statePtr, &state, sizeof(state) - sizeof(state.reserved)); + return XXH_OK; +} + +FORCE_INLINE XXH_errorcode XXH64_update_endian(XXH64_state_t* state, + const void* input, size_t len, + XXH_endianess endian) { + if (input == NULL) +#if defined(XXH_ACCEPT_NULL_INPUT_POINTER) && \ + (XXH_ACCEPT_NULL_INPUT_POINTER >= 1) + return XXH_OK; +#else + return XXH_ERROR; +#endif + + { + const BYTE* p = (const BYTE*)input; + const BYTE* const bEnd = p + len; + + state->total_len += len; + + if (state->memsize + len < 32) { /* fill in tmp buffer */ + XXH_memcpy(((BYTE*)state->mem64) + state->memsize, input, len); + state->memsize += (U32)len; + return XXH_OK; + } + + if (state->memsize) { /* tmp buffer is full */ + XXH_memcpy(((BYTE*)state->mem64) + state->memsize, input, + 32 - state->memsize); + state->v1 = + XXH64_round(state->v1, XXH_readLE64(state->mem64 + 0, endian)); + state->v2 = + XXH64_round(state->v2, XXH_readLE64(state->mem64 + 1, endian)); + state->v3 = + XXH64_round(state->v3, XXH_readLE64(state->mem64 + 2, endian)); + state->v4 = + XXH64_round(state->v4, XXH_readLE64(state->mem64 + 3, endian)); + p += 32 - state->memsize; + state->memsize = 0; + } + + if (p + 32 <= bEnd) { + const BYTE* const limit = bEnd - 32; + U64 v1 = state->v1; + U64 v2 = state->v2; + U64 v3 = state->v3; + U64 v4 = state->v4; + + do { + v1 = XXH64_round(v1, XXH_readLE64(p, endian)); + p += 8; + v2 = XXH64_round(v2, XXH_readLE64(p, endian)); + p += 8; + v3 = XXH64_round(v3, XXH_readLE64(p, endian)); + p += 8; + v4 = XXH64_round(v4, XXH_readLE64(p, endian)); + p += 8; + } while (p <= limit); + + state->v1 = v1; + state->v2 = v2; + state->v3 = v3; + state->v4 = v4; + } + + if (p < bEnd) { + XXH_memcpy(state->mem64, p, (size_t)(bEnd - p)); + state->memsize = (unsigned)(bEnd - p); + } + } + + return XXH_OK; +} + +XXH_errorcode XXH64_update(XXH64_state_t* state_in, const void* input, + size_t len) { + XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN; + + if ((endian_detected == XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT) + return XXH64_update_endian(state_in, input, len, XXH_littleEndian); + else + return XXH64_update_endian(state_in, input, len, XXH_bigEndian); +} + +FORCE_INLINE U64 XXH64_digest_endian(const XXH64_state_t* state, + XXH_endianess endian) { + U64 h64; + + if (state->total_len >= 32) { + U64 const v1 = state->v1; + U64 const v2 = state->v2; + U64 const v3 = state->v3; + U64 const v4 = state->v4; + + h64 = XXH_rotl64(v1, 1) + XXH_rotl64(v2, 7) + XXH_rotl64(v3, 12) + + XXH_rotl64(v4, 18); + h64 = XXH64_mergeRound(h64, v1); + h64 = XXH64_mergeRound(h64, v2); + h64 = XXH64_mergeRound(h64, v3); + h64 = XXH64_mergeRound(h64, v4); + } else { + h64 = state->v3 /*seed*/ + PRIME64_5; + } + + h64 += (U64)state->total_len; + + return XXH64_finalize(h64, state->mem64, (size_t)state->total_len, endian, + XXH_aligned); +} + +unsigned long long XXH64_digest(const XXH64_state_t* state_in) { + XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN; + + if ((endian_detected == XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT) + return XXH64_digest_endian(state_in, XXH_littleEndian); + else + return XXH64_digest_endian(state_in, XXH_bigEndian); +} + +/*====== Canonical representation ======*/ + +void XXH64_canonicalFromHash(XXH64_canonical_t* dst, XXH64_hash_t hash) { + XXH_STATIC_ASSERT(sizeof(XXH64_canonical_t) == sizeof(XXH64_hash_t)); + if (XXH_CPU_LITTLE_ENDIAN) hash = XXH_swap64(hash); + 
memcpy(dst, &hash, sizeof(*dst));
+}
+
+XXH64_hash_t XXH64_hashFromCanonical(const XXH64_canonical_t* src) {
+  return XXH_readBE64(src);
+}
 }  // namespace rocksdb
diff --git a/ceph/src/rocksdb/util/xxhash.h b/ceph/src/rocksdb/util/xxhash.h
index 3343e3488..88352ac75 100644
--- a/ceph/src/rocksdb/util/xxhash.h
+++ b/ceph/src/rocksdb/util/xxhash.h
@@ -59,6 +59,14 @@ It depends on successfully passing SMHasher test set.
 #pragma once
 
+#include <stddef.h>
+
+#if !defined(__VMS) && \
+    (defined(__cplusplus) || \
+     (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */))
+#include <stdint.h>
+#endif
+
 #if defined (__cplusplus)
 namespace rocksdb {
 #endif
@@ -67,6 +75,7 @@ namespace rocksdb {
 //****************************
 // Type
 //****************************
+/* size_t */
 typedef enum { XXH_OK=0, XXH_ERROR } XXH_errorcode;
@@ -157,7 +166,74 @@ To free memory context, use XXH32_digest(), or free().
 #define XXH32_result XXH32_digest
 #define XXH32_getIntermediateResult XXH32_intermediateDigest
 
+/*-**********************************************************************
+ * 64-bit hash
+ ************************************************************************/
+typedef unsigned long long XXH64_hash_t;
+/*! XXH64() :
+    Calculate the 64-bit hash of sequence of length "len" stored at memory
+    address "input". "seed" can be used to alter the result predictably. This
+    function runs faster on 64-bit systems, but slower on 32-bit systems (see
+    benchmark).
+*/
+XXH64_hash_t XXH64(const void* input, size_t length, unsigned long long seed);
+
+/*====== Streaming ======*/
+typedef struct XXH64_state_s XXH64_state_t; /* incomplete type */
+XXH64_state_t* XXH64_createState(void);
+XXH_errorcode XXH64_freeState(XXH64_state_t* statePtr);
+void XXH64_copyState(XXH64_state_t* dst_state, const XXH64_state_t* src_state);
+
+XXH_errorcode XXH64_reset(XXH64_state_t* statePtr, unsigned long long seed);
+XXH_errorcode XXH64_update(XXH64_state_t* statePtr, const void* input,
+                           size_t length);
+XXH64_hash_t XXH64_digest(const XXH64_state_t* statePtr);
+
+/*====== Canonical representation ======*/
+typedef struct {
+  unsigned char digest[8];
+} XXH64_canonical_t;
+void XXH64_canonicalFromHash(XXH64_canonical_t* dst, XXH64_hash_t hash);
+XXH64_hash_t XXH64_hashFromCanonical(const XXH64_canonical_t* src);
+
+/* These definitions are only present to allow
+ * static allocation of XXH state, on stack or in a struct for example.
+ * Never **ever** use members directly. */
+
+#if !defined(__VMS) && \
+    (defined(__cplusplus) || \
+     (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */))
+
+struct XXH64_state_s {
+  uint64_t total_len;
+  uint64_t v1;
+  uint64_t v2;
+  uint64_t v3;
+  uint64_t v4;
+  uint64_t mem64[4];
+  uint32_t memsize;
+  uint32_t reserved[2]; /* never read nor write, might be removed in a future
+                           version */
+}; /* typedef'd to XXH64_state_t */
+
+#else
+
+#ifndef XXH_NO_LONG_LONG /* remove 64-bit support */
+struct XXH64_state_s {
+  unsigned long long total_len;
+  unsigned long long v1;
+  unsigned long long v2;
+  unsigned long long v3;
+  unsigned long long v4;
+  unsigned long long mem64[4];
+  unsigned memsize;
+  unsigned reserved[2]; /* never read nor write, might be removed in a future
+                           version */
+}; /* typedef'd to XXH64_state_t */
+#endif
+
+#endif
 
 #if defined (__cplusplus)
 } // namespace rocksdb
diff --git a/ceph/src/rocksdb/utilities/backupable/backupable_db.cc b/ceph/src/rocksdb/utilities/backupable/backupable_db.cc
index 4cafc6ab1..b7c15c391 100644
--- a/ceph/src/rocksdb/utilities/backupable/backupable_db.cc
+++ b/ceph/src/rocksdb/utilities/backupable/backupable_db.cc
@@ -92,7 +92,7 @@ class BackupEngineImpl : public BackupEngine {
  public:
   BackupEngineImpl(Env* db_env, const BackupableDBOptions& options,
                    bool read_only = false);
-  ~BackupEngineImpl();
+  ~BackupEngineImpl() override;
   Status CreateNewBackupWithMetadata(DB* db, const std::string& app_metadata,
                                      bool flush_before_backup = false,
                                      std::function<void()> progress_callback =
@@ -118,7 +118,7 @@ class BackupEngineImpl : public BackupEngine {
                                     restore_options);
   }
 
-  virtual Status VerifyBackup(BackupID backup_id) override;
+  Status VerifyBackup(BackupID backup_id) override;
 
   Status Initialize();
 
@@ -305,16 +305,16 @@ class BackupEngineImpl : public BackupEngine {
   // @param contents If non-empty, the file will be created with these contents.
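Before the diff resumes with `CopyOrCreateFile`'s declaration below, a brief aside on the xxhash change above: a minimal usage sketch of the 64-bit API this patch adds to `util/xxhash.h`. One-shot and streaming hashing agree, and the canonical form fixes a big-endian byte order for storage. The input string and seed are arbitrary, not from the patch.

```cpp
#include <cassert>
#include "util/xxhash.h"

void Xxh64UsageSketch() {
  const char data[] = "hello, rocksdb";
  const size_t len = sizeof(data) - 1;
  const unsigned long long seed = 0;

  // One-shot hash.
  rocksdb::XXH64_hash_t one_shot = rocksdb::XXH64(data, len, seed);

  // Streaming hash over the same bytes, fed in two pieces.
  rocksdb::XXH64_state_t* st = rocksdb::XXH64_createState();
  rocksdb::XXH64_reset(st, seed);
  rocksdb::XXH64_update(st, data, len / 2);
  rocksdb::XXH64_update(st, data + len / 2, len - len / 2);
  rocksdb::XXH64_hash_t streamed = rocksdb::XXH64_digest(st);
  rocksdb::XXH64_freeState(st);
  assert(one_shot == streamed);

  // Canonical representation: fixed big-endian bytes, safe to persist.
  rocksdb::XXH64_canonical_t canon;
  rocksdb::XXH64_canonicalFromHash(&canon, one_shot);
  assert(rocksdb::XXH64_hashFromCanonical(&canon) == one_shot);
}
```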
Status CopyOrCreateFile(const std::string& src, const std::string& dst, const std::string& contents, Env* src_env, - Env* dst_env, bool sync, RateLimiter* rate_limiter, + Env* dst_env, const EnvOptions& src_env_options, + bool sync, RateLimiter* rate_limiter, uint64_t* size = nullptr, uint32_t* checksum_value = nullptr, uint64_t size_limit = 0, std::function progress_callback = []() {}); - Status CalculateChecksum(const std::string& src, - Env* src_env, - uint64_t size_limit, - uint32_t* checksum_value); + Status CalculateChecksum(const std::string& src, Env* src_env, + const EnvOptions& src_env_options, + uint64_t size_limit, uint32_t* checksum_value); struct CopyOrCreateResult { uint64_t size; @@ -331,6 +331,7 @@ class BackupEngineImpl : public BackupEngine { std::string contents; Env* src_env; Env* dst_env; + EnvOptions src_env_options; bool sync; RateLimiter* rate_limiter; uint64_t size_limit; @@ -338,14 +339,15 @@ class BackupEngineImpl : public BackupEngine { std::function progress_callback; CopyOrCreateWorkItem() - : src_path(""), - dst_path(""), - contents(""), - src_env(nullptr), - dst_env(nullptr), - sync(false), - rate_limiter(nullptr), - size_limit(0) {} + : src_path(""), + dst_path(""), + contents(""), + src_env(nullptr), + dst_env(nullptr), + src_env_options(), + sync(false), + rate_limiter(nullptr), + size_limit(0) {} CopyOrCreateWorkItem(const CopyOrCreateWorkItem&) = delete; CopyOrCreateWorkItem& operator=(const CopyOrCreateWorkItem&) = delete; @@ -360,6 +362,7 @@ class BackupEngineImpl : public BackupEngine { contents = std::move(o.contents); src_env = o.src_env; dst_env = o.dst_env; + src_env_options = std::move(o.src_env_options); sync = o.sync; rate_limiter = o.rate_limiter; size_limit = o.size_limit; @@ -370,14 +373,15 @@ class BackupEngineImpl : public BackupEngine { CopyOrCreateWorkItem(std::string _src_path, std::string _dst_path, std::string _contents, Env* _src_env, Env* _dst_env, - bool _sync, RateLimiter* _rate_limiter, - uint64_t _size_limit, + EnvOptions _src_env_options, bool _sync, + RateLimiter* _rate_limiter, uint64_t _size_limit, std::function _progress_callback = []() {}) : src_path(std::move(_src_path)), dst_path(std::move(_dst_path)), contents(std::move(_contents)), src_env(_src_env), dst_env(_dst_env), + src_env_options(std::move(_src_env_options)), sync(_sync), rate_limiter(_rate_limiter), size_limit(_size_limit), @@ -471,7 +475,8 @@ class BackupEngineImpl : public BackupEngine { std::vector& backup_items_to_finish, BackupID backup_id, bool shared, const std::string& src_dir, const std::string& fname, // starts with "/" - RateLimiter* rate_limiter, uint64_t size_bytes, uint64_t size_limit = 0, + const EnvOptions& src_env_options, RateLimiter* rate_limiter, + uint64_t size_bytes, uint64_t size_limit = 0, bool shared_checksum = false, std::function progress_callback = []() {}, const std::string& contents = std::string()); @@ -479,9 +484,9 @@ class BackupEngineImpl : public BackupEngine { // backup state data BackupID latest_backup_id_; BackupID latest_valid_backup_id_; - std::map> backups_; - std::map>> corrupt_backups_; + std::map> backups_; + std::map>> + corrupt_backups_; std::unordered_map> backuped_file_infos_; std::atomic stop_backup_; @@ -492,10 +497,10 @@ class BackupEngineImpl : public BackupEngine { Env* backup_env_; // directories - unique_ptr backup_directory_; - unique_ptr shared_directory_; - unique_ptr meta_directory_; - unique_ptr private_directory_; + std::unique_ptr backup_directory_; + std::unique_ptr shared_directory_; + 
std::unique_ptr meta_directory_; + std::unique_ptr private_directory_; static const size_t kDefaultCopyFileBufferSize = 5 * 1024 * 1024LL; // 5MB size_t copy_file_buffer_size_; @@ -616,7 +621,7 @@ Status BackupEngineImpl::Initialize() { } assert(backups_.find(backup_id) == backups_.end()); backups_.insert(std::make_pair( - backup_id, unique_ptr(new BackupMeta( + backup_id, std::unique_ptr(new BackupMeta( GetBackupMetaFile(backup_id, false /* tmp */), GetBackupMetaFile(backup_id, true /* tmp */), &backuped_file_infos_, backup_env_)))); @@ -723,9 +728,10 @@ Status BackupEngineImpl::Initialize() { CopyOrCreateResult result; result.status = CopyOrCreateFile( work_item.src_path, work_item.dst_path, work_item.contents, - work_item.src_env, work_item.dst_env, work_item.sync, - work_item.rate_limiter, &result.size, &result.checksum_value, - work_item.size_limit, work_item.progress_callback); + work_item.src_env, work_item.dst_env, work_item.src_env_options, + work_item.sync, work_item.rate_limiter, &result.size, + &result.checksum_value, work_item.size_limit, + work_item.progress_callback); work_item.result.set_value(std::move(result)); } }); @@ -761,7 +767,7 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata( } auto ret = backups_.insert(std::make_pair( - new_backup_id, unique_ptr(new BackupMeta( + new_backup_id, std::unique_ptr(new BackupMeta( GetBackupMetaFile(new_backup_id, false /* tmp */), GetBackupMetaFile(new_backup_id, true /* tmp */), &backuped_file_infos_, backup_env_)))); @@ -796,8 +802,10 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata( if (s.ok()) { CheckpointImpl checkpoint(db); uint64_t sequence_number = 0; + DBOptions db_options = db->GetDBOptions(); + EnvOptions src_raw_env_options(db_options); s = checkpoint.CreateCustomCheckpoint( - db->GetDBOptions(), + db_options, [&](const std::string& /*src_dirname*/, const std::string& /*fname*/, FileType) { // custom checkpoint will switch to calling copy_file_cb after it sees @@ -815,11 +823,33 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata( if (type == kTableFile) { st = db_env_->GetFileSize(src_dirname + fname, &size_bytes); } + EnvOptions src_env_options; + switch (type) { + case kLogFile: + src_env_options = + db_env_->OptimizeForLogRead(src_raw_env_options); + break; + case kTableFile: + src_env_options = db_env_->OptimizeForCompactionTableRead( + src_raw_env_options, ImmutableDBOptions(db_options)); + break; + case kDescriptorFile: + src_env_options = + db_env_->OptimizeForManifestRead(src_raw_env_options); + break; + default: + // Other backed up files (like options file) are not read by live + // DB, so don't need to worry about avoiding mixing buffered and + // direct I/O. Just use plain defaults. 
+ src_env_options = src_raw_env_options; + break; + } if (st.ok()) { st = AddBackupFileWorkItem( live_dst_paths, backup_items_to_finish, new_backup_id, options_.share_table_files && type == kTableFile, src_dirname, - fname, rate_limiter, size_bytes, size_limit_bytes, + fname, src_env_options, rate_limiter, size_bytes, + size_limit_bytes, options_.share_files_with_checksum && type == kTableFile, progress_callback); } @@ -829,8 +859,9 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata( Log(options_.info_log, "add file for backup %s", fname.c_str()); return AddBackupFileWorkItem( live_dst_paths, backup_items_to_finish, new_backup_id, - false /* shared */, "" /* src_dir */, fname, rate_limiter, - contents.size(), 0 /* size_limit */, false /* shared_checksum */, + false /* shared */, "" /* src_dir */, fname, + EnvOptions() /* src_env_options */, rate_limiter, contents.size(), + 0 /* size_limit */, false /* shared_checksum */, progress_callback, contents); } /* create_file_cb */, &sequence_number, flush_before_backup ? 0 : port::kMaxUint64); @@ -869,7 +900,7 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata( s = new_backup->StoreToFile(options_.sync); } if (s.ok() && options_.sync) { - unique_ptr backup_private_directory; + std::unique_ptr backup_private_directory; backup_env_->NewDirectory( GetAbsolutePath(GetPrivateFileRel(new_backup_id, false)), &backup_private_directory); @@ -1114,7 +1145,8 @@ Status BackupEngineImpl::RestoreDBFromBackup( dst.c_str()); CopyOrCreateWorkItem copy_or_create_work_item( GetAbsolutePath(file), dst, "" /* contents */, backup_env_, db_env_, - false, rate_limiter, 0 /* size_limit */); + EnvOptions() /* src_env_options */, false, rate_limiter, + 0 /* size_limit */); RestoreAfterCopyOrCreateWorkItem after_copy_or_create_work_item( copy_or_create_work_item.result.get_future(), file_info->checksum_value); @@ -1183,15 +1215,15 @@ Status BackupEngineImpl::VerifyBackup(BackupID backup_id) { Status BackupEngineImpl::CopyOrCreateFile( const std::string& src, const std::string& dst, const std::string& contents, - Env* src_env, Env* dst_env, bool sync, RateLimiter* rate_limiter, - uint64_t* size, uint32_t* checksum_value, uint64_t size_limit, - std::function progress_callback) { + Env* src_env, Env* dst_env, const EnvOptions& src_env_options, bool sync, + RateLimiter* rate_limiter, uint64_t* size, uint32_t* checksum_value, + uint64_t size_limit, std::function progress_callback) { assert(src.empty() != contents.empty()); Status s; - unique_ptr dst_file; - unique_ptr src_file; - EnvOptions env_options; - env_options.use_mmap_writes = false; + std::unique_ptr dst_file; + std::unique_ptr src_file; + EnvOptions dst_env_options; + dst_env_options.use_mmap_writes = false; // TODO:(gzh) maybe use direct reads/writes here if possible if (size != nullptr) { *size = 0; @@ -1205,18 +1237,18 @@ Status BackupEngineImpl::CopyOrCreateFile( size_limit = std::numeric_limits::max(); } - s = dst_env->NewWritableFile(dst, &dst_file, env_options); + s = dst_env->NewWritableFile(dst, &dst_file, dst_env_options); if (s.ok() && !src.empty()) { - s = src_env->NewSequentialFile(src, &src_file, env_options); + s = src_env->NewSequentialFile(src, &src_file, src_env_options); } if (!s.ok()) { return s; } - unique_ptr dest_writer( - new WritableFileWriter(std::move(dst_file), dst, env_options)); - unique_ptr src_reader; - unique_ptr buf; + std::unique_ptr dest_writer( + new WritableFileWriter(std::move(dst_file), dst, dst_env_options)); + std::unique_ptr src_reader; + std::unique_ptr buf; if 
(!src.empty()) { src_reader.reset(new SequentialFileReader(std::move(src_file), src)); buf.reset(new char[copy_file_buffer_size_]); @@ -1276,9 +1308,10 @@ Status BackupEngineImpl::AddBackupFileWorkItem( std::unordered_set& live_dst_paths, std::vector& backup_items_to_finish, BackupID backup_id, bool shared, const std::string& src_dir, - const std::string& fname, RateLimiter* rate_limiter, uint64_t size_bytes, - uint64_t size_limit, bool shared_checksum, - std::function progress_callback, const std::string& contents) { + const std::string& fname, const EnvOptions& src_env_options, + RateLimiter* rate_limiter, uint64_t size_bytes, uint64_t size_limit, + bool shared_checksum, std::function progress_callback, + const std::string& contents) { assert(!fname.empty() && fname[0] == '/'); assert(contents.empty() != src_dir.empty()); @@ -1289,7 +1322,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem( if (shared && shared_checksum) { // add checksum and file length to the file name - s = CalculateChecksum(src_dir + fname, db_env_, size_limit, + s = CalculateChecksum(src_dir + fname, db_env_, src_env_options, size_limit, &checksum_value); if (!s.ok()) { return s; @@ -1365,8 +1398,8 @@ Status BackupEngineImpl::AddBackupFileWorkItem( // the file is present and referenced by a backup ROCKS_LOG_INFO(options_.info_log, "%s already present, calculate checksum", fname.c_str()); - s = CalculateChecksum(src_dir + fname, db_env_, size_limit, - &checksum_value); + s = CalculateChecksum(src_dir + fname, db_env_, src_env_options, + size_limit, &checksum_value); } } live_dst_paths.insert(final_dest_path); @@ -1376,8 +1409,8 @@ Status BackupEngineImpl::AddBackupFileWorkItem( copy_dest_path->c_str()); CopyOrCreateWorkItem copy_or_create_work_item( src_dir.empty() ? "" : src_dir + fname, *copy_dest_path, contents, - db_env_, backup_env_, options_.sync, rate_limiter, size_limit, - progress_callback); + db_env_, backup_env_, src_env_options, options_.sync, rate_limiter, + size_limit, progress_callback); BackupAfterCopyOrCreateWorkItem after_copy_or_create_work_item( copy_or_create_work_item.result.get_future(), shared, need_to_copy, backup_env_, temp_dest_path, final_dest_path, dst_relative); @@ -1399,6 +1432,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem( } Status BackupEngineImpl::CalculateChecksum(const std::string& src, Env* src_env, + const EnvOptions& src_env_options, uint64_t size_limit, uint32_t* checksum_value) { *checksum_value = 0; @@ -1406,17 +1440,13 @@ Status BackupEngineImpl::CalculateChecksum(const std::string& src, Env* src_env, size_limit = std::numeric_limits::max(); } - EnvOptions env_options; - env_options.use_mmap_writes = false; - env_options.use_direct_reads = false; - std::unique_ptr src_file; - Status s = src_env->NewSequentialFile(src, &src_file, env_options); + Status s = src_env->NewSequentialFile(src, &src_file, src_env_options); if (!s.ok()) { return s; } - unique_ptr src_reader( + std::unique_ptr src_reader( new SequentialFileReader(std::move(src_file), src)); std::unique_ptr buf(new char[copy_file_buffer_size_]); Slice data; @@ -1634,15 +1664,15 @@ Status BackupEngineImpl::BackupMeta::LoadFromFile( const std::unordered_map& abs_path_to_size) { assert(Empty()); Status s; - unique_ptr backup_meta_file; + std::unique_ptr backup_meta_file; s = env_->NewSequentialFile(meta_filename_, &backup_meta_file, EnvOptions()); if (!s.ok()) { return s; } - unique_ptr backup_meta_reader( + std::unique_ptr backup_meta_reader( new SequentialFileReader(std::move(backup_meta_file), meta_filename_)); - 
unique_ptr buf(new char[max_backup_meta_file_size_ + 1]); + std::unique_ptr buf(new char[max_backup_meta_file_size_ + 1]); Slice data; s = backup_meta_reader->Read(max_backup_meta_file_size_, &data, buf.get()); @@ -1736,7 +1766,7 @@ Status BackupEngineImpl::BackupMeta::LoadFromFile( Status BackupEngineImpl::BackupMeta::StoreToFile(bool sync) { Status s; - unique_ptr backup_meta_file; + std::unique_ptr backup_meta_file; EnvOptions env_options; env_options.use_mmap_writes = false; env_options.use_direct_writes = false; @@ -1745,7 +1775,7 @@ Status BackupEngineImpl::BackupMeta::StoreToFile(bool sync) { return s; } - unique_ptr buf(new char[max_backup_meta_file_size_]); + std::unique_ptr buf(new char[max_backup_meta_file_size_]); size_t len = 0, buf_size = max_backup_meta_file_size_; len += snprintf(buf.get(), buf_size, "%" PRId64 "\n", timestamp_); len += snprintf(buf.get() + len, buf_size - len, "%" PRIu64 "\n", @@ -1762,7 +1792,8 @@ Status BackupEngineImpl::BackupMeta::StoreToFile(bool sync) { else if (len + hex_meta_strlen >= buf_size) { backup_meta_file->Append(Slice(buf.get(), len)); buf.reset(); - unique_ptr new_reset_buf(new char[max_backup_meta_file_size_]); + std::unique_ptr new_reset_buf( + new char[max_backup_meta_file_size_]); buf.swap(new_reset_buf); len = 0; } @@ -1776,7 +1807,7 @@ Status BackupEngineImpl::BackupMeta::StoreToFile(bool sync) { "%" ROCKSDB_PRIszt "\n", files_.size()) >= buf_size) { backup_meta_file->Append(Slice(buf.get(), len)); buf.reset(); - unique_ptr new_reset_buf(new char[max_backup_meta_file_size_]); + std::unique_ptr new_reset_buf(new char[max_backup_meta_file_size_]); buf.swap(new_reset_buf); len = 0; } @@ -1794,7 +1825,8 @@ Status BackupEngineImpl::BackupMeta::StoreToFile(bool sync) { if (newlen >= buf_size) { backup_meta_file->Append(Slice(buf.get(), len)); buf.reset(); - unique_ptr new_reset_buf(new char[max_backup_meta_file_size_]); + std::unique_ptr new_reset_buf( + new char[max_backup_meta_file_size_]); buf.swap(new_reset_buf); len = 0; } @@ -1821,34 +1853,33 @@ class BackupEngineReadOnlyImpl : public BackupEngineReadOnly { BackupEngineReadOnlyImpl(Env* db_env, const BackupableDBOptions& options) : backup_engine_(new BackupEngineImpl(db_env, options, true)) {} - virtual ~BackupEngineReadOnlyImpl() {} + ~BackupEngineReadOnlyImpl() override {} // The returned BackupInfos are in chronological order, which means the // latest backup comes last. 
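The common thread in the backupable_db.cc hunks above is that a per-file-type `EnvOptions` (`src_env_options`) now travels from the live DB's settings into every source-file open, so direct-I/O and similar flags are honored instead of being hard-coded away. A rough sketch of what `CalculateChecksum`'s read loop looks like under that scheme follows; names and the buffer handling are simplified, not the verbatim implementation (the diff resumes with `GetBackupInfo` right after).

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>
#include "rocksdb/env.h"
#include "rocksdb/slice.h"
#include "util/crc32c.h"
#include "util/file_reader_writer.h"

// Sketch: checksum a file, honoring the caller-provided EnvOptions.
rocksdb::Status ChecksumFileSketch(rocksdb::Env* src_env,
                                   const rocksdb::EnvOptions& src_env_options,
                                   const std::string& src,
                                   uint32_t* checksum_value) {
  std::unique_ptr<rocksdb::SequentialFile> file;
  // src_env_options may request direct reads; the env decides how to honor it.
  rocksdb::Status s = src_env->NewSequentialFile(src, &file, src_env_options);
  if (!s.ok()) {
    return s;
  }
  rocksdb::SequentialFileReader reader(std::move(file), src);
  std::vector<char> buf(5 * 1024 * 1024);  // mirrors the 5MB copy buffer
  rocksdb::Slice data;
  *checksum_value = 0;
  do {
    s = reader.Read(buf.size(), &data, buf.data());
    if (!s.ok()) {
      return s;
    }
    *checksum_value =
        rocksdb::crc32c::Extend(*checksum_value, data.data(), data.size());
  } while (data.size() > 0);
  return rocksdb::Status::OK();
}
```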
- virtual void GetBackupInfo(std::vector* backup_info) override { + void GetBackupInfo(std::vector* backup_info) override { backup_engine_->GetBackupInfo(backup_info); } - virtual void GetCorruptedBackups( - std::vector* corrupt_backup_ids) override { + void GetCorruptedBackups(std::vector* corrupt_backup_ids) override { backup_engine_->GetCorruptedBackups(corrupt_backup_ids); } - virtual Status RestoreDBFromBackup( + Status RestoreDBFromBackup( BackupID backup_id, const std::string& db_dir, const std::string& wal_dir, const RestoreOptions& restore_options = RestoreOptions()) override { return backup_engine_->RestoreDBFromBackup(backup_id, db_dir, wal_dir, restore_options); } - virtual Status RestoreDBFromLatestBackup( + Status RestoreDBFromLatestBackup( const std::string& db_dir, const std::string& wal_dir, const RestoreOptions& restore_options = RestoreOptions()) override { return backup_engine_->RestoreDBFromLatestBackup(db_dir, wal_dir, restore_options); } - virtual Status VerifyBackup(BackupID backup_id) override { + Status VerifyBackup(BackupID backup_id) override { return backup_engine_->VerifyBackup(backup_id); } diff --git a/ceph/src/rocksdb/utilities/backupable/backupable_db_test.cc b/ceph/src/rocksdb/utilities/backupable/backupable_db_test.cc index 9fdc058fd..1548203dd 100644 --- a/ceph/src/rocksdb/utilities/backupable/backupable_db_test.cc +++ b/ceph/src/rocksdb/utilities/backupable/backupable_db_test.cc @@ -35,8 +35,6 @@ namespace rocksdb { namespace { -using std::unique_ptr; - class DummyDB : public StackableDB { public: /* implicit */ @@ -44,51 +42,42 @@ class DummyDB : public StackableDB { : StackableDB(nullptr), options_(options), dbname_(dbname), deletions_enabled_(true), sequence_number_(0) {} - virtual SequenceNumber GetLatestSequenceNumber() const override { + SequenceNumber GetLatestSequenceNumber() const override { return ++sequence_number_; } - virtual const std::string& GetName() const override { - return dbname_; - } + const std::string& GetName() const override { return dbname_; } - virtual Env* GetEnv() const override { - return options_.env; - } + Env* GetEnv() const override { return options_.env; } using DB::GetOptions; - virtual Options GetOptions( - ColumnFamilyHandle* /*column_family*/) const override { + Options GetOptions(ColumnFamilyHandle* /*column_family*/) const override { return options_; } - virtual DBOptions GetDBOptions() const override { - return DBOptions(options_); - } + DBOptions GetDBOptions() const override { return DBOptions(options_); } - virtual Status EnableFileDeletions(bool /*force*/) override { + Status EnableFileDeletions(bool /*force*/) override { EXPECT_TRUE(!deletions_enabled_); deletions_enabled_ = true; return Status::OK(); } - virtual Status DisableFileDeletions() override { + Status DisableFileDeletions() override { EXPECT_TRUE(deletions_enabled_); deletions_enabled_ = false; return Status::OK(); } - virtual Status GetLiveFiles(std::vector& vec, uint64_t* mfs, - bool /*flush_memtable*/ = true) override { + Status GetLiveFiles(std::vector& vec, uint64_t* mfs, + bool /*flush_memtable*/ = true) override { EXPECT_TRUE(!deletions_enabled_); vec = live_files_; *mfs = 100; return Status::OK(); } - virtual ColumnFamilyHandle* DefaultColumnFamily() const override { - return nullptr; - } + ColumnFamilyHandle* DefaultColumnFamily() const override { return nullptr; } class DummyLogFile : public LogFile { public: @@ -96,36 +85,32 @@ class DummyDB : public StackableDB { DummyLogFile(const std::string& path, bool alive = true) : 
path_(path), alive_(alive) {} - virtual std::string PathName() const override { - return path_; - } + std::string PathName() const override { return path_; } - virtual uint64_t LogNumber() const override { - // what business do you have calling this method? - ADD_FAILURE(); - return 0; - } + uint64_t LogNumber() const override { + // what business do you have calling this method? + ADD_FAILURE(); + return 0; + } - virtual WalFileType Type() const override { - return alive_ ? kAliveLogFile : kArchivedLogFile; - } + WalFileType Type() const override { + return alive_ ? kAliveLogFile : kArchivedLogFile; + } - virtual SequenceNumber StartSequence() const override { - // this seqnum guarantees the dummy file will be included in the backup - // as long as it is alive. - return kMaxSequenceNumber; - } + SequenceNumber StartSequence() const override { + // this seqnum guarantees the dummy file will be included in the backup + // as long as it is alive. + return kMaxSequenceNumber; + } - virtual uint64_t SizeFileBytes() const override { - return 0; - } + uint64_t SizeFileBytes() const override { return 0; } - private: - std::string path_; - bool alive_; + private: + std::string path_; + bool alive_; }; // DummyLogFile - virtual Status GetSortedWalFiles(VectorLogPtr& files) override { + Status GetSortedWalFiles(VectorLogPtr& files) override { EXPECT_TRUE(!deletions_enabled_); files.resize(wal_files_.size()); for (size_t i = 0; i < files.size(); ++i) { @@ -136,7 +121,7 @@ class DummyDB : public StackableDB { } // To avoid FlushWAL called on stacked db which is nullptr - virtual Status FlushWAL(bool /*sync*/) override { return Status::OK(); } + Status FlushWAL(bool /*sync*/) override { return Status::OK(); } std::vector live_files_; // pair @@ -156,7 +141,7 @@ class TestEnv : public EnvWrapper { public: explicit DummySequentialFile(bool fail_reads) : SequentialFile(), rnd_(5), fail_reads_(fail_reads) {} - virtual Status Read(size_t n, Slice* result, char* scratch) override { + Status Read(size_t n, Slice* result, char* scratch) override { if (fail_reads_) { return Status::IOError(); } @@ -169,17 +154,19 @@ class TestEnv : public EnvWrapper { return Status::OK(); } - virtual Status Skip(uint64_t n) override { + Status Skip(uint64_t n) override { size_left = (n > size_left) ? 
size_left - n : 0; return Status::OK(); } + private: size_t size_left = 200; Random rnd_; bool fail_reads_; }; - Status NewSequentialFile(const std::string& f, unique_ptr* r, + Status NewSequentialFile(const std::string& f, + std::unique_ptr* r, const EnvOptions& options) override { MutexLock l(&mutex_); if (dummy_sequential_file_) { @@ -187,11 +174,18 @@ class TestEnv : public EnvWrapper { new TestEnv::DummySequentialFile(dummy_sequential_file_fail_reads_)); return Status::OK(); } else { - return EnvWrapper::NewSequentialFile(f, r, options); + Status s = EnvWrapper::NewSequentialFile(f, r, options); + if (s.ok()) { + if ((*r)->use_direct_io()) { + ++num_direct_seq_readers_; + } + ++num_seq_readers_; + } + return s; } } - Status NewWritableFile(const std::string& f, unique_ptr* r, + Status NewWritableFile(const std::string& f, std::unique_ptr* r, const EnvOptions& options) override { MutexLock l(&mutex_); written_files_.push_back(f); @@ -199,10 +193,31 @@ class TestEnv : public EnvWrapper { return Status::NotSupported("Sorry, can't do this"); } limit_written_files_--; - return EnvWrapper::NewWritableFile(f, r, options); + Status s = EnvWrapper::NewWritableFile(f, r, options); + if (s.ok()) { + if ((*r)->use_direct_io()) { + ++num_direct_writers_; + } + ++num_writers_; + } + return s; } - virtual Status DeleteFile(const std::string& fname) override { + Status NewRandomAccessFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { + MutexLock l(&mutex_); + Status s = EnvWrapper::NewRandomAccessFile(fname, result, options); + if (s.ok()) { + if ((*result)->use_direct_io()) { + ++num_direct_rand_readers_; + } + ++num_rand_readers_; + } + return s; + } + + Status DeleteFile(const std::string& fname) override { MutexLock l(&mutex_); if (fail_delete_files_) { return Status::IOError(); @@ -212,7 +227,7 @@ class TestEnv : public EnvWrapper { return EnvWrapper::DeleteFile(fname); } - virtual Status DeleteDir(const std::string& dirname) override { + Status DeleteDir(const std::string& dirname) override { MutexLock l(&mutex_); if (fail_delete_files_) { return Status::IOError(); @@ -307,14 +322,31 @@ class TestEnv : public EnvWrapper { } void SetNewDirectoryFailure(bool fail) { new_directory_failure_ = fail; } - virtual Status NewDirectory(const std::string& name, - unique_ptr* result) override { + Status NewDirectory(const std::string& name, + std::unique_ptr* result) override { if (new_directory_failure_) { return Status::IOError("SimulatedFailure"); } return EnvWrapper::NewDirectory(name, result); } + void ClearFileOpenCounters() { + MutexLock l(&mutex_); + num_rand_readers_ = 0; + num_direct_rand_readers_ = 0; + num_seq_readers_ = 0; + num_direct_seq_readers_ = 0; + num_writers_ = 0; + num_direct_writers_ = 0; + } + + int num_rand_readers() { return num_rand_readers_; } + int num_direct_rand_readers() { return num_direct_rand_readers_; } + int num_seq_readers() { return num_seq_readers_; } + int num_direct_seq_readers() { return num_direct_seq_readers_; } + int num_writers() { return num_writers_; } + int num_direct_writers() { return num_direct_writers_; } + private: port::Mutex mutex_; bool dummy_sequential_file_ = false; @@ -328,6 +360,15 @@ class TestEnv : public EnvWrapper { bool get_children_failure_ = false; bool create_dir_if_missing_failure_ = false; bool new_directory_failure_ = false; + + // Keeps track of how many files of each type were successfully opened, and + // out of those, how many were opened with direct I/O. 
+ std::atomic num_rand_readers_; + std::atomic num_direct_rand_readers_; + std::atomic num_seq_readers_; + std::atomic num_direct_seq_readers_; + std::atomic num_writers_; + std::atomic num_direct_writers_; }; // TestEnv class FileManager : public EnvWrapper { @@ -427,7 +468,7 @@ class FileManager : public EnvWrapper { } Status WriteToFile(const std::string& fname, const std::string& data) { - unique_ptr file; + std::unique_ptr file; EnvOptions env_options; env_options.use_mmap_writes = false; Status s = EnvWrapper::NewWritableFile(fname, &file, env_options); @@ -620,22 +661,22 @@ class BackupableDBTest : public testing::Test { std::shared_ptr logger_; // envs - unique_ptr db_chroot_env_; - unique_ptr backup_chroot_env_; - unique_ptr test_db_env_; - unique_ptr test_backup_env_; - unique_ptr file_manager_; + std::unique_ptr db_chroot_env_; + std::unique_ptr backup_chroot_env_; + std::unique_ptr test_db_env_; + std::unique_ptr test_backup_env_; + std::unique_ptr file_manager_; // all the dbs! DummyDB* dummy_db_; // BackupableDB owns dummy_db_ - unique_ptr db_; - unique_ptr backup_engine_; + std::unique_ptr db_; + std::unique_ptr backup_engine_; // options Options options_; protected: - unique_ptr backupable_options_; + std::unique_ptr backupable_options_; }; // BackupableDBTest void AppendPath(const std::string& path, std::vector& v) { @@ -1633,6 +1674,59 @@ TEST_F(BackupableDBTest, WriteOnlyEngineNoSharedFileDeletion) { AssertBackupConsistency(i + 1, 0, (i + 1) * kNumKeys); } } + +TEST_P(BackupableDBTestWithParam, BackupUsingDirectIO) { + // Tests direct I/O on the backup engine's reads and writes on the DB env and + // backup env + // We use ChrootEnv underneath so the below line checks for direct I/O support + // in the chroot directory, not the true filesystem root. + if (!test::IsDirectIOSupported(test_db_env_.get(), "/")) { + return; + } + const int kNumKeysPerBackup = 100; + const int kNumBackups = 3; + options_.use_direct_reads = true; + OpenDBAndBackupEngine(true /* destroy_old_data */); + for (int i = 0; i < kNumBackups; ++i) { + FillDB(db_.get(), i * kNumKeysPerBackup /* from */, + (i + 1) * kNumKeysPerBackup /* to */); + ASSERT_OK(db_->Flush(FlushOptions())); + + // Clear the file open counters and then do a bunch of backup engine ops. + // For all ops, files should be opened in direct mode. + test_backup_env_->ClearFileOpenCounters(); + test_db_env_->ClearFileOpenCounters(); + CloseBackupEngine(); + OpenBackupEngine(); + ASSERT_OK(backup_engine_->CreateNewBackup(db_.get(), + false /* flush_before_backup */)); + ASSERT_OK(backup_engine_->VerifyBackup(i + 1)); + CloseBackupEngine(); + OpenBackupEngine(); + std::vector backup_infos; + backup_engine_->GetBackupInfo(&backup_infos); + ASSERT_EQ(static_cast(i + 1), backup_infos.size()); + + // Verify backup engine always opened files with direct I/O + ASSERT_EQ(0, test_db_env_->num_writers()); + ASSERT_EQ(0, test_db_env_->num_rand_readers()); + ASSERT_GT(test_db_env_->num_direct_seq_readers(), 0); + // Currently the DB doesn't support reading WALs or manifest with direct + // I/O, so subtract two. 
+ ASSERT_EQ(test_db_env_->num_seq_readers() - 2, + test_db_env_->num_direct_seq_readers()); + ASSERT_EQ(0, test_db_env_->num_rand_readers()); + } + CloseDBAndBackupEngine(); + + for (int i = 0; i < kNumBackups; ++i) { + AssertBackupConsistency(i + 1 /* backup_id */, + i * kNumKeysPerBackup /* start_exist */, + (i + 1) * kNumKeysPerBackup /* end_exist */, + (i + 2) * kNumKeysPerBackup /* end */); + } +} + } // anon namespace } // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_compaction_filter.cc b/ceph/src/rocksdb/utilities/blob_db/blob_compaction_filter.cc index cbc76a98d..f145d9a92 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_compaction_filter.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_compaction_filter.cc @@ -22,24 +22,21 @@ class BlobIndexCompactionFilter : public CompactionFilter { current_time_(current_time), statistics_(statistics) {} - virtual ~BlobIndexCompactionFilter() { + ~BlobIndexCompactionFilter() override { RecordTick(statistics_, BLOB_DB_BLOB_INDEX_EXPIRED_COUNT, expired_count_); RecordTick(statistics_, BLOB_DB_BLOB_INDEX_EXPIRED_SIZE, expired_size_); RecordTick(statistics_, BLOB_DB_BLOB_INDEX_EVICTED_COUNT, evicted_count_); RecordTick(statistics_, BLOB_DB_BLOB_INDEX_EVICTED_SIZE, evicted_size_); } - virtual const char* Name() const override { - return "BlobIndexCompactionFilter"; - } + const char* Name() const override { return "BlobIndexCompactionFilter"; } // Filter expired blob indexes regardless of snapshots. - virtual bool IgnoreSnapshots() const override { return true; } + bool IgnoreSnapshots() const override { return true; } - virtual Decision FilterV2(int /*level*/, const Slice& key, - ValueType value_type, const Slice& value, - std::string* /*new_value*/, - std::string* /*skip_until*/) const override { + Decision FilterV2(int /*level*/, const Slice& key, ValueType value_type, + const Slice& value, std::string* /*new_value*/, + std::string* /*skip_until*/) const override { if (value_type != kBlobIndex) { return Decision::kKeep; } diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_db.cc b/ceph/src/rocksdb/utilities/blob_db/blob_db.cc index b5948cd62..d660def49 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_db.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_db.cc @@ -76,7 +76,7 @@ void BlobDBOptions::Dump(Logger* log) const { log, " BlobDBOptions.max_db_size: %" PRIu64, max_db_size); ROCKS_LOG_HEADER( - log, " BlobDBOptions.ttl_range_secs: %" PRIu32, + log, " BlobDBOptions.ttl_range_secs: %" PRIu64, ttl_range_secs); ROCKS_LOG_HEADER( log, " BlobDBOptions.min_blob_size: %" PRIu64, @@ -93,12 +93,6 @@ void BlobDBOptions::Dump(Logger* log) const { ROCKS_LOG_HEADER( log, " BlobDBOptions.enable_garbage_collection: %d", enable_garbage_collection); - ROCKS_LOG_HEADER( - log, " BlobDBOptions.garbage_collection_interval_secs: %" PRIu64, - garbage_collection_interval_secs); - ROCKS_LOG_HEADER( - log, "BlobDBOptions.garbage_collection_deletion_size_threshold: %lf", - garbage_collection_deletion_size_threshold); ROCKS_LOG_HEADER( log, " BlobDBOptions.disable_background_tasks: %d", disable_background_tasks); diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_db.h b/ceph/src/rocksdb/utilities/blob_db/blob_db.h index 021d52aa8..3beb74fc9 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_db.h +++ b/ceph/src/rocksdb/utilities/blob_db/blob_db.h @@ -52,7 +52,7 @@ struct BlobDBOptions { // and so on uint64_t ttl_range_secs = 3600; - // The smallest value to store in blob log. 
Value larger than this threshold + // The smallest value to store in blob log. Values smaller than this threshold // will be inlined in base DB together with the key. uint64_t min_blob_size = 0; @@ -73,13 +73,6 @@ struct BlobDBOptions { // blob files will be cleanup based on TTL. bool enable_garbage_collection = false; - // Time interval to trigger garbage collection, in seconds. - uint64_t garbage_collection_interval_secs = 60; - - // If garbage collection is enabled, blob files with deleted size no less - // than this ratio will become candidates to be cleanup. - double garbage_collection_deletion_size_threshold = 0.75; - // Disable all background job. Used for test only. bool disable_background_tasks = false; diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.cc b/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.cc index 1a32bd562..5dcddc214 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.cc @@ -26,10 +26,12 @@ #include "util/cast_util.h" #include "util/crc32c.h" #include "util/file_reader_writer.h" +#include "util/file_util.h" #include "util/filename.h" #include "util/logging.h" #include "util/mutexlock.h" #include "util/random.h" +#include "util/sst_file_manager_impl.h" #include "util/stop_watch.h" #include "util/sync_point.h" #include "util/timer_queue.h" @@ -45,13 +47,6 @@ int kBlockBasedTableVersionFormat = 2; namespace rocksdb { namespace blob_db { -WalFilter::WalProcessingOption BlobReconcileWalFilter::LogRecordFound( - unsigned long long /*log_number*/, const std::string& /*log_file_name*/, - const WriteBatch& /*batch*/, WriteBatch* /*new_batch*/, - bool* /*batch_changed*/) { - return WalFilter::WalProcessingOption::kContinueProcessing; -} - bool BlobFileComparator::operator()( const std::shared_ptr& lhs, const std::shared_ptr& rhs) const { @@ -100,6 +95,7 @@ BlobDBImpl::BlobDBImpl(const std::string& dbname, } BlobDBImpl::~BlobDBImpl() { + tqueue_.shutdown(); // CancelAllBackgroundWork(db_, true); Status s __attribute__((__unused__)) = Close(); assert(s.ok()); @@ -185,6 +181,12 @@ Status BlobDBImpl::Open(std::vector* handles) { return s; } db_impl_ = static_cast_with_check(db_->GetRootDB()); + + // Add trash files in blob dir to file delete scheduler. + SstFileManagerImpl* sfm = static_cast( + db_impl_->immutable_db_options().sst_file_manager.get()); + DeleteScheduler::CleanupDirectory(env_, sfm, blob_dir_); + UpdateLiveSSTSize(); // Start background jobs. 
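For context, here is roughly how a BlobDB is opened once this patch removes the two GC tuning knobs, assuming the `BlobDB::Open` factory declared in utilities/blob_db/blob_db.h; the path and option values below are placeholders, not from the patch.

```cpp
#include "utilities/blob_db/blob_db.h"

rocksdb::Status OpenBlobDbSketch(rocksdb::blob_db::BlobDB** blob_db) {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::blob_db::BlobDBOptions bdb_options;
  bdb_options.min_blob_size = 4096;   // smaller values stay inline in the LSM
  bdb_options.ttl_range_secs = 3600;  // one TTL bucket (blob file) per hour
  bdb_options.enable_garbage_collection = false;
  // Note: garbage_collection_interval_secs and
  // garbage_collection_deletion_size_threshold no longer exist after this
  // patch; setting them would not compile.

  return rocksdb::blob_db::BlobDB::Open(options, bdb_options,
                                        "/tmp/blobdb_example", blob_db);
}
```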
@@ -203,9 +205,6 @@ void BlobDBImpl::StartBackgroundTasks() { tqueue_.add( kReclaimOpenFilesPeriodMillisecs, std::bind(&BlobDBImpl::ReclaimOpenFiles, this, std::placeholders::_1)); - tqueue_.add(static_cast( - bdb_options_.garbage_collection_interval_secs * 1000), - std::bind(&BlobDBImpl::RunGC, this, std::placeholders::_1)); tqueue_.add( kDeleteObsoleteFilesPeriodMillisecs, std::bind(&BlobDBImpl::DeleteObsoleteFiles, this, std::placeholders::_1)); @@ -273,7 +272,7 @@ Status BlobDBImpl::OpenAllBlobFiles() { continue; } else if (!read_metadata_status.ok()) { ROCKS_LOG_ERROR(db_options_.info_log, - "Unable to read metadata of blob file % " PRIu64 + "Unable to read metadata of blob file %" PRIu64 ", status: '%s'", file_number, read_metadata_status.ToString().c_str()); return read_metadata_status; @@ -346,8 +345,9 @@ Status BlobDBImpl::CreateWriterLocked(const std::shared_ptr& bfile) { uint64_t boffset = bfile->GetFileSize(); if (debug_level_ >= 2 && boffset) { - ROCKS_LOG_DEBUG(db_options_.info_log, "Open blob file: %s with offset: %d", - fpath.c_str(), boffset); + ROCKS_LOG_DEBUG(db_options_.info_log, + "Open blob file: %s with offset: %" PRIu64, fpath.c_str(), + boffset); } Writer::ElemType et = Writer::kEtNone; @@ -357,8 +357,8 @@ Status BlobDBImpl::CreateWriterLocked(const std::shared_ptr& bfile) { et = Writer::kEtRecord; } else if (bfile->file_size_) { ROCKS_LOG_WARN(db_options_.info_log, - "Open blob file: %s with wrong size: %d", fpath.c_str(), - boffset); + "Open blob file: %s with wrong size: %" PRIu64, + fpath.c_str(), boffset); return Status::Corruption("Invalid blob file size"); } @@ -404,82 +404,91 @@ std::shared_ptr BlobDBImpl::FindBlobFileLocked( return (b1 || b2) ? nullptr : (*finditr); } -std::shared_ptr BlobDBImpl::CheckOrCreateWriterLocked( - const std::shared_ptr& bfile) { - std::shared_ptr writer = bfile->GetWriter(); - if (writer) return writer; - - Status s = CreateWriterLocked(bfile); - if (!s.ok()) return nullptr; - - writer = bfile->GetWriter(); - return writer; +Status BlobDBImpl::CheckOrCreateWriterLocked( + const std::shared_ptr& blob_file, + std::shared_ptr* writer) { + assert(writer != nullptr); + *writer = blob_file->GetWriter(); + if (*writer != nullptr) { + return Status::OK(); + } + Status s = CreateWriterLocked(blob_file); + if (s.ok()) { + *writer = blob_file->GetWriter(); + } + return s; } -std::shared_ptr BlobDBImpl::SelectBlobFile() { +Status BlobDBImpl::SelectBlobFile(std::shared_ptr* blob_file) { + assert(blob_file != nullptr); { ReadLock rl(&mutex_); if (open_non_ttl_file_ != nullptr) { - return open_non_ttl_file_; + *blob_file = open_non_ttl_file_; + return Status::OK(); } } // CHECK again WriteLock wl(&mutex_); if (open_non_ttl_file_ != nullptr) { - return open_non_ttl_file_; + *blob_file = open_non_ttl_file_; + return Status::OK(); } - std::shared_ptr bfile = NewBlobFile("SelectBlobFile"); - assert(bfile); + *blob_file = NewBlobFile("SelectBlobFile"); + assert(*blob_file != nullptr); // file not visible, hence no lock - std::shared_ptr writer = CheckOrCreateWriterLocked(bfile); - if (!writer) { + std::shared_ptr writer; + Status s = CheckOrCreateWriterLocked(*blob_file, &writer); + if (!s.ok()) { ROCKS_LOG_ERROR(db_options_.info_log, - "Failed to get writer from blob file: %s", - bfile->PathName().c_str()); - return nullptr; + "Failed to get writer from blob file: %s, error: %s", + (*blob_file)->PathName().c_str(), s.ToString().c_str()); + return s; } - bfile->file_size_ = BlobLogHeader::kSize; - bfile->header_.compression = 
bdb_options_.compression; - bfile->header_.has_ttl = false; - bfile->header_.column_family_id = + (*blob_file)->file_size_ = BlobLogHeader::kSize; + (*blob_file)->header_.compression = bdb_options_.compression; + (*blob_file)->header_.has_ttl = false; + (*blob_file)->header_.column_family_id = reinterpret_cast(DefaultColumnFamily())->GetID(); - bfile->header_valid_ = true; - bfile->SetColumnFamilyId(bfile->header_.column_family_id); - bfile->SetHasTTL(false); - bfile->SetCompression(bdb_options_.compression); + (*blob_file)->header_valid_ = true; + (*blob_file)->SetColumnFamilyId((*blob_file)->header_.column_family_id); + (*blob_file)->SetHasTTL(false); + (*blob_file)->SetCompression(bdb_options_.compression); - Status s = writer->WriteHeader(bfile->header_); + s = writer->WriteHeader((*blob_file)->header_); if (!s.ok()) { ROCKS_LOG_ERROR(db_options_.info_log, "Failed to write header to new blob file: %s" " status: '%s'", - bfile->PathName().c_str(), s.ToString().c_str()); - return nullptr; + (*blob_file)->PathName().c_str(), s.ToString().c_str()); + return s; } - blob_files_.insert(std::make_pair(bfile->BlobFileNumber(), bfile)); - open_non_ttl_file_ = bfile; + blob_files_.insert( + std::make_pair((*blob_file)->BlobFileNumber(), *blob_file)); + open_non_ttl_file_ = *blob_file; total_blob_size_ += BlobLogHeader::kSize; - return bfile; + return s; } -std::shared_ptr BlobDBImpl::SelectBlobFileTTL(uint64_t expiration) { +Status BlobDBImpl::SelectBlobFileTTL(uint64_t expiration, + std::shared_ptr* blob_file) { + assert(blob_file != nullptr); assert(expiration != kNoExpiration); uint64_t epoch_read = 0; - std::shared_ptr bfile; { ReadLock rl(&mutex_); - bfile = FindBlobFileLocked(expiration); + *blob_file = FindBlobFileLocked(expiration); epoch_read = epoch_of_.load(); } - if (bfile) { - assert(!bfile->Immutable()); - return bfile; + if (*blob_file != nullptr) { + assert(!(*blob_file)->Immutable()); + return Status::OK(); } uint64_t exp_low = @@ -487,61 +496,67 @@ std::shared_ptr BlobDBImpl::SelectBlobFileTTL(uint64_t expiration) { uint64_t exp_high = exp_low + bdb_options_.ttl_range_secs; ExpirationRange expiration_range = std::make_pair(exp_low, exp_high); - bfile = NewBlobFile("SelectBlobFileTTL"); - assert(bfile); + *blob_file = NewBlobFile("SelectBlobFileTTL"); + assert(*blob_file != nullptr); - ROCKS_LOG_INFO(db_options_.info_log, "New blob file TTL range: %s %d %d", - bfile->PathName().c_str(), exp_low, exp_high); + ROCKS_LOG_INFO(db_options_.info_log, + "New blob file TTL range: %s %" PRIu64 " %" PRIu64, + (*blob_file)->PathName().c_str(), exp_low, exp_high); LogFlush(db_options_.info_log); // we don't need to take lock as no other thread is seeing bfile yet - std::shared_ptr writer = CheckOrCreateWriterLocked(bfile); - if (!writer) { - ROCKS_LOG_ERROR(db_options_.info_log, - "Failed to get writer from blob file with TTL: %s", - bfile->PathName().c_str()); - return nullptr; + std::shared_ptr writer; + Status s = CheckOrCreateWriterLocked(*blob_file, &writer); + if (!s.ok()) { + ROCKS_LOG_ERROR( + db_options_.info_log, + "Failed to get writer from blob file with TTL: %s, error: %s", + (*blob_file)->PathName().c_str(), s.ToString().c_str()); + return s; } - bfile->header_.expiration_range = expiration_range; - bfile->header_.compression = bdb_options_.compression; - bfile->header_.has_ttl = true; - bfile->header_.column_family_id = + (*blob_file)->header_.expiration_range = expiration_range; + (*blob_file)->header_.compression = bdb_options_.compression; + (*blob_file)->header_.has_ttl = 
true; + (*blob_file)->header_.column_family_id = reinterpret_cast(DefaultColumnFamily())->GetID(); - ; - bfile->header_valid_ = true; - bfile->SetColumnFamilyId(bfile->header_.column_family_id); - bfile->SetHasTTL(true); - bfile->SetCompression(bdb_options_.compression); - bfile->file_size_ = BlobLogHeader::kSize; + (*blob_file)->header_valid_ = true; + (*blob_file)->SetColumnFamilyId((*blob_file)->header_.column_family_id); + (*blob_file)->SetHasTTL(true); + (*blob_file)->SetCompression(bdb_options_.compression); + (*blob_file)->file_size_ = BlobLogHeader::kSize; // set the first value of the range, since that is // concrete at this time. also necessary to add to open_ttl_files_ - bfile->expiration_range_ = expiration_range; + (*blob_file)->expiration_range_ = expiration_range; WriteLock wl(&mutex_); // in case the epoch has shifted in the interim, then check // check condition again - should be rare. if (epoch_of_.load() != epoch_read) { - auto bfile2 = FindBlobFileLocked(expiration); - if (bfile2) return bfile2; + std::shared_ptr blob_file2 = FindBlobFileLocked(expiration); + if (blob_file2 != nullptr) { + *blob_file = std::move(blob_file2); + return Status::OK(); + } } - Status s = writer->WriteHeader(bfile->header_); + s = writer->WriteHeader((*blob_file)->header_); if (!s.ok()) { ROCKS_LOG_ERROR(db_options_.info_log, "Failed to write header to new blob file: %s" " status: '%s'", - bfile->PathName().c_str(), s.ToString().c_str()); - return nullptr; + (*blob_file)->PathName().c_str(), s.ToString().c_str()); + return s; } - blob_files_.insert(std::make_pair(bfile->BlobFileNumber(), bfile)); - open_ttl_files_.insert(bfile); + blob_files_.insert( + std::make_pair((*blob_file)->BlobFileNumber(), *blob_file)); + open_ttl_files_.insert(*blob_file); total_blob_size_ += BlobLogHeader::kSize; epoch_of_++; - return bfile; + return s; } class BlobDBImpl::BlobInserter : public WriteBatch::Handler { @@ -560,8 +575,8 @@ class BlobDBImpl::BlobInserter : public WriteBatch::Handler { WriteBatch* batch() { return &batch_; } - virtual Status PutCF(uint32_t column_family_id, const Slice& key, - const Slice& value) override { + Status PutCF(uint32_t column_family_id, const Slice& key, + const Slice& value) override { if (column_family_id != default_cf_id_) { return Status::NotSupported( "Blob DB doesn't support non-default column family."); @@ -571,8 +586,7 @@ class BlobDBImpl::BlobInserter : public WriteBatch::Handler { return s; } - virtual Status DeleteCF(uint32_t column_family_id, - const Slice& key) override { + Status DeleteCF(uint32_t column_family_id, const Slice& key) override { if (column_family_id != default_cf_id_) { return Status::NotSupported( "Blob DB doesn't support non-default column family."); @@ -592,17 +606,17 @@ class BlobDBImpl::BlobInserter : public WriteBatch::Handler { return s; } - virtual Status SingleDeleteCF(uint32_t /*column_family_id*/, - const Slice& /*key*/) override { + Status SingleDeleteCF(uint32_t /*column_family_id*/, + const Slice& /*key*/) override { return Status::NotSupported("Not supported operation in blob db."); } - virtual Status MergeCF(uint32_t /*column_family_id*/, const Slice& /*key*/, - const Slice& /*value*/) override { + Status MergeCF(uint32_t /*column_family_id*/, const Slice& /*key*/, + const Slice& /*value*/) override { return Status::NotSupported("Not supported operation in blob db."); } - virtual void LogData(const Slice& blob) override { batch_.PutLogData(blob); } + void LogData(const Slice& blob) override { batch_.PutLogData(blob); } }; Status 
BlobDBImpl::Write(const WriteOptions& options, WriteBatch* updates) { @@ -695,43 +709,48 @@ Status BlobDBImpl::PutBlobValue(const WriteOptions& /*options*/, return s; } - std::shared_ptr bfile = (expiration != kNoExpiration) - ? SelectBlobFileTTL(expiration) - : SelectBlobFile(); - assert(bfile != nullptr); - assert(bfile->compression() == bdb_options_.compression); - - s = AppendBlob(bfile, headerbuf, key, value_compressed, expiration, - &index_entry); - if (expiration == kNoExpiration) { - RecordTick(statistics_, BLOB_DB_WRITE_BLOB); + std::shared_ptr blob_file; + if (expiration != kNoExpiration) { + s = SelectBlobFileTTL(expiration, &blob_file); } else { - RecordTick(statistics_, BLOB_DB_WRITE_BLOB_TTL); + s = SelectBlobFile(&blob_file); + } + if (s.ok()) { + assert(blob_file != nullptr); + assert(blob_file->compression() == bdb_options_.compression); + s = AppendBlob(blob_file, headerbuf, key, value_compressed, expiration, + &index_entry); } - if (s.ok()) { if (expiration != kNoExpiration) { - bfile->ExtendExpirationRange(expiration); + blob_file->ExtendExpirationRange(expiration); } - s = CloseBlobFileIfNeeded(bfile); - if (s.ok()) { - s = WriteBatchInternal::PutBlobIndex(batch, column_family_id, key, - index_entry); + s = CloseBlobFileIfNeeded(blob_file); + } + if (s.ok()) { + s = WriteBatchInternal::PutBlobIndex(batch, column_family_id, key, + index_entry); + } + if (s.ok()) { + if (expiration == kNoExpiration) { + RecordTick(statistics_, BLOB_DB_WRITE_BLOB); + } else { + RecordTick(statistics_, BLOB_DB_WRITE_BLOB_TTL); } } else { - ROCKS_LOG_ERROR(db_options_.info_log, - "Failed to append blob to FILE: %s: KEY: %s VALSZ: %d" - " status: '%s' blob_file: '%s'", - bfile->PathName().c_str(), key.ToString().c_str(), - value.size(), s.ToString().c_str(), - bfile->DumpState().c_str()); + ROCKS_LOG_ERROR( + db_options_.info_log, + "Failed to append blob to FILE: %s: KEY: %s VALSZ: %" ROCKSDB_PRIszt + " status: '%s' blob_file: '%s'", + blob_file->PathName().c_str(), key.ToString().c_str(), value.size(), + s.ToString().c_str(), blob_file->DumpState().c_str()); } } RecordTick(statistics_, BLOB_DB_NUM_KEYS_WRITTEN); RecordTick(statistics_, BLOB_DB_BYTES_WRITTEN, key.size() + value.size()); - MeasureTime(statistics_, BLOB_DB_KEY_SIZE, key.size()); - MeasureTime(statistics_, BLOB_DB_VALUE_SIZE, value.size()); + RecordInHistogram(statistics_, BLOB_DB_KEY_SIZE, key.size()); + RecordInHistogram(statistics_, BLOB_DB_VALUE_SIZE, value.size()); return s; } @@ -742,10 +761,13 @@ Slice BlobDBImpl::GetCompressedSlice(const Slice& raw, return raw; } StopWatch compression_sw(env_, statistics_, BLOB_DB_COMPRESSION_MICROS); - CompressionType ct = bdb_options_.compression; - CompressionContext compression_ctx(ct); - CompressBlock(raw, compression_ctx, &ct, kBlockBasedTableVersionFormat, - compression_output); + CompressionType type = bdb_options_.compression; + CompressionOptions opts; + CompressionContext context(type); + CompressionInfo info(opts, context, CompressionDict::GetEmptyDict(), type, + 0 /* sample_for_compression */); + CompressBlock(raw, info, &type, kBlockBasedTableVersionFormat, false, + compression_output, nullptr, nullptr); return *compression_output; } @@ -867,9 +889,10 @@ Status BlobDBImpl::AppendBlob(const std::shared_ptr& bfile, uint64_t key_offset = 0; { WriteLock lockbfile_w(&bfile->mutex_); - std::shared_ptr writer = CheckOrCreateWriterLocked(bfile); - if (!writer) { - return Status::IOError("Failed to create blob writer"); + std::shared_ptr writer; + s = 
CheckOrCreateWriterLocked(bfile, &writer); + if (!s.ok()) { + return s; } // write the blob to the blob log. @@ -1027,20 +1050,19 @@ Status BlobDBImpl::GetBlobValue(const Slice& key, const Slice& index_entry, ROCKS_LOG_DEBUG(db_options_.info_log, "Failed to read blob from blob file %" PRIu64 ", blob_offset: %" PRIu64 ", blob_size: %" PRIu64 - ", key_size: " PRIu64 ", read " PRIu64 - " bytes, status: '%s'", + ", key_size: %" ROCKSDB_PRIszt ", status: '%s'", bfile->BlobFileNumber(), blob_index.offset(), blob_index.size(), key.size(), s.ToString().c_str()); return s; } if (blob_record.size() != record_size) { - ROCKS_LOG_DEBUG(db_options_.info_log, - "Failed to read blob from blob file %" PRIu64 - ", blob_offset: %" PRIu64 ", blob_size: %" PRIu64 - ", key_size: " PRIu64 ", read " PRIu64 - " bytes, status: '%s'", - bfile->BlobFileNumber(), blob_index.offset(), - blob_index.size(), key.size(), s.ToString().c_str()); + ROCKS_LOG_DEBUG( + db_options_.info_log, + "Failed to read blob from blob file %" PRIu64 ", blob_offset: %" PRIu64 + ", blob_size: %" PRIu64 ", key_size: %" ROCKSDB_PRIszt + ", read %" ROCKSDB_PRIszt " bytes, expected %" PRIu64 " bytes", + bfile->BlobFileNumber(), blob_index.offset(), blob_index.size(), + key.size(), blob_record.size(), record_size); return Status::Corruption("Failed to retrieve blob from blob index."); } @@ -1052,7 +1074,7 @@ Status BlobDBImpl::GetBlobValue(const Slice& key, const Slice& index_entry, ROCKS_LOG_DEBUG(db_options_.info_log, "Unable to decode CRC from blob file %" PRIu64 ", blob_offset: %" PRIu64 ", blob_size: %" PRIu64 - ", key size: %" PRIu64 ", status: '%s'", + ", key size: %" ROCKSDB_PRIszt ", status: '%s'", bfile->BlobFileNumber(), blob_index.offset(), blob_index.size(), key.size(), s.ToString().c_str()); return Status::Corruption("Unable to decode checksum."); @@ -1080,9 +1102,11 @@ Status BlobDBImpl::GetBlobValue(const Slice& key, const Slice& index_entry, { StopWatch decompression_sw(env_, statistics_, BLOB_DB_DECOMPRESSION_MICROS); - UncompressionContext uncompression_ctx(bfile->compression()); + UncompressionContext context(bfile->compression()); + UncompressionInfo info(context, UncompressionDict::GetEmptyDict(), + bfile->compression()); s = UncompressBlockContentsForCompressionType( - uncompression_ctx, blob_value.data(), blob_value.size(), &contents, + info, blob_value.data(), blob_value.size(), &contents, kBlockBasedTableVersionFormat, *(cfh->cfd()->ioptions())); } value->PinSelf(contents.data); @@ -1151,9 +1175,9 @@ std::pair BlobDBImpl::SanityCheck(bool aborted) { } ROCKS_LOG_INFO(db_options_.info_log, "Starting Sanity Check"); - ROCKS_LOG_INFO(db_options_.info_log, "Number of files %" PRIu64, + ROCKS_LOG_INFO(db_options_.info_log, "Number of files %" ROCKSDB_PRIszt, blob_files_.size()); - ROCKS_LOG_INFO(db_options_.info_log, "Number of open files %" PRIu64, + ROCKS_LOG_INFO(db_options_.info_log, "Number of open files %" ROCKSDB_PRIszt, open_ttl_files_.size()); for (auto bfile : open_ttl_files_) { @@ -1291,6 +1315,9 @@ std::pair BlobDBImpl::EvictExpiredFiles(bool aborted) { return std::make_pair(false, -1); } + TEST_SYNC_POINT("BlobDBImpl::EvictExpiredFiles:0"); + TEST_SYNC_POINT("BlobDBImpl::EvictExpiredFiles:1"); + std::vector> process_files; uint64_t now = EpochNow(); { @@ -1305,6 +1332,10 @@ std::pair BlobDBImpl::EvictExpiredFiles(bool aborted) { } } + TEST_SYNC_POINT("BlobDBImpl::EvictExpiredFiles:2"); + TEST_SYNC_POINT("BlobDBImpl::EvictExpiredFiles:3"); + TEST_SYNC_POINT_CALLBACK("BlobDBImpl::EvictExpiredFiles:cb", nullptr); + 
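The sync points just added bracket the stages of `EvictExpiredFiles`, which lets a test interleave foreground work with a specific stage of eviction. A sketch of how a test might arm them, using the existing `rocksdb::SyncPoint` API; the `MyTest:*` markers are hypothetical ones a test body would emit with `TEST_SYNC_POINT`, and this particular dependency is illustrative, not from the patch (the diff resumes inside `EvictExpiredFiles` below).

```cpp
#include "util/sync_point.h"

// Force a foreground write into the window between eviction collecting its
// list of expired files and acting on them.
void ArmEvictExpiredFilesRace() {
  rocksdb::SyncPoint::GetInstance()->LoadDependency({
      // The test's Put starts only after eviction passes marker :1...
      {"BlobDBImpl::EvictExpiredFiles:1", "MyTest:PutStart"},
      // ...and eviction passes marker :2 only after the Put finished.
      {"MyTest:PutDone", "BlobDBImpl::EvictExpiredFiles:2"},
  });
  rocksdb::SyncPoint::GetInstance()->SetCallBack(
      "BlobDBImpl::EvictExpiredFiles:cb",
      [](void* /*arg*/) { /* inspect in-flight eviction state here */ });
  rocksdb::SyncPoint::GetInstance()->EnableProcessing();
}
```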
SequenceNumber seq = GetLatestSequenceNumber(); { MutexLock l(&write_mutex_); @@ -1387,7 +1418,7 @@ class BlobDBImpl::GarbageCollectionWriteCallback : public WriteCallback { SequenceNumber upper_bound) : cfd_(cfd), key_(key), upper_bound_(upper_bound) {} - virtual Status Callback(DB* db) override { + Status Callback(DB* db) override { auto* db_impl = reinterpret_cast(db); auto* sv = db_impl->GetAndRefSuperVersion(cfd_); SequenceNumber latest_seq = 0; @@ -1414,7 +1445,7 @@ class BlobDBImpl::GarbageCollectionWriteCallback : public WriteCallback { return s; } - virtual bool AllowWriteBatching() override { return false; } + bool AllowWriteBatching() override { return false; } private: ColumnFamilyData* cfd_; @@ -1445,7 +1476,7 @@ Status BlobDBImpl::GCFileAndUpdateLSM(const std::shared_ptr& bfptr, bfptr->OpenRandomAccessReader(env_, db_options_, env_options_); if (!reader) { ROCKS_LOG_ERROR(db_options_.info_log, - "File sequential reader could not be opened", + "File sequential reader could not be opened for %s", bfptr->PathName().c_str()); return Status::IOError("failed to create sequential reader"); } @@ -1459,8 +1490,7 @@ Status BlobDBImpl::GCFileAndUpdateLSM(const std::shared_ptr& bfptr, return s; } - auto* cfh = - db_impl_->GetColumnFamilyHandleUnlocked(bfptr->column_family_id()); + auto cfh = db_impl_->DefaultColumnFamily(); auto* cfd = reinterpret_cast(cfh)->cfd(); auto column_family_id = cfd->GetID(); bool has_ttl = header.has_ttl; @@ -1575,7 +1605,13 @@ Status BlobDBImpl::GCFileAndUpdateLSM(const std::shared_ptr& bfptr, reason += bfptr->PathName(); newfile = NewBlobFile(reason); - new_writer = CheckOrCreateWriterLocked(newfile); + s = CheckOrCreateWriterLocked(newfile, &new_writer); + if (!s.ok()) { + ROCKS_LOG_ERROR(db_options_.info_log, + "Failed to open file %s for writer, error: %s", + newfile->PathName().c_str(), s.ToString().c_str()); + break; + } // Can't use header beyond this point newfile->header_ = std::move(header); newfile->header_valid_ = true; @@ -1720,7 +1756,8 @@ std::pair BlobDBImpl::DeleteObsoleteFiles(bool aborted) { bfile->PathName().c_str()); blob_files_.erase(bfile->BlobFileNumber()); - Status s = env_->DeleteFile(bfile->PathName()); + Status s = DeleteDBFile(&(db_impl_->immutable_db_options()), + bfile->PathName(), blob_dir_, true); if (!s.ok()) { ROCKS_LOG_ERROR(db_options_.info_log, "File failed to be deleted as obsolete %s", @@ -1810,7 +1847,7 @@ Status DestroyBlobDB(const std::string& dbname, const Options& options, uint64_t number; FileType type; if (ParseFileName(f, &number, &type) && type == kBlobFile) { - Status del = env->DeleteFile(blobdir + "/" + f); + Status del = DeleteDBFile(&soptions, blobdir + "/" + f, blobdir, true); if (status.ok() && !del.ok()) { status = del; } diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.h b/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.h index 4296d5c6a..0a22c0acd 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.h +++ b/ceph/src/rocksdb/utilities/blob_db/blob_db_impl.h @@ -47,21 +47,6 @@ struct BlobCompactionContext; class BlobDBImpl; class BlobFile; -// this implements the callback from the WAL which ensures that the -// blob record is present in the blob log. If fsync/fdatasync in not -// happening on every write, there is the probability that keys in the -// blob log can lag the keys in blobs -// TODO(yiwu): implement the WAL filter. 
-class BlobReconcileWalFilter : public WalFilter { - public: - virtual WalFilter::WalProcessingOption LogRecordFound( - unsigned long long log_number, const std::string& log_file_name, - const WriteBatch& batch, WriteBatch* new_batch, - bool* batch_changed) override; - - virtual const char* Name() const override { return "BlobDBWalReconciler"; } -}; - // Comparator to sort "TTL" aware Blob files based on the lower value of // TTL range. struct BlobFileComparatorTTL { @@ -212,6 +197,8 @@ class BlobDBImpl : public BlobDB { void TEST_DeleteObsoleteFiles(); uint64_t TEST_live_sst_size(); + + const std::string& TEST_blob_dir() const { return blob_dir_; } #endif // !NDEBUG private: @@ -255,10 +242,11 @@ // find an existing blob log file based on the expiration unix epoch // if such a file does not exist, return nullptr - std::shared_ptr<BlobFile> SelectBlobFileTTL(uint64_t expiration); + Status SelectBlobFileTTL(uint64_t expiration, + std::shared_ptr<BlobFile>* blob_file); // find an existing blob log file to append the value to - std::shared_ptr<BlobFile> SelectBlobFile(); + Status SelectBlobFile(std::shared_ptr<BlobFile>* blob_file); std::shared_ptr<BlobFile> FindBlobFileLocked(uint64_t expiration) const; @@ -309,8 +297,8 @@ // returns a Writer object for the file. If writer is not // already present, creates one. Needs Write Mutex to be held - std::shared_ptr<Writer> CheckOrCreateWriterLocked( - const std::shared_ptr<BlobFile>& bfile); + Status CheckOrCreateWriterLocked(const std::shared_ptr<BlobFile>& blob_file, + std::shared_ptr<Writer>* writer);
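These header changes pair with the call-site updates in blob_db_impl.cc above: SelectBlobFile(), SelectBlobFileTTL() and CheckOrCreateWriterLocked() used to signal failure with a bare nullptr, and now return a Status with the file or writer as an out-parameter, so the underlying I/O error reaches the caller. A hypothetical caller under the new signatures (the comments on the error paths are illustrative):

    std::shared_ptr<BlobFile> blob_file;
    Status s = SelectBlobFile(&blob_file);  // was: blob_file = SelectBlobFile();
    if (!s.ok()) {
      return s;  // open/create failure propagates instead of yielding nullptr
    }
    std::shared_ptr<Writer> writer;
    s = CheckOrCreateWriterLocked(blob_file, &writer);
    if (!s.ok()) {
      return s;
    }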
// Iterate through keys and values on Blob and write into // separate file the remaining blobs and delete/update pointers @@ -347,7 +335,8 @@ ColumnFamilyOptions cf_options_; EnvOptions env_options_; - // Raw pointer of statistic. db_options_ has a shared_ptr to hold ownership. + // Raw pointer of statistic. db_options_ has a std::shared_ptr to hold + // ownership. Statistics* statistics_; // by default this is "blob_dir" under dbname_ diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_db_test.cc b/ceph/src/rocksdb/utilities/blob_db/blob_db_test.cc index cf8f1217a..afb953df9 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_db_test.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_db_test.cc @@ -6,6 +6,7 @@ #ifndef ROCKSDB_LITE #include +#include #include #include #include @@ -17,7 +18,9 @@ #include "rocksdb/utilities/debug.h" #include "util/cast_util.h" #include "util/fault_injection_test_env.h" +#include "util/file_util.h" #include "util/random.h" +#include "util/sst_file_manager_impl.h" #include "util/string_util.h" #include "util/sync_point.h" #include "util/testharness.h" @@ -47,7 +50,7 @@ class BlobDBTest : public testing::Test { assert(s.ok()); } - ~BlobDBTest() { + ~BlobDBTest() override { SyncPoint::GetInstance()->ClearAllCallBacks(); Destroy(); } @@ -71,6 +74,12 @@ class BlobDBTest : public testing::Test { Open(bdb_options, options); } + void Close() { + assert(blob_db_ != nullptr); + delete blob_db_; + blob_db_ = nullptr; + } + void Destroy() { if (blob_db_) { Options options = blob_db_->GetOptions(); @@ -374,6 +383,19 @@ TEST_F(BlobDBTest, GetIOError) { fault_injection_env_->SetFilesystemActive(true); } +TEST_F(BlobDBTest, PutIOError) { + Options options; + options.env = fault_injection_env_.get(); + BlobDBOptions bdb_options; + bdb_options.min_blob_size = 0; // Make sure the value is written to a blob file + bdb_options.disable_background_tasks = true; + Open(bdb_options, options); + fault_injection_env_->SetFilesystemActive(false, Status::IOError()); + ASSERT_TRUE(Put("foo", "v1").IsIOError()); + fault_injection_env_->SetFilesystemActive(true, Status::IOError()); + ASSERT_OK(Put("bar", "v1")); +} + TEST_F(BlobDBTest, WriteBatch) { Random rnd(301); BlobDBOptions bdb_options; @@ -749,6 +771,115 @@ TEST_F(BlobDBTest, ReadWhileGC) { } } +TEST_F(BlobDBTest, SstFileManager) { + // run the same test for Get(), MultiGet() and Iterator each. + std::shared_ptr<SstFileManager> sst_file_manager( + NewSstFileManager(mock_env_.get())); + sst_file_manager->SetDeleteRateBytesPerSecond(1); + SstFileManagerImpl *sfm = + static_cast<SstFileManagerImpl *>(sst_file_manager.get()); + + BlobDBOptions bdb_options; + bdb_options.min_blob_size = 0; + Options db_options; + + int files_deleted_directly = 0; + int files_scheduled_to_delete = 0; + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "SstFileManagerImpl::ScheduleFileDeletion", + [&](void * /*arg*/) { files_scheduled_to_delete++; }); + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "DeleteScheduler::DeleteFile", + [&](void * /*arg*/) { files_deleted_directly++; }); + SyncPoint::GetInstance()->EnableProcessing(); + db_options.sst_file_manager = sst_file_manager; + + Open(bdb_options, db_options); + + // Create one obsolete file and clean it. + blob_db_->Put(WriteOptions(), "foo", "bar"); + auto blob_files = blob_db_impl()->TEST_GetBlobFiles(); + ASSERT_EQ(1, blob_files.size()); + std::shared_ptr<BlobFile> bfile = blob_files[0]; + ASSERT_OK(blob_db_impl()->TEST_CloseBlobFile(bfile)); + GCStats gc_stats; + ASSERT_OK(blob_db_impl()->TEST_GCFileAndUpdateLSM(bfile, &gc_stats)); + blob_db_impl()->TEST_DeleteObsoleteFiles(); + + // Even if SstFileManager is not set, the DB creates a dummy one. + ASSERT_EQ(1, files_scheduled_to_delete); + ASSERT_EQ(0, files_deleted_directly); + Destroy(); + // Make sure that DestroyBlobDB() also goes through the delete scheduler.
+ ASSERT_GE(files_scheduled_to_delete, 2); + // Due to a timing issue, the WAL may or may not be deleted directly. The + // blob file is first scheduled, followed by the WAL. If the background trash + // thread does not wake up in time, the WAL file will be deleted directly, + // as the trash size will be > DB size. + ASSERT_LE(files_deleted_directly, 1); + SyncPoint::GetInstance()->DisableProcessing(); + sfm->WaitForEmptyTrash(); +} + +TEST_F(BlobDBTest, SstFileManagerRestart) { + int files_deleted_directly = 0; + int files_scheduled_to_delete = 0; + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "SstFileManagerImpl::ScheduleFileDeletion", + [&](void * /*arg*/) { files_scheduled_to_delete++; }); + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "DeleteScheduler::DeleteFile", + [&](void * /*arg*/) { files_deleted_directly++; }); + + // run the same test for Get(), MultiGet() and Iterator each. + std::shared_ptr<SstFileManager> sst_file_manager( + NewSstFileManager(mock_env_.get())); + sst_file_manager->SetDeleteRateBytesPerSecond(1); + SstFileManagerImpl *sfm = + static_cast<SstFileManagerImpl *>(sst_file_manager.get()); + + BlobDBOptions bdb_options; + bdb_options.min_blob_size = 0; + Options db_options; + + SyncPoint::GetInstance()->EnableProcessing(); + db_options.sst_file_manager = sst_file_manager; + + Open(bdb_options, db_options); + std::string blob_dir = blob_db_impl()->TEST_blob_dir(); + blob_db_->Put(WriteOptions(), "foo", "bar"); + Close(); + + // Create 3 dummy trash files under the blob_dir + CreateFile(db_options.env, blob_dir + "/000666.blob.trash", "", false); + CreateFile(db_options.env, blob_dir + "/000888.blob.trash", "", true); + CreateFile(db_options.env, blob_dir + "/something_not_match.trash", "", + false); + + // Make sure that reopening the DB rescans the existing trash files + Open(bdb_options, db_options); + ASSERT_GE(files_scheduled_to_delete, 3); + // Depending on timing, the WAL file may or may not be directly deleted + ASSERT_LE(files_deleted_directly, 1); + + sfm->WaitForEmptyTrash(); + + // There should be exactly one file under the blob dir now. + std::vector<std::string> all_files; + ASSERT_OK(db_options.env->GetChildren(blob_dir, &all_files)); + int nfiles = 0; + for (const auto &f : all_files) { + assert(!f.empty()); + if (f[0] == '.') { + continue; + } + nfiles++; + } + ASSERT_EQ(nfiles, 1); + + SyncPoint::GetInstance()->DisableProcessing(); +} +
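Both tests above hinge on SstFileManager's delete scheduler, which rate-limits file removal by renaming files to *.trash and draining them in the background. A minimal standalone setup, assuming a plain Env::Default() in place of the tests' mock env and an example rate of 1 MB/s:

    std::shared_ptr<SstFileManager> sfm(NewSstFileManager(Env::Default()));
    sfm->SetDeleteRateBytesPerSecond(1024 * 1024);  // drain trash at ~1 MB/s
    Options options;
    options.sst_file_manager = sfm;
    // Any DB opened with these options routes file deletions through the
    // scheduler, which is what the sync-point counters above observe.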
TEST_F(BlobDBTest, SnapshotAndGarbageCollection) { BlobDBOptions bdb_options; bdb_options.min_blob_size = 0; @@ -1174,12 +1305,12 @@ TEST_F(BlobDBTest, InlineSmallValues) { TEST_F(BlobDBTest, CompactionFilterNotSupported) { class TestCompactionFilter : public CompactionFilter { - virtual const char *Name() const { return "TestCompactionFilter"; } + const char *Name() const override { return "TestCompactionFilter"; } }; class TestCompactionFilterFactory : public CompactionFilterFactory { - virtual const char *Name() const { return "TestCompactionFilterFactory"; } - virtual std::unique_ptr<CompactionFilter> CreateCompactionFilter( - const CompactionFilter::Context & /*context*/) { + const char *Name() const override { return "TestCompactionFilterFactory"; } + std::unique_ptr<CompactionFilter> CreateCompactionFilter( + const CompactionFilter::Context & /*context*/) override { return std::unique_ptr<CompactionFilter>(new TestCompactionFilter()); } }; @@ -1482,6 +1613,68 @@ TEST_F(BlobDBTest, DisableFileDeletions) { } } +TEST_F(BlobDBTest, ShutdownWait) { + BlobDBOptions bdb_options; + bdb_options.ttl_range_secs = 100; + bdb_options.min_blob_size = 0; + bdb_options.disable_background_tasks = false; + Options options; + options.env = mock_env_.get(); + + SyncPoint::GetInstance()->LoadDependency({ + {"BlobDBImpl::EvictExpiredFiles:0", "BlobDBTest.ShutdownWait:0"}, + {"BlobDBTest.ShutdownWait:1", "BlobDBImpl::EvictExpiredFiles:1"}, + {"BlobDBImpl::EvictExpiredFiles:2", "BlobDBTest.ShutdownWait:2"}, + {"BlobDBTest.ShutdownWait:3", "BlobDBImpl::EvictExpiredFiles:3"}, + }); + // Force all tasks to be scheduled immediately. + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "TimeQueue::Add:item.end", [&](void *arg) { + std::chrono::steady_clock::time_point *tp = + static_cast<std::chrono::steady_clock::time_point *>(arg); + *tp = + std::chrono::steady_clock::now() - std::chrono::milliseconds(10000); + }); + + rocksdb::SyncPoint::GetInstance()->SetCallBack( + "BlobDBImpl::EvictExpiredFiles:cb", [&](void * /*arg*/) { + // Sleep 3 ms to increase the chance of a data race. + // We've synced up the code so that EvictExpiredFiles() + // is called concurrently with ~BlobDBImpl(). + // ~BlobDBImpl() is supposed to wait for all background + // tasks to shut down before doing anything else. In order + // to use the same test to reproduce a bug in the waiting + // logic, we wait a little bit here, so that TSAN can + // catch the data race. + // We should improve the test if we find a better way. + Env::Default()->SleepForMicroseconds(3000); + }); + + SyncPoint::GetInstance()->EnableProcessing(); + + Open(bdb_options, options); + mock_env_->set_current_time(50); + std::map<std::string, std::string> data; + ASSERT_OK(PutWithTTL("foo", "bar", 100, &data)); + auto blob_files = blob_db_impl()->TEST_GetBlobFiles(); + ASSERT_EQ(1, blob_files.size()); + auto blob_file = blob_files[0]; + ASSERT_FALSE(blob_file->Immutable()); + ASSERT_FALSE(blob_file->Obsolete()); + VerifyDB(data); + + TEST_SYNC_POINT("BlobDBTest.ShutdownWait:0"); + mock_env_->set_current_time(250); + // The key should have expired by now.
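+ // (The TEST_SYNC_POINT calls below pair with the LoadDependency list at
+ // the top of this test: for each {"A", "B"} pair, a thread reaching
+ // TEST_SYNC_POINT("B") blocks until some thread has passed
+ // TEST_SYNC_POINT("A"). With illustrative point names:
+ //   SyncPoint::GetInstance()->LoadDependency(
+ //       {{"Producer:done", "Consumer:start"}});
+ //   TEST_SYNC_POINT("Producer:done");    // thread 1
+ //   TEST_SYNC_POINT("Consumer:start");   // thread 2 waits for thread 1
+ // That ordering is how this test forces EvictExpiredFiles() to overlap
+ // the shutdown in Close().)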
+ TEST_SYNC_POINT("BlobDBTest.ShutdownWait:1"); + + TEST_SYNC_POINT("BlobDBTest.ShutdownWait:2"); + TEST_SYNC_POINT("BlobDBTest.ShutdownWait:3"); + Close(); + + SyncPoint::GetInstance()->DisableProcessing(); +} + } // namespace blob_db } // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.cc b/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.cc index 7ce0697e3..37eee19db 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.cc @@ -208,10 +208,13 @@ Status BlobDumpTool::DumpRecord(DisplayType show_key, DisplayType show_blob, if (compression != kNoCompression && (show_uncompressed_blob != DisplayType::kNone || show_summary)) { BlockContents contents; - UncompressionContext uncompression_ctx(compression); + UncompressionContext context(compression); + UncompressionInfo info(context, UncompressionDict::GetEmptyDict(), + compression); s = UncompressBlockContentsForCompressionType( - uncompression_ctx, slice.data() + key_size, static_cast(value_size), - &contents, 2 /*compress_format_version*/, ImmutableCFOptions(Options())); + info, slice.data() + key_size, static_cast(value_size), + &contents, 2 /*compress_format_version*/, + ImmutableCFOptions(Options())); if (!s.ok()) { return s; } diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.h b/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.h index e91feffa7..ff4672fd3 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.h +++ b/ceph/src/rocksdb/utilities/blob_db/blob_dump_tool.h @@ -33,7 +33,7 @@ class BlobDumpTool { private: std::unique_ptr reader_; - std::unique_ptr buffer_; + std::unique_ptr buffer_; size_t buffer_size_; Status Read(uint64_t offset, size_t size, Slice* result); diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_file.cc b/ceph/src/rocksdb/utilities/blob_db/blob_file.cc index 6e70bdcb0..3bcbd0487 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_file.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_file.cc @@ -244,14 +244,14 @@ Status BlobFile::ReadMetadata(Env* env, const EnvOptions& env_options) { file_size_ = file_size; } else { ROCKS_LOG_ERROR(info_log_, - "Failed to get size of blob file %" ROCKSDB_PRIszt + "Failed to get size of blob file %" PRIu64 ", status: %s", file_number_, s.ToString().c_str()); return s; } if (file_size < BlobLogHeader::kSize) { ROCKS_LOG_ERROR(info_log_, - "Incomplete blob file blob file %" ROCKSDB_PRIszt + "Incomplete blob file blob file %" PRIu64 ", size: %" PRIu64, file_number_, file_size); return Status::Corruption("Incomplete blob file header."); @@ -262,7 +262,7 @@ Status BlobFile::ReadMetadata(Env* env, const EnvOptions& env_options) { s = env->NewRandomAccessFile(PathName(), &file, env_options); if (!s.ok()) { ROCKS_LOG_ERROR(info_log_, - "Failed to open blob file %" ROCKSDB_PRIszt ", status: %s", + "Failed to open blob file %" PRIu64 ", status: %s", file_number_, s.ToString().c_str()); return s; } @@ -275,7 +275,7 @@ Status BlobFile::ReadMetadata(Env* env, const EnvOptions& env_options) { s = file_reader->Read(0, BlobLogHeader::kSize, &header_slice, header_buf); if (!s.ok()) { ROCKS_LOG_ERROR(info_log_, - "Failed to read header of blob file %" ROCKSDB_PRIszt + "Failed to read header of blob file %" PRIu64 ", status: %s", file_number_, s.ToString().c_str()); return s; @@ -284,7 +284,7 @@ Status BlobFile::ReadMetadata(Env* env, const EnvOptions& env_options) { s = header.DecodeFrom(header_slice); if (!s.ok()) { ROCKS_LOG_ERROR(info_log_, - "Failed to decode header of 
blob file %" ROCKSDB_PRIszt + "Failed to decode header of blob file %" PRIu64 ", status: %s", file_number_, s.ToString().c_str()); return s; @@ -309,7 +309,7 @@ Status BlobFile::ReadMetadata(Env* env, const EnvOptions& env_options) { &footer_slice, footer_buf); if (!s.ok()) { ROCKS_LOG_ERROR(info_log_, - "Failed to read footer of blob file %" ROCKSDB_PRIszt + "Failed to read footer of blob file %" PRIu64 ", status: %s", file_number_, s.ToString().c_str()); return s; diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_log_format.cc b/ceph/src/rocksdb/utilities/blob_db/blob_log_format.cc index 2bf702848..8726cb8f1 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_log_format.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_log_format.cc @@ -82,7 +82,7 @@ Status BlobLogFooter::DecodeFrom(Slice src) { uint32_t src_crc = 0; src_crc = crc32c::Value(src.data(), BlobLogFooter::kSize - sizeof(uint32_t)); src_crc = crc32c::Mask(src_crc); - uint32_t magic_number; + uint32_t magic_number = 0; if (!GetFixed32(&src, &magic_number) || !GetFixed64(&src, &blob_count) || !GetFixed64(&src, &expiration_range.first) || !GetFixed64(&src, &expiration_range.second) || !GetFixed32(&src, &crc)) { diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_log_format.h b/ceph/src/rocksdb/utilities/blob_db/blob_log_format.h index 3e1b686aa..fcc042f06 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_log_format.h +++ b/ceph/src/rocksdb/utilities/blob_db/blob_log_format.h @@ -10,7 +10,9 @@ #ifndef ROCKSDB_LITE #include +#include #include + #include "rocksdb/options.h" #include "rocksdb/slice.h" #include "rocksdb/status.h" @@ -106,8 +108,8 @@ struct BlobLogRecord { uint32_t blob_crc = 0; Slice key; Slice value; - std::string key_buf; - std::string value_buf; + std::unique_ptr key_buf; + std::unique_ptr value_buf; uint64_t record_size() const { return kHeaderSize + key_size + value_size; } diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.cc b/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.cc index 4996d987b..8ffcc2fa1 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.cc @@ -16,7 +16,7 @@ namespace rocksdb { namespace blob_db { -Reader::Reader(unique_ptr&& file_reader, Env* env, +Reader::Reader(std::unique_ptr&& file_reader, Env* env, Statistics* statistics) : file_(std::move(file_reader)), env_(env), @@ -24,10 +24,9 @@ Reader::Reader(unique_ptr&& file_reader, Env* env, buffer_(), next_byte_(0) {} -Status Reader::ReadSlice(uint64_t size, Slice* slice, std::string* buf) { +Status Reader::ReadSlice(uint64_t size, Slice* slice, char* buf) { StopWatch read_sw(env_, statistics_, BLOB_DB_BLOB_FILE_READ_MICROS); - buf->reserve(static_cast(size)); - Status s = file_->Read(next_byte_, static_cast(size), slice, &(*buf)[0]); + Status s = file_->Read(next_byte_, static_cast(size), slice, buf); next_byte_ += size; if (!s.ok()) { return s; @@ -42,7 +41,7 @@ Status Reader::ReadSlice(uint64_t size, Slice* slice, std::string* buf) { Status Reader::ReadHeader(BlobLogHeader* header) { assert(file_.get() != nullptr); assert(next_byte_ == 0); - Status s = ReadSlice(BlobLogHeader::kSize, &buffer_, &backing_store_); + Status s = ReadSlice(BlobLogHeader::kSize, &buffer_, header_buf_); if (!s.ok()) { return s; } @@ -56,7 +55,7 @@ Status Reader::ReadHeader(BlobLogHeader* header) { Status Reader::ReadRecord(BlobLogRecord* record, ReadLevel level, uint64_t* blob_offset) { - Status s = ReadSlice(BlobLogRecord::kHeaderSize, &buffer_, &backing_store_); + Status 
s = ReadSlice(BlobLogRecord::kHeaderSize, &buffer_, header_buf_); if (!s.ok()) { return s; } @@ -80,14 +79,18 @@ Status Reader::ReadRecord(BlobLogRecord* record, ReadLevel level, break; case kReadHeaderKey: - s = ReadSlice(record->key_size, &record->key, &record->key_buf); + record->key_buf.reset(new char[record->key_size]); + s = ReadSlice(record->key_size, &record->key, record->key_buf.get()); next_byte_ += record->value_size; break; case kReadHeaderKeyBlob: - s = ReadSlice(record->key_size, &record->key, &record->key_buf); + record->key_buf.reset(new char[record->key_size]); + s = ReadSlice(record->key_size, &record->key, record->key_buf.get()); if (s.ok()) { - s = ReadSlice(record->value_size, &record->value, &record->value_buf); + record->value_buf.reset(new char[record->value_size]); + s = ReadSlice(record->value_size, &record->value, + record->value_buf.get()); } if (s.ok()) { s = record->CheckBlobCRC(); diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.h b/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.h index 4b780decd..45e2e9551 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.h +++ b/ceph/src/rocksdb/utilities/blob_db/blob_log_reader.h @@ -60,19 +60,19 @@ class Reader { Status ReadRecord(BlobLogRecord* record, ReadLevel level = kReadHeader, uint64_t* blob_offset = nullptr); - Status ReadSlice(uint64_t size, Slice* slice, std::string* buf); - void ResetNextByte() { next_byte_ = 0; } uint64_t GetNextByte() const { return next_byte_; } private: + Status ReadSlice(uint64_t size, Slice* slice, char* buf); + const std::unique_ptr file_; Env* env_; Statistics* statistics_; - std::string backing_store_; Slice buffer_; + char header_buf_[BlobLogRecord::kHeaderSize]; // which byte to read next. For asserting proper usage uint64_t next_byte_; diff --git a/ceph/src/rocksdb/utilities/blob_db/blob_log_writer.cc b/ceph/src/rocksdb/utilities/blob_db/blob_log_writer.cc index 9b0ca74f7..51578c5c3 100644 --- a/ceph/src/rocksdb/utilities/blob_db/blob_log_writer.cc +++ b/ceph/src/rocksdb/utilities/blob_db/blob_log_writer.cc @@ -19,7 +19,7 @@ namespace rocksdb { namespace blob_db { -Writer::Writer(unique_ptr&& dest, Env* env, +Writer::Writer(std::unique_ptr&& dest, Env* env, Statistics* statistics, uint64_t log_number, uint64_t bpsync, bool use_fs, uint64_t boffset) : dest_(std::move(dest)), diff --git a/ceph/src/rocksdb/utilities/cassandra/cassandra_format_test.cc b/ceph/src/rocksdb/utilities/cassandra/cassandra_format_test.cc index e0fe28b3a..8f9baa723 100644 --- a/ceph/src/rocksdb/utilities/cassandra/cassandra_format_test.cc +++ b/ceph/src/rocksdb/utilities/cassandra/cassandra_format_test.cc @@ -125,13 +125,14 @@ TEST(ExpiringColumnTest, ExpiringColumn) { TEST(TombstoneTest, TombstoneCollectable) { int32_t now = (int32_t)time(nullptr); int32_t gc_grace_seconds = 16440; + int32_t time_delta_seconds = 10; EXPECT_TRUE(Tombstone(ColumnTypeMask::DELETION_MASK, 0, - now - gc_grace_seconds, - ToMicroSeconds(now - gc_grace_seconds)) + now - gc_grace_seconds - time_delta_seconds, + ToMicroSeconds(now - gc_grace_seconds - time_delta_seconds)) .Collectable(gc_grace_seconds)); EXPECT_FALSE(Tombstone(ColumnTypeMask::DELETION_MASK, 0, - now - gc_grace_seconds + 1, - ToMicroSeconds(now - gc_grace_seconds + 1)) + now - gc_grace_seconds + time_delta_seconds, + ToMicroSeconds(now - gc_grace_seconds + time_delta_seconds)) .Collectable(gc_grace_seconds)); } diff --git a/ceph/src/rocksdb/utilities/cassandra/cassandra_functional_test.cc 
b/ceph/src/rocksdb/utilities/cassandra/cassandra_functional_test.cc index 3e612b3ad..dacc6f03c 100644 --- a/ceph/src/rocksdb/utilities/cassandra/cassandra_functional_test.cc +++ b/ceph/src/rocksdb/utilities/cassandra/cassandra_functional_test.cc @@ -99,15 +99,13 @@ public: : purge_ttl_on_expiration_(purge_ttl_on_expiration), gc_grace_period_in_seconds_(gc_grace_period_in_seconds) {} - virtual std::unique_ptr CreateCompactionFilter( + std::unique_ptr CreateCompactionFilter( const CompactionFilter::Context& /*context*/) override { - return unique_ptr(new CassandraCompactionFilter( + return std::unique_ptr(new CassandraCompactionFilter( purge_ttl_on_expiration_, gc_grace_period_in_seconds_)); - } + } - virtual const char* Name() const override { - return "TestCompactionFilterFactory"; - } + const char* Name() const override { return "TestCompactionFilterFactory"; } private: bool purge_ttl_on_expiration_; diff --git a/ceph/src/rocksdb/utilities/cassandra/format.cc b/ceph/src/rocksdb/utilities/cassandra/format.cc index 4a22658de..42cd7206b 100644 --- a/ceph/src/rocksdb/utilities/cassandra/format.cc +++ b/ceph/src/rocksdb/utilities/cassandra/format.cc @@ -266,7 +266,7 @@ RowValue RowValue::ConvertExpiredColumnsToTombstones(bool* changed) const { std::static_pointer_cast(column); if(expiring_column->Expired()) { - shared_ptr tombstone = expiring_column->ToTombstone(); + std::shared_ptr tombstone = expiring_column->ToTombstone(); new_columns.push_back(tombstone); *changed = true; continue; diff --git a/ceph/src/rocksdb/utilities/checkpoint/checkpoint_impl.cc b/ceph/src/rocksdb/utilities/checkpoint/checkpoint_impl.cc index 48f9200fb..9863ac1d5 100644 --- a/ceph/src/rocksdb/utilities/checkpoint/checkpoint_impl.cc +++ b/ceph/src/rocksdb/utilities/checkpoint/checkpoint_impl.cc @@ -133,7 +133,7 @@ Status CheckpointImpl::CreateCheckpoint(const std::string& checkpoint_dir, s = db_->GetEnv()->RenameFile(full_private_path, checkpoint_dir); } if (s.ok()) { - unique_ptr checkpoint_directory; + std::unique_ptr checkpoint_directory; db_->GetEnv()->NewDirectory(checkpoint_dir, &checkpoint_directory); if (checkpoint_directory != nullptr) { s = checkpoint_directory->Fsync(); diff --git a/ceph/src/rocksdb/utilities/checkpoint/checkpoint_test.cc b/ceph/src/rocksdb/utilities/checkpoint/checkpoint_test.cc index 62c78faa8..9318a733d 100644 --- a/ceph/src/rocksdb/utilities/checkpoint/checkpoint_test.cc +++ b/ceph/src/rocksdb/utilities/checkpoint/checkpoint_test.cc @@ -66,7 +66,7 @@ class CheckpointTest : public testing::Test { Reopen(options); } - ~CheckpointTest() { + ~CheckpointTest() override { rocksdb::SyncPoint::GetInstance()->DisableProcessing(); rocksdb::SyncPoint::GetInstance()->LoadDependency({}); rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks(); @@ -164,6 +164,16 @@ class CheckpointTest : public testing::Test { return DB::OpenForReadOnly(options, dbname_, &db_); } + Status ReadOnlyReopenWithColumnFamilies(const std::vector& cfs, + const Options& options) { + std::vector column_families; + for (const auto& cf : cfs) { + column_families.emplace_back(cf, options); + } + return DB::OpenForReadOnly(options, dbname_, column_families, &handles_, + &db_); + } + Status TryReopen(const Options& options) { Close(); last_options_ = options; @@ -612,6 +622,69 @@ TEST_F(CheckpointTest, CheckpointWithUnsyncedDataDropped) { db_ = nullptr; } +TEST_F(CheckpointTest, CheckpointReadOnlyDB) { + ASSERT_OK(Put("foo", "foo_value")); + ASSERT_OK(Flush()); + Close(); + Options options = CurrentOptions(); + 
ASSERT_OK(ReadOnlyReopen(options)); + Checkpoint* checkpoint = nullptr; + ASSERT_OK(Checkpoint::Create(db_, &checkpoint)); + ASSERT_OK(checkpoint->CreateCheckpoint(snapshot_name_)); + delete checkpoint; + checkpoint = nullptr; + Close(); + DB* snapshot_db = nullptr; + ASSERT_OK(DB::Open(options, snapshot_name_, &snapshot_db)); + ReadOptions read_opts; + std::string get_result; + ASSERT_OK(snapshot_db->Get(read_opts, "foo", &get_result)); + ASSERT_EQ("foo_value", get_result); + delete snapshot_db; +} + +TEST_F(CheckpointTest, CheckpointReadOnlyDBWithMultipleColumnFamilies) { + Options options = CurrentOptions(); + CreateAndReopenWithCF({"pikachu", "eevee"}, options); + for (int i = 0; i != 3; ++i) { + ASSERT_OK(Put(i, "foo", "foo_value")); + ASSERT_OK(Flush(i)); + } + Close(); + Status s = ReadOnlyReopenWithColumnFamilies( + {kDefaultColumnFamilyName, "pikachu", "eevee"}, options); + ASSERT_OK(s); + Checkpoint* checkpoint = nullptr; + ASSERT_OK(Checkpoint::Create(db_, &checkpoint)); + ASSERT_OK(checkpoint->CreateCheckpoint(snapshot_name_)); + delete checkpoint; + checkpoint = nullptr; + Close(); + + std::vector column_families{ + {kDefaultColumnFamilyName, options}, + {"pikachu", options}, + {"eevee", options}}; + DB* snapshot_db = nullptr; + std::vector snapshot_handles; + s = DB::Open(options, snapshot_name_, column_families, &snapshot_handles, + &snapshot_db); + ASSERT_OK(s); + ReadOptions read_opts; + for (int i = 0; i != 3; ++i) { + std::string get_result; + s = snapshot_db->Get(read_opts, snapshot_handles[i], "foo", &get_result); + ASSERT_OK(s); + ASSERT_EQ("foo_value", get_result); + } + + for (auto snapshot_h : snapshot_handles) { + delete snapshot_h; + } + snapshot_handles.clear(); + delete snapshot_db; +} + } // namespace rocksdb int main(int argc, char** argv) { diff --git a/ceph/src/rocksdb/utilities/col_buf_decoder.cc b/ceph/src/rocksdb/utilities/col_buf_decoder.cc deleted file mode 100644 index 8f9fa74ab..000000000 --- a/ceph/src/rocksdb/utilities/col_buf_decoder.cc +++ /dev/null @@ -1,240 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
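The checkpoint tests above rely on DB::OpenForReadOnly(), including the overload taking explicit column families. A hypothetical minimal read-only open (the path and column family names are illustrative):

    std::vector<ColumnFamilyDescriptor> cfds;
    cfds.emplace_back(kDefaultColumnFamilyName, ColumnFamilyOptions());
    cfds.emplace_back("pikachu", ColumnFamilyOptions());
    std::vector<ColumnFamilyHandle *> handles;
    DB *db = nullptr;
    Status s =
        DB::OpenForReadOnly(DBOptions(), "/tmp/testdb", cfds, &handles, &db);
    // Reads and iterators work as usual; write APIs return NotSupported.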
- -#include "utilities/col_buf_decoder.h" -#include -#include -#include "port/port.h" - -namespace rocksdb { - -ColBufDecoder::~ColBufDecoder() {} - -namespace { - -inline uint64_t EncodeFixed64WithEndian(uint64_t val, bool big_endian, - size_t size) { - if (big_endian && port::kLittleEndian) { - val = EndianTransform(val, size); - } else if (!big_endian && !port::kLittleEndian) { - val = EndianTransform(val, size); - } - return val; -} - -} // namespace - -ColBufDecoder* ColBufDecoder::NewColBufDecoder( - const ColDeclaration& col_declaration) { - if (col_declaration.col_type == "FixedLength") { - return new FixedLengthColBufDecoder( - col_declaration.size, col_declaration.col_compression_type, - col_declaration.nullable, col_declaration.big_endian); - } else if (col_declaration.col_type == "VariableLength") { - return new VariableLengthColBufDecoder(); - } else if (col_declaration.col_type == "VariableChunk") { - return new VariableChunkColBufDecoder(col_declaration.col_compression_type); - } else if (col_declaration.col_type == "LongFixedLength") { - return new LongFixedLengthColBufDecoder(col_declaration.size, - col_declaration.nullable); - } - // Unrecognized column type - return nullptr; -} - -namespace { - -void ReadVarint64(const char** src_ptr, uint64_t* val_ptr) { - const char* q = GetVarint64Ptr(*src_ptr, *src_ptr + 10, val_ptr); - assert(q != nullptr); - *src_ptr = q; -} -} // namespace - -size_t FixedLengthColBufDecoder::Init(const char* src) { - remain_runs_ = 0; - last_val_ = 0; - // Dictionary initialization - dict_vec_.clear(); - const char* orig_src = src; - if (col_compression_type_ == kColDict || - col_compression_type_ == kColRleDict) { - const char* q; - uint64_t dict_size; - // Bypass limit - q = GetVarint64Ptr(src, src + 10, &dict_size); - assert(q != nullptr); - src = q; - - uint64_t dict_key; - for (uint64_t i = 0; i < dict_size; ++i) { - // Bypass limit - ReadVarint64(&src, &dict_key); - - dict_key = EncodeFixed64WithEndian(dict_key, big_endian_, size_); - dict_vec_.push_back(dict_key); - } - } - return src - orig_src; -} - -size_t FixedLengthColBufDecoder::Decode(const char* src, char** dest) { - uint64_t read_val = 0; - const char* orig_src = src; - const char* src_limit = src + 20; - if (nullable_) { - bool not_null; - not_null = *src; - src += 1; - if (!not_null) { - return 1; - } - } - if (IsRunLength(col_compression_type_)) { - if (remain_runs_ == 0) { - const char* q; - run_val_ = 0; - if (col_compression_type_ == kColRle) { - memcpy(&run_val_, src, size_); - src += size_; - } else { - q = GetVarint64Ptr(src, src_limit, &run_val_); - assert(q != nullptr); - src = q; - } - - q = GetVarint64Ptr(src, src_limit, &remain_runs_); - assert(q != nullptr); - src = q; - - if (col_compression_type_ != kColRleDeltaVarint && - col_compression_type_ != kColRleDict) { - run_val_ = EncodeFixed64WithEndian(run_val_, big_endian_, size_); - } - } - read_val = run_val_; - } else { - if (col_compression_type_ == kColNoCompression) { - memcpy(&read_val, src, size_); - src += size_; - } else { - // Assume a column does not exceed 8 bytes here - const char* q = GetVarint64Ptr(src, src_limit, &read_val); - assert(q != nullptr); - src = q; - } - if (col_compression_type_ != kColDeltaVarint && - col_compression_type_ != kColDict) { - read_val = EncodeFixed64WithEndian(read_val, big_endian_, size_); - } - } - - uint64_t write_val = read_val; - if (col_compression_type_ == kColDeltaVarint || - col_compression_type_ == kColRleDeltaVarint) { - // does not support 64 bit - - uint64_t 
mask = (write_val & 1) ? (~uint64_t(0)) : 0; - int64_t delta = (write_val >> 1) ^ mask; - write_val = last_val_ + delta; - - uint64_t tmp = write_val; - write_val = EncodeFixed64WithEndian(write_val, big_endian_, size_); - last_val_ = tmp; - } else if (col_compression_type_ == kColRleDict || - col_compression_type_ == kColDict) { - uint64_t dict_val = read_val; - assert(dict_val < dict_vec_.size()); - write_val = dict_vec_[static_cast(dict_val)]; - } - - // dest->append(reinterpret_cast(&write_val), size_); - memcpy(*dest, reinterpret_cast(&write_val), size_); - *dest += size_; - if (IsRunLength(col_compression_type_)) { - --remain_runs_; - } - return src - orig_src; -} - -size_t LongFixedLengthColBufDecoder::Decode(const char* src, char** dest) { - if (nullable_) { - bool not_null; - not_null = *src; - src += 1; - if (!not_null) { - return 1; - } - } - memcpy(*dest, src, size_); - *dest += size_; - return size_ + 1; -} - -size_t VariableLengthColBufDecoder::Decode(const char* src, char** dest) { - uint8_t len; - len = *src; - memcpy(dest, reinterpret_cast(&len), 1); - *dest += 1; - src += 1; - memcpy(*dest, src, len); - *dest += len; - return len + 1; -} - -size_t VariableChunkColBufDecoder::Init(const char* src) { - // Dictionary initialization - dict_vec_.clear(); - const char* orig_src = src; - if (col_compression_type_ == kColDict) { - const char* q; - uint64_t dict_size; - // Bypass limit - q = GetVarint64Ptr(src, src + 10, &dict_size); - assert(q != nullptr); - src = q; - - uint64_t dict_key; - for (uint64_t i = 0; i < dict_size; ++i) { - // Bypass limit - ReadVarint64(&src, &dict_key); - dict_vec_.push_back(dict_key); - } - } - return src - orig_src; -} - -size_t VariableChunkColBufDecoder::Decode(const char* src, char** dest) { - const char* orig_src = src; - uint64_t size = 0; - ReadVarint64(&src, &size); - int64_t full_chunks = size / 8; - uint64_t chunk_buf; - size_t chunk_size = 8; - for (int64_t i = 0; i < full_chunks + 1; ++i) { - chunk_buf = 0; - if (i == full_chunks) { - chunk_size = size % 8; - } - if (col_compression_type_ == kColDict) { - uint64_t dict_val; - ReadVarint64(&src, &dict_val); - assert(dict_val < dict_vec_.size()); - chunk_buf = dict_vec_[static_cast(dict_val)]; - } else { - memcpy(&chunk_buf, src, chunk_size); - src += chunk_size; - } - memcpy(*dest, reinterpret_cast(&chunk_buf), 8); - *dest += 8; - uint8_t mask = ((0xFF - 8) + chunk_size) & 0xFF; - memcpy(*dest, reinterpret_cast(&mask), 1); - *dest += 1; - } - - return src - orig_src; -} - -} // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/col_buf_decoder.h b/ceph/src/rocksdb/utilities/col_buf_decoder.h deleted file mode 100644 index cea952637..000000000 --- a/ceph/src/rocksdb/utilities/col_buf_decoder.h +++ /dev/null @@ -1,119 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#pragma once -#include -#include -#include -#include -#include -#include -#include "util/coding.h" -#include "utilities/col_buf_encoder.h" - -namespace rocksdb { - -struct ColDeclaration; - -// ColBufDecoder is a class to decode column buffers. It can be populated from a -// ColDeclaration. Before starting decoding, a Init() method should be called. -// Each time it takes a column value into Decode() method. 
-class ColBufDecoder { - public: - virtual ~ColBufDecoder() = 0; - virtual size_t Init(const char* /*src*/) { return 0; } - virtual size_t Decode(const char* src, char** dest) = 0; - static ColBufDecoder* NewColBufDecoder(const ColDeclaration& col_declaration); - - protected: - std::string buffer_; - static inline bool IsRunLength(ColCompressionType type) { - return type == kColRle || type == kColRleVarint || - type == kColRleDeltaVarint || type == kColRleDict; - } -}; - -class FixedLengthColBufDecoder : public ColBufDecoder { - public: - explicit FixedLengthColBufDecoder( - size_t size, ColCompressionType col_compression_type = kColNoCompression, - bool nullable = false, bool big_endian = false) - : size_(size), - col_compression_type_(col_compression_type), - nullable_(nullable), - big_endian_(big_endian), - remain_runs_(0), - run_val_(0), - last_val_(0) {} - - size_t Init(const char* src) override; - size_t Decode(const char* src, char** dest) override; - ~FixedLengthColBufDecoder() {} - - private: - size_t size_; - ColCompressionType col_compression_type_; - bool nullable_; - bool big_endian_; - - // for decoding - std::vector dict_vec_; - uint64_t remain_runs_; - uint64_t run_val_; - uint64_t last_val_; -}; - -class LongFixedLengthColBufDecoder : public ColBufDecoder { - public: - LongFixedLengthColBufDecoder(size_t size, bool nullable) - : size_(size), nullable_(nullable) {} - - size_t Decode(const char* src, char** dest) override; - ~LongFixedLengthColBufDecoder() {} - - private: - size_t size_; - bool nullable_; -}; - -class VariableLengthColBufDecoder : public ColBufDecoder { - public: - size_t Decode(const char* src, char** dest) override; - ~VariableLengthColBufDecoder() {} -}; - -class VariableChunkColBufDecoder : public VariableLengthColBufDecoder { - public: - size_t Init(const char* src) override; - size_t Decode(const char* src, char** dest) override; - explicit VariableChunkColBufDecoder(ColCompressionType col_compression_type) - : col_compression_type_(col_compression_type) {} - VariableChunkColBufDecoder() : col_compression_type_(kColNoCompression) {} - - private: - ColCompressionType col_compression_type_; - std::unordered_map dictionary_; - std::vector dict_vec_; -}; - -struct KVPairColBufDecoders { - std::vector> key_col_bufs; - std::vector> value_col_bufs; - std::unique_ptr value_checksum_buf; - - explicit KVPairColBufDecoders(const KVPairColDeclarations& kvp_cd) { - for (auto kcd : *kvp_cd.key_col_declarations) { - key_col_bufs.emplace_back( - std::move(ColBufDecoder::NewColBufDecoder(kcd))); - } - for (auto vcd : *kvp_cd.value_col_declarations) { - value_col_bufs.emplace_back( - std::move(ColBufDecoder::NewColBufDecoder(vcd))); - } - value_checksum_buf.reset( - ColBufDecoder::NewColBufDecoder(*kvp_cd.value_checksum_declaration)); - } -}; -} // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/col_buf_encoder.cc b/ceph/src/rocksdb/utilities/col_buf_encoder.cc deleted file mode 100644 index f8b19e8c7..000000000 --- a/ceph/src/rocksdb/utilities/col_buf_encoder.cc +++ /dev/null @@ -1,210 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
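The kColDeltaVarint and kColRleDeltaVarint paths in the deleted decoder above rely on zigzag coding: the matching encoder (in col_buf_encoder.cc below) maps each signed delta through (delta << 1) ^ (delta >> 63) so small magnitudes stay small under varint, and the decoder inverts it with the mask trick. Extracted as a standalone pair for clarity (the function names are ours; the logic mirrors the deleted source):

    #include <cstdint>

    // Zigzag map: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    uint64_t ZigzagEncode(int64_t delta) {
      return (static_cast<uint64_t>(delta) << 1) ^
             static_cast<uint64_t>(delta >> 63);  // arithmetic-shift sign fill
    }
    int64_t ZigzagDecode(uint64_t val) {
      uint64_t mask = (val & 1) ? ~uint64_t(0) : 0;  // all-ones if negative
      return static_cast<int64_t>((val >> 1) ^ mask);
    }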
- -#include "utilities/col_buf_encoder.h" -#include -#include -#include "port/port.h" - -namespace rocksdb { - -ColBufEncoder::~ColBufEncoder() {} - -namespace { - -inline uint64_t DecodeFixed64WithEndian(uint64_t val, bool big_endian, - size_t size) { - if (big_endian && port::kLittleEndian) { - val = EndianTransform(val, size); - } else if (!big_endian && !port::kLittleEndian) { - val = EndianTransform(val, size); - } - return val; -} - -} // namespace - -const std::string &ColBufEncoder::GetData() { return buffer_; } - -ColBufEncoder *ColBufEncoder::NewColBufEncoder( - const ColDeclaration &col_declaration) { - if (col_declaration.col_type == "FixedLength") { - return new FixedLengthColBufEncoder( - col_declaration.size, col_declaration.col_compression_type, - col_declaration.nullable, col_declaration.big_endian); - } else if (col_declaration.col_type == "VariableLength") { - return new VariableLengthColBufEncoder(); - } else if (col_declaration.col_type == "VariableChunk") { - return new VariableChunkColBufEncoder(col_declaration.col_compression_type); - } else if (col_declaration.col_type == "LongFixedLength") { - return new LongFixedLengthColBufEncoder(col_declaration.size, - col_declaration.nullable); - } - // Unrecognized column type - return nullptr; -} - -size_t FixedLengthColBufEncoder::Append(const char *buf) { - if (nullable_) { - if (buf == nullptr) { - buffer_.append(1, 0); - return 0; - } else { - buffer_.append(1, 1); - } - } - uint64_t read_val = 0; - memcpy(&read_val, buf, size_); - read_val = DecodeFixed64WithEndian(read_val, big_endian_, size_); - - // Determine write value - uint64_t write_val = read_val; - if (col_compression_type_ == kColDeltaVarint || - col_compression_type_ == kColRleDeltaVarint) { - int64_t delta = read_val - last_val_; - // Encode signed delta value - delta = (static_cast(delta) << 1) ^ (delta >> 63); - write_val = delta; - last_val_ = read_val; - } else if (col_compression_type_ == kColDict || - col_compression_type_ == kColRleDict) { - auto iter = dictionary_.find(read_val); - uint64_t dict_val; - if (iter == dictionary_.end()) { - // Add new entry to dictionary - dict_val = dictionary_.size(); - dictionary_.insert(std::make_pair(read_val, dict_val)); - dict_vec_.push_back(read_val); - } else { - dict_val = iter->second; - } - write_val = dict_val; - } - - // Write into buffer - if (IsRunLength(col_compression_type_)) { - if (run_length_ == -1) { - // First element - run_val_ = write_val; - run_length_ = 1; - } else if (write_val != run_val_) { - // End of run - // Write run value - if (col_compression_type_ == kColRle) { - buffer_.append(reinterpret_cast(&run_val_), size_); - } else { - PutVarint64(&buffer_, run_val_); - } - // Write run length - PutVarint64(&buffer_, run_length_); - run_val_ = write_val; - run_length_ = 1; - } else { - run_length_++; - } - } else { // non run-length encodings - if (col_compression_type_ == kColNoCompression) { - buffer_.append(reinterpret_cast(&write_val), size_); - } else { - PutVarint64(&buffer_, write_val); - } - } - return size_; -} - -void FixedLengthColBufEncoder::Finish() { - if (col_compression_type_ == kColDict || - col_compression_type_ == kColRleDict) { - std::string header; - PutVarint64(&header, dict_vec_.size()); - // Put dictionary in the header - for (auto item : dict_vec_) { - PutVarint64(&header, item); - } - buffer_ = header + buffer_; - } - if (IsRunLength(col_compression_type_)) { - // Finish last run value - if (col_compression_type_ == kColRle) { - 
buffer_.append(reinterpret_cast(&run_val_), size_); - } else { - PutVarint64(&buffer_, run_val_); - } - PutVarint64(&buffer_, run_length_); - } -} - -size_t LongFixedLengthColBufEncoder::Append(const char *buf) { - if (nullable_) { - if (buf == nullptr) { - buffer_.append(1, 0); - return 0; - } else { - buffer_.append(1, 1); - } - } - buffer_.append(buf, size_); - return size_; -} - -void LongFixedLengthColBufEncoder::Finish() {} - -size_t VariableLengthColBufEncoder::Append(const char *buf) { - uint8_t length = 0; - length = *buf; - buffer_.append(buf, 1); - buf += 1; - buffer_.append(buf, length); - return length + 1; -} - -void VariableLengthColBufEncoder::Finish() {} - -size_t VariableChunkColBufEncoder::Append(const char *buf) { - const char *orig_buf = buf; - uint8_t mark = 0xFF; - size_t length = 0; - std::string tmp_buffer; - while (mark == 0xFF) { - uint64_t val; - memcpy(&val, buf, 8); - buf += 8; - mark = *buf; - buf += 1; - int8_t chunk_size = 8 - (0xFF - mark); - if (col_compression_type_ == kColDict) { - auto iter = dictionary_.find(val); - uint64_t dict_val; - if (iter == dictionary_.end()) { - dict_val = dictionary_.size(); - dictionary_.insert(std::make_pair(val, dict_val)); - dict_vec_.push_back(val); - } else { - dict_val = iter->second; - } - PutVarint64(&tmp_buffer, dict_val); - } else { - tmp_buffer.append(reinterpret_cast(&val), chunk_size); - } - length += chunk_size; - } - - PutVarint64(&buffer_, length); - buffer_.append(tmp_buffer); - return buf - orig_buf; -} - -void VariableChunkColBufEncoder::Finish() { - if (col_compression_type_ == kColDict) { - std::string header; - PutVarint64(&header, dict_vec_.size()); - for (auto item : dict_vec_) { - PutVarint64(&header, item); - } - buffer_ = header + buffer_; - } -} - -} // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/col_buf_encoder.h b/ceph/src/rocksdb/utilities/col_buf_encoder.h deleted file mode 100644 index 902879925..000000000 --- a/ceph/src/rocksdb/utilities/col_buf_encoder.h +++ /dev/null @@ -1,219 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#pragma once -#include -#include -#include -#include -#include -#include -#include "util/coding.h" - -namespace rocksdb { - -enum ColCompressionType { - kColNoCompression, - kColRle, - kColVarint, - kColRleVarint, - kColDeltaVarint, - kColRleDeltaVarint, - kColDict, - kColRleDict -}; - -struct ColDeclaration; - -// ColBufEncoder is a class to encode column buffers. It can be populated from a -// ColDeclaration. Each time it takes a column value into Append() method to -// encode the column and store it into an internal buffer. After all rows for -// this column are consumed, a Finish() should be called to add header and -// remaining data. -class ColBufEncoder { - public: - // Read a column, encode data and append into internal buffer. - virtual size_t Append(const char *buf) = 0; - virtual ~ColBufEncoder() = 0; - // Get the internal column buffer. Should only be called after Finish(). - const std::string &GetData(); - // Finish encoding. Add header and remaining data. - virtual void Finish() = 0; - // Populate a ColBufEncoder from ColDeclaration. 
- static ColBufEncoder *NewColBufEncoder(const ColDeclaration &col_declaration); - - protected: - std::string buffer_; - static inline bool IsRunLength(ColCompressionType type) { - return type == kColRle || type == kColRleVarint || - type == kColRleDeltaVarint || type == kColRleDict; - } -}; - -// Encoder for fixed length column buffer. In fixed length column buffer, the -// size of the column should not exceed 8 bytes. -// The following encodings are supported: -// Varint: Variable length integer. See util/coding.h for more details -// Rle (Run length encoding): encode a sequence of contiguous value as -// [run_value][run_length]. Can be combined with Varint -// Delta: Encode value to its delta with its adjacent entry. Use varint to -// possibly reduce stored bytes. Can be combined with Rle. -// Dictionary: Use a dictionary to record all possible values in the block and -// encode them with an ID started from 0. IDs are encoded as varint. A column -// with dictionary encoding will have a header to store all actual values, -// ordered by their dictionary value, and the data will be replaced by -// dictionary value. Can be combined with Rle. -class FixedLengthColBufEncoder : public ColBufEncoder { - public: - explicit FixedLengthColBufEncoder( - size_t size, ColCompressionType col_compression_type = kColNoCompression, - bool nullable = false, bool big_endian = false) - : size_(size), - col_compression_type_(col_compression_type), - nullable_(nullable), - big_endian_(big_endian), - last_val_(0), - run_length_(-1), - run_val_(0) {} - - size_t Append(const char *buf) override; - void Finish() override; - ~FixedLengthColBufEncoder() {} - - private: - size_t size_; - ColCompressionType col_compression_type_; - // If set as true, the input value can be null (represented as nullptr). When - // nullable is true, use one more byte before actual value to indicate if the - // current value is null. - bool nullable_; - // If set as true, input value will be treated as big endian encoded. - bool big_endian_; - - // for encoding - uint64_t last_val_; - int16_t run_length_; - uint64_t run_val_; - // Map to store dictionary for dictionary encoding - std::unordered_map dictionary_; - // Vector of dictionary keys. - std::vector dict_vec_; -}; - -// Long fixed length column buffer is a variant of fixed length buffer to hold -// fixed length buffer with more than 8 bytes. We do not support any special -// encoding schemes in LongFixedLengthColBufEncoder. -class LongFixedLengthColBufEncoder : public ColBufEncoder { - public: - LongFixedLengthColBufEncoder(size_t size, bool nullable) - : size_(size), nullable_(nullable) {} - size_t Append(const char *buf) override; - void Finish() override; - - ~LongFixedLengthColBufEncoder() {} - - private: - size_t size_; - bool nullable_; -}; - -// Variable length column buffer holds a format of variable length column. In -// this format, a column is composed of one byte length k, followed by data with -// k bytes long data. -class VariableLengthColBufEncoder : public ColBufEncoder { - public: - size_t Append(const char *buf) override; - void Finish() override; - - ~VariableLengthColBufEncoder() {} -}; - -// Variable chunk column buffer holds another format of variable length column. -// In this format, a column contains multiple chunks of data, each of which is -// composed of 8 bytes long data, and one byte as a mask to indicate whether we -// have more data to come. If no more data coming, the mask is set as 0xFF. 
If -// the chunk is the last chunk and has only k valid bytes, the mask is set as -// 0xFF - (8 - k). -class VariableChunkColBufEncoder : public VariableLengthColBufEncoder { - public: - size_t Append(const char *buf) override; - void Finish() override; - explicit VariableChunkColBufEncoder(ColCompressionType col_compression_type) - : col_compression_type_(col_compression_type) {} - VariableChunkColBufEncoder() : col_compression_type_(kColNoCompression) {} - - private: - ColCompressionType col_compression_type_; - // Map to store dictionary for dictionary encoding - std::unordered_map dictionary_; - // Vector of dictionary keys. - std::vector dict_vec_; -}; - -// ColDeclaration declares a column's type, algorithm of column-aware encoding, -// and other column data like endian and nullability. -struct ColDeclaration { - explicit ColDeclaration( - std::string _col_type, - ColCompressionType _col_compression_type = kColNoCompression, - size_t _size = 0, bool _nullable = false, bool _big_endian = false) - : col_type(_col_type), - col_compression_type(_col_compression_type), - size(_size), - nullable(_nullable), - big_endian(_big_endian) {} - std::string col_type; - ColCompressionType col_compression_type; - size_t size; - bool nullable; - bool big_endian; -}; - -// KVPairColDeclarations is a class to hold column declaration of columns in -// key and value. -struct KVPairColDeclarations { - std::vector *key_col_declarations; - std::vector *value_col_declarations; - ColDeclaration *value_checksum_declaration; - KVPairColDeclarations(std::vector *_key_col_declarations, - std::vector *_value_col_declarations, - ColDeclaration *_value_checksum_declaration) - : key_col_declarations(_key_col_declarations), - value_col_declarations(_value_col_declarations), - value_checksum_declaration(_value_checksum_declaration) {} -}; - -// Similar to KVPairDeclarations, KVPairColBufEncoders is used to hold column -// buffer encoders of all columns in key and value. -struct KVPairColBufEncoders { - std::vector> key_col_bufs; - std::vector> value_col_bufs; - std::unique_ptr value_checksum_buf; - - explicit KVPairColBufEncoders(const KVPairColDeclarations &kvp_cd) { - for (auto kcd : *kvp_cd.key_col_declarations) { - key_col_bufs.emplace_back( - std::move(ColBufEncoder::NewColBufEncoder(kcd))); - } - for (auto vcd : *kvp_cd.value_col_declarations) { - value_col_bufs.emplace_back( - std::move(ColBufEncoder::NewColBufEncoder(vcd))); - } - value_checksum_buf.reset( - ColBufEncoder::NewColBufEncoder(*kvp_cd.value_checksum_declaration)); - } - - // Helper function to call Finish() - void Finish() { - for (auto &col_buf : key_col_bufs) { - col_buf->Finish(); - } - for (auto &col_buf : value_col_bufs) { - col_buf->Finish(); - } - value_checksum_buf->Finish(); - } -}; -} // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/column_aware_encoding_exp.cc b/ceph/src/rocksdb/utilities/column_aware_encoding_exp.cc deleted file mode 100644 index 988a59b3c..000000000 --- a/ceph/src/rocksdb/utilities/column_aware_encoding_exp.cc +++ /dev/null @@ -1,176 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
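As a usage illustration of the encoder API being removed here, written against the deleted utilities/col_buf_encoder.h (the declaration string, compression type, and value are examples only):

    // Encode one 4-byte fixed-length column value with RLE + delta-varint.
    ColDeclaration decl("FixedLength", kColRleDeltaVarint, /*_size=*/4);
    std::unique_ptr<ColBufEncoder> encoder(
        ColBufEncoder::NewColBufEncoder(decl));
    uint32_t v = 42;
    encoder->Append(reinterpret_cast<const char *>(&v));  // one row's column
    encoder->Finish();  // flushes the dictionary header / final run, if any
    const std::string &encoded = encoder->GetData();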
-//
-#ifndef __STDC_FORMAT_MACROS
-#define __STDC_FORMAT_MACROS
-#endif
-
-#include <inttypes.h>
-#include <cstdio>
-
-#ifndef ROCKSDB_LITE
-#ifdef GFLAGS
-
-#include <string>
-#include <vector>
-#include "rocksdb/env.h"
-#include "rocksdb/options.h"
-#include "table/block_based_table_builder.h"
-#include "table/block_based_table_reader.h"
-#include "table/format.h"
-#include "tools/sst_dump_tool_imp.h"
-#include "util/compression.h"
-#include "util/gflags_compat.h"
-#include "util/stop_watch.h"
-#include "utilities/col_buf_encoder.h"
-#include "utilities/column_aware_encoding_util.h"
-
-using GFLAGS_NAMESPACE::ParseCommandLineFlags;
-DEFINE_string(encoded_file, "", "file to store encoded data blocks");
-DEFINE_string(decoded_file, "",
-              "file to store decoded data blocks after encoding");
-DEFINE_string(format, "col", "Output Format. Can be 'row' or 'col'");
-// TODO(jhli): option `col` should be removed and replaced by general
-// column specifications.
-DEFINE_string(index_type, "col", "Index type. Can be 'primary' or 'secondary'");
-DEFINE_string(dump_file, "",
-              "Dump data blocks separated by columns in human-readable format");
-DEFINE_bool(decode, false, "Decode blocks after they are encoded");
-DEFINE_bool(stat, false,
-            "Print column distribution statistics. Cannot decode in this mode");
-DEFINE_string(compression_type, "kNoCompression",
-              "The compression algorithm used to compress data blocks");
-
-namespace rocksdb {
-
-class ColumnAwareEncodingExp {
- public:
-  static void Run(const std::string& sst_file) {
-    bool decode = FLAGS_decode;
-    if (FLAGS_decoded_file.size() > 0) {
-      decode = true;
-    }
-    if (FLAGS_stat) {
-      decode = false;
-    }
-
-    ColumnAwareEncodingReader reader(sst_file);
-    std::vector<ColDeclaration>* key_col_declarations;
-    std::vector<ColDeclaration>* value_col_declarations;
-    ColDeclaration* value_checksum_declaration;
-    if (FLAGS_index_type == "primary") {
-      ColumnAwareEncodingReader::GetColDeclarationsPrimary(
-          &key_col_declarations, &value_col_declarations,
-          &value_checksum_declaration);
-    } else {
-      ColumnAwareEncodingReader::GetColDeclarationsSecondary(
-          &key_col_declarations, &value_col_declarations,
-          &value_checksum_declaration);
-    }
-    KVPairColDeclarations kvp_cd(key_col_declarations, value_col_declarations,
-                                 value_checksum_declaration);
-
-    if (!FLAGS_dump_file.empty()) {
-      std::vector<KVPairBlock> kv_pair_blocks;
-      reader.GetKVPairsFromDataBlocks(&kv_pair_blocks);
-      reader.DumpDataColumns(FLAGS_dump_file, kvp_cd, kv_pair_blocks);
-      return;
-    }
-    std::unordered_map<std::string, CompressionType> compressions = {
-        {"kNoCompression", CompressionType::kNoCompression},
-        {"kZlibCompression", CompressionType::kZlibCompression},
-        {"kZSTD", CompressionType::kZSTD}};
-
-    // Find Compression
-    CompressionType compression_type = compressions[FLAGS_compression_type];
-    EnvOptions env_options;
-    if (CompressionTypeSupported(compression_type)) {
-      fprintf(stdout, "[%s]\n", FLAGS_compression_type.c_str());
-      unique_ptr<WritableFile> encoded_out_file;
-
-      std::unique_ptr<Env> env(NewMemEnv(Env::Default()));
-      if (!FLAGS_encoded_file.empty()) {
-        env->NewWritableFile(FLAGS_encoded_file, &encoded_out_file,
-                             env_options);
-      }
-
-      std::vector<KVPairBlock> kv_pair_blocks;
-      reader.GetKVPairsFromDataBlocks(&kv_pair_blocks);
-
-      std::vector<std::string> encoded_blocks;
-      StopWatchNano sw(env.get(), true);
-      if (FLAGS_format == "col") {
-        reader.EncodeBlocks(kvp_cd, encoded_out_file.get(), compression_type,
-                            kv_pair_blocks, &encoded_blocks, FLAGS_stat);
-      } else {  // row format
-        reader.EncodeBlocksToRowFormat(encoded_out_file.get(), compression_type,
                                       kv_pair_blocks, &encoded_blocks);
-      }
-      if (encoded_out_file != nullptr) {
-        uint64_t size = 0;
-        env->GetFileSize(FLAGS_encoded_file, &size);
-        fprintf(stdout, "File size: %" PRIu64 "\n", size);
-      }
-      uint64_t encode_time = sw.ElapsedNanosSafe(false /* reset */);
-      fprintf(stdout, "Encode time: %" PRIu64 "\n", encode_time);
-      if (decode) {
-        unique_ptr<WritableFile> decoded_out_file;
-        if (!FLAGS_decoded_file.empty()) {
-          env->NewWritableFile(FLAGS_decoded_file, &decoded_out_file,
-                               env_options);
-        }
-        sw.Start();
-        if (FLAGS_format == "col") {
-          reader.DecodeBlocks(kvp_cd, decoded_out_file.get(), &encoded_blocks);
-        } else {
-          reader.DecodeBlocksFromRowFormat(decoded_out_file.get(),
-                                           &encoded_blocks);
-        }
-        uint64_t decode_time = sw.ElapsedNanosSafe(true /* reset */);
-        fprintf(stdout, "Decode time: %" PRIu64 "\n", decode_time);
-      }
-    } else {
-      fprintf(stdout, "Unsupported compression type: %s.\n",
-              FLAGS_compression_type.c_str());
-    }
-    delete key_col_declarations;
-    delete value_col_declarations;
-    delete value_checksum_declaration;
-  }
-};
-
-}  // namespace rocksdb
-
-int main(int argc, char** argv) {
-  int arg_idx = ParseCommandLineFlags(&argc, &argv, true);
-  if (arg_idx >= argc) {
-    fprintf(stdout, "SST filename required.\n");
-    exit(1);
-  }
-  std::string sst_file(argv[arg_idx]);
-  if (FLAGS_format != "row" && FLAGS_format != "col") {
-    fprintf(stderr, "Format must be 'row' or 'col'\n");
-    exit(1);
-  }
-  if (FLAGS_index_type != "primary" && FLAGS_index_type != "secondary") {
-    fprintf(stderr, "Index type must be 'primary' or 'secondary'\n");
-    exit(1);
-  }
-  rocksdb::ColumnAwareEncodingExp::Run(sst_file);
-  return 0;
-}
-
-#else
-int main() {
-  fprintf(stderr, "Please install gflags to run rocksdb tools\n");
-  return 1;
-}
-#endif  // GFLAGS
-#else
-int main(int /*argc*/, char** /*argv*/) {
-  fprintf(stderr, "Not supported in lite mode.\n");
-  return 1;
-}
-#endif  // ROCKSDB_LITE
diff --git a/ceph/src/rocksdb/utilities/column_aware_encoding_test.cc b/ceph/src/rocksdb/utilities/column_aware_encoding_test.cc
deleted file mode 100644
index b99ff563a..000000000
--- a/ceph/src/rocksdb/utilities/column_aware_encoding_test.cc
+++ /dev/null
@@ -1,254 +0,0 @@
-// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
-// This source code is licensed under both the GPLv2 (found in the
-// COPYING file in the root directory) and Apache 2.0 License
-// (found in the LICENSE.Apache file in the root directory).
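The expected values in the tests below, such as 0x0807060504030201 on a little-endian host, follow from the encoder writing fixed-length columns in big-endian byte order (the final `true` constructor argument in the tests appears to select this), so reading the encoded bytes back through memcpy on a little-endian machine yields the byte-swapped literal. A minimal standalone sketch of that effect, with all names local to the sketch:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Write v most-significant byte first, i.e. big-endian, the way the
// fixed-length column encoder is expected to lay out its output.
static void EncodeBigEndian64(uint64_t v, unsigned char* out) {
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<unsigned char>(v >> ((7 - i) * 8));
  }
}

int main() {
  unsigned char buf[8];
  EncodeBigEndian64(0x0102030405060708ULL, buf);
  uint64_t as_read = 0;
  memcpy(&as_read, buf, sizeof(as_read));  // native-order read
  // Prints 0807060504030201 on a little-endian host, matching the tests.
  printf("%016llx\n", static_cast<unsigned long long>(as_read));
  return 0;
}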
-// -#ifndef ROCKSDB_LITE - -#include -#include "util/testharness.h" -#include "util/testutil.h" -#include "utilities/col_buf_decoder.h" -#include "utilities/col_buf_encoder.h" - -namespace rocksdb { - -class ColumnAwareEncodingTest : public testing::Test { - public: - ColumnAwareEncodingTest() {} - - ~ColumnAwareEncodingTest() {} -}; - -class ColumnAwareEncodingTestWithSize - : public ColumnAwareEncodingTest, - public testing::WithParamInterface { - public: - ColumnAwareEncodingTestWithSize() {} - - ~ColumnAwareEncodingTestWithSize() {} - - static std::vector GetValues() { return {4, 8}; } -}; - -INSTANTIATE_TEST_CASE_P( - ColumnAwareEncodingTestWithSize, ColumnAwareEncodingTestWithSize, - ::testing::ValuesIn(ColumnAwareEncodingTestWithSize::GetValues())); - -TEST_P(ColumnAwareEncodingTestWithSize, NoCompressionEncodeDecode) { - size_t col_size = GetParam(); - std::unique_ptr col_buf_encoder( - new FixedLengthColBufEncoder(col_size, kColNoCompression, false, true)); - std::string str_buf; - uint64_t base_val = 0x0102030405060708; - uint64_t val = 0; - memcpy(&val, &base_val, col_size); - const int row_count = 4; - for (int i = 0; i < row_count; ++i) { - str_buf.append(reinterpret_cast(&val), col_size); - } - const char* str_buf_ptr = str_buf.c_str(); - for (int i = 0; i < row_count; ++i) { - col_buf_encoder->Append(str_buf_ptr); - } - col_buf_encoder->Finish(); - const std::string& encoded_data = col_buf_encoder->GetData(); - // Check correctness of encoded string length - ASSERT_EQ(row_count * col_size, encoded_data.size()); - - const char* encoded_data_ptr = encoded_data.c_str(); - uint64_t expected_encoded_val; - if (col_size == 8) { - expected_encoded_val = port::kLittleEndian ? 0x0807060504030201 : 0x0102030405060708; - } else if (col_size == 4) { - expected_encoded_val = port::kLittleEndian ? 
0x08070605 : 0x0102030400000000; - } - uint64_t encoded_val = 0; - for (int i = 0; i < row_count; ++i) { - memcpy(&encoded_val, encoded_data_ptr, col_size); - // Check correctness of encoded value - ASSERT_EQ(expected_encoded_val, encoded_val); - encoded_data_ptr += col_size; - } - - std::unique_ptr col_buf_decoder( - new FixedLengthColBufDecoder(col_size, kColNoCompression, false, true)); - encoded_data_ptr = encoded_data.c_str(); - encoded_data_ptr += col_buf_decoder->Init(encoded_data_ptr); - char* decoded_data = new char[100]; - char* decoded_data_base = decoded_data; - for (int i = 0; i < row_count; ++i) { - encoded_data_ptr += - col_buf_decoder->Decode(encoded_data_ptr, &decoded_data); - } - - // Check correctness of decoded string length - ASSERT_EQ(row_count * col_size, decoded_data - decoded_data_base); - decoded_data = decoded_data_base; - for (int i = 0; i < row_count; ++i) { - uint64_t decoded_val; - decoded_val = 0; - memcpy(&decoded_val, decoded_data, col_size); - // Check correctness of decoded value - ASSERT_EQ(val, decoded_val); - decoded_data += col_size; - } - delete[] decoded_data_base; -} - -TEST_P(ColumnAwareEncodingTestWithSize, RleEncodeDecode) { - size_t col_size = GetParam(); - std::unique_ptr col_buf_encoder( - new FixedLengthColBufEncoder(col_size, kColRle, false, true)); - std::string str_buf; - uint64_t base_val = 0x0102030405060708; - uint64_t val = 0; - memcpy(&val, &base_val, col_size); - const int row_count = 4; - for (int i = 0; i < row_count; ++i) { - str_buf.append(reinterpret_cast(&val), col_size); - } - const char* str_buf_ptr = str_buf.c_str(); - for (int i = 0; i < row_count; ++i) { - str_buf_ptr += col_buf_encoder->Append(str_buf_ptr); - } - col_buf_encoder->Finish(); - const std::string& encoded_data = col_buf_encoder->GetData(); - // Check correctness of encoded string length - ASSERT_EQ(col_size + 1, encoded_data.size()); - - const char* encoded_data_ptr = encoded_data.c_str(); - uint64_t encoded_val = 0; - memcpy(&encoded_val, encoded_data_ptr, col_size); - uint64_t expected_encoded_val; - if (col_size == 8) { - expected_encoded_val = port::kLittleEndian ? 0x0807060504030201 : 0x0102030405060708; - } else if (col_size == 4) { - expected_encoded_val = port::kLittleEndian ? 0x08070605 : 0x0102030400000000; - } - // Check correctness of encoded value - ASSERT_EQ(expected_encoded_val, encoded_val); - - std::unique_ptr col_buf_decoder( - new FixedLengthColBufDecoder(col_size, kColRle, false, true)); - char* decoded_data = new char[100]; - char* decoded_data_base = decoded_data; - encoded_data_ptr += col_buf_decoder->Init(encoded_data_ptr); - for (int i = 0; i < row_count; ++i) { - encoded_data_ptr += - col_buf_decoder->Decode(encoded_data_ptr, &decoded_data); - } - // Check correctness of decoded string length - ASSERT_EQ(decoded_data - decoded_data_base, row_count * col_size); - decoded_data = decoded_data_base; - for (int i = 0; i < row_count; ++i) { - uint64_t decoded_val; - decoded_val = 0; - memcpy(&decoded_val, decoded_data, col_size); - // Check correctness of decoded value - ASSERT_EQ(val, decoded_val); - decoded_data += col_size; - } - delete[] decoded_data_base; -} - -TEST_P(ColumnAwareEncodingTestWithSize, DeltaEncodeDecode) { - size_t col_size = GetParam(); - int row_count = 4; - std::unique_ptr col_buf_encoder( - new FixedLengthColBufEncoder(col_size, kColDeltaVarint, false, true)); - std::string str_buf; - uint64_t base_val1 = port::kLittleEndian ? 0x0102030405060708 : 0x0807060504030201; - uint64_t base_val2 = port::kLittleEndian ? 
0x0202030405060708 : 0x0807060504030202; - uint64_t val1 = 0, val2 = 0; - memcpy(&val1, &base_val1, col_size); - memcpy(&val2, &base_val2, col_size); - const char* str_buf_ptr; - for (int i = 0; i < row_count / 2; ++i) { - str_buf = std::string(reinterpret_cast(&val1), col_size); - str_buf_ptr = str_buf.c_str(); - col_buf_encoder->Append(str_buf_ptr); - - str_buf = std::string(reinterpret_cast(&val2), col_size); - str_buf_ptr = str_buf.c_str(); - col_buf_encoder->Append(str_buf_ptr); - } - col_buf_encoder->Finish(); - const std::string& encoded_data = col_buf_encoder->GetData(); - // Check encoded string length - int varint_len = 0; - if (col_size == 8) { - varint_len = 9; - } else if (col_size == 4) { - varint_len = port::kLittleEndian ? 5 : 9; - } - // Check encoded string length: first value is original one (val - 0), the - // coming three are encoded as 1, -1, 1, so they should take 1 byte in varint. - ASSERT_EQ(varint_len + 3 * 1, encoded_data.size()); - - std::unique_ptr col_buf_decoder( - new FixedLengthColBufDecoder(col_size, kColDeltaVarint, false, true)); - char* decoded_data = new char[100]; - char* decoded_data_base = decoded_data; - const char* encoded_data_ptr = encoded_data.c_str(); - encoded_data_ptr += col_buf_decoder->Init(encoded_data_ptr); - for (int i = 0; i < row_count; ++i) { - encoded_data_ptr += - col_buf_decoder->Decode(encoded_data_ptr, &decoded_data); - } - - // Check correctness of decoded string length - ASSERT_EQ(row_count * col_size, decoded_data - decoded_data_base); - decoded_data = decoded_data_base; - - // Check correctness of decoded data - for (int i = 0; i < row_count / 2; ++i) { - uint64_t decoded_val = 0; - memcpy(&decoded_val, decoded_data, col_size); - ASSERT_EQ(val1, decoded_val); - decoded_data += col_size; - memcpy(&decoded_val, decoded_data, col_size); - ASSERT_EQ(val2, decoded_val); - decoded_data += col_size; - } - delete[] decoded_data_base; -} - -TEST_F(ColumnAwareEncodingTest, ChunkBufEncodeDecode) { - std::unique_ptr col_buf_encoder( - new VariableChunkColBufEncoder(kColDict)); - std::string buf("12345678\377\1\0\0\0\0\0\0\0\376", 18); - col_buf_encoder->Append(buf.c_str()); - col_buf_encoder->Finish(); - const std::string& encoded_data = col_buf_encoder->GetData(); - const char* str_ptr = encoded_data.c_str(); - - std::unique_ptr col_buf_decoder( - new VariableChunkColBufDecoder(kColDict)); - str_ptr += col_buf_decoder->Init(str_ptr); - char* decoded_data = new char[100]; - char* decoded_data_base = decoded_data; - col_buf_decoder->Decode(str_ptr, &decoded_data); - for (size_t i = 0; i < buf.size(); ++i) { - ASSERT_EQ(buf[i], decoded_data_base[i]); - } - delete[] decoded_data_base; -} - -} // namespace rocksdb - -int main(int argc, char** argv) { - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} - -#else - -#include - -int main() { - fprintf(stderr, - "SKIPPED as column aware encoding experiment is not enabled in " - "ROCKSDB_LITE\n"); -} -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/column_aware_encoding_util.cc b/ceph/src/rocksdb/utilities/column_aware_encoding_util.cc deleted file mode 100644 index 222ee4680..000000000 --- a/ceph/src/rocksdb/utilities/column_aware_encoding_util.cc +++ /dev/null @@ -1,491 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
-// -#ifndef ROCKSDB_LITE - -#include "utilities/column_aware_encoding_util.h" - -#ifndef __STDC_FORMAT_MACROS -#define __STDC_FORMAT_MACROS -#endif - -#include -#include -#include -#include -#include -#include -#include "include/rocksdb/comparator.h" -#include "include/rocksdb/slice.h" -#include "rocksdb/env.h" -#include "rocksdb/status.h" -#include "table/block_based_table_builder.h" -#include "table/block_based_table_factory.h" -#include "table/format.h" -#include "table/table_reader.h" -#include "util/cast_util.h" -#include "util/coding.h" -#include "utilities/col_buf_decoder.h" -#include "utilities/col_buf_encoder.h" - -#include "port/port.h" - -namespace rocksdb { - -ColumnAwareEncodingReader::ColumnAwareEncodingReader( - const std::string& file_path) - : file_name_(file_path), - ioptions_(options_), - moptions_(options_), - internal_comparator_(BytewiseComparator()) { - InitTableReader(file_name_); -} - -void ColumnAwareEncodingReader::InitTableReader(const std::string& file_path) { - std::unique_ptr file; - uint64_t file_size; - options_.env->NewRandomAccessFile(file_path, &file, soptions_); - options_.env->GetFileSize(file_path, &file_size); - - file_.reset(new RandomAccessFileReader(std::move(file), file_path)); - - options_.comparator = &internal_comparator_; - options_.table_factory = std::make_shared(); - - std::unique_ptr table_reader; - options_.table_factory->NewTableReader( - TableReaderOptions(ioptions_, moptions_.prefix_extractor.get(), soptions_, - internal_comparator_), - std::move(file_), file_size, &table_reader, /*enable_prefetch=*/false); - - table_reader_.reset(static_cast_with_check( - table_reader.release())); -} - -void ColumnAwareEncodingReader::GetKVPairsFromDataBlocks( - std::vector* kv_pair_blocks) { - table_reader_->GetKVPairsFromDataBlocks(kv_pair_blocks); -} - -void ColumnAwareEncodingReader::DecodeBlocks( - const KVPairColDeclarations& kvp_col_declarations, WritableFile* out_file, - const std::vector* blocks) { - char* decoded_content_base = new char[16384]; - Options options; - ImmutableCFOptions ioptions(options); - for (auto& block : *blocks) { - KVPairColBufDecoders kvp_col_bufs(kvp_col_declarations); - auto& key_col_bufs = kvp_col_bufs.key_col_bufs; - auto& value_col_bufs = kvp_col_bufs.value_col_bufs; - auto& value_checksum_buf = kvp_col_bufs.value_checksum_buf; - - auto& slice_final_with_bit = block; - uint32_t format_version = 2; - BlockContents contents; - const char* content_ptr; - - CompressionType type = - (CompressionType)slice_final_with_bit[slice_final_with_bit.size() - 1]; - if (type != kNoCompression) { - UncompressionContext uncompression_ctx(type); - UncompressBlockContents(uncompression_ctx, slice_final_with_bit.c_str(), - slice_final_with_bit.size() - 1, &contents, - format_version, ioptions); - content_ptr = contents.data.data(); - } else { - content_ptr = slice_final_with_bit.data(); - } - - size_t num_kv_pairs; - const char* header_content_ptr = content_ptr; - num_kv_pairs = static_cast(DecodeFixed64(header_content_ptr)); - - header_content_ptr += sizeof(size_t); - size_t num_key_columns = key_col_bufs.size(); - size_t num_value_columns = value_col_bufs.size(); - std::vector key_content_ptr(num_key_columns); - std::vector value_content_ptr(num_value_columns); - const char* checksum_content_ptr; - - size_t num_columns = num_key_columns + num_value_columns; - const char* col_content_ptr = - header_content_ptr + sizeof(size_t) * num_columns; - - // Read headers - for (size_t i = 0; i < num_key_columns; ++i) { - key_content_ptr[i] = 
col_content_ptr; - key_content_ptr[i] += key_col_bufs[i]->Init(key_content_ptr[i]); - size_t offset; - offset = static_cast(DecodeFixed64(header_content_ptr)); - header_content_ptr += sizeof(size_t); - col_content_ptr += offset; - } - for (size_t i = 0; i < num_value_columns; ++i) { - value_content_ptr[i] = col_content_ptr; - value_content_ptr[i] += value_col_bufs[i]->Init(value_content_ptr[i]); - size_t offset; - offset = static_cast(DecodeFixed64(header_content_ptr)); - header_content_ptr += sizeof(size_t); - col_content_ptr += offset; - } - checksum_content_ptr = col_content_ptr; - checksum_content_ptr += value_checksum_buf->Init(checksum_content_ptr); - - // Decode block - char* decoded_content = decoded_content_base; - for (size_t j = 0; j < num_kv_pairs; ++j) { - for (size_t i = 0; i < num_key_columns; ++i) { - key_content_ptr[i] += - key_col_bufs[i]->Decode(key_content_ptr[i], &decoded_content); - } - for (size_t i = 0; i < num_value_columns; ++i) { - value_content_ptr[i] += - value_col_bufs[i]->Decode(value_content_ptr[i], &decoded_content); - } - checksum_content_ptr += - value_checksum_buf->Decode(checksum_content_ptr, &decoded_content); - } - - size_t offset = decoded_content - decoded_content_base; - Slice output_content(decoded_content, offset); - - if (out_file != nullptr) { - out_file->Append(output_content); - } - } - delete[] decoded_content_base; -} - -void ColumnAwareEncodingReader::DecodeBlocksFromRowFormat( - WritableFile* out_file, const std::vector* blocks) { - Options options; - ImmutableCFOptions ioptions(options); - for (auto& block : *blocks) { - auto& slice_final_with_bit = block; - uint32_t format_version = 2; - BlockContents contents; - std::string decoded_content; - - CompressionType type = - (CompressionType)slice_final_with_bit[slice_final_with_bit.size() - 1]; - if (type != kNoCompression) { - UncompressionContext uncompression_ctx(type); - UncompressBlockContents(uncompression_ctx, slice_final_with_bit.c_str(), - slice_final_with_bit.size() - 1, &contents, - format_version, ioptions); - decoded_content = std::string(contents.data.data(), contents.data.size()); - } else { - decoded_content = std::move(slice_final_with_bit); - } - - if (out_file != nullptr) { - out_file->Append(decoded_content); - } - } -} - -void ColumnAwareEncodingReader::DumpDataColumns( - const std::string& filename, - const KVPairColDeclarations& kvp_col_declarations, - const std::vector& kv_pair_blocks) { - KVPairColBufEncoders kvp_col_bufs(kvp_col_declarations); - auto& key_col_bufs = kvp_col_bufs.key_col_bufs; - auto& value_col_bufs = kvp_col_bufs.value_col_bufs; - auto& value_checksum_buf = kvp_col_bufs.value_checksum_buf; - - FILE* fp = fopen(filename.c_str(), "w"); - size_t block_id = 1; - for (auto& kv_pairs : kv_pair_blocks) { - fprintf(fp, "---------------- Block: %-4" ROCKSDB_PRIszt " ----------------\n", block_id); - for (auto& kv_pair : kv_pairs) { - const auto& key = kv_pair.first; - const auto& value = kv_pair.second; - size_t value_offset = 0; - - const char* key_ptr = key.data(); - for (auto& buf : key_col_bufs) { - size_t col_size = buf->Append(key_ptr); - std::string tmp_buf(key_ptr, col_size); - Slice col(tmp_buf); - fprintf(fp, "%s ", col.ToString(true).c_str()); - key_ptr += col_size; - } - fprintf(fp, "|"); - - const char* value_ptr = value.data(); - for (auto& buf : value_col_bufs) { - size_t col_size = buf->Append(value_ptr); - std::string tmp_buf(value_ptr, col_size); - Slice col(tmp_buf); - fprintf(fp, " %s", col.ToString(true).c_str()); - value_ptr += 
col_size; - value_offset += col_size; - } - - if (value_offset < value.size()) { - size_t col_size = value_checksum_buf->Append(value_ptr); - std::string tmp_buf(value_ptr, col_size); - Slice col(tmp_buf); - fprintf(fp, "|%s", col.ToString(true).c_str()); - } else { - value_checksum_buf->Append(nullptr); - } - fprintf(fp, "\n"); - } - block_id++; - } - fclose(fp); -} - -namespace { - -void CompressDataBlock(const std::string& output_content, Slice* slice_final, - CompressionType* type, std::string* compressed_output) { - CompressionContext compression_ctx(*type); - uint32_t format_version = 2; // hard-coded version - *slice_final = CompressBlock(output_content, compression_ctx, type, - format_version, compressed_output); -} - -} // namespace - -void ColumnAwareEncodingReader::EncodeBlocksToRowFormat( - WritableFile* out_file, CompressionType compression_type, - const std::vector& kv_pair_blocks, - std::vector* blocks) { - std::string output_content; - for (auto& kv_pairs : kv_pair_blocks) { - output_content.clear(); - std::string last_key; - size_t counter = 0; - const size_t block_restart_interval = 16; - for (auto& kv_pair : kv_pairs) { - const auto& key = kv_pair.first; - const auto& value = kv_pair.second; - - Slice last_key_piece(last_key); - size_t shared = 0; - if (counter >= block_restart_interval) { - counter = 0; - } else { - const size_t min_length = std::min(last_key_piece.size(), key.size()); - while ((shared < min_length) && last_key_piece[shared] == key[shared]) { - shared++; - } - } - const size_t non_shared = key.size() - shared; - output_content.append(key.c_str() + shared, non_shared); - output_content.append(value); - - last_key.resize(shared); - last_key.append(key.data() + shared, non_shared); - counter++; - } - Slice slice_final; - auto type = compression_type; - std::string compressed_output; - CompressDataBlock(output_content, &slice_final, &type, &compressed_output); - - if (out_file != nullptr) { - out_file->Append(slice_final); - } - - // Add a bit in the end for decoding - std::string slice_final_with_bit(slice_final.data(), slice_final.size()); - slice_final_with_bit.append(reinterpret_cast(&type), 1); - blocks->push_back( - std::string(slice_final_with_bit.data(), slice_final_with_bit.size())); - } -} - -Status ColumnAwareEncodingReader::EncodeBlocks( - const KVPairColDeclarations& kvp_col_declarations, WritableFile* out_file, - CompressionType compression_type, - const std::vector& kv_pair_blocks, - std::vector* blocks, bool print_column_stat) { - std::vector key_col_sizes( - kvp_col_declarations.key_col_declarations->size(), 0); - std::vector value_col_sizes( - kvp_col_declarations.value_col_declarations->size(), 0); - size_t value_checksum_size = 0; - - for (auto& kv_pairs : kv_pair_blocks) { - KVPairColBufEncoders kvp_col_bufs(kvp_col_declarations); - auto& key_col_bufs = kvp_col_bufs.key_col_bufs; - auto& value_col_bufs = kvp_col_bufs.value_col_bufs; - auto& value_checksum_buf = kvp_col_bufs.value_checksum_buf; - - size_t num_kv_pairs = 0; - for (auto& kv_pair : kv_pairs) { - const auto& key = kv_pair.first; - const auto& value = kv_pair.second; - size_t value_offset = 0; - num_kv_pairs++; - - const char* key_ptr = key.data(); - for (auto& buf : key_col_bufs) { - size_t col_size = buf->Append(key_ptr); - key_ptr += col_size; - } - - const char* value_ptr = value.data(); - for (auto& buf : value_col_bufs) { - size_t col_size = buf->Append(value_ptr); - value_ptr += col_size; - value_offset += col_size; - } - - if (value_offset < value.size()) { - 
value_checksum_buf->Append(value_ptr); - } else { - value_checksum_buf->Append(nullptr); - } - } - - kvp_col_bufs.Finish(); - // Get stats - // Compress and write a block - if (print_column_stat) { - for (size_t i = 0; i < key_col_bufs.size(); ++i) { - Slice slice_final; - auto type = compression_type; - std::string compressed_output; - CompressDataBlock(key_col_bufs[i]->GetData(), &slice_final, &type, - &compressed_output); - out_file->Append(slice_final); - key_col_sizes[i] += slice_final.size(); - } - for (size_t i = 0; i < value_col_bufs.size(); ++i) { - Slice slice_final; - auto type = compression_type; - std::string compressed_output; - CompressDataBlock(value_col_bufs[i]->GetData(), &slice_final, &type, - &compressed_output); - out_file->Append(slice_final); - value_col_sizes[i] += slice_final.size(); - } - Slice slice_final; - auto type = compression_type; - std::string compressed_output; - CompressDataBlock(value_checksum_buf->GetData(), &slice_final, &type, - &compressed_output); - out_file->Append(slice_final); - value_checksum_size += slice_final.size(); - } else { - std::string output_content; - // Write column sizes - PutFixed64(&output_content, num_kv_pairs); - for (auto& buf : key_col_bufs) { - size_t size = buf->GetData().size(); - PutFixed64(&output_content, size); - } - for (auto& buf : value_col_bufs) { - size_t size = buf->GetData().size(); - PutFixed64(&output_content, size); - } - // Write data - for (auto& buf : key_col_bufs) { - output_content.append(buf->GetData()); - } - for (auto& buf : value_col_bufs) { - output_content.append(buf->GetData()); - } - output_content.append(value_checksum_buf->GetData()); - - Slice slice_final; - auto type = compression_type; - std::string compressed_output; - CompressDataBlock(output_content, &slice_final, &type, - &compressed_output); - - if (out_file != nullptr) { - out_file->Append(slice_final); - } - - // Add a bit in the end for decoding - std::string slice_final_with_bit(slice_final.data(), - slice_final.size() + 1); - slice_final_with_bit[slice_final.size()] = static_cast(type); - blocks->push_back(std::string(slice_final_with_bit.data(), - slice_final_with_bit.size())); - } - } - - if (print_column_stat) { - size_t total_size = 0; - for (size_t i = 0; i < key_col_sizes.size(); ++i) - total_size += key_col_sizes[i]; - for (size_t i = 0; i < value_col_sizes.size(); ++i) - total_size += value_col_sizes[i]; - total_size += value_checksum_size; - - for (size_t i = 0; i < key_col_sizes.size(); ++i) - printf("Key col %" ROCKSDB_PRIszt " size: %" ROCKSDB_PRIszt " percentage %lf%%\n", i, key_col_sizes[i], - 100.0 * key_col_sizes[i] / total_size); - for (size_t i = 0; i < value_col_sizes.size(); ++i) - printf("Value col %" ROCKSDB_PRIszt " size: %" ROCKSDB_PRIszt " percentage %lf%%\n", i, - value_col_sizes[i], 100.0 * value_col_sizes[i] / total_size); - printf("Value checksum size: %" ROCKSDB_PRIszt " percentage %lf%%\n", value_checksum_size, - 100.0 * value_checksum_size / total_size); - } - return Status::OK(); -} - -void ColumnAwareEncodingReader::GetColDeclarationsPrimary( - std::vector** key_col_declarations, - std::vector** value_col_declarations, - ColDeclaration** value_checksum_declaration) { - *key_col_declarations = new std::vector{ - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 4, false, - true), - ColDeclaration("FixedLength", ColCompressionType::kColRleDeltaVarint, 8, - false, true), - ColDeclaration("FixedLength", ColCompressionType::kColDeltaVarint, 8, - false, true), - 
ColDeclaration("FixedLength", ColCompressionType::kColDeltaVarint, 8, - false, true), - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 8)}; - - *value_col_declarations = new std::vector{ - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 4), - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 4), - ColDeclaration("FixedLength", ColCompressionType::kColRle, 1), - ColDeclaration("VariableLength"), - ColDeclaration("FixedLength", ColCompressionType::kColDeltaVarint, 4), - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 8)}; - *value_checksum_declaration = new ColDeclaration( - "LongFixedLength", ColCompressionType::kColNoCompression, 9, - true /* nullable */); -} - -void ColumnAwareEncodingReader::GetColDeclarationsSecondary( - std::vector** key_col_declarations, - std::vector** value_col_declarations, - ColDeclaration** value_checksum_declaration) { - *key_col_declarations = new std::vector{ - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 4, false, - true), - ColDeclaration("FixedLength", ColCompressionType::kColDeltaVarint, 8, - false, true), - ColDeclaration("FixedLength", ColCompressionType::kColRleDeltaVarint, 8, - false, true), - ColDeclaration("FixedLength", ColCompressionType::kColRle, 1), - ColDeclaration("FixedLength", ColCompressionType::kColDeltaVarint, 4, - false, true), - ColDeclaration("FixedLength", ColCompressionType::kColDeltaVarint, 8, - false, true), - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 8, false, - true), - ColDeclaration("VariableChunk", ColCompressionType::kColNoCompression), - ColDeclaration("FixedLength", ColCompressionType::kColRleVarint, 8)}; - *value_col_declarations = new std::vector(); - *value_checksum_declaration = new ColDeclaration( - "LongFixedLength", ColCompressionType::kColNoCompression, 9, - true /* nullable */); -} - -} // namespace rocksdb - -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/column_aware_encoding_util.h b/ceph/src/rocksdb/utilities/column_aware_encoding_util.h deleted file mode 100644 index c2c4fa2d6..000000000 --- a/ceph/src/rocksdb/utilities/column_aware_encoding_util.h +++ /dev/null @@ -1,81 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
-#pragma once -#ifndef ROCKSDB_LITE - -#include -#include -#include "db/dbformat.h" -#include "include/rocksdb/env.h" -#include "include/rocksdb/listener.h" -#include "include/rocksdb/options.h" -#include "include/rocksdb/status.h" -#include "options/cf_options.h" -#include "table/block_based_table_reader.h" - -namespace rocksdb { - -struct ColDeclaration; -struct KVPairColDeclarations; - -class ColumnAwareEncodingReader { - public: - explicit ColumnAwareEncodingReader(const std::string& file_name); - - void GetKVPairsFromDataBlocks(std::vector* kv_pair_blocks); - - void EncodeBlocksToRowFormat(WritableFile* out_file, - CompressionType compression_type, - const std::vector& kv_pair_blocks, - std::vector* blocks); - - void DecodeBlocksFromRowFormat(WritableFile* out_file, - const std::vector* blocks); - - void DumpDataColumns(const std::string& filename, - const KVPairColDeclarations& kvp_col_declarations, - const std::vector& kv_pair_blocks); - - Status EncodeBlocks(const KVPairColDeclarations& kvp_col_declarations, - WritableFile* out_file, CompressionType compression_type, - const std::vector& kv_pair_blocks, - std::vector* blocks, bool print_column_stat); - - void DecodeBlocks(const KVPairColDeclarations& kvp_col_declarations, - WritableFile* out_file, - const std::vector* blocks); - - static void GetColDeclarationsPrimary( - std::vector** key_col_declarations, - std::vector** value_col_declarations, - ColDeclaration** value_checksum_declaration); - - static void GetColDeclarationsSecondary( - std::vector** key_col_declarations, - std::vector** value_col_declarations, - ColDeclaration** value_checksum_declaration); - - private: - // Init the TableReader for the sst file - void InitTableReader(const std::string& file_path); - - std::string file_name_; - EnvOptions soptions_; - - Options options_; - - Status init_result_; - std::unique_ptr table_reader_; - std::unique_ptr file_; - - const ImmutableCFOptions ioptions_; - const MutableCFOptions moptions_; - InternalKeyComparator internal_comparator_; - std::unique_ptr table_properties_; -}; - -} // namespace rocksdb - -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/date_tiered/date_tiered_db_impl.cc b/ceph/src/rocksdb/utilities/date_tiered/date_tiered_db_impl.cc deleted file mode 100644 index 978bfb2e4..000000000 --- a/ceph/src/rocksdb/utilities/date_tiered/date_tiered_db_impl.cc +++ /dev/null @@ -1,399 +0,0 @@ -// Copyright (c) 2011 The LevelDB Authors. All rights reserved. -// Use of this source code is governed by a BSD-style license that can be -// found in the LICENSE file. See the AUTHORS file for names of contributors. 
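The DateTieredDB implementation that follows assumes every user key ends with an 8-byte, big-endian timestamp suffix: GetTimestamp() decodes those trailing bytes, and FindColumnFamily() maps the timestamp to the column family (one per time window, named by the window's upper bound) that should hold the key. A small sketch of composing such a key; the helper name is illustrative, not part of the API:

#include <cstdint>
#include <string>

// Append an 8-byte big-endian timestamp to a user key, matching what
// DateTieredDBImpl::GetTimestamp() expects to find at the end of the key.
std::string MakeDateTieredKey(const std::string& user_key, int64_t timestamp) {
  std::string key = user_key;
  for (int i = 7; i >= 0; --i) {
    key.push_back(static_cast<char>((timestamp >> (i * 8)) & 0xFF));
  }
  return key;
}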
-#ifndef ROCKSDB_LITE
-
-#include "utilities/date_tiered/date_tiered_db_impl.h"
-
-#include <limits>
-
-#include "db/db_impl.h"
-#include "db/db_iter.h"
-#include "db/write_batch_internal.h"
-#include "monitoring/instrumented_mutex.h"
-#include "options/options_helper.h"
-#include "rocksdb/convenience.h"
-#include "rocksdb/env.h"
-#include "rocksdb/iterator.h"
-#include "rocksdb/utilities/date_tiered_db.h"
-#include "table/merging_iterator.h"
-#include "util/coding.h"
-#include "util/filename.h"
-#include "util/string_util.h"
-
-namespace rocksdb {
-
-// Open the db inside DateTieredDBImpl because options needs pointer to its ttl
-DateTieredDBImpl::DateTieredDBImpl(
-    DB* db, Options options,
-    const std::vector<ColumnFamilyDescriptor>& descriptors,
-    const std::vector<ColumnFamilyHandle*>& handles, int64_t ttl,
-    int64_t column_family_interval)
-    : db_(db),
-      cf_options_(ColumnFamilyOptions(options)),
-      ioptions_(ImmutableCFOptions(options)),
-      moptions_(MutableCFOptions(options)),
-      icomp_(cf_options_.comparator),
-      ttl_(ttl),
-      column_family_interval_(column_family_interval),
-      mutex_(options.statistics.get(), db->GetEnv(), DB_MUTEX_WAIT_MICROS,
-             options.use_adaptive_mutex) {
-  latest_timebound_ = std::numeric_limits<int64_t>::min();
-  for (size_t i = 0; i < handles.size(); ++i) {
-    const auto& name = descriptors[i].name;
-    int64_t timestamp = 0;
-    try {
-      timestamp = ParseUint64(name);
-    } catch (const std::invalid_argument&) {
-      // Bypass unrelated column family, e.g. default
-      db_->DestroyColumnFamilyHandle(handles[i]);
-      continue;
-    }
-    if (timestamp > latest_timebound_) {
-      latest_timebound_ = timestamp;
-    }
-    handle_map_.insert(std::make_pair(timestamp, handles[i]));
-  }
-}
-
-DateTieredDBImpl::~DateTieredDBImpl() {
-  for (auto handle : handle_map_) {
-    db_->DestroyColumnFamilyHandle(handle.second);
-  }
-  delete db_;
-  db_ = nullptr;
-}
-
-Status DateTieredDB::Open(const Options& options, const std::string& dbname,
-                          DateTieredDB** dbptr, int64_t ttl,
-                          int64_t column_family_interval, bool read_only) {
-  DBOptions db_options(options);
-  ColumnFamilyOptions cf_options(options);
-  std::vector<ColumnFamilyDescriptor> descriptors;
-  std::vector<ColumnFamilyHandle*> handles;
-  DB* db;
-  Status s;
-
-  // Get column families
-  std::vector<std::string> column_family_names;
-  s = DB::ListColumnFamilies(db_options, dbname, &column_family_names);
-  if (!s.ok()) {
-    // No column family found. Use default
-    s = DB::Open(options, dbname, &db);
-    if (!s.ok()) {
-      return s;
-    }
-  } else {
-    for (auto name : column_family_names) {
-      descriptors.emplace_back(ColumnFamilyDescriptor(name, cf_options));
-    }
-
-    // Open database
-    if (read_only) {
-      s = DB::OpenForReadOnly(db_options, dbname, descriptors, &handles, &db);
-    } else {
-      s = DB::Open(db_options, dbname, descriptors, &handles, &db);
-    }
-  }
-
-  if (s.ok()) {
-    *dbptr = new DateTieredDBImpl(db, options, descriptors, handles, ttl,
-                                  column_family_interval);
-  }
-  return s;
-}
-
-// Checks whether the data is stale according to the TTL provided
-bool DateTieredDBImpl::IsStale(int64_t keytime, int64_t ttl, Env* env) {
-  if (ttl <= 0) {
-    // Data is fresh if TTL is non-positive
-    return false;
-  }
-  int64_t curtime;
-  if (!env->GetCurrentTime(&curtime).ok()) {
-    // Treat the data as fresh if the current time cannot be obtained
-    return false;
-  }
-  return curtime >= keytime + ttl;
-}
-
-// Drop column family when all data in that column family is expired
-// TODO(jhli): Can be made a background job
-Status DateTieredDBImpl::DropObsoleteColumnFamilies() {
-  int64_t curtime;
-  Status s;
-  s = db_->GetEnv()->GetCurrentTime(&curtime);
-  if (!s.ok()) {
-    return s;
-  }
-  {
-    InstrumentedMutexLock l(&mutex_);
-    auto iter = handle_map_.begin();
-    while (iter != handle_map_.end()) {
-      if (iter->first <= curtime - ttl_) {
-        s = db_->DropColumnFamily(iter->second);
-        if (!s.ok()) {
-          return s;
-        }
-        delete iter->second;
-        iter = handle_map_.erase(iter);
-      } else {
-        break;
-      }
-    }
-  }
-  return Status::OK();
-}
-
-// Get timestamp from user key
-Status DateTieredDBImpl::GetTimestamp(const Slice& key, int64_t* result) {
-  if (key.size() < kTSLength) {
-    return Status::Corruption("Bad timestamp in key");
-  }
-  const char* pos = key.data() + key.size() - 8;
-  int64_t timestamp = 0;
-  if (port::kLittleEndian) {
-    int bytes_to_fill = 8;
-    for (int i = 0; i < bytes_to_fill; ++i) {
-      timestamp |= (static_cast<int64_t>(static_cast<unsigned char>(pos[i]))
-                    << ((bytes_to_fill - i - 1) << 3));
-    }
-  } else {
-    memcpy(&timestamp, pos, sizeof(timestamp));
-  }
-  *result = timestamp;
-  return Status::OK();
-}
-
-Status DateTieredDBImpl::CreateColumnFamily(
-    ColumnFamilyHandle** column_family) {
-  int64_t curtime;
-  Status s;
-  mutex_.AssertHeld();
-  s = db_->GetEnv()->GetCurrentTime(&curtime);
-  if (!s.ok()) {
-    return s;
-  }
-  int64_t new_timebound;
-  if (handle_map_.empty()) {
-    new_timebound = curtime + column_family_interval_;
-  } else {
-    new_timebound =
-        latest_timebound_ +
-        ((curtime - latest_timebound_) / column_family_interval_ + 1) *
-            column_family_interval_;
-  }
-  std::string cf_name = ToString(new_timebound);
-  latest_timebound_ = new_timebound;
-  s = db_->CreateColumnFamily(cf_options_, cf_name, column_family);
-  if (s.ok()) {
-    handle_map_.insert(std::make_pair(new_timebound, *column_family));
-  }
-  return s;
-}
-
-Status DateTieredDBImpl::FindColumnFamily(int64_t keytime,
-                                          ColumnFamilyHandle** column_family,
-                                          bool create_if_missing) {
-  *column_family = nullptr;
-  {
-    InstrumentedMutexLock l(&mutex_);
-    auto iter = handle_map_.upper_bound(keytime);
-    if (iter == handle_map_.end()) {
-      if (!create_if_missing) {
-        return Status::NotFound();
-      } else {
-        return CreateColumnFamily(column_family);
-      }
-    }
-    // The column family at the upper bound covers the appropriate time window
-    *column_family = iter->second;
-  }
-  return Status::OK();
-}
-
-Status DateTieredDBImpl::Put(const WriteOptions& options, const Slice& key,
-                             const Slice& val) {
-  int64_t timestamp = 0;
-  Status s;
-  s = GetTimestamp(key, &timestamp);
-  if (!s.ok()) {
-    return s;
-  }
-  DropObsoleteColumnFamilies();
-
-  // Prune request to obsolete data
-  if (IsStale(timestamp, ttl_, db_->GetEnv())) {
-    return Status::InvalidArgument();
-  }
-
-  // Decide column family (i.e. the time window) to put into
-  ColumnFamilyHandle* column_family;
-  s = FindColumnFamily(timestamp, &column_family, true /*create_if_missing*/);
-  if (!s.ok()) {
-    return s;
-  }
-
-  // Efficiently put with WriteBatch
-  WriteBatch batch;
-  batch.Put(column_family, key, val);
-  return Write(options, &batch);
-}
-
-Status DateTieredDBImpl::Get(const ReadOptions& options, const Slice& key,
-                             std::string* value) {
-  int64_t timestamp = 0;
-  Status s;
-  s = GetTimestamp(key, &timestamp);
-  if (!s.ok()) {
-    return s;
-  }
-  // Prune request to obsolete data
-  if (IsStale(timestamp, ttl_, db_->GetEnv())) {
-    return Status::NotFound();
-  }
-
-  // Decide column family to get from
-  ColumnFamilyHandle* column_family;
-  s = FindColumnFamily(timestamp, &column_family, false /*create_if_missing*/);
-  if (!s.ok()) {
-    return s;
-  }
-  if (column_family == nullptr) {
-    // Cannot find column family
-    return Status::NotFound();
-  }
-
-  // Get value with key
-  return db_->Get(options, column_family, key, value);
-}
-
-bool DateTieredDBImpl::KeyMayExist(const ReadOptions& options, const Slice& key,
-                                   std::string* value, bool* value_found) {
-  int64_t timestamp = 0;
-  Status s;
-  s = GetTimestamp(key, &timestamp);
-  if (!s.ok()) {
-    // Cannot extract timestamp from the key
-    return false;
-  }
-  // Decide column family to get from
-  ColumnFamilyHandle* column_family;
-  s = FindColumnFamily(timestamp, &column_family, false /*create_if_missing*/);
-  if (!s.ok() || column_family == nullptr) {
-    // Cannot find column family
-    return false;
-  }
-  if (IsStale(timestamp, ttl_, db_->GetEnv())) {
-    return false;
-  }
-  return db_->KeyMayExist(options, column_family, key, value, value_found);
-}
-
-Status DateTieredDBImpl::Delete(const WriteOptions& options, const Slice& key) {
-  int64_t timestamp = 0;
-  Status s;
-  s = GetTimestamp(key, &timestamp);
-  if (!s.ok()) {
-    return s;
-  }
-  DropObsoleteColumnFamilies();
-  // Prune request to obsolete data
-  if (IsStale(timestamp, ttl_, db_->GetEnv())) {
-    return Status::NotFound();
-  }
-
-  // Decide column family to delete from
-  ColumnFamilyHandle* column_family;
-  s = FindColumnFamily(timestamp, &column_family, false /*create_if_missing*/);
-  if (!s.ok()) {
-    return s;
-  }
-  if (column_family == nullptr) {
-    // Cannot find column family
-    return Status::NotFound();
-  }
-
-  // Delete key from the column family
-  return db_->Delete(options, column_family, key);
-}
-
-Status DateTieredDBImpl::Merge(const WriteOptions& options, const Slice& key,
-                               const Slice& value) {
-  // Decide column family to merge into
-  int64_t timestamp = 0;
-  Status s;
-  s = GetTimestamp(key, &timestamp);
-  if (!s.ok()) {
-    // Cannot extract timestamp from the key
-    return s;
-  }
-  ColumnFamilyHandle* column_family;
-  s = FindColumnFamily(timestamp, &column_family, true /*create_if_missing*/);
-  if (!s.ok()) {
-    return s;
-  }
-  WriteBatch batch;
-  batch.Merge(column_family, key, value);
-  return Write(options, &batch);
-}
-
-Status DateTieredDBImpl::Write(const WriteOptions& opts, WriteBatch* updates) {
-  class Handler : public WriteBatch::Handler {
-   public:
-    explicit Handler() {}
-    WriteBatch updates_ttl;
-    Status batch_rewrite_status;
-    virtual Status PutCF(uint32_t column_family_id, const Slice& key,
-                         const Slice& value) override {
-      WriteBatchInternal::Put(&updates_ttl, column_family_id, key, value);
-      return Status::OK();
-    }
-    virtual Status MergeCF(uint32_t column_family_id, const Slice& key,
-                           const Slice& value) override {
-      WriteBatchInternal::Merge(&updates_ttl, column_family_id, key, value);
-      return Status::OK();
-    }
-    virtual Status DeleteCF(uint32_t column_family_id,
-                            const Slice& key) override {
-      WriteBatchInternal::Delete(&updates_ttl, column_family_id, key);
-      return Status::OK();
-    }
-    virtual void LogData(const Slice& blob) override {
-      updates_ttl.PutLogData(blob);
-    }
-  };
-  Handler handler;
-  updates->Iterate(&handler);
-  if (!handler.batch_rewrite_status.ok()) {
-    return handler.batch_rewrite_status;
-  } else {
-    return db_->Write(opts, &(handler.updates_ttl));
-  }
-}
-
-Iterator* DateTieredDBImpl::NewIterator(const ReadOptions& opts) {
-  if (handle_map_.empty()) {
-    return NewEmptyIterator();
-  }
-
-  DBImpl* db_impl = reinterpret_cast<DBImpl*>(db_);
-
-  auto db_iter = NewArenaWrappedDbIterator(
-      db_impl->GetEnv(), opts, ioptions_, moptions_, kMaxSequenceNumber,
-      cf_options_.max_sequential_skip_in_iterations, 0,
-      nullptr /*read_callback*/);
-
-  auto arena = db_iter->GetArena();
-  MergeIteratorBuilder builder(&icomp_, arena);
-  for (auto& item : handle_map_) {
-    auto handle = item.second;
-    builder.AddIterator(db_impl->NewInternalIterator(
-        arena, db_iter->GetRangeDelAggregator(), handle));
-  }
-  auto internal_iter = builder.Finish();
-  db_iter->SetIterUnderDBIter(internal_iter);
-  return db_iter;
-}
-}  // namespace rocksdb
-#endif  // ROCKSDB_LITE
diff --git a/ceph/src/rocksdb/utilities/date_tiered/date_tiered_db_impl.h b/ceph/src/rocksdb/utilities/date_tiered/date_tiered_db_impl.h
deleted file mode 100644
index 7a6a6b75a..000000000
--- a/ceph/src/rocksdb/utilities/date_tiered/date_tiered_db_impl.h
+++ /dev/null
@@ -1,93 +0,0 @@
-// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
-// This source code is licensed under both the GPLv2 (found in the
-// COPYING file in the root directory) and Apache 2.0 License
-// (found in the LICENSE.Apache file in the root directory).
-
-#pragma once
-#ifndef ROCKSDB_LITE
-
-#include <map>
-#include <string>
-#include <vector>
-
-#include "monitoring/instrumented_mutex.h"
-#include "options/cf_options.h"
-#include "rocksdb/db.h"
-#include "rocksdb/utilities/date_tiered_db.h"
-
-namespace rocksdb {
-
-// Implementation of DateTieredDB.
-class DateTieredDBImpl : public DateTieredDB {
- public:
-  DateTieredDBImpl(DB* db, Options options,
-                   const std::vector<ColumnFamilyDescriptor>& descriptors,
-                   const std::vector<ColumnFamilyHandle*>& handles, int64_t ttl,
-                   int64_t column_family_interval);
-
-  virtual ~DateTieredDBImpl();
-
-  Status Put(const WriteOptions& options, const Slice& key,
-             const Slice& val) override;
-
-  Status Get(const ReadOptions& options, const Slice& key,
-             std::string* value) override;
-
-  Status Delete(const WriteOptions& options, const Slice& key) override;
-
-  bool KeyMayExist(const ReadOptions& options, const Slice& key,
-                   std::string* value, bool* value_found = nullptr) override;
-
-  Status Merge(const WriteOptions& options, const Slice& key,
-               const Slice& value) override;
-
-  Iterator* NewIterator(const ReadOptions& opts) override;
-
-  Status DropObsoleteColumnFamilies() override;
-
-  // Extract timestamp from key.
-  static Status GetTimestamp(const Slice& key, int64_t* result);
-
- private:
-  // Base database object
-  DB* db_;
-
-  const ColumnFamilyOptions cf_options_;
-
-  const ImmutableCFOptions ioptions_;
-
-  const MutableCFOptions moptions_;
-
-  const InternalKeyComparator icomp_;
-
-  // Storing all column family handles for time series data.
-  std::vector<ColumnFamilyHandle*> handles_;
-
-  // Manages a mapping from a column family's maximum timestamp to its handle.
-  std::map<int64_t, ColumnFamilyHandle*> handle_map_;
-
-  // A time-to-live value to indicate when the data should be removed.
-  int64_t ttl_;
-
-  // A variable to indicate the time range of a column family.
-  int64_t column_family_interval_;
-
-  // Indicates the largest maximum timestamp of a column family.
-  int64_t latest_timebound_;
-
-  // Mutex to protect handle_map_ operations.
-  InstrumentedMutex mutex_;
-
-  // Internal method to execute Put and Merge in batch.
-  Status Write(const WriteOptions& opts, WriteBatch* updates);
-
-  Status CreateColumnFamily(ColumnFamilyHandle** column_family);
-
-  Status FindColumnFamily(int64_t keytime, ColumnFamilyHandle** column_family,
-                          bool create_if_missing);
-
-  static bool IsStale(int64_t keytime, int64_t ttl, Env* env);
-};
-
-}  // namespace rocksdb
-#endif  // ROCKSDB_LITE
diff --git a/ceph/src/rocksdb/utilities/date_tiered/date_tiered_test.cc b/ceph/src/rocksdb/utilities/date_tiered/date_tiered_test.cc
deleted file mode 100644
index 8e7fced58..000000000
--- a/ceph/src/rocksdb/utilities/date_tiered/date_tiered_test.cc
+++ /dev/null
@@ -1,468 +0,0 @@
-// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
-// Use of this source code is governed by a BSD-style license that can be
-// found in the LICENSE file. See the AUTHORS file for names of contributors.
-
-#ifndef ROCKSDB_LITE
-
-#ifndef OS_WIN
-#include <unistd.h>
-#endif
-
-#include <map>
-#include <memory>
-
-#include "rocksdb/compaction_filter.h"
-#include "rocksdb/utilities/date_tiered_db.h"
-#include "util/logging.h"
-#include "util/string_util.h"
-#include "util/testharness.h"
-
-namespace rocksdb {
-
-namespace {
-
-typedef std::map<std::string, std::string> KVMap;
-}
-
-class SpecialTimeEnv : public EnvWrapper {
- public:
-  explicit SpecialTimeEnv(Env* base) : EnvWrapper(base) {
-    base->GetCurrentTime(&current_time_);
-  }
-
-  void Sleep(int64_t sleep_time) { current_time_ += sleep_time; }
-  virtual Status GetCurrentTime(int64_t* current_time) override {
-    *current_time = current_time_;
-    return Status::OK();
-  }
-
- private:
-  int64_t current_time_ = 0;
-};
-
-class DateTieredTest : public testing::Test {
- public:
-  DateTieredTest() {
-    env_.reset(new SpecialTimeEnv(Env::Default()));
-    dbname_ = test::PerThreadDBPath("date_tiered");
-    options_.create_if_missing = true;
-    options_.env = env_.get();
-    date_tiered_db_.reset(nullptr);
-    DestroyDB(dbname_, Options());
-  }
-
-  ~DateTieredTest() {
-    CloseDateTieredDB();
-    DestroyDB(dbname_, Options());
-  }
-
-  void OpenDateTieredDB(int64_t ttl, int64_t column_family_interval,
-                        bool read_only = false) {
-    ASSERT_TRUE(date_tiered_db_.get() == nullptr);
-    DateTieredDB* date_tiered_db = nullptr;
-    ASSERT_OK(DateTieredDB::Open(options_, dbname_, &date_tiered_db, ttl,
-                                 column_family_interval, read_only));
-    date_tiered_db_.reset(date_tiered_db);
-  }
-
-  void CloseDateTieredDB() { date_tiered_db_.reset(nullptr); }
-
-  Status AppendTimestamp(std::string* key) {
-    char ts[8];
-    int bytes_to_fill = 8;
-    int64_t timestamp_value = 0;
-    Status s = env_->GetCurrentTime(&timestamp_value);
-    if (!s.ok()) {
-      return s;
-    }
-    if (port::kLittleEndian) {
-      for (int i = 0; i < bytes_to_fill; ++i) {
-        ts[i] = (timestamp_value >> ((bytes_to_fill - i - 1) << 3)) & 0xFF;
-      }
-    } else {
-      memcpy(ts, static_cast<void*>(&timestamp_value), bytes_to_fill);
-    }
-    key->append(ts, 8);
-    return Status::OK();
-  }
-
-  // Populates a kv-map with num_entries entries
-  void MakeKVMap(int64_t num_entries, KVMap* kvmap) {
-    kvmap->clear();
-    int digits = 1;
-    for (int64_t dummy =
num_entries; dummy /= 10; ++digits) { - } - int digits_in_i = 1; - for (int64_t i = 0; i < num_entries; i++) { - std::string key = "key"; - std::string value = "value"; - if (i % 10 == 0) { - digits_in_i++; - } - for (int j = digits_in_i; j < digits; j++) { - key.append("0"); - value.append("0"); - } - AppendNumberTo(&key, i); - AppendNumberTo(&value, i); - ASSERT_OK(AppendTimestamp(&key)); - (*kvmap)[key] = value; - } - // check all insertions done - ASSERT_EQ(num_entries, static_cast(kvmap->size())); - } - - size_t GetColumnFamilyCount() { - DBOptions db_options(options_); - std::vector cf; - DB::ListColumnFamilies(db_options, dbname_, &cf); - return cf.size(); - } - - void Sleep(int64_t sleep_time) { env_->Sleep(sleep_time); } - - static const int64_t kSampleSize_ = 100; - std::string dbname_; - std::unique_ptr date_tiered_db_; - std::unique_ptr env_; - KVMap kvmap_; - - private: - Options options_; - KVMap::iterator kv_it_; - const std::string kNewValue_ = "new_value"; - unique_ptr test_comp_filter_; -}; - -// Puts a set of values and checks its presence using Get during ttl -TEST_F(DateTieredTest, KeyLifeCycle) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(2, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - - // Put data in database - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - - Sleep(1); - // T=1, keys should still reside in database - for (auto& kv : map_insert) { - std::string value; - ASSERT_OK(date_tiered_db_->Get(ropts, kv.first, &value)); - ASSERT_EQ(value, kv.second); - } - - Sleep(1); - // T=2, keys should not be retrieved - for (auto& kv : map_insert) { - std::string value; - auto s = date_tiered_db_->Get(ropts, kv.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } - - CloseDateTieredDB(); -} - -TEST_F(DateTieredTest, DeleteTest) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(2, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - - // Put data in database - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - - Sleep(1); - // Delete keys when they are not obsolete - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Delete(wopts, kv.first)); - } - - // Key should not be found - for (auto& kv : map_insert) { - std::string value; - auto s = date_tiered_db_->Get(ropts, kv.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } -} - -TEST_F(DateTieredTest, KeyMayExistTest) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(2, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - - // Put data in database - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - - Sleep(1); - // T=1, keys should still reside in database - for (auto& kv : map_insert) { - std::string value; - ASSERT_TRUE(date_tiered_db_->KeyMayExist(ropts, kv.first, &value)); - ASSERT_EQ(value, kv.second); - } -} - -// Database open and close should not affect -TEST_F(DateTieredTest, MultiOpen) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data 
- OpenDateTieredDB(4, 4); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - - // Put data in database - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - CloseDateTieredDB(); - - Sleep(1); - OpenDateTieredDB(2, 2); - // T=1, keys should still reside in database - for (auto& kv : map_insert) { - std::string value; - ASSERT_OK(date_tiered_db_->Get(ropts, kv.first, &value)); - ASSERT_EQ(value, kv.second); - } - - Sleep(1); - // T=2, keys should not be retrieved - for (auto& kv : map_insert) { - std::string value; - auto s = date_tiered_db_->Get(ropts, kv.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } - - CloseDateTieredDB(); -} - -// If the key in Put() is obsolete, the data should not be written into database -TEST_F(DateTieredTest, InsertObsoleteDate) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(2, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - - Sleep(2); - // T=2, keys put into database are already obsolete - // Put data in database. Operations should not return OK - for (auto& kv : map_insert) { - auto s = date_tiered_db_->Put(wopts, kv.first, kv.second); - ASSERT_TRUE(s.IsInvalidArgument()); - } - - // Data should not be found in database - for (auto& kv : map_insert) { - std::string value; - auto s = date_tiered_db_->Get(ropts, kv.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } - - CloseDateTieredDB(); -} - -// Resets the timestamp of a set of kvs by updating them and checks that they -// are not deleted according to the old timestamp -TEST_F(DateTieredTest, ColumnFamilyCounts) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(4, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - // Only default column family - ASSERT_EQ(1, GetColumnFamilyCount()); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - // A time series column family is created - ASSERT_EQ(2, GetColumnFamilyCount()); - - Sleep(2); - KVMap map_insert2; - MakeKVMap(kSampleSize_, &map_insert2); - for (auto& kv : map_insert2) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - // Another time series column family is created - ASSERT_EQ(3, GetColumnFamilyCount()); - - Sleep(4); - - // Data should not be found in database - for (auto& kv : map_insert) { - std::string value; - auto s = date_tiered_db_->Get(ropts, kv.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } - - // Explicitly drop obsolete column families - date_tiered_db_->DropObsoleteColumnFamilies(); - - // The first column family is deleted from database - ASSERT_EQ(2, GetColumnFamilyCount()); - - CloseDateTieredDB(); -} - -// Puts a set of values and checks its presence using iterator during ttl -TEST_F(DateTieredTest, IteratorLifeCycle) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(2, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - // Create key value pairs to insert - KVMap map_insert; - MakeKVMap(kSampleSize_, &map_insert); - Iterator* dbiter; - - // Put data in database - for (auto& kv : map_insert) { - ASSERT_OK(date_tiered_db_->Put(wopts, 
kv.first, kv.second)); - } - - Sleep(1); - ASSERT_EQ(2, GetColumnFamilyCount()); - // T=1, keys should still reside in database - dbiter = date_tiered_db_->NewIterator(ropts); - dbiter->SeekToFirst(); - for (auto& kv : map_insert) { - ASSERT_TRUE(dbiter->Valid()); - ASSERT_EQ(0, dbiter->value().compare(kv.second)); - dbiter->Next(); - } - delete dbiter; - - Sleep(4); - // T=5, keys should not be retrieved - for (auto& kv : map_insert) { - std::string value; - auto s = date_tiered_db_->Get(ropts, kv.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } - - // Explicitly drop obsolete column families - date_tiered_db_->DropObsoleteColumnFamilies(); - - // Only default column family - ASSERT_EQ(1, GetColumnFamilyCount()); - - // Empty iterator - dbiter = date_tiered_db_->NewIterator(ropts); - dbiter->Seek(map_insert.begin()->first); - ASSERT_FALSE(dbiter->Valid()); - delete dbiter; - - CloseDateTieredDB(); -} - -// Iterator should be able to merge data from multiple column families -TEST_F(DateTieredTest, IteratorMerge) { - WriteOptions wopts; - ReadOptions ropts; - - // T=0, open the database and insert data - OpenDateTieredDB(4, 2); - ASSERT_TRUE(date_tiered_db_.get() != nullptr); - - Iterator* dbiter; - - // Put data in database - KVMap map_insert1; - MakeKVMap(kSampleSize_, &map_insert1); - for (auto& kv : map_insert1) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - ASSERT_EQ(2, GetColumnFamilyCount()); - - Sleep(2); - // Put more data - KVMap map_insert2; - MakeKVMap(kSampleSize_, &map_insert2); - for (auto& kv : map_insert2) { - ASSERT_OK(date_tiered_db_->Put(wopts, kv.first, kv.second)); - } - // Multiple column families for time series data - ASSERT_EQ(3, GetColumnFamilyCount()); - - // Iterator should be able to merge data from different column families - dbiter = date_tiered_db_->NewIterator(ropts); - dbiter->SeekToFirst(); - KVMap::iterator iter1 = map_insert1.begin(); - KVMap::iterator iter2 = map_insert2.begin(); - for (; iter1 != map_insert1.end() && iter2 != map_insert2.end(); - iter1++, iter2++) { - ASSERT_TRUE(dbiter->Valid()); - ASSERT_EQ(0, dbiter->value().compare(iter1->second)); - dbiter->Next(); - - ASSERT_TRUE(dbiter->Valid()); - ASSERT_EQ(0, dbiter->value().compare(iter2->second)); - dbiter->Next(); - } - delete dbiter; - - CloseDateTieredDB(); -} - -} // namespace rocksdb - -// A black-box test for the DateTieredDB around rocksdb -int main(int argc, char** argv) { - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} - -#else -#include - -int main(int /*argc*/, char** /*argv*/) { - fprintf(stderr, "SKIPPED as DateTieredDB is not supported in ROCKSDB_LITE\n"); - return 0; -} - -#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/debug.cc b/ceph/src/rocksdb/utilities/debug.cc index e0c5f5566..72fcbf0f5 100644 --- a/ceph/src/rocksdb/utilities/debug.cc +++ b/ceph/src/rocksdb/utilities/debug.cc @@ -19,9 +19,11 @@ Status GetAllKeyVersions(DB* db, Slice begin_key, Slice end_key, DBImpl* idb = static_cast(db->GetRootDB()); auto icmp = InternalKeyComparator(idb->GetOptions().comparator); - RangeDelAggregator range_del_agg(icmp, {} /* snapshots */); + ReadRangeDelAggregator range_del_agg(&icmp, + kMaxSequenceNumber /* upper_bound */); Arena arena; - ScopedArenaIterator iter(idb->NewInternalIterator(&arena, &range_del_agg)); + ScopedArenaIterator iter( + idb->NewInternalIterator(&arena, &range_del_agg, kMaxSequenceNumber)); if (!begin_key.empty()) { InternalKey ikey; diff --git a/ceph/src/rocksdb/utilities/document/document_db.cc 
b/ceph/src/rocksdb/utilities/document/document_db.cc deleted file mode 100644 index 279e4cb4d..000000000 --- a/ceph/src/rocksdb/utilities/document/document_db.cc +++ /dev/null @@ -1,1207 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#ifndef ROCKSDB_LITE - -#include "rocksdb/utilities/document_db.h" - -#include "rocksdb/cache.h" -#include "rocksdb/table.h" -#include "rocksdb/filter_policy.h" -#include "rocksdb/comparator.h" -#include "rocksdb/db.h" -#include "rocksdb/slice.h" -#include "rocksdb/utilities/json_document.h" -#include "util/coding.h" -#include "util/mutexlock.h" -#include "port/port.h" - -namespace rocksdb { - -// IMPORTANT NOTE: Secondary index column families should be very small and -// generally fit in memory. Assume that accessing secondary index column -// families is much faster than accessing primary index (data heap) column -// family. Accessing a key (i.e. checking for existence) from a column family in -// RocksDB is not much faster than accessing both key and value since they are -// kept together and loaded from storage together. - -namespace { -// < 0 <=> lhs < rhs -// == 0 <=> lhs == rhs -// > 0 <=> lhs > rhs -// TODO(icanadi) move this to JSONDocument? -int DocumentCompare(const JSONDocument& lhs, const JSONDocument& rhs) { - assert(lhs.IsObject() == false && rhs.IsObject() == false && - lhs.type() == rhs.type()); - - switch (lhs.type()) { - case JSONDocument::kNull: - return 0; - case JSONDocument::kBool: - return static_cast<int>(lhs.GetBool()) - static_cast<int>(rhs.GetBool()); - case JSONDocument::kDouble: { - double res = lhs.GetDouble() - rhs.GetDouble(); - return res == 0.0 ? 0 : (res < 0.0 ? -1 : 1); - } - case JSONDocument::kInt64: { - int64_t res = lhs.GetInt64() - rhs.GetInt64(); - return res == 0 ? 0 : (res < 0 ?
-1 : 1); - } - case JSONDocument::kString: - return Slice(lhs.GetString()).compare(Slice(rhs.GetString())); - default: - assert(false); - } - return 0; -} -} // namespace - -class Filter { - public: - // returns nullptr on parse failure - static Filter* ParseFilter(const JSONDocument& filter); - - struct Interval { - JSONDocument upper_bound; - JSONDocument lower_bound; - bool upper_inclusive; - bool lower_inclusive; - Interval() - : upper_bound(), - lower_bound(), - upper_inclusive(false), - lower_inclusive(false) {} - Interval(const JSONDocument& ub, const JSONDocument& lb, bool ui, bool li) - : upper_bound(ub), - lower_bound(lb), - upper_inclusive(ui), - lower_inclusive(li) { - } - - void UpdateUpperBound(const JSONDocument& ub, bool inclusive); - void UpdateLowerBound(const JSONDocument& lb, bool inclusive); - }; - - bool SatisfiesFilter(const JSONDocument& document) const; - const Interval* GetInterval(const std::string& field) const; - - private: - explicit Filter(const JSONDocument& filter) : filter_(filter.Copy()) { - assert(filter_.IsOwner()); - } - - // copied from the parameter - const JSONDocument filter_; - // constant after construction - std::unordered_map intervals_; -}; - -void Filter::Interval::UpdateUpperBound(const JSONDocument& ub, - bool inclusive) { - bool update = upper_bound.IsNull(); - if (!update) { - int cmp = DocumentCompare(upper_bound, ub); - update = (cmp > 0) || (cmp == 0 && !inclusive); - } - if (update) { - upper_bound = ub; - upper_inclusive = inclusive; - } -} - -void Filter::Interval::UpdateLowerBound(const JSONDocument& lb, - bool inclusive) { - bool update = lower_bound.IsNull(); - if (!update) { - int cmp = DocumentCompare(lower_bound, lb); - update = (cmp < 0) || (cmp == 0 && !inclusive); - } - if (update) { - lower_bound = lb; - lower_inclusive = inclusive; - } -} - -Filter* Filter::ParseFilter(const JSONDocument& filter) { - if (filter.IsObject() == false) { - return nullptr; - } - - std::unique_ptr f(new Filter(filter)); - - for (const auto& items : f->filter_.Items()) { - if (items.first.size() && items.first[0] == '$') { - // fields starting with '$' are commands - continue; - } - assert(f->intervals_.find(items.first) == f->intervals_.end()); - if (items.second.IsObject()) { - if (items.second.Count() == 0) { - // uhm...? - return nullptr; - } - Interval interval; - for (const auto& condition : items.second.Items()) { - if (condition.second.IsObject() || condition.second.IsArray()) { - // comparison operators not defined on objects. 
invalid array - return nullptr; - } - // comparison operators: - if (condition.first == "$gt") { - interval.UpdateLowerBound(condition.second, false); - } else if (condition.first == "$gte") { - interval.UpdateLowerBound(condition.second, true); - } else if (condition.first == "$lt") { - interval.UpdateUpperBound(condition.second, false); - } else if (condition.first == "$lte") { - interval.UpdateUpperBound(condition.second, true); - } else { - // TODO(icanadi) more logical operators - return nullptr; - } - } - f->intervals_.insert({items.first, interval}); - } else { - // equality - f->intervals_.insert( - {items.first, Interval(items.second, - items.second, true, true)}); - } - } - - return f.release(); -} - -const Filter::Interval* Filter::GetInterval(const std::string& field) const { - auto itr = intervals_.find(field); - if (itr == intervals_.end()) { - return nullptr; - } - // we can do that since intervals_ is constant after construction - return &itr->second; -} - -bool Filter::SatisfiesFilter(const JSONDocument& document) const { - for (const auto& interval : intervals_) { - if (!document.Contains(interval.first)) { - // doesn't have the value, doesn't satisfy the filter - // (we don't support null queries yet) - return false; - } - auto value = document[interval.first]; - if (!interval.second.upper_bound.IsNull()) { - if (value.type() != interval.second.upper_bound.type()) { - // no cross-type queries yet - // TODO(icanadi) do this at least for numbers! - return false; - } - int cmp = DocumentCompare(interval.second.upper_bound, value); - if (cmp < 0 || (cmp == 0 && interval.second.upper_inclusive == false)) { - // bigger (or equal) than upper bound - return false; - } - } - if (!interval.second.lower_bound.IsNull()) { - if (value.type() != interval.second.lower_bound.type()) { - // no cross-type queries yet - return false; - } - int cmp = DocumentCompare(interval.second.lower_bound, value); - if (cmp > 0 || (cmp == 0 && interval.second.lower_inclusive == false)) { - // smaller (or equal) than the lower bound - return false; - } - } - } - return true; -} - -class Index { - public: - Index() = default; - virtual ~Index() {} - - virtual const char* Name() const = 0; - - // Functions that are executed during write time - // --------------------------------------------- - // GetIndexKey() generates a key that will be used to index document and - // returns the key though the second std::string* parameter - virtual void GetIndexKey(const JSONDocument& document, - std::string* key) const = 0; - // Keys generated with GetIndexKey() will be compared using this comparator. - // It should be assumed that there will be a suffix added to the index key - // according to IndexKey implementation - virtual const Comparator* GetComparator() const = 0; - - // Functions that are executed during query time - // --------------------------------------------- - enum Direction { - kForwards, - kBackwards, - }; - // Returns true if this index can provide some optimization for satisfying - // filter. False otherwise - virtual bool UsefulIndex(const Filter& filter) const = 0; - // For every filter (assuming UsefulIndex()) there is a continuous interval of - // keys in the index that satisfy the index conditions. 
That interval can be - // three things: - // * [A, B] - // * [A, infinity> - // * <-infinity, B] - // - // Query engine that uses this Index for optimization will access the interval - // by first calling Position() and then iterating in the Direction (returned - // by Position()) while ShouldContinueLooking() is true. - // * For [A, B] interval Position() will Seek() to A and return kForwards. - // ShouldContinueLooking() will be true until the iterator value gets beyond B - // -- then it will return false - // * For [A, infinity> Position() will Seek() to A and return kForwards. - // ShouldContinueLooking() will always return true - // * For <-infinity, B] Position() will Seek() to B and return kBackwards. - // ShouldContinueLooking() will always return true (given that iterator is - // advanced by calling Prev()) - virtual Direction Position(const Filter& filter, - Iterator* iterator) const = 0; - virtual bool ShouldContinueLooking(const Filter& filter, - const Slice& secondary_key, - Direction direction) const = 0; - - // Static function that is executed when Index is created - // --------------------------------------------- - // Create Index from user-supplied description. Return nullptr on parse - // failure. - static Index* CreateIndexFromDescription(const JSONDocument& description, - const std::string& name); - - private: - // No copying allowed - Index(const Index&); - void operator=(const Index&); -}; - -// Encoding helper function -namespace { -std::string InternalSecondaryIndexName(const std::string& user_name) { - return "index_" + user_name; -} - -// Don't change these, they are persisted in secondary indexes -enum JSONPrimitivesEncoding : char { - kNull = 0x1, - kBool = 0x2, - kDouble = 0x3, - kInt64 = 0x4, - kString = 0x5, -}; - -// encodes simple JSON members (meaning string, integer, etc) -// the end result of this will be lexicographically compared to each other -bool EncodeJSONPrimitive(const JSONDocument& json, std::string* dst) { - // TODO(icanadi) revise this at some point, have a custom comparator - switch (json.type()) { - case JSONDocument::kNull: - dst->push_back(kNull); - break; - case JSONDocument::kBool: - dst->push_back(kBool); - dst->push_back(static_cast(json.GetBool())); - break; - case JSONDocument::kDouble: - dst->push_back(kDouble); - PutFixed64(dst, static_cast(json.GetDouble())); - break; - case JSONDocument::kInt64: - dst->push_back(kInt64); - { - auto val = json.GetInt64(); - dst->push_back((val < 0) ? 
'0' : '1'); - PutFixed64(dst, static_cast<uint64_t>(val)); - } - break; - case JSONDocument::kString: - dst->push_back(kString); - dst->append(json.GetString()); - break; - default: - return false; - } - return true; -} - -} // namespace - -// format of the secondary key is: -// <secondary_key><primary_key><primary_key_offset (4 bytes)> -class IndexKey { - public: - IndexKey() : ok_(false) {} - explicit IndexKey(const Slice& slice) { - if (slice.size() < sizeof(uint32_t)) { - ok_ = false; - return; - } - uint32_t primary_key_offset = - DecodeFixed32(slice.data() + slice.size() - sizeof(uint32_t)); - if (primary_key_offset >= slice.size() - sizeof(uint32_t)) { - ok_ = false; - return; - } - parts_[0] = Slice(slice.data(), primary_key_offset); - parts_[1] = Slice(slice.data() + primary_key_offset, - slice.size() - primary_key_offset - sizeof(uint32_t)); - ok_ = true; - } - IndexKey(const Slice& secondary_key, const Slice& primary_key) : ok_(true) { - parts_[0] = secondary_key; - parts_[1] = primary_key; - } - - SliceParts GetSliceParts() { - uint32_t primary_key_offset = static_cast<uint32_t>(parts_[0].size()); - EncodeFixed32(primary_key_offset_buf_, primary_key_offset); - parts_[2] = Slice(primary_key_offset_buf_, sizeof(uint32_t)); - return SliceParts(parts_, 3); - } - - const Slice& GetPrimaryKey() const { return parts_[1]; } - const Slice& GetSecondaryKey() const { return parts_[0]; } - - bool ok() const { return ok_; } - - private: - bool ok_; - // 0 -- secondary key - // 1 -- primary key - // 2 -- primary key offset - Slice parts_[3]; - char primary_key_offset_buf_[sizeof(uint32_t)]; -}; - -class SimpleSortedIndex : public Index { - public: - SimpleSortedIndex(const std::string& field, const std::string& name) - : field_(field), name_(name) {} - - virtual const char* Name() const override { return name_.c_str(); } - - virtual void GetIndexKey(const JSONDocument& document, std::string* key) const - override { - if (!document.Contains(field_)) { - if (!EncodeJSONPrimitive(JSONDocument(JSONDocument::kNull), key)) { - assert(false); - } - } else { - if (!EncodeJSONPrimitive(document[field_], key)) { - assert(false); - } - } - } - virtual const Comparator* GetComparator() const override { - return BytewiseComparator(); - } - - virtual bool UsefulIndex(const Filter& filter) const override { - return filter.GetInterval(field_) != nullptr; - } - // REQUIRES: UsefulIndex(filter) == true - virtual Direction Position(const Filter& filter, - Iterator* iterator) const override { - auto interval = filter.GetInterval(field_); - assert(interval != nullptr); // because index is useful - Direction direction; - - const JSONDocument* limit; - if (!interval->lower_bound.IsNull()) { - limit = &(interval->lower_bound); - direction = kForwards; - } else { - limit = &(interval->upper_bound); - direction = kBackwards; - } - - std::string encoded_limit; - if (!EncodeJSONPrimitive(*limit, &encoded_limit)) { - assert(false); - } - iterator->Seek(Slice(encoded_limit)); - - return direction; - } - // REQUIRES: UsefulIndex(filter) == true -#if defined(_MSC_VER) -#pragma warning(push) -#pragma warning(disable : 4702) // Unreachable code -#endif - virtual bool ShouldContinueLooking( - const Filter& filter, const Slice& secondary_key, - Index::Direction direction) const override { - auto interval = filter.GetInterval(field_); - assert(interval != nullptr); // because index is useful - if (direction == kForwards) { - if (interval->upper_bound.IsNull()) { - // continue looking, no upper bound - return true; - } - std::string encoded_upper_bound; - if (!EncodeJSONPrimitive(interval->upper_bound,
&encoded_upper_bound)) { - // uhm...? - // TODO(icanadi) store encoded upper and lower bounds in Filter*? - assert(false); - } - // TODO(icanadi) we need to somehow decode this and use DocumentCompare() - int compare = secondary_key.compare(Slice(encoded_upper_bound)); - // if (current key is bigger than upper bound) OR (current key is equal to - // upper bound, but inclusive is false) THEN stop looking. otherwise, - // continue - return (compare > 0 || - (compare == 0 && interval->upper_inclusive == false)) - ? false - : true; - } else { - assert(direction == kBackwards); - if (interval->lower_bound.IsNull()) { - // continue looking, no lower bound - return true; - } - std::string encoded_lower_bound; - if (!EncodeJSONPrimitive(interval->lower_bound, &encoded_lower_bound)) { - // uhm...? - // TODO(icanadi) store encoded upper and lower bounds in Filter*? - assert(false); - } - // TODO(icanadi) we need to somehow decode this and use DocumentCompare() - int compare = secondary_key.compare(Slice(encoded_lower_bound)); - // if (current key is smaller than lower bound) OR (current key is equal - // to lower bound, but inclusive is false) THEN stop looking. otherwise, - // continue - return (compare < 0 || - (compare == 0 && interval->lower_inclusive == false)) - ? false - : true; - } - - assert(false); - // this is here just so compiler doesn't complain - return false; - } -#if defined(_MSC_VER) -#pragma warning(pop) -#endif - private: - std::string field_; - std::string name_; -}; - -Index* Index::CreateIndexFromDescription(const JSONDocument& description, - const std::string& name) { - if (!description.IsObject() || description.Count() != 1) { - // not supported yet - return nullptr; - } - const auto& field = *description.Items().begin(); - if (field.second.IsInt64() == false || field.second.GetInt64() != 1) { - // not supported yet - return nullptr; - } - return new SimpleSortedIndex(field.first, name); -} - -class CursorWithFilterIndexed : public Cursor { - public: - CursorWithFilterIndexed(Iterator* primary_index_iter, - Iterator* secondary_index_iter, const Index* index, - const Filter* filter) - : primary_index_iter_(primary_index_iter), - secondary_index_iter_(secondary_index_iter), - index_(index), - filter_(filter), - valid_(true), - current_json_document_(nullptr) { - assert(filter_.get() != nullptr); - direction_ = index->Position(*filter_.get(), secondary_index_iter_.get()); - UpdateIndexKey(); - AdvanceUntilSatisfies(); - } - - virtual bool Valid() const override { - return valid_ && secondary_index_iter_->Valid(); - } - virtual void Next() override { - assert(Valid()); - Advance(); - AdvanceUntilSatisfies(); - } - // temporary object. 
copy it if you want to use it - virtual const JSONDocument& document() const override { - assert(Valid()); - return *current_json_document_; - } - virtual Status status() const override { - if (!status_.ok()) { - return status_; - } - if (!primary_index_iter_->status().ok()) { - return primary_index_iter_->status(); - } - return secondary_index_iter_->status(); - } - - private: - void Advance() { - if (direction_ == Index::kForwards) { - secondary_index_iter_->Next(); - } else { - secondary_index_iter_->Prev(); - } - UpdateIndexKey(); - } - void AdvanceUntilSatisfies() { - bool found = false; - while (secondary_index_iter_->Valid() && - index_->ShouldContinueLooking( - *filter_.get(), index_key_.GetSecondaryKey(), direction_)) { - if (!UpdateJSONDocument()) { - // corruption happened - return; - } - if (filter_->SatisfiesFilter(*current_json_document_)) { - // we found satisfied! - found = true; - break; - } else { - // doesn't satisfy :( - Advance(); - } - } - if (!found) { - valid_ = false; - } - } - - bool UpdateJSONDocument() { - assert(secondary_index_iter_->Valid()); - primary_index_iter_->Seek(index_key_.GetPrimaryKey()); - if (!primary_index_iter_->Valid()) { - status_ = Status::Corruption( - "Inconsistency between primary and secondary index"); - valid_ = false; - return false; - } - current_json_document_.reset( - JSONDocument::Deserialize(primary_index_iter_->value())); - assert(current_json_document_->IsOwner()); - if (current_json_document_.get() == nullptr) { - status_ = Status::Corruption("JSON deserialization failed"); - valid_ = false; - return false; - } - return true; - } - void UpdateIndexKey() { - if (secondary_index_iter_->Valid()) { - index_key_ = IndexKey(secondary_index_iter_->key()); - if (!index_key_.ok()) { - status_ = Status::Corruption("Invalid index key"); - valid_ = false; - } - } - } - std::unique_ptr primary_index_iter_; - std::unique_ptr secondary_index_iter_; - // we don't own index_ - const Index* index_; - Index::Direction direction_; - std::unique_ptr filter_; - bool valid_; - IndexKey index_key_; - std::unique_ptr current_json_document_; - Status status_; -}; - -class CursorFromIterator : public Cursor { - public: - explicit CursorFromIterator(Iterator* iter) - : iter_(iter), current_json_document_(nullptr) { - iter_->SeekToFirst(); - UpdateCurrentJSON(); - } - - virtual bool Valid() const override { return status_.ok() && iter_->Valid(); } - virtual void Next() override { - iter_->Next(); - UpdateCurrentJSON(); - } - virtual const JSONDocument& document() const override { - assert(Valid()); - return *current_json_document_; - }; - virtual Status status() const override { - if (!status_.ok()) { - return status_; - } - return iter_->status(); - } - - // not part of public Cursor interface - Slice key() const { return iter_->key(); } - - private: - void UpdateCurrentJSON() { - if (Valid()) { - current_json_document_.reset(JSONDocument::Deserialize(iter_->value())); - if (current_json_document_.get() == nullptr) { - status_ = Status::Corruption("JSON deserialization failed"); - } - } - } - - Status status_; - std::unique_ptr iter_; - std::unique_ptr current_json_document_; -}; - -class CursorWithFilter : public Cursor { - public: - CursorWithFilter(Cursor* base_cursor, const Filter* filter) - : base_cursor_(base_cursor), filter_(filter) { - assert(filter_.get() != nullptr); - SeekToNextSatisfies(); - } - virtual bool Valid() const override { return base_cursor_->Valid(); } - virtual void Next() override { - assert(Valid()); - base_cursor_->Next(); - 
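// (editor's note) After advancing the base cursor, SeekToNextSatisfies()
// below scans forward and stops at the first document that passes filter_,
// so Valid() only ever exposes documents that match the filter.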
SeekToNextSatisfies(); - } - virtual const JSONDocument& document() const override { - assert(Valid()); - return base_cursor_->document(); - } - virtual Status status() const override { return base_cursor_->status(); } - - private: - void SeekToNextSatisfies() { - for (; base_cursor_->Valid(); base_cursor_->Next()) { - if (filter_->SatisfiesFilter(base_cursor_->document())) { - break; - } - } - } - std::unique_ptr base_cursor_; - std::unique_ptr filter_; -}; - -class CursorError : public Cursor { - public: - explicit CursorError(Status s) : s_(s) { assert(!s.ok()); } - virtual Status status() const override { return s_; } - virtual bool Valid() const override { return false; } - virtual void Next() override {} - virtual const JSONDocument& document() const override { - assert(false); - // compiler complains otherwise - return trash_; - } - - private: - Status s_; - JSONDocument trash_; -}; - -class DocumentDBImpl : public DocumentDB { - public: - DocumentDBImpl( - DB* db, ColumnFamilyHandle* primary_key_column_family, - const std::vector>& indexes, - const Options& rocksdb_options) - : DocumentDB(db), - primary_key_column_family_(primary_key_column_family), - rocksdb_options_(rocksdb_options) { - for (const auto& index : indexes) { - name_to_index_.insert( - {index.first->Name(), IndexColumnFamily(index.first, index.second)}); - } - } - - ~DocumentDBImpl() { - for (auto& iter : name_to_index_) { - delete iter.second.index; - delete iter.second.column_family; - } - delete primary_key_column_family_; - } - - virtual Status CreateIndex(const WriteOptions& write_options, - const IndexDescriptor& index) override { - auto index_obj = - Index::CreateIndexFromDescription(*index.description, index.name); - if (index_obj == nullptr) { - return Status::InvalidArgument("Failed parsing index description"); - } - - ColumnFamilyHandle* cf_handle; - Status s = - CreateColumnFamily(ColumnFamilyOptions(rocksdb_options_), - InternalSecondaryIndexName(index.name), &cf_handle); - if (!s.ok()) { - delete index_obj; - return s; - } - - MutexLock l(&write_mutex_); - - std::unique_ptr cursor(new CursorFromIterator( - DocumentDB::NewIterator(ReadOptions(), primary_key_column_family_))); - - WriteBatch batch; - for (; cursor->Valid(); cursor->Next()) { - std::string secondary_index_key; - index_obj->GetIndexKey(cursor->document(), &secondary_index_key); - IndexKey index_key(Slice(secondary_index_key), cursor->key()); - batch.Put(cf_handle, index_key.GetSliceParts(), SliceParts()); - } - - if (!cursor->status().ok()) { - delete index_obj; - return cursor->status(); - } - - { - MutexLock l_nti(&name_to_index_mutex_); - name_to_index_.insert( - {index.name, IndexColumnFamily(index_obj, cf_handle)}); - } - - return DocumentDB::Write(write_options, &batch); - } - - virtual Status DropIndex(const std::string& name) override { - MutexLock l(&write_mutex_); - - auto index_iter = name_to_index_.find(name); - if (index_iter == name_to_index_.end()) { - return Status::InvalidArgument("No such index"); - } - - Status s = DropColumnFamily(index_iter->second.column_family); - if (!s.ok()) { - return s; - } - - delete index_iter->second.index; - delete index_iter->second.column_family; - - // remove from name_to_index_ - { - MutexLock l_nti(&name_to_index_mutex_); - name_to_index_.erase(index_iter); - } - - return Status::OK(); - } - - virtual Status Insert(const WriteOptions& options, - const JSONDocument& document) override { - WriteBatch batch; - - if (!document.IsObject()) { - return Status::InvalidArgument("Document not an 
object"); - } - if (!document.Contains(kPrimaryKey)) { - return Status::InvalidArgument("No primary key"); - } - auto primary_key = document[kPrimaryKey]; - if (primary_key.IsNull() || - (!primary_key.IsString() && !primary_key.IsInt64())) { - return Status::InvalidArgument( - "Primary key format error"); - } - std::string encoded_document; - document.Serialize(&encoded_document); - std::string primary_key_encoded; - if (!EncodeJSONPrimitive(primary_key, &primary_key_encoded)) { - // previous call should be guaranteed to pass because of all primary_key - // conditions checked before - assert(false); - } - Slice primary_key_slice(primary_key_encoded); - - // Lock now, since we're starting DB operations - MutexLock l(&write_mutex_); - // check if there is already a document with the same primary key - PinnableSlice value; - Status s = DocumentDB::Get(ReadOptions(), primary_key_column_family_, - primary_key_slice, &value); - if (!s.IsNotFound()) { - return s.ok() ? Status::InvalidArgument("Duplicate primary key!") : s; - } - - batch.Put(primary_key_column_family_, primary_key_slice, encoded_document); - - for (const auto& iter : name_to_index_) { - std::string secondary_index_key; - iter.second.index->GetIndexKey(document, &secondary_index_key); - IndexKey index_key(Slice(secondary_index_key), primary_key_slice); - batch.Put(iter.second.column_family, index_key.GetSliceParts(), - SliceParts()); - } - - return DocumentDB::Write(options, &batch); - } - - virtual Status Remove(const ReadOptions& read_options, - const WriteOptions& write_options, - const JSONDocument& query) override { - MutexLock l(&write_mutex_); - std::unique_ptr cursor( - ConstructFilterCursor(read_options, nullptr, query)); - - WriteBatch batch; - for (; cursor->status().ok() && cursor->Valid(); cursor->Next()) { - const auto& document = cursor->document(); - if (!document.IsObject()) { - return Status::Corruption("Document corruption"); - } - if (!document.Contains(kPrimaryKey)) { - return Status::Corruption("Document corruption"); - } - auto primary_key = document[kPrimaryKey]; - if (primary_key.IsNull() || - (!primary_key.IsString() && !primary_key.IsInt64())) { - return Status::Corruption("Document corruption"); - } - - // TODO(icanadi) Instead of doing this, just get primary key encoding from - // cursor, as it already has this information - std::string primary_key_encoded; - if (!EncodeJSONPrimitive(primary_key, &primary_key_encoded)) { - // previous call should be guaranteed to pass because of all primary_key - // conditions checked before - assert(false); - } - Slice primary_key_slice(primary_key_encoded); - batch.Delete(primary_key_column_family_, primary_key_slice); - - for (const auto& iter : name_to_index_) { - std::string secondary_index_key; - iter.second.index->GetIndexKey(document, &secondary_index_key); - IndexKey index_key(Slice(secondary_index_key), primary_key_slice); - batch.Delete(iter.second.column_family, index_key.GetSliceParts()); - } - } - - if (!cursor->status().ok()) { - return cursor->status(); - } - - return DocumentDB::Write(write_options, &batch); - } - - virtual Status Update(const ReadOptions& read_options, - const WriteOptions& write_options, - const JSONDocument& filter, - const JSONDocument& updates) override { - MutexLock l(&write_mutex_); - std::unique_ptr cursor( - ConstructFilterCursor(read_options, nullptr, filter)); - - if (!updates.IsObject()) { - return Status::Corruption("Bad update document format"); - } - WriteBatch batch; - for (; cursor->status().ok() && cursor->Valid(); 
cursor->Next()) { - const auto& old_document = cursor->document(); - JSONDocument new_document(old_document); - if (!new_document.IsObject()) { - return Status::Corruption("Document corruption"); - } - // TODO(icanadi) Make this nicer, something like class Filter - for (const auto& update : updates.Items()) { - if (update.first == "$set") { - JSONDocumentBuilder builder; - bool res __attribute__((__unused__)) = builder.WriteStartObject(); - assert(res); - for (const auto& itr : update.second.Items()) { - if (itr.first == kPrimaryKey) { - return Status::NotSupported("Please don't change primary key"); - } - res = builder.WriteKeyValue(itr.first, itr.second); - assert(res); - } - res = builder.WriteEndObject(); - assert(res); - JSONDocument update_document = builder.GetJSONDocument(); - builder.Reset(); - res = builder.WriteStartObject(); - assert(res); - for (const auto& itr : new_document.Items()) { - if (update_document.Contains(itr.first)) { - res = builder.WriteKeyValue(itr.first, - update_document[itr.first]); - } else { - res = builder.WriteKeyValue(itr.first, new_document[itr.first]); - } - assert(res); - } - res = builder.WriteEndObject(); - assert(res); - new_document = builder.GetJSONDocument(); - assert(new_document.IsOwner()); - } else { - // TODO(icanadi) more commands - return Status::InvalidArgument("Can't understand update command"); - } - } - - // TODO(icanadi) reuse some of this code - if (!new_document.Contains(kPrimaryKey)) { - return Status::Corruption("Corrupted document -- primary key missing"); - } - auto primary_key = new_document[kPrimaryKey]; - if (primary_key.IsNull() || - (!primary_key.IsString() && !primary_key.IsInt64())) { - // This will happen when document on storage doesn't have primary key, - // since we don't support any update operations on primary key. That's - // why this is corruption error - return Status::Corruption("Corrupted document -- primary key missing"); - } - std::string encoded_document; - new_document.Serialize(&encoded_document); - std::string primary_key_encoded; - if (!EncodeJSONPrimitive(primary_key, &primary_key_encoded)) { - // previous call should be guaranteed to pass because of all primary_key - // conditions checked before - assert(false); - } - Slice primary_key_slice(primary_key_encoded); - batch.Put(primary_key_column_family_, primary_key_slice, - encoded_document); - - for (const auto& iter : name_to_index_) { - std::string old_key, new_key; - iter.second.index->GetIndexKey(old_document, &old_key); - iter.second.index->GetIndexKey(new_document, &new_key); - if (old_key == new_key) { - // don't need to update this secondary index - continue; - } - - IndexKey old_index_key(Slice(old_key), primary_key_slice); - IndexKey new_index_key(Slice(new_key), primary_key_slice); - - batch.Delete(iter.second.column_family, old_index_key.GetSliceParts()); - batch.Put(iter.second.column_family, new_index_key.GetSliceParts(), - SliceParts()); - } - } - - if (!cursor->status().ok()) { - return cursor->status(); - } - - return DocumentDB::Write(write_options, &batch); - } - - virtual Cursor* Query(const ReadOptions& read_options, - const JSONDocument& query) override { - Cursor* cursor = nullptr; - - if (!query.IsArray()) { - return new CursorError( - Status::InvalidArgument("Query has to be an array")); - } - - // TODO(icanadi) support index "_id" - for (size_t i = 0; i < query.Count(); ++i) { - const auto& command_doc = query[i]; - if (command_doc.Count() != 1) { - // there can be only one key-value pair in each of array elements. 
- // key is the command and value are the params - delete cursor; - return new CursorError(Status::InvalidArgument("Invalid query")); - } - const auto& command = *command_doc.Items().begin(); - - if (command.first == "$filter") { - cursor = ConstructFilterCursor(read_options, cursor, command.second); - } else { - // only filter is supported for now - delete cursor; - return new CursorError(Status::InvalidArgument("Invalid query")); - } - } - - if (cursor == nullptr) { - cursor = new CursorFromIterator( - DocumentDB::NewIterator(read_options, primary_key_column_family_)); - } - - return cursor; - } - - // RocksDB functions - using DB::Get; - virtual Status Get(const ReadOptions& /*options*/, - ColumnFamilyHandle* /*column_family*/, - const Slice& /*key*/, PinnableSlice* /*value*/) override { - return Status::NotSupported(""); - } - virtual Status Get(const ReadOptions& /*options*/, const Slice& /*key*/, - std::string* /*value*/) override { - return Status::NotSupported(""); - } - virtual Status Write(const WriteOptions& /*options*/, - WriteBatch* /*updates*/) override { - return Status::NotSupported(""); - } - virtual Iterator* NewIterator( - const ReadOptions& /*options*/, - ColumnFamilyHandle* /*column_family*/) override { - return nullptr; - } - virtual Iterator* NewIterator(const ReadOptions& /*options*/) override { - return nullptr; - } - - private: -#if defined(_MSC_VER) -#pragma warning(push) -#pragma warning(disable : 4702) // unreachable code -#endif - Cursor* ConstructFilterCursor(ReadOptions read_options, Cursor* cursor, - const JSONDocument& query) { - std::unique_ptr filter(Filter::ParseFilter(query)); - if (filter.get() == nullptr) { - return new CursorError(Status::InvalidArgument("Invalid query")); - } - - IndexColumnFamily tmp_storage(nullptr, nullptr); - - if (cursor == nullptr) { - IndexColumnFamily* index_column_family = nullptr; - if (query.Contains("$index") && query["$index"].IsString()) { - { - auto index_name = query["$index"]; - MutexLock l(&name_to_index_mutex_); - auto index_iter = name_to_index_.find(index_name.GetString()); - if (index_iter != name_to_index_.end()) { - tmp_storage = index_iter->second; - index_column_family = &tmp_storage; - } else { - return new CursorError( - Status::InvalidArgument("Index does not exist")); - } - } - } - - if (index_column_family != nullptr && - index_column_family->index->UsefulIndex(*filter.get())) { - std::vector iterators; - Status s = DocumentDB::NewIterators( - read_options, - {primary_key_column_family_, index_column_family->column_family}, - &iterators); - if (!s.ok()) { - delete cursor; - return new CursorError(s); - } - assert(iterators.size() == 2); - return new CursorWithFilterIndexed(iterators[0], iterators[1], - index_column_family->index, - filter.release()); - } else { - return new CursorWithFilter( - new CursorFromIterator(DocumentDB::NewIterator( - read_options, primary_key_column_family_)), - filter.release()); - } - } else { - return new CursorWithFilter(cursor, filter.release()); - } - assert(false); - return nullptr; - } -#if defined(_MSC_VER) -#pragma warning(pop) -#endif - - // currently, we lock and serialize all writes to rocksdb. reads are not - // locked and always get consistent view of the database. 
we should optimize - // locking in the future - port::Mutex write_mutex_; - port::Mutex name_to_index_mutex_; - const char* kPrimaryKey = "_id"; - struct IndexColumnFamily { - IndexColumnFamily(Index* _index, ColumnFamilyHandle* _column_family) - : index(_index), column_family(_column_family) {} - Index* index; - ColumnFamilyHandle* column_family; - }; - - - // name_to_index_ protected: - // 1) when writing -- 1. lock write_mutex_, 2. lock name_to_index_mutex_ - // 2) when reading -- lock name_to_index_mutex_ OR write_mutex_ - std::unordered_map name_to_index_; - ColumnFamilyHandle* primary_key_column_family_; - Options rocksdb_options_; -}; - -namespace { -Options GetRocksDBOptionsFromOptions(const DocumentDBOptions& options) { - Options rocksdb_options; - rocksdb_options.max_background_compactions = options.background_threads - 1; - rocksdb_options.max_background_flushes = 1; - rocksdb_options.write_buffer_size = static_cast(options.memtable_size); - rocksdb_options.max_write_buffer_number = 6; - BlockBasedTableOptions table_options; - table_options.block_cache = NewLRUCache(static_cast(options.cache_size)); - rocksdb_options.table_factory.reset(NewBlockBasedTableFactory(table_options)); - return rocksdb_options; -} -} // namespace - -Status DocumentDB::Open(const DocumentDBOptions& options, - const std::string& name, - const std::vector& indexes, - DocumentDB** db, bool read_only) { - Options rocksdb_options = GetRocksDBOptionsFromOptions(options); - rocksdb_options.create_if_missing = true; - - std::vector column_families; - column_families.push_back(ColumnFamilyDescriptor( - kDefaultColumnFamilyName, ColumnFamilyOptions(rocksdb_options))); - for (const auto& index : indexes) { - column_families.emplace_back(InternalSecondaryIndexName(index.name), - ColumnFamilyOptions(rocksdb_options)); - } - std::vector handles; - DB* base_db; - Status s; - if (read_only) { - s = DB::OpenForReadOnly(DBOptions(rocksdb_options), name, column_families, - &handles, &base_db); - } else { - s = DB::Open(DBOptions(rocksdb_options), name, column_families, &handles, - &base_db); - } - if (!s.ok()) { - return s; - } - - std::vector> index_cf(indexes.size()); - assert(handles.size() == indexes.size() + 1); - for (size_t i = 0; i < indexes.size(); ++i) { - auto index = Index::CreateIndexFromDescription(*indexes[i].description, - indexes[i].name); - index_cf[i] = {index, handles[i + 1]}; - } - *db = new DocumentDBImpl(base_db, handles[0], index_cf, rocksdb_options); - return Status::OK(); -} - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/document/document_db_test.cc b/ceph/src/rocksdb/utilities/document/document_db_test.cc deleted file mode 100644 index 3ee560db1..000000000 --- a/ceph/src/rocksdb/utilities/document/document_db_test.cc +++ /dev/null @@ -1,338 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
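For orientation, here is a minimal usage sketch of the DocumentDB utility deleted above. This is an editor's illustration, not part of the diff; it assumes the pre-removal headers rocksdb/utilities/document_db.h and rocksdb/utilities/json_document.h and a hypothetical database path /tmp/docdb. It exercises exactly the paths implemented above and tested below: Open() with index descriptors, CreateIndex(), Insert() of documents keyed by "_id", and Query() with a "$filter" command.

#include <cassert>
#include <memory>

#include "rocksdb/utilities/document_db.h"
#include "rocksdb/utilities/json_document.h"

int main() {
  rocksdb::DocumentDBOptions options;
  rocksdb::DocumentDB::IndexDescriptor index;
  index.description = rocksdb::JSONDocument::ParseJSON("{\"name\": 1}");
  index.name = "name_index";

  rocksdb::DocumentDB* db;
  rocksdb::Status s = rocksdb::DocumentDB::Open(options, "/tmp/docdb", {}, &db);
  assert(s.ok());
  // Creates the secondary-index column family and back-fills it from the
  // primary column family, as implemented in CreateIndex() above.
  s = db->CreateIndex(rocksdb::WriteOptions(), index);
  assert(s.ok());

  // Every document must carry a string or int64 primary key under "_id".
  std::unique_ptr<rocksdb::JSONDocument> doc(
      rocksdb::JSONDocument::ParseJSON("{\"_id\": 1, \"name\": \"One\"}"));
  s = db->Insert(rocksdb::WriteOptions(), *doc);
  assert(s.ok());

  // Queries are JSON arrays of commands; "$index" asks the query engine to
  // serve the filter from a secondary index.
  std::unique_ptr<rocksdb::JSONDocument> query(rocksdb::JSONDocument::ParseJSON(
      "[{\"$filter\": {\"name\": \"One\", \"$index\": \"name_index\"}}]"));
  std::unique_ptr<rocksdb::Cursor> cursor(
      db->Query(rocksdb::ReadOptions(), *query));
  for (; cursor->Valid(); cursor->Next()) {
    // cursor->document() is a temporary; copy it if it must outlive Next().
  }

  delete index.description;  // the caller owns the parsed index description
  delete db;
  return 0;
}

The deleted test file that follows demonstrates the same flow in more depth, including range filters ($gt, $gte, $lt, $lte), Remove(), and Update() with "$set".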
- -#ifndef ROCKSDB_LITE - -#include - -#include "rocksdb/utilities/json_document.h" -#include "rocksdb/utilities/document_db.h" - -#include "util/testharness.h" -#include "util/testutil.h" - -namespace rocksdb { - -class DocumentDBTest : public testing::Test { - public: - DocumentDBTest() { - dbname_ = test::PerThreadDBPath("document_db_test"); - DestroyDB(dbname_, Options()); - } - ~DocumentDBTest() { - delete db_; - DestroyDB(dbname_, Options()); - } - - void AssertCursorIDs(Cursor* cursor, std::vector expected) { - std::vector got; - while (cursor->Valid()) { - ASSERT_TRUE(cursor->Valid()); - ASSERT_TRUE(cursor->document().Contains("_id")); - got.push_back(cursor->document()["_id"].GetInt64()); - cursor->Next(); - } - std::sort(expected.begin(), expected.end()); - std::sort(got.begin(), got.end()); - ASSERT_TRUE(got == expected); - } - - // converts ' to ", so that we don't have to escape " all over the place - std::string ConvertQuotes(const std::string& input) { - std::string output; - for (auto x : input) { - if (x == '\'') { - output.push_back('\"'); - } else { - output.push_back(x); - } - } - return output; - } - - void CreateIndexes(std::vector indexes) { - for (auto i : indexes) { - ASSERT_OK(db_->CreateIndex(WriteOptions(), i)); - } - } - - JSONDocument* Parse(const std::string& doc) { - return JSONDocument::ParseJSON(ConvertQuotes(doc).c_str()); - } - - std::string dbname_; - DocumentDB* db_; -}; - -TEST_F(DocumentDBTest, SimpleQueryTest) { - DocumentDBOptions options; - DocumentDB::IndexDescriptor index; - index.description = Parse("{\"name\": 1}"); - index.name = "name_index"; - - ASSERT_OK(DocumentDB::Open(options, dbname_, {}, &db_)); - CreateIndexes({index}); - delete db_; - db_ = nullptr; - // now there is index present - ASSERT_OK(DocumentDB::Open(options, dbname_, {index}, &db_)); - assert(db_ != nullptr); - delete index.description; - - std::vector json_objects = { - "{\"_id\': 1, \"name\": \"One\"}", "{\"_id\": 2, \"name\": \"Two\"}", - "{\"_id\": 3, \"name\": \"Three\"}", "{\"_id\": 4, \"name\": \"Four\"}"}; - - for (auto& json : json_objects) { - std::unique_ptr document(Parse(json)); - ASSERT_TRUE(document.get() != nullptr); - ASSERT_OK(db_->Insert(WriteOptions(), *document)); - } - - // inserting a document with existing primary key should return failure - { - std::unique_ptr document(Parse(json_objects[0])); - ASSERT_TRUE(document.get() != nullptr); - Status s = db_->Insert(WriteOptions(), *document); - ASSERT_TRUE(s.IsInvalidArgument()); - } - - // find equal to "Two" - { - std::unique_ptr query( - Parse("[{'$filter': {'name': 'Two', '$index': 'name_index'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {2}); - } - - // find less than "Three" - { - std::unique_ptr query(Parse( - "[{'$filter': {'name': {'$lt': 'Three'}, '$index': " - "'name_index'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - - AssertCursorIDs(cursor.get(), {1, 4}); - } - - // find less than "Three" without index - { - std::unique_ptr query( - Parse("[{'$filter': {'name': {'$lt': 'Three'} }}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {1, 4}); - } - - // remove less or equal to "Three" - { - std::unique_ptr query( - Parse("{'name': {'$lte': 'Three'}, '$index': 'name_index'}")); - ASSERT_OK(db_->Remove(ReadOptions(), WriteOptions(), *query)); - } - - // find all -- only "Two" left, everything else should be deleted - { - std::unique_ptr query(Parse("[]")); - 
std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {2}); - } -} - -TEST_F(DocumentDBTest, ComplexQueryTest) { - DocumentDBOptions options; - DocumentDB::IndexDescriptor priority_index; - priority_index.description = Parse("{'priority': 1}"); - priority_index.name = "priority"; - DocumentDB::IndexDescriptor job_name_index; - job_name_index.description = Parse("{'job_name': 1}"); - job_name_index.name = "job_name"; - DocumentDB::IndexDescriptor progress_index; - progress_index.description = Parse("{'progress': 1}"); - progress_index.name = "progress"; - - ASSERT_OK(DocumentDB::Open(options, dbname_, {}, &db_)); - CreateIndexes({priority_index, progress_index}); - delete priority_index.description; - delete progress_index.description; - - std::vector json_objects = { - "{'_id': 1, 'job_name': 'play', 'priority': 10, 'progress': 14.2}", - "{'_id': 2, 'job_name': 'white', 'priority': 2, 'progress': 45.1}", - "{'_id': 3, 'job_name': 'straw', 'priority': 5, 'progress': 83.2}", - "{'_id': 4, 'job_name': 'temporary', 'priority': 3, 'progress': 14.9}", - "{'_id': 5, 'job_name': 'white', 'priority': 4, 'progress': 44.2}", - "{'_id': 6, 'job_name': 'tea', 'priority': 1, 'progress': 12.4}", - "{'_id': 7, 'job_name': 'delete', 'priority': 2, 'progress': 77.54}", - "{'_id': 8, 'job_name': 'rock', 'priority': 3, 'progress': 93.24}", - "{'_id': 9, 'job_name': 'steady', 'priority': 3, 'progress': 9.1}", - "{'_id': 10, 'job_name': 'white', 'priority': 1, 'progress': 61.4}", - "{'_id': 11, 'job_name': 'who', 'priority': 4, 'progress': 39.41}", - "{'_id': 12, 'job_name': 'who', 'priority': -1, 'progress': 39.42}", - "{'_id': 13, 'job_name': 'who', 'priority': -2, 'progress': 39.42}", }; - - // add index on the fly! - CreateIndexes({job_name_index}); - delete job_name_index.description; - - for (auto& json : json_objects) { - std::unique_ptr document(Parse(json)); - ASSERT_TRUE(document != nullptr); - ASSERT_OK(db_->Insert(WriteOptions(), *document)); - } - - // 2 < priority < 4 AND progress > 10.0, index priority - { - std::unique_ptr query(Parse( - "[{'$filter': {'priority': {'$lt': 4, '$gt': 2}, 'progress': {'$gt': " - "10.0}, '$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {4, 8}); - } - - // -1 <= priority <= 1, index priority - { - std::unique_ptr query(Parse( - "[{'$filter': {'priority': {'$lte': 1, '$gte': -1}," - " '$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {6, 10, 12}); - } - - // 2 < priority < 4 AND progress > 10.0, index progress - { - std::unique_ptr query(Parse( - "[{'$filter': {'priority': {'$lt': 4, '$gt': 2}, 'progress': {'$gt': " - "10.0}, '$index': 'progress'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {4, 8}); - } - - // job_name == 'white' AND priority >= 2, index job_name - { - std::unique_ptr query(Parse( - "[{'$filter': {'job_name': 'white', 'priority': {'$gte': " - "2}, '$index': 'job_name'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {2, 5}); - } - - // 35.0 <= progress < 65.5, index progress - { - std::unique_ptr query(Parse( - "[{'$filter': {'progress': {'$gt': 5.0, '$gte': 35.0, '$lt': 65.5}, " - "'$index': 'progress'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {2, 5, 10, 11, 12, 13}); - } - - // 2 < priority <= 4, index priority 
- { - std::unique_ptr query(Parse( - "[{'$filter': {'priority': {'$gt': 2, '$lt': 8, '$lte': 4}, " - "'$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {4, 5, 8, 9, 11}); - } - - // Delete all whose progress is bigger than 50% - { - std::unique_ptr query( - Parse("{'progress': {'$gt': 50.0}, '$index': 'progress'}")); - ASSERT_OK(db_->Remove(ReadOptions(), WriteOptions(), *query)); - } - - // 2 < priority < 6, index priority - { - std::unique_ptr query(Parse( - "[{'$filter': {'priority': {'$gt': 2, '$lt': 6}, " - "'$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - AssertCursorIDs(cursor.get(), {4, 5, 9, 11}); - } - - // update set priority to 10 where job_name is 'white' - { - std::unique_ptr query(Parse("{'job_name': 'white'}")); - std::unique_ptr update(Parse("{'$set': {'priority': 10}}")); - ASSERT_OK(db_->Update(ReadOptions(), WriteOptions(), *query, *update)); - } - - // update twice: set priority to 15 where job_name is 'white' - { - std::unique_ptr query(Parse("{'job_name': 'white'}")); - std::unique_ptr update(Parse("{'$set': {'priority': 10}," - "'$set': {'priority': 15}}")); - ASSERT_OK(db_->Update(ReadOptions(), WriteOptions(), *query, *update)); - } - - // update twice: set priority to 15 and - // progress to 40 where job_name is 'white' - { - std::unique_ptr query(Parse("{'job_name': 'white'}")); - std::unique_ptr update( - Parse("{'$set': {'priority': 10, 'progress': 35}," - "'$set': {'priority': 15, 'progress': 40}}")); - ASSERT_OK(db_->Update(ReadOptions(), WriteOptions(), *query, *update)); - } - - // priority < 0 - { - std::unique_ptr query( - Parse("[{'$filter': {'priority': {'$lt': 0}, '$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - ASSERT_OK(cursor->status()); - AssertCursorIDs(cursor.get(), {12, 13}); - } - - // -2 < priority < 0 - { - std::unique_ptr query( - Parse("[{'$filter': {'priority': {'$gt': -2, '$lt': 0}," - " '$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - ASSERT_OK(cursor->status()); - AssertCursorIDs(cursor.get(), {12}); - } - - // -2 <= priority < 0 - { - std::unique_ptr query( - Parse("[{'$filter': {'priority': {'$gte': -2, '$lt': 0}," - " '$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - ASSERT_OK(cursor->status()); - AssertCursorIDs(cursor.get(), {12, 13}); - } - - // 4 < priority - { - std::unique_ptr query( - Parse("[{'$filter': {'priority': {'$gt': 4}, '$index': 'priority'}}]")); - std::unique_ptr cursor(db_->Query(ReadOptions(), *query)); - ASSERT_OK(cursor->status()); - AssertCursorIDs(cursor.get(), {1, 2, 5}); - } - - Status s = db_->DropIndex("doesnt-exist"); - ASSERT_TRUE(!s.ok()); - ASSERT_OK(db_->DropIndex("priority")); -} - -} // namespace rocksdb - -int main(int argc, char** argv) { - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} - -#else -#include - -int main(int /*argc*/, char** /*argv*/) { - fprintf(stderr, "SKIPPED as DocumentDB is not supported in ROCKSDB_LITE\n"); - return 0; -} - -#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/document/json_document.cc b/ceph/src/rocksdb/utilities/document/json_document.cc deleted file mode 100644 index 21a4c7dbc..000000000 --- a/ceph/src/rocksdb/utilities/document/json_document.cc +++ /dev/null @@ -1,610 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. 
-// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -#ifndef ROCKSDB_LITE - -#include "rocksdb/utilities/json_document.h" - -#ifndef __STDC_FORMAT_MACROS -#define __STDC_FORMAT_MACROS -#endif - -#include -#include -#include - -#include -#include -#include -#include -#include -#include - - -#include "third-party/fbson/FbsonDocument.h" -#include "third-party/fbson/FbsonJsonParser.h" -#include "third-party/fbson/FbsonUtil.h" -#include "util/coding.h" - -using std::placeholders::_1; - -namespace { - -size_t ObjectNumElem(const fbson::ObjectVal& objectVal) { - size_t size = 0; - for (auto keyValuePair : objectVal) { - (void)keyValuePair; - ++size; - } - return size; -} - -template -void InitJSONDocument(std::unique_ptr* data, - fbson::FbsonValue** value, - Func f) { - // TODO(stash): maybe add function to FbsonDocument to avoid creating array? - fbson::FbsonWriter writer; - bool res __attribute__((__unused__)) = writer.writeStartArray(); - assert(res); - uint32_t bytesWritten __attribute__((__unused__)); - bytesWritten = f(writer); - assert(bytesWritten != 0); - res = writer.writeEndArray(); - assert(res); - char* buf = new char[writer.getOutput()->getSize()]; - memcpy(buf, writer.getOutput()->getBuffer(), writer.getOutput()->getSize()); - - *value = ((fbson::FbsonDocument *)buf)->getValue(); - assert((*value)->isArray()); - assert(((fbson::ArrayVal*)*value)->numElem() == 1); - *value = ((fbson::ArrayVal*)*value)->get(0); - data->reset(buf); -} - -void InitString(std::unique_ptr* data, - fbson::FbsonValue** value, - const std::string& s) { - InitJSONDocument(data, value, std::bind( - [](fbson::FbsonWriter& writer, const std::string& str) -> uint32_t { - bool res __attribute__((__unused__)) = writer.writeStartString(); - assert(res); - auto bytesWritten = writer.writeString(str.c_str(), - static_cast(str.length())); - res = writer.writeEndString(); - assert(res); - // If the string is empty, then bytesWritten == 0, and assert in - // InitJsonDocument will fail. 
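// (editor's note) writeString("") reports zero payload bytes, so the
// str.empty() term below bumps the returned count to 1 for the empty
// string, keeping InitJSONDocument's bytesWritten != 0 assertion satisfied.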
- return bytesWritten + static_cast(str.empty()); - }, - _1, s)); -} - -bool IsNumeric(fbson::FbsonValue* value) { - return value->isInt8() || value->isInt16() || - value->isInt32() || value->isInt64(); -} - -int64_t GetInt64ValFromFbsonNumericType(fbson::FbsonValue* value) { - switch (value->type()) { - case fbson::FbsonType::T_Int8: - return reinterpret_cast(value)->val(); - case fbson::FbsonType::T_Int16: - return reinterpret_cast(value)->val(); - case fbson::FbsonType::T_Int32: - return reinterpret_cast(value)->val(); - case fbson::FbsonType::T_Int64: - return reinterpret_cast(value)->val(); - default: - assert(false); - } - return 0; -} - -bool IsComparable(fbson::FbsonValue* left, fbson::FbsonValue* right) { - if (left->type() == right->type()) { - return true; - } - if (IsNumeric(left) && IsNumeric(right)) { - return true; - } - return false; -} - -void CreateArray(std::unique_ptr* data, fbson::FbsonValue** value) { - fbson::FbsonWriter writer; - bool res __attribute__((__unused__)) = writer.writeStartArray(); - assert(res); - res = writer.writeEndArray(); - assert(res); - data->reset(new char[writer.getOutput()->getSize()]); - memcpy(data->get(), - writer.getOutput()->getBuffer(), - writer.getOutput()->getSize()); - *value = reinterpret_cast(data->get())->getValue(); -} - -void CreateObject(std::unique_ptr* data, fbson::FbsonValue** value) { - fbson::FbsonWriter writer; - bool res __attribute__((__unused__)) = writer.writeStartObject(); - assert(res); - res = writer.writeEndObject(); - assert(res); - data->reset(new char[writer.getOutput()->getSize()]); - memcpy(data->get(), - writer.getOutput()->getBuffer(), - writer.getOutput()->getSize()); - *value = reinterpret_cast(data->get())->getValue(); -} - -} // namespace - -namespace rocksdb { - - -// TODO(stash): find smth easier -JSONDocument::JSONDocument() { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeNull, _1)); -} - -JSONDocument::JSONDocument(bool b) { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeBool, _1, b)); -} - -JSONDocument::JSONDocument(double d) { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeDouble, _1, d)); -} - -JSONDocument::JSONDocument(int8_t i) { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeInt8, _1, i)); -} - -JSONDocument::JSONDocument(int16_t i) { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeInt16, _1, i)); -} - -JSONDocument::JSONDocument(int32_t i) { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeInt32, _1, i)); -} - -JSONDocument::JSONDocument(int64_t i) { - InitJSONDocument(&data_, - &value_, - std::bind(&fbson::FbsonWriter::writeInt64, _1, i)); -} - -JSONDocument::JSONDocument(const std::string& s) { - InitString(&data_, &value_, s); -} - -JSONDocument::JSONDocument(const char* s) : JSONDocument(std::string(s)) { -} - -void JSONDocument::InitFromValue(const fbson::FbsonValue* val) { - data_.reset(new char[val->numPackedBytes()]); - memcpy(data_.get(), val, val->numPackedBytes()); - value_ = reinterpret_cast(data_.get()); -} - -// Private constructor -JSONDocument::JSONDocument(fbson::FbsonValue* val, bool makeCopy) { - if (makeCopy) { - InitFromValue(val); - } else { - value_ = val; - } -} - -JSONDocument::JSONDocument(Type _type) { - // TODO(icanadi) make all of this better by using templates - switch (_type) { - case kNull: - InitJSONDocument(&data_, &value_, - std::bind(&fbson::FbsonWriter::writeNull, _1)); - break; - case 
kObject: - CreateObject(&data_, &value_); - break; - case kBool: - InitJSONDocument(&data_, &value_, - std::bind(&fbson::FbsonWriter::writeBool, _1, false)); - break; - case kDouble: - InitJSONDocument(&data_, &value_, - std::bind(&fbson::FbsonWriter::writeDouble, _1, 0.)); - break; - case kArray: - CreateArray(&data_, &value_); - break; - case kInt64: - InitJSONDocument(&data_, &value_, - std::bind(&fbson::FbsonWriter::writeInt64, _1, 0)); - break; - case kString: - InitString(&data_, &value_, ""); - break; - default: - assert(false); - } -} - -JSONDocument::JSONDocument(const JSONDocument& jsonDocument) { - if (jsonDocument.IsOwner()) { - InitFromValue(jsonDocument.value_); - } else { - value_ = jsonDocument.value_; - } -} - -JSONDocument::JSONDocument(JSONDocument&& jsonDocument) { - value_ = jsonDocument.value_; - data_.swap(jsonDocument.data_); -} - -JSONDocument& JSONDocument::operator=(JSONDocument jsonDocument) { - value_ = jsonDocument.value_; - data_.swap(jsonDocument.data_); - return *this; -} - -JSONDocument::Type JSONDocument::type() const { - switch (value_->type()) { - case fbson::FbsonType::T_Null: - return JSONDocument::kNull; - - case fbson::FbsonType::T_True: - case fbson::FbsonType::T_False: - return JSONDocument::kBool; - - case fbson::FbsonType::T_Int8: - case fbson::FbsonType::T_Int16: - case fbson::FbsonType::T_Int32: - case fbson::FbsonType::T_Int64: - return JSONDocument::kInt64; - - case fbson::FbsonType::T_Double: - return JSONDocument::kDouble; - - case fbson::FbsonType::T_String: - return JSONDocument::kString; - - case fbson::FbsonType::T_Object: - return JSONDocument::kObject; - - case fbson::FbsonType::T_Array: - return JSONDocument::kArray; - - case fbson::FbsonType::T_Binary: - default: - assert(false); - } - return JSONDocument::kNull; -} - -bool JSONDocument::Contains(const std::string& key) const { - assert(IsObject()); - auto objectVal = reinterpret_cast(value_); - return objectVal->find(key.c_str()) != nullptr; -} - -JSONDocument JSONDocument::operator[](const std::string& key) const { - assert(IsObject()); - auto objectVal = reinterpret_cast(value_); - auto foundValue = objectVal->find(key.c_str()); - assert(foundValue != nullptr); - // No need to save paths in const objects - JSONDocument ans(foundValue, false); - return ans; -} - -size_t JSONDocument::Count() const { - assert(IsObject() || IsArray()); - if (IsObject()) { - // TODO(stash): add to fbson? 
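// (editor's note) fbson's ObjectVal does not record its element count, so
// ObjectNumElem() above walks every key/value pair; Count() is therefore
// O(n) for objects, while the array branch below answers numElem() in O(1).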
-    const fbson::ObjectVal& objectVal =
-        *reinterpret_cast<const fbson::ObjectVal*>(value_);
-    return ObjectNumElem(objectVal);
-  } else if (IsArray()) {
-    auto arrayVal = reinterpret_cast<fbson::ArrayVal*>(value_);
-    return arrayVal->numElem();
-  }
-  assert(false);
-  return 0;
-}
-
-JSONDocument JSONDocument::operator[](size_t i) const {
-  assert(IsArray());
-  auto arrayVal = reinterpret_cast<fbson::ArrayVal*>(value_);
-  auto foundValue = arrayVal->get(static_cast<int>(i));
-  JSONDocument ans(foundValue, false);
-  return ans;
-}
-
-bool JSONDocument::IsNull() const {
-  return value_->isNull();
-}
-
-bool JSONDocument::IsArray() const {
-  return value_->isArray();
-}
-
-bool JSONDocument::IsBool() const {
-  return value_->isTrue() || value_->isFalse();
-}
-
-bool JSONDocument::IsDouble() const {
-  return value_->isDouble();
-}
-
-bool JSONDocument::IsInt64() const {
-  return value_->isInt8() || value_->isInt16() ||
-         value_->isInt32() || value_->isInt64();
-}
-
-bool JSONDocument::IsObject() const {
-  return value_->isObject();
-}
-
-bool JSONDocument::IsString() const {
-  return value_->isString();
-}
-
-bool JSONDocument::GetBool() const {
-  assert(IsBool());
-  return value_->isTrue();
-}
-
-double JSONDocument::GetDouble() const {
-  assert(IsDouble());
-  return ((fbson::DoubleVal*)value_)->val();
-}
-
-int64_t JSONDocument::GetInt64() const {
-  assert(IsInt64());
-  return GetInt64ValFromFbsonNumericType(value_);
-}
-
-std::string JSONDocument::GetString() const {
-  assert(IsString());
-  fbson::StringVal* stringVal = (fbson::StringVal*)value_;
-  return std::string(stringVal->getBlob(), stringVal->getBlobLen());
-}
-
-namespace {
-
-// FbsonValue can be int8, int16, int32, int64
-bool CompareNumeric(fbson::FbsonValue* left, fbson::FbsonValue* right) {
-  assert(IsNumeric(left) && IsNumeric(right));
-  return GetInt64ValFromFbsonNumericType(left) ==
-         GetInt64ValFromFbsonNumericType(right);
-}
-
-bool CompareSimpleTypes(fbson::FbsonValue* left, fbson::FbsonValue* right) {
-  if (IsNumeric(left)) {
-    return CompareNumeric(left, right);
-  }
-  if (left->numPackedBytes() != right->numPackedBytes()) {
-    return false;
-  }
-  return memcmp(left, right, left->numPackedBytes()) == 0;
-}
-
-bool CompareFbsonValue(fbson::FbsonValue* left, fbson::FbsonValue* right) {
-  if (!IsComparable(left, right)) {
-    return false;
-  }
-
-  switch (left->type()) {
-    case fbson::FbsonType::T_True:
-    case fbson::FbsonType::T_False:
-    case fbson::FbsonType::T_Null:
-      return true;
-    case fbson::FbsonType::T_Int8:
-    case fbson::FbsonType::T_Int16:
-    case fbson::FbsonType::T_Int32:
-    case fbson::FbsonType::T_Int64:
-      return CompareNumeric(left, right);
-    case fbson::FbsonType::T_String:
-    case fbson::FbsonType::T_Double:
-      return CompareSimpleTypes(left, right);
-    case fbson::FbsonType::T_Object:
-    {
-      auto leftObject = reinterpret_cast<fbson::ObjectVal*>(left);
-      auto rightObject = reinterpret_cast<fbson::ObjectVal*>(right);
-      if (ObjectNumElem(*leftObject) != ObjectNumElem(*rightObject)) {
-        return false;
-      }
-      for (auto && keyValue : *leftObject) {
-        std::string str(keyValue.getKeyStr(), keyValue.klen());
-        if (rightObject->find(str.c_str()) == nullptr) {
-          return false;
-        }
-        if (!CompareFbsonValue(keyValue.value(),
-                               rightObject->find(str.c_str()))) {
-          return false;
-        }
-      }
-      return true;
-    }
-    case fbson::FbsonType::T_Array:
-    {
-      auto leftArr = reinterpret_cast<fbson::ArrayVal*>(left);
-      auto rightArr = reinterpret_cast<fbson::ArrayVal*>(right);
-      if (leftArr->numElem() != rightArr->numElem()) {
-        return false;
-      }
-      for (int i = 0; i < static_cast<int>(leftArr->numElem()); ++i) {
-        if (!CompareFbsonValue(leftArr->get(i), rightArr->get(i))) {
-          return false;
-        }
-      }
-      return true;
-    }
-    default:
-      assert(false);
-  }
-  return false;
-}
-
-}  // namespace
-
-bool JSONDocument::operator==(const JSONDocument& rhs) const {
-  return CompareFbsonValue(value_, rhs.value_);
-}
-
-bool JSONDocument::operator!=(const JSONDocument& rhs) const {
-  return !(*this == rhs);
-}
-
-JSONDocument JSONDocument::Copy() const {
-  return JSONDocument(value_, true);
-}
-
-bool JSONDocument::IsOwner() const {
-  return data_.get() != nullptr;
-}
-
-std::string JSONDocument::DebugString() const {
-  fbson::FbsonToJson fbsonToJson;
-  return fbsonToJson.json(value_);
-}
-
-JSONDocument::ItemsIteratorGenerator JSONDocument::Items() const {
-  assert(IsObject());
-  return ItemsIteratorGenerator(*(reinterpret_cast<fbson::ObjectVal*>(value_)));
-}
-
-// TODO(icanadi) (perf) allocate objects with arena
-JSONDocument* JSONDocument::ParseJSON(const char* json) {
-  fbson::FbsonJsonParser parser;
-  if (!parser.parse(json)) {
-    return nullptr;
-  }
-
-  auto fbsonVal = fbson::FbsonDocument::createValue(
-      parser.getWriter().getOutput()->getBuffer(),
-      static_cast<uint32_t>(parser.getWriter().getOutput()->getSize()));
-
-  if (fbsonVal == nullptr) {
-    return nullptr;
-  }
-
-  return new JSONDocument(fbsonVal, true);
-}
-
-void JSONDocument::Serialize(std::string* dst) const {
-  // first byte is reserved for header
-  // currently, header is only version number. that will help us provide
-  // backwards compatibility. we might also store more information here if
-  // necessary
-  dst->push_back(kSerializationFormatVersion);
-  dst->push_back(FBSON_VER);
-  dst->append(reinterpret_cast<const char*>(value_), value_->numPackedBytes());
-}
-
-const char JSONDocument::kSerializationFormatVersion = 2;
-
-JSONDocument* JSONDocument::Deserialize(const Slice& src) {
-  Slice input(src);
-  if (src.size() == 0) {
-    return nullptr;
-  }
-  char header = input[0];
-  if (header == 1) {
-    assert(false);
-  }
-  input.remove_prefix(1);
-  auto value = fbson::FbsonDocument::createValue(input.data(),
-                                                 static_cast<uint32_t>(input.size()));
-  if (value == nullptr) {
-    return nullptr;
-  }
-
-  return new JSONDocument(value, true);
-}
-
-class JSONDocument::const_item_iterator::Impl {
- public:
-  typedef fbson::ObjectVal::const_iterator It;
-
-  explicit Impl(It it) : it_(it) {}
-
-  const char* getKeyStr() const {
-    return it_->getKeyStr();
-  }
-
-  uint8_t klen() const {
-    return it_->klen();
-  }
-
-  It& operator++() {
-    return ++it_;
-  }
-
-  bool operator!=(const Impl& other) {
-    return it_ != other.it_;
-  }
-
-  fbson::FbsonValue* value() const {
-    return it_->value();
-  }
-
- private:
-  It it_;
-};
-
-JSONDocument::const_item_iterator::const_item_iterator(Impl* impl)
-: it_(impl) {}
-
-JSONDocument::const_item_iterator::const_item_iterator(const_item_iterator&& a)
-: it_(std::move(a.it_)) {}
-
-JSONDocument::const_item_iterator&
-  JSONDocument::const_item_iterator::operator++() {
-  ++(*it_);
-  return *this;
-}
-
-bool JSONDocument::const_item_iterator::operator!=(
-    const const_item_iterator& other) {
-  return *it_ != *(other.it_);
-}
-
-JSONDocument::const_item_iterator::~const_item_iterator() {
-}
-
-JSONDocument::const_item_iterator::value_type
-  JSONDocument::const_item_iterator::operator*() {
-  return JSONDocument::const_item_iterator::value_type(std::string(it_->getKeyStr(), it_->klen()),
-                                                       JSONDocument(it_->value(), false));
-}
-
-JSONDocument::ItemsIteratorGenerator::ItemsIteratorGenerator(
-    const fbson::ObjectVal& object)
-    : object_(object) {}
-
-JSONDocument::const_item_iterator
-  JSONDocument::ItemsIteratorGenerator::begin() const {
-  return const_item_iterator(new
const_item_iterator::Impl(object_.begin())); -} - -JSONDocument::const_item_iterator - JSONDocument::ItemsIteratorGenerator::end() const { - return const_item_iterator(new const_item_iterator::Impl(object_.end())); -} - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/document/json_document_builder.cc b/ceph/src/rocksdb/utilities/document/json_document_builder.cc deleted file mode 100644 index 7aa95e465..000000000 --- a/ceph/src/rocksdb/utilities/document/json_document_builder.cc +++ /dev/null @@ -1,120 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#ifndef ROCKSDB_LITE -#include -#include -#include -#include "rocksdb/utilities/json_document.h" -#include "third-party/fbson/FbsonWriter.h" - -namespace rocksdb { -JSONDocumentBuilder::JSONDocumentBuilder() -: writer_(new fbson::FbsonWriter()) { -} - -JSONDocumentBuilder::JSONDocumentBuilder(fbson::FbsonOutStream* out) -: writer_(new fbson::FbsonWriter(*out)) { -} - -void JSONDocumentBuilder::Reset() { - writer_->reset(); -} - -bool JSONDocumentBuilder::WriteStartArray() { - return writer_->writeStartArray(); -} - -bool JSONDocumentBuilder::WriteEndArray() { - return writer_->writeEndArray(); -} - -bool JSONDocumentBuilder::WriteStartObject() { - return writer_->writeStartObject(); -} - -bool JSONDocumentBuilder::WriteEndObject() { - return writer_->writeEndObject(); -} - -bool JSONDocumentBuilder::WriteKeyValue(const std::string& key, - const JSONDocument& value) { - assert(key.size() <= std::numeric_limits::max()); - size_t bytesWritten = writer_->writeKey(key.c_str(), - static_cast(key.size())); - if (bytesWritten == 0) { - return false; - } - return WriteJSONDocument(value); -} - -bool JSONDocumentBuilder::WriteJSONDocument(const JSONDocument& value) { - switch (value.type()) { - case JSONDocument::kNull: - return writer_->writeNull() != 0; - case JSONDocument::kInt64: - return writer_->writeInt64(value.GetInt64()); - case JSONDocument::kDouble: - return writer_->writeDouble(value.GetDouble()); - case JSONDocument::kBool: - return writer_->writeBool(value.GetBool()); - case JSONDocument::kString: - { - bool res = writer_->writeStartString(); - if (!res) { - return false; - } - const std::string& str = value.GetString(); - res = writer_->writeString(str.c_str(), - static_cast(str.size())); - if (!res) { - return false; - } - return writer_->writeEndString(); - } - case JSONDocument::kArray: - { - bool res = WriteStartArray(); - if (!res) { - return false; - } - for (size_t i = 0; i < value.Count(); ++i) { - res = WriteJSONDocument(value[i]); - if (!res) { - return false; - } - } - return WriteEndArray(); - } - case JSONDocument::kObject: - { - bool res = WriteStartObject(); - if (!res) { - return false; - } - for (auto keyValue : value.Items()) { - WriteKeyValue(keyValue.first, keyValue.second); - } - return WriteEndObject(); - } - default: - assert(false); - } - return false; -} - -JSONDocument JSONDocumentBuilder::GetJSONDocument() { - fbson::FbsonValue* value = - fbson::FbsonDocument::createValue(writer_->getOutput()->getBuffer(), - static_cast(writer_->getOutput()->getSize())); - return JSONDocument(value, true); -} - -JSONDocumentBuilder::~JSONDocumentBuilder() { -} - -} // namespace rocksdb - -#endif // ROCKSDB_LITE diff --git 
a/ceph/src/rocksdb/utilities/document/json_document_test.cc b/ceph/src/rocksdb/utilities/document/json_document_test.cc deleted file mode 100644 index 977905b91..000000000 --- a/ceph/src/rocksdb/utilities/document/json_document_test.cc +++ /dev/null @@ -1,341 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#ifndef ROCKSDB_LITE - -#include -#include -#include - -#include "rocksdb/utilities/json_document.h" - -#include "util/testutil.h" -#include "util/testharness.h" - -namespace rocksdb { -namespace { -void AssertField(const JSONDocument& json, const std::string& field) { - ASSERT_TRUE(json.Contains(field)); - ASSERT_TRUE(json[field].IsNull()); -} - -void AssertField(const JSONDocument& json, const std::string& field, - const std::string& expected) { - ASSERT_TRUE(json.Contains(field)); - ASSERT_TRUE(json[field].IsString()); - ASSERT_EQ(expected, json[field].GetString()); -} - -void AssertField(const JSONDocument& json, const std::string& field, - int64_t expected) { - ASSERT_TRUE(json.Contains(field)); - ASSERT_TRUE(json[field].IsInt64()); - ASSERT_EQ(expected, json[field].GetInt64()); -} - -void AssertField(const JSONDocument& json, const std::string& field, - bool expected) { - ASSERT_TRUE(json.Contains(field)); - ASSERT_TRUE(json[field].IsBool()); - ASSERT_EQ(expected, json[field].GetBool()); -} - -void AssertField(const JSONDocument& json, const std::string& field, - double expected) { - ASSERT_TRUE(json.Contains(field)); - ASSERT_TRUE(json[field].IsDouble()); - ASSERT_DOUBLE_EQ(expected, json[field].GetDouble()); -} -} // namespace - -class JSONDocumentTest : public testing::Test { - public: - JSONDocumentTest() - : rnd_(101) - {} - - void AssertSampleJSON(const JSONDocument& json) { - AssertField(json, "title", std::string("json")); - AssertField(json, "type", std::string("object")); - // properties - ASSERT_TRUE(json.Contains("properties")); - ASSERT_TRUE(json["properties"].Contains("flags")); - ASSERT_TRUE(json["properties"]["flags"].IsArray()); - ASSERT_EQ(3u, json["properties"]["flags"].Count()); - ASSERT_TRUE(json["properties"]["flags"][0].IsInt64()); - ASSERT_EQ(10, json["properties"]["flags"][0].GetInt64()); - ASSERT_TRUE(json["properties"]["flags"][1].IsString()); - ASSERT_EQ("parse", json["properties"]["flags"][1].GetString()); - ASSERT_TRUE(json["properties"]["flags"][2].IsObject()); - AssertField(json["properties"]["flags"][2], "tag", std::string("no")); - AssertField(json["properties"]["flags"][2], std::string("status")); - AssertField(json["properties"], "age", 110.5e-4); - AssertField(json["properties"], "depth", static_cast(-10)); - // test iteration - std::set expected({"flags", "age", "depth"}); - for (auto item : json["properties"].Items()) { - auto iter = expected.find(item.first); - ASSERT_TRUE(iter != expected.end()); - expected.erase(iter); - } - ASSERT_EQ(0U, expected.size()); - ASSERT_TRUE(json.Contains("latlong")); - ASSERT_TRUE(json["latlong"].IsArray()); - ASSERT_EQ(2u, json["latlong"].Count()); - ASSERT_TRUE(json["latlong"][0].IsDouble()); - ASSERT_EQ(53.25, json["latlong"][0].GetDouble()); - ASSERT_TRUE(json["latlong"][1].IsDouble()); - ASSERT_EQ(43.75, json["latlong"][1].GetDouble()); - AssertField(json, "enabled", true); - } - - const std::string kSampleJSON = - "{ \"title\" : \"json\", \"type\" : \"object\", \"properties\" : { " - 
"\"flags\": [10, \"parse\", {\"tag\": \"no\", \"status\": null}], " - "\"age\": 110.5e-4, \"depth\": -10 }, \"latlong\": [53.25, 43.75], " - "\"enabled\": true }"; - - const std::string kSampleJSONDifferent = - "{ \"title\" : \"json\", \"type\" : \"object\", \"properties\" : { " - "\"flags\": [10, \"parse\", {\"tag\": \"no\", \"status\": 2}], " - "\"age\": 110.5e-4, \"depth\": -10 }, \"latlong\": [53.25, 43.75], " - "\"enabled\": true }"; - - Random rnd_; -}; - -TEST_F(JSONDocumentTest, MakeNullTest) { - JSONDocument x; - ASSERT_TRUE(x.IsNull()); - ASSERT_TRUE(x.IsOwner()); - ASSERT_TRUE(!x.IsBool()); -} - -TEST_F(JSONDocumentTest, MakeBoolTest) { - { - JSONDocument x(true); - ASSERT_TRUE(x.IsOwner()); - ASSERT_TRUE(x.IsBool()); - ASSERT_TRUE(!x.IsInt64()); - ASSERT_EQ(x.GetBool(), true); - } - - { - JSONDocument x(false); - ASSERT_TRUE(x.IsOwner()); - ASSERT_TRUE(x.IsBool()); - ASSERT_TRUE(!x.IsInt64()); - ASSERT_EQ(x.GetBool(), false); - } -} - -TEST_F(JSONDocumentTest, MakeInt64Test) { - JSONDocument x(static_cast(16)); - ASSERT_TRUE(x.IsInt64()); - ASSERT_TRUE(x.IsInt64()); - ASSERT_TRUE(!x.IsBool()); - ASSERT_TRUE(x.IsOwner()); - ASSERT_EQ(x.GetInt64(), 16); -} - -TEST_F(JSONDocumentTest, MakeStringTest) { - JSONDocument x("string"); - ASSERT_TRUE(x.IsOwner()); - ASSERT_TRUE(x.IsString()); - ASSERT_TRUE(!x.IsBool()); - ASSERT_EQ(x.GetString(), "string"); -} - -TEST_F(JSONDocumentTest, MakeDoubleTest) { - JSONDocument x(5.6); - ASSERT_TRUE(x.IsOwner()); - ASSERT_TRUE(x.IsDouble()); - ASSERT_TRUE(!x.IsBool()); - ASSERT_EQ(x.GetDouble(), 5.6); -} - -TEST_F(JSONDocumentTest, MakeByTypeTest) { - { - JSONDocument x(JSONDocument::kNull); - ASSERT_TRUE(x.IsNull()); - } - { - JSONDocument x(JSONDocument::kBool); - ASSERT_TRUE(x.IsBool()); - } - { - JSONDocument x(JSONDocument::kString); - ASSERT_TRUE(x.IsString()); - } - { - JSONDocument x(JSONDocument::kInt64); - ASSERT_TRUE(x.IsInt64()); - } - { - JSONDocument x(JSONDocument::kDouble); - ASSERT_TRUE(x.IsDouble()); - } - { - JSONDocument x(JSONDocument::kObject); - ASSERT_TRUE(x.IsObject()); - } - { - JSONDocument x(JSONDocument::kArray); - ASSERT_TRUE(x.IsArray()); - } -} - -TEST_F(JSONDocumentTest, Parsing) { - std::unique_ptr parsed_json( - JSONDocument::ParseJSON(kSampleJSON.c_str())); - ASSERT_TRUE(parsed_json->IsOwner()); - ASSERT_TRUE(parsed_json != nullptr); - AssertSampleJSON(*parsed_json); - - // test deep copying - JSONDocument copied_json_document(*parsed_json); - AssertSampleJSON(copied_json_document); - ASSERT_TRUE(copied_json_document == *parsed_json); - - std::unique_ptr parsed_different_sample( - JSONDocument::ParseJSON(kSampleJSONDifferent.c_str())); - ASSERT_TRUE(parsed_different_sample != nullptr); - ASSERT_TRUE(!(*parsed_different_sample == copied_json_document)); - - // parse error - const std::string kFaultyJSON = - kSampleJSON.substr(0, kSampleJSON.size() - 10); - ASSERT_TRUE(JSONDocument::ParseJSON(kFaultyJSON.c_str()) == nullptr); -} - -TEST_F(JSONDocumentTest, Serialization) { - std::unique_ptr parsed_json( - JSONDocument::ParseJSON(kSampleJSON.c_str())); - ASSERT_TRUE(parsed_json != nullptr); - ASSERT_TRUE(parsed_json->IsOwner()); - std::string serialized; - parsed_json->Serialize(&serialized); - - std::unique_ptr deserialized_json( - JSONDocument::Deserialize(Slice(serialized))); - ASSERT_TRUE(deserialized_json != nullptr); - AssertSampleJSON(*deserialized_json); - - // deserialization failure - ASSERT_TRUE(JSONDocument::Deserialize( - Slice(serialized.data(), serialized.size() - 10)) == nullptr); -} - 
-TEST_F(JSONDocumentTest, OperatorEqualsTest) { - // kNull - ASSERT_TRUE(JSONDocument() == JSONDocument()); - - // kBool - ASSERT_TRUE(JSONDocument(false) != JSONDocument()); - ASSERT_TRUE(JSONDocument(false) == JSONDocument(false)); - ASSERT_TRUE(JSONDocument(true) == JSONDocument(true)); - ASSERT_TRUE(JSONDocument(false) != JSONDocument(true)); - - // kString - ASSERT_TRUE(JSONDocument("test") != JSONDocument()); - ASSERT_TRUE(JSONDocument("test") == JSONDocument("test")); - - // kInt64 - ASSERT_TRUE(JSONDocument(static_cast(15)) != JSONDocument()); - ASSERT_TRUE(JSONDocument(static_cast(15)) != - JSONDocument(static_cast(14))); - ASSERT_TRUE(JSONDocument(static_cast(15)) == - JSONDocument(static_cast(15))); - - unique_ptr arrayWithInt8Doc(JSONDocument::ParseJSON("[8]")); - ASSERT_TRUE(arrayWithInt8Doc != nullptr); - ASSERT_TRUE(arrayWithInt8Doc->IsArray()); - ASSERT_TRUE((*arrayWithInt8Doc)[0].IsInt64()); - ASSERT_TRUE((*arrayWithInt8Doc)[0] == JSONDocument(static_cast(8))); - - unique_ptr arrayWithInt16Doc(JSONDocument::ParseJSON("[512]")); - ASSERT_TRUE(arrayWithInt16Doc != nullptr); - ASSERT_TRUE(arrayWithInt16Doc->IsArray()); - ASSERT_TRUE((*arrayWithInt16Doc)[0].IsInt64()); - ASSERT_TRUE((*arrayWithInt16Doc)[0] == - JSONDocument(static_cast(512))); - - unique_ptr arrayWithInt32Doc( - JSONDocument::ParseJSON("[1000000]")); - ASSERT_TRUE(arrayWithInt32Doc != nullptr); - ASSERT_TRUE(arrayWithInt32Doc->IsArray()); - ASSERT_TRUE((*arrayWithInt32Doc)[0].IsInt64()); - ASSERT_TRUE((*arrayWithInt32Doc)[0] == - JSONDocument(static_cast(1000000))); - - // kDouble - ASSERT_TRUE(JSONDocument(15.) != JSONDocument()); - ASSERT_TRUE(JSONDocument(15.) != JSONDocument(14.)); - ASSERT_TRUE(JSONDocument(15.) == JSONDocument(15.)); -} - -TEST_F(JSONDocumentTest, JSONDocumentBuilderTest) { - unique_ptr parsedArray( - JSONDocument::ParseJSON("[1, [123, \"a\", \"b\"], {\"b\":\"c\"}]")); - ASSERT_TRUE(parsedArray != nullptr); - - JSONDocumentBuilder builder; - ASSERT_TRUE(builder.WriteStartArray()); - ASSERT_TRUE(builder.WriteJSONDocument(1)); - - ASSERT_TRUE(builder.WriteStartArray()); - ASSERT_TRUE(builder.WriteJSONDocument(123)); - ASSERT_TRUE(builder.WriteJSONDocument("a")); - ASSERT_TRUE(builder.WriteJSONDocument("b")); - ASSERT_TRUE(builder.WriteEndArray()); - - ASSERT_TRUE(builder.WriteStartObject()); - ASSERT_TRUE(builder.WriteKeyValue("b", "c")); - ASSERT_TRUE(builder.WriteEndObject()); - - ASSERT_TRUE(builder.WriteEndArray()); - - ASSERT_TRUE(*parsedArray == builder.GetJSONDocument()); -} - -TEST_F(JSONDocumentTest, OwnershipTest) { - std::unique_ptr parsed( - JSONDocument::ParseJSON(kSampleJSON.c_str())); - ASSERT_TRUE(parsed != nullptr); - ASSERT_TRUE(parsed->IsOwner()); - - // Copy constructor from owner -> owner - JSONDocument copy_constructor(*parsed); - ASSERT_TRUE(copy_constructor.IsOwner()); - - // Copy constructor from non-owner -> non-owner - JSONDocument non_owner((*parsed)["properties"]); - ASSERT_TRUE(!non_owner.IsOwner()); - - // Move constructor from owner -> owner - JSONDocument moved_from_owner(std::move(copy_constructor)); - ASSERT_TRUE(moved_from_owner.IsOwner()); - - // Move constructor from non-owner -> non-owner - JSONDocument moved_from_non_owner(std::move(non_owner)); - ASSERT_TRUE(!moved_from_non_owner.IsOwner()); -} - -} // namespace rocksdb - -int main(int argc, char** argv) { - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} - -#else -#include - -int main(int /*argc*/, char** /*argv*/) { - fprintf(stderr, "SKIPPED as JSONDocument is not supported 
in ROCKSDB_LITE\n"); - return 0; -} - -#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/env_librados_test.cc b/ceph/src/rocksdb/utilities/env_librados_test.cc index 7d9b252ea..fb10224e7 100644 --- a/ceph/src/rocksdb/utilities/env_librados_test.cc +++ b/ceph/src/rocksdb/utilities/env_librados_test.cc @@ -108,7 +108,7 @@ public: TEST_F(EnvLibradosTest, Basics) { uint64_t file_size; - unique_ptr writable_file; + std::unique_ptr writable_file; std::vector children; ASSERT_OK(env_->CreateDir("/dir")); @@ -150,8 +150,8 @@ TEST_F(EnvLibradosTest, Basics) { ASSERT_EQ(3U, file_size); // Check that opening non-existent file fails. - unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; ASSERT_TRUE( !env_->NewSequentialFile("/dir/non_existent", &seq_file, soptions_).ok()); ASSERT_TRUE(!seq_file); @@ -169,9 +169,9 @@ TEST_F(EnvLibradosTest, Basics) { } TEST_F(EnvLibradosTest, ReadWrite) { - unique_ptr writable_file; - unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr writable_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; Slice result; char scratch[100]; @@ -210,7 +210,7 @@ TEST_F(EnvLibradosTest, ReadWrite) { TEST_F(EnvLibradosTest, Locks) { FileLock* lock = nullptr; - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->CreateDir("/dir")); @@ -229,7 +229,7 @@ TEST_F(EnvLibradosTest, Misc) { ASSERT_OK(env_->GetTestDirectory(&test_dir)); ASSERT_TRUE(!test_dir.empty()); - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_TRUE(!env_->NewWritableFile("/a/b", &writable_file, soptions_).ok()); ASSERT_OK(env_->NewWritableFile("/a", &writable_file, soptions_)); @@ -249,14 +249,14 @@ TEST_F(EnvLibradosTest, LargeWrite) { write_data.append(1, 'h'); } - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->CreateDir("/dir")); ASSERT_OK(env_->NewWritableFile("/dir/g", &writable_file, soptions_)); ASSERT_OK(writable_file->Append("foo")); ASSERT_OK(writable_file->Append(write_data)); writable_file.reset(); - unique_ptr seq_file; + std::unique_ptr seq_file; Slice result; ASSERT_OK(env_->NewSequentialFile("/dir/g", &seq_file, soptions_)); ASSERT_OK(seq_file->Read(3, &result, scratch)); // Read "foo". @@ -282,7 +282,7 @@ TEST_F(EnvLibradosTest, FrequentlySmallWrite) { write_data.append(1, 'h'); } - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->CreateDir("/dir")); ASSERT_OK(env_->NewWritableFile("/dir/g", &writable_file, soptions_)); ASSERT_OK(writable_file->Append("foo")); @@ -292,7 +292,7 @@ TEST_F(EnvLibradosTest, FrequentlySmallWrite) { } writable_file.reset(); - unique_ptr seq_file; + std::unique_ptr seq_file; Slice result; ASSERT_OK(env_->NewSequentialFile("/dir/g", &seq_file, soptions_)); ASSERT_OK(seq_file->Read(3, &result, scratch)); // Read "foo". @@ -317,7 +317,7 @@ TEST_F(EnvLibradosTest, Truncate) { write_data.append(1, 'h'); } - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->CreateDir("/dir")); ASSERT_OK(env_->NewWritableFile("/dir/g", &writable_file, soptions_)); ASSERT_OK(writable_file->Append(write_data)); @@ -801,7 +801,7 @@ public: TEST_F(EnvLibradosMutipoolTest, Basics) { uint64_t file_size; - unique_ptr writable_file; + std::unique_ptr writable_file; std::vector children; std::vector v = {"/tmp/dir1", "/tmp/dir2", "/tmp/dir3", "/tmp/dir4", "dir"}; @@ -850,8 +850,8 @@ TEST_F(EnvLibradosMutipoolTest, Basics) { ASSERT_EQ(3U, file_size); // Check that opening non-existent file fails. 
- unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; ASSERT_TRUE( !env_->NewSequentialFile(dir_non_existent.c_str(), &seq_file, soptions_).ok()); ASSERT_TRUE(!seq_file); diff --git a/ceph/src/rocksdb/utilities/env_mirror.cc b/ceph/src/rocksdb/utilities/env_mirror.cc index d14de97d0..327d8e162 100644 --- a/ceph/src/rocksdb/utilities/env_mirror.cc +++ b/ceph/src/rocksdb/utilities/env_mirror.cc @@ -16,7 +16,7 @@ namespace rocksdb { // Env's. This is useful for debugging purposes. class SequentialFileMirror : public SequentialFile { public: - unique_ptr a_, b_; + std::unique_ptr a_, b_; std::string fname; explicit SequentialFileMirror(std::string f) : fname(f) {} @@ -60,7 +60,7 @@ class SequentialFileMirror : public SequentialFile { class RandomAccessFileMirror : public RandomAccessFile { public: - unique_ptr a_, b_; + std::unique_ptr a_, b_; std::string fname; explicit RandomAccessFileMirror(std::string f) : fname(f) {} @@ -95,7 +95,7 @@ class RandomAccessFileMirror : public RandomAccessFile { class WritableFileMirror : public WritableFile { public: - unique_ptr a_, b_; + std::unique_ptr a_, b_; std::string fname; explicit WritableFileMirror(std::string f) : fname(f) {} @@ -191,7 +191,7 @@ class WritableFileMirror : public WritableFile { }; Status EnvMirror::NewSequentialFile(const std::string& f, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& options) { if (f.find("/proc/") == 0) { return a_->NewSequentialFile(f, r, options); @@ -208,7 +208,7 @@ Status EnvMirror::NewSequentialFile(const std::string& f, } Status EnvMirror::NewRandomAccessFile(const std::string& f, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& options) { if (f.find("/proc/") == 0) { return a_->NewRandomAccessFile(f, r, options); @@ -225,7 +225,7 @@ Status EnvMirror::NewRandomAccessFile(const std::string& f, } Status EnvMirror::NewWritableFile(const std::string& f, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& options) { if (f.find("/proc/") == 0) return a_->NewWritableFile(f, r, options); WritableFileMirror* mf = new WritableFileMirror(f); @@ -241,7 +241,7 @@ Status EnvMirror::NewWritableFile(const std::string& f, Status EnvMirror::ReuseWritableFile(const std::string& fname, const std::string& old_fname, - unique_ptr* r, + std::unique_ptr* r, const EnvOptions& options) { if (fname.find("/proc/") == 0) return a_->ReuseWritableFile(fname, old_fname, r, options); diff --git a/ceph/src/rocksdb/utilities/env_mirror_test.cc b/ceph/src/rocksdb/utilities/env_mirror_test.cc index 2bf8ec858..812595ca1 100644 --- a/ceph/src/rocksdb/utilities/env_mirror_test.cc +++ b/ceph/src/rocksdb/utilities/env_mirror_test.cc @@ -32,7 +32,7 @@ class EnvMirrorTest : public testing::Test { TEST_F(EnvMirrorTest, Basics) { uint64_t file_size; - unique_ptr writable_file; + std::unique_ptr writable_file; std::vector children; ASSERT_OK(env_->CreateDir("/dir")); @@ -91,8 +91,8 @@ TEST_F(EnvMirrorTest, Basics) { ASSERT_EQ(3U, file_size); // Check that opening non-existent file fails. 
- unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; ASSERT_TRUE( !env_->NewSequentialFile("/dir/non_existent", &seq_file, soptions_).ok()); ASSERT_TRUE(!seq_file); @@ -110,9 +110,9 @@ TEST_F(EnvMirrorTest, Basics) { } TEST_F(EnvMirrorTest, ReadWrite) { - unique_ptr writable_file; - unique_ptr seq_file; - unique_ptr rand_file; + std::unique_ptr writable_file; + std::unique_ptr seq_file; + std::unique_ptr rand_file; Slice result; char scratch[100]; @@ -162,7 +162,7 @@ TEST_F(EnvMirrorTest, Misc) { ASSERT_OK(env_->GetTestDirectory(&test_dir)); ASSERT_TRUE(!test_dir.empty()); - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->NewWritableFile("/a/b", &writable_file, soptions_)); // These are no-ops, but we test they return success. @@ -181,13 +181,13 @@ TEST_F(EnvMirrorTest, LargeWrite) { write_data.append(1, static_cast(i)); } - unique_ptr writable_file; + std::unique_ptr writable_file; ASSERT_OK(env_->NewWritableFile("/dir/f", &writable_file, soptions_)); ASSERT_OK(writable_file->Append("foo")); ASSERT_OK(writable_file->Append(write_data)); writable_file.reset(); - unique_ptr seq_file; + std::unique_ptr seq_file; Slice result; ASSERT_OK(env_->NewSequentialFile("/dir/f", &seq_file, soptions_)); ASSERT_OK(seq_file->Read(3, &result, scratch)); // Read "foo". diff --git a/ceph/src/rocksdb/utilities/env_timed.cc b/ceph/src/rocksdb/utilities/env_timed.cc index 6afd45bf9..82fc6401c 100644 --- a/ceph/src/rocksdb/utilities/env_timed.cc +++ b/ceph/src/rocksdb/utilities/env_timed.cc @@ -17,121 +17,118 @@ class TimedEnv : public EnvWrapper { public: explicit TimedEnv(Env* base_env) : EnvWrapper(base_env) {} - virtual Status NewSequentialFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewSequentialFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { PERF_TIMER_GUARD(env_new_sequential_file_nanos); return EnvWrapper::NewSequentialFile(fname, result, options); } - virtual Status NewRandomAccessFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomAccessFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { PERF_TIMER_GUARD(env_new_random_access_file_nanos); return EnvWrapper::NewRandomAccessFile(fname, result, options); } - virtual Status NewWritableFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewWritableFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { PERF_TIMER_GUARD(env_new_writable_file_nanos); return EnvWrapper::NewWritableFile(fname, result, options); } - virtual Status ReuseWritableFile(const std::string& fname, - const std::string& old_fname, - unique_ptr* result, - const EnvOptions& options) override { + Status ReuseWritableFile(const std::string& fname, + const std::string& old_fname, + std::unique_ptr* result, + const EnvOptions& options) override { PERF_TIMER_GUARD(env_reuse_writable_file_nanos); return EnvWrapper::ReuseWritableFile(fname, old_fname, result, options); } - virtual Status NewRandomRWFile(const std::string& fname, - unique_ptr* result, - const EnvOptions& options) override { + Status NewRandomRWFile(const std::string& fname, + std::unique_ptr* result, + const EnvOptions& options) override { PERF_TIMER_GUARD(env_new_random_rw_file_nanos); return EnvWrapper::NewRandomRWFile(fname, result, options); } - 
virtual Status NewDirectory(const std::string& name,
-                              unique_ptr<Directory>* result) override {
+  Status NewDirectory(const std::string& name,
+                      std::unique_ptr<Directory>* result) override {
     PERF_TIMER_GUARD(env_new_directory_nanos);
     return EnvWrapper::NewDirectory(name, result);
   }

-  virtual Status FileExists(const std::string& fname) override {
+  Status FileExists(const std::string& fname) override {
     PERF_TIMER_GUARD(env_file_exists_nanos);
     return EnvWrapper::FileExists(fname);
   }

-  virtual Status GetChildren(const std::string& dir,
-                             std::vector<std::string>* result) override {
+  Status GetChildren(const std::string& dir,
+                     std::vector<std::string>* result) override {
     PERF_TIMER_GUARD(env_get_children_nanos);
     return EnvWrapper::GetChildren(dir, result);
   }

-  virtual Status GetChildrenFileAttributes(
+  Status GetChildrenFileAttributes(
       const std::string& dir, std::vector<FileAttributes>* result) override {
     PERF_TIMER_GUARD(env_get_children_file_attributes_nanos);
     return EnvWrapper::GetChildrenFileAttributes(dir, result);
   }

-  virtual Status DeleteFile(const std::string& fname) override {
+  Status DeleteFile(const std::string& fname) override {
     PERF_TIMER_GUARD(env_delete_file_nanos);
     return EnvWrapper::DeleteFile(fname);
   }

-  virtual Status CreateDir(const std::string& dirname) override {
+  Status CreateDir(const std::string& dirname) override {
     PERF_TIMER_GUARD(env_create_dir_nanos);
     return EnvWrapper::CreateDir(dirname);
   }

-  virtual Status CreateDirIfMissing(const std::string& dirname) override {
+  Status CreateDirIfMissing(const std::string& dirname) override {
     PERF_TIMER_GUARD(env_create_dir_if_missing_nanos);
     return EnvWrapper::CreateDirIfMissing(dirname);
   }

-  virtual Status DeleteDir(const std::string& dirname) override {
+  Status DeleteDir(const std::string& dirname) override {
     PERF_TIMER_GUARD(env_delete_dir_nanos);
     return EnvWrapper::DeleteDir(dirname);
   }

-  virtual Status GetFileSize(const std::string& fname,
-                             uint64_t* file_size) override {
+  Status GetFileSize(const std::string& fname, uint64_t* file_size) override {
     PERF_TIMER_GUARD(env_get_file_size_nanos);
     return EnvWrapper::GetFileSize(fname, file_size);
   }

-  virtual Status GetFileModificationTime(const std::string& fname,
-                                         uint64_t* file_mtime) override {
+  Status GetFileModificationTime(const std::string& fname,
+                                 uint64_t* file_mtime) override {
     PERF_TIMER_GUARD(env_get_file_modification_time_nanos);
     return EnvWrapper::GetFileModificationTime(fname, file_mtime);
   }

-  virtual Status RenameFile(const std::string& src,
-                            const std::string& dst) override {
+  Status RenameFile(const std::string& src, const std::string& dst) override {
     PERF_TIMER_GUARD(env_rename_file_nanos);
     return EnvWrapper::RenameFile(src, dst);
   }

-  virtual Status LinkFile(const std::string& src,
-                          const std::string& dst) override {
+  Status LinkFile(const std::string& src, const std::string& dst) override {
     PERF_TIMER_GUARD(env_link_file_nanos);
     return EnvWrapper::LinkFile(src, dst);
   }

-  virtual Status LockFile(const std::string& fname, FileLock** lock) override {
+  Status LockFile(const std::string& fname, FileLock** lock) override {
     PERF_TIMER_GUARD(env_lock_file_nanos);
     return EnvWrapper::LockFile(fname, lock);
   }

-  virtual Status UnlockFile(FileLock* lock) override {
+  Status UnlockFile(FileLock* lock) override {
     PERF_TIMER_GUARD(env_unlock_file_nanos);
     return EnvWrapper::UnlockFile(lock);
   }

-  virtual Status NewLogger(const std::string& fname,
-                           shared_ptr<Logger>* result) override {
+  Status NewLogger(const std::string& fname,
+                   std::shared_ptr<Logger>* result) override {
     PERF_TIMER_GUARD(env_new_logger_nanos);
     return
EnvWrapper::NewLogger(fname, result); } diff --git a/ceph/src/rocksdb/utilities/geodb/geodb_impl.cc b/ceph/src/rocksdb/utilities/geodb/geodb_impl.cc deleted file mode 100644 index 97c4da0f7..000000000 --- a/ceph/src/rocksdb/utilities/geodb/geodb_impl.cc +++ /dev/null @@ -1,478 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -// -#ifndef ROCKSDB_LITE - -#include "utilities/geodb/geodb_impl.h" - -#ifndef __STDC_FORMAT_MACROS -#define __STDC_FORMAT_MACROS -#endif - -#include -#include -#include -#include -#include "util/coding.h" -#include "util/filename.h" -#include "util/string_util.h" - -// -// There are two types of keys. The first type of key-values -// maps a geo location to the set of object ids and their values. -// Table 1 -// key : p + : + $quadkey + : + $id + -// : + $latitude + : + $longitude -// value : value of the object -// This table can be used to find all objects that reside near -// a specified geolocation. -// -// Table 2 -// key : 'k' + : + $id -// value: $quadkey - -namespace rocksdb { - -const double GeoDBImpl::PI = 3.141592653589793; -const double GeoDBImpl::EarthRadius = 6378137; -const double GeoDBImpl::MinLatitude = -85.05112878; -const double GeoDBImpl::MaxLatitude = 85.05112878; -const double GeoDBImpl::MinLongitude = -180; -const double GeoDBImpl::MaxLongitude = 180; - -GeoDBImpl::GeoDBImpl(DB* db, const GeoDBOptions& options) : - GeoDB(db, options), db_(db), options_(options) { -} - -GeoDBImpl::~GeoDBImpl() { -} - -Status GeoDBImpl::Insert(const GeoObject& obj) { - WriteBatch batch; - - // It is possible that this id is already associated with - // with a different position. We first have to remove that - // association before we can insert the new one. - - // remove existing object, if it exists - GeoObject old; - Status status = GetById(obj.id, &old); - if (status.ok()) { - assert(obj.id.compare(old.id) == 0); - std::string quadkey = PositionToQuad(old.position, Detail); - std::string key1 = MakeKey1(old.position, old.id, quadkey); - std::string key2 = MakeKey2(old.id); - batch.Delete(Slice(key1)); - batch.Delete(Slice(key2)); - } else if (status.IsNotFound()) { - // What if another thread is trying to insert the same ID concurrently? - } else { - return status; - } - - // insert new object - std::string quadkey = PositionToQuad(obj.position, Detail); - std::string key1 = MakeKey1(obj.position, obj.id, quadkey); - std::string key2 = MakeKey2(obj.id); - batch.Put(Slice(key1), Slice(obj.value)); - batch.Put(Slice(key2), Slice(quadkey)); - return db_->Write(woptions_, &batch); -} - -Status GeoDBImpl::GetByPosition(const GeoPosition& pos, - const Slice& id, - std::string* value) { - std::string quadkey = PositionToQuad(pos, Detail); - std::string key1 = MakeKey1(pos, id, quadkey); - return db_->Get(roptions_, Slice(key1), value); -} - -Status GeoDBImpl::GetById(const Slice& id, GeoObject* object) { - Status status; - std::string quadkey; - - // create an iterator so that we can get a consistent picture - // of the database. 
- Iterator* iter = db_->NewIterator(roptions_); - - // create key for table2 - std::string kt = MakeKey2(id); - Slice key2(kt); - - iter->Seek(key2); - if (iter->Valid() && iter->status().ok()) { - if (iter->key().compare(key2) == 0) { - quadkey = iter->value().ToString(); - } - } - if (quadkey.size() == 0) { - delete iter; - return Status::NotFound(key2); - } - - // - // Seek to the quadkey + id prefix - // - std::string prefix = MakeKey1Prefix(quadkey, id); - iter->Seek(Slice(prefix)); - assert(iter->Valid()); - if (!iter->Valid() || !iter->status().ok()) { - delete iter; - return Status::NotFound(); - } - - // split the key into p + quadkey + id + lat + lon - Slice key = iter->key(); - std::vector parts = StringSplit(key.ToString(), ':'); - assert(parts.size() == 5); - assert(parts[0] == "p"); - assert(parts[1] == quadkey); - assert(parts[2] == id); - - // fill up output parameters - object->position.latitude = atof(parts[3].c_str()); - object->position.longitude = atof(parts[4].c_str()); - object->id = id.ToString(); // this is redundant - object->value = iter->value().ToString(); - delete iter; - return Status::OK(); -} - - -Status GeoDBImpl::Remove(const Slice& id) { - // Read the object from the database - GeoObject obj; - Status status = GetById(id, &obj); - if (!status.ok()) { - return status; - } - - // remove the object by atomically deleting it from both tables - std::string quadkey = PositionToQuad(obj.position, Detail); - std::string key1 = MakeKey1(obj.position, obj.id, quadkey); - std::string key2 = MakeKey2(obj.id); - WriteBatch batch; - batch.Delete(Slice(key1)); - batch.Delete(Slice(key2)); - return db_->Write(woptions_, &batch); -} - -class GeoIteratorImpl : public GeoIterator { - private: - std::vector values_; - std::vector::iterator iter_; - public: - explicit GeoIteratorImpl(std::vector values) - : values_(std::move(values)) { - iter_ = values_.begin(); - } - virtual void Next() override; - virtual bool Valid() const override; - virtual const GeoObject& geo_object() override; - virtual Status status() const override; -}; - -class GeoErrorIterator : public GeoIterator { - private: - Status status_; - public: - explicit GeoErrorIterator(Status s) : status_(s) {} - virtual void Next() override {}; - virtual bool Valid() const override { return false; } - virtual const GeoObject& geo_object() override { - GeoObject* g = new GeoObject(); - return *g; - } - virtual Status status() const override { return status_; } -}; - -void GeoIteratorImpl::Next() { - assert(Valid()); - iter_++; -} - -bool GeoIteratorImpl::Valid() const { - return iter_ != values_.end(); -} - -const GeoObject& GeoIteratorImpl::geo_object() { - assert(Valid()); - return *iter_; -} - -Status GeoIteratorImpl::status() const { - return Status::OK(); -} - -GeoIterator* GeoDBImpl::SearchRadial(const GeoPosition& pos, - double radius, - int number_of_values) { - std::vector values; - - // Gather all bounding quadkeys - std::vector qids; - Status s = searchQuadIds(pos, radius, &qids); - if (!s.ok()) { - return new GeoErrorIterator(s); - } - - // create an iterator - Iterator* iter = db_->NewIterator(ReadOptions()); - - // Process each prospective quadkey - for (std::string qid : qids) { - // The user is interested in only these many objects. 
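// [Editorial sketch; not part of the original file. Concrete values are
// illustrative.] For reference while reading this scan: the keys built by
// MakeKey1/MakeKey2 below implement the two tables described in the header
// comment of this file:
//
//   Table 1: "p:" + quadkey + ":" + id + ":" + lat + ":" + lon  -> value
//   Table 2: "k:" + id                                          -> quadkey
//
// so a hypothetical object ("id1", lat 40.0001, lon 116.0001) in quadkey
// cell "0231" would be stored under both
//   "p:0231:id1:40.0001:116.0001"   and   "k:id1"
// (lat/lon text produced by rocksdb::ToString). SearchRadial() seeks to
// MakeQuadKeyPrefix(qid), i.e. "p:" + qid, and keeps scanning while qid
// remains a prefix of the stored quadkey.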
- if (number_of_values == 0) { - break; - } - - // convert quadkey to db key prefix - std::string dbkey = MakeQuadKeyPrefix(qid); - - for (iter->Seek(dbkey); - number_of_values > 0 && iter->Valid() && iter->status().ok(); - iter->Next()) { - // split the key into p + quadkey + id + lat + lon - Slice key = iter->key(); - std::vector parts = StringSplit(key.ToString(), ':'); - assert(parts.size() == 5); - assert(parts[0] == "p"); - std::string* quadkey = &parts[1]; - - // If the key we are looking for is a prefix of the key - // we found from the database, then this is one of the keys - // we are looking for. - auto res = std::mismatch(qid.begin(), qid.end(), quadkey->begin()); - if (res.first == qid.end()) { - GeoPosition obj_pos(atof(parts[3].c_str()), atof(parts[4].c_str())); - GeoObject obj(obj_pos, parts[2], iter->value().ToString()); - values.push_back(obj); - number_of_values--; - } else { - break; - } - } - } - delete iter; - return new GeoIteratorImpl(std::move(values)); -} - -std::string GeoDBImpl::MakeKey1(const GeoPosition& pos, Slice id, - std::string quadkey) { - std::string lat = rocksdb::ToString(pos.latitude); - std::string lon = rocksdb::ToString(pos.longitude); - std::string key = "p:"; - key.reserve(5 + quadkey.size() + id.size() + lat.size() + lon.size()); - key.append(quadkey); - key.append(":"); - key.append(id.ToString()); - key.append(":"); - key.append(lat); - key.append(":"); - key.append(lon); - return key; -} - -std::string GeoDBImpl::MakeKey2(Slice id) { - std::string key = "k:"; - key.append(id.ToString()); - return key; -} - -std::string GeoDBImpl::MakeKey1Prefix(std::string quadkey, - Slice id) { - std::string key = "p:"; - key.reserve(4 + quadkey.size() + id.size()); - key.append(quadkey); - key.append(":"); - key.append(id.ToString()); - key.append(":"); - return key; -} - -std::string GeoDBImpl::MakeQuadKeyPrefix(std::string quadkey) { - std::string key = "p:"; - key.append(quadkey); - return key; -} - -// convert degrees to radians -double GeoDBImpl::radians(double x) { - return (x * PI) / 180; -} - -// convert radians to degrees -double GeoDBImpl::degrees(double x) { - return (x * 180) / PI; -} - -// convert a gps location to quad coordinate -std::string GeoDBImpl::PositionToQuad(const GeoPosition& pos, - int levelOfDetail) { - Pixel p = PositionToPixel(pos, levelOfDetail); - Tile tile = PixelToTile(p); - return TileToQuadKey(tile, levelOfDetail); -} - -GeoPosition GeoDBImpl::displaceLatLon(double lat, double lon, - double deltay, double deltax) { - double dLat = deltay / EarthRadius; - double dLon = deltax / (EarthRadius * cos(radians(lat))); - return GeoPosition(lat + degrees(dLat), - lon + degrees(dLon)); -} - -// -// Return the distance between two positions on the earth -// -double GeoDBImpl::distance(double lat1, double lon1, - double lat2, double lon2) { - double lon = radians(lon2 - lon1); - double lat = radians(lat2 - lat1); - - double a = (sin(lat / 2) * sin(lat / 2)) + - cos(radians(lat1)) * cos(radians(lat2)) * - (sin(lon / 2) * sin(lon / 2)); - double angle = 2 * atan2(sqrt(a), sqrt(1 - a)); - return angle * EarthRadius; -} - -// -// Returns all the quadkeys inside the search range -// -Status GeoDBImpl::searchQuadIds(const GeoPosition& position, - double radius, - std::vector* quadKeys) { - // get the outline of the search square - GeoPosition topLeftPos = boundingTopLeft(position, radius); - GeoPosition bottomRightPos = boundingBottomRight(position, radius); - - Pixel topLeft = PositionToPixel(topLeftPos, Detail); - Pixel bottomRight = 
PositionToPixel(bottomRightPos, Detail); - - // how many level of details to look for - int numberOfTilesAtMaxDepth = static_cast(std::floor((bottomRight.x - topLeft.x) / 256)); - int zoomLevelsToRise = static_cast(std::floor(std::log(numberOfTilesAtMaxDepth) / std::log(2))); - zoomLevelsToRise++; - int levels = std::max(0, Detail - zoomLevelsToRise); - - quadKeys->push_back(PositionToQuad(GeoPosition(topLeftPos.latitude, - topLeftPos.longitude), - levels)); - quadKeys->push_back(PositionToQuad(GeoPosition(topLeftPos.latitude, - bottomRightPos.longitude), - levels)); - quadKeys->push_back(PositionToQuad(GeoPosition(bottomRightPos.latitude, - topLeftPos.longitude), - levels)); - quadKeys->push_back(PositionToQuad(GeoPosition(bottomRightPos.latitude, - bottomRightPos.longitude), - levels)); - return Status::OK(); -} - -// Determines the ground resolution (in meters per pixel) at a specified -// latitude and level of detail. -// Latitude (in degrees) at which to measure the ground resolution. -// Level of detail, from 1 (lowest detail) to 23 (highest detail). -// Returns the ground resolution, in meters per pixel. -double GeoDBImpl::GroundResolution(double latitude, int levelOfDetail) { - latitude = clip(latitude, MinLatitude, MaxLatitude); - return cos(latitude * PI / 180) * 2 * PI * EarthRadius / - MapSize(levelOfDetail); -} - -// Converts a point from latitude/longitude WGS-84 coordinates (in degrees) -// into pixel XY coordinates at a specified level of detail. -GeoDBImpl::Pixel GeoDBImpl::PositionToPixel(const GeoPosition& pos, - int levelOfDetail) { - double latitude = clip(pos.latitude, MinLatitude, MaxLatitude); - double x = (pos.longitude + 180) / 360; - double sinLatitude = sin(latitude * PI / 180); - double y = 0.5 - std::log((1 + sinLatitude) / (1 - sinLatitude)) / (4 * PI); - double mapSize = MapSize(levelOfDetail); - double X = std::floor(clip(x * mapSize + 0.5, 0, mapSize - 1)); - double Y = std::floor(clip(y * mapSize + 0.5, 0, mapSize - 1)); - return Pixel((unsigned int)X, (unsigned int)Y); -} - -GeoPosition GeoDBImpl::PixelToPosition(const Pixel& pixel, int levelOfDetail) { - double mapSize = MapSize(levelOfDetail); - double x = (clip(pixel.x, 0, mapSize - 1) / mapSize) - 0.5; - double y = 0.5 - (clip(pixel.y, 0, mapSize - 1) / mapSize); - double latitude = 90 - 360 * atan(exp(-y * 2 * PI)) / PI; - double longitude = 360 * x; - return GeoPosition(latitude, longitude); -} - -// Converts a Pixel to a Tile -GeoDBImpl::Tile GeoDBImpl::PixelToTile(const Pixel& pixel) { - unsigned int tileX = static_cast(std::floor(pixel.x / 256)); - unsigned int tileY = static_cast(std::floor(pixel.y / 256)); - return Tile(tileX, tileY); -} - -GeoDBImpl::Pixel GeoDBImpl::TileToPixel(const Tile& tile) { - unsigned int pixelX = tile.x * 256; - unsigned int pixelY = tile.y * 256; - return Pixel(pixelX, pixelY); -} - -// Convert a Tile to a quadkey -std::string GeoDBImpl::TileToQuadKey(const Tile& tile, int levelOfDetail) { - std::stringstream quadKey; - for (int i = levelOfDetail; i > 0; i--) { - char digit = '0'; - int mask = 1 << (i - 1); - if ((tile.x & mask) != 0) { - digit++; - } - if ((tile.y & mask) != 0) { - digit++; - digit++; - } - quadKey << digit; - } - return quadKey.str(); -} - -// -// Convert a quadkey to a tile and its level of detail -// -void GeoDBImpl::QuadKeyToTile(std::string quadkey, Tile* tile, - int* levelOfDetail) { - tile->x = tile->y = 0; - *levelOfDetail = static_cast(quadkey.size()); - const char* key = reinterpret_cast(quadkey.c_str()); - for (int i = *levelOfDetail; i 
> 0; i--) { - int mask = 1 << (i - 1); - switch (key[*levelOfDetail - i]) { - case '0': - break; - - case '1': - tile->x |= mask; - break; - - case '2': - tile->y |= mask; - break; - - case '3': - tile->x |= mask; - tile->y |= mask; - break; - - default: - std::stringstream msg; - msg << quadkey; - msg << " Invalid QuadKey."; - throw std::runtime_error(msg.str()); - } - } -} -} // namespace rocksdb - -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/geodb/geodb_impl.h b/ceph/src/rocksdb/utilities/geodb/geodb_impl.h deleted file mode 100644 index 6b15f5422..000000000 --- a/ceph/src/rocksdb/utilities/geodb/geodb_impl.h +++ /dev/null @@ -1,185 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). -// - -#ifndef ROCKSDB_LITE - -#pragma once -#include -#include -#include -#include -#include -#include - -#include "rocksdb/utilities/geo_db.h" -#include "rocksdb/utilities/stackable_db.h" -#include "rocksdb/env.h" -#include "rocksdb/status.h" - -namespace rocksdb { - -// A specific implementation of GeoDB - -class GeoDBImpl : public GeoDB { - public: - GeoDBImpl(DB* db, const GeoDBOptions& options); - ~GeoDBImpl(); - - // Associate the GPS location with the identified by 'id'. The value - // is a blob that is associated with this object. - virtual Status Insert(const GeoObject& object) override; - - // Retrieve the value of the object located at the specified GPS - // location and is identified by the 'id'. - virtual Status GetByPosition(const GeoPosition& pos, const Slice& id, - std::string* value) override; - - // Retrieve the value of the object identified by the 'id'. 
This method - // could be potentially slower than GetByPosition - virtual Status GetById(const Slice& id, GeoObject* object) override; - - // Delete the specified object - virtual Status Remove(const Slice& id) override; - - // Returns a list of all items within a circular radius from the - // specified gps location - virtual GeoIterator* SearchRadial(const GeoPosition& pos, double radius, - int number_of_values) override; - - private: - DB* db_; - const GeoDBOptions options_; - const WriteOptions woptions_; - const ReadOptions roptions_; - - // MSVC requires the definition for this static const to be in .CC file - // The value of PI - static const double PI; - - // convert degrees to radians - static double radians(double x); - - // convert radians to degrees - static double degrees(double x); - - // A pixel class that captures X and Y coordinates - class Pixel { - public: - unsigned int x; - unsigned int y; - Pixel(unsigned int a, unsigned int b) : - x(a), y(b) { - } - }; - - // A Tile in the geoid - class Tile { - public: - unsigned int x; - unsigned int y; - Tile(unsigned int a, unsigned int b) : - x(a), y(b) { - } - }; - - // convert a gps location to quad coordinate - static std::string PositionToQuad(const GeoPosition& pos, int levelOfDetail); - - // arbitrary constant use for WGS84 via - // http://en.wikipedia.org/wiki/World_Geodetic_System - // http://mathforum.org/library/drmath/view/51832.html - // http://msdn.microsoft.com/en-us/library/bb259689.aspx - // http://www.tuicool.com/articles/NBrE73 - // - const int Detail = 23; - // MSVC requires the definition for this static const to be in .CC file - static const double EarthRadius; - static const double MinLatitude; - static const double MaxLatitude; - static const double MinLongitude; - static const double MaxLongitude; - - // clips a number to the specified minimum and maximum values. - static double clip(double n, double minValue, double maxValue) { - return fmin(fmax(n, minValue), maxValue); - } - - // Determines the map width and height (in pixels) at a specified level - // of detail, from 1 (lowest detail) to 23 (highest detail). - // Returns the map width and height in pixels. - static unsigned int MapSize(int levelOfDetail) { - return (unsigned int)(256 << levelOfDetail); - } - - // Determines the ground resolution (in meters per pixel) at a specified - // latitude and level of detail. - // Latitude (in degrees) at which to measure the ground resolution. - // Level of detail, from 1 (lowest detail) to 23 (highest detail). - // Returns the ground resolution, in meters per pixel. - static double GroundResolution(double latitude, int levelOfDetail); - - // Converts a point from latitude/longitude WGS-84 coordinates (in degrees) - // into pixel XY coordinates at a specified level of detail. 
- static Pixel PositionToPixel(const GeoPosition& pos, int levelOfDetail); - - static GeoPosition PixelToPosition(const Pixel& pixel, int levelOfDetail); - - // Converts a Pixel to a Tile - static Tile PixelToTile(const Pixel& pixel); - - static Pixel TileToPixel(const Tile& tile); - - // Convert a Tile to a quadkey - static std::string TileToQuadKey(const Tile& tile, int levelOfDetail); - - // Convert a quadkey to a tile and its level of detail - static void QuadKeyToTile(std::string quadkey, Tile* tile, - int *levelOfDetail); - - // Return the distance between two positions on the earth - static double distance(double lat1, double lon1, - double lat2, double lon2); - static GeoPosition displaceLatLon(double lat, double lon, - double deltay, double deltax); - - // - // Returns the top left position after applying the delta to - // the specified position - // - static GeoPosition boundingTopLeft(const GeoPosition& in, double radius) { - return displaceLatLon(in.latitude, in.longitude, -radius, -radius); - } - - // - // Returns the bottom right position after applying the delta to - // the specified position - static GeoPosition boundingBottomRight(const GeoPosition& in, - double radius) { - return displaceLatLon(in.latitude, in.longitude, radius, radius); - } - - // - // Get all quadkeys within a radius of a specified position - // - Status searchQuadIds(const GeoPosition& position, - double radius, - std::vector* quadKeys); - - // - // Create keys for accessing rocksdb table(s) - // - static std::string MakeKey1(const GeoPosition& pos, - Slice id, - std::string quadkey); - static std::string MakeKey2(Slice id); - static std::string MakeKey1Prefix(std::string quadkey, - Slice id); - static std::string MakeQuadKeyPrefix(std::string quadkey); -}; - -} // namespace rocksdb - -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/geodb/geodb_test.cc b/ceph/src/rocksdb/utilities/geodb/geodb_test.cc deleted file mode 100644 index 8477c86a3..000000000 --- a/ceph/src/rocksdb/utilities/geodb/geodb_test.cc +++ /dev/null @@ -1,201 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
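// [Editorial sketch; not part of the original patch.] TileToQuadKey() above
// interleaves the tile's x/y bits into one base-4 digit per zoom level: the
// x bit contributes +1 to a digit, the y bit +2. A standalone copy of that
// logic with a worked example:
static std::string TileToQuadKeySketch(unsigned int x, unsigned int y,
                                       int levelOfDetail) {
  std::string quadkey;
  for (int i = levelOfDetail; i > 0; i--) {
    char digit = '0';
    int mask = 1 << (i - 1);
    if ((x & mask) != 0) digit += 1;  // x contributes the low bit
    if ((y & mask) != 0) digit += 2;  // y contributes the high bit
    quadkey.push_back(digit);
  }
  return quadkey;  // e.g. x=3 (011), y=5 (101), level 3 -> "213"
}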
-// -#ifndef ROCKSDB_LITE -#include "utilities/geodb/geodb_impl.h" - -#include -#include "util/testharness.h" - -namespace rocksdb { - -class GeoDBTest : public testing::Test { - public: - static const std::string kDefaultDbName; - static Options options; - DB* db; - GeoDB* geodb; - - GeoDBTest() { - GeoDBOptions geodb_options; - EXPECT_OK(DestroyDB(kDefaultDbName, options)); - options.create_if_missing = true; - Status status = DB::Open(options, kDefaultDbName, &db); - geodb = new GeoDBImpl(db, geodb_options); - } - - ~GeoDBTest() { - delete geodb; - } - - GeoDB* getdb() { - return geodb; - } -}; - -const std::string GeoDBTest::kDefaultDbName = - test::PerThreadDBPath("geodb_test"); -Options GeoDBTest::options = Options(); - -// Insert, Get and Remove -TEST_F(GeoDBTest, SimpleTest) { - GeoPosition pos1(100, 101); - std::string id1("id1"); - std::string value1("value1"); - - // insert first object into database - GeoObject obj1(pos1, id1, value1); - Status status = getdb()->Insert(obj1); - ASSERT_TRUE(status.ok()); - - // insert second object into database - GeoPosition pos2(200, 201); - std::string id2("id2"); - std::string value2 = "value2"; - GeoObject obj2(pos2, id2, value2); - status = getdb()->Insert(obj2); - ASSERT_TRUE(status.ok()); - - // retrieve first object using position - std::string value; - status = getdb()->GetByPosition(pos1, Slice(id1), &value); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(value, value1); - - // retrieve first object using id - GeoObject obj; - status = getdb()->GetById(Slice(id1), &obj); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(obj.position.latitude, 100); - ASSERT_EQ(obj.position.longitude, 101); - ASSERT_EQ(obj.id.compare(id1), 0); - ASSERT_EQ(obj.value, value1); - - // delete first object - status = getdb()->Remove(Slice(id1)); - ASSERT_TRUE(status.ok()); - status = getdb()->GetByPosition(pos1, Slice(id1), &value); - ASSERT_TRUE(status.IsNotFound()); - status = getdb()->GetById(id1, &obj); - ASSERT_TRUE(status.IsNotFound()); - - // check that we can still find second object - status = getdb()->GetByPosition(pos2, id2, &value); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(value, value2); - status = getdb()->GetById(id2, &obj); - ASSERT_TRUE(status.ok()); -} - -// Search. -// Verify distances via http://www.stevemorse.org/nearest/distance.php -TEST_F(GeoDBTest, Search) { - GeoPosition pos1(45, 45); - std::string id1("mid1"); - std::string value1 = "midvalue1"; - - // insert object at 45 degree latitude - GeoObject obj1(pos1, id1, value1); - Status status = getdb()->Insert(obj1); - ASSERT_TRUE(status.ok()); - - // search all objects centered at 46 degree latitude with - // a radius of 200 kilometers. We should find the one object that - // we inserted earlier. - GeoIterator* iter1 = getdb()->SearchRadial(GeoPosition(46, 46), 200000); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(iter1->geo_object().value, "midvalue1"); - uint32_t size = 0; - while (iter1->Valid()) { - GeoObject obj; - status = getdb()->GetById(Slice(id1), &obj); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(iter1->geo_object().position.latitude, pos1.latitude); - ASSERT_EQ(iter1->geo_object().position.longitude, pos1.longitude); - ASSERT_EQ(iter1->geo_object().id.compare(id1), 0); - ASSERT_EQ(iter1->geo_object().value, value1); - - size++; - iter1->Next(); - ASSERT_TRUE(!iter1->Valid()); - } - ASSERT_EQ(size, 1U); - delete iter1; - - // search all objects centered at 46 degree latitude with - // a radius of 2 kilometers. There should be none. 
- GeoIterator* iter2 = getdb()->SearchRadial(GeoPosition(46, 46), 2); - ASSERT_TRUE(status.ok()); - ASSERT_FALSE(iter2->Valid()); - delete iter2; -} - -TEST_F(GeoDBTest, DifferentPosInSameQuadkey) { - // insert obj1 into database - GeoPosition pos1(40.00001, 116.00001); - std::string id1("12"); - std::string value1("value1"); - - GeoObject obj1(pos1, id1, value1); - Status status = getdb()->Insert(obj1); - ASSERT_TRUE(status.ok()); - - // insert obj2 into database - GeoPosition pos2(40.00002, 116.00002); - std::string id2("123"); - std::string value2 = "value2"; - - GeoObject obj2(pos2, id2, value2); - status = getdb()->Insert(obj2); - ASSERT_TRUE(status.ok()); - - // get obj1's quadkey - ReadOptions opt; - PinnableSlice quadkey1; - status = getdb()->Get(opt, getdb()->DefaultColumnFamily(), "k:" + id1, &quadkey1); - ASSERT_TRUE(status.ok()); - - // get obj2's quadkey - PinnableSlice quadkey2; - status = getdb()->Get(opt, getdb()->DefaultColumnFamily(), "k:" + id2, &quadkey2); - ASSERT_TRUE(status.ok()); - - // obj1 and obj2 have the same quadkey - ASSERT_EQ(quadkey1, quadkey2); - - // get obj1 by id, and check value - GeoObject obj; - status = getdb()->GetById(Slice(id1), &obj); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(obj.position.latitude, pos1.latitude); - ASSERT_EQ(obj.position.longitude, pos1.longitude); - ASSERT_EQ(obj.id.compare(id1), 0); - ASSERT_EQ(obj.value, value1); - - // get obj2 by id, and check value - status = getdb()->GetById(Slice(id2), &obj); - ASSERT_TRUE(status.ok()); - ASSERT_EQ(obj.position.latitude, pos2.latitude); - ASSERT_EQ(obj.position.longitude, pos2.longitude); - ASSERT_EQ(obj.id.compare(id2), 0); - ASSERT_EQ(obj.value, value2); -} - -} // namespace rocksdb - -int main(int argc, char* argv[]) { - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} -#else - -#include - -int main() { - fprintf(stderr, "SKIPPED\n"); - return 0; -} - -#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/lua/rocks_lua_compaction_filter.cc b/ceph/src/rocksdb/utilities/lua/rocks_lua_compaction_filter.cc deleted file mode 100644 index a976e9f8a..000000000 --- a/ceph/src/rocksdb/utilities/lua/rocks_lua_compaction_filter.cc +++ /dev/null @@ -1,242 +0,0 @@ -// Copyright (c) 2016, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#if defined(LUA) && !defined(ROCKSDB_LITE) -#include "rocksdb/utilities/lua/rocks_lua_compaction_filter.h" - -extern "C" { -#include -} - -#include "rocksdb/compaction_filter.h" - -namespace rocksdb { -namespace lua { - -const std::string kFilterFunctionName = "Filter"; -const std::string kNameFunctionName = "Name"; - -void RocksLuaCompactionFilter::LogLuaError(const char* format, ...) 
const { - if (options_.error_log.get() != nullptr && - error_count_ < options_.error_limit_per_filter) { - error_count_++; - - va_list ap; - va_start(ap, format); - options_.error_log->Logv(InfoLogLevel::ERROR_LEVEL, format, ap); - va_end(ap); - } -} - -bool RocksLuaCompactionFilter::Filter(int level, const Slice& key, - const Slice& existing_value, - std::string* new_value, - bool* value_changed) const { - auto* lua_state = lua_state_wrapper_.GetLuaState(); - // push the right function into the lua stack - lua_getglobal(lua_state, kFilterFunctionName.c_str()); - - int error_no = 0; - int num_input_values; - int num_return_values; - if (options_.ignore_value == false) { - // push input arguments into the lua stack - lua_pushnumber(lua_state, level); - lua_pushlstring(lua_state, key.data(), key.size()); - lua_pushlstring(lua_state, existing_value.data(), existing_value.size()); - num_input_values = 3; - num_return_values = 3; - } else { - // If ignore_value is set to true, then we only put two arguments - // and expect one return value - lua_pushnumber(lua_state, level); - lua_pushlstring(lua_state, key.data(), key.size()); - num_input_values = 2; - num_return_values = 1; - } - - // perform the lua call - if ((error_no = - lua_pcall(lua_state, num_input_values, num_return_values, 0)) != 0) { - LogLuaError("[Lua] Error(%d) in Filter function --- %s", error_no, - lua_tostring(lua_state, -1)); - // pops out the lua error from stack - lua_pop(lua_state, 1); - return false; - } - - // Since lua_pcall succeeded, the top num_return_values elements on the - // Lua stack are guaranteed to be the returned values. - - bool has_error = false; - const int kIndexIsFiltered = -num_return_values; - const int kIndexValueChanged = -num_return_values + 1; - const int kIndexNewValue = -num_return_values + 2; - - // check the types of the return values - // is_filtered - if (!lua_isboolean(lua_state, kIndexIsFiltered)) { - LogLuaError( - "[Lua] Error in Filter function -- " - "1st return value (is_filtered) is not a boolean " - "while a boolean is expected."); - has_error = true; - } - - if (options_.ignore_value == false) { - // value_changed - if (!lua_isboolean(lua_state, kIndexValueChanged)) { - LogLuaError( - "[Lua] Error in Filter function -- " - "2nd return value (value_changed) is not a boolean " - "while a boolean is expected."); - has_error = true; - } - // new_value - if (!lua_isstring(lua_state, kIndexNewValue)) { - LogLuaError( - "[Lua] Error in Filter function -- " - "3rd return value (new_value) is not a string " - "while a string is expected."); - has_error = true; - } - } - - if (has_error) { - lua_pop(lua_state, num_return_values); - return false; - } - - // Fetch the return values - bool is_filtered = false; - if (!has_error) { - is_filtered = lua_toboolean(lua_state, kIndexIsFiltered); - if (options_.ignore_value == false) { - *value_changed = lua_toboolean(lua_state, kIndexValueChanged); - if (*value_changed) { - const char* new_value_buf = lua_tostring(lua_state, kIndexNewValue); - const size_t new_value_size = lua_strlen(lua_state, kIndexNewValue); - // Note that any string that lua_tostring returns always has a zero at - // its end, but it can have other zeros inside it - assert(new_value_buf[new_value_size] == '\0'); - assert(strlen(new_value_buf) <= new_value_size); - new_value->assign(new_value_buf, new_value_size); - } - } else { - *value_changed = false; - } - } - // pops the three return values.
- lua_pop(lua_state, num_return_values); - return is_filtered; -} - -const char* RocksLuaCompactionFilter::Name() const { - if (name_ != "") { - return name_.c_str(); - } - auto* lua_state = lua_state_wrapper_.GetLuaState(); - // push the right function into the lua stack - lua_getglobal(lua_state, kNameFunctionName.c_str()); - - // perform the call (0 arguments, 1 result) - int error_no; - if ((error_no = lua_pcall(lua_state, 0, 1, 0)) != 0) { - LogLuaError("[Lua] Error(%d) in Name function --- %s", error_no, - lua_tostring(lua_state, -1)); - // pops out the lua error from stack - lua_pop(lua_state, 1); - return name_.c_str(); - } - - // check the return value - if (!lua_isstring(lua_state, -1)) { - LogLuaError( - "[Lua] Error in Name function -- " - "return value is not a string while string is expected"); - } else { - const char* name_buf = lua_tostring(lua_state, -1); - const size_t name_size __attribute__((__unused__)) = lua_strlen(lua_state, -1); - assert(name_buf[name_size] == '\0'); - assert(strlen(name_buf) <= name_size); - name_ = name_buf; - } - lua_pop(lua_state, 1); - return name_.c_str(); -} - -/* Not yet supported -bool RocksLuaCompactionFilter::FilterMergeOperand( - int level, const Slice& key, const Slice& operand) const { - auto* lua_state = lua_state_wrapper_.GetLuaState(); - // push the right function into the lua stack - lua_getglobal(lua_state, "FilterMergeOperand"); - - // push input arguments into the lua stack - lua_pushnumber(lua_state, level); - lua_pushlstring(lua_state, key.data(), key.size()); - lua_pushlstring(lua_state, operand.data(), operand.size()); - - // perform the call (3 arguments, 1 result) - int error_no; - if ((error_no = lua_pcall(lua_state, 3, 1, 0)) != 0) { - LogLuaError("[Lua] Error(%d) in FilterMergeOperand function --- %s", - error_no, lua_tostring(lua_state, -1)); - // pops out the lua error from stack - lua_pop(lua_state, 1); - return false; - } - - bool is_filtered = false; - // check the return value - if (!lua_isboolean(lua_state, -1)) { - LogLuaError("[Lua] Error in FilterMergeOperand function -- " - "return value is not a boolean while boolean is expected"); - } else { - is_filtered = lua_toboolean(lua_state, -1); - } - - lua_pop(lua_state, 1); - - return is_filtered; -} -*/ - -bool RocksLuaCompactionFilter::IgnoreSnapshots() const { - return options_.ignore_snapshots; -} - -RocksLuaCompactionFilterFactory::RocksLuaCompactionFilterFactory( - const RocksLuaCompactionFilterOptions opt) - : opt_(opt) { - auto filter = CreateCompactionFilter(CompactionFilter::Context()); - name_ = std::string("RocksLuaCompactionFilterFactory::") + - std::string(filter->Name()); -} - -std::unique_ptr -RocksLuaCompactionFilterFactory::CreateCompactionFilter( - const CompactionFilter::Context& /*context*/) { - std::lock_guard lock(opt_mutex_); - return std::unique_ptr(new RocksLuaCompactionFilter(opt_)); -} - -std::string RocksLuaCompactionFilterFactory::GetScript() { - std::lock_guard lock(opt_mutex_); - return opt_.lua_script; -} - -void RocksLuaCompactionFilterFactory::SetScript(const std::string& new_script) { - std::lock_guard lock(opt_mutex_); - opt_.lua_script = new_script; -} - -const char* RocksLuaCompactionFilterFactory::Name() const { - return name_.c_str(); -} - -} // namespace lua -} // namespace rocksdb -#endif // defined(LUA) && !defined(ROCKSDB_LITE) diff --git a/ceph/src/rocksdb/utilities/lua/rocks_lua_test.cc b/ceph/src/rocksdb/utilities/lua/rocks_lua_test.cc deleted file mode 100644 index cc9c14166..000000000 --- 
a/ceph/src/rocksdb/utilities/lua/rocks_lua_test.cc +++ /dev/null @@ -1,498 +0,0 @@ -// Copyright (c) 2016, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#include - -#if !defined(ROCKSDB_LITE) - -#if defined(LUA) - -#include - -#include "db/db_test_util.h" -#include "port/stack_trace.h" -#include "rocksdb/compaction_filter.h" -#include "rocksdb/db.h" -#include "rocksdb/utilities/lua/rocks_lua_compaction_filter.h" -#include "util/testharness.h" - -namespace rocksdb { - -class StopOnErrorLogger : public Logger { - public: - using Logger::Logv; - virtual void Logv(const char* format, va_list ap) override { - vfprintf(stderr, format, ap); - fprintf(stderr, "\n"); - FAIL(); - } -}; - - -class RocksLuaTest : public testing::Test { - public: - RocksLuaTest() : rnd_(301) { - temp_dir_ = test::TmpDir(Env::Default()); - db_ = nullptr; - } - - std::string RandomString(int len) { - std::string res; - for (int i = 0; i < len; ++i) { - res += rnd_.Uniform(26) + 'a'; - } - return res; - } - - void CreateDBWithLuaCompactionFilter( - const lua::RocksLuaCompactionFilterOptions& lua_opt, - const std::string& db_path, - std::unordered_map* kvs, - const int kNumFlushes = 5, - std::shared_ptr* - output_factory = nullptr) { - const int kKeySize = 10; - const int kValueSize = 50; - const int kKeysPerFlush = 2; - auto factory = - std::make_shared( - lua_opt); - if (output_factory != nullptr) { - *output_factory = factory; - } - - options_ = Options(); - options_.create_if_missing = true; - options_.compaction_filter_factory = factory; - options_.disable_auto_compactions = true; - options_.max_bytes_for_level_base = - (kKeySize + kValueSize) * kKeysPerFlush * 2; - options_.max_bytes_for_level_multiplier = 2; - options_.target_file_size_base = (kKeySize + kValueSize) * kKeysPerFlush; - options_.level0_file_num_compaction_trigger = 2; - DestroyDB(db_path, options_); - ASSERT_OK(DB::Open(options_, db_path, &db_)); - - for (int f = 0; f < kNumFlushes; ++f) { - for (int i = 0; i < kKeysPerFlush; ++i) { - std::string key = RandomString(kKeySize); - std::string value = RandomString(kValueSize); - kvs->insert({key, value}); - ASSERT_OK(db_->Put(WriteOptions(), key, value)); - } - db_->Flush(FlushOptions()); - } - } - - ~RocksLuaTest() { - if (db_) { - delete db_; - } - } - std::string temp_dir_; - DB* db_; - Random rnd_; - Options options_; -}; - -TEST_F(RocksLuaTest, Default) { - // If nothing is set in the LuaCompactionFilterOptions, then - // RocksDB will keep all the key / value pairs, but it will also - // print our error log indicating failure. 
- std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - - for (auto const& entry : kvs) { - std::string value; - ASSERT_OK(db_->Get(ReadOptions(), entry.first, &value)); - ASSERT_EQ(value, entry.second); - } -} - -TEST_F(RocksLuaTest, KeepsAll) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // keeps all the key value pairs - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " return false, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"KeepsAll\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - - for (auto const& entry : kvs) { - std::string value; - ASSERT_OK(db_->Get(ReadOptions(), entry.first, &value)); - ASSERT_EQ(value, entry.second); - } -} - -TEST_F(RocksLuaTest, GetName) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - const std::string kScriptName = "SimpleLuaCompactionFilter"; - lua_opt.lua_script = - std::string( - "function Filter(level, key, existing_value)\n" - " return false, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"") + kScriptName + "\"\n" - "end\n" - "\n"; - - std::shared_ptr factory = - std::make_shared(lua_opt); - std::string factory_name(factory->Name()); - ASSERT_NE(factory_name.find(kScriptName), std::string::npos); -} - -TEST_F(RocksLuaTest, RemovesAll) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // removes all the key value pairs - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " return true, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"RemovesAll\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - // Issue full compaction and expect nothing is in the DB. - ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - auto s = db_->Get(ReadOptions(), entry.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } -} - -TEST_F(RocksLuaTest, FilterByKey) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // removes all keys whose initial is less than 'r' - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " if key:sub(1,1) < 'r' then\n" - " return true, false, \"\"\n" - " end\n" - " return false, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"KeepsAll\"\n" - "end\n"; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - // Issue full compaction and expect nothing is in the DB. 
- ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - auto s = db_->Get(ReadOptions(), entry.first, &value); - if (entry.first[0] < 'r') { - ASSERT_TRUE(s.IsNotFound()); - } else { - ASSERT_TRUE(s.ok()); - ASSERT_TRUE(value == entry.second); - } - } -} - -TEST_F(RocksLuaTest, FilterByValue) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // removes all values whose initial is less than 'r' - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " if existing_value:sub(1,1) < 'r' then\n" - " return true, false, \"\"\n" - " end\n" - " return false, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"FilterByValue\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - // Issue full compaction and expect nothing is in the DB. - ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - auto s = db_->Get(ReadOptions(), entry.first, &value); - if (entry.second[0] < 'r') { - ASSERT_TRUE(s.IsNotFound()); - } else { - ASSERT_TRUE(s.ok()); - ASSERT_EQ(value, entry.second); - } - } -} - -TEST_F(RocksLuaTest, ChangeValue) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // Replace all values by their reversed key - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " return false, true, key:reverse()\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"ChangeValue\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - // Issue full compaction and expect nothing is in the DB. - ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - ASSERT_OK(db_->Get(ReadOptions(), entry.first, &value)); - std::string new_value = entry.first; - std::reverse(new_value.begin(), new_value.end()); - ASSERT_EQ(value, new_value); - } -} - -TEST_F(RocksLuaTest, ConditionallyChangeAndFilterValue) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // Performs the following logic: - // If key[0] < 'h' --> replace value by reverse key - // If key[0] >= 'r' --> keep the original key value - // Otherwise, filter the key value - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " if key:sub(1,1) < 'h' then\n" - " return false, true, key:reverse()\n" - " elseif key:sub(1,1) < 'r' then\n" - " return true, false, \"\"\n" - " end\n" - " return false, false, \"\"\n" - "end\n" - "function Name()\n" - " return \"ConditionallyChangeAndFilterValue\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs); - // Issue full compaction and expect nothing is in the DB. 
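All of these scripts follow the three-value contract that the deleted RocksLuaCompactionFilter::Filter() glue enforces: Filter(level, key, existing_value) must return (is_filtered, value_changed, new_value), and only is_filtered is consulted when ignore_value is set. A minimal sketch of wiring such a script into an Options object, assuming the removed Lua utility header and a RocksDB build with Lua support:

```cpp
// Sketch only: rocks_lua_compaction_filter.h is removed by this patch and
// requires a build with -DLUA, as the tests here did.
#include "rocksdb/options.h"
#include "rocksdb/utilities/lua/rocks_lua_compaction_filter.h"

#include <memory>

rocksdb::Options MakeLuaFilteredOptions() {
  rocksdb::lua::RocksLuaCompactionFilterOptions lua_opt;
  lua_opt.lua_script =
      "function Filter(level, key, existing_value)\n"
      "  -- returns (is_filtered, value_changed, new_value)\n"
      "  return key:sub(1, 1) == 't', false, \"\"\n"
      "end\n"
      "function Name()\n"
      "  return \"DropKeysStartingWithT\"\n"
      "end\n";

  rocksdb::Options options;
  options.compaction_filter_factory =
      std::make_shared<rocksdb::lua::RocksLuaCompactionFilterFactory>(lua_opt);
  return options;
}
```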
- ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - auto s = db_->Get(ReadOptions(), entry.first, &value); - if (entry.first[0] < 'h') { - ASSERT_TRUE(s.ok()); - std::string new_value = entry.first; - std::reverse(new_value.begin(), new_value.end()); - ASSERT_EQ(value, new_value); - } else if (entry.first[0] < 'r') { - ASSERT_TRUE(s.IsNotFound()); - } else { - ASSERT_TRUE(s.ok()); - ASSERT_EQ(value, entry.second); - } - } -} - -TEST_F(RocksLuaTest, DynamicChangeScript) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - lua_opt.error_log = std::make_shared(); - // keeps all the key value pairs - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " return false, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"KeepsAll\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - std::shared_ptr factory; - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs, 30, &factory); - uint64_t count = 0; - ASSERT_TRUE(db_->GetIntProperty( - rocksdb::DB::Properties::kNumEntriesActiveMemTable, &count)); - ASSERT_EQ(count, 0); - ASSERT_TRUE(db_->GetIntProperty( - rocksdb::DB::Properties::kNumEntriesImmMemTables, &count)); - ASSERT_EQ(count, 0); - - CompactRangeOptions cr_opt; - cr_opt.bottommost_level_compaction = - rocksdb::BottommostLevelCompaction::kForce; - - // Issue full compaction and expect everything is in the DB. - ASSERT_OK(db_->CompactRange(cr_opt, nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - ASSERT_OK(db_->Get(ReadOptions(), entry.first, &value)); - ASSERT_EQ(value, entry.second); - } - - // change the lua script to removes all the key value pairs - factory->SetScript( - "function Filter(level, key, existing_value)\n" - " return true, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"RemovesAll\"\n" - "end\n" - "\n"); - { - std::string key = "another-key"; - std::string value = "another-value"; - kvs.insert({key, value}); - ASSERT_OK(db_->Put(WriteOptions(), key, value)); - db_->Flush(FlushOptions()); - } - - cr_opt.change_level = true; - cr_opt.target_level = 5; - // Issue full compaction and expect nothing is in the DB. 
- ASSERT_OK(db_->CompactRange(cr_opt, nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - auto s = db_->Get(ReadOptions(), entry.first, &value); - ASSERT_TRUE(s.IsNotFound()); - } -} - -TEST_F(RocksLuaTest, LuaConditionalTypeError) { - std::string db_path = test::PerThreadDBPath(temp_dir_, "rocks_lua_test"); - - lua::RocksLuaCompactionFilterOptions lua_opt; - // Filter() error when input key's initial >= 'r' - lua_opt.lua_script = - "function Filter(level, key, existing_value)\n" - " if existing_value:sub(1,1) >= 'r' then\n" - " return true, 2, \"\" -- incorrect type of 2nd return value\n" - " end\n" - " return true, false, \"\"\n" - "end\n" - "\n" - "function FilterMergeOperand(level, key, operand)\n" - " return false\n" - "end\n" - "function Name()\n" - " return \"BuggyCode\"\n" - "end\n" - "\n"; - - std::unordered_map kvs; - // Create DB with 10 files - CreateDBWithLuaCompactionFilter(lua_opt, db_path, &kvs, 10); - - // Issue full compaction and expect all keys which initial is < 'r' - // will be deleted as we keep the key value when we hit an error. - ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr)); - - for (auto const& entry : kvs) { - std::string value; - auto s = db_->Get(ReadOptions(), entry.first, &value); - if (entry.second[0] < 'r') { - ASSERT_TRUE(s.IsNotFound()); - } else { - ASSERT_TRUE(s.ok()); - ASSERT_EQ(value, entry.second); - } - } -} - -} // namespace rocksdb - -int main(int argc, char** argv) { - rocksdb::port::InstallStackTraceHandler(); - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} - -#else - -int main(int /*argc*/, char** /*argv*/) { - printf("LUA_PATH is not set. Ignoring the test.\n"); -} - -#endif // defined(LUA) - -#else - -int main(int /*argc*/, char** /*argv*/) { - printf("Lua is not supported in RocksDBLite. 
Ignoring the test.\n"); -} - -#endif // !defined(ROCKSDB_LITE) diff --git a/ceph/src/rocksdb/utilities/merge_operators/max.cc b/ceph/src/rocksdb/utilities/merge_operators/max.cc index 732f203e3..1ef66a34b 100644 --- a/ceph/src/rocksdb/utilities/merge_operators/max.cc +++ b/ceph/src/rocksdb/utilities/merge_operators/max.cc @@ -19,8 +19,8 @@ namespace { // anonymous namespace // Slice::compare class MaxOperator : public MergeOperator { public: - virtual bool FullMergeV2(const MergeOperationInput& merge_in, - MergeOperationOutput* merge_out) const override { + bool FullMergeV2(const MergeOperationInput& merge_in, + MergeOperationOutput* merge_out) const override { Slice& max = merge_out->existing_operand; if (merge_in.existing_value) { max = Slice(merge_in.existing_value->data(), @@ -38,9 +38,9 @@ class MaxOperator : public MergeOperator { return true; } - virtual bool PartialMerge(const Slice& /*key*/, const Slice& left_operand, - const Slice& right_operand, std::string* new_value, - Logger* /*logger*/) const override { + bool PartialMerge(const Slice& /*key*/, const Slice& left_operand, + const Slice& right_operand, std::string* new_value, + Logger* /*logger*/) const override { if (left_operand.compare(right_operand) >= 0) { new_value->assign(left_operand.data(), left_operand.size()); } else { @@ -49,10 +49,10 @@ class MaxOperator : public MergeOperator { return true; } - virtual bool PartialMergeMulti(const Slice& /*key*/, - const std::deque& operand_list, - std::string* new_value, - Logger* /*logger*/) const override { + bool PartialMergeMulti(const Slice& /*key*/, + const std::deque& operand_list, + std::string* new_value, + Logger* /*logger*/) const override { Slice max; for (const auto& operand : operand_list) { if (max.compare(operand) < 0) { @@ -64,7 +64,7 @@ class MaxOperator : public MergeOperator { return true; } - virtual const char* Name() const override { return "MaxOperator"; } + const char* Name() const override { return "MaxOperator"; } }; } // end of anonymous namespace diff --git a/ceph/src/rocksdb/utilities/merge_operators/put.cc b/ceph/src/rocksdb/utilities/merge_operators/put.cc index fcbf67d9b..a4b135fef 100644 --- a/ceph/src/rocksdb/utilities/merge_operators/put.cc +++ b/ceph/src/rocksdb/utilities/merge_operators/put.cc @@ -22,10 +22,9 @@ namespace { // anonymous namespace // From the client-perspective, semantics are the same. 
class PutOperator : public MergeOperator { public: - virtual bool FullMerge(const Slice& /*key*/, const Slice* /*existing_value*/, - const std::deque& operand_sequence, - std::string* new_value, - Logger* /*logger*/) const override { + bool FullMerge(const Slice& /*key*/, const Slice* /*existing_value*/, + const std::deque& operand_sequence, + std::string* new_value, Logger* /*logger*/) const override { // Put basically only looks at the current/latest value assert(!operand_sequence.empty()); assert(new_value != nullptr); @@ -33,38 +32,36 @@ class PutOperator : public MergeOperator { return true; } - virtual bool PartialMerge(const Slice& /*key*/, const Slice& /*left_operand*/, - const Slice& right_operand, std::string* new_value, - Logger* /*logger*/) const override { + bool PartialMerge(const Slice& /*key*/, const Slice& /*left_operand*/, + const Slice& right_operand, std::string* new_value, + Logger* /*logger*/) const override { new_value->assign(right_operand.data(), right_operand.size()); return true; } using MergeOperator::PartialMergeMulti; - virtual bool PartialMergeMulti(const Slice& /*key*/, - const std::deque& operand_list, - std::string* new_value, - Logger* /*logger*/) const override { + bool PartialMergeMulti(const Slice& /*key*/, + const std::deque& operand_list, + std::string* new_value, + Logger* /*logger*/) const override { new_value->assign(operand_list.back().data(), operand_list.back().size()); return true; } - virtual const char* Name() const override { - return "PutOperator"; - } + const char* Name() const override { return "PutOperator"; } }; class PutOperatorV2 : public PutOperator { - virtual bool FullMerge(const Slice& /*key*/, const Slice* /*existing_value*/, - const std::deque& /*operand_sequence*/, - std::string* /*new_value*/, - Logger* /*logger*/) const override { + bool FullMerge(const Slice& /*key*/, const Slice* /*existing_value*/, + const std::deque& /*operand_sequence*/, + std::string* /*new_value*/, + Logger* /*logger*/) const override { assert(false); return false; } - virtual bool FullMergeV2(const MergeOperationInput& merge_in, - MergeOperationOutput* merge_out) const override { + bool FullMergeV2(const MergeOperationInput& merge_in, + MergeOperationOutput* merge_out) const override { // Put basically only looks at the current/latest value assert(!merge_in.operand_list.empty()); merge_out->existing_operand = merge_in.operand_list.back(); diff --git a/ceph/src/rocksdb/utilities/merge_operators/uint64add.cc b/ceph/src/rocksdb/utilities/merge_operators/uint64add.cc index dc761e74b..b998e1b8e 100644 --- a/ceph/src/rocksdb/utilities/merge_operators/uint64add.cc +++ b/ceph/src/rocksdb/utilities/merge_operators/uint64add.cc @@ -20,9 +20,9 @@ namespace { // anonymous namespace // Implemented as an AssociativeMergeOperator for simplicity and example. 
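The UInt64AddOperator defined just below treats each operand as a fixed-width 64-bit integer and adds it in; because the operator is associative, compaction may fold operands together before any base value is seen. A usage sketch, assuming an in-tree build (PutFixed64/DecodeFixed64 come from util/coding.h, and CreateUInt64AddOperator() is the factory utilities/merge_operators.h exposes); the DB path is illustrative:

```cpp
// Sketch only: counter accumulation via the associative uint64add operator.
#include "rocksdb/db.h"
#include "util/coding.h"
#include "utilities/merge_operators.h"

#include <cassert>
#include <string>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.merge_operator = rocksdb::MergeOperators::CreateUInt64AddOperator();

  rocksdb::DB* db = nullptr;
  assert(rocksdb::DB::Open(options, "/tmp/uint64add_example", &db).ok());

  // Each operand is a fixed-width uint64 (what DecodeInteger below expects);
  // Merge() records it, and reads fold the whole chain.
  std::string operand;
  rocksdb::PutFixed64(&operand, 2);
  db->Merge(rocksdb::WriteOptions(), "counter", operand);
  db->Merge(rocksdb::WriteOptions(), "counter", operand);

  std::string value;
  assert(db->Get(rocksdb::ReadOptions(), "counter", &value).ok());
  assert(rocksdb::DecodeFixed64(value.data()) == 4);  // 2 + 2
  delete db;
  return 0;
}
```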
class UInt64AddOperator : public AssociativeMergeOperator { public: - virtual bool Merge(const Slice& /*key*/, const Slice* existing_value, - const Slice& value, std::string* new_value, - Logger* logger) const override { + bool Merge(const Slice& /*key*/, const Slice* existing_value, + const Slice& value, std::string* new_value, + Logger* logger) const override { uint64_t orig_value = 0; if (existing_value){ orig_value = DecodeInteger(*existing_value, logger); @@ -36,9 +36,7 @@ class UInt64AddOperator : public AssociativeMergeOperator { return true; // Return true always since corruption will be treated as 0 } - virtual const char* Name() const override { - return "UInt64AddOperator"; - } + const char* Name() const override { return "UInt64AddOperator"; } private: // Takes the string and decodes it into a uint64_t diff --git a/ceph/src/rocksdb/utilities/options/options_util.cc b/ceph/src/rocksdb/utilities/options/options_util.cc index 21734923f..3975eadd7 100644 --- a/ceph/src/rocksdb/utilities/options/options_util.cc +++ b/ceph/src/rocksdb/utilities/options/options_util.cc @@ -15,20 +15,28 @@ namespace rocksdb { Status LoadOptionsFromFile(const std::string& file_name, Env* env, DBOptions* db_options, std::vector<ColumnFamilyDescriptor>* cf_descs, - bool ignore_unknown_options) { + bool ignore_unknown_options, + std::shared_ptr<Cache>* cache) { RocksDBOptionsParser parser; Status s = parser.Parse(file_name, env, ignore_unknown_options); if (!s.ok()) { return s; } - *db_options = *parser.db_opt(); - const std::vector<std::string>& cf_names = *parser.cf_names(); const std::vector<ColumnFamilyOptions>& cf_opts = *parser.cf_opts(); cf_descs->clear(); for (size_t i = 0; i < cf_opts.size(); ++i) { cf_descs->push_back({cf_names[i], cf_opts[i]}); + if (cache != nullptr) { + TableFactory* tf = cf_opts[i].table_factory.get(); + if (tf != nullptr && tf->GetOptions() != nullptr && + tf->Name() == BlockBasedTableFactory().Name()) { + auto* loaded_bbt_opt = + reinterpret_cast<BlockBasedTableOptions*>(tf->GetOptions()); + loaded_bbt_opt->block_cache = *cache; + } + } } return Status::OK(); } @@ -63,15 +71,15 @@ Status GetLatestOptionsFileName(const std::string& dbpath, Status LoadLatestOptions(const std::string& dbpath, Env* env, DBOptions* db_options, std::vector<ColumnFamilyDescriptor>* cf_descs, - bool ignore_unknown_options) { + bool ignore_unknown_options, + std::shared_ptr<Cache>* cache) { std::string options_file_name; Status s = GetLatestOptionsFileName(dbpath, env, &options_file_name); if (!s.ok()) { return s; } - return LoadOptionsFromFile(dbpath + "/" + options_file_name, env, db_options, - cf_descs, ignore_unknown_options); + return LoadOptionsFromFile(dbpath + "/" + options_file_name, env, db_options, + cf_descs, ignore_unknown_options, cache); } Status CheckOptionsCompatibility( diff --git a/ceph/src/rocksdb/utilities/options/options_util_test.cc b/ceph/src/rocksdb/utilities/options/options_util_test.cc index bf830190c..ed7bfdfd6 100644 --- a/ceph/src/rocksdb/utilities/options/options_util_test.cc +++ b/ceph/src/rocksdb/utilities/options/options_util_test.cc @@ -94,36 +94,82 @@ TEST_F(OptionsUtilTest, SaveAndLoad) { } } +TEST_F(OptionsUtilTest, SaveAndLoadWithCacheCheck) { + // creating db + DBOptions db_opt; + db_opt.create_if_missing = true; + // initialize BlockBasedTableOptions + std::shared_ptr<Cache> cache = NewLRUCache(1 * 1024); + BlockBasedTableOptions bbt_opts; + bbt_opts.block_size = 32 * 1024; + // saving cf options + std::vector<ColumnFamilyOptions> cf_opts; + ColumnFamilyOptions default_column_family_opt = ColumnFamilyOptions(); + default_column_family_opt.table_factory.reset( + NewBlockBasedTableFactory(bbt_opts)); + cf_opts.push_back(default_column_family_opt); + + ColumnFamilyOptions cf_opt_sample
= ColumnFamilyOptions(); + cf_opt_sample.table_factory.reset(NewBlockBasedTableFactory(bbt_opts)); + cf_opts.push_back(cf_opt_sample); + + ColumnFamilyOptions cf_opt_plain_table_opt = ColumnFamilyOptions(); + cf_opt_plain_table_opt.table_factory.reset(NewPlainTableFactory()); + cf_opts.push_back(cf_opt_plain_table_opt); + + std::vector cf_names; + cf_names.push_back(kDefaultColumnFamilyName); + cf_names.push_back("cf_sample"); + cf_names.push_back("cf_plain_table_sample"); + // Saving DB in file + const std::string kFileName = "OPTIONS-LOAD_CACHE_123456"; + PersistRocksDBOptions(db_opt, cf_names, cf_opts, kFileName, env_.get()); + DBOptions loaded_db_opt; + std::vector loaded_cf_descs; + ASSERT_OK(LoadOptionsFromFile(kFileName, env_.get(), &loaded_db_opt, + &loaded_cf_descs, false, &cache)); + for (size_t i = 0; i < loaded_cf_descs.size(); i++) { + if (IsBlockBasedTableFactory(cf_opts[i].table_factory.get())) { + auto* loaded_bbt_opt = reinterpret_cast( + loaded_cf_descs[i].options.table_factory->GetOptions()); + // Expect the same cache will be loaded + if (loaded_bbt_opt != nullptr) { + ASSERT_EQ(loaded_bbt_opt->block_cache.get(), cache.get()); + } + } + } +} + namespace { class DummyTableFactory : public TableFactory { public: DummyTableFactory() {} - virtual ~DummyTableFactory() {} + ~DummyTableFactory() override {} - virtual const char* Name() const override { return "DummyTableFactory"; } + const char* Name() const override { return "DummyTableFactory"; } - virtual Status NewTableReader( + Status NewTableReader( const TableReaderOptions& /*table_reader_options*/, - unique_ptr&& /*file*/, uint64_t /*file_size*/, - unique_ptr* /*table_reader*/, + std::unique_ptr&& /*file*/, + uint64_t /*file_size*/, std::unique_ptr* /*table_reader*/, bool /*prefetch_index_and_filter_in_cache*/) const override { return Status::NotSupported(); } - virtual TableBuilder* NewTableBuilder( + TableBuilder* NewTableBuilder( const TableBuilderOptions& /*table_builder_options*/, uint32_t /*column_family_id*/, WritableFileWriter* /*file*/) const override { return nullptr; } - virtual Status SanitizeOptions( + Status SanitizeOptions( const DBOptions& /*db_opts*/, const ColumnFamilyOptions& /*cf_opts*/) const override { return Status::NotSupported(); } - virtual std::string GetPrintableTableOptions() const override { return ""; } + std::string GetPrintableTableOptions() const override { return ""; } Status GetOptionString(std::string* /*opt_string*/, const std::string& /*delimiter*/) const override { @@ -134,39 +180,39 @@ class DummyTableFactory : public TableFactory { class DummyMergeOperator : public MergeOperator { public: DummyMergeOperator() {} - virtual ~DummyMergeOperator() {} + ~DummyMergeOperator() override {} - virtual bool FullMergeV2(const MergeOperationInput& /*merge_in*/, - MergeOperationOutput* /*merge_out*/) const override { + bool FullMergeV2(const MergeOperationInput& /*merge_in*/, + MergeOperationOutput* /*merge_out*/) const override { return false; } - virtual bool PartialMergeMulti(const Slice& /*key*/, - const std::deque& /*operand_list*/, - std::string* /*new_value*/, - Logger* /*logger*/) const override { + bool PartialMergeMulti(const Slice& /*key*/, + const std::deque& /*operand_list*/, + std::string* /*new_value*/, + Logger* /*logger*/) const override { return false; } - virtual const char* Name() const override { return "DummyMergeOperator"; } + const char* Name() const override { return "DummyMergeOperator"; } }; class DummySliceTransform : public SliceTransform { public: 
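The point of the cache parameter threaded through LoadOptionsFromFile and LoadLatestOptions above is sharing: every loaded BlockBasedTableOptions is pointed at one caller-supplied block cache instead of each column family allocating its own. A short caller-side sketch, assuming a database that has already persisted an OPTIONS file; the path and the 64 MB cache size are illustrative:

```cpp
// Sketch only: the signature matches the LoadLatestOptions overload added in
// the options_util.cc hunk above.
#include "rocksdb/cache.h"
#include "rocksdb/env.h"
#include "rocksdb/utilities/options_util.h"

#include <cassert>
#include <memory>
#include <string>
#include <vector>

void LoadDescriptorsWithSharedCache(const std::string& dbpath) {
  std::shared_ptr<rocksdb::Cache> cache = rocksdb::NewLRUCache(64 << 20);

  rocksdb::DBOptions db_options;
  std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
  // After this call, every block-based column family in cf_descs points at
  // the single LRU cache created above.
  assert(rocksdb::LoadLatestOptions(dbpath, rocksdb::Env::Default(),
                                    &db_options, &cf_descs,
                                    /*ignore_unknown_options=*/false, &cache)
             .ok());
}
```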
DummySliceTransform() {} - virtual ~DummySliceTransform() {} + ~DummySliceTransform() override {} // Return the name of this transformation. - virtual const char* Name() const { return "DummySliceTransform"; } + const char* Name() const override { return "DummySliceTransform"; } // transform a src in domain to a dst in the range - virtual Slice Transform(const Slice& src) const { return src; } + Slice Transform(const Slice& src) const override { return src; } // determine whether this is a valid src upon the function applies - virtual bool InDomain(const Slice& /*src*/) const { return false; } + bool InDomain(const Slice& /*src*/) const override { return false; } // determine whether dst=Transform(src) for some src - virtual bool InRange(const Slice& /*dst*/) const { return false; } + bool InRange(const Slice& /*dst*/) const override { return false; } }; } // namespace diff --git a/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.cc b/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.cc index 1ebf8ae6b..f7f72df6d 100644 --- a/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.cc +++ b/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.cc @@ -263,7 +263,7 @@ Status BlockCacheTier::InsertImpl(const Slice& key, const Slice& data) { return Status::OK(); } -Status BlockCacheTier::Lookup(const Slice& key, unique_ptr* val, +Status BlockCacheTier::Lookup(const Slice& key, std::unique_ptr* val, size_t* size) { StopWatchNano timer(opt_.env, /*auto_start=*/ true); @@ -287,7 +287,7 @@ Status BlockCacheTier::Lookup(const Slice& key, unique_ptr* val, assert(file->refs_); - unique_ptr scratch(new char[lba.size_]); + std::unique_ptr scratch(new char[lba.size_]); Slice blk_key; Slice blk_val; @@ -369,7 +369,7 @@ bool BlockCacheTier::Reserve(const size_t size) { const double retain_fac = (100 - kEvictPct) / static_cast(100); while (size + size_ > opt_.cache_size * retain_fac) { - unique_ptr f(metadata_.Evict()); + std::unique_ptr f(metadata_.Evict()); if (!f) { // nothing is evictable return false; diff --git a/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.h b/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.h index 2b2c0ef4f..670463a87 100644 --- a/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.h +++ b/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier.h @@ -47,7 +47,7 @@ class BlockCacheTier : public PersistentCacheTier { insert_ops_(static_cast(opt_.max_write_pipeline_backlog_size)), buffer_allocator_(opt.write_buffer_size, opt.write_buffer_count()), writer_(this, opt_.writer_qdepth, static_cast(opt_.writer_dispatch_size)) { - Info(opt_.log, "Initializing allocator. size=%d B count=%d", + Info(opt_.log, "Initializing allocator. 
size=%d B count=%" ROCKSDB_PRIszt, opt_.write_buffer_size, opt_.write_buffer_count()); } @@ -100,7 +100,7 @@ class BlockCacheTier : public PersistentCacheTier { std::string key_; std::string data_; - const bool signal_ = false; // signal to request processing thread to exit + bool signal_ = false; // signal to request processing thread to exit }; // entry point for insert thread diff --git a/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier_file.h b/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier_file.h index ef5dbab04..b7f820b06 100644 --- a/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier_file.h +++ b/ceph/src/rocksdb/utilities/persistent_cache/block_cache_tier_file.h @@ -149,7 +149,7 @@ class RandomAccessCacheFile : public BlockCacheFile { public: explicit RandomAccessCacheFile(Env* const env, const std::string& dir, const uint32_t cache_id, - const shared_ptr& log) + const std::shared_ptr& log) : BlockCacheFile(env, dir, cache_id), log_(log) {} virtual ~RandomAccessCacheFile() {} @@ -265,11 +265,11 @@ class ThreadedWriter : public Writer { IO& operator=(const IO&) = default; size_t Size() const { return sizeof(IO); } - WritableFile* file_ = nullptr; // File to write to - CacheWriteBuffer* const buf_ = nullptr; // buffer to write - uint64_t file_off_ = 0; // file offset - bool signal_ = false; // signal to exit thread loop - std::function callback_; // Callback on completion + WritableFile* file_ = nullptr; // File to write to + CacheWriteBuffer* buf_ = nullptr; // buffer to write + uint64_t file_off_ = 0; // file offset + bool signal_ = false; // signal to exit thread loop + std::function callback_; // Callback on completion }; explicit ThreadedWriter(PersistentCacheTier* const cache, const size_t qdepth, diff --git a/ceph/src/rocksdb/utilities/persistent_cache/hash_table_test.cc b/ceph/src/rocksdb/utilities/persistent_cache/hash_table_test.cc index 6fe5a5965..d6ff3e68e 100644 --- a/ceph/src/rocksdb/utilities/persistent_cache/hash_table_test.cc +++ b/ceph/src/rocksdb/utilities/persistent_cache/hash_table_test.cc @@ -20,7 +20,7 @@ namespace rocksdb { struct HashTableTest : public testing::Test { - ~HashTableTest() { map_.Clear(&HashTableTest::ClearNode); } + ~HashTableTest() override { map_.Clear(&HashTableTest::ClearNode); } struct Node { Node() {} @@ -49,7 +49,9 @@ struct HashTableTest : public testing::Test { }; struct EvictableHashTableTest : public testing::Test { - ~EvictableHashTableTest() { map_.Clear(&EvictableHashTableTest::ClearNode); } + ~EvictableHashTableTest() override { + map_.Clear(&EvictableHashTableTest::ClearNode); + } struct Node : LRUElement { Node() {} diff --git a/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_bench.cc b/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_bench.cc index 7d26c3a7d..64d75c7a5 100644 --- a/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_bench.cc +++ b/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_bench.cc @@ -251,7 +251,7 @@ class CacheTierBenchmark { // create data for a key by filling with a certain pattern std::unique_ptr NewBlock(const uint64_t val) { - unique_ptr data(new char[FLAGS_iosize]); + std::unique_ptr data(new char[FLAGS_iosize]); memset(data.get(), val % 255, FLAGS_iosize); return data; } diff --git a/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_test.h b/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_test.h index 37e842f2e..ad99ea864 100644 --- 
a/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_test.h +++ b/ceph/src/rocksdb/utilities/persistent_cache/persistent_cache_test.h @@ -157,7 +157,7 @@ class PersistentCacheTierTest : public testing::Test { memset(edata, '0' + (i % 10), sizeof(edata)); auto k = prefix + PaddedNumber(i, /*count=*/8); Slice key(k); - unique_ptr block; + std::unique_ptr block; size_t block_size; if (eviction_enabled) { @@ -210,7 +210,7 @@ class PersistentCacheTierTest : public testing::Test { } const std::string path_; - shared_ptr log_; + std::shared_ptr log_; std::shared_ptr cache_; std::atomic key_{0}; size_t max_keys_ = 0; diff --git a/ceph/src/rocksdb/utilities/redis/README b/ceph/src/rocksdb/utilities/redis/README deleted file mode 100644 index 8b17bc05a..000000000 --- a/ceph/src/rocksdb/utilities/redis/README +++ /dev/null @@ -1,14 +0,0 @@ -This folder defines a REDIS-style interface for Rocksdb. -Right now it is written as a simple tag-on in the rocksdb::RedisLists class. -It implements Redis Lists, and supports only the "non-blocking operations". - -Internally, the set of lists are stored in a rocksdb database, mapping keys to -values. Each "value" is the list itself, storing a sequence of "elements". -Each element is stored as a 32-bit-integer, followed by a sequence of bytes. -The 32-bit-integer represents the length of the element (that is, the number -of bytes that follow). And then that many bytes follow. - - -NOTE: This README file may be old. See the actual redis_lists.cc file for -definitive details on the implementation. There should be a header at the top -of that file, explaining a bit of the implementation details. diff --git a/ceph/src/rocksdb/utilities/redis/redis_list_exception.h b/ceph/src/rocksdb/utilities/redis/redis_list_exception.h deleted file mode 100644 index bc2b39a31..000000000 --- a/ceph/src/rocksdb/utilities/redis/redis_list_exception.h +++ /dev/null @@ -1,22 +0,0 @@ -/** - * A simple structure for exceptions in RedisLists. - * - * @author Deon Nicholas (dnicholas@fb.com) - * Copyright 2013 Facebook - */ - -#pragma once -#ifndef ROCKSDB_LITE -#include - -namespace rocksdb { - -class RedisListException: public std::exception { - public: - const char* what() const throw() override { - return "Invalid operation or corrupt data in Redis List."; - } -}; - -} // namespace rocksdb -#endif diff --git a/ceph/src/rocksdb/utilities/redis/redis_list_iterator.h b/ceph/src/rocksdb/utilities/redis/redis_list_iterator.h deleted file mode 100644 index 7bfe20690..000000000 --- a/ceph/src/rocksdb/utilities/redis/redis_list_iterator.h +++ /dev/null @@ -1,309 +0,0 @@ -// Copyright 2013 Facebook -/** - * RedisListIterator: - * An abstraction over the "list" concept (e.g.: for redis lists). - * Provides functionality to read, traverse, edit, and write these lists. - * - * Upon construction, the RedisListIterator is given a block of list data. - * Internally, it stores a pointer to the data and a pointer to current item. - * It also stores a "result" list that will be mutated over time. - * - * Traversal and mutation are done by "forward iteration". - * The Push() and Skip() methods will advance the iterator to the next item. - * However, Push() will also "write the current item to the result". - * Skip() will simply move to next item, causing current item to be dropped. - * - * Upon completion, the result (accessible by WriteResult()) will be saved. - * All "skipped" items will be gone; all "pushed" items will remain. 
- * - * @throws Any of the operations may throw a RedisListException if an invalid - * operation is performed or if the data is found to be corrupt. - * - * @notes By default, if WriteResult() is called part-way through iteration, - * it will automatically advance the iterator to the end, and Keep() - * all items that haven't been traversed yet. This may be subject - * to review. - * - * @notes Can access the "current" item via GetCurrent(), and other - * list-specific information such as Length(). - * - * @notes The internal representation is due to change at any time. Presently, - * the list is represented as follows: - * - 32-bit integer header: the number of items in the list - * - For each item: - * - 32-bit int (n): the number of bytes representing this item - * - n bytes of data: the actual data. - * - * @author Deon Nicholas (dnicholas@fb.com) - */ - -#pragma once -#ifndef ROCKSDB_LITE - -#include - -#include "redis_list_exception.h" -#include "rocksdb/slice.h" -#include "util/coding.h" - -namespace rocksdb { - -/// An abstraction over the "list" concept. -/// All operations may throw a RedisListException -class RedisListIterator { - public: - /// Construct a redis-list-iterator based on data. - /// If the data is non-empty, it must formatted according to @notes above. - /// - /// If the data is valid, we can assume the following invariant(s): - /// a) length_, num_bytes_ are set correctly. - /// b) cur_byte_ always refers to the start of the current element, - /// just before the bytes that specify element length. - /// c) cur_elem_ is always the index of the current element. - /// d) cur_elem_length_ is always the number of bytes in current element, - /// excluding the 4-byte header itself. - /// e) result_ will always contain data_[0..cur_byte_) and a header - /// f) Whenever corrupt data is encountered or an invalid operation is - /// attempted, a RedisListException will immediately be thrown. - explicit RedisListIterator(const std::string& list_data) - : data_(list_data.data()), - num_bytes_(static_cast(list_data.size())), - cur_byte_(0), - cur_elem_(0), - cur_elem_length_(0), - length_(0), - result_() { - // Initialize the result_ (reserve enough space for header) - InitializeResult(); - - // Parse the data only if it is not empty. - if (num_bytes_ == 0) { - return; - } - - // If non-empty, but less than 4 bytes, data must be corrupt - if (num_bytes_ < sizeof(length_)) { - ThrowError("Corrupt header."); // Will break control flow - } - - // Good. The first bytes specify the number of elements - length_ = DecodeFixed32(data_); - cur_byte_ = sizeof(length_); - - // If we have at least one element, point to that element. - // Also, read the first integer of the element (specifying the size), - // if possible. - if (length_ > 0) { - if (cur_byte_ + sizeof(cur_elem_length_) <= num_bytes_) { - cur_elem_length_ = DecodeFixed32(data_+cur_byte_); - } else { - ThrowError("Corrupt data for first element."); - } - } - - // At this point, we are fully set-up. - // The invariants described in the header should now be true. - } - - /// Reserve some space for the result_. - /// Equivalent to result_.reserve(bytes). - void Reserve(int bytes) { - result_.reserve(bytes); - } - - /// Go to next element in data file. - /// Also writes the current element to result_. - RedisListIterator& Push() { - WriteCurrentElement(); - MoveNext(); - return *this; - } - - /// Go to next element in data file. - /// Drops/skips the current element. It will not be written to result_. 
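The @notes block above pins down the serialized layout this iterator walks: a 32-bit count header, then each element as a 32-bit byte length followed by the payload, all fixed-width via EncodeFixed32/DecodeFixed32. A tiny encoder sketch makes the layout concrete:

```cpp
// Sketch only: produces exactly the representation RedisListIterator parses;
// PutFixed32 is the util/coding.h counterpart of the DecodeFixed32 calls here.
#include "util/coding.h"

#include <string>
#include <vector>

std::string EncodeRedisList(const std::vector<std::string>& elems) {
  std::string out;
  rocksdb::PutFixed32(&out, static_cast<uint32_t>(elems.size()));  // count header
  for (const std::string& e : elems) {
    rocksdb::PutFixed32(&out, static_cast<uint32_t>(e.size()));    // element length
    out.append(e);                                                 // element bytes
  }
  return out;
}

// EncodeRedisList({"ab", "c"}) is 15 bytes: [2][2]"ab"[1]"c".
```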
- RedisListIterator& Skip() { - MoveNext(); - --length_; // One less item - --cur_elem_; // We moved one forward, but index did not change - return *this; - } - - /// Insert elem into the result_ (just BEFORE the current element / byte) - /// Note: if Done() (i.e.: iterator points to end), this will append elem. - void InsertElement(const Slice& elem) { - // Ensure we are in a valid state - CheckErrors(); - - const int kOrigSize = static_cast(result_.size()); - result_.resize(kOrigSize + SizeOf(elem)); - EncodeFixed32(result_.data() + kOrigSize, - static_cast(elem.size())); - memcpy(result_.data() + kOrigSize + sizeof(uint32_t), elem.data(), - elem.size()); - ++length_; - ++cur_elem_; - } - - /// Access the current element, and save the result into *curElem - void GetCurrent(Slice* curElem) { - // Ensure we are in a valid state - CheckErrors(); - - // Ensure that we are not past the last element. - if (Done()) { - ThrowError("Invalid dereferencing."); - } - - // Dereference the element - *curElem = Slice(data_+cur_byte_+sizeof(cur_elem_length_), - cur_elem_length_); - } - - // Number of elements - int Length() const { - return length_; - } - - // Number of bytes in the final representation (i.e: WriteResult().size()) - int Size() const { - // result_ holds the currently written data - // data_[cur_byte..num_bytes-1] is the remainder of the data - return static_cast(result_.size() + (num_bytes_ - cur_byte_)); - } - - // Reached the end? - bool Done() const { - return cur_byte_ >= num_bytes_ || cur_elem_ >= length_; - } - - /// Returns a string representing the final, edited, data. - /// Assumes that all bytes of data_ in the range [0,cur_byte_) have been read - /// and that result_ contains this data. - /// The rest of the data must still be written. - /// So, this method ADVANCES THE ITERATOR TO THE END before writing. - Slice WriteResult() { - CheckErrors(); - - // The header should currently be filled with dummy data (0's) - // Correctly update the header. - // Note, this is safe since result_ is a vector (guaranteed contiguous) - EncodeFixed32(&result_[0],length_); - - // Append the remainder of the data to the result. - result_.insert(result_.end(),data_+cur_byte_, data_ +num_bytes_); - - // Seek to end of file - cur_byte_ = num_bytes_; - cur_elem_ = length_; - cur_elem_length_ = 0; - - // Return the result - return Slice(result_.data(),result_.size()); - } - - public: // Static public functions - - /// An upper-bound on the amount of bytes needed to store this element. - /// This is used to hide representation information from the client. - /// E.G. This can be used to compute the bytes we want to Reserve(). - static uint32_t SizeOf(const Slice& elem) { - // [Integer Length . Data] - return static_cast(sizeof(uint32_t) + elem.size()); - } - - private: // Private functions - - /// Initializes the result_ string. - /// It will fill the first few bytes with 0's so that there is - /// enough space for header information when we need to write later. - /// Currently, "header information" means: the length (number of elements) - /// Assumes that result_ is empty to begin with - void InitializeResult() { - assert(result_.empty()); // Should always be true. 
- result_.resize(sizeof(uint32_t),0); // Put a block of 0's as the header - } - - /// Go to the next element (used in Push() and Skip()) - void MoveNext() { - CheckErrors(); - - // Check to make sure we are not already in a finished state - if (Done()) { - ThrowError("Attempting to iterate past end of list."); - } - - // Move forward one element. - cur_byte_ += sizeof(cur_elem_length_) + cur_elem_length_; - ++cur_elem_; - - // If we are at the end, finish - if (Done()) { - cur_elem_length_ = 0; - return; - } - - // Otherwise, we should be able to read the new element's length - if (cur_byte_ + sizeof(cur_elem_length_) > num_bytes_) { - ThrowError("Corrupt element data."); - } - - // Set the new element's length - cur_elem_length_ = DecodeFixed32(data_+cur_byte_); - - return; - } - - /// Append the current element (pointed to by cur_byte_) to result_ - /// Assumes result_ has already been reserved appropriately. - void WriteCurrentElement() { - // First verify that the iterator is still valid. - CheckErrors(); - if (Done()) { - ThrowError("Attempting to write invalid element."); - } - - // Append the cur element. - result_.insert(result_.end(), - data_+cur_byte_, - data_+cur_byte_+ sizeof(uint32_t) + cur_elem_length_); - } - - /// Will ThrowError() if necessary. - /// Checks for common/ubiquitous errors that can arise after most operations. - /// This method should be called before any reading operation. - /// If this function succeeds, then we are guaranteed to be in a valid state. - /// Other member functions should check for errors and ThrowError() also - /// if an error occurs that is specific to it even while in a valid state. - void CheckErrors() { - // Check if any crazy thing has happened recently - if ((cur_elem_ > length_) || // Bad index - (cur_byte_ > num_bytes_) || // No more bytes - (cur_byte_ + cur_elem_length_ > num_bytes_) || // Item too large - (cur_byte_ == num_bytes_ && cur_elem_ != length_) || // Too many items - (cur_elem_ == length_ && cur_byte_ != num_bytes_)) { // Too many bytes - ThrowError("Corrupt data."); - } - } - - /// Will throw an exception based on the passed-in message. - /// This function is guaranteed to STOP THE CONTROL-FLOW. - /// (i.e.: you do not have to call "return" after calling ThrowError) - void ThrowError(const char* const /*msg*/ = nullptr) { - // TODO: For now we ignore the msg parameter. This can be expanded later. - throw RedisListException(); - } - - private: - const char* const data_; // A pointer to the data (the first byte) - const uint32_t num_bytes_; // The number of bytes in this list - - uint32_t cur_byte_; // The current byte being read - uint32_t cur_elem_; // The current element being read - uint32_t cur_elem_length_; // The number of bytes in current element - - uint32_t length_; // The number of elements in this list - std::vector result_; // The output data -}; - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/redis/redis_lists.cc b/ceph/src/rocksdb/utilities/redis/redis_lists.cc deleted file mode 100644 index 3ba7470ec..000000000 --- a/ceph/src/rocksdb/utilities/redis/redis_lists.cc +++ /dev/null @@ -1,552 +0,0 @@ -// Copyright 2013 Facebook -/** - * A (persistent) Redis API built using the rocksdb backend. - * Implements Redis Lists as described on: http://redis.io/commands#list - * - * @throws All functions may throw a RedisListException on error/corruption. - * - * @notes Internally, the set of lists is stored in a rocksdb database, - * mapping keys to values. 
Each "value" is the list itself, storing - * some kind of internal representation of the data. All the - * representation details are handled by the RedisListIterator class. - * The present file should be oblivious to the representation details, - * handling only the client (Redis) API, and the calls to rocksdb. - * - * @TODO Presently, all operations take at least O(NV) time where - * N is the number of elements in the list, and V is the average - * number of bytes per value in the list. So maybe, with merge operator - * we can improve this to an optimal O(V) amortized time, since we - * wouldn't have to read and re-write the entire list. - * - * @author Deon Nicholas (dnicholas@fb.com) - */ - -#ifndef ROCKSDB_LITE -#include "redis_lists.h" - -#include -#include -#include - -#include "rocksdb/slice.h" -#include "util/coding.h" - -namespace rocksdb -{ - -/// Constructors - -RedisLists::RedisLists(const std::string& db_path, - Options options, bool destructive) - : put_option_(), - get_option_() { - - // Store the name of the database - db_name_ = db_path; - - // If destructive, destroy the DB before re-opening it. - if (destructive) { - DestroyDB(db_name_, Options()); - } - - // Now open and deal with the db - DB* db; - Status s = DB::Open(options, db_name_, &db); - if (!s.ok()) { - std::cerr << "ERROR " << s.ToString() << std::endl; - assert(false); - } - - db_ = std::unique_ptr(db); -} - - -/// Accessors - -// Number of elements in the list associated with key -// : throws RedisListException -int RedisLists::Length(const std::string& key) { - // Extract the string data representing the list. - std::string data; - db_->Get(get_option_, key, &data); - - // Return the length - RedisListIterator it(data); - return it.Length(); -} - -// Get the element at the specified index in the (list: key) -// Returns ("") on out-of-bounds -// : throws RedisListException -bool RedisLists::Index(const std::string& key, int32_t index, - std::string* result) { - // Extract the string data representing the list. - std::string data; - db_->Get(get_option_, key, &data); - - // Handle REDIS negative indices (from the end); fast iff Length() takes O(1) - if (index < 0) { - index = Length(key) - (-index); //replace (-i) with (N-i). - } - - // Iterate through the list until the desired index is found. - int curIndex = 0; - RedisListIterator it(data); - while(curIndex < index && !it.Done()) { - ++curIndex; - it.Skip(); - } - - // If we actually found the index - if (curIndex == index && !it.Done()) { - Slice elem; - it.GetCurrent(&elem); - if (result != nullptr) { - *result = elem.ToString(); - } - - return true; - } else { - return false; - } -} - -// Return a truncated version of the list. -// First, negative values for first/last are interpreted as "end of list". -// So, if first == -1, then it is re-set to index: (Length(key) - 1) -// Then, return exactly those indices i such that first <= i <= last. -// : throws RedisListException -std::vector RedisLists::Range(const std::string& key, - int32_t first, int32_t last) { - // Extract the string data representing the list. - std::string data; - db_->Get(get_option_, key, &data); - - // Handle negative bounds (-1 means last element, etc.) 
- int listLen = Length(key); - if (first < 0) { - first = listLen - (-first); // Replace (-x) with (N-x) - } - if (last < 0) { - last = listLen - (-last); - } - - // Verify bounds (and truncate the range so that it is valid) - first = std::max(first, 0); - last = std::min(last, listLen-1); - int len = std::max(last-first+1, 0); - - // Initialize the resulting list - std::vector result(len); - - // Traverse the list and update the vector - int curIdx = 0; - Slice elem; - for (RedisListIterator it(data); !it.Done() && curIdx<=last; it.Skip()) { - if (first <= curIdx && curIdx <= last) { - it.GetCurrent(&elem); - result[curIdx-first].assign(elem.data(),elem.size()); - } - - ++curIdx; - } - - // Return the result. Might be empty - return result; -} - -// Print the (list: key) out to stdout. For debugging mostly. Public for now. -void RedisLists::Print(const std::string& key) { - // Extract the string data representing the list. - std::string data; - db_->Get(get_option_, key, &data); - - // Iterate through the list and print the items - Slice elem; - for (RedisListIterator it(data); !it.Done(); it.Skip()) { - it.GetCurrent(&elem); - std::cout << "ITEM " << elem.ToString() << std::endl; - } - - //Now print the byte data - RedisListIterator it(data); - std::cout << "==Printing data==" << std::endl; - std::cout << data.size() << std::endl; - std::cout << it.Size() << " " << it.Length() << std::endl; - Slice result = it.WriteResult(); - std::cout << result.data() << std::endl; - if (true) { - std::cout << "size: " << result.size() << std::endl; - const char* val = result.data(); - for(int i=0; i<(int)result.size(); ++i) { - std::cout << (int)val[i] << " " << (val[i]>=32?val[i]:' ') << std::endl; - } - std::cout << std::endl; - } -} - -/// Insert/Update Functions -/// Note: The "real" insert function is private. See below. - -// InsertBefore and InsertAfter are simply wrappers around the Insert function. -int RedisLists::InsertBefore(const std::string& key, const std::string& pivot, - const std::string& value) { - return Insert(key, pivot, value, false); -} - -int RedisLists::InsertAfter(const std::string& key, const std::string& pivot, - const std::string& value) { - return Insert(key, pivot, value, true); -} - -// Prepend value onto beginning of (list: key) -// : throws RedisListException -int RedisLists::PushLeft(const std::string& key, const std::string& value) { - // Get the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Construct the result - RedisListIterator it(data); - it.Reserve(it.Size() + it.SizeOf(value)); - it.InsertElement(value); - - // Push the data back to the db and return the length - db_->Put(put_option_, key, it.WriteResult()); - return it.Length(); -} - -// Append value onto end of (list: key) -// TODO: Make this O(1) time. Might require MergeOperator. -// : throws RedisListException -int RedisLists::PushRight(const std::string& key, const std::string& value) { - // Get the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Create an iterator to the data and seek to the end. - RedisListIterator it(data); - it.Reserve(it.Size() + it.SizeOf(value)); - while (!it.Done()) { - it.Push(); // Write each element as we go - } - - // Insert the new element at the current position (the end) - it.InsertElement(value); - - // Push it back to the db, and return length - db_->Put(put_option_, key, it.WriteResult()); - return it.Length(); -} - -// Set (list: key)[idx] = val. Return true on success, false on fail. 
-// : throws RedisListException -bool RedisLists::Set(const std::string& key, int32_t index, - const std::string& value) { - // Get the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Handle negative index for REDIS (meaning -index from end of list) - if (index < 0) { - index = Length(key) - (-index); - } - - // Iterate through the list until we find the element we want - int curIndex = 0; - RedisListIterator it(data); - it.Reserve(it.Size() + it.SizeOf(value)); // Over-estimate is fine - while(curIndex < index && !it.Done()) { - it.Push(); - ++curIndex; - } - - // If not found, return false (this occurs when index was invalid) - if (it.Done() || curIndex != index) { - return false; - } - - // Write the new element value, and drop the previous element value - it.InsertElement(value); - it.Skip(); - - // Write the data to the database - // Check status, since it needs to return true/false guarantee - Status s = db_->Put(put_option_, key, it.WriteResult()); - - // Success - return s.ok(); -} - -/// Delete / Remove / Pop functions - -// Trim (list: key) so that it will only contain the indices from start..stop -// Invalid indices will not generate an error, just empty, -// or the portion of the list that fits in this interval -// : throws RedisListException -bool RedisLists::Trim(const std::string& key, int32_t start, int32_t stop) { - // Get the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Handle negative indices in REDIS - int listLen = Length(key); - if (start < 0) { - start = listLen - (-start); - } - if (stop < 0) { - stop = listLen - (-stop); - } - - // Truncate bounds to only fit in the list - start = std::max(start, 0); - stop = std::min(stop, listLen-1); - - // Construct an iterator for the list. Drop all undesired elements. - int curIndex = 0; - RedisListIterator it(data); - it.Reserve(it.Size()); // Over-estimate - while(!it.Done()) { - // If not within the range, just skip the item (drop it). - // Otherwise, continue as usual. - if (start <= curIndex && curIndex <= stop) { - it.Push(); - } else { - it.Skip(); - } - - // Increment the current index - ++curIndex; - } - - // Write the (possibly empty) result to the database - Status s = db_->Put(put_option_, key, it.WriteResult()); - - // Return true as long as the write succeeded - return s.ok(); -} - -// Return and remove the first element in the list (or "" if empty) -// : throws RedisListException -bool RedisLists::PopLeft(const std::string& key, std::string* result) { - // Get the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Point to first element in the list (if it exists), and get its value/size - RedisListIterator it(data); - if (it.Length() > 0) { // Proceed only if list is non-empty - Slice elem; - it.GetCurrent(&elem); // Store the value of the first element - it.Reserve(it.Size() - it.SizeOf(elem)); - it.Skip(); // DROP the first item and move to next - - // Update the db - db_->Put(put_option_, key, it.WriteResult()); - - // Return the value - if (result != nullptr) { - *result = elem.ToString(); - } - return true; - } else { - return false; - } -} - -// Remove and return the last element in the list (or "" if empty) -// TODO: Make this O(1). Might require MergeOperator. 
-// : throws RedisListException -bool RedisLists::PopRight(const std::string& key, std::string* result) { - // Extract the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Construct an iterator to the data and move to last element - RedisListIterator it(data); - it.Reserve(it.Size()); - int len = it.Length(); - int curIndex = 0; - while(curIndex < (len-1) && !it.Done()) { - it.Push(); - ++curIndex; - } - - // Extract and drop/skip the last element - if (curIndex == len-1) { - assert(!it.Done()); // Sanity check. Should not have ended here. - - // Extract and pop the element - Slice elem; - it.GetCurrent(&elem); // Save value of element. - it.Skip(); // Skip the element - - // Write the result to the database - db_->Put(put_option_, key, it.WriteResult()); - - // Return the value - if (result != nullptr) { - *result = elem.ToString(); - } - return true; - } else { - // Must have been an empty list - assert(it.Done() && len==0 && curIndex == 0); - return false; - } -} - -// Remove the (first or last) "num" occurrences of value in (list: key) -// : throws RedisListException -int RedisLists::Remove(const std::string& key, int32_t num, - const std::string& value) { - // Negative num ==> RemoveLast; Positive num ==> Remove First - if (num < 0) { - return RemoveLast(key, -num, value); - } else if (num > 0) { - return RemoveFirst(key, num, value); - } else { - return RemoveFirst(key, Length(key), value); - } -} - -// Remove the first "num" occurrences of value in (list: key). -// : throws RedisListException -int RedisLists::RemoveFirst(const std::string& key, int32_t num, - const std::string& value) { - // Ensure that the number is positive - assert(num >= 0); - - // Extract the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Traverse the list, appending all but the desired occurrences of value - int numSkipped = 0; // Keep track of the number of times value is seen - Slice elem; - RedisListIterator it(data); - it.Reserve(it.Size()); - while (!it.Done()) { - it.GetCurrent(&elem); - - if (elem == value && numSkipped < num) { - // Drop this item if desired - it.Skip(); - ++numSkipped; - } else { - // Otherwise keep the item and proceed as normal - it.Push(); - } - } - - // Put the result back to the database - db_->Put(put_option_, key, it.WriteResult()); - - // Return the number of elements removed - return numSkipped; -} - - -// Remove the last "num" occurrences of value in (list: key). -// TODO: I traverse the list 2x. Make faster. Might require MergeOperator. -// : throws RedisListException -int RedisLists::RemoveLast(const std::string& key, int32_t num, - const std::string& value) { - // Ensure that the number is positive - assert(num >= 0); - - // Extract the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Temporary variable to hold the "current element" in the blocks below - Slice elem; - - // Count the total number of occurrences of value - int totalOccs = 0; - for (RedisListIterator it(data); !it.Done(); it.Skip()) { - it.GetCurrent(&elem); - if (elem == value) { - ++totalOccs; - } - } - - // Construct an iterator to the data. Reserve enough space for the result. - RedisListIterator it(data); - int bytesRemoved = std::min(num,totalOccs)*it.SizeOf(value); - it.Reserve(it.Size() - bytesRemoved); - - // Traverse the list, appending all but the desired occurrences of value. 
- // Note: "Drop the last k occurrences" is equivalent to - // "keep only the first n-k occurrences", where n is total occurrences. - int numKept = 0; // Keep track of the number of times value is kept - while(!it.Done()) { - it.GetCurrent(&elem); - - // If we are within the deletion range and equal to value, drop it. - // Otherwise, append/keep/push it. - if (elem == value) { - if (numKept < totalOccs - num) { - it.Push(); - ++numKept; - } else { - it.Skip(); - } - } else { - // Always append the others - it.Push(); - } - } - - // Put the result back to the database - db_->Put(put_option_, key, it.WriteResult()); - - // Return the number of elements removed - return totalOccs - numKept; -} - -/// Private functions - -// Insert element value into (list: key), right before/after -// the first occurrence of pivot -// : throws RedisListException -int RedisLists::Insert(const std::string& key, const std::string& pivot, - const std::string& value, bool insert_after) { - // Get the original list data - std::string data; - db_->Get(get_option_, key, &data); - - // Construct an iterator to the data and reserve enough space for result. - RedisListIterator it(data); - it.Reserve(it.Size() + it.SizeOf(value)); - - // Iterate through the list until we find the element we want - Slice elem; - bool found = false; - while(!it.Done() && !found) { - it.GetCurrent(&elem); - - // When we find the element, insert the element and mark found - if (elem == pivot) { // Found it! - found = true; - if (insert_after == true) { // Skip one more, if inserting after it - it.Push(); - } - it.InsertElement(value); - } else { - it.Push(); - } - - } - - // Put the data (string) into the database - if (found) { - db_->Put(put_option_, key, it.WriteResult()); - } - - // Returns the new (possibly unchanged) length of the list - return it.Length(); -} - -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/redis/redis_lists.h b/ceph/src/rocksdb/utilities/redis/redis_lists.h deleted file mode 100644 index 6c8b9551e..000000000 --- a/ceph/src/rocksdb/utilities/redis/redis_lists.h +++ /dev/null @@ -1,108 +0,0 @@ -/** - * A (persistent) Redis API built using the rocksdb backend. - * Implements Redis Lists as described on: http://redis.io/commands#list - * - * @throws All functions may throw a RedisListException - * - * @author Deon Nicholas (dnicholas@fb.com) - * Copyright 2013 Facebook - */ - -#ifndef ROCKSDB_LITE -#pragma once - -#include -#include "rocksdb/db.h" -#include "redis_list_iterator.h" -#include "redis_list_exception.h" - -namespace rocksdb { - -/// The Redis functionality (see http://redis.io/commands#list) -/// All functions may THROW a RedisListException -class RedisLists { - public: // Constructors / Destructors - /// Construct a new RedisLists database, with name/path of db. - /// Will clear the database on open iff destructive is true (default false). - /// Otherwise, it will restore saved changes. - /// May throw RedisListException - RedisLists(const std::string& db_path, - Options options, bool destructive = false); - - public: // Accessors - /// The number of items in (list: key) - int Length(const std::string& key); - - /// Search the list for the (index)'th item (0-based) in (list:key) - /// A negative index indicates: "from end-of-list" - /// If index is within range: return true, and return the value in *result. 
diff --git a/ceph/src/rocksdb/utilities/redis/redis_lists.h b/ceph/src/rocksdb/utilities/redis/redis_lists.h
deleted file mode 100644
index 6c8b9551e..000000000
--- a/ceph/src/rocksdb/utilities/redis/redis_lists.h
+++ /dev/null
@@ -1,108 +0,0 @@
-/**
- * A (persistent) Redis API built using the rocksdb backend.
- * Implements Redis Lists as described on: http://redis.io/commands#list
- *
- * @throws All functions may throw a RedisListException
- *
- * @author Deon Nicholas (dnicholas@fb.com)
- * Copyright 2013 Facebook
- */
-
-#ifndef ROCKSDB_LITE
-#pragma once
-
-#include <string>
-#include "rocksdb/db.h"
-#include "redis_list_iterator.h"
-#include "redis_list_exception.h"
-
-namespace rocksdb {
-
-/// The Redis functionality (see http://redis.io/commands#list)
-/// All functions may THROW a RedisListException
-class RedisLists {
- public:  // Constructors / Destructors
-  /// Construct a new RedisLists database, with name/path of db.
-  /// Will clear the database on open iff destructive is true (default false).
-  /// Otherwise, it will restore saved changes.
-  /// May throw RedisListException
-  RedisLists(const std::string& db_path,
-             Options options, bool destructive = false);
-
- public:  // Accessors
-  /// The number of items in (list: key)
-  int Length(const std::string& key);
-
-  /// Search the list for the (index)'th item (0-based) in (list: key)
-  /// A negative index indicates: "from end-of-list"
-  /// If index is within range: return true, and return the value in *result.
-  /// If (index < -length OR index >= length), then index is out of range:
-  ///   return false (and *result is left unchanged)
-  /// May throw RedisListException
-  bool Index(const std::string& key, int32_t index,
-             std::string* result);
-
-  /// Return (list: key)[first..last] (inclusive)
-  /// May throw RedisListException
-  std::vector<std::string> Range(const std::string& key,
-                                 int32_t first, int32_t last);
-
-  /// Prints the entire (list: key), for debugging.
-  void Print(const std::string& key);
-
- public:  // Insert/Update
-  /// Insert value before/after pivot in (list: key). Return the length.
-  /// May throw RedisListException
-  int InsertBefore(const std::string& key, const std::string& pivot,
-                   const std::string& value);
-  int InsertAfter(const std::string& key, const std::string& pivot,
-                  const std::string& value);
-
-  /// Push / Insert value at beginning/end of the list. Return the length.
-  /// May throw RedisListException
-  int PushLeft(const std::string& key, const std::string& value);
-  int PushRight(const std::string& key, const std::string& value);
-
-  /// Set (list: key)[idx] = val. Return true on success, false on fail
-  /// May throw RedisListException
-  bool Set(const std::string& key, int32_t index, const std::string& value);
-
- public:  // Delete / Remove / Pop / Trim
-  /// Trim (list: key) so that it will only contain the indices from start..stop
-  /// Returns true on success
-  /// May throw RedisListException
-  bool Trim(const std::string& key, int32_t start, int32_t stop);
-
-  /// If list is empty, return false and leave *result unchanged.
-  /// Else, remove the first/last elem, store it in *result, and return true
-  bool PopLeft(const std::string& key, std::string* result);   // First
-  bool PopRight(const std::string& key, std::string* result);  // Last
-
-  /// Remove the first (or last) num occurrences of value from the list (key)
-  /// Return the number of elements removed.
-  /// May throw RedisListException
-  int Remove(const std::string& key, int32_t num,
-             const std::string& value);
-  int RemoveFirst(const std::string& key, int32_t num,
-                  const std::string& value);
-  int RemoveLast(const std::string& key, int32_t num,
-                 const std::string& value);
-
- private:  // Private Functions
-  /// Calls InsertBefore or InsertAfter
-  int Insert(const std::string& key, const std::string& pivot,
-             const std::string& value, bool insert_after);
-
- private:
-  std::string db_name_;       // The actual database name/path
-  WriteOptions put_option_;
-  ReadOptions get_option_;
-
-  /// The backend rocksdb database.
-  /// Map : key --> list
-  ///   where a list is a sequence of elements
-  ///   and an element is a 4-byte integer (n), followed by n bytes of data
-  std::unique_ptr<DB> db_;
-};
-
-}  // namespace rocksdb
-#endif  // ROCKSDB_LITE
diff --git a/ceph/src/rocksdb/utilities/redis/redis_lists_test.cc b/ceph/src/rocksdb/utilities/redis/redis_lists_test.cc
deleted file mode 100644
index 961d87de7..000000000
--- a/ceph/src/rocksdb/utilities/redis/redis_lists_test.cc
+++ /dev/null
@@ -1,894 +0,0 @@
-// Copyright (c) 2011-present, Facebook, Inc.  All rights reserved.
-// This source code is licensed under both the GPLv2 (found in the
-// COPYING file in the root directory) and Apache 2.0 License
-// (found in the LICENSE.Apache file in the root directory).
-/**
- * A test harness for the Redis API built on rocksdb.
- *
- * USAGE: Build with: "make redis_test" (in rocksdb directory).
- *        Run unit tests with: "./redis_test"
- *        Manual/Interactive user testing: "./redis_test -m"
- *        Manual user testing + restart database: "./redis_test -m -d"
- *
- * TODO:  Add LARGE random test cases to verify efficiency and scalability
- *
- * @author Deon Nicholas (dnicholas@fb.com)
- */
-
-#ifndef ROCKSDB_LITE
-
-#include <iostream>
-#include <cctype>
-
-#include "redis_lists.h"
-#include "util/testharness.h"
-#include "util/random.h"
-
-using namespace rocksdb;
-
-namespace rocksdb {
-
-class RedisListsTest : public testing::Test {
- public:
-  static const std::string kDefaultDbName;
-  static Options options;
-
-  RedisListsTest() {
-    options.create_if_missing = true;
-  }
-};
-
-const std::string RedisListsTest::kDefaultDbName =
-    test::PerThreadDBPath("redis_lists_test");
-Options RedisListsTest::options = Options();
-
-// operator== and operator<< are defined below for vectors (lists)
-// Needed for ASSERT_EQ
-
-namespace {
-void AssertListEq(const std::vector<std::string>& result,
-                  const std::vector<std::string>& expected_result) {
-  ASSERT_EQ(result.size(), expected_result.size());
-  for (size_t i = 0; i < result.size(); ++i) {
-    ASSERT_EQ(result[i], expected_result[i]);
-  }
-}
-}  // namespace
-
-// PushRight, Length, Index, Range
-TEST_F(RedisListsTest, SimpleTest) {
-  RedisLists redis(kDefaultDbName, options, true);   // Destructive
-
-  std::string tempv;  // Used below for all Index(), PopRight(), PopLeft()
-
-  // Simple PushRight (should return the new length each time)
-  ASSERT_EQ(redis.PushRight("k1", "v1"), 1);
-  ASSERT_EQ(redis.PushRight("k1", "v2"), 2);
-  ASSERT_EQ(redis.PushRight("k1", "v3"), 3);
-
-  // Check Length and Index() functions
-  ASSERT_EQ(redis.Length("k1"), 3);         // Check length
-  ASSERT_TRUE(redis.Index("k1", 0, &tempv));
-  ASSERT_EQ(tempv, "v1");                   // Check valid indices
-  ASSERT_TRUE(redis.Index("k1", 1, &tempv));
-  ASSERT_EQ(tempv, "v2");
-  ASSERT_TRUE(redis.Index("k1", 2, &tempv));
-  ASSERT_EQ(tempv, "v3");
-
-  // Check range function and vectors
-  std::vector<std::string> result = redis.Range("k1", 0, 2);   // Get the list
-  std::vector<std::string> expected_result(3);
-  expected_result[0] = "v1";
-  expected_result[1] = "v2";
-  expected_result[2] = "v3";
-  AssertListEq(result, expected_result);
-}
-
-// PushLeft, Length, Index, Range
-TEST_F(RedisListsTest, SimpleTest2) {
-  RedisLists redis(kDefaultDbName, options, true);   // Destructive
-
-  std::string tempv;  // Used below for all Index(), PopRight(), PopLeft()
-
-  // Simple PushLeft
-  ASSERT_EQ(redis.PushLeft("k1", "v3"), 1);
-  ASSERT_EQ(redis.PushLeft("k1", "v2"), 2);
-  ASSERT_EQ(redis.PushLeft("k1", "v1"), 3);
-
-  // Check Length and Index() functions
-  ASSERT_EQ(redis.Length("k1"), 3);         // Check length
-  ASSERT_TRUE(redis.Index("k1", 0, &tempv));
-  ASSERT_EQ(tempv, "v1");                   // Check valid indices
-  ASSERT_TRUE(redis.Index("k1", 1, &tempv));
-  ASSERT_EQ(tempv, "v2");
-  ASSERT_TRUE(redis.Index("k1", 2, &tempv));
-  ASSERT_EQ(tempv, "v3");
-
-  // Check range function and vectors
-  std::vector<std::string> result = redis.Range("k1", 0, 2);   // Get the list
-  std::vector<std::string> expected_result(3);
-  expected_result[0] = "v1";
-  expected_result[1] = "v2";
-  expected_result[2] = "v3";
-  AssertListEq(result, expected_result);
-}
-
-// Exhaustive test of the Index() function
-TEST_F(RedisListsTest, IndexTest) {
-  RedisLists redis(kDefaultDbName, options, true);   // Destructive
-
-  std::string tempv;  // Used below for all Index(), PopRight(), PopLeft()
-
-  // Empty Index check (return empty and should not crash or edit tempv)
-  tempv = "yo";
-  ASSERT_TRUE(!redis.Index("k1", 0, &tempv));
-  ASSERT_EQ(tempv, "yo");
- ASSERT_TRUE(!redis.Index("fda", 3, &tempv)); - ASSERT_EQ(tempv, "yo"); - ASSERT_TRUE(!redis.Index("random", -12391, &tempv)); - ASSERT_EQ(tempv, "yo"); - - // Simple Pushes (will yield: [v6, v4, v4, v1, v2, v3] - redis.PushRight("k1", "v1"); - redis.PushRight("k1", "v2"); - redis.PushRight("k1", "v3"); - redis.PushLeft("k1", "v4"); - redis.PushLeft("k1", "v4"); - redis.PushLeft("k1", "v6"); - - // Simple, non-negative indices - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "v6"); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "v4"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "v4"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "v1"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "v2"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "v3"); - - // Negative indices - ASSERT_TRUE(redis.Index("k1", -6, &tempv)); - ASSERT_EQ(tempv, "v6"); - ASSERT_TRUE(redis.Index("k1", -5, &tempv)); - ASSERT_EQ(tempv, "v4"); - ASSERT_TRUE(redis.Index("k1", -4, &tempv)); - ASSERT_EQ(tempv, "v4"); - ASSERT_TRUE(redis.Index("k1", -3, &tempv)); - ASSERT_EQ(tempv, "v1"); - ASSERT_TRUE(redis.Index("k1", -2, &tempv)); - ASSERT_EQ(tempv, "v2"); - ASSERT_TRUE(redis.Index("k1", -1, &tempv)); - ASSERT_EQ(tempv, "v3"); - - // Out of bounds (return empty, no crash) - ASSERT_TRUE(!redis.Index("k1", 6, &tempv)); - ASSERT_TRUE(!redis.Index("k1", 123219, &tempv)); - ASSERT_TRUE(!redis.Index("k1", -7, &tempv)); - ASSERT_TRUE(!redis.Index("k1", -129, &tempv)); -} - - -// Exhaustive test of the Range() function -TEST_F(RedisListsTest, RangeTest) { - RedisLists redis(kDefaultDbName, options, true); // Destructive - - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // Simple Pushes (will yield: [v6, v4, v4, v1, v2, v3]) - redis.PushRight("k1", "v1"); - redis.PushRight("k1", "v2"); - redis.PushRight("k1", "v3"); - redis.PushLeft("k1", "v4"); - redis.PushLeft("k1", "v4"); - redis.PushLeft("k1", "v6"); - - // Sanity check (check the length; make sure it's 6) - ASSERT_EQ(redis.Length("k1"), 6); - - // Simple range - std::vector res = redis.Range("k1", 1, 4); - ASSERT_EQ((int)res.size(), 4); - ASSERT_EQ(res[0], "v4"); - ASSERT_EQ(res[1], "v4"); - ASSERT_EQ(res[2], "v1"); - ASSERT_EQ(res[3], "v2"); - - // Negative indices (i.e.: measured from the end) - res = redis.Range("k1", 2, -1); - ASSERT_EQ((int)res.size(), 4); - ASSERT_EQ(res[0], "v4"); - ASSERT_EQ(res[1], "v1"); - ASSERT_EQ(res[2], "v2"); - ASSERT_EQ(res[3], "v3"); - - res = redis.Range("k1", -6, -4); - ASSERT_EQ((int)res.size(), 3); - ASSERT_EQ(res[0], "v6"); - ASSERT_EQ(res[1], "v4"); - ASSERT_EQ(res[2], "v4"); - - res = redis.Range("k1", -1, 5); - ASSERT_EQ((int)res.size(), 1); - ASSERT_EQ(res[0], "v3"); - - // Partial / Broken indices - res = redis.Range("k1", -3, 1000000); - ASSERT_EQ((int)res.size(), 3); - ASSERT_EQ(res[0], "v1"); - ASSERT_EQ(res[1], "v2"); - ASSERT_EQ(res[2], "v3"); - - res = redis.Range("k1", -1000000, 1); - ASSERT_EQ((int)res.size(), 2); - ASSERT_EQ(res[0], "v6"); - ASSERT_EQ(res[1], "v4"); - - // Invalid indices - res = redis.Range("k1", 7, 9); - ASSERT_EQ((int)res.size(), 0); - - res = redis.Range("k1", -8, -7); - ASSERT_EQ((int)res.size(), 0); - - res = redis.Range("k1", 3, 2); - ASSERT_EQ((int)res.size(), 0); - - res = redis.Range("k1", 5, -2); - ASSERT_EQ((int)res.size(), 0); - - // Range matches Index - res = redis.Range("k1", -6, -4); - ASSERT_TRUE(redis.Index("k1", -6, &tempv)); - ASSERT_EQ(tempv, res[0]); - 
ASSERT_TRUE(redis.Index("k1", -5, &tempv)); - ASSERT_EQ(tempv, res[1]); - ASSERT_TRUE(redis.Index("k1", -4, &tempv)); - ASSERT_EQ(tempv, res[2]); - - // Last check - res = redis.Range("k1", 0, -6); - ASSERT_EQ((int)res.size(), 1); - ASSERT_EQ(res[0], "v6"); -} - -// Exhaustive test for InsertBefore(), and InsertAfter() -TEST_F(RedisListsTest, InsertTest) { - RedisLists redis(kDefaultDbName, options, true); - - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // Insert on empty list (return 0, and do not crash) - ASSERT_EQ(redis.InsertBefore("k1", "non-exist", "a"), 0); - ASSERT_EQ(redis.InsertAfter("k1", "other-non-exist", "c"), 0); - ASSERT_EQ(redis.Length("k1"), 0); - - // Push some preliminary stuff [g, f, e, d, c, b, a] - redis.PushLeft("k1", "a"); - redis.PushLeft("k1", "b"); - redis.PushLeft("k1", "c"); - redis.PushLeft("k1", "d"); - redis.PushLeft("k1", "e"); - redis.PushLeft("k1", "f"); - redis.PushLeft("k1", "g"); - ASSERT_EQ(redis.Length("k1"), 7); - - // Test InsertBefore - int newLength = redis.InsertBefore("k1", "e", "hello"); - ASSERT_EQ(newLength, 8); - ASSERT_EQ(redis.Length("k1"), newLength); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "f"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "e"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "hello"); - - // Test InsertAfter - newLength = redis.InsertAfter("k1", "c", "bye"); - ASSERT_EQ(newLength, 9); - ASSERT_EQ(redis.Length("k1"), newLength); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "bye"); - - // Test bad value on InsertBefore - newLength = redis.InsertBefore("k1", "yo", "x"); - ASSERT_EQ(newLength, 9); - ASSERT_EQ(redis.Length("k1"), newLength); - - // Test bad value on InsertAfter - newLength = redis.InsertAfter("k1", "xxxx", "y"); - ASSERT_EQ(newLength, 9); - ASSERT_EQ(redis.Length("k1"), newLength); - - // Test InsertBefore beginning - newLength = redis.InsertBefore("k1", "g", "begggggggggggggggg"); - ASSERT_EQ(newLength, 10); - ASSERT_EQ(redis.Length("k1"), newLength); - - // Test InsertAfter end - newLength = redis.InsertAfter("k1", "a", "enddd"); - ASSERT_EQ(newLength, 11); - ASSERT_EQ(redis.Length("k1"), newLength); - - // Make sure nothing weird happened. 
- ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "begggggggggggggggg"); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "g"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "f"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "hello"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "e"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "d"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "c"); - ASSERT_TRUE(redis.Index("k1", 7, &tempv)); - ASSERT_EQ(tempv, "bye"); - ASSERT_TRUE(redis.Index("k1", 8, &tempv)); - ASSERT_EQ(tempv, "b"); - ASSERT_TRUE(redis.Index("k1", 9, &tempv)); - ASSERT_EQ(tempv, "a"); - ASSERT_TRUE(redis.Index("k1", 10, &tempv)); - ASSERT_EQ(tempv, "enddd"); -} - -// Exhaustive test of Set function -TEST_F(RedisListsTest, SetTest) { - RedisLists redis(kDefaultDbName, options, true); - - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // Set on empty list (return false, and do not crash) - ASSERT_EQ(redis.Set("k1", 7, "a"), false); - ASSERT_EQ(redis.Set("k1", 0, "a"), false); - ASSERT_EQ(redis.Set("k1", -49, "cx"), false); - ASSERT_EQ(redis.Length("k1"), 0); - - // Push some preliminary stuff [g, f, e, d, c, b, a] - redis.PushLeft("k1", "a"); - redis.PushLeft("k1", "b"); - redis.PushLeft("k1", "c"); - redis.PushLeft("k1", "d"); - redis.PushLeft("k1", "e"); - redis.PushLeft("k1", "f"); - redis.PushLeft("k1", "g"); - ASSERT_EQ(redis.Length("k1"), 7); - - // Test Regular Set - ASSERT_TRUE(redis.Set("k1", 0, "0")); - ASSERT_TRUE(redis.Set("k1", 3, "3")); - ASSERT_TRUE(redis.Set("k1", 6, "6")); - ASSERT_TRUE(redis.Set("k1", 2, "2")); - ASSERT_TRUE(redis.Set("k1", 5, "5")); - ASSERT_TRUE(redis.Set("k1", 1, "1")); - ASSERT_TRUE(redis.Set("k1", 4, "4")); - - ASSERT_EQ(redis.Length("k1"), 7); // Size should not change - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "0"); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "1"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "2"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "3"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "4"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "5"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "6"); - - // Set with negative indices - ASSERT_TRUE(redis.Set("k1", -7, "a")); - ASSERT_TRUE(redis.Set("k1", -4, "d")); - ASSERT_TRUE(redis.Set("k1", -1, "g")); - ASSERT_TRUE(redis.Set("k1", -5, "c")); - ASSERT_TRUE(redis.Set("k1", -2, "f")); - ASSERT_TRUE(redis.Set("k1", -6, "b")); - ASSERT_TRUE(redis.Set("k1", -3, "e")); - - ASSERT_EQ(redis.Length("k1"), 7); // Size should not change - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "a"); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "b"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "c"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "d"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "e"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "f"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "g"); - - // Bad indices (just out-of-bounds / off-by-one check) - ASSERT_EQ(redis.Set("k1", -8, "off-by-one in negative index"), false); - ASSERT_EQ(redis.Set("k1", 7, "off-by-one-error in positive index"), false); - ASSERT_EQ(redis.Set("k1", 43892, "big random index should fail"), false); - 
ASSERT_EQ(redis.Set("k1", -21391, "large negative index should fail"), false); - - // One last check (to make sure nothing weird happened) - ASSERT_EQ(redis.Length("k1"), 7); // Size should not change - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "a"); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "b"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "c"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "d"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "e"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "f"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "g"); -} - -// Testing Insert, Push, and Set, in a mixed environment -TEST_F(RedisListsTest, InsertPushSetTest) { - RedisLists redis(kDefaultDbName, options, true); // Destructive - - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // A series of pushes and insertions - // Will result in [newbegin, z, a, aftera, x, newend] - // Also, check the return value sometimes (should return length) - int lengthCheck; - lengthCheck = redis.PushLeft("k1", "a"); - ASSERT_EQ(lengthCheck, 1); - redis.PushLeft("k1", "z"); - redis.PushRight("k1", "x"); - lengthCheck = redis.InsertAfter("k1", "a", "aftera"); - ASSERT_EQ(lengthCheck , 4); - redis.InsertBefore("k1", "z", "newbegin"); // InsertBefore beginning of list - redis.InsertAfter("k1", "x", "newend"); // InsertAfter end of list - - // Check - std::vector res = redis.Range("k1", 0, -1); // Get the list - ASSERT_EQ((int)res.size(), 6); - ASSERT_EQ(res[0], "newbegin"); - ASSERT_EQ(res[5], "newend"); - ASSERT_EQ(res[3], "aftera"); - - // Testing duplicate values/pivots (multiple occurrences of 'a') - ASSERT_TRUE(redis.Set("k1", 0, "a")); // [a, z, a, aftera, x, newend] - redis.InsertAfter("k1", "a", "happy"); // [a, happy, z, a, aftera, ...] - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "happy"); - redis.InsertBefore("k1", "a", "sad"); // [sad, a, happy, z, a, aftera, ...] - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "sad"); - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "happy"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "aftera"); - redis.InsertAfter("k1", "a", "zz"); // [sad, a, zz, happy, z, a, aftera, ...] - ASSERT_TRUE(redis.Index("k1", 2, &tempv)); - ASSERT_EQ(tempv, "zz"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "aftera"); - ASSERT_TRUE(redis.Set("k1", 1, "nota")); // [sad, nota, zz, happy, z, a, ...] - redis.InsertBefore("k1", "a", "ba"); // [sad, nota, zz, happy, z, ba, a, ...] 
- ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "z"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "ba"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "a"); - - // We currently have: [sad, nota, zz, happy, z, ba, a, aftera, x, newend] - // redis.Print("k1"); // manually check - - // Test Inserting before/after non-existent values - lengthCheck = redis.Length("k1"); // Ensure that the length doesn't change - ASSERT_EQ(lengthCheck, 10); - ASSERT_EQ(redis.InsertBefore("k1", "non-exist", "randval"), lengthCheck); - ASSERT_EQ(redis.InsertAfter("k1", "nothing", "a"), lengthCheck); - ASSERT_EQ(redis.InsertAfter("randKey", "randVal", "ranValue"), 0); // Empty - ASSERT_EQ(redis.Length("k1"), lengthCheck); // The length should not change - - // Simply Test the Set() function - redis.Set("k1", 5, "ba2"); - redis.InsertBefore("k1", "ba2", "beforeba2"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "z"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "beforeba2"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "ba2"); - ASSERT_TRUE(redis.Index("k1", 7, &tempv)); - ASSERT_EQ(tempv, "a"); - - // We have: [sad, nota, zz, happy, z, beforeba2, ba2, a, aftera, x, newend] - - // Set() with negative indices - redis.Set("k1", -1, "endprank"); - ASSERT_TRUE(!redis.Index("k1", 11, &tempv)); - ASSERT_TRUE(redis.Index("k1", 10, &tempv)); - ASSERT_EQ(tempv, "endprank"); // Ensure Set worked correctly - redis.Set("k1", -11, "t"); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "t"); - - // Test out of bounds Set - ASSERT_EQ(redis.Set("k1", -12, "ssd"), false); - ASSERT_EQ(redis.Set("k1", 11, "sasd"), false); - ASSERT_EQ(redis.Set("k1", 1200, "big"), false); -} - -// Testing Trim, Pop -TEST_F(RedisListsTest, TrimPopTest) { - RedisLists redis(kDefaultDbName, options, true); // Destructive - - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // A series of pushes and insertions - // Will result in [newbegin, z, a, aftera, x, newend] - redis.PushLeft("k1", "a"); - redis.PushLeft("k1", "z"); - redis.PushRight("k1", "x"); - redis.InsertBefore("k1", "z", "newbegin"); // InsertBefore start of list - redis.InsertAfter("k1", "x", "newend"); // InsertAfter end of list - redis.InsertAfter("k1", "a", "aftera"); - - // Simple PopLeft/Right test - ASSERT_TRUE(redis.PopLeft("k1", &tempv)); - ASSERT_EQ(tempv, "newbegin"); - ASSERT_EQ(redis.Length("k1"), 5); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "z"); - ASSERT_TRUE(redis.PopRight("k1", &tempv)); - ASSERT_EQ(tempv, "newend"); - ASSERT_EQ(redis.Length("k1"), 4); - ASSERT_TRUE(redis.Index("k1", -1, &tempv)); - ASSERT_EQ(tempv, "x"); - - // Now have: [z, a, aftera, x] - - // Test Trim - ASSERT_TRUE(redis.Trim("k1", 0, -1)); // [z, a, aftera, x] (do nothing) - ASSERT_EQ(redis.Length("k1"), 4); - ASSERT_TRUE(redis.Trim("k1", 0, 2)); // [z, a, aftera] - ASSERT_EQ(redis.Length("k1"), 3); - ASSERT_TRUE(redis.Index("k1", -1, &tempv)); - ASSERT_EQ(tempv, "aftera"); - ASSERT_TRUE(redis.Trim("k1", 1, 1)); // [a] - ASSERT_EQ(redis.Length("k1"), 1); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "a"); - - // Test out of bounds (empty) trim - ASSERT_TRUE(redis.Trim("k1", 1, 0)); - ASSERT_EQ(redis.Length("k1"), 0); - - // Popping with empty list (return empty without error) - ASSERT_TRUE(!redis.PopLeft("k1", &tempv)); - ASSERT_TRUE(!redis.PopRight("k1", &tempv)); - ASSERT_TRUE(redis.Trim("k1", 0, 5)); - - // 
Exhaustive Trim test (negative and invalid indices) - // Will start in [newbegin, z, a, aftera, x, newend] - redis.PushLeft("k1", "a"); - redis.PushLeft("k1", "z"); - redis.PushRight("k1", "x"); - redis.InsertBefore("k1", "z", "newbegin"); // InsertBefore start of list - redis.InsertAfter("k1", "x", "newend"); // InsertAfter end of list - redis.InsertAfter("k1", "a", "aftera"); - ASSERT_TRUE(redis.Trim("k1", -6, -1)); // Should do nothing - ASSERT_EQ(redis.Length("k1"), 6); - ASSERT_TRUE(redis.Trim("k1", 1, -2)); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "z"); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "x"); - ASSERT_EQ(redis.Length("k1"), 4); - ASSERT_TRUE(redis.Trim("k1", -3, -2)); - ASSERT_EQ(redis.Length("k1"), 2); -} - -// Testing Remove, RemoveFirst, RemoveLast -TEST_F(RedisListsTest, RemoveTest) { - RedisLists redis(kDefaultDbName, options, true); // Destructive - - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // A series of pushes and insertions - // Will result in [newbegin, z, a, aftera, x, newend, a, a] - redis.PushLeft("k1", "a"); - redis.PushLeft("k1", "z"); - redis.PushRight("k1", "x"); - redis.InsertBefore("k1", "z", "newbegin"); // InsertBefore start of list - redis.InsertAfter("k1", "x", "newend"); // InsertAfter end of list - redis.InsertAfter("k1", "a", "aftera"); - redis.PushRight("k1", "a"); - redis.PushRight("k1", "a"); - - // Verify - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "newbegin"); - ASSERT_TRUE(redis.Index("k1", -1, &tempv)); - ASSERT_EQ(tempv, "a"); - - // Check RemoveFirst (Remove the first two 'a') - // Results in [newbegin, z, aftera, x, newend, a] - int numRemoved = redis.Remove("k1", 2, "a"); - ASSERT_EQ(numRemoved, 2); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "newbegin"); - ASSERT_TRUE(redis.Index("k1", 1, &tempv)); - ASSERT_EQ(tempv, "z"); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "newend"); - ASSERT_TRUE(redis.Index("k1", 5, &tempv)); - ASSERT_EQ(tempv, "a"); - ASSERT_EQ(redis.Length("k1"), 6); - - // Repopulate some stuff - // Results in: [x, x, x, x, x, newbegin, z, x, aftera, x, newend, a, x] - redis.PushLeft("k1", "x"); - redis.PushLeft("k1", "x"); - redis.PushLeft("k1", "x"); - redis.PushLeft("k1", "x"); - redis.PushLeft("k1", "x"); - redis.PushRight("k1", "x"); - redis.InsertAfter("k1", "z", "x"); - - // Test removal from end - numRemoved = redis.Remove("k1", -2, "x"); - ASSERT_EQ(numRemoved, 2); - ASSERT_TRUE(redis.Index("k1", 8, &tempv)); - ASSERT_EQ(tempv, "aftera"); - ASSERT_TRUE(redis.Index("k1", 9, &tempv)); - ASSERT_EQ(tempv, "newend"); - ASSERT_TRUE(redis.Index("k1", 10, &tempv)); - ASSERT_EQ(tempv, "a"); - ASSERT_TRUE(!redis.Index("k1", 11, &tempv)); - numRemoved = redis.Remove("k1", -2, "x"); - ASSERT_EQ(numRemoved, 2); - ASSERT_TRUE(redis.Index("k1", 4, &tempv)); - ASSERT_EQ(tempv, "newbegin"); - ASSERT_TRUE(redis.Index("k1", 6, &tempv)); - ASSERT_EQ(tempv, "aftera"); - - // We now have: [x, x, x, x, newbegin, z, aftera, newend, a] - ASSERT_EQ(redis.Length("k1"), 9); - ASSERT_TRUE(redis.Index("k1", -1, &tempv)); - ASSERT_EQ(tempv, "a"); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "x"); - - // Test over-shooting (removing more than there exists) - numRemoved = redis.Remove("k1", -9000, "x"); - ASSERT_EQ(numRemoved , 4); // Only really removed 4 - ASSERT_EQ(redis.Length("k1"), 5); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "newbegin"); - numRemoved = 
redis.Remove("k1", 1, "x"); - ASSERT_EQ(numRemoved, 0); - - // Try removing ALL! - numRemoved = redis.Remove("k1", 0, "newbegin"); // REMOVE 0 will remove all! - ASSERT_EQ(numRemoved, 1); - - // Removal from an empty-list - ASSERT_TRUE(redis.Trim("k1", 1, 0)); - numRemoved = redis.Remove("k1", 1, "z"); - ASSERT_EQ(numRemoved, 0); -} - - -// Test Multiple keys and Persistence -TEST_F(RedisListsTest, PersistenceMultiKeyTest) { - std::string tempv; // Used below for all Index(), PopRight(), PopLeft() - - // Block one: populate a single key in the database - { - RedisLists redis(kDefaultDbName, options, true); // Destructive - - // A series of pushes and insertions - // Will result in [newbegin, z, a, aftera, x, newend, a, a] - redis.PushLeft("k1", "a"); - redis.PushLeft("k1", "z"); - redis.PushRight("k1", "x"); - redis.InsertBefore("k1", "z", "newbegin"); // InsertBefore start of list - redis.InsertAfter("k1", "x", "newend"); // InsertAfter end of list - redis.InsertAfter("k1", "a", "aftera"); - redis.PushRight("k1", "a"); - redis.PushRight("k1", "a"); - - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "aftera"); - } - - // Block two: make sure changes were saved and add some other key - { - RedisLists redis(kDefaultDbName, options, false); // Persistent, non-destructive - - // Check - ASSERT_EQ(redis.Length("k1"), 8); - ASSERT_TRUE(redis.Index("k1", 3, &tempv)); - ASSERT_EQ(tempv, "aftera"); - - redis.PushRight("k2", "randomkey"); - redis.PushLeft("k2", "sas"); - - redis.PopLeft("k1", &tempv); - } - - // Block three: Verify the changes from block 2 - { - RedisLists redis(kDefaultDbName, options, false); // Persistent, non-destructive - - // Check - ASSERT_EQ(redis.Length("k1"), 7); - ASSERT_EQ(redis.Length("k2"), 2); - ASSERT_TRUE(redis.Index("k1", 0, &tempv)); - ASSERT_EQ(tempv, "z"); - ASSERT_TRUE(redis.Index("k2", -2, &tempv)); - ASSERT_EQ(tempv, "sas"); - } -} - -/// THE manual REDIS TEST begins here -/// THIS WILL ONLY OCCUR IF YOU RUN: ./redis_test -m - -namespace { -void MakeUpper(std::string* const s) { - int len = static_cast(s->length()); - for (int i = 0; i < len; ++i) { - (*s)[i] = static_cast(toupper((*s)[i])); // C-version defined in - } -} - -/// Allows the user to enter in REDIS commands into the command-line. -/// This is useful for manual / interacticve testing / debugging. -/// Use destructive=true to clean the database before use. -/// Use destructive=false to remember the previous state (i.e.: persistent) -/// Should be called from main function. -int manual_redis_test(bool destructive){ - RedisLists redis(RedisListsTest::kDefaultDbName, - RedisListsTest::options, - destructive); - - // TODO: Right now, please use spaces to separate each word. 
-  //       In actual redis, you can use quotes to specify compound values
-  //       Example: RPUSH mylist "this is a compound value"
-
-  std::string command;
-  while(true) {
-    std::cin >> command;
-    MakeUpper(&command);
-
-    if (command == "LINSERT") {
-      std::string k, t, p, v;
-      std::cin >> k >> t >> p >> v;
-      MakeUpper(&t);
-      if (t=="BEFORE") {
-        std::cout << redis.InsertBefore(k, p, v) << std::endl;
-      } else if (t=="AFTER") {
-        std::cout << redis.InsertAfter(k, p, v) << std::endl;
-      }
-    } else if (command == "LPUSH") {
-      std::string k, v;
-      std::cin >> k >> v;
-      redis.PushLeft(k, v);
-    } else if (command == "RPUSH") {
-      std::string k, v;
-      std::cin >> k >> v;
-      redis.PushRight(k, v);
-    } else if (command == "LPOP") {
-      std::string k;
-      std::cin >> k;
-      std::string res;
-      redis.PopLeft(k, &res);
-      std::cout << res << std::endl;
-    } else if (command == "RPOP") {
-      std::string k;
-      std::cin >> k;
-      std::string res;
-      redis.PopRight(k, &res);
-      std::cout << res << std::endl;
-    } else if (command == "LREM") {
-      std::string k;
-      int amt;
-      std::string v;
-
-      std::cin >> k >> amt >> v;
-      std::cout << redis.Remove(k, amt, v) << std::endl;
-    } else if (command == "LLEN") {
-      std::string k;
-      std::cin >> k;
-      std::cout << redis.Length(k) << std::endl;
-    } else if (command == "LRANGE") {
-      std::string k;
-      int i, j;
-      std::cin >> k >> i >> j;
-      std::vector<std::string> res = redis.Range(k, i, j);
-      for (auto it = res.begin(); it != res.end(); ++it) {
-        std::cout << " " << (*it);
-      }
-      std::cout << std::endl;
-    } else if (command == "LTRIM") {
-      std::string k;
-      int i, j;
-      std::cin >> k >> i >> j;
-      redis.Trim(k, i, j);
-    } else if (command == "LSET") {
-      std::string k;
-      int idx;
-      std::string v;
-      std::cin >> k >> idx >> v;
-      redis.Set(k, idx, v);
-    } else if (command == "LINDEX") {
-      std::string k;
-      int idx;
-      std::cin >> k >> idx;
-      std::string res;
-      redis.Index(k, idx, &res);
-      std::cout << res << std::endl;
-    } else if (command == "PRINT") {      // Added by Deon
-      std::string k;
-      std::cin >> k;
-      redis.Print(k);
-    } else if (command == "QUIT") {
-      return 0;
-    } else {
-      std::cout << "unknown command: " << command << std::endl;
-    }
-  }
-}
-}  // namespace
-
-}  // namespace rocksdb
-
-
-// USAGE: "./redis_test" for default (unit tests)
-//        "./redis_test -m" for manual testing (redis command api)
-//        "./redis_test -m -d" for destructive manual test (erase db before use)
-
-
-namespace {
-// Check for "want" argument in the argument list
-bool found_arg(int argc, char* argv[], const char* want){
-  for(int i=1; i<argc; ++i){
-    if (strcmp(argv[i], want) == 0) {
-      return true;
-    }
-  }
-  return false;
-}
-}  // namespace
-
-// Will run unit tests.
-// However, if -m is specified, it will do user manual/interactive testing.
-// -m -d is manual and destructive (it will clear the database before use).
-int main(int argc, char* argv[]) {
-  ::testing::InitGoogleTest(&argc, argv);
-  if (found_arg(argc, argv, "-m")) {
-    bool destructive = found_arg(argc, argv, "-d");
-    return rocksdb::manual_redis_test(destructive);
-  } else {
-    return RUN_ALL_TESTS();
-  }
-}
-
-#else
-#include <stdio.h>
-
-int main(int /*argc*/, char** /*argv*/) {
-  fprintf(stderr, "SKIPPED as redis is not supported in ROCKSDB_LITE\n");
-  return 0;
-}
-
-#endif  // !ROCKSDB_LITE
diff --git a/ceph/src/rocksdb/utilities/simulator_cache/sim_cache.cc b/ceph/src/rocksdb/utilities/simulator_cache/sim_cache.cc
index bdf6c5aa8..8629b60b0 100644
--- a/ceph/src/rocksdb/utilities/simulator_cache/sim_cache.cc
+++ b/ceph/src/rocksdb/utilities/simulator_cache/sim_cache.cc
@@ -160,18 +160,16 @@ class SimCacheImpl : public SimCache {
         hit_times_(0),
         stats_(nullptr) {}
 
-  virtual ~SimCacheImpl() {}
-  virtual void SetCapacity(size_t capacity) override {
-    cache_->SetCapacity(capacity);
-  }
+  ~SimCacheImpl() override {}
+  void SetCapacity(size_t capacity) override { cache_->SetCapacity(capacity); }
 
-  virtual void SetStrictCapacityLimit(bool strict_capacity_limit) override {
+  void SetStrictCapacityLimit(bool strict_capacity_limit) override {
     cache_->SetStrictCapacityLimit(strict_capacity_limit);
   }
 
-  virtual Status Insert(const Slice& key, void* value,
size_t charge, - void (*deleter)(const Slice& key, void* value), - Handle** handle, Priority priority) override { + Status Insert(const Slice& key, void* value, size_t charge, + void (*deleter)(const Slice& key, void* value), Handle** handle, + Priority priority) override { // The handle and value passed in are for real cache, so we pass nullptr // to key_only_cache_ for both instead. Also, the deleter function pointer // will be called by user to perform some external operation which should @@ -191,7 +189,7 @@ class SimCacheImpl : public SimCache { return cache_->Insert(key, value, charge, deleter, handle, priority); } - virtual Handle* Lookup(const Slice& key, Statistics* stats) override { + Handle* Lookup(const Slice& key, Statistics* stats) override { Handle* h = key_only_cache_->Lookup(key); if (h != nullptr) { key_only_cache_->Release(h); @@ -207,79 +205,75 @@ class SimCacheImpl : public SimCache { return cache_->Lookup(key, stats); } - virtual bool Ref(Handle* handle) override { return cache_->Ref(handle); } + bool Ref(Handle* handle) override { return cache_->Ref(handle); } - virtual bool Release(Handle* handle, bool force_erase = false) override { + bool Release(Handle* handle, bool force_erase = false) override { return cache_->Release(handle, force_erase); } - virtual void Erase(const Slice& key) override { + void Erase(const Slice& key) override { cache_->Erase(key); key_only_cache_->Erase(key); } - virtual void* Value(Handle* handle) override { return cache_->Value(handle); } + void* Value(Handle* handle) override { return cache_->Value(handle); } - virtual uint64_t NewId() override { return cache_->NewId(); } + uint64_t NewId() override { return cache_->NewId(); } - virtual size_t GetCapacity() const override { return cache_->GetCapacity(); } + size_t GetCapacity() const override { return cache_->GetCapacity(); } - virtual bool HasStrictCapacityLimit() const override { + bool HasStrictCapacityLimit() const override { return cache_->HasStrictCapacityLimit(); } - virtual size_t GetUsage() const override { return cache_->GetUsage(); } + size_t GetUsage() const override { return cache_->GetUsage(); } - virtual size_t GetUsage(Handle* handle) const override { + size_t GetUsage(Handle* handle) const override { return cache_->GetUsage(handle); } - virtual size_t GetPinnedUsage() const override { - return cache_->GetPinnedUsage(); - } + size_t GetPinnedUsage() const override { return cache_->GetPinnedUsage(); } - virtual void DisownData() override { + void DisownData() override { cache_->DisownData(); key_only_cache_->DisownData(); } - virtual void ApplyToAllCacheEntries(void (*callback)(void*, size_t), - bool thread_safe) override { + void ApplyToAllCacheEntries(void (*callback)(void*, size_t), + bool thread_safe) override { // only apply to _cache since key_only_cache doesn't hold value cache_->ApplyToAllCacheEntries(callback, thread_safe); } - virtual void EraseUnRefEntries() override { + void EraseUnRefEntries() override { cache_->EraseUnRefEntries(); key_only_cache_->EraseUnRefEntries(); } - virtual size_t GetSimCapacity() const override { + size_t GetSimCapacity() const override { return key_only_cache_->GetCapacity(); } - virtual size_t GetSimUsage() const override { - return key_only_cache_->GetUsage(); - } - virtual void SetSimCapacity(size_t capacity) override { + size_t GetSimUsage() const override { return key_only_cache_->GetUsage(); } + void SetSimCapacity(size_t capacity) override { key_only_cache_->SetCapacity(capacity); } - virtual uint64_t get_miss_counter() const 
override {
+  uint64_t get_miss_counter() const override {
     return miss_times_.load(std::memory_order_relaxed);
   }
 
-  virtual uint64_t get_hit_counter() const override {
+  uint64_t get_hit_counter() const override {
     return hit_times_.load(std::memory_order_relaxed);
   }
 
-  virtual void reset_counter() override {
+  void reset_counter() override {
     miss_times_.store(0, std::memory_order_relaxed);
     hit_times_.store(0, std::memory_order_relaxed);
     SetTickerCount(stats_, SIM_BLOCK_CACHE_HIT, 0);
     SetTickerCount(stats_, SIM_BLOCK_CACHE_MISS, 0);
   }
 
-  virtual std::string ToString() const override {
+  std::string ToString() const override {
     std::string res;
     res.append("SimCache MISSes: " + std::to_string(get_miss_counter()) + "\n");
     res.append("SimCache HITs: " + std::to_string(get_hit_counter()) + "\n");
@@ -291,7 +285,7 @@ class SimCacheImpl : public SimCache {
     return res;
   }
 
-  virtual std::string GetPrintableOptions() const override {
+  std::string GetPrintableOptions() const override {
     std::string ret;
     ret.reserve(20000);
     ret.append("    cache_options:\n");
@@ -301,18 +295,15 @@ class SimCacheImpl : public SimCache {
     return ret;
   }
 
-  virtual Status StartActivityLogging(const std::string& activity_log_file,
-                                      Env* env,
-                                      uint64_t max_logging_size = 0) override {
+  Status StartActivityLogging(const std::string& activity_log_file, Env* env,
+                              uint64_t max_logging_size = 0) override {
     return cache_activity_logger_.StartLogging(activity_log_file, env,
                                                max_logging_size);
   }
 
-  virtual void StopActivityLogging() override {
-    cache_activity_logger_.StopLogging();
-  }
+  void StopActivityLogging() override { cache_activity_logger_.StopLogging(); }
 
-  virtual Status GetActivityLoggingStatus() override {
+  Status GetActivityLoggingStatus() override {
     return cache_activity_logger_.bg_status();
   }
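A brief, illustrative sketch of how the SimCache patched above is typically
wired up (names per rocksdb/utilities/sim_cache.h; the capacities are
arbitrary example values, not part of the patch):

    #include "rocksdb/cache.h"
    #include "rocksdb/table.h"
    #include "rocksdb/utilities/sim_cache.h"

    // Wrap the real 1 GB block cache in a simulator that models an 8 GB one.
    std::shared_ptr<rocksdb::Cache> real_cache = rocksdb::NewLRUCache(1 << 30);
    std::shared_ptr<rocksdb::SimCache> sim_cache =
        rocksdb::NewSimCache(real_cache, 8ull << 30, /*num_shard_bits=*/6);
    rocksdb::BlockBasedTableOptions table_options;
    table_options.block_cache = sim_cache;
    // ... open the DB with table_options and run the workload, then read the
    // simulated hit rate from sim_cache->get_hit_counter(),
    // sim_cache->get_miss_counter(), or sim_cache->ToString().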
diff --git a/ceph/src/rocksdb/utilities/spatialdb/spatial_db.cc b/ceph/src/rocksdb/utilities/spatialdb/spatial_db.cc
deleted file mode 100644
index 627eb9de6..000000000
--- a/ceph/src/rocksdb/utilities/spatialdb/spatial_db.cc
+++ /dev/null
@@ -1,919 +0,0 @@
-// Copyright (c) 2011-present, Facebook, Inc.  All rights reserved.
-// This source code is licensed under both the GPLv2 (found in the
-// COPYING file in the root directory) and Apache 2.0 License
-// (found in the LICENSE.Apache file in the root directory).
-
-#ifndef ROCKSDB_LITE
-
-#include "rocksdb/utilities/spatial_db.h"
-
-#ifndef __STDC_FORMAT_MACROS
-#define __STDC_FORMAT_MACROS
-#endif
-
-#include <algorithm>
-#include <condition_variable>
-#include <inttypes.h>
-#include <string>
-#include <vector>
-#include <mutex>
-#include <thread>
-#include <set>
-#include <unordered_set>
-
-#include "rocksdb/cache.h"
-#include "rocksdb/options.h"
-#include "rocksdb/memtablerep.h"
-#include "rocksdb/slice_transform.h"
-#include "rocksdb/statistics.h"
-#include "rocksdb/table.h"
-#include "rocksdb/db.h"
-#include "rocksdb/utilities/stackable_db.h"
-#include "util/coding.h"
-#include "utilities/spatialdb/utils.h"
-#include "port/port.h"
-
-namespace rocksdb {
-namespace spatial {
-
-// Column families are used to store element's data and spatial indexes. We use
-// [default] column family to store the element data. This is the format of
-// [default] column family:
-// * id (fixed 64 big endian) -> blob (length prefixed slice) feature_set
-// (serialized)
-// We have one additional column family for each spatial index. The name of the
-// column family is [spatial$<spatial index name>]. The format is:
-// * quad_key (fixed 64 bit big endian) id (fixed 64 bit big endian) -> ""
-// We store information about indexes in [metadata] column family. Format is:
-// * spatial$<spatial index name> -> bbox (4 double encodings) tile_bits
-// (varint32)
-
-namespace {
-const std::string kMetadataColumnFamilyName("metadata");
-inline std::string GetSpatialIndexColumnFamilyName(
-    const std::string& spatial_index_name) {
-  return "spatial$" + spatial_index_name;
-}
-inline bool GetSpatialIndexName(const std::string& column_family_name,
-                                Slice* dst) {
-  *dst = Slice(column_family_name);
-  if (dst->starts_with("spatial$")) {
-    dst->remove_prefix(8);  // strlen("spatial$")
-    return true;
-  }
-  return false;
-}
-
-}  // namespace
-
-void Variant::Init(const Variant& v, Data& d) {
-  switch (v.type_) {
-    case kNull:
-      break;
-    case kBool:
-      d.b = v.data_.b;
-      break;
-    case kInt:
-      d.i = v.data_.i;
-      break;
-    case kDouble:
-      d.d = v.data_.d;
-      break;
-    case kString:
-      new (d.s) std::string(*GetStringPtr(v.data_));
-      break;
-    default:
-      assert(false);
-  }
-}
-
-Variant& Variant::operator=(const Variant& v) {
-  // Construct first a temp so exception from a string ctor
-  // does not change this object
-  Data tmp;
-  Init(v, tmp);
-
-  Type thisType = type_;
-  // Boils down to copying bits so safe
-  std::swap(tmp, data_);
-  type_ = v.type_;
-
-  Destroy(thisType, tmp);
-
-  return *this;
-}
-
-Variant& Variant::operator=(Variant&& rhs) {
-  Destroy(type_, data_);
-  if (rhs.type_ == kString) {
-    new (data_.s) std::string(std::move(*GetStringPtr(rhs.data_)));
-  } else {
-    data_ = rhs.data_;
-  }
-  type_ = rhs.type_;
-  rhs.type_ = kNull;
-  return *this;
-}
-
-bool Variant::operator==(const Variant& rhs) const {
-  if (type_ != rhs.type_) {
-    return false;
-  }
-
-  switch (type_) {
-    case kNull:
-      return true;
-    case kBool:
-      return data_.b == rhs.data_.b;
-    case kInt:
-      return data_.i == rhs.data_.i;
-    case kDouble:
-      return data_.d == rhs.data_.d;
-    case kString:
-      return *GetStringPtr(data_) == *GetStringPtr(rhs.data_);
-    default:
-      assert(false);
-  }
-  // it will never reach here, but otherwise the compiler complains
-  return false;
-}
-
-FeatureSet* FeatureSet::Set(const std::string& key, const Variant& value) {
-  map_.insert({key, value});
-  return this;
-}
-
-bool FeatureSet::Contains(const std::string& key) const {
-  return map_.find(key) != map_.end();
-}
-
-const Variant& FeatureSet::Get(const std::string& key) const {
-  auto itr = map_.find(key);
-  assert(itr != map_.end());
-  return itr->second;
-}
-
-FeatureSet::iterator FeatureSet::Find(const std::string& key) const {
-  return iterator(map_.find(key));
-}
-
-void FeatureSet::Clear() { map_.clear(); }
-
-void FeatureSet::Serialize(std::string* output) const {
-  for (const auto& iter : map_) {
-    PutLengthPrefixedSlice(output, iter.first);
-    output->push_back(static_cast<char>(iter.second.type()));
-    switch (iter.second.type()) {
-      case Variant::kNull:
-        break;
-      case Variant::kBool:
-        output->push_back(static_cast<char>(iter.second.get_bool()));
-        break;
-      case Variant::kInt:
-        PutVarint64(output, iter.second.get_int());
-        break;
-      case Variant::kDouble: {
-        PutDouble(output, iter.second.get_double());
-        break;
-      }
-      case Variant::kString:
-        PutLengthPrefixedSlice(output, iter.second.get_string());
-        break;
-      default:
-        assert(false);
-    }
-  }
-}
-
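// Illustrative worked example of the encoding above (not from the original
// file; it assumes Variant::kBool is persisted as the single byte 0x1 and
// that PutLengthPrefixedSlice writes a varint32 length before the bytes):
//
//   FeatureSet().Set("a", Variant(true)) serializes to four bytes:
//     0x01 0x61   length-prefixed key "a"
//     0x01        type byte (Variant::kBool)
//     0x01        bool payload, one byte (true)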
-bool FeatureSet::Deserialize(const Slice& input) {
-  assert(map_.empty());
-  Slice s(input);
-  while (s.size()) {
-    Slice key;
-    if (!GetLengthPrefixedSlice(&s, &key) || s.size() == 0) {
-      return false;
-    }
-    char type = s[0];
-    s.remove_prefix(1);
-    switch (type) {
-      case Variant::kNull: {
-        map_.insert({key.ToString(), Variant()});
-        break;
-      }
-      case Variant::kBool: {
-        if (s.size() == 0) {
-          return false;
-        }
-        map_.insert({key.ToString(), Variant(static_cast<bool>(s[0]))});
-        s.remove_prefix(1);
-        break;
-      }
-      case Variant::kInt: {
-        uint64_t v;
-        if (!GetVarint64(&s, &v)) {
-          return false;
-        }
-        map_.insert({key.ToString(), Variant(v)});
-        break;
-      }
-      case Variant::kDouble: {
-        double d;
-        if (!GetDouble(&s, &d)) {
-          return false;
-        }
-        map_.insert({key.ToString(), Variant(d)});
-        break;
-      }
-      case Variant::kString: {
-        Slice str;
-        if (!GetLengthPrefixedSlice(&s, &str)) {
-          return false;
-        }
-        map_.insert({key.ToString(), str.ToString()});
-        break;
-      }
-      default:
-        return false;
-    }
-  }
-  return true;
-}
-
-std::string FeatureSet::DebugString() const {
-  std::string out = "{";
-  bool comma = false;
-  for (const auto& iter : map_) {
-    if (comma) {
-      out.append(", ");
-    } else {
-      comma = true;
-    }
-    out.append("\"" + iter.first + "\": ");
-    switch (iter.second.type()) {
-      case Variant::kNull:
-        out.append("null");
-        break;
-      case Variant::kBool:
-        if (iter.second.get_bool()) {
-          out.append("true");
-        } else {
-          out.append("false");
-        }
-        break;
-      case Variant::kInt: {
-        char buf[32];
-        snprintf(buf, sizeof(buf), "%" PRIu64, iter.second.get_int());
-        out.append(buf);
-        break;
-      }
-      case Variant::kDouble: {
-        char buf[32];
-        snprintf(buf, sizeof(buf), "%lf", iter.second.get_double());
-        out.append(buf);
-        break;
-      }
-      case Variant::kString:
-        out.append("\"" + iter.second.get_string() + "\"");
-        break;
-      default:
-        assert(false);
-    }
-  }
-  return out + "}";
-}
-
-class ValueGetter {
- public:
-  ValueGetter() {}
-  virtual ~ValueGetter() {}
-
-  virtual bool Get(uint64_t id) = 0;
-  virtual const Slice value() const = 0;
-
-  virtual Status status() const = 0;
-};
-
-class ValueGetterFromDB : public ValueGetter {
- public:
-  ValueGetterFromDB(DB* db, ColumnFamilyHandle* cf) : db_(db), cf_(cf) {}
-
-  virtual bool Get(uint64_t id) override {
-    std::string encoded_id;
-    PutFixed64BigEndian(&encoded_id, id);
-    status_ = db_->Get(ReadOptions(), cf_, encoded_id, &value_);
-    if (status_.IsNotFound()) {
-      status_ = Status::Corruption("Index inconsistency");
-      return false;
-    }
-
-    return true;
-  }
-
-  virtual const Slice value() const override { return value_; }
-
-  virtual Status status() const override { return status_; }
-
- private:
-  std::string value_;
-  DB* db_;
-  ColumnFamilyHandle* cf_;
-  Status status_;
-};
-
-class ValueGetterFromIterator : public ValueGetter {
- public:
-  explicit ValueGetterFromIterator(Iterator* iterator) : iterator_(iterator) {}
-
-  virtual bool Get(uint64_t id) override {
-    std::string encoded_id;
-    PutFixed64BigEndian(&encoded_id, id);
-    iterator_->Seek(encoded_id);
-
-    if (!iterator_->Valid() || iterator_->key() != Slice(encoded_id)) {
-      status_ = Status::Corruption("Index inconsistency");
-      return false;
-    }
-
-    return true;
-  }
-
-  virtual const Slice value() const override { return iterator_->value(); }
-
-  virtual Status status() const override { return status_; }
-
- private:
-  std::unique_ptr<Iterator> iterator_;
-  Status status_;
-};
-
-class SpatialIndexCursor : public Cursor {
- public:
-  // tile_bbox is inclusive
-  SpatialIndexCursor(Iterator* spatial_iterator, ValueGetter* value_getter,
-                     const BoundingBox<uint64_t>& tile_bbox, uint32_t tile_bits)
-      : value_getter_(value_getter), valid_(true) {
-    // calculate quad keys we'll need to query
-    std::vector<uint64_t> quad_keys;
-    quad_keys.reserve(static_cast<size_t>(
-        (tile_bbox.max_x - tile_bbox.min_x + 1) *
-        (tile_bbox.max_y - tile_bbox.min_y + 1)));
-    for (uint64_t x = tile_bbox.min_x; x <= tile_bbox.max_x; ++x) {
-      for (uint64_t y = tile_bbox.min_y; y <= tile_bbox.max_y; ++y) {
-        quad_keys.push_back(GetQuadKeyFromTile(x, y, tile_bits));
-      }
-    }
-    std::sort(quad_keys.begin(), quad_keys.end());
-
-    // load primary key ids for all quad keys
-    for (auto quad_key : quad_keys) {
-      std::string encoded_quad_key;
-      PutFixed64BigEndian(&encoded_quad_key, quad_key);
-      Slice slice_quad_key(encoded_quad_key);
-
-      // If CheckQuadKey is true, there is no need to reseek, since
-      // spatial_iterator is already pointing at the correct quad key. This is
-      // an optimization.
-      if (!CheckQuadKey(spatial_iterator, slice_quad_key)) {
-        spatial_iterator->Seek(slice_quad_key);
-      }
-
-      while (CheckQuadKey(spatial_iterator, slice_quad_key)) {
-        // extract ID from spatial_iterator
-        uint64_t id;
-        bool ok = GetFixed64BigEndian(
-            Slice(spatial_iterator->key().data() + sizeof(uint64_t),
-                  sizeof(uint64_t)),
-            &id);
-        if (!ok) {
-          valid_ = false;
-          status_ = Status::Corruption("Spatial index corruption");
-          break;
-        }
-        primary_key_ids_.insert(id);
-        spatial_iterator->Next();
-      }
-    }
-
-    if (!spatial_iterator->status().ok()) {
-      status_ = spatial_iterator->status();
-      valid_ = false;
-    }
-    delete spatial_iterator;
-
-    valid_ = valid_ && !primary_key_ids_.empty();
-
-    if (valid_) {
-      primary_keys_iterator_ = primary_key_ids_.begin();
-      ExtractData();
-    }
-  }
-
-  virtual bool Valid() const override { return valid_; }
-
-  virtual void Next() override {
-    assert(valid_);
-
-    ++primary_keys_iterator_;
-    if (primary_keys_iterator_ == primary_key_ids_.end()) {
-      valid_ = false;
-      return;
-    }
-
-    ExtractData();
-  }
-
-  virtual const Slice blob() override { return current_blob_; }
-  virtual const FeatureSet& feature_set() override {
-    return current_feature_set_;
-  }
-
-  virtual Status status() const override {
-    if (!status_.ok()) {
-      return status_;
-    }
-    return value_getter_->status();
-  }
-
- private:
-  // * returns true if spatial iterator is on the current quad key and all is
-  //   well
-  // * returns false if spatial iterator is not on current, or iterator is
-  //   invalid or corruption
-  bool CheckQuadKey(Iterator* spatial_iterator, const Slice& quad_key) {
-    if (!spatial_iterator->Valid()) {
-      return false;
-    }
-    if (spatial_iterator->key().size() != 2 * sizeof(uint64_t)) {
-      status_ = Status::Corruption("Invalid spatial index key");
-      valid_ = false;
-      return false;
-    }
-    Slice spatial_iterator_quad_key(spatial_iterator->key().data(),
-                                    sizeof(uint64_t));
-    if (spatial_iterator_quad_key != quad_key) {
-      // caller needs to reseek
-      return false;
-    }
-    // if we come to here, we have found the quad key
-    return true;
-  }
-
-  void ExtractData() {
-    assert(valid_);
-    valid_ = value_getter_->Get(*primary_keys_iterator_);
-
-    if (valid_) {
-      Slice data = value_getter_->value();
-      current_feature_set_.Clear();
-      if (!GetLengthPrefixedSlice(&data, &current_blob_) ||
-          !current_feature_set_.Deserialize(data)) {
-        status_ = Status::Corruption("Primary key column family corruption");
-        valid_ = false;
-      }
-    }
-  }
-
-  std::unique_ptr<ValueGetter> value_getter_;
-  bool valid_;
-  Status status_;
-
-  FeatureSet current_feature_set_;
-  Slice current_blob_;
-
-  // This is loaded from the spatial iterator.
-  std::unordered_set<uint64_t> primary_key_ids_;
-  std::unordered_set<uint64_t>::iterator primary_keys_iterator_;
-};
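// Illustrative sketch (not from the original file) of how a single
// spatial-index entry is assembled under the key format described at the top
// of this file -- both halves are fixed 64-bit big-endian, so entries sort by
// quad key first and id second, and the stored value is empty:
//
//   std::string key;
//   PutFixed64BigEndian(&key, quad_key);  // bytes 0..7:  quad key
//   PutFixed64BigEndian(&key, id);        // bytes 8..15: element id
//   batch.Put(spatial_index_column_family, key, Slice());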
- std::unordered_set<uint64_t> primary_key_ids_; - std::unordered_set<uint64_t>::iterator primary_keys_iterator_; -}; - -class ErrorCursor : public Cursor { - public: - explicit ErrorCursor(Status s) : s_(s) { assert(!s.ok()); } - virtual Status status() const override { return s_; } - virtual bool Valid() const override { return false; } - virtual void Next() override { assert(false); } - - virtual const Slice blob() override { - assert(false); - return Slice(); - } - virtual const FeatureSet& feature_set() override { - assert(false); - // compiler complains otherwise - return trash_; - } - - private: - Status s_; - FeatureSet trash_; -}; - -class SpatialDBImpl : public SpatialDB { - public: - // * db -- base DB that needs to be forwarded to StackableDB - // * data_column_family -- column family used to store the data - // * spatial_indexes -- a list of spatial indexes together with column - // families that correspond to those spatial indexes - // * next_id -- next value of the auto-incrementing ID. This is usually - // `max_id_currently_in_db + 1` - SpatialDBImpl( - DB* db, ColumnFamilyHandle* data_column_family, - const std::vector<std::pair<SpatialIndexOptions, ColumnFamilyHandle*>>& - spatial_indexes, - uint64_t next_id, bool read_only) - : SpatialDB(db), - data_column_family_(data_column_family), - next_id_(next_id), - read_only_(read_only) { - for (const auto& index : spatial_indexes) { - name_to_index_.insert( - {index.first.name, IndexColumnFamily(index.first, index.second)}); - } - } - - ~SpatialDBImpl() { - for (auto& iter : name_to_index_) { - delete iter.second.column_family; - } - delete data_column_family_; - } - - virtual Status Insert( - const WriteOptions& write_options, const BoundingBox<double>& bbox, - const Slice& blob, const FeatureSet& feature_set, - const std::vector<std::string>& spatial_indexes) override { - WriteBatch batch; - - if (spatial_indexes.size() == 0) { - return Status::InvalidArgument("Spatial indexes can't be empty"); - } - - const size_t kWriteOutEveryBytes = 1024 * 1024; // 1MB - uint64_t id = next_id_.fetch_add(1); - - for (const auto& si : spatial_indexes) { - auto itr = name_to_index_.find(si); - if (itr == name_to_index_.end()) { - return Status::InvalidArgument("Can't find index " + si); - } - const auto& spatial_index = itr->second.index; - if (!spatial_index.bbox.Intersects(bbox)) { - continue; - } - BoundingBox<uint64_t> tile_bbox = GetTileBoundingBox(spatial_index, bbox); - - for (uint64_t x = tile_bbox.min_x; x <= tile_bbox.max_x; ++x) { - for (uint64_t y = tile_bbox.min_y; y <= tile_bbox.max_y; ++y) { - // see above for format - std::string key; - PutFixed64BigEndian( - &key, GetQuadKeyFromTile(x, y, spatial_index.tile_bits)); - PutFixed64BigEndian(&key, id); - batch.Put(itr->second.column_family, key, Slice()); - if (batch.GetDataSize() >= kWriteOutEveryBytes) { - Status s = Write(write_options, &batch); - batch.Clear(); - if (!s.ok()) { - return s; - } - } - } - } - } - - // see above for format - std::string data_key; - PutFixed64BigEndian(&data_key, id); - std::string data_value; - PutLengthPrefixedSlice(&data_value, blob); - feature_set.Serialize(&data_value); - batch.Put(data_column_family_, data_key, data_value); - - return Write(write_options, &batch); - } - - virtual Status Compact(int num_threads) override { - std::vector<ColumnFamilyHandle*> column_families; - column_families.push_back(data_column_family_); - - for (auto& iter : name_to_index_) { - column_families.push_back(iter.second.column_family); - } - - std::mutex state_mutex; - std::condition_variable cv; - Status s; - int threads_running = 0; - - std::vector<port::Thread> threads; - - for (auto cfh : 
column_families) { - threads.emplace_back([&, cfh] { - { - std::unique_lock lk(state_mutex); - cv.wait(lk, [&] { return threads_running < num_threads; }); - threads_running++; - } - - Status t = Flush(FlushOptions(), cfh); - if (t.ok()) { - t = CompactRange(CompactRangeOptions(), cfh, nullptr, nullptr); - } - - { - std::unique_lock lk(state_mutex); - threads_running--; - if (s.ok() && !t.ok()) { - s = t; - } - cv.notify_one(); - } - }); - } - - for (auto& t : threads) { - t.join(); - } - - return s; - } - - virtual Cursor* Query(const ReadOptions& read_options, - const BoundingBox& bbox, - const std::string& spatial_index) override { - auto itr = name_to_index_.find(spatial_index); - if (itr == name_to_index_.end()) { - return new ErrorCursor(Status::InvalidArgument( - "Spatial index " + spatial_index + " not found")); - } - const auto& si = itr->second.index; - Iterator* spatial_iterator; - ValueGetter* value_getter; - - if (read_only_) { - spatial_iterator = NewIterator(read_options, itr->second.column_family); - value_getter = new ValueGetterFromDB(this, data_column_family_); - } else { - std::vector iterators; - Status s = NewIterators(read_options, - {data_column_family_, itr->second.column_family}, - &iterators); - if (!s.ok()) { - return new ErrorCursor(s); - } - - spatial_iterator = iterators[1]; - value_getter = new ValueGetterFromIterator(iterators[0]); - } - return new SpatialIndexCursor(spatial_iterator, value_getter, - GetTileBoundingBox(si, bbox), si.tile_bits); - } - - private: - ColumnFamilyHandle* data_column_family_; - struct IndexColumnFamily { - SpatialIndexOptions index; - ColumnFamilyHandle* column_family; - IndexColumnFamily(const SpatialIndexOptions& _index, - ColumnFamilyHandle* _cf) - : index(_index), column_family(_cf) {} - }; - // constant after construction! 
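The `Compact()` implementation above caps parallelism with a condition variable: one thread is spawned per column family, but each waits until fewer than `num_threads` jobs are running before doing its `Flush()` and `CompactRange()`, and the first non-OK status wins. A minimal standalone sketch of that throttling pattern, with a `printf` standing in for the flush/compact work (all names here are illustrative, not from the source):

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
  const int kMaxRunning = 2;  // plays the role of Compact(num_threads)
  const int kJobs = 8;        // plays the role of the column-family list
  std::mutex state_mutex;
  std::condition_variable cv;
  int threads_running = 0;

  std::vector<std::thread> threads;
  for (int job = 0; job < kJobs; ++job) {
    threads.emplace_back([&, job] {
      {
        // Block until a slot frees up, exactly like the gate in Compact().
        std::unique_lock<std::mutex> lk(state_mutex);
        cv.wait(lk, [&] { return threads_running < kMaxRunning; });
        threads_running++;
      }
      std::printf("job %d running\n", job);  // Flush + CompactRange go here
      {
        std::unique_lock<std::mutex> lk(state_mutex);
        threads_running--;
        cv.notify_one();  // wake one waiter now that a slot is free
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return 0;
}
```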
- std::unordered_map name_to_index_; - - std::atomic next_id_; - bool read_only_; -}; - -namespace { -DBOptions GetDBOptionsFromSpatialDBOptions(const SpatialDBOptions& options) { - DBOptions db_options; - db_options.max_open_files = 50000; - db_options.max_background_compactions = 3 * options.num_threads / 4; - db_options.max_background_flushes = - options.num_threads - db_options.max_background_compactions; - db_options.env->SetBackgroundThreads(db_options.max_background_compactions, - Env::LOW); - db_options.env->SetBackgroundThreads(db_options.max_background_flushes, - Env::HIGH); - db_options.statistics = CreateDBStatistics(); - if (options.bulk_load) { - db_options.stats_dump_period_sec = 600; - } else { - db_options.stats_dump_period_sec = 1800; // 30min - } - return db_options; -} - -ColumnFamilyOptions GetColumnFamilyOptions(const SpatialDBOptions& /*options*/, - std::shared_ptr block_cache) { - ColumnFamilyOptions column_family_options; - column_family_options.write_buffer_size = 128 * 1024 * 1024; // 128MB - column_family_options.max_write_buffer_number = 4; - column_family_options.max_bytes_for_level_base = 256 * 1024 * 1024; // 256MB - column_family_options.target_file_size_base = 64 * 1024 * 1024; // 64MB - column_family_options.level0_file_num_compaction_trigger = 2; - column_family_options.level0_slowdown_writes_trigger = 16; - column_family_options.level0_stop_writes_trigger = 32; - // only compress levels >= 2 - column_family_options.compression_per_level.resize( - column_family_options.num_levels); - for (int i = 0; i < column_family_options.num_levels; ++i) { - if (i < 2) { - column_family_options.compression_per_level[i] = kNoCompression; - } else { - column_family_options.compression_per_level[i] = kLZ4Compression; - } - } - BlockBasedTableOptions table_options; - table_options.block_cache = block_cache; - column_family_options.table_factory.reset( - NewBlockBasedTableFactory(table_options)); - return column_family_options; -} - -ColumnFamilyOptions OptimizeOptionsForDataColumnFamily( - ColumnFamilyOptions options, std::shared_ptr block_cache) { - options.prefix_extractor.reset(NewNoopTransform()); - BlockBasedTableOptions block_based_options; - block_based_options.index_type = BlockBasedTableOptions::kHashSearch; - block_based_options.block_cache = block_cache; - options.table_factory.reset(NewBlockBasedTableFactory(block_based_options)); - return options; -} - -} // namespace - -class MetadataStorage { - public: - MetadataStorage(DB* db, ColumnFamilyHandle* cf) : db_(db), cf_(cf) {} - ~MetadataStorage() {} - - // format: - // - Status AddIndex(const SpatialIndexOptions& index) { - std::string encoded_index; - PutDouble(&encoded_index, index.bbox.min_x); - PutDouble(&encoded_index, index.bbox.min_y); - PutDouble(&encoded_index, index.bbox.max_x); - PutDouble(&encoded_index, index.bbox.max_y); - PutVarint32(&encoded_index, index.tile_bits); - return db_->Put(WriteOptions(), cf_, - GetSpatialIndexColumnFamilyName(index.name), encoded_index); - } - - Status GetIndex(const std::string& name, SpatialIndexOptions* dst) { - std::string value; - Status s = db_->Get(ReadOptions(), cf_, - GetSpatialIndexColumnFamilyName(name), &value); - if (!s.ok()) { - return s; - } - dst->name = name; - Slice encoded_index(value); - bool ok = GetDouble(&encoded_index, &(dst->bbox.min_x)); - ok = ok && GetDouble(&encoded_index, &(dst->bbox.min_y)); - ok = ok && GetDouble(&encoded_index, &(dst->bbox.max_x)); - ok = ok && GetDouble(&encoded_index, &(dst->bbox.max_y)); - ok = ok && 
GetVarint32(&encoded_index, &(dst->tile_bits)); - return ok ? Status::OK() : Status::Corruption("Index encoding corrupted"); - } - - private: - DB* db_; - ColumnFamilyHandle* cf_; -}; - -Status SpatialDB::Create( - const SpatialDBOptions& options, const std::string& name, - const std::vector& spatial_indexes) { - DBOptions db_options = GetDBOptionsFromSpatialDBOptions(options); - db_options.create_if_missing = true; - db_options.create_missing_column_families = true; - db_options.error_if_exists = true; - - auto block_cache = NewLRUCache(static_cast(options.cache_size)); - ColumnFamilyOptions column_family_options = - GetColumnFamilyOptions(options, block_cache); - - std::vector column_families; - column_families.push_back(ColumnFamilyDescriptor( - kDefaultColumnFamilyName, - OptimizeOptionsForDataColumnFamily(column_family_options, block_cache))); - column_families.push_back( - ColumnFamilyDescriptor(kMetadataColumnFamilyName, column_family_options)); - - for (const auto& index : spatial_indexes) { - column_families.emplace_back(GetSpatialIndexColumnFamilyName(index.name), - column_family_options); - } - - std::vector handles; - DB* base_db; - Status s = DB::Open(db_options, name, column_families, &handles, &base_db); - if (!s.ok()) { - return s; - } - MetadataStorage metadata(base_db, handles[1]); - for (const auto& index : spatial_indexes) { - s = metadata.AddIndex(index); - if (!s.ok()) { - break; - } - } - - for (auto h : handles) { - delete h; - } - delete base_db; - - return s; -} - -Status SpatialDB::Open(const SpatialDBOptions& options, const std::string& name, - SpatialDB** db, bool read_only) { - DBOptions db_options = GetDBOptionsFromSpatialDBOptions(options); - auto block_cache = NewLRUCache(static_cast(options.cache_size)); - ColumnFamilyOptions column_family_options = - GetColumnFamilyOptions(options, block_cache); - - Status s; - std::vector existing_column_families; - std::vector spatial_indexes; - s = DB::ListColumnFamilies(db_options, name, &existing_column_families); - if (!s.ok()) { - return s; - } - for (const auto& cf_name : existing_column_families) { - Slice spatial_index; - if (GetSpatialIndexName(cf_name, &spatial_index)) { - spatial_indexes.emplace_back(spatial_index.data(), spatial_index.size()); - } - } - - std::vector column_families; - column_families.push_back(ColumnFamilyDescriptor( - kDefaultColumnFamilyName, - OptimizeOptionsForDataColumnFamily(column_family_options, block_cache))); - column_families.push_back( - ColumnFamilyDescriptor(kMetadataColumnFamilyName, column_family_options)); - - for (const auto& index : spatial_indexes) { - column_families.emplace_back(GetSpatialIndexColumnFamilyName(index), - column_family_options); - } - std::vector handles; - DB* base_db; - if (read_only) { - s = DB::OpenForReadOnly(db_options, name, column_families, &handles, - &base_db); - } else { - s = DB::Open(db_options, name, column_families, &handles, &base_db); - } - if (!s.ok()) { - return s; - } - - MetadataStorage metadata(base_db, handles[1]); - - std::vector> index_cf; - assert(handles.size() == spatial_indexes.size() + 2); - for (size_t i = 0; i < spatial_indexes.size(); ++i) { - SpatialIndexOptions index_options; - s = metadata.GetIndex(spatial_indexes[i], &index_options); - if (!s.ok()) { - break; - } - index_cf.emplace_back(index_options, handles[i + 2]); - } - uint64_t next_id = 1; - if (s.ok()) { - // find next_id - Iterator* iter = base_db->NewIterator(ReadOptions(), handles[0]); - iter->SeekToLast(); - if (iter->Valid()) { - uint64_t last_id = 0; - if 
(!GetFixed64BigEndian(iter->key(), &last_id)) { - s = Status::Corruption("Invalid key in data column family"); - } else { - next_id = last_id + 1; - } - } - delete iter; - } - if (!s.ok()) { - for (auto h : handles) { - delete h; - } - delete base_db; - return s; - } - - // I don't need metadata column family any more, so delete it - delete handles[1]; - *db = new SpatialDBImpl(base_db, handles[0], index_cf, next_id, read_only); - return Status::OK(); -} - -} // namespace spatial -} // namespace rocksdb -#endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/spatialdb/spatial_db_test.cc b/ceph/src/rocksdb/utilities/spatialdb/spatial_db_test.cc deleted file mode 100644 index 783b347d0..000000000 --- a/ceph/src/rocksdb/utilities/spatialdb/spatial_db_test.cc +++ /dev/null @@ -1,307 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). - -#ifndef ROCKSDB_LITE - -#include -#include -#include - -#include "rocksdb/utilities/spatial_db.h" -#include "util/compression.h" -#include "util/testharness.h" -#include "util/testutil.h" -#include "util/random.h" - -namespace rocksdb { -namespace spatial { - -class SpatialDBTest : public testing::Test { - public: - SpatialDBTest() { - dbname_ = test::PerThreadDBPath("spatial_db_test"); - DestroyDB(dbname_, Options()); - } - - void AssertCursorResults(BoundingBox bbox, const std::string& index, - const std::vector& blobs) { - Cursor* c = db_->Query(ReadOptions(), bbox, index); - ASSERT_OK(c->status()); - std::multiset b; - for (auto x : blobs) { - b.insert(x); - } - - while (c->Valid()) { - auto itr = b.find(c->blob().ToString()); - ASSERT_TRUE(itr != b.end()); - b.erase(itr); - c->Next(); - } - ASSERT_EQ(b.size(), 0U); - ASSERT_OK(c->status()); - delete c; - } - - std::string dbname_; - SpatialDB* db_; -}; - -TEST_F(SpatialDBTest, FeatureSetSerializeTest) { - if (!LZ4_Supported()) { - return; - } - FeatureSet fs; - - fs.Set("a", std::string("b")); - fs.Set("x", static_cast(3)); - fs.Set("y", false); - fs.Set("n", Variant()); // null - fs.Set("m", 3.25); - - ASSERT_TRUE(fs.Find("w") == fs.end()); - ASSERT_TRUE(fs.Find("x") != fs.end()); - ASSERT_TRUE((*fs.Find("x")).second == Variant(static_cast(3))); - ASSERT_TRUE((*fs.Find("y")).second != Variant(true)); - std::set keys({"a", "x", "y", "n", "m"}); - for (const auto& x : fs) { - ASSERT_TRUE(keys.find(x.first) != keys.end()); - keys.erase(x.first); - } - ASSERT_EQ(keys.size(), 0U); - - std::string serialized; - fs.Serialize(&serialized); - - FeatureSet deserialized; - ASSERT_TRUE(deserialized.Deserialize(serialized)); - - ASSERT_TRUE(deserialized.Contains("a")); - ASSERT_EQ(deserialized.Get("a").type(), Variant::kString); - ASSERT_EQ(deserialized.Get("a").get_string(), "b"); - ASSERT_TRUE(deserialized.Contains("x")); - ASSERT_EQ(deserialized.Get("x").type(), Variant::kInt); - ASSERT_EQ(deserialized.Get("x").get_int(), static_cast(3)); - ASSERT_TRUE(deserialized.Contains("y")); - ASSERT_EQ(deserialized.Get("y").type(), Variant::kBool); - ASSERT_EQ(deserialized.Get("y").get_bool(), false); - ASSERT_TRUE(deserialized.Contains("n")); - ASSERT_EQ(deserialized.Get("n").type(), Variant::kNull); - ASSERT_TRUE(deserialized.Contains("m")); - ASSERT_EQ(deserialized.Get("m").type(), Variant::kDouble); - ASSERT_EQ(deserialized.Get("m").get_double(), 3.25); - - // corrupted serialization - serialized = 
serialized.substr(0, serialized.size() - 4); - deserialized.Clear(); - ASSERT_TRUE(!deserialized.Deserialize(serialized)); -} - -TEST_F(SpatialDBTest, TestNextID) { - if (!LZ4_Supported()) { - return; - } - ASSERT_OK(SpatialDB::Create( - SpatialDBOptions(), dbname_, - {SpatialIndexOptions("simple", BoundingBox(0, 0, 100, 100), 2)})); - - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_)); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(5, 5, 10, 10), - "one", FeatureSet(), {"simple"})); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(10, 10, 15, 15), - "two", FeatureSet(), {"simple"})); - delete db_; - db_ = nullptr; - - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_)); - assert(db_ != nullptr); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(55, 55, 65, 65), - "three", FeatureSet(), {"simple"})); - delete db_; - - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_)); - AssertCursorResults(BoundingBox(0, 0, 100, 100), "simple", - {"one", "two", "three"}); - delete db_; -} - -TEST_F(SpatialDBTest, FeatureSetTest) { - if (!LZ4_Supported()) { - return; - } - ASSERT_OK(SpatialDB::Create( - SpatialDBOptions(), dbname_, - {SpatialIndexOptions("simple", BoundingBox(0, 0, 100, 100), 2)})); - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_)); - - FeatureSet fs; - fs.Set("a", std::string("b")); - fs.Set("c", std::string("d")); - - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(5, 5, 10, 10), - "one", fs, {"simple"})); - - Cursor* c = - db_->Query(ReadOptions(), BoundingBox(5, 5, 10, 10), "simple"); - - ASSERT_TRUE(c->Valid()); - ASSERT_EQ(c->blob().compare("one"), 0); - FeatureSet returned = c->feature_set(); - ASSERT_TRUE(returned.Contains("a")); - ASSERT_TRUE(!returned.Contains("b")); - ASSERT_TRUE(returned.Contains("c")); - ASSERT_EQ(returned.Get("a").type(), Variant::kString); - ASSERT_EQ(returned.Get("a").get_string(), "b"); - ASSERT_EQ(returned.Get("c").type(), Variant::kString); - ASSERT_EQ(returned.Get("c").get_string(), "d"); - - c->Next(); - ASSERT_TRUE(!c->Valid()); - - delete c; - delete db_; -} - -TEST_F(SpatialDBTest, SimpleTest) { - if (!LZ4_Supported()) { - return; - } - // iter 0 -- not read only - // iter 1 -- read only - for (int iter = 0; iter < 2; ++iter) { - DestroyDB(dbname_, Options()); - ASSERT_OK(SpatialDB::Create( - SpatialDBOptions(), dbname_, - {SpatialIndexOptions("index", BoundingBox(0, 0, 128, 128), - 3)})); - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_)); - assert(db_ != nullptr); - - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(33, 17, 63, 79), - "one", FeatureSet(), {"index"})); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(65, 65, 111, 111), - "two", FeatureSet(), {"index"})); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(1, 49, 127, 63), - "three", FeatureSet(), {"index"})); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(20, 100, 21, 101), - "four", FeatureSet(), {"index"})); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(81, 33, 127, 63), - "five", FeatureSet(), {"index"})); - ASSERT_OK(db_->Insert(WriteOptions(), BoundingBox(1, 65, 47, 95), - "six", FeatureSet(), {"index"})); - - if (iter == 1) { - delete db_; - db_ = nullptr; - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_, true)); - } - - AssertCursorResults(BoundingBox(33, 17, 47, 31), "index", {"one"}); - AssertCursorResults(BoundingBox(17, 33, 79, 63), "index", - {"one", "three"}); - AssertCursorResults(BoundingBox(17, 81, 63, 111), "index", - {"four", "six"}); - 
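One of the `SimpleTest` assertions just below queries the box (18, 98)-(19, 99) and still gets `"four"` back, even though that box does not intersect `"four"`'s box (20, 100)-(21, 101). The index is only tile-granular: the test uses `tile_bits = 3` over [0, 128), i.e. an 8x8 grid of 16x16 tiles, and both boxes land in the same tile. A quick standalone recomputation (`TileOf` is a local mirror of `GetTileFromCoord` from the deleted utils.h):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Mirrors GetTileFromCoord from the deleted utils.h: map a coordinate in
// [start, end) onto one of 2^tile_bits tiles.
uint64_t TileOf(double x, double start, double end, uint32_t tile_bits) {
  if (x < start) return 0;
  uint64_t tiles = 1ull << tile_bits;
  uint64_t t = static_cast<uint64_t>(((x - start) / (end - start)) * tiles);
  return std::min(t, tiles - 1);
}

int main() {
  // "four" sits at (20,100)-(21,101); the query box is (18,98)-(19,99).
  // Both resolve to tile (1, 6), so the index scan reports "four" anyway:
  // a tile-granularity false positive.
  std::printf("four:  (%llu, %llu)\n",
              (unsigned long long)TileOf(20, 0, 128, 3),
              (unsigned long long)TileOf(100, 0, 128, 3));
  std::printf("query: (%llu, %llu)\n",
              (unsigned long long)TileOf(18, 0, 128, 3),
              (unsigned long long)TileOf(98, 0, 128, 3));
  return 0;
}
```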
AssertCursorResults(BoundingBox(85, 86, 85, 86), "index", {"two"}); - AssertCursorResults(BoundingBox(33, 1, 127, 111), "index", - {"one", "two", "three", "five", "six"}); - // even though the bounding box doesn't intersect, we got "four" back - // because - // it's in the same tile - AssertCursorResults(BoundingBox(18, 98, 19, 99), "index", {"four"}); - AssertCursorResults(BoundingBox(130, 130, 131, 131), "index", {}); - AssertCursorResults(BoundingBox(81, 17, 127, 31), "index", {}); - AssertCursorResults(BoundingBox(90, 50, 91, 51), "index", - {"three", "five"}); - - delete db_; - db_ = nullptr; - } -} - -namespace { -std::string RandomStr(Random* rnd) { - std::string r; - for (int k = 0; k < 10; ++k) { - r.push_back(static_cast(rnd->Uniform(26)) + 'a'); - } - return r; -} - -BoundingBox RandomBoundingBox(int limit, Random* rnd, int max_size) { - BoundingBox r; - r.min_x = rnd->Uniform(limit - 1); - r.min_y = rnd->Uniform(limit - 1); - r.max_x = r.min_x + rnd->Uniform(std::min(limit - 1 - r.min_x, max_size)) + 1; - r.max_y = r.min_y + rnd->Uniform(std::min(limit - 1 - r.min_y, max_size)) + 1; - return r; -} - -BoundingBox ScaleBB(BoundingBox b, double step) { - return BoundingBox(b.min_x * step + 1, b.min_y * step + 1, - (b.max_x + 1) * step - 1, - (b.max_y + 1) * step - 1); -} - -} // namespace - -TEST_F(SpatialDBTest, RandomizedTest) { - if (!LZ4_Supported()) { - return; - } - Random rnd(301); - std::vector>> elements; - - BoundingBox spatial_index_bounds(0, 0, (1LL << 32), (1LL << 32)); - ASSERT_OK(SpatialDB::Create( - SpatialDBOptions(), dbname_, - {SpatialIndexOptions("index", spatial_index_bounds, 7)})); - ASSERT_OK(SpatialDB::Open(SpatialDBOptions(), dbname_, &db_)); - double step = (1LL << 32) / (1 << 7); - - for (int i = 0; i < 1000; ++i) { - std::string blob = RandomStr(&rnd); - BoundingBox bbox = RandomBoundingBox(128, &rnd, 10); - ASSERT_OK(db_->Insert(WriteOptions(), ScaleBB(bbox, step), blob, - FeatureSet(), {"index"})); - elements.push_back(make_pair(blob, bbox)); - } - - // parallel - db_->Compact(2); - // serial - db_->Compact(1); - - for (int i = 0; i < 1000; ++i) { - BoundingBox int_bbox = RandomBoundingBox(128, &rnd, 10); - BoundingBox double_bbox = ScaleBB(int_bbox, step); - std::vector blobs; - for (auto e : elements) { - if (e.second.Intersects(int_bbox)) { - blobs.push_back(e.first); - } - } - AssertCursorResults(double_bbox, "index", blobs); - } - - delete db_; -} - -} // namespace spatial -} // namespace rocksdb - -int main(int argc, char** argv) { - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} - -#else -#include - -int main(int /*argc*/, char** /*argv*/) { - fprintf(stderr, "SKIPPED as SpatialDB is not supported in ROCKSDB_LITE\n"); - return 0; -} - -#endif // !ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/spatialdb/utils.h b/ceph/src/rocksdb/utilities/spatialdb/utils.h deleted file mode 100644 index fe4b4e253..000000000 --- a/ceph/src/rocksdb/utilities/spatialdb/utils.h +++ /dev/null @@ -1,95 +0,0 @@ -// Copyright (c) 2011-present, Facebook, Inc. All rights reserved. -// This source code is licensed under both the GPLv2 (found in the -// COPYING file in the root directory) and Apache 2.0 License -// (found in the LICENSE.Apache file in the root directory). 
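The utils.h deleted next derives a Z-order (Morton) quadkey by interleaving the bits of the x and y tile coordinates, following the Bing Maps tile scheme cited in its first comment. Interleaving keeps spatially adjacent tiles numerically close, so their index entries cluster in the same SST blocks. A standalone re-derivation under that assumption (`Interleave` is a local copy of the `GetQuadKeyFromTile` logic shown below):

```cpp
#include <cstdint>
#include <cstdio>

// Bit i of x lands at position 2*i and bit i of y at position 2*i + 1,
// matching GetQuadKeyFromTile in the deleted header.
uint64_t Interleave(uint64_t x, uint64_t y, uint32_t tile_bits) {
  uint64_t quad_key = 0;
  for (uint32_t i = 0; i < tile_bits; ++i) {
    uint64_t mask = 1ull << i;
    quad_key |= (x & mask) << i;
    quad_key |= (y & mask) << (i + 1);
  }
  return quad_key;
}

int main() {
  // Horizontal neighbors (2,3) and (3,3) map to consecutive quadkeys 14 and
  // 15, which is why scanning one tile touches a contiguous key range.
  std::printf("%llu %llu\n", (unsigned long long)Interleave(2, 3, 3),
              (unsigned long long)Interleave(3, 3, 3));
  return 0;
}
```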
- -#pragma once -#include -#include - -#include "rocksdb/utilities/spatial_db.h" - -namespace rocksdb { -namespace spatial { - -// indexing idea from http://msdn.microsoft.com/en-us/library/bb259689.aspx -inline uint64_t GetTileFromCoord(double x, double start, double end, - uint32_t tile_bits) { - if (x < start) { - return 0; - } - uint64_t tiles = 1ull << tile_bits; - uint64_t r = static_cast(((x - start) / (end - start)) * tiles); - return std::min(r, tiles - 1); -} - -inline uint64_t GetQuadKeyFromTile(uint64_t tile_x, uint64_t tile_y, - uint32_t tile_bits) { - uint64_t quad_key = 0; - for (uint32_t i = 0; i < tile_bits; ++i) { - uint64_t mask = (1ull << i); - quad_key |= (tile_x & mask) << i; - quad_key |= (tile_y & mask) << (i + 1); - } - return quad_key; -} - -inline BoundingBox GetTileBoundingBox( - const SpatialIndexOptions& spatial_index, BoundingBox bbox) { - return BoundingBox( - GetTileFromCoord(bbox.min_x, spatial_index.bbox.min_x, - spatial_index.bbox.max_x, spatial_index.tile_bits), - GetTileFromCoord(bbox.min_y, spatial_index.bbox.min_y, - spatial_index.bbox.max_y, spatial_index.tile_bits), - GetTileFromCoord(bbox.max_x, spatial_index.bbox.min_x, - spatial_index.bbox.max_x, spatial_index.tile_bits), - GetTileFromCoord(bbox.max_y, spatial_index.bbox.min_y, - spatial_index.bbox.max_y, spatial_index.tile_bits)); -} - -// big endian can be compared using memcpy -inline void PutFixed64BigEndian(std::string* dst, uint64_t value) { - char buf[sizeof(value)]; - buf[0] = (value >> 56) & 0xff; - buf[1] = (value >> 48) & 0xff; - buf[2] = (value >> 40) & 0xff; - buf[3] = (value >> 32) & 0xff; - buf[4] = (value >> 24) & 0xff; - buf[5] = (value >> 16) & 0xff; - buf[6] = (value >> 8) & 0xff; - buf[7] = value & 0xff; - dst->append(buf, sizeof(buf)); -} - -// big endian can be compared using memcpy -inline bool GetFixed64BigEndian(const Slice& input, uint64_t* value) { - if (input.size() < sizeof(uint64_t)) { - return false; - } - auto ptr = input.data(); - *value = (static_cast(static_cast(ptr[0])) << 56) | - (static_cast(static_cast(ptr[1])) << 48) | - (static_cast(static_cast(ptr[2])) << 40) | - (static_cast(static_cast(ptr[3])) << 32) | - (static_cast(static_cast(ptr[4])) << 24) | - (static_cast(static_cast(ptr[5])) << 16) | - (static_cast(static_cast(ptr[6])) << 8) | - static_cast(static_cast(ptr[7])); - return true; -} - -inline void PutDouble(std::string* dst, double d) { - dst->append(reinterpret_cast(&d), sizeof(double)); -} - -inline bool GetDouble(Slice* input, double* d) { - if (input->size() < sizeof(double)) { - return false; - } - memcpy(d, input->data(), sizeof(double)); - input->remove_prefix(sizeof(double)); - return true; -} - -} // namespace spatial -} // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/table_properties_collectors/compact_on_deletion_collector_test.cc b/ceph/src/rocksdb/utilities/table_properties_collectors/compact_on_deletion_collector_test.cc index e54d164e3..101aa988b 100644 --- a/ceph/src/rocksdb/utilities/table_properties_collectors/compact_on_deletion_collector_test.cc +++ b/ceph/src/rocksdb/utilities/table_properties_collectors/compact_on_deletion_collector_test.cc @@ -40,7 +40,7 @@ int main(int /*argc*/, char** /*argv*/) { // randomize tests rocksdb::Random rnd(301); const int kMaxTestSize = 100000l; - for (int random_test = 0; random_test < 50; random_test++) { + for (int random_test = 0; random_test < 30; random_test++) { int window_size = rnd.Uniform(kMaxTestSize) + 1; int deletion_trigger = rnd.Uniform(window_size); 
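The `PutFixed64BigEndian`/`GetFixed64BigEndian` helpers above are big-endian for the reason their comments give: big-endian values "can be compared using memcpy", so the index column families can keep RocksDB's default bytewise comparator. A minimal check of that property (`PutBig` is a local stand-in for the deleted helper):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Local stand-in for PutFixed64BigEndian from the deleted utils.h.
static void PutBig(std::string* dst, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((v >> shift) & 0xff));
  }
}

int main() {
  std::string a, b;
  PutBig(&a, 1);
  PutBig(&b, 256);
  // Big-endian keys compare bytewise in numeric order. A little-endian
  // encoding would order 256 ("\x00\x01...") before 1 ("\x01\x00...") and
  // break range scans over quadkeys.
  assert(std::memcmp(a.data(), b.data(), sizeof(uint64_t)) < 0);
  return 0;
}
```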
window_sizes.emplace_back(window_size); diff --git a/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.cc b/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.cc index 36baefc7b..4a81516a8 100644 --- a/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.cc +++ b/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.cc @@ -83,16 +83,18 @@ Status FileTraceWriter::Write(const Slice& data) { return file_writer_->Append(data); } +uint64_t FileTraceWriter::GetFileSize() { return file_writer_->GetFileSize(); } + Status NewFileTraceReader(Env* env, const EnvOptions& env_options, const std::string& trace_filename, std::unique_ptr* trace_reader) { - unique_ptr trace_file; + std::unique_ptr trace_file; Status s = env->NewRandomAccessFile(trace_filename, &trace_file, env_options); if (!s.ok()) { return s; } - unique_ptr file_reader; + std::unique_ptr file_reader; file_reader.reset( new RandomAccessFileReader(std::move(trace_file), trace_filename)); trace_reader->reset(new FileTraceReader(std::move(file_reader))); @@ -102,13 +104,13 @@ Status NewFileTraceReader(Env* env, const EnvOptions& env_options, Status NewFileTraceWriter(Env* env, const EnvOptions& env_options, const std::string& trace_filename, std::unique_ptr* trace_writer) { - unique_ptr trace_file; + std::unique_ptr trace_file; Status s = env->NewWritableFile(trace_filename, &trace_file, env_options); if (!s.ok()) { return s; } - unique_ptr file_writer; + std::unique_ptr file_writer; file_writer.reset(new WritableFileWriter(std::move(trace_file), trace_filename, env_options)); trace_writer->reset(new FileTraceWriter(std::move(file_writer))); diff --git a/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.h b/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.h index b363a3f09..863f5d9d0 100644 --- a/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.h +++ b/ceph/src/rocksdb/utilities/trace/file_trace_reader_writer.h @@ -22,7 +22,7 @@ class FileTraceReader : public TraceReader { virtual Status Close() override; private: - unique_ptr file_reader_; + std::unique_ptr file_reader_; Slice result_; size_t offset_; char* const buffer_; @@ -39,9 +39,10 @@ class FileTraceWriter : public TraceWriter { virtual Status Write(const Slice& data) override; virtual Status Close() override; + virtual uint64_t GetFileSize() override; private: - unique_ptr file_writer_; + std::unique_ptr file_writer_; }; } // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.cc b/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.cc index 89d3226d5..48c9180ae 100644 --- a/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.cc +++ b/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.cc @@ -80,8 +80,11 @@ Status OptimisticTransaction::Rollback() { // 'exclusive' is unused for OptimisticTransaction. 
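The `TryLock` change that follows, and is threaded through the rest of this patch, replaces the single `untracked`/`skip_validate` flag with two explicit parameters: `do_validate`, which controls snapshot validation, and `assume_tracked`, by which the caller promises the key is already locked by this transaction. A hypothetical caller-side sketch against the public `Transaction` API as amended in this patch (`UpdateUnderLock` is an illustrative name; error handling elided):

```cpp
#include <cassert>
#include <string>

#include "rocksdb/utilities/transaction.h"

// txn has a snapshot set; cf is the target column family.
void UpdateUnderLock(rocksdb::Transaction* txn,
                     rocksdb::ColumnFamilyHandle* cf) {
  std::string value;
  rocksdb::ReadOptions ropts;  // must NOT carry a snapshot here: the patch
                               // rejects do_validate=false plus a read
                               // snapshot with Status::InvalidArgument.
  // Lock the key but skip snapshot validation (do_validate = false); this
  // reads the latest committed value even if it postdates txn's snapshot.
  rocksdb::Status s = txn->GetForUpdate(ropts, cf, "foo", &value,
                                        /*exclusive=*/true,
                                        /*do_validate=*/false);
  assert(s.ok());
  // The key is now tracked by txn, so the write may declare that and skip
  // the validation/locking work (assume_tracked = true).
  s = txn->Put(cf, "foo", "bar", /*assume_tracked=*/true);
  assert(s.ok());
}
```

This mirrors the new `AssumeExclusiveTracked` test later in this hunk; note that the pessimistic implementation asserts `!assume_tracked || !do_validate` and returns `InvalidArgument` if `assume_tracked` is claimed for a key that was never actually tracked.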
Status OptimisticTransaction::TryLock(ColumnFamilyHandle* column_family, const Slice& key, bool read_only, - bool exclusive, bool untracked) { - if (untracked) { + bool exclusive, const bool do_validate, + const bool assume_tracked) { + assert(!assume_tracked); // not supported + (void)assume_tracked; + if (!do_validate) { return Status::OK(); } uint32_t cfh_id = GetColumnFamilyID(column_family); diff --git a/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.h b/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.h index 5a19489f2..445979b96 100644 --- a/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.h +++ b/ceph/src/rocksdb/utilities/transactions/optimistic_transaction.h @@ -48,8 +48,8 @@ class OptimisticTransaction : public TransactionBaseImpl { protected: Status TryLock(ColumnFamilyHandle* column_family, const Slice& key, - bool read_only, bool exclusive, - bool untracked = false) override; + bool read_only, bool exclusive, const bool do_validate = true, + const bool assume_tracked = false) override; private: OptimisticTransactionDB* const txn_db_; diff --git a/ceph/src/rocksdb/utilities/transactions/optimistic_transaction_test.cc b/ceph/src/rocksdb/utilities/transactions/optimistic_transaction_test.cc index 2c196d43b..fbb0d44fd 100644 --- a/ceph/src/rocksdb/utilities/transactions/optimistic_transaction_test.cc +++ b/ceph/src/rocksdb/utilities/transactions/optimistic_transaction_test.cc @@ -37,7 +37,7 @@ class OptimisticTransactionTest : public testing::Test { DestroyDB(dbname, options); Open(); } - ~OptimisticTransactionTest() { + ~OptimisticTransactionTest() override { delete txn_db; DestroyDB(dbname, options); } diff --git a/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.cc b/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.cc index 67a333f3b..042ae95ab 100644 --- a/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.cc +++ b/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.cc @@ -37,7 +37,7 @@ TransactionID PessimisticTransaction::GenTxnID() { PessimisticTransaction::PessimisticTransaction( TransactionDB* txn_db, const WriteOptions& write_options, - const TransactionOptions& txn_options) + const TransactionOptions& txn_options, const bool init) : TransactionBaseImpl(txn_db->GetRootDB(), write_options), txn_db_impl_(nullptr), expiration_time_(0), @@ -51,7 +51,9 @@ PessimisticTransaction::PessimisticTransaction( txn_db_impl_ = static_cast_with_check(txn_db); db_impl_ = static_cast_with_check(db_); - Initialize(txn_options); + if (init) { + Initialize(txn_options); + } } void PessimisticTransaction::Initialize(const TransactionOptions& txn_options) { @@ -193,22 +195,14 @@ Status PessimisticTransaction::Prepare() { } if (can_prepare) { - bool wal_already_marked = false; txn_state_.store(AWAITING_PREPARE); // transaction can't expire after preparation expiration_time_ = 0; - if (log_number_ > 0) { - assert(txn_db_impl_->GetTxnDBOptions().write_policy == WRITE_UNPREPARED); - wal_already_marked = true; - } + assert(log_number_ == 0 || + txn_db_impl_->GetTxnDBOptions().write_policy == WRITE_UNPREPARED); s = PrepareInternal(); if (s.ok()) { - assert(log_number_ != 0); - if (!wal_already_marked) { - dbimpl_->logs_with_prep_tracker()->MarkLogAsContainingPrepSection( - log_number_); - } txn_state_.store(PREPARED); } } else if (txn_state_ == LOCKS_STOLEN) { @@ -230,10 +224,38 @@ Status WriteCommittedTxn::PrepareInternal() { WriteOptions write_options = write_options_; write_options.disableWAL = 
false; WriteBatchInternal::MarkEndPrepare(GetWriteBatch()->GetWriteBatch(), name_); - Status s = - db_impl_->WriteImpl(write_options, GetWriteBatch()->GetWriteBatch(), - /*callback*/ nullptr, &log_number_, /*log ref*/ 0, - /* disable_memtable*/ true); + class MarkLogCallback : public PreReleaseCallback { + public: + MarkLogCallback(DBImpl* db, bool two_write_queues) + : db_(db), two_write_queues_(two_write_queues) { + (void)two_write_queues_; // to silence unused private field warning + } + virtual Status Callback(SequenceNumber, bool is_mem_disabled, + uint64_t log_number) override { +#ifdef NDEBUG + (void)is_mem_disabled; +#endif + assert(log_number != 0); + assert(!two_write_queues_ || is_mem_disabled); // implies the 2nd queue + db_->logs_with_prep_tracker()->MarkLogAsContainingPrepSection(log_number); + return Status::OK(); + } + + private: + DBImpl* db_; + bool two_write_queues_; + } mark_log_callback(db_impl_, + db_impl_->immutable_db_options().two_write_queues); + + WriteCallback* const kNoWriteCallback = nullptr; + const uint64_t kRefNoLog = 0; + const bool kDisableMemtable = true; + SequenceNumber* const KIgnoreSeqUsed = nullptr; + const size_t kNoBatchCount = 0; + Status s = db_impl_->WriteImpl( + write_options, GetWriteBatch()->GetWriteBatch(), kNoWriteCallback, + &log_number_, kRefNoLog, kDisableMemtable, KIgnoreSeqUsed, kNoBatchCount, + &mark_log_callback); return s; } @@ -322,12 +344,27 @@ Status PessimisticTransaction::Commit() { } Status WriteCommittedTxn::CommitWithoutPrepareInternal() { - Status s = db_->Write(write_options_, GetWriteBatch()->GetWriteBatch()); + uint64_t seq_used = kMaxSequenceNumber; + auto s = + db_impl_->WriteImpl(write_options_, GetWriteBatch()->GetWriteBatch(), + /*callback*/ nullptr, /*log_used*/ nullptr, + /*log_ref*/ 0, /*disable_memtable*/ false, &seq_used); + assert(!s.ok() || seq_used != kMaxSequenceNumber); + if (s.ok()) { + SetId(seq_used); + } return s; } Status WriteCommittedTxn::CommitBatchInternal(WriteBatch* batch, size_t) { - Status s = db_->Write(write_options_, batch); + uint64_t seq_used = kMaxSequenceNumber; + auto s = db_impl_->WriteImpl(write_options_, batch, /*callback*/ nullptr, + /*log_used*/ nullptr, /*log_ref*/ 0, + /*disable_memtable*/ false, &seq_used); + assert(!s.ok() || seq_used != kMaxSequenceNumber); + if (s.ok()) { + SetId(seq_used); + } return s; } @@ -345,8 +382,15 @@ Status WriteCommittedTxn::CommitInternal() { // in non recovery mode and simply insert the values WriteBatchInternal::Append(working_batch, GetWriteBatch()->GetWriteBatch()); - auto s = db_impl_->WriteImpl(write_options_, working_batch, nullptr, nullptr, - log_number_); + uint64_t seq_used = kMaxSequenceNumber; + auto s = + db_impl_->WriteImpl(write_options_, working_batch, /*callback*/ nullptr, + /*log_used*/ nullptr, /*log_ref*/ log_number_, + /*disable_memtable*/ false, &seq_used); + assert(!s.ok() || seq_used != kMaxSequenceNumber); + if (s.ok()) { + SetId(seq_used); + } return s; } @@ -435,18 +479,17 @@ Status PessimisticTransaction::LockBatch(WriteBatch* batch, } } - virtual Status PutCF(uint32_t column_family_id, const Slice& key, - const Slice& /* unused */) override { + Status PutCF(uint32_t column_family_id, const Slice& key, + const Slice& /* unused */) override { RecordKey(column_family_id, key); return Status::OK(); } - virtual Status MergeCF(uint32_t column_family_id, const Slice& key, - const Slice& /* unused */) override { + Status MergeCF(uint32_t column_family_id, const Slice& key, + const Slice& /* unused */) override { 
RecordKey(column_family_id, key); return Status::OK(); } - virtual Status DeleteCF(uint32_t column_family_id, - const Slice& key) override { + Status DeleteCF(uint32_t column_family_id, const Slice& key) override { RecordKey(column_family_id, key); return Status::OK(); } @@ -493,7 +536,9 @@ Status PessimisticTransaction::LockBatch(WriteBatch* batch, // the snapshot time. Status PessimisticTransaction::TryLock(ColumnFamilyHandle* column_family, const Slice& key, bool read_only, - bool exclusive, bool skip_validate) { + bool exclusive, const bool do_validate, + const bool assume_tracked) { + assert(!assume_tracked || !do_validate); Status s; if (UNLIKELY(skip_concurrency_control_)) { return s; @@ -537,7 +582,11 @@ Status PessimisticTransaction::TryLock(ColumnFamilyHandle* column_family, // any writes since this transaction's snapshot. // TODO(agiardullo): could optimize by supporting shared txn locks in the // future - if (skip_validate || snapshot_ == nullptr) { + if (!do_validate || snapshot_ == nullptr) { + if (assume_tracked && !previously_locked) { + s = Status::InvalidArgument( + "assume_tracked is set but it is not tracked yet"); + } // Need to remember the earliest sequence number that we know that this // key has not been modified after. This is useful if this same // transaction @@ -606,7 +655,7 @@ Status PessimisticTransaction::ValidateSnapshot( // Otherwise we have either // 1: tracked_at_seq == kMaxSequenceNumber, i.e., first time tracking the key // 2: snap_seq < tracked_at_seq: last time we lock the key was via - // skip_validate option which means we had skipped ValidateSnapshot. In both + // do_validate=false which means we had skipped ValidateSnapshot. In both // cases we should do ValidateSnapshot now. *tracked_at_seq = snap_seq; diff --git a/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.h b/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.h index 145d561fb..1f851818e 100644 --- a/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.h +++ b/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction.h @@ -38,7 +38,8 @@ class PessimisticTransactionDB; class PessimisticTransaction : public TransactionBaseImpl { public: PessimisticTransaction(TransactionDB* db, const WriteOptions& write_options, - const TransactionOptions& txn_options); + const TransactionOptions& txn_options, + const bool init = true); virtual ~PessimisticTransaction(); @@ -135,8 +136,8 @@ class PessimisticTransaction : public TransactionBaseImpl { Status LockBatch(WriteBatch* batch, TransactionKeyMap* keys_to_unlock); Status TryLock(ColumnFamilyHandle* column_family, const Slice& key, - bool read_only, bool exclusive, - bool skip_validate = false) override; + bool read_only, bool exclusive, const bool do_validate = true, + const bool assume_tracked = false) override; void Clear() override; diff --git a/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction_db.cc b/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction_db.cc index 6b016ef72..8eb21777a 100644 --- a/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction_db.cc +++ b/ceph/src/rocksdb/utilities/transactions/pessimistic_transaction_db.cc @@ -146,7 +146,9 @@ Status PessimisticTransactionDB::Initialize( assert(real_trx); real_trx->SetLogNumber(batch_info.log_number_); assert(seq != kMaxSequenceNumber); - real_trx->SetId(seq); + if (GetTxnDBOptions().write_policy != WRITE_COMMITTED) { + real_trx->SetId(seq); + } s = real_trx->SetName(recovered_trx->name_); if (!s.ok()) { diff 
--git a/ceph/src/rocksdb/utilities/transactions/snapshot_checker.cc b/ceph/src/rocksdb/utilities/transactions/snapshot_checker.cc index 689502908..695c020ef 100644 --- a/ceph/src/rocksdb/utilities/transactions/snapshot_checker.cc +++ b/ceph/src/rocksdb/utilities/transactions/snapshot_checker.cc @@ -17,11 +17,11 @@ namespace rocksdb { WritePreparedSnapshotChecker::WritePreparedSnapshotChecker( WritePreparedTxnDB* /*txn_db*/) {} -bool WritePreparedSnapshotChecker::IsInSnapshot( +SnapshotCheckerResult WritePreparedSnapshotChecker::CheckInSnapshot( SequenceNumber /*sequence*/, SequenceNumber /*snapshot_sequence*/) const { // Should never be called in LITE mode. assert(false); - return true; + return SnapshotCheckerResult::kInSnapshot; } #else @@ -30,9 +30,17 @@ WritePreparedSnapshotChecker::WritePreparedSnapshotChecker( WritePreparedTxnDB* txn_db) : txn_db_(txn_db){}; -bool WritePreparedSnapshotChecker::IsInSnapshot( +SnapshotCheckerResult WritePreparedSnapshotChecker::CheckInSnapshot( SequenceNumber sequence, SequenceNumber snapshot_sequence) const { - return txn_db_->IsInSnapshot(sequence, snapshot_sequence); + bool snapshot_released = false; + // TODO(myabandeh): set min_uncommitted + bool in_snapshot = txn_db_->IsInSnapshot( + sequence, snapshot_sequence, kMinUnCommittedSeq, &snapshot_released); + if (snapshot_released) { + return SnapshotCheckerResult::kSnapshotReleased; + } + return in_snapshot ? SnapshotCheckerResult::kInSnapshot + : SnapshotCheckerResult::kNotInSnapshot; } #endif // ROCKSDB_LITE diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_base.cc b/ceph/src/rocksdb/utilities/transactions/transaction_base.cc index ac459a256..212c82242 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_base.cc +++ b/ceph/src/rocksdb/utilities/transactions/transaction_base.cc @@ -7,6 +7,12 @@ #include "utilities/transactions/transaction_base.h" +#ifndef __STDC_FORMAT_MACROS +#define __STDC_FORMAT_MACROS +#endif + +#include + #include "db/db_impl.h" #include "db/column_family.h" #include "rocksdb/comparator.h" @@ -97,7 +103,8 @@ void TransactionBaseImpl::SetSnapshotIfNeeded() { Status TransactionBaseImpl::TryLock(ColumnFamilyHandle* column_family, const SliceParts& key, bool read_only, - bool exclusive, bool skip_validate) { + bool exclusive, const bool do_validate, + const bool assume_tracked) { size_t key_size = 0; for (int i = 0; i < key.num_parts; ++i) { key_size += key.parts[i].size(); @@ -110,7 +117,8 @@ Status TransactionBaseImpl::TryLock(ColumnFamilyHandle* column_family, str.append(key.parts[i].data(), key.parts[i].size()); } - return TryLock(column_family, str, read_only, exclusive, skip_validate); + return TryLock(column_family, str, read_only, exclusive, do_validate, + assume_tracked); } void TransactionBaseImpl::SetSavePoint() { @@ -178,7 +186,7 @@ Status TransactionBaseImpl::RollbackToSavePoint() { return Status::NotFound(); } } - + Status TransactionBaseImpl::PopSavePoint() { if (save_points_ == nullptr || save_points_->empty()) { @@ -187,7 +195,7 @@ Status TransactionBaseImpl::PopSavePoint() { return Status::NotFound(); } - assert(!save_points_->empty()); + assert(!save_points_->empty()); save_points_->pop(); return write_batch_.PopSavePoint(); } @@ -215,8 +223,15 @@ Status TransactionBaseImpl::Get(const ReadOptions& read_options, Status TransactionBaseImpl::GetForUpdate(const ReadOptions& read_options, ColumnFamilyHandle* column_family, const Slice& key, std::string* value, - bool exclusive) { - Status s = TryLock(column_family, key, true /* read_only */, 
exclusive); + bool exclusive, + const bool do_validate) { + if (!do_validate && read_options.snapshot != nullptr) { + return Status::InvalidArgument( + "If do_validate is false then GetForUpdate with snapshot is not " + "defined."); + } + Status s = + TryLock(column_family, key, true /* read_only */, exclusive, do_validate); if (s.ok() && value != nullptr) { assert(value != nullptr); @@ -234,8 +249,15 @@ Status TransactionBaseImpl::GetForUpdate(const ReadOptions& read_options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* pinnable_val, - bool exclusive) { - Status s = TryLock(column_family, key, true /* read_only */, exclusive); + bool exclusive, + const bool do_validate) { + if (!do_validate && read_options.snapshot != nullptr) { + return Status::InvalidArgument( + "If do_validate is false then GetForUpdate with snapshot is not " + "defined."); + } + Status s = + TryLock(column_family, key, true /* read_only */, exclusive, do_validate); if (s.ok() && pinnable_val != nullptr) { s = Get(read_options, column_family, key, pinnable_val); @@ -303,9 +325,11 @@ Iterator* TransactionBaseImpl::GetIterator(const ReadOptions& read_options, } Status TransactionBaseImpl::Put(ColumnFamilyHandle* column_family, - const Slice& key, const Slice& value) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const Slice& key, const Slice& value, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->Put(column_family, key, value); @@ -318,10 +342,11 @@ Status TransactionBaseImpl::Put(ColumnFamilyHandle* column_family, } Status TransactionBaseImpl::Put(ColumnFamilyHandle* column_family, - const SliceParts& key, - const SliceParts& value) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const SliceParts& key, const SliceParts& value, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->Put(column_family, key, value); @@ -334,9 +359,11 @@ Status TransactionBaseImpl::Put(ColumnFamilyHandle* column_family, } Status TransactionBaseImpl::Merge(ColumnFamilyHandle* column_family, - const Slice& key, const Slice& value) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const Slice& key, const Slice& value, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->Merge(column_family, key, value); @@ -349,9 +376,11 @@ Status TransactionBaseImpl::Merge(ColumnFamilyHandle* column_family, } Status TransactionBaseImpl::Delete(ColumnFamilyHandle* column_family, - const Slice& key) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const Slice& key, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->Delete(column_family, key); @@ -364,9 +393,11 @@ Status TransactionBaseImpl::Delete(ColumnFamilyHandle* column_family, } Status 
TransactionBaseImpl::Delete(ColumnFamilyHandle* column_family, - const SliceParts& key) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const SliceParts& key, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->Delete(column_family, key); @@ -379,9 +410,11 @@ Status TransactionBaseImpl::Delete(ColumnFamilyHandle* column_family, } Status TransactionBaseImpl::SingleDelete(ColumnFamilyHandle* column_family, - const Slice& key) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const Slice& key, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->SingleDelete(column_family, key); @@ -394,9 +427,11 @@ Status TransactionBaseImpl::SingleDelete(ColumnFamilyHandle* column_family, } Status TransactionBaseImpl::SingleDelete(ColumnFamilyHandle* column_family, - const SliceParts& key) { - Status s = - TryLock(column_family, key, false /* read_only */, true /* exclusive */); + const SliceParts& key, + const bool assume_tracked) { + const bool do_validate = !assume_tracked; + Status s = TryLock(column_family, key, false /* read_only */, + true /* exclusive */, do_validate, assume_tracked); if (s.ok()) { s = GetBatchForWrite()->SingleDelete(column_family, key); @@ -411,7 +446,7 @@ Status TransactionBaseImpl::SingleDelete(ColumnFamilyHandle* column_family, Status TransactionBaseImpl::PutUntracked(ColumnFamilyHandle* column_family, const Slice& key, const Slice& value) { Status s = TryLock(column_family, key, false /* read_only */, - true /* exclusive */, true /* skip_validate */); + true /* exclusive */, false /* do_validate */); if (s.ok()) { s = GetBatchForWrite()->Put(column_family, key, value); @@ -427,7 +462,7 @@ Status TransactionBaseImpl::PutUntracked(ColumnFamilyHandle* column_family, const SliceParts& key, const SliceParts& value) { Status s = TryLock(column_family, key, false /* read_only */, - true /* exclusive */, true /* skip_validate */); + true /* exclusive */, false /* do_validate */); if (s.ok()) { s = GetBatchForWrite()->Put(column_family, key, value); @@ -443,7 +478,7 @@ Status TransactionBaseImpl::MergeUntracked(ColumnFamilyHandle* column_family, const Slice& key, const Slice& value) { Status s = TryLock(column_family, key, false /* read_only */, - true /* exclusive */, true /* skip_validate */); + true /* exclusive */, false /* do_validate */); if (s.ok()) { s = GetBatchForWrite()->Merge(column_family, key, value); @@ -458,7 +493,7 @@ Status TransactionBaseImpl::MergeUntracked(ColumnFamilyHandle* column_family, Status TransactionBaseImpl::DeleteUntracked(ColumnFamilyHandle* column_family, const Slice& key) { Status s = TryLock(column_family, key, false /* read_only */, - true /* exclusive */, true /* skip_validate */); + true /* exclusive */, false /* do_validate */); if (s.ok()) { s = GetBatchForWrite()->Delete(column_family, key); @@ -473,7 +508,7 @@ Status TransactionBaseImpl::DeleteUntracked(ColumnFamilyHandle* column_family, Status TransactionBaseImpl::DeleteUntracked(ColumnFamilyHandle* column_family, const SliceParts& key) { Status s = TryLock(column_family, key, false /* read_only */, - true /* exclusive */, true /* skip_validate */); + true 
/* exclusive */, false /* do_validate */); if (s.ok()) { s = GetBatchForWrite()->Delete(column_family, key); @@ -488,7 +523,7 @@ Status TransactionBaseImpl::DeleteUntracked(ColumnFamilyHandle* column_family, Status TransactionBaseImpl::SingleDeleteUntracked( ColumnFamilyHandle* column_family, const Slice& key) { Status s = TryLock(column_family, key, false /* read_only */, - true /* exclusive */, true /* skip_validate */); + true /* exclusive */, false /* do_validate */); if (s.ok()) { s = GetBatchForWrite()->SingleDelete(column_family, key); @@ -626,6 +661,9 @@ WriteBatchBase* TransactionBaseImpl::GetBatchForWrite() { void TransactionBaseImpl::ReleaseSnapshot(const Snapshot* snapshot, DB* db) { if (snapshot != nullptr) { + ROCKS_LOG_DETAILS(dbimpl_->immutable_db_options().info_log, + "ReleaseSnapshot %" PRIu64 " Set", + snapshot->GetSequenceNumber()); db->ReleaseSnapshot(snapshot); } } diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_base.h b/ceph/src/rocksdb/utilities/transactions/transaction_base.h index 171e13588..9154b3274 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_base.h +++ b/ceph/src/rocksdb/utilities/transactions/transaction_base.h @@ -36,11 +36,12 @@ class TransactionBaseImpl : public Transaction { // Called before executing Put, Merge, Delete, and GetForUpdate. If TryLock // returns non-OK, the Put/Merge/Delete/GetForUpdate will be failed. - // skip_validate will be true if called from PutUntracked, DeleteUntracked, or - // MergeUntracked. + // do_validate will be false if called from PutUntracked, DeleteUntracked, + // MergeUntracked, or GetForUpdate(do_validate=false) virtual Status TryLock(ColumnFamilyHandle* column_family, const Slice& key, bool read_only, bool exclusive, - bool skip_validate = false) = 0; + const bool do_validate = true, + const bool assume_tracked = false) = 0; void SetSavePoint() override; @@ -63,16 +64,19 @@ class TransactionBaseImpl : public Transaction { using Transaction::GetForUpdate; Status GetForUpdate(const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, - std::string* value, bool exclusive) override; + std::string* value, bool exclusive, + const bool do_validate) override; Status GetForUpdate(const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, - PinnableSlice* pinnable_val, bool exclusive) override; + PinnableSlice* pinnable_val, bool exclusive, + const bool do_validate) override; Status GetForUpdate(const ReadOptions& options, const Slice& key, - std::string* value, bool exclusive) override { + std::string* value, bool exclusive, + const bool do_validate) override { return GetForUpdate(options, db_->DefaultColumnFamily(), key, value, - exclusive); + exclusive, do_validate); } std::vector MultiGet( @@ -109,36 +113,38 @@ class TransactionBaseImpl : public Transaction { ColumnFamilyHandle* column_family) override; Status Put(ColumnFamilyHandle* column_family, const Slice& key, - const Slice& value) override; + const Slice& value, const bool assume_tracked = false) override; Status Put(const Slice& key, const Slice& value) override { return Put(nullptr, key, value); } Status Put(ColumnFamilyHandle* column_family, const SliceParts& key, - const SliceParts& value) override; + const SliceParts& value, + const bool assume_tracked = false) override; Status Put(const SliceParts& key, const SliceParts& value) override { return Put(nullptr, key, value); } Status Merge(ColumnFamilyHandle* column_family, const Slice& key, - const Slice& value) override; + const 
Slice& value, const bool assume_tracked = false) override; Status Merge(const Slice& key, const Slice& value) override { return Merge(nullptr, key, value); } - Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override; + Status Delete(ColumnFamilyHandle* column_family, const Slice& key, + const bool assume_tracked = false) override; Status Delete(const Slice& key) override { return Delete(nullptr, key); } - Status Delete(ColumnFamilyHandle* column_family, - const SliceParts& key) override; + Status Delete(ColumnFamilyHandle* column_family, const SliceParts& key, + const bool assume_tracked = false) override; Status Delete(const SliceParts& key) override { return Delete(nullptr, key); } - Status SingleDelete(ColumnFamilyHandle* column_family, - const Slice& key) override; + Status SingleDelete(ColumnFamilyHandle* column_family, const Slice& key, + const bool assume_tracked = false) override; Status SingleDelete(const Slice& key) override { return SingleDelete(nullptr, key); } - Status SingleDelete(ColumnFamilyHandle* column_family, - const SliceParts& key) override; + Status SingleDelete(ColumnFamilyHandle* column_family, const SliceParts& key, + const bool assume_tracked = false) override; Status SingleDelete(const SliceParts& key) override { return SingleDelete(nullptr, key); } @@ -335,7 +341,8 @@ class TransactionBaseImpl : public Transaction { std::shared_ptr snapshot_notifier_ = nullptr; Status TryLock(ColumnFamilyHandle* column_family, const SliceParts& key, - bool read_only, bool exclusive, bool skip_validate = false); + bool read_only, bool exclusive, const bool do_validate = true, + const bool assume_tracked = false); WriteBatchBase* GetBatchForWrite(); void SetSnapshotInternal(const Snapshot* snapshot); diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_db_mutex_impl.cc b/ceph/src/rocksdb/utilities/transactions/transaction_db_mutex_impl.cc index b6120a168..244a95077 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_db_mutex_impl.cc +++ b/ceph/src/rocksdb/utilities/transactions/transaction_db_mutex_impl.cc @@ -19,7 +19,7 @@ namespace rocksdb { class TransactionDBMutexImpl : public TransactionDBMutex { public: TransactionDBMutexImpl() {} - ~TransactionDBMutexImpl() {} + ~TransactionDBMutexImpl() override {} Status Lock() override; @@ -36,7 +36,7 @@ class TransactionDBMutexImpl : public TransactionDBMutex { class TransactionDBCondVarImpl : public TransactionDBCondVar { public: TransactionDBCondVarImpl() {} - ~TransactionDBCondVarImpl() {} + ~TransactionDBCondVarImpl() override {} Status Wait(std::shared_ptr mutex) override; diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_lock_mgr.cc b/ceph/src/rocksdb/utilities/transactions/transaction_lock_mgr.cc index d285fd30e..9074a1494 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_lock_mgr.cc +++ b/ceph/src/rocksdb/utilities/transactions/transaction_lock_mgr.cc @@ -24,7 +24,7 @@ #include "rocksdb/slice.h" #include "rocksdb/utilities/transaction_db_mutex.h" #include "util/cast_util.h" -#include "util/murmurhash.h" +#include "util/hash.h" #include "util/sync_point.h" #include "util/thread_local.h" #include "utilities/transactions/pessimistic_transaction_db.h" @@ -104,7 +104,7 @@ void DeadlockInfoBuffer::AddNewPath(DeadlockPath path) { return; } - paths_buffer_[buffer_idx_] = path; + paths_buffer_[buffer_idx_] = std::move(path); buffer_idx_ = (buffer_idx_ + 1) % paths_buffer_.size(); } @@ -183,8 +183,7 @@ TransactionLockMgr::~TransactionLockMgr() {} size_t 
LockMap::GetStripe(const std::string& key) const { assert(num_stripes_ > 0); - static murmur_hash hash; - size_t stripe = hash(key) % num_stripes_; + size_t stripe = static_cast(GetSliceNPHash64(key)) % num_stripes_; return stripe; } @@ -222,9 +221,9 @@ void TransactionLockMgr::RemoveColumnFamily(uint32_t column_family_id) { } } -// Look up the LockMap shared_ptr for a given column_family_id. +// Look up the LockMap std::shared_ptr for a given column_family_id. // Note: The LockMap is only valid as long as the caller is still holding on -// to the returned shared_ptr. +// to the returned std::shared_ptr. std::shared_ptr TransactionLockMgr::GetLockMap( uint32_t column_family_id) { // First check thread-local cache @@ -494,8 +493,8 @@ bool TransactionLockMgr::IncrementWaiters( auto extracted_info = wait_txn_map_.Get(queue_values[head]); path.push_back({queue_values[head], extracted_info.m_cf_id, - extracted_info.m_waiting_key, - extracted_info.m_exclusive}); + extracted_info.m_exclusive, + extracted_info.m_waiting_key}); head = queue_parents[head]; } env->GetCurrentTime(&deadlock_time); diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_test.cc b/ceph/src/rocksdb/utilities/transactions/transaction_test.cc index f49c92257..732d4c812 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_test.cc +++ b/ceph/src/rocksdb/utilities/transactions/transaction_test.cc @@ -66,12 +66,16 @@ INSTANTIATE_TEST_CASE_P( #ifndef ROCKSDB_VALGRIND_RUN INSTANTIATE_TEST_CASE_P( MySQLStyleTransactionTest, MySQLStyleTransactionTest, - ::testing::Values(std::make_tuple(false, false, WRITE_COMMITTED), - std::make_tuple(false, true, WRITE_COMMITTED), - std::make_tuple(false, false, WRITE_PREPARED), - std::make_tuple(false, true, WRITE_PREPARED), - std::make_tuple(false, false, WRITE_UNPREPARED), - std::make_tuple(false, true, WRITE_UNPREPARED))); + ::testing::Values(std::make_tuple(false, false, WRITE_COMMITTED, false), + std::make_tuple(false, true, WRITE_COMMITTED, false), + std::make_tuple(false, false, WRITE_PREPARED, false), + std::make_tuple(false, false, WRITE_PREPARED, true), + std::make_tuple(false, true, WRITE_PREPARED, false), + std::make_tuple(false, true, WRITE_PREPARED, true), + std::make_tuple(false, false, WRITE_UNPREPARED, false), + std::make_tuple(false, false, WRITE_UNPREPARED, true), + std::make_tuple(false, true, WRITE_UNPREPARED, false), + std::make_tuple(false, true, WRITE_UNPREPARED, true))); #endif // ROCKSDB_VALGRIND_RUN TEST_P(TransactionTest, DoubleEmptyWrite) { @@ -140,39 +144,110 @@ TEST_P(TransactionTest, SuccessTest) { delete txn; } +// This test clarifies the contract of do_validate and assume_tracked +// in GetForUpdate and Put/Merge/Delete +TEST_P(TransactionTest, AssumeExclusiveTracked) { + WriteOptions write_options; + ReadOptions read_options; + std::string value; + Status s; + TransactionOptions txn_options; + txn_options.lock_timeout = 1; + const bool EXCLUSIVE = true; + const bool DO_VALIDATE = true; + const bool ASSUME_LOCKED = true; + + Transaction* txn = db->BeginTransaction(write_options, txn_options); + ASSERT_TRUE(txn); + txn->SetSnapshot(); + + // commit a value after the snapshot is taken + ASSERT_OK(db->Put(write_options, Slice("foo"), Slice("bar"))); + + // By default the GetForUpdate should fail due to the commit after our snapshot + s = txn->GetForUpdate(read_options, "foo", &value, EXCLUSIVE); + ASSERT_TRUE(s.IsBusy()); + // But the user could direct the db to skip validating the snapshot. 
The read + // value then should be the most recently committed + ASSERT_OK( + txn->GetForUpdate(read_options, "foo", &value, EXCLUSIVE, !DO_VALIDATE)); + ASSERT_EQ(value, "bar"); + + // Although ValidateSnapshot is skipped, the key must still have been locked + s = db->Put(write_options, Slice("foo"), Slice("bar")); + ASSERT_TRUE(s.IsTimedOut()); + + // By default the write operations should fail due to the commit after the + // snapshot + s = txn->Put(Slice("foo"), Slice("bar1")); + ASSERT_TRUE(s.IsBusy()); + s = txn->Put(db->DefaultColumnFamily(), Slice("foo"), Slice("bar1"), + !ASSUME_LOCKED); + ASSERT_TRUE(s.IsBusy()); + // But the user could tell the db that it already holds an exclusive lock on + // the key due to the previous GetForUpdate call. + ASSERT_OK(txn->Put(db->DefaultColumnFamily(), Slice("foo"), Slice("bar1"), + ASSUME_LOCKED)); + ASSERT_OK(txn->Merge(db->DefaultColumnFamily(), Slice("foo"), Slice("bar2"), + ASSUME_LOCKED)); + ASSERT_OK( + txn->Delete(db->DefaultColumnFamily(), Slice("foo"), ASSUME_LOCKED)); + ASSERT_OK(txn->SingleDelete(db->DefaultColumnFamily(), Slice("foo"), + ASSUME_LOCKED)); + + txn->Rollback(); + delete txn; +} + // This test clarifies the contract of ValidateSnapshot TEST_P(TransactionTest, ValidateSnapshotTest) { - for (bool with_2pc : {true, false}) { - ASSERT_OK(ReOpen()); - WriteOptions write_options; - ReadOptions read_options; - std::string value; + for (bool with_flush : {true}) { + for (bool with_2pc : {true}) { + ASSERT_OK(ReOpen()); + WriteOptions write_options; + ReadOptions read_options; + std::string value; - assert(db != nullptr); - Transaction* txn1 = - db->BeginTransaction(write_options, TransactionOptions()); - ASSERT_TRUE(txn1); - ASSERT_OK(txn1->Put(Slice("foo"), Slice("bar1"))); - if (with_2pc) { - ASSERT_OK(txn1->SetName("xid1")); - ASSERT_OK(txn1->Prepare()); - } + assert(db != nullptr); + Transaction* txn1 = + db->BeginTransaction(write_options, TransactionOptions()); + ASSERT_TRUE(txn1); + ASSERT_OK(txn1->Put(Slice("foo"), Slice("bar1"))); + if (with_2pc) { + ASSERT_OK(txn1->SetName("xid1")); + ASSERT_OK(txn1->Prepare()); + } + + if (with_flush) { + auto db_impl = reinterpret_cast(db->GetRootDB()); + db_impl->TEST_FlushMemTable(true); + // Make sure the flushed memtable is not kept in memory + int max_memtable_in_history = + std::max(options.max_write_buffer_number, + options.max_write_buffer_number_to_maintain) + + 1; + for (int i = 0; i < max_memtable_in_history; i++) { + db->Put(write_options, Slice("key"), Slice("value")); + db_impl->TEST_FlushMemTable(true); + } + } - Transaction* txn2 = - db->BeginTransaction(write_options, TransactionOptions()); - ASSERT_TRUE(txn2); - txn2->SetSnapshot(); + Transaction* txn2 = + db->BeginTransaction(write_options, TransactionOptions()); + ASSERT_TRUE(txn2); + txn2->SetSnapshot(); - ASSERT_OK(txn1->Commit()); - delete txn1; + ASSERT_OK(txn1->Commit()); + delete txn1; - auto pes_txn2 = dynamic_cast(txn2); - // Test the simple case where the key is not tracked yet - auto trakced_seq = kMaxSequenceNumber; - auto s = pes_txn2->ValidateSnapshot(db->DefaultColumnFamily(), "foo", - &trakced_seq); - ASSERT_TRUE(s.IsBusy()); - delete txn2; + auto pes_txn2 = dynamic_cast(txn2); + // Test the simple case where the key is not tracked yet + auto tracked_seq = kMaxSequenceNumber; + auto s = pes_txn2->ValidateSnapshot(db->DefaultColumnFamily(), "foo", + &tracked_seq); + ASSERT_TRUE(s.IsBusy()); + delete txn2; + } } } @@ -606,6 +681,7 @@ TEST_P(TransactionTest, DeadlockCycleShared) { } } +#ifndef 
ROCKSDB_VALGRIND_RUN TEST_P(TransactionStressTest, DeadlockCycle) { WriteOptions write_options; ReadOptions read_options; @@ -768,6 +844,7 @@ TEST_P(TransactionStressTest, DeadlockStress) { t.join(); } } +#endif // ROCKSDB_VALGRIND_RUN TEST_P(TransactionTest, CommitTimeBatchFailTest) { WriteOptions write_options; @@ -1097,6 +1174,7 @@ TEST_P(TransactionTest, TwoPhaseEmptyWriteTest) { } } +#ifndef ROCKSDB_VALGRIND_RUN TEST_P(TransactionStressTest, TwoPhaseExpirationTest) { Status s; @@ -1334,6 +1412,7 @@ TEST_P(TransactionTest, PersistentTwoPhaseTransactionTest) { // deleting transaction should unregister transaction ASSERT_EQ(db->GetTransactionByName("xid"), nullptr); } +#endif // ROCKSDB_VALGRIND_RUN // TODO this test needs to be updated with serial commits TEST_P(TransactionTest, DISABLED_TwoPhaseMultiThreadTest) { @@ -4716,7 +4795,7 @@ TEST_P(TransactionTest, SetSnapshotOnNextOperationWithNotification) { explicit Notifier(const Snapshot** snapshot_ptr) : snapshot_ptr_(snapshot_ptr) {} - void SnapshotCreated(const Snapshot* newSnapshot) { + void SnapshotCreated(const Snapshot* newSnapshot) override { *snapshot_ptr_ = newSnapshot; } }; @@ -4915,20 +4994,22 @@ TEST_P(TransactionStressTest, ExpiredTransactionDataRace1) { #ifndef ROCKSDB_VALGRIND_RUN namespace { -Status TransactionStressTestInserter(TransactionDB* db, - const size_t num_transactions, - const size_t num_sets, - const size_t num_keys_per_set) { - size_t seed = std::hash()(std::this_thread::get_id()); - Random64 _rand(seed); +// cmt_delay_ms is the delay between prepare and commit +// first_id is the id of the first transaction +Status TransactionStressTestInserter( + TransactionDB* db, const size_t num_transactions, const size_t num_sets, + const size_t num_keys_per_set, Random64* rand, + const uint64_t cmt_delay_ms = 0, const uint64_t first_id = 0) { WriteOptions write_options; ReadOptions read_options; TransactionOptions txn_options; - txn_options.set_snapshot = true; + // Inside the inserter we might also retake the snapshot. We do both since two + // separate functions are engaged for each. + txn_options.set_snapshot = rand->OneIn(2); - RandomTransactionInserter inserter(&_rand, write_options, read_options, - num_keys_per_set, - static_cast(num_sets)); + RandomTransactionInserter inserter( + rand, write_options, read_options, num_keys_per_set, + static_cast(num_sets), cmt_delay_ms, first_id); for (size_t t = 0; t < num_transactions; t++) { bool success = inserter.TransactionDBInsert(db, txn_options); @@ -4940,7 +5021,8 @@ Status TransactionStressTestInserter(TransactionDB* db, // Make sure at least some of the transactions succeeded. It's ok if // some failed due to write-conflicts. - if (inserter.GetFailureCount() > num_transactions / 2) { + if (num_transactions != 1 && + inserter.GetFailureCount() > num_transactions / 2) { return Status::TryAgain("Too many transactions failed! 
" + std::to_string(inserter.GetFailureCount()) + " / " + std::to_string(num_transactions)); @@ -4958,6 +5040,8 @@ TEST_P(MySQLStyleTransactionTest, TransactionStressTest) { ReOpenNoDelete(); const size_t num_workers = 4; // worker threads count const size_t num_checkers = 2; // checker threads count + const size_t num_slow_checkers = 2; // checker threads emulating backups + const size_t num_slow_workers = 1; // slow worker threads count const size_t num_transactions_per_thread = 10000; const uint16_t num_sets = 3; const size_t num_keys_per_set = 100; @@ -4967,15 +5051,19 @@ TEST_P(MySQLStyleTransactionTest, TransactionStressTest) { std::vector threads; std::atomic finished = {0}; bool TAKE_SNAPSHOT = true; + uint64_t time_seed = env->NowMicros(); + printf("time_seed is %" PRIu64 "\n", time_seed); // would help to reproduce std::function call_inserter = [&] { + size_t thd_seed = std::hash()(std::this_thread::get_id()); + Random64 rand(time_seed * thd_seed); ASSERT_OK(TransactionStressTestInserter(db, num_transactions_per_thread, - num_sets, num_keys_per_set)); + num_sets, num_keys_per_set, &rand)); finished++; }; std::function call_checker = [&] { - size_t seed = std::hash()(std::this_thread::get_id()); - Random64 rand(seed); + size_t thd_seed = std::hash()(std::this_thread::get_id()); + Random64 rand(time_seed * thd_seed); // Verify that data is consistent while (finished < num_workers) { Status s = RandomTransactionInserter::Verify( @@ -4983,6 +5071,28 @@ TEST_P(MySQLStyleTransactionTest, TransactionStressTest) { ASSERT_OK(s); } }; + std::function call_slow_checker = [&] { + size_t thd_seed = std::hash()(std::this_thread::get_id()); + Random64 rand(time_seed * thd_seed); + // Verify that data is consistent + while (finished < num_workers) { + uint64_t delay_ms = rand.Uniform(100) + 1; + Status s = RandomTransactionInserter::Verify( + db, num_sets, num_keys_per_set, TAKE_SNAPSHOT, &rand, delay_ms); + ASSERT_OK(s); + } + }; + std::function call_slow_inserter = [&] { + size_t thd_seed = std::hash()(std::this_thread::get_id()); + Random64 rand(time_seed * thd_seed); + uint64_t id = 0; + // Verify that data is consistent + while (finished < num_workers) { + uint64_t delay_ms = rand.Uniform(500) + 1; + ASSERT_OK(TransactionStressTestInserter(db, 1, num_sets, num_keys_per_set, + &rand, delay_ms, id++)); + } + }; for (uint32_t i = 0; i < num_workers; i++) { threads.emplace_back(call_inserter); @@ -4990,6 +5100,14 @@ TEST_P(MySQLStyleTransactionTest, TransactionStressTest) { for (uint32_t i = 0; i < num_checkers; i++) { threads.emplace_back(call_checker); } + if (with_slow_threads_) { + for (uint32_t i = 0; i < num_slow_checkers; i++) { + threads.emplace_back(call_slow_checker); + } + for (uint32_t i = 0; i < num_slow_workers; i++) { + threads.emplace_back(call_slow_inserter); + } + } // Wait for all threads to finish for (auto& t : threads) { @@ -5189,15 +5307,13 @@ TEST_P(TransactionTest, Optimizations) { class ThreeBytewiseComparator : public Comparator { public: ThreeBytewiseComparator() {} - virtual const char* Name() const override { - return "test.ThreeBytewiseComparator"; - } - virtual int Compare(const Slice& a, const Slice& b) const override { + const char* Name() const override { return "test.ThreeBytewiseComparator"; } + int Compare(const Slice& a, const Slice& b) const override { Slice na = Slice(a.data(), a.size() < 3 ? a.size() : 3); Slice nb = Slice(b.data(), b.size() < 3 ? 
b.size() : 3); return na.compare(nb); } - virtual bool Equal(const Slice& a, const Slice& b) const override { + bool Equal(const Slice& a, const Slice& b) const override { Slice na = Slice(a.data(), a.size() < 3 ? a.size() : 3); Slice nb = Slice(b.data(), b.size() < 3 ? b.size() : 3); return na == nb; diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_test.h b/ceph/src/rocksdb/utilities/transactions/transaction_test.h index cdc014acb..33b2c51ea 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_test.h +++ b/ceph/src/rocksdb/utilities/transactions/transaction_test.h @@ -448,6 +448,31 @@ class TransactionTest : public TransactionTestBase, class TransactionStressTest : public TransactionTest {}; -class MySQLStyleTransactionTest : public TransactionTest {}; +class MySQLStyleTransactionTest + : public TransactionTestBase, + virtual public ::testing::WithParamInterface< + std::tuple> { + public: + MySQLStyleTransactionTest() + : TransactionTestBase(std::get<0>(GetParam()), std::get<1>(GetParam()), + std::get<2>(GetParam())), + with_slow_threads_(std::get<3>(GetParam())) { + if (with_slow_threads_ && + (txn_db_options.write_policy == WRITE_PREPARED || + txn_db_options.write_policy == WRITE_UNPREPARED)) { + // The corner case with slow threads involves the caches filling + // up, which would not happen even with artificial delays. To help + // such cases show up we lower the size of the cache-related data + // structures. + txn_db_options.wp_snapshot_cache_bits = 1; + txn_db_options.wp_commit_cache_bits = 10; + EXPECT_OK(ReOpen()); + } + }; + + protected: + // Also emulate slow threads by adding artificial delays + const bool with_slow_threads_; +}; } // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/transactions/transaction_util.cc b/ceph/src/rocksdb/utilities/transactions/transaction_util.cc index 1d511880b..ec6f7e60a 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_util.cc +++ b/ceph/src/rocksdb/utilities/transactions/transaction_util.cc @@ -24,7 +24,8 @@ namespace rocksdb { Status TransactionUtil::CheckKeyForConflicts( DBImpl* db_impl, ColumnFamilyHandle* column_family, const std::string& key, - SequenceNumber snap_seq, bool cache_only, ReadCallback* snap_checker) { + SequenceNumber snap_seq, bool cache_only, ReadCallback* snap_checker, + SequenceNumber min_uncommitted) { Status result; auto cfh = reinterpret_cast(column_family); @@ -41,7 +42,7 @@ Status TransactionUtil::CheckKeyForConflicts( db_impl->GetEarliestMemTableSequenceNumber(sv, true); result = CheckKey(db_impl, sv, earliest_seq, snap_seq, key, cache_only, - snap_checker); + snap_checker, min_uncommitted); db_impl->ReturnAndCleanupSuperVersion(cfd, sv); } @@ -53,7 +54,8 @@ Status TransactionUtil::CheckKey(DBImpl* db_impl, SuperVersion* sv, SequenceNumber earliest_seq, SequenceNumber snap_seq, const std::string& key, bool cache_only, - ReadCallback* snap_checker) { + ReadCallback* snap_checker, + SequenceNumber min_uncommitted) { Status result; bool need_to_read_sst = false; @@ -75,7 +77,9 @@ Status TransactionUtil::CheckKey(DBImpl* db_impl, SuperVersion* sv, "countain a long enough history to check write at SequenceNumber: ", ToString(snap_seq)); } - } else if (snap_seq < earliest_seq) { + } else if (snap_seq < earliest_seq || min_uncommitted <= earliest_seq) { + // Use <= for min_uncommitted since earliest_seq is actually the largest seq + // before this memtable was created need_to_read_sst = true; if (cache_only) { diff --git 
a/ceph/src/rocksdb/utilities/transactions/transaction_util.h b/ceph/src/rocksdb/utilities/transactions/transaction_util.h index 7377874e6..0fe0e87d8 100644 --- a/ceph/src/rocksdb/utilities/transactions/transaction_util.h +++ b/ceph/src/rocksdb/utilities/transactions/transaction_util.h @@ -10,6 +10,7 @@ #include #include +#include "db/dbformat.h" #include "db/read_callback.h" #include "rocksdb/db.h" @@ -51,11 +52,11 @@ class TransactionUtil { // // Returns OK on success, BUSY if there is a conflicting write, or other error // status for any unexpected errors. - static Status CheckKeyForConflicts(DBImpl* db_impl, - ColumnFamilyHandle* column_family, - const std::string& key, - SequenceNumber snap_seq, bool cache_only, - ReadCallback* snap_checker = nullptr); + static Status CheckKeyForConflicts( + DBImpl* db_impl, ColumnFamilyHandle* column_family, + const std::string& key, SequenceNumber snap_seq, bool cache_only, + ReadCallback* snap_checker = nullptr, + SequenceNumber min_uncommitted = kMaxSequenceNumber); // For each key,SequenceNumber pair in the TransactionKeyMap, this function // will verify there have been no writes to the key in the db since that @@ -74,7 +75,8 @@ class TransactionUtil { static Status CheckKey(DBImpl* db_impl, SuperVersion* sv, SequenceNumber earliest_seq, SequenceNumber snap_seq, const std::string& key, bool cache_only, - ReadCallback* snap_checker = nullptr); + ReadCallback* snap_checker = nullptr, + SequenceNumber min_uncommitted = kMaxSequenceNumber); }; } // namespace rocksdb diff --git a/ceph/src/rocksdb/utilities/transactions/write_prepared_transaction_test.cc b/ceph/src/rocksdb/utilities/transactions/write_prepared_transaction_test.cc index 127f8cc86..c0f5a1068 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_prepared_transaction_test.cc +++ b/ceph/src/rocksdb/utilities/transactions/write_prepared_transaction_test.cc @@ -13,6 +13,7 @@ #include #include +#include #include #include #include @@ -27,6 +28,7 @@ #include "rocksdb/utilities/transaction_db.h" #include "table/mock_table.h" #include "util/fault_injection_test_env.h" +#include "util/mutexlock.h" #include "util/random.h" #include "util/string_util.h" #include "util/sync_point.h" @@ -322,20 +324,13 @@ class WritePreparedTxnDBMock : public WritePreparedTxnDB { public: WritePreparedTxnDBMock(DBImpl* db_impl, TransactionDBOptions& opt) : WritePreparedTxnDB(db_impl, opt) {} - WritePreparedTxnDBMock(DBImpl* db_impl, TransactionDBOptions& opt, - size_t snapshot_cache_size) - : WritePreparedTxnDB(db_impl, opt, snapshot_cache_size) {} - WritePreparedTxnDBMock(DBImpl* db_impl, TransactionDBOptions& opt, - size_t snapshot_cache_size, size_t commit_cache_size) - : WritePreparedTxnDB(db_impl, opt, snapshot_cache_size, - commit_cache_size) {} void SetDBSnapshots(const std::vector& snapshots) { snapshots_ = snapshots; } void TakeSnapshot(SequenceNumber seq) { snapshots_.push_back(seq); } protected: - virtual const std::vector GetSnapshotListFromDB( + const std::vector GetSnapshotListFromDB( SequenceNumber /* unused */) override { return snapshots_; } @@ -351,6 +346,14 @@ class WritePreparedTransactionTestBase : public TransactionTestBase { : TransactionTestBase(use_stackable_db, two_write_queue, write_policy){}; protected: + void UpdateTransactionDBOptions(size_t snapshot_cache_bits, + size_t commit_cache_bits) { + txn_db_options.wp_snapshot_cache_bits = snapshot_cache_bits; + txn_db_options.wp_commit_cache_bits = commit_cache_bits; + } + void UpdateTransactionDBOptions(size_t snapshot_cache_bits) { + 
txn_db_options.wp_snapshot_cache_bits = snapshot_cache_bits; + } // If expect_update is set, check if it actually updated old_commit_map_. If // it did not and yet suggested not to check the next snapshot, do the // opposite to check if it was not a bad suggestion. @@ -731,13 +734,86 @@ TEST_P(WritePreparedTransactionTest, MaybeUpdateOldCommitMap) { MaybeUpdateOldCommitMapTestWithNext(p, c, s, ns, false); } +// Reproduce the bug with two snapshots with the same sequence number and test +// that the release of the first snapshot will not affect the reads by the other +// snapshot +TEST_P(WritePreparedTransactionTest, DoubleSnapshot) { + TransactionOptions txn_options; + Status s; + + // Insert initial value + ASSERT_OK(db->Put(WriteOptions(), "key", "value1")); + + WritePreparedTxnDB* wp_db = dynamic_cast(db); + Transaction* txn = + wp_db->BeginTransaction(WriteOptions(), txn_options, nullptr); + ASSERT_OK(txn->SetName("txn")); + ASSERT_OK(txn->Put("key", "value2")); + ASSERT_OK(txn->Prepare()); + // Three snapshots with the same seq number + const Snapshot* snapshot0 = wp_db->GetSnapshot(); + const Snapshot* snapshot1 = wp_db->GetSnapshot(); + const Snapshot* snapshot2 = wp_db->GetSnapshot(); + ASSERT_OK(txn->Commit()); + SequenceNumber cache_size = wp_db->COMMIT_CACHE_SIZE; + SequenceNumber overlap_seq = txn->GetId() + cache_size; + delete txn; + + // 4th snapshot with a larger seq + const Snapshot* snapshot3 = wp_db->GetSnapshot(); + // Cause an eviction to advance max evicted seq number + // This also fetches the 4 snapshots from db since their seq is lower than the + // new max + wp_db->AddCommitted(overlap_seq, overlap_seq); + + ReadOptions ropt; + // It should see the value before commit + ropt.snapshot = snapshot2; + PinnableSlice pinnable_val; + s = wp_db->Get(ropt, wp_db->DefaultColumnFamily(), "key", &pinnable_val); + ASSERT_OK(s); + ASSERT_TRUE(pinnable_val == "value1"); + pinnable_val.Reset(); + + wp_db->ReleaseSnapshot(snapshot1); + + // It should still see the value before commit + s = wp_db->Get(ropt, wp_db->DefaultColumnFamily(), "key", &pinnable_val); + ASSERT_OK(s); + ASSERT_TRUE(pinnable_val == "value1"); + pinnable_val.Reset(); + + // Cause an eviction to advance max evicted seq number and trigger updating + // the snapshot list + overlap_seq += cache_size; + wp_db->AddCommitted(overlap_seq, overlap_seq); + + // It should still see the value before commit + s = wp_db->Get(ropt, wp_db->DefaultColumnFamily(), "key", &pinnable_val); + ASSERT_OK(s); + ASSERT_TRUE(pinnable_val == "value1"); + pinnable_val.Reset(); + + wp_db->ReleaseSnapshot(snapshot0); + wp_db->ReleaseSnapshot(snapshot2); + wp_db->ReleaseSnapshot(snapshot3); +} + +size_t UniqueCnt(std::vector vec) { + std::set aset; + for (auto i : vec) { + aset.insert(i); + } + return aset.size(); +} // Test that the entries in old_commit_map_ get garbage collected properly TEST_P(WritePreparedTransactionTest, OldCommitMapGC) { const size_t snapshot_cache_bits = 0; const size_t commit_cache_bits = 0; DBImpl* mock_db = new DBImpl(options, dbname); - std::unique_ptr wp_db(new WritePreparedTxnDBMock( - mock_db, txn_db_options, snapshot_cache_bits, commit_cache_bits)); + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + std::unique_ptr wp_db( + new WritePreparedTxnDBMock(mock_db, txn_db_options)); SequenceNumber seq = 0; // Take the first snapshot that overlaps with two txn @@ -779,9 +855,9 @@ TEST_P(WritePreparedTransactionTest, OldCommitMapGC) { ASSERT_FALSE(wp_db->old_commit_map_empty_.load()); ReadLock 
rl(&wp_db->old_commit_map_mutex_); ASSERT_EQ(3, wp_db->old_commit_map_.size()); - ASSERT_EQ(2, wp_db->old_commit_map_[snap_seq1].size()); - ASSERT_EQ(1, wp_db->old_commit_map_[snap_seq2].size()); - ASSERT_EQ(1, wp_db->old_commit_map_[snap_seq3].size()); + ASSERT_EQ(2, UniqueCnt(wp_db->old_commit_map_[snap_seq1])); + ASSERT_EQ(1, UniqueCnt(wp_db->old_commit_map_[snap_seq2])); + ASSERT_EQ(1, UniqueCnt(wp_db->old_commit_map_[snap_seq3])); } // Verify that the 2nd snapshot is cleaned up after the release @@ -790,8 +866,8 @@ TEST_P(WritePreparedTransactionTest, OldCommitMapGC) { ASSERT_FALSE(wp_db->old_commit_map_empty_.load()); ReadLock rl(&wp_db->old_commit_map_mutex_); ASSERT_EQ(2, wp_db->old_commit_map_.size()); - ASSERT_EQ(2, wp_db->old_commit_map_[snap_seq1].size()); - ASSERT_EQ(1, wp_db->old_commit_map_[snap_seq3].size()); + ASSERT_EQ(2, UniqueCnt(wp_db->old_commit_map_[snap_seq1])); + ASSERT_EQ(1, UniqueCnt(wp_db->old_commit_map_[snap_seq3])); } // Verify that the 1st snapshot is cleaned up after the release @@ -800,7 +876,7 @@ TEST_P(WritePreparedTransactionTest, OldCommitMapGC) { ASSERT_FALSE(wp_db->old_commit_map_empty_.load()); ReadLock rl(&wp_db->old_commit_map_mutex_); ASSERT_EQ(1, wp_db->old_commit_map_.size()); - ASSERT_EQ(1, wp_db->old_commit_map_[snap_seq3].size()); + ASSERT_EQ(1, UniqueCnt(wp_db->old_commit_map_[snap_seq3])); } // Verify that the 3rd snapshot is cleaned up after the release @@ -816,12 +892,14 @@ TEST_P(WritePreparedTransactionTest, CheckAgainstSnapshotsTest) { std::vector snapshots = {100l, 200l, 300l, 400l, 500l, 600l, 700l, 800l, 900l}; const size_t snapshot_cache_bits = 2; + const uint64_t cache_size = 1ul << snapshot_cache_bits; // Safety check to express the intended size in the test. Can be adjusted if // the snapshots lists changed. 
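// (With snapshot_cache_bits = 2 above, the snapshot cache holds 1 << 2 = 4 entries, so the 9-element snapshots list satisfies 4 * 2 + 1 == 9, which is exactly the relation the assert below expresses.)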
assert((1ul << snapshot_cache_bits) * 2 + 1 == snapshots.size()); DBImpl* mock_db = new DBImpl(options, dbname); + UpdateTransactionDBOptions(snapshot_cache_bits); std::unique_ptr wp_db( - new WritePreparedTxnDBMock(mock_db, txn_db_options, snapshot_cache_bits)); + new WritePreparedTxnDBMock(mock_db, txn_db_options)); SequenceNumber version = 1000l; ASSERT_EQ(0, wp_db->snapshots_total_); wp_db->UpdateSnapshots(snapshots, version); @@ -843,6 +921,57 @@ TEST_P(WritePreparedTransactionTest, CheckAgainstSnapshotsTest) { commit_entry.prep_seq <= snapshots.back(); ASSERT_EQ(expect_update, !wp_db->old_commit_map_empty_); } + + // Test that search will include multiple snapshot from snapshot cache + { + // exclude first and last item in the cache + CommitEntry commit_entry = {snapshots.front() + 1, + snapshots[cache_size - 1] - 1}; + wp_db->old_commit_map_empty_ = true; // reset + wp_db->old_commit_map_.clear(); + wp_db->CheckAgainstSnapshots(commit_entry); + ASSERT_EQ(wp_db->old_commit_map_.size(), cache_size - 2); + } + + // Test that search will include multiple snapshot from old snapshots + { + // include two in the middle + CommitEntry commit_entry = {snapshots[cache_size] + 1, + snapshots[cache_size + 2] + 1}; + wp_db->old_commit_map_empty_ = true; // reset + wp_db->old_commit_map_.clear(); + wp_db->CheckAgainstSnapshots(commit_entry); + ASSERT_EQ(wp_db->old_commit_map_.size(), 2); + } + + // Test that search will include both snapshot cache and old snapshots + // Case 1: includes all in snapshot cache + { + CommitEntry commit_entry = {snapshots.front() - 1, snapshots.back() + 1}; + wp_db->old_commit_map_empty_ = true; // reset + wp_db->old_commit_map_.clear(); + wp_db->CheckAgainstSnapshots(commit_entry); + ASSERT_EQ(wp_db->old_commit_map_.size(), snapshots.size()); + } + + // Case 2: includes all snapshot caches except the smallest + { + CommitEntry commit_entry = {snapshots.front() + 1, snapshots.back() + 1}; + wp_db->old_commit_map_empty_ = true; // reset + wp_db->old_commit_map_.clear(); + wp_db->CheckAgainstSnapshots(commit_entry); + ASSERT_EQ(wp_db->old_commit_map_.size(), snapshots.size() - 1); + } + + // Case 3: includes only the largest of snapshot cache + { + CommitEntry commit_entry = {snapshots[cache_size - 1] - 1, + snapshots.back() + 1}; + wp_db->old_commit_map_empty_ = true; // reset + wp_db->old_commit_map_.clear(); + wp_db->CheckAgainstSnapshots(commit_entry); + ASSERT_EQ(wp_db->old_commit_map_.size(), snapshots.size() - cache_size + 1); + } } // This test is too slow for travis @@ -864,8 +993,9 @@ TEST_P(SnapshotConcurrentAccessTest, SnapshotConcurrentAccessTest) { // Choose the cache size so that the new snapshot list could replace all the // existing items in the cache and also have some overflow. 
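// (With the extra = 2 below, the generated snapshot lists can exceed the cache capacity by up to two entries, so an update presumably has to overwrite every cached slot and still leave some items outside the cache.)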
DBImpl* mock_db = new DBImpl(options, dbname); + UpdateTransactionDBOptions(snapshot_cache_bits); std::unique_ptr wp_db( - new WritePreparedTxnDBMock(mock_db, txn_db_options, snapshot_cache_bits)); + new WritePreparedTxnDBMock(mock_db, txn_db_options)); const size_t extra = 2; size_t loop_id = 0; // Add up to extra items that do not fit into the cache @@ -987,10 +1117,176 @@ TEST_P(WritePreparedTransactionTest, AdvanceMaxEvictedSeqBasicTest) { } } +// A new snapshot should always be larger than max_evicted_seq_ +// Otherwise the snapshot does not go through AdvanceMaxEvictedSeq +TEST_P(WritePreparedTransactionTest, NewSnapshotLargerThanMax) { + WriteOptions woptions; + TransactionOptions txn_options; + WritePreparedTxnDB* wp_db = dynamic_cast(db); + Transaction* txn0 = db->BeginTransaction(woptions, txn_options); + ASSERT_OK(txn0->Put(Slice("key"), Slice("value"))); + ASSERT_OK(txn0->Commit()); + const SequenceNumber seq = txn0->GetId(); // is also prepare seq + delete txn0; + std::vector txns; + // Inc seq without committing anything + for (int i = 0; i < 10; i++) { + Transaction* txn = db->BeginTransaction(woptions, txn_options); + ASSERT_OK(txn->SetName("xid" + std::to_string(i))); + ASSERT_OK(txn->Put(Slice("key" + std::to_string(i)), Slice("value"))); + ASSERT_OK(txn->Prepare()); + txns.push_back(txn); + } + + // The new commit is seq + 10 + ASSERT_OK(db->Put(woptions, "key", "value")); + auto snap = wp_db->GetSnapshot(); + const SequenceNumber last_seq = snap->GetSequenceNumber(); + wp_db->ReleaseSnapshot(snap); + ASSERT_LT(seq, last_seq); + // Otherwise our test is not effective + ASSERT_LT(last_seq - seq, wp_db->INC_STEP_FOR_MAX_EVICTED); + + // Evict seq out of commit cache + const SequenceNumber overwrite_seq = seq + wp_db->COMMIT_CACHE_SIZE; + // Check that the next write could make max go beyond last + auto last_max = wp_db->max_evicted_seq_.load(); + wp_db->AddCommitted(overwrite_seq, overwrite_seq); + // Check that eviction has advanced the max + ASSERT_LT(last_max, wp_db->max_evicted_seq_.load()); + // Check that the new max has not advanced the last seq + ASSERT_LT(wp_db->max_evicted_seq_.load(), last_seq); + for (auto txn : txns) { + txn->Rollback(); + delete txn; + } +} + +// A new snapshot should always be larger than max_evicted_seq_ +// In very rare cases max could be below last published seq. Test that +// taking a snapshot will wait for max to catch up. +TEST_P(WritePreparedTransactionTest, MaxCatchupWithNewSnapshot) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 0; // only 1 entry => frequent eviction + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + WriteOptions woptions; + WritePreparedTxnDB* wp_db = dynamic_cast(db); + + const int writes = 50; + const int batch_cnt = 4; + rocksdb::port::Thread t1([&]() { + for (int i = 0; i < writes; i++) { + WriteBatch batch; + // The duplicate keys cause 4 commit entries, each evicting an entry that + // is not published yet, thus causing max evicted seq to go higher than last + // published. 
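// (Since commit_cache_bits is 0 above, the commit cache holds a single entry; each sub-commit of the batch evicts its predecessor, and that eviction can advance max_evicted_seq_ ahead of the last published sequence number.)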
+ for (int b = 0; b < batch_cnt; b++) { + batch.Put("foo", "foo"); + } + db->Write(woptions, &batch); + } + }); + + rocksdb::port::Thread t2([&]() { + while (wp_db->max_evicted_seq_ == 0) { // wait for insert thread + std::this_thread::yield(); + } + for (int i = 0; i < 10; i++) { + auto snap = db->GetSnapshot(); + if (snap->GetSequenceNumber() != 0) { + ASSERT_LT(wp_db->max_evicted_seq_, snap->GetSequenceNumber()); + } // seq 0 is ok to be less than max since nothing is visible to it + db->ReleaseSnapshot(snap); + } + }); + + t1.join(); + t2.join(); + + // Make sure that the test has worked and seq number has advanced as we + // thought + auto snap = db->GetSnapshot(); + ASSERT_GT(snap->GetSequenceNumber(), batch_cnt * writes - 1); + db->ReleaseSnapshot(snap); +} + +// Check that old_commit_map_ cleanup works correctly if the snapshot equals +// max_evicted_seq_. +TEST_P(WritePreparedTransactionTest, CleanupSnapshotEqualToMax) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 0; // only 1 entry => frequent eviction + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + WriteOptions woptions; + WritePreparedTxnDB* wp_db = dynamic_cast(db); + // Insert something to increase seq + ASSERT_OK(db->Put(woptions, "key", "value")); + auto snap = db->GetSnapshot(); + auto snap_seq = snap->GetSequenceNumber(); + // Another insert should trigger eviction + load snapshot from db + ASSERT_OK(db->Put(woptions, "key", "value")); + // This is the scenario that we check against + ASSERT_EQ(snap_seq, wp_db->max_evicted_seq_); + // old_commit_map_ now has some data that needs gc + ASSERT_EQ(1, wp_db->snapshots_total_); + ASSERT_EQ(1, wp_db->old_commit_map_.size()); + + db->ReleaseSnapshot(snap); + + // Another insert should trigger eviction + load snapshot from db + ASSERT_OK(db->Put(woptions, "key", "value")); + + // the snapshot and related metadata must be properly garbage collected + ASSERT_EQ(0, wp_db->snapshots_total_); + ASSERT_TRUE(wp_db->snapshots_all_.empty()); + ASSERT_EQ(0, wp_db->old_commit_map_.size()); +} + +TEST_P(WritePreparedTransactionTest, AdvanceSeqByOne) { + auto snap = db->GetSnapshot(); + auto seq1 = snap->GetSequenceNumber(); + db->ReleaseSnapshot(snap); + + WritePreparedTxnDB* wp_db = dynamic_cast(db); + wp_db->AdvanceSeqByOne(); + + snap = db->GetSnapshot(); + auto seq2 = snap->GetSequenceNumber(); + db->ReleaseSnapshot(snap); + + ASSERT_LT(seq1, seq2); +} + +// Test that the txn Initialize calls the overridden functions +TEST_P(WritePreparedTransactionTest, TxnInitialize) { + TransactionOptions txn_options; + WriteOptions write_options; + ASSERT_OK(db->Put(write_options, "key", "value")); + Transaction* txn0 = db->BeginTransaction(write_options, txn_options); + ASSERT_OK(txn0->SetName("xid")); + ASSERT_OK(txn0->Put(Slice("key"), Slice("value1"))); + ASSERT_OK(txn0->Prepare()); + + // SetSnapshot is overridden to update min_uncommitted_ + txn_options.set_snapshot = true; + Transaction* txn1 = db->BeginTransaction(write_options, txn_options); + auto snap = txn1->GetSnapshot(); + auto snap_impl = reinterpret_cast(snap); + // If ::Initialize calls the overridden SetSnapshot, min_uncommitted_ must be + // updated + ASSERT_GT(snap_impl->min_uncommitted_, kMinUnCommittedSeq); + + txn0->Rollback(); + txn1->Rollback(); + delete txn0; + delete txn1; +} + // This tests that transactions with duplicate keys perform correctly after max // is advancing their prepared sequence numbers. 
This will not be the case if // for example the txn does not add the prepared seq for the second sub-batch to -// the PrepareHeap structure. +// the PreparedHeap structure. TEST_P(WritePreparedTransactionTest, AdvanceMaxEvictedSeqWithDuplicatesTest) { WriteOptions write_options; TransactionOptions txn_options; @@ -1002,7 +1298,7 @@ TEST_P(WritePreparedTransactionTest, AdvanceMaxEvictedSeqWithDuplicatesTest) { WritePreparedTxnDB* wp_db = dynamic_cast(db); // Ensure that all the prepared sequence numbers will be removed from the - // PrepareHeap. + // PreparedHeap. SequenceNumber new_max = wp_db->COMMIT_CACHE_SIZE; wp_db->AdvanceMaxEvictedSeq(0, new_max); @@ -1150,12 +1446,12 @@ TEST_P(SeqAdvanceConcurrentTest, SeqAdvanceConcurrentTest) { assert(db != nullptr); db_impl = reinterpret_cast(db->GetRootDB()); seq = db_impl->TEST_GetLastVisibleSequence(); - ASSERT_EQ(exp_seq, seq); + ASSERT_LE(exp_seq, seq); // Check if flush preserves the last sequence number db_impl->Flush(fopt); seq = db_impl->GetLatestSequenceNumber(); - ASSERT_EQ(exp_seq, seq); + ASSERT_LE(exp_seq, seq); // Check if recovery after flush preserves the last sequence number db_impl->FlushWAL(true); @@ -1163,7 +1459,7 @@ TEST_P(SeqAdvanceConcurrentTest, SeqAdvanceConcurrentTest) { assert(db != nullptr); db_impl = reinterpret_cast(db->GetRootDB()); seq = db_impl->GetLatestSequenceNumber(); - ASSERT_EQ(exp_seq, seq); + ASSERT_LE(exp_seq, seq); } } @@ -1308,17 +1604,138 @@ TEST_P(WritePreparedTransactionTest, BasicRecoveryTest) { } // After recovery the commit map is empty while the max is set. The code would -// go through a different path which requires a separate test. +// go through a different path which requires a separate test. Test that the +// committed data before the restart is visible to all snapshots. 
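// (Concretely, and mirroring the loops in the test below: for any snapshot snap_seq taken after recovery, IsInSnapshot(seq, snap_seq) is expected to hold for every committed seq <= max_evicted_seq_; the only exception is a seq that was left prepared across the restart.)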
TEST_P(WritePreparedTransactionTest, IsInSnapshotEmptyMapTest) { + for (bool end_with_prepare : {false, true}) { + ReOpen(); + WriteOptions woptions; + ASSERT_OK(db->Put(woptions, "key", "value")); + ASSERT_OK(db->Put(woptions, "key", "value")); + ASSERT_OK(db->Put(woptions, "key", "value")); + SequenceNumber prepare_seq = kMaxSequenceNumber; + if (end_with_prepare) { + TransactionOptions txn_options; + Transaction* txn = db->BeginTransaction(woptions, txn_options); + ASSERT_OK(txn->SetName("xid0")); + ASSERT_OK(txn->Prepare()); + prepare_seq = txn->GetId(); + delete txn; + } + dynamic_cast(db)->TEST_Crash(); + auto db_impl = reinterpret_cast(db->GetRootDB()); + db_impl->FlushWAL(true); + ReOpenNoDelete(); + WritePreparedTxnDB* wp_db = dynamic_cast(db); + assert(wp_db != nullptr); + ASSERT_GT(wp_db->max_evicted_seq_, 0); // max after recovery + // Take a snapshot right after recovery + const Snapshot* snap = db->GetSnapshot(); + auto snap_seq = snap->GetSequenceNumber(); + ASSERT_GT(snap_seq, 0); + + for (SequenceNumber seq = 0; + seq <= wp_db->max_evicted_seq_ && seq != prepare_seq; seq++) { + ASSERT_TRUE(wp_db->IsInSnapshot(seq, snap_seq)); + } + if (end_with_prepare) { + ASSERT_FALSE(wp_db->IsInSnapshot(prepare_seq, snap_seq)); + } + // trivial check + ASSERT_FALSE(wp_db->IsInSnapshot(snap_seq + 1, snap_seq)); + + db->ReleaseSnapshot(snap); + + ASSERT_OK(db->Put(woptions, "key", "value")); + // Take a snapshot after some writes + snap = db->GetSnapshot(); + snap_seq = snap->GetSequenceNumber(); + for (SequenceNumber seq = 0; + seq <= wp_db->max_evicted_seq_ && seq != prepare_seq; seq++) { + ASSERT_TRUE(wp_db->IsInSnapshot(seq, snap_seq)); + } + if (end_with_prepare) { + ASSERT_FALSE(wp_db->IsInSnapshot(prepare_seq, snap_seq)); + } + // trivial check + ASSERT_FALSE(wp_db->IsInSnapshot(snap_seq + 1, snap_seq)); + + db->ReleaseSnapshot(snap); + } +} + +// Shows the contract of IsInSnapshot when called on invalid/released snapshots +TEST_P(WritePreparedTransactionTest, IsInSnapshotReleased) { WritePreparedTxnDB* wp_db = dynamic_cast(db); - wp_db->max_evicted_seq_ = 100; - ASSERT_FALSE(wp_db->IsInSnapshot(50, 40)); - ASSERT_TRUE(wp_db->IsInSnapshot(50, 50)); - ASSERT_TRUE(wp_db->IsInSnapshot(50, 100)); - ASSERT_TRUE(wp_db->IsInSnapshot(50, 150)); - ASSERT_FALSE(wp_db->IsInSnapshot(100, 80)); - ASSERT_TRUE(wp_db->IsInSnapshot(100, 100)); - ASSERT_TRUE(wp_db->IsInSnapshot(100, 150)); + WriteOptions woptions; + ASSERT_OK(db->Put(woptions, "key", "value")); + // snap seq = 1 + const Snapshot* snap1 = db->GetSnapshot(); + ASSERT_OK(db->Put(woptions, "key", "value")); + ASSERT_OK(db->Put(woptions, "key", "value")); + // snap seq = 3 + const Snapshot* snap2 = db->GetSnapshot(); + const SequenceNumber seq = 1; + // Evict seq out of commit cache + size_t overwrite_seq = wp_db->COMMIT_CACHE_SIZE + seq; + wp_db->AddCommitted(overwrite_seq, overwrite_seq); + SequenceNumber snap_seq; + uint64_t min_uncommitted = kMinUnCommittedSeq; + bool released; + + released = false; + snap_seq = snap1->GetSequenceNumber(); + ASSERT_LE(seq, snap_seq); + // Valid snapshot lower than max + ASSERT_LE(snap_seq, wp_db->max_evicted_seq_); + ASSERT_TRUE(wp_db->IsInSnapshot(seq, snap_seq, min_uncommitted, &released)); + ASSERT_FALSE(released); + + released = false; + snap_seq = snap1->GetSequenceNumber(); + // Invalid snapshot lower than max + ASSERT_LE(snap_seq + 1, wp_db->max_evicted_seq_); + ASSERT_TRUE( + wp_db->IsInSnapshot(seq, snap_seq + 1, min_uncommitted, &released)); + ASSERT_TRUE(released); + + 
db->ReleaseSnapshot(snap1); + + released = false; + // Released snapshot lower than max + ASSERT_TRUE(wp_db->IsInSnapshot(seq, snap_seq, min_uncommitted, &released)); + // The release does not take effect until the next max advance + ASSERT_FALSE(released); + + released = false; + // Invalid snapshot lower than max + ASSERT_TRUE( + wp_db->IsInSnapshot(seq, snap_seq + 1, min_uncommitted, &released)); + ASSERT_TRUE(released); + + // This makes the snapshot release reflect in the txn db structures + wp_db->AdvanceMaxEvictedSeq(wp_db->max_evicted_seq_, + wp_db->max_evicted_seq_ + 1); + + released = false; + // Released snapshot lower than max + ASSERT_TRUE(wp_db->IsInSnapshot(seq, snap_seq, min_uncommitted, &released)); + ASSERT_TRUE(released); + + released = false; + // Invalid snapshot lower than max + ASSERT_TRUE( + wp_db->IsInSnapshot(seq, snap_seq + 1, min_uncommitted, &released)); + ASSERT_TRUE(released); + + snap_seq = snap2->GetSequenceNumber(); + + released = false; + // Unreleased snapshot lower than max + ASSERT_TRUE(wp_db->IsInSnapshot(seq, snap_seq, min_uncommitted, &released)); + ASSERT_FALSE(released); + + db->ReleaseSnapshot(snap2); } // Test WritePreparedTxnDB's IsInSnapshot against different ordering of @@ -1364,8 +1781,9 @@ TEST_P(WritePreparedTransactionTest, IsInSnapshotTest) { // The set of commit seq numbers to be excluded from IsInSnapshot queries std::set commit_seqs; DBImpl* mock_db = new DBImpl(options, dbname); - std::unique_ptr wp_db(new WritePreparedTxnDBMock( - mock_db, txn_db_options, snapshot_cache_bits, commit_cache_bits)); + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + std::unique_ptr wp_db( + new WritePreparedTxnDBMock(mock_db, txn_db_options)); // We continue until max advances a bit beyond the snapshot. while (!snapshot || wp_db->max_evicted_seq_ < snapshot + 100) { // do prepare for a transaction @@ -1785,6 +2203,259 @@ TEST_P(WritePreparedTransactionTest, CompactionShouldKeepSnapshotVisibleKeys) { db->ReleaseSnapshot(snapshot2); } +TEST_P(WritePreparedTransactionTest, SmallestUncommittedOptimization) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 0; // disable commit cache + for (bool has_recent_prepare : {true, false}) { + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + + ASSERT_OK(db->Put(WriteOptions(), "key1", "value1")); + auto* transaction = + db->BeginTransaction(WriteOptions(), TransactionOptions(), nullptr); + ASSERT_OK(transaction->SetName("txn")); + ASSERT_OK(transaction->Delete("key1")); + ASSERT_OK(transaction->Prepare()); + // snapshot1 should get min_uncommitted from prepared_txns_ heap. + auto snapshot1 = db->GetSnapshot(); + ASSERT_EQ(transaction->GetId(), + ((SnapshotImpl*)snapshot1)->min_uncommitted_); + // Add a commit to advance max_evicted_seq and move the prepared transaction + // into delayed_prepared_ set. + ASSERT_OK(db->Put(WriteOptions(), "key2", "value2")); + Transaction* txn2 = nullptr; + if (has_recent_prepare) { + txn2 = + db->BeginTransaction(WriteOptions(), TransactionOptions(), nullptr); + ASSERT_OK(txn2->SetName("txn2")); + ASSERT_OK(txn2->Put("key3", "value3")); + ASSERT_OK(txn2->Prepare()); + } + // snapshot2 should get min_uncommitted from delayed_prepared_ set. 
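// (delayed_prepared_ tracks prepared sequence numbers that have fallen behind max_evicted_seq_; the transaction prepared above is still the oldest uncommitted write, so its id remains the expected min_uncommitted below.)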
+ auto snapshot2 = db->GetSnapshot(); + ASSERT_EQ(transaction->GetId(), + ((SnapshotImpl*)snapshot2)->min_uncommitted_); + ASSERT_OK(transaction->Commit()); + delete transaction; + if (has_recent_prepare) { + ASSERT_OK(txn2->Commit()); + delete txn2; + } + VerifyKeys({{"key1", "NOT_FOUND"}}); + VerifyKeys({{"key1", "value1"}}, snapshot1); + VerifyKeys({{"key1", "value1"}}, snapshot2); + db->ReleaseSnapshot(snapshot1); + db->ReleaseSnapshot(snapshot2); + } +} + +// Insert two values, v1 and v2, for a key. Between prepare and commit of v2 +// take two snapshots, s1 and s2. Release s1 during compaction. +// Test to make sure compaction doesn't get confused and think s1 can see both +// values, and thus compact out the older value by mistake. +TEST_P(WritePreparedTransactionTest, ReleaseSnapshotDuringCompaction) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 0; // minimum commit cache + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + + ASSERT_OK(db->Put(WriteOptions(), "key1", "value1_1")); + auto* transaction = + db->BeginTransaction(WriteOptions(), TransactionOptions(), nullptr); + ASSERT_OK(transaction->SetName("txn")); + ASSERT_OK(transaction->Put("key1", "value1_2")); + ASSERT_OK(transaction->Prepare()); + auto snapshot1 = db->GetSnapshot(); + // Increment sequence number. + ASSERT_OK(db->Put(WriteOptions(), "key2", "value2")); + auto snapshot2 = db->GetSnapshot(); + ASSERT_OK(transaction->Commit()); + delete transaction; + VerifyKeys({{"key1", "value1_2"}}); + VerifyKeys({{"key1", "value1_1"}}, snapshot1); + VerifyKeys({{"key1", "value1_1"}}, snapshot2); + // Add a flush to avoid compaction falling back to trivial move. + + auto callback = [&](void*) { + // Release snapshot1 after CompactionIterator init. + // CompactionIterator needs to figure out the earliest snapshot + // that can see key1:value1_2 is kMaxSequenceNumber, not + // snapshot1 or snapshot2. + db->ReleaseSnapshot(snapshot1); + // Add some keys to advance max_evicted_seq. + ASSERT_OK(db->Put(WriteOptions(), "key3", "value3")); + ASSERT_OK(db->Put(WriteOptions(), "key4", "value4")); + }; + SyncPoint::GetInstance()->SetCallBack("CompactionIterator:AfterInit", + callback); + SyncPoint::GetInstance()->EnableProcessing(); + + ASSERT_OK(db->Flush(FlushOptions())); + VerifyKeys({{"key1", "value1_2"}}); + VerifyKeys({{"key1", "value1_1"}}, snapshot2); + db->ReleaseSnapshot(snapshot2); + SyncPoint::GetInstance()->ClearAllCallBacks(); +} + +// Insert two values, v1 and v2, for a key. Take two snapshots, s1 and s2, +// after committing v2. Release s1 during compaction, right after compaction +// processes v2 and before it processes v1. Test to make sure compaction doesn't +// get confused and believe v1 and v2 are visible to different snapshots +// (v1 by s2, v2 by s1) and refuse to compact out v1. +TEST_P(WritePreparedTransactionTest, ReleaseSnapshotDuringCompaction2) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 0; // minimum commit cache + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + + ASSERT_OK(db->Put(WriteOptions(), "key1", "value1")); + ASSERT_OK(db->Put(WriteOptions(), "key1", "value2")); + SequenceNumber v2_seq = db->GetLatestSequenceNumber(); + auto* s1 = db->GetSnapshot(); + // Advance sequence number. 
+ ASSERT_OK(db->Put(WriteOptions(), "key2", "dummy")); + auto* s2 = db->GetSnapshot(); + + int count_value = 0; + auto callback = [&](void* arg) { + auto* ikey = reinterpret_cast(arg); + if (ikey->user_key == "key1") { + count_value++; + if (count_value == 2) { + // Processing v1. + db->ReleaseSnapshot(s1); + // Add some keys to advance max_evicted_seq and update + // old_commit_map. + ASSERT_OK(db->Put(WriteOptions(), "key3", "dummy")); + ASSERT_OK(db->Put(WriteOptions(), "key4", "dummy")); + } + } + }; + SyncPoint::GetInstance()->SetCallBack("CompactionIterator:ProcessKV", + callback); + SyncPoint::GetInstance()->EnableProcessing(); + + ASSERT_OK(db->Flush(FlushOptions())); + // value1 should be compact out. + VerifyInternalKeys({{"key1", "value2", v2_seq, kTypeValue}}); + + // cleanup + db->ReleaseSnapshot(s2); + SyncPoint::GetInstance()->ClearAllCallBacks(); +} + +// Insert two values, v1 and v2, for a key. Insert another dummy key +// so to evict the commit cache for v2, while v1 is still in commit cache. +// Take two snapshots, s1 and s2. Release s1 during compaction. +// Since commit cache for v2 is evicted, and old_commit_map don't have +// s1 (it is released), +// TODO(myabandeh): how can we be sure that the v2's commit info is evicted +// (and not v1's)? Instead of putting a dummy, we can directly call +// AddCommitted(v2_seq + cache_size, ...) to evict v2's entry from commit cache. +TEST_P(WritePreparedTransactionTest, ReleaseSnapshotDuringCompaction3) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 1; // commit cache size = 2 + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + + // Add a dummy key to evict v2 commit cache, but keep v1 commit cache. + // It also advance max_evicted_seq and can trigger old_commit_map cleanup. + auto add_dummy = [&]() { + auto* txn_dummy = + db->BeginTransaction(WriteOptions(), TransactionOptions(), nullptr); + ASSERT_OK(txn_dummy->SetName("txn_dummy")); + ASSERT_OK(txn_dummy->Put("dummy", "dummy")); + ASSERT_OK(txn_dummy->Prepare()); + ASSERT_OK(txn_dummy->Commit()); + delete txn_dummy; + }; + + ASSERT_OK(db->Put(WriteOptions(), "key1", "value1")); + auto* txn = + db->BeginTransaction(WriteOptions(), TransactionOptions(), nullptr); + ASSERT_OK(txn->SetName("txn")); + ASSERT_OK(txn->Put("key1", "value2")); + ASSERT_OK(txn->Prepare()); + // TODO(myabandeh): replace it with GetId()? + auto v2_seq = db->GetLatestSequenceNumber(); + ASSERT_OK(txn->Commit()); + delete txn; + auto* s1 = db->GetSnapshot(); + // Dummy key to advance sequence number. + add_dummy(); + auto* s2 = db->GetSnapshot(); + + auto callback = [&](void*) { + db->ReleaseSnapshot(s1); + // Add some dummy entries to trigger s1 being cleanup from old_commit_map. + add_dummy(); + add_dummy(); + }; + SyncPoint::GetInstance()->SetCallBack("CompactionIterator:AfterInit", + callback); + SyncPoint::GetInstance()->EnableProcessing(); + + ASSERT_OK(db->Flush(FlushOptions())); + // value1 should be compact out. 
+ VerifyInternalKeys({{"key1", "value2", v2_seq, kTypeValue}}); + + db->ReleaseSnapshot(s2); + SyncPoint::GetInstance()->ClearAllCallBacks(); +} + +TEST_P(WritePreparedTransactionTest, ReleaseEarliestSnapshotDuringCompaction) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 0; // minimum commit cache + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + + ASSERT_OK(db->Put(WriteOptions(), "key1", "value1")); + auto* transaction = + db->BeginTransaction(WriteOptions(), TransactionOptions(), nullptr); + ASSERT_OK(transaction->SetName("txn")); + ASSERT_OK(transaction->Delete("key1")); + ASSERT_OK(transaction->Prepare()); + SequenceNumber del_seq = db->GetLatestSequenceNumber(); + auto snapshot1 = db->GetSnapshot(); + // Increment sequence number. + ASSERT_OK(db->Put(WriteOptions(), "key2", "value2")); + auto snapshot2 = db->GetSnapshot(); + ASSERT_OK(transaction->Commit()); + delete transaction; + VerifyKeys({{"key1", "NOT_FOUND"}}); + VerifyKeys({{"key1", "value1"}}, snapshot1); + VerifyKeys({{"key1", "value1"}}, snapshot2); + ASSERT_OK(db->Flush(FlushOptions())); + + auto callback = [&](void* compaction) { + // Release snapshot1 after CompactionIterator init. + // CompactionIterator needs to double-check and find out snapshot2 is now + // the earliest existing snapshot. + if (compaction != nullptr) { + db->ReleaseSnapshot(snapshot1); + // Add some keys to advance max_evicted_seq. + ASSERT_OK(db->Put(WriteOptions(), "key3", "value3")); + ASSERT_OK(db->Put(WriteOptions(), "key4", "value4")); + } + }; + SyncPoint::GetInstance()->SetCallBack("CompactionIterator:AfterInit", + callback); + SyncPoint::GetInstance()->EnableProcessing(); + + // Dummy keys to avoid compaction trivially moving files and to get around actual + // compaction logic. + ASSERT_OK(db->Put(WriteOptions(), "a", "dummy")); + ASSERT_OK(db->Put(WriteOptions(), "z", "dummy")); + ASSERT_OK(db->CompactRange(CompactRangeOptions(), nullptr, nullptr)); + // Only verify for key1. Both the put and delete for the key should be kept. + // Since the delete tombstone is not visible to snapshot2, we need to keep + // at least one version of the key, for write-conflict check. + VerifyInternalKeys({{"key1", "", del_seq, kTypeDeletion}, + {"key1", "value1", 0, kTypeValue}}); + db->ReleaseSnapshot(snapshot2); + SyncPoint::GetInstance()->ClearAllCallBacks(); +} + // A more complex test to verify compaction/flush should keep keys visible // to snapshots. TEST_P(WritePreparedTransactionTest, @@ -1916,6 +2587,34 @@ TEST_P(WritePreparedTransactionTest, delete transaction; } +TEST_P(WritePreparedTransactionTest, CommitAndSnapshotDuringCompaction) { + options.disable_auto_compactions = true; + ReOpen(); + + const Snapshot* snapshot = nullptr; + ASSERT_OK(db->Put(WriteOptions(), "key1", "value1")); + auto* txn = db->BeginTransaction(WriteOptions()); + ASSERT_OK(txn->SetName("txn")); + ASSERT_OK(txn->Put("key1", "value2")); + ASSERT_OK(txn->Prepare()); + + auto callback = [&](void*) { + // Snapshot is taken after compaction start. It should be taken into + // consideration for whether to compact out value1. 
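// (The callback fires at the CompactionIterator:AfterInit sync point, so the snapshot below is acquired while the flush is in flight; the commit then makes value2 the latest state while the racing snapshot must still observe value1, as verified after the flush.)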
+ snapshot = db->GetSnapshot(); + ASSERT_OK(txn->Commit()); + delete txn; + }; + SyncPoint::GetInstance()->SetCallBack("CompactionIterator:AfterInit", + callback); + SyncPoint::GetInstance()->EnableProcessing(); + ASSERT_OK(db->Flush(FlushOptions())); + ASSERT_NE(nullptr, snapshot); + VerifyKeys({{"key1", "value2"}}); + VerifyKeys({{"key1", "value1"}}, snapshot); + db->ReleaseSnapshot(snapshot); +} + TEST_P(WritePreparedTransactionTest, Iterate) { auto verify_state = [](Iterator* iter, const std::string& key, const std::string& value) { @@ -1977,6 +2676,418 @@ TEST_P(WritePreparedTransactionTest, IteratorRefreshNotSupported) { delete iter; } +// Committing a delayed prepared has two non-atomic steps: update commit cache, +// remove seq from delayed_prepared_. The read in IsInSnapshot also involves two +// non-atomic steps of checking these two data structures. This test breaks each +// in the middle to ensure correctness in spite of non-atomic execution. +// Note: This test is limited to the case where snapshot is larger than the +// max_evicted_seq_. +TEST_P(WritePreparedTransactionTest, NonAtomicCommitOfDelayedPrepared) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 3; // 8 entries + for (auto split_read : {true, false}) { + std::vector split_options = {false}; + if (split_read) { + // Also test for break before mutex + split_options.push_back(true); + } + for (auto split_before_mutex : split_options) { + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + WritePreparedTxnDB* wp_db = dynamic_cast(db); + DBImpl* db_impl = reinterpret_cast(db->GetRootDB()); + // Fill up the commit cache + std::string init_value("value1"); + for (int i = 0; i < 10; i++) { + db->Put(WriteOptions(), Slice("key1"), Slice(init_value)); + } + // Prepare a transaction but do not commit it + Transaction* txn = + db->BeginTransaction(WriteOptions(), TransactionOptions()); + ASSERT_OK(txn->SetName("xid")); + ASSERT_OK(txn->Put(Slice("key1"), Slice("value2"))); + ASSERT_OK(txn->Prepare()); + // Commit a bunch of entries to advance max evicted seq and make the + // prepared a delayed prepared + for (int i = 0; i < 10; i++) { + db->Put(WriteOptions(), Slice("key3"), Slice("value3")); + } + // The snapshot should not see the delayed prepared entry + auto snap = db->GetSnapshot(); + + if (split_read) { + if (split_before_mutex) { + // split before acquiring prepare_mutex_ + rocksdb::SyncPoint::GetInstance()->LoadDependency( + {{"WritePreparedTxnDB::IsInSnapshot:prepared_mutex_:pause", + "AtomicCommitOfDelayedPrepared:Commit:before"}, + {"AtomicCommitOfDelayedPrepared:Commit:after", + "WritePreparedTxnDB::IsInSnapshot:prepared_mutex_:resume"}}); + } else { + // split right after reading from the commit cache + rocksdb::SyncPoint::GetInstance()->LoadDependency( + {{"WritePreparedTxnDB::IsInSnapshot:GetCommitEntry:pause", + "AtomicCommitOfDelayedPrepared:Commit:before"}, + {"AtomicCommitOfDelayedPrepared:Commit:after", + "WritePreparedTxnDB::IsInSnapshot:GetCommitEntry:resume"}}); + } + } else { // split commit + // split right before removing from delayed_prepared_ + rocksdb::SyncPoint::GetInstance()->LoadDependency( + {{"WritePreparedTxnDB::RemovePrepared:pause", + "AtomicCommitOfDelayedPrepared:Read:before"}, + {"AtomicCommitOfDelayedPrepared:Read:after", + "WritePreparedTxnDB::RemovePrepared:resume"}}); + } + SyncPoint::GetInstance()->EnableProcessing(); + + rocksdb::port::Thread commit_thread([&]() { 
TEST_SYNC_POINT("AtomicCommitOfDelayedPrepared:Commit:before"); + ASSERT_OK(txn->Commit()); + if (split_before_mutex) { + // Do bunch of inserts to evict the commit entry from the cache. This + // would prevent the 2nd look into commit cache under prepare_mutex_ + // to see the commit entry. + auto seq = db_impl->TEST_GetLastVisibleSequence(); + size_t tries = 0; + while (wp_db->max_evicted_seq_ < seq && tries < 50) { + db->Put(WriteOptions(), Slice("key3"), Slice("value3")); + tries++; + }; + ASSERT_LT(tries, 50); + } + TEST_SYNC_POINT("AtomicCommitOfDelayedPrepared:Commit:after"); + delete txn; + }); + + rocksdb::port::Thread read_thread([&]() { + TEST_SYNC_POINT("AtomicCommitOfDelayedPrepared:Read:before"); + ReadOptions roptions; + roptions.snapshot = snap; + PinnableSlice value; + auto s = db->Get(roptions, db->DefaultColumnFamily(), "key1", &value); + ASSERT_OK(s); + // It should not see the commit of delayed prepared + ASSERT_TRUE(value == init_value); + TEST_SYNC_POINT("AtomicCommitOfDelayedPrepared:Read:after"); + db->ReleaseSnapshot(snap); + }); + + read_thread.join(); + commit_thread.join(); + rocksdb::SyncPoint::GetInstance()->DisableProcessing(); + rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks(); + } // for split_before_mutex + } // for split_read +} + +// When max evicted seq advances a prepared seq, it involves two updates: i) +// adding prepared seq to delayed_prepared_, ii) updating max_evicted_seq_. +// ::IsInSnapshot also reads these two values in a non-atomic way. This test +// ensures correctness if the update occurs after ::IsInSnapshot reads +// delayed_prepared_empty_ and before it reads max_evicted_seq_. +// Note: this test focuses on read snapshot larger than max_evicted_seq_. +TEST_P(WritePreparedTransactionTest, NonAtomicUpdateOfDelayedPrepared) { + const size_t snapshot_cache_bits = 7; // same as default + const size_t commit_cache_bits = 3; // 8 entries + UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits); + ReOpen(); + WritePreparedTxnDB* wp_db = dynamic_cast(db); + // Fill up the commit cache + std::string init_value("value1"); + for (int i = 0; i < 10; i++) { + db->Put(WriteOptions(), Slice("key1"), Slice(init_value)); + } + // Prepare a transaction but do not commit it + Transaction* txn = db->BeginTransaction(WriteOptions(), TransactionOptions()); + ASSERT_OK(txn->SetName("xid")); + ASSERT_OK(txn->Put(Slice("key1"), Slice("value2"))); + ASSERT_OK(txn->Prepare()); + // Create a gap between prepare seq and snapshot seq + db->Put(WriteOptions(), Slice("key3"), Slice("value3")); + db->Put(WriteOptions(), Slice("key3"), Slice("value3")); + // The snapshot should not see the delayed prepared entry + auto snap = db->GetSnapshot(); + ASSERT_LT(txn->GetId(), snap->GetSequenceNumber()); + + // split right after reading delayed_prepared_empty_ + rocksdb::SyncPoint::GetInstance()->LoadDependency( + {{"WritePreparedTxnDB::IsInSnapshot:delayed_prepared_empty_:pause", + "AtomicUpdateOfDelayedPrepared:before"}, + {"AtomicUpdateOfDelayedPrepared:after", + "WritePreparedTxnDB::IsInSnapshot:delayed_prepared_empty_:resume"}}); + SyncPoint::GetInstance()->EnableProcessing(); + + rocksdb::port::Thread commit_thread([&]() { + TEST_SYNC_POINT("AtomicUpdateOfDelayedPrepared:before"); + // Commit a bunch of entries to advance max evicted seq and make the + // prepared a delayed prepared + size_t tries = 0; + while (wp_db->max_evicted_seq_ < txn->GetId() && tries < 50) { + db->Put(WriteOptions(), Slice("key3"), Slice("value3")); + tries++; + }; + 
ASSERT_LT(tries, 50);
+    // This is the case on which the test focuses
+    ASSERT_LT(wp_db->max_evicted_seq_, snap->GetSequenceNumber());
+    TEST_SYNC_POINT("AtomicUpdateOfDelayedPrepared:after");
+  });
+
+  rocksdb::port::Thread read_thread([&]() {
+    ReadOptions roptions;
+    roptions.snapshot = snap;
+    PinnableSlice value;
+    auto s = db->Get(roptions, db->DefaultColumnFamily(), "key1", &value);
+    ASSERT_OK(s);
+    // It should not see the uncommitted value of the delayed prepared
+    ASSERT_TRUE(value == init_value);
+    db->ReleaseSnapshot(snap);
+  });
+
+  read_thread.join();
+  commit_thread.join();
+  ASSERT_OK(txn->Commit());
+  delete txn;
+  rocksdb::SyncPoint::GetInstance()->DisableProcessing();
+  rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
+}
+
+// Eviction from the commit cache and the update of max evicted seq are two
+// non-atomic steps. Similarly, the read of max_evicted_seq_ in ::IsInSnapshot
+// and the read from the commit cache are two non-atomic steps. This tests the
+// case where the update occurs after reading max_evicted_seq_ and before
+// reading the commit cache.
+// Note: the test focuses on a snapshot larger than max_evicted_seq_
+TEST_P(WritePreparedTransactionTest, NonAtomicUpdateOfMaxEvictedSeq) {
+  const size_t snapshot_cache_bits = 7;  // same as default
+  const size_t commit_cache_bits = 3;    // 8 entries
+  UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits);
+  ReOpen();
+  WritePreparedTxnDB* wp_db = dynamic_cast<WritePreparedTxnDB*>(db);
+  // Fill up the commit cache
+  std::string init_value("value1");
+  std::string last_value("value_final");
+  for (int i = 0; i < 10; i++) {
+    db->Put(WriteOptions(), Slice("key1"), Slice(init_value));
+  }
+  // Do an uncommitted write to prevent the min_uncommitted optimization
+  Transaction* txn1 =
+      db->BeginTransaction(WriteOptions(), TransactionOptions());
+  ASSERT_OK(txn1->SetName("xid1"));
+  ASSERT_OK(txn1->Put(Slice("key0"), last_value));
+  ASSERT_OK(txn1->Prepare());
+  // Do a write with prepare to get the prepare seq
+  Transaction* txn = db->BeginTransaction(WriteOptions(), TransactionOptions());
+  ASSERT_OK(txn->SetName("xid"));
+  ASSERT_OK(txn->Put(Slice("key1"), last_value));
+  ASSERT_OK(txn->Prepare());
+  ASSERT_OK(txn->Commit());
+  // Create a gap between the commit entry and the snapshot seq
+  db->Put(WriteOptions(), Slice("key3"), Slice("value3"));
+  db->Put(WriteOptions(), Slice("key3"), Slice("value3"));
+  // The snapshot should see the last commit
+  auto snap = db->GetSnapshot();
+  ASSERT_LE(txn->GetId(), snap->GetSequenceNumber());
+
+  // split right after reading max_evicted_seq_
+  rocksdb::SyncPoint::GetInstance()->LoadDependency(
+      {{"WritePreparedTxnDB::IsInSnapshot:max_evicted_seq_:pause",
+        "NonAtomicUpdateOfMaxEvictedSeq:before"},
+       {"NonAtomicUpdateOfMaxEvictedSeq:after",
+        "WritePreparedTxnDB::IsInSnapshot:max_evicted_seq_:resume"}});
+  SyncPoint::GetInstance()->EnableProcessing();
+
+  rocksdb::port::Thread commit_thread([&]() {
+    TEST_SYNC_POINT("NonAtomicUpdateOfMaxEvictedSeq:before");
+    // Commit a bunch of entries to advance max evicted seq beyond txn->GetId()
+    size_t tries = 0;
+    while (wp_db->max_evicted_seq_ < txn->GetId() && tries < 50) {
+      db->Put(WriteOptions(), Slice("key3"), Slice("value3"));
+      tries++;
+    };
+    ASSERT_LT(tries, 50);
+    // This is the case on which the test focuses
+    ASSERT_LT(wp_db->max_evicted_seq_, snap->GetSequenceNumber());
+    TEST_SYNC_POINT("NonAtomicUpdateOfMaxEvictedSeq:after");
+  });
+
+  rocksdb::port::Thread read_thread([&]() {
+    ReadOptions roptions;
+    roptions.snapshot = snap;
+    PinnableSlice value;
+    auto s =
db->Get(roptions, db->DefaultColumnFamily(), "key1", &value);
+    ASSERT_OK(s);
+    // It should see the committed value of the evicted entry
+    ASSERT_TRUE(value == last_value);
+    db->ReleaseSnapshot(snap);
+  });
+
+  read_thread.join();
+  commit_thread.join();
+  delete txn;
+  txn1->Commit();
+  delete txn1;
+  rocksdb::SyncPoint::GetInstance()->DisableProcessing();
+  rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
+}
+
+// Test adding a prepared seq after max_evicted_seq_ has already advanced
+// beyond it. The test focuses on a race condition between the AddPrepared and
+// AdvanceMaxEvictedSeq functions.
+TEST_P(WritePreparedTransactionTest, AddPreparedBeforeMax) {
+  if (!options.two_write_queues) {
+    // This test is only for two write queues
+    return;
+  }
+  const size_t snapshot_cache_bits = 7;  // same as default
+  // 1 entry to advance max after the 2nd commit
+  const size_t commit_cache_bits = 0;
+  UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits);
+  ReOpen();
+  WritePreparedTxnDB* wp_db = dynamic_cast<WritePreparedTxnDB*>(db);
+  std::string some_value("value_some");
+  std::string uncommitted_value("value_uncommitted");
+  // Prepare two uncommitted transactions
+  Transaction* txn1 =
+      db->BeginTransaction(WriteOptions(), TransactionOptions());
+  ASSERT_OK(txn1->SetName("xid1"));
+  ASSERT_OK(txn1->Put(Slice("key1"), some_value));
+  ASSERT_OK(txn1->Prepare());
+  Transaction* txn2 =
+      db->BeginTransaction(WriteOptions(), TransactionOptions());
+  ASSERT_OK(txn2->SetName("xid2"));
+  ASSERT_OK(txn2->Put(Slice("key2"), some_value));
+  ASSERT_OK(txn2->Prepare());
+  // Start the txn here so the other thread can get its id
+  Transaction* txn = db->BeginTransaction(WriteOptions(), TransactionOptions());
+  ASSERT_OK(txn->SetName("xid"));
+  ASSERT_OK(txn->Put(Slice("key0"), uncommitted_value));
+  port::Mutex txn_mutex_;
+
+  // t1) Insert the prepared entry, t2) commit other entries to advance max
+  // evicted seq and finish checking the existing prepared entries, t1)
+  // AddPrepared, t2) update max_evicted_seq_
+  rocksdb::SyncPoint::GetInstance()->LoadDependency({
+      {"AddPrepared::begin:pause", "AddPreparedBeforeMax::read_thread:start"},
+      {"AdvanceMaxEvictedSeq::update_max:pause", "AddPrepared::begin:resume"},
+      {"AddPrepared::end", "AdvanceMaxEvictedSeq::update_max:resume"},
+  });
+  SyncPoint::GetInstance()->EnableProcessing();
+
+  rocksdb::port::Thread write_thread([&]() {
+    txn_mutex_.Lock();
+    ASSERT_OK(txn->Prepare());
+    txn_mutex_.Unlock();
+  });
+
+  rocksdb::port::Thread read_thread([&]() {
+    TEST_SYNC_POINT("AddPreparedBeforeMax::read_thread:start");
+    // Publish the seq number with a commit
+    ASSERT_OK(txn1->Commit());
+    // Since the commit cache size is one, the 2nd commit evicts the 1st one
+    // and invokes AdvanceMaxEvictedSeq
+    ASSERT_OK(txn2->Commit());
+
+    ReadOptions roptions;
+    PinnableSlice value;
+    // The snapshot should not see the uncommitted value from write_thread
+    auto snap = db->GetSnapshot();
+    ASSERT_LT(wp_db->max_evicted_seq_, snap->GetSequenceNumber());
+    // This is the scenario that we test for
+    txn_mutex_.Lock();
+    ASSERT_GT(wp_db->max_evicted_seq_, txn->GetId());
+    txn_mutex_.Unlock();
+    roptions.snapshot = snap;
+    auto s = db->Get(roptions, db->DefaultColumnFamily(), "key0", &value);
+    ASSERT_TRUE(s.IsNotFound());
+    db->ReleaseSnapshot(snap);
+  });
+
+  read_thread.join();
+  write_thread.join();
+  delete txn1;
+  delete txn2;
+  ASSERT_OK(txn->Commit());
+  delete txn;
+  rocksdb::SyncPoint::GetInstance()->DisableProcessing();
+
rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
+}
+
+// When an old prepared entry gets committed, there is a gap between the time
+// that it is published and when it is cleaned up from delayed_prepared_. This
+// test stresses such cases.
+TEST_P(WritePreparedTransactionTest, CommitOfDelayedPrepared) {
+  const size_t snapshot_cache_bits = 7;  // same as default
+  for (const size_t commit_cache_bits : {0, 2, 3}) {
+    for (const size_t sub_batch_cnt : {1, 2, 3}) {
+      UpdateTransactionDBOptions(snapshot_cache_bits, commit_cache_bits);
+      ReOpen();
+      std::atomic<const Snapshot*> snap = {nullptr};
+      std::atomic<SequenceNumber> exp_prepare = {0};
+      // Value is synchronized via snap
+      PinnableSlice value;
+      // Take a snapshot after publish and before RemovePrepared:Start
+      auto callback = [&](void* param) {
+        SequenceNumber prep_seq = *((SequenceNumber*)param);
+        if (prep_seq == exp_prepare.load()) {  // only for write_thread
+          ASSERT_EQ(nullptr, snap.load());
+          snap.store(db->GetSnapshot());
+          ReadOptions roptions;
+          roptions.snapshot = snap.load();
+          auto s = db->Get(roptions, db->DefaultColumnFamily(), "key", &value);
+          ASSERT_OK(s);
+        }
+      };
+      SyncPoint::GetInstance()->SetCallBack("RemovePrepared:Start", callback);
+      SyncPoint::GetInstance()->EnableProcessing();
+      // Thread to cause frequent evictions
+      rocksdb::port::Thread eviction_thread([&]() {
+        // Too many txns might cause commit_seq - prepare_seq in another thread
+        // to go beyond DELTA_UPPERBOUND
+        for (int i = 0; i < 25 * (1 << commit_cache_bits); i++) {
+          db->Put(WriteOptions(), Slice("key1"), Slice("value1"));
+        }
+      });
+      rocksdb::port::Thread write_thread([&]() {
+        for (int i = 0; i < 25 * (1 << commit_cache_bits); i++) {
+          Transaction* txn =
+              db->BeginTransaction(WriteOptions(), TransactionOptions());
+          ASSERT_OK(txn->SetName("xid"));
+          std::string val_str = "value" + ToString(i);
+          for (size_t b = 0; b < sub_batch_cnt; b++) {
+            ASSERT_OK(txn->Put(Slice("key"), val_str));
+          }
+          ASSERT_OK(txn->Prepare());
+          // Let an eviction kick in
+          std::this_thread::yield();
+
+          exp_prepare.store(txn->GetId());
+          ASSERT_OK(txn->Commit());
+          delete txn;
+
+          // Read with the snapshot taken before the delayed_prepared_ cleanup
+          ReadOptions roptions;
+          roptions.snapshot = snap.load();
+          ASSERT_NE(nullptr, roptions.snapshot);
+          PinnableSlice value2;
+          auto s = db->Get(roptions, db->DefaultColumnFamily(), "key", &value2);
+          ASSERT_OK(s);
+          // It should see its own write
+          ASSERT_TRUE(val_str == value2);
+          // The value read by the snapshot should not change
+          ASSERT_STREQ(value2.ToString().c_str(), value.ToString().c_str());
+
+          db->ReleaseSnapshot(roptions.snapshot);
+          snap.store(nullptr);
+        }
+      });
+      write_thread.join();
+      eviction_thread.join();
+    }
+    rocksdb::SyncPoint::GetInstance()->DisableProcessing();
+    rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
+  }
+}
+
 // Test that updating the commit map will not affect the existing snapshots
 TEST_P(WritePreparedTransactionTest, AtomicCommit) {
   for (bool skip_prepare : {true, false}) {
diff --git a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.cc b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.cc
index cb20d1439..4100925c5 100644
--- a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.cc
+++ b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.cc
@@ -31,8 +31,13 @@ struct WriteOptions;
 WritePreparedTxn::WritePreparedTxn(WritePreparedTxnDB* txn_db,
                                    const WriteOptions& write_options,
                                    const TransactionOptions& txn_options)
-    : PessimisticTransaction(txn_db, write_options, txn_options),
-      wpt_db_(txn_db) {}
+    : PessimisticTransaction(txn_db, write_options, txn_options, false),
+      wpt_db_(txn_db) {
+  // Call Initialize outside the PessimisticTransaction constructor; otherwise
+  // it would skip the functions overridden in WritePreparedTxn since they are
+  // not yet defined in the constructor of PessimisticTransaction
+  Initialize(txn_options);
+}
 
 void WritePreparedTxn::Initialize(const TransactionOptions& txn_options) {
   PessimisticTransaction::Initialize(txn_options);
@@ -45,7 +50,8 @@ Status WritePreparedTxn::Get(const ReadOptions& read_options,
   auto snapshot = read_options.snapshot;
   auto snap_seq =
       snapshot != nullptr ? snapshot->GetSequenceNumber() : kMaxSequenceNumber;
-  SequenceNumber min_uncommitted = 0;  // by default disable the optimization
+  SequenceNumber min_uncommitted =
+      kMinUnCommittedSeq;  // by default disable the optimization
   if (snapshot != nullptr) {
     min_uncommitted =
        static_cast_with_check<const SnapshotImpl, const Snapshot>(snapshot)
@@ -78,19 +84,17 @@ Status WritePreparedTxn::PrepareInternal() {
   WriteOptions write_options = write_options_;
   write_options.disableWAL = false;
   const bool WRITE_AFTER_COMMIT = true;
+  const bool kFirstPrepareBatch = true;
   WriteBatchInternal::MarkEndPrepare(GetWriteBatch()->GetWriteBatch(), name_,
                                      !WRITE_AFTER_COMMIT);
   // For each duplicate key we account for a new sub-batch
   prepare_batch_cnt_ = GetWriteBatch()->SubBatchCnt();
-  // AddPrepared better to be called in the pre-release callback otherwise there
-  // is a non-zero chance of max advancing prepare_seq and readers assume the
-  // data as committed.
-  // Also having it in the PreReleaseCallback allows in-order addition of
-  // prepared entries to PrepareHeap and hence enables an optimization. Refer to
+  // Having AddPrepared in the PreReleaseCallback allows in-order addition of
+  // prepared entries to PreparedHeap and hence enables an optimization. Refer to
   // SmallestUnCommittedSeq for more details.
   AddPreparedCallback add_prepared_callback(
-      wpt_db_, prepare_batch_cnt_,
-      db_impl_->immutable_db_options().two_write_queues);
+      wpt_db_, db_impl_, prepare_batch_cnt_,
+      db_impl_->immutable_db_options().two_write_queues, kFirstPrepareBatch);
   const bool DISABLE_MEMTABLE = true;
   uint64_t seq_used = kMaxSequenceNumber;
   Status s = db_impl_->WriteImpl(
@@ -146,14 +150,19 @@ Status WritePreparedTxn::CommitInternal() {
   const bool disable_memtable = !includes_data;
   const bool do_one_write =
       !db_impl_->immutable_db_options().two_write_queues || disable_memtable;
-  const bool publish_seq = do_one_write;
-  // Note: CommitTimeWriteBatch does not need AddPrepared since it is written to
-  // DB in one shot. min_uncommitted still works since it requires capturing
-  // data that is written to DB but not yet committed, while
-  // CommitTimeWriteBatch commits with PreReleaseCallback.
WritePreparedCommitEntryPreReleaseCallback update_commit_map(
-      wpt_db_, db_impl_, prepare_seq, prepare_batch_cnt_, commit_batch_cnt,
-      publish_seq);
+      wpt_db_, db_impl_, prepare_seq, prepare_batch_cnt_, commit_batch_cnt);
+  // This is to call AddPrepared on CommitTimeWriteBatch
+  const bool kFirstPrepareBatch = true;
+  AddPreparedCallback add_prepared_callback(
+      wpt_db_, db_impl_, commit_batch_cnt,
+      db_impl_->immutable_db_options().two_write_queues, !kFirstPrepareBatch);
+  PreReleaseCallback* pre_release_callback;
+  if (do_one_write) {
+    pre_release_callback = &update_commit_map;
+  } else {
+    pre_release_callback = &add_prepared_callback;
+  }
   uint64_t seq_used = kMaxSequenceNumber;
   // Since the prepared batch is directly written to memtable, there is already
   // a connection between the memtable and its WAL, so there is no need to
@@ -162,37 +171,29 @@ Status WritePreparedTxn::CommitInternal() {
   size_t batch_cnt = UNLIKELY(commit_batch_cnt) ? commit_batch_cnt : 1;
   auto s = db_impl_->WriteImpl(write_options_, working_batch, nullptr, nullptr,
                                zero_log_number, disable_memtable, &seq_used,
-                               batch_cnt, &update_commit_map);
+                               batch_cnt, pre_release_callback);
   assert(!s.ok() || seq_used != kMaxSequenceNumber);
+  const SequenceNumber commit_batch_seq = seq_used;
   if (LIKELY(do_one_write || !s.ok())) {
     if (LIKELY(s.ok())) {
       // Note RemovePrepared should be called after the WriteImpl that
      // published the seq. Otherwise the SmallestUnCommittedSeq optimization
      // breaks.
       wpt_db_->RemovePrepared(prepare_seq, prepare_batch_cnt_);
     }
+    if (UNLIKELY(!do_one_write)) {
+      wpt_db_->RemovePrepared(commit_batch_seq, commit_batch_cnt);
+    }
     return s;
   }  // else do the 2nd write to publish seq
   // Note: the 2nd write comes with a performance penalty. So if we have too
   // many commits accompanied by CommitTimeWriteBatch and yet we cannot enable
   // the use_only_the_last_commit_time_batch_for_recovery_ optimization,
   // two_write_queues should be disabled to avoid many additional writes here.
-  class PublishSeqPreReleaseCallback : public PreReleaseCallback {
-   public:
-    explicit PublishSeqPreReleaseCallback(DBImpl* db_impl)
-        : db_impl_(db_impl) {}
-    virtual Status Callback(SequenceNumber seq, bool is_mem_disabled) override {
-#ifdef NDEBUG
-      (void)is_mem_disabled;
-#endif
-      assert(is_mem_disabled);
-      assert(db_impl_->immutable_db_options().two_write_queues);
-      db_impl_->SetLastPublishedSequence(seq);
-      return Status::OK();
-    }
-
-   private:
-    DBImpl* db_impl_;
-  } publish_seq_callback(db_impl_);
+  const size_t kZeroData = 0;
+  // Update the commit map only from the 2nd queue
+  WritePreparedCommitEntryPreReleaseCallback update_commit_map_with_aux_batch(
+      wpt_db_, db_impl_, prepare_seq, prepare_batch_cnt_, kZeroData,
+      commit_batch_seq, commit_batch_cnt);
   WriteBatch empty_batch;
   empty_batch.PutLogData(Slice());
   // In the absence of Prepare markers, use Noop as a batch separator
@@ -202,11 +203,12 @@ Status WritePreparedTxn::CommitInternal() {
   const uint64_t NO_REF_LOG = 0;
   s = db_impl_->WriteImpl(write_options_, &empty_batch, nullptr, nullptr,
                           NO_REF_LOG, DISABLE_MEMTABLE, &seq_used, ONE_BATCH,
-                          &publish_seq_callback);
+                          &update_commit_map_with_aux_batch);
   assert(!s.ok() || seq_used != kMaxSequenceNumber);
   // Note RemovePrepared should be called after the WriteImpl that published
   // the seq. Otherwise the SmallestUnCommittedSeq optimization breaks.
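  // (Sketch of the hazard, with assumed numbering: if RemovePrepared ran for
  // prepare_seq 10 before seq 10 were published, SmallestUnCommittedSeq could
  // move past 10 and a concurrent reader could wrongly treat seq 10 as
  // committed within its snapshot.)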
wpt_db_->RemovePrepared(prepare_seq, prepare_batch_cnt_); + wpt_db_->RemovePrepared(commit_batch_seq, commit_batch_cnt); return s; } @@ -218,8 +220,7 @@ Status WritePreparedTxn::RollbackInternal() { assert(GetId() > 0); auto cf_map_shared_ptr = wpt_db_->GetCFHandleMap(); auto cf_comp_map_shared_ptr = wpt_db_->GetCFComparatorMap(); - // In WritePrepared, the txn is is the same as prepare seq - auto last_visible_txn = GetId() - 1; + auto read_at_seq = kMaxSequenceNumber; struct RollbackWriteBatchBuilder : public WriteBatch::Handler { DBImpl* db_; ReadOptions roptions; @@ -237,8 +238,7 @@ Status WritePreparedTxn::RollbackInternal() { std::map& handles, bool rollback_merge_operands) : db_(db), - callback(wpt_db, snap_seq, - 0), // 0 disables min_uncommitted optimization + callback(wpt_db, snap_seq), // disable min_uncommitted optimization rollback_batch_(dst_batch), comparators_(comparators), handles_(handles), @@ -307,8 +307,8 @@ Status WritePreparedTxn::RollbackInternal() { } protected: - virtual bool WriteAfterCommit() const override { return false; } - } rollback_handler(db_impl_, wpt_db_, last_visible_txn, &rollback_batch, + bool WriteAfterCommit() const override { return false; } + } rollback_handler(db_impl_, wpt_db_, read_at_seq, &rollback_batch, *cf_comp_map_shared_ptr.get(), *cf_map_shared_ptr.get(), wpt_db_->txn_db_options_.rollback_merge_operands); auto s = GetWriteBatch()->GetWriteBatch()->Iterate(&rollback_handler); @@ -323,22 +323,32 @@ Status WritePreparedTxn::RollbackInternal() { const uint64_t NO_REF_LOG = 0; uint64_t seq_used = kMaxSequenceNumber; const size_t ONE_BATCH = 1; - // We commit the rolled back prepared batches. ALthough this is + const bool kFirstPrepareBatch = true; + // We commit the rolled back prepared batches. Although this is // counter-intuitive, i) it is safe to do so, since the prepared batches are // already canceled out by the rollback batch, ii) adding the commit entry to // CommitCache will allow us to benefit from the existing mechanism in // CommitCache that keeps an entry evicted due to max advance and yet overlaps // with a live snapshot around so that the live snapshot properly skips the // entry even if its prepare seq is lower than max_evicted_seq_. + AddPreparedCallback add_prepared_callback( + wpt_db_, db_impl_, ONE_BATCH, + db_impl_->immutable_db_options().two_write_queues, !kFirstPrepareBatch); WritePreparedCommitEntryPreReleaseCallback update_commit_map( wpt_db_, db_impl_, GetId(), prepare_batch_cnt_, ONE_BATCH); + PreReleaseCallback* pre_release_callback; + if (do_one_write) { + pre_release_callback = &update_commit_map; + } else { + pre_release_callback = &add_prepared_callback; + } // Note: the rollback batch does not need AddPrepared since it is written to // DB in one shot. min_uncommitted still works since it requires capturing // data that is written to DB but not yet committed, while - // the roolback batch commits with PreReleaseCallback. + // the rollback batch commits with PreReleaseCallback. s = db_impl_->WriteImpl(write_options_, &rollback_batch, nullptr, nullptr, NO_REF_LOG, !DISABLE_MEMTABLE, &seq_used, ONE_BATCH, - do_one_write ? 
&update_commit_map : nullptr); + pre_release_callback); assert(!s.ok() || seq_used != kMaxSequenceNumber); if (!s.ok()) { return s; @@ -347,15 +357,14 @@ Status WritePreparedTxn::RollbackInternal() { wpt_db_->RemovePrepared(GetId(), prepare_batch_cnt_); return s; } // else do the 2nd write for commit - uint64_t& prepare_seq = seq_used; + uint64_t rollback_seq = seq_used; ROCKS_LOG_DETAILS(db_impl_->immutable_db_options().info_log, - "RollbackInternal 2nd write prepare_seq: %" PRIu64, - prepare_seq); + "RollbackInternal 2nd write rollback_seq: %" PRIu64, + rollback_seq); // Commit the batch by writing an empty batch to the queue that will release // the commit sequence number to readers. - const size_t ZERO_COMMITS = 0; - WritePreparedCommitEntryPreReleaseCallback update_commit_map_with_prepare( - wpt_db_, db_impl_, prepare_seq, ONE_BATCH, ZERO_COMMITS); + WritePreparedRollbackPreReleaseCallback update_commit_map_with_prepare( + wpt_db_, db_impl_, GetId(), rollback_seq, prepare_batch_cnt_); WriteBatch empty_batch; empty_batch.PutLogData(Slice()); // In the absence of Prepare markers, use Noop as a batch separator @@ -364,20 +373,13 @@ Status WritePreparedTxn::RollbackInternal() { NO_REF_LOG, DISABLE_MEMTABLE, &seq_used, ONE_BATCH, &update_commit_map_with_prepare); assert(!s.ok() || seq_used != kMaxSequenceNumber); - // Mark the txn as rolled back - uint64_t& rollback_seq = seq_used; + ROCKS_LOG_DETAILS(db_impl_->immutable_db_options().info_log, + "RollbackInternal (status=%s) commit: %" PRIu64, + s.ToString().c_str(), GetId()); if (s.ok()) { - // Note: it is safe to do it after PreReleaseCallback via WriteImpl since - // all the writes by the prpared batch are already blinded by the rollback - // batch. The only reason we commit the prepared batch here is to benefit - // from the existing mechanism in CommitCache that takes care of the rare - // cases that the prepare seq is visible to a snsapshot but max evicted seq - // advances that prepare seq. - for (size_t i = 0; i < prepare_batch_cnt_; i++) { - wpt_db_->AddCommitted(GetId() + i, rollback_seq); - } wpt_db_->RemovePrepared(GetId(), prepare_batch_cnt_); } + wpt_db_->RemovePrepared(rollback_seq, ONE_BATCH); return s; } @@ -410,24 +412,12 @@ Status WritePreparedTxn::ValidateSnapshot(ColumnFamilyHandle* column_family, WritePreparedTxnReadCallback snap_checker(wpt_db_, snap_seq, min_uncommitted); return TransactionUtil::CheckKeyForConflicts(db_impl_, cfh, key.ToString(), snap_seq, false /* cache_only */, - &snap_checker); + &snap_checker, min_uncommitted); } void WritePreparedTxn::SetSnapshot() { - // Note: for this optimization setting the last sequence number and obtaining - // the smallest uncommitted seq should be done atomically. However to avoid - // the mutex overhead, we call SmallestUnCommittedSeq BEFORE taking the - // snapshot. Since we always updated the list of unprepared seq (via - // AddPrepared) AFTER the last sequence is updated, this guarantees that the - // smallest uncommited seq that we pair with the snapshot is smaller or equal - // the value that would be obtained otherwise atomically. That is ok since - // this optimization works as long as min_uncommitted is less than or equal - // than the smallest uncommitted seq when the snapshot was taken. 
- auto min_uncommitted = wpt_db_->SmallestUnCommittedSeq(); - const bool FOR_WW_CONFLICT_CHECK = true; - SnapshotImpl* snapshot = dbimpl_->GetSnapshotImpl(FOR_WW_CONFLICT_CHECK); - assert(snapshot); - wpt_db_->EnhanceSnapshot(snapshot, min_uncommitted); + const bool kForWWConflictCheck = true; + SnapshotImpl* snapshot = wpt_db_->GetSnapshotInternal(kForWWConflictCheck); SetSnapshotInternal(snapshot); } diff --git a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.h b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.h index 46c114c74..2cd729cd2 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.h +++ b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn.h @@ -53,9 +53,11 @@ class WritePreparedTxn : public PessimisticTransaction { ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* value) override; - // To make WAL commit markers visible, the snapshot will be based on the last - // seq in the WAL that is also published, LastPublishedSequence, as opposed to - // the last seq in the memtable. + // Note: The behavior is undefined in presence of interleaved writes to the + // same transaction. + // To make WAL commit markers visible, the snapshot will be + // based on the last seq in the WAL that is also published, + // LastPublishedSequence, as opposed to the last seq in the memtable. using Transaction::GetIterator; virtual Iterator* GetIterator(const ReadOptions& options) override; virtual Iterator* GetIterator(const ReadOptions& options, diff --git a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.cc b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.cc index 2d8e4fcee..5364a9e05 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.cc +++ b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.cc @@ -49,6 +49,14 @@ Status WritePreparedTxnDB::Initialize( SequenceNumber prev_max = max_evicted_seq_; SequenceNumber last_seq = db_impl_->GetLatestSequenceNumber(); AdvanceMaxEvictedSeq(prev_max, last_seq); + // Create a gap between max and the next snapshot. This simplifies the logic + // in IsInSnapshot by not having to consider the special case of max == + // snapshot after recovery. This is tested in IsInSnapshotEmptyMapTest. + if (last_seq) { + db_impl_->versions_->SetLastAllocatedSequence(last_seq + 1); + db_impl_->versions_->SetLastSequence(last_seq + 1); + db_impl_->versions_->SetLastPublishedSequence(last_seq + 1); + } db_impl_->SetSnapshotChecker(new WritePreparedSnapshotChecker(this)); // A callback to commit a single sub-batch @@ -56,11 +64,9 @@ Status WritePreparedTxnDB::Initialize( public: explicit CommitSubBatchPreReleaseCallback(WritePreparedTxnDB* db) : db_(db) {} - virtual Status Callback(SequenceNumber commit_seq, - bool is_mem_disabled) override { -#ifdef NDEBUG - (void)is_mem_disabled; -#endif + Status Callback(SequenceNumber commit_seq, + bool is_mem_disabled __attribute__((__unused__)), + uint64_t) override { assert(!is_mem_disabled); db_->AddCommitted(commit_seq, commit_seq); return Status::OK(); @@ -156,11 +162,14 @@ Status WritePreparedTxnDB::WriteInternal(const WriteOptions& write_options_orig, const uint64_t no_log_ref = 0; uint64_t seq_used = kMaxSequenceNumber; const size_t ZERO_PREPARES = 0; + const bool kSeperatePrepareCommitBatches = true; // Since this is not 2pc, there is no need for AddPrepared but having it in // the PreReleaseCallback enables an optimization. Refer to // SmallestUnCommittedSeq for more details. 
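  // (Hedged reading of the optimization: keeping this batch's seq in the
  // prepared heap until it is published keeps SmallestUnCommittedSeq a valid
  // lower bound, so IsInSnapshot can declare any seq below it committed
  // without consulting the commit cache.)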
AddPreparedCallback add_prepared_callback(
-      this, batch_cnt, db_impl_->immutable_db_options().two_write_queues);
+      this, db_impl_, batch_cnt,
+      db_impl_->immutable_db_options().two_write_queues,
+      !kSeperatePrepareCommitBatches);
   WritePreparedCommitEntryPreReleaseCallback update_commit_map(
       this, db_impl_, kMaxSequenceNumber, ZERO_PREPARES, batch_cnt);
   PreReleaseCallback* pre_release_callback;
@@ -379,22 +388,55 @@ void WritePreparedTxnDB::Init(const TransactionDBOptions& /* unused */) {
   // around.
   INC_STEP_FOR_MAX_EVICTED =
       std::max(COMMIT_CACHE_SIZE / 100, static_cast<size_t>(1));
-  snapshot_cache_ = unique_ptr<std::atomic<SequenceNumber>[]>(
+  snapshot_cache_ = std::unique_ptr<std::atomic<SequenceNumber>[]>(
       new std::atomic<SequenceNumber>[SNAPSHOT_CACHE_SIZE] {});
-  commit_cache_ = unique_ptr<std::atomic<CommitEntry64b>[]>(
+  commit_cache_ = std::unique_ptr<std::atomic<CommitEntry64b>[]>(
       new std::atomic<CommitEntry64b>[COMMIT_CACHE_SIZE] {});
 }
 
-void WritePreparedTxnDB::AddPrepared(uint64_t seq) {
-  ROCKS_LOG_DETAILS(info_log_, "Txn %" PRIu64 " Prepareing", seq);
-  assert(seq > max_evicted_seq_);
-  if (seq <= max_evicted_seq_) {
-    throw std::runtime_error(
-        "Added prepare_seq is larger than max_evicted_seq_: " + ToString(seq) +
-        " <= " + ToString(max_evicted_seq_.load()));
+void WritePreparedTxnDB::CheckPreparedAgainstMax(SequenceNumber new_max) {
+  prepared_mutex_.AssertHeld();
+  // When max_evicted_seq_ advances, move older entries from prepared_txns_
+  // to delayed_prepared_. This guarantees that if a seq is lower than max,
+  // then it is not in prepared_txns_, saving an expensive, synchronized
+  // lookup from a shared set. delayed_prepared_ is expected to be empty in
+  // normal cases.
+  ROCKS_LOG_DETAILS(
+      info_log_,
+      "CheckPreparedAgainstMax prepared_txns_.empty() %d top: %" PRIu64,
+      prepared_txns_.empty(),
+      prepared_txns_.empty() ? 0 : prepared_txns_.top());
+  while (!prepared_txns_.empty() && prepared_txns_.top() <= new_max) {
+    auto to_be_popped = prepared_txns_.top();
+    delayed_prepared_.insert(to_be_popped);
+    ROCKS_LOG_WARN(info_log_,
+                   "prepared_mutex_ overhead %" PRIu64 " (prep=%" PRIu64
+                   " new_max=%" PRIu64,
+                   static_cast<uint64_t>(delayed_prepared_.size()),
+                   to_be_popped, new_max);
+    prepared_txns_.pop();
+    delayed_prepared_empty_.store(false, std::memory_order_release);
   }
+}
+
+void WritePreparedTxnDB::AddPrepared(uint64_t seq) {
+  ROCKS_LOG_DETAILS(info_log_, "Txn %" PRIu64 " Preparing with max %" PRIu64,
+                    seq, max_evicted_seq_.load());
+  TEST_SYNC_POINT("AddPrepared::begin:pause");
+  TEST_SYNC_POINT("AddPrepared::begin:resume");
   WriteLock wl(&prepared_mutex_);
   prepared_txns_.push(seq);
+  auto new_max = future_max_evicted_seq_.load();
+  if (UNLIKELY(seq <= new_max)) {
+    // This should not happen in the normal case
+    ROCKS_LOG_ERROR(
+        info_log_,
+        "Added prepare_seq is not larger than max_evicted_seq_: %" PRIu64
+        " <= %" PRIu64,
+        seq, new_max);
+    CheckPreparedAgainstMax(new_max);
+  }
+  TEST_SYNC_POINT("AddPrepared::end");
 }
 
 void WritePreparedTxnDB::AddCommitted(uint64_t prepare_seq, uint64_t commit_seq,
@@ -414,13 +456,42 @@ void WritePreparedTxnDB::AddCommitted(uint64_t prepare_seq, uint64_t commit_seq,
                     "Evicting %" PRIu64 ",%" PRIu64 " with max %" PRIu64,
                     evicted.prep_seq, evicted.commit_seq, prev_max);
   if (prev_max < evicted.commit_seq) {
-    // Inc max in larger steps to avoid frequent updates
-    auto max_evicted_seq = evicted.commit_seq + INC_STEP_FOR_MAX_EVICTED;
+    auto last = db_impl_->GetLastPublishedSequence();  // could be 0
+    SequenceNumber max_evicted_seq;
+    if (LIKELY(evicted.commit_seq < last)) {
+      assert(last > 0);
+      // Inc max in larger steps to avoid frequent updates
+      max_evicted_seq =
std::min(evicted.commit_seq + INC_STEP_FOR_MAX_EVICTED, last - 1);
+    } else {
+      // legit when a commit entry in a write batch overwrites the previous one
+      max_evicted_seq = evicted.commit_seq;
+    }
+    ROCKS_LOG_DETAILS(info_log_,
+                      "%lu Evicting %" PRIu64 ",%" PRIu64 " with max %" PRIu64
+                      " => %lu",
+                      prepare_seq, evicted.prep_seq, evicted.commit_seq,
+                      prev_max, max_evicted_seq);
     AdvanceMaxEvictedSeq(prev_max, max_evicted_seq);
   }
   // After each eviction from commit cache, check if the commit entry should
   // be kept around because it overlaps with a live snapshot.
   CheckAgainstSnapshots(evicted);
+  if (UNLIKELY(!delayed_prepared_empty_.load(std::memory_order_acquire))) {
+    WriteLock wl(&prepared_mutex_);
+    for (auto dp : delayed_prepared_) {
+      if (dp == evicted.prep_seq) {
+        // This is a rare case where the txn is committed but prepared_txns_
+        // is not cleaned up yet. Refer to the delayed_prepared_commits_
+        // definition for why it should be kept updated.
+        delayed_prepared_commits_[evicted.prep_seq] = evicted.commit_seq;
+        ROCKS_LOG_DEBUG(info_log_,
+                        "delayed_prepared_commits_[%" PRIu64 "]=%" PRIu64,
+                        evicted.prep_seq, evicted.commit_seq);
+        break;
+      }
+    }
+  }
   }
   bool succ =
       ExchangeCommitEntry(indexed_seq, evicted_64b, {prepare_seq, commit_seq});
@@ -443,12 +514,26 @@ void WritePreparedTxnDB::AddCommitted(uint64_t prepare_seq, uint64_t commit_seq,
 
 void WritePreparedTxnDB::RemovePrepared(const uint64_t prepare_seq,
                                         const size_t batch_cnt) {
+  TEST_SYNC_POINT_CALLBACK(
+      "RemovePrepared:Start",
+      const_cast<void*>(reinterpret_cast<const void*>(&prepare_seq)));
+  TEST_SYNC_POINT("WritePreparedTxnDB::RemovePrepared:pause");
+  TEST_SYNC_POINT("WritePreparedTxnDB::RemovePrepared:resume");
+  ROCKS_LOG_DETAILS(info_log_,
+                    "RemovePrepared %" PRIu64 " cnt: %" ROCKSDB_PRIszt,
+                    prepare_seq, batch_cnt);
   WriteLock wl(&prepared_mutex_);
   for (size_t i = 0; i < batch_cnt; i++) {
     prepared_txns_.erase(prepare_seq + i);
     bool was_empty = delayed_prepared_.empty();
     if (!was_empty) {
       delayed_prepared_.erase(prepare_seq + i);
+      auto it = delayed_prepared_commits_.find(prepare_seq + i);
+      if (it != delayed_prepared_commits_.end()) {
+        ROCKS_LOG_DETAILS(info_log_, "delayed_prepared_commits_.erase %" PRIu64,
+                          prepare_seq + i);
+        delayed_prepared_commits_.erase(it);
+      }
       bool is_empty = delayed_prepared_.empty();
       if (was_empty != is_empty) {
         delayed_prepared_empty_.store(is_empty, std::memory_order_release);
@@ -491,24 +576,20 @@ void WritePreparedTxnDB::AdvanceMaxEvictedSeq(const SequenceNumber& prev_max,
   ROCKS_LOG_DETAILS(info_log_,
                     "AdvanceMaxEvictedSeq overhead %" PRIu64 " => %" PRIu64,
                     prev_max, new_max);
-  // When max_evicted_seq_ advances, move older entries from prepared_txns_
-  // to delayed_prepared_. This guarantees that if a seq is lower than max,
-  // then it is not in prepared_txns_ ans save an expensive, synchronized
-  // lookup from a shared set. delayed_prepared_ is expected to be empty in
-  // normal cases.
+  // Declare the intention before getting a snapshot from the DB. This helps a
+  // concurrent GetSnapshot to wait to catch up with future_max_evicted_seq_ if
+  // it has not already. Otherwise a new snapshot could be taken while we ask
+  // the DB for snapshots smaller than the future max.
+  auto updated_future_max = prev_max;
+  while (updated_future_max < new_max &&
+         !future_max_evicted_seq_.compare_exchange_weak(
+             updated_future_max, new_max, std::memory_order_acq_rel,
+             std::memory_order_relaxed)) {
+  };
+
   {
     WriteLock wl(&prepared_mutex_);
-    while (!prepared_txns_.empty() && prepared_txns_.top() <= new_max) {
-      auto to_be_popped = prepared_txns_.top();
-      delayed_prepared_.insert(to_be_popped);
-      ROCKS_LOG_WARN(info_log_,
-                     "prepared_mutex_ overhead %" PRIu64 " (prep=%" PRIu64
-                     " new_max=%" PRIu64 " oldmax=%" PRIu64,
-                     static_cast<uint64_t>(delayed_prepared_.size()),
-                     to_be_popped, new_max, prev_max);
-      prepared_txns_.pop();
-      delayed_prepared_empty_.store(false, std::memory_order_release);
-    }
+    CheckPreparedAgainstMax(new_max);
   }
 
   // With each change to max_evicted_seq_ fetch the live snapshots behind it.
@@ -527,8 +608,19 @@ void WritePreparedTxnDB::AdvanceMaxEvictedSeq(const SequenceNumber& prev_max,
   }
   if (update_snapshots) {
     UpdateSnapshots(snapshots, new_snapshots_version);
+    if (!snapshots.empty()) {
+      WriteLock wl(&old_commit_map_mutex_);
+      for (auto snap : snapshots) {
+        // This allows IsInSnapshot to tell apart reads from invalid snapshots
+        // from reads of committed values in valid snapshots.
+        old_commit_map_[snap];
+      }
+      old_commit_map_empty_.store(false, std::memory_order_release);
+    }
   }
   auto updated_prev_max = prev_max;
+  TEST_SYNC_POINT("AdvanceMaxEvictedSeq::update_max:pause");
+  TEST_SYNC_POINT("AdvanceMaxEvictedSeq::update_max:resume");
   while (updated_prev_max < new_max &&
          !max_evicted_seq_.compare_exchange_weak(updated_prev_max, new_max,
                                                  std::memory_order_acq_rel,
@@ -537,34 +629,102 @@ void WritePreparedTxnDB::AdvanceMaxEvictedSeq(const SequenceNumber& prev_max,
 }
 
 const Snapshot* WritePreparedTxnDB::GetSnapshot() {
-  // Note: SmallestUnCommittedSeq must be called before GetSnapshotImpl. Refer
-  // to WritePreparedTxn::SetSnapshot for more explanation.
+  const bool kForWWConflictCheck = true;
+  return GetSnapshotInternal(!kForWWConflictCheck);
+}
+
+SnapshotImpl* WritePreparedTxnDB::GetSnapshotInternal(
+    bool for_ww_conflict_check) {
+  // Note: for this optimization setting the last sequence number and obtaining
+  // the smallest uncommitted seq should be done atomically. However to avoid
+  // the mutex overhead, we call SmallestUnCommittedSeq BEFORE taking the
+  // snapshot. Since we always update the list of unprepared seqs (via
+  // AddPrepared) AFTER the last sequence is updated, this guarantees that the
+  // smallest uncommitted seq that we pair with the snapshot is smaller than or
+  // equal to the value that would be obtained otherwise atomically. That is ok
+  // since this optimization works as long as min_uncommitted is less than or
+  // equal to the smallest uncommitted seq when the snapshot was taken.
   auto min_uncommitted = WritePreparedTxnDB::SmallestUnCommittedSeq();
-  const bool FOR_WW_CONFLICT_CHECK = true;
-  SnapshotImpl* snap_impl = db_impl_->GetSnapshotImpl(!FOR_WW_CONFLICT_CHECK);
+  SnapshotImpl* snap_impl = db_impl_->GetSnapshotImpl(for_ww_conflict_check);
   assert(snap_impl);
+  SequenceNumber snap_seq = snap_impl->GetSequenceNumber();
+  // Note: Check against future_max_evicted_seq_ (in contrast with
+  // max_evicted_seq_) in case there is a concurrent AdvanceMaxEvictedSeq.
+  if (UNLIKELY(snap_seq != 0 && snap_seq <= future_max_evicted_seq_)) {
+    // There is a very rare case in which the commit entry evicts another
+    // commit entry that is not published yet, thus advancing max evicted seq
+    // beyond the last published seq.
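+    // (A hedged concrete example: with commit_cache_bits = 0 the cache holds
+    // a single entry, so two commits in one write group can evict each other
+    // before either seq is published, pushing max beyond the last published
+    // seq.)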
This case is not likely in a real-world setup so we
+    // handle it with a few retries.
+    size_t retry = 0;
+    SequenceNumber max;
+    while ((max = future_max_evicted_seq_.load()) != 0 &&
+           snap_impl->GetSequenceNumber() <= max && retry < 100) {
+      ROCKS_LOG_WARN(info_log_,
+                     "GetSnapshot snap: %" PRIu64 " max: %" PRIu64
+                     " retry %" ROCKSDB_PRIszt,
+                     snap_impl->GetSequenceNumber(), max, retry);
+      ReleaseSnapshot(snap_impl);
+      // Wait for the last visible seq to catch up with max, and also go
+      // beyond it by one.
+      AdvanceSeqByOne();
+      snap_impl = db_impl_->GetSnapshotImpl(for_ww_conflict_check);
+      assert(snap_impl);
+      retry++;
+    }
+    assert(snap_impl->GetSequenceNumber() > max);
+    if (snap_impl->GetSequenceNumber() <= max) {
+      throw std::runtime_error(
+          "Snapshot seq " + ToString(snap_impl->GetSequenceNumber()) +
+          " after " + ToString(retry) +
+          " retries is still less than future_max_evicted_seq_" + ToString(max));
+    }
+  }
  EnhanceSnapshot(snap_impl, min_uncommitted);
+  ROCKS_LOG_DETAILS(
+      db_impl_->immutable_db_options().info_log,
+      "GetSnapshot %" PRIu64 " ww:%" PRIi32 " min_uncommitted: %" PRIu64,
+      snap_impl->GetSequenceNumber(), for_ww_conflict_check, min_uncommitted);
   return snap_impl;
 }
 
+void WritePreparedTxnDB::AdvanceSeqByOne() {
+  // Inserting an empty value will i) let the max evicted entry be published,
+  // i.e., max == last_published, and ii) increase the last published to be
+  // one beyond max, i.e., max < last_published.
+  WriteOptions woptions;
+  TransactionOptions txn_options;
+  Transaction* txn0 = BeginTransaction(woptions, txn_options, nullptr);
+  std::hash<std::thread::id> hasher;
+  char name[64];
+  snprintf(name, 64, "txn%" ROCKSDB_PRIszt, hasher(std::this_thread::get_id()));
+  assert(strlen(name) < 64 - 1);
+  Status s = txn0->SetName(name);
+  assert(s.ok());
+  if (s.ok()) {
+    // Without prepare it would simply skip the commit
+    s = txn0->Prepare();
+  }
+  assert(s.ok());
+  if (s.ok()) {
+    s = txn0->Commit();
+  }
+  assert(s.ok());
+  delete txn0;
+}
+
 const std::vector<SequenceNumber> WritePreparedTxnDB::GetSnapshotListFromDB(
     SequenceNumber max) {
   ROCKS_LOG_DETAILS(info_log_, "GetSnapshotListFromDB with max %" PRIu64, max);
-  InstrumentedMutex(db_impl_->mutex());
+  InstrumentedMutexLock dblock(db_impl_->mutex());
+  db_impl_->mutex()->AssertHeld();
   return db_impl_->snapshots().GetAll(nullptr, max);
 }
 
-void WritePreparedTxnDB::ReleaseSnapshot(const Snapshot* snapshot) {
-  auto snap_seq = snapshot->GetSequenceNumber();
-  ReleaseSnapshotInternal(snap_seq);
-  db_impl_->ReleaseSnapshot(snapshot);
-}
-
 void WritePreparedTxnDB::ReleaseSnapshotInternal(
     const SequenceNumber snap_seq) {
-  // relax is enough since max increases monotonically, i.e., if snap_seq <
-  // old_max => snap_seq < new_max as well.
+  // TODO(myabandeh): relaxed should be enough since the synchronization is
+  // already done by snapshots_mutex_, under which this function is called.
+  if (snap_seq <= max_evicted_seq_.load(std::memory_order_acquire)) {
    // Then this is a rare case that a transaction did not finish before max
    // advances. It is expected for a few read-only backup snapshots.
For such
    // snapshots we might have kept around a couple of entries in the
@@ -572,14 +732,16 @@ void WritePreparedTxnDB::ReleaseSnapshotInternal(
     bool need_gc = false;
     {
       WPRecordTick(TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD);
-      ROCKS_LOG_WARN(info_log_, "old_commit_map_mutex_ overhead");
+      ROCKS_LOG_WARN(info_log_, "old_commit_map_mutex_ overhead for %" PRIu64,
+                     snap_seq);
       ReadLock rl(&old_commit_map_mutex_);
       auto prep_set_entry = old_commit_map_.find(snap_seq);
       need_gc = prep_set_entry != old_commit_map_.end();
     }
     if (need_gc) {
       WPRecordTick(TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD);
-      ROCKS_LOG_WARN(info_log_, "old_commit_map_mutex_ overhead");
+      ROCKS_LOG_WARN(info_log_, "old_commit_map_mutex_ overhead for %" PRIu64,
+                     snap_seq);
       WriteLock wl(&old_commit_map_mutex_);
       old_commit_map_.erase(snap_seq);
       old_commit_map_empty_.store(old_commit_map_.empty(),
@@ -588,6 +750,33 @@ void WritePreparedTxnDB::ReleaseSnapshotInternal(
   }
 }
 
+void WritePreparedTxnDB::CleanupReleasedSnapshots(
+    const std::vector<SequenceNumber>& new_snapshots,
+    const std::vector<SequenceNumber>& old_snapshots) {
+  auto newi = new_snapshots.begin();
+  auto oldi = old_snapshots.begin();
+  for (; newi != new_snapshots.end() && oldi != old_snapshots.end();) {
+    assert(*newi >= *oldi);  // cannot have new snapshots with lower seq
+    if (*newi == *oldi) {  // still not released
+      auto value = *newi;
+      while (newi != new_snapshots.end() && *newi == value) {
+        newi++;
+      }
+      while (oldi != old_snapshots.end() && *oldi == value) {
+        oldi++;
+      }
+    } else {
+      assert(*newi > *oldi);  // *oldi is released
+      ReleaseSnapshotInternal(*oldi);
+      oldi++;
+    }
+  }
+  // Everything remaining in old_snapshots has been released and must be
+  // cleaned up
+  for (; oldi != old_snapshots.end(); oldi++) {
+    ReleaseSnapshotInternal(*oldi);
+  }
+}
+
 void WritePreparedTxnDB::UpdateSnapshots(
     const std::vector<SequenceNumber>& snapshots,
     const SequenceNumber& version) {
@@ -636,6 +825,12 @@ void WritePreparedTxnDB::UpdateSnapshots(
   // Update the size at the end. Otherwise a parallel reader might read
   // items that are not set yet.
   snapshots_total_.store(snapshots.size(), std::memory_order_release);
+
+  // Note: this must be done after the snapshots data structures are updated
+  // with the new list of snapshots.
+  CleanupReleasedSnapshots(snapshots, snapshots_all_);
+  snapshots_all_ = snapshots;
+
   TEST_SYNC_POINT("WritePreparedTxnDB::UpdateSnapshots:p:end");
   TEST_SYNC_POINT("WritePreparedTxnDB::UpdateSnapshots:s:end");
 }
@@ -654,13 +849,20 @@ void WritePreparedTxnDB::CheckAgainstSnapshots(const CommitEntry& evicted) {
   // place before gets overwritten the reader that reads bottom-up will
   // eventually see it.
   const bool next_is_larger = true;
-  SequenceNumber snapshot_seq = kMaxSequenceNumber;
+  // We will set this to true if the borderline snapshot suggests it.
+ bool search_larger_list = false; size_t ip1 = std::min(cnt, SNAPSHOT_CACHE_SIZE); for (; 0 < ip1; ip1--) { - snapshot_seq = snapshot_cache_[ip1 - 1].load(std::memory_order_acquire); + SequenceNumber snapshot_seq = + snapshot_cache_[ip1 - 1].load(std::memory_order_acquire); TEST_IDX_SYNC_POINT("WritePreparedTxnDB::CheckAgainstSnapshots:p:", ++sync_i); TEST_IDX_SYNC_POINT("WritePreparedTxnDB::CheckAgainstSnapshots:s:", sync_i); + if (ip1 == SNAPSHOT_CACHE_SIZE) { // border line snapshot + // snapshot_seq < commit_seq => larger_snapshot_seq <= commit_seq + // then later also continue the search to larger snapshots + search_larger_list = snapshot_seq < evicted.commit_seq; + } if (!MaybeUpdateOldCommitMap(evicted.prep_seq, evicted.commit_seq, snapshot_seq, !next_is_larger)) { break; @@ -675,17 +877,20 @@ void WritePreparedTxnDB::CheckAgainstSnapshots(const CommitEntry& evicted) { #endif TEST_SYNC_POINT("WritePreparedTxnDB::CheckAgainstSnapshots:p:end"); TEST_SYNC_POINT("WritePreparedTxnDB::CheckAgainstSnapshots:s:end"); - if (UNLIKELY(SNAPSHOT_CACHE_SIZE < cnt && ip1 == SNAPSHOT_CACHE_SIZE && - snapshot_seq < evicted.prep_seq)) { + if (UNLIKELY(SNAPSHOT_CACHE_SIZE < cnt && search_larger_list)) { // Then access the less efficient list of snapshots_ WPRecordTick(TXN_SNAPSHOT_MUTEX_OVERHEAD); - ROCKS_LOG_WARN(info_log_, "snapshots_mutex_ overhead"); + ROCKS_LOG_WARN(info_log_, + "snapshots_mutex_ overhead for <%" PRIu64 ",%" PRIu64 + "> with %" ROCKSDB_PRIszt " snapshots", + evicted.prep_seq, evicted.commit_seq, cnt); ReadLock rl(&snapshots_mutex_); // Items could have moved from the snapshots_ to snapshot_cache_ before // accquiring the lock. To make sure that we do not miss a valid snapshot, // read snapshot_cache_ again while holding the lock. for (size_t i = 0; i < SNAPSHOT_CACHE_SIZE; i++) { - snapshot_seq = snapshot_cache_[i].load(std::memory_order_acquire); + SequenceNumber snapshot_seq = + snapshot_cache_[i].load(std::memory_order_acquire); if (!MaybeUpdateOldCommitMap(evicted.prep_seq, evicted.commit_seq, snapshot_seq, next_is_larger)) { break; @@ -713,7 +918,10 @@ bool WritePreparedTxnDB::MaybeUpdateOldCommitMap( // then snapshot_seq < commit_seq if (prep_seq <= snapshot_seq) { // overlapping range WPRecordTick(TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD); - ROCKS_LOG_WARN(info_log_, "old_commit_map_mutex_ overhead"); + ROCKS_LOG_WARN(info_log_, + "old_commit_map_mutex_ overhead for %" PRIu64 + " commit entry: <%" PRIu64 ",%" PRIu64 ">", + snapshot_seq, prep_seq, commit_seq); WriteLock wl(&old_commit_map_mutex_); old_commit_map_empty_.store(false, std::memory_order_release); auto& vec = old_commit_map_[snapshot_seq]; diff --git a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.h b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.h index ec76e2716..10d1dbf60 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.h +++ b/ceph/src/rocksdb/utilities/transactions/write_prepared_txn_db.h @@ -34,36 +34,28 @@ namespace rocksdb { -#define ROCKS_LOG_DETAILS(LGR, FMT, ...) \ - ; // due to overhead by default skip such lines -// ROCKS_LOG_DEBUG(LGR, FMT, ##__VA_ARGS__) - // A PessimisticTransactionDB that writes data to DB after prepare phase of 2PC. // In this way some data in the DB might not be committed. The DB provides // mechanisms to tell such data apart from committed data. 
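// A minimal client-side usage sketch (hedged; not part of this patch). The
// write policy is selected through the public TransactionDB API rather than by
// constructing WritePreparedTxnDB directly; the path and the transaction name
// below are illustrative only and error handling is elided:
//
//   rocksdb::Options options;
//   options.create_if_missing = true;
//   rocksdb::TransactionDBOptions txn_db_options;
//   txn_db_options.write_policy = rocksdb::WRITE_PREPARED;
//   rocksdb::TransactionDB* txn_db = nullptr;
//   rocksdb::Status s = rocksdb::TransactionDB::Open(
//       options, txn_db_options, "/tmp/wp_example", &txn_db);
//   rocksdb::Transaction* txn =
//       txn_db->BeginTransaction(rocksdb::WriteOptions());
//   s = txn->SetName("xid1");
//   s = txn->Put("key1", "value1");
//   s = txn->Prepare();  // with WritePrepared, data reaches the memtable here
//   s = txn->Commit();   // commit records the prepare->commit mapping
//   delete txn;
//   delete txn_db;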
class WritePreparedTxnDB : public PessimisticTransactionDB {
 public:
-  explicit WritePreparedTxnDB(
-      DB* db, const TransactionDBOptions& txn_db_options,
-      size_t snapshot_cache_bits = DEF_SNAPSHOT_CACHE_BITS,
-      size_t commit_cache_bits = DEF_COMMIT_CACHE_BITS)
+  explicit WritePreparedTxnDB(DB* db,
+                              const TransactionDBOptions& txn_db_options)
       : PessimisticTransactionDB(db, txn_db_options),
-        SNAPSHOT_CACHE_BITS(snapshot_cache_bits),
+        SNAPSHOT_CACHE_BITS(txn_db_options.wp_snapshot_cache_bits),
         SNAPSHOT_CACHE_SIZE(static_cast<size_t>(1ull << SNAPSHOT_CACHE_BITS)),
-        COMMIT_CACHE_BITS(commit_cache_bits),
+        COMMIT_CACHE_BITS(txn_db_options.wp_commit_cache_bits),
         COMMIT_CACHE_SIZE(static_cast<size_t>(1ull << COMMIT_CACHE_BITS)),
         FORMAT(COMMIT_CACHE_BITS) {
     Init(txn_db_options);
   }
 
-  explicit WritePreparedTxnDB(
-      StackableDB* db, const TransactionDBOptions& txn_db_options,
-      size_t snapshot_cache_bits = DEF_SNAPSHOT_CACHE_BITS,
-      size_t commit_cache_bits = DEF_COMMIT_CACHE_BITS)
+  explicit WritePreparedTxnDB(StackableDB* db,
+                              const TransactionDBOptions& txn_db_options)
       : PessimisticTransactionDB(db, txn_db_options),
-        SNAPSHOT_CACHE_BITS(snapshot_cache_bits),
+        SNAPSHOT_CACHE_BITS(txn_db_options.wp_snapshot_cache_bits),
         SNAPSHOT_CACHE_SIZE(static_cast<size_t>(1ull << SNAPSHOT_CACHE_BITS)),
-        COMMIT_CACHE_BITS(commit_cache_bits),
+        COMMIT_CACHE_BITS(txn_db_options.wp_commit_cache_bits),
         COMMIT_CACHE_SIZE(static_cast<size_t>(1ull << COMMIT_CACHE_BITS)),
         FORMAT(COMMIT_CACHE_BITS) {
     Init(txn_db_options);
@@ -112,17 +104,21 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
       const std::vector<ColumnFamilyHandle*>& column_families,
       std::vector<Iterator*>* iterators) override;
 
-  virtual void ReleaseSnapshot(const Snapshot* snapshot) override;
-
   // Check whether the transaction that wrote the value with sequence number seq
   // is visible to the snapshot with sequence number snapshot_seq.
   // Returns true if commit_seq <= snapshot_seq
+  // If the snapshot_seq is already released and snapshot_seq <= max, sets
+  // *snap_released to true and returns true as well.
   inline bool IsInSnapshot(uint64_t prep_seq, uint64_t snapshot_seq,
-                           uint64_t min_uncommitted = 0) const {
+                           uint64_t min_uncommitted = kMinUnCommittedSeq,
+                           bool* snap_released = nullptr) const {
     ROCKS_LOG_DETAILS(info_log_,
                       "IsInSnapshot %" PRIu64 " in %" PRIu64
                       " min_uncommitted %" PRIu64,
                       prep_seq, snapshot_seq, min_uncommitted);
+    assert(min_uncommitted >= kMinUnCommittedSeq);
+    // Caller is responsible to initialize snap_released.
+    assert(snap_released == nullptr || *snap_released == false);
     // Here we try to infer the return value without looking into prepare list.
     // This would help avoiding synchronization over a shared map.
    // TODO(myabandeh): optimize this.
This sequence of checks must be correct
@@ -142,24 +138,6 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
                         prep_seq, snapshot_seq, 0);
       return false;
     }
-    if (!delayed_prepared_empty_.load(std::memory_order_acquire)) {
-      // We should not normally reach here
-      WPRecordTick(TXN_PREPARE_MUTEX_OVERHEAD);
-      ReadLock rl(&prepared_mutex_);
-      ROCKS_LOG_WARN(info_log_, "prepared_mutex_ overhead %" PRIu64,
-                     static_cast<uint64_t>(delayed_prepared_.size()));
-      if (delayed_prepared_.find(prep_seq) != delayed_prepared_.end()) {
-        // Then it is not committed yet
-        ROCKS_LOG_DETAILS(info_log_,
-                          "IsInSnapshot %" PRIu64 " in %" PRIu64
-                          " returns %" PRId32,
-                          prep_seq, snapshot_seq, 0);
-        return false;
-      }
-    }
-    // Note: since min_uncommitted does not include the delayed_prepared_ we
-    // should check delayed_prepared_ first before applying this optimization.
-    // TODO(myabandeh): include delayed_prepared_ in min_uncommitted
     if (prep_seq < min_uncommitted) {
       ROCKS_LOG_DETAILS(info_log_,
                         "IsInSnapshot %" PRIu64 " in %" PRIu64
@@ -168,38 +146,119 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
                         prep_seq, snapshot_seq, 1, min_uncommitted);
       return true;
     }
-    auto indexed_seq = prep_seq % COMMIT_CACHE_SIZE;
+    // Commit of a delayed prepared has two non-atomic steps: add to the
+    // commit cache, remove from delayed_prepared_. Our reads of these two are
+    // also non-atomic. By looking into the commit cache first, we might find
+    // the prep_seq neither in the commit cache nor in delayed_prepared_. To
+    // fix that, i) we check whether there was any delayed prepared BEFORE
+    // looking into the commit cache, and ii) if there was, we complete the
+    // search steps to be: i) commit cache, ii) delayed prepared, iii) commit
+    // cache again. In this way, if the first query to the commit cache missed
+    // the commit, the 2nd will catch it.
+    bool was_empty;
+    SequenceNumber max_evicted_seq_lb, max_evicted_seq_ub;
     CommitEntry64b dont_care;
-    CommitEntry cached;
-    bool exist = GetCommitEntry(indexed_seq, &dont_care, &cached);
-    if (exist && prep_seq == cached.prep_seq) {
-      // It is committed and also not evicted from commit cache
-      ROCKS_LOG_DETAILS(
-          info_log_, "IsInSnapshot %" PRIu64 " in %" PRIu64 " returns %" PRId32,
-          prep_seq, snapshot_seq, cached.commit_seq <= snapshot_seq);
-      return cached.commit_seq <= snapshot_seq;
-    }
-    // else it could be committed but not inserted in the map which could happen
-    // after recovery, or it could be committed and evicted by another commit,
-    // or never committed.
-
-    // At this point we dont know if it was committed or it is still prepared
-    auto max_evicted_seq = max_evicted_seq_.load(std::memory_order_acquire);
-    // max_evicted_seq_ when we did GetCommitEntry <= max_evicted_seq now
-    if (max_evicted_seq < prep_seq) {
-      // Not evicted from cache and also not present, so must be still prepared
-      ROCKS_LOG_DETAILS(
-          info_log_, "IsInSnapshot %" PRIu64 " in %" PRIu64 " returns %" PRId32,
-          prep_seq, snapshot_seq, 0);
-      return false;
-    }
+    auto indexed_seq = prep_seq % COMMIT_CACHE_SIZE;
+    size_t repeats = 0;
+    do {
+      repeats++;
+      assert(repeats < 100);
+      if (UNLIKELY(repeats >= 100)) {
+        throw std::runtime_error(
+            "The read was interrupted 100 times by update to max_evicted_seq_. 
" + "This is unexpected in all setups"); + } + max_evicted_seq_lb = max_evicted_seq_.load(std::memory_order_acquire); + TEST_SYNC_POINT( + "WritePreparedTxnDB::IsInSnapshot:max_evicted_seq_:pause"); + TEST_SYNC_POINT( + "WritePreparedTxnDB::IsInSnapshot:max_evicted_seq_:resume"); + was_empty = delayed_prepared_empty_.load(std::memory_order_acquire); + TEST_SYNC_POINT( + "WritePreparedTxnDB::IsInSnapshot:delayed_prepared_empty_:pause"); + TEST_SYNC_POINT( + "WritePreparedTxnDB::IsInSnapshot:delayed_prepared_empty_:resume"); + CommitEntry cached; + bool exist = GetCommitEntry(indexed_seq, &dont_care, &cached); + TEST_SYNC_POINT("WritePreparedTxnDB::IsInSnapshot:GetCommitEntry:pause"); + TEST_SYNC_POINT("WritePreparedTxnDB::IsInSnapshot:GetCommitEntry:resume"); + if (exist && prep_seq == cached.prep_seq) { + // It is committed and also not evicted from commit cache + ROCKS_LOG_DETAILS( + info_log_, + "IsInSnapshot %" PRIu64 " in %" PRIu64 " returns %" PRId32, + prep_seq, snapshot_seq, cached.commit_seq <= snapshot_seq); + return cached.commit_seq <= snapshot_seq; + } + // else it could be committed but not inserted in the map which could + // happen after recovery, or it could be committed and evicted by another + // commit, or never committed. + + // At this point we dont know if it was committed or it is still prepared + max_evicted_seq_ub = max_evicted_seq_.load(std::memory_order_acquire); + if (UNLIKELY(max_evicted_seq_lb != max_evicted_seq_ub)) { + continue; + } + // Note: max_evicted_seq_ when we did GetCommitEntry <= max_evicted_seq_ub + if (max_evicted_seq_ub < prep_seq) { + // Not evicted from cache and also not present, so must be still + // prepared + ROCKS_LOG_DETAILS(info_log_, + "IsInSnapshot %" PRIu64 " in %" PRIu64 + " returns %" PRId32, + prep_seq, snapshot_seq, 0); + return false; + } + TEST_SYNC_POINT("WritePreparedTxnDB::IsInSnapshot:prepared_mutex_:pause"); + TEST_SYNC_POINT( + "WritePreparedTxnDB::IsInSnapshot:prepared_mutex_:resume"); + if (!was_empty) { + // We should not normally reach here + WPRecordTick(TXN_PREPARE_MUTEX_OVERHEAD); + ReadLock rl(&prepared_mutex_); + ROCKS_LOG_WARN( + info_log_, "prepared_mutex_ overhead %" PRIu64 " for %" PRIu64, + static_cast(delayed_prepared_.size()), prep_seq); + if (delayed_prepared_.find(prep_seq) != delayed_prepared_.end()) { + // This is the order: 1) delayed_prepared_commits_ update, 2) publish + // 3) delayed_prepared_ clean up. So check if it is the case of a late + // clenaup. + auto it = delayed_prepared_commits_.find(prep_seq); + if (it == delayed_prepared_commits_.end()) { + // Then it is not committed yet + ROCKS_LOG_DETAILS(info_log_, + "IsInSnapshot %" PRIu64 " in %" PRIu64 + " returns %" PRId32, + prep_seq, snapshot_seq, 0); + return false; + } else { + ROCKS_LOG_DETAILS(info_log_, + "IsInSnapshot %" PRIu64 " in %" PRIu64 + " commit: %" PRIu64 " returns %" PRId32, + prep_seq, snapshot_seq, it->second, + snapshot_seq <= it->second); + return it->second <= snapshot_seq; + } + } else { + // 2nd query to commit cache. Refer to was_empty comment above. 
+          exist = GetCommitEntry(indexed_seq, &dont_care, &cached);
+          if (exist && prep_seq == cached.prep_seq) {
+            ROCKS_LOG_DETAILS(
+                info_log_,
+                "IsInSnapshot %" PRIu64 " in %" PRIu64 " returns %" PRId32,
+                prep_seq, snapshot_seq, cached.commit_seq <= snapshot_seq);
+            return cached.commit_seq <= snapshot_seq;
+          }
+          max_evicted_seq_ub = max_evicted_seq_.load(std::memory_order_acquire);
+        }
+      }
+    } while (UNLIKELY(max_evicted_seq_lb != max_evicted_seq_ub));
    // When advancing max_evicted_seq_, we move older entries from prepared to
    // delayed_prepared_. Also we move evicted entries from the commit cache to
    // old_commit_map_ if they overlap with any snapshot. Since prep_seq <=
    // max_evicted_seq_, we have three cases: i) in delayed_prepared_, ii) in
    // old_commit_map_, iii) committed with no conflict with any snapshot. Case
    // (i) delayed_prepared_ is checked above
-    if (max_evicted_seq < snapshot_seq) {  // then (ii) cannot be the case
+    if (max_evicted_seq_ub < snapshot_seq) {  // then (ii) cannot be the case
      // only (iii) is the case: committed
      // commit_seq <= max_evicted_seq_ < snapshot_seq => commit_seq <
      // snapshot_seq
@@ -212,9 +271,15 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
    // snapshot. If there was no overlapping commit entry, then it is committed
    // with a commit_seq lower than any live snapshot, including snapshot_seq.
    if (old_commit_map_empty_.load(std::memory_order_acquire)) {
-      ROCKS_LOG_DETAILS(
-          info_log_, "IsInSnapshot %" PRIu64 " in %" PRIu64 " returns %" PRId32,
-          prep_seq, snapshot_seq, 1);
+      ROCKS_LOG_DETAILS(info_log_,
+                        "IsInSnapshot %" PRIu64 " in %" PRIu64
+                        " returns %" PRId32 " released=1",
+                        prep_seq, snapshot_seq, 0);
+      assert(snap_released);
+      // This snapshot is not valid anymore. We cannot tell if prep_seq is
+      // committed before or after the snapshot. Return true but also set
+      // snap_released to true.
+      *snap_released = true;
      return true;
    }
    {
@@ -222,14 +287,26 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
      // rare case and it is ok to pay the cost of mutex ReadLock for such old,
      // reading transactions.
      WPRecordTick(TXN_OLD_COMMIT_MAP_MUTEX_OVERHEAD);
-      ROCKS_LOG_WARN(info_log_, "old_commit_map_mutex_ overhead");
      ReadLock rl(&old_commit_map_mutex_);
      auto prep_set_entry = old_commit_map_.find(snapshot_seq);
      bool found = prep_set_entry != old_commit_map_.end();
      if (found) {
        auto& vec = prep_set_entry->second;
        found = std::binary_search(vec.begin(), vec.end(), prep_seq);
+      } else {
+        // coming from compaction
+        ROCKS_LOG_DETAILS(info_log_,
+                          "IsInSnapshot %" PRIu64 " in %" PRIu64
+                          " returns %" PRId32 " released=1",
+                          prep_seq, snapshot_seq, 0);
+        // This snapshot is not valid anymore. We cannot tell if prep_seq is
+        // committed before or after the snapshot. Return true but also set
+        // snap_released to true.
+        assert(snap_released);
+        *snap_released = true;
+        return true;
      }
+
      if (!found) {
        ROCKS_LOG_DETAILS(info_log_,
                          "IsInSnapshot %" PRIu64 " in %" PRIu64
@@ -245,12 +322,17 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
    return false;
  }

-  // Add the transaction with prepare sequence seq to the prepared list
+  // Add the transaction with prepare sequence seq to the prepared list.
+  // Note: must be called serially with increasing seq on each call.
  void AddPrepared(uint64_t seq);
+  // Check if any of the prepared txns are less than the new max_evicted_seq_.
+  // Must be called with prepared_mutex_ write locked.
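// The old_commit_map_ branch above reduces to a sorted-vector membership test
// per snapshot; a small self-contained sketch (hypothetical helper, standard
// containers instead of the real structures):
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Returns true iff prep_seq was evicted with a commit that overlaps the given
// snapshot, i.e. the snapshot must not treat it as committed.
bool OverlapsSnapshot(
    const std::map<uint64_t, std::vector<uint64_t>>& old_commit_map,
    uint64_t snapshot_seq, uint64_t prep_seq) {
  auto it = old_commit_map.find(snapshot_seq);
  if (it == old_commit_map.end()) {
    return false;  // snapshot released; the caller reports snap_released
  }
  const auto& vec = it->second;  // kept sorted in ascending order
  return std::binary_search(vec.begin(), vec.end(), prep_seq);
}

int main() {
  std::map<uint64_t, std::vector<uint64_t>> m{{50, {10, 20, 30}}};
  assert(OverlapsSnapshot(m, 50, 20));   // overlapping entry recorded
  assert(!OverlapsSnapshot(m, 50, 25));  // not in the sorted set
  assert(!OverlapsSnapshot(m, 60, 20));  // snapshot 60 has no entry here
}
// CheckPreparedAgainstMax, declared next, is the write-side counterpart that
// sweeps old prepared entries when max_evicted_seq_ advances: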
+ void CheckPreparedAgainstMax(SequenceNumber new_max); // Remove the transaction with prepare sequence seq from the prepared list void RemovePrepared(const uint64_t seq, const size_t batch_cnt = 1); // Add the transaction with prepare sequence prepare_seq and commit sequence // commit_seq to the commit map. loop_cnt is to detect infinite loops. + // Note: must be called serially. void AddCommitted(uint64_t prepare_seq, uint64_t commit_seq, uint8_t loop_cnt = 0); @@ -358,29 +440,43 @@ class WritePreparedTxnDB : public PessimisticTransactionDB { void UpdateCFComparatorMap(ColumnFamilyHandle* handle) override; virtual const Snapshot* GetSnapshot() override; + SnapshotImpl* GetSnapshotInternal(bool for_ww_conflict_check); protected: virtual Status VerifyCFOptions( const ColumnFamilyOptions& cf_options) override; private: - friend class WritePreparedTransactionTest_IsInSnapshotTest_Test; - friend class WritePreparedTransactionTest_CheckAgainstSnapshotsTest_Test; - friend class WritePreparedTransactionTest_CommitMapTest_Test; - friend class - WritePreparedTransactionTest_ConflictDetectionAfterRecoveryTest_Test; - friend class SnapshotConcurrentAccessTest_SnapshotConcurrentAccessTest_Test; - friend class WritePreparedTransactionTestBase; friend class PreparedHeap_BasicsTest_Test; - friend class PreparedHeap_EmptyAtTheEnd_Test; friend class PreparedHeap_Concurrent_Test; + friend class PreparedHeap_EmptyAtTheEnd_Test; + friend class SnapshotConcurrentAccessTest_SnapshotConcurrentAccessTest_Test; + friend class WritePreparedCommitEntryPreReleaseCallback; + friend class WritePreparedTransactionTestBase; friend class WritePreparedTxn; friend class WritePreparedTxnDBMock; + friend class WritePreparedTransactionTest_AddPreparedBeforeMax_Test; friend class WritePreparedTransactionTest_AdvanceMaxEvictedSeqBasicTest_Test; friend class WritePreparedTransactionTest_AdvanceMaxEvictedSeqWithDuplicatesTest_Test; + friend class WritePreparedTransactionTest_AdvanceSeqByOne_Test; friend class WritePreparedTransactionTest_BasicRecoveryTest_Test; + friend class WritePreparedTransactionTest_CheckAgainstSnapshotsTest_Test; + friend class WritePreparedTransactionTest_CleanupSnapshotEqualToMax_Test; + friend class + WritePreparedTransactionTest_ConflictDetectionAfterRecoveryTest_Test; + friend class WritePreparedTransactionTest_CommitMapTest_Test; + friend class WritePreparedTransactionTest_DoubleSnapshot_Test; friend class WritePreparedTransactionTest_IsInSnapshotEmptyMapTest_Test; + friend class WritePreparedTransactionTest_IsInSnapshotReleased_Test; + friend class WritePreparedTransactionTest_IsInSnapshotTest_Test; + friend class WritePreparedTransactionTest_NewSnapshotLargerThanMax_Test; + friend class WritePreparedTransactionTest_MaxCatchupWithNewSnapshot_Test; + friend class + WritePreparedTransactionTest_NonAtomicCommitOfDelayedPrepared_Test; + friend class + WritePreparedTransactionTest_NonAtomicUpdateOfDelayedPrepared_Test; + friend class WritePreparedTransactionTest_NonAtomicUpdateOfMaxEvictedSeq_Test; friend class WritePreparedTransactionTest_OldCommitMapGC_Test; friend class WritePreparedTransactionTest_RollbackTest_Test; friend class WriteUnpreparedTxnDB; @@ -491,6 +587,10 @@ class WritePreparedTxnDB : public PessimisticTransactionDB { // reflect any uncommitted data that is not added to prepared_txns_ yet. // Otherwise, if there is no concurrent txn, this value simply reflects that // latest value in the memtable. 
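// A minimal sketch of the SmallestUnCommittedSeq selection order described
// above, with stand-in containers instead of the real PreparedHeap
// (assumption-only code, not the patch):
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <set>
#include <vector>

uint64_t SmallestUncommitted(
    const std::set<uint64_t>& delayed_prepared,  // oldest in-flight txns
    const std::priority_queue<uint64_t, std::vector<uint64_t>,
                              std::greater<uint64_t>>& prepared,  // min-heap
    uint64_t latest_db_seq) {
  // Anything moved to delayed_prepared_ is older than anything still in the
  // heap, which is why it must be consulted first.
  if (!delayed_prepared.empty()) return *delayed_prepared.begin();
  if (prepared.empty()) return latest_db_seq + 1;  // nothing uncommitted
  return prepared.top();
}

int main() {
  std::set<uint64_t> delayed{5, 9};
  std::priority_queue<uint64_t, std::vector<uint64_t>, std::greater<uint64_t>>
      heap;
  heap.push(12);
  assert(SmallestUncommitted(delayed, heap, 100) == 5);  // delayed wins
  assert(SmallestUncommitted({}, heap, 100) == 12);      // then the heap
  assert(SmallestUncommitted({}, {}, 100) == 101);       // else last seq + 1
}
// The added early return below implements exactly the delayed_prepared_ case: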
+    if (!delayed_prepared_.empty()) {
+      assert(!delayed_prepared_empty_.load());
+      return *delayed_prepared_.begin();
+    }
    if (prepared_txns_.empty()) {
      return db_impl_->GetLatestSequenceNumber() + 1;
    } else {
@@ -519,6 +619,11 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
  // version value.
  void UpdateSnapshots(const std::vector& snapshots,
                       const SequenceNumber& version);
+  // Check the new list of snapshots against the old one to see if any of the
+  // snapshots have been released, and do the cleanup for the released
+  // snapshots.
+  void CleanupReleasedSnapshots(
+      const std::vector<SequenceNumber>& new_snapshots,
+      const std::vector<SequenceNumber>& old_snapshots);
  // Check an evicted entry against live snapshots to see if it should be kept
  // around or it can be safely discarded (and hence assume committed for all
@@ -537,6 +642,10 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
                              const uint64_t& snapshot_seq,
                              const bool next_is_larger);
+  // A trick to increase the last visible sequence number by one and also wait
+  // for the in-flight commits to become visible.
+  void AdvanceSeqByOne();
+
  // The list of live snapshots at the last time that max_evicted_seq_ advanced.
  // The list is stored in two data structures: in snapshot_cache_, which is
  // efficient for concurrent reads, and in snapshots_ if the data does not fit
@@ -545,14 +654,17 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
  // The list is sorted in ascending order. Thread-safety for writes is provided
  // with snapshots_mutex_ and concurrent reads are safe due to std::atomic for
  // each entry. In x86_64 architecture such reads are compiled to simple read
-  // instructions. 128 entries
-  static const size_t DEF_SNAPSHOT_CACHE_BITS = static_cast(7);
+  // instructions.
  const size_t SNAPSHOT_CACHE_BITS;
  const size_t SNAPSHOT_CACHE_SIZE;
-  unique_ptr[]> snapshot_cache_;
+  std::unique_ptr<std::atomic<SequenceNumber>[]> snapshot_cache_;
  // 2nd list for storing snapshots. The list is sorted in ascending order.
  // Thread-safety is provided with snapshots_mutex_.
  std::vector snapshots_;
+  // The list of all snapshots: snapshots_ + snapshot_cache_. This list,
+  // although redundant, simplifies the CleanupOldSnapshots implementation.
+  // Thread-safety is provided with snapshots_mutex_.
+  std::vector<SequenceNumber> snapshots_all_;
  // The version of the latest list of snapshots. This can be used to avoid
  // rewriting a list that is concurrently updated with a more recent version.
  SequenceNumber snapshots_version_ = 0;
@@ -560,19 +672,23 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
  // A heap of prepared transactions. Thread-safety is provided with
  // prepared_mutex_.
  PreparedHeap prepared_txns_;
-  // 8m entry, 64MB size
-  static const size_t DEF_COMMIT_CACHE_BITS = static_cast(23);
  const size_t COMMIT_CACHE_BITS;
  const size_t COMMIT_CACHE_SIZE;
  const CommitEntry64bFormat FORMAT;
  // commit_cache_ must be initialized to zero to tell apart an empty index from
  // a filled one. Thread-safety is provided with commit_cache_mutex_.
-  unique_ptr[]> commit_cache_;
+  std::unique_ptr<std::atomic<CommitEntry64b>[]> commit_cache_;
  // The largest evicted *commit* sequence number from the commit_cache_. If a
  // seq is smaller than max_evicted_seq_ it might or might not be present in
  // commit_cache_. So commit_cache_ must first be checked before consulting
  // with max_evicted_seq_.
  std::atomic max_evicted_seq_ = {};
+  // Order: 1) update future_max_evicted_seq_ = new_max, 2)
+  // GetSnapshotListFromDB(new_max), 3) max_evicted_seq_ = new_max. Since
Since + // GetSnapshotInternal guarantess that the snapshot seq is larger than + // future_max_evicted_seq_, this guarantes that if a snapshot is not larger + // than max has already being looked at via a GetSnapshotListFromDB(new_max). + std::atomic future_max_evicted_seq_ = {}; // Advance max_evicted_seq_ by this value each time it needs an update. The // larger the value, the less frequent advances we would have. We do not want // it to be too large either as it would cause stalls by doing too much @@ -590,6 +706,11 @@ class WritePreparedTxnDB : public PessimisticTransactionDB { // time max_evicted_seq_ advances their sequence number. This is expected to // be empty normally. Thread-safety is provided with prepared_mutex_. std::set delayed_prepared_; + // Commit of a delayed prepared: 1) update commit cache, 2) update + // delayed_prepared_commits_, 3) publish seq, 3) clean up delayed_prepared_. + // delayed_prepared_commits_ will help us tell apart the unprepared txns from + // the ones that are committed but not cleaned up yet. + std::unordered_map delayed_prepared_commits_; // Update when delayed_prepared_.empty() changes. Expected to be true // normally. std::atomic delayed_prepared_empty_ = {true}; @@ -610,76 +731,93 @@ class WritePreparedTxnDB : public PessimisticTransactionDB { class WritePreparedTxnReadCallback : public ReadCallback { public: + WritePreparedTxnReadCallback(WritePreparedTxnDB* db, SequenceNumber snapshot) + : ReadCallback(snapshot), db_(db) {} WritePreparedTxnReadCallback(WritePreparedTxnDB* db, SequenceNumber snapshot, SequenceNumber min_uncommitted) - : db_(db), snapshot_(snapshot), min_uncommitted_(min_uncommitted) {} + : ReadCallback(snapshot, min_uncommitted), db_(db) {} // Will be called to see if the seq number visible; if not it moves on to // the next seq number. 
- inline virtual bool IsVisible(SequenceNumber seq) override { - return db_->IsInSnapshot(seq, snapshot_, min_uncommitted_); + inline virtual bool IsVisibleFullCheck(SequenceNumber seq) override { + auto snapshot = max_visible_seq_; + return db_->IsInSnapshot(seq, snapshot, min_uncommitted_); } + // TODO(myabandeh): override Refresh when Iterator::Refresh is supported private: WritePreparedTxnDB* db_; - SequenceNumber snapshot_; - SequenceNumber min_uncommitted_; }; class AddPreparedCallback : public PreReleaseCallback { public: - AddPreparedCallback(WritePreparedTxnDB* db, size_t sub_batch_cnt, - bool two_write_queues) + AddPreparedCallback(WritePreparedTxnDB* db, DBImpl* db_impl, + size_t sub_batch_cnt, bool two_write_queues, + bool first_prepare_batch) : db_(db), + db_impl_(db_impl), sub_batch_cnt_(sub_batch_cnt), - two_write_queues_(two_write_queues) { + two_write_queues_(two_write_queues), + first_prepare_batch_(first_prepare_batch) { (void)two_write_queues_; // to silence unused private field warning } virtual Status Callback(SequenceNumber prepare_seq, - bool is_mem_disabled) override { -#ifdef NDEBUG - (void)is_mem_disabled; -#endif + bool is_mem_disabled __attribute__((__unused__)), + uint64_t log_number) override { + // Always Prepare from the main queue assert(!two_write_queues_ || !is_mem_disabled); // implies the 1st queue for (size_t i = 0; i < sub_batch_cnt_; i++) { db_->AddPrepared(prepare_seq + i); } + if (first_prepare_batch_) { + assert(log_number != 0); + db_impl_->logs_with_prep_tracker()->MarkLogAsContainingPrepSection( + log_number); + } return Status::OK(); } private: WritePreparedTxnDB* db_; + DBImpl* db_impl_; size_t sub_batch_cnt_; bool two_write_queues_; + // It is 2PC and this is the first prepare batch. Always the case in 2PC + // unless it is WriteUnPrepared. + bool first_prepare_batch_; }; class WritePreparedCommitEntryPreReleaseCallback : public PreReleaseCallback { public: // includes_data indicates that the commit also writes non-empty // CommitTimeWriteBatch to memtable, which needs to be committed separately. - WritePreparedCommitEntryPreReleaseCallback(WritePreparedTxnDB* db, - DBImpl* db_impl, - SequenceNumber prep_seq, - size_t prep_batch_cnt, - size_t data_batch_cnt = 0, - bool publish_seq = true) + WritePreparedCommitEntryPreReleaseCallback( + WritePreparedTxnDB* db, DBImpl* db_impl, SequenceNumber prep_seq, + size_t prep_batch_cnt, size_t data_batch_cnt = 0, + SequenceNumber aux_seq = kMaxSequenceNumber, size_t aux_batch_cnt = 0) : db_(db), db_impl_(db_impl), prep_seq_(prep_seq), prep_batch_cnt_(prep_batch_cnt), data_batch_cnt_(data_batch_cnt), includes_data_(data_batch_cnt_ > 0), - publish_seq_(publish_seq) { + aux_seq_(aux_seq), + aux_batch_cnt_(aux_batch_cnt), + includes_aux_batch_(aux_batch_cnt > 0) { assert((prep_batch_cnt_ > 0) != (prep_seq == kMaxSequenceNumber)); // xor assert(prep_batch_cnt_ > 0 || data_batch_cnt_ > 0); + assert((aux_batch_cnt_ > 0) != (aux_seq == kMaxSequenceNumber)); // xor } virtual Status Callback(SequenceNumber commit_seq, - bool is_mem_disabled) override { -#ifdef NDEBUG - (void)is_mem_disabled; -#endif + bool is_mem_disabled __attribute__((__unused__)), + uint64_t) override { + // Always commit from the 2nd queue + assert(!db_impl_->immutable_db_options().two_write_queues || + is_mem_disabled); assert(includes_data_ || prep_seq_ != kMaxSequenceNumber); + // Data batch is what accompanied with the commit marker and affects the + // last seq in the commit batch. 
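// The data sub-batches of a commit occupy consecutive sequence numbers
// starting at commit_seq, so the last of N sub-batches sits at
// commit_seq + N - 1. A worked check of that arithmetic (standalone sketch;
// LastCommitSeq is a hypothetical helper, not part of the patch):
#include <cassert>
#include <cstdint>

uint64_t LastCommitSeq(uint64_t commit_seq, uint64_t data_batch_cnt) {
  return data_batch_cnt <= 1 ? commit_seq : commit_seq + data_batch_cnt - 1;
}

int main() {
  assert(LastCommitSeq(100, 0) == 100);  // commit marker only
  assert(LastCommitSeq(100, 1) == 100);  // one sub-batch shares the seq
  assert(LastCommitSeq(100, 3) == 102);  // sub-batches at 100, 101, 102
}
// which is exactly what the expression below computes: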
    const uint64_t last_commit_seq = LIKELY(data_batch_cnt_ <= 1)
                                         ? commit_seq
                                         : commit_seq + data_batch_cnt_ - 1;
@@ -688,6 +826,11 @@ class WritePreparedCommitEntryPreReleaseCallback : public PreReleaseCallback {
        db_->AddCommitted(prep_seq_ + i, last_commit_seq);
      }
    }  // else there was no prepare phase
+    if (includes_aux_batch_) {
+      for (size_t i = 0; i < aux_batch_cnt_; i++) {
+        db_->AddCommitted(aux_seq_ + i, last_commit_seq);
+      }
+    }
    if (includes_data_) {
      assert(data_batch_cnt_);
      // Commit the data that is accompanied with the commit request
@@ -698,7 +841,7 @@ class WritePreparedCommitEntryPreReleaseCallback : public PreReleaseCallback {
        db_->AddCommitted(commit_seq + i, last_commit_seq);
      }
    }
-    if (db_impl_->immutable_db_options().two_write_queues && publish_seq_) {
+    if (db_impl_->immutable_db_options().two_write_queues) {
      assert(is_mem_disabled);  // implies the 2nd queue
      // Publish the sequence number. We can do that here assuming the callback
      // is invoked only from one write queue, which would guarantee that the
@@ -718,11 +861,60 @@ class WritePreparedCommitEntryPreReleaseCallback : public PreReleaseCallback {
  SequenceNumber prep_seq_;
  size_t prep_batch_cnt_;
  size_t data_batch_cnt_;
-  // Either because it is commit without prepare or it has a
-  // CommitTimeWriteBatch
+  // Data here is the batch that is written with the commit marker, either
+  // because it is a commit without prepare or the commit has a
+  // CommitTimeWriteBatch.
  bool includes_data_;
-  // Should the callback also publishes the commit seq number
-  bool publish_seq_;
+  // Auxiliary batch (if there is any) is a batch that is written before, but
+  // gets the same commit seq as the prepare batch or data batch. This is used
+  // in two write queues where the CommitTimeWriteBatch becomes the aux batch
+  // and we do a separate write to actually commit everything.
+  SequenceNumber aux_seq_;
+  size_t aux_batch_cnt_;
+  bool includes_aux_batch_;
+};
+
+// For two_write_queues, commit both the aborted batch and the cleanup batch,
+// and then publish the seq
+class WritePreparedRollbackPreReleaseCallback : public PreReleaseCallback {
+ public:
+  WritePreparedRollbackPreReleaseCallback(WritePreparedTxnDB* db,
+                                          DBImpl* db_impl,
+                                          SequenceNumber prep_seq,
+                                          SequenceNumber rollback_seq,
+                                          size_t prep_batch_cnt)
+      : db_(db),
+        db_impl_(db_impl),
+        prep_seq_(prep_seq),
+        rollback_seq_(rollback_seq),
+        prep_batch_cnt_(prep_batch_cnt) {
+    assert(prep_seq != kMaxSequenceNumber);
+    assert(rollback_seq != kMaxSequenceNumber);
+    assert(prep_batch_cnt_ > 0);
+  }
+
+  Status Callback(SequenceNumber commit_seq, bool is_mem_disabled,
+                  uint64_t) override {
+    // Always commit from the 2nd queue
+    assert(is_mem_disabled);  // implies the 2nd queue
+    assert(db_impl_->immutable_db_options().two_write_queues);
+#ifdef NDEBUG
+    (void)is_mem_disabled;
+#endif
+    const uint64_t last_commit_seq = commit_seq;
+    db_->AddCommitted(rollback_seq_, last_commit_seq);
+    for (size_t i = 0; i < prep_batch_cnt_; i++) {
+      db_->AddCommitted(prep_seq_ + i, last_commit_seq);
+    }
+    db_impl_->SetLastPublishedSequence(last_commit_seq);
+    return Status::OK();
+  }
+
+ private:
+  WritePreparedTxnDB* db_;
+  DBImpl* db_impl_;
+  SequenceNumber prep_seq_;
+  SequenceNumber rollback_seq_;
+  size_t prep_batch_cnt_;
};

// Count the number of sub-batches inside a batch.
A sub-batch does not have diff --git a/ceph/src/rocksdb/utilities/transactions/write_unprepared_transaction_test.cc b/ceph/src/rocksdb/utilities/transactions/write_unprepared_transaction_test.cc index 009991bb7..9aee33b07 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_unprepared_transaction_test.cc +++ b/ceph/src/rocksdb/utilities/transactions/write_unprepared_transaction_test.cc @@ -81,12 +81,12 @@ TEST_P(WriteUnpreparedTransactionTest, ReadYourOwnWrite) { ReadOptions roptions; roptions.snapshot = snapshot0; + wup_txn->unprep_seqs_[snapshot2->GetSequenceNumber() + 1] = + snapshot4->GetSequenceNumber() - snapshot2->GetSequenceNumber(); auto iter = txn->GetIterator(roptions); // Test Get(). std::string value; - wup_txn->unprep_seqs_[snapshot2->GetSequenceNumber() + 1] = - snapshot4->GetSequenceNumber() - snapshot2->GetSequenceNumber(); ASSERT_OK(txn->Get(roptions, Slice("a"), &value)); ASSERT_EQ(value, "v3"); @@ -96,6 +96,8 @@ TEST_P(WriteUnpreparedTransactionTest, ReadYourOwnWrite) { wup_txn->unprep_seqs_[snapshot6->GetSequenceNumber() + 1] = snapshot8->GetSequenceNumber() - snapshot6->GetSequenceNumber(); + delete iter; + iter = txn->GetIterator(roptions); ASSERT_OK(txn->Get(roptions, Slice("a"), &value)); ASSERT_EQ(value, "v7"); @@ -108,6 +110,8 @@ TEST_P(WriteUnpreparedTransactionTest, ReadYourOwnWrite) { // Test Next(). wup_txn->unprep_seqs_[snapshot2->GetSequenceNumber() + 1] = snapshot4->GetSequenceNumber() - snapshot2->GetSequenceNumber(); + delete iter; + iter = txn->GetIterator(roptions); iter->Seek("a"); verify_state(iter, "a", "v3"); @@ -123,6 +127,8 @@ TEST_P(WriteUnpreparedTransactionTest, ReadYourOwnWrite) { wup_txn->unprep_seqs_[snapshot6->GetSequenceNumber() + 1] = snapshot8->GetSequenceNumber() - snapshot6->GetSequenceNumber(); + delete iter; + iter = txn->GetIterator(roptions); iter->Seek("a"); verify_state(iter, "a", "v7"); @@ -143,11 +149,11 @@ TEST_P(WriteUnpreparedTransactionTest, ReadYourOwnWrite) { // // Because of row locks and ValidateSnapshot, there cannot be any committed // entries after snapshot, but before the first prepared key. 
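// The reordered test code below settles into one pattern: mutate unprep_seqs_
// first, then drop and rebuild the iterator, because the iterator's read
// callback captures its visibility bound at construction. A toy model of why
// a stale iterator misreads (hypothetical types, not the RocksDB API):
#include <cassert>
#include <cstdint>
#include <memory>

struct ToyIter {
  uint64_t max_visible;  // captured once, like the iterator's read callback
  explicit ToyIter(uint64_t bound) : max_visible(bound) {}
  bool Visible(uint64_t seq) const { return seq <= max_visible; }
};

int main() {
  uint64_t unprep_bound = 4;  // the txn's current unprepared bound
  auto iter = std::make_unique<ToyIter>(unprep_bound);
  unprep_bound = 8;                                // txn writes more batches
  assert(!iter->Visible(7));                       // stale view misses them
  iter = std::make_unique<ToyIter>(unprep_bound);  // delete + recreate
  assert(iter->Visible(7));                        // fresh view sees them
}
// The test hunks below apply the same delete-then-recreate pattern: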
- delete iter; roptions.snapshot = snapshot2; - iter = txn->GetIterator(roptions); wup_txn->unprep_seqs_[snapshot2->GetSequenceNumber() + 1] = snapshot4->GetSequenceNumber() - snapshot2->GetSequenceNumber(); + delete iter; + iter = txn->GetIterator(roptions); iter->SeekForPrev("b"); verify_state(iter, "b", "v4"); @@ -161,11 +167,11 @@ TEST_P(WriteUnpreparedTransactionTest, ReadYourOwnWrite) { iter->Prev(); verify_state(iter, "a", "v3"); - delete iter; roptions.snapshot = snapshot6; - iter = txn->GetIterator(roptions); wup_txn->unprep_seqs_[snapshot6->GetSequenceNumber() + 1] = snapshot8->GetSequenceNumber() - snapshot6->GetSequenceNumber(); + delete iter; + iter = txn->GetIterator(roptions); iter->SeekForPrev("b"); verify_state(iter, "b", "v8"); diff --git a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.cc b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.cc index d4efe8ff9..731460eda 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.cc +++ b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.cc @@ -16,7 +16,7 @@ namespace rocksdb { -bool WriteUnpreparedTxnReadCallback::IsVisible(SequenceNumber seq) { +bool WriteUnpreparedTxnReadCallback::IsVisibleFullCheck(SequenceNumber seq) { auto unprep_seqs = txn_->GetUnpreparedSequenceNumbers(); // Since unprep_seqs maps prep_seq => prepare_batch_cnt, to check if seq is @@ -31,15 +31,15 @@ bool WriteUnpreparedTxnReadCallback::IsVisible(SequenceNumber seq) { } } - return db_->IsInSnapshot(seq, snapshot_, min_uncommitted_); + return db_->IsInSnapshot(seq, wup_snapshot_, min_uncommitted_); } -SequenceNumber WriteUnpreparedTxnReadCallback::MaxUnpreparedSequenceNumber() { - auto unprep_seqs = txn_->GetUnpreparedSequenceNumbers(); +SequenceNumber WriteUnpreparedTxnReadCallback::CalcMaxUnpreparedSequenceNumber( + WriteUnpreparedTxn* txn) { + auto unprep_seqs = txn->GetUnpreparedSequenceNumbers(); if (unprep_seqs.size()) { return unprep_seqs.rbegin()->first + unprep_seqs.rbegin()->second - 1; } - return 0; } @@ -84,87 +84,81 @@ void WriteUnpreparedTxn::Initialize(const TransactionOptions& txn_options) { } Status WriteUnpreparedTxn::Put(ColumnFamilyHandle* column_family, - const Slice& key, const Slice& value) { + const Slice& key, const Slice& value, + const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return TransactionBaseImpl::Put(column_family, key, value); + return TransactionBaseImpl::Put(column_family, key, value, assume_tracked); } Status WriteUnpreparedTxn::Put(ColumnFamilyHandle* column_family, - const SliceParts& key, const SliceParts& value) { + const SliceParts& key, const SliceParts& value, + const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return TransactionBaseImpl::Put(column_family, key, value); + return TransactionBaseImpl::Put(column_family, key, value, assume_tracked); } Status WriteUnpreparedTxn::Merge(ColumnFamilyHandle* column_family, - const Slice& key, const Slice& value) { + const Slice& key, const Slice& value, + const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return TransactionBaseImpl::Merge(column_family, key, value); + return TransactionBaseImpl::Merge(column_family, key, value, assume_tracked); } Status WriteUnpreparedTxn::Delete(ColumnFamilyHandle* column_family, - const Slice& key) { + const Slice& key, const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return 
TransactionBaseImpl::Delete(column_family, key); + return TransactionBaseImpl::Delete(column_family, key, assume_tracked); } Status WriteUnpreparedTxn::Delete(ColumnFamilyHandle* column_family, - const SliceParts& key) { + const SliceParts& key, + const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return TransactionBaseImpl::Delete(column_family, key); + return TransactionBaseImpl::Delete(column_family, key, assume_tracked); } Status WriteUnpreparedTxn::SingleDelete(ColumnFamilyHandle* column_family, - const Slice& key) { + const Slice& key, + const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return TransactionBaseImpl::SingleDelete(column_family, key); + return TransactionBaseImpl::SingleDelete(column_family, key, assume_tracked); } Status WriteUnpreparedTxn::SingleDelete(ColumnFamilyHandle* column_family, - const SliceParts& key) { + const SliceParts& key, + const bool assume_tracked) { Status s = MaybeFlushWriteBatchToDB(); if (!s.ok()) { return s; } - return TransactionBaseImpl::SingleDelete(column_family, key); + return TransactionBaseImpl::SingleDelete(column_family, key, assume_tracked); } Status WriteUnpreparedTxn::MaybeFlushWriteBatchToDB() { const bool kPrepared = true; Status s; - - bool needs_mark = (log_number_ == 0); - if (max_write_batch_size_ != 0 && write_batch_.GetDataSize() > max_write_batch_size_) { assert(GetState() != PREPARED); s = FlushWriteBatchToDB(!kPrepared); - if (s.ok()) { - assert(log_number_ > 0); - // This is done to prevent WAL files after log_number_ from being - // deleted, because they could potentially contain unprepared batches. - if (needs_mark) { - dbimpl_->logs_with_prep_tracker()->MarkLogAsContainingPrepSection( - log_number_); - } - } } return s; } @@ -192,6 +186,7 @@ Status WriteUnpreparedTxn::FlushWriteBatchToDB(bool prepared) { WriteOptions write_options = write_options_; write_options.disableWAL = false; const bool WRITE_AFTER_COMMIT = true; + const bool first_prepare_batch = log_number_ == 0; // MarkEndPrepare will change Noop marker to the appropriate marker. WriteBatchInternal::MarkEndPrepare(GetWriteBatch()->GetWriteBatch(), name_, !WRITE_AFTER_COMMIT, !prepared); @@ -201,11 +196,11 @@ Status WriteUnpreparedTxn::FlushWriteBatchToDB(bool prepared) { // is a non-zero chance of max advancing prepare_seq and readers assume the // data as committed. // Also having it in the PreReleaseCallback allows in-order addition of - // prepared entries to PrepareHeap and hence enables an optimization. Refer to - // SmallestUnCommittedSeq for more details. + // prepared entries to PreparedHeap and hence enables an optimization. Refer + // to SmallestUnCommittedSeq for more details. 
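// The optimization referred to above relies on the pre-release hook running
// while sequence numbers are still handed out in order, so AddPrepared always
// observes increasing seqs. A reduced sketch of that contract (hypothetical
// names, standard containers):
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PreparedRegistry {
  std::vector<uint64_t> heap;  // stays sorted because calls arrive in order
  void AddPrepared(uint64_t seq) {
    assert(heap.empty() || heap.back() < seq);  // the in-order precondition
    heap.push_back(seq);  // append-only suffices; no re-sorting needed
  }
};

// A prepare batch with cnt sub-batches registers cnt consecutive seqs,
// mirroring the loop in AddPreparedCallback::Callback.
void OnPrepare(PreparedRegistry& reg, uint64_t prepare_seq, size_t cnt) {
  for (size_t i = 0; i < cnt; i++) {
    reg.AddPrepared(prepare_seq + i);
  }
}

int main() {
  PreparedRegistry reg;
  OnPrepare(reg, 100, 3);  // registers 100, 101, 102, in order
  OnPrepare(reg, 200, 1);  // a later batch always has a larger seq
}
// The callback constructed below also tags the first prepare batch's log so
// that WAL file cannot be deleted while the txn is still unprepared: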
AddPreparedCallback add_prepared_callback( - wpt_db_, prepare_batch_cnt_, - db_impl_->immutable_db_options().two_write_queues); + wpt_db_, db_impl_, prepare_batch_cnt_, + db_impl_->immutable_db_options().two_write_queues, first_prepare_batch); const bool DISABLE_MEMTABLE = true; uint64_t seq_used = kMaxSequenceNumber; // log_number_ should refer to the oldest log containing uncommitted data @@ -327,8 +322,9 @@ Status WriteUnpreparedTxn::CommitInternal() { public: explicit PublishSeqPreReleaseCallback(DBImpl* db_impl) : db_impl_(db_impl) {} - virtual Status Callback(SequenceNumber seq, bool is_mem_disabled - __attribute__((__unused__))) override { + Status Callback(SequenceNumber seq, + bool is_mem_disabled __attribute__((__unused__)), + uint64_t) override { assert(is_mem_disabled); assert(db_impl_->immutable_db_options().two_write_queues); db_impl_->SetLastPublishedSequence(seq); @@ -366,15 +362,14 @@ Status WriteUnpreparedTxn::RollbackInternal() { assert(GetId() != kMaxSequenceNumber); assert(GetId() > 0); const auto& cf_map = *wupt_db_->GetCFHandleMap(); - // In WritePrepared, the txn is is the same as prepare seq - auto last_visible_txn = GetId() - 1; + auto read_at_seq = kMaxSequenceNumber; Status s; ReadOptions roptions; // Note that we do not use WriteUnpreparedTxnReadCallback because we do not // need to read our own writes when reading prior versions of the key for // rollback. - WritePreparedTxnReadCallback callback(wpt_db_, last_visible_txn, 0); + WritePreparedTxnReadCallback callback(wpt_db_, read_at_seq); for (const auto& cfkey : write_set_keys_) { const auto cfid = cfkey.first; const auto& keys = cfkey.second; @@ -442,9 +437,8 @@ Status WriteUnpreparedTxn::RollbackInternal() { prepare_seq); // Commit the batch by writing an empty batch to the queue that will release // the commit sequence number to readers. - const size_t ZERO_COMMITS = 0; - WritePreparedCommitEntryPreReleaseCallback update_commit_map_with_prepare( - wpt_db_, db_impl_, prepare_seq, ONE_BATCH, ZERO_COMMITS); + WriteUnpreparedRollbackPreReleaseCallback update_commit_map_with_prepare( + wpt_db_, db_impl_, unprep_seqs_, prepare_seq); WriteBatch empty_batch; empty_batch.PutLogData(Slice()); // In the absence of Prepare markers, use Noop as a batch separator @@ -454,19 +448,7 @@ Status WriteUnpreparedTxn::RollbackInternal() { &update_commit_map_with_prepare); assert(!s.ok() || seq_used != kMaxSequenceNumber); // Mark the txn as rolled back - uint64_t& rollback_seq = seq_used; if (s.ok()) { - // Note: it is safe to do it after PreReleaseCallback via WriteImpl since - // all the writes by the prpared batch are already blinded by the rollback - // batch. The only reason we commit the prepared batch here is to benefit - // from the existing mechanism in CommitCache that takes care of the rare - // cases that the prepare seq is visible to a snsapshot but max evicted seq - // advances that prepare seq. - for (const auto& seq : unprep_seqs_) { - for (size_t i = 0; i < seq.second; i++) { - wpt_db_->AddCommitted(seq.first + i, rollback_seq); - } - } for (const auto& seq : unprep_seqs_) { wpt_db_->RemovePrepared(seq.first, seq.second); } @@ -483,7 +465,8 @@ Status WriteUnpreparedTxn::Get(const ReadOptions& options, auto snapshot = options.snapshot; auto snap_seq = snapshot != nullptr ? 
snapshot->GetSequenceNumber() : kMaxSequenceNumber;
-  SequenceNumber min_uncommitted = 0;  // by default disable the optimization
+  SequenceNumber min_uncommitted =
+      kMinUnCommittedSeq;  // by default disable the optimization
  if (snapshot != nullptr) {
    min_uncommitted =
        static_cast_with_check(snapshot)
diff --git a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.h b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.h
index 84594070a..68567eb6e 100644
--- a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.h
+++ b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn.h
@@ -23,19 +23,37 @@ class WriteUnpreparedTxnReadCallback : public ReadCallback {
                                SequenceNumber snapshot,
                                SequenceNumber min_uncommitted,
                                WriteUnpreparedTxn* txn)
-      : db_(db),
-        snapshot_(snapshot),
-        min_uncommitted_(min_uncommitted),
-        txn_(txn) {}
-
-  virtual bool IsVisible(SequenceNumber seq) override;
-  virtual SequenceNumber MaxUnpreparedSequenceNumber() override;
+      // Pass our last uncommitted seq as the snapshot to the parent class to
+      // ensure that the parent will not prematurely filter out our own writes.
+      // We will do the exact comparison against snapshots in the
+      // IsVisibleFullCheck override.
+      : ReadCallback(CalcMaxVisibleSeq(txn, snapshot), min_uncommitted),
+        db_(db),
+        txn_(txn),
+        wup_snapshot_(snapshot) {}
+
+  virtual bool IsVisibleFullCheck(SequenceNumber seq) override;
+
+  bool CanReseekToSkip() override {
+    return wup_snapshot_ == max_visible_seq_;
+    // Otherwise our own uncommitted writes are in the db, and the assumptions
+    // behind the reseek optimizations are no longer valid.
+  }
+  // TODO(myabandeh): override Refresh when Iterator::Refresh is supported

 private:
+  static SequenceNumber CalcMaxVisibleSeq(WriteUnpreparedTxn* txn,
+                                          SequenceNumber snapshot_seq) {
+    SequenceNumber max_unprepared = CalcMaxUnpreparedSequenceNumber(txn);
+    assert(snapshot_seq < max_unprepared || max_unprepared == 0 ||
+           snapshot_seq == kMaxSequenceNumber);
+    return std::max(max_unprepared, snapshot_seq);
+  }
+  static SequenceNumber CalcMaxUnpreparedSequenceNumber(
+      WriteUnpreparedTxn* txn);
  WritePreparedTxnDB* db_;
-  SequenceNumber snapshot_;
-  SequenceNumber min_uncommitted_;
  WriteUnpreparedTxn* txn_;
+  SequenceNumber wup_snapshot_;
};

class WriteUnpreparedTxn : public WritePreparedTxn {
@@ -48,25 +66,31 @@ class WriteUnpreparedTxn : public WritePreparedTxn {
  using TransactionBaseImpl::Put;
  virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key,
-                     const Slice& value) override;
+                     const Slice& value,
+                     const bool assume_tracked = false) override;
  virtual Status Put(ColumnFamilyHandle* column_family, const SliceParts& key,
-                     const SliceParts& value) override;
+                     const SliceParts& value,
+                     const bool assume_tracked = false) override;
  using TransactionBaseImpl::Merge;
  virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
-                       const Slice& value) override;
+                       const Slice& value,
+                       const bool assume_tracked = false) override;
  using TransactionBaseImpl::Delete;
+  virtual Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
+                        const bool assume_tracked = false) override;
  virtual Status Delete(ColumnFamilyHandle* column_family,
-                        const Slice& key) override;
-  virtual Status Delete(ColumnFamilyHandle* column_family,
-                        const SliceParts& key) override;
+                        const SliceParts& key,
+                        const bool assume_tracked = false) override;
  using TransactionBaseImpl::SingleDelete;
  virtual Status SingleDelete(ColumnFamilyHandle* column_family,
-
const Slice& key, + const bool assume_tracked = false) override; virtual Status SingleDelete(ColumnFamilyHandle* column_family, - const SliceParts& key) override; + const SliceParts& key, + const bool assume_tracked = false) override; virtual Status RebuildFromWriteBatch(WriteBatch*) override { // This function was only useful for recovering prepared transactions, but diff --git a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.cc b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.cc index 51bb30818..55ca2b3ea 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.cc +++ b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.cc @@ -29,6 +29,27 @@ Status WriteUnpreparedTxnDB::RollbackRecoveredTransaction( // rollback batch. w_options.disableWAL = true; + class InvalidSnapshotReadCallback : public ReadCallback { + public: + InvalidSnapshotReadCallback(WritePreparedTxnDB* db, SequenceNumber snapshot) + : ReadCallback(snapshot), db_(db) {} + + // Will be called to see if the seq number visible; if not it moves on to + // the next seq number. + inline bool IsVisibleFullCheck(SequenceNumber seq) override { + // Becomes true if it cannot tell by comparing seq with snapshot seq since + // the snapshot is not a real snapshot. + auto snapshot = max_visible_seq_; + bool released = false; + auto ret = db_->IsInSnapshot(seq, snapshot, min_uncommitted_, &released); + assert(!released || ret); + return ret; + } + + private: + WritePreparedTxnDB* db_; + }; + // Iterate starting with largest sequence number. for (auto it = rtxn->batches_.rbegin(); it != rtxn->batches_.rend(); it++) { auto last_visible_txn = it->first - 1; @@ -38,7 +59,7 @@ Status WriteUnpreparedTxnDB::RollbackRecoveredTransaction( struct RollbackWriteBatchBuilder : public WriteBatch::Handler { DBImpl* db_; ReadOptions roptions; - WritePreparedTxnReadCallback callback; + InvalidSnapshotReadCallback callback; WriteBatch* rollback_batch_; std::map& comparators_; std::map& handles_; @@ -52,8 +73,8 @@ Status WriteUnpreparedTxnDB::RollbackRecoveredTransaction( std::map& handles, bool rollback_merge_operands) : db_(db), - callback(wpt_db, snap_seq, - 0), // 0 disables min_uncommitted optimization + callback(wpt_db, snap_seq), + // disable min_uncommitted optimization rollback_batch_(dst_batch), comparators_(comparators), handles_(handles), @@ -173,11 +194,9 @@ Status WriteUnpreparedTxnDB::Initialize( public: explicit CommitSubBatchPreReleaseCallback(WritePreparedTxnDB* db) : db_(db) {} - virtual Status Callback(SequenceNumber commit_seq, - bool is_mem_disabled) override { -#ifdef NDEBUG - (void)is_mem_disabled; -#endif + Status Callback(SequenceNumber commit_seq, + bool is_mem_disabled __attribute__((__unused__)), + uint64_t) override { assert(!is_mem_disabled); db_->AddCommitted(commit_seq, commit_seq); return Status::OK(); @@ -214,11 +233,6 @@ Status WriteUnpreparedTxnDB::Initialize( compaction_enabled_cf_handles.push_back(handles[index]); } - Status s = EnableAutoCompaction(compaction_enabled_cf_handles); - if (!s.ok()) { - return s; - } - // create 'real' transactions from recovered shell transactions auto rtxns = dbimpl->recovered_transactions(); for (auto rtxn : rtxns) { @@ -250,7 +264,7 @@ Status WriteUnpreparedTxnDB::Initialize( real_trx->SetLogNumber(first_log_number); real_trx->SetId(first_seq); - s = real_trx->SetName(recovered_trx->name_); + Status s = real_trx->SetName(recovered_trx->name_); if (!s.ok()) { break; } @@ -288,6 +302,20 @@ Status 
WriteUnpreparedTxnDB::Initialize( SequenceNumber prev_max = max_evicted_seq_; SequenceNumber last_seq = db_impl_->GetLatestSequenceNumber(); AdvanceMaxEvictedSeq(prev_max, last_seq); + // Create a gap between max and the next snapshot. This simplifies the logic + // in IsInSnapshot by not having to consider the special case of max == + // snapshot after recovery. This is tested in IsInSnapshotEmptyMapTest. + if (last_seq) { + db_impl_->versions_->SetLastAllocatedSequence(last_seq + 1); + db_impl_->versions_->SetLastSequence(last_seq + 1); + db_impl_->versions_->SetLastPublishedSequence(last_seq + 1); + } + + // Compaction should start only after max_evicted_seq_ is set. + Status s = EnableAutoCompaction(compaction_enabled_cf_handles); + if (!s.ok()) { + return s; + } // Rollback unprepared transactions. for (auto rtxn : rtxns) { @@ -325,6 +353,7 @@ struct WriteUnpreparedTxnDB::IteratorState { std::shared_ptr s, SequenceNumber min_uncommitted, WriteUnpreparedTxn* txn) : callback(txn_db, sequence, min_uncommitted, txn), snapshot(s) {} + SequenceNumber MaxVisibleSeq() { return callback.max_visible_seq(); } WriteUnpreparedTxnReadCallback callback; std::shared_ptr snapshot; @@ -366,8 +395,8 @@ Iterator* WriteUnpreparedTxnDB::NewIterator(const ReadOptions& options, auto* state = new IteratorState(this, snapshot_seq, own_snapshot, min_uncommitted, txn); auto* db_iter = - db_impl_->NewIteratorImpl(options, cfd, snapshot_seq, &state->callback, - !ALLOW_BLOB, !ALLOW_REFRESH); + db_impl_->NewIteratorImpl(options, cfd, state->MaxVisibleSeq(), + &state->callback, !ALLOW_BLOB, !ALLOW_REFRESH); db_iter->RegisterCleanup(CleanupWriteUnpreparedTxnDBIterator, state, nullptr); return db_iter; } diff --git a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.h b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.h index 6763aa99f..4b4e31e1b 100644 --- a/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.h +++ b/ceph/src/rocksdb/utilities/transactions/write_unprepared_txn_db.h @@ -59,8 +59,9 @@ class WriteUnpreparedCommitEntryPreReleaseCallback : public PreReleaseCallback { assert(unprep_seqs.size() > 0); } - virtual Status Callback(SequenceNumber commit_seq, bool is_mem_disabled - __attribute__((__unused__))) override { + virtual Status Callback(SequenceNumber commit_seq, + bool is_mem_disabled __attribute__((__unused__)), + uint64_t) override { const uint64_t last_commit_seq = LIKELY(data_batch_cnt_ <= 1) ? commit_seq : commit_seq + data_batch_cnt_ - 1; @@ -106,6 +107,45 @@ class WriteUnpreparedCommitEntryPreReleaseCallback : public PreReleaseCallback { bool publish_seq_; }; +class WriteUnpreparedRollbackPreReleaseCallback : public PreReleaseCallback { + // TODO(lth): Reduce code duplication with + // WritePreparedCommitEntryPreReleaseCallback + public: + WriteUnpreparedRollbackPreReleaseCallback( + WritePreparedTxnDB* db, DBImpl* db_impl, + const std::map& unprep_seqs, + SequenceNumber rollback_seq) + : db_(db), + db_impl_(db_impl), + unprep_seqs_(unprep_seqs), + rollback_seq_(rollback_seq) { + assert(unprep_seqs.size() > 0); + assert(db_impl_->immutable_db_options().two_write_queues); + } + + virtual Status Callback(SequenceNumber commit_seq, + bool is_mem_disabled __attribute__((__unused__)), + uint64_t) override { + assert(is_mem_disabled); // implies the 2nd queue + const uint64_t last_commit_seq = commit_seq; + db_->AddCommitted(rollback_seq_, last_commit_seq); + // Recall that unprep_seqs maps (un)prepared_seq => prepare_batch_cnt. 
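// A small sketch of that expansion, using only standard containers: each
// (start, cnt) entry denotes cnt consecutive unprepared sub-batches, all of
// which map to the single commit seq of the rollback write.
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

std::vector<std::pair<uint64_t, uint64_t>> ExpandCommits(
    const std::map<uint64_t, uint64_t>& unprep_seqs, uint64_t commit_seq) {
  std::vector<std::pair<uint64_t, uint64_t>> out;  // (sub_batch_seq, commit)
  for (const auto& s : unprep_seqs) {
    for (uint64_t i = 0; i < s.second; i++) {
      out.emplace_back(s.first + i, commit_seq);
    }
  }
  return out;
}

int main() {
  std::map<uint64_t, uint64_t> unprep{{100, 2}, {200, 1}};
  auto v = ExpandCommits(unprep, 205);
  // sub-batches 100, 101 and 200 all commit at seq 205
  assert(v.size() == 3 && v[0].first == 100 && v[2].second == 205);
}
// The loop below performs exactly this walk, one AddCommitted call per seq: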
+ for (const auto& s : unprep_seqs_) { + for (size_t i = 0; i < s.second; i++) { + db_->AddCommitted(s.first + i, last_commit_seq); + } + } + db_impl_->SetLastPublishedSequence(last_commit_seq); + return Status::OK(); + } + + private: + WritePreparedTxnDB* db_; + DBImpl* db_impl_; + const std::map& unprep_seqs_; + SequenceNumber rollback_seq_; +}; + struct KeySetBuilder : public WriteBatch::Handler { WriteUnpreparedTxn* txn_; bool rollback_merge_operands_; diff --git a/ceph/src/rocksdb/utilities/ttl/db_ttl_impl.cc b/ceph/src/rocksdb/utilities/ttl/db_ttl_impl.cc index 1e8d5d0e8..7ec60b615 100644 --- a/ceph/src/rocksdb/utilities/ttl/db_ttl_impl.cc +++ b/ceph/src/rocksdb/utilities/ttl/db_ttl_impl.cc @@ -259,8 +259,8 @@ Status DBWithTTLImpl::Write(const WriteOptions& opts, WriteBatch* updates) { explicit Handler(Env* env) : env_(env) {} WriteBatch updates_ttl; Status batch_rewrite_status; - virtual Status PutCF(uint32_t column_family_id, const Slice& key, - const Slice& value) override { + Status PutCF(uint32_t column_family_id, const Slice& key, + const Slice& value) override { std::string value_with_ts; Status st = AppendTS(value, &value_with_ts, env_); if (!st.ok()) { @@ -271,8 +271,8 @@ Status DBWithTTLImpl::Write(const WriteOptions& opts, WriteBatch* updates) { } return Status::OK(); } - virtual Status MergeCF(uint32_t column_family_id, const Slice& key, - const Slice& value) override { + Status MergeCF(uint32_t column_family_id, const Slice& key, + const Slice& value) override { std::string value_with_ts; Status st = AppendTS(value, &value_with_ts, env_); if (!st.ok()) { @@ -283,14 +283,11 @@ Status DBWithTTLImpl::Write(const WriteOptions& opts, WriteBatch* updates) { } return Status::OK(); } - virtual Status DeleteCF(uint32_t column_family_id, - const Slice& key) override { + Status DeleteCF(uint32_t column_family_id, const Slice& key) override { WriteBatchInternal::Delete(&updates_ttl, column_family_id, key); return Status::OK(); } - virtual void LogData(const Slice& blob) override { - updates_ttl.PutLogData(blob); - } + void LogData(const Slice& blob) override { updates_ttl.PutLogData(blob); } private: Env* env_; diff --git a/ceph/src/rocksdb/utilities/ttl/ttl_test.cc b/ceph/src/rocksdb/utilities/ttl/ttl_test.cc index ee7b317aa..6a50eb29f 100644 --- a/ceph/src/rocksdb/utilities/ttl/ttl_test.cc +++ b/ceph/src/rocksdb/utilities/ttl/ttl_test.cc @@ -30,7 +30,7 @@ class SpecialTimeEnv : public EnvWrapper { } void Sleep(int64_t sleep_time) { current_time_ += sleep_time; } - virtual Status GetCurrentTime(int64_t* current_time) override { + Status GetCurrentTime(int64_t* current_time) override { *current_time = current_time_; return Status::OK(); } @@ -53,7 +53,7 @@ class TtlTest : public testing::Test { DestroyDB(dbname_, Options()); } - ~TtlTest() { + ~TtlTest() override { CloseTtl(); DestroyDB(dbname_, Options()); } @@ -301,9 +301,8 @@ class TtlTest : public testing::Test { // Keeps key if it is in [kSampleSize_/3, 2*kSampleSize_/3), // Change value if it is in [2*kSampleSize_/3, kSampleSize_) // Eg. kSampleSize_=6. Drop:key0-1...Keep:key2-3...Change:key4-5... 
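// The bucketing described in the comment above, checked as standalone
// arithmetic (a sketch independent of the actual key parsing in TestFilter):
#include <cassert>
#include <cstdint>

enum class Action { kDrop, kKeep, kChange };

Action Classify(int64_t key_index, int64_t n) {
  if (key_index < n / 3) return Action::kDrop;      // [0, n/3)
  if (key_index < 2 * n / 3) return Action::kKeep;  // [n/3, 2n/3)
  return Action::kChange;                           // [2n/3, n)
}

int main() {
  assert(Classify(1, 6) == Action::kDrop);    // key0, key1 dropped
  assert(Classify(3, 6) == Action::kKeep);    // key2, key3 kept
  assert(Classify(5, 6) == Action::kChange);  // key4, key5 changed
}
// TestFilter below implements these ranges over the parsed key index: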
- virtual bool Filter(int /*level*/, const Slice& key, const Slice& /*value*/, - std::string* new_value, - bool* value_changed) const override { + bool Filter(int /*level*/, const Slice& key, const Slice& /*value*/, + std::string* new_value, bool* value_changed) const override { assert(new_value != nullptr); std::string search_str = "0123456789"; @@ -334,9 +333,7 @@ class TtlTest : public testing::Test { } } - virtual const char* Name() const override { - return "TestFilter"; - } + const char* Name() const override { return "TestFilter"; } private: const int64_t kSampleSize_; @@ -350,17 +347,15 @@ class TtlTest : public testing::Test { kNewValue_(kNewValue) { } - virtual std::unique_ptr CreateCompactionFilter( + std::unique_ptr CreateCompactionFilter( const CompactionFilter::Context& /*context*/) override { return std::unique_ptr( new TestFilter(kSampleSize_, kNewValue_)); } - virtual const char* Name() const override { - return "TestFilterFactory"; - } + const char* Name() const override { return "TestFilterFactory"; } - private: + private: const int64_t kSampleSize_; const std::string kNewValue_; }; @@ -370,14 +365,14 @@ class TtlTest : public testing::Test { static const int64_t kSampleSize_ = 100; std::string dbname_; DBWithTTL* db_ttl_; - unique_ptr env_; + std::unique_ptr env_; private: Options options_; KVMap kvmap_; KVMap::iterator kv_it_; const std::string kNewValue_ = "new_value"; - unique_ptr test_comp_filter_; + std::unique_ptr test_comp_filter_; }; // class TtlTest // If TTL is non positive or not provided, the behaviour is TTL = infinity diff --git a/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index.cc b/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index.cc index 2202d6baf..c620ebd4d 100644 --- a/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index.cc +++ b/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index.cc @@ -42,7 +42,7 @@ class BaseDeltaIterator : public Iterator { delta_iterator_(delta_iterator), comparator_(comparator) {} - virtual ~BaseDeltaIterator() {} + ~BaseDeltaIterator() override {} bool Valid() const override { return current_at_base_ ? 
BaseValid() : DeltaValid(); @@ -340,9 +340,9 @@ class WBWIIteratorImpl : public WBWIIterator { skip_list_iter_(skip_list), write_batch_(write_batch) {} - virtual ~WBWIIteratorImpl() {} + ~WBWIIteratorImpl() override {} - virtual bool Valid() const override { + bool Valid() const override { if (!skip_list_iter_.Valid()) { return false; } @@ -351,14 +351,14 @@ class WBWIIteratorImpl : public WBWIIterator { iter_entry->column_family == column_family_id_); } - virtual void SeekToFirst() override { + void SeekToFirst() override { WriteBatchIndexEntry search_entry( nullptr /* search_key */, column_family_id_, true /* is_forward_direction */, true /* is_seek_to_first */); skip_list_iter_.Seek(&search_entry); } - virtual void SeekToLast() override { + void SeekToLast() override { WriteBatchIndexEntry search_entry( nullptr /* search_key */, column_family_id_ + 1, true /* is_forward_direction */, true /* is_seek_to_first */); @@ -370,25 +370,25 @@ class WBWIIteratorImpl : public WBWIIterator { } } - virtual void Seek(const Slice& key) override { + void Seek(const Slice& key) override { WriteBatchIndexEntry search_entry(&key, column_family_id_, true /* is_forward_direction */, false /* is_seek_to_first */); skip_list_iter_.Seek(&search_entry); } - virtual void SeekForPrev(const Slice& key) override { + void SeekForPrev(const Slice& key) override { WriteBatchIndexEntry search_entry(&key, column_family_id_, false /* is_forward_direction */, false /* is_seek_to_first */); skip_list_iter_.SeekForPrev(&search_entry); } - virtual void Next() override { skip_list_iter_.Next(); } + void Next() override { skip_list_iter_.Next(); } - virtual void Prev() override { skip_list_iter_.Prev(); } + void Prev() override { skip_list_iter_.Prev(); } - virtual WriteEntry Entry() const override { + WriteEntry Entry() const override { WriteEntry ret; Slice blob, xid; const WriteBatchIndexEntry* iter_entry = skip_list_iter_.key(); @@ -404,7 +404,7 @@ class WBWIIteratorImpl : public WBWIIterator { return ret; } - virtual Status status() const override { + Status status() const override { // this is in-memory data structure, so the only way status can be non-ok is // through memory corruption return Status::OK(); diff --git a/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index_test.cc b/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index_test.cc index d477968ca..be715fe32 100644 --- a/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index_test.cc +++ b/ceph/src/rocksdb/utilities/write_batch_with_index/write_batch_with_index_test.cc @@ -45,8 +45,8 @@ struct Entry { struct TestHandler : public WriteBatch::Handler { std::map> seen; - virtual Status PutCF(uint32_t column_family_id, const Slice& key, - const Slice& value) { + Status PutCF(uint32_t column_family_id, const Slice& key, + const Slice& value) override { Entry e; e.key = key.ToString(); e.value = value.ToString(); @@ -54,8 +54,8 @@ struct TestHandler : public WriteBatch::Handler { seen[column_family_id].push_back(e); return Status::OK(); } - virtual Status MergeCF(uint32_t column_family_id, const Slice& key, - const Slice& value) { + Status MergeCF(uint32_t column_family_id, const Slice& key, + const Slice& value) override { Entry e; e.key = key.ToString(); e.value = value.ToString(); @@ -63,8 +63,8 @@ struct TestHandler : public WriteBatch::Handler { seen[column_family_id].push_back(e); return Status::OK(); } - virtual void LogData(const Slice& /*blob*/) {} - virtual Status DeleteCF(uint32_t column_family_id, 
const Slice& key) { + void LogData(const Slice& /*blob*/) override {} + Status DeleteCF(uint32_t column_family_id, const Slice& key) override { Entry e; e.key = key.ToString(); e.value = ""; @@ -506,22 +506,24 @@ typedef std::map KVMap; class KVIter : public Iterator { public: explicit KVIter(const KVMap* map) : map_(map), iter_(map_->end()) {} - virtual bool Valid() const { return iter_ != map_->end(); } - virtual void SeekToFirst() { iter_ = map_->begin(); } - virtual void SeekToLast() { + bool Valid() const override { return iter_ != map_->end(); } + void SeekToFirst() override { iter_ = map_->begin(); } + void SeekToLast() override { if (map_->empty()) { iter_ = map_->end(); } else { iter_ = map_->find(map_->rbegin()->first); } } - virtual void Seek(const Slice& k) { iter_ = map_->lower_bound(k.ToString()); } - virtual void SeekForPrev(const Slice& k) { + void Seek(const Slice& k) override { + iter_ = map_->lower_bound(k.ToString()); + } + void SeekForPrev(const Slice& k) override { iter_ = map_->upper_bound(k.ToString()); Prev(); } - virtual void Next() { ++iter_; } - virtual void Prev() { + void Next() override { ++iter_; } + void Prev() override { if (iter_ == map_->begin()) { iter_ = map_->end(); return; @@ -529,9 +531,9 @@ class KVIter : public Iterator { --iter_; } - virtual Slice key() const { return iter_->first; } - virtual Slice value() const { return iter_->second; } - virtual Status status() const { return Status::OK(); } + Slice key() const override { return iter_->first; } + Slice value() const override { return iter_->second; } + Status status() const override { return Status::OK(); } private: const KVMap* const map_; diff --git a/ceph/src/test/CMakeLists.txt b/ceph/src/test/CMakeLists.txt index 20a64fd41..5dcee1694 100644 --- a/ceph/src/test/CMakeLists.txt +++ b/ceph/src/test/CMakeLists.txt @@ -68,6 +68,7 @@ add_subdirectory(system) if(WITH_FIO OR WITH_SYSTEM_FIO) add_subdirectory(fio) endif() +add_subdirectory(lazy-omap-stats) # test_timers add_executable(ceph_test_timers diff --git a/ceph/src/test/cli-integration/rbd/snap-diff.t b/ceph/src/test/cli-integration/rbd/snap-diff.t new file mode 100644 index 000000000..c0f56399c --- /dev/null +++ b/ceph/src/test/cli-integration/rbd/snap-diff.t @@ -0,0 +1,48 @@ + $ ceph osd pool create xrbddiff1 8 + pool 'xrbddiff1' created + $ rbd pool init xrbddiff1 + $ rbd create --thick-provision --size 1M xrbddiff1/xtestdiff1 --no-progress + $ rbd diff xrbddiff1/xtestdiff1 --format json + [{"offset":0,"length":1048576,"exists":"true"}] + $ rbd rm xrbddiff1/xtestdiff1 --no-progress + $ rbd create --size 1M xrbddiff1/xtestdiff1 + $ rbd diff xrbddiff1/xtestdiff1 --format json + [] + $ rbd snap create xrbddiff1/xtestdiff1 --snap=allzeroes + $ rbd diff xrbddiff1/xtestdiff1 --format json + [] + $ rbd diff --from-snap=allzeroes xrbddiff1/xtestdiff1 --format json + [] + $ rbd bench --io-type write --io-size 1M --io-total 1M xrbddiff1/xtestdiff1 > /dev/null 2>&1 + $ rbd diff xrbddiff1/xtestdiff1 --format json + [{"offset":0,"length":1048576,"exists":"true"}] + $ rbd diff --from-snap=allzeroes xrbddiff1/xtestdiff1 --format json + [{"offset":0,"length":1048576,"exists":"true"}] + $ rbd snap create xrbddiff1/xtestdiff1 --snap=snap1 + $ rbd snap list xrbddiff1/xtestdiff1 --format json | python -mjson.tool | sed 's/,$/, /' + [ + { + "id": *, (glob) + "name": "allzeroes", + "protected": "false", + "size": 1048576, + "timestamp": * (glob) + }, + { + "id": *, (glob) + "name": "snap1", + "protected": "false", + "size": 1048576, + "timestamp": * (glob) + 
} + ] + $ rbd diff --from-snap=snap1 xrbddiff1/xtestdiff1 --format json + [] + $ rbd snap rollback xrbddiff1/xtestdiff1@snap1 --no-progress + $ rbd diff --from-snap=snap1 xrbddiff1/xtestdiff1 --format json + [] + $ rbd snap rollback xrbddiff1/xtestdiff1@allzeroes --no-progress + $ rbd diff --from-snap=allzeroes xrbddiff1/xtestdiff1 --format json + [{"offset":0,"length":1048576,"exists":"false"}] + $ ceph osd pool rm xrbddiff1 xrbddiff1 --yes-i-really-really-mean-it + pool 'xrbddiff1' removed diff --git a/ceph/src/test/cli/ceph-kvstore-tool/help.t b/ceph/src/test/cli/ceph-kvstore-tool/help.t index d27fddc08..1f984b11e 100644 --- a/ceph/src/test/cli/ceph-kvstore-tool/help.t +++ b/ceph/src/test/cli/ceph-kvstore-tool/help.t @@ -18,4 +18,5 @@ compact-prefix compact-range destructive-repair (use only as last resort! may corrupt healthy data) + stats diff --git a/ceph/src/test/cli/radosgw-admin/help.t b/ceph/src/test/cli/radosgw-admin/help.t index 9ae2580e9..08122fdc9 100644 --- a/ceph/src/test/cli/radosgw-admin/help.t +++ b/ceph/src/test/cli/radosgw-admin/help.t @@ -17,7 +17,8 @@ subuser rm remove subuser key create create access key key rm remove access key - bucket list list buckets + bucket list list buckets (specify --allow-unordered for + faster, unsorted listing) bucket limit check show bucket sharding stats bucket link link bucket to specified user bucket unlink unlink bucket from specified user diff --git a/ceph/src/test/common/test_context.cc b/ceph/src/test/common/test_context.cc index 1d1e22e25..9aec89114 100644 --- a/ceph/src/test/common/test_context.cc +++ b/ceph/src/test/common/test_context.cc @@ -56,7 +56,7 @@ TEST(CephContext, do_command) bufferlist out; cct->do_command("config diff get", cmdmap, "xml", &out); string s(out.c_str(), out.length()); - EXPECT_EQ("" + value + "value", s); + EXPECT_EQ("" + value + "value6161", s); } cct->put(); } diff --git a/ceph/src/test/journal/mock/MockJournaler.h b/ceph/src/test/journal/mock/MockJournaler.h index b925ddfeb..236a42f90 100644 --- a/ceph/src/test/journal/mock/MockJournaler.h +++ b/ceph/src/test/journal/mock/MockJournaler.h @@ -120,7 +120,8 @@ struct MockJournaler { MOCK_METHOD0(stop_replay, void()); MOCK_METHOD1(stop_replay, void(Context *on_finish)); - MOCK_METHOD4(start_append, void(int, uint64_t, double, uint64_t)); + MOCK_METHOD1(start_append, void(uint64_t)); + MOCK_METHOD3(set_append_batch_options, void(int, uint64_t, double)); MOCK_CONST_METHOD0(get_max_append_size, uint64_t()); MOCK_METHOD2(append, MockFutureProxy(uint64_t tag_id, const bufferlist &bl)); @@ -257,11 +258,14 @@ struct MockJournalerProxy { MockJournaler::get_instance().stop_replay(on_finish); } - void start_append(int flush_interval, uint64_t flush_bytes, double flush_age, - uint64_t max_in_flight_appends) { - MockJournaler::get_instance().start_append(flush_interval, flush_bytes, - flush_age, - max_in_flight_appends); + void start_append(uint64_t max_in_flight_appends) { + MockJournaler::get_instance().start_append(max_in_flight_appends); + } + + void set_append_batch_options(int flush_interval, uint64_t flush_bytes, + double flush_age) { + MockJournaler::get_instance().set_append_batch_options( + flush_interval, flush_bytes, flush_age); } uint64_t get_max_append_size() const { diff --git a/ceph/src/test/journal/test_JournalRecorder.cc b/ceph/src/test/journal/test_JournalRecorder.cc index fb7c06772..7197526a1 100644 --- a/ceph/src/test/journal/test_JournalRecorder.cc +++ b/ceph/src/test/journal/test_JournalRecorder.cc @@ -22,8 +22,9 @@ public: 
journal::JournalRecorder *create_recorder( const std::string &oid, const journal::JournalMetadataPtr &metadata) { journal::JournalRecorder *recorder(new journal::JournalRecorder( - m_ioctx, oid + ".", metadata, 0, std::numeric_limits::max(), - 0, 0)); + m_ioctx, oid + ".", metadata, 0)); + recorder->set_append_batch_options(0, std::numeric_limits::max(), + 0); m_recorders.push_back(recorder); return recorder; } diff --git a/ceph/src/test/journal/test_ObjectPlayer.cc b/ceph/src/test/journal/test_ObjectPlayer.cc index 3c255c9ed..b78bd219f 100644 --- a/ceph/src/test/journal/test_ObjectPlayer.cc +++ b/ceph/src/test/journal/test_ObjectPlayer.cc @@ -155,21 +155,25 @@ TYPED_TEST(TestObjectPlayer, FetchCorrupt) { journal::Entry entry1(234, 123, this->create_payload(std::string(24, '1'))); journal::Entry entry2(234, 124, this->create_payload(std::string(24, '2'))); + journal::Entry entry3(234, 125, this->create_payload(std::string(24, '3'))); bufferlist bl; encode(entry1, bl); - encode(this->create_payload("corruption"), bl); + encode(this->create_payload("corruption" + std::string(1024, 'X')), bl); encode(entry2, bl); + encode(this->create_payload("corruption" + std::string(1024, 'Y')), bl); + encode(entry3, bl); ASSERT_EQ(0, this->append(this->get_object_name(oid), bl)); journal::ObjectPlayerPtr object = this->create_object(oid, 14); ASSERT_EQ(-EBADMSG, this->fetch(object)); + ASSERT_EQ(0, this->fetch(object)); journal::ObjectPlayer::Entries entries; object->get_entries(&entries); - ASSERT_EQ(2U, entries.size()); + ASSERT_EQ(3U, entries.size()); - journal::ObjectPlayer::Entries expected_entries = {entry1, entry2}; + journal::ObjectPlayer::Entries expected_entries = {entry1, entry2, entry3}; ASSERT_EQ(expected_entries, entries); } diff --git a/ceph/src/test/journal/test_ObjectRecorder.cc b/ceph/src/test/journal/test_ObjectRecorder.cc index 21c741e58..3cc8e893c 100644 --- a/ceph/src/test/journal/test_ObjectRecorder.cc +++ b/ceph/src/test/journal/test_ObjectRecorder.cc @@ -72,14 +72,12 @@ public: RadosTestFixture::TearDown(); } - inline void set_flush_interval(uint32_t i) { - m_flush_interval = i; - } - inline void set_flush_bytes(uint64_t i) { - m_flush_bytes = i; - } - inline void set_flush_age(double i) { - m_flush_age = i; + inline void set_batch_options(uint32_t flush_interval, uint64_t flush_bytes, + double flush_age, int max_in_flight) { + m_flush_interval = flush_interval; + m_flush_bytes = flush_bytes; + m_flush_age = flush_age; + m_max_in_flight_appends = max_in_flight; } journal::AppendBuffer create_append_buffer(uint64_t tag_tid, uint64_t entry_tid, @@ -96,9 +94,13 @@ public: journal::ObjectRecorderPtr create_object(const std::string &oid, uint8_t order, shared_ptr lock) { journal::ObjectRecorderPtr object(new journal::ObjectRecorder( - m_ioctx, oid, 0, lock, m_work_queue, *m_timer, m_timer_lock, &m_handler, - order, m_flush_interval, m_flush_bytes, m_flush_age, + m_ioctx, oid, 0, lock, m_work_queue, &m_handler, order, m_max_in_flight_appends)); + { + Mutex::Locker locker(*lock); + object->set_append_batch_options(m_flush_interval, m_flush_bytes, + m_flush_age); + } m_object_recorders.push_back(object); m_object_recorder_locks.insert(std::make_pair(oid, lock)); m_handler.object_lock = lock; @@ -113,6 +115,7 @@ TEST_F(TestObjectRecorder, Append) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); + set_batch_options(0, 0, 0, 0); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, 
lock); @@ -121,15 +124,17 @@ TEST_F(TestObjectRecorder, Append) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); - ASSERT_EQ(1U, object->get_pending_appends()); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); + ASSERT_EQ(0U, object->get_pending_appends()); journal::AppendBuffer append_buffer2 = create_append_buffer(234, 124, "payload"); append_buffers = {append_buffer2}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); - ASSERT_EQ(2U, object->get_pending_appends()); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); + ASSERT_EQ(0U, object->get_pending_appends()); C_SaferCond cond; append_buffer2.first->flush(&cond); @@ -144,7 +149,7 @@ TEST_F(TestObjectRecorder, AppendFlushByCount) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); - set_flush_interval(2); + set_batch_options(2, 0, 0, -1); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, lock); @@ -153,14 +158,16 @@ TEST_F(TestObjectRecorder, AppendFlushByCount) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(1U, object->get_pending_appends()); journal::AppendBuffer append_buffer2 = create_append_buffer(234, 124, "payload"); append_buffers = {append_buffer2}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(0U, object->get_pending_appends()); C_SaferCond cond; @@ -175,7 +182,7 @@ TEST_F(TestObjectRecorder, AppendFlushByBytes) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); - set_flush_bytes(10); + set_batch_options(0, 10, 0, -1); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, lock); @@ -184,14 +191,16 @@ TEST_F(TestObjectRecorder, AppendFlushByBytes) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(1U, object->get_pending_appends()); journal::AppendBuffer append_buffer2 = create_append_buffer(234, 124, "payload"); append_buffers = {append_buffer2}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(0U, object->get_pending_appends()); C_SaferCond cond; @@ -206,7 +215,7 @@ TEST_F(TestObjectRecorder, AppendFlushByAge) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); - set_flush_age(0.1); + set_batch_options(0, 0, 0.1, -1); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, lock); @@ -215,13 +224,15 @@ TEST_F(TestObjectRecorder, AppendFlushByAge) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); journal::AppendBuffer 
append_buffer2 = create_append_buffer(234, 124, "payload"); append_buffers = {append_buffer2}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); C_SaferCond cond; append_buffer2.first->wait(&cond); @@ -245,13 +256,15 @@ TEST_F(TestObjectRecorder, AppendFilledObject) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); journal::AppendBuffer append_buffer2 = create_append_buffer(234, 124, payload); append_buffers = {append_buffer2}; lock->Lock(); - ASSERT_TRUE(object->append_unlock(std::move(append_buffers))); + ASSERT_TRUE(object->append(std::move(append_buffers))); + lock->Unlock(); C_SaferCond cond; append_buffer2.first->wait(&cond); @@ -266,6 +279,7 @@ TEST_F(TestObjectRecorder, Flush) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); + set_batch_options(0, 10, 0, -1); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, lock); @@ -274,7 +288,8 @@ TEST_F(TestObjectRecorder, Flush) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(1U, object->get_pending_appends()); C_SaferCond cond1; @@ -294,6 +309,7 @@ TEST_F(TestObjectRecorder, FlushFuture) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); + set_batch_options(0, 10, 0, -1); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, lock); @@ -302,15 +318,13 @@ TEST_F(TestObjectRecorder, FlushFuture) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer}; lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(1U, object->get_pending_appends()); C_SaferCond cond; append_buffer.first->wait(&cond); - lock->Lock(); object->flush(append_buffer.first); - ASSERT_TRUE(lock->is_locked()); - lock->Unlock(); ASSERT_TRUE(append_buffer.first->is_flush_in_progress() || append_buffer.first->is_complete()); ASSERT_EQ(0, cond.wait()); @@ -332,13 +346,11 @@ TEST_F(TestObjectRecorder, FlushDetachedFuture) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer}; - lock->Lock(); object->flush(append_buffer.first); - ASSERT_TRUE(lock->is_locked()); - lock->Unlock(); ASSERT_FALSE(append_buffer.first->is_flush_in_progress()); lock->Lock(); - ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); // should automatically flush once its attached to the object C_SaferCond cond; @@ -353,7 +365,7 @@ TEST_F(TestObjectRecorder, Close) { journal::JournalMetadataPtr metadata = create_metadata(oid); ASSERT_EQ(0, init_metadata(metadata)); - set_flush_interval(2); + set_batch_options(2, 0, 0, -1); shared_ptr lock(new Mutex("object_recorder_lock")); journal::ObjectRecorderPtr object = create_object(oid, 24, lock); @@ -362,7 +374,8 @@ TEST_F(TestObjectRecorder, Close) { journal::AppendBuffers append_buffers; append_buffers = {append_buffer1}; lock->Lock(); - 
ASSERT_FALSE(object->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object->append(std::move(append_buffers))); + lock->Unlock(); ASSERT_EQ(1U, object->get_pending_appends()); lock->Lock(); @@ -393,10 +406,8 @@ TEST_F(TestObjectRecorder, Overflow) { shared_ptr<Mutex> lock1(new Mutex("object_recorder_lock_1")); journal::ObjectRecorderPtr object1 = create_object(oid, 12, lock1); - shared_ptr<Mutex> lock2(new Mutex("object_recorder_lock_2")); - journal::ObjectRecorderPtr object2 = create_object(oid, 12, lock2); - std::string payload(2048, '1'); + std::string payload(1 << 11, '1'); journal::AppendBuffer append_buffer1 = create_append_buffer(234, 123, payload); journal::AppendBuffer append_buffer2 = create_append_buffer(234, 124, @@ -404,22 +415,43 @@ journal::AppendBuffers append_buffers; append_buffers = {append_buffer1, append_buffer2}; lock1->Lock(); - ASSERT_TRUE(object1->append_unlock(std::move(append_buffers))); + ASSERT_TRUE(object1->append(std::move(append_buffers))); + lock1->Unlock(); C_SaferCond cond; append_buffer2.first->wait(&cond); ASSERT_EQ(0, cond.wait()); ASSERT_EQ(0U, object1->get_pending_appends()); + bool overflowed = false; + { + Mutex::Locker locker(m_handler.lock); + while (m_handler.overflows == 0) { + if (m_handler.cond.WaitInterval( + m_handler.lock, utime_t(10, 0)) != 0) { + break; + } + } + if (m_handler.overflows != 0) { + overflowed = true; + m_handler.overflows = 0; + } + } + + ASSERT_TRUE(overflowed); + + shared_ptr<Mutex> lock2(new Mutex("object_recorder_lock_2")); + journal::ObjectRecorderPtr object2 = create_object(oid, 12, lock2); + journal::AppendBuffer append_buffer3 = create_append_buffer(456, 123, payload); append_buffers = {append_buffer3}; - lock2->Lock(); - ASSERT_FALSE(object2->append_unlock(std::move(append_buffers))); + ASSERT_FALSE(object2->append(std::move(append_buffers))); + lock2->Unlock(); append_buffer3.first->flush(NULL); - bool overflowed = false; + overflowed = false; { Mutex::Locker locker(m_handler.lock); while (m_handler.overflows == 0) { diff --git a/ceph/src/test/lazy-omap-stats/CMakeLists.txt b/ceph/src/test/lazy-omap-stats/CMakeLists.txt new file mode 100644 index 000000000..fad71f135 --- /dev/null +++ b/ceph/src/test/lazy-omap-stats/CMakeLists.txt @@ -0,0 +1,10 @@ +# Lazy omap stat collection tests + +add_executable(ceph_test_lazy_omap_stats + main.cc + lazy_omap_stats_test.cc) +target_link_libraries(ceph_test_lazy_omap_stats + librados ${UNITTEST_LIBS} Boost::system) +install(TARGETS + ceph_test_lazy_omap_stats + DESTINATION ${CMAKE_INSTALL_BINDIR}) diff --git a/ceph/src/test/lazy-omap-stats/lazy_omap_stats_test.cc b/ceph/src/test/lazy-omap-stats/lazy_omap_stats_test.cc new file mode 100644 index 000000000..dd461429f --- /dev/null +++ b/ceph/src/test/lazy-omap-stats/lazy_omap_stats_test.cc @@ -0,0 +1,567 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab +/* + * Ceph - scalable distributed file system + * + * Copyright (C) 2019 Red Hat + * + * This is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License version 2.1, as published by the Free Software + * Foundation. See file COPYING. + * + */ + + +#include +#include +#include +#include +#include <boost/uuid/uuid.hpp> // uuid class +#include <boost/uuid/uuid_generators.hpp> // generators +#include <boost/uuid/uuid_io.hpp> // streaming operators etc.
+#include +#include +#include +#include + +#include "lazy_omap_stats_test.h" + +using namespace std; +namespace bp = boost::process; + +void LazyOmapStatsTest::init(const int argc, const char** argv) +{ + int ret = rados.init("admin"); + if (ret < 0) { + ret = -ret; + cerr << "Failed to initialise rados! Error: " << ret << " " << strerror(ret) + << endl; + exit(ret); + } + + ret = rados.conf_parse_argv(argc, argv); + if (ret < 0) { + ret = -ret; + cerr << "Failed to parse command line config options! Error: " << ret << " " + << strerror(ret) << endl; + exit(ret); + } + + rados.conf_parse_env(NULL); + if (ret < 0) { + ret = -ret; + cerr << "Failed to parse environment! Error: " << ret << " " + << strerror(ret) << endl; + exit(ret); + } + + rados.conf_read_file(NULL); + if (ret < 0) { + ret = -ret; + cerr << "Failed to read config file! Error: " << ret << " " << strerror(ret) + << endl; + exit(ret); + } + + ret = rados.connect(); + if (ret < 0) { + ret = -ret; + cerr << "Failed to connect to running cluster! Error: " << ret << " " + << strerror(ret) << endl; + exit(ret); + } + + string command = R"( + { + "prefix": "osd pool create", + "pool": ")" + conf.pool_name + + R"(", + "pool_type": "replicated", + "size": )" + to_string(conf.replica_count) + + R"( + })"; + librados::bufferlist inbl; + string output; + ret = rados.mon_command(command, inbl, nullptr, &output); + if (output.length()) cout << output << endl; + if (ret < 0) { + ret = -ret; + cerr << "Failed to create pool! Error: " << ret << " " << strerror(ret) + << endl; + exit(ret); + } + + ret = rados.ioctx_create(conf.pool_name.c_str(), io_ctx); + if (ret < 0) { + ret = -ret; + cerr << "Failed to create ioctx! Error: " << ret << " " << strerror(ret) + << endl; + exit(ret); + } +} + +void LazyOmapStatsTest::shutdown() +{ + rados.pool_delete(conf.pool_name.c_str()); + rados.shutdown(); +} + +void LazyOmapStatsTest::write_omap(const string& object_name) +{ + librados::bufferlist bl; + int ret = io_ctx.write_full(object_name, bl); + if (ret < 0) { + ret = -ret; + cerr << "Failed to create object! Error: " << ret << " " << strerror(ret) + << endl; + exit(ret); + } + ret = io_ctx.omap_set(object_name, payload); + if (ret < 0) { + ret = -ret; + cerr << "Failed to write omap payload! Error: " << ret << " " + << strerror(ret) << endl; + exit(ret); + } + cout << "Wrote " << conf.keys << " omap keys of " << conf.payload_size + << " bytes to " + << "the " << object_name << " object" << endl; +} + +const string LazyOmapStatsTest::get_name() const +{ + boost::uuids::uuid uuid = boost::uuids::random_generator()(); + return boost::uuids::to_string(uuid); +} + +void LazyOmapStatsTest::write_many(uint how_many) +{ + for (uint i = 0; i < how_many; i++) { + write_omap(get_name()); + } +} + +void LazyOmapStatsTest::create_payload() +{ + librados::bufferlist Lorem; + Lorem.append( + "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do " + "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut " + "enim ad minim veniam, quis nostrud exercitation ullamco laboris " + "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in " + "reprehenderit in voluptate velit esse cillum dolore eu fugiat " + "nulla pariatur. 
Excepteur sint occaecat cupidatat non proident, " + "sunt in culpa qui officia deserunt mollit anim id est laborum."); + conf.payload_size = Lorem.length(); + conf.total_bytes = conf.keys * conf.payload_size * conf.how_many; + conf.total_keys = conf.keys * conf.how_many; + uint i = 0; + for (i = 1; i < conf.keys + 1; ++i) { + payload[get_name()] = Lorem; + } + cout << "Created payload with " << conf.keys << " keys of " + << conf.payload_size + << " bytes each. Total size in bytes = " << conf.keys * conf.payload_size + << endl; +} + +void LazyOmapStatsTest::scrub() const +{ + // Use CLI because we need to block + + cout << "Scrubbing" << endl; + error_code ec; + bp::ipstream is; + bp::system("ceph osd deep-scrub all --block", bp::std_out > is, ec); + if (ec) { + cout << "Deep scrub command failed! Error: " << ec.value() << " " + << ec.message() << endl; + exit(ec.value()); + } + cout << is.rdbuf() << endl; +} + +const int LazyOmapStatsTest::find_matches(string& output, regex& reg) const +{ + sregex_iterator cur(output.begin(), output.end(), reg); + uint x = 0; + for (auto end = std::sregex_iterator(); cur != end; ++cur) { + cout << (*cur)[1].str() << endl; + x++; + } + return x; +} + +const string LazyOmapStatsTest::get_output(const string command, + const bool silent) +{ + librados::bufferlist inbl, outbl; + string output; + int ret = rados.mgr_command(command, inbl, &outbl, &output); + if (output.length() && !silent) { + cout << output << endl; + } + if (ret < 0) { + ret = -ret; + cerr << "Failed to get " << command << "! Error: " << ret << " " + << strerror(ret) << endl; + exit(ret); + } + return string(outbl.c_str(), outbl.length()); +} + +void LazyOmapStatsTest::check_one() +{ + string full_output = get_output(); + cout << full_output << endl; + regex reg( + "\n" + R"((PG_STAT[\s\S]*)" + "\n)OSD_STAT"); // Strip OSD_STAT table so we don't find matches there + smatch match; + regex_search(full_output, match, reg); + auto truncated_output = match[1].str(); + cout << truncated_output << endl; + reg = regex( + "\n" + R"(([0-9,s].*\s)" + + to_string(conf.keys) + + R"(\s.*))" + "\n"); + + cout << "Checking number of keys " << conf.keys << endl; + cout << "Found the following lines" << endl; + cout << "*************************" << endl; + uint result = find_matches(truncated_output, reg); + cout << "**********************" << endl; + cout << "Found " << result << " matching line(s)" << endl; + uint total = result; + + reg = regex( + "\n" + R"(([0-9,s].*\s)" + + to_string(conf.payload_size * conf.keys) + + R"(\s.*))" + "\n"); + cout << "Checking number of bytes " + << conf.payload_size * conf.keys << endl; + cout << "Found the following lines" << endl; + cout << "*************************" << endl; + result = find_matches(truncated_output, reg); + cout << "**********************" << endl; + cout << "Found " << result << " matching line(s)" << endl; + + total += result; + if (total != 6) { + cout << "Error: Found " << total << " matches, expected 6! Exiting..." + << endl; + exit(22); // EINVAL + } + cout << "check_one successful. 
Found " << total << " matches as expected" + << endl; +} + +const int LazyOmapStatsTest::find_index(string& haystack, regex& needle, + string label) const +{ + smatch match; + regex_search(haystack, match, needle); + auto line = match[1].str(); + boost::algorithm::trim(line); + boost::char_separator sep{" "}; + boost::tokenizer> tok(line, sep); + vector tokens(tok.begin(), tok.end()); + auto it = find(tokens.begin(), tokens.end(), label); + if (it != tokens.end()) { + return distance(tokens.begin(), it); + } + + cerr << "find_index failed to find index for " << label << endl; + exit(2); // ENOENT + return -1; // Unreachable +} + +const uint LazyOmapStatsTest::tally_column(const uint omap_bytes_index, + const string& table, + bool header) const +{ + istringstream buffer(table); + string line; + uint64_t total = 0; + while (std::getline(buffer, line)) { + if (header) { + header = false; + continue; + } + boost::char_separator sep{" "}; + boost::tokenizer> tok(line, sep); + vector tokens(tok.begin(), tok.end()); + total += stoi(tokens.at(omap_bytes_index)); + } + + return total; +} + +void LazyOmapStatsTest::check_column(const int index, const string& table, + const string& type, bool header) const +{ + uint expected; + string errormsg; + if (type.compare("bytes") == 0) { + expected = conf.total_bytes; + errormsg = "Error. Got unexpected byte count!"; + } else { + expected = conf.total_keys; + errormsg = "Error. Got unexpected key count!"; + } + uint sum = tally_column(index, table, header); + cout << "Got: " << sum << " Expected: " << expected << endl; + if (sum != expected) { + cout << errormsg << endl; + exit(22); // EINVAL + } +} + +index_t LazyOmapStatsTest::get_indexes(regex& reg, string& output) const +{ + index_t indexes; + indexes.byte_index = find_index(output, reg, "OMAP_BYTES*"); + indexes.key_index = find_index(output, reg, "OMAP_KEYS*"); + + return indexes; +} + +const string LazyOmapStatsTest::get_pool_id(string& pool) +{ + cout << R"(Querying pool id)" << endl; + + string command = R"({"prefix": "osd pool ls", "detail": "detail"})"; + librados::bufferlist inbl, outbl; + string output; + int ret = rados.mon_command(command, inbl, &outbl, &output); + if (output.length()) cout << output << endl; + if (ret < 0) { + ret = -ret; + cerr << "Failed to get pool id! 
Error: " << ret << " " << strerror(ret) + << endl; + exit(ret); + } + string dump_output(outbl.c_str(), outbl.length()); + cout << dump_output << endl; + + string poolregstring = R"(pool\s(\d+)\s')" + pool + "'"; + regex reg(poolregstring); + smatch match; + regex_search(dump_output, match, reg); + auto pool_id = match[1].str(); + cout << "Found pool ID: " << pool_id << endl; + + return pool_id; +} + +void LazyOmapStatsTest::check_pg_dump() +{ + cout << R"(Checking "pg dump" output)" << endl; + + string dump_output = get_output(); + cout << dump_output << endl; + + regex reg( + "\n" + R"((PG_STAT\s.*))" + "\n"); + index_t indexes = get_indexes(reg, dump_output); + + reg = + "\n" + R"((PG_STAT[\s\S]*))" + "\n +\n[0-9]"; + smatch match; + regex_search(dump_output, match, reg); + auto table = match[1].str(); + + cout << "Checking bytes" << endl; + check_column(indexes.byte_index, table, string("bytes")); + + cout << "Checking keys" << endl; + check_column(indexes.key_index, table, string("keys")); + + cout << endl; +} + +void LazyOmapStatsTest::check_pg_dump_summary() +{ + cout << R"(Checking "pg dump summary" output)" << endl; + + string command = R"({"prefix": "pg dump", "dumpcontents": ["summary"]})"; + string dump_output = get_output(command); + cout << dump_output << endl; + + regex reg( + "\n" + R"((PG_STAT\s.*))" + "\n"); + index_t indexes = get_indexes(reg, dump_output); + + reg = + "\n" + R"((sum\s.*))" + "\n"; + smatch match; + regex_search(dump_output, match, reg); + auto table = match[1].str(); + + cout << "Checking bytes" << endl; + check_column(indexes.byte_index, table, string("bytes"), false); + + cout << "Checking keys" << endl; + check_column(indexes.key_index, table, string("keys"), false); + cout << endl; +} + +void LazyOmapStatsTest::check_pg_dump_pgs() +{ + cout << R"(Checking "pg dump pgs" output)" << endl; + + string command = R"({"prefix": "pg dump", "dumpcontents": ["pgs"]})"; + string dump_output = get_output(command); + cout << dump_output << endl; + + regex reg(R"(^(PG_STAT\s.*))" + "\n"); + index_t indexes = get_indexes(reg, dump_output); + + reg = R"(^(PG_STAT[\s\S]*))" + "\n\n"; + smatch match; + regex_search(dump_output, match, reg); + auto table = match[1].str(); + + cout << "Checking bytes" << endl; + check_column(indexes.byte_index, table, string("bytes")); + + cout << "Checking keys" << endl; + check_column(indexes.key_index, table, string("keys")); + cout << endl; +} + +void LazyOmapStatsTest::check_pg_dump_pools() +{ + cout << R"(Checking "pg dump pools" output)" << endl; + + string command = R"({"prefix": "pg dump", "dumpcontents": ["pools"]})"; + string dump_output = get_output(command); + cout << dump_output << endl; + + regex reg(R"(^(POOLID\s.*))" + "\n"); + index_t indexes = get_indexes(reg, dump_output); + + auto pool_id = get_pool_id(conf.pool_name); + + reg = + "\n" + R"(()" + + pool_id + + R"(\s.*))" + "\n"; + smatch match; + regex_search(dump_output, match, reg); + auto line = match[1].str(); + + cout << "Checking bytes" << endl; + check_column(indexes.byte_index, line, string("bytes"), false); + + cout << "Checking keys" << endl; + check_column(indexes.key_index, line, string("keys"), false); + cout << endl; +} + +void LazyOmapStatsTest::check_pg_ls() +{ + cout << R"(Checking "pg ls" output)" << endl; + + string command = R"({"prefix": "pg ls"})"; + string dump_output = get_output(command); + cout << dump_output << endl; + + regex reg(R"(^(PG\s.*))" + "\n"); + index_t indexes = get_indexes(reg, dump_output); + + reg = R"(^(PG[\s\S]*))" + 
"\n\n"; + smatch match; + regex_search(dump_output, match, reg); + auto table = match[1].str(); + + cout << "Checking bytes" << endl; + check_column(indexes.byte_index, table, string("bytes")); + + cout << "Checking keys" << endl; + check_column(indexes.key_index, table, string("keys")); + cout << endl; +} + +void LazyOmapStatsTest::wait_for_active_clean() +{ + cout << "Waiting for active+clean" << endl; + + int index = -1; + regex reg( + "\n" + R"((PG_STAT[\s\S]*))" + "\n +\n[0-9]"); + string command = R"({"prefix": "pg dump"})"; + int num_not_clean; + do { + string dump_output = get_output(command, true); + if (index == -1) { + regex ireg( + "\n" + R"((PG_STAT\s.*))" + "\n"); + index = find_index(dump_output, ireg, "STATE"); + } + smatch match; + regex_search(dump_output, match, reg); + istringstream buffer(match[1].str()); + string line; + num_not_clean = 0; + while (std::getline(buffer, line)) { + if (line.compare(0, 1, "P") == 0) continue; + boost::char_separator sep{" "}; + boost::tokenizer> tok(line, sep); + vector tokens(tok.begin(), tok.end()); + num_not_clean += tokens.at(index).compare("active+clean"); + } + cout << "." << flush; + this_thread::sleep_for(chrono::milliseconds(250)); + } while (num_not_clean); + + cout << endl; +} + +const int LazyOmapStatsTest::run(const int argc, const char** argv) +{ + init(argc, argv); + create_payload(); + wait_for_active_clean(); + write_omap(get_name()); + scrub(); + check_one(); + + write_many(conf.how_many - 1); // Since we already wrote one + scrub(); + check_pg_dump(); + check_pg_dump_summary(); + check_pg_dump_pgs(); + check_pg_dump_pools(); + check_pg_ls(); + cout << "All tests passed. Success!" << endl; + + shutdown(); + + return 0; +} diff --git a/ceph/src/test/lazy-omap-stats/lazy_omap_stats_test.h b/ceph/src/test/lazy-omap-stats/lazy_omap_stats_test.h new file mode 100644 index 000000000..5399012ea --- /dev/null +++ b/ceph/src/test/lazy-omap-stats/lazy_omap_stats_test.h @@ -0,0 +1,79 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab +/* + * Ceph - scalable distributed file system + * + * Copyright (C) 2019 Red Hat + * + * This is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License version 2.1, as published by the Free Software + * Foundation. See file COPYING. 
+ * + */ + +#ifndef CEPH_LAZY_OMAP_STATS_TEST_H +#define CEPH_LAZY_OMAP_STATS_TEST_H + +#include <map> +#include <regex> +#include <string> + +#include "include/rados/librados.hpp" + +struct index_t { + uint byte_index = 0; + uint key_index = 0; +}; + +class LazyOmapStatsTest +{ + librados::IoCtx io_ctx; + librados::Rados rados; + std::map<std::string, librados::bufferlist> payload; + + struct lazy_omap_test_t { + uint payload_size = 0; + uint replica_count = 3; + uint keys = 2000; + uint how_many = 50; + std::string pool_name = "lazy_omap_test_pool"; + uint total_bytes = 0; + uint total_keys = 0; + } conf; + + LazyOmapStatsTest(LazyOmapStatsTest&) = delete; + void operator=(LazyOmapStatsTest) = delete; + void init(const int argc, const char** argv); + void shutdown(); + void write_omap(const std::string& object_name); + const std::string get_name() const; + void create_payload(); + void write_many(const uint how_many); + void scrub() const; + const int find_matches(std::string& output, std::regex& reg) const; + void check_one(); + const int find_index(std::string& haystack, std::regex& needle, + std::string label) const; + const uint tally_column(const uint omap_bytes_index, + const std::string& table, bool header) const; + void check_column(const int index, const std::string& table, + const std::string& type, bool header = true) const; + index_t get_indexes(std::regex& reg, std::string& output) const; + const std::string get_pool_id(std::string& pool); + void check_pg_dump(); + void check_pg_dump_summary(); + void check_pg_dump_pgs(); + void check_pg_dump_pools(); + void check_pg_ls(); + const std::string get_output( + const std::string command = R"({"prefix": "pg dump"})", + const bool silent = false); + void wait_for_active_clean(); + + public: + LazyOmapStatsTest() = default; + const int run(const int argc, const char** argv); +}; + +#endif // CEPH_LAZY_OMAP_STATS_TEST_H diff --git a/ceph/src/test/lazy-omap-stats/main.cc b/ceph/src/test/lazy-omap-stats/main.cc new file mode 100644 index 000000000..d379e8fbd --- /dev/null +++ b/ceph/src/test/lazy-omap-stats/main.cc @@ -0,0 +1,21 @@ +// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- +// vim: ts=8 sw=2 smarttab +/* + * Ceph - scalable distributed file system + * + * Copyright (C) 2019 Red Hat + * + * This is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License version 2.1, as published by the Free Software + * Foundation. See file COPYING.
+ * + */ + +#include "lazy_omap_stats_test.h" + +int main(const int argc, const char** argv) +{ + LazyOmapStatsTest app; + return app.run(argc, argv); +} diff --git a/ceph/src/test/librbd/fsx.cc b/ceph/src/test/librbd/fsx.cc index e0b40937d..24b0538e7 100644 --- a/ceph/src/test/librbd/fsx.cc +++ b/ceph/src/test/librbd/fsx.cc @@ -431,7 +431,7 @@ int replay_journal(rados_ioctx_t ioctx, const char *image_name, return r; } - replay_journaler.start_append(0, 0, 0, 0); + replay_journaler.start_append(0); C_SaferCond replay_ctx; ReplayHandler replay_handler(&journaler, &replay_journaler, diff --git a/ceph/src/test/librbd/io/test_mock_CopyupRequest.cc b/ceph/src/test/librbd/io/test_mock_CopyupRequest.cc index 823e5b0fe..c6a28595a 100644 --- a/ceph/src/test/librbd/io/test_mock_CopyupRequest.cc +++ b/ceph/src/test/librbd/io/test_mock_CopyupRequest.cc @@ -13,6 +13,7 @@ #include "librbd/deep_copy/ObjectCopyRequest.h" #include "librbd/io/CopyupRequest.h" #include "librbd/io/ImageRequest.h" +#include "librbd/io/ImageRequestWQ.h" #include "librbd/io/ObjectRequest.h" #include "librbd/io/ReadResult.h" @@ -310,6 +311,10 @@ struct TestMockIoCopyupRequest : public TestMockFixture { })); } + void flush_async_operations(librbd::ImageCtx* ictx) { + ictx->io_work_queue->flush(); + } + std::string m_parent_image_name; }; @@ -450,7 +455,7 @@ TEST_F(TestMockIoCopyupRequest, CopyOnRead) { {{0, 4096}}, {}); mock_image_ctx.copyup_list[0] = req; req->send(); - ictx->flush_async_operations(); + flush_async_operations(ictx); } TEST_F(TestMockIoCopyupRequest, CopyOnReadWithSnaps) { @@ -496,7 +501,7 @@ TEST_F(TestMockIoCopyupRequest, CopyOnReadWithSnaps) { {{0, 4096}}, {}); mock_image_ctx.copyup_list[0] = req; req->send(); - ictx->flush_async_operations(); + flush_async_operations(ictx); } TEST_F(TestMockIoCopyupRequest, DeepCopy) { @@ -580,10 +585,10 @@ TEST_F(TestMockIoCopyupRequest, DeepCopyOnRead) { {{0, 4096}}, {}); mock_image_ctx.copyup_list[0] = req; req->send(); - ictx->flush_async_operations(); + flush_async_operations(ictx); } -TEST_F(TestMockIoCopyupRequest, DeepCopyWithSnaps) { +TEST_F(TestMockIoCopyupRequest, DeepCopyWithPostSnaps) { REQUIRE_FEATURE(RBD_FEATURE_LAYERING); librbd::ImageCtx *ictx; @@ -616,6 +621,77 @@ TEST_F(TestMockIoCopyupRequest, DeepCopyWithSnaps) { InSequence seq; + MockAbstractObjectWriteRequest mock_write_request; + MockObjectCopyRequest mock_object_copy_request; + mock_image_ctx.migration_info = {1, "", "", "image id", + {{CEPH_NOSNAP, {2, 1}}}, + ictx->size, true}; + expect_is_empty_write_op(mock_write_request, false); + expect_object_copy(mock_image_ctx, mock_object_copy_request, true, 0); + + expect_is_empty_write_op(mock_write_request, false); + expect_get_parent_overlap(mock_image_ctx, 1, 0, 0); + expect_get_parent_overlap(mock_image_ctx, 2, 1, 0); + expect_prune_parent_extents(mock_image_ctx, 1, 1); + expect_get_parent_overlap(mock_image_ctx, 3, 1, 0); + expect_prune_parent_extents(mock_image_ctx, 1, 1); + expect_get_pre_write_object_map_state(mock_image_ctx, mock_write_request, + OBJECT_EXISTS); + expect_object_map_at(mock_image_ctx, 0, OBJECT_NONEXISTENT); + expect_object_map_update(mock_image_ctx, 2, 0, OBJECT_EXISTS, true, 0); + expect_object_map_update(mock_image_ctx, 3, 0, OBJECT_EXISTS_CLEAN, true, 0); + expect_object_map_update(mock_image_ctx, CEPH_NOSNAP, 0, OBJECT_EXISTS, true, + 0); + + expect_add_copyup_ops(mock_write_request); + expect_copyup(mock_image_ctx, CEPH_NOSNAP, "oid", "", 0); + expect_write(mock_image_ctx, CEPH_NOSNAP, "oid", 0); + + auto req = new 
MockCopyupRequest(&mock_image_ctx, "oid", 0, + {{0, 4096}}, {}); + mock_image_ctx.copyup_list[0] = req; + req->append_request(&mock_write_request); + req->send(); + + ASSERT_EQ(0, mock_write_request.ctx.wait()); +} + +TEST_F(TestMockIoCopyupRequest, DeepCopyWithPreAndPostSnaps) { + REQUIRE_FEATURE(RBD_FEATURE_LAYERING); + + librbd::ImageCtx *ictx; + ASSERT_EQ(0, open_image(m_image_name, &ictx)); + ictx->snap_lock.get_write(); + ictx->add_snap(cls::rbd::UserSnapshotNamespace(), "4", 4, ictx->size, + ictx->parent_md, RBD_PROTECTION_STATUS_UNPROTECTED, + 0, {}); + ictx->add_snap(cls::rbd::UserSnapshotNamespace(), "3", 3, ictx->size, + ictx->parent_md, RBD_PROTECTION_STATUS_UNPROTECTED, + 0, {}); + ictx->add_snap(cls::rbd::UserSnapshotNamespace(), "2", 2, ictx->size, + ictx->parent_md, RBD_PROTECTION_STATUS_UNPROTECTED, + 0, {}); + ictx->add_snap(cls::rbd::UserSnapshotNamespace(), "1", 1, ictx->size, + ictx->parent_md, RBD_PROTECTION_STATUS_UNPROTECTED, + 0, {}); + ictx->snapc = {4, {4, 3, 2, 1}}; + ictx->snap_lock.put_write(); + + MockTestImageCtx mock_parent_image_ctx(*ictx->parent); + MockTestImageCtx mock_image_ctx(*ictx, &mock_parent_image_ctx); + + MockExclusiveLock mock_exclusive_lock; + MockJournal mock_journal; + MockObjectMap mock_object_map; + initialize_features(ictx, mock_image_ctx, mock_exclusive_lock, mock_journal, + mock_object_map); + + expect_test_features(mock_image_ctx); + expect_op_work_queue(mock_image_ctx); + expect_is_lock_owner(mock_image_ctx); + + InSequence seq; + MockAbstractObjectWriteRequest mock_write_request; MockObjectCopyRequest mock_object_copy_request; mock_image_ctx.migration_info = {1, "", "", "image id", @@ -628,10 +704,13 @@ TEST_F(TestMockIoCopyupRequest, DeepCopyWithSnaps) { expect_get_parent_overlap(mock_image_ctx, 2, 0, 0); expect_get_parent_overlap(mock_image_ctx, 3, 1, 0); expect_prune_parent_extents(mock_image_ctx, 1, 1); + expect_get_parent_overlap(mock_image_ctx, 4, 1, 0); + expect_prune_parent_extents(mock_image_ctx, 1, 1); expect_get_pre_write_object_map_state(mock_image_ctx, mock_write_request, OBJECT_EXISTS); expect_object_map_at(mock_image_ctx, 0, OBJECT_NONEXISTENT); - expect_object_map_update(mock_image_ctx, 3, 0, OBJECT_EXISTS, true, 0); + expect_object_map_update(mock_image_ctx, 3, 0, OBJECT_EXISTS_CLEAN, true, 0); + expect_object_map_update(mock_image_ctx, 4, 0, OBJECT_EXISTS_CLEAN, true, 0); expect_object_map_update(mock_image_ctx, CEPH_NOSNAP, 0, OBJECT_EXISTS, true, 0); @@ -723,7 +802,7 @@ TEST_F(TestMockIoCopyupRequest, ZeroedCopyOnRead) { {{0, 4096}}, {}); mock_image_ctx.copyup_list[0] = req; req->send(); - ictx->flush_async_operations(); + flush_async_operations(ictx); } TEST_F(TestMockIoCopyupRequest, NoOpCopyup) { @@ -980,7 +1059,7 @@ TEST_F(TestMockIoCopyupRequest, CopyupError) { req->send(); ASSERT_EQ(-EPERM, mock_write_request.ctx.wait()); - ictx->flush_async_operations(); + flush_async_operations(ictx); } } // namespace io diff --git a/ceph/src/test/librbd/io/test_mock_ImageRequest.cc b/ceph/src/test/librbd/io/test_mock_ImageRequest.cc index 5f5851c37..034058028 100644 --- a/ceph/src/test/librbd/io/test_mock_ImageRequest.cc +++ b/ceph/src/test/librbd/io/test_mock_ImageRequest.cc @@ -113,11 +113,6 @@ struct TestMockIoImageRequest : public TestMockFixture { mock_image_ctx.image_ctx->op_work_queue->queue(&spec->dispatcher_ctx, r); })); } - - void expect_flush_async_operations(MockImageCtx &mock_image_ctx, int r) { - EXPECT_CALL(mock_image_ctx, flush_async_operations(_)) - .WillOnce(CompleteContext(r, 
mock_image_ctx.image_ctx->op_work_queue)); - } }; TEST_F(TestMockIoImageRequest, AioWriteModifyTimestamp) { @@ -388,7 +383,6 @@ TEST_F(TestMockIoImageRequest, AioFlushJournalAppendDisabled) { InSequence seq; expect_is_journal_appending(mock_journal, false); - expect_flush_async_operations(mock_image_ctx, 0); expect_object_request_send(mock_image_ctx, 0); C_SaferCond aio_comp_ctx; diff --git a/ceph/src/test/librbd/journal/test_Replay.cc b/ceph/src/test/librbd/journal/test_Replay.cc index b7137b5c6..88b01c60f 100644 --- a/ceph/src/test/librbd/journal/test_Replay.cc +++ b/ceph/src/test/librbd/journal/test_Replay.cc @@ -857,7 +857,7 @@ TEST_F(TestJournalReplay, ObjectPosition) { // user flush requests are ignored when journaling + cache are enabled C_SaferCond flush_ctx; - aio_comp = librbd::io::AioCompletion::create( + aio_comp = librbd::io::AioCompletion::create_and_start( &flush_ctx, ictx, librbd::io::AIO_TYPE_FLUSH); auto req = librbd::io::ImageDispatchSpec<>::create_flush_request( *ictx, aio_comp, librbd::io::FLUSH_SOURCE_INTERNAL, {}); diff --git a/ceph/src/test/librbd/journal/test_mock_PromoteRequest.cc b/ceph/src/test/librbd/journal/test_mock_PromoteRequest.cc index 0e61a8889..68a627a79 100644 --- a/ceph/src/test/librbd/journal/test_mock_PromoteRequest.cc +++ b/ceph/src/test/librbd/journal/test_mock_PromoteRequest.cc @@ -120,7 +120,7 @@ public: } void expect_start_append(::journal::MockJournaler &mock_journaler) { - EXPECT_CALL(mock_journaler, start_append(_, _, _, _)); + EXPECT_CALL(mock_journaler, start_append(_)); } void expect_stop_append(::journal::MockJournaler &mock_journaler, int r) { diff --git a/ceph/src/test/librbd/mock/MockImageCtx.h b/ceph/src/test/librbd/mock/MockImageCtx.h index 5d2c3d28a..4c45011d3 100644 --- a/ceph/src/test/librbd/mock/MockImageCtx.h +++ b/ceph/src/test/librbd/mock/MockImageCtx.h @@ -186,7 +186,6 @@ struct MockImageCtx { librados::snap_t id)); MOCK_METHOD0(user_flushed, void()); - MOCK_METHOD1(flush_async_operations, void(Context *)); MOCK_METHOD1(flush_copyup, void(Context *)); MOCK_CONST_METHOD1(test_features, bool(uint64_t test_features)); diff --git a/ceph/src/test/librbd/mock/MockJournal.h b/ceph/src/test/librbd/mock/MockJournal.h index dd9328307..31806217a 100644 --- a/ceph/src/test/librbd/mock/MockJournal.h +++ b/ceph/src/test/librbd/mock/MockJournal.h @@ -65,6 +65,8 @@ struct MockJournal { MOCK_METHOD0(allocate_op_tid, uint64_t()); + MOCK_METHOD0(user_flushed, void()); + MOCK_METHOD3(append_op_event_mock, void(uint64_t, const journal::EventEntry&, Context *)); void append_op_event(uint64_t op_tid, journal::EventEntry &&event_entry, diff --git a/ceph/src/test/librbd/operation/test_mock_ResizeRequest.cc b/ceph/src/test/librbd/operation/test_mock_ResizeRequest.cc index 931d5f462..0f959230e 100644 --- a/ceph/src/test/librbd/operation/test_mock_ResizeRequest.cc +++ b/ceph/src/test/librbd/operation/test_mock_ResizeRequest.cc @@ -227,9 +227,7 @@ TEST_F(TestMockOperationResizeRequest, GrowSuccess) { InSequence seq; expect_block_writes(mock_image_ctx, 0); expect_append_op_event(mock_image_ctx, true, 0); - expect_unblock_writes(mock_image_ctx); expect_grow_object_map(mock_image_ctx); - expect_block_writes(mock_image_ctx, 0); expect_update_header(mock_image_ctx, 0); expect_unblock_writes(mock_image_ctx); expect_commit_op_event(mock_image_ctx, 0); @@ -388,11 +386,17 @@ TEST_F(TestMockOperationResizeRequest, PostBlockWritesError) { expect_block_writes(mock_image_ctx, 0); expect_append_op_event(mock_image_ctx, true, 0); expect_unblock_writes(mock_image_ctx); 
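// Review note: the GrowSuccess hunk above drops the intermediate
// expect_unblock_writes()/expect_block_writes() pair, i.e. a grow-style
// resize is now expected to hold the write block across the object-map grow
// and the header update instead of releasing and re-taking it. The call
// sequence encoded by the remaining mock expectations is therefore roughly:
//
//   expect_block_writes(mock_image_ctx, 0);          // single up-front block
//   expect_append_op_event(mock_image_ctx, true, 0); // journal the resize op
//   expect_grow_object_map(mock_image_ctx);          // still write-blocked
//   expect_update_header(mock_image_ctx, 0);         // persist the new size
//   expect_unblock_writes(mock_image_ctx);           // single unblock at end
//   expect_commit_op_event(mock_image_ctx, 0);
//
// (A sketch assembled from the hunks in this file, not new test code.)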
- expect_grow_object_map(mock_image_ctx); + + MockTrimRequest mock_trim_request; + auto mock_io_image_dispatch_spec = new MockIoImageDispatchSpec(); + expect_flush_cache(mock_image_ctx, *mock_io_image_dispatch_spec, 0); + expect_invalidate_cache(mock_image_ctx, 0); + expect_trim(mock_image_ctx, mock_trim_request, 0); expect_block_writes(mock_image_ctx, -EINVAL); expect_unblock_writes(mock_image_ctx); expect_commit_op_event(mock_image_ctx, -EINVAL); - ASSERT_EQ(-EINVAL, when_resize(mock_image_ctx, ictx->size * 2, true, 0, false)); + ASSERT_EQ(-EINVAL, when_resize(mock_image_ctx, ictx->size / 2, true, 0, + false)); } TEST_F(TestMockOperationResizeRequest, UpdateHeaderError) { @@ -409,9 +413,7 @@ TEST_F(TestMockOperationResizeRequest, UpdateHeaderError) { InSequence seq; expect_block_writes(mock_image_ctx, 0); expect_append_op_event(mock_image_ctx, true, 0); - expect_unblock_writes(mock_image_ctx); expect_grow_object_map(mock_image_ctx); - expect_block_writes(mock_image_ctx, 0); expect_update_header(mock_image_ctx, -EINVAL); expect_unblock_writes(mock_image_ctx); expect_commit_op_event(mock_image_ctx, -EINVAL); diff --git a/ceph/src/test/librbd/test_internal.cc b/ceph/src/test/librbd/test_internal.cc index 20146e075..e95be0b8a 100644 --- a/ceph/src/test/librbd/test_internal.cc +++ b/ceph/src/test/librbd/test_internal.cc @@ -15,6 +15,7 @@ #include "librbd/Operations.h" #include "librbd/api/DiffIterate.h" #include "librbd/api/Image.h" +#include "librbd/api/Migration.h" #include "librbd/api/PoolMetadata.h" #include "librbd/io/AioCompletion.h" #include "librbd/io/ImageRequest.h" @@ -665,6 +666,181 @@ TEST_F(TestInternal, SnapshotCopyup) } } +TEST_F(TestInternal, SnapshotCopyupZeros) +{ + REQUIRE_FEATURE(RBD_FEATURE_LAYERING); + + librbd::ImageCtx *ictx; + ASSERT_EQ(0, open_image(m_image_name, &ictx)); + + // create an empty clone + ASSERT_EQ(0, snap_create(*ictx, "snap1")); + ASSERT_EQ(0, + ictx->operations->snap_protect(cls::rbd::UserSnapshotNamespace(), + "snap1")); + + uint64_t features; + ASSERT_EQ(0, librbd::get_features(ictx, &features)); + + std::string clone_name = get_temp_image_name(); + int order = ictx->order; + ASSERT_EQ(0, librbd::clone(m_ioctx, m_image_name.c_str(), "snap1", m_ioctx, + clone_name.c_str(), features, &order, 0, 0)); + + librbd::ImageCtx *ictx2; + ASSERT_EQ(0, open_image(clone_name, &ictx2)); + + ASSERT_EQ(0, snap_create(*ictx2, "snap1")); + + bufferlist bl; + bl.append(std::string(256, '1')); + ASSERT_EQ(256, ictx2->io_work_queue->write(256, bl.length(), bufferlist{bl}, + 0)); + + librados::IoCtx snap_ctx; + snap_ctx.dup(ictx2->data_ctx); + snap_ctx.snap_set_read(CEPH_SNAPDIR); + + librados::snap_set_t snap_set; + ASSERT_EQ(0, snap_ctx.list_snaps(ictx2->get_object_name(0), &snap_set)); + + // verify that snapshot wasn't affected + ASSERT_EQ(1U, snap_set.clones.size()); + ASSERT_EQ(CEPH_NOSNAP, snap_set.clones[0].cloneid); + + bufferptr read_ptr(256); + bufferlist read_bl; + read_bl.push_back(read_ptr); + + std::list snaps = {"snap1", ""}; + librbd::io::ReadResult read_result{&read_bl}; + for (std::list::iterator it = snaps.begin(); + it != snaps.end(); ++it) { + const char *snap_name = it->empty() ? 
NULL : it->c_str(); + ASSERT_EQ(0, librbd::api::Image<>::snap_set( + ictx2, cls::rbd::UserSnapshotNamespace(), snap_name)); + + ASSERT_EQ(256, + ictx2->io_work_queue->read(0, 256, + librbd::io::ReadResult{read_result}, + 0)); + ASSERT_TRUE(read_bl.is_zero()); + + ASSERT_EQ(256, + ictx2->io_work_queue->read(256, 256, + librbd::io::ReadResult{read_result}, + 0)); + if (snap_name == NULL) { + ASSERT_TRUE(bl.contents_equal(read_bl)); + } else { + ASSERT_TRUE(read_bl.is_zero()); + } + + // verify that only HEAD object map was updated + if ((ictx2->features & RBD_FEATURE_OBJECT_MAP) != 0) { + uint8_t state = OBJECT_EXISTS; + if (snap_name != NULL) { + state = OBJECT_NONEXISTENT; + } + + librbd::ObjectMap<> object_map(*ictx2, ictx2->snap_id); + C_SaferCond ctx; + object_map.open(&ctx); + ASSERT_EQ(0, ctx.wait()); + + RWLock::WLocker object_map_locker(ictx2->object_map_lock); + ASSERT_EQ(state, object_map[0]); + } + } +} + +TEST_F(TestInternal, SnapshotCopyupZerosMigration) +{ + REQUIRE_FEATURE(RBD_FEATURE_LAYERING); + + librbd::ImageCtx *ictx; + ASSERT_EQ(0, open_image(m_image_name, &ictx)); + + uint64_t features; + ASSERT_EQ(0, librbd::get_features(ictx, &features)); + + close_image(ictx); + + // migrate an empty image + std::string dst_name = get_temp_image_name(); + librbd::ImageOptions dst_opts; + dst_opts.set(RBD_IMAGE_OPTION_FEATURES, features); + ASSERT_EQ(0, librbd::api::Migration<>::prepare(m_ioctx, m_image_name, + m_ioctx, dst_name, + dst_opts)); + + librbd::ImageCtx *ictx2; + ASSERT_EQ(0, open_image(dst_name, &ictx2)); + + ASSERT_EQ(0, snap_create(*ictx2, "snap1")); + + bufferlist bl; + bl.append(std::string(256, '1')); + ASSERT_EQ(256, ictx2->io_work_queue->write(256, bl.length(), bufferlist{bl}, + 0)); + + librados::IoCtx snap_ctx; + snap_ctx.dup(ictx2->data_ctx); + snap_ctx.snap_set_read(CEPH_SNAPDIR); + + librados::snap_set_t snap_set; + ASSERT_EQ(0, snap_ctx.list_snaps(ictx2->get_object_name(0), &snap_set)); + + // verify that snapshot wasn't affected + ASSERT_EQ(1U, snap_set.clones.size()); + ASSERT_EQ(CEPH_NOSNAP, snap_set.clones[0].cloneid); + + bufferptr read_ptr(256); + bufferlist read_bl; + read_bl.push_back(read_ptr); + + std::list snaps = {"snap1", ""}; + librbd::io::ReadResult read_result{&read_bl}; + for (std::list::iterator it = snaps.begin(); + it != snaps.end(); ++it) { + const char *snap_name = it->empty() ? 
NULL : it->c_str(); + ASSERT_EQ(0, librbd::api::Image<>::snap_set( + ictx2, cls::rbd::UserSnapshotNamespace(), snap_name)); + + ASSERT_EQ(256, + ictx2->io_work_queue->read(0, 256, + librbd::io::ReadResult{read_result}, + 0)); + ASSERT_TRUE(read_bl.is_zero()); + + ASSERT_EQ(256, + ictx2->io_work_queue->read(256, 256, + librbd::io::ReadResult{read_result}, + 0)); + if (snap_name == NULL) { + ASSERT_TRUE(bl.contents_equal(read_bl)); + } else { + ASSERT_TRUE(read_bl.is_zero()); + } + + // verify that only HEAD object map was updated + if ((ictx2->features & RBD_FEATURE_OBJECT_MAP) != 0) { + uint8_t state = OBJECT_EXISTS; + if (snap_name != NULL) { + state = OBJECT_NONEXISTENT; + } + + librbd::ObjectMap<> object_map(*ictx2, ictx2->snap_id); + C_SaferCond ctx; + object_map.open(&ctx); + ASSERT_EQ(0, ctx.wait()); + + RWLock::WLocker object_map_locker(ictx2->object_map_lock); + ASSERT_EQ(state, object_map[0]); + } + } +} + TEST_F(TestInternal, ResizeCopyup) { REQUIRE_FEATURE(RBD_FEATURE_LAYERING); diff --git a/ceph/src/test/librbd/test_mock_Journal.cc b/ceph/src/test/librbd/test_mock_Journal.cc index 12f445d6c..40aed36f3 100644 --- a/ceph/src/test/librbd/test_mock_Journal.cc +++ b/ceph/src/test/librbd/test_mock_Journal.cc @@ -396,7 +396,16 @@ public: } void expect_start_append(::journal::MockJournaler &mock_journaler) { - EXPECT_CALL(mock_journaler, start_append(_, _, _, _)); + EXPECT_CALL(mock_journaler, start_append(_)); + } + + void expect_set_append_batch_options(MockJournalImageCtx &mock_image_ctx, + ::journal::MockJournaler &mock_journaler, + bool user_flushed) { + if (mock_image_ctx.image_ctx->config.get_val("rbd_journal_object_writethrough_until_flush") == + user_flushed) { + EXPECT_CALL(mock_journaler, set_append_batch_options(_, _, _)); + } } void expect_stop_append(::journal::MockJournaler &mock_journaler, int r) { @@ -518,6 +527,7 @@ public: expect_committed(mock_journaler, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); ASSERT_EQ(0, when_open(mock_journal)); } @@ -585,6 +595,7 @@ TEST_F(TestMockJournal, StateTransitions) { expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); ASSERT_EQ(0, when_open(mock_journal)); @@ -662,6 +673,7 @@ TEST_F(TestMockJournal, ReplayCompleteError) { expect_shut_down_replay(mock_image_ctx, mock_journal_replay, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); ASSERT_EQ(0, when_open(mock_journal)); expect_stop_append(mock_journaler, 0); @@ -719,6 +731,7 @@ TEST_F(TestMockJournal, FlushReplayError) { expect_shut_down_replay(mock_image_ctx, mock_journal_replay, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); ASSERT_EQ(0, when_open(mock_journal)); expect_stop_append(mock_journaler, 0); @@ -773,6 +786,7 @@ TEST_F(TestMockJournal, CorruptEntry) { expect_shut_down_replay(mock_image_ctx, mock_journal_replay, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); ASSERT_EQ(0, when_open(mock_journal)); expect_stop_append(mock_journaler, -EINVAL); @@ -811,6 +825,7 @@ TEST_F(TestMockJournal, StopError) { 
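// Review note: every open-journal sequence in this file now pairs
// expect_start_append() with expect_set_append_batch_options(..., false).
// That helper only registers its EXPECT_CALL when the image's
// rbd_journal_object_writethrough_until_flush config value matches the
// user_flushed argument, suggesting that with writethrough-until-flush
// enabled the batch options reach the journaler only once a user flush
// arrives. The new UserFlushed test at the end of this file drives that
// path directly; condensed from the hunk below:
//
//   expect_set_append_batch_options(mock_image_ctx, mock_journaler, true);
//   mock_journal.user_flushed();  // switches journaler to batched appends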
expect_shut_down_replay(mock_image_ctx, mock_journal_replay, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); ASSERT_EQ(0, when_open(mock_journal)); expect_stop_append(mock_journaler, -EINVAL); @@ -876,6 +891,7 @@ TEST_F(TestMockJournal, ReplayOnDiskPreFlushError) { expect_shut_down_replay(mock_image_ctx, mock_journal_replay, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); C_SaferCond ctx; mock_journal.open(&ctx); @@ -958,6 +974,7 @@ TEST_F(TestMockJournal, ReplayOnDiskPostFlushError) { expect_shut_down_replay(mock_image_ctx, mock_journal_replay, 0); expect_flush_commit_position(mock_journaler); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); C_SaferCond ctx; mock_journal.open(&ctx); @@ -1272,6 +1289,7 @@ TEST_F(TestMockJournal, ExternalReplay) { InSequence seq; expect_stop_append(mock_journaler, 0); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); expect_shut_down_journaler(mock_journaler); C_SaferCond start_ctx; @@ -1303,6 +1321,7 @@ TEST_F(TestMockJournal, ExternalReplayFailure) { InSequence seq; expect_stop_append(mock_journaler, -EINVAL); expect_start_append(mock_journaler); + expect_set_append_batch_options(mock_image_ctx, mock_journaler, false); expect_shut_down_journaler(mock_journaler); C_SaferCond start_ctx; @@ -1486,4 +1505,27 @@ TEST_F(TestMockJournal, ForcePromoted) { ASSERT_EQ(0, listener.ctx.wait()); } +TEST_F(TestMockJournal, UserFlushed) { + REQUIRE_FEATURE(RBD_FEATURE_JOURNALING); + + librbd::ImageCtx *ictx; + ASSERT_EQ(0, open_image(m_image_name, &ictx)); + + MockJournalImageCtx mock_image_ctx(*ictx); + MockJournal mock_journal(mock_image_ctx); + MockObjectDispatch mock_object_dispatch; + ::journal::MockJournaler mock_journaler; + MockJournalOpenRequest mock_open_request; + open_journal(mock_image_ctx, mock_journal, mock_object_dispatch, + mock_journaler, mock_open_request); + BOOST_SCOPE_EXIT_ALL(&) { + close_journal(mock_image_ctx, mock_journal, mock_journaler); + }; + + expect_set_append_batch_options(mock_image_ctx, mock_journaler, true); + mock_journal.user_flushed(); + + expect_shut_down_journaler(mock_journaler); +} + } // namespace librbd diff --git a/ceph/src/test/objectstore/Allocator_test.cc b/ceph/src/test/objectstore/Allocator_test.cc index 229d793a8..753851e89 100644 --- a/ceph/src/test/objectstore/Allocator_test.cc +++ b/ceph/src/test/objectstore/Allocator_test.cc @@ -318,6 +318,27 @@ TEST_P(AllocTest, test_alloc_bug_24598) EXPECT_EQ(1u, tmp.size()); } +//Verifies issue from +//http://tracker.ceph.com/issues/40703 +// +TEST_P(AllocTest, test_alloc_big2) +{ + int64_t block_size = 4096; + int64_t blocks = 1048576 * 2; + int64_t mas = 1024*1024; + init_alloc(blocks*block_size, block_size); + alloc->init_add_free(0, blocks * block_size); + + PExtentVector extents; + uint64_t need = block_size * blocks / 4; // 2GB + EXPECT_EQ(need, + alloc->allocate(need, mas, 0, &extents)); + need = block_size * blocks / 4; // 2GB + EXPECT_EQ(need, + alloc->allocate(need, mas, 0, &extents)); + EXPECT_TRUE(extents[0].length > 0); +} + INSTANTIATE_TEST_CASE_P( Allocator, AllocTest, diff --git a/ceph/src/test/objectstore/test_bluefs.cc b/ceph/src/test/objectstore/test_bluefs.cc index 60f31efee..60e51a12d 100644 --- 
a/ceph/src/test/objectstore/test_bluefs.cc +++ b/ceph/src/test/objectstore/test_bluefs.cc @@ -135,6 +135,58 @@ TEST(BlueFS, small_appends) { rm_temp_bdev(fn); } +TEST(BlueFS, very_large_write) { + // we'll write a ~3G file, so allocate more than that for the whole fs + uint64_t size = 1048576 * 1024 * 8ull; + string fn = get_temp_bdev(size); + BlueFS fs(g_ceph_context); + + bool old = g_ceph_context->_conf.get_val("bluefs_buffered_io"); + g_ceph_context->_conf.set_val("bluefs_buffered_io", "false"); + + ASSERT_EQ(0, fs.add_block_device(BlueFS::BDEV_DB, fn, false)); + fs.add_block_extent(BlueFS::BDEV_DB, 1048576, size - 1048576); + uuid_d fsid; + ASSERT_EQ(0, fs.mkfs(fsid)); + ASSERT_EQ(0, fs.mount()); + char buf[1048571]; // this is biggish, but intentionally not evenly aligned + for (unsigned i = 0; i < sizeof(buf); ++i) { + buf[i] = i; + } + { + BlueFS::FileWriter *h; + ASSERT_EQ(0, fs.mkdir("dir")); + ASSERT_EQ(0, fs.open_for_write("dir", "bigfile", &h, false)); + for (unsigned i = 0; i < 3*1024*1048576ull / sizeof(buf); ++i) { + h->append(buf, sizeof(buf)); + } + fs.fsync(h); + fs.close_writer(h); + } + { + BlueFS::FileReader *h; + ASSERT_EQ(0, fs.open_for_read("dir", "bigfile", &h)); + bufferlist bl; + BlueFS::FileReaderBuffer readbuf(10485760); + for (unsigned i = 0; i < 3*1024*1048576ull / sizeof(buf); ++i) { + bl.clear(); + fs.read(h, &readbuf, i * sizeof(buf), sizeof(buf), &bl, NULL); + int r = memcmp(buf, bl.c_str(), sizeof(buf)); + if (r) { + cerr << "read got mismatch at offset " << i*sizeof(buf) << " r " << r + << std::endl; + } + ASSERT_EQ(0, r); + } + delete h; + } + fs.umount(); + + g_ceph_context->_conf.set_val("bluefs_buffered_io", stringify((int)old)); + + rm_temp_bdev(fn); +} + #define ALLOC_SIZE 4096 void write_data(BlueFS &fs, uint64_t rationed_bytes) diff --git a/ceph/src/test/osd/TestOSDMap.cc b/ceph/src/test/osd/TestOSDMap.cc index 0ed5b477c..2bd873fd0 100644 --- a/ceph/src/test/osd/TestOSDMap.cc +++ b/ceph/src/test/osd/TestOSDMap.cc @@ -2,6 +2,7 @@ #include "gtest/gtest.h" #include "osd/OSDMap.h" #include "osd/OSDMapMapping.h" +#include "mon/OSDMonitor.h" #include "global/global_context.h" #include "global/global_init.h" @@ -186,6 +187,21 @@ public: cout << "first: " << *first << std::endl;; cout << "primary: " << *primary << std::endl;; } + void clean_pg_upmaps(CephContext *cct, + const OSDMap& om, + OSDMap::Incremental& pending_inc) { + int cpu_num = 8; + int pgs_per_chunk = 256; + ThreadPool tp(cct, "BUG_40104::clean_upmap_tp", "clean_upmap_tp", cpu_num); + tp.start(); + ParallelPGMapper mapper(cct, &tp); + vector pgs_to_check; + om.get_upmap_pgs(&pgs_to_check); + OSDMonitor::CleanUpmapJob job(cct, om, pending_inc); + mapper.queue(&job, pgs_per_chunk, pgs_to_check); + job.wait(); + tp.stop(); + } }; TEST_F(OSDMapTest, Create) { @@ -667,7 +683,7 @@ TEST_F(OSDMapTest, CleanPGUpmaps) { nextmap.apply_incremental(pending_inc); ASSERT_TRUE(nextmap.have_pg_upmaps(pgid)); OSDMap::Incremental new_pending_inc(nextmap.get_epoch() + 1); - nextmap.clean_pg_upmaps(g_ceph_context, &new_pending_inc); + clean_pg_upmaps(g_ceph_context, nextmap, new_pending_inc); nextmap.apply_incremental(new_pending_inc); ASSERT_TRUE(!nextmap.have_pg_upmaps(pgid)); } @@ -714,12 +730,9 @@ TEST_F(OSDMapTest, CleanPGUpmaps) { ASSERT_TRUE(tmpmap.have_pg_upmaps(ec_pgid)); } { - // confirm *maybe_remove_pg_upmaps* won't do anything bad + // confirm *clean_pg_upmaps* won't do anything bad OSDMap::Incremental pending_inc(tmpmap.get_epoch() + 1); - OSDMap nextmap; - nextmap.deepish_copy_from(tmpmap); - 
nextmap.maybe_remove_pg_upmaps(g_ceph_context, tmpmap, - nextmap, &pending_inc); + clean_pg_upmaps(g_ceph_context, tmpmap, pending_inc); tmpmap.apply_incremental(pending_inc); ASSERT_TRUE(tmpmap.have_pg_upmaps(ec_pgid)); } @@ -767,12 +780,9 @@ TEST_F(OSDMapTest, CleanPGUpmaps) { ASSERT_TRUE(tmpmap.have_pg_upmaps(ec_pgid)); } { - // *maybe_remove_pg_upmaps* should be able to remove the above *bad* mapping + // *clean_pg_upmaps* should be able to remove the above *bad* mapping OSDMap::Incremental pending_inc(tmpmap.get_epoch() + 1); - OSDMap nextmap; - nextmap.deepish_copy_from(tmpmap); - nextmap.maybe_remove_pg_upmaps(g_ceph_context, tmpmap, - nextmap, &pending_inc); + clean_pg_upmaps(g_ceph_context, tmpmap, pending_inc); tmpmap.apply_incremental(pending_inc); ASSERT_TRUE(!tmpmap.have_pg_upmaps(ec_pgid)); } @@ -894,12 +904,9 @@ TEST_F(OSDMapTest, CleanPGUpmaps) { ASSERT_TRUE(tmp.have_pg_upmaps(ec_pgid)); } { - // *maybe_remove_pg_upmaps* should not remove the above upmap_item + // *clean_pg_upmaps* should not remove the above upmap_item OSDMap::Incremental pending_inc(tmp.get_epoch() + 1); - OSDMap nextmap; - nextmap.deepish_copy_from(tmp); - nextmap.maybe_remove_pg_upmaps(g_ceph_context, tmp, - nextmap, &pending_inc); + clean_pg_upmaps(g_ceph_context, tmp, pending_inc); tmp.apply_incremental(pending_inc); ASSERT_TRUE(tmp.have_pg_upmaps(ec_pgid)); } @@ -964,10 +971,7 @@ { // STEP-2: apply cure OSDMap::Incremental pending_inc(osdmap.get_epoch() + 1); - OSDMap tmpmap; - tmpmap.deepish_copy_from(osdmap); - tmpmap.apply_incremental(pending_inc); - osdmap.maybe_remove_pg_upmaps(g_ceph_context, osdmap, tmpmap, &pending_inc); + clean_pg_upmaps(g_ceph_context, osdmap, pending_inc); osdmap.apply_incremental(pending_inc); { // validate pg_upmap is gone (reverted) @@ -1101,11 +1105,7 @@ { // STEP-4: apply cure OSDMap::Incremental pending_inc(osdmap.get_epoch() + 1); - OSDMap tmpmap; - tmpmap.deepish_copy_from(osdmap); - tmpmap.apply_incremental(pending_inc); - osdmap.maybe_remove_pg_upmaps(g_ceph_context, osdmap, - &pending_inc); + clean_pg_upmaps(g_ceph_context, osdmap, pending_inc); osdmap.apply_incremental(pending_inc); { // validate pg_upmap_items is gone (reverted) @@ -1341,6 +1341,62 @@ TEST_F(OSDMapTest, BUG_38897) { } } +TEST_F(OSDMapTest, BUG_40104) { + // http://tracker.ceph.com/issues/40104 + int big_osd_num = 5000; + int big_pg_num = 10000; + set_up_map(big_osd_num, true); + int pool_id; + { + OSDMap::Incremental pending_inc(osdmap.get_epoch() + 1); + pending_inc.new_pool_max = osdmap.get_pool_max(); + pool_id = ++pending_inc.new_pool_max; + pg_pool_t empty; + auto p = pending_inc.get_new_pool(pool_id, &empty); + p->size = 3; + p->min_size = 1; + p->set_pg_num(big_pg_num); + p->set_pgp_num(big_pg_num); + p->type = pg_pool_t::TYPE_REPLICATED; + p->crush_rule = 0; + p->set_flag(pg_pool_t::FLAG_HASHPSPOOL); + pending_inc.new_pool_names[pool_id] = "big_pool"; + osdmap.apply_incremental(pending_inc); + ASSERT_TRUE(osdmap.have_pg_pool(pool_id)); + ASSERT_TRUE(osdmap.get_pool_name(pool_id) == "big_pool"); + } + { + // generate pg_upmap_items for each pg + OSDMap::Incremental pending_inc(osdmap.get_epoch() + 1); + for (int i = 0; i < big_pg_num; i++) { + pg_t rawpg(i, pool_id); + pg_t pgid = osdmap.raw_pg_to_pg(rawpg); + vector<int> up; + int up_primary; + osdmap.pg_to_raw_up(pgid, &up, &up_primary); + ASSERT_TRUE(up.size() == 3); + int victim = up[0]; + int replaced_by = random() % big_osd_num; + vector<pair<int32_t,int32_t>>
new_pg_upmap_items; + // note that it might or might not be valid, we don't care + new_pg_upmap_items.push_back(make_pair(victim, replaced_by)); + pending_inc.new_pg_upmap_items[pgid] = + mempool::osdmap::vector<pair<int32_t,int32_t>>( + new_pg_upmap_items.begin(), new_pg_upmap_items.end()); + } + osdmap.apply_incremental(pending_inc); + } + { + OSDMap::Incremental pending_inc(osdmap.get_epoch() + 1); + auto start = mono_clock::now(); + clean_pg_upmaps(g_ceph_context, osdmap, pending_inc); + auto latency = mono_clock::now() - start; + std::cout << "clean_pg_upmaps (~" << big_pg_num + << " pg_upmap_items) latency:" << timespan_str(latency) + << std::endl; + } +} + TEST(PGTempMap, basic) { PGTempMap m; diff --git a/ceph/src/test/pybind/test_rbd.py b/ceph/src/test/pybind/test_rbd.py index 1035ae319..0742ab806 100644 --- a/ceph/src/test/pybind/test_rbd.py +++ b/ceph/src/test/pybind/test_rbd.py @@ -1,4 +1,5 @@ # vim: expandtab smarttab shiftwidth=4 softtabstop=4 +import errno import functools import socket import os @@ -15,6 +16,7 @@ from rados import (Rados, from rbd import (RBD, Group, Image, ImageNotFound, InvalidArgument, ImageExists, ImageBusy, ImageHasSnapshots, ReadOnlyImage, FunctionNotSupported, ArgumentOutOfRange, + ECANCELED, OperationCanceled, DiskQuotaExceeded, ConnectionShutdown, PermissionError, RBD_FEATURE_LAYERING, RBD_FEATURE_STRIPINGV2, RBD_FEATURE_EXCLUSIVE_LOCK, RBD_FEATURE_JOURNALING, @@ -324,6 +326,24 @@ def test_list(): image_id = image.id() eq([{'id': image_id, 'name': image_name}], list(RBD().list2(ioctx))) +@with_setup(create_image) +def test_remove_with_progress(): + d = {'received_callback': False} + def progress_cb(current, total): + d['received_callback'] = True + return 0 + + RBD().remove(ioctx, image_name, on_progress=progress_cb) + eq(True, d['received_callback']) + +@with_setup(create_image) +def test_remove_canceled(): + def progress_cb(current, total): + return -ECANCELED + + assert_raises(OperationCanceled, RBD().remove, ioctx, image_name, + on_progress=progress_cb) + @with_setup(create_image, remove_image) def test_rename(): rbd = RBD() @@ -1519,6 +1539,22 @@ class TestClone(object): self.clone.remove_snap('snap2') self.rbd.remove(ioctx, clone_name3) + def test_flatten_with_progress(self): + d = {'received_callback': False} + def progress_cb(current, total): + d['received_callback'] = True + return 0 + + global ioctx + global features + clone_name = get_temp_image_name() + self.rbd.clone(ioctx, image_name, 'snap1', ioctx, clone_name, + features, 0) + with Image(ioctx, clone_name) as clone: + clone.flatten(on_progress=progress_cb) + self.rbd.remove(ioctx, clone_name) + eq(True, d['received_callback']) + def test_resize_flatten_multi_level(self): self.clone.create_snap('snap2') self.clone.protect_snap('snap2') @@ -1919,6 +1955,20 @@ class TestTrash(object): RBD().trash_move(ioctx, image_name, 0) RBD().trash_remove(ioctx, image_id) + def test_remove_with_progress(self): + d = {'received_callback': False} + def progress_cb(current, total): + d['received_callback'] = True + return 0 + + create_image() + with Image(ioctx, image_name) as image: + image_id = image.id() + + RBD().trash_move(ioctx, image_name, 0) + RBD().trash_remove(ioctx, image_id, on_progress=progress_cb) + eq(True, d['received_callback']) + def test_get(self): create_image() with Image(ioctx, image_name) as image: @@ -2165,6 +2215,24 @@ class TestMigration(object): RBD().migration_commit(ioctx, image_name) remove_image() + def test_migration_with_progress(self): + d = {'received_callback': False} + def
progress_cb(current, total): + d['received_callback'] = True + return 0 + + create_image() + RBD().migration_prepare(ioctx, image_name, ioctx, image_name, features=63, + order=23, stripe_unit=1<<23, stripe_count=1, + data_pool=None) + RBD().migration_execute(ioctx, image_name, on_progress=progress_cb) + eq(True, d['received_callback']) + d['received_callback'] = False + + RBD().migration_commit(ioctx, image_name, on_progress=progress_cb) + eq(True, d['received_callback']) + remove_image() + def test_migrate_abort(self): create_image() RBD().migration_prepare(ioctx, image_name, ioctx, image_name, features=63, @@ -2172,3 +2240,17 @@ class TestMigration(object): data_pool=None) RBD().migration_abort(ioctx, image_name) remove_image() + + def test_migrate_abort_with_progress(self): + d = {'received_callback': False} + def progress_cb(current, total): + d['received_callback'] = True + return 0 + + create_image() + RBD().migration_prepare(ioctx, image_name, ioctx, image_name, features=63, + order=23, stripe_unit=1<<23, stripe_count=1, + data_pool=None) + RBD().migration_abort(ioctx, image_name, on_progress=progress_cb) + eq(True, d['received_callback']) + remove_image() diff --git a/ceph/src/test/rbd_mirror/test_ImageSync.cc b/ceph/src/test/rbd_mirror/test_ImageSync.cc index fbd338111..7f9ae105d 100644 --- a/ceph/src/test/rbd_mirror/test_ImageSync.cc +++ b/ceph/src/test/rbd_mirror/test_ImageSync.cc @@ -30,7 +30,7 @@ namespace { int flush(librbd::ImageCtx *image_ctx) { C_SaferCond ctx; - auto aio_comp = librbd::io::AioCompletion::create( + auto aio_comp = librbd::io::AioCompletion::create_and_start( &ctx, image_ctx, librbd::io::AIO_TYPE_FLUSH); auto req = librbd::io::ImageDispatchSpec<>::create_flush_request( *image_ctx, aio_comp, librbd::io::FLUSH_SOURCE_INTERNAL, {}); diff --git a/ceph/src/test/rbd_mirror/test_mock_ImageSyncThrottler.cc b/ceph/src/test/rbd_mirror/test_mock_ImageSyncThrottler.cc index f30a299f1..af88edcba 100644 --- a/ceph/src/test/rbd_mirror/test_mock_ImageSyncThrottler.cc +++ b/ceph/src/test/rbd_mirror/test_mock_ImageSyncThrottler.cc @@ -113,6 +113,65 @@ TEST_F(TestMockImageSyncThrottler, Cancel_Running_Sync_Start_Waiting) { throttler.finish_op("id2"); } +TEST_F(TestMockImageSyncThrottler, Duplicate) { + MockImageSyncThrottler throttler(g_ceph_context); + throttler.set_max_concurrent_syncs(1); + + C_SaferCond on_start1; + throttler.start_op("id1", &on_start1); + ASSERT_EQ(0, on_start1.wait()); + + C_SaferCond on_start2; + throttler.start_op("id1", &on_start2); + ASSERT_EQ(0, on_start2.wait()); + + C_SaferCond on_start3; + throttler.start_op("id2", &on_start3); + C_SaferCond on_start4; + throttler.start_op("id2", &on_start4); + ASSERT_EQ(-ENOENT, on_start3.wait()); + + throttler.finish_op("id1"); + ASSERT_EQ(0, on_start4.wait()); + throttler.finish_op("id2"); +} + +TEST_F(TestMockImageSyncThrottler, Duplicate2) { + MockImageSyncThrottler throttler(g_ceph_context); + throttler.set_max_concurrent_syncs(2); + + C_SaferCond on_start1; + throttler.start_op("id1", &on_start1); + ASSERT_EQ(0, on_start1.wait()); + C_SaferCond on_start2; + throttler.start_op("id2", &on_start2); + ASSERT_EQ(0, on_start2.wait()); + + C_SaferCond on_start3; + throttler.start_op("id3", &on_start3); + C_SaferCond on_start4; + throttler.start_op("id3", &on_start4); // dup + ASSERT_EQ(-ENOENT, on_start3.wait()); + + C_SaferCond on_start5; + throttler.start_op("id4", &on_start5); + + throttler.finish_op("id1"); + ASSERT_EQ(0, on_start4.wait()); + + throttler.finish_op("id2"); + ASSERT_EQ(0, on_start5.wait()); + 
+ C_SaferCond on_start6; + throttler.start_op("id5", &on_start6); + + throttler.finish_op("id3"); + ASSERT_EQ(0, on_start6.wait()); + + throttler.finish_op("id4"); + throttler.finish_op("id5"); +} + TEST_F(TestMockImageSyncThrottler, Increase_Max_Concurrent_Syncs) { MockImageSyncThrottler throttler(g_ceph_context); throttler.set_max_concurrent_syncs(2); diff --git a/ceph/src/tools/ceph_kvstore_tool.cc b/ceph/src/tools/ceph_kvstore_tool.cc index 6f9087691..4a4f5214d 100644 --- a/ceph/src/tools/ceph_kvstore_tool.cc +++ b/ceph/src/tools/ceph_kvstore_tool.cc @@ -47,6 +47,7 @@ void usage(const char *pname) << " compact-prefix <prefix>\n" << " compact-range <prefix> <start> <end>\n" << " destructive-repair (use only as last resort! may corrupt healthy data)\n" + << " stats\n" << std::endl; } @@ -97,7 +98,8 @@ int main(int argc, const char *argv[]) } bool need_open_db = (cmd != "destructive-repair"); - StoreTool st(type, path, need_open_db); + bool need_stats = (cmd == "stats"); + StoreTool st(type, path, need_open_db, need_stats); if (cmd == "destructive-repair") { int ret = st.destructive_repair(); @@ -343,6 +345,8 @@ int main(int argc, const char *argv[]) string start(url_unescape(argv[5])); string end(url_unescape(argv[6])); st.compact_range(prefix, start, end); + } else if (cmd == "stats") { + st.print_stats(); } else { std::cerr << "Unrecognized command: " << cmd << std::endl; return 1; diff --git a/ceph/src/tools/cephfs/cephfs-shell b/ceph/src/tools/cephfs/cephfs-shell index 05e1fa447..09eed5a00 100644 --- a/ceph/src/tools/cephfs/cephfs-shell +++ b/ceph/src/tools/cephfs/cephfs-shell @@ -55,7 +55,7 @@ shell = None def poutput(s, end='\n'): - shell.poutput(s, end) + shell.poutput(s, end=end) def setup_cephfs(config_file): @@ -725,12 +725,12 @@ exists.') if dirs: paths.extend(dirs) else: - self.poutput(path, ':\n') + self.poutput(path, end=':\n') items = sorted(items, key=lambda item: item.d_name) else: if path != '' and path != cephfs.getcwd().decode( 'utf-8') and len(paths) > 1: - self.poutput(path, ':\n') + self.poutput(path, end=':\n') items = sorted(ls(path), key=lambda item: item.d_name) if not args.all: @@ -769,7 +769,7 @@ exists.') if not args.long: print_list(values, shutil.get_terminal_size().columns) if path != paths[-1]: - self.poutput('\n') + self.poutput('') def complete_rmdir(self, text, line, begidx, endidx): """ @@ -843,9 +843,10 @@ sub-directories, files') """ Remove a specific file """ - for path in args.paths: + file_paths = args.paths + for path in file_paths: if path.count('*') > 0: - files.extend([i for i in get_all_possible_paths( + file_paths.extend([i for i in get_all_possible_paths( path) if is_file_exists(i)]) else: try: diff --git a/ceph/src/tools/kvstore_tool.cc b/ceph/src/tools/kvstore_tool.cc index 7ed6c9e2d..ed33b29c6 100644 --- a/ceph/src/tools/kvstore_tool.cc +++ b/ceph/src/tools/kvstore_tool.cc @@ -10,9 +10,18 @@ #include "include/buffer.h" #include "kv/KeyValueDB.h" -StoreTool::StoreTool(const string& type, const string& path, bool need_open_db) +StoreTool::StoreTool(const string& type, + const string& path, + bool need_open_db, + bool need_stats) : store_path(path) { + + if (need_stats) { + g_conf()->rocksdb_perf = true; + g_conf()->rocksdb_collect_compaction_stats = true; + } + if (type == "bluestore-kv") { #ifdef WITH_BLUESTORE if (load_bluestore(path, need_open_db) != 0) @@ -200,6 +209,25 @@ void StoreTool::print_summary(const uint64_t total_keys, const uint64_t total_si std::cout << " duration " << duration << " seconds" << std::endl; } +int StoreTool::print_stats() const +{ +
ostringstream ostr; + Formatter* f = Formatter::create("json-pretty", "json-pretty", "json-pretty"); + int ret = -1; + if (g_conf()->rocksdb_perf) { + db->get_statistics(f); + ostr << "db_statistics "; + f->flush(ostr); + ret = 0; + } else { + ostr << "db_statistics not enabled"; + f->flush(ostr); + } + std::cout << ostr.str() << std::endl; + delete f; + return ret; +} + int StoreTool::copy_store_to(const string& type, const string& other_path, const int num_keys_per_tx, const string& other_type) diff --git a/ceph/src/tools/kvstore_tool.h b/ceph/src/tools/kvstore_tool.h index 320302472..d8c896613 100644 --- a/ceph/src/tools/kvstore_tool.h +++ b/ceph/src/tools/kvstore_tool.h @@ -43,7 +43,8 @@ class StoreTool public: StoreTool(const std::string& type, const std::string& path, - bool need_open_db=true); + bool need_open_db = true, + bool need_stats = false); int load_bluestore(const std::string& path, bool need_open_db); uint32_t traverse(const std::string& prefix, const bool do_crc, @@ -74,4 +75,6 @@ public: const std::string& start, const std::string& end); int destructive_repair(); + + int print_stats() const; }; diff --git a/ceph/src/tools/rbd/action/Config.cc b/ceph/src/tools/rbd/action/Config.cc index 40093172f..2868c7ad1 100644 --- a/ceph/src/tools/rbd/action/Config.cc +++ b/ceph/src/tools/rbd/action/Config.cc @@ -676,7 +676,11 @@ int execute_image_set(const po::variables_map &vm, return r; } - std::string value = utils::get_positional_argument(vm, 2); + std::string value = utils::get_positional_argument(vm, arg_index); + if (value.empty()) { + std::cerr << "rbd: image config value was not specified" << std::endl; + return -EINVAL; + } librados::Rados rados; librados::IoCtx io_ctx; diff --git a/ceph/src/tools/rbd/action/Export.cc b/ceph/src/tools/rbd/action/Export.cc index cbad1cd28..b5b82f4c0 100644 --- a/ceph/src/tools/rbd/action/Export.cc +++ b/ceph/src/tools/rbd/action/Export.cc @@ -316,26 +316,25 @@ Shell::Action action_diff( class C_Export : public Context { public: - C_Export(SimpleThrottle &simple_throttle, librbd::Image &image, + C_Export(OrderedThrottle &ordered_throttle, librbd::Image &image, uint64_t fd_offset, uint64_t offset, uint64_t length, int fd) - : m_aio_completion( - new librbd::RBD::AioCompletion(this, &utils::aio_context_callback)), - m_throttle(simple_throttle), m_image(image), m_dest_offset(fd_offset), + : m_throttle(ordered_throttle), m_image(image), m_dest_offset(fd_offset), m_offset(offset), m_length(length), m_fd(fd) { } void send() { - m_throttle.start_op(); - + auto ctx = m_throttle.start_op(this); + auto aio_completion = new librbd::RBD::AioCompletion( + ctx, &utils::aio_context_callback); int op_flags = LIBRADOS_OP_FLAG_FADVISE_SEQUENTIAL | LIBRADOS_OP_FLAG_FADVISE_NOCACHE; int r = m_image.aio_read2(m_offset, m_length, m_bufferlist, - m_aio_completion, op_flags); + aio_completion, op_flags); if (r < 0) { cerr << "rbd: error requesting read from source image" << std::endl; - m_aio_completion->release(); + aio_completion->release(); m_throttle.end_op(r); } } @@ -376,8 +375,7 @@ public: } private: - librbd::RBD::AioCompletion *m_aio_completion; - SimpleThrottle &m_throttle; + OrderedThrottle &m_throttle; librbd::Image &m_image; bufferlist m_bufferlist; uint64_t m_dest_offset; @@ -517,19 +515,21 @@ static int do_export_v2(librbd::Image& image, librbd::image_info_t &info, int fd return r; } -static int do_export_v1(librbd::Image& image, librbd::image_info_t &info, int fd, - uint64_t period, int max_concurrent_ops, utils::ProgressContext &pc) +static int 
do_export_v1(librbd::Image& image, librbd::image_info_t &info, + int fd, uint64_t period, int max_concurrent_ops, + utils::ProgressContext &pc) { int r = 0; size_t file_size = 0; - SimpleThrottle throttle(max_concurrent_ops, false); + OrderedThrottle throttle(max_concurrent_ops, false); for (uint64_t offset = 0; offset < info.size; offset += period) { if (throttle.pending_error()) { break; } uint64_t length = min(period, info.size - offset); - C_Export *ctx = new C_Export(throttle, image, file_size + offset, offset, length, fd); + C_Export *ctx = new C_Export(throttle, image, file_size + offset, offset, + length, fd); ctx->send(); pc.update_progress(offset, info.size); @@ -551,7 +551,8 @@ static int do_export_v1(librbd::Image& image, librbd::image_info_t &info, int fd return r; } -static int do_export(librbd::Image& image, const char *path, bool no_progress, int export_format) +static int do_export(librbd::Image& image, const char *path, bool no_progress, + int export_format) { librbd::image_info_t info; int64_t r = image.stat(info, sizeof(info)); @@ -559,13 +560,11 @@ static int do_export(librbd::Image& image, const char *path, bool no_progress, i return r; int fd; - int max_concurrent_ops; + int max_concurrent_ops = g_conf().get_val<uint64_t>("rbd_concurrent_management_ops"); bool to_stdout = (strcmp(path, "-") == 0); if (to_stdout) { fd = STDOUT_FILENO; - max_concurrent_ops = 1; } else { - max_concurrent_ops = g_conf().get_val<uint64_t>("rbd_concurrent_management_ops"); fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644); if (fd < 0) { return -errno; diff --git a/ceph/src/tools/rbd/action/ImageMeta.cc b/ceph/src/tools/rbd/action/ImageMeta.cc index 1ea7413d1..20c4555da 100644 --- a/ceph/src/tools/rbd/action/ImageMeta.cc +++ b/ceph/src/tools/rbd/action/ImageMeta.cc @@ -256,7 +256,7 @@ int execute_set(const po::variables_map &vm, return r; } - std::string value = utils::get_positional_argument(vm, 2); + std::string value = utils::get_positional_argument(vm, arg_index); if (value.empty()) { std::cerr << "rbd: metadata value was not specified" << std::endl; return -EINVAL; diff --git a/ceph/src/tools/rbd/action/Journal.cc b/ceph/src/tools/rbd/action/Journal.cc index e36c6a6a5..d3a54f94f 100644 --- a/ceph/src/tools/rbd/action/Journal.cc +++ b/ceph/src/tools/rbd/action/Journal.cc @@ -832,7 +832,7 @@ public: if (r < 0) { return r; } - m_journaler.start_append(0, 0, 0, 0); + m_journaler.start_append(0); int r1 = 0; bufferlist bl; diff --git a/ceph/src/tools/rbd/action/Lock.cc b/ceph/src/tools/rbd/action/Lock.cc index 4ede92450..754cb384c 100644 --- a/ceph/src/tools/rbd/action/Lock.cc +++ b/ceph/src/tools/rbd/action/Lock.cc @@ -24,11 +24,14 @@ void add_id_option(po::options_description *positional) { ("lock-id", "unique lock id"); } -int get_id(const po::variables_map &vm, std::string *id) { - *id = utils::get_positional_argument(vm, 1); +int get_id(const po::variables_map &vm, size_t *arg_index, + std::string *id) { + *id = utils::get_positional_argument(vm, *arg_index); if (id->empty()) { std::cerr << "rbd: lock id was not specified" << std::endl; return -EINVAL; + } else { + ++(*arg_index); } return 0; } @@ -172,7 +175,7 @@ int execute_add(const po::variables_map &vm, } std::string lock_cookie; - r = get_id(vm, &lock_cookie); + r = get_id(vm, &arg_index, &lock_cookie); if (r < 0) { return r; } @@ -233,12 +236,12 @@ int execute_remove(const po::variables_map &vm, } std::string lock_cookie; - r = get_id(vm, &lock_cookie); + r = get_id(vm, &arg_index, &lock_cookie); if (r < 0) { return r; } - std::string lock_client
= utils::get_positional_argument(vm, 2); + std::string lock_client = utils::get_positional_argument(vm, arg_index); if (lock_client.empty()) { std::cerr << "rbd: locker was not specified" << std::endl; return -EINVAL; diff --git a/ceph/src/tools/rbd_mirror/CMakeLists.txt b/ceph/src/tools/rbd_mirror/CMakeLists.txt index fb39f9c52..6184e418d 100644 --- a/ceph/src/tools/rbd_mirror/CMakeLists.txt +++ b/ceph/src/tools/rbd_mirror/CMakeLists.txt @@ -64,5 +64,6 @@ target_link_libraries(rbd-mirror cls_rbd_client cls_lock_client cls_journal_client - global) + global + ${ALLOC_LIBS}) install(TARGETS rbd-mirror DESTINATION bin) diff --git a/ceph/src/tools/rbd_mirror/ImageSyncThrottler.cc b/ceph/src/tools/rbd_mirror/ImageSyncThrottler.cc index c2e618bf4..b395a0127 100644 --- a/ceph/src/tools/rbd_mirror/ImageSyncThrottler.cc +++ b/ceph/src/tools/rbd_mirror/ImageSyncThrottler.cc @@ -51,11 +51,16 @@ template <typename I> void ImageSyncThrottler<I>::start_op(const std::string &id, Context *on_start) { dout(20) << "id=" << id << dendl; + int r = 0; { Mutex::Locker locker(m_lock); if (m_inflight_ops.count(id) > 0) { dout(20) << "duplicate for already started op " << id << dendl; + } else if (m_queued_ops.count(id) > 0) { + dout(20) << "duplicate for already queued op " << id << dendl; + std::swap(m_queued_ops[id], on_start); + r = -ENOENT; } else if (m_max_concurrent_syncs == 0 || m_inflight_ops.size() < m_max_concurrent_syncs) { ceph_assert(m_queue.empty()); @@ -64,14 +69,14 @@ void ImageSyncThrottler<I>::start_op(const std::string &id, Context *on_start) { << m_inflight_ops.size() << "/" << m_max_concurrent_syncs << "]" << dendl; } else { - m_queue.push_back(std::make_pair(id, on_start)); - on_start = nullptr; + m_queue.push_back(id); + std::swap(m_queued_ops[id], on_start); dout(20) << "image sync for " << id << " has been queued" << dendl; } } if (on_start != nullptr) { - on_start->complete(0); + on_start->complete(r); } } @@ -82,13 +87,12 @@ bool ImageSyncThrottler<I>::cancel_op(const std::string &id) { Context *on_start = nullptr; { Mutex::Locker locker(m_lock); - for (auto it = m_queue.begin(); it != m_queue.end(); ++it) { - if (it->first == id) { - on_start = it->second; - dout(20) << "canceled queued sync for " << id << dendl; - m_queue.erase(it); - break; - } + auto it = m_queued_ops.find(id); + if (it != m_queued_ops.end()) { + dout(20) << "canceled queued sync for " << id << dendl; + m_queue.remove(id); + on_start = it->second; + m_queued_ops.erase(it); } } @@ -115,12 +119,15 @@ void ImageSyncThrottler<I>::finish_op(const std::string &id) { m_inflight_ops.erase(id); if (m_inflight_ops.size() < m_max_concurrent_syncs && !m_queue.empty()) { - auto pair = m_queue.front(); - m_inflight_ops.insert(pair.first); - dout(20) << "ready to start sync for " << pair.first << " [" + auto id = m_queue.front(); + auto it = m_queued_ops.find(id); + ceph_assert(it != m_queued_ops.end()); + m_inflight_ops.insert(id); + dout(20) << "ready to start sync for " << id << " [" << m_inflight_ops.size() << "/" << m_max_concurrent_syncs << "]" << dendl; - on_start= pair.second; + on_start = it->second; + m_queued_ops.erase(it); m_queue.pop_front(); } } @@ -134,15 +141,16 @@ template <typename I> void ImageSyncThrottler<I>::drain(int r) { dout(20) << dendl; - std::list<std::pair<std::string, Context *>> queue; + std::map<std::string, Context *> queued_ops; { Mutex::Locker locker(m_lock); - std::swap(m_queue, queue); + std::swap(m_queued_ops, queued_ops); + m_queue.clear(); m_inflight_ops.clear(); } - for (auto &pair : queue) { - pair.second->complete(r); + for (auto &it : queued_ops) { + it.second->complete(r); } } @@
-159,12 +167,15 @@ void ImageSyncThrottler<I>::set_max_concurrent_syncs(uint32_t max) { while ((m_max_concurrent_syncs == 0 || m_inflight_ops.size() < m_max_concurrent_syncs) && !m_queue.empty()) { - auto pair = m_queue.front(); - m_inflight_ops.insert(pair.first); - dout(20) << "ready to start sync for " << pair.first << " [" + auto id = m_queue.front(); + m_inflight_ops.insert(id); + dout(20) << "ready to start sync for " << id << " [" << m_inflight_ops.size() << "/" << m_max_concurrent_syncs << "]" << dendl; - ops.push_back(pair.second); + auto it = m_queued_ops.find(id); + ceph_assert(it != m_queued_ops.end()); + ops.push_back(it->second); + m_queued_ops.erase(it); m_queue.pop_front(); } } diff --git a/ceph/src/tools/rbd_mirror/ImageSyncThrottler.h b/ceph/src/tools/rbd_mirror/ImageSyncThrottler.h index 8c8f75462..c0cda61e9 100644 --- a/ceph/src/tools/rbd_mirror/ImageSyncThrottler.h +++ b/ceph/src/tools/rbd_mirror/ImageSyncThrottler.h @@ -5,6 +5,7 @@ #define RBD_MIRROR_IMAGE_SYNC_THROTTLER_H #include <list> +#include <map> #include <set> #include <string> @@ -47,7 +48,8 @@ private: CephContext *m_cct; Mutex m_lock; uint32_t m_max_concurrent_syncs; - std::list<std::pair<std::string, Context *>> m_queue; + std::list<std::string> m_queue; + std::map<std::string, Context *> m_queued_ops; std::set<std::string> m_inflight_ops; const char **get_tracked_conf_keys() const override; diff --git a/ceph/src/tools/rbd_mirror/InstanceWatcher.cc b/ceph/src/tools/rbd_mirror/InstanceWatcher.cc index 5889e0135..d9e1ba233 100644 --- a/ceph/src/tools/rbd_mirror/InstanceWatcher.cc +++ b/ceph/src/tools/rbd_mirror/InstanceWatcher.cc @@ -1176,6 +1176,9 @@ void InstanceWatcher<I>::handle_sync_request(const std::string &instance_id, if (r == 0) { notify_sync_start(instance_id, sync_id); } + if (r == -ENOENT) { + r = 0; + } on_finish->complete(r); })); m_image_sync_throttler->start_op(sync_id, on_start); diff --git a/ceph/src/tools/rbd_nbd/rbd-nbd.cc b/ceph/src/tools/rbd_nbd/rbd-nbd.cc index 0e8fb16a8..5aa3ea109 100644 --- a/ceph/src/tools/rbd_nbd/rbd-nbd.cc +++ b/ceph/src/tools/rbd_nbd/rbd-nbd.cc @@ -89,7 +89,7 @@ static void usage() << " unmap <device path> Unmap nbd device\n" << " [options] list-mapped List mapped nbd devices\n" << "Map options:\n" - << " --device <device path> Specify nbd device path\n" + << " --device <device path> Specify nbd device path (/dev/nbd{num})\n" << " --read-only Map read-only\n" << " --nbds_max Override for module param nbds_max\n" << " --max_part Override for module param max_part\n" @@ -782,7 +782,10 @@ static int do_map(int argc, const char *argv[], Config *cfg) } } else { r = sscanf(cfg->devpath.c_str(), "/dev/nbd%d", &index); - if (r < 0) { + if (r <= 0) { + // r == 0 means an early matching failure; the error path below expects a negative value, so map it to -EINVAL + if (r == 0) + r = -EINVAL; cerr << "rbd-nbd: invalid device path: " << cfg->devpath << " (expected /dev/nbd{num})" << std::endl; goto close_fd; -- 2.39.2
