]>
Commit | Line | Data |
---|---|---|
31f18b77 FG |
1 | # Internals |
2 | ||
3 | This section records some design and implementation details. | |
4 | ||
5 | [TOC] | |
6 | ||
7 | # Architecture {#Architecture} | |
8 | ||
9 | ## SAX and DOM | |
10 | ||
11 | The basic relationships of SAX and DOM is shown in the following UML diagram. | |
12 | ||
13 | ![Architecture UML class diagram](diagram/architecture.png) | |
14 | ||
15 | The core of the relationship is the `Handler` concept. From the SAX side, `Reader` parses a JSON from a stream and publish events to a `Handler`. `Writer` implements the `Handler` concept to handle the same set of events. From the DOM side, `Document` implements the `Handler` concept to build a DOM according to the events. `Value` supports a `Value::Accept(Handler&)` function, which traverses the DOM to publish events. | |
16 | ||
17 | With this design, SAX is not dependent on DOM. Even `Reader` and `Writer` have no dependencies between them. This provides flexibility to chain event publisher and handlers. Besides, `Value` does not depends on SAX as well. So, in addition to stringify a DOM to JSON, user may also stringify it to a XML writer, or do anything else. | |
18 | ||
19 | ## Utility Classes | |
20 | ||
21 | Both SAX and DOM APIs depends on 3 additional concepts: `Allocator`, `Encoding` and `Stream`. Their inheritance hierarchy is shown as below. | |
22 | ||
23 | ![Utility classes UML class diagram](diagram/utilityclass.png) | |
24 | ||
25 | # Value {#Value} | |
26 | ||
27 | `Value` (actually a typedef of `GenericValue<UTF8<>>`) is the core of DOM API. This section describes the design of it. | |
28 | ||
29 | ## Data Layout {#DataLayout} | |
30 | ||
31 | `Value` is a [variant type](http://en.wikipedia.org/wiki/Variant_type). In RapidJSON's context, an instance of `Value` can contain 1 of 6 JSON value types. This is possible by using `union`. Each `Value` contains two members: `union Data data_` and a`unsigned flags_`. The `flags_` indiciates the JSON type, and also additional information. | |
32 | ||
33 | The following tables show the data layout of each type. The 32-bit/64-bit columns indicates the size of the field in bytes. | |
34 | ||
35 | | Null | |32-bit|64-bit| | |
36 | |-------------------|----------------------------------|:----:|:----:| | |
37 | | (unused) | |4 |8 | | |
38 | | (unused) | |4 |4 | | |
39 | | (unused) | |4 |4 | | |
40 | | `unsigned flags_` | `kNullType kNullFlag` |4 |4 | | |
41 | ||
42 | | Bool | |32-bit|64-bit| | |
43 | |-------------------|----------------------------------------------------|:----:|:----:| | |
44 | | (unused) | |4 |8 | | |
45 | | (unused) | |4 |4 | | |
46 | | (unused) | |4 |4 | | |
47 | | `unsigned flags_` | `kBoolType` (either `kTrueFlag` or `kFalseFlag`) |4 |4 | | |
48 | ||
49 | | String | |32-bit|64-bit| | |
50 | |---------------------|-------------------------------------|:----:|:----:| | |
51 | | `Ch* str` | Pointer to the string (may own) |4 |8 | | |
52 | | `SizeType length` | Length of string |4 |4 | | |
53 | | (unused) | |4 |4 | | |
54 | | `unsigned flags_` | `kStringType kStringFlag ...` |4 |4 | | |
55 | ||
56 | | Object | |32-bit|64-bit| | |
57 | |---------------------|-------------------------------------|:----:|:----:| | |
58 | | `Member* members` | Pointer to array of members (owned) |4 |8 | | |
59 | | `SizeType size` | Number of members |4 |4 | | |
60 | | `SizeType capacity` | Capacity of members |4 |4 | | |
61 | | `unsigned flags_` | `kObjectType kObjectFlag` |4 |4 | | |
62 | ||
63 | | Array | |32-bit|64-bit| | |
64 | |---------------------|-------------------------------------|:----:|:----:| | |
65 | | `Value* values` | Pointer to array of values (owned) |4 |8 | | |
66 | | `SizeType size` | Number of values |4 |4 | | |
67 | | `SizeType capacity` | Capacity of values |4 |4 | | |
68 | | `unsigned flags_` | `kArrayType kArrayFlag` |4 |4 | | |
69 | ||
70 | | Number (Int) | |32-bit|64-bit| | |
71 | |---------------------|-------------------------------------|:----:|:----:| | |
72 | | `int i` | 32-bit signed integer |4 |4 | | |
73 | | (zero padding) | 0 |4 |4 | | |
74 | | (unused) | |4 |8 | | |
75 | | `unsigned flags_` | `kNumberType kNumberFlag kIntFlag kInt64Flag ...` |4 |4 | | |
76 | ||
77 | | Number (UInt) | |32-bit|64-bit| | |
78 | |---------------------|-------------------------------------|:----:|:----:| | |
79 | | `unsigned u` | 32-bit unsigned integer |4 |4 | | |
80 | | (zero padding) | 0 |4 |4 | | |
81 | | (unused) | |4 |8 | | |
82 | | `unsigned flags_` | `kNumberType kNumberFlag kUIntFlag kUInt64Flag ...` |4 |4 | | |
83 | ||
84 | | Number (Int64) | |32-bit|64-bit| | |
85 | |---------------------|-------------------------------------|:----:|:----:| | |
86 | | `int64_t i64` | 64-bit signed integer |8 |8 | | |
87 | | (unused) | |4 |8 | | |
88 | | `unsigned flags_` | `kNumberType kNumberFlag kInt64Flag ...` |4 |4 | | |
89 | ||
90 | | Number (Uint64) | |32-bit|64-bit| | |
91 | |---------------------|-------------------------------------|:----:|:----:| | |
92 | | `uint64_t i64` | 64-bit unsigned integer |8 |8 | | |
93 | | (unused) | |4 |8 | | |
94 | | `unsigned flags_` | `kNumberType kNumberFlag kInt64Flag ...` |4 |4 | | |
95 | ||
96 | | Number (Double) | |32-bit|64-bit| | |
97 | |---------------------|-------------------------------------|:----:|:----:| | |
98 | | `uint64_t i64` | Double precision floating-point |8 |8 | | |
99 | | (unused) | |4 |8 | | |
100 | | `unsigned flags_` | `kNumberType kNumberFlag kDoubleFlag` |4 |4 | | |
101 | ||
102 | Here are some notes: | |
103 | * To reduce memory consumption for 64-bit architecture, `SizeType` is typedef as `unsigned` instead of `size_t`. | |
104 | * Zero padding for 32-bit number may be placed after or before the actual type, according to the endianess. This makes possible for interpreting a 32-bit integer as a 64-bit integer, without any conversion. | |
105 | * An `Int` is always an `Int64`, but the converse is not always true. | |
106 | ||
107 | ## Flags {#Flags} | |
108 | ||
109 | The 32-bit `flags_` contains both JSON type and other additional information. As shown in the above tables, each JSON type contains redundant `kXXXType` and `kXXXFlag`. This design is for optimizing the operation of testing bit-flags (`IsNumber()`) and obtaining a sequential number for each type (`GetType()`). | |
110 | ||
111 | String has two optional flags. `kCopyFlag` means that the string owns a copy of the string. `kInlineStrFlag` means using [Short-String Optimization](#ShortString). | |
112 | ||
113 | Number is a bit more complicated. For normal integer values, it can contains `kIntFlag`, `kUintFlag`, `kInt64Flag` and/or `kUint64Flag`, according to the range of the integer. For numbers with fraction, and integers larger than 64-bit range, they will be stored as `double` with `kDoubleFlag`. | |
114 | ||
115 | ## Short-String Optimization {#ShortString} | |
116 | ||
117 | [Kosta](https://github.com/Kosta-Github) provided a very neat short-string optimization. The optimization idea is given as follow. Excluding the `flags_`, a `Value` has 12 or 16 bytes (32-bit or 64-bit) for storing actual data. Instead of storing a pointer to a string, it is possible to store short strings in these space internally. For encoding with 1-byte character type (e.g. `char`), it can store maximum 11 or 15 characters string inside the `Value` type. | |
118 | ||
119 | | ShortString (Ch=char) | |32-bit|64-bit| | |
120 | |---------------------|-------------------------------------|:----:|:----:| | |
121 | | `Ch str[MaxChars]` | String buffer |11 |15 | | |
122 | | `Ch invLength` | MaxChars - Length |1 |1 | | |
123 | | `unsigned flags_` | `kStringType kStringFlag ...` |4 |4 | | |
124 | ||
125 | A special technique is applied. Instead of storing the length of string directly, it stores (MaxChars - length). This make it possible to store 11 characters with trailing `\0`. | |
126 | ||
127 | This optimization can reduce memory usage for copy-string. It can also improve cache-coherence thus improve runtime performance. | |
128 | ||
129 | # Allocator {#InternalAllocator} | |
130 | ||
131 | `Allocator` is a concept in RapidJSON: | |
132 | ~~~cpp | |
133 | concept Allocator { | |
134 | static const bool kNeedFree; //!< Whether this allocator needs to call Free(). | |
135 | ||
136 | // Allocate a memory block. | |
137 | // \param size of the memory block in bytes. | |
138 | // \returns pointer to the memory block. | |
139 | void* Malloc(size_t size); | |
140 | ||
141 | // Resize a memory block. | |
142 | // \param originalPtr The pointer to current memory block. Null pointer is permitted. | |
143 | // \param originalSize The current size in bytes. (Design issue: since some allocator may not book-keep this, explicitly pass to it can save memory.) | |
144 | // \param newSize the new size in bytes. | |
145 | void* Realloc(void* originalPtr, size_t originalSize, size_t newSize); | |
146 | ||
147 | // Free a memory block. | |
148 | // \param pointer to the memory block. Null pointer is permitted. | |
149 | static void Free(void *ptr); | |
150 | }; | |
151 | ~~~ | |
152 | ||
153 | Note that `Malloc()` and `Realloc()` are member functions but `Free()` is static member function. | |
154 | ||
155 | ## MemoryPoolAllocator {#MemoryPoolAllocator} | |
156 | ||
157 | `MemoryPoolAllocator` is the default allocator for DOM. It allocate but do not free memory. This is suitable for building a DOM tree. | |
158 | ||
159 | Internally, it allocates chunks of memory from the base allocator (by default `CrtAllocator`) and stores the chunks as a singly linked list. When user requests an allocation, it allocates memory from the following order: | |
160 | ||
161 | 1. User supplied buffer if it is available. (See [User Buffer section in DOM](doc/dom.md)) | |
162 | 2. If user supplied buffer is full, use the current memory chunk. | |
163 | 3. If the current block is full, allocate a new block of memory. | |
164 | ||
165 | # Parsing Optimization {#ParsingOptimization} | |
166 | ||
167 | ## Skip Whitespaces with SIMD {#SkipwhitespaceWithSIMD} | |
168 | ||
169 | When parsing JSON from a stream, the parser need to skip 4 whitespace characters: | |
170 | ||
171 | 1. Space (`U+0020`) | |
172 | 2. Character Tabulation (`U+000B`) | |
173 | 3. Line Feed (`U+000A`) | |
174 | 4. Carriage Return (`U+000D`) | |
175 | ||
176 | A simple implementation will be simply: | |
177 | ~~~cpp | |
178 | void SkipWhitespace(InputStream& s) { | |
179 | while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t') | |
180 | s.Take(); | |
181 | } | |
182 | ~~~ | |
183 | ||
184 | However, this requires 4 comparisons and a few branching for each character. This was found to be a hot spot. | |
185 | ||
186 | To accelerate this process, SIMD was applied to compare 16 characters with 4 white spaces for each iteration. Currently RapidJSON only supports SSE2 and SSE4.2 instructions for this. And it is only activated for UTF-8 memory streams, including string stream or *in situ* parsing. | |
187 | ||
188 | To enable this optimization, need to define `RAPIDJSON_SSE2` or `RAPIDJSON_SSE42` before including `rapidjson.h`. Some compilers can detect the setting, as in `perftest.h`: | |
189 | ||
190 | ~~~cpp | |
191 | // __SSE2__ and __SSE4_2__ are recognized by gcc, clang, and the Intel compiler. | |
192 | // We use -march=native with gmake to enable -msse2 and -msse4.2, if supported. | |
193 | #if defined(__SSE4_2__) | |
194 | # define RAPIDJSON_SSE42 | |
195 | #elif defined(__SSE2__) | |
196 | # define RAPIDJSON_SSE2 | |
197 | #endif | |
198 | ~~~ | |
199 | ||
200 | Note that, these are compile-time settings. Running the executable on a machine without such instruction set support will make it crash. | |
201 | ||
202 | ### Page boundary issue | |
203 | ||
204 | In an early version of RapidJSON, [an issue](https://code.google.com/archive/p/rapidjson/issues/104) reported that the `SkipWhitespace_SIMD()` causes crash very rarely (around 1 in 500,000). After investigation, it is suspected that `_mm_loadu_si128()` accessed bytes after `'\0'`, and across a protected page boundary. | |
205 | ||
206 | In [Intel® 64 and IA-32 Architectures Optimization Reference Manual | |
207 | ](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), section 10.2.1: | |
208 | ||
209 | > To support algorithms requiring unaligned 128-bit SIMD memory accesses, memory buffer allocation by a caller function should consider adding some pad space so that a callee function can safely use the address pointer safely with unaligned 128-bit SIMD memory operations. | |
210 | > The minimal padding size should be the width of the SIMD register that might be used in conjunction with unaligned SIMD memory access. | |
211 | ||
212 | This is not feasible as RapidJSON should not enforce such requirement. | |
213 | ||
214 | To fix this issue, currently the routine process bytes up to the next aligned address. After tha, use aligned read to perform SIMD processing. Also see [#85](https://github.com/miloyip/rapidjson/issues/85). | |
215 | ||
216 | ## Local Stream Copy {#LocalStreamCopy} | |
217 | ||
218 | During optimization, it is found that some compilers cannot localize some member data access of streams into local variables or registers. Experimental results show that for some stream types, making a copy of the stream and used it in inner-loop can improve performance. For example, the actual (non-SIMD) implementation of `SkipWhitespace()` is implemented as: | |
219 | ||
220 | ~~~cpp | |
221 | template<typename InputStream> | |
222 | void SkipWhitespace(InputStream& is) { | |
223 | internal::StreamLocalCopy<InputStream> copy(is); | |
224 | InputStream& s(copy.s); | |
225 | ||
226 | while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t') | |
227 | s.Take(); | |
228 | } | |
229 | ~~~ | |
230 | ||
231 | Depending on the traits of stream, `StreamLocalCopy` will make (or not make) a copy of the stream object, use it locally and copy the states of stream back to the original stream. | |
232 | ||
233 | ## Parsing to Double {#ParsingDouble} | |
234 | ||
235 | Parsing string into `double` is difficult. The standard library function `strtod()` can do the job but it is slow. By default, the parsers use normal precision setting. This has has maximum 3 [ULP](http://en.wikipedia.org/wiki/Unit_in_the_last_place) error and implemented in `internal::StrtodNormalPrecision()`. | |
236 | ||
237 | When using `kParseFullPrecisionFlag`, the parsers calls `internal::StrtodFullPrecision()` instead, and this function actually implemented 3 versions of conversion methods. | |
238 | 1. [Fast-Path](http://www.exploringbinary.com/fast-path-decimal-to-floating-point-conversion/). | |
239 | 2. Custom DIY-FP implementation as in [double-conversion](https://github.com/floitsch/double-conversion). | |
240 | 3. Big Integer Method as in (Clinger, William D. How to read floating point numbers accurately. Vol. 25. No. 6. ACM, 1990). | |
241 | ||
242 | If the first conversion methods fail, it will try the second, and so on. | |
243 | ||
244 | # Generation Optimization {#GenerationOptimization} | |
245 | ||
246 | ## Integer-to-String conversion {#itoa} | |
247 | ||
248 | The naive algorithm for integer-to-string conversion involves division per each decimal digit. We have implemented various implementations and evaluated them in [itoa-benchmark](https://github.com/miloyip/itoa-benchmark). | |
249 | ||
250 | Although SSE2 version is the fastest but the difference is minor by comparing to the first running-up `branchlut`. And `branchlut` is pure C++ implementation so we adopt `branchlut` in RapidJSON. | |
251 | ||
252 | ## Double-to-String conversion {#dtoa} | |
253 | ||
254 | Originally RapidJSON uses `snprintf(..., ..., "%g")` to achieve double-to-string conversion. This is not accurate as the default precision is 6. Later we also find that this is slow and there is an alternative. | |
255 | ||
256 | Google's V8 [double-conversion](https://github.com/floitsch/double-conversion | |
257 | ) implemented a newer, fast algorithm called Grisu3 (Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM Sigplan Notices 45.6 (2010): 233-243.). | |
258 | ||
259 | However, since it is not header-only so that we implemented a header-only version of Grisu2. This algorithm guarantees that the result is always accurate. And in most of cases it produces the shortest (optimal) string representation. | |
260 | ||
261 | The header-only conversion function has been evaluated in [dtoa-benchmark](https://github.com/miloyip/dtoa-benchmark). | |
262 | ||
263 | # Parser {#Parser} | |
264 | ||
265 | ## Iterative Parser {#IterativeParser} | |
266 | ||
267 | The iterative parser is a recursive descent LL(1) parser | |
268 | implemented in a non-recursive manner. | |
269 | ||
270 | ### Grammar {#IterativeParserGrammar} | |
271 | ||
272 | The grammar used for this parser is based on strict JSON syntax: | |
273 | ~~~~~~~~~~ | |
274 | S -> array | object | |
275 | array -> [ values ] | |
276 | object -> { members } | |
277 | values -> non-empty-values | ε | |
278 | non-empty-values -> value addition-values | |
279 | addition-values -> ε | , non-empty-values | |
280 | members -> non-empty-members | ε | |
281 | non-empty-members -> member addition-members | |
282 | addition-members -> ε | , non-empty-members | |
283 | member -> STRING : value | |
284 | value -> STRING | NUMBER | NULL | BOOLEAN | object | array | |
285 | ~~~~~~~~~~ | |
286 | ||
287 | Note that left factoring is applied to non-terminals `values` and `members` | |
288 | to make the grammar be LL(1). | |
289 | ||
290 | ### Parsing Table {#IterativeParserParsingTable} | |
291 | ||
292 | Based on the grammar, we can construct the FIRST and FOLLOW set. | |
293 | ||
294 | The FIRST set of non-terminals is listed below: | |
295 | ||
296 | | NON-TERMINAL | FIRST | | |
297 | |:-----------------:|:--------------------------------:| | |
298 | | array | [ | | |
299 | | object | { | | |
300 | | values | ε STRING NUMBER NULL BOOLEAN { [ | | |
301 | | addition-values | ε COMMA | | |
302 | | members | ε STRING | | |
303 | | addition-members | ε COMMA | | |
304 | | member | STRING | | |
305 | | value | STRING NUMBER NULL BOOLEAN { [ | | |
306 | | S | [ { | | |
307 | | non-empty-members | STRING | | |
308 | | non-empty-values | STRING NUMBER NULL BOOLEAN { [ | | |
309 | ||
310 | The FOLLOW set is listed below: | |
311 | ||
312 | | NON-TERMINAL | FOLLOW | | |
313 | |:-----------------:|:-------:| | |
314 | | S | $ | | |
315 | | array | , $ } ] | | |
316 | | object | , $ } ] | | |
317 | | values | ] | | |
318 | | non-empty-values | ] | | |
319 | | addition-values | ] | | |
320 | | members | } | | |
321 | | non-empty-members | } | | |
322 | | addition-members | } | | |
323 | | member | , } | | |
324 | | value | , } ] | | |
325 | ||
326 | Finally the parsing table can be constructed from FIRST and FOLLOW set: | |
327 | ||
328 | | NON-TERMINAL | [ | { | , | : | ] | } | STRING | NUMBER | NULL | BOOLEAN | | |
329 | |:-----------------:|:---------------------:|:---------------------:|:-------------------:|:-:|:-:|:-:|:-----------------------:|:---------------------:|:---------------------:|:---------------------:| | |
330 | | S | array | object | | | | | | | | | | |
331 | | array | [ values ] | | | | | | | | | | | |
332 | | object | | { members } | | | | | | | | | | |
333 | | values | non-empty-values | non-empty-values | | | ε | | non-empty-values | non-empty-values | non-empty-values | non-empty-values | | |
334 | | non-empty-values | value addition-values | value addition-values | | | | | value addition-values | value addition-values | value addition-values | value addition-values | | |
335 | | addition-values | | | , non-empty-values | | ε | | | | | | | |
336 | | members | | | | | | ε | non-empty-members | | | | | |
337 | | non-empty-members | | | | | | | member addition-members | | | | | |
338 | | addition-members | | | , non-empty-members | | | ε | | | | | | |
339 | | member | | | | | | | STRING : value | | | | | |
340 | | value | array | object | | | | | STRING | NUMBER | NULL | BOOLEAN | | |
341 | ||
342 | There is a great [tool](http://hackingoff.com/compilers/predict-first-follow-set) for above grammar analysis. | |
343 | ||
344 | ### Implementation {#IterativeParserImplementation} | |
345 | ||
346 | Based on the parsing table, a direct(or conventional) implementation | |
347 | that pushes the production body in reverse order | |
348 | while generating a production could work. | |
349 | ||
350 | In RapidJSON, several modifications(or adaptations to current design) are made to a direct implementation. | |
351 | ||
352 | First, the parsing table is encoded in a state machine in RapidJSON. | |
353 | States are constructed by the head and body of production. | |
354 | State transitions are constructed by production rules. | |
355 | Besides, extra states are added for productions involved with `array` and `object`. | |
356 | In this way the generation of array values or object members would be a single state transition, | |
357 | rather than several pop/push operations in the direct implementation. | |
358 | This also makes the estimation of stack size more easier. | |
359 | ||
360 | The state diagram is shown as follows: | |
361 | ||
362 | ![State Diagram](diagram/iterative-parser-states-diagram.png) | |
363 | ||
364 | Second, the iterative parser also keeps track of array's value count and object's member count | |
365 | in its internal stack, which may be different from a conventional implementation. |