Commit | Line | Data |
---|---|---|
31f18b77 FG |
1 | # Schema |
2 | ||
3 | (This feature was released in v1.1.0) | |
4 | ||
5 | JSON Schema is a draft standard for describing the format of JSON data. The schema itself is also JSON data. By validating a JSON structure with JSON Schema, your code can safely access the DOM without manually checking types or whether a key exists. It can also ensure that serialized JSON conforms to a specified schema. | |
6 | ||
7 | RapidJSON implements a JSON Schema validator for [JSON Schema Draft v4](http://json-schema.org/documentation.html). If you are not familiar with JSON Schema, you may refer to [Understanding JSON Schema](http://spacetelescope.github.io/understanding-json-schema/). | |
8 | ||
9 | [TOC] | |
10 | ||
1e59de90 | 11 | # Basic Usage {#Basic} |
31f18b77 FG |
12 | |
13 | First, parse a JSON Schema into a `Document`, then compile the `Document` into a `SchemaDocument`. | |
14 | ||
15 | Second, construct a `SchemaValidator` with the `SchemaDocument`. Like a `Writer`, it handles SAX events, so you can call `document.Accept(validator)` to validate a document and then check the result. | |
16 | ||
17 | ~~~cpp | |
18 | #include "rapidjson/schema.h" | |
19 | ||
20 | // ... | |
21 | ||
22 | Document sd; | |
1e59de90 | 23 | if (sd.Parse(schemaJson).HasParseError()) { |
31f18b77 FG |
24 | // The schema is not valid JSON. |
25 | // ... | |
26 | } | |
27 | SchemaDocument schema(sd); // Compile a Document to SchemaDocument | |
28 | // sd is no longer needed here. | |
29 | ||
30 | Document d; | |
1e59de90 | 31 | if (d.Parse(inputJson).HasParseError()) { |
31f18b77 FG |
32 | // The input is not valid JSON. |
33 | // ... | |
34 | } | |
35 | ||
36 | SchemaValidator validator(schema); | |
37 | if (!d.Accept(validator)) { | |
38 | // Input JSON is invalid according to the schema | |
39 | // Output diagnostic information | |
40 | StringBuffer sb; | |
41 | validator.GetInvalidSchemaPointer().StringifyUriFragment(sb); | |
42 | printf("Invalid schema: %s\n", sb.GetString()); | |
43 | printf("Invalid keyword: %s\n", validator.GetInvalidSchemaKeyword()); | |
44 | sb.Clear(); | |
45 | validator.GetInvalidDocumentPointer().StringifyUriFragment(sb); | |
46 | printf("Invalid document: %s\n", sb.GetString()); | |
47 | } | |
48 | ~~~ | |
49 | ||
50 | Some notes: | |
51 | ||
1e59de90 | 52 | * One `SchemaDocument` can be referenced by multiple `SchemaValidator`s. It will not be modified by `SchemaValidator`s. |
31f18b77 FG |
53 | * A `SchemaValidator` may be reused to validate multiple documents. To run it for other documents, call `validator.Reset()` first. |
54 | ||
1e59de90 | 55 | # Validation during parsing/serialization {#Fused} |
31f18b77 FG |
56 | |
57 | Unlike most JSON Schema validator implementations, RapidJSON provides a SAX-based schema validator. Therefore, you can parse JSON from a stream while validating it on the fly. If the validator encounters a JSON value that violates the supplied schema, parsing is terminated immediately. This design is especially useful for parsing large JSON files. | |
58 | ||
1e59de90 | 59 | ## DOM parsing {#DOM} |
31f18b77 FG |
60 | |
61 | When parsing into a DOM, `Document` needs some preparation and finalizing tasks in addition to receiving SAX events, so some work is needed to route events between the reader, the validator and the document. `SchemaValidatingReader` is a helper class that does this work. | |
62 | ||
63 | ~~~cpp | |
64 | #include "rapidjson/filereadstream.h" | |
65 | ||
66 | // ... | |
67 | SchemaDocument schema(sd); // Compile a Document to SchemaDocument | |
68 | ||
69 | // Use reader to parse the JSON | |
70 | FILE* fp = fopen("big.json", "r"); | |
71 | FileReadStream is(fp, buffer, sizeof(buffer)); | |
72 | ||
73 | // Parse JSON from reader, validate the SAX events, and store in d. | |
74 | Document d; | |
75 | SchemaValidatingReader<kParseDefaultFlags, FileReadStream, UTF8<> > reader(is, schema); | |
76 | d.Populate(reader); | |
77 | ||
78 | if (!reader.GetParseResult()) { | |
79 | // Not a valid JSON | |
80 | // When reader.GetParseResult().Code() == kParseErrorTermination, | |
81 | // it may be terminated by: | |
82 | // (1) the validator found that the JSON is invalid according to schema; or | |
83 | // (2) the input stream has I/O error. | |
84 | ||
85 | // Check the validation result | |
86 | if (!reader.IsValid()) { | |
87 | // Input JSON is invalid according to the schema | |
88 | // Output diagnostic information | |
89 | StringBuffer sb; | |
90 | reader.GetInvalidSchemaPointer().StringifyUriFragment(sb); | |
91 | printf("Invalid schema: %s\n", sb.GetString()); | |
92 | printf("Invalid keyword: %s\n", reader.GetInvalidSchemaKeyword()); | |
93 | sb.Clear(); | |
94 | reader.GetInvalidDocumentPointer().StringifyUriFragment(sb); | |
95 | printf("Invalid document: %s\n", sb.GetString()); | |
96 | } | |
97 | } | |
98 | ~~~ | |
99 | ||
1e59de90 | 100 | ## SAX parsing {#SAX} |
31f18b77 FG |
101 | |
102 | With SAX parsing, it is much simpler. If you only need to validate the JSON without further processing, it is simply: | |
103 | ||
104 | ~~~cpp | |
105 | SchemaValidator validator(schema); | |
106 | Reader reader; | |
107 | if (!reader.Parse(stream, validator)) { | |
108 | if (!validator.IsValid()) { | |
109 | // ... | |
110 | } | |
111 | } | |
112 | ~~~ | |
113 | ||
114 | This is exactly the method used in the [schemavalidator](example/schemavalidator/schemavalidator.cpp) example. The distinct advantage is low memory usage, no matter how big the JSON is (the memory usage depends on the complexity of the schema). | |
115 | ||
116 | If you need to handle the SAX events further, then you need to use the template class `GenericSchemaValidator` to set the output handler of the validator: | |
117 | ||
118 | ~~~cpp | |
119 | MyHandler handler; | |
120 | GenericSchemaValidator<SchemaDocument, MyHandler> validator(schema, handler); | |
121 | Reader reader; | |
122 | if (!reader.Parse(ss, validator)) { | |
123 | if (!validator.IsValid()) { | |
124 | // ... | |
125 | } | |
126 | } | |
127 | ~~~ | |
128 | ||
1e59de90 | 129 | ## Serialization {#Serialization} |
31f18b77 FG |
130 | |
131 | It is also possible to validate during serialization. This ensures that the resulting JSON is valid according to the JSON schema. | |
132 | ||
133 | ~~~cpp | |
134 | StringBuffer sb; | |
135 | Writer<StringBuffer> writer(sb); | |
136 | GenericSchemaValidator<SchemaDocument, Writer<StringBuffer> > validator(schema, writer); | |
137 | if (!d.Accept(validator)) { | |
138 | // Some problem during Accept(), it may be validation or encoding issues. | |
139 | if (!validator.IsValid()) { | |
140 | // ... | |
141 | } | |
142 | } | |
143 | ~~~ | |
144 | ||
145 | Of course, if your application only needs SAX-style serialization, it can simply send SAX events to `SchemaValidator` instead of `Writer`. | |
146 | ||
1e59de90 | 147 | # Remote Schema {#Remote} |
31f18b77 FG |
148 | |
149 | JSON Schema supports the [`$ref` keyword](http://spacetelescope.github.io/understanding-json-schema/structuring.html), which is a [JSON pointer](doc/pointer.md) referencing a local or remote schema. A local pointer is prefixed with `#`, while a remote pointer is a relative or absolute URI. For example: | |
150 | ||
151 | ~~~js | |
152 | { "$ref": "definitions.json#/address" } | |
153 | ~~~ | |
154 | ||
155 | As `SchemaDocument` does not know how to resolve such a URI, it needs a user-provided `IRemoteSchemaDocumentProvider` instance to do so. | |
156 | ||
157 | ~~~cpp | |
158 | class MyRemoteSchemaDocumentProvider : public IRemoteSchemaDocumentProvider { | |
159 | public: | |
1e59de90 | 160 | virtual const SchemaDocument* GetRemoteDocument(const char* uri, SizeType length) { |
31f18b77 FG |
161 | // Resolve the URI and return a pointer to that schema. |
162 | } | |
163 | }; | |
164 | ||
165 | // ... | |
166 | ||
167 | MyRemoteSchemaDocumentProvider provider; | |
168 | SchemaDocument schema(sd, &provider); | |
169 | ~~~ | |
170 | ||
1e59de90 | 171 | # Conformance {#Conformance} |
31f18b77 FG |
172 | |
173 | RapidJSON passes 262 out of 263 tests in the [JSON Schema Test Suite](https://github.com/json-schema/JSON-Schema-Test-Suite) (JSON Schema Draft 4). | |
174 | ||
175 | The failed test is "changed scope ref invalid" of "change resolution scope" in `refRemote.json`. It fails because the `id` schema keyword and the URI combining function are not implemented. | |
176 | ||
177 | Besides, the `format` schema keyword for string values is ignored, since it is not required by the specification. | |
178 | ||
1e59de90 | 179 | ## Regular Expression {#Regex} |
31f18b77 FG |
180 | |
181 | The schema keywords `pattern` and `patternProperties` use regular expressions to match the required pattern. | |
182 | ||
183 | RapidJSON implements a simple NFA regular expression engine, which is used by default. It supports the following syntax. | |
184 | ||
185 | |Syntax|Description| | |
186 | |------|-----------| | |
187 | |`ab` | Concatenation | | |
1e59de90 | 188 | |<code>a|b</code> | Alternation | |
31f18b77 FG |
189 | |`a?` | Zero or one | |
190 | |`a*` | Zero or more | | |
191 | |`a+` | One or more | | |
192 | |`a{3}` | Exactly 3 times | | |
193 | |`a{3,}` | At least 3 times | | |
194 | |`a{3,5}`| 3 to 5 times | | |
195 | |`(ab)` | Grouping | | |
196 | |`^a` | At the beginning | | |
197 | |`a$` | At the end | | |
198 | |`.` | Any character | | |
199 | |`[abc]` | Character classes | | |
200 | |`[a-c]` | Character class range | | |
201 | |`[a-z0-9_]` | Character class combination | | |
202 | |`[^abc]` | Negated character classes | | |
203 | |`[^a-c]` | Negated character class range | | |
204 | |`[\b]` | Backspace (U+0008) | | |
1e59de90 | 205 | |<code>\\|</code>, `\\`, ... | Escape characters | |
31f18b77 FG |
206 | |`\f` | Form feed (U+000C) | |
207 | |`\n` | Line feed (U+000A) | | |
208 | |`\r` | Carriage return (U+000D) | | |
209 | |`\t` | Tab (U+0009) | | |
210 | |`\v` | Vertical tab (U+000B) | | |
211 | ||
212 | With a C++11 compiler, it is also possible to use `std::regex` by defining `RAPIDJSON_SCHEMA_USE_INTERNALREGEX=0` and `RAPIDJSON_SCHEMA_USE_STDREGEX=1`. If your schemas do not need `pattern` and `patternProperties`, you can set both macros to zero to disable this feature, which reduces code size. | |
213 | ||
1e59de90 | 214 | # Performance {#Performance} |
31f18b77 FG |
215 | |
216 | Most C++ JSON libraries do not yet support JSON Schema. So we tried to evaluate the performance of RapidJSON's JSON Schema validator according to [json-schema-benchmark](https://github.com/ebdrup/json-schema-benchmark), which tests 11 JavaScript libraries running on Node.js. | |
217 | ||
218 | That benchmark runs validations on the [JSON Schema Test Suite](https://github.com/json-schema/JSON-Schema-Test-Suite), in which some test suites and tests are excluded. We implemented the same benchmarking procedure in [`schematest.cpp`](test/perftest/schematest.cpp). | |
219 | ||
220 | On a MacBook Pro (2.8 GHz Intel Core i7), the following results were collected. | |
221 | ||
222 | |Validator|Relative speed|Number of test runs per second| | |
223 | |---------|:------------:|:----------------------------:| | |
224 | |RapidJSON|155%|30682| | |
225 | |[`ajv`](https://github.com/epoberezkin/ajv)|100%|19770 (± 1.31%)| | |
226 | |[`is-my-json-valid`](https://github.com/mafintosh/is-my-json-valid)|70%|13835 (± 2.84%)| | |
227 | |[`jsen`](https://github.com/bugventure/jsen)|57.7%|11411 (± 1.27%)| | |
228 | |[`schemasaurus`](https://github.com/AlexeyGrishin/schemasaurus)|26%|5145 (± 1.62%)| | |
229 | |[`themis`](https://github.com/playlyfe/themis)|19.9%|3935 (± 2.69%)| | |
230 | |[`z-schema`](https://github.com/zaggino/z-schema)|7%|1388 (± 0.84%)| | |
231 | |[`jsck`](https://github.com/pandastrike/jsck#readme)|3.1%|606 (± 2.84%)| | |
232 | |[`jsonschema`](https://github.com/tdegrunt/jsonschema#readme)|0.9%|185 (± 1.01%)| | |
233 | |[`skeemas`](https://github.com/Prestaul/skeemas#readme)|0.8%|154 (± 0.79%)| | |
234 | |tv4|0.5%|93 (± 0.94%)| | |
235 | |[`jayschema`](https://github.com/natesilva/jayschema)|0.1%|21 (± 1.14%)| | |
236 | ||
237 | That is, RapidJSON is about 1.5x as fast as the fastest JavaScript library (ajv) and about 1400x as fast as the slowest one. | |
1e59de90 TL |
238 | |
239 | # Schema violation reporting {#Reporting} | |
240 | ||
241 | (Unreleased as of 2017-09-20) | |
242 | ||
243 | When validating an instance against a JSON Schema, | |
244 | it is often desirable to report not only whether the instance is valid, | |
245 | but also the ways in which it violates the schema. | |
246 | ||
247 | The `SchemaValidator` class | |
248 | collects errors encountered during validation | |
249 | into a JSON `Value`. | |
250 | This error object can then be accessed as `validator.GetError()`. | |
251 | ||
252 | The structure of the error object is subject to change | |
253 | in future versions of RapidJSON, | |
254 | as there is no standard schema for violations. | |
255 | The details below this point are provisional only. | |
256 | ||
257 | ## General provisions {#ReportingGeneral} | |
258 | ||
259 | Validation of an instance value against a schema | |
260 | produces an error value. | |
261 | The error value is always an object. | |
262 | An empty object `{}` indicates the instance is valid. | |
263 | ||
264 | * The name of each member | |
265 | corresponds to the JSON Schema keyword that is violated. | |
266 | * The value is either an object describing a single violation, | |
267 | or an array of such objects. | |
268 | ||
269 | Each violation object contains two string-valued members | |
270 | named `instanceRef` and `schemaRef`. | |
271 | `instanceRef` contains the URI fragment serialization | |
272 | of a JSON Pointer to the instance subobject | |
273 | in which the violation was detected. | |
274 | `schemaRef` contains the URI of the schema | |
275 | and the fragment serialization of a JSON Pointer | |
276 | to the subschema that was violated. | |
277 | ||
278 | Individual violation objects can contain other keyword-specific members. | |
279 | These are detailed in the sections below. | |
280 | ||
281 | For example, validating this instance: | |
282 | ||
283 | ~~~json | |
284 | {"numbers": [1, 2, "3", 4, 5]} | |
285 | ~~~ | |
286 | ||
287 | against this schema: | |
288 | ||
289 | ~~~json | |
290 | { | |
291 | "type": "object", | |
292 | "properties": { | |
293 | "numbers": {"$ref": "numbers.schema.json"} | |
294 | } | |
295 | } | |
296 | ~~~ | |
297 | ||
298 | where `numbers.schema.json` refers | |
299 | (via a suitable `IRemoteSchemaDocumentProvider`) | |
300 | to this schema: | |
301 | ||
302 | ~~~json | |
303 | { | |
304 | "type": "array", | |
305 | "items": {"type": "number"} | |
306 | } | |
307 | ~~~ | |
308 | ||
309 | produces the following error object: | |
310 | ||
311 | ~~~json | |
312 | { | |
313 | "type": { | |
314 | "instanceRef": "#/numbers/2", | |
315 | "schemaRef": "numbers.schema.json#/items", | |
316 | "expected": ["number"], | |
317 | "actual": "string" | |
318 | } | |
319 | } | |
320 | ~~~ | |
321 | ||
322 | ## Validation keywords for numbers {#Numbers} | |
323 | ||
324 | ### multipleOf {#multipleof} | |
325 | ||
326 | * `expected`: required number strictly greater than 0. | |
327 | The value of the `multipleOf` keyword specified in the schema. | |
328 | * `actual`: required number. | |
329 | The instance value. | |
330 | ||
331 | ### maximum {#maximum} | |
332 | ||
333 | * `expected`: required number. | |
334 | The value of the `maximum` keyword specified in the schema. | |
335 | * `exclusiveMaximum`: optional boolean. | |
336 | This will be true if the schema specified `"exclusiveMaximum": true`, | |
337 | and will be omitted otherwise. | |
338 | * `actual`: required number. | |
339 | The instance value. | |
340 | ||
341 | ### minimum {#minimum} | |
342 | ||
343 | * `expected`: required number. | |
344 | The value of the `minimum` keyword specified in the schema. | |
345 | * `exclusiveMinimum`: optional boolean. | |
346 | This will be true if the schema specified `"exclusiveMinimum": true`, | |
347 | and will be omitted otherwise. | |
348 | * `actual`: required number. | |
349 | The instance value. | |
350 | ||
351 | ## Validation keywords for strings {#Strings} | |
352 | ||
353 | ### maxLength {#maxLength} | |
354 | ||
355 | * `expected`: required number greater than or equal to 0. | |
356 | The value of the `maxLength` keyword specified in the schema. | |
357 | * `actual`: required string. | |
358 | The instance value. | |
359 | ||
360 | ### minLength {#minLength} | |
361 | ||
362 | * `expected`: required number greater than or equal to 0. | |
363 | The value of the `minLength` keyword specified in the schema. | |
364 | * `actual`: required string. | |
365 | The instance value. | |
366 | ||
367 | ### pattern {#pattern} | |
368 | ||
369 | * `actual`: required string. | |
370 | The instance value. | |
371 | ||
372 | (The expected pattern is not reported | |
373 | because the internal representation in `SchemaDocument` | |
374 | does not store the pattern in original string form.) | |
375 | ||
376 | ## Validation keywords for arrays {#Arrays} | |
377 | ||
378 | ### additionalItems {#additionalItems} | |
379 | ||
380 | This keyword is reported | |
381 | when the value of `items` schema keyword is an array, | |
382 | the value of `additionalItems` is `false`, | |
383 | and the instance is an array | |
384 | with more items than specified in the `items` array. | |
385 | ||
386 | * `disallowed`: required integer greater than or equal to 0. | |
387 | The index of the first item that has no corresponding schema. | |
388 | ||
389 | ### maxItems and minItems {#maxItems-minItems} | |
390 | ||
391 | * `expected`: required integer greater than or equal to 0. | |
392 | The value of `maxItems` (respectively, `minItems`) | |
393 | specified in the schema. | |
394 | * `actual`: required integer greater than or equal to 0. | |
395 | Number of items in the instance array. | |
396 | ||
397 | ### uniqueItems {#uniqueItems} | |
398 | ||
399 | * `duplicates`: required array | |
400 | whose items are integers greater than or equal to 0. | |
401 | Indices of items of the instance that are equal. | |
402 | ||
403 | (RapidJSON only reports the first two equal items, | |
404 | for performance reasons.) | |
405 | ||
406 | ## Validation keywords for objects | |
407 | ||
408 | ### maxProperties and minProperties {#maxProperties-minProperties} | |
409 | ||
410 | * `expected`: required integer greater than or equal to 0. | |
411 | The value of `maxProperties` (respectively, `minProperties`) | |
412 | specified in the schema. | |
413 | * `actual`: required integer greater than or equal to 0. | |
414 | Number of properties in the instance object. | |
415 | ||
416 | ### required {#required} | |
417 | ||
418 | * `missing`: required array of one or more unique strings. | |
419 | The names of properties | |
420 | that are listed in the value of the `required` schema keyword | |
421 | but not present in the instance object. | |
422 | ||
423 | ### additionalProperties {#additionalProperties} | |
424 | ||
425 | This keyword is reported | |
426 | when the schema specifies `additionalProperties: false` | |
427 | and the name of a property of the instance is | |
428 | neither listed in the `properties` keyword | |
429 | nor matches any regular expression in the `patternProperties` keyword. | |
430 | ||
431 | * `disallowed`: required string. | |
432 | Name of the offending property of the instance. | |
433 | ||
434 | (For performance reasons, | |
435 | RapidJSON only reports the first such property encountered.) | |
436 | ||
437 | ### dependencies {#dependencies} | |
438 | ||
439 | * `errors`: required object with one or more properties. | |
440 | Names and values of its properties are described below. | |
441 | ||
442 | Recall that JSON Schema Draft 04 supports | |
443 | *schema dependencies*, | |
444 | where presence of a named *controlling* property | |
445 | requires the instance object to be valid against a subschema, | |
446 | and *property dependencies*, | |
447 | where presence of a controlling property | |
448 | requires other *dependent* properties to be also present. | |
449 | ||
450 | For a violated schema dependency, | |
451 | `errors` will contain a property | |
452 | with the name of the controlling property | |
453 | and its value will be the error object | |
454 | produced by validating the instance object | |
455 | against the dependent schema. | |
456 | ||
457 | For a violated property dependency, | |
458 | `errors` will contain a property | |
459 | with the name of the controlling property | |
460 | and its value will be an array of one or more unique strings | |
461 | listing the missing dependent properties. | |
462 | ||
463 | ## Validation keywords for any instance type {#AnyTypes} | |
464 | ||
465 | ### enum {#enum} | |
466 | ||
467 | This keyword has no additional properties | |
468 | beyond `instanceRef` and `schemaRef`. | |
469 | ||
470 | * The allowed values are not listed | |
471 | because `SchemaDocument` does not store them in original form. | |
472 | * The violating value is not reported | |
473 | because it might be unwieldy. | |
474 | ||
475 | If you need to report these details to your users, | |
476 | you can access the necessary information | |
477 | by following `instanceRef` and `schemaRef`. | |
478 | ||
479 | ### type {#type} | |
480 | ||
481 | * `expected`: required array of one or more unique strings, | |
482 | each of which is one of the seven primitive types | |
483 | defined by the JSON Schema Draft 04 Core specification. | |
484 | Lists the types allowed by the `type` schema keyword. | |
485 | * `actual`: required string, also one of seven primitive types. | |
486 | The primitive type of the instance. | |
487 | ||
488 | ### allOf, anyOf, and oneOf {#allOf-anyOf-oneOf} | |
489 | ||
490 | * `errors`: required array of at least one object. | |
491 | There will be as many items as there are subschemas | |
492 | in the `allOf`, `anyOf` or `oneOf` schema keyword, respectively. | |
493 | Each item will be the error value | |
494 | produced by validating the instance | |
495 | against the corresponding subschema. | |
496 | ||
497 | For `allOf`, at least one error value will be non-empty. | |
498 | For `anyOf`, all error values will be non-empty. | |
499 | For `oneOf`, either all error values will be non-empty, | |
500 | or more than one will be empty. | |
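
The three rules above can be restated as a small predicate over the per-subschema results. This is only an illustration of the rules, not RapidJSON code; the names `Combinator` and `IsViolated` are made up for the example.

```cpp
#include <cstddef>
#include <vector>

enum Combinator { kAllOf, kAnyOf, kOneOf };

// subschemaValid[i] is true when the error value for subschema i is the empty object.
// Returns true when the combinator keyword is violated, i.e. when it would be reported.
bool IsViolated(Combinator c, const std::vector<bool>& subschemaValid) {
    std::size_t validCount = 0;
    for (std::size_t i = 0; i < subschemaValid.size(); ++i)
        if (subschemaValid[i])
            ++validCount;
    switch (c) {
    case kAllOf: return validCount != subschemaValid.size();  // at least one non-empty error
    case kAnyOf: return validCount == 0;                      // every error is non-empty
    case kOneOf: return validCount != 1;                      // none valid, or more than one valid
    }
    return false;
}
```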
501 | ||
502 | ### not {#not} | |
503 | ||
504 | This keyword has no additional properties | |
505 | apart from `instanceRef` and `schemaRef`. |