Commit | Line | Data |
---|---|---|
31f18b77 FG |
1 | # Schema |
2 | ||
3 | (This feature was released in v1.1.0) | |
4 | ||
5 | JSON Schema is a draft standard for describing the format of JSON data. The schema itself is also JSON data. By validating a JSON structure with JSON Schema, your code can safely access the DOM without manually checking types, whether a key exists, and so on. It can also ensure that the serialized JSON conforms to a specified schema. | |
6 | ||
7 | RapidJSON implements a JSON Schema validator for [JSON Schema Draft v4](http://json-schema.org/documentation.html). If you are not familiar with JSON Schema, you may refer to [Understanding JSON Schema](http://spacetelescope.github.io/understanding-json-schema/). | |
8 | ||
9 | [TOC] | |
10 | ||
11 | ## Basic Usage | |
12 | ||
13 | First, parse a JSON Schema into a `Document`, and then compile the `Document` into a `SchemaDocument`. | |
14 | ||
15 | Then construct a `SchemaValidator` with the `SchemaDocument`. It is similar to a `Writer` in that it handles SAX events, so you can use `document.Accept(validator)` to validate a document and then check the validity. | |
16 | ||
17 | ~~~cpp | |
18 | #include "rapidjson/schema.h" | |
19 | ||
20 | // ... | |
21 | ||
22 | Document sd; | |
23 | if (sd.Parse(schemaJson).HasParseError()) { | |
24 | // the schema is not valid JSON. | |
25 | // ... | |
26 | } | |
27 | SchemaDocument schema(sd); // Compile a Document to SchemaDocument | |
28 | // sd is no longer needed here. | |
29 | ||
30 | Document d; | |
31 | if (d.Parse(inputJson).HasParseError()) { | |
32 | // the input is not valid JSON. | |
33 | // ... | |
34 | } | |
35 | ||
36 | SchemaValidator validator(schema); | |
37 | if (!d.Accept(validator)) { | |
38 | // Input JSON is invalid according to the schema | |
39 | // Output diagnostic information | |
40 | StringBuffer sb; | |
41 | validator.GetInvalidSchemaPointer().StringifyUriFragment(sb); | |
42 | printf("Invalid schema: %s\n", sb.GetString()); | |
43 | printf("Invalid keyword: %s\n", validator.GetInvalidSchemaKeyword()); | |
44 | sb.Clear(); | |
45 | validator.GetInvalidDocumentPointer().StringifyUriFragment(sb); | |
46 | printf("Invalid document: %s\n", sb.GetString()); | |
47 | } | |
48 | ~~~ | |
49 | ||
50 | Some notes: | |
51 | ||
52 | * One `SchemaDocument` can be referenced by multiple `SchemaValidator`s. It will not be modified by `SchemaValidator`s. | |
53 | * A `SchemaValidator` may be reused to validate multiple documents. To run it for other documents, call `validator.Reset()` first. | |
54 | ||
55 | ## Validation during parsing/serialization | |
56 | ||
57 | Unlike most JSON Schema validator implementations, RapidJSON provides a SAX-based schema validator. Therefore, you can parse JSON from a stream while validating it on the fly. If the validator encounters a JSON value that violates the supplied schema, parsing is terminated immediately. This design is especially useful for parsing large JSON files. | |
58 | ||
59 | ### DOM parsing | |
60 | ||
61 | When parsing into a DOM, `Document` needs some preparation and finalization in addition to receiving SAX events, so some work is needed to connect the reader, the validator and the document. `SchemaValidatingReader` is a helper class that does this work. | |
62 | ||
63 | ~~~cpp | |
64 | #include "rapidjson/filereadstream.h" | |
65 | ||
66 | // ... | |
67 | SchemaDocument schema(sd); // Compile a Document to SchemaDocument | |
68 | ||
69 | // Use reader to parse the JSON | |
70 | FILE* fp = fopen("big.json", "r"); | |
71 | FileReadStream is(fp, buffer, sizeof(buffer)); // buffer is a caller-provided char array, e.g. char buffer[4096]; | |
72 | ||
73 | // Parse JSON from reader, validate the SAX events, and store in d. | |
74 | Document d; | |
75 | SchemaValidatingReader<kParseDefaultFlags, FileReadStream, UTF8<> > reader(is, schema); | |
76 | d.Populate(reader); | |
77 | ||
78 | if (!reader.GetParseResult()) { | |
79 | // Not a valid JSON | |
80 | // When reader.GetParseResult().Code() == kParseErrorTermination, | |
81 | // it may be terminated by: | |
82 | // (1) the validator found that the JSON is invalid according to schema; or | |
83 | // (2) the input stream has I/O error. | |
84 | ||
85 | // Check the validation result | |
86 | if (!reader.IsValid()) { | |
87 | // Input JSON is invalid according to the schema | |
88 | // Output diagnostic information | |
89 | StringBuffer sb; | |
90 | reader.GetInvalidSchemaPointer().StringifyUriFragment(sb); | |
91 | printf("Invalid schema: %s\n", sb.GetString()); | |
92 | printf("Invalid keyword: %s\n", reader.GetInvalidSchemaKeyword()); | |
93 | sb.Clear(); | |
94 | reader.GetInvalidDocumentPointer().StringifyUriFragment(sb); | |
95 | printf("Invalid document: %s\n", sb.GetString()); | |
96 | } | |
97 | } | |
98 | ~~~ | |
99 | ||
100 | ### SAX parsing | |
101 | ||
102 | SAX parsing is much simpler. If you only need to validate the JSON without further processing, it is simply: | |
103 | ||
104 | ~~~cpp | |
105 | SchemaValidator validator(schema); | |
106 | Reader reader; | |
107 | if (!reader.Parse(stream, validator)) { | |
108 | if (!validator.IsValid()) { | |
109 | // ... | |
110 | } | |
111 | } | |
112 | ~~~ | |
113 | ||
114 | This is exactly the method used in the [schemavalidator](example/schemavalidator/schemavalidator.cpp) example. The distinct advantage is low memory usage, no matter how big the JSON is (the memory usage depends on the complexity of the schema). | |
115 | ||
116 | If you need to handle the SAX events further, then you need to use the template class `GenericSchemaValidator` to set the output handler of the validator: | |
117 | ||
118 | ~~~cpp | |
119 | MyHandler handler; | |
120 | GenericSchemaValidator<SchemaDocument, MyHandler> validator(schema, handler); | |
121 | Reader reader; | |
122 | if (!reader.Parse(ss, validator)) { | |
123 | if (!validator.IsValid()) { | |
124 | // ... | |
125 | } | |
126 | } | |
127 | ~~~ | |
128 | ||
129 | ### Serialization | |
130 | ||
131 | It is also possible to validate during serialization. This ensures that the resulting JSON is valid according to the JSON schema. | |
132 | ||
133 | ~~~cpp | |
134 | StringBuffer sb; | |
135 | Writer<StringBuffer> writer(sb); | |
136 | GenericSchemaValidator<SchemaDocument, Writer<StringBuffer> > validator(schema, writer); | |
137 | if (!d.Accept(validator)) { | |
138 | // Some problem during Accept(), it may be validation or encoding issues. | |
139 | if (!validator.IsValid()) { | |
140 | // ... | |
141 | } | |
142 | } | |
143 | ~~~ | |
144 | ||
145 | Of course, if your application only needs SAX-style serialization, it can simply send SAX events to `SchemaValidator` instead of `Writer`. | |
146 | ||
147 | ## Remote Schema | |
148 | ||
149 | JSON Schema supports the [`$ref` keyword](http://spacetelescope.github.io/understanding-json-schema/structuring.html), which is a [JSON pointer](doc/pointer.md) referencing a local or remote schema. A local pointer is prefixed with `#`, while a remote pointer is a relative or absolute URI. For example: | |
150 | ||
151 | ~~~js | |
152 | { "$ref": "definitions.json#/address" } | |
153 | ~~~ | |
154 | ||
155 | As `SchemaDocument` does not know how to resolve such URI, it needs a user-provided `IRemoteSchemaDocumentProvider` instance to do so. | |
156 | ||
157 | ~~~cpp | |
158 | class MyRemoteSchemaDocumentProvider : public IRemoteSchemaDocumentProvider { | |
159 | public: | |
160 | virtual const SchemaDocument* GetRemoteDocument(const char* uri, SizeType length) { | |
161 | // Resolve the URI and return a pointer to that schema. | |
162 | } | |
163 | }; | |
164 | ||
165 | // ... | |
166 | ||
167 | MyRemoteSchemaDocumentProvider provider; | |
168 | SchemaDocument schema(sd, &provider); | |
169 | ~~~ | |
170 | ||
171 | ## Conformance | |
172 | ||
173 | RapidJSON passed 262 out of 263 tests in [JSON Schema Test Suite](https://github.com/json-schema/JSON-Schema-Test-Suite) (JSON Schema draft 4). | |
174 | ||
175 | The failed test is "changed scope ref invalid" of "change resolution scope" in `refRemote.json`. It fails because the `id` schema keyword and the URI combining function are not implemented. | |
176 | ||
177 | Besides, the `format` schema keyword for string values is ignored, since it is not required by the specification. | |
178 | ||
179 | ### Regular Expression | |
180 | ||
181 | The schema keywords `pattern` and `patternProperties` use regular expressions to match the required pattern. | |
182 | ||
183 | RapidJSON implemented a simple NFA regular expression engine, which is used by default. It supports the following syntax. | |
184 | ||
185 | |Syntax|Description| | |
186 | |------|-----------| | |
187 | |`ab` | Concatenation | | |
188 | |`a|b` | Alternation | | |
189 | |`a?` | Zero or one | | |
190 | |`a*` | Zero or more | | |
191 | |`a+` | One or more | | |
192 | |`a{3}` | Exactly 3 times | | |
193 | |`a{3,}` | At least 3 times | | |
194 | |`a{3,5}`| 3 to 5 times | | |
195 | |`(ab)` | Grouping | | |
196 | |`^a` | At the beginning | | |
197 | |`a$` | At the end | | |
198 | |`.` | Any character | | |
199 | |`[abc]` | Character classes | | |
200 | |`[a-c]` | Character class range | | |
201 | |`[a-z0-9_]` | Character class combination | | |
202 | |`[^abc]` | Negated character classes | | |
203 | |`[^a-c]` | Negated character class range | | |
204 | |`[\b]` | Backspace (U+0008) | | |
205 | |`\|`, `\\`, ... | Escape characters | | |
206 | |`\f` | Form feed (U+000C) | | |
207 | |`\n` | Line feed (U+000A) | | |
208 | |`\r` | Carriage return (U+000D) | | |
209 | |`\t` | Tab (U+0009) | | |
210 | |`\v` | Vertical tab (U+000B) | | |
211 | ||
212 | With a C++11 compiler, it is also possible to use `std::regex` by defining `RAPIDJSON_SCHEMA_USE_INTERNALREGEX=0` and `RAPIDJSON_SCHEMA_USE_STDREGEX=1`. If your schemas do not need `pattern` and `patternProperties`, you can set both macros to zero to disable this feature, which reduces code size. | |
213 | ||
214 | ## Performance | |
215 | ||
216 | Most C++ JSON libraries do not yet support JSON Schema. So we tried to evaluate the performance of RapidJSON's JSON Schema validator according to [json-schema-benchmark](https://github.com/ebdrup/json-schema-benchmark), which tests 11 JavaScript libraries running on Node.js. | |
217 | ||
218 | That benchmark runs validations on [JSON Schema Test Suite](https://github.com/json-schema/JSON-Schema-Test-Suite), in which some test suites and tests are excluded. We implemented the same benchmarking procedure in [`schematest.cpp`](test/perftest/schematest.cpp). | |
219 | ||
220 | On a MacBook Pro (2.8 GHz Intel Core i7), the following results were collected. | |
221 | ||
222 | |Validator|Relative speed|Number of test runs per second| | |
223 | |---------|:------------:|:----------------------------:| | |
224 | |RapidJSON|155%|30682| | |
225 | |[`ajv`](https://github.com/epoberezkin/ajv)|100%|19770 (± 1.31%)| | |
226 | |[`is-my-json-valid`](https://github.com/mafintosh/is-my-json-valid)|70%|13835 (± 2.84%)| | |
227 | |[`jsen`](https://github.com/bugventure/jsen)|57.7%|11411 (± 1.27%)| | |
228 | |[`schemasaurus`](https://github.com/AlexeyGrishin/schemasaurus)|26%|5145 (± 1.62%)| | |
229 | |[`themis`](https://github.com/playlyfe/themis)|19.9%|3935 (± 2.69%)| | |
230 | |[`z-schema`](https://github.com/zaggino/z-schema)|7%|1388 (± 0.84%)| | |
231 | |[`jsck`](https://github.com/pandastrike/jsck#readme)|3.1%|606 (± 2.84%)| | |
232 | |[`jsonschema`](https://github.com/tdegrunt/jsonschema#readme)|0.9%|185 (± 1.01%)| | |
233 | |[`skeemas`](https://github.com/Prestaul/skeemas#readme)|0.8%|154 (± 0.79%)| | |
234 | |tv4|0.5%|93 (± 0.94%)| | |
235 | |[`jayschema`](https://github.com/natesilva/jayschema)|0.1%|21 (± 1.14%)| | |
236 | ||
237 | That is, RapidJSON is about 1.5x as fast as the fastest JavaScript library (ajv), and about 1400x as fast as the slowest one. |
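
The "Relative speed" column is each validator's throughput divided by that of `ajv`. A small sketch of that arithmetic, using a hypothetical helper name:

```cpp
// Relative speed as a percentage of a baseline's throughput (test runs per second).
double RelativeSpeedPercent(double runsPerSecond, double baselineRunsPerSecond) {
    return 100.0 * runsPerSecond / baselineRunsPerSecond;
}
```

For the figures in the table, `RelativeSpeedPercent(30682, 19770)` is roughly 155, and dividing RapidJSON's 30682 runs per second by the 21 of `jayschema` gives a ratio of roughly 1460.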