Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/============================================================================== |
2 | Copyright (C) 2001-2011 Joel de Guzman | |
3 | Copyright (C) 2001-2011 Hartmut Kaiser | |
4 | Copyright (C) 2009 Andreas Haberstroh? | |
5 | ||
6 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
7 | file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
8 | ===============================================================================/] | |
9 | ||
10 | [section:indepth In Depth] | |
11 | ||
12 | [section:parsers_indepth Parsers in Depth] | |
13 | ||
14 | This section is not for the faint of heart. Distilled here are the inner | |
15 | workings of __qi__ parsers, using real code from the __spirit__ library as | |
16 | examples. On the other hand, there is no reason to fear reading on. | |
17 | We have tried to explain things step by step while highlighting the important | |
18 | insights. | |
19 | ||
20 | The `__parser_concept__` class is the base class for all parsers. | |
21 | ||
22 | [import ../../../../boost/spirit/home/qi/parser.hpp] | |
23 | [parser_base_parser] | |
24 | ||
25 | The `__parser_concept__` class does not really know how to parse anything but | |
26 | instead relies on the template parameter `Derived` to do the actual parsing. | |
27 | This technique is known as the "Curiously Recurring Template Pattern" in template | |
28 | meta-programming circles. This inheritance strategy gives us the power of | |
29 | polymorphism without the virtual function overhead. In essence, this is a way to | |
30 | implement compile-time polymorphism. | |
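To make the pattern concrete, here is a minimal standalone sketch. It does not use __spirit__ at all: `parser_base`, `char_parser`, `parse_with` and `starts_with_char` are hypothetical names invented for this illustration, and the `parse` signature is cut down to just the iterator pair.

```cpp
#include <string>

// Hypothetical stand-in for a CRTP base class. It knows nothing about
// parsing; it only exposes the derived object through a static_cast.
template <typename Derived>
struct parser_base
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

// Hypothetical derived parser: matches one specific character.
struct char_parser : parser_base<char_parser>
{
    explicit char_parser(char c) : ch(c) {}

    bool parse(std::string::const_iterator& first,
               std::string::const_iterator last) const
    {
        if (first != last && *first == ch)
        {
            ++first;
            return true;
        }
        return false;
    }

    char ch;
};

// Generic code accepts the base and dispatches to the derived class at
// compile time. No virtual function call is involved.
template <typename Derived>
bool parse_with(parser_base<Derived> const& p,
                std::string::const_iterator& first,
                std::string::const_iterator last)
{
    return p.derived().parse(first, last);
}

// Small usage helper: true if `input` starts with the character `c`.
bool starts_with_char(std::string const& input, char c)
{
    std::string::const_iterator first = input.begin();
    return parse_with(char_parser(c), first, input.end());
}
```

The call `p.derived().parse(first, last)` is resolved statically, so a compiler is free to inline it.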
31 | ||
32 | The Derived parsers, `__primitive_parser_concept__`, `__unary_parser_concept__`, | |
33 | `__binary_parser_concept__` and `__nary_parser_concept__` provide the necessary | |
34 | facilities for parser detection, introspection, transformation and visitation. | |
35 | ||
36 | Derived parsers must support the following: | |
37 | ||
38 | [variablelist bool parse(f, l, context, skip, attr) | |
39 | [[`f`, `l`] [first/last iterator pair]] | |
40 | [[`context`] [enclosing rule context (can be unused_type)]] | |
41 | [[`skip`] [skipper (can be unused_type)]] | |
42 | [[`attr`] [attribute (can be unused_type)]] | |
43 | ] | |
44 | ||
45 | /parse/ is the main parser entry point. The skipper, /skip/, can be an `unused_type`, | |
46 | a type used everywhere in __spirit__ to signify "don't-care". There | |
47 | is an overload of the `qi::skip` function for `unused_type` that is simply a no-op. | |
48 | That way, we do not have to write multiple parse functions for | |
49 | phrase and character level parsing. | |
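The idea of a no-op overload can be sketched without __spirit__ as follows. Everything in the `sketch` namespace is a hypothetical stand-in written for this illustration (including `space_skipper`, its `match` member and the `skipped_count` helper); it is not the library's actual code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch
{
    // Hypothetical stand-in for Spirit's "don't-care" type.
    struct unused_type {};

    // Hypothetical skipper: consumes one whitespace character if present.
    struct space_skipper
    {
        template <typename Iterator>
        bool match(Iterator& first, Iterator last) const
        {
            if (first != last
                && std::isspace(static_cast<unsigned char>(*first)))
            {
                ++first;
                return true;
            }
            return false;
        }
    };

    // General case: let the skipper consume as much as it can.
    template <typename Iterator, typename Skipper>
    void skip(Iterator& first, Iterator last, Skipper const& skipper)
    {
        while (skipper.match(first, last))
        {
        }
    }

    // unused_type case: there is no skipper, so nothing happens.
    template <typename Iterator>
    void skip(Iterator&, Iterator, unused_type const&)
    {
    }

    // Usage helper: how many characters did the pre-skip consume?
    template <typename Skipper>
    std::size_t skipped_count(std::string const& input, Skipper const& skipper)
    {
        std::string::const_iterator first = input.begin();
        skip(first, input.end(), skipper);
        return static_cast<std::size_t>(first - input.begin());
    }
}
```

A parser written against `skip(first, last, skipper)` then works unchanged at the phrase level (with a real skipper) and at the character level (with `unused_type`).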
50 | ||
51 | Here are the basic rules for parsing: | |
52 | ||
53 | * The parser returns `true` if successful, `false` otherwise. | |
54 | * If successful, `first` (the `f` argument) is incremented N times, where N | |
55 | is the number of characters parsed. N can be zero: an empty (epsilon) | |
56 | match. | |
57 | * If successful, the parsed attribute is assigned to /attr/. | |
58 | * If unsuccessful, `first` is reset to its position before entering | |
59 | the parser function. /attr/ is untouched. | |
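As an illustration of these four rules, here is a small hand-written function that follows them. `parse_uint` is a hypothetical name; it is a simplified sketch (no context, no skipper, no overflow checking), not how __qi__ extracts integers.

```cpp
#include <cctype>
#include <string>

// Hypothetical free function that parses one or more decimal digits
// into `attr`, following the rules listed above.
template <typename Iterator>
bool parse_uint(Iterator& first, Iterator last, unsigned& attr)
{
    Iterator it = first;    // work on a copy so failure leaves `first` alone
    unsigned value = 0;

    while (it != last && std::isdigit(static_cast<unsigned char>(*it)))
    {
        value = value * 10 + static_cast<unsigned>(*it - '0');
        ++it;
    }

    if (it == first)
        return false;       // no digits: `first` and `attr` are untouched

    first = it;             // success: advance past the consumed characters
    attr = value;           // success: assign the parsed attribute
    return true;
}
```

Working on a local copy of the iterator is one simple way to guarantee the "reset on failure" rule: `first` is only written to once the match is known to have succeeded.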
60 | ||
61 | [variablelist void what(context) | |
62 | [[`context`] [enclosing rule context (can be `unused_type`)]] | |
63 | ] | |
64 | ||
65 | The /what/ function should be obvious. It provides some information | |
66 | about ["what] the parser is. It is used as a debugging aid, for | |
67 | example. | |
68 | ||
69 | [variablelist P::template attribute<context>::type | |
70 | [[`P`] [a parser type]] | |
71 | [[`context`] [A context type (can be unused_type)]] | |
72 | ] | |
73 | ||
74 | The /attribute/ metafunction returns the expected attribute type | |
75 | of the parser. In some cases, this is context dependent. | |
76 | ||
77 | In this section, we will dissect two parser types: | |
78 | ||
79 | [variablelist Parsers | |
80 | [[`__primitive_parser_concept__`] [A parser for primitive data (e.g. integer parsing).]] | |
81 | [[`__unary_parser_concept__`] [A parser that has a single subject (e.g. kleene star).]] | |
82 | ] | |
83 | ||
84 | [/------------------------------------------------------------------------------] | |
85 | [heading Primitive Parsers] | |
86 | ||
87 | For our dissection study, we will use a __spirit__ primitive, the `any_int_parser` | |
88 | in the boost::spirit::qi namespace. | |
89 | ||
90 | [import ../../../../boost/spirit/home/qi/numeric/int.hpp] | |
91 | [primitive_parsers_any_int_parser] | |
92 | ||
93 | The `any_int_parser` is derived from a `__primitive_parser_concept__<Derived>`, | |
94 | which in turn derives from `parser<Derived>`. Therefore, it supports the | |
95 | following requirements: | |
96 | ||
97 | * The `parse` member function | |
98 | * The `what` member function | |
99 | * The nested `attribute` metafunction | |
100 | ||
101 | /parse/ is the main entry point. For primitive parsers, the first thing to do is | |
102 | call: | |
103 | ||
104 | `` | |
105 | qi::skip(first, last, skipper); | |
106 | `` | |
107 | ||
108 | to do a pre-skip. After pre-skipping, the parser proceeds to do its thing. The | |
109 | actual parsing code is placed in `extract_int<T, Radix, MinDigits, | |
110 | MaxDigits>::call(first, last, attr);` | |
111 | ||
112 | This simple no-frills protocol is one of the reasons why __spirit__ is | |
113 | fast. If you know the internals of __classic__ and perhaps | |
114 | even wrote some parsers with it, this simple __spirit__ mechanism | |
115 | is a joy to work with. There are no scanners and all that crap. | |
116 | ||
117 | The /what/ function just tells us that it is an integer parser. Simple. | |
118 | ||
119 | The /attribute/ metafunction returns the T template parameter. We associate the | |
120 | `any_int_parser` to some placeholders for `short_`, `int_`, `long_` and | |
121 | `long_long` types. But, first, we enable these placeholders in namespace | |
122 | boost::spirit: | |
123 | ||
124 | [primitive_parsers_enable_short] | |
125 | [primitive_parsers_enable_int] | |
126 | [primitive_parsers_enable_long] | |
127 | [primitive_parsers_enable_long_long] | |
128 | ||
129 | Notice that `any_int_parser` is placed in the namespace boost::spirit::qi | |
130 | while these /enablers/ are in namespace boost::spirit. The reason is | |
131 | that these placeholders are shared by other __spirit__ /domains/. __qi__, | |
132 | the parser, is one domain. __karma__, the generator, is another domain. | |
133 | Other parser technologies may be developed and placed in yet | |
134 | another domain. Yet, all these can potentially share the same | |
135 | placeholders for interoperability. The interpretation of these | |
136 | placeholders is domain-specific. | |
137 | ||
138 | Now that we have enabled the placeholders, we have to write generators | |
139 | for them. This is the `make_xxx` stuff (in the boost::spirit::qi namespace): | |
140 | ||
141 | [primitive_parsers_make_int] | |
142 | ||
143 | This one above is our main generator. It's a simple function object | |
144 | with 2 (unused) arguments. These arguments are: | |
145 | ||
146 | # The actual terminal value obtained by proto. In this case, either | |
147 | a short_, int_, long_ or long_long. We don't care about this. | |
148 | ||
149 | # Modifiers. We also don't care about this. This allows directives | |
150 | such as `no_case[p]` to pass information to inner parser nodes. | |
151 | We'll see how that works later. | |
152 | ||
153 | Now: | |
154 | ||
155 | [primitive_parsers_short_primitive] | |
156 | [primitive_parsers_int_primitive] | |
157 | [primitive_parsers_long_primitive] | |
158 | [primitive_parsers_long_long_primitive] | |
159 | ||
160 | These specialize `qi::make_primitive` for specific tags. They all | |
161 | inherit from `make_int` which does the actual work. | |
162 | ||
163 | [heading Composite Parsers] | |
164 | ||
165 | Let me present the kleene star (also in namespace boost::spirit::qi): | |
166 | ||
167 | [import ../../../../boost/spirit/home/qi/operator/kleene.hpp] | |
168 | [composite_parsers_kleene] | |
169 | ||
170 | It looks similar in form to its primitive cousin, the `any_int_parser`. And, again, it | |
171 | has the same basic ingredients required of `Derived`: | |
172 | ||
173 | * The nested attribute metafunction | |
174 | * The parse member function | |
175 | * The what member function | |
176 | ||
177 | kleene is a composite parser. It is a parser that composes another | |
178 | parser, its ["subject]. It is a `__unary_parser_concept__` and derives from it. | |
179 | Like `__primitive_parser_concept__`, `__unary_parser_concept__<Derived>` derives | |
180 | from `parser<Derived>`. | |
181 | ||
182 | `unary_parser<Derived>` has these expression requirements on `Derived`: | |
183 | ||
184 | * p.subject -> subject parser ( ['p] is a __unary_parser_concept__ parser.) | |
185 | * P::subject_type -> subject parser type ( ['P] is a __unary_parser_concept__ type.) | |
186 | ||
187 | /parse/ is the main parser entry point. Since this is not a primitive | |
188 | parser, we do not need to call `qi::skip(first, last, skipper)`. The | |
189 | ['subject], if it is a primitive, will do the pre-skip. If it is | |
190 | another composite parser, it will eventually call a primitive parser | |
191 | somewhere down the line which will do the pre-skip. This makes it a | |
192 | lot more efficient than __classic__. __classic__ puts the skipping business | |
193 | into the so-called "scanner" which blindly attempts a pre-skip | |
194 | every time we increment the iterator. | |
195 | ||
196 | What is the /attribute/ of the kleene? In general, it is a `std::vector<T>` | |
197 | where `T` is the attribute of the subject. There is a special case though. | |
198 | If `T` is an `unused_type`, then the attribute of kleene is also `unused_type`. | |
199 | `traits::build_std_vector` takes care of that minor detail. | |
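The attribute computation just described can be sketched with a few lines of standalone metaprogramming. All names in the `sketch` namespace are hypothetical; in particular `build_vector` only imitates the role the text assigns to `traits::build_std_vector` and is not its real definition.

```cpp
#include <type_traits>
#include <vector>

namespace sketch
{
    // Hypothetical stand-in for Spirit's "don't-care" type.
    struct unused_type {};

    // General case: the container attribute is a std::vector<T>.
    template <typename T>
    struct build_vector
    {
        typedef std::vector<T> type;
    };

    // Special case: an unused subject attribute stays unused.
    template <>
    struct build_vector<unused_type>
    {
        typedef unused_type type;
    };

    // Hypothetical primitive whose attribute is an int.
    struct int_parser_sketch
    {
        template <typename Context>
        struct attribute { typedef int type; };
    };

    // Hypothetical primitive that produces no attribute.
    struct literal_sketch
    {
        template <typename Context>
        struct attribute { typedef unused_type type; };
    };

    // Hypothetical kleene: asks the subject for its attribute type
    // and wraps it with build_vector.
    template <typename Subject>
    struct kleene_sketch
    {
        template <typename Context>
        struct attribute
        {
            typedef typename Subject::template attribute<Context>::type
                subject_attribute;
            typedef typename build_vector<subject_attribute>::type type;
        };
    };
}
```

With these definitions, `kleene_sketch<int_parser_sketch>::attribute<Context>::type` is `std::vector<int>`, while `kleene_sketch<literal_sketch>::attribute<Context>::type` is `unused_type`.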
200 | ||
201 | So, let's parse. First, we need to provide a local attribute for | |
202 | the subject: | |
203 | ||
204 | `` | |
205 | typename traits::attribute_of<Subject, Context>::type val; | |
206 | `` | |
207 | ||
208 | `traits::attribute_of<Subject, Context>` simply calls the subject's | |
209 | `struct attribute<Context>` nested metafunction. | |
210 | ||
211 | /val/ starts out default initialized. This val is the one we'll | |
212 | pass to the subject's parse function. | |
213 | ||
214 | The kleene repeats indefinitely while the subject parser is | |
215 | successful. On each successful parse, we `push_back` the parsed | |
216 | attribute to the kleene's attribute, which is expected to be, | |
217 | at the very least, compatible with a `std::vector`. In other words, | |
218 | although we say that we want our attribute to be a `std::vector`, | |
219 | we try to be more lenient than that. The caller of kleene's | |
220 | parse may pass a different attribute type. For as long as it is | |
221 | also a conforming STL container with `push_back`, we are ok. Here | |
222 | is the kleene loop: | |
223 | ||
224 | `` | |
225 | while (subject.parse(first, last, context, skipper, val)) | |
226 | { | |
227 | // push the parsed value into our attribute | |
228 | traits::push_back(attr, val); | |
229 | traits::clear(val); | |
230 | } | |
231 | return true; | |
232 | `` | |
233 | Take note that we didn't call `attr.push_back(val)`. Instead, we | |
234 | called a __spirit__-provided function: | |
235 | ||
236 | `` | |
237 | traits::push_back(attr, val); | |
238 | `` | |
239 | ||
240 | This is a recurring pattern. We do it this way | |
241 | because attr [*can] be `unused_type`. `traits::push_back` takes care | |
242 | of that detail. The overload for `unused_type` is a no-op. Now, you | |
243 | can imagine why __spirit__ is fast! The parsers are so simple and the | |
244 | generated code is as efficient as a hand-rolled loop. All these | |
245 | parser compositions and recursive parse invocations are extensively | |
246 | inlined by a modern C++ compiler. In the end, you get a tight loop | |
247 | when you use the kleene. No more excess baggage. If the attribute | |
248 | is unused, then there is no code generated for that. That's how | |
249 | __spirit__ is designed. | |
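The following standalone sketch imitates the loop and the `push_back` dispatch described above. It is an assumption-laden illustration: `sketch::push_back`, `digit_parser` and `kleene_parse` are hypothetical names, the context and skipper arguments are dropped, and the subject's attribute type is hard-coded to `char` where the real code would ask the subject for it.

```cpp
#include <cctype>
#include <string>
#include <vector>

namespace sketch
{
    // Hypothetical stand-in for Spirit's "don't-care" type.
    struct unused_type {};

    // General case: forward to the container's own push_back.
    template <typename Container, typename T>
    void push_back(Container& c, T const& val)
    {
        c.push_back(val);
    }

    // unused_type case: the caller does not want the values, so do nothing.
    template <typename T>
    void push_back(unused_type&, T const&)
    {
    }

    // Hypothetical subject parser: matches a single decimal digit.
    struct digit_parser
    {
        template <typename Iterator>
        bool parse(Iterator& first, Iterator last, char& attr) const
        {
            if (first != last
                && std::isdigit(static_cast<unsigned char>(*first)))
            {
                attr = *first;
                ++first;
                return true;
            }
            return false;
        }
    };

    // Hypothetical kleene loop over the subject.
    template <typename Subject, typename Iterator, typename Attribute>
    bool kleene_parse(Subject const& subject,
                      Iterator& first, Iterator last, Attribute& attr)
    {
        char val = char();              // local attribute for the subject
        while (subject.parse(first, last, val))
        {
            push_back(attr, val);       // no-op when attr is unused_type
            val = char();               // reset before the next iteration
        }
        return true;                    // zero matches is still a success
    }
}
```

Passing a `std::vector<char>` (or any container with `push_back`, such as `std::string`) collects the digits. Passing an `unused_type` object selects the empty overload, so the loop only advances the iterator.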
250 | ||
251 | The /what/ function simply wraps the output of the subject in a | |
252 | "kleene[" ... "]". | |
253 | ||
254 | Ok, now, like the `any_int_parser`, we have to hook our parser to the | |
255 | __qi__ engine. Here's how we do it: | |
256 | ||
257 | First, we enable the prefix star operator. In proto, it's called | |
258 | the "dereference": | |
259 | ||
260 | [composite_parsers_kleene_enable_] | |
261 | ||
262 | This is done in namespace `boost::spirit` like its friend, the `use_terminal` | |
263 | specialization for our `any_int_parser`. Obviously, we use /use_operator/ to | |
264 | enable the dereference for the qi::domain. | |
265 | ||
266 | Then, we need to write our generator (in namespace qi): | |
267 | ||
268 | [composite_parsers_kleene_generator] | |
269 | ||
270 | This essentially says: for all expressions of the form `*p`, build a kleene | |
271 | parser. `Elements` is a __fusion__ sequence. For the kleene, which is a unary | |
272 | operator, expect only one element in the sequence. That element is the subject | |
273 | of the kleene. | |
274 | ||
275 | We still don't care about the Modifiers. We'll see what the modifiers are | |
276 | all about when we get to deep directives. | |
277 | ||
278 | [endsect] | |
279 | ||
280 | [endsect] |