1 [/==============================================================================
2 Copyright (C) 2001-2011 Joel de Guzman
3 Copyright (C) 2001-2011 Hartmut Kaiser
4 Copyright (C) 2009 Andreas Haberstroh?
6 Distributed under the Boost Software License, Version 1.0. (See accompanying
7 file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
8 ===============================================================================/]
10 [section:indepth In Depth]
12 [section:parsers_indepth Parsers in Depth]
This section is not for the faint of heart. Here we distill the inner
workings of __qi__ parsers, using real code from the __spirit__ library as
examples. On the other hand, there is no reason to fear reading on.
17 We tried to explain things step by step while highlighting the important
20 The `__parser_concept__` class is the base class for all parsers.
22 [import ../../../../boost/spirit/home/qi/parser.hpp]
25 The `__parser_concept__` class does not really know how to parse anything but
26 instead relies on the template parameter `Derived` to do the actual parsing.
27 This technique is known as the "Curiously Recurring Template Pattern" in template
28 meta-programming circles. This inheritance strategy gives us the power of
polymorphism without the virtual function overhead. In essence, this is a way to
implement compile-time polymorphism.
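To make the idea concrete, here is a minimal, self-contained sketch of the pattern. It is not the actual __spirit__ code: `digit_parser` and `run` are hypothetical names made up for illustration, and the `parse` signature is cut down to an iterator pair.

```cpp
#include <cassert>
#include <cctype>

// Simplified stand-in for the CRTP base: it knows nothing about parsing
// and only hands back the most-derived object.
template <typename Derived>
struct parser
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

// Hypothetical derived parser that matches exactly one decimal digit.
struct digit_parser : parser<digit_parser>
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        if (first == last || !std::isdigit(static_cast<unsigned char>(*first)))
            return false;
        ++first;
        return true;
    }
};

// Generic code takes the base and dispatches statically to Derived::parse.
// No virtual call is involved; the compiler sees the concrete type.
template <typename Derived, typename Iterator>
bool run(parser<Derived> const& p, Iterator& first, Iterator const& last)
{
    return p.derived().parse(first, last);
}
```

Calling `run(digit_parser(), first, last)` resolves to `digit_parser::parse` at compile time, so the call can be inlined like any ordinary function.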
32 The Derived parsers, `__primitive_parser_concept__`, `__unary_parser_concept__`,
33 `__binary_parser_concept__` and `__nary_parser_concept__` provide the necessary
34 facilities for parser detection, introspection, transformation and visitation.
36 Derived parsers must support the following:
38 [variablelist bool parse(f, l, context, skip, attr)
39 [[`f`, `l`] [first/last iterator pair]]
[[`context`] [enclosing rule context (can be `unused_type`)]]
[[`skip`] [skipper (can be `unused_type`)]]
[[`attr`] [attribute (can be `unused_type`)]]
/parse/ is the main parser entry point. /skip/ can be an `unused_type`,
a type used everywhere in __spirit__ to signify "don't-care". There
is an overload of the skip function for `unused_type` that is simply a no-op.
That way, we do not have to write multiple parse functions for
phrase and character level parsing.
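The no-op overload can be sketched as follows. This is a simplified illustration, not the library source: the `unused_type` here is a local stand-in, and `space_skipper` is a hypothetical skipper invented for the example.

```cpp
#include <cassert>
#include <cctype>

// Stand-in for Spirit's "don't-care" type (the real one lives in boost::spirit).
struct unused_type {};

// Hypothetical skipper that consumes one ASCII white space character.
struct space_skipper
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        if (first == last || !std::isspace(static_cast<unsigned char>(*first)))
            return false;
        ++first;
        return true;
    }
};

// Phrase level: advance past everything the skipper matches.
template <typename Iterator, typename Skipper>
void skip(Iterator& first, Iterator const& last, Skipper const& skipper)
{
    while (skipper.parse(first, last))
        ;
}

// Character level: with unused_type there is nothing to skip.
// Overload resolution prefers this more specialized template.
template <typename Iterator>
void skip(Iterator&, Iterator const&, unused_type const&)
{
}
```

A parser written against `skip(first, last, skipper)` therefore works unchanged whether the caller passes a real skipper or `unused_type`.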
51 Here are the basic rules for parsing:
53 * The parser returns `true` if successful, `false` otherwise.
* If successful, `first` is advanced N times, where N
55 is the number of characters parsed. N can be zero --an empty (epsilon)
* If successful, the parsed attribute is assigned to /attr/.
58 * If unsuccessful, `first` is reset to its position before entering
59 the parser function. /attr/ is untouched.
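The rules above can be illustrated with a small parser that follows them. This is a sketch only: `keyword_parser` is a hypothetical type, and the context and skipper arguments are left out to keep it short.

```cpp
#include <cassert>
#include <string>

// Hypothetical parser for a fixed keyword; its attribute is the matched text.
struct keyword_parser
{
    std::string word;

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, std::string& attr) const
    {
        Iterator it = first;              // work on a copy of the position
        for (char c : word)
        {
            if (it == last || *it != c)
                return false;             // failure: first and attr untouched
            ++it;
        }
        first = it;                       // success: advance by N characters
        attr = word;                      // success: assign the attribute
        return true;
    }
};
```

With an empty `word` the loop body never runs, so the parser succeeds with N equal to zero, which is the epsilon case.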
61 [variablelist void what(context)
62 [[`context`] [enclosing rule context (can be `unused_type`)]]
65 The /what/ function should be obvious. It provides some information
66 about ["what] the parser is. It is used as a debugging aid, for
69 [variablelist P::template attribute<context>::type
70 [[`P`] [a parser type]]
[[`context`] [a context type (can be `unused_type`)]]
74 The /attribute/ metafunction returns the expected attribute type
75 of the parser. In some cases, this is context dependent.
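As a rough sketch of what such a nested metafunction looks like, consider the two hypothetical parsers below. `my_int_parser` and `my_context_parser` are invented for illustration, and taking the attribute from `Context::value_type` is just an assumed way of showing context dependence.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Stand-in for Spirit's "don't-care" type.
struct unused_type {};

// Hypothetical integer parser: the attribute does not depend on the context.
struct my_int_parser
{
    template <typename Context>
    struct attribute { typedef int type; };
};

// Hypothetical parser whose attribute is taken from the context type.
struct my_context_parser
{
    template <typename Context>
    struct attribute { typedef typename Context::value_type type; };
};

// Generic query in the P::template attribute<context>::type form.
template <typename P, typename Context>
struct attribute_of
{
    typedef typename P::template attribute<Context>::type type;
};
```

Here `attribute_of<my_int_parser, unused_type>::type` is `int`, while `attribute_of<my_context_parser, std::vector<double> >::type` is `double`.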
77 In this section, we will dissect two parser types:
80 [[`__primitive_parser_concept__`] [A parser for primitive data (e.g. integer parsing).]]
[[`__unary_parser_concept__`] [A parser that has a single subject (e.g. kleene star).]]
84 [/------------------------------------------------------------------------------]
85 [heading Primitive Parsers]
For our dissection study, we will use a __spirit__ primitive, the `any_int_parser`
in the `boost::spirit::qi` namespace.
90 [import ../../../../boost/spirit/home/qi/numeric/int.hpp]
91 [primitive_parsers_any_int_parser]
93 The `any_int_parser` is derived from a `__primitive_parser_concept__<Derived>`,
94 which in turn derives from `parser<Derived>`. Therefore, it supports the
95 following requirements:
97 * The `parse` member function
98 * The `what` member function
99 * The nested `attribute` metafunction
101 /parse/ is the main entry point. For primitive parsers, our first thing to do is
    qi::skip(first, last, skipper);
108 to do a pre-skip. After pre-skipping, the parser proceeds to do its thing. The
109 actual parsing code is placed in `extract_int<T, Radix, MinDigits,
110 MaxDigits>::call(first, last, attr);`
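The two-step protocol (pre-skip, then extract) can be sketched without any library support. Everything below is a simplified assumption: `skip_spaces`, `extract_uint` and `parse_uint` are hypothetical names, the extraction handles unsigned decimal digits only, and there is no sign handling or overflow checking as a real implementation would need.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical pre-skip: consume leading ASCII white space.
template <typename Iterator>
void skip_spaces(Iterator& first, Iterator const& last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// Hypothetical stand-in for the extraction step: unsigned decimal digits only,
// no sign handling and no overflow checks.
template <typename Iterator>
bool extract_uint(Iterator& first, Iterator const& last, unsigned& attr)
{
    Iterator it = first;
    unsigned value = 0;
    while (it != last && std::isdigit(static_cast<unsigned char>(*it)))
    {
        value = value * 10 + static_cast<unsigned>(*it - '0');
        ++it;
    }
    if (it == first)
        return false;                 // no digits: nothing consumed, attr untouched
    first = it;
    attr = value;
    return true;
}

// The primitive parser: pre-skip first, then do the actual work.
template <typename Iterator>
bool parse_uint(Iterator& first, Iterator const& last, unsigned& attr)
{
    skip_spaces(first, last);
    return extract_uint(first, last, attr);
}
```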
112 This simple no-frills protocol is one of the reasons why __spirit__ is
113 fast. If you know the internals of __classic__ and perhaps
114 even wrote some parsers with it, this simple __spirit__ mechanism
115 is a joy to work with. There are no scanners and all that crap.
117 The /what/ function just tells us that it is an integer parser. Simple.
The /attribute/ metafunction returns the `T` template parameter. We associate the
`any_int_parser` with some placeholders for `short_`, `int_`, `long_` and
121 `long_long` types. But, first, we enable these placeholders in namespace
124 [primitive_parsers_enable_short]
125 [primitive_parsers_enable_int]
126 [primitive_parsers_enable_long]
127 [primitive_parsers_enable_long_long]
Notice that `any_int_parser` is placed in namespace `boost::spirit::qi`
while these /enablers/ are in namespace `boost::spirit`. The reason is
that these placeholders are shared by other __spirit__ /domains/. __qi__,
the parser, is one domain. __karma__, the generator, is another domain.
Other parser technologies may be developed and placed in yet
another domain. Yet, all these can potentially share the same
placeholders for interoperability. The interpretation of these
placeholders is domain-specific.
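The sharing of placeholders across domains can be pictured with a small trait. This is a sketch under assumptions, not the library code: everything lives in a made-up `sketch` namespace, and `std::true_type` and `std::false_type` are used here purely to keep the example self-contained.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
    // Shared placeholder tags, visible to every domain.
    namespace tag { struct int_ {}; struct double_ {}; }

    // Two hypothetical domains.
    namespace qi    { struct domain {}; }
    namespace karma { struct domain {}; }

    // By default a placeholder is not enabled for any domain.
    template <typename Domain, typename Tag>
    struct use_terminal : std::false_type {};

    // Each domain opts in separately for the placeholders it understands.
    template <>
    struct use_terminal<qi::domain, tag::int_> : std::true_type {};

    template <>
    struct use_terminal<karma::domain, tag::int_> : std::true_type {};

    template <>
    struct use_terminal<qi::domain, tag::double_> : std::true_type {};
}
```

The same tag is enabled independently per domain, so each domain remains free to give it its own meaning.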
Now that we have enabled the placeholders, we have to write generators
for them. These are the `make_xxx` function objects (in namespace `boost::spirit::qi`):
141 [primitive_parsers_make_int]
The one above is our main generator. It's a simple function object
with 2 (unused) arguments. These arguments are:
# The actual terminal value obtained by proto. In this case, either
a `short_`, `int_`, `long_` or `long_long`. We don't care about this.
149 # Modifiers. We also don't care about this. This allows directives
150 such as `no_case[p]` to pass information to inner parser nodes.
151 We'll see how that works later.
155 [primitive_parsers_short_primitive]
156 [primitive_parsers_int_primitive]
157 [primitive_parsers_long_primitive]
158 [primitive_parsers_long_long_primitive]
These specialize `qi::make_primitive` for specific tags. They all
inherit from `make_int`, which does the actual work.
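The shape of this arrangement can be sketched in a few lines. This is an illustration under assumptions: the `sketch` namespace, `toy_int_parser` and the exact `operator()` signature are made up, and only the idea of thin per-tag specializations inheriting from one worker is taken from the text above.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
    namespace tag { struct short_ {}; struct int_ {}; }
    struct unused_type {};

    // Hypothetical parser type produced by the generator.
    template <typename T>
    struct toy_int_parser { typedef T value_type; };

    // The worker: ignores both arguments and returns a fresh parser.
    template <typename T>
    struct make_int
    {
        typedef toy_int_parser<T> result_type;

        template <typename Terminal, typename Modifiers>
        result_type operator()(Terminal const&, Modifiers const&) const
        {
            return result_type();
        }
    };

    // Primary template, left undefined: unknown tags do not compile.
    template <typename Tag, typename Modifiers>
    struct make_primitive;

    // One thin specialization per tag, each inheriting the real work.
    template <typename Modifiers>
    struct make_primitive<tag::short_, Modifiers> : make_int<short> {};

    template <typename Modifiers>
    struct make_primitive<tag::int_, Modifiers> : make_int<int> {};
}
```

Adding support for another integer type then costs one more two-line specialization.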
163 [heading Composite Parsers]
165 Let me present the kleene star (also in namespace spirit::qi):
167 [import ../../../../boost/spirit/home/qi/operator/kleene.hpp]
168 [composite_parsers_kleene]
It looks similar in form to its primitive cousin, the `any_int_parser`. And, again, it
has the same basic ingredients required by `Derived`:
173 * The nested attribute metafunction
174 * The parse member function
175 * The what member function
177 kleene is a composite parser. It is a parser that composes another
178 parser, its ["subject]. It is a `__unary_parser_concept__` and subclasses from it.
179 Like `__primitive_parser_concept__`, `__unary_parser_concept__<Derived>` derives
180 from `parser<Derived>`.
`unary_parser<Derived>` has these expression requirements on `Derived`:
184 * p.subject -> subject parser ( ['p] is a __unary_parser_concept__ parser.)
185 * P::subject_type -> subject parser type ( ['P] is a __unary_parser_concept__ type.)
/parse/ is the main parser entry point. Since this is not a primitive
parser, we do not need to call `qi::skip(first, last, skipper)`. The
['subject], if it is a primitive, will do the pre-skip. If it is
another composite parser, it will eventually call a primitive parser
somewhere down the line which will do the pre-skip. This makes it a
lot more efficient than __classic__. __classic__ puts the skipping business
into the so-called "scanner" which blindly attempts a pre-skip
every time we increment the iterator.
196 What is the /attribute/ of the kleene? In general, it is a `std::vector<T>`
197 where `T` is the attribute of the subject. There is a special case though.
198 If `T` is an `unused_type`, then the attribute of kleene is also `unused_type`.
199 `traits::build_std_vector` takes care of that minor detail.
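A metafunction with this behavior could look roughly like the following. This is a simplified sketch, not the library definition, and `unused_type` is a local stand-in.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Stand-in for Spirit's "don't-care" type.
struct unused_type {};

// General case: the kleene attribute is a vector of the subject's attribute.
template <typename T>
struct build_std_vector
{
    typedef std::vector<T> type;
};

// Special case: an unused subject attribute stays unused.
template <>
struct build_std_vector<unused_type>
{
    typedef unused_type type;
};
```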
So, let's parse. First, we need to provide a local attribute for
    typename traits::attribute_of<Subject, Context>::type val;
208 `traits::attribute_of<Subject, Context>` simply calls the subject's
209 `struct attribute<Context>` nested metafunction.
/val/ starts out default-initialized. This `val` is the one we'll
212 pass to the subject's parse function.
214 The kleene repeats indefinitely while the subject parser is
215 successful. On each successful parse, we `push_back` the parsed
216 attribute to the kleene's attribute, which is expected to be,
217 at the very least, compatible with a `std::vector`. In other words,
218 although we say that we want our attribute to be a `std::vector`,
219 we try to be more lenient than that. The caller of kleene's
220 parse may pass a different attribute type. For as long as it is
221 also a conforming STL container with `push_back`, we are ok. Here
    while (subject.parse(first, last, context, skipper, val))
    {
        // push the parsed value into our attribute
        traits::push_back(attr, val);
    }
Take note that we didn't call `attr.push_back(val)`. Instead, we
called a Spirit-provided function:
    traits::push_back(attr, val);
This is a recurring pattern. The reason why we do it this way is
because /attr/ [*can] be `unused_type`. `traits::push_back` takes care
of that detail. The overload for `unused_type` is a no-op. Now, you
can imagine why __spirit__ is fast! The parsers are so simple and the
generated code is as efficient as a hand-rolled loop. All these
parser compositions and recursive parse invocations are extensively
inlined by a modern C++ compiler. In the end, you get a tight loop
when you use the kleene. No more excess baggage. If the attribute
is unused, then there is no code generated for that. That's how
__spirit__ is designed.
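Putting the pieces together, a stripped-down kleene might look like the sketch below. It is not the library implementation: `digit_parser` is a hypothetical subject, the context and skipper arguments are omitted, and the subject's attribute type is hard-coded to `int` instead of being computed by a metafunction.

```cpp
#include <cassert>
#include <cctype>
#include <vector>

// Stand-in for Spirit's "don't-care" type.
struct unused_type {};

namespace traits
{
    // Container case: forward to the container's own push_back.
    template <typename Container, typename T>
    void push_back(Container& c, T const& val) { c.push_back(val); }

    // unused_type case: nothing to store, so nothing to do.
    template <typename T>
    void push_back(unused_type&, T const&) {}
}

// Hypothetical subject: parses one decimal digit into an int.
struct digit_parser
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, int& attr) const
    {
        if (first == last || !std::isdigit(static_cast<unsigned char>(*first)))
            return false;
        attr = *first - '0';
        ++first;
        return true;
    }
};

// Simplified kleene: loop while the subject succeeds; it never fails itself.
template <typename Subject>
struct kleene
{
    Subject subject;

    template <typename Iterator, typename Attribute>
    bool parse(Iterator& first, Iterator const& last, Attribute& attr) const
    {
        int val = int();   // the sketch hard-codes the subject's attribute type
        while (subject.parse(first, last, val))
            traits::push_back(attr, val);
        return true;
    }
};
```

The same `parse` body serves both a `std::vector<int>` attribute and an `unused_type` attribute; in the latter case the `push_back` call compiles down to nothing.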
251 The /what/ function simply wraps the output of the subject in a
Ok, now, like the `any_int_parser`, we have to hook our parser to the
__qi__ engine. Here's how we do it:
257 First, we enable the prefix star operator. In proto, it's called
260 [composite_parsers_kleene_enable_]
This is done in namespace `boost::spirit` like its friend, the `use_terminal`
specialization for our `any_int_parser`. Obviously, we use /use_operator/ to
enable the dereference for the `qi::domain`.
266 Then, we need to write our generator (in namespace qi):
268 [composite_parsers_kleene_generator]
This essentially says: for all expressions of the form `*p`, build a kleene
parser. `Elements` is a __fusion__ sequence. For the kleene, which is a unary
272 operator, expect only one element in the sequence. That element is the subject
We still don't care about the Modifiers. We'll see what the modifiers are
all about when we get to deep directives.