[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser
    Copyright (C) 2009 Andreas Haberstroh?

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:indepth In Depth]

[section:parsers_indepth Parsers in Depth]

This section is not for the faint of heart. Here we distill the inner 
workings of __qi__ parsers, using real code from the __spirit__ library as 
examples. Still, there is no reason to fear reading on. 
We have tried to explain things step by step while highlighting the important 
insights.

The `__parser_concept__` class is the base class for all parsers. 

[import ../../../../boost/spirit/home/qi/parser.hpp]
[parser_base_parser]

The `__parser_concept__` class does not really know how to parse anything but 
instead relies on the template parameter `Derived` to do the actual parsing. 
This technique is known as the "Curiously Recurring Template Pattern" in template 
meta-programming circles. This inheritance strategy gives us the power of 
polymorphism without the virtual function overhead. In essence this is a way to 
implement compile time polymorphism.
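
To make the pattern concrete, here is a minimal, self-contained sketch. It is not the __spirit__ source: `parser_base`, `char_parser`, `do_parse` and `starts_with` are hypothetical names invented for this illustration, and the signature is simplified to an iterator pair.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a CRTP parser base (not Spirit's code).
template <typename Derived>
struct parser_base
{
    // Downcast to the concrete parser type; resolved at compile time.
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }

    // Forward to the derived class without any virtual call.
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        return derived().do_parse(first, last);
    }
};

// A concrete parser that matches exactly one given character.
struct char_parser : parser_base<char_parser>
{
    explicit char_parser(char c) : ch(c) {}

    template <typename Iterator>
    bool do_parse(Iterator& first, Iterator const& last) const
    {
        if (first != last && *first == ch)
        {
            ++first;
            return true;
        }
        return false;
    }

    char ch;
};

// Generic code written against the base; it works for any Derived.
template <typename Derived>
bool starts_with(parser_base<Derived> const& p, std::string const& input)
{
    std::string::const_iterator first = input.begin();
    return p.parse(first, input.end());
}
```

A call such as `starts_with(char_parser('a'), "abc")` is dispatched statically: the compiler knows the concrete type and is free to inline `do_parse`.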

The Derived parsers, `__primitive_parser_concept__`, `__unary_parser_concept__`,
`__binary_parser_concept__` and `__nary_parser_concept__` provide the necessary 
facilities for parser detection, introspection, transformation and visitation.

Derived parsers must support the following:

[variablelist bool parse(f, l, context, skip, attr)
  [[`f`, `l`] [first/last iterator pair]]
  [[`context`]    [enclosing rule context (can be unused_type)]]
  [[`skip`]   [skipper (can be unused_type)]]
  [[`attr`]   [attribute (can be unused_type)]]
]

The /parse/ function is the main parser entry point. The skipper can be an 
`unused_type`, a type used everywhere in __spirit__ to signify "don't-care". 
There is an overload of the /skip/ function for `unused_type` that is simply a 
no-op. That way, we do not have to write multiple parse functions for
phrase and character level parsing.
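
The no-op overload idea can be sketched with plain function overloading. Everything below is an assumption made for illustration: `unused_t` stands in for `unused_type`, and `space_skipper`, `matches` and `offset_after_skip` are hypothetical names, not part of __spirit__.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for unused_type: an empty "don't care" tag.
struct unused_t {};

// A toy skipper that recognizes whitespace.
struct space_skipper
{
    bool matches(char c) const
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
};

// General case: advance first past everything the skipper matches.
template <typename Iterator, typename Skipper>
void skip(Iterator& first, Iterator const& last, Skipper const& skipper)
{
    while (first != last && skipper.matches(*first))
        ++first;
}

// Overload for the "don't care" skipper: nothing to do.
template <typename Iterator>
void skip(Iterator&, Iterator const&, unused_t const&)
{
}

// One function serves both phrase level and character level parsing;
// it returns how far the pre-skip advanced into the input.
template <typename Skipper>
std::size_t offset_after_skip(std::string const& input, Skipper const& skipper)
{
    std::string::const_iterator first = input.begin();
    skip(first, input.end(), skipper);
    return static_cast<std::size_t>(first - input.begin());
}
```

The second overload is more specialized, so it is selected whenever the skipper is `unused_t`, and the body of the general version is never instantiated for it.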

Here are the basic rules for parsing:

* The parser returns `true` if successful, `false` otherwise.
* If successful, `first` is incremented N number of times, where N
   is the number of characters parsed. N can be zero --an empty (epsilon)
   match.
* If successful, the parsed attribute is assigned to /attr/.
* If unsuccessful, `first` is reset to its position before entering
   the parser function. /attr/ is untouched.
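
As a sketch of these rules, here is a hypothetical `keyword_parser` (not a __spirit__ component) with a simplified `parse` signature that omits the context and skipper. It works on a copy of the iterator and only commits on success.

```cpp
#include <string>

// Hypothetical parser matching a fixed keyword; it illustrates the rules only.
struct keyword_parser
{
    explicit keyword_parser(std::string const& k) : keyword(k) {}

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, std::string& attr) const
    {
        Iterator it = first;  // work on a copy of the current position
        for (std::string::const_iterator k = keyword.begin();
             k != keyword.end(); ++k, ++it)
        {
            if (it == last || *it != *k)
                return false;  // failure: first and attr are left untouched
        }
        first = it;      // success: advance by the number of characters parsed
        attr = keyword;  // success: assign the parsed attribute
        return true;
    }

    std::string keyword;
};
```

An empty keyword gives the epsilon match mentioned above: the parser succeeds and `first` does not move.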

[variablelist void what(context)
  [[`context`]    [enclosing rule context (can be `unused_type`)]]
]

The /what/ function should be obvious. It provides some information
about ["what] the parser is. It is used as a debugging aid, for
example.

[variablelist P::template attribute<context>::type
  [[`P`]       [a parser type]]
  [[`context`] [A context type (can be unused_type)]]
]

The /attribute/ metafunction returns the expected attribute type
of the parser. In some cases, this is context dependent.
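
The shape of such a nested metafunction can be sketched as follows. The parser and context types (`toy_int_parser`, `toy_context_parser`, `toy_context`, `unused_t`) are hypothetical, and the `attribute_of` helper here is a simplified model written for this example.

```cpp
#include <type_traits>

struct unused_t {};  // hypothetical stand-in for unused_type

// A parser whose attribute does not depend on the context.
struct toy_int_parser
{
    template <typename Context>
    struct attribute
    {
        typedef int type;
    };
};

// A hypothetical context that carries a type of its own.
struct toy_context
{
    typedef double result_type;
};

// A parser whose attribute is context dependent.
struct toy_context_parser
{
    template <typename Context>
    struct attribute
    {
        typedef typename Context::result_type type;
    };
};

// Generic code queries the attribute as P::template attribute<Context>::type.
template <typename P, typename Context>
struct attribute_of
{
    typedef typename P::template attribute<Context>::type type;
};

static_assert(
    std::is_same<attribute_of<toy_int_parser, unused_t>::type, int>::value,
    "context independent attribute");
static_assert(
    std::is_same<attribute_of<toy_context_parser, toy_context>::type,
                 double>::value,
    "context dependent attribute");
```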

In this section, we will dissect two parser types:

[variablelist Parsers
  [[`__primitive_parser_concept__`]  [A parser for primitive data (e.g. integer parsing).]]
  [[`__unary_parser_concept__`]  [A parser that has single subject (e.g. kleene star).]]
] 

[/------------------------------------------------------------------------------]
[heading Primitive Parsers]

For our dissection study, we will use a __spirit__ primitive, the `any_int_parser`
in the boost::spirit::qi namespace.

[import ../../../../boost/spirit/home/qi/numeric/int.hpp]
[primitive_parsers_any_int_parser]

The `any_int_parser` is derived from a `__primitive_parser_concept__<Derived>`,
which in turn derives from `parser<Derived>`. Therefore, it supports the
following requirements:

* The `parse` member function
* The `what` member function
* The nested `attribute` metafunction

/parse/ is the main entry point. For primitive parsers, the first thing to do is 
call:

``
qi::skip(first, last, skipper);
``

to do a pre-skip. After pre-skipping, the parser proceeds to do its thing. The
actual parsing code is placed in `extract_int<T, Radix, MinDigits,
MaxDigits>::call(first, last, attr);`
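
The two-step protocol (pre-skip, then extract) can be modeled with the standard library alone. This is a rough sketch, not the __spirit__ implementation: `skip_spaces`, `extract_uint` and `toy_uint_parser` are hypothetical, the skipper is hard-wired to whitespace, and the extractor handles only unsigned decimal digits with no overflow checks.

```cpp
#include <cctype>
#include <string>

// Hypothetical pre-skip: advance past leading whitespace.
template <typename Iterator>
void skip_spaces(Iterator& first, Iterator const& last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// Hypothetical extractor: unsigned decimal digits only, no overflow checks.
template <typename Iterator>
bool extract_uint(Iterator& first, Iterator const& last, unsigned& attr)
{
    Iterator it = first;
    unsigned value = 0;
    bool any_digit = false;
    for (; it != last && std::isdigit(static_cast<unsigned char>(*it)); ++it)
    {
        value = value * 10 + static_cast<unsigned>(*it - '0');
        any_digit = true;
    }
    if (!any_digit)
        return false;  // nothing consumed, attr untouched
    first = it;
    attr = value;
    return true;
}

struct toy_uint_parser
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, unsigned& attr) const
    {
        Iterator save = first;
        skip_spaces(first, last);             // step 1: pre-skip
        if (extract_uint(first, last, attr))  // step 2: the actual parsing
            return true;
        first = save;  // in this sketch, failure restores the position
        return false;
    }
};
```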

This simple no-frills protocol is one of the reasons why __spirit__ is
fast. If you know the internals of __classic__ and perhaps
even wrote some parsers with it, this simple __spirit__ mechanism
is a joy to work with. There are no scanners and all that crap.

The /what/ function just tells us that it is an integer parser. Simple.

The /attribute/ metafunction returns the T template parameter. We associate the 
`any_int_parser` to some placeholders for `short_`, `int_`, `long_` and
`long_long` types. But, first, we enable these placeholders in namespace
boost::spirit:

[primitive_parsers_enable_short]
[primitive_parsers_enable_int]
[primitive_parsers_enable_long]
[primitive_parsers_enable_long_long]

Notice that `any_int_parser` is placed in the namespace boost::spirit::qi
while these /enablers/ are in namespace boost::spirit. The reason is
that these placeholders are shared by other __spirit__ /domains/. __qi__,
the parser, is one domain. __karma__, the generator, is another domain.
Other parser technologies may be developed and placed in yet
another domain. Yet, all these can potentially share the same
placeholders for interoperability. The interpretation of these
placeholders is domain-specific.

Now that we have enabled the placeholders, we have to write generators
for them, the `make_xxx` function objects (in namespace boost::spirit::qi):

[primitive_parsers_make_int]

The one above is our main generator. It's a simple function object
with 2 (unused) arguments. These arguments are:

# The actual terminal value obtained by proto. In this case, either
  a short_, int_, long_ or long_long. We don't care about this.

# Modifiers. We also don't care about this. This allows directives
  such as `no_case[p]` to pass information to inner parser nodes.
  We'll see how that works later.

Now:

[primitive_parsers_short_primitive]
[primitive_parsers_int_primitive]
[primitive_parsers_long_primitive]
[primitive_parsers_long_long_primitive]

These specialize `qi::make_primitive` for specific tags. They all
inherit from `make_int`, which does the actual work.
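
The specialize-and-inherit pattern can be sketched in isolation. The code below is a simplified model: the real `make_primitive` takes more template parameters than shown, and `tag::short_`, `tag::int_` and `toy_int_parser` are hypothetical stand-ins defined only for this example.

```cpp
#include <type_traits>

// Hypothetical placeholder tags.
namespace tag
{
    struct short_ {};
    struct int_ {};
}

// Hypothetical parser type, parameterized on its attribute type.
template <typename T>
struct toy_int_parser
{
    typedef T attribute_type;
};

// The shared generator: builds the parser and ignores both arguments
// (the terminal value and the modifiers).
template <typename T>
struct make_int
{
    typedef toy_int_parser<T> result_type;

    template <typename Terminal, typename Modifiers>
    result_type operator()(Terminal const&, Modifiers const&) const
    {
        return result_type();
    }
};

// The primary template is left undefined, so unknown tags fail to compile.
template <typename Tag>
struct make_primitive;

// Each specialization only selects T; make_int does the actual work.
template <>
struct make_primitive<tag::short_> : make_int<short> {};

template <>
struct make_primitive<tag::int_> : make_int<int> {};

static_assert(
    std::is_same<make_primitive<tag::int_>::result_type,
                 toy_int_parser<int> >::value,
    "the int_ tag builds an int parser");
```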

[heading Composite Parsers]

Let me present the kleene star (also in namespace spirit::qi):

[import ../../../../boost/spirit/home/qi/operator/kleene.hpp]
[composite_parsers_kleene]

It looks similar in form to its primitive cousin, the `any_int_parser`. And, again, it
has the same basic ingredients required by `Derived`:

* The nested attribute metafunction
* The parse member function
* The what member function

kleene is a composite parser. It is a parser that composes another
parser, its ["subject]. It is a `__unary_parser_concept__` and subclasses from it.
Like `__primitive_parser_concept__`, `__unary_parser_concept__<Derived>` derives 
from `parser<Derived>`.

`unary_parser<Derived>` has these expression requirements on `Derived`:

* p.subject -> subject parser ( ['p] is a __unary_parser_concept__ parser.)
* P::subject_type -> subject parser type ( ['P] is a __unary_parser_concept__ type.)

/parse/ is the main parser entry point. Since this is not a primitive
parser, we do not need to call `qi::skip(first, last, skipper)`. The
['subject], if it is a primitive, will do the pre-skip. If it is
another composite parser, it will eventually call a primitive parser
somewhere down the line which will do the pre-skip. This makes it a
lot more efficient than __classic__. __classic__ puts the skipping business
into the so-called "scanner" which blindly attempts a pre-skip
every time we increment the iterator.

What is the /attribute/ of the kleene? In general, it is a `std::vector<T>`
where `T` is the attribute of the subject. There is a special case though.
If `T` is an `unused_type`, then the attribute of kleene is also `unused_type`.
`traits::build_std_vector` takes care of that minor detail.
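
A metafunction with this behavior can be written as a primary template plus one specialization. The sketch below uses the hypothetical names `build_vector` and `unused_t`; it models the idea and is not the definition of `traits::build_std_vector`.

```cpp
#include <type_traits>
#include <vector>

struct unused_t {};  // hypothetical stand-in for unused_type

// General case: the attribute is a vector of the subject's attribute.
template <typename T>
struct build_vector
{
    typedef std::vector<T> type;
};

// Special case: an unused subject attribute stays unused.
template <>
struct build_vector<unused_t>
{
    typedef unused_t type;
};

static_assert(
    std::is_same<build_vector<int>::type, std::vector<int> >::value,
    "int maps to std::vector<int>");
static_assert(
    std::is_same<build_vector<unused_t>::type, unused_t>::value,
    "unused maps to unused");
```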

So, let's parse. First, we need to provide a local attribute for
the subject:

``
typename traits::attribute_of<Subject, Context>::type val;
``

`traits::attribute_of<Subject, Context>` simply calls the subject's
`struct attribute<Context>` nested metafunction.

/val/ starts out default initialized. This val is the one we'll
pass to the subject's parse function.

The kleene repeats indefinitely while the subject parser is
successful. On each successful parse, we `push_back` the parsed
attribute to the kleene's attribute, which is expected to be,
at the very least, compatible with a `std::vector`. In other words,
although we say that we want our attribute to be a `std::vector`,
we try to be more lenient than that. The caller of kleene's
parse may pass a different attribute type. For as long as it is
also a conforming STL container with `push_back`, we are ok. Here
is the kleene loop:

``
while (subject.parse(first, last, context, skipper, val))
{
    // push the parsed value into our attribute
    traits::push_back(attr, val);
    traits::clear(val);
}
return true;
``

Take note that we didn't call `attr.push_back(val)`. Instead, we
called a Spirit-provided function:

``
traits::push_back(attr, val);
``

This is a recurring pattern. The reason why we do it this way is
because attr [*can] be `unused_type`. `traits::push_back` takes care
of that detail. The overload for unused_type is a no-op. Now, you
can imagine why __spirit__ is fast! The parsers are so simple and the
generated code is as efficient as a hand rolled loop. All these
parser compositions and recursive parse invocations are extensively
inlined by a modern C++ compiler. In the end, you get a tight loop
when you use the kleene. No more excess baggage. If the attribute
is unused, then there is no code generated for that. That's how
__spirit__ is designed.
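
Here is a self-contained model of the loop and of the no-op overload. All names are hypothetical (`toy_traits`, `unused_t`, `digit_parser`, `toy_kleene`), the context and skipper are omitted, and the subject's attribute is hard-wired to `char` to keep the sketch short.

```cpp
#include <cctype>
#include <string>
#include <vector>

struct unused_t {};  // hypothetical stand-in for unused_type

namespace toy_traits
{
    // Any container that provides push_back is acceptable.
    template <typename Container, typename T>
    void push_back(Container& c, T const& val)
    {
        c.push_back(val);
    }

    // Unused attribute: nothing to store, so nothing to do.
    template <typename T>
    void push_back(unused_t&, T const&)
    {
    }
}

// Hypothetical subject: matches one decimal digit and always consumes it.
struct digit_parser
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, char& attr) const
    {
        if (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        {
            attr = *first;
            ++first;
            return true;
        }
        return false;
    }
};

template <typename Subject>
struct toy_kleene
{
    explicit toy_kleene(Subject const& s) : subject(s) {}

    template <typename Iterator, typename Attribute>
    bool parse(Iterator& first, Iterator const& last, Attribute& attr) const
    {
        char val = char();  // local attribute for the subject
        while (subject.parse(first, last, val))
        {
            toy_traits::push_back(attr, val);
            val = char();   // reset before the next iteration
        }
        return true;        // zero matches still count as success
    }

    Subject subject;
};
```

The same `parse` accepts a `std::vector<char>`, a `std::string` (which also has `push_back`), or an `unused_t`. In the last case the loop body reduces to advancing the iterator.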

The /what/ function simply wraps the output of the subject in a 
"kleene[" ... "]".

Ok, now, like the `any_int_parser`, we have to hook our parser to the
__qi__ engine. Here's how we do it:

First, we enable the prefix star operator. In proto, it's called
the "dereference":

[composite_parsers_kleene_enable_]

This is done in namespace `boost::spirit` like its friend, the `use_terminal`
specialization for our `any_int_parser`. Obviously, we use /use_operator/ to
enable the dereference for the `qi::domain`.

Then, we need to write our generator (in namespace qi):

[composite_parsers_kleene_generator]

This essentially says: for all expressions of the form `*p`, build a kleene 
parser. `Elements` is a __fusion__ sequence. For the kleene, which is a unary 
operator, expect only one element in the sequence. That element is the subject 
of the kleene.
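
To get a feel for what "`*p` builds a kleene parser" means, here is a much simpler model that swaps the proto-based machinery for a plain overloaded `operator*`. This is not how __spirit__ does it: `parser_base`, `digit_parser` and `toy_kleene` are hypothetical, and attributes, context and skipper are left out.

```cpp
#include <string>

// Hypothetical CRTP base used to recognize "something that is a parser".
template <typename Derived>
struct parser_base
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

// Hypothetical subject: matches one decimal digit and always consumes it.
struct digit_parser : parser_base<digit_parser>
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        if (first != last && *first >= '0' && *first <= '9')
        {
            ++first;
            return true;
        }
        return false;
    }
};

template <typename Subject>
struct toy_kleene : parser_base<toy_kleene<Subject> >
{
    explicit toy_kleene(Subject const& s) : subject(s) {}

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        while (subject.parse(first, last))
        {
        }
        return true;
    }

    Subject subject;  // the single element: the subject of the kleene
};

// The "generator": for any parser p, the expression *p builds a kleene of p.
template <typename Derived>
toy_kleene<Derived> operator*(parser_base<Derived> const& p)
{
    return toy_kleene<Derived>(p.derived());
}
```

With a `digit_parser d`, the expression `*d` has type `toy_kleene<digit_parser>` and holds a copy of `d` as its subject.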

We still don't care about the Modifiers. We'll see what the modifiers are
all about when we get to deep directives.

[endsect]

[endsect]