[ceph.git] / ceph / src / boost / libs / spirit / doc / introduction.qbk

[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Introduction]

Boost Spirit is an object-oriented, recursive-descent parser and
output generation library for C++. It allows you to write grammars and
format descriptions using a format similar to Extended Backus Naur
Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
ISO-EBNF]] directly in C++. These inline grammar
specifications can mix freely with other C++ code and, thanks to the
generative power of C++ templates, are immediately executable.  In
retrospect, conventional compiler-compilers or parser-generators have
to perform an additional translation step from the source EBNF code to
C or C++ code.

The syntax and semantics of the libraries' API directly form domain-specific 
embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the 
user: 

* one for creating parser grammars, 
* one for the specification of the required tokens to be used for parsing, 
* and one for the description of the required output formats. 

Since the target input grammars and output formats are written entirely in C++ 
we do not need any separate tools to compile, preprocess or integrate those 
into the build process. __spirit__ allows seamless integration of the parsing 
and output generation process with other C++ code. This often allows for 
simpler and more efficient code.

Both the created parsers and generators are fully attributed, which allows you 
to easily build and handle hierarchical data structures in memory. These data 
structures resemble the structure of the input data and can directly be used 
to generate arbitrarily-formatted output.

The [link spirit.spiritstructure figure] below depicts the overall structure 
of the Boost Spirit library. The library consists of 4 major parts:

* __classic__: This is the almost-unchanged code base taken from the 
  former Boost Spirit V1.8 distribution. It has been moved into the namespace 
  boost::spirit::classic. A special compatibility layer has been added to 
  ensure complete compatibility with existing code using Spirit V1.8.
* __qi__: This is the parser library allowing you to build recursive 
  descent parsers. The exposed domain-specific language can be used to describe
  the grammars to implement, and the rules for storing the parsed information.
* __lex__: This is the library usable to create tokenizers (lexers). The
  domain-specific language exposed by __lex__ allows you to define regular 
  expressions used to match tokens (create token definitions), associate these 
  regular expressions with code to be executed whenever they are matched, and 
  to add the token definitions to the lexical analyzer.
* __karma__: This is the generator library allowing you to create code for 
  recursive descent, data type-driven output formatting. The exposed
  domain-specific language is almost equivalent to the parser description language 
  used in __qi__, except that it is used to describe the required output 
  format to generate from a given data structure.

[fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]


The three components, __qi__, __karma__ and __lex__, are designed to be used
either stand alone, or together. The general methodology is to use the token 
sequence generated by __lex__ as the input for a parser generated by __qi__.
On the opposite side of the equation, the hierarchical data structures generated 
by __qi__ are used for the output generators created using __karma__. 
However, there is nothing to stop you from using any of these components all 
by themselves.

The [link spirit.spiritkarmaflow figure] below shows the typical data flow of 
some input being converted to some internal representation. After some 
(optional) transformation these data are converted back into some different, 
external representation. The picture highlights Spirit's place in this data 
transformation flow.

[fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]

[heading A Quick Overview of Parsing with __qi__]

__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
target grammar (essentially a format description of the input data to read).

A simple EBNF grammar snippet: 

    group       ::= '(' expression ')'
    factor      ::= integer | group
    term        ::= factor (('*' factor) | ('/' factor))*
    expression  ::= term (('+' term) | ('-' term))*

is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this 
code snippet: 

    group       = '(' >> expression >> ')';
    factor      = integer | group;
    term        = factor >> *(('*' >> factor) | ('/' >> factor));
    expression  = term >> *(('+' >> term) | ('-' >> term));

Through the magic of expression templates, this is perfectly valid and 
executable C++ code. The production rule `expression` is, in fact, an object that 
has a member function `parse` that does the work given a source code written in 
the grammar that we have just declared. Yes, it's a calculator. We shall 
simplify for now by skipping the type declarations and the definition of the 
rule `integer` invoked by `factor`. Now, the production rule `expression` in our 
grammar specification, traditionally called the `start` symbol, can recognize 
inputs such as: 

    12345
    -12345
    +12345
    1 + 2
    1 * 2
    1/2 + 3/4
    1 + 2 + 3 + 4
    1 * 2 * 3 * 4
    (1 + 2) * (3 + 4)
    (-1 + 2) * (3 + -4)
    1 + ((6 * 200) - 20) / 6
    (1 + (2 + (3 + (4 + 5))))

Certainly we have modified the original EBNF syntax. This is done to
conform to C++ syntax rules. Most notably we see the abundance of
shift >> operators. Since there are no 'empty' operators in C++, it is
simply not possible to write something like:

    a b

as seen in math syntax, for example, to mean multiplication or, in our case, 
as seen in EBNF syntax to mean sequencing (b should follow a). __qi__ 
uses the shift `>>` operator instead for this purpose. We take the `>>` operator, 
with arrows pointing to the right, to mean "is followed by". Thus we write: 

    a >> b

The alternative operator `|` and the parentheses `()` remain as is. The 
assignment operator `=` is used in place of EBNF's `::=`. Last but not least, 
the Kleene star `*`, which in this case is a postfix operator in EBNF becomes a 
prefix. Instead of: 

    a* //... in EBNF syntax,

we write: 

    *a //... in Spirit.

since there are no postfix stars, `*`, in C/C++. Finally, we terminate each 
rule with the ubiquitous semi-colon, `;`. 


[heading A Quick Overview of Output Generation with __karma__]

Spirit not only allows you to describe the structure of the input, it also enables 
the specification of the output format for your data in a similar way, and based 
on a single syntax and compatible semantics.

Let's assume we need to generate a textual representation from a simple data 
structure such as a `std::vector<int>`. Conventional code probably would look like:

    std::vector<int> v (initialize_and_fill());
    std::vector<int>::iterator end = v.end();
    for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
        std::cout << *it << std::endl;

which is not very flexible and quite difficult to maintain when it comes to 
changing the required output format. Spirit's sublibrary /Karma/ allows you to 
specify output formats for arbitrary data structures in a very flexible way. 
The following snippet is the /Karma/ format description used to create the 
same output as the traditional code above:

    *(int_ << eol)

Here are some more examples of format descriptions for different output 
representations of the same `std::vector<int>`:

[table Different output formats for `std::vector<int>` 
    [ [Format]                         [Example]          [Description] ]
    [ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`]      [Comma separated list of integers] ]
    [ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`]  [Comma separated list of integers in parenthesis] ]
    [ [`*hex`]                         [`18a`]            [A list of hexadecimal numbers] ]
    [ [`*(double_ << ',')`]            [`1.0,8.0,10.0,`]  [A list of floating point numbers] ]
]

We will see later in this documentation how it is possible to avoid printing 
the trailing `','`.

Overall, the syntax is similar to __qi__ with the exception that we use the `<<` 
operator for output concatenation. This should be easy to understand as it 
follows the conventions used in the Standard's I/O streams. 

Another important feature of __karma__ allows you to fully decouple the data 
type from the output format. You can use the same output format with different 
data types as long as these conform conceptually. The next table gives some 
related examples.

[table Different data types usable with the output format `*(int_ << eol)`
    [ [Data type]                 [Description] ]
    [ [`int i[4]`]                [C style arrays] ]
    [ [`std::vector<int>`]        [Standard vector] ]
    [ [`std::list<int>`]          [Standard list] ]
    [ [`boost::array<long, 20>`]  [Boost array] ]
]

[endsect]
Commit	Line	Data
7c673cae FG	1	[/==============================================================================
	2	Copyright (C) 2001-2011 Joel de Guzman
	3	Copyright (C) 2001-2011 Hartmut Kaiser
	4
	5	Distributed under the Boost Software License, Version 1.0. (See accompanying
	6	file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
	7	===============================================================================/]
	8
	9	[section Introduction]
	10
	11	Boost Spirit is an object-oriented, recursive-descent parser and
	12	output generation library for C++. It allows you to write grammars and
	13	format descriptions using a format similar to Extended Backus Naur
	14	Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
	15	ISO-EBNF]] directly in C++. These inline grammar
	16	specifications can mix freely with other C++ code and, thanks to the
	17	generative power of C++ templates, are immediately executable. In
	18	retrospect, conventional compiler-compilers or parser-generators have
	19	to perform an additional translation step from the source EBNF code to
	20	C or C++ code.
	21
	22	The syntax and semantics of the libraries' API directly form domain-specific
	23	embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
	24	user:
	25
	26	* one for creating parser grammars,
	27	* one for the specification of the required tokens to be used for parsing,
	28	* and one for the description of the required output formats.
	29
	30	Since the target input grammars and output formats are written entirely in C++
	31	we do not need any separate tools to compile, preprocess or integrate those
	32	into the build process. __spirit__ allows seamless integration of the parsing
	33	and output generation process with other C++ code. This often allows for
	34	simpler and more efficient code.
	35
	36	Both the created parsers and generators are fully attributed, which allows you
	37	to easily build and handle hierarchical data structures in memory. These data
	38	structures resemble the structure of the input data and can directly be used
	39	to generate arbitrarily-formatted output.
	40
	41	The [link spirit.spiritstructure figure] below depicts the overall structure
	42	of the Boost Spirit library. The library consists of 4 major parts:
	43
	44	* __classic__: This is the almost-unchanged code base taken from the
	45	former Boost Spirit V1.8 distribution. It has been moved into the namespace
	46	boost::spirit::classic. A special compatibility layer has been added to
	47	ensure complete compatibility with existing code using Spirit V1.8.
	48	* __qi__: This is the parser library allowing you to build recursive
	49	descent parsers. The exposed domain-specific language can be used to describe
	50	the grammars to implement, and the rules for storing the parsed information.
	51	* __lex__: This is the library usable to create tokenizers (lexers). The
	52	domain-specific language exposed by __lex__ allows you to define regular
	53	expressions used to match tokens (create token definitions), associate these
	54	regular expressions with code to be executed whenever they are matched, and
	55	to add the token definitions to the lexical analyzer.
	56	* __karma__: This is the generator library allowing you to create code for
	57	recursive descent, data type-driven output formatting. The exposed
	58	domain-specific language is almost equivalent to the parser description language
	59	used in __qi__, except that it is used to describe the required output
	60	format to generate from a given data structure.
	61
	62	[fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]
	63
	64
65	The three components, __qi__, __karma__ and __lex__, are designed to be used
66	either stand alone, or together. The general methodology is to use the token
67	sequence generated by __lex__ as the input for a parser generated by __qi__.
68	On the opposite side of the equation, the hierarchical data structures generated
69	by __qi__ are used for the output generators created using __karma__.
70	However, there is nothing to stop you from using any of these components all
71	by themselves.
72
73	The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
74	some input being converted to some internal representation. After some
75	(optional) transformation these data are converted back into some different,
76	external representation. The picture highlights Spirit's place in this data
77	transformation flow.
78
79	[fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
80
81	[heading A Quick Overview of Parsing with __qi__]
82
83	__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
84	target grammar (essentially a format description of the input data to read).
85
86	A simple EBNF grammar snippet:
87
88	group ::= '(' expression ')'
89	factor ::= integer \| group
90	term ::= factor (('' factor) \| ('/' factor))
91	expression ::= term (('+' term) \| ('-' term))*
92
93	is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
94	code snippet:
95
96	group = '(' >> expression >> ')';
97	factor = integer \| group;
98	term = factor >> (('' >> factor) \| ('/' >> factor));
99	expression = term >> *(('+' >> term) \| ('-' >> term));
100
101	Through the magic of expression templates, this is perfectly valid and
102	executable C++ code. The production rule `expression` is, in fact, an object that
103	has a member function `parse` that does the work given a source code written in
104	the grammar that we have just declared. Yes, it's a calculator. We shall
105	simplify for now by skipping the type declarations and the definition of the
106	rule `integer` invoked by `factor`. Now, the production rule `expression` in our
107	grammar specification, traditionally called the `start` symbol, can recognize
108	inputs such as:
109
110	12345
111	-12345
112	+12345
113	1 + 2
114	1 * 2
115	1/2 + 3/4
116	1 + 2 + 3 + 4
117	1 * 2 * 3 * 4
118	(1 + 2) * (3 + 4)
119	(-1 + 2) * (3 + -4)
120	1 + ((6 * 200) - 20) / 6
121	(1 + (2 + (3 + (4 + 5))))
122
123	Certainly we have modified the original EBNF syntax. This is done to
124	conform to C++ syntax rules. Most notably we see the abundance of
125	shift >> operators. Since there are no 'empty' operators in C++, it is
126	simply not possible to write something like:
127
128	a b
129
130	as seen in math syntax, for example, to mean multiplication or, in our case,
131	as seen in EBNF syntax to mean sequencing (b should follow a). __qi__
132	uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
133	with arrows pointing to the right, to mean "is followed by". Thus we write:
134
135	a >> b
136
137	The alternative operator `\|` and the parentheses `()` remain as is. The
138	assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
139	the Kleene star `*`, which in this case is a postfix operator in EBNF becomes a
140	prefix. Instead of:
141
142	a* //... in EBNF syntax,
143
144	we write:
145
146	*a //... in Spirit.
147
148	since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
149	rule with the ubiquitous semi-colon, `;`.
150
151
152	[heading A Quick Overview of Output Generation with __karma__]
153
154	Spirit not only allows you to describe the structure of the input, it also enables
155	the specification of the output format for your data in a similar way, and based
156	on a single syntax and compatible semantics.
157
158	Let's assume we need to generate a textual representation from a simple data
159	structure such as a `std::vector<int>`. Conventional code probably would look like:
160
161	std::vector<int> v (initialize_and_fill());
162	std::vector<int>::iterator end = v.end();
163	for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
164	std::cout << *it << std::endl;
165
166	which is not very flexible and quite difficult to maintain when it comes to
167	changing the required output format. Spirit's sublibrary /Karma/ allows you to
168	specify output formats for arbitrary data structures in a very flexible way.
169	The following snippet is the /Karma/ format description used to create the
170	same output as the traditional code above:
171
172	*(int_ << eol)
173
174	Here are some more examples of format descriptions for different output
175	representations of the same `std::vector<int>`:
176
177	[table Different output formats for `std::vector<int>`
178	[ [Format] [Example] [Description] ]
179	[ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
180	[ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`] [Comma separated list of integers in parenthesis] ]
181	[ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
182	[ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
183	]
184
185	We will see later in this documentation how it is possible to avoid printing
186	the trailing `','`.
187
188	Overall, the syntax is similar to __qi__ with the exception that we use the `<<`
189	operator for output concatenation. This should be easy to understand as it
190	follows the conventions used in the Standard's I/O streams.
191
192	Another important feature of __karma__ allows you to fully decouple the data
193	type from the output format. You can use the same output format with different
194	data types as long as these conform conceptually. The next table gives some
195	related examples.
196
197	[table Different data types usable with the output format `*(int_ << eol)`
198	[ [Data type] [Description] ]
199	[ [`int i[4]`] [C style arrays] ]
200	[ [`std::vector<int>`] [Standard vector] ]
201	[ [`std::list<int>`] [Standard list] ]
202	[ [`boost::array<long, 20>`] [Boost array] ]
203	]
204
205	[endsect]