[ceph.git] / ceph / src / boost / libs / spirit / doc / qi / roman.qbk

[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Roman Numerals]

This example demonstrates:

* symbol table
* rule
* grammar

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters (a `char`, `wchar_t`, `int`, enumeration etc.) . The template
class, parameterized by the character type, can work efficiently with 8, 16, 32
and even 64 bit characters. Mutable data of type T are associated with each
symbol.

Traditionally, symbol table management is maintained separately outside the BNF
grammar through semantic actions. Contrary to standard practice, the Spirit
symbol table class `symbols` is a parser. An object of which may be used
anywhere in the EBNF grammar specification. It is an example of a dynamic
parser. A dynamic parser is characterized by its ability to modify its behavior
at run time. Initially, an empty symbols object matches nothing. At any time,
symbols may be added or removed, thus, dynamically altering its behavior.

Each entry in a symbol table has an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.

The symbols class expects two template parameters. The first parameter specifies
the character type of the symbols. The second specifies the data type associated
with each symbol: its attribute.

Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
mind that the data associated with each slot is the parser's attribute (which is
passed to attached semantic actions).

[import ../../example/qi/roman.cpp]

[tutorial_roman_hundreds]

Here's a parser for roman tens (10..90):

[tutorial_roman_tens]

and, finally, for ones (1..9):

[tutorial_roman_ones]

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you'd want to break the expressions into smaller easier-to-understand pieces,
name them, and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<Iterator> r;

At the very least, the rule needs to know the iterator type it will be working
on. This rule cannot be used with `phrase_parse`. It can only be used with the
`parse` function -- a version that does not do white space skipping (does not
have the skipper argument). If you want to have it skip white spaces, you need
to pass in the type skip parser, as in the next form:

    rule<Iterator, Skipper> r;

Example:

    rule<std::string::iterator, space_type> r;

This type of rule can be used for both `phrase_parse` and `parse`.

For our next example, there's one more rule form you should know about:

    rule<Iterator, Signature> r;

or

    rule<Iterator, Signature, Skipper> r;

[tip All rule template arguments after Iterator can be supplied in any order.]

The Signature specifies the attributes of the rule. You've seen that our parsers
can have an attribute. Recall that the `double_` parser has an attribute of
`double`. To be precise, these are /synthesized/ attributes. The parser
"synthesizes" the attribute value. Think of them as function return values.

There's another type of attribute called "inherited" attribute. We won't need
them for now, but it's good that you be aware of such attributes. You can think
of them as function arguments. And, rightly so, the rule signature is a function
signature of the form:

    result(argN, argN,..., argN)

After having declared a rule, you can now assign any parser expression to it.
Example:

    r = double_ >> *(',' >> double_);

[heading Grammars]

A grammar encapsulates one or more rules. It has the same template parameters as
the rule. You declare a grammar by:

# deriving a struct (or class) from the `grammar` class template
# declare one or more rules as member variables
# initialize the base grammar class by giving it the start rule (its the first
  rule that gets called when the grammar starts parsing)
# initialize your rules in your constructor

The roman numeral grammar is a very nice and simple example of a grammar:

[tutorial_roman_grammar]

Things to take notice of:

* The grammar and start rule signature is `unsigned()`. It has a synthesized
  attribute (return value) of type `unsigned` with no inherited attributes
  (arguments).

* We did not specify a skip-parser. We don't want to skip in between the
  numerals.

* `roman::base_type` is a typedef for `grammar<Iterator, unsigned()>`. If
   `roman` was not a template, you could simply write: base_type(start)

* It's best to make your grammar templates such that they can be reused
  for different iterator types.

* `_val` is another __phoenix__ placeholder representing the rule's synthesized
  attribute.

* `eps` is a special spirit parser that consumes no input but is always
  successful. We use it to initialize `_val`, the rule's synthesized
  attribute, to zero before anything else. The actual parser starts at
  `+lit('M')`, parsing roman thousands. Using `eps` this way is good
  for doing pre and post initializations.

* The expression `a || b` reads: match a or b and in sequence. That is, if both
  `a` and `b` match, it must be in sequence; this is equivalent to `a >> -b | b`,
  but more efficient.

[heading Let's Parse!]

[tutorial_roman_grammar_parse]

`roman_parser` is an object of type `roman`, our roman numeral parser. This time
around we are using the no-skipping version of the parse functions. We do not
want to skip any spaces! We are also passing in an attribute, `unsigned result`,
which will receive the parsed value.

The full cpp file for this example can be found here: [@../../example/qi/roman.cpp]


[endsect]
Commit	Line	Data
7c673cae FG	1	[/==============================================================================
	2	Copyright (C) 2001-2011 Joel de Guzman
	3	Copyright (C) 2001-2011 Hartmut Kaiser
	4
	5	Distributed under the Boost Software License, Version 1.0. (See accompanying
	6	file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
	7	===============================================================================/]
	8
	9	[section Roman Numerals]
	10
	11	This example demonstrates:
	12
	13	* symbol table
	14	* rule
	15	* grammar
	16
	17	[heading Symbol Table]
	18
	19	The symbol table holds a dictionary of symbols where each symbol is a sequence
	20	of characters (a `char`, `wchar_t`, `int`, enumeration etc.) . The template
	21	class, parameterized by the character type, can work efficiently with 8, 16, 32
	22	and even 64 bit characters. Mutable data of type T are associated with each
	23	symbol.
	24
	25	Traditionally, symbol table management is maintained separately outside the BNF
	26	grammar through semantic actions. Contrary to standard practice, the Spirit
	27	symbol table class `symbols` is a parser. An object of which may be used
	28	anywhere in the EBNF grammar specification. It is an example of a dynamic
	29	parser. A dynamic parser is characterized by its ability to modify its behavior
	30	at run time. Initially, an empty symbols object matches nothing. At any time,
	31	symbols may be added or removed, thus, dynamically altering its behavior.
	32
	33	Each entry in a symbol table has an associated mutable data slot. In this
	34	regard, one can view the symbol table as an associative container (or map) of
	35	key-value pairs where the keys are strings.
	36
	37	The symbols class expects two template parameters. The first parameter specifies
	38	the character type of the symbols. The second specifies the data type associated
	39	with each symbol: its attribute.
	40
	41	Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
	42	mind that the data associated with each slot is the parser's attribute (which is
	43	passed to attached semantic actions).
	44
	45	[import ../../example/qi/roman.cpp]
	46
	47	[tutorial_roman_hundreds]
	48
	49	Here's a parser for roman tens (10..90):
	50
	51	[tutorial_roman_tens]
	52
	53	and, finally, for ones (1..9):
	54
	55	[tutorial_roman_ones]
	56
	57	Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
	58	They are all parsers.
	59
	60	[heading Rules]
	61
	62	Up until now, we've been inlining our parser expressions, passing them directly
	63	to the `phrase_parse` function. The expression evaluates into a temporary,
	64	unnamed parser which is passed into the `phrase_parse` function, used, and then
65	destroyed. This is fine for small parsers. When the expressions get complicated,
66	you'd want to break the expressions into smaller easier-to-understand pieces,
67	name them, and refer to them from other parser expressions by name.
68
69	A parser expression can be assigned to what is called a "rule". There are
70	various ways to declare rules. The simplest form is:
71
72	rule<Iterator> r;
73
74	At the very least, the rule needs to know the iterator type it will be working
75	on. This rule cannot be used with `phrase_parse`. It can only be used with the
76	`parse` function -- a version that does not do white space skipping (does not
77	have the skipper argument). If you want to have it skip white spaces, you need
78	to pass in the type skip parser, as in the next form:
79
80	rule<Iterator, Skipper> r;
81
82	Example:
83
84	rule<std::string::iterator, space_type> r;
85
86	This type of rule can be used for both `phrase_parse` and `parse`.
87
88	For our next example, there's one more rule form you should know about:
89
90	rule<Iterator, Signature> r;
91
92	or
93
94	rule<Iterator, Signature, Skipper> r;
95
96	[tip All rule template arguments after Iterator can be supplied in any order.]
97
98	The Signature specifies the attributes of the rule. You've seen that our parsers
99	can have an attribute. Recall that the `double_` parser has an attribute of
100	`double`. To be precise, these are /synthesized/ attributes. The parser
101	"synthesizes" the attribute value. Think of them as function return values.
102
103	There's another type of attribute called "inherited" attribute. We won't need
104	them for now, but it's good that you be aware of such attributes. You can think
105	of them as function arguments. And, rightly so, the rule signature is a function
106	signature of the form:
107
108	result(argN, argN,..., argN)
109
110	After having declared a rule, you can now assign any parser expression to it.
111	Example:
112
113	r = double_ >> *(',' >> double_);
114
115	[heading Grammars]
116
117	A grammar encapsulates one or more rules. It has the same template parameters as
118	the rule. You declare a grammar by:
119
120	# deriving a struct (or class) from the `grammar` class template
121	# declare one or more rules as member variables
122	# initialize the base grammar class by giving it the start rule (its the first
123	rule that gets called when the grammar starts parsing)
124	# initialize your rules in your constructor
125
126	The roman numeral grammar is a very nice and simple example of a grammar:
127
128	[tutorial_roman_grammar]
129
130	Things to take notice of:
131
132	* The grammar and start rule signature is `unsigned()`. It has a synthesized
133	attribute (return value) of type `unsigned` with no inherited attributes
134	(arguments).
135
136	* We did not specify a skip-parser. We don't want to skip in between the
137	numerals.
138
139	* `roman::base_type` is a typedef for `grammar<Iterator, unsigned()>`. If
140	`roman` was not a template, you could simply write: base_type(start)
141
142	* It's best to make your grammar templates such that they can be reused
143	for different iterator types.
144
145	* `_val` is another __phoenix__ placeholder representing the rule's synthesized
146	attribute.
147
148	* `eps` is a special spirit parser that consumes no input but is always
149	successful. We use it to initialize `_val`, the rule's synthesized
150	attribute, to zero before anything else. The actual parser starts at
151	`+lit('M')`, parsing roman thousands. Using `eps` this way is good
152	for doing pre and post initializations.
153
154	* The expression `a \|\| b` reads: match a or b and in sequence. That is, if both
155	`a` and `b` match, it must be in sequence; this is equivalent to `a >> -b \| b`,
156	but more efficient.
157
158	[heading Let's Parse!]
159
160	[tutorial_roman_grammar_parse]
161
162	`roman_parser` is an object of type `roman`, our roman numeral parser. This time
163	around we are using the no-skipping version of the parse functions. We do not
164	want to skip any spaces! We are also passing in an attribute, `unsigned result`,
165	which will receive the parsed value.
166
167	The full cpp file for this example can be found here: [@../../example/qi/roman.cpp]
168
169
170	[endsect]