]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/============================================================================== |
2 | Copyright (C) 2001-2011 Joel de Guzman | |
3 | Copyright (C) 2001-2011 Hartmut Kaiser | |
4 | ||
5 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
6 | file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
7 | ===============================================================================/] | |
8 | ||
9 | [section Roman Numerals] | |
10 | ||
11 | This example demonstrates: | |
12 | ||
13 | * symbol table | |
14 | * rule | |
15 | * grammar | |
16 | ||
17 | [heading Symbol Table] | |
18 | ||
19 | The symbol table holds a dictionary of symbols where each symbol is a sequence | |
20 | of characters (a `char`, `wchar_t`, `int`, enumeration etc.) . The template | |
21 | class, parameterized by the character type, can work efficiently with 8, 16, 32 | |
22 | and even 64 bit characters. Mutable data of type T are associated with each | |
23 | symbol. | |
24 | ||
25 | Traditionally, symbol table management is maintained separately outside the BNF | |
26 | grammar through semantic actions. Contrary to standard practice, the Spirit | |
27 | symbol table class `symbols` is a parser. An object of which may be used | |
28 | anywhere in the EBNF grammar specification. It is an example of a dynamic | |
29 | parser. A dynamic parser is characterized by its ability to modify its behavior | |
30 | at run time. Initially, an empty symbols object matches nothing. At any time, | |
31 | symbols may be added or removed, thus, dynamically altering its behavior. | |
32 | ||
33 | Each entry in a symbol table has an associated mutable data slot. In this | |
34 | regard, one can view the symbol table as an associative container (or map) of | |
35 | key-value pairs where the keys are strings. | |
36 | ||
37 | The symbols class expects two template parameters. The first parameter specifies | |
38 | the character type of the symbols. The second specifies the data type associated | |
39 | with each symbol: its attribute. | |
40 | ||
41 | Here's a parser for roman hundreds (100..900) using the symbol table. Keep in | |
42 | mind that the data associated with each slot is the parser's attribute (which is | |
43 | passed to attached semantic actions). | |
44 | ||
45 | [import ../../example/qi/roman.cpp] | |
46 | ||
47 | [tutorial_roman_hundreds] | |
48 | ||
49 | Here's a parser for roman tens (10..90): | |
50 | ||
51 | [tutorial_roman_tens] | |
52 | ||
53 | and, finally, for ones (1..9): | |
54 | ||
55 | [tutorial_roman_ones] | |
56 | ||
57 | Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions. | |
58 | They are all parsers. | |
59 | ||
60 | [heading Rules] | |
61 | ||
62 | Up until now, we've been inlining our parser expressions, passing them directly | |
63 | to the `phrase_parse` function. The expression evaluates into a temporary, | |
64 | unnamed parser which is passed into the `phrase_parse` function, used, and then | |
65 | destroyed. This is fine for small parsers. When the expressions get complicated, | |
66 | you'd want to break the expressions into smaller easier-to-understand pieces, | |
67 | name them, and refer to them from other parser expressions by name. | |
68 | ||
69 | A parser expression can be assigned to what is called a "rule". There are | |
70 | various ways to declare rules. The simplest form is: | |
71 | ||
72 | rule<Iterator> r; | |
73 | ||
74 | At the very least, the rule needs to know the iterator type it will be working | |
75 | on. This rule cannot be used with `phrase_parse`. It can only be used with the | |
76 | `parse` function -- a version that does not do white space skipping (does not | |
77 | have the skipper argument). If you want to have it skip white spaces, you need | |
78 | to pass in the type skip parser, as in the next form: | |
79 | ||
80 | rule<Iterator, Skipper> r; | |
81 | ||
82 | Example: | |
83 | ||
84 | rule<std::string::iterator, space_type> r; | |
85 | ||
86 | This type of rule can be used for both `phrase_parse` and `parse`. | |
87 | ||
88 | For our next example, there's one more rule form you should know about: | |
89 | ||
90 | rule<Iterator, Signature> r; | |
91 | ||
92 | or | |
93 | ||
94 | rule<Iterator, Signature, Skipper> r; | |
95 | ||
96 | [tip All rule template arguments after Iterator can be supplied in any order.] | |
97 | ||
98 | The Signature specifies the attributes of the rule. You've seen that our parsers | |
99 | can have an attribute. Recall that the `double_` parser has an attribute of | |
100 | `double`. To be precise, these are /synthesized/ attributes. The parser | |
101 | "synthesizes" the attribute value. Think of them as function return values. | |
102 | ||
103 | There's another type of attribute called "inherited" attribute. We won't need | |
104 | them for now, but it's good that you be aware of such attributes. You can think | |
105 | of them as function arguments. And, rightly so, the rule signature is a function | |
106 | signature of the form: | |
107 | ||
108 | result(argN, argN,..., argN) | |
109 | ||
110 | After having declared a rule, you can now assign any parser expression to it. | |
111 | Example: | |
112 | ||
113 | r = double_ >> *(',' >> double_); | |
114 | ||
115 | [heading Grammars] | |
116 | ||
117 | A grammar encapsulates one or more rules. It has the same template parameters as | |
118 | the rule. You declare a grammar by: | |
119 | ||
120 | # deriving a struct (or class) from the `grammar` class template | |
121 | # declare one or more rules as member variables | |
122 | # initialize the base grammar class by giving it the start rule (its the first | |
123 | rule that gets called when the grammar starts parsing) | |
124 | # initialize your rules in your constructor | |
125 | ||
126 | The roman numeral grammar is a very nice and simple example of a grammar: | |
127 | ||
128 | [tutorial_roman_grammar] | |
129 | ||
130 | Things to take notice of: | |
131 | ||
132 | * The grammar and start rule signature is `unsigned()`. It has a synthesized | |
133 | attribute (return value) of type `unsigned` with no inherited attributes | |
134 | (arguments). | |
135 | ||
136 | * We did not specify a skip-parser. We don't want to skip in between the | |
137 | numerals. | |
138 | ||
139 | * `roman::base_type` is a typedef for `grammar<Iterator, unsigned()>`. If | |
140 | `roman` was not a template, you could simply write: base_type(start) | |
141 | ||
142 | * It's best to make your grammar templates such that they can be reused | |
143 | for different iterator types. | |
144 | ||
145 | * `_val` is another __phoenix__ placeholder representing the rule's synthesized | |
146 | attribute. | |
147 | ||
148 | * `eps` is a special spirit parser that consumes no input but is always | |
149 | successful. We use it to initialize `_val`, the rule's synthesized | |
150 | attribute, to zero before anything else. The actual parser starts at | |
151 | `+lit('M')`, parsing roman thousands. Using `eps` this way is good | |
152 | for doing pre and post initializations. | |
153 | ||
154 | * The expression `a || b` reads: match a or b and in sequence. That is, if both | |
155 | `a` and `b` match, it must be in sequence; this is equivalent to `a >> -b | b`, | |
156 | but more efficient. | |
157 | ||
158 | [heading Let's Parse!] | |
159 | ||
160 | [tutorial_roman_grammar_parse] | |
161 | ||
162 | `roman_parser` is an object of type `roman`, our roman numeral parser. This time | |
163 | around we are using the no-skipping version of the parse functions. We do not | |
164 | want to skip any spaces! We are also passing in an attribute, `unsigned result`, | |
165 | which will receive the parsed value. | |
166 | ||
167 | The full cpp file for this example can be found here: [@../../example/qi/roman.cpp] | |
168 | ||
169 | ||
170 | [endsect] |