]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <html> |
2 | <head> | |
3 | <title>Subrules</title> | |
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> | |
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> | |
6 | </head> | |
7 | ||
8 | <body> | |
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> | |
10 | <tr> | |
11 | <td width="10"> | |
12 | </td> | |
13 | <td width="85%"> | |
14 | <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Subrules</b></font> | |
15 | </td> | |
16 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> | |
17 | </tr> | |
18 | </table> | |
19 | <br> | |
20 | <table border="0"> | |
21 | <tr> | |
22 | <td width="10"></td> | |
23 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> | |
24 | <td width="30"><a href="grammar.html"><img src="theme/l_arr.gif" border="0"></a></td> | |
25 | <td width="30"><a href="semantic_actions.html"><img src="theme/r_arr.gif" border="0"></a></td> | |
26 | </tr> | |
27 | </table> | |
28 | <p>Spirit is implemented using expression templates. This is a very powerful technique. | |
29 | Along with its power comes some complications. We almost take for granted that | |
30 | when we write <tt>i | j >> k</tt> where <tt>i</tt>, <tt>j</tt> and <tt>k</tt> | |
31 | are all integers the result is still an integer. Yet, with expression templates, | |
32 | the same expression <tt>i | j >> k</tt> where <tt>i</tt>, <tt>j</tt> and | |
33 | <tt>k</tt> are of type <tt>T</tt>, the result is a complex composite type [see | |
34 | <a href="basic_concepts.html">Basic Concepts</a>]. Spirit expressions, which | |
35 | are combinations of primitives and composites yield an infinite set of new types. | |
36 | One problem is that C++ offers no easy facility to deduce the type of an arbitrarily | |
37 | complex expression that yields a complex type. Thus, while it is easy to write:</p> | |
38 | <pre><code><font color="#000000"><span class=identifier> </span><span class=keyword>int </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>>> </span><span class=identifier>k</span><span class=special>; </span><span class=comment>// where i, j, and k are ints</span></font></code></pre> | |
39 | <p>Expression templates yield an endless supply of types. Without the <a href="rule.html">rule</a>, | |
40 | there is no easy way to do this in C++ if <tt>i</tt>, <tt>j</tt> and <tt>k</tt> | |
41 | are Spirit parsers:</p> | |
42 | <pre><code><font color="#000000"><span class=comment> </span><span class=special><</span><span class=identifier>what_type???</span><span class=special>> </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>>> </span><span class=identifier>k</span><span class=special>; </span><span class=comment>// where i, j, and k are Spirit parsers</span></font></code></pre> | |
43 | <p>If <tt>i</tt>, <tt>j</tt> and <tt>k</tt> are all <tt>chlit<></tt> objects, | |
44 | the type that we want is:</p> | |
45 | <pre><code><font color="#000000"><span class=comment> </span><span class=keyword>typedef | |
46 | </span><span class=identifier>alternative</span><span class=special>< | |
47 | </span><span class=identifier>chlit</span><span class=special><></span><span class=comment> // i | |
48 | </span><span class=special>,</span> <span class=identifier>sequence</span><span class=special>< | |
49 | </span><span class=identifier>chlit</span><span class=special><> </span><span class=comment>// j | |
50 | </span><span class=special> ,</span><span class=comment> </span><span class=identifier>chlit</span><span class=special><> </span><span class=comment>// k | |
51 | </span><span class=special>> | |
52 | > | |
53 | </span><span class=identifier>rule_t</span><span class=special>; | |
54 | ||
55 | </span><span class=identifier>rule_t r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>>> </span><span class=identifier>k</span><span class=special>; </span><span class=comment>// where i, j, and k are chlit<> objects</span></font></code></pre> | |
56 | <p>We deliberately formatted the type declaration nicely to make it understandable. | |
57 | Try that with a more complex expression. While it can be done, explicitly spelling | |
58 | out the type of a Spirit expression template is tedious and error prone. The | |
59 | right hand side (rhs) has to mirror the type of the left hand side (lhs). (<img src="theme/lens.gif" width="15" height="16"> | |
60 | Yet, if you still wish to do it, see this <a href="techniques.html#no_rules">link</a> | |
61 | for a technique). </p> | |
62 | <table width="80%" border="0" align="center"> | |
63 | <tr> | |
64 | <td class="note_box"><p><img src="theme/lens.gif" width="15" height="16"><b> | |
65 | typeof and auto</b> <br> | |
66 | <br> | |
67 | Some compilers already support the <tt>typeof</tt> keyword. This can be | |
68 | used to free us from having to explicitly type the type (pun intentional). | |
69 | Using the <tt>typeof</tt>, we can rewrite the Spirit expression above | |
70 | as:<br> | |
71 | <br> | |
72 | <span class="keyword"><code>typeof</code><code></code></span><code><span class=special>(</span><span class=identifier>i | |
73 | </span><span class=special>| </span><span class=identifier>j </span><span class=special>>> | |
74 | </span><span class=identifier>k</span><span class=special>) </span><span class=identifier>r | |
75 | </span><span class=special>= </span><span class=identifier>i </span><span class=special>| | |
76 | </span><span class=identifier>j </span><span class=special>>> </span><span class=identifier>k</span><span class=special>;</span></code><br> | |
77 | <br> | |
78 | While this is better than having to explicitly declare a complex type, | |
79 | it is redundant, error prone and still an eye sore. The expression is | |
80 | typed twice. The only way to simplify this is to introduce a macro (See | |
81 | this <a href="techniques.html#typeof">link</a> for more information).<br> | |
82 | <br> | |
83 | <a href="http://www.boost-consulting.com">David Abrahams</a> proposed | |
84 | in comp.std.c++ to reuse the <tt>auto</tt> keyword for type deduced variables. | |
85 | This has been extensibly discussed in <a href="http://www.boost.org">boost.org</a>. Example: | |
86 | <br> | |
87 | <br> | |
88 | <span class=keyword><code>auto </code></span><code><span class=identifier>r | |
89 | </span><span class=special>= </span><span class=identifier>i </span><span class=special>| | |
90 | </span><span class=identifier>j </span><span class=special>>> </span><span class=identifier>k</span><span class=special>;</span></code><br> | |
91 | <br> | |
92 | Once such a C++ extension is accepted into the standard, this would be | |
93 | a neat solution and a nice fit for our purpose. It's not a complete solution | |
94 | though since there are still situations where we do not know the rhs beforehand; | |
95 | for instance when pre-declaring cyclic dependent rules.</p> | |
96 | </td> | |
97 | </tr> | |
98 | </table> | |
99 | <p>Fortunately, rules come to the rescue. Rules can capture the type of the expression | |
100 | assigned to it. Thus:</p> | |
101 | <pre><code><font color="#000000"> <span class=identifier>rule</span><span class=special><> </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>>> </span><span class=identifier>k</span><span class=special>; </span><span class=comment>// where i, j, and k are chlit<> objects</span></font></code></pre> | |
102 | <p>It might not be apparent but behind the scenes, plain rules are actually implemented | |
103 | using a pointer to a runtime polymorphic abstract class that holds the dynamic | |
104 | type of the parser assigned to it. When a Spirit expression is assigned to a | |
105 | rule, its type is encapsulated in a concrete subclass of the abstract class. | |
106 | A virtual parse function delegates the parsing to the encapsulated object.</p> | |
107 | <p>Rules have drawbacks though:</p> | |
108 | <p><img src="theme/bullet.gif" width="12" height="12"> It is coupled to a specific | |
109 | scanner type. The rule is tied to a specific scanner [see <a href="faq.html#scanner_business">The | |
110 | Scanner Business</a>].<br> | |
111 | <img src="theme/bullet.gif" width="12" height="12"> The rule's parse member | |
112 | function has a virtual function call overhead that cannot be inlined.</p> | |
113 | <h2>Static rules: subrules</h2> | |
114 | <p>The subrule is a fully static version of the rule. The subrule does not have | |
115 | the drawbacks listed above. </p> | |
116 | <p><img src="theme/bullet.gif" width="12" height="12"> The subrule is not tied | |
117 | to a specific scanner so just about any scanner type may be used<br> | |
118 | <img src="theme/bullet.gif" width="12" height="12"> The subrule also allows | |
119 | aggressive inlining since there are no virtual function calls</p> | |
120 | <pre><code><font color="#000000"><span class=identifier> </span><span class=keyword>template</span><span class=special><</span><span class=keyword>int </span></font><span class="identifier">ID</span><font color="#000000"><span class=special>, </span><span class=keyword>typename </span><span class=identifier>ContextT </span><span class=special>= </span><span class=identifier>parser_context</span><span class=special><></span> <span class=special>> | |
121 | </span><span class=keyword>class </span><span class=identifier>subrule</span><span class=special>;</span></font></code></pre> | |
122 | <p>The first template parameter gives the subrule an identification tag. Like | |
123 | the <a href="rule.html">rule</a>, there is a ContextT template parameter that | |
124 | defaults to <code><tt>parser_context</tt></code>. You need not be concerned | |
125 | at all with the <tt>ContextT</tt> template parameter unless you wish to tweak | |
126 | the low level behavior of the subrule. Detailed information on the <tt>ContextT</tt> | |
127 | template parameter is provided <a href="indepth_the_parser_context.html">elsewhere</a>. | |
128 | </p> | |
129 | <p>Presented above is the public API. There may actually be more template parameters | |
130 | after <tt>ContextT</tt>. Everything after the <tt>ContextT</tt> parameter should | |
131 | not be of concern to the client and are strictly for internal use only.</p> | |
132 | <p>Apart from a few minor differences, the subrule follows the usage and syntax | |
133 | of the rule closely. Here's the calculator grammar using subrules:</p> | |
134 | <pre><code><font color="#000000"><span class=comment> </span><span class=keyword>struct </span><span class=identifier>calculator </span><span class=special>: </span><span class=keyword>public </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>calculator</span><span class=special>> | |
135 | </span><span class=special>{ | |
136 | </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> | |
137 | </span><span class=keyword>struct </span><span class=identifier>definition | |
138 | </span><span class=special>{ | |
139 | </span><span class=identifier>definition</span><span class=special>(</span><span class=identifier>calculator </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>self</span><span class=special>) | |
140 | </span><span class=special>{ | |
141 | </span><span class=identifier>first </span><span class=special>= | |
142 | </span><span class=special>( | |
143 | </span><span class=identifier>expression </span><span class=special>= </span><span class=identifier>term </span><span class=special>>> </span><span class=special>*((</span><span class=literal>'+' </span><span class=special>>> </span><span class=identifier>term</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'-' </span><span class=special>>> </span><span class=identifier>term</span><span class=special>)), | |
144 | </span><span class=identifier>term </span><span class=special>= </span><span class=identifier>factor </span><span class=special>>> </span><span class=special>*((</span><span class=literal>'*' </span><span class=special>>> </span><span class=identifier>factor</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'/' </span><span class=special>>> </span><span class=identifier>factor</span><span class=special>)), | |
145 | </span><span class=identifier>factor </span><span class=special>= </span><span class=identifier>integer </span><span class=special>| </span><span class=identifier>group</span><span class=special>, | |
146 | </span><span class=identifier>group </span><span class=special>= </span><span class=literal>'(' </span><span class=special>>> </span><span class=identifier>expression </span><span class=special>>> </span><span class=literal>')' | |
147 | </span><span class=special>); | |
148 | </span><span class=special>} | |
149 | ||
150 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>0</span><span class=special>> </span><span class=identifier>expression</span><span class=special>; | |
151 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>1</span><span class=special>> </span><span class=identifier>term</span><span class=special>; | |
152 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>2</span><span class=special>> </span><span class=identifier>factor</span><span class=special>; | |
153 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>3</span><span class=special>> </span><span class=identifier>group</span><span class=special>; | |
154 | ||
155 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=identifier>first</span><span class=special>; | |
156 | </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=keyword>const</span><span class=special>& | |
157 | </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>first</span><span class=special>; </span><span class=special>} | |
158 | </span><span class=special>}; | |
159 | </span><span class=special>};</span></font></code></pre> | |
160 | <p><img src="theme/lens.gif" width="15" height="16"> A fully working example with | |
161 | <a href="semantic_actions.html">semantic actions</a> can be <a href="../example/fundamental/subrule_calc.cpp">viewed | |
162 | here</a>. This is part of the Spirit distribution. </p> | |
163 | <table border="0" align="left"> | |
164 | <tr> | |
165 | <td width="199"><img src="theme/subrule1.png" width="234" height="224"></td> | |
166 | <td width="2"></td> | |
167 | </tr> | |
168 | </table> | |
169 | <p>The subrule as an efficient version of the rule. Compiler optimizations such | |
170 | as aggressive inlining help reduce the code size and increase performance significantly. | |
171 | </p> | |
172 | <p>The subrule is not a panacea however. Subrules push the C++ compiler hard to | |
173 | its knees. For example, current compilers have a limit on recursion depth that | |
174 | may not be exceeded. Don't even think about writing a full pascal grammar using | |
175 | subrules alone. A grammar using subrules is a single C++ expression. Current | |
176 | C++ compilers cannot handle very complex expressions very well. Finally, a plain | |
177 | rule is still needed to act as place holder for subrules.</p> | |
178 | <p>The code above is a good example of the recommended way to use subrules. Notice | |
179 | the hierarchy. We have a grammar that encapsulates the whole calculator. The | |
180 | start rule is a plain rule that holds the set of subrules. The subrules in turn | |
181 | defines the actual details of the grammar.</p> | |
182 | <table width="80%" border="0" align="center"> | |
183 | <tr> | |
184 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"><b> | |
185 | Template instantiation depth</b> <br> <br> | |
186 | Spirit pushes the C++ compiler hard. Current C++ compilers cannot handle | |
187 | very complex heavily nested expressions very well. One restricting factor | |
188 | is the typical compiler's limit on template recursion depth. Some, but not | |
189 | all, compilers allow this limit to be configured.<br> | |
190 | <br> | |
191 | g++'s maximum can be set using a compiler flag: -ftemplate-depth. Set this | |
192 | appropriately if you have a relatively complex grammar.<br> | |
193 | <br> | |
194 | Microsoft Visual C++ can take greater than 1000 for both template class | |
195 | and function instantiation depths. However, the linker chokes with deep | |
196 | template function instantiation unless inline recursion depth is set using | |
197 | these pragmas:<br> | |
198 | <br> | |
199 | <span class="preprocessor">#pragma</span> inline_depth<span class="special">(</span>255<span class="special">)</span><br> | |
200 | <span class="preprocessor">#pragma</span> inline_recursion<span class="special">(</span>on<span class="special">)<br> | |
201 | <br> | |
202 | </span>Perhaps this limitations no longer applies to more current versions | |
203 | of these compilers. Be sure to check your compiler documentation.</td> | |
204 | </tr> | |
205 | </table> | |
206 | <p>This setup gives a good balance. The subrules do all the work. Each grammar | |
207 | will have only one rule: <tt>first</tt>. The rule is used just to hold the subrules | |
208 | and make them visible to the grammar. </p> | |
209 | <h3>The subrule definition</h3> | |
210 | <p>Like the rule, the expression after assignment operator <tt>=</tt> defines | |
211 | the subrule:</p> | |
212 | <pre> <span class=identifier>identifier </span><span class=special>= </span><span class=identifier>expression</span></pre> | |
213 | <p>Unlike rules, subrules may be defined only once. Redefining a subrule is illegal | |
214 | and will result to a compile time assertion.</p> | |
215 | <h3>Separators [ , ]</h3> | |
216 | <p>While rules are terminated by the semicollon <tt>';'</tt>. Subrules are not | |
217 | terminated but are separated by the comma: <tt>','</tt>. Like Pascal statements, | |
218 | the last subrule in a group may not have a trailing comma.</p> | |
219 | <pre><span class=identifier> </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>), | |
220 | </span><span class=identifier>b </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'b'</span><span class=special>), | |
221 | </span><span class=identifier>c </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'c'</span><span class=special>), </span><span class=comment>// BAD, trailing comma</span><code><font color="#000000"><font color="#800000"><i></i></font></font></code><code><font color="#000000"><font color="#800000"><i></i></font></font><i></i></code></pre> | |
222 | <p> | |
223 | <pre><code><span class=comment> </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>), | |
224 | </span><span class=identifier>b </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'b'</span><span class=special>), | |
225 | </span><span class=identifier>c </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'c'</span><span class=special>) </span><span class=comment>// OK</span></code></pre> | |
226 | <h3> The start subrule</h3> | |
227 | <p>Unlike rules, parsing proceeds from the start subrule. The first (topmost) | |
228 | subrule in a group of subrules is called the <b>start subrule</b>. In our example | |
229 | above, <tt>expression</tt> is the start subrule. When a group of subrules is | |
230 | called forth, the start subrule <tt>expression</tt> is called first.</p> | |
231 | <h3>IDs</h3> | |
232 | <p>Each subrule has a corresponding ID; an integral constant that uniquely specifies | |
233 | the subrule. Our example above has four subrules. They are declared as:</p> | |
234 | <pre><code><span class=comment> </span><span class=identifier>subrule</span><span class=special><</span><span class=number>0</span><span class=special>> </span><span class=identifier>expression</span><span class=special>; | |
235 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>1</span><span class=special>> </span><span class=identifier>term</span><span class=special>; | |
236 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>2</span><span class=special>> </span><span class=identifier>factor</span><span class=special>; | |
237 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>3</span><span class=special>> </span><span class=identifier>group</span><span class=special>;</span></code></pre> | |
238 | <h3> Aliases</h3> | |
239 | <p>It is possible to have subrules with similar IDs. A subrule with a similar | |
240 | ID to will be an alias of the other. Both subrules may be used interchangeably.</p> | |
241 | <pre><code><span class=special> </span><span class=identifier>subrule</span><span class=special><</span><span class=number>0</span><span class=special>> </span><span class=identifier>a</span><span class=special>; | |
242 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>0</span><span class=special>> </span><span class=identifier>alias</span><span class=special>; </span><span class=comment>// alias of a</span></code></pre> | |
243 | <h3>Groups: scope and nesting</h3> | |
244 | <p>The scope of a subrule and its definition is the enclosing group, typically | |
245 | (and by convention) enclosed inside the parentheses. IDs outside a scope are | |
246 | not directly visible. Inner subrule groups can be nested by enclosing each sub-group | |
247 | inside another set of parentheses. Each group is unique and acts independently. | |
248 | Consequently, while it may not be advisable to do so, a subrule in a group may | |
249 | share the same ID as a subrule in another group since both groups are independent | |
250 | of each other.</p> | |
251 | <pre><code><span class=comment> </span><span class=identifier>subrule</span><span class=special><</span><span class=number>0</span><span class=special>> </span><span class=identifier>a</span><span class=special>; | |
252 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>1</span><span class=special>> </span><span class=identifier>b</span><span class=special>; | |
253 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>0</span><span class=special>> </span><span class=identifier>c</span><span class=special>; | |
254 | </span><span class=identifier>subrule</span><span class=special><</span><span class=number>1</span><span class=special>> </span><span class=identifier>d</span><span class=special>; | |
255 | ||
256 | </span><span class=special>( </span><span class=comment>// outer subrule group, scope of a and b | |
257 | </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>), | |
258 | </span><span class=identifier>b </span><span class=special>= | |
259 | </span><span class=special>( </span><span class=comment>// inner subrule group, scope of b and c | |
260 | </span><span class=identifier>c </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'c'</span><span class=special>), | |
261 | </span><span class=identifier>d </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'d'</span><span class=special>) | |
262 | </span><span class=special>) | |
263 | </span><span class=special>)</span></code></pre> | |
264 | <p>Subrule IDs need to be unique only within a group. A grammar is an implicit | |
265 | group. Furthermore, even subrules in a grammar may have the same IDs without | |
266 | clashing if they are inside a group. Subrules may be explicitly grouped using | |
267 | the parentheses. Parenthesized groups have unique scopes. In the code above, | |
268 | the outer subrule group defines the subrules <tt>a</tt> and <tt>b</tt> while | |
269 | the inner subrule group defines the subrules <tt>c</tt> and <tt>d</tt>. Notice | |
270 | that the definition of <tt>b</tt> is the inner subrule.</p> | |
271 | <table border="0"> | |
272 | <tr> | |
273 | <td width="10"></td> | |
274 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> | |
275 | <td width="30"><a href="grammar.html"><img src="theme/l_arr.gif" border="0"></a></td> | |
276 | <td width="30"><a href="semantic_actions.html"><img src="theme/r_arr.gif" border="0"></a></td> | |
277 | </tr> | |
278 | </table> | |
279 | <br> | |
280 | <hr size="1"> | |
281 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> | |
282 | <br> | |
283 | <font size="2">Use, modification and distribution is subject to the Boost Software | |
284 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at | |
285 | http://www.boost.org/LICENSE_1_0.txt)</font></p> | |
286 | <p> </p> | |
287 | <p><code><font color="#000000"><font color="#0000ff"></font></font></code></p> | |
288 | </body> | |
289 | </html> |