Commit | Line | Data |
---|---|---|
1 | <html> |
2 | <head> | |
3 | <title>The Lazy Parsers</title> | |
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> | |
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> | |
6 | </head> | |
7 | ||
8 | <body> | |
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> | |
10 | <tr> | |
11 | <td width="10"> | |
12 | </td> | |
13 | <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>The | |
14 | Lazy Parser</b></font></td> | |
15 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> | |
16 | </tr> | |
17 | </table> | |
18 | <br> | |
19 | <table border="0"> | |
20 | <tr> | |
21 | <td width="10"></td> | |
22 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> | |
23 | <td width="30"><a href="dynamic_parsers.html"><img src="theme/l_arr.gif" border="0"></a></td> | |
24 | <td width="30"><a href="select_parser.html"><img src="theme/r_arr.gif" border="0"></a></td> | |
25 | </tr> | |
26 | </table> | |
27 | <p>Closures are cool. They allow us to inject stack-based local variables anywhere | |
28 | in our parse descent hierarchy. Typically, we store temporary variables, generated | |
29 | by our semantic actions, in our closure variables, as a means to pass information | |
30 | up and down the recursive descent.</p> | |
31 | <p>Now imagine this... Having in mind that closure variables can be just about | |
32 | any type, we can store a parser, a rule, or a pointer to a parser or rule, in | |
33 | a closure variable. <em>Yeah, right, so what?...</em> Ok, hold on... What if | |
34 | we can use this closure variable to initiate a parse? Think about it for a second. | |
35 | Suddenly we'll have some powerful dynamic parsers! Suddenly we'll have a full | |
36 | round trip from <a href="../phoenix/index.html">Phoenix</a> to Spirit and | |
37 | back! <a href="../phoenix/index.html">Phoenix</a> semantic actions choose the | |
38 | right Spirit parser and Spirit parsers choose the right <a href="../phoenix/index.html">Phoenix</a> | |
39 | semantic action. Oh MAN, what a honky cool idea, I might say!!</p> | |
40 | <h2>lazy_p</h2> | |
41 | <p>This is the idea behind the <tt>lazy_p</tt> parser. The <tt>lazy_p</tt> syntax | |
42 | is:</p> | |
43 | <pre> lazy_p<span class="special">(</span>actor<span class="special">)</span></pre> | |
44 | <p>where actor is a <a href="../phoenix/index.html">Phoenix</a> expression that | |
45 | returns a Spirit parser. This returned parser is used in the parsing process. | |
46 | </p> | |
47 | <p>Example: </p> | |
48 | <pre> lazy_p<span class="special">(</span>phoenix<span class="special">::</span>val<span class="special">(</span>int_p<span class="special">))[</span>assign_a<span class="special">(</span>result<span class="special">)]</span> | |
49 | </pre> | |
50 | <p>Semantic actions attached to the <tt>lazy_p</tt> parser expect the same signature | |
51 | as that of the returned parser (<tt>int_p</tt>, in our example above).</p> | |
52 | <h2>lazy_p example</h2> | |
53 | <p>To give you a better glimpse (see the <tt><a href="../example/intermediate/lazy_parser.cpp">lazy_parser.cpp</a></tt>), | |
54 | say you want to parse inputs such as:</p> | |
55 | <pre> <span class=identifier>dec | |
56 | </span><span class="special">{</span><span class=identifier><br> 1 2 3<br> bin | |
57 | </span><span class="special">{</span><span class=identifier><br> 1 10 11<br> </span><span class="special">}</span><span class=identifier><br> 4 5 6<br> </span><span class="special">}</span></pre> | |
58 | <p>where <tt>bin {...}</tt> and <tt>dec {...}</tt> specify the numeric format | |
59 | (binary or decimal) that we are expecting to read. If we analyze the input, | |
60 | we want a grammar like:</p> | |
61 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>base </span><span class="special">=</span><span class=identifier> </span><span class="string">"bin"</span><span class=identifier> </span><span class="special">|</span><span class=identifier> </span><span class="string">"dec"</span><span class="special">;</span><span class=identifier> | |
62 | block </span><span class=special>= </span><span class="identifier">base</span><span class=special> >> </span><span class="literal">'{'</span><span class=special> >> *</span><span class="identifier">block_line</span><span class=special> >> </span><span class="literal">'}'</span><span class=special>; | |
63 | </span>block_line <span class=special>= </span><span class="identifier">number</span><span class=special> | </span><span class=identifier>block</span><span class=special>;</span></font></code></pre> | |
64 | <p>We intentionally left out the <code><font color="#000000"><span class="identifier"><tt>number</tt></span></font></code> | |
65 | rule. The tricky part is that the way the <tt>number</tt> rule behaves depends on | |
66 | the result of the <tt>base</tt> rule. If <tt>base</tt> got a <em>"bin"</em>, | |
67 | then <tt>number</tt> should parse binary numbers. If <tt>base</tt> got a <em>"dec"</em>, | |
68 | then <tt>number</tt> should parse decimal numbers. Typically we'll have to rewrite our | |
69 | grammar to accommodate the different parsing behavior:</p> | |
70 | <pre><code><font color="#000000"><span class=identifier> block </span><span class=special>= | |
71 | </span><span class=identifier>"bin"</span> <span class=special>>> </span><span class="literal">'{'</span><span class=special> >> *</span>bin_line<span class=special> >> </span><span class="literal">'}'</span><span class=special> | |
72 | | </span><span class=identifier>"dec"</span> <span class=special>>> </span><span class="literal">'{'</span><span class=special> >> *</span>dec_line<span class=special> >> </span><span class="literal">'}'</span><span class=special> | |
73 | ; | |
74 | </span>bin_line <span class=special>= </span><span class="identifier">bin_p</span><span class=special> | </span><span class=identifier>block</span><span class=special>; | |
75 | </span>dec_line <span class=special>= </span><span class="identifier">int_p</span><span class=special> | </span><span class=identifier>block</span><span class=special>;</span></font></code></pre> | |
76 | <p>While this is fine, the redundancy makes us want to find a better solution; | |
77 | after all, we'd want to make full use of Spirit's dynamic parsing capabilities. | |
78 | Apart from that, there will be cases where the set of parsing behaviors for | |
79 | our <tt>number</tt> rule is not known when the grammar is written. We'll only | |
80 | be given a map of string descriptors and corresponding rules [e.g. (("dec", | |
81 | int_p), ("bin", bin_p) ... etc...)].</p> | |
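<p>As a rough illustration of the descriptor map idea, here is a minimal sketch in plain standard C++ rather than Spirit. The names <tt>NumberParser</tt>, <tt>parse_dec</tt>, <tt>parse_bin</tt>, <tt>make_parsers</tt> and <tt>parse_number</tt> are hypothetical and exist only for this sketch; ordinary functions stand in for the <tt>int_p</tt> and <tt>bin_p</tt> parsers, and the token is assumed to be already split from the input.</p>

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for a Spirit rule: converts one token to a number
// and returns false when the token is not valid in that format.
using NumberParser = bool (*)(const std::string& token, long& out);

// Converts `token` in the given radix; the whole token must be consumed.
static bool parse_radix(const std::string& token, int radix, long& out) {
    if (token.empty()) return false;
    char* end = nullptr;
    out = std::strtol(token.c_str(), &end, radix);
    return *end == '\0';
}

bool parse_dec(const std::string& token, long& out) { return parse_radix(token, 10, out); }
bool parse_bin(const std::string& token, long& out) { return parse_radix(token, 2, out); }

// The descriptor map; it could just as well be filled in at run time.
std::map<std::string, NumberParser> make_parsers() {
    return {{"dec", &parse_dec}, {"bin", &parse_bin}};
}

// Looks up the parser named by `base` and applies it to `token`.
bool parse_number(const std::map<std::string, NumberParser>& parsers,
                  const std::string& base, const std::string& token, long& out) {
    auto it = parsers.find(base);
    return it != parsers.end() && it->second(token, out);
}
```

<p>The code that calls <tt>parse_number</tt> never names a concrete format, so supporting another one only means adding an entry to the map.</p>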
82 | <p>The basic idea is to have a rule for binary and decimal numbers. That's easy | |
83 | enough to do (see <a href="numerics.html">numerics</a>). When <tt>base</tt> | |
84 | is being parsed, in your semantic action, store a pointer to the rule for the selected | |
85 | base in a closure variable (e.g. <tt>block.int_rule</tt>). Here's an example:</p> | |
86 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>base | |
87 | </span><span class="special">=</span><span class=identifier> str_p</span><span class="special">(</span><span class="string">"bin"</span><span class="special">)[</span><span class=identifier>block.int_rule</span> = <span class="special">&</span>var<span class="special">(</span><span class="identifier">bin_rule</span><span class="special">)] | |
88 | | </span><span class=identifier>str_p</span><span class="special">(</span><span class="string">"dec"</span><span class="special">)[</span><span class=identifier>block.int_rule</span> = <span class="special">&</span>var<span class="special">(</span><span class="identifier">dec_rule</span><span class="special">)] | |
89 | ;</span></font></code></pre> | |
90 | <p>With this setup, your number rule will now look something like:</p> | |
91 | <pre><code><font color="#000000"><span class=special> </span><span class=identifier>number </span><span class="special">=</span><span class=identifier> lazy_p</span><span class="special">(*</span><span class=identifier>block.int_rule</span><span class="special">);</span></font></code></pre> | |
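<p>To make the control flow concrete, here is a minimal sketch of the same dispatch in plain standard C++, not Spirit. It assumes whitespace-separated tokens (including the braces). All names are hypothetical: <tt>dec_rule</tt> and <tt>bin_rule</tt> are ordinary functions standing in for the rules, and the local variable <tt>int_rule</tt> plays the role of the closure variable <tt>block.int_rule</tt>.</p>

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a pointer to a Spirit rule.
using NumberParser = bool (*)(const std::string&, long&);

// Converts `t` in the given radix; the whole token must be consumed.
static bool parse_radix(const std::string& t, int radix, long& out) {
    char* end = nullptr;
    out = std::strtol(t.c_str(), &end, radix);
    return !t.empty() && *end == '\0';
}
bool dec_rule(const std::string& t, long& out) { return parse_radix(t, 10, out); }
bool bin_rule(const std::string& t, long& out) { return parse_radix(t, 2, out); }

// block = base >> '{' >> *block_line >> '}'
// `first` is the base keyword that was already read. `int_rule` is local
// to this call, so a nested block cannot disturb its parent's choice.
bool parse_block(std::istringstream& in, const std::string& first,
                 std::vector<long>& out) {
    NumberParser int_rule = nullptr;  // plays the role of block.int_rule
    if (first == "bin") int_rule = &bin_rule;
    else if (first == "dec") int_rule = &dec_rule;
    else return false;

    std::string tok;
    if (!(in >> tok) || tok != "{") return false;
    while (in >> tok) {
        if (tok == "}") return true;
        long value = 0;
        // block_line = number | block, where number goes through int_rule
        if (int_rule(tok, value)) out.push_back(value);
        else if (!parse_block(in, tok, out)) return false;
    }
    return false;  // input ended before the closing brace
}

// Parses one top-level block and collects every number it contains.
bool parse_input(const std::string& text, std::vector<long>& out) {
    std::istringstream in(text);
    std::string base;
    return (in >> base) && parse_block(in, base, out);
}
```

<p>Because the chosen parser lives in a per-call variable, the outer <tt>dec</tt> block resumes reading decimal numbers once the inner <tt>bin</tt> block closes.</p>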
92 | <p>The <tt><a href="../example/intermediate/lazy_parser.cpp">lazy_parser.cpp</a></tt> | |
93 | does it a bit differently, ingeniously using the <a href="symbols.html">symbol | |
94 | table</a> to dispatch the correct rule, but in essence, both strategies are | |
95 | similar. This technique, using the symbol table, is detailed in the Techniques section: <a href="techniques.html#nabialek_trick">nabialek_trick</a>. Admittedly, when you add up all the rules, the resulting grammar is | |
96 | more complex than the hard-coded grammar above. Yet, for more complex grammar | |
97 | patterns with a lot more rules to choose from, the additional setup is well | |
98 | worth it.</p> | |
99 | <table border="0"> | |
100 | <tr> | |
101 | <td width="10"></td> | |
102 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> | |
103 | <td width="30"><a href="dynamic_parsers.html"><img src="theme/l_arr.gif" border="0"></a></td> | |
104 | <td width="30"><a href="select_parser.html"><img src="theme/r_arr.gif" border="0"></a></td> | |
105 | </tr> | |
106 | </table> | |
107 | <br> | |
108 | <hr size="1"> | |
109 | <p class="copyright">Copyright © 2003 Joel de Guzman<br> | |
110 | Copyright © 2003 Vaclav Vesely<br> | |
111 | <br> | |
112 | <font size="2">Use, modification and distribution is subject to the Boost Software | |
113 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at | |
114 | http://www.boost.org/LICENSE_1_0.txt)</font></p> | |
115 | <p class="copyright"> </p> | |
116 | </body> | |
117 | </html> |