]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <html> |
2 | <head> | |
3 | <title>Symbols</title> | |
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> | |
5 | <link rel="stylesheet" href="theme/style.css" type="text/css"> | |
6 | </head> | |
7 | ||
8 | <body> | |
9 | <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> | |
10 | <tr> | |
11 | <td width="10"> | |
12 | </td> | |
13 | <td width="85%"> | |
14 | <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Symbols</b></font> | |
15 | </td> | |
16 | <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> | |
17 | </tr> | |
18 | </table> | |
19 | <br> | |
20 | <table border="0"> | |
21 | <tr> | |
22 | <td width="10"></td> | |
23 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> | |
24 | <td width="30"><a href="distinct.html"><img src="theme/l_arr.gif" border="0"></a></td> | |
25 | <td width="30"><a href="trees.html"><img src="theme/r_arr.gif" border="0"></a></td> | |
26 | </tr> | |
27 | </table> | |
28 | <p>This class symbols implements a symbol table. The symbol table holds a dictionary | |
29 | of symbols where each symbol is a sequence of CharTs (a <tt>char</tt>, <tt>wchar_t</tt>, | |
30 | <tt>int</tt>, enumeration etc.) . The template class, parameterized by the character | |
31 | type (CharT), can work efficiently with 8, 16, 32 and even 64 bit characters. | |
32 | Mutable data of type T is associated with each symbol.<br> | |
33 | </p> | |
34 | <p>Traditionally, symbol table management is maintained seperately outside the | |
35 | BNF grammar through semantic actions. Contrary to standard practice, the Spirit | |
36 | symbol table class <tt>symbols</tt> is-a parser. An instance of which may be | |
37 | used anywhere in the EBNF grammar specification. It is an example of a dynamic | |
38 | parser. A dynamic parser is characterized by its ability to modify its behavior | |
39 | at run time. Initially, an empty symbols object matches nothing. At any time, | |
40 | symbols may be added, thus, dynamically altering its behavior.</p> | |
41 | <p>Each entry in a symbol table has an associated mutable data slot. In this regard, | |
42 | one can view the symbol table as an associative container (or map) of key-value | |
43 | pairs where the keys are strings. </p> | |
44 | <p>The symbols class expects two template parameters (actually there is a third, | |
45 | see detail box). The first parameter <tt>T</tt> specifies the data type associated | |
46 | with each symbol (defaults to <tt>int</tt>) and the second parameter <tt>CharT</tt> | |
47 | specifies the character type of the symbols (defaults to <tt>char</tt>). </p> | |
48 | <pre><span class=identifier> </span><span class=keyword>template | |
49 | </span><span class=special>< | |
50 | </span><span class=keyword>typename </span><span class=identifier>T </span><span class=special>= </span><span class=keyword>int</span><span class=special>, | |
51 | </span><span class=keyword>typename </span><span class=identifier>CharT </span><span class=special>= </span><span class=keyword>char</span><span class=special>, | |
52 | </span><span class=keyword>typename </span><span class=identifier>SetT </span><span class=special>= </span><span class=identifier>impl</span><span class=special>::</span><span class=identifier>tst</span><span class=special><</span><span class=identifier>T</span><span class=special>, </span><span class=identifier>CharT</span><span class=special>> | |
53 | </span><span class=special>> | |
54 | </span><span class=keyword>class </span><span class=identifier>symbols</span><span class=special>;</span></pre> | |
55 | <table width="80%" border="0" align="center"> | |
56 | <tr> | |
57 | <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Ternary | |
58 | State Trees</b><br> | |
59 | <br> | |
60 | The actual set implementation is supplied by the SetT template parameter | |
61 | (3rd template parameter of the symbols class) . By default, this uses the | |
62 | tst class which is an implementation of the Ternary Search Tree. <br> | |
63 | <br> | |
64 | Ternary Search Trees are faster than hashing for many typical search problems | |
65 | especially when the search interface is iterator based. Searching for a | |
66 | string of length k in a ternary search tree with n strings will require | |
67 | at most O(log n+k) character comparisons. TSTs are many times faster than | |
68 | hash tables for unsuccessful searches since mismatches are discovered earlier | |
69 | after examining only a few characters. Hash tables always examine an entire | |
70 | key when searching.<br> | |
71 | <br> | |
72 | For details see <a href="http://www.cs.princeton.edu/%7Ers/strings/">http://www.cs.princeton.edu/~rs/strings/</a>.</td> | |
73 | </tr> | |
74 | </table> | |
75 | <p>Here are some sample declarations:</p> | |
76 | <pre><span class=identifier> </span><span class=identifier>symbols</span><span class=special><> </span><span class=identifier>sym</span><span class=special>; | |
77 | </span><span class=identifier>symbols</span><span class=special><</span><span class=keyword>short</span><span class=special>, </span><span class=keyword>wchar_t</span><span class=special>> </span><span class=identifier>sym2</span><span class=special>; | |
78 | ||
79 | </span><span class=keyword>struct </span><span class=identifier>my_info | |
80 | </span><span class=special>{ | |
81 | </span><span class=keyword>int </span><span class=identifier>id</span><span class=special>; | |
82 | </span><span class=keyword>double </span><span class=identifier>value</span><span class=special>; | |
83 | </span><span class=special>}; | |
84 | ||
85 | </span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>my_info</span><span class=special>> </span><span class=identifier>sym3</span><span class=special>;</span></pre> | |
86 | <p>After having declared our symbol tables, symbols may be added statically using | |
87 | the construct:</p> | |
88 | <pre><span class=identifier> sym </span><span class=special>= </span><span class=identifier>a</span><span class=special>, </span><span class=identifier>b</span><span class=special>, </span><span class=identifier>c</span><span class=special>, </span><span class=identifier>d </span><span class=special>...;</span></pre> | |
89 | <p>where <tt>sym</tt> is a symbol table and <tt>a..d</tt> etc. are strings. <img src="theme/note.gif" width="16" height="16">Note | |
90 | that the comma operator is separating the items being added to the symbol table, | |
91 | through an assignment. Due to operator overloading this is possible and correct | |
92 | (though it may take a little getting used to) and is a concise way to initialize | |
93 | the symbol table with many symbols. Also, it is perfectly valid to make multiple | |
94 | assignments to a symbol table to iteratively add symbols (or groups of symbols) | |
95 | at different times.</p> | |
96 | <p>Simple example:<br> | |
97 | </p> | |
98 | <pre><span class=identifier> sym </span><span class=special>= </span><span class=string>"pineapple"</span><span class=special>, </span><span class=string>"orange"</span><span class=special>, </span><span class=string>"banana"</span><span class=special>, </span><span class=string>"apple"</span><span class=special>, </span><span class=string>"mango"</span><span class=special>;</span></pre> | |
99 | <p>Note that it is invalid to add the same symbol multiple times to a symbol table, | |
100 | though you may modify the value associated with a symbol artibrarily many times.</p> | |
101 | <p>Now, we may use sym in the grammar. Example:</p> | |
102 | <pre><span class=identifier> fruits </span><span class=special>= </span><span class=identifier>sym </span><span class=special>>> </span><span class=special>*(</span><span class=literal>',' </span><span class=special>>> </span><span class=identifier>sym</span><span class=special>);</span></pre> | |
103 | <p>Alternatively, symbols may be added dynamically through the member functor | |
104 | <tt>add</tt> (see <tt><a href="#symbol_inserter">symbol_inserter</a></tt> below). | |
105 | The member functor <tt>add</tt> may be attached to a parser as a semantic action | |
106 | taking in a begin/end pair:</p> | |
107 | <pre><span class=identifier> p</span><span class=special>[</span><span class=identifier>sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>]</span></pre> | |
108 | <p>where p is a parser (and sym is a symbol table). On success, the matching portion | |
109 | of the input is added to the symbol table.</p> | |
110 | <p><tt>add</tt> may also be used to directly initialize data. Examples:</p> | |
111 | <pre><span class=identifier> sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>(</span><span class=string>"hello"</span><span class=special>, </span><span class=number>1</span><span class=special>)(</span><span class=string>"crazy"</span><span class=special>, </span><span class=number>2</span><span class=special>)(</span><span class=string>"world"</span><span class=special>, </span><span class=number>3</span><span class=special>);</span></pre> | |
112 | <p>Assuming of course that the data slot associated with <tt>sym</tt> is an integer.</p> | |
113 | <p>The data associated with each symbol may be modified any time. The most obvious | |
114 | way of course is through <a href="semantic_actions.html">semantic actions</a>. | |
115 | A function or functor, as usual, may be attached to the symbol table. The symbol | |
116 | table expects a function or functor compatible with the signature:</p> | |
117 | <p><b>Signature for functions:</b></p> | |
118 | <pre><code><font color="#000000"><span class=identifier> </span><span class=keyword>void </span><span class=identifier>func</span><span class=special>(</span><span class=identifier>T</span><span class="special">&</span><span class=identifier> data</span><span class=special>);</span></font></code></pre> | |
119 | <p><b>Signature for functors:</b><br> | |
120 | </p> | |
121 | <pre><code><font color="#000000"><span class=special> </span><span class=keyword>struct </span><span class=identifier>ftor | |
122 | </span><span class=special>{ | |
123 | </span><span class=keyword>void </span><span class=keyword>operator</span><span class=special>()(</span><span class=identifier>T</span><span class="special">&</span><span class=identifier> data</span><span class=special>) </span><span class=keyword>const</span><span class=special>; | |
124 | </span><span class=special>};</span></font></code></pre> | |
125 | <p>Where <tt>T</tt> is the data type of the symbol table (the <tt>T</tt> in its | |
126 | template parameter list). When the symbol table successfully matches something | |
127 | from the input, the data associated with the matching entry in the symbol table | |
128 | is reported to the semantic action.</p> | |
129 | <h2>Symbol table utilities</h2> | |
130 | <p>Sometimes, one may wish to deal with the symbol table directly. Provided are | |
131 | some symbol table utilities.</p> | |
132 | <p><b>add</b></p> | |
133 | <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>T</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>CharT</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>SetT</span><span class=special>> | |
134 | </span><span class=identifier>T</span><span class=special>* </span><span class=identifier>add</span><span class=special>(</span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>T</span><span class=special>, </span><span class=identifier>CharT</span><span class=special>, </span><span class=identifier>SetT</span><span class=special>>& </span><span class=identifier>table</span><span class=special>, </span><span class=identifier>CharT </span><span class=keyword>const</span><span class=special>* </span><span class=identifier>sym</span><span class=special>, </span><span class=identifier>T </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>data </span><span class=special>= </span><span class=identifier>T</span><span class=special>());</span></pre> | |
135 | <p>adds a symbol <tt>sym</tt> (C string) to a symbol table <tt>table</tt> plus | |
136 | an optional data <tt>data</tt> associated with the symbol. Returns a pointer | |
137 | to the data associated with the symbol or <tt>NULL</tt> if add failed (e.g. | |
138 | when the symbol is already added before).<br> | |
139 | <br> | |
140 | <b>find</b></p> | |
141 | <pre><span class=special> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>T</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>CharT</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>SetT</span><span class=special>> | |
142 | </span><span class=identifier>T</span><span class=special>* </span><span class=identifier>find</span><span class=special>(</span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>T</span><span class=special>, </span><span class=identifier>CharT</span><span class=special>, </span><span class=identifier>SetT</span><span class=special>> </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>table</span><span class=special>, </span><span class=identifier>CharT </span><span class=keyword>const</span><span class=special>* </span><span class=identifier>sym</span><span class=special>);</span></pre> | |
143 | <p>finds a symbol <tt>sym</tt> (C string) from a symbol table <tt>table</tt>. | |
144 | Returns a pointer to the data associated with the symbol or <tt>NULL</tt> if | |
145 | not found</p> | |
146 | <h2><a name="symbol_inserter"></a>symbol_inserter</h2> | |
147 | <p>The symbols class holds an instance of this class named <tt>add</tt>. This | |
148 | can be called directly just like a member function, passing in a first/last | |
149 | iterator and optional data:<br> | |
150 | <br> | |
151 | </p> | |
152 | <pre><span class=identifier> sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>(</span><span class=identifier>first</span><span class=special>, </span><span class=identifier>last</span><span class=special>, </span><span class=identifier>data</span><span class=special>);</span></pre> | |
153 | <p>Or, passing in a C string and optional data:<br> | |
154 | </p> | |
155 | <pre><span class=identifier> sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>(</span><span class=identifier>c_string</span><span class=special>, </span><span class=identifier>data</span><span class=special>);</span></pre> | |
156 | <p>where <tt>sym</tt> is a symbol table. The <tt>data</tt> argument is optional. | |
157 | The nice thing about this scheme is that it can be cascaded. We've seen this | |
158 | applied above. Here's a snippet from the roman numerals parser</p> | |
159 | <pre> <span class=comment>// Parse roman numerals (1..9) using the symbol table. | |
160 | ||
161 | </span> <span class=keyword>struct </span><span class=identifier>ones </span><span class=special>: </span><span class=identifier>symbols</span><span class=special><</span><span class=keyword>unsigned</span><span class=special>> | |
162 | </span><span class=special>{ | |
163 | </span><span class=identifier>ones</span><span class=special>() | |
164 | </span><span class=special>{ | |
165 | </span><span class=identifier>add | |
166 | </span><span class=special>(</span><span class=string>"I" </span><span class=special>, </span><span class=number>1</span><span class=special>) | |
167 | </span><span class=special>(</span><span class=string>"II" </span><span class=special>, </span><span class=number>2</span><span class=special>) | |
168 | </span><span class=special>(</span><span class=string>"III" </span><span class=special>, </span><span class=number>3</span><span class=special>) | |
169 | </span><span class=special>(</span><span class=string>"IV" </span><span class=special>, </span><span class=number>4</span><span class=special>) | |
170 | </span><span class=special>(</span><span class=string>"V" </span><span class=special>, </span><span class=number>5</span><span class=special>) | |
171 | </span><span class=special>(</span><span class=string>"VI" </span><span class=special>, </span><span class=number>6</span><span class=special>) | |
172 | </span><span class=special>(</span><span class=string>"VII" </span><span class=special>, </span><span class=number>7</span><span class=special>) | |
173 | </span><span class=special>(</span><span class=string>"VIII" </span><span class=special>, </span><span class=number>8</span><span class=special>) | |
174 | </span><span class=special>(</span><span class=string>"IX" </span><span class=special>, </span><span class=number>9</span><span class=special>) | |
175 | </span><span class=special>; | |
176 | </span><span class=special>} | |
177 | ||
178 | </span><span class=special>} </span><span class=identifier>ones_p</span><span class=special>;</span></pre> | |
179 | <p>Notice that a user defined struct <tt>ones</tt> is subclassed from <tt>symbols</tt>. | |
180 | Then at construction time, we added all the symbols using the <tt>add</tt> symbol_inserter.</p> | |
181 | <p> <img height="16" width="15" src="theme/lens.gif"> The full source code can be <a href="../example/fundamental/roman_numerals.cpp">viewed here</a>. This is part of the Spirit distribution.</p> | |
182 | <p>Again, <tt>add</tt> may also be used as a semantic action since it conforms | |
183 | to the action interface (see semantic actions):<br> | |
184 | </p> | |
185 | <pre><span class=special></span><span class=identifier> p</span><span class=special>[</span><span class=identifier>sym</span><span class=special>.</span><span class=identifier>add</span><span class=special>]</span></pre> | |
186 | <p>where p is a parser of course.<span class=special><br> | |
187 | </span></p> | |
188 | <table border="0"> | |
189 | <tr> | |
190 | <td width="10"></td> | |
191 | <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> | |
192 | <td width="30"><a href="distinct.html"><img src="theme/l_arr.gif" border="0"></a></td> | |
193 | <td width="30"><a href="trees.html"><img src="theme/r_arr.gif" border="0"></a></td> | |
194 | </tr> | |
195 | </table> | |
196 | <br> | |
197 | <hr size="1"> | |
198 | <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> | |
199 | <br> | |
200 | <font size="2">Use, modification and distribution is subject to the Boost Software | |
201 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at | |
202 | http://www.boost.org/LICENSE_1_0.txt)</font></p> | |
203 | </body> | |
204 | </html> |