]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/============================================================================== |
2 | Copyright (C) 2001-2011 Joel de Guzman | |
3 | Copyright (C) 2001-2011 Hartmut Kaiser | |
4 | ||
5 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
6 | file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
7 | ===============================================================================/] | |
8 | ||
9 | [section:string String Parsers] | |
10 | ||
11 | This module includes parsers for strings. Currently, this module | |
12 | includes the literal and string parsers and the symbol table. | |
13 | ||
14 | [heading Module Header] | |
15 | ||
16 | // forwards to <boost/spirit/home/qi/string.hpp> | |
17 | #include <boost/spirit/include/qi_string.hpp> | |
18 | ||
19 | Also, see __include_structure__. | |
20 | ||
21 | [/------------------------------------------------------------------------------] | |
22 | [section:string String Parsers (`string`, `lit`)] | |
23 | ||
24 | [heading Description] | |
25 | ||
26 | The `string` parser matches a string of characters. The `string` parser | |
27 | is an implicit lexeme: the `skip` parser is not applied in between | |
28 | characters of the string. The `string` parser has an associated | |
29 | __char_encoding_namespace__. This is needed when doing basic operations | |
30 | such as inhibiting case sensitivity. Examples: | |
31 | ||
32 | string("Hello") | |
33 | string(L"Hello") | |
34 | string(s) // s is a std::string | |
35 | ||
36 | `lit`, like `string`, also matches a string of characters. The main | |
37 | difference is that `lit` does not synthesize an attribute. A plain | |
38 | string like `"hello"` or a `std::basic_string` is equivalent to a `lit`. | |
39 | Examples: | |
40 | ||
41 | "Hello" | |
42 | lit("Hello") | |
43 | lit(L"Hello") | |
44 | lit(s) // s is a std::string | |
45 | ||
46 | [heading Header] | |
47 | ||
48 | // forwards to <boost/spirit/home/qi/string/lit.hpp> | |
49 | #include <boost/spirit/include/qi_lit.hpp> | |
50 | ||
51 | [heading Namespace] | |
52 | ||
53 | [table | |
54 | [[Name]] | |
55 | [[`boost::spirit::lit // alias: boost::spirit::qi::lit`]] | |
56 | [[`ns::string`]] | |
57 | ] | |
58 | ||
59 | In the table above, `ns` represents a __char_encoding_namespace__. | |
60 | ||
61 | [heading Model of] | |
62 | ||
63 | [:__primitive_parser_concept__] | |
64 | ||
65 | [variablelist Notation | |
66 | [[`s`] [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__.]] | |
67 | [[`ns`] [A __char_encoding_namespace__.]]] | |
68 | ||
69 | [heading Expression Semantics] | |
70 | ||
71 | Semantics of an expression is defined only where it differs from, or is | |
72 | not defined in __primitive_parser_concept__. | |
73 | ||
74 | [table | |
75 | [[Expression] [Semantics]] | |
76 | [[`s`] [Create string parser | |
77 | from a string, `s`.]] | |
78 | [[`lit(s)`] [Create a string parser | |
79 | from a string, `s`.]] | |
80 | [[`ns::string(s)`] [Create a string parser with `ns` encoding | |
81 | from a string, `s`.]] | |
82 | ] | |
83 | ||
84 | [heading Attributes] | |
85 | ||
86 | [table | |
87 | [[Expression] [Attribute]] | |
88 | [[`s`] [__unused__]] | |
89 | [[`lit(s)`] [__unused__]] | |
90 | [[`ns::string(s)`] [`std::basic_string<T>` where `T` | |
91 | is the underlying character type | |
92 | of `s`.]] | |
93 | ] | |
94 | ||
95 | [heading Complexity] | |
96 | ||
97 | [:O(N)] | |
98 | ||
99 | where `N` is the number of characters in the string to be parsed. | |
100 | ||
101 | [heading Example] | |
102 | ||
103 | [note The test harness for the example(s) below is presented in the | |
104 | __qi_basics_examples__ section.] | |
105 | ||
106 | Some using declarations: | |
107 | ||
108 | [reference_using_declarations_lit_string] | |
109 | ||
110 | Basic literals: | |
111 | ||
112 | [reference_string_literals] | |
113 | ||
114 | From a `std::string` | |
115 | ||
116 | [reference_string_std_string] | |
117 | ||
118 | Lazy strings using __phoenix__ | |
119 | ||
120 | [reference_string_phoenix] | |
121 | ||
122 | [endsect] [/ lit/string] | |
123 | ||
124 | ||
125 | [/------------------------------------------------------------------------------] | |
126 | [section:symbols Symbols Parser (`symbols`)] | |
127 | ||
128 | [heading Description] | |
129 | ||
130 | The class `symbols` implements a symbol table: an associative container | |
131 | (or map) of key-value pairs where the keys are strings. The `symbols` | |
132 | class can work efficiently with 8, 16, 32 and even 64 bit characters. | |
133 | ||
134 | Traditionally, symbol table management is maintained separately outside | |
135 | the grammar through semantic actions. Contrary to standard practice, the | |
136 | Spirit symbol table class `symbols` is-a parser, an instance of which may | |
137 | be used anywhere in the grammar specification. It is an example of a | |
138 | dynamic parser. A dynamic parser is characterized by its ability to | |
139 | modify its behavior at run time. Initially, an empty symbols object | |
140 | matches nothing. At any time, symbols may be added, thus, dynamically | |
141 | altering its behavior. | |
142 | ||
143 | [heading Header] | |
144 | ||
145 | // forwards to <boost/spirit/home/qi/string/symbols.hpp> | |
146 | #include <boost/spirit/include/qi_symbols.hpp> | |
147 | ||
148 | Also, see __include_structure__. | |
149 | ||
150 | [heading Namespace] | |
151 | ||
152 | [table | |
153 | [[Name]] | |
154 | [[`boost::spirit::qi::symbols`]] | |
155 | [[`boost::spirit::qi::tst`]] | |
156 | [[`boost::spirit::qi::tst_map`]] | |
157 | ] | |
158 | ||
159 | [heading Synopsis] | |
160 | ||
161 | template <typename Char, typename T, typename Lookup> | |
162 | struct symbols; | |
163 | ||
164 | [heading Template parameters] | |
165 | ||
166 | [table | |
167 | [[Parameter] [Description] [Default]] | |
168 | [[`Char`] [The character type | |
169 | of the symbol strings.] [`char`]] | |
170 | [[`T`] [The data type associated | |
171 | with each symbol.] [__unused_type__]] | |
172 | [[`Lookup`] [The symbol search | |
173 | implementation] [`tst<Char, T>`]] | |
174 | ] | |
175 | ||
176 | [heading Model of] | |
177 | ||
178 | [:__primitive_parser_concept__] | |
179 | ||
180 | [variablelist Notation | |
181 | [[`Sym`] [A `symbols` type.]] | |
182 | [[`Char`] [A character type.]] | |
183 | [[`T`] [A data type.]] | |
184 | [[`sym`, `sym2`][`symbols` objects.]] | |
185 | [[`sseq`] [An __stl__ container of strings.]] | |
186 | [[`dseq`] [An __stl__ container of data with `value_type` `T`.]] | |
187 | [[`s1`...`sN`] [A __string__.]] | |
188 | [[`d1`...`dN`] [Objects of type `T`.]] | |
189 | [[`f`] [A callable function or function object.]] | |
190 | [[`f`, `l`] [`ForwardIterator` first/last pair.]] | |
191 | ] | |
192 | ||
193 | [heading Expression Semantics] | |
194 | ||
195 | Semantics of an expression is defined only where it differs from, or is not | |
196 | defined in __primitive_parser_concept__. | |
197 | ||
198 | [table | |
199 | [[Expression] [Semantics]] | |
200 | [[`Sym()`] [Construct an empty symbols names `"symbols"`.]] | |
201 | [[`Sym(name)`] [Construct an empty symbols named `name`.]] | |
202 | [[`Sym(sym2)`] [Copy construct a symbols from `sym2` (Another `symbols` object).]] | |
203 | [[`Sym(sseq)`] [Construct symbols from `sseq` (an __stl__ container of strings) named `"symbols"`.]] | |
204 | [[`Sym(sseq, name)`] [Construct symbols from `sseq` (an __stl__ container of strings) named `name`.]] | |
205 | [[`Sym(sseq, dseq)`] [Construct symbols from `sseq` and `dseq` | |
206 | (An __stl__ container of strings and an __stl__ container of | |
207 | data with `value_type` `T`) which is named `"symbols"`.]] | |
208 | [[`Sym(sseq, dseq, name)`] [Construct symbols from `sseq` and `dseq` | |
209 | (An __stl__ container of strings and an __stl__ container of | |
210 | data with `value_type` `T`) which is named `name`.]] | |
211 | [[`sym = sym2`] [Assign `sym2` to `sym`.]] | |
212 | [[`sym = s1, s2, ..., sN`] [Assign one or more symbols (`s1`...`sN`) to `sym`.]] | |
213 | [[`sym += s1, s2, ..., sN`] [Add one or more symbols (`s1`...`sN`) to `sym`.]] | |
214 | [[`sym.add(s1)(s2)...(sN)`] [Add one or more symbols (`s1`...`sN`) to `sym`.]] | |
215 | [[`sym.add(s1, d1)(s2, d2)...(sN, dN)`] | |
216 | [Add one or more symbols (`s1`...`sN`) | |
217 | with associated data (`d1`...`dN`) to `sym`.]] | |
218 | [[`sym -= s1, s2, ..., sN`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]] | |
219 | [[`sym.remove(s1)(s2)...(sN)`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]] | |
220 | [[`sym.clear()`] [Erase all of the symbols in `sym`.]] | |
221 | [[`sym.at(s)`] [Return a reference to the object associated | |
222 | with symbol, `s`. If `sym` does not already | |
223 | contain such an object, `at` inserts the default | |
224 | object `T()`.]] | |
225 | [[`sym.find(s)`] [Return a pointer to the object associated | |
226 | with symbol, `s`. If `sym` does not already | |
227 | contain such an object, `find` returns a null | |
228 | pointer.]] | |
229 | [[`sym.prefix_find(f, l)`] [Return a pointer to the object associated | |
230 | with longest symbol that matches the beginning | |
231 | of the range `[f, l)`, and updates `f` to point | |
232 | to one past the end of that match. If no symbol matches, | |
233 | then return a null pointer, and `f` is unchanged.]] | |
234 | [[`sym.for_each(f)`] [For each symbol in `sym`, `s`, a | |
235 | `std::basic_string<Char>` with associated data, | |
236 | `d`, an object of type `T`, invoke `f(s, d)`]] | |
237 | [[`sym.name()`] [Retrieve the current name of the symbols object.]] | |
238 | [[`sym.name(name)`] [Set the current name of the symbols object to be `name`.]] | |
239 | ] | |
240 | ||
241 | [heading Attributes] | |
242 | ||
243 | The attribute of `symbol<Char, T>` is `T`. | |
244 | ||
245 | [heading Complexity] | |
246 | ||
247 | The default implementation uses a Ternary Search Tree (TST) with | |
248 | complexity: | |
249 | ||
250 | [:O(log n+k)] | |
251 | ||
252 | Where k is the length of the string to be searched in a TST with n | |
253 | strings. | |
254 | ||
255 | TSTs are faster than hashing for many typical search problems especially | |
256 | when the search interface is iterator based. TSTs are many times faster | |
257 | than hash tables for unsuccessful searches since mismatches are | |
258 | discovered earlier after examining only a few characters. Hash tables | |
259 | always examine an entire key when searching. | |
260 | ||
261 | An alternative implementation uses a hybrid hash-map front end (for the | |
262 | first character) plus a TST: `tst_map`. This gives us a complexity of | |
263 | ||
264 | [:O(1 + log n+k-1)] | |
265 | ||
266 | This is found to be significantly faster than plain TST, albeit with a | |
267 | bit more memory usage requirements (each slot in the hash-map is a TST | |
268 | node). If you require a lot of symbols to be searched, use the `tst_map` | |
269 | implementation. This can be done by using `tst_map` as the third | |
270 | template parameter to the symbols class: | |
271 | ||
272 | symbols<Char, T, tst_map<Char, T> > sym; | |
273 | ||
274 | [heading Example] | |
275 | ||
276 | [note The test harness for the example(s) below is presented in the | |
277 | __qi_basics_examples__ section.] | |
278 | ||
279 | Some using declarations: | |
280 | ||
281 | [reference_using_declarations_symbols] | |
282 | ||
283 | Symbols with data: | |
284 | ||
285 | [reference_symbols_with_data] | |
286 | ||
287 | When `symbols` is used for case-insensitive parsing (in a __qi_no_case__ | |
288 | directive), added symbol strings should be in lowercase. Symbol strings | |
289 | containing one or more uppercase characters will not match any input | |
290 | when symbols is used in a `no_case` directive. | |
291 | ||
292 | [reference_symbols_with_no_case] | |
293 | ||
294 | ||
295 | [endsect] [/ symbols] | |
296 | ||
297 | [endsect] [/ String] |