1 [/==============================================================================
2 Copyright (C) 2001-2011 Joel de Guzman
3 Copyright (C) 2001-2011 Hartmut Kaiser
5 Distributed under the Boost Software License, Version 1.0. (See accompanying
6 file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
7 ===============================================================================/]
9 [section:string String Parsers]
11 This module includes parsers for strings. Currently, this module
12 includes the literal and string parsers and the symbol table.
14 [heading Module Header]
16 // forwards to <boost/spirit/home/qi/string.hpp>
17 #include <boost/spirit/include/qi_string.hpp>
19 Also, see __include_structure__.
21 [/------------------------------------------------------------------------------]
22 [section:string String Parsers (`string`, `lit`)]
26 The `string` parser matches a string of characters. The `string` parser
27 is an implicit lexeme: the `skip` parser is not applied in between
28 characters of the string. The `string` parser has an associated
29 __char_encoding_namespace__. This is needed when doing basic operations
30 such as inhibiting case sensitivity. Examples:
34 string(s) // s is a std::string
36 `lit`, like `string`, also matches a string of characters. The main
37 difference is that `lit` does not synthesize an attribute. A plain
38 string like `"hello"` or a `std::basic_string` is equivalent to a `lit`.
44 lit(s) // s is a std::string
48 // forwards to <boost/spirit/home/qi/string/lit.hpp>
49 #include <boost/spirit/include/qi_lit.hpp>
55 [[`boost::spirit::lit // alias: boost::spirit::qi::lit`]]
59 In the table above, `ns` represents a __char_encoding_namespace__.
63 [:__primitive_parser_concept__]
65 [variablelist Notation
66 [[`s`] [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__.]]
67 [[`ns`] [A __char_encoding_namespace__.]]]
69 [heading Expression Semantics]
71 Semantics of an expression is defined only where it differs from, or is
72 not defined in __primitive_parser_concept__.
75 [[Expression] [Semantics]]
76 [[`s`] [Create string parser
78 [[`lit(s)`] [Create a string parser
80 [[`ns::string(s)`] [Create a string parser with `ns` encoding
87 [[Expression] [Attribute]]
89 [[`lit(s)`] [__unused__]]
90 [[`ns::string(s)`] [`std::basic_string<T>` where `T`
91 is the underlying character type
99 where `N` is the number of characters in the string to be parsed.
103 [note The test harness for the example(s) below is presented in the
104 __qi_basics_examples__ section.]
106 Some using declarations:
108 [reference_using_declarations_lit_string]
112 [reference_string_literals]
116 [reference_string_std_string]
118 Lazy strings using __phoenix__
120 [reference_string_phoenix]
122 [endsect] [/ lit/string]
125 [/------------------------------------------------------------------------------]
126 [section:symbols Symbols Parser (`symbols`)]
128 [heading Description]
130 The class `symbols` implements a symbol table: an associative container
131 (or map) of key-value pairs where the keys are strings. The `symbols`
132 class can work efficiently with 8, 16, 32 and even 64 bit characters.
134 Traditionally, symbol table management is maintained separately outside
135 the grammar through semantic actions. Contrary to standard practice, the
136 Spirit symbol table class `symbols` is-a parser, an instance of which may
137 be used anywhere in the grammar specification. It is an example of a
138 dynamic parser. A dynamic parser is characterized by its ability to
139 modify its behavior at run time. Initially, an empty symbols object
140 matches nothing. At any time, symbols may be added, thus, dynamically
141 altering its behavior.
145 // forwards to <boost/spirit/home/qi/string/symbols.hpp>
146 #include <boost/spirit/include/qi_symbols.hpp>
148 Also, see __include_structure__.
154 [[`boost::spirit::qi::symbols`]]
155 [[`boost::spirit::qi::tst`]]
156 [[`boost::spirit::qi::tst_map`]]
161 template <typename Char, typename T, typename Lookup>
164 [heading Template parameters]
167 [[Parameter] [Description] [Default]]
168 [[`Char`] [The character type
169 of the symbol strings.] [`char`]]
170 [[`T`] [The data type associated
171 with each symbol.] [__unused_type__]]
172 [[`Lookup`] [The symbol search
173 implementation] [`tst<Char, T>`]]
178 [:__primitive_parser_concept__]
180 [variablelist Notation
181 [[`Sym`] [A `symbols` type.]]
182 [[`Char`] [A character type.]]
183 [[`T`] [A data type.]]
184 [[`sym`, `sym2`][`symbols` objects.]]
185 [[`sseq`] [An __stl__ container of strings.]]
186 [[`dseq`] [An __stl__ container of data with `value_type` `T`.]]
187 [[`s1`...`sN`] [A __string__.]]
188 [[`d1`...`dN`] [Objects of type `T`.]]
189 [[`f`] [A callable function or function object.]]
190 [[`f`, `l`] [`ForwardIterator` first/last pair.]]
193 [heading Expression Semantics]
195 Semantics of an expression is defined only where it differs from, or is not
196 defined in __primitive_parser_concept__.
199 [[Expression] [Semantics]]
200 [[`Sym()`] [Construct an empty symbols names `"symbols"`.]]
201 [[`Sym(name)`] [Construct an empty symbols named `name`.]]
202 [[`Sym(sym2)`] [Copy construct a symbols from `sym2` (Another `symbols` object).]]
203 [[`Sym(sseq)`] [Construct symbols from `sseq` (an __stl__ container of strings) named `"symbols"`.]]
204 [[`Sym(sseq, name)`] [Construct symbols from `sseq` (an __stl__ container of strings) named `name`.]]
205 [[`Sym(sseq, dseq)`] [Construct symbols from `sseq` and `dseq`
206 (An __stl__ container of strings and an __stl__ container of
207 data with `value_type` `T`) which is named `"symbols"`.]]
208 [[`Sym(sseq, dseq, name)`] [Construct symbols from `sseq` and `dseq`
209 (An __stl__ container of strings and an __stl__ container of
210 data with `value_type` `T`) which is named `name`.]]
211 [[`sym = sym2`] [Assign `sym2` to `sym`.]]
212 [[`sym = s1, s2, ..., sN`] [Assign one or more symbols (`s1`...`sN`) to `sym`.]]
213 [[`sym += s1, s2, ..., sN`] [Add one or more symbols (`s1`...`sN`) to `sym`.]]
214 [[`sym.add(s1)(s2)...(sN)`] [Add one or more symbols (`s1`...`sN`) to `sym`.]]
215 [[`sym.add(s1, d1)(s2, d2)...(sN, dN)`]
216 [Add one or more symbols (`s1`...`sN`)
217 with associated data (`d1`...`dN`) to `sym`.]]
218 [[`sym -= s1, s2, ..., sN`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]]
219 [[`sym.remove(s1)(s2)...(sN)`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]]
220 [[`sym.clear()`] [Erase all of the symbols in `sym`.]]
221 [[`sym.at(s)`] [Return a reference to the object associated
222 with symbol, `s`. If `sym` does not already
223 contain such an object, `at` inserts the default
225 [[`sym.find(s)`] [Return a pointer to the object associated
226 with symbol, `s`. If `sym` does not already
227 contain such an object, `find` returns a null
229 [[`sym.prefix_find(f, l)`] [Return a pointer to the object associated
230 with longest symbol that matches the beginning
231 of the range `[f, l)`, and updates `f` to point
232 to one past the end of that match. If no symbol matches,
233 then return a null pointer, and `f` is unchanged.]]
234 [[`sym.for_each(f)`] [For each symbol in `sym`, `s`, a
235 `std::basic_string<Char>` with associated data,
236 `d`, an object of type `T`, invoke `f(s, d)`]]
237 [[`sym.name()`] [Retrieve the current name of the symbols object.]]
238 [[`sym.name(name)`] [Set the current name of the symbols object to be `name`.]]
243 The attribute of `symbol<Char, T>` is `T`.
247 The default implementation uses a Ternary Search Tree (TST) with
252 Where k is the length of the string to be searched in a TST with n
255 TSTs are faster than hashing for many typical search problems especially
256 when the search interface is iterator based. TSTs are many times faster
257 than hash tables for unsuccessful searches since mismatches are
258 discovered earlier after examining only a few characters. Hash tables
259 always examine an entire key when searching.
261 An alternative implementation uses a hybrid hash-map front end (for the
262 first character) plus a TST: `tst_map`. This gives us a complexity of
266 This is found to be significantly faster than plain TST, albeit with a
267 bit more memory usage requirements (each slot in the hash-map is a TST
268 node). If you require a lot of symbols to be searched, use the `tst_map`
269 implementation. This can be done by using `tst_map` as the third
270 template parameter to the symbols class:
272 symbols<Char, T, tst_map<Char, T> > sym;
276 [note The test harness for the example(s) below is presented in the
277 __qi_basics_examples__ section.]
279 Some using declarations:
281 [reference_using_declarations_symbols]
285 [reference_symbols_with_data]
287 When `symbols` is used for case-insensitive parsing (in a __qi_no_case__
288 directive), added symbol strings should be in lowercase. Symbol strings
289 containing one or more uppercase characters will not match any input
290 when symbols is used in a `no_case` directive.
292 [reference_symbols_with_no_case]
295 [endsect] [/ symbols]