[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:char Character Parsers]

This module includes parsers for single characters. Currently, it
provides literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding-specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. It has an associated
__char_encoding_namespace__, which is needed for basic operations such
as inhibiting case sensitivity and dealing with character ranges.

There are various forms of `char_`.

[heading char_]

The no-argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single-argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

`char_` with two arguments matches a range of characters.

    char_('a','z')      // lowercase alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must come /before/ the second,
according to the underlying __char_encoding_namespace__.

Character mapping is inherently platform dependent. The standard does not
guarantee, for example, that `'A' < 'Z'`. That is why, in Spirit2, we
purposely attach a specific __char_encoding_namespace__ (such as ASCII or
ISO-8859-1) to the `char_` parser to eliminate such ambiguities.

[note *Sparse bit vectors*

To accommodate 16-, 32- and 64-bit characters, the char-set statically
switches from a `std::bitset` implementation, used when the character type
is not greater than 8 bits, to a sparse bit/boolean set which uses a sorted
vector of disjoint ranges (`range_run`). The set is constructed from
ranges such that adjacent or overlapping ranges are coalesced.

`range_run`s are very space-economical in situations where there are lots
of ranges and a few individual disjoint values. Searching is O(log n),
where n is the number of ranges.]

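The sparse set described in the note above can be pictured with a small standalone sketch. The names `range` and `range_set` are hypothetical and chosen only for illustration; this is not the library's `range_run` code, just one way to keep a sorted vector of disjoint, coalesced ranges with a binary-search lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: `range` and `range_set` are hypothetical names,
// not the library's `range_run`.
struct range
{
    std::uint32_t first; // inclusive lower bound
    std::uint32_t last;  // inclusive upper bound
};

class range_set
{
public:
    // Add [lo, hi], merging it with every range it overlaps or touches.
    void set(std::uint32_t lo, std::uint32_t hi)
    {
        assert(lo <= hi);
        // Find the first stored range that ends at lo - 1 or later.
        std::vector<range>::iterator it = std::lower_bound(
            runs_.begin(), runs_.end(), lo,
            [](range const& r, std::uint32_t v)
            { return r.last < v && v - r.last > 1; });

        // Absorb every following range that starts at hi + 1 or earlier.
        std::vector<range>::iterator end = it;
        while (end != runs_.end()
            && (end->first <= hi || end->first - hi == 1))
        {
            lo = std::min(lo, end->first);
            hi = std::max(hi, end->last);
            ++end;
        }
        it = runs_.erase(it, end);
        runs_.insert(it, range{lo, hi});
    }

    // Binary search: O(log n) in the number of stored ranges.
    bool test(std::uint32_t ch) const
    {
        std::vector<range>::const_iterator it = std::lower_bound(
            runs_.begin(), runs_.end(), ch,
            [](range const& r, std::uint32_t v) { return r.last < v; });
        return it != runs_.end() && it->first <= ch;
    }

    // Number of disjoint ranges currently stored.
    std::size_t size() const { return runs_.size(); }

private:
    std::vector<range> runs_;
};
```

In this sketch, adding `'a'..'f'` and then `'g'..'z'` leaves a single stored range, because adjacent ranges are merged on insertion. The storage cost therefore tracks the number of disjoint ranges rather than the size of the character type.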
[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string following
a syntax that resembles POSIX-style regular expression character sets
(except that double quotes delimit the set elements instead of square
brackets and there is no special negation `^` character). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E

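To make the definition-string syntax concrete, here is a minimal sketch that expands such a string into a `std::bitset` for 8-bit characters. The helpers `make_char_set` and `contains` are hypothetical names, not library functions. Treating a leading or trailing `-` as a literal character is an assumption of this sketch, and the library may treat other edge cases differently.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: expands a definition such as "a-zA-Z" into a bitset.
inline std::bitset<256> make_char_set(std::string const& def)
{
    std::bitset<256> set;
    std::size_t i = 0;
    while (i < def.size())
    {
        unsigned char lo = static_cast<unsigned char>(def[i]);
        // "x-y" with a character on both sides of '-' is an inclusive range.
        if (i + 2 < def.size() && def[i + 1] == '-')
        {
            unsigned char hi = static_cast<unsigned char>(def[i + 2]);
            assert(lo <= hi); // the first character must come before the second
            for (unsigned c = lo; c <= hi; ++c)
                set.set(c);
            i += 3;
        }
        else
        {
            // A lone character (assumption: this includes a leading or
            // trailing '-').
            set.set(lo);
            ++i;
        }
    }
    return set;
}

// Hypothetical helper: membership test for a plain char.
inline bool contains(std::bitset<256> const& set, char ch)
{
    return set.test(static_cast<unsigned char>(ch));
}
```

With this sketch, `make_char_set("0-9a-fA-F")` yields a set of 22 characters, and `make_char_set("actgACTG")` yields exactly the eight listed characters.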
[heading lit(ch)]

`lit`, when passed a single character, behaves like the single-argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single-element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]

Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c)      // c is a char

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'` or anything that can be
                         converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
                         that evaluates to anything that can be converted to a `char`
                         or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                         that specifies a char-set definition string following a syntax
                         that resembles POSIX-style regular expression character sets
                         (but without the square brackets and without a special
                         negation `^` character).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from, or
are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`c`]              [Create a char parser from a char, `c`.]]
    [[`lit(c)`]         [Create a char parser from a char, `c`.]]
    [[`ns::char_`]      [Create a char parser that matches any character in the
                         `ns` encoding.]]
    [[`ns::char_(c)`]   [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`][Create a char-range parser that matches characters in the
                         range `f` to `l` (inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]  [Create a char-set parser with `ns` encoding from a char-set
                         definition string, `cs`.]]
    [[`~cp`]            [Negate `cp`. The result is a negated char parser that
                         matches any character in the `ns` encoding except the
                         characters matched by `cp`.]]
]

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`c`]              [__unused__ or, if `c` is a __qi_lazy_argument__, the character
                         type returned by invoking it.]]
    [[`lit(c)`]         [__unused__ or, if `c` is a __qi_lazy_argument__, the character
                         type returned by invoking it.]]
    [[`ns::char_`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]   [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]  [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]            [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Character set:

[reference_char_set]

Lazy `char_` using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]             [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from, or
are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alpha-numeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lower case letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches spaces, tabs, returns, and newlines]]
    [[`ns::upper`]      [Matches upper case letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]

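As a rough model of the categories in the table above, the sketch below maps each name to the corresponding `<cctype>` test in the default "C" locale. The namespace `classify_sketch` and its functions are hypothetical. The library's parsers classify characters according to their `ns` encoding, so results can differ from this model, especially outside the ASCII range.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical namespace: these predicates are not the library's parsers.
namespace classify_sketch
{
    // <cctype> functions require a value representable as unsigned char.
    inline int as_int(char ch) { return static_cast<unsigned char>(ch); }

    inline bool alnum(char ch)  { return std::isalnum(as_int(ch)) != 0; }
    inline bool alpha(char ch)  { return std::isalpha(as_int(ch)) != 0; }
    inline bool blank(char ch)  { return ch == ' ' || ch == '\t'; }
    inline bool cntrl(char ch)  { return std::iscntrl(as_int(ch)) != 0; }
    inline bool digit(char ch)  { return std::isdigit(as_int(ch)) != 0; }
    inline bool graph(char ch)  { return std::isgraph(as_int(ch)) != 0; }
    inline bool lower(char ch)  { return std::islower(as_int(ch)) != 0; }
    inline bool print(char ch)  { return std::isprint(as_int(ch)) != 0; }
    inline bool punct(char ch)  { return std::ispunct(as_int(ch)) != 0; }
    inline bool space(char ch)  { return std::isspace(as_int(ch)) != 0; }
    inline bool upper(char ch)  { return std::isupper(as_int(ch)) != 0; }
    inline bool xdigit(char ch) { return std::isxdigit(as_int(ch)) != 0; }
}

inline void classification_examples()
{
    namespace cs = classify_sketch;

    assert(cs::alnum('7') && cs::alnum('g') && !cs::alnum('_'));
    assert(cs::blank('\t') && !cs::blank('\n'));   // blank: space or tab only
    assert(cs::space('\n') && cs::space(' '));     // space: any whitespace
    assert(cs::graph('!') && !cs::graph(' ') && cs::print(' '));
    assert(cs::xdigit('F') && !cs::xdigit('G'));
    assert(cs::punct(';') && !cs::punct('a'));
}
```

Note the difference between `blank` and `space`, and between `graph` and `print`. In each pair, the second category is the wider one.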
[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Complexity]

[:O(N)]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

Basic usage:

[reference_char_class]

[endsect] [/ Char Classification]

[endsect]
302 | ||
303 | [endsect] [/ Char Classification] | |
304 | ||
305 | [endsect] |