[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:char Character Parsers]

This module includes parsers for single characters. Currently, this
module includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.

There are various forms of `char_`.

[heading char_]

The no argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

`char_` with two arguments matches a range of characters.

    char_('a','z')      // lowercase alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must come /before/ the second,
according to the underlying __char_encoding_namespace__.

Character mapping is inherently platform dependent. The standard does not
guarantee, for example, that `'A' < 'Z'`. That is why, in Spirit2, we
purposely attach a specific __char_encoding_namespace__ (such as ASCII or
ISO-8859-1) to the `char_` parser to eliminate such ambiguities.

[note *Sparse bit vectors*

To accommodate 16-, 32- and 64-bit characters, the char-set statically
selects its representation: a `std::bitset` when the character type is
no wider than 8 bits, and otherwise a sparse bit/boolean set that uses a
sorted vector of disjoint ranges (`range_run`). The set is constructed
from ranges such that adjacent or overlapping ranges are coalesced.

`range_runs` are very space-economical in situations where there are lots
of ranges and a few individual disjoint values. Searching is O(log n),
where n is the number of ranges.]
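
To make the note's data structure concrete, here is a simplified sketch in standard C++ of a sorted vector of disjoint, coalesced ranges with an O(log n) membership test. It is an illustration only. The names `char_range_set`, `set`, `test` and `example_set` are hypothetical, this is not Spirit's actual `range_run` code, and code points are assumed to fit in 32 bits for simplicity.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not Spirit's range_run: a sorted vector of
// disjoint, non-adjacent, inclusive character ranges.
struct char_range_set
{
    struct range { std::uint32_t first, last; }; // inclusive bounds

    std::vector<range> ranges; // sorted by first; never overlapping or touching

    // Adds [first, last], coalescing with overlapping or adjacent ranges.
    void set(std::uint32_t first, std::uint32_t last)
    {
        range r{first, last};
        // First stored range that overlaps, touches, or lies after r
        // (64-bit arithmetic avoids overflow of last + 1).
        auto it = std::lower_bound(ranges.begin(), ranges.end(), r,
            [](range const& a, range const& b)
            { return std::uint64_t(a.last) + 1 < b.first; });
        // Absorb every stored range that overlaps or touches r.
        auto end = it;
        while (end != ranges.end() && end->first <= std::uint64_t(r.last) + 1)
        {
            r.first = std::min(r.first, end->first);
            r.last = std::max(r.last, end->last);
            ++end;
        }
        it = ranges.erase(it, end);
        ranges.insert(it, r);
    }

    // O(log n) binary search over the sorted ranges.
    bool test(std::uint32_t ch) const
    {
        auto it = std::lower_bound(ranges.begin(), ranges.end(), ch,
            [](range const& a, std::uint32_t c) { return a.last < c; });
        return it != ranges.end() && it->first <= ch;
    }
};

// Example: a set built from overlapping and adjacent ranges.
char_range_set example_set()
{
    char_range_set s;
    s.set('a', 'z');
    s.set('A', 'Z');
    s.set('0', '9');
    s.set('m', 'p'); // lies inside 'a'..'z': absorbed
    s.set('[', '`'); // 0x5B..0x60 touches 'Z' (0x5A) and 'a' (0x61): merges both
    return s;        // ranges: '0'..'9' and 'A'..'z'
}
```

Because ranges are kept sorted and coalesced on insertion, a lookup only needs one binary search. This is why the representation stays compact when a set has many ranges but few isolated values.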

[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string following
a syntax that resembles POSIX style regular expression character sets
(except that double quotes delimit the set elements instead of square
brackets and there is no special negation `^` character). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E

[heading lit(ch)]

`lit`, when passed a single character, behaves like the single argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]

For example:

    lit(c) // c is a char

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'` or anything that can be
                        converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
                        that evaluates to anything that can be converted to a `char`
                        or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                        that specifies a char-set definition string following a syntax
                        that resembles POSIX style regular expression character sets
                        (except that the square brackets and the negation `^` character
                        are not used).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from, or
are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`c`]              [Create a char parser from a char, `c`.]]
    [[`lit(c)`]         [Create a char parser from a char, `c`.]]
    [[`ns::char_`]      [Create a char parser that matches any character in the
                        `ns` encoding.]]
    [[`ns::char_(c)`]   [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`][Create a char-range parser that matches characters in the
                        range `f` to `l` (inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]  [Create a char-set parser with `ns` encoding from a char-set
                        definition string, `cs`.]]
    [[`~cp`]            [Negate `cp`. The result is a negated char parser that
                        matches any character in the `ns` encoding except the
                        characters matched by `cp`.]]
]

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`c`]              [__unused__, or, if `c` is a __qi_lazy_argument__, the
                        character type returned by invoking it.]]
    [[`lit(c)`]         [__unused__, or, if `c` is a __qi_lazy_argument__, the
                        character type returned by invoking it.]]
    [[`ns::char_`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]   [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]  [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]            [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Lazy `char_` using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.
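
[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.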

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]             [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from, or
are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alphanumeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lower case letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches spaces, tabs, returns, and newlines]]
    [[`ns::upper`]      [Matches upper case letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]

[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

[reference_char_class]

[endsect] [/ Char Classification]