[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Warming up]

We'll start by showing examples of parser expressions to give you a feel for how
to build parsers, beginning with the simplest and building up as we go. When
compared with EBNF, __spirit__ expressions may seem awkward at first. __spirit__
relies heavily on operator overloading to accomplish its magic.

[heading Trivial Example #1 Parsing a number]

Create a parser that will parse a floating-point number.

    double_

(You've got to admit, that's trivial!) The above code actually generates a
Spirit floating-point parser (a built-in parser). Spirit has many predefined
parsers, and consistent naming conventions help you keep from going insane!

[heading Trivial Example #2 Parsing two numbers]

Create a parser that will accept a line consisting of two floating-point numbers.

    double_ >> double_

Here you see the familiar floating-point numeric parser `double_` used twice,
once for each number. What's that `>>` operator doing in there? Well, the two
parsers had to be separated by something, and this operator was chosen as the
"followed by" sequence operator. The above expression creates a parser from two
simpler parsers, gluing them together with the sequence operator. The result is
a parser that is a composition of smaller parsers. Whitespace between numbers
can be consumed implicitly, depending on how the parser is invoked (see below).

[note When we combine parsers, we end up with a "bigger" parser, but
it's still a parser. Parsers can get bigger and bigger, nesting more and more,
but whenever you glue two parsers together, you end up with one bigger parser.
This is an important concept.
]

[heading Trivial Example #3 Parsing zero or more numbers]

Create a parser that will accept zero or more floating-point numbers.

    *double_

This is like a regular-expression Kleene Star, though the syntax might look a
bit odd to a C++ programmer not used to seeing the `*` operator overloaded like
this. Actually, if you know regular expressions it may look odd too, since the
star comes before the expression it modifies. C'est la vie. Blame it on the fact
that we must work within the syntax rules of C++, where `*` is only available as
a prefix unary operator.

Any expression that evaluates to a parser may be used with the Kleene Star.
Keep in mind that C++ operator precedence rules may require you to put
complex expressions in parentheses. The Kleene Star is also known as a
Kleene Closure, but we call it the Star in most places.

[heading Trivial Example #4 Parsing a comma-delimited list of numbers]

This example will create a parser that accepts a comma-delimited list of
numbers.

    double_ >> *(char_(',') >> double_)

Notice `char_(',')`. It is a literal character parser that recognizes the
comma `','`. In this case, the Kleene Star is modifying a more complex parser,
namely, the one generated by the expression:

    (char_(',') >> double_)

Note that this is a case where the parentheses are necessary. The Kleene Star
applies to the complete expression above. Without the parentheses, the star (a
unary operator, which binds more tightly than `>>`) would apply to `char_(',')`
alone.

[heading Let's Parse!]

We're done defining the parser. The next step is to invoke this parser to do
its work. There are a couple of ways to do this. For now, we will use the
`phrase_parse` function. One overload of this function accepts four
arguments:

# An iterator pointing to the start of the input
# An iterator pointing to one past the end of the input
# The parser object
# Another parser called the skip parser

In our example, we wish to skip whitespace such as spaces and tabs. Another
parser named `space` is included in Spirit's repertoire of predefined parsers.
It is a very simple parser that recognizes any whitespace character. We will
use `space` as our skip parser. The skip parser is responsible for skipping
characters between parser elements such as the `double_` and `char_`.

OK, so now let's parse!

[import ../../example/qi/num_list1.cpp]
[tutorial_numlist1]

The `phrase_parse` function returns `true` or `false` depending on the result
of the parse. The first iterator is passed to it by reference. On a successful
parse, this iterator is repositioned to the rightmost position consumed
by the parser. If it becomes equal to `last`, then we have a full
match. If not, then we have a partial match. A partial match happens
when the parser is only able to parse a portion of the input.

Note that we inlined the parser directly in the call to `phrase_parse`. Upon
calling `phrase_parse`, the expression evaluates into a temporary, unnamed
parser which is passed into the function, used, and then destroyed.

Here, we opted to make the parser generic by making it a template, parameterized
by the iterator type. By doing so, it can take in data coming from any
STL-conforming sequence as long as its iterators are at least forward iterators.

You can find the full cpp file here: [@../../example/qi/num_list1.cpp]

[note `char` and `wchar_t` operands

The careful reader may notice that the parser expression has `','` instead of
`char_(',')` as the previous examples did. This is OK due to the C++ rules for
operator overloading. There are `>>` operators that are overloaded to accept a
`char` or `wchar_t` argument on the left or right (but not both). An operator
may be overloaded only if at least one of its parameters is a user-defined type.
In this case, the `double_` is the second argument to `operator>>`, and so the
proper overload of `>>` is used, converting `','` into a character literal
parser.

The problem with omitting the `char_` should be obvious: `'a' >> 'b'` is not a
Spirit parser; it is a numeric expression, right-shifting the ASCII (or another
encoding) value of `'a'` by the ASCII value of `'b'`. However, both
`char_('a') >> 'b'` and `'a' >> char_('b')` are Spirit sequence parsers
for the letter `'a'` followed by `'b'`. You'll get used to it, sooner or later.
]

Finally, take note that we test for a full match (i.e. the parser fully parsed
the input) by checking whether the first iterator, after parsing, is equal to
the end iterator. You may omit this check if partial matches are to be allowed.

[endsect] [/ Warming up]