3 <title>Operators
</title>
4 <meta http-equiv=
"Content-Type" content=
"text/html; charset=iso-8859-1">
5 <link rel=
"stylesheet" href=
"theme/style.css" type=
"text/css">
9 <table width=
"100%" border=
"0" background=
"theme/bkd2.gif" cellspacing=
"2">
14 <font size=
"6" face=
"Verdana, Arial, Helvetica, sans-serif"><b>Operators
</b></font>
16 <td width=
"112"><a href=
"http://spirit.sf.net"><img src=
"theme/spirit.gif" width=
"112" height=
"48" align=
"right" border=
"0"></a></td>
23 <td width=
"30"><a href=
"../index.html"><img src=
"theme/u_arr.gif" border=
"0"></a></td>
24 <td width=
"30"><a href=
"primitives.html"><img src=
"theme/l_arr.gif" border=
"0"></a></td>
25 <td width=
"30"><a href=
"numerics.html"><img src=
"theme/r_arr.gif" border=
"0"></a></td>
28 <p>Operators are used as a means for object composition and embedding. Simple
29 parsers may be composed to form composites through operator overloading, crafted
30 to approximate the syntax of an Extended Backus-Normal Form (EBNF) variant.
31 An expression such as:
</p>
32 <pre><code><font color=
"#000000"> <span class=identifier
>a
</span><span class=special
>|
</span><span class=identifier
>b
</span></font></code></pre>
33 <p>actually yields a new parser type which is a composite of its operands, a and
34 b. Taking this example further, if a and b were of type
<tt>chlit
</tt><>,
35 the result would have the composite type:
</p>
36 <pre><code><font color=
"#000000"> <span class=identifier
>alternative
</span><span class=special
><</span><span class=identifier
>chlit
</span><span class=special
><>,
</span><span class=identifier
>chlit
</span><span class=special
><> </span><span class=special
>></span></font></code></pre>
37 <p> In general, for any binary operator, it will take its two arguments, parser1
38 and parser2, and create a new composed parser of the form
</p>
39 <pre><code><font color=
"#000000"> <span class=identifier
>op
</span><span class=special
><</span><span class=identifier
>parser1
</span><span class=special
>,
</span><span class=identifier
>parser2
</span><span class=special
>></span></font></code></pre>
40 <p>where parser1 and parser2 can be arbitrarily complex parsers themselves, with
41 the only limitations being what your compiler imposes.
</p>
42 <h3>Set Operators
</h3>
43 <table width=
"90%" border=
"0" align=
"center">
45 <td class=
"table_title" colspan=
"3">Set operators
</td>
48 <td class=
"table_cells" width=
"20%"><code><span class=identifier
>a
</span><span class=special
>|
49 </span><span class=identifier
>b
</span></code></td>
50 <td class=
"table_cells" width=
"24%">Union
</td>
51 <td class=
"table_cells" width=
"56%">Match a or b. Also referred to as alternative
</td>
54 <td class=
"table_cells" width=
"20%"><code><span class=identifier
>a
</span><span class=special
>&
55 </span><span class=identifier
>b
</span></code></td>
56 <td class=
"table_cells" width=
"24%">Intersection
</td>
57 <td class=
"table_cells" width=
"56%">Match a and b
</td>
60 <td class=
"table_cells" width=
"20%"><code><span class=identifier
>a
</span><span class=special
>-
61 </span><span class=identifier
>b
</span></code></td>
62 <td class=
"table_cells" width=
"24%">Difference
</td>
63 <td class=
"table_cells" width=
"56%">Match a but not b. If both match and b's
64 matched text is shorter than a's matched text, a successful match is made
</td>
67 <td class=
"table_cells" width=
"20%"><code><span class=identifier
>a
</span><span class=special
>^
68 </span><span class=identifier
>b
</span></code></td>
69 <td class=
"table_cells" width=
"24%">XOR
</td>
70 <td class=
"table_cells" width=
"56%">Match a or b, but not both
</td>
73 <p><b>Short-circuiting
</b></p>
74 <p>Alternative operands are tried one by one on a first come first served basis
75 starting from the leftmost operand. After a successfully matched alternative
76 is found, the parser concludes its search, essentially short-circuiting the
77 search for other potentially viable candidates. This short-circuiting implicitly
78 gives the highest priority to the leftmost alternative.
</p>
79 <p>Short-circuiting is done in the same manner as C or C++'s logical expressions;
80 e.g.
<tt>if
</tt> <tt><span class=
"operators">(
</span>x
<span class=
"operators"><</span>
81 3 <span class=
"operators">||
</span> y
<span class=
"operators"><</span> 2<span class=
"operators">)
</span></tt>
82 where, if
<tt>x
</tt> evaluates to be less than
3, the
<tt>y
<span class=
"operators"><</span>
83 2</tt> test is not done at all. In addition to providing an implicit priority
84 rule for alternatives which is necessary, given the non-deterministic nature
85 of the Spirit parser compiler, short-circuiting improves the execution time.
86 If the order of your alternatives is logically irrelevant, strive to put the
87 (expected) most common choice first for maximum efficiency.
</p>
88 <table width=
"80%" border=
"0" align=
"center">
90 <td class=
"note_box"><img src=
"theme/lens.gif" width=
"15" height=
"16"> <b>Intersections
</b><br>
92 Some researchers assert that the intersections (e.g.
<tt>a
& b
</tt>)
93 let us define context sensitive languages (
<a href=
"references.html#intersections">"XBNF
"</a>
94 [citing Leu-Weiner,
1973]).
"The theory of defining a language as the
95 intersection of a finite number of context free languages was developed
96 by Leu and Weiner in
1973".
<br>
98 <b><img src=
"theme/lens.gif" width=
"15" height=
"16"> <b></b>~ Operator
</b><br>
100 The complement operator
<tt>~
</tt> was originally put into consideration.
101 Further understanding of its value and meaning leads us to uncertainty.
102 The basic problem stems from the fact that
<tt>~a
</tt> will yield
<tt>U-a
</tt>,
103 where
<tt>U
</tt> is the universal set of all strings. However, where it
104 makes sense, some parsers can be complemented (see the
<a href=
"primitives.html#negation">primitive
105 character parsers
</a> for examples).
</td>
108 <h3>Sequencing Operators
</h3>
109 <table width=
"90%" border=
"0" align=
"center">
111 <td class=
"table_title" colspan=
"3">Sequencing operators
</td>
114 <td class=
"table_cells" width=
"21%"><code><span class=identifier
>a
</span><span class=special
>>>
115 </span><span class=identifier
>b
</span></code></td>
116 <td class=
"table_cells" width=
"23%">Sequence
</td>
117 <td class=
"table_cells" width=
"56%">Match a and b in sequence
</td>
120 <td class=
"table_cells" width=
"21%"><code><span class=identifier
>a
</span><span class=special
>&&
121 </span><span class=identifier
>b
</span></code></td>
122 <td class=
"table_cells" width=
"23%">Sequential-and
</td>
123 <td class=
"table_cells" width=
"56%">Sequential-and. Same as above, match a
124 and b in sequence
</td>
127 <td class=
"table_cells" width=
"21%"><code><span class=identifier
>a
</span><span class=special
>||
128 </span><span class=identifier
>b
</span></code></td>
129 <td class=
"table_cells" width=
"23%">Sequential-or
</td>
130 <td class=
"table_cells" width=
"56%">Match a or b in sequence
</td>
133 <p>The sequencing operator
<tt class=
"operators">>></tt> can alternatively
134 be thought of as the sequential-and operator. The expression
<tt>a
<span class=
"operators">&&</span>
135 b
</tt> reads as match a and b in sequence. Continuing this logic, we can also
136 have a sequential-or operator where the expression
<tt>a
<span class=
"operators">||
</span>
137 b
</tt> reads as match a or b and in sequence. That is, if both a and b match,
138 it must be in sequence; this is equivalent to
<tt>a
>> !b | b
</tt>.
</p>
139 <h3>Optional and Loops
</h3>
140 <table width=
"90%" border=
"0" align=
"center">
142 <td class=
"table_title" colspan=
"3">Optional and Loops
</td>
145 <td class=
"table_cells" width=
"21%"><code><span class=special
>*
</span><span class=identifier
>a
</span></code></td>
146 <td class=
"table_cells" width=
"23%">Kleene star
</td>
147 <td class=
"table_cells" width=
"56%">Match a zero (
0) or more times
</td>
150 <td class=
"table_cells" width=
"21%"><code><span class=special
>+
</span><span class=identifier
>a
</span></code></td>
151 <td class=
"table_cells" width=
"23%">Positive
</td>
152 <td class=
"table_cells" width=
"56%">Match a one (
1) or more times
</td>
155 <td class=
"table_cells" width=
"21%"><code><span class=special
>!
</span><span class=identifier
>a
</span></code></td>
156 <td class=
"table_cells" width=
"23%">Optional
</td>
157 <td class=
"table_cells" width=
"56%">Match a zero (
0) or one (
1) time
</td>
160 <td class=
"table_cells" width=
"21%"><code><span class=identifier
>a
</span><span class=special
>%
161 </span><span class=identifier
>b
</span></code></td>
162 <td class=
"table_cells" width=
"23%">List
</td>
163 <td class=
"table_cells" width=
"56%">Match a list of one or more repetitions
164 of a separated by occurrences of b. This is the same as
<tt>a
>> *(b
165 >> a)
</tt>. Note that
<tt>a
</tt> must not also match
<tt>b
</tt></td>
168 <p><img src=
"theme/note.gif" width=
"16" height=
"16"> If we look more closely,
169 take note that we generalized the optional expression of the form
<tt>!a
</tt>
170 in the same category as loops. This is logical, considering that the optional
171 matches the expression following it zero (
0) or one (
1) time.
</p>
172 <p><b>Primitive type operands
</b></p>
173 <p> For binary operators, one of the operands but not both may be a
<tt>char
</tt>,
174 <tt> wchar_t
</tt>,
<tt>char const
<span class=
"operators">*
</span></tt> or
<tt>wchar_t
175 const
<span class=
"operators">*
</span></tt>. Where P is a parser object, here
176 are some examples:
</p>
177 <pre><code><span class=identifier
> </span><span class=identifier
>P
</span><span class=special
>|
</span><span class=literal
>'x'
178 </span><span class=identifier
>P
</span><span class=special
>-
</span><span class=identifier
>L
</span><span class=string
>"Hello World"
179 </span><span class=literal
>'x'
</span><span class=special
>>> </span><span class=identifier
>P
180 </span><span class=string
>"bebop" </span><span class=special
>>> </span><span class=identifier
>P
</span></code></pre>
181 <p>It is important to emphasize that C++ mandates that operators may only be overloaded
182 if at least one argument is a user-defined type. Typically, in an expression
183 involving multiple operators, explicitly typing the leftmost operand as a parser
184 is enough to cause propagation to all the rest of the operands to its right
185 to be regarded as parsers. Examples:
</p>
186 <pre><code><font color=
"#000000"><span class=identifier
> </span><span class=identifier
>r
</span><span class=special
>=
</span><span class=literal
>'a'
</span><span class=special
>|
</span><span class=literal
>'b'
</span><span class=special
>|
</span><span class=literal
>'c'
</span><span class=special
>|
</span><span class=literal
>'d'
</span><span class=special
>;
</span><span class=comment
>// ill formed
187 </span><span class=identifier
>r
</span><span class=special
>=
</span><span class=identifier
>ch_p
</span><span class=special
>(
</span><span class=literal
>'a'
</span><span class=special
>)
</span><span class=special
>|
</span><span class=literal
>'b'
</span><span class=special
>|
</span><span class=literal
>'c'
</span><span class=special
>|
</span><span class=literal
>'d'
</span><span class=special
>;
</span><span class=comment
>// OK
</span></font></code></pre>
188 <p>The second case is parsed as follows:
</p>
189 <pre><code><font color=
"#000000"> r
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(((
</span><span class=identifier
>chlit
</span><span class=special
><</span><span class=keyword
>char
</span><span class=special
>> </span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span></font>
191 a
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(
</span><span class=identifier
>chlit
</span><span class=special
><</span><span class=keyword
>char
</span><span class=special
>> </span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span></font>
192 r
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(((
</span><span class=identifier
>a
</span><span class=special
>)
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span></font>
194 b
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(
</span><span class=identifier
>a
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span></font>
195 r
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(((
</span><span class=identifier
>b
</span><span class=special
>))
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span></font>
197 c
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(
</span><span class=identifier
>b
</span><span class=special
>|
</span><span class=keyword
>char
</span><span class=special
>)
</span></font>
198 r
<font color=
"#0000ff"><img src=
"theme/arrow.gif"> <span class=special
>(((
</span><span class=identifier
>c
</span><span class=special
>)))
</span></font></font></code></pre>
199 <p><b>Operator precedence and grouping
</b></p>
200 <p>Since we are defining our meta-language in C++, we follow C/C++'s operator
201 precedence rules. Grouping expressions inside the parentheses override this
202 (e.g.,
<tt><span class=
"operators">*(
</span>a
<span class=
"operators">|
</span>
203 b
<span class=
"operators">)
</span></tt> reads: match a or b zero (
0) or more
208 <td width=
"30"><a href=
"../index.html"><img src=
"theme/u_arr.gif" border=
"0"></a></td>
209 <td width=
"30"><a href=
"primitives.html"><img src=
"theme/l_arr.gif" border=
"0"></a></td>
210 <td width=
"30"><a href=
"numerics.html"><img src=
"theme/r_arr.gif" border=
"0"></a></td>
215 <p class=
"copyright">Copyright
© 1998-
2003 Joel de Guzman
<br>
217 <font size=
"2">Use, modification and distribution is subject to the Boost Software
218 License, Version
1.0. (See accompanying file LICENSE_1_0.txt or copy at
219 http://www.boost.org/LICENSE_1_0.txt)
</font> </p>