[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section Semantic Actions and User-Defined Assertions]

[h2 Overview]

Imagine you want to parse an input string and build a `std::map<>` from it. For
something like that, matching a regular expression isn't enough. You want to
/do something/ when parts of your regular expression match. Xpressive lets
you attach semantic actions to parts of your static regular expressions. This
section shows you how.

[h2 Semantic Actions]

Consider the following code, which uses xpressive's semantic actions to parse
a string of word/integer pairs and stuffs them into a `std::map<>`. It is
described below.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, int> result;
        std::string str("aaa=>1 bbb=>23 ccc=>456");

        // Match a word and an integer, separated by =>,
        // and then stuff the result into a std::map<>.
        // (If your compiler reports ref() as ambiguous with
        // std::ref, spell it boost::xpressive::ref.)
        sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
            [ ref(result)[s1] = as<int>(s2) ];

        // Match one or more word/integer pairs, separated
        // by whitespace.
        sregex rx = pair >> *(+_s >> pair);

        if(regex_match(str, rx))
        {
            std::cout << result["aaa"] << '\n';
            std::cout << result["bbb"] << '\n';
            std::cout << result["ccc"] << '\n';
        }

        return 0;
    }

This program prints the following:

[pre
1
23
456
]

The regular expression `pair` has two parts: the pattern and the action. The
pattern says to match a word, capturing it in sub-match 1, and an integer,
capturing it in sub-match 2, separated by `"=>"`. The action is the part in
square brackets: `[ ref(result)[s1] = as<int>(s2) ]`. It says to take sub-match
1 and use it to index into the `result` map, and assign to it the result of
converting sub-match 2 to an integer.

[note To use semantic actions with your static regexes, you must
`#include <boost/xpressive/regex_actions.hpp>`]

How does this work? Just like the rest of the static regular expression, the part
between brackets is an expression template. It encodes the action and executes
it later. The expression `ref(result)` creates a lazy reference to the `result`
object. The larger expression `ref(result)[s1]` is a lazy map index operation.
Later, when this action is executed, `s1` gets replaced with the
first _sub_match_. Likewise, when `as<int>(s2)` gets executed, `s2` is replaced
with the second _sub_match_. The `as<>` action converts its argument to the
requested type using Boost.Lexical_cast. The effect of the whole action is to
insert a new word/integer pair into the map.

[note There is an important difference between the function `boost::ref()` in
`<boost/ref.hpp>` and `boost::xpressive::ref()` in
`<boost/xpressive/regex_actions.hpp>`. The first returns a plain
`reference_wrapper<>` which behaves in many respects like an ordinary
reference. By contrast, `boost::xpressive::ref()` returns a /lazy/ reference
that you can use in expressions that are executed lazily. That is why we can
say `ref(result)[s1]`, even though `result` doesn't have an `operator[]` that
would accept `s1`.]

In addition to the sub-match placeholders `s1`, `s2`, etc., you can also use
the placeholder `_` within an action to refer back to the string matched by
the sub-expression to which the action is attached. For instance, you can use
the following regex to match a bunch of digits, interpret them as an integer
and assign the result to a local variable:

    int i = 0;
    // Here, _ refers back to all the
    // characters matched by (+_d)
    sregex rex = (+_d)[ ref(i) = as<int>(_) ];

[h3 Lazy Action Execution]

What does it mean, exactly, to attach an action to part of a regular expression
and perform a match? When does the action execute? If the action is part of a
repeated sub-expression, does the action execute once or many times? And if the
sub-expression initially matches, but ultimately fails because the rest of the
regular expression fails to match, is the action executed at all?

The answer is that by default, actions are executed /lazily/. When a sub-expression
matches a string, its action is placed on a queue, along with the current
values of any sub-matches to which the action refers. If the match algorithm
must backtrack, actions are popped off the queue as necessary. Only after the
entire regex has matched successfully are the actions actually executed. They
are executed all at once, in the order in which they were added to the queue,
as the last step before _regex_match_ returns.

For example, consider the following regex that increments a counter whenever
it finds a digit.

    int i = 0;
    std::string str("1!2!3?");
    // count the exciting digits, but not the
    // questionable ones.
    sregex rex = +( _d [ ++ref(i) ] >> '!' );
    regex_search(str, rex);
    assert( i == 2 );

The action `++ref(i)` is queued three times: once for each found digit. But
it is only /executed/ twice: once for each digit that precedes a `'!'`
character. When the `'?'` character is encountered, the match algorithm
backtracks, removing the final action from the queue.

[h3 Immediate Action Execution]

When you want semantic actions to execute immediately, you can wrap the
sub-expression containing the action in a [^[funcref boost::xpressive::keep keep()]].
`keep()` turns off back-tracking for its sub-expression, but it also causes
any actions queued by the sub-expression to execute at the end of the `keep()`.
It is as if the sub-expression in the `keep()` were compiled into an
independent regex object, and matching the `keep()` is like a separate invocation
of `regex_search()`. It matches characters and executes actions but never backtracks
or unwinds. For example, imagine the above example had been written as follows:

    int i = 0;
    std::string str("1!2!3?");
    // count all the digits.
    sregex rex = +( keep( _d [ ++ref(i) ] ) >> '!' );
    regex_search(str, rex);
    assert( i == 3 );

We have wrapped the sub-expression `_d [ ++ref(i) ]` in `keep()`. Now, whenever
this regex matches a digit, the action will be queued and then immediately
executed before we try to match a `'!'` character. In this case, the action
executes three times.

[note Like `keep()`, actions within [^[funcref boost::xpressive::before before()]]
and [^[funcref boost::xpressive::after after()]] are also executed early when their
sub-expressions have matched.]

[h3 Lazy Functions]

So far, we've seen how to write semantic actions consisting of variables and
operators. But what if you want to be able to call a function from a semantic
action? Xpressive provides a mechanism to do this.

The first step is to define a function object type. Here, for instance, is a
function object type that calls `push()` on its argument:

    struct push_impl
    {
        // Result type, needed for tr1::result_of
        typedef void result_type;

        template<typename Sequence, typename Value>
        void operator()(Sequence &seq, Value const &val) const
        {
            seq.push(val);
        }
    };

The next step is to use xpressive's `function<>` template to define a function
object named `push`:

    // Global "push" function object.
    function<push_impl>::type const push = {{}};

The initialization looks a bit odd, but this is because `push` is being
statically initialized. That means it doesn't need to be constructed
at runtime. We can use `push` in semantic actions as follows:

    std::stack<int> ints;
    // Match digits, cast them to an int
    // and push it on the stack.
    sregex rex = (+_d)[push(ref(ints), as<int>(_))];

You'll notice that doing it this way causes member function invocations
to look like ordinary function invocations. You can choose to write your
semantic action in a different way that makes it look a bit more like
a member function call:

    sregex rex = (+_d)[ref(ints)->*push(as<int>(_))];

Xpressive recognizes the use of the `->*` and treats this expression
exactly the same as the one above.

When your function object must return a type that depends on its
arguments, you can use a `result<>` member template instead of the
`result_type` typedef. Here, for example, is a `first` function object
that returns the `first` member of a `std::pair<>` or _sub_match_:

    // Function object that returns the
    // first element of a pair.
    struct first_impl
    {
        template<typename Sig> struct result {};

        template<typename This, typename Pair>
        struct result<This(Pair)>
        {
            typedef typename remove_reference<Pair>
                ::type::first_type type;
        };

        template<typename Pair>
        typename Pair::first_type
        operator()(Pair const &p) const
        {
            return p.first;
        }
    };

    // OK, use as first(s1) to get the begin iterator
    // of the sub-match referred to by s1.
    function<first_impl>::type const first = {{}};

[h3 Referring to Local Variables]

As we've seen in the examples above, we can refer to local variables within
an action using `xpressive::ref()`. Any such variables are held by reference
by the regular expression, and care should be taken to avoid letting those
references dangle. For instance, in the following code, the reference to `i`
is left to dangle when `bad_voodoo()` returns:

    sregex bad_voodoo()
    {
        int i = 0;
        sregex rex = +( _d [ ++ref(i) ] >> '!' );
        // ERROR! rex refers by reference to a local
        // variable, which will dangle after bad_voodoo()
        // returns.
        return rex;
    }

When writing semantic actions, it is your responsibility to make sure that
none of the references dangle. One way to do that would be to make the
variables shared pointers that are held by the regex by value.

    sregex good_voodoo(boost::shared_ptr<int> pi)
    {
        // Use val() to hold the shared_ptr by value:
        sregex rex = +( _d [ ++*val(pi) ] >> '!' );
        // OK, rex holds a reference count to the integer.
        return rex;
    }

In the above code, we use `xpressive::val()` to hold the shared pointer by
value. That's not normally necessary because local variables appearing in
actions are held by value by default, but in this case, it is necessary. Had
we written the action as `++*pi`, it would have executed immediately. That's
because `++*pi` is not an expression template, but `++*val(pi)` is.

It can be tedious to wrap all your variables in `ref()` and `val()` in your
semantic actions. Xpressive provides the `reference<>` and `value<>` templates
to make things easier. The following table shows the equivalencies:

[table reference<> and value<>
[[This ...][... is equivalent to this ...]]
[[``int i = 0;

sregex rex = +( _d [ ++ref(i) ] >> '!' );``][``int i = 0;
reference<int> ri(i);
sregex rex = +( _d [ ++ri ] >> '!' );``]]
[[``boost::shared_ptr<int> pi(new int(0));

sregex rex = +( _d [ ++*val(pi) ] >> '!' );``][``boost::shared_ptr<int> pi(new int(0));
value<boost::shared_ptr<int> > vpi(pi);
sregex rex = +( _d [ ++*vpi ] >> '!' );``]]
]

As you can see, when using `reference<>`, you need to first declare a local
variable and then declare a `reference<>` to it. These two steps can be combined
into one using `local<>`.

[table local<> vs. reference<>
[[This ...][... is equivalent to this ...]]
[[``local<int> i(0);

sregex rex = +( _d [ ++i ] >> '!' );``][``int i = 0;
reference<int> ri(i);
sregex rex = +( _d [ ++ri ] >> '!' );``]]
]

We can use `local<>` to rewrite the above example as follows:

    local<int> i(0);
    std::string str("1!2!3?");
    // count the exciting digits, but not the
    // questionable ones.
    sregex rex = +( _d [ ++i ] >> '!' );
    regex_search(str, rex);
    assert( i.get() == 2 );

Notice that we use `local<>::get()` to access the value of the local
variable. Also, beware that `local<>` can be used to create a dangling
reference, just as `reference<>` can.

[h3 Referring to Non-Local Variables]

In the beginning of this
section, we used a regex with a semantic action to parse a string of
word/integer pairs and stuff them into a `std::map<>`. That required that
the map and the regex be defined together and used before either could
go out of scope. What if we wanted to define the regex once and use it
to fill lots of different maps? We would rather pass the map into the
_regex_match_ algorithm than embed a reference to it directly in
the regex object. What we can do instead is define a placeholder and use
that in the semantic action instead of the map itself. Later, when we
call one of the regex algorithms, we can bind the reference to an actual
map object. The following code shows how.

    // Define a placeholder for a map object:
    placeholder<std::map<std::string, int> > _map;

    // Match a word and an integer, separated by =>,
    // and then stuff the result into a std::map<>
    sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
        [ _map[s1] = as<int>(s2) ];

    // Match one or more word/integer pairs, separated
    // by whitespace.
    sregex rx = pair >> *(+_s >> pair);

    // The string to parse
    std::string str("aaa=>1 bbb=>23 ccc=>456");

    // Here is the actual map to fill in:
    std::map<std::string, int> result;

    // Bind the _map placeholder to the actual map
    smatch what;
    what.let( _map = result );

    // Execute the match and fill in result map
    if(regex_match(str, what, rx))
    {
        std::cout << result["aaa"] << '\n';
        std::cout << result["bbb"] << '\n';
        std::cout << result["ccc"] << '\n';
    }

This program displays:

[pre
1
23
456
]

We use `placeholder<>` here to define `_map`, which stands in for a
`std::map<>` variable. We can use the placeholder in the semantic action as if
it were a map. Then, we define a _match_results_ struct and bind an actual map
to the placeholder with "`what.let( _map = result );`". The _regex_match_ call
behaves as if the placeholder in the semantic action had been replaced with a
reference to `result`.

[note Placeholders in semantic actions are not /actually/ replaced at runtime
with references to variables. The regex object is never mutated in any way
during any of the regex algorithms, so they are safe to use in multiple
threads.]

The syntax for late-bound action arguments is a little different if you are
using _regex_iterator_ or _regex_token_iterator_. The regex iterators accept
an extra constructor parameter for specifying the argument bindings. There is
a `let()` function that you can use to bind variables to their placeholders.
The following code demonstrates how.

    // Define a placeholder for a map object:
    placeholder<std::map<std::string, int> > _map;

    // Match a word and an integer, separated by =>,
    // and then stuff the result into a std::map<>
    sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
        [ _map[s1] = as<int>(s2) ];

    // The string to parse
    std::string str("aaa=>1 bbb=>23 ccc=>456");

    // Here is the actual map to fill in:
    std::map<std::string, int> result;

    // Create a regex_iterator to find all the matches
    sregex_iterator it(str.begin(), str.end(), pair, let(_map=result));
    sregex_iterator end;

    // step through all the matches, and fill in
    // the result map
    while(it != end)
        ++it;

    std::cout << result["aaa"] << '\n';
    std::cout << result["bbb"] << '\n';
    std::cout << result["ccc"] << '\n';

This program displays:

[pre
1
23
456
]

[h2 User-Defined Assertions]

You are probably already familiar with regular expression /assertions/. In
Perl, some examples are the [^^] and [^$] assertions, which you can use to
match the beginning and end of a string, respectively. Xpressive lets you
define your own assertions. A custom assertion is a condition which must be
true at a point in the match in order for the match to succeed. You can check
a custom assertion with xpressive's _check_ function.

There are a couple of ways to define a custom assertion. The simplest is to
use a function object. Let's say that you want to ensure that a sub-expression
matches a sub-string that is either 3 or 6 characters long. The following
struct defines such a predicate:

    // A predicate that is true IFF a sub-match is
    // either 3 or 6 characters long.
    struct three_or_six
    {
        bool operator()(ssub_match const &sub) const
        {
            return sub.length() == 3 || sub.length() == 6;
        }
    };

You can use this predicate within a regular expression as follows:

    // match words of 3 characters or 6 characters.
    sregex rx = (bow >> +_w >> eow)[ check(three_or_six()) ] ;

The above regular expression will find whole words that are either 3 or 6
characters long. The `three_or_six` predicate accepts a _sub_match_ that refers
back to the part of the string matched by the sub-expression to which the
custom assertion is attached.

[note The custom assertion participates in determining whether the match
succeeds or fails. Unlike actions, which execute lazily, custom assertions
execute immediately while the regex engine is searching for a match.]

Custom assertions can also be defined inline using the same syntax as for
semantic actions. Below is the same custom assertion written inline:

    // match words of 3 characters or 6 characters.
    sregex rx = (bow >> +_w >> eow)[ check(length(_)==3 || length(_)==6) ] ;

In the above, `length()` is a lazy function that calls the `length()` member
function of its argument, and `_` is a placeholder that receives the
`sub_match`.

Once you get the hang of writing custom assertions inline, they can be
very powerful. For example, you can write a regular expression that
only matches valid dates (for some suitably liberal definition of the
term ["valid]).

    // Maximum number of days in each month, January first.
    int const days_per_month[] =
        {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    mark_tag month(1), day(2);
    // find a valid date of the form month/day/year.
    sregex date =
        (
            // Month must be between 1 and 12 inclusive
            (month= _d >> !_d)     [ check(as<int>(_) >= 1
                                        && as<int>(_) <= 12) ]
        >>  '/'
            // Day must be between 1 and 31 inclusive
        >>  (day=   _d >> !_d)     [ check(as<int>(_) >= 1
                                        && as<int>(_) <= 31) ]
        >>  '/'
            // Only consider years between 1970 and 2038
        >>  (_d >> _d >> _d >> _d) [ check(as<int>(_) >= 1970
                                        && as<int>(_) <= 2038) ]
        )
        // Ensure the month actually has that many days!
        [ check( ref(days_per_month)[as<int>(month)-1] >= as<int>(day) ) ]
    ;

    smatch what;
    std::string str("99/99/9999 2/30/2006 2/28/2006");

    if(regex_search(str, what, date))
    {
        std::cout << what[0] << std::endl;
    }

The above program prints out the following:

[pre
2/28/2006
]

Notice how the inline custom assertions are used to range-check the values for
the month, day and year. The regular expression doesn't match `"99/99/9999"` or
`"2/30/2006"` because they are not valid dates. (There is no 99th month, and
February doesn't have 30 days.)

[endsect]
516 February doesn't have 30 days.)
517
518 [endsect]