Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/ |
2 | / Copyright (c) 2008 Eric Niebler | |
3 | / | |
4 | / Distributed under the Boost Software License, Version 1.0. (See accompanying | |
5 | / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
6 | /] | |
7 | ||
8 | [section Introduction] | |
9 | ||
10 | [h2 What is xpressive?] | |
11 | ||
12 | xpressive is a regular expression template library. Regular expressions | |
13 | (regexes) can be written as strings that are parsed dynamically at runtime | |
14 | (dynamic regexes), or as ['expression templates][footnote See | |
15 | [@http://www.osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html | |
16 | Expression Templates]] that are parsed at compile-time (static regexes). | |
17 | Dynamic regexes have the advantage that they can be accepted from the user | |
18 | as input at runtime or read from an initialization file. Static regexes | |
19 | have several advantages. Since they are C++ expressions instead of | |
20 | strings, they can be syntax-checked at compile-time. Also, they can naturally | |
21 | refer to code and data elsewhere in your program, giving you the ability to call | |
22 | back into your code from within a regex match. Finally, since they are statically | |
23 | bound, the compiler can generate faster code for static regexes. | |
24 | ||
25 | xpressive's dual nature is unique and powerful. Static xpressive is a bit | |
26 | like the _spirit_fx_. Like _spirit_, you can build grammars with | |
27 | static regexes using expression templates. (Unlike _spirit_, xpressive does | |
28 | exhaustive backtracking, trying every possibility to find a match for your | |
29 | pattern.) Dynamic xpressive is a bit like _regexpp_. In fact, | |
30 | xpressive's interface should be familiar to anyone who has used _regexpp_. | |
31 | xpressive's innovation comes from allowing you to mix and match static and | |
32 | dynamic regexes in the same program, and even in the same expression! You | |
33 | can embed a dynamic regex in a static regex, or /vice versa/, and the embedded | |
34 | regex will participate fully in the search, back-tracking as needed to make | |
35 | the match succeed. | |
36 | ||
37 | [h2 Hello, world!] | |
38 | ||
39 | Enough theory. Let's have a look at ['Hello World], xpressive style: | |
40 | ||
41 | #include <iostream> | |
42 | #include <boost/xpressive/xpressive.hpp> | |
43 | ||
44 | using namespace boost::xpressive; | |
45 | ||
46 | int main() | |
47 | { | |
48 | std::string hello( "hello world!" ); | |
49 | ||
50 | sregex rex = sregex::compile( "(\\w+) (\\w+)!" ); | |
51 | smatch what; | |
52 | ||
53 | if( regex_match( hello, what, rex ) ) | |
54 | { | |
55 | std::cout << what[0] << '\n'; // whole match | |
56 | std::cout << what[1] << '\n'; // first capture | |
57 | std::cout << what[2] << '\n'; // second capture | |
58 | } | |
59 | ||
60 | return 0; | |
61 | } | |
62 | ||
63 | This program outputs the following: | |
64 | ||
65 | [pre | |
66 | hello world! | |
67 | hello | |
68 | world | |
69 | ] | |
70 | ||
71 | The first thing you'll notice about the code is that all the types in xpressive live in | |
72 | the `boost::xpressive` namespace. | |
73 | ||
74 | [note Most of the rest of the examples in this document will leave off the | |
75 | `using namespace boost::xpressive;` directive. Just pretend it's there.] | |
76 | ||
77 | Next, you'll notice the type of the regular expression object is `sregex`. If you are familiar | |
78 | with _regexpp_, this is different from what you are used to. The "`s`" in "`sregex`" stands for | |
79 | "`string`", indicating that this regex can be used to find patterns in `std::string` objects. | |
80 | I'll discuss this difference and its implications in detail later. | |
81 | ||
82 | Notice how the regex object is initialized: | |
83 | ||
84 | sregex rex = sregex::compile( "(\\w+) (\\w+)!" ); | |
85 | ||
86 | To create a regular expression object from a string, you must call a factory method such as | |
87 | _regex_compile_. This is another area in which xpressive differs from | |
88 | other object-oriented regular expression libraries. Other libraries encourage you to think of | |
89 | a regular expression as a kind of string on steroids. In xpressive, regular expressions are not | |
90 | strings; they are little programs in a domain-specific language. Strings are only one ['representation] | |
91 | of that language. Another representation is an expression template. For example, the above line of code | |
92 | is equivalent to the following: | |
93 | ||
94 | sregex rex = (s1= +_w) >> ' ' >> (s2= +_w) >> '!'; | |
95 | ||
96 | This describes the same regular expression, except it uses the domain-specific embedded language | |
97 | defined by static xpressive. | |
98 | ||
99 | As you can see, static regexes have a syntax that is noticeably different from standard Perl | |
100 | syntax. That is because we are constrained by C++'s syntax. The biggest difference is the use | |
101 | of `>>` to mean "followed by". For instance, in Perl you can just put sub-expressions next | |
102 | to each other: | |
103 | ||
104 | abc | |
105 | ||
106 | But in C++, there must be an operator separating sub-expressions: | |
107 | ||
108 | a >> b >> c | |
109 | ||
110 | In Perl, parentheses `()` have special meaning. They group, but as a side-effect they also create | |
111 | back-references like [^$1] and [^$2]. In C++, there is no way to overload grouping parentheses to give them | |
112 | side-effects. To get the same effect, we use the special `s1`, `s2`, etc. tokens. Assign to | |
113 | one to create a back-reference (known as a sub-match in xpressive). | |
114 | ||
115 | You'll also notice that the one-or-more repetition operator `+` has moved from postfix | |
116 | to prefix position. That's because C++ doesn't have a postfix `+` operator. So: | |
117 | ||
118 | "\\w+" | |
119 | ||
120 | is the same as: | |
121 | ||
122 | +_w | |
123 | ||
124 | We'll cover all the other differences [link boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes later]. | |
125 | ||
126 | [endsect] |