[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:captures Understanding Marked Sub-Expressions and Captures]

Captures are the iterator ranges that are "captured" by marked
sub-expressions as a regular expression gets matched. Each marked
sub-expression can result in more than one capture, if it is matched
more than once. This document explains how captures and marked
sub-expressions in Boost.Regex are represented and accessed.

[h4 Marked sub-expressions]

Every time a Perl regular expression contains a parenthesis group `()`, it
produces an extra field, known as a marked sub-expression.
For example, the expression:

[pre (\w+)\W+(\w+)]

has two marked sub-expressions (known as $1 and $2 respectively). In
addition, the complete match is known as $&, everything before the
first match as $\`, and everything after the match as $'. So
if the above expression is searched for within `"@abc def--"`, then we obtain:

[table
[[Sub-expression][Text found]]
[[$\`]["@"]]
[[$&]["abc def"]]
[[$1]["abc"]]
[[$2]["def"]]
[[$']["--"]]
]

In Boost.Regex all these are accessible via the [match_results] class that
gets filled in when calling one of the regular expression matching algorithms
([regex_search], [regex_match], or [regex_iterator]). So given:

   boost::match_results<IteratorType> m;

The Perl and Boost.Regex equivalents are as follows:

[table
[[Perl][Boost.Regex]]
[[$\`][`m.prefix()`]]
[[$&][`m[0]`]]
[[$n][`m[n]`]]
[[$\'][`m.suffix()`]]
]

In Boost.Regex each sub-expression match is represented by a [sub_match] object.
This is basically just a pair of iterators denoting the start and end
position of the sub-expression match, but some additional
operators are provided so that objects of type [sub_match] behave a lot like a
`std::basic_string`: for example, they are implicitly convertible to a
`basic_string`, and they can be compared to a string, added to a string, or
streamed out to an output stream.
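
As a rough illustration of the Perl-to-Boost.Regex table above, the following
sketch splits a search result into the five Perl-style pieces. It is written
against `std::regex` so that it builds with the standard library alone;
`std::match_results` offers the same `prefix()`, `suffix()` and `operator[]`
members, so replacing `std::` with `boost::` and including `boost/regex.hpp`
should give the Boost.Regex form. The `MatchParts` struct and the `split_match`
function are hypothetical names introduced for this example only.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the pieces Perl calls $`, $&, $1, $2 and $'.
struct MatchParts {
    bool found = false;
    std::string prefix;   // text before the match
    std::string whole;    // the complete match
    std::string first;    // first marked sub-expression
    std::string second;   // second marked sub-expression
    std::string suffix;   // text after the match
};

// Hypothetical helper: search text for two words separated by non-word
// characters and copy each part of the result into a MatchParts.
MatchParts split_match(const std::string& text)
{
    static const std::regex e(R"((\w+)\W+(\w+))");
    std::smatch m;
    MatchParts parts;
    if (std::regex_search(text, m, e)) {
        parts.found  = true;
        parts.prefix = m.prefix().str();
        parts.whole  = m[0].str();
        parts.first  = m[1].str();
        parts.second = m[2].str();
        parts.suffix = m.suffix().str();
    }
    return parts;
}
```

Calling `split_match("@abc def--")` should reproduce the first table: a prefix
of `"@"`, a complete match of `"abc def"`, sub-expressions `"abc"` and `"def"`,
and a suffix of `"--"`.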

[h4 Unmatched Sub-Expressions]

When a regular expression match is found, not all of the marked
sub-expressions need to have participated in the match. For example, the expression:

[pre (abc)|(def)]

can match either $1 or $2, but never both at the same time. In Boost.Regex
you can determine which sub-expressions matched by accessing the
`sub_match::matched` data member.
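
A minimal sketch of this check follows. It assumes `std::regex`, whose
`sub_match` exposes the same `matched` member, so that it builds without extra
dependencies; substituting `boost::` for `std::` should behave the same way.
The function name `which_alternative` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: report which alternative of (abc)|(def) took part
// in the match. Returns 1 or 2, or 0 if the text does not match at all.
int which_alternative(const std::string& text)
{
    static const std::regex e("(abc)|(def)");
    std::smatch m;
    if (!std::regex_match(text, m, e))
        return 0;
    if (m[1].matched)   // $1 participated, so $2 did not
        return 1;
    if (m[2].matched)   // $2 participated, so $1 did not
        return 2;
    return 0;
}
```

Testing `matched` is more reliable than testing for an empty string, because a
sub-expression that did participate may legitimately have matched nothing.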

[h4 Repeated Captures]

When a marked sub-expression is repeated, the sub-expression gets
"captured" multiple times; normally, however, only the final capture is available.
For example, if

[pre (?:(\w+)\W*)+]

is matched against

[pre one fine day]

then $1 will contain the string "day", and all the previous captures will have
been forgotten.
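
The "last capture wins" behaviour can be observed with a short sketch. It
assumes `std::regex` as a dependency-free stand-in for Boost.Regex, and the
helper name `last_word_capture` is hypothetical. The sketch uses the pattern
`(?:(\w+)\W*)+`, with `\W*` rather than `\W+`, so that the last word needs no
trailing separator for the whole text to match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: match the whole text as a sequence of words and
// return what the repeated sub-expression $1 holds afterwards.
// Returns an empty string if the text does not match.
std::string last_word_capture(const std::string& text)
{
    static const std::regex e(R"((?:(\w+)\W*)+)");
    std::smatch m;
    if (std::regex_match(text, m, e))
        return m[1].str();   // only the final repetition is retained
    return std::string();
}
```

For the text `"one fine day"` the function returns `"day"`; the earlier
captures `"one"` and `"fine"` are not reachable through `m[1]`.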

However, Boost.Regex has an experimental feature that allows all the capture
information to be retained. It is accessed either via the
`match_results::captures` member function or the `sub_match::captures` member
function. These functions return a container that contains a sequence of all
the captures obtained during the regular expression matching. The following
example program shows how this information may be used:

   #include <boost/regex.hpp>
   #include <iostream>
   #include <string>

   // Match text against regx and print each sub-expression followed by
   // every capture recorded for it.
   void print_captures(const std::string& regx, const std::string& text)
   {
      boost::regex e(regx);
      boost::smatch what;
      std::cout << "Expression: \"" << regx << "\"\n";
      std::cout << "Text: \"" << text << "\"\n";
      // match_extra asks the matcher to retain every capture, not just the last.
      if(boost::regex_match(text, what, e, boost::match_extra))
      {
         unsigned i, j;
         std::cout << "** Match found **\n   Sub-Expressions:\n";
         for(i = 0; i < what.size(); ++i)
            std::cout << "      $" << i << " = \"" << what[i] << "\"\n";
         std::cout << "   Captures:\n";
         for(i = 0; i < what.size(); ++i)
         {
            std::cout << "      $" << i << " = {";
            for(j = 0; j < what.captures(i).size(); ++j)
            {
               // Separate entries with ", "; pad the first one with a space.
               if(j)
                  std::cout << ", ";
               else
                  std::cout << " ";
               std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
         }
      }
      else
      {
         std::cout << "** No Match found **\n";
      }
   }

   int main(int , char* [])
   {
      print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
      print_captures("(.*)bar|(.*)bah", "abcbar");
      print_captures("(.*)bar|(.*)bah", "abcbah");
      print_captures("^(?:(\\w+)|(?>\\W+))*$",
         "now is the time for all good men to come to the aid of the party");
      return 0;
   }

Which produces the following output:

[pre
Expression: "((\[\[:lower:\]\]+)|(\[\[:upper:\]\]+))+"
Text: "aBBcccDDDDDeeeeeeee"
'''**''' Match found '''**'''
   Sub-Expressions:
      $0 = "aBBcccDDDDDeeeeeeee"
      $1 = "eeeeeeee"
      $2 = "eeeeeeee"
      $3 = "DDDDD"
   Captures:
      $0 = { "aBBcccDDDDDeeeeeeee" }
      $1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
      $2 = { "a", "ccc", "eeeeeeee" }
      $3 = { "BB", "DDDDD" }
Expression: "(.'''*''')bar|(.'''*''')bah"
Text: "abcbar"
'''**''' Match found '''**'''
   Sub-Expressions:
      $0 = "abcbar"
      $1 = "abc"
      $2 = ""
   Captures:
      $0 = { "abcbar" }
      $1 = { "abc" }
      $2 = { }
Expression: "(.'''*''')bar|(.'''*''')bah"
Text: "abcbah"
'''**''' Match found '''**'''
   Sub-Expressions:
      $0 = "abcbah"
      $1 = ""
      $2 = "abc"
   Captures:
      $0 = { "abcbah" }
      $1 = { }
      $2 = { "abc" }
Expression: "^(?:(\w+)|(?>\W+))'''*$'''"
Text: "now is the time for all good men to come to the aid of the party"
'''**''' Match found '''**'''
   Sub-Expressions:
      $0 = "now is the time for all good men to come to the aid of the party"
      $1 = "party"
   Captures:
      $0 = { "now is the time for all good men to come to the aid of the party" }
      $1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to",
         "come", "to", "the", "aid", "of", "the", "party" }
]

Unfortunately, enabling this feature has an impact on performance
(even if you don't use it), and a much bigger impact if you do use it.
Therefore, to use this feature you need to:

* Define BOOST_REGEX_MATCH_EXTRA for all translation units including the library source (the best way to do this is to uncomment this define in boost/regex/user.hpp and then rebuild everything).
* Pass the match_extra flag to the particular algorithms where you actually need the captures information (regex_search, regex_match, or regex_iterator).

[endsect]