1<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4<title>Understanding Marked Sub-Expressions and Captures</title>
5<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
6<meta name="generator" content="DocBook XSL Stylesheets V1.77.1">
7<link rel="home" href="../index.html" title="Boost.Regex 5.1.2">
8<link rel="up" href="../index.html" title="Boost.Regex 5.1.2">
9<link rel="prev" href="unicode.html" title="Unicode and Boost.Regex">
10<link rel="next" href="partial_matches.html" title="Partial Matches">
11</head>
12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13<table cellpadding="2" width="100%"><tr>
14<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
15<td align="center"><a href="../../../../../index.html">Home</a></td>
16<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
17<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
18<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
19<td align="center"><a href="../../../../../more/index.htm">More</a></td>
20</tr></table>
21<hr>
22<div class="spirit-nav">
23<a accesskey="p" href="unicode.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="partial_matches.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
24</div>
25<div class="section">
26<div class="titlepage"><div><div><h2 class="title" style="clear: both">
27<a name="boost_regex.captures"></a><a class="link" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">Understanding Marked Sub-Expressions
28 and Captures</a>
29</h2></div></div></div>
30<p>
31 Captures are the iterator ranges that are "captured" by marked sub-expressions
32 as a regular expression gets matched. Each marked sub-expression can result
33 in more than one capture, if it is matched more than once. This document explains
34 how captures and marked sub-expressions in Boost.Regex are represented and
35 accessed.
36 </p>
37<h5>
38<a name="boost_regex.captures.h0"></a>
39 <span class="phrase"><a name="boost_regex.captures.marked_sub_expressions"></a></span><a class="link" href="captures.html#boost_regex.captures.marked_sub_expressions">Marked
40 sub-expressions</a>
41 </h5>
<p>
 Every time a Perl regular expression contains a parenthesis group <code class="computeroutput"><span class="special">()</span></code>, it produces an extra field, known as a
 marked sub-expression. For example, the expression:
 </p>
46<pre class="programlisting">(\w+)\W+(\w+)</pre>
<p>
 has two marked sub-expressions (known as $1 and $2 respectively). In addition,
 the complete match is known as $&amp;, everything before the match as
 $`, and everything after the match as $'. So if the above expression is searched
 for within <code class="computeroutput"><span class="string">"@abc def--"</span></code>,
 then we obtain:
 </p>
54<div class="informaltable"><table class="table">
55<colgroup>
56<col>
57<col>
58</colgroup>
59<thead><tr>
60<th>
61 <p>
62 Sub-expression
63 </p>
64 </th>
65<th>
66 <p>
67 Text found
68 </p>
69 </th>
70</tr></thead>
71<tbody>
72<tr>
73<td>
74 <p>
75 $`
76 </p>
77 </td>
78<td>
79 <p>
80 "@"
81 </p>
82 </td>
83</tr>
84<tr>
85<td>
86 <p>
87 $&amp;
88 </p>
89 </td>
90<td>
91 <p>
92 "abc def"
93 </p>
94 </td>
95</tr>
96<tr>
97<td>
98 <p>
99 $1
100 </p>
101 </td>
102<td>
103 <p>
104 "abc"
105 </p>
106 </td>
107</tr>
108<tr>
109<td>
110 <p>
111 $2
112 </p>
113 </td>
114<td>
115 <p>
116 "def"
117 </p>
118 </td>
119</tr>
120<tr>
121<td>
122 <p>
123 $'
124 </p>
125 </td>
126<td>
127 <p>
128 "--"
129 </p>
130 </td>
131</tr>
132</tbody>
133</table></div>
<p>
 In Boost.Regex all of these are accessible via the <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a> class, which is filled
 in when you call one of the regular expression matching algorithms (<a class="link" href="ref/regex_search.html" title="regex_search"><code class="computeroutput"><span class="identifier">regex_search</span></code></a>, <a class="link" href="ref/regex_match.html" title="regex_match"><code class="computeroutput"><span class="identifier">regex_match</span></code></a>, or <a class="link" href="ref/regex_iterator.html" title="regex_iterator"><code class="computeroutput"><span class="identifier">regex_iterator</span></code></a>). So given:
 </p>
138<pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_results</span><span class="special">&lt;</span><span class="identifier">IteratorType</span><span class="special">&gt;</span> <span class="identifier">m</span><span class="special">;</span>
139</pre>
140<p>
141 The Perl and Boost.Regex equivalents are as follows:
142 </p>
143<div class="informaltable"><table class="table">
144<colgroup>
145<col>
146<col>
147</colgroup>
148<thead><tr>
149<th>
150 <p>
151 Perl
152 </p>
153 </th>
154<th>
155 <p>
156 Boost.Regex
157 </p>
158 </th>
159</tr></thead>
160<tbody>
161<tr>
162<td>
163 <p>
164 $`
165 </p>
166 </td>
167<td>
168 <p>
169 <code class="computeroutput"><span class="identifier">m</span><span class="special">.</span><span class="identifier">prefix</span><span class="special">()</span></code>
170 </p>
171 </td>
172</tr>
173<tr>
174<td>
175 <p>
176 $&amp;
177 </p>
178 </td>
179<td>
180 <p>
181 <code class="computeroutput"><span class="identifier">m</span><span class="special">[</span><span class="number">0</span><span class="special">]</span></code>
182 </p>
183 </td>
184</tr>
185<tr>
186<td>
187 <p>
188 $n
189 </p>
190 </td>
191<td>
192 <p>
193 <code class="computeroutput"><span class="identifier">m</span><span class="special">[</span><span class="identifier">n</span><span class="special">]</span></code>
194 </p>
195 </td>
196</tr>
197<tr>
198<td>
199 <p>
200 $'
201 </p>
202 </td>
203<td>
204 <p>
205 <code class="computeroutput"><span class="identifier">m</span><span class="special">.</span><span class="identifier">suffix</span><span class="special">()</span></code>
206 </p>
207 </td>
208</tr>
209</tbody>
210</table></div>
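<p>
 The mapping in the table above can be sketched as follows. This is a minimal
 illustration rather than library code: it uses <code class="computeroutput">std::regex</code> from the standard
 <code class="computeroutput">&lt;regex&gt;</code> header so that it builds without Boost, on the assumption
 that the members shown (<code class="computeroutput">prefix()</code>, <code class="computeroutput">operator[]</code>
 and <code class="computeroutput">suffix()</code>) behave the same way in Boost.Regex when
 <code class="computeroutput">std::</code> is replaced by <code class="computeroutput">boost::</code> and
 <code class="computeroutput">&lt;boost/regex.hpp&gt;</code> is included instead. The names
 <code class="computeroutput">MatchParts</code> and <code class="computeroutput">split_two_words</code>
 are hypothetical helpers introduced only for this example.
 </p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper type: the pieces of one search result, copied into strings.
struct MatchParts {
    std::string prefix;  // Perl $`  (text before the match)
    std::string whole;   // Perl $&  (the complete match)
    std::string first;   // Perl $1
    std::string second;  // Perl $2
    std::string suffix;  // Perl $'  (text after the match)
};

// Searches `text` for two words separated by non-word characters.
// Returns false if nothing matches; otherwise fills `out` and returns true.
bool split_two_words(const std::string& text, MatchParts& out)
{
    static const std::regex e(R"((\w+)\W+(\w+))");
    std::smatch m;
    if (!std::regex_search(text, m, e))
        return false;
    out.prefix = m.prefix().str();
    out.whole  = m[0].str();
    out.first  = m[1].str();
    out.second = m[2].str();
    out.suffix = m.suffix().str();
    return true;
}
```

<p>
 Calling <code class="computeroutput">split_two_words("@abc def--", parts)</code> should fill
 <code class="computeroutput">parts</code> with the same five strings that appear in the first table
 of this section.
 </p>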
<p>
 In Boost.Regex each sub-expression match is represented by a <a class="link" href="ref/sub_match.html" title="sub_match"><code class="computeroutput"><span class="identifier">sub_match</span></code></a> object. This is basically
 just a pair of iterators denoting the start and end position of the sub-expression
 match, but some additional operators are provided so that objects of
 type <a class="link" href="ref/sub_match.html" title="sub_match"><code class="computeroutput"><span class="identifier">sub_match</span></code></a>
 behave a lot like a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span></code>: for example, they are implicitly
 convertible to a <code class="computeroutput"><span class="identifier">basic_string</span></code>,
 and they can be compared to a string, added to a string, or streamed out to an
 output stream.
 </p>
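<p>
 A short sketch of this string-like behaviour follows. As an assumption made
 for portability, it is written against <code class="computeroutput">std::sub_match</code> from the standard
 <code class="computeroutput">&lt;regex&gt;</code> header, which offers the same implicit conversion, comparison and
 streaming operators; string concatenation is left out because the standard type
 does not provide it. The function name <code class="computeroutput">describe_first_word</code>
 is hypothetical.
 </p>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: finds the first word in `text` and describes how it
// compares with `expected`. Returns an empty string if `text` has no word.
std::string describe_first_word(const std::string& text, const std::string& expected)
{
    static const std::regex e(R"((\w+))");
    std::smatch m;
    if (!std::regex_search(text, m, e))
        return std::string();

    const std::ssub_match& word = m[1];
    std::string copy = word;  // implicit conversion to std::string

    std::ostringstream os;
    os << word                                                   // streamed directly
       << (word == expected ? " matches " : " differs from ")    // compared with a string
       << expected << " (length " << copy.size() << ")";
    return os.str();
}
```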
221<h5>
222<a name="boost_regex.captures.h1"></a>
223 <span class="phrase"><a name="boost_regex.captures.unmatched_sub_expressions"></a></span><a class="link" href="captures.html#boost_regex.captures.unmatched_sub_expressions">Unmatched
224 Sub-Expressions</a>
225 </h5>
<p>
 When a regular expression match is found, not all of the marked
 sub-expressions need to have participated in the match. For example, the expression:
 </p>
230<pre class="programlisting">(abc)|(def)</pre>
231<p>
232 can match either $1 or $2, but never both at the same time. In Boost.Regex
233 you can determine which sub-expressions matched by accessing the <code class="computeroutput"><span class="identifier">sub_match</span><span class="special">::</span><span class="identifier">matched</span></code> data member.
234 </p>
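<p>
 The check described above might look like the following sketch. It assumes the
 standard <code class="computeroutput">&lt;regex&gt;</code> header, whose <code class="computeroutput">sub_match</code>
 has the same public <code class="computeroutput">matched</code> data member; the function name
 <code class="computeroutput">which_alternative</code> is hypothetical.
 </p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports which alternative of (abc)|(def) took part in
// the match. Returns 1 or 2 for the participating sub-expression, or 0 if the
// expression was not found in `text` at all.
int which_alternative(const std::string& text)
{
    static const std::regex e("(abc)|(def)");
    std::smatch m;
    if (!std::regex_search(text, m, e))
        return 0;
    if (m[1].matched)   // first alternative participated
        return 1;
    if (m[2].matched)   // second alternative participated
        return 2;
    return 0;
}
```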
235<h5>
236<a name="boost_regex.captures.h2"></a>
237 <span class="phrase"><a name="boost_regex.captures.repeated_captures"></a></span><a class="link" href="captures.html#boost_regex.captures.repeated_captures">Repeated
238 Captures</a>
239 </h5>
<p>
 When a marked sub-expression is repeated, the sub-expression gets "captured"
 multiple times; normally, however, only the final capture is available. For example,
 if
 </p>
245<pre class="programlisting">(?:(\w+)\W+)+</pre>
246<p>
247 is matched against
248 </p>
249<pre class="programlisting">one fine day</pre>
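<p>
 then $1 will contain the string "day" (assuming the text is followed by at least
 one non-word character, such as a newline, so that the final <code class="computeroutput">\W+</code> can match),
 and all the previous captures will have been forgotten.
 </p>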
<p>
 However, Boost.Regex has an experimental feature that allows all the capture
 information to be retained. It is accessed either via the <code class="computeroutput"><span class="identifier">match_results</span><span class="special">::</span><span class="identifier">captures</span></code> member function or the <code class="computeroutput"><span class="identifier">sub_match</span><span class="special">::</span><span class="identifier">captures</span></code> member function. These functions
 return a container that holds the sequence of all the captures obtained during
 the regular expression matching. The following example program shows how this
 information may be used:
 </p>
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">regex</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iostream</span><span class="special">&gt;</span>

<span class="keyword">void</span> <span class="identifier">print_captures</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">regx</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">text</span><span class="special">)</span>
<span class="special">{</span>
   <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="identifier">regx</span><span class="special">);</span>
   <span class="identifier">boost</span><span class="special">::</span><span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span>
   <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Expression: \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">regx</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
   <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Text: \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">text</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
   <span class="keyword">if</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">text</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">e</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_extra</span><span class="special">))</span>
   <span class="special">{</span>
      <span class="keyword">unsigned</span> <span class="identifier">i</span><span class="special">,</span> <span class="identifier">j</span><span class="special">;</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"** Match found **\n Sub-Expressions:\n"</span><span class="special">;</span>
      <span class="keyword">for</span><span class="special">(</span><span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" $"</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" = \""</span> <span class="special">&lt;&lt;</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">&lt;&lt;</span> <span class="string">"\"\n"</span><span class="special">;</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" Captures:\n"</span><span class="special">;</span>
      <span class="keyword">for</span><span class="special">(</span><span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
      <span class="special">{</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" $"</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" = {"</span><span class="special">;</span>
         <span class="keyword">for</span><span class="special">(</span><span class="identifier">j</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">j</span> <span class="special">&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">captures</span><span class="special">(</span><span class="identifier">i</span><span class="special">).</span><span class="identifier">size</span><span class="special">();</span> <span class="special">++</span><span class="identifier">j</span><span class="special">)</span>
         <span class="special">{</span>
            <span class="keyword">if</span><span class="special">(</span><span class="identifier">j</span><span class="special">)</span>
               <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">", "</span><span class="special">;</span>
            <span class="keyword">else</span>
               <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" "</span><span class="special">;</span>
            <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"\""</span> <span class="special">&lt;&lt;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">captures</span><span class="special">(</span><span class="identifier">i</span><span class="special">)[</span><span class="identifier">j</span><span class="special">]</span> <span class="special">&lt;&lt;</span> <span class="string">"\""</span><span class="special">;</span>
         <span class="special">}</span>
         <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">" }\n"</span><span class="special">;</span>
      <span class="special">}</span>
   <span class="special">}</span>
   <span class="keyword">else</span>
   <span class="special">{</span>
      <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"** No Match found **\n"</span><span class="special">;</span>
   <span class="special">}</span>
<span class="special">}</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">(</span><span class="keyword">int</span> <span class="special">,</span> <span class="keyword">char</span><span class="special">*</span> <span class="special">[])</span>
<span class="special">{</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(([[:lower:]]+)|([[:upper:]]+))+"</span><span class="special">,</span> <span class="string">"aBBcccDDDDDeeeeeeee"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(.*)bar|(.*)bah"</span><span class="special">,</span> <span class="string">"abcbar"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"(.*)bar|(.*)bah"</span><span class="special">,</span> <span class="string">"abcbah"</span><span class="special">);</span>
   <span class="identifier">print_captures</span><span class="special">(</span><span class="string">"^(?:(\\w+)|(?&gt;\\W+))*$"</span><span class="special">,</span>
      <span class="string">"now is the time for all good men to come to the aid of the party"</span><span class="special">);</span>
   <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span>
<span class="special">}</span>
</pre>
<p>
 This produces the following output:
 </p>
<pre class="programlisting">Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** Match found **
 Sub-Expressions:
 $0 = "aBBcccDDDDDeeeeeeee"
 $1 = "eeeeeeee"
 $2 = "eeeeeeee"
 $3 = "DDDDD"
 Captures:
 $0 = { "aBBcccDDDDDeeeeeeee" }
 $1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
 $2 = { "a", "ccc", "eeeeeeee" }
 $3 = { "BB", "DDDDD" }
Expression: "(.*)bar|(.*)bah"
Text: "abcbar"
** Match found **
 Sub-Expressions:
 $0 = "abcbar"
 $1 = "abc"
 $2 = ""
 Captures:
 $0 = { "abcbar" }
 $1 = { "abc" }
 $2 = { }
Expression: "(.*)bar|(.*)bah"
Text: "abcbah"
** Match found **
 Sub-Expressions:
 $0 = "abcbah"
 $1 = ""
 $2 = "abc"
 Captures:
 $0 = { "abcbah" }
 $1 = { }
 $2 = { "abc" }
Expression: "^(?:(\w+)|(?&gt;\W+))*$"
Text: "now is the time for all good men to come to the aid of the party"
** Match found **
 Sub-Expressions:
 $0 = "now is the time for all good men to come to the aid of the party"
 $1 = "party"
 Captures:
 $0 = { "now is the time for all good men to come to the aid of the party" }
 $1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to",
 "come", "to", "the", "aid", "of", "the", "party" }
</pre>
<p>
 Unfortunately, enabling this feature has an impact on performance even if you
 don't use it, and a much bigger impact if you do. Therefore, to use
 this feature you need to:
 </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
 Define BOOST_REGEX_MATCH_EXTRA for all translation units, including the
 library source (the best way to do this is to uncomment this define in
 boost/regex/user.hpp and then rebuild everything).
 </li>
<li class="listitem">
 Pass the match_extra flag to the particular algorithms where you actually
 need the captures information (regex_search, regex_match, or regex_iterator).
 </li>
</ul></div>
372</div>
373<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
374<td align="left"></td>
375<td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
376 Distributed under the Boost Software License, Version 1.0. (See accompanying
377 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
378 </p>
379</div></td>
380</tr></table>
381<hr>
382<div class="spirit-nav">
383<a accesskey="p" href="unicode.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="partial_matches.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
384</div>
385</body>
386</html>