1 <html>
2 <head>
3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
4 <title>POSIX Extended Regular Expression Syntax</title>
5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
6 <meta name="generator" content="DocBook XSL Stylesheets V1.77.1">
7 <link rel="home" href="../../index.html" title="Boost.Regex 5.1.2">
8 <link rel="up" href="../syntax.html" title="Regular Expression Syntax">
9 <link rel="prev" href="perl_syntax.html" title="Perl Regular Expression Syntax">
10 <link rel="next" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax">
11 </head>
12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
25 <div class="section">
26 <div class="titlepage"><div><div><h3 class="title">
27 <a name="boost_regex.syntax.basic_extended"></a><a class="link" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">POSIX Extended Regular
28 Expression Syntax</a>
29 </h3></div></div></div>
30 <h4>
31 <a name="boost_regex.syntax.basic_extended.h0"></a>
32 <span class="phrase"><a name="boost_regex.syntax.basic_extended.synopsis"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.synopsis">Synopsis</a>
33 </h4>
34 <p>
35 The POSIX-Extended regular expression syntax is supported by the POSIX C
36 regular expression APIs, and variations are used by the utilities <code class="computeroutput"><span class="identifier">egrep</span></code> and <code class="computeroutput"><span class="identifier">awk</span></code>.
37 You can construct POSIX extended regular expressions in Boost.Regex by passing
38 the flag <code class="computeroutput"><span class="identifier">extended</span></code> to the
39 regex constructor, for example:
40 </p>
41 <pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Extended expression:</span>
42 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">);</span>
43 <span class="comment">// e2 is a case insensitive POSIX-Extended expression:</span>
44 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
45 </pre>
46 <a name="boost_regex.posix_extended_syntax"></a><h4>
47 <a name="boost_regex.syntax.basic_extended.h1"></a>
48 <span class="phrase"><a name="boost_regex.syntax.basic_extended.posix_extended_syntax"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.posix_extended_syntax">POSIX Extended
49 Syntax</a>
50 </h4>
51 <p>
52 In POSIX-Extended regular expressions, all characters match themselves except
53 for the following special characters:
54 </p>
55 <pre class="programlisting">.[{}()\*+?|^$</pre>
56 <h5>
57 <a name="boost_regex.syntax.basic_extended.h2"></a>
58 <span class="phrase"><a name="boost_regex.syntax.basic_extended.wildcard"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.wildcard">Wildcard:</a>
59 </h5>
60 <p>
61 The single character '.' when used outside of a character set will match
62 any single character except:
63 </p>
64 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
65 <li class="listitem">
66 The NUL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code>
67 is passed to the matching algorithms.
68 </li>
69 <li class="listitem">
70 The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code>
71 is passed to the matching algorithms.
72 </li>
73 </ul></div>
74 <h5>
75 <a name="boost_regex.syntax.basic_extended.h3"></a>
76 <span class="phrase"><a name="boost_regex.syntax.basic_extended.anchors"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.anchors">Anchors:</a>
77 </h5>
78 <p>
79 A '^' character shall match the start of a line when used as the first character
80 of an expression, or the first character of a sub-expression.
81 </p>
82 <p>
83 A '$' character shall match the end of a line when used as the last character
84 of an expression, or the last character of a sub-expression.
85 </p>
86 <h5>
87 <a name="boost_regex.syntax.basic_extended.h4"></a>
88 <span class="phrase"><a name="boost_regex.syntax.basic_extended.marked_sub_expressions"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.marked_sub_expressions">Marked
89 sub-expressions:</a>
90 </h5>
91 <p>
92 A section beginning <code class="computeroutput"><span class="special">(</span></code> and ending
93 <code class="computeroutput"><span class="special">)</span></code> acts as a marked sub-expression.
94 Whatever matched the sub-expression is split out in a separate field by the
95 matching algorithms. Marked sub-expressions can also be repeated, or referred
96 to by a back-reference.
97 </p>
98 <h5>
99 <a name="boost_regex.syntax.basic_extended.h5"></a>
100 <span class="phrase"><a name="boost_regex.syntax.basic_extended.repeats"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.repeats">Repeats:</a>
101 </h5>
102 <p>
103 Any atom (a single character, a marked sub-expression, or a character class)
104 can be repeated with the <code class="computeroutput"><span class="special">*</span></code>,
105 <code class="computeroutput"><span class="special">+</span></code>, <code class="computeroutput"><span class="special">?</span></code>,
106 and <code class="computeroutput"><span class="special">{}</span></code> operators.
107 </p>
108 <p>
109 The <code class="computeroutput"><span class="special">*</span></code> operator will match the
110 preceding atom <span class="emphasis"><em>zero or more times</em></span>, for example the expression
111 <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span><span class="identifier">b</span></code> will match any of the following:
112 </p>
113 <pre class="programlisting">b
114 ab
115 aaaaaaaab
116 </pre>
117 <p>
118 The <code class="computeroutput"><span class="special">+</span></code> operator will match the
119 preceding atom <span class="emphasis"><em>one or more times</em></span>, for example the expression
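120 <code class="computeroutput"><span class="identifier">a</span><span class="special">+</span><span class="identifier">b</span></code> will match any of the following: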
121 </p>
122 <pre class="programlisting">ab
123 aaaaaaaab
124 </pre>
125 <p>
126 But will not match:
127 </p>
128 <pre class="programlisting">b
129 </pre>
130 <p>
131 The <code class="computeroutput"><span class="special">?</span></code> operator will match the
132 preceding atom <span class="emphasis"><em>zero or one times</em></span>, for example the expression
133 <code class="computeroutput"><span class="identifier">ca</span><span class="special">?</span><span class="identifier">b</span></code> will match any of the following:
134 </p>
135 <pre class="programlisting">cb
136 cab
137 </pre>
138 <p>
139 But will not match:
140 </p>
141 <pre class="programlisting">caab
142 </pre>
143 <p>
144 An atom can also be repeated with a bounded repeat:
145 </p>
146 <p>
147 <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">}</span></code> Matches
148 'a' repeated <span class="emphasis"><em>exactly n times</em></span>.
149 </p>
150 <p>
151 <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,}</span></code> Matches
152 'a' repeated <span class="emphasis"><em>n or more times</em></span>.
153 </p>
154 <p>
155 <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">m</span><span class="special">}</span></code> Matches 'a' repeated <span class="emphasis"><em>between n
156 and m times inclusive</em></span>.
157 </p>
158 <p>
159 For example:
160 </p>
161 <pre class="programlisting">^a{2,3}$</pre>
162 <p>
163 Will match either of:
164 </p>
165 <pre class="programlisting"><span class="identifier">aa</span>
166 <span class="identifier">aaa</span>
167 </pre>
168 <p>
169 But neither of:
170 </p>
171 <pre class="programlisting"><span class="identifier">a</span>
172 <span class="identifier">aaaa</span>
173 </pre>
174 <p>
175 It is an error to use a repeat operator if the preceding construct cannot
176 be repeated, for example:
177 </p>
178 <pre class="programlisting"><span class="identifier">a</span><span class="special">(*)</span>
179 </pre>
180 <p>
181 Will raise an error, as there is nothing for the <code class="computeroutput"><span class="special">*</span></code>
182 operator to be applied to.
183 </p>
184 <h5>
185 <a name="boost_regex.syntax.basic_extended.h6"></a>
186 <span class="phrase"><a name="boost_regex.syntax.basic_extended.back_references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.back_references">Back
187 references:</a>
188 </h5>
189 <p>
190 An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span>
191 is in the range 1-9, matches the same string that was matched by sub-expression
192 <span class="emphasis"><em>n</em></span>. For example the expression:
193 </p>
194 <pre class="programlisting">^(a*)b+\1$</pre>
195 <p>
196 Will match the string:
197 </p>
198 <pre class="programlisting"><span class="identifier">aaabbaaa</span>
199 </pre>
200 <p>
201 But not the string:
202 </p>
203 <pre class="programlisting"><span class="identifier">aaabba</span>
204 </pre>
205 <div class="caution"><table border="0" summary="Caution">
206 <tr>
207 <td rowspan="2" align="center" valign="top" width="25"><img alt="[Caution]" src="../../../../../../doc/src/images/caution.png"></td>
208 <th align="left">Caution</th>
209 </tr>
210 <tr><td align="left" valign="top"><p>
211 The POSIX standard does not support back-references for "extended"
212 regular expressions; this is a compatible extension to that standard.
213 </p></td></tr>
214 </table></div>
215 <h5>
216 <a name="boost_regex.syntax.basic_extended.h7"></a>
217 <span class="phrase"><a name="boost_regex.syntax.basic_extended.alternation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.alternation">Alternation</a>
218 </h5>
219 <p>
220 The <code class="computeroutput"><span class="special">|</span></code> operator will match either
221 of its arguments, so for example: <code class="computeroutput"><span class="identifier">abc</span><span class="special">|</span><span class="identifier">def</span></code> will
222 match either "abc" or "def".
223 </p>
224 <p>
225 Parentheses can be used to group alternations, for example: <code class="computeroutput"><span class="identifier">ab</span><span class="special">(</span><span class="identifier">d</span><span class="special">|</span><span class="identifier">ef</span><span class="special">)</span></code>
226 will match either of "abd" or "abef".
227 </p>
228 <h5>
229 <a name="boost_regex.syntax.basic_extended.h8"></a>
230 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_sets"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_sets">Character
231 sets:</a>
232 </h5>
233 <p>
234 A character set is a bracket-expression starting with [ and ending with ].
235 It defines a set of characters, and matches any single character that is
236 a member of that set.
237 </p>
238 <p>
239 A bracket expression may contain any combination of the following:
240 </p>
241 <h6>
242 <a name="boost_regex.syntax.basic_extended.h9"></a>
243 <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_characters"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_characters">Single
244 characters:</a>
245 </h6>
246 <p>
247 For example <code class="computeroutput"><span class="special">[</span><span class="identifier">abc</span><span class="special">]</span></code>, will match any of the characters 'a', 'b',
248 or 'c'.
249 </p>
250 <h6>
251 <a name="boost_regex.syntax.basic_extended.h10"></a>
252 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_ranges"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_ranges">Character
253 ranges:</a>
254 </h6>
255 <p>
256 For example <code class="computeroutput"><span class="special">[</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code>
257 will match any single character in the range 'a' to 'c'. By default, for
258 POSIX-Extended regular expressions, a character <span class="emphasis"><em>x</em></span> is
259 within the range <span class="emphasis"><em>y</em></span> to <span class="emphasis"><em>z</em></span>, if it
260 collates within that range; this results in locale specific behavior. This
261 behavior can be turned off by unsetting the <code class="computeroutput"><span class="identifier">collate</span></code>
262 <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">option flag</a> - in
263 which case whether a character appears within a range is determined by comparing
264 the code points of the characters only.
265 </p>
266 <h6>
267 <a name="boost_regex.syntax.basic_extended.h11"></a>
268 <span class="phrase"><a name="boost_regex.syntax.basic_extended.negation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.negation">Negation:</a>
269 </h6>
270 <p>
271 If the bracket-expression begins with the ^ character, then it matches the
272 complement of the characters it contains, for example <code class="computeroutput"><span class="special">[^</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> matches any character that is not in the
273 range <code class="computeroutput"><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span></code>.
274 </p>
275 <h6>
276 <a name="boost_regex.syntax.basic_extended.h12"></a>
277 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_classes">Character
278 classes:</a>
279 </h6>
280 <p>
281 An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code>
282 matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See
283 <a class="link" href="character_classes.html" title="Character Class Names">character class names</a>.
284 </p>
285 <h6>
286 <a name="boost_regex.syntax.basic_extended.h13"></a>
287 <span class="phrase"><a name="boost_regex.syntax.basic_extended.collating_elements"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.collating_elements">Collating
288 Elements:</a>
289 </h6>
290 <p>
291 An expression of the form <code class="computeroutput"><span class="special">[[.</span><span class="identifier">col</span><span class="special">.]]</span></code> matches
292 the collating element <span class="emphasis"><em>col</em></span>. A collating element is any
293 single character, or any sequence of characters that collates as a single
294 unit. Collating elements may also be used as the end point of a range, for
295 example: <code class="computeroutput"><span class="special">[[.</span><span class="identifier">ae</span><span class="special">.]-</span><span class="identifier">c</span><span class="special">]</span></code>
296 matches the character sequence "ae", plus any single character
297 in the range "ae"-c, assuming that "ae" is treated as
298 a single collating element in the current locale.
299 </p>
300 <p>
301 Collating elements may be used in place of escapes (which are not normally
302 allowed inside character sets), for example <code class="computeroutput"><span class="special">[[.^.]</span><span class="identifier">abc</span><span class="special">]</span></code> would
303 match any one of the characters 'a', 'b', 'c' or '^'.
304 </p>
305 <p>
306 As an extension, a collating element may also be specified via its <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>, for example:
307 </p>
308 <pre class="programlisting"><span class="special">[[.</span><span class="identifier">NUL</span><span class="special">.]]</span>
309 </pre>
310 <p>
311 matches a NUL character.
312 </p>
313 <h6>
314 <a name="boost_regex.syntax.basic_extended.h14"></a>
315 <span class="phrase"><a name="boost_regex.syntax.basic_extended.equivalence_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.equivalence_classes">Equivalence
316 classes:</a>
317 </h6>
318 <p>
319 An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>
320 matches any character or collating element whose primary sort key is the
321 same as that for collating element <span class="emphasis"><em>col</em></span>. As with collating
322 elements, the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">symbolic
323 name</a>. A primary sort key is one that ignores case, accentuation, or
324 locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches
325 any of the characters: a, &#192;, &#193;, &#194;, &#195;, &#196;, &#197;, A, &#224;, &#225;, &#226;, &#227;, &#228; and &#229;. Unfortunately implementation
326 of this is reliant on the platform's collation and localisation support;
327 this feature can not be relied upon to work portably across all platforms,
328 or even all locales on one platform.
329 </p>
330 <h6>
331 <a name="boost_regex.syntax.basic_extended.h15"></a>
332 <span class="phrase"><a name="boost_regex.syntax.basic_extended.combinations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.combinations">Combinations:</a>
333 </h6>
334 <p>
335 All of the above can be combined in one character set declaration, for example:
336 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]]</span></code>.
337 </p>
338 <h5>
339 <a name="boost_regex.syntax.basic_extended.h16"></a>
340 <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes">Escapes</a>
341 </h5>
342 <p>
343 The POSIX standard defines no escape sequences for POSIX-Extended regular
344 expressions, except that:
345 </p>
346 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
347 <li class="listitem">
348 Any special character preceded by an escape shall match itself.
349 </li>
350 <li class="listitem">
351 The effect of any ordinary character being preceded by an escape is undefined.
352 </li>
353 <li class="listitem">
354 An escape inside a character class declaration shall match itself: in
355 other words the escape character is not "special" inside a
356 character class declaration; so <code class="computeroutput"><span class="special">[\^]</span></code>
357 will match either a literal '\' or a '^'.
358 </li>
359 </ul></div>
360 <p>
361 However, that's rather restrictive, so the following standard-compatible
362 extensions are also supported by Boost.Regex:
363 </p>
364 <h6>
365 <a name="boost_regex.syntax.basic_extended.h17"></a>
366 <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes_matching_a_specific_char"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes_matching_a_specific_char">Escapes
367 matching a specific character</a>
368 </h6>
369 <p>
370 The following escape sequences are all synonyms for single characters:
371 </p>
372 <div class="informaltable"><table class="table">
373 <colgroup>
374 <col>
375 <col>
376 </colgroup>
377 <thead><tr>
378 <th>
379 <p>
380 Escape
381 </p>
382 </th>
383 <th>
384 <p>
385 Character
386 </p>
387 </th>
388 </tr></thead>
389 <tbody>
390 <tr>
391 <td>
392 <p>
393 \a
394 </p>
395 </td>
396 <td>
397 <p>
398 '\a'
399 </p>
400 </td>
401 </tr>
402 <tr>
403 <td>
404 <p>
405 \e
406 </p>
407 </td>
408 <td>
409 <p>
410 0x1B
411 </p>
412 </td>
413 </tr>
414 <tr>
415 <td>
416 <p>
417 \f
418 </p>
419 </td>
420 <td>
421 <p>
422 \f
423 </p>
424 </td>
425 </tr>
426 <tr>
427 <td>
428 <p>
429 \n
430 </p>
431 </td>
432 <td>
433 <p>
434 \n
435 </p>
436 </td>
437 </tr>
438 <tr>
439 <td>
440 <p>
441 \r
442 </p>
443 </td>
444 <td>
445 <p>
446 \r
447 </p>
448 </td>
449 </tr>
450 <tr>
451 <td>
452 <p>
453 \t
454 </p>
455 </td>
456 <td>
457 <p>
458 \t
459 </p>
460 </td>
461 </tr>
462 <tr>
463 <td>
464 <p>
465 \v
466 </p>
467 </td>
468 <td>
469 <p>
470 \v
471 </p>
472 </td>
473 </tr>
474 <tr>
475 <td>
476 <p>
477 \b
478 </p>
479 </td>
480 <td>
481 <p>
482 \b (but only inside a character class declaration).
483 </p>
484 </td>
485 </tr>
486 <tr>
487 <td>
488 <p>
489 \cX
490 </p>
491 </td>
492 <td>
493 <p>
494 An ASCII escape sequence - the character whose code point is X
495 % 32
496 </p>
497 </td>
498 </tr>
499 <tr>
500 <td>
501 <p>
502 \xdd
503 </p>
504 </td>
505 <td>
506 <p>
507 A hexadecimal escape sequence - matches the single character whose
508 code point is 0xdd.
509 </p>
510 </td>
511 </tr>
512 <tr>
513 <td>
514 <p>
515 \x{dddd}
516 </p>
517 </td>
518 <td>
519 <p>
520 A hexadecimal escape sequence - matches the single character whose
521 code point is 0xdddd.
522 </p>
523 </td>
524 </tr>
525 <tr>
526 <td>
527 <p>
528 \0ddd
529 </p>
530 </td>
531 <td>
532 <p>
533 An octal escape sequence - matches the single character whose code
534 point is 0ddd.
535 </p>
536 </td>
537 </tr>
538 <tr>
539 <td>
540 <p>
541 \N{Name}
542 </p>
543 </td>
544 <td>
545 <p>
546 Matches the single character which has the symbolic name <span class="emphasis"><em>Name</em></span>.
547 For example <code class="computeroutput"><span class="special">\</span><span class="identifier">N</span><span class="special">{</span><span class="identifier">newline</span><span class="special">}</span></code> matches the single character \n.
548 </p>
549 </td>
550 </tr>
551 </tbody>
552 </table></div>
553 <h6>
554 <a name="boost_regex.syntax.basic_extended.h18"></a>
555 <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_character_character_class"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_character_character_class">"Single
556 character" character classes:</a>
557 </h6>
558 <p>
559 Any escaped character <span class="emphasis"><em>x</em></span>, if <span class="emphasis"><em>x</em></span> is
560 the name of a character class, shall match any character that is a member
561 of that class. The corresponding escaped upper case character <span class="emphasis"><em>X</em></span>
562 shall match any character that is not a member of that class.
563 </p>
564 <p>
565 The following are supported by default:
566 </p>
567 <div class="informaltable"><table class="table">
568 <colgroup>
569 <col>
570 <col>
571 </colgroup>
572 <thead><tr>
573 <th>
574 <p>
575 Escape sequence
576 </p>
577 </th>
578 <th>
579 <p>
580 Equivalent to
581 </p>
582 </th>
583 </tr></thead>
584 <tbody>
585 <tr>
586 <td>
587 <p>
588 <code class="computeroutput"><span class="special">\</span><span class="identifier">d</span></code>
589 </p>
590 </td>
591 <td>
592 <p>
593 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]]</span></code>
594 </p>
595 </td>
596 </tr>
597 <tr>
598 <td>
599 <p>
600 <code class="computeroutput"><span class="special">\</span><span class="identifier">l</span></code>
601 </p>
602 </td>
603 <td>
604 <p>
605 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code>
606 </p>
607 </td>
608 </tr>
609 <tr>
610 <td>
611 <p>
612 <code class="computeroutput"><span class="special">\</span><span class="identifier">s</span></code>
613 </p>
614 </td>
615 <td>
616 <p>
617 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">space</span><span class="special">:]]</span></code>
618 </p>
619 </td>
620 </tr>
621 <tr>
622 <td>
623 <p>
624 <code class="computeroutput"><span class="special">\</span><span class="identifier">u</span></code>
625 </p>
626 </td>
627 <td>
628 <p>
629 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">upper</span><span class="special">:]]</span></code>
630 </p>
631 </td>
632 </tr>
633 <tr>
634 <td>
635 <p>
636 <code class="computeroutput"><span class="special">\</span><span class="identifier">w</span></code>
637 </p>
638 </td>
639 <td>
640 <p>
641 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">word</span><span class="special">:]]</span></code>
642 </p>
643 </td>
644 </tr>
645 <tr>
646 <td>
647 <p>
648 <code class="computeroutput"><span class="special">\</span><span class="identifier">D</span></code>
649 </p>
650 </td>
651 <td>
652 <p>
653 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">digit</span><span class="special">:]]</span></code>
654 </p>
655 </td>
656 </tr>
657 <tr>
658 <td>
659 <p>
660 <code class="computeroutput"><span class="special">\</span><span class="identifier">L</span></code>
661 </p>
662 </td>
663 <td>
664 <p>
665 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">lower</span><span class="special">:]]</span></code>
666 </p>
667 </td>
668 </tr>
669 <tr>
670 <td>
671 <p>
672 <code class="computeroutput"><span class="special">\</span><span class="identifier">S</span></code>
673 </p>
674 </td>
675 <td>
676 <p>
677 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">space</span><span class="special">:]]</span></code>
678 </p>
679 </td>
680 </tr>
681 <tr>
682 <td>
683 <p>
684 <code class="computeroutput"><span class="special">\</span><span class="identifier">U</span></code>
685 </p>
686 </td>
687 <td>
688 <p>
689 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">upper</span><span class="special">:]]</span></code>
690 </p>
691 </td>
692 </tr>
693 <tr>
694 <td>
695 <p>
696 <code class="computeroutput"><span class="special">\</span><span class="identifier">W</span></code>
697 </p>
698 </td>
699 <td>
700 <p>
701 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">word</span><span class="special">:]]</span></code>
702 </p>
703 </td>
704 </tr>
705 </tbody>
706 </table></div>
707 <h6>
708 <a name="boost_regex.syntax.basic_extended.h19"></a>
709 <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_properties"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_properties">Character
710 Properties</a>
711 </h6>
712 <p>
713 The character property names in the following table are all equivalent to
714 the names used in character classes.
715 </p>
716 <div class="informaltable"><table class="table">
717 <colgroup>
718 <col>
719 <col>
720 <col>
721 </colgroup>
722 <thead><tr>
723 <th>
724 <p>
725 Form
726 </p>
727 </th>
728 <th>
729 <p>
730 Description
731 </p>
732 </th>
733 <th>
734 <p>
735 Equivalent character set form
736 </p>
737 </th>
738 </tr></thead>
739 <tbody>
740 <tr>
741 <td>
742 <p>
743 <code class="computeroutput"><span class="special">\</span><span class="identifier">pX</span></code>
744 </p>
745 </td>
746 <td>
747 <p>
748 Matches any character that has the property X.
749 </p>
750 </td>
751 <td>
752 <p>
753 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">X</span><span class="special">:]]</span></code>
754 </p>
755 </td>
756 </tr>
757 <tr>
758 <td>
759 <p>
760 <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code>
761 </p>
762 </td>
763 <td>
764 <p>
765 Matches any character that has the property Name.
766 </p>
767 </td>
768 <td>
769 <p>
770 <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Name</span><span class="special">:]]</span></code>
771 </p>
772 </td>
773 </tr>
774 <tr>
775 <td>
776 <p>
777 <code class="computeroutput"><span class="special">\</span><span class="identifier">PX</span></code>
778 </p>
779 </td>
780 <td>
781 <p>
782 Matches any character that does not have the property X.
783 </p>
784 </td>
785 <td>
786 <p>
787 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">X</span><span class="special">:]]</span></code>
788 </p>
789 </td>
790 </tr>
791 <tr>
792 <td>
793 <p>
794 <code class="computeroutput"><span class="special">\</span><span class="identifier">P</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code>
795 </p>
796 </td>
797 <td>
798 <p>
799 Matches any character that does not have the property Name.
800 </p>
801 </td>
802 <td>
803 <p>
804 <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">Name</span><span class="special">:]]</span></code>
805 </p>
806 </td>
807 </tr>
808 </tbody>
809 </table></div>
810 <p>
811 For example, <code class="computeroutput"><span class="special">\</span><span class="identifier">pd</span></code>
812 matches any "digit" character, as does <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">digit</span><span class="special">}</span></code>.
813 </p>
814 <h6>
815 <a name="boost_regex.syntax.basic_extended.h20"></a>
816 <span class="phrase"><a name="boost_regex.syntax.basic_extended.word_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.word_boundaries">Word
817 Boundaries</a>
818 </h6>
819 <p>
820 The following escape sequences match the boundaries of words:
821 </p>
822 <div class="informaltable"><table class="table">
823 <colgroup>
824 <col>
825 <col>
826 </colgroup>
827 <thead><tr>
828 <th>
829 <p>
830 Escape
831 </p>
832 </th>
833 <th>
834 <p>
835 Meaning
836 </p>
837 </th>
838 </tr></thead>
839 <tbody>
840 <tr>
841 <td>
842 <p>
843 <code class="computeroutput"><span class="special">\&lt;</span></code>
844 </p>
845 </td>
846 <td>
847 <p>
848 Matches the start of a word.
849 </p>
850 </td>
851 </tr>
852 <tr>
853 <td>
854 <p>
855 <code class="computeroutput"><span class="special">\&gt;</span></code>
856 </p>
857 </td>
858 <td>
859 <p>
860 Matches the end of a word.
861 </p>
862 </td>
863 </tr>
864 <tr>
865 <td>
866 <p>
867 <code class="computeroutput"><span class="special">\</span><span class="identifier">b</span></code>
868 </p>
869 </td>
870 <td>
871 <p>
872 Matches a word boundary (the start or end of a word).
873 </p>
874 </td>
875 </tr>
876 <tr>
877 <td>
878 <p>
879 <code class="computeroutput"><span class="special">\</span><span class="identifier">B</span></code>
880 </p>
881 </td>
882 <td>
883 <p>
884 Matches only when not at a word boundary.
885 </p>
886 </td>
887 </tr>
888 </tbody>
889 </table></div>
890 <h6>
891 <a name="boost_regex.syntax.basic_extended.h21"></a>
892 <span class="phrase"><a name="boost_regex.syntax.basic_extended.buffer_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.buffer_boundaries">Buffer
893 boundaries</a>
894 </h6>
895 <p>
896 The following match only at buffer boundaries: a "buffer" in this
897 context is the whole of the input text that is being matched against (note
898 that ^ and $ may match embedded newlines within the text).
899 </p>
900 <div class="informaltable"><table class="table">
901 <colgroup>
902 <col>
903 <col>
904 </colgroup>
905 <thead><tr>
906 <th>
907 <p>
908 Escape
909 </p>
910 </th>
911 <th>
912 <p>
913 Meaning
914 </p>
915 </th>
916 </tr></thead>
917 <tbody>
918 <tr>
919 <td>
920 <p>
921 \`
922 </p>
923 </td>
924 <td>
925 <p>
926 Matches at the start of a buffer only.
927 </p>
928 </td>
929 </tr>
930 <tr>
931 <td>
932 <p>
933 \'
934 </p>
935 </td>
936 <td>
937 <p>
938 Matches at the end of a buffer only.
939 </p>
940 </td>
941 </tr>
942 <tr>
943 <td>
944 <p>
945 <code class="computeroutput"><span class="special">\</span><span class="identifier">A</span></code>
946 </p>
947 </td>
948 <td>
949 <p>
950 Matches at the start of a buffer only (the same as \`).
951 </p>
952 </td>
953 </tr>
954 <tr>
955 <td>
956 <p>
957 <code class="computeroutput"><span class="special">\</span><span class="identifier">z</span></code>
958 </p>
959 </td>
960 <td>
961 <p>
962 Matches at the end of a buffer only (the same as \').
963 </p>
964 </td>
965 </tr>
966 <tr>
967 <td>
968 <p>
969 <code class="computeroutput"><span class="special">\</span><span class="identifier">Z</span></code>
970 </p>
971 </td>
972 <td>
973 <p>
974 Matches an optional sequence of newlines at the end of a buffer:
975 equivalent to the regular expression <code class="computeroutput"><span class="special">\</span><span class="identifier">n</span><span class="special">*\</span><span class="identifier">z</span></code>
976 </p>
977 </td>
978 </tr>
979 </tbody>
980 </table></div>
981 <h6>
982 <a name="boost_regex.syntax.basic_extended.h22"></a>
983 <span class="phrase"><a name="boost_regex.syntax.basic_extended.continuation_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.continuation_escape">Continuation
984 Escape</a>
985 </h6>
986 <p>
987 The sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">G</span></code>
988 matches only at the end of the last match found, or at the start of the text
989 being matched if no previous match was found. This escape is useful if you're
990 iterating over the matches contained within a text, and you want each subsequent
991 match to start where the last one ended.
992 </p>
993 <h6>
994 <a name="boost_regex.syntax.basic_extended.h23"></a>
995 <span class="phrase"><a name="boost_regex.syntax.basic_extended.quoting_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.quoting_escape">Quoting
996 escape</a>
997 </h6>
998 <p>
999 The escape sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span></code>
1000 begins a "quoted sequence": all the subsequent characters are treated
1001 as literals, until either the end of the regular expression or <code class="computeroutput"><span class="special">\</span><span class="identifier">E</span></code> is found.
1002 For example the expression: <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span><span class="special">\*+\</span><span class="identifier">Ea</span><span class="special">+</span></code> would match either of:
1003 </p>
1004 <pre class="programlisting"><span class="special">\*+</span><span class="identifier">a</span>
1005 <span class="special">\*+</span><span class="identifier">aaa</span>
1006 </pre>
1007 <h6>
1008 <a name="boost_regex.syntax.basic_extended.h24"></a>
1009 <span class="phrase"><a name="boost_regex.syntax.basic_extended.unicode_escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.unicode_escapes">Unicode
1010 escapes</a>
1011 </h6>
1012 <div class="informaltable"><table class="table">
1013 <colgroup>
1014 <col>
1015 <col>
1016 </colgroup>
1017 <thead><tr>
1018 <th>
1019 <p>
1020 Escape
1021 </p>
1022 </th>
1023 <th>
1024 <p>
1025 Meaning
1026 </p>
1027 </th>
1028 </tr></thead>
1029 <tbody>
1030 <tr>
1031 <td>
1032 <p>
1033 <code class="computeroutput"><span class="special">\</span><span class="identifier">C</span></code>
1034 </p>
1035 </td>
1036 <td>
1037 <p>
1038 Matches a single code point: in Boost regex this has exactly the
1039 same effect as a "." operator.
1040 </p>
1041 </td>
1042 </tr>
1043 <tr>
1044 <td>
1045 <p>
1046 <code class="computeroutput"><span class="special">\</span><span class="identifier">X</span></code>
1047 </p>
1048 </td>
1049 <td>
1050 <p>
1051 Matches a combining character sequence: that is any non-combining
1052 character followed by a sequence of zero or more combining characters.
1053 </p>
1054 </td>
1055 </tr>
1056 </tbody>
1057 </table></div>
1058 <h6>
1059 <a name="boost_regex.syntax.basic_extended.h25"></a>
1060 <span class="phrase"><a name="boost_regex.syntax.basic_extended.any_other_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.any_other_escape">Any
1061 other escape</a>
1062 </h6>
1063 <p>
1064 Any other escape sequence matches the character that is escaped; for example,
1065 \@ matches a literal '@'.
1066 </p>
1067 <h5>
1068 <a name="boost_regex.syntax.basic_extended.h26"></a>
1069 <span class="phrase"><a name="boost_regex.syntax.basic_extended.operator_precedence"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.operator_precedence">Operator
1070 precedence</a>
1071 </h5>
1072 <p>
1073 The order of precedence of operators, from highest to lowest, is as follows:
1074 </p>
1075 <div class="orderedlist"><ol class="orderedlist" type="1">
1076 <li class="listitem">
1077 Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span>
1078 <span class="special">[::]</span> <span class="special">[..]</span></code>
1079 </li>
1080 <li class="listitem">
1081 Escaped characters <code class="computeroutput"><span class="special">\</span></code>
1082 </li>
1083 <li class="listitem">
1084 Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code>
1085 </li>
1086 <li class="listitem">
1087 Grouping <code class="computeroutput"><span class="special">()</span></code>
1088 </li>
1089 <li class="listitem">
1090 Single-character-ERE duplication <code class="computeroutput"><span class="special">*</span>
1091 <span class="special">+</span> <span class="special">?</span>
1092 <span class="special">{</span><span class="identifier">m</span><span class="special">,</span><span class="identifier">n</span><span class="special">}</span></code>
1093 </li>
1094 <li class="listitem">
1095 Concatenation
1096 </li>
1097 <li class="listitem">
1098 Anchoring <code class="computeroutput"><span class="special">^</span> <span class="special">$</span></code>
1099 </li>
1100 <li class="listitem">
1101 Alternation <code class="computeroutput"><span class="special">|</span></code>
1102 </li>
1103 </ol></div>
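<p>
The effect of this ordering can be sketched with two small helpers. This is
an illustration only: it uses <code class="computeroutput">std::regex</code> with the
<code class="computeroutput">std::regex::extended</code> grammar as a stand-in for Boost.Regex
(an assumption made so that the snippet depends only on the standard library), and the
helper names <code class="computeroutput">matches_whole</code> and
<code class="computeroutput">found_in</code> are hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `text` as a whole matches the
// POSIX-Extended expression `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}

// Hypothetical helper: true when `pattern` matches somewhere inside `text`.
bool found_in(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}
```

<p>
Because concatenation binds more tightly than alternation,
<code class="computeroutput">matches_whole("ab|cd", "abd")</code> is false, whereas the
grouped form <code class="computeroutput">matches_whole("a(b|c)d", "abd")</code> is true.
Likewise duplication applies to a single character before concatenation, so
<code class="computeroutput">ab*</code> matches "abbb" but not "abab", and anchoring binds
more tightly than alternation, so <code class="computeroutput">^ab|cd$</code> reads as
"<code class="computeroutput">^ab</code> or <code class="computeroutput">cd$</code>".
</p>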
1104 <h5>
1105 <a name="boost_regex.syntax.basic_extended.h27"></a>
1106 <span class="phrase"><a name="boost_regex.syntax.basic_extended.what_gets_matched"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.what_gets_matched">What
1107 Gets Matched</a>
1108 </h5>
1109 <p>
1110 When there is more than one way to match a regular expression, the "best"
1111 possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest
1112 rule</a>.
1113 </p>
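<p>
A minimal sketch of what the rule means in practice follows. It again assumes
<code class="computeroutput">std::regex</code> with
<code class="computeroutput">std::regex::extended</code> as a standard-library stand-in
for Boost.Regex, and the helper name <code class="computeroutput">first_match</code> is
hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of the
// POSIX-Extended expression `pattern` in `text`, or an empty string
// when nothing matches.
// Assumption: std::regex stands in for boost::regex here.
std::string first_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

<p>
The rule predicts that <code class="computeroutput">first_match("a|ab", "xab")</code>
yields "ab" (the longer of the two candidates that start at the same position), while
<code class="computeroutput">first_match("a|bcd", "abcd")</code> yields "a": the leftmost
starting position wins even though a longer match begins further to the right.
</p>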
1114 <h4>
1115 <a name="boost_regex.syntax.basic_extended.h28"></a>
1116 <span class="phrase"><a name="boost_regex.syntax.basic_extended.variations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.variations">Variations</a>
1117 </h4>
1118 <h5>
1119 <a name="boost_regex.syntax.basic_extended.h29"></a>
1120 <span class="phrase"><a name="boost_regex.syntax.basic_extended.egrep"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.egrep">Egrep</a>
1121 </h5>
1122 <p>
1123 When an expression is compiled with the <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">flag
1124 <code class="computeroutput"><span class="identifier">egrep</span></code></a> set, then the
1125 expression is treated as a newline separated list of <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended
1126 expressions</a>; a match is found if any of the expressions in the list
1127 match, for example:
1128 </p>
1129 <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">egrep</span><span class="special">);</span>
1130 </pre>
1131 <p>
1132 will match either of the POSIX-Extended expressions "abc" or "def".
1133 </p>
1134 <p>
1135 As its name suggests, this behavior is consistent with the Unix utility
1136 <code class="computeroutput"><span class="identifier">egrep</span></code>, and with grep when
1137 used with the -E option.
1138 </p>
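<p>
The newline-separated list can be exercised with a short sketch such as the one below.
It assumes <code class="computeroutput">std::regex</code> and its
<code class="computeroutput">std::regex::egrep</code> flag as a standard-library stand-in
for <code class="computeroutput">boost::regex::egrep</code>; the helper name
<code class="computeroutput">matches_any_line</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `text` as a whole matches any one of the
// expressions in the newline-separated list `pattern_list`.
// Assumption: std::regex::egrep stands in for boost::regex::egrep here.
bool matches_any_line(const std::string& pattern_list, const std::string& text)
{
    const std::regex re(pattern_list, std::regex::egrep);
    return std::regex_match(text, re);
}
```

<p>
With the list <code class="computeroutput">"abc\ndef"</code>, both "abc" and "def" are
accepted, while the text "abc\ndef" itself is rejected because the newline separates
alternatives rather than matching a literal newline.
</p>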
1139 <h5>
1140 <a name="boost_regex.syntax.basic_extended.h30"></a>
1141 <span class="phrase"><a name="boost_regex.syntax.basic_extended.awk"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.awk">awk</a>
1142 </h5>
1143 <p>
1144 In addition to the <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended
1145 features</a>, the escape character is special inside a character class
1146 declaration.
1147 </p>
1148 <p>
1149 In addition, some escape sequences that are not defined as part of the POSIX-Extended
1150 specification are required to be supported; however, Boost.Regex supports
1151 these by default anyway.
1152 </p>
1153 <h4>
1154 <a name="boost_regex.syntax.basic_extended.h31"></a>
1155 <span class="phrase"><a name="boost_regex.syntax.basic_extended.options"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.options">Options</a>
1156 </h4>
1157 <p>
1158 There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions">variety
1159 of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">extended</span></code>
1160 and <code class="computeroutput"><span class="identifier">egrep</span></code> options when constructing
1161 the regular expression. In particular, note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code></a> option alters the syntax,
1162 while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code>, <code class="computeroutput"><span class="identifier">nosubs</span></code>
1163 and <code class="computeroutput"><span class="identifier">icase</span></code> options</a>
1164 modify how the case and locale sensitivity are to be applied.
1165 </p>
1166 <h4>
1167 <a name="boost_regex.syntax.basic_extended.h32"></a>
1168 <span class="phrase"><a name="boost_regex.syntax.basic_extended.references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.references">References</a>
1169 </h4>
1170 <p>
1171 <a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE
1172 Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions
1173 and Headers, Section 9, Regular Expressions.</a>
1174 </p>
1175 <p>
1176 <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target="_top">IEEE
1177 Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
1178 Utilities, Section 4, Utilities, egrep.</a>
1179 </p>
1180 <p>
1181 <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/awk.html" target="_top">IEEE
1182 Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
1183 Utilities, Section 4, Utilities, awk.</a>
1184 </p>
1185 </div>
1186 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
1187 <td align="left"></td>
1188 <td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
1189 Distributed under the Boost Software License, Version 1.0. (See accompanying
1190 file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
1191 </p>
1192 </div></td>
1193 </tr></table>
1198 </body>
1199 </html>