Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <html> |
2 | <head> | |
3 | <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> | |
4 | <title>POSIX Extended Regular Expression Syntax</title> | |
5 | <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css"> | |
6 | <meta name="generator" content="DocBook XSL Stylesheets V1.77.1"> | |
7 | <link rel="home" href="../../index.html" title="Boost.Regex 5.1.2"> | |
8 | <link rel="up" href="../syntax.html" title="Regular Expression Syntax"> | |
9 | <link rel="prev" href="perl_syntax.html" title="Perl Regular Expression Syntax"> | |
10 | <link rel="next" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax"> | |
11 | </head> | |
12 | <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> | |
13 | <table cellpadding="2" width="100%"><tr> | |
14 | <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td> | |
15 | <td align="center"><a href="../../../../../../index.html">Home</a></td> | |
16 | <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td> | |
17 | <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> | |
18 | <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> | |
19 | <td align="center"><a href="../../../../../../more/index.htm">More</a></td> | |
20 | </tr></table> | |
21 | <hr> | |
22 | <div class="spirit-nav"> | |
23 | <a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> | |
24 | </div> | |
25 | <div class="section"> | |
26 | <div class="titlepage"><div><div><h3 class="title"> | |
27 | <a name="boost_regex.syntax.basic_extended"></a><a class="link" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">POSIX Extended Regular | |
28 | Expression Syntax</a> | |
29 | </h3></div></div></div> | |
30 | <h4> | |
31 | <a name="boost_regex.syntax.basic_extended.h0"></a> | |
32 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.synopsis"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.synopsis">Synopsis</a> | |
33 | </h4> | |
34 | <p> | |
35 | The POSIX-Extended regular expression syntax is supported by the POSIX C | |
36 | regular expression APIs, and variations are used by the utilities <code class="computeroutput"><span class="identifier">egrep</span></code> and <code class="computeroutput"><span class="identifier">awk</span></code>. | |
37 | You can construct POSIX extended regular expressions in Boost.Regex by passing | |
38 | the flag <code class="computeroutput"><span class="identifier">extended</span></code> to the | |
39 | regex constructor, for example: | |
40 | </p> | |
41 | <pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Extended expression:</span> | |
42 | <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">);</span> | |
43 | <span class="comment">// e2 is a case insensitive POSIX-Extended expression:</span> | |
44 | <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span> | |
45 | </pre> | |
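<p>
The effect of the two flag combinations can be sketched with a small helper. This is a minimal illustration only: it uses the standard library's <code class="computeroutput">std::regex</code> with <code class="computeroutput">std::regex::extended</code> as a stand-in for <code class="computeroutput">boost::regex</code>, on the assumption that both behave alike for simple patterns such as these. The helper name <code class="computeroutput">matches_extended</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` with the POSIX extended grammar
// (std::regex::extended, standing in for boost::regex::extended) and
// reports whether it matches the whole of `text`.
bool matches_extended(const std::string& text, const std::string& pattern,
                      bool ignore_case = false)
{
    std::regex::flag_type flags = std::regex::extended;
    if (ignore_case)
        flags |= std::regex::icase;   // analogous to boost::regex::icase
    const std::regex e(pattern, flags);
    return std::regex_match(text, e);
}
```

<p>
With this helper, <code class="computeroutput">matches_extended("ABC", "abc")</code> is expected to fail, while passing <code class="computeroutput">true</code> as the third argument makes the same call succeed.
</p>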
46 | <a name="boost_regex.posix_extended_syntax"></a><h4> | |
47 | <a name="boost_regex.syntax.basic_extended.h1"></a> | |
48 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.posix_extended_syntax"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.posix_extended_syntax">POSIX Extended | |
49 | Syntax</a> | |
50 | </h4> | |
51 | <p> | |
52 | In POSIX-Extended regular expressions, all characters match themselves except | |
53 | for the following special characters: | |
54 | </p> | |
55 | <pre class="programlisting">.[{}()\*+?|^$</pre> | |
56 | <h5> | |
57 | <a name="boost_regex.syntax.basic_extended.h2"></a> | |
58 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.wildcard"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.wildcard">Wildcard:</a> | |
59 | </h5> | |
60 | <p> | |
61 | The single character '.' when used outside of a character set will match | |
62 | any single character except: | |
63 | </p> | |
64 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> | |
65 | <li class="listitem"> | |
66 | The NUL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code> | |
67 | is passed to the matching algorithms. | |
68 | </li> | |
69 | <li class="listitem"> | |
70 | The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code> | |
71 | is passed to the matching algorithms. | |
72 | </li> | |
73 | </ul></div> | |
74 | <h5> | |
75 | <a name="boost_regex.syntax.basic_extended.h3"></a> | |
76 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.anchors"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.anchors">Anchors:</a> | |
77 | </h5> | |
78 | <p> | |
79 | A '^' character shall match the start of a line when used as the first character | |
80 | of an expression, or the first character of a sub-expression. | |
81 | </p> | |
82 | <p> | |
83 | A '$' character shall match the end of a line when used as the last character | |
84 | of an expression, or the last character of a sub-expression. | |
85 | </p> | |
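<p>
A short sketch of the anchors in use follows. It assumes single-line input, so that "start of line" and "end of line" coincide with the start and end of the text, and it uses <code class="computeroutput">std::regex</code> with the extended grammar as a stand-in for Boost.Regex. The function name <code class="computeroutput">found_in</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` (POSIX extended grammar) matches
// anywhere inside `text`. Anchors in the pattern restrict where that may be.
bool found_in(const std::string& text, const std::string& pattern)
{
    const std::regex e(pattern, std::regex::extended);
    return std::regex_search(text, e);
}
```

<p>
For instance, <code class="computeroutput">found_in("xabc", "^abc")</code> should be false because "abc" does not begin the text, whereas <code class="computeroutput">found_in("xabc", "abc$")</code> should be true.
</p>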
86 | <h5> | |
87 | <a name="boost_regex.syntax.basic_extended.h4"></a> | |
88 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.marked_sub_expressions"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.marked_sub_expressions">Marked | |
89 | sub-expressions:</a> | |
90 | </h5> | |
91 | <p> | |
92 | A section beginning <code class="computeroutput"><span class="special">(</span></code> and ending | |
93 | <code class="computeroutput"><span class="special">)</span></code> acts as a marked sub-expression. | |
94 | Whatever matched the sub-expression is split out in a separate field by the | |
95 | matching algorithms. Marked sub-expressions can also be repeated, or referred | |
96 | to by a back-reference. | |
97 | </p> | |
98 | <h5> | |
99 | <a name="boost_regex.syntax.basic_extended.h5"></a> | |
100 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.repeats"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.repeats">Repeats:</a> | |
101 | </h5> | |
102 | <p> | |
103 | Any atom (a single character, a marked sub-expression, or a character class) | |
104 | can be repeated with the <code class="computeroutput"><span class="special">*</span></code>, | |
105 | <code class="computeroutput"><span class="special">+</span></code>, <code class="computeroutput"><span class="special">?</span></code>, | |
106 | and <code class="computeroutput"><span class="special">{}</span></code> operators. | |
107 | </p> | |
108 | <p> | |
109 | The <code class="computeroutput"><span class="special">*</span></code> operator will match the | |
110 | preceding atom <span class="emphasis"><em>zero or more times</em></span>, for example the expression | |
111 | <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span><span class="identifier">b</span></code> will match any of the following: | |
112 | </p> | |
113 | <pre class="programlisting">b | |
114 | ab | |
115 | aaaaaaaab | |
116 | </pre> | |
117 | <p> | |
118 | The <code class="computeroutput"><span class="special">+</span></code> operator will match the | |
119 | preceding atom <span class="emphasis"><em>one or more times</em></span>, for example the expression | |
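120 | <code class="computeroutput"><span class="identifier">a</span><span class="special">+</span><span class="identifier">b</span></code> will match any of the following: | |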
121 | </p> | |
122 | <pre class="programlisting">ab | |
123 | aaaaaaaab | |
124 | </pre> | |
125 | <p> | |
126 | But will not match: | |
127 | </p> | |
128 | <pre class="programlisting">b | |
129 | </pre> | |
130 | <p> | |
131 | The <code class="computeroutput"><span class="special">?</span></code> operator will match the | |
132 | preceding atom <span class="emphasis"><em>zero or one times</em></span>, for example the expression | |
133 | <code class="computeroutput"><span class="identifier">ca</span><span class="special">?</span><span class="identifier">b</span></code> will match any of the following: | |
134 | </p> | |
135 | <pre class="programlisting">cb | |
136 | cab | |
137 | </pre> | |
138 | <p> | |
139 | But will not match: | |
140 | </p> | |
141 | <pre class="programlisting">caab | |
142 | </pre> | |
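<p>
The three unbounded repeat operators can be checked against the examples above with a small filter. This is a sketch under the assumption that <code class="computeroutput">std::regex::extended</code> is an acceptable stand-in for <code class="computeroutput">boost::regex::extended</code> here. <code class="computeroutput">keep_matching</code> is a hypothetical name.
</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the candidates that `pattern` matches in full,
// compiling the pattern once with the POSIX extended grammar.
std::vector<std::string> keep_matching(const std::vector<std::string>& candidates,
                                       const std::string& pattern)
{
    const std::regex e(pattern, std::regex::extended);
    std::vector<std::string> kept;
    for (const std::string& s : candidates)
        if (std::regex_match(s, e))
            kept.push_back(s);
    return kept;
}
```

<p>
Filtering "b", "ab" and "aaaaaaaab" with <code class="computeroutput">a*b</code> should keep all three, while <code class="computeroutput">a+b</code> should drop "b".
</p>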
143 | <p> | |
144 | An atom can also be repeated with a bounded repeat: | |
145 | </p> | |
146 | <p> | |
147 | <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">}</span></code> Matches | |
148 | 'a' repeated <span class="emphasis"><em>exactly n times</em></span>. | |
149 | </p> | |
150 | <p> | |
151 | <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,}</span></code> Matches | |
152 | 'a' repeated <span class="emphasis"><em>n or more times</em></span>. | |
153 | </p> | |
154 | <p> | |
155 | <code class="computeroutput"><span class="identifier">a</span><span class="special">{</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">m</span><span class="special">}</span></code> Matches 'a' repeated <span class="emphasis"><em>between n | |
156 | and m times inclusive</em></span>. | |
157 | </p> | |
158 | <p> | |
159 | For example: | |
160 | </p> | |
161 | <pre class="programlisting">^a{2,3}$</pre> | |
162 | <p> | |
163 | Will match either of: | |
164 | </p> | |
165 | <pre class="programlisting"><span class="identifier">aa</span> | |
166 | <span class="identifier">aaa</span> | |
167 | </pre> | |
168 | <p> | |
169 | But neither of: | |
170 | </p> | |
171 | <pre class="programlisting"><span class="identifier">a</span> | |
172 | <span class="identifier">aaaa</span> | |
173 | </pre> | |
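<p>
As an illustration of bounded repeats, the sketch below builds an expression of the form <code class="computeroutput">^a{n,m}$</code> from two integers. It uses <code class="computeroutput">std::regex</code> with the extended grammar as a stand-in for Boost.Regex. The function name and its parameters are hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds ^a{min,max}$ and tests whether `text` is a run
// of 'a' whose length lies between min_count and max_count inclusive.
bool is_bounded_run_of_a(const std::string& text,
                         unsigned min_count, unsigned max_count)
{
    const std::string pattern = "^a{" + std::to_string(min_count) + "," +
                                std::to_string(max_count) + "}$";
    const std::regex e(pattern, std::regex::extended);
    return std::regex_search(text, e);
}
```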
174 | <p> | |
175 | It is an error to use a repeat operator if the preceding construct cannot | |
176 | be repeated, for example: | |
177 | </p> | |
178 | <pre class="programlisting"><span class="identifier">a</span><span class="special">(*)</span> | |
179 | </pre> | |
180 | <p> | |
181 | Will raise an error, as there is nothing for the <code class="computeroutput"><span class="special">*</span></code> | |
182 | operator to be applied to. | |
183 | </p> | |
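<p>
One way to detect such an error at run time is to catch the exception thrown when the expression is compiled. The sketch below does this with <code class="computeroutput">std::regex</code> and <code class="computeroutput">std::regex_error</code>, as a stand-in. With Boost.Regex the analogous exception type is <code class="computeroutput">boost::regex_error</code>. The error code reported is implementation-specific, and <code class="computeroutput">is_valid_extended</code> is a hypothetical name.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` compiles as a POSIX extended
// expression, false if the regex constructor rejects it.
bool is_valid_extended(const std::string& pattern)
{
    try {
        const std::regex e(pattern, std::regex::extended);
        (void)e;   // only the success of construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```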
184 | <h5> | |
185 | <a name="boost_regex.syntax.basic_extended.h6"></a> | |
186 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.back_references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.back_references">Back | |
187 | references:</a> | |
188 | </h5> | |
189 | <p> | |
190 | An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span> | |
191 | is in the range 1-9, matches the same string that was matched by sub-expression | |
192 | <span class="emphasis"><em>n</em></span>. For example the expression: | |
193 | </p> | |
194 | <pre class="programlisting">^(a*).*\1$</pre> | |
195 | <p> | |
196 | Will match the string: | |
197 | </p> | |
198 | <pre class="programlisting"><span class="identifier">aaabbaaa</span> | |
199 | </pre> | |
200 | <p> | |
201 | But not the string: | |
202 | </p> | |
203 | <pre class="programlisting"><span class="identifier">aaabba</span> | |
204 | </pre> | |
205 | <div class="caution"><table border="0" summary="Caution"> | |
206 | <tr> | |
207 | <td rowspan="2" align="center" valign="top" width="25"><img alt="[Caution]" src="../../../../../../doc/src/images/caution.png"></td> | |
208 | <th align="left">Caution</th> | |
209 | </tr> | |
210 | <tr><td align="left" valign="top"><p> | |
211 | The POSIX standard does not support back-references for "extended" | |
212 | regular expressions; this is a compatible extension to that standard. | |
213 | </p></td></tr> | |
214 | </table></div> | |
215 | <h5> | |
216 | <a name="boost_regex.syntax.basic_extended.h7"></a> | |
217 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.alternation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.alternation">Alternation</a> | |
218 | </h5> | |
219 | <p> | |
220 | The <code class="computeroutput"><span class="special">|</span></code> operator will match either | |
221 | of its arguments, so for example: <code class="computeroutput"><span class="identifier">abc</span><span class="special">|</span><span class="identifier">def</span></code> will | |
222 | match either "abc" or "def". | |
223 | </p> | |
224 | <p> | |
225 | Parentheses can be used to group alternations, for example: <code class="computeroutput"><span class="identifier">ab</span><span class="special">(</span><span class="identifier">d</span><span class="special">|</span><span class="identifier">ef</span><span class="special">)</span></code> | |
226 | will match either of "abd" or "abef". | |
227 | </p> | |
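<p>
Because the parentheses also form a marked sub-expression, the alternative that was actually taken can be read back from the match results. The sketch below shows this with <code class="computeroutput">std::regex</code> and <code class="computeroutput">std::smatch</code> as stand-ins for <code class="computeroutput">boost::regex</code> and <code class="computeroutput">boost::smatch</code>. <code class="computeroutput">first_group</code> is a hypothetical helper.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by the first marked
// sub-expression when `pattern` matches the whole of `text`, or an empty
// string when there is no match or no sub-expression.
std::string first_group(const std::string& text, const std::string& pattern)
{
    const std::regex e(pattern, std::regex::extended);
    std::smatch what;
    if (std::regex_match(text, what, e) && what.size() > 1)
        return what[1].str();
    return std::string();
}
```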
228 | <h5> | |
229 | <a name="boost_regex.syntax.basic_extended.h8"></a> | |
230 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_sets"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_sets">Character | |
231 | sets:</a> | |
232 | </h5> | |
233 | <p> | |
234 | A character set is a bracket-expression starting with [ and ending with ], | |
235 | it defines a set of characters, and matches any single character that is | |
236 | a member of that set. | |
237 | </p> | |
238 | <p> | |
239 | A bracket expression may contain any combination of the following: | |
240 | </p> | |
241 | <h6> | |
242 | <a name="boost_regex.syntax.basic_extended.h9"></a> | |
243 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_characters"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_characters">Single | |
244 | characters:</a> | |
245 | </h6> | |
246 | <p> | |
247 | For example <code class="computeroutput"><span class="special">[</span><span class="identifier">abc</span><span class="special">]</span></code>, will match any of the characters 'a', 'b', | |
248 | or 'c'. | |
249 | </p> | |
250 | <h6> | |
251 | <a name="boost_regex.syntax.basic_extended.h10"></a> | |
252 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_ranges"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_ranges">Character | |
253 | ranges:</a> | |
254 | </h6> | |
255 | <p> | |
256 | For example <code class="computeroutput"><span class="special">[</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> | |
257 | will match any single character in the range 'a' to 'c'. By default, for | |
258 | POSIX-Extended regular expressions, a character <span class="emphasis"><em>x</em></span> is | |
259 | within the range <span class="emphasis"><em>y</em></span> to <span class="emphasis"><em>z</em></span>, if it | |
260 | collates within that range; this results in locale-specific behavior. This | |
261 | behavior can be turned off by unsetting the <code class="computeroutput"><span class="identifier">collate</span></code> | |
262 | <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">option flag</a> - in | |
263 | which case whether a character appears within a range is determined by comparing | |
264 | the code points of the characters only. | |
265 | </p> | |
266 | <h6> | |
267 | <a name="boost_regex.syntax.basic_extended.h11"></a> | |
268 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.negation"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.negation">Negation:</a> | |
269 | </h6> | |
270 | <p> | |
271 | If the bracket-expression begins with the ^ character, then it matches the | |
272 | complement of the characters it contains, for example <code class="computeroutput"><span class="special">[^</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> matches any character that is not in the | |
273 | range <code class="computeroutput"><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span></code>. | |
274 | </p> | |
275 | <h6> | |
276 | <a name="boost_regex.syntax.basic_extended.h12"></a> | |
277 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_classes">Character | |
278 | classes:</a> | |
279 | </h6> | |
280 | <p> | |
281 | An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code> | |
282 | matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See | |
283 | <a class="link" href="character_classes.html" title="Character Class Names">character class names</a>. | |
284 | </p> | |
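<p>
The forms of bracket expression described so far can be compared by counting how many characters of a string each one accepts. The following is a sketch only: it assumes the default "C" locale and uses <code class="computeroutput">std::regex</code> with the extended grammar as a stand-in, where ranges are compared by code point unless the <code class="computeroutput">collate</code> flag is set. <code class="computeroutput">count_in_set</code> is a hypothetical name.
</p>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the characters of `text` that are members of
// the given bracket expression, e.g. "[a-c]" or "[[:lower:]]".
std::size_t count_in_set(const std::string& text,
                         const std::string& bracket_expression)
{
    const std::regex e(bracket_expression, std::regex::extended);
    const auto first = std::sregex_iterator(text.begin(), text.end(), e);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```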
285 | <h6> | |
286 | <a name="boost_regex.syntax.basic_extended.h13"></a> | |
287 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.collating_elements"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.collating_elements">Collating | |
288 | Elements:</a> | |
289 | </h6> | |
290 | <p> | |
291 | An expression of the form <code class="computeroutput"><span class="special">[[.</span><span class="identifier">col</span><span class="special">.]]</span></code> matches | |
292 | the collating element <span class="emphasis"><em>col</em></span>. A collating element is any | |
293 | single character, or any sequence of characters that collates as a single | |
294 | unit. Collating elements may also be used as the end point of a range, for | |
295 | example: <code class="computeroutput"><span class="special">[[.</span><span class="identifier">ae</span><span class="special">.]-</span><span class="identifier">c</span><span class="special">]</span></code> | |
296 | matches the character sequence "ae", plus any single character | |
297 | in the range "ae"-c, assuming that "ae" is treated as | |
298 | a single collating element in the current locale. | |
299 | </p> | |
300 | <p> | |
301 | Collating elements may be used in place of escapes (which are not normally | |
302 | allowed inside character sets), for example <code class="computeroutput"><span class="special">[[.^.]</span><span class="identifier">abc</span><span class="special">]</span></code> would | |
303 | match any one of the characters 'a', 'b', 'c' or '^'. | |
304 | </p> | |
305 | <p> | |
306 | As an extension, a collating element may also be specified via its <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>, for example: | |
307 | </p> | |
308 | <pre class="programlisting"><span class="special">[[.</span><span class="identifier">NUL</span><span class="special">.]]</span> | |
309 | </pre> | |
310 | <p> | |
311 | matches a NUL character. | |
312 | </p> | |
313 | <h6> | |
314 | <a name="boost_regex.syntax.basic_extended.h14"></a> | |
315 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.equivalence_classes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.equivalence_classes">Equivalence | |
316 | classes:</a> | |
317 | </h6> | |
318 | <p> | |
319 | An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>, | |
320 | matches any character or collating element whose primary sort key is the | |
321 | same as that for collating element <span class="emphasis"><em>col</em></span>, as with collating | |
322 | elements the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">symbolic | |
323 | name</a>. A primary sort key is one that ignores case, accentuation, or | |
324 | locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches | |
325 | any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation | |
326 | of this is reliant on the platform's collation and localisation support; | |
327 | this feature cannot be relied upon to work portably across all platforms, | |
328 | or even all locales on one platform. | |
329 | </p> | |
330 | <h6> | |
331 | <a name="boost_regex.syntax.basic_extended.h15"></a> | |
332 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.combinations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.combinations">Combinations:</a> | |
333 | </h6> | |
334 | <p> | |
335 | All of the above can be combined in one character set declaration, for example: | |
336 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]]</span></code>. | |
337 | </p> | |
338 | <h5> | |
339 | <a name="boost_regex.syntax.basic_extended.h16"></a> | |
340 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes">Escapes</a> | |
341 | </h5> | |
342 | <p> | |
343 | The POSIX standard defines no escape sequences for POSIX-Extended regular | |
344 | expressions, except that: | |
345 | </p> | |
346 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> | |
347 | <li class="listitem"> | |
348 | Any special character preceded by an escape shall match itself. | |
349 | </li> | |
350 | <li class="listitem"> | |
351 | The effect of any ordinary character being preceded by an escape is undefined. | |
352 | </li> | |
353 | <li class="listitem"> | |
354 | An escape inside a character class declaration shall match itself: in | |
355 | other words the escape character is not "special" inside a | |
356 | character class declaration; so <code class="computeroutput"><span class="special">[\^]</span></code> | |
357 | will match either a literal '\' or a '^'. | |
358 | </li> | |
359 | </ul></div> | |
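<p>
The first and third rules can be illustrated as follows. This is a minimal sketch that uses <code class="computeroutput">std::regex</code> with the extended grammar as a stand-in for Boost.Regex, with raw string literals in the calls to avoid doubling the backslashes. <code class="computeroutput">ere_full_match</code> is a hypothetical helper.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern`, compiled with the POSIX extended
// grammar, matches the whole of `text`.
bool ere_full_match(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern, std::regex::extended));
}
```

<p>
Here <code class="computeroutput">ere_full_match("a.b", R"(a\.b)")</code> should succeed and <code class="computeroutput">ere_full_match("axb", R"(a\.b)")</code> should fail, since the escaped '.' is no longer a wildcard. The set <code class="computeroutput">R"([\^])"</code> should accept both a single backslash and a single '^'.
</p>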
360 | <p> | |
361 | However, that's rather restrictive, so the following standard-compatible | |
362 | extensions are also supported by Boost.Regex: | |
363 | </p> | |
364 | <h6> | |
365 | <a name="boost_regex.syntax.basic_extended.h17"></a> | |
366 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.escapes_matching_a_specific_char"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes_matching_a_specific_char">Escapes | |
367 | matching a specific character</a> | |
368 | </h6> | |
369 | <p> | |
370 | The following escape sequences are all synonyms for single characters: | |
371 | </p> | |
372 | <div class="informaltable"><table class="table"> | |
373 | <colgroup> | |
374 | <col> | |
375 | <col> | |
376 | </colgroup> | |
377 | <thead><tr> | |
378 | <th> | |
379 | <p> | |
380 | Escape | |
381 | </p> | |
382 | </th> | |
383 | <th> | |
384 | <p> | |
385 | Character | |
386 | </p> | |
387 | </th> | |
388 | </tr></thead> | |
389 | <tbody> | |
390 | <tr> | |
391 | <td> | |
392 | <p> | |
393 | \a | |
394 | </p> | |
395 | </td> | |
396 | <td> | |
397 | <p> | |
398 | '\a' | |
399 | </p> | |
400 | </td> | |
401 | </tr> | |
402 | <tr> | |
403 | <td> | |
404 | <p> | |
405 | \e | |
406 | </p> | |
407 | </td> | |
408 | <td> | |
409 | <p> | |
410 | 0x1B | |
411 | </p> | |
412 | </td> | |
413 | </tr> | |
414 | <tr> | |
415 | <td> | |
416 | <p> | |
417 | \f | |
418 | </p> | |
419 | </td> | |
420 | <td> | |
421 | <p> | |
422 | \f | |
423 | </p> | |
424 | </td> | |
425 | </tr> | |
426 | <tr> | |
427 | <td> | |
428 | <p> | |
429 | \n | |
430 | </p> | |
431 | </td> | |
432 | <td> | |
433 | <p> | |
434 | \n | |
435 | </p> | |
436 | </td> | |
437 | </tr> | |
438 | <tr> | |
439 | <td> | |
440 | <p> | |
441 | \r | |
442 | </p> | |
443 | </td> | |
444 | <td> | |
445 | <p> | |
446 | \r | |
447 | </p> | |
448 | </td> | |
449 | </tr> | |
450 | <tr> | |
451 | <td> | |
452 | <p> | |
453 | \t | |
454 | </p> | |
455 | </td> | |
456 | <td> | |
457 | <p> | |
458 | \t | |
459 | </p> | |
460 | </td> | |
461 | </tr> | |
462 | <tr> | |
463 | <td> | |
464 | <p> | |
465 | \v | |
466 | </p> | |
467 | </td> | |
468 | <td> | |
469 | <p> | |
470 | \v | |
471 | </p> | |
472 | </td> | |
473 | </tr> | |
474 | <tr> | |
475 | <td> | |
476 | <p> | |
477 | \b | |
478 | </p> | |
479 | </td> | |
480 | <td> | |
481 | <p> | |
482 | \b (but only inside a character class declaration). | |
483 | </p> | |
484 | </td> | |
485 | </tr> | |
486 | <tr> | |
487 | <td> | |
488 | <p> | |
489 | \cX | |
490 | </p> | |
491 | </td> | |
492 | <td> | |
493 | <p> | |
494 | An ASCII escape sequence - the character whose code point is X | |
495 | % 32 | |
496 | </p> | |
497 | </td> | |
498 | </tr> | |
499 | <tr> | |
500 | <td> | |
501 | <p> | |
502 | \xdd | |
503 | </p> | |
504 | </td> | |
505 | <td> | |
506 | <p> | |
507 | A hexadecimal escape sequence - matches the single character whose | |
508 | code point is 0xdd. | |
509 | </p> | |
510 | </td> | |
511 | </tr> | |
512 | <tr> | |
513 | <td> | |
514 | <p> | |
515 | \x{dddd} | |
516 | </p> | |
517 | </td> | |
518 | <td> | |
519 | <p> | |
520 | A hexadecimal escape sequence - matches the single character whose | |
521 | code point is 0xdddd. | |
522 | </p> | |
523 | </td> | |
524 | </tr> | |
525 | <tr> | |
526 | <td> | |
527 | <p> | |
528 | \0ddd | |
529 | </p> | |
530 | </td> | |
531 | <td> | |
532 | <p> | |
533 | An octal escape sequence - matches the single character whose code | |
534 | point is 0ddd. | |
535 | </p> | |
536 | </td> | |
537 | </tr> | |
538 | <tr> | |
539 | <td> | |
540 | <p> | |
541 | \N{Name} | |
542 | </p> | |
543 | </td> | |
544 | <td> | |
545 | <p> | |
546 | Matches the single character which has the symbolic name <span class="emphasis"><em>Name</em></span>. | |
547 | For example <code class="computeroutput"><span class="special">\\</span><span class="identifier">N</span><span class="special">{</span><span class="identifier">newline</span><span class="special">}</span></code> matches the single character \n. | |
548 | </p> | |
549 | </td> | |
550 | </tr> | |
551 | </tbody> | |
552 | </table></div> | |
553 | <h6> | |
554 | <a name="boost_regex.syntax.basic_extended.h18"></a> | |
555 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.single_character_character_class"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_character_character_class">"Single | |
556 | character" character classes:</a> | |
557 | </h6> | |
558 | <p> | |
559 | Any escaped character <span class="emphasis"><em>x</em></span>, if <span class="emphasis"><em>x</em></span> is | |
560 | the name of a character class, shall match any character that is a member | |
561 | of that class; the upper-case escape <span class="emphasis"><em>X</em></span>, if <span class="emphasis"><em>x</em></span> | |
562 | is the name of a character class, shall match any character not in that class. | |
563 | </p> | |
564 | <p> | |
565 | The following are supported by default: | |
566 | </p> | |
567 | <div class="informaltable"><table class="table"> | |
568 | <colgroup> | |
569 | <col> | |
570 | <col> | |
571 | </colgroup> | |
572 | <thead><tr> | |
573 | <th> | |
574 | <p> | |
575 | Escape sequence | |
576 | </p> | |
577 | </th> | |
578 | <th> | |
579 | <p> | |
580 | Equivalent to | |
581 | </p> | |
582 | </th> | |
583 | </tr></thead> | |
584 | <tbody> | |
585 | <tr> | |
586 | <td> | |
587 | <p> | |
588 | <code class="computeroutput"><span class="special">\</span><span class="identifier">d</span></code> | |
589 | </p> | |
590 | </td> | |
591 | <td> | |
592 | <p> | |
593 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]]</span></code> | |
594 | </p> | |
595 | </td> | |
596 | </tr> | |
597 | <tr> | |
598 | <td> | |
599 | <p> | |
600 | <code class="computeroutput"><span class="special">\</span><span class="identifier">l</span></code> | |
601 | </p> | |
602 | </td> | |
603 | <td> | |
604 | <p> | |
605 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> | |
606 | </p> | |
607 | </td> | |
608 | </tr> | |
609 | <tr> | |
610 | <td> | |
611 | <p> | |
612 | <code class="computeroutput"><span class="special">\</span><span class="identifier">s</span></code> | |
613 | </p> | |
614 | </td> | |
615 | <td> | |
616 | <p> | |
617 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">space</span><span class="special">:]]</span></code> | |
618 | </p> | |
619 | </td> | |
620 | </tr> | |
621 | <tr> | |
622 | <td> | |
623 | <p> | |
624 | <code class="computeroutput"><span class="special">\</span><span class="identifier">u</span></code> | |
625 | </p> | |
626 | </td> | |
627 | <td> | |
628 | <p> | |
629 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">upper</span><span class="special">:]]</span></code> | |
630 | </p> | |
631 | </td> | |
632 | </tr> | |
633 | <tr> | |
634 | <td> | |
635 | <p> | |
636 | <code class="computeroutput"><span class="special">\</span><span class="identifier">w</span></code> | |
637 | </p> | |
638 | </td> | |
639 | <td> | |
640 | <p> | |
641 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">word</span><span class="special">:]]</span></code> | |
642 | </p> | |
643 | </td> | |
644 | </tr> | |
645 | <tr> | |
646 | <td> | |
647 | <p> | |
648 | <code class="computeroutput"><span class="special">\</span><span class="identifier">D</span></code> | |
649 | </p> | |
650 | </td> | |
651 | <td> | |
652 | <p> | |
653 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">digit</span><span class="special">:]]</span></code> | |
654 | </p> | |
655 | </td> | |
656 | </tr> | |
657 | <tr> | |
658 | <td> | |
659 | <p> | |
660 | <code class="computeroutput"><span class="special">\</span><span class="identifier">L</span></code> | |
661 | </p> | |
662 | </td> | |
663 | <td> | |
664 | <p> | |
665 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> | |
666 | </p> | |
667 | </td> | |
668 | </tr> | |
669 | <tr> | |
670 | <td> | |
671 | <p> | |
672 | <code class="computeroutput"><span class="special">\</span><span class="identifier">S</span></code> | |
673 | </p> | |
674 | </td> | |
675 | <td> | |
676 | <p> | |
677 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">space</span><span class="special">:]]</span></code> | |
678 | </p> | |
679 | </td> | |
680 | </tr> | |
681 | <tr> | |
682 | <td> | |
683 | <p> | |
684 | <code class="computeroutput"><span class="special">\</span><span class="identifier">U</span></code> | |
685 | </p> | |
686 | </td> | |
687 | <td> | |
688 | <p> | |
689 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">upper</span><span class="special">:]]</span></code> | |
690 | </p> | |
691 | </td> | |
692 | </tr> | |
693 | <tr> | |
694 | <td> | |
695 | <p> | |
696 | <code class="computeroutput"><span class="special">\</span><span class="identifier">W</span></code> | |
697 | </p> | |
698 | </td> | |
699 | <td> | |
700 | <p> | |
701 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">word</span><span class="special">:]]</span></code> | |
702 | </p> | |
703 | </td> | |
704 | </tr> | |
705 | </tbody> | |
706 | </table></div> | |
707 | <h6> | |
708 | <a name="boost_regex.syntax.basic_extended.h19"></a> | |
709 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.character_properties"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_properties">Character | |
710 | Properties</a> | |
711 | </h6> | |
712 | <p> | |
713 | The character property names in the following table are all equivalent to | |
714 | the names used in character classes. | |
715 | </p> | |
716 | <div class="informaltable"><table class="table"> | |
717 | <colgroup> | |
718 | <col> | |
719 | <col> | |
720 | <col> | |
721 | </colgroup> | |
722 | <thead><tr> | |
723 | <th> | |
724 | <p> | |
725 | Form | |
726 | </p> | |
727 | </th> | |
728 | <th> | |
729 | <p> | |
730 | Description | |
731 | </p> | |
732 | </th> | |
733 | <th> | |
734 | <p> | |
735 | Equivalent character set form | |
736 | </p> | |
737 | </th> | |
738 | </tr></thead> | |
739 | <tbody> | |
740 | <tr> | |
741 | <td> | |
742 | <p> | |
743 | <code class="computeroutput"><span class="special">\</span><span class="identifier">pX</span></code> | |
744 | </p> | |
745 | </td> | |
746 | <td> | |
747 | <p> | |
748 | Matches any character that has the property X. | |
749 | </p> | |
750 | </td> | |
751 | <td> | |
752 | <p> | |
753 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">X</span><span class="special">:]]</span></code> | |
754 | </p> | |
755 | </td> | |
756 | </tr> | |
757 | <tr> | |
758 | <td> | |
759 | <p> | |
760 | <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code> | |
761 | </p> | |
762 | </td> | |
763 | <td> | |
764 | <p> | |
765 | Matches any character that has the property Name. | |
766 | </p> | |
767 | </td> | |
768 | <td> | |
769 | <p> | |
770 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Name</span><span class="special">:]]</span></code> | |
771 | </p> | |
772 | </td> | |
773 | </tr> | |
774 | <tr> | |
775 | <td> | |
776 | <p> | |
777 | <code class="computeroutput"><span class="special">\</span><span class="identifier">PX</span></code> | |
778 | </p> | |
779 | </td> | |
780 | <td> | |
781 | <p> | |
782 | Matches any character that does not have the property X. | |
783 | </p> | |
784 | </td> | |
785 | <td> | |
786 | <p> | |
787 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">X</span><span class="special">:]]</span></code> | |
788 | </p> | |
789 | </td> | |
790 | </tr> | |
791 | <tr> | |
792 | <td> | |
793 | <p> | |
794 | <code class="computeroutput"><span class="special">\</span><span class="identifier">P</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code> | |
795 | </p> | |
796 | </td> | |
797 | <td> | |
798 | <p> | |
799 | Matches any character that does not have the property Name. | |
800 | </p> | |
801 | </td> | |
802 | <td> | |
803 | <p> | |
804 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">Name</span><span class="special">:]]</span></code> | |
805 | </p> | |
806 | </td> | |
807 | </tr> | |
808 | </tbody> | |
809 | </table></div> | |
810 | <p> | |
811 | For example <code class="computeroutput"><span class="special">\</span><span class="identifier">pd</span></code> | |
812 | matches any "digit" character, as does <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">digit</span><span class="special">}</span></code>. | |
813 | </p> | |
814 | <h6> | |
815 | <a name="boost_regex.syntax.basic_extended.h20"></a> | |
816 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.word_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.word_boundaries">Word | |
817 | Boundaries</a> | |
818 | </h6> | |
819 | <p> | |
820 | The following escape sequences match the boundaries of words: | |
821 | </p> | |
822 | <div class="informaltable"><table class="table"> | |
823 | <colgroup> | |
824 | <col> | |
825 | <col> | |
826 | </colgroup> | |
827 | <thead><tr> | |
828 | <th> | |
829 | <p> | |
830 | Escape | |
831 | </p> | |
832 | </th> | |
833 | <th> | |
834 | <p> | |
835 | Meaning | |
836 | </p> | |
837 | </th> | |
838 | </tr></thead> | |
839 | <tbody> | |
840 | <tr> | |
841 | <td> | |
842 | <p> | |
843 | <code class="computeroutput"><span class="special">\<</span></code> | |
844 | </p> | |
845 | </td> | |
846 | <td> | |
847 | <p> | |
848 | Matches the start of a word. | |
849 | </p> | |
850 | </td> | |
851 | </tr> | |
852 | <tr> | |
853 | <td> | |
854 | <p> | |
855 | <code class="computeroutput"><span class="special">\></span></code> | |
856 | </p> | |
857 | </td> | |
858 | <td> | |
859 | <p> | |
860 | Matches the end of a word. | |
861 | </p> | |
862 | </td> | |
863 | </tr> | |
864 | <tr> | |
865 | <td> | |
866 | <p> | |
867 | <code class="computeroutput"><span class="special">\</span><span class="identifier">b</span></code> | |
868 | </p> | |
869 | </td> | |
870 | <td> | |
871 | <p> | |
872 | Matches a word boundary (the start or end of a word). | |
873 | </p> | |
874 | </td> | |
875 | </tr> | |
876 | <tr> | |
877 | <td> | |
878 | <p> | |
879 | <code class="computeroutput"><span class="special">\</span><span class="identifier">B</span></code> | |
880 | </p> | |
881 | </td> | |
882 | <td> | |
883 | <p> | |
884 | Matches only when not at a word boundary. | |
885 | </p> | |
886 | </td> | |
887 | </tr> | |
888 | </tbody> | |
889 | </table></div> | |
890 | <h6> | |
891 | <a name="boost_regex.syntax.basic_extended.h21"></a> | |
892 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.buffer_boundaries"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.buffer_boundaries">Buffer | |
893 | boundaries</a> | |
894 | </h6> | |
895 | <p> | |
896 | The following match only at buffer boundaries: a "buffer" in this | |
897 | context is the whole of the input text that is being matched against (note | |
898 | that ^ and $ may match embedded newlines within the text). | |
899 | </p> | |
900 | <div class="informaltable"><table class="table"> | |
901 | <colgroup> | |
902 | <col> | |
903 | <col> | |
904 | </colgroup> | |
905 | <thead><tr> | |
906 | <th> | |
907 | <p> | |
908 | Escape | |
909 | </p> | |
910 | </th> | |
911 | <th> | |
912 | <p> | |
913 | Meaning | |
914 | </p> | |
915 | </th> | |
916 | </tr></thead> | |
917 | <tbody> | |
918 | <tr> | |
919 | <td> | |
920 | <p> | |
921 | \` | |
922 | </p> | |
923 | </td> | |
924 | <td> | |
925 | <p> | |
926 | Matches at the start of a buffer only. | |
927 | </p> | |
928 | </td> | |
929 | </tr> | |
930 | <tr> | |
931 | <td> | |
932 | <p> | |
933 | \' | |
934 | </p> | |
935 | </td> | |
936 | <td> | |
937 | <p> | |
938 | Matches at the end of a buffer only. | |
939 | </p> | |
940 | </td> | |
941 | </tr> | |
942 | <tr> | |
943 | <td> | |
944 | <p> | |
945 | <code class="computeroutput"><span class="special">\</span><span class="identifier">A</span></code> | |
946 | </p> | |
947 | </td> | |
948 | <td> | |
949 | <p> | |
950 | Matches at the start of a buffer only (the same as \`). | |
951 | </p> | |
952 | </td> | |
953 | </tr> | |
954 | <tr> | |
955 | <td> | |
956 | <p> | |
957 | <code class="computeroutput"><span class="special">\</span><span class="identifier">z</span></code> | |
958 | </p> | |
959 | </td> | |
960 | <td> | |
961 | <p> | |
962 | Matches at the end of a buffer only (the same as \'). | |
963 | </p> | |
964 | </td> | |
965 | </tr> | |
966 | <tr> | |
967 | <td> | |
968 | <p> | |
969 | <code class="computeroutput"><span class="special">\</span><span class="identifier">Z</span></code> | |
970 | </p> | |
971 | </td> | |
972 | <td> | |
973 | <p> | |
974 | Matches an optional sequence of newlines at the end of a buffer: | |
975 | equivalent to the regular expression <code class="computeroutput"><span class="special">\</span><span class="identifier">n</span><span class="special">*\</span><span class="identifier">z</span></code> | |
976 | </p> | |
977 | </td> | |
978 | </tr> | |
979 | </tbody> | |
980 | </table></div> | |
981 | <h6> | |
982 | <a name="boost_regex.syntax.basic_extended.h22"></a> | |
983 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.continuation_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.continuation_escape">Continuation | |
984 | Escape</a> | |
985 | </h6> | |
986 | <p> | |
987 | The sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">G</span></code> | |
988 | matches only at the end of the last match found, or at the start of the text | |
989 | being matched if no previous match was found. This escape is useful if you're | |
990 | iterating over the matches contained within a text, and you want each subsequent | |
991 | match to start where the last one ended. | |
992 | </p> | |
993 | <h6> | |
994 | <a name="boost_regex.syntax.basic_extended.h23"></a> | |
995 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.quoting_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.quoting_escape">Quoting | |
996 | escape</a> | |
997 | </h6> | |
998 | <p> | |
999 | The escape sequence <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span></code> | |
1000 | begins a "quoted sequence": all the subsequent characters are treated | |
1001 | as literals, until either the end of the regular expression or <code class="computeroutput"><span class="special">\</span><span class="identifier">E</span></code> is found. | |
1002 | For example the expression: <code class="computeroutput"><span class="special">\</span><span class="identifier">Q</span><span class="special">\*+\</span><span class="identifier">Ea</span><span class="special">+</span></code> would match either of: | |
1003 | </p> | |
1004 | <pre class="programlisting"><span class="special">\*+</span><span class="identifier">a</span> | |
1005 | <span class="special">\*+</span><span class="identifier">aaa</span> | |
1006 | </pre> | |
1007 | <h6> | |
1008 | <a name="boost_regex.syntax.basic_extended.h24"></a> | |
1009 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.unicode_escapes"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.unicode_escapes">Unicode | |
1010 | escapes</a> | |
1011 | </h6> | |
1012 | <div class="informaltable"><table class="table"> | |
1013 | <colgroup> | |
1014 | <col> | |
1015 | <col> | |
1016 | </colgroup> | |
1017 | <thead><tr> | |
1018 | <th> | |
1019 | <p> | |
1020 | Escape | |
1021 | </p> | |
1022 | </th> | |
1023 | <th> | |
1024 | <p> | |
1025 | Meaning | |
1026 | </p> | |
1027 | </th> | |
1028 | </tr></thead> | |
1029 | <tbody> | |
1030 | <tr> | |
1031 | <td> | |
1032 | <p> | |
1033 | <code class="computeroutput"><span class="special">\</span><span class="identifier">C</span></code> | |
1034 | </p> | |
1035 | </td> | |
1036 | <td> | |
1037 | <p> | |
1038 | Matches a single code point: in Boost regex this has exactly the | |
1039 | same effect as a "." operator. | |
1040 | </p> | |
1041 | </td> | |
1042 | </tr> | |
1043 | <tr> | |
1044 | <td> | |
1045 | <p> | |
1046 | <code class="computeroutput"><span class="special">\</span><span class="identifier">X</span></code> | |
1047 | </p> | |
1048 | </td> | |
1049 | <td> | |
1050 | <p> | |
1051 | Matches a combining character sequence: that is any non-combining | |
1052 | character followed by a sequence of zero or more combining characters. | |
1053 | </p> | |
1054 | </td> | |
1055 | </tr> | |
1056 | </tbody> | |
1057 | </table></div> | |
1058 | <h6> | |
1059 | <a name="boost_regex.syntax.basic_extended.h25"></a> | |
1060 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.any_other_escape"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.any_other_escape">Any | |
1061 | other escape</a> | |
1062 | </h6> | |
1063 | <p> | |
1064 | Any other escape sequence matches the character that is escaped; for example, | |
1065 | \@ matches a literal '@'. | |
1066 | </p> | |
1067 | <h5> | |
1068 | <a name="boost_regex.syntax.basic_extended.h26"></a> | |
1069 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.operator_precedence"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.operator_precedence">Operator | |
1070 | precedence</a> | |
1071 | </h5> | |
1072 | <p> | |
1073 | The order of precedence of operators is as follows: | |
1074 | </p> | |
1075 | <div class="orderedlist"><ol class="orderedlist" type="1"> | |
1076 | <li class="listitem"> | |
1077 | Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span> | |
1078 | <span class="special">[::]</span> <span class="special">[..]</span></code> | |
1079 | </li> | |
1080 | <li class="listitem"> | |
1081 | Escaped characters <code class="computeroutput"><span class="special">\</span></code> | |
1082 | </li> | |
1083 | <li class="listitem"> | |
1084 | Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code> | |
1085 | </li> | |
1086 | <li class="listitem"> | |
1087 | Grouping <code class="computeroutput"><span class="special">()</span></code> | |
1088 | </li> | |
1089 | <li class="listitem"> | |
1090 | Single-character-ERE duplication <code class="computeroutput"><span class="special">*</span> | |
1091 | <span class="special">+</span> <span class="special">?</span> | |
1092 | <span class="special">{</span><span class="identifier">m</span><span class="special">,</span><span class="identifier">n</span><span class="special">}</span></code> | |
1093 | </li> | |
1094 | <li class="listitem"> | |
1095 | Concatenation | |
1096 | </li> | |
1097 | <li class="listitem"> | |
1098 | Anchoring ^$ | |
1099 | </li> | |
1100 | <li class="listitem"> | |
1101 | Alternation <code class="computeroutput"><span class="special">|</span></code> | |
1102 | </li> | |
1103 | </ol></div> | |
1104 | <h5> | |
1105 | <a name="boost_regex.syntax.basic_extended.h27"></a> | |
1106 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.what_gets_matched"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.what_gets_matched">What | |
1107 | Gets Matched</a> | |
1108 | </h5> | |
1109 | <p> | |
1110 | When there is more than one way to match a regular expression, the "best" | |
1111 | possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest | |
1112 | rule</a>. | |
1113 | </p> | |
1114 | <h4> | |
1115 | <a name="boost_regex.syntax.basic_extended.h28"></a> | |
1116 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.variations"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.variations">Variations</a> | |
1117 | </h4> | |
1118 | <h5> | |
1119 | <a name="boost_regex.syntax.basic_extended.h29"></a> | |
1120 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.egrep"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.egrep">Egrep</a> | |
1121 | </h5> | |
1122 | <p> | |
1123 | When an expression is compiled with the <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type">flag | |
1124 | <code class="computeroutput"><span class="identifier">egrep</span></code></a> set, then the | |
1125 | expression is treated as a newline-separated list of <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended | |
1126 | expressions</a>; a match is found if any of the expressions in the list | |
1127 | match, for example: | |
1128 | </p> | |
1129 | <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">egrep</span><span class="special">);</span> | |
1130 | </pre> | |
1131 | <p> | |
1132 | will match either of the POSIX-Extended expressions "abc" or "def". | |
1133 | </p> | |
1134 | <p> | |
1135 | As its name suggests, this behavior is consistent with the Unix utility | |
1136 | <code class="computeroutput"><span class="identifier">egrep</span></code>, and with grep when | |
1137 | used with the -E option. | |
1138 | </p> | |
1139 | <h5> | |
1140 | <a name="boost_regex.syntax.basic_extended.h30"></a> | |
1141 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.awk"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.awk">awk</a> | |
1142 | </h5> | |
1143 | <p> | |
1144 | In addition to the <a class="link" href="basic_extended.html#boost_regex.posix_extended_syntax">POSIX-Extended | |
1145 | features</a> the escape character is special inside a character class | |
1146 | declaration. | |
1147 | </p> | |
1148 | <p> | |
1149 | In addition, some escape sequences that are not defined as part of the POSIX-Extended | |
1150 | specification are required to be supported; however, Boost.Regex supports | |
1151 | these by default anyway. | |
1152 | </p> | |
1153 | <h4> | |
1154 | <a name="boost_regex.syntax.basic_extended.h31"></a> | |
1155 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.options"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.options">Options</a> | |
1156 | </h4> | |
1157 | <p> | |
1158 | There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions">variety | |
1159 | of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">extended</span></code> | |
1160 | and <code class="computeroutput"><span class="identifier">egrep</span></code> options when constructing | |
1161 | the regular expression, in particular note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code></a> option alters the syntax, | |
1162 | while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_extended.html" title="Options for POSIX Extended Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code>, <code class="computeroutput"><span class="identifier">nosubs</span></code> | |
1163 | and <code class="computeroutput"><span class="identifier">icase</span></code> options</a> | |
1164 | modify how the case and locale sensitivity are to be applied. | |
1165 | </p> | |
1166 | <h4> | |
1167 | <a name="boost_regex.syntax.basic_extended.h32"></a> | |
1168 | <span class="phrase"><a name="boost_regex.syntax.basic_extended.references"></a></span><a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.references">References</a> | |
1169 | </h4> | |
1170 | <p> | |
1171 | <a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE | |
1172 | Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions | |
1173 | and Headers, Section 9, Regular Expressions.</a> | |
1174 | </p> | |
1175 | <p> | |
1176 | <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target="_top">IEEE | |
1177 | Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and | |
1178 | Utilities, Section 4, Utilities, egrep.</a> | |
1179 | </p> | |
1180 | <p> | |
1181 | <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/awk.html" target="_top">IEEE | |
1182 | Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and | |
1183 | Utilities, Section 4, Utilities, awk.</a> | |
1184 | </p> | |
1185 | </div> | |
1186 | <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> | |
1187 | <td align="left"></td> | |
1188 | <td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p> | |
1189 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
1190 | file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) | |
1191 | </p> | |
1192 | </div></td> | |
1193 | </tr></table> | |
1194 | <hr> | |
1195 | <div class="spirit-nav"> | |
1196 | <a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> | |
1197 | </div> | |
1198 | </body> | |
1199 | </html> |