Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <html> |
2 | <head> | |
3 | <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> | |
4 | <title>Perl Regular Expression Syntax</title> | |
5 | <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css"> | |
6 | <meta name="generator" content="DocBook XSL Stylesheets V1.77.1"> | |
7 | <link rel="home" href="../../index.html" title="Boost.Regex 5.1.2"> | |
8 | <link rel="up" href="../syntax.html" title="Regular Expression Syntax"> | |
9 | <link rel="prev" href="../syntax.html" title="Regular Expression Syntax"> | |
10 | <link rel="next" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax"> | |
11 | </head> | |
12 | <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> | |
13 | <table cellpadding="2" width="100%"><tr> | |
14 | <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td> | |
15 | <td align="center"><a href="../../../../../../index.html">Home</a></td> | |
16 | <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td> | |
17 | <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> | |
18 | <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> | |
19 | <td align="center"><a href="../../../../../../more/index.htm">More</a></td> | |
20 | </tr></table> | |
21 | <hr> | |
22 | <div class="spirit-nav"> | |
23 | <a accesskey="p" href="../syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_extended.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> | |
24 | </div> | |
25 | <div class="section"> | |
26 | <div class="titlepage"><div><div><h3 class="title"> | |
27 | <a name="boost_regex.syntax.perl_syntax"></a><a class="link" href="perl_syntax.html" title="Perl Regular Expression Syntax">Perl Regular Expression | |
28 | Syntax</a> | |
29 | </h3></div></div></div> | |
30 | <h4> | |
31 | <a name="boost_regex.syntax.perl_syntax.h0"></a> | |
32 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.synopsis"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.synopsis">Synopsis</a> | |
33 | </h4> | |
34 | <p> | |
35 | The Perl regular expression syntax is based on that used by the programming | |
36 | language Perl. Perl regular expressions are the default behavior in Boost.Regex, | |
37 | or you can explicitly pass the flag <code class="literal">perl</code> to the <a class="link" href="../ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a> constructor, for example: | |
38 | </p> | |
39 | <pre class="programlisting"><span class="comment">// e1 is a case sensitive Perl regular expression: </span> | |
40 | <span class="comment">// since Perl is the default option there's no need to explicitly specify the syntax used here:</span> | |
41 | <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">);</span> | |
42 | <span class="comment">// e2 is a case insensitive Perl regular expression:</span> | |
43 | <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">perl</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span> | |
44 | </pre> | |
45 | <h4> | |
46 | <a name="boost_regex.syntax.perl_syntax.h1"></a> | |
47 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.perl_regular_expression_syntax"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.perl_regular_expression_syntax">Perl | |
48 | Regular Expression Syntax</a> | |
49 | </h4> | |
50 | <p> | |
51 | In Perl regular expressions, all characters match themselves except for the | |
52 | following special characters: | |
53 | </p> | |
54 | <pre class="programlisting">.[{}()\*+?|^$</pre> | |
55 | <h5> | |
56 | <a name="boost_regex.syntax.perl_syntax.h2"></a> | |
57 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.wildcard"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.wildcard">Wildcard</a> | |
58 | </h5> | |
59 | <p> | |
60 | The single character '.' when used outside of a character set will match | |
61 | any single character except: | |
62 | </p> | |
63 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> | |
64 | <li class="listitem"> | |
65 | The NULL character when the <a class="link" href="../ref/match_flag_type.html" title="match_flag_type">flag | |
66 | <code class="literal">match_not_dot_null</code></a> is passed to the matching | |
67 | algorithms. | |
68 | </li> | |
69 | <li class="listitem"> | |
70 | The newline character when the <a class="link" href="../ref/match_flag_type.html" title="match_flag_type">flag | |
71 | <code class="literal">match_not_dot_newline</code></a> is passed to the matching | |
72 | algorithms. | |
73 | </li> | |
74 | </ul></div> | |
75 | <h5> | |
76 | <a name="boost_regex.syntax.perl_syntax.h3"></a> | |
77 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.anchors"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.anchors">Anchors</a> | |
78 | </h5> | |
79 | <p> | |
80 | A '^' character shall match the start of a line. | |
81 | </p> | |
82 | <p> | |
83 | A '$' character shall match the end of a line. | |
84 | </p> | |
85 | <h5> | |
86 | <a name="boost_regex.syntax.perl_syntax.h4"></a> | |
87 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.marked_sub_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.marked_sub_expressions">Marked sub-expressions</a> | |
88 | </h5> | |
89 | <p> | |
90 | A section beginning <code class="literal">(</code> and ending <code class="literal">)</code> | |
91 | acts as a marked sub-expression. Whatever matched the sub-expression is split | |
92 | out in a separate field by the matching algorithms. Marked sub-expressions | |
93 | can also be repeated, or referred to by a back-reference. | |
94 | </p> | |
95 | <h5> | |
96 | <a name="boost_regex.syntax.perl_syntax.h5"></a> | |
97 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.non_marking_grouping"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_marking_grouping">Non-marking | |
98 | grouping</a> | |
99 | </h5> | |
100 | <p> | |
101 | A marked sub-expression is useful to lexically group part of a regular expression, | |
102 | but has the side-effect of producing an extra field in the result. As | |
103 | an alternative you can lexically group part of a regular expression without | |
104 | generating a marked sub-expression by using <code class="literal">(?:</code> and <code class="literal">)</code>; | |
105 | for example, <code class="literal">(?:ab)+</code> will repeat <code class="literal">ab</code> | |
106 | without splitting out any separate sub-expressions. | |
107 | </p> | |
108 | <h5> | |
109 | <a name="boost_regex.syntax.perl_syntax.h6"></a> | |
110 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.repeats"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.repeats">Repeats</a> | |
111 | </h5> | |
112 | <p> | |
113 | Any atom (a single character, a marked sub-expression, or a character class) | |
114 | can be repeated with the <code class="literal">*</code>, <code class="literal">+</code>, <code class="literal">?</code>, | |
115 | and <code class="literal">{}</code> operators. | |
116 | </p> | |
117 | <p> | |
118 | The <code class="literal">*</code> operator will match the preceding atom zero or more | |
119 | times, for example the expression <code class="literal">a*b</code> will match any of | |
120 | the following: | |
121 | </p> | |
122 | <pre class="programlisting"><span class="identifier">b</span> | |
123 | <span class="identifier">ab</span> | |
124 | <span class="identifier">aaaaaaaab</span> | |
125 | </pre> | |
126 | <p> | |
127 | The <code class="literal">+</code> operator will match the preceding atom one or more | |
128 | times, for example the expression <code class="literal">a+b</code> will match any of | |
129 | the following: | |
130 | </p> | |
131 | <pre class="programlisting"><span class="identifier">ab</span> | |
132 | <span class="identifier">aaaaaaaab</span> | |
133 | </pre> | |
134 | <p> | |
135 | But will not match: | |
136 | </p> | |
137 | <pre class="programlisting"><span class="identifier">b</span> | |
138 | </pre> | |
139 | <p> | |
140 | The <code class="literal">?</code> operator will match the preceding atom zero or one | |
141 | times, for example the expression <code class="literal">ca?b</code> will match any of the following: | |
142 | </p> | |
143 | <pre class="programlisting"><span class="identifier">cb</span> | |
144 | <span class="identifier">cab</span> | |
145 | </pre> | |
146 | <p> | |
147 | But will not match: | |
148 | </p> | |
149 | <pre class="programlisting"><span class="identifier">caab</span> | |
150 | </pre> | |
151 | <p> | |
152 | An atom can also be repeated with a bounded repeat: | |
153 | </p> | |
154 | <p> | |
155 | <code class="literal">a{n}</code> Matches 'a' repeated exactly n times. | |
156 | </p> | |
157 | <p> | |
158 | <code class="literal">a{n,}</code> Matches 'a' repeated n or more times. | |
159 | </p> | |
160 | <p> | |
161 | <code class="literal">a{n, m}</code> Matches 'a' repeated between n and m times inclusive. | |
162 | </p> | |
163 | <p> | |
164 | For example: | |
165 | </p> | |
166 | <pre class="programlisting">^a{2,3}$</pre> | |
167 | <p> | |
168 | Will match either of: | |
169 | </p> | |
170 | <pre class="programlisting"><span class="identifier">aa</span> | |
171 | <span class="identifier">aaa</span> | |
172 | </pre> | |
173 | <p> | |
174 | But neither of: | |
175 | </p> | |
176 | <pre class="programlisting"><span class="identifier">a</span> | |
177 | <span class="identifier">aaaa</span> | |
178 | </pre> | |
179 | <p> | |
180 | Note that the "{" and "}" characters will be treated as | |
181 | ordinary literals when used in a context that is not a repeat: this matches | |
182 | Perl 5.x behavior. For example in the expressions "ab{1", "ab1}" | |
183 | and "a{b}c" the curly brackets are all treated as literals and | |
184 | <span class="emphasis"><em>no error will be raised</em></span>. | |
185 | </p> | |
186 | <p> | |
187 | It is an error to use a repeat operator, if the preceding construct can not | |
188 | be repeated, for example: | |
189 | </p> | |
190 | <pre class="programlisting"><span class="identifier">a</span><span class="special">(*)</span> | |
191 | </pre> | |
192 | <p> | |
193 | Will raise an error, as there is nothing for the <code class="literal">*</code> operator | |
194 | to be applied to. | |
195 | </p> | |
196 | <h5> | |
197 | <a name="boost_regex.syntax.perl_syntax.h7"></a> | |
198 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.non_greedy_repeats"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_greedy_repeats">Non | |
199 | greedy repeats</a> | |
200 | </h5> | |
201 | <p> | |
202 | The normal repeat operators are "greedy", that is to say they will | |
203 | consume as much input as possible. There are non-greedy versions available | |
204 | that will consume as little input as possible while still producing a match. | |
205 | </p> | |
206 | <p> | |
207 | <code class="literal">*?</code> Matches the previous atom zero or more times, while | |
208 | consuming as little input as possible. | |
209 | </p> | |
210 | <p> | |
211 | <code class="literal">+?</code> Matches the previous atom one or more times, while | |
212 | consuming as little input as possible. | |
213 | </p> | |
214 | <p> | |
215 | <code class="literal">??</code> Matches the previous atom zero or one times, while | |
216 | consuming as little input as possible. | |
217 | </p> | |
218 | <p> | |
219 | <code class="literal">{n,}?</code> Matches the previous atom n or more times, while | |
220 | consuming as little input as possible. | |
221 | </p> | |
222 | <p> | |
223 | <code class="literal">{n,m}?</code> Matches the previous atom between n and m times, | |
224 | while consuming as little input as possible. | |
225 | </p> | |
226 | <h5> | |
227 | <a name="boost_regex.syntax.perl_syntax.h8"></a> | |
228 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.possessive_repeats"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.possessive_repeats">Possessive | |
229 | repeats</a> | |
230 | </h5> | |
231 | <p> | |
232 | By default when a repeated pattern does not match then the engine will backtrack | |
233 | until a match is found. However, this behaviour can sometimes be undesirable, | |
234 | so there are also "possessive" repeats: these match as much as | |
235 | possible and do not then allow backtracking if the rest of the expression | |
236 | fails to match. | |
237 | </p> | |
238 | <p> | |
239 | <code class="literal">*+</code> Matches the previous atom zero or more times, while | |
240 | giving nothing back. | |
241 | </p> | |
242 | <p> | |
243 | <code class="literal">++</code> Matches the previous atom one or more times, while | |
244 | giving nothing back. | |
245 | </p> | |
246 | <p> | |
247 | <code class="literal">?+</code> Matches the previous atom zero or one times, while | |
248 | giving nothing back. | |
249 | </p> | |
250 | <p> | |
251 | <code class="literal">{n,}+</code> Matches the previous atom n or more times, while | |
252 | giving nothing back. | |
253 | </p> | |
254 | <p> | |
255 | <code class="literal">{n,m}+</code> Matches the previous atom between n and m times, | |
256 | while giving nothing back. | |
257 | </p> | |
258 | <h5> | |
259 | <a name="boost_regex.syntax.perl_syntax.h9"></a> | |
260 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.back_references"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.back_references">Back | |
261 | references</a> | |
262 | </h5> | |
263 | <p> | |
264 | An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span> | |
265 | is in the range 1-9, matches the same string that was matched by sub-expression | |
266 | <span class="emphasis"><em>n</em></span>. For example the expression: | |
267 | </p> | |
268 | <pre class="programlisting">^(a+)b+\1$</pre> | |
269 | <p> | |
270 | Will match the string: | |
271 | </p> | |
272 | <pre class="programlisting"><span class="identifier">aaabbaaa</span> | |
273 | </pre> | |
274 | <p> | |
275 | But not the string: | |
276 | </p> | |
277 | <pre class="programlisting"><span class="identifier">aaabba</span> | |
278 | </pre> | |
279 | <p> | |
280 | You can also use the \g escape for the same function, for example: | |
281 | </p> | |
282 | <div class="informaltable"><table class="table"> | |
283 | <colgroup> | |
284 | <col> | |
285 | <col> | |
286 | </colgroup> | |
287 | <thead><tr> | |
288 | <th> | |
289 | <p> | |
290 | Escape | |
291 | </p> | |
292 | </th> | |
293 | <th> | |
294 | <p> | |
295 | Meaning | |
296 | </p> | |
297 | </th> | |
298 | </tr></thead> | |
299 | <tbody> | |
300 | <tr> | |
301 | <td> | |
302 | <p> | |
303 | <code class="literal">\g1</code> | |
304 | </p> | |
305 | </td> | |
306 | <td> | |
307 | <p> | |
308 | Match whatever matched sub-expression 1 | |
309 | </p> | |
310 | </td> | |
311 | </tr> | |
312 | <tr> | |
313 | <td> | |
314 | <p> | |
315 | <code class="literal">\g{1}</code> | |
316 | </p> | |
317 | </td> | |
318 | <td> | |
319 | <p> | |
320 | Match whatever matched sub-expression 1: this form allows for safer | |
321 | parsing of the expression in cases like <code class="literal">\g{1}2</code> | |
322 | or for indexes higher than 9 as in <code class="literal">\g{1234}</code> | |
323 | </p> | |
324 | </td> | |
325 | </tr> | |
326 | <tr> | |
327 | <td> | |
328 | <p> | |
329 | <code class="literal">\g-1</code> | |
330 | </p> | |
331 | </td> | |
332 | <td> | |
333 | <p> | |
334 | Match whatever matched the last opened sub-expression | |
335 | </p> | |
336 | </td> | |
337 | </tr> | |
338 | <tr> | |
339 | <td> | |
340 | <p> | |
341 | <code class="literal">\g{-2}</code> | |
342 | </p> | |
343 | </td> | |
344 | <td> | |
345 | <p> | |
346 | Match whatever matched the last but one opened sub-expression | |
347 | </p> | |
348 | </td> | |
349 | </tr> | |
350 | <tr> | |
351 | <td> | |
352 | <p> | |
353 | <code class="literal">\g{one}</code> | |
354 | </p> | |
355 | </td> | |
356 | <td> | |
357 | <p> | |
358 | Match whatever matched the sub-expression named "one" | |
359 | </p> | |
360 | </td> | |
361 | </tr> | |
362 | </tbody> | |
363 | </table></div> | |
364 | <p> | |
365 | Finally the \k escape can be used to refer to named subexpressions, for example | |
366 | <code class="literal">\k&lt;two&gt;</code> will match whatever matched the subexpression | |
367 | named "two". | |
368 | </p> | |
369 | <h5> | |
370 | <a name="boost_regex.syntax.perl_syntax.h10"></a> | |
371 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.alternation"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.alternation">Alternation</a> | |
372 | </h5> | |
373 | <p> | |
374 | The <code class="literal">|</code> operator will match either of its arguments, so | |
375 | for example: <code class="literal">abc|def</code> will match either "abc" | |
376 | or "def". | |
377 | </p> | |
378 | <p> | |
379 | Parentheses can be used to group alternations, for example: <code class="literal">ab(d|ef)</code> | |
380 | will match either of "abd" or "abef". | |
381 | </p> | |
382 | <p> | |
383 | Empty alternatives are not allowed (these are almost always a mistake), but | |
384 | if you really want an empty alternative use <code class="literal">(?:)</code> as a | |
385 | placeholder, for example: | |
386 | </p> | |
387 | <p> | |
388 | <code class="literal">|abc</code> is not a valid expression, but | |
389 | </p> | |
390 | <p> | |
391 | <code class="literal">(?:)|abc</code> is valid and expresses the intended meaning. The expression: | |
392 | </p> | |
393 | <p> | |
394 | <code class="literal">(?:abc)??</code> has exactly the same effect. | |
395 | </p> | |
396 | <h5> | |
397 | <a name="boost_regex.syntax.perl_syntax.h11"></a> | |
398 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_sets"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets">Character | |
399 | sets</a> | |
400 | </h5> | |
401 | <p> | |
402 | A character set is a bracket-expression starting with <code class="literal">[</code> and ending | |
403 | with <code class="literal">]</code>; it defines a set of characters, and matches | |
404 | any single character that is a member of that set. | |
405 | </p> | |
406 | <p> | |
407 | A bracket expression may contain any combination of the following: | |
408 | </p> | |
409 | <h6> | |
410 | <a name="boost_regex.syntax.perl_syntax.h12"></a> | |
411 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.single_characters"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.single_characters">Single | |
412 | characters</a> | |
413 | </h6> | |
414 | <p> | |
415 | For example <code class="literal">[abc]</code>, will match any of the characters 'a', | |
416 | 'b', or 'c'. | |
417 | </p> | |
418 | <h6> | |
419 | <a name="boost_regex.syntax.perl_syntax.h13"></a> | |
420 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_ranges"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_ranges">Character | |
421 | ranges</a> | |
422 | </h6> | |
423 | <p> | |
424 | For example <code class="literal">[a-c]</code> will match any single character in the | |
425 | range 'a' to 'c'. By default, for Perl regular expressions, a character x | |
426 | is within the range y to z, if the code point of the character lies within | |
427 | the code points of the endpoints of the range. Alternatively, if you set the | |
428 | <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions"><code class="literal">collate</code> | |
429 | flag</a> when constructing the regular expression, then ranges are locale | |
430 | sensitive. | |
431 | </p> | |
432 | <h6> | |
433 | <a name="boost_regex.syntax.perl_syntax.h14"></a> | |
434 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.negation"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.negation">Negation</a> | |
435 | </h6> | |
436 | <p> | |
437 | If the bracket-expression begins with the ^ character, then it matches the | |
438 | complement of the characters it contains, for example <code class="literal">[^a-c]</code> | |
439 | matches any character that is not in the range <code class="literal">a-c</code>. | |
440 | </p> | |
441 | <h6> | |
442 | <a name="boost_regex.syntax.perl_syntax.h15"></a> | |
443 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_classes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_classes">Character | |
444 | classes</a> | |
445 | </h6> | |
446 | <p> | |
447 | An expression of the form <code class="literal">[[:name:]]</code> matches the named | |
448 | character class "name", for example <code class="literal">[[:lower:]]</code> | |
449 | matches any lower case character. See <a class="link" href="character_classes.html" title="Character Class Names">character | |
450 | class names</a>. | |
451 | </p> | |
452 | <h6> | |
453 | <a name="boost_regex.syntax.perl_syntax.h16"></a> | |
454 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.collating_elements"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.collating_elements">Collating | |
455 | Elements</a> | |
456 | </h6> | |
457 | <p> | |
458 | An expression of the form <code class="literal">[[.col.]]</code> matches the collating | |
459 | element <span class="emphasis"><em>col</em></span>. A collating element is any single character, | |
460 | or any sequence of characters that collates as a single unit. Collating elements | |
461 | may also be used as the end point of a range, for example: <code class="literal">[[.ae.]-c]</code> | |
462 | matches the character sequence "ae", plus any single character | |
463 | in the range "ae"-c, assuming that "ae" is treated as | |
464 | a single collating element in the current locale. | |
465 | </p> | |
466 | <p> | |
467 | As an extension, a collating element may also be specified via its <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>, for example: | |
468 | </p> | |
469 | <pre class="programlisting"><span class="special">[[.</span><span class="identifier">NUL</span><span class="special">.]]</span> | |
470 | </pre> | |
471 | <p> | |
472 | matches a <code class="literal">\0</code> character. | |
473 | </p> | |
474 | <h6> | |
475 | <a name="boost_regex.syntax.perl_syntax.h17"></a> | |
476 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.equivalence_classes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.equivalence_classes">Equivalence | |
477 | classes</a> | |
478 | </h6> | |
479 | <p> | |
480 | An expression of the form <code class="literal">[[=col=]]</code>, matches any character | |
481 | or collating element whose primary sort key is the same as that for collating | |
482 | element <span class="emphasis"><em>col</em></span>, as with collating elements the name <span class="emphasis"><em>col</em></span> | |
483 | may be a <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>. | |
484 | A primary sort key is one that ignores case, accentuation, or locale-specific | |
485 | tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches | |
486 | any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation | |
487 | of this is reliant on the platform's collation and localisation support; | |
488 | this feature can not be relied upon to work portably across all platforms, | |
489 | or even all locales on one platform. | |
490 | </p> | |
491 | <h6> | |
492 | <a name="boost_regex.syntax.perl_syntax.h18"></a> | |
493 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.escaped_characters"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.escaped_characters">Escaped | |
494 | Characters</a> | |
495 | </h6> | |
496 | <p> | |
497 | All the escape sequences that match a single character, or a single character | |
498 | class are permitted within a character class definition. For example <code class="computeroutput"><span class="special">[\[\]]</span></code> would match either of <code class="computeroutput"><span class="special">[</span></code> or <code class="computeroutput"><span class="special">]</span></code> | |
499 | while <code class="computeroutput"><span class="special">[\</span><span class="identifier">W</span><span class="special">\</span><span class="identifier">d</span><span class="special">]</span></code> | |
500 | would match any character that is either a "digit", <span class="emphasis"><em>or</em></span> | |
501 | is <span class="emphasis"><em>not</em></span> a "word" character. | |
502 | </p> | |
503 | <h6> | |
504 | <a name="boost_regex.syntax.perl_syntax.h19"></a> | |
505 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.combinations"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.combinations">Combinations</a> | |
506 | </h6> | |
507 | <p> | |
508 | All of the above can be combined in one character set declaration, for example: | |
509 | <code class="literal">[[:digit:]a-c[.NUL.]]</code>. | |
510 | </p> | |
511 | <h5> | |
512 | <a name="boost_regex.syntax.perl_syntax.h20"></a> | |
513 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.escapes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.escapes">Escapes</a> | |
514 | </h5> | |
515 | <p> | |
516 | Any special character preceded by an escape shall match itself. | |
517 | </p> | |
518 | <p> | |
519 | The following escape sequences are all synonyms for single characters: | |
520 | </p> | |
521 | <div class="informaltable"><table class="table"> | |
522 | <colgroup> | |
523 | <col> | |
524 | <col> | |
525 | </colgroup> | |
526 | <thead><tr> | |
527 | <th> | |
528 | <p> | |
529 | Escape | |
530 | </p> | |
531 | </th> | |
532 | <th> | |
533 | <p> | |
534 | Character | |
535 | </p> | |
536 | </th> | |
537 | </tr></thead> | |
538 | <tbody> | |
539 | <tr> | |
540 | <td> | |
541 | <p> | |
542 | <code class="literal">\a</code> | |
543 | </p> | |
544 | </td> | |
545 | <td> | |
546 | <p> | |
547 | <code class="literal">\a</code> | |
548 | </p> | |
549 | </td> | |
550 | </tr> | |
551 | <tr> | |
552 | <td> | |
553 | <p> | |
554 | <code class="literal">\e</code> | |
555 | </p> | |
556 | </td> | |
557 | <td> | |
558 | <p> | |
559 | <code class="literal">0x1B</code> | |
560 | </p> | |
561 | </td> | |
562 | </tr> | |
563 | <tr> | |
564 | <td> | |
565 | <p> | |
566 | <code class="literal">\f</code> | |
567 | </p> | |
568 | </td> | |
569 | <td> | |
570 | <p> | |
571 | <code class="literal">\f</code> | |
572 | </p> | |
573 | </td> | |
574 | </tr> | |
575 | <tr> | |
576 | <td> | |
577 | <p> | |
578 | <code class="literal">\n</code> | |
579 | </p> | |
580 | </td> | |
581 | <td> | |
582 | <p> | |
583 | <code class="literal">\n</code> | |
584 | </p> | |
585 | </td> | |
586 | </tr> | |
587 | <tr> | |
588 | <td> | |
589 | <p> | |
590 | <code class="literal">\r</code> | |
591 | </p> | |
592 | </td> | |
593 | <td> | |
594 | <p> | |
595 | <code class="literal">\r</code> | |
596 | </p> | |
597 | </td> | |
598 | </tr> | |
599 | <tr> | |
600 | <td> | |
601 | <p> | |
602 | <code class="literal">\t</code> | |
603 | </p> | |
604 | </td> | |
605 | <td> | |
606 | <p> | |
607 | <code class="literal">\t</code> | |
608 | </p> | |
609 | </td> | |
610 | </tr> | |
611 | <tr> | |
612 | <td> | |
613 | <p> | |
614 | <code class="literal">\v</code> | |
615 | </p> | |
616 | </td> | |
617 | <td> | |
618 | <p> | |
619 | <code class="literal">\v</code> | |
620 | </p> | |
621 | </td> | |
622 | </tr> | |
623 | <tr> | |
624 | <td> | |
625 | <p> | |
626 | <code class="literal">\b</code> | |
627 | </p> | |
628 | </td> | |
629 | <td> | |
630 | <p> | |
631 | <code class="literal">\b</code> (but only inside a character class declaration). | |
632 | </p> | |
633 | </td> | |
634 | </tr> | |
635 | <tr> | |
636 | <td> | |
637 | <p> | |
638 | <code class="literal">\cX</code> | |
639 | </p> | |
640 | </td> | |
641 | <td> | |
642 | <p> | |
643 | An ASCII escape sequence - the character whose code point is X | |
644 | % 32 | |
645 | </p> | |
646 | </td> | |
647 | </tr> | |
648 | <tr> | |
649 | <td> | |
650 | <p> | |
651 | <code class="literal">\xdd</code> | |
652 | </p> | |
653 | </td> | |
654 | <td> | |
655 | <p> | |
656 | A hexadecimal escape sequence - matches the single character whose | |
657 | code point is 0xdd. | |
658 | </p> | |
659 | </td> | |
660 | </tr> | |
661 | <tr> | |
662 | <td> | |
663 | <p> | |
664 | <code class="literal">\x{dddd}</code> | |
665 | </p> | |
666 | </td> | |
667 | <td> | |
668 | <p> | |
669 | A hexadecimal escape sequence - matches the single character whose | |
670 | code point is 0xdddd. | |
671 | </p> | |
672 | </td> | |
673 | </tr> | |
674 | <tr> | |
675 | <td> | |
676 | <p> | |
677 | <code class="literal">\0ddd</code> | |
678 | </p> | |
679 | </td> | |
680 | <td> | |
681 | <p> | |
682 | An octal escape sequence - matches the single character whose code | |
683 | point is 0ddd. | |
684 | </p> | |
685 | </td> | |
686 | </tr> | |
687 | <tr> | |
688 | <td> | |
689 | <p> | |
690 | <code class="literal">\N{name}</code> | |
691 | </p> | |
692 | </td> | |
693 | <td> | |
694 | <p> | |
695 | Matches the single character which has the <a class="link" href="collating_names.html" title="Collating Names">symbolic | |
696 | name</a> <span class="emphasis"><em>name</em></span>. For example <code class="literal">\N{newline}</code> | |
697 | matches the single character \n. | |
698 | </p> | |
699 | </td> | |
700 | </tr> | |
701 | </tbody> | |
702 | </table></div> | |
703 | <h6> | |
704 | <a name="boost_regex.syntax.perl_syntax.h21"></a> | |
705 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.single_character_character_class"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.single_character_character_class">"Single | |
706 | character" character classes:</a> | |
707 | </h6> | |
708 | <p> | |
709 | Any escaped character <span class="emphasis"><em>x</em></span>, where <span class="emphasis"><em>x</em></span> is | |
710 | the name of a character class, matches any character that is a member | |
711 | of that class. The corresponding upper case escape <span class="emphasis"><em>X</em></span> | |
712 | matches any character that is not in that class. | |
713 | </p> | |
714 | <p> | |
715 | The following are supported by default: | |
716 | </p> | |
717 | <div class="informaltable"><table class="table"> | |
718 | <colgroup> | |
719 | <col> | |
720 | <col> | |
721 | </colgroup> | |
722 | <thead><tr> | |
723 | <th> | |
724 | <p> | |
725 | Escape sequence | |
726 | </p> | |
727 | </th> | |
728 | <th> | |
729 | <p> | |
730 | Equivalent to | |
731 | </p> | |
732 | </th> | |
733 | </tr></thead> | |
734 | <tbody> | |
735 | <tr> | |
736 | <td> | |
737 | <p> | |
738 | <code class="computeroutput"><span class="special">\</span><span class="identifier">d</span></code> | |
739 | </p> | |
740 | </td> | |
741 | <td> | |
742 | <p> | |
743 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]]</span></code> | |
744 | </p> | |
745 | </td> | |
746 | </tr> | |
747 | <tr> | |
748 | <td> | |
749 | <p> | |
750 | <code class="computeroutput"><span class="special">\</span><span class="identifier">l</span></code> | |
751 | </p> | |
752 | </td> | |
753 | <td> | |
754 | <p> | |
755 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> | |
756 | </p> | |
757 | </td> | |
758 | </tr> | |
759 | <tr> | |
760 | <td> | |
761 | <p> | |
762 | <code class="computeroutput"><span class="special">\</span><span class="identifier">s</span></code> | |
763 | </p> | |
764 | </td> | |
765 | <td> | |
766 | <p> | |
767 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">space</span><span class="special">:]]</span></code> | |
768 | </p> | |
769 | </td> | |
770 | </tr> | |
771 | <tr> | |
772 | <td> | |
773 | <p> | |
774 | <code class="computeroutput"><span class="special">\</span><span class="identifier">u</span></code> | |
775 | </p> | |
776 | </td> | |
777 | <td> | |
778 | <p> | |
779 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">upper</span><span class="special">:]]</span></code> | |
780 | </p> | |
781 | </td> | |
782 | </tr> | |
783 | <tr> | |
784 | <td> | |
785 | <p> | |
786 | <code class="computeroutput"><span class="special">\</span><span class="identifier">w</span></code> | |
787 | </p> | |
788 | </td> | |
789 | <td> | |
790 | <p> | |
791 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">word</span><span class="special">:]]</span></code> | |
792 | </p> | |
793 | </td> | |
794 | </tr> | |
795 | <tr> | |
796 | <td> | |
797 | <p> | |
798 | <code class="computeroutput"><span class="special">\</span><span class="identifier">h</span></code> | |
799 | </p> | |
800 | </td> | |
801 | <td> | |
802 | <p> | |
803 | Horizontal whitespace | |
804 | </p> | |
805 | </td> | |
806 | </tr> | |
807 | <tr> | |
808 | <td> | |
809 | <p> | |
810 | <code class="computeroutput"><span class="special">\</span><span class="identifier">v</span></code> | |
811 | </p> | |
812 | </td> | |
813 | <td> | |
814 | <p> | |
815 | Vertical whitespace | |
816 | </p> | |
817 | </td> | |
818 | </tr> | |
819 | <tr> | |
820 | <td> | |
821 | <p> | |
822 | <code class="computeroutput"><span class="special">\</span><span class="identifier">D</span></code> | |
823 | </p> | |
824 | </td> | |
825 | <td> | |
826 | <p> | |
827 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">digit</span><span class="special">:]]</span></code> | |
828 | </p> | |
829 | </td> | |
830 | </tr> | |
831 | <tr> | |
832 | <td> | |
833 | <p> | |
834 | <code class="computeroutput"><span class="special">\</span><span class="identifier">L</span></code> | |
835 | </p> | |
836 | </td> | |
837 | <td> | |
838 | <p> | |
839 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> | |
840 | </p> | |
841 | </td> | |
842 | </tr> | |
843 | <tr> | |
844 | <td> | |
845 | <p> | |
846 | <code class="computeroutput"><span class="special">\</span><span class="identifier">S</span></code> | |
847 | </p> | |
848 | </td> | |
849 | <td> | |
850 | <p> | |
851 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">space</span><span class="special">:]]</span></code> | |
852 | </p> | |
853 | </td> | |
854 | </tr> | |
855 | <tr> | |
856 | <td> | |
857 | <p> | |
858 | <code class="computeroutput"><span class="special">\</span><span class="identifier">U</span></code> | |
859 | </p> | |
860 | </td> | |
861 | <td> | |
862 | <p> | |
863 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">upper</span><span class="special">:]]</span></code> | |
864 | </p> | |
865 | </td> | |
866 | </tr> | |
867 | <tr> | |
868 | <td> | |
869 | <p> | |
870 | <code class="computeroutput"><span class="special">\</span><span class="identifier">W</span></code> | |
871 | </p> | |
872 | </td> | |
873 | <td> | |
874 | <p> | |
875 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">word</span><span class="special">:]]</span></code> | |
876 | </p> | |
877 | </td> | |
878 | </tr> | |
879 | <tr> | |
880 | <td> | |
881 | <p> | |
882 | <code class="computeroutput"><span class="special">\</span><span class="identifier">H</span></code> | |
883 | </p> | |
884 | </td> | |
885 | <td> | |
886 | <p> | |
887 | Not Horizontal whitespace | |
888 | </p> | |
889 | </td> | |
890 | </tr> | |
891 | <tr> | |
892 | <td> | |
893 | <p> | |
894 | <code class="computeroutput"><span class="special">\</span><span class="identifier">V</span></code> | |
895 | </p> | |
896 | </td> | |
897 | <td> | |
898 | <p> | |
899 | Not Vertical whitespace | |
900 | </p> | |
901 | </td> | |
902 | </tr> | |
903 | </tbody> | |
904 | </table></div> | |
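<p>
A minimal sketch of a few of the equivalences in the table above. It uses the standard library's <code class="literal">std::regex</code> (ECMAScript grammar) as a stand-in, on the assumption that it treats <code class="literal">\d</code>, <code class="literal">\D</code> and <code class="literal">\s</code> the same way as the Perl syntax described here; escapes such as <code class="literal">\l</code>, <code class="literal">\u</code>, <code class="literal">\h</code> and <code class="literal">\v</code> are not covered. The helper names <code class="literal">full_match</code> and <code class="literal">same_results</code> are hypothetical.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the whole of `text` matches `pattern`.
// NOTE: std::regex is an assumed stand-in for boost::regex in this sketch.
bool full_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Returns true if the two patterns agree on every sample string below.
bool same_results(const std::string& a, const std::string& b) {
    const char* samples[] = {"7", "x", " ", "_", "42", ""};
    for (const char* s : samples) {
        if (full_match(a, s) != full_match(b, s)) return false;
    }
    return true;
}
```

<p>
For instance, <code class="literal">same_results("\\d", "[[:digit:]]")</code> compares the escape with its bracketed form over the sample strings; a larger sample set would make the check more convincing.
</p>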
905 | <h6> | |
906 | <a name="boost_regex.syntax.perl_syntax.h22"></a> | |
907 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_properties"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_properties">Character | |
908 | Properties</a> | |
909 | </h6> | |
910 | <p> | |
911 | The character property names in the following table are all equivalent to | |
912 | the <a class="link" href="character_classes.html" title="Character Class Names">names used in character | |
913 | classes</a>. | |
914 | </p> | |
915 | <div class="informaltable"><table class="table"> | |
916 | <colgroup> | |
917 | <col> | |
918 | <col> | |
919 | <col> | |
920 | </colgroup> | |
921 | <thead><tr> | |
922 | <th> | |
923 | <p> | |
924 | Form | |
925 | </p> | |
926 | </th> | |
927 | <th> | |
928 | <p> | |
929 | Description | |
930 | </p> | |
931 | </th> | |
932 | <th> | |
933 | <p> | |
934 | Equivalent character set form | |
935 | </p> | |
936 | </th> | |
937 | </tr></thead> | |
938 | <tbody> | |
939 | <tr> | |
940 | <td> | |
941 | <p> | |
942 | <code class="computeroutput"><span class="special">\</span><span class="identifier">pX</span></code> | |
943 | </p> | |
944 | </td> | |
945 | <td> | |
946 | <p> | |
947 | Matches any character that has the property X. | |
948 | </p> | |
949 | </td> | |
950 | <td> | |
951 | <p> | |
952 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">X</span><span class="special">:]]</span></code> | |
953 | </p> | |
954 | </td> | |
955 | </tr> | |
956 | <tr> | |
957 | <td> | |
958 | <p> | |
959 | <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code> | |
960 | </p> | |
961 | </td> | |
962 | <td> | |
963 | <p> | |
964 | Matches any character that has the property Name. | |
965 | </p> | |
966 | </td> | |
967 | <td> | |
968 | <p> | |
969 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Name</span><span class="special">:]]</span></code> | |
970 | </p> | |
971 | </td> | |
972 | </tr> | |
973 | <tr> | |
974 | <td> | |
975 | <p> | |
976 | <code class="computeroutput"><span class="special">\</span><span class="identifier">PX</span></code> | |
977 | </p> | |
978 | </td> | |
979 | <td> | |
980 | <p> | |
981 | Matches any character that does not have the property X. | |
982 | </p> | |
983 | </td> | |
984 | <td> | |
985 | <p> | |
986 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">X</span><span class="special">:]]</span></code> | |
987 | </p> | |
988 | </td> | |
989 | </tr> | |
990 | <tr> | |
991 | <td> | |
992 | <p> | |
993 | <code class="computeroutput"><span class="special">\</span><span class="identifier">P</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code> | |
994 | </p> | |
995 | </td> | |
996 | <td> | |
997 | <p> | |
998 | Matches any character that does not have the property Name. | |
999 | </p> | |
1000 | </td> | |
1001 | <td> | |
1002 | <p> | |
1003 | <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">Name</span><span class="special">:]]</span></code> | |
1004 | </p> | |
1005 | </td> | |
1006 | </tr> | |
1007 | </tbody> | |
1008 | </table></div> | |
1009 | <p> | |
1010 | For example <code class="literal">\pd</code> matches any "digit" character, | |
1011 | as does <code class="literal">\p{digit}</code>. | |
1012 | </p> | |
1013 | <h6> | |
1014 | <a name="boost_regex.syntax.perl_syntax.h23"></a> | |
1015 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.word_boundaries"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.word_boundaries">Word | |
1016 | Boundaries</a> | |
1017 | </h6> | |
1018 | <p> | |
1019 | The following escape sequences match the boundaries of words: | |
1020 | </p> | |
1021 | <p> | |
1022 | <code class="literal">\&lt;</code> Matches the start of a word. | |
1023 | </p> | |
1024 | <p> | |
1025 | <code class="literal">\&gt;</code> Matches the end of a word. | |
1026 | </p> | |
1027 | <p> | |
1028 | <code class="literal">\b</code> Matches a word boundary (the start or end of a word). | |
1029 | </p> | |
1030 | <p> | |
1031 | <code class="literal">\B</code> Matches only when not at a word boundary. | |
1032 | </p> | |
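<p>
The word boundary assertion can be sketched as follows. This is an illustration only: it uses <code class="literal">std::regex</code> as a stand-in (its ECMAScript grammar supports <code class="literal">\b</code> and <code class="literal">\B</code> but not the start-of-word and end-of-word escapes), and the function name <code class="literal">count_whole_word</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts occurrences of `word` in `text` that stand alone as whole words.
// Assumption: `word` contains only letters, so it needs no escaping.
// NOTE: std::regex is an assumed stand-in for boost::regex in this sketch.
std::size_t count_whole_word(const std::string& text, const std::string& word) {
    const std::regex re("\\b" + word + "\\b");
    const std::sregex_iterator first(text.begin(), text.end(), re);
    const std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

<p>
Without the two <code class="literal">\b</code> assertions the same search would also find the word embedded inside longer words.
</p>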
1033 | <h6> | |
1034 | <a name="boost_regex.syntax.perl_syntax.h24"></a> | |
1035 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.buffer_boundaries"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.buffer_boundaries">Buffer | |
1036 | boundaries</a> | |
1037 | </h6> | |
1038 | <p> | |
1039 | The following match only at buffer boundaries: a "buffer" in this | |
1040 | context is the whole of the input text that is being matched against (note | |
1041 | that ^ and $ may match embedded newlines within the text). | |
1042 | </p> | |
1043 | <p> | |
1044 | <code class="literal">\`</code> Matches at the start of a buffer only. | |
1045 | </p> | |
1046 | <p> | |
1047 | <code class="literal">\'</code> Matches at the end of a buffer only. | |
1048 | </p> | |
1049 | <p> | |
1050 | <code class="literal">\A</code> Matches at the start of a buffer only (the same as <code class="literal">\`</code>). | |
1051 | </p> | |
1052 | <p> | |
1053 | <code class="literal">\z</code> Matches at the end of a buffer only (the same as <code class="literal">\'</code>). | |
1054 | </p> | |
1055 | <p> | |
1056 | \Z Matches a zero-width assertion consisting of an optional sequence of newlines | |
1057 | at the end of a buffer: equivalent to the regular expression <code class="literal">(?=\v*\z)</code>. | |
1058 | Note that this is subtly different from Perl which behaves as if matching | |
1059 | <code class="literal">(?=\n?\z)</code>. | |
1060 | </p> | |
1061 | <h6> | |
1062 | <a name="boost_regex.syntax.perl_syntax.h25"></a> | |
1063 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.continuation_escape"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.continuation_escape">Continuation | |
1064 | Escape</a> | |
1065 | </h6> | |
1066 | <p> | |
1067 | The sequence <code class="literal">\G</code> matches only at the end of the last match | |
1068 | found, or at the start of the text being matched if no previous match was | |
1069 | found. This escape is useful if you're iterating over the matches contained | |
1070 | within a text, and you want each subsequent match to start where the last | |
1071 | one ended. | |
1072 | </p> | |
1073 | <h6> | |
1074 | <a name="boost_regex.syntax.perl_syntax.h26"></a> | |
1075 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.quoting_escape"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.quoting_escape">Quoting | |
1076 | escape</a> | |
1077 | </h6> | |
1078 | <p> | |
1079 | The escape sequence <code class="literal">\Q</code> begins a "quoted sequence": | |
1080 | all the subsequent characters are treated as literals, until either the end | |
1081 | of the regular expression or \E is found. For example the expression: <code class="literal">\Q*+\Ea+</code> | |
1082 | would match either of: | |
1083 | </p> | |
1084 | <pre class="programlisting"><span class="special">*+</span><span class="identifier">a</span> | |
1085 | <span class="special">*+</span><span class="identifier">aaa</span> | |
1086 | </pre> | |
1087 | <h6> | |
1088 | <a name="boost_regex.syntax.perl_syntax.h27"></a> | |
1089 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.unicode_escapes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.unicode_escapes">Unicode | |
1090 | escapes</a> | |
1091 | </h6> | |
1092 | <p> | |
1093 | <code class="literal">\C</code> Matches a single code point: in Boost regex this has | |
1094 | exactly the same effect as a "." operator. <code class="literal">\X</code> | |
1095 | Matches a combining character sequence: that is any non-combining character | |
1096 | followed by a sequence of zero or more combining characters. | |
1097 | </p> | |
1098 | <h6> | |
1099 | <a name="boost_regex.syntax.perl_syntax.h28"></a> | |
1100 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.matching_line_endings"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.matching_line_endings">Matching Line | |
1101 | Endings</a> | |
1102 | </h6> | |
1103 | <p> | |
1104 | The escape sequence <code class="literal">\R</code> matches any line ending character | |
1105 | sequence; specifically, it is identical to the expression <code class="literal">(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])</code>. | |
1106 | </p> | |
1107 | <h6> | |
1108 | <a name="boost_regex.syntax.perl_syntax.h29"></a> | |
1109 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.keeping_back_some_text"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.keeping_back_some_text">Keeping back | |
1110 | some text</a> | |
1111 | </h6> | |
1112 | <p> | |
1113 | <code class="literal">\K</code> Resets the start location of $0 to the current text | |
1114 | position: in other words everything to the left of \K is "kept back" | |
1115 | and does not form part of the regular expression match. $` is updated accordingly. | |
1116 | </p> | |
1117 | <p> | |
1118 | For example <code class="literal">foo\Kbar</code> matched against the text "foobar" | |
1119 | would return the match "bar" for $0 and "foo" for $`. | |
1120 | This can be used to simulate variable width lookbehind assertions. | |
1121 | </p> | |
1122 | <h6> | |
1123 | <a name="boost_regex.syntax.perl_syntax.h30"></a> | |
1124 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.any_other_escape"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.any_other_escape">Any | |
1125 | other escape</a> | |
1126 | </h6> | |
1127 | <p> | |
1128 | Any other escape sequence matches the character that is escaped; for example, | |
1129 | <code class="literal">\@</code> matches a literal '@'. | |
1130 | </p> | |
1131 | <h5> | |
1132 | <a name="boost_regex.syntax.perl_syntax.h31"></a> | |
1133 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.perl_extended_patterns"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.perl_extended_patterns">Perl Extended | |
1134 | Patterns</a> | |
1135 | </h5> | |
1136 | <p> | |
1137 | Perl-specific extensions to the regular expression syntax all start with | |
1138 | <code class="literal">(?</code>. | |
1139 | </p> | |
1140 | <h6> | |
1141 | <a name="boost_regex.syntax.perl_syntax.h32"></a> | |
1142 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.named_subexpressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.named_subexpressions">Named | |
1143 | Subexpressions</a> | |
1144 | </h6> | |
1145 | <p> | |
1146 | You can create a named subexpression using: | |
1147 | </p> | |
1148 | <pre class="programlisting"><span class="special">(?<</span><span class="identifier">NAME</span><span class="special">></span><span class="identifier">expression</span><span class="special">)</span> | |
1149 | </pre> | |
1150 | <p> | |
1151 | The subexpression can then be referred to by the name <span class="emphasis"><em>NAME</em></span>. Alternatively, | |
1152 | you can delimit the name with single quotes, as in: | |
1153 | </p> | |
1154 | <pre class="programlisting"><span class="special">(?</span><span class="char">'NAME'</span><span class="identifier">expression</span><span class="special">)</span> | |
1155 | </pre> | |
1156 | <p> | |
1157 | These named subexpressions can be referred to in a backreference using either | |
1158 | <code class="literal">\g{NAME}</code> or <code class="literal">\k<NAME></code> and can | |
1159 | also be referred to by name in a <a class="link" href="../format/perl_format.html" title="Perl Format String Syntax">Perl</a> | |
1160 | format string for search and replace operations, or in the <a class="link" href="../ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a> member functions. | |
1161 | </p> | |
1162 | <h6> | |
1163 | <a name="boost_regex.syntax.perl_syntax.h33"></a> | |
1164 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.comments"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.comments">Comments</a> | |
1165 | </h6> | |
1166 | <p> | |
1167 | <code class="literal">(?# ... )</code> is treated as a comment; its contents are ignored. | |
1168 | </p> | |
1169 | <h6> | |
1170 | <a name="boost_regex.syntax.perl_syntax.h34"></a> | |
1171 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.modifiers"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.modifiers">Modifiers</a> | |
1172 | </h6> | |
1173 | <p> | |
1174 | <code class="literal">(?imsx-imsx ... )</code> alters which of the Perl modifiers are | |
1175 | in effect within the pattern. Changes take effect from the point that the | |
1176 | block is first seen and extend to any enclosing <code class="literal">)</code>. Letters | |
1177 | before a '-' turn that Perl modifier on; letters after it turn the modifier off. | |
1178 | </p> | |
1179 | <p> | |
1180 | <code class="literal">(?imsx-imsx:pattern)</code> applies the specified modifiers to | |
1181 | pattern only. | |
1182 | </p> | |
1183 | <h6> | |
1184 | <a name="boost_regex.syntax.perl_syntax.h35"></a> | |
1185 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.non_marking_groups"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_marking_groups">Non-marking | |
1186 | groups</a> | |
1187 | </h6> | |
1188 | <p> | |
1189 | <code class="literal">(?:pattern)</code> lexically groups pattern, without generating | |
1190 | an additional sub-expression. | |
1191 | </p> | |
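<p>
A small sketch of the difference between marking and non-marking groups. It assumes <code class="literal">std::regex</code> as a stand-in, since its ECMAScript grammar also supports <code class="literal">(?:pattern)</code>; the helper names <code class="literal">marked_groups</code> and <code class="literal">first_capture</code> are hypothetical.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Number of marked (capturing) sub-expressions declared in `pattern`.
// NOTE: std::regex is an assumed stand-in for boost::regex in this sketch.
unsigned marked_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}

// Text captured by sub-expression 1 when `pattern` matches all of `text`;
// returns an empty string if there is no match or no such sub-expression.
std::string first_capture(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (!std::regex_match(text, m, std::regex(pattern)) || m.size() < 2) {
        return "";
    }
    return m[1].str();
}
```

<p>
With <code class="literal">(?:ab)+(c)</code> the only marked sub-expression is <code class="literal">(c)</code>, so it is numbered 1; with <code class="literal">(ab)+(c)</code> it would be numbered 2.
</p>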
1192 | <h6> | |
1193 | <a name="boost_regex.syntax.perl_syntax.h36"></a> | |
1194 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.branch_reset"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset">Branch | |
1195 | reset</a> | |
1196 | </h6> | |
1197 | <p> | |
1198 | <code class="literal">(?|pattern)</code> resets the subexpression count at the start | |
1199 | of each "|" alternative within <span class="emphasis"><em>pattern</em></span>. | |
1200 | </p> | |
1201 | <p> | |
1202 | The sub-expression count following this construct is that of whichever branch | |
1203 | had the largest number of sub-expressions. This construct is useful when | |
1204 | you want to capture one of a number of alternative matches in a single sub-expression | |
1205 | index. | |
1206 | </p> | |
1207 | <p> | |
1208 | In the following example the index of each sub-expression is shown below | |
1209 | the expression: | |
1210 | </p> | |
1211 | <pre class="programlisting"># before ---------------branch-reset----------- after | |
1212 | / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x | |
1213 | # 1           2         2  3        2     3     4 | |
1214 | </pre> | |
1215 | <h6> | |
1216 | <a name="boost_regex.syntax.perl_syntax.h37"></a> | |
1217 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.lookahead"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.lookahead">Lookahead</a> | |
1218 | </h6> | |
1219 | <p> | |
1220 | <code class="literal">(?=pattern)</code> consumes zero characters, only if pattern | |
1221 | matches. | |
1222 | </p> | |
1223 | <p> | |
1224 | <code class="literal">(?!pattern)</code> consumes zero characters, only if pattern | |
1225 | does not match. | |
1226 | </p> | |
1227 | <p> | |
1228 | Lookahead is typically used to create the logical AND of two regular expressions. | |
1229 | For example, if a password must contain a lower case letter, an upper case | |
1230 | letter, a punctuation symbol, and be at least 6 characters long, then the | |
1231 | expression: | |
1232 | </p> | |
1233 | <pre class="programlisting"><span class="special">(?=.*[[:</span><span class="identifier">lower</span><span class="special">:]])(?=.*[[:</span><span class="identifier">upper</span><span class="special">:]])(?=.*[[:</span><span class="identifier">punct</span><span class="special">:]]).{</span><span class="number">6</span><span class="special">,}</span> | |
1234 | </pre> | |
1235 | <p> | |
1236 | could be used to validate the password. | |
1237 | </p> | |
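<p>
The password check above can be sketched in code as follows. This is a hedged illustration: it uses <code class="literal">std::regex</code> as a stand-in, on the assumption that its ECMAScript grammar handles positive lookahead and the POSIX class names used here in the same way, and the function name <code class="literal">is_acceptable_password</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `candidate` contains a lower case letter, an upper case
// letter, a punctuation symbol, and is at least 6 characters long.
// NOTE: std::regex is an assumed stand-in for boost::regex in this sketch.
bool is_acceptable_password(const std::string& candidate) {
    // Each lookahead is tested from the start of the text and consumes
    // nothing; only the final .{6,} actually consumes characters.
    static const std::regex policy(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(candidate, policy);
}
```

<p>
Because every lookahead is anchored at the same starting position, the three requirements can be satisfied in any order within the text.
</p>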
1238 | <h6> | |
1239 | <a name="boost_regex.syntax.perl_syntax.h38"></a> | |
1240 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.lookbehind"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.lookbehind">Lookbehind</a> | |
1241 | </h6> | |
1242 | <p> | |
1243 | <code class="literal">(?<=pattern)</code> consumes zero characters, only if pattern | |
1244 | could be matched against the characters preceding the current position (pattern | |
1245 | must be of fixed length). | |
1246 | </p> | |
1247 | <p> | |
1248 | <code class="literal">(?<!pattern)</code> consumes zero characters, only if pattern | |
1249 | could not be matched against the characters preceding the current position | |
1250 | (pattern must be of fixed length). | |
1251 | </p> | |
1252 | <h6> | |
1253 | <a name="boost_regex.syntax.perl_syntax.h39"></a> | |
1254 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.independent_sub_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.independent_sub_expressions">Independent | |
1255 | sub-expressions</a> | |
1256 | </h6> | |
1257 | <p> | |
1258 | <code class="literal">(?>pattern)</code> matches <span class="emphasis"><em>pattern</em></span> | |
1259 | independently of the surrounding patterns: the expression will never backtrack | |
1260 | into <span class="emphasis"><em>pattern</em></span>. Independent sub-expressions are typically | |
1261 | used to improve performance; only the best possible match for pattern will | |
1262 | be considered, and if this doesn't allow the expression as a whole to match then | |
1263 | no match is found at all. | |
1264 | </p> | |
1265 | <h6> | |
1266 | <a name="boost_regex.syntax.perl_syntax.h40"></a> | |
1267 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.recursive_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions">Recursive | |
1268 | Expressions</a> | |
1269 | </h6> | |
1270 | <p> | |
1271 | <code class="literal">(?<span class="emphasis"><em>N</em></span>) (?-<span class="emphasis"><em>N</em></span>) (?+<span class="emphasis"><em>N</em></span>) | |
1272 | (?R) (?0) (?&NAME)</code> | |
1273 | </p> | |
1274 | <p> | |
1275 | <code class="literal">(?R)</code> and <code class="literal">(?0)</code> recurse to the start | |
1276 | of the entire pattern. | |
1277 | </p> | |
1278 | <p> | |
1279 | <code class="literal">(?<span class="emphasis"><em>N</em></span>)</code> executes sub-expression <span class="emphasis"><em>N</em></span> | |
1280 | recursively, for example <code class="literal">(?2)</code> will recurse to sub-expression | |
1281 | 2. | |
1282 | </p> | |
1283 | <p> | |
1284 | <code class="literal">(?-<span class="emphasis"><em>N</em></span>)</code> and <code class="literal">(?+<span class="emphasis"><em>N</em></span>)</code> | |
1285 | are relative recursions, so for example <code class="literal">(?-1)</code> recurses | |
1286 | to the last sub-expression to be declared, and <code class="literal">(?+1)</code> recurses | |
1287 | to the next sub-expression to be declared. | |
1288 | </p> | |
1289 | <p> | |
1290 | <code class="literal">(?&NAME)</code> recurses to named sub-expression <span class="emphasis"><em>NAME</em></span>. | |
1291 | </p> | |
1292 | <h6> | |
1293 | <a name="boost_regex.syntax.perl_syntax.h41"></a> | |
1294 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.conditional_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.conditional_expressions">Conditional | |
1295 | Expressions</a> | |
1296 | </h6> | |
1297 | <p> | |
1298 | <code class="literal">(?(condition)yes-pattern|no-pattern)</code> attempts to match | |
1299 | <span class="emphasis"><em>yes-pattern</em></span> if the <span class="emphasis"><em>condition</em></span> is | |
1300 | true, otherwise attempts to match <span class="emphasis"><em>no-pattern</em></span>. | |
1301 | </p> | |
1302 | <p> | |
1303 | <code class="literal">(?(condition)yes-pattern)</code> attempts to match <span class="emphasis"><em>yes-pattern</em></span> | |
1304 | if the <span class="emphasis"><em>condition</em></span> is true, otherwise matches the NULL | |
1305 | string. | |
1306 | </p> | |
1307 | <p> | |
1308 | <span class="emphasis"><em>condition</em></span> may be one of: a forward lookahead assert, | |
1309 | the index of a marked sub-expression (the condition becomes true if the sub-expression | |
1310 | has been matched), or an index of a recursion (the condition becomes true | |
1311 | if we are executing directly inside the specified recursion). | |
1312 | </p> | |
1313 | <p> | |
1314 | Here is a summary of the possible predicates: | |
1315 | </p> | |
1316 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> | |
1317 | <li class="listitem"> | |
1318 | <code class="literal">(?(?=assert)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span> | |
1319 | if the forward look-ahead assert matches, otherwise executes <span class="emphasis"><em>no-pattern</em></span>. | |
1320 | </li> | |
1321 | <li class="listitem"> | |
1322 | <code class="literal">(?(?!assert)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span> | |
1323 | if the forward look-ahead assert does not match, otherwise executes | |
1324 | <span class="emphasis"><em>no-pattern</em></span>. | |
1325 | </li> | |
1326 | <li class="listitem"> | |
1327 | <code class="literal">(?(<span class="emphasis"><em>N</em></span>)yes-pattern|no-pattern)</code> | |
1328 | Executes <span class="emphasis"><em>yes-pattern</em></span> if subexpression <span class="emphasis"><em>N</em></span> | |
1329 | has been matched, otherwise executes <span class="emphasis"><em>no-pattern</em></span>. | |
1330 | </li> | |
1331 | <li class="listitem"> | |
1332 | <code class="literal">(?(<<span class="emphasis"><em>name</em></span>>)yes-pattern|no-pattern)</code> | |
1333 | Executes <span class="emphasis"><em>yes-pattern</em></span> if named subexpression <span class="emphasis"><em>name</em></span> | |
1334 | has been matched, otherwise executes <span class="emphasis"><em>no-pattern</em></span>. | |
1335 | </li> | |
1336 | <li class="listitem"> | |
1337 | <code class="literal">(?('<span class="emphasis"><em>name</em></span>')yes-pattern|no-pattern)</code> | |
1338 | Executes <span class="emphasis"><em>yes-pattern</em></span> if named subexpression <span class="emphasis"><em>name</em></span> | |
1339 | has been matched, otherwise executes <span class="emphasis"><em>no-pattern</em></span>. | |
1340 | </li> | |
1341 | <li class="listitem"> | |
1342 | <code class="literal">(?(R)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span> | |
1343 | if we are executing inside a recursion, otherwise executes <span class="emphasis"><em>no-pattern</em></span>. | |
1344 | </li> | |
1345 | <li class="listitem"> | |
1346 | <code class="literal">(?(R<span class="emphasis"><em>N</em></span>)yes-pattern|no-pattern)</code> | |
1347 | Executes <span class="emphasis"><em>yes-pattern</em></span> if we are executing inside | |
1348 | a recursion to sub-expression <span class="emphasis"><em>N</em></span>, otherwise executes | |
1349 | <span class="emphasis"><em>no-pattern</em></span>. | |
1350 | </li> | |
1351 | <li class="listitem"> | |
1352 | <code class="literal">(?(R&<span class="emphasis"><em>name</em></span>)yes-pattern|no-pattern)</code> | |
1353 | Executes <span class="emphasis"><em>yes-pattern</em></span> if we are executing inside | |
1354 | a recursion to named sub-expression <span class="emphasis"><em>name</em></span>, otherwise | |
1355 | executes <span class="emphasis"><em>no-pattern</em></span>. | |
1356 | </li> | |
1357 | <li class="listitem"> | |
1358 | <code class="literal">(?(DEFINE)never-executed-pattern)</code> Defines a block | |
1359 | of code that is never executed and matches no characters: this is usually | |
1360 | used to define one or more named sub-expressions which are referred to | |
1361 | from elsewhere in the pattern. | |
1362 | </li> | |
1363 | </ul></div> | |
1364 | <h6> | |
1365 | <a name="boost_regex.syntax.perl_syntax.h42"></a> | |
1366 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.backtracking_control_verbs"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.backtracking_control_verbs">Backtracking | |
1367 | Control Verbs</a> | |
1368 | </h6> | |
1369 | <p> | |
1370 | This library has partial support for Perl's backtracking control verbs; in | |
1371 | particular, (*MARK) is not supported. There may also be differences of detail | |
1372 | in behaviour between this library and Perl, not least because Perl's behaviour | |
1373 | is rather under-documented and often somewhat random in how it behaves in | |
1374 | practice. The verbs supported are: | |
1375 | </p> | |
1376 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> | |
1377 | <li class="listitem"> | |
1378 | <code class="literal">(*PRUNE)</code> Has no effect unless backtracked onto, in | |
1379 | which case all the backtracking information prior to this point is discarded. | |
1380 | </li> | |
1381 | <li class="listitem"> | |
1382 | <code class="literal">(*SKIP)</code> Behaves the same as <code class="literal">(*PRUNE)</code> | |
1383 | except that it is assumed that no match can possibly occur prior to the | |
1384 | current point in the string being searched. This can be used to optimize | |
1385 | searches by skipping over chunks of text that are already known to be | |
1386 | unable to form a match. | |
1387 | </li> | |
1388 | <li class="listitem"> | |
1389 | <code class="literal">(*THEN)</code> Has no effect unless backtracked onto, in | |
1390 | which case all subsequent alternatives in a group of alternations are | |
1391 | discarded. | |
1392 | </li> | |
1393 | <li class="listitem"> | |
1394 | <code class="literal">(*COMMIT)</code> Has no effect unless backtracked onto, in | |
1395 | which case all subsequent matching/searching attempts are abandoned. | |
1396 | </li> | |
1397 | <li class="listitem"> | |
1398 | <code class="literal">(*FAIL)</code> Causes the match to fail unconditionally at | |
1399 | this point; this can be used to force the engine to backtrack. | |
1400 | </li> | |
1401 | <li class="listitem"> | |
1402 | <code class="literal">(*ACCEPT)</code> Causes the pattern to be considered matched | |
1403 | at the current point. Any half-open sub-expressions are closed at the | |
1404 | current point. | |
1405 | </li> | |
1406 | </ul></div> | |
1407 | <h5> | |
1408 | <a name="boost_regex.syntax.perl_syntax.h43"></a> | |
1409 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.operator_precedence"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.operator_precedence">Operator | |
1410 | precedence</a> | |
1411 | </h5> | |
1412 | <p> | |
1413 | The order of precedence of operators is as follows: | |
1414 | </p> | |
1415 | <div class="orderedlist"><ol class="orderedlist" type="1"> | |
1416 | <li class="listitem"> | |
1417 | Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span> | |
1418 | <span class="special">[::]</span> <span class="special">[..]</span></code> | |
1419 | </li> | |
1420 | <li class="listitem"> | |
1421 | Escaped characters <code class="literal">\</code> | |
1422 | </li> | |
1423 | <li class="listitem"> | |
1424 | Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code> | |
1425 | </li> | |
1426 | <li class="listitem"> | |
1427 | Grouping <code class="literal">()</code> | |
1428 | </li> | |
1429 | <li class="listitem"> | |
1430 | Single-character-ERE duplication <code class="literal">* + ? {m,n}</code> | |
1431 | </li> | |
1432 | <li class="listitem"> | |
1433 | Concatenation | |
1434 | </li> | |
1435 | <li class="listitem"> | |
1436 | Anchoring ^$ | |
1437 | </li> | |
1438 | <li class="listitem"> | |
1439 | Alternation | | |
1440 | </li> | |
1441 | </ol></div> | |
1442 | <h4> | |
1443 | <a name="boost_regex.syntax.perl_syntax.h44"></a> | |
1444 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.what_gets_matched"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">What | |
1445 | gets matched</a> | |
1446 | </h4> | |
1447 | <p> | |
1448 | If you view the regular expression as a directed (possibly cyclic) graph, | |
1449 | then the best match found is the first match found by a depth-first-search | |
1450 | performed on that graph, while matching the input text. | |
1451 | </p> | |
1452 | <p> | |
1453 | Alternatively: | |
1454 | </p> | |
1455 | <p> | |
1456 | The best match found is the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost | |
1457 | match</a>, with individual elements matched as follows: | |
1458 | </p> | |
1459 | <div class="informaltable"><table class="table"> | |
1460 | <colgroup> | |
1461 | <col> | |
1462 | <col> | |
1463 | </colgroup> | |
1464 | <thead><tr> | |
1465 | <th> | |
1466 | <p> | |
1467 | Construct | |
1468 | </p> | |
1469 | </th> | |
1470 | <th> | |
1471 | <p> | |
1472 | What gets matched | |
1473 | </p> | |
1474 | </th> | |
1475 | </tr></thead> | |
1476 | <tbody> | |
1477 | <tr> | |
1478 | <td> | |
1479 | <p> | |
1480 | <code class="literal">AtomA AtomB</code> | |
1481 | </p> | |
1482 | </td> | |
1483 | <td> | |
1484 | <p> | |
1485 | Locates the best match for <span class="emphasis"><em>AtomA</em></span> that has | |
1486 | a following match for <span class="emphasis"><em>AtomB</em></span>. | |
1487 | </p> | |
1488 | </td> | |
1489 | </tr> | |
1490 | <tr> | |
1491 | <td> | |
1492 | <p> | |
1493 | <code class="literal">Expression1 | Expression2</code> | |
1494 | </p> | |
1495 | </td> | |
1496 | <td> | |
1497 | <p> | |
1498 | If <span class="emphasis"><em>Expression1</em></span> can be matched then returns | |
1499 | that match, otherwise attempts to match <span class="emphasis"><em>Expression2</em></span>. | |
1500 | </p> | |
1501 | </td> | |
1502 | </tr> | |
1503 | <tr> | |
1504 | <td> | |
1505 | <p> | |
1506 | <code class="literal">S{N}</code> | |
1507 | </p> | |
1508 | </td> | |
1509 | <td> | |
1510 | <p> | |
1511 | Matches <span class="emphasis"><em>S</em></span> repeated exactly N times. | |
1512 | </p> | |
1513 | </td> | |
1514 | </tr> | |
1515 | <tr> | |
1516 | <td> | |
1517 | <p> | |
1518 | <code class="literal">S{N,M}</code> | |
1519 | </p> | |
1520 | </td> | |
1521 | <td> | |
1522 | <p> | |
1523 | Matches S repeated between N and M times, and as many times as | |
1524 | possible. | |
1525 | </p> | |
1526 | </td> | |
1527 | </tr> | |
1528 | <tr> | |
1529 | <td> | |
1530 | <p> | |
1531 | <code class="literal">S{N,M}?</code> | |
1532 | </p> | |
1533 | </td> | |
1534 | <td> | |
1535 | <p> | |
1536 | Matches S repeated between N and M times, and as few times as possible. | |
1537 | </p> | |
1538 | </td> | |
1539 | </tr> | |
1540 | <tr> | |
1541 | <td> | |
1542 | <p> | |
1543 | <code class="literal">S?, S*, S+</code> | |
1544 | </p> | |
1545 | </td> | |
1546 | <td> | |
1547 | <p> | |
1548 | The same as <code class="literal">S{0,1}</code>, <code class="literal">S{0,UINT_MAX}</code>, | |
1549 | <code class="literal">S{1,UINT_MAX}</code> respectively. | |
1550 | </p> | |
1551 | </td> | |
1552 | </tr> | |
1553 | <tr> | |
1554 | <td> | |
1555 | <p> | |
1556 | <code class="literal">S??, S*?, S+?</code> | |
1557 | </p> | |
1558 | </td> | |
1559 | <td> | |
1560 | <p> | |
1561 | The same as <code class="literal">S{0,1}?</code>, <code class="literal">S{0,UINT_MAX}?</code>, | |
1562 | <code class="literal">S{1,UINT_MAX}?</code> respectively. | |
1563 | </p> | |
1564 | </td> | |
1565 | </tr> | |
1566 | <tr> | |
1567 | <td> | |
1568 | <p> | |
1569 | <code class="literal">(?>S)</code> | |
1570 | </p> | |
1571 | </td> | |
1572 | <td> | |
1573 | <p> | |
1574 | Matches the best match for <span class="emphasis"><em>S</em></span>, and only that. | |
1575 | </p> | |
1576 | </td> | |
1577 | </tr> | |
1578 | <tr> | |
1579 | <td> | |
1580 | <p> | |
1581 | <code class="literal">(?=S), (?<=S)</code> | |
1582 | </p> | |
1583 | </td> | |
1584 | <td> | |
1585 | <p> | |
1586 | Matches only the best match for <span class="emphasis"><em>S</em></span> (this is | |
1587 | only visible if there are capturing parentheses within <span class="emphasis"><em>S</em></span>). | |
1588 | </p> | |
1589 | </td> | |
1590 | </tr> | |
1591 | <tr> | |
1592 | <td> | |
1593 | <p> | |
1594 | <code class="literal">(?!S), (?<!S)</code> | |
1595 | </p> | |
1596 | </td> | |
1597 | <td> | |
1598 | <p> | |
1599 | Considers only whether a match for S exists or not. | |
1600 | </p> | |
1601 | </td> | |
1602 | </tr> | |
1603 | <tr> | |
1604 | <td> | |
1605 | <p> | |
1606 | <code class="literal">(?(condition)yes-pattern | no-pattern)</code> | |
1607 | </p> | |
1608 | </td> | |
1609 | <td> | |
1610 | <p> | |
1611 | If condition is true, then only yes-pattern is considered, otherwise | |
1612 | only no-pattern is considered. | |
1613 | </p> | |
1614 | </td> | |
1615 | </tr> | |
1616 | </tbody> | |
1617 | </table></div> | |
1618 | <h4> | |
1619 | <a name="boost_regex.syntax.perl_syntax.h45"></a> | |
1620 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.variations"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.variations">Variations</a> | |
1621 | </h4> | |
1622 | <p> | |
1623 | The <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions">options | |
1624 | <code class="literal">normal</code>, <code class="literal">ECMAScript</code>, <code class="literal">JavaScript</code> | |
1625 | and <code class="literal">JScript</code></a> are all synonyms for <code class="literal">perl</code>. | |
1626 | </p> | |
1627 | <h4> | |
1628 | <a name="boost_regex.syntax.perl_syntax.h46"></a> | |
1629 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.options"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.options">Options</a> | |
1630 | </h4> | |
1631 | <p> | |
1632 | There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions">variety | |
1633 | of flags</a> that may be combined with the <code class="literal">perl</code> option | |
1634 | when constructing the regular expression. In particular, note that the <code class="literal">newline_alt</code> | |
1635 | option alters the syntax, while the <code class="literal">collate</code>, <code class="literal">nosubs</code> | |
1636 | and <code class="literal">icase</code> options modify how case and locale sensitivity | |
1637 | are applied. | |
1638 | </p> | |
1639 | <h4> | |
1640 | <a name="boost_regex.syntax.perl_syntax.h47"></a> | |
1641 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.pattern_modifiers"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.pattern_modifiers">Pattern | |
1642 | Modifiers</a> | |
1643 | </h4> | |
1644 | <p> | |
1645 | The Perl <code class="literal">smix</code> modifiers can be applied either using a | |
1646 | <code class="literal">(?smix-smix)</code> prefix to the regular expression, or with | |
1647 | one of the <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions">regex | |
1648 | compile-time flags <code class="literal">no_mod_m</code>, <code class="literal">mod_x</code>, <code class="literal">mod_s</code>, | |
1649 | and <code class="literal">no_mod_s</code></a>. | |
1650 | </p> | |
1651 | <h4> | |
1652 | <a name="boost_regex.syntax.perl_syntax.h48"></a> | |
1653 | <span class="phrase"><a name="boost_regex.syntax.perl_syntax.references"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.references">References</a> | |
1654 | </h4> | |
1655 | <p> | |
1656 | <a href="http://perldoc.perl.org/perlre.html" target="_top">Perl 5.8</a>. | |
1657 | </p> | |
1658 | </div> | |
1659 | <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> | |
1660 | <td align="left"></td> | |
1661 | <td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p> | |
1662 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
1663 | file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) | |
1664 | </p> | |
1665 | </div></td> | |
1666 | </tr></table> | |
1671 | </body> | |
1672 | </html> |