ceph/src/boost/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

   1 <html>
   2 <head>
   3 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
   4 <title>Perl Regular Expression Syntax</title>
   5 <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
   6 <meta name="generator" content="DocBook XSL Stylesheets V1.77.1">
   7 <link rel="home" href="../../index.html" title="Boost.Regex 5.1.2">
   8 <link rel="up" href="../syntax.html" title="Regular Expression Syntax">
   9 <link rel="prev" href="../syntax.html" title="Regular Expression Syntax">
  10 <link rel="next" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">
  11 </head>
  12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
  13 <table cellpadding="2" width="100%"><tr>
  14 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
  15 <td align="center"><a href="../../../../../../index.html">Home</a></td>
  16 <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
  17 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
  18 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
  19 <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
  20 </tr></table>
  21 <hr>
  22 <div class="spirit-nav">
  23 <a accesskey="p" href="../syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_extended.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
  24 </div>
  25 <div class="section">
  26 <div class="titlepage"><div><div><h3 class="title">
  27 <a name="boost_regex.syntax.perl_syntax"></a><a class="link" href="perl_syntax.html" title="Perl Regular Expression Syntax">Perl Regular Expression
  28       Syntax</a>
  29 </h3></div></div></div>
  30 <h4>
  31 <a name="boost_regex.syntax.perl_syntax.h0"></a>
  32         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.synopsis"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.synopsis">Synopsis</a>
  33       </h4>
  34 <p>
  35         The Perl regular expression syntax is based on that used by the programming
  36         language Perl. Perl regular expressions are the default behavior in Boost.Regex,
  37         but you can also pass the flag <code class="literal">perl</code> explicitly to the <a class="link" href="../ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a> constructor, for example:
  38       </p>
  39 <pre class="programlisting"><span class="comment">// e1 is a case sensitive Perl regular expression: </span>
  40 <span class="comment">// since Perl is the default option there's no need to explicitly specify the syntax used here:</span>
  41 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">);</span>
  42 <span class="comment">// e2 is a case insensitive Perl regular expression:</span>
  43 <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">perl</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
  44 </pre>
  45 <h4>
  46 <a name="boost_regex.syntax.perl_syntax.h1"></a>
  47         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.perl_regular_expression_syntax"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.perl_regular_expression_syntax">Perl
  48         Regular Expression Syntax</a>
  49       </h4>
  50 <p>
  51         In Perl regular expressions, all characters match themselves except for the
  52         following special characters:
  53       </p>
  54 <pre class="programlisting">.[{}()\*+?|^$</pre>
  55 <h5>
  56 <a name="boost_regex.syntax.perl_syntax.h2"></a>
  57         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.wildcard"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.wildcard">Wildcard</a>
  58       </h5>
  59 <p>
  60         The single character '.' when used outside of a character set will match
  61         any single character except:
  62       </p>
  63 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  64 <li class="listitem">
  65             The NULL character when the <a class="link" href="../ref/match_flag_type.html" title="match_flag_type">flag
  66             <code class="literal">match_not_dot_null</code></a> is passed to the matching
  67             algorithms.
  68           </li>
  69 <li class="listitem">
  70             The newline character when the <a class="link" href="../ref/match_flag_type.html" title="match_flag_type">flag
  71             <code class="literal">match_not_dot_newline</code></a> is passed to the matching
  72             algorithms.
  73           </li>
  74 </ul></div>
  75 <h5>
  76 <a name="boost_regex.syntax.perl_syntax.h3"></a>
  77         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.anchors"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.anchors">Anchors</a>
  78       </h5>
  79 <p>
  80         A '^' character shall match the start of a line.
  81       </p>
  82 <p>
  83         A '$' character shall match the end of a line.
  84       </p>
  85 <h5>
  86 <a name="boost_regex.syntax.perl_syntax.h4"></a>
  87         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.marked_sub_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.marked_sub_expressions">Marked sub-expressions</a>
  88       </h5>
  89 <p>
  90         A section beginning <code class="literal">(</code> and ending <code class="literal">)</code>
  91         acts as a marked sub-expression. Whatever matched the sub-expression is split
  92         out in a separate field by the matching algorithms. Marked sub-expressions
  93         can also be repeated, or referred to by a back-reference.
  94       </p>
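<p>
        As a hedged illustration of how marked sub-expressions surface as separate
        fields, the sketch below collects them into a vector. It uses
        <code class="literal">std::regex</code> from the C++ standard library as a stand-in so that
        it has no external dependency; <code class="literal">boost::regex</code> with its default
        Perl syntax is assumed to behave the same way for this pattern. The helper
        name <code class="literal">match_fields</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the fields produced by a successful match.
// Field 0 is the whole match; fields 1..N are the marked sub-expressions.
// An empty vector means the text did not match the pattern.
std::vector<std::string> match_fields(const std::string& pattern,
                                      const std::string& text)
{
    std::vector<std::string> fields;
    std::smatch what;
    if (std::regex_match(text, what, std::regex(pattern)))
    {
        for (const auto& sub : what)
            fields.push_back(sub.str());
    }
    return fields;
}
```

<p>
        For the pattern <code class="literal">(\d+)-(\d+)</code> and the text
        <code class="literal">10-20</code>, field 1 holds <code class="literal">10</code> and field 2
        holds <code class="literal">20</code>.
</p>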
  95 <h5>
  96 <a name="boost_regex.syntax.perl_syntax.h5"></a>
  97         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.non_marking_grouping"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_marking_grouping">Non-marking
  98         grouping</a>
  99       </h5>
 100 <p>
 101         A marked sub-expression is useful to lexically group part of a regular expression,
 102         but has the side-effect of producing an extra field in the result. As
 103         an alternative you can lexically group part of a regular expression without
 104         generating a marked sub-expression by using <code class="literal">(?:</code> and <code class="literal">)</code>,
 105         for example <code class="literal">(?:ab)+</code> will repeat <code class="literal">ab</code>
 106         without splitting out any separate sub-expressions.
 107       </p>
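<p>
        The difference between the two kinds of grouping can be observed by counting
        marked sub-expressions. This is a minimal sketch using
        <code class="literal">std::regex</code> as a stand-in for <code class="literal">boost::regex</code>
        (both expose a <code class="literal">mark_count()</code> member); the helper name
        <code class="literal">marked_count</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: number of marked sub-expressions in a pattern.
// Non-marking groups written as (?:...) are not counted.
unsigned marked_count(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}
```

<p>
        Under these assumptions <code class="literal">(ab)+</code> reports one marked
        sub-expression and <code class="literal">(?:ab)+</code> reports none, although both
        accept the same input.
</p>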
 108 <h5>
 109 <a name="boost_regex.syntax.perl_syntax.h6"></a>
 110         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.repeats"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.repeats">Repeats</a>
 111       </h5>
 112 <p>
 113         Any atom (a single character, a marked sub-expression, or a character class)
 114         can be repeated with the <code class="literal">*</code>, <code class="literal">+</code>, <code class="literal">?</code>,
 115         and <code class="literal">{}</code> operators.
 116       </p>
 117 <p>
 118         The <code class="literal">*</code> operator will match the preceding atom zero or more
 119         times, for example the expression <code class="literal">a*b</code> will match any of
 120         the following:
 121       </p>
 122 <pre class="programlisting"><span class="identifier">b</span>
 123 <span class="identifier">ab</span>
 124 <span class="identifier">aaaaaaaab</span>
 125 </pre>
 126 <p>
 127         The <code class="literal">+</code> operator will match the preceding atom one or more
 128         times, for example the expression <code class="literal">a+b</code> will match any of
 129         the following:
 130       </p>
 131 <pre class="programlisting"><span class="identifier">ab</span>
 132 <span class="identifier">aaaaaaaab</span>
 133 </pre>
 134 <p>
 135         But will not match:
 136       </p>
 137 <pre class="programlisting"><span class="identifier">b</span>
 138 </pre>
 139 <p>
 140         The <code class="literal">?</code> operator will match the preceding atom zero or one
 141         times, for example the expression <code class="literal">ca?b</code> will match any of the following:
 142       </p>
 143 <pre class="programlisting"><span class="identifier">cb</span>
 144 <span class="identifier">cab</span>
 145 </pre>
 146 <p>
 147         But will not match:
 148       </p>
 149 <pre class="programlisting"><span class="identifier">caab</span>
 150 </pre>
 151 <p>
 152         An atom can also be repeated with a bounded repeat:
 153       </p>
 154 <p>
 155         <code class="literal">a{n}</code> Matches 'a' repeated exactly n times.
 156       </p>
 157 <p>
 158         <code class="literal">a{n,}</code> Matches 'a' repeated n or more times.
 159       </p>
 160 <p>
 161         <code class="literal">a{n, m}</code> Matches 'a' repeated between n and m times inclusive.
 162       </p>
 163 <p>
 164         For example:
 165       </p>
 166 <pre class="programlisting">^a{2,3}$</pre>
 167 <p>
 168         Will match either of:
 169       </p>
 170 <pre class="programlisting"><span class="identifier">aa</span>
 171 <span class="identifier">aaa</span>
 172 </pre>
 173 <p>
 174         But neither of:
 175       </p>
 176 <pre class="programlisting"><span class="identifier">a</span>
 177 <span class="identifier">aaaa</span>
 178 </pre>
 179 <p>
 180         Note that the "{" and "}" characters will be treated as
 181         ordinary literals when used in a context that is not a repeat: this matches
 182         Perl 5.x behavior. For example in the expressions "ab{1", "ab1}"
 183         and "a{b}c" the curly brackets are all treated as literals and
 184         <span class="emphasis"><em>no error will be raised</em></span>.
 185       </p>
 186 <p>
 187         It is an error to use a repeat operator if the preceding construct cannot
 188         be repeated, for example:
 189       </p>
 190 <pre class="programlisting"><span class="identifier">a</span><span class="special">(*)</span>
 191 </pre>
 192 <p>
 193         Will raise an error, as there is nothing for the <code class="literal">*</code> operator
 194         to be applied to.
 195       </p>
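<p>
        The bounded repeat example <code class="literal">^a{2,3}$</code> from this section
        can be checked with a short sketch. It uses <code class="literal">std::regex</code>
        as a stand-in for <code class="literal">boost::regex</code>, which is assumed to give
        the same results here; the function name <code class="literal">is_two_or_three_as</code>
        is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the text consists of exactly two or
// three 'a' characters, using the bounded repeat a{2,3} between anchors.
bool is_two_or_three_as(const std::string& text)
{
    static const std::regex e("^a{2,3}$");
    return std::regex_search(text, e);
}
```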
 196 <h5>
 197 <a name="boost_regex.syntax.perl_syntax.h7"></a>
 198         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.non_greedy_repeats"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_greedy_repeats">Non
 199         greedy repeats</a>
 200       </h5>
 201 <p>
 202         The normal repeat operators are "greedy", that is to say they will
 203         consume as much input as possible. There are non-greedy versions available
 204         that will consume as little input as possible while still producing a match.
 205       </p>
 206 <p>
 207         <code class="literal">*?</code> Matches the previous atom zero or more times, while
 208         consuming as little input as possible.
 209       </p>
 210 <p>
 211         <code class="literal">+?</code> Matches the previous atom one or more times, while
 212         consuming as little input as possible.
 213       </p>
 214 <p>
 215         <code class="literal">??</code> Matches the previous atom zero or one times, while
 216         consuming as little input as possible.
 217       </p>
 218 <p>
 219         <code class="literal">{n,}?</code> Matches the previous atom n or more times, while
 220         consuming as little input as possible.
 221       </p>
 222 <p>
 223         <code class="literal">{n,m}?</code> Matches the previous atom between n and m times,
 224         while consuming as little input as possible.
 225       </p>
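<p>
        A small sketch can make the difference between greedy and non-greedy repeats
        visible by returning the text of the first match. It uses
        <code class="literal">std::regex</code> as a stand-in for <code class="literal">boost::regex</code>,
        which is assumed to agree for these patterns; the helper name
        <code class="literal">first_match</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of pattern
// in text, or an empty string when nothing matches.
std::string first_match(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern)))
        return what.str();
    return std::string();
}
```

<p>
        Searching <code class="literal">aXbXb</code> with the greedy <code class="literal">a.*b</code>
        yields the whole string, while the non-greedy <code class="literal">a.*?b</code>
        stops at the first <code class="literal">b</code> and yields <code class="literal">aXb</code>.
</p>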
 226 <h5>
 227 <a name="boost_regex.syntax.perl_syntax.h8"></a>
 228         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.possessive_repeats"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.possessive_repeats">Possessive
 229         repeats</a>
 230       </h5>
 231 <p>
 232         By default, when a repeated pattern does not match, the engine will backtrack
 233         until a match is found. However, this behaviour can sometimes be undesirable,
 234         so there are also "possessive" repeats: these match as much as
 235         possible and do not then allow backtracking if the rest of the expression
 236         fails to match.
 237       </p>
 238 <p>
 239         <code class="literal">*+</code> Matches the previous atom zero or more times, while
 240         giving nothing back.
 241       </p>
 242 <p>
 243         <code class="literal">++</code> Matches the previous atom one or more times, while
 244         giving nothing back.
 245       </p>
 246 <p>
 247         <code class="literal">?+</code> Matches the previous atom zero or one times, while
 248         giving nothing back.
 249       </p>
 250 <p>
 251         <code class="literal">{n,}+</code> Matches the previous atom n or more times, while
 252         giving nothing back.
 253       </p>
 254 <p>
 255         <code class="literal">{n,m}+</code> Matches the previous atom between n and m times,
 256         while giving nothing back.
 257       </p>
 258 <h5>
 259 <a name="boost_regex.syntax.perl_syntax.h9"></a>
 260         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.back_references"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.back_references">Back
 261         references</a>
 262       </h5>
 263 <p>
 264         An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span>
 265         is in the range 1-9, matches the same string that was matched by sub-expression
 266         <span class="emphasis"><em>n</em></span>. For example the expression:
 267       </p>
 268 <pre class="programlisting">^(a*)b+\1$</pre>
 269 <p>
 270         Will match the string:
 271       </p>
 272 <pre class="programlisting"><span class="identifier">aaabbaaa</span>
 273 </pre>
 274 <p>
 275         But not the string:
 276       </p>
 277 <pre class="programlisting"><span class="identifier">aaabba</span>
 278 </pre>
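<p>
        As a further hedged illustration of a numbered back-reference, the sketch
        below looks for an immediately repeated word. It uses
        <code class="literal">std::regex</code> as a stand-in for <code class="literal">boost::regex</code>,
        which is assumed to treat <code class="literal">\1</code>, <code class="literal">\w</code>
        and <code class="literal">\b</code> the same way here; the function name
        <code class="literal">has_doubled_word</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the text contains the same word twice
// in a row separated by one space. \1 must match exactly the text that
// sub-expression 1 matched, and the \b assertions keep the comparison
// on whole words.
bool has_doubled_word(const std::string& text)
{
    static const std::regex e("\\b(\\w+) \\1\\b");
    return std::regex_search(text, e);
}
```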
 279 <p>
 280         You can also use the \g escape for the same function, for example:
 281       </p>
 282 <div class="informaltable"><table class="table">
 283 <colgroup>
 284 <col>
 285 <col>
 286 </colgroup>
 287 <thead><tr>
 288 <th>
 289                 <p>
 290                   Escape
 291                 </p>
 292               </th>
 293 <th>
 294                 <p>
 295                   Meaning
 296                 </p>
 297               </th>
 298 </tr></thead>
 299 <tbody>
 300 <tr>
 301 <td>
 302                 <p>
 303                   <code class="literal">\g1</code>
 304                 </p>
 305               </td>
 306 <td>
 307                 <p>
 308                   Match whatever matched sub-expression 1
 309                 </p>
 310               </td>
 311 </tr>
 312 <tr>
 313 <td>
 314                 <p>
 315                   <code class="literal">\g{1}</code>
 316                 </p>
 317               </td>
 318 <td>
 319                 <p>
 320                   Match whatever matched sub-expression 1: this form allows for safer
 321                   parsing of the expression in cases like <code class="literal">\g{1}2</code>
 322                   or for indexes higher than 9 as in <code class="literal">\g{1234}</code>
 323                 </p>
 324               </td>
 325 </tr>
 326 <tr>
 327 <td>
 328                 <p>
 329                   <code class="literal">\g-1</code>
 330                 </p>
 331               </td>
 332 <td>
 333                 <p>
 334                   Match whatever matched the last opened sub-expression
 335                 </p>
 336               </td>
 337 </tr>
 338 <tr>
 339 <td>
 340                 <p>
 341                   <code class="literal">\g{-2}</code>
 342                 </p>
 343               </td>
 344 <td>
 345                 <p>
 346                   Match whatever matched the last but one opened sub-expression
 347                 </p>
 348               </td>
 349 </tr>
 350 <tr>
 351 <td>
 352                 <p>
 353                   <code class="literal">\g{one}</code>
 354                 </p>
 355               </td>
 356 <td>
 357                 <p>
 358                   Match whatever matched the sub-expression named "one"
 359                 </p>
 360               </td>
 361 </tr>
 362 </tbody>
 363 </table></div>
 364 <p>
 365         Finally the \k escape can be used to refer to named subexpressions, for example
 366         <code class="literal">\k&lt;two&gt;</code> will match whatever matched the subexpression
 367         named "two".
 368       </p>
 369 <h5>
 370 <a name="boost_regex.syntax.perl_syntax.h10"></a>
 371         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.alternation"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.alternation">Alternation</a>
 372       </h5>
 373 <p>
 374         The <code class="literal">|</code> operator will match either of its arguments, so
 375         for example: <code class="literal">abc|def</code> will match either "abc"
 376         or "def".
 377       </p>
 378 <p>
 379         Parentheses can be used to group alternations, for example: <code class="literal">ab(d|ef)</code>
 380         will match either of "abd" or "abef".
 381       </p>
 382 <p>
 383         Empty alternatives are not allowed (these are almost always a mistake), but
 384         if you really want an empty alternative use <code class="literal">(?:)</code> as a
 385         placeholder, for example:
 386       </p>
 387 <p>
 388         <code class="literal">|abc</code> is not a valid expression, but
 389       </p>
 390 <p>
 391         <code class="literal">(?:)|abc</code> is valid and expresses the same thing. The expression:
 392       </p>
 393 <p>
 394         <code class="literal">(?:abc)??</code> has exactly the same effect.
 395       </p>
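<p>
        The alternation examples above can be exercised with a whole-string match.
        This is a minimal sketch that uses <code class="literal">std::regex</code> as a
        stand-in for <code class="literal">boost::regex</code>; note that the standard library
        grammar may accept empty alternatives that Boost.Regex rejects, so only the
        valid forms are shown. The helper name <code class="literal">matches_whole</code>
        is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole text matches the pattern.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<p>
        With this helper, <code class="literal">ab(d|ef)</code> accepts <code class="literal">abd</code>
        and <code class="literal">abef</code> but not <code class="literal">abdef</code>, and
        <code class="literal">(?:abc)??</code> accepts both the empty string and
        <code class="literal">abc</code>.
</p>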
 396 <h5>
 397 <a name="boost_regex.syntax.perl_syntax.h11"></a>
 398         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_sets"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets">Character
 399         sets</a>
 400       </h5>
 401 <p>
 402         A character set is a bracket-expression starting with <code class="literal">[</code> and ending
 403         with <code class="literal">]</code>; it defines a set of characters, and matches
 404         any single character that is a member of that set.
 405       </p>
 406 <p>
 407         A bracket expression may contain any combination of the following:
 408       </p>
 409 <h6>
 410 <a name="boost_regex.syntax.perl_syntax.h12"></a>
 411         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.single_characters"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.single_characters">Single
 412         characters</a>
 413       </h6>
 414 <p>
 415         For example <code class="literal">[abc]</code>, will match any of the characters 'a',
 416         'b', or 'c'.
 417       </p>
 418 <h6>
 419 <a name="boost_regex.syntax.perl_syntax.h13"></a>
 420         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_ranges"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_ranges">Character
 421         ranges</a>
 422       </h6>
 423 <p>
 424         For example <code class="literal">[a-c]</code> will match any single character in the
 425         range 'a' to 'c'. By default, for Perl regular expressions, a character x
 426         is within the range y to z, if the code point of the character lies within
 427         the codepoints of the endpoints of the range. Alternatively, if you set the
 428         <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions"><code class="literal">collate</code>
 429         flag</a> when constructing the regular expression, then ranges are locale
 430         sensitive.
 431       </p>
 432 <h6>
 433 <a name="boost_regex.syntax.perl_syntax.h14"></a>
 434         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.negation"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.negation">Negation</a>
 435       </h6>
 436 <p>
 437         If the bracket-expression begins with the ^ character, then it matches the
 438         complement of the characters it contains, for example <code class="literal">[^a-c]</code>
 439         matches any character that is not in the range <code class="literal">a-c</code>.
 440       </p>
 441 <h6>
 442 <a name="boost_regex.syntax.perl_syntax.h15"></a>
 443         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_classes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_classes">Character
 444         classes</a>
 445       </h6>
 446 <p>
 447         An expression of the form <code class="literal">[[:name:]]</code> matches the named
 448         character class "name", for example <code class="literal">[[:lower:]]</code>
 449         matches any lower case character. See <a class="link" href="character_classes.html" title="Character Class Names">character
 450         class names</a>.
 451       </p>
 452 <h6>
 453 <a name="boost_regex.syntax.perl_syntax.h16"></a>
 454         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.collating_elements"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.collating_elements">Collating
 455         Elements</a>
 456       </h6>
 457 <p>
 458         An expression of the form <code class="literal">[[.col.]]</code> matches the collating
 459         element <span class="emphasis"><em>col</em></span>. A collating element is any single character,
 460         or any sequence of characters that collates as a single unit. Collating elements
 461         may also be used as the end point of a range, for example: <code class="literal">[[.ae.]-c]</code>
 462         matches the character sequence "ae", plus any single character
 463         in the range "ae"-c, assuming that "ae" is treated as
 464         a single collating element in the current locale.
 465       </p>
 466 <p>
 467         As an extension, a collating element may also be specified via its <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>, for example:
 468       </p>
 469 <pre class="programlisting"><span class="special">[[.</span><span class="identifier">NUL</span><span class="special">.]]</span>
 470 </pre>
 471 <p>
 472         matches a <code class="literal">\0</code> character.
 473       </p>
 474 <h6>
 475 <a name="boost_regex.syntax.perl_syntax.h17"></a>
 476         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.equivalence_classes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.equivalence_classes">Equivalence
 477         classes</a>
 478       </h6>
 479 <p>
 480         An expression of the form <code class="literal">[[=col=]]</code>, matches any character
 481         or collating element whose primary sort key is the same as that for collating
 482         element <span class="emphasis"><em>col</em></span>, as with collating elements the name <span class="emphasis"><em>col</em></span>
 483         may be a <a class="link" href="collating_names.html" title="Collating Names">symbolic name</a>.
 484         A primary sort key is one that ignores case, accentuation, or locale-specific
 485         tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches
 486         any of the characters: a, &#192;, &#193;, &#194;, &#195;, &#196;, &#197;, A, &#224;, &#225;, &#226;, &#227;, &#228; and &#229;. Unfortunately, the implementation
 487         of this relies on the platform's collation and localisation support, so
 488         this feature cannot be relied upon to work portably across all platforms,
 489         or even all locales on one platform.
 490       </p>
 491 <h6>
 492 <a name="boost_regex.syntax.perl_syntax.h18"></a>
 493         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.escaped_characters"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.escaped_characters">Escaped
 494         Characters</a>
 495       </h6>
 496 <p>
 497         All the escape sequences that match a single character, or a single character
 498         class are permitted within a character class definition. For example <code class="computeroutput"><span class="special">[\[\]]</span></code> would match either of <code class="computeroutput"><span class="special">[</span></code> or <code class="computeroutput"><span class="special">]</span></code>
 499         while <code class="computeroutput"><span class="special">[\</span><span class="identifier">W</span><span class="special">\</span><span class="identifier">d</span><span class="special">]</span></code>
 500         would match any character that is either a "digit", <span class="emphasis"><em>or</em></span>
 501         is <span class="emphasis"><em>not</em></span> a "word" character.
 502       </p>
 503 <h6>
 504 <a name="boost_regex.syntax.perl_syntax.h19"></a>
 505         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.combinations"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.combinations">Combinations</a>
 506       </h6>
 507 <p>
 508         All of the above can be combined in one character set declaration, for example:
 509         <code class="literal">[[:digit:]a-c[.NUL.]]</code>.
 510       </p>
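<p>
        The portable parts of a character set (single characters, ranges, negation
        and named classes) can be tried out with the sketch below. It uses
        <code class="literal">std::regex</code> as a stand-in for <code class="literal">boost::regex</code>
        and deliberately leaves out collating elements and equivalence classes,
        whose behaviour depends on the locale. The helper name
        <code class="literal">in_set</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the single character c is a member of
// the bracket expression given in set, for example "[a-c]".
bool in_set(const std::string& set, char c)
{
    const std::string text(1, c);
    return std::regex_match(text, std::regex(set));
}
```

<p>
        For instance, <code class="literal">[[:digit:]a-c]</code> is expected to accept
        <code class="literal">7</code> and <code class="literal">b</code> but not
        <code class="literal">x</code>.
</p>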
 511 <h5>
 512 <a name="boost_regex.syntax.perl_syntax.h20"></a>
 513         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.escapes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.escapes">Escapes</a>
 514       </h5>
 515 <p>
 516         Any special character preceded by an escape shall match itself.
 517       </p>
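<p>
        One practical use of this rule is turning arbitrary text into a pattern
        that matches it literally. The sketch below escapes every character from
        the special-character list given earlier on this page
        (<code class="literal">.[{}()\*+?|^$</code>). The function names
        <code class="literal">escape_literal</code> and <code class="literal">matches_literally</code>
        are hypothetical, and <code class="literal">std::regex</code> is used as a stand-in
        for <code class="literal">boost::regex</code>.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: prefixes each special character with a backslash
// so that the result matches the original text character for character.
std::string escape_literal(const std::string& text)
{
    static const std::string special = ".[{}()\\*+?|^$";
    std::string out;
    for (char c : text)
    {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: true when candidate is exactly the literal text.
bool matches_literally(const std::string& literal, const std::string& candidate)
{
    return std::regex_match(candidate, std::regex(escape_literal(literal)));
}
```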
 518 <p>
 519         The following escape sequences are all synonyms for single characters:
 520       </p>
 521 <div class="informaltable"><table class="table">
 522 <colgroup>
 523 <col>
 524 <col>
 525 </colgroup>
 526 <thead><tr>
 527 <th>
 528                 <p>
 529                   Escape
 530                 </p>
 531               </th>
 532 <th>
 533                 <p>
 534                   Character
 535                 </p>
 536               </th>
 537 </tr></thead>
 538 <tbody>
 539 <tr>
 540 <td>
 541                 <p>
 542                   <code class="literal">\a</code>
 543                 </p>
 544               </td>
 545 <td>
 546                 <p>
 547                   <code class="literal">\a</code>
 548                 </p>
 549               </td>
 550 </tr>
 551 <tr>
 552 <td>
 553                 <p>
 554                   <code class="literal">\e</code>
 555                 </p>
 556               </td>
 557 <td>
 558                 <p>
 559                   <code class="literal">0x1B</code>
 560                 </p>
 561               </td>
 562 </tr>
 563 <tr>
 564 <td>
 565                 <p>
 566                   <code class="literal">\f</code>
 567                 </p>
 568               </td>
 569 <td>
 570                 <p>
 571                   <code class="literal">\f</code>
 572                 </p>
 573               </td>
 574 </tr>
 575 <tr>
 576 <td>
 577                 <p>
 578                   <code class="literal">\n</code>
 579                 </p>
 580               </td>
 581 <td>
 582                 <p>
 583                   <code class="literal">\n</code>
 584                 </p>
 585               </td>
 586 </tr>
 587 <tr>
 588 <td>
 589                 <p>
 590                   <code class="literal">\r</code>
 591                 </p>
 592               </td>
 593 <td>
 594                 <p>
 595                   <code class="literal">\r</code>
 596                 </p>
 597               </td>
 598 </tr>
 599 <tr>
 600 <td>
 601                 <p>
 602                   <code class="literal">\t</code>
 603                 </p>
 604               </td>
 605 <td>
 606                 <p>
 607                   <code class="literal">\t</code>
 608                 </p>
 609               </td>
 610 </tr>
 611 <tr>
 612 <td>
 613                 <p>
 614                   <code class="literal">\v</code>
 615                 </p>
 616               </td>
 617 <td>
 618                 <p>
 619                   <code class="literal">\v</code>
 620                 </p>
 621               </td>
 622 </tr>
 623 <tr>
 624 <td>
 625                 <p>
 626                   <code class="literal">\b</code>
 627                 </p>
 628               </td>
 629 <td>
 630                 <p>
 631                   <code class="literal">\b</code> (but only inside a character class declaration).
 632                 </p>
 633               </td>
 634 </tr>
 635 <tr>
 636 <td>
 637                 <p>
 638                   <code class="literal">\cX</code>
 639                 </p>
 640               </td>
 641 <td>
 642                 <p>
 643                   An ASCII escape sequence - the character whose code point is X
 644                   % 32
 645                 </p>
 646               </td>
 647 </tr>
 648 <tr>
 649 <td>
 650                 <p>
 651                   <code class="literal">\xdd</code>
 652                 </p>
 653               </td>
 654 <td>
 655                 <p>
 656                   A hexadecimal escape sequence - matches the single character whose
 657                   code point is 0xdd.
 658                 </p>
 659               </td>
 660 </tr>
 661 <tr>
 662 <td>
 663                 <p>
 664                   <code class="literal">\x{dddd}</code>
 665                 </p>
 666               </td>
 667 <td>
 668                 <p>
 669                   A hexadecimal escape sequence - matches the single character whose
 670                   code point is 0xdddd.
 671                 </p>
 672               </td>
 673 </tr>
 674 <tr>
 675 <td>
 676                 <p>
 677                   <code class="literal">\0ddd</code>
 678                 </p>
 679               </td>
 680 <td>
 681                 <p>
 682                   An octal escape sequence - matches the single character whose code
 683                   point is 0ddd.
 684                 </p>
 685               </td>
 686 </tr>
 687 <tr>
 688 <td>
 689                 <p>
 690                   <code class="literal">\N{name}</code>
 691                 </p>
 692               </td>
 693 <td>
 694                 <p>
 695                   Matches the single character which has the <a class="link" href="collating_names.html" title="Collating Names">symbolic
 696                   name</a> <span class="emphasis"><em>name</em></span>. For example <code class="literal">\N{newline}</code>
 697                   matches the single character \n.
 698                 </p>
 699               </td>
 700 </tr>
 701 </tbody>
 702 </table></div>
 703 <h6>
 704 <a name="boost_regex.syntax.perl_syntax.h21"></a>
 705         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.single_character_character_class"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.single_character_character_class">"Single
 706         character" character classes:</a>
 707       </h6>
 708 <p>
        Any escaped character <span class="emphasis"><em>x</em></span>, if <span class="emphasis"><em>x</em></span> is
        the name of a character class, shall match any character that is a member
        of that class. Any escaped character <span class="emphasis"><em>X</em></span>, if <span class="emphasis"><em>x</em></span>
        is the name of a character class, shall match any character not in that class.
 713       </p>
 714 <p>
 715         The following are supported by default:
 716       </p>
 717 <div class="informaltable"><table class="table">
 718 <colgroup>
 719 <col>
 720 <col>
 721 </colgroup>
 722 <thead><tr>
 723 <th>
 724                 <p>
 725                   Escape sequence
 726                 </p>
 727               </th>
 728 <th>
 729                 <p>
 730                   Equivalent to
 731                 </p>
 732               </th>
 733 </tr></thead>
 734 <tbody>
 735 <tr>
 736 <td>
 737                 <p>
 738                   <code class="computeroutput"><span class="special">\</span><span class="identifier">d</span></code>
 739                 </p>
 740               </td>
 741 <td>
 742                 <p>
 743                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]]</span></code>
 744                 </p>
 745               </td>
 746 </tr>
 747 <tr>
 748 <td>
 749                 <p>
 750                   <code class="computeroutput"><span class="special">\</span><span class="identifier">l</span></code>
 751                 </p>
 752               </td>
 753 <td>
 754                 <p>
 755                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code>
 756                 </p>
 757               </td>
 758 </tr>
 759 <tr>
 760 <td>
 761                 <p>
 762                   <code class="computeroutput"><span class="special">\</span><span class="identifier">s</span></code>
 763                 </p>
 764               </td>
 765 <td>
 766                 <p>
 767                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">space</span><span class="special">:]]</span></code>
 768                 </p>
 769               </td>
 770 </tr>
 771 <tr>
 772 <td>
 773                 <p>
 774                   <code class="computeroutput"><span class="special">\</span><span class="identifier">u</span></code>
 775                 </p>
 776               </td>
 777 <td>
 778                 <p>
 779                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">upper</span><span class="special">:]]</span></code>
 780                 </p>
 781               </td>
 782 </tr>
 783 <tr>
 784 <td>
 785                 <p>
 786                   <code class="computeroutput"><span class="special">\</span><span class="identifier">w</span></code>
 787                 </p>
 788               </td>
 789 <td>
 790                 <p>
 791                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">word</span><span class="special">:]]</span></code>
 792                 </p>
 793               </td>
 794 </tr>
 795 <tr>
 796 <td>
 797                 <p>
 798                   <code class="computeroutput"><span class="special">\</span><span class="identifier">h</span></code>
 799                 </p>
 800               </td>
 801 <td>
 802                 <p>
 803                   Horizontal whitespace
 804                 </p>
 805               </td>
 806 </tr>
 807 <tr>
 808 <td>
 809                 <p>
 810                   <code class="computeroutput"><span class="special">\</span><span class="identifier">v</span></code>
 811                 </p>
 812               </td>
 813 <td>
 814                 <p>
 815                   Vertical whitespace
 816                 </p>
 817               </td>
 818 </tr>
 819 <tr>
 820 <td>
 821                 <p>
 822                   <code class="computeroutput"><span class="special">\</span><span class="identifier">D</span></code>
 823                 </p>
 824               </td>
 825 <td>
 826                 <p>
 827                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">digit</span><span class="special">:]]</span></code>
 828                 </p>
 829               </td>
 830 </tr>
 831 <tr>
 832 <td>
 833                 <p>
 834                   <code class="computeroutput"><span class="special">\</span><span class="identifier">L</span></code>
 835                 </p>
 836               </td>
 837 <td>
 838                 <p>
 839                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">lower</span><span class="special">:]]</span></code>
 840                 </p>
 841               </td>
 842 </tr>
 843 <tr>
 844 <td>
 845                 <p>
 846                   <code class="computeroutput"><span class="special">\</span><span class="identifier">S</span></code>
 847                 </p>
 848               </td>
 849 <td>
 850                 <p>
 851                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">space</span><span class="special">:]]</span></code>
 852                 </p>
 853               </td>
 854 </tr>
 855 <tr>
 856 <td>
 857                 <p>
 858                   <code class="computeroutput"><span class="special">\</span><span class="identifier">U</span></code>
 859                 </p>
 860               </td>
 861 <td>
 862                 <p>
 863                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">upper</span><span class="special">:]]</span></code>
 864                 </p>
 865               </td>
 866 </tr>
 867 <tr>
 868 <td>
 869                 <p>
 870                   <code class="computeroutput"><span class="special">\</span><span class="identifier">W</span></code>
 871                 </p>
 872               </td>
 873 <td>
 874                 <p>
 875                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">word</span><span class="special">:]]</span></code>
 876                 </p>
 877               </td>
 878 </tr>
 879 <tr>
 880 <td>
 881                 <p>
 882                   <code class="computeroutput"><span class="special">\</span><span class="identifier">H</span></code>
 883                 </p>
 884               </td>
 885 <td>
 886                 <p>
 887                   Not Horizontal whitespace
 888                 </p>
 889               </td>
 890 </tr>
 891 <tr>
 892 <td>
 893                 <p>
 894                   <code class="computeroutput"><span class="special">\</span><span class="identifier">V</span></code>
 895                 </p>
 896               </td>
 897 <td>
 898                 <p>
 899                   Not Vertical whitespace
 900                 </p>
 901               </td>
 902 </tr>
 903 </tbody>
 904 </table></div>
 905 <h6>
 906 <a name="boost_regex.syntax.perl_syntax.h22"></a>
 907         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.character_properties"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_properties">Character
 908         Properties</a>
 909       </h6>
 910 <p>
 911         The character property names in the following table are all equivalent to
 912         the <a class="link" href="character_classes.html" title="Character Class Names">names used in character
 913         classes</a>.
 914       </p>
 915 <div class="informaltable"><table class="table">
 916 <colgroup>
 917 <col>
 918 <col>
 919 <col>
 920 </colgroup>
 921 <thead><tr>
 922 <th>
 923                 <p>
 924                   Form
 925                 </p>
 926               </th>
 927 <th>
 928                 <p>
 929                   Description
 930                 </p>
 931               </th>
 932 <th>
 933                 <p>
 934                   Equivalent character set form
 935                 </p>
 936               </th>
 937 </tr></thead>
 938 <tbody>
 939 <tr>
 940 <td>
 941                 <p>
 942                   <code class="computeroutput"><span class="special">\</span><span class="identifier">pX</span></code>
 943                 </p>
 944               </td>
 945 <td>
 946                 <p>
 947                   Matches any character that has the property X.
 948                 </p>
 949               </td>
 950 <td>
 951                 <p>
 952                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">X</span><span class="special">:]]</span></code>
 953                 </p>
 954               </td>
 955 </tr>
 956 <tr>
 957 <td>
 958                 <p>
 959                   <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code>
 960                 </p>
 961               </td>
 962 <td>
 963                 <p>
 964                   Matches any character that has the property Name.
 965                 </p>
 966               </td>
 967 <td>
 968                 <p>
 969                   <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Name</span><span class="special">:]]</span></code>
 970                 </p>
 971               </td>
 972 </tr>
 973 <tr>
 974 <td>
 975                 <p>
 976                   <code class="computeroutput"><span class="special">\</span><span class="identifier">PX</span></code>
 977                 </p>
 978               </td>
 979 <td>
 980                 <p>
 981                   Matches any character that does not have the property X.
 982                 </p>
 983               </td>
 984 <td>
 985                 <p>
 986                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">X</span><span class="special">:]]</span></code>
 987                 </p>
 988               </td>
 989 </tr>
 990 <tr>
 991 <td>
 992                 <p>
 993                   <code class="computeroutput"><span class="special">\</span><span class="identifier">P</span><span class="special">{</span><span class="identifier">Name</span><span class="special">}</span></code>
 994                 </p>
 995               </td>
 996 <td>
 997                 <p>
 998                   Matches any character that does not have the property Name.
 999                 </p>
1000               </td>
1001 <td>
1002                 <p>
1003                   <code class="computeroutput"><span class="special">[^[:</span><span class="identifier">Name</span><span class="special">:]]</span></code>
1004                 </p>
1005               </td>
1006 </tr>
1007 </tbody>
1008 </table></div>
1009 <p>
1010         For example <code class="literal">\pd</code> matches any "digit" character,
1011         as does <code class="literal">\p{digit}</code>.
1012       </p>
1013 <h6>
1014 <a name="boost_regex.syntax.perl_syntax.h23"></a>
1015         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.word_boundaries"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.word_boundaries">Word
1016         Boundaries</a>
1017       </h6>
1018 <p>
1019         The following escape sequences match the boundaries of words:
1020       </p>
1021 <p>
        <code class="literal">\&lt;</code> Matches the start of a word.
      </p>
<p>
        <code class="literal">\&gt;</code> Matches the end of a word.
1026       </p>
1027 <p>
1028         <code class="literal">\b</code> Matches a word boundary (the start or end of a word).
1029       </p>
1030 <p>
1031         <code class="literal">\B</code> Matches only when not at a word boundary.
1032       </p>
1033 <h6>
1034 <a name="boost_regex.syntax.perl_syntax.h24"></a>
1035         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.buffer_boundaries"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.buffer_boundaries">Buffer
1036         boundaries</a>
1037       </h6>
1038 <p>
1039         The following match only at buffer boundaries: a "buffer" in this
1040         context is the whole of the input text that is being matched against (note
1041         that ^ and $ may match embedded newlines within the text).
1042       </p>
1043 <p>
1044         \` Matches at the start of a buffer only.
1045       </p>
1046 <p>
1047         \' Matches at the end of a buffer only.
1048       </p>
1049 <p>
1050         \A Matches at the start of a buffer only (the same as <code class="literal">\`</code>).
1051       </p>
1052 <p>
1053         \z Matches at the end of a buffer only (the same as <code class="literal">\'</code>).
1054       </p>
1055 <p>
1056         \Z Matches a zero-width assertion consisting of an optional sequence of newlines
1057         at the end of a buffer: equivalent to the regular expression <code class="literal">(?=\v*\z)</code>.
1058         Note that this is subtly different from Perl which behaves as if matching
1059         <code class="literal">(?=\n?\z)</code>.
1060       </p>
1061 <h6>
1062 <a name="boost_regex.syntax.perl_syntax.h25"></a>
1063         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.continuation_escape"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.continuation_escape">Continuation
1064         Escape</a>
1065       </h6>
1066 <p>
        The sequence <code class="literal">\G</code> matches only at the end of the last match
        found, or at the start of the text being matched if no previous match was
        found. This escape is useful if you're iterating over the matches contained
        within a text, and you want each subsequent match to start where the last
        one ended.
1072       </p>
1073 <h6>
1074 <a name="boost_regex.syntax.perl_syntax.h26"></a>
1075         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.quoting_escape"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.quoting_escape">Quoting
1076         escape</a>
1077       </h6>
1078 <p>
1079         The escape sequence <code class="literal">\Q</code> begins a "quoted sequence":
1080         all the subsequent characters are treated as literals, until either the end
1081         of the regular expression or \E is found. For example the expression: <code class="literal">\Q*+\Ea+</code>
1082         would match either of:
1083       </p>
<pre class="programlisting"><span class="special">*+</span><span class="identifier">a</span>
<span class="special">*+</span><span class="identifier">aaa</span>
1086 </pre>
1087 <h6>
1088 <a name="boost_regex.syntax.perl_syntax.h27"></a>
1089         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.unicode_escapes"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.unicode_escapes">Unicode
1090         escapes</a>
1091       </h6>
1092 <p>
        <code class="literal">\C</code> Matches a single code point: in Boost regex this has
        exactly the same effect as a "." operator.
      </p>
<p>
        <code class="literal">\X</code> Matches a combining character sequence: that is any
        non-combining character followed by a sequence of zero or more combining characters.
1097       </p>
1098 <h6>
1099 <a name="boost_regex.syntax.perl_syntax.h28"></a>
1100         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.matching_line_endings"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.matching_line_endings">Matching Line
1101         Endings</a>
1102       </h6>
1103 <p>
        The escape sequence <code class="literal">\R</code> matches any line ending character
        sequence; specifically, it is identical to the expression <code class="literal">(?&gt;\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])</code>.
1106       </p>
1107 <h6>
1108 <a name="boost_regex.syntax.perl_syntax.h29"></a>
1109         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.keeping_back_some_text"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.keeping_back_some_text">Keeping back
1110         some text</a>
1111       </h6>
1112 <p>
1113         <code class="literal">\K</code> Resets the start location of $0 to the current text
1114         position: in other words everything to the left of \K is "kept back"
1115         and does not form part of the regular expression match. $` is updated accordingly.
1116       </p>
1117 <p>
1118         For example <code class="literal">foo\Kbar</code> matched against the text "foobar"
1119         would return the match "bar" for $0 and "foo" for $`.
1120         This can be used to simulate variable width lookbehind assertions.
1121       </p>
1122 <h6>
1123 <a name="boost_regex.syntax.perl_syntax.h30"></a>
1124         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.any_other_escape"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.any_other_escape">Any
1125         other escape</a>
1126       </h6>
1127 <p>
1128         Any other escape sequence matches the character that is escaped, for example
1129         \@ matches a literal '@'.
1130       </p>
1131 <h5>
1132 <a name="boost_regex.syntax.perl_syntax.h31"></a>
1133         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.perl_extended_patterns"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.perl_extended_patterns">Perl Extended
1134         Patterns</a>
1135       </h5>
1136 <p>
1137         Perl-specific extensions to the regular expression syntax all start with
1138         <code class="literal">(?</code>.
1139       </p>
1140 <h6>
1141 <a name="boost_regex.syntax.perl_syntax.h32"></a>
1142         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.named_subexpressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.named_subexpressions">Named
1143         Subexpressions</a>
1144       </h6>
1145 <p>
1146         You can create a named subexpression using:
1147       </p>
1148 <pre class="programlisting"><span class="special">(?&lt;</span><span class="identifier">NAME</span><span class="special">&gt;</span><span class="identifier">expression</span><span class="special">)</span>
1149 </pre>
1150 <p>
        which can then be referred to by the name <span class="emphasis"><em>NAME</em></span>. Alternatively,
        you can delimit the name using 'NAME' as in:
1153       </p>
1154 <pre class="programlisting"><span class="special">(?</span><span class="char">'NAME'</span><span class="identifier">expression</span><span class="special">)</span>
1155 </pre>
1156 <p>
1157         These named subexpressions can be referred to in a backreference using either
1158         <code class="literal">\g{NAME}</code> or <code class="literal">\k&lt;NAME&gt;</code> and can
1159         also be referred to by name in a <a class="link" href="../format/perl_format.html" title="Perl Format String Syntax">Perl</a>
1160         format string for search and replace operations, or in the <a class="link" href="../ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a> member functions.
1161       </p>
1162 <h6>
1163 <a name="boost_regex.syntax.perl_syntax.h33"></a>
1164         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.comments"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.comments">Comments</a>
1165       </h6>
1166 <p>
        <code class="literal">(?# ... )</code> is treated as a comment; its contents are ignored.
1168       </p>
1169 <h6>
1170 <a name="boost_regex.syntax.perl_syntax.h34"></a>
1171         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.modifiers"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.modifiers">Modifiers</a>
1172       </h6>
1173 <p>
        <code class="literal">(?imsx-imsx ... )</code> alters which of the Perl modifiers are
        in effect within the pattern. Changes take effect from the point that the
        block is first seen and extend to any enclosing <code class="literal">)</code>. Letters
        before a '-' turn that Perl modifier on; letters afterward turn it off.
1178       </p>
1179 <p>
1180         <code class="literal">(?imsx-imsx:pattern)</code> applies the specified modifiers to
1181         pattern only.
1182       </p>
1183 <h6>
1184 <a name="boost_regex.syntax.perl_syntax.h35"></a>
1185         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.non_marking_groups"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_marking_groups">Non-marking
1186         groups</a>
1187       </h6>
1188 <p>
1189         <code class="literal">(?:pattern)</code> lexically groups pattern, without generating
1190         an additional sub-expression.
1191       </p>
1192 <h6>
1193 <a name="boost_regex.syntax.perl_syntax.h36"></a>
1194         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.branch_reset"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset">Branch
1195         reset</a>
1196       </h6>
1197 <p>
1198         <code class="literal">(?|pattern)</code> resets the subexpression count at the start
1199         of each "|" alternative within <span class="emphasis"><em>pattern</em></span>.
1200       </p>
1201 <p>
1202         The sub-expression count following this construct is that of whichever branch
1203         had the largest number of sub-expressions. This construct is useful when
1204         you want to capture one of a number of alternative matches in a single sub-expression
1205         index.
1206       </p>
1207 <p>
1208         In the following example the index of each sub-expression is shown below
1209         the expression:
1210       </p>
1211 <pre class="programlisting"># before  ---------------branch-reset----------- after
1212 / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
1213 # 1            2         2  3        2     3     4
1214 </pre>
1215 <h6>
1216 <a name="boost_regex.syntax.perl_syntax.h37"></a>
1217         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.lookahead"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.lookahead">Lookahead</a>
1218       </h6>
1219 <p>
1220         <code class="literal">(?=pattern)</code> consumes zero characters, only if pattern
1221         matches.
1222       </p>
1223 <p>
1224         <code class="literal">(?!pattern)</code> consumes zero characters, only if pattern
1225         does not match.
1226       </p>
1227 <p>
1228         Lookahead is typically used to create the logical AND of two regular expressions,
1229         for example if a password must contain a lower case letter, an upper case
1230         letter, a punctuation symbol, and be at least 6 characters long, then the
1231         expression:
1232       </p>
1233 <pre class="programlisting"><span class="special">(?=.*[[:</span><span class="identifier">lower</span><span class="special">:]])(?=.*[[:</span><span class="identifier">upper</span><span class="special">:]])(?=.*[[:</span><span class="identifier">punct</span><span class="special">:]]).{</span><span class="number">6</span><span class="special">,}</span>
1234 </pre>
1235 <p>
1236         could be used to validate the password.
1237       </p>
1238 <h6>
1239 <a name="boost_regex.syntax.perl_syntax.h38"></a>
1240         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.lookbehind"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.lookbehind">Lookbehind</a>
1241       </h6>
1242 <p>
1243         <code class="literal">(?&lt;=pattern)</code> consumes zero characters, only if pattern
1244         could be matched against the characters preceding the current position (pattern
1245         must be of fixed length).
1246       </p>
1247 <p>
1248         <code class="literal">(?&lt;!pattern)</code> consumes zero characters, only if pattern
1249         could not be matched against the characters preceding the current position
1250         (pattern must be of fixed length).
1251       </p>
1252 <h6>
1253 <a name="boost_regex.syntax.perl_syntax.h39"></a>
1254         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.independent_sub_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.independent_sub_expressions">Independent
1255         sub-expressions</a>
1256       </h6>
1257 <p>
        <code class="literal">(?&gt;pattern)</code> matches <span class="emphasis"><em>pattern</em></span>
        independently of the surrounding patterns: the expression will never backtrack
        into <span class="emphasis"><em>pattern</em></span>. Independent sub-expressions are typically
        used to improve performance. Only the best possible match for <span class="emphasis"><em>pattern</em></span> will
        be considered; if this doesn't allow the expression as a whole to match, then
        no match is found at all.
1264       </p>
1265 <h6>
1266 <a name="boost_regex.syntax.perl_syntax.h40"></a>
1267         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.recursive_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions">Recursive
1268         Expressions</a>
1269       </h6>
1270 <p>
1271         <code class="literal">(?<span class="emphasis"><em>N</em></span>) (?-<span class="emphasis"><em>N</em></span>) (?+<span class="emphasis"><em>N</em></span>)
1272         (?R) (?0) (?&amp;NAME)</code>
1273       </p>
1274 <p>
1275         <code class="literal">(?R)</code> and <code class="literal">(?0)</code> recurse to the start
1276         of the entire pattern.
1277       </p>
1278 <p>
1279         <code class="literal">(?<span class="emphasis"><em>N</em></span>)</code> executes sub-expression <span class="emphasis"><em>N</em></span>
1280         recursively, for example <code class="literal">(?2)</code> will recurse to sub-expression
1281         2.
1282       </p>
1283 <p>
1284         <code class="literal">(?-<span class="emphasis"><em>N</em></span>)</code> and <code class="literal">(?+<span class="emphasis"><em>N</em></span>)</code>
1285         are relative recursions, so for example <code class="literal">(?-1)</code> recurses
1286         to the last sub-expression to be declared, and <code class="literal">(?+1)</code> recurses
1287         to the next sub-expression to be declared.
1288       </p>
1289 <p>
1290         <code class="literal">(?&amp;NAME)</code> recurses to named sub-expression <span class="emphasis"><em>NAME</em></span>.
1291       </p>
1292 <h6>
1293 <a name="boost_regex.syntax.perl_syntax.h41"></a>
1294         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.conditional_expressions"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.conditional_expressions">Conditional
1295         Expressions</a>
1296       </h6>
1297 <p>
1298         <code class="literal">(?(condition)yes-pattern|no-pattern)</code> attempts to match
1299         <span class="emphasis"><em>yes-pattern</em></span> if the <span class="emphasis"><em>condition</em></span> is
1300         true, otherwise attempts to match <span class="emphasis"><em>no-pattern</em></span>.
1301       </p>
1302 <p>
1303         <code class="literal">(?(condition)yes-pattern)</code> attempts to match <span class="emphasis"><em>yes-pattern</em></span>
1304         if the <span class="emphasis"><em>condition</em></span> is true, otherwise matches the NULL
1305         string.
1306       </p>
1307 <p>
        <span class="emphasis"><em>condition</em></span> may be one of: a forward lookahead assert,
        the index of a marked sub-expression (the condition becomes true if the sub-expression
        has been matched), or an index of a recursion (the condition becomes true
        if we are executing directly inside the specified recursion).
1312       </p>
1313 <p>
1314         Here is a summary of the possible predicates:
1315       </p>
1316 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
1317 <li class="listitem">
1318             <code class="literal">(?(?=assert)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span>
1319             if the forward look-ahead assert matches, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
1320           </li>
1321 <li class="listitem">
1322             <code class="literal">(?(?!assert)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span>
1323             if the forward look-ahead assert does not match, otherwise executes
1324             <span class="emphasis"><em>no-pattern</em></span>.
1325           </li>
1326 <li class="listitem">
1327             <code class="literal">(?(<span class="emphasis"><em>N</em></span>)yes-pattern|no-pattern)</code>
1328             Executes <span class="emphasis"><em>yes-pattern</em></span> if subexpression <span class="emphasis"><em>N</em></span>
1329             has been matched, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
1330           </li>
1331 <li class="listitem">
1332             <code class="literal">(?(&lt;<span class="emphasis"><em>name</em></span>&gt;)yes-pattern|no-pattern)</code>
1333             Executes <span class="emphasis"><em>yes-pattern</em></span> if named subexpression <span class="emphasis"><em>name</em></span>
1334             has been matched, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
1335           </li>
1336 <li class="listitem">
1337             <code class="literal">(?('<span class="emphasis"><em>name</em></span>')yes-pattern|no-pattern)</code>
1338             Executes <span class="emphasis"><em>yes-pattern</em></span> if named subexpression <span class="emphasis"><em>name</em></span>
1339             has been matched, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
1340           </li>
1341 <li class="listitem">
1342             <code class="literal">(?(R)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span>
1343             if we are executing inside a recursion, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
1344           </li>
1345 <li class="listitem">
1346             <code class="literal">(?(R<span class="emphasis"><em>N</em></span>)yes-pattern|no-pattern)</code>
1347             Executes <span class="emphasis"><em>yes-pattern</em></span> if we are executing inside
1348             a recursion to sub-expression <span class="emphasis"><em>N</em></span>, otherwise executes
1349             <span class="emphasis"><em>no-pattern</em></span>.
1350           </li>
1351 <li class="listitem">
1352             <code class="literal">(?(R&amp;<span class="emphasis"><em>name</em></span>)yes-pattern|no-pattern)</code>
1353             Executes <span class="emphasis"><em>yes-pattern</em></span> if we are executing inside
1354             a recursion to named sub-expression <span class="emphasis"><em>name</em></span>, otherwise
1355             executes <span class="emphasis"><em>no-pattern</em></span>.
1356           </li>
1357 <li class="listitem">
            <code class="literal">(?(DEFINE)never-executed-pattern)</code> Defines a block
1359             of code that is never executed and matches no characters: this is usually
1360             used to define one or more named sub-expressions which are referred to
1361             from elsewhere in the pattern.
1362           </li>
1363 </ul></div>
1364 <h6>
1365 <a name="boost_regex.syntax.perl_syntax.h42"></a>
1366         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.backtracking_control_verbs"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.backtracking_control_verbs">Backtracking
1367         Control Verbs</a>
1368       </h6>
1369 <p>
        This library has partial support for Perl's backtracking control verbs;
        in particular, <code class="literal">(*MARK)</code> is not supported. There may also be
        differences of detail in behaviour between this library and Perl, not least
        because Perl's behaviour is rather under-documented and somewhat unpredictable
        in practice. The verbs supported are:
1375       </p>
1376 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
1377 <li class="listitem">
1378             <code class="literal">(*PRUNE)</code> Has no effect unless backtracked onto, in
1379             which case all the backtracking information prior to this point is discarded.
1380           </li>
1381 <li class="listitem">
1382             <code class="literal">(*SKIP)</code> Behaves the same as <code class="literal">(*PRUNE)</code>
1383             except that it is assumed that no match can possibly occur prior to the
1384             current point in the string being searched. This can be used to optimize
            searches by skipping over chunks of text that are already known to be
            unable to form a match.
1387           </li>
1388 <li class="listitem">
1389             <code class="literal">(*THEN)</code> Has no effect unless backtracked onto, in
1390             which case all subsequent alternatives in a group of alternations are
1391             discarded.
1392           </li>
1393 <li class="listitem">
1394             <code class="literal">(*COMMIT)</code> Has no effect unless backtracked onto, in
1395             which case all subsequent matching/searching attempts are abandoned.
1396           </li>
1397 <li class="listitem">
            <code class="literal">(*FAIL)</code> Causes the match to fail unconditionally at
            this point; it can be used to force the engine to backtrack.
1400           </li>
1401 <li class="listitem">
1402             <code class="literal">(*ACCEPT)</code> Causes the pattern to be considered matched
1403             at the current point. Any half-open sub-expressions are closed at the
1404             current point.
1405           </li>
1406 </ul></div>
1407 <h5>
1408 <a name="boost_regex.syntax.perl_syntax.h43"></a>
1409         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.operator_precedence"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.operator_precedence">Operator
1410         precedence</a>
1411       </h5>
1412 <p>
        The order of precedence of operators, from highest to lowest, is as follows:
1414       </p>
1415 <div class="orderedlist"><ol class="orderedlist" type="1">
1416 <li class="listitem">
1417             Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span>
1418             <span class="special">[::]</span> <span class="special">[..]</span></code>
1419           </li>
1420 <li class="listitem">
1421             Escaped characters <code class="literal">\</code>
1422           </li>
1423 <li class="listitem">
1424             Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code>
1425           </li>
1426 <li class="listitem">
1427             Grouping <code class="literal">()</code>
1428           </li>
1429 <li class="listitem">
1430             Single-character-ERE duplication <code class="literal">* + ? {m,n}</code>
1431           </li>
1432 <li class="listitem">
1433             Concatenation
1434           </li>
1435 <li class="listitem">
            Anchoring <code class="literal">^ $</code>
          </li>
<li class="listitem">
            Alternation <code class="literal">|</code>
1440           </li>
1441 </ol></div>
1442 <h4>
1443 <a name="boost_regex.syntax.perl_syntax.h44"></a>
1444         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.what_gets_matched"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">What
1445         gets matched</a>
1446       </h4>
1447 <p>
1448         If you view the regular expression as a directed (possibly cyclic) graph,
1449         then the best match found is the first match found by a depth-first-search
1450         performed on that graph, while matching the input text.
1451       </p>
1452 <p>
1453         Alternatively:
1454       </p>
1455 <p>
        The best match found is the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost
        match</a>, with individual elements matched as follows:
1458       </p>
1459 <div class="informaltable"><table class="table">
1460 <colgroup>
1461 <col>
1462 <col>
1463 </colgroup>
1464 <thead><tr>
1465 <th>
1466                 <p>
1467                   Construct
1468                 </p>
1469               </th>
1470 <th>
1471                 <p>
1472                   What gets matched
1473                 </p>
1474               </th>
1475 </tr></thead>
1476 <tbody>
1477 <tr>
1478 <td>
1479                 <p>
1480                   <code class="literal">AtomA AtomB</code>
1481                 </p>
1482               </td>
1483 <td>
1484                 <p>
1485                   Locates the best match for <span class="emphasis"><em>AtomA</em></span> that has
1486                   a following match for <span class="emphasis"><em>AtomB</em></span>.
1487                 </p>
1488               </td>
1489 </tr>
1490 <tr>
1491 <td>
1492                 <p>
1493                   <code class="literal">Expression1 | Expression2</code>
1494                 </p>
1495               </td>
1496 <td>
1497                 <p>
                  If <span class="emphasis"><em>Expression1</em></span> can be matched then returns
1499                   that match, otherwise attempts to match <span class="emphasis"><em>Expression2</em></span>.
1500                 </p>
1501               </td>
1502 </tr>
1503 <tr>
1504 <td>
1505                 <p>
1506                   <code class="literal">S{N}</code>
1507                 </p>
1508               </td>
1509 <td>
1510                 <p>
1511                   Matches <span class="emphasis"><em>S</em></span> repeated exactly N times.
1512                 </p>
1513               </td>
1514 </tr>
1515 <tr>
1516 <td>
1517                 <p>
1518                   <code class="literal">S{N,M}</code>
1519                 </p>
1520               </td>
1521 <td>
1522                 <p>
1523                   Matches S repeated between N and M times, and as many times as
1524                   possible.
1525                 </p>
1526               </td>
1527 </tr>
1528 <tr>
1529 <td>
1530                 <p>
1531                   <code class="literal">S{N,M}?</code>
1532                 </p>
1533               </td>
1534 <td>
1535                 <p>
1536                   Matches S repeated between N and M times, and as few times as possible.
1537                 </p>
1538               </td>
1539 </tr>
1540 <tr>
1541 <td>
1542                 <p>
1543                   <code class="literal">S?, S*, S+</code>
1544                 </p>
1545               </td>
1546 <td>
1547                 <p>
1548                   The same as <code class="literal">S{0,1}</code>, <code class="literal">S{0,UINT_MAX}</code>,
1549                   <code class="literal">S{1,UINT_MAX}</code> respectively.
1550                 </p>
1551               </td>
1552 </tr>
1553 <tr>
1554 <td>
1555                 <p>
1556                   <code class="literal">S??, S*?, S+?</code>
1557                 </p>
1558               </td>
1559 <td>
1560                 <p>
1561                   The same as <code class="literal">S{0,1}?</code>, <code class="literal">S{0,UINT_MAX}?</code>,
1562                   <code class="literal">S{1,UINT_MAX}?</code> respectively.
1563                 </p>
1564               </td>
1565 </tr>
1566 <tr>
1567 <td>
1568                 <p>
1569                   <code class="literal">(?&gt;S)</code>
1570                 </p>
1571               </td>
1572 <td>
1573                 <p>
1574                   Matches the best match for <span class="emphasis"><em>S</em></span>, and only that.
1575                 </p>
1576               </td>
1577 </tr>
1578 <tr>
1579 <td>
1580                 <p>
1581                   <code class="literal">(?=S), (?&lt;=S)</code>
1582                 </p>
1583               </td>
1584 <td>
1585                 <p>
1586                   Matches only the best match for <span class="emphasis"><em>S</em></span> (this is
                  only visible if there are capturing parentheses within <span class="emphasis"><em>S</em></span>).
1588                 </p>
1589               </td>
1590 </tr>
1591 <tr>
1592 <td>
1593                 <p>
1594                   <code class="literal">(?!S), (?&lt;!S)</code>
1595                 </p>
1596               </td>
1597 <td>
1598                 <p>
1599                   Considers only whether a match for S exists or not.
1600                 </p>
1601               </td>
1602 </tr>
1603 <tr>
1604 <td>
1605                 <p>
1606                   <code class="literal">(?(condition)yes-pattern | no-pattern)</code>
1607                 </p>
1608               </td>
1609 <td>
1610                 <p>
1611                   If condition is true, then only yes-pattern is considered, otherwise
1612                   only no-pattern is considered.
1613                 </p>
1614               </td>
1615 </tr>
1616 </tbody>
1617 </table></div>
1618 <h4>
1619 <a name="boost_regex.syntax.perl_syntax.h45"></a>
1620         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.variations"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.variations">Variations</a>
1621       </h4>
1622 <p>
1623         The <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions">options
1624         <code class="literal">normal</code>, <code class="literal">ECMAScript</code>, <code class="literal">JavaScript</code>
1625         and <code class="literal">JScript</code></a> are all synonyms for <code class="literal">perl</code>.
1626       </p>
1627 <h4>
1628 <a name="boost_regex.syntax.perl_syntax.h46"></a>
1629         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.options"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.options">Options</a>
1630       </h4>
1631 <p>
        There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions">variety
        of flags</a> that may be combined with the <code class="literal">perl</code> option
        when constructing the regular expression. In particular, note that the <code class="literal">newline_alt</code>
        option alters the syntax, while the <code class="literal">collate</code> and
        <code class="literal">icase</code> options modify how locale and case sensitivity
        are applied, and the <code class="literal">nosubs</code> option stops sub-expression
        matches from being stored.
1638       </p>
1639 <h4>
1640 <a name="boost_regex.syntax.perl_syntax.h47"></a>
1641         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.pattern_modifiers"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.pattern_modifiers">Pattern
1642         Modifiers</a>
1643       </h4>
1644 <p>
        The Perl <code class="literal">smix</code> modifiers can be applied either with a
        <code class="literal">(?smix-smix)</code> prefix to the regular expression, or with
        one of the <a class="link" href="../ref/syntax_option_type/syntax_option_type_perl.html" title="Options for Perl Regular Expressions">regex
        compile-time flags <code class="literal">no_mod_m</code>, <code class="literal">mod_x</code>, <code class="literal">mod_s</code>,
        and <code class="literal">no_mod_s</code></a>.
1650       </p>
1651 <h4>
1652 <a name="boost_regex.syntax.perl_syntax.h48"></a>
1653         <span class="phrase"><a name="boost_regex.syntax.perl_syntax.references"></a></span><a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.references">References</a>
1654       </h4>
1655 <p>
1656         <a href="http://perldoc.perl.org/perlre.html" target="_top">Perl 5.8</a>.
1657       </p>
1658 </div>
1659 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
1660 <td align="left"></td>
1661 <td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
1662         Distributed under the Boost Software License, Version 1.0. (See accompanying
1663         file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
1664       </p>
1665 </div></td>
1666 </tr></table>
1667 <hr>
1668 <div class="spirit-nav">
1669 <a accesskey="p" href="../syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_extended.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
1670 </div>
1671 </body>
1672 </html>