   1 [/
   2   Copyright 2006-2007 John Maddock.
   3   Distributed under the Boost Software License, Version 1.0.
   4   (See accompanying file LICENSE_1_0.txt or copy at
   5   http://www.boost.org/LICENSE_1_0.txt).
   6 ]
   7
   8
   9 [section:perl_syntax Perl Regular Expression Syntax]
  10
  11 [h3 Synopsis]
  12
The Perl regular expression syntax is based on that used by the
programming language Perl.  Perl regular expressions are the
default behavior in Boost.Regex; you can also request them explicitly by
passing the flag [^perl] to the [basic_regex] constructor, for example:
  17
   // e1 is a case sensitive Perl regular expression:
   // since Perl is the default option there's no need to explicitly specify the syntax used here:
   boost::regex e1(my_expression);
   // e2 is a case insensitive Perl regular expression:
   boost::regex e2(my_expression, boost::regex::perl|boost::regex::icase);
  23
  24 [h3 Perl Regular Expression Syntax]
  25
  26 In Perl regular expressions, all characters match themselves except for the
  27 following special characters:
  28
  29 [pre .\[{}()\\\*+?|^$]
  30
  31 [h4 Wildcard]
  32
  33 The single character '.' when used outside of a character set will match
  34 any single character except:
  35
  36 * The NULL character when the [link boost_regex.ref.match_flag_type flag
  37    [^match_not_dot_null]] is passed to the matching algorithms.
  38 * The newline character when the [link boost_regex.ref.match_flag_type
  39    flag [^match_not_dot_newline]] is passed to
  40    the matching algorithms.
  41
  42 [h4 Anchors]
  43
  44 A '^' character shall match the start of a line.
  45
  46 A '$' character shall match the end of a line.
  47
  48 [h4 Marked sub-expressions]
  49
A section beginning [^(] and ending [^)] acts as a marked sub-expression.
Whatever matched the sub-expression is split out in a separate field by
the matching algorithms.  Marked sub-expressions can also be repeated, or
referred to by a back-reference.
  54
  55 [h4 Non-marking grouping]
  56
A marked sub-expression is useful to lexically group part of a regular
expression, but has the side-effect of splitting out an extra field in
the result.  As an alternative you can lexically group part of a
regular expression without generating a marked sub-expression by using
[^(?:] and [^)], for example [^(?:ab)+] will repeat [^ab] without splitting
out any separate sub-expressions.
  63
  64 [h4 Repeats]
  65
  66 Any atom (a single character, a marked sub-expression, or a character class)
  67 can be repeated with the [^*], [^+], [^?], and [^{}] operators.
  68
  69 The [^*] operator will match the preceding atom zero or more times,
  70 for example the expression [^a*b] will match any of the following:
  71
  72    b
  73    ab
  74    aaaaaaaab
  75
  76 The [^+] operator will match the preceding atom one or more times, for
  77 example the expression [^a+b] will match any of the following:
  78
  79    ab
  80    aaaaaaaab
  81
  82 But will not match:
  83
  84    b
  85
The [^?] operator will match the preceding atom zero or one times, for
example the expression [^ca?b] will match any of the following:
  88
  89    cb
  90    cab
  91
  92 But will not match:
  93
  94    caab
  95
  96 An atom can also be repeated with a bounded repeat:
  97
  98 [^a{n}]  Matches 'a' repeated exactly n times.
  99
 100 [^a{n,}]  Matches 'a' repeated n or more times.
 101
 102 [^a{n, m}]  Matches 'a' repeated between n and m times inclusive.
 103
 104 For example:
 105
 106 [pre ^a{2,3}$]
 107
 108 Will match either of:
 109
 110    aa
 111    aaa
 112
 113 But neither of:
 114
 115    a
 116    aaaa
 117
Note that the "{" and "}" characters will be treated as ordinary literals when used
in a context that is not a repeat: this matches Perl 5.x behavior.  For example in
the expressions "ab{1", "ab1}" and "a{b}c" the curly brackets are all treated as
literals and ['no error will be raised].
 122
It is an error to use a repeat operator if the preceding construct cannot
be repeated, for example:
 125
 126    a(*)
 127
 128 Will raise an error, as there is nothing for the [^*] operator to be applied to.
 129
 130 [h4 Non greedy repeats]
 131
 132 The normal repeat operators are "greedy", that is to say they will consume as
 133 much input as possible.  There are non-greedy versions available that will
 134 consume as little input as possible while still producing a match.
 135
 136 [^*?] Matches the previous atom zero or more times, while consuming as little
 137    input as possible.
 138
 139 [^+?] Matches the previous atom one or more times, while consuming as
 140    little input as possible.
 141
 142 [^??] Matches the previous atom zero or one times, while consuming
 143    as little input as possible.
 144
 145 [^{n,}?] Matches the previous atom n or more times, while consuming as
 146    little input as possible.
 147
 148 [^{n,m}?] Matches the previous atom between n and m times, while
 149    consuming as little input as possible.
 150
 151 [h4 Possessive repeats]
 152
By default, when a repeated pattern does not match, the engine will backtrack until
a match is found.  However, this behaviour can sometimes be undesirable, so there are
also "possessive" repeats: these match as much as possible and do not then allow
backtracking if the rest of the expression fails to match.
 157
 158 [^*+] Matches the previous atom zero or more times, while giving nothing back.
 159
 160 [^++] Matches the previous atom one or more times, while giving nothing back.
 161
 162 [^?+] Matches the previous atom zero or one times, while giving nothing back.
 163
 164 [^{n,}+] Matches the previous atom n or more times, while giving nothing back.
 165
 166 [^{n,m}+] Matches the previous atom between n and m times, while giving nothing back.
 167
 168 [h4 Back references]
 169
An escape character followed by a digit /n/, where /n/ is in the range 1-9,
matches the same string that was matched by sub-expression /n/.  For example
the expression:

[pre ^(a+)b+\\1$]

Will match the string:

   aaabbaaa

But not the string:

   aaabba
 183
 184 You can also use the \g escape for the same function, for example:
 185
 186 [table
 187 [[Escape][Meaning]]
 188 [[[^\g1]][Match whatever matched sub-expression 1]]
 189 [[[^\g{1}]][Match whatever matched sub-expression 1: this form allows for safer
 190         parsing of the expression in cases like [^\g{1}2] or for indexes higher than 9 as in [^\g{1234}]]]
 191 [[[^\g-1]][Match whatever matched the last opened sub-expression]]
 192 [[[^\g{-2}]][Match whatever matched the last but one opened sub-expression]]
 193 [[[^\g{one}]][Match whatever matched the sub-expression named "one"]]
 194 ]
 195
 196 Finally the \k escape can be used to refer to named subexpressions, for example [^\k<two>] will match
 197 whatever matched the subexpression named "two".
 198
 199 [h4 Alternation]
 200
 201 The [^|] operator will match either of its arguments, so for example:
 202 [^abc|def] will match either "abc" or "def".
 203
Parentheses can be used to group alternations, for example: [^ab(d|ef)]
will match either of "abd" or "abef".
 206
 207 Empty alternatives are not allowed (these are almost always a mistake), but
 208 if you really want an empty alternative use [^(?:)] as a placeholder, for example:
 209
 210 [^|abc] is not a valid expression, but
 211
 212 [^(?:)|abc] is and is equivalent, also the expression:
 213
 214 [^(?:abc)??] has exactly the same effect.
 215
 216 [h4 Character sets]
 217
 218 A character set is a bracket-expression starting with [^[] and ending with [^]],
 219 it defines a set of characters, and matches any single character that is a
 220 member of that set.
 221
 222 A bracket expression may contain any combination of the following:
 223
 224 [h5 Single characters]
 225
 226 For example [^\[abc\]], will match any of the characters 'a', 'b', or 'c'.
 227
 228 [h5 Character ranges]
 229
 230 For example [^\[a-c\]] will match any single character in the range 'a' to 'c'.
 231 By default, for Perl regular expressions, a character x is within the
 232 range y to z, if the code point of the character lies within the codepoints of
 233 the endpoints of the range.  Alternatively, if you set the
 234 [link boost_regex.ref.syntax_option_type.syntax_option_type_perl [^collate] flag]
 235 when constructing the regular expression, then ranges are locale sensitive.
 236
 237 [h5 Negation]
 238
 239 If the bracket-expression begins with the ^ character, then it matches the
 240 complement of the characters it contains, for example [^\[^a-c\]] matches
 241 any character that is not in the range [^a-c].
 242
 243 [h5 Character classes]
 244
 245 An expression of the form [^\[\[:name:\]\]] matches the named character class
 246 "name", for example [^\[\[:lower:\]\]] matches any lower case character.
 247 See [link boost_regex.syntax.character_classes character class names].
 248
 249 [h5 Collating Elements]
 250
 251 An expression of the form [^\[\[.col.\]\]] matches the collating element /col/.
 252 A collating element is any single character, or any sequence of characters
 253 that collates as a single unit.  Collating elements may also be used
 254 as the end point of a range, for example: [^\[\[.ae.\]-c\]] matches the
 255 character sequence "ae", plus any single character in the range "ae"-c,
 256 assuming that "ae" is treated as a single collating element in the current locale.
 257
As an extension, a collating element may also be specified via its
[link boost_regex.syntax.collating_names symbolic name], for example:
 260
 261    [[.NUL.]]
 262
 263 matches a [^\0] character.
 264
 265 [h5 Equivalence classes]
 266
An expression of the form [^\[\[\=col\=\]\]], matches any character or collating element
whose primary sort key is the same as that for collating element /col/.  As with
collating elements, the name /col/ may be a
[link boost_regex.syntax.collating_names symbolic name].  A primary sort key is
one that ignores case, accentuation, or locale-specific tailorings; so for
example `[[=a=]]` matches any of the characters:
a, '''&#xC0;''', '''&#xC1;''', '''&#xC2;''',
'''&#xC3;''', '''&#xC4;''', '''&#xC5;''', A, '''&#xE0;''', '''&#xE1;''',
'''&#xE2;''', '''&#xE3;''', '''&#xE4;''' and '''&#xE5;'''.
Unfortunately implementation of this is reliant on the platform's collation
and localisation support; this feature cannot be relied upon to work portably
across all platforms, or even all locales on one platform.
 279
 280 [h5 Escaped Characters]
 281
 282 All the escape sequences that match a single character, or a single character
 283 class are permitted within a character class definition.  For example
 284 `[\[\]]` would match either of `[` or `]` while `[\W\d]` would match any character
 285 that is either a "digit", /or/ is /not/ a "word" character.
 286
 287 [h5 Combinations]
 288
 289 All of the above can be combined in one character set declaration, for example:
 290 [^\[\[:digit:\]a-c\[.NUL.\]\]].
 291
 292 [h4 Escapes]
 293
 294 Any special character preceded by an escape shall match itself.
 295
 296 The following escape sequences are all synonyms for single characters:
 297
 298 [table
 299 [[Escape][Character]]
 300 [[[^\a]][[^\a]]]
 301 [[[^\e]][[^0x1B]]]
 302 [[[^\f]][[^\f]]]
 303 [[[^\n]][[^\n]]]
 304 [[[^\r]][[^\r]]]
 305 [[[^\t]][[^\t]]]
 306 [[[^\v]][[^\v]]]
 307 [[[^\b]][[^\b] (but only inside a character class declaration).]]
 308 [[[^\cX]][An ASCII escape sequence - the character whose code point is X % 32]]
 309 [[[^\xdd]][A hexadecimal escape sequence - matches the single character whose
 310       code point is 0xdd.]]
 311 [[[^\x{dddd}]][A hexadecimal escape sequence - matches the single character whose
 312       code point is 0xdddd.]]
 313 [[[^\0ddd]][An octal escape sequence - matches the single character whose
 314    code point is 0ddd.]]
 315 [[[^\N{name}]][Matches the single character which has the
 316       [link boost_regex.syntax.collating_names symbolic name] /name/.
 317       For example [^\N{newline}] matches the single character \\n.]]
 318 ]
 319
 320 [h5 "Single character" character classes:]
 321
 322 Any escaped character /x/, if /x/ is the name of a character class shall
 323 match any character that is a member of that class, and any
 324 escaped character /X/, if /x/ is the name of a character class, shall
 325 match any character not in that class.
 326
 327 The following are supported by default:
 328
 329 [table
 330 [[Escape sequence][Equivalent to]]
 331 [[`\d`][`[[:digit:]]`]]
 332 [[`\l`][`[[:lower:]]`]]
 333 [[`\s`][`[[:space:]]`]]
 334 [[`\u`][`[[:upper:]]`]]
 335 [[`\w`][`[[:word:]]`]]
 336 [[`\h`][Horizontal whitespace]]
 337 [[`\v`][Vertical whitespace]]
 338 [[`\D`][`[^[:digit:]]`]]
 339 [[`\L`][`[^[:lower:]]`]]
 340 [[`\S`][`[^[:space:]]`]]
 341 [[`\U`][`[^[:upper:]]`]]
 342 [[`\W`][`[^[:word:]]`]]
 343 [[`\H`][Not Horizontal whitespace]]
 344 [[`\V`][Not Vertical whitespace]]
 345 ]
 346
 347 [h5 Character Properties]
 348
 349 The character property names in the following table are all equivalent
 350 to the [link boost_regex.syntax.character_classes names used in character classes].
 351
 352 [table
 353 [[Form][Description][Equivalent character set form]]
 354 [[`\pX`][Matches any character that has the property X.][`[[:X:]]`]]
 355 [[`\p{Name}`][Matches any character that has the property Name.][`[[:Name:]]`]]
 356 [[`\PX`][Matches any character that does not have the property X.][`[^[:X:]]`]]
 357 [[`\P{Name}`][Matches any character that does not have the property Name.][`[^[:Name:]]`]]
 358 ]
 359
 360 For example [^\pd] matches any "digit" character, as does [^\p{digit}].
 361
 362 [h5 Word Boundaries]
 363
 364 The following escape sequences match the boundaries of words:
 365
 366 [^\<]   Matches the start of a word.
 367
 368 [^\>]   Matches the end of a word.
 369
 370 [^\b]   Matches a word boundary (the start or end of a word).
 371
 372 [^\B]   Matches only when not at a word boundary.
 373
 374 [h5 Buffer boundaries]
 375
 376 The following match only at buffer boundaries: a "buffer" in this
 377 context is the whole of the input text that is being matched against
 378 (note that ^ and $ may match embedded newlines within the text).
 379
 380 \\\`    Matches at the start of a buffer only.
 381
 382 \\'     Matches at the end of a buffer only.
 383
 384 \\A     Matches at the start of a buffer only (the same as [^\\\`]).
 385
 386 \\z     Matches at the end of a buffer only (the same as [^\\']).
 387
 388 \\Z     Matches a zero-width assertion consisting of an optional sequence of newlines at the end of a buffer:
 389 equivalent to the regular expression [^(?=\\v*\\z)].  Note that this is subtly different from Perl which
 390 behaves as if matching [^(?=\\n?\\z)].
 391
 392 [h5 Continuation Escape]
 393
The sequence [^\G] matches only at the end of the last match found, or at
the start of the text being matched if no previous match was found.
This escape is useful if you're iterating over the matches contained within a
text, and you want each subsequent match to start where the last one ended.
 398
 399 [h5 Quoting escape]
 400
 401 The escape sequence [^\Q] begins a "quoted sequence": all the subsequent characters
 402 are treated as literals, until either the end of the regular expression or \\E
 403 is found.  For example the expression: [^\Q\*+\Ea+] would match either of:
 404
 405     \*+a
 406     \*+aaa
 407
 408 [h5 Unicode escapes]
 409
[^\C]   Matches a single code point: in Boost regex this has exactly the
   same effect as a "." operator.

[^\X]   Matches a combining character sequence: that is any non-combining
      character followed by a sequence of zero or more combining characters.
 414
 415 [h5 Matching Line Endings]
 416
 417 The escape sequence [^\R] matches any line ending character sequence, specifically it is identical to
 418 the expression [^(?>\x0D\x0A?|\[\x0A-\x0C\x85\x{2028}\x{2029}\])].
 419
 420 [h5 Keeping back some text]
 421
 422 [^\K] Resets the start location of $0 to the current text position: in other words everything to the
 423 left of \K is "kept back" and does not form part of the regular expression match. $` is updated
 424 accordingly.
 425
 426 For example [^foo\Kbar] matched against the text "foobar" would return the match "bar" for $0 and "foo"
 427 for $`.  This can be used to simulate variable width lookbehind assertions.
 428
 429 [h5 Any other escape]
 430
 431 Any other escape sequence matches the character that is escaped, for example
 432 \\@ matches a literal '@'.
 433
 434 [h4 Perl Extended Patterns]
 435
 436 Perl-specific extensions to the regular expression syntax all start with [^(?].
 437
 438 [h5 Named Subexpressions]
 439
 440 You can create a named subexpression using:
 441
 442         (?<NAME>expression)
 443
Which can then be referred to by the name /NAME/.  Alternatively you can delimit the name
using 'NAME' as in:
 446
 447         (?'NAME'expression)
 448
 449 These named subexpressions can be referred to in a backreference using either [^\g{NAME}] or [^\k<NAME>]
 450 and can also be referred to by name in a [perl_format] format string for search and replace operations, or in the
 451 [match_results] member functions.
 452
 453 [h5 Comments]
 454
[^(?# ... )] is treated as a comment; its contents are ignored.
 456
 457 [h5 Modifiers]
 458
 459 [^(?imsx-imsx ... )] alters which of the perl modifiers are in effect within
 460 the pattern, changes take effect from the point that the block is first seen
 461 and extend to any enclosing [^)].  Letters before a '-' turn that perl
 462 modifier on, letters afterward, turn it off.
 463
 464 [^(?imsx-imsx:pattern)] applies the specified modifiers to pattern only.
 465
 466 [h5 Non-marking groups]
 467
 468 [^(?:pattern)] lexically groups pattern, without generating an additional
 469 sub-expression.
 470
 471 [h5 Branch reset]
 472
 473 [^(?|pattern)]  resets the subexpression count at the start of each "|" alternative within /pattern/.
 474
 475 The sub-expression count following this construct is that of whichever branch had the largest number of
 476 sub-expressions.  This construct is useful when you want to capture one of a number of alternative matches
 477 in a single sub-expression index.
 478
 479 In the following example the index of each sub-expression is shown below the expression:
 480
 481 [pre
 482 # before  ---------------branch-reset----------- after
 483 / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
 484 # 1            2         2  3        2     3     4
 485 ]
 486
 487 [h5 Lookahead]
 488
 489 [^(?=pattern)] consumes zero characters, only if pattern matches.
 490
 491 [^(?!pattern)] consumes zero characters, only if pattern does not match.
 492
 493 Lookahead is typically used to create the logical AND of two regular
 494 expressions, for example if a password must contain a lower case letter,
 495 an upper case letter, a punctuation symbol, and be at least 6 characters long,
 496 then the expression:
 497
 498     (?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}
 499
 500 could be used to validate the password.
 501
 502 [h5 Lookbehind]
 503
 504 [^(?<=pattern)] consumes zero characters, only if pattern could be matched
 505 against the characters preceding the current position (pattern must be
 506 of fixed length).
 507
 508 [^(?<!pattern)] consumes zero characters, only if pattern could not be
 509 matched against the characters preceding the current position (pattern must
 510 be of fixed length).
 511
 512 [h5 Independent sub-expressions]
 513
 514 [^(?>pattern)] /pattern/ is matched independently of the surrounding patterns,
 515 the expression will never backtrack into /pattern/.  Independent sub-expressions
 516 are typically used to improve performance; only the best possible match
 517 for pattern will be considered, if this doesn't allow the expression as a
 518 whole to match then no match is found at all.
 519
 520 [h5 Recursive Expressions]
 521
 522 [^(?['N]) (?-['N]) (?+['N]) (?R) (?0) (?&NAME)]
 523
 524 [^(?R)] and [^(?0)] recurse to the start of the entire pattern.
 525
 526 [^(?['N])] executes sub-expression /N/ recursively, for example [^(?2)] will recurse to sub-expression 2.
 527
 528 [^(?-['N])] and [^(?+['N])] are relative recursions, so for example [^(?-1)] recurses to the last sub-expression to be declared,
 529 and [^(?+1)] recurses to the next sub-expression to be declared.
 530
 531 [^(?&NAME)] recurses to named sub-expression ['NAME].
 532
 533 [h5 Conditional Expressions]
 534
 535 [^(?(condition)yes-pattern|no-pattern)] attempts to match /yes-pattern/ if
 536 the /condition/ is true, otherwise attempts to match /no-pattern/.
 537
 538 [^(?(condition)yes-pattern)] attempts to match /yes-pattern/ if the /condition/
 539 is true, otherwise matches the NULL string.
 540
/condition/ may be either: a forward lookahead assert, the index of
a marked sub-expression (the condition becomes true if the sub-expression
has been matched), or an index of a recursion (the condition becomes true if we are executing
directly inside the specified recursion).
 545
 546 Here is a summary of the possible predicates:
 547
 548 * [^(?(?\=assert)yes-pattern|no-pattern)]  Executes /yes-pattern/ if the forward look-ahead assert matches, otherwise
 549 executes /no-pattern/.
 550 * [^(?(?!assert)yes-pattern|no-pattern)]  Executes /yes-pattern/ if the forward look-ahead assert does not match, otherwise
 551 executes /no-pattern/.
 552 * [^(?(['N])yes-pattern|no-pattern)]  Executes /yes-pattern/ if subexpression /N/ has been matched, otherwise
 553 executes /no-pattern/.
 554 * [^(?(<['name]>)yes-pattern|no-pattern)]  Executes /yes-pattern/ if named subexpression /name/ has been matched, otherwise
 555 executes /no-pattern/.
 556 * [^(?('['name]')yes-pattern|no-pattern)]  Executes /yes-pattern/ if named subexpression /name/ has been matched, otherwise
 557 executes /no-pattern/.
 558 * [^(?(R)yes-pattern|no-pattern)]  Executes /yes-pattern/ if we are executing inside a recursion, otherwise
 559 executes /no-pattern/.
 560 * [^(?(R['N])yes-pattern|no-pattern)]  Executes /yes-pattern/ if we are executing inside a recursion to sub-expression /N/, otherwise
 561 executes /no-pattern/.
 562 * [^(?(R&['name])yes-pattern|no-pattern)]  Executes /yes-pattern/ if we are executing inside a recursion to named sub-expression /name/, otherwise
 563 executes /no-pattern/.
* [^(?(DEFINE)never-executed-pattern)]  Defines a block of code that is never executed and matches no characters:
this is usually used to define one or more named sub-expressions which are referred to from elsewhere in the pattern.
 566
 567 [h5 Backtracking Control Verbs]
 568
 569 This library has partial support for Perl's backtracking control verbs, in particular (*MARK) is not supported.
 570 There may also be detail differences in behaviour between this library and Perl, not least because Perl's behaviour
 571 is rather under-documented and often somewhat random in how it behaves in practice.  The verbs supported are:
 572
 573 * [^(*PRUNE)]  Has no effect unless backtracked onto, in which case all the backtracking information prior to this
 574 point is discarded.
* [^(*SKIP)]   Behaves the same as [^(*PRUNE)] except that it is assumed that no match can possibly occur prior to
the current point in the string being searched.  This can be used to optimize searches by skipping over chunks of text
that have already been determined not to form a match.
 578 * [^(*THEN)]  Has no effect unless backtracked onto, in which case all subsequent alternatives in a group of alternations
 579 are discarded.
 580 * [^(*COMMIT)]  Has no effect unless backtracked onto, in which case all subsequent matching/searching attempts are abandoned.
 581 * [^(*FAIL)]  Causes the match to fail unconditionally at this point, can be used to force the engine to backtrack.
 582 * [^(*ACCEPT)] Causes the pattern to be considered matched at the current point.  Any half-open sub-expressions are closed at the current point.
 583
 584 [h4 Operator precedence]
 585
The order of precedence of operators is as follows:
 587
 588 # Collation-related bracket symbols     `[==] [::] [..]`
 589 # Escaped characters    [^\\]
 590 # Character set (bracket expression)    `[]`
 591 # Grouping      [^()]
 592 # Single-character-ERE duplication      [^* + ? {m,n}]
 593 # Concatenation
 594 # Anchoring     ^$
 595 # Alternation   |
 596
 597 [h3 What gets matched]
 598
 599 If you view the regular expression as a directed (possibly cyclic)
 600 graph, then the best match found is the first match found by a
 601 depth-first-search performed on that graph, while matching the input text.
 602
 603 Alternatively:
 604
The best match found is the
[link boost_regex.syntax.leftmost_longest_rule leftmost match],
with individual elements matched as follows:
 608
 609 [table
 610 [[Construct][What gets matched]]
 611 [[[^AtomA AtomB]][Locates the best match for /AtomA/ that has a following match for /AtomB/.]]
[[[^Expression1 | Expression2]][If /Expression1/ can be matched then returns that match,
   otherwise attempts to match /Expression2/.]]
 614 [[[^S{N}]][Matches /S/ repeated exactly N times.]]
 615 [[[^S{N,M}]][Matches S repeated between N and M times, and as many times as possible.]]
 616 [[[^S{N,M}?]][Matches S repeated between N and M times, and as few times as possible.]]
 617 [[[^S?, S*, S+]][The same as [^S{0,1}], [^S{0,UINT_MAX}], [^S{1,UINT_MAX}] respectively.]]
 618 [[[^S??, S*?, S+?]][The same as [^S{0,1}?], [^S{0,UINT_MAX}?], [^S{1,UINT_MAX}?] respectively.]]
 619 [[[^(?>S)]][Matches the best match for /S/, and only that.]]
 620 [[[^(?=S), (?<=S)]][Matches only the best match for /S/ (this is only
 621       visible if there are capturing parenthesis within /S/).]]
 622 [[[^(?!S), (?<!S)]][Considers only whether a match for S exists or not.]]
 623 [[[^(?(condition)yes-pattern | no-pattern)]][If condition is true, then
 624    only yes-pattern is considered, otherwise only no-pattern is considered.]]
 625 ]
 626
 627 [h3 Variations]
 628
 629 The [link boost_regex.ref.syntax_option_type.syntax_option_type_perl options [^normal],
 630 [^ECMAScript], [^JavaScript] and [^JScript]] are all synonyms for
 631 [^perl].
 632
 633 [h3 Options]
 634
 635 There are a [link boost_regex.ref.syntax_option_type.syntax_option_type_perl
 636 variety of flags] that may be combined with the [^perl] option when
 637 constructing the regular expression, in particular note that the
 638 [^newline_alt] option alters the syntax, while the [^collate], [^nosubs] and
 639 [^icase] options modify how the case and locale sensitivity are to be applied.
 640
 641 [h3 Pattern Modifiers]
 642
 643 The perl [^smix] modifiers can either be applied using a [^(?smix-smix)]
 644 prefix to the regular expression, or with one of the
 645 [link boost_regex.ref.syntax_option_type.syntax_option_type_perl regex-compile time
 646 flags [^no_mod_m], [^mod_x], [^mod_s], and [^no_mod_s]].
 647
 648 [h3 References]
 649
 650 [@http://perldoc.perl.org/perlre.html Perl 5.8].
 651
 652
 653 [endsect]
 654
 655