[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:standards Standards Conformance]

[h4 C++]

Boost.Regex is intended to conform to the [tr1].
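
Because the TR1 regular expression library was later adopted into C++11 as `<regex>`, code written against that interface looks the same whichever implementation backs it. The sketch below uses `std::regex` so that it builds without Boost; the helper `first_date` is hypothetical, and the Boost spellings mentioned in its comment are an assumption rather than something this section states.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the first YYYY-MM-DD date in `text`,
// or an empty string if there is none.
// Written against the TR1-style interface as standardized in <regex>.
// Assumption: with Boost.Regex the same code would include <boost/regex.hpp>
// and use boost::regex, boost::smatch and boost::regex_search instead.
std::string first_date(const std::string& text) {
    static const std::regex date_re(R"(\d{4}-\d{2}-\d{2})");
    std::smatch m;
    if (std::regex_search(text, m, date_re)) {
        return m.str();  // sub-expression 0: the whole match
    }
    return std::string();
}
```

For example, `first_date("released 2007-06-01, patched 2007-07-15")` returns `"2007-06-01"`, the leftmost match.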

[h4 ECMAScript / JavaScript]

All of the ECMAScript regular expression syntax features are supported, except that:

The escape sequence \\u matches any upper case character (the same as \[\[:upper:\]\])
rather than a Unicode escape sequence; use \\x{DDDD} for Unicode escape sequences.
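
Patterns written for a strict ECMAScript engine therefore need adjusting. As a minimal sketch, `std::regex` follows ECMAScript here and reads `\u0041` as the code point U+0041, whereas Boost.Regex, as described above, reads the same escape as an upper-case class; `[[:upper:]]` means "upper-case character" in both. The helper names are hypothetical and the results assume the default "C" locale.

```cpp
#include <regex>
#include <string>

// std::regex (ECMAScript grammar): backslash-u followed by four hex digits
// is a Unicode escape, so this pattern matches exactly U+0041, i.e. 'A'.
// Boost.Regex would instead treat the backslash-u escape as an
// "any upper-case character" class.
bool is_letter_a(const std::string& s) {
    static const std::regex re("\\u0041");
    return std::regex_match(s, re);
}

// The portable way to ask for an upper-case character in either library.
bool has_upper(const std::string& s) {
    static const std::regex re("[[:upper:]]");
    return std::regex_search(s, re);
}
```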

[h4 Perl]

Almost all Perl features are supported, except for:

`(?{code})`: not implementable in a compiled, strongly typed language.

`(??{code})`: not implementable in a compiled, strongly typed language.

`(*VERB)`: the [@http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs
backtracking control verbs] are not recognised or implemented at this time.

In addition, the following features behave slightly differently from Perl:

`^`, `$` and `\Z`: these recognise any line termination sequence, not just `\n`; see the Unicode requirements below.

[h4 POSIX]

All the POSIX basic and extended regular expression features are supported,
except that:

No character collating names are recognized except those specified in the
POSIX standard for the C locale, unless they are explicitly registered with the
traits class.

Character equivalence classes (\[\[\=a\=\]\] etc.) may not work correctly on platforms other than Win32.
Implementing this feature requires knowledge of the format of the string sort
keys produced by the system; if you need this, and the default implementation
doesn't work on your platform, then you will need to supply a custom traits class.
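
One way to see which names a traits class knows is to try them. This minimal sketch uses `std::regex` as a stand-in, whose default traits also resolve the POSIX C-locale names such as `space`; the helper names are hypothetical, and how Boost.Regex reports an unknown name is not covered by this section.

```cpp
#include <regex>
#include <string>

// A bracket expression containing the collating element named "space"
// from the POSIX portable character set, i.e. the character ' '.
bool is_space_char(const std::string& s) {
    static const std::regex re("[[.space.]]", std::regex::extended);
    return std::regex_match(s, re);
}

// With std::regex, a name the traits class does not know is rejected
// when the expression is constructed, with error code error_collate.
bool rejects_unknown_name() {
    try {
        std::regex re("[[.no-such-name.]]", std::regex::extended);
        (void)re;  // construction was expected to throw
    } catch (const std::regex_error& e) {
        return e.code() == std::regex_constants::error_collate;
    }
    return false;
}
```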

[h4 Unicode]

The following comments refer to
[@http://unicode.org/reports/tr18/ Unicode Technical Standard #18: Unicode
Regular Expressions version 11].

[table
[[Item][Feature][Support]]
[[1.1][Hex Notation][Yes: use \x{DDDD} to refer to code point U+DDDD.]]
[[1.2][Character Properties][All the names listed under the General Category Property are supported. Script names and Other Names are not currently supported.]]
[[1.3][Subtraction and Intersection][Indirectly supported via forward lookahead:

`(?=[[:X:]])[[:Y:]]`

Gives the intersection of character properties X and Y.

`(?![[:X:]])[[:Y:]]`

Gives everything in Y that is not in X (subtraction).]]
[[1.4][Simple Word Boundaries][Conforming: non-spacing marks are included in the set of word characters.]]
[[1.5][Caseless Matching][Supported; note that at this level case transformations are 1:1, so many-to-many case folding operations (for example "'''ß'''" to "SS") are not supported.]]
[[1.6][Line Boundaries][Supported, except that "." matches only one character of "\\r\\n". Other than that, line boundaries match correctly, including not matching in the middle of a "\\r\\n" sequence.]]
[[1.7][Code Points][Supported: provided you use the u32* algorithms, UTF-8, UTF-16 and UTF-32 are all treated as sequences of 32-bit code points.]]
[[2.1][Canonical Equivalence][Not supported: it is up to the user of the library to convert all text into the same canonical form as the regular expression.]]
[[2.2][Default Grapheme Clusters][Not supported.]]
[[2.3][Default Word Boundaries][Not supported.]]
[[2.4][Default Loose Matches][Not supported.]]
[[2.5][Named Properties][Supported: the expression "\[\[:name:\]\]" or \\N{name} matches the named character "name".]]
[[2.6][Wildcard Properties][Not supported.]]
[[3.1][Tailored Punctuation][Not supported.]]
[[3.2][Tailored Grapheme Clusters][Not supported.]]
[[3.3][Tailored Word Boundaries][Not supported.]]
[[3.4][Tailored Loose Matches][Partial support: \[\[\=c\=\]\] matches characters with the same primary equivalence class as "c".]]
[[3.5][Tailored Ranges][Supported: \[a-b\] matches any character that collates in the range a to b, when the expression is constructed with the `collate` flag set.]]
[[3.6][Context Matches][Not supported.]]
[[3.7][Incremental Matches][Supported: pass the flag `match_partial` to the regex algorithms.]]
[[3.8][Unicode Set Sharing][Not supported.]]
[[3.9][Possible Match Sets][Not supported; however, this information is used internally to optimise the matching of regular expressions, and to return quickly if no match is possible.]]
[[3.10][Folded Matching][Partial support: it is possible to achieve a similar effect by using a custom regular expression traits class.]]
[[3.11][Custom Submatch Evaluation][Not supported.]]
]
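
The lookahead idiom from row 1.3 can be tried out directly. This minimal sketch uses `std::regex` as a stand-in, since its ECMAScript grammar also accepts `(?=...)`, `(?!...)` and `[[:class:]]` bracket expressions, and it picks concrete POSIX classes for X and Y; the helper names are hypothetical and class membership assumes the default "C" locale.

```cpp
#include <regex>
#include <string>

// Intersection: one character that is in both [[:upper:]] and [[:xdigit:]],
// i.e. one of A-F in the default "C" locale.
bool is_upper_hex_letter(const std::string& s) {
    static const std::regex re("(?=[[:upper:]])[[:xdigit:]]");
    return std::regex_match(s, re);
}

// Subtraction: one character that is in [[:alnum:]] but not in [[:digit:]].
bool is_alnum_but_not_digit(const std::string& s) {
    static const std::regex re("(?![[:digit:]])[[:alnum:]]");
    return std::regex_match(s, re);
}
```

Reading each pattern left to right: the lookahead tests the next character against X without consuming it, and the bracket expression then consumes that same character only if it is also in Y.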

[endsect]