[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:basic_syntax POSIX Basic Regular Expression Syntax]

[h3 Synopsis]

The POSIX-Basic regular expression syntax is used by the Unix utility `sed`,
and variations are used by `grep` and `emacs`. You can construct POSIX
basic regular expressions in Boost.Regex by passing the flag `basic` to the
regex constructor (see [syntax_option_type]), for example:

   // e1 is a case-sensitive POSIX-Basic expression:
   boost::regex e1(my_expression, boost::regex::basic);
   // e2 is a case-insensitive POSIX-Basic expression:
   boost::regex e2(my_expression, boost::regex::basic|boost::regex::icase);

[#boost_regex.posix_basic][h3 POSIX Basic Syntax]

In POSIX-Basic regular expressions, all characters match themselves except
for the following special characters:

[pre .\[\\*^$]

[h4 Wildcard:]

The single character '.' when used outside of a character set will match any
single character except:

* The NULL character when the flag `match_not_dot_null` is passed to the
matching algorithms.
* The newline character when the flag `match_not_dot_newline` is passed to
the matching algorithms.

[h4 Anchors:]

A '^' character shall match the start of a line when used as the first
character of an expression, or the first character of a sub-expression.

A '$' character shall match the end of a line when used as the last
character of an expression, or the last character of a sub-expression.

[h4 Marked sub-expressions:]

A section beginning `\(` and ending `\)` acts as a marked sub-expression.
Whatever matched the sub-expression is split out in a separate field by the
matching algorithms. Marked sub-expressions can also be repeated, or
referred to by a back-reference.

[h4 Repeats:]

Any atom (a single character, a marked sub-expression, or a character class)
can be repeated with the \* operator.

For example `a*` will match the letter a repeated zero or more
times (an atom repeated zero times matches an empty string), so the
expression `a*b` will match any of the following:

[pre
b
ab
aaaaaaaab
]

An atom can also be repeated with a bounded repeat:

`a\{n\}` Matches 'a' repeated exactly n times.

`a\{n,\}` Matches 'a' repeated n or more times.

`a\{n, m\}` Matches 'a' repeated between n and m times inclusive.

For example:

[pre ^a\{2,3\}$]

Will match either of:

[pre
aa
aaa
]

But neither of:

[pre
a
aaaa
]

It is an error to use a repeat operator if the preceding construct cannot be
repeated, for example:

[pre a\(*\)]

Will raise an error, as there is nothing for the \* operator to be applied to.

[h4 Back references:]

An escape character followed by a digit /n/, where /n/ is in the range 1-9,
matches the same string that was matched by sub-expression /n/. For example
the expression:

[pre ^\\(a\*\\)\[^a\]\*\\1$]

Will match the string:

[pre aaabbaaa]

But not the string:

[pre aaabba]

[h4 Character sets:]

A character set is a bracket-expression starting with \[ and ending with \].
It defines a set of characters and matches any single character that is a
member of that set.

A bracket expression may contain any combination of the following:

[h5 Single characters:]

For example `[abc]` will match any of the characters 'a', 'b', or 'c'.

[h5 Character ranges:]

For example `[a-c]` will match any single character in the range 'a' to 'c'.
By default, for POSIX-Basic regular expressions, a character /x/ is within the
range /y/ to /z/ if it collates within that range; this results in
locale-specific behavior. This behavior can be turned off by unsetting
the `collate` option flag when constructing the regular expression,
in which case whether a character appears within
a range is determined by comparing the code points of the characters only.

[h5 Negation:]

If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example `[^a-c]` matches
any character that is not in the range a-c.

[h5 Character classes:]

An expression of the form `[[:name:]]` matches the named character class "name",
for example `[[:lower:]]` matches any lower case character.
See [link boost_regex.syntax.character_classes character class names].

[h5 Collating Elements:]

An expression of the form `[[.col.]]` matches the collating element /col/.
A collating element is any single character, or any sequence of
characters that collates as a single unit. Collating elements may also
be used as the end point of a range, for example: `[[.ae.]-c]` matches
the character sequence "ae", plus any single character in the range "ae"-c,
assuming that "ae" is treated as a single collating element in the current locale.

Collating elements may be used in place of escapes (which are not
normally allowed inside character sets), for example `[[.^.]abc]` would
match any one of the characters 'a', 'b', 'c' or '^'.

As an extension, a collating element may also be specified via its
symbolic name, for example:

[pre \[\[\.NUL\.\]\]]

matches a 'NUL' character.
See [link boost_regex.syntax.collating_names collating element names].

[h5 Equivalence classes:]

An expression of the form `[[=col=]]` matches any character or collating
element whose primary sort key is the same as that for collating element
/col/. As with collating elements, the name /col/ may be a
[link boost_regex.syntax.collating_names collating symbolic name].
A primary sort key is one that ignores case, accentuation, or
locale-specific tailorings; so for example `[[=a=]]` matches any of
the characters: a, '''À''', '''Á''', '''Â''',
'''Ã''', '''Ä''', '''Å''', A, '''à''', '''á''',
'''â''', '''ã''', '''ä''' and '''å'''.
Unfortunately, the implementation of this feature relies on the platform's
collation and localisation support; it cannot be relied
upon to work portably across all platforms, or even all locales on one platform.

[h5 Combinations:]

All of the above can be combined in one character set declaration, for
example: `[[:digit:]a-c[.NUL.]]`.

[h4 Escapes]

With the exception of the escape sequences \\{, \\}, \\(, and \\),
which are documented above, an escape followed by any character matches
that character. This can be used to make the special characters

[pre .\[\\\*^$]

"ordinary". Note that the escape character loses its special meaning
inside a character set, so `[\^]` will match either a literal '\\' or a '^'.

[h3 What Gets Matched]

When there is more than one way to match a regular expression, the
"best" possible match is obtained using the
[link boost_regex.syntax.leftmost_longest_rule leftmost-longest rule].

[h3 Variations]

[#boost_regex.grep_syntax][h4 Grep]

When an expression is compiled with the flag `grep` set, then the
expression is treated as a newline separated list of
[link boost_regex.posix_basic POSIX-Basic expressions];
a match is found if any of the expressions in the list match, for example:

   boost::regex e("abc\ndef", boost::regex::grep);

will match either of the [link boost_regex.posix_basic POSIX-Basic expressions]
"abc" or "def".

As its name suggests, this behavior is consistent with the Unix utility grep.

[h4 emacs]

When an expression is compiled with the flag `emacs` set, then in addition
to the [link boost_regex.posix_basic POSIX-Basic features]
the following characters are also special:

[table
[[Character][Description]]
[[+][repeats the preceding atom one or more times.]]
[[?][repeats the preceding atom zero or one times.]]
[[*?][A non-greedy version of *.]]
[[+?][A non-greedy version of +.]]
[[??][A non-greedy version of ?.]]
]

And the following escape sequences are also recognised:

[table
[[Escape][Description]]
[[\\|][specifies an alternative.]]
[[\\(?: ... \\)][is a non-marking grouping construct; it allows you to lexically group something without producing an extra sub-expression.]]
[[\\w][matches any word character.]]
[[\\W][matches any non-word character.]]
[[\\sx][matches any character in the syntax group x, the following
emacs groupings are supported: 's', ' ', '_', 'w', '.', ')', '(', '"', '\\'', '>' and '<'. Refer to the emacs docs for details.]]
[[\\Sx][matches any character not in the syntax grouping x.]]
[[\\c and \\C][These are not supported.]]
[[\\`][matches zero characters only at the start of a buffer (or string being matched).]]
[[\\'][matches zero characters only at the end of a buffer (or string being matched).]]
[[\\b][matches zero characters at a word boundary.]]
[[\\B][matches zero characters, not at a word boundary.]]
[[\\<][matches zero characters only at the start of a word.]]
[[\\>][matches zero characters only at the end of a word.]]
]

Finally, you should note that emacs style regular expressions are matched
according to the
[link boost_regex.syntax.perl_syntax.what_gets_matched Perl "depth first search" rules].
Emacs expressions are
matched this way because they contain Perl-like extensions that do not
interact well with the
[link boost_regex.syntax.leftmost_longest_rule POSIX-style leftmost-longest rule].

[h3 Options]

There are a [link boost_regex.ref.syntax_option_type.syntax_option_type_basic variety of flags] that may be combined with the `basic` and `grep`
options when constructing the regular expression. In particular, note
that the
[link boost_regex.ref.syntax_option_type.syntax_option_type_basic `newline_alt`, `no_char_classes`, `no_intervals`, `bk_plus_qm`
and `bk_vbar`] options all alter the syntax, while the
[link boost_regex.ref.syntax_option_type.syntax_option_type_basic `collate` and `icase` options] modify how case and locale sensitivity
are applied.

[h3 References]

[@http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1).]

[@http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and Utilities, Section 4, Utilities, grep (FWD.1).]

[@http://www.gnu.org/software/emacs/ Emacs Version 21.3.]

[endsect]