Commit | Line | Data |
---|---|---|
1 | <html> |
2 | <head> | |
3 | <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> | |
4 | <title>POSIX Basic Regular Expression Syntax</title> | |
5 | <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css"> | |
6 | <meta name="generator" content="DocBook XSL Stylesheets V1.77.1"> | |
7 | <link rel="home" href="../../index.html" title="Boost.Regex 5.1.2"> | |
8 | <link rel="up" href="../syntax.html" title="Regular Expression Syntax"> | |
9 | <link rel="prev" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax"> | |
10 | <link rel="next" href="character_classes.html" title="Character Class Names"> | |
11 | </head> | |
12 | <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> | |
13 | <table cellpadding="2" width="100%"><tr> | |
14 | <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td> | |
15 | <td align="center"><a href="../../../../../../index.html">Home</a></td> | |
16 | <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td> | |
17 | <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> | |
18 | <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> | |
19 | <td align="center"><a href="../../../../../../more/index.htm">More</a></td> | |
20 | </tr></table> | |
21 | <hr> | |
22 | <div class="spirit-nav"> | |
23 | <a accesskey="p" href="basic_extended.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="character_classes.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> | |
24 | </div> | |
25 | <div class="section"> | |
26 | <div class="titlepage"><div><div><h3 class="title"> | |
27 | <a name="boost_regex.syntax.basic_syntax"></a><a class="link" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax">POSIX Basic Regular | |
28 | Expression Syntax</a> | |
29 | </h3></div></div></div> | |
30 | <h4> | |
31 | <a name="boost_regex.syntax.basic_syntax.h0"></a> | |
32 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.synopsis"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.synopsis">Synopsis</a> | |
33 | </h4> | |
34 | <p> | |
35 | The POSIX-Basic regular expression syntax is used by the Unix utility <code class="computeroutput"><span class="identifier">sed</span></code>, and variations are used by <code class="computeroutput"><span class="identifier">grep</span></code> and <code class="computeroutput"><span class="identifier">emacs</span></code>. | |
36 | You can construct POSIX basic regular expressions in Boost.Regex by passing | |
37 | the flag <code class="computeroutput"><span class="identifier">basic</span></code> to the regex | |
38 | constructor (see <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type"><code class="computeroutput"><span class="identifier">syntax_option_type</span></code></a>), for example: | |
39 | </p> | |
40 | <pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Basic expression:</span> | |
41 | <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">basic</span><span class="special">);</span> | |
42 | <span class="comment">// e2 is a case insensitive POSIX-Basic expression:</span> | |
43 | <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">basic</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span> | |
44 | </pre> | |
45 | <a name="boost_regex.posix_basic"></a><h4> | |
46 | <a name="boost_regex.syntax.basic_syntax.h1"></a> | |
47 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.posix_basic_syntax"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.posix_basic_syntax">POSIX | |
48 | Basic Syntax</a> | |
49 | </h4> | |
50 | <p> | |
51 | In POSIX-Basic regular expressions, all characters match themselves except | |
52 | for the following special characters: | |
53 | </p> | |
54 | <pre class="programlisting">.[\*^$</pre> | |
55 | <h5> | |
56 | <a name="boost_regex.syntax.basic_syntax.h2"></a> | |
57 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.wildcard"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.wildcard">Wildcard:</a> | |
58 | </h5> | |
59 | <p> | |
60 | The single character '.' when used outside of a character set will match | |
61 | any single character except: | |
62 | </p> | |
63 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> | |
64 | <li class="listitem"> | |
65 | The NULL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code> | |
66 | is passed to the matching algorithms. | |
67 | </li> | |
68 | <li class="listitem"> | |
69 | The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code> | |
70 | is passed to the matching algorithms. | |
71 | </li> | |
72 | </ul></div> | |
73 | <h5> | |
74 | <a name="boost_regex.syntax.basic_syntax.h3"></a> | |
75 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.anchors"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.anchors">Anchors:</a> | |
76 | </h5> | |
77 | <p> | |
78 | A '^' character shall match the start of a line when used as the first character | |
79 | of an expression, or the first character of a sub-expression. | |
80 | </p> | |
81 | <p> | |
82 | A '$' character shall match the end of a line when used as the last character | |
83 | of an expression, or the last character of a sub-expression. | |
84 | </p> | |
85 | <h5> | |
86 | <a name="boost_regex.syntax.basic_syntax.h4"></a> | |
87 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.marked_sub_expressions"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.marked_sub_expressions">Marked sub-expressions:</a> | |
88 | </h5> | |
89 | <p> | |
90 | A section beginning <code class="computeroutput"><span class="special">\(</span></code> and ending | |
91 | <code class="computeroutput"><span class="special">\)</span></code> acts as a marked sub-expression. | |
92 | Whatever matched the sub-expression is split out in a separate field by the | |
93 | matching algorithms. Marked sub-expressions can also be repeated, or referred to | |
94 | by a back-reference. | |
95 | </p> | |
96 | <h5> | |
97 | <a name="boost_regex.syntax.basic_syntax.h5"></a> | |
98 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.repeats"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.repeats">Repeats:</a> | |
99 | </h5> | |
100 | <p> | |
101 | Any atom (a single character, a marked sub-expression, or a character class) | |
102 | can be repeated with the * operator. | |
103 | </p> | |
104 | <p> | |
105 | For example <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span></code> | |
106 | will match the letter a repeated zero or more times (an atom | |
107 | repeated zero times matches an empty string), so the expression <code class="computeroutput"><span class="identifier">a</span><span class="special">*</span><span class="identifier">b</span></code> | |
108 | will match any of the following: | |
109 | </p> | |
110 | <pre class="programlisting">b | |
111 | ab | |
112 | aaaaaaaab | |
113 | </pre> | |
114 | <p> | |
115 | An atom can also be repeated with a bounded repeat: | |
116 | </p> | |
117 | <p> | |
118 | <code class="computeroutput"><span class="identifier">a</span><span class="special">\{</span><span class="identifier">n</span><span class="special">\}</span></code> Matches | |
119 | 'a' repeated exactly n times. | |
120 | </p> | |
121 | <p> | |
122 | <code class="computeroutput"><span class="identifier">a</span><span class="special">\{</span><span class="identifier">n</span><span class="special">,\}</span></code> Matches | |
123 | 'a' repeated n or more times. | |
124 | </p> | |
125 | <p> | |
126 | <code class="computeroutput"><span class="identifier">a</span><span class="special">\{</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">m</span><span class="special">\}</span></code> Matches 'a' repeated between n and m times | |
127 | inclusive. | |
128 | </p> | |
129 | <p> | |
130 | For example: | |
131 | </p> | |
132 | <pre class="programlisting">^a\{2,3\}$</pre> | |
133 | <p> | |
134 | Will match either of: | |
135 | </p> | |
136 | <pre class="programlisting">aa | |
137 | aaa | |
138 | </pre> | |
139 | <p> | |
140 | But neither of: | |
141 | </p> | |
142 | <pre class="programlisting">a | |
143 | aaaa | |
144 | </pre> | |
145 | <p> | |
146 | It is an error to use a repeat operator, if the preceding construct can not | |
147 | be repeated, for example: | |
148 | </p> | |
149 | <pre class="programlisting">a\(*\)</pre> | |
150 | <p> | |
151 | Will raise an error, as there is nothing for the * operator to be applied | |
152 | to. | |
153 | </p> | |
154 | <h5> | |
155 | <a name="boost_regex.syntax.basic_syntax.h6"></a> | |
156 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.back_references"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.back_references">Back | |
157 | references:</a> | |
158 | </h5> | |
159 | <p> | |
160 | An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span> | |
161 | is in the range 1-9, matches the same string that was matched by sub-expression | |
162 | <span class="emphasis"><em>n</em></span>. For example the expression: | |
163 | </p> | |
164 | <pre class="programlisting">^\(a*\)b*\1$</pre> | |
165 | <p> | |
166 | Will match the string: | |
167 | </p> | |
168 | <pre class="programlisting">aaabbaaa</pre> | |
169 | <p> | |
170 | But not the string: | |
171 | </p> | |
172 | <pre class="programlisting">aaabba</pre> | |
173 | <h5> | |
174 | <a name="boost_regex.syntax.basic_syntax.h7"></a> | |
175 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.character_sets"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_sets">Character | |
176 | sets:</a> | |
177 | </h5> | |
178 | <p> | |
179 | A character set is a bracket-expression starting with [ and ending with ]; | |
180 | it defines a set of characters, and matches any single character that is | |
181 | a member of that set. | |
182 | </p> | |
183 | <p> | |
184 | A bracket expression may contain any combination of the following: | |
185 | </p> | |
186 | <h6> | |
187 | <a name="boost_regex.syntax.basic_syntax.h8"></a> | |
188 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.single_characters"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.single_characters">Single | |
189 | characters:</a> | |
190 | </h6> | |
191 | <p> | |
192 | For example <code class="computeroutput"><span class="special">[</span><span class="identifier">abc</span><span class="special">]</span></code>, will match any of the characters 'a', 'b', | |
193 | or 'c'. | |
194 | </p> | |
195 | <h6> | |
196 | <a name="boost_regex.syntax.basic_syntax.h9"></a> | |
197 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.character_ranges"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_ranges">Character | |
198 | ranges:</a> | |
199 | </h6> | |
200 | <p> | |
201 | For example <code class="computeroutput"><span class="special">[</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> | |
202 | will match any single character in the range 'a' to 'c'. By default, for | |
203 | POSIX-Basic regular expressions, a character <span class="emphasis"><em>x</em></span> is within | |
204 | the range <span class="emphasis"><em>y</em></span> to <span class="emphasis"><em>z</em></span>, if it collates | |
205 | within that range; this results in locale specific behavior. This behavior | |
206 | can be turned off by unsetting the <code class="computeroutput"><span class="identifier">collate</span></code> | |
207 | option flag when constructing the regular expression - in which case whether | |
208 | a character appears within a range is determined by comparing the code points | |
209 | of the characters only. | |
210 | </p> | |
211 | <h6> | |
212 | <a name="boost_regex.syntax.basic_syntax.h10"></a> | |
213 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.negation"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.negation">Negation:</a> | |
214 | </h6> | |
215 | <p> | |
216 | If the bracket-expression begins with the ^ character, then it matches the | |
217 | complement of the characters it contains, for example <code class="computeroutput"><span class="special">[^</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">]</span></code> matches any character that is not in the | |
218 | range a-c. | |
219 | </p> | |
220 | <h6> | |
221 | <a name="boost_regex.syntax.basic_syntax.h11"></a> | |
222 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.character_classes"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_classes">Character | |
223 | classes:</a> | |
224 | </h6> | |
225 | <p> | |
226 | An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code> | |
227 | matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See | |
228 | <a class="link" href="character_classes.html" title="Character Class Names">character class names</a>. | |
229 | </p> | |
230 | <h6> | |
231 | <a name="boost_regex.syntax.basic_syntax.h12"></a> | |
232 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.collating_elements"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.collating_elements">Collating | |
233 | Elements:</a> | |
234 | </h6> | |
235 | <p> | |
236 | An expression of the form <code class="computeroutput"><span class="special">[[.</span><span class="identifier">col</span><span class="special">.]]</span></code> matches | |
237 | the collating element <span class="emphasis"><em>col</em></span>. A collating element is any | |
238 | single character, or any sequence of characters that collates as a single | |
239 | unit. Collating elements may also be used as the end point of a range, for | |
240 | example: <code class="computeroutput"><span class="special">[[.</span><span class="identifier">ae</span><span class="special">.]-</span><span class="identifier">c</span><span class="special">]</span></code> | |
241 | matches the character sequence "ae", plus any single character | |
242 | in the range "ae"-c, assuming that "ae" is treated as | |
243 | a single collating element in the current locale. | |
244 | </p> | |
245 | <p> | |
246 | Collating elements may be used in place of escapes (which are not normally | |
247 | allowed inside character sets), for example <code class="computeroutput"><span class="special">[[.^.]</span><span class="identifier">abc</span><span class="special">]</span></code> would | |
248 | match any one of the characters 'a', 'b', 'c' or '^'. | |
249 | </p> | |
250 | <p> | |
251 | As an extension, a collating element may also be specified via its symbolic | |
252 | name, for example: | |
253 | </p> | |
254 | <pre class="programlisting">[[.NUL.]]</pre> | |
255 | <p> | |
256 | matches a 'NUL' character. See <a class="link" href="collating_names.html" title="Collating Names">collating | |
257 | element names</a>. | |
258 | </p> | |
259 | <h6> | |
260 | <a name="boost_regex.syntax.basic_syntax.h13"></a> | |
261 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.equivalence_classes"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.equivalence_classes">Equivalence | |
262 | classes:</a> | |
263 | </h6> | |
264 | <p> | |
265 | An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>, | |
266 | matches any character or collating element whose primary sort key is the | |
267 | same as that for collating element <span class="emphasis"><em>col</em></span>, as with collating | |
268 | elements the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">collating | |
269 | symbolic name</a>. A primary sort key is one that ignores case, accentation, | |
270 | or locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches | |
271 | any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation | |
272 | of this is reliant on the platform's collation and localisation support; | |
273 | this feature can not be relied upon to work portably across all platforms, | |
274 | or even all locales on one platform. | |
275 | </p> | |
276 | <h6> | |
277 | <a name="boost_regex.syntax.basic_syntax.h14"></a> | |
278 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.combinations"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.combinations">Combinations:</a> | |
279 | </h6> | |
280 | <p> | |
281 | All of the above can be combined in one character set declaration, for example: | |
282 | <code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]]</span></code>. | |
283 | </p> | |
284 | <h5> | |
285 | <a name="boost_regex.syntax.basic_syntax.h15"></a> | |
286 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.escapes"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.escapes">Escapes</a> | |
287 | </h5> | |
288 | <p> | |
289 | With the exception of the escape sequences \{, \}, \(, and \), which are | |
290 | documented above, an escape followed by any character matches that character. | |
291 | This can be used to make the special characters | |
292 | </p> | |
293 | <pre class="programlisting">.[\*^$</pre> | |
294 | <p> | |
295 | "ordinary". Note that the escape character loses its special meaning | |
296 | inside a character set, so <code class="computeroutput"><span class="special">[\^]</span></code> | |
297 | will match either a literal '\' or a '^'. | |
298 | </p> | |
299 | <h4> | |
300 | <a name="boost_regex.syntax.basic_syntax.h16"></a> | |
301 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.what_gets_matched"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.what_gets_matched">What | |
302 | Gets Matched</a> | |
303 | </h4> | |
304 | <p> | |
305 | When there is more than one way to match a regular expression, the "best" | |
306 | possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest | |
307 | rule</a>. | |
308 | </p> | |
309 | <h4> | |
310 | <a name="boost_regex.syntax.basic_syntax.h17"></a> | |
311 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.variations"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.variations">Variations</a> | |
312 | </h4> | |
313 | <a name="boost_regex.grep_syntax"></a><h5> | |
314 | <a name="boost_regex.syntax.basic_syntax.h18"></a> | |
315 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.grep"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.grep">Grep</a> | |
316 | </h5> | |
317 | <p> | |
318 | When an expression is compiled with the flag <code class="computeroutput"><span class="identifier">grep</span></code> | |
319 | set, then the expression is treated as a newline separated list of <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic expressions</a>, a match | |
320 | is found if any of the expressions in the list match, for example: | |
321 | </p> | |
322 | <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">grep</span><span class="special">);</span> | |
323 | </pre> | |
324 | <p> | |
325 | will match either of the <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic | |
326 | expressions</a> "abc" or "def". | |
327 | </p> | |
328 | <p> | |
329 | As its name suggests, this behavior is consistent with the Unix utility grep. | |
330 | </p> | |
331 | <h5> | |
332 | <a name="boost_regex.syntax.basic_syntax.h19"></a> | |
333 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.emacs"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.emacs">emacs</a> | |
334 | </h5> | |
335 | <p> | |
336 | In addition to the <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic features</a> | |
337 | the following characters are also special: | |
338 | </p> | |
339 | <div class="informaltable"><table class="table"> | |
340 | <colgroup> | |
341 | <col> | |
342 | <col> | |
343 | </colgroup> | |
344 | <thead><tr> | |
345 | <th> | |
346 | <p> | |
347 | Character | |
348 | </p> | |
349 | </th> | |
350 | <th> | |
351 | <p> | |
352 | Description | |
353 | </p> | |
354 | </th> | |
355 | </tr></thead> | |
356 | <tbody> | |
357 | <tr> | |
358 | <td> | |
359 | <p> | |
360 | + | |
361 | </p> | |
362 | </td> | |
363 | <td> | |
364 | <p> | |
365 | repeats the preceding atom one or more times. | |
366 | </p> | |
367 | </td> | |
368 | </tr> | |
369 | <tr> | |
370 | <td> | |
371 | <p> | |
372 | ? | |
373 | </p> | |
374 | </td> | |
375 | <td> | |
376 | <p> | |
377 | repeats the preceding atom zero or one times. | |
378 | </p> | |
379 | </td> | |
380 | </tr> | |
381 | <tr> | |
382 | <td> | |
383 | <p> | |
384 | *? | |
385 | </p> | |
386 | </td> | |
387 | <td> | |
388 | <p> | |
389 | A non-greedy version of *. | |
390 | </p> | |
391 | </td> | |
392 | </tr> | |
393 | <tr> | |
394 | <td> | |
395 | <p> | |
396 | +? | |
397 | </p> | |
398 | </td> | |
399 | <td> | |
400 | <p> | |
401 | A non-greedy version of +. | |
402 | </p> | |
403 | </td> | |
404 | </tr> | |
405 | <tr> | |
406 | <td> | |
407 | <p> | |
408 | ?? | |
409 | </p> | |
410 | </td> | |
411 | <td> | |
412 | <p> | |
413 | A non-greedy version of ?. | |
414 | </p> | |
415 | </td> | |
416 | </tr> | |
417 | </tbody> | |
418 | </table></div> | |
419 | <p> | |
420 | And the following escape sequences are also recognised: | |
421 | </p> | |
422 | <div class="informaltable"><table class="table"> | |
423 | <colgroup> | |
424 | <col> | |
425 | <col> | |
426 | </colgroup> | |
427 | <thead><tr> | |
428 | <th> | |
429 | <p> | |
430 | Escape | |
431 | </p> | |
432 | </th> | |
433 | <th> | |
434 | <p> | |
435 | Description | |
436 | </p> | |
437 | </th> | |
438 | </tr></thead> | |
439 | <tbody> | |
440 | <tr> | |
441 | <td> | |
442 | <p> | |
443 | \| | |
444 | </p> | |
445 | </td> | |
446 | <td> | |
447 | <p> | |
448 | specifies an alternative. | |
449 | </p> | |
450 | </td> | |
451 | </tr> | |
452 | <tr> | |
453 | <td> | |
454 | <p> | |
455 | \(?: ... \) | |
456 | </p> | |
457 | </td> | |
458 | <td> | |
459 | <p> | |
460 | is a non-marking grouping construct - allows you to lexically group | |
461 | something without producing an extra sub-expression. | |
462 | </p> | |
463 | </td> | |
464 | </tr> | |
465 | <tr> | |
466 | <td> | |
467 | <p> | |
468 | \w | |
469 | </p> | |
470 | </td> | |
471 | <td> | |
472 | <p> | |
473 | matches any word character. | |
474 | </p> | |
475 | </td> | |
476 | </tr> | |
477 | <tr> | |
478 | <td> | |
479 | <p> | |
480 | \W | |
481 | </p> | |
482 | </td> | |
483 | <td> | |
484 | <p> | |
485 | matches any non-word character. | |
486 | </p> | |
487 | </td> | |
488 | </tr> | |
489 | <tr> | |
490 | <td> | |
491 | <p> | |
492 | \sx | |
493 | </p> | |
494 | </td> | |
495 | <td> | |
496 | <p> | |
497 | matches any character in the syntax group x, the following emacs | |
498 | groupings are supported: 's', ' ', '_', 'w', '.', ')', '(', '"', | |
499 | '\'', '>' and '<'. Refer to the emacs docs for details. | |
500 | </p> | |
501 | </td> | |
502 | </tr> | |
503 | <tr> | |
504 | <td> | |
505 | <p> | |
506 | \Sx | |
507 | </p> | |
508 | </td> | |
509 | <td> | |
510 | <p> | |
511 | matches any character not in the syntax grouping x. | |
512 | </p> | |
513 | </td> | |
514 | </tr> | |
515 | <tr> | |
516 | <td> | |
517 | <p> | |
518 | \c and \C | |
519 | </p> | |
520 | </td> | |
521 | <td> | |
522 | <p> | |
523 | These are not supported. | |
524 | </p> | |
525 | </td> | |
526 | </tr> | |
527 | <tr> | |
528 | <td> | |
529 | <p> | |
530 | \` | |
531 | </p> | |
532 | </td> | |
533 | <td> | |
534 | <p> | |
535 | matches zero characters only at the start of a buffer (or string | |
536 | being matched). | |
537 | </p> | |
538 | </td> | |
539 | </tr> | |
540 | <tr> | |
541 | <td> | |
542 | <p> | |
543 | \' | |
544 | </p> | |
545 | </td> | |
546 | <td> | |
547 | <p> | |
548 | matches zero characters only at the end of a buffer (or string | |
549 | being matched). | |
550 | </p> | |
551 | </td> | |
552 | </tr> | |
553 | <tr> | |
554 | <td> | |
555 | <p> | |
556 | \b | |
557 | </p> | |
558 | </td> | |
559 | <td> | |
560 | <p> | |
561 | matches zero characters at a word boundary. | |
562 | </p> | |
563 | </td> | |
564 | </tr> | |
565 | <tr> | |
566 | <td> | |
567 | <p> | |
568 | \B | |
569 | </p> | |
570 | </td> | |
571 | <td> | |
572 | <p> | |
573 | matches zero characters, not at a word boundary. | |
574 | </p> | |
575 | </td> | |
576 | </tr> | |
577 | <tr> | |
578 | <td> | |
579 | <p> | |
580 | \< | |
581 | </p> | |
582 | </td> | |
583 | <td> | |
584 | <p> | |
585 | matches zero characters only at the start of a word. | |
586 | </p> | |
587 | </td> | |
588 | </tr> | |
589 | <tr> | |
590 | <td> | |
591 | <p> | |
592 | \> | |
593 | </p> | |
594 | </td> | |
595 | <td> | |
596 | <p> | |
597 | matches zero characters only at the end of a word. | |
598 | </p> | |
599 | </td> | |
600 | </tr> | |
601 | </tbody> | |
602 | </table></div> | |
603 | <p> | |
604 | Finally, you should note that emacs style regular expressions are matched | |
605 | according to the <a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">Perl | |
606 | "depth first search" rules</a>. Emacs expressions are matched | |
607 | this way because they contain Perl-like extensions that do not interact | |
608 | well with the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">POSIX-style | |
609 | leftmost-longest rule</a>. | |
610 | </p> | |
611 | <h4> | |
612 | <a name="boost_regex.syntax.basic_syntax.h20"></a> | |
613 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.options"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.options">Options</a> | |
614 | </h4> | |
615 | <p> | |
616 | There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions">variety | |
617 | of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">basic</span></code> | |
618 | and <code class="computeroutput"><span class="identifier">grep</span></code> options when constructing | |
619 | the regular expression, in particular note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code>, <code class="computeroutput"><span class="identifier">no_char_classes</span></code>, | |
620 | <code class="computeroutput"><span class="identifier">no_intervals</span></code>, <code class="computeroutput"><span class="identifier">bk_plus_qm</span></code> | |
621 | and <code class="computeroutput"><span class="identifier">bk_vbar</span></code></a> options | |
622 | all alter the syntax, while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code> and <code class="computeroutput"><span class="identifier">icase</span></code> | |
623 | options</a> modify how the case and locale sensitivity are to be applied. | |
624 | </p> | |
625 | <h4> | |
626 | <a name="boost_regex.syntax.basic_syntax.h21"></a> | |
627 | <span class="phrase"><a name="boost_regex.syntax.basic_syntax.references"></a></span><a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.references">References</a> | |
628 | </h4> | |
629 | <p> | |
630 | <a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE | |
631 | Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions | |
632 | and Headers, Section 9, Regular Expressions (FWD.1).</a> | |
633 | </p> | |
634 | <p> | |
635 | <a href="http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target="_top">IEEE | |
636 | Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and | |
637 | Utilities, Section 4, Utilities, grep (FWD.1).</a> | |
638 | </p> | |
639 | <p> | |
640 | <a href="http://www.gnu.org/software/emacs/" target="_top">Emacs Version 21.3.</a> | |
641 | </p> | |
642 | </div> | |
643 | <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> | |
644 | <td align="left"></td> | |
645 | <td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p> | |
646 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
647 | file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) | |
648 | </p> | |
649 | </div></td> | |
650 | </tr></table> | |
651 | <hr> | |
652 | <div class="spirit-nav"> | |
653 | <a accesskey="p" href="basic_extended.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="character_classes.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> | |
654 | </div> | |
655 | </body> | |
656 | </html> |