[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer Supported Regular Expressions]

[table Regular expressions support
    [[Expression]       [Meaning]]
    [[`x`]              [Match the character `x`]]
    [[`.`]              [Match any character except newline (or optionally *any* character)]]
    [[`"..."`]          [All characters between the double quotes are taken as literals, except escape sequences]]
    [[`[xyz]`]          [A character class; in this case matches `x`, `y` or `z`]]
    [[`[abj-oZ]`]       [A character class with a range in it; matches `a`, `b`,
                         any letter from `j` through `o`, or `Z`]]
    [[`[^A-Z]`]         [A negated character class, i.e. any character except those
                         in the class. In this case, any character except an
                         uppercase letter]]
    [[`r*`]             [Zero or more r's (greedy), where r is any regular expression]]
    [[`r*?`]            [Zero or more r's (abstemious, i.e. non-greedy), where r is any regular expression]]
    [[`r+`]             [One or more r's (greedy)]]
    [[`r+?`]            [One or more r's (abstemious)]]
    [[`r?`]             [Zero or one r (greedy), i.e. an optional r]]
    [[`r??`]            [Zero or one r (abstemious), i.e. an optional r]]
    [[`r{2,5}`]         [Between two and five r's, inclusive (greedy)]]
    [[`r{2,5}?`]        [Between two and five r's, inclusive (abstemious)]]
    [[`r{2,}`]          [Two or more r's (greedy)]]
    [[`r{2,}?`]         [Two or more r's (abstemious)]]
    [[`r{4}`]           [Exactly four r's]]
    [[`{NAME}`]         [The macro `NAME` (see below)]]
    [[`"[xyz]\"foo"`]   [The literal string `[xyz]"foo`]]
    [[`\X`]             [If `X` is `a`, `b`, `e`, `n`, `r`, `f`, `t` or `v`, the
                         ANSI-C interpretation of `\X`. Otherwise, unless `\X` is
                         one of the escapes listed below, a literal `X` (used to
                         escape operators such as `*`)]]
    [[`\0`]             [A NUL character (ASCII code 0)]]
    [[`\123`]           [The character with octal value 123]]
    [[`\x2a`]           [The character with hexadecimal value 2a]]
    [[`\cX`]            [The control character Ctrl-`X`]]
    [[`\a`]             [A shortcut for alert (bell) `0x07`]]
    [[`\b`]             [A shortcut for backspace `0x08`]]
    [[`\e`]             [A shortcut for ESC (escape character `0x1b`)]]
    [[`\n`]             [A shortcut for newline `0x0a`]]
    [[`\r`]             [A shortcut for carriage return `0x0d`]]
    [[`\f`]             [A shortcut for form feed `0x0c`]]
    [[`\t`]             [A shortcut for horizontal tab `0x09`]]
    [[`\v`]             [A shortcut for vertical tab `0x0b`]]
    [[`\d`]             [A shortcut for `[0-9]`]]
    [[`\D`]             [A shortcut for `[^0-9]`]]
    [[`\s`]             [A shortcut for `[\x20\t\n\r\f\v]`]]
    [[`\S`]             [A shortcut for `[^\x20\t\n\r\f\v]`]]
    [[`\w`]             [A shortcut for `[a-zA-Z0-9_]`]]
    [[`\W`]             [A shortcut for `[^a-zA-Z0-9_]`]]
    [[`(r)`]            [Match an `r`; parentheses are used to override precedence
                         (see below)]]
    [[`(?r-s:pattern)`] [Apply option `r` and omit option `s` while interpreting
                         `pattern`. Options may be zero or more of the characters
                         `i` or `s`. `i` means case-insensitive; `-i` means
                         case-sensitive. `s` makes `.` match any single character
                         whatsoever; `-s` makes `.` match any character except `\n`.]]
    [[`rs`]             [The regular expression `r` followed by the regular
                         expression `s` (a sequence)]]
    [[`r|s`]            [Either an `r` or an `s`]]
    [[`^r`]             [An `r`, but only at the beginning of a line (i.e. when just
                         starting to scan, or right after a newline has been
                         scanned)]]
    [[`r$`]             [An `r`, but only at the end of a line (i.e. just before a
                         newline)]]
]
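The single-character escapes in the table can be summarized as a small decoder. The sketch below is hypothetical: `decode_escape` and `hex_value` are not Spirit.Lex or lexertl functions. It also makes assumptions the table does not state: `\x` takes at most two hex digits, octal escapes take at most three digits, and `\cX` yields `X` with all but its low five bits cleared (so `\cA` is `0x01`).

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Value of a hexadecimal digit, or -1 if `h` is not one.
int hex_value(char h) {
    if (h >= '0' && h <= '9') return h - '0';
    if (h >= 'a' && h <= 'f') return h - 'a' + 10;
    if (h >= 'A' && h <= 'F') return h - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes the escape sequence whose backslash is at
// s[i] into a single character, following the table above. On return, `i`
// indexes the first character after the escape. Class shortcuts such as
// \d or \s stand for sets, not single characters, so they yield nullopt,
// as does a malformed \c or \x escape.
std::optional<char> decode_escape(const std::string& s, std::size_t& i) {
    ++i;  // skip the backslash
    if (i >= s.size()) return std::nullopt;
    const char c = s[i++];
    switch (c) {
        case 'a': return '\a';
        case 'b': return '\b';
        case 'e': return '\x1b';
        case 'n': return '\n';
        case 'r': return '\r';
        case 'f': return '\f';
        case 't': return '\t';
        case 'v': return '\v';
        case 'd': case 'D': case 's': case 'S': case 'w': case 'W':
            return std::nullopt;  // character class shortcuts
        case 'c':  // assumption: Ctrl-X keeps only the low 5 bits of X
            if (i >= s.size()) return std::nullopt;
            return static_cast<char>(s[i++] & 0x1f);
        case 'x': {  // assumption: at most two hex digits
            int value = 0, digits = 0;
            while (digits < 2 && i < s.size() && hex_value(s[i]) >= 0) {
                value = value * 16 + hex_value(s[i++]);
                ++digits;
            }
            if (digits == 0) return std::nullopt;
            return static_cast<char>(value);
        }
        default:
            break;
    }
    if (c >= '0' && c <= '7') {  // \0 and octal escapes; assumption: at most three digits
        int value = c - '0', digits = 1;
        while (digits < 3 && i < s.size() && s[i] >= '0' && s[i] <= '7') {
            value = value * 8 + (s[i++] - '0');
            ++digits;
        }
        return static_cast<char>(value);
    }
    return c;  // any other character is taken literally, e.g. \* or \"
}
```

For example, `\x2a` and `\52` both decode to `*`, while `\d` yields no single character because it denotes the class `[0-9]`.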

[note POSIX character classes are not currently supported because of performance
issues when creating them in wide character mode.]
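One way to work around the missing POSIX classes is to spell them out as explicit bracket expressions built from the syntax in the table. The helper below is hypothetical (not part of Spirit.Lex) and assumes ASCII input in the C locale; for instance `[:digit:]` becomes `[0-9]`.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of Spirit.Lex): maps a POSIX class name to
// an equivalent explicit character class, assuming ASCII / the C locale.
// Returns an empty string for names it does not know.
std::string posix_class_equivalent(const std::string& name) {
    static const std::map<std::string, std::string> table = {
        {"alpha",  "[a-zA-Z]"},
        {"digit",  "[0-9]"},
        {"alnum",  "[a-zA-Z0-9]"},
        {"upper",  "[A-Z]"},
        {"lower",  "[a-z]"},
        {"xdigit", "[0-9A-Fa-f]"},
        {"space",  "[\\x20\\t\\n\\r\\f\\v]"},  // same set as \s
        {"blank",  "[\\x20\\t]"},              // space and horizontal tab
    };
    const auto it = table.find(name);
    return it != table.end() ? it->second : std::string();
}
```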

[tip If you want to build tokens for syntaxes that recognize items like quotes
(`"'"`, `'"'`) and backslash (`\`), here is some example syntax to get you
started. The lesson here is that both C++ string literals and regular
expressions use `\` as an escape character, so the escaping cascades: every
backslash the regular expression needs must itself be written as `\\` in C++.
``
    quote1 = "'";                       // regex '    : match single "'"
    quote2 = "\\\"";                    // regex \"   : match single '"'
    literal_quote1 = "\\\\'";           // regex \\'  : match backslash followed by single "'"
    literal_quote2 = "\\\\\\\"";        // regex \\\" : match backslash followed by single '"'
    literal_backslash = "\\\\\\\\";     // regex \\\\ : match two backslashes
``
]
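Rather than counting backslashes by hand, the regular-expression level of escaping can be done in code, leaving only the C++ level to write. The helper below is hypothetical (not a Spirit.Lex function); it puts a backslash in front of every character that acts as an operator in the syntax described in the table, assuming that set is `\ " . [ ] ^ $ * + ? { } ( ) |`.

```cpp
#include <string>

// Hypothetical helper (not a Spirit.Lex function): returns a regular
// expression that matches `text` literally, by putting a backslash in front
// of every character that is an operator in the syntax described above.
std::string regex_escape_literal(const std::string& text) {
    static const std::string operators = "\\\".[]^$*+?{}()|";
    std::string pattern;
    for (char c : text) {
        if (operators.find(c) != std::string::npos) pattern += '\\';
        pattern += c;
    }
    return pattern;
}
```

For example, `regex_escape_literal("\\\"")` (the two characters backslash and double quote) returns the four characters `\\\"`, which is exactly the pattern that `literal_quote2` spells as the C++ literal `"\\\\\\\""`.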

[heading Regular Expression Precedence]

* `r*` has the highest precedence (`+`, `?` and `{n,m}`, including their abstemious forms, have the same precedence as `*`)
* `rs` (a sequence) has the next highest, so `ab*` means `a(b*)`
* `r|s` has the lowest precedence, so `ab|cd` means `(ab)|(cd)`
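For comparison, `std::regex` with its default ECMAScript grammar uses the conventional ordering in which repetition binds tighter than sequence, and sequence tighter than alternation. The sketch below uses it only to illustrate how such groupings behave; it is not the lexertl engine that Spirit.Lex is built on, and `full_match` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Illustration only: std::regex (ECMAScript grammar) is not the engine
// Spirit.Lex uses. Returns true if `pattern` matches the whole of `text`.
bool full_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

Here `ab*` accepts `abbb` but not `abab` (that needs `(ab)*`), and `ab|cd` accepts `cd` but not `abd`, which it would only do if `|` bound tighter than sequence.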

[heading Macros]

Regular expressions can be given a name and referred to in rules using the
syntax `{NAME}`, where `NAME` is the name you have given to the macro. A macro
name can be at most 30 characters long and must start with a `_` or a letter.
Subsequent characters can be `_`, `-`, a letter or a decimal digit.
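The naming rules can be expressed as a small check. `is_valid_macro_name` is a hypothetical helper, not part of Spirit.Lex, and it assumes that "letter" means an ASCII letter.

```cpp
#include <cstddef>
#include <string>

// ASCII letter test, independent of the current locale.
bool is_ascii_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Hypothetical helper mirroring the macro naming rules described above:
// 1 to 30 characters, starting with '_' or a letter, followed by any mix
// of '_', '-', letters and decimal digits.
bool is_valid_macro_name(const std::string& name) {
    if (name.empty() || name.size() > 30) return false;
    if (name[0] != '_' && !is_ascii_letter(name[0])) return false;
    for (std::size_t i = 1; i < name.size(); ++i) {
        const char c = name[i];
        const bool ok = c == '_' || c == '-' || is_ascii_letter(c) ||
                        (c >= '0' && c <= '9');
        if (!ok) return false;
    }
    return true;
}
```

A name that passes this check, such as a hypothetical `DIGIT`, would then be referenced in a rule as `{DIGIT}`, e.g. `{DIGIT}+`.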

[endsect]