[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer Supported Regular Expressions]

[table Regular expressions support
    [[Expression]       [Meaning]]
    [[`x`]              [Match the character `x`]]
    [[`.`]              [Match any character except newline (or optionally *any* character)]]
    [[`"..."`]          [All characters between the double quotes are taken as literals, except escape sequences]]
    [[`[xyz]`]          [A character class; in this case matches `x`, `y` or `z`]]
    [[`[abj-oZ]`]       [A character class with a range in it; matches `a`, `b`, any
                         letter from `j` through `o`, or `Z`]]
    [[`[^A-Z]`]         [A negated character class, i.e. any character except those in
                         the class. In this case, any character except an uppercase
                         letter]]
    [[`r*`]             [Zero or more r's (greedy), where r is any regular expression]]
    [[`r*?`]            [Zero or more r's (abstemious, i.e. non-greedy: as few as possible), where r is any regular expression]]
    [[`r+`]             [One or more r's (greedy)]]
    [[`r+?`]            [One or more r's (abstemious)]]
    [[`r?`]             [Zero or one r (greedy), i.e. optional]]
    [[`r??`]            [Zero or one r (abstemious), i.e. optional]]
    [[`r{2,5}`]         [Anywhere between two and five r's (greedy)]]
    [[`r{2,5}?`]        [Anywhere between two and five r's (abstemious)]]
    [[`r{2,}`]          [Two or more r's (greedy)]]
    [[`r{2,}?`]         [Two or more r's (abstemious)]]
    [[`r{4}`]           [Exactly four r's]]
    [[`{NAME}`]         [The macro `NAME` (see below)]]
    [[`"[xyz]\"foo"`]   [The literal string `[xyz]"foo`]]
    [[`\X`]             [If `X` is one of `a`, `b`, `e`, `n`, `r`, `f`, `t` or `v`, the
                         ANSI-C interpretation of `\X`. Otherwise a literal `X`
                         (used to escape operators such as `*`)]]
    [[`\0`]             [A NUL character (ASCII code 0)]]
    [[`\123`]           [The character with octal value 123]]
    [[`\x2a`]           [The character with hexadecimal value 2a]]
    [[`\cX`]            [The control character Ctrl-`X`]]
    [[`\a`]             [A shortcut for alert (bell)]]
    [[`\b`]             [A shortcut for backspace]]
    [[`\e`]             [A shortcut for ESC (escape character `0x1b`)]]
    [[`\n`]             [A shortcut for newline]]
    [[`\r`]             [A shortcut for carriage return]]
    [[`\f`]             [A shortcut for form feed `0x0c`]]
    [[`\t`]             [A shortcut for horizontal tab `0x09`]]
    [[`\v`]             [A shortcut for vertical tab `0x0b`]]
    [[`\d`]             [A shortcut for `[0-9]`]]
    [[`\D`]             [A shortcut for `[^0-9]`]]
    [[`\s`]             [A shortcut for `[\x20\t\n\r\f\v]`]]
    [[`\S`]             [A shortcut for `[^\x20\t\n\r\f\v]`]]
    [[`\w`]             [A shortcut for `[a-zA-Z0-9_]`]]
    [[`\W`]             [A shortcut for `[^a-zA-Z0-9_]`]]
    [[`(r)`]            [Match an `r`; parentheses are used to override precedence
                         (see below)]]
    [[`(?r-s:pattern)`] [Apply option `r` and omit option `s` while interpreting `pattern`.
                         Options may be zero or more of the characters `i` or `s`.
                         `i` means case-insensitive; `-i` means case-sensitive.
                         `s` makes `.` match any single character whatsoever;
                         `-s` makes `.` match any character except `\n`.]]
    [[`rs`]             [The regular expression `r` followed by the regular
                         expression `s` (a sequence)]]
    [[`r|s`]            [Either an `r` or an `s`]]
    [[`^r`]             [An `r` but only at the beginning of a line (i.e. when just
                         starting to scan, or right after a newline has been
                         scanned)]]
    [[`r$`]             [An `r` but only at the end of a line (i.e. just before a
                         newline)]]
]
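To make the escape rules in the table concrete, here is a minimal C++ sketch of how a single escape sequence could be decoded into the character it denotes. `decode_escape` and `hex_digit_value` are hypothetical helpers, not part of Spirit.Lex or lexertl. The sketch assumes at most three octal digits and at most two hexadecimal digits per escape, and it leaves out the class shortcuts such as `\d`, `\s` and `\w` as well as `\cX`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of a hexadecimal digit, or -1 if `c` is not one.
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not the lexertl implementation): decodes the escape
// sequence at the start of `esc`, which must begin with a backslash, and
// returns the character it denotes. `consumed` receives the number of input
// characters used. Assumes at most 3 octal digits and at most 2 hex digits.
char decode_escape(const std::string& esc, std::size_t& consumed)
{
    assert(esc.size() >= 2 && esc.at(0) == '\\');
    const char x = esc.at(1);
    consumed = 2;  // the backslash plus `x`, unless digits follow

    switch (x)
    {
    case 'a': return '\a';    // alert (bell)
    case 'b': return '\b';    // backspace
    case 'e': return '\x1b';  // ESC
    case 'n': return '\n';    // newline
    case 'r': return '\r';    // carriage return
    case 'f': return '\f';    // form feed
    case 't': return '\t';    // horizontal tab
    case 'v': return '\v';    // vertical tab
    case 'x':                 // \x2a: hexadecimal value
    {
        int value = 0;
        int digits = 0;
        while (digits < 2 && consumed < esc.size() &&
               hex_digit_value(esc.at(consumed)) >= 0)
        {
            value = value * 16 + hex_digit_value(esc.at(consumed));
            ++consumed;
            ++digits;
        }
        return digits > 0 ? static_cast<char>(value) : 'x';
    }
    default:
        break;
    }

    if (x >= '0' && x <= '7')  // \0 and \123: octal value
    {
        int value = 0;
        consumed = 1;          // restart just after the backslash
        while (consumed < 4 && consumed < esc.size() &&
               esc.at(consumed) >= '0' && esc.at(consumed) <= '7')
        {
            value = value * 8 + (esc.at(consumed) - '0');
            ++consumed;
        }
        return static_cast<char>(value);
    }

    return x;  // any other X is a literal X, e.g. "\*" yields '*'
}
```

For example, decoding `\x2a` returns `'*'` after consuming four characters, and decoding `\*` also returns a plain `'*'`, which is exactly how a backslash turns an operator into a literal.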

[note POSIX character classes are not currently supported, due to performance issues
when creating them in wide character mode.]
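The difference between the greedy and the abstemious repetitions in the table is easiest to see on C-style comments. The sketch below uses `std::regex` (default ECMAScript grammar) as a stand-in because it supports the same `*` and `*?` notation; it is an analogy only, since a Spirit.Lex lexer built on lexertl compiles its own state machine instead of calling `std::regex`. `first_match` and `check_greedy_vs_abstemious` are hypothetical helpers.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the leftmost match of `pattern` in `text`, or an empty string if
// there is none. std::regex uses the ECMAScript grammar by default.
std::string first_match(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}

// Greedy repetition takes as much input as possible; abstemious repetition
// stops at the first point where the rest of the pattern can match.
void check_greedy_vs_abstemious()
{
    const std::string input = "/* a */ x /* b */";
    assert(first_match(input, R"(/\*.*\*/)") == input);       // greedy: runs to the last "*/"
    assert(first_match(input, R"(/\*.*?\*/)") == "/* a */");  // abstemious: stops at the first "*/"
    assert(first_match("aaaaaa", "a{2,5}") == "aaaaa");       // greedy: five a's
    assert(first_match("aaaaaa", "a{2,5}?") == "aa");         // abstemious: two a's
}
```

The abstemious form is what keeps a comment token from swallowing everything up to the last comment terminator in the input.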

[tip If you want to build tokens for syntaxes that recognize items like quotes
(`"'"`, `'"'`) and backslash (`\`), here is example syntax to get you started.
The lesson here is that both C++ string literals and regular expressions
require escaping with `\` for some constructs, and the two layers compound:
every backslash the regular expression needs must itself be written as `\\`
in the C++ source.
``
quote1 = "'";                   // regex '     matches a single "'"
quote2 = "\\\"";                // regex \"    matches a single '"'
literal_quote1 = "\\\\'";       // regex \\'   matches a backslash followed by "'"
literal_quote2 = "\\\\\\\"";    // regex \\\"  matches a backslash followed by '"'
literal_backslash = "\\\\\\\\"; // regex \\\\  matches two backslashes
``
]
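The escaping cascade described in the tip can be checked in plain C++. Each assertion below compares an ordinary string literal from the tip with a C++11 raw string literal (`R"(...)"`), which is not subject to C++ escape processing and therefore spells out exactly the characters the regular expression engine receives. `check_escaping` is a hypothetical helper; writing token definitions as raw string literals is one way to drop the C++ layer of escaping, offered here as a suggestion rather than something the tip itself uses.

```cpp
#include <cassert>
#include <string>

// Compares an ordinary C++ string literal (as written in the tip) with a raw
// string literal spelling out the characters the regex engine receives.
void check_escaping()
{
    // quote2: the C++ literal becomes the regex \" (matches a single '"')
    assert(std::string("\\\"") == R"(\")");
    // literal_quote2: the C++ literal becomes the regex \\\" (backslash, then '"')
    assert(std::string("\\\\\\\"") == R"(\\\")");
    // literal_backslash: the C++ literal becomes the regex \\\\ (two backslashes)
    assert(std::string("\\\\\\\\") == R"(\\\\)");
}
```

Only after this first layer is removed does the regular expression apply its own escapes, so four backslashes in the regex text still match just two backslashes in the input.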

[heading Regular Expression Precedence]

* `r*` has the highest precedence (`+`, `?` and `{n,m}` have the same precedence as `*`)
* `rs` (a sequence) has the next highest precedence
* `r|s` has the lowest precedence
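The grouping rules can be illustrated with `std::regex`, whose default ECMAScript grammar uses the conventional order: repetition binds tightest, then sequence, then alternation. This is only a stand-in: the checks exercise the C++ standard library, not the Spirit.Lex engine, and `full_match` and `check_precedence` are hypothetical helpers.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches all of `text`.
bool full_match(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

void check_precedence()
{
    // ab* groups as a(b*): the star applies only to the preceding b.
    assert(full_match("abbb", "ab*"));
    assert(!full_match("abab", "ab*"));
    assert(full_match("abab", "(ab)*"));  // parentheses override the grouping

    // foo|bar* groups as (foo)|(ba(r*)): | separates whole sequences.
    assert(full_match("foo", "foo|bar*"));
    assert(full_match("barrr", "foo|bar*"));
    assert(!full_match("fobar", "foo|bar*"));
}
```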

[heading Macros]

Regular expressions can be given a name and referred to in rules using the
syntax `{NAME}` where `NAME` is the name you have given to the macro. A macro
name can be at most 30 characters long and must start with a `_` or a letter.
Subsequent characters can be `_`, `-`, a letter or a decimal digit.
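In the Spirit.Lex examples, macros are defined through the lexer's `add_pattern` member and then referenced as `{NAME}` in token definitions. The sketch below does not use Spirit at all; it is a hypothetical, standard-library-only illustration of the two rules in this section: `is_valid_macro_name` checks a name against the naming rule, and `expand_macros` replaces `{NAME}` references textually. Wrapping each expansion in parentheses is an assumption borrowed from flex-style lexers, so that a macro such as `a|b` still acts as one unit inside a larger expression.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Checks the macro naming rule: at most 30 characters, starting with '_' or
// a letter, followed by any mix of '_', '-', letters and decimal digits.
bool is_valid_macro_name(const std::string& name)
{
    if (name.empty() || name.size() > 30)
        return false;
    const unsigned char first = static_cast<unsigned char>(name.front());
    if (first != '_' && !std::isalpha(first))
        return false;
    for (std::size_t i = 1; i < name.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(name.at(i));
        if (c != '_' && c != '-' && !std::isalnum(c))
            return false;
    }
    return true;
}

// Hypothetical expander: replaces each {NAME} whose NAME is a known, valid
// macro with "(definition)". Braces that do not enclose a known macro (such
// as the repeat count in r{2,5}) are copied unchanged. Nested macros, quoted
// strings and escaped braces are not handled in this sketch.
std::string expand_macros(const std::string& regex,
                          const std::map<std::string, std::string>& macros)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < regex.size())
    {
        const std::size_t open = regex.find('{', pos);
        if (open == std::string::npos)
            break;
        const std::size_t close = regex.find('}', open + 1);
        if (close == std::string::npos)
            break;
        const std::string name = regex.substr(open + 1, close - open - 1);
        const auto it = macros.find(name);
        out.append(regex, pos, open - pos);  // text before the '{'
        if (is_valid_macro_name(name) && it != macros.end())
            out += "(" + it->second + ")";
        else
            out.append(regex, open, close - open + 1);  // keep the braces as is
        pos = close + 1;
    }
    out.append(regex, pos, std::string::npos);  // remaining text
    return out;
}
```

With a macro table that maps `DIGIT` to a digit class, expanding `{DIGIT}+` yields that class in parentheses followed by `+`, while the `{2,5}` in `r{2,5}` is left alone because `2,5` is not a valid macro name.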

[endsect]