Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/ |
2 | / Copyright (c) 2008 Eric Niebler | |
3 | / | |
4 | / Distributed under the Boost Software License, Version 1.0. (See accompanying | |
5 | / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
6 | /] | |
7 | ||
8 | [section String Substitutions] | |
9 | ||
10 | Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the | |
11 | most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for | |
12 | searching and replacing. | |
13 | ||
14 | [h2 regex_replace()] | |
15 | ||
16 | Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object, | |
17 | and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept | |
18 | the input sequence as a bidirectional container such as `std::string` and return the result in a new container | |
19 | of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others | |
20 | accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution | |
21 | may be specified as a string with format sequences or as a formatter object. Below are some simple examples of | |
22 | using string-based substitutions. | |
23 | ||
24 | std::string input("This is his face"); | |
25 | sregex re = as_xpr("his"); // find all occurrences of "his" ... | |
26 | std::string format("her"); // ... and replace them with "her" | |
27 | ||
28 | // use the version of regex_replace() that operates on strings | |
29 | std::string output = regex_replace( input, re, format ); | |
30 | std::cout << output << '\n'; | |
31 | ||
32 | // use the version of regex_replace() that operates on iterators | |
33 | std::ostream_iterator< char > out_iter( std::cout ); | |
34 | regex_replace( out_iter, input.begin(), input.end(), re, format ); | |
35 | ||
36 | The above program prints out the following: | |
37 | ||
38 | [pre | |
39 | Ther is her face | |
40 | Ther is her face | |
41 | ] | |
42 | ||
43 | Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`. | |
44 | ||
45 | Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see | |
46 | a complete example program that shows how to use _regex_replace_. And check the _regex_replace_ reference | |
47 | to see a complete list of the available overloads. | |
48 | ||
49 | [h2 Replace Options] | |
50 | ||
51 | The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The | |
52 | possible values of the bitmask are: | |
53 | ||
54 | [table Format Flags | |
55 | [[Flag] [Meaning]] | |
56 | [[`format_default`] [Recognize the ECMA-262 format sequences (see below).]] | |
57 | [[`format_first_only`] [Only replace the first match, not all of them.]] | |
58 | [[`format_no_copy`] [Don't copy the parts of the input sequence that didn't match the regex | |
59 | to the output sequence.]] | |
60 | [[`format_literal`] [Treat the format string as a literal; that is, don't recognize any | |
61 | escape sequences.]] | |
62 | [[`format_perl`] [Recognize the Perl format sequences (see below).]] | |
63 | [[`format_sed`] [Recognize the sed format sequences (see below).]] | |
64 | [[`format_all`] [In addition to the Perl format sequences, recognize some | |
65 | Boost-specific format sequences.]] | |
66 | ] | |
67 | ||
68 | These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is | |
69 | a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and | |
70 | `format_all` are ignored. | |
71 | ||
72 | [h2 The ECMA-262 Format Sequences] | |
73 | ||
74 | When you haven't specified a substitution string dialect with one of the format flags above, | |
75 | you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows | |
76 | the escape sequences recognized in ECMA-262 mode. | |
77 | ||
78 | [table Format Escape Sequences | |
79 | [[Escape Sequence] [Meaning]] | |
80 | [[[^$1], [^$2], etc.] [the corresponding sub-match]] | |
81 | [[[^$&]] [the full match]] | |
82 | [[[^$\`]] [the match prefix]] | |
83 | [[[^$']] [the match suffix]] | |
84 | [[[^$$]] [a literal `'$'` character]] | |
85 | ] | |
86 | ||
87 | Any other sequence beginning with `'$'` simply represents itself. For example, if the format string were | |
88 | `"$a"` then `"$a"` would be inserted into the output sequence. | |
89 | ||
90 | [h2 The Sed Format Sequences] | |
91 | ||
92 | When specifying the `format_sed` flag to _regex_replace_, the following escape sequences | |
93 | are recognized: | |
94 | ||
95 | [table Sed Format Escape Sequences | |
96 | [[Escape Sequence] [Meaning]] | |
97 | [[[^\\1], [^\\2], etc.] [The corresponding sub-match]] | |
98 | [[[^&]] [the full match]] | |
99 | [[[^\\a]] [A literal `'\a'`]] | |
100 | [[[^\\e]] [A literal `char_type(27)`]] | |
101 | [[[^\\f]] [A literal `'\f'`]] | |
102 | [[[^\\n]] [A literal `'\n'`]] | |
103 | [[[^\\r]] [A literal `'\r'`]] | |
104 | [[[^\\t]] [A literal `'\t'`]] | |
105 | [[[^\\v]] [A literal `'\v'`]] | |
106 | [[[^\\xFF]] [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]] | |
107 | [[[^\\x{FFFF}]] [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]] | |
108 | [[[^\\cX]] [The control character [^['X]]]] | |
109 | ] | |
110 | ||
111 | [h2 The Perl Format Sequences] | |
112 | ||
113 | When specifying the `format_perl` flag to _regex_replace_, the following escape sequences | |
114 | are recognized: | |
115 | ||
116 | [table Perl Format Escape Sequences | |
117 | [[Escape Sequence] [Meaning]] | |
118 | [[[^$1], [^$2], etc.] [the corresponding sub-match]] | |
119 | [[[^$&]] [the full match]] | |
120 | [[[^$\`]] [the match prefix]] | |
121 | [[[^$']] [the match suffix]] | |
122 | [[[^$$]] [a literal `'$'` character]] | |
123 | [[[^\\a]] [A literal `'\a'`]] | |
124 | [[[^\\e]] [A literal `char_type(27)`]] | |
125 | [[[^\\f]] [A literal `'\f'`]] | |
126 | [[[^\\n]] [A literal `'\n'`]] | |
127 | [[[^\\r]] [A literal `'\r'`]] | |
128 | [[[^\\t]] [A literal `'\t'`]] | |
129 | [[[^\\v]] [A literal `'\v'`]] | |
130 | [[[^\\xFF]] [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]] | |
131 | [[[^\\x{FFFF}]] [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]] | |
132 | [[[^\\cX]] [The control character [^['X]]]] | |
133 | [[[^\\l]] [Make the next character lowercase]] | |
134 | [[[^\\L]] [Make the rest of the substitution lowercase until the next [^\\E]]] | |
135 | [[[^\\u]] [Make the next character uppercase]] | |
136 | [[[^\\U]] [Make the rest of the substitution uppercase until the next [^\\E]]] | |
137 | [[[^\\E]] [Terminate [^\\L] or [^\\U]]] | |
138 | [[[^\\1], [^\\2], etc.] [The corresponding sub-match]] | |
139 | [[[^\\g<name>]] [The named backref /name/]] | |
140 | ] | |
141 | ||
142 | [h2 The Boost-Specific Format Sequences] | |
143 | ||
144 | When specifying the `format_all` flag to _regex_replace_, the escape sequences | |
145 | recognized are the same as those above for `format_perl`. In addition, conditional | |
146 | expressions of the following form are recognized: | |
147 | ||
148 | [pre | |
149 | ?Ntrue-expression:false-expression | |
150 | ] | |
151 | ||
152 | where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match | |
153 | participated in the full match, then the substitution is /true-expression/. Otherwise, | |
154 | it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you | |
155 | want a literal paren, you must escape it as [^\\(]. | |
156 | ||
157 | [h2 Formatter Objects] | |
158 | ||
159 | Format strings are not always expressive enough for all your text substitution | |
160 | needs. Consider the simple example of wanting to map input strings to output | |
161 | strings, as you may want to do with environment variables. Rather than a format | |
162 | /string/, for this you would use a formatter /object/. Consider the following | |
163 | code, which finds embedded environment variables of the form `"$(XYZ)"` and | |
164 | computes the substitution string by looking up the environment variable in a | |
165 | map. | |
166 | ||
167 | #include <map> | |
168 | #include <string> | |
169 | #include <iostream> | |
170 | #include <boost/xpressive/xpressive.hpp> | |
171 | using namespace boost; | |
172 | using namespace xpressive; | |
173 | ||
174 | std::map<std::string, std::string> env; | |
175 | ||
176 | std::string const &format_fun(smatch const &what) | |
177 | { | |
178 | return env[what[1].str()]; | |
179 | } | |
180 | ||
181 | int main() | |
182 | { | |
183 | env["X"] = "this"; | |
184 | env["Y"] = "that"; | |
185 | ||
186 | std::string input("\"$(X)\" has the value \"$(Y)\""); | |
187 | ||
188 | // replace strings like "$(XYZ)" with the result of env["XYZ"] | |
189 | sregex envar = "$(" >> (s1 = +_w) >> ')'; | |
190 | std::string output = regex_replace(input, envar, format_fun); | |
191 | std::cout << output << std::endl; | |
192 | } | |
193 | ||
194 | In this case, we use a function, `format_fun()`, to compute the substitution string | |
195 | on the fly. It accepts a _match_results_ object which contains the results of the | |
196 | current match. `format_fun()` uses the first submatch as a key into the global `env` | |
197 | map. The above code displays: | |
198 | ||
199 | [pre | |
200 | "this" has the value "that" | |
201 | ] | |
202 | ||
203 | The formatter need not be an ordinary function. It may be an object of class type. | |
204 | And rather than return a string, it may accept an output iterator into which it | |
205 | writes the substitution. Consider the following, which is functionally equivalent | |
206 | to the above. | |
207 | ||
208 | #include <map> | |
209 | #include <string> | |
210 | #include <iostream> | |
211 | #include <boost/xpressive/xpressive.hpp> | |
212 | using namespace boost; | |
213 | using namespace xpressive; | |
214 | ||
215 | struct formatter | |
216 | { | |
217 | typedef std::map<std::string, std::string> env_map; | |
218 | env_map env; | |
219 | ||
220 | template<typename Out> | |
221 | Out operator()(smatch const &what, Out out) const | |
222 | { | |
223 | env_map::const_iterator where = env.find(what[1]); | |
224 | if(where != env.end()) | |
225 | { | |
226 | std::string const &sub = where->second; | |
227 | out = std::copy(sub.begin(), sub.end(), out); | |
228 | } | |
229 | return out; | |
230 | } | |
231 | ||
232 | }; | |
233 | ||
234 | int main() | |
235 | { | |
236 | formatter fmt; | |
237 | fmt.env["X"] = "this"; | |
238 | fmt.env["Y"] = "that"; | |
239 | ||
240 | std::string input("\"$(X)\" has the value \"$(Y)\""); | |
241 | ||
242 | sregex envar = "$(" >> (s1 = +_w) >> ')'; | |
243 | std::string output = regex_replace(input, envar, fmt); | |
244 | std::cout << output << std::endl; | |
245 | } | |
246 | ||
247 | The formatter must be a callable object -- a function or a function object -- | |
248 | that has one of three possible signatures, detailed in the table below. For | |
249 | the table, `fmt` is a function pointer or function object, `what` is a | |
250 | _match_results_ object, `out` is an OutputIterator, and `flags` is a value | |
251 | of `regex_constants::match_flag_type`: | |
252 | ||
253 | [table Formatter Signatures | |
254 | [ | |
255 | [Formatter Invocation] | |
256 | [Return Type] | |
257 | [Semantics] | |
258 | ] | |
259 | [ | |
260 | [`fmt(what)`] | |
261 | [Range of characters (e.g. `std::string`) or null-terminated string] | |
262 | [The string matched by the regex is replaced with the string returned by | |
263 | the formatter.] | |
264 | ] | |
265 | [ | |
266 | [`fmt(what, out)`] | |
267 | [OutputIterator] | |
268 | [The formatter writes the replacement string into `out` and returns `out`.] | |
269 | ] | |
270 | [ | |
271 | [`fmt(what, out, flags)`] | |
272 | [OutputIterator] | |
273 | [The formatter writes the replacement string into `out` and returns `out`. | |
274 | The `flags` parameter is the value of the match flags passed to the | |
275 | _regex_replace_ algorithm.] | |
276 | ] | |
277 | ] | |
278 | ||
279 | [h2 Formatter Expressions] | |
280 | ||
281 | In addition to format /strings/ and formatter /objects/, _regex_replace_ also | |
282 | accepts formatter /expressions/. A formatter expression is a lambda expression | |
283 | that generates a string. It uses the same syntax as that for | |
284 | [link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions | |
285 | Semantic Actions], which are covered later. The above example, which uses | |
286 | _regex_replace_ to substitute strings for environment variables, is repeated | |
287 | here using a formatter expression. | |
288 | ||
289 | #include <map> | |
290 | #include <string> | |
291 | #include <iostream> | |
292 | #include <boost/xpressive/xpressive.hpp> | |
293 | #include <boost/xpressive/regex_actions.hpp> | |
294 | using namespace boost::xpressive; | |
295 | ||
296 | int main() | |
297 | { | |
298 | std::map<std::string, std::string> env; | |
299 | env["X"] = "this"; | |
300 | env["Y"] = "that"; | |
301 | ||
302 | std::string input("\"$(X)\" has the value \"$(Y)\""); | |
303 | ||
304 | sregex envar = "$(" >> (s1 = +_w) >> ')'; | |
305 | std::string output = regex_replace(input, envar, boost::xpressive::ref(env)[s1]); | |
306 | std::cout << output << std::endl; | |
307 | } | |
308 | ||
309 | In the above, the formatter expression is `ref(env)[s1]`. It uses the | |
310 | value of the first submatch, `s1`, as a key into the `env` map. The purpose of | |
311 | `xpressive::ref()` here is to make the reference to the `env` local variable /lazy/ | |
312 | so that the index operation is deferred until we know what to replace `s1` with. The call is qualified as `boost::xpressive::ref` because argument-dependent lookup on a `std::map` can also find `std::ref`, which can make an unqualified call ambiguous on C++11 and later compilers. | |
313 | ||
314 | [endsect] |