
[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section String Substitutions]

Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the
most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
searching and replacing.

[h2 regex_replace()]

Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object,
and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept
the input sequence as a bidirectional container such as `std::string` and return the result in a new container
of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others
accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution
may be specified as a string with format sequences or as a formatter object. Below are some simple examples of
using string-based substitutions.

    std::string input("This is his face");
    sregex re = as_xpr("his");                // find all occurrences of "his" ...
    std::string format("her");                // ... and replace them with "her"

    // use the version of regex_replace() that operates on strings
    std::string output = regex_replace( input, re, format );
    std::cout << output << '\n';

    // use the version of regex_replace() that operates on iterators
    std::ostream_iterator< char > out_iter( std::cout );
    regex_replace( out_iter, input.begin(), input.end(), re, format );

The above program prints out the following:

[pre
Ther is her face
Ther is her face
]

Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`.

Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see
a complete example program that shows how to use _regex_replace_. And check the _regex_replace_ reference
to see a complete list of the available overloads.

[h2 Replace Options]

The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
possible values of the bitmask are:

[table Format Flags
    [[Flag]                     [Meaning]]
    [[`format_default`]         [Recognize the ECMA-262 format sequences (see below).]]
    [[`format_first_only`]      [Only replace the first match, not all of them.]]
    [[`format_no_copy`]         [Don't copy the parts of the input sequence that didn't match the regex
                                 to the output sequence.]]
    [[`format_literal`]         [Treat the format string as a literal; that is, don't recognize any
                                 escape sequences.]]
    [[`format_perl`]            [Recognize the Perl format sequences (see below).]]
    [[`format_sed`]             [Recognize the sed format sequences (see below).]]
    [[`format_all`]             [In addition to the Perl format sequences, recognize some
                                 Boost-specific format sequences.]]
]

These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is
a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and
`format_all` are ignored.

[h2 The ECMA-262 Format Sequences]

When you haven't specified a substitution string dialect with one of the format flags above,
you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
the escape sequences recognized in ECMA-262 mode.

[table Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [the corresponding sub-match]]
    [[[^$&]]                [the full match]]
    [[[^$\`]]               [the match prefix]]
    [[[^$']]                [the match suffix]]
    [[[^$$]]                [a literal `'$'` character]]
]

Any other sequence beginning with `'$'` simply represents itself. For example, if the format string were
`"$a"` then `"$a"` would be inserted into the output sequence.

[h2 The Sed Format Sequences]

When you pass the `format_sed` flag to _regex_replace_, the following escape sequences
are recognized:

[table Sed Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^&]]                 [The full match]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
]

[h2 The Perl Format Sequences]

When you pass the `format_perl` flag to _regex_replace_, the following escape sequences
are recognized:

[table Perl Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [The corresponding sub-match]]
    [[[^$&]]                [The full match]]
    [[[^$\`]]               [The match prefix]]
    [[[^$']]                [The match suffix]]
    [[[^$$]]                [A literal `'$'` character]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
    [[[^\\l]]               [Make the next character lowercase]]
    [[[^\\L]]               [Make the rest of the substitution lowercase until the next [^\\E]]]
    [[[^\\u]]               [Make the next character uppercase]]
    [[[^\\U]]               [Make the rest of the substitution uppercase until the next [^\\E]]]
    [[[^\\E]]               [Terminate [^\\L] or [^\\U]]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^\\g<name>]]         [The named backref /name/]]
]

[h2 The Boost-Specific Format Sequences]

When you pass the `format_all` flag to _regex_replace_, the escape sequences
recognized are the same as those above for `format_perl`. In addition, conditional
expressions of the following form are recognized:

[pre
?Ntrue-expression:false-expression
]

where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match
participated in the full match, then the substitution is /true-expression/. Otherwise,
it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
want a literal paren, you must escape it as [^\\(].

[h2 Formatter Objects]

Format strings are not always expressive enough for all your text substitution
needs. Consider the simple example of wanting to map input strings to output
strings, as you may want to do with environment variables. Rather than a format
/string/, for this you would use a formatter /object/. Consider the following
code, which finds embedded environment variables of the form `"$(XYZ)"` and
computes the substitution string by looking up the environment variable in a
map.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    std::map<std::string, std::string> env;

    std::string const &format_fun(smatch const &what)
    {
        return env[what[1].str()];
    }

    int main()
    {
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        // replace strings like "$(XYZ)" with the result of env["XYZ"]
        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, format_fun);
        std::cout << output << std::endl;
    }

In this case, we use a function, `format_fun()`, to compute the substitution string
on the fly. It accepts a _match_results_ object, which contains the results of the
current match. `format_fun()` uses the first submatch as a key into the global `env`
map. The above code displays:

[pre
"this" has the value "that"
]

The formatter need not be an ordinary function. It may be an object of class type.
And rather than return a string, it may accept an output iterator into which it
writes the substitution. Consider the following, which is functionally equivalent
to the above.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    struct formatter
    {
        typedef std::map<std::string, std::string> env_map;
        env_map env;

        template<typename Out>
        Out operator()(smatch const &what, Out out) const
        {
            env_map::const_iterator where = env.find(what[1]);
            if(where != env.end())
            {
                std::string const &sub = where->second;
                out = std::copy(sub.begin(), sub.end(), out);
            }
            return out;
        }

    };

    int main()
    {
        formatter fmt;
        fmt.env["X"] = "this";
        fmt.env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, fmt);
        std::cout << output << std::endl;
    }

The formatter must be a callable object -- a function or a function object --
that has one of three possible signatures, detailed in the table below. For
the table, `fmt` is a function pointer or function object, `what` is a
_match_results_ object, `out` is an OutputIterator, and `flags` is a value
of `regex_constants::match_flag_type`:

[table Formatter Signatures
[
    [Formatter Invocation]
    [Return Type]
    [Semantics]
]
[
    [`fmt(what)`]
    [Range of characters (e.g. `std::string`) or null-terminated string]
    [The string matched by the regex is replaced with the string returned by
     the formatter.]
]
[
    [`fmt(what, out)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.]
]
[
    [`fmt(what, out, flags)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.
     The `flags` parameter is the value of the match flags passed to the
     _regex_replace_ algorithm.]
]
]

[h2 Formatter Expressions]

In addition to format /strings/ and formatter /objects/, _regex_replace_ also
accepts formatter /expressions/. A formatter expression is a lambda expression
that generates a string. It uses the same syntax as that for
[link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions
Semantic Actions], which are covered later. The above example, which uses
_regex_replace_ to substitute strings for environment variables, is repeated
here using a formatter expression.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, std::string> env;
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, ref(env)[s1]);
        std::cout << output << std::endl;
    }

In the above, the formatter expression is `ref(env)[s1]`. It uses the
value of the first submatch, `s1`, as a key into the `env` map. The purpose of
`xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
so that the index operation is deferred until we know what to replace `s1` with.

[endsect]