Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN" | |
3 | "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"> | |
4 | ||
5 | ||
6 | <!-- Copyright (c) 2002-2006 Pavol Droba. | |
7 | Subject to the Boost Software License, Version 1.0. | |
8 | (See accompanying file LICENSE_1_0.txt or http://www.boost.org/LICENSE_1_0.txt) | |
9 | --> | |
10 | ||
11 | ||
12 | <section id="string_algo.usage" last-revision="$Date$"> | |
13 | <title>Usage</title> | |
14 | ||
15 | <using-namespace name="boost"/> | |
16 | <using-namespace name="boost::algorithm"/> | |
17 | ||
18 | ||
19 | <section> | |
20 | <title>First Example</title> | |
21 | ||
22 | <para> | |
23 | Using the algorithms is straightforward. Let us have a look at the first example: | |
24 | </para> | |
25 | <programlisting> | |
26 | #include <boost/algorithm/string.hpp> | |
27 | using namespace std; | |
28 | using namespace boost; | |
29 | ||
30 | // ... | |
31 | ||
32 | string str1(" hello world! "); | |
33 | to_upper(str1); // str1 == " HELLO WORLD! " | |
34 | trim(str1); // str1 == "HELLO WORLD!" | |
35 | ||
36 | string str2= | |
37 | to_lower_copy( | |
38 | ireplace_first_copy( | |
39 | str1,"hello","goodbye")); // str2 == "goodbye world!" | |
40 | </programlisting> | |
41 | <para> | |
42 | This example converts str1 to upper case and trims spaces from the start and the end | |
43 | of the string. str2 is then created as a lower-case copy of str1 with "hello" replaced (ignoring case) by "goodbye". | |
44 | This example demonstrates several important concepts used in the library: | |
45 | </para> | |
46 | <itemizedlist> | |
47 | <listitem> | |
48 | <para><emphasis role="bold">Container parameters:</emphasis> | |
49 | Unlike in the STL algorithms, parameters are not specified only in the form | |
50 | of iterators. The STL convention allows for great flexibility, | |
51 | but it has several limitations. It is not possible to <emphasis>stack</emphasis> algorithms together, | |
52 | because a container is passed as two parameters, so the return value of one algorithm | |
53 | cannot be fed directly into another. It is also considerably easier to write | |
54 | <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>. | |
55 | </para> | |
56 | <para> | |
57 | The magic of <ulink url="../../libs/range/index.html">Boost.Range</ulink> | |
58 | provides a uniform way of handling different string types. | |
59 | If there is a need to pass a pair of iterators, | |
60 | <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink> | |
61 | can be used to package iterators into a structure with a compatible interface. | |
62 | </para> | |
63 | </listitem> | |
64 | <listitem> | |
65 | <para><emphasis role="bold">Copy vs. Mutable:</emphasis> | |
66 | Many algorithms in the library perform a transformation of the input. | |
67 | The transformation can be done in-place, mutating the input sequence, or a copy | |
68 | of the transformed input can be created, leaving the input intact. Neither | |
69 | approach is superior to the other; each has its own | |
70 | advantages and disadvantages. For this reason, the library provides both. | |
71 | </para> | |
72 | </listitem> | |
73 | <listitem> | |
74 | <para><emphasis role="bold">Algorithm stacking:</emphasis> | |
75 | Copy versions return the transformed input as a result and thus allow simple chaining of | |
76 | transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>). | |
77 | Mutable versions return <code>void</code>, to avoid misuse. | |
78 | </para> | |
79 | </listitem> | |
80 | <listitem> | |
81 | <para><emphasis role="bold">Naming:</emphasis> | |
82 | Naming follows the conventions from the Standard C++ Library. If there is a | |
83 | copy and a mutable version of the same algorithm, the mutable version has no suffix | |
84 | and the copy version has the suffix <emphasis>_copy</emphasis>. | |
85 | Some algorithms have the prefix <emphasis>i</emphasis> | |
86 | (e.g. <functionname>ifind_first()</functionname>). | |
87 | This prefix identifies that the algorithm works in a case-insensitive manner. | |
88 | </para> | |
89 | </listitem> | |
90 | </itemizedlist> | |
91 | <para> | |
92 | To use the library, include the <headername>boost/algorithm/string.hpp</headername> header. | |
93 | If the regex related functions are needed, include the | |
94 | <headername>boost/algorithm/string_regex.hpp</headername> header. | |
95 | </para> | |
96 | </section> | |
97 | <section> | |
98 | <title>Case conversion</title> | |
99 | ||
100 | <para> | |
101 | STL has a nice way of converting character case. Unfortunately, it works only | |
102 | for a single character, and we want to convert a whole string: | |
103 | </para> | |
104 | <programlisting> | |
105 | string str1("HeLlO WoRld!"); | |
106 | to_upper(str1); // str1=="HELLO WORLD!" | |
107 | </programlisting> | |
108 | <para> | |
109 | <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of | |
110 | characters in a string using a specified locale. | |
111 | </para> | |
112 | <para> | |
113 | For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>. | |
114 | </para> | |
115 | </section> | |
116 | <section> | |
117 | <title>Predicates and Classification</title> | |
118 | <para> | |
119 | A part of the library deals with string related predicates. Consider this example: | |
120 | </para> | |
121 | <programlisting> | |
122 | bool is_executable( const string& filename ) | |
123 | { | |
124 | return | |
125 | iends_with(filename, ".exe") || | |
126 | iends_with(filename, ".com"); | |
127 | } | |
128 | ||
129 | // ... | |
130 | string str1("command.com"); | |
131 | cout | |
132 | << str1 | |
133 | << (is_executable(str1)? " is": " is not") | |
134 | << " an executable" | |
135 | << endl; // prints "command.com is an executable" | |
136 | ||
137 | //.. | |
138 | char text1[]="hello"; | |
139 | cout | |
140 | << text1 | |
141 | << (all( text1, is_lower() )? " is": " is not") | |
142 | << " written in the lower case" | |
143 | << endl; // prints "hello is written in the lower case" | |
144 | </programlisting> | |
145 | <para> | |
146 | The predicates determine whether a substring is contained in the input string | |
147 | under various conditions. The conditions are: a string starts with the substring, | |
148 | ends with the substring, | |
149 | simply contains the substring, or both strings are equal. See the reference for | |
150 | <headername>boost/algorithm/string/predicate.hpp</headername> for more details. | |
151 | </para> | |
152 | <para> | |
153 | Note that if we had used "hello world" as the input to the test, it would have | |
154 | output "hello world is not written in the lower case" because the space in the | |
155 | input string is not a lower case letter. | |
156 | </para> | |
157 | <para> | |
158 | In addition, the algorithm <functionname>all()</functionname> checks whether | |
159 | all elements of a container satisfy a condition specified by a predicate. | |
160 | This predicate can be any unary predicate, but the library provides a number of | |
161 | useful string-related predicates and combinators ready for use. | |
162 | These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header. | |
163 | Classification predicates can be combined using logical combinators to form | |
164 | more complex expressions. For example: <code>is_from_range('a','z') || is_digit()</code>. | |
165 | </para> | |
166 | </section> | |
167 | <section> | |
168 | <title>Trimming</title> | |
169 | ||
170 | <para> | |
171 | When parsing the input from a user, strings often have unwanted leading or trailing | |
172 | characters. To get rid of them, we need trim functions: | |
173 | </para> | |
174 | <programlisting> | |
175 | string str1=" hello world! "; | |
176 | string str2=trim_left_copy(str1); // str2 == "hello world! " | |
177 | string str3=trim_right_copy(str1); // str3 == " hello world!" | |
178 | trim(str1); // str1 == "hello world!" | |
179 | ||
180 | string phone="00423333444"; | |
181 | // remove leading 0 from the phone number | |
182 | trim_left_if(phone,is_any_of("0")); // phone == "423333444" | |
183 | </programlisting> | |
184 | <para> | |
185 | It is possible to trim the spaces on the right, on the left or on both sides of a string. | |
186 | For cases where something other than blank space needs to be removed, there | |
187 | are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor which will | |
188 | select the <emphasis>space</emphasis> to be removed. It is possible to use classification | |
189 | predicates like <functionname>is_digit()</functionname> mentioned in the previous section. | |
190 | See the reference for <headername>boost/algorithm/string/trim.hpp</headername>. | |
191 | </para> | |
192 | </section> | |
193 | <section> | |
194 | <title>Find algorithms</title> | |
195 | ||
196 | <para> | |
197 | The library contains a set of find algorithms. Here is an example: | |
198 | </para> | |
199 | <programlisting> | |
200 | char text[]="hello dolly!"; | |
201 | iterator_range<char*> result=find_last(text,"ll"); | |
202 | ||
203 | transform( result.begin(), result.end(), result.begin(), [](char c) { return static_cast<char>(c + 1); } ); | |
204 | // text == "hello dommy!" (a lambda is used because bind2nd was removed in C++17) | |
205 | ||
206 | to_upper(result); // text == "hello doMMy!" | |
207 | ||
208 | // iterator_range is convertible to bool | |
209 | if(find_first(text, "dolly")) | |
210 | { | |
211 | cout << "Dolly is there" << endl; | |
212 | } | |
213 | </programlisting> | |
214 | <para> | |
215 | We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll". | |
216 | The result is given in the <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink>. | |
217 | This range delimits the | |
218 | part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll". | |
219 | ||
220 | As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be a | |
221 | char[], because this type is supported by | |
222 | <ulink url="../../libs/range/index.html">Boost.Range</ulink>. | |
223 | ||
224 | The following lines transform the result. Notice that | |
225 | <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink> has the familiar | |
226 | <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container. | |
227 | It is also convertible to bool, so it is easy to use find algorithms for simple containment checking. | |
228 | </para> | |
229 | <para> | |
230 | Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>. | |
231 | </para> | |
232 | </section> | |
233 | <section> | |
234 | <title>Replace Algorithms</title> | |
235 | <para> | |
236 | Find algorithms can be used for searching for a specific part of a string. Replace goes one step | |
237 | further: after a matching part is found, it is substituted with something else. The substitution is computed | |
238 | from the original, using some transformation. | |
239 | </para> | |
240 | <programlisting> | |
241 | string str1="Hello Dolly, Hello World!"; | |
242 | replace_first(str1, "Dolly", "Jane"); // str1 == "Hello Jane, Hello World!" | |
243 | replace_last(str1, "Hello", "Goodbye"); // str1 == "Hello Jane, Goodbye World!" | |
244 | erase_all(str1, " "); // str1 == "HelloJane,GoodbyeWorld!" | |
245 | erase_head(str1, 5); // str1 == "Jane,GoodbyeWorld!" | |
246 | </programlisting> | |
247 | <para> | |
248 | For the complete list of replace and erase functions see the | |
249 | <link linkend="string_algo.reference">reference</link>. | |
250 | There are many predefined functions for common usage; however, the library also allows you to | |
251 | define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname> | |
252 | function which, in addition to the input, takes two parameters. | |
253 | The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is | |
254 | a <link linkend="string_algo.formatter_concept">Formatter</link> object. | |
255 | The Finder object is a functor which performs the searching for the replacement part. The Formatter object | |
256 | takes the result of the Finder (usually a reference to the found substring) and creates a | |
257 | substitute for it. The replace algorithm puts these two together and makes the desired substitution. | |
258 | </para> | |
259 | <para> | |
260 | Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and | |
261 | <headername>boost/algorithm/string/find_format.hpp</headername> for reference. | |
262 | </para> | |
263 | </section> | |
264 | <section> | |
265 | <title>Find Iterator</title> | |
266 | ||
267 | <para> | |
268 | An extension to the find algorithms is the Find Iterator. Instead of searching for just one part of a string, | |
269 | the find iterator allows us to iterate over the substrings matching the specified criteria. | |
270 | This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally | |
271 | search the string. | |
272 | Dereferencing a find iterator yields a <ulink url="../../libs/range/doc/html/range/reference/utilities/iterator_range.html"><code>boost::iterator_range</code></ulink> | |
273 | object that delimits the current match. | |
274 | </para> | |
275 | <para> | |
276 | There are two iterators provided: <classname>find_iterator</classname> and | |
277 | <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified | |
278 | Finder. The latter iterates over the gaps between these substrings. | |
279 | </para> | |
280 | <programlisting> | |
281 | string str1("abc-*-ABC-*-aBc"); | |
282 | // Find all 'abc' substrings (ignoring the case) | |
283 | // Create a find_iterator | |
284 | typedef find_iterator<string::iterator> string_find_iterator; | |
285 | for(string_find_iterator It= | |
286 | make_find_iterator(str1, first_finder("abc", is_iequal())); | |
287 | It!=string_find_iterator(); | |
288 | ++It) | |
289 | { | |
290 | cout << copy_range<std::string>(*It) << endl; | |
291 | } | |
292 | ||
293 | // Output will be: | |
294 | // abc | |
295 | // ABC | |
296 | // aBc | |
297 | ||
298 | typedef split_iterator<string::iterator> string_split_iterator; | |
299 | for(string_split_iterator It= | |
300 | make_split_iterator(str1, first_finder("-*-", is_iequal())); | |
301 | It!=string_split_iterator(); | |
302 | ++It) | |
303 | { | |
304 | cout << copy_range<std::string>(*It) << endl; | |
305 | } | |
306 | ||
307 | // Output will be: | |
308 | // abc | |
309 | // ABC | |
310 | // aBc | |
311 | </programlisting> | |
312 | <para> | |
313 | Note that the find iterators have only one template parameter. It is the base iterator type. | |
314 | The Finder is specified at runtime. This allows us to typedef a find iterator for | |
315 | common string types and reuse it. Additionally, the make_*_iterator functions help | |
316 | to construct a find iterator for a particular range. | |
317 | </para> | |
318 | <para> | |
319 | See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>. | |
320 | </para> | |
321 | </section> | |
322 | <section> | |
323 | <title>Split</title> | |
324 | ||
325 | <para> | |
326 | Split algorithms are an extension to the find iterator for one common usage scenario. | |
327 | These algorithms use a find iterator and store all matches into the provided | |
328 | container. This container must be able to hold copies (e.g. <code>std::string</code>) or | |
329 | references (e.g. <code>iterator_range</code>) of the extracted substrings. | |
330 | </para> | |
331 | <para> | |
332 | Two algorithms are provided. <functionname>find_all()</functionname> finds all occurrences | |
333 | of a string in the input. <functionname>split()</functionname> splits the input into parts. | |
334 | </para> | |
335 | ||
336 | <programlisting> | |
337 | string str1("hello abc-*-ABC-*-aBc goodbye"); | |
338 | ||
339 | typedef vector< iterator_range<string::iterator> > find_vector_type; | |
340 | ||
341 | find_vector_type FindVec; // #1: Search for separators | |
342 | ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] } | |
343 | ||
344 | typedef vector< string > split_vector_type; | |
345 | ||
346 | split_vector_type SplitVec; // #2: Search for tokens | |
347 | split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" } | |
348 | </programlisting> | |
349 | <para> | |
350 | <code>[abc]</code> designates an <code>iterator_range</code> delimiting this substring. | |
351 | </para> | |
352 | <para> | |
353 | The first example shows how to construct a container to hold references to all extracted | |
354 | substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references | |
355 | to all substrings that are equal to "abc", ignoring case. | |
356 | </para> | |
357 | <para> | |
358 | The second example uses <functionname>split()</functionname> to split the string str1 into parts | |
359 | separated by the characters '-' or '*'. These parts are then put into SplitVec. | |
360 | The last argument specifies whether adjacent separators are merged into one (<code>token_compress_on</code>) or not. | |
361 | </para> | |
362 | <para> | |
363 | More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>. | |
364 | </para> | |
365 | </section> | |
366 | </section> |