]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN" | |
3 | "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"> | |
4 | ||
5 | ||
6 | <!-- Copyright (c) 2002-2006 Pavol Droba. | |
7 | Subject to the Boost Software License, Version 1.0. | |
8 | (See accompanying file LICENSE_1_0.txt or http://www.boost.org/LICENSE_1_0.txt) | |
9 | --> | |
10 | ||
11 | <section id="string_algo.design" last-revision="$Date$"> | |
12 | <title>Design Topics</title> | |
13 | ||
14 | <using-namespace name="boost"/> | |
15 | <using-namespace name="boost::algorithm"/> | |
16 | ||
17 | <section id="string_algo.string"> | |
18 | <title>String Representation</title> | |
19 | ||
20 | <para> | |
21 | As the name suggest, this library works mainly with strings. However, in the context of this library, | |
22 | a string is not restricted to any particular implementation (like <code>std::basic_string</code>), | |
23 | rather it is a concept. This allows the algorithms in this library to be reused for any string type, | |
24 | that satisfies the given requirements. | |
25 | </para> | |
26 | <para> | |
27 | <emphasis role="bold">Definition:</emphasis> A string is a | |
28 | <ulink url="../../libs/range/index.html">range</ulink> of characters accessible in sequential | |
29 | ordered fashion. Character is any value type with "cheap" copying and assignment. | |
30 | </para> | |
31 | <para> | |
32 | First requirement of string-type is that it must accessible using | |
33 | <ulink url="../../libs/range/index.html">Boost.Range</ulink>. This facility allows to access | |
34 | the elements inside the string in a uniform iterator-based fashion. | |
35 | This is sufficient for our library | |
36 | </para> | |
37 | <para> | |
38 | Second requirement defines the way in which the characters are stored in the string. Algorithms in | |
39 | this library work with an assumption that copying a character is cheaper then allocating extra | |
40 | storage to cache results. This is a natural assumption for common character types. Algorithms will | |
41 | work even if this requirement is not satisfied, however at the cost of performance degradation. | |
42 | <para> | |
43 | </para> | |
44 | In addition some algorithms have additional requirements on the string-type. Particularly, it is required | |
45 | that an algorithm can create a new string of the given type. In this case, it is required that | |
46 | the type satisfies the sequence (Std §23.1.1) requirements. | |
47 | </para> | |
48 | <para> | |
49 | In the reference and also in the code, requirement on the string type is designated by the name of | |
50 | template argument. <code>RangeT</code> means that the basic range requirements must hold. | |
51 | <code>SequenceT</code> designates extended sequence requirements. | |
52 | </para> | |
53 | </section> | |
54 | ||
55 | <section id="string_algo.sequence_traits"> | |
56 | <title>Sequence Traits</title> | |
57 | ||
58 | <para> | |
59 | The major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces | |
60 | they provide, but rather in the inner details of the class and the way how it performs | |
61 | various operations. The problem is that it is not possible to infer this difference from the | |
62 | definitions of classes without some special mechanism. | |
63 | However, some algorithms can run significantly faster with the knowledge of the properties | |
64 | of a particular container. | |
65 | </para> | |
66 | <para> | |
67 | Sequence traits allow one to specify additional properties of a sequence container (see Std.§32.2). | |
68 | These properties are then used by algorithms to select optimized handling for some operations. | |
69 | The sequence traits are declared in the header | |
70 | <headername>boost/algorithm/string/sequence_traits.hpp</headername>. | |
71 | </para> | |
72 | ||
73 | <para> | |
74 | In the table C denotes a container and c is an object of C. | |
75 | </para> | |
76 | <table> | |
77 | <title>Sequence Traits</title> | |
78 | <tgroup cols="2" align="left"> | |
79 | <thead> | |
80 | <row> | |
81 | <entry>Trait</entry> | |
82 | <entry>Description</entry> | |
83 | </row> | |
84 | </thead> | |
85 | <tbody> | |
86 | <row> | |
87 | <entry><classname>has_native_replace<C></classname>::value</entry> | |
88 | <entry>Specifies that the sequence has std::string like replace method</entry> | |
89 | </row> | |
90 | <row> | |
91 | <entry><classname>has_stable_iterators<C></classname>::value</entry> | |
92 | <entry> | |
93 | Specifies that the sequence has stable iterators. It means, | |
94 | that operations like <code>insert</code>/<code>erase</code>/<code>replace</code> | |
95 | do not invalidate iterators. | |
96 | </entry> | |
97 | </row> | |
98 | <row> | |
99 | <entry><classname>has_const_time_insert<C></classname>::value</entry> | |
100 | <entry> | |
101 | Specifies that the insert method of the sequence has | |
102 | constant time complexity. | |
103 | </entry> | |
104 | </row> | |
105 | <row> | |
106 | <entry><classname>has_const_time_erase<C></classname>::value</entry> | |
107 | <entry> | |
108 | Specifies that the erase method of the sequence has constant time complexity | |
109 | </entry> | |
110 | </row> | |
111 | </tbody> | |
112 | </tgroup> | |
113 | </table> | |
114 | ||
115 | <para> | |
116 | Current implementation contains specializations for std::list<T> and | |
117 | std::basic_string<T> from the standard library and SGI's std::rope<T> and std::slist<T>. | |
118 | </para> | |
119 | </section> | |
120 | <section id="string_algo.find"> | |
121 | <title>Find Algorithms</title> | |
122 | ||
123 | <para> | |
124 | Find algorithms have similar functionality to <code>std::search()</code> algorithm. They provide a different | |
125 | interface which is more suitable for common string operations. | |
126 | Instead of returning just the start of matching subsequence they return a range which is necessary | |
127 | when the length of the matching subsequence is not known beforehand. | |
128 | This feature also allows a partitioning of the input sequence into three | |
129 | parts: a prefix, a substring and a suffix. | |
130 | </para> | |
131 | <para> | |
132 | Another difference is an addition of various searching methods besides find_first, including find_regex. | |
133 | </para> | |
134 | <para> | |
135 | It the library, find algorithms are implemented in terms of | |
136 | <link linkend="string_algo.finder_concept">Finders</link>. Finders are used also by other facilities | |
137 | (replace,split). | |
138 | For convenience, there are also function wrappers for these finders to simplify find operations. | |
139 | </para> | |
140 | <para> | |
141 | Currently the library contains only naive implementation of find algorithms with complexity | |
142 | O(n * m) where n is the size of the input sequence and m is the size of the search sequence. | |
143 | There are algorithms with complexity O(n), but for smaller sequence a constant overhead is | |
144 | rather big. For small m << n (m by magnitude smaller than n) the current implementation | |
145 | provides acceptable efficiency. | |
146 | Even the C++ standard defines the required complexity for search algorithm as O(n * m). | |
147 | It is possible that a future version of library will also contain algorithms with linear | |
148 | complexity as an option | |
149 | </para> | |
150 | </section> | |
151 | <section id="string_algo.replace"> | |
152 | <title>Replace Algorithms</title> | |
153 | ||
154 | <para> | |
155 | The implementation of replace algorithms follows the layered structure of the library. The | |
156 | lower layer implements generic substitution of a range in the input sequence. | |
157 | This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a | |
158 | <link linkend="string_algo.formatter_concept">Formatter</link> object as an input. These two | |
159 | functors define what to replace and what to replace it with. The upper layer functions | |
160 | are just wrapping calls to the lower layer. Finders are shared with the find and split facility. | |
161 | </para> | |
162 | <para> | |
163 | As usual, the implementation of the lower layer is designed to work with a generic sequence while | |
164 | taking advantage of specific features if possible | |
165 | (by using <link linkend="string_algo.sequence_traits">Sequence traits</link>) | |
166 | </para> | |
167 | </section> | |
168 | <section id="string_algo.split"> | |
169 | <title>Find Iterators & Split Algorithms</title> | |
170 | ||
171 | <para> | |
172 | Find iterators are a logical extension of the <link linkend="string_algo.find">find facility</link>. | |
173 | Instead of searching for one match, the whole input can be iteratively searched for multiple matches. | |
174 | The result of the search is then used to partition the input. It depends on the algorithms which parts | |
175 | are returned as the result. They can be the matching parts (<classname>find_iterator</classname>) of the parts in | |
176 | between (<classname>split_iterator</classname>). | |
177 | </para> | |
178 | <para> | |
179 | In addition the split algorithms like <functionname>find_all()</functionname> and <functionname>split()</functionname> | |
180 | can simplify the common operations. They use a find iterator to search the whole input and copy the | |
181 | matches they found into the supplied container. | |
182 | </para> | |
183 | </section> | |
184 | <section id="string_algo.exception"> | |
185 | <title>Exception Safety</title> | |
186 | ||
187 | <para> | |
188 | The library requires that all operations on types used as template | |
189 | or function arguments provide the <emphasis>basic exception-safety guarantee</emphasis>. | |
190 | In turn, all functions and algorithms in this library, except where stated | |
191 | otherwise, will provide the <emphasis>basic exception-safety guarantee</emphasis>. | |
192 | In other words: | |
193 | The library maintains its invariants and does not leak resources in | |
194 | the face of exceptions. Some library operations give stronger | |
195 | guarantees, which are documented on an individual basis. | |
196 | </para> | |
197 | ||
198 | <para> | |
199 | Some functions can provide the <emphasis>strong exception-safety guarantee</emphasis>. | |
200 | That means that following statements are true: | |
201 | <itemizedlist> | |
202 | <listitem> | |
203 | If an exception is thrown, there are no effects other than those | |
204 | of the function | |
205 | </listitem> | |
206 | <listitem> | |
207 | If an exception is thrown other than by the function, there are no effects | |
208 | </listitem> | |
209 | </itemizedlist> | |
210 | This guarantee can be provided under the condition that the operations | |
211 | on the types used for arguments for these functions either | |
212 | provide the strong exception guarantee or do not alter the global state . | |
213 | </para> | |
214 | <para> | |
215 | In the reference, under the term <emphasis>strong exception-safety guarantee</emphasis>, we mean the | |
216 | guarantee as defined above. | |
217 | </para> | |
218 | <para> | |
219 | For more information about the exception safety topics, follow this | |
220 | <ulink url="http://www.boost.org/more/generic_exception_safety.html">link</ulink> | |
221 | </para> | |
222 | </section> | |
223 | </section> |