[ceph.git] / ceph / src / boost / libs / algorithm / string / doc / design.xml

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">


<!-- Copyright (c) 2002-2006 Pavol Droba.
     Subject to the Boost Software License, Version 1.0. 
     (See accompanying file LICENSE_1_0.txt or  http://www.boost.org/LICENSE_1_0.txt)
-->

<section id="string_algo.design" last-revision="$Date$">
    <title>Design Topics</title>

    <using-namespace name="boost"/>
    <using-namespace name="boost::algorithm"/>

    <section id="string_algo.string">
        <title>String Representation</title>

        <para>
            As the name suggest, this library works mainly with strings. However, in the context of this library,
            a string is not restricted to any particular implementation (like <code>std::basic_string</code>),
            rather it is a concept. This allows the algorithms in this library to be reused for any string type,
            that satisfies the given requirements.
        </para>
        <para>
            <emphasis role="bold">Definition:</emphasis> A string is a 
            <ulink url="../../libs/range/index.html">range</ulink> of characters accessible in sequential
            ordered fashion. Character is any value type with "cheap" copying and assignment.                
        </para>
        <para>
            First requirement of string-type is that it must accessible using 
            <ulink url="../../libs/range/index.html">Boost.Range</ulink>. This facility allows to access
            the elements inside the string in a uniform iterator-based fashion. 
            This is sufficient for our library
        </para>
        <para>            
            Second requirement defines the way in which the characters are stored in the string. Algorithms in 
            this library work with an assumption that copying a character is cheaper then allocating extra 
            storage to cache results. This is a natural assumption for common character types. Algorithms will 
            work even if this requirement is not satisfied, however at the cost of performance degradation.
        <para>
        </para>
            In addition some algorithms have additional requirements on the string-type. Particularly, it is required
            that an algorithm can create a new string of the given type. In this case, it is required that
            the type satisfies the sequence (Std &sect;23.1.1) requirements.
        </para>
        <para>
            In the reference and also in the code, requirement on the string type is designated by the name of
            template argument. <code>RangeT</code> means that the basic range requirements must hold.
            <code>SequenceT</code> designates extended sequence requirements.
        </para>
    </section>
    
    <section id="string_algo.sequence_traits">
        <title>Sequence Traits</title>

        <para>
            The major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
            they provide, but rather in the inner details of the class and the way how it performs 
            various operations. The problem is that it is not possible to infer this difference from the 
            definitions of classes without some special mechanism.
            However, some algorithms can run significantly faster with the knowledge of the properties
            of a particular container.
        </para>
        <para>
            Sequence traits allow one to specify additional properties of a sequence container (see Std.&sect;32.2).
            These properties are then used by algorithms to select optimized handling for some operations.
            The sequence traits are declared in the header 
            <headername>boost/algorithm/string/sequence_traits.hpp</headername>.
        </para>

        <para>
            In the table C denotes a container and c is an object of C.
        </para>
        <table>
            <title>Sequence Traits</title>
            <tgroup cols="2" align="left">
                <thead>
                    <row>   
                        <entry>Trait</entry>
                        <entry>Description</entry>
                    </row>
                </thead>
                <tbody>
                    <row>
                        <entry><classname>has_native_replace&lt;C&gt;</classname>::value</entry>
                        <entry>Specifies that the sequence has std::string like replace method</entry>
                    </row>
                    <row>
                        <entry><classname>has_stable_iterators&lt;C&gt;</classname>::value</entry>
                        <entry>
                            Specifies that the sequence has stable iterators. It means,
                            that operations like <code>insert</code>/<code>erase</code>/<code>replace</code> 
                            do not invalidate iterators.
                        </entry>
                    </row>
                    <row>
                        <entry><classname>has_const_time_insert&lt;C&gt;</classname>::value</entry>
                        <entry>
                            Specifies that the insert method of the sequence has 
                            constant time complexity.
                        </entry>
                    </row>
                    <row>
                        <entry><classname>has_const_time_erase&lt;C&gt;</classname>::value</entry>
                        <entry>
                            Specifies that the erase method of the sequence has constant time complexity
                        </entry>
                    </row>
                    </tbody>
            </tgroup>
        </table>
        
        <para>
            Current implementation contains specializations for std::list&lt;T&gt; and
            std::basic_string&lt;T&gt; from the standard library and SGI's std::rope&lt;T&gt; and std::slist&lt;T&gt;.
        </para>
    </section>
    <section id="string_algo.find">
        <title>Find Algorithms</title>

        <para>
            Find algorithms have similar functionality to <code>std::search()</code> algorithm. They provide a different 
            interface which is more suitable for common string operations. 
            Instead of returning just the start of matching subsequence they return a range which is necessary 
            when the length of the matching subsequence is not known beforehand. 
            This feature also allows a partitioning of  the input sequence into three 
            parts: a prefix, a substring and a suffix. 
        </para>
        <para>
            Another difference is an addition of various searching methods besides find_first, including find_regex. 
        </para>
        <para>
            It the library, find algorithms are implemented in terms of 
            <link linkend="string_algo.finder_concept">Finders</link>. Finders are used also by other facilities 
            (replace,split).
            For convenience, there are also function wrappers for these finders to simplify find operations.
        </para>
        <para>
            Currently the library contains only naive implementation of find algorithms with complexity 
            O(n * m) where n is the size of the input sequence and m is the size of the search sequence. 
            There are algorithms with complexity O(n), but for smaller sequence a constant overhead is 
            rather big. For small m &lt;&lt; n (m by magnitude smaller than n) the current implementation 
            provides acceptable efficiency. 
            Even the C++ standard defines the required complexity for search algorithm as O(n * m). 
            It is possible that a future version of library will also contain algorithms with linear 
            complexity as an option
        </para>
    </section>
    <section id="string_algo.replace">
        <title>Replace Algorithms</title>

        <para>
            The implementation of replace algorithms follows the layered structure of the library. The 
            lower layer implements generic substitution of a range in the input sequence. 
            This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a 
            <link linkend="string_algo.formatter_concept">Formatter</link> object as an input. These two 
            functors define what to replace and what to replace it with. The upper layer functions 
            are just wrapping calls to the lower layer. Finders are shared with the find and split facility. 
        </para>
        <para>
            As usual, the implementation of the lower layer is designed to work with a generic sequence while
            taking advantage of specific features if possible 
            (by using <link linkend="string_algo.sequence_traits">Sequence traits</link>)
        </para>         
    </section>
    <section id="string_algo.split">
        <title>Find Iterators &amp; Split Algorithms</title>

        <para>
            Find iterators are a logical extension of the <link linkend="string_algo.find">find facility</link>.
            Instead of searching for one match, the whole input can be iteratively searched for multiple matches.
            The result of the search is then used to partition the input. It depends on the algorithms which parts 
            are returned as the result. They can be the matching parts (<classname>find_iterator</classname>) of the parts in
            between (<classname>split_iterator</classname>). 
        </para>
        <para>
            In addition the split algorithms like <functionname>find_all()</functionname> and <functionname>split()</functionname>
            can simplify the common operations. They use a find iterator to search the whole input and copy the 
            matches they found into the supplied container.
        </para>
    </section>
    <section id="string_algo.exception">
        <title>Exception Safety</title>

        <para>
            The library requires that all operations on types used as template
            or function arguments provide the <emphasis>basic exception-safety guarantee</emphasis>. 
            In turn, all functions and algorithms in this library, except where stated
            otherwise, will provide the <emphasis>basic exception-safety guarantee</emphasis>.
            In other words:
            The library maintains its invariants and does not leak resources in
            the face of exceptions.  Some library operations give stronger
            guarantees, which are documented on an individual basis.
        </para>
        
        <para>
            Some functions can provide the <emphasis>strong exception-safety guarantee</emphasis>.
            That means that following statements are true:    
            <itemizedlist>
                <listitem>
                    If an exception is thrown, there are no effects other than those
                    of the function 
                </listitem>
                <listitem>
                    If an exception is thrown other than by the function, there are no effects
                </listitem>
            </itemizedlist>
            This guarantee can be provided under the condition that the operations 
            on the types used for arguments for these functions either
            provide the strong exception guarantee or do not alter the global state .
         </para>
        <para>
            In the reference, under the term <emphasis>strong exception-safety guarantee</emphasis>, we mean the
            guarantee as defined above.            
        </para>
        <para>
            For more information about the exception safety topics, follow this 
            <ulink url="http://www.boost.org/more/generic_exception_safety.html">link</ulink>
        </para>        
    </section>
</section>
Commit	Line	Data
7c673cae FG	1	<?xml version="1.0" encoding="utf-8"?>
	2	<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
	3	"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
	4
	5
	6	<!-- Copyright (c) 2002-2006 Pavol Droba.
	7	Subject to the Boost Software License, Version 1.0.
	8	(See accompanying file LICENSE_1_0.txt or http://www.boost.org/LICENSE_1_0.txt)
	9	-->
	10
	11	<section id="string_algo.design" last-revision="$Date$">
	12	<title>Design Topics</title>
	13
	14	<using-namespace name="boost"/>
	15	<using-namespace name="boost::algorithm"/>
	16
	17	<section id="string_algo.string">
	18	<title>String Representation</title>
	19
	20	<para>
	21	As the name suggest, this library works mainly with strings. However, in the context of this library,
	22	a string is not restricted to any particular implementation (like <code>std::basic_string</code>),
	23	rather it is a concept. This allows the algorithms in this library to be reused for any string type,
	24	that satisfies the given requirements.
	25	</para>
	26	<para>
	27	<emphasis role="bold">Definition:</emphasis> A string is a
	28	<ulink url="../../libs/range/index.html">range</ulink> of characters accessible in sequential
	29	ordered fashion. Character is any value type with "cheap" copying and assignment.
	30	</para>
	31	<para>
	32	First requirement of string-type is that it must accessible using
	33	<ulink url="../../libs/range/index.html">Boost.Range</ulink>. This facility allows to access
	34	the elements inside the string in a uniform iterator-based fashion.
	35	This is sufficient for our library
	36	</para>
	37	<para>
	38	Second requirement defines the way in which the characters are stored in the string. Algorithms in
	39	this library work with an assumption that copying a character is cheaper then allocating extra
	40	storage to cache results. This is a natural assumption for common character types. Algorithms will
	41	work even if this requirement is not satisfied, however at the cost of performance degradation.
	42	<para>
	43	</para>
	44	In addition some algorithms have additional requirements on the string-type. Particularly, it is required
	45	that an algorithm can create a new string of the given type. In this case, it is required that
	46	the type satisfies the sequence (Std §23.1.1) requirements.
	47	</para>
	48	<para>
	49	In the reference and also in the code, requirement on the string type is designated by the name of
	50	template argument. <code>RangeT</code> means that the basic range requirements must hold.
	51	<code>SequenceT</code> designates extended sequence requirements.
	52	</para>
	53	</section>
	54
	55	<section id="string_algo.sequence_traits">
	56	<title>Sequence Traits</title>
	57
	58	<para>
	59	The major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
	60	they provide, but rather in the inner details of the class and the way how it performs
	61	various operations. The problem is that it is not possible to infer this difference from the
	62	definitions of classes without some special mechanism.
	63	However, some algorithms can run significantly faster with the knowledge of the properties
	64	of a particular container.
65	</para>
66	<para>
67	Sequence traits allow one to specify additional properties of a sequence container (see Std.§32.2).
68	These properties are then used by algorithms to select optimized handling for some operations.
69	The sequence traits are declared in the header
70	<headername>boost/algorithm/string/sequence_traits.hpp</headername>.
71	</para>
72
73	<para>
74	In the table C denotes a container and c is an object of C.
75	</para>
76	<table>
77	<title>Sequence Traits</title>
78	<tgroup cols="2" align="left">
79	<thead>
80	<row>
81	<entry>Trait</entry>
82	<entry>Description</entry>
83	</row>
84	</thead>
85	<tbody>
86	<row>
87	<entry><classname>has_native_replace<C></classname>::value</entry>
88	<entry>Specifies that the sequence has std::string like replace method</entry>
89	</row>
90	<row>
91	<entry><classname>has_stable_iterators<C></classname>::value</entry>
92	<entry>
93	Specifies that the sequence has stable iterators. It means,
94	that operations like <code>insert</code>/<code>erase</code>/<code>replace</code>
95	do not invalidate iterators.
96	</entry>
97	</row>
98	<row>
99	<entry><classname>has_const_time_insert<C></classname>::value</entry>
100	<entry>
101	Specifies that the insert method of the sequence has
102	constant time complexity.
103	</entry>
104	</row>
105	<row>
106	<entry><classname>has_const_time_erase<C></classname>::value</entry>
107	<entry>
108	Specifies that the erase method of the sequence has constant time complexity
109	</entry>
110	</row>
111	</tbody>
112	</tgroup>
113	</table>
114
115	<para>
116	Current implementation contains specializations for std::list<T> and
117	std::basic_string<T> from the standard library and SGI's std::rope<T> and std::slist<T>.
118	</para>
119	</section>
120	<section id="string_algo.find">
121	<title>Find Algorithms</title>
122
123	<para>
124	Find algorithms have similar functionality to <code>std::search()</code> algorithm. They provide a different
125	interface which is more suitable for common string operations.
126	Instead of returning just the start of matching subsequence they return a range which is necessary
127	when the length of the matching subsequence is not known beforehand.
128	This feature also allows a partitioning of the input sequence into three
129	parts: a prefix, a substring and a suffix.
130	</para>
131	<para>
132	Another difference is an addition of various searching methods besides find_first, including find_regex.
133	</para>
134	<para>
135	It the library, find algorithms are implemented in terms of
136	<link linkend="string_algo.finder_concept">Finders</link>. Finders are used also by other facilities
137	(replace,split).
138	For convenience, there are also function wrappers for these finders to simplify find operations.
139	</para>
140	<para>
141	Currently the library contains only naive implementation of find algorithms with complexity
142	O(n * m) where n is the size of the input sequence and m is the size of the search sequence.
143	There are algorithms with complexity O(n), but for smaller sequence a constant overhead is
144	rather big. For small m << n (m by magnitude smaller than n) the current implementation
145	provides acceptable efficiency.
146	Even the C++ standard defines the required complexity for search algorithm as O(n * m).
147	It is possible that a future version of library will also contain algorithms with linear
148	complexity as an option
149	</para>
150	</section>
151	<section id="string_algo.replace">
152	<title>Replace Algorithms</title>
153
154	<para>
155	The implementation of replace algorithms follows the layered structure of the library. The
156	lower layer implements generic substitution of a range in the input sequence.
157	This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a
158	<link linkend="string_algo.formatter_concept">Formatter</link> object as an input. These two
159	functors define what to replace and what to replace it with. The upper layer functions
160	are just wrapping calls to the lower layer. Finders are shared with the find and split facility.
161	</para>
162	<para>
163	As usual, the implementation of the lower layer is designed to work with a generic sequence while
164	taking advantage of specific features if possible
165	(by using <link linkend="string_algo.sequence_traits">Sequence traits</link>)
166	</para>
167	</section>
168	<section id="string_algo.split">
169	<title>Find Iterators & Split Algorithms</title>
170
171	<para>
172	Find iterators are a logical extension of the <link linkend="string_algo.find">find facility</link>.
173	Instead of searching for one match, the whole input can be iteratively searched for multiple matches.
174	The result of the search is then used to partition the input. It depends on the algorithms which parts
175	are returned as the result. They can be the matching parts (<classname>find_iterator</classname>) of the parts in
176	between (<classname>split_iterator</classname>).
177	</para>
178	<para>
179	In addition the split algorithms like <functionname>find_all()</functionname> and <functionname>split()</functionname>
180	can simplify the common operations. They use a find iterator to search the whole input and copy the
181	matches they found into the supplied container.
182	</para>
183	</section>
184	<section id="string_algo.exception">
185	<title>Exception Safety</title>
186
187	<para>
188	The library requires that all operations on types used as template
189	or function arguments provide the <emphasis>basic exception-safety guarantee</emphasis>.
190	In turn, all functions and algorithms in this library, except where stated
191	otherwise, will provide the <emphasis>basic exception-safety guarantee</emphasis>.
192	In other words:
193	The library maintains its invariants and does not leak resources in
194	the face of exceptions. Some library operations give stronger
195	guarantees, which are documented on an individual basis.
196	</para>
197
198	<para>
199	Some functions can provide the <emphasis>strong exception-safety guarantee</emphasis>.
200	That means that following statements are true:
201	<itemizedlist>
202	<listitem>
203	If an exception is thrown, there are no effects other than those
204	of the function
205	</listitem>
206	<listitem>
207	If an exception is thrown other than by the function, there are no effects
208	</listitem>
209	</itemizedlist>
210	This guarantee can be provided under the condition that the operations
211	on the types used for arguments for these functions either
212	provide the strong exception guarantee or do not alter the global state .
213	</para>
214	<para>
215	In the reference, under the term <emphasis>strong exception-safety guarantee</emphasis>, we mean the
216	guarantee as defined above.
217	</para>
218	<para>
219	For more information about the exception safety topics, follow this
220	<ulink url="http://www.boost.org/more/generic_exception_safety.html">link</ulink>
221	</para>
222	</section>
223	</section>