1 <html>
2
3 <head>
4 <meta http-equiv="Content-Language" content="en-us">
5 <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
6 <meta name="ProgId" content="FrontPage.Editor.Document">
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <title>Boost Filesystem Library Design</title>
9 <link href="styles.css" rel="stylesheet">
10 </head>
11
12 <body bgcolor="#FFFFFF">
13
14 <h1>
15 <img border="0" src="../../../boost.png" align="center" width="277" height="86">Filesystem
16 Library Design</h1>
17
18 <p><a href="#Introduction">Introduction</a><br>
19 <a href="#Requirements">Requirements</a><br>
20 <a href="#Realities">Realities</a><br>
21 <a href="#Rationale">Rationale</a><br>
22 <a href="#Abandoned_Designs">Abandoned Designs</a><br>
23 <a href="#References">References</a></p>
24
25 <h2><a name="Introduction">Introduction</a></h2>
26
27 <p>The primary motivation for beginning work on the Filesystem Library was
28 frustration with Boost administrative tools.&nbsp; Scripts were written in
29 Python, Perl, Bash, and Windows command languages.&nbsp; There was no single
30 scripting language familiar and acceptable to all Boost administrators. Yet they
31 were all skilled C++ programmers - why couldn't C++ be used as the scripting
32 language?</p>
33
34 <p>The key feature C++ lacked for script-like applications was the ability to
35 perform portable filesystem operations on directories and their contents. The
36 Filesystem Library was developed to fill that void.</p>
37
38 <p>The intent is not to compete with traditional scripting languages, but to
39 provide a solution for situations where C++ is already the language
40 of choice.</p>
41
42 <h2><a name="Requirements">Requirements</a></h2>
43 <ul>
44 <li>Be able to write portable script-style filesystem operations in modern
45 C++.<br>
46 <br>
47 Rationale: This is a common programming need. It is both an
48 embarrassment and a hardship that this is not possible with either the current
49 C++ or Boost libraries.&nbsp; The need is particularly acute
50 when C++ is the only toolset allowed in the tool chain.&nbsp; File system
51 operations are provided by many languages&nbsp;used on multiple platforms,
52 such as Perl and Python, as well as by many platform specific scripting
53 languages. All operating systems provide some form of API for filesystem
54 operations, and the POSIX bindings are increasingly available even on
55 operating systems not normally associated with POSIX, such as the Mac, z/OS,
56 or OS/390.<br>
57 &nbsp;</li>
58 <li>Work within the <a href="#Realities">realities</a> described below.<br>
59 <br>
60 Rationale: This isn't a research project. The need is for something that works on
61 today's platforms, including some of the embedded operating systems
62 with limited file systems. Because of the emphasis on portability, such a
63 library would be much more useful if standardized. That means being able to
64 work with a much wider range of platforms than just Unix or Windows and their
65 clones.<br>
66 &nbsp;</li>
67 <li>Avoid dangerous programming practices, particularly all-too-easy-to-ignore error notifications
68 and the use of global variables.&nbsp;If a dangerous feature is provided, identify it as such.<br>
69 <br>
70 Rationale: Normally this would be covered by &quot;the usual Boost requirements...&quot;,
71 but it is mentioned explicitly because the equivalent native platform and
72 scripting language interfaces often depend on all-too-easy-to-ignore error
73 notifications and global variables like &quot;current
74 working directory&quot;.<br>
75 &nbsp;</li>
76 <li>Structure the library so that it is still useful even if some functionality
77 does not map well onto a given platform or directory tree. Particularly, much
78 useful functionality should be portable even to flat
79 (non-hierarchical) filesystems.<br>
80 <br>
81 Rationale: Much functionality which does not
82 require a hierarchical directory structure is still useful on flat-structure
83 filesystems.&nbsp; There are many systems, particularly embedded systems,
84 where even very limited functionality is still useful.</li>
85 </ul>
86 <ul>
87 <li>Interface smoothly with current C++ Standard Library input/output
88 facilities.&nbsp; For example, paths should be
89 easy to use in std::basic_fstream constructors.<br>
90 <br>
91 Rationale: One of the most common uses of file system functionality is to
92 manipulate paths for eventual use in input/output operations.&nbsp;
93 Thus the need to interface smoothly with standard library I/O.<br>
94 &nbsp;</li>
95 <li>Suitable for eventual standardization. The implication of this requirement
96 is that the interface be close to minimal, and that great care be taken
97 regarding portability.<br>
98 <br>
99 Rationale: The lack of file system operations is a serious hole
100 in the current standard, with no other known candidates to fill that hole.
101 Libraries with elaborate interfaces and difficult-to-port specifications are much less likely to be accepted for
102 standardization.<br>
103 &nbsp;</li>
104 <li>The usual Boost <a href="http://www.boost.org/more/lib_guide.htm">requirements and
105 guidelines</a> apply.<br>
106 &nbsp;</li>
107 <li>Encourage, but do not require, portability in path names.<br>
108 <br>
109 Rationale: For paths which originate from user input it is unreasonable to
110 require portable path syntax.<br>
111 &nbsp;</li>
112 <li>Avoid giving the illusion of portability where portability in fact does not
113 exist.<br>
114 <br>
115 Rationale: Leaving important behavior unspecified or &quot;implementation defined&quot; does a
116 great disservice to programmers using a library because it makes it appear
117 that code relying on the behavior is portable, when in fact there is nothing
118 portable about it. The only case where such under-specification is acceptable is when both users and implementors know from
119 other sources exactly what behavior is required, yet for some reason it isn't
120 possible to specify it exactly.</li>
121 </ul>
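<p>The requirement that paths interface smoothly with standard library I/O can be
sketched as follows. This is a minimal illustration, not the library's own code: it
assumes a C++17 compiler and uses <code>std::filesystem</code>, the standardized
descendant of this library, in place of the Boost headers. The function name
<code>write_then_read_line</code> is hypothetical.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;   // assumption: C++17 stand-in for boost::filesystem

// Hypothetical helper: writes `text` as one line to `file`, then reads the
// first line back. Returns an empty string if the file cannot be opened.
std::string write_then_read_line(const fs::path& file, const std::string& text) {
    assert(!file.empty());          // precondition: the caller supplies a real path
    {
        std::ofstream out(file);    // C++17 stream constructors accept a path directly
        if (!out) return {};
        out << text << '\n';
    }                               // closing the stream flushes the data
    std::ifstream in(file);
    std::string line;
    std::getline(in, line);
    return line;
}
```

<p>The path object is handed to the stream constructors unchanged; no conversion to
a character pointer is needed at the call site.</p>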
122 <h2><a name="Realities">Realities</a></h2>
123 <ul>
124 <li>Some operating systems have a single directory tree root, others have
125 multiple roots.<br>
126 &nbsp;</li>
127 <li>Some file systems provide both a long and short form of filenames.<br>
128 &nbsp;</li>
129 <li>Some file systems have different syntax for file paths and directory
130 paths.<br>
131 &nbsp;</li>
132 <li>Some file systems have different rules for valid file names and valid
133 directory names.<br>
134 &nbsp;</li>
135 <li>Some file systems (ISO-9660, level 1, for example) use very restricted
136 (so-called 8.3) file names.<br>
137 &nbsp;</li>
138 <li>Some operating systems allow file systems with different
139 characteristics to be &quot;mounted&quot; within a directory tree.&nbsp; Thus an
140 ISO-9660 or Windows
141 file system may end up as a sub-tree of a POSIX directory tree.<br>
142 &nbsp;</li>
143 <li>Wide-character versions of directory and file operations are available on some operating
144 systems, and not available on others.<br>
145 &nbsp;</li>
146 <li>There is no law that says directory hierarchies have to be specified in
147 terms of left-to-right descent from the root.<br>
148 &nbsp;</li>
149 <li>Some file systems have a concept of file &quot;version number&quot; or &quot;generation
150 number&quot;.&nbsp; Some don't.<br>
151 &nbsp;</li>
152 <li>Not all operating systems use single character separators in path names.&nbsp; Some use
153 paired notations. A typical fully-specified OpenVMS filename
154 might look something like this:<br>
155 <br>
156 <code>&nbsp;&nbsp; DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5<br>
157 </code><br>
158 The general OpenVMS format is:<br>
159 <br>
160 &nbsp;&nbsp;&nbsp;&nbsp;
161 <i>Device:[directories.dot.separated]filename.extension;version_number</i><br>
162 &nbsp;</li>
163 <li>For common file systems, determining if two descriptors are for the same
164 entity is extremely difficult or impossible.&nbsp; For example, the concept of
165 equality can be different for each portion of a path - some portions may be
166 case or locale sensitive, others not. Case sensitivity is a property of the
167 pathname itself, and not the platform. Determining collating sequence is even
168 worse.<br>
169 &nbsp;</li>
170 <li>Race-conditions may occur. Directory trees, directories, files, and file attributes are in effect shared between all threads, processes, and computers which have access to the
171 filesystem.&nbsp; That may well include computers on the other side of the
172 world or in orbit around the world. This implies that file system operations
173 may fail in unexpected ways.&nbsp;For example:<br>
174 <br>
175 <code>&nbsp;&nbsp;&nbsp;&nbsp; assert( exists(&quot;foo&quot;) == exists(&quot;foo&quot;) );
176 // may fail!<br>
177 &nbsp;&nbsp;&nbsp;&nbsp; assert( is_directory(&quot;foo&quot;) == is_directory(&quot;foo&quot;) );
178 // may fail!<br>
179 </code><br>
180 In the first example, the file may have been deleted between calls to
181 exists().&nbsp; In the second example, the file may have been deleted and then
182 replaced by a directory of the same name between the calls to is_directory().<br>
183 &nbsp;</li>
184 <li>Even though an application may be portable, it still will have to traffic
185 in system specific paths occasionally; user provided input is a common
186 example.<br>
187 &nbsp;</li>
188 <li><a name="symbolic-link-use-case">Symbolic</a> links cause the canonical and
189 normal forms of some paths to represent different files or directories. For
190 example, given the directory hierarchy <code>/a/b/c</code>, with a symbolic
191 link in <code>/a</code> named <code>x</code>&nbsp; pointing to <code>b/c</code>,
192 then under POSIX Pathname Resolution rules a path of <code>&quot;/a/x/..&quot;</code>
193 should resolve to <code>&quot;/a/b&quot;</code>. If <code>&quot;/a/x/..&quot;</code> were first
194 normalized to <code>&quot;/a&quot;</code>, it would resolve incorrectly. (Case supplied
195 by Walter Landry.)</li>
196 </ul>
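<p>The symbolic link case above can be reproduced with a short sketch. It assumes a
C++17 compiler, <code>std::filesystem</code> as a stand-in for the Boost headers, and
a POSIX-like system on which the process may create symbolic links. The names
<code>Resolution</code> and <code>resolve_both_ways</code> are hypothetical, and
<code>root</code> is expected to be a fresh scratch directory.</p>

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;   // assumption: C++17 stand-in for boost::filesystem

struct Resolution {
    fs::path lexical;   // result of purely textual normalization
    fs::path actual;    // result of resolving through the filesystem
};

// Hypothetical helper: builds <root>/a/b/c plus a symlink <root>/a/x -> b/c,
// then resolves "<root>/a/x/.." in both ways described in the text.
Resolution resolve_both_ways(const fs::path& root) {
    fs::create_directories(root / "a" / "b" / "c");
    fs::create_directory_symlink("b/c", root / "a" / "x");  // relative target
    assert(fs::is_symlink(root / "a" / "x"));

    const fs::path p = root / "a" / "x" / "..";
    return { p.lexically_normal(),   // never consults the filesystem
             fs::canonical(p) };     // follows the link first, then applies ".."
}
```

<p>With this layout the lexical result names <code>a</code>, while the resolved
result names <code>a/b</code>, which is why normalizing before resolution gives the
wrong answer.</p>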
197
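<p>One common response to the race conditions listed above is to attempt the
operation and inspect the outcome, rather than to test with <code>exists()</code>
first and act afterwards. The sketch below illustrates that idea under stated
assumptions: C++17 <code>std::filesystem</code> stands in for the Boost headers, and
<code>RemoveResult</code> and <code>remove_without_precheck</code> are hypothetical
names.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;   // assumption: C++17 stand-in for boost::filesystem

enum class RemoveResult { removed, was_absent, failed };

// Hypothetical helper: removes `p` without asking first whether it exists.
// An earlier exists() answer could already be stale when remove() runs.
RemoveResult remove_without_precheck(const fs::path& p) {
    assert(!p.empty());                      // precondition: an empty path is a caller bug
    std::error_code ec;
    const bool removed = fs::remove(p, ec);  // false with no error: nothing was there
    if (ec) return RemoveResult::failed;     // e.g. a directory that is not empty
    return removed ? RemoveResult::removed : RemoveResult::was_absent;
}
```

<p>The caller learns what actually happened at the moment of the call, which is the
only moment at which the answer is reliable.</p>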
198 <h2><a name="Rationale">Rationale</a></h2>
199
200 <p>The <a href="#Requirements">Requirements</a> and <a href="#Realities">
201 Realities</a> above drove much of the C++ interface design.&nbsp; In particular,
202 the desire to make script-like code straightforward caused a great deal of
203 effort to go into ensuring that apparently simple expressions like <i>exists( &quot;foo&quot;
204 )</i> work as expected.</p>
205
206 <p>See the <a href="faq.htm">FAQ</a> for the rationale behind many detailed
207 design decisions.</p>
208
209 <p>Several key insights went into the <i>path</i> class design:</p>
210 <ul>
211 <li>Decoupling of the input formats, internal conceptual (<i>vector&lt;string&gt;</i>
212 or other sequence)
213 model, and output formats.</li>
214 <li>Providing two input formats (generic and O/S specific) broke a major
215 design deadlock.</li>
216 <li>Providing several output formats solved another set of previously
217 intractable problems.</li>
218 <li>Several non-obvious functions (particularly decomposition and composition)
219 are required to support portable code. (Peter Dimov, Thomas Witt, Glen
220 Knowles, others.)</li>
221 </ul>
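<p>The sequence model, decomposition, and composition mentioned above can be
illustrated with a small sketch. It assumes C++17 <code>std::filesystem</code> as a
stand-in for the Boost headers; <code>elements</code> and <code>compose</code> are
hypothetical helper names, not library functions.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;   // assumption: C++17 stand-in for boost::filesystem

// Decomposition: view a path as the sequence of names it is made of.
std::vector<std::string> elements(const fs::path& p) {
    std::vector<std::string> out;
    for (const fs::path& e : p) out.push_back(e.generic_string());
    return out;
}

// Composition: build a path from names with operator/=.
fs::path compose(const std::vector<std::string>& parts) {
    fs::path p;
    for (const std::string& s : parts) {
        assert(!s.empty());   // an empty element would only add a trailing separator
        p /= s;
    }
    return p;
}
```

<p>For example, <code>elements("a/b/c.txt")</code> yields the three names
<code>a</code>, <code>b</code>, and <code>c.txt</code>, and <code>compose</code>
reverses the step. The generic output format uses <code>/</code> on every platform,
while <code>native()</code> returns the operating system's own form.</p>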
222
223 <p>Error checking was a particularly difficult area. One key insight was that
224 with file and directory names, portability isn't a universal truth.&nbsp;
225 Rather, the programmer must think out the question &quot;What operating systems do I
226 want this path to be portable to?&quot;&nbsp; By providing support for several
227 answers to that question, the Filesystem Library alerts programmers to the need
228 to ask it in the first place.</p>
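<p>What &quot;several answers to that question&quot; might look like in code is
sketched below. These checkers are hypothetical illustrations written for this
page, not the library's own functions. The first follows the POSIX portable filename
character set; the second applies the commonly documented Windows restrictions on
characters and trailing dots or spaces, and deliberately ignores reserved device
names.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical checker: non-empty, only characters from the POSIX portable
// filename character set, and no leading hyphen.
bool is_posix_portable_name(const std::string& name) {
    static const std::string valid =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-";
    return !name.empty() && name.front() != '-' &&
           name.find_first_not_of(valid) == std::string::npos;
}

// Hypothetical checker: rejects characters Windows reserves in file names,
// control characters, and names ending in a dot or a space.
// Reserved device names such as CON or NUL are not checked here.
bool is_windows_friendly_name(const std::string& name) {
    static const std::string reserved = "<>:\"/\\|?*";
    if (name.empty() || name.back() == '.' || name.back() == ' ') return false;
    for (unsigned char c : name) {
        if (c < 0x20 || reserved.find(static_cast<char>(c)) != std::string::npos)
            return false;
    }
    return true;
}

// Explicit, opt-in check for programs that target both families of systems.
void require_portable_name(const std::string& name) {
    assert(is_posix_portable_name(name) && is_windows_friendly_name(name));
}
```

<p>A name such as <code>foo bar</code> passes the second check but not the first,
which shows why the programmer has to decide which systems matter.</p>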
229 <h2><a name="Abandoned_Designs">Abandoned Designs</a></h2>
230 <h3>operations.hpp</h3>
231 <p>Dietmar Kühl's original dir_it design and implementation supported
232 wide-character file and directory names. It was abandoned after extensive
233 discussions among Library Working Group members failed to identify portable
234 semantics for wide-character names on systems not providing native support. See
235 <a href="faq.htm#wide-character_names">FAQ</a>.</p>
236 <p>Previous iterations of the interface design used explicitly named functions providing a
237 large number of convenience operations, with no compile-time or run-time
238 options. There were so many function names that they were very confusing to use,
239 and the interface was much larger. Any benefits seemed theoretical rather than
240 real. </p>
241 <p>Designs based on compile time (rather than runtime) flag and option selection
242 (via policy, enum, or int template parameters) became so complicated that they
243 were abandoned, often after investing quite a bit of time and effort. The need
244 to qualify attribute or option names with namespaces, even aliases, made use in
245 template parameters ugly; that wasn't fully appreciated until actually writing
246 real code.</p>
247 <p>Yet another set of convenience functions (for example, <i>remove</i> with
248 permissive, prune, recurse, and other options, plus predicate, and possibly
249 other, filtering features) were abandoned because the details became both
250 complex and contentious.</p>
251
252 <p>What is left is a toolkit of low-level operations from which the user can
253 create more complex convenience operations, plus a very small number of
254 convenience functions which were found to be useful enough to justify inclusion.</p>
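<p>As an illustration of building a convenience operation from the low-level
toolkit, the sketch below counts directory entries that satisfy a caller-supplied
predicate. It assumes C++17 <code>std::filesystem</code> as a stand-in for the Boost
headers, and <code>count_entries_if</code> is a hypothetical name rather than a
library function.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <system_error>

namespace fs = std::filesystem;   // assumption: C++17 stand-in for boost::filesystem

// Hypothetical convenience operation: walks everything below `root` and counts
// the entries for which `pred` returns true. A missing or unreadable root
// yields 0; an error during traversal stops the walk early.
std::size_t count_entries_if(
        const fs::path& root,
        const std::function<bool(const fs::directory_entry&)>& pred) {
    assert(pred && "predicate must be callable");
    std::size_t n = 0;
    std::error_code ec;
    fs::recursive_directory_iterator it(root, ec);
    const fs::recursive_directory_iterator end;
    while (!ec && it != end) {
        if (pred(*it)) ++n;
        it.increment(ec);   // non-throwing advance
    }
    return n;
}
```

<p>Filtering, recursion, and error policy stay in user code, where they can be as
simple or as elaborate as the program needs.</p>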
255
256 <h3>path.hpp</h3>
257
258 <p>There were so many abandoned path designs, I've lost track. Policy-based
259 class templates in several flavors, constructor supplied runtime policies,
260 operation specific runtime policies, they were all considered, often
261 implemented, and ultimately abandoned as far too complicated for any small
262 benefits observed.</p>
263
264 <p>Additional design considerations apply to <a href="v3_design.html">Internationalization</a>. </p>
265
266 <h3>error checking</h3>
267
268 <p>A number of designs for the error checking machinery were abandoned, some
269 after experiments with implementations. Totally automatic error checking was
270 attempted in particular. But automatic error checking tended to make the overall
271 library design much more complicated.</p>
272
273 <p>Some designs associated error checking mechanisms with paths.&nbsp; Some with
274 operations functions.&nbsp; A policy-based error checking template design was
275 partially implemented, then abandoned as too complicated for everyday
276 script-like programs.</p>
277
278 <p>The final design, which depends partially on explicit error checking function
279 calls,&nbsp; is much simpler and more straightforward, although it does depend to
280 some extent on programmer discipline.&nbsp; But it should allow programmers who
281 are concerned about portability to be reasonably sure that their programs will
282 work correctly on their choice of target systems.</p>
283
284 <h2><a name="References">References</a></h2>
285
286 <table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
287 <tr>
288 <td width="13%" valign="top">[<a name="IBM-01">IBM-01</a>]</td>
289 <td width="87%">IBM Corporation, <i>z/OS V1R3.0 C/C++ Run-Time
290 Library Reference</i>, SA22-7821-02, 2001,
291 <a href="http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/">
292 www-1.ibm.com/servers/eserver/zseries/zos/bkserv/</a></td>
293 </tr>
294 <tr>
295 <td width="13%" valign="top">[<a name="ISO-9660">ISO-9660</a>]</td>
296 <td width="87%">International Organization for Standardization, ISO 9660:1988</td>
297 </tr>
298 <tr>
299 <td width="13%" valign="top">[<a name="Kuhn">Kuhn</a>]</td>
300 <td width="87%">UTF-8 and Unicode FAQ for Unix/Linux,
301 <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">
302 www.cl.cam.ac.uk/~mgk25/unicode.html</a></td>
303 </tr>
304 <tr>
305 <td width="13%" valign="top">[<a name="MSDN">MSDN</a>] </td>
306 <td width="87%">Microsoft Platform SDK for Windows, Storage Start
307 Page,
308 <a href="http://msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp">
309 msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp</a></td>
310 </tr>
311 <tr>
312 <td width="13%" valign="top">[<a name="POSIX-01">POSIX-01</a>]</td>
313 <td width="87%">IEEE&nbsp;Std&nbsp;1003.1-2001, ISO/IEC 9945:2002, and The Open Group Base Specifications, Issue 6. Also known as The
314 Single Unix<font face="Times New Roman">® Specification, Version 3.
315 Available from each of the organizations involved in its creation. For
316 example, read online or download from
317 <a href="http://www.unix.org/single_unix_specification/">
318 www.unix.org/single_unix_specification/</a>.</font> The ISO JTC1/SC22/WG15 - POSIX
319 homepage is <a href="http://www.open-std.org/jtc1/sc22/WG15/">
320 www.open-std.org/jtc1/sc22/WG15/</a></td>
321 </tr>
322 <tr>
323 <td width="13%" valign="top">[<a name="URI">URI</a>]</td>
324 <td width="87%">RFC-2396, Uniform Resource Identifiers (URI): Generic
325 Syntax, <a href="http://www.ietf.org/rfc/rfc2396.txt">
326 www.ietf.org/rfc/rfc2396.txt</a></td>
327 </tr>
328 <tr>
329 <td width="13%" valign="top">[<a name="UTF-16">UTF-16</a>]</td>
330 <td width="87%">Wikipedia, UTF-16,
331 <a href="http://en.wikipedia.org/wiki/UTF-16">
332 en.wikipedia.org/wiki/UTF-16</a></td>
333 </tr>
334 <tr>
335 <td width="13%" valign="top">[<a name="Wulf-Shaw-73">Wulf-Shaw-73</a>]</td>
336 <td width="87%">William Wulf, Mary Shaw, <i>Global
337 Variable Considered Harmful</i>, ACM SIGPLAN Notices, 8, 2, 1973, pp. 23-34</td>
338 </tr>
339 </table>
340
341 <hr>
342 <p>Revised
343 <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->26 December, 2014<!--webbot bot="Timestamp" endspan i-checksum="38646" --></p>
344
345 <p>&copy; Copyright Beman Dawes, 2002</p>
346 <p> Use, modification, and distribution are subject to the Boost Software
347 License, Version 1.0. (See accompanying file <a href="../../../LICENSE_1_0.txt">
348 LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt">
349 www.boost.org/LICENSE_1_0.txt</a>)</p>
350
351 </body>
352
353 </html>