Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <html> |
2 | ||
3 | <head> | |
4 | <meta http-equiv="Content-Language" content="en-us"> | |
5 | <meta name="GENERATOR" content="Microsoft FrontPage 5.0"> | |
6 | <meta name="ProgId" content="FrontPage.Editor.Document"> | |
7 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |
8 | <title>Boost Filesystem Library Design</title> | |
9 | <link href="styles.css" rel="stylesheet"> | |
10 | </head> | |
11 | ||
12 | <body bgcolor="#FFFFFF"> | |
13 | ||
14 | <h1> | |
15 | <img border="0" src="../../../boost.png" align="center" width="277" height="86">Filesystem | |
16 | Library Design</h1> | |
17 | ||
18 | <p><a href="#Introduction">Introduction</a><br> | |
19 | <a href="#Requirements">Requirements</a><br> | |
20 | <a href="#Realities">Realities</a><br> | |
21 | <a href="#Rationale">Rationale</a><br> | |
22 | <a href="#Abandoned_Designs">Abandoned Designs</a><br> | |
23 | <a href="#References">References</a></p> | |
24 | ||
25 | <h2><a name="Introduction">Introduction</a></h2> | |
26 | ||
27 | <p>The primary motivation for beginning work on the Filesystem Library was | |
28 | frustration with Boost administrative tools. Scripts were written in | |
29 | Python, Perl, Bash, and Windows command languages. There was no single | |
30 | scripting language familiar and acceptable to all Boost administrators. Yet they | |
31 | were all skilled C++ programmers - why couldn't C++ be used as the scripting | |
32 | language?</p> | |
33 | ||
34 | <p>The key feature C++ lacked for script-like applications was the ability to | |
35 | perform portable filesystem operations on directories and their contents. The | |
36 | Filesystem Library was developed to fill that void.</p> | |
37 | ||
38 | <p>The intent is not to compete with traditional scripting languages, but to | |
39 | provide a solution for situations where C++ is already the language | |
40 | of choice.</p> | |
41 | ||
42 | <h2><a name="Requirements">Requirements</a></h2> | |
43 | <ul> | |
44 | <li>Be able to write portable script-style filesystem operations in modern | |
45 | C++.<br> | |
46 | <br> | |
47 | Rationale: This is a common programming need. It is both an | |
48 | embarrassment and a hardship that this is not possible with either the current | |
49 | C++ or Boost libraries. The need is particularly acute | |
50 | when C++ is the only toolset allowed in the tool chain. File system | |
51 | operations are provided by many languages used on multiple platforms, | |
52 | such as Perl and Python, as well as by many platform specific scripting | |
53 | languages. All operating systems provide some form of API for filesystem | |
54 | operations, and the POSIX bindings are increasingly available even on | |
55 | operating systems not normally associated with POSIX, such as the Mac, z/OS, | |
56 | or OS/390.<br> | |
57 | </li> | |
58 | <li>Work within the <a href="#Realities">realities</a> described below.<br> | |
59 | <br> | |
60 | Rationale: This isn't a research project. The need is for something that works on | |
61 | today's platforms, including some of the embedded operating systems | |
62 | with limited file systems. Because of the emphasis on portability, such a | |
63 | library would be much more useful if standardized. That means being able to | |
64 | work with a much wider range of platforms than just Unix or Windows and their | |
65 | clones.<br> | |
66 | </li> | |
67 | <li>Avoid dangerous programming practices, particularly all-too-easy-to-ignore error notifications | |
68 | and the use of global variables. If a dangerous feature is provided, identify it as such.<br> | |
69 | <br> | |
70 | Rationale: Normally this would be covered by "the usual Boost requirements...", | |
71 | but it is mentioned explicitly because the equivalent native platform and | |
72 | scripting language interfaces often depend on all-too-easy-to-ignore error | |
73 | notifications and global variables like "current | |
74 | working directory".<br> | |
75 | </li> | |
76 | <li>Structure the library so that it is still useful even if some functionality | |
77 | does not map well onto a given platform or directory tree. Particularly, much | |
78 | useful functionality should be portable even to flat | |
79 | (non-hierarchical) filesystems.<br> | |
80 | <br> | |
81 | Rationale: Much functionality which does not | |
82 | require a hierarchical directory structure is still useful on flat-structure | |
83 | filesystems. There are many systems, particularly embedded systems, | |
84 | where even very limited functionality is still useful.</li> | |
85 | </ul> | |
86 | <ul> | |
87 | <li>Interface smoothly with current C++ Standard Library input/output | |
88 | facilities. For example, paths should be | |
89 | easy to use in std::basic_fstream constructors.<br> | |
90 | <br> | |
91 | Rationale: One of the most common uses of file system functionality is to | |
92 | manipulate paths for eventual use in input/output operations. | |
93 | Thus the need to interface smoothly with standard library I/O.<br> | |
94 | </li> | |
95 | <li>Suitable for eventual standardization. The implication of this requirement | |
96 | is that the interface be close to minimal, and that great care be taken | |
97 | regarding portability.<br> | |
98 | <br> | |
99 | Rationale: The lack of file system operations is a serious hole | |
100 | in the current standard, with no other known candidates to fill that hole. | |
101 | Libraries with elaborate interfaces and difficult-to-port specifications are much less likely to be accepted for | |
102 | standardization.<br> | |
103 | </li> | |
104 | <li>The usual Boost <a href="http://www.boost.org/more/lib_guide.htm">requirements and | |
105 | guidelines</a> apply.<br> | |
106 | </li> | |
107 | <li>Encourage, but do not require, portability in path names.<br> | |
108 | <br> | |
109 | Rationale: For paths which originate from user input it is unreasonable to | |
110 | require portable path syntax.<br> | |
111 | </li> | |
112 | <li>Avoid giving the illusion of portability where portability in fact does not | |
113 | exist.<br> | |
114 | <br> | |
115 | Rationale: Leaving important behavior unspecified or "implementation defined" does a | |
116 | great disservice to programmers using a library because it makes it appear | |
117 | that code relying on the behavior is portable, when in fact there is nothing | |
118 | portable about it. The only case where such under-specification is acceptable is when both users and implementors know from | |
119 | other sources exactly what behavior is required, yet for some reason it isn't | |
120 | possible to specify it exactly.</li> | |
121 | </ul> | |
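<p>The requirement that paths be easy to use with standard library I/O can be illustrated with a minimal sketch. It assumes C++17 <code>std::filesystem</code>, the standardized descendant of this library, rather than the Boost interface itself; the helper name <code>write_then_read_line</code> is hypothetical and exists only for illustration.</p>

```cpp
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;  // assumption: C++17 std::filesystem, not boost::filesystem

// Hypothetical helper: writes `text` as one line to the file named by `p`,
// then reopens the file and returns the first line read back.
inline std::string write_then_read_line(const fs::path& p, const std::string& text) {
    {
        std::ofstream out(p);  // since C++17 the stream constructors accept a path directly
        if (!out) {
            // Report failure with an exception rather than an easy-to-ignore status flag.
            throw std::runtime_error("cannot open for writing: " + p.string());
        }
        out << text << '\n';
    }  // `out` is flushed and closed here

    std::ifstream in(p);
    if (!in) {
        throw std::runtime_error("cannot open for reading: " + p.string());
    }
    std::string line;
    std::getline(in, line);
    return line;
}
```

<p>A caller might pass <code>fs::temp_directory_path() / "example.txt"</code> as the path; no conversion to a character string is needed before the stream is opened.</p>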
122 | <h2><a name="Realities">Realities</a></h2> | |
123 | <ul> | |
124 | <li>Some operating systems have a single directory tree root, others have | |
125 | multiple roots.<br> | |
126 | </li> | |
127 | <li>Some file systems provide both a long and short form of filenames.<br> | |
128 | </li> | |
129 | <li>Some file systems have different syntax for file paths and directory | |
130 | paths.<br> | |
131 | </li> | |
132 | <li>Some file systems have different rules for valid file names and valid | |
133 | directory names.<br> | |
134 | </li> | |
135 | <li>Some file systems (ISO-9660, level 1, for example) use very restricted | |
136 | (so-called 8.3) file names.<br> | |
137 | </li> | |
138 | <li>Some operating systems allow file systems with different | |
139 | characteristics to be "mounted" within a directory tree.  Thus an | |
140 | ISO-9660 or Windows | |
141 | file system may end up as a sub-tree of a POSIX directory tree.<br> | |
142 | </li> | |
143 | <li>Wide-character versions of directory and file operations are available on some operating | |
144 | systems, and not available on others.<br> | |
145 | </li> | |
146 | <li>There is no law that says directory hierarchies have to be specified in | |
147 | terms of left-to-right descent from the root.<br> | |
148 | </li> | |
149 | <li>Some file systems have a concept of file "version number" or "generation | |
150 | number". Some don't.<br> | |
151 | </li> | |
152 | <li>Not all operating systems use single character separators in path names. Some use | |
153 | paired notations. A typical fully-specified OpenVMS filename | |
154 | might look something like this:<br> | |
155 | <br> | |
156 | <code> DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5<br> | |
157 | </code><br> | |
158 | The general OpenVMS format is:<br> | |
159 | <br> | |
160 | | |
161 | <i>Device:[directories.dot.separated]filename.extension;version_number</i><br> | |
162 | </li> | |
163 | <li>For common file systems, determining if two descriptors are for the same | |
164 | entity is extremely difficult or impossible. For example, the concept of | |
165 | equality can be different for each portion of a path - some portions may be | |
166 | case or locale sensitive, others not. Case sensitivity is a property of the | |
167 | pathname itself, and not the platform. Determining collating sequence is even | |
168 | worse.<br> | |
169 | </li> | |
170 | <li>Race conditions may occur. Directory trees, directories, files, and file attributes are in effect shared between all threads, processes, and computers which have access to the | |
171 | filesystem. That may well include computers on the other side of the | |
172 | world or in orbit around the world. This implies that file system operations | |
173 | may fail in unexpected ways. For example:<br> | |
174 | <br> | |
175 | <code> assert( exists("foo") == exists("foo") ); | |
176 | // may fail!<br> | |
177 | assert( is_directory("foo") == is_directory("foo") ); | |
178 | // may fail!<br> | |
179 | </code><br> | |
180 | In the first example, the file may have been deleted between calls to | |
181 | exists(). In the second example, the file may have been deleted and then | |
182 | replaced by a directory of the same name between the calls to is_directory().<br> | |
183 | </li> | |
184 | <li>Even though an application may be portable, it still will have to traffic | |
185 | in system specific paths occasionally; user provided input is a common | |
186 | example.<br> | |
187 | </li> | |
188 | <li><a name="symbolic-link-use-case">Symbolic</a> links cause the canonical and | |
189 | normal forms of some paths to represent different files or directories. For | |
190 | example, given the directory hierarchy <code>/a/b/c</code>, with a symbolic | |
191 | link in <code>/a</code> named <code>x</code> pointing to <code>b/c</code>, | |
192 | then under POSIX Pathname Resolution rules a path of <code>"/a/x/.."</code> | |
193 | should resolve to <code>"/a/b"</code>. If <code>"/a/x/.."</code> were first | |
194 | normalized to <code>"/a"</code>, it would resolve incorrectly. (Case supplied | |
195 | by Walter Landry.)</li> | |
196 | </ul> | |
197 | ||
198 | <h2><a name="Rationale">Rationale</a></h2> | |
199 | ||
200 | <p>The <a href="#Requirements">Requirements</a> and <a href="#Realities"> | |
201 | Realities</a> above drove much of the C++ interface design. In particular, | |
202 | the desire to make script-like code straightforward caused a great deal of | |
203 | effort to go into ensuring that apparently simple expressions like <i>exists( "foo" | |
204 | )</i> work as expected.</p> | |
205 | ||
206 | <p>See the <a href="faq.htm">FAQ</a> for the rationale behind many detailed | |
207 | design decisions.</p> | |
208 | ||
209 | <p>Several key insights went into the <i>path</i> class design:</p> | |
210 | <ul> | |
211 | <li>Decoupling of the input formats, internal conceptual (<i>vector&lt;string&gt;</i> | |
212 | or other sequence) | |
213 | model, and output formats.</li> | |
214 | <li>Providing two input formats (generic and O/S specific) broke a major | |
215 | design deadlock.</li> | |
216 | <li>Providing several output formats solved another set of previously | |
217 | intractable problems.</li> | |
218 | <li>Several non-obvious functions (particularly decomposition and composition) | |
219 | are required to support portable code. (Peter Dimov, Thomas Witt, Glen | |
220 | Knowles, others.)</li> | |
221 | </ul> | |
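<p>The view of a path as a sequence of elements, together with decomposition and composition, can be sketched as follows. This assumes C++17 <code>std::filesystem</code>, whose <code>path</code> class descends from this design; the helper names <code>elements_of</code> and <code>compose</code> are hypothetical.</p>

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;  // assumption: C++17 std::filesystem, not boost::filesystem

// Hypothetical helper: decomposes a path into its elements, each rendered
// in the generic (slash-separated) format.
inline std::vector<std::string> elements_of(const fs::path& p) {
    std::vector<std::string> parts;
    for (const fs::path& element : p) {
        parts.push_back(element.generic_string());
    }
    return parts;
}

// Hypothetical helper: composes a path from a sequence of elements.
// operator/= inserts the separator, so no separator character is hard-coded.
inline fs::path compose(const std::vector<std::string>& parts) {
    fs::path result;
    for (const std::string& part : parts) {
        result /= part;
    }
    return result;
}
```

<p>Given <code>compose({"a", "b", "c.txt"})</code>, calling <code>generic_string()</code> yields the generic output format, while <code>string()</code> yields the native format of the platform. Member functions such as <code>filename()</code>, <code>parent_path()</code>, and <code>extension()</code> cover the common decomposition cases without manual string parsing.</p>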
222 | ||
223 | <p>Error checking was a particularly difficult area. One key insight was that | |
224 | with file and directory names, portability isn't a universal truth. | |
225 | Rather, the programmer must think out the question "What operating systems do I | |
226 | want this path to be portable to?" By providing support for several | |
227 | answers to that question, the Filesystem Library alerts programmers to the need | |
228 | to ask it in the first place.</p> | |
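<p>The idea that name portability depends on the target systems can be sketched with two independent checks. Both functions below are hypothetical, simplified helpers written for illustration and are not the library's own name-checking functions. The first accepts only the POSIX portable filename character set. The second applies a reduced set of Windows rules and ignores reserved device names.</p>

```cpp
#include <string_view>

// Hypothetical helper: true if `name` is non-empty and uses only the POSIX
// portable filename character set (letters, digits, period, underscore, hyphen).
inline bool is_posix_portable_name(std::string_view name) {
    constexpr std::string_view allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    if (name.empty()) return false;
    for (char c : name) {
        if (allowed.find(c) == std::string_view::npos) return false;
    }
    return true;
}

// Hypothetical helper: simplified Windows check. Rejects empty names, control
// characters, the characters < > : " / \ | ? * and names ending in a space or
// a period. Reserved device names such as CON are NOT checked here.
inline bool is_windows_friendly_name(std::string_view name) {
    constexpr std::string_view reserved = "<>:\"/\\|?*";
    if (name.empty()) return false;
    for (char c : name) {
        if (static_cast<unsigned char>(c) < 0x20) return false;
        if (reserved.find(c) != std::string_view::npos) return false;
    }
    const char last = name.back();
    return last != ' ' && last != '.';
}
```

<p>A name such as <code>my file.txt</code> passes the Windows check but fails the POSIX portable check, which shows why the programmer has to decide which answer to the portability question applies.</p>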
229 | <h2><a name="Abandoned_Designs">Abandoned Designs</a></h2> | |
230 | <h3>operations.hpp</h3> | |
231 |