Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
2 | ||
3 | <html> | |
4 | <head> | |
5 | <meta http-equiv="Content-Language" content="en-us"> | |
6 | <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> | |
7 | ||
8 | <title>Type-safe 'printf-like' format class</title> | |
9 | </head> | |
10 | ||
11 | <body bgcolor="#FFFFFF" text="#000000"> | |
12 | <h1><img align="middle" alt="boost.png (6897 bytes)" height="86" src= | |
13 | "../../../boost.png" width="277">Type-safe 'printf-like' <b>format | |
14 | class</b></h1> | |
15 | ||
16 | <h2>Choices made</h2> | |
17 | ||
18 | <p>"Le pourquoi du comment" ( - "the why of the how")</p> | |
19 | <hr> | |
20 | ||
21 | <h3>The syntax of the format-string</h3> | |
22 | ||
23 | <p>Format is a new library. One of its goals is to provide a replacement for | |
24 | printf: format can parse a format-string designed for printf, | |
25 | apply it to the given arguments, and produce the same result as printf | |
26 | would have.<br> | |
27 | With this constraint, there were roughly 3 possible choices for the syntax | |
28 | of the format-string :</p> | |
29 | ||
30 | <ol> | |
31 | <li>Use the exact same syntax as printf. It is well known by many | |
32 | experienced users, and fits almost all needs. But in a C++ streams | |
33 | context, the type-conversion character, crucial in printf to determine | |
34 | the end of a directive, only serves to set some associated formatting | |
35 | options (%x for setting hexadecimal output, etc.). It would be better to make | |
36 | this obligatory type-conversion character, with its modified meaning, | |
37 | optional.</li> | |
38 | ||
39 | <li>Extend printf syntax while maintaining compatibility, by using | |
40 | characters and constructs not yet valid as printf syntax, e.g. : "%1%", | |
41 | "%[1]", "%|1$d|", .. Using begin / end marks, all sorts of extensions can | |
42 | be considered.</li> | |
43 | ||
44 | <li>Provide a non-legacy mode, in parallel with the printf-compatible one, | |
45 | that can be designed to fit other objectives without the constraints of | |
46 | compatibility with the existing printf syntax.<br> | |
47 | But designing a replacement for printf's syntax that would be clearly | |
48 | better, and just as powerful, is a separate task from building a format | |
49 | class. When such a syntax is designed, we should consider splitting | |
50 | Boost.format into 2 separate libraries : one working hand in hand with | |
51 | this new syntax, and another supporting the legacy syntax (possibly a | |
52 | fast version, built with safety improvements over snprintf or the | |
53 | like).</li> | |
54 | </ol>In the absence of a full, clever, new syntax clearly better adapted to | |
55 | C++ streams than printf, the second approach was chosen. Boost.format uses | |
56 | printf's syntax, with additional features (tabulations, centered alignments) that | |
57 | can be expressed using extensions to this syntax.<br> | |
58 | Alternate compatible notations are also provided to address the weaknesses | |
59 | of printf's syntax : | |
60 | ||
61 | <ul> | |
62 | <li><i>"%<b>N</b>%"</i> as a simpler positional, typeless and optionless | |
63 | notation.</li> | |
64 | ||
65 | <li><i>%|spec|</i> as a way to encapsulate printf directives in more | |
66 | visually evident structures, at the same time making printf's | |
67 | 'type-conversion character' optional.</li> | |
68 | </ul> | |
69 | <hr> | |
70 | ||
71 | <h3>Why are arguments passed through an operator rather than a function | |
72 | call ?</h3><br> | |
73 | The drawback of the operator approach (for some people) is that it | |
74 | might be confusing. It is a common warning that too much operator | |
75 | overloading confuses people.<br> | |
76 | Since format objects will be used in specific contexts (most often | |
77 | right after a "cout << ") and look like a format-string followed | |
78 | by arguments : | |
79 | ||
80 | <blockquote> | |
81 | <pre> | |
82 | format(" %s at %s with %s\n") % x % y % z; | |
83 | </pre> | |
84 | </blockquote>we can hope it won't confuse people that much. | |
85 | ||
86 | <p>Another fear about operators is precedence problems. What if I someday | |
87 | write <b>format("%s") % x+y</b><br> | |
88 | instead of <i>format("%s") % (x+y)</i> ??<br> | |
89 | It will cause an error at compile-time, so the mistake will be immediately | |
90 | detected.<br> | |
91 | Indeed, this line calls <i>tmp = operator%( format("%s"), x)</i><br> | |
92 | and then <i>operator+(tmp, y)</i>.<br> | |
93 | tmp will be a format object, for which no implicit conversion is defined, | |
94 | and thus the call to operator+ will fail (unless you define such an | |
95 | operator, of course). So you can safely assume precedence mistakes will be | |
96 | noticed at compilation.</p> | |
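<p>The compile-time failure described above can be checked with a small sketch. It does not use Boost.format itself: <i>tiny_format</i>, <i>can_add</i> and <i>show_sum</i> are hypothetical names introduced only for this illustration, and the stand-in class merely concatenates its arguments. The point is only that a type offering operator% and no implicit conversions cannot appear on the left of operator+.</p>

```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a format object: it only knows operator%,
// and has no implicit conversion to anything else.
struct tiny_format {
    std::string text;

    template <class T>
    tiny_format& operator%(const T& value) {
        std::ostringstream os;
        os << value;          // convert the argument through operator<<
        text += os.str();
        return *this;
    }
};

// Detects whether the expression `L + R` is well-formed.
template <class L, class R, class = void>
struct can_add : std::false_type {};

template <class L, class R>
struct can_add<L, R, decltype(void(std::declval<L>() + std::declval<R>()))>
    : std::true_type {};

// `fmt % x + y` parses as `(fmt % x) + y`; the left operand is a
// tiny_format&, for which no operator+ exists, so it cannot compile.
static_assert(can_add<int, int>::value, "int + int is well-formed");
static_assert(!can_add<tiny_format&, int>::value,
              "(format % x) + y has no matching operator+");

// The parenthesised form is fine: x + y is evaluated first.
inline std::string show_sum(int x, int y) {
    tiny_format f;
    f % (x + y);
    return f.text;
}
```

<p>Calling <i>show_sum(2, 3)</i> yields the text "5", while the unparenthesised form is rejected by the compiler rather than silently producing a wrong result.</p>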
97 | ||
98 | <p><br> | |
99 | On the other hand, the function approach has a real drawback. It requires | |
100 | defining lots of template functions like :</p> | |
101 | ||
102 | <blockquote> | |
103 | <pre> | |
104 | template <class T1, class T2, .., class TN> | |
105 | string format(string s, const T1& x1, .... , const TN& xN); | |
106 | ||
107 | </pre> | |
108 | </blockquote>and even if we define those for N up to 500, that is still a | |
109 | limitation that C's printf does not have.<br> | |
110 | Also, since format emulates printf in some cases, but is far from | |
111 | being fully equivalent to printf, it is best to use a radically different | |
112 | appearance, and using operator calls succeeds very well in that ! | |
113 | ||
114 | <p><br> | |
115 | Anyhow, if we actually chose the function call template system, it | |
116 | would only be able to print classes T for which there is an</p> | |
117 | ||
118 | <blockquote> | |
119 | <pre> | |
120 | operator<< ( stream, const T&) | |
121 | </pre> | |
122 | </blockquote>Because allowing both const and non-const produces a | |
123 | combinatorial explosion - if we go up to 10 arguments, we need 2^10 | |
124 | functions.<br> | |
125 | (Providing overloads on T& / const T& is at the frontier of defects | |
126 | of the C++ standard, and thus is far from guaranteed to be supported. But | |
127 | right now several compilers support those overloads.)<br> | |
128 | Chances are that a class which only provides the non-const | |
129 | equivalent is badly designed, but it is still another unjustified restriction | |
130 | on the user.<br> | |
131 | Also, some manipulators are functions, and cannot be passed as const | |
132 | references. The function call approach thus does not support manipulators | |
133 | well. | |
134 | ||
135 | <p>In conclusion, using a dedicated binary operator is the simplest, most | |
136 | robust, and least restrictive mechanism to pass arguments when you can't | |
137 | know the number of arguments at compile-time.</p> | |
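<p>As a rough illustration of this conclusion, here is a minimal sketch of a class that receives its arguments one at a time through a binary operator. It is not Boost.format's implementation: <i>mini_format</i> and <i>mini_format_demo</i> are hypothetical names, and only the simple "%N%" notation is handled. A single operator% template covers any number of arguments, with no family of N-ary overloads.</p>

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified formatter: understands only "%N%" placeholders.
class mini_format {
public:
    explicit mini_format(std::string pattern) : pattern_(std::move(pattern)) {}

    // One template handles every argument; each call stores one more value,
    // converted to text through the argument's own operator<<.
    template <class T>
    mini_format& operator%(const T& value) {
        std::ostringstream os;
        os << value;
        args_.push_back(os.str());
        return *this;
    }

    // Replaces "%1%", "%2%", ... with the stored arguments.
    // Anything that is not a valid, in-range placeholder is copied verbatim.
    std::string str() const {
        std::string out;
        std::size_t i = 0;
        while (i < pattern_.size()) {
            if (pattern_[i] == '%') {
                std::size_t j = i + 1;
                std::size_t index = 0;
                while (j < pattern_.size() &&
                       pattern_[j] >= '0' && pattern_[j] <= '9') {
                    index = index * 10 +
                            static_cast<std::size_t>(pattern_[j] - '0');
                    ++j;
                }
                // A placeholder is '%', at least one digit, then '%'.
                if (j > i + 1 && j < pattern_.size() && pattern_[j] == '%' &&
                    index >= 1 && index <= args_.size()) {
                    out += args_[index - 1];
                    i = j + 1;
                    continue;
                }
            }
            out += pattern_[i];
            ++i;
        }
        return out;
    }

private:
    std::string pattern_;
    std::vector<std::string> args_;
};

// Lets a finished mini_format be sent to any output stream.
inline std::ostream& operator<<(std::ostream& os, const mini_format& f) {
    return os << f.str();
}

// Three arguments of three different types, no N-ary overloads needed.
inline std::string mini_format_demo() {
    std::ostringstream os;
    os << mini_format("%1% at %2% with %3%") % "x" % 42 % 3.5 << '\n';
    return os.str();
}
```

<p>In this sketch <i>mini_format_demo()</i> produces "x at 42 with 3.5" followed by a newline, and adding a fourth argument only means appending one more <i>% value</i> to the chain.</p>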
138 | <hr> | |
139 | ||
140 | <h3>Why operator% rather than a member function 'with(..)' | |
141 | ??</h3>Technically, | |
142 | ||
143 | <blockquote> | |
144 | <pre> | |
145 | format(fstr) % x1 % x2 % x3; | |
146 | </pre> | |
147 | </blockquote>has the same structure as | |
148 | ||
149 | <blockquote> | |
150 | <pre> | |
151 | format(fstr).with( x1 ).with( x2 ).with( x3 ); | |
152 | </pre> | |
153 | </blockquote>which does not have any precedence problem. The only drawback | |
154 | is that it is harder for the eye to catch what is done in this line than when we | |
155 | are using operators: with calls to .with(..), it looks just like any other line | |
156 | of code. So it may be a better solution, depending on tastes. The extra | |
157 | characters, and overall cluttered aspect of the line of code using | |
158 | 'with(..)' were enough for me to opt for a true operator. | |
159 | <hr> | |
160 | ||
161 | <h3>Why operator% rather than usual formatting operator<< ??</h3> | |
162 | ||
163 | <ul> | |
164 | <li>Because passing arguments to a format object is *not* the same as | |
165 | sending variables, sequentially, into a stream, and because a format | |
166 | object is not a stream, nor a manipulator.<br> | |
167 | We use an operator to pass arguments. format will use them as a | |
168 | function would: it simply takes arguments one by one.<br> | |
169 | format objects cannot provide stream-like behaviour. When you try to | |
170 | implement a format object that acts like a manipulator, returning a | |
171 | stream, you make the user believe it is completely like a | |
172 | stream-manipulator. And sooner or later, the user is deceived by this | |
173 | point of view.<br> | |
174 | The most obvious example of that difference in behaviour is | |
175 | ||
176 | <blockquote> | |
177 | <pre> | |
178 | cout << format("%s %s ") << x; | |
179 | cout << y ; // uh-oh, format is not really a stream manipulator | |
180 | </pre> | |
181 | </blockquote> | |
182 | </li> | |
183 | ||
184 | <li>Precedence of % is higher than that of <<. It can be viewed as a | |
185 | problem, because + and - thus need to be grouped inside parentheses, | |
186 | while it is not necessary with '<<'. But if the user forgets, the | |
187 | mistake is caught at compilation, and hopefully he won't forget | |
188 | again.<br> | |
189 | On the other hand, the higher precedence makes format's behaviour very | |
190 | straightforward. | |
191 | ||
192 | <blockquote> | |
193 | <pre> | |
194 | cout << format("%s %s ") % x % y << endl; | |
195 | </pre> | |
196 | </blockquote>is treated exactly like : | |
197 | ||
198 | <blockquote> | |
199 | <pre> | |
200 | cout << ( format("%s %s ") % x % y ) << endl; | |
201 | </pre> | |
202 | </blockquote>So using %, the life of a format object does not interfere | |
203 | with the surrounding stream context. This is the simplest possible | |
204 | behaviour, and thus the user is able to continue using the stream after | |
205 | the format object.<br> | |
206 | <br> | |
207 | With operator<<, things are much more problematic in this | |
208 | situation. This line : | |
209 | ||
210 | <blockquote> | |
211 | <pre> | |
212 | cout << format("%s %s ") << x << y << endl; | |
213 | </pre> | |
214 | </blockquote>is understood as : | |
215 | ||
216 | <blockquote> | |
217 | <pre> | |
218 | ( ( ( cout << format("%s %s ") ) << x ) << y ) << endl; | |
219 | </pre> | |
220 | </blockquote>Several alternative implementations chose | |
221 | operator<<, and there is only one way to make it work :<br> | |
222 | the first call to | |
223 | ||
224 | <blockquote> | |
225 | <pre> | |
226 | operator<<( ostream&, format const&) | |
227 | </pre> | |
228 | </blockquote>returns a proxy, encapsulating both the final destination | |
229 | (cout) and the format-string information.<br> | |
230 | Passing arguments to format, and passing them to the final destination after | |
231 | completion of the format, are indistinguishable. This is a problem. | |
232 | ||
233 | <p>I examined several possible implementations, and none is completely | |
234 | satisfying.<br> | |
235 | E.g. : In order to catch user mistakes, it makes sense to raise | |
236 | exceptions when the user passes too many arguments. But in this | |
237 | context, supplementary arguments are most certainly aimed at the final | |
238 | destination. There are several choices here :</p> | |
239 | ||
240 | <ul> | |
241 | <li>You can give up detection of arity excess, and have the proxy's | |
242 | template member operator<<( const T&) simply forward all | |
243 | supplementary arguments to cout.</li> | |
244 | ||
245 | <li>Require the user to close the format arguments with a special | |
246 | manipulator, 'endf', in this way : | |
247 | ||
248 | <blockquote> | |
249 | <pre> | |
250 | cout << format("%s %s ") << x << y << endf << endl; | |
251 | </pre> | |
252 | </blockquote>You can define endf to be a function that returns the | |
253 | final destination stored inside the proxy. Then it's okay, after | |
254 | endf the user is calling << on cout again. | |
255 | </li> | |
256 | ||
257 | <li>An intermediate solution is to address the most frequent use, | |
258 | where the user simply wants to output one more manipulator item to | |
259 | cout (a std::flush, or endl, ..) | |
260 | ||
261 | <blockquote> | |
262 | <pre> | |
263 | cout << format("%s %s \n") << x << y << flush ; | |
264 | </pre> | |
265 | </blockquote>Then, the solution is to overload the operator<< | |
266 | for manipulators. This way you don't need endf, but outputting a | |
267 | non-manipulator item right after the format arguments is a mistake. | |
268 | </li> | |
269 | </ul><br> | |
270 | The most complete solution is the one with the endf manipulator. With | |
271 | operator%, there is no need for this end-format function, plus you | |
272 | instantly see which arguments are going into the format object, and | |
273 | which are going to the stream. | |
274 | </li> | |
275 | ||
276 | <li>Esthetically : '%' is the same character as used inside the | |
277 | format-string. It is quite nice to have the same character used for | |
278 | passing each argument. '<<' is 2 characters, '%' is one. '%' is also | |
279 | smaller in size. Overall it improves visualisation (we see what goes with | |
280 | what) : | |
281 | ||
282 | <blockquote> | |
283 | <pre> | |
284 | cout << format("%s %s %s") %x %y %z << "And avg is" << format("%s\n") %avg; | |
285 | </pre> | |
286 | </blockquote>compared to : | |
287 | ||
288 | <blockquote> | |
289 | <pre> | |
290 | cout << format("%s %s %s") << x << y << z << endf <<"And avg is" << format("%s\n") << avg; | |
291 | </pre> | |
292 | </blockquote>"<<" misleadingly puts the arguments at the same | |
293 | level as any object passed to the stream. | |
294 | </li> | |
295 | ||
296 | <li>Python also uses % for formatting, so you see it's not so "unheard | |
297 | of" ;-)</li> | |
298 | </ul> | |
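<p>The proxy-plus-endf design discussed in the list above can be sketched roughly as follows. This is not taken from any of the alternative implementations mentioned: <i>fmt</i>, <i>fmt_proxy</i>, <i>endf_t</i>, <i>endf</i> and <i>endf_demo</i> are hypothetical names, and the sketch only substitutes "%s" sequentially. It is meant to show why the closing manipulator is needed: until endf is seen, the proxy cannot tell format arguments from items meant for the stream.</p>

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical format-string holder and end-of-arguments marker.
struct fmt { std::string pattern; };
struct endf_t {};
const endf_t endf = {};

// Proxy returned by `stream << fmt`: remembers the destination stream
// and collects every following item as a format argument.
class fmt_proxy {
public:
    fmt_proxy(std::ostream& dest, const std::string& pattern)
        : dest_(dest), pattern_(pattern) {}

    // Any item inserted before endf is treated as a format argument.
    template <class T>
    fmt_proxy& operator<<(const T& value) {
        std::ostringstream os;
        os << value;
        args_.push_back(os.str());
        return *this;
    }

    // endf writes the formatted text and hands the real stream back,
    // so later insertions go to the destination again.
    std::ostream& operator<<(endf_t) {
        std::size_t next = 0;
        for (std::size_t i = 0; i < pattern_.size(); ++i) {
            if (pattern_[i] == '%' && i + 1 < pattern_.size() &&
                pattern_[i + 1] == 's' && next < args_.size()) {
                dest_ << args_[next++];
                ++i;  // skip the 's'
            } else {
                dest_ << pattern_[i];
            }
        }
        return dest_;
    }

private:
    std::ostream& dest_;
    std::string pattern_;
    std::vector<std::string> args_;
};

// The first insertion switches from the stream to the proxy.
inline fmt_proxy operator<<(std::ostream& os, const fmt& f) {
    return fmt_proxy(os, f.pattern);
}

inline std::string endf_demo() {
    std::ostringstream os;
    os << fmt{"%s-%s"} << 1 << 2 << endf << " done";
    return os.str();
}
```

<p>Here <i>endf_demo()</i> yields "1-2 done". If endf is left out, " done" is silently collected as a third format argument and nothing reaches the stream, which is the kind of confusion operator% avoids.</p>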
299 | <hr> | |
300 | ||
301 | <h3>Why operator% rather than operator(), or operator[] ??</h3> | |
302 | ||
303 | <p>operator() has the merit of being the natural way to send an argument | |
304 | into a function. And some think that operator[]'s meaning applies well to | |
305 | the usage in format.<br> | |
306 | They're as good as operator% technically, but quite ugly. (That's a matter | |
307 | of taste.)<br> | |
308 | And deep down, using operator% for passing arguments that were referred to | |
309 | by "%" in the format string seems much more natural to me than using those | |
310 | operators.</p> | |
311 | <hr> | |
312 | ||
313 | <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src= | |
314 | "../../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional" | |
315 | height="31" width="88"></a></p> | |
316 | ||
317 | <p>Revised | |
318 | 02 December, 2006</p> | |
319 | ||
320 | <p><i>Copyright © 2001 Samuel Krempp</i></p> | |
321 | ||
322 | <p><i>Distributed under the Boost Software License, Version 1.0. (See | |
323 | accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or | |
324 | copy at <a href= | |
325 | "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p> | |
326 | </body> | |
327 | </html> |