Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | <html> |
2 | ||
3 | <head> | |
4 | <meta name="GENERATOR" content="Microsoft FrontPage 5.0"> | |
5 | <meta name="ProgId" content="FrontPage.Editor.Document"> | |
6 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |
7 | <title>Choosing Approach</title> | |
8 | <link href="styles.css" rel="stylesheet"> | |
9 | </head> | |
10 | ||
11 | <body> | |
12 | ||
13 | <table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> | |
14 | <tr> | |
15 | <td width="339"> | |
16 | <a href="../../../index.html"> | |
17 | <img src="../../../boost.png" alt="Boost logo" align="middle" border="0" width="277" height="86"></a></td> | |
18 | <td align="middle" width="1253"> | |
19 | <font size="6"><b>Choosing the Approach</b></font></td> | |
20 | </tr> | |
21 | </table> | |
22 | ||
23 | <table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" | |
24 | bordercolor="#111111" bgcolor="#D7EEFF" width="100%"> | |
25 | <tr> | |
26 | <td><b> | |
27 | <a href="index.html">Endian Home</a> | |
28 | <a href="conversion.html">Conversion Functions</a> | |
29 | <a href="arithmetic.html">Arithmetic Types</a> | |
30 | <a href="buffers.html">Buffer Types</a> | |
31 | <a href="choosing_approach.html">Choosing Approach</a></b></td> | |
32 | </tr> | |
33 | </table> | |
34 | <p></p> | |
35 | ||
36 | <table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" align="right"> | |
37 | <tr> | |
38 | <td width="100%" bgcolor="#D7EEFF" align="center"> | |
39 | <i><b>Contents</b></i></td> | |
40 | </tr> | |
41 | <tr> | |
42 | <td width="100%" bgcolor="#E8F5FF"> | |
43 | <a href="#Introduction">Introduction</a><br> | |
44 | <a href="#Choosing">Choosing between conversion functions,</a><br> | |
45 | <a href="#Choosing">buffer types, and arithmetic types</a><br> | |
46 | <a href="#Characteristics">Characteristics</a><br> | |
47 | <a href="#Endianness-invariants">Endianness invariants</a><br> | |
48 | <a href="#Conversion-explicitness">Conversion explicitness</a><br> | |
49 | <a href="#Arithmetic-operations">Arithmetic operations</a><br> | |
50 | <a href="#Sizes">Sizes</a><br> | |
51 | <a href="#Alignments">Alignments</a><br> | |
52 | <a href="#Design-patterns">Design patterns</a><br> | |
53 | <a href="#As-needed">Convert only as needed (i.e. lazy)</a><br> | |
54 | <a href="#Anticipating-need">Convert in anticipation of need</a><br> | |
55 | <a href="#Convert-generally-as-needed-locally-in-anticipation">Generally | |
56 | as needed, locally in anticipation</a><br> | |
57 | <a href="#Use-cases">Use case examples</a><br> | |
58 | <a href="#Porting-endian-unaware-codebase">Porting endian unaware codebase</a><br> | |
59 | <a href="#Porting-endian-aware-codebase">Porting endian aware codebase</a><br> | |
60 | <a href="#Reliability-arithmetic-speed">Reliability and arithmetic-speed</a><br> | |
61 | <a href="#Reliability-ease-of-use">Reliability and ease-of-use</a></td> | |
62 | </tr> | |
63 | </table> | |
64 | ||
65 | <h2><a name="Introduction">Introduction</a></h2> | |
66 | ||
67 | <p>Deciding which is the best endianness approach (conversion functions, buffer | |
68 | types, or arithmetic types) for a particular application involves complex | |
69 | engineering trade-offs. It is hard to assess those trade-offs without some | |
70 | understanding of the different interfaces, so you might want to read the | |
71 | <a href="conversion.html">conversion functions</a>, <a href="buffers.html"> | |
72 | buffer types</a>, and <a href="arithmetic.html">arithmetic types</a> pages | |
73 | before diving into this page.</p> | |
74 | ||
75 | <h2><a name="Choosing">Choosing</a> between conversion functions, buffer types, | |
76 | and arithmetic types</h2> | |
77 | ||
78 | <p>The best approach to endianness for a particular application depends on the interaction between | |
79 | the application's needs and the characteristics of each of the three approaches.</p> | |
80 | ||
81 | <p><b>Recommendation:</b> If you are new to endianness, uncertain, or don't want to invest | |
82 | the time to | |
83 | study | |
84 | engineering trade-offs, use <a href="arithmetic.html">endian arithmetic types</a>. They are safe, easy | |
85 | to use, and easy to maintain. Use the | |
86 | <a href="#Anticipating-need"> <i> | |
87 | anticipating need</i></a> design pattern locally around performance hot spots | |
88 | like lengthy loops, if needed.</p> | |
89 | ||
90 | <h3><a name="Background">Background</a> </h3> | |
91 | ||
92 | <p>Dealing with endianness usually implies a program portability or a data | |
93 | portability requirement, and often both. That means real programs dealing with | |
94 | endianness are usually complex, so the examples shown here would really be | |
95 | written as multiple functions spread across multiple translation units. They | |
96 | would involve interfaces that cannot be altered because they are supplied by | |
97 | third parties or the standard library. </p> | |
98 | ||
99 | <h3><a name="Characteristics">Characteristics</a></h3> | |
100 | ||
101 | <p>The characteristics that differentiate the three approaches to endianness are the endianness | |
102 | invariants, conversion explicitness, arithmetic operations, sizes available, and | |
103 | alignment requirements.</p> | |
104 | ||
105 | <h4><a name="Endianness-invariants">Endianness invariants</a></h4> | |
106 | ||
107 | <blockquote> | |
108 | ||
109 | <p><b>Endian conversion functions</b> use objects of the ordinary C++ arithmetic | |
110 | types like <code>int</code> or <code>unsigned short</code> to hold values. That | |
111 | breaks the implicit invariant that the C++ language rules apply. The usual | |
112 | language rules only apply if the endianness of the object is currently set to the native endianness for the platform. That can | |
113 | make it very hard to reason about logic flow, and result in difficult-to-find | |
114 | bugs.</p> | |
115 | ||
116 | <p>For example:</p> | |
117 | ||
118 | <blockquote> | |
119 | <pre>struct data_t // big endian | |
120 | { | |
121 | int32_t v1; // description ... | |
122 | int32_t v2; // description ... | |
123 | ... additional character data members (i.e. non-endian) | |
124 | int32_t v3; // description ... | |
125 | }; | |
126 | ||
127 | data_t data; | |
128 | ||
129 | read(data); | |
130 | big_to_native_inplace(data.v1); | |
131 | big_to_native_inplace(data.v2); | |
132 | ||
133 | ... | |
134 | ||
135 | ++data.v1; | |
136 | third_party::func(data.v2); | |
137 | ||
138 | ... | |
139 | ||
140 | native_to_big_inplace(data.v1); | |
141 | native_to_big_inplace(data.v2); | |
142 | write(data); | |
143 | </pre> | |
144 | <p>The programmer didn't bother to convert <code>data.v3</code> to native | |
145 | endianness because that member isn't used. A later maintainer needs to pass | |
146 | <code>data.v3</code> to the third-party function, so adds <code>third_party::func(data.v3);</code> | |
147 | somewhere deep in the code. This causes a silent failure because the usual | |
148 | invariant that an object of type <code>int32_t</code> holds a value as | |
149 | described by the C++ core language does not apply.</p> | |
150 | </blockquote> | |
151 | <p><b>Endian buffer and arithmetic types</b> hold values internally as arrays of | |
152 | characters with an invariant that the endianness of the array never changes. | |
153 | That makes these types easier to use and programs easier to maintain. </p> | |
154 | <p>Here is the same example, using an endian arithmetic type:</p> | |
155 | <blockquote> | |
156 | <pre>struct data_t | |
157 | { | |
158 | big_int32_t v1; // description ... | |
159 | big_int32_t v2; // description ... | |
160 | ... additional character data members (i.e. non-endian) | |
161 | big_int32_t v3; // description ... | |
162 | }; | |
163 | ||
164 | data_t data; | |
165 | ||
166 | read(data); | |
167 | ||
168 | ... | |
169 | ||
170 | ++data.v1; | |
171 | third_party::func(data.v2); | |
172 | ||
173 | ... | |
174 | ||
175 | write(data); | |
176 | </pre> | |
177 | <p>A later maintainer can add <code>third_party::func(data.v3)</code> and it | |
178 | will just work.</p> | |
179 | </blockquote> | |
180 | ||
181 | </blockquote> | |
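<p>The invariant described above can be illustrated with a minimal sketch. The class <code>be_int32_sketch</code> and the function <code>third_party_func</code> are hypothetical names invented for this illustration; this is not the Boost.Endian implementation, only a standard-library-only approximation of the idea that the stored bytes are always big endian and conversion happens on every read and write.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an endian arithmetic type (NOT Boost.Endian).
// Invariant: bytes_ always holds the value in big endian order.
class be_int32_sketch
{
public:
    be_int32_sketch() : bytes_{} {}
    be_int32_sketch(std::int32_t v) { store(v); }
    be_int32_sketch& operator=(std::int32_t v) { store(v); return *this; }

    // Implicit conversion to the native type: assemble from big endian bytes.
    operator std::int32_t() const
    {
        std::uint32_t u = (std::uint32_t(bytes_[0]) << 24)
                        | (std::uint32_t(bytes_[1]) << 16)
                        | (std::uint32_t(bytes_[2]) << 8)
                        |  std::uint32_t(bytes_[3]);
        return static_cast<std::int32_t>(u);
    }

    // Arithmetic converts to native, operates, and stores back as big endian.
    be_int32_sketch& operator++()
    {
        store(static_cast<std::int32_t>(*this) + 1);
        return *this;
    }

    const unsigned char* data() const { return bytes_; }

private:
    void store(std::int32_t v)
    {
        std::uint32_t u = static_cast<std::uint32_t>(v);
        bytes_[0] = static_cast<unsigned char>(u >> 24);
        bytes_[1] = static_cast<unsigned char>(u >> 16);
        bytes_[2] = static_cast<unsigned char>(u >> 8);
        bytes_[3] = static_cast<unsigned char>(u);
    }

    unsigned char bytes_[4];
};

// Hypothetical third-party function that expects a native int32_t.
std::int32_t third_party_func(std::int32_t x) { return x * 2; }
```

<p>Because the conversion lives inside the type, any member can be passed to <code>third_party_func</code> at any point without the caller tracking whether it has already been converted.</p>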
182 | ||
183 | <h4><a name="Conversion-explicitness">Conversion explicitness</a></h4> | |
184 | ||
185 | <blockquote> | |
186 | ||
187 | <p><b>Endian conversion functions</b> and <b>buffer types</b> never perform | |
188 | implicit conversions. This gives users explicit control of when conversion | |
189 | occurs, and may help avoid unnecessary conversions.</p> | |
190 | ||
191 | <p><b>Endian arithmetic types</b> perform conversion implicitly. That makes | |
192 | these types very easy to use, but can result in unnecessary conversions. Failure | |
193 | to hoist conversions out of inner loops can bring a performance penalty.</p> | |
194 | ||
195 | </blockquote> | |
196 | ||
197 | <h4><a name="Arithmetic-operations">Arithmetic operations</a></h4> | |
198 | ||
199 | <blockquote> | |
200 | ||
201 | <p><b>Endian conversion functions</b> do not supply arithmetic | |
202 | operations, but this is not a concern since this approach uses ordinary C++ | |
203 | arithmetic types to hold values.</p> | |
204 | ||
205 | <p><b>Endian buffer types</b> do not supply arithmetic operations. Although this | |
206 | approach avoids unnecessary conversions, it can result in the introduction of | |
207 | additional variables and confuse maintenance programmers.</p> | |
208 | ||
209 | <p><b>Endian arithmetic types</b> do supply arithmetic operations. They | |
210 | are very easy to use if lots of arithmetic is involved. </p> | |
211 | ||
212 | </blockquote> | |
213 | ||
214 | <h4><a name="Sizes">Sizes</a></h4> | |
215 | ||
216 | <blockquote> | |
217 | ||
218 | <p><b>Endianness conversion functions</b> only support 1, 2, 4, and 8 byte | |
219 | integers. That's sufficient for many applications.</p> | |
220 | ||
221 | <p><b>Endian buffer and arithmetic types</b> support 1, 2, 3, 4, 5, 6, 7, and 8 | |
222 | byte integers. For an application where memory use or I/O speed is the limiting | |
223 | factor, using sizes tailored to application needs can be useful.</p> | |
224 | ||
225 | </blockquote> | |
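<p>To show why odd sizes such as 3 bytes are straightforward for a byte-array representation, here is a minimal sketch. The function templates <code>store_big</code> and <code>load_big</code> are hypothetical helpers written for this illustration (unsigned values only); they are not part of Boost.Endian.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Store the low N bytes of value into out, most significant byte first.
template <std::size_t N>
void store_big(unsigned char* out, std::uint64_t value)
{
    static_assert(N >= 1 && N <= 8, "N must be in the range 1..8");
    for (std::size_t i = 0; i < N; ++i)
        out[i] = static_cast<unsigned char>(value >> (8 * (N - 1 - i)));
}

// Load an N-byte big endian unsigned value from in.
template <std::size_t N>
std::uint64_t load_big(const unsigned char* in)
{
    static_assert(N >= 1 && N <= 8, "N must be in the range 1..8");
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < N; ++i)
        value = (value << 8) | in[i];
    return value;
}
```

<p>A 3-byte field then costs exactly 3 bytes in a record or on the wire, at the price of a byte-by-byte load and store.</p>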
226 | ||
227 | <h4><a name="Alignments">Alignments</a></h4> | |
228 | ||
229 | <blockquote> | |
230 | ||
231 | <p><b>Endianness conversion functions</b> only support aligned integer and | |
232 | floating-point types. That's sufficient for most applications.</p> | |
233 | ||
234 | <p><b>Endian buffer and arithmetic types</b> support both aligned and unaligned | |
235 | integer and floating-point types. Unaligned types are rarely needed, but when | |
236 | needed they are often very useful and workarounds are painful. For example,</p> | |
237 | ||
238 | <blockquote> | |
239 | <p>Non-portable code like this:</p><blockquote> | |
240 | <pre>struct S { | |
241 | uint16_t a; // big endian | |
242 | uint32_t b; // big endian | |
243 | } __attribute__ ((packed));</pre> | |
244 | </blockquote> | |
245 | <p>Can be replaced with portable code like this:</p> | |
246 | <blockquote> | |
247 | <pre>struct S { | |
248 | big_uint16_t a; | |
249 | big_uint32_t b; | |
250 | };</pre> | |
251 | </blockquote> | |
252 | </blockquote> | |
253 | ||
254 | </blockquote> | |
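<p>The reason a byte-array representation avoids the need for a packed attribute can be sketched as follows. The types <code>be_uint16_sketch</code>, <code>be_uint32_sketch</code>, and <code>S_sketch</code>, and the helpers <code>load</code> and <code>parse</code>, are hypothetical names for this illustration and are not the Boost.Endian types. The size check assumes a mainstream ABI where a struct of <code>unsigned char</code> arrays has no padding.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical unaligned big endian fields: just arrays of bytes.
struct be_uint16_sketch { unsigned char b[2]; };
struct be_uint32_sketch { unsigned char b[4]; };

struct S_sketch
{
    be_uint16_sketch a;
    be_uint32_sketch b;   // starts at offset 2; no alignment padding needed
};

// Assumption: holds on mainstream ABIs, not guaranteed by the C++ standard.
static_assert(sizeof(S_sketch) == 6, "expected no padding");

inline std::uint16_t load(const be_uint16_sketch& x)
{
    return static_cast<std::uint16_t>((x.b[0] << 8) | x.b[1]);
}

inline std::uint32_t load(const be_uint32_sketch& x)
{
    return (std::uint32_t(x.b[0]) << 24) | (std::uint32_t(x.b[1]) << 16)
         | (std::uint32_t(x.b[2]) << 8)  |  std::uint32_t(x.b[3]);
}

// Copy a 6-byte wire image into the struct.
inline S_sketch parse(const unsigned char* wire)
{
    S_sketch s;
    std::memcpy(&s, wire, sizeof s);
    return s;
}
```

<p>Since every member has alignment 1, the layout matches the wire format without compiler-specific extensions.</p>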
255 | ||
256 | <h3><a name="Design-patterns">Design patterns</a></h3> | |
257 | ||
258 | <p>Applications often traffic in endian data as records or packets containing | |
259 | multiple endian data elements. For simplicity, we will just call them records.</p> | |
260 | ||
261 | <p>If desired endianness differs from native endianness, a conversion has to be | |
262 | performed. When should that conversion occur? Three design patterns have | |
263 | evolved.</p> | |
264 | ||
265 | <h4><a name="As-needed">Convert only as needed</a> (i.e. lazy)</h4> | |
266 | ||
267 | <p>This pattern defers conversion to the point in the code where the data | |
268 | element is actually used.</p> | |
269 | ||
270 | <p>This pattern is appropriate when the endian elements actually used vary | |
271 | greatly according to record content or other circumstances.</p> | |
272 | ||
273 | <h4><a name="Anticipating-need">Convert in anticipation of need</a></h4> | |
274 | ||
275 | <p>This pattern performs conversion to native endianness in anticipation of use, | |
276 | such as immediately after reading records. If needed, conversion to the output | |
277 | endianness is performed after all possible needs have passed, such as just | |
278 | before writing records.</p> | |
279 | ||
280 | <p>One implementation of this pattern is to create a proxy record with | |
281 | endianness converted to native in a read function, and expose only that proxy to | |
282 | the rest of the implementation. A write function, if needed, handles the | |
283 | conversion from native to the desired output endianness.</p> | |
284 | ||
285 | <p>This pattern is appropriate when all endian elements in a record are | |
286 | typically used regardless of record content or other circumstances.</p> | |
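<p>The proxy-record implementation described above might look roughly like the following sketch. All names here (<code>wire_record</code>, <code>native_record</code>, <code>read_record</code>, <code>write_record</code>, <code>from_big</code>, <code>to_big</code>) are hypothetical and use only the standard library; a real program would more likely use the Boost.Endian conversion functions for the per-field conversions.</p>

```cpp
#include <cassert>
#include <cstdint>

// External (big endian) layout as it appears in a file or packet.
struct wire_record
{
    unsigned char v1[4];
    unsigned char v2[4];
};

// Proxy exposed to the rest of the program: native endianness throughout.
struct native_record
{
    std::int32_t v1;
    std::int32_t v2;
};

inline std::int32_t from_big(const unsigned char* p)
{
    std::uint32_t u = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
                    | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    return static_cast<std::int32_t>(u);
}

inline void to_big(std::int32_t v, unsigned char* p)
{
    std::uint32_t u = static_cast<std::uint32_t>(v);
    p[0] = static_cast<unsigned char>(u >> 24);
    p[1] = static_cast<unsigned char>(u >> 16);
    p[2] = static_cast<unsigned char>(u >> 8);
    p[3] = static_cast<unsigned char>(u);
}

// Convert every field once, immediately after reading.
inline native_record read_record(const wire_record& w)
{
    return native_record{ from_big(w.v1), from_big(w.v2) };
}

// Convert every field once, just before writing.
inline wire_record write_record(const native_record& n)
{
    wire_record w;
    to_big(n.v1, w.v1);
    to_big(n.v2, w.v2);
    return w;
}
```

<p>Code between <code>read_record</code> and <code>write_record</code> sees only <code>native_record</code>, so no field can be used in the wrong byte order.</p>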
287 | ||
288 | <h4><a name="Convert-generally-as-needed-locally-in-anticipation">Convert | |
289 | only as needed, except locally in anticipation of need</a></h4> | |
290 | ||
291 | <p>This pattern in general defers conversion but for specific local needs does | |
292 | anticipatory conversion. Although particularly appropriate when coupled with the endian buffer | |
293 | or arithmetic types, it also works well with the conversion functions.</p> | |
294 | ||
295 | <p>Example:</p> | |
296 | ||
297 | <blockquote> | |
298 | <pre>struct data_t | |
299 | { | |
300 | big_int32_t v1; | |
301 | big_int32_t v2; | |
302 | big_int32_t v3; | |
303 | }; | |
304 | ||
305 | data_t data; | |
306 | ||
307 | read(data); | |
308 | ||
309 | ... | |
310 | ++data.v1; | |
311 | ... | |
312 | ||
313 | int32_t v3_temp = data.v3; // hoist conversion out of loop | |
314 | ||
315 | for (int32_t i = 0; i &lt; <i><b>large-number</b></i>; ++i) | |
316 | { | |
317 | ... <i><b>lengthy computation that accesses </b></i>v3_temp<i><b> many times</b></i> ... | |
318 | } | |
319 | data.v3 = v3_temp; | |
320 | ||
321 | write(data); | |
322 | </pre> | |
323 | </blockquote> | |
324 | ||
325 | <p dir="ltr">In general the above pseudo-code leaves conversion up to the endian | |
326 | arithmetic type <code>big_int32_t</code>. But to avoid conversion inside the | |
327 | loop, a temporary is created before the loop is entered, and then used to set | |
328 | the new value of <code>data.v3</code> after the loop is complete.</p> | |
329 | ||
330 | <blockquote> | |
331 | ||
332 | <p dir="ltr">Question: Won't the compiler's optimizer hoist the conversion out | |
333 | of the loop anyhow?</p> | |
334 | ||
335 | <p dir="ltr">Answer: VC++ 2015 Preview, and probably others, does not, even for | |
336 | a toy test program. Although the savings are small (two register <code> | |
337 | <span style="font-size: 85%">bswap</span></code> instructions), the cost might | |
338 | be significant if the loop is repeated enough times. On the other hand, the | |
339 | program may be so dominated by I/O time that even a lengthy loop will be | |
340 | immaterial.</p> | |
341 | ||
342 | </blockquote> | |
343 | ||
344 | <h3><a name="Use-cases">Use case examples</a></h3> | |
345 | ||
346 | <h4><a name="Porting-endian-unaware-codebase">Porting endian unaware codebase</a></h4> | |
347 | ||
348 | <p>An existing codebase runs on big endian systems. It does not | |
349 | currently deal with endianness. The codebase needs to be modified so it can run | |
350 | on little endian systems under various operating systems. To ease the | |
351 | transition and protect the value of existing files, external data will continue to | |
352 | be maintained as big endian.</p> | |
353 | ||
354 | <p dir="ltr">The <a href="arithmetic.html">endian | |
355 | arithmetic approach</a> is recommended to meet these needs. A relatively small | |
356 | number of header files dealing with binary I/O layouts need to change types. For | |
357 | example, | |
358 | <code>short</code> or <code>int16_t</code> would change to <code>big_int16_t</code>. No | |
359 | changes are required for <code>.cpp</code> files.</p> | |
360 | ||
361 | <h4><a name="Porting-endian-aware-codebase">Porting endian aware codebase</a></h4> | |
362 | ||
363 | <p>An existing codebase runs on little-endian Linux systems. It already | |
364 | deals with endianness via | |
365 | <a href="http://man7.org/linux/man-pages/man3/endian.3.html">Linux provided | |
366 | functions</a>. Because of a business merger, the codebase has to be quickly | |
367 | modified for Windows and possibly other operating systems, while still | |
368 | supporting Linux. The codebase is reliable and the programmers are all | |
369 | well-aware of endian issues. </p> | |
370 | ||
371 | <p dir="ltr">These factors all argue for an <a href="conversion.html">endian conversion | |
372 | approach</a> that just mechanically changes the calls to <code>htobe32</code>, | |
373 | etc. to <code>boost::endian::native_to_big</code>, etc. and replaces <code>&lt;endian.h&gt;</code> | |
374 | with <code>&lt;boost/endian/conversion.hpp&gt;</code>.</p> | |
375 | ||
376 | <h4><a name="Reliability-arithmetic-speed">Reliability and arithmetic-speed</a></h4> | |
377 | ||
378 | <p>A new, complex, multi-threaded application is to be developed that must run | |
379 | on little endian machines, but do big endian network I/O. The developers believe | |
380 | computational speed for endian variables is critical but have seen numerous bugs | |
381 | result from inability to reason about endian conversion state. They are also | |
382 | worried that future maintenance changes could inadvertently introduce a lot of | |
383 | slow conversions if full-blown endian arithmetic types are used.</p> | |
384 | ||
385 | <p>The <a href="buffers.html">endian buffers</a> approach is made-to-order for | |
386 | this use case.</p> | |
387 | ||
388 | <h4><a name="Reliability-ease-of-use">Reliability and ease-of-use</a></h4> | |
389 | ||
390 | <p>A new, complex, multi-threaded application is to be developed that must run | |
391 | on little endian machines, but do big endian network I/O. The developers believe | |
392 | computational speed for endian variables is <b>not critical</b> but have seen | |
393 | numerous bugs result from inability to reason about endian conversion state. | |
394 | They are also concerned about ease-of-use both during development and long-term | |
395 | maintenance.</p> | |
396 | ||
397 | <p>Removing concern about conversion speed and adding concern about ease-of-use | |
398 | tips the balance strongly in favor of the <a href="arithmetic.html">endian | |
399 | arithmetic approach</a>.</p> | |
400 | ||
401 | <hr> | |
402 | <p>Last revised: | |
403 | <!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->19 January, 2015<!--webbot bot="Timestamp" endspan i-checksum="38903" --></p> | |
404 | <p>© Copyright Beman Dawes, 2011, 2013, 2014</p> | |
405 | <p>Distributed under the Boost Software License, Version 1.0. See | |
406 | <a href="http://www.boost.org/LICENSE_1_0.txt">www.boost.org/LICENSE_1_0.txt</a></p> | |
407 | ||
408 | <p> </p> | |
409 | ||
410 | </body> | |
411 | ||
412 | </html> |