<html>

<head>
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Choosing Approach</title>
<link href="styles.css" rel="stylesheet">
</head>

<body>

<table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
  <tr>
    <td width="339">
<a href="../../../index.html">
<img src="../../../boost.png" alt="Boost logo" align="middle" border="0" width="277" height="86"></a></td>
    <td align="middle" width="1253">
    <font size="6"><b>Choosing the Approach</b></font></td>
  </tr>
</table>

<table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse"
  bordercolor="#111111" bgcolor="#D7EEFF" width="100%">
  <tr>
    <td><b>
    <a href="index.html">Endian Home</a>&nbsp;&nbsp;&nbsp;&nbsp;
    <a href="conversion.html">Conversion Functions</a>&nbsp;&nbsp;&nbsp;&nbsp;
    <a href="arithmetic.html">Arithmetic Types</a>&nbsp;&nbsp;&nbsp;&nbsp;
    <a href="buffers.html">Buffer Types</a>&nbsp;&nbsp;&nbsp;&nbsp;
    <a href="choosing_approach.html">Choosing Approach</a></b></td>
  </tr>
</table>
<p></p>

<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" align="right">
  <tr>
    <td width="100%" bgcolor="#D7EEFF" align="center">
      <i><b>Contents</b></i></td>
  </tr>
  <tr>
    <td width="100%" bgcolor="#E8F5FF">
<a href="#Introduction">Introduction</a><br>
<a href="#Choosing">Choosing between conversion functions,</a><br>
  &nbsp;  <a href="#Choosing">buffer types, and arithmetic types</a><br>
&nbsp;&nbsp;&nbsp;<a href="#Characteristics">Characteristics</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Endianness-invariants">Endianness invariants</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Conversion-explicitness">Conversion explicitness</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Arithmetic-operations">Arithmetic operations</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Sizes">Sizes</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Alignments">Alignments</a><br>
&nbsp;&nbsp;&nbsp;<a href="#Design-patterns">Design patterns</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#As-needed">Convert only as needed (i.e. lazy)</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Anticipating-need">Convert in anticipation of need</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Convert-generally-as-needed-locally-in-anticipation">Generally
as needed, locally in anticipation</a><br>
&nbsp;&nbsp;&nbsp;<a href="#Use-cases">Use case examples</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Porting-endian-unaware-codebase">Porting endian unaware codebase</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Porting-endian-aware-codebase">Porting endian aware codebase</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Reliability-arithmetic-speed">Reliability and arithmetic-speed</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#Reliability-ease-of-use">Reliability and ease-of-use</a></td>
  </tr>
  </table>

<h2><a name="Introduction">Introduction</a></h2>

<p>Deciding which endianness approach (conversion functions, buffer
types, or arithmetic types) is best for a particular application involves complex
engineering trade-offs. It is hard to assess those trade-offs without some
understanding of the different interfaces, so you might want to read the
<a href="conversion.html">conversion functions</a>, <a href="buffers.html">
buffer types</a>, and <a href="arithmetic.html">arithmetic types</a> pages
before diving into this page.</p>

<h2><a name="Choosing">Choosing</a> between conversion functions, buffer types,
and arithmetic types</h2>

<p>The best approach to endianness for a particular application depends on the interaction between
the application&#39;s needs and the characteristics of each of the three approaches.</p>

<p><b>Recommendation:</b> If you are new to endianness, uncertain, or don&#39;t want to invest
the time to study engineering trade-offs, use <a href="arithmetic.html">endian arithmetic types</a>. They are safe, easy
to use, and easy to maintain. If needed, use the
<a href="#Anticipating-need"><i>anticipating need</i></a> design pattern locally around performance hot spots
like lengthy loops.</p>

<h3><a name="Background">Background</a></h3>

<p>Dealing with endianness usually implies a program portability or a data
portability requirement, and often both. That means real programs dealing with
endianness are usually complex, so the examples shown here would really be
written as multiple functions spread across multiple translation units. They
would also involve interfaces that cannot be altered because they are supplied by
third parties or the standard library.</p>

<h3><a name="Characteristics">Characteristics</a></h3>

<p>The characteristics that differentiate the three approaches to endianness are the endianness
invariants, conversion explicitness, arithmetic operations, sizes available, and
alignment requirements.</p>

<h4><a name="Endianness-invariants">Endianness invariants</a></h4>

<blockquote>

<p><b>Endian conversion functions</b> use objects of the ordinary C++ arithmetic
types like <code>int</code> or <code>unsigned short</code> to hold values. That
breaks the implicit invariant that the C++ language rules apply. The usual
language rules only apply if the object currently holds its value in the
platform&#39;s native endianness. That can make it very hard to reason about
logic flow, and can result in difficult-to-find bugs.</p>

<p>For example:</p>

<blockquote>
  <pre>struct data_t  // big endian
{
  int32_t   v1;  // description ...
  int32_t   v2;  // description ...
  ... additional character data members (i.e. non-endian)
  int32_t   v3;  // description ...
};

data_t data;

read(data);
big_to_native_inplace(data.v1);
big_to_native_inplace(data.v2);

...

++data.v1;
third_party::func(data.v2);

...

native_to_big_inplace(data.v1);
native_to_big_inplace(data.v2);
write(data);
</pre>
  <p>The programmer didn&#39;t bother to convert <code>data.v3</code> to native
  endianness because that member isn&#39;t used. A later maintainer needs to pass
  <code>data.v3</code> to the third-party function, so adds <code>third_party::func(data.v3);</code>
  somewhere deep in the code. This causes a silent failure because the usual
  invariant that an object of type <code>int32_t</code> holds a value as
  described by the C++ core language does not apply: on a little-endian platform,
  <code>data.v3</code> still holds big-endian bytes.</p>
</blockquote>
<p><b>Endian buffer and arithmetic types</b> hold values internally as arrays of
characters with an invariant that the endianness of the array never changes.
That makes these types easier to use and programs easier to maintain.</p>
<p>Here is the same example, using an endian arithmetic type:</p>
<blockquote>
  <pre>struct data_t
{
  big_int32_t   v1;  // description ...
  big_int32_t   v2;  // description ...
  ... additional character data members (i.e. non-endian)
  big_int32_t   v3;  // description ...
};

data_t data;

read(data);

...

++data.v1;
third_party::func(data.v2);

...

write(data);
</pre>
  <p>A later maintainer can add <code>third_party::func(data.v3)</code> and it
  will just work.</p>
</blockquote>

</blockquote>
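<p>The invariant can be sketched in a few lines of standard C++. This is a minimal illustration assuming C++11 or later; <code>toy_big_uint32</code> and <code>third_party_func</code> are hypothetical names, not Boost.Endian&#39;s <code>big_uint32_t</code> or a real third-party API. The toy type always stores its value as four big-endian bytes and decodes on every read, so no code path needs to know whether a member has already been converted.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy type for illustration only (not Boost.Endian's big_uint32_t).
// Invariant: bytes_ always holds the value in big-endian order, on any platform.
class toy_big_uint32 {
public:
    toy_big_uint32() = default;
    toy_big_uint32(std::uint32_t v) { store(v); }

    // Every read decodes the big-endian bytes into a native value.
    operator std::uint32_t() const {
        return (std::uint32_t(bytes_[0]) << 24) | (std::uint32_t(bytes_[1]) << 16) |
               (std::uint32_t(bytes_[2]) << 8) | std::uint32_t(bytes_[3]);
    }

    // Every write encodes the native value back into big-endian bytes.
    toy_big_uint32& operator=(std::uint32_t v) {
        store(v);
        return *this;
    }

    const std::array<unsigned char, 4>& bytes() const { return bytes_; }

private:
    void store(std::uint32_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 24);
        bytes_[1] = static_cast<unsigned char>(v >> 16);
        bytes_[2] = static_cast<unsigned char>(v >> 8);
        bytes_[3] = static_cast<unsigned char>(v);
    }

    std::array<unsigned char, 4> bytes_{};
};

// Hypothetical stand-in for a third-party function taking a native integer.
std::uint32_t third_party_func(std::uint32_t x) { return x + 1; }
```

<p>Because the byte order never changes, passing a <code>toy_big_uint32</code> member to <code>third_party_func</code> yields the correct native value without any prior conversion step.</p>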

<h4><a name="Conversion-explicitness">Conversion explicitness</a></h4>

<blockquote>

<p><b>Endian conversion functions</b> and <b>buffer types</b> never perform
implicit conversions. This gives users explicit control of when conversion
occurs, and may help avoid unnecessary conversions.</p>

<p><b>Endian arithmetic types</b> perform conversion implicitly. That makes
these types very easy to use, but can result in unnecessary conversions. Failure
to hoist conversions out of inner loops can bring a performance penalty.</p>

</blockquote>
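<p>One way to see the difference is to count conversions. In this sketch, <code>toy_big_buf32</code> and <code>toy_big_arith32</code> are hypothetical stand-ins for a buffer type and an arithmetic type (not the Boost.Endian classes), and the global <code>decode_count</code> exists only for the demonstration. The buffer-like type converts only when <code>value()</code> is called. The arithmetic-like type converts on every implicit read, so a loop that has not been hoisted pays for one decode per iteration.</p>

```cpp
#include <cassert>
#include <cstdint>

// Demonstration-only counter: incremented once per big-endian decode.
int decode_count = 0;

// Decode four big-endian bytes into a native 32-bit value.
std::uint32_t load_big32(const unsigned char* p) {
    ++decode_count;
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical buffer-like type: converts only when value() is called.
struct toy_big_buf32 {
    unsigned char bytes[4];
    std::uint32_t value() const { return load_big32(bytes); }
};

// Hypothetical arithmetic-like type: converts implicitly on every read.
struct toy_big_arith32 {
    unsigned char bytes[4];
    operator std::uint32_t() const { return load_big32(bytes); }
};

// Explicit: the programmer chooses to convert once, before the loop.
std::uint32_t sum_explicit(const toy_big_buf32& x, int n) {
    const std::uint32_t v = x.value();
    std::uint32_t s = 0;
    for (int i = 0; i < n; ++i) s += v;
    return s;
}

// Implicit: 's += x' silently converts x on every iteration.
std::uint32_t sum_implicit(const toy_big_arith32& x, int n) {
    std::uint32_t s = 0;
    for (int i = 0; i < n; ++i) s += x;
    return s;
}
```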

<h4><a name="Arithmetic-operations">Arithmetic operations</a></h4>

<blockquote>

<p><b>Endian conversion functions</b> do not supply arithmetic
operations, but this is not a concern since this approach uses ordinary C++
arithmetic types to hold values.</p>

<p><b>Endian buffer types</b> do not supply arithmetic operations. Although this
approach avoids unnecessary conversions, it can require additional native variables
to hold intermediate values, which may confuse maintenance programmers.</p>

<p><b>Endian arithmetic types</b> do supply arithmetic operations. They
are very easy to use if lots of arithmetic is involved.</p>

</blockquote>
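<p>The extra variable that a buffer type requires can be shown with a minimal sketch. <code>toy_big_buf16</code>, <code>toy_big_arith16</code>, and the helper functions are hypothetical illustrations, not Boost.Endian&#39;s API. Incrementing a buffer-like value means reading it into a native temporary, modifying the temporary, and storing it back. The arithmetic-like type hides those steps behind its operators.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: decode/encode a 16-bit value as two big-endian bytes.
std::uint16_t load_big16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
void store_big16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v >> 8);
    p[1] = static_cast<unsigned char>(v & 0xFF);
}

// Buffer-like: no arithmetic, only explicit get and set.
struct toy_big_buf16 {
    unsigned char bytes[2];
    std::uint16_t value() const { return load_big16(bytes); }
    void set(std::uint16_t v) { store_big16(bytes, v); }
};

// Arithmetic-like: operators decode, compute natively, and re-encode.
struct toy_big_arith16 {
    unsigned char bytes[2];
    operator std::uint16_t() const { return load_big16(bytes); }
    toy_big_arith16& operator+=(std::uint16_t rhs) {
        store_big16(bytes, static_cast<std::uint16_t>(load_big16(bytes) + rhs));
        return *this;
    }
    toy_big_arith16& operator++() { return *this += 1; }
};

// With a buffer type, the update needs an extra native variable.
void bump_buffer(toy_big_buf16& x) {
    std::uint16_t tmp = x.value();
    ++tmp;
    x.set(tmp);
}

// With an arithmetic type, the same update reads like ordinary C++.
void bump_arith(toy_big_arith16& x) { ++x; }
```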

<h4><a name="Sizes">Sizes</a></h4>

<blockquote>

<p><b>Endianness conversion functions</b> only support 1-, 2-, 4-, and 8-byte
integers. That&#39;s sufficient for many applications.</p>

<p><b>Endian buffer and arithmetic types</b> support 1-, 2-, 3-, 4-, 5-, 6-, 7-, and 8-byte
integers. For an application where memory use or I/O speed is the limiting
factor, using sizes tailored to application needs can be useful.</p>

</blockquote>
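<p>To illustrate why odd sizes matter, here is a minimal sketch of a 3-byte big-endian integer. <code>toy_big_uint24</code>, <code>store_big24</code>, and <code>load_big24</code> are hypothetical names, not Boost.Endian&#39;s implementation. Values below 2<sup>24</sup> occupy three bytes instead of four. The size checks assume a mainstream compiler that adds no padding to a struct holding only a <code>char</code> array.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low 24 bits, most significant byte first.
// Callers must ensure v < 2^24; higher bits are silently dropped.
void store_big24(unsigned char* p, std::uint32_t v) {
    p[0] = static_cast<unsigned char>(v >> 16);
    p[1] = static_cast<unsigned char>(v >> 8);
    p[2] = static_cast<unsigned char>(v);
}

// Hypothetical helper: decode three big-endian bytes into a native value.
std::uint32_t load_big24(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 16) | (std::uint32_t(p[1]) << 8) | std::uint32_t(p[2]);
}

// Three bytes per value in memory and in files.
struct toy_big_uint24 {
    unsigned char bytes[3];
    std::uint32_t value() const { return load_big24(bytes); }
    void set(std::uint32_t v) { store_big24(bytes, v); }
};
```

<p>On such compilers, an array of 1,000 of these values occupies 3,000 bytes, compared with 4,000 bytes for 32-bit integers.</p>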

<h4><a name="Alignments">Alignments</a></h4>

<blockquote>

<p><b>Endianness conversion functions</b> only support aligned integer and
floating-point types. That&#39;s sufficient for most applications.</p>

<p><b>Endian buffer and arithmetic types</b> support both aligned and unaligned
integer and floating-point types. Unaligned types are rarely needed, but when
they are needed they are often very useful, and workarounds are painful. For example,</p>

<blockquote>
  <p>Non-portable code like this:</p>
    <blockquote>
      <pre>struct S {
  uint16_t a;  // big endian
  uint32_t b;  // big endian
} __attribute__ ((packed));</pre>
    </blockquote>
    <p>can be replaced with portable code like this:</p>
    <blockquote>
      <pre>struct S {
  big_uint16_ut a;
  big_uint32_ut b;
};</pre>
    </blockquote>
      </blockquote>

</blockquote>
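<p>The reason byte-array storage removes the need for a packing attribute can be sketched as follows. <code>toy_big_uint16_ut</code>, <code>toy_big_uint32_ut</code>, and <code>parse_s</code> are hypothetical illustrations, not Boost.Endian&#39;s <code>big_uint16_ut</code> and <code>big_uint32_ut</code>. Members that are plain <code>unsigned char</code> arrays have alignment 1, so mainstream compilers insert no padding and the struct matches a 6-byte wire layout. It can also be filled from a misaligned address with <code>std::memcpy</code>.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical unaligned 16-bit big-endian storage (alignment 1).
struct toy_big_uint16_ut {
    unsigned char bytes[2];
    std::uint16_t value() const {
        return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
    }
};

// Hypothetical unaligned 32-bit big-endian storage (alignment 1).
struct toy_big_uint32_ut {
    unsigned char bytes[4];
    std::uint32_t value() const {
        return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
               (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    }
};

// Same layout as the packed struct: a at bytes 0..1, b at bytes 2..5.
struct S {
    toy_big_uint16_ut a;
    toy_big_uint32_ut b;
};

// Copy raw wire bytes into S; 'wire' needs no particular alignment.
S parse_s(const unsigned char* wire) {
    S s;
    std::memcpy(&s, wire, sizeof s);
    return s;
}
```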

<h3><a name="Design-patterns">Design patterns</a></h3>

<p>Applications often traffic in endian data as records or packets containing
multiple endian data elements. For simplicity, we will just call them records.</p>

<p>If the desired endianness differs from the native endianness, a conversion has to be
performed. When should that conversion occur? Three design patterns have
evolved.</p>

<h4><a name="As-needed">Convert only as needed</a> (i.e. lazy)</h4>

<p>This pattern defers conversion to the point in the code where the data
element is actually used.</p>

<p>This pattern is appropriate when the set of endian elements actually used varies
greatly according to record content or other circumstances.</p>
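<p>The lazy pattern might look like the following minimal sketch. <code>wire_record</code>, <code>big32</code>, and <code>payload_size</code> are hypothetical names for illustration. The record is kept exactly as read, and each field is converted at its point of use, only on the code path that needs it.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record with three big-endian 32-bit fields, kept as read.
struct wire_record {
    unsigned char kind[4];
    unsigned char length[4];
    unsigned char checksum[4];
};

// Hypothetical helper: decode four big-endian bytes.
std::uint32_t big32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Lazy conversion: 'kind' is always decoded, 'length' only when kind == 1,
// and 'checksum' is never decoded on this path.
std::uint32_t payload_size(const wire_record& r) {
    if (big32(r.kind) == 1)
        return big32(r.length);
    return 0;
}
```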

<h4><a name="Anticipating-need">Convert in anticipation of need</a></h4>

<p>This pattern performs conversion to native endianness in anticipation of use,
such as immediately after reading records. If needed, conversion to the output
endianness is performed after all possible needs have passed, such as just
before writing records.</p>

<p>One implementation of this pattern is to create a proxy record with
endianness converted to native in a read function, and expose only that proxy to
the rest of the implementation. A write function, if needed, handles the
conversion from native to the desired output endianness.</p>

<p>This pattern is appropriate when all endian elements in a record are
typically used regardless of record content or other circumstances.</p>
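<p>The proxy-record implementation described above can be sketched as follows, assuming a hypothetical 6-byte big-endian layout (a 32-bit <code>id</code> followed by 16-bit <code>flags</code>). <code>record</code>, <code>read_record</code>, and <code>write_record</code> are illustrative names. Every field is converted once on input. The rest of the program works only with ordinary native integers, and every field is converted back once on output.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical native proxy: ordinary integers, always in native endianness.
struct record {
    std::uint32_t id;
    std::uint16_t flags;
};

// Hypothetical wire layout: 4 + 2 big-endian bytes.
using wire_bytes = std::array<unsigned char, 6>;

// Read: convert every field to native immediately after input.
record read_record(const wire_bytes& w) {
    record r;
    r.id = (std::uint32_t(w[0]) << 24) | (std::uint32_t(w[1]) << 16) |
           (std::uint32_t(w[2]) << 8) | std::uint32_t(w[3]);
    r.flags = static_cast<std::uint16_t>((w[4] << 8) | w[5]);
    return r;
}

// Write: convert every field back to big-endian just before output.
wire_bytes write_record(const record& r) {
    wire_bytes w;
    w[0] = static_cast<unsigned char>(r.id >> 24);
    w[1] = static_cast<unsigned char>(r.id >> 16);
    w[2] = static_cast<unsigned char>(r.id >> 8);
    w[3] = static_cast<unsigned char>(r.id);
    w[4] = static_cast<unsigned char>(r.flags >> 8);
    w[5] = static_cast<unsigned char>(r.flags);
    return w;
}
```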

<h4><a name="Convert-generally-as-needed-locally-in-anticipation">Convert
only as needed, except locally in anticipation of need</a></h4>

<p>This pattern generally defers conversion, but performs anticipatory
conversion for specific local needs. Although particularly appropriate when coupled with the endian buffer
or arithmetic types, it also works well with the conversion functions.</p>

<p>Example:</p>

<blockquote>
  <pre>struct data_t
{
  big_int32_t   v1;
  big_int32_t   v2;
  big_int32_t   v3;
};

data_t data;

read(data);

...
++data.v1;
...

int32_t v3_temp = data.v3;  // hoist conversion out of loop

for (int32_t i = 0; i &lt; <i><b>large-number</b></i>; ++i)
{
  ... <i><b>lengthy computation that accesses </b></i>v3_temp<i><b> many times</b></i> ...
}
data.v3 = v3_temp;

write(data);
</pre>
</blockquote>

<p dir="ltr">In general, the above pseudo-code leaves conversion up to the endian
arithmetic type <code>big_int32_t</code>. But to avoid conversion inside the
loop, a temporary is created before the loop is entered, and then used to set
the new value of <code>data.v3</code> after the loop is complete.</p>

<blockquote>

<p dir="ltr">Question: Won&#39;t the compiler&#39;s optimizer hoist the conversion out
of the loop anyhow?</p>

<p dir="ltr">Answer: VC++ 2015 Preview, and probably other compilers, do not, even for
a toy test program. Although the savings are small (two register <code>
<span style="font-size: 85%">bswap</span></code> instructions), the cost might
be significant if the loop is repeated enough times. On the other hand, the
program may be so dominated by I/O time that even a lengthy loop will be
immaterial.</p>

</blockquote>
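<p>The pseudo-code above can be turned into a small runnable sketch under a few assumptions. <code>toy_big_int32</code> is a hypothetical stand-in for <code>big_int32_t</code>, not Boost.Endian&#39;s implementation. <code>read</code> and <code>write</code> are omitted, and <code>v3_temp += 1</code> merely stands in for the lengthy computation. Outside the loop, members convert implicitly where they are used. Inside the loop, only the native temporary is touched.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for big_int32_t: stores four big-endian bytes,
// decodes on every read and re-encodes on every assignment.
class toy_big_int32 {
public:
    toy_big_int32(std::int32_t v = 0) { *this = v; }

    operator std::int32_t() const {
        const std::uint32_t u = (std::uint32_t(b_[0]) << 24) | (std::uint32_t(b_[1]) << 16) |
                                (std::uint32_t(b_[2]) << 8) | std::uint32_t(b_[3]);
        return static_cast<std::int32_t>(u);  // assumes two's complement wraparound
    }

    toy_big_int32& operator=(std::int32_t v) {
        const std::uint32_t u = static_cast<std::uint32_t>(v);
        b_[0] = static_cast<unsigned char>(u >> 24);
        b_[1] = static_cast<unsigned char>(u >> 16);
        b_[2] = static_cast<unsigned char>(u >> 8);
        b_[3] = static_cast<unsigned char>(u);
        return *this;
    }

    const unsigned char* bytes() const { return b_; }

private:
    unsigned char b_[4];
};

struct data_t {
    toy_big_int32 v1;
    toy_big_int32 v2;
    toy_big_int32 v3;
};

void process(data_t& data, int iterations) {
    data.v1 = data.v1 + 1;           // generally: convert as needed (implicitly)
    std::int32_t v3_temp = data.v3;  // locally: hoist the conversion out of the loop
    for (int i = 0; i < iterations; ++i)
        v3_temp += 1;                // stand-in for a lengthy computation on v3_temp
    data.v3 = v3_temp;               // convert back once, after the loop
}
```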

<h3><a name="Use-cases">Use case examples</a></h3>

<h4><a name="Porting-endian-unaware-codebase">Porting endian unaware codebase</a></h4>

<p>An existing codebase runs on big-endian systems. It does not
currently deal with endianness. The codebase needs to be modified so it can run
on little-endian systems under various operating systems. To ease the
transition and protect the value of existing files, external data will continue to
be maintained as big-endian.</p>

<p dir="ltr">The <a href="arithmetic.html">endian
arithmetic approach</a> is recommended to meet these needs. A relatively small
number of header files dealing with binary I/O layouts need to change types. For
example, <code>short</code> or <code>int16_t</code> would change to <code>big_int16_t</code>. No
changes are required for <code>.cpp</code> files.</p>

<h4><a name="Porting-endian-aware-codebase">Porting endian aware codebase</a></h4>

<p>An existing codebase runs on little-endian Linux systems. It already
deals with endianness via
<a href="http://man7.org/linux/man-pages/man3/endian.3.html">Linux-provided
functions</a>. Because of a business merger, the codebase has to be quickly
modified for Windows and possibly other operating systems, while still
supporting Linux. The codebase is reliable and the programmers are all
well aware of endian issues.</p>

<p dir="ltr">These factors all argue for an <a href="conversion.html">endian conversion
approach</a> that just mechanically changes the calls to <code>htobe32</code>,
etc., to <code>boost::endian::native_to_big</code>, etc., and replaces <code>&lt;endian.h&gt;</code>
with <code>&lt;boost/endian/conversion.hpp&gt;</code>.</p>

<h4><a name="Reliability-arithmetic-speed">Reliability and arithmetic-speed</a></h4>

<p>A new, complex, multi-threaded application is to be developed that must run
on little-endian machines, but do big-endian network I/O. The developers believe
computational speed for endian variables is critical but have seen numerous bugs
result from the inability to reason about endian conversion state. They are also
worried that future maintenance changes could inadvertently introduce a lot of
slow conversions if full-blown endian arithmetic types are used.</p>

<p>The <a href="buffers.html">endian buffers</a> approach is made-to-order for
this use case.</p>

<h4><a name="Reliability-ease-of-use">Reliability and ease-of-use</a></h4>

<p>A new, complex, multi-threaded application is to be developed that must run
on little-endian machines, but do big-endian network I/O. The developers believe
computational speed for endian variables is <b>not critical</b> but have seen
numerous bugs result from the inability to reason about endian conversion state.
They are also concerned about ease-of-use both during development and long-term
maintenance.</p>

<p>Removing concern about conversion speed and adding concern about ease-of-use
tips the balance strongly in favor of the <a href="arithmetic.html">endian
arithmetic approach</a>.</p>

<hr>
<p>Last revised:
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->19 January, 2015<!--webbot bot="Timestamp" endspan i-checksum="38903" --></p>
<p>© Copyright Beman Dawes, 2011, 2013, 2014</p>
<p>Distributed under the Boost Software License, Version 1.0. See
<a href="http://www.boost.org/LICENSE_1_0.txt">www.boost.org/LICENSE_1_0.txt</a></p>

<p>&nbsp;</p>

</body>

</html>