[/
    Boost.Optional

    Copyright (c) 2003-2007 Fernando Luis Cacciola Carballal

    Distributed under the Boost Software License, Version 1.0.
    (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt)
]


[section Definitions]

[section Introduction]

This section provides definitions of terms used in the Numeric Conversion library.

[blurb [*Notation]
[_underlined text] denotes terms defined in the C++ standard.

[*bold face] denotes terms defined here but not in the standard.
]

[endsect]

[section Types and Values]

As defined by the [_C++ Object Model] (§1.7) the [_storage] or memory on which a
C++ program runs is a contiguous sequence of [_bytes] where each byte is a
contiguous sequence of bits.

An [_object] is a region of storage (§1.8) and has a type (§3.9).

A [_type] is a discrete set of values.

An object of type `T` has an [_object representation] which is the sequence of
bytes stored in the object (§3.9/4).

An object of type `T` has a [_value representation] which is the set of
bits that determine the ['value] of an object of that type (§3.9/4).
For [_POD] types (§3.9/10), this bitset is given by the object representation,
but not all the bits in the storage need to participate in the value
representation (except for character types): for example, some bits might
be used for padding or there may be trap-bits.

__SPACE__

The [*typed value] that is held by an object is the value which is determined
by its value representation.

An [*abstract value] (untyped) is the conceptual information that is
represented in a type (e.g. the number π).

The [*intrinsic value] of an object is the binary value of the sequence of
unsigned characters which form its object representation.
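
As a minimal sketch (not part of the library), the object representation of a trivially copyable object can be inspected by copying its bytes into `unsigned char` storage. The helper name `object_representation` is hypothetical and is introduced here only for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the sizeof(T) bytes that make up the object
// representation of `obj` into a vector of unsigned char.
// T is assumed to be trivially copyable.
template <class T>
std::vector<unsigned char> object_representation(const T& obj)
{
    std::vector<unsigned char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &obj, sizeof(T));
    return bytes;
}
```

For an `unsigned char` holding `0x5A` the result is the single byte `0x5A`. For wider types the byte order and any padding bits are implementation-specific, which is why the intrinsic value and the typed value need not coincide.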

__SPACE__

['Abstract] values can be [*represented] in a given type.

To [*represent] an abstract value `V` in a type `T` is to obtain a typed value
`v` which corresponds to the abstract value `V`.

The operation is denoted using the `rep()` operator, as in: `v=rep(V)`.
`v` is the [*representation] of `V` in the type `T`.

For example, the abstract value π can be represented in the type
`double` as the `double value M_PI` and in the type `int` as the
`int value 3`.
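
The π example can be sketched in code as follows. Since an abstract value cannot be held by a program, a `double` approximation stands in for it here. The names `pi_as_double` and `rep_in_int` are hypothetical.

```cpp
#include <cassert>

// A double approximation of the abstract value pi. (M_PI, used in the text,
// is a common <cmath> extension rather than standard C++.)
const double pi_as_double = 3.14159265358979323846;

// Hypothetical helper playing the role of rep() for T = int.
// The built-in floating-to-integer conversion discards the fractional part.
inline int rep_in_int(double approximation_of_V)
{
    return static_cast<int>(approximation_of_V);
}
```

With these definitions, `rep_in_int(pi_as_double)` yields the `int` value `3`.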

__SPACE__

Conversely, ['typed values] can be [*abstracted].

To [*abstract] a typed value `v` of type `T` is to obtain the abstract value `V`
whose representation in `T` is `v`.

The operation is denoted using the `abt()` operator, as in: `V=abt(v)`.

`V` is the [*abstraction] of `v` of type `T`.

Abstraction is a purely conceptual operation (it cannot be carried out by a program); it is
defined nevertheless because it will be used to give the definitions in the
rest of this document.

[endsect]

[section C++ Arithmetic Types]

The C++ language defines [_fundamental types] (§3.9.1). The following subsets of
the fundamental types are intended to represent ['numbers]:

[variablelist
[[[_signed integer types] (§3.9.1/2):][
`{signed char, signed short int, signed int, signed long int}`
Can be used to represent general integer numbers (both negative and positive).
]]
[[[_unsigned integer types] (§3.9.1/3):][
`{unsigned char, unsigned short int, unsigned int, unsigned long int}`
Can be used to represent non-negative integer numbers with modulo-arithmetic.
]]
[[[_floating-point types] (§3.9.1/8):][
`{float,double,long double}`
Can be used to represent real numbers.
]]
[[[_integral or integer types] (§3.9.1/7):][
`{{signed integers},{unsigned integers}, bool, char and wchar_t}`
]]
[[[_arithmetic types] (§3.9.1/8):][
`{{integer types},{floating types}}`
]]
]

The integer types are required to have a ['binary] value representation.

Additionally, the signed/unsigned integer types of the same base type
(`short`, `int` or `long`) are required to have the same value representation,
that is:

             int i = -3 ; // suppose value representation is: 10011 (sign bit + 4 magnitude bits)
    unsigned int u =  i ; // u is required to have the same 10011 as its value representation.

In other words, the integer types signed/unsigned X use the same value
representation but a different ['interpretation] of it; that is, their
['typed values] might differ.

Another consequence of this is that the range of non-negative values of signed X is
always a subrange of the range of unsigned X, as required by §3.9.1/3.

[note
Always remember that unsigned types, unlike signed types, have modulo-arithmetic;
that is, they do not overflow.
This means that:

[*-] Always be extra careful when mixing signed/unsigned types.

[*-] Use unsigned types only when you need modulo arithmetic or very large
numbers. Don't use unsigned types just because you intend to deal with
positive values only (you can do this with signed types as well).
]
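
The advice in the note can be illustrated with a short sketch. The function names are hypothetical; the behavior shown follows from the modulo-arithmetic of unsigned types and from the conversion applied to the signed operand in a mixed comparison.

```cpp
#include <cassert>
#include <climits>

// Unsigned arithmetic wraps modulo UINT_MAX + 1 instead of overflowing,
// so 0 - 1 yields UINT_MAX.
inline unsigned int wrap_below_zero()
{
    return 0u - 1u;
}

// In a mixed signed/unsigned comparison the int operand is converted to
// unsigned int first. The cast below spells out that conversion:
// -1 becomes UINT_MAX, so the comparison with 1u is false.
inline bool minus_one_less_than_one_mixed()
{
    int s = -1;
    unsigned int u = 1u;
    return static_cast<unsigned int>(s) < u;
}

// Keeping both operands signed gives the arithmetically expected answer.
inline bool minus_one_less_than_one_signed()
{
    int s = -1;
    int t = 1;
    return s < t;
}
```

Here `wrap_below_zero()` equals `UINT_MAX`, the mixed comparison is `false`, and the all-signed comparison is `true`.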


[endsect]

[section Numeric Types]

This section introduces the following definitions intended to integrate
arithmetic types with user-defined types which behave like numbers.
Some definitions are purposely broad in order to include a vast variety of
user-defined number types.

Within this library, the term ['number] refers to an abstract numeric value.

A type is [*numeric] if:

* It is an arithmetic type, or,
* It is a user-defined type which
    * Represents numeric abstract values (i.e. numbers).
    * Can be converted (either implicitly or explicitly) to/from at least one arithmetic type.
    * Has [link boost_numericconversion.definitions.range_and_precision range] (possibly unbounded)
      and [link boost_numericconversion.definitions.range_and_precision precision] (possibly dynamic or
      unlimited).
    * Provides a specialization of `std::numeric_limits`.

A numeric type is [*signed] if the abstract values it represents include negative numbers.

A numeric type is [*unsigned] if the abstract values it represents exclude negative numbers.

A numeric type is [*modulo] if it has modulo-arithmetic (does not overflow).

A numeric type is [*integer] if the abstract values it represents are whole numbers.

A numeric type is [*floating] if the abstract values it represents are real numbers.

An [*arithmetic value] is the typed value of an arithmetic type.

A [*numeric value] is the typed value of a numeric type.

These definitions simply generalize the standard notions of arithmetic types and
values by introducing a superset called [*numeric]. All arithmetic types and values are
numeric types and values, but not vice versa, since user-defined numeric types are not
arithmetic types.

The following examples clarify the differences between arithmetic and numeric
types (and values):


    // A numeric type which is not an arithmetic type (is user-defined)
    // and which is intended to represent integer numbers (i.e., an 'integer' numeric type)
    class MyInt
    {
    public:
        MyInt ( long long v ) ;
        long long to_builtin();
    } ;
    namespace std {
    template<> class numeric_limits<MyInt> { ... } ;
    }

    // A 'floating' numeric type (double) which is also an arithmetic type (built-in),
    // with a floating numeric value.
    double pi = M_PI ;

    // A 'floating' numeric type with a whole numeric value.
    // NOTE: numeric values are typed values, hence, they are, for instance,
    // integer or floating, despite the value itself being whole or including
    // a fractional part.
    double two = 2.0 ;

    // An integer numeric type with an integer numeric value.
    MyInt i(1234);
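
For reference, a compilable variant of the `MyInt` outline might look like the sketch below. The member bodies, the stored `long long`, and the particular subset of `std::numeric_limits` members are assumptions made for illustration; a production specialization would normally provide every member of `std::numeric_limits`.

```cpp
#include <cassert>
#include <limits>

// A user-defined 'integer' numeric type wrapping a long long (assumed layout).
class MyInt
{
public:
    MyInt(long long v = 0) : value_(v) {}
    long long to_builtin() const { return value_; }
private:
    long long value_;
};

namespace std {
// Partial sketch of the specialization: only a few members are shown.
template <>
class numeric_limits<MyInt>
{
public:
    static const bool is_specialized = true;
    static const bool is_integer     = true;
    static const bool is_signed      = true;
    static const bool is_bounded     = true;
    static const bool is_modulo      = false;
    static MyInt min() { return MyInt(numeric_limits<long long>::min()); }
    static MyInt max() { return MyInt(numeric_limits<long long>::max()); }
};
}
```

With this in place `MyInt` satisfies the conditions listed above: it represents whole numbers, converts to and from `long long`, has a bounded range, and provides a `std::numeric_limits` specialization.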


[endsect]

[section Range and Precision]

Given a number set `N`, some of its elements are representable in a numeric type `T`.

The set of representable values of type `T`, or numeric set of `T`, is a set of numeric
values whose elements are the representation of some subset of `N`.

For example, the interval of `int` values `[INT_MIN,INT_MAX]` is the set of representable
values of type `int`, i.e. the `int` numeric set, and corresponds to the representation
of the elements of the interval of abstract values `[abt(INT_MIN),abt(INT_MAX)]` from
the integer numbers.

Similarly, the interval of `double` values `[-DBL_MAX,DBL_MAX]` is the `double`
numeric set, which corresponds to the subset of the real numbers from `abt(-DBL_MAX)` to
`abt(DBL_MAX)`.

__SPACE__

Let [*`next(x)`] denote the lowest numeric value greater than `x`.

Let [*`prev(x)`] denote the highest numeric value lower than `x`.

Let [*`v=prev(next(V))`] and [*`v=next(prev(V))`] be identities that relate a numeric
typed value `v` with a number `V`.

An ordered pair of numeric values `x`,`y` s.t. `x<y` is [*consecutive] iff `next(x)==y`.

The abstract distance between consecutive numeric values is usually referred to as a
[_Unit in the Last Place], or [*ulp] for short. A ulp is a quantity whose abstract
magnitude is relative to the numeric values it corresponds to: If the numeric set
is not evenly distributed, that is, if the abstract distance between consecutive
numeric values varies along the set (as is the case with the floating-point types),
the magnitude of 1ulp after the numeric value `x` might be (usually is) different
from the magnitude of 1ulp after the numeric value `y` for `x!=y`.
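
For the built-in type `double`, `next()` and `prev()` can be approximated with `std::nextafter` (C++11). The sketch below uses the hypothetical names `next_value`, `prev_value` and `ulp_after`, and assumes a binary floating-point `double`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// next(x): the lowest double greater than x.
inline double next_value(double x)
{
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}

// prev(x): the highest double lower than x.
inline double prev_value(double x)
{
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}

// Magnitude of 1ulp after x: the distance from x to next(x).
inline double ulp_after(double x)
{
    return next_value(x) - x;
}
```

`ulp_after(1.0)` equals `std::numeric_limits<double>::epsilon()`, while `ulp_after(1024.0)` is larger, showing that the magnitude of 1ulp depends on the numeric value it follows.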

Since numbers are inherently ordered, a [*numeric set] of type `T` is an ordered sequence
of numeric values (of type `T`) of the form:

    REP(T)={l,next(l),next(next(l)),...,prev(prev(h)),prev(h),h}

where `l` and `h` are respectively the lowest and highest values of type `T`, called
the boundary values of type `T`.

__SPACE__

A numeric set is discrete. It has a [*size] which is the number of numeric values in the set,
a [*width] which is the abstract difference between the highest and lowest boundary values:
`[abt(h)-abt(l)]`, and a [*density] which is the relation between its size and width:
`density=size/width`.

The integer types have density 1, which means that there are no unrepresentable integer
numbers between `abt(l)` and `abt(h)` (i.e. there are no gaps). On the other hand,
floating types have density much smaller than 1, which means that there are real numbers
unrepresented between consecutive floating values (i.e. there are gaps).

__SPACE__

The interval of [_abstract values] `[abt(l),abt(h)]` is the range of the type `T`,
denoted `R(T)`.

A range is a set of abstract values and not a set of numeric values. In other
documents, such as the C++ standard, the word `range` is ['sometimes] used as a synonym
for `numeric set`, that is, as the ordered sequence of numeric values from `l` to `h`.
In this document, however, a range is an abstract interval which subtends the
numeric set.

For example, the sequence `[-DBL_MAX,DBL_MAX]` is the numeric set of the type
`double`, and the real interval `[abt(-DBL_MAX),abt(DBL_MAX)]` is its range.

Notice, for instance, that the range of a floating-point type is ['continuous]
unlike its numeric set.

This definition was chosen because:

* [*(a)] The discrete set of numeric values is already given by the numeric set.
* [*(b)] Abstract intervals are easier to compare and overlap since only boundary
values need to be considered.

This definition allows for a concise definition of `subranged` as given in the last section.

The width of a numeric set, as defined, is exactly equivalent to the width of a range.

__SPACE__

The [*precision] of a type is given by the width or density of the numeric set.

For integer types, which have density 1, the precision is conceptually equivalent
to the range and is determined by the number of bits used in the value representation:
The higher the number of bits, the bigger the size of the numeric set, the wider the
range, and the higher the precision.

For floating types, which have density <<1, the precision is given not by the width
of the range but by the density. In a typical implementation, the range is determined
by the number of bits used in the exponent, and the precision by the number of bits
used in the mantissa (giving the maximum number of significant digits that can be
exactly represented). The higher the number of exponent bits the wider the range,
while the higher the number of mantissa bits, the higher the precision.
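
The quantities that govern precision and range for the built-in types can be queried through `std::numeric_limits`. The wrapper names below are hypothetical; `digits` and `max_exponent` are standard members.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Number of radix digits that can be represented without change.
// For floating types this is the mantissa size, which governs precision.
// For integer types it is the number of non-sign bits.
template <class T>
int mantissa_digits()
{
    return std::numeric_limits<T>::digits;
}

// Largest exponent of a floating type, which governs the width of its range.
template <class T>
int max_exponent()
{
    return std::numeric_limits<T>::max_exponent;
}
```

On a typical implementation `double` reports more mantissa digits and a larger maximum exponent than `float`, that is, both a higher precision and a wider range.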

[endsect]

[section Exact, Correctly Rounded and Out-Of-Range Representations]

Given an abstract value `V` and a type `T` with its corresponding range `[abt(l),abt(h)]`:

If `V < abt(l)` or `V > abt(h)`, `V` is [*not representable] (cannot be represented) in
the type `T`, or, equivalently, its representation in the type `T` is [*out of range],
or [*overflows].

* If `V < abt(l)`, the [*overflow is negative].
* If `V > abt(h)`, the [*overflow is positive].

If `V >= abt(l)` and `V <= abt(h)`, `V` is [*representable] (can be represented) in the
type `T`, or, equivalently, its representation in the type `T` is [*in range], or
[*does not overflow].
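
The three cases above can be sketched as a small classification function. This is an illustration only, not the library's interface: the enum and function names are hypothetical, and `long double` stands in for abstract values, which is only adequate when the boundary values of `T` are exactly representable in `long double`.

```cpp
#include <cassert>
#include <limits>

enum RangeCheck { InRange, NegativeOverflow, PositiveOverflow };

// Classify V against the range [abt(l), abt(h)] of the type T.
template <class T>
RangeCheck check_range(long double V)
{
    const long double lo = static_cast<long double>(std::numeric_limits<T>::lowest());
    const long double hi = static_cast<long double>(std::numeric_limits<T>::max());
    if (V < lo) return NegativeOverflow;   // V < abt(l)
    if (V > hi) return PositiveOverflow;   // V > abt(h)
    return InRange;                        // abt(l) <= V <= abt(h)
}
```

For instance, `check_range<unsigned char>(-1.0L)` reports a negative overflow, since no unsigned type can represent a negative number without resorting to modulo-arithmetic.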

Notice that a numeric type, such as a C++ unsigned type, can define that any `V` does
not overflow by always representing not `V` itself but the abstract value
`U = [ V % (abt(h)+1) ]`, which is always in range.

Given an abstract value `V` represented in the type `T` as `v`, the [*roundoff] error
of the representation is the abstract difference: `(abt(v)-V)`.

Notice that a representation is an ['operation], hence, the roundoff error corresponds
to the representation operation and not to the numeric value itself
(i.e. numeric values do not have any error themselves).

* If the roundoff is 0, the representation is [*exact], and `V` is exactly representable
in the type `T`.
* If the roundoff is not 0, the representation is [*inexact], and `V` is inexactly
representable in the type `T`.
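
A rough illustration of the roundoff for `T = float` follows. The function name is hypothetical, and a `double` stands in for the abstract value `V`, so the sketch only covers abstract values that a `double` holds exactly.

```cpp
#include <cassert>

// Roundoff of representing V in float: abt(v) - V.
inline double roundoff_in_float(double V)
{
    const float v = static_cast<float>(V);   // v = rep(V) in the type float
    return static_cast<double>(v) - V;       // abt(v) - V
}
```

On an implementation with binary floating point, `roundoff_in_float(0.5)` is zero (the representation is exact), whereas `roundoff_in_float(0.1)` is non-zero (the representation is inexact).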

If a representation `v` in a type `T` (either exact or inexact) is any of the adjacents
of `V` in that type, that is, if `v==prev` or `v==next`, the representation is
faithfully rounded. If the choice between `prev` and `next` matches a given
[*rounding direction], it is [*correctly rounded].

All exact representations are correctly rounded, but not all inexact representations are.
In particular, C++ requires numeric conversions (described below) and the result of
arithmetic operations (not covered by this document) to be correctly rounded, but
batch operations propagate roundoff, thus final results are usually incorrectly
rounded, that is, the numeric value `r` which is the computed result is neither of
the adjacents of the abstract value `R` which is the theoretical result.

Because a correctly rounded representation is always one of the adjacents of the abstract
value being represented, the roundoff is guaranteed to be at most 1ulp.

The following examples summarize the given definitions. Consider:

* A numeric type `Int` representing integer numbers with a
['numeric set]: `{-2,-1,0,1,2}` and
['range]: `[-2,2]`
* A numeric type `Cardinal` representing integer numbers with a
['numeric set]: `{0,1,2,3,4,5,6,7,8,9}` and
['range]: `[0,9]` (no modulo-arithmetic here)
* A numeric type `Real` representing real numbers with a
['numeric set]: `{-2.0,-1.5,-1.0,-0.5,-0.0,+0.0,+0.5,+1.0,+1.5,+2.0}` and
['range]: `[-2.0,+2.0]`
* A numeric type `Whole` representing real numbers with a
['numeric set]: `{-2.0,-1.0,0.0,+1.0,+2.0}` and
['range]: `[-2.0,+2.0]`

First, notice that the types `Real` and `Whole` both represent real numbers,
have the same range, but different precision.

* The integer number `1` (an abstract value) can be exactly represented
in any of these types.
* The integer number `-1` can be exactly represented in `Int`, `Real` and `Whole`,
but cannot be represented in `Cardinal`, yielding negative overflow.
* The real number `1.5` can be exactly represented in `Real`, and inexactly
represented in the other types.
* If `1.5` is represented as either `1` or `2` in any of the types (except `Real`),
the representation is correctly rounded.
* If `0.5` is represented as `+1.5` in the type `Real`, it is incorrectly rounded.
* `(-2.0,-1.5)` are the `Real` adjacents of any real number in the interval
`[-2.0,-1.5]`, yet there are no `Real` adjacents for `x < -2.0`, nor for `x > +2.0`.
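
The `Real` example can be played out in code. In the sketch below the numeric set is modeled as a sorted `std::vector<double>` (the two zeros are merged into one, since they compare equal), and all function names are hypothetical. When `V` is itself a member of the set, this sketch reports `V` as both adjacents, which is a simplification chosen here.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// The numeric set of the toy type `Real`, in increasing order.
inline std::vector<double> real_numeric_set()
{
    const double values[] = { -2.0, -1.5, -1.0, -0.5, 0.0, +0.5, +1.0, +1.5, +2.0 };
    return std::vector<double>(values, values + sizeof(values) / sizeof(values[0]));
}

// Find the adjacents (prev, next) of V in a sorted numeric set.
// Returns false when V is out of range, in which case there are no adjacents.
inline bool adjacents(const std::vector<double>& set, double V,
                      std::pair<double, double>& out)
{
    if (set.empty() || V < set.front() || V > set.back())
        return false;
    // First element not less than V; it exists because V <= set.back().
    std::vector<double>::const_iterator it =
        std::lower_bound(set.begin(), set.end(), V);
    if (*it == V)
        out = std::make_pair(V, V);          // exactly representable
    else
        out = std::make_pair(*(it - 1), *it); // strictly between two members
    return true;
}

// A representation v of V is faithfully rounded if it is one of the adjacents.
inline bool is_faithfully_rounded(const std::vector<double>& set, double V, double v)
{
    std::pair<double, double> adj;
    return adjacents(set, V, adj) && (v == adj.first || v == adj.second);
}
```

With the `Real` set, the adjacents of `-1.7` are `(-2.0,-1.5)`, `2.5` has no adjacents, and representing `0.5` as `+1.5` is not faithfully rounded, matching the bullets above.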

[endsect]

[section Standard (numeric) Conversions]

The C++ language defines [_Standard Conversions] (§4) some of which are conversions
between arithmetic types.

These are [_Integral promotions] (§4.5), [_Integral conversions] (§4.7),
[_Floating point promotions] (§4.6), [_Floating point conversions] (§4.8) and
[_Floating-integral conversions] (§4.9).

In the sequel, integral and floating point promotions are called [*arithmetic promotions],
and these plus integral, floating-point and floating-integral conversions are called
[*arithmetic conversions] (i.e., promotions are conversions).

Promotions, both Integral and Floating point, are ['value-preserving], which means that
the typed value is not changed with the conversion.

In the sequel, consider a source typed value `s` of type `S`, the source abstract
value `N=abt(s)`, a destination type `T`; and whenever possible, a result typed value
`t` of type `T`.


Integer to integer conversions are always defined:

* If `T` is unsigned, the abstract value which is effectively represented is not
`N` but `M=[ N % ( abt(h) + 1 ) ]`, where `h` is the highest unsigned typed
value of type `T`.
* If `T` is signed and `N` is not directly representable, the result `t` is
[_implementation-defined], which means that the C++ implementation is required to
produce a value `t` even if it is totally unrelated to `s`.
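
The unsigned case can be checked directly against the formula. In this sketch (hypothetical names, `T = unsigned char`, source type `long`) the `%` in the formula is read as the mathematical modulo, which never yields a negative result, so the C++ remainder is normalized accordingly.

```cpp
#include <cassert>
#include <climits>

// The built-in integer-to-unsigned conversion.
inline unsigned char to_uchar(long N)
{
    return static_cast<unsigned char>(N);
}

// M = N % (abt(h) + 1) with h = UCHAR_MAX, as a non-negative modulo.
inline long modulo_formula(long N)
{
    const long m = static_cast<long>(UCHAR_MAX) + 1;
    return ((N % m) + m) % m;   // the C++ % may yield a negative remainder
}
```

Both functions agree for any `N`; in particular `to_uchar(-1L)` is `UCHAR_MAX`.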


Floating to Floating conversions are defined only if `N` is representable;
if it is not, the conversion has [_undefined behavior].

* If `N` is exactly representable, `t` is required to be the exact representation.
* If `N` is inexactly representable, `t` is required to be one of the two
adjacents, with an implementation-defined choice of rounding direction;
that is, the conversion is required to be correctly rounded.


Floating to Integer conversions represent not `N` but `M=trunc(N)`, where
`trunc()` is to truncate: i.e. to remove the fractional part, if any.

* If `M` is not representable in `T`, the conversion has [_undefined behavior]
(unless `T` is `bool`, see §4.12).
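
Because an out-of-range floating to integer conversion has undefined behavior, the range has to be checked ['before] converting. The following is a minimal sketch for `double` to `int`, with a hypothetical function name; it assumes that `INT_MIN - 1` and `INT_MAX + 1` are exactly representable in `double`, which holds for a 32-bit `int` and a 53-bit mantissa.

```cpp
#include <cassert>
#include <climits>

// Stores trunc(d) in `out` and returns true when the truncated value is
// representable in int; returns false and leaves `out` untouched otherwise.
inline bool checked_to_int(double d, int& out)
{
    // trunc(d) is representable iff INT_MIN - 1 < d < INT_MAX + 1.
    // A NaN fails both comparisons and is therefore rejected as well.
    const double lo = static_cast<double>(INT_MIN) - 1.0;
    const double hi = static_cast<double>(INT_MAX) + 1.0;
    if (!(d > lo && d < hi))
        return false;
    out = static_cast<int>(d);   // truncates toward zero
    return true;
}
```

`2.9` converts to `2` and `-2.9` converts to `-2`, while `1.0e100` is rejected instead of invoking undefined behavior.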


Integer to Floating conversions are always defined.

* If `N` is exactly representable, `t` is required to be the exact representation.
* If `N` is inexactly representable, `t` is required to be one of the
two adjacents, with an implementation-defined choice of rounding direction;
that is, the conversion is required to be correctly rounded.
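
As an illustration, assume a `float` with a 24-bit mantissa (as in IEEE 754 single precision). Then `16777216` (2 to the 24th power) is exactly representable, but `16777217` is not, and its conversion must yield one of the two adjacents. The function name is hypothetical.

```cpp
#include <cassert>

// The built-in integer-to-floating conversion.
inline float to_float(long n)
{
    return static_cast<float>(n);
}
```

Under the stated assumption, `to_float(16777217L)` is either `16777216.0f` or `16777218.0f`, depending on the implementation-defined rounding direction.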

[endsect]

[section Subranged Conversion Direction, Subtype and Supertype]

Given a source type `S` and a destination type `T`, there is a
[*conversion direction] denoted: `S->T`.

For any two ranges the following ['range relation] can be defined:
A range `X` can be ['entirely contained] in a range `Y`, in which case
it is said that `X` is enclosed by `Y`.

[: [*Formally:] `R(S)` is enclosed by `R(T)` iff `(R(S) intersection R(T)) == R(S)`.]

If the source type range, `R(S)`, is not enclosed in the target type range,
`R(T)`; that is, if `(R(S) & R(T)) != R(S)`, the conversion direction is said
to be [*subranged], which means that `R(S)` is not entirely contained in `R(T)`
and therefore there is some portion of the source range which falls outside
the target range. In other words, if a conversion direction `S->T` is subranged,
there are values in `S` which cannot be represented in `T` because they are
out of range.
Notice that for `S->T`, the adjective subranged applies to `T`.

Examples:

Given the following numeric types all representing real numbers:

* `X` with numeric set `{-2.0,-1.0,0.0,+1.0,+2.0}` and range `[-2.0,+2.0]`
* `Y` with numeric set `{-2.0,-1.5,-1.0,-0.5,0.0,+0.5,+1.0,+1.5,+2.0}` and range `[-2.0,+2.0]`
* `Z` with numeric set `{-1.0,0.0,+1.0}` and range `[-1.0,+1.0]`

For:

[variablelist
[[(a) X->Y:][
`R(X) & R(Y) == R(X)`, then `X->Y` is not subranged.
Thus, all values of type `X` are representable in the type `Y`.
]]
[[(b) Y->X:][
`R(Y) & R(X) == R(Y)`, then `Y->X` is not subranged.
Thus, all values of type `Y` are representable in the type `X`, but in this case,
some values are ['inexactly] representable (all the halves).
(note: it is to permit this case that a range is an interval of abstract values and
not an interval of typed values)
]]
[[(c) X->Z:][
`R(X) & R(Z) != R(X)`, then `X->Z` is subranged.
Thus, some values of type `X` are not representable in the type `Z`, they fall
out of range `(-2.0 and +2.0)`.
]]
]

It is possible that `R(S)` is not enclosed by `R(T)`, while neither is `R(T)` enclosed
by `R(S)`; for example, `UNSIG=[0,255]` is not enclosed by `SIG=[-128,127]`;
neither is `SIG` enclosed by `UNSIG`.
This implies that it is possible that a conversion direction is subranged both ways.
This occurs when a mixture of signed/unsigned types is involved and indicates that
in both directions there are values which can fall out of range.

Given the range relation (subranged or not) of a conversion direction `S->T`, it
is possible to classify `S` and `T` as [*supertype] and [*subtype]:
If the conversion is subranged, which means that `T` cannot represent all possible
values of type `S`, `S` is the supertype and `T` the subtype; otherwise, `T` is the
supertype and `S` the subtype.

For example:

[: `R(float)=[-FLT_MAX,FLT_MAX]` and `R(double)=[-DBL_MAX,DBL_MAX]` ]

If `FLT_MAX < DBL_MAX`:

* `double->float` is subranged and `supertype=double`, `subtype=float`.
* `float->double` is not subranged and `supertype=double`, `subtype=float`.

Notice that while `double->float` is subranged, `float->double` is not,
which yields the same supertype,subtype for both directions.

Now consider:

[: `R(int)=[INT_MIN,INT_MAX]` and `R(unsigned int)=[0,UINT_MAX]` ]

A C++ implementation is required to have `UINT_MAX > INT_MAX` (§3.9.1/3), so:

* `int->unsigned` is subranged (negative values fall out of range)
and `supertype=int`, `subtype=unsigned`.
* `unsigned->int` is ['also] subranged (high positive values fall out of range)
and `supertype=unsigned`, `subtype=int`.

In this case, the conversion is subranged in both directions and the
supertype,subtype pairs are not invariant (under inversion of direction).
This indicates that none of the types can represent all the values of the other.
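
The range relation can be approximated for built-in types by comparing boundary values. The sketch below is an illustration under stated assumptions and not the library's implementation: the names are hypothetical, and `long double` stands in for abstract values, so it is only reliable for types whose boundary values `long double` represents exactly (as is the case for the types used in the examples above on common platforms).

```cpp
#include <cassert>
#include <limits>

// S->T is subranged when R(S) is not enclosed by R(T), that is, when some
// boundary of S falls outside [abt(l_T), abt(h_T)].
template <class S, class T>
bool is_subranged_sketch()
{
    const long double s_lo = static_cast<long double>(std::numeric_limits<S>::lowest());
    const long double s_hi = static_cast<long double>(std::numeric_limits<S>::max());
    const long double t_lo = static_cast<long double>(std::numeric_limits<T>::lowest());
    const long double t_hi = static_cast<long double>(std::numeric_limits<T>::max());
    return s_lo < t_lo || s_hi > t_hi;
}

enum Supertype { SourceIsSupertype, TargetIsSupertype };

// If S->T is subranged, S is the supertype; otherwise T is.
template <class S, class T>
Supertype supertype_of()
{
    return is_subranged_sketch<S, T>() ? SourceIsSupertype : TargetIsSupertype;
}
```

With `FLT_MAX < DBL_MAX`, `double->float` is subranged and `float->double` is not, so `double` is the supertype in both directions. For `int` and `unsigned int` both directions are subranged, so the supertype changes with the direction.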

When the supertype is the same for both `S->T` and `T->S`, it is effectively
indicating a type which can represent all the values of the subtype.
Consequently, if a conversion `X->Y` is not subranged, but the opposite `(Y->X)` is,
so that the supertype is always `Y`, it is said that the direction `X->Y` is [*correctly
rounded value preserving], meaning that all such conversions are guaranteed to
produce results in range and correctly rounded (even if inexact).
For example, all integer to floating conversions are correctly rounded value preserving.

[endsect]

[endsect]

