ceph/src/boost/libs/icl/doc/projects.qbk

   1 [/
   2     Copyright (c) 2009-2009 Joachim Faulhaber
   3
   4     Distributed under the Boost Software License, Version 1.0.
   5     (See accompanying file LICENSE_1_0.txt or copy at
   6     http://www.boost.org/LICENSE_1_0.txt)
   7 ]
   8
   9 [section Projects]
  10
  11 ['*Projects*] are examples on the usage of interval containers
  12 that go beyond small toy snippets of code. The code presented
  13 here addresses more serious applications that approach the
  14 quality of real world programming. At the same time it aims to
  15 guide the reader more deeply into various aspects of
  16 the library. In order not to overburden the reader with implementation
  17 details, the code in ['*projects*] tries to be ['*minimal*]. It has a focus on
  18 the main aspects of the projects and is not intended to be complete
  19 and mature like the library code itself. Cause it's minimal,
  20 project code lives in `namespace mini`.
  21
  22 [/
  23 [section Overview]
  24
  25 [table Overview over Icl Examples
  26     [[level]   [example] [classes] [features]]
  27     [[basic]   [[link boost_icl.examples.large_bitset Large Bitset]]
  28         [__itv_map__][The most simple application of an interval map:
  29                       Counting the overlaps of added intervals.]]
  30 ]
  31
  32 [endsect][/ Overview IN Projects]
  33 ]
  34
  35 [section Large Bitset][/ IN Projects]
  36
  37 Bitsets are just sets. Sets of unsigned integrals,
  38 to be more precise.
  39 The prefix ['*bit*] usually only indicates,
  40 that the representation of those sets is organized in a compressed
  41 form that exploits the fact, that we can switch on an off single
  42 bits in machine words. Bitsets are therefore known to be very small
  43 and thus efficient.
  44 The efficiency of bitsets is usually coupled to the
  45 precondition that the range of values of elements
  46 is relatively small, like [0..32) or [0..64), values that can
  47 be typically represented in single or a small number of
  48 machine words. If we wanted to represent a set containing
  49 two values {1, 1000000}, we would be much better off using
  50 other sets like e.g. an `std::set`.
  51
  52 Bitsets compress well, if elements spread over narrow ranges
  53 only. Interval sets compress well, if many elements are clustered
  54 over intervals. They can span large sets very efficiently then.
  55 In project ['*Large Bitset*] we want to ['*combine the bit compression
  56 and the interval compression*] to achieve a set implementation,
  57 that is capable of spanning large chunks of contiguous elements
  58 using intervals and also to represent more narrow ['nests] of varying
  59 bit sequences using bitset compression.
  60 As we will see, this can be achieved using only a small
  61 amount of code because most of the properties we need
  62 are provided by an __itv_map__ of `bitsets`:
  63
  64 ``
  65 typedef interval_map<IntegralT, SomeBitSet<N>, partial_absorber,
  66                      std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap;
  67 ``
  68
  69 Such an `IntervalBitmap` represents `k*N` bits for every segment.
  70 ``
  71 [a, a+k)->'1111....1111' // N bits associated: Represents a total of k*N bits.
  72 ``
  73
  74 For the interval `[a, a+k)` above all bits are set.
  75 But we can also have individual ['nests] or ['clusters]
  76 of bitsequences.
  77
  78 ``
  79 [b,  b+1)->'01001011...1'
  80 [b+1,b+2)->'11010001...0'
  81 . . .
  82 ``
  83
  84 and we can span intervals of equal bit sequences that represent
  85 periodic patterns.
  86
  87 ``
  88 [c,d)->'010101....01'  // Every second bit is set              in range [c,d)
  89 [d,e)->'001100..0011'  // Every two bits alterate              in range [d,e)
  90 [e,f)->'bit-sequence'  // 'bit-sequence' reoccurs every N bits in range [e,f)
  91 ``
  92
  93 An `IntervalBitmap` can represent
  94 `N*(2^M)` elements, if `M` is the number of bits of the
  95 integral type `IntegralT`. Unlike bitsets, that usually
  96 represent ['*unsigned*] integral numbers, large_bitset may
  97 range over negative numbers as well.
  98 There are fields where such large
  99 bitsets implementations are needed. E.g. for the compact
 100 representation of large file allocation tables.
 101 What remains to be done for project *Large Bitset* is
 102 to code a wrapper `class large_bitset` around `IntervalBitmap`
 103 so that `large_bitset` looks and feels like a usual
 104 set class.
 105
 106 [section Using large_bitset]
 107
 108 To quicken your appetite for a look at the implementation
 109 here are a few use cases first. Within the examples that follow,
 110 we will use `nat`[^['k]] for unsigned integrals
 111 and `bits`[^['k]] for bitsets containing [^['k]] bits.
 112
 113 Let's start large.
 114 In the first example . . .
 115
 116 [import ../example/large_bitset_/large_bitset.cpp]
 117 [large_bitset_test_large_set_all]
 118
 119 . . . we are testing the limits. First we set all bits and
 120 then we switch off the very last bit.
 121
 122 [large_bitset_test_large_erase_last]
 123
 124 Program output (/a little beautified/):
 125 ``
 126 ----- Test function test_large() -----------------------------------------------
 127 We have just turned on the awesome amount of 18,446,744,073,709,551,616 bits ;-)
 128 [                 0, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111111
 129 ---- Let's swich off the very last bit -----------------------------------------
 130 [                 0, 288230376151711743) -> 1111111111111111111111111111111111111111111111111111111111111111
 131 [288230376151711743, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111110
 132 ---- Venti is plenty ... let's do something small: A tall ----------------------
 133 ``
 134
 135 More readable is a smaller version of `large_bitset`.
 136 In function `test_small()` we apply a few more operations . . .
 137
 138 [large_bitset_test_small]
 139
 140 . . . producing this output:
 141 ``
 142 ----- Test function test_small() -----------
 143 -- Switch on all bits in range [0,64] ------
 144 [0,8)->11111111
 145 [8,9)->10000000
 146 --------------------------------------------
 147 -- Turn off bits: 25,27,28 -----------------
 148 [0,3)->11111111
 149 [3,4)->10100111
 150 [4,8)->11111111
 151 [8,9)->10000000
 152 --------------------------------------------
 153 -- Flip bits in range [24,30) --------------
 154 [0,3)->11111111
 155 [3,4)->01011011
 156 [4,8)->11111111
 157 [8,9)->10000000
 158 --------------------------------------------
 159 -- Remove the first 10 bits ----------------
 160 [1,2)->00111111
 161 [2,3)->11111111
 162 [3,4)->01011011
 163 [4,8)->11111111
 164 [8,9)->10000000
 165 -- Remove even bits in range [0,72) --------
 166 [1,2)->00010101
 167 [2,3)->01010101
 168 [3,4)->01010001
 169 [4,8)->01010101
 170 --    Set odd  bits in range [0,72) --------
 171 [0,9)->01010101
 172 --------------------------------------------
 173 ``
 174
 175 Finally, we present a little /picturesque/ example,
 176 that demonstrates that `large_bitset` can also serve
 177 as a self compressing bitmap, that we can 'paint' with.
 178
 179 [large_bitset_test_picturesque]
 180
 181 Note that we have two `large_bitsets` for the /outline/
 182 and the /interior/. Both parts are compressed but we can
 183 compose both by `operator +`, because the right /positions/
 184 are provided. This is the program output:
 185
 186 ``
 187 ----- Test function test_picturesque() -----
 188 -------- empty face:       3 intervals -----
 189 ********
 190 *      *
 191 *      *
 192 *      *
 193 *      *
 194  ******
 195 -------- compressed smile: 2 intervals -----
 196   *  *
 197   ****
 198 -------- staring bitset:   6 intervals -----
 199 ********
 200 *      *
 201 * *  * *
 202 *      *
 203 * **** *
 204  ******
 205 --------------------------------------------
 206 ``
 207
 208 So, may be you are curious how this class template
 209 is coded on top of __itv_map__ using only about 250 lines
 210 of code. This is shown in the sections that follow.
 211
 212 [endsect][/ Usage of a large_bitset]
 213
 214 [section The interval_bitmap]
 215
 216 To begin, let's look at the basic data type again, that
 217 will be providing the major functionality:
 218
 219 ``
 220 typedef interval_map<DomainT, BitSetT, partial_absorber,
 221                      std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap;
 222 ``
 223
 224 `DomainT` is supposed to be an integral
 225 type, the bitset type `BitSetT` will be a wrapper class around an
 226 unsigned integral type. `BitSetT` has to implement bitwise operators
 227 that will be called by the functors `inplace_bit_add<BitSetT>`
 228 and `inplace_bit_and<BitSetT>`.
 229 The type trait of interval_map is `partial_absorber`, which means
 230 that it is /partial/ and that empty `BitSetTs` are not stored in
 231 the map. This is desired and keeps the `interval_map` minimal, storing
 232 only bitsets, that contain at least one bit switched on.
 233 Functor template `inplace_bit_add` for parameter `Combine` indicates that we
 234 do not expect `operator +=` as addition but the bitwise operator
 235 `|=`. For template parameter `Section` which is instaniated by
 236 `inplace_bit_and` we expect the bitwise `&=` operator.
 237
 238 [endsect][/ section The interval_bitmap]
 239
 240 [section A class implementation for the bitset type]
 241
 242 The code of the project is enclosed in a `namespace mini`.
 243 The name indicates, that the implementation is a /minimal/
 244 example implementation. The name of the bitset class will
 245 be `bits` or `mini::bits` if qualified.
 246
 247 To be used as a codomain parameter of class template __itv_map__,
 248 `mini::bits` has to implement all the functions that are required
 249 for a codomain_type in general, which are the default constructor `bits()`
 250 and an equality `operator==`. Moreover `mini::bits` has to implement operators
 251 required by the instantiations for parameter `Combine` and `Section`
 252 which are `inplace_bit_add` and `inplace_bit_and`. From functors
 253 `inplace_bit_add` and `inplace_bit_and` there are inverse functors
 254 `inplace_bit_subtract` and `inplace_bit_xor`. Those functors
 255 use operators `|= &= ^=` and `~`. Finally if we want to apply
 256 lexicographical and subset comparison on large_bitset, we also
 257 need an `operator <`. All the operators that we need can be implemented
 258 for `mini::bits` on a few lines:
 259
 260 [import ../example/large_bitset_/bits.hpp]
 261 [mini_bits_class_bits]
 262
 263 Finally there is one important piece of meta information, we have
 264 to provide: `mini::bits` has to be recognized as a `Set` by the
 265 icl code. Otherwise we can not exploit the fact that a map of
 266 sets is model of `Set` and the resulting large_bitset would not
 267 behave like a set. So we have to say that `mini::bits` shall be sets:
 268
 269 [mini_bits_is_set]
 270
 271 This is done by adding a partial template specialization to
 272 the type trait template `icl::is_set`.
 273 For the extension of this type trait template and the result
 274 values of inclusion_compare we need these `#includes` for the
 275 implementation of `mini::bits`:
 276
 277 [mini_bits_includes]
 278
 279 [endsect][/ A class implementation for the bitset type]
 280
 281 [section Implementation of a large bitset]
 282
 283 Having finished our `mini::bits` implementation, we can start to
 284 code the wrapper class that hides the efficient interval map of mini::bits
 285 and exposes a simple and convenient set behavior to the world of users.
 286
 287 Let's start with the required `#include`s this time:
 288
 289 [import ../example/large_bitset_/large_bitset.hpp]
 290 [large_bitset_includes]
 291
 292 Besides `boost/icl/interval_map.hpp` and `bits.hpp` the most important
 293 include here is `boost/operators.hpp`. We use this library in order
 294 to further minimize the code and to provide pretty extensive operator
 295 functionality using very little code.
 296
 297 For a short and concise naming of the most important unsigned integer
 298 types and the corresponding `mini::bits` we define this:
 299
 300 [large_bitset_natural_typedefs]
 301
 302 [section large_bitset: Public interface][/ ------------------------------------]
 303
 304 And now let's code `large_bitset`.
 305
 306 [large_bitset_class_template_header]
 307
 308 The first template parameter `DomainT` will be instantiated with
 309 an integral type that defines the kind of numbers that can
 310 be elements of the set. Since we want to go for a large set we
 311 use `nat64` as default which is a 64 bit unsigned integer ranging
 312 from `0` to `2^64-1`. As bitset parameter we also choose a 64-bit
 313 default. Parameters `Combine` and `Interval` are necessary to
 314 be passed to dependent type expressions. An allocator can be
 315 chosen, if desired.
 316
 317 The nested list of private inheritance contains groups of template
 318 instantiations from
 319 [@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator],
 320 that provides derivable operators from more fundamental once.
 321 Implementing the fundamental operators, we get the derivable ones
 322 /for free/. Below is a short overview of what we get using Boost.Operator,
 323 where __S stands for `large_bitset`, __i for it's `interval_type`
 324 and __e for it's `domain_type` or `element_type`.
 325
 326 [table
 327 [[Group                    ][fundamental]      [derivable     ]]
 328 [[Equality, ordering       ][`==`]             [`!=`          ]]
 329 [[                         ][`<` ]             [`> <= >=`     ]]
 330 [[Set operators (__S x __S)][`+= |= -= &= ^=` ][`+ | - & ^`   ]]
 331 [[Set operators (__S x __e)][`+= |= -= &= ^=` ][`+ | - & ^`   ]]
 332 [[Set operators (__S x __i)][`+= |= -= &= ^=` ][`+ | - & ^`   ]]
 333 ]
 334
 335 There is a number of associated types
 336
 337 [large_bitset_associated_types]
 338
 339 most importantly the implementing `interval_bitmap_type` that is used
 340 for the implementing container.
 341
 342 [large_bitmap_impl_map]
 343
 344 In order to use
 345 [@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator]
 346 we have to implement the fundamental operators as class members. This can be
 347 done quite schematically.
 348
 349 [large_bitset_operators]
 350
 351 As we can see, the seven most important operators that work on the
 352 class type `large_bitset` can be directly implemented by propagating
 353 the operation to the implementing `_map`
 354 of type `interval_bitmap_type`. For the operators that work on segment and
 355 element types, we use member functions `add`, `subtract`, `intersect` and
 356 `flip`. As we will see only a small amount of adaper code is needed
 357 to couple those functions with the functionality of the implementing
 358 container.
 359
 360 Member functions `add`, `subtract`, `intersect` and `flip`,
 361 that allow to combine ['*intervals*] to `large_bitsets` can
 362 be uniformly implemented using a private function
 363 `segment_apply` that applies /addition/, /subtraction/,
 364 /intersection/ or /symmetric difference/, after having
 365 translated the interval's borders into the right bitset
 366 positions.
 367
 368 [large_bitset_fundamental_functions]
 369
 370 In the sample programs, that we will present to demonstrate
 371 the capabilities of `large_bitset` we will need a few additional
 372 functions specifically output functions in two different
 373 flavors.
 374
 375 [large_bitset_demo_functions]
 376
 377 * The first one, `show_segments()` shows the container
 378 content as it is implemented, in the compressed form.
 379
 380 * The second function `show_matrix` shows the complete
 381 matrix of bits that is represented by the container.
 382
 383 [endsect][/ large_bitset: Public interface]
 384 [section large_bitset: Private implementation]  [/ -------------------------------------]
 385
 386 In order to implement operations like the addition of
 387 an element say `42` to
 388 the large bitset, we need to translate the /value/ to the
 389 /position/ of the associated *bit* representing `42` in the
 390 interval container of bitsets. As an example, suppose we
 391 use a
 392 ``
 393 large_bitset<nat, mini::bits8> lbs;
 394 ``
 395 that carries small bitsets of 8 bits only.
 396 The first four interval of `lbs` are assumed to
 397 be associated with some bitsets. Now we want to add the interval
 398 `[a,b]==[5,27]`. This will result in the following situation:
 399 ``
 400    [0,1)->   [1,2)->   [2,3)->   [3,4)->
 401    [00101100][11001011][11101001][11100000]
 402 +       [111  11111111  11111111  1111]      [5,27] as bitset
 403          a                           b
 404
 405 => [0,1)->   [1,3)->   [3,4)->
 406    [00101111][11111111][11110000]
 407 ``
 408 So we have to convert values 5 and 27 into a part that
 409 points to the interval and a part that refers to the
 410 position within the interval, which is done by a
 411 /division/ and a /modulo/ computation. (In order to have a
 412 consistent representation of the bitsequences across the containers,
 413 within this project, bitsets are denoted with the
 414 ['*least significant bit on the left!*])
 415
 416 ``
 417 A = a/8 =  5/8 = 0 // refers to interval
 418 B = b/8 = 27/8 = 3
 419 R = a%8 =  5%8 = 5 // refers to the position in the associated bitset.
 420 S = b%8 = 27%8 = 3
 421 ``
 422
 423 All /division/ and /modulo operations/ needed here are always done
 424 using a divisor `d` that is a power of `2`: `d = 2^x`. Therefore
 425 division and modulo can be expressed by bitset operations.
 426 The constants needed for those bitset computations are defined here:
 427
 428 [large_bitset_impl_constants]
 429
 430 Looking at the example again, we can see that we have to identify the
 431 positions of the beginning and ending of the interval [5,27] that is
 432 to insert, and then ['*subdivide*] that range of bitsets into three partitions.
 433
 434 # The bitset where the interval starts.
 435 # the bitset where the interval ends
 436 # The bitsets that are completely overlapped by the interval
 437
 438 ``
 439 combine interval [5,27] to large_bitset lbs w.r.t. some operation o
 440
 441    [0,1)->   [1,2)->   [2,3)->   [3,4)->
 442    [00101100][11001011][11101001][11100000]
 443 o       [111  11111111  11111111  1111]
 444          a                           b
 445 subdivide:
 446    [first!  ][mid_1] . . .[mid_n][   !last]
 447    [00000111][1...1] . . .[1...1][11110000]
 448 ``
 449
 450 After subdividing, we perform the operation `o` as follows:
 451
 452 # For the first bitset: Set all bits from ther starting bit (`!`)
 453   to the end of the bitset to `1`. All other bits are `0`. Then
 454   perform operation `o`: `_map o= ([0,1)->00000111)`
 455
 456 # For the last bitset: Set all bits from the beginning of the bitset
 457   to the ending bit (`!`) to `1`. All other bits are `0`. Then
 458   perform operation `o`: `_map o= ([3,4)->11110000)`
 459
 460 # For the range of bitsets in between the staring and ending one,
 461   perform operation `o`: `_map o= ([1,3)->11111111)`
 462
 463 The algorithm, that has been outlined and illustrated by the
 464 example, is implemented by the private member function
 465 `segment_apply`. To make the combiner operation a variable
 466 in this algorithm, we use a /pointer to member function type/
 467
 468 [large_bitset_segment_combiner]
 469
 470 as first function argument. We will pass member functions `combine_` here,
 471 ``
 472 combine_(first_of_interval, end_of_interval, some_bitset);
 473 ``
 474 that take the beginning and ending of an interval and a bitset and combine
 475 them to the implementing `interval_bitmap_type _map`. Here are these functions:
 476
 477 [large_bitmap_combiners]
 478
 479 Finally we can code function `segment_apply`, that does the partitioning
 480 and subsequent combining:
 481
 482 [large_bitset_segment_apply]
 483
 484 The functions that help filling bitsets to and from a given bit are
 485 implemented here:
 486 [large_bitset_bitset_filler]
 487
 488 This completes the implementation of class template `large_bitset`.
 489 Using only a small amount of mostly schematic code,
 490 we have been able to provide a pretty powerful, self compressing
 491 and generally usable set type for all integral domain types.
 492
 493 [endsect][/ large_bitset: Private implementation]
 494
 495 [endsect][/ Implementation of a large bitset]
 496
 497 [endsect][/ Large Bitsets IN Projects]
 498
 499
 500 [endsect][/ Projects]
 501
 502
 503