]>
Commit | Line | Data |
---|---|---|
1 | [/ | |
2 | Copyright (c) 2009-2009 Joachim Faulhaber | |
3 | ||
4 | Distributed under the Boost Software License, Version 1.0. | |
5 | (See accompanying file LICENSE_1_0.txt or copy at | |
6 | http://www.boost.org/LICENSE_1_0.txt) | |
7 | ] | |
8 | ||
9 | [section Projects] | |
10 | ||
11 | ['*Projects*] are examples on the usage of interval containers | |
12 | that go beyond small toy snippets of code. The code presented | |
13 | here addresses more serious applications that approach the | |
14 | quality of real world programming. At the same time it aims to | |
15 | guide the reader more deeply into various aspects of | |
16 | the library. In order not to overburden the reader with implementation | |
17 | details, the code in ['*projects*] tries to be ['*minimal*]. It has a focus on | |
18 | the main aspects of the projects and is not intended to be complete | |
19 | and mature like the library code itself. Cause it's minimal, | |
20 | project code lives in `namespace mini`. | |
21 | ||
22 | [/ | |
23 | [section Overview] | |
24 | ||
25 | [table Overview over Icl Examples | |
26 | [[level] [example] [classes] [features]] | |
27 | [[basic] [[link boost_icl.examples.large_bitset Large Bitset]] | |
28 | [__itv_map__][The most simple application of an interval map: | |
29 | Counting the overlaps of added intervals.]] | |
30 | ] | |
31 | ||
32 | [endsect][/ Overview IN Projects] | |
33 | ] | |
34 | ||
35 | [section Large Bitset][/ IN Projects] | |
36 | ||
37 | Bitsets are just sets. Sets of unsigned integrals, | |
38 | to be more precise. | |
39 | The prefix ['*bit*] usually only indicates, | |
40 | that the representation of those sets is organized in a compressed | |
41 | form that exploits the fact, that we can switch on an off single | |
42 | bits in machine words. Bitsets are therefore known to be very small | |
43 | and thus efficient. | |
44 | The efficiency of bitsets is usually coupled to the | |
45 | precondition that the range of values of elements | |
46 | is relatively small, like [0..32) or [0..64), values that can | |
47 | be typically represented in single or a small number of | |
48 | machine words. If we wanted to represent a set containing | |
49 | two values {1, 1000000}, we would be much better off using | |
50 | other sets like e.g. an `std::set`. | |
51 | ||
52 | Bitsets compress well, if elements spread over narrow ranges | |
53 | only. Interval sets compress well, if many elements are clustered | |
54 | over intervals. They can span large sets very efficiently then. | |
55 | In project ['*Large Bitset*] we want to ['*combine the bit compression | |
56 | and the interval compression*] to achieve a set implementation, | |
57 | that is capable of spanning large chunks of contiguous elements | |
58 | using intervals and also to represent more narrow ['nests] of varying | |
59 | bit sequences using bitset compression. | |
60 | As we will see, this can be achieved using only a small | |
61 | amount of code because most of the properties we need | |
62 | are provided by an __itv_map__ of `bitsets`: | |
63 | ||
64 | `` | |
65 | typedef interval_map<IntegralT, SomeBitSet<N>, partial_absorber, | |
66 | std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap; | |
67 | `` | |
68 | ||
69 | Such an `IntervalBitmap` represents `k*N` bits for every segment. | |
70 | `` | |
71 | [a, a+k)->'1111....1111' // N bits associated: Represents a total of k*N bits. | |
72 | `` | |
73 | ||
74 | For the interval `[a, a+k)` above all bits are set. | |
75 | But we can also have individual ['nests] or ['clusters] | |
76 | of bitsequences. | |
77 | ||
78 | `` | |
79 | [b, b+1)->'01001011...1' | |
80 | [b+1,b+2)->'11010001...0' | |
81 | . . . | |
82 | `` | |
83 | ||
84 | and we can span intervals of equal bit sequences that represent | |
85 | periodic patterns. | |
86 | ||
87 | `` | |
88 | [c,d)->'010101....01' // Every second bit is set in range [c,d) | |
89 | [d,e)->'001100..0011' // Every two bits alterate in range [d,e) | |
90 | [e,f)->'bit-sequence' // 'bit-sequence' reoccurs every N bits in range [e,f) | |
91 | `` | |
92 | ||
93 | An `IntervalBitmap` can represent | |
94 | `N*(2^M)` elements, if `M` is the number of bits of the | |
95 | integral type `IntegralT`. Unlike bitsets, that usually | |
96 | represent ['*unsigned*] integral numbers, large_bitset may | |
97 | range over negative numbers as well. | |
98 | There are fields where such large | |
99 | bitsets implementations are needed. E.g. for the compact | |
100 | representation of large file allocation tables. | |
101 | What remains to be done for project *Large Bitset* is | |
102 | to code a wrapper `class large_bitset` around `IntervalBitmap` | |
103 | so that `large_bitset` looks and feels like a usual | |
104 | set class. | |
105 | ||
106 | [section Using large_bitset] | |
107 | ||
108 | To quicken your appetite for a look at the implementation | |
109 | here are a few use cases first. Within the examples that follow, | |
110 | we will use `nat`[^['k]] for unsigned integrals | |
111 | and `bits`[^['k]] for bitsets containing [^['k]] bits. | |
112 | ||
113 | Let's start large. | |
114 | In the first example . . . | |
115 | ||
116 | [import ../example/large_bitset_/large_bitset.cpp] | |
117 | [large_bitset_test_large_set_all] | |
118 | ||
119 | . . . we are testing the limits. First we set all bits and | |
120 | then we switch off the very last bit. | |
121 | ||
122 | [large_bitset_test_large_erase_last] | |
123 | ||
124 | Program output (/a little beautified/): | |
125 | `` | |
126 | ----- Test function test_large() ----------------------------------------------- | |
127 | We have just turned on the awesome amount of 18,446,744,073,709,551,616 bits ;-) | |
128 | [ 0, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111111 | |
129 | ---- Let's swich off the very last bit ----------------------------------------- | |
130 | [ 0, 288230376151711743) -> 1111111111111111111111111111111111111111111111111111111111111111 | |
131 | [288230376151711743, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111110 | |
132 | ---- Venti is plenty ... let's do something small: A tall ---------------------- | |
133 | `` | |
134 | ||
135 | More readable is a smaller version of `large_bitset`. | |
136 | In function `test_small()` we apply a few more operations . . . | |
137 | ||
138 | [large_bitset_test_small] | |
139 | ||
140 | . . . producing this output: | |
141 | `` | |
142 | ----- Test function test_small() ----------- | |
143 | -- Switch on all bits in range [0,64] ------ | |
144 | [0,8)->11111111 | |
145 | [8,9)->10000000 | |
146 | -------------------------------------------- | |
147 | -- Turn off bits: 25,27,28 ----------------- | |
148 | [0,3)->11111111 | |
149 | [3,4)->10100111 | |
150 | [4,8)->11111111 | |
151 | [8,9)->10000000 | |
152 | -------------------------------------------- | |
153 | -- Flip bits in range [24,30) -------------- | |
154 | [0,3)->11111111 | |
155 | [3,4)->01011011 | |
156 | [4,8)->11111111 | |
157 | [8,9)->10000000 | |
158 | -------------------------------------------- | |
159 | -- Remove the first 10 bits ---------------- | |
160 | [1,2)->00111111 | |
161 | [2,3)->11111111 | |
162 | [3,4)->01011011 | |
163 | [4,8)->11111111 | |
164 | [8,9)->10000000 | |
165 | -- Remove even bits in range [0,72) -------- | |
166 | [1,2)->00010101 | |
167 | [2,3)->01010101 | |
168 | [3,4)->01010001 | |
169 | [4,8)->01010101 | |
170 | -- Set odd bits in range [0,72) -------- | |
171 | [0,9)->01010101 | |
172 | -------------------------------------------- | |
173 | `` | |
174 | ||
175 | Finally, we present a little /picturesque/ example, | |
176 | that demonstrates that `large_bitset` can also serve | |
177 | as a self compressing bitmap, that we can 'paint' with. | |
178 | ||
179 | [large_bitset_test_picturesque] | |
180 | ||
181 | Note that we have two `large_bitsets` for the /outline/ | |
182 | and the /interior/. Both parts are compressed but we can | |
183 | compose both by `operator +`, because the right /positions/ | |
184 | are provided. This is the program output: | |
185 | ||
186 | `` | |
187 | ----- Test function test_picturesque() ----- | |
188 | -------- empty face: 3 intervals ----- | |
189 | ******** | |
190 | * * | |
191 | * * | |
192 | * * | |
193 | * * | |
194 | ****** | |
195 | -------- compressed smile: 2 intervals ----- | |
196 | * * | |
197 | **** | |
198 | -------- staring bitset: 6 intervals ----- | |
199 | ******** | |
200 | * * | |
201 | * * * * | |
202 | * * | |
203 | * **** * | |
204 | ****** | |
205 | -------------------------------------------- | |
206 | `` | |
207 | ||
208 | So, may be you are curious how this class template | |
209 | is coded on top of __itv_map__ using only about 250 lines | |
210 | of code. This is shown in the sections that follow. | |
211 | ||
212 | [endsect][/ Usage of a large_bitset] | |
213 | ||
214 | [section The interval_bitmap] | |
215 | ||
216 | To begin, let's look at the basic data type again, that | |
217 | will be providing the major functionality: | |
218 | ||
219 | `` | |
220 | typedef interval_map<DomainT, BitSetT, partial_absorber, | |
221 | std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap; | |
222 | `` | |
223 | ||
224 | `DomainT` is supposed to be an integral | |
225 | type, the bitset type `BitSetT` will be a wrapper class around an | |
226 | unsigned integral type. `BitSetT` has to implement bitwise operators | |
227 | that will be called by the functors `inplace_bit_add<BitSetT>` | |
228 | and `inplace_bit_and<BitSetT>`. | |
229 | The type trait of interval_map is `partial_absorber`, which means | |
230 | that it is /partial/ and that empty `BitSetTs` are not stored in | |
231 | the map. This is desired and keeps the `interval_map` minimal, storing | |
232 | only bitsets, that contain at least one bit switched on. | |
233 | Functor template `inplace_bit_add` for parameter `Combine` indicates that we | |
234 | do not expect `operator +=` as addition but the bitwise operator | |
235 | `|=`. For template parameter `Section` which is instaniated by | |
236 | `inplace_bit_and` we expect the bitwise `&=` operator. | |
237 | ||
238 | [endsect][/ section The interval_bitmap] | |
239 | ||
240 | [section A class implementation for the bitset type] | |
241 | ||
242 | The code of the project is enclosed in a `namespace mini`. | |
243 | The name indicates, that the implementation is a /minimal/ | |
244 | example implementation. The name of the bitset class will | |
245 | be `bits` or `mini::bits` if qualified. | |
246 | ||
247 | To be used as a codomain parameter of class template __itv_map__, | |
248 | `mini::bits` has to implement all the functions that are required | |
249 | for a codomain_type in general, which are the default constructor `bits()` | |
250 | and an equality `operator==`. Moreover `mini::bits` has to implement operators | |
251 | required by the instantiations for parameter `Combine` and `Section` | |
252 | which are `inplace_bit_add` and `inplace_bit_and`. From functors | |
253 | `inplace_bit_add` and `inplace_bit_and` there are inverse functors | |
254 | `inplace_bit_subtract` and `inplace_bit_xor`. Those functors | |
255 | use operators `|= &= ^=` and `~`. Finally if we want to apply | |
256 | lexicographical and subset comparison on large_bitset, we also | |
257 | need an `operator <`. All the operators that we need can be implemented | |
258 | for `mini::bits` on a few lines: | |
259 | ||
260 | [import ../example/large_bitset_/bits.hpp] | |
261 | [mini_bits_class_bits] | |
262 | ||
263 | Finally there is one important piece of meta information, we have | |
264 | to provide: `mini::bits` has to be recognized as a `Set` by the | |
265 | icl code. Otherwise we can not exploit the fact that a map of | |
266 | sets is model of `Set` and the resulting large_bitset would not | |
267 | behave like a set. So we have to say that `mini::bits` shall be sets: | |
268 | ||
269 | [mini_bits_is_set] | |
270 | ||
271 | This is done by adding a partial template specialization to | |
272 | the type trait template `icl::is_set`. | |
273 | For the extension of this type trait template and the result | |
274 | values of inclusion_compare we need these `#includes` for the | |
275 | implementation of `mini::bits`: | |
276 | ||
277 | [mini_bits_includes] | |
278 | ||
279 | [endsect][/ A class implementation for the bitset type] | |
280 | ||
281 | [section Implementation of a large bitset] | |
282 | ||
283 | Having finished our `mini::bits` implementation, we can start to | |
284 | code the wrapper class that hides the efficient interval map of mini::bits | |
285 | and exposes a simple and convenient set behavior to the world of users. | |
286 | ||
287 | Let's start with the required `#include`s this time: | |
288 | ||
289 | [import ../example/large_bitset_/large_bitset.hpp] | |
290 | [large_bitset_includes] | |
291 | ||
292 | Besides `boost/icl/interval_map.hpp` and `bits.hpp` the most important | |
293 | include here is `boost/operators.hpp`. We use this library in order | |
294 | to further minimize the code and to provide pretty extensive operator | |
295 | functionality using very little code. | |
296 | ||
297 | For a short and concise naming of the most important unsigned integer | |
298 | types and the corresponding `mini::bits` we define this: | |
299 | ||
300 | [large_bitset_natural_typedefs] | |
301 | ||
302 | [section large_bitset: Public interface][/ ------------------------------------] | |
303 | ||
304 | And now let's code `large_bitset`. | |
305 | ||
306 | [large_bitset_class_template_header] | |
307 | ||
308 | The first template parameter `DomainT` will be instantiated with | |
309 | an integral type that defines the kind of numbers that can | |
310 | be elements of the set. Since we want to go for a large set we | |
311 | use `nat64` as default which is a 64 bit unsigned integer ranging | |
312 | from `0` to `2^64-1`. As bitset parameter we also choose a 64-bit | |
313 | default. Parameters `Combine` and `Interval` are necessary to | |
314 | be passed to dependent type expressions. An allocator can be | |
315 | chosen, if desired. | |
316 | ||
317 | The nested list of private inheritance contains groups of template | |
318 | instantiations from | |
319 | [@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator], | |
320 | that provides derivable operators from more fundamental once. | |
321 | Implementing the fundamental operators, we get the derivable ones | |
322 | /for free/. Below is a short overview of what we get using Boost.Operator, | |
323 | where __S stands for `large_bitset`, __i for it's `interval_type` | |
324 | and __e for it's `domain_type` or `element_type`. | |
325 | ||
326 | [table | |
327 | [[Group ][fundamental] [derivable ]] | |
328 | [[Equality, ordering ][`==`] [`!=` ]] | |
329 | [[ ][`<` ] [`> <= >=` ]] | |
330 | [[Set operators (__S x __S)][`+= |= -= &= ^=` ][`+ | - & ^` ]] | |
331 | [[Set operators (__S x __e)][`+= |= -= &= ^=` ][`+ | - & ^` ]] | |
332 | [[Set operators (__S x __i)][`+= |= -= &= ^=` ][`+ | - & ^` ]] | |
333 | ] | |
334 | ||
335 | There is a number of associated types | |
336 | ||
337 | [large_bitset_associated_types] | |
338 | ||
339 | most importantly the implementing `interval_bitmap_type` that is used | |
340 | for the implementing container. | |
341 | ||
342 | [large_bitmap_impl_map] | |
343 | ||
344 | In order to use | |
345 | [@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator] | |
346 | we have to implement the fundamental operators as class members. This can be | |
347 | done quite schematically. | |
348 | ||
349 | [large_bitset_operators] | |
350 | ||
351 | As we can see, the seven most important operators that work on the | |
352 | class type `large_bitset` can be directly implemented by propagating | |
353 | the operation to the implementing `_map` | |
354 | of type `interval_bitmap_type`. For the operators that work on segment and | |
355 | element types, we use member functions `add`, `subtract`, `intersect` and | |
356 | `flip`. As we will see only a small amount of adaper code is needed | |
357 | to couple those functions with the functionality of the implementing | |
358 | container. | |
359 | ||
360 | Member functions `add`, `subtract`, `intersect` and `flip`, | |
361 | that allow to combine ['*intervals*] to `large_bitsets` can | |
362 | be uniformly implemented using a private function | |
363 | `segment_apply` that applies /addition/, /subtraction/, | |
364 | /intersection/ or /symmetric difference/, after having | |
365 | translated the interval's borders into the right bitset | |
366 | positions. | |
367 | ||
368 | [large_bitset_fundamental_functions] | |
369 | ||
370 | In the sample programs, that we will present to demonstrate | |
371 | the capabilities of `large_bitset` we will need a few additional | |
372 | functions specifically output functions in two different | |
373 | flavors. | |
374 | ||
375 | [large_bitset_demo_functions] | |
376 | ||
377 | * The first one, `show_segments()` shows the container | |
378 | content as it is implemented, in the compressed form. | |
379 | ||
380 | * The second function `show_matrix` shows the complete | |
381 | matrix of bits that is represented by the container. | |
382 | ||
383 | [endsect][/ large_bitset: Public interface] | |
384 | [section large_bitset: Private implementation] [/ -------------------------------------] | |
385 | ||
386 | In order to implement operations like the addition of | |
387 | an element say `42` to | |
388 | the large bitset, we need to translate the /value/ to the | |
389 | /position/ of the associated *bit* representing `42` in the | |
390 | interval container of bitsets. As an example, suppose we | |
391 | use a | |
392 | `` | |
393 | large_bitset<nat, mini::bits8> lbs; | |
394 | `` | |
395 | that carries small bitsets of 8 bits only. | |
396 | The first four interval of `lbs` are assumed to | |
397 | be associated with some bitsets. Now we want to add the interval | |
398 | `[a,b]==[5,27]`. This will result in the following situation: | |
399 | `` | |
400 | [0,1)-> [1,2)-> [2,3)-> [3,4)-> | |
401 | [00101100][11001011][11101001][11100000] | |
402 | + [111 11111111 11111111 1111] [5,27] as bitset | |
403 | a b | |
404 | ||
405 | => [0,1)-> [1,3)-> [3,4)-> | |
406 | [00101111][11111111][11110000] | |
407 | `` | |
408 | So we have to convert values 5 and 27 into a part that | |
409 | points to the interval and a part that refers to the | |
410 | position within the interval, which is done by a | |
411 | /division/ and a /modulo/ computation. (In order to have a | |
412 | consistent representation of the bitsequences across the containers, | |
413 | within this project, bitsets are denoted with the | |
414 | ['*least significant bit on the left!*]) | |
415 | ||
416 | `` | |
417 | A = a/8 = 5/8 = 0 // refers to interval | |
418 | B = b/8 = 27/8 = 3 | |
419 | R = a%8 = 5%8 = 5 // refers to the position in the associated bitset. | |
420 | S = b%8 = 27%8 = 3 | |
421 | `` | |
422 | ||
423 | All /division/ and /modulo operations/ needed here are always done | |
424 | using a divisor `d` that is a power of `2`: `d = 2^x`. Therefore | |
425 | division and modulo can be expressed by bitset operations. | |
426 | The constants needed for those bitset computations are defined here: | |
427 | ||
428 | [large_bitset_impl_constants] | |
429 | ||
430 | Looking at the example again, we can see that we have to identify the | |
431 | positions of the beginning and ending of the interval [5,27] that is | |
432 | to insert, and then ['*subdivide*] that range of bitsets into three partitions. | |
433 | ||
434 | # The bitset where the interval starts. | |
435 | # the bitset where the interval ends | |
436 | # The bitsets that are completely overlapped by the interval | |
437 | ||
438 | `` | |
439 | combine interval [5,27] to large_bitset lbs w.r.t. some operation o | |
440 | ||
441 | [0,1)-> [1,2)-> [2,3)-> [3,4)-> | |
442 | [00101100][11001011][11101001][11100000] | |
443 | o [111 11111111 11111111 1111] | |
444 | a b | |
445 | subdivide: | |
446 | [first! ][mid_1] . . .[mid_n][ !last] | |
447 | [00000111][1...1] . . .[1...1][11110000] | |
448 | `` | |
449 | ||
450 | After subdividing, we perform the operation `o` as follows: | |
451 | ||
452 | # For the first bitset: Set all bits from ther starting bit (`!`) | |
453 | to the end of the bitset to `1`. All other bits are `0`. Then | |
454 | perform operation `o`: `_map o= ([0,1)->00000111)` | |
455 | ||
456 | # For the last bitset: Set all bits from the beginning of the bitset | |
457 | to the ending bit (`!`) to `1`. All other bits are `0`. Then | |
458 | perform operation `o`: `_map o= ([3,4)->11110000)` | |
459 | ||
460 | # For the range of bitsets in between the staring and ending one, | |
461 | perform operation `o`: `_map o= ([1,3)->11111111)` | |
462 | ||
463 | The algorithm, that has been outlined and illustrated by the | |
464 | example, is implemented by the private member function | |
465 | `segment_apply`. To make the combiner operation a variable | |
466 | in this algorithm, we use a /pointer to member function type/ | |
467 | ||
468 | [large_bitset_segment_combiner] | |
469 | ||
470 | as first function argument. We will pass member functions `combine_` here, | |
471 | `` | |
472 | combine_(first_of_interval, end_of_interval, some_bitset); | |
473 | `` | |
474 | that take the beginning and ending of an interval and a bitset and combine | |
475 | them to the implementing `interval_bitmap_type _map`. Here are these functions: | |
476 | ||
477 | [large_bitmap_combiners] | |
478 | ||
479 | Finally we can code function `segment_apply`, that does the partitioning | |
480 | and subsequent combining: | |
481 | ||
482 | [large_bitset_segment_apply] | |
483 | ||
484 | The functions that help filling bitsets to and from a given bit are | |
485 | implemented here: | |
486 | [large_bitset_bitset_filler] | |
487 | ||
488 | This completes the implementation of class template `large_bitset`. | |
489 | Using only a small amount of mostly schematic code, | |
490 | we have been able to provide a pretty powerful, self compressing | |
491 | and generally usable set type for all integral domain types. | |
492 | ||
493 | [endsect][/ large_bitset: Private implementation] | |
494 | ||
495 | [endsect][/ Implementation of a large bitset] | |
496 | ||
497 | [endsect][/ Large Bitsets IN Projects] | |
498 | ||
499 | ||
500 | [endsect][/ Projects] | |
501 | ||
502 | ||
503 |