[/ Copyright 2006-2008 Daniel James.
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration; containers will typically
have more buckets).

[diagram buckets]

In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but it is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`, which has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.

Retrieving the elements for a given key is simple. The same process is applied
to the key to find the correct bucket. Then the key is compared with the
elements in the bucket to find any elements that match (using the equality
predicate `Pred`). If the hash function has worked well, the elements will be
evenly distributed amongst the buckets, so only a small number of elements will
need to be examined.

There is [link unordered.hash_equality more information on hash functions and
equality predicates in the next section].

You can see in the diagram that `A` and `D` have been placed in the same bucket.
When looking for elements in this bucket up to 2 comparisons are made, making
the search slower. This is known as a collision. To keep things fast, we try to
keep collisions to a minimum.

'''
<table frame="all"><title>Methods for Accessing Buckets</title>
<tgroup cols="2">
<thead><row>
<entry><para>Method</para></entry>
<entry><para>Description</para></entry>
</row></thead>
<tbody>
<row>
<entry>'''`size_type bucket_count() const`'''</entry>
<entry>'''The number of buckets.'''</entry>
</row>
<row>
<entry>'''`size_type max_bucket_count() const`'''</entry>
<entry>'''An upper bound on the number of buckets.'''</entry>
</row>
<row>
<entry>'''`size_type bucket_size(size_type n) const`'''</entry>
<entry>'''The number of elements in bucket `n`.'''</entry>
</row>
<row>
<entry>'''`size_type bucket(key_type const& k) const`'''</entry>
<entry>'''Returns the index of the bucket which would contain `k`.'''</entry>
</row>
<row>
<entry>'''`local_iterator begin(size_type n);`'''</entry>
<entry morerows='5'>'''Return begin and end iterators for bucket `n`.'''</entry>
</row>
<row>
<entry>'''`local_iterator end(size_type n);`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator begin(size_type n) const;`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator end(size_type n) const;`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator cbegin(size_type n) const;`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator cend(size_type n) const;`'''</entry>
</row>
</tbody>
</tgroup>
</table>
'''

[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number
of elements in the buckets will increase, causing performance to degrade.
To combat this, the containers increase the bucket count as elements are inserted.
You can also tell the container to change the bucket count (if required) by
calling `rehash`.

The standard leaves a lot of freedom to the implementer to decide how the
number of buckets is chosen, but it does make some requirements based on the
container's 'load factor', the average number of elements per bucket.
Containers also have a 'maximum load factor', which they should try to keep the
load factor below.

You can't control the bucket count directly, but there are two ways to
influence it:

* Specify the minimum number of buckets when constructing a container or
  when calling `rehash`.
* Suggest a maximum load factor by calling `max_load_factor`.

`max_load_factor` doesn't let you set the maximum load factor yourself, it just
lets you give a /hint/. And even then, the draft standard doesn't actually
require the container to pay much attention to this value. The only time the
load factor is /required/ to be less than the maximum is following a call to
`rehash`. But most implementations will try to keep the load factor below
the maximum, and set the maximum load factor to be the same as
or close to the hint - unless your hint is unreasonably small or large.

[table:bucket_size Methods for Controlling Bucket Size
[[Method] [Description]]

[
[`X(size_type n)`]
[Construct an empty container with at least `n` buckets (`X` is the container type).]
]
[
[`X(InputIterator i, InputIterator j, size_type n)`]
[Construct a container with at least `n` buckets and insert elements
from the range \[`i`, `j`) (`X` is the container type).]
]
[
[`float load_factor() const`]
[The average number of elements per bucket.]
]
[
[`float max_load_factor() const`]
[Returns the current maximum load factor.]
]
[
[`float max_load_factor(float z)`]
[Changes the container's maximum load factor, using `z` as a hint.]
]
[
[`void rehash(size_type n)`]
[Changes the number of buckets so that there are at least `n` buckets, and
so that the load factor is less than the maximum load factor.]
]

]

[h2 Iterator Invalidation]

It is not specified how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to be greater than or equal to the
maximum load factor. For most implementations this means that `insert` will only
change the number of buckets when this happens. While iterators can be
invalidated by calls to `insert` and `rehash`, pointers and references to the
container's elements are never invalidated.

In a similar manner to using `reserve` for `vector`s, it can be a good idea
to call `rehash` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the minimum number of buckets, not the
number of elements, which is why the new size is divided by the maximum load factor. The
`+ 1` guarantees there is no invalidation; without it, reallocation could occur
if the number of buckets exactly divides the target size, since the container is
allowed to rehash when the load factor is equal to the maximum load factor.]

[endsect]