[/ Copyright 2006-2008 Daniel James.
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration; containers will typically
have more buckets).

[diagram buckets]

In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but it is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`, which has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.

Retrieving the elements for a given key is simple. The same process is applied
to the key to find the correct bucket. Then the key is compared with the
elements in the bucket to find any elements that match (using the equality
predicate `Pred`). If the hash function has worked well, the elements will be
evenly distributed amongst the buckets, so only a small number of elements will
need to be examined.

There is [link unordered.hash_equality more information on hash functions and
equality predicates in the next section].

You can see in the diagram that `A` and `D` have been placed in the same bucket.
When looking for elements in this bucket up to 2 comparisons are made, making
the search slower. This is known as a collision. To keep things fast, we try to
keep collisions to a minimum.

'''
<table frame="all"><title>Methods for Accessing Buckets</title>
<tgroup cols="2">
<thead><row>
<entry><para>Method</para></entry>
<entry><para>Description</para></entry>
</row></thead>
<tbody>
<row>
<entry>'''`size_type bucket_count() const`'''</entry>
<entry>'''The number of buckets.'''</entry>
</row>
<row>
<entry>'''`size_type max_bucket_count() const`'''</entry>
<entry>'''An upper bound on the number of buckets.'''</entry>
</row>
<row>
<entry>'''`size_type bucket_size(size_type n) const`'''</entry>
<entry>'''The number of elements in bucket `n`.'''</entry>
</row>
<row>
<entry>'''`size_type bucket(key_type const& k) const`'''</entry>
<entry>'''Returns the index of the bucket which would contain `k`.'''</entry>
</row>
<row>
<entry>'''`local_iterator begin(size_type n);`'''</entry>
<entry morerows='5'>'''Return begin and end iterators for bucket `n`.'''</entry>
</row>
<row>
<entry>'''`local_iterator end(size_type n);`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator begin(size_type n) const;`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator end(size_type n) const;`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator cbegin(size_type n) const;`'''</entry>
</row>
<row>
<entry>'''`const_local_iterator cend(size_type n) const;`'''</entry>
</row>
</tbody>
</tgroup>
</table>
'''

[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number
of elements in the buckets will increase, causing performance to degrade.
To combat this, the containers increase the bucket count as elements are inserted.
You can also tell the container to change the bucket count (if required) by
calling `rehash`.

The standard leaves a lot of freedom to the implementer to decide how the
number of buckets is chosen, but it does make some requirements based on the
container's 'load factor', the average number of elements per bucket.
Containers also have a 'maximum load factor', which they should try to keep the
load factor below.

You can't control the bucket count directly, but there are two ways to
influence it:

* Specify the minimum number of buckets when constructing a container or
  when calling `rehash`.
* Suggest a maximum load factor by calling `max_load_factor`.

`max_load_factor` doesn't let you set the maximum load factor yourself, it just
lets you give a /hint/. And even then, the draft standard doesn't actually
require the container to pay much attention to this value. The only time the
load factor is /required/ to be less than the maximum is following a call to
`rehash`. But most implementations will try to keep the load factor below
the maximum, and set the maximum load factor to be the same as
or close to the hint - unless your hint is unreasonably small or large.

[table:bucket_size Methods for Controlling Bucket Size
[[Method] [Description]]

[
[`X(size_type n)`]
[Construct an empty container with at least `n` buckets (`X` is the container type).]
]
[
[`X(InputIterator i, InputIterator j, size_type n)`]
[Construct a container with at least `n` buckets and insert elements
from the range \[`i`, `j`) (`X` is the container type).]
]
[
[`float load_factor() const`]
[The average number of elements per bucket.]
]
[
[`float max_load_factor() const`]
[Returns the current maximum load factor.]
]
[
[`float max_load_factor(float z)`]
[Changes the container's maximum load factor, using `z` as a hint.]
]
[
[`void rehash(size_type n)`]
[Changes the number of buckets so that there are at least `n` buckets, and
so that the load factor is less than the maximum load factor.]
]

]

[h2 Iterator Invalidation]

It is not specified how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to be greater than or equal to the
maximum load factor. For most implementations this means that `insert` will only
change the number of buckets when this happens. While iterators can be
invalidated by calls to `insert` and `rehash`, pointers and references to the
container's elements are never invalidated.

In a similar manner to using `reserve` for `vector`s, it can be a good idea
to call `rehash` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the minimum number of buckets, not the
number of elements, which is why the new size is divided by the maximum load factor. The
`+ 1` guarantees there is no invalidation; without it, reallocation could occur
if the number of buckets exactly divides the target size, since the container is
allowed to rehash when the load factor is equal to the maximum load factor.]

[endsect]