]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | |
2 | [/ Copyright 2011 Daniel James. | |
3 | / Distributed under the Boost Software License, Version 1.0. (See accompanying | |
4 | / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ] | |
5 | ||
6 | [section:rationale Rationale] | |
7 | ||
8 | The rationale can be found in the original design | |
9 | [footnote issue 6.18 of the __issues__ (page 63)]. | |
10 | ||
11 | [heading Quality of the hash function] | |
12 | ||
13 | Many hash functions strive to have little correlation between the input | |
14 | and output values. They attempt to uniformally distribute the output | |
15 | values for very similar inputs. This hash function makes no such | |
16 | attempt. In fact, for integers, the result of the hash function is often | |
17 | just the input value. So similar but different input values will often | |
18 | result in similar but different output values. | |
19 | This means that it is not appropriate as a general hash function. For | |
20 | example, a hash table may discard bits from the hash function resulting | |
21 | in likely collisions, or might have poor collision resolution when hash | |
22 | values are clustered together. In such cases this hash function will | |
23 | preform poorly. | |
24 | ||
25 | But the standard has no such requirement for the hash function, | |
26 | it just requires that the hashes of two different values are unlikely | |
27 | to collide. Containers or algorithms | |
28 | designed to work with the standard hash function will have to be | |
29 | implemented to work well when the hash function's output is correlated | |
30 | to its input. Since they are paying that cost a higher quality hash function | |
31 | would be wasteful. | |
32 | ||
33 | For other use cases, if you do need a higher quality hash function, | |
34 | then neither the standard hash function or `boost::hash` are appropriate. | |
35 | There are several options | |
36 | available. One is to use a second hash on the output of this hash | |
37 | function, such as [@http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm | |
38 | Thomas Wang's hash function]. This this may not work as | |
39 | well as a hash algorithm tailored for the input. | |
40 | ||
41 | For strings there are several fast, high quality hash functions | |
42 | available (for example [@http://code.google.com/p/smhasher/ MurmurHash3] | |
43 | and [@http://code.google.com/p/cityhash/ Google's CityHash]), | |
44 | although they tend to be more machine specific. | |
45 | These may also be appropriate for hashing a binary representation of | |
46 | your data - providing that all equal values have an equal | |
47 | representation, which is not always the case (e.g. for floating point | |
48 | values). | |
49 | ||
50 | [endsect] |