ceph/src/boost/libs/functional/hash/doc/rationale.qbk

   1
   2 [/ Copyright 2011 Daniel James.
   3  / Distributed under the Boost Software License, Version 1.0. (See accompanying
   4  / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]
   5
   6 [section:rationale Rationale]
   7
   8 The rationale can be found in the original design
   9 [footnote issue 6.18 of the __issues__ (page 63)].
  10
  11 [heading Quality of the hash function]
  12
  13 Many hash functions strive to have little correlation between the input
  14 and output values. They attempt to uniformally distribute the output
  15 values for very similar inputs. This hash function makes no such
  16 attempt. In fact, for integers, the result of the hash function is often
  17 just the input value. So similar but different input values will often
  18 result in similar but different output values.
  19 This means that it is not appropriate as a general hash function. For
  20 example, a hash table may discard bits from the hash function resulting
  21 in likely collisions, or might have poor collision resolution when hash
  22 values are clustered together. In such cases this hash function will
  23 preform poorly.
  24
  25 But the standard has no such requirement for the hash function,
  26 it just requires that the hashes of two different values are unlikely
  27 to collide. Containers or algorithms
  28 designed to work with the standard hash function will have to be
  29 implemented to work well when the hash function's output is correlated
  30 to its input. Since they are paying that cost a higher quality hash function
  31 would be wasteful.
  32
  33 For other use cases, if you do need a higher quality hash function,
  34 then neither the standard hash function or `boost::hash` are appropriate.
  35 There are several options
  36 available. One is to use a second hash on the output of this hash
  37 function, such as [@http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm
  38 Thomas Wang's hash function]. This this may not work as
  39 well as a hash algorithm tailored for the input.
  40
  41 For strings there are several fast, high quality hash functions
  42 available (for example [@http://code.google.com/p/smhasher/ MurmurHash3]
  43 and [@http://code.google.com/p/cityhash/ Google's CityHash]),
  44 although they tend to be more machine specific.
  45 These may also be appropriate for hashing a binary representation of
  46 your data - providing that all equal values have an equal
  47 representation, which is not always the case (e.g. for floating point
  48 values).
  49
  50 [endsect]