From: John Yates <john_AT_yates-sheets.org>

Date: Wed, 4 Aug 2010 09:22:58 -0400

Date: Wed, 4 Aug 2010 09:22:58 -0400

Here are two useful references:

http://bretm.home.comcast.net/hash/

http://burtleburtle.net/bob/hash/

RE computation cost: Generalized integer multiplication and integer

division both used to be expensive. On modern processors a

multiplication is fully pipelined and comparable to an L1 data cache

hit (~2 to 4 cycles). Most modern implementations of division run the

divide circuit at a multiple greater than 1 of the cpu clock

(typically 2x) and implement early out. Further division is nearly

always an autonomous, non-pipelined logic block that once initiated

runs asynchronously until it is ready to deliver its result. Bottom

line: on any contemporary out-of-order x86 chip from either Intel or

AMD the cost of either of these instruction is negligible.

RE general hashing concepts: I find it helpful to think of a hash

table lookup as proceeding in steps. The first step is to generate a

fixed size digest of the (possibly variable length) key. On a 32 bit

machine this digest most commonly is a 32 bit (unsigned) integer. The

quality of a digest is related to how much of the entropy in the key

it is able to preserve. The second step is to map the digest into a

bucket selector. Nearly universally this is a modulo operation taking

the remainder of the digest divided by the number of buckets. A

particularly poor approach is to have a power of 2 number of buckets

so as to to reduce the modulo operation to simple masking. The reason

is that this discards all entropy in all masked out bits. A _much_

better approach is to pick the number of buckets to be a prime number

-- the remainder of division by a prime number will be influenced by

_every_ bit in the digest. (When I need to implement a dynamic hash

table I always include a statically initialized array of primes chosen

to give me the growth pattern I desire.)

/john

Received on Wed Aug 04 2010 - 15:22:58 CEST

*
This archive was generated by hypermail 2.2.0
: Wed Aug 04 2010 - 15:24:02 CEST
*