From: David Tweed <david.tweed_AT_gmail.com>

Date: Wed, 4 Aug 2010 18:01:11 +0100

Date: Wed, 4 Aug 2010 18:01:11 +0100

On Wed, Aug 4, 2010 at 2:22 PM, John Yates <john_AT_yates-sheets.org> wrote:

*> Here are two useful references:
*

*>
*

*> http://bretm.home.comcast.net/hash/
*

*> http://burtleburtle.net/bob/hash/
*

*>
*

*> RE computation cost: Generalized integer multiplication and integer
*

*> division both used to be expensive. On modern processors a
*

*> multiplication is fully pipelined and comparable to an L1 data cache
*

*> hit (~2 to 4 cycles). Most modern implementations of division run the
*

*> divide circuit at a multiple greater than 1 of the cpu clock
*

*> (typically 2x) and implement early out. Further division is nearly
*

*> always an autonomous, non-pipelined logic block that once initiated
*

*> runs asynchronously until it is ready to deliver its result. Bottom
*

*> line: on any contemporary out-of-order x86 chip from either Intel or
*

*> AMD the cost of either of these instruction is negligible.
*

I'm not a great expert of microarchitecture, but isn't the fact the

implied division is the last "explicit" operation in a function, with

nothing much after it to overlap with, going to serialise things

thought? (The fact that I'm currently developing for Intel Atoms and

ARM chips, both in-order chips, probably does skew my ideas on

performance though. An instruction reference claims that it's latency

is 50 instructions on an Atom, which is why I try and avoid

unnecessary divisions.)

*> bucket selector. Nearly universally this is a modulo operation taking
*

*> the remainder of the digest divided by the number of buckets. A
*

*> particularly poor approach is to have a power of 2 number of buckets
*

*> so as to to reduce the modulo operation to simple masking. The reason
*

*> is that this discards all entropy in all masked out bits. A _much_
*

*> better approach is to pick the number of buckets to be a prime number
*

*> -- the remainder of division by a prime number will be influenced by
*

*> _every_ bit in the digest. (When I need to implement a dynamic hash
*

*> table I always include a statically initialized array of primes chosen
*

*> to give me the growth pattern I desire.)
*

If your hash function has appropriate avalanching and funneling so

that you believe it is a good approximation to uniform, then there's

no reason to believe that the entropy obtained by a using 2^n bits

should be any better or worse than reducing modulo a prime. (Certainly

if you look at most of the hash tables on Linux they're powers of 2,

but I don't know if that means much.) That said, I haven't actually

timed a full hash-table implementation to see if an integer division

operation shows up on an optimised program using hash tables though.

My bigger point was just that lots of the code snippets in K & R was

written for a different time and is inappropriate in 2010.

-- cheers, dave tweed__________________________ computer vision reasearcher: david.tweed_AT_gmail.com "while having code so boring anyone can maintain it, use Python." -- attempted insult seen on slashdotReceived on Wed Aug 04 2010 - 19:01:11 CEST

*
This archive was generated by hypermail 2.2.0
: Wed Aug 04 2010 - 19:12:01 CEST
*