Re: [dev] The mysterious 31

From: David Tweed <>
Date: Wed, 4 Aug 2010 18:01:11 +0100

On Wed, Aug 4, 2010 at 2:22 PM, John Yates <> wrote:
> Here are two useful references:
> RE computation cost: Generalized integer multiplication and integer
> division both used to be expensive.  On modern processors a
> multiplication is fully pipelined and comparable to an L1 data cache
> hit (~2 to 4 cycles).  Most modern implementations of division run the
> divide circuit at a multiple greater than 1 of the cpu clock
> (typically 2x) and implement early out.  Further division is nearly
> always an autonomous, non-pipelined logic block that once initiated
> runs asynchronously until it is ready to deliver its result.  Bottom
> line: on any contemporary out-of-order x86 chip from either Intel or
> AMD the cost of either of these instruction is negligible.

I'm not a great expert of microarchitecture, but isn't the fact the
implied division is the last "explicit" operation in a function, with
nothing much after it to overlap with, going to serialise things
thought? (The fact that I'm currently developing for Intel Atoms and
ARM chips, both in-order chips, probably does skew my ideas on
performance though. An instruction reference claims that it's latency
is 50 instructions on an Atom, which is why I try and avoid
unnecessary divisions.)

> bucket selector.  Nearly universally this is a modulo operation taking
> the remainder of the digest divided by the number of buckets.  A
> particularly poor approach is to have a power of 2 number of buckets
> so as to to reduce the modulo operation to simple masking.  The reason
> is that this discards all entropy in all masked out bits.  A _much_
> better approach is to pick the number of buckets to be a prime number
> -- the remainder of division by a prime number will be influenced by
> _every_ bit in the digest.  (When I need to implement a dynamic hash
> table I always include a statically initialized array of primes chosen
> to give me the growth pattern I desire.)

If your hash function has appropriate avalanching and funneling so
that you believe it is a good approximation to uniform, then there's
no reason to believe that the entropy obtained by a using 2^n bits
should be any better or worse than reducing modulo a prime. (Certainly
if you look at most of the hash tables on Linux they're powers of 2,
but I don't know if that means much.) That said, I haven't actually
timed a full hash-table implementation to see if an integer division
operation shows up on an optimised program using hash tables though.

My bigger point was just that lots of the code snippets in K & R was
written for a different time and is inappropriate in 2010.

cheers, dave tweed__________________________
computer vision reasearcher:
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot
Received on Wed Aug 04 2010 - 19:01:11 CEST

This archive was generated by hypermail 2.2.0 : Wed Aug 04 2010 - 19:12:01 CEST