David,
You are correct on many counts. Our divergent domains of development
definitely colors our biases. I work with high end out of order
processors.
The Intel Core i7 has a multiply latency of 3 cycles and can issue a
new multiply into the pipeline every cycle. It has a worst case
integer divide of 26 cycles. This is obviously less than the minimum
of 32 cycles if it were developing one quotient bit per cycle. The 26
cycles reflects a moderate fixed startup, a max of 16 double-clocked
cycles to carry out the division and a small fixed overhead to deliver
the final result. During start up the operands pass through normal
integer data path elements:
- OR numerator and denominator and count resulting leading zeros
- compute loop count: 32 - # leading zeros
I am less familiar with the Atom's micro-architecture. A bit of
Googling suggests that it has a partially pipelined multiplier with 5
cycle latency and confirms your claim of 50 cycles (worst case?) for
divide. Clearly this is not double clocked. I could not find any
indication of whether or not the divider incorporates early out
preprocessing.
> If your hash function has appropriate avalanching and funneling so
> that you believe it is a good approximation to uniform, then there's
> no reason to believe that the entropy obtained by a using 2^n bits
> should be any better or worse than reducing modulo a prime.
Digest functions that exhibit good avalanching and funneling are not
cheap. True they need not be anywhere near as expensive as
cryptographic hashes. But typically they will swamp the savings
achieve by avoiding a divide.
Again, our differing domains of development probably come out. For
the past decade I have worked on a very high performance database
execution engine. Our workhorse join operator is a hash join. Here I
need an architecture that performs well irrespective of join column
datatypes. Joining strings columns I might possibly be able to
justify a high quality digest function with good avalanching and
funneling but I surely cannot do so when joining 32-bit or 64-bit
integer columns (our bread and butter). In the case of a 32-bit
integer my digest is the identity function. In the case of a 64-bit
integer I simply xor the 2 32-bit halves. Under such conditions a
power of 2 performs very poorly.
/john
Received on Wed Aug 04 2010 - 22:42:23 CEST
This archive was generated by hypermail 2.2.0 : Wed Aug 04 2010 - 22:48:02 CEST