Re: [dev] libzahl paper

From: Markus Wichmann <nullplan_AT_gmx.net>
Date: Tue, 14 Jun 2016 20:55:25 +0200

On Mon, Jun 13, 2016 at 11:32:11PM +0200, Mattias Andrée wrote:
> Proofreading and suggestions is greatly appreciated!
>

Page 5: "Branches are incredibly cheap on modern CPUs." Not so! I wrote
a variety of CRC32 algorithms to benchmark (which will just measure my
personal CPU, I know, but it's better than guesswork), and had two
different bitwise versions:

for (i = 0; i < 8; i++)
{
    c = crc & 1;
    crc >>= 1;
    if (c)
        crc ^= 0xedb88320ul;
}

And:

for (i = 0; i < 8; i++)
{
    crc = (crc >> 1) ^ (-(crc & 1) & 0xedb88320ul);
}

The second version had twice the throughput of the first. (Of course
that 4-bytes-at-a-time version had the highest throughput, after that
the times got bigger again, but that's beside the point.) That remained
even when writing the loop in assembly by hand (the compiler refused to
just write this:)

shr eax
jnc 1f
xor eax, edb88320ul
1:

And that even though the second version is longer in assembly. But
eliminating that branch really did increase the throughput immensly.

Although, I don't know what would happen on a different architecture.

Ciao,
Markus
Received on Tue Jun 14 2016 - 20:55:25 CEST

This archive was generated by hypermail 2.3.0 : Tue Jun 14 2016 - 21:00:14 CEST