Re: [dev][sbase] Proposal of suckless compression

From: Ralph Eastwood <tcmreastwood_AT_gmail.com>
Date: Wed, 24 Sep 2014 12:55:58 +0100

On 24 September 2014 12:02, Hiltjo Posthuma <hiltjo_AT_codemadness.org> wrote:
> For sbase I think it should be, because gzip and bzip2 are the norm.
> Not everything that is the norm is sane or even nice of course, but for
> sbase I'd want a minimal stable set of unix tools that work well.

The norm does change, though - if 'compress' hadn't been patent
encumbered, I guess it would still be widely supported.

> FWIW I think this should not be in sbase. A tar and gzip
> implementation though would be nice to have. The tricky part for tar
> might be to have its behaviour be mostly compatible with existing
> implementations[0].

I think I would stick with tar anyway, and there is already a tar[0]
implementation in sbase.


I guess even if the preference is for it not to be included in sbase,
it can find a separate home in the suckless world.

I'm just choosing my entropy coder at the moment - so far the simplest
implementations boil down to a bitwise arithmetic coder (the patents
on this have expired now), a (bytewise) range coder and rANS [1].
Huffman, although faster than any of these, is 150-200 lines of code
at a glance at flate, and I don't think that's the most optimised
version.

In terms of code complexity, the bitwise arithmetic coder is simplest,
but also the slowest of the bunch. rANS is faster than the other two
(and has more optimising potential) but has the downside that the
encoder has to consume its input in the reverse of the order in which
the decoder produces its output, which is easily worked around by
buffering/encoding in blocks. Range coding is the best compromise, as
the streams can be encoded forward, which gives the nice property of
writing output almost as soon as possible (there is some buffering in
edge cases, but only by a few bytes). I think I can implement range
coding in < 100 lines.
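
To give an idea of the size, here is a rough sketch of a bytewise,
carry-less range coder (the well-known Subbotin-style construction),
not the code I intend to ship. The names (struct rc, rc_compress,
rc_decompress, model_build) are made up for illustration, and it
assumes a static order-0 frequency table that the decoder somehow
already has, inputs of 1 to 65536 bytes so the total frequency fits in
16 bits, and an output buffer that is big enough. A real tool would
more likely use an adaptive model and bounds-check everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RC_TOP (1u << 24) /* XOR below this: top byte of low is settled */
#define RC_BOT (1u << 16) /* minimum range; total frequency must fit in it */

struct rc {
	uint32_t low, range, code;
	uint8_t *buf; /* caller-owned, assumed large enough */
	size_t pos;
};

static void rc_init(struct rc *rc, uint8_t *buf, int dec)
{
	int i;
	rc->low = rc->code = 0;
	rc->range = 0xffffffffu;
	rc->buf = buf;
	rc->pos = 0;
	for (i = 0; dec && i < 4; i++) /* decoder preloads 4 bytes */
		rc->code = rc->code << 8 | rc->buf[rc->pos++];
}

/* Shift out settled bytes; when decoding, shift input bytes into code. */
static void rc_norm(struct rc *rc, int dec)
{
	for (;;) {
		if ((rc->low ^ (rc->low + rc->range)) >= RC_TOP) {
			if (rc->range >= RC_BOT)
				break;
			/* underflow: shrink range so it ends on a 2^16 boundary */
			rc->range = (0u - rc->low) & (RC_BOT - 1);
		}
		if (dec)
			rc->code = rc->code << 8 | rc->buf[rc->pos++];
		else
			rc->buf[rc->pos++] = rc->low >> 24;
		rc->low <<= 8;
		rc->range <<= 8;
	}
}

/* Narrow [low, low + range) to the slice [cum, cum + freq) out of tot. */
static void rc_encode(struct rc *rc, uint32_t cum, uint32_t freq, uint32_t tot)
{
	assert(freq > 0 && cum + freq <= tot && tot <= RC_BOT);
	rc->range /= tot;
	rc->low += cum * rc->range;
	rc->range *= freq;
	rc_norm(rc, 0);
}

/* Static order-0 model: cum[s] counts the input bytes smaller than s. */
void model_build(const uint8_t *in, size_t n, uint32_t cum[257])
{
	uint32_t freq[256] = {0};
	size_t i;
	for (i = 0; i < n; i++)
		freq[in[i]]++;
	for (cum[0] = 0, i = 0; i < 256; i++)
		cum[i + 1] = cum[i] + freq[i];
}

/* Encode n bytes (1 <= n <= 65536) into out; returns the bytes written. */
size_t rc_compress(const uint8_t *in, size_t n, const uint32_t cum[257], uint8_t *out)
{
	struct rc rc;
	size_t i;
	rc_init(&rc, out, 0);
	for (i = 0; i < n; i++)
		rc_encode(&rc, cum[in[i]], cum[in[i] + 1] - cum[in[i]], cum[256]);
	for (i = 0; i < 4; i++, rc.low <<= 8) /* flush the final low */
		out[rc.pos++] = rc.low >> 24;
	return rc.pos;
}

/* Decode n bytes from in, using the same table the encoder used. */
void rc_decompress(uint8_t *in, size_t n, const uint32_t cum[257], uint8_t *out)
{
	struct rc rc;
	uint32_t f, s;
	size_t i;
	rc_init(&rc, in, 1);
	for (i = 0; i < n; i++) {
		rc.range /= cum[256];
		f = (rc.code - rc.low) / rc.range; /* slot within the total */
		for (s = 0; cum[s + 1] <= f; s++)
			;
		out[i] = s;
		rc.low += cum[s] * rc.range;
		rc.range *= cum[s + 1] - cum[s];
		rc_norm(&rc, 1);
	}
}
```

The encoder only ever moves forward through the input and emits a byte
as soon as the top byte of low can no longer change, which is the
"write almost as soon as possible" property mentioned above. The
decoder mirrors the same low/range updates, so it reads exactly as
many bytes as the encoder wrote.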

The stream format can be quite simple: MAGIC|===DATA===|EOS|MD5SUM



[0] http://git.suckless.org/sbase/plain/tar.c
[1] https://github.com/rygorous/ryg_rans

-- 
Tai Chi Minh Ralph Eastwood
tcmreastwood_AT_gmail.com
Received on Wed Sep 24 2014 - 13:55:58 CEST
