Re: [dev][st][patch] new utf decoder


From: Silvan Jegen <s.jegen_AT_gmail.com>
Date: Fri, 21 Mar 2014 10:39:18 +0100

Heyho

On Thu, Mar 20, 2014 at 5:39 PM, Damian Okrasa <dokrasa_AT_gmail.com> wrote:
> Hey,
>
> this patch replaces the current UTF-8 decoder with a new one, which is ~50
> lines shorter and should be easier to understand. Parsing 5- and 6-byte
> sequences, if necessary, requires only a trivial modification of the UTF_SIZ
> constant and the utfbyte, utfmask, utfmin, and utfmax arrays.
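
For readers without the patch at hand, here is a minimal sketch of what a
table-driven decoder built around UTF_SIZ, utfbyte, utfmask, utfmin and
utfmax could look like. It is not the code from the patch: the function
names decodebyte and utf8decode_sketch are hypothetical, and the table
values are simply filled in from the UTF-8 encoding rules, so treat it as
an illustration of the idea only.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF_SIZ     4        /* longest sequence accepted, in bytes */
#define UTF_INVALID 0xFFFD   /* U+FFFD REPLACEMENT CHARACTER */

/* Row 0 describes continuation bytes; row n (1..UTF_SIZ) describes the
 * lead byte of an n-byte sequence. Values are assumptions derived from
 * the UTF-8 bit layout, not copied from the patch. */
static const unsigned char utfbyte[UTF_SIZ + 1] = {0x80, 0x00, 0xC0, 0xE0, 0xF0};
static const unsigned char utfmask[UTF_SIZ + 1] = {0xC0, 0x80, 0xE0, 0xF0, 0xF8};
static const uint32_t utfmin[UTF_SIZ + 1] = {0x00, 0x00, 0x80, 0x800, 0x10000};
static const uint32_t utfmax[UTF_SIZ + 1] = {0x10FFFF, 0x7F, 0x7FF, 0xFFFF, 0x10FFFF};

/* Classify one byte: store the matching row in *type and return the
 * payload bits. *type == UTF_SIZ + 1 means no row matched. */
static uint32_t
decodebyte(unsigned char c, size_t *type)
{
	for (*type = 0; *type <= UTF_SIZ; ++*type)
		if ((c & utfmask[*type]) == utfbyte[*type])
			return (uint32_t)(c & ~utfmask[*type]);
	return 0;
}

/* Decode one code point from s[0..n). Returns the number of bytes
 * consumed, or 0 if the sequence is incomplete. *cp receives the code
 * point, or UTF_INVALID when the input is malformed or incomplete. */
static size_t
utf8decode_sketch(const char *s, size_t n, uint32_t *cp)
{
	size_t i, len, type;
	uint32_t u;

	*cp = UTF_INVALID;
	if (n == 0)
		return 0;
	u = decodebyte((unsigned char)s[0], &len);
	if (len < 1 || len > UTF_SIZ)
		return 1;   /* stray continuation byte or invalid lead byte */
	for (i = 1; i < n && i < len; ++i) {
		u = (u << 6) | decodebyte((unsigned char)s[i], &type);
		if (type != 0)
			return i;   /* interrupted by a non-continuation byte */
	}
	if (i < len)
		return 0;   /* need more input */
	/* Reject overlong forms, out-of-range values and surrogates. */
	if (u >= utfmin[len] && u <= utfmax[len] && !(u >= 0xD800 && u <= 0xDFFF))
		*cp = u;
	return len;
}
```

In this sketch, supporting longer sequences would mean raising UTF_SIZ and
appending one entry per extra length to each of the four arrays, which
matches the "trivial modification" described above.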

I can't yet claim to fully understand the code, but according to my testing with

https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

the behavior of the decoder has not changed a bit, which I'll assume is
a good thing.

"Benchmarking" the decoder with

time for i in `seq 10000`; do cat UTF-8-test.txt; done;

did not seem to highlight any significant differences either.

I will stare at the code some more but so far it looks good to me.

Cheers,

Silvan
Received on Fri Mar 21 2014 - 10:39:18 CET
