Heyho
On Thu, Mar 20, 2014 at 5:39 PM, Damian Okrasa <dokrasa_AT_gmail.com> wrote:
> Hey,
>
> this patch replaces the current UTF-8 decoder with a new one, which is ~50
> lines shorter and should be easier to understand. Parsing 5- and 6-byte
> sequences, if necessary, requires a trivial modification of the UTF_SIZ
> constant and the utfbyte, utfmask, utfmin and utfmax arrays.
I can't claim to fully understand the code yet, but according to my testing with
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
the behavior of the decoder has not changed a bit, which I'll assume is
a good thing.
"Benchmarking" the decoder with
time for i in `seq 10000`; do cat UTF-8-test.txt; done;
did not reveal any significant differences either.
I will stare at the code some more but so far it looks good to me.
Cheers,
Silvan
Received on Fri Mar 21 2014 - 10:39:18 CET