Re: [dev][st][patch] new utf decoder

From: Silvan Jegen <>
Date: Fri, 21 Mar 2014 10:39:18 +0100


On Thu, Mar 20, 2014 at 5:39 PM, Damian Okrasa <> wrote:
> this patch replaces the current UTF-8 decoder with a new one, which is ~50
> lines shorter and should be easier to understand. Parsing 5- and 6-byte
> sequences, if necessary, requires only a trivial modification of the UTF_SIZ
> constant and the utfbyte, utfmask, utfmin and utfmax arrays.
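The patch itself is not quoted here. As a rough illustration of how a decoder driven by UTF_SIZ and the utfbyte, utfmask, utfmin and utfmax arrays might work, here is a minimal C sketch. It assumes the arrays are indexed by sequence length, with index 0 describing continuation bytes. The array contents, the function names decodebyte and utf8decode, the UTF_INVALID replacement value and the return conventions are assumptions made for illustration, not code taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a table-driven UTF-8 decoder. The names UTF_SIZ,
 * utfbyte, utfmask, utfmin and utfmax come from the patch description; the
 * contents and the functions below are assumptions, not the patch. */
#define UTF_SIZ     4
#define UTF_INVALID 0xFFFD

/* Index 0 describes continuation bytes (10xxxxxx); index n describes the
 * lead byte of an n-byte sequence. */
static const unsigned char utfbyte[UTF_SIZ + 1] = {0x80, 0x00, 0xC0, 0xE0, 0xF0};
static const unsigned char utfmask[UTF_SIZ + 1] = {0xC0, 0x80, 0xE0, 0xF0, 0xF8};
/* Smallest and largest code point legal for each sequence length (index 0
 * is unused below); values under utfmin[n] are overlong encodings. */
static const long utfmin[UTF_SIZ + 1] = {0, 0, 0x80, 0x800, 0x10000};
static const long utfmax[UTF_SIZ + 1] = {0x10FFFF, 0x7F, 0x7FF, 0xFFFF, 0x10FFFF};

/* Classify one byte: *type receives the table index whose mask/pattern pair
 * matches (UTF_SIZ + 1 if none does); the payload bits are returned. */
static long decodebyte(unsigned char c, size_t *type)
{
	for (*type = 0; *type <= UTF_SIZ; ++*type)
		if ((c & utfmask[*type]) == utfbyte[*type])
			return c & (unsigned char)~utfmask[*type];
	return 0;
}

/* Decode one character from s (at most slen bytes). Stores the code point,
 * or UTF_INVALID, in *u and returns the number of bytes consumed. Returns 0
 * if the sequence is truncated and more input is needed. */
static size_t utf8decode(const unsigned char *s, size_t slen, long *u)
{
	size_t len, type, i;
	long cp;

	assert(s != NULL || slen == 0);
	*u = UTF_INVALID;
	if (slen == 0)
		return 0;
	cp = decodebyte(s[0], &len);
	if (len < 1 || len > UTF_SIZ)   /* stray continuation byte or 0xF8..0xFF */
		return 1;
	for (i = 1; i < len; ++i) {
		if (i >= slen)
			return 0;           /* truncated: wait for more bytes */
		cp = (cp << 6) | decodebyte(s[i], &type);
		if (type != 0)
			return i;           /* not a continuation byte: resync there */
	}
	/* Reject overlong forms, values past U+10FFFF and UTF-16 surrogates. */
	if (cp < utfmin[len] || cp > utfmax[len] || (cp >= 0xD800 && cp <= 0xDFFF))
		return len;
	*u = cp;
	return len;
}
```

For example, the bytes C3 A9 decode to U+00E9 with 2 bytes consumed, while the overlong pair C0 80 consumes 2 bytes and yields UTF_INVALID. In this sketch, accepting 5- and 6-byte sequences would roughly mean raising UTF_SIZ to 6, appending lead-byte patterns 0xF8 and 0xFC with masks 0xFC and 0xFE, and extending utfmin/utfmax. That is in the spirit of the "trivial modification" described in the patch.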

I can't yet claim to fully understand the code, but according to my testing with

the behavior of the decoder has not changed a bit, which I'll assume is
a good thing.

"Benchmarking" the decoder with

time for i in `seq 10000`; do cat UTF-8-test.txt; done;

did not seem to highlight any significant differences either.

I will stare at the code some more, but so far it looks good to me.


Received on Fri Mar 21 2014 - 10:39:18 CET

