Re: [dev] freetype2/fc pain

From: Laslo Hunhold <dev_AT_frign.de>
Date: Fri, 28 Sep 2018 20:27:30 +0200

On Fri, 28 Sep 2018 13:38:03 +0000
sylvain.bertrand_AT_gmail.com wrote:

Dear Sylvain,

> That's what the spec says: an "extended grapheme cluster" (EGC) should
> not go beyond 10 Unicode code points "in theory". This stream-safe thingy
> seems to apply to non-normalized Unicode streams, with its 32 code
> points and continuation mark.
>
> With that "continuation mark", an EGC can go to "infinity and
> beyond"... and the application is in charge of the size of the
> "infinity and beyond" (aka, _you better deal with microsoft, apple,
> google and mozilla "infinity and beyond"_).
>
> I am in favor of a hard limit of 32 Unicode code points, with a nice
> 128-byte shifting buffer (AVX/MMX register size if I recall properly).
> The "continuation mark" would switch the state machine into
> "discarding" mode, and certainly not into "infinity and beyond" memory
> allocation. The parser would need to stay in the discarding state
> until it reaches the end of the "infinity and beyond" EGC or hits some
> corruption.

Yeah, let's keep it simple. 32 is a good value.
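A minimal sketch of such a capped buffer, assuming code points are stored as uint32_t (32 x 4 bytes = 128 bytes). All names (egc_buf, egc_feed, egc_segment, EGC_MAX_CP) are hypothetical, and the boundary test is a deliberate simplification: only U+0300..U+036F (which includes the combining grapheme joiner U+034F) extend a cluster, whereas real segmentation follows UAX #29. Note that the stream-safe format in UAX #15 itself caps runs of non-starters at 30 and uses U+034F as the continuation mark; the 32 here is the cap proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap from the thread: 32 code points x 4 bytes = 128 bytes. */
#define EGC_MAX_CP 32

enum egc_mode { EGC_COLLECTING, EGC_DISCARDING };

struct egc_buf {
	uint32_t cp[EGC_MAX_CP]; /* code points of the current cluster */
	size_t len;              /* number of valid entries in cp[] */
	enum egc_mode mode;      /* DISCARDING once the cap was hit */
};

/* Simplified stand-in for UAX #29 (an assumption, not the real rules). */
static int
is_extender(uint32_t cp)
{
	return cp >= 0x0300 && cp <= 0x036F;
}

/* Begin a fresh cluster whose first code point is cp. */
static void
egc_start(struct egc_buf *b, uint32_t cp)
{
	b->cp[0] = cp;
	b->len = 1;
	b->mode = EGC_COLLECTING;
}

/*
 * Returns 1 if cp starts a new cluster (the caller flushes b and calls
 * egc_start), 0 if cp was absorbed: stored, or silently dropped while
 * discarding. Overflow never allocates; it only flips the mode.
 */
static int
egc_feed(struct egc_buf *b, uint32_t cp)
{
	assert(b != NULL);
	if (b->len == 0 || !is_extender(cp))
		return 1;
	if (b->mode == EGC_DISCARDING)
		return 0;
	if (b->len == EGC_MAX_CP) {
		b->mode = EGC_DISCARDING;
		return 0;
	}
	b->cp[b->len++] = cp;
	return 0;
}

/* Segment s[0..n); record at most maxlens stored cluster lengths. */
static size_t
egc_segment(const uint32_t *s, size_t n, size_t *lens, size_t maxlens)
{
	struct egc_buf b = { .len = 0, .mode = EGC_COLLECTING };
	size_t count = 0, i;

	for (i = 0; i < n; i++) {
		if (egc_feed(&b, s[i])) {
			if (b.len > 0 && count < maxlens)
				lens[count++] = b.len; /* flush finished cluster */
			egc_start(&b, s[i]);
		}
	}
	if (b.len > 0 && count < maxlens)
		lens[count++] = b.len;
	return count;
}
```

Fed "e" followed by 40 combining acute accents (U+0301) and then "x", this yields two clusters: the first is truncated to 32 stored code points and the remaining 9 marks are dropped, the second holds just "x". The memory footprint stays fixed no matter how long the malicious cluster is.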

> I wonder how this is handled in lynx, ncurses, vim, readline,
> libedit, etc... Wild guess: their "atom" is only 1 Unicode code point.
> Probably some work will have to be done here... (and their
> maintainers won't be happy...)

Your wild guess is a good one.
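To illustrate what a one-code-point "atom" gets wrong, here is a small C sketch that counts both code points and clusters in a UTF-8 string. The function names are hypothetical, the decoder assumes well-formed input, and the cluster rule is the same simplification as above (only U+0300..U+036F join the previous code point), not full UAX #29.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one well-formed UTF-8 sequence (no validation), return its length. */
static size_t
utf8_decode(const unsigned char *s, uint32_t *cp)
{
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0) {
		*cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if ((s[0] & 0xF0) == 0xE0) {
		*cp = ((uint32_t)(s[0] & 0x0F) << 12) |
		      ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return 3;
	}
	*cp = ((uint32_t)(s[0] & 0x07) << 18) |
	      ((uint32_t)(s[1] & 0x3F) << 12) |
	      ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
	return 4;
}

/* Simplified stand-in for UAX #29: only U+0300..U+036F extend a cluster. */
static int
is_extender(uint32_t cp)
{
	return cp >= 0x0300 && cp <= 0x036F;
}

/* Code points: the unit a tool working one code point at a time sees. */
static size_t
count_codepoints(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	uint32_t cp;
	size_t n = 0;

	assert(str != NULL);
	while (*s) {
		s += utf8_decode(s, &cp);
		n++;
	}
	return n;
}

/* Clusters under the simplified rule: extenders join the previous one. */
static size_t
count_clusters(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	uint32_t cp;
	size_t n = 0;

	assert(str != NULL);
	while (*s) {
		s += utf8_decode(s, &cp);
		if (n == 0 || !is_extender(cp))
			n++;
	}
	return n;
}
```

For "nai\xCC\x88ve" ("naïve" spelled with U+0308 COMBINING DIAERESIS) a code-point-based tool sees 6 units, while a user sees 5 characters; cursor movement, deletion and column counting would all be off by one at the diaeresis.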

> > Yes, but don't worry about that too much as we don't need
> > normalization as much as you probably think.
>
> Agreed. As far as I can think of, with my limited knowledge of
> Unicode, it would be required only for the EGC renderer, in
> order to help "rendering correctness".
> Additionally, pathological EGCs with tons of combining code points
> (fewer than 32 though) would likely be "compressed" by this normalization.

No, not even that. We really only need normalization if we want to do
"perceptual" string comparisons, which is generally questionable for
UNIX tools.
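The reason "perceptual" comparison needs normalization is that canonically equivalent strings can differ byte for byte: precomposed U+00E9 is C3 A9 in UTF-8, while "e" + U+0301 is 65 CC 81, so strcmp(3) treats them as different. The sketch below uses a hypothetical toy_compose() that handles only that one pair; it is NOT NFC, which needs the Unicode Character Database decomposition, reordering and composition data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy composition: rewrite every "e" + U+0301 (bytes 65 CC 81) into
 * precomposed U+00E9 (bytes C3 A9) and copy all other bytes unchanged.
 * Returns 1 on success, 0 if out (of size outsz) is too small, in which
 * case the contents of out are unspecified.
 */
static int
toy_compose(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	if (outsz == 0)
		return 0;
	while (*in) {
		if (in[0] == 'e' && (unsigned char)in[1] == 0xCC &&
		    (unsigned char)in[2] == 0x81) {
			if (o + 2 >= outsz)
				return 0;
			out[o++] = (char)0xC3;
			out[o++] = (char)0xA9;
			in += 3;
		} else {
			if (o + 1 >= outsz)
				return 0;
			out[o++] = *in++;
		}
	}
	out[o] = '\0';
	return 1;
}
```

Before composition, strcmp() reports the two spellings of "café" as different; after it, they compare equal. A tool that only does plain byte comparison never needs this machinery, which is the point above.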

With best regards

Laslo

-- 
Laslo Hunhold <dev_AT_frign.de>
Received on Fri Sep 28 2018 - 20:27:30 CEST

This archive was generated by hypermail 2.3.0 : Fri Sep 28 2018 - 20:36:07 CEST