Re: [dev] freetype2/fc pain from sylvain.bertrand_AT_gmail.com on 2018-09-28 (dev mail list archive)

From: <sylvain.bertrand_AT_gmail.com>
Date: Fri, 28 Sep 2018 02:05:20 +0000

On Thu, Sep 27, 2018 at 10:06:25PM +0200, Laslo Hunhold wrote:
> ...

> The function bound() just operates on relatively small LUTs and is
> pretty efficient. If we implement a font drawing library in some way,
> we will have to think about how we do this special handling right.
> Extended grapheme clusters fortunately really stand for themselves and
> can be a good "atom" to base font rendering on.

Agreed: the "atom" would be this "extended grapheme cluster", and from this
point of view, a terminal would be a grid of "space" and "extended grapheme".
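
To make that grid idea concrete, here is a rough C sketch. Everything in it is
my own assumption for illustration (the names struct cell, struct grid,
CELL_MAX_CP, and the arbitrary cap of 8 code points per cell); it is not taken
from any existing terminal or library. A cell with zero code points stands for
"space", and storage is provided by the caller so nothing is allocated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap: Unicode sets no limit on cluster size, so a real
 * terminal would have to pick its own bound. 8 is an arbitrary choice. */
#define CELL_MAX_CP 8

/* One grid cell: either empty ("space", len == 0) or one extended
 * grapheme cluster stored as up to CELL_MAX_CP code points. */
struct cell {
    uint32_t cp[CELL_MAX_CP];
    uint8_t  len;
};

/* The terminal grid; cells points to caller-provided storage of
 * rows * cols elements, so this sketch never allocates. */
struct grid {
    size_t rows, cols;
    struct cell *cells;
};

int cell_is_space(const struct cell *c)
{
    return c->len == 0;
}

/* Append one code point to the cluster in a cell.
 * Returns 1 on success, 0 if the cell is already full. */
int cell_append(struct cell *c, uint32_t cp)
{
    if (c->len >= CELL_MAX_CP)
        return 0;
    c->cp[c->len++] = cp;
    return 1;
}

/* Return the cell at (row, col), or NULL if out of range. */
struct cell *grid_at(struct grid *g, size_t row, size_t col)
{
    if (row >= g->rows || col >= g->cols)
        return NULL;
    return &g->cells[row * g->cols + col];
}
```

What to do when a cluster exceeds the cap (truncate, or render a replacement)
is left open on purpose.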

> ...

> Javascript has its purposes if applied lightly and always as an
> afterthought (i.e. the page works 100% without Javascript).

Unfortunately, I am still working out some issues before suing the French
administration for that...

> This is not a bash or anything but really just due to the fact that all
> this processing on higher layers is a question of efficiency,
> especially when e.g. the UNIX system tools are used with plain ASCII
> data 99% of the time, not requiring all the UTF-8 processing.

For pure system tools ofc. But then I would need an i18n terminal for mutt,
lynx, etc.

> I would not favor such a solution, but this is just my opinion.

Idem, for the previous reasons.

> ...

> I've not yet dared to touch NFD or generally normalization and string
> comparison, but for simple stream-based operations and to get a grasp
> of a stream and where the bounds for extended grapheme clusters are
> you, by definition of bound(), only need to know the current and
> previous code point to know when a "drawn character" is finished.
>
> Still even there we would need bounds, as Unicode sets no limit for the
> size of an extended grapheme cluster. But this is a "problem" of the
> implementing application itself and not of the library, which I strive
> to have no memory allocations at all.
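
To illustrate the "previous and current code point" idea, a drastically
simplified sketch in C follows. This is NOT the bound() function discussed
above and not the full Unicode rule set: the names is_boundary() and
count_clusters() are hypothetical, and the tiny ranges stand in for the real
property tables (only CR LF, C0/C1 controls, the U+0300..U+036F combining
marks and ZWJ are handled).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the Control/CR/LF properties: C0 and C1 only. */
static int is_control(uint32_t cp)
{
    return cp < 0x20 || (cp >= 0x7F && cp <= 0x9F);
}

/* Simplified stand-in for Extend/ZWJ: the Combining Diacritical Marks
 * block plus U+200D. The real tables are much larger. */
static int is_extend(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F) || cp == 0x200D;
}

/* Return 1 if a cluster boundary lies between prev and cur, else 0.
 * Only these two code points are inspected; no other state is kept. */
int is_boundary(uint32_t prev, uint32_t cur)
{
    if (prev == 0x0D && cur == 0x0A)
        return 0;                 /* never split CR LF */
    if (is_control(prev) || is_control(cur))
        return 1;                 /* always break around controls */
    if (is_extend(cur))
        return 0;                 /* marks attach to what precedes them */
    return 1;                     /* otherwise break */
}

/* Count clusters in an array of already decoded code points. */
size_t count_clusters(const uint32_t *cp, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (i == 0 || is_boundary(cp[i - 1], cp[i]))
            count++;
    return count;
}
```

With this, "e" followed by U+0301 and then "a" counts as two clusters, and
CR LF counts as one.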

Well, there is something about "stream-safe" Unicode text. Basically, it
is a buffer of 128 bytes (32 Unicode code points) with a continuation mark if an
"extended grapheme cluster" is not finished at the end of the buffer. It seems
related only to on-the-fly stream normalization, though.
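
A rough C sketch of how such a bounded buffer could behave, under my own
assumptions: the names (struct cluster_buf, cluster_buf_push, the PUSH_*
results) are hypothetical, the boundary decision is passed in by the caller,
and a "partial" result plays the role of the continuation mark. It is not the
algorithm from the Unicode documents, only an illustration of the fixed-size
idea.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_BUF_CP 32   /* 32 code points * 4 bytes = 128 bytes */

struct cluster_buf {
    uint32_t cp[CLUSTER_BUF_CP];
    size_t   len;
};

enum push_result {
    PUSH_NONE,      /* nothing to flush yet */
    PUSH_CLUSTER,   /* out holds a complete cluster */
    PUSH_PARTIAL    /* buffer was full: out holds an unfinished chunk,
                       the cluster continues in the following chunk */
};

/* Feed one code point. boundary_before tells whether a cluster boundary
 * lies between the previously fed code point and cp (decided elsewhere).
 * When something is flushed, it is copied to out[0..*out_len). */
enum push_result
cluster_buf_push(struct cluster_buf *b, uint32_t cp, int boundary_before,
                 uint32_t out[CLUSTER_BUF_CP], size_t *out_len)
{
    enum push_result r = PUSH_NONE;

    if (b->len > 0 && (boundary_before || b->len == CLUSTER_BUF_CP)) {
        memcpy(out, b->cp, b->len * sizeof b->cp[0]);
        *out_len = b->len;
        r = boundary_before ? PUSH_CLUSTER : PUSH_PARTIAL;
        b->len = 0;
    }
    b->cp[b->len++] = cp;   /* len < CLUSTER_BUF_CP is guaranteed here */
    return r;
}
```

The buffer never grows, so garbage like thousands of combining marks in a row
only produces a series of partial chunks. At end of stream the caller still
has to flush whatever remains in the buffer.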

I did not go that deep into the "extended grapheme cluster" boundary
computation. It seems that everything we need is there, but it raises many
more questions, for instance:
  - how resilient is this finite state machine to garbage data?
  - can we locate "extended grapheme cluster" boundaries in non-normalized Unicode?
  - can we normalize an "extended grapheme cluster" on the fly?
  - etc...

regards,

-- 
Sylvain

Received on Fri Sep 28 2018 - 04:05:20 CEST
