On Thu, Sep 27, 2018 at 10:06:25PM +0200, Laslo Hunhold wrote:
> ...
> The function bound() just operates on relatively small LUTs and is
> pretty efficient. If we implement a font drawing library in some way,
> we will have to think about how we do this special handling right.
> Extended grapheme clusters fortunately really stand for themselves and
> can be a good "atom" to base font rendering on.
Agreed: the "atom" would be the "extended grapheme cluster", and from this
point of view a terminal would be a grid of cells, each holding either a
"space" or one "extended grapheme cluster".
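To make the grid idea concrete, here is a minimal sketch of such a cell in C. The names (struct cell, struct grid, grid_put) and the cap CELL_MAX_CP are hypothetical, and the cap is an arbitrary choice, since Unicode itself sets no limit on the size of a cluster:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap: Unicode sets no limit, so the application picks one. */
#define CELL_MAX_CP 8

struct cell {
	uint32_t cp[CELL_MAX_CP]; /* code points of one extended grapheme cluster */
	uint8_t  len;             /* 0 means the cell is "space" (empty) */
};

struct grid {
	size_t cols, rows;
	struct cell *cells;       /* rows * cols cells, row-major, caller-owned */
};

/* Store a cluster in cell (x, y). Returns 0 on success, -1 if the position
 * is outside the grid or the cluster exceeds CELL_MAX_CP code points. */
static int
grid_put(struct grid *g, size_t x, size_t y, const uint32_t *cp, size_t n)
{
	struct cell *c;

	assert(g != NULL && g->cells != NULL);
	if (x >= g->cols || y >= g->rows || n > CELL_MAX_CP)
		return -1;
	c = &g->cells[y * g->cols + x];
	if (n > 0)
		memcpy(c->cp, cp, n * sizeof(*cp));
	c->len = (uint8_t)n;
	return 0;
}
```

No allocation happens here; the caller provides the cell array, and a cluster that does not fit is rejected rather than truncated.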
> ...
> Javascript has its purposes if applied lightly and always as an
> afterthought (i.e. the page works 100% without Javascript).
Unfortunately, I am still working out some issues before suing the French
administration over that...
> This is not a bash or anything but really just due to the fact that all
> this processing on higher layers is a question of efficiency,
> especially when e.g. the UNIX system tools are used with plain ASCII
> data 99% of the time, not requiring all the UTF-8 processing.
For pure system tools, of course. But then I would still need an i18n-capable
terminal for mutt, lynx, etc.
> I would not favor such a solution, but this is just my opinion.
Likewise, for the reasons above.
> ...
> I've not yet dared to touch NFD or generally normalization and string
> comparison, but for simple stream-based operations and to get a grasp
> of a stream and where the bounds for extended grapheme clusters are
> you, by definition of bound(), only need to know the current and
> previous code point to know when a "drawn character" is finished.
>
> Still even there we would need bounds, as Unicode sets no limit for the
> size of an extended grapheme cluster. But this is a "problem" of the
> implementing application itself and not of the library, which I strive
> to have no memory allocations at all.
Well, there is something about stream-safe Unicode processing. Basically, it
is a buffer of 128 bytes (32 Unicode code points) with a continuation mark if
an "extended grapheme cluster" is not finished at the end of the buffer. It
seems related only to on-the-fly stream normalization, though.
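As far as I understand it, this is the Stream-Safe Text Format from UAX #15: no more than 30 non-starters (combining marks with a non-zero combining class) in a row, with U+034F COMBINING GRAPHEME JOINER inserted to break longer runs. A simplified sketch in C follows. The function names are hypothetical, is_nonstarter() is a toy stand-in for a real combining-class table, and the decomposition step of the real algorithm is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CGJ            0x034F /* U+034F COMBINING GRAPHEME JOINER */
#define MAX_NONSTARTER 30     /* run limit of the Stream-Safe Text Format */

/* Toy stand-in for a combining-class lookup (assumption): treats only
 * U+0300..U+036F, minus CGJ itself, as non-starters. */
static int
is_nonstarter(uint32_t cp)
{
	return cp >= 0x0300 && cp <= 0x036F && cp != CGJ;
}

/* Copy in[0..n) to out, inserting CGJ before any non-starter that would
 * be the 31st in a run. Returns the number of code points the result
 * needs; at most cap of them are actually written. */
static size_t
stream_safe(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
	size_t i, o = 0;
	unsigned run = 0;

	assert(in != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!is_nonstarter(in[i])) {
			run = 0;
		} else if (++run > MAX_NONSTARTER) {
			if (o < cap)
				out[o] = CGJ;
			o++;
			run = 1; /* the current mark starts a new run */
		}
		if (o < cap)
			out[o] = in[i];
		o++;
	}
	return o;
}
```

The only state is one counter, so this works on a stream with no allocation, and the consumer can then rely on a fixed-size buffer per run.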
I did not go that deep into the computation of "extended grapheme cluster"
boundaries. It seems that everything we need is there, but it raises many
more questions, for instance:
- how resilient is this finite state machine to garbage data?
- can we locate "extended grapheme cluster" boundaries in non-normalized Unicode?
- can we normalize an "extended grapheme cluster" on the fly?
- etc...
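On the first question, one possible approach (an assumption on my side, not something the boundary rules prescribe) is to absorb garbage at the decoding stage: every invalid byte sequence becomes U+FFFD and the decoder always advances by at least one byte, so the boundary state machine only ever sees valid code points. A sketch in C, with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CP_INVALID 0xFFFD /* U+FFFD REPLACEMENT CHARACTER */

/* Decode one code point from s[0..n), n >= 1. Stores it in *cp and
 * returns the number of bytes consumed, which is always >= 1, so a
 * caller looping over garbage input is guaranteed to make progress. */
static size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	/* smallest code point that legitimately needs len bytes */
	static const uint32_t min[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t len, i;
	uint32_t c;

	assert(s != NULL && n > 0);
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; c = s[0] & 0x07;
	} else {
		*cp = CP_INVALID; /* stray continuation or invalid lead byte */
		return 1;
	}
	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80) {
			*cp = CP_INVALID; /* truncated or broken sequence */
			return i;         /* resume at the offending byte */
		}
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates and values above U+10FFFF */
	if (c < min[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		c = CP_INVALID;
	*cp = c;
	return len;
}
```

With this in front, the questions about normalization remain open, but at least the boundary computation itself never has to deal with malformed input.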
regards,
--
Sylvain
Received on Fri Sep 28 2018 - 04:05:20 CEST