On Sat, Sep 29, 2018 at 03:46:36PM +0200, Laslo Hunhold wrote:
> On Sat, 29 Sep 2018 12:59:15 +0000
> sylvain.bertrand_AT_gmail.com wrote:
>
> Dear Sylvain,
>
> > mmmh... for the reason I stated before, the font files will probably
> > be more and more NFD-only (lighter font files, and significantly
> > less work for font designers). Font files will miss more and more
> > pre-combined (legacy) glyphs: full decomposition into base glyphs
> > will be more and more required.
>
> no, that's unlikely, as they cannot impose the data format that is
> still prominently used for all data exchange. The only thing that might
> happen is that font libraries will need to do some normalization, but
> maybe we are discussing nonsense here and the TTF format has some kind
> of way to refer to other glyphs and combine them or something.
That is what I thought at first, but much "average user" software uses
super-sucking renderers (harfbuzz/graphite/the apple one/uniscribe/others) that
do glyph combining. Therefore, for fonts with missing pre-combined glyphs,
those renderers will NFD-normalize them (with huge tables for CJK) and use the
combining glyphs, which are more likely to be in the font file.
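To illustrate what I mean, here is a rough sketch in C, not harfbuzz's actual
code path: the decomposition table is a toy excerpt and has_glyph() is a
made-up hook standing in for the real font query; real shapers carry the full
UnicodeData decomposition tables.

/* If the font lacks a precomposed glyph, fall back to its canonical (NFD)
 * decomposition and render base glyph + combining mark instead. */
#include <stddef.h>
#include <stdint.h>

struct decomp {
	uint32_t precomposed;
	uint32_t base;
	uint32_t combining;
};

/* toy table: U+00E9 = U+0065 + U+0301, U+00F1 = U+006E + U+0303 */
static const struct decomp decomp_table[] = {
	{ 0x00E9, 0x0065, 0x0301 },
	{ 0x00F1, 0x006E, 0x0303 },
};

/* has_glyph is a caller-provided hook; returns the number of code points
 * written to out[] (1 or 2) */
size_t
shape_codepoint(uint32_t cp, uint32_t out[2], int (*has_glyph)(uint32_t))
{
	size_t i;

	if (has_glyph(cp)) {
		out[0] = cp;
		return 1;
	}
	for (i = 0; i < sizeof(decomp_table) / sizeof(decomp_table[0]); i++) {
		if (decomp_table[i].precomposed == cp) {
			out[0] = decomp_table[i].base;
			out[1] = decomp_table[i].combining;
			return 2;
		}
	}
	out[0] = cp;	/* nothing known: let the font show .notdef */
	return 1;
}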
> > I have not gone into the details of the EGC boundaries algorithm, but
> > I'm really curious as to how the Unicode consortium's algorithm can know
> > that a code point is an EGC terminator without looking at the next
> > code point.
>
> It does in fact. The algorithm works by determining if _between_ two code
> points there is an EGC terminator.
Ok, then the consequences are spectacular: anything interactive has to do
context tracking, and this, code point by code point.
You cannot know whether an EGC is complete until you have seen the code point
that follows the last code point of this very EGC.
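To make that concrete, a rough sketch in C of why the decision needs both code
points; only a toy subset of the UAX #29 rules (GB3 and GB9) is shown, the
real rules also need per-cluster state and much bigger property tables.

#include <stdint.h>

static int
is_extend(uint32_t cp)
{
	/* toy subset: combining diacritical marks and ZWJ */
	return (cp >= 0x0300 && cp <= 0x036F) || cp == 0x200D;
}

/* returns 1 if an EGC boundary lies between prev and next, 0 otherwise */
int
egc_is_break(uint32_t prev, uint32_t next)
{
	if (prev == '\r' && next == '\n')
		return 0;	/* GB3: keep CR LF together */
	if (is_extend(next))
		return 0;	/* GB9: do not break before Extend or ZWJ */
	return 1;		/* GB999: otherwise, break everywhere */
}

The caller has to hold on to each code point until its successor arrives
before it can tell whether the cluster is closed, which is exactly the
per-code-point context tracking I am talking about.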
For instance, a terminal cell would have to be redrawn for each code point
transmitted by the terminal application (valid or not), because the terminal
cannot presume it is the last one of the EGC, yet it must display something.
The good thing about that: it allows code-point-by-code-point input (a bit as
if an 8-bit char were transmitted/input bit by bit and redrawn for each bit
received). A sketch of what that loop looks like follows.
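Something like this (hypothetical terminal internals, not st's actual code;
draw_cell() and advance_cursor() are stand-ins for the real rendering hooks,
and the break rule is the same toy one as in the previous sketch):

#include <stdint.h>
#include <stdio.h>

#define MAXCP 16	/* arbitrary cap on code points per cell, sketch only */

struct cell {
	uint32_t cps[MAXCP];
	int ncps;
};

static struct cell cur;

static void
draw_cell(const struct cell *c)
{
	printf("redraw cell with %d code point(s)\n", c->ncps);
}

static void
advance_cursor(void)
{
	printf("EGC complete, advance to next cell\n");
}

/* same toy break rule as in the previous sketch */
static int
egc_is_break(uint32_t prev, uint32_t next)
{
	(void)prev;
	return !((next >= 0x0300 && next <= 0x036F) || next == 0x200D);
}

void
feed_codepoint(uint32_t cp)
{
	if (cur.ncps > 0 && egc_is_break(cur.cps[cur.ncps - 1], cp)) {
		advance_cursor();	/* previous EGC turned out complete */
		cur.ncps = 0;
	}
	if (cur.ncps < MAXCP)
		cur.cps[cur.ncps++] = cp;
	draw_cell(&cur);		/* redraw: cp may not be the last */
}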
Where some maintainers may cringe: something that was trivially kept in sync
with non-"real"-Unicode text, namely the grid segmentation of Unicode text
done by terminal text editors, must now exactly match the one done by any
properly i18n-ed terminal. I foresee that some quite significant "funny"
things are going to happen here.
Now, regarding suckless, in my totally non-legitimate, wannabe and humble
opinion:
If I presume suckless to be limited to system tools, and I don't think I am
that wrong, then I 100% agree on the 100% ASCII stance. st should go full
ASCII, but since there are plans to support Wayland (which is consistent for a
graphical terminal emulator), that means a suckless-own ASCII-only font format
(100% text plz) and a custom renderer. Reverting to XFont would clean up the
base code for a future, really suckless Wayland backend.
But for consistency, some apps should be put in a non-suckless (or i18n)
category, a bit like gnu/non-gnu (e.g. surf). In this category, terminal ppl
may find fully i18n-ed, terminal-based and hybrid (similar to mplayer, with
i18n subs), user-oriented apps, because full i18n support cannot be suckless,
since most written languages suck really hard.
regards,
--
Sylvain