Re: [dev] [libgrapheme] announcement

From: <sylvain.bertrand_AT_gmail.com>
Date: Fri, 27 Mar 2020 22:24:22 +0000

On Fri, Mar 27, 2020 at 10:24:52PM +0100, Laslo Hunhold wrote:
> ... This will cover 99.5% of all cases...

What do you mean? Did they manage to put weird edge cases amounting to 0.5% into
the grapheme cluster definition?

About string comparison: if I recall correctly, after Unicode normalization
(n11n) of UTF-8 text, strings are supposed to be 100% reliable for byte-by-byte
comparison, provided both strings were normalized to the same form.
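A minimal sketch of that point in Python, using only the standard `unicodedata`
module. Python and NFC as the default target form are illustrative choices, not
anything from this thread, and `same_text` is a hypothetical helper. The
precomposed and decomposed spellings of 'è' differ byte for byte until both are
normalized to the same form.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare the UTF-8 bytes of a and b after
    normalizing both strings to the same form."""
    return (unicodedata.normalize(form, a).encode("utf-8")
            == unicodedata.normalize(form, b).encode("utf-8"))

precomposed = "\u00e8"   # LATIN SMALL LETTER E WITH GRAVE
decomposed = "e\u0300"   # 'e' + COMBINING GRAVE ACCENT

# Raw UTF-8 bytes differ: b'\xc3\xa8' vs b'e\xcc\x80'
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False
# Normalized to the same form (either one), they compare equal.
print(same_text(precomposed, decomposed))          # True
print(same_text(precomposed, decomposed, "NFD"))   # True
```

Both sides must use the same form: comparing an NFC string against an NFD one
byte by byte would still fail.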

The more you know: UTF-8 n11n has made its way into Linux filesystem support,
and quite recently. This will become a problem for terminal-based
applications: in GNU/Linux distros of the near future, filenames will be
normalized using the "right way"(TM) n11n.

This "right way"(TM) n11n (there are two canonical n11ns, NFC and NFD) produces
only decomposed, non-precomposed grapheme clusters of codepoints (though in the
CJK realm there are exceptions, if I recall properly). AFAIK, all terminal-based
applications expect "precomposed" codepoints, one per grapheme.
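To illustrate why this bites programs that assume one codepoint per terminal
cell, here is a rough Python sketch. It assumes, as predicted above, that the
filesystem hands back the NFD form of a filename. `naive_columns` and
`rough_columns` are hypothetical helpers, and skipping codepoints with a
nonzero canonical combining class is only a crude stand-in for real grapheme
cluster segmentation (UAX #29) and width computation.

```python
import unicodedata

def naive_columns(s: str) -> int:
    """What a codepoint-per-cell program assumes: one column per codepoint."""
    return len(s)

def rough_columns(s: str) -> int:
    """Crude approximation: skip codepoints with a nonzero canonical
    combining class. NOT full UAX #29 segmentation; ignores wide CJK too."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# Assumption: the filename comes back from the filesystem in NFD.
name = unicodedata.normalize("NFD", "caf\u00e9.txt")

print(naive_columns(name))   # 9: 'e' and U+0301 are counted separately
print(rough_columns(name))   # 8
# Stopgap for legacy code: recompose to NFC before measuring/displaying.
print(naive_columns(unicodedata.normalize("NFC", name)))   # 8
```

Recomposing to NFC is only a stopgap: not every decomposed sequence has a
precomposed equivalent, so correct handling still needs grapheme cluster
segmentation.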

For instance, the French letter 'è' won't be one codepoint anymore, but 'e'
followed by a combining grave accent (U+0300; the base letter comes first),
namely a sequence of two codepoints.
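The decomposition can be inspected with Python's `unicodedata` module (a quick
sketch; the `codepoints` helper is hypothetical). Note that the mark is U+0300
COMBINING GRAVE ACCENT, not the spacing backtick '`' (U+0060).

```python
import unicodedata

def codepoints(s):
    """Hypothetical helper: return 'U+XXXX NAME' for every codepoint in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

e_grave = "\u00e8"  # 'è' as a single precomposed codepoint

for form in ("NFC", "NFD"):
    print(form, codepoints(unicodedata.normalize(form, e_grave)))
# NFC ['U+00E8 LATIN SMALL LETTER E WITH GRAVE']
# NFD ['U+0065 LATIN SMALL LETTER E', 'U+0300 COMBINING GRAVE ACCENT']
```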

I am a bit scared because software like ncurses, lynx, links and vim may rely
on the software abominations we discussed earlier to handle all this.

-- 
Sylvain
Received on Fri Mar 27 2020 - 23:24:22 CET