Re: [dev] [st] wide characters

From: Thorsten Glaser <tg_AT_mirbsd.de>
Date: Mon, 15 Apr 2013 19:36:42 +0000 (UTC)

Strake dixit:

>In UTF-8 the maximum encoded character length is 6 bytes [1]

Right, but the largest codepoint in Unicode is U-0010FFFF,
which is F4 8F BF BF in UTF-8, so four octets at most.
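
A minimal sketch of the encoding arithmetic, assuming the usual
Unicode limit of U+10FFFF and no surrogates. The name utf8_encode
is made up for illustration and is not taken from st:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from st): encode one code point as UTF-8.
 * Returns the number of octets written to out (1 to 4), or 0 if cp is
 * a surrogate or lies above U+10FFFF. */
size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	assert(out != NULL);
	if (cp < 0x80) {			/* 0xxxxxxx */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* 110xxxxx 10xxxxxx */
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {			/* rest of the BMP */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;		/* surrogates are not encodable */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* supplementary planes */
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;				/* beyond Unicode */
}
```

Feeding it 0x10FFFF yields the four octets F4 8F BF BF, and anything
in the BMP needs three octets at most.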

Most things are in the BMP anyway – for example, the distance
between the lowest and highest encoded glyph in an X11 font
is limited to 2¹⁶, so you’ll end up using up to 3 octets normally.
Storing UTF-8 instead of a fixed-width type does come at additional
cost for some operations, though (glyph width, and, though very
minor, movement across characters).
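
To illustrate the movement part: with UTF-8 in the buffer, stepping
one character forward means skipping continuation octets instead of
just incrementing an index. A rough sketch, assuming well-formed
input; utf8_next and utf8_count are made-up names, not st functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given byte offset i into a UTF-8 buffer of
 * len octets, return the offset where the next character starts
 * (or len at the end). Assumes the input is well-formed. */
size_t
utf8_next(const unsigned char *s, size_t len, size_t i)
{
	assert(s != NULL);
	if (i < len)
		i++;
	/* continuation octets all look like 10xxxxxx */
	while (i < len && (s[i] & 0xC0) == 0x80)
		i++;
	return i;
}

/* Counting characters needs a walk over the whole buffer, whereas
 * a fixed-width array would know the answer from its length. */
size_t
utf8_count(const unsigned char *s, size_t len)
{
	size_t i = 0, n = 0;

	while (i < len) {
		i = utf8_next(s, len, i);
		n++;
	}
	return n;
}
```

The loop is cheap, which is why the cost is minor, but it is still
a loop where a fixed-width cell array needs none.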

Actually, wint_t is the standard type to use for this. One
could also use wchar_t but that may be an unsigned short on
some systems, or a signed or unsigned int. uint32_t makes
sense, if one doesn’t want to go after the possible savings
on 16-bit Unicode systems, since signed integers in C are
almost Undefined anyway…
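
A small sketch of the uint32_t variant, for illustration only. The
typedef name Rune and the three helpers are assumptions of mine, not
existing st code:

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical name for the cell type; exactly 32 bits, unsigned. */
typedef uint32_t Rune;

/* Store a decoded code point. In debug builds this rejects anything
 * beyond U+10FFFF; with NDEBUG the check disappears. */
Rune
rune_from_ulong(unsigned long cp)
{
	assert(cp <= 0x10FFFFul);
	return (Rune)cp;
}

/* Returns 1 if wchar_t on this system can represent every code
 * point up to U+10FFFF, 0 otherwise (e.g. with a 16-bit wchar_t). */
int
wchar_holds_unicode(void)
{
	return (uintmax_t)WCHAR_MAX >= 0x10FFFFu;
}

/* The result is reduced modulo 2^32, so this is always defined,
 * unlike overflow of a signed type. */
Rune
rune_offset(Rune base, Rune delta)
{
	return base + delta;
}
```

On a system with a 16-bit wchar_t, wchar_holds_unicode() returns 0,
which is the case where a fixed 32-bit type gives up the possible
savings but behaves the same everywhere.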

bye,
//mirabilos
-- 
15:39⎜«mika:#grml» mira|AO: "with XFree86® that wouldn’t have happened" - muhaha
15:48⎜<thkoehler:#grml> so why do the xorg guys actually break
everything? :)    15:49⎜<novoid:#grml> thkoehler: because as kids they were never
allowed to knock over the tower they had built?	-- ~/.Xmodmap wonders…
Received on Mon Apr 15 2013 - 21:36:42 CEST
