Re: [dev] [st] Proposal of changing internal representation
On Sat, Aug 23, 2014 at 05:35:54PM +0200, Roberto E. Vargas Caballero wrote:
> If the character is a multibyte, we decode it again!!!!. So for
> multibyte characters we:
>
> - decode
> - encode
> - decode
>
> It is slow and really ugly. But we have this problem not only in
> tputc. We have a function utf8len:
>
>
> size_t
> utf8len(char *c) {
> return utf8decode(c, &(long){0}, UTF_SIZ);
> }
>
> That decode again the string because in some places we need the size
> of the utf8 string.
I am not an st developer and not familiar with the code, but the above
approach seems quite crazy...
> I think we should decode the utf8 character in the input, store it
> in raw unicode with 4 bytes, and encode again in output (usually in
> getsel or in printer functions). The memory usage is going to be the
> same, because we store the utf8 string with 'char c[UTF_SIZ]', where
> UTF_SIZE is 4 (although it should be bigger because if we accept
> unicode of 32 bits then we can receive utf8 strings of 6 bytes).
Sounds pretty sensible to me.
Received on Sat Aug 23 2014 - 17:47:49 CEST
This archive was generated by hypermail 2.3.0
: Sat Aug 23 2014 - 18:00:11 CEST