Re: [dev] [st] Proposal of changing internal representation


From: Dimitris Papastamos <sin_AT_2f30.org>
Date: Sat, 23 Aug 2014 16:47:49 +0100

On Sat, Aug 23, 2014 at 05:35:54PM +0200, Roberto E. Vargas Caballero wrote:
> If the character is multibyte, we decode it again! So for
> multibyte characters we:
>
> - decode
> - encode
> - decode
>
> It is slow and really ugly. But we have this problem not only in
> tputc. We have a function utf8len:
>
>
> size_t
> utf8len(char *c) {
> 	return utf8decode(c, &(long){0}, UTF_SIZ);
> }
>
> That decodes the string again, because in some places we need the size
> of the utf8 string.

I am not an st developer and not familiar with the code, but the above
approach seems quite crazy...

> I think we should decode the utf8 character on input, store it
> as raw unicode in 4 bytes, and encode it again on output (usually in
> getsel or in the printer functions). The memory usage is going to be the
> same, because we store the utf8 string with 'char c[UTF_SIZ]', where
> UTF_SIZ is 4 (although it should be bigger, because if we accept
> unicode of 32 bits then we can receive utf8 strings of 6 bytes).

Sounds pretty sensible to me.
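
To make the idea concrete, here is a rough sketch of what "decode once,
store the code point, encode on output" could look like. This is not
st's code: Cell, cp_decode, cp_encode and cp_len are hypothetical names,
and I am assuming the 4-byte UTF-8 limit (code points up to U+10FFFF)
rather than the older 6-byte form. Overlong forms and surrogates are not
rejected here, to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cell type: one decoded code point per terminal cell. */
typedef struct {
	uint32_t u; /* raw Unicode code point, 4 bytes */
} Cell;

/* Decode one UTF-8 sequence from s (n bytes available) into *u.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * invalid or truncated. Overlong forms are NOT rejected (sketch). */
size_t
cp_decode(const char *s, size_t n, uint32_t *u)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t len, i;
	uint32_t cp;

	if (n == 0)
		return 0;
	if (p[0] < 0x80) {
		cp = p[0]; len = 1;
	} else if ((p[0] & 0xE0) == 0xC0) {
		cp = p[0] & 0x1F; len = 2;
	} else if ((p[0] & 0xF0) == 0xE0) {
		cp = p[0] & 0x0F; len = 3;
	} else if ((p[0] & 0xF8) == 0xF0) {
		cp = p[0] & 0x07; len = 4;
	} else {
		return 0;
	}
	if (n < len)
		return 0;
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return 0; /* not a continuation byte */
		cp = (cp << 6) | (p[i] & 0x3F);
	}
	*u = cp;
	return len;
}

/* Encoded length of a code point, computed without touching any
 * string. Returns 0 for values above U+10FFFF. */
size_t
cp_len(uint32_t u)
{
	if (u < 0x80)
		return 1;
	if (u < 0x800)
		return 2;
	if (u < 0x10000)
		return 3;
	if (u < 0x110000)
		return 4;
	return 0;
}

/* Encode u into s (at least 4 bytes). Returns bytes written, or 0. */
size_t
cp_encode(uint32_t u, char *s)
{
	size_t len = cp_len(u), i;
	static const unsigned char lead[] = { 0, 0x00, 0xC0, 0xE0, 0xF0 };

	if (len == 0)
		return 0;
	/* Fill continuation bytes from the end, 6 bits at a time. */
	for (i = len - 1; i > 0; i--) {
		s[i] = (char)(0x80 | (u & 0x3F));
		u >>= 6;
	}
	s[0] = (char)(lead[len] | u);
	return len;
}
```

With something like this, the input path would call cp_decode once and
store the result in the cell; the places that currently call utf8len
could get the size from cp_len with a couple of comparisons, and only
the output paths would call cp_encode.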
Received on Sat Aug 23 2014 - 17:47:49 CEST
