Re: [dev][st][patch] new utf decoder

From: Silvan Jegen <s.jegen_AT_gmail.com>
Date: Mon, 24 Mar 2014 11:29:12 +0100

On Mon, Mar 24, 2014 at 9:43 AM, Christoph Lohmann <20h_AT_r-36.net> wrote:
> Greetings.
>
> On Mon, 24 Mar 2014 09:43:23 +0100 "Roberto E. Vargas Caballero" <k0ga_AT_shike2.com> wrote:
>> > It is number of function calls, on cat dwm
>> >
>> > cat UTF-8-demo yields:
>> > utflen 113
>> > utfencode 8152
>> > utfdecode 198346
>> >
>> > So I think only utfdecode needs to be optimised, if necessary.
>>
>> I also like the patch, so if nobody complains about it then I will apply it
>> next week.
>
> The naming is wrong. It is just decoding »utf8« and can’t decode
> »utf16«. So: s,utf,utf8,g

While we are talking about variable naming...

> @@ -1308,9 +1257,8 @@ ttyread(void) {
> /* process every complete utf8 char */
> buflen += ret;
> ptr = buf;
> - while(buflen >= UTF_SIZ || isfullutf8(ptr,buflen)) {
> - charsize = utf8decode(ptr, &utf8c);
> - utf8encode(&utf8c, s);
> + while(charsize = utfdecode(ptr, &utf8c, buflen)) {
> + utfencode(utf8c, s, UTF_SIZ);

utf8c actually holds a Unicode code point at this stage and has nothing to do
with UTF-8 any more.

> tputc(s, charsize);
> ptr += charsize;
> buflen -= charsize;
> @@ -2420,7 +2368,7 @@ tputc(char *c, int len) {
> if(len == 1) {
> width = 1;
> } else {
> - utf8decode(c, &u8char);
> + utfdecode(c, &u8char, UTF_SIZ);

Same here for u8char.

> width = wcwidth(u8char);
> }
>
> @@ -3293,7 +3241,7 @@ xdraws(char *s, Glyph base, int x, int y, int charlen, int bytelen) {
> oneatatime = font->width != xw.cw;
> for(;;) {
> u8c = s;
> - u8cblen = utf8decode(s, &u8char);
> + u8cblen = utfdecode(s, &u8char, UTF_SIZ);

And here.

> s += u8cblen;
> bytelen -= u8cblen;
>
> @@ -3430,7 +3378,7 @@ xdrawcursor(void) {
> memcpy(g.c, term.line[term.c.y][term.c.x].c, UTF_SIZ);
>
> /* remove the old cursor */
> - sl = utf8size(term.line[oldy][oldx].c);
> + sl = utflen(term.line[oldy][oldx].c);
> width = (term.line[oldy][oldx].mode & ATTR_WIDE)? 2 : 1;
> xdraws(term.line[oldy][oldx].c, term.line[oldy][oldx], oldx,
> oldy, width, sl);
> @@ -3444,7 +3392,7 @@ xdrawcursor(void) {
> g.bg = defaultfg;
> }
>
> - sl = utf8size(g.c);
> + sl = utflen(g.c);
> width = (term.line[term.c.y][curx].mode & ATTR_WIDE)\
> ? 2 : 1;
> xdraws(g.c, g, term.c.x, term.c.y, width, sl);
> @@ -3548,7 +3496,7 @@ drawregion(int x1, int y1, int x2, int y2) {
> base = new;
> }
>
> - sl = utf8decode(new.c, &u8char);
> + sl = utfdecode(new.c, &u8char, UTF_SIZ);

And here.

I would suggest a name that reflects that these variables hold a decoded
Unicode code point rather than UTF-8 bytes (like 'unicodep' perhaps?).
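
To illustrate the distinction, here is a minimal sketch of a decoder with the
same general shape as the one in the patch (input pointer, output pointer,
buffer length; returns bytes consumed, or 0 when the sequence is incomplete).
It is not the code from the patch: the function body, the uint_least32_t type,
the UTF_INVALID value and the parameter name `cp` are my own assumptions. The
point is only that the input is UTF-8 bytes while the output is a plain code
point, so the output name should not carry a "utf8" prefix.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF_INVALID 0xFFFD /* assumed replacement value for bad input */

/*
 * Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if more input is needed.
 * On malformed input *cp is set to UTF_INVALID.
 */
size_t
utf8decode(const char *s, uint_least32_t *cp, size_t len)
{
	const unsigned char *p = (const unsigned char *)s;
	uint_least32_t u, min;
	size_t i, n;

	*cp = UTF_INVALID;
	if (len == 0)
		return 0;

	/* The lead byte determines the sequence length and payload bits. */
	if (p[0] < 0x80) {
		n = 1; u = p[0]; min = 0;
	} else if ((p[0] & 0xE0) == 0xC0) {
		n = 2; u = p[0] & 0x1F; min = 0x80;
	} else if ((p[0] & 0xF0) == 0xE0) {
		n = 3; u = p[0] & 0x0F; min = 0x800;
	} else if ((p[0] & 0xF8) == 0xF0) {
		n = 4; u = p[0] & 0x07; min = 0x10000;
	} else {
		return 1; /* stray continuation or invalid lead byte */
	}

	for (i = 1; i < n; i++) {
		if (i >= len)
			return 0; /* incomplete sequence: wait for more bytes */
		if ((p[i] & 0xC0) != 0x80)
			return i; /* bad continuation byte: skip what we read */
		u = (u << 6) | (p[i] & 0x3F);
	}

	/* Reject overlong forms, surrogates and out-of-range values. */
	if (u < min || u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
		u = UTF_INVALID;
	*cp = u;
	return n;
}
```

With a signature like this, a call site reads utf8decode(ptr, &cp, buflen),
and it is obvious that whatever gets passed to wcwidth() afterwards is a code
point and not an encoded byte sequence.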


Cheers,

Silvan
Received on Mon Mar 24 2014 - 11:29:12 CET