Re: [dev] Re: [st] multibyte patch

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Fri, 19 Nov 2010 14:38:41 +0100

On Sat, Nov 13, 2010 at 10:53 PM, Damian Okrasa <dokrasa_AT_gmail.com> wrote:
> I removed the wchar_t completely, added some UTF-8  parsing functions.
> No support for combining, bidi, doublecolumn etc. Markus Kuhn's UTF-8
> stress test file is not working 100% correctly (the decoder works
> however, even when reading bytes one by one).
>

I noticed in canstou():

/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
   UTF-8 otherwise return 0 */
static int canstou(char *s, int b) {
        unsigned char c = *s;
        int n;

        if (b < 1)
                return 0;
        else if (~c&B7)
                return 1;
        else if ((c&(B7|B6|B5)) == (B7|B6))
                n = 1;
        else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
                n = 2;
        else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                n = 3;
        else
                return 1;

        |
        v this is never reached.
        for (--b,++s; n>0&&b>0; --n,--b,++s) {
                c = *s;
                if ((c&(B7|B6)) != B7)
                        break;
        }
        if (n > 0 && b == 0)
                return 0;
        else
                return 1;
}

If the current function is correct, then it can be simplified to:

/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
   UTF-8 otherwise return 0 */
static int canstou(char *s, int b) {
        unsigned char c = *s;

        if (b < 1)
                return 0;
        else if (~c&B7)
                return 1;
        else if ((c&(B7|B6|B5)) == (B7|B6))
                return 1;
        else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
                return 2;
        else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                return 3;
        return 1;
}

The (b < 1) check probably shouldn't be there either.
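
The "if the current function is correct" premise can be checked by running
both versions on a truncated sequence. A minimal sketch, assuming B3..B7 are
the single-bit masks 0x08..0x80 (their definitions are not shown in this
message); the names canstou_orig and canstou_simple are made up for the
comparison and are not part of st:

```c
/* Assumed masks: the message does not show st's own definitions. */
enum { B3 = 0x08, B4 = 0x10, B5 = 0x20, B6 = 0x40, B7 = 0x80 };

/* The quoted canstou(), transcribed without the line numbers. */
static int canstou_orig(const char *s, int b) {
        unsigned char c = *s;
        int n;

        if (b < 1)
                return 0;
        else if (~c&B7)
                return 1;
        else if ((c&(B7|B6|B5)) == (B7|B6))
                n = 1;
        else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
                n = 2;
        else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                n = 3;
        else
                return 1;
        /* walk the continuation bytes that are available */
        for (--b,++s; n>0&&b>0; --n,--b,++s) {
                c = *s;
                if ((c&(B7|B6)) != B7)
                        break;
        }
        if (n > 0 && b == 0)
                return 0;
        else
                return 1;
}

/* The proposed shorter version, transcribed as posted. */
static int canstou_simple(const char *s, int b) {
        unsigned char c = *s;

        if (b < 1)
                return 0;
        else if (~c&B7)
                return 1;
        else if ((c&(B7|B6|B5)) == (B7|B6))
                return 1;
        else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
                return 2;
        else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
                return 3;
        return 1;
}
```

With these definitions, canstou_orig("\xE2\x82", 2) yields 0 (the third byte
of the sequence is missing), while canstou_simple("\xE2\x82", 2) yields 2. So
the loop after the else chain does appear to run for multibyte lead bytes,
and the shorter version would not be a drop-in replacement.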

Off-topic and not specifically aimed at you:
I noticed the coding style of st is quite ugly: lots of
non-descriptive variable names, recurring logic which could be grouped
into functions, and an inconsistent style overall. dwm is a good example
to follow imo, it's pretty clean.

Kind regards,
Hiltjo