On Sat, Nov 13, 2010 at 10:53 PM, Damian Okrasa <dokrasa_AT_gmail.com> wrote:
> I removed the wchar_t completely, added some UTF-8 parsing functions.
> No support for combining, bidi, doublecolumn etc. Markus Kuhn's UTF-8
> stress test file is not working 100% correctly (the decoder works
> however, even when reading bytes one by one).
>
I noticed in canstou():
/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
   UTF-8 otherwise return 0 */
static int canstou(char *s, int b) {
    unsigned char c = *s;
    int n;

    if (b < 1)
        return 0;
    else if (~c&B7)
        return 1;
    else if ((c&(B7|B6|B5)) == (B7|B6))
        n = 1;
    else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
        n = 2;
    else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
        n = 3;
    else
        return 1;
          |
          v this is never reached.
    for (--b,++s; n>0&&b>0; --n,--b,++s) {
        c = *s;
        if ((c&(B7|B6)) != B7)
            break;
    }
    if (n > 0 && b == 0)
        return 0;
    else
        return 1;
}
If the current function is correct, then it can be simplified to:
/* use this if your buffer is less than UTF_SIZ, it returns 1 if you can decode
   UTF-8 otherwise return 0 */
static int canstou(char *s, int b) {
    unsigned char c = *s;
    if (b < 1)
        return 0;
    else if (~c&B7)
        return 1;
    else if ((c&(B7|B6|B5)) == (B7|B6))
        return 1;
    else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
        return 2;
    else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
        return 3;
    return 1;
}
The (b < 1) check probably shouldn't be there either.
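One way to check whether the loop in the original function matters is to run it on complete and truncated sequences. This is only a minimal sketch: it assumes B3..B7 are single-bit masks of the form (1 << n), since st's definitions are not quoted in this mail, and canstou_selfcheck() is a hypothetical helper, not something from st.

```c
#include <assert.h>

/* Assumed bit masks; the quoted excerpt does not show st's definitions. */
#define B3 (1 << 3)
#define B4 (1 << 4)
#define B5 (1 << 5)
#define B6 (1 << 6)
#define B7 (1 << 7)

/* The original function as quoted above, unchanged apart from whitespace. */
static int canstou(char *s, int b) {
    unsigned char c = *s;
    int n;

    if (b < 1)
        return 0;
    else if (~c&B7)
        return 1;
    else if ((c&(B7|B6|B5)) == (B7|B6))
        n = 1;
    else if ((c&(B7|B6|B5|B4)) == (B7|B6|B5))
        n = 2;
    else if ((c&(B7|B6|B5|B4|B3)) == (B7|B6|B5|B4))
        n = 3;
    else
        return 1;
    for (--b,++s; n>0&&b>0; --n,--b,++s) {
        c = *s;
        if ((c&(B7|B6)) != B7)
            break;
    }
    if (n > 0 && b == 0)
        return 0;
    else
        return 1;
}

/* Hypothetical helper: exercises complete and truncated sequences. */
static void canstou_selfcheck(void) {
    assert(canstou("a", 1) == 1);            /* plain ASCII byte */
    assert(canstou("\xc3\xa9", 2) == 1);     /* complete 2-byte sequence */
    assert(canstou("\xc3", 1) == 0);         /* lead byte only, buffer exhausted */
    assert(canstou("\xe2\x82\xac", 3) == 1); /* complete 3-byte sequence */
    assert(canstou("\xe2\x82", 2) == 0);     /* 3-byte sequence cut after 2 bytes */
}
```

Under these assumptions the truncated cases return 0, so the original and the simplified version would differ for such input.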
Off-topic and not specifically aimed at you:
I noticed the coding style of st is quite ugly: lots of non-descriptive
variable names, recurring logic which could be grouped into functions,
and it is inconsistent. dwm is a good example to follow imo, it's
pretty clean.
Kind regards,
Hiltjo
Received on Fri Nov 19 2010 - 14:38:41 CET