Re: [dev] [9base][awk] printf and utf-8


From: David Dufberg Tøttrup <david_AT_dufberg.se>
Date: 22 Jan 2013 11:05:50 +0100

Speaking of the devil: if a string contains an invalid UTF-8 character,
substr behaves really weirdly:

$ echo | ~/program/9base/awk/awk '{s = sprintf("asdf%casdf", 195);
    printf("\"%s\"\n", substr(s, 6, 4)); print s;}'
""
asdfÃasdf

Try changing the second and third arguments of substr (set the length to 1
and it returns "f").
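
For reference, a small shell sketch of what that string holds at the byte level. It assumes a POSIX printf, od, cut and tr; the octal escape \303 is byte 195, and the helper names (mk_string, hex_dump, bytes_6_to_9) are made up for illustration. It says nothing about where 9base awk goes wrong, only what a purely byte-oriented substr(s, 6, 4) would be expected to return.

```shell
# Build the same string as sprintf("asdf%casdf", 195): byte 0xC3 (octal 303)
# is a UTF-8 lead byte, but the following 'a' is not a continuation byte,
# so the sequence is invalid UTF-8.
mk_string() {
    printf 'asdf\303asdf'
}

# Hex dump of the string with whitespace stripped, to show the raw bytes.
hex_dump() {
    mk_string | od -An -tx1 | tr -d ' \t\n'
}

# Bytes 6..9, i.e. what a byte-oriented substr(s, 6, 4) would yield.
bytes_6_to_9() {
    mk_string | LC_ALL=C cut -b 6-9
}

hex_dump; echo
bytes_6_to_9
```

The stray byte sits at position 5, so positions 6 to 9 are "asdf" whether the stray byte is counted as one byte or as one (invalid) rune, which is why the empty result looks wrong.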

David

On Jan 22 2013, Peter A. Shevtsov wrote:

>Hello,
>
> I've found a bug in 9base's awk. It seems that printf works incorrectly
> with UTF-8 strings. The way it counts string lengths is weird:
>
> echo latin кириллица | /usr/local/plan9/bin/awk
> '{printf("[%20s][%20s]\n", $1, $2)}'
>
>and the output is:
>
>[               latin][  кириллица]
>
> It seems that it counts every Cyrillic letter as two, i.e. it counts
> bytes rather than letters (or runes).
>
>
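
The byte-versus-rune difference described in the quoted report can be checked outside awk. Below is a minimal sketch, assuming a POSIX shell with printf, tr and wc, and assuming the input is valid UTF-8; the function names (byte_count, rune_count, pad_left) are hypothetical helpers, not part of 9base. Runes are counted by dropping UTF-8 continuation bytes (0x80 to 0xBF), and the padding is then computed from the rune count instead of the byte count. Note that a rune count is still not a display width for wide or combining characters.

```shell
# Number of bytes in the argument.
byte_count() {
    echo $(( $(printf '%s' "$1" | LC_ALL=C wc -c) ))
}

# Number of UTF-8 runes: delete continuation bytes (0x80-0xBF), count the rest.
# Only meaningful for valid UTF-8 input.
rune_count() {
    echo $(( $(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c) ))
}

# Right-align $1 in a field of $2 runes, padding by rune count, not bytes.
pad_left() {
    pad=$(( $2 - $(rune_count "$1") ))
    [ "$pad" -lt 0 ] && pad=0
    printf "%${pad}s%s" '' "$1"
}

printf '[%s][%s]\n' "$(pad_left latin 20)" "$(pad_left кириллица 20)"
```

Here "кириллица" is 9 runes but 18 bytes, so padding a 20-wide field by bytes leaves only 2 spaces, while padding by runes leaves 11.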
Received on Tue Jan 22 2013 - 11:05:50 CET
