Re: [dev] Re: [9base][awk] printf and utf-8

From: mauro tonon <tononmr_AT_gmail.com>
Date: Tue, 22 Jan 2013 09:05:11 +0100

2013/1/22 Peter A. Shevtsov <petr.shevtsov_AT_gmail.com>:
> On 22/01/13 at 02:32pm, Peter A. Shevtsov wrote:
>
>> It seems that it counts every Cyrillic letter as two, i.e. it doesn't count
>> letters (or runes) but bytes.
>
> Indeed,
>
> echo latin кириллица | /usr/local/plan9/bin/awk '{printf("%d %d\n", length($1), length($2))}'
>
> 5 18
>

Also, awk can't know beforehand whether the input string is UTF-8
encoded, so the only thing it can do is count bytes.
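
To make the bytes-versus-runes difference concrete, here is a minimal sh
sketch. It assumes the input is valid UTF-8 and that tr accepts octal
ranges; count_bytes and count_runes are hypothetical helper names, not
part of awk or 9base. A rune count falls out of the byte count by skipping
UTF-8 continuation bytes (0x80-0xBF).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of awk or 9base.

# Bytes in the argument: what a byte-oriented length() reports.
count_bytes() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Runes in the argument, ASSUMING it is valid UTF-8: delete the
# continuation bytes (octal 200-277, i.e. 0x80-0xBF) and count what
# remains, since every rune contributes exactly one non-continuation byte.
count_runes() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

for word in latin кириллица; do
    printf '%s: %s bytes, %s runes\n' \
        "$word" "$(count_bytes "$word")" "$(count_runes "$word")"
done
```

For the two words from the example above this reports 5 bytes and 5 runes
for "latin", and 18 bytes and 9 runes for "кириллица", which matches the
"5 18" that awk printed. If the input is not UTF-8, the rune count is
meaningless, which is the point about awk not knowing the encoding.
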
Received on Tue Jan 22 2013 - 09:05:11 CET
