Re: [dev] Re: [9base][awk] printf and utf-8

From: Sam Watkins <sam_AT_nipl.net>
Date: Wed, 23 Jan 2013 01:08:15 +1100

On Tue, Jan 22, 2013 at 09:05:11AM +0100, mauro tonon wrote:
> 2013/1/22 Peter A. Shevtsov <petr.shevtsov_AT_gmail.com>:
> > On 22/01/13 at 02:32pm, Peter A. Shevtsov wrote:
> >
> >> It seems that it counts every Cyrillic letter as two, i.e. it doesn't count
> >> letters (or runes) but bytes.
> >
> > Indeed,
> >
> > echo latin ?????????????????? | /usr/local/plan9/bin/awk '{printf("%d %d\n", length($1),
> > length($2))}'
> >
> > 5 18
> >
>
> Also, awk can't know beforehand if the input string is UTF-8 encoded
> or not, so the only thing it can do is to count bytes....

Don't we have environment variables for that? Or do they suck?
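
For reference, the variables I mean are the POSIX locale ones, consulted in
the order LC_ALL, then LC_CTYPE, then LANG. A rough sketch of how a script
could guess whether the user claims a UTF-8 locale; is_utf8_locale is a
made-up helper name, and I'm not saying 9base awk looks at any of this:

```shell
# Hypothetical helper, not part of 9base or awk.
# Succeeds (exit status 0) if the effective character locale names UTF-8.
is_utf8_locale() {
    # POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
    loc=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
    case $loc in
        *.UTF-8*|*.utf-8*|*.UTF8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "locale claims UTF-8"
else
    echo "locale does not claim UTF-8"
fi
```

This only tells you what the environment claims, not what the input bytes
actually are, which is the weak spot of the whole approach.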

In Plan 9, everything is UTF-8, no?

Anyway, I say stick with counting bytes, for better performance!
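
If someone does want rune counts, here is a minimal sketch of the two ways of
counting, assuming the input is valid UTF-8. byte_len and rune_len are
hypothetical helper names, not anything from 9base. The trick is that UTF-8
continuation bytes are always in the range 0x80-0xBF, so dropping them leaves
exactly one byte per code point:

```shell
# Hypothetical helpers, not part of 9base or awk.

# Number of bytes in the argument.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Number of UTF-8 code points (runes) in the argument, assuming valid UTF-8.
# Continuation bytes are 0x80-0xBF (octal 200-277); delete them and count
# what is left.
rune_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# Two Cyrillic letters, written out as raw UTF-8 bytes (0xD0 0xB6, twice).
word=$(printf '\320\266\320\266')
echo "$(byte_len latin) $(byte_len "$word")"   # prints "5 4"
echo "$(rune_len latin) $(rune_len "$word")"   # prints "5 2"
```

The rune count costs an extra pass over the data, which is the performance
trade-off I mean.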
Received on Tue Jan 22 2013 - 15:08:15 CET