Re: [dev] GSoC 2010 from Anselm R Garbe on 2010-03-08 (dev mail list archive)

From: Anselm R Garbe <anselm_AT_garbe.us>
Date: Mon, 8 Mar 2010 16:07:42 +0000

On 8 March 2010 15:57, Gregor Best <gbe_AT_ring0.de> wrote:
> On Mon, Mar 08, 2010 at 03:44:28PM +0000, Anselm R Garbe wrote:
>> [...]
>> Sure, but according to the spec:
>>
>> "The strlen() function shall compute the number of bytes in the string
>> to which s points, not including the terminating null byte."
>>
>> strlen() should not count multi-char characters as 1 but rather return
>> number of bytes. Do you disagree?
>> [...]
>
> I never read the actual docs of that function (a few glances at the
> manpage aside), and if it definitely says "count the number of bytes",
> fine. But intuitively, I would've thought it gives the length of a
> string, as in "how many letters appear on my screen if I printf()
> this?".

Well, if it did, many C programs would fall over completely, because
it is common to size buffers from the value strlen() returns. If that
were the number of UTF-8 characters rather than the number of bytes,
we'd presumably have buffer overflows for text in nearly any language
other than English.
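
To make the difference concrete, here is a minimal sketch in C. The
helper names utf8_codepoints() and dup_string() are made up for this
illustration and are not from any library; the code also assumes the
input is valid UTF-8.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count UTF-8 code points by counting every byte
 * that is not a continuation byte (10xxxxxx). Assumes valid UTF-8.
 */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Hypothetical helper: copy a string into a heap buffer sized from
 * strlen(). This is only safe because strlen() counts bytes.
 */
static char *dup_string(const char *s)
{
    size_t len = strlen(s);          /* bytes, without the terminating null */
    char *buf = malloc(len + 1);     /* +1 for the terminating null byte */

    if (buf != NULL)
        memcpy(buf, s, len + 1);
    return buf;
}
```

For the string "Gr\xc3\xbc\xc3\x9f" "e" (the German word for
"greetings", with two 2-byte characters), strlen() gives 7 while
utf8_codepoints() gives 5. A buffer sized from the smaller number
would be two bytes short before the terminating null is even counted.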

The only place where UTF-8 might matter is sorting routines, but I
wouldn't worry too much about it, because in most cases < or > on a
per-byte basis will still lead to reasonable results (bytewise order
of UTF-8 strings matches code point order), which is another reason
for the beauty of UTF-8. And if you really want better sorting
routines, I'd recommend Plan 9's Rune functions
(http://swtch.com/plan9port/man/man3/rune.html) on top of the plain
byte handling.
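
As a rough sketch of what per-byte comparison gives you: strcmp()
compares bytes as unsigned char, so sorting UTF-8 strings with it
yields code point order. The names cmp_bytes() and sort_utf8() below
are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain bytewise comparison of two C strings. */
static int cmp_bytes(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort an array of UTF-8 strings in place. */
static void sort_utf8(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_bytes);
}
```

Sorting "zebra", "éclair", "apple" and "Äpfel" this way gives
"apple", "zebra", "Äpfel", "éclair". That is code point order, not
dictionary order for any particular language, which is where
something more elaborate would come in.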

Cheers,
Anselm
Received on Mon Mar 08 2010 - 16:07:42 UTC