Re: [dev] [sbase][RFC] Add a simplistic version of tr

From: Strake <strake888_AT_gmail.com>
Date: Thu, 28 Nov 2013 13:24:40 -0500

On 28/11/2013, Silvan Jegen <s.jegen_AT_gmail.com> wrote:
> On Thu, Nov 28, 2013 at 11:45:33AM -0500, Strake wrote:
>> > (either using UTF-8 or UTF-32 indices), right?
>>
>> I meant Unicodepoints; those are just Unicodecs.
>
> UTF-32 is an encoding whose code units are identical to the Unicode code
> points, as far as I know. So what I am thinking is that one would either
> use the UTF-8 representation of the code point as an index, or the code
> point itself. Since using UTF-8 would not require any conversion (on UTF-8
> locales), I think it would be preferable.

UTF-8 is a variable-width encoding, so one must find the length of each
sequence anyhow and shift its bytes into an integer to get a usable index.
One may as well just use fgetwc or the like and work with code points.
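
To make the "find the length and shift it bytewise" step concrete, here is a
rough sketch of what a hand-rolled decoder has to do for each character. The
name utf8_decode and its signature are made up for illustration and are not
taken from sbase; the sketch also assumes well-formed input and does not
reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from sbase): decode one UTF-8 sequence from s,
 * where n bytes are available. Stores the code point in *cp and returns
 * the sequence length, or 0 if the input is truncated or malformed.
 * Simplification: overlong forms and surrogates are not rejected. */
size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	size_t len, i;
	uint32_t c;

	if (n == 0)
		return 0;

	/* Step 1: the lead byte determines the sequence length. */
	if (s[0] < 0x80) {
		len = 1;
		c = s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2;
		c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3;
		c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4;
		c = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len > n)
		return 0;

	/* Step 2: shift in six payload bits per continuation byte. */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}

	*cp = c;
	return len;
}
```

With fgetwc the C library does this work itself according to the current
locale (after setlocale(LC_CTYPE, "")), which is why indexing by code point
costs little extra over indexing by the raw UTF-8 bytes.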
Received on Thu Nov 28 2013 - 19:24:40 CET
