Re: [dev] [sbase] [PATCH] Rewrite tr(1) in a sane way

From: <random832_AT_fastmail.us>
Date: Fri, 09 Jan 2015 18:24:46 -0500

On Fri, Jan 9, 2015, at 18:08, FRIGN wrote:
>
> This is madness. If you want the bytes to be collated,

I don't see where you're getting that either of us wants the bytes to be
collated. I don't even know what you mean by "collated", since collating
is not what tr does, except when ordering ranges.

> you just write the
> literal \50102.

Even if octal escapes could be more than three digits, I have no idea
what you think 50102 is. Read as octal, its decimal value is 20546 and
its hex value is 0x5042. I have no idea what that has to do with the
character U+00F6, whose UTF-8 representation is 0xC3 0xB6... I just
realized what you're doing: 0xC3B6 has the _decimal_ value 50102. I have
no idea why you would think _that_ is a representation people would want
to use. If you're so pro-Unicode, make it accept \u00F6 - that's a valid
extension. But reusing the syntax POSIX uses for three-digit octal
literals for arbitrarily long decimal literals that aren't even Unicode
code points makes no sense at all. In what universe is that intuitive?
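
For reference, the arithmetic above can be checked with a short Python
sketch. The variable names are made up for illustration, and none of
this is code from sbase or from the patch under discussion.

```python
# U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) and its UTF-8 bytes.
ch = "\u00f6"
utf8 = ch.encode("utf-8")                 # the two bytes 0xC3 0xB6

# The two bytes read as one big-endian integer: 0xC3B6.
as_int = int.from_bytes(utf8, "big")      # 50102 in decimal

# The digits "50102" read as an octal literal instead.
octal_reading = int("50102", 8)           # 20546, i.e. 0x5042

# One three-digit octal escape per byte, as the existing syntax allows.
byte_escapes = "".join("\\%03o" % b for b in utf8)

print(as_int, octal_reading, hex(octal_reading), byte_escapes)
```

So the same character is \303\266 as per-byte octal escapes, while
50102 is only the decimal spelling of the two bytes glued together.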

> POSIX often is a solution to a problem that doesn't exist
> in the first place when you just use UTF-8.
>
> > They have nothing to do with UTF-8.
>
> That's exactly the point. Collating elements are depending on the current
> locale which is too much of a mess to deal with.

Huh?

> So when the Spanish "ll" collates before "m" and after "l" in a given
> locale, we don't give a fuck.
> So please give me the point why you are torturing me with this
> information.

Because collating elements are the thing POSIX forbids which you appear
to have _misinterpreted_ as forbidding multibyte characters. Otherwise I
have _no idea_ what in POSIX you interpret as preventing reasonable
behavior with UTF-8 multibyte characters.
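
To make the distinction concrete, here is a small Python sketch. It
only illustrates the terminology and is not how any tr(1) is
implemented; the sample word and names are assumptions chosen for the
example.

```python
# A multibyte character: one character, two bytes in UTF-8.
multibyte = "\u00f6"
char_count = len(multibyte)                  # counts characters
byte_count = len(multibyte.encode("utf-8"))  # counts bytes

# A collating element such as Spanish "ll": two separate characters
# that a locale may sort as a single unit.
collating_element = "ll"
element_chars = len(collating_element)

# A character-to-character mapping handles the multibyte character
# without needing to know anything about collation.
table = str.maketrans("\u00f6", "o")
translated = "sch\u00f6n".translate(table)

print(char_count, byte_count, element_chars, translated)
```

The mapping works per character, so a two-byte character is one unit
to translate, while "ll" stays two ordinary characters.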

> I stated that I did not implement collating elements into this tr(1) at
> the beginning and that it's a POSIX-nightmare to do so, bringing harm
> to anybody who is interested in a consistent, usable tool.

tl;dr:

Collating elements = POSIX forbids them = You don't want them anyway.
Multibyte characters = POSIX allows/requires them = You like them too.
What is the problem?
I don't know what you want to do that you think POSIX doesn't allow.
Received on Sat Jan 10 2015 - 00:24:46 CET
