Re: [dev] [sbase] [PATCH] Rewrite tr(1) in a sane way
On Fri, Jan 9, 2015, at 18:08, FRIGN wrote:
>
> This is madness. If you want the bytes to be collated,
I don't see where you're getting that either of us wants the bytes to be
collated. I don't even know what you mean by "collated", since collating
is not what tr does, except when ordering ranges.
> you just write the
> literal \50102.
Even if octal values could be more than three digits, I have no idea
what you think 50102 is. Read as octal, its decimal value is 20546 and
its hex value is 0x5042. I have no idea what that has to do with
character U+00F6, whose UTF-8 representation is 0xC3 0xB6... I just
realized what you're doing: 0xC3B6 has the _decimal_ value 50102. I have
no idea why you would think _that_ is a representation people would want
to use. If you're so pro-Unicode, make it accept \u00F6; that's a valid
extension. But reusing the syntax POSIX uses for three-digit octal
literals for arbitrarily long decimal literals that aren't even Unicode
code points makes no sense at all. In what universe is that intuitive?
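For anyone following along, the three numbers above can be checked with a small C sketch. The helper names (utf8_encode, pack_bytes, octal_value) are made up for illustration and are not from sbase; the encoder skips range and surrogate validation, and octal_value deliberately reads all five digits even though a POSIX tr octal escape takes at most three.

```c
#include <stddef.h>
#include <stdlib.h>

/* Encode one code point as UTF-8; returns the number of bytes written.
 * No range or surrogate validation (sketch only). */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}

/* Hypothetical helper: glue the encoded bytes into one big-endian
 * integer, which is how 0xC3 0xB6 becomes 0xC3B6 == 50102 decimal. */
unsigned long pack_bytes(const unsigned char *b, size_t n)
{
	unsigned long v = 0;
	size_t i;

	for (i = 0; i < n; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Hypothetical helper: interpret a digit string as octal. POSIX tr would
 * stop after three digits; reading all of them is deliberate here. */
unsigned long octal_value(const char *digits)
{
	return strtoul(digits, NULL, 8);
}
```

So utf8_encode(0xF6, buf) writes the two bytes 0xC3 0xB6, pack_bytes(buf, 2) is 50102, and octal_value("50102") is 20546, i.e. 0x5042: three different numbers, and only the packed-bytes one is what \50102 was apparently meant to denote.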
> POSIX often is a solution to a problem that doesn't exist
> in the first place when you just use UTF-8.
>
> > They have nothing to do with UTF-8.
>
> That's exactly the point. Collating elements are depending on the current
> locale which is too much of a mess to deal with.
Huh?
> So when the Spanish "ll" collates before "m" and after "l" in a given
> locale, we don't give a fuck.
> So please give me the point why you are torturing me with this
> information.
Because collating elements are what POSIX forbids, and you appear to
have _misinterpreted_ that as forbidding multibyte characters. Otherwise I
have _no idea_ what in POSIX you interpret as preventing reasonable
behavior with UTF-8 multibyte characters.
> I stated that I did not implement collating elements into this tr(1) at
> the beginning and that it's a POSIX-nightmare to do so, bringing harm
> to anybody who is interested in a consistent, usable tool.
tl;dr:
Collating elements = POSIX forbids them = You don't want them anyway.
Multibyte characters = POSIX allows/requires them = You like them too.
What is the problem?
I don't know what you want to do that you think POSIX doesn't allow.
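To make the "multibyte characters are fine" point concrete, here is a rough C sketch of tr-style translation that works on decoded code points instead of raw bytes. The function names (utf8_decode, utf8_encode, to_runes, tr_utf8) are hypothetical, not sbase's API. It assumes well-formed UTF-8 input, caps each set at 256 characters, ignores ranges and classes, and, as GNU tr does, reuses the last character of set2 when set2 is shorter than set1.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence into *cp; assumes well-formed input. */
size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if (s[0] < 0xE0) {
		*cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if (s[0] < 0xF0) {
		*cp = ((unsigned long)(s[0] & 0x0F) << 12) |
		      ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return 3;
	}
	*cp = ((unsigned long)(s[0] & 0x07) << 18) |
	      ((unsigned long)(s[1] & 0x3F) << 12) |
	      ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
	return 4;
}

/* Encode one code point as UTF-8; returns the number of bytes written. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}

/* Decode a NUL-terminated UTF-8 string into at most max code points. */
size_t to_runes(const char *s, unsigned long *r, size_t max)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t n = 0;

	while (*p && n < max)
		p += utf8_decode(p, &r[n++]);
	return n;
}

/* Translate in -> out character by character: a code point found at
 * position i of set1 becomes the code point at position i of set2 (or
 * set2's last one if set2 is shorter). The first match in set1 wins, a
 * simplification. out must be large enough; there is no bounds check. */
void tr_utf8(const char *in, const char *set1, const char *set2, char *out)
{
	unsigned long s1[256], s2[256], cp;
	size_t n1 = to_runes(set1, s1, 256), n2 = to_runes(set2, s2, 256), i;
	const unsigned char *p = (const unsigned char *)in;
	unsigned char *o = (unsigned char *)out;

	while (*p) {
		p += utf8_decode(p, &cp);
		for (i = 0; i < n1; i++) {
			if (s1[i] == cp) {
				if (n2 > 0)
					cp = s2[i < n2 ? i : n2 - 1];
				break;
			}
		}
		o += utf8_encode(cp, o);
	}
	*o = '\0';
}
```

With that, tr_utf8("höhö", "ö", "o", out) yields "hoho": ö is matched as one character even though it is two bytes, and nothing here needs collating elements or locale-specific ordering.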
Received on Sat Jan 10 2015 - 00:24:46 CET