Re: [dev] [sbase] [PATCH-UPDATE] Rewrite tr(1) in a sane way

From: FRIGN <dev_AT_frign.de>
Date: Sun, 11 Jan 2015 11:17:58 +0100

On Sat, 10 Jan 2015 22:47:09 +0100
Markus Wichmann <nullplan_AT_gmx.net> wrote:

> You wanted to be Unicode compatible, right? Because in that case I
> expect [:alpha:] to be the class of all characters in General Category L
> (that is, Lu, Ll, Lt, Lm, or Lo). That includes a few more characters
> than just A-Z and a-z. And I don't see you add any other character to
> that class later.

Okay, to clear this up once and for all.
Initially, I planned to just ignore the [:CLASS:] blocks in the interest
of a simpler implementation (if you go all the way, you end up with a
complex, crufty POSIX libc mess). But Dimitris and Hiltjo rightly
pointed out that we can't just break scripts that easily. So this was one
motivation for adding basic support, to at least provide semi-consistent
behaviour.
I also have to keep in mind that glibc is not the only libc around. And
toupper() only operates on single bytes anyway (effectively ASCII for
UTF-8 input), so you can't work with that.
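To illustrate the toupper() point, here is a minimal sketch (the helper name bytewise_toupper is hypothetical, not from sbase) that upper-cases a string byte by byte. In the default "C" locale the ASCII letters change, but the two bytes of UTF-8 "é" (0xC3 0xA9) pass through untouched. The wide-character towupper() exists, but what it covers depends on the libc and the locale.

```c
#include <ctype.h>

/* Upper-case a NUL-terminated string one byte at a time.
 * Because toupper() only ever sees a single byte, a multibyte UTF-8
 * sequence such as "é" (0xC3 0xA9) is never handled as a whole;
 * in the default "C" locale only 'a'..'z' are changed. */
static void
bytewise_toupper(char *s)
{
	for (; *s; s++)
		*s = (char)toupper((unsigned char)*s);
}
```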

> So, what I'm saying is, you can't have it both ways: Either you support
> Unicode or not.

That's true, but I never aimed for full Unicode support. I only support
UTF-8 in the basic sense, which allows mapping any Unicode character.
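Supporting UTF-8 in this sense means tr(1) maps decoded code points rather than raw bytes. A minimal sketch of decoding one sequence follows. The function name utf8_decode is hypothetical and not taken from sbase's sources. It rejects malformed, truncated, overlong and surrogate sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s, with n bytes available.
 * On success, store the code point in *cp and return the sequence
 * length (1-4); return 0 for malformed, truncated, overlong or
 * surrogate sequences and for values above U+10FFFF. */
static size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	size_t len, i;
	uint32_t c;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2;
		c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3;
		c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4;
		c = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (n < len)
		return 0;
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0; /* expected a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* reject overlong encodings, surrogates and out-of-range values */
	if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800) ||
	    (len == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
	    c > 0x10FFFF)
		return 0;
	*cp = c;
	return len;
}
```

A caller would advance through the input by the returned length and treat a return of 0 as an encoding error.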

> I really don't see a way to achieve this without including a database of
> sorts into tr itself. (...) If we had a variable
> iterate from 1 to Unicode maximum and call iswalpha() for every one,
> we'd get the set of all alphabetic characters. Can this work for us?
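For reference, the suggested enumeration could look like the minimal sketch below. It assumes wchar_t values are Unicode code points, which holds on common Linux libcs but is not guaranteed by ISO C. It also assumes the caller has run setlocale(LC_CTYPE, "") beforehand. The name count_alpha is hypothetical. Its result depends on the libc's tables and the active locale, so it varies between systems.

```c
#include <stddef.h>
#include <wctype.h>

/* Count the code points the current locale classifies as alphabetic
 * by calling iswalpha() on every value from 1 up to U+10FFFF.
 * Surrogates (U+D800..U+DFFF) are skipped because they are not
 * characters. The result depends on the locale and on the libc. */
static size_t
count_alpha(void)
{
	size_t n = 0;
	unsigned long c;

	for (c = 1; c <= 0x10FFFF; c++) {
		if (c >= 0xD800 && c <= 0xDFFF)
			continue;
		if (iswalpha((wint_t)c))
			n++;
	}
	return n;
}
```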

Or we just stop worrying about it.

The only reason I added the raw classes is to avoid breaking scripts in
a major way.
I agree that A-Z is not sufficient to define [:upper:]. What I planned
was to also include the Greek and Cyrillic alphabets along with a number
of accented characters. At the end of the day, we can relax, given how
flexible this tr(1) implementation is in accommodating these ideas.

In 99% of cases, A-Z is sufficient, though. But for a better experience,
I'll augment it as soon as I have put together some of my ideas.
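The planned extension of [:upper:] beyond A-Z can be sketched as a sorted table of code point ranges, looked up by binary search on decoded code points. This is a minimal sketch under stated assumptions. The names upper_table, inrange and isupper_cp are hypothetical. The table covers only ASCII, the Latin-1 accented capitals, and the basic Greek and Cyrillic capitals, not all of General Category Lu.

```c
#include <stddef.h>
#include <stdint.h>

/* A code point range, inclusive on both ends. */
struct range {
	uint32_t lo, hi;
};

/* Hypothetical [:upper:] table, sorted and non-overlapping.
 * U+00D7 (multiplication sign) and U+03A2 (unassigned) are gaps. */
static const struct range upper_table[] = {
	{ 0x0041, 0x005A }, /* ASCII A-Z */
	{ 0x00C0, 0x00D6 }, /* Latin-1 capitals A-grave..O-diaeresis */
	{ 0x00D8, 0x00DE }, /* Latin-1 capitals O-stroke..Thorn */
	{ 0x0391, 0x03A1 }, /* Greek Alpha..Rho */
	{ 0x03A3, 0x03A9 }, /* Greek Sigma..Omega */
	{ 0x0400, 0x042F }, /* Cyrillic Ie-grave..Ya */
};

/* Binary search for c in a sorted, non-overlapping range table. */
static int
inrange(const struct range *t, size_t n, uint32_t c)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (c < t[mid].lo)
			hi = mid;
		else if (c > t[mid].hi)
			lo = mid + 1;
		else
			return 1;
	}
	return 0;
}

/* Return nonzero if code point c is in the hypothetical [:upper:]. */
static int
isupper_cp(uint32_t c)
{
	return inrange(upper_table,
	    sizeof(upper_table) / sizeof(upper_table[0]), c);
}
```

With this layout, covering another script means adding rows to the table while the lookup code stays unchanged.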

Thanks for your feedback!

Cheers

FRIGN

-- 
FRIGN <dev_AT_frign.de>