On Sat, 10 Jan 2015 22:47:09 +0100
Markus Wichmann <nullplan_AT_gmx.net> wrote:
> You wanted to be Unicode compatible, right? Because in that case I
> expect [:alpha:] to be the class of all characters in General Category L
> (that is, Lu, Ll, Lt, Lm, or Lo). That includes a few more characters
> than just A-Z and a-z. And I don't see you add any other character to
> that class later.
Okay, to clear this up once and for all.
Initially, I planned to just ignore the [:CLASS:]-blocks in the interest
of a simpler implementation (if you go all the way, you end up with a
complex and crufty POSIX-libc-mess). But Dimitris and Hiltjo rightfully
pointed out that we can't just break scripts that easily. So this was one
motivation for adding basic support, to at least provide semi-consistent
behaviour.
I also take into account that glibc is not the only libc around.
toupper() only operates on single bytes anyway (which for UTF-8 input
means ASCII), so you can't work with that.
> So, what I'm saying is, you can't have it both ways: Either you support
> Unicode or not.
That's true, but I never aimed for Unicode support. I only support UTF-8
in the basic sense, which allows mapping all Unicode characters.
> I really don't see a way to achieve this without including a database of
> sorts into tr itself. (...) If we had a variable
> iterate from 1 to Unicode maximum and call iswalpha() for every one,
> we'd get the set of all alphabetic characters. Can this work for us?
Or we just stop worrying about it.
The only reason I added the raw classes is to avoid breaking scripts in
a major way.
I agree that A-Z is not sufficient to define [:upper:]. What I planned
was to also include the Greek and Cyrillic alphabets along with a number
of accented characters. At the end of the day, we can relax, given how
flexible this tr(1) implementation is in accommodating these ideas.
In 99% of the cases, A-Z is sufficient though. But for a better experience,
I'll augment it as soon as I have put together some of my ideas.
Thanks for your feedback!
Cheers
FRIGN
--
FRIGN <dev_AT_frign.de>
Received on Sun Jan 11 2015 - 11:17:58 CET