Re: [dev] [sbase] [PATCH-UPDATE] Rewrite tr(1) in a sane way


From: Dmitrij D. Czarkoff <czarkoff_AT_gmail.com>
Date: Sat, 10 Jan 2015 23:19:20 +0100

FRIGN said:
> On Sat, 10 Jan 2015 02:52:09 +0100
> "Dmitrij D. Czarkoff" <czarkoff_AT_gmail.com> wrote:
>
> > > +#define UPPER "A-Z"
> > > +#define LOWER "a-z"
> > > +#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
> >
> > These definitions hugely misrepresent corresponding character classes.
>
> I interpreted the character classes by default for the C locale. What do
> you mean by hugely misrepresenting? They are just fragments to build the
> classes later on.

No, you interpret the character classes for the C locale only, not just
by default. Character classes are useless for the C locale ("A-Z" is
easier to type than "[:upper:]" anyway); they only really make sense for
scripts that are supposed to do The Right Thing™ for every locale.
Also, defining ranges on systems with no locale-aware collation rules
may be tricky.

As I gather, sbase is supposed to ignore POSIX locales, so there is no
reasonable hope that "[A-Z]" would actually match the whole alphabets of
languages based on the Latin script. Thus the sanest default I see here
is to use the isw* family of functions for matching characters against
classes, delegating the problem to libc, where it actually belongs.
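
A minimal sketch of what delegating to libc could look like, using the
standard wctype()/iswctype() pair from <wctype.h>. The helper name
inclass() is hypothetical and not part of the patch; the sketch also
assumes the caller has already decoded input into wide characters and,
if locale support is wanted at all, has called setlocale(LC_CTYPE, "").
Without that call the program stays in the C locale and only the ASCII
members of each class match.

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not taken from the patch: returns 1 if wc belongs
 * to the named class ("upper", "lower", "punct", ...) in the current
 * locale, and 0 otherwise. An unknown class name also yields 0, because
 * wctype() returns 0 for names the locale does not define.
 */
static int
inclass(wint_t wc, const char *name)
{
	wctype_t desc = wctype(name);

	if (!desc)
		return 0;
	return iswctype(wc, desc) != 0;
}
```

With something like this, "[:upper:]" would map to inclass(wc, "upper")
rather than to a hard-coded "A-Z" fragment, and whatever the libc knows
about the active locale is picked up for free.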

That said, the defines in your patch appear to be fully compatible with
the GNU and BSD implementations of tr(1), so you may as well discourage
use of character classes in the manual, label them as legacy
compatibility syntax and be done with it.

-- 
Dmitrij D. Czarkoff

Received on Sat Jan 10 2015 - 23:19:20 CET
