Re: [dev] [sbase][RFC] Add a simplistic version of tr from Silvan Jegen on 2013-11-30 (dev mail list archive)

From: Silvan Jegen <s.jegen_AT_gmail.com>
Date: Sat, 30 Nov 2013 12:38:21 +0100

On Thu, Nov 28, 2013 at 12:45:40PM +0200, sin wrote:
> On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> > Hi
> >
> > This is a braindead and incomplete implementation of tr that only
> > works for one-byte encodings. Do you think it makes sense to use this
> > implementation as some kind of stopgap-measure until we have a more
> > robust version of tr?
>
> This particular version of the patch does not introduce a manpage
> which would be necessary to document the limited behaviour of the
> current program.

I can add a man page as soon as we have decided whether we want Unicode
support or not.
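
To make the one-byte limitation concrete, here is a minimal sketch of a
table-driven translation. It is not the code from the patch: the names
build_table and translate are hypothetical, and reusing the last byte of
set2 when set2 is shorter than set1 is an assumed convention. Since the
table is indexed by a single byte, a multi-byte UTF-8 sequence cannot be
mapped as one character.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a 256-entry translation table from set1/set2.
 * Bytes that do not occur in set1 map to themselves. If set2 is shorter
 * than set1, its last byte is reused (an assumed convention).
 */
static void
build_table(unsigned char table[256], const char *set1, const char *set2)
{
	size_t i, len2 = strlen(set2);

	for (i = 0; i < 256; i++)
		table[i] = (unsigned char)i;
	if (len2 == 0)
		return;
	for (i = 0; set1[i] != '\0'; i++)
		table[(unsigned char)set1[i]] =
		    (unsigned char)set2[i < len2 ? i : len2 - 1];
}

/* Translate a buffer in place, one byte at a time. */
static void
translate(const unsigned char table[256], char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = (char)table[(unsigned char)buf[i]];
}
```

With set1 "el" and set2 "ip", translating "hello" in place would give
"hippo".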

> I am starting to wonder, do you guys think it would make sense to
> have a staging branch that we can use for incomplete tools? Currently
> some of the tools implement a subset of the total behaviour but I'd
> like to believe that they implement that subset correctly. As long as
> we document that, they can go in master with possible eprintf("not implemented");
> calls for the options that we care about.
>
> Programs that are obviously buggy can go in the staging branch.

I don't mind either way. Having a staging area could allow the project
to grow faster since not every contribution has to be complete to be
included.

> > If you would rather not take this version, what approach would
> > you take for the character set mapping when using UTF-8? A hashmap-
> > or B-tree-based solution, or something else entirely?
>
> I am not knowledgeable enough about UTF-8 so I can't answer this.
> A B-tree is, I think, overkill for sbase. We do not have a nice
> implementation of a hash table in sbase as we did not need it but
> if we go down that path it makes sense to put this in util/ so other
> programs can benefit. Currently we don't have an implementation of
> a singly linked list that we can reuse, but that is trivial enough and
> we've re-implemented it wherever needed (with the minimum set of
> operations needed for each tool). I can send an implementation of
> a hash table that I've used for my own programs, MIT/X licensed and it is
> simple enough.
>
> Regarding UTF-8, some other programs in sbase also lack proper handling
> of UTF-8. Do you think we could embed libutf8 from suckless.org and
> use it?

I think having Unicode support is necessary at least in the long run
and UTF-8 is the way to go. libutf provides the most basic handling of
UTF-8 but should be sufficient as long as you do not want to go into
text normalization too much [1] [2]. BTW, the most recently updated version of
the library seems to be at https://github.com/cls/libutf/commits/master
and not at http://git.suckless.org/libutf/ for some reason.

[1] http://blog.golang.org/normalization
[2] http://mortoray.com/2013/11/27/the-string-type-is-broken/
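
As one possible "something else", a sorted array searched with bsearch()
might be enough for the set1 -> set2 mapping once the input is handled as
code points. The sketch below is an assumption-laden illustration, not a
proposal for the final design: struct runemap and the runemap_* functions
are hypothetical names, and decoding UTF-8 into code points (for example
with libutf) is assumed to happen elsewhere.

```c
#include <stdint.h>
#include <stdlib.h>

/* One set1 -> set2 mapping between Unicode code points (hypothetical type). */
struct runemap {
	uint32_t from;
	uint32_t to;
};

/* Order mappings by their source code point. */
static int
runemap_cmp(const void *a, const void *b)
{
	const struct runemap *x = a, *y = b;

	return (x->from > y->from) - (x->from < y->from);
}

/* Sort the mappings once, after set1 and set2 have been parsed. */
static void
runemap_sort(struct runemap *map, size_t n)
{
	qsort(map, n, sizeof(*map), runemap_cmp);
}

/* Look up a code point; unmapped code points are returned unchanged. */
static uint32_t
runemap_lookup(const struct runemap *map, size_t n, uint32_t r)
{
	struct runemap key = { r, 0 };
	const struct runemap *hit;

	hit = bsearch(&key, map, n, sizeof(*map), runemap_cmp);
	return hit ? hit->to : r;
}
```

Lookups cost O(log n) in the size of set1 and need no additional data
structure in util/. A hash table would still be the better fit if the
sets turn out to be large.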

> > +usage(void)
> > +{
> > + eprintf("usage: tr set1 [set2]\n");
> > +}
>
> Use %s and argv0.

I changed it in the new version of the patch, which I will send out once
we have settled the Unicode issue.

> > +void
> > +handle_escapes(char *s)
> > +{
> > + switch(*s) {
> > + case 'n':
> > + *s = '\x0A';
> > + break;
> > + case 't':
> > + *s = '\x09';
> > + break;
> > + case '\\':
> > + *s = '\x5c';
> > + break;
> > + }
> > +}
>
> I have not yet applied this patch but I suspect you have
> mixed whitespace + tabs here. Use tabs only.

You were right. I changed the whitespace to be tabs only.
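
For illustration, the same three escapes (\n, \t and \\) could also be
handled in a single in-place pass over the whole set string. This is only
a sketch under assumptions, not the code from the patch: the name unescape
is hypothetical, and leaving unknown escapes untouched is an assumed
policy.

```c
#include <stddef.h>

/*
 * Hypothetical helper: rewrite \n, \t and \\ in place and return the new
 * length. Unknown escapes and a trailing backslash are copied unchanged
 * (an assumed policy).
 */
static size_t
unescape(char *s)
{
	char *r = s, *w = s;

	while (*r != '\0') {
		if (r[0] == '\\' && r[1] != '\0') {
			switch (r[1]) {
			case 'n':
				*w++ = '\n';
				r += 2;
				continue;
			case 't':
				*w++ = '\t';
				r += 2;
				continue;
			case '\\':
				*w++ = '\\';
				r += 2;
				continue;
			}
		}
		/* Ordinary byte, unknown escape or trailing backslash. */
		*w++ = *r++;
	}
	*w = '\0';
	return (size_t)(w - s);
}
```

Because the output is never longer than the input, the rewrite can safely
reuse the argument buffer.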

> > + if (ferror(stdin)) {
> > + eprintf("<stdin>: read error:");
> > + return EXIT_FAILURE;
> > + }
>
> Indentation issues.

Corrected.

Cheers,

Silvan
Received on Sat Nov 30 2013 - 12:38:21 CET
