Re: [dev] [sbase][sed] first pass at sed, request for comments from FRIGN on 2015-02-04 (dev mail list archive)

From: FRIGN <dev_AT_frign.de>
Date: Wed, 4 Feb 2015 19:27:57 +0100

On Tue, 3 Feb 2015 15:18:59 -0800
Evan Gates <evan.gates_AT_gmail.com> wrote:

Hey Evan,

> I finally have a new sed implementation. It's littered with FIXMEs and
> there are some points that need to be discussed, but for the most part
> it works like it should. It definitely has some work until it sucks
> less.

That sounds very cool! sed(1) is a big item on the TODO list, and I'm
glad you sat down and worked on it!
Let me get to your points:

> 1) When should we choose POSIX behavior and when should we choose GNU
> behavior? There are a number of differences, many times the GNU
> behavior seems to make more sense. (no final newline,
> semicolon/whitespace terminated labels and filenames, etc.)

Sometimes a third way is best: make it as flexible as possible.
If a delimiter is limited to one char, for instance, implement it in a
way that allows arbitrary strings (this could be improved here). GNU
behaviour can be erratic in many cases and, on reflection, the POSIX
people mostly have good reasons for making software behave as it does.
In some cases, though, POSIX is not suckless.

Can you be more specific and list the cases where GNU differs from
POSIX? I am absolutely sure this has to be decided on a case-by-case
basis.

> 2) Should we strictly enforce valid UTF-8? In the script? In the
> input? Currently it's enforced in a few places of the script because
> that made it easier for me, but it's not enforced in the input file.

Use chartorunearray instead of hand-rolling the conversion. Having a
Rune array can make things simpler.
Also, readrune already deals with invalid UTF-8: partial reads are
returned as RuneError.
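
To illustrate the idea of decoding once into an array and mapping bad
input to an error rune, here is a minimal standalone sketch. It is not
the libutf code: decode_rune, decode_all and RUNE_ERROR are
hypothetical names, and consuming exactly one byte per invalid
sequence is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define RUNE_ERROR 0xFFFDul /* hypothetical stand-in for RuneError */

/*
 * Decode one code point from s[0..len). Stores it in *r and returns
 * the number of bytes consumed. Invalid, overlong or truncated
 * sequences store RUNE_ERROR and consume a single byte.
 */
size_t
decode_rune(const unsigned char *s, size_t len, unsigned long *r)
{
	size_t need, i;
	unsigned long cp, min;

	assert(s != NULL && r != NULL);
	*r = RUNE_ERROR;
	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*r = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 1; cp = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 2; cp = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 3; cp = s[0] & 0x07; min = 0x10000;
	} else {
		return 1; /* stray continuation or invalid lead byte */
	}
	if (len < need + 1)
		return 1; /* truncated sequence */
	for (i = 1; i <= need; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 1;
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates and out-of-range values */
	if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 1;
	*r = cp;
	return need + 1;
}

/* Decode up to cap code points from s[0..len) into out; return count. */
size_t
decode_all(const char *s, size_t len, unsigned long *out, size_t cap)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t n = 0, used;

	while (len > 0 && n < cap) {
		used = decode_rune(p, len, &out[n++]);
		p += used;
		len -= used;
	}
	return n;
}
```

With the array in hand, the script parser can index code points
directly, and invalid input shows up as an ordinary array element
rather than as a special case in every loop.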

> 3) Pending a resolution on (2), should we allow nul bytes in the
> input? Currently I'm using libc's string functions so nul bytes cause
> bad things to happen. If we decide to support nul bytes it'll be a
> rather large change.

I'm not an expert in this area. Can anybody give a reason to support
nul bytes? If there are benefits, they should be supported.
GNU bullshit should never be the reference when it comes to a _lack_
of features, as seen in many other cases.
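
For a sense of what nul support would involve: every buffer has to
carry an explicit length, and the str* functions give way to their
mem* counterparts. POSIX getline() already returns the number of bytes
read, so the length is available at the input side. Below is a minimal
sketch of a length-aware substring search; mem_find is a hypothetical
helper, not an existing sbase function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find needle[0..nlen) in hay[0..haylen), treating nul like any other
 * byte. Returns a pointer to the first match, or NULL if there is none.
 */
const char *
mem_find(const char *hay, size_t haylen, const char *needle, size_t nlen)
{
	const char *p = hay, *end;

	assert(hay != NULL && needle != NULL);
	if (nlen == 0)
		return hay;
	if (nlen > haylen)
		return NULL;
	end = hay + haylen - nlen + 1; /* one past the last possible start */
	while (p < end && (p = memchr(p, needle[0], (size_t)(end - p)))) {
		if (memcmp(p, needle, nlen) == 0)
			return p;
		p++;
	}
	return NULL;
}
```

On the eight bytes "ab\0cd\0ef", mem_find locates "ef" at offset 6,
whereas strstr() stops at the first nul and reports no match.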

> 4) Which extensions over POSIX should be implemented? (\t for tab in
> regex and s replacement text? etc.)

Look at unescape(). POSIX defines escaped characters rather
inconsistently across the base, which is not good. Use the util
functions where possible and keep the code as concise as possible.
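
As a rough illustration of the kind of shared helper meant here, the
sketch below rewrites a few backslash escapes in place. It is not the
sbase unescape() implementation: unescape_sketch is a hypothetical
name, and the set of handled escapes (\n, \t, \\) is an assumption
chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace \n, \t and \\ in s with the bytes they stand for, in place.
 * Unknown escapes are copied through unchanged. Returns the new length.
 */
size_t
unescape_sketch(char *s)
{
	char *r = s, *w = s;

	assert(s != NULL);
	while (*r) {
		if (*r == '\\' && r[1] != '\0') {
			switch (r[1]) {
			case 'n':  *w++ = '\n'; break;
			case 't':  *w++ = '\t'; break;
			case '\\': *w++ = '\\'; break;
			default: /* keep the backslash and the next byte */
				*w++ = '\\';
				*w++ = r[1];
				break;
			}
			r += 2;
		} else {
			*w++ = *r++;
		}
	}
	*w = '\0';
	return (size_t)(w - s);
}
```

Leaving unknown escapes untouched matters for sed in particular, since
sequences such as \1 or \& in the s replacement text must reach the
later parsing stages intact.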

I have not tested it yet, but if you say it works as of now, nothing
speaks against pulling it into sbase.

Keep up the great work!

Cheers

FRIGN

-- 
FRIGN <dev_AT_frign.de>

Received on Wed Feb 04 2015 - 19:27:57 CET
