Hello,
I finally have a new sed implementation. It's littered with FIXMEs and
there are some points that need to be discussed, but for the most part
it works like it should. It definitely needs some work before it sucks
less.
Here is a summary of the points that should be discussed. For more
detail, read the source and look for FIXME:
1) When should we choose POSIX behavior and when should we choose GNU
behavior? There are a number of differences, and the GNU behavior
often seems to make more sense (no final newline,
semicolon/whitespace terminated labels and filenames, etc.).
2) Should we strictly enforce valid UTF-8? In the script? In the
input? Currently it's enforced in a few places of the script because
that made it easier for me, but it's not enforced in the input file.
3) Pending a resolution on (2), should we allow nul bytes in the
input? Currently I'm using libc's string functions so nul bytes cause
bad things to happen. If we decide to support nul bytes it'll be a
rather large change.
4) Which extensions over POSIX should be implemented? (\t for tab in
regex and s replacement text? etc.)
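
To make point (3) concrete, here is a minimal sketch of what "supporting
nul bytes" would roughly mean: every line carries an explicit length
instead of relying on a terminator. Nothing below is taken from sed.c;
struct line and count_byte() are hypothetical names used only for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-aware line buffer (not from sed.c). */
struct line {
	const char *data; /* bytes of the line, may contain nul */
	size_t      len;  /* number of valid bytes in data */
};

/*
 * Count occurrences of byte c in the line. memchr() takes an explicit
 * length, so an embedded nul does not end the search the way it would
 * with strchr() or strlen().
 */
static size_t
count_byte(struct line l, int c)
{
	size_t n = 0;
	const char *p = l.data, *end = l.data + l.len;

	while ((p = memchr(p, c, end - p))) {
		n++;
		p++; /* continue after the match */
	}
	return n;
}
```

For the 6-byte input "ab\0cd\n", strlen() reports 2, while a struct line
with len = 6 still lets count_byte() find the 'd' after the nul. The
length would have to come from whatever reads the input (getline()
returns it, for example), and every str*() call on input data would need
a mem*() or length-aware replacement, which is why the change is large.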
And of course, any other comments, criticism, or ideas are welcome.
-emg
- text/x-csrc attachment: sed.c
Received on Wed Feb 04 2015 - 00:18:59 CET