Re: [hackers] [sbase] [PATCH 11/11] tail: Process bytes with -c option, and add -m option for runes

From: Michael Forney <mforney_AT_mforney.org>
Date: Tue, 27 Dec 2016 18:03:24 -0800

On 12/27/16, Evan Gates <evan.gates_AT_gmail.com> wrote:
> On Tue, Dec 27, 2016 at 5:55 AM, Laslo Hunhold <dev_AT_frign.de> wrote:
>> well-spotted! Still, it's _very_ counterintuitive to call the flag
>> "-c". Instead of adding a non-portable m-flag, it would even sound
>> better to me to add a b-flag for byte-offsets.

Yes, it's a bit counter-intuitive, but conflicting with POSIX for this
alone seems like a really bad idea. I always consult POSIX when
writing shell scripts to ensure that they will run on any conforming
system. If sbase decided that the option character was poorly named
and gave -c a different meaning, then reasonable, valid, and portable
scripts may start misbehaving with no indication as to why.

Also, wc(1) (even sbase's implementation) uses -c to refer to bytes,
and -m to refer to characters. It wouldn't be self-consistent to make
tail use -b for bytes and -c for characters. (Just to clarify, I also
think it would be a really bad idea to make wc use -b for bytes and -c
for characters).
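
To make the wc(1) distinction concrete, here is a small sketch (the
sample string and variable names are made up for illustration). The
"é" is two bytes in UTF-8, so -c should report 6; what -m reports
depends on the locale, and in a UTF-8 locale one would expect 5.

```shell
# "é" is written as its two UTF-8 bytes (octal 303 251) so the example
# does not depend on the encoding of this script.
sample='h\303\251llo'

# tr strips the padding some wc implementations put before the number.
bytes=$(printf "$sample" | wc -c | tr -d ' ')
chars=$(printf "$sample" | wc -m | tr -d ' ')   # depends on the locale

echo "bytes (wc -c): $bytes"
echo "characters (wc -m): $chars"
```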

>> It all depends on how many scripts rely on this behaviour. Can you give
>> an example?

Sure. gcc's build system uses tail to skip the first 16 bytes of the
binaries to check that stage2 and stage3 are the same. Granted, it
does use the non-standard syntax "tail +16c", and I don't know that
there are any bytes in there with the high bit set, but still, tail *does*
get invoked on binary files, and treating the byte offsets as
characters will break things in strange ways that are difficult to
debug.
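
A rough sketch of the failure mode, using the POSIX spelling
"tail -c +N" rather than gcc's actual invocation. The file contents
are invented, and mktemp is assumed to be available (it is not in
POSIX). The 16-byte header consists only of high-bit bytes, so a tail
that counts bytes prints PAYLOAD, while one that counted characters
in a UTF-8 locale would skip past the end of the file.

```shell
tmp=$(mktemp)

# 16-byte "header": eight copies of "é" (UTF-8 bytes 303 251 in octal),
# followed by the part we want to keep.
printf '\303\251\303\251\303\251\303\251\303\251\303\251\303\251\303\251PAYLOAD' > "$tmp"

# POSIX counts from 1, so "+17" starts output at byte 17,
# i.e. it skips the first 16 bytes.
rest=$(tail -c +17 "$tmp")
echo "$rest"

rm -f "$tmp"
```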

>> I thought cut(1) was the tool of choice for extracting
>> headers and such things.

How do you use cut(1) to strip the first 512 bytes of a binary file?
It operates on lines.
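
A small illustration of the difference, with made-up input: cut -b
applies its byte range to every line separately, whereas tail -c
applies the offset once to the whole stream.

```shell
input='abcdef\nghijkl\n'

# cut -b selects byte positions within each line.
per_line=$(printf "$input" | cut -b 3-)

# tail -c +3 starts at the third byte of the whole stream.
whole=$(printf "$input" | tail -c +3)

printf 'cut -b 3-:\n%s\n' "$per_line"
printf 'tail -c +3:\n%s\n' "$whole"
```

Here cut drops "ab" and "gh", while tail drops only "ab".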

> I think deviating from POSIX here is a bad idea. Every deviation from
> POSIX means that our tools cannot be used in another situation and
> pushes prospective users away. If the user wants characters instead of
> bytes we have tools to do that, don't surprise the user by doing
> something different than every other implementation.
>
> P.S. I too found -c confusing the first time I expected utf8
> characters, but remembering these tools were created with ascii in
> mind, I think of -c as char and it all works out...

Agreed.
Received on Wed Dec 28 2016 - 03:03:24 CET
