[dev] [sbase] wc output formatting

From: Michael Forney <mforney_AT_mforney.org>
Date: Sat, 2 Nov 2019 14:00:34 -0700


I was looking through wc.c in sbase and noticed a couple of curious
things about the output formatting. POSIX says the tool should write

        "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

When the tool was first written, it used fixed field widths,
presumably to maintain alignment when multiple files were specified.

        " %5zu %5zu %5zu %s", <newlines>, <words>, <bytes>, <file>

This seems to match what several other implementations do.

In 39802832[0], this was changed to

        "%*.zu%*.zu%*.zu %s", 0, <newlines>, 7, <words>, 7, <bytes>, <file>

with the intention of making the output POSIX compliant. '%*.zu' has a
field width specifier of '*' (meaning it is passed as an argument),
and a precision of '.' (equivalent to '.0', which means the value 0
produces no characters). I'm not sure exactly what the problem was, or
how this was meant to fix it, but there are a few issues with this:

1. Now that the first field has no minimum width, the width depends on
the number of newlines in the file, so the remaining fields are not
aligned, even though they have fixed minimum widths. If we don't care
about alignment, we may as well just use "%zu %zu %zu %s\n".
2. With a precision of 0, any counts with value 0 get skipped. I'm
guessing this was just a mistake, and there wasn't meant to be a
precision specifier at all ('%*zu').
3. Since the field may consume the full width, we might end up with no
separating whitespace between the fields.
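
The three issues above can be reproduced outside of wc with a small
sketch. The helper name fmt_39802832 and the buffer handling are
hypothetical and not taken from wc.c; only the format string and its
width arguments come from the commit as quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not from wc.c: formats one line of counts with
 * the format string quoted from commit 39802832 and returns the number
 * of characters written.
 */
static int
fmt_39802832(char *buf, size_t len, size_t nl, size_t words, size_t bytes,
             const char *file)
{
	/* width 0 for newlines, 7 for words and bytes; "." means precision 0 */
	int n = snprintf(buf, len, "%*.zu%*.zu%*.zu %s",
	                 0, nl, 7, words, 7, bytes, file);

	assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
	return n;
}
```

Calling it with counts 1, 2, 3 and then 100, 2, 3 moves the words
column two characters to the right (issue 1). Counts of 0, 0, 0 produce
only padding and no digits at all (issue 2). Counts of 1, 1234567,
1234567 run together as "112345671234567" (issue 3).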

Issue 2 was fixed in bbd2b4d2[1], by changing the precision specifier to 1

        "%*.1zu%*.1zu%*.1zu %s", 0, <newlines>, 7, <words>, 7, <bytes>, <file>

But 1 is the default precision for 'u' conversions, so I think a
better change would be

        "%*zu%*zu%*zu %s", 0, <newlines>, 7, <words>, 7, <bytes>, <file>

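That an explicit precision of 1 and the default precision give the same
result can be checked with a sketch like the following. The function
name same_with_default_precision is hypothetical and used only for
illustration; it is not part of sbase.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check, not from wc.c: returns 1 if "%*.1zu" and "%*zu"
 * format the value v identically at the given field width.
 */
static int
same_with_default_precision(int width, size_t v)
{
	char a[64], b[64];

	snprintf(a, sizeof(a), "%*.1zu", width, v);	/* explicit precision 1 */
	snprintf(b, sizeof(b), "%*zu", width, v);	/* default precision */
	return strcmp(a, b) == 0;
}
```

The function should return 1 for any width and value, including a
value of 0, which both forms print as a single "0".
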
Issue 3 was fixed in 79e8e330[2], by reducing the field width to 6,
and adding a literal space between the fields

        "%*.1zu %*.1zu %*.1zu %s", 0, <newlines>, 6, <words>, 6, <bytes>, <file>

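As a rough check of the effect of that change, the sketch below formats
counts with the format string quoted from 79e8e330. The helper name
fmt_79e8e330 is hypothetical and not taken from wc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not from wc.c: formats one line of counts with
 * the format string quoted from commit 79e8e330.
 */
static int
fmt_79e8e330(char *buf, size_t len, size_t nl, size_t words, size_t bytes,
             const char *file)
{
	/* the literal spaces separate the fields even when a count
	 * is wider than its minimum field width of 6 */
	int n = snprintf(buf, len, "%*.1zu %*.1zu %*.1zu %s",
	                 0, nl, 6, words, 6, bytes, file);

	assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
	return n;
}
```

With counts 1, 1234567, 1234567 the fields now stay separated by a
space, but the first field still has no minimum width.
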
This leaves issue 1, which makes me wonder about the point of the
field widths if they aren't for alignment of the output. If we don't
care about alignment, I think we should just use "%zu %zu %zu %s\n".
If we do care about the alignment, we should use fixed widths similar
to the original code, like "%6zu %6zu %6zu %s\n". But now we've come
full circle, which makes me wonder what POSIX compliance issue commit
39802832 was meant to fix. Is the leading whitespace for the first
field a problem? If so, I don't think trying for alignment makes sense
since we'd have to left-justify the first column, which breaks the
digit alignment.
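
To make the two candidates concrete, here is a minimal sketch of both
format strings side by side. The helper names fmt_plain and fmt_aligned
are hypothetical; only the two format strings come from the discussion
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: no field widths, single-space separators. */
static int
fmt_plain(char *buf, size_t len, size_t nl, size_t words, size_t bytes,
          const char *file)
{
	int n = snprintf(buf, len, "%zu %zu %zu %s\n", nl, words, bytes, file);

	assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
	return n;
}

/* Hypothetical helper: fixed minimum width of 6 for every count. */
static int
fmt_aligned(char *buf, size_t len, size_t nl, size_t words, size_t bytes,
            const char *file)
{
	int n = snprintf(buf, len, "%6zu %6zu %6zu %s\n", nl, words, bytes, file);

	assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
	return n;
}
```

With fmt_aligned, the last digit of each count lands in the same column
for every file as long as the counts fit in 6 digits, at the cost of
leading whitespace before the first field. With fmt_plain there is no
leading whitespace, and no alignment either.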

Anyone have any thoughts on this?

[0] https://git.suckless.org/sbase/commit/39802832af40f1a24aa362ca73e369a0cd26ecf2.html
[1] https://git.suckless.org/sbase/commit/bbd2b4d2439e13d44ec7d1f55bbc84f23d256401.html
[2] https://git.suckless.org/sbase/commit/79e8e330cbfbce4cbabae7c2a31846a89afdc095.html
Received on Sat Nov 02 2019 - 22:00:34 CET
