Re: [dev] ii: how to process out in a pipeline and still page with less

From: Markus Wichmann <nullplan_AT_gmx.net>
Date: Sat, 28 May 2022 22:04:27 +0200

On Sat, May 28, 2022 at 07:19:24PM +0000, Rodrigo Martins wrote:
> Hello, Markus,
>
> Thanks for filling in the details. I should do more research next time.
>
> I tried to write a program that does the same as stdbuf(1), but using
> setbuf(3). Unfortunately it seems the buffering mode is reset across
> exec(3), since my program did not work. If it did that would be a
> clean solution.
>

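But it cannot possibly work that way, because the buffering set with
setbuf(3) lives solely in userspace, inside the calling process. exec(3)
replaces the process image, so the new program starts with fresh stdio
state. And you fundamentally cannot change anything about the userspace
of another program, at least not in UNIX.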

> Does the buffering happen on the input or on the output side? Or is
> that not the right way to look at it? Are these programs doing
> something wrong or is this a limitation by the specifications?
>

There is too much buffering going on here, and the buffering mode
changes depending on what kind of file a stream is attached to. I had a
program I called "syslogd" (on Windows) that would simply listen to the
syslog port on UDP and print all the packets that arrived. Running just
"syslogd" on its own would print all packets as they came in, but
running "syslogd | tr a A" would print blocks of data long after the
fact, making it useless for my use case.

Why? Because for one, syslogd's output buffering mode had changed to
"fully buffered", now that the output was a pipe rather than a terminal.
tr's input buffering mode was also fully buffered now, but that doesn't
much matter, since the data is usually passed on quickly. It's just that
the data is only actually sent on from syslogd when syslogd's buffer is
full. Cygwin by default defines a BUFSIZ of 1024, so that's the buffer
that has to be filled first.

tr's buffer on input doesn't much matter, because the input buffer is
refilled on underflow, and then filled only as far as is possible in one
go. And tr's output is line buffered in the above pipeline, since it
still goes to the terminal, which is exactly what I wanted. No, the
problem was that my own program's output mode changed simply because it
was now writing into a pipe.
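
The effect described above can be reproduced in a few lines. What
follows is only a sketch, assuming a POSIX system (pipe(2), poll(2),
fdopen(3)); the names readable_now() and visible_before_flush() are made
up for illustration. It writes one short line through a stdio stream
sitting on top of a pipe and checks whether the reading side can see it
before anything has been flushed explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd has data to read right now, 0 if not,
 * -1 on error. poll(2) with a zero timeout never blocks. */
int readable_now(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLIN };
	int r = poll(&p, 1, 0);

	if (r < 0)
		return -1;
	return r > 0 && (p.revents & POLLIN);
}

/* Hypothetical demo: write one short line to a pipe through stdio in
 * the given mode (_IOFBF, _IOLBF or _IONBF) and report whether the
 * reader can see it before any explicit fflush().
 * Returns 1 or 0, or -1 on error. */
int visible_before_flush(int mode)
{
	static const char msg[] = "one log line\n";
	int fds[2], seen;
	FILE *out;

	/* The demo only makes sense if the line fits into the buffer. */
	assert(sizeof msg - 1 < (size_t)BUFSIZ);

	if (pipe(fds) != 0)
		return -1;
	out = fdopen(fds[1], "w");
	if (out == NULL) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	/* setvbuf() must come before any other operation on the stream. */
	if (setvbuf(out, NULL, mode, BUFSIZ) != 0) {
		fclose(out);
		close(fds[0]);
		return -1;
	}
	fputs(msg, out);
	seen = readable_now(fds[0]);	/* no fflush() so far */

	fclose(out);			/* flushes whatever is left */
	close(fds[0]);
	return seen;
}
```

With _IOFBF the line should stay in the writer's buffer and the reader
sees nothing; with _IOLBF or _IONBF it should arrive at once. For a
program of one's own, such as the syslogd above, calling
setvbuf(stdout, NULL, _IOLBF, 0) first thing in main() would be the
corresponding one-line change. It does nothing for programs one cannot
modify.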

> Is modifying each program the best solution we have? Granted it is
> not an invasive change, especially for simple pipeline-processing
> programs, but making such extensions could bring portability issues.
>

Modifying all programs is typically a bad solution. It is what the
systemd people are doing, and most here despise them for that if nothing
else. It just appears there is no simple solution for this problem other
than writing specialized programs. Having one special-case program is
better than changing all the general ones, right?

Ciao,
Markus
Received on Sat May 28 2022 - 22:04:27 CEST
