Re: [hackers] [sbase] [PATCH 00/10] IO improvements and some bug fixes

From: Michael Forney <mforney_AT_mforney.org>
Date: Mon, 5 Dec 2016 00:51:07 -0800

On Sun, Dec 4, 2016 at 11:27 PM, Laslo Hunhold <dev_AT_frign.de> wrote:
> On Sun, 4 Dec 2016 21:55:02 -0800
> Michael Forney <mforney_AT_mforney.org> wrote:
>
> Dear Michael,
>
>> I finally got around to addressing the issues with fread I raised at
>> the end of http://lists.suckless.org/dev/1504/26580.html.
>>
>> This patch series eliminates several uses of fread/fwrite where plain
>> read/write are more appropriate. To help with the common case of
>> writing an entire buffer to a file descriptor, I added a writeall
>> function (which works similarly to plan9's standard write function and
>> the Write/write_all functions in go/rust). All users of concat were
>> better suited to work on plain file descriptors except for tail, so I
>> switched concat to a read/writeall loop, and tail to use a
>> fgetc/fputc loop.
>
> fgetc/fputc really can slow down your program, which I noticed when I
> wrote farbfeld filters. Working on a buffer basis rather than a
> char-by-char basis really speeds things up a lot (for me it was
> around 50%-70%).

Internally, fgetc/fputc are still using the buffers behind the FILE
structures, so the only overhead I can imagine is due to the repeated
function calls. The previous tail -f code used getline/fputs. We could
fix the NUL byte issue with fwrite instead of fputs, but I'd like to
make sure it actually makes a noticeable difference before trying
this.

> Can you provide some performance-information e.g. for cat(1)?
> Does it get any faster with the writeall()-loop? What other benefits do
> we see here?

I compared the time it took to cat a 4.3G file to /dev/null. After
running several times to wait for any file caching to settle down, I
see

before:
    0m00.81s real 0m00.14s user 0m00.67s system

after:
    0m00.66s real 0m00.03s user 0m00.63s system

Sidenote: this is more like 1.4s real when I build with musl because
musl's BUFSIZ is 1024, and glibc's is 8192. I'm thinking concat should
use a fixed buffer length like 8192 rather than BUFSIZ. go's Copy uses
32*1024, and rust's uses 8*1024. But this is outside the scope of this
patch set. I might send in another patch later.

> Can you elaborate in this thread again why we should do
> raw I/O? In a naïve sense, the operating system can "shedule" reads and
> writes more efficiently with the buffered I/O.

For the concat case, previously what was happening was essentially this

fread(BUFSIZ)
  -> read(BUFSIZ) = r1, read(BUFSIZ - r1) = r2, read(BUFSIZ - r1 - r2) = r3, ...
  -> copy data from FILE buffer to provided buffer
fwrite(BUFSIZ)
  -> copy data from provided buffer to FILE buffer
  -> write(BUFSIZ) = w1, write(BUFSIZ - w1) = w2, write(BUFSIZ - w1 - w2) = w3, ...

now, we do this

read(BUFSIZ) = r1
write(r1), ...
read(BUFSIZ) = r2
write(r2), ...
read(BUFSIZ) = r3
write(r3), ...

So in terms of system calls, it is nearly identical. We also don't have
to copy the data twice. The only thing that could make this slower
than above is if the reads are returning tiny chunks, so we end up
making more write calls. We could fix this in concat by making sure to
read at least some amount of data before writing, perhaps controlled
by an argument which we can pass when cat's -u is set. I didn't bother
with this for now because openbsd's cat doesn't do it. If your input
file descriptor is only returning data in small chunks, it probably
isn't coming in very fast, so it doesn't matter much to make sure
writes happen in big chunks.
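
Concretely, the read/writeall loop described above might look like the
following sketch. This is an illustration, not the actual sbase code;
the fixed 8192-byte buffer and the error handling are assumptions:

```c
#include <errno.h>
#include <unistd.h>

/* Minimal writeall() sketch: keep calling write(2) until the whole
 * buffer is out, resuming after short writes and retrying on EINTR. */
static ssize_t
writeall(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;
	ssize_t w;

	while (left > 0) {
		w = write(fd, p, left);
		if (w < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += w;
		left -= w;
	}
	return (ssize_t)len;
}

/* concat as a plain read/writeall loop: each chunk is forwarded as
 * soon as it arrives, with no intermediate copy into a FILE buffer. */
static int
concat(int ifd, int ofd)
{
	char buf[8192];
	ssize_t r;

	while ((r = read(ifd, buf, sizeof(buf))) > 0)
		if (writeall(ofd, buf, r) < 0)
			return -1;
	return r < 0 ? -1 : 0;
}
```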

For the cases where we are just processing the incoming data like *sum
and od, the raw read really is the right way to go. Both fread and the
new code are making the same read calls into a buffer of size BUFSIZ.
The only difference is that we may go through more loop iterations
if the reads come in small chunks, rather than paying for the extra
copy into the provided buffer.

I think a good rule of thumb is if you are trying to read data in and
don't care what size chunks it comes in, use read. fread offers you no
advantages. Otherwise, use fread and let stdio take care of buffering
behind the scenes.
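
The *sum/od pattern boils down to something like this sketch (the
countbytes function is a made-up stand-in for whatever processing the
tool does on each chunk):

```c
#include <unistd.h>

/* Read whatever chunk is available and process it immediately; no
 * stdio layer is needed since chunk boundaries don't matter here.
 * This stand-in just counts bytes; a checksum tool would update its
 * digest state in the loop body instead. */
static long
countbytes(int fd)
{
	char buf[8192];
	ssize_t r;
	long total = 0;

	while ((r = read(fd, buf, sizeof(buf))) > 0)
		total += r;
	return r < 0 ? -1 : total;
}
```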

>> The result should be more efficient (no copy between read/write
>> buffers, and data is processed as it comes in), and is roughly the
>> same complexity as before. In most cases, we now pull in less of
>> stdio, so statically linked binaries get smaller. Additionally,
>> cat/tee on a pipe/terminal is now usable.
>
> That makes sense.
>
>> Along the way, I found and fixed several bugs, mostly dealing with
>> error checking.
>> I've been running with these patches for a couple days now and
>> haven't noticed any regressions.
>
> That is great! I'm glad you test your patches before submitting, which
> I have been guilty of neglecting for so many times.
Received on Mon Dec 05 2016 - 09:51:07 CET
