Re: [hackers] [PATCH sbase] libutil/recurse: Split into two functions

From: Richard Ipsum <richardipsum_AT_vx21.xyz>
Date: Tue, 23 Jun 2020 18:43:53 +0200

On Tue, Jun 23, 2020 at 02:30:51AM -0700, Michael Forney wrote:
> recurse() is peculiar function since it is used for two purposes:
>
> 1. To bootstrap the recursion process.
> 2. To perform recursion when a directory is encountered.
>
> In the first case, we stat() the directory, record it in the history
> list, then depending on the value of DIRFIRST, we call the function
> and then recurse or vice-versa. The function then calls recurse()
> again on the same directory, but since we have already added it to
> the history list, this call gets pruned.
>
> In the second case, we stat() the directory a second time (we already
> did this when traversing the parent directory's entries), then call
> the function on each entry.
>
> This approach leads to a few problems:
>
> - We stat() each directory in the hierarchy twice, once at the
> beginning of recurse(), and once for each contained directory.
> - We need a DIRFIRST flag to specify whether the function recurses
> at the start or end, even though just running the function would
> do things in the correct order.
> - History is only checked when we are about to recurse on a directory,
> *after* we have already operated on it. So while we do prevent
> infinite loops, we double-count the directory.

I don't feel qualified to criticise the overall design, but do we not still
need a way to specify whether traversal should be pre-order or post-order?
I figure this is what DIRFIRST was for right?

>
[snip]
> +
> void
> recurse(int dirfd, const char *name, void *data, struct recursor *r)
> {
> - struct dirent *d;
> - struct history *new, *h;
> - struct stat st, dst;
> - DIR *dp;
> - int flags = 0, fd;
> - size_t pathlen = r->pathlen;
> -
> - if (dirfd == AT_FDCWD)
> - pathlen = estrlcpy(r->path, name, sizeof(r->path));

Now that we no longer do this, r->path is not being reset between
separate calls, so we have:

% mkdir D
% echo 'hello world' > f
% du D f
4 D
4 f
% ~/sbase/du D f
4 D
4 D

Applying

diff --git a/du.c b/du.c
index 6388159..e06234a 100644
--- a/du.c
+++ b/du.c
_AT_@ -109,6 +109,7 @@ main(int argc, char *argv[])
                for (; *argv; argc--, argv++) {
                        n = 0;
                        recurse(AT_FDCWD, *argv, &n, &r);
+ r.pathlen = 0;
                }
        }

fixes this, though I guess it's better to do the initialisation in recurse().

Thanks,
Richard
Received on Tue Jun 23 2020 - 18:43:53 CEST

This archive was generated by hypermail 2.3.0 : Tue Jun 23 2020 - 18:48:35 CEST