Re: [dev] Re: coreutils / moreutils - DC a directory counter

From: Bjartur Thorlacius <svartman95_AT_gmail.com>
Date: Mon, 29 Jul 2013 11:28:49 +0000

On mán 29.júl 2013 04:39, Paul Hoffman wrote:
> Their 100+ Perl and bash scripts are slow because they're opening files
> in a humongous directory. They can't subdivide the directory because
> they're afraid that they will break the scripts when modifying them.
I posted a comprehensive comment on the blog post that has yet to be
approved by the censor. In short, ext2/3 directories are linked lists.
You can traverse said list in constant space and process each entry when
you encounter it. O(n) time is unavoidable. Bash globs and ls listings
are automatically sorted. Dash and ls -f or find . -type f don't.
Switching to dash might result in all sorts of compatibility issues, but
s/ls/ls -f/g is easy to test, and just might work. And then s/ \* /
\$(ls -f) /g (assuming old regex, \n in $IFS).

Dividing the directory into folders requires structural changes and just
contains the scalability issue instead of just not sorting. Sorting does
not only take up to O(n^2) time, but requires searching for every single
entry in the linked list. That's equal to traversing half the list n
times, instead of all of the list just once.

http://ext2.sourceforge.net/2005-ols/paper-html/node3.html

P.S. This just might be my favorite regex: s/ \* / \$\(ls -f\) /g.
Received on Mon Jul 29 2013 - 13:28:49 CEST

This archive was generated by hypermail 2.3.0 : Mon Jul 29 2013 - 13:36:06 CEST