Re: [dev] alternatives to find for querying the filesystem

From: Andrew Gwozdziewycz <web_AT_apgwoz.com>
Date: Thu, 12 Dec 2013 15:00:32 -0500

On Thu, Dec 12, 2013 at 2:36 PM, Chris Down <chris_AT_chrisdown.name> wrote:
> On 2013-12-12 14:32:03 -0500, Andrew Gwozdziewycz wrote:
>> So, to find all files in /etc modified within the last hour...
>>
>> walk /etc | agep -1H -
>>
>> Or,
>>
>> walk /etc | xargs agep -1H
>
> The problem here is speed. For any non-trivial number of files, this
> will become non-negligibly slower due to the number of stat family calls
> required (and the cost of reinterpreting the data each time).

That's a great point, though the idea would be (as in SQL) to
eliminate as many files as possible, as early as possible. The user has
some intuition that find doesn't have (unless find implements some
relative ordering of its tests). So the cost of parsing and
reinterpreting the data presumably gets smaller at each stage. The stat
calls are of course repeated at every stage of the pipeline, but again,
each stage makes fewer of them than the one before.

Assume that each filter halves the fileset of, say, 256 files (my /etc
directory on this OS X machine has just 247 files). That's fewer than
512 stat calls in total, however many filters are chained
(256 + 128 + 64 + ... < 512). Is that really so bad on modern
hardware?
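
As a rough sketch of that arithmetic, here is a small Python model. It
assumes every stage stats each file it receives exactly once and passes
on half of them; the function name, the 256-file starting point, and the
halving ratio are illustrative assumptions, not measurements of any real
tool.

```python
def total_stat_calls(initial_files: int, num_filters: int,
                     keep_ratio: float = 0.5) -> int:
    """Total stat calls across a pipeline of filters.

    Assumption: each filter stats every file it receives once and
    passes on keep_ratio of them (rounded down) to the next filter.
    """
    remaining = initial_files
    total = 0
    for _ in range(num_filters):
        total += remaining                       # this stage stats its whole input
        remaining = int(remaining * keep_ratio)  # survivors go downstream
    return total


if __name__ == "__main__":
    # Compare against a single pass that stats each file once (256 calls).
    for filters in (1, 2, 3, 4, 8):
        print(filters, total_stat_calls(256, filters))
```

Under these assumptions the total never reaches twice the size of the
initial fileset, so the pipeline costs at most about one extra pass'
worth of stat calls compared to a single pass over the same files. It
says nothing about the per-process and parsing overhead, which is the
other half of the objection.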

-- 
http://apgwoz.com
Received on Thu Dec 12 2013 - 21:00:32 CET
