Re: [dev] off topic - awk versions performance comparison

From: Uriel <uriel_AT_berlinblue.org>
Date: Sat, 14 Aug 2010 22:38:32 +0200

On Sat, Aug 14, 2010 at 12:07 PM, Kris Maglione <maglione.k_AT_gmail.com> wrote:
> On Wed, Aug 11, 2010 at 03:06:13PM -0400, Joseph Xu wrote:
>>
>> I was using GNU tr. The input files were single lines with 10000 or
>> 1000000 y's, so I was doing 100 matches in each case (from the for loop) on
>> the same line. I guess I should have made that more explicit, sorry. I'm
>> interested in what results you're getting.
>
> Sorry, I get tetchy when I haven't had enough sleep:
>
> 0.00u 0.00s 0.00r        nawk {for (i=1; i < 100; i++) { match($0, "y")} }
> big
> 0.01u 0.00s 0.02r        nawk {for (i=1; i < 100; i++) { match($0, "y")} }
> bigger
> 0.08u 0.00s 0.08r        gawk {for (i=1; i < 100; i++) { match($0, "y")} }
> big
> 7.80u 0.03s 8.08r        gawk {for (i=1; i < 100; i++) { match($0, "y")} }
> bigger
> 0.04u 0.00s 0.05r        /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) {
> match($0, "y")} } big
> 5.00u 0.01s 5.20r        /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) {
> match($0, "y")} } bigger
>
> nawk is one-true-awk from FreeBSD. I find the results strange, namely
> because Plan 9's awk is also one-true-awk. It also produces reasonable
> results with "^y" instead of "y", while gawk doesn't. I know that nawk uses
> a combination NFA/DFA, but I see that Plan 9's awk instead pre-process the
> expression and uses Plan 9's pure-NFA engine. My guess is that, because
> they're greedy algorithms, they both traverse the entire string looking for
> a possibly longer match, but my understanding of Plan 9's altorithm was
> otherwise.

I think one of the very few differences between Plan 9's awk and bwk's
awk is UTF-8 support, not sure why that would make a difference, but I
thought they were mostly identical otherwise.

uriel

> Looking at gawk, it also uses a DFA algorithm, so I really can't
> explain the shortcoming, especially when the anchor is used, but I refuse to
> look deeper because gawk is written to GNU coding standards and I value my
> sanity.
>
> As for my previous result, I have a shell function defined that replaces awk
> with nawk, and it seems that I called awk rather than gawk explicitly.
>
> Oh, and looking at my sig, it's serenditipously by BWK...
>
> --
> Kris Maglione
>
> As we said in the preface to the first edition, C "wears well as one's
> experience with it grows." With a decade more experience, we still
> feel that way.
>        --Brian Kernighan and Dennis Ritchie
>
>
>
Received on Sat Aug 14 2010 - 22:38:32 CEST

This archive was generated by hypermail 2.2.0 : Sat Aug 14 2010 - 22:48:02 CEST