Re: [dev] off topic - awk versions performance comparison

From: Kris Maglione <maglione.k_AT_gmail.com>
Date: Sat, 14 Aug 2010 06:07:21 -0400

On Wed, Aug 11, 2010 at 03:06:13PM -0400, Joseph Xu wrote:
>I was using GNU tr. The input files were single lines with
>10000 or 1000000 y's, so I was doing 100 matches in each case
>(from the for loop) on the same line. I guess I should have
>made that more explicit, sorry. I'm interested in what results
>you're getting.

Sorry, I get tetchy when I haven't had enough sleep:

0.00u 0.00s 0.00r nawk {for (i=1; i < 100; i++) { match($0, "y")} } big
0.01u 0.00s 0.02r nawk {for (i=1; i < 100; i++) { match($0, "y")} } bigger
0.08u 0.00s 0.08r gawk {for (i=1; i < 100; i++) { match($0, "y")} } big
7.80u 0.03s 8.08r gawk {for (i=1; i < 100; i++) { match($0, "y")} } bigger
0.04u 0.00s 0.05r /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) { match($0, "y")} } big
5.00u 0.01s 5.20r /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) { match($0, "y")} } bigger

nawk is one-true-awk from FreeBSD. I find the results strange,
namely because Plan 9's awk is also one-true-awk. It also
produces reasonable results with "^y" instead of "y", while gawk
doesn't. I know that nawk uses a combination NFA/DFA, but I see
that Plan 9's awk instead pre-process the expression and uses
Plan 9's pure-NFA engine. My guess is that, because they're
greedy algorithms, they both traverse the entire string looking
for a possibly longer match, but my understanding of Plan 9's
altorithm was otherwise. Looking at gawk, it also uses a DFA
algorithm, so I really can't explain the shortcoming, especially
when the anchor is used, but I refuse to look deeper because
gawk is written to GNU coding standards and I value my sanity.

As for my previous result, I have a shell function defined that
replaces awk with nawk, and it seems that I called awk rather
than gawk explicitly.

Oh, and looking at my sig, it's serenditipously by BWK...

-- 
Kris Maglione
As we said in the preface to the first edition, C "wears well as one's
experience with it grows." With a decade more experience, we still
feel that way.
	--Brian Kernighan and Dennis Ritchie
Received on Sat Aug 14 2010 - 12:07:21 CEST

This archive was generated by hypermail 2.2.0 : Sat Aug 14 2010 - 12:12:02 CEST