From: Robert Ransom <rransom.8774_AT_gmail.com>

Date: Wed, 4 Aug 2010 01:55:07 -0700

Received on Wed Aug 04 2010 - 10:55:07 CEST

Date: Wed, 4 Aug 2010 01:55:07 -0700

On Tue, 3 Aug 2010 23:08:49 -0400

Kris Maglione <maglione.k_AT_gmail.com> wrote:

*> On Tue, Aug 03, 2010 at 11:03:50PM -0400, Kris Maglione wrote:
*

*> >There isn't much science to hash functions,
*

*>
*

*> Let me rephrase that, given the nature of this list. There's not
*

*> much science to simple string hash functions.
*

More correctly, if all you're using a hash function for is assigning

strings to buckets in a small hash table, you can ignore most of the

mathematics involved.

If you have something fancier in mind, such as the Bentley-McIlroy

repeated substring detection algorithm, you may need to consider the

mathematical behaviour of your hash function. (The Bentley-McIlroy

algorithm computes a hash function over each contiguous N-character

substring of a string (which I will call a document here), and saves

every N-th substring (if N=10, it saves the hashes of d[0:9],

d[10:19], ...) in a hash table. If you encounter an N-char substring

which is already in your hash table, you have a repeated substring;

extend the match both backward and forward to find a repeated substring

of maximal length. This process is guaranteed to find all repeated

substrings of length 2N-1 or greater. Computing the hash function over

each N-character substring is tolerably efficient because they use a

polynomial hash function (almost, but not exactly like Mr. Todd's

example) in which one can quickly compute h(d[m:n]) from d[m], d[n],

and h(d[m-1:n-1]).)

Robert Ransom

- application/pgp-signature attachment: signature.asc

*
This archive was generated by hypermail 2.2.0
: Wed Aug 04 2010 - 11:00:04 CEST
*