Abstract:
Benchmark program for hash tables and comparison of 15 popular hash functions.

Created by Peter Kankowski
Last changed
Contributors: Nils, Ace, Won, Andrew M., and Georgi 'Sanmayce'
Filed under Algorithms

Share on social sitesReddit Digg Delicious Buzz Facebook Twitter

Hash functions: An empirical comparison

Hash tables are popular data structures for storing key-value pairs. A hash function is used to map the key value (usually a string) to array index. The functions are different from cryptographic hash functions, because they should be much faster and don't need to be resistant to preimage attack. Hashing in large databases is also left out from this article; the benchmark includes medium-size hash tables such as:

  • symbol table in a parser,
  • IP address table for filtering network traffic,
  • the dictionary in a word counting program or a spellchecker.

There are two classes of the functions used in hash tables:

  • multiplicative hash functions, which are simple and fast, but have a high number of collisions;
  • more complex functions, which have better quality, but take more time to calculate.

Hash table benchmarks usually include theoretical metrics such as the number of collisions or distribution uniformity (see, for example, hash function comparison in the Red Dragon book). Obviously, you will have a better distribution with more complex functions, so they are winners in these benchmarks.

The question is whether using complex functions gives you a faster program. The complex functions require more operations per one key, so they can be slower. Is the price of collisions high enough to justify the additional operations?

Multiplicative hash functions

Any multiplicative hash function is a special case of the following algorithm:

UINT HashMultiplicative(const CHAR *key, SIZE_T len) {
   UINT hash = INITIAL_VALUE;
   for(UINT i = 0; i < len; ++i)
      hash = M * hash + key[i];
   return hash % TABLE_SIZE;
}

(Sometimes XOR operation is used instead of addition, but it does not make much difference.) The hash functions differ only by values of INITIAL_VALUE and multiplier (M). For example, the popular Bernstein's function uses INITIAL_VALUE of 5381 and M of 33; Kernighan and Ritchie's function uses INITIAL_VALUE of 0 and M of 31.

A multiplicative function works by adding together the letters weighted by powers of multiplier. For example, the hash for the word TONE will be:

  INITIAL_VALUE * M^4  +  'T' * M^3  +  'O' * M^2  +  'N' * M  +  'E'

Let's enter several similar strings and watch the output of the functions:

     Bernstein Kernighan
        (M=33)     (M=31)
 too   b88af17     1c154
 top   b88af18     1c155
 tor   b88af1a     1c157
 tpp   b88af39     1c174
a000  7c9312d6    2cd22f
a001  7c9312d7    2cd230
a002  7c9312d8    2cd231
a003  7c9312d9    2cd232
a004  7c9312da    2cd233
a005  7c9312db    2cd234
a006  7c9312dc    2cd235
a007  7c9312dd    2cd236
a008  7c9312de    2cd237
a009  7c9312df    2cd238
a010  7c9312f7    2cd24e
   a     2b606        61
  aa    597727       c20
 aaa   b885c68     17841

Too and top are different in the last letter only. The letter P is the next one after O, so the values of hash function are different by 1 (1c154 and 1c155, b88af17 and b88af18). Ditto for a000..a009.

Now let's compare top with tpp. Their hashes will be:

  INITIAL_VALUE * M^3 + 'T' * M^2 + 'O' * M + 'P'
  INITIAL_VALUE * M^3 + 'T' * M^2 + 'P' * M + 'P'

The hashes will be different by M * ('P' - 'O') = M. Similarly, when the first letters are different by x, their hashes will be different by x * M^2.

When there are less than 33 possible letters, Bernstein's function will pack them into a number (similar to Radix40 packing scheme). For example, hash table of size 333 will provide perfect hashing (without any collisions) for all three-letter English words written in small letters. In practice, the words are longer and hash tables are smaller, so there will be some collisions (situations when different strings have the same hash value).

If the string is too long to fit into the 32-bit number, the first letters will still affect the value of the hash function, because the multiplication is done modulo 2^32 (in a 32-bit register), and the multiplier is chosen to have no common divisors with 2^32 (in other words, it must be odd), so the bits will not be just shifted away.

There are no exact rules for choosing the multiplier, only some heuristics:

  • the multiplier should be large enough to accommodate most of the possible letters (e.g., 3 or 5 is too small);
  • the multiplier should be fast to calculate with shifts and additions [e.g., 33 * hash can be calculated as (hash << 5) + hash];
  • the multiplier should be odd for the reason explained above;
  • prime numbers are good multipliers.

Complex hash functions

These functions do a good job of mixing together the bits of the source word. The change in one input bit changes a half of the bits in the output (see Avalanche_effect), so the result looks completely random:

     Paul Hsieh One At Time
 too   3ad11d33  3a9fad1e  
 top   78b5a877  4c5dd09a  
 tor   c09e2021  f2aa9d35  
 tpp   3058996d  d5e9e480  
a000   7552599f  ed3859d8  
a001   3cc1d896  fef7fd57  
a002   c6ff5c9b  08a610b3  
a003   dcab7b0c  1a88b478  
a004   780c7202  3621ebaa  
a005   7eb63e3a  47db8f1d  
a006   6b0a7a17  b901717b  
a007   cb5cb1ab  caec1550  
a008   5c2a15c0  e58d4a92  
a009   33339829  f75aee2d  
a010   eb1f336e  bd097a6b  
   a   115ea782  ca2e9442  
  aa   008ad357  7081738e  
 aaa   7dfdc310  ae4f22ec

To achieve this behavior, the hash functions perform a lot of shifts, XORs, and additions. But do we need a complex function? What is faster: tolerating the collisions and resolving them with chaining, or avoiding them with a more complex function?

Test conditions

The benchmark uses separate chaining algorithm for collision resolution. Memory allocation and other "heavy" functions were excluded from the benchmarked code. The RDTSC instruction was used for benchmarking. The test was performed on Pentium-M and Core i5 processors.

The benchmark inserts some keys in the table, then looks them up in the same order as they were inserted. The test data include:

  • the list of common words from Wiktionary (500 items);
  • the list of Win32 functions from Colorer syntax highlight scheme (1992 items);
  • 500 names from a000 to a499 (imitates the names in auto-generated source code);
  • the list of common words with a long prefix and postfix;
  • all variable names from WordPress 2.3.2 source code in wp-includes folder (1842 names);
  • list of all words in Sonnets by W. Shakespeare (imitates a word counting program; 3228 words);
  • list of all words in La Peau de chagrin by Balzac (in French, UTF-8 encoding);
  • search engine IP addresses (binary).

Results

Pentium-M processor

WordsWin32NumbersPrefixPostfixVariablesSonnetsUTF-8IPv4Avg
Jesteress78[110]418[397]59[300]119[102]114[106]353[366]513[585]2420[2427]439[1499]1.02[3.19]
Meiyan80[102]425[409]56[125]123[106]119[112]354[350]524[588]2445[2377]445[768]1.03[1.86]
Novak unrolled91[113]517[399]56[90]168[118]163[113]399[342]574[581]2718[2430]482[969]1.20[1.67]
Fletcher83[131]443[406]102[460]139[127]132[108]375[507]593[1052]2901[4893]514[1359]1.23[4.60]
SBox88[91]551[431]57[116]181[108]177[91]412[347]559[526]2816[2442]472[836]1.24[1.77]
Murmur296[103]530[415]64[104]165[106]161[111]433[383]619[566]2928[2399]537[834]1.27[1.73]
CRC-3290[101]564[426]56[64]196[107]190[94]427[338]590[563]2836[2400]469[725]1.28[1.40]
x17 unrolled93[109]592[415]52[24]214[113]207[102]433[368]592[589]2864[2392]486[829]1.33[1.18]
lookup395[101]566[412]71[97]189[101]181[95]436[361]634[550]2949[2392]570[834]1.35[1.64]
K&R96[106]619[437]58[288]221[94]218[106]447[360]604[561]3021[2365]448[831]1.37[2.99]
Paul Larson95[99]630[416]50[16]232[99]227[105]455[366]600[583]3024[2447]469[755]1.38[1.09]
Paul Hsieh105[114]574[420]71[118]182[101]177[100]456[341]680[600]3155[2380]579[847]1.38[1.82]
Bernstein95[114]621[412]61[288]225[100]221[102]445[353]593[572]3002[2380]469[703]1.39[2.98]
x6559999[111]630[382]60[203]234[107]231[122]459[379]616[560]3073[2373]472[846]1.42[2.44]
Sedgewick101[107]666[414]52[48]245[103]241[103]478[348]630[570]3204[2437]476[782]1.45[1.32]
Murmur2A113[114]597[433]79[102]181[112]176[109]490[365]719[544]3383[2369]650[772]1.47[1.72]
FNV-1a101[124]658[428]62[108]239[94]236[105]472[374]624[555]3136[2446]517[807]1.47[1.76]
MaPrime2c108[103]705[426]65[106]255[91]253[106]509[349]674[550]3413[2406]541[865]1.57[1.72]
Ramakrishna108[108]726[409]78[91]277[125]271[103]510[360]662[528]3376[2383]517[840]1.63[1.65]
Arash Partow106[101]740[435]93[420]281[98]275[85]515[355]672[570]3344[2372]542[779]1.68[3.87]
One At Time118[105]832[421]81[110]319[97]315[103]577[364]741[545]3807[2346]657[795]1.86[1.74]
Weinberger120[104]958[422]55[100]377[111]379[117]617[364]747[712]3994[2547]569[744]1.95[1.74]
Hanson86[118]531[649]55[112]167[118]1644[499]393[435]550[592]2745[2890]463[833]2.63[2.44]

Each cell includes the execution time, then the number of collisions in square brackets. Execution time is expressed in thousands of clock cycles (a lower number is better). Avg column contains the average normalized execution time (and the number of collisions).

The function by Kernighan and Ritchie is from their famous book "The C programming Language", 3rd edition; Weinberger's hash and the hash with multiplier 65599 are from the Red Dragon book. The latter function is used in gawk, sdbm, and other Linux programs. x17 is the function by Peter Kankowski (multiplier = 17; 32 is subtracted from each letter code).

As you can see from the table, the function with the lowest number of collisions is not always the fastest one.

Results on a large data set (list of all words in English Wikipedia, 12.5 million words, from the benchmark by Georgi 'Sanmayce'):

Core i5 processor

WikipediaAvg
iSCSI CRC5686064[2077725]1.00[1.00]
Jesteress5757994[2121868]1.01[1.02]
Meiyan5781950[2111271]1.02[1.02]
Murmur26255797[2081476]1.10[1.00]
Paul Larson6305466[2080111]1.11[1.00]
x655996410172[2102893]1.13[1.01]
FNV-1a6581385[2081195]1.16[1.00]
Hanson6835921[2129832]1.20[1.03]
CRC-326927378[2075088]1.22[1.00]
Sedgewick7012507[2080640]1.23[1.00]
K&R7057453[2083145]1.24[1.00]
SBox7142103[2084018]1.26[1.00]
Bernstein7189250[2074237]1.26[1.00]
Murmur2A7305910[2081370]1.28[1.00]
lookup37285984[2084889]1.28[1.01]
Paul Hsieh7335526[2180206]1.29[1.05]
x17 unrolled7351758[2410605]1.29[1.16]
Ramakrishna8133363[2093253]1.43[1.01]
One At Time8271113[2087861]1.45[1.01]
MaPrime2c8365788[2084467]1.47[1.00]
Arash Partow8437500[2084572]1.48[1.00]
Weinberger9344561[3541181]1.64[1.71]
Novak unrolled21137740[6318611]3.72[3.05]
Fletcher22100750[9063797]3.89[4.37]

Pentium-M processor

WikipediaAvg
x17 unrolled11216841[2410605]1.00[1.16]
K&R11613260[2083145]1.04[1.00]
Bernstein11660386[2074237]1.04[1.00]
Paul Larson11794370[2080111]1.05[1.00]
Sedgewick12029681[2080640]1.07[1.00]
x6559912029638[2102893]1.07[1.01]
Arash Partow12164940[2084572]1.08[1.00]
Ramakrishna12099292[2093253]1.08[1.01]
Jesteress12160339[2121868]1.08[1.02]
Meiyan12163372[2111271]1.08[1.02]
CRC-3212496534[2075088]1.11[1.00]
Murmur212577192[2081476]1.12[1.00]
Hanson12532500[2129832]1.12[1.03]
SBox12626848[2084018]1.13[1.00]
lookup312697932[2084889]1.13[1.01]
FNV-1a12760139[2081195]1.14[1.00]
Paul Hsieh12892492[2180206]1.15[1.05]
Murmur2A12994766[2081370]1.16[1.00]
MaPrime2c13266856[2084467]1.18[1.00]
One At Time13575868[2087861]1.21[1.01]
Weinberger14512944[3541181]1.29[1.71]
Fletcher36966422[9063797]3.30[4.37]
Novak unrolled37239016[6318611]3.32[3.05]

Some functions were excluded from the benchmark because of very bad performance:

The number of collisions depending on the hash table size (for the same data set, thanks to Ace for the idea):

For 28 bits: Novak unrolled - 5.9 million collisions, Fletcher - 4.9 million collisions, Weinberger - 1.1 million collisions, x17 unrolled - 0.8 million collisions, Paul Hsieh - about 0.4 million collisions, other functions - about 0.3 million collisions

Red Dragon Book proposes the following formula for evaluating hash function quality:

sum from j=0 to m-1: b_j(b_j+1)/2 / [(n/2m)(n+2m-1)]

where bj is the number of items in j-th slot, m is the number of slots, and n is the total number of items. The sum of bj(bj + 1) / 2 estimates the number of slots your program should visit to find the required value. The denominator (n / 2m)(n + 2m − 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula should give 1. In reality, a good function is somewhere between 0.95 and 1.05. If it's more, there is a high number of collisions (slow!). If it's less, the function gives less collisions than the randomly distributing function, which is not bad.

Here are the results for some of our functions:

Hash function quality (using the formula from Red Dragon book). In Numbers test: K&R and Bernstein - 1.6, Jesteress - 1.45, x65599 - 1.2, x17 and Paul Larson - 0.8, CRC-32 - 0.9. SBox, Murmur2, Paul Hsieh, and lookup3 - between 0.95 and 1.05. In other tests all functions have the quality between 0.95 and 1.05.

Conclusion

Complex functions by Paul Hsieh and Bob Jenkins are tuned for long keys, such as the ones in postfix and prefix tests. Note that they do not provide the best number of collisions for these tests, but do have the best time, which means that the functions are faster than the others because of loop unrolling. At the same time, they are suboptimal for short keys (words and sonnets tests).

For a word counting program, a compiler, or another application that typically handles short keys, it's often advantageous to use a simple multiplicative function such as x17 or Larson's hash. However, these functions perform badly on long keys.

Novak showed bad results on the large data set. Jesteress has a high number of collisions in numbers test.

Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.

Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.

Download the source code (152 KB, MSVC++)

Variations

XORing high and low part

For table size less than 2^16, we can improve the quality of hash function by XORing high and low words, so that more letters will be taken into account:

   return hash ^ (hash >> 16);

Subtracting a constant

x17 hash function subtracts a space from each letter to cut off the control characters in the range 0x00..0x1F. If the hash keys are long and contain only Latin letters and numbers, the letters will be less frequently shifted out, and the overall number of collisions will be lower. You can even subtract 'A' when you know that the keys will be only English words.

Using larger multipliers for a compiler

Paul Hsieh noted that large multipliers may provide better results for the hash table in a compiler, because a typical source code contains a lot of one-letter variable names (i, j, s, etc.), and they will collide if the multiplier is less than the number of letters in the alphabet.

The test confirms this assumption: the function by Kernighan & Ritchie (M = 33) has lower number of collisions than x17 (M = 17), but the latter is still faster (see Variables column in the table above).

Setting hash table size to a prime number

A test showed that the number of collisions will usually be lower if you use a prime, but the calculations modulo prime take much more time than the calculations for a power of 2, so this method is impractical. Even replacing division with multiplication by reciprocal values do not help here:

WordsWin32NumbersPrefixPostfixVariablesShakespeare
Bernstein % 2K145[261]880[889]426[8030]326[214]316[226]649[697]874[1131]
Bernstein % prime186[221]1049[995]445[5621]364[194]357[217]805[800]1123[1051]
Bernstein optimized mod160[221]960[995]416[5621]341[194]334[217]722[800]969[1051]
x17 % 2K137[193]847[1002]81[340]314[244]300[228]641[863]832[1012]
x17 % prime173[256]1010[1026]104[324]356[246]339[216]760[760]1046[1064]
x17 optimized mod155[256]915[1026]96[324]330[246]315[216]691[760]930[1064]

Implementing open addressing vs. separate chaining

With open addressing, most hash functions show awkward clustering behavior in "Numbers" test:

Bernst.K&Rx17 unrollx65599FNVUnivWeinb.HsiehOne-atLookup3PartowCRC
OA4268184207889127311010392104279
[8030][20810][340][3158][207][480][4360][342][267][205][20860][96]
32-bit17969741148680125105999234782
[8030][20810][340][3158][207][480][4360][342][267][205][20860][96]
chain92687382888473107999514984
[500][500][24][258][124][48][100][138][131][108][1530][64]

You can avoid the worst case by using chaining for collision resolution. However, chaining requires more memory for the next item pointers, so the performance improvement does not come for free. A custom memory allocator should be usually written, because calling malloc() for a large number of small structures is suboptimal.

Some implementations (e.g., hash table in Python interpreter) store a full 32-bit hash with the item to speed up the string comparison, but this is less effective than chaining.

Peter Kankowski
Peter Kankowski

About the author

Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP (on Windows platform). He can be reached at kankowski@narod.ru.

165 comments

Ten recent comments are shown below. Show all comments

Georgi 'Sanmayce',

Mantis source and Benchmark-Dumps also (http://www.sanmayce.com/Downloads/_KAZE_hash_test_r3.7z) (224,916,121 bytes) fixed now,

here is the bugless Mantis:

define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Mantis(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	const char *p = str;
	// Cases: 0,1,2,3,4,5,6,7
	if (wrdlen & sizeof(DWORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		//wrdlen -= sizeof(DWORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		//wrdlen -= sizeof(WORD);
	}
	if (wrdlen & 1) {
		hash32 = (hash32 ^ *p) * PRIME;
		p += sizeof(char);
		//wrdlen -= sizeof(char);
	}
		wrdlen -= p-str;
// The goal is to avoid the weak range [8, 8+2, 8+1] that is 8..10 in practice 1..15 i.e. 1..8+4+2+1, thus amending FNV1A_Meiyan and FNV1A_Jesteress.
// FNV1A_Jesteress: fastest strong
// FNV1A_Meiyan   : faster  stronger
// FNV1A_Mantis   : fast    strongest
	if (wrdlen) {
	for(; wrdlen > 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
		hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME;		
	}
		hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+1*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+2*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+3*sizeof(WORD))) * PRIME;
	} // Bug Fixed!
	return hash32 ^ (hash32 >> 16);
}


Peter Kankowski,

Mantis results on Pentium M:

WordsWin32NumbersPrefixPostfixVariablesSonnetsUTF-8IPv4Avg
Jesteress78[110]419[397]59[300]121[102]115[106]353[366]512[585]2411[2427]439[1499]1.02[3.19]
Meiyan79[102]427[409]56[125]122[106]118[112]354[350]524[588]2439[2377]445[768]1.03[1.86]
Mantis84[97]471[404]56[125]140[117]133[99]389[370]551[585]2646[2388]446[768]1.10[1.86]
Novak unrolled90[113]518[399]56[90]168[118]163[113]401[342]576[581]2728[2430]482[969]1.20[1.67]
Fletcher84[131]443[406]102[460]139[127]132[108]377[507]594[1052]2912[4893]513[1359]1.23[4.60]
SBox88[91]551[431]57[116]181[108]176[91]413[347]560[526]2817[2442]472[836]1.24[1.77]
Murmur294[103]529[415]64[104]165[106]160[111]430[383]619[566]2933[2399]538[834]1.26[1.73]
CRC-3290[101]566[426]56[64]197[107]192[94]427[338]593[563]2840[2400]469[725]1.28[1.40]
x17 unrolled93[109]592[415]52[24]214[113]207[102]435[368]593[589]2867[2392]486[829]1.32[1.18]
lookup395[101]565[412]71[97]189[101]183[95]436[361]633[550]2954[2392]570[834]1.35[1.64]
K&R94[106]617[437]58[288]221[94]218[106]443[360]588[561]2966[2365]448[831]1.36[2.99]
Paul Larson94[99]630[416]50[16]231[99]227[105]455[366]600[583]3026[2447]469[755]1.38[1.09]
Bernstein95[114]621[412]61[288]225[100]221[102]445[353]593[572]3002[2380]469[703]1.38[2.98]
Paul Hsieh106[114]574[420]71[118]183[101]179[100]457[341]684[600]3168[2380]579[847]1.39[1.82]
x6559994[111]626[382]61[203]234[107]231[122]451[379]597[560]2997[2373]471[846]1.40[2.44]
Sedgewick101[107]666[414]53[48]244[103]241[103]477[348]630[570]3205[2437]475[782]1.45[1.32]
Murmur2A112[114]598[433]79[102]181[112]176[109]490[365]721[544]3382[2369]650[772]1.46[1.72]
FNV-1a101[124]659[428]62[108]239[94]235[105]472[374]625[555]3135[2446]517[807]1.46[1.76]
MaPrime2c107[103]705[426]65[106]255[91]253[106]509[349]675[550]3414[2406]541[865]1.56[1.72]
Ramakrishna108[108]727[409]61[91]277[125]271[103]513[360]669[528]3413[2383]517[840]1.59[1.65]
Arash Partow107[101]739[435]93[420]281[98]274[85]516[355]674[570]3355[2372]542[779]1.68[3.87]
One At Time118[105]832[421]81[110]320[97]316[103]576[364]741[545]3806[2346]657[795]1.85[1.74]
Weinberger120[104]957[422]54[100]376[111]378[117]623[364]753[712]4019[2547]560[744]1.94[1.74]
Hanson87[118]530[649]55[112]168[118]1647[499]393[435]549[592]2740[2890]462[833]2.62[2.44]
WikipediaAvg
x17 unrolled11407606[2410605]1.00[1.16]
K&R11743083[2083145]1.03[1.00]
Bernstein11850076[2074237]1.04[1.00]
Paul Larson11998017[2080111]1.05[1.00]
Sedgewick12224089[2080640]1.07[1.00]
x6559912166596[2102893]1.07[1.01]
Arash Partow12374085[2084572]1.08[1.00]
Ramakrishna12334890[2093253]1.08[1.01]
Meiyan12381114[2111271]1.09[1.02]
Jesteress12390279[2121868]1.09[1.02]
Mantis12525368[2082213]1.10[1.00]
CRC-3212739133[2075088]1.12[1.00]
Murmur212815546[2081476]1.12[1.00]
Hanson12766271[2129832]1.12[1.03]
SBox12851512[2084018]1.13[1.00]
lookup312934889[2084889]1.13[1.01]
FNV-1a12982534[2081195]1.14[1.00]
Paul Hsieh13126250[2180206]1.15[1.05]
Murmur2A13204842[2081370]1.16[1.00]
MaPrime2c13489436[2084467]1.18[1.00]
One At Time13793712[2087861]1.21[1.01]
Weinberger14772418[3541181]1.29[1.71]
Fletcher37809825[9063797]3.31[4.37]
Novak unrolled38061845[6318611]3.34[3.05]

Core i5:

WordsWin32NumbersPrefixPostfixVariablesSonnetsUTF-8IPv4Avg
iSCSI CRC74[105]330[415]36[112]85[106]83[92]278[368]407[584]1974[2388]322[838]1.01[1.77]
Meiyan74[102]328[409]45[125]86[106]85[112]274[350]413[588]1979[2377]352[768]1.05[1.86]
Jesteress76[110]324[397]46[300]86[102]84[106]274[366]410[585]1964[2427]366[1499]1.06[3.19]
Mantis78[97]358[404]45[125]100[117]96[99]301[370]425[585]2115[2388]348[768]1.12[1.86]
Murmur271[103]378[415]49[104]109[106]106[111]314[383]452[566]2181[2399]398[834]1.19[1.73]
SBox70[91]439[431]46[116]124[108]122[91]300[347]429[526]2151[2442]378[836]1.22[1.77]
Paul Larson69[99]396[416]39[16]143[99]141[105]305[366]435[583]2159[2447]349[755]1.23[1.09]
Novak unrolled76[113]461[399]44[90]126[118]125[113]322[342]460[581]2283[2430]379[969]1.26[1.67]
CRC-3270[101]489[426]39[64]146[107]144[94]320[338]444[563]2229[2400]357[725]1.28[1.40]
Sedgewick73[107]476[414]42[48]144[103]143[103]318[348]450[570]2255[2437]349[782]1.29[1.32]
Fletcher71[131]402[406]80[460]103[127]99[108]311[507]482[1052]2476[4893]387[1359]1.30[4.60]
Murmur2A91[114]408[433]53[102]117[112]115[109]335[365]496[544]2382[2369]428[772]1.31[1.72]
x6559973[111]463[382]51[203]144[107]145[122]315[379]440[560]2182[2373]350[846]1.31[2.44]
FNV-1a73[124]469[428]53[108]144[94]144[105]310[374]443[555]2203[2446]373[807]1.32[1.76]
Bernstein74[114]432[412]49[288]150[100]150[102]323[353]450[572]2276[2380]351[703]1.32[2.98]
Paul Hsieh82[114]413[420]54[118]127[101]125[100]342[341]502[600]2379[2380]433[847]1.33[1.82]
K&R77[106]494[437]47[288]150[94]149[106]334[360]481[561]2364[2365]344[831]1.35[2.99]
x17 unrolled78[109]510[415]44[24]156[113]153[102]343[368]473[589]2367[2392]373[829]1.38[1.18]
lookup383[101]460[412]55[97]140[101]138[95]361[361]527[550]2497[2392]427[834]1.41[1.64]
MaPrime2c79[103]457[426]57[106]155[91]155[106]349[349]484[550]2491[2406]405[865]1.42[1.72]
Ramakrishna82[108]587[409]45[91]189[125]186[103]371[360]493[528]2602[2383]381[840]1.53[1.65]
Arash Partow83[101]560[435]70[420]215[98]212[85]391[355]507[570]2624[2372]408[779]1.69[3.87]
One At Time84[105]561[421]58[110]221[97]220[103]391[364]509[545]2645[2346]459[795]1.70[1.74]
Weinberger85[104]595[422]42[100]266[111]279[117]402[364]548[712]2769[2547]419[744]1.81[1.74]
Hanson72[118]469[649]45[112]122[118]1217[499]311[435]436[592]2238[2890]370[833]2.70[2.44]
WikipediaAvg
iSCSI CRC5814350[2077725]1.00[1.00]
Jesteress5820373[2121868]1.00[1.02]
Meiyan5833530[2111271]1.00[1.02]
Mantis5987971[2082213]1.03[1.00]
Murmur26320510[2081476]1.09[1.00]
Paul Larson6349291[2080111]1.09[1.00]
x655996478493[2102893]1.11[1.01]
FNV-1a6645319[2081195]1.14[1.00]
Hanson6891590[2129832]1.19[1.03]
SBox6959635[2084018]1.20[1.00]
Sedgewick7072573[2080640]1.22[1.00]
CRC-327090810[2075088]1.22[1.00]
K&R7145952[2083145]1.23[1.00]
Bernstein7256091[2074237]1.25[1.00]
Murmur2A7369543[2081370]1.27[1.00]
lookup37359399[2084889]1.27[1.01]
Paul Hsieh7417808[2180206]1.28[1.05]
x17 unrolled7419562[2410605]1.28[1.16]
Ramakrishna8183394[2093253]1.41[1.01]
One At Time8332427[2087861]1.43[1.01]
MaPrime2c8433854[2084467]1.45[1.00]
Arash Partow8501990[2084572]1.46[1.00]
Weinberger9433157[3541181]1.62[1.71]
Novak unrolled21350811[6318611]3.67[3.05]
Fletcher22272811[9063797]3.83[4.37]

Generally, Mantis has similar number of collisions to Meiyan, but Mantis is slower.

Georgi 'Sanmayce',

Thanks for testing,

at link below is my attempt to present Heavy-Hash-Hustle dumps in a more digestible fashion:

http://www.sanmayce.com/Fastest_Hash/index.html#Heavy-Hash-Hustle

Intel Core 2 Quad Q9550S Yorkfield 2.83GHz 12MB L2 Cache:

D:\_2010-Dec-05\_KAZE_hash_test_r3.RESULTS.Atom.Q9550S\RESULTS_Q9550S>sort IP.TXT /+75

8388608 elements in the table (23 bits)

2995394 lines read

        FNV1A_Meiyan:    1914155   1914404   1914122   1914486   1913915|   1913915 [  593723]
     FNV1A_Jesteress:    1935195   1937318   1937469   1936312   1935495|   1935195 [  691369]
     Alfalfa_Rollick:    1980953   1981207   1980671   1981201   1981856|   1980671 [  604098]
        FNV1A_Mantis:    2016612   1995406   1993654   1993662   1993258|   1993258 [  481137]
      Novak unrolled:    2010097   2009996   2009829   2010971   2009834|   2009829 [  657377]
              Hanson:    2012466   2012191   2012537   2012350   2012236|   2012191 [  534251]
       FNV1A_Smaragd:    2030879   2028917   2029161   2029013   2028666|   2028666 [  480914]
              CRC-32:    2034011   2033842   2035153   2033904   2034696|   2033842 [  472854]
        FNV1A_Jester:    2050708   2050776   2050694   2050668   2051040|   2050668 [  689339]
                 K&R:    2064544   2064406   2065423   2065249   2065812|   2064406 [  474011]
             Murmur2:    2066808   2066666   2066382   2066497   2066478|   2066382 [  476330]
             Alfalfa:    2068149   2067824   2068182   2068081   2067452|   2067452 [  475434]
        Alfalfa_HALF:    2071077   2071411   2071349   2070671   2071446|   2070671 [  480071]
       Alfalfa_DWORD:    2081631   2081774   2081485   2081846   2081789|   2081485 [  475434]
     FNV1A_Peregrine:    2098860   2098795   2098753   2098718   2098887|   2098718 [  546915]
        x17 unrolled:    2112134   2114144   2112556   2112551   2112255|   2112134 [  475528]
         Paul Larson:    2121042   2121087   2120977   2120391   2121282|   2120391 [  475575]
          FNV1A_Whiz:    2127393   2127963   2127276   2126858   2127638|   2126858 [  689339]
                SBox:    2137206   2137208   2137045   2136962   2136563|   2136563 [  476681]
           Sedgewick:    2137897   2138151   2137352   2138251   2137971|   2137352 [  477931]
           Bernstein:    2165058   2164180   2164633   2164803   2164633|   2164180 [  474048]
              FNV-1a:    2169036   2169173   2168907   2168979   2168931|   2168907 [  477067]
     FNV1A_Nefertiti:    2184810   2184848   2184522   2184738   2185078|   2184522 [  763451]
       Alfalfa_QWORD:    2190122   2189991   2189488   2189910   2189894|   2189488 [  475434]
          Paul Hsieh:    2196409   2195461   2195511   2195236   2195676|   2195236 [  543835]
          Weinberger:    2203173   2203305   2202832   2202739   2202064|   2202064 [ 1159267]
     Sixtinsensitive:    2225863   2225928   2226657   2226719   2226205|   2225863 [  582793]
        Arash Partow:    2241218   2241227   2241293   2240949   2240652|   2240652 [  478246]
             lookup3:    2246638   2246955   2247440   2247833   2247080|   2246638 [  476566]
           MaPrime2c:    2250515   2250361   2251148   2251678   2252040|   2250361 [  477151]
         Ramakrishna:    2268889   2268536   2268740   2268861   2268980|   2268536 [  476020]
    Sixtinsensitive+:    2272421   2272439   2272689   2272420   2272411|   2272411 [  716367]
              x65599:    2341902   2342241   2341970   2341930   2341731|   2341731 [  654463]
         One At Time:    2360163   2360112   2360203   2359913   2359343|   2359343 [  477667]
            Fletcher:   24424563  24401091  24400993  24395252  24394052|  24394052 [ 2856890]

Intel Atom N450 1.66GHz 512KB L2 Cache:

D:\_2010-Dec-05\_KAZE_hash_test_r3.RESULTS.Atom.Q9550S\RESULTS_Atom>sort IP.TXT /+75

8388608 elements in the table (23 bits)

2995394 lines read

        FNV1A_Meiyan:    3155958   3159537   3157158   3158773   3156719|   3155958 [  593723]
          Weinberger:    3248750   3246228   3248741   3249811   3243717|   3243717 [ 1159267]
     FNV1A_Jesteress:    3253953   3256344   3264163   3254169   3254820|   3253953 [  691369]
        FNV1A_Mantis:    3314496   3281163   3284368   3283819   3281818|   3281163 [  481137]
              CRC-32:    3343937   3345737   3356535   3347699   3345348|   3343937 [  472854]
     FNV1A_Peregrine:    3376694   3378179   3386379   3379572   3377683|   3376694 [  546915]
        Alfalfa_HALF:    3379134   3382696   3380707   3377835   3392481|   3377835 [  480071]
       FNV1A_Smaragd:    3379196   3382352   3380635   3379201   3379668|   3379196 [  480914]
             lookup3:    3424892   3430226   3426356   3421977   3426662|   3421977 [  476566]
             Murmur2:    3430503   3430565   3429001   3429150   3427817|   3427817 [  476330]
              Hanson:    3433796   3434865   3435452   3435932   3432672|   3432672 [  534251]
          Paul Hsieh:    3435899   3433042   3435868   3436050   3436606|   3433042 [  543835]
     Alfalfa_Rollick:    3465457   3459388   3456972   3458956   3458066|   3456972 [  604098]
           Bernstein:    3487977   3491575   3494922   3486510   3487834|   3486510 [  474048]
        FNV1A_Jester:    3504276   3504202   3510988   3505340   3503647|   3503647 [  689339]
     Sixtinsensitive:    3515471   3519793   3523615   3517541   3519258|   3515471 [  582793]
          FNV1A_Whiz:    3533310   3534248   3531524   3534347   3531857|   3531524 [  689339]
        x17 unrolled:    3542761   3543107   3541781   3540489   3542713|   3540489 [  475528]
        Arash Partow:    3542980   3542590   3542073   3542242   3545691|   3542073 [  478246]
    Sixtinsensitive+:    3546786   3546044   3546339   3546185   3544348|   3544348 [  716367]
                 K&R:    3564368   3562589   3565386   3565243   3563323|   3562589 [  474011]
              FNV-1a:    3576419   3588693   3579410   3576049   3579704|   3576049 [  477067]
      Novak unrolled:    3595877   3582905   3580473   3580509   3582652|   3580473 [  657377]
             Alfalfa:    3583829   3584782   3582429   3584607   3581127|   3581127 [  475434]
     FNV1A_Nefertiti:    3593590   3588775   3591870   3593603   3590078|   3588775 [  763451]
       Alfalfa_DWORD:    3622713   3620781   3622630   3622695   3626486|   3620781 [  475434]
         Ramakrishna:    3651378   3652073   3652729   3652338   3651156|   3651156 [  476020]
           Sedgewick:    3661915   3656998   3656072   3656996   3662433|   3656072 [  477931]
         Paul Larson:    3706139   3694095   3692079   3694297   3694153|   3692079 [  475575]
       Alfalfa_QWORD:    3696531   3702283   3701765   3703357   3699967|   3696531 [  475434]
                SBox:    3785574   3779069   3780873   3782789   3785080|   3779069 [  476681]
         One At Time:    3955247   3954321   3955469   3963892   3953817|   3953817 [  477667]
              x65599:    3998490   4006600   3997098   3994519   3996586|   3994519 [  654463]
           MaPrime2c:    4191055   4193915   4193048   4204844   4193416|   4191055 [  477151]
            Fletcher:   77210862  77226344  77294433  77235843  77205175|  77205175 [ 2856890]

My wish is to cover all meaningful(at least for LZ) lengths that is 3..66 bytes but a different approach must be commenced because of HUGE size of dataset:

3*46486= 139,458 bytes
...
66*198631486=13,109,678,076 bytes

Speaking of very precious(regarding English language usage, and original thoughts used, also hundreds of books included) OSHO.TXT I propose one simple way of achieving Building-Blocks hashing:

loading 197MB(the file itself) and hashing(3..66 chunks) at each position(i.e. one byte increment).

Another thing I want to share regarding collision managing:

approaches(rehashing, chains, ...) without definite goals i.e. context are like kata(detailed choreographed patterns of movements)(the real fight is an extension/mix of kata/techniques with complex timing which includes awareness of timings of outer things not just your own timing), I mean if enough(free RAM) resources are given not utilizing/exploiting them and talking about speed as this-and-that is a dead-end.

For example I tested(now commented) a FNV1A variant hash function in Leprechaun which outperforms(hash time + lookup time) FNV1A_Jesteress by (1,8??,???-1,6??,???)/1,6??,???*100%= 12.5%, but it is completely due to B-tree used as collision manager at final stage.

This very hasher performs not well while other techniques are used, though.

The point is, speed is something beyond all limitations imposed, it must be chased for each niche relentlessly.

One phenom in real world is Mr. Bolt: his fantastic technique is being constantly improved, or as he says in one interview he and his trainer work on even faster than one of the fastest starts in 100M races. It's just amazing the tallest sprinter to have one of most explosive starts as well. And even more amazing is the will for improvements.

I have a sambo practitioner buddy who had said about his 100/200M records: "What technique? It is just left-right left-right!"

Of course I did disagree. Neglecting the basic/fundamental stances leads to nasty future slips(a kind of 'O! What happened' i.e. lack of further deep-understanding/technique-improvement).

Georgi 'Sanmayce',

I fully agree with:

http://cbloomrants.blogspot.com/2010/11/11-29-10-useless-hash-test.html

Looking at my 3chars..12chars Building-Block test I see the strong candidate for A future ultimate testbed.

Peter Kankowski,
Thanks for the link. I'm glad that he got good results with Whiz :)
Georgi 'Sanmayce',

Hi Peter,

pre-yesterday a documentary movie on History Channel about Nicola Tesla (an outstanding man not only a pragmatic visionary) inspired me to tune an almost forgotten hasher.

Here comes FNV1A_Tesla: a suitable hasher for keys [much] longer than 15 bytes (the case of 3+grams phrases).

That is the very (with the 64bitx32bit->32bit multiplication and loss of carry) hash function I was talking about previously.

In all tests, below, FNV1A_Tesla outspeeds all my FNV1A variants.

Surprisingly the bad collision rate doesn't affect its speed, I have been hit [again] by the fact that the brutal loss of data doesn't (when keys are not with ONLY weak-range lengths) hurt the lookup.

Here the speed/dispersion trade-off was made in favor of speed of course.

The function itself:

//#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Tesla(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	//unsigned long long hash64 = 2166136261; // Change with a bigger one!
	const char *p = str;
	//unsigned long long QWORD1,QWORD2; //64bit=QWORD
	for(; wrdlen >= 2*2*sizeof(DWORD); wrdlen -= 2*2*sizeof(DWORD), p += 2*2*sizeof(DWORD)) {
		hash32 = (hash32 ^ (ROL(*(unsigned long long *)(p+0),5-0)^*(unsigned long long *)(p+8))) * PRIME; // loss of carry!
		//hash64 = (hash64 ^ (ROL(QWORD1,5-0)^QWORD2)) * PRIME;		
		//hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME;		
		//hash32 = (hash32 ^ (ROL(*(DWORD *)(p+8),5-0)^*(DWORD *)(p+12))) * PRIME;		
	}
	//hash32 = hash64 ^ (hash64 >> 32);
	// Cases: 0,1,2,3,4,5,6,7,... 15
	if (wrdlen & (2*sizeof(DWORD))) {
		hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME;		
		//hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
		//hash32 = (hash32 ^ *(DWORD*)(p+4)) * PRIME;
		p += 2*sizeof(DWORD);
	}
	if (wrdlen & sizeof(DWORD)) {
		hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
		p += sizeof(DWORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
	}
	if (wrdlen & 1) 
		hash32 = (hash32 ^ *p) * PRIME;
	return hash32 ^ (hash32 >> 16);
}

I am curious what amendments can be done. It is revision 1.

My 64bit knowledge/experience is next to nothing, so it would be nice somebody to refine it especially for 64bit compilers.

Some tests (on my Intel Merom 2.16GHz, Windows XP 32bit, VS2008 32bit compiler):

D:\_KAZE_new-stuff\VivaNicolaTesla>dir/oe

Volume in drive D is H320_Vol5

Volume Serial Number is 0CB3-C881

 Directory of D:\_KAZE_new-stuff\VivaNicolaTesla

03/16/2011  07:54 AM    <DIR>          ..
03/16/2011  07:54 AM    <DIR>          .
03/16/2011  07:39 AM           218,698 hash.cod
03/16/2011  07:39 AM            65,440 hash.cpp
03/16/2011  07:39 AM            87,552 hash.exe
03/16/2011  07:39 AM             8,390 BuildLog.htm
11/14/2010  02:39 PM         7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv
12/03/2010  07:30 AM        42,892,307 IPS.TXT
11/14/2010  02:39 PM         4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt
03/15/2011  12:10 PM       104,857,601 100MB_as_one_line.TXT
03/16/2011  07:54 AM       409,829,386 googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt
11/14/2010  02:39 PM         4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd
11/14/2010  02:39 PM           388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd
11/14/2010  02:39 PM       146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd
11/14/2010  02:39 PM       278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd
              13 File(s)    998,706,809 bytes
               2 Dir(s)   2,947,858,432 bytes free

D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd"

22202980 lines read

67108864 elements in the table (26 bits)

    FNV1A_Hash_Tesla:   23890797  23614936  23680579  23684698  23606344|  23606344 [ 3457538]
        FNV1A_Mantis:   24848068  24866965  24863372  25003541  24859437|  24848068 [ 3298270]
        FNV1A_Meiyan:   23836095  23832986  23858019  23818340  23992756|  23818340 [ 3345260]
     FNV1A_Jesteress:   23737579  23756975  23731544  23743013  23743112|  23731544 [ 3355676]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd"

12561874 lines read

33554432 elements in the table (25 bits)

    FNV1A_Hash_Tesla:   12464491  12317094  12331733  12323774  12346488|  12317094 [ 2141464]
        FNV1A_Mantis:   12946943  12933482  12932442  12942204  13009030|  12932442 [ 2082213]
        FNV1A_Meiyan:   12583824  12417170  12440549  12465760  12441958|  12417170 [ 2111271]
     FNV1A_Jesteress:   12388725  12369952  12378952  12377230  12380569|  12369952 [ 2121868]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd"

351114 lines read

1048576 elements in the table (20 bits)

    FNV1A_Hash_Tesla:     252801    238573    234905    233660    237413|    233660 [   53107]
        FNV1A_Mantis:     253479    251841    249576    250135    254282|    249576 [   52712]
        FNV1A_Meiyan:     234582    238797    235336    239111    238004|    234582 [   52910]
     FNV1A_Jesteress:     235985    236268    234515    236577    234398|    234398 [   52684]
        FNV1A_Jester:     236458    237823^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd"

38936 lines read

131072 elements in the table (17 bits)

    FNV1A_Hash_Tesla:       9787      9429      9419      9397      9567|      9397 [    5176]
        FNV1A_Mantis:      10181     10205     10302     12326     11283|     10181 [    5185]
        FNV1A_Meiyan:       9591      9563      9530      9493      9603|      9493 [    5224]
     FNV1A_Jesteress:      15021     10163      9524      9533      9476|      9476 [    5182]
        FNV1A_Jester:       9637      9482      9586      9580      9588|      9482 [    5200]
       FNV1A_Smaragd:      11148     10041     10030     10048     10077|     10030 [    5194]
     FNV1A_Peregrine:      10117     10179      9974      9968     10116|      9968 [    5277]
          FNV1A_Whiz:       9616      9877      9679      9614     10118|      9614 [    5200]
     FNV1A_Nefertiti:      10428      9939      9870      9925     10214|      9870 [    5381]
              FNV-1a:      11328     11370     11388     11335     11216|     11216 [    5321]
    Sixtinsensitive+:      10182     10274      9987     10234     10065|      9987 [    5209]
     Sixtinsensitive:      10847     10800     10723     12666     10626|     10626 [    5347]
     Alfalfa_Rollick:      10855     10116     10006     10077     10019|     10006 [    5242]
             Alfalfa:      10422     10409     10427     10586     10946|     10409 [    5252]
        Alfalfa_HALF:      10676     10529     10602     10584     10553|     10529 [    5231]
       Alfalfa_DWORD:      10679     10883     11059     11093     11386|     10679 [    5252]
       Alfalfa_QWORD:      10759     10692     10777     10792     10745|     10692 [    5252]
           Bernstein:      11311^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv"

584879 lines read

2097152 elements in the table (21 bits)

    FNV1A_Hash_Tesla:     393365    373464    374534    375388    373557|    373464 [   81232]
        FNV1A_Mantis:     412156    414563    411944    414678    415841|    411944 [   74643]
        FNV1A_Meiyan:     383821    387854    387760    390291    387271|    383821 [   75377]
     FNV1A_Jesteress:     382663    383224    381564    380962    382974|    380962 [   75404]
        FNV1A_Jester:     384581^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Sentence-list_00,032,359_English_The_Holy_Bible.txt"

32359 lines read

65536 elements in the table (16 bits)

    FNV1A_Hash_Tesla:      27520     26722     27432     28257     26136|     26136 [    6937]
        FNV1A_Mantis:      28386     28415     28554     28248     27951|     27951 [    6925]
        FNV1A_Meiyan:      27598     27359     27334     27319     27338|     27319 [    6897]
     FNV1A_Jesteress:      27888     27308     27483     27310     27274|     27274 [    6883]
        FNV1A_Jester:      31040     31119     31379     30732     30596|     30596 [    6874]
       FNV1A_Smaragd:      40554     40719     42560     40805     40459|     40459 [    6849]
     FNV1A_Peregrine:      30687     30474     30474     31499     32991|     30474 [    6838]
          FNV1A_Whiz:      32561     31988     31396     31424     31415|     31396 [    6874]
     FNV1A_Nefertiti:      30534     30860     30331     30011     30143|     30011 [    6878]
              FNV-1a:      60627     60952     61696     61693     61678|     60627 [    6840]
    Sixtinsensitive+:      35231     35090     35459     35186     37219|     35090 [    6839]
     Sixtinsensitive:      38508^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash 100MB_as_one_line.TXT

1 lines read

4 elements in the table (2 bits)

    FNV1A_Hash_Tesla:     194019    197848    195058    193316    193382|    193316 [       0]
        FNV1A_Mantis:     234425    236411    236747    234573    234684|    234425 [       0]
        FNV1A_Meiyan:     240160    240099    242438    240194    242915|    240099 [       0]
     FNV1A_Jesteress:     243338    241873    239935    239735    239124|    239124 [       0]
        FNV1A_Jester:     331840^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash IPS.TXT

2995394 lines read

8388608 elements in the table (23 bits)

    FNV1A_Hash_Tesla:    2289107   2226568   2226390   2234596   2222701|   2222701 [  691369]
        FNV1A_Mantis:    2469897   2466911   2466878   2471249   2465728|   2465728 [  481137]
        FNV1A_Meiyan:    2290118   2285787   2284061   2291056   2286210|   2284061 [  593723]
     FNV1A_Jesteress:    2331767   2324661   2327224   2326173   2325028|   2324661 [  691369]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt

17981107 lines read

67108864 elements in the table (26 bits)

    FNV1A_Hash_Tesla:   19108584  18921663  19047468  18902535  18913730|  18902535 [ 4218589]
        FNV1A_Mantis:   20408048  20413041  20428172  20421550  20570044|  20408048 [ 2208686]
        FNV1A_Meiyan:   19589642  19586573  19584081  19590689  19588464|  19584081 [ 2209364]
     FNV1A_Jesteress:   19482570  19662021  19476235  19490673  19499351|  19476235 [ 2208081]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>type googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt
...
a_bacillus_and_a
a_bacillus_belonging_to
a_bacillus_closely_related
a_bacillus_closely_resembling
a_bacillus_described_by
a_bacillus_discovered_by
a_bacillus_found_in
a_bacillus_from_the
a_bacillus_has_been
a_bacillus_identical_with
a_bacillus_in_the
a_bacillus_isolated_from
a_bacillus_known_as
a_bacillus_obtained_from
a_bacillus_of_the
a_bacillus_or_a
a_bacillus_resembling_that
a_bacillus_resembling_the
a_bacillus_similar_to
a_bacillus_that_is
a_bacillus_to_which
a_bacillus_which_has
a_bacillus_which_he
a_bacillus_which_is
a_bacillus_which_may
a_bacillus_which_they
a_bacillus_which_was
a_bacillus_whose_growth
a_bacillus_with_rounded
a_back_alley_and
a_back_alley_behind
a_back_alley_in
a_back_alley_of
a_back_alley_off
a_back_alley_or
a_back_alley_somewhere
a_back_alley_that
a_back_alley_to
a_back_alley_where
a_back_alley_with
...
D:\_KAZE_new-stuff\VivaNicolaTesla>

Regards

by the will of the hash(ing) gods...,

I am hunting for an extremely fast integer->integer hashing method for working with a large array of hashtables, in particular, where there are a high number of key (re)inserts, key (re)deletions, and value (re)updates within each hashtable, as large volumes of data are processed. Currently writing in C, but open to inlining assembly if it offers nice gains.

By the will of the hash(ing) gods ... show me the way!

Peter Kankowski,
If you have a high number of delete operations, balanced tree is a better choice than hash table.
ace,

Testing avalanche on integer hash functions

http://baagoe.org/en/wiki/Avalanche_on_integer_hash_functions

Quinn Norton,

I'd love to use some hashes in PHP but not have to enable or install an extension to do so, naturally speed and efficiency would suffer, but the "portability" of the code makes the trade off worth it for my needs. I'd love to have lookup3/SuperFastHash ported to a php function, even One-At-A-Time would be great!

http://www.burtleburtle.net/bob/hash/doobs.html http://www.azillionmonkeys.com/qed/hash.html
quinn.norton (shift+2) uknowwhat.org
Your name:
Comment:

Please ignore this field: