Abstract:
Benchmark program for hash tables and comparison of 15 popular hash functions.

Created by Peter Kankowski
Last changed
Contributors: Nils, Ace, Won, Andrew M., and Georgi 'Sanmayce'
Filed under Algorithms


Hash functions: An empirical comparison

Hash tables are popular data structures for storing key-value pairs. A hash function is used to map the key (usually a string) to an array index. These functions differ from cryptographic hash functions: they should be much faster, and they don't need to resist preimage attacks. Hashing in large databases is also outside the scope of this article; the benchmark covers medium-size hash tables such as:

  • symbol table in a parser,
  • IP address table for filtering network traffic,
  • the dictionary in a word counting program or a spellchecker.

There are two classes of functions used in hash tables:

  • multiplicative hash functions, which are simple and fast, but have a high number of collisions;
  • more complex functions, which have better quality, but take more time to calculate.

Hash table benchmarks usually include theoretical metrics such as the number of collisions or distribution uniformity (see, for example, hash function comparison in the Red Dragon book). Obviously, you will have a better distribution with more complex functions, so they are winners in these benchmarks.

The question is whether using complex functions gives you a faster program. The complex functions require more operations per key, so they can be slower. Is the price of collisions high enough to justify the additional operations?

Multiplicative hash functions

Any multiplicative hash function is a special case of the following algorithm:

UINT HashMultiplicative(const CHAR *key, SIZE_T len) {
   UINT hash = INITIAL_VALUE;
   for(UINT i = 0; i < len; ++i)
      hash = M * hash + key[i];
   return hash % TABLE_SIZE;
}

(Sometimes an XOR operation is used instead of the addition, but it makes little difference.) The hash functions differ only in the values of INITIAL_VALUE and the multiplier M. For example, the popular Bernstein function uses INITIAL_VALUE = 5381 and M = 33; Kernighan and Ritchie's function uses INITIAL_VALUE = 0 and M = 31.

A multiplicative function works by adding together the letters weighted by powers of multiplier. For example, the hash for the word TONE will be:

  INITIAL_VALUE * M^4  +  'T' * M^3  +  'O' * M^2  +  'N' * M  +  'E'

Let's enter several similar strings and watch the output of the functions:

        Bernstein  Kernighan
           (M=33)     (M=31)
 too      b88af17      1c154
 top      b88af18      1c155
 tor      b88af1a      1c157
 tpp      b88af39      1c174
 a000    7c9312d6     2cd22f
 a001    7c9312d7     2cd230
 a002    7c9312d8     2cd231
 a003    7c9312d9     2cd232
 a004    7c9312da     2cd233
 a005    7c9312db     2cd234
 a006    7c9312dc     2cd235
 a007    7c9312dd     2cd236
 a008    7c9312de     2cd237
 a009    7c9312df     2cd238
 a010    7c9312f7     2cd24e
 a          2b606         61
 aa        597727        c20
 aaa      b885c68      17841

"Too" and "top" differ only in the last letter. The letter P immediately follows O, so the hash values differ by 1 (1c154 vs. 1c155, b88af17 vs. b88af18). The same holds for a000..a009.

Now let's compare top with tpp. Their hashes will be:

  INITIAL_VALUE * M^3 + 'T' * M^2 + 'O' * M + 'P'
  INITIAL_VALUE * M^3 + 'T' * M^2 + 'P' * M + 'P'

The hashes will be different by M * ('P' - 'O') = M. Similarly, when the first letters are different by x, their hashes will be different by x * M^2.

When there are fewer than 33 possible letters, Bernstein's function packs them into a number (similar to the Radix40 packing scheme). For example, a hash table of size 33^3 = 35937 provides perfect hashing (without any collisions) for all three-letter English words in lowercase. In practice, the words are longer and hash tables are smaller, so there will be some collisions (situations when different strings have the same hash value).

If the string is too long to fit into the 32-bit number, the first letters still affect the value of the hash function, because the multiplication is done modulo 2^32 (in a 32-bit register), and the multiplier is chosen to have no common divisors with 2^32 (in other words, it must be odd), so the bits are not simply shifted away.

There are no exact rules for choosing the multiplier, only some heuristics:

  • the multiplier should be large enough to accommodate most of the possible letters (e.g., 3 or 5 is too small);
  • the multiplier should be fast to calculate with shifts and additions [e.g., 33 * hash can be calculated as (hash << 5) + hash];
  • the multiplier should be odd for the reason explained above;
  • prime numbers are good multipliers.

Complex hash functions

These functions do a good job of mixing together the bits of the source word. A change in one input bit changes half of the output bits on average (the avalanche effect), so the result looks completely random:

     Paul Hsieh One At Time
 too   3ad11d33  3a9fad1e  
 top   78b5a877  4c5dd09a  
 tor   c09e2021  f2aa9d35  
 tpp   3058996d  d5e9e480  
a000   7552599f  ed3859d8  
a001   3cc1d896  fef7fd57  
a002   c6ff5c9b  08a610b3  
a003   dcab7b0c  1a88b478  
a004   780c7202  3621ebaa  
a005   7eb63e3a  47db8f1d  
a006   6b0a7a17  b901717b  
a007   cb5cb1ab  caec1550  
a008   5c2a15c0  e58d4a92  
a009   33339829  f75aee2d  
a010   eb1f336e  bd097a6b  
   a   115ea782  ca2e9442  
  aa   008ad357  7081738e  
 aaa   7dfdc310  ae4f22ec

To achieve this behavior, the hash functions perform a lot of shifts, XORs, and additions. But do we need a complex function? What is faster: tolerating the collisions and resolving them with chaining, or avoiding them with a more complex function?

Test conditions

The benchmark uses the separate chaining algorithm for collision resolution. Memory allocation and other "heavy" functions were excluded from the benchmarked code. The RDTSC instruction was used for timing. The test was performed on Pentium-M and Core i5 processors.

The benchmark inserts some keys in the table, then looks them up in the same order as they were inserted. The test data include:

  • the list of common words from Wiktionary (500 items);
  • the list of Win32 functions from Colorer syntax highlight scheme (1992 items);
  • 500 names from a000 to a499 (imitates the names in auto-generated source code);
  • the list of common words with a long prefix and postfix;
  • all variable names from WordPress 2.3.2 source code in wp-includes folder (1842 names);
  • list of all words in Sonnets by W. Shakespeare (imitates a word counting program; 3228 words);
  • list of all words in La Peau de chagrin by Balzac (in French, UTF-8 encoding);
  • search engine IP addresses (binary).

Results

Core i5 processor

Function | Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg
iSCSI CRC | 65[105] | 329[415] | 36[112] | 84[106] | 83[92] | 280[368] | 408[584] | 1964[2388] | 322[838] | 1.01[1.78]
Meiyan | 64[102] | 328[409] | 45[125] | 87[106] | 85[112] | 274[350] | 411[588] | 1972[2377] | 353[768] | 1.05[1.87]
Murmur2 | 72[103] | 378[415] | 48[104] | 109[106] | 106[111] | 315[383] | 450[566] | 2183[2399] | 399[834] | 1.21[1.74]
XXHfast32 | 78[110] | 372[420] | 57[102] | 88[103] | 88[106] | 315[347] | 473[491] | 2323[2494] | 463[838] | 1.23[1.71]
SBox | 70[91] | 389[431] | 46[116] | 124[108] | 123[91] | 304[347] | 430[526] | 2182[2442] | 377[836] | 1.23[1.78]
Larson | 72[99] | 401[416] | 34[16] | 143[99] | 141[105] | 312[366] | 451[583] | 2230[2447] | 349[755] | 1.25[1.10]
XXHstrong32 | 79[109] | 385[429] | 58[102] | 93[102] | 92[112] | 321[355] | 474[491] | 2332[2496] | 464[838] | 1.25[1.72]
Sedgewick | 73[107] | 417[414] | 36[48] | 143[103] | 143[103] | 319[348] | 446[570] | 2246[2437] | 349[782] | 1.26[1.33]
Novak unrolled | 76[113] | 404[399] | 43[90] | 127[118] | 125[113] | 322[342] | 459[581] | 2284[2430] | 379[969] | 1.26[1.68]
CRC-32 | 70[101] | 429[426] | 40[64] | 146[107] | 143[94] | 320[338] | 443[563] | 2231[2400] | 357[725] | 1.28[1.41]
Murmur3 | 78[101] | 391[380] | 54[104] | 108[103] | 107[105] | 331[334] | 492[555] | 2360[2376] | 433[783] | 1.28[1.69]
x65599 | 74[111] | 407[382] | 45[203] | 144[107] | 144[122] | 316[379] | 449[560] | 2221[2373] | 349[846] | 1.29[2.45]
FNV-1a | 74[124] | 408[428] | 47[108] | 144[94] | 144[105] | 309[374] | 440[555] | 2193[2446] | 376[807] | 1.30[1.77]
Murmur2A | 79[114] | 410[433] | 53[102] | 117[112] | 114[109] | 337[365] | 494[544] | 2377[2369] | 429[772] | 1.31[1.73]
Fletcher | 71[131] | 352[406] | 80[460] | 104[127] | 100[108] | 312[507] | 481[1052] | 2477[4893] | 388[1359] | 1.31[4.62]
K&R | 73[106] | 429[437] | 47[288] | 149[94] | 149[106] | 324[360] | 450[561] | 2266[2365] | 343[831] | 1.32[3.00]
Paul Hsieh | 80[114] | 410[420] | 54[118] | 123[101] | 121[100] | 336[341] | 496[600] | 2351[2380] | 433[847] | 1.33[1.83]
Bernstein | 75[114] | 428[412] | 49[288] | 150[100] | 150[102] | 324[353] | 460[572] | 2312[2380] | 351[703] | 1.34[2.99]
x17 unrolled | 78[109] | 446[415] | 43[24] | 156[113] | 153[102] | 344[368] | 472[589] | 2361[2392] | 373[829] | 1.37[1.19]
lookup3 | 83[101] | 459[412] | 55[97] | 140[101] | 137[95] | 359[361] | 526[550] | 2480[2392] | 427[834] | 1.42[1.65]
MaPrime2c | 79[103] | 459[426] | 50[106] | 155[91] | 155[106] | 349[349] | 486[550] | 2493[2406] | 406[865] | 1.42[1.73]
Ramakrishna | 80[108] | 513[409] | 44[91] | 189[125] | 186[103] | 370[360] | 483[528] | 2565[2383] | 380[840] | 1.51[1.66]
One At Time | 85[105] | 562[421] | 58[110] | 221[97] | 220[103] | 392[364] | 511[545] | 2659[2346] | 459[795] | 1.72[1.75]
Arash Partow | 83[101] | 560[435] | 71[420] | 215[98] | 212[85] | 392[355] | 507[570] | 2638[2372] | 407[779] | 1.72[3.88]
Weinberger | 87[104] | 590[422] | 37[100] | 254[111] | 273[117] | 398[364] | 541[712] | 2734[2547] | 419[744] | 1.78[1.75]
Hanson | 73[118] | 417[649] | 45[112] | 123[118] | 1207[499] | 318[435] | 448[592] | 2324[2890] | 370[833] | 2.70[2.46]

Pentium-M processor

Function | Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg
Meiyan | 80[102] | 426[409] | 56[125] | 123[106] | 121[112] | 354[350] | 525[588] | 2443[2377] | 445[768] | 1.02[1.87]
Novak unrolled | 90[113] | 517[399] | 56[90] | 169[118] | 164[113] | 398[342] | 575[581] | 2716[2430] | 482[969] | 1.18[1.68]
Fletcher | 84[131] | 444[406] | 102[460] | 140[127] | 133[108] | 374[507] | 592[1052] | 2891[4893] | 513[1359] | 1.21[4.62]
SBox | 88[91] | 552[431] | 57[116] | 181[108] | 178[91] | 414[347] | 560[526] | 2814[2442] | 472[836] | 1.22[1.78]
Murmur2 | 97[103] | 532[415] | 65[104] | 165[106] | 162[111] | 434[383] | 622[566] | 2948[2399] | 537[834] | 1.25[1.74]
CRC-32 | 90[101] | 565[426] | 55[64] | 198[107] | 192[94] | 427[338] | 590[563] | 2842[2400] | 469[725] | 1.26[1.41]
x17 unrolled | 93[109] | 593[415] | 52[24] | 214[113] | 208[102] | 434[368] | 593[589] | 2867[2392] | 486[829] | 1.30[1.19]
lookup3 | 94[101] | 565[412] | 70[97] | 189[101] | 182[95] | 432[361] | 631[550] | 2943[2392] | 572[834] | 1.32[1.65]
K&R | 93[106] | 619[437] | 58[288] | 221[94] | 218[106] | 442[360] | 587[561] | 2961[2365] | 447[831] | 1.33[3.00]
Larson | 95[99] | 631[416] | 49[16] | 231[99] | 228[105] | 455[366] | 599[583] | 3027[2447] | 469[755] | 1.35[1.10]
XXHfast32 | 108[110] | 546[420] | 86[102] | 139[103] | 136[106] | 459[347] | 681[491] | 3259[2494] | 717[838] | 1.35[1.71]
Murmur3 | 108[101] | 561[380] | 74[104] | 167[103] | 165[105] | 468[334] | 700[555] | 3259[2376] | 604[783] | 1.36[1.69]
Bernstein | 97[114] | 622[412] | 61[288] | 225[100] | 222[102] | 448[353] | 609[572] | 3053[2380] | 469[703] | 1.37[2.99]
XXHstrong32 | 108[109] | 558[429] | 86[102] | 150[102] | 147[112] | 460[355] | 682[491] | 3262[2496] | 714[838] | 1.38[1.72]
x65599 | 99[111] | 628[382] | 61[203] | 234[107] | 232[122] | 459[379] | 630[560] | 3097[2373] | 471[846] | 1.40[2.45]
Paul Hsieh | 106[114] | 576[420] | 82[118] | 183[101] | 178[100] | 456[341] | 678[600] | 3154[2380] | 670[847] | 1.41[1.83]
Sedgewick | 101[107] | 667[414] | 52[48] | 245[103] | 242[103] | 478[348] | 630[570] | 3204[2437] | 475[782] | 1.42[1.33]
Murmur2A | 113[114] | 598[433] | 78[102] | 183[112] | 178[109] | 488[365] | 719[544] | 3380[2369] | 651[772] | 1.44[1.73]
FNV-1a | 102[124] | 660[428] | 62[108] | 239[94] | 237[105] | 473[374] | 627[555] | 3140[2446] | 516[807] | 1.44[1.77]
MaPrime2c | 108[103] | 705[426] | 65[106] | 255[91] | 254[106] | 508[349] | 674[550] | 3413[2406] | 542[865] | 1.54[1.73]
Ramakrishna | 108[108] | 728[409] | 61[91] | 278[125] | 272[103] | 511[360] | 660[528] | 3378[2383] | 517[840] | 1.56[1.66]
Arash Partow | 106[101] | 739[435] | 93[420] | 280[98] | 275[85] | 514[355] | 671[570] | 3332[2372] | 543[779] | 1.65[3.88]
One At Time | 118[105] | 830[421] | 81[110] | 321[97] | 319[103] | 578[364] | 741[545] | 3809[2346] | 657[795] | 1.82[1.75]
Weinberger | 119[104] | 956[422] | 54[100] | 375[111] | 379[117] | 614[364] | 745[712] | 3973[2547] | 560[744] | 1.89[1.75]
Hanson | 86[118] | 531[649] | 55[112] | 168[118] | 1722[499] | 393[435] | 549[592] | 2742[2890] | 463[833] | 2.60[2.46]

Each cell includes the execution time, then the number of collisions in square brackets. Execution time is expressed in thousands of clock cycles (a lower number is better). Avg column contains the average normalized execution time (and the number of collisions).

The function by Kernighan and Ritchie is from their famous book "The C Programming Language", 2nd edition; Weinberger's hash and the hash with multiplier 65599 are from the Red Dragon book. The latter function is used in gawk, sdbm, and other Linux programs. x17 is the function by Peter Kankowski (multiplier = 17; 32 is subtracted from each letter code).

As you can see from the table, the function with the lowest number of collisions is not always the fastest one.

Results on a large data set (list of all words in English Wikipedia, 12.5 million words, from the benchmark by Georgi 'Sanmayce'):

Core i5 processor

Function | Wikipedia | Avg
iSCSI CRC | 5725944[2077725] | 1.00[1.00]
Meiyan | 5829105[2111271] | 1.02[1.02]
Murmur2 | 6313466[2081476] | 1.10[1.00]
Larson | 6403975[2080111] | 1.12[1.00]
Murmur3 | 6492620[2082084] | 1.13[1.00]
x65599 | 6479417[2102893] | 1.13[1.01]
FNV-1a | 6599423[2081195] | 1.15[1.00]
SBox | 6964673[2084018] | 1.22[1.00]
Hanson | 7007689[2129832] | 1.22[1.03]
CRC-32 | 7016147[2075088] | 1.23[1.00]
Sedgewick | 7060691[2080640] | 1.23[1.00]
XXHfast32 | 7078804[2084164] | 1.24[1.00]
K&R | 7109841[2083145] | 1.24[1.00]
XXHstrong32 | 7168788[2084514] | 1.25[1.00]
Bernstein | 7247096[2074237] | 1.27[1.00]
lookup3 | 7342986[2084889] | 1.28[1.01]
Murmur2A | 7376650[2081370] | 1.29[1.00]
Paul Hsieh | 7387317[2180206] | 1.29[1.05]
x17 unrolled | 7410443[2410605] | 1.29[1.16]
Ramakrishna | 8172670[2093253] | 1.43[1.01]
One At Time | 8338799[2087861] | 1.46[1.01]
MaPrime2c | 8428492[2084467] | 1.47[1.00]
Arash Partow | 8503299[2084572] | 1.49[1.00]
Weinberger | 9416340[3541181] | 1.64[1.71]
Novak unrolled | 21289919[6318611] | 3.72[3.05]
Fletcher | 22235133[9063797] | 3.88[4.37]

Pentium-M processor

Function | Wikipedia | Avg
x17 unrolled | 11321744[2410605] | 1.00[1.16]
K&R | 11666050[2083145] | 1.03[1.00]
Bernstein | 11833902[2074237] | 1.05[1.00]
Larson | 11888751[2080111] | 1.05[1.00]
Sedgewick | 12111839[2080640] | 1.07[1.00]
x65599 | 12144777[2102893] | 1.07[1.01]
Arash Partow | 12235396[2084572] | 1.08[1.00]
Ramakrishna | 12185834[2093253] | 1.08[1.01]
Meiyan | 12269691[2111271] | 1.08[1.02]
CRC-32 | 12604152[2075088] | 1.11[1.00]
Murmur2 | 12713455[2081476] | 1.12[1.00]
SBox | 12716574[2084018] | 1.12[1.00]
Hanson | 12627597[2129832] | 1.12[1.03]
lookup3 | 12791917[2084889] | 1.13[1.01]
FNV-1a | 12868991[2081195] | 1.14[1.00]
Murmur3 | 12916960[2082084] | 1.14[1.00]
XXHfast32 | 12936106[2084164] | 1.14[1.00]
XXHstrong32 | 12950650[2084514] | 1.14[1.00]
Murmur2A | 13068746[2081370] | 1.15[1.00]
Paul Hsieh | 12992315[2180206] | 1.15[1.05]
MaPrime2c | 13348580[2084467] | 1.18[1.00]
One At Time | 13662010[2087861] | 1.21[1.01]
Weinberger | 14592843[3541181] | 1.29[1.71]
Fletcher | 37410790[9063797] | 3.30[4.37]
Novak unrolled | 37769882[6318611] | 3.34[3.05]

Some functions were excluded from the benchmark because of very bad performance:

  • Adler-32 (slow filling, not suitable as a hash function);
  • TwoChars (bad for machine-generated names and variable names that are similar to each other, disastrous for large data sets such as Wikipedia).

The number of collisions depending on the hash table size (for the same data set, thanks to Ace for the idea):

For 28 bits: Novak unrolled - 5.9 million collisions, Fletcher - 4.9 million collisions, Weinberger - 1.1 million collisions, x17 unrolled - 0.8 million collisions, Paul Hsieh - about 0.4 million collisions, other functions - about 0.3 million collisions

The Red Dragon book proposes the following formula for evaluating hash function quality:

  ( sum for j = 0 .. m-1 of b_j * (b_j + 1) / 2 )  /  ( (n / 2m) * (n + 2m - 1) )

where b_j is the number of items in the j-th slot, m is the number of slots, and n is the total number of items. The sum of b_j * (b_j + 1) / 2 estimates the number of slots your program has to visit to find the required value. The denominator (n / 2m) * (n + 2m − 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula gives 1. In reality, a good function is somewhere between 0.95 and 1.05. If it is higher, there is a high number of collisions (slow!). If it is lower, the function gives fewer collisions than the randomly distributing function, which is not bad.

Here are the results for some of our functions:

Hash function quality (using the formula from Red Dragon book). In Numbers test: K&R and Bernstein - 1.6, x65599 - 1.2, x17 and Paul Larson - 0.8, CRC-32 - 0.9. Meiyan, FNV-1a, SBox, Murmur2, Paul Hsieh, XXHfast32, and lookup3 - between 0.95 and 1.05. In other tests all functions have the quality between 0.95 and 1.05.

Conclusion

Complex functions by Paul Hsieh and Bob Jenkins are tuned for long keys, such as the ones in the Postfix and Prefix tests. Note that they do not provide the lowest number of collisions for these tests, but they do have the best times, which means these functions are faster than the others because of loop unrolling. At the same time, they are suboptimal for short keys (the Words and Sonnets tests).

For a word counting program, a compiler, or another application that typically handles short keys, it's often advantageous to use a simple multiplicative function such as x17 or Larson's hash. However, these functions perform badly on long keys.

Novak's function showed bad results on the large data set. Jesteress has a high number of collisions in the Numbers test.

Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.

Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.

Download the source code (152 KB, MSVC++)

Variations

XORing high and low part

For table sizes below 2^16, we can improve the quality of the hash function by XORing the high and low words, so that more letters are taken into account:

   return hash ^ (hash >> 16);

Subtracting a constant

x17 hash function subtracts a space from each letter to cut off the control characters in the range 0x00..0x1F. If the hash keys are long and contain only Latin letters and numbers, the letters will be less frequently shifted out, and the overall number of collisions will be lower. You can even subtract 'A' when you know that the keys will be only English words.

Using larger multipliers for a compiler

Paul Hsieh noted that large multipliers may provide better results for the hash table in a compiler, because a typical source code contains a lot of one-letter variable names (i, j, s, etc.), and they will collide if the multiplier is less than the number of letters in the alphabet.

The test confirms this assumption: the function by Kernighan & Ritchie (M = 33) has lower number of collisions than x17 (M = 17), but the latter is still faster (see Variables column in the table above).

Setting hash table size to a prime number

A test showed that the number of collisions will usually be lower if you use a prime, but calculations modulo a prime take much more time than calculations for a power of 2, so this method is impractical. Even replacing the division with multiplication by a reciprocal value does not help here:

Function | Words | Win32 | Numbers | Prefix | Postfix | Variables | Shakespeare
Bernstein % 2K | 145[261] | 880[889] | 426[8030] | 326[214] | 316[226] | 649[697] | 874[1131]
Bernstein % prime | 186[221] | 1049[995] | 445[5621] | 364[194] | 357[217] | 805[800] | 1123[1051]
Bernstein optimized mod | 160[221] | 960[995] | 416[5621] | 341[194] | 334[217] | 722[800] | 969[1051]
x17 % 2K | 137[193] | 847[1002] | 81[340] | 314[244] | 300[228] | 641[863] | 832[1012]
x17 % prime | 173[256] | 1010[1026] | 104[324] | 356[246] | 339[216] | 760[760] | 1046[1064]
x17 optimized mod | 155[256] | 915[1026] | 96[324] | 330[246] | 315[216] | 691[760] | 930[1064]

Implementing open addressing vs. separate chaining

With open addressing, most hash functions show awkward clustering behavior in "Numbers" test:

Function | OA | 32-bit | chain
Bernst. | 426[8030] | 179[8030] | 92[500]
K&R | 81[20810] | 69[20810] | 68[500]
x17 unroll | 84[340] | 74[340] | 73[24]
x65599 | 207[3158] | 114[3158] | 82[258]
FNV | 88[207] | 86[207] | 88[124]
Univ | 91[480] | 80[480] | 84[48]
Weinb. | 273[4360] | 125[4360] | 73[100]
Hsieh | 110[342] | 105[342] | 107[138]
One-at | 103[267] | 99[267] | 99[131]
Lookup3 | 92[205] | 92[205] | 95[108]
Partow | 1042[20860] | 347[20860] | 149[1530]
CRC | 79[96] | 82[96] | 84[64]

You can avoid the worst case by using chaining for collision resolution. However, chaining requires more memory for the next-item pointers, so the performance improvement does not come for free. A custom memory allocator usually should be written, because calling malloc() for a large number of small structures is suboptimal.

Some implementations (e.g., hash table in Python interpreter) store a full 32-bit hash with the item to speed up the string comparison, but this is less effective than chaining.


About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.


215 comments

nils,
Hi.

Have you tried to compare the hash-functions against CRC32? That would be interesting!

Some DSPs can already do Galois multiplies, which is the slow part of CRC. For PC we'll have cheap CRC in the future when the new SSE becomes mainstream (hopefully).

Cool blog btw..
Peter Kankowski,
CRC implementation on x86 is slower than the other hash functions (see Paul Hsieh's tests). With SSE4, it should be faster, and it would be interesting to compare them in future. Let's wait for SSE4 :).
ace,
As far as I know, the authors of the traditional hash functions that you presented made them under the assumption that the size of the table is a prime, not some "round" number like 1024. They counted on the modulo step to spread the resulting values. So when you use 1024 as the table size, of course their functions don't fare well. Can you try the table sizes that they would use and share these results with us?
Peter Kankowski,
Thank you for the idea, I will test it this weekend (and also CRC function that Nils proposed). However, modulus of prime number will be much slower than (hash % 1024), which is optimized to (hash & 1023), so I'm not sure which one will be faster.
Arash Partow,
Hi Peter nice article wanted to mention a few things though:

Most of the functions presented there produce entropy of some level at a quantity of 32 bits. What you are doing by mod'ing by 1024 is ignoring the 22 higher bits. You should integrate them back into the result somehow.

A true test would not quantize the values. Instead it would just make a list of already generated values and see if those generated values occur again within the period. The period being the size of the "common words" etc.

Another good test is to see the avalanching abilities of the functions. In this case you only change 1 bit in the input and then see how many of the output bits change. The average should be close to about half the number of output bits (e.g., 16 bits in the case of 32-bit outputs).

Another good thing to remember is to use random values rather than only English or some such, as you will see that in English only certain bits in a byte are likely to change; how many times do we use control characters in English words?

I believe if you follow the above you may see different results...
acd,
Arash, I think that Peter didn't want to investigate "avalanching abilities" or something like that -- his goal was to evaluate the functions on the "good enough" principle in some specific example.
Peter Kankowski,
Arash, for multiplicative functions, the tests mix together higher and lower 16 bits with XOR, so higher bits are not ignored.

My purpose was to benchmark the hash functions in close-to-real-life scenario. Theoretical tests will give a completely different result.

For example, cryptographic hashes such as MD5 will give a perfect distribution, but they are impractically slow for hash tables. Good results in a theoretical test do not always mean a fast hash function.
ace,
Nice work.
So, even if the K&R idea is close to the reinvented "simple steps for each character, complex step at the end", the function they published is still not good enough for your "numbers" test, but otherwise is not so bad. And using primes is not such a good idea.
But if I remember, the "K&R" code you use is still not the "from the book" one as they don't xor higher and lower bits?
And also, if I properly remember the K&R book, they use what Wikipedia article about "hash table" names "separate chaining". Would you compare how everything behaves then? Of course, for good results, you shouldn't allocate each new node with a separate malloc call.
Peter Kankowski,
You have a good memory :). K&R don't XOR the lower and higher halves, but I tested both versions, and XORing was usually faster because of a lower number of collisions:

                    Bernstein:    K&R:           x17:          x65599:        FNV-1a:
Words without XOR   151 [  308]   147 [  196]    140 [  240]   141 [  293]    152 [  272]
Words with XOR      147 [  261]   142 [  194]    137 [  193]   139 [  250]    151 [  262]

Win32 without XOR   893 [ 1087]   892 [  932]    839 [  880]   866 [ 1049]    963 [ 1076]
with XOR            889 [  889]   897 [ 1006]    847 [ 1002]   843 [  816]    951 [ 1021]

Numbers without XOR 825 [18330]   869 [20810]    76 [  360]    298 [ 5400]    91 [  409]
with XOR            453 [ 8030]   842 [19533]    81 [  340]    205 [ 3158]    88 [  207]

Prefix without XOR  333 [ 262 ]   329 [  192]    316 [  235]   329 [  251]    374 [  260]
with XOR            328 [  214]   339 [  274]    314 [  244]   319 [  200]    368 [  233]

Postfix without XOR 320 [262  ]   320 [  234]    304 [  237]   318 [  265]    355 [  211]
with XOR            316 [  226]   323 [  256]    300 [  228]   316 [  316]    356 [ 237 ]

Variables without X 671 [  948]   668 [  762]    654 [ 1109]   642 [  883]    697 [  787]
with XOR            658 [  697]   667 [  811]    641 [  863]   641 [  904]    693 [  790]

Shakespeare without 882 [ 1151]   915 [ 1168]    844 [ 1019]   837 [ 1135]    890 [  982]
with XOR            884 [ 1131]   893 [ 1188]    832 [ 1012]   838 [ 1111]    908 [ 1063]


For K&R function, the version without XOR seems to be better, so I will use it in future tests. FNV authors recommend using XOR, though their function often works better without it. XORing also does not help complex functions, because they already mix the bits.

The choice between separate chaining and open addressing depends on your task. If you need add-only hash table (a symbols table in compiler, a word counting program, etc.), open addressing will be faster because of better data locality for caching.

If you need to delete items from the table, separate chaining may be faster. Or may be not: note that in Python dictionary, they use some complex variation of open addressing, not separate chaining (see Beautiful code book). I will not do the tests, because the results will depend on the usage pattern of a particular application (how often does it delete items? which items? how many times is the table resized? etc.) You should benchmark it on some specific task.
ace,
I expect much less collisions and much better behaviour with separate chaining in the "numbers" test with the K&R function, that's why I asked.
I always used separate chaining for my hashes. Unless you know at the start how big a hash table you need, separate chaining behaves better.
ace,
Comparing K&R and x17 using efficiently implemented "separate chaining", for all your input sets I get:

   Kernighan&Ritchie: |       179 [  234]
                 x17: |       179 [  198]

   Kernighan&Ritchie: |       971 [ 1022]
                 x17: |       971 [  982]

   Kernighan&Ritchie: |       108 [ 1000]
                 x17: |        84 [   48]

   Kernighan&Ritchie: |       311 [  226]
                 x17: |       310 [  214]

   Kernighan&Ritchie: |       299 [  234]
                 x17: |       299 [  226]

   Kernighan&Ritchie: |       773 [  822]
                 x17: |       782 [  868]

   Kernighan&Ritchie: |      1116 [ 1282]
                 x17: |      1137 [ 1276]


As expected, x17 is noticeably faster only in your "Numbers" test set. Otherwise the numbers of collisions are very similar. Also note that the pathological behaviour is fully avoided.
ace,
In the previous message, the missing numbers eaten by the blogging software on the right from 299s are:

  234
  226
ace,
For the reference, the results of your original code on my machine:
   Kernighan&Ritchie: |       186 [  196 ]
                 x17: |       183 [  193 ]

   Kernighan&Ritchie: |       987 [  932 ]
                 x17: |       992 [ 1002 ]

   Kernighan&Ritchie: |      1056 [20810 ]
                 x17: |       109 [  340 ]

   Kernighan&Ritchie: |       321 [  192 ]
                 x17: |       329 [  244 ]

   Kernighan&Ritchie: |       304 [  234 ]
                 x17: |       304 [  228 ]

   Kernighan&Ritchie: |       793 [  762 ]
                 x17: |       801 [  863 ]

   Kernighan&Ritchie: |      1153 [ 1168 ]
                 x17: |      1132 [ 1012 ]
Peter Kankowski,
Thank you very much! It looks to be faster than my program. May I ask what table size you used? How was the memory allocated?

Could you please share your code via an upload site (such as RapidShare.de or Zalil.ru)?
ace,
I kept the table sizes the same as you did. As I've already said, the malloc was not called.

Now, your code depends a lot on knowing exactly how many strings it has to process before it determines the table size. And if we use more elements for the same table size my program will run circles around yours.

I've checked now the wikipedia entry I've mentioned (the first time I've just looked up how they call "chaining" and didn't read anything) and I can just tell you -- don't believe them when they write:

"in most cases the differences between these algorithms is marginal, and other considerations typically come into play." ( false )

It looks to me that the author of wikipedia article "invented" that conclusion (that's against the wikipedia policy, but wikipedia shouldn't be used as authoritative reference anyway).

It seems that your statements "The choice between separate chaining and open addressing depends on your task. If you need add-only hash table (a symbols table in compiler, a word counting program, etc.), open addressing will be faster because of better data locality for caching" are also based on that wikipedia article and not on your actual or thought experiments.

Properly implemented, chaining is always much faster in "harder" cases and has comparable performance otherwise. Using it is more important than trying to be smart with the hash function. And of course, in practice there's seldom any benefit of using anything more complicated than the K&R-style function -- in that aspect you were on the right track. Only for very long strings can there be a benefit of processing more bytes at once during hash index calculation. In any real use, the time spent in hash index calculation is not what makes a program slow; not using chains often is (and of course, using mallocs where they are not needed always slows things down).

I'm sorry that I can't upload the code, but it's easy to repeat the results. I've counted the "collisions" any time the string comparison to equality failed.
Peter Kankowski,
For fair comparison, table size should be smaller for separate chaining.

Separate chaining needs more memory because of the next item pointers (usually 3 times more than open addressing with the same table size), so people reduce the size of the table to save space. Sedgewick recommends setting table size to 1/5..1/10 of the number of items (on average, 5..10 items in each chain). For open addressing, he uses table size = 2..4 * N.

If you make a sparser table, the search will be faster (a classical time-space compromise), and it's true for both methods. Open addressing would also be faster if you take a larger table. I would use not 1/5..1/10, but somewhere around 1/2..1 for separate chaining. Memory consumption should be the same, so it will be fair.

> I've counted the "collisions" any time the string comparison to equality failed.
You probably counted them twice (when inserting into the table and when searching in it). That's why your figures for collisions should not be compared with mine.

...based on that wikipedia article and not on your actual or thought experiments
My statements are based on an understanding of computer architecture and caching. Open addressing requires fewer memory accesses. If the string comparison fails, it reads the next item, usually from the same cache line. Separate chaining reads from non-adjacent cache lines.

There is another way to reduce the number of memory accesses (used in the Python dictionary implementation). Let's store the full 32-bit hash in the hash table cells. It would require twice as much memory, but string comparison will rarely be needed (only when the hashes are equal). I will probably try this.

About prime numbers: I just found in Sedgewick's book that they were used in modular hashing (not to be confused with multiplicative hashing). "Before high-level languages appeared" (i.e., long before your and my birth :), they used something like this:


UINT hash(const CHAR* str) {
   return *(UINT*)str % TABLE_SIZE;
}


Here, TABLE_SIZE should not be a power of 2, or else it will mask out the letters in the higher bytes. Today this function is not very useful, because division is much slower than other operations on modern processors. It will also show terrible results in the "Prefix" test.

What is more important is that your program with separate chaining eliminates the worst-case ("Numbers" test). It's really good.
nils,
> Here, TABLE_SIZE should not be the power of
> 2, or else it will mask out the letters in
> higher bytes. Today this function is not very
> useful, because division is much slower than
> other operations on modern processors. It
> also will show terrible results in
> the "Prefix" test.

Wait wait wait, guys..

No compiler does a slow division to get the modulo of a constant these days. No matter if your constant is a prime or a power of two.

Two multiplications and some adds and subs are all you need for any number. Powers of two just need a logical AND (as everyone here knows).

So using a modulo is not *that* much slower. I think memory access times are much more important.


Besides that: the number of collisions becomes very important if the data you hash is large. For things like filenames you can get away with a simple hash, but try to hash megabyte-large blobs of data, possibly with paging from disk.

You'll be glad about every collision you don't have in these cases.
ace,
> usually 3 times more than open addressing with the same table size

3 times more? Of what? Certainly not in my code. If you can't imagine it without actually programming, then try to implement it efficiently -- you'll see then.


> Memory consumption should be the same, so it will be fair.

In my view, to be fair

1) Try to fix the table size before you know how many elements you have to hash (you have to do this for almost any real-life purpose). With open addr, you will either use insane amounts of memory "just to be sure" or will have to make new tables each time you cross some percentage of table occupancy. That percentage is important, since the rest of the pointers are going to be unused. And making a new table at these points is also expensive (you have to rehash everything again).

2) Forget about nice power of two table sizes -- once you care about the memory, you can't allow the luxury of having each table size twice as big as the previous one.

Everything else is not realistic enough.


> You probably counted them twice (when inserting into the table and when searching in it)

You're right. So you count a "collision" only after inserting? Moreover, should a "collision" be every cmp, or only "did it hit an empty table entry or not"? Personally I'd avoid the term "collision" at all -- it would be better to count some specific operations which have specific costs (cmp, hash calc, etc.). Or if we observe "table occupancy", that's also not the full information once chaining exists. Etc. More or less boring things -- personally I wouldn't mix the speed tests with evaluations of the properties of the hash functions or the program strategies.


> Separate chaining reads from non-adjacent cache lines.

What makes you think that?


> About prime numbers: (...) they were used in modular hashing (..) "Before high-level languages appeared"

Exactly! That's why people who knew about that kept the table sizes uneven even later. However, I still stumble on texts that preach the major importance of primality (and I never believed them). On the other side, most of the time you don't have the luxury of power-of-2 table sizes, so the "modulo" calculation is hard to avoid in practice -- that's why I liked to see it in the tests. :) As an added benefit, it demonstrated that the dependence of hashes on primes is not really needed.

Btw, I've just checked K&R 2ed to see their example, now you know the whole story behind their

#define HASHSIZE 101


line. :)

(speaking of defines, in your cpp code you should use enums instead)
ace,
nils, you're right, once the access to the hashed elements is costly, the investment in the hash function with minimal collisions is the best way to go.
Re modulo operation: as far as I know, it can't be avoided if the table size is not known in advance.
ace,
Chaining, table size 1/2 of that from the "open addr" code, counting "collisions" only in 2nd phase:

Numbers / 500 lines / 512 elements in the table
           Bernstein:        109 [  500 ]
   Kernighan&Ritchie:        111 [  500 ]
                 x17:        101 [  168 ]
              x65599:        134 [  780 ]


Orig code (no chaining, table size twice as big as in chaining):

Numbers / 500 lines / 1024 elements in the table
           Bernstein:        516 [ 8030 ]
   Kernighan&Ritchie:       1056 [20810 ]
                 x17:        109 [  340 ]
              x65599:        262 [ 3158 ]
Peter Kankowski,
Nils, thank you for the idea. I did not know about possible optimization for modulus. Just tried some constants on MSVC++ 2005 (full optimization, favor fast code):

; UINT j = i % 257;
mov  eax, 0xff00ff01
mul  esi
shr  edx, 8
imul edx, 257
mov  ecx, esi
sub  ecx, edx

; UINT j = i % 127;
mov	 eax, 0x02040811
mul	 esi
mov	 eax, esi
sub	 eax, edx
shr	 eax, 1
add	 eax, edx
shr	 eax, 6
imul	 eax, 127
mov	 ecx, esi
sub	 ecx, eax


It's equivalent to j = i - i / 257 * 257, where i / 257 is calculated with MUL. Unfortunately, this optimization is impossible if table size is not constant.

There is also a method for optimizing modulo of 2^N-1 (Mersenne primes). Note that his code is wrong for large numbers. The correct, slower code will be:


UINT mod127(UINT k) {
   const UINT p = 127, s = 7;
   do {
      k = (k & p) + (k >> s);
   } while (k > p);
   return k == p ? 0 : k;
}


You are right about the case when there is a lot of data. Paul Hsieh said his function is used by Adobe, Apple, and Google. As far as I understand, they use it for large amounts of data (such as the Google web search index). Complex functions perform better in this case, and his function is one of the best for this kind of task.
Peter Kankowski,
> 3 times more?

I'm sorry, not 3 times, but N additional pointers for the next items, where N is the number of strings inserted into the table.

About non-adjacent cache lines. With separate chaining, you need to look up the hash value in the index table, then walk to the key-value pairs (see the picture). They are usually located far from each other in memory. If the value is not found in the first record, you need to follow the next pointer, which can also be far from the first one.

Now imagine open addressing with stored 32-bit hash. In case of collision, it will read from adjacent memory cells (usually, no need for string compare), so it may be faster than other methods. I will test this.

A hybrid scheme looks interesting: items in the collision chain will be close to each other, so it should be fast. Another scheme.


> And making a new table on these points is also expensive (you have to rehash everything again).
Resizing hash tables is another interesting topic. For now, let's assume that we know the approximate table size before starting hashing. For example, in a word counting program you could divide the size of the text file by the average word length to get the estimated size.

Thank you for the results; they are really impressive. Why can't you share your code? Zalil.ru is really easy to use (even with no knowledge of Russian language): just choose your file. It will upload it and give you a link on their server. RapidShare is another popular server.
ace,
> About non-adjacent cache lines. With separate chaining (...)

I still claim you didn't try to imagine how it should be implemented.

> A hybrid scheme

Just another bad wikipedia article with unsubstantiated claims, written by the author who doesn't understand the topic, or maybe just edited to stupidity by the following "contributors" -- I'm not interested to check how it happened.

> Another scheme

This at least doesn't fail the "first glance test".

> For example, in a word counting program you could divide the size of the text file by the average word length to get the estimated size.

The perl script is 25 lines, the input files are 5 GB each. Estimate how big hashes should be constructed during the interpretation of the script on a 256 MB machine.

> Why can't you share your code?

My code is not important. I'm sure you would be able to understand everything (even without actually programming!) if you didn't stick to your initial idea and code.
Nils,
Hi Peter,

You can simply calculate the fast modulo using reciprocals yourself. No need to rely on the compiler optimization.

The code to find the magic constants for reciprocal multiplication can be found here: http://www.hackersdelight.org


The hashtable size does not change that often, so it's not a big deal.

Btw, I may be wrong, but I remember that compilers generate better code for the optimized division when you work with unsigned values for the index and the prime-constant.
Arash Partow,
Hi Peter,

Perhaps you could use the following code:



#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <limits>

template <typename Sequence>
inline bool read_into_sequence(const std::string& file_name, Sequence& buffer)
{
   std::ifstream file(file_name.c_str());
   if (!file) return false;
   std::string line;
   while (std::getline(file,line))
   {
      buffer.push_back(line);
   }
   file.close();
   return true;
}

unsigned int ap_hash(const std::string& str)
{
   unsigned int hash = 0xAAAAAAAA;

   for(std::size_t i = 0; i < str.length(); i++)
   {
      hash ^= ((i & 1) == 0) ? ( (hash << 7) ^ str[i] * (hash >> 3)) :
                               (~((hash << 11) + str[i] ^ (hash >> 5)));
   }
   return hash;
}

template <typename StrSequence, typename HashFunction, typename IntSequence>
void collision_test(const StrSequence& str_list,
                    const HashFunction& function,
                    const IntSequence& quantizer_list)
{
   typedef std::map<unsigned int, unsigned int> hash_value_list_type;
   std::vector<hash_value_list_type> hvl(quantizer_list.size());
   for(typename StrSequence::const_iterator it = str_list.begin(); it != str_list.end(); ++it)
   {
      unsigned int hvlit = 0;
      for(typename IntSequence::const_iterator qit = quantizer_list.begin(); qit != quantizer_list.end(); ++qit, ++hvlit)
      {
         unsigned int hash_value = function(*it) % (*qit);
         hash_value_list_type::iterator hash_it = hvl[hvlit].find(hash_value);
         if (hash_it != hvl[hvlit].end())
            hash_it->second++;
         else
            hvl[hvlit].insert(std::make_pair(hash_value,1));
      }
   }

   for(std::vector<hash_value_list_type>::iterator hvlit = hvl.begin(); hvlit != hvl.end(); ++hvlit)
   {
      unsigned int total_collisions = 0;
      for(hash_value_list_type::iterator it = (*hvlit).begin(); it != (*hvlit).end(); ++it)
      {
         if (it->second > 1)
         {
            total_collisions += (it->second - 1);
         }
      }
      std::cout << total_collisions << "\t";
   }
   std::cout << std::endl;
}

int main()
{
   std::vector<std::string> str_list;

   if (!read_into_sequence("words.txt", str_list))
   {
      std::cout << "ERROR - Could not read list!" << std::endl;
      return 1;
   }

   std::vector<unsigned int> quantizer_list;
   quantizer_list.push_back(std::numeric_limits<unsigned int>::max());

   quantizer_list.push_back(256); quantizer_list.push_back(512);
   quantizer_list.push_back(1024); quantizer_list.push_back(2048);
   quantizer_list.push_back(4096); quantizer_list.push_back(7477);
   quantizer_list.push_back(8011); quantizer_list.push_back(8192);
   quantizer_list.push_back(8329); quantizer_list.push_back(9059);
   quantizer_list.push_back(16384); quantizer_list.push_back(32768);
   quantizer_list.push_back(65536); quantizer_list.push_back(131072);
   quantizer_list.push_back(262144); quantizer_list.push_back(469207);
   quantizer_list.push_back(524288); quantizer_list.push_back(544771);
   quantizer_list.push_back(711209); quantizer_list.push_back(902677);
   quantizer_list.push_back(1048576); quantizer_list.push_back(1299289);
   quantizer_list.push_back(2097152); quantizer_list.push_back(4194304);
   quantizer_list.push_back(8388608); quantizer_list.push_back(16777216);

   collision_test(str_list,ap_hash,quantizer_list);

   return 0;
}
Peter Kankowski,
Thank you again, Nils. As Warren says in his book, unsigned values require additional operations. In x86 ISA, there is no SHRXI instruction that he uses, so we have to use a longer code with 2 SHRs.

When you first mentioned the modulo optimization, I thought that it would be possible to generate the code on the fly. In this case, we can use faster sequences for some divisors instead of general, slower code, but code generation can slow down the whole program. Another solution is to select one of 32 pre-generated functions for prime divisors near 2^K (one function for each possible value of K).

In most cases, using prime numbers provides a similar number of collisions to 2^K table sizes (see the table above), so I'm not sure whether these tricks are worth the cost. It will not be much faster than the original program, but it's interesting and I will possibly try this. Thank you very much for your contribution.
Peter Kankowski,
Ace, please show your code; it will explain everything. I just don't understand your idea.

> the input files are 5 GB each. Estimate how big hashes should be constructed...
I agree with you, there should be some reasonable limit on hash table size. However, the basic idea is correct. It makes a difference whether you count words in a 20 KB article or in a multi-megabyte book such as "War and Peace". You can use the file size to estimate the size of the table and avoid resizing it too often.
Peter Kankowski,
Arash, thank you for following the discussion. Your code is good for evaluating the number of collisions, but it's not suitable for performance comparison. I just use a different method for comparisons, please try to understand it.

Your function shows mediocre speed in my tests because of the ((i & 1) == 0) condition, which was not optimized to branchless code by the compiler. Weinberger's function has the same problem (on old computers, the cost of branch misprediction was lower, so I guess his function was quite fast for its time). On modern computers, branches on random data should be avoided in inner loops whenever possible.

In other aspects (the number of collisions) your function is good and shows quite nice results. You should try to avoid the branch inside the loop.
ace,
Peter, consider the differences between open addressing in your implementation (OA) and chaining (C) for 500 strings. OA: 1024 pointers in the table, plus the strings. C: 512 pointers in the table. You already answered how many more pointers are needed. Now look at the results from OA and C -- everything is there. OA needs more comparisons than C. It's obvious why -- even the first collision in OA spoils some other entry. The results show how that progresses. In C, all entries of one index value don't influence another. That's why C is better.

Finally, this "cache-hits bla bla bla" that somebody wrote and you stuck to is nonsense, as I've already said. Check what happens in OA collisions: the code can't just play with pointers, it must do string comparisons, so OA is not better in that case.

Just start implementing C.
Arash Partow,
Hi Peter,

Thanx for the comments and advice on my function. I'm aware of the speed issues in the published versions; the only reason why it and the other hash functions on my site are described like that is for explanatory purposes. In fact, most of the hash functions on the page can be sped up trivially by unrolling the loops somewhat.

For example when I use my hash function in real life I remove the if statement and process 2 or 4 or 8 bytes at a time rather than the 1 byte per loop. In any case a good optimizing compiler should see that the branching statement is not related to the data but the iterator and from there should be able to come up with a decent execution path. An example of how this optimization can be done:


unsigned int ap_hash(const char* begin, const char* end)
{
   unsigned int hash = 0xAAAAAAAA;
   unsigned int length = static_cast<unsigned int>(end - begin);
   unsigned int rounds = length / 2;
   const char* it = begin;
   for(std::size_t r = 0; r < rounds; ++r)
   {
      hash ^= (hash << 7) ^ (*it++) * (hash >> 3);
      hash ^= ~((hash << 11) + (*it++) ^ (hash >> 5));
   }
   if (1 == (length & 0x01))
   {
      hash ^= (hash << 7) ^ (*it) * (hash >> 3);
   }
   return hash;
}


Note: the same can be done to any of the hash functions presented.

That said a purely "collisions oriented" hash function test suite would be great to have, I'd be willing to collaborate with you and others here to come up with a set of tests (BJs or PHs) to definitively test hash functions.

The reason why I say this is from experience: hash functions that are good from a collisions POV are easier to optimize for speed than hash functions that are good from a speed POV are to upgrade to perform well from a collision-minimizing POV.
ace,
Arash,

> In any case a good optimizing compiler should see that the branching statement is not related to the data but the iterator and from there should be able to come up with a decent execution path.

Peter uses MS compiler. I've tried with gcc 4, and it also can't make such optimization. Which compiler can?
Peter Kankowski,
Ace, thank you, I will do the version with chaining.
Arash Partow,
Ace,

>>Which compiler can?
I'm not sure if any compilers do at this point; that is why I reorganize things. The point I was trying to make was that one-at-a-time hashes should have their loops unrolled (where possible).
Peter Kankowski,
Arash, with this optimization, your function looks much better. I will include it in the next benchmark.

About the test suite. What do you think about finding the average collision chain length in addition to the number of collisions?

There are probably other valuable metrics. I'm not so good at the math aspects of hash functions (programming was always more interesting for me :), so if you have other ideas, please propose them. You seem to have more experience in math than I do. BTW, what are BJs and PHs?

> The point I was trying to make was that one-at-a-time hash's should have their loops unrolled (where possible).
Thank you, I will try to unroll other functions, too.
ace,
Peter, "average chain length" hides "maximal chain length", which in OA can have a bigger impact. I would simply count the number of hash calls and the number of string comparisons for each algorithm, for different percentages of slot occupancy (that's what you don't do now) and different input sets.

You did a good job identifying good examples of the weaknesses of OA with linear probing.

Regarding investigation of hash function and algorithm properties independently of speed, did you know you can do integer arithmetic in Perl just like in C?

sub myh
{
	use integer; # important
	my $h = 0;
	for my $i ( 0..9 ) {
		$h = $h * 33 + $i;
		my $v = $h & 0x1ff;
		print "$i\t$v\n";
	}
}

myh();


gives

0	0
1	1
2	35
3	134
4	330
5	143
6	117
7	284
8	164
9	301
ace,
Arash,

>>> In any case a good optimizing compiler should
>> Which compiler can?
> I'm not sure if any compilers do at this point,

Thanks. Which one do you use? I guess MS?

Anybody to try Intel?
nils,
I could run a GCC 4.2.2 on cygwin this evening.
ace,
nils, thanks for possible 4.2.x test.

When I said Intel I was referring to the "Intel Compiler" (which is distributed with some licence managing -- limiting software even for Linux).

I've tried the latest gcc from 4.1 branch, and Peter uses VS 2005, but as far as I've seen up to now VS 2008 doesn't introduce some significant optimizations on the function level, that's why I asked about Intel Compiler.

Still, it is possible that gcc 4.2 has some new optimizations (I haven't analyzed what's new in 4.2), so it's worth that you try it.

But my question on that topic is still -- is there anybody to try Arash's original code on the Intel Compiler to see if the bit check of i can be moved outside of the loop by compiler alone?
Arash Partow,
Hi Peter, Ace,

>>Arash, with this optimization, your function looks much better. I will include it in the next benchmark.
Can't wait to see the new results. :)


>>About the test suite. What do you think about finding the average collision chain length in addition to the number of collisions?

Average collision analysis is a moot point with regard to hash functions, for two reasons: a hash function is inherently a PRNG, so its behavior is best described as a Poisson process (when was the last time someone wanted to know the average value the Mersenne twister gives?), and secondly, average collisions would require a quantizer value -- this, as you can see, is another debatable issue: what value to use? Should it be prime, or can it be a power of 2 (for efficient "and"ing, etc.)?

As Ace suggested, the maximal chain length would be a nice fact to know and a good measure. Knowing the mean chain length and the variance from the mean, then assessing how close that is to the maximal, would be another good thing to know, but these should be done with and without quantizers.

With regard to BJ and PH, Bob Jenkins and Paul Hsieh: they both advocate analysis of avalanching properties, which I agree is important and reveals a lot more than basic collision analysis. In short, coming back to the PRNG model for hash functions: given some random string of random length and a hash function that outputs n bits, a good hash function as far as avalanching goes (not mentioning the strict avalanche criterion) is one in which the probability of the i'th output bit being 1 is close to 0.5. A hash function with some i'th output bit that is predictable, e.g. always 1 or 1 most of the time, implies that the bit is either not being included properly in the mixing process or that whatever mixing is occurring leaves it in said state; a simplistic example of this is if the result of a particular bit is the output of an "OR" gate with inverted inputs (C = A or ~A)


>> Which one do you use? I guess MS?

I use Intel (icc) 10.x, and it doesn't do it. But there is a good reason why: as far as the C++ standard is concerned, there are some issues with regard to reference aliasing that the optimizer in the compiler detects and, as a result, is not capable of optimizing away. It may change in the 2009 standard ratification if threading is introduced, as according to Sutter, for threads to be a part of the language definition, the language should provide some guarantees about the memory model and some-such; this would inherently result in positive side effects for such optimizations.

In some cases I find it better to craft the code specifically rather than relying on some black-box of a compiler's optimizer, you may write something and it may look good in contrived test cases, but then when used elsewhere the compiler may see other possible optimizations and as a result do "other" things. I guess that is why even today you still see people implementing data structures such as RB-Trees etc purely as macros.
ace,
> The question remains, should you look for a better hash function or rely on chaining to fix the worst-case behavior.

Chaining can be used where OH can't. So when implementing only one strategy, not implementing chaining is the wrong choice. And once there's chaining, there's no real reason to implement OH. Second, with chaining, most hash functions will have the same effect -- the difference is hard to recognize in most of the cases. But, where the number of collisions is more important than speed (where access to the data is really expensive), not surprisingly, CRC32 proved to be the very safe bet. And for tables with up to 2^16 elements CRC16 can probably be enough.

Now, Peter, your function gives quite consistent good results in speed and number of collisions, at least in your set of tests! It's amazing that 17, an even smaller constant than 33 and 31, went unrecognized until now. Congratulations! Of course, this will not give a significant speedup in practice (unless in cases where somebody unwisely used OH), where the widely used functions all have similar speeds.

But you also nicely demonstrated that the effort to develop the complicated functions (excluding CRC of course) was practically unneeded.

Here is some kind of "overall speed scores" based on the normalized results of the chaining tests with your code (I gave all test groups the same weight, that is maybe controversial) with

g_table_size_mask = NextPowerOfTwo(i) - 1;


that is, the table twice as small as in your OH tests.

(In chaining, only one pointer per string can be enough. The string can simply immediately follow the "next" pointer.)

Compiled with /O1

1.00 x17 unrolled
1.02 x17
1.04 Kernighan&Ritchie
1.06 Bernstein
1.08 x65599
1.13 Paul Hsieh
1.13 FNV-1a
1.14 universal
1.20 CRC-32
1.26 One At Time
1.34 Arash Partow
1.47 Weinberger
11.62 lookup3


(lookup3 is the only one really dangerous here -- not to be used unless it can be checked that the compiler optimized what was necessary to optimize to even use the function)

Compiled with /O2

1.00 x17 unrolled
1.00 x65599
1.00 x17
1.05 lookup3
1.07 universal
1.08 Paul Hsieh
1.09 FNV-1a
1.10 Kernighan&Ritchie
1.11 Bernstein
1.16 CRC-32
1.23 One At Time
1.32 Arash Partow
1.35 Weinberger


Other common cases not covered by your tests are hashing of integers, and hashing of memory addresses. In both cases there is a limited number of bytes, some patterns occur often and there's still the need to hash fast and good.
peter_k,
> Nils, thank you for the idea. I did not know about possible optimization for modulus. Just tried some constants on MSVC++ 2005 (full optimization, favor fast code):

Hmm, I've written about it on your blog before :-]
http://smallcode.weblogs.us/2007/01/05/what-your-compiler-can-do-for-you/
In most cases divide and modulo (together) can be done using only one MUL
Peter Kankowski,
Yes, chaining seems to be a better choice than OA. I'm afraid that my function may also show bad results on some data set, and chaining really helps in such cases.

My function differs from the others not only by multiplier, but also by subtracting 0x20 from each character, which helps to "pack" more characters in the hash value. Certainly, this will work only for ASCII strings (you should use a different function for random binary data).

Arash said a lot about PRNG-type hash functions (with an avalanching effect), but these functions are actually less effective for short strings than non-random multiplicative functions. If the table is small enough, a multiplicative function can pack all input characters into the integer, so the hash value will be unique.

Thank you for the overall score figures and especially for lookup3 test. This function is too complicated for both compilers and humans :).

Do you mean hashing 32-bit integers and memory addresses? In this case, different hash functions must be used. Modulo prime should work well, and the optimization proposed by Nils will be very useful.
Peter Kankowski,
> Knowing the mean chain length, and the varaince from the mean then assessing how close that is to the maximal.
Arash, thank you very much for these ideas. If I have some free time, I will implement them in my tests. You have a good framework for theoretical tests, so you could try them, too.

Prime quantizers are not needed for hashing strings; ANDing will do better.

About avalanching: multiplicative functions are not random at all, but they often show better results than PRNG-like functions.

Take a look at Python sources. They use a simple multiplicative hash function with M = 1000003 (search for "string_hash" in file stringobject.c). In dictobject.c, they say:

Most hash schemes depend on having a good hash function, in the sense of simulating randomness. Python doesn't, [...] but this isn't necessarily bad.

Read also their notes on optimizing dictionaries.

Thank you again for your help. Your function shows much better results after unrolling, and mine also wins from it :).

> I've written about it on your blog before
Peter, sorry, I forgot about this. I knew about "magic numbers" for division, but never thought that it can be applied to modulo. BTW, you will need additional shifts anyway, so it's not only one MUL.
ace,
> but also by subtracting 0x20 from each character

Many years ago, at the time I considered such a modification, I believed it shouldn't bring anything. However I never used OH or searched for pathological examples. Have you tried to get the results without that subtraction?

> Thank you for the overall score figures

That was on Core 2. I've made the same scoring on Pentium III, /O2 and received:

1.00 x17
1.02 x17 unrolled
1.07 lookup3
1.08 x65599
1.08 Kernighan&Ritchie
1.10 Paul Hsieh
1.13 Bernstein
1.15 universal
1.16 CRC-32
1.19 FNV-1a
1.27 One At Time
1.34 Arash Partow
1.38 Weinberger


Note that here a function with 1.16 is on average 16% slower than the best one. Of course, a bit of reasonable doubt remains, as you selected the test sets. :)

In practice the hash functions are called between other processing, so the speed of the actual hash function can contribute even less to the overall speed. In such cases some shorter function can be better, even if in tests like yours it appears slower than one with a bigger footprint. On the other hand, when we really want to minimize the number of collisions, it would be interesting to know a "collision goodness factor" from your tests -- using the same method I get the following scores:

1.00 x17
1.00 x17 unrolled
1.03 CRC-32
1.05 lookup3
1.08 universal
1.11 Paul Hsieh
1.12 FNV-1a
1.13 One At Time
1.35 x65599
1.35 Bernstein
1.36 Weinberger
1.36 Kernighan&Ritchie
2.34 Arash Partow


Now, I also considered the possibility that your function gets too good a score here, since you selected the pathological example and possibly optimized your function only for that. So I removed the "numbers" results, and the scores are:

1.00 lookup3
1.01 x17
1.01 x17 unrolled
1.03 CRC-32
1.05 universal
1.05 Paul Hsieh
1.06 Bernstein
1.06 x65599
1.06 FNV-1a
1.06 Kernighan&Ritchie
1.06 One At Time
1.07 Arash Partow
1.17 Weinberger


Note that most of the functions are very close. Not counting Weinberger, all are inside a 7% span!

(Still, I'd rather count the number of comparisons and not "collisions".)

> Modulo prime should work well, and the optimization proposed by Nils will be very useful.

How much is the gain of using these, compared to a single DIV, in processor cycles? Can that method be implemented to work for any N? Can it be programmed to work without the programmer manually testing constants (a FasterDiv class, say), where the constants are calculated once (at runtime) and then used often?
Peter Kankowski,
> Have you tried to get the results without that subtraction?

It was almost always slower:
               Words       Win32      Numbers    Prefix     Postfix   Variables Shakespeare
x17            146[237]   890[980]    85 [351]   339[277]   320[234]  684[1196]   875[1059]
x17 - 0x20     138[193]  848[1002]    81 [340]   313[244]   300[228]   640[863]   839[1012]
K&R            143[196]   892[932]  866[20810]   329[192]   320[234]   658[762]   878[1168]
K&R - 0x20     136[187]   842[917]  863[20810]   312[201]   302[207]   636[852]   840[1050]
Bernst.        145[261]   880[889]   426[8030]   326[214]   316[226]   649[697]   874[1131]
Bernst. - 0x20 139[229]   850[903]   468[9295]   319[276]   303[263]   640[756]   841[1025]


> How much is the gain of using these compared to a single DIV in processor cycles?

           % 1031 and 257     % 127
DIV            39                39
mod_table_size 14                17


All times are in clock cycles; measured with Agner Fog's program.


> Can that method be implemented to work for any N?

Some divisors require 4 additional instructions (compare 257 and 127). If you have a pre-calculated table of divisors close to the powers of 2, you can select only the divisors that give shorter code. If a divisor gives longer code, just select the next prime number, e.g., 131 instead of 127.

Another solution is to use CMOV to ignore the result of the 4 additional operations when they are not needed, but this will be a little slower.
ace,
>> Have you tried to get the results without that subtraction?
> It was almost always slower

I've asked because the method of multiplying by a constant is widely accepted, but subtracting one is not.

The "collision score tables", which include minus and no-minus versions for various multiplication factors and use chaining, look inconclusive:

Without numbers test

1.00 lookup3
1.01 x17
1.03 CRC-32
1.04 KandRM
1.04 universal
1.05 x65599
1.05 Paul Hsieh
1.06 Bernstein
1.06 Kernighan&Ritchie
1.06 FNV-1a
1.06 One At Time
1.07 x65599M
1.07 BernsteinM
1.10 X17NoM
1.16 Arash Partow
1.17 Weinberger


with numbers test

1.00 x17
1.00 x17 unrolled
1.03 CRC-32
1.05 lookup3
1.08 X17NoM
1.08 universal
1.11 Paul Hsieh
1.12 FNV-1a
1.13 One At Time
1.34 KandRM
1.34 x65599
1.35 Bernstein
1.35 Kernighan&Ritchie
1.35 Weinberger
1.37 BernsteinM
1.74 x65599M
8.97 Arash Partow


It looks to me that the visible "noise" of the results cannot actually lead to any useful conclusion. If that operation really had the property of conditioning the final result better even for these test cases, I'd expect every function based on multiplication to benefit from it. Here, K&R got better with subtraction; x65599 and Bernstein got worse.

Also note that the "no subtraction" x17 comes out much worse -- too much to claim that 17 is a much better factor than the others.

Note also that 'a' - ' ' == 'A'. Since it looks like most of your test samples contain lowercase strings, what you were doing most of the time was just turning them into ALL CAPS strings and testing the *same* functions!

The only useful real conclusion I have up to now is that functions with multiplication factors are still a good choice for practical purposes.
Peter Kankowski,
> too much to claim that 17 is much better factor than others
Yes, it's the combination of the factor 17 and subtracting that makes it the fastest function.

> functions with multiplication factors are still the good choice for practical purposes.
That was the main point of my first article :).

Ace, what do you think about Python dictionary code?
ace,
>> too much to claim that 17 is much better factor than others
> it's the combination of the factor 17 and subtracting that makes it the fastest function.

But let me point out again, I believe that given two functions on strings, where one processes

char - Const

and the other

char

can both be considered one and the same function. It doesn't look to me like you proved anything, except that the function gives good results exactly for the set of strings you've selected. Wouldn't just "toggling the case" of your set before hashing result in a loss exactly where the gain is now? Do you want to say that you've developed a hash function that does well on strings with mainly 'a'..'z' but not on strings with mainly 'A'..'Z'?

> what do you think about Python dictionary code?

I haven't invested much time, so I can't say much. If I see correctly, they have special treatment for very small dictionaries -- that's very important and a very good thing to do. Then, if I've understood correctly, they don't use chaining, and they resize the table once it's more than 2/3 full. Whether 2/3 is an optimal limit is something that depends on their collision resolution efficiency. In my opinion you've demonstrated that without chaining and with linear probing even 1/2 can be "too full".
Peter Kankowski,
> Do you want to say that you've developed the hash function that does good on strings with mainly 'a'..'z' but not on strings with mainly 'A'..'Z'?
No, it will not be slower. Here are the raw results with upper-cased text files (open addressing, Pentium M):

               Words      Win32        Prefix       Postfix    Variables  Shakespeare
Bernstein    148[287]     882[919]     335[267]     319[231]     662[787]   864[1050]
K&R          144[214]     889[933]     334[229]     321[221]     652[655]   865[1029]
x17          139[197]     838[848]     309[192]     300[226]     642[939]   856[1219]
x17 unroll   133[197]     821[848]     302[192]     294[226]     626[939]   823[1219]
x65599       139[213]     860[1090]    323[215]     311[222]     639[860]   839[1129]
FNV-1a       150[222]     956[1021]    369[216]     357[222]     686[655]   903[1067]


Mixed case and lower case:

x17 unrolled      1.02
x17               1.05
x65599            1.06
lookup3           1.06
Paul Hsieh        1.07
Bernstein         1.09
K&R               1.10
FNV-1a            1.18
Arash Partow      1.20
universal         1.20
CRC-32            1.23
One At Time       1.25
Weinberger        1.45


After conversion to upper case:
x17 unrolled      1.01
x17               1.04
x65599            1.06
lookup3           1.06
Paul Hsieh        1.07
K&R               1.09
Bernstein         1.10
FNV-1a            1.17
universal         1.20
Arash Partow      1.20
CRC-32            1.22
One At Time       1.25
Weinberger        1.45


x17 should be slower on strings with a lot of "\r\n" and other control characters (though I have not checked this).
Mark Dennehy,
Have you reviewed “Performance in Practice of String Hashing Functions” (Ramakrishna & Zobel, 1997) at all? They have some interesting data on the performance of various classes of hash functions. I've used their conclusions myself with good results, see here:
http://stochasticgeometry.wordpress.com/2008/03/29/cache-concious-hash-tables/
Peter Kankowski,
Austin Appleby, the author of Murmur hash, replied to me by e-mail and commented on Murmur having a larger number of collisions:

Actually, the number of collisions that Murmur (and lookup3, and any cryptographic hash) produces is in the range predicted by statistics if the hash is truly random. I don't recall the equation offhand, but your x17 does better than predicted because it's less random. :)

Statistically good distribution is important in some applications because it allows me to say "My hash function will not produce pathological results with your keyset" with some certainty even if I have no idea what your keyset is. Bob Jenkins' frog.c test is particularly good at producing pathologically bad keysets - it creates large keys that are mostly 0 bits but with a small handful of 1s - that choke most simple-but-fast hash functions. Murmur happens to hit a nice sweet spot where it is simple, fast, and still statistically strong.
Won,
Peter, thanks for doing these tests. I am very surprised at how poorly Adler32 does; an order of magnitude worse than Fletcher on Shakespeare is unexpected. Your code looks like it matches the Wikipedia version, though.


UINT32 HashAdler(const CHAR * data, SIZE_T len)
{
    UINT32 a = 1, b = 0;

    while (len > 0) {
        SIZE_T tlen = len > 5550 ? 5550 : len;
        len -= tlen;
        do {
            a += *data++;
            b += a;
        } while (--tlen);

        a %= 65521;
        b %= 65521;
    }
    return (b << 16) | a;
}
ace,
I really like MurmurHash.

But for me the shortest Murmur variant is a bit of cheating. The starts of the strings in C and C++ are *not* aligned to 4 bytes unless typical malloc or new is made for each string or each string is copied to the convenient buffer before the hash is calculated. So I'd consider MurmurHashAligned2 a real function and I'd do the timings on the set of the strings which are packed in memory one after another, without anything between them (which makes the start of each of them unaligned and not predictable). I'd also like to see the comparison between "copy to the nice buffer then do MurmurHash2" and "just do "MurmurHashAligned2" for strings never longer than e.g. 1024 bytes -- when something is longer it's certainly not probable to appear in unaligned input (this test would probably show the quality of L1 cache?).

------

The need for "statistically good distribution" is exactly the reason why "adding a constant" to each source byte should not matter, and the reason why I didn't like your claim about the goodness of "subtracting a constant" in x17.

------

Austin Appleby mentions on

http://murmurhash.googlepages.com/discussion

that he did chi-square and avalanche testing.

Chi-square tests are fundamental and should be used for most experiments. In computer science they have of course been used to check the goodness of random number generators probably for as long as those have existed.
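For illustration, the chi-square statistic over bucket counts is a short computation; this is a hedged sketch (not code from any of the linked programs):

```cpp
#include <cassert>

// Chi-square statistic for a table of bucket occupancy counts.
// With `buckets` buckets and `n` keys, a statistically good hash should
// give a value close to the degrees of freedom (buckets - 1); a much
// larger value indicates clustering.
double chi_square(const unsigned *counts, unsigned buckets, unsigned n) {
    double expected = (double)n / buckets;   // uniform expectation per bucket
    double chi = 0.0;
    for (unsigned i = 0; i < buckets; ++i) {
        double d = counts[i] - expected;
        chi += d * d / expected;
    }
    return chi;
}
```

Feeding it the per-bucket counts from any of the benchmarked functions would make the "goodness" comparison quantitative.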

One possible introduction:

http://www.fourmilab.ch/rpkp/experiments/statistics.html

and the program which uses the test to evaluate the quality of "random" stream:

http://www.fourmilab.ch/random/
Won,
FWIW, glibc malloc has been 8-byte aligned for a while, and current versions might even be 16-byte aligned. tcmalloc is 8-byte aligned.

http://code.google.com/p/google-perftools/

In addition to tcmalloc, the perftools include some interesting hash table implementations. If you can deal with the limitations, intrusive data structures can be much more efficient.
Won,
My mistake, the hash table implementations are here:

http://code.google.com/p/google-sparsehash/
Arash Partow,
Hi Peter,

Nice go for a 4th time around, but as I think we previously discussed, as far as timing is concerned, if the hash functions are not implemented in a similar fashion then it's really not a fair or meaningful experiment. The hash functions from my site are implemented in their most basic definition; for production purposes you wouldn't use them as is, you would do things like Duff's device or something similar to what Jenkins does.

Take for example the way you have unrolled my hash function: it's probably the worst way it could have been done. Furthermore, why not djb, pjw or the others? So as far as timing is concerned, unless you can get them all on the same footing, it is worthless/meaningless to mention times.

The next issue is collisions. I still don't accept your methods as being generally acceptable, though they do seem valid for most situations; as I suggested previously, it may be better to get together on this and define a real set of tests, standard inputs, and testing methodologies.

In any case, keep up the good work.
Peter Kankowski,
AC, thanks for the link to fourmilab.ch; his explanations are so clear.

x86 processors can access non-aligned dwords. My previous measurements showed that it's faster to read short strings without alignment, so that you can avoid an additional switch statement before the main loop. Alignment matters only if you are going to use the function on a different processor or to hash some very long strings.

Subtracting a constant matters because you then calculate a modulus of the hash table size, so you effectively throw off some high bits. You can save more information in low bits by subtracting ' ' (if you know that '\n', '\r', and other control characters will not appear in the hashed strings).

Certainly, x17 has nothing in common with a statistically good hash function; it just tries to "pack" more characters into a small hash value.
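A minimal sketch of the x17 idea under discussion (multiplier 17, with ' ' subtracted from each character so more information survives in the low bits before the table-size modulus); the initial value and any final mixing in the real hash.cpp may differ:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of x17: multiplier 17, subtracting ' ' (0x20) from each character.
// Assumes printable-ASCII keys without control characters, as discussed.
unsigned int HashX17Sketch(const char *key, size_t len) {
    unsigned int hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash = 17 * hash + ((unsigned char)key[i] - ' ');
    return hash;
}
```

With letters and digits mapped to small values (1..94), four characters pack into roughly 17^4 < 2^17 distinct low-bit patterns, which is exactly what survives a small table-size modulus.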
 

Won, thank you, I'm studying SparseHash now. They used quadratic probing instead of separate chaining. It would be interesting to compare these approaches (I will probably do this in future).

The first problem with Adler and Fletcher is that sum2 will be masked away when calculating the modulus. I tried replacing the last line with the following:


return a ^ b;


and got much better results:

                   Words       Win32       Numbers     Variables   Shakespeare
Adler-32 original  151 [178]   830 [1070]  382 [7688]  725 [1564]  1540 [11686]
Adler-32 (a ^ b)   147 [157]   791  [559]  251 [3624]  655  [719]  1141  [3567]


The second problem is that the characters are not "weighted" (multiplied by different numbers), so that Adler-32("01") = Adler-32("10"), that's why it fails the Numbers test. Ditto for anagrams in Shakespeare's sonnets: Adler-32("heart") = Adler-32("earth").

So, Adler-32 may be a good checksum for compressed data, but I would not use it for hash tables. Murmur is definitely better :)
ace,
> so you effectively throw off some high bits

No, you just shuffle the values of each input byte (1 becomes 225 etc).

> Certainly, x17 has nothing common with a statistically good hash function; it just tries to "pack" more characters into a small hash value.

Yes, your tests clearly demonstrate that even using simple K&R and separate chaining can often be good enough. Maybe Google searches for a stronger function only because they decided not to use separate chaining?

I haven't analyzed their code, but it can be that they didn't really have to avoid separate chaining -- that they were able to keep the same memory usage, or even have better behaviour in cases when the number of input elements is not known in advance.

I'd still enjoy to see any real life example which demonstrates the real need for statistically strong hash function, provided open addressing is avoided.
Peter Kankowski,
Speaking about quadratic probing vs. separate chaining, Google gives the following link in their sources:

http://www.augustana.ca/~mohrj/courses/1999.fall/csc210/lecture_notes/hashing.html
Won,
google-sparsehash also contains a dense hash map which is quite fast. Although, as I mentioned earlier, intrusive implementations are much faster (if you can live with the limitations).

http://www.boost.org/doc/libs/1_35_0/doc/html/intrusive/unordered_set_unordered_multiset.html

Boost, of course, has lots of interesting stuff.

I mentioned Cuckoo hashing as an alternative chaining method to Peter in an e-mail. There are variants that can sustain very high load factors.

http://en.wikipedia.org/wiki/Cuckoo_hashing

For Adler/Fletcher: ah, that makes sense. I suppose the prime-mod versions would work much better. But I don't think it is correct to say that Adler/Fletcher cannot distinguish between permutations (think about how 'b' accumulates). I also don't think a^b is a particularly good mix for Adler/Fletcher. There is the problem of how you compute the bucket from the hash. If you need N bits, maybe it makes sense to combine N/2 bits from a and N/2 bits from b. Maybe this is as simple as taking the middle N bits from an Adler/Fletcher hash, rather than the lowest N bits.
Arash Partow,
There is a nice write-up by Mitzenmacher that breaks down cuckoo hashing into nothing more than intriguing theory. It is virtually impractical/impossible to obtain the optimal, or even near-optimal, theoretical performance in practice.
Won,
My reading of Mitzenmacher (his blogs, I will read the paper) is mostly positive. Certainly there is always going to be a difference between theory and practice, but that doesn't mean there isn't a practical variation of a cuckoo hashing that is useful.

One variation is to allow for multiple items in a table entry (similar to a multi-way cache) before rehashing those elements. This variation is certainly not unique to cuckoo hashing, but seems to work well with it.

Anyway, my discussion is mostly from memory, but these postings are encouraging me to look into it myself...
Peter Kankowski,
Oh, I'm sorry, 'b' will be different for the anagrams, so the hash value will be different.

For short strings, the higher byte of 'a' is zero, so if we take the middle N bits, lower bits of 'a' will be lost. It makes sense to calculate a ^ (b << (N/2)).

I've just tried a ^ (b << 4) for these test files, and the results were comparable with Murmur:

                   Words       Win32       Numbers     Variables   Shakespeare
Adler-32 original  151 [178]   830 [1070]  382 [7688]  725 [1564]  1540 [11686]
Adler-32 (a ^ b)   147 [157]   791  [559]  251 [3624]  655  [719]  1141  [3567]
Adler-32 a^(b<<4)  146 [135]   789  [475]   98 [ 500]  627  [428]   923   [784]
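The a ^ (b << 4) recombination is a one-line change to the Adler loop Won quoted earlier; a sketch for illustration, not necessarily the exact hash.cpp code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Adler-32 with the final (b << 16) | a replaced by a ^ (b << 4), so that
// low bits of both running sums survive a small table-size modulus.
uint32_t HashAdlerMixed(const char *data, size_t len) {
    uint32_t a = 1, b = 0;
    while (len > 0) {
        size_t tlen = len > 5550 ? 5550 : len;  // deferred-modulo block size
        len -= tlen;
        do {
            a += (unsigned char)*data++;
            b += a;
        } while (--tlen);
        a %= 65521;
        b %= 65521;
    }
    return a ^ (b << 4);   // keep information from both sums in the low bits
}
```

For short strings the high byte of a is zero, so shifting b by only a few bits (instead of 16) is what keeps both sums visible after the modulus.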


Thank you very much for your ideas.
ace,
> http://www.augustana.ca/~mohrj/courses/1999.fall/csc210/lecture_notes/hashing.html

AFAIK the tables there also clearly demonstrate the superiority of chaining? See what happens when the load factor increases.

So I'd still like to see any good argument to use open addressing, especially since it demands more complex hash function only to be able to perform acceptably and then it even requires using this function more often, and behaves exponentially worse as load factor even approaches 1 (I'd call that lose-lose-exp(lose) scenario :) ).
Won,
Ace, those are some good points.

However, there are some advantages to open addressing. A non-intrusive C++ hash table implementation that uses chaining (e.g. tr1/unordered_map, based on SGI STL hash_map) has to allocate nodes to store the data + the link pointers. This means memory allocations, copies, and indirections that contribute to the constant factor for chained hash table implementations. For typical load factors (1/2 to 2/3), open addressing (google-sparsehash uses quadratic probing) can be faster.

http://google-sparsehash.googlecode.com/svn/trunk/doc/performance.html

There is also the slight advantage that open addressing can be slightly more compact, but I don't know how important this actually is. Sustaining a high load factor is probably much more important than avoiding pointer overhead, so chaining might be even better in this regard. If you really care about compactness, something like Judy might be better:

http://nothings.org/computer/judy/

An intrusive hash table implementation avoids most of the practical problems of chaining, since you avoid all those copies, allocations and indirections. These are much faster than any of the non-intrusive options, but they are not appropriate in every situation. It is not always possible to instrument classes to be used in such an application. The instrumentation overhead affects all instances, not just those in hash tables. Ownership semantics are very different, and that can lead to design subtleties (amplified by concurrency).
ace,
Won, when you refer to "intrusive implementations" do you talk about the concept where the item being owned by the container contains a placeholder for the pointer to be used by the container? If so, of course the same method can be used in chaining. So I still fail to see the advantage of open addressing?

Moreover how can the load factor which so influences open addressing be kept constant without introducing even bigger performance penalties, when even slight changes in load factors force rebuild of the whole container?

I agree with you that ownership is very important. Personally I don't think it's smart to overuse smart pointers in C++, and when somebody doesn't want to care about ownership he should not use C++ at all but some language which has "natural" GC. For C and C++, I think the best results are achieved by maintaining a distinction between owning and referring containers. Compiler writers have known about that since forever, I guess. And they certainly had to maintain a lot of hashes.

Is it possible for you to explain (if you know) what are the exact design decisions behind "sparse hash" since I fail to grasp them from the site? Only then it can be discussed if there's a possibility for improvement, or what the major contribution of that implementation is.

Thanks in advance.
Won,
I only know as much about sparse-hash that is publicly known, which is essentially everything since it is open source! But no, I don't know the story of its creation.

Open addressing does not require additional information to handle collisions, because the "chaining" is implicit, based on the probing strategy. Objects are copied into the container without need for modification or ancillary data (maybe a little bit to indicate empty/full).

Explicit chaining requires some kind of pointer. Each bucket forms a short linked-list of elements. You can implement that linked-list in several ways. The non-intrusive way (like SGI hash_map) makes a node that is essentially a pair of the object + a pointer to the next node. The intrusive way (like Boost intrusive_unordered_map) is to have a special field within the object itself to point to the next object. So for STL-compatible containers, this isn't necessarily a huge win, since they are value-oriented. However, a non-owning, pointer-oriented intrusive hash map can be very fast because it avoids almost all copies and allocations.
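The intrusive, pointer-oriented style Won describes can be sketched in a few lines; all names here (Symbol, IntrusiveTable) are hypothetical, chosen for illustration rather than taken from Boost or SGI:

```cpp
#include <cassert>
#include <cstring>

// The object itself carries the chain link, so insertion into the table
// copies nothing and allocates nothing.
struct Symbol {
    const char *name;
    Symbol *next_in_bucket;   // placeholder used only by the hash table
};

struct IntrusiveTable {
    enum { BUCKETS = 64 };
    Symbol *buckets[BUCKETS];

    IntrusiveTable() { for (int i = 0; i < BUCKETS; ++i) buckets[i] = 0; }

    static unsigned hash(const char *s) {     // any string hash would do here
        unsigned h = 0;
        while (*s)
            h = h * 31 + (unsigned char)*s++;
        return h;
    }
    void insert(Symbol *sym) {                // O(1), no memory allocation
        unsigned b = hash(sym->name) % BUCKETS;
        sym->next_in_bucket = buckets[b];
        buckets[b] = sym;
    }
    Symbol *find(const char *name) const {
        for (Symbol *s = buckets[hash(name) % BUCKETS]; s; s = s->next_in_bucket)
            if (std::strcmp(s->name, name) == 0)
                return s;
        return 0;
    }
};
```

The trade-off is exactly the one discussed: the table never owns the objects, and each Symbol pays for the link field whether or not it is currently in a table.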

BTW, ownership is always a design problem. GC does not automatically solve it for you.
ace,
I've finally read carefully the sparsehash docs to understand what the goal of the author was. He wanted to make a replacement for SGI STL hash_set which doesn't introduce too big memory overhead for the hash table infrastructure.

Specifically, it seems that his problem was that when STL hash_set is used to *own* values (of some big structures?) a lot of memory was taken by placeholders for unpopulated entries.

That's why he designed the "sparsetable", which spends (more or less) a single bit to mark a used or unused entry, and allocates space to hold only the assigned values it owns. Then he used that "sparsetable" as the underlying table for the open-addressing hash table implementation.

So his goal was to fit as many 'indexed' (with a hash function) structures as possible in memory without spending even a single pointer per structure kept, giving away performance to get the lower memory footprint.

Nice.
ace,

I've accidentally discovered an error in the implementation of "HashWeinberger." The line

h ^= g >> 24 ^ g;

is wrong, as >> has lower priority than ^, and that causes significantly worse results. The original code in the book was:

h = h ^ (g >>24);
h = h ^ g;

But even their version doesn't look too promising as the last line in their function is:

return h % PRIME;

where the prime is defined as 211. But at least that suggests that it was not meant to be used in "open addressing."


Another problem with hash.cpp: either all or none of the functions should use the "fix"

h ^ (h >> 16);

otherwise the functions are not compared under the same conditions.

Peter Kankowski,

Thanks for your comment. MSVC generates the same machine code for

h = h ^ (g >> 24);
h = h ^ g;
and
h ^= g >> 24 ^ g;

Operator ^ has lower precedence than >> (^ is lower in the precedence table). You probably changed something else in the source code that affected performance.
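The equivalence is easy to check mechanically; a small sketch (helper names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// The two spellings of the Weinberger mixing step are the same expression:
// ^ binds looser than >>, so  g >> 24 ^ g  parses as  (g >> 24) ^ g.
uint32_t mix_two_steps(uint32_t h, uint32_t g) {
    h = h ^ (g >> 24);
    h = h ^ g;
    return h;
}

uint32_t mix_one_step(uint32_t h, uint32_t g) {
    h ^= g >> 24 ^ g;
    return h;
}
```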

I tried h ^ (h >> 16) for K&R function, and it was slower this way:

             Words     Win32     Numbers  Prefix    Postfix   Variables  Shakespeare
without XOR  146[106]  884[437]  88[288]  323[94]   317[106]  657[360]   905[561]
with XOR     144[94]   894[432]  91[288]  330[110]  322[105]  660[352]   906[573]

For Larson's hash, I just forgot to try it (the XOR version is faster). Thank you for reminding me!

ace,
> ^ is lower in the precedence table

You are certainly right, my error! This was thoughtless of me, as I was used to seeing x >> y + z where the author wanted >> before +. However, ^ really is weaker than >>.

To excuse myself, this lapse occurred as I wanted to make my small contribution:

D. R. Hanson has used, in two of his books, from around 1997 up to now, e.g. here:

http://code.google.com/p/cii/source/browse/tags/v20/src/atom.c
the following hash function:
for (h = 0, i = 0; i < len; i++)
    h = (h<<1) + scatter[(unsigned char)str[i]];

And it's very poor! Your tests immediately demonstrate the (this time, real) error.

ace,
(is there any page with the new syntax for formatting comments?)
Peter Kankowski,

No problem.

The comments are plain text now. I've rewritten strchr.com from scratch, as I promised to do earlier. Real minimalism: around 1500 lines of PHP code written in a couple of weeks, no bloated CMSes or "web frameworks". All your comments were saved, as usual. Later, there will be a WYSIWYG editor for comments and a button to publish your own article :)

,
Congratulation for the new implementation!

Xkcd recently brought to my attention "Collatz conjecture" (http://en.wikipedia.org/wiki/Collatz_conjecture)

Using programming-language notation: given the following Python function (where numbers have an unlimited number of bits):
def collatz( n ):
    while 1:
        n = 3 * n + 1 if n & 1 else n >> 1
        if n == 1: break
the conjecture states that for every n the loop will eventually terminate.

One way to look at it is: the 3*n+1 step would be an acceptable hash function step where each input is a single 1. (Of course, using an unlimited number of bits is not for practical hash functions.) The n >> 1 branch removes all zero LS bits, if they exist; otherwise the "hash function" step is applied (note also that by the construction of the loop every "hash" step results in at least one zero LSb). So the conjecture can be restated: repeatedly applying the 3*n+1 transformation to a number with an unlimited number of bits, together with the transformation that removes the LS zero bits, at some moment leaves exactly 1 set bit in the number.

Of course I don't see any practical aspect of all this, except that a lot of experiments (in the Wikipedia article) show the "bit randomizing" properties of multiplication by an odd number.

And that's what's wrong in Hanson's hash: he uses an even multiplier, and in every step he loses information about the previous steps(!) Using the "scatter" mapping trick on the input byte doesn't help at all -- the multiplier is the engine, and here it is the broken one.
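The information loss is easy to demonstrate; a minimal sketch using raw bytes instead of Hanson's scatter[] table (the structural flaw of the even multiplier is the same either way):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// With the even multiplier 2 (i.e. h << 1), each character's contribution
// is only ever shifted left, so after 32 more characters it falls off the
// top of a 32-bit hash and is lost completely.
uint32_t shift_hash(const char *s, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; ++i)
        h = (h << 1) + (unsigned char)s[i];
    return h;
}
```

Any two 33-character strings that differ only in the first character collide, because that character's contribution has been multiplied by 2^32. An odd multiplier never destroys a character's contribution this way.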
ace,
(I guess that's the bug in the new web code, allowing empty name?)
Peter Kankowski,
Thanks, it's a good example of seemingly simple but unsolved problem. I will fix the empty name bug.
ace,

I suggest adding the function used by Hanson to your set of hashes in hash.cpp. The results nicely demonstrate how a faulty hash function can be designed, demonstrated in books, and used for a long time:

static unsigned long scatter[] = {
2078917053, 143302914, 1027100827, 1953210302, 755253631, 2002600785,
1405390230, 45248011, 1099951567, 433832350, 2018585307, 438263339,
813528929, 1703199216, 618906479, 573714703, 766270699, 275680090,
1510320440, 1583583926, 1723401032, 1965443329, 1098183682, 1636505764,
980071615, 1011597961, 643279273, 1315461275, 157584038, 1069844923,
471560540, 89017443, 1213147837, 1498661368, 2042227746, 1968401469,
1353778505, 1300134328, 2013649480, 306246424, 1733966678, 1884751139,
744509763, 400011959, 1440466707, 1363416242, 973726663, 59253759,
1639096332, 336563455, 1642837685, 1215013716, 154523136, 593537720,
704035832, 1134594751, 1605135681, 1347315106, 302572379, 1762719719,
269676381, 774132919, 1851737163, 1482824219, 125310639, 1746481261,
1303742040, 1479089144, 899131941, 1169907872, 1785335569, 485614972,
907175364, 382361684, 885626931, 200158423, 1745777927, 1859353594,
259412182, 1237390611, 48433401, 1902249868, 304920680, 202956538,
348303940, 1008956512, 1337551289, 1953439621, 208787970, 1640123668,
1568675693, 478464352, 266772940, 1272929208, 1961288571, 392083579,
871926821, 1117546963, 1871172724, 1771058762, 139971187, 1509024645,
109190086, 1047146551, 1891386329, 994817018, 1247304975, 1489680608,
706686964, 1506717157, 579587572, 755120366, 1261483377, 884508252,
958076904, 1609787317, 1893464764, 148144545, 1415743291, 2102252735,
1788268214, 836935336, 433233439, 2055041154, 2109864544, 247038362,
299641085, 834307717, 1364585325, 23330161, 457882831, 1504556512,
1532354806, 567072918, 404219416, 1276257488, 1561889936, 1651524391,
618454448, 121093252, 1010757900, 1198042020, 876213618, 124757630,
2082550272, 1834290522, 1734544947, 1828531389, 1982435068, 1002804590,
1783300476, 1623219634, 1839739926, 69050267, 1530777140, 1802120822,
316088629, 1830418225, 488944891, 1680673954, 1853748387, 946827723,
1037746818, 1238619545, 1513900641, 1441966234, 367393385, 928306929,
946006977, 985847834, 1049400181, 1956764878, 36406206, 1925613800,
2081522508, 2118956479, 1612420674, 1668583807, 1800004220, 1447372094,
523904750, 1435821048, 923108080, 216161028, 1504871315, 306401572,
2018281851, 1820959944, 2136819798, 359743094, 1354150250, 1843084537,
1306570817, 244413420, 934220434, 672987810, 1686379655, 1301613820,
1601294739, 484902984, 139978006, 503211273, 294184214, 176384212,
281341425, 228223074, 147857043, 1893762099, 1896806882, 1947861263,
1193650546, 273227984, 1236198663, 2116758626, 489389012, 593586330,
275676551, 360187215, 267062626, 265012701, 719930310, 1621212876,
2108097238, 2026501127, 1865626297, 894834024, 552005290, 1404522304,
48964196, 5816381, 1889425288, 188942202, 509027654, 36125855,
365326415, 790369079, 264348929, 513183458, 536647531, 13672163,
313561074, 1730298077, 286900147, 1549759737, 1699573055, 776289160,
2143346068, 1975249606, 1136476375, 262925046, 92778659, 1856406685,
1884137923, 53392249, 1735424165, 1602280572
};


UINT hashHanson( const CHAR* s, SIZE_T L )
{
    UINT h = 0;
    for ( SIZE_T i = 0; i < L; i++ ) {
        unsigned char c = s[i];
        h = ( h << 1 ) + scatter[ c ];
    }
    return h;
}

The serious weakness is visible for your "Postfix" set (the strings with the same long postfix).

Peter Kankowski,
Thanks. I just cannot remember why I commented out HashHanson in test.cpp :) It's included now, and the table in the article is sorted by the total running time.
ace,

Finally, just for completeness, I propose one more function. I named it Novak Hash, and it's basically what should have been done to keep the "scatter" approach but implement the function right (that is, not losing significant information about the previous values in every step, and reducing the size of the table):

const unsigned char* rijndaelSBox = (const unsigned char*)
"\x63\x7c\x77\x7b\xf2\x6b\x6f\xc5"
"\x30\x01\x67\x2b\xfe\xd7\xab\x76"
"\xca\x82\xc9\x7d\xfa\x59\x47\xf0"
"\xad\xd4\xa2\xaf\x9c\xa4\x72\xc0"
"\xb7\xfd\x93\x26\x36\x3f\xf7\xcc"
"\x34\xa5\xe5\xf1\x71\xd8\x31\x15"
"\x04\xc7\x23\xc3\x18\x96\x05\x9a"
"\x07\x12\x80\xe2\xeb\x27\xb2\x75"
"\x09\x83\x2c\x1a\x1b\x6e\x5a\xa0"
"\x52\x3b\xd6\xb3\x29\xe3\x2f\x84"
"\x53\xd1\x00\xed\x20\xfc\xb1\x5b"
"\x6a\xcb\xbe\x39\x4a\x4c\x58\xcf"
"\xd0\xef\xaa\xfb\x43\x4d\x33\x85"
"\x45\xf9\x02\x7f\x50\x3c\x9f\xa8"
"\x51\xa3\x40\x8f\x92\x9d\x38\xf5"
"\xbc\xb6\xda\x21\x10\xff\xf3\xd2"
"\xcd\x0c\x13\xec\x5f\x97\x44\x17"
"\xc4\xa7\x7e\x3d\x64\x5d\x19\x73"
"\x60\x81\x4f\xdc\x22\x2a\x90\x88"
"\x46\xee\xb8\x14\xde\x5e\x0b\xdb"
"\xe0\x32\x3a\x0a\x49\x06\x24\x5c"
"\xc2\xd3\xac\x62\x91\x95\xe4\x79"
"\xe7\xc8\x37\x6d\x8d\xd5\x4e\xa9"
"\x6c\x56\xf4\xea\x65\x7a\xae\x08"
"\xba\x78\x25\x2e\x1c\xa6\xb4\xc6"
"\xe8\xdd\x74\x1f\x4b\xbd\x8b\x8a"
"\x70\x3e\xb5\x66\x48\x03\xf6\x0e"
"\x61\x35\x57\xb9\x86\xc1\x1d\x9e"
"\xe1\xf8\x98\x11\x69\xd9\x8e\x94"
"\x9b\x1e\x87\xe9\xce\x55\x28\xdf"
"\x8c\xa1\x89\x0d\xbf\xe6\x42\x68"
"\x41\x99\x2d\x0f\xb0\x54\xbb\x16";

unsigned long NovakHashUnrolled( const char* s, int L )
{
    unsigned long h = 0;
    int i;
    const unsigned char* t = (const unsigned char*)s;
    for ( h = 0, i = 0; i < ( L & ~1 ); i += 2 ) {
        h = ( h << 1 ) + h + rijndaelSBox[ t[ i ] ];
        h = ( h << 1 ) + h + rijndaelSBox[ t[ i + 1 ] ];
    }
    if ( L & 1 )
        h = ( h << 1 ) + h + rijndaelSBox[ t[ L - 1 ] ];
    return h;
}

unsigned long NovakHash( const char* s, int L )
{
    int i; unsigned long h = 0;
    const unsigned char* t = (const unsigned char*)s;
    for ( i = 0; i < L; i++ )
        h = ( h << 1 ) + h + rijndaelSBox[ t[ i ] ];
    return h;
}

I just consider it "how Hanson's function should have been done"; I still believe that functions that don't use any tables are better in almost all real-life circumstances.

ace,
So, how does NovakHash compare to the rest?
Peter Kankowski,
Sorry for the delay. Your function seems to be faster than Murmur with similarly low number of collisions. Congratulations! :)
ace,
A link which ends with a C# code that draws pictures of mixing quality of different hashes:

Bret Mulvey
Evaluating hash functions
http://home.comcast.net/~bretm/hash/
Andrew M.,
You can speed up the CRC32 code by expanding the lookup table and processing a dword at a time. Breaks it on big endian systems, though: http://codepad.org/RqZGx7xn

The downside to just testing general use cases is that programmers who don't know any better will see a hash function that works for a particular situation, spread it around, and it will end up being used for things it shouldn't! e.g. the huge number of hash functions that exist now, many of them in wide use despite being pretty bad outside of a certain range of keys. Not to say that I know of a good universal metric, but work towards one would be good.
Peter Kankowski,

Thank you very much! CRC got the 4th place with your optimization. Its results are very balanced (a low number of collisions; no spikes on any test).

AFAIK, the best we can do is to test the functions on different sets of strings (both real-world and synthetic). I tried to find the sets that cover most common situations and are very different statistically: English words, function and variable names, sequential numbers, etc. If you have other ideas, I will be happy to add more tests.

Andrew M.,
CRC32 is a good example of what screws up theoretical tests. It has 0% avalanche yet it performs very well on many sets of keys, so merely going off of avalanche is not that useful, which makes me hesitant to advocate any of the theoretical tests.

For a more "real world" test, a list of UTF-8 words with multi-byte encodings would be good. I've also found that a binary list of IPs is good at screwing up hash functions, but it will need special consideration for loading, as the keys can contain nulls and carriage returns.
Peter Kankowski,
Thanks. I will try IP addresses and UTF-8 (German and French letters with accents, Russian words, etc.) and will publish the results later.
Peter Kankowski,
Thanks, he removed the book chapters from his site. I updated the link.
ace,
Initial value of the h matters!
I have reconsidered Novak Hash and I'd like to update it: instead of initializing with h = 0, I'd like h to be initialized with h = 1. The reason is that one of the values in the S-box is 0, so when strings of different lengths consisting only of the byte that maps to zero are hashed, a pathological case occurs.
A similar pathological case can be constructed for every hash that has a multiplicative constant, if the initial value of h is 0. E.g. x17, which subtracts 32 from each byte, degrades to the pathological case when a lot of strings consisting only of spaces, of different lengths, are hashed -- all of them will map to the same slot!
Andrew M.,
If you're going to play the constructing collisions game, initializing h to 1 isn't going to help at all! (It's also common to use the length of the data to be hashed as an initializer to somewhat "avoid" the same hash for a repeating string that causes a zero case).

Yes, R, RR, RRR, etc will not hash to the same value with h = 1, but existing collisions will still collide. Set h to whatever you want and hash "novak" and "qovAk", they'll both have the same hash.

You can continue plugging holes, but you'll only slow it down. If you really want to go the Sbox route, wouldn't a full 32 bit sbox like http://home.comcast.net/~bretm/hash/10.html be better than single bytes?
ace,
Andrew M, you're right that some collisions will remain! Of course, hash functions must have collisions by definition. I agree that a 32-bit S-box is certainly a safer bet. Still, selecting a hash function, if it's not a cryptographic one, is just an exercise in engineering -- trying to find a local "optimum" between different trade-offs. A run of the same bytes (and especially a run of zeroes) is something very common, and something that can often be passed even to a simple hash function, e.g. when some compression algorithm is attempted.
What can be considered a "good enough minimum" is the most interesting question indeed! When assemblers and compilers had the limitation of only 6 letters per symbol, even just summing the letters was considered "good enough." Now, it appears that demanding the full cryptographic "avalanche" effect on all bits is not always necessary -- I'd really like to know at which point it becomes important. Avoiding some "obvious" problems with the simplest functions should help.
That's the idea with which I experimented with the 8-bit S-box: to find out whether something can be "good enough" with a smaller cache footprint (and something that's maybe more convenient for embedded-systems limitations? I also prefer "endianness-independent" implementations). Still, for my taste, avoiding tables entirely can be better whenever the function is not invoked very frequently -- tests with a "full" cache don't reflect a lot of the typical usage scenarios I can imagine.
I'd really like to know where the limits of every function are.
I believe fixing the simple multiplicative functions for some common scenarios with a different initial value can be a "good enough" solution for some usages?
BTW, I also know that AES primitives are being increasingly implemented in hardware. I admit I haven't checked if S-Box mapping is actually part of the primitives implemented by Intel?
I know CRC32 is implemented in some new Intel CPU's, and I'm very interested to see how your implementation compares to the one using the specialized instructions.
Peter Kankowski,

Andrew, I've added the tests with UTF-8 and IP addresses (in binary). Fletcher's hash is terrible for UTF-8 texts because of repeated first bytes in multi-byte sequences. I did an additional test with a Russian novel, and Fletcher was even worse. The results of other functions were not very different from the previous tests.

Ace, I've updated Novak hash and x17 (h = 1). AES instructions implement the whole round of encryption, not just substitution using S-boxes.

CRC32 is implemented in Core i5-i7 processors, but with the iSCSI polynomial, so it is useless for ZIP, PNG, MPEG-2, and many other formats, which use a different polynomial. Though it does not matter much for a hash function which polynomial to use. I will probably try accelerated CRC32 later :)

Andrew M.,
Have you done the SSE CRC32 testing yet? I want to get an i7 to try it out, but can't justify it since my E8400 is still way too fast.

Could any of the AES instructions be used for hashing too? I can't seem to find any speed comparisons for the SSE CRC32/AES instructions at all.
ace,
I believe, Andrew, that you wouldn't notice much improvement over the E8400 unless you can actually use more cores at once.
As Peter linked, AES instructions seem to be too heavy for hashing, as they do much bigger work, which is not surprising, the substitution alone is very effective as soon as the table is in cache.
Peter Kankowski,

I've just done the tests on Core i5 processor (see the results above). Hardware-accelerated CRC is the fastest hash function; x17 is slow on this processor for some reason.

I've also optimized the handling of remaining bytes in your CRC32 code (using two conditions instead of the loop). It's slightly faster this way.

Andrew M.,
No, my E8400 doesn't support SSE 4.2 (CRC32/AES instructions) at all! They were introduced with the i7. Otherwise I would have done some benchmarking on them myself.

The hardware CRC32 numbers are fairly impressive. That is why I wondered about AES, if any of the ops are fast enough then they could be quite good.
Peter Kankowski,
I would like to see the AES results, too. Unfortunately, my i5 has no support for AES instructions. If anybody else can do the benchmark, please drop a comment here.
ace,
Andrew, re "unless you really need cores" I was referring to your "can't justify it since my E8400 is still way too fast."

Here's what I've found about the speedup of CPU-accelerated AES compared to non-accelerated AES: it is approximately "just" a factor of 4, at least according to:

http://wiki.debianforum.de/BenchmarkFestplattenverschlüsselung

and according to Intel, up to 10 for multi-core scenario:

http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/

As an AES-NI instruction does a whole round on 16 bytes, and there is a lot of processing in one round yet the typical speedup is only 4, I believe AES-NI is still irrelevant for the simple kind of hashing that's the subject here.

Peter, do you have then I5-750 instead of any of I5-6*'s (which are all supposed to have AES-NI, at least according to: http://processorfinder.intel.com/List.aspx?ProcFam=3155)?

I didn't know there are I5's without AES-NI at all until now.
Peter Kankowski,

AES-NI results are impressive. As I found, it's supported by PGP, DiskCryptor, and other popular encryption software. GnuPG does not use the hardware acceleration; most likely, because they want to stay portable.

Yes, I have Core i5 750 (four cores, SSE 4.2, but no AES-NI).

P.S. Here is a new article with detailed CRC32 benchmarks.

Alexander,
See some hash tests here (Russian):
http://amsoftware.narod.ru/algo.html
ace,
Your tests are very insightful, thanks Alexander.
Vaclav,
Different hashing approaches for textual and geometric data can be found at:
http://herakles.zcu.cz/~skala/publications.htm
especially:
http://herakles.zcu.cz/~skala/PUBL/PUBL_2010/2010_WSEAS-Corfu_Hash-final.pdf
http://herakles.zcu.cz/~skala/PUBL/PUBL_2010/2010_Corfu-NAUN-Hash.pdf

Perhaps they will be helpful.
ace,
If I understand what Alexander did (by looking at his code), he counted a collision whenever some 32 bits of a hash function's result are the same for different input words (and I think he somewhere even used more bits). In such tests, "Novak Hash" produced more of the same values than most of the others. That nicely demonstrates the engineering trade-off that must be considered when using simple hash functions: for some uses you have to know how many resulting bits of a function are "quality bits" for different sizes of the input sets, unless you know that you can trust all the bits (which you can if the function is constructed on some stronger mathematical or cryptographic ideas). Also note that in some uses you actually don't care about the bits you don't use, and the absence of collisions in the whole 32 bits (or more) doesn't help you if you need far fewer buckets and all the results hit only a few of them. I'm curious how these interesting aspects of the functions can be nicely visualized.
Peter Kankowski,
I added his functions to the benchmark. The results: MaHash8, MaHash4, and MaHash11 (the winners in his benchmark) have a small number of collisions, but perform too many operations per character, so they are slow. MaPrime2c looks more interesting.

About visualization: I wonder how to do it, too :) If we plot the low 16 bits on the x axis and the high bits on the y axis, there will not be enough pixels to display them on a web page (we would need 2^16 × 2^16 pixels).
ace,
Visualization: I guess it should be done this way: create input sets of different sizes, certainly going up to a much bigger size than what you used. Calculate the hash for each value. Determine the number of collisions for different numbers of used bits. That's the curve, with the number of collisions on y and the number of used bits on x. Draw the curves of several functions on one graph for one input set size, and do this for more input set sizes. I guess not too many input sizes should be needed. For every input size, the set should have enough examples of the values which demonstrate problems; you recognized some typical cases like prefix, postfix and numbers. I guess a single set for a given size, but with a good proportion of these, can be good enough to demonstrate the quality.

I know it's not needed to do that when some specialized solution is considered, but it's interesting playing with the experiments that are supposed to give us more insight about the behavior of the functions.
Georgi 'Sanmayce',
Hi to all,

I hope it would be interesting to you to see FNV-1A (13-bit) used in my intoxicatingly fast word-list ripper Leprechaun at http://encode.ru/threads/612-Fastest-decompressor!?p=22184&viewfull=1#post22184, where a word list of 22,202,980 Latin-letter words is processed.
See also 'FNV1A_Hash_Granularity' at http://encode.ru/threads/612-Fastest-decompressor!/page3.

I salute the host Peter for this site: it is most comprehensive and useful for sure, keep going.
Georgi 'Sanmayce',
Hi Peter,

after some minor changes in your Hash17_unrolled, here comes 'Alfalfa', tuned for Latin-letter words, i.e. alpha strings between 1 and 31 chars.
This variant behaves surprisingly well when 13 bits (8,192 slots) are used.

With my testbed (22,202,980 distinct words), which uses a 3-level hash (1st: 26 slots for the first letter; 2nd: 31 slots for string length; 3rd: 8,192 slots given to some multiplicative hasher), Alfalfa outperforms Hash17_unrolled by 18,677,243 - 18,645,799 = 31,444 fewer collisions.
In my opinion, every hash variant must be tuned specifically for a given data set and be regarded as a potential gem (not as a common stone only because it performed poorly on another set).

// Word count: 35,271,297 of them 22,202,980 distinct
// Number Of Trees(GREATER THE BETTER): 3525737
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18677243
// Maximum Attempts to Find/Put a WORD into a Binary-Search-Tree: '38'
// Total Attempts to Find/Put WORDs into Binary-Search-Trees: 117,612,984
// Total Number of LEAFs in Binary-Search-Trees(GREATER THE BETTER): 8,056,968
// Perfectly-Balanced-Binary-Search-Tree for MaxNODEs = 90 must have PEAK = 7 = rounding down of integer (1+lb(90))
// Binary-Search-Tree(1st out of 2) with MaxNODEs = 90 has PEAK = 26 and LEAFs = 23
// Binary-Search-Tree(1st out of 1) with MaxPEAK = '38' has NODEs = 60 and LEAFs = 13
// Binary-Search-Tree(1st out of 3) with MaxLEAFs = 27 has NODEs = 84 and PEAK = 22
int Hash17_unrolled(char *key, int wrdlen)
{
	int hash = 1;
	int i;
	for(i = 0; i < (wrdlen & -2); i += 2) {
		hash = (17) * hash + (key[i] - ' ');
		hash = (17) * hash + (key[i+1] - ' ');
	}
	if(wrdlen & 1)
		hash = (17) * hash + (key[wrdlen-1] - ' ');
	return ( hash ^ (hash >> 16) ) & 8191;
}

// Word count: 35,271,297 of them 22,202,980 distinct
// Number Of Trees(GREATER THE BETTER): 3557181
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18645799
// Maximum Attempts to Find/Put a WORD into a Binary-Search-Tree: '37'
// Total Attempts to Find/Put WORDs into Binary-Search-Trees: 116,908,873
// Total Number of LEAFs in Binary-Search-Trees(GREATER THE BETTER): 8,076,977
// Perfectly-Balanced-Binary-Search-Tree for MaxNODEs = 85 must have PEAK = 7 = rounding down of integer (1+lb(85))
// Binary-Search-Tree(1st out of 1) with MaxNODEs = 85 has PEAK = 19 and LEAFs = 18
// Binary-Search-Tree(1st out of 2) with MaxPEAK = '37' has NODEs = 49 and LEAFs = 10
// Binary-Search-Tree(1st out of 1) with MaxLEAFs = 28 has NODEs = 83 and PEAK = 25
int Alfalfa(const char *key, int wrdlen)
{
	int hash = 7;
	int i;
	for(i = 0; i < (wrdlen & -2); i += 2) {
		hash = (17+9) * ((17+9) * hash + (key[i])) + (key[i+1]);
	}
	if(wrdlen & 1)
		hash = (17+9) * hash + (key[wrdlen-1]);
	return ( hash ^ (hash >> 16) ) & 8191;
}
A suggestion: adding one more column to your result table, containing all Latin-letter words encountered in wikipedia-en-html.tar.wrd (12,561,874 distinct words in 146,973,879 bytes), would give a better look at collision performance under heavy (real) loads, alongside the lightweight Shakespeare's sonnets, don't you think?
If you are interested, my testbed is here: http://www.sanmayce.com/Downloads/Leprechaun_r13+++++_EXE+ELF_vs_Wikipedia.zip
Peter Kankowski,
Thanks! I will try your testbed to see if it will be different from the results on smaller datasets. I will also try the visualization suggested by Ace.
ace,
As 2 ** 24 ~ 16e6, I think it should be possible to count all hash collisions of the 12e6 en.wikipedia words on a 32-bit computer. Maybe using, for example, 32, 24, 19, 13 and 9 bits of the hash, and comparing the differences between the numbers of collisions (but with smaller sets for fewer bits), can be a good way to demonstrate the potential problems that are relevant in practice too. And I'd add a serious number of prefix, suffix, number and binary-number values to each test set.
ace,
I also vote for SBoxHash presented by Bret Mulvey: http://home.comcast.net/~bretm/hash/10.html to be in the set of the evaluated functions.
Georgi 'Sanmayce',
Hi Peter,

regarding speed, I offer two extra-fast hashers (tuned for 8,192 slots):

// Number Of Trees(GREATER THE BETTER): 3550665
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18652315

int Alfalfa_HALF(const char *key, int wrdlen)
{
	int hash = 12;
	int i;
	int j;
	for(i = 0; i < (wrdlen & -4); i += 4) {
		hash = (( ((hash<<5)-hash) + key[i] )<<5) - ( ((hash<<5)-hash) + key[i] ) + (key[i+1]);
		hash = (( ((hash<<5)-hash) + key[i+2] )<<5) - ( ((hash<<5)-hash) + key[i+2] ) + (key[i+3]);
	}
	for(j = 0; j < (wrdlen & 3); j += 1) {
		hash = ((hash<<5)-hash) + key[i+j];
	}
	return ( hash ^ (hash >> 16) ) & 8191;
}

// Number Of Trees(GREATER THE BETTER): 3552103
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18650877

#define FNV1_32_INIT ((u_int32_t)2166136261)
#define FNV1_32_PRIME ((u_int32_t)588411137)

#define FNV_32A_OP(hash, octet) \
    (((u_int32_t)(hash) ^ (u_int8_t)(octet)) * FNV1_32_PRIME)

#define FNV_32A_OP32(hash, octet) \
    (((u_int32_t)(hash) ^ (u_int32_t)(octet)) * FNV1_32_PRIME)

int FNV1A_Hash_4_OCTETS(const char *str, int wrdlen)
{
	u_int32_t hash;
	const char *p;
	int wrdlen_QUADRUPLETS = wrdlen >> 2;

	hash = FNV1_32_INIT;
	p = str;

	// The goal of stage #1: to reduce the number of 'imul's and, mainly, the number of loops.

	// Stage #1: whole dwords
	for (; wrdlen_QUADRUPLETS != 0; --wrdlen_QUADRUPLETS) {
		hash = FNV_32A_OP32(hash, *(const u_int32_t *)p);
		p = p + 4;
	}

	// Stage #2: remaining 0-3 bytes
	for (wrdlen_QUADRUPLETS = 0; wrdlen_QUADRUPLETS < (wrdlen & 3); wrdlen_QUADRUPLETS += 1) {
		hash = FNV_32A_OP(hash, *p);
		++p;
	}
	return ((hash>>16) ^ hash) & 8191; // 0..8191, i.e. 2^13 = 8192
}

Regarding collisions, it's up to your testbed; nevertheless, I expect FNV1A_Hash_4_OCTETS to be a gem.
Peter, if you find it useful, please put it along with the generic FNV-1A.
Peter Kankowski,
I'm very sorry for the delay. First, the visualization.

Poorly performing functions are Fletcher and Novak. Not-so-good functions: Weinberger, x17, and Paul Hsieh. For this test, I used Georgi's list of all words in English Wikipedia (12.5 million words). You can try other word lists (the results will be similar) by running the benchmark with /c command-line switch and processing the output with get_google_chart.py script.

For a small number of bits, all functions are equally poor because of the birthday paradox. I didn't implement 32-bit hashing (it would require a rewrite of the benchmarking code), but the results should be similar to the above.

Georgi, I followed your suggestion and benchmarked a large word list (wikipedia-en-html.tar.wrd). Thanks for sharing your testbed.

Hardware-accelerated CRC and Murmur are the winners again. Novak and Fletcher are at the bottom of the list because of their high numbers of collisions.

Your unrolled version of FNV1A showed very good performance on small data sets. However, it has a high number of collisions in the Numbers and IPv4 tests, so I cannot recommend it for general usage. The multiplications in the original FNV "mix up" the input characters to achieve an avalanche effect. Unfortunately, when we remove some of them, this effect is lost.

About tuning for a given data set: there is perfect hashing (no collisions at all). It's used in parsers and other applications where the list of words is fixed. For example, imagine a hash table for C++ keywords (if, else, for, class, etc.).

In the end, the winners didn't change: Murmur2 and hardware-accelerated CRC32 are the fastest hash functions. SBox looks promising, too.

ace,
If I use your data from the "large data set" table, and calculate percentages of regression for speed and for the number of collisions, I get:

Speed in percentage of regression (reference: the fastest non-hardware based):
==============================================================================

iSCSI CRC -11.27
Murmur2 0
x65599 1.75
Paul Larson 2.75
Alfalfa 3.35
FNV-1a 5.38
Hanson 8.55
SBox 11.27
CRC-32 11.52
K&R 11.82
x17 unrolled 14.42
Bernstein 14.43
Alfalfa_HALF 15.95
Sedgewick 17.17
lookup3 19.09
Paul Hsieh 19.44
Ramakrishna 26.11
MaPrime2c 30.04
One At Time 31.11
Arash Partow 31.78
Weinberger 44.85
FNV1A_unrolled 45.24
Novak unrolled 231.91
Fletcher 247.25

Collisions in percentage of regression (reference: the best):
=============================================================

Bernstein 0
CRC-32 0.04
Alfalfa_HALF 0.15
iSCSI CRC 0.17
Paul Larson 0.28
Sedgewick 0.31
FNV-1a 0.34
Murmur2 0.35
K&R 0.43
SBox 0.47
MaPrime2c 0.49
Arash Partow 0.5
lookup3 0.51
One At Time 0.66
Ramakrishna 0.92
x65599 1.38
Hanson 2.68
Paul Hsieh 5.11
Alfalfa 5.88
x17 unrolled 16.22
Weinberger 70.72
FNV1A_unrolled 193.48
Novak unrolled 204.62
Fletcher 336.97

Looking at it this way, Murmur2 is less than 3% faster than Larson's, which uses a simple constant: 101. Then, looking at the number of collisions, there are a lot of functions which differ by less than 1%, and Larson's again scores very well. So my current conclusion would be: if I had to remember something for hashing up to a few million English word-like entries, I'd just remember Larson's constant, 101 (decimal), and stop worrying!

The graph is nice. The biggest visual insight is how the so-called "FNV1A_unrolled" becomes very bad past some fixed point, and how much worse than most of the others some functions are.

What the graph shows is just the dependency of the number of collisions on the table size (table size == 2 ** bits) for different functions and one fixed large input set, and it's not unexpected that you can't see much for tables significantly smaller than the input set. That's why I suggested something else: designing a procedure that always produces an input set for a given number of input elements.

I'd suggest that the set would have: words (40%); fixed string + words (20%); words + fixed string (20%); pure binary numbers, increasing (20%). Not because it's a good representation of any real input set, but because we've already seen that it nicely amplifies the weaknesses. Then the goal is to select enough words from the 12 million for any given target size. I'd sort the Wikipedia words with a sort function that uses the number of letters as the highest criterion, and then I'd simply take the first N (where N is target table size * 0.4 for the first subset, etc.). Then, my initial idea was to fix the number of input set elements and draw a separate graph like yours for each such input set. That would allow visual comparison of the functions even for smaller input sets, where we don't "see" much now.
Georgi 'Sanmayce',
Thank you Peter,
very informative and well done (especially the graphical chart).

Allow me to give some of my thoughts regarding collisions:
Looking at the chart, I see no difference between the functions up to 22 bits, which is something new for me and at the same time very good (I expected 16 bits to be the upper limit for good behavior), because in my view 13-20 bits is the most targeted range.

And for future tests (a 64-bit compiler is needed because long long is used), it would be interesting to show the performance of 'FNV1A_Hash_Granularity' when only 'FNV1A_Hash_Granularity(wrd, wrdlen>>3, 3);' invocations are used for all lengths. I have made tests with 'FNV1A_Hash_8_OCTETS' with 32-bit (not 64-bit) multiplications, and the speed was crazy, but with high collisions (due to lost carries).

Below is the new 'Alfalfa_DWORD': your 'x17 unrolled' function boosted even more, with the best collision result on my biggest (22,202,980 Latin-letter words) test, and very fast too.

// Caution: big/little endian dependent. Here Little-endian is used.
// More instructions but one memory(DWORD) access instead of 4 memory(BYTE) accesses.

1706 int Alfalfa_DWORD(const char *key, int wrdlen)
1707 {
1708         unsigned long hashDWORD, hashAlfalfa, iAlfalfa, jAlfalfa;
1709         hashAlfalfa = 7;
1710         for(iAlfalfa = 0; iAlfalfa < (wrdlen & -4); iAlfalfa += 4) {
1711             hashDWORD=*(unsigned long *)(&key[iAlfalfa]);
1712             hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashDWORD>>0)&0xFF) ) + ((hashDWORD>>8)&0xFF);
1713             hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashDWORD>>16)&0xFF) ) + ((hashDWORD>>24)&0xFF);
1714         }
1715         for(jAlfalfa = 0; jAlfalfa < (wrdlen & 3); jAlfalfa += 1) {
1716             hashAlfalfa = (17+9) * hashAlfalfa + key[iAlfalfa+jAlfalfa];
1717         }
1718         return ( hashAlfalfa ^ (hashAlfalfa >> 16) ) & 8191;
1719 }

/*
PUBLIC	_Alfalfa_DWORD
; Function compile flags: /Ogtpy
_TEXT	SEGMENT
_key$ = 8						; size = 4
_wrdlen$ = 12						; size = 4
_Alfalfa_DWORD PROC
; Line 1707
	push	ebx
; Line 1710
	mov	ebx, DWORD PTR _wrdlen$[esp]
	push	esi
	mov	esi, ebx
	xor	edx, edx
	and	esi, -4					; fffffffcH
	push	edi
	mov	edi, DWORD PTR _key$[esp+8]
	mov	ecx, 7
	jbe	SHORT $LN4@Alfalfa_DW
	push	ebp
	npad	6
$LL6@Alfalfa_DW:
; Line 1711
	mov	eax, DWORD PTR [edx+edi]
; Line 1713
	imul	ecx, 26					; 0000001aH
	movzx	ebp, al
	add	ebp, ecx
	imul	ebp, 26					; 0000001aH
	mov	ecx, eax
	shr	ecx, 8
	and	ecx, 255				; 000000ffH
	add	ebp, ecx
	imul	ebp, 26					; 0000001aH
	mov	ecx, eax
	shr	ecx, 16					; 00000010H
	and	ecx, 255				; 000000ffH
	add	ecx, ebp
	imul	ecx, 26					; 0000001aH
	shr	eax, 24					; 00000018H
	add	edx, 4
	add	ecx, eax
	cmp	edx, esi
	jb	SHORT $LL6@Alfalfa_DW
	pop	ebp
$LN4@Alfalfa_DW:
; Line 1715
	xor	eax, eax
	and	ebx, 3
	mov	esi, ebx
	jbe	SHORT $LN1@Alfalfa_DW
	add	edx, edi
$LL3@Alfalfa_DW:
; Line 1716
	movsx	edi, BYTE PTR [edx+eax]
	imul	ecx, 26					; 0000001aH
	inc	eax
	add	ecx, edi
	cmp	eax, esi
	jb	SHORT $LL3@Alfalfa_DW
$LN1@Alfalfa_DW:
; Line 1718
	mov	eax, ecx
	shr	eax, 16					; 00000010H
	pop	edi
	xor	eax, ecx
	pop	esi
	and	eax, 8191				; 00001fffH
	pop	ebx
; Line 1719
	ret	0
_Alfalfa_DWORD ENDP
_TEXT	ENDS
*/
Peter Kankowski,
Georgi, unfortunately, Alfalfa_DWORD wasn't faster than Alfalfa in my test (the generated assembly code was exactly the same as yours).

A 64-bit benchmark would be interesting, and I will do it in future; thank you for the idea. Most functions are tuned for 32 bits, but some should be fast in 64-bit mode.

There is no difference between hash functions for a small number of bits, because the benchmark attempts to squeeze 12.5 million words into 2^21 ≈ 2 million cells. It's my fault :( Not only the hash table size, but also the input set size should be reduced, as Ace proposed.

Ace, it would be easier if you do the visualization as you wanted. You can freely use my code and Google Charts API to draw the graph. I'm very interested in the results and you can publish them here.

I agree that Larson's hash scores well on most tests, and it's very simple. Thanks for the scaled results on the large data set, they are much more readable.

ace,
Something completely different: a visualization of the change in positions of the functions on Pentium M and then on i5 for your initial tests:

svg (I use Opera, haven't tried others): http://pastebin.com/raw.php?i=btMpwEkb

png: http://i54.tinypic.com/2ryse1w.png

What that says to me is that your original tests contain too much "noise" compared to the "signal" (at least for my taste!). But they still helped detect some serious problems in some of the functions, and all together some very insightful steps came as the topic developed. Peter, thank you for maintaining this topic, its educational value is really high!
ace,
pm vs i5 vs 12e6 svg: http://pastebin.com/raw.php?i=t4Bsc0KW
Georgi 'Sanmayce',
Yes Peter, it is weird: 'Alfalfa_DWORD' and 'Alfalfa_HALF' are capricious, and I don't have enough knowledge to figure out where the bottleneck lies.
I guess if it were written in assembly (to avoid those ugly BYTE extracts) the picture would be different. Also, due to short strings (the average word length in Wikipedia-en is 10 bytes: 2 loops within the first cycle, which could be unrolled too, but I don't like such 'tuning'), the speed of 'Alfalfa_DWORD' and 'FNV1A_Hash_4_OCTETS' suffers.

When multi-millions of keys are involved, I firmly believe that a cascade of hash functions must be used to ensure a good (no worst case allowed) distribution/dispersion; then 22 bits are quite enough. In my case, instead of using some 24 bits, I chose 5 bits / 5 bits / 13 bits. Here the principle 'know your data' comes as a rule. As for the birthday paradox, it supports such an approach fully. In short: an application designed for speed must not rely on some kind of general-purpose function, but on uncompromising/tuned ones.

Talking of FNV variants (very well defined at Mr. Noll's site, the home of FNV), I think there is a niche where even a 128-bit variant (with greedy 64-bit multiplications used similarly to the 32/64-bit counterpart, in order to gain speed as in 'FNV1A_Hash_8_OCTETS') would be interesting to explore/test.
Peter Kankowski,
Ace, you are welcome! :) The original tests are okay, but the processors are just different. Here are the results of "Wikipedia" test on Pentium M:
Function          Wikipedia              Scaled   (collisions in brackets)
x17 unrolled      11468094 [2410605]     1.00 [1.16]
Alfalfa_HALF      11700704 [2077426]     1.02 [1.00]
K&R               11770971 [2083145]     1.03 [1.00]
Bernstein         11885598 [2074237]     1.04 [1.00]
Alfalfa           11967569 [2196163]     1.04 [1.06]
Paul Larson       12039531 [2080111]     1.05 [1.00]
Alfalfa_DWORD     12106308 [2196163]     1.06 [1.06]
Sedgewick         12264250 [2080640]     1.07 [1.00]
x65599            12220786 [2102893]     1.07 [1.01]
Ramakrishna       12323472 [2093253]     1.07 [1.01]
Arash Partow      12374518 [2084572]     1.08 [1.00]
CRC-32            12738877 [2075088]     1.11 [1.00]
Murmur2           12867441 [2081476]     1.12 [1.00]
Hanson            12802025 [2129832]     1.12 [1.03]
SBox              13004625 [2084018]     1.13 [1.00]
lookup3           12941665 [2084889]     1.13 [1.01]
FNV-1a            13035702 [2081195]     1.14 [1.00]
Paul Hsieh        13155169 [2180206]     1.15 [1.05]
MaPrime2c         13537606 [2084467]     1.18 [1.00]
One At Time       13827753 [2087861]     1.21 [1.01]
Weinberger        14758921 [3541181]     1.29 [1.71]
FNV1A_unrolled    20442267 [6082342]     1.78 [2.93]
Fletcher          37899467 [9063797]     3.30 [4.37]
Novak unrolled    38123202 [6318611]     3.32 [3.05]

As you can see, they are very different from the results on Core i5. So the reason is differences between microarchitectures.

ace,
Thanks for the Pentium M Wikipedia-words tests! Now we have even more to compare! For the things I do professionally, the differences between architectures are easily observable but basically don't change most of the scores, except in some very specific cases, like the handling of unaligned memory access. I just wanted to point out that the initial test scores, as they are visible now, don't seem to directly give a clear answer as to which functions are good, only which are obviously worse. Looking at all 4 speed scores, it's not immediately clear what would be the best choice, but we clearly don't want to think about a new function every time the processor changes. Much more convenient is taking one that is always "good enough."

I'm an engineer. Here's how we solve such things: everything that's inside of a 10% span can be considered the same! We should first drop the functions that we know are obviously bad, even when they are good on some special occasions. That means Novak, Fletcher, FNV1A_unrolled, and Weinberger must go, but also Hanson (interesting: the nature of the Wikipedia-words-only input set is quite forgiving to the Hanson function, which we know loses bits in every iteration), and Hsieh and x17 according to the collisions graph. Then, from the remaining ones, we keep those that are inside of 10% starting from the first remaining one and consider them *the same*; then those inside of the next 10%, considered the same but belonging to the lower group; the rest we drop. We repeat this in all 4 cases. Which are then always among the first 10%? Which are sometimes in the top 10% and sometimes in another 10%? We need to know only 4 resulting groups. I proposed the "initial drop" group. Now let's make the "always in top 10%", "less good" and "even less good" groups. Which function is where?

(If this approach works, we can then generate the tests on a few more platforms and use these scores too. Again the winner is any function that is still in "always top 10%" (if there is such, if not, then those in "less good" etc). The processors as MIPS and ARM are certainly something that is more different than all x86. I can run something on my MIPS-based router, but only if it doesn't use more than around 10 MB of RAM and can be compiled with GCC.)
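The banding rule described above can be sketched in a few lines of C (the function name `band_scores` and its interface are illustrative, not taken from the benchmark code):

```c
#include <stddef.h>

/* Group scores (sorted ascending, best first) into bands: each band
   spans at most 10% above its first member. out[i] receives the
   1-based band number of scores[i]. Illustrative sketch only. */
void band_scores(const double *scores, size_t n, int *out)
{
    int band = 1;
    double base;
    size_t i;
    if (n == 0)
        return;
    base = scores[0];
    for (i = 0; i < n; ++i) {
        if (scores[i] > base * 1.10) { /* start a new 10% band */
            ++band;
            base = scores[i];
        }
        out[i] = band;
    }
}
```

With scaled times 1.00, 1.05, 1.12, 1.15, 1.30 this yields bands 1, 1, 2, 2, 3, matching the "same inside 10%" idea.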
Georgi 'Sanmayce',
In my pre-previous comment I said something stupid:
"Looking at the chart I see no difference for all functions up to 22bits, which is something new for me ...",
I was thinking of my own approach and talking about something else/obvious; from time to time I do such blunders.
Of course, when 4,194,304 slots are given with 12,561,874 incoming keys, the resultant collisions equal (12,561,874-4,194,304)..(12,561,874-1).
What was on my mind: the slot[s] out of 6,602,752 with the MAXIMUM number of collisions. My explanation for such thinking failures: a mind troubled by anxiety.

Again, the thing I am/was interested in: to examine/find the slot[s] with MAXIMUM collisions (in fact, the depth) for each bucket (hash table); this is crucial for any further processing, whether linked lists, binary search trees, B-trees, or another hash.
In my view the useful info (even when keys <= slots) is not the hash table utilization but the depth (how deeply it is nested). For example, if 2^20 slots are used with 2^24 keys, I care about the peak: the ideal case is 2^4 depth, or 16 layers of one megabyte each. If 2^13 slots are used with 2^12 keys, I care not so much about zero collisions as about there not existing a slot with depth > some_THRESHOLD; of course, the primary goal is to hash the keys at the highest possible speed.
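The depth metric described above is easy to compute once per-slot item counts are collected; here is a minimal sketch (the helper `max_bucket_depth` is hypothetical, not part of the benchmark):

```c
#include <stddef.h>

/* Return the deepest bucket (maximum chain length) of a table with
   nslots slots, given the per-slot item counts. This peak "depth"
   bounds the worst-case lookup chain, which is what matters for any
   further processing (linked lists, search trees, another hash). */
size_t max_bucket_depth(const size_t *counts, size_t nslots)
{
    size_t depth = 0;
    size_t i;
    for (i = 0; i < nslots; ++i)
        if (counts[i] > depth)
            depth = counts[i];
    return depth;
}
```

In the 2^20-slots/2^24-keys example above, an ideal function would keep this value near 2^4.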

Peter, if you find it useful (for the sake of gaining more experience), please include this one as well (as 32-bit and/or 64-bit):

// Number Of Trees(GREATER THE BETTER): 3557181
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18645799

// Caution: big/little endian dependent.
// More instructions, but one memory (QWORD) access instead of 8 memory (BYTE) accesses.

1736 int Alfalfa_QWORD(const char *key, int wrdlen)
1737 {
1738         unsigned long long hashQWORD;
1739 //      unsigned long iAlfalfa, jAlfalfa;
1740         unsigned long hashAlfalfa = 7;
1741 
1742         for(; wrdlen >= 8; wrdlen -= 8, key += 8) {
1743             hashQWORD=*(unsigned long long*)key;
1744 //      for(iAlfalfa = 0; iAlfalfa < (wrdlen & -8); iAlfalfa += 8) {
1745 //          hashQWORD=*(unsigned long long *)(&key[iAlfalfa]);
1746             hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>0)&0xFF) ) + ((hashQWORD>>8)&0xFF);
1747             hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>16)&0xFF) ) + ((hashQWORD>>24)&0xFF);
1748             hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>32)&0xFF) ) + ((hashQWORD>>40)&0xFF);
1749             hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>48)&0xFF) ) + ((hashQWORD>>56)&0xFF);
1750         }
1751         for(; wrdlen; wrdlen--, key++)
1752             hashAlfalfa = (17+9) * hashAlfalfa + *key;
1753 //      for(jAlfalfa = 0; jAlfalfa < (wrdlen & 7); jAlfalfa += 1) {
1754 //          hashAlfalfa = (17+9) * hashAlfalfa + key[iAlfalfa+jAlfalfa];
1755 //      }
1756         return ( hashAlfalfa ^ (hashAlfalfa >> 16) ) & 8191;
1757 }

/*
Alfalfa_QWORD PROC
; Line 1737
$LN13:
	sub	rsp, 8
; Line 1742
	cmp	edx, 8
	mov	r10d, edx
	mov	r11, rcx
	mov	r9d, 7
	jl	$LN4@Alfalfa_QW@2
	mov	QWORD PTR [rsp], rbx
	mov	rbx, r10
	shr	rbx, 3
	mov	eax, ebx
	neg	eax
	lea	r10d, DWORD PTR [r10+rax*8]
	npad	4
$LL6@Alfalfa_QW@2:
; Line 1743
	mov	rdx, QWORD PTR [r11]
	add	r11, 8
; Line 1746
	mov	rax, rdx
	shr	rax, 8
	movzx	r8d, al
; Line 1747
	mov	rax, rdx
	shr	rax, 16
	movzx	ecx, al
	mov	rax, rdx
	shr	rax, 24
; Line 1749
	imul	r8d, 26
	add	r8d, ecx
	movzx	ecx, al
	mov	rax, rdx
	shr	rax, 32					; 00000020H
	imul	r8d, 26
	add	r8d, ecx
	movzx	ecx, al
	mov	rax, rdx
	shr	rax, 40					; 00000028H
	imul	r8d, 26
	add	r8d, ecx
	movzx	ecx, al
	mov	rax, rdx
	shr	rax, 48					; 00000030H
	imul	r8d, 26
	add	r8d, ecx
	movzx	ecx, al
	movzx	eax, dl
	shr	rdx, 56					; 00000038H
	imul	r8d, 26
	add	r8d, ecx
	imul	eax, 558124416				; 21444d80H
	imul	r8d, 26
	sub	r8d, eax
	mov	eax, r9d
	add	r8d, edx
	imul	eax, 1626332928				; 60efdf00H
	mov	r9d, r8d
	sub	r9d, eax
	sub	rbx, 1
	jne	$LL6@Alfalfa_QW@2
	mov	rbx, QWORD PTR [rsp]
$LN4@Alfalfa_QW@2:
; Line 1751
	test	r10d, r10d
	je	SHORT $LN1@Alfalfa_QW@2
$LL3@Alfalfa_QW@2:
; Line 1752
	movsx	eax, BYTE PTR [r11]
	imul	r9d, 26
	inc	r11
	add	r9d, eax
	sub	r10d, 1
	jne	SHORT $LL3@Alfalfa_QW@2
$LN1@Alfalfa_QW@2:
; Line 1756
	mov	eax, r9d
	shr	eax, 16
	xor	eax, r9d
	and	eax, 8191				; 00001fffH
; Line 1757
	add	rsp, 8
	ret	0
Alfalfa_QWORD ENDP
*/

And also, for research purposes, the eight-bytes-at-once FNV variant:

// Number Of Trees(GREATER THE BETTER): 3526192
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18676788

#define FNV1_64_INIT ((u_int64_t)14695981039346656037ULL)
#define FNV1_64_PRIME ((u_int64_t)1099511628211ULL)

#define FNV_64A_OP(hash, octet) \
    (((u_int64_t)(hash) ^ (u_int8_t)(octet)) * FNV1_64_PRIME)

#define FNV_64A_OP64(hash, octet) \
    (((u_int64_t)(hash) ^ (u_int64_t)(octet)) * FNV1_64_PRIME)

int FNV1A_Hash_8_OCTETS(const char *str, int wrdlen)
{
    u_int64_t hash64;
    const char *p;

    int wrdlen_OCTUPLET = wrdlen>>3;
    hash64 = FNV1_64_INIT;
    p = str;

    // The goal of stage #1: to reduce the number of 'imul's and, mainly, the number of loops.

    // Stage #1:
    for (; wrdlen_OCTUPLET != 0; --wrdlen_OCTUPLET) {
        hash64 = FNV_64A_OP64(hash64, (u_int64_t)*(u_int64_t *)p); // SLOWer but with carry
        p = p + 8;
    }

    // Stage #2:
    //for (; *p; ++p) {
    //    hash = FNV_32A_OP(hash, *p);
    //}
    for (wrdlen_OCTUPLET = 0; wrdlen_OCTUPLET < (wrdlen & 7); wrdlen_OCTUPLET += 1) {
        hash64 = FNV_64A_OP(hash64, (u_int8_t)*(u_int8_t *)p); // SLOWer but with carry
        ++p;
    }

    // 5*13 = 64+1 or 1*12+4*13 = 64 i.e. shift by 12,25,38,51
    return ( (hash64>>(64-(1*12+0*13))) ^ (hash64>>(64-(1*12+1*13))) ^ (hash64>>(64-(1*12+2*13))) ^ (hash64>>(64-(1*12+3*13))) ^ hash64) & 8191; // SLOWer but with carry
}
Enough, I am stopping with offering more functions.
ace,
I've tried to reject the functions that have problems (or are hardware dependent), using the following scoring: 1 -- inside of 10% from the best, 2 -- 10-20%, 3 -- 20-100%, and I get:
i5 big

1 Alfalfa,  Alfalfa_DWORD,  FNV-1a,  Larson,  Murmur2,  x65599
2 Alfalfa_HALF,  Bernstein,  CRC-32,  K&R,  SBox,  Sedgewick,  lookup3
3 MaPrime2c,  One_At_Time,  Partow,  Ramakrishna


PM big

1 Alfalfa,  Alfalfa_DWORD,  Alfalfa_HALF,  Bernstein,  CRC-32,  K&R,  Larson,  Murmur2,  Partow,  Ramakrishna,  Sedgewick,  x65599
2 FNV-1a,  MaPrime2c,  One_At_Time,  SBox,  lookup3
For small tests, I use ordering which favors the function that has the most best scores and group them:
i5 small

321111111 Murmur2
322111111 Alfalfa_HALF
331111111 Larson,  Sedgewick
332211111 CRC-32,  SBox
333111111 FNV-1a,  x65599
333211111 Bernstein
333221111 K&R
333322222 MaPrime2c,  Ramakrishna
333332222 lookup3
432111111 Alfalfa,  Alfalfa_DWORD
433333222 Partow
433333322 One_At_Time


PM Small

221111111 SBox
222111111 CRC-32
322111111 Alfalfa_HALF
331111111 Murmur2
332111111 Larson
332211111 K&R,  lookup3
333211111 Bernstein,  x65599
333222111 Sedgewick
333322221 FNV-1a
333332222 Ramakrishna
333333222 Partow
333333322 MaPrime2c
333333333 One_At_Time
432221111 Alfalfa
433222222 Alfalfa_DWORD
Now I'd drop those that scored 4 in the "short" tests (more than 100% worse than the best for the given test): Alfalfa, Alfalfa_DWORD, Partow, One_At_Time.

Second, I can drop the only ones that scored "3" in the i5big test: MaPrime2c and Ramakrishna (they are obviously the worst in the pmsmall test too).

Then, I can drop the only one which never scores 1 in a small test: lookup3; it's still good, since it always scores 2 in i5big and pmbig. That leaves:
i5 big

1 FNV-1a,  Larson,  Murmur2,  x65599
2 Alfalfa_HALF,  Bernstein,  CRC-32,  K&R,  SBox,  Sedgewick


PM big

1 Alfalfa_HALF,  Bernstein,  CRC-32,  K&R,  Larson,  Murmur2,  Sedgewick,  x65599
2 FNV-1a,  SBox


i5 small

321111111 Murmur2
322111111 Alfalfa_HALF
331111111 Larson,  Sedgewick
332211111 CRC-32,  SBox
333111111 FNV-1a,  x65599
333211111 Bernstein
333221111 K&R


PM Small

221111111 SBox
222111111 CRC-32
322111111 Alfalfa_HALF
331111111 Murmur2
332111111 Larson
332211111 K&R
333211111 Bernstein,  x65599
333222111 Sedgewick
333322221 FNV-1a
Now, using the "big" results, it's obvious that the winners are Larson, Murmur2 and x65599 (the only three that scored 1 in both big tests).

Using the "small" results, we can't see much. We see that SBox and CRC-32 never score 3 on PM (meaning that the table lookups are really fast there and that the resulting behaviour is good). Otherwise, every function sometimes scores 3. That again can only mean that these measurements were "too fuzzy" to be of much use. And that leaves open whether using short tests is relevant enough, in the form they are now, and whether we were fair to drop some functions that scored 4 there but were 1 in the bigs (Alfalfa, Alfalfa_DWORD).

If the small tests are relevant enough to mean something and if we believe bigs more, Murmur2 is the best, the second is Larson, the third x65599.

However if we look at how complex the functions are to implement, x65599 is the simplest, requiring no fast MUL in the CPU, the second (as it doesn't require alignment) is Larson, the third is Murmur2 which has the highest demands on the hardware.

The conclusion:

1) It seems that x65599 is confirmed to be both very good and simple. Larson is the second simple function. Murmur2 can be better as soon as you have aligned strings and use modern CPU, or if you have some demands not covered enough by all the above tests.

2) The "small" tests in the current form still look "too fuzzy" for my taste to trust them more than as pointers to "what's not good" (if even that much). I don't know how they can be improved; instead, I'd rather modify or add inputs to the "big" tests in order to include more of the values that we saw are able to detect the problems in functions tested in "small" (prefix, suffix, numbers, binary numbers).
ace,
More important: whichever function among:

FNV-1a, Larson, Murmur2, x65599, Alfalfa_HALF, Bernstein, CRC-32, K&R, SBox, Sedgewick

you use, the worst that can happen in the tested cases is an occasional slowdown of 2 compared to the best possible tested outcome; most of the time you can expect to be inside of a 20% difference. So if you use one of the above functions in scenarios close to the tested ones, you practically don't have to worry whether you selected the right one. A factor-2 difference is not much compared to the much bigger differences (i.e., factors of 100) that can be achieved by, for example, merely allocating too often.

So in my opinion, as long as the practical "speed competition" can't be organized to consistently demonstrate bigger obvious differences, it seems that we shouldn't worry too much about the speed.
Georgi 'Sanmayce',
I just cannot stand unfinished/semi-tuned functions, so here comes the MULless (only shifts) variant of 'FNV1A_Hash_4_OCTETS'.
The damage is undone, the ugly inconsistency (Mobile vs i5) exterminated; the result for 25 bits (a very useful testbed, hash.cpp, Peter, thank you) on Pentium Merom is below.
Note for 'FNV1A_unrolled': choosing a small multiplier (as Mr. Noll advised): FNV1_32_PRIME=588411137 becomes 31, removing all multiplications.
Note for 'Alfalfa_HALF': re-arranged the expressions in order to avoid the compiler converting shifts into multiplications (the compiler plays badly here).
Note for 'Alfalfa': (17+9) replaced with 53. Peter, here is Alfalfa's speed performance on my Pentium Merom with (17+9) changed to 13 and 19 for the Wikipedia-en words test:
13 11,525,803 [2,627,025]
19 11,676,574 [2,292,587]
53 12,093,768 [2,074,883]
and results for x17 unrolled:
11,767,768 [2,410,605]

D:\_KA45F~1\_w>release\hash.exe wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
        x17 unrolled:   11877089  11799077  11776420  11776905  11767768|  11767768 [2410605]
      FNV1A_unrolled:   12789079  12791080  12816821  12877818  12799565|  12789079 [2191287]
             Alfalfa:   12096628  12100191  12093768  12114109  12121803|  12093768 [2074883]
        Alfalfa_HALF:   12028618  12013585  12022435  12034921  12092433|  12013585 [2077426]

D:\_KA45F~1\_w>hash_benchmark.bat
Sonnets 
3228 lines read
8192 elements in the table (13 bits)
        x17 unrolled:        603       607       581       581       581|       581 [  589]
      FNV1A_unrolled:        546       519       518       517       535|       517 [  570]
             Alfalfa:        561       556       573       554       555|       554 [  570]
        Alfalfa_HALF:        587       607       577       576       594|       576 [  543]

D:\_KA45F~1\_w>

FNV1A_unrolled a.k.a. 'FNV1A_Hash_4_OCTETS_MULless' follows:
?HashFNV1A_unrolled@@YAIPBDK@Z PROC			; HashFNV1A_unrolled

; 753  : 	//const UINT PRIME = 31;
; 754  : 	UINT hash = 2166136261;
; 755  : 	const CHAR * p = str;

  00210	8b 54 24 04	 mov	 edx, DWORD PTR _str$[esp-4]
  00214	56		 push	 esi
  00215	57		 push	 edi

; 756  : 
; 757  : 	// Reduce the number of multiplications by unrolling the loop
; 758  : 	for (SIZE_T ndwords = wrdlen / sizeof(DWORD); ndwords; --ndwords) {

  00216	8b 7c 24 10	 mov	 edi, DWORD PTR _wrdlen$[esp+4]
  0021a	8b f7		 mov	 esi, edi
  0021c	c1 ee 02	 shr	 esi, 2
  0021f	b9 c5 9d 1c 81	 mov	 ecx, -2128831035	; 811c9dc5H
  00224	85 f6		 test	 esi, esi
  00226	74 13		 je	 SHORT $LN4@HashFNV1A_
$LL6@HashFNV1A_:

; 759  : 		//hash = (hash ^ *(DWORD*)p) * PRIME;
; 760  : 		hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);

  00228	8b 02		 mov	 eax, DWORD PTR [edx]
  0022a	33 c1		 xor	 eax, ecx
  0022c	8b c8		 mov	 ecx, eax
  0022e	c1 e1 05	 shl	 ecx, 5
  00231	2b c8		 sub	 ecx, eax

; 761  : 
; 762  : 		p += sizeof(DWORD);

  00233	83 c2 04	 add	 edx, 4
  00236	83 ee 01	 sub	 esi, 1
  00239	75 ed		 jne	 SHORT $LL6@HashFNV1A_
$LN4@HashFNV1A_:

; 763  : 	}
; 764  : 
; 765  : 	// Process the remaining bytes
; 766  : 	for (SIZE_T i = 0; i < (wrdlen & (sizeof(DWORD) - 1)); i++) {

  0023b	83 e7 03	 and	 edi, 3
  0023e	8b f7		 mov	 esi, edi
  00240	76 12		 jbe	 SHORT $LN1@HashFNV1A_
$LL3@HashFNV1A_:

; 767  : 		//hash = (hash ^ *p++) * PRIME;
; 768  : 		hash = ((hash ^ *p)<<5) - (hash ^ *p);

  00242	0f be 02	 movsx	 eax, BYTE PTR [edx]
  00245	33 c1		 xor	 eax, ecx
  00247	8b c8		 mov	 ecx, eax
  00249	c1 e1 05	 shl	 ecx, 5
  0024c	2b c8		 sub	 ecx, eax

; 769  : 		p++;

  0024e	42		 inc	 edx
  0024f	83 ee 01	 sub	 esi, 1
  00252	75 ee		 jne	 SHORT $LL3@HashFNV1A_
$LN1@HashFNV1A_:

; 770  : 	}
; 771  : 
; 772  : 	return (hash>>16) ^ hash;

  00254	8b c1		 mov	 eax, ecx
  00256	c1 e8 10	 shr	 eax, 16			; 00000010H
  00259	5f		 pop	 edi
  0025a	33 c1		 xor	 eax, ecx
  0025c	5e		 pop	 esi

; 773  : }

  0025d	c3		 ret	 0
?HashFNV1A_unrolled@@YAIPBDK@Z ENDP			; HashFNV1A_unrolled

Alfalfa_HALF follows:
?HashAlfalfa_HALF@@YAIPBDK@Z PROC			; HashAlfalfa_HALF

; 738  : {

  00260	53		 push	 ebx
  00261	55		 push	 ebp

; 739  : 	UINT hash = 12;
; 740  : 	UINT hashBUFFER;
; 741  : 	SIZE_T i;
; 742  : 	for(i = 0; i < (wrdlen & -4); i += 4) {

  00262	8b 6c 24 10	 mov	 ebp, DWORD PTR _wrdlen$[esp+4]
  00266	56		 push	 esi
  00267	8b dd		 mov	 ebx, ebp
  00269	33 d2		 xor	 edx, edx
  0026b	83 e3 fc	 and	 ebx, -4			; fffffffcH
  0026e	57		 push	 edi
  0026f	8b 7c 24 14	 mov	 edi, DWORD PTR _key$[esp+12]
  00273	b9 0c 00 00 00	 mov	 ecx, 12			; 0000000cH
  00278	76 44		 jbe	 SHORT $LN4@HashAlfalf
  0027a	8d 9b 00 00 00
	00		 npad	 6
$LL6@HashAlfalf:

; 743  : 		//hash = (( ((hash<<5)-hash) + key[i] )<<5) - ( ((hash<<5)-hash) + key[i] ) + (key[i+1]);
; 744  : 		hashBUFFER = ((hash<<5)-hash) + key[i];
; 745  : 		hash = (( hashBUFFER )<<5) - ( hashBUFFER ) + (key[i+1]);
; 746  : 		//hash = (( ((hash<<5)-hash) + key[i+2] )<<5) - ( ((hash<<5)-hash) + key[i+2] ) + (key[i+3]);
; 747  : 		hashBUFFER = ((hash<<5)-hash) + key[i+2];
; 748  : 		hash = (( hashBUFFER )<<5) - ( hashBUFFER ) + (key[i+3]);

  00280	0f be 34 17	 movsx	 esi, BYTE PTR [edi+edx]
  00284	8b c1		 mov	 eax, ecx
  00286	c1 e0 05	 shl	 eax, 5
  00289	2b c1		 sub	 eax, ecx
  0028b	0f be 4c 17 01	 movsx	 ecx, BYTE PTR [edi+edx+1]
  00290	03 f0		 add	 esi, eax
  00292	8b c6		 mov	 eax, esi
  00294	c1 e0 05	 shl	 eax, 5
  00297	2b c6		 sub	 eax, esi
  00299	03 c1		 add	 eax, ecx
  0029b	8b c8		 mov	 ecx, eax
  0029d	c1 e1 05	 shl	 ecx, 5
  002a0	2b c8		 sub	 ecx, eax
  002a2	0f be 44 17 02	 movsx	 eax, BYTE PTR [edi+edx+2]
  002a7	03 c8		 add	 ecx, eax
  002a9	8b c1		 mov	 eax, ecx
  002ab	c1 e0 05	 shl	 eax, 5
  002ae	2b c1		 sub	 eax, ecx
  002b0	0f be 4c 17 03	 movsx	 ecx, BYTE PTR [edi+edx+3]
  002b5	83 c2 04	 add	 edx, 4
  002b8	03 c8		 add	 ecx, eax
  002ba	3b d3		 cmp	 edx, ebx
  002bc	72 c2		 jb	 SHORT $LL6@HashAlfalf
$LN4@HashAlfalf:

; 749  : 	}
; 750  : 	for(SIZE_T j = 0; j < (wrdlen & 3); j += 1) {

  002be	33 c0		 xor	 eax, eax
  002c0	83 e5 03	 and	 ebp, 3
  002c3	8b f5		 mov	 esi, ebp
  002c5	76 1d		 jbe	 SHORT $LN1@HashAlfalf
  002c7	03 d7		 add	 edx, edi
  002c9	8d a4 24 00 00
	00 00		 npad	 7
$LL3@HashAlfalf:

; 751  : 		hash = ((hash<<5)-hash) + key[i+j];

  002d0	0f be 3c 02	 movsx	 edi, BYTE PTR [edx+eax]
  002d4	8b d9		 mov	 ebx, ecx
  002d6	c1 e3 05	 shl	 ebx, 5
  002d9	2b d9		 sub	 ebx, ecx
  002db	03 fb		 add	 edi, ebx
  002dd	40		 inc	 eax
  002de	8b cf		 mov	 ecx, edi
  002e0	3b c6		 cmp	 eax, esi
  002e2	72 ec		 jb	 SHORT $LL3@HashAlfalf
$LN1@HashAlfalf:
  002e4	5f		 pop	 edi
  002e5	5e		 pop	 esi

; 752  : 	}
; 753  : 	return hash ^ (hash >> 16);

  002e6	8b c1		 mov	 eax, ecx
  002e8	c1 e8 10	 shr	 eax, 16			; 00000010H
  002eb	5d		 pop	 ebp
  002ec	33 c1		 xor	 eax, ecx
  002ee	5b		 pop	 ebx

; 754  : }

  002ef	c3		 ret	 0
?HashAlfalfa_HALF@@YAIPBDK@Z ENDP			; HashAlfalfa_HALF
Peter, please update (if you find them useful) the above functions; they are now finished.
Peter Kankowski,
Ace, thank you very much for the 10%-20%-100% comparison and analysis. I've noticed that the results on Core i5 are unstable and change with each run. The usual measures (warming up caches, increasing the number of runs) do not help. The code works perfectly on Pentium M, that's why I didn't notice this problem before. The Core i5 results will be removed from this page until I fix the benchmark.

I agree that the difference between, say, SBox and CRC-32 (in software) is negligible, but try to compare Weinberger and Murmur2. A factor-2 difference is noticeable for an end user. Also remember that I excluded some really bad functions from the test (e.g., Two chars). So the choice of hash function matters, and we should explore different architectures. Fully agree with you about the best function, which is Murmur2; Larson and x65599 are nice, too.

Georgi, the updated FNV1A_unrolled is much better in the Wikipedia test, congratulations! Regarding the slots with maximum collisions: the Red Dragon Book proposes the following formula for evaluating hash function quality:

( sum from j=0 to m-1 of b_j(b_j + 1)/2 ) / [ (n/2m)(n + 2m - 1) ]

where b_j is the number of items in the j-th slot, m is the number of slots, and n is the total number of items. The sum of b_j(b_j + 1)/2 estimates the number of slots your program should visit to find the required value (I used a counter for the number of visited slots in an earlier version of the benchmark, which is similar). The denominator (n/2m)(n + 2m - 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula should give 1. In reality, a good function is somewhere between 0.95 and 1.05. If it's more, there is a high number of collisions (slow!). If it's less, the function gives fewer collisions than the randomly distributing function, so AFAIK it's not bad.
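As a sketch, the formula can be computed directly from the per-slot counts (the function name `dragon_quality` is mine, not from the benchmark source):

```c
#include <stddef.h>

/* Red Dragon book hash-quality metric: the sum of b_j*(b_j+1)/2 over
   all m slots, divided by the expected value (n/2m)*(n+2m-1) for a
   function that puts each item into a random slot. Values near 1.0
   indicate a good distribution; much above 1.0 means many collisions. */
double dragon_quality(const size_t *b, size_t m)
{
    double visited = 0.0;
    size_t n = 0;
    size_t j;
    for (j = 0; j < m; ++j) {
        visited += (double)b[j] * ((double)b[j] + 1.0) / 2.0;
        n += b[j];
    }
    return visited / (((double)n / (2.0 * (double)m)) * ((double)n + 2.0 * (double)m - 1.0));
}
```

A perfectly uniform distribution scores slightly below 1 (fewer collisions than random), while piling all items into one slot pushes the value above 1.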

Here are the results for some of our functions:

Hash function quality (using the formula from Red Dragon book). In Numbers test: K&R and Bernstein - 1.6, x65599 - 1.2, x17 and Paul Larson - 0.8, CRC-32 - 0.9. SBox, Murmur2, Paul Hsieh, lookup3 - between 0.95 and 1.05. In other tests all functions have the result between 0.95 and 1.05.

Murmur2 is the best, again. I wrote a Python 3.x script (count_quality.py) for generating such charts.

ace,
What's m and n for the quality graph you made? How about this: increase the number of slots (and input size) as I suggested (do use words, but also generated prefix, postfix and numbers for every set) and draw small bars instead of circles for each case? Everything can fit in the same graph area as now, as you'd need only a few pixels in width for every slot-size increase, and you can keep all the bars for different input sizes near one another for the same function. We'd know much more about each function. Example: KR (size1, size2, ...) would take as much width as now if the marks are narrow and near.

I also like how it's nicely visible that numbers, prefix and postfix are the best stress tests for a lot of functions.

Regarding "factor 2": it's the fact that your "small" measurements can't point to a single function which stays *always* on top. That's why I rejected those that certainly make more than factor 2 (that score 4) or exhibit problems. The remaining ones make less (score 3 means: 20%-100%) and even that not often -- note that in your PM big test all functions:

Alfalfa, Alfalfa_DWORD, Alfalfa_HALF, Bernstein, CRC-32, K&R, Larson, Murmur2, Partow, Ramakrishna, Sedgewick, x65599, FNV-1a, MaPrime2c, One_At_Time, SBox, lookup3

are not more different than 20% from the best one, and most not more different than 10%! And that's for big inputs. For small inputs, there's practically no chance that you'll notice the difference. For the big inputs too, it's not worth the invested time to consider them: 3 seconds or 2.6 seconds? Who sees that? Most programmers make orders of magnitude bigger obvious overheads than that.

Therefore I like any direction that would either demonstrate more obvious time differences or provide some other relevant criterion for evaluation.

Regarding i5 measurements: I suggest you (as I always did): don't use the micrometer to measure the distance between cities. Produce the tests that take enough time to be able to measure them using independent clock (QueryPerf..) by simply making the test running long enough (don't measure too small inputs, they deceive, or if you do need to measure small inputs do a lot of small different inputs -- like always using a new part of wikipedia words, and the new numbers values etc). Then, if it's still needed, make the explicit overhead of some operation to make the meaningful differences more obvious. For example, make the cost of fetching the values further in the same chain higher (before they are compared with the input value). That's reasonable to expect as soon as your hash table and values in it are much bigger than the CPU cache. See what you get then... Then add allocations -- see if that makes any other differences irrelevant (nice to know) etc...
Peter Kankowski,
In the tests by Aho, Sethi, and Ullman, m = 211 and n = 50 .. 1200. I used n = 500 .. 13 000 and m = 2 × round_to_next_power_of_two(n).
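For completeness, the rounding step is commonly implemented with the bit-smearing trick; this is a sketch of what `round_to_next_power_of_two` could look like for 32-bit n (the implementation is assumed, not taken from the benchmark):

```c
#include <stdint.h>

/* Round up to the next power of two (n >= 1, 32-bit). Smearing the
   highest set bit of n-1 into all lower positions yields a value of
   the form 2^k - 1; adding one gives the power of two. */
uint32_t round_to_next_power_of_two(uint32_t n)
{
    n -= 1;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1;
}
```

So n = 500 gives m = 2 * 512 = 1024 slots, and n = 13 000 gives m = 2 * 16384 = 32 768.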

If you add allocations or some other "overhead", the result will be completely unpredictable, because you will be measuring not only your hash function, but also the allocator performance. I'm moving in another direction: trying to eliminate all side effects on Core i5 (on Pentium M, they are already eliminated).

Very large in-memory hash tables are rarely used in real-world programs. Large databases are typically implemented with B-trees, but I'm interested in small hash tables such as a symbol table in a compiler.

ace,
> I used n = 500 .. 13 000

Do we see the graph for all n? I thought it was for only one n. It would be interesting to see how the results vary as n changes.

> m = 2 × round_to_next_power_of_two(n)

if you target the "real life" scenarios, how do you expect your compiler to know n in advance to determine m? AFAIK compilers typically have fixed m or rehash as little as possible and then you can vary only n. If m is not fixed then you have to add "rehashing" too to your measurements.

And what kind of compiler do you target? Modern C++ compilers now see very long (decorated) identifiers and an awful lot of values thanks to libraries and approaches like Boost. All the Win32 and framework headers were also quite big before, but some orders of magnitude smaller. It makes sense to measure with a lot of values. Long ones, too.

> trying to eliminate all side effects on Core i5

I'd just make the test last at least a few seconds of calculation for each measurement -- if I measured small tables, I'd feed them with various inputs (now, with the big set of words, there's enough to use new ones for every pass!) and measure the total time. I'm quite sure that gives much more repeatable results, and it can show the differences better too. It's simple: if you use too little to measure, you measure more noise than signal. And of course I wouldn't measure ticks but the "real" time between the start and the end of every such calculation.
ace,
And regarding allocation: of course its cost would be independent of the function; I just guess it can trump the other costs. It won't help to find "the best", just to get an idea of the general orders of magnitude.
ace,
By the way, if what here is called Georgi's FNV1A_unrolled uses a multiplier of 31, how is it anything that can be named FNV at all? Isn't it plain and simple K&R? How about being clear about what's really what?
Georgi 'Sanmayce',
Thank you Peter for your efforts and for the update.

Ace,
- 'HashFNV1A_unrolled' is a pure FNV-1A variant;
- they are different; don't you see the XOR? Generally I don't read enough, that is, I like to experience things myself, so a pseudo-plagiarism is possible.

HashKernighanRitchie:
hash = 31 * hash + key[i];
HashFNV1A_unrolled:
hash = ((hash ^ *(DWORD*)str)<<5) - (hash ^ *(DWORD*)str);

And whether 15, 17, 31, or 33, the choice is a kind of gambling.
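For the record, the two forms above compute the same thing, since (x << 5) - x = 32x - x = 31x with unsigned wraparound; a minimal check (helper names are mine, for illustration only):

```c
#include <stdint.h>

/* One hashing step written both ways: the shift-and-subtract form
   used in HashFNV1A_unrolled and the plain multiply by 31. They are
   identical for any input because (x << 5) - x == 31 * x in
   unsigned 32-bit arithmetic. */
uint32_t step_shift(uint32_t hash, uint32_t chunk)
{
    uint32_t x = hash ^ chunk;
    return (x << 5) - x;
}

uint32_t step_mul(uint32_t hash, uint32_t chunk)
{
    return (hash ^ chunk) * 31u;
}
```

The difference is thus purely about which instructions the compiler emits (shl+sub vs imul), not about the hash values produced.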
Georgi 'Sanmayce',
Peter, I cannot suppress it: I managed (was lucky) to improve 'FNV1A_unrolled' even further. If the results on i5 confirm this trend, I think we can say: 'WHIZ stands for 1579'.

The first improved function is 'FNV1A_Hash_WHIZ': FNV1A_Hash_4_OCTETS with FNV1_32_PRIME = 1607, or a nearby value such as 1579.
The second improved function is 'FNV1A_unrolled_Final': the post/second cycle in 'FNV1A_unrolled', just unrolled.

'FNV1A_Hash_WHIZ' follows:
#define FNV1_32_INIT ((UINT)2166136261)
#define FNV1_32_PRIME ((UINT)1607)

#define FNV_32A_OP(hash, octet) \
    (((UINT)(hash) ^ (unsigned char)(octet)) * FNV1_32_PRIME)

#define FNV_32A_OP32(hash, octet) \
    (((UINT)(hash) ^ (UINT)(octet)) * FNV1_32_PRIME)

UINT FNV1A_Hash_WHIZ(const char *str, SIZE_T wrdlen)
{
    UINT hash32;
    const char *p;

    hash32 = FNV1_32_INIT;
    p = str;

    for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
        hash32 = FNV_32A_OP32(hash32, (UINT)*(UINT *)p);
    }
    if (wrdlen & -2) {
        hash32 = FNV_32A_OP32(hash32, *(UINT*)p & 0xFFFF);
        p++; p++;
    }
    if (wrdlen & 1)
        hash32 = FNV_32A_OP(hash32, *p);

    return hash32 ^ (hash32 >> 16);
}

'FNV1A_unrolled_Final' follows:
UINT HashFNV1A_unrolled_Final(const CHAR *str, SIZE_T wrdlen)
{
    //const UINT PRIME = 31;
    UINT hash = 2166136261;
    const CHAR * p = str;
    /*
    // Reduce the number of multiplications by unrolling the loop
    for (SIZE_T ndwords = wrdlen / sizeof(DWORD); ndwords; --ndwords) {
        //hash = (hash ^ *(DWORD*)p) * PRIME;
        hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);

        p += sizeof(DWORD);
    }
    */
    for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
        hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
    }

    // Process the remaining bytes
    /*
    for (SIZE_T i = 0; i < (wrdlen & (sizeof(DWORD) - 1)); i++) {
        //hash = (hash ^ *p++) * PRIME;
        hash = ((hash ^ *p)<<5) - (hash ^ *p);
        p++;
    }
    */
    if (wrdlen & -2) {
        hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
        p++; p++;
    }
    if (wrdlen & 1)
        hash = ((hash ^ *p)<<5) - (hash ^ *p);

    return (hash>>16) ^ hash;
}

I also made a new FNV-1A derivative hasher targeted at case-insensitive Latin letters; perhaps it is not useful for anything else, but the idea of making ONE multiplication/(shift+subtraction) per 6-char range is worth thinking about, do you agree?
At first the name was 'Sixtine', a la S. King's 'Christine' monster, but I also wanted to hint at the insensitiveness.

// Tuned for lowercase-and-uppercase letters i.e. 26 ASCII symbols 65-90 and 97-122 decimal.
UINT Sixtinsensitive(const CHAR *str, SIZE_T wrdlen)
{
    UINT hash = 2166136261;
    UINT hashBUFFER_EAX, hashBUFFER_BH, hashBUFFER_BL;
    const CHAR * p = str;

    // 0x41 = 065 'A' 010 [0 0001]
    // 0x5A = 090 'Z' 010 [1 1010]
    // 0x61 = 097 'a' 011 [0 0001]
    // 0x7A = 122 'z' 011 [1 1010]

    // Reduce the number of multiplications by unrolling the loop
    for(; wrdlen >= 6; wrdlen -= 6, p += 6) {
        //hashBUFFER_AX = (*(DWORD*)(p+0)&0xFFFF);
        hashBUFFER_EAX = (*(DWORD*)(p+0)&0x1F1F1F1F);
        hashBUFFER_BL = (*(p+4)&0x1F);
        hashBUFFER_BH = (*(p+5)&0x1F);
        //6bytes-in-4bytes or 48bits-to-30bits
        // Two times next:
        //3bytes-in-2bytes or 24bits-to-15bits
        //EAX BL BH
        //[5bit][3bit][5bit][3bit][5bit][3bit][5bit][3bit]
        // 5th[0..15] 13th[0..15]
        // BL lower 3 BL higher 2bits
        // OR or XOR no difference
        hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x07)<<5); // BL lower 3bits of 5bits
        hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x18)<<(2+8)); // BL higher 2bits of 5bits
        hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x07)<<(5+16)); // BH lower 3bits of 5bits
        hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x18)<<((2+8)+16)); // BH higher 2bits of 5bits
        //hash = (hash ^ hashBUFFER_EAX)*1607; //What a mess: <<7 becomes imul but <<5 not!?
        hash = ((hash ^ hashBUFFER_EAX)<<5) - (hash ^ hashBUFFER_EAX);
        //1607:[2118599]
        // 127:[2121081]
        //  31:[2139242]
        //  17:[2150803]
        //   7:[2166336]
        //   5:[2183044]
        //8191:[2200477]
        //   3:[2205095]
        // 257:[2206188]
    }
    // Post-Variant #1:
    for(; wrdlen; wrdlen--, p++) {
        hash = ((hash ^ (*p&0x1F))<<5) - (hash ^ (*p&0x1F));
    }
    /*
    // Post-Variant #2:
    for(; wrdlen >= 2; wrdlen -= 2, p += 2) {
        hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
    }
    if (wrdlen & 1)
        hash = ((hash ^ *p)<<5) - (hash ^ *p);
    */
    /*
    // Post-Variant #3:
    for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
        hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
    }
    if (wrdlen & -2) {
        hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
        p++; p++;
    }
    if (wrdlen & 1)
        hash = ((hash ^ *p)<<5) - (hash ^ *p);
    */
    return (hash>>16) ^ hash;
}

?Sixtinsensitive@@YAIPBDK@Z PROC ; Sixtinsensitive

; 891 : {

00300 53 push ebx
00301 55 push ebp

; 892 : UINT hash = 2166136261;
; 893 : UINT hashBUFFER_EAX, hashBUFFER_BH, hashBUFFER_BL;
; 894 : const CHAR * p = str;
; 895 :
; 896 : // Ox41 = 065 'A' 010 [0 0001]
; 897 : // Ox5A = 090 'Z' 010 [1 1010]
; 898 : // Ox61 = 097 'a' 011 [0 0001]
; 899 : // Ox7A = 122 'z' 011 [1 1010]
; 900 :
; 901 : // Reduce the number of multiplications by unrolling the loop
; 902 : for(; wrdlen >= 6; wrdlen -= 6, p += 6) {

00302 8b 6c 24 10 mov ebp, DWORD PTR _wrdlen$[esp+4]
00306 57 push edi
00307 8b 7c 24 10 mov edi, DWORD PTR _str$[esp+8]
0030b bb c5 9d 1c 81 mov ebx, -2128831035 ; 811c9dc5H
00310 83 fd 06 cmp ebp, 6
00313 72 5c jb SHORT $LN4@Sixtinsens
00315 b8 ab aa aa aa mov eax, -1431655765 ; aaaaaaabH
0031a f7 e5 mul ebp
0031c c1 ea 02 shr edx, 2
0031f 56 push esi
$LL6@Sixtinsens:

; 903 : //hashBUFFER_AX = (*(DWORD*)(p+0)&0xFFFF);
; 904 : hashBUFFER_EAX = (*(DWORD*)(p+0)&0x1F1F1F1F);
; 905 : hashBUFFER_BL = (*(p+4)&0x1F);
; 906 : hashBUFFER_BH = (*(p+5)&0x1F);

00320 0f be 77 05 movsx esi, BYTE PTR [edi+5]
00324 0f be 4f 04 movsx ecx, BYTE PTR [edi+4]
00328 83 e6 1f and esi, 31 ; 0000001fH

; 907 : //6bytes-in-4bytes or 48bits-to-30bits
; 908 : // Two times next:
; 909 : //3bytes-in-2bytes or 24bits-to-15bits
; 910 : //EAX BL BH
; 911 : //[5bit][3bit][5bit][3bit][5bit][3bit][5bit][3bit]
; 912 : // 5th[0..15] 13th[0..15]
; 913 : // BL lower 3 BL higher 2bits
; 914 : // OR or XOR no difference
; 915 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x07)<<5); // BL lower 3bits of 5bits
; 916 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x18)<<(2+8)); // BL higher 2bits of 5bits
; 917 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x07)<<(5+16)); // BH lower 3bits of 5bits
; 918 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x18)<<((2+8)+16)); // BH higher 2bits of 5bits
; 919 : //hash = (hash ^ hashBUFFER_EAX)*1607; //What a mess: <<7 becomes imul but <<5 not!?
; 920 : hash = ((hash ^ hashBUFFER_EAX)<<5) - (hash ^ hashBUFFER_EAX);

0032b 8b c6 mov eax, esi
0032d 83 e0 18 and eax, 24 ; 00000018H
00330 c1 e0 05 shl eax, 5
00333 83 e1 1f and ecx, 31 ; 0000001fH
00336 83 e6 07 and esi, 7
00339 33 c6 xor eax, esi
0033b 8b f1 mov esi, ecx
0033d c1 e0 0b shl eax, 11 ; 0000000bH
00340 83 e1 07 and ecx, 7
00343 83 e6 18 and esi, 24 ; 00000018H
00346 33 c6 xor eax, esi
00348 c1 e0 05 shl eax, 5
0034b 33 c1 xor eax, ecx
0034d 8b 0f mov ecx, DWORD PTR [edi]
0034f 81 e1 1f 1f 1f
1f and ecx, 522133279 ; 1f1f1f1fH
00355 c1 e0 05 shl eax, 5
00358 33 c1 xor eax, ecx
0035a 33 c3 xor eax, ebx
0035c 8b c8 mov ecx, eax
0035e c1 e1 05 shl ecx, 5
00361 2b c8 sub ecx, eax
00363 83 ed 06 sub ebp, 6
00366 83 c7 06 add edi, 6
00369 83 ea 01 sub edx, 1
0036c 8b d9 mov ebx, ecx
0036e 75 b0 jne SHORT $LL6@Sixtinsens
00370 5e pop esi
$LN4@Sixtinsens:

; 921 : //1607:[2118599]
; 922 : // 127:[2121081]
; 923 : // 31:[2139242]
; 924 : // 17:[2150803]
; 925 : // 7:[2166336]
; 926 : // 5:[2183044]
; 927 : //8191:[2200477]
; 928 : // 3:[2205095]
; 929 : // 257:[2206188]
; 930 : }
; 931 : // Post-Variant #1:
; 932 : for(; wrdlen; wrdlen--, p++) {

00371 85 ed test ebp, ebp
00373 74 17 je SHORT $LN1@Sixtinsens
$LL3@Sixtinsens:

; 933 : hash = ((hash ^ (*p&0x1F))<<5) - (hash ^ (*p&0x1F));

00375 0f be 07 movsx eax, BYTE PTR [edi]
00378 83 e0 1f and eax, 31 ; 0000001fH
0037b 33 c3 xor eax, ebx
0037d 8b d0 mov edx, eax
0037f c1 e2 05 shl edx, 5
00382 2b d0 sub edx, eax
00384 4d dec ebp
00385 47 inc edi
00386 8b da mov ebx, edx
00388 85 ed test ebp, ebp
0038a 75 e9 jne SHORT $LL3@Sixtinsens
$LN1@Sixtinsens:

; 934 : }
; 935 : /*
; 936 : // Post-Variant #2:
; 937 : for(; wrdlen >= 2; wrdlen -= 2, p += 2) {
; 938 : hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
; 939 : }
; 940 : if (wrdlen & 1)
; 941 : hash = ((hash ^ *p)<<5) - (hash ^ *p);
; 942 : */
; 943 : /*
; 944 : // Post-Variant #3:
; 945 : for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
; 946 : hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
; 947 : }
; 948 : if (wrdlen & -2) {
; 949 : hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
; 950 : p++;p++;
; 951 : }
; 952 : if (wrdlen & 1)
; 953 : hash = ((hash ^ *p)<<5) - (hash ^ *p);
; 954 : */
; 955 : return (hash>>16) ^ hash;

0038c 8b c3 mov eax, ebx
0038e 5f pop edi
0038f c1 e8 10 shr eax, 16 ; 00000010H
00392 5d pop ebp
00393 33 c3 xor eax, ebx
00395 5b pop ebx

; 956 : }

00396 c3 ret 0
?Sixtinsensitive@@YAIPBDK@Z ENDP ; Sixtinsensitive

Still, I am not happy with the speed of 'Sixtinsensitive', which is about (12445829-11075601)/11075601*100% = 12.4% slower than 'FNV1A_unrolled_Final'. If you are interested in boosting such an approach, I will be glad.

D:\_KAZE_new-stuff\_w>hash.exe wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
Sixtinsensitive: 12554437 12465228 12466856 12445829 12447431| 12445829 [2139242]
x17 unrolled: 11927763 11908183 11928229 11923448 11929265| 11908183 [2410605]
FNV1A_Hash_WHIZ: 10642905 10647391 10631697 10633363 10626230| 10626230 [2189360] // FNV1_32_PRIME = 1607
FNV1A_Hash_WHIZ: 10567474 10566509 10587741 10604073 10563141| 10563141 [2144749] // FNV1_32_PRIME = 1999
FNV1A_Hash_WHIZ: 10535248 10555186 10514388 10517062 10518655| 10514388 [2154569] // FNV1_32_PRIME = 1579
FNV1A_unrolled_Final: 11101280 11084731 11105538 11075601 11085023| 11075601 [2252381]
FNV1A_unrolled: 11212976 11226149 11236817 11188433 11262786| 11188433 [2191287]
Alfalfa: 11837124 11864087 11835669 11856730 11841964| 11835669 [2074883]
Alfalfa_HALF: 11978432 11984171 11955814 11966588 11961493| 11955814 [2077426]

D:\_KAZE_new-stuff\_w>hash_benchmark.bat
Words
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 99 91 90 90 89| 89 [ 105]
x17 unrolled: 97 92 91 118 91| 91 [ 109]
FNV1A_Hash_WHIZ: 90 84 82 82 81| 81 [ 124]
FNV1A_unrolled_Final: 87 80 79 78 78| 78 [ 115]
FNV1A_unrolled: 90 83 81 80 80| 80 [ 106]
Alfalfa: 94 88 87 86 87| 86 [ 100]
Alfalfa_HALF: 98 111 91 91 90| 90 [ 97]
Win32
1992 lines read
4096 elements in the table (12 bits)
Sixtinsensitive: 524 512 534 510 509| 509 [ 409]
x17 unrolled: 592 617 588 588 589| 588 [ 414]
FNV1A_Hash_WHIZ: 464 437 436 437 456| 436 [ 418]
FNV1A_unrolled_Final: 434 425 424 423 449| 423 [ 414]
FNV1A_unrolled: 433 426 425 424 444| 424 [ 404]
Alfalfa: 573 569 568 587 568| 568 [ 411]
Alfalfa_HALF: 574 587 566 566 565| 565 [ 428]
Numbers
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 66 65 65 65 66| 65 [ 206]
x17 unrolled: 48 48 48 48 48| 48 [ 24]
FNV1A_Hash_WHIZ: 52 51 50 50 50| 50 [ 304]
FNV1A_unrolled_Final: 69 69 69 68 69| 68 [ 420]
FNV1A_unrolled: 70 69 68 68 68| 68 [ 420]
Alfalfa: 45 45 45 45 45| 45 [ 160]
Alfalfa_HALF: 54 53 53 53 53| 53 [ 288]
Prefix
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 165 162 162 162 162| 162 [ 106]
x17 unrolled: 205 204 204 204 204| 204 [ 113]
FNV1A_Hash_WHIZ: 129 146 128 128 128| 128 [ 100]
FNV1A_unrolled_Final: 123 122 122 122 122| 122 [ 101]
FNV1A_unrolled: 125 124 124 123 123| 123 [ 116]
Alfalfa: 199 219 200 200 200| 199 [ 104]
Alfalfa_HALF: 200 199 199 199 199| 199 [ 115]
Postfix
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 164 161 160 160 160| 160 [ 116]
x17 unrolled: 201 200 200 200 200| 200 [ 102]
FNV1A_Hash_WHIZ: 127 126 125 126 125| 125 [ 109]
FNV1A_unrolled_Final: 122 120 120 120 120| 120 [ 110]
FNV1A_unrolled: 120 119 263 118 118| 118 [ 102]
Alfalfa: 198 198 197 197 197| 197 [ 116]
Alfalfa_HALF: 198 197 196 302 196| 196 [ 111]
Variables
1842 lines read
4096 elements in the table (12 bits)
Sixtinsensitive: 433 416 415 415 436| 415 [ 374]
x17 unrolled: 438 434 433 451 433| 433 [ 368]
FNV1A_Hash_WHIZ: 369 361 358 360 383| 358 [ 353]
FNV1A_unrolled_Final: 363 354 353 352 353| 352 [ 352]
FNV1A_unrolled: 367 360 359 357 358| 357 [ 341]
Alfalfa: 440 415 413 413 413| 413 [ 343]
Alfalfa_HALF: 462 438 437 435 436| 435 [ 396]
Sonnets
3228 lines read
8192 elements in the table (13 bits)
Sixtinsensitive: 611 587 587 586 586| 586 [ 542]
x17 unrolled: 618 586 585 585 585| 585 [ 589]
FNV1A_Hash_WHIZ: 523 514 513 514 512| 512 [ 555]
FNV1A_unrolled_Final: 527 519 517 516 517| 516 [ 582]
FNV1A_unrolled: 531 525 522 523 523| 522 [ 570]
Alfalfa: 562 555 555 555 554| 554 [ 570]
Alfalfa_HALF: 589 579 579 577 578| 577 [ 543]
UTF-8
13408 lines read
32768 elements in the table (15 bits)
Sixtinsensitive: 2885 2813 2789 2793 2807| 2789 [ 2414]
x17 unrolled: 2910 2910 2889 2896 2915| 2889 [ 2392]
FNV1A_Hash_WHIZ: 2493 2492 2501 2473 2476| 2473 [ 2403]
FNV1A_unrolled_Final: 2479 2474 2494 2471 2483| 2471 [ 2446]
FNV1A_unrolled: 2508 2520 2547 2693 2470| 2470 [ 2421]
Alfalfa: 2735 2726 2725 2743 2761| 2725 [ 2415]
Alfalfa_HALF: 2927 2892 2900 2943 2933| 2892 [ 2445]
IPv4
3925 lines read
8192 elements in the table (13 bits)
Sixtinsensitive: 543 526 526 525 526| 525 [ 1443]
x17 unrolled: 444 444 443 444 443| 443 [ 829]
FNV1A_Hash_WHIZ: 397 396 394 394 395| 394 [ 1404]
FNV1A_unrolled_Final: 394 395 395 394 394| 394 [ 1419]
FNV1A_unrolled: 400 403 404 402 401| 400 [ 1419]
Alfalfa: 405 419 405 403 403| 403 [ 728]
Alfalfa_HALF: 434 433 432 432 436| 432 [ 813]

D:\_KAZE_new-stuff\_w>

The next four keys (words/phrases/sentences/paragraphs) occupy one slot, and 'Sixtinsensitive' performance improves as the key length increases:
pneumonoultramicroscopicsilicovolcanoconiosis: "A facetious word alleged to mean 'a lung disease caused by the inhalation of very fine silica dust' but occurring chiefly as an instance of a very long word." [OED] Online Etymology Dictionary, (c) 2010 Douglas Harper
Pneumonoultramicroscopicsilicovolcanoconiosis: "A Facetious Word Alleged To Mean 'a Lung Disease Caused By The Inhalation Of Very Fine Silica Dust' But Occurring Chiefly As An Instance Of A Very Long Word." [Oed] Online Etymology Dictionary, (C) 2010 Douglas Harper
PNEUMONOULTRAMICROSCOPICSILICOVOLCANOCONIOSIS: "A FACETIOUS WORD ALLEGED TO MEAN 'A LUNG DISEASE CAUSED BY THE INHALATION OF VERY FINE SILICA DUST' BUT OCCURRING CHIEFLY AS AN INSTANCE OF A VERY LONG WORD." [OED] ONLINE ETYMOLOGY DICTIONARY, (C) 2010 DOUGLAS HARPER
pneumonoultramicroscopicsilicovolcanoconiosis: "a facetious word alleged to mean 'a lung disease caused by the inhalation of very fine silica dust' but occurring chiefly as an instance of a very long word." [oed] online etymology dictionary, (c) 2010 douglas harper

D:\_KAZE_new-stuff\_w>hash.exe 4keys.txt
4 lines read
8 elements in the table (3 bits)
Sixtinsensitive: 7 6 6 6 6| 6 [ 3]
x17 unrolled: 10 10 10 10 10| 10 [ 0]
FNV1A_Hash_WHIZ: 5 5 5 5 5| 5 [ 1]
FNV1A_unrolled_Final: 4 4 4 4 4| 4 [ 0]
FNV1A_unrolled: 4 4 4 4 4| 4 [ 0]
Alfalfa: 10 10 10 10 10| 10 [ 0]
Alfalfa_HALF: 10 10 10 10 10| 10 [ 1]
D:\_KAZE_new-stuff\_w>

Once again, it would bring joy if 'Sixtinsensitive' could outperform 'FNV1A_unrolled'. Personally, I don't understand the above result; my expectation was that 'Sixtinsensitive' would whiz through such long keys, so obviously some things must be improved.
Peter Kankowski,

Georgi, here are the results on Pentium M:

Function | Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg
FNV1A_WHIZ | 83[124] | 444[418] | 61[304] | 136[100] | 133[109] | 362[353] | 523[555] | 2462[2411] | 436[1404] | 1.04[3.22]
FNV1A_unrolled | 82[115] | 442[414] | 75[420] | 133[101] | 129[110] | 362[352] | 530[582] | 2500[2446] | 441[1419] | 1.06[4.02]
Novak unrolled | 90[113] | 519[399] | 56[90] | 169[118] | 164[113] | 400[342] | 575[581] | 2713[2430] | 483[969] | 1.15[1.67]
SBox | 88[91] | 553[432] | 57[116] | 182[108] | 177[91] | 414[347] | 561[526] | 2811[2442] | 472[836] | 1.19[1.77]
Fletcher | 85[131] | 445[405] | 102[460] | 140[127] | 132[108] | 382[507] | 598[1052] | 2918[4893] | 515[1359] | 1.20[4.60]
CRC-32 | 90[101] | 562[426] | 56[64] | 196[107] | 191[94] | 426[338] | 594[563] | 2829[2400] | 469[725] | 1.22[1.40]
Murmur2 | 95[103] | 530[416] | 64[104] | 165[106] | 161[111] | 431[383] | 620[566] | 2919[2399] | 538[834] | 1.22[1.73]
x17 unrolled | 94[109] | 591[414] | 52[24] | 213[113] | 207[102] | 434[368] | 595[589] | 2866[2392] | 486[829] | 1.26[1.18]
Hash_Sixtinsensitive | 95[105] | 550[409] | 70[206] | 177[106] | 174[116] | 437[374] | 612[542] | 2905[2414] | 575[1443] | 1.27[2.54]
lookup3 | 95[101] | 565[412] | 70[97] | 188[101] | 182[95] | 432[361] | 632[550] | 2932[2392] | 573[834] | 1.29[1.64]
K&R | 95[106] | 618[437] | 58[288] | 221[94] | 218[106] | 446[360] | 602[561] | 3005[2365] | 448[831] | 1.30[2.99]
Paul Larson | 94[99] | 630[416] | 49[16] | 231[99] | 228[105] | 455[366] | 601[583] | 3025[2447] | 468[755] | 1.31[1.09]
Alfalfa_HALF | 96[97] | 596[428] | 62[288] | 210[115] | 206[111] | 460[396] | 629[543] | 3055[2445] | 479[813] | 1.31[3.02]
Bernstein | 95[114] | 622[412] | 61[288] | 225[100] | 222[102] | 445[353] | 595[572] | 3001[2380] | 471[703] | 1.32[2.98]
Paul Hsieh | 106[114] | 574[419] | 71[118] | 184[101] | 180[100] | 456[341] | 681[600] | 3146[2380] | 580[847] | 1.33[1.82]
x65599 | 95[111] | 626[382] | 61[203] | 234[107] | 231[122] | 451[379] | 596[560] | 2997[2373] | 474[846] | 1.34[2.44]
Sedgewick | 101[107] | 666[413] | 53[48] | 244[103] | 241[103] | 478[348] | 630[570] | 3212[2437] | 477[782] | 1.38[1.32]
FNV-1a | 101[124] | 658[428] | 63[108] | 239[94] | 236[105] | 472[374] | 624[555] | 3130[2446] | 518[807] | 1.40[1.76]
Alfalfa | 97[100] | 646[412] | 54[72] | 237[103] | 352[484] | 460[363] | 613[584] | 2991[2365] | 493[810] | 1.45[1.98]
MaPrime2c | 108[103] | 705[425] | 65[106] | 255[91] | 253[106] | 507[349] | 676[550] | 3413[2406] | 542[865] | 1.49[1.72]
Alfalfa_DWORD | 102[100] | 658[412] | 57[72] | 241[103] | 362[484] | 484[363] | 653[584] | 3175[2392] | 512[896] | 1.51[2.00]
Ramakrishna | 108[108] | 727[409] | 61[91] | 279[125] | 272[103] | 511[360] | 666[528] | 3371[2383] | 518[840] | 1.52[1.65]
Arash Partow | 106[101] | 742[435] | 94[420] | 280[98] | 274[85] | 517[355] | 672[570] | 3344[2372] | 543[779] | 1.60[3.87]
One At Time | 118[105] | 831[421] | 81[110] | 319[97] | 316[103] | 575[364] | 741[545] | 3800[2346] | 657[795] | 1.76[1.74]
Weinberger | 120[104] | 959[423] | 54[100] | 376[111] | 379[117] | 620[364] | 748[712] | 3994[2547] | 561[744] | 1.84[1.74]
Hanson | 87[118] | 532[649] | 55[112] | 168[118] | 1603[499] | 393[435] | 551[592] | 2746[2890] | 462[833] | 2.38[2.44]
Function | Wikipedia | Avg
x17 unrolled | 11405396[2410605] | 1.00[1.16]
K&R | 11796999[2083145] | 1.03[1.00]
Alfalfa_HALF | 11830555[2077426] | 1.04[1.00]
Bernstein | 11862646[2074237] | 1.04[1.00]
Paul Larson | 12003126[2080111] | 1.05[1.00]
Alfalfa | 11962051[2196163] | 1.05[1.06]
Alfalfa_DWORD | 12081043[2196163] | 1.06[1.06]
Sedgewick | 12224373[2080640] | 1.07[1.00]
x65599 | 12164116[2102893] | 1.07[1.01]
Ramakrishna | 12294348[2093253] | 1.08[1.01]
Arash Partow | 12379799[2084572] | 1.09[1.00]
FNV1A_WHIZ | 12536783[2189360] | 1.10[1.06]
FNV1A_unrolled | 12592869[2252381] | 1.10[1.09]
CRC-32 | 12743823[2075088] | 1.12[1.00]
Murmur2 | 12816383[2081476] | 1.12[1.00]
Hash_Sixtinsensitive | 12723929[2139242] | 1.12[1.03]
Hanson | 12777632[2129832] | 1.12[1.03]
SBox | 12851471[2084018] | 1.13[1.00]
lookup3 | 12908876[2084889] | 1.13[1.01]
FNV-1a | 13004759[2081195] | 1.14[1.00]
Paul Hsieh | 13105724[2180206] | 1.15[1.05]
MaPrime2c | 13495196[2084467] | 1.18[1.00]
One At Time | 13801915[2087861] | 1.21[1.01]
Weinberger | 14759180[3541181] | 1.29[1.71]
Fletcher | 37809242[9063797] | 3.32[4.37]
Novak unrolled | 38047816[6318611] | 3.34[3.05]

This code:

if (wrdlen & -2) {
  hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
  p++;p++;
}
should be:
if (wrdlen & sizeof(WORD)) {
  hash = ((hash ^ *(WORD*)p)<<5) - (hash ^ *(WORD*)p);
  p += sizeof(WORD);
}

With *(DWORD*), a buffer overrun is possible at the end of a memory page. Also note wrdlen & 2 instead of wrdlen & -2.

Generally, Whiz is one of the fastest hash functions at this moment :) It proves that a simpler hash function can be faster despite a higher number of collisions. Thank you very much for your contribution!

Georgi 'Sanmayce',
Thank you Peter,
dummy me, you are right: 'if (wrdlen & -2) {' becomes 'if (wrdlen & 2) {'.
The idea was/is to cover the four cases 0, 1, 2, 3 with two IFs.

As for 'With *(DWORD*), a buffer overrun is possible at the end of a memory page': I knew about it, but was fooled by the assembly code generated by VS2010, which translates it into a word access:
; 792 : hash32 = FNV_32A_OP32(hash32, *(UINT*)p&0xFFFF);

00360 0f b7 30 movzx esi, WORD PTR [eax]
00363 33 f1 xor esi, ecx
00365 69 f6 47 06 00
00 imul esi, 1607 ; 00000647H
0036b 8b ce mov ecx, esi

In fact, I still don't know how to operate directly on words (two bytes; unsigned short int may be 32-bit instead of 16-bit, yes?) in C. I was eager to share these functions; glad that you fixed the bugs.

But one mystery (caused by alignment!?) remains: why do the Wikipedia results on my Pentium Merom show a very different roster!
Peter Kankowski,

I've just run the Wikipedia test on Core i5. Whiz is very fast.

Unfortunately, it's slower on Pentium M. I repeated the test twice on both processors, and the results don't change. Most likely, there is some unlucky difference between microarchitectures, but we should optimize for the newer processors (your Merom and my Lynnfield).

Georgi 'Sanmayce',
I agree; in my view, no function should be underestimated, i.e., every function is a rough gem. Thanks again.
I have some high hopes (due to cheap lowercasing and 4+2 granularity) for 'Sixtinsensitive', already incorporated in Leprechaun, with very consistent distribution regardless of FNV1_32_PRIME = 3, 5, 7, 17, 31, 127, 257, 8191.
ace,
> unlucky difference between microarchitectures

Isn't the reason for your observations simply that your source strings aren't aligned anymore? That's why Murmur and anything that uses "dwords" should be penalized on all architectures that are affected by alignment. I believe if you tried with MIPS or ARM, you'd also see the problems of unaligned access. But as you also see now, it's very convenient to have strings not starting on an aligned address (you can omit a lot of allocations and copying, and also reduce the memory needs).

But you also know my general opinion: small differences in speed (under 10 or 20%, and varying across platforms) aren't too relevant; there has to be some more dramatic demonstration of the superiority of one function over another in order to claim a really "relevant" winner, and the winner should be "overall good", not only "good on the CPU I have at the moment". I'd always prefer the function that is "near the best" on more platforms to the one that is "the best on one CPU and bad on others."
Peter Kankowski,

Ace, the strings were not aligned in the earlier version of the program. There was an arena allocator: a large char array and a (not aligned) pointer into this char array, which was increased as the lines were read from file.

I agree that the winner should be overall good, having near the best speed and good number of collisions in all tests. At this time, Murmur2 is the best; hardware-accelerated CRC may be used if the target CPU is known to support it.

ace,
I don't know right now (can you please check it yourself?) but I think I concluded recently, just by looking at the source (I didn't try anything else), that the string pointed to was not aligned. When I say "something not aligned" I always refer to the thing whose address is not aligned, not "the content of which is not a product of some power of 2"; that is why I believed that the pointers were aligned but the strings pointed to were not (and why your "(not aligned) pointer" sounds confusing to me). Anyway, that means that reading the string dword by dword must have been suboptimal, and I know that Intel optimized such cases only starting with the 45 nm Core 2. Everything before (even in the Core 2 line) has a penalty, everything after hasn't, and I think that matches what you measure in "big pm"; I'd expect the effect to be more obvious the longer the tested strings are. I still believe you look too much at the "latest big x86 in possession" (Atom also being x86, for example, not to mention other CPUs).
Peter Kankowski,
Ace, thank you very much for your ideas. I tried aligning the strings, but it does not help. (BTW, Fletcher becomes much faster when the strings are aligned.) Still looking for the reason.
ace,
How big were the tables you tested with aligned strings, and did you measure on the Pentium M or on something else? Can you please post exactly what you measured? I'm curious to see it. I believed there would be some visible difference in the "big pm" results, especially if the strings were somewhat bigger than plain words. But the i5 should be even better with unaligned strings, because the total memory footprint must be smaller due to the denser packing, and the hardware optimizes every access.
Georgi 'Sanmayce',
Hashing words from wikipedia-en-html.tar on three Intel-powered PCs (I currently have no access to AMD CPUs), so here are three Intel CPUs under fire: a subnotebook Toshiba PC, a notebook Toshiba PC, and a desktop PC.
[Subnotebook Toshiba PC:]

Number of cores		1 (max 1)
Number of threads	2 (max 2)
Name			Intel Atom N450
Codename		Pineview-N
Specification		Intel(R) Atom(TM) CPU N450   @ 1.66GHz
Package (platform ID)	Socket 437 FCBGA8 (0x2)
CPUID			6.C.A
Extended CPUID		6.1C
Core Stepping		B0
Technology		45 nm
Core Speed		1662.7 MHz
Multiplier x FSB	10.0 x 166.3 MHz
Rated Bus speed		665.1 MHz
Stock frequency		1666 MHz
Instructions sets	MMX, SSE, SSE2, SSE3, SSSE3, EM64T
L1 Data cache		24 KBytes, 6-way set associative, 64-byte line size
L1 Instruction cache	32 KBytes, 8-way set associative, 64-byte line size
L2 cache		512 KBytes, 8-way set associative, 64-byte line size

Channels			Single
CAS# latency (CL)		5.0
RAS# to CAS# delay (tRCD)	5
RAS# Precharge (tRP)		5
Cycle Time (tRAS)		13
Row Refresh Cycle Time (tRFC)	44
Command Rate (CR)		2T

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Test\hash>dir

11/07/2010  06:47 AM            84,992 hash.exe
10/27/2010  03:48 PM       146,973,879 wikipedia-en-html.tar.wrd

C:\Test\hash>hash wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
    Sixtinsensitive+:   16153226  16031348  16039844  16048697  16026421|  16026421 [2251734] !FASTEST!
Hash_Sixtinsensitive:   16501364  16504662  16506868  16509768  16515127|  16501364 [2139242]
                Whiz:   16081498  16080691  16080162  16079881  16085656|  16079881 [2189360] !Second-to-FASTEST! 
           Bernstein:   16222890  16208653  16195534  16205258  16207576|  16195534 [2074237]
                 K&R:   16181141  16153089  16147347  16144145  16144279|  16144145 [2083145]
        x17 unrolled:   16196693  16200720  16203124  16195429  16196980|  16195429 [2410605]
              x65599:   17213371  17215886  17211092  17225140  17218658|  17211092 [2102893]
              FNV-1a:   17648084  17649994  17644643  17654437  17650937|  17644643 [2081195]
           Sedgewick:   17715483  17711538  17706339  17709057  17711857|  17706339 [2080640]
          Weinberger:   19795514  19798517  19792040  19783954  19796407|  19783954 [3541181]
         Paul Larson:   16904068  16909317  16906315  16905215  16899096|  16899096 [2080111]
          Paul Hsieh:   16731679  16720594  16932967  16731003  16725432|  16720594 [2180206]
         One At Time:   18317093  18329453  18319193  18325990  18318877|  18317093 [2087861]
             lookup3:   16591932  16600364  16593657  16594726  16605181|  16591932 [2084889]
        Arash Partow:   16790714  16782529  16783414  16782986  16791156|  16782529 [2084572]
              CRC-32:   16878539  16872606  16866719  16876415  16860548|  16860548 [2075088]
         Ramakrishna:   16437073  16449160  16438019  16442447  16452696|  16437073 [2093253]
            Fletcher:   74995725  74991965  74983105  75027126  75020605|  74983105 [9063797]
             Murmur2:   16742975  16730183  16736149  16741761  16735690|  16730183 [2081476]
              Hanson:   17286767  17288260  17273143  17280187  17371322|  17273143 [2129832]
      Novak unrolled:   60320056  60322432  60315223  60308709  60325221|  60308709 [6318611]
                SBox:   17677039  17685532  17686854  17681897  17680986|  17677039 [2084018]
           MaPrime2c:   19286056  19296348  19307574  19294326  19295151|  19286056 [2084467]

C:\Test\hash>

[Notebook Toshiba PC:]

CPU Type: Mobile DualCore Intel Pentium T3400  
CPU Alias: Merom-1M  
CPU Stepping: M0  
Engineering Sample: No  
CPUID CPU Name: Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz  
CPU Clock: 2161.5 MHz (original: 2166 MHz)  
CPU Multiplier: 13x  
CPU FSB: 166.3 MHz (original: 166 MHz)  
L1 Code Cache: 32 KB per core  
L1 Data Cache: 32 KB per core  
L2 Cache: 1 MB (On-Die, ECC, ASC, Full-Speed)  
Motherboard Name: Toshiba Satellite L305  
Memory Timings: 5-5-5-13 (CL-RCD-RP-RAS)  
Front Side Bus Properties: 
 Real Clock: 167 MHz (QDR)  
 Effective Clock: 666 MHz  
 Bandwidth: 5332 MB/s  
Memory Bus Properties:  
 Bus Type: Dual DDR2 SDRAM  
 Bus Width: 128-bit  
 DRAM:FSB Ratio: 10:5  
 Real Clock: 333 MHz (DDR)  
 Effective Clock: 666 MHz  
 Bandwidth: 10663 MB/s  

D:\_KAZE_new-stuff>hash wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
    Sixtinsensitive+:   12058258  11973671  11953887  11970400  11973268|  11953887 [2251734]
Hash_Sixtinsensitive:   12387803  12368726  12437268  12419321  12367736|  12367736 [2139242]
                Whiz:   10594894  10600128  10582191  10593562  10580720|  10580720 [2189360] !FASTEST!
           Bernstein:   12157227  12160397  12163431  12152197  12206038|  12152197 [2074237]
                 K&R:   12113889  12062050  12066080  12053732  12058839|  12053732 [2083145]
        x17 unrolled:   11831265  11851909  11849651  11834030  11842581|  11831265 [2410605] !Second-to-FASTEST!
              x65599:   12015548  12059651  12043365  12003264  12014828|  12003264 [2102893]
              FNV-1a:   12512030  12476285  12472843  12464470  12481049|  12464470 [2081195]
           Sedgewick:   12423386  12430959  12446600  12481711  12417274|  12417274 [2080640]
          Weinberger:   15886398  15882493  15873672  15871503  15887145|  15871503 [3541181]
         Paul Larson:   11865686  11875179  11880320  11930336  11883977|  11865686 [2080111]
          Paul Hsieh:   12743450  12720752  12740266  12722144  12733419|  12720752 [2180206]
         One At Time:   13025820  13037695  13023502  13036316  13087593|  13023502 [2087861]
             lookup3:   12513213  12510938  12517817  12521024  12528520|  12510938 [2084889]
        Arash Partow:   12616358  12613361  12610421  12609956  12596411|  12596411 [2084572]
              CRC-32:   12407442  12316574  12336861  12333267  12332221|  12316574 [2075088]
         Ramakrishna:   12281227  12301150  12289356  12282034  12288241|  12281227 [2093253]
            Fletcher:   50147553  50090287  50105264  50208280  50116095|  50090287 [9063797]
             Murmur2:   12188900  12187655  12192879  12190325  12256534|  12187655 [2081476]
              Hanson:   12475359  12463011  12446859  12444451  12445178|  12444451 [2129832]
      Novak unrolled:   37262357  37364140  37383476  37304713  37319954|  37262357 [6318611]
                SBox:   12414330  12413821  12512329  12421929  12411736|  12411736 [2084018]
           MaPrime2c:   12973562  12980602  12983752  12983859  12987014|  12973562 [2084467]

D:\_KAZE_new-stuff>

[Desktop PC:]

Number of cores		4 (max 4)
Number of threads	4 (max 4)
Name			Intel Core 2 Quad Q9550S
Codename		Yorkfield
Specification		Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
Package (platform ID)	Socket 775 LGA (0x4)
CPUID			6.7.A
Extended CPUID		6.17
Core Stepping		E0
Technology		45 nm
Core Speed		2002.2 MHz
Multiplier x FSB	6.0 x 333.7 MHz
Rated Bus speed		1334.8 MHz
Stock frequency		2833 MHz
Instructions sets	MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T, VT-x
L1 Data cache		4 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache	4 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache		2 x 6144 KBytes, 24-way set associative, 64-byte line size
TDP Limit		65 Watts
Northbridge			Intel G41 rev. A3
Southbridge			Intel 82801GB (ICH7/R) rev. A1
Memory Type			DDR3
Memory Size			4096 MBytes
Channels			Dual, (Symmetric)
Memory Frequency		667.4 MHz (1:2)
CAS# latency (CL)		7.0
RAS# to CAS# delay (tRCD)	7
RAS# Precharge (tRP)		7
Cycle Time (tRAS)		21
Row Refresh Cycle Time (tRFC)	60
Command Rate (CR)		2T

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

D:\TESTS\hash>hash wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
    Sixtinsensitive+:    9053718   8989266   8951519   8952219   8949380|   8949380 [2251734]
Hash_Sixtinsensitive:    9269264   9264150   9263994   9263761   9263358|   9263358 [2139242]
                Whiz:    7865326   7856200   7859796   7866475   7852540|   7852540 [2189360] !FASTEST!
           Bernstein:    9081460   9069960   9068900   9070469   9079405|   9068900 [2074237]
                 K&R:    8993801   8992382   8990556   8983363   8986733|   8983363 [2083145]
        x17 unrolled:    8682855   8681626   8692004   8684190   8682793|   8681626 [2410605] !Second-to-FASTEST!
              x65599:    8793795   8792213   8791230   8790913   8801408|   8790913 [2102893]
              FNV-1a:    9391666   9382336   9379497   9379865   9389164|   9379497 [2081195]
           Sedgewick:    9091018   9085970   9085732   9094216   9097600|   9085732 [2080640]
          Weinberger:   11999347  11988551  11987173  11995011  11991473|  11987173 [3541181]
         Paul Larson:    8788831   8797335   8784061   8788220   8785700|   8784061 [2080111]
          Paul Hsieh:    9470151   9474939   9471454   9470880   9483763|   9470151 [2180206]
         One At Time:    9856512   9865262   9853988   9864387   9856661|   9853988 [2087861]
             lookup3:    9346552   9341545   9346329   9338833   9346559|   9338833 [2084889]
        Arash Partow:    9341715   9323081   9332539   9338738   9333211|   9323081 [2084572]
              CRC-32:    9159684   9134865   9138729   9155572   9155020|   9134865 [2075088]
         Ramakrishna:    9183066   9178082   9191092   9184270   9187847|   9178082 [2093253]
            Fletcher:   31111801  31116622  31094619  31055130  31036218|  31036218 [9063797]
             Murmur2:    9013410   9011146   9010973   9016315   9016830|   9010973 [2081476]
              Hanson:    9275481   9276946   9284989   9279977   9281620|   9275481 [2129832]
      Novak unrolled:   17390159  17391792  17387836  17389504  17391874|  17387836 [6318611]
                SBox:    9318679   9317361   9316436   9316997   9319220|   9316436 [2084018]
           MaPrime2c:    9819245   9818182   9820444   9816287   9817004|   9816287 [2084467]

D:\TESTS\hash>
For full info with non-proportional fonts:
http://encode.ru/threads/1155-A-new-match-searching-structure?p=22923#post22923
Peter Kankowski,

Ace, here are the results on Wikipedia, Pentium M:

Function | Aligned | Not aligned | Collisions
x17 unrolled | 1,00 | 1,00 | [2410605]
K&R | 1,03 | 1,03 | [2083145]
Bernstein | 1,04 | 1,04 | [2074237]
Paul Larson | 1,05 | 1,05 | [2080111]
x65599 | 1,07 | 1,06 | [2102893]
Sedgewick | 1,07 | 1,07 | [2080640]
Ramakrishna | 1,08 | 1,07 | [2093253]
Arash Partow | 1,08 | 1,08 | [2084572]
Whiz | 1,09 | 1,09 | [2189360]
CRC-32 | 1,10 | 1,11 | [2075088]
Murmur2 | 1,11 | 1,12 | [2081476]
Hanson | 1,11 | 1,11 | [2129832]
SBox | 1,12 | 1,12 | [2084018]
lookup3 | 1,11 | 1,13 | [2084889]
FNV-1a | 1,14 | 1,14 | [2081195]
Paul Hsieh | 1,14 | 1,15 | [2180206]
Murmur2A | 1,15 | 1,15 | [2081370]
MaPrime2c | 1,18 | 1,18 | [2084467]
One At Time | 1,21 | 1,21 | [2087861]
Weinberger | 1,29 | 1,29 | [3541181]
Fletcher | 3,32 | 3,30 | [9063797]
Novak unrolled | 3,35 | 3,33 | [6318611]

In short, no difference. The small test is more interesting. Here is the aligned version, the not aligned one can be found above:

Function | Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg
Whiz | 80[124] | 421[418] | 61[304] | 129[100] | 125[109] | 350[353] | 510[555] | 2393[2411] | 435[1404] | 1.03[3.22]
Novak unrolled | 88[113] | 505[399] | 55[90] | 165[118] | 159[113] | 390[342] | 572[581] | 2670[2430] | 482[969] | 1.16[1.67]
Fletcher | 83[131] | 437[406] | 97[460] | 134[127] | 126[108] | 373[507] | 579[1052] | 2815[4893] | 514[1359] | 1.18[4.60]
lookup3 | 88[101] | 497[412] | 67[97] | 158[101] | 153[95] | 388[361] | 585[550] | 2682[2392] | 570[834] | 1.20[1.64]
SBox | 88[91] | 544[431] | 57[116] | 177[108] | 173[91] | 410[347] | 557[526] | 2815[2442] | 472[836] | 1.20[1.77]
Murmur2 | 94[103] | 514[415] | 63[104] | 156[106] | 153[111] | 421[383] | 612[566] | 2856[2399] | 537[834] | 1.22[1.73]
CRC-32 | 89[101] | 550[426] | 54[64] | 191[107] | 186[94] | 414[338] | 578[563] | 2750[2400] | 469[725] | 1.23[1.40]
x17 unrolled | 94[109] | 590[415] | 52[24] | 209[113] | 204[102] | 433[368] | 594[589] | 2867[2392] | 485[829] | 1.29[1.18]
K&R | 92[106] | 610[437] | 58[288] | 217[94] | 214[106] | 439[360] | 584[561] | 2925[2365] | 448[831] | 1.31[2.99]
Paul Hsieh | 104[114] | 557[420] | 70[118] | 174[101] | 171[100] | 448[341] | 667[600] | 3106[2380] | 578[847] | 1.33[1.82]
Paul Larson | 93[99] | 623[416] | 49[16] | 229[99] | 226[105] | 451[366] | 597[583] | 2997[2447] | 469[755] | 1.34[1.09]
Bernstein | 94[114] | 614[412] | 60[288] | 221[100] | 218[102] | 440[353] | 588[572] | 2970[2380] | 469[703] | 1.34[2.98]
x65599 | 93[111] | 620[382] | 60[203] | 230[107] | 228[122] | 448[379] | 595[560] | 2977[2373] | 472[846] | 1.36[2.44]
Sedgewick | 100[107] | 660[414] | 52[48] | 240[103] | 238[103] | 474[348] | 625[570] | 3171[2437] | 476[782] | 1.40[1.32]
Murmur2A | 110[114] | 581[433] | 77[102] | 173[112] | 169[109] | 479[365] | 708[544] | 3315[2369] | 650[772] | 1.41[1.72]
FNV-1a | 101[124] | 651[428] | 62[108] | 235[94] | 233[105] | 470[374] | 624[555] | 3147[2446] | 517[807] | 1.43[1.76]
MaPrime2c | 107[103] | 698[426] | 64[106] | 250[91] | 248[106] | 506[349] | 675[550] | 3417[2406] | 540[865] | 1.52[1.72]
Ramakrishna | 106[108] | 721[409] | 61[91] | 273[125] | 268[103] | 507[360] | 659[528] | 3354[2383] | 517[840] | 1.54[1.65]
Arash Partow | 106[101] | 725[435] | 91[420] | 276[98] | 270[85] | 508[355] | 671[570] | 3317[2372] | 542[779] | 1.62[3.87]
One At Time | 119[105] | 825[421] | 81[110] | 315[97] | 312[103] | 576[364] | 753[545] | 3851[2346] | 657[795] | 1.81[1.74]
Weinberger | 118[104] | 955[422] | 54[100] | 374[111] | 375[117] | 621[364] | 733[712] | 4008[2547] | 560[744] | 1.88[1.74]
Hanson | 86[118] | 524[649] | 55[112] | 165[118] | 1476[499] | 388[435] | 546[592] | 2696[2890] | 462[833] | 2.32[2.44]

The functions that read a WORD at a time (Fletcher and Paul Hsieh) become faster when the strings are aligned. Lookup3 relies on alignment (see its code), so it's also faster in this version.
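Why alignment matters: an unaligned DWORD load either costs extra cycles or, on strict-alignment CPUs, traps. A minimal, portable way to express such a load (an illustrative sketch, not code from the benchmark) is to let memcpy do it; compilers turn it into a single MOV on x86:

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned 32-bit load: on x86 this compiles to one MOV;
 * on strict-alignment targets the compiler emits a safe byte sequence.
 * (Illustrative sketch, not part of the benchmark code.) */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A word-at-a-time hash built on this helper stays correct regardless of the string's starting address, at the cost of whatever the target charges for the unaligned access.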

Georgi, thanks for publishing your results.

Georgi 'Sanmayce',
I purged all previous revisions of 'FNV1A' and now I have 4 base variants:

- FNV1A_Whiz, FNV1_32_PRIME=709607
- FNV1A_Smaragd, FNV1_32_PRIME=709607
- FNV1A_Peregrine, FNV1_32_PRIME=709607
- FNV1A_Nefertiti, FNV1_32_PRIME=31 (in fact MULless)

Romanticism in me dictates this naming: variants that had no personality now have their own first name (Whiz, Smaragd, Peregrine, Nefertiti) and a family name (FNV1A). Not to mention the clarity when references are to be made.
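All four variants share the same FNV-1a skeleton and differ mainly in how many bytes they consume per step. A plain byte-at-a-time baseline using the 709607 multiplier quoted above (a reference sketch, not one of Georgi's optimized versions) looks like this:

```c
#include <stdint.h>
#include <stddef.h>

/* Byte-at-a-time FNV-1a skeleton shared by the named variants above.
 * 2166136261 is the standard FNV-1a offset basis; 709607 is the
 * multiplier quoted in the post. The final fold mixes the high half
 * into the low half, as the variants do. (Sketch for reference only.) */
static uint32_t fnv1a_709607(const char *key, size_t len)
{
    uint32_t hash = 2166136261u;
    for (size_t i = 0; i < len; i++)
        hash = (hash ^ (unsigned char)key[i]) * 709607u;
    return hash ^ (hash >> 16);
}
```

The optimized variants keep exactly this state-update rule but feed it WORDs, DWORDs, or QWORDs per iteration instead of single bytes.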

To obtain the sources (with corresponding 32-bit instructions) of the above four (plus Sixtinsensitive+ and Sixtinsensitive and four Alfalfa variants), download the PDF booklet at: http://www.sanmayce.com/Downloads/_Kaze_10-HASHERS.pdf

I expect Peregrine (using one 64-bit memory access), when compiled for 64-bit, to outspeed Nefertiti (former 'FNV1A_unrolled_Final').
On the other hand, Smaragd (using one 32-bit memory access, but with 2 passes) is slower than Nefertiti but has fewer collisions.
And Whiz is just a lucky prodigy.
I think Peregrine is the most promising hasher, because for longer strings, with granularity 8 (full-fledged when compiled as 64-bit), it will fly for sure.
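The "granularity 8" idea can be sketched as an FNV-1a loop that consumes one 64-bit word per iteration, falling back to bytes for the tail. This is a hypothetical illustration, NOT Georgi's FNV1A_Peregrine (see his PDF for the real source); the 709607 prime is taken from the post:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of an 8-byte-granularity FNV-1a: one 64-bit load
 * per iteration, mixed into the 32-bit state in two halves, then a
 * byte-at-a-time tail. Not Georgi's actual FNV1A_Peregrine code. */
static uint32_t fnv1a_qword_sketch(const char *key, size_t len)
{
    const uint32_t PRIME = 709607u;
    uint32_t hash = 2166136261u;
    while (len >= sizeof(uint64_t)) {
        uint64_t q;
        memcpy(&q, key, sizeof q);               /* one 64-bit memory access */
        hash = (hash ^ (uint32_t)q) * PRIME;
        hash = (hash ^ (uint32_t)(q >> 32)) * PRIME;
        key += sizeof q;
        len -= sizeof q;
    }
    while (len--)                                /* 0..7 byte tail */
        hash = (hash ^ (unsigned char)*key++) * PRIME;
    return hash ^ (hash >> 16);
}
```

When compiled for a 64-bit target, the single load replaces two 32-bit loads per 8 bytes, which is where the expected speedup for long strings comes from.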

I want more people to experience the hash-benchmark fun, so here is my newest test package (131,981,551 bytes):
http://www.sanmayce.com/Downloads/_KAZE_hash_test_r1.rar

It looks like this:

D:\_KAZE_new-stuff\_KAZE_hash_test_r1>dir

11/11/2010 07:52 AM 195,935 hash.cod
11/11/2010 07:52 AM 86,528 hash.exe
11/11/2010 07:31 AM <DIR> Peter_source
11/11/2010 07:52 AM 1,748 Runme.bat
11/11/2010 07:52 AM 4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt
11/11/2010 07:52 AM 388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd
11/11/2010 07:52 AM 1,121,365 Word-list_00,105,982_English_Spell-Check_High-Quality.wrd
11/11/2010 07:52 AM 4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd
11/11/2010 07:52 AM 7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv
11/11/2010 07:52 AM 146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd
11/11/2010 07:52 AM 278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd

D:\_KAZE_new-stuff\_KAZE_hash_test_r1>Runme.bat
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv" 1>"Word-list_00,584,879.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,584,879.txt" 1>"Word-list_00,584,879_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,584,879.txt" 1>"Word-list_00,584,879_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Sentence-list_00,032,359_English_The_Holy_Bible.txt" 1>"Sentence-list_00,032,359.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Sentence-list_00,032,359.txt" 1>"Sentence-list_00,032,359_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Sentence-list_00,032,359.txt" 1>"Sentence-list_00,032,359_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd" 1>"Word-list_00,038,936.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,038,936.txt" 1>"Word-list_00,038,936_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,038,936.txt" 1>"Word-list_00,038,936_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,105,982_English_Spell-Check_High-Quality.wrd" 1>"Word-list_00,105,982.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,105,982.txt" 1>"Word-list_00,105,982_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,105,982.txt" 1>"Word-list_00,105,982_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd" 1>"Word-list_00,351,114.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,351,114.txt" 1>"Word-list_00,351,114_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,351,114.txt" 1>"Word-list_00,351,114_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd" 1>"Word-list_12,561,874.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_12,561,874.txt" 1>"Word-list_12,561,874_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_12,561,874.txt" 1>"Word-list_12,561,874_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd" 1>"Word-list_22,202,980.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_22,202,980.txt" 1>"Word-list_22,202,980_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_22,202,980.txt" 1>"Word-list_22,202,980_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>echo Done.
Done.

D:\_KAZE_new-stuff\_KAZE_hash_test_r1>dir

11/11/2010 07:52 AM 195,935 hash.cod
11/11/2010 07:52 AM 86,528 hash.exe
11/11/2010 07:31 AM <DIR> Peter_source
11/11/2010 07:52 AM 1,748 Runme.bat
11/11/2010 08:30 AM 2,937 Sentence-list_00,032,359.txt
11/11/2010 08:30 AM 2,937 Sentence-list_00,032,359_COLLISIONS.txt
11/11/2010 07:52 AM 4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt
11/11/2010 08:30 AM 2,937 Sentence-list_00,032,359_SPEED.txt
11/11/2010 08:30 AM 2,938 Word-list_00,038,936.txt
11/11/2010 08:30 AM 2,938 Word-list_00,038,936_COLLISIONS.txt
11/11/2010 07:52 AM 388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd
11/11/2010 08:30 AM 2,938 Word-list_00,038,936_SPEED.txt
11/11/2010 08:30 AM 2,939 Word-list_00,105,982.txt
11/11/2010 08:30 AM 2,939 Word-list_00,105,982_COLLISIONS.txt
11/11/2010 07:52 AM 1,121,365 Word-list_00,105,982_English_Spell-Check_High-Quality.wrd
11/11/2010 08:30 AM 2,939 Word-list_00,105,982_SPEED.txt
11/11/2010 08:30 AM 2,940 Word-list_00,351,114.txt
11/11/2010 08:30 AM 2,940 Word-list_00,351,114_COLLISIONS.txt
11/11/2010 07:52 AM 4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd
11/11/2010 08:30 AM 2,940 Word-list_00,351,114_SPEED.txt
11/11/2010 08:30 AM 2,940 Word-list_00,584,879.txt
11/11/2010 08:30 AM 2,940 Word-list_00,584,879_COLLISIONS.txt
11/11/2010 07:52 AM 7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv
11/11/2010 08:30 AM 2,940 Word-list_00,584,879_SPEED.txt
11/11/2010 08:47 AM 2,943 Word-list_12,561,874.txt
11/11/2010 08:47 AM 2,943 Word-list_12,561,874_COLLISIONS.txt
11/11/2010 08:47 AM 2,943 Word-list_12,561,874_SPEED.txt
11/11/2010 07:52 AM 146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd
11/11/2010 09:19 AM 2,943 Word-list_22,202,980.txt
11/11/2010 09:19 AM 2,943 Word-list_22,202,980_COLLISIONS.txt
11/11/2010 09:19 AM 2,943 Word-list_22,202,980_SPEED.txt
11/11/2010 07:52 AM 278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd

For Intel T3400 Merom 2.16GHz, the contents of 'Sentence-list_00,032,359_SPEED.txt', which in fact contains the 32,359 sentences of The Holy Bible:

65536 elements in the table (16 bits)
32359 lines read
Fletcher: 28567 28709 28329 28521 28596| 28329 [ 7209]
FNV1A_Nefertiti: 30434 29919 30281 30112 30354| 29919 [ 6878]
FNV1A_Peregrine: 30652 30654 30774 30711 30728| 30652 [ 6838]
FNV1A_Whiz: 31917 31798 31608 31624 31934| 31608 [ 6874]
Murmur2: 32442 31898 31622 31843 31838| 31622 [ 6786]
Sixtinsensitive+: 34803 35073 35052 35265 35171| 34803 [ 6839]
SBox: 35864 36651 35840 35964 36036| 35840 [ 6839]
Novak unrolled: 37072 36698 36464 36601 37234| 36464 [ 6826]
Sixtinsensitive: 38126 38161 38783 38085 38243| 38085 [ 6876]
Paul Hsieh: 41114 40952 40467 40903 41335| 40467 [ 6874]
FNV1A_Smaragd: 40618 40934 40921 40550 40621| 40550 [ 6849]
lookup3: 42606 42204 42536 42584 42619| 42204 [ 6805]
Alfalfa_QWORD: 48411 47007 47281 47491 47252| 47007 [ 6943]
CRC-32: 47458 48495 47733 47692 48641| 47458 [ 6891]
Hanson: 49364 50769 49310 49310 49223| 49223 [ 19602]
Alfalfa: 53467 52707 53514 52711 52163| 52163 [ 6943]
x65599: 52965 53324 52546 52944 52932| 52546 [ 6859]
Paul Larson: 53238 52846 52558 53073 52954| 52558 [ 6889]
x17 unrolled: 53407 53882 53568 53693 53760| 53407 [ 6827]
Alfalfa_HALF: 53931 54075 54072 53659 53691| 53659 [ 6821]
Alfalfa_DWORD: 56396 56912 57265 57019 56184| 56184 [ 6943]
FNV-1a: 61401 61344 61129 60464 61180| 60464 [ 6840]
Bernstein: 62958 62738 63534 62466 62337| 62337 [ 6858]
K&R: 63326 63728 62776 62787 62992| 62776 [ 6785]
Sedgewick: 64195 63916 64174 63471 64562| 63471 [ 6858]
MaPrime2c: 63779 64134 63573 63849 64284| 63573 [ 6950]
Ramakrishna: 68307 69456 68476 68773 67562| 67562 [ 6943]
Arash Partow: 75811 75134 74359 74849 75068| 74359 [ 6845]
One At Time: 75324 76130 75891 75592 76803| 75324 [ 6937]
Weinberger: 102085 103031 102324 101752 101774| 101752 [ 6871]

For Intel T3400 Merom 2.16GHz, the contents of 'Word-list_22,202,980_COLLISIONS.txt', which in fact contains the words from wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar:

67108864 elements in the table (26 bits)
22202980 lines read
Alfalfa_HALF: 21589312 21582860 21557249 21529387 21529700| 21529387 [ 3286890]
Alfalfa: 21797854 21768913 21771533 21790962 21787538| 21768913 [ 3288684]
Alfalfa_QWORD: 22137983 22122623 22102676 22106176 22115088| 22102676 [ 3288684]
Alfalfa_DWORD: 21922825 21938602 21896015 21906405 21938735| 21896015 [ 3288684]
Bernstein: 22069765 22066886 22078223 22068073 22070701| 22066886 [ 3290766]
K&R: 21767970 21769291 21786733 21808794 21781952| 21767970 [ 3290941]
Paul Larson: 22217132 22208663 22221180 22257973 22232433| 22208663 [ 3296692]
FNV-1a: 24783327 24754595 24761221 24768850 24773851| 24754595 [ 3297552]
Murmur2: 24429581 24436845 24439565 24473986 24420091| 24420091 [ 3297709]
SBox: 24497356 24495252 24474408 24482514 24486617| 24474408 [ 3298021]
FNV1A_Smaragd: 24892728 24657954 24713885 24642826 24653872| 24642826 [ 3298433]
CRC-32: 24315125 24364015 24358932 24370230 24357086| 24315125 [ 3298998]
lookup3: 25152714 25163446 25180860 25174309 25171522| 25152714 [ 3299369]
MaPrime2c: 25468907 25497087 25515137 25496127 25511109| 25468907 [ 3299747]
Sedgewick: 23187075 23188783 23201984 23182620 23187676| 23182620 [ 3302263]
One At Time: 25752155 25745719 25763766 25769312 25760894| 25745719 [ 3304908]
Ramakrishna: 22571194 22588985 22583130 22584062 22567869| 22567869 [ 3321824]
x65599: 22869457 22871793 22868825 22899146 22893458| 22868825 [ 3325064]
Arash Partow: 23330844 23304012 23325130 23319561 23321201| 23304012 [ 3325683]
FNV1A_Peregrine: 24709308 24696956 24739156 24542282 24027705| 24027705 [ 3333193]
FNV1A_Whiz: 23260698 23284708 23261485 23231186 23274844| 23231186 [ 3369088]
Sixtinsensitive: 23749686 23754606 23752685 23771534 23756371| 23749686 [ 3373923]
Hanson: 24678894 24686360 24668732 24692393 24679870| 24668732 [ 3408497]
Paul Hsieh: 25165181 25195490 25185265 25191703 25191274| 25165181 [ 3498543]
FNV1A_Nefertiti: 23178958 23204612 23158585 23201471 23168530| 23158585 [ 3505371]
Sixtinsensitive+: 23695832 23697515 23670644 23686530 23677100| 23670644 [ 3507772]
x17 unrolled: 21089676 21072607 21073703 21093963 21080539| 21072607 [ 3830652]
Weinberger: 27205172 27183550 27221767 27207192 27186851| 27183550 [ 5732660]
Novak unrolled: 76893278 76881102 76828362 76905661 76913653| 76828362 [10591108]
Fletcher: 61012927 60973684 60990148 60993752 60974823| 60973684 [14915258]

For Intel Atom N450 Pineview-N 1.66GHz, the contents of 'Word-list_00,351,114_SPEED.txt', which in fact contains the spell-checker's words:

1048576 elements in the table (20 bits)
351114 lines read
FNV1A_Nefertiti: 306194 306573 306192 308890 306516| 306192 [ 52963]
FNV1A_Whiz: 309750 309663 315782 309936 309856| 309663 [ 52966]
Sixtinsensitive+: 311474 311330 311115 314129 313496| 311115 [ 53040]
Alfalfa_HALF: 312465 312981 312385 316337 312773| 312385 [ 52454]
K&R: 313431 321837 312990 319829 313343| 312990 [ 52642]
FNV1A_Peregrine: 319362 329116 322006 319367 319748| 319362 [ 52551]
Bernstein: 321816 321772 321580 324893 322049| 321580 [ 52770]
x17 unrolled: 322271 321899 321849 324666 322026| 321849 [ 53556]
Paul Hsieh: 326436 322757 322904 322812 323075| 322757 [ 52729]
Sixtinsensitive: 325249 325427 325026 331674 325273| 325026 [ 53081]
Murmur2: 326730 326620 326377 332203 326445| 326377 [ 52738]
FNV1A_Smaragd: 329728 326406 329167 326707 326948| 326406 [ 52774]
lookup3: 334076 327371 327638 328289 328143| 327371 [ 52868]
Alfalfa: 329238 329437 329020 331666 330027| 329020 [ 52594]
Arash Partow: 333027 330317 330431 338750 330578| 330317 [ 52887]
Ramakrishna: 337692 331050 330770 330642 331796| 330642 [ 52764]
CRC-32: 335909 332821 332915 333260 332676| 332676 [ 52931]
Paul Larson: 335006 341362 335552 335411 335263| 335006 [ 52970]
Alfalfa_DWORD: 335984 336655 335764 341768 336394| 335764 [ 52594]
x65599: 337815 338067 337403 339999 337224| 337224 [ 52988]
Novak unrolled: 338616 338544 340872 339003 338570| 338544 [ 70274]
Alfalfa_QWORD: 342953 343383 343121 345655 346083| 342953 [ 52594]
Hanson: 345076 345541 344846 347805 345012| 344846 [ 57741]
Sedgewick: 346944 346191 348871 346527 346158| 346158 [ 52920]
FNV-1a: 352768 352508 357287 352658 352651| 352508 [ 52829]
SBox: 353603 353294 359148 353330 353175| 353175 [ 52688]
One At Time: 370592 369297 368441 368550 368128| 368128 [ 52836]
MaPrime2c: 398537 400557 398662 398581 398967| 398537 [ 52435]
Weinberger: 415567 418102 416656 415559 416279| 415559 [ 103386]
Fletcher: 440953 440887 440794 443035 441821| 440794 [ 182747]

For Intel Atom N450 Pineview-N 1.66GHz, the contents of 'Word-list_22,202,980_SPEED.txt', which in fact contains the words from wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar:

67108864 elements in the table (26 bits)
22202980 lines read
K&R: 27212349 27196471 27200039 27186453 27196511| 27186453 [ 3290941]
Alfalfa_HALF: 27356819 27307195 27297075 27307918 27303211| 27297075 [ 3286890]
x17 unrolled: 27339582 27350110 27357010 27339325 27356507| 27339325 [ 3830652]
Bernstein: 27868098 27885616 27865719 27879787 27863864| 27863864 [ 3290766]
Ramakrishna: 28665899 28666251 28682862 28664667 28671544| 28664667 [ 3321824]
Alfalfa: 28888653 28886944 28877923 28887710 28891342| 28877923 [ 3288684]
FNV1A_Nefertiti: 29213855 29223509 29198537 29215353 29230921| 29198537 [ 3505371]
Alfalfa_DWORD: 29369155 29377827 29371609 29385218 29372532| 29369155 [ 3288684]
Paul Larson: 29416290 29420142 29438150 29438707 29445835| 29416290 [ 3296692]
Sixtinsensitive+: 29620313 29611448 29599723 29620233 29601234| 29599723 [ 3507772]
Arash Partow: 29601009 29607096 29607259 29604831 29600309| 29600309 [ 3325683]
Alfalfa_QWORD: 29766556 29768872 29761910 29768865 29750096| 29750096 [ 3288684]
Sixtinsensitive: 30910025 30227158 30218696 30253983 30229414| 30218696 [ 3373923]
FNV1A_Whiz: 30529924 30573778 30522564 30513371 30532538| 30513371 [ 3369088]
x65599: 30639384 30626734 30637721 30629312 30667952| 30626734 [ 3325064]
FNV1A_Peregrine: 31062527 31065744 31063962 31074164 31060828| 31060828 [ 3333193]
Sedgewick: 31231177 31217657 31227213 31231472 31210256| 31210256 [ 3302263]
Paul Hsieh: 31504060 31514350 31516370 31506898 31500243| 31500243 [ 3498543]
lookup3: 31603211 31576806 31546659 31551113 31562355| 31546659 [ 3299369]
FNV1A_Smaragd: 31788377 31580190 31580953 31575398 31584602| 31575398 [ 3298433]
Murmur2: 31588844 31591137 31578842 31585076 31581288| 31578842 [ 3297709]
CRC-32: 31993878 31986891 31976616 31993457 31978443| 31976616 [ 3298998]
Hanson: 32697671 32708803 32703678 32697383 32699428| 32697383 [ 3408497]
FNV-1a: 33097899 33081494 33091676 33097100 33087453| 33081494 [ 3297552]
Weinberger: 33148065 33152666 33133844 33143483 33153828| 33133844 [ 5732660]
SBox: 33418326 33433409 33412257 33432103 33433903| 33412257 [ 3298021]
One At Time: 34447974 34462104 34469829 34453808 34453369| 34447974 [ 3304908]
MaPrime2c: 36544157 36541591 36553914 36532887 36536317| 36532887 [ 3299747]
Fletcher: 85211946 85207760 85213725 85234580 85240698| 85207760 [14915258]
Novak unrolled: 122659701 122684073 122688489 122710846 122677634| 122659701 [10591108]

For Intel Q9550 2.83GHz Yorkfield, the contents of 'Word-list_00,351,114_SPEED.txt', which in fact contains the spell-checker's words:

1048576 elements in the table (20 bits)
351114 lines read
FNV1A_Whiz: 126979 126965 126909 127074 126892| 126892 [ 52966]
FNV1A_Nefertiti: 127558 127674 127561 127462 127655| 127462 [ 52963]
Novak unrolled: 133056 131005 130946 130748 130608| 130608 [ 70274]
FNV1A_Peregrine: 131261 131375 132194 131400 131466| 131261 [ 52551]
FNV1A_Smaragd: 135540 132354 132447 132105 134366| 132105 [ 52774]
Sixtinsensitive+: 140261 133781 132937 132972 133011| 132937 [ 53040]
Alfalfa: 133692 134306 134139 133886 133746| 133692 [ 52594]
x17 unrolled: 138124 135728 135652 138535 136985| 135652 [ 53556]
Alfalfa_HALF: 136099 135873 135958 136045 135852| 135852 [ 52454]
Alfalfa_DWORD: 139382 136591 137320 136955 136703| 136591 [ 52594]
SBox: 136660 136629 136652 139976 137095| 136629 [ 52688]
CRC-32: 136776 136682 137740 136886 137013| 136682 [ 52931]
Sixtinsensitive: 137325 139569 137416 138149 137460| 137325 [ 53081]
Murmur2: 137708 137451 137466 137370 137645| 137370 [ 52738]
Paul Larson: 137963 137860 137821 137806 137844| 137806 [ 52970]
Alfalfa_QWORD: 138891 140860 138630 138520 139282| 138520 [ 52594]
Hanson: 138905 138985 141471 140417 139221| 138905 [ 57741]
x65599: 139179 141376 138957 139922 139037| 138957 [ 52988]
K&R: 142387 142494 142280 142263 142276| 142263 [ 52642]
Bernstein: 143426 143500 143558 144583 143804| 143426 [ 52770]
Paul Hsieh: 144100 144173 145563 144810 144435| 144100 [ 52729]
Sedgewick: 145909 145885 145971 145638 146653| 145638 [ 52920]
lookup3: 147183 147415 148043 149721 147718| 147183 [ 52868]
FNV-1a: 147512 147545 147397 148011 149087| 147397 [ 52829]
Ramakrishna: 148659 156452 148559 148661 148236| 148236 [ 52764]
Fletcher: 149293 149231 149835 151863 149492| 149231 [ 182747]
Arash Partow: 149436 149511 149354 149420 149339| 149339 [ 52887]
MaPrime2c: 152965 153024 152778 153176 152732| 152732 [ 52435]
One At Time: 159041 158835 164564 159598 158996| 158835 [ 52836]
Weinberger: 197287 197401 197641 199104 199606| 197287 [ 103386]

For Intel Q9550 2.83GHz Yorkfield, the contents of 'Word-list_22,202,980_SPEED.txt', which in fact contains the words from wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar:

67108864 elements in the table (26 bits)
22202980 lines read
x17 unrolled: 16711968 16714788 16712787 16715049 16715557| 16711968 [ 3830652]
Alfalfa_HALF: 16925846 16921816 16935959 16918072 16920664| 16918072 [ 3286890]
Alfalfa: 16949669 16948259 16945026 16947268 16949278| 16945026 [ 3288684]
Alfalfa_DWORD: 17087653 17089499 17090082 17092342 17092248| 17087653 [ 3288684]
K&R: 17217212 17215582 17214916 17211997 17210891| 17210891 [ 3290941]
Alfalfa_QWORD: 17284333 17281698 17296440 17284594 17287022| 17281698 [ 3288684]
Paul Larson: 17287965 17288993 17286715 17288186 17290720| 17286715 [ 3296692]
Bernstein: 17441366 17441744 17446211 17446642 17448115| 17441366 [ 3290766]
x65599: 17516305 17512376 17512059 17510499 17514638| 17510499 [ 3325064]
FNV1A_Whiz: 17578315 17575074 17574692 17572083 17571654| 17571654 [ 3369088]
FNV1A_Nefertiti: 17812040 17810394 17815863 17800634 17802965| 17800634 [ 3505371]
Sedgewick: 17813196 17805720 17809107 17808987 17809341| 17805720 [ 3302263]
Ramakrishna: 17821031 17818277 17816051 17817365 17820101| 17816051 [ 3321824]
FNV1A_Smaragd: 18156291 18032451 18056304 18014531 18010227| 18010227 [ 3298433]
FNV1A_Peregrine: 18037980 18041105 18047264 18049949 18050438| 18037980 [ 3333193]
Arash Partow: 18159192 18218352 18156661 18148943 18148120| 18148120 [ 3325683]
Sixtinsensitive+: 18273937 18273231 18272381 18276826 18272754| 18272381 [ 3507772]
CRC-32: 18299132 18303268 18300555 18307328 18296881| 18296881 [ 3298998]
Murmur2: 18312461 18309171 18315023 18312814 18306135| 18306135 [ 3297709]
Sixtinsensitive: 18326606 18325448 18319761 18326255 18320762| 18319761 [ 3373923]
SBox: 18454359 18453094 18457853 18448201 18451512| 18448201 [ 3298021]
Hanson: 18592317 18585036 18585941 18597290 18595878| 18585036 [ 3408497]
Paul Hsieh: 18931940 18935308 18939618 18942478 18940287| 18931940 [ 3498543]
lookup3: 18978329 18970650 18969792 18967985 18983381| 18967985 [ 3299369]
FNV-1a: 19055865 19051343 19051202 19054812 19049558| 19049558 [ 3297552]
MaPrime2c: 19497746 19470760 19474411 19474490 19476430| 19470760 [ 3299747]
One At Time: 19826720 19775854 19767551 19765074 19763605| 19763605 [ 3304908]
Weinberger: 21866786 21864914 21867700 21862238 21866417| 21862238 [ 5732660]
Fletcher: 44235533 44193774 44142011 44138717 44158624| 44138717 [14915258]
Novak unrolled: 52067982 52067088 52068005 52064574 52068712| 52064574 [10591108]

Any suggestions (adding more files, functions, ...) are appreciated.

Something strange (beyond my understanding) is going on: on Wikipedia's words, a significant slowdown appeared from nowhere!? Any idea about this mystery?
Georgi 'Sanmayce',
Peter, I am still stunned by the speed degradation for the FNV1A variants; I still have no clue what causes it.
Just for longer keys, here comes the Jester, an unrolled Whiz:

UINT FNV1A_Hash_Jester(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	const char *p = str;

	// Idea comes from Igor Pavlov's 7zCRC, thanks.
/*
	for(; wrdlen && ((unsigned)(ptrdiff_t)p&3); wrdlen -= 1, p++) {
		hash32 = (hash32 ^ *p) * PRIME;
	}
*/
	for(; wrdlen >= 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
		hash32 = (hash32 ^ *(DWORD *)p) * PRIME;
		hash32 = (hash32 ^ *(DWORD *)(p+4)) * PRIME;
	}
	// Cases: 0,1,2,3,4,5,6,7
	if (wrdlen & sizeof(DWORD)) {
		hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
		p += sizeof(DWORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
	}
	if (wrdlen & 1)
		hash32 = (hash32 ^ *p) * PRIME;

	return hash32 ^ (hash32 >> 16);
}

262144 elements in the table (18 bits)
105982 lines read
FNV1A_Jester: 52198 52248 51275 52071 50515| 50515 [ 18774]
FNV1A_Whiz: 51168 52620 51802 52121 50774| 50774 [ 18774]

1048576 elements in the table (20 bits)
351114 lines read
FNV1A_Jester: 237708 235011 234586 234747 235305| 234586 [ 52966]
FNV1A_Whiz: 239372 239256 238010 238354 239165| 238010 [ 52966]
Georgi 'Sanmayce',
For longer (than ordinary words) strings, i.e. phrases/sentences/files, here comes FNV1A_Jesteress, simply the fastest hasher so far:
http://encode.ru/threads/1160-Fastest-non-secure-hash-function!
Georgi 'Sanmayce',

Hi Peter,

Having read some criticism, I added a test on enwik8, some 900,000 lines with all kinds of keys in it:

You are welcome to home of FNV1A_Jesteress:

http://www.sanmayce.com/Fastest_Hash/index.html

Regards

ace,

Georgi's page contains an interesting link to:

http://cbloomrants.blogspot.com/2010/11/11-19-10-hashes-and-cache-tables.html

It's written by the guy behind http://www.cbloom.com and if I understand correctly he is also interested in using hashes for compression algorithms, not only in compiler-needed hash tables.

Peter Kankowski,

Georgi, I've added Jesteress to the test; thank you very much for your efforts. Jesteress is even faster than Whiz! :) The number of collisions in the Numbers test remains high, though.

Ace, thanks for the link. He benchmarked STLport, a complex implementation that grows the hash table and uses jump tables and prime numbers for the table size. Something similar to what you wanted to do.

I finally got stable results on Core i5 by increasing the number of runs. There are some potentially useful ideas for statistical tests (chi-square, etc.) at Murmur page.
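The chi-square idea Peter mentions is straightforward to compute over bucket occupancy counts: with n keys and m buckets, a uniform hash gives an expected n/m keys per bucket, and the statistic sums the squared deviations from that. A sketch (assumed formulation, not code from the benchmark):

```c
#include <stddef.h>

/* Chi-square statistic for bucket occupancy: sum((observed - expected)^2
 * / expected), expected = n/m under a uniform hash. Values close to the
 * number of buckets are consistent with uniformity; much larger values
 * indicate clustering. (Sketch of the test Peter refers to.) */
static double chi_square(const size_t *buckets, size_t m, size_t n)
{
    double expected = (double)n / (double)m;
    double chi = 0.0;
    for (size_t i = 0; i < m; i++) {
        double d = (double)buckets[i] - expected;
        chi += d * d / expected;
    }
    return chi;
}
```

Unlike raw collision counts, this weights how unevenly the overfull buckets are loaded, which is what actually determines probe-chain length.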

Georgi 'Sanmayce',

Peter, I do not care about numbers, IPs, and other numeric datasets; anyway, just to make all fans happy, here comes Meiyan (Beauty, Charming Eyes or, most precisely, SOULFUL EYES):

FNV1A_Jesteress gives way to FNV1A_Meiyan because of better mixing: the last DWORD is split into two WORD passes to avoid losing the carries.

#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Meiyan(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	const char *p = str;

	// Idea comes from Igor Pavlov's 7zCRC, thanks.
/*
	for(; wrdlen && ((unsigned)(ptrdiff_t)p&3); wrdlen -= 1, p++) {
		hash32 = (hash32 ^ *p) * PRIME;
	}
*/
	for(; wrdlen >= 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
		hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME;		
	}
	// Cases: 0,1,2,3,4,5,6,7
	if (wrdlen & sizeof(DWORD)) {
//		hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
//		p += sizeof(DWORD);
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
	}
	if (wrdlen & 1) 
		hash32 = (hash32 ^ *p) * PRIME;

	return hash32 ^ (hash32 >> 16);
}

Thus, as in one of my favorite movies, where Meiyan transforms (only physically) from an ugly imp into a princess, here the ugly (because of its numeric ugliness) Jesteress evolves into a sub-fantastic entity.

Intel Core i5 gives:

            Words    Win32     Numbers  Prefix   Postfix  Variables  Sonnets   UTF-8       IPv4       Avg
iSCSI CRC   66[105]  335[415]  36[112]  86[106]  84[92]   285[368]   414[584]  1978[2388]  324[838]   1.02[1.77]
Jesteress   66[110]  325[397]  45[300]  85[102]  84[106]  272[366]   407[585]  1953[2427]  366[1499]  1.05[3.19]

Intel Merom gives:

Words 

500 lines read

1024 elements in the table (10 bits)

        FNV1A_Meiyan:         92        80        78        77        77|        77 [     102]
     FNV1A_Jesteress:         86        80        79        79        78|        78 [     110]
Win32 

1992 lines read

4096 elements in the table (12 bits)

        FNV1A_Meiyan:       1850      1847      1832      1877      1865|      1832 [     408]
     FNV1A_Jesteress:       1863      1714      1715      1716      1710|      1710 [     397]
Numbers 

500 lines read

1024 elements in the table (10 bits)

        FNV1A_Meiyan:         55        50        50        50        50|        50 [     125]
     FNV1A_Jesteress:         56        56        56        56        56|        56 [     300]
Prefix 

500 lines read

1024 elements in the table (10 bits)

        FNV1A_Meiyan:        123       115       115       115       115|       115 [     106]
     FNV1A_Jesteress:        116       114       113       113       113|       113 [     102]
Postfix 

500 lines read

1024 elements in the table (10 bits)

        FNV1A_Meiyan:        120       112       111       111       111|       111 [     112]
     FNV1A_Jesteress:        110       109       107       106       106|       106 [     106]
Variables 

1842 lines read

4096 elements in the table (12 bits)

        FNV1A_Meiyan:        396       349       397       347       347|       347 [     350]
     FNV1A_Jesteress:        352       344       403       351       344|       344 [     366]
Sonnets 

3228 lines read

8192 elements in the table (13 bits)

        FNV1A_Meiyan:        540       512       508       510       510|       508 [     588]
     FNV1A_Jesteress:        561       513       509       509       508|       508 [     585]
UTF-8 

13408 lines read

32768 elements in the table (15 bits)

        FNV1A_Meiyan:       2530      2436      2430      2426      2435|      2426 [    2377]
     FNV1A_Jesteress:       2420      2403      2934      2410      2404|      2403 [    2427]
IPv4 

3925 lines read

8192 elements in the table (13 bits)

        FNV1A_Meiyan:        421       398       397       398       394|       394 [     768]
     FNV1A_Jesteress:        402       406       401       403       403|       401 [    1499]

D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash enwik8

919074 lines read

2097152 elements in the table (21 bits)

        FNV1A_Meiyan:    1070854   1053328   1061062   1049874   1063539|   1049874 [  315896]
     FNV1A_Jesteress:    1056365   1065933   1048484   1048852   1049749|   1048484 [  316210]

D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash enwik9

10920423 lines read

33554432 elements in the table (25 bits)

        FNV1A_Meiyan:   13539353  13374523  13366537  13362240  13389443|  13362240 [ 4562577]
     FNV1A_Jesteress:   13289862  13337087  13293203  13296861  13525346|  13289862 [ 4564891]

D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd"

12561874 lines read

33554432 elements in the table (25 bits)

        FNV1A_Meiyan:   13010074  12874395  12885821  12923525  12927757|  12874395 [ 2111271]
     FNV1A_Jesteress:   12906637  12879373  12871222  13125286  12886406|  12871222 [ 2121868]

D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd"

22202980 lines read

67108864 elements in the table (26 bits)

        FNV1A_Meiyan:   25300254  24965950  24905897  25175291  25028349|  24905897 [ 3345260]
     FNV1A_Jesteress:   24844250  24846907  24835438  24832771  25086445|  24832771 [ 3355676]

Meiyan: "Hell hath no fury like a beauty scorned. Haven't you heard?"

Meiyan: "No wonder they say all good people are tricky."

Ha-ha, right said Meiyan.

Meiyan: "I'm not garbage. Don't just throw me away. I'm not just a thing. I have a name! My name is Meiyan."

Peter Kankowski,
Thank you very much! Meiyan scores well in all of my tests. I added it to the list of recommended functions (in Conclusion).
Georgi 'Sanmayce',

Peter,

I knew where the weakness (regarding collisions) was; it is no more, i.e. it is amended with the arrival of FNV1A_Mantis, the strongest of all my FNV1A variants.

Now FNV1A_Meiyan is between FNV1A_Jesteress and FNV1A_Mantis.

Personally, my favorite (because my way of using her differs a lot from that of other people) is still FNV1A_Jesteress, despite her apparent collision drawback in the interval of 8 to 10 chars, due to the mix of either one BYTE (8+1) or one WORD (8+2).

I tried (not too seriously) to fix it in FNV1A_Meiyan, so here comes FNV1A_Mantis (00562-004a0 = 194 bytes, the fattest so far):

// Mantis has two (three, to be exact) gears: it operates as a WORD-based FNV1A for lengths 1..15 and as a QWORD-based FNV1A for lengths 16 and up.
// I see the instant mantis' grasping-and-devouring as MONSTROUS QUADRO-BYTE-PAIR BAITs (IN-MIX) while the target is secured within the FIRM GRIP of the forelimbs (PRE-MIX & POST-MIX).
// The word 'mantical' (of or relating to the foretelling of events by or as if by supernatural means) comes from Greek mantikos, from the Greek word mantis, meaning "prophet, seer."
// The Greeks, who made the connection between the upraised front legs of a mantis waiting for its prey and the hands of a prophet in prayer, used the name mantis to mean "the praying mantis."
#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Mantis(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	const char *p = str;
	// Cases: 0,1,2,3,4,5,6,7
	if (wrdlen & sizeof(DWORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		//wrdlen -= sizeof(DWORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		//wrdlen -= sizeof(WORD);
	}
	if (wrdlen & 1) {
		hash32 = (hash32 ^ *p) * PRIME;
		p += sizeof(char);
		//wrdlen -= sizeof(char);
	}
		wrdlen -= p-str;
// The goal is to avoid the weak range [8, 8+2, 8+1] that is 8..10 in practice 1..15 i.e. 1..8+4+2+1, thus amending FNV1A_Meiyan and FNV1A_Jesteress.
// FNV1A_Jesteress: fastest strong
// FNV1A_Meiyan   : faster  stronger
// FNV1A_Mantis   : fast    strongest
	for(; wrdlen > 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
		hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME;		
	}
		hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+1*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+2*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+3*sizeof(WORD))) * PRIME;
	return hash32 ^ (hash32 >> 16);
}

OK, now some heavy hash hustle with fixed-length ASCII strings. In my opinion this is the most relevant and closest-to-practice benchmark (here: the fundamental match finding).

At the link below, this summer I approached LZ match finding in a dummy way, by counting the repetitions (in OSHO.TXT, a text rich in English prose) through Building-Blocks_DUMPER:

http://encode.ru/threads/1134-Dummy-Static-Windowless-Dictionary-Text-Decompressor?p=22653&viewfull=1#post22653

Columns: Building-Block length / quantity of ALL Building-Blocks (with overlapping) / quantity of DISTINCT Building-Blocks (with overlapping) / quantity of REPETITIVE Building-Blocks (with overlapping); note that REPETITIVE = ALL - DISTINCT.

3                           206908949-3+1                                       46486                                                    206862461
4                           206908949-4+1                                       248019                                                   206660927
5                           206908949-5+1                                       855682                                                   206053263
6                           206908949-6+1                                       2236138                                                  204672806
7                           206908949-7+1                                       4803152                                                  202105791
8                           206908949-8+1                                       8956496                                                  197952446
9                           206908949-9+1                                       15006172                                                 191902769
10                          206908949-10+1                                      22992127                                                 183916813
11                          206908949-11+1                                      32707519                                                 174201420
12                          206908949-12+1                                      43802365                                                 163106573

D:\_KAZE_new-stuff\r3>dir

Volume in drive D is H320_Vol5

Volume Serial Number is 0CB3-C881

 Directory of D:\_KAZE_new-stuff\r3

12/02/2010  08:15 AM    <DIR>          .
12/02/2010  08:15 AM    <DIR>          ..
12/02/2010  08:15 AM            45,501 Building-Blocks_DUMPER.c
12/02/2010  08:15 AM            79,360 Building-Blocks_DUMPER.exe
12/02/2010  08:15 AM           223,954 hash.cod
12/02/2010  08:15 AM            56,186 hash.cpp
12/02/2010  08:15 AM            81,920 hash.exe
12/02/2010  08:15 AM       206,908,949 OSHO.TXT
12/02/2010  08:15 AM               394 RUNME.BAT
               7 File(s)    207,396,264 bytes
               2 Dir(s)   6,020,390,912 bytes free

D:\_KAZE_new-stuff\r3>Building-Blocks_DUMPER.exe

Building-Blocks_DUMPER rev.2, written by Kaze.

Note: This revision converts CR to $ and LF to # in order to have lines patternlen long ending with LF.

Sorting 206908947 Pointers to Building-Blocks 3 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB003.txt ...

3|206908949-3+1|46486|206862461

Sorting 206908946 Pointers to Building-Blocks 4 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB004.txt ...

4|206908949-4+1|248019|206660927

Sorting 206908945 Pointers to Building-Blocks 5 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB005.txt ...

5|206908949-5+1|855682|206053263

Sorting 206908944 Pointers to Building-Blocks 6 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB006.txt ...

6|206908949-6+1|2236138|204672806

Sorting 206908943 Pointers to Building-Blocks 7 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB007.txt ...

7|206908949-7+1|4803152|202105791

Sorting 206908942 Pointers to Building-Blocks 8 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB008.txt ...

8|206908949-8+1|8956496|197952446

Sorting 206908941 Pointers to Building-Blocks 9 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB009.txt ...

9|206908949-9+1|15006172|191902769

Sorting 206908940 Pointers to Building-Blocks 10 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB010.txt ...

10|206908949-10+1|22992127|183916813

Sorting 206908939 Pointers to Building-Blocks 11 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB011.txt ...

11|206908949-11+1|32707519|174201420

Sorting 206908938 Pointers to Building-Blocks 12 chars in size ...

Allocated memory for pointers-to-words in MB: 790

Writing Sorted Building-Blocks to BB012.txt ...

12|206908949-12+1|43802365|163106573

Building-Blocks_DUMPER total time: 2658329 clocks

D:\_KAZE_new-stuff\r3>dir

Volume in drive D is H320_Vol5

Volume Serial Number is 0CB3-C881

 Directory of D:\_KAZE_new-stuff\r3

12/02/2010  08:55 AM    <DIR>          .
12/02/2010  08:55 AM    <DIR>          ..
12/02/2010  08:18 AM           185,944 BB003.txt
12/02/2010  08:22 AM         1,240,095 BB004.txt
12/02/2010  08:25 AM         5,134,092 BB005.txt
12/02/2010  08:29 AM        15,652,966 BB006.txt
12/02/2010  08:33 AM        38,425,216 BB007.txt
12/02/2010  08:37 AM        80,608,464 BB008.txt
12/02/2010  08:41 AM       150,061,720 BB009.txt
12/02/2010  08:46 AM       252,913,397 BB010.txt
12/02/2010  08:52 AM       392,490,228 BB011.txt
12/02/2010  08:59 AM       569,430,745 BB012.txt
12/02/2010  08:15 AM            45,501 Building-Blocks_DUMPER.c
12/02/2010  08:15 AM            79,360 Building-Blocks_DUMPER.exe
12/02/2010  08:15 AM           223,954 hash.cod
12/02/2010  08:15 AM            56,186 hash.cpp
12/02/2010  08:15 AM            81,920 hash.exe
12/02/2010  08:15 AM       206,908,949 OSHO.TXT
12/02/2010  08:15 AM               394 RUNME.BAT
              17 File(s)  1,713,539,131 bytes
               2 Dir(s)   4,511,014,912 bytes free

D:\_KAZE_new-stuff\r3>dir

Volume in drive D is H320_Vol5

Volume Serial Number is 0CB3-C881

 Directory of D:\_KAZE_new-stuff\r3

12/02/2010  09:08 AM    <DIR>          .
12/02/2010  09:08 AM    <DIR>          ..
12/02/2010  08:18 AM           185,944 BB003.txt
12/02/2010  08:22 AM         1,240,095 BB004.txt
12/02/2010  08:25 AM         5,134,092 BB005.txt
12/02/2010  08:29 AM        15,652,966 BB006.txt
12/02/2010  08:33 AM        38,425,216 BB007.txt
12/02/2010  08:37 AM        80,608,464 BB008.txt
12/02/2010  08:41 AM       150,061,720 BB009.txt
12/02/2010  08:46 AM       252,913,397 BB010.txt
12/02/2010  08:52 AM       392,490,228 BB011.txt
12/02/2010  08:59 AM       569,430,745 BB012.txt
12/02/2010  08:15 AM            45,501 Building-Blocks_DUMPER.c
12/02/2010  08:15 AM            79,360 Building-Blocks_DUMPER.exe
12/02/2010  08:15 AM           223,954 hash.cod
12/02/2010  08:15 AM            56,186 hash.cpp
12/02/2010  08:15 AM            81,920 hash.exe
11/01/2009  12:00 AM       202,688,536 IP-COUNTRY-REGION-CITY.CSV
12/02/2010  09:07 AM             1,636 LONG2DOT.BAS
12/02/2010  09:07 AM            43,110 LONG2DOT.EXE
12/02/2010  08:15 AM       206,908,949 OSHO.TXT
12/02/2010  08:15 AM               394 RUNME.BAT
              20 File(s)  1,916,272,413 bytes
               2 Dir(s)   4,510,916,608 bytes free

D:\_KAZE_new-stuff\r3>LONG2DOT.EXE IP-COU~1.csv IPs.TXT

LONG2DOT.EXE, revision 001.

Written by Svalqyatchx 'Kaze'.

Example: liner ip.csv ip.txt

Input file: IP-COU~1.CSV

Output file: IPS.TXT

Lines: 2995394

LONG2DOT: Done.

D:\_KA45F~1\r3>type IP-COUNTRY-REGION-CITY.CSV|more
"0","33554431","-","-","-","-"
"33554432","50331647","UK","UNITED KINGDOM","-","-"
"50331648","50331903","US","UNITED STATES","NEW JERSEY","SUMMIT"
"50331904","50332159","US","UNITED STATES","NEW YORK","MONROE"
"50332160","50332671","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50332672","50332927","US","UNITED STATES","NEW JERSEY","LEBANON"
"50332928","50333695","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50333696","50333951","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50333952","50334719","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50334720","50334975","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50334976","50335487","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50335488","50335743","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50335744","50358271","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50358272","50358527","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50358528","50361855","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50361856","50362111","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50362112","50390527","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50390528","50390783","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50390784","50397183","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50397184","50397951","US","UNITED STATES","NEW JERSEY","SUMMIT"
"50397952","50398207","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50398208","50398463","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50398464","50398719","US","UNITED STATES","NEW JERSEY","TRENTON"
"50398720","50398975","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50398976","50399231","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50399232","50399487","US","UNITED STATES","NEW JERSEY","MEDFORD"
"50399488","50399743","US","UNITED STATES","NEW JERSEY","SUMMIT"
"50399744","50399999","US","UNITED STATES","NEW JERSEY","TRENTON"
"50400000","50400255","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50400256","50400511","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50400512","50400767","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50400768","50401023","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50401024","50401279","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50401280","50401535","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50401536","50402303","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50402304","50402559","US","UNITED STATES","NEW YORK","NEW YORK"
"50402560","50405631","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50405632","50405887","US","UNITED STATES","NEW JERSEY","TRENTON"
"50405888","50411263","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50411264","50411519","US","UNITED STATES","NEW JERSEY","TRENTON"
"50411520","50414079","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50414080","50414335","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50414336","50432255","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50432256","50432511","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50432512","50448639","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50448640","50448895","US","UNITED STATES","NEW JERSEY","MEDFORD"
"50448896","50457087","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50457088","50457343","US","UNITED STATES","MISSOURI","CHILLICOTHE"
"50457344","50463487","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
"50463488","50463743","US","UNITED STATES","NEW JERSEY","LEBANON"
"50463744","50490367","US","UNITED STATES","CONNECTICUT","FAIRFIELD"
^C
D:\_KA45F~1\r3>type IPS.TXT|more
0.0.0.0
0.0.0.2
0.0.0.3
0.1.0.3
0.2.0.3
0.4.0.3
0.5.0.3
0.8.0.3
0.9.0.3
0.12.0.3
0.13.0.3
0.15.0.3
0.16.0.3
0.104.0.3
0.105.0.3
0.118.0.3
0.119.0.3
0.230.0.3
0.231.0.3
0.0.1.3
0.3.1.3
0.4.1.3
0.5.1.3
0.6.1.3
0.7.1.3
0.8.1.3
0.9.1.3
0.10.1.3
0.11.1.3
0.12.1.3
0.13.1.3
0.14.1.3
0.15.1.3
0.16.1.3
0.17.1.3
0.20.1.3
0.21.1.3
0.33.1.3
0.34.1.3
0.55.1.3
0.56.1.3
0.66.1.3
0.67.1.3
0.137.1.3
0.138.1.3
0.201.1.3
0.202.1.3
0.234.1.3
0.235.1.3
0.3.2.3
0.4.2.3
^C

RUNME.BAT:
hash bb003.txt /s26 >RESULTS\BB.TXT
hash bb004.txt /s26 >>RESULTS\BB.TXT
hash bb005.txt /s26 >>RESULTS\BB.TXT
hash bb006.txt /s26 >>RESULTS\BB.TXT
hash bb007.txt /s26 >>RESULTS\BB.TXT
hash bb008.txt /s26 >>RESULTS\BB.TXT
hash bb009.txt /s26 >>RESULTS\BB.TXT
hash bb010.txt /s26 >>RESULTS\BB.TXT
hash bb011.txt /s26 >>RESULTS\BB.TXT
hash bb012.txt /s26 >>RESULTS\BB.TXT
hash IPS.txt >RESULTS\IP.TXT

D:\_KA45F~1\r3\RESULTS>sort IP.TXT /+75

8388608 elements in the table (23 bits)

2995394 lines read

        FNV1A_Meiyan:    2320681   2325209   2329864   2329377   2330092|   2320681 [  593723]
     FNV1A_Jesteress:    2365681   2373005   2366059   2364893   2364232|   2364232 [  691369]
        FNV1A_Mantis:    2497424   2440808   2442037   2431852   2434459|   2431852 [  481132]
              Hanson:    2443092   2452243   2440873   2450233   2454445|   2440873 [  534251]
              CRC-32:    2461652   2465529   2465261   2461036   2488104|   2461036 [  472854]
       FNV1A_Smaragd:    2478378   2483251   2480562   2478002   2479434|   2478002 [  480914]
     Alfalfa_Rollick:    2499324   2497969   2500407   2487340   2484967|   2484967 [  604098]
      Novak unrolled:    2521693   2523052   2517898   2523059   2509677|   2509677 [  657377]
             Murmur2:    2531026   2523656   2519732   2527580   2524589|   2519732 [  476330]
                 K&R:    2525950   2522337   2567966   2563603   2548883|   2522337 [  474011]
             Alfalfa:    2545089   2550223   2539683   2552206   2556244|   2539683 [  475434]
        FNV1A_Jester:    2562152   2555758   2567615   2558522   2567506|   2555758 [  689339]
       Alfalfa_DWORD:    2576601   2559221   2572886   2569249   2567150|   2559221 [  475434]
        Alfalfa_HALF:    2568949   2565813   2571174   2569459   2567984|   2565813 [  480071]
     FNV1A_Peregrine:    2578517   2587847   2580838   2584245   2583035|   2578517 [  546915]
        x17 unrolled:    2631251   2633371   2590646   2598510   2602628|   2590646 [  475528]
                SBox:    2598938   2597768   2598215   2601608   2635853|   2597768 [  476681]
         Paul Larson:    2603197   2617318   2619513   2617793   2612003|   2603197 [  475575]
           Sedgewick:    2613270   2616042   2615704   2612230   2613473|   2612230 [  477931]
              FNV-1a:    2636697   2644461   2644682   2642210   2633728|   2633728 [  477067]
          Weinberger:    2645404   2650330   2649391   2640028   2646150|   2640028 [ 1159267]
           Bernstein:    2650060   2657143   2658727   2657347   2655746|   2650060 [  474048]
          Paul Hsieh:    2659945   2664320   2664344   2668886   2660429|   2659945 [  543835]
          FNV1A_Whiz:    2675262   2665171   2677243   2678088   2704615|   2665171 [  689339]
       Alfalfa_QWORD:    2688756   2686666   2695489   2683671   2682588|   2682588 [  475434]
           MaPrime2c:    2748198   2737837   2730565   2707478   2702163|   2702163 [  477151]
             lookup3:    2710216   2706776   2711884   2718018   2717402|   2706776 [  476566]
        Arash Partow:    2723308   2726973   2725111   2727244   2728437|   2723308 [  478246]
     Sixtinsensitive:    2724973   2731232   2729760   2726649   2731038|   2724973 [  582793]
         Ramakrishna:    2769904   2793210   2788144   2770267   2750457|   2750457 [  476020]
     FNV1A_Nefertiti:    2812872   2811425   2809344   2778330   2788216|   2778330 [  763451]
    Sixtinsensitive+:    2817990   2812481   2813852   2805091   2815720|   2805091 [  716367]
         One At Time:    2830002   2828003   2823084   2827213   2821480|   2821480 [  477667]
              x65599:    2906449   2906157   2908646   2904777   2912365|   2904777 [  654463]
            Fletcher:   44224682  44243371  44320905  44280935  44190506|  44190506 [ 2856890]

D:\_KA45F~1\r3\RESULTS>sort IP.TXT /+85

8388608 elements in the table (23 bits)

2995394 lines read

              CRC-32:    2461652   2465529   2465261   2461036   2488104|   2461036 [  472854]
                 K&R:    2525950   2522337   2567966   2563603   2548883|   2522337 [  474011]
           Bernstein:    2650060   2657143   2658727   2657347   2655746|   2650060 [  474048]
             Alfalfa:    2545089   2550223   2539683   2552206   2556244|   2539683 [  475434]
       Alfalfa_DWORD:    2576601   2559221   2572886   2569249   2567150|   2559221 [  475434]
       Alfalfa_QWORD:    2688756   2686666   2695489   2683671   2682588|   2682588 [  475434]
        x17 unrolled:    2631251   2633371   2590646   2598510   2602628|   2590646 [  475528]
         Paul Larson:    2603197   2617318   2619513   2617793   2612003|   2603197 [  475575]
         Ramakrishna:    2769904   2793210   2788144   2770267   2750457|   2750457 [  476020]
             Murmur2:    2531026   2523656   2519732   2527580   2524589|   2519732 [  476330]
             lookup3:    2710216   2706776   2711884   2718018   2717402|   2706776 [  476566]
                SBox:    2598938   2597768   2598215   2601608   2635853|   2597768 [  476681]
              FNV-1a:    2636697   2644461   2644682   2642210   2633728|   2633728 [  477067]
           MaPrime2c:    2748198   2737837   2730565   2707478   2702163|   2702163 [  477151]
         One At Time:    2830002   2828003   2823084   2827213   2821480|   2821480 [  477667]
           Sedgewick:    2613270   2616042   2615704   2612230   2613473|   2612230 [  477931]
        Arash Partow:    2723308   2726973   2725111   2727244   2728437|   2723308 [  478246]
        Alfalfa_HALF:    2568949   2565813   2571174   2569459   2567984|   2565813 [  480071]
       FNV1A_Smaragd:    2478378   2483251   2480562   2478002   2479434|   2478002 [  480914]
        FNV1A_Mantis:    2497424   2440808   2442037   2431852   2434459|   2431852 [  481132]
              Hanson:    2443092   2452243   2440873   2450233   2454445|   2440873 [  534251]
          Paul Hsieh:    2659945   2664320   2664344   2668886   2660429|   2659945 [  543835]
     FNV1A_Peregrine:    2578517   2587847   2580838   2584245   2583035|   2578517 [  546915]
     Sixtinsensitive:    2724973   2731232   2729760   2726649   2731038|   2724973 [  582793]
        FNV1A_Meiyan:    2320681   2325209   2329864   2329377   2330092|   2320681 [  593723]
     Alfalfa_Rollick:    2499324   2497969   2500407   2487340   2484967|   2484967 [  604098]
              x65599:    2906449   2906157   2908646   2904777   2912365|   2904777 [  654463]
      Novak unrolled:    2521693   2523052   2517898   2523059   2509677|   2509677 [  657377]
        FNV1A_Jester:    2562152   2555758   2567615   2558522   2567506|   2555758 [  689339]
          FNV1A_Whiz:    2675262   2665171   2677243   2678088   2704615|   2665171 [  689339]
     FNV1A_Jesteress:    2365681   2373005   2366059   2364893   2364232|   2364232 [  691369]
    Sixtinsensitive+:    2817990   2812481   2813852   2805091   2815720|   2805091 [  716367]
     FNV1A_Nefertiti:    2812872   2811425   2809344   2778330   2788216|   2778330 [  763451]
          Weinberger:    2645404   2650330   2649391   2640028   2646150|   2640028 [ 1159267]
            Fletcher:   44224682  44243371  44320905  44280935  44190506|  44190506 [ 2856890]

D:\_KA45F~1\r3\RESULTS>type BB.TXT

46486 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:      34624     36535     34148     34050     35542|     34050 [      13]
        FNV1A_Meiyan:      27019     24998     24931     24732     24517|     24517 [      11]
     FNV1A_Jesteress:      25405     24216     24112     24240     24471|     24112 [      11]
        FNV1A_Jester:      24355     23938     23833     23914     24672|     23833 [      11]
       FNV1A_Smaragd:      24712     24679     24311     25139     24759|     24311 [      11]
     FNV1A_Peregrine:      34158     33475     32550     33495     33119|     32550 [      11]
          FNV1A_Whiz:      24603     24586     24919     26505     24570|     24570 [      11]
     FNV1A_Nefertiti:      16523     16228     16047     16162     16373|     16047 [    6690]
              FNV-1a:      25285     25305     25102     25242     25163|     25102 [      12]
    Sixtinsensitive+:      19649     19808     19619     20769     19910|     19619 [   31572]
     Sixtinsensitive:      19995     19031     18857     19831     18853|     18853 [   31823]
     Alfalfa_Rollick:       5618      5769      5732      5604      5692|      5604 [    5182]
             Alfalfa:       6177      5877      5878      5916      6015|      5877 [    5182]
        Alfalfa_HALF:       5829      5808      5901      6096      5934|      5808 [   10593]
       Alfalfa_DWORD:       5971      6091      5964      6193      6017|      5964 [    5182]
       Alfalfa_QWORD:       6096      6116      6103      5976      6019|      5976 [    5182]
           Bernstein:       6743      6508      6616      6860      6642|      6508 [   12855]
                 K&R:       5660      5396      5552      5467      5352|      5352 [   10593]
        x17 unrolled:       7591      7647      7664      7728      7722|      7591 [   26729]
              x65599:       9569      9762      9512      9765      9692|      9512 [      35]
           Sedgewick:       9453      9310      9669      9504      9511|      9310 [      11]
          Weinberger:       7267      7209      7155      7128      7184|      7128 [   27452]
         Paul Larson:       6588      6306      6481      6336      6469|      6306 [      14]
          Paul Hsieh:      33372     33097     33243     32862     33022|     32862 [      30]
         One At Time:      31891     31652     31955     32240     33917|     31652 [      28]
             lookup3:      34724     34649     35148     33936     34738|     33936 [      28]
        Arash Partow:       7987      7909      7933      8171      7999|      7909 [      11]
              CRC-32:      32050     34561     32286     34037     32238|     32050 [      11]
         Ramakrishna:       6800      6765      6726      6766      6693|      6693 [   11920]
            Fletcher:      20746     21133     20787     20814     21500|     20746 [   42071]
             Murmur2:      28144     28965     27766     27079     27476|     27079 [      30]
              Hanson:      32326     32349     32356     32086     32453|     32086 [      33]
      Novak unrolled:      22379     22860     23012     22888     22864|     22379 [   43340]
                SBox:      32514     32536     31759     31992     32705|     31759 [      26]
           MaPrime2c:      30483     30545     31196     31047     30942|     30483 [      11]

248019 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:     179028    174553    177981    175523    174781|    174553 [     427]
        FNV1A_Meiyan:     123632    122620    120915    119542    119672|    119542 [     365]
     FNV1A_Jesteress:     116618    116521    114923    115622    115284|    114923 [    1938]
        FNV1A_Jester:     114802    114067    118031    114699    115439|    114067 [    1938]
       FNV1A_Smaragd:     122969    125695    121732    123395    122146|    121732 [     365]
     FNV1A_Peregrine:     182437    178080    179506    184654    181708|    178080 [     365]
          FNV1A_Whiz:     117534    115430    116303    115853    117752|    115430 [    1938]
     FNV1A_Nefertiti:     105310    106214    106906    108339    106486|    105310 [   13655]
              FNV-1a:     117850    115653    116143    113840    115203|    113840 [     411]
    Sixtinsensitive+:     141117    138681    140142    138261    137625|    137625 [  145278]
     Sixtinsensitive:     112584    112133    113127    115259    113846|    112133 [  139136]
     Alfalfa_Rollick:      44101     44551     44016     44380     44286|     44016 [   13663]
             Alfalfa:      48331     46665     46135     46336     45574|     45574 [   13663]
        Alfalfa_HALF:      42777     40995     41830     41363     41640|     40995 [   22302]
       Alfalfa_DWORD:      48295     47415     49170     46909     48033|     46909 [   13666]
       Alfalfa_QWORD:      49057     47531     49249     50390     48969|     47531 [   13663]
           Bernstein:      44504     43530     43335     45011     42987|     42987 [   37847]
                 K&R:      35828     35036     34723     36075     35468|     34723 [   22302]
        x17 unrolled:      42351     41905     42011     43212     41050|     41050 [   98876]
              x65599:      65130     64455     65439     64865     65220|     64455 [     492]
           Sedgewick:      67421     66574     68726     69338     67572|     66574 [     460]
          Weinberger:      39373     38517     38971     39534     39034|     38517 [   97808]
         Paul Larson:      46692     47034     47393     46918     46783|     46692 [     312]
          Paul Hsieh:     181643    180929    179932    180131    181471|    179932 [     481]
         One At Time:     176486    178151    175074    179022    175743|    175074 [     585]
             lookup3:     186712    187299    187604    191075    188396|    186712 [     480]
        Arash Partow:      64996     68647     66338     66992     65996|     64996 [     560]
              CRC-32:     175958    176258    177073    178955    174797|    174797 [     864]
         Ramakrishna:      46777     48286     46497     46997     45853|     45853 [   30316]
            Fletcher:     136146    133758    146270    133606    134351|    133606 [   49303]
             Murmur2:     173469    171420    172774    184955    173339|    171420 [     451]
              Hanson:     171761    174920    173488    172920    171958|    171761 [     481]
      Novak unrolled:     443212    446049    441939    439340    441672|    439340 [  238439]
                SBox:     171345    169739    169607    172310    169658|    169607 [     448]
           MaPrime2c:     168898    170270    169093    170387    168243|    168243 [     556]

855682 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:     635469    619881    616091    617357    615924|    615924 [    5518]
        FNV1A_Meiyan:     415088    419401    417240    417605    416751|    415088 [    5330]
     FNV1A_Jesteress:     415868    417644    417151    418238    416844|    415868 [    5968]
        FNV1A_Jester:     416369    415653    417404    418992    417513|    415653 [    5968]
       FNV1A_Smaragd:     439764    441560    445192    443800    445971|    439764 [    5330]
     FNV1A_Peregrine:     633723    638664    636564    629140    635454|    629140 [    5330]
          FNV1A_Whiz:     419561    419168    415653    416053    422086|    415653 [    5968]
     FNV1A_Nefertiti:     346358    346958    342009    343926    346074|    342009 [   30631]
              FNV-1a:     569274    565082    568014    565177    564142|    564142 [    5560]
    Sixtinsensitive+:     459193    459959    458521    455978    454754|    454754 [  397985]
     Sixtinsensitive:     434941    436744    435405    435906    439194|    434941 [  399637]
     Alfalfa_Rollick:     221477    221568    223042    223021    222469|    221477 [   26670]
             Alfalfa:     225888    228010    227636    225752    224918|    224918 [   26670]
        Alfalfa_HALF:     204340    201225    203866    202078    203984|    201225 [   35000]
       Alfalfa_DWORD:     228464    229762    231969    227542    228853|    227542 [   26671]
       Alfalfa_QWORD:     236423    235609    235735    234026    235571|    234026 [   26670]
           Bernstein:     206162    206211    205870    205911    204922|    204922 [   59573]
                 K&R:     175157    176214    177255    176008    178668|    175157 [   34582]
        x17 unrolled:     183717    184090    182407    182280    185108|    182280 [  190251]
              x65599:     320093    320066    317655    318289    320661|    317655 [    4932]
           Sedgewick:     303817    303858    305741    304128    306254|    303817 [    4428]
          Weinberger:     168826    170675    170027    169820    171513|    168826 [  177873]
         Paul Larson:     245727    247790    246504    247241    246850|    245727 [    5499]
          Paul Hsieh:     649603    649342    654617    650496    655266|    649342 [   30488]
         One At Time:     630109    633306    630020    630335    632801|    630020 [    5476]
             lookup3:     655841    654395    659685    654746    657823|    654395 [    5491]
        Arash Partow:     269233    269534    268099    268478    267374|    267374 [    6272]
              CRC-32:     619892    618408    613697    611186    616191|    611186 [    4450]
         Ramakrishna:     224644    226707    225684    225891    225163|    224644 [   45786]
            Fletcher:     354516    353851    353636    355539    355643|    353636 [  656967]
             Murmur2:     603138    605572    597462    602469    612482|    597462 [    5387]
              Hanson:     617870    655715    616389    618870    625239|    616389 [    5502]
      Novak unrolled:    3830475   3728574   3726761   3732577   3731494|   3726761 [  827051]
                SBox:     617466    619852    616287    620911    622378|    616287 [    5531]
           MaPrime2c:     608066    606798    609259    607319    609889|    606798 [    5418]

2236138 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:    1693726   1672778   1662022   1667975   1663712|   1662022 [   36643]
        FNV1A_Meiyan:    1398600   1396525   1397032   1398618   1402294|   1396525 [   38870]
     FNV1A_Jesteress:    1293633   1292957   1286705   1289538   1285456|   1285456 [   52366]
        FNV1A_Jester:    1292520   1286581   1292729   1292907   1290687|   1286581 [   52366]
       FNV1A_Smaragd:    1621731   1628939   1624750   1622775   1623560|   1621731 [   38870]
     FNV1A_Peregrine:    1745616   1750922   1752428   1753970   1752587|   1745616 [   38870]
          FNV1A_Whiz:    1405758   1414269   1403700   1401918   1396313|   1396313 [   52366]
     FNV1A_Nefertiti:    1289850   1261247   1258932   1260898   1261091|   1258932 [  120192]
              FNV-1a:    1567310   1564706   1568154   1567700   1564842|   1564706 [   37073]
    Sixtinsensitive+:    1476991   1482146   1482924   1474038   1479225|   1474038 [  874695]
     Sixtinsensitive:    1607388   1598777   1610932   1603888   1607572|   1598777 [  853608]
     Alfalfa_Rollick:     762068    777119    760775    768934    762752|    760775 [   61964]
             Alfalfa:     778235    769760    767723    777981    771836|    767723 [   61964]
        Alfalfa_HALF:     736108    708959    712845    713131    714714|    708959 [   77952]
       Alfalfa_DWORD:     782912    782056    781509    782052    781405|    781405 [   61957]
       Alfalfa_QWORD:     801646    806199    803009    799110    800906|    799110 [   61964]
           Bernstein:     714618    718075    721649    718863    718033|    714618 [   94387]
                 K&R:     637258    635055    644195    644498    639929|    635055 [   79819]
        x17 unrolled:     611861    610664    611586    610798    613531|    610664 [  261696]
              x65599:    1017236   1011981   1017402   1011745   1014835|   1011745 [   37963]
           Sedgewick:     977710    974922    976937    977692    986235|    974922 [   36010]
          Weinberger:     665221    655858    659006    660126    659379|    655858 [  256510]
         Paul Larson:     828451    828535    829061    826011    824462|    824462 [   36486]
          Paul Hsieh:    1757052   1764429   1759602   1764186   1765198|   1757052 [   75554]
         One At Time:    1751552   1745773   1752822   1742071   1750032|   1742071 [   36793]
             lookup3:    1775213   1782342   1788541   1776350   1779976|   1775213 [   36983]
        Arash Partow:     975995    975288    973406    972132    974392|    972132 [   36388]
              CRC-32:    1675912   1674248   1679114   1686786   1672200|   1672200 [   37609]
         Ramakrishna:     783869    781304    774580    777867    781566|    774580 [   82766]
            Fletcher:    1729741   1746034   1739662   1743231   1741643|   1729741 [  695965]
             Murmur2:    1673253   1681208   1690136   1679592   1705678|   1673253 [   36934]
              Hanson:    1709350   1719499   1688917   1686299   1710949|   1686299 [   37167]
      Novak unrolled:   12807728  12814669  12812825  12829304  12809311|  12807728 [ 2152234]
                SBox:    1690921   1688167   1682024   1685474   1696212|   1682024 [   36476]
           MaPrime2c:    1687514   1683882   1690934   1687003   1690998|   1683882 [   37112]

4803152 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:    3925306   3863029   3859598   3865504   3854072|   3854072 [  167297]
        FNV1A_Meiyan:    3192046   3195965   3179265   3184785   3189121|   3179265 [  168339]
     FNV1A_Jesteress:    3176691   3219146   3182731   3173671   3149210|   3149210 [  170503]
        FNV1A_Jester:    3155641   3146388   3141771   3149879   3150006|   3141771 [  170503]
       FNV1A_Smaragd:    3666858   3654821   3662842   3665221   3658382|   3654821 [  168339]
     FNV1A_Peregrine:    3867947   3860289   3854550   3860177   3857658|   3854550 [  168339]
          FNV1A_Whiz:    3182447   3195344   3196748   3185454   3184057|   3182447 [  170503]
     FNV1A_Nefertiti:    3046587   3035378   3034303   3032712   3030773|   3030773 [  215095]
              FNV-1a:    3640750   3650958   3639683   3654937   3646469|   3639683 [  168058]
    Sixtinsensitive+:    3253634   3254085   3240195   3271119   3274314|   3240195 [ 1446295]
     Sixtinsensitive:    3320076   3295173   3299524   3298457   3312531|   3295173 [ 1433399]
     Alfalfa_Rollick:    2053934   2053399   2053120   2058154   2054123|   2053120 [  195034]
             Alfalfa:    2081500   2073797   2072266   2068358   2077348|   2068358 [  195034]
        Alfalfa_HALF:    1950333   1947412   1954580   1958729   1950574|   1947412 [  214254]
       Alfalfa_DWORD:    2096353   2095004   2098951   2088296   2094393|   2088296 [  195036]
       Alfalfa_QWORD:    2156447   2162179   2169995   2169146   2166216|   2156447 [  195034]
           Bernstein:    1955300   1967808   1970984   1964697   1961368|   1955300 [  227590]
                 K&R:    1797894   1801834   1799419   1799418   1805518|   1797894 [  211850]
        x17 unrolled:    1698527   1693994   1691649   1694846   1692182|   1691649 [  432040]
              x65599:    2588406   2597619   2593762   2588259   2583001|   2583001 [  168139]
           Sedgewick:    2523482   2521966   2543139   2544848   2558079|   2521966 [  172013]
          Weinberger:    2208798   2219657   2188612   2187141   2191627|   2187141 [  771987]
         Paul Larson:    2192456   2185158   2190565   2189394   2193449|   2185158 [  166691]
          Paul Hsieh:    3943296   3943258   3937383   3948230   3936017|   3936017 [  238759]
         One At Time:    4000288   3997946   4001915   3991514   4003329|   3991514 [  167708]
             lookup3:    3962573   3961724   3971915   3965543   3973480|   3961724 [  167908]
        Arash Partow:    2344027   2340630   2358305   2344931   2342474|   2340630 [  165769]
              CRC-32:    3854146   3859270   3865907   3861287   3841605|   3841605 [  166743]
         Ramakrishna:    2124189   2133556   2123155   2125564   2141054|   2123155 [  220086]
            Fletcher:    2922879   2927199   2931088   2917889   2913200|   2913200 [ 3262980]
             Murmur2:    3801662   3800472   3803887   3807522   3798062|   3798062 [  167779]
              Hanson:    3818130   3830602   3831806   3825353   3832675|   3818130 [  169743]
      Novak unrolled:   24990613  25009525  24991605  25198933  25036322|  24990613 [ 4562018]
                SBox:    3835014   3826787   3823619   3822488   3823134|   3822488 [  167630]
           MaPrime2c:    3863293   3869203   3873062   3865968   3880730|   3863293 [  167237]

8956496 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:    7369727   7247741   7261370   7253739   7266209|   7247741 [  574438]
        FNV1A_Meiyan:    5847582   5886598   5930016   5881373   5854930|   5847582 [ 1028978]
     FNV1A_Jesteress:    5836792   5850374   5854357   5860963   5850093|   5836792 [ 1028978]
        FNV1A_Jester:    5769096   5750931   5764104   5758504   5764662|   5750931 [  913448]
       FNV1A_Smaragd:    7574255   7570837   7558324   7565386   7571800|   7558324 [  574438]
     FNV1A_Peregrine:    7647868   7636216   7672854   7713413   7620732|   7620732 [  913448]
          FNV1A_Whiz:    5811576   5802315   5804006   5807046   5795549|   5795549 [  913448]
     FNV1A_Nefertiti:    6992685   7015027   7011395   7016150   6997868|   6992685 [  896124]
              FNV-1a:    7426811   7423883   7422975   7425308   7492168|   7422975 [  572119]
    Sixtinsensitive+:    7490178   7494893   7567449   7527364   7506432|   7490178 [ 2560474]
     Sixtinsensitive:    6603978   6604723   6613926   6612349   6612198|   6603978 [ 2277956]
     Alfalfa_Rollick:    5270257   5276405   5265746   5261441   5276458|   5261441 [ 1633539]
             Alfalfa:    4773629   4768585   4773963   4765614   4767022|   4765614 [  598532]
        Alfalfa_HALF:    4551844   4549514   4562170   4564350   4556019|   4549514 [  613540]
       Alfalfa_DWORD:    4934446   4942344   4953421   4915923   4916364|   4915923 [  598520]
       Alfalfa_QWORD:    5140486   5153957   5146400   5159958   5143577|   5140486 [  598520]
           Bernstein:    4621485   4622047   4609047   4626567   4616239|   4609047 [  632926]
                 K&R:    4615881   4608413   4612256   4608572   4608237|   4608237 [  609893]
        x17 unrolled:    4080313   4077832   4055702   4069213   4080348|   4055702 [  868872]
              x65599:    5683539   5666888   5694646   5737104   5727959|   5666888 [  574269]
           Sedgewick:    5651701   5627789   5631242   5624196   5614715|   5614715 [  570722]
          Weinberger:    6571422   6584821   6580232   6565814   6570873|   6565814 [ 2462847]
         Paul Larson:    4983728   4986882   4984806   4982408   4977153|   4977153 [  573492]
          Paul Hsieh:    7963669   7934035   7959702   7964171   7986804|   7934035 [  609781]
         One At Time:    8281677   8236676   8181044   8174620   8171247|   8171247 [  572492]
             lookup3:    8131042   8098702   8119699   8131122   8127732|   8098702 [  571014]
        Arash Partow:    5694880   5689350   5703258   5687727   5695461|   5687727 [  569873]
              CRC-32:    7737260   7735351   7731257   7817579   7805965|   7731257 [  570199]
         Ramakrishna:    4960288   4975901   4964380   4955695   4947370|   4947370 [  629279]
            Fletcher:    8703788   8735616   8710538   8720857   8710367|   8703788 [ 2646718]
             Murmur2:    7772966   7779928   7763846   7781060   7776226|   7763846 [  570799]
              Hanson:    7982700   8000687   8098676   7959213   7977307|   7959213 [  581505]
      Novak unrolled:   36453591  36501725  36504178  36452614  35493766|  35493766 [ 8271712]
                SBox:    7534769   7522317   7502772   7527539   7516409|   7502772 [  572545]
           MaPrime2c:    7631536   7603833   7626193   7617020   7676645|   7603833 [  572590]

15006172 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:   13805664  13605789  13605469  13595929  13624790|  13595929 [ 1560014]
        FNV1A_Meiyan:   11637701  11612695  11614641  11671544  11646487|  11612695 [ 1656865]
     FNV1A_Jesteress:   11769885  11633601  11644545  11629472  11618162|  11618162 [ 1656865]
        FNV1A_Jester:   11646312  11644806  11627881  11614360  11653957|  11614360 [ 1599151]
       FNV1A_Smaragd:   13708741  13801274  13802221  13726471  13700392|  13700392 [ 1560794]
     FNV1A_Peregrine:   13762367  13768122  13775727  13756456  13755180|  13755180 [ 1599151]
          FNV1A_Whiz:   11692615  11707611  11830349  11673884  11728753|  11673884 [ 1599151]
     FNV1A_Nefertiti:   11818117  11783640  11799180  11792281  11818414|  11783640 [ 1642435]
              FNV-1a:   14438052  14415112  14424435  14534772  14397478|  14397478 [ 1559756]
    Sixtinsensitive+:   12537062  12533209  12528706  12526817  12539785|  12526817 [ 3634999]
     Sixtinsensitive:   12359837  12363844  12352257  12369827  12430243|  12352257 [ 3573121]
     Alfalfa_Rollick:    9596214   9576997   9596140   9585540   9586623|   9576997 [ 1796113]
             Alfalfa:    9631180   9631644   9646953   9631102   9645931|   9631102 [ 1585513]
        Alfalfa_HALF:    9309513   9308152   9309767   9386163   9309212|   9308152 [ 1599087]
       Alfalfa_DWORD:    9784801   9779266   9768980   9776590   9778093|   9768980 [ 1585512]
       Alfalfa_QWORD:   10243282  10241185  10250929  10247302  10227501|  10227501 [ 1585512]
           Bernstein:   10167814  10222175  10258834  10199647  10188531|  10167814 [ 1613382]
                 K&R:    9476524   9501514   9493102   9506902   9485744|   9476524 [ 1601151]
        x17 unrolled:    8488170   8457047   8465708   8474736   8470738|   8457047 [ 1874514]
              x65599:   11919751  12111993  11896090  11916367  11906404|  11896090 [ 1560060]
           Sedgewick:   11883738  11902481  11876890  11862498  11874301|  11862498 [ 1561162]
          Weinberger:   14886178  14897714  15010720  14889111  14912039|  14886178 [ 4913597]
         Paul Larson:    9985066   9988315   9993121   9980663   9987270|   9980663 [ 1558905]
          Paul Hsieh:   14493691  14487566  14472722  14616287  14481414|  14472722 [ 1601618]
         One At Time:   15014211  15011538  14998070  15011306  15034171|  14998070 [ 1559519]
             lookup3:   14432190  14404073  14484438  14466880  14432023|  14404073 [ 1559611]
        Arash Partow:   10727471  10704561  10702966  10736381  10695439|  10695439 [ 1561948]
              CRC-32:   13970409  14002291  14020990  14111270  14031718|  13970409 [ 1559521]
         Ramakrishna:   10870117  10871609  10863313  10869283  10873759|  10863313 [ 1617827]
            Fletcher:   11197838  11199418  11188781  11211803  11207085|  11188781 [ 8696395]
             Murmur2:   14182899  14158526  14076046  14113060  14073436|  14073436 [ 1558392]
              Hanson:   14374202  14351157  14381037  14363612  14357884|  14351157 [ 1589703]
      Novak unrolled:   42308739  42114355  42204100  42355087  42168342|  42114355 [13102749]
                SBox:   14636086  14655729  14614969  14637515  14624832|  14614969 [ 1560519]
           MaPrime2c:   14980645  14880612  14869258  14870841  14866759|  14866759 [ 1557844]

22992127 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:   24055313  23752703  23627674  23635254  23651268|  23627674 [ 3525152]
        FNV1A_Meiyan:   21484521  21485590  21539671  21508878  21479985|  21479985 [ 3611850]
     FNV1A_Jesteress:   21471437  21479105  21470194  21441619  21588275|  21441619 [ 3611850]
        FNV1A_Jester:   21669658  21666314  21623420  21652606  21653595|  21623420 [ 3654062]
       FNV1A_Smaragd:   24030079  23964112  23916920  23921989  23915584|  23915584 [ 3525152]
     FNV1A_Peregrine:   24544415  24646189  24553231  24515824  24578414|  24515824 [ 3654062]
          FNV1A_Whiz:   21591734  21588433  21718497  21647686  21668049|  21588433 [ 3654062]
     FNV1A_Nefertiti:   22574103  22547113  22552888  22612894  22634642|  22547113 [ 3739373]
              FNV-1a:   24375206  24398812  24382207  24420587  24455063|  24375206 [ 3527537]
    Sixtinsensitive+:   23638104  23647702  23623564  23634566  23623324|  23623324 [ 5880454]
     Sixtinsensitive:   22370814  22356235  22289947  22309625  22289803|  22289803 [ 5652221]
     Alfalfa_Rollick:   18203747  18184287  18302923  18218867  18196632|  18184287 [ 3613229]
             Alfalfa:   18409370  18403101  18403470  18391784  18407508|  18391784 [ 3544682]
        Alfalfa_HALF:   17942649  17848766  17844460  17813733  17825962|  17813733 [ 3558121]
       Alfalfa_DWORD:   18700603  18698421  18747674  18674743  18655766|  18655766 [ 3544682]
       Alfalfa_QWORD:   19368576  19378340  19362669  19361865  19346436|  19346436 [ 3544682]
           Bernstein:   18359820  18240404  18241044  18224922  18250261|  18224922 [ 3574833]
                 K&R:   17517888  17518595  17556678  17588614  17500726|  17500726 [ 3561685]
        x17 unrolled:   16621547  16643586  16639577  16625549  16634323|  16621547 [ 3832933]
              x65599:   20818696  20817156  20729503  20772917  20736959|  20729503 [ 3528251]
           Sedgewick:   20434750  20463676  20561028  20480465  20457107|  20434750 [ 3525723]
          Weinberger:   26213981  26230236  26190034  26320772  26191497|  26190034 [ 7239926]
         Paul Larson:   20373400  20412562  20383213  20385771  20417860|  20373400 [ 3523757]
          Paul Hsieh:   25233187  25148714  25200371  25140735  25108527|  25108527 [ 3551794]
         One At Time:   26273020  26147008  26154750  26122919  26156684|  26122919 [ 3525042]
             lookup3:   25438420  25540246  25435166  25411436  25469256|  25411436 [ 3526141]
        Arash Partow:   20932539  20950835  20953799  20893649  20888163|  20888163 [ 3528077]
              CRC-32:   24446832  24401820  24418762  24552466  24398535|  24398535 [ 3529021]
         Ramakrishna:   19294474  19292354  19285851  19301630  19361718|  19285851 [ 3580093]
            Fletcher:   29491893  29393892  29382254  29383070  29559628|  29382254 [ 7785869]
             Murmur2:   24637145  24616643  24636645  24621049  24614172|  24614172 [ 3523751]
              Hanson:   25108618  24953984  24937324  24961635  24952696|  24937324 [ 3607402]
      Novak unrolled:   45287398  45201809  45162351  45247704  45070137|  45070137 [17947092]
                SBox:   24623427  24650569  24603930  24770727  24644057|  24603930 [ 3523461]
           MaPrime2c:   25086850  25030747  25112970  25110324  25208499|  25030747 [ 3525183]

32707519 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:   39663161  38969335  38933207  38856246  38873958|  38856246 [ 6818433]
        FNV1A_Meiyan:   37322456  37169246  37140526  37202370  37295058|  37140526 [ 6848898]
     FNV1A_Jesteress:   37151332  37169251  37148774  37289624  37146213|  37146213 [ 6848898]
        FNV1A_Jester:   37740139  37803620  37784132  37714810  37710040|  37710040 [ 6844433]
       FNV1A_Smaragd:   38917671  38764848  38805502  38781550  38871100|  38764848 [ 6820479]
     FNV1A_Peregrine:   39110328  39137217  39238079  39198853  39122895|  39110328 [ 6844433]
          FNV1A_Whiz:   37999460  38084763  38004072  38034855  38018002|  37999460 [ 6844433]
     FNV1A_Nefertiti:   35477881  35314767  35317648  35312868  35476589|  35312868 [ 6862897]
              FNV-1a:   39935647  39901778  39919333  40023990  39893528|  39893528 [ 6820338]
    Sixtinsensitive+:   37211159  37340669  37211783  37221348  37240185|  37211159 [ 8918604]
     Sixtinsensitive:   36839819  36755880  36778962  36790829  36880758|  36755880 [ 8890228]
     Alfalfa_Rollick:   30997547  30981009  31010693  31102597  31006573|  30981009 [ 6874377]
             Alfalfa:   31737062  31762501  31742817  31887839  31774054|  31737062 [ 6837237]
        Alfalfa_HALF:   31115319  31097943  31254094  31134033  31107714|  31097943 [ 6844423]
       Alfalfa_DWORD:   31877000  31892971  31991164  31877463  31879958|  31877000 [ 6837236]
       Alfalfa_QWORD:   32856874  32982880  32898091  32835536  32852958|  32835536 [ 6837236]
           Bernstein:   31919522  31814902  31776473  31806821  31801622|  31776473 [ 6857971]
                 K&R:   31123177  30974104  30959249  30974944  31050042|  30959249 [ 6850978]
        x17 unrolled:   29386189  29363602  29375920  29349031  29463899|  29349031 [ 7098408]
              x65599:   35127704  35115087  35096492  35247998  35097892|  35096492 [ 6819716]
           Sedgewick:   34917368  34883533  35086999  34888583  34912648|  34883533 [ 6818912]
          Weinberger:   41254124  41321017  41222180  41258375  41281811|  41222180 [10307895]
         Paul Larson:   32874893  32887489  32870621  32829068  32969357|  32829068 [ 6817853]
          Paul Hsieh:   40409490  40325257  40516598  40411823  40368034|  40325257 [ 6855160]
         One At Time:   42473111  42621639  42489644  42481293  42668990|  42473111 [ 6818900]
             lookup3:   39942929  39938528  39943634  40094568  39925246|  39925246 [ 6816040]
        Arash Partow:   34247173  34315764  34318592  34220246  34266975|  34220246 [ 6820854]
              CRC-32:   39786257  39800688  39757461  39725372  39891213|  39725372 [ 6818991]
         Ramakrishna:   33337713  33359240  33358901  33467731  33378484|  33337713 [ 6864715]
            Fletcher:   34382645  34384318  34476525  34384583  34358066|  34358066 [17501262]
             Murmur2:   39472463  39601819  39483558  39451986  39443158|  39443158 [ 6817202]
              Hanson:   40538844  40402486  40392937  40502717  40417510|  40392937 [ 7000005]
      Novak unrolled:   44247384  44242070  44412707  44241831  44227650|  44227650 [20559205]
                SBox:   40115423  40006667  39980360  40095793  39988604|  39980360 [ 6821829]
           MaPrime2c:   41060106  41075050  41175522  41060267  41057558|  41057558 [ 6819946]

43802365 lines read

67108864 elements in the table (26 bits)

        FNV1A_Mantis:   59952876  58803469  58923244  58792350  58989388|  58792350 [11635004]
        FNV1A_Meiyan:   55465475  55474404  55508029  55424802  55438444|  55424802 [11659429]
     FNV1A_Jesteress:   52497350  52380658  52576751  52399304  52419711|  52380658 [11818547]
        FNV1A_Jester:   53425354  53300316  53324168  53372656  53241243|  53241243 [12062644]
       FNV1A_Smaragd:   59908693  59741994  59788332  59969316  59756263|  59741994 [11635004]
     FNV1A_Peregrine:   60292033  60116021  60306069  60136755  60224361|  60116021 [11677586]
          FNV1A_Whiz:   57514713  57300395  57518028  57355838  57384306|  57300395 [12062644]
     FNV1A_Nefertiti:   58579235  58369723  58497589  58368938  58472111|  58368938 [12199626]
              FNV-1a:   61350065  61227879  61371281  61221694  61322523|  61221694 [11630492]
    Sixtinsensitive+:   58750571  58831447  58919152  58808174  58837582|  58750571 [14097063]
     Sixtinsensitive:   60824218  60843024  60732440  60798747  60949721|  60732440 [14027127]
     Alfalfa_Rollick:   50700991  50861491  50757394  50707789  50866710|  50700991 [11667363]
             Alfalfa:   51331604  51254984  51374576  51204869  51215921|  51204869 [11646579]
        Alfalfa_HALF:   50498639  50357422  50475334  50435161  50427230|  50357422 [11652948]
       Alfalfa_DWORD:   51926525  51882880  51921051  51988651  51929979|  51882880 [11646593]
       Alfalfa_QWORD:   53368751  53431091  53401526  53457158  53399570|  53368751 [11646573]
           Bernstein:   51780948  51834772  51740616  51786817  51980943|  51740616 [11662585]
                 K&R:   50720934  50823237  50784556  50750716  50848267|  50720934 [11655087]
        x17 unrolled:   48435214  48430906  48575140  48432371  48455008|  48430906 [11861222]
              x65599:   56005015  55872129  55909899  55947120  55903420|  55872129 [11632515]
           Sedgewick:   55947483  55828349  55855731  55938120  55820960|  55820960 [11633458]
          Weinberger:   64107358  63999034  64133733  64036367  64080080|  63999034 [15481171]
         Paul Larson:   53084388  53004950  53146557  53031828  53055844|  53004950 [11630978]
          Paul Hsieh:   61529775  61391502  61527462  61366620  61415225|  61366620 [11651683]
         One At Time:   65024854  64981867  65087212  64972332  65096548|  64972332 [11634172]
             lookup3:   62310133  62428054  62303782  62457776  62342284|  62303782 [11632483]
        Arash Partow:   56572844  56691783  56562409  56631840  56602057|  56562409 [11628687]
              CRC-32:   60661602  60789672  60664234  60804397  60621855|  60621855 [11633685]
         Ramakrishna:   53972634  53880488  53896432  53976477  53845704|  53845704 [11671893]
            Fletcher:   74218474  74084565  74190736  74047460  74184412|  74047460 [17365212]
             Murmur2:   60389939  60448616  60500681  60306495  60454486|  60306495 [11630747]
              Hanson:   62505541  62568248  62495065  62509340  62576766|  62495065 [11992494]
      Novak unrolled:   43245062  43296038  43220401  43189639  43173652|  43173652 [19028002]
                SBox:   61389499  61282368  61384062  61288236  61365819|  61282368 [11633664]
           MaPrime2c:   62738819  62847730  62865690  62807543  62915332|  62738819 [11628836]


In order to enrich the versatility of your testbed, my suggestion is to add the Heavy-IP dataset (IPs.TXT, 2,995,394 keys) as a table alongside the en-wikipedia dataset; I think millions of words and IPs are a must-show basis. Speaking of some critics, I don't like words such as 'untrustworthy' to be connected with my name.

As for the match-finding test, I guess the critics must offer first and shoot next.

Regards.

Georgi 'Sanmayce',

Oh, I forgot to give the testbed for all the dumps above:

http://www.sanmayce.com/Downloads/_KAZE_hash_test_r3.7z

224,799,085 bytes
Peter Kankowski,
Georgi,
thanks, but Mantis reads beyond the buffer boundary if (wrdlen < 15). For example, if wrdlen == 7:
hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME; // <--- this line will read a WORD right after the string
Georgi 'Sanmayce',

Yes, Peter, again a stupid error from my side, caused by the hurry-mode I am in these days. I saw it yesterday, 3 hours after updating my site, and fixed it by wrapping with 'if (wrdlen) {}'.

In an hour I will post again; last night I also remade the 200MB test, with some big table in mind for the weekend.

Mantis is a very good predator/hash; I will be glad if you test it on your machine.

Thanks.

Georgi 'Sanmayce',

The Mantis source and the benchmark dumps (http://www.sanmayce.com/Downloads/_KAZE_hash_test_r3.7z) (224,916,121 bytes) are fixed now,

here is the bugless Mantis:

#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Mantis(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	const char *p = str;
	// Cases: 0,1,2,3,4,5,6,7
	if (wrdlen & sizeof(DWORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		//wrdlen -= sizeof(DWORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
		//wrdlen -= sizeof(WORD);
	}
	if (wrdlen & 1) {
		hash32 = (hash32 ^ *p) * PRIME;
		p += sizeof(char);
		//wrdlen -= sizeof(char);
	}
	wrdlen -= p - str;
// The goal is to avoid the weak range [8, 8+2, 8+1] that is 8..10 in practice 1..15 i.e. 1..8+4+2+1, thus amending FNV1A_Meiyan and FNV1A_Jesteress.
// FNV1A_Jesteress: fastest strong
// FNV1A_Meiyan   : faster  stronger
// FNV1A_Mantis   : fast    strongest
	if (wrdlen) {
		for (; wrdlen > 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
			hash32 = (hash32 ^ (ROL(*(DWORD *)p, 5) ^ *(DWORD *)(p+4))) * PRIME;
		}
		hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+1*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+2*sizeof(WORD))) * PRIME;
		hash32 = (hash32 ^ *(WORD*)(p+3*sizeof(WORD))) * PRIME;
	} // Bug fixed!
	return hash32 ^ (hash32 >> 16);
}


Peter Kankowski,

Mantis results on Pentium M:

(each cell: time [collisions])

Hash function    Words     Win32     Numbers   Prefix    Postfix    Variables  Sonnets    UTF-8       IPv4       Avg
Jesteress        78[110]   419[397]  59[300]   121[102]  115[106]   353[366]   512[585]   2411[2427]  439[1499]  1.02[3.19]
Meiyan           79[102]   427[409]  56[125]   122[106]  118[112]   354[350]   524[588]   2439[2377]  445[768]   1.03[1.86]
Mantis           84[97]    471[404]  56[125]   140[117]  133[99]    389[370]   551[585]   2646[2388]  446[768]   1.10[1.86]
Novak unrolled   90[113]   518[399]  56[90]    168[118]  163[113]   401[342]   576[581]   2728[2430]  482[969]   1.20[1.67]
Fletcher         84[131]   443[406]  102[460]  139[127]  132[108]   377[507]   594[1052]  2912[4893]  513[1359]  1.23[4.60]
SBox             88[91]    551[431]  57[116]   181[108]  176[91]    413[347]   560[526]   2817[2442]  472[836]   1.24[1.77]
Murmur2          94[103]   529[415]  64[104]   165[106]  160[111]   430[383]   619[566]   2933[2399]  538[834]   1.26[1.73]
CRC-32           90[101]   566[426]  56[64]    197[107]  192[94]    427[338]   593[563]   2840[2400]  469[725]   1.28[1.40]
x17 unrolled     93[109]   592[415]  52[24]    214[113]  207[102]   435[368]   593[589]   2867[2392]  486[829]   1.32[1.18]
lookup3          95[101]   565[412]  71[97]    189[101]  183[95]    436[361]   633[550]   2954[2392]  570[834]   1.35[1.64]
K&R              94[106]   617[437]  58[288]   221[94]   218[106]   443[360]   588[561]   2966[2365]  448[831]   1.36[2.99]
Paul Larson      94[99]    630[416]  50[16]    231[99]   227[105]   455[366]   600[583]   3026[2447]  469[755]   1.38[1.09]
Bernstein        95[114]   621[412]  61[288]   225[100]  221[102]   445[353]   593[572]   3002[2380]  469[703]   1.38[2.98]
Paul Hsieh       106[114]  574[420]  71[118]   183[101]  179[100]   457[341]   684[600]   3168[2380]  579[847]   1.39[1.82]
x65599           94[111]   626[382]  61[203]   234[107]  231[122]   451[379]   597[560]   2997[2373]  471[846]   1.40[2.44]
Sedgewick        101[107]  666[414]  53[48]    244[103]  241[103]   477[348]   630[570]   3205[2437]  475[782]   1.45[1.32]
Murmur2A         112[114]  598[433]  79[102]   181[112]  176[109]   490[365]   721[544]   3382[2369]  650[772]   1.46[1.72]
FNV-1a           101[124]  659[428]  62[108]   239[94]   235[105]   472[374]   625[555]   3135[2446]  517[807]   1.46[1.76]
MaPrime2c        107[103]  705[426]  65[106]   255[91]   253[106]   509[349]   675[550]   3414[2406]  541[865]   1.56[1.72]
Ramakrishna      108[108]  727[409]  61[91]    277[125]  271[103]   513[360]   669[528]   3413[2383]  517[840]   1.59[1.65]
Arash Partow     107[101]  739[435]  93[420]   281[98]   274[85]    516[355]   674[570]   3355[2372]  542[779]   1.68[3.87]
One At Time      118[105]  832[421]  81[110]   320[97]   316[103]   576[364]   741[545]   3806[2346]  657[795]   1.85[1.74]
Weinberger       120[104]  957[422]  54[100]   376[111]  378[117]   623[364]   753[712]   4019[2547]  560[744]   1.94[1.74]
Hanson           87[118]   530[649]  55[112]   168[118]  1647[499]  393[435]   549[592]   2740[2890]  462[833]   2.62[2.44]
Hash function    Wikipedia           Avg
x17 unrolled     11407606[2410605]   1.00[1.16]
K&R              11743083[2083145]   1.03[1.00]
Bernstein        11850076[2074237]   1.04[1.00]
Paul Larson      11998017[2080111]   1.05[1.00]
Sedgewick        12224089[2080640]   1.07[1.00]
x65599           12166596[2102893]   1.07[1.01]
Arash Partow     12374085[2084572]   1.08[1.00]
Ramakrishna      12334890[2093253]   1.08[1.01]
Meiyan           12381114[2111271]   1.09[1.02]
Jesteress        12390279[2121868]   1.09[1.02]
Mantis           12525368[2082213]   1.10[1.00]
CRC-32           12739133[2075088]   1.12[1.00]
Murmur2          12815546[2081476]   1.12[1.00]
Hanson           12766271[2129832]   1.12[1.03]
SBox             12851512[2084018]   1.13[1.00]
lookup3          12934889[2084889]   1.13[1.01]
FNV-1a           12982534[2081195]   1.14[1.00]
Paul Hsieh       13126250[2180206]   1.15[1.05]
Murmur2A         13204842[2081370]   1.16[1.00]
MaPrime2c        13489436[2084467]   1.18[1.00]
One At Time      13793712[2087861]   1.21[1.01]
Weinberger       14772418[3541181]   1.29[1.71]
Fletcher         37809825[9063797]   3.31[4.37]
Novak unrolled   38061845[6318611]   3.34[3.05]

Core i5:

(each cell: time [collisions])

Hash function    Words     Win32     Numbers   Prefix    Postfix    Variables  Sonnets    UTF-8       IPv4       Avg
iSCSI CRC        74[105]   330[415]  36[112]   85[106]   83[92]     278[368]   407[584]   1974[2388]  322[838]   1.01[1.77]
Meiyan           74[102]   328[409]  45[125]   86[106]   85[112]    274[350]   413[588]   1979[2377]  352[768]   1.05[1.86]
Jesteress        76[110]   324[397]  46[300]   86[102]   84[106]    274[366]   410[585]   1964[2427]  366[1499]  1.06[3.19]
Mantis           78[97]    358[404]  45[125]   100[117]  96[99]     301[370]   425[585]   2115[2388]  348[768]   1.12[1.86]
Murmur2          71[103]   378[415]  49[104]   109[106]  106[111]   314[383]   452[566]   2181[2399]  398[834]   1.19[1.73]
SBox             70[91]    439[431]  46[116]   124[108]  122[91]    300[347]   429[526]   2151[2442]  378[836]   1.22[1.77]
Paul Larson      69[99]    396[416]  39[16]    143[99]   141[105]   305[366]   435[583]   2159[2447]  349[755]   1.23[1.09]
Novak unrolled   76[113]   461[399]  44[90]    126[118]  125[113]   322[342]   460[581]   2283[2430]  379[969]   1.26[1.67]
CRC-32           70[101]   489[426]  39[64]    146[107]  144[94]    320[338]   444[563]   2229[2400]  357[725]   1.28[1.40]
Sedgewick        73[107]   476[414]  42[48]    144[103]  143[103]   318[348]   450[570]   2255[2437]  349[782]   1.29[1.32]
Fletcher         71[131]   402[406]  80[460]   103[127]  99[108]    311[507]   482[1052]  2476[4893]  387[1359]  1.30[4.60]
Murmur2A         91[114]   408[433]  53[102]   117[112]  115[109]   335[365]   496[544]   2382[2369]  428[772]   1.31[1.72]
x65599           73[111]   463[382]  51[203]   144[107]  145[122]   315[379]   440[560]   2182[2373]  350[846]   1.31[2.44]
FNV-1a           73[124]   469[428]  53[108]   144[94]   144[105]   310[374]   443[555]   2203[2446]  373[807]   1.32[1.76]
Bernstein        74[114]   432[412]  49[288]   150[100]  150[102]   323[353]   450[572]   2276[2380]  351[703]   1.32[2.98]
Paul Hsieh       82[114]   413[420]  54[118]   127[101]  125[100]   342[341]   502[600]   2379[2380]  433[847]   1.33[1.82]
K&R              77[106]   494[437]  47[288]   150[94]   149[106]   334[360]   481[561]   2364[2365]  344[831]   1.35[2.99]
x17 unrolled     78[109]   510[415]  44[24]    156[113]  153[102]   343[368]   473[589]   2367[2392]  373[829]   1.38[1.18]
lookup3          83[101]   460[412]  55[97]    140[101]  138[95]    361[361]   527[550]   2497[2392]  427[834]   1.41[1.64]
MaPrime2c        79[103]   457[426]  57[106]   155[91]   155[106]   349[349]   484[550]   2491[2406]  405[865]   1.42[1.72]
Ramakrishna      82[108]   587[409]  45[91]    189[125]  186[103]   371[360]   493[528]   2602[2383]  381[840]   1.53[1.65]
Arash Partow     83[101]   560[435]  70[420]   215[98]   212[85]    391[355]   507[570]   2624[2372]  408[779]   1.69[3.87]
One At Time      84[105]   561[421]  58[110]   221[97]   220[103]   391[364]   509[545]   2645[2346]  459[795]   1.70[1.74]
Weinberger       85[104]   595[422]  42[100]   266[111]  279[117]   402[364]   548[712]   2769[2547]  419[744]   1.81[1.74]
Hanson           72[118]   469[649]  45[112]   122[118]  1217[499]  311[435]   436[592]   2238[2890]  370[833]   2.70[2.44]
Hash function    Wikipedia           Avg
iSCSI CRC        5814350[2077725]    1.00[1.00]
Jesteress        5820373[2121868]    1.00[1.02]
Meiyan           5833530[2111271]    1.00[1.02]
Mantis           5987971[2082213]    1.03[1.00]
Murmur2          6320510[2081476]    1.09[1.00]
Paul Larson      6349291[2080111]    1.09[1.00]
x65599           6478493[2102893]    1.11[1.01]
FNV-1a           6645319[2081195]    1.14[1.00]
Hanson           6891590[2129832]    1.19[1.03]
SBox             6959635[2084018]    1.20[1.00]
Sedgewick        7072573[2080640]    1.22[1.00]
CRC-32           7090810[2075088]    1.22[1.00]
K&R              7145952[2083145]    1.23[1.00]
Bernstein        7256091[2074237]    1.25[1.00]
Murmur2A         7369543[2081370]    1.27[1.00]
lookup3          7359399[2084889]    1.27[1.01]
Paul Hsieh       7417808[2180206]    1.28[1.05]
x17 unrolled     7419562[2410605]    1.28[1.16]
Ramakrishna      8183394[2093253]    1.41[1.01]
One At Time      8332427[2087861]    1.43[1.01]
MaPrime2c        8433854[2084467]    1.45[1.00]
Arash Partow     8501990[2084572]    1.46[1.00]
Weinberger       9433157[3541181]    1.62[1.71]
Novak unrolled   21350811[6318611]   3.67[3.05]
Fletcher         22272811[9063797]   3.83[4.37]

Generally, Mantis has a similar number of collisions to Meiyan, but Mantis is slower.

Georgi 'Sanmayce',

Thanks for testing,

at the link below is my attempt to present the Heavy-Hash-Hustle dumps in a more digestible fashion:

http://www.sanmayce.com/Fastest_Hash/index.html#Heavy-Hash-Hustle

Intel Core 2 Quad Q9550S Yorkfield 2.83GHz 12MB L2 Cache:

D:\_2010-Dec-05\_KAZE_hash_test_r3.RESULTS.Atom.Q9550S\RESULTS_Q9550S>sort IP.TXT /+75

8388608 elements in the table (23 bits)

2995394 lines read

        FNV1A_Meiyan:    1914155   1914404   1914122   1914486   1913915|   1913915 [  593723]
     FNV1A_Jesteress:    1935195   1937318   1937469   1936312   1935495|   1935195 [  691369]
     Alfalfa_Rollick:    1980953   1981207   1980671   1981201   1981856|   1980671 [  604098]
        FNV1A_Mantis:    2016612   1995406   1993654   1993662   1993258|   1993258 [  481137]
      Novak unrolled:    2010097   2009996   2009829   2010971   2009834|   2009829 [  657377]
              Hanson:    2012466   2012191   2012537   2012350   2012236|   2012191 [  534251]
       FNV1A_Smaragd:    2030879   2028917   2029161   2029013   2028666|   2028666 [  480914]
              CRC-32:    2034011   2033842   2035153   2033904   2034696|   2033842 [  472854]
        FNV1A_Jester:    2050708   2050776   2050694   2050668   2051040|   2050668 [  689339]
                 K&R:    2064544   2064406   2065423   2065249   2065812|   2064406 [  474011]
             Murmur2:    2066808   2066666   2066382   2066497   2066478|   2066382 [  476330]
             Alfalfa:    2068149   2067824   2068182   2068081   2067452|   2067452 [  475434]
        Alfalfa_HALF:    2071077   2071411   2071349   2070671   2071446|   2070671 [  480071]
       Alfalfa_DWORD:    2081631   2081774   2081485   2081846   2081789|   2081485 [  475434]
     FNV1A_Peregrine:    2098860   2098795   2098753   2098718   2098887|   2098718 [  546915]
        x17 unrolled:    2112134   2114144   2112556   2112551   2112255|   2112134 [  475528]
         Paul Larson:    2121042   2121087   2120977   2120391   2121282|   2120391 [  475575]
          FNV1A_Whiz:    2127393   2127963   2127276   2126858   2127638|   2126858 [  689339]
                SBox:    2137206   2137208   2137045   2136962   2136563|   2136563 [  476681]
           Sedgewick:    2137897   2138151   2137352   2138251   2137971|   2137352 [  477931]
           Bernstein:    2165058   2164180   2164633   2164803   2164633|   2164180 [  474048]
              FNV-1a:    2169036   2169173   2168907   2168979   2168931|   2168907 [  477067]
     FNV1A_Nefertiti:    2184810   2184848   2184522   2184738   2185078|   2184522 [  763451]
       Alfalfa_QWORD:    2190122   2189991   2189488   2189910   2189894|   2189488 [  475434]
          Paul Hsieh:    2196409   2195461   2195511   2195236   2195676|   2195236 [  543835]
          Weinberger:    2203173   2203305   2202832   2202739   2202064|   2202064 [ 1159267]
     Sixtinsensitive:    2225863   2225928   2226657   2226719   2226205|   2225863 [  582793]
        Arash Partow:    2241218   2241227   2241293   2240949   2240652|   2240652 [  478246]
             lookup3:    2246638   2246955   2247440   2247833   2247080|   2246638 [  476566]
           MaPrime2c:    2250515   2250361   2251148   2251678   2252040|   2250361 [  477151]
         Ramakrishna:    2268889   2268536   2268740   2268861   2268980|   2268536 [  476020]
    Sixtinsensitive+:    2272421   2272439   2272689   2272420   2272411|   2272411 [  716367]
              x65599:    2341902   2342241   2341970   2341930   2341731|   2341731 [  654463]
         One At Time:    2360163   2360112   2360203   2359913   2359343|   2359343 [  477667]
            Fletcher:   24424563  24401091  24400993  24395252  24394052|  24394052 [ 2856890]

Intel Atom N450 1.66GHz 512KB L2 Cache:

D:\_2010-Dec-05\_KAZE_hash_test_r3.RESULTS.Atom.Q9550S\RESULTS_Atom>sort IP.TXT /+75

8388608 elements in the table (23 bits)

2995394 lines read

        FNV1A_Meiyan:    3155958   3159537   3157158   3158773   3156719|   3155958 [  593723]
          Weinberger:    3248750   3246228   3248741   3249811   3243717|   3243717 [ 1159267]
     FNV1A_Jesteress:    3253953   3256344   3264163   3254169   3254820|   3253953 [  691369]
        FNV1A_Mantis:    3314496   3281163   3284368   3283819   3281818|   3281163 [  481137]
              CRC-32:    3343937   3345737   3356535   3347699   3345348|   3343937 [  472854]
     FNV1A_Peregrine:    3376694   3378179   3386379   3379572   3377683|   3376694 [  546915]
        Alfalfa_HALF:    3379134   3382696   3380707   3377835   3392481|   3377835 [  480071]
       FNV1A_Smaragd:    3379196   3382352   3380635   3379201   3379668|   3379196 [  480914]
             lookup3:    3424892   3430226   3426356   3421977   3426662|   3421977 [  476566]
             Murmur2:    3430503   3430565   3429001   3429150   3427817|   3427817 [  476330]
              Hanson:    3433796   3434865   3435452   3435932   3432672|   3432672 [  534251]
          Paul Hsieh:    3435899   3433042   3435868   3436050   3436606|   3433042 [  543835]
     Alfalfa_Rollick:    3465457   3459388   3456972   3458956   3458066|   3456972 [  604098]
           Bernstein:    3487977   3491575   3494922   3486510   3487834|   3486510 [  474048]
        FNV1A_Jester:    3504276   3504202   3510988   3505340   3503647|   3503647 [  689339]
     Sixtinsensitive:    3515471   3519793   3523615   3517541   3519258|   3515471 [  582793]
          FNV1A_Whiz:    3533310   3534248   3531524   3534347   3531857|   3531524 [  689339]
        x17 unrolled:    3542761   3543107   3541781   3540489   3542713|   3540489 [  475528]
        Arash Partow:    3542980   3542590   3542073   3542242   3545691|   3542073 [  478246]
    Sixtinsensitive+:    3546786   3546044   3546339   3546185   3544348|   3544348 [  716367]
                 K&R:    3564368   3562589   3565386   3565243   3563323|   3562589 [  474011]
              FNV-1a:    3576419   3588693   3579410   3576049   3579704|   3576049 [  477067]
      Novak unrolled:    3595877   3582905   3580473   3580509   3582652|   3580473 [  657377]
             Alfalfa:    3583829   3584782   3582429   3584607   3581127|   3581127 [  475434]
     FNV1A_Nefertiti:    3593590   3588775   3591870   3593603   3590078|   3588775 [  763451]
       Alfalfa_DWORD:    3622713   3620781   3622630   3622695   3626486|   3620781 [  475434]
         Ramakrishna:    3651378   3652073   3652729   3652338   3651156|   3651156 [  476020]
           Sedgewick:    3661915   3656998   3656072   3656996   3662433|   3656072 [  477931]
         Paul Larson:    3706139   3694095   3692079   3694297   3694153|   3692079 [  475575]
       Alfalfa_QWORD:    3696531   3702283   3701765   3703357   3699967|   3696531 [  475434]
                SBox:    3785574   3779069   3780873   3782789   3785080|   3779069 [  476681]
         One At Time:    3955247   3954321   3955469   3963892   3953817|   3953817 [  477667]
              x65599:    3998490   4006600   3997098   3994519   3996586|   3994519 [  654463]
           MaPrime2c:    4191055   4193915   4193048   4204844   4193416|   4191055 [  477151]
            Fletcher:   77210862  77226344  77294433  77235843  77205175|  77205175 [ 2856890]

My wish is to cover all meaningful (at least for LZ) lengths, that is 3..66 bytes, but a different approach must be taken because of the HUGE size of the dataset:

3*46486= 139,458 bytes
...
66*198631486=13,109,678,076 bytes

Speaking of the very precious (regarding English language usage and original thoughts, with hundreds of books included) OSHO.TXT, I propose one simple way of achieving Building-Blocks hashing:

loading 197MB (the file itself) and hashing 3..66-byte chunks at each position (i.e. one-byte increments).

Another thing I want to share regarding collision management:

approaches (rehashing, chains, ...) without definite goals, i.e. context, are like kata (detailed choreographed patterns of movements); the real fight is an extension/mix of kata/techniques with complex timing, which includes awareness of the timing of outer things, not just your own. I mean, if enough resources (free RAM) are given, then not utilizing/exploiting them while talking about speed as this-and-that is a dead-end.

For example, I tested (now commented out) a FNV1A variant hash function in Leprechaun which outperforms (hash time + lookup time) FNV1A_Jesteress by (1,8??,???-1,6??,???)/1,6??,???*100% = 12.5%, but that is entirely due to the B-tree used as collision manager at the final stage.

This very hasher does not perform well when other techniques are used, though.

The point is, speed lies beyond all imposed limitations; it must be chased relentlessly in each niche.

One phenomenon in the real world is Mr. Bolt: his fantastic technique is constantly being improved; as he says in one interview, he and his trainer are working on making one of the fastest starts in 100m races even faster. It's just amazing that the tallest sprinter has one of the most explosive starts as well. And even more amazing is the will to improve.

I have a sambo practitioner buddy who said about his 100/200m records: "What technique? It is just left-right left-right!"

Of course, I disagreed. Neglecting the basic/fundamental stances leads to nasty future slips (a kind of 'Oh! What happened?', i.e. a lack of further deep understanding and technique improvement).

Georgi 'Sanmayce',

I fully agree with:

http://cbloomrants.blogspot.com/2010/11/11-29-10-useless-hash-test.html

Looking at my 3chars..12chars Building-Block test, I see a strong candidate for a future ultimate testbed.

Peter Kankowski,
Thanks for the link. I'm glad that he got good results with Whiz :)
Georgi 'Sanmayce',

Hi Peter,

the day before yesterday, a documentary on the History Channel about Nikola Tesla (an outstanding man, not only a pragmatic visionary) inspired me to tune an almost forgotten hasher.

Here comes FNV1A_Tesla: a suitable hasher for keys [much] longer than 15 bytes (the case of 3+gram phrases).

That is the very hash function (with the 64bit x 32bit -> 32bit multiplication and loss of carry) I was talking about previously.

In all tests below, FNV1A_Tesla outspeeds all my FNV1A variants.

Surprisingly, the bad collision rate doesn't affect its speed; I have been hit [again] by the fact that the brutal loss of data doesn't hurt the lookup (when the keys are not exclusively of weak-range lengths).

Here the speed/dispersion trade-off was made in favor of speed, of course.

The function itself:

// Rotate left by n bits (32-bit); the function below requires it.
#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Tesla(const char *str, SIZE_T wrdlen)
{
	const UINT PRIME = 709607;
	UINT hash32 = 2166136261;
	//unsigned long long hash64 = 2166136261; // Change with a bigger one!
	const char *p = str;
	//unsigned long long QWORD1,QWORD2; //64bit=QWORD
	for(; wrdlen >= 2*2*sizeof(DWORD); wrdlen -= 2*2*sizeof(DWORD), p += 2*2*sizeof(DWORD)) {
		hash32 = (hash32 ^ (ROL(*(unsigned long long *)(p+0),5-0)^*(unsigned long long *)(p+8))) * PRIME; // loss of carry!
		//hash64 = (hash64 ^ (ROL(QWORD1,5-0)^QWORD2)) * PRIME;		
		//hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME;		
		//hash32 = (hash32 ^ (ROL(*(DWORD *)(p+8),5-0)^*(DWORD *)(p+12))) * PRIME;		
	}
	//hash32 = hash64 ^ (hash64 >> 32);
	// Cases: 0,1,2,3,4,5,6,7,... 15
	if (wrdlen & (2*sizeof(DWORD))) {
		hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME;		
		//hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
		//hash32 = (hash32 ^ *(DWORD*)(p+4)) * PRIME;
		p += 2*sizeof(DWORD);
	}
	if (wrdlen & sizeof(DWORD)) {
		hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
		p += sizeof(DWORD);
	}
	if (wrdlen & sizeof(WORD)) {
		hash32 = (hash32 ^ *(WORD*)p) * PRIME;
		p += sizeof(WORD);
	}
	if (wrdlen & 1) 
		hash32 = (hash32 ^ *p) * PRIME;
	return hash32 ^ (hash32 >> 16);
}

I am curious what amendments can be made. This is revision 1.

My 64bit knowledge/experience is next to nothing, so it would be nice if somebody refined it, especially for 64bit compilers.

Some tests (on my Intel Merom 2.16GHz, Windows XP 32bit, VS2008 32bit compiler):

D:\_KAZE_new-stuff\VivaNicolaTesla>dir/oe

Volume in drive D is H320_Vol5

Volume Serial Number is 0CB3-C881

 Directory of D:\_KAZE_new-stuff\VivaNicolaTesla

03/16/2011  07:54 AM    <DIR>          ..
03/16/2011  07:54 AM    <DIR>          .
03/16/2011  07:39 AM           218,698 hash.cod
03/16/2011  07:39 AM            65,440 hash.cpp
03/16/2011  07:39 AM            87,552 hash.exe
03/16/2011  07:39 AM             8,390 BuildLog.htm
11/14/2010  02:39 PM         7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv
12/03/2010  07:30 AM        42,892,307 IPS.TXT
11/14/2010  02:39 PM         4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt
03/15/2011  12:10 PM       104,857,601 100MB_as_one_line.TXT
03/16/2011  07:54 AM       409,829,386 googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt
11/14/2010  02:39 PM         4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd
11/14/2010  02:39 PM           388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd
11/14/2010  02:39 PM       146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd
11/14/2010  02:39 PM       278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd
              13 File(s)    998,706,809 bytes
               2 Dir(s)   2,947,858,432 bytes free

D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd"

22202980 lines read

67108864 elements in the table (26 bits)

    FNV1A_Hash_Tesla:   23890797  23614936  23680579  23684698  23606344|  23606344 [ 3457538]
        FNV1A_Mantis:   24848068  24866965  24863372  25003541  24859437|  24848068 [ 3298270]
        FNV1A_Meiyan:   23836095  23832986  23858019  23818340  23992756|  23818340 [ 3345260]
     FNV1A_Jesteress:   23737579  23756975  23731544  23743013  23743112|  23731544 [ 3355676]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd"

12561874 lines read

33554432 elements in the table (25 bits)

    FNV1A_Hash_Tesla:   12464491  12317094  12331733  12323774  12346488|  12317094 [ 2141464]
        FNV1A_Mantis:   12946943  12933482  12932442  12942204  13009030|  12932442 [ 2082213]
        FNV1A_Meiyan:   12583824  12417170  12440549  12465760  12441958|  12417170 [ 2111271]
     FNV1A_Jesteress:   12388725  12369952  12378952  12377230  12380569|  12369952 [ 2121868]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd"

351114 lines read

1048576 elements in the table (20 bits)

    FNV1A_Hash_Tesla:     252801    238573    234905    233660    237413|    233660 [   53107]
        FNV1A_Mantis:     253479    251841    249576    250135    254282|    249576 [   52712]
        FNV1A_Meiyan:     234582    238797    235336    239111    238004|    234582 [   52910]
     FNV1A_Jesteress:     235985    236268    234515    236577    234398|    234398 [   52684]
        FNV1A_Jester:     236458    237823^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd"

38936 lines read

131072 elements in the table (17 bits)

    FNV1A_Hash_Tesla:       9787      9429      9419      9397      9567|      9397 [    5176]
        FNV1A_Mantis:      10181     10205     10302     12326     11283|     10181 [    5185]
        FNV1A_Meiyan:       9591      9563      9530      9493      9603|      9493 [    5224]
     FNV1A_Jesteress:      15021     10163      9524      9533      9476|      9476 [    5182]
        FNV1A_Jester:       9637      9482      9586      9580      9588|      9482 [    5200]
       FNV1A_Smaragd:      11148     10041     10030     10048     10077|     10030 [    5194]
     FNV1A_Peregrine:      10117     10179      9974      9968     10116|      9968 [    5277]
          FNV1A_Whiz:       9616      9877      9679      9614     10118|      9614 [    5200]
     FNV1A_Nefertiti:      10428      9939      9870      9925     10214|      9870 [    5381]
              FNV-1a:      11328     11370     11388     11335     11216|     11216 [    5321]
    Sixtinsensitive+:      10182     10274      9987     10234     10065|      9987 [    5209]
     Sixtinsensitive:      10847     10800     10723     12666     10626|     10626 [    5347]
     Alfalfa_Rollick:      10855     10116     10006     10077     10019|     10006 [    5242]
             Alfalfa:      10422     10409     10427     10586     10946|     10409 [    5252]
        Alfalfa_HALF:      10676     10529     10602     10584     10553|     10529 [    5231]
       Alfalfa_DWORD:      10679     10883     11059     11093     11386|     10679 [    5252]
       Alfalfa_QWORD:      10759     10692     10777     10792     10745|     10692 [    5252]
           Bernstein:      11311^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv"

584879 lines read

2097152 elements in the table (21 bits)

    FNV1A_Hash_Tesla:     393365    373464    374534    375388    373557|    373464 [   81232]
        FNV1A_Mantis:     412156    414563    411944    414678    415841|    411944 [   74643]
        FNV1A_Meiyan:     383821    387854    387760    390291    387271|    383821 [   75377]
     FNV1A_Jesteress:     382663    383224    381564    380962    382974|    380962 [   75404]
        FNV1A_Jester:     384581^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Sentence-list_00,032,359_English_The_Holy_Bible.txt"

32359 lines read

65536 elements in the table (16 bits)

    FNV1A_Hash_Tesla:      27520     26722     27432     28257     26136|     26136 [    6937]
        FNV1A_Mantis:      28386     28415     28554     28248     27951|     27951 [    6925]
        FNV1A_Meiyan:      27598     27359     27334     27319     27338|     27319 [    6897]
     FNV1A_Jesteress:      27888     27308     27483     27310     27274|     27274 [    6883]
        FNV1A_Jester:      31040     31119     31379     30732     30596|     30596 [    6874]
       FNV1A_Smaragd:      40554     40719     42560     40805     40459|     40459 [    6849]
     FNV1A_Peregrine:      30687     30474     30474     31499     32991|     30474 [    6838]
          FNV1A_Whiz:      32561     31988     31396     31424     31415|     31396 [    6874]
     FNV1A_Nefertiti:      30534     30860     30331     30011     30143|     30011 [    6878]
              FNV-1a:      60627     60952     61696     61693     61678|     60627 [    6840]
    Sixtinsensitive+:      35231     35090     35459     35186     37219|     35090 [    6839]
     Sixtinsensitive:      38508^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash 100MB_as_one_line.TXT

1 lines read

4 elements in the table (2 bits)

    FNV1A_Hash_Tesla:     194019    197848    195058    193316    193382|    193316 [       0]
        FNV1A_Mantis:     234425    236411    236747    234573    234684|    234425 [       0]
        FNV1A_Meiyan:     240160    240099    242438    240194    242915|    240099 [       0]
     FNV1A_Jesteress:     243338    241873    239935    239735    239124|    239124 [       0]
        FNV1A_Jester:     331840^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash IPS.TXT

2995394 lines read

8388608 elements in the table (23 bits)

    FNV1A_Hash_Tesla:    2289107   2226568   2226390   2234596   2222701|   2222701 [  691369]
        FNV1A_Mantis:    2469897   2466911   2466878   2471249   2465728|   2465728 [  481137]
        FNV1A_Meiyan:    2290118   2285787   2284061   2291056   2286210|   2284061 [  593723]
     FNV1A_Jesteress:    2331767   2324661   2327224   2326173   2325028|   2324661 [  691369]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>hash googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt

17981107 lines read

67108864 elements in the table (26 bits)

    FNV1A_Hash_Tesla:   19108584  18921663  19047468  18902535  18913730|  18902535 [ 4218589]
        FNV1A_Mantis:   20408048  20413041  20428172  20421550  20570044|  20408048 [ 2208686]
        FNV1A_Meiyan:   19589642  19586573  19584081  19590689  19588464|  19584081 [ 2209364]
     FNV1A_Jesteress:   19482570  19662021  19476235  19490673  19499351|  19476235 [ 2208081]
        FNV1A_Jester: ^C
D:\_KAZE_new-stuff\VivaNicolaTesla>type googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt
...
a_bacillus_and_a
a_bacillus_belonging_to
a_bacillus_closely_related
a_bacillus_closely_resembling
a_bacillus_described_by
a_bacillus_discovered_by
a_bacillus_found_in
a_bacillus_from_the
a_bacillus_has_been
a_bacillus_identical_with
a_bacillus_in_the
a_bacillus_isolated_from
a_bacillus_known_as
a_bacillus_obtained_from
a_bacillus_of_the
a_bacillus_or_a
a_bacillus_resembling_that
a_bacillus_resembling_the
a_bacillus_similar_to
a_bacillus_that_is
a_bacillus_to_which
a_bacillus_which_has
a_bacillus_which_he
a_bacillus_which_is
a_bacillus_which_may
a_bacillus_which_they
a_bacillus_which_was
a_bacillus_whose_growth
a_bacillus_with_rounded
a_back_alley_and
a_back_alley_behind
a_back_alley_in
a_back_alley_of
a_back_alley_off
a_back_alley_or
a_back_alley_somewhere
a_back_alley_that
a_back_alley_to
a_back_alley_where
a_back_alley_with
...
D:\_KAZE_new-stuff\VivaNicolaTesla>

Regards

by the will of the hash(ing) gods...,

I am hunting for an extremely fast integer->integer hashing method for working with a large array of hash tables, in particular where there is a high number of key (re)inserts, key deletions, and value updates within each hashtable as large volumes of data are processed. Currently writing in C, but open to inlining assembly if it offers nice gains.

By the will of the hash(ing) gods ... show me the way!

Peter Kankowski,
If you have a high number of delete operations, a balanced tree is a better choice than a hash table.
ace,

Testing avalanche on integer hash functions

http://baagoe.org/en/wiki/Avalanche_on_integer_hash_functions

Quinn Norton,

I'd love to use some hashes in PHP without having to enable or install an extension. Naturally, speed and efficiency would suffer, but the "portability" of the code makes the trade-off worth it for my needs. I'd love to have lookup3/SuperFastHash ported to a PHP function; even One-At-A-Time would be great!

http://www.burtleburtle.net/bob/hash/doobs.html http://www.azillionmonkeys.com/qed/hash.html
quinn.norton (shift+2) uknowwhat.org
Witek,

How about adding some CityHash hashes to the comparison?

Peter Kankowski,
Thanks, I will add them in the future.
ms440,

Thanks Peter! I've found your blog extremely useful. Your discussion with @Sanmayce led me to your blog.

Keep up the good work!

Ace,

Cuckoo hashing: neither linear probing nor chaining

http://en.wikipedia.org/wiki/Cuckoo_hashing

DomDead,

I found the following to be an interesting hash function

http://code.google.com/p/xxhash/

Additional info can be found here:

http://fastcompression.blogspot.ca/2012/04/selecting-checksum-algorithm.html

Peter Kankowski,
Thanks, I will test it and publish the results here.
Mark Adler,

I concur with the general conclusion that Adler-32 should not be used as a hash function. It "fills up" with information from the input very slowly, which is definitely not what you want in a hash function, especially when used on short strings.

However, the observation above: "The second problem is that the characters are not "weighted" (multiplied by different numbers), so that Adler-32("01") = Adler-32("10"), that's why it fails the Numbers test. Ditto for anagrams in Shakespeare's sonnets: Adler-32("heart") = Adler-32("earth")." is not correct. The Adler-32 of "01" is 0x00930062, whereas the Adler-32 of "10" is 0x00940062. Similarly, the Adler-32 of "heart" is 0x061c0215, and the Adler-32 of "earth" is 0x06280215.

The leading zeros in each half-word of those last two are indicative of the slow filling that I referred to.

Peter Kankowski,

I'm sorry, I've corrected the mistake in the article and added a link to your comment. The checksum actually has no collisions for the numbers, but when the values are reduced modulo the hash table size, they collide. For example, with a hash table size of 2^16:

Adler-32("01") = 0x00930062
Adler-32("10") = 0x00940062

Adler-32("01") mod 2^16 = 0x0062
Adler-32("10") mod 2^16 = 0x0062

Adler-32("heart") = 0x061c0215
Adler-32("earth") = 0x06280215

Adler-32("heart") mod 2^16 = 0x0215
Adler-32("earth") mod 2^16 = 0x0215

It helps if you XOR the lower and the higher parts, but there are still a lot of collisions:

                Words    Win32     Numbers   Prefix    Postfix   Variables  Sonnets    UTF-8        IPv4       Avg
iSCSI CRC       65[105]  330[415]  36[112]   85[106]   83[92]    280[368]   408[584]   1968[2388]   323[838]   1.01[1.78]
Meiyan          65[102]  329[409]  45[125]   87[106]   85[112]   274[350]   412[588]   1976[2377]   353[768]   1.05[1.87]
Murmur2         72[103]  378[415]  49[104]   109[106]  106[111]  314[383]   453[566]   2187[2399]   399[834]   1.21[1.74]
...
Fletcher        71[131]  353[406]  80[460]   103[127]  100[108]  312[507]   481[1052]  2479[4893]   388[1359]  1.30[4.62]
...
Adler with XOR  84[124]  434[468]  106[449]  135[115]  131[108]  364[514]   636[1639]  3453[8100]   706[3005]  1.75[5.09]
Adler           84[136]  457[746]  136[477]  137[136]  135[136]  394[872]   773[2495]  5359[11945]  833[3518]  2.06[6.02]
Hanson          73[118]  418[649]  45[112]   123[118]  1207[499] 321[435]   450[592]   2329[2890]   371[833]   2.70[2.46]

 

                Wikipedia            Avg
iSCSI CRC       5793852[2077725]     1.00[1.00]
Meiyan          5899524[2111271]     1.02[1.02]
Larson          6334029[2080111]     1.09[1.00]
Murmur2         6357397[2081476]     1.10[1.00]
...
Fletcher        22159903[9063797]    3.82[4.37]
Adler with XOR  274019692[11669974]  47.29[5.63]

 

As you said, the function should be used as a checksum, not as a hash function.

Georgi 'Sanmayce',

Hi Peter,

just wanted to see how the fastest (regarding linear speed) hasher FNV1A_Tesla (64bit), rewritten down to 32bit, would behave; so here comes my new favorite FNV1A_Yorikke, the fastest 32bit hasheress.

She outspeeds both FNV1A_Jesteress and FNV1A_Meiyan while featuring collisions comparable to CRC32's.

I wonder how the new approach (hashing two lines) behaves on i5/i7; my expectation is that FNV1A_Yorikke is gonna scream.

I hate the fact that I still cannot play with an i7 machine, so the following results are obtained on my laptop T7500 2200MHz:

Words 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:         76 [  110]
              Meiyan:         76 [  102]
             Yorikke:         76 [  108]
        x17 unrolled:         88 [  109]
              FNV-1a:         92 [  124]
              Larson:         93 [   99]
              CRC-32:         85 [  101]
             Murmur2:         86 [  103]
                SBox:         81 [   91]
            Murmur2A:         95 [  114]
             Murmur3:        100 [  101]
           XXHfast32:         97 [  110]
         XXHstrong32:         96 [  109]
Win32 

1992 lines read

4096 elements in the table (12 bits)

           Jesteress:        401 [  397]
              Meiyan:        405 [  409]
             Yorikke:        403 [  431]
        x17 unrolled:        577 [  415]
              FNV-1a:        576 [  428]
              Larson:        565 [  416]
              CRC-32:        527 [  426]
             Murmur2:        467 [  415]
                SBox:        482 [  431]
            Murmur2A:        515 [  433]
             Murmur3:        517 [  380]
           XXHfast32:        473 [  420]
         XXHstrong32:        488 [  429]
Numbers 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:         51 [  300]
              Meiyan:         47 [  125]
             Yorikke:         46 [   86]
        x17 unrolled:         45 [   24]
              FNV-1a:         53 [  108]
              Larson:         41 [   16]
              CRC-32:         45 [   64]
             Murmur2:         53 [  104]
                SBox:         50 [  116]
            Murmur2A:         62 [  102]
             Murmur3:         67 [  104]
           XXHfast32:         67 [  102]
         XXHstrong32:         70 [  102]
Prefix 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:        110 [  102]
              Meiyan:        111 [  106]
             Yorikke:        107 [   94]
        x17 unrolled:        206 [  113]
              FNV-1a:        195 [   94]
              Larson:        195 [   99]
              CRC-32:        177 [  107]
             Murmur2:        138 [  106]
                SBox:        149 [  108]
            Murmur2A:        149 [  112]
             Murmur3:        148 [  103]
           XXHfast32:        117 [  103]
         XXHstrong32:        123 [  102]
Postfix 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:        108 [  106]
              Meiyan:        108 [  112]
             Yorikke:        106 [  111]
        x17 unrolled:        201 [  102]
              FNV-1a:        195 [  105]
              Larson:        195 [  105]
              CRC-32:        174 [   94]
             Murmur2:        136 [  111]
                SBox:        148 [   91]
            Murmur2A:        146 [  109]
             Murmur3:        148 [  105]
           XXHfast32:        115 [  106]
         XXHstrong32:        123 [  112]
Variables 

1842 lines read

4096 elements in the table (12 bits)

           Jesteress:        337 [  366]
              Meiyan:        341 [  350]
             Yorikke:        338 [  359]
        x17 unrolled:        418 [  368]
              FNV-1a:        418 [  374]
              Larson:        429 [  366]
              CRC-32:        400 [  338]
             Murmur2:        384 [  383]
                SBox:        371 [  347]
            Murmur2A:        424 [  365]
             Murmur3:        433 [  334]
           XXHfast32:        406 [  347]
         XXHstrong32:        405 [  355]
Sonnets 

3228 lines read

8192 elements in the table (13 bits)

           Jesteress:        494 [  585]
              Meiyan:        501 [  588]
             Yorikke:        496 [  552]
        x17 unrolled:        565 [  589]
              FNV-1a:        558 [  555]
              Larson:        593 [  583]
              CRC-32:        560 [  563]
             Murmur2:        562 [  566]
                SBox:        516 [  526]
            Murmur2A:        627 [  544]
             Murmur3:        651 [  555]
           XXHfast32:        602 [  491]
         XXHstrong32:        598 [  491]
UTF-8 

13408 lines read

32768 elements in the table (15 bits)

           Jesteress:       2391 [ 2427]
              Meiyan:       2445 [ 2377]
             Yorikke:       2427 [ 2392]
        x17 unrolled:       2786 [ 2392]
              FNV-1a:       2860 [ 2446]
              Larson:       2979 [ 2447]
              CRC-32:       2770 [ 2400]
             Murmur2:       2724 [ 2399]
                SBox:       2640 [ 2442]
            Murmur2A:       3037 [ 2369]
             Murmur3:       3100 [ 2376]
           XXHfast32:       2946 [ 2494]
         XXHstrong32:       2936 [ 2496]
IPv4 

3925 lines read

8192 elements in the table (13 bits)

           Jesteress:        576 [  819]
              Meiyan:        590 [  807]
             Yorikke:        588 [  821]
        x17 unrolled:        796 [  804]
              FNV-1a:        855 [  796]
              Larson:        817 [  789]
              CRC-32:        787 [  802]
             Murmur2:        698 [  825]
                SBox:        722 [  804]
            Murmur2A:        762 [  804]
             Murmur3:        776 [  818]
           XXHfast32:        789 [  829]
         XXHstrong32:        809 [  829]
3333 Latin Powers 

3333 lines read

8192 elements in the table (13 bits)

           Jesteress:        763 [  576]
              Meiyan:        770 [  583]
             Yorikke:        779 [  579]
        x17 unrolled:       1345 [  564]
              FNV-1a:       1299 [  604]
              Larson:       1301 [  581]
              CRC-32:       1192 [  613]
             Murmur2:        956 [  600]
                SBox:        996 [  576]
            Murmur2A:       1023 [  576]
             Murmur3:       1032 [  583]
           XXHfast32:        843 [  596]
         XXHstrong32:        882 [  571]
~3 million IPs (dot format) 

2995394 lines read

8388608 elements in the table (23 bits)

           Jesteress:    2027663 [691369]
              Meiyan:    2033983 [593723]
             Yorikke:    1952199 [476699]
        x17 unrolled:    2357193 [475528]
              FNV-1a:    2410596 [477067]
              Larson:    2369252 [475575]
              CRC-32:    2298651 [472854]
             Murmur2:    2298675 [476330]
                SBox:    2412474 [476681]
            Murmur2A:    2376168 [475493]
             Murmur3:    2346091 [476845]
           XXHfast32:    2365397 [476358]
         XXHstrong32:    2372267 [476358]
Russian ASCII 

584879 lines read

2097152 elements in the table (21 bits)

           Jesteress:     322585 [75404]
              Meiyan:     325962 [75377]
             Yorikke:     324935 [74661]
        x17 unrolled:     311773 [75124]
              FNV-1a:     370532 [74184]
              Larson:     327605 [74389]
              CRC-32:     360723 [74307]
             Murmur2:     359362 [74234]
                SBox:     368927 [74645]
            Murmur2A:     375582 [74456]
             Murmur3:     371407 [74612]
           XXHfast32:     370156 [74572]
         XXHstrong32:     371014 [74603]

Wikipedia en

12561874 lines read

33554432 elements in the table (25 bits)

           Jesteress:   10606801 [2121868]
              Meiyan:   10691456 [2111271]
             Yorikke:   10710077 [2084954]
        x17 unrolled:   10336797 [2410605]
              FNV-1a:   11551149 [2081195]
              Larson:   10837339 [2080111]
              CRC-32:   11464031 [2075088]
             Murmur2:   11379472 [2081476]
                SBox:   11530201 [2084018]
            Murmur2A:   11762919 [2081370]
             Murmur3:   11708730 [2082084]
           XXHfast32:   11576405 [2084164]
         XXHstrong32:   11570909 [2084514]
Wikipedia de-en-es-fr-it-nl-pt-ro 

22202980 lines read

67108864 elements in the table (26 bits)

           Jesteress:   19823910 [3355676]
              Meiyan:   19998537 [3345260]
             Yorikke:   20036224 [3300245]
        x17 unrolled:   19252660 [3830652]
              FNV-1a:   21626497 [3297552]
              Larson:   20162419 [3296692]
              CRC-32:   21396006 [3298998]
             Murmur2:   21249648 [3297709]
                SBox:   21487107 [3298021]
            Murmur2A:   21675364 [3300445]
             Murmur3:   21264598 [3299700]
           XXHfast32:   21028361 [3301160]
         XXHstrong32:   21033394 [3302256]
100MB as one line 

1 lines read

4 elements in the table (2 bits)

           Jesteress:     198199 [    0]
              Meiyan:     198333 [    0]
             Yorikke:     176506 [    0]
        x17 unrolled:     953166 [    0]
              FNV-1a:     924509 [    0]
              Larson:     950111 [    0]
              CRC-32:     764957 [    0]
             Murmur2:     339978 [    0]
                SBox:     512374 [    0]
            Murmur2A:     339648 [    0]
             Murmur3:     303091 [    0]
           XXHfast32:     168528 [    0]
         XXHstrong32:     217354 [    0]
5,000,000 Knight Tours 

5000000 lines read

16777216 elements in the table (24 bits)

           Jesteress:     5912178 [676877]
              Meiyan:     5917649 [676877]
             Yorikke:     5762697 [677478]
        x17 unrolled:   136105460 [4868928]
              FNV-1a:    13574050 [2080003]
              Larson:    42509864 [4475748]
              CRC-32:     9373072 [676997]
             Murmur2:     6755138 [675965]
                SBox:    11414715 [2079523]
            Murmur2A:     6882708 [676417]
             Murmur3:     6673330 [676857]
           XXHfast32:     5864132 [675637]
         XXHstrong32:     6154955 [675834]

In the last test we hash 5 million 128-byte-long lines (a kind of super-heavy-prefix test).

It resembles having 5 million very similar 128-character tweets that share one long prefix.
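Such a shared-prefix key set is easy to regenerate for one's own testing; a minimal sketch in C (the make_prefix_keys helper and the 8-digit tail are my own illustration, not part of the benchmark):

```c
#include <stdio.h>
#include <string.h>

/* Build n keys of 128 chars each that share a 120-byte common prefix
   and differ only in the last 8 digits - the worst case for hashers
   that give little weight to the tail of the key. */
static void make_prefix_keys(char out[][129], int n)
{
    for (int i = 0; i < n; ++i) {
        memset(out[i], 'A', 128);               /* 128-byte line */
        out[i][128] = '\0';
        snprintf(out[i] + 120, 9, "%08d", i);   /* only the tail differs */
    }
}
```

Feeding such keys to each hasher and counting occupied slots quickly exposes tail-insensitive functions.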

Obviously we need more versatile tests (like my 'Knight Tours') in order to find and fix possible weak points.

Here x17 unrolled, FNV-1a, Larson, and SBox fail to keep up with the others.

In this à la tweet test FNV1A_Yorikke outspeeds XXHfast32 by (5864132-5762697)/5762697*100% = 1.8%; how about on i5/i7 machines!?

In the pre-last test (linear hash speed, 100MB as one line) XXHfast32 outspeeds FNV1A_Yorikke by (176506-168528)/168528*100% = 4.7%; is this the case on i5/i7 machines!?

In my view FNV1A_Yorikke has a heart full of soul, or is "fine and clean like polished gold" (just as B. Traven describes her); yet more torture is needed.

You are all welcome to download my latest hash package (all the above tests are included; your benchmarker is used):

http://www.sanmayce.com/Fastest_Hash/_KAZE_hash_Yorikke.7z (99,482,860 bytes)

If we had to hash some 5,000,000,000+ tweets, things would rapidly get ugly.

It would be useful to add more fault-finding tests, so that we can rely on some hasher for "real world" (i.e. heavy) loads.

Regards

Lefteris,

Very informative analysis. Sincerely, thank you, Peter. You made my job of finding a good hash function so much easier with the research you have conducted on this topic.

David Norton,

Hi Peter,

Thanks for the article. It would be great if you made your test for Linux too. You would be surprised with the results.

For me the best hash function here is “Yorikke”.

Thank You Sanmayce!

Georgi 'Sanmayce',

Thank you Mr. Norton,

I wrote Knight-tour_r8dump_Yorikke.c (downloadable at link below) which hashes Knight-Tours on the fly in order to monitor the fattest slot(s).

This torture test uses a 27-bit hash table, i.e. 134,217,728 slots.

http://www.sanmayce.com/Fastest_Hash/index.html#KT_torture

Up to last night the test had reached 555,000,000 Knight Tours:

FNV1A_Yorikke: KT_DumpCounter = 0,555,000,000; 000,000,005 x MAXcollisionsAtSomeSlots = 000,019; HASHfreeSLOTS = 0,002,145,975
CRC32        : KT_DumpCounter = 0,555,000,000; 000,000,001 x MAXcollisionsAtSomeSlots = 000,021; HASHfreeSLOTS = 0,002,148,539

0555 million KT | FNV1A_Yorikke | 0,005 x | 0,019 |x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|
0555 million KT | CRC32         | 0,001 x | 0,021 |x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|

FNV1A_Yorikke has 005 slots with MAX_depthness 019 (number of x's).

CRC32 has 001 slot with MAX_depthness 021 (number of x's).

In other words, MAX_depthness 019 means a maximum of 019 layers, i.e. at most 019 keys sharing one slot.
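For readers who want to reproduce this MAX_depthness metric, it is just the deepest slot of the table; a minimal sketch assuming the 32-bit hashes are already computed (max_depth is my name, not from the benchmark):

```c
#include <stdint.h>
#include <stdlib.h>

/* Return the maximum number of keys sharing one slot in a table of
   (1 << bits) slots, given the precomputed hash of every key. */
static uint32_t max_depth(const uint32_t *hashes, size_t n, unsigned bits)
{
    size_t slots = (size_t)1 << bits;
    uint32_t *depth = calloc(slots, sizeof *depth);
    uint32_t deepest = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t d = ++depth[hashes[i] & (slots - 1)];  /* mask to table size */
        if (d > deepest)
            deepest = d;
    }
    free(depth);
    return deepest;
}
```

With bits = 27 this corresponds to the 134,217,728-slot setup above.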

Notice also that FNV1A_Yorikke has hash utilization of (134,217,728-2,145,975)/134,217,728*100% = 98.401%,
a superb dispersion, slightly better than CRC32's (134,217,728-2,148,539)/134,217,728*100% = 98.399%.

I will wait until all 5,000,000,000 Knight Tours are hashed, maybe a week more.

This is the heaviest hash-torture test, no?

Also I am very glad about Mr. Noll's retweet:

RT “@Sanmayce: @landonnolL .. fastest 32bit hasheress released: http://www.sanmayce.com/Fastest_Hash/index.html#FNV1A_Yorikke … ..i” < Thanx 
David Norton,
@Sanmayce

Dear Sanmayce,

I would like to thank you for your intention to share your amazing, real-McCoy hash functions!

When you finish the test, I hope people will see the beauty of “Yorikke” as I do.

In our “hostile” 24/7 *nix environment the speed counts and “Yorikke” rocks!

It’s a treasure and a state-of-the-art work.

Thanks again and all the best!

DN

Peter Kankowski,
A benchmark of different string hash table implementations (GCC, Google, HP, Dinkumware, and Python): http://preshing.com/20110603/hash-table-performance-tests
Alexander,

Some idea...

unsigned long maFastPrime1Hash(char *str, unsigned int len) {
    unsigned int hash = len, i = 0, k;
    long rem = len;
    unsigned char trail;

    const unsigned char * data = (const unsigned char *)str;

    while (rem >= 4) {
        k = *(unsigned int*)data;
        k += i++;
        hash ^= k;
        hash *= 171717;
        data += 4;
        rem -= 4;
    }

    while (rem > 0) {
        trail = *(unsigned char*)data;
        trail += i++;
        hash ^= trail;
        hash *= 171717;
        data++;
        rem--;
    }

    return hash;
}

by Alexander Myasnikov

amsoftware.narod.ru (amsoftware@ya.ru)
Alexander,

Some idea to speed up (but little more collisions)

unsigned long maRushPrime1Hash(char *str, unsigned int len) {
	unsigned int hash = len, i = 0, k;
	long rem = len;

	const unsigned char * data = (const unsigned char *)str;

	while (rem >= 4) {
		k = *(unsigned int*)data;
		k += i++;
		hash ^= k;
		hash *= 171717;
		data += 4;
		rem -= 4;
	}

	switch (rem) {
	case 3:
		k = (unsigned long)(data[0]) | (unsigned long)(data[1] << 8) |
			(unsigned long)(data[2] << 16);
		k += i++;
		hash ^= k;
		hash *= 171717;
		break;

	case 2:
		k = (unsigned long)(data[0]) | (unsigned long)(data[1] << 8);
		k += i++;
		hash ^= k;
		hash *= 171717;

		break;

	case 1:
		k = (unsigned long)(data[0]);
		k += i++;
		hash ^= k;
		hash *= 171717;
		break;

	}

	return hash;
}

by Alexander Myasnikov

amsoftware.narod.ru (amsoftware at ya.ru)

Peter Kankowski,

Unfortunately, these hash functions don't perform well in my benchmark. The results of the "IPv4" and "Numbers" tests are especially bad:

              Words     Win32     Numbers   Prefix    Postfix   Variables Sonnets   UTF-8       IPv4        Avg
iSCSI CRC     67[105]   340[415]  36[112]   88[106]   86[92]    285[368]  410[584]  1999[2388]  323[838]    1.03[1.78]
Meiyan        64[102]   328[409]  45[125]   87[106]   85[112]   272[350]  411[588]  1971[2377]  353[768]    1.05[1.87]
Murmur2       72[103]   378[415]  48[104]   109[106]  106[111]  313[383]  452[566]  2186[2399]  399[834]    1.20[1.74]
Larson        70[99]    398[416]  34[16]    143[99]   141[105]  305[366]  437[583]  2164[2447]  349[755]    1.22[1.10]
SBox          70[91]    391[431]  46[116]   125[108]  123[91]   305[347]  433[526]  2191[2442]  378[836]    1.22[1.78]
XXHfast32     78[110]   373[420]  57[102]   89[103]   88[106]   316[347]  472[491]  2336[2494]  464[838]    1.23[1.71]
...
MaPrime2c     82[103]   463[426]  50[106]   155[91]   156[106]  357[349]  508[550]  2566[2406]  408[865]    1.43[1.73]
...
Weinberger    84[104]   594[422]  37[100]   249[111]  271[117]  400[364]  526[712]  2772[2547]  421[744]    1.75[1.75]
Hanson        71[118]   397[649]  45[112]   116[118]  1200[499] 302[435]  432[592]  2209[2890]  371[833]    2.61[2.46]
maRush        76[146]   354[427]  354[496]  112[207]  94[127]   325[522]  538[1179] 2656[4444]  1557[3371]  2.64[5.35]
maFast        74[125]   372[447]  357[496]  108[154]  100[126]  324[457]  516[1006] 2505[3923]  1582[3027]  2.65[5.12]
Alexander,

Some collision and speed tests on different data sets (Russian)

http://amsoftware.narod.ru/algo2.html

Cyan,

xxHash has been updated recently:

http://code.google.com/p/xxhash/

slashmais,

Hi, I've created the following hash function for my own use and found that, using randomly generated strings, I repeatedly get 0 collisions. I repeatedly tested with 10 million such random strings with the same 0 collisions, and also compared the same sets of strings with the djb function: the djb function averaged more than 200,000 collisions.

I lack testing sources/techniques and would be very interested in the results if you would be so kind as to test it.

Here's the function:

unsigned long slash_hash(const std::string s)
{
	union
	{
		unsigned long t;
		unsigned char b[sizeof(long)];
	};
	unsigned long i=0, n=s.length(), p=0, d=sizeof(long);
	t=0L;
	while (i<n)
	{
		b[p++] += s[i] << (i/d);
		if (p>=d) p=0;
		i++;
	}
	return t;
}

slashmais,

PS to my previous post:

I'm using a 64-bit machine where sizeof(long) == 8 bytes, and I think the collisions will start in earnest with strings that exceed 63 bytes in length (then (i/d) >= 8) and the shift of s[i] may result in 0 - just a guess.
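The guess is right: the accumulator b[] holds unsigned chars, so once the shift reaches 8 the added value's low byte is zero and the character contributes nothing. A tiny check (the helper name is mine):

```c
#include <stdint.h>

/* What one step of "b[p] += s[i] << (i/d)" actually adds to the
   8-bit accumulator, for a given character and shift amount. */
static uint8_t tail_contribution(unsigned char c, unsigned shift)
{
    return (uint8_t)(c << shift);   /* truncated to the low 8 bits */
}
```

So every character from index 64 on (shift >= 8) is silently ignored, which is why collisions would start in earnest there.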

slashmais,

(sorry - I feel like a spammer;)

Here is a revised/improved version of the function:

uint64_t slash_hash(const char *s)
//uint32_t slash_hash(const char *s)
{
	union { uint64_t h; uint8_t u[8]; };
	int i=0, p=0; h=0;
	while (*s) { u[p++%8] += i + (*s++ << (i/8) % 8); i++; }
	return h;
	//return (h+(h>>32));
}

I tested it against MurmurHash2 with the same (my) data sets and the results were about the same. I would really appreciate a comparative test: the function, for its simplicity, seems to work remarkably well :)

A. Non,

It looks like SipHash is becoming the new standard for hashing short messages. I'm curious to see how it performs on your benchmarks.

https://131002.net/siphash/

tatumizer,

A typical hash map in web apps is a JSON object. It has short attribute names and not a lot of entries.

Let's assume the average size of an identifier is 12 chars, and the number of entries is, say, in the range 16-32 (certainly less than 256).

Clearly, we need a hashing function optimized for this case.

In this regard, it would be interesting to see statistics for Pearson algo in your research.

NOTE:

For short identifiers, the amortized cost of the last iteration's branch misprediction is very high, approx. 1 cycle/byte, so it would probably make sense to have N specialized versions (for string lengths 1, 2, ..., say, 24) and use an indirect jump (which costs 5 cycles); this translates into 1 cycle per char in savings, comparable with the cost of the computation per char itself.

What do you think?
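The idea can be sketched as a table of function pointers indexed by key length, so the compiler can fully unroll each specialization; everything below (hash_len4, hash_dispatch, the FNV-1a core) is my illustration of the idea, not measured code:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn)(const char *key);

/* Fixed-length FNV-1a specializations: the trip count is a compile-time
   constant, so there is no data-dependent loop-exit branch to mispredict. */
static uint32_t hash_len4(const char *key)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; ++i)
        h = (h ^ (uint8_t)key[i]) * 16777619u;
    return h;
}

static uint32_t hash_len8(const char *key)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 8; ++i)
        h = (h ^ (uint8_t)key[i]) * 16777619u;
    return h;
}

/* One indirect jump replaces the per-key length loop; a full version
   would fill all the entries and fall back for longer keys. */
static uint32_t hash_dispatch(const char *key, size_t len)
{
    static const hash_fn table[9] = {
        NULL, NULL, NULL, NULL, hash_len4,
        NULL, NULL, NULL, hash_len8
    };
    return table[len](key);   /* assumes len is 4 or 8 in this sketch */
}
```

Whether the ~5-cycle indirect call beats the saved misprediction would need measuring on the target CPU.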

ace,

Choosing a Good Hash Function, Part 3

http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/

Georgi 'Sanmayce',

Hi guys,

I am happy to share my latest best.

A remainderless variant for 16[+] byte keys appeared while I was playing with Yorikke, wanting to try an old idea of mine: to reduce the branching.

When the GOLDEN Yorikke was Interleaved & Interlaced, a DIAMANT appeared; it's time for a new-generation slasher: FNV1A_Yoshimura.

Up to now the TOP benchmark results I was able to gather:

On AMD (Phenom II X6 1600T, 4000MHz) FNV1A_YoshimitsuTRIAD reigns with 11.360MB per clock or 11360/1024 = 11.093GB/s.

On Intel (i7-3930K, 4500MHz) FNV1A_YoshimitsuTRIAD reigns with 14.928MB per clock or 14928/1024 = 14.578GB/s.

It is worth exploring the Jesteress-Yorikke 8-16 GAP (i.e. 1 hash line of 4+4 bytes vs 2 hash lines of 4+4) in order to lessen their collisions further.

Simply put, 3 hash lines 4 bytes each, 12 bytes per loop.

The 'non power of 2' workaround I see as one MONOLITH function with no remainder mixing at all.

Idea #1 is to exterminate all the nasty IFs outside the main loop; I believe such a branchless etude will outperform Jesteress.

Idea #2 is to STRESS the memory by fetching not-so-adjacent areas.

For example:

Key: hash_with_overlapping_aye_aye

Key left-to-right quadruplets and remainder: 'hash', '_wit', 'h_ov', 'erla', 'ppin', 'g_ay', 'e_ay', 'e'
Key right-to-left quadruplets and remainder: 'h', 'ash_', 'with', '_ove', 'rlap', 'ping', '_aye', '_aye'

Key_Length: 29

Loop_Counter: 3 //if ( Key_Length%(3*4) ) Loop_Counter = Key_Length/(3*4)+1; else Loop_Counter = Key_Length/(3*4);

Loop #1 of 3:

Hash line 1: hash

Hash line 2: h_ov

Hash line 3: ping

Loop #2 of 3:

Hash line 1: _wit

Hash line 2: erla

Hash line 3: _aye

Loop #3 of 3:

Hash line 1: h_ov

Hash line 2: ppin

Hash line 3: _aye

I don't know the internals; whether cache lines are 32/64/128 bytes long is a secondary concern.

Well, the key is too short; in reality the above key may span only 1|2 cache lines, but if the key is longer than 4 cache lines (assuming 32 bytes each), e.g. 128+2 bytes, then it may span 5|6 lines.

My dummy/clueless premise is that it is possible (in future systems) to access RAM effectively in such a manner.

Does someone know whether such a type of accessing has any practical value on today's CPUs?

Of course, it is worth trying to "interleave" all the short-string hashers in that way, yes?

Anyway, the 3 lines are on the stack; for the time being let's see how the 'INTERLEAVED' Yorikke, which I called FNV1A_Yoshimura, behaves.

// [North Star One-Sword School]
// - My name is Kanichiro Yoshimura.
//   I'm a new man. Just so you'll know who I am...
//   Saito-sensei.
// - What land are you from?
// - 'Land'?
// - Yes.
// - I was born in Morioka, in Nanbu, Oshu.
//   It's a beautiful place.
//   Please...
//   Away to the south is Mt Hayachine...
//   with Mt Nansho and Mt Azumane to the west.
//   In the north are Mt Iwate and Mt Himekami.
//   Out of the high mountains flows the Nakatsu River...
//   through the castle town into the Kitakami below Sakuranobaba.
//   Ah, it's pretty as a picture!
//   There's nowhere like it in all Japan!
// /Paragon Kiichi Nakai in the paragon piece-of-art 'The Wolves of Mibu' aka 'WHEN THE LAST SWORD IS DRAWN'/
// As I said on one Japanese forum, Kiichi Nakai deserves an award worth his weight in gold, nah-nah, in DIAMONDS!
uint32_t FNV1A_Hash_Yoshimura(const char *str, uint32_t wrdlen)
{
    const uint32_t PRIME = 709607;
    uint32_t hash32 = 2166136261;
    uint32_t hash32B = 2166136261;
    const char *p = str;
    uint32_t Loop_Counter;
    uint32_t Second_Line_Offset;

    if (wrdlen >= 2*2*sizeof(uint32_t)) {
        Second_Line_Offset = wrdlen - ((wrdlen>>4)+1)*(2*4); // ((wrdlen>>1)>>3)
        Loop_Counter = (wrdlen>>4);
        //if (wrdlen%16) Loop_Counter++;
        Loop_Counter++;
        for (; Loop_Counter; Loop_Counter--, p += 2*sizeof(uint32_t)) {
            // revision 1:
            //hash32 = (hash32 ^ (_rotl(*(uint32_t *)(p+0),5) ^ *(uint32_t *)(p+4))) * PRIME;
            //hash32B = (hash32B ^ (_rotl(*(uint32_t *)(p+0+Second_Line_Offset),5) ^ *(uint32_t *)(p+4+Second_Line_Offset))) * PRIME;
            // revision 2:
            hash32 = (hash32 ^ (_rotl(*(uint32_t *)(p+0),5) ^ *(uint32_t *)(p+0+Second_Line_Offset))) * PRIME;
            hash32B = (hash32B ^ (_rotl(*(uint32_t *)(p+4+Second_Line_Offset),5) ^ *(uint32_t *)(p+4))) * PRIME;
        }
    } else {
        // Cases: 0,1,2,3,4,5,6,7,...,15
        if (wrdlen & 2*sizeof(uint32_t)) {
            hash32 = (hash32 ^ *(uint32_t*)(p+0)) * PRIME;
            hash32B = (hash32B ^ *(uint32_t*)(p+4)) * PRIME;
            p += 4*sizeof(uint16_t);
        }
        // Cases: 0,1,2,3,4,5,6,7
        if (wrdlen & sizeof(uint32_t)) {
            hash32 = (hash32 ^ *(uint16_t*)(p+0)) * PRIME;
            hash32B = (hash32B ^ *(uint16_t*)(p+2)) * PRIME;
            p += 2*sizeof(uint16_t);
        }
        if (wrdlen & sizeof(uint16_t)) {
            hash32 = (hash32 ^ *(uint16_t*)p) * PRIME;
            p += sizeof(uint16_t);
        }
        if (wrdlen & 1)
            hash32 = (hash32 ^ *p) * PRIME;
    }
    hash32 = (hash32 ^ _rotl(hash32B,5) ) * PRIME;
    return hash32 ^ (hash32 >> 16);
}

To reproduce the quick-test below here comes: http://www.sanmayce.com/Fastest_Hash/DOUBLOON_hash_micro-package_r2.zip

The results on my 'Bonboniera' T7500, throwing mostly 16+ byte long keys at the "awful greedy country samurai":

E:\Night_Light_Sky_hash_package_r1+\DOUBLOON_hash_micro-package_r2>RUNME.BAT
E:\Night_Light_Sky_hash_package_r1+\DOUBLOON_hash_micro-package_r2>type Results.txt

Intel 12.1:

3333 lines read

8192 elements in the table (13 bits)

           Jesteress: 493 [  576]
              Meiyan: 515 [  583]
             Yorikke: 458 [  579]
           Yoshimura: 379 [  593] !!! SIGNIFICANTLY fastEST !!!
          Yoshimitsu: 497 [  609]
     YoshimitsuTRIAD: 489 [  615]
              FNV-1a: 969 [  604]
              Larson: 947 [  581]
              CRC-32: 894 [  613]
             Murmur2: 656 [  600]
             Murmur3: 711 [  583]
           XXHfast32: 504 [  596]
         XXHstrong32: 528 [  571]

1000 lines read

2048 elements in the table (11 bits)

           Jesteress:  268 [  205]
              Meiyan:  268 [  205]
             Yorikke:  224 [  207]
           Yoshimura:  235 [  187] ??? the slowest of all the four Yo*, something to ponder about ???
          Yoshimitsu:  225 [  225]
     YoshimitsuTRIAD:  221 [  219]
              FNV-1a: 1125 [  225]
              Larson: 1131 [  212]
              CRC-32:  919 [  230]
             Murmur2:  439 [  222]
             Murmur3:  497 [  223]
           XXHfast32:  250 [  223]
         XXHstrong32:  309 [  192]

32359 lines read

65536 elements in the table (16 bits)

           Jesteress: 12249 [ 6883]
              Meiyan: 12369 [ 6897]
             Yorikke: 11000 [ 6872]
           Yoshimura:  9876 [ 6908] !!! fastEST, yet with high collisions !!!
          Yoshimitsu: 11489 [ 6937]
     YoshimitsuTRIAD: 11094 [ 6843]
              FNV-1a: 39491 [ 6840]
              Larson: 39714 [ 6889]
              CRC-32: 34264 [ 6891]
             Murmur2: 17678 [ 6786]
             Murmur3: 19626 [ 6850]
           XXHfast32: 10383 [ 6859]
         XXHstrong32: 12708 [ 6887]

Microsoft 16:

3333 lines read

8192 elements in the table (13 bits)

           Jesteress:  756 [  576]
              Meiyan:  781 [  583]
             Yorikke:  776 [  579]
           Yoshimura:  740 [  593]
          Yoshimitsu:  781 [  609]
     YoshimitsuTRIAD:  803 [  615]
              FNV-1a: 1306 [  604]
              Larson: 1304 [  581]
              CRC-32: 1204 [  613]
             Murmur2:  983 [  600]
             Murmur3: 1031 [  583]
           XXHfast32:  859 [  596]
         XXHstrong32:  883 [  571]

1000 lines read

2048 elements in the table (11 bits)

           Jesteress:  463 [  205]
              Meiyan:  464 [  205]
             Yorikke:  422 [  207]
           Yoshimura:  442 [  187]
          Yoshimitsu:  431 [  225]
     YoshimitsuTRIAD:  423 [  219]
              FNV-1a: 1311 [  225]
              Larson: 1319 [  212]
              CRC-32: 1148 [  230]
             Murmur2:  648 [  222]
             Murmur3:  637 [  223]
           XXHfast32:  451 [  223]
         XXHstrong32:  496 [  192]

32359 lines read

65536 elements in the table (16 bits)

           Jesteress: 20162 [ 6883]
              Meiyan: 20124 [ 6897]
             Yorikke: 19101 [ 6872]
           Yoshimura: 17801 [ 6908]
          Yoshimitsu: 19616 [ 6937]
     YoshimitsuTRIAD: 19370 [ 6843]
              FNV-1a: 47142 [ 6840]
              Larson: 48009 [ 6889]
              CRC-32: 42964 [ 6891]
             Murmur2: 25741 [ 6786]
             Murmur3: 25654 [ 6850]
           XXHfast32: 18179 [ 6859]
         XXHstrong32: 20557 [ 6887]

Mr. Norton, you are most welcome. Maybe you have already spotted that I don't target 64-bit stamps at all; still, if you are interested, I can write r.3 of my 'Tesla' function using these nifty techniques (Interleaving & Interlacing). I believe it will outperform itself, being already the fastest 64-bit hash in the m^2 testbench.

Georgi 'Sanmayce',

I thank Przemyslaw Skibinski and Maciej Adamczyk (m^2) for their 64-bit testbench, which I included along with yours (Peter) in the benchmark:

http://www.sanmayce.com/Fastest_Hash/DOUBLOON_hash_micro-package_r3.zip

Results below are for the Intel 12.1 32bit executable on my laptop:

// hash_I 16KB_as_one_line.TXT:

1 lines read

4 elements in the table (2 bits)

           Jesteress:  17 [    0]
              Meiyan:  17 [    0]
             Yorikke:  10 [    0]
           Yoshimura:  10 [    0]
          Yoshimitsu:   9 [    0]
     YoshimitsuTRIAD:  10 [    0]
              FNV-1a: 131 [    0]
              Larson: 131 [    0]
              CRC-32: 102 [    0]
             Murmur2:  35 [    0]
             Murmur3:  42 [    0]
           XXHfast32:  11 [    0]
         XXHstrong32:  21 [    0]
...
// hash_I 100MB_as_one_line.TXT:

1 lines read

4 elements in the table (2 bits)

           Jesteress: 128304 [    0]
              Meiyan: 128253 [    0]
             Yorikke: 108994 [    0]
           Yoshimura:  94794 [    0]
          Yoshimitsu: 101738 [    0]
     YoshimitsuTRIAD: 104999 [    0]
              FNV-1a: 850704 [    0]
              Larson: 853118 [    0]
              CRC-32: 663362 [    0]
             Murmur2: 239559 [    0]
             Murmur3: 287138 [    0]
           XXHfast32: 104313 [    0]
         XXHstrong32: 153782 [    0]
// hash_I 200MB_as_one_line.TXT:

1 lines read

4 elements in the table (2 bits)

           Jesteress:  257028 [    0]
              Meiyan:  256955 [    0]
             Yorikke:  218383 [    0]
           Yoshimura:  190907 [    0]
          Yoshimitsu:  203801 [    0]
     YoshimitsuTRIAD:  210256 [    0]
              FNV-1a: 1702024 [    0]
              Larson: 1709548 [    0]
              CRC-32: 1327211 [    0]
             Murmur2:  479303 [    0]
             Murmur3:  574363 [    0]
           XXHfast32:  208887 [    0]
         XXHstrong32:  307942 [    0]

I wrote revision 3 of FNV1A_Tesla as the 64-bit counterpart of FNV1A_Yoshimura and included both in the 64-bit linear-speed test by Przemyslaw and Maciej; the results (I threw the 200MB file at the hashers) are:

As console screenshots:

http://www.sanmayce.com/Fastest_Hash/64bit_page1_.png

http://www.sanmayce.com/Fastest_Hash/64bit_page2_.png

As console text dumps:

E:\DOUBLOON_hash_micro-package_r3>RUNME_64bit.BAT

E:\DOUBLOON_hash_micro-package_r3>benchmark_Intel_12.1_O2.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 108 ms, 209715202 bytes = 1851 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      3333 MB/s 3333 MB/s      273e15 273e15
CityHash64                              1.0.3
  209715210 (x 1.000)      3333 MB/s 3389 MB/s      273e15 277e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      4081 MB/s 4081 MB/s      334e15 334e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3333 MB/s 3333 MB/s      273e15 273e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4347 MB/s 4347 MB/s      356e15 356e15
xxhash-fast                             r3
  209715206 (x 1.000)      4000 MB/s 4000 MB/s      327e15 327e15
xxhash-strong                           r3
  209715206 (x 1.000)      2816 MB/s 2816 MB/s      230e15 230e15
xxhash256                               r3
  209715234 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>benchmark_Intel_12.1_O3.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 109 ms, 209715202 bytes = 1834 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      3278 MB/s 3333 MB/s      268e15 273e15
CityHash64                              1.0.3
  209715210 (x 1.000)      3333 MB/s 3278 MB/s      273e15 268e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      3278 MB/s 3278 MB/s      268e15 268e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3333 MB/s 3333 MB/s      273e15 273e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      3921 MB/s 3921 MB/s      321e15 321e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
xxhash-fast                             r3
  209715206 (x 1.000)      3636 MB/s 3636 MB/s      297e15 297e15
xxhash-strong                           r3
  209715206 (x 1.000)      2777 MB/s 2777 MB/s      227e15 227e15
xxhash256                               r3
  209715234 (x 1.000)      3773 MB/s 3773 MB/s      309e15 309e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>benchmark_Intel_12.1_fast.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 110 ms, 209715202 bytes = 1818 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      2380 MB/s 2380 MB/s      195e15 195e15
CityHash64                              1.0.3
  209715210 (x 1.000)      2105 MB/s 2105 MB/s      172e15 172e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      3508 MB/s 3508 MB/s      287e15 287e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3389 MB/s 3389 MB/s      277e15 277e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      4000 MB/s 4000 MB/s      327e15 327e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
xxhash-fast                             r3
  209715206 (x 1.000)      3773 MB/s 3773 MB/s      309e15 309e15
xxhash-strong                           r3
  209715206 (x 1.000)      2777 MB/s 2777 MB/s      227e15 227e15
xxhash256                               r3
  209715234 (x 1.000)      3921 MB/s 3921 MB/s      321e15 321e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>benchmark_Microsoft_VS2010_Ox.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 111 ms, 209715202 bytes = 1801 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      4444 MB/s 4444 MB/s      364e15 364e15
CityHash64                              1.0.3
  209715210 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      4081 MB/s 4081 MB/s      334e15 334e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3333 MB/s 3278 MB/s      273e15 268e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4347 MB/s 4347 MB/s      356e15 356e15
xxhash-fast                             r3
  209715206 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
xxhash-strong                           r3
  209715206 (x 1.000)      2857 MB/s 2857 MB/s      234e15 234e15
xxhash256                               r3
  209715234 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>

FNV1A_Yoshimura is simply DIAMANTINE.

Bulat Ziganshin,

VMAC-64, a cryptographically strong keyed hash, runs on x64 at 4 bytes/cycle, i.e. 16 GB/s.

Peter Kankowski,

Thank you. VMAC-64 is a cryptographic hash function, not a function for hash tables. Most likely, it will be slow on the short strings that are used in my benchmark. You are welcome to benchmark it and post the results here.

Georgi 'Sanmayce',

For one, Bulat's point is from a data-deduplication context, am I right?

There VMAC (VHASH) IS a function for hash tables, where keys can be chunks several KB long.

Though not "natively" targeted at hash tables, it could be used as such, similarly to CRC32, which is not a "native" HT function but a checksum.

Looking at VMAC(VHASH)'s main loop (the version below does 64-bytes of message at a time):

    for (i = 0; i < nw; i+= 8) {                                             
        MUL64(th,tl,get64PE((mp)+i  )+(kp)[i  ],get64PE((mp)+i+1)+(kp)[i+1]);
        ADD128(rh,rl,th,tl);                                                 
        MUL64(th,tl,get64PE((mp)+i+2)+(kp)[i+2],get64PE((mp)+i+3)+(kp)[i+3]);
        ADD128(rh,rl,th,tl);                                                 
        MUL64(th,tl,get64PE((mp)+i+4)+(kp)[i+4],get64PE((mp)+i+5)+(kp)[i+5]);
        ADD128(rh,rl,th,tl);                                                 
        MUL64(th,tl,get64PE((mp)+i+6)+(kp)[i+6],get64PE((mp)+i+7)+(kp)[i+7]);
        ADD128(rh,rl,th,tl);                                                 
    }   

It looks elegant and straightforward, with potential to be bettered; yet these 4 64x64 MULs are STILL scary.
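For readers without the VMAC source at hand, the MUL64/ADD128 pair is a 64x64-to-128-bit multiply-accumulate; a minimal sketch assuming a compiler with the unsigned __int128 extension (GCC/Clang; MSVC would need _umul128 instead):

```c
#include <stdint.h>

/* One step of the VHASH-style inner loop: multiply two 64-bit words
   into a 128-bit product and add it to the (rh:rl) accumulator. */
static void mul_add128(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    unsigned __int128 acc = ((unsigned __int128)*rh << 64) | *rl;
    acc += (unsigned __int128)a * b;
    *rh = (uint64_t)(acc >> 64);
    *rl = (uint64_t)acc;
}
```

On x64 this maps to a single widening MUL plus add-with-carry instructions, which is why those four wide multiplies dominate the loop's cost.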

As for those 4 bytes/cycle (on an i7?!), they are kick-ass, but in that respect 'xxhash256' quickly kicks its ass:

http://encode.ru/threads/1371-Filesystem-benchmark?p=33515&viewfull=1#post33515

On Core 2 2.33GHz in L2 cache xxhash256 hashes at 11.01GB/s or 5+ B/c, but this is only a warm-up, on Phenom II X4 955, 3.2GHz (4 threads) in L1 cache xxhash256 hashes at 73.98GB/s or 6+ B/c = (73.98*1024*1024*1024)/(4*3.2*1000*1000*1000), OUCH!

Well, xxhash256 is a monster, but there are supermonsters as well, in the name of reaching higher Bytes-Per-Cycle let us see what a Core i3 laptop can achieve in L3/L2/L1 using XMM registers:

Core i3 2310M 2.1GHz laptop:

FNV1A_penumbra, (2MB block): 15593MB/s or (15593*1024*1024)/(2.1*1000*1000*1000)=7.7B/c 
FNV1A_penumbra, (128KB block): 19692MB/s or (19692*1024*1024)/(2.1*1000*1000*1000)=9.8B/c 
FNV1A_penumbra, (16KB block): 19877MB/s or (19877*1024*1024)/(2.1*1000*1000*1000)=9.9B/c 

For reference my Core 2 T7500 2.2GHz laptop gave:

FNV1A_penumbra, (16KB block): 11262MB/s or (11262*1024*1024)/(2.2*1000*1000*1000)=5.3B/c 

Seeing that 'Everest' gives 67011 MB/s for L1 cache reads on this i3, the 19877 MB/s above means only about 30% of the available bandwidth is utilized (a 237% gap).

Brutal improvement in the XMM department over the pre-previous generation (the i3 being 2nd gen), 8 GB/s more - huge difference indeed.

FNV1A_penumbra simply combines short (under 192 bytes) and "long" (192+ bytes) keys under one roof: the first are hashed by FNV1a-YoshimitsuTRIADii, the second by an unrolled XMM FNV1a whopper.

And one more thing, dispersion, a FNV1A-YoshimitsuTRIADii vs VHASH showdown would be interesting.

Currently I am hashing (using 27bit HT) 1+ trillion a la Knight-Tour 128 byte long keys, 220h later the result is:

FNV1A_YoshimitsuTRIADii: KT_DumpCounter = 0,629,883,797,505; 000,000,001 x MAXcollisionsAtSomeSlots = 005,065; HASHfreeSLOTS = 0,000,000,000
CRC32 0x8F6E37A0, iSCSI: KT_DumpCounter = 0,629,883,797,505; 000,000,002 x MAXcollisionsAtSomeSlots = 005,072; HASHfreeSLOTS = 0,000,000,000

Where keys:slots ratio is 629,883,797,505/134,217,727 = 4693 or CRC32/FNV1A_YoshimitsuTRIADii show only 8% deviation from "ideal" distribution.

In my view each and every 'text message' HT function should be tortured with this 128-byte (neither long nor short) key length.

m^2,

I ran a quick slash_hash test with SMHasher:

https://gist.github.com/anonymous/5926294

Not great.

Ace,

Safe against DoS attacks, by Jean-Philippe Aumasson and Daniel J. Bernstein:

https://131002.net/siphash/

Another mentioned:

http://google-opensource.blogspot.co.at/2011/04/introducing-cityhash.html

http://code.google.com/p/cityhash/

Ace,

And the set of tests maintained by Murmur authors: http://code.google.com/p/smhasher/

Peter Kankowski,

Thank you, SipHash looks interesting. I will include it in the benchmarks.

ace,

http://blog.booking.com/hardening-perls-hash-function.html

"Analysis done by the Perl5 Security Team suggests that One-At-A-Time-Hash is intrinsically more secure than MurmurHash. However to my knowledge, there is no peer-reviewed cryptanalysis to prove it.

There seems to be very little research into fast, robust, hash algorithms which are suitable for dynamic languages. Siphash is a notable exception and a step forward, but within the Perl community there seems to be consensus that it is currently too slow, at least in the recommended Siphash-2-4 incarnation. It is also problematic that its current implementation only supports 64 bit architectures. (No doubt this will improve over time, or perhaps even already has.)"

Ace,

https://github.com/facebook/folly/blob/master/folly/container/F14.md

"F14 is a 14-way probing hash table that resolves collisions by double hashing. Up to 14 keys are stored in a chunk at a single hash table position. Vector instructions (SSE2 on x86_64, NEON on aarch64) are used to filter within a chunk; intra-chunk search takes only a handful of instructions. F14 refers to the fact that the algorithm Filters up to 14 keys at a time. This strategy allows the hash table to be operated at a high maximum load factor (12/14) while still keeping probe chains very short.

F14 provides compelling replacements for most of the hash tables we use in production at Facebook."

Leonid Yuriev,

It seems t1ha is superior to all of the above functions, both in speed and in quality.

Of course, it could be used with Folly's F14 hash table.

https://github.com/PositiveTechnologies/t1ha

Peter Kankowski,

Thanks, I will test your function.

Georgi 'Sanmayce',

Hi Peter,

glad to share the latest-n-fastest FNV1A variant.

For a long time I knew how much more is out there, many coders shared very nice etudes, but my 'Yorikke' has something special, the ... Zennish approach embedded :P

Currently I am writing an insane matchfinder using B-trees while hashing millions of keys of order 4,6,8,10,12,14,16,18,36,64, thus a hasher of superhigh speed (FOR SMALL KEYS) is needed since the B-trees are constructed in multi-passes and billions of hash invocations of Yorikke are to be used. Latency is crucial, throughput is meh.

#define _rotl_KAZE(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define _PADr_KAZE(x, n) ( ((x) << (n))>>(n) )
#define ROLInBits 27 // 5 in r.1; Caramba: it should be ROR by 5 not ROL, from the very beginning the idea was to mix two bytes by shifting/masking the first 5 'noisy' bits (ASCII 0-31 symbols).

UINT Hash_Yorikke(const char *str, SIZE_T wrdlen)
{
    const UINT PRIME = 591798841;
    UINT hash32 = 2166136261;
    const char *p = str;
    long long PADDEDby8;

    for(; wrdlen >= 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
        //hash32 = (hash32 ^ (_rotl(*(DWORD *)p,ROLInBits) ^ *(DWORD *)(p+4))) * PRIME;
        hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)(p+0) ) * PRIME;
        hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)(p+4) ) * PRIME;
    }

    PADDEDby8 = _PADr_KAZE(*(long long *)(p+0), (8/1-(wrdlen&(8/1-1)))<<3);
    hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)((char *)&PADDEDby8+0) ) * PRIME;        
    hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)((char *)&PADDEDby8+(8/1)/2) ) * PRIME;        
    return hash32 ^ (hash32 >> 16);
}

// The very instrumental and informative page of Peter Kankowski, first column is time (smaller-better), last one is collisions (smaller-better):
/*
dic_common_words.txt 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:         55 [  110]
              Meiyan:         56 [  102]
             Yorikke:         54 [   98] ! Best Speed, Best Dispersion ! on Core 2, 32bit executable
              FNV-1a:         69 [  124]
              Larson:         68 [   99]
              CRC-32:         65 [  101]
             Murmur2:         71 [  103]
             Murmur3:         68 [  101]
           XXHfast32:         80 [  110]
         XXHstrong32:         80 [  109]
dic_fr.txt 

13408 lines read

32768 elements in the table (15 bits)

           Jesteress:       1757 [ 2427]
              Meiyan:       1775 [ 2377]
             Yorikke:       1672 [ 2413] ! Best Speed, - ! on Core 2, 32bit executable
              FNV-1a:       2097 [ 2446]
              Larson:       2033 [ 2447]
              CRC-32:       2140 [ 2400]
             Murmur2:       2266 [ 2399]
             Murmur3:       2116 [ 2376]
           XXHfast32:       2428 [ 2494]
         XXHstrong32:       2431 [ 2496]
dic_ip.txt 

3925 lines read

8192 elements in the table (13 bits)

           Jesteress:        436 [  819]
              Meiyan:        451 [  807]
             Yorikke:        486 [  789] ! - , Best Dispersion ! on Core 2, 32bit executable
              FNV-1a:        614 [  796]
              Larson:        587 [  789]
              CRC-32:        589 [  802]
             Murmur2:        566 [  825]
             Murmur3:        549 [  818]
           XXHfast32:        704 [  829]
         XXHstrong32:        704 [  829]
dic_numbers.txt 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:         40 [  300]
              Meiyan:         32 [  125]
             Yorikke:         37 [   82] ! - , - ! on Core 2, 32bit executable
              FNV-1a:         35 [  108]
              Larson:         26 [   16]
              CRC-32:         34 [   64]
             Murmur2:         45 [  104]
             Murmur3:         42 [  104]
           XXHfast32:         53 [  102]
         XXHstrong32:         53 [  102]
dic_postfix.txt 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:         70 [  106]
              Meiyan:         74 [  112]
             Yorikke:         76 [   99] ! - , - ! on Core 2, 32bit executable
              FNV-1a:        159 [  105]
              Larson:        160 [  105]
              CRC-32:        129 [   94]
             Murmur2:         99 [  111]
             Murmur3:         98 [  105]
           XXHfast32:         76 [  106]
         XXHstrong32:         82 [  112]
dic_prefix.txt 

500 lines read

1024 elements in the table (10 bits)

           Jesteress:         73 [  102]
              Meiyan:         77 [  106]
             Yorikke:         79 [   94] ! - , Best Dispersion ! on Core 2, 32bit executable
              FNV-1a:        165 [   94]
              Larson:        161 [   99]
              CRC-32:        135 [  107]
             Murmur2:        103 [  106]
             Murmur3:        101 [  103]
           XXHfast32:         77 [  103]
         XXHstrong32:         82 [  102]
dic_Shakespeare.txt 

3228 lines read

8192 elements in the table (13 bits)

           Jesteress:        357 [  585]
              Meiyan:        366 [  588]
             Yorikke:        349 [  536] ! Best Speed, - ! on Core 2, 32bit executable
              FNV-1a:        419 [  555]
              Larson:        404 [  583]
              CRC-32:        433 [  563]
             Murmur2:        471 [  566]
             Murmur3:        443 [  555]
           XXHfast32:        493 [  491]
         XXHstrong32:        493 [  491]
dic_variables.txt 

1842 lines read

4096 elements in the table (12 bits)

           Jesteress:        249 [  366]
              Meiyan:        256 [  350]
             Yorikke:        240 [  351] ! Best Speed, - ! on Core 2, 32bit executable
              FNV-1a:        318 [  374]
              Larson:        313 [  366]
              CRC-32:        309 [  338]
             Murmur2:        318 [  383]
             Murmur3:        299 [  334]
           XXHfast32:        336 [  347]
         XXHstrong32:        339 [  355]
*/

Georgi 'Sanmayce',
Hashing Faster than SSE4.2 iSCSI-CRC

The feed:

https://github.com/wangyi-fudan/wyhash/issues/29#issuecomment-538078396

https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/824947#comment-1946227

Dummy me, had to fix v2, now everything is OK, my excuse - yesterday, have been distracted the whole day.

So, here comes v3:

#define _rotl_KAZE(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define _PADr_KAZE(x, n) ( ((x) << (n))>>(n) )
#define ROLInBits 27 // 5 in r.1; Caramba: it should be ROR by 5 not ROL, from the very beginning the idea was to mix two bytes by shifting/masking the first 5 'noisy' bits (ASCII 0-31 symbols).
// CAUTION: Add 8 more bytes to the buffer being hashed, usually malloc(...+8) - to prevent out of boundary reads!
uint32_t FNV1A_Hash_Yorikke_v3(const char *str, uint32_t wrdlen)
{
    const uint32_t PRIME = 591798841;
    uint32_t hash32 = 2166136261;
    uint64_t PADDEDby8;
    const char *p = str;
    for(; wrdlen > 2*sizeof(uint32_t); wrdlen -= 2*sizeof(uint32_t), p += 2*sizeof(uint32_t)) {
        hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ (*(uint32_t *)(p+0)) ) * PRIME;
        hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ (*(uint32_t *)(p+4)) ) * PRIME;
    }
    // Here 'wrdlen' is 1..8
    PADDEDby8 = _PADr_KAZE(*(uint64_t *)(p+0), (8-wrdlen)<<3); // when wrdlen == 8 the QWORD remains intact
    hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(uint32_t *)((char *)&PADDEDby8+0) ) * PRIME;
    hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(uint32_t *)((char *)&PADDEDby8+4) ) * PRIME;
    return hash32 ^ (hash32 >> 16);
}
// Last touch: 2019-Oct-03, Kaze

Georgi 'Sanmayce',

Peter, I see no way to better the 32-bit code hashers, so the fastest 32-bit hasher in 64-bit code known to me is:

https://forum.thefreedictionary.com/postsm1118964_MASAKARI--The-people-s-choice--General-Purpose-Grade--English-wordlist.aspx#1118964

Mohit Soni,

What bucket size was used for all the hash functions in the graph titled "hash function quality using the Red Dragon book" in this blog? I would appreciate an answer to this query.

Peter Kankowski,

Hello Mohit, thank you, it's a good question. The number of buckets (slots) was the number of items rounded up to the next power of two, then doubled. For example, in the "numbers" test there are 500 items in the table, so the number of buckets (hash table size) is 2 * 512 = 1024. Hope this helps

Mohit Soni,

Thanks for your response

Mohit Soni,

I just wanted to know the exact values for the distribution of the different hash functions in the graph titled "hash function quality using the Red Dragon book". Can you please provide the exact values you plotted in the graph?

It would be very helpful of you.

Mohit Soni,

Can you please provide me with the exact values in the hash function quality graph, just for the numbers dataset? It would be really helpful.

Thanks in advance @Peter Kankowski.

mirabilos,

Where’s the Wikipedia wordlist (or where does one get it from), and, more importantly, the OA test code?

I’m looking for a good OA spread/avalanche combo that’s cheap enough but doesn’t invoke UB or IB in C and is extremely portable, so it has to read byte by byte, which makes a CRC surprisingly expensive: 2.39 to one-at-a-time’s 2.21 on my borrowed Windows test system (I don’t normally have one, but your source is for it…).

Arash Partow,

Here are some interesting hash functions that can be added to your comparison suite:

https://www.partow.net/programming/hashfunctions/index.html

Matteo Zapparoli,

I would like to present my latest research:

https://github.com/matteo65/ZedmeeHash a new hashing function with very interesting features

matteo allan@y combinator,

I would like to see this tried with SHA-NI too.
