Hash tables are popular data structures for storing key-value pairs. A hash function is used to map the key value (usually a string) to array index. The functions are different from cryptographic hash functions, because they should be much faster and don't need to be resistant to preimage attack. Hashing in large databases is also left out from this article; the benchmark includes medium-size hash tables such as:
- symbol table in a parser,
- IP address table for filtering network traffic,
- the dictionary in a word counting program or a spellchecker.
There are two classes of the functions used in hash tables:
- multiplicative hash functions, which are simple and fast, but have a high number of collisions;
- more complex functions, which have better quality, but take more time to calculate.
Hash table benchmarks usually include theoretical metrics such as the number of collisions or distribution uniformity (see, for example, hash function comparison in the Red Dragon book). Obviously, you will have a better distribution with more complex functions, so they are winners in these benchmarks.
The question is whether using complex functions gives you a faster program. The complex functions require more operations per one key, so they can be slower. Is the price of collisions high enough to justify the additional operations?
Multiplicative hash functions
Any multiplicative hash function is a special case of the following algorithm:
UINT HashMultiplicative(const CHAR *key, SIZE_T len) {
UINT hash = INITIAL_VALUE;
for(UINT i = 0; i < len; ++i)
hash = M * hash + key[i];
return hash % TABLE_SIZE;
}
(Sometimes XOR operation is used instead of addition, but it does not make much difference.) The hash functions differ only by values of INITIAL_VALUE and multiplier (M). For example, the popular Bernstein's function uses INITIAL_VALUE of 5381 and M of 33; Kernighan and Ritchie's function uses INITIAL_VALUE of 0 and M of 31.
A multiplicative function works by adding together the letters weighted by powers of multiplier. For example, the hash for the word TONE will be:
INITIAL_VALUE * M^4 + 'T' * M^3 + 'O' * M^2 + 'N' * M + 'E'
Let's enter several similar strings and watch the output of the functions:
Bernstein Kernighan
(M=33) (M=31)
too b88af17 1c154
top b88af18 1c155
tor b88af1a 1c157
tpp b88af39 1c174
a000 7c9312d6 2cd22f
a001 7c9312d7 2cd230
a002 7c9312d8 2cd231
a003 7c9312d9 2cd232
a004 7c9312da 2cd233
a005 7c9312db 2cd234
a006 7c9312dc 2cd235
a007 7c9312dd 2cd236
a008 7c9312de 2cd237
a009 7c9312df 2cd238
a010 7c9312f7 2cd24e
a 2b606 61
aa 597727 c20
aaa b885c68 17841
Too and top are different in the last letter only. The letter P is the next one after O, so the values of hash function are different by 1 (1c154 and 1c155, b88af17 and b88af18). Ditto for a000..a009.
Now let's compare top with tpp. Their hashes will be:
INITIAL_VALUE * M^3 + 'T' * M^2 + 'O' * M + 'P' INITIAL_VALUE * M^3 + 'T' * M^2 + 'P' * M + 'P'
The hashes will be different by M * ('P' - 'O') = M. Similarly, when the first letters are different by x, their hashes will be different by x * M^2.
When there are less than 33 possible letters, Bernstein's function will pack them into a number (similar to Radix40 packing scheme). For example, hash table of size 333 will provide perfect hashing (without any collisions) for all three-letter English words written in small letters. In practice, the words are longer and hash tables are smaller, so there will be some collisions (situations when different strings have the same hash value).
If the string is too long to fit into the 32-bit number, the first letters will still affect the value of the hash function, because the multiplication is done modulo 2^32 (in a 32-bit register), and the multiplier is chosen to have no common divisors with 2^32 (in other words, it must be odd), so the bits will not be just shifted away.
There are no exact rules for choosing the multiplier, only some heuristics:
- the multiplier should be large enough to accommodate most of the possible letters (e.g., 3 or 5 is too small);
- the multiplier should be fast to calculate with shifts and additions [e.g., 33 * hash can be calculated as (hash << 5) + hash];
- the multiplier should be odd for the reason explained above;
- prime numbers are good multipliers.
Complex hash functions
These functions do a good job of mixing together the bits of the source word. The change in one input bit changes a half of the bits in the output (see Avalanche_effect), so the result looks completely random:
Paul Hsieh One At Time too 3ad11d33 3a9fad1e top 78b5a877 4c5dd09a tor c09e2021 f2aa9d35 tpp 3058996d d5e9e480 a000 7552599f ed3859d8 a001 3cc1d896 fef7fd57 a002 c6ff5c9b 08a610b3 a003 dcab7b0c 1a88b478 a004 780c7202 3621ebaa a005 7eb63e3a 47db8f1d a006 6b0a7a17 b901717b a007 cb5cb1ab caec1550 a008 5c2a15c0 e58d4a92 a009 33339829 f75aee2d a010 eb1f336e bd097a6b a 115ea782 ca2e9442 aa 008ad357 7081738e aaa 7dfdc310 ae4f22ec
To achieve this behavior, the hash functions perform a lot of shifts, XORs, and additions. But do we need a complex function? What is faster: tolerating the collisions and resolving them with chaining, or avoiding them with a more complex function?
Test conditions
The benchmark uses separate chaining algorithm for collision resolution. Memory allocation and other "heavy" functions were excluded from the benchmarked code. The RDTSC instruction was used for benchmarking. The test was performed on Pentium-M and Core i5 processors.
The benchmark inserts some keys in the table, then looks them up in the same order as they were inserted. The test data include:
- the list of common words from Wiktionary (500 items);
- the list of Win32 functions from Colorer syntax highlight scheme (1992 items);
- 500 names from a000 to a499 (imitates the names in auto-generated source code);
- the list of common words with a long prefix and postfix;
- all variable names from WordPress 2.3.2 source code in wp-includes folder (1842 names);
- list of all words in Sonnets by W. Shakespeare (imitates a word counting program; 3228 words);
- list of all words in La Peau de chagrin by Balzac (in French, UTF-8 encoding);
- search engine IP addresses (binary).
Results
Pentium-M processor
| Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jesteress | 78 | [110] | 418 | [397] | 59 | [300] | 119 | [102] | 114 | [106] | 353 | [366] | 513 | [585] | 2420 | [2427] | 439 | [1499] | 1.02 | [3.19] |
| Meiyan | 80 | [102] | 425 | [409] | 56 | [125] | 123 | [106] | 119 | [112] | 354 | [350] | 524 | [588] | 2445 | [2377] | 445 | [768] | 1.03 | [1.86] |
| Novak unrolled | 91 | [113] | 517 | [399] | 56 | [90] | 168 | [118] | 163 | [113] | 399 | [342] | 574 | [581] | 2718 | [2430] | 482 | [969] | 1.20 | [1.67] |
| Fletcher | 83 | [131] | 443 | [406] | 102 | [460] | 139 | [127] | 132 | [108] | 375 | [507] | 593 | [1052] | 2901 | [4893] | 514 | [1359] | 1.23 | [4.60] |
| SBox | 88 | [91] | 551 | [431] | 57 | [116] | 181 | [108] | 177 | [91] | 412 | [347] | 559 | [526] | 2816 | [2442] | 472 | [836] | 1.24 | [1.77] |
| Murmur2 | 96 | [103] | 530 | [415] | 64 | [104] | 165 | [106] | 161 | [111] | 433 | [383] | 619 | [566] | 2928 | [2399] | 537 | [834] | 1.27 | [1.73] |
| CRC-32 | 90 | [101] | 564 | [426] | 56 | [64] | 196 | [107] | 190 | [94] | 427 | [338] | 590 | [563] | 2836 | [2400] | 469 | [725] | 1.28 | [1.40] |
| x17 unrolled | 93 | [109] | 592 | [415] | 52 | [24] | 214 | [113] | 207 | [102] | 433 | [368] | 592 | [589] | 2864 | [2392] | 486 | [829] | 1.33 | [1.18] |
| lookup3 | 95 | [101] | 566 | [412] | 71 | [97] | 189 | [101] | 181 | [95] | 436 | [361] | 634 | [550] | 2949 | [2392] | 570 | [834] | 1.35 | [1.64] |
| K&R | 96 | [106] | 619 | [437] | 58 | [288] | 221 | [94] | 218 | [106] | 447 | [360] | 604 | [561] | 3021 | [2365] | 448 | [831] | 1.37 | [2.99] |
| Paul Larson | 95 | [99] | 630 | [416] | 50 | [16] | 232 | [99] | 227 | [105] | 455 | [366] | 600 | [583] | 3024 | [2447] | 469 | [755] | 1.38 | [1.09] |
| Paul Hsieh | 105 | [114] | 574 | [420] | 71 | [118] | 182 | [101] | 177 | [100] | 456 | [341] | 680 | [600] | 3155 | [2380] | 579 | [847] | 1.38 | [1.82] |
| Bernstein | 95 | [114] | 621 | [412] | 61 | [288] | 225 | [100] | 221 | [102] | 445 | [353] | 593 | [572] | 3002 | [2380] | 469 | [703] | 1.39 | [2.98] |
| x65599 | 99 | [111] | 630 | [382] | 60 | [203] | 234 | [107] | 231 | [122] | 459 | [379] | 616 | [560] | 3073 | [2373] | 472 | [846] | 1.42 | [2.44] |
| Sedgewick | 101 | [107] | 666 | [414] | 52 | [48] | 245 | [103] | 241 | [103] | 478 | [348] | 630 | [570] | 3204 | [2437] | 476 | [782] | 1.45 | [1.32] |
| Murmur2A | 113 | [114] | 597 | [433] | 79 | [102] | 181 | [112] | 176 | [109] | 490 | [365] | 719 | [544] | 3383 | [2369] | 650 | [772] | 1.47 | [1.72] |
| FNV-1a | 101 | [124] | 658 | [428] | 62 | [108] | 239 | [94] | 236 | [105] | 472 | [374] | 624 | [555] | 3136 | [2446] | 517 | [807] | 1.47 | [1.76] |
| MaPrime2c | 108 | [103] | 705 | [426] | 65 | [106] | 255 | [91] | 253 | [106] | 509 | [349] | 674 | [550] | 3413 | [2406] | 541 | [865] | 1.57 | [1.72] |
| Ramakrishna | 108 | [108] | 726 | [409] | 78 | [91] | 277 | [125] | 271 | [103] | 510 | [360] | 662 | [528] | 3376 | [2383] | 517 | [840] | 1.63 | [1.65] |
| Arash Partow | 106 | [101] | 740 | [435] | 93 | [420] | 281 | [98] | 275 | [85] | 515 | [355] | 672 | [570] | 3344 | [2372] | 542 | [779] | 1.68 | [3.87] |
| One At Time | 118 | [105] | 832 | [421] | 81 | [110] | 319 | [97] | 315 | [103] | 577 | [364] | 741 | [545] | 3807 | [2346] | 657 | [795] | 1.86 | [1.74] |
| Weinberger | 120 | [104] | 958 | [422] | 55 | [100] | 377 | [111] | 379 | [117] | 617 | [364] | 747 | [712] | 3994 | [2547] | 569 | [744] | 1.95 | [1.74] |
| Hanson | 86 | [118] | 531 | [649] | 55 | [112] | 167 | [118] | 1644 | [499] | 393 | [435] | 550 | [592] | 2745 | [2890] | 463 | [833] | 2.63 | [2.44] |
Each cell includes the execution time, then the number of collisions in square brackets. Execution time is expressed in thousands of clock cycles (a lower number is better). Avg column contains the average normalized execution time (and the number of collisions).
The function by Kernighan and Ritchie is from their famous book "The C programming Language", 3rd edition; Weinberger's hash and the hash with multiplier 65599 are from the Red Dragon book. The latter function is used in gawk, sdbm, and other Linux programs. x17 is the function by Peter Kankowski (multiplier = 17; 32 is subtracted from each letter code).
As you can see from the table, the function with the lowest number of collisions is not always the fastest one.
Results on a large data set (list of all words in English Wikipedia, 12.5 million words, from the benchmark by Georgi 'Sanmayce'):
Core i5 processor
| Wikipedia | Avg | |||
|---|---|---|---|---|
| iSCSI CRC | 5686064 | [2077725] | 1.00 | [1.00] |
| Jesteress | 5757994 | [2121868] | 1.01 | [1.02] |
| Meiyan | 5781950 | [2111271] | 1.02 | [1.02] |
| Murmur2 | 6255797 | [2081476] | 1.10 | [1.00] |
| Paul Larson | 6305466 | [2080111] | 1.11 | [1.00] |
| x65599 | 6410172 | [2102893] | 1.13 | [1.01] |
| FNV-1a | 6581385 | [2081195] | 1.16 | [1.00] |
| Hanson | 6835921 | [2129832] | 1.20 | [1.03] |
| CRC-32 | 6927378 | [2075088] | 1.22 | [1.00] |
| Sedgewick | 7012507 | [2080640] | 1.23 | [1.00] |
| K&R | 7057453 | [2083145] | 1.24 | [1.00] |
| SBox | 7142103 | [2084018] | 1.26 | [1.00] |
| Bernstein | 7189250 | [2074237] | 1.26 | [1.00] |
| Murmur2A | 7305910 | [2081370] | 1.28 | [1.00] |
| lookup3 | 7285984 | [2084889] | 1.28 | [1.01] |
| Paul Hsieh | 7335526 | [2180206] | 1.29 | [1.05] |
| x17 unrolled | 7351758 | [2410605] | 1.29 | [1.16] |
| Ramakrishna | 8133363 | [2093253] | 1.43 | [1.01] |
| One At Time | 8271113 | [2087861] | 1.45 | [1.01] |
| MaPrime2c | 8365788 | [2084467] | 1.47 | [1.00] |
| Arash Partow | 8437500 | [2084572] | 1.48 | [1.00] |
| Weinberger | 9344561 | [3541181] | 1.64 | [1.71] |
| Novak unrolled | 21137740 | [6318611] | 3.72 | [3.05] |
| Fletcher | 22100750 | [9063797] | 3.89 | [4.37] |
Pentium-M processor
| Wikipedia | Avg | |||
|---|---|---|---|---|
| x17 unrolled | 11216841 | [2410605] | 1.00 | [1.16] |
| K&R | 11613260 | [2083145] | 1.04 | [1.00] |
| Bernstein | 11660386 | [2074237] | 1.04 | [1.00] |
| Paul Larson | 11794370 | [2080111] | 1.05 | [1.00] |
| Sedgewick | 12029681 | [2080640] | 1.07 | [1.00] |
| x65599 | 12029638 | [2102893] | 1.07 | [1.01] |
| Arash Partow | 12164940 | [2084572] | 1.08 | [1.00] |
| Ramakrishna | 12099292 | [2093253] | 1.08 | [1.01] |
| Jesteress | 12160339 | [2121868] | 1.08 | [1.02] |
| Meiyan | 12163372 | [2111271] | 1.08 | [1.02] |
| CRC-32 | 12496534 | [2075088] | 1.11 | [1.00] |
| Murmur2 | 12577192 | [2081476] | 1.12 | [1.00] |
| Hanson | 12532500 | [2129832] | 1.12 | [1.03] |
| SBox | 12626848 | [2084018] | 1.13 | [1.00] |
| lookup3 | 12697932 | [2084889] | 1.13 | [1.01] |
| FNV-1a | 12760139 | [2081195] | 1.14 | [1.00] |
| Paul Hsieh | 12892492 | [2180206] | 1.15 | [1.05] |
| Murmur2A | 12994766 | [2081370] | 1.16 | [1.00] |
| MaPrime2c | 13266856 | [2084467] | 1.18 | [1.00] |
| One At Time | 13575868 | [2087861] | 1.21 | [1.01] |
| Weinberger | 14512944 | [3541181] | 1.29 | [1.71] |
| Fletcher | 36966422 | [9063797] | 3.30 | [4.37] |
| Novak unrolled | 37239016 | [6318611] | 3.32 | [3.05] |
Some functions were excluded from the benchmark because of very bad performance:
The number of collisions depending on the hash table size (for the same data set, thanks to Ace for the idea):
Red Dragon Book proposes the following formula for evaluating hash function quality:
where bj is the number of items in j-th slot, m is the number of slots, and n is the total number of items. The sum of bj(bj + 1) / 2 estimates the number of slots your program should visit to find the required value. The denominator (n / 2m)(n + 2m − 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula should give 1. In reality, a good function is somewhere between 0.95 and 1.05. If it's more, there is a high number of collisions (slow!). If it's less, the function gives less collisions than the randomly distributing function, which is not bad.
Here are the results for some of our functions:
Conclusion
Complex functions by Paul Hsieh and Bob Jenkins are tuned for long keys, such as the ones in postfix and prefix tests. Note that they do not provide the best number of collisions for these tests, but do have the best time, which means that the functions are faster than the others because of loop unrolling. At the same time, they are suboptimal for short keys (words and sonnets tests).
For a word counting program, a compiler, or another application that typically handles short keys, it's often advantageous to use a simple multiplicative function such as x17 or Larson's hash. However, these functions perform badly on long keys.
Novak showed bad results on the large data set. Jesteress has a high number of collisions in numbers test.
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Download the source code (152 KB, MSVC++)
Variations
XORing high and low part
For table size less than 2^16, we can improve the quality of hash function by XORing high and low words, so that more letters will be taken into account:
return hash ^ (hash >> 16);
Subtracting a constant
x17 hash function subtracts a space from each letter to cut off the control characters in the range 0x00..0x1F. If the hash keys are long and contain only Latin letters and numbers, the letters will be less frequently shifted out, and the overall number of collisions will be lower. You can even subtract 'A' when you know that the keys will be only English words.
Using larger multipliers for a compiler
Paul Hsieh noted that large multipliers may provide better results for the hash table in a compiler, because a typical source code contains a lot of one-letter variable names (i, j, s, etc.), and they will collide if the multiplier is less than the number of letters in the alphabet.
The test confirms this assumption: the function by Kernighan & Ritchie (M = 33) has lower number of collisions than x17 (M = 17), but the latter is still faster (see Variables column in the table above).
Setting hash table size to a prime number
A test showed that the number of collisions will usually be lower if you use a prime, but the calculations modulo prime take much more time than the calculations for a power of 2, so this method is impractical. Even replacing division with multiplication by reciprocal values do not help here:
| Words | Win32 | Numbers | Prefix | Postfix | Variables | Shakespeare | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bernstein % 2K | 145 | [261] | 880 | [889] | 426 | [8030] | 326 | [214] | 316 | [226] | 649 | [697] | 874 | [1131] |
| Bernstein % prime | 186 | [221] | 1049 | [995] | 445 | [5621] | 364 | [194] | 357 | [217] | 805 | [800] | 1123 | [1051] |
| Bernstein optimized mod | 160 | [221] | 960 | [995] | 416 | [5621] | 341 | [194] | 334 | [217] | 722 | [800] | 969 | [1051] |
| x17 % 2K | 137 | [193] | 847 | [1002] | 81 | [340] | 314 | [244] | 300 | [228] | 641 | [863] | 832 | [1012] |
| x17 % prime | 173 | [256] | 1010 | [1026] | 104 | [324] | 356 | [246] | 339 | [216] | 760 | [760] | 1046 | [1064] |
| x17 optimized mod | 155 | [256] | 915 | [1026] | 96 | [324] | 330 | [246] | 315 | [216] | 691 | [760] | 930 | [1064] |
Implementing open addressing vs. separate chaining
With open addressing, most hash functions show awkward clustering behavior in "Numbers" test:
| Bernst. | K&R | x17 unroll | x65599 | FNV | Univ | Weinb. | Hsieh | One-at | Lookup3 | Partow | CRC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OA | 426 | 81 | 84 | 207 | 88 | 91 | 273 | 110 | 103 | 92 | 1042 | 79 |
| [8030] | [20810] | [340] | [3158] | [207] | [480] | [4360] | [342] | [267] | [205] | [20860] | [96] | |
| 32-bit | 179 | 69 | 74 | 114 | 86 | 80 | 125 | 105 | 99 | 92 | 347 | 82 |
| [8030] | [20810] | [340] | [3158] | [207] | [480] | [4360] | [342] | [267] | [205] | [20860] | [96] | |
| chain | 92 | 68 | 73 | 82 | 88 | 84 | 73 | 107 | 99 | 95 | 149 | 84 |
| [500] | [500] | [24] | [258] | [124] | [48] | [100] | [138] | [131] | [108] | [1530] | [64] |
You can avoid the worst case by using chaining for collision resolution. However, chaining requires more memory for the next item pointers, so the performance improvement does not come for free. A custom memory allocator should be usually written, because calling malloc() for a large number of small structures is suboptimal.
Some implementations (e.g., hash table in Python interpreter) store a full 32-bit hash with the item to speed up the string comparison, but this is less effective than chaining.

165 comments
Ten recent comments are shown below. Show all comments
Mantis source and Benchmark-Dumps also (http://www.sanmayce.com/Downloads/_KAZE_hash_test_r3.7z) (224,916,121 bytes) fixed now,
here is the bugless Mantis:
define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n)))) UINT FNV1A_Hash_Mantis(const char *str, SIZE_T wrdlen) { const UINT PRIME = 709607; UINT hash32 = 2166136261; const char *p = str; // Cases: 0,1,2,3,4,5,6,7 if (wrdlen & sizeof(DWORD)) { hash32 = (hash32 ^ *(WORD*)p) * PRIME; p += sizeof(WORD); hash32 = (hash32 ^ *(WORD*)p) * PRIME; p += sizeof(WORD); //wrdlen -= sizeof(DWORD); } if (wrdlen & sizeof(WORD)) { hash32 = (hash32 ^ *(WORD*)p) * PRIME; p += sizeof(WORD); //wrdlen -= sizeof(WORD); } if (wrdlen & 1) { hash32 = (hash32 ^ *p) * PRIME; p += sizeof(char); //wrdlen -= sizeof(char); } wrdlen -= p-str; // The goal is to avoid the weak range [8, 8+2, 8+1] that is 8..10 in practice 1..15 i.e. 1..8+4+2+1, thus amending FNV1A_Meiyan and FNV1A_Jesteress. // FNV1A_Jesteress: fastest strong // FNV1A_Meiyan : faster stronger // FNV1A_Mantis : fast strongest if (wrdlen) { for(; wrdlen > 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) { hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME; } hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME; hash32 = (hash32 ^ *(WORD*)(p+1*sizeof(WORD))) * PRIME; hash32 = (hash32 ^ *(WORD*)(p+2*sizeof(WORD))) * PRIME; hash32 = (hash32 ^ *(WORD*)(p+3*sizeof(WORD))) * PRIME; } // Bug Fixed! return hash32 ^ (hash32 >> 16); }Mantis results on Pentium M:
Core i5:
Generally, Mantis has similar number of collisions to Meiyan, but Mantis is slower.
Thanks for testing,
at link below is my attempt to present Heavy-Hash-Hustle dumps in a more digestible fashion:
http://www.sanmayce.com/Fastest_Hash/index.html#Heavy-Hash-Hustle
Intel Core 2 Quad Q9550S Yorkfield 2.83GHz 12MB L2 Cache:
8388608 elements in the table (23 bits)
2995394 lines read
FNV1A_Meiyan: 1914155 1914404 1914122 1914486 1913915| 1913915 [ 593723] FNV1A_Jesteress: 1935195 1937318 1937469 1936312 1935495| 1935195 [ 691369] Alfalfa_Rollick: 1980953 1981207 1980671 1981201 1981856| 1980671 [ 604098] FNV1A_Mantis: 2016612 1995406 1993654 1993662 1993258| 1993258 [ 481137] Novak unrolled: 2010097 2009996 2009829 2010971 2009834| 2009829 [ 657377] Hanson: 2012466 2012191 2012537 2012350 2012236| 2012191 [ 534251] FNV1A_Smaragd: 2030879 2028917 2029161 2029013 2028666| 2028666 [ 480914] CRC-32: 2034011 2033842 2035153 2033904 2034696| 2033842 [ 472854] FNV1A_Jester: 2050708 2050776 2050694 2050668 2051040| 2050668 [ 689339] K&R: 2064544 2064406 2065423 2065249 2065812| 2064406 [ 474011] Murmur2: 2066808 2066666 2066382 2066497 2066478| 2066382 [ 476330] Alfalfa: 2068149 2067824 2068182 2068081 2067452| 2067452 [ 475434] Alfalfa_HALF: 2071077 2071411 2071349 2070671 2071446| 2070671 [ 480071] Alfalfa_DWORD: 2081631 2081774 2081485 2081846 2081789| 2081485 [ 475434] FNV1A_Peregrine: 2098860 2098795 2098753 2098718 2098887| 2098718 [ 546915] x17 unrolled: 2112134 2114144 2112556 2112551 2112255| 2112134 [ 475528] Paul Larson: 2121042 2121087 2120977 2120391 2121282| 2120391 [ 475575] FNV1A_Whiz: 2127393 2127963 2127276 2126858 2127638| 2126858 [ 689339] SBox: 2137206 2137208 2137045 2136962 2136563| 2136563 [ 476681] Sedgewick: 2137897 2138151 2137352 2138251 2137971| 2137352 [ 477931] Bernstein: 2165058 2164180 2164633 2164803 2164633| 2164180 [ 474048] FNV-1a: 2169036 2169173 2168907 2168979 2168931| 2168907 [ 477067] FNV1A_Nefertiti: 2184810 2184848 2184522 2184738 2185078| 2184522 [ 763451] Alfalfa_QWORD: 2190122 2189991 2189488 2189910 2189894| 2189488 [ 475434] Paul Hsieh: 2196409 2195461 2195511 2195236 2195676| 2195236 [ 543835] Weinberger: 2203173 2203305 2202832 2202739 2202064| 2202064 [ 1159267] Sixtinsensitive: 2225863 2225928 2226657 2226719 2226205| 2225863 [ 582793] Arash Partow: 2241218 2241227 2241293 2240949 2240652| 2240652 [ 478246] lookup3: 2246638 2246955 2247440 2247833 2247080| 2246638 [ 476566] MaPrime2c: 2250515 2250361 2251148 2251678 2252040| 2250361 [ 477151] Ramakrishna: 2268889 2268536 2268740 2268861 2268980| 2268536 [ 476020] Sixtinsensitive+: 2272421 2272439 2272689 2272420 2272411| 2272411 [ 716367] x65599: 2341902 2342241 2341970 2341930 2341731| 2341731 [ 654463] One At Time: 2360163 2360112 2360203 2359913 2359343| 2359343 [ 477667] Fletcher: 24424563 24401091 24400993 24395252 24394052| 24394052 [ 2856890]Intel Atom N450 1.66GHz 512KB L2 Cache:
8388608 elements in the table (23 bits)
2995394 lines read
FNV1A_Meiyan: 3155958 3159537 3157158 3158773 3156719| 3155958 [ 593723] Weinberger: 3248750 3246228 3248741 3249811 3243717| 3243717 [ 1159267] FNV1A_Jesteress: 3253953 3256344 3264163 3254169 3254820| 3253953 [ 691369] FNV1A_Mantis: 3314496 3281163 3284368 3283819 3281818| 3281163 [ 481137] CRC-32: 3343937 3345737 3356535 3347699 3345348| 3343937 [ 472854] FNV1A_Peregrine: 3376694 3378179 3386379 3379572 3377683| 3376694 [ 546915] Alfalfa_HALF: 3379134 3382696 3380707 3377835 3392481| 3377835 [ 480071] FNV1A_Smaragd: 3379196 3382352 3380635 3379201 3379668| 3379196 [ 480914] lookup3: 3424892 3430226 3426356 3421977 3426662| 3421977 [ 476566] Murmur2: 3430503 3430565 3429001 3429150 3427817| 3427817 [ 476330] Hanson: 3433796 3434865 3435452 3435932 3432672| 3432672 [ 534251] Paul Hsieh: 3435899 3433042 3435868 3436050 3436606| 3433042 [ 543835] Alfalfa_Rollick: 3465457 3459388 3456972 3458956 3458066| 3456972 [ 604098] Bernstein: 3487977 3491575 3494922 3486510 3487834| 3486510 [ 474048] FNV1A_Jester: 3504276 3504202 3510988 3505340 3503647| 3503647 [ 689339] Sixtinsensitive: 3515471 3519793 3523615 3517541 3519258| 3515471 [ 582793] FNV1A_Whiz: 3533310 3534248 3531524 3534347 3531857| 3531524 [ 689339] x17 unrolled: 3542761 3543107 3541781 3540489 3542713| 3540489 [ 475528] Arash Partow: 3542980 3542590 3542073 3542242 3545691| 3542073 [ 478246] Sixtinsensitive+: 3546786 3546044 3546339 3546185 3544348| 3544348 [ 716367] K&R: 3564368 3562589 3565386 3565243 3563323| 3562589 [ 474011] FNV-1a: 3576419 3588693 3579410 3576049 3579704| 3576049 [ 477067] Novak unrolled: 3595877 3582905 3580473 3580509 3582652| 3580473 [ 657377] Alfalfa: 3583829 3584782 3582429 3584607 3581127| 3581127 [ 475434] FNV1A_Nefertiti: 3593590 3588775 3591870 3593603 3590078| 3588775 [ 763451] Alfalfa_DWORD: 3622713 3620781 3622630 3622695 3626486| 3620781 [ 475434] Ramakrishna: 3651378 3652073 3652729 3652338 3651156| 3651156 [ 476020] Sedgewick: 3661915 3656998 3656072 3656996 3662433| 3656072 [ 477931] Paul Larson: 3706139 3694095 3692079 3694297 3694153| 3692079 [ 475575] Alfalfa_QWORD: 3696531 3702283 3701765 3703357 3699967| 3696531 [ 475434] SBox: 3785574 3779069 3780873 3782789 3785080| 3779069 [ 476681] One At Time: 3955247 3954321 3955469 3963892 3953817| 3953817 [ 477667] x65599: 3998490 4006600 3997098 3994519 3996586| 3994519 [ 654463] MaPrime2c: 4191055 4193915 4193048 4204844 4193416| 4191055 [ 477151] Fletcher: 77210862 77226344 77294433 77235843 77205175| 77205175 [ 2856890]My wish is to cover all meaningful(at least for LZ) lengths that is 3..66 bytes but a different approach must be commenced because of HUGE size of dataset:
Speaking of very precious(regarding English language usage, and original thoughts used, also hundreds of books included) OSHO.TXT I propose one simple way of achieving Building-Blocks hashing:
loading 197MB(the file itself) and hashing(3..66 chunks) at each position(i.e. one byte increment).
Another thing I want to share regarding collision managing:
approaches(rehashing, chains, ...) without definite goals i.e. context are like kata(detailed choreographed patterns of movements)(the real fight is an extension/mix of kata/techniques with complex timing which includes awareness of timings of outer things not just your own timing), I mean if enough(free RAM) resources are given not utilizing/exploiting them and talking about speed as this-and-that is a dead-end.
For example I tested(now commented) a FNV1A variant hash function in Leprechaun which outperforms(hash time + lookup time) FNV1A_Jesteress by (1,8??,???-1,6??,???)/1,6??,???*100%= 12.5%, but it is completely due to B-tree used as collision manager at final stage.
This very hasher performs not well while other techniques are used, though.
The point is, speed is something beyond all limitations imposed, it must be chased for each niche relentlessly.
One phenom in real world is Mr. Bolt: his fantastic technique is being constantly improved, or as he says in one interview he and his trainer work on even faster than one of the fastest starts in 100M races. It's just amazing the tallest sprinter to have one of most explosive starts as well. And even more amazing is the will for improvements.
I have a sambo practitioner buddy who had said about his 100/200M records: "What technique? It is just left-right left-right!"
Of course I did disagree. Neglecting the basic/fundamental stances leads to nasty future slips(a kind of 'O! What happened' i.e. lack of further deep-understanding/technique-improvement).
I fully agree with:
http://cbloomrants.blogspot.com/2010/11/11-29-10-useless-hash-test.html
Looking at my 3chars..12chars Building-Block test I see the strong candidate for A future ultimate testbed.
Hi Peter,
pre-yesterday a documentary movie on History Channel about Nicola Tesla (an outstanding man not only a pragmatic visionary) inspired me to tune an almost forgotten hasher.
Here comes FNV1A_Tesla: a suitable hasher for keys [much] longer than 15 bytes (the case of 3+grams phrases).
That is the very (with the 64bitx32bit->32bit multiplication and loss of carry) hash function I was talking about previously.
In all tests, below, FNV1A_Tesla outspeeds all my FNV1A variants.
Surprisingly the bad collision rate doesn't affect its speed, I have been hit [again] by the fact that the brutal loss of data doesn't (when keys are not with ONLY weak-range lengths) hurt the lookup.
Here the speed/dispersion trade-off was made in favor of speed of course.
The function itself:
//#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n)))) UINT FNV1A_Hash_Tesla(const char *str, SIZE_T wrdlen) { const UINT PRIME = 709607; UINT hash32 = 2166136261; //unsigned long long hash64 = 2166136261; // Change with a bigger one! const char *p = str; //unsigned long long QWORD1,QWORD2; //64bit=QWORD for(; wrdlen >= 2*2*sizeof(DWORD); wrdlen -= 2*2*sizeof(DWORD), p += 2*2*sizeof(DWORD)) { hash32 = (hash32 ^ (ROL(*(unsigned long long *)(p+0),5-0)^*(unsigned long long *)(p+8))) * PRIME; // loss of carry! //hash64 = (hash64 ^ (ROL(QWORD1,5-0)^QWORD2)) * PRIME; //hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME; //hash32 = (hash32 ^ (ROL(*(DWORD *)(p+8),5-0)^*(DWORD *)(p+12))) * PRIME; } //hash32 = hash64 ^ (hash64 >> 32); // Cases: 0,1,2,3,4,5,6,7,... 15 if (wrdlen & (2*sizeof(DWORD))) { hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME; //hash32 = (hash32 ^ *(DWORD*)p) * PRIME; //hash32 = (hash32 ^ *(DWORD*)(p+4)) * PRIME; p += 2*sizeof(DWORD); } if (wrdlen & sizeof(DWORD)) { hash32 = (hash32 ^ *(DWORD*)p) * PRIME; p += sizeof(DWORD); } if (wrdlen & sizeof(WORD)) { hash32 = (hash32 ^ *(WORD*)p) * PRIME; p += sizeof(WORD); } if (wrdlen & 1) hash32 = (hash32 ^ *p) * PRIME; return hash32 ^ (hash32 >> 16); }I am curious what amendments can be done. It is revision 1.
My 64bit knowledge/experience is next to nothing, so it would be nice somebody to refine it especially for 64bit compilers.
Some tests (on my Intel Merom 2.16GHz, Windows XP 32bit, VS2008 32bit compiler):
Volume in drive D is H320_Vol5
Volume Serial Number is 0CB3-C881
Directory of D:\_KAZE_new-stuff\VivaNicolaTesla 03/16/2011 07:54 AM <DIR> .. 03/16/2011 07:54 AM <DIR> . 03/16/2011 07:39 AM 218,698 hash.cod 03/16/2011 07:39 AM 65,440 hash.cpp 03/16/2011 07:39 AM 87,552 hash.exe 03/16/2011 07:39 AM 8,390 BuildLog.htm 11/14/2010 02:39 PM 7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv 12/03/2010 07:30 AM 42,892,307 IPS.TXT 11/14/2010 02:39 PM 4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt 03/15/2011 12:10 PM 104,857,601 100MB_as_one_line.TXT 03/16/2011 07:54 AM 409,829,386 googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt 11/14/2010 02:39 PM 4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd 11/14/2010 02:39 PM 388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd 11/14/2010 02:39 PM 146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd 11/14/2010 02:39 PM 278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd 13 File(s) 998,706,809 bytes 2 Dir(s) 2,947,858,432 bytes free D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd"22202980 lines read
67108864 elements in the table (26 bits)
FNV1A_Hash_Tesla: 23890797 23614936 23680579 23684698 23606344| 23606344 [ 3457538] FNV1A_Mantis: 24848068 24866965 24863372 25003541 24859437| 24848068 [ 3298270] FNV1A_Meiyan: 23836095 23832986 23858019 23818340 23992756| 23818340 [ 3345260] FNV1A_Jesteress: 23737579 23756975 23731544 23743013 23743112| 23731544 [ 3355676] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd"12561874 lines read
33554432 elements in the table (25 bits)
FNV1A_Hash_Tesla: 12464491 12317094 12331733 12323774 12346488| 12317094 [ 2141464] FNV1A_Mantis: 12946943 12933482 12932442 12942204 13009030| 12932442 [ 2082213] FNV1A_Meiyan: 12583824 12417170 12440549 12465760 12441958| 12417170 [ 2111271] FNV1A_Jesteress: 12388725 12369952 12378952 12377230 12380569| 12369952 [ 2121868] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd"351114 lines read
1048576 elements in the table (20 bits)
FNV1A_Hash_Tesla: 252801 238573 234905 233660 237413| 233660 [ 53107] FNV1A_Mantis: 253479 251841 249576 250135 254282| 249576 [ 52712] FNV1A_Meiyan: 234582 238797 235336 239111 238004| 234582 [ 52910] FNV1A_Jesteress: 235985 236268 234515 236577 234398| 234398 [ 52684] FNV1A_Jester: 236458 237823^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd"38936 lines read
131072 elements in the table (17 bits)
FNV1A_Hash_Tesla: 9787 9429 9419 9397 9567| 9397 [ 5176] FNV1A_Mantis: 10181 10205 10302 12326 11283| 10181 [ 5185] FNV1A_Meiyan: 9591 9563 9530 9493 9603| 9493 [ 5224] FNV1A_Jesteress: 15021 10163 9524 9533 9476| 9476 [ 5182] FNV1A_Jester: 9637 9482 9586 9580 9588| 9482 [ 5200] FNV1A_Smaragd: 11148 10041 10030 10048 10077| 10030 [ 5194] FNV1A_Peregrine: 10117 10179 9974 9968 10116| 9968 [ 5277] FNV1A_Whiz: 9616 9877 9679 9614 10118| 9614 [ 5200] FNV1A_Nefertiti: 10428 9939 9870 9925 10214| 9870 [ 5381] FNV-1a: 11328 11370 11388 11335 11216| 11216 [ 5321] Sixtinsensitive+: 10182 10274 9987 10234 10065| 9987 [ 5209] Sixtinsensitive: 10847 10800 10723 12666 10626| 10626 [ 5347] Alfalfa_Rollick: 10855 10116 10006 10077 10019| 10006 [ 5242] Alfalfa: 10422 10409 10427 10586 10946| 10409 [ 5252] Alfalfa_HALF: 10676 10529 10602 10584 10553| 10529 [ 5231] Alfalfa_DWORD: 10679 10883 11059 11093 11386| 10679 [ 5252] Alfalfa_QWORD: 10759 10692 10777 10792 10745| 10692 [ 5252] Bernstein: 11311^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv"584879 lines read
2097152 elements in the table (21 bits)
FNV1A_Hash_Tesla: 393365 373464 374534 375388 373557| 373464 [ 81232] FNV1A_Mantis: 412156 414563 411944 414678 415841| 411944 [ 74643] FNV1A_Meiyan: 383821 387854 387760 390291 387271| 383821 [ 75377] FNV1A_Jesteress: 382663 383224 381564 380962 382974| 380962 [ 75404] FNV1A_Jester: 384581^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Sentence-list_00,032,359_English_The_Holy_Bible.txt"32359 lines read
65536 elements in the table (16 bits)
FNV1A_Hash_Tesla: 27520 26722 27432 28257 26136| 26136 [ 6937] FNV1A_Mantis: 28386 28415 28554 28248 27951| 27951 [ 6925] FNV1A_Meiyan: 27598 27359 27334 27319 27338| 27319 [ 6897] FNV1A_Jesteress: 27888 27308 27483 27310 27274| 27274 [ 6883] FNV1A_Jester: 31040 31119 31379 30732 30596| 30596 [ 6874] FNV1A_Smaragd: 40554 40719 42560 40805 40459| 40459 [ 6849] FNV1A_Peregrine: 30687 30474 30474 31499 32991| 30474 [ 6838] FNV1A_Whiz: 32561 31988 31396 31424 31415| 31396 [ 6874] FNV1A_Nefertiti: 30534 30860 30331 30011 30143| 30011 [ 6878] FNV-1a: 60627 60952 61696 61693 61678| 60627 [ 6840] Sixtinsensitive+: 35231 35090 35459 35186 37219| 35090 [ 6839] Sixtinsensitive: 38508^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash 100MB_as_one_line.TXT1 lines read
4 elements in the table (2 bits)
FNV1A_Hash_Tesla: 194019 197848 195058 193316 193382| 193316 [ 0] FNV1A_Mantis: 234425 236411 236747 234573 234684| 234425 [ 0] FNV1A_Meiyan: 240160 240099 242438 240194 242915| 240099 [ 0] FNV1A_Jesteress: 243338 241873 239935 239735 239124| 239124 [ 0] FNV1A_Jester: 331840^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash IPS.TXT2995394 lines read
8388608 elements in the table (23 bits)
FNV1A_Hash_Tesla: 2289107 2226568 2226390 2234596 2222701| 2222701 [ 691369] FNV1A_Mantis: 2469897 2466911 2466878 2471249 2465728| 2465728 [ 481137] FNV1A_Meiyan: 2290118 2285787 2284061 2291056 2286210| 2284061 [ 593723] FNV1A_Jesteress: 2331767 2324661 2327224 2326173 2325028| 2324661 [ 691369] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt17981107 lines read
67108864 elements in the table (26 bits)
FNV1A_Hash_Tesla: 19108584 18921663 19047468 18902535 18913730| 18902535 [ 4218589] FNV1A_Mantis: 20408048 20413041 20428172 20421550 20570044| 20408048 [ 2208686] FNV1A_Meiyan: 19589642 19586573 19584081 19590689 19588464| 19584081 [ 2209364] FNV1A_Jesteress: 19482570 19662021 19476235 19490673 19499351| 19476235 [ 2208081] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>type googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt ... a_bacillus_and_a a_bacillus_belonging_to a_bacillus_closely_related a_bacillus_closely_resembling a_bacillus_described_by a_bacillus_discovered_by a_bacillus_found_in a_bacillus_from_the a_bacillus_has_been a_bacillus_identical_with a_bacillus_in_the a_bacillus_isolated_from a_bacillus_known_as a_bacillus_obtained_from a_bacillus_of_the a_bacillus_or_a a_bacillus_resembling_that a_bacillus_resembling_the a_bacillus_similar_to a_bacillus_that_is a_bacillus_to_which a_bacillus_which_has a_bacillus_which_he a_bacillus_which_is a_bacillus_which_may a_bacillus_which_they a_bacillus_which_was a_bacillus_whose_growth a_bacillus_with_rounded a_back_alley_and a_back_alley_behind a_back_alley_in a_back_alley_of a_back_alley_off a_back_alley_or a_back_alley_somewhere a_back_alley_that a_back_alley_to a_back_alley_where a_back_alley_with ... D:\_KAZE_new-stuff\VivaNicolaTesla>Regards
I am hunting for an extremely fast integer->integer hashing method for working with a large array of hashtables, in particular, where there are a high number of key (re)inserts, key (re)deletions, and value (re)updates within each hashtable, as large volumes of data are processed. Currently writing in C, but open to inlining assembly if it offers nice gains.
By the will of the hash(ing) gods ... show me the way!
Testing avalanche on integer hash functions
http://baagoe.org/en/wiki/Avalanche_on_integer_hash_functions
I'd love to use some hashes in PHP but not have to enable or install an extension to do so, naturally speed and efficiency would suffer, but the "portability" of the code makes the trade off worth it for my needs. I'd love to have lookup3/SuperFastHash ported to a php function, even One-At-A-Time would be great!