Hash tables are popular data structures for storing key-value pairs. A hash function is used to map the key (usually a string) to an array index. These functions differ from cryptographic hash functions: they should be much faster and do not need to resist preimage attacks. Hashing in large databases is also outside the scope of this article; the benchmark covers medium-size hash tables such as:
- symbol table in a parser,
- IP address table for filtering network traffic,
- the dictionary in a word counting program or a spellchecker.
There are two classes of the functions used in hash tables:
- multiplicative hash functions, which are simple and fast, but have a high number of collisions;
- more complex functions, which have better quality, but take more time to calculate.
Hash table benchmarks usually include theoretical metrics such as the number of collisions or distribution uniformity (see, for example, hash function comparison in the Red Dragon book). Obviously, you will have a better distribution with more complex functions, so they are winners in these benchmarks.
The question is whether using complex functions gives you a faster program. The complex functions require more operations per key, so they can be slower. Is the price of collisions high enough to justify the additional operations?
Multiplicative hash functions
Any multiplicative hash function is a special case of the following algorithm:
UINT HashMultiplicative(const CHAR *key, SIZE_T len) {
    UINT hash = INITIAL_VALUE;
    for(UINT i = 0; i < len; ++i)
        hash = M * hash + key[i];
    return hash % TABLE_SIZE;
}
(Sometimes an XOR operation is used instead of addition, but it does not make much difference.) The hash functions differ only in the values of INITIAL_VALUE and the multiplier (M). For example, the popular Bernstein's function uses INITIAL_VALUE of 5381 and M of 33; Kernighan and Ritchie's function uses INITIAL_VALUE of 0 and M of 31.
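As a concrete illustration, the two parameter sets named above can be plugged into the generic loop. This is a minimal sketch, not the benchmark's code: it uses `uint32_t` from `<stdint.h>` in place of `UINT`, leaves out the final `% TABLE_SIZE` so that the full 32-bit value stays visible, and the names `hash_multiplicative`, `hash_bernstein`, and `hash_kr` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic multiplicative hash: hash = M * hash + key[i] for every byte.
   The arithmetic wraps modulo 2^32 because uint32_t is unsigned. */
static uint32_t hash_multiplicative(const char *key, size_t len,
                                    uint32_t initial_value, uint32_t m)
{
    uint32_t hash = initial_value;
    for (size_t i = 0; i < len; ++i)
        hash = m * hash + (unsigned char)key[i];
    return hash;
}

/* Bernstein's parameters: INITIAL_VALUE = 5381, M = 33. */
uint32_t hash_bernstein(const char *key, size_t len)
{
    return hash_multiplicative(key, len, 5381u, 33u);
}

/* Kernighan and Ritchie's parameters: INITIAL_VALUE = 0, M = 31. */
uint32_t hash_kr(const char *key, size_t len)
{
    return hash_multiplicative(key, len, 0u, 31u);
}
```

With these definitions, `hash_bernstein("too", 3)` evaluates to 0xb88af17 and `hash_kr("too", 3)` to 0x1c154, which agrees with the first row of the table that follows.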
A multiplicative function works by adding together the letters weighted by powers of multiplier. For example, the hash for the word TONE will be:
INITIAL_VALUE * M^4 + 'T' * M^3 + 'O' * M^2 + 'N' * M + 'E'
Let's enter several similar strings and watch the output of the functions:
Bernstein Kernighan
(M=33) (M=31)
too b88af17 1c154
top b88af18 1c155
tor b88af1a 1c157
tpp b88af39 1c174
a000 7c9312d6 2cd22f
a001 7c9312d7 2cd230
a002 7c9312d8 2cd231
a003 7c9312d9 2cd232
a004 7c9312da 2cd233
a005 7c9312db 2cd234
a006 7c9312dc 2cd235
a007 7c9312dd 2cd236
a008 7c9312de 2cd237
a009 7c9312df 2cd238
a010 7c9312f7 2cd24e
a 2b606 61
aa 597727 c20
aaa b885c68 17841
Too and top are different in the last letter only. The letter P is the next one after O, so the values of hash function are different by 1 (1c154 and 1c155, b88af17 and b88af18). Ditto for a000..a009.
Now let's compare top with tpp. Their hashes will be:
INITIAL_VALUE * M^3 + 'T' * M^2 + 'O' * M + 'P'
INITIAL_VALUE * M^3 + 'T' * M^2 + 'P' * M + 'P'
The hashes will be different by M * ('P' - 'O') = M. Similarly, when the first letters are different by x, their hashes will be different by x * M^2.
When there are fewer than 33 possible letters, Bernstein's function will pack them into a number (similar to the Radix40 packing scheme). For example, a hash table of size 33^3 will provide perfect hashing (without any collisions) for all three-letter English words written in lowercase letters. In practice, the words are longer and hash tables are smaller, so there will be some collisions (situations when different strings have the same hash value).
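The perfect-hashing claim can be checked by brute force. The sketch below reads the table size as 33^3 = 35937 slots and enumerates every combination of three lowercase letters (a superset of the real three-letter words). The helper names are hypothetical and the code is only an illustration, not part of the benchmark.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS_33_CUBED 35937u   /* 33 * 33 * 33 */

/* Bernstein's hash: INITIAL_VALUE = 5381, M = 33. */
static uint32_t bernstein(const char *key, size_t len)
{
    uint32_t hash = 5381u;
    for (size_t i = 0; i < len; ++i)
        hash = 33u * hash + (unsigned char)key[i];
    return hash;
}

/* Counts how many three-letter lowercase strings land in a slot
   that an earlier string already occupies. */
unsigned count_three_letter_collisions(void)
{
    static unsigned char used[SLOTS_33_CUBED];
    unsigned collisions = 0;
    char word[3];

    memset(used, 0, sizeof used);
    for (word[0] = 'a'; word[0] <= 'z'; ++word[0])
        for (word[1] = 'a'; word[1] <= 'z'; ++word[1])
            for (word[2] = 'a'; word[2] <= 'z'; ++word[2]) {
                uint32_t slot = bernstein(word, 3) % SLOTS_33_CUBED;
                if (used[slot])
                    ++collisions;
                used[slot] = 1;
            }
    return collisions;
}
```

The function returns 0: two different strings would need letter differences that are multiples of 33, which is impossible when the letters span only 26 values.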
If the string is too long to fit into the 32-bit number, the first letters will still affect the value of the hash function, because the multiplication is done modulo 2^32 (in a 32-bit register), and the multiplier is chosen to have no common divisors with 2^32 (in other words, it must be odd), so the bits will not be just shifted away.
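To see why the multiplier must be odd, compare M = 33 with the even multiplier 32. This is a small sketch with a hypothetical helper `hash_with_multiplier`; the initial value of 0 is chosen only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiplicative hash with a caller-supplied multiplier.
   uint32_t arithmetic wraps modulo 2^32, as in a 32-bit register. */
uint32_t hash_with_multiplier(const char *key, size_t len, uint32_t m)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash = m * hash + (unsigned char)key[i];
    return hash;
}
```

With M = 32, the first character of a nine-letter key is multiplied by 32^8 = 2^40, which is 0 modulo 2^32, so `"Xaaaaaaaa"` and `"Yaaaaaaaa"` get the same hash. With M = 33 the two hashes differ by 33^8 modulo 2^32, which is odd and therefore non-zero.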
There are no exact rules for choosing the multiplier, only some heuristics:
- the multiplier should be large enough to accommodate most of the possible letters (e.g., 3 or 5 is too small);
- the multiplier should be fast to calculate with shifts and additions [e.g., 33 * hash can be calculated as (hash << 5) + hash];
- the multiplier should be odd for the reason explained above;
- prime numbers are good multipliers.
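The shift-and-add heuristic from the list above can be written out for the three multipliers mentioned in this article. The function names are invented for this sketch; an optimizing compiler will usually pick such a sequence on its own when the multiplier is a constant.

```c
#include <stdint.h>

/* 33 * hash = 32 * hash + hash */
uint32_t times33(uint32_t hash) { return (hash << 5) + hash; }

/* 31 * hash = 32 * hash - hash */
uint32_t times31(uint32_t hash) { return (hash << 5) - hash; }

/* 17 * hash = 16 * hash + hash */
uint32_t times17(uint32_t hash) { return (hash << 4) + hash; }
```

Because `uint32_t` arithmetic wraps modulo 2^32, the shifted forms give the same result as the plain multiplication even when the product overflows.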
Complex hash functions
These functions do a good job of mixing together the bits of the source word. A change in one input bit changes about half of the bits in the output (the avalanche effect), so the result looks completely random:
      Paul Hsieh  One At Time
too   3ad11d33    3a9fad1e
top   78b5a877    4c5dd09a
tor   c09e2021    f2aa9d35
tpp   3058996d    d5e9e480
a000  7552599f    ed3859d8
a001  3cc1d896    fef7fd57
a002  c6ff5c9b    08a610b3
a003  dcab7b0c    1a88b478
a004  780c7202    3621ebaa
a005  7eb63e3a    47db8f1d
a006  6b0a7a17    b901717b
a007  cb5cb1ab    caec1550
a008  5c2a15c0    e58d4a92
a009  33339829    f75aee2d
a010  eb1f336e    bd097a6b
a     115ea782    ca2e9442
aa    008ad357    7081738e
aaa   7dfdc310    ae4f22ec
To achieve this behavior, the hash functions perform a lot of shifts, XORs, and additions. But do we need a complex function? What is faster: tolerating the collisions and resolving them with chaining, or avoiding them with a more complex function?
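To give an idea of what "a lot of shifts, XORs, and additions" looks like, here is Bob Jenkins' one-at-a-time hash as it is commonly published, together with a hypothetical helper `bits_changed` for measuring avalanche. This is a sketch: the benchmarked implementation may differ in details such as character signedness, so the outputs are not guaranteed to match the table.

```c
#include <stddef.h>
#include <stdint.h>

/* One-at-a-time hash: each byte is added, then mixed with shifts and XORs;
   three more mixing steps run after the last byte. */
uint32_t one_at_a_time(const char *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i) {
        hash += (unsigned char)key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}

/* Number of bit positions in which two hash values differ. */
unsigned bits_changed(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}
```

Applied to the values listed for too and top in the One At Time column (3a9fad1e and 4c5dd09a), `bits_changed` reports 16 differing bits, exactly half of 32, even though the inputs differ only in the last letter.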
Test conditions
The benchmark uses the separate chaining algorithm for collision resolution. Memory allocation and other "heavy" functions were excluded from the benchmarked code. The RDTSC instruction was used for benchmarking. The test was performed on Pentium-M and Core i5 processors.
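For readers unfamiliar with separate chaining, the following is a minimal sketch of such a table. It is not the benchmark's code: the table size, the fixed node pool (used here so that no `malloc()` call is needed), the K&R-style hash, and all identifiers are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024u   /* number of slots; a power of two so we can mask */
#define MAX_ITEMS  4096u   /* capacity of the preallocated node pool */

struct node {
    const char  *key;      /* not copied: the caller keeps the string alive */
    struct node *next;     /* next item in the same slot's chain */
};

static struct node *slots[TABLE_SIZE];  /* chain heads, initially NULL */
static struct node  pool[MAX_ITEMS];    /* stands in for per-item malloc() */
static size_t       pool_used;

/* K&R-style multiplicative hash (INITIAL_VALUE = 0, M = 31). */
static uint32_t hash_kr(const char *key)
{
    uint32_t hash = 0;
    while (*key)
        hash = 31 * hash + (unsigned char)*key++;
    return hash;
}

/* Returns the stored key, or NULL if it is not in the table. */
const char *chain_find(const char *key)
{
    const struct node *n = slots[hash_kr(key) & (TABLE_SIZE - 1)];
    for (; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->key;
    return NULL;
}

/* Returns 1 if the key was added, 0 if it was present or the pool is full. */
int chain_insert(const char *key)
{
    struct node **head = &slots[hash_kr(key) & (TABLE_SIZE - 1)];
    if (chain_find(key) != NULL || pool_used == MAX_ITEMS)
        return 0;
    struct node *n = &pool[pool_used++];
    n->key = key;
    n->next = *head;       /* push to the front of the chain */
    *head = n;
    return 1;
}
```

Keys that hash to the same slot simply share a linked list, so a collision costs one extra string comparison per item already in that chain.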
The benchmark inserts some keys in the table, then looks them up in the same order as they were inserted. The test data include:
- the list of common words from Wiktionary (500 items);
- the list of Win32 functions from Colorer syntax highlight scheme (1992 items);
- 500 names from a000 to a499 (imitates the names in auto-generated source code);
- the list of common words with a long prefix and postfix;
- all variable names from WordPress 2.3.2 source code in wp-includes folder (1842 names);
- list of all words in Sonnets by W. Shakespeare (imitates a word counting program; 3228 words);
- list of all words in La Peau de chagrin by Balzac (in French, UTF-8 encoding);
- search engine IP addresses (binary).
Results
Core i5 processor
| Function | Words | | Win32 | | Numbers | | Prefix | | Postfix | | Variables | | Sonnets | | UTF-8 | | IPv4 | | Avg | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| iSCSI CRC | 65 | [105] | 329 | [415] | 36 | [112] | 84 | [106] | 83 | [92] | 280 | [368] | 408 | [584] | 1964 | [2388] | 322 | [838] | 1.01 | [1.78] |
| Meiyan | 64 | [102] | 328 | [409] | 45 | [125] | 87 | [106] | 85 | [112] | 274 | [350] | 411 | [588] | 1972 | [2377] | 353 | [768] | 1.05 | [1.87] |
| Murmur2 | 72 | [103] | 378 | [415] | 48 | [104] | 109 | [106] | 106 | [111] | 315 | [383] | 450 | [566] | 2183 | [2399] | 399 | [834] | 1.21 | [1.74] |
| XXHfast32 | 78 | [110] | 372 | [420] | 57 | [102] | 88 | [103] | 88 | [106] | 315 | [347] | 473 | [491] | 2323 | [2494] | 463 | [838] | 1.23 | [1.71] |
| SBox | 70 | [91] | 389 | [431] | 46 | [116] | 124 | [108] | 123 | [91] | 304 | [347] | 430 | [526] | 2182 | [2442] | 377 | [836] | 1.23 | [1.78] |
| Larson | 72 | [99] | 401 | [416] | 34 | [16] | 143 | [99] | 141 | [105] | 312 | [366] | 451 | [583] | 2230 | [2447] | 349 | [755] | 1.25 | [1.10] |
| XXHstrong32 | 79 | [109] | 385 | [429] | 58 | [102] | 93 | [102] | 92 | [112] | 321 | [355] | 474 | [491] | 2332 | [2496] | 464 | [838] | 1.25 | [1.72] |
| Sedgewick | 73 | [107] | 417 | [414] | 36 | [48] | 143 | [103] | 143 | [103] | 319 | [348] | 446 | [570] | 2246 | [2437] | 349 | [782] | 1.26 | [1.33] |
| Novak unrolled | 76 | [113] | 404 | [399] | 43 | [90] | 127 | [118] | 125 | [113] | 322 | [342] | 459 | [581] | 2284 | [2430] | 379 | [969] | 1.26 | [1.68] |
| CRC-32 | 70 | [101] | 429 | [426] | 40 | [64] | 146 | [107] | 143 | [94] | 320 | [338] | 443 | [563] | 2231 | [2400] | 357 | [725] | 1.28 | [1.41] |
| Murmur3 | 78 | [101] | 391 | [380] | 54 | [104] | 108 | [103] | 107 | [105] | 331 | [334] | 492 | [555] | 2360 | [2376] | 433 | [783] | 1.28 | [1.69] |
| x65599 | 74 | [111] | 407 | [382] | 45 | [203] | 144 | [107] | 144 | [122] | 316 | [379] | 449 | [560] | 2221 | [2373] | 349 | [846] | 1.29 | [2.45] |
| FNV-1a | 74 | [124] | 408 | [428] | 47 | [108] | 144 | [94] | 144 | [105] | 309 | [374] | 440 | [555] | 2193 | [2446] | 376 | [807] | 1.30 | [1.77] |
| Murmur2A | 79 | [114] | 410 | [433] | 53 | [102] | 117 | [112] | 114 | [109] | 337 | [365] | 494 | [544] | 2377 | [2369] | 429 | [772] | 1.31 | [1.73] |
| Fletcher | 71 | [131] | 352 | [406] | 80 | [460] | 104 | [127] | 100 | [108] | 312 | [507] | 481 | [1052] | 2477 | [4893] | 388 | [1359] | 1.31 | [4.62] |
| K&R | 73 | [106] | 429 | [437] | 47 | [288] | 149 | [94] | 149 | [106] | 324 | [360] | 450 | [561] | 2266 | [2365] | 343 | [831] | 1.32 | [3.00] |
| Paul Hsieh | 80 | [114] | 410 | [420] | 54 | [118] | 123 | [101] | 121 | [100] | 336 | [341] | 496 | [600] | 2351 | [2380] | 433 | [847] | 1.33 | [1.83] |
| Bernstein | 75 | [114] | 428 | [412] | 49 | [288] | 150 | [100] | 150 | [102] | 324 | [353] | 460 | [572] | 2312 | [2380] | 351 | [703] | 1.34 | [2.99] |
| x17 unrolled | 78 | [109] | 446 | [415] | 43 | [24] | 156 | [113] | 153 | [102] | 344 | [368] | 472 | [589] | 2361 | [2392] | 373 | [829] | 1.37 | [1.19] |
| lookup3 | 83 | [101] | 459 | [412] | 55 | [97] | 140 | [101] | 137 | [95] | 359 | [361] | 526 | [550] | 2480 | [2392] | 427 | [834] | 1.42 | [1.65] |
| MaPrime2c | 79 | [103] | 459 | [426] | 50 | [106] | 155 | [91] | 155 | [106] | 349 | [349] | 486 | [550] | 2493 | [2406] | 406 | [865] | 1.42 | [1.73] |
| Ramakrishna | 80 | [108] | 513 | [409] | 44 | [91] | 189 | [125] | 186 | [103] | 370 | [360] | 483 | [528] | 2565 | [2383] | 380 | [840] | 1.51 | [1.66] |
| One At Time | 85 | [105] | 562 | [421] | 58 | [110] | 221 | [97] | 220 | [103] | 392 | [364] | 511 | [545] | 2659 | [2346] | 459 | [795] | 1.72 | [1.75] |
| Arash Partow | 83 | [101] | 560 | [435] | 71 | [420] | 215 | [98] | 212 | [85] | 392 | [355] | 507 | [570] | 2638 | [2372] | 407 | [779] | 1.72 | [3.88] |
| Weinberger | 87 | [104] | 590 | [422] | 37 | [100] | 254 | [111] | 273 | [117] | 398 | [364] | 541 | [712] | 2734 | [2547] | 419 | [744] | 1.78 | [1.75] |
| Hanson | 73 | [118] | 417 | [649] | 45 | [112] | 123 | [118] | 1207 | [499] | 318 | [435] | 448 | [592] | 2324 | [2890] | 370 | [833] | 2.70 | [2.46] |
Pentium-M processor
| Function | Words | | Win32 | | Numbers | | Prefix | | Postfix | | Variables | | Sonnets | | UTF-8 | | IPv4 | | Avg | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Meiyan | 80 | [102] | 426 | [409] | 56 | [125] | 123 | [106] | 121 | [112] | 354 | [350] | 525 | [588] | 2443 | [2377] | 445 | [768] | 1.02 | [1.87] |
| Novak unrolled | 90 | [113] | 517 | [399] | 56 | [90] | 169 | [118] | 164 | [113] | 398 | [342] | 575 | [581] | 2716 | [2430] | 482 | [969] | 1.18 | [1.68] |
| Fletcher | 84 | [131] | 444 | [406] | 102 | [460] | 140 | [127] | 133 | [108] | 374 | [507] | 592 | [1052] | 2891 | [4893] | 513 | [1359] | 1.21 | [4.62] |
| SBox | 88 | [91] | 552 | [431] | 57 | [116] | 181 | [108] | 178 | [91] | 414 | [347] | 560 | [526] | 2814 | [2442] | 472 | [836] | 1.22 | [1.78] |
| Murmur2 | 97 | [103] | 532 | [415] | 65 | [104] | 165 | [106] | 162 | [111] | 434 | [383] | 622 | [566] | 2948 | [2399] | 537 | [834] | 1.25 | [1.74] |
| CRC-32 | 90 | [101] | 565 | [426] | 55 | [64] | 198 | [107] | 192 | [94] | 427 | [338] | 590 | [563] | 2842 | [2400] | 469 | [725] | 1.26 | [1.41] |
| x17 unrolled | 93 | [109] | 593 | [415] | 52 | [24] | 214 | [113] | 208 | [102] | 434 | [368] | 593 | [589] | 2867 | [2392] | 486 | [829] | 1.30 | [1.19] |
| lookup3 | 94 | [101] | 565 | [412] | 70 | [97] | 189 | [101] | 182 | [95] | 432 | [361] | 631 | [550] | 2943 | [2392] | 572 | [834] | 1.32 | [1.65] |
| K&R | 93 | [106] | 619 | [437] | 58 | [288] | 221 | [94] | 218 | [106] | 442 | [360] | 587 | [561] | 2961 | [2365] | 447 | [831] | 1.33 | [3.00] |
| Larson | 95 | [99] | 631 | [416] | 49 | [16] | 231 | [99] | 228 | [105] | 455 | [366] | 599 | [583] | 3027 | [2447] | 469 | [755] | 1.35 | [1.10] |
| XXHfast32 | 108 | [110] | 546 | [420] | 86 | [102] | 139 | [103] | 136 | [106] | 459 | [347] | 681 | [491] | 3259 | [2494] | 717 | [838] | 1.35 | [1.71] |
| Murmur3 | 108 | [101] | 561 | [380] | 74 | [104] | 167 | [103] | 165 | [105] | 468 | [334] | 700 | [555] | 3259 | [2376] | 604 | [783] | 1.36 | [1.69] |
| Bernstein | 97 | [114] | 622 | [412] | 61 | [288] | 225 | [100] | 222 | [102] | 448 | [353] | 609 | [572] | 3053 | [2380] | 469 | [703] | 1.37 | [2.99] |
| XXHstrong32 | 108 | [109] | 558 | [429] | 86 | [102] | 150 | [102] | 147 | [112] | 460 | [355] | 682 | [491] | 3262 | [2496] | 714 | [838] | 1.38 | [1.72] |
| x65599 | 99 | [111] | 628 | [382] | 61 | [203] | 234 | [107] | 232 | [122] | 459 | [379] | 630 | [560] | 3097 | [2373] | 471 | [846] | 1.40 | [2.45] |
| Paul Hsieh | 106 | [114] | 576 | [420] | 82 | [118] | 183 | [101] | 178 | [100] | 456 | [341] | 678 | [600] | 3154 | [2380] | 670 | [847] | 1.41 | [1.83] |
| Sedgewick | 101 | [107] | 667 | [414] | 52 | [48] | 245 | [103] | 242 | [103] | 478 | [348] | 630 | [570] | 3204 | [2437] | 475 | [782] | 1.42 | [1.33] |
| Murmur2A | 113 | [114] | 598 | [433] | 78 | [102] | 183 | [112] | 178 | [109] | 488 | [365] | 719 | [544] | 3380 | [2369] | 651 | [772] | 1.44 | [1.73] |
| FNV-1a | 102 | [124] | 660 | [428] | 62 | [108] | 239 | [94] | 237 | [105] | 473 | [374] | 627 | [555] | 3140 | [2446] | 516 | [807] | 1.44 | [1.77] |
| MaPrime2c | 108 | [103] | 705 | [426] | 65 | [106] | 255 | [91] | 254 | [106] | 508 | [349] | 674 | [550] | 3413 | [2406] | 542 | [865] | 1.54 | [1.73] |
| Ramakrishna | 108 | [108] | 728 | [409] | 61 | [91] | 278 | [125] | 272 | [103] | 511 | [360] | 660 | [528] | 3378 | [2383] | 517 | [840] | 1.56 | [1.66] |
| Arash Partow | 106 | [101] | 739 | [435] | 93 | [420] | 280 | [98] | 275 | [85] | 514 | [355] | 671 | [570] | 3332 | [2372] | 543 | [779] | 1.65 | [3.88] |
| One At Time | 118 | [105] | 830 | [421] | 81 | [110] | 321 | [97] | 319 | [103] | 578 | [364] | 741 | [545] | 3809 | [2346] | 657 | [795] | 1.82 | [1.75] |
| Weinberger | 119 | [104] | 956 | [422] | 54 | [100] | 375 | [111] | 379 | [117] | 614 | [364] | 745 | [712] | 3973 | [2547] | 560 | [744] | 1.89 | [1.75] |
| Hanson | 86 | [118] | 531 | [649] | 55 | [112] | 168 | [118] | 1722 | [499] | 393 | [435] | 549 | [592] | 2742 | [2890] | 463 | [833] | 2.60 | [2.46] |
Each cell includes the execution time, then the number of collisions in square brackets. Execution time is expressed in thousands of clock cycles (a lower number is better). The Avg column contains the average normalized execution time (and the number of collisions).
The function by Kernighan and Ritchie is from their famous book "The C Programming Language", 2nd edition; Weinberger's hash and the hash with multiplier 65599 are from the Red Dragon book. The latter function is used in gawk, sdbm, and other Linux programs. x17 is the function by Peter Kankowski (multiplier = 17; 32 is subtracted from each letter code).
As you can see from the table, the function with the lowest number of collisions is not always the fastest one.
Results on a large data set (list of all words in English Wikipedia, 12.5 million words, from the benchmark by Georgi 'Sanmayce'):
Core i5 processor
| Function | Wikipedia | | Avg | |
|---|---|---|---|---|
| iSCSI CRC | 5725944 | [2077725] | 1.00 | [1.00] |
| Meiyan | 5829105 | [2111271] | 1.02 | [1.02] |
| Murmur2 | 6313466 | [2081476] | 1.10 | [1.00] |
| Larson | 6403975 | [2080111] | 1.12 | [1.00] |
| Murmur3 | 6492620 | [2082084] | 1.13 | [1.00] |
| x65599 | 6479417 | [2102893] | 1.13 | [1.01] |
| FNV-1a | 6599423 | [2081195] | 1.15 | [1.00] |
| SBox | 6964673 | [2084018] | 1.22 | [1.00] |
| Hanson | 7007689 | [2129832] | 1.22 | [1.03] |
| CRC-32 | 7016147 | [2075088] | 1.23 | [1.00] |
| Sedgewick | 7060691 | [2080640] | 1.23 | [1.00] |
| XXHfast32 | 7078804 | [2084164] | 1.24 | [1.00] |
| K&R | 7109841 | [2083145] | 1.24 | [1.00] |
| XXHstrong32 | 7168788 | [2084514] | 1.25 | [1.00] |
| Bernstein | 7247096 | [2074237] | 1.27 | [1.00] |
| lookup3 | 7342986 | [2084889] | 1.28 | [1.01] |
| Murmur2A | 7376650 | [2081370] | 1.29 | [1.00] |
| Paul Hsieh | 7387317 | [2180206] | 1.29 | [1.05] |
| x17 unrolled | 7410443 | [2410605] | 1.29 | [1.16] |
| Ramakrishna | 8172670 | [2093253] | 1.43 | [1.01] |
| One At Time | 8338799 | [2087861] | 1.46 | [1.01] |
| MaPrime2c | 8428492 | [2084467] | 1.47 | [1.00] |
| Arash Partow | 8503299 | [2084572] | 1.49 | [1.00] |
| Weinberger | 9416340 | [3541181] | 1.64 | [1.71] |
| Novak unrolled | 21289919 | [6318611] | 3.72 | [3.05] |
| Fletcher | 22235133 | [9063797] | 3.88 | [4.37] |
Pentium-M processor
| Function | Wikipedia | | Avg | |
|---|---|---|---|---|
| x17 unrolled | 11321744 | [2410605] | 1.00 | [1.16] |
| K&R | 11666050 | [2083145] | 1.03 | [1.00] |
| Bernstein | 11833902 | [2074237] | 1.05 | [1.00] |
| Larson | 11888751 | [2080111] | 1.05 | [1.00] |
| Sedgewick | 12111839 | [2080640] | 1.07 | [1.00] |
| x65599 | 12144777 | [2102893] | 1.07 | [1.01] |
| Arash Partow | 12235396 | [2084572] | 1.08 | [1.00] |
| Ramakrishna | 12185834 | [2093253] | 1.08 | [1.01] |
| Meiyan | 12269691 | [2111271] | 1.08 | [1.02] |
| CRC-32 | 12604152 | [2075088] | 1.11 | [1.00] |
| Murmur2 | 12713455 | [2081476] | 1.12 | [1.00] |
| SBox | 12716574 | [2084018] | 1.12 | [1.00] |
| Hanson | 12627597 | [2129832] | 1.12 | [1.03] |
| lookup3 | 12791917 | [2084889] | 1.13 | [1.01] |
| FNV-1a | 12868991 | [2081195] | 1.14 | [1.00] |
| Murmur3 | 12916960 | [2082084] | 1.14 | [1.00] |
| XXHfast32 | 12936106 | [2084164] | 1.14 | [1.00] |
| XXHstrong32 | 12950650 | [2084514] | 1.14 | [1.00] |
| Murmur2A | 13068746 | [2081370] | 1.15 | [1.00] |
| Paul Hsieh | 12992315 | [2180206] | 1.15 | [1.05] |
| MaPrime2c | 13348580 | [2084467] | 1.18 | [1.00] |
| One At Time | 13662010 | [2087861] | 1.21 | [1.01] |
| Weinberger | 14592843 | [3541181] | 1.29 | [1.71] |
| Fletcher | 37410790 | [9063797] | 3.30 | [4.37] |
| Novak unrolled | 37769882 | [6318611] | 3.34 | [3.05] |
Some functions were excluded from the benchmark because of very bad performance:
- Adler-32 (slow filling, not suitable as a hash function);
- TwoChars (bad for machine-generated names and variable names that are similar to each other, disastrous for large data sets such as Wikipedia).
The number of collisions depending on the hash table size (for the same data set, thanks to Ace for the idea):
The Red Dragon Book proposes the following formula for evaluating hash function quality:

$$\frac{\sum_{j=0}^{m-1} b_j (b_j + 1) / 2}{(n / 2m)(n + 2m - 1)}$$

where b_j is the number of items in the j-th slot, m is the number of slots, and n is the total number of items. The sum of b_j(b_j + 1) / 2 estimates the number of slots your program should visit to find the required value. The denominator (n / 2m)(n + 2m − 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula should give 1. In reality, a good function is somewhere between 0.95 and 1.05. If it's more, there is a high number of collisions (slow!). If it's less, the function gives fewer collisions than the randomly distributing function, which is not bad.
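The quality measure described above is easy to compute once the chain lengths are known. The sketch below assumes the caller has already counted the items in each slot; the function name `dragon_quality` and its interface are hypothetical.

```c
#include <stddef.h>

/* Evaluates  sum_j b_j (b_j + 1) / 2  divided by  (n / 2m)(n + 2m - 1),
   where bucket_sizes[j] is b_j and m is the number of slots.
   Returns 0.0 for an empty table. */
double dragon_quality(const unsigned *bucket_sizes, size_t m)
{
    double n = 0.0;     /* total number of items */
    double sum = 0.0;   /* numerator of the formula */

    for (size_t j = 0; j < m; ++j) {
        double b = bucket_sizes[j];
        n += b;
        sum += b * (b + 1.0) / 2.0;
    }
    if (m == 0 || n == 0.0)
        return 0.0;
    return sum / ((n / (2.0 * m)) * (n + 2.0 * m - 1.0));
}
```

For four items spread evenly over four slots the result is 4 / 5.5, about 0.73 (better than random); with all four items in one slot it is 10 / 5.5, about 1.82.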
Here are the results for some of our functions:
Conclusion
Complex functions by Paul Hsieh and Bob Jenkins are tuned for long keys, such as the ones in postfix and prefix tests. Note that they do not provide the best number of collisions for these tests, but do have the best time, which means that the functions are faster than the others because of loop unrolling. At the same time, they are suboptimal for short keys (words and sonnets tests).
For a word counting program, a compiler, or another application that typically handles short keys, it's often advantageous to use a simple multiplicative function such as x17 or Larson's hash. However, these functions perform badly on long keys.
Novak showed bad results on the large data set. Jesteress has a high number of collisions in numbers test.
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Download the source code (152 KB, MSVC++)
Variations
XORing high and low part
For table sizes below 2^16, we can improve the quality of the hash function by XORing the high and low words, so that more letters will be taken into account:
return hash ^ (hash >> 16);
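A short sketch shows the effect of this folding step when the table index is taken by masking. The names `fold16` and `table_index` are invented for the example, and the table size is assumed to be a power of two not larger than 2^16.

```c
#include <stdint.h>

/* XOR the high 16 bits into the low 16 bits. */
uint32_t fold16(uint32_t hash)
{
    return hash ^ (hash >> 16);
}

/* Slot index for a table whose size is a power of two (at most 2^16). */
uint32_t table_index(uint32_t hash, uint32_t table_size)
{
    return fold16(hash) & (table_size - 1);
}
```

Without folding, the hashes 0x00010000 and 0x00020000 would both map to slot 0 of a 1024-slot table, because they differ only in the high word. With folding they map to slots 1 and 2.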
Subtracting a constant
The x17 hash function subtracts the code of a space (0x20) from each letter to cut off the control characters in the range 0x00..0x1F. If the hash keys are long and contain only Latin letters and numbers, the letters will be less frequently shifted out, and the overall number of collisions will be lower. You can even subtract 'A' when you know that the keys will be only English words.
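The description of x17 given in this article (multiplier 17, a space subtracted from each letter) translates into the loop below. The initial value of 0 and the absence of any final mixing step are assumptions of this sketch; the benchmarked, unrolled version may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* x17-style hash: hash = 17 * hash + (letter - ' ').
   Assumption: the hash starts at 0 and is returned without folding. */
uint32_t hash_x17(const char *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash = 17 * hash + ((unsigned char)key[i] - ' ');
    return hash;
}
```

For example, `hash_x17("a", 1)` is 'a' - ' ' = 65, and `hash_x17("ab", 2)` is 65 * 17 + 66 = 1171. Replacing `' '` with `'A'` gives the variant for keys that consist only of English letters.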
Using larger multipliers for a compiler
Paul Hsieh noted that large multipliers may provide better results for the hash table in a compiler, because a typical source code contains a lot of one-letter variable names (i, j, s, etc.), and they will collide if the multiplier is less than the number of letters in the alphabet.
The test confirms this assumption: the function by Kernighan & Ritchie (M = 31) has a lower number of collisions than x17 (M = 17), but the latter is still faster (see the Variables column in the table above).
Setting hash table size to a prime number
A test showed that the number of collisions will usually be lower if you use a prime, but the calculations modulo a prime take much more time than the calculations for a power of 2, so this method is impractical. Even replacing division with multiplication by reciprocal values does not help here:
| Function | Words | | Win32 | | Numbers | | Prefix | | Postfix | | Variables | | Shakespeare | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bernstein % 2^K | 145 | [261] | 880 | [889] | 426 | [8030] | 326 | [214] | 316 | [226] | 649 | [697] | 874 | [1131] |
| Bernstein % prime | 186 | [221] | 1049 | [995] | 445 | [5621] | 364 | [194] | 357 | [217] | 805 | [800] | 1123 | [1051] |
| Bernstein optimized mod | 160 | [221] | 960 | [995] | 416 | [5621] | 341 | [194] | 334 | [217] | 722 | [800] | 969 | [1051] |
| x17 % 2^K | 137 | [193] | 847 | [1002] | 81 | [340] | 314 | [244] | 300 | [228] | 641 | [863] | 832 | [1012] |
| x17 % prime | 173 | [256] | 1010 | [1026] | 104 | [324] | 356 | [246] | 339 | [216] | 760 | [760] | 1046 | [1064] |
| x17 optimized mod | 155 | [256] | 915 | [1026] | 96 | [324] | 330 | [246] | 315 | [216] | 691 | [760] | 930 | [1064] |
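The two ways of reducing a 32-bit hash to a slot index that the table compares can be sketched as follows. The function names are hypothetical; the sketch only shows where the cost difference comes from and makes no timing claims of its own.

```c
#include <stdint.h>

/* Table size 2^k (k < 32): the index is the low k bits, a single AND. */
uint32_t index_pow2(uint32_t hash, unsigned k)
{
    return hash & ((1u << k) - 1u);
}

/* Prime table size: the index is a remainder, which needs a division
   (or a multiplication by a reciprocal plus a correction step). */
uint32_t index_prime(uint32_t hash, uint32_t prime)
{
    return hash % prime;
}
```

For the hash 0x12345 (74565 in decimal), a 1024-slot table uses slot 0x345, while a table with 1021 slots (a prime) uses slot 32.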
Implementing open addressing vs. separate chaining
With open addressing, most hash functions show awkward clustering behavior in "Numbers" test:
| | Bernst. | K&R | x17 unroll | x65599 | FNV | Univ | Weinb. | Hsieh | One-at | Lookup3 | Partow | CRC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OA | 426 | 81 | 84 | 207 | 88 | 91 | 273 | 110 | 103 | 92 | 1042 | 79 |
| | [8030] | [20810] | [340] | [3158] | [207] | [480] | [4360] | [342] | [267] | [205] | [20860] | [96] |
| 32-bit | 179 | 69 | 74 | 114 | 86 | 80 | 125 | 105 | 99 | 92 | 347 | 82 |
| | [8030] | [20810] | [340] | [3158] | [207] | [480] | [4360] | [342] | [267] | [205] | [20860] | [96] |
| chain | 92 | 68 | 73 | 82 | 88 | 84 | 73 | 107 | 99 | 95 | 149 | 84 |
| | [500] | [500] | [24] | [258] | [124] | [48] | [100] | [138] | [131] | [108] | [1530] | [64] |
You can avoid the worst case by using chaining for collision resolution. However, chaining requires more memory for the next item pointers, so the performance improvement does not come for free. A custom memory allocator usually has to be written, because calling malloc() for a large number of small structures is suboptimal.
Some implementations (e.g., hash table in Python interpreter) store a full 32-bit hash with the item to speed up the string comparison, but this is less effective than chaining.
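The idea of storing the full 32-bit hash next to each item can be sketched with a small open-addressing table. Everything here is an assumption made for illustration (linear probing, a fixed table size, a K&R-style hash, and the identifiers); it is not the Python implementation and not the benchmark's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OA_SIZE 1024u      /* number of cells; a power of two */

struct cell {
    const char *key;       /* NULL marks an empty cell */
    uint32_t    hash;      /* full 32-bit hash, kept next to the key */
};

static struct cell cells[OA_SIZE];
static size_t      cells_used;

/* K&R-style multiplicative hash (INITIAL_VALUE = 0, M = 31). */
static uint32_t hash_kr(const char *key)
{
    uint32_t hash = 0;
    while (*key)
        hash = 31 * hash + (unsigned char)*key++;
    return hash;
}

/* Returns the cell that holds key, or the empty cell where it belongs.
   Terminates because oa_insert always leaves at least one cell empty. */
static struct cell *oa_probe(const char *key, uint32_t hash)
{
    uint32_t i = hash & (OA_SIZE - 1);
    while (cells[i].key != NULL) {
        /* Compare the 32-bit hashes first; strcmp runs only on a match. */
        if (cells[i].hash == hash && strcmp(cells[i].key, key) == 0)
            break;
        i = (i + 1) & (OA_SIZE - 1);   /* linear probing */
    }
    return &cells[i];
}

int oa_contains(const char *key)
{
    return oa_probe(key, hash_kr(key))->key != NULL;
}

/* Returns 1 if added, 0 if already present or the table is nearly full. */
int oa_insert(const char *key)
{
    uint32_t hash = hash_kr(key);
    struct cell *c = oa_probe(key, hash);
    if (c->key != NULL || cells_used + 1 >= OA_SIZE)
        return 0;
    c->key = key;
    c->hash = hash;
    ++cells_used;
    return 1;
}
```

When a probe lands on a cell occupied by a different key, the integer comparison almost always rejects it without touching the string, at the price of four extra bytes per cell.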

192 comments
Have you tried to compare the hash-functions against CRC32? That would be interesting!
Some DSP's can already do galois multiplies which is the slow part of CRC. For PC we'll have cheap CRC in the future when the new SSE becomes mainstream (hopefully).
Cool blog btw..
Most of the functions presented there produce entropy of some level at a quantity of 32 bits. What you are doing by mod'ing by 1024 is ignoring the 22 higher bits. You should integrate them back into the result somehow.
A true test would not quantize the values. Instead it would just make a list of already generated values and see if those generated values occur again within the period. The period being the size of the "common words" etc.
Another good test is to see the avalanching abilities of the functions. In this case you only change 1 bit in the input and then see how many of the output bits change. The average should be close to about half the number of output bits(eg: 16 bits in the case of 32-bit outputs)
Another good thing to remember is to use random values rather than only English or some such, as you will see that in the case of English only certain bits in a byte are likely to change. How many times do we use control characters in English words?
I believe if you follow the above you may see different results...
My purpose was to benchmark the hash functions in close-to-real-life scenario. Theoretical tests will give a completely different result.
For example, cryptographic hashes such as MD5 will give a perfect distribution, but they are impractically slow for hash tables. Good results in a theoretical test do not always mean a fast hash function.
So, even if the K&R idea is close to the reinvented "simple steps for each character, complex step at the end", the function they published is still not good enough for your "numbers" test, but otherwise is not so bad. And using primes is not such a good idea.
But if I remember, the "K&R" code you use is still not the "from the book" one as they don't xor higher and lower bits?
And also, if I properly remember the K&R book, they use what Wikipedia article about "hash table" names "separate chaining". Would you compare how everything behaves then? Of course, for good results, you shouldn't allocate each new node with a separate malloc call.
| | Bernstein | K&R | x17 | x65599 | FNV-1a |
|---|---|---|---|---|---|
| Words without XOR | 151 [308] | 147 [196] | 140 [240] | 141 [293] | 152 [272] |
| Words with XOR | 147 [261] | 142 [194] | 137 [193] | 139 [250] | 151 [262] |
| Win32 without XOR | 893 [1087] | 892 [932] | 839 [880] | 866 [1049] | 963 [1076] |
| Win32 with XOR | 889 [889] | 897 [1006] | 847 [1002] | 843 [816] | 951 [1021] |
| Numbers without XOR | 825 [18330] | 869 [20810] | 76 [360] | 298 [5400] | 91 [409] |
| Numbers with XOR | 453 [8030] | 842 [19533] | 81 [340] | 205 [3158] | 88 [207] |
| Prefix without XOR | 333 [262] | 329 [192] | 316 [235] | 329 [251] | 374 [260] |
| Prefix with XOR | 328 [214] | 339 [274] | 314 [244] | 319 [200] | 368 [233] |
| Postfix without XOR | 320 [262] | 320 [234] | 304 [237] | 318 [265] | 355 [211] |
| Postfix with XOR | 316 [226] | 323 [256] | 300 [228] | 316 [316] | 356 [237] |
| Variables without XOR | 671 [948] | 668 [762] | 654 [1109] | 642 [883] | 697 [787] |
| Variables with XOR | 658 [697] | 667 [811] | 641 [863] | 641 [904] | 693 [790] |
| Shakespeare without XOR | 882 [1151] | 915 [1168] | 844 [1019] | 837 [1135] | 890 [982] |
| Shakespeare with XOR | 884 [1131] | 893 [1188] | 832 [1012] | 838 [1111] | 908 [1063] |

For K&R function, the version without XOR seems to be better, so I will use it in future tests. FNV authors recommend using XOR, though their function often works better without it. XORing also does not help complex functions, because they already mix the bits.
The choice between separate chaining and open addressing depends on your task. If you need add-only hash table (a symbols table in compiler, a word counting program, etc.), open addressing will be faster because of better data locality for caching.
If you need to delete items from the table, separate chaining may be faster. Or may be not: note that in Python dictionary, they use some complex variation of open addressing, not separate chaining (see Beautiful code book). I will not do the tests, because the results will depend on the usage pattern of a particular application (how often does it delete items? which items? how many times is the table resized? etc.) You should benchmark it on some specific task.
I always used separate chaining for my hashes. Unless you know at the start how big hash you need, separate chaining behaves better.
| Kernighan&Ritchie | x17 |
|---|---|
| 179 [234] | 179 [198] |
| 971 [1022] | 971 [982] |
| 108 [1000] | 84 [48] |
| 311 [226] | 310 [214] |
| 299 [234] | 299 [226] |
| 773 [822] | 782 [868] |
| 1116 [1282] | 1137 [1276] |

As expected, x17 is noticeably faster only in your "Numbers" test set. Otherwise the numbers of collisions are very similar. Also note that the pathological behaviour is fully avoided.

| Kernighan&Ritchie | x17 |
|---|---|
| 186 [196] | 183 [193] |
| 987 [932] | 992 [1002] |
| 1056 [20810] | 109 [340] |
| 321 [192] | 329 [244] |
| 304 [234] | 304 [228] |
| 793 [762] | 801 [863] |
| 1153 [1168] | 1132 [1012] |

Could you please share your code via an upload site (such as RapidShare.de or Zalil.ru)?
Now, your code depends a lot on knowing exactly how many strings it has to process before it determines the table size. And if we use more elements for the same table size my program will run circles around yours.
I've checked now the wikipedia entry I've mentioned (the first time I've just looked up how they call "chaining" and didn't read anything) and I can just tell you -- don't believe them when they write:
"in most cases the differences between these algorithms is marginal, and other considerations typically come into play." ( false )
It looks to me that the author of wikipedia article "invented" that conclusion (that's against the wikipedia policy, but wikipedia shouldn't be used as authoritative reference anyway).
It seems that your statements "The choice between separate chaining and open addressing depends on your task. If you need add-only hash table (a symbols table in compiler, a word counting program, etc.), open addressing will be faster because of better data locality for caching" are also based on that wikipedia article and not on your actual or thought experiments.
Properly implemented, chaining is always much faster in "harder" cases and has comparable performance otherwise. Using it is more important than trying to be smart with the hash function. And of course, in practice there's seldom any benefit of using anything more complicated than the K&R-style function -- in that aspect you were on the right track. Only for very long strings can there be a benefit from processing more bytes at once during hash index calculation. In any real use, the time spent in hash index calculation is not what makes a program slow; not using chains often is (and of course, using mallocs where they are not needed always slows things down).
I'm sorry that I can't upload the code, but it's easy to repeat the results. I've counted the "collisions" any time the string comparison to equality failed.
Separate chaining needs more memory because of the next item pointers (usually 3 times more than open addressing with the same table size), so people reduce the size of the table to save space. Sedgewick recommends setting table size to 1/5..1/10 of the number of items (on average, 5..10 items in each chain). For open addressing, he uses table size = 2..4 * N.
If you make a sparser table, the search will be faster (a classical time-space compromise), and it's true for both methods. Open addressing would also be faster if you take a larger table. I would use not 1/5..1/10, but somewhere around 1/2..1 for separate chaining. Memory consumption should be the same, so it will be fair.
> I've counted the "collisions" any time the string comparison to equality failed.
You probably counted them twice (when inserting into the table and when searching in it). That's why your figures for collisions should not be compared with mine.
> ...based on that wikipedia article and not on your actual or thought experiments
My statements are based on the understanding of computer architecture and caching. Open addressing requires fewer memory accesses. If the string comparison fails, it reads the next item, usually from the same cache line. Separate chaining reads from non-adjacent cache lines.
There is another way to reduce the number of memory accesses (used in the Python dictionary implementation). Let's store the full 32-bit hash in the hash table cells. It would require twice as much memory, but a string comparison will rarely be needed (only when the hashes are equal). I will probably try this.
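The stored-hash idea can be sketched roughly as follows. This is only an illustration under assumptions, not the Python code and not the benchmark code: the names `Cell`, `find_cell`, and `insert_key`, the table size, the K&R-style hash, and linear probing are all picked for the example.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024u              /* assumed power of two */

typedef struct {
    uint32_t    hash;                 /* full 32-bit hash of the key */
    const char *key;                  /* NULL marks an empty cell */
} Cell;

/* K&R-style multiplicative hash (M = 31), without the final reduction. */
static uint32_t hash_full(const char *key)
{
    uint32_t h = 0;
    while (*key)
        h = 31u * h + (unsigned char)*key++;
    return h;
}

/* Returns the cell holding `key`, or the empty cell where it belongs.
   The caller must keep at least one cell empty so the loop terminates. */
static Cell *find_cell(Cell *table, const char *key)
{
    uint32_t h = hash_full(key);
    uint32_t i = h & (TABLE_SIZE - 1);
    while (table[i].key != NULL) {
        /* Compare the cheap integers first; strcmp only on equal hashes. */
        if (table[i].hash == h && strcmp(table[i].key, key) == 0)
            break;
        i = (i + 1) & (TABLE_SIZE - 1);    /* linear probing */
    }
    return &table[i];
}

/* Inserts `key` if absent; returns 1 if inserted, 0 if already present. */
static int insert_key(Cell *table, const char *key)
{
    Cell *c = find_cell(table, key);
    if (c->key != NULL)
        return 0;
    c->hash = hash_full(key);
    c->key = key;
    return 1;
}
```

A probe that hits an occupied cell with a different stored hash moves on without touching the string at all, which is where the saving in memory accesses would come from.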
About prime numbers: I just found in Sedgewick's book that they were used in modular hashing (not to be confused with multiplicative hashing). "Before high-level languages appeared" (i.e., long before your and my birth :), they used something like this:
UINT hash(const CHAR* str) {
return *(UINT*)str % TABLE_SIZE;
}
Here, TABLE_SIZE should not be the power of 2, or else it will mask out the letters in higher bytes. Today this function is not very useful, because division is much slower than other operations on modern processors. It also will show terrible results in the "Prefix" test.
What is more important is that your program with separate chaining eliminates the worst-case ("Numbers" test). It's really good.
> Here, TABLE_SIZE should not be the power of 2, or else it will mask out the letters in higher bytes. Today this function is not very useful, because division is much slower than other operations on modern processors. It also will show terrible results in the "Prefix" test.
Wait wait wait, guys..
No compiler does a slow division to get the modulo of a constant these days. No matter if your constant is a prime or a power of two.
Two multiplications and some adds and subs is all you need for any number. Powers of two just need a logical AND (as everyone here knows).
So using a modulo is not *that* much slower. I think memory access times are much more important.
Besides that: the number of collisions becomes very important if the data you hash is large. For things like filenames you can get away with a simple hash, but try to hash megabyte-large blobs of data, possibly with paging from disk.
You'll be glad about any collision you don't have in these cases.
3 times more? Of what? Certainly not in my code. If you can't imagine it without actually programming, then try to implement it efficiently -- you'll see then.
> Memory consumption should be the same, so it will be fair.
In my view, to be fair
1) Try to fix the table size before you know how many elements you have to hash (you have to do this for almost any real life purpose). With open addr, you will either use insane amounts of memory "just to be sure" or will have to make new tables each time you cross some percentage of table occupancy. That percentage is important, since the rest of the pointers is going to be unused. And making a new table on these points is also expensive (you have to rehash everything again).
2) Forget about nice power of two table sizes -- once you care about the memory you can't allow the luxury of having each table size twice as big as previous.
Everything else is not realistic enough.
> You probably counted them twice (when inserting into the table and when searching in it)
You're right. So you count "collision" only after inserting? Moreover, should "collision" be every cmp or only "did it hit empty table entry or not"? Personally I'd avoid the term "collision" at all -- it would be better to count some specific operations which have some specific costs (cmp, hash calc etc). Or if we observe "table occupancy" it's also not the full information there once the chaining exists. Etc. More or less boring things -- personally I wouldn't mix the speed tests with the evaluations of the properties of the hash functions or the program strategies.
> Separate chaining reads from non-adjacent cache lines.
What makes you think that?
> About prime numbers: (...) they were used in modular hashing (..) "Before high-level languages appeared"
Exactly! That's why people who knew about that even later tried to keep the table sizes uneven. However, I still stumble on texts that preach the major importance of primality (and I never believed them). On the other hand, most of the time you don't have the luxury of having hash tables with only power-of-2 sizes, so the "modulo" calculation is hard to avoid in practice -- that's why I liked to see it in the tests. :) As an added benefit, it demonstrated that dependence of hashes on primes is not really needed.
Btw, I've just checked K&R 2ed to see their example, now you know the whole story behind their
line. :)
(speaking of defines, in your cpp code you should use enums instead)
Re modulo operation, as far as I know, it can't be avoided if the table size is not known in advance.
Numbers / 500 lines / 512 elements in the table
Bernstein:          109 [  500 ]
Kernighan&Ritchie:  111 [  500 ]
x17:                101 [  168 ]
x65599:             134 [  780 ]
Orig code (no chaining, table size twice as big as in chaining):
Numbers / 500 lines / 1024 elements in the table
Bernstein:          516 [ 8030 ]
Kernighan&Ritchie: 1056 [20810 ]
x17:                109 [  340 ]
x65599:             262 [ 3158 ]
It's equivalent to j = i - i / 257 * 257, where i / 257 is calculated with MUL. Unfortunately, this optimization is impossible if table size is not constant.
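To make the j = i - i / 257 * 257 trick concrete, here is a hand-written sketch of what a compiler might do for the constant 257. The function names are invented for the example, and the constant 0xFF00FF01 (which is ceil(2^40 / 257)) is my own derivation, not taken from any compiler output.

```c
#include <stdint.h>

/* i / 257 for any 32-bit i: multiply by ceil(2^40 / 257) = 0xFF00FF01
   in 64-bit arithmetic and keep the bits above position 40. */
static uint32_t div257(uint32_t i)
{
    return (uint32_t)(((uint64_t)i * 0xFF00FF01u) >> 40);
}

/* i % 257 computed as i - (i / 257) * 257, with no DIV instruction. */
static uint32_t mod257(uint32_t i)
{
    return i - div257(i) * 257u;
}
```

The multiplier works because 257 * 0xFF00FF01 = 2^40 + 1, so the rounding error stays too small to change the quotient for any 32-bit input.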
There is also a method for optimizing modulo of 2^N-1 (Mersenne primes). Note that his code is wrong for large numbers. The correct, slower code will be:
UINT mod127(UINT k) {
const UINT p = 127, s = 7;
do {
k = (k & p) + (k >> s);
} while (k > p);
return k == p ? 0 : k;
}
You are right about the case when there is a lot of data. Paul Hsieh said his function is used by Adobe, Apple, and Google. As far as I understand, they use it for large amounts of data (such as the Google web search index). Complex functions perform better in this case, and his function is one of the best for this kind of task.
I'm sorry, not 3 times, but N additional pointers for the next items, where N is the number of strings inserted into the table.
About non-adjacent cache lines. With separate chaining, you need to look up the hash value in the index table, then walk to the key-value pairs (see the picture). They are usually located far from each other in memory. If the value is not found in the first record, you need to follow the next pointer, which also can be far from the first one.
Now imagine open addressing with stored 32-bit hash. In case of collision, it will read from adjacent memory cells (usually, no need for string compare), so it may be faster than other methods. I will test this.
A hybrid scheme looks interesting: items in the collision chain will be close to each other, so it should be fast. Another scheme.
> And making a new table on these points is also expensive (you have to rehash everything again).
Resizing hash tables is another interesting topic. For now, let's assume that we know the approximate table size before starting hashing. For example, in a word counting program you could divide the size of the text file by the average word length to get the estimated size.
Thank you for the results; they are really impressive. Why can't you share your code? Zalil.ru is really easy to use (even with no knowledge of Russian language): just choose your file. It will upload it and give you a link on their server. RapidShare is another popular server.
I still claim you didn't try to imagine how it should be implemented.
> A hybrid scheme
Just another bad wikipedia article with unsubstantiated claims, written by the author who doesn't understand the topic, or maybe just edited to stupidity by the following "contributors" -- I'm not interested to check how it happened.
> Another scheme
This at least doesn't fail the "first glance test".
> For example, in a word counting program you could divide the size of the text file by the average word length to get the estimated size.
The perl script is 25 lines, the input files are 5 GB each. Estimate how big hashes should be constructed during the interpretation of the script on a 256 MB machine.
> Why can't you share your code?
My code is not important. I know you were able to understand everything (even without actually programming!) if you didn't stick to your initial idea and code.
You can simply calculate the fast modulo using reciprocals yourself. No need to rely on the compiler optimization.
The code to find the magic constants for reciprocal multiplication can be found here: http://www.hackersdelight.org
The hashtable size does not change that often, so it's not a big deal.
Btw, I may be wrong, but I remember that compilers generate better code for the optimized division when you work with unsigned values for the index and the prime-constant.
Perhaps you could use the following code:
#include
#include
#include
#include
#include
When you first said about modulo optimization, I thought that it's possible to generate the code on the fly. In this case, we can use faster sequences for some divisors instead of general and slower code, but code generation can slow down the whole program. Another solution is to select one of 32 pre-generated functions for prime divisors near to 2^K (one function for each possible value of K).
In most cases, using prime numbers gives a number of collisions similar to that of 2^K table sizes (see the table above), so I'm not sure if using these tricks is worth the cost. It will not be much faster than the original program, but it's interesting and I will possibly try this. Thank you very much for your contribution.
> the input files are 5 GB each. Estimate how big hashes should be constructed...
I agree with you, it should have some reasonable limit on hash table size. However, the basic idea is correct. It makes a difference whether you count words in a 20 KB article or in a multi-megabyte book such as "War and Peace". You can use the file size to estimate the size of the table and avoid resizing it too often.
Your function shows mediocre speed in my tests because of using ((i & 1) == 0) condition that was not optimized to branchless code by the compiler. Weinberger's function has the same problem (on old computers, the cost of branch misprediction was lower, so I guess his function was quite fast for his time). On modern computers, branches on random data should be avoided in inner loops whenever possible.
In other aspects (the number of collisions) your function is good and shows quite nice results. You should try to avoid the branch inside the loop.
Finally, this "cache-hits bla bla bla" what somebody wrote and you stuck to is a nonsense, as I've already said. Check what happens in OA collisions: the code can't just play with pointers, it must do string comparisons, so OA in that case is not better.
Just start implementing C.
Thanks for the comments and advice on my function. I'm aware of the speed issues in the published versions; the only reason why it and the other hash functions on my site are written like that is for explanatory purposes. In fact, most of the hash functions on the page can be sped up trivially by unrolling the loops somewhat.
For example, when I use my hash function in real life I remove the if statement and process 2 or 4 or 8 bytes at a time rather than 1 byte per loop. In any case a good optimizing compiler should see that the branching statement is not related to the data but the iterator and from there should be able to come up with a decent execution path. An example of how this optimization can be done:
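#include <cstddef>

unsigned int ap_hash(const char* begin, const char* end)
{
    unsigned int hash = 0xAAAAAAAA;
    // Number of bytes to hash
    unsigned int length = static_cast<unsigned int>(end - begin);
    // Each round consumes two bytes, so the per-byte parity test disappears
    unsigned int rounds = length / 2;
    const char* it = begin;
    for(std::size_t r = 0; r < rounds; ++r)
    {
        hash ^= (hash << 7) ^ (*it++) * (hash >> 3);
        hash ^= ~((hash << 11) + (*it++) ^ (hash >> 5));
    }
    // Odd length: one trailing byte gets the "even position" step
    if (1 == (length & 0x01))
    {
        hash ^= (hash << 7) ^ (*it) * (hash >> 3);
    }
    return hash;
}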
Note: the above can be done to most of the hash functions presented.
That said a purely "collisions oriented" hash function test suite would be great to have, I'd be willing to collaborate with you and others here to come up with a set of tests (BJs or PHs) to definitively test hash functions.
The reason why I say this comes from experience: hash functions that are good from a collisions point of view are easier to optimize for speed than hash functions that are good from a speed point of view are to upgrade for fewer collisions.
> In any case a good optimizing compiler should see that the branching statement is not related to the data but the iterator and from there should be able to come up with a decent execution path.
Peter uses MS compiler. I've tried with gcc 4, and it also can't make such optimization. Which compiler can?
>>Which compiler can?
I'm not sure if any compilers do at this point, that is why I reorganize things. The point I was trying to make was that one-at-a-time hash's should have their loops unrolled (where possible).
About the test suite. What do you think about finding the average collision chain length in addition to the number of collisions?
There are probably other valuable metrics. I'm not so good at the math aspects of hash functions (programming was always more interesting for me :), so if you have other ideas, please propose them. You seem to have more experience in math than me. BTW, what are BJs and PHs?
> The point I was trying to make was that one-at-a-time hash's should have their loops unrolled (where possible).
Thank you, I will try to unroll other functions, too.
You did a good job identifying good examples of the weaknesses of OA with linear probing.
Regarding investigation of hash functions and algorithm properties independently of speed, did you know you can do integer arithmetic in Perl just like in C?
sub myh {
    use integer; # important
    my $h = 0;
    for my $i ( 0..9 ) {
        $h = $h * 33 + $i;
        my $v = $h & 0x1ff;
        print "$i\t$v\n";
    }
}
myh();
gives
>>> In any case a good optimizing compiler should
>> Which compiler can?
> I'm not sure if any compilers do at this point,
Thanks. Which one do you use? I guess MS?
Anybody to try Intel?
When I said Intel I was referring to the "Intel Compiler" (which is distributed with some licence managing -- limiting software even for Linux).
I've tried the latest gcc from 4.1 branch, and Peter uses VS 2005, but as far as I've seen up to now VS 2008 doesn't introduce some significant optimizations on the function level, that's why I asked about Intel Compiler.
Still, it is possible that gcc 4.2 has some new optimizations (I haven't analyzed what's new in 4.2), so it's worth that you try it.
But my question on that topic is still -- is there anybody to try Arash's original code on the Intel Compiler to see if the bit check of i can be moved outside of the loop by compiler alone?
>>Arash, with this optimization, your function looks much better. I will include it in the next benchmark.
Can't wait to see the new results. :)
>>About the test suite. What do you think about finding the average collision chain length in addition to the number of collisions?
Average collision analysis is a moot point with regard to hash functions, for two reasons. First, a hash function is inherently a PRNG, so its behavior is best described as a Poisson process (when was the last time someone wanted to know the average value the Mersenne Twister gives?). Second, average collisions would require a quantizer value, and as you can see this is another debatable issue: what value to use? Should it be prime, or can it be a power of 2 (for efficient "and"ing etc.)?
As Ace suggested, the maximal chain length would be a nice fact to know and a good measure. Knowing the mean chain length and the variance from the mean, then assessing how close that is to the maximal, would be another thing to know, but these should be done with and without quantizers.
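The chain-length measures mentioned here (mean, variance, maximum) are cheap to compute once you have the number of keys per bucket. A minimal sketch, with the `ChainStats` struct and the function name invented for the illustration:

```c
#include <stddef.h>

typedef struct {
    double mean;       /* average chain length over all buckets */
    double variance;   /* population variance of the chain lengths */
    size_t max;        /* longest chain */
} ChainStats;

/* chain_len[i] is the number of keys that landed in bucket i. */
static ChainStats chain_stats(const size_t *chain_len, size_t buckets)
{
    ChainStats s = {0.0, 0.0, 0};
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < buckets; ++i) {
        double n = (double)chain_len[i];
        sum += n;
        sum_sq += n * n;
        if (chain_len[i] > s.max)
            s.max = chain_len[i];
    }
    if (buckets > 0) {
        s.mean = sum / (double)buckets;
        s.variance = sum_sq / (double)buckets - s.mean * s.mean;
    }
    return s;
}
```

Running it once on the full 32-bit hash values bucketed by some neutral rule and once on the final table indexes would give the "with and without quantizers" comparison.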
With regards BJ and PH, Bob Jenkins and Paul Hsieh, they both advocate analysis of avalanching properties - which I agree is important and does reveal a lot more than basic collision analysis. In short coming back to the prng model for hash functions, given some random string of random length, and a hash function that outputs n-bits, a good hash function as far as avalanching goes (not mentioning strict avalanching criteria) is one that has probability of the i'th bit in the output being 1 close to 0.5. A hash function that has some i'th output bit that is predictable eg: always 1 or most times 1 implies that the output bit is either not being included properly in the mixing process or that whatever mixing is occurring results in it being in said way, a simplistic example of this is if the result of a particular bit is the output of an "OR" gate with inverted inputs (C = A or ~A)
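The per-bit criterion described above (each output bit should be 1 about half of the time) can be checked with a few lines of code. This sketch only measures bit frequencies over a set of already computed 32-bit hashes; it is not a full avalanche test, which would also flip input bits. The function name is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* For each of the 32 output bits, stores in freq[bit] the fraction of
   hash values in which that bit is set. */
static void bit_frequencies(const uint32_t *hashes, size_t count,
                            double freq[32])
{
    for (int bit = 0; bit < 32; ++bit) {
        size_t ones = 0;
        for (size_t i = 0; i < count; ++i)
            ones += (hashes[i] >> bit) & 1u;
        freq[bit] = count ? (double)ones / (double)count : 0.0;
    }
}
```

A bit whose frequency sits far from 0.5 over a large, varied key set is a candidate for the "not mixed properly" diagnosis.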
>> Which one do you use? I guess MS?
I use Intel (icc) 10.x, and it doesn't do it. But there is a good reason why: as far as the C++ standard is concerned, there are some issues with regard to reference aliasing that the optimizer in the compiler detects and as a result is not capable of optimizing away. It may change in the 2009 standard ratification if threading is introduced, as according to Sutter, for threads to be a part of the language definition the language should provide some guarantees about the memory model and so on; this would inherently result in positive side effects for such optimizations.
In some cases I find it better to craft the code specifically rather than relying on some black-box of a compiler's optimizer, you may write something and it may look good in contrived test cases, but then when used elsewhere the compiler may see other possible optimizations and as a result do "other" things. I guess that is why even today you still see people implementing data structures such as RB-Trees etc purely as macros.
Chaining can be used where OH can't. So when implementing only one strategy, not implementing chaining is the wrong choice. And once there's chaining, there's no real reason to implement OH. Second, with chaining, most hash functions will have the same effect -- the difference is hard to recognize in most of the cases. But, where the number of collisions is more important than speed (where access to the data is really expensive), not surprisingly, CRC32 proved to be the very safe bet. And for tables with up to 2^16 elements CRC16 can probably be enough.
Now, Peter, your function gives quite consistent good results in speed and number of collisions, at least in your set of tests! It's amazing that 17, even smaller constant than 33 and 31 went unrecognized up to now. Congratulations! Of course, this will not give significant speed up in practice (unless in cases where somebody unwisely used OH), where widely used functions all have the similar speeds.
But you also nicely demonstrated that the effort to develop the complicated functions (excluding CRC of course) was practically unneeded.
Here is some kind of "overall speed scores" based on the normalized results of the chaining tests with your code (I gave all test groups the same weight, that is maybe controversial) with
that is, the table twice as small as in your OH tests.
(In chaining, only one pointer per string can be enough. The string can simply immediately follow the "next" pointer.)
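One way to read the "string immediately follows the next pointer" layout is a C99 flexible array member. The sketch below is an assumption about what is meant, not the commenter's code; `Node`, `chain_push`, and `chain_find` are invented names, and malloc is used per node only for brevity (a single pre-allocated arena would avoid the per-string malloc).

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    struct Node *next;   /* the only pointer stored per string */
    char key[];          /* C99 flexible array member: the string itself */
} Node;

/* Allocates a node with the key stored inline and pushes it on the chain.
   Returns the new head, or NULL if allocation fails (chain unchanged). */
static Node *chain_push(Node *head, const char *key)
{
    size_t len = strlen(key) + 1;
    Node *n = malloc(sizeof(Node) + len);
    if (n == NULL)
        return NULL;
    memcpy(n->key, key, len);
    n->next = head;
    return n;
}

/* Walks the chain and returns the node holding `key`, or NULL. */
static Node *chain_find(Node *head, const char *key)
{
    for (Node *n = head; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}
```

With this layout, following `next` and comparing the key touch the same memory block, so there is no second pointer to chase per item.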
Compiled with /O1
(lookup3 is the only one really dangerous here -- not to be used unless it can be checked that the compiler optimized what was necessary to optimize to even use the function)
Compiled with /O2
Other common cases not covered by your tests are hashing of integers, and hashing of memory addresses. In both cases there is a limited number of bytes, some patterns occur often and there's still the need to hash fast and good.
Hmm, i've wrote about it in your blog before :-]
http://smallcode.weblogs.us/2007/01/05/what-your-compiler-can-do-for-you/
In most cases divide and modulo (together) can be done using only one MUL
My function differs from the others not only by multiplier, but also by subtracting 0x20 from each character, which helps to "pack" more characters in the hash value. Certainly, this will work only for ASCII strings (you should use a different function for random binary data).
Arash said a lot about PRNG-type hash functions (with avalanching effect), but these functions are actually less effective for short strings than non-random multiplicative functions. If the table is small enough, a multiplicative function can pack all input characters into integer, so the hash value will be unique.
Thank you for the overall score figures and especially for lookup3 test. This function is too complicated for both compilers and humans :).
Do you mean hashing 32-bit integers and memory addresses? In this case, different hash functions must be used. Modulo prime should work well, and the optimization proposed by Nils will be very useful.
Arash, thank you very much for these ideas. If I have some free time, I will implement them in my tests. You have a good framework for theoretical tests, so you could try them, too.
Prime quantizers are not needed for hashing strings; ANDing will do better.
About avalanching: multiplicative functions are not random at all, but they often show better results than PRNG-like functions.
Take a look at Python sources. They use a simple multiplicative hash function with M = 1000003 (search for "string_hash" in file stringobject.c). In dictobject.c, they say:
Most hash schemes depend on having a good hash function, in the sense of simulating randomness. Python doesn't, [...] but this isn't necessarily bad.
Read also their notes on optimizing dictionaries.
Thank you again for your help. Your function shows much better results after unrolling, and mine also wins from it :).
> i've wrote about it in your blog before
Peter, sorry, I forgot about this. I knew about "magic numbers" for division, but never thought that it can be applied to modulo. BTW, you will need additional shifts anyway, so it's not only one MUL.
Many years ago, at the time I considered such a modification, I believed it shouldn't bring anything. However I never used OH or searched for pathological examples. Have you tried to get the results without that subtraction?
> Thank you for the overall score figures
That was on Core 2. I've made the same scoring on Pentium III, /O2 and received:
Note that here a function with 1.16 is on average 16% slower than the best one. Of course a bit of reasonable doubt remains as you selected the test sets. :)
In practice the hash functions are called between other processing, so the speed of the actual hash functions can contribute even less to the overall speed. In such cases some shorter function can be better even if in tests like yours it appears slower than some with the bigger footprint. On the other hand, when we really want to minimize the number of collisions, it would be interesting to know a "collision goodness factor" from your tests -- using the same method I get the following scores:
Now, I also considered the possibility that your function gets too good score here since you selected the pathological example and possibly optimized your function only for that. So I removed the "numbers" results and the scores are:
Note that most of the functions are very close. Not counting 'Weinberger', all inside of 7% span!
(Still, I'd rather count the number of comparisons and not "collisions".)
> Modulo prime should work well, and the optimization proposed by Nils will be very useful.
How much is the gain of using these compared to a single DIV in processor cycles? Can that method be implemented to work for any N? Can it be programmed to work without programmer manually testing constants (class FasterDiv) where the constants are calculated once (but in the runtime) and then often used?
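On the question of computing the constants once at run time: one known way to do it (described by Daniel Lemire as "fastmod", which is not the method discussed in this thread) precomputes a 64-bit reciprocal for any nonzero 32-bit divisor. The sketch below is a hedged illustration; the `FastMod` struct and function names are invented, and `unsigned __int128` is a GCC/Clang extension rather than standard C.

```c
#include <stdint.h>

typedef struct {
    uint32_t d;   /* divisor (table size), must be nonzero */
    uint64_t c;   /* precomputed ceil(2^64 / d), modulo 2^64 */
} FastMod;

/* Done once, e.g. when the table is created or resized. */
static FastMod fastmod_init(uint32_t d)
{
    FastMod m;
    m.d = d;
    m.c = UINT64_MAX / d + 1;   /* wraps to 0 for d == 1, which still works */
    return m;
}

/* x % m.d using two multiplications and no division. */
static uint32_t fastmod(uint32_t x, FastMod m)
{
    uint64_t low = m.c * x;     /* fractional part of x / d, scaled by 2^64 */
    return (uint32_t)(((unsigned __int128)low * m.d) >> 64);
}
```

The one real division happens in `fastmod_init`, so the cost is paid only when the table size changes. Whether the two multiplications beat a single DIV depends on the processor and should be measured.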
It was almost always slower:
                 Words      Win32       Numbers      Prefix     Postfix    Variables   Shakespeare
x17              146[237]   890[980]    85 [351]     339[277]   320[234]   684[1196]   875[1059]
x17 - 0x20       138[193]   848[1002]   81 [340]     313[244]   300[228]   640[863]    839[1012]
K&R              143[196]   892[932]    866[20810]   329[192]   320[234]   658[762]    878[1168]
K&R - 0x20       136[187]   842[917]    863[20810]   312[201]   302[207]   636[852]    840[1050]
Bernst.          145[261]   880[889]    426[8030]    326[214]   316[226]   649[697]    874[1131]
Bernst. - 0x20   139[229]   850[903]    468[9295]    319[276]   303[263]   640[756]    841[1025]
> How much is the gain of using these compared to a single DIV in processor cycles?
                 % 1031 and 257   % 127
DIV              39               39
mod_table_size   14               17
All times are in clock cycles; measured with Agner Fog's program.
> Can that method be implemented to work for any N?
Some divisors require 4 additional instructions (compare 257 and 127). If you have a pre-calculated table of divisors close to the powers of 2, you can select only the divisors that give a shorter code. If it gives a longer code, just select the next prime number, e.g., 131 instead of 127.
Another solution is to use CMOV to ignore the result of the 4 additional operations when they are not needed, but this will be a little slower.
> It was almost always slower
I've asked because the method of using multiplication by a constant is widely accepted, but using subtraction is not.
The "collision score tables" which include minus and no minus versions for various mult factors and using chaining look inconclusive:
Without numbers test
with numbers test
It looks to me that the visible "noise" of the results can not actually lead to some useful conclusion. If that operation really has the properties of conditioning the final result better even for these test cases, I'd expect that every function based on the multiplication to benefit from it? Here, K&R got better with subtraction, x65599 and Bernstein got worse.
Also note that "no subtraction" x17 comes much worse, too much to claim that 17 is much better factor than others.
Note also that 'a' - ' ' == 'A'. As it looks like that most of your test samples contain small letter strings, what you were doing is most of the times just making them ALL CAPS strings and testing the *same* functions!
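The case-folding point can be checked directly. In the sketch below, `hash_x17` is a stand-in written for this illustration (multiplier 17, a selectable per-character offset, and an assumed initial value of 0); it is not the benchmark's exact code and the result is not reduced to a table index.

```c
#include <stdint.h>

/* Multiplicative hash with M = 17 that subtracts `offset` from each byte.
   The initial value 0 is an assumption made for this illustration. */
static uint32_t hash_x17(const char *key, unsigned offset)
{
    uint32_t h = 0;
    while (*key)
        h = 17u * h + ((unsigned char)*key++ - offset);
    return h;
}
```

Under ASCII, `hash_x17("tone", 0x20)` and `hash_x17("TONE", 0)` feed exactly the same byte values into the loop, so for purely lower-case keys the subtracting variant is the plain function applied to the upper-cased text.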
The only useful real conclusion I have up to now is that functions with multiplication factors are still the good choice for practical purposes.
Yes, it's the combination of the factor 17 and subtracting that makes it the fastest function.
> functions with multiplication factors are still the good choice for practical purposes.
That was the main point of my first article :).
Ace, what do you think about Python dictionary code?
> it's the combination of the factor 17 and subtracting that makes it the fastest function.
But let me point out again, I believe that given two functions on strings where one processes
char - Const
and the other
char
can both be considered as one and the same function. It doesn't look to me that you proved anything, except that the function gives good results exactly for the set of strings you've selected. Wouldn't just "toggling case" of your set before the hashing result in loss exactly where now the gain is? Do you want to say that you've developed the hash function that does good on strings with mainly 'a'..'z' but not on strings with mainly 'A'..'Z'?
> what do you think about Python dictionary code?
I haven't invested much time, so I can't say much. If I see correctly, they have a special treatment for very small dictionaries -- that's very important and a very good thing to do. Then, if I've understood correctly, they don't use chaining, and they resize the table once it's more than 2/3 full. Whether 2/3 is an optimal limit depends on the efficiency of their collision resolution. In my opinion you've demonstrated that without chaining and with linear probing even 1/2 can be "too full".
No, it will not be slower. Here are the raw results with upper-cased text files (open addressing, Pentium M):
             Words      Win32       Prefix     Postfix    Variables   Shakespeare
Bernstein    148[287]   882[919]    335[267]   319[231]   662[787]    864[1050]
K&R          144[214]   889[933]    334[229]   321[221]   652[655]    865[1029]
x17          139[197]   838[848]    309[192]   300[226]   642[939]    856[1219]
x17 unroll   133[197]   821[848]    302[192]   294[226]   626[939]    823[1219]
x65599       139[213]   860[1090]   323[215]   311[222]   639[860]    839[1129]
FNV-1a       150[222]   956[1021]   369[216]   357[222]   686[655]    903[1067]
Mixed case and lower case:
After conversion to upper case:
x17 should be slower on strings with a lot of "\r\n" and other control characters (though I have not checked this).
http://stochasticgeometry.wordpress.com/2008/03/29/cache-concious-hash-tables/
Actually, the number of collisions that Murmur (and lookup3, and any cryptographic hash) produces is in the range predicted by statistics if the hash is truly random. I don't recall the equation offhand, but your x17 does better than predicted because it's less random. :)
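For reference, one standard form of that prediction: if n keys are thrown uniformly at random into m buckets, the expected number of keys that land in an already occupied bucket is n - m * (1 - (1 - 1/m)^n). The sketch below just evaluates this formula; it assumes "collision" means exactly that, which may differ from how collisions are counted in the benchmark tables.

```c
/* Expected number of keys that land in an already occupied bucket when
   n keys are thrown uniformly at random into m buckets (m > 0):
       n - m * (1 - (1 - 1/m)^n)                                        */
static double expected_collisions(unsigned long n, unsigned long m)
{
    double empty = 1.0;                     /* becomes (1 - 1/m)^n */
    double keep = 1.0 - 1.0 / (double)m;    /* chance one key misses a bucket */
    for (unsigned long i = 0; i < n; ++i)
        empty *= keep;
    return (double)n - (double)m * (1.0 - empty);
}
```

A function that scores well below this figure on a particular key set is exploiting structure in the keys rather than behaving like a random function.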
Statistically good distribution is important in some applications because it allows me to say "My hash function will not produce pathological results with your keyset" with some certainty even if I have no idea what your keyset is. Bob Jenkin's frog.c test is particularly good at producing pathologically bad keysets - it creates large keys that are mostly 0 bits but with a small handful of 1s - that choke most simple-but-fast hash functions. Murmur happens to hit a nice sweet spot where it is simple, fast, and still statistically strong.
UINT32 HashAdler(const CHAR * data, SIZE_T len)
{
UINT32 a = 1, b = 0;
while(len > 0) {
SIZE_T tlen = len > 5550 ? 5550 : len;
len -= tlen;
do {
a += *data++;
b += a;
} while (--tlen);
a %= 65521;
b %= 65521;
}
return (b << 16) | a;
}
But for me the shortest Murmur variant is a bit of cheating. The starts of the strings in C and C++ are *not* aligned to 4 bytes unless typical malloc or new is made for each string or each string is copied to the convenient buffer before the hash is calculated. So I'd consider MurmurHashAligned2 a real function and I'd do the timings on the set of the strings which are packed in memory one after another, without anything between them (which makes the start of each of them unaligned and not predictable). I'd also like to see the comparison between "copy to the nice buffer then do MurmurHash2" and "just do "MurmurHashAligned2" for strings never longer than e.g. 1024 bytes -- when something is longer it's certainly not probable to appear in unaligned input (this test would probably show the quality of L1 cache?).
------
The need for "statistically good distribution" is exactly the reason why "adding constant" to each source byte should not matter, the reason why I didn't like your claim about the goodness of "subtracting constant" in x17.
------
Austin Appleby mentions on
http://murmurhash.googlepages.com/discussion
that he did chi-square and avalanche testing.
Chi-square tests are fundamental tests and should be used for most experiments. In computer science they have of course been used to check the goodness of random generators, probably for as long as those have existed.
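Applied to a hash table, the chi-square statistic compares the number of keys in each bucket with the count a perfectly uniform spread would give. A minimal sketch (the function name is invented; turning the statistic into a p-value is left out):

```c
#include <stddef.h>

/* Pearson's chi-square statistic for bucket counts against a uniform
   expectation of total/buckets keys per bucket. */
static double chi_square(const size_t *count, size_t buckets)
{
    double total = 0.0, chi2 = 0.0;
    for (size_t i = 0; i < buckets; ++i)
        total += (double)count[i];
    if (buckets == 0 || total == 0.0)
        return 0.0;
    double expected = total / (double)buckets;
    for (size_t i = 0; i < buckets; ++i) {
        double diff = (double)count[i] - expected;
        chi2 += diff * diff / expected;
    }
    return chi2;
}
```

For a hash that behaves like a random function, the statistic should come out near the number of buckets minus one; much larger values mean clustering, and much smaller ones mean the spread is more even than chance would give.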
One possible introduction:
http://www.fourmilab.ch/rpkp/experiments/statistics.html
and the program which uses the test to evaluate the quality of "random" stream:
http://www.fourmilab.ch/random/
http://code.google.com/p/google-perftools/
In addition to tcmalloc, the perftools include some interesting hash table implementations. If you can deal with the limitations, intrusive data structures can be much more efficient.
http://code.google.com/p/google-sparsehash/
Nice go for a 4th time around, but I think we previously discussed that, as far as timing is concerned, if the hash functions are not implemented in a similar fashion then it's really not a fair or meaningful experiment. The hash functions from my site are implemented in their most basic definition; for production purposes you wouldn't use them as is, you would try to do things like Duff's device or something similar to what Jenkins does.
Take for example the way you have unrolled my hash function: it's probably the worst way it could have been done. Furthermore, why not djb, pjw or others? So as far as timing is concerned, unless you can get them all on the same footing it is worthless/meaningless to mention times.
The next issue is collisions. I still don't accept your methods as being generally acceptable, though they do seem valid for most situations. As I suggested previously, it may be better to get together on this and define a real set of tests, standard inputs and testing methodologies.
In any case, keep up the good work.
x86 processor can access non-aligned dwords. My previous measurements showed that it's faster to read short strings without alignment, so that you can avoid an additional switch statement before the main loop. Alignment matters only if you are going to use the function on a different processor or to hash some very long strings.
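One portable way to read a dword from an unaligned address is to go through memcpy. The sketch below is an illustration only: load32 and hash_dwords are hypothetical names, and the word-at-a-time multiplicative step is not one of the benchmarked functions. On x86 a fixed-size memcpy like this is normally compiled to a single load, while on processors that fault on unaligned access it remains correct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from an arbitrary (possibly unaligned) address. */
static uint32_t load32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical word-at-a-time multiplicative hash: consumes 4 bytes per
   step without any alignment prologue, then the remaining 0..3 bytes. */
uint32_t hash_dwords(const char *key, size_t len)
{
    uint32_t h = 0;

    while (len >= 4) {
        h = 31 * h + load32(key);
        key += 4;
        len -= 4;
    }
    while (len-- > 0)
        h = 31 * h + (unsigned char)*key++;
    return h;
}
```

The result does not depend on where the string starts in memory, so the same key hashed from an aligned and from an unaligned buffer gives the same value.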
Subtracting a constant matters because you then calculate a modulus of the hash table size, so you effectively throw off some high bits. You can save more information in low bits by subtracting ' ' (if you know that '\n', '\r', and other control characters will not appear in the hashed strings).
Certainly, x17 has nothing common with a statistically good hash function; it just tries to "pack" more characters into a small hash value.
Won, thank you, I'm studying SparseHash now. They used quadratic probing instead of separate chaining. It would be interesting to compare these approaches (I will probably do this in future).
The first problem with Adler and Fletcher is that sum2 will be masked away when calculating the modulus. I tried replacing the last line with the following:
return a ^ b;
and got much better results:
                    Words       Win32        Numbers      Variables    Shakespeare
Adler-32 original   151 [178]   830 [1070]   382 [7688]   725 [1564]   1540 [11686]
Adler-32 (a ^ b)    147 [157]   791 [559]    251 [3624]   655 [719]    1141 [3567]
The second problem is that the characters are not "weighted" (multiplied by different numbers), so that Adler-32("01") = Adler-32("10"), that's why it fails the Numbers test. Ditto for anagrams in Shakespeare's sonnets: Adler-32("heart") = Adler-32("earth").
So, Adler-32 may be a good checksum for compressed data, but I would not use it for hash tables. Murmur is definitely better :)
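The masking effect can be sketched in a few lines. The code below is a plain textbook Adler-32 written for this illustration (the names adler_sums, adler32_original and adler32_xor are hypothetical, not taken from hash.cpp). It assumes the table size is a power of two no larger than 65536, so that only the low 16 bits, which hold the unweighted sum a, decide the bucket.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/* The two Adler-32 running sums: a = 1 + sum of bytes,
   b = sum of all intermediate values of a (both modulo 65521). */
static void adler_sums(const char *key, size_t len, uint32_t *a, uint32_t *b)
{
    size_t i;
    *a = 1;
    *b = 0;
    for (i = 0; i < len; ++i) {
        *a = (*a + (unsigned char)key[i]) % ADLER_MOD;
        *b = (*b + *a) % ADLER_MOD;
    }
}

/* Original layout: b in the high half, a in the low half. */
uint32_t adler32_original(const char *key, size_t len)
{
    uint32_t a, b;
    adler_sums(key, len, &a, &b);
    return (b << 16) | a;
}

/* Variant from the comment above: fold b into the low bits. */
uint32_t adler32_xor(const char *key, size_t len)
{
    uint32_t a, b;
    adler_sums(key, len, &a, &b);
    return a ^ b;
}
```

With the original layout, "01" and "10" differ only in the high half, so they share their low 16 bits; after a ^ b the order-dependent sum b reaches the low bits and the two keys separate.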
No, you just shuffle the values of each input byte (1 becomes 225 etc).
> Certainly, x17 has nothing common with a statistically good hash function; it just tries to "pack" more characters into a small hash value.
Yes, your tests clearly demonstrate that even using simple K&R and separate chaining can often be good enough. Maybe Google searches for a stronger function only because they decided not to use separate chaining?
I haven't analyzed their code, but it can be that they didn't really have to avoid separate chaining -- that they were able to keep the same memory usage, or even have better behaviour in cases when the number of input elements is not known in advance.
I'd still enjoy to see any real life example which demonstrates the real need for statistically strong hash function, provided open addressing is avoided.
http://www.augustana.ca/~mohrj/courses/1999.fall/csc210/lecture_notes/hashing.html
http://www.boost.org/doc/libs/1_35_0/doc/html/intrusive/unordered_set_unordered_multiset.html
Boost, of course, has lots of interesting stuff.
I mentioned Cuckoo hashing as an alternative chaining method to Peter in an e-mail. There are variants that can sustain very high load factors.
http://en.wikipedia.org/wiki/Cuckoo_hashing
For Adler/Fletcher: ah, that makes sense. I suppose the prime mod versions would work much better. But, I don't think it is correct to say that Adler/Fletcher cannot distinguish between permutations (think about how 'b' accumulates). I also don't think a^b is a particularly good mix for Adler/Fletcher. There is the problem with how you compute the bucket from the hash. If you need N bits, maybe it makes sense to combine N/2 bits for a and N/2 bits from B. Maybe this is as simple as taking the middle N bits from an Adler/Fletcher hash, rather than the least N bits.
One variation is to allow for multiple items in a table entry (similar to a multi-way cache) before rehashing those elements. This variation is certainly not unique to cuckoo hashing, but seems to work well with it.
Anyway, my discussion is mostly from memory, but these postings are encouraging me to look into it myself...
For short strings, the higher byte of 'a' is zero, so if we take the middle N bits, lower bits of 'a' will be lost. It makes sense to calculate a ^ (b << (N/2)).
I've just tried a ^ (b << 4) for these test files, and the results were comparable with Murmur:
                    Words       Win32        Numbers      Variables    Shakespeare
Adler-32 original   151 [178]   830 [1070]   382 [7688]   725 [1564]   1540 [11686]
Adler-32 (a ^ b)    147 [157]   791 [559]    251 [3624]   655 [719]    1141 [3567]
Adler-32 a^(b<<4)   146 [135]   789 [475]    98 [ 500]    627 [428]    923 [784]
Thank you very much for your ideas.
AFAIK the tables there also clearly demonstrate the superiority of chaining? See what happens when load factor increases.
So I'd still like to see any good argument to use open addressing, especially since it demands more complex hash function only to be able to perform acceptably and then it even requires using this function more often, and behaves exponentially worse as load factor even approaches 1 (I'd call that lose-lose-exp(lose) scenario :) ).
However, there are some advantages to open addressing. A non-intrusive C++ hash table implementation that uses chaining (e.g. tr1/unordered_map, based on SGI STL hash_map) has to allocate nodes to store the data + the link pointers. This means memory allocations, copies, and indirections that contribute to the constant factor for chained hash table implementations. For typical load factors (1/2 to 2/3), open addressing (google-sparsehash uses quadratic probing) can be faster.
http://google-sparsehash.googlecode.com/svn/trunk/doc/performance.html
There is also the slight advantage that open addressing can be slightly more compact, but I don't know how important this actually is. Sustaining a high load factor is probably much more important than avoiding pointer overhead, so chaining might be even better in this regard. If you really care about compactness, something like Judy might be better:
http://nothings.org/computer/judy/
An intrusive hash table implementation avoids most of the practical problems of chaining, since you avoid all those copies, allocations and indirections. Intrusive tables are much faster than any of the non-intrusive options, but they are not appropriate in every situation. It is not always possible to instrument classes to be used in such an application. The instrumentation overhead affects all instances, not just those in hash tables. Ownership semantics are very different, and that can lead to design subtleties (amplified by concurrency).
Moreover how can the load factor which so influences open addressing be kept constant without introducing even bigger performance penalties, when even slight changes in load factors force rebuild of the whole container?
I agree with you that ownership is very important. Personally I don't think it's smart overusing smart pointers in C++ and that when somebody doesn't want to care about ownership he should not use C++ at all but some language which has "natural" GC. For C and C++, I think the best results are achieved by maintaining a distinction between owning and referring containers. The compiler writers knew about that since forever I guess. And they certainly had to maintain a lot of hashes.
Is it possible for you to explain (if you know) what are the exact design decisions behind "sparse hash" since I fail to grasp them from the site? Only then it can be discussed if there's a possibility for improvement, or what the major contribution of that implementation is.
Thanks in advance.
Open addressing does not require additional information to handle collisions, because the "chaining" is implicit, based on the probing strategy. Objects are copied into the container without need for modification or ancillary data (maybe a little bit to indicate empty/full).
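A minimal sketch of the idea, assuming a fixed power-of-two table of string pointers and triangular-number quadratic probing. The names oa_table, oa_probe and friends are hypothetical, and this is not the google-sparsehash code; deletion, which needs some kind of "deleted" marker, and resizing are left out.

```c
#include <stddef.h>
#include <string.h>

#define OA_SLOTS 16   /* hypothetical size; must be a power of two */

typedef struct {
    const char *keys[OA_SLOTS];   /* NULL marks an empty slot */
} oa_table;

/* K&R-style multiplicative hash (M = 31, initial value 0). */
static unsigned oa_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = 31 * h + (unsigned char)*s++;
    return h;
}

/* Returns the slot that holds key, or the empty slot where it would go;
   -1 if the table is full. Probe offsets 0, 1, 3, 6, ... (triangular
   numbers) visit every slot of a power-of-two table exactly once. */
int oa_probe(const oa_table *t, const char *key)
{
    unsigned pos = oa_hash(key) & (OA_SLOTS - 1);
    unsigned step;

    for (step = 1; step <= OA_SLOTS; ++step) {
        if (t->keys[pos] == NULL || strcmp(t->keys[pos], key) == 0)
            return (int)pos;
        pos = (pos + step) & (OA_SLOTS - 1);
    }
    return -1;
}

int oa_insert(oa_table *t, const char *key)
{
    int pos = oa_probe(t, key);
    if (pos < 0)
        return 0;
    t->keys[pos] = key;
    return 1;
}

int oa_contains(const oa_table *t, const char *key)
{
    int pos = oa_probe(t, key);
    return pos >= 0 && t->keys[pos] != NULL;
}
```

No link pointers are stored anywhere: the "chain" of a key is simply the sequence of slots the probe visits, which is why the hash function is evaluated against more occupied slots as the load factor grows.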
Explicit chaining requires some kind of pointer. Each bucket forms a short linked-list of elements. You can implement that linked-list in several ways. The non-intrusive way (like SGI hash_map) makes a node that is essentially a pair of the object + a pointer to the next node. The intrusive way (like Boost intrusive_unordered_map) is to have a special field within the object itself to point to the next object. So for STL-compatible containers, this isn't necessarily a huge win, since they are value-oriented. However, a non-owning, pointer-oriented intrusive hash map can be very fast because it avoids almost all copies and allocations.
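The intrusive layout can be sketched in C as follows. The type and function names (symbol, sym_table, sym_insert, sym_find) and the bucket count are assumptions for this example; it is not the Boost.Intrusive interface. The table only links objects that the caller owns, so inserting never allocates or copies.

```c
#include <stddef.h>
#include <string.h>

#define BUCKETS 64   /* hypothetical bucket count */

/* The object itself carries the chain link (the intrusive hook),
   so no separate node is allocated per entry. */
typedef struct symbol {
    const char    *name;
    int            value;
    struct symbol *next_in_bucket;
} symbol;

typedef struct {
    symbol *heads[BUCKETS];   /* non-owning pointers to the first object */
} sym_table;

/* Bernstein-style hash (initial value 5381, multiplier 33). */
static unsigned sym_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = 33 * h + (unsigned char)*s++;
    return h;
}

/* Links an existing object into its bucket; the caller keeps ownership. */
void sym_insert(sym_table *t, symbol *s)
{
    unsigned b = sym_hash(s->name) % BUCKETS;
    s->next_in_bucket = t->heads[b];
    t->heads[b] = s;
}

symbol *sym_find(const sym_table *t, const char *name)
{
    symbol *s = t->heads[sym_hash(name) % BUCKETS];
    while (s != NULL && strcmp(s->name, name) != 0)
        s = s->next_in_bucket;
    return s;
}
```

The cost is visible in the struct: every symbol carries the next_in_bucket field whether or not it is ever placed in a table, and an object can be in only one such table at a time.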
BTW, ownership is always a design problem. GC does not automatically solve it for you.
Specifically, it seems that his problem was that when STL hash_set is used to *own* values (of some big structures?) a lot of memory was taken by placeholders for unpopulated entries.
That's why he designed the "sparsetable", which spends (more or less) a single bit to mark a used or unused entry in it, and allocates space to hold only the assigned values it owns. Then he used that "sparsetable" as the underlying table for the open-addressing hash table implementation.
So his goal was to fit as much as possible 'indexed' (with a hash function) structures in the memory without even spending a single pointer per structure kept, giving away performance to get the lower memory footprint.
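The bitmap idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the sparsehash source: the group size of 32, the int payload and all names (sparse_group, sparse_get, sparse_set) are made up for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One group of 32 logical slots: the bitmap says which slots are used,
   and only the used values are stored, packed in slot order. */
typedef struct {
    uint32_t bitmap;
    int     *values;   /* popcount(bitmap) elements */
} sparse_group;

static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Position of logical slot i (0..31) in the packed array:
   the number of used slots below i. */
static unsigned packed_index(const sparse_group *g, unsigned i)
{
    return popcount32(g->bitmap & ((UINT32_C(1) << i) - 1));
}

int sparse_get(const sparse_group *g, unsigned i, int *out)
{
    if (!((g->bitmap >> i) & 1))
        return 0;                       /* slot is empty */
    *out = g->values[packed_index(g, i)];
    return 1;
}

int sparse_set(sparse_group *g, unsigned i, int v)
{
    unsigned idx = packed_index(g, i);
    unsigned used = popcount32(g->bitmap);
    int *p;

    if ((g->bitmap >> i) & 1) {         /* overwrite an existing value */
        g->values[idx] = v;
        return 1;
    }
    p = realloc(g->values, (used + 1) * sizeof *p);
    if (p == NULL)
        return 0;
    memmove(p + idx + 1, p + idx, (used - idx) * sizeof *p);
    p[idx] = v;
    g->values = p;
    g->bitmap |= UINT32_C(1) << i;
    return 1;
}
```

An empty slot costs one bit instead of a full placeholder, while every insertion pays for a reallocation and a shift of the packed values, which is the performance given away for the lower memory footprint.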
Nice.
I've accidentally discovered the error in the implementation of "HashWeinberger." The line
is wrong, as >> has lower priority than ^, and that causes significantly worse results. The original code in the book was:
But even their version doesn't look too promising as the last line in their function is:
where the prime is defined as 211. But at least that suggests that it was not meant to be used in "open addressing."
Another problem with the hash.cpp: either all or none of the functions should use the "fix" as:
otherwise there is no comparing under the same conditions.
Thanks for your comment. MSVC generates the same machine code for
and
Operator ^ has lower precedence than >> (^ is lower in the precedence table). You probably changed something else in the source code that affected performance.
I tried
    h ^ (h >> 16)
for K&R function, and it was slower this way:
For Larson's hash, I just forgot to try it (XOR version is faster). Thank you for reminding!
You are certainly right, my error! This was without thinking from my side, as I was used to see x >> y + z where the author wanted to >> before +. However ^ is really weaker than >>.
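The precedence rule is easy to check with a small test. The helper names below are hypothetical; the point is only that in C the shift operators bind tighter than bitwise XOR, so the parentheses around the shift are optional.

```c
#include <stdint.h>

/* Shifts bind tighter than ^, so both functions parse the same way. */
uint32_t fold_implicit(uint32_t h) { return h ^ h >> 16; }
uint32_t fold_explicit(uint32_t h) { return h ^ (h >> 16); }

/* The other grouping would be a different (and useless) function. */
uint32_t fold_wrong(uint32_t h) { return (h ^ h) >> 16; }   /* always 0 */
```

For example, folding 0x12345678 gives 0x1234444C with either of the first two forms, which is consistent with the compiler generating the same machine code for both.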
To excuse myself, this lapse occurred as I wanted to make my small contribution:
D. R. Hanson uses in two of his books, since around 1997 up to now, e.g. here:
http://code.google.com/p/cii/source/browse/tags/v20/src/atom.c
the following hash function:
    for (h = 0, i = 0; i < len; i++)
        h = (h<<1) + scatter[(unsigned char)str[i]];
And it's very poor! Your tests immediately demonstrate the (this time, real) error.
No problem.
The comments are plain text now. I've rewritten strchr.com from scratch, as I promised to do earlier. Real minimalism: around 1500 lines of PHP code written in a couple of weeks, no bloated CMSes or "web frameworks". All your comments were saved, as usual. Later, there will be a WYSIWYG editor for comments and a button to publish your own article :)
Xkcd recently brought to my attention "Collatz conjecture" (http://en.wikipedia.org/wiki/Collatz_conjecture)
In programming language notation, consider the following Python function (where numbers have an unlimited number of bits):
    def collatz( n ):
        while 1:
            n = 3 * n + 1 if n & 1 else n >> 1
            if n == 1:
                break
The conjecture states that for every n the loop will eventually terminate.
One way to look at it is: the 3*n+1 would be an acceptable hash function step where each input would be a single 1 (of course, using an unlimited number of bits is not for practical hash functions). The n >> 1 branch removes all zero LS bits, if they exist; otherwise the "hash function" is applied (note also that by the construction of the loop every "hash" step results in at least one zero LSb). So the conjecture can be restated: "applying the 3*n+1 transformation step to a number with an unlimited number of bits, together with the transformation which removes LS zero bits from the number, at some moment there will be exactly 1 set bit in the number."
Of course I don't see any practical aspect of all this, except how a lot of experiments (in wikipedia article) show the "bit randomizing" properties of the multiplication with the odd number.
And that's what's wrong in Hanson's hash: he uses an even multiplier and in every step he loses information about previous steps(!) Using the "scatter" mapping trick of input byte doesn't help at all -- the multiplier is the engine, here the broken one.
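The loss of information can be shown with a stripped-down version of the step. The sketch below drops the scatter table and takes the multiplier as a parameter (hash_mul is a hypothetical name, and a 32-bit hash value is assumed). With a multiplier of 2, the first character of a 40-character string is multiplied by 2^39, which is 0 modulo 2^32, so it no longer influences the result; with an odd multiplier such as 3 it still does.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic multiplicative step h = m*h + c on 32 bits. The scatter
   lookup of the original is omitted: a simplification for this example. */
uint32_t hash_mul(const char *s, size_t len, uint32_t m)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < len; ++i)
        h = m * h + (unsigned char)s[i];
    return h;
}
```

Two strings longer than 32 characters that differ only in their first character therefore always collide with m = 2, which matches the weakness on strings that share a long postfix.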
I suggest adding the function used by Hanson to your set of hashes in hash.cpp. The results nicely demonstrate how a faulty hash function can be designed, demonstrated in the books and used for a long time:
static unsigned long scatter[] = { 2078917053, 143302914, 1027100827, 1953210302, 755253631, 2002600785, 1405390230, 45248011, 1099951567, 433832350, 2018585307, 438263339, 813528929, 1703199216, 618906479, 573714703, 766270699, 275680090, 1510320440, 1583583926, 1723401032, 1965443329, 1098183682, 1636505764, 980071615, 1011597961, 643279273, 1315461275, 157584038, 1069844923, 471560540, 89017443, 1213147837, 1498661368, 2042227746, 1968401469, 1353778505, 1300134328, 2013649480, 306246424, 1733966678, 1884751139, 744509763, 400011959, 1440466707, 1363416242, 973726663, 59253759, 1639096332, 336563455, 1642837685, 1215013716, 154523136, 593537720, 704035832, 1134594751, 1605135681, 1347315106, 302572379, 1762719719, 269676381, 774132919, 1851737163, 1482824219, 125310639, 1746481261, 1303742040, 1479089144, 899131941, 1169907872, 1785335569, 485614972, 907175364, 382361684, 885626931, 200158423, 1745777927, 1859353594, 259412182, 1237390611, 48433401, 1902249868, 304920680, 202956538, 348303940, 1008956512, 1337551289, 1953439621, 208787970, 1640123668, 1568675693, 478464352, 266772940, 1272929208, 1961288571, 392083579, 871926821, 1117546963, 1871172724, 1771058762, 139971187, 1509024645, 109190086, 1047146551, 1891386329, 994817018, 1247304975, 1489680608, 706686964, 1506717157, 579587572, 755120366, 1261483377, 884508252, 958076904, 1609787317, 1893464764, 148144545, 1415743291, 2102252735, 1788268214, 836935336, 433233439, 2055041154, 2109864544, 247038362, 299641085, 834307717, 1364585325, 23330161, 457882831, 1504556512, 1532354806, 567072918, 404219416, 1276257488, 1561889936, 1651524391, 618454448, 121093252, 1010757900, 1198042020, 876213618, 124757630, 2082550272, 1834290522, 1734544947, 1828531389, 1982435068, 1002804590, 1783300476, 1623219634, 1839739926, 69050267, 1530777140, 1802120822, 316088629, 1830418225, 488944891, 1680673954, 1853748387, 946827723, 1037746818, 1238619545, 1513900641, 1441966234, 367393385, 928306929, 946006977, 985847834, 
1049400181, 1956764878, 36406206, 1925613800, 2081522508, 2118956479, 1612420674, 1668583807, 1800004220, 1447372094, 523904750, 1435821048, 923108080, 216161028, 1504871315, 306401572, 2018281851, 1820959944, 2136819798, 359743094, 1354150250, 1843084537, 1306570817, 244413420, 934220434, 672987810, 1686379655, 1301613820, 1601294739, 484902984, 139978006, 503211273, 294184214, 176384212, 281341425, 228223074, 147857043, 1893762099, 1896806882, 1947861263, 1193650546, 273227984, 1236198663, 2116758626, 489389012, 593586330, 275676551, 360187215, 267062626, 265012701, 719930310, 1621212876, 2108097238, 2026501127, 1865626297, 894834024, 552005290, 1404522304, 48964196, 5816381, 1889425288, 188942202, 509027654, 36125855, 365326415, 790369079, 264348929, 513183458, 536647531, 13672163, 313561074, 1730298077, 286900147, 1549759737, 1699573055, 776289160, 2143346068, 1975249606, 1136476375, 262925046, 92778659, 1856406685, 1884137923, 53392249, 1735424165, 1602280572 };

UINT hashHanson( const CHAR* s, SIZE_T L ) {
    UINT h = 0;
    for ( SIZE_T i = 0; i < L; i++ ) {
        unsigned char c = s[i];
        h = ( h << 1 ) + scatter[ c ];
    }
    return h;
}
The serious weakness is visible for your "Postfix" set (the strings with the same long postfix).
Finally, just for completeness, I propose one more function. I named it Novak Hash, and it's basically what should have been done to still do the "scatter" approach and to implement the function right (that is, not losing significant information about the previous values in every step and reducing the size of the table):
unsigned char* rijndaelSBox = (unsigned char*)
    "\x63\x7c\x77\x7b\xf2\x6b\x6f\xc5" "\x30\x01\x67\x2b\xfe\xd7\xab\x76"
    "\xca\x82\xc9\x7d\xfa\x59\x47\xf0" "\xad\xd4\xa2\xaf\x9c\xa4\x72\xc0"
    "\xb7\xfd\x93\x26\x36\x3f\xf7\xcc" "\x34\xa5\xe5\xf1\x71\xd8\x31\x15"
    "\x04\xc7\x23\xc3\x18\x96\x05\x9a" "\x07\x12\x80\xe2\xeb\x27\xb2\x75"
    "\x09\x83\x2c\x1a\x1b\x6e\x5a\xa0" "\x52\x3b\xd6\xb3\x29\xe3\x2f\x84"
    "\x53\xd1\x00\xed\x20\xfc\xb1\x5b" "\x6a\xcb\xbe\x39\x4a\x4c\x58\xcf"
    "\xd0\xef\xaa\xfb\x43\x4d\x33\x85" "\x45\xf9\x02\x7f\x50\x3c\x9f\xa8"
    "\x51\xa3\x40\x8f\x92\x9d\x38\xf5" "\xbc\xb6\xda\x21\x10\xff\xf3\xd2"
    "\xcd\x0c\x13\xec\x5f\x97\x44\x17" "\xc4\xa7\x7e\x3d\x64\x5d\x19\x73"
    "\x60\x81\x4f\xdc\x22\x2a\x90\x88" "\x46\xee\xb8\x14\xde\x5e\x0b\xdb"
    "\xe0\x32\x3a\x0a\x49\x06\x24\x5c" "\xc2\xd3\xac\x62\x91\x95\xe4\x79"
    "\xe7\xc8\x37\x6d\x8d\xd5\x4e\xa9" "\x6c\x56\xf4\xea\x65\x7a\xae\x08"
    "\xba\x78\x25\x2e\x1c\xa6\xb4\xc6" "\xe8\xdd\x74\x1f\x4b\xbd\x8b\x8a"
    "\x70\x3e\xb5\x66\x48\x03\xf6\x0e" "\x61\x35\x57\xb9\x86\xc1\x1d\x9e"
    "\xe1\xf8\x98\x11\x69\xd9\x8e\x94" "\x9b\x1e\x87\xe9\xce\x55\x28\xdf"
    "\x8c\xa1\x89\x0d\xbf\xe6\x42\x68" "\x41\x99\x2d\x0f\xb0\x54\xbb\x16";

unsigned long NovakHashUnrolled( char* s, int L ) {
    unsigned long h = 0;
    int i;
    unsigned char* t = (unsigned char*)s;
    for ( h = 0, i = 0; i < ( L & ~1 ); i += 2 ) {
        h = ( h << 1 ) + h + rijndaelSBox[ t[ i ] ];
        h = ( h << 1 ) + h + rijndaelSBox[ t[ i + 1 ] ];
    }
    if ( L & 1 )
        h = ( h << 1 ) + h + rijndaelSBox[ t[ L - 1 ] ];
    return h;
}

unsigned long NovakHash( const char* s, int L ) {
    int i;
    unsigned long h = 0;
    unsigned char* t = (unsigned char*)s;
    for ( i = 0; i < L; i++ )
        h = ( h << 1 ) + h + rijndaelSBox[ t[ i ] ];
    return h;
}
I just consider it "how Hanson's function should have been done," I still believe that functions that don't use any tables are better in almost all real-life circumstances.
Bret Mulvey
Evaluating hash functions
http://home.comcast.net/~bretm/hash/
The downside to just testing general use cases is that programmers who don't know any better will see a hash function that works for a particular situation, spread it around, and it will end up being used for things it shouldn't! e.g. the huge number of hash functions that exist now, many of them in wide use despite being pretty bad outside of a certain range of keys. Not to say that I know of a good universal metric, but work towards one would be good.
Thank you very much! CRC got the 4th place with your optimization. Its results are very balanced (a low number of collisions; no spikes on any test).
AFAIK, the best we can do is to test the functions on different sets of strings (both real-world and synthetic). I tried to find the sets that cover most common situations and are very different statistically: English words, function and variable names, sequential numbers, etc. If you have other ideas, I will be happy to add more tests.
For more "real world" a list of utf-8 words with multi-byte encodings would be good. I've also found a binary list of ips is good at screwing up hash functions, but that will need special consideration for loading as they can contain nulls and carriage returns.
I have reconsidered Novak Hash and I'd like to update it: instead of initializing with h = 0, I'd like h to be initialized with h = 1. The reason is that one of the values in the S-Box is 0, so a pathological case occurs when strings of different lengths consisting only of the byte that maps to zero are hashed.
A similar pathological case can be constructed for all hashes that have a multiplicative constant, if the initial value of h is 0. E.g. if x17 subtracts 32 from each byte, it degrades to the pathological case when many strings of different lengths consisting only of spaces are hashed -- all will map to the same slot!
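The effect of the initial value can be reproduced with a small sketch of x17 that takes the initial value as a parameter (the extra parameter is an assumption made for the example; the benchmarked function has a fixed initial value). A space becomes 0 after subtracting ' ', so with init = 0 the hash never leaves 0, while with init = 1 each added space multiplies the hash by 17.

```c
#include <stddef.h>
#include <stdint.h>

/* x17 with the initial value exposed as a parameter (illustration only). */
uint32_t x17(const char *key, size_t len, uint32_t init)
{
    uint32_t h = init;
    size_t i;

    for (i = 0; i < len; ++i)
        h = 17 * h + (uint32_t)((unsigned char)key[i] - ' ');
    return h;
}
```

With init = 1, strings of one, two and three spaces hash to 17, 289 and 4913; with init = 0 all three hash to 0.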
Yes, R, RR, RRR, etc will not hash to the same value with h = 1, but existing collisions will still collide. Set h to whatever you want and hash "novak" and "qovAk", they'll both have the same hash.
You can continue plugging holes, but you'll only slow it down. If you really want to go the Sbox route, wouldn't a full 32 bit sbox like http://home.comcast.net/~bretm/hash/10.html be better than single bytes?
What can be considered a "good enough minimum" is the most interesting question indeed! When assemblers and compilers had the limitation of only 6 letters per symbol, even just summing the letters was considered "good enough." Now, it appears that the full cryptographic "avalanche" effect on all bits is not always necessary -- I'd really like to know at which point it becomes important. Avoiding some "obvious" problems with the simplest functions should help.
That's the idea behind my experiments with the 8-bit S-Box: to find out whether something can be "good enough" with a smaller cache footprint (and maybe more convenient for the limitations of embedded systems? I also prefer "endianness independent" implementations). Still, for my tastes, avoiding tables altogether can be better whenever the function is not invoked very frequently -- tests with the "full" cache don't reflect a lot of the typical usage scenarios I can imagine.
I'd really like to know where the limits of every function are.
I believe fixing the simple multiplicative functions for some common scenarios with the different initial value can be the "good enough" solution for some usages?
BTW, I also know that AES primitives are being increasingly implemented in hardware. I admit I haven't checked if S-Box mapping is actually part of the primitives implemented by Intel?
I know CRC32 is implemented in some new Intel CPU's, and I'm very interested to see how your implementation compares to the one using the specialized instructions.
Andrew, I've added the tests with UTF-8 and IP addresses (in binary). Fletcher's hash is terrible for UTF-8 texts because of repeated first bytes in multi-byte sequences. I did an additional test with a Russian novel, and Fletcher was even worse. The results of other functions were not very different from the previous tests.
Ace, I've updated Novak hash and x17 (h = 1). AES instructions implement the whole round of encryption, not just substitution using S-boxes.
CRC32 is implemented in Core i5-i7 processors, but with the iSCSI polynomial, so it is useless for ZIP, PNG, MPEG-2, and many other formats, which use a different polynomial. Though it does not matter much for a hash function which polynomial to use. I will probably try accelerated CRC32 later :)
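The difference between the two polynomials can be seen with a plain bit-by-bit CRC, sketched below. The function name crc32_bitwise is hypothetical and the code is far slower than a table-driven or hardware version; it only shows that the same algorithm with the reflected Castagnoli constant (the one behind the SSE 4.2 instruction) yields checksums that are incompatible with the ZIP/PNG ones.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY_CRC32   0xEDB88320u   /* ZIP, PNG (reflected form) */
#define POLY_CRC32C  0x82F63B78u   /* Castagnoli, iSCSI (reflected form) */

/* Bit-at-a-time reflected CRC with the usual initial value and final
   inversion; poly selects the polynomial. */
uint32_t crc32_bitwise(const void *data, size_t len, uint32_t poly)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        crc ^= p[i];
        for (bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the standard check string "123456789" the two polynomials give 0xCBF43926 and 0xE3069283 respectively. For hash table use either value is equally usable, since only the distribution matters.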
Could any of the AES instructions be used for hashing too? I can't seem to find any speed comparisons for the SSE CRC32/AES instructions at all.
As Peter linked, AES instructions seem to be too heavy for hashing, as they do much bigger work, which is not surprising, the substitution alone is very effective as soon as the table is in cache.
I've just done the tests on Core i5 processor (see the results above). Hardware-accelerated CRC is the fastest hash function; x17 is slow on this processor for some reason.
I've also optimized the handling of remaining bytes in your CRC32 code (using two conditions instead of the loop). It's slightly faster this way.
The hardware CRC32 numbers are fairly impressive. That is why I wondered about AES, if any of the ops are fast enough then they could be quite good.
Here's what I've found about the speedup of CPU-accelerated AES compared to non-CPU accelerated AES: it is approximately "just" factor 4, at least according to:
http://wiki.debianforum.de/BenchmarkFestplattenverschlüsselung
and according to Intel, up to 10 for multi-core scenario:
http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/
As AES-NI instruction does whole round on 16 bytes and there is a lot of processing for one round and the typical speedup is only 4, I believe AES-NI is therefore still irrelevant for the simple kind of hashing that's the subject here.
Peter, do you then have an I5-750 rather than one of the I5-6*'s (which are all supposed to have AES-NI, at least according to: http://processorfinder.intel.com/List.aspx?ProcFam=3155)?
I didn't know there are I5's without AES-NI at all until now.
AES-NI results are impressive. As I found, it's supported by PGP, DiskCryptor, and other popular encryption software. GnuPG does not use the hardware acceleration; most likely, because they want to stay portable.
Yes, I have Core i5 750 (four cores, SSE 4.2, but no AES-NI).
P.S. Here is a new article with detailed CRC32 benchmarks.
http://amsoftware.narod.ru/algo.html
http://herakles.zcu.cz/~skala/publications.htm
especially:
http://herakles.zcu.cz/~skala/PUBL/PUBL_2010/2010_WSEAS-Corfu_Hash-final.pdf
http://herakles.zcu.cz/~skala/PUBL/PUBL_2010/2010_Corfu-NAUN-Hash.pdf
Perhaps these will be helpful.
About visualization: I wonder how to do it, too :) If we plot the low 16 bits on the x axis and the high bits on the y axis, there will not be enough pixels to display them on a web page (we would need 2^16 × 2^16 pixels).
I know it's not needed to do that when some specialized solution is considered, but it's interesting playing with the experiments that are supposed to give us more insight about the behavior of the functions.
I hope it will be interesting to you to see FNV-1A(13bit) used in my intoxicatingly fast word-list ripper Leprechaun at http://encode.ru/threads/612-Fastest-decompressor!?p=22184&viewfull=1#post22184, where a wordlist of 22,202,980 latin-letters-words is processed.
also 'FNV1A_Hash_Granularity' at http://encode.ru/threads/612-Fastest-decompressor!/page3.
I salute the host Peter for this site: it is most comprehensive and useful for sure, keep going.
After some minor changes in your Hash17_unrolled, here comes 'Alfalfa', tuned for latin-letters-words, i.e. alpha strings between 1 and 31 chars.
This variant behaves surprisingly well when 13 bits (8192 slots) are used.
With my testbed (22,202,980 distinct words), which uses a 3-level hash (1st: 26 slots for the first letter; 2nd: 31 slots for the string length; 3rd: 8192 slots given to some multiplicative hasher), Alfalfa outperforms Hash17_unrolled by 18,677,243 - 18,645,799 = 31,444 fewer collisions.
In my opinion, every hash variant must be tuned specifically for a given data set and regarded as a potential gem (not as a common stone only because it performed poorly on another set).
// Word count: 35,271,297 of them 22,202,980 distinct
// Number Of Trees(GREATER THE BETTER): 3525737
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18677243
// Maximum Attempts to Find/Put a WORD into a Binary-Search-Tree: '38'
// Total Attempts to Find/Put WORDs into Binary-Search-Trees: 117,612,984
// Total Number of LEAFs in Binary-Search-Trees(GREATER THE BETTER): 8,056,968
// Perfectly-Balanced-Binary-Search-Tree for MaxNODEs = 90 must have PEAK = 7 = rounding down of integer (1+lb(90))
// Binary-Search-Tree(1st out of 2) with MaxNODEs = 90 has PEAK = 26 and LEAFs = 23
// Binary-Search-Tree(1st out of 1) with MaxPEAK = '38' has NODEs = 60 and LEAFs = 13
// Binary-Search-Tree(1st out of 3) with MaxLEAFs = 27 has NODEs = 84 and PEAK = 22
int Hash17_unrolled(char *key, int wrdlen)
{
    int hash = 1;
    int i;
    for(i = 0; i < (wrdlen & -2); i += 2) {
        hash = (17) * hash + (key[i] - ' ');
        hash = (17) * hash + (key[i+1] - ' ');
    }
    if(wrdlen & 1)
        hash = (17) * hash + (key[wrdlen-1] - ' ');
    return ( hash ^ (hash >> 16) ) & 8191;
}

// Word count: 35,271,297 of them 22,202,980 distinct
// Number Of Trees(GREATER THE BETTER): 3557181
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18645799
// Maximum Attempts to Find/Put a WORD into a Binary-Search-Tree: '37'
// Total Attempts to Find/Put WORDs into Binary-Search-Trees: 116,908,873
// Total Number of LEAFs in Binary-Search-Trees(GREATER THE BETTER): 8,076,977
// Perfectly-Balanced-Binary-Search-Tree for MaxNODEs = 85 must have PEAK = 7 = rounding down of integer (1+lb(85))
// Binary-Search-Tree(1st out of 1) with MaxNODEs = 85 has PEAK = 19 and LEAFs = 18
// Binary-Search-Tree(1st out of 2) with MaxPEAK = '37' has NODEs = 49 and LEAFs = 10
// Binary-Search-Tree(1st out of 1) with MaxLEAFs = 28 has NODEs = 83 and PEAK = 25
int Alfalfa(const char *key, int wrdlen)
{
    int hash = 7;
    int i;
    for(i = 0; i < (wrdlen & -2); i += 2) {
        hash = (17+9) * ((17+9) * hash + (key[i])) + (key[i+1]);
    }
    if(wrdlen & 1)
        hash = (17+9) * hash + (key[wrdlen-1]);
    return ( hash ^ (hash >> 16) ) & 8191;
}
A suggestion: putting one more column to your result table containing all latin-letters-words encountered in wikipedia-en-html.tar.wrd (12,561,874 distinct words in 146,973,879 bytes) would give a better look on collision-performance on heavy(real) loads along with light-weight Shakespeare's sonnets, don't you think?
If you are interested, my testbed is here: http://www.sanmayce.com/Downloads/Leprechaun_r13+++++_EXE+ELF_vs_Wikipedia.zip
regarding speed I offer two extra fast hashers(tuned for 8192 slots):
// Number Of Trees(GREATER THE BETTER): 3550665
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18652315
int Alfalfa_HALF(const char *key, int wrdlen)
{
    int hash = 12;
    int i;
    int j;
    // for(i = 0; i < (wrdlen & -2); i += 2) {
    //     hash = (( ((hash<<5)-hash) + key[i] )<<5) - ( ((hash<<5)-hash) + key[i] ) + (key[i+1]);
    // }
    for(i = 0; i < (wrdlen & -4); i += 4) {
        hash = (( ((hash<<5)-hash) + key[i] )<<5) - ( ((hash<<5)-hash) + key[i] ) + (key[i+1]);
        hash = (( ((hash<<5)-hash) + key[i+2] )<<5) - ( ((hash<<5)-hash) + key[i+2] ) + (key[i+3]);
    }
    // if(wrdlen & 1)
    //     hash = ((hash<<5)-hash) + (key[wrdlen-1]);
    for(j = 0; j < (wrdlen & 3); j += 1) {
        hash = ((hash<<5)-hash) + key[i+j];
    }
    return ( hash ^ (hash >> 16) ) & 8191;
}

// Number Of Trees(GREATER THE BETTER): 3552103
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18650877
#define FNV1_32_INIT ((u_int32_t)2166136261)
#define FNV1_32_PRIME ((u_int32_t)588411137)
#define FNV_32A_OP(hash, octet) \
    (((u_int32_t)(hash) ^ (u_int8_t)(octet)) * FNV1_32_PRIME)
#define FNV_32A_OP32(hash, octet) \
    (((u_int32_t)(hash) ^ (u_int32_t)(octet)) * FNV1_32_PRIME)
int FNV1A_Hash_4_OCTETS(const char *str, int wrdlen)
{
    u_int32_t hash;
    char *p;
    int wrdlen_QUADRUPLETS = wrdlen>>2;
    hash = FNV1_32_INIT;
    p=str;
    // The goal of stage #1: to reduce number of 'imul's and mainly: the number of loops.
    // Stage #1:
    for (; wrdlen_QUADRUPLETS != 0; --wrdlen_QUADRUPLETS) {
        hash = FNV_32A_OP32(hash, (unsigned long)*(long *)p);
        p=p+4;
    }
    // Stage #2:
    //for (; *p; ++p) {
    //    hash = FNV_32A_OP(hash, *p);
    //}
    for (wrdlen_QUADRUPLETS = 0; wrdlen_QUADRUPLETS < (wrdlen & 3); wrdlen_QUADRUPLETS += 1) {
        hash = FNV_32A_OP(hash, *p);
        ++p;
    }
    return ((hash>>16) ^ hash) & 8191; // 00..8191 i.e. 2^13=8192
}
Regarding collisions, it's up to your testbed; nevertheless I expect FNV1A_Hash_4_OCTETS to be a gem.
Peter, if you find it useful please put it along with generic FNV-1A.
Poorly performing functions are Fletcher and Novak. Not-so-good functions: Weinberger, x17, and Paul Hsieh. For this test, I used Georgi's list of all words in English Wikipedia (12.5 million words). You can try other word lists (the results will be similar) by running the benchmark with the /c command-line switch and processing the output with the get_google_chart.py script.
For a small number of bits, all functions are equally poor because of the birthday paradox. I didn't implement the 32-bit hashing (it would require a rewrite of the benchmarking code), but the results should be similar to the above.
Georgi, I followed your suggestion and benchmarked a large word list (wikipedia-en-html.tar.wrd). Thanks for sharing your testbed.
Hardware-accelerated CRC and Murmur are winners again. Novak and Fletcher are at the bottom of the list because of high number of collisions.
Your unrolled version of FNV1A showed very good performance on small data sets. However, it has a high number of collisions in Numbers and IPv4 tests, so I cannot recommend it for general usage. Multiplications in the original FNV "mix up" the input characters to achieve avalanche effect. Unfortunately, when we remove some of them, this effect is lost.
About tuning for a given data set: there is perfect hashing (no collisions). It's used in parsers and other applications, where the list of words is fixed. For example, imagine a hash table for C++ keywords (if, else, for, class, etc.)
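To make the perfect hashing idea concrete, here is a minimal sketch for just the four example keywords. It relies on the observation that if, for, else, and class all have different lengths, so the length alone is a collision-free hash for this particular set. The names `KEYWORDS` and `keyword_index` are made up for illustration; a real keyword list needs a proper search for such a function (tools like gperf do this).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed key set: the four words have distinct lengths (2..5),
   so (length - 2) is a perfect hash for this set and for this set only. */
static const char *const KEYWORDS[4] = { "if", "for", "else", "class" };

/* Returns the slot of `word` in KEYWORDS, or -1 if it is not a keyword. */
int keyword_index(const char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    if (len < 2 || len > 5)
        return -1;
    size_t slot = len - 2;  /* perfect hash: at most one keyword per slot */
    /* One comparison is still needed to reject non-keywords of the same length. */
    return strcmp(word, KEYWORDS[slot]) == 0 ? (int)slot : -1;
}
```

There are no chains to walk here: every lookup costs one hash computation and one string comparison.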
In the end, the winners didn't change: Murmur2 and hardware-accelerated CRC32 are the fastest hash functions. SBox looks promising, too.
Speed in percentage of regression (reference: the fastest non-hardware based):
==============================================================================
iSCSI CRC -11.27
Murmur2 0
x65599 1.75
Paul Larson 2.75
Alfalfa 3.35
FNV-1a 5.38
Hanson 8.55
SBox 11.27
CRC-32 11.52
K&R 11.82
x17 unrolled 14.42
Bernstein 14.43
Alfalfa_HALF 15.95
Sedgewick 17.17
lookup3 19.09
Paul Hsieh 19.44
Ramakrishna 26.11
MaPrime2c 30.04
One At Time 31.11
Arash Partow 31.78
Weinberger 44.85
FNV1A_unrolled 45.24
Novak unrolled 231.91
Fletcher 247.25
Collisions in percentage of regression (reference: the best):
=============================================================
Bernstein 0
CRC-32 0.04
Alfalfa_HALF 0.15
iSCSI CRC 0.17
Paul Larson 0.28
Sedgewick 0.31
FNV-1a 0.34
Murmur2 0.35
K&R 0.43
SBox 0.47
MaPrime2c 0.49
Arash Partow 0.5
lookup3 0.51
One At Time 0.66
Ramakrishna 0.92
x65599 1.38
Hanson 2.68
Paul Hsieh 5.11
Alfalfa 5.88
x17 unrolled 16.22
Weinberger 70.72
FNV1A_unrolled 193.48
Novak unrolled 204.62
Fletcher 336.97
Looking at it this way, Murmur2 is less than 3% faster than Larson, which is just a multiplicative hash with a simple constant: 101. Then looking at the number of collisions, there are a lot of functions that differ by less than 1%, and Larson again scores very well. So my current conclusion would be: if I had to remember something for hashing up to a few million English word-like entries, I'd just remember Larson's constant 101 (decimal) and stop worrying!
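For reference, a sketch of what "remembering 101" amounts to: the generic multiplicative loop from the article with M = 101. The initial value 0 and the name `hash_larson` are assumptions of this sketch, not taken from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplicative hash with Larson's multiplier 101.
   Assumption: the initial value is 0 (a seed could be used instead). */
uint32_t hash_larson(const char *key, size_t len)
{
    assert(key != NULL || len == 0);
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash = 101u * hash + (unsigned char)key[i];
    return hash;  /* the caller reduces it, e.g. hash & (table_size - 1) */
}
```

For example, "ab" hashes to 101 * 97 + 98 = 9895 before the reduction to the table size.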
The graph is nice. The biggest visual insight is how so called "FNV1A_unrolled" becomes very bad at some fixed point and how much worse than most some functions are.
What the graph shows is just the dependency of the number of collisions on the table size (table size == 2 ** bits) for different functions and a fixed large input set, and it's not unexpected that you can't see much for a table significantly smaller than the input set size. That's why I suggested something else: designing a function to always produce the input set for the given number of input elements.
I'd suggest that the set would have:
- words (40%)
- fixed string + words (20%)
- words + fixed string (20%)
- pure binary numbers, increasing (20%)
Not because it's a good representation of any real input set but because we've already seen that it nicely amplifies the weaknesses. Then the goal is to select enough words from the 12 million for any given target size. I'd sort the Wikipedia words with a sort function that uses the number of letters as the highest criterion and then I'd simply take the first N (where N is target table size * 0.4 for the first set etc.) Then, my initial idea was to fix the number of the input set elements and draw a separate graph like yours for such input sets. That was to allow visual comparison of functions even for smaller input sets, where we don't "see" much now.
very informative and well done(especially the graphical chart).
Allow me to give some of my thoughts regarding collisions:
Looking at the chart I see no difference for all functions up to 22bits, which is something new for me and in the same time very good(I expected 16bits to be the upper limit for good behavior) because in my view 13-20bits are the most targeted range.
And for future tests(64bit compiler needed because long long is used) it would be interesting to show the performance of 'FNV1A_Hash_Granularity' when only 'FNV1A_Hash_Granularity(wrd, wrdlen>>3, 3);' invocations are used for all lengths. I have made tests with 'FNV1A_Hash_8_OCTETS' with 32bit(not 64bit) multiplications and the speed was crazy, but with high collisions(due to lost carryings).
Below is the new 'Alfalfa_DWORD': your function 'x17 unrolled' boosted even more with best collisions result on my biggest(22,202,980 latin-letters-words) test and very fast too.
// Caution: big/little endian dependent. Here Little-endian is used.
// More instructions but one memory(DWORD) access instead of 4 memory(BYTE) accesses.
1706 int Alfalfa_DWORD(const char *key, int wrdlen)
1707 {
1708     unsigned long hashDWORD, hashAlfalfa, iAlfalfa, jAlfalfa;
1709     hashAlfalfa = 7;
1710     for(iAlfalfa = 0; iAlfalfa < (wrdlen & -4); iAlfalfa += 4) {
1711         hashDWORD=*(unsigned long *)(&key[iAlfalfa]);
1712         hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashDWORD>>0)&0xFF) ) + ((hashDWORD>>8)&0xFF);
1713         hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashDWORD>>16)&0xFF) ) + ((hashDWORD>>24)&0xFF);
1714     }
1715     for(jAlfalfa = 0; jAlfalfa < (wrdlen & 3); jAlfalfa += 1) {
1716         hashAlfalfa = (17+9) * hashAlfalfa + key[iAlfalfa+jAlfalfa];
1717     }
1718     return ( hashAlfalfa ^ (hashAlfalfa >> 16) ) & 8191;
1719 }
/*
PUBLIC _Alfalfa_DWORD
; Function compile flags: /Ogtpy
_TEXT SEGMENT
_key$ = 8 ; size = 4
_wrdlen$ = 12 ; size = 4
_Alfalfa_DWORD PROC
; Line 1707
    push ebx
; Line 1710
    mov ebx, DWORD PTR _wrdlen$[esp]
    push esi
    mov esi, ebx
    xor edx, edx
    and esi, -4 ; fffffffcH
    push edi
    mov edi, DWORD PTR _key$[esp+8]
    mov ecx, 7
    jbe SHORT $LN4@Alfalfa_DW
    push ebp
    npad 6
$LL6@Alfalfa_DW:
; Line 1711
    mov eax, DWORD PTR [edx+edi]
; Line 1713
    imul ecx, 26 ; 0000001aH
    movzx ebp, al
    add ebp, ecx
    imul ebp, 26 ; 0000001aH
    mov ecx, eax
    shr ecx, 8
    and ecx, 255 ; 000000ffH
    add ebp, ecx
    imul ebp, 26 ; 0000001aH
    mov ecx, eax
    shr ecx, 16 ; 00000010H
    and ecx, 255 ; 000000ffH
    add ecx, ebp
    imul ecx, 26 ; 0000001aH
    shr eax, 24 ; 00000018H
    add edx, 4
    add ecx, eax
    cmp edx, esi
    jb SHORT $LL6@Alfalfa_DW
    pop ebp
$LN4@Alfalfa_DW:
; Line 1715
    xor eax, eax
    and ebx, 3
    mov esi, ebx
    jbe SHORT $LN1@Alfalfa_DW
    add edx, edi
$LL3@Alfalfa_DW:
; Line 1716
    movsx edi, BYTE PTR [edx+eax]
    imul ecx, 26 ; 0000001aH
    inc eax
    add ecx, edi
    cmp eax, esi
    jb SHORT $LL3@Alfalfa_DW
$LN1@Alfalfa_DW:
; Line 1718
    mov eax, ecx
    shr eax, 16 ; 00000010H
    pop edi
    xor eax, ecx
    pop esi
    and eax, 8191 ; 00001fffH
    pop ebx
; Line 1719
    ret 0
_Alfalfa_DWORD ENDP
_TEXT ENDS
*/

A 64-bit benchmark would be interesting, and I will do it in future; thank you for the idea. Most functions are tuned for 32 bits, but some should be fast in 64-bit mode.
There is no difference between hash functions for small number of bits, because the benchmark attempts to squeeze 12.5 million words into 2^21 ≈ 2 million cells. It's my fault :( Not only hash table size, but also the input set size should be reduced, as Ace proposed.
Ace, it would be easier if you do the visualization as you wanted. You can freely use my code and Google Charts API to draw the graph. I'm very interested in the results and you can publish them here.
I agree that Larson's hash scores well on most tests, and it's very simple. Thanks for the scaled results on the large data set, they are much more readable.
svg (I use Opera, haven't tried others): http://pastebin.com/raw.php?i=btMpwEkb
png: http://i54.tinypic.com/2ryse1w.png
What that says to me is that your original tests contain too much "noise" compared to the "signal" (at least for my taste!). But they still helped detect some serious problems in some of the functions, and all together some very insightful steps came as the topic developed. Peter, thank you for maintaining this topic, its educational value is really high!
I guess if it is written in assembly(to avoid these ugly BYTE extracts) the picture will be different, also due to short strings(the average word length in Wikipedia-en is 10 bytes: 2 loops within the first cycle, which could be unrolled too, but I don't like such a 'tuning') speed-performance of 'Alfalfa_DWORD' and 'FNV1A_Hash_4_OCTETS' suffers.
When multi millions of keys are involved, I firmly believe that a cascade of hash functions must be used to ensure a good(no worst case allowed) distribution/dispersion, then 22bits are quite enough, in my case instead of using some 24bits I chose using 5-bits/5-bits/13bits. Here the principle 'know your data' comes as a rule. As for the birthday paradox it supports such an approach fully. In short: an application designed for speed must not rely on some kind of general-purpose functions but on uncompromising/tuned ones.
Talking of 'FNV' variants(very well defined at Mr. Noll's site - the home of FNV), I think there is a niche where even 128bit(with greedy 64bit multiplications used similarly to 32/64bit counterpart, in order to gain speed as in 'FNV1A_Hash_8_OCTETS') would be interesting to be explored/tested.
As you can see, they are very different from the results on Core i5. So the reason is differences between microarchitectures.
I'm an engineer. Here's how we solve such things: everything that's inside of the 10% span can be considered the same! We should first drop the functions that we know are obviously bad, even when they are good on some special occasions. That means Novak, Fletcher, FNV1A_unrolled, Weinberger must go, but also Hanson (interesting: the nature of the input set of Wikipedia words only is quite forgiving to the Hanson function, which we know loses bits in every iteration) and Hsieh and x17 according to the collisions graph. Then, from the remaining ones, we keep those that are inside of 10% starting from the first remaining, consider them *the same*, then inside of the next 10%, consider these the same but belonging to the lower group, the rest we drop. We repeat this in all 4 cases. Which are then always among the first 10%? Which are sometimes in the top 10%, sometimes in the other 10%? We need to know only 4 resulting groups. I proposed the "initial drop" group. Now let's make the "always in top 10%", "less good" and "even less good" groups. Which function is where?
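A minimal sketch of such a grouping, for one test at a time. The first two groups are the 10% spans described above; the boundaries for scores 3 (up to 100% worse) and 4 (more than 100% worse) are taken from the scores used further down in this thread. The function name and the handling of values exactly on a boundary are assumptions.

```c
#include <assert.h>

/* Score a timing relative to the best timing in the same test:
   1: within 10% of the best, 2: within 20%, 3: within 100%, 4: worse.
   Assumption: a value exactly on a boundary gets the better score. */
int score_against_best(double time, double best)
{
    assert(best > 0.0 && time >= best);
    double slowdown = time / best - 1.0;  /* 0.05 means 5% slower */
    if (slowdown <= 0.10) return 1;
    if (slowdown <= 0.20) return 2;
    if (slowdown <= 1.00) return 3;
    return 4;
}
```

A function that scores 1 in every test would then be in the "always in top 10%" group.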
(If this approach works, we can then generate the tests on a few more platforms and use these scores too. Again the winner is any function that is still in "always top 10%" (if there is such, if not, then those in "less good" etc). The processors as MIPS and ARM are certainly something that is more different than all x86. I can run something on my MIPS-based router, but only if it doesn't use more than around 10 MB of RAM and can be compiled with GCC.)
"Looking at the chart I see no difference for all functions up to 22bits, which is something new for me ...",
I was thinking of my own approach and talking of something else/obvious, from time-to-time I do such blunders.
Of course when 4,194,304 slots are given with incoming 12,561,874 keys the resultant collisions equal (12,561,874-4,194,304)..(12,561,874-1).
What was on my mind: the slot[s] out of 6,602,752 with the MAXIMUM number of collisions - my explanation for such thinking failures: a mind troubled by anxiety.
Again, the thing I am/was interested in: to examine/find the slot[s] with MAXIMUM collisions(in fact the depth) for each bucket(hash table) - this is crucial for any further processing whether linked-lists/binary-search-trees/b-trees or another hash.
In my view the useful info(even when keys<=slots) is not the hash table utilization but the depth(how much it is nested). For example if 2^20 slots are used with 2^24 keys I care of the peak: the ideal case is 2^4 depth or 16 layers one megabyte each. If 2^13 slots are used with 2^12 keys I care not so much for zero collisions as for not existing a slot with depth>some_THRESHOLD, of course the primary goal is to hash the keys at highest possible speed.
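A sketch of how both numbers could be gathered in one pass, assuming the hash values are already computed and the table has 2^bits slots. Collisions are counted as keys minus occupied slots (the same definition as "Distinct WORDs - Number Of Trees" in the function headers above), so for n keys and m < n slots the count always lies between n - m and n - 1. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t occupied;    /* slots holding at least one key */
    size_t collisions;  /* number of keys minus occupied slots */
    size_t max_depth;   /* number of keys in the fullest slot */
} table_stats;

/* `hashes` holds one precomputed hash value per key.
   Returns all-zero stats if the counter array cannot be allocated. */
table_stats measure_table(const uint32_t *hashes, size_t nkeys, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    table_stats s = { 0, 0, 0 };
    size_t nslots = (size_t)1 << bits;
    size_t *depth = calloc(nslots, sizeof *depth);
    if (depth == NULL)
        return s;
    for (size_t i = 0; i < nkeys; ++i) {
        size_t d = ++depth[hashes[i] & (nslots - 1)];
        if (d == 1)
            ++s.occupied;      /* first key in this slot */
        if (d > s.max_depth)
            s.max_depth = d;   /* new deepest slot */
    }
    s.collisions = nkeys - s.occupied;
    free(depth);
    return s;
}
```

With 2^20 slots and 2^24 keys, the ideal `max_depth` would be 16, as in the example above.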
Peter, if you find it useful(in sake for gaining more experience) please include and this one(as 32bit or/and 64bit):
// Number Of Trees(GREATER THE BETTER): 3557181
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18645799
// Caution: big/little endian dependent.
// More instructions but one memory(QWORD) access instead of 8 memory(BYTE) accesses.
1736 int Alfalfa_QWORD(const char *key, int wrdlen)
1737 {
1738     unsigned long long hashQWORD;
1739     // unsigned long iAlfalfa, jAlfalfa;
1740     unsigned long hashAlfalfa = 7;
1741
1742     for(; wrdlen >= 8; wrdlen -= 8, key += 8) {
1743         hashQWORD=*(unsigned long long*)key;
1744     // for(iAlfalfa = 0; iAlfalfa < (wrdlen & -8); iAlfalfa += 8) {
1745     //     hashQWORD=*(unsigned long long *)(&key[iAlfalfa]);
1746         hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>0)&0xFF) ) + ((hashQWORD>>8)&0xFF);
1747         hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>16)&0xFF) ) + ((hashQWORD>>24)&0xFF);
1748         hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>32)&0xFF) ) + ((hashQWORD>>40)&0xFF);
1749         hashAlfalfa = (17+9) * ((17+9) * hashAlfalfa + ((hashQWORD>>48)&0xFF) ) + ((hashQWORD>>56)&0xFF);
1750     }
1751     for(; wrdlen; wrdlen--, key++)
1752         hashAlfalfa = (17+9) * hashAlfalfa + *key;
1753     // for(jAlfalfa = 0; jAlfalfa < (wrdlen & 7); jAlfalfa += 1) {
1754     //     hashAlfalfa = (17+9) * hashAlfalfa + key[iAlfalfa+jAlfalfa];
1755     // }
1756     return ( hashAlfalfa ^ (hashAlfalfa >> 16) ) & 8191;
1757 }
/*
Alfalfa_QWORD PROC
; Line 1737
$LN13:
    sub rsp, 8
; Line 1742
    cmp edx, 8
    mov r10d, edx
    mov r11, rcx
    mov r9d, 7
    jl $LN4@Alfalfa_QW@2
    mov QWORD PTR [rsp], rbx
    mov rbx, r10
    shr rbx, 3
    mov eax, ebx
    neg eax
    lea r10d, DWORD PTR [r10+rax*8]
    npad 4
$LL6@Alfalfa_QW@2:
; Line 1743
    mov rdx, QWORD PTR [r11]
    add r11, 8
; Line 1746
    mov rax, rdx
    shr rax, 8
    movzx r8d, al
; Line 1747
    mov rax, rdx
    shr rax, 16
    movzx ecx, al
    mov rax, rdx
    shr rax, 24
; Line 1749
    imul r8d, 26
    add r8d, ecx
    movzx ecx, al
    mov rax, rdx
    shr rax, 32 ; 00000020H
    imul r8d, 26
    add r8d, ecx
    movzx ecx, al
    mov rax, rdx
    shr rax, 40 ; 00000028H
    imul r8d, 26
    add r8d, ecx
    movzx ecx, al
    mov rax, rdx
    shr rax, 48 ; 00000030H
    imul r8d, 26
    add r8d, ecx
    movzx ecx, al
    movzx eax, dl
    shr rdx, 56 ; 00000038H
    imul r8d, 26
    add r8d, ecx
    imul eax, 558124416 ; 21444d80H
    imul r8d, 26
    sub r8d, eax
    mov eax, r9d
    add r8d, edx
    imul eax, 1626332928 ; 60efdf00H
    mov r9d, r8d
    sub r9d, eax
    sub rbx, 1
    jne $LL6@Alfalfa_QW@2
    mov rbx, QWORD PTR [rsp]
$LN4@Alfalfa_QW@2:
; Line 1751
    test r10d, r10d
    je SHORT $LN1@Alfalfa_QW@2
$LL3@Alfalfa_QW@2:
; Line 1752
    movsx eax, BYTE PTR [r11]
    imul r9d, 26
    inc r11
    add r9d, eax
    sub r10d, 1
    jne SHORT $LL3@Alfalfa_QW@2
$LN1@Alfalfa_QW@2:
; Line 1756
    mov eax, r9d
    shr eax, 16
    xor eax, r9d
    and eax, 8191 ; 00001fffH
; Line 1757
    add rsp, 8
    ret 0
Alfalfa_QWORD ENDP
*/

And also for research purposes the eight-bytes-at-once-FNV-variant:

// Number Of Trees(GREATER THE BETTER): 3526192
// Forest population(Hash Function Quality regarding Collisions i.e. Hash Table Utilization): 53%
// Number Of Hash Collisions(Distinct WORDs - Number Of Trees): 18676788
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
#define FNV1_64_PRIME ((u_int64_t)1099511628211)
#define FNV_64A_OP(hash, octet) \
    (((u_int64_t)(hash) ^ (u_int8_t)(octet)) * FNV1_64_PRIME)
#define FNV_64A_OP64(hash, octet) \
    (((u_int64_t)(hash) ^ (u_int64_t)(octet)) * FNV1_64_PRIME)
int FNV1A_Hash_8_OCTETS(const char *str, int wrdlen)
{
    u_int64_t hash64;
    char *p;
    int wrdlen_OCTUPLET = wrdlen>>3;
    hash64 = FNV1_64_INIT;
    p=str;
    // The goal of stage #1: to reduce number of 'imul's and mainly: the number of loops.
    // Stage #1:
    for (; wrdlen_OCTUPLET != 0; --wrdlen_OCTUPLET) {
        hash64 = FNV_64A_OP64(hash64, (u_int64_t)*(u_int64_t *)p); // SLOWer but with carry
        p=p+8;
    }
    // Stage #2:
    //for (; *p; ++p) {
    //    hash = FNV_32A_OP(hash, *p);
    //}
    for (wrdlen_OCTUPLET = 0; wrdlen_OCTUPLET < (wrdlen & 7); wrdlen_OCTUPLET += 1) {
        hash64 = FNV_64A_OP(hash64, (u_int8_t)*(u_int8_t *)p); // SLOWer but with carry
        ++p;
    }
    // 5*13 = 64+1 or 1*12+4*13 = 64 i.e. shift by 12,25,38,51
    return ( (hash64>>(64-(1*12+0*13))) ^ (hash64>>(64-(1*12+1*13))) ^ (hash64>>(64-(1*12+2*13))) ^ (hash64>>(64-(1*12+3*13))) ^ hash64) & 8191; // SLOWer but with carry
}

Enough, I am stopping with offering more functions.

Now I'd drop these that scored 4 in "short" tests (more than 100% worse than the best for the given test): Alfalfa, Alfalfa_DWORD, Partow, One_At_Time
Second, I can drop the only ones that scored "3" in i5big test: MaPrime2c and Ramakrishna (they are obviously worst in pmsmall test too)
Then, I can drop the only one which never scores 1 in small test: lookup3, it's good since it scores always 2 in i5big and pmbig. That leaves:
Now, using "big" results it's obvious that the winners are Larson, Murmur2 and x65599 (the only three that scored 1 by both big).
Using "small" results, we can't see much. We see that SBox and CRC-32 never score 3 on PM (meaning that the table lookups are really fast there and that the resulting behaviour is good). Otherwise, every function sometimes scores 3. That again can only mean that these measurements were "too fuzzy" to be of too much use. And that leaves open if using short tests is relevant enough, in the form they are now, and if we were fair to drop some function that scored 4 there and were 1 in bigs (Alfalfa, Alfalfa_DWORD).
If the small tests are relevant enough to mean something and if we believe bigs more, Murmur2 is the best, the second is Larson, the third x65599.
However if we look at how complex the functions are to implement, x65599 is the simplest, requiring no fast MUL in the CPU, the second (as it doesn't require alignment) is Larson, the third is Murmur2 which has the highest demands on the hardware.
The conclusion:
1) It seems that x65599 is confirmed to be both very good and simple. Larson is the second simple function. Murmur2 can be better as soon as you have aligned strings and use modern CPU, or if you have some demands not covered enough by all the above tests.
2) The "small" tests in the current form still look "too fuzzy" for my taste to trust them more than as to point "what's not good" (if even so much). I don't know how they can be improved, instead, I'd rather modify or add inputs to "big" tests in order to have more values that we saw that are able to detect the problems in functions tested in "small" (prefix, suffix, numbers, binary numbers).
FNV-1a, Larson, Murmur2, x65599, Alfalfa_HALF, Bernstein, CRC-32, K&R, SBox, Sedgewick
Whichever of these functions you use, the worst that can happen in the tested cases is an occasional slowdown by a factor of 2 compared to the best possible outcome tested; most of the time you can expect to be within a 20% difference. So if you use one of the above functions in scenarios close to the tested ones, you practically don't have to worry whether you selected the right one. A factor-2 difference is not much compared to the much bigger differences (i.e. factors of 100) that can be caused just by allocating too often, for example.
So in my opinion, as long as the practical "speed competition" can't be organized to consistently demonstrate bigger obvious differences, it seems that we shouldn't worry too much about the speed.
The damage is undone, the ugly inconsistency(Mobile vs i5) exterminated, the result for 25bits(very useful testbed hash.cpp Peter, thank you) on Pentium Merom is:
Note for FNV1A_unrolled: choosing a small multiplier (as Mr. Noll advised), namely FNV1_32_PRIME=588411137 becomes 31, and removing all multiplications.
Note for 'Alfalfa_HALF': re-arranged expressions in order to avoid converting (compiler plays badly here) shifts with multiplications.
Note for 'Alfalfa': (17+9) replaced with 53.
Peter, here is Alfalfa's speed-performance on my Pentium Merom with (17+9) changed to 13 and 19 for the Wikipedia-en words test:
13 11,525,803 [2,627,025]
19 11,676,574 [2,292,587]
53 12,093,768 [2,074,883]
and results for x17 unrolled: 11,767,768 [2,410,605]

D:\_KA45F~1\_w>release\hash.exe wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
x17 unrolled: 11877089 11799077 11776420 11776905 11767768| 11767768 [2410605]
FNV1A_unrolled: 12789079 12791080 12816821 12877818 12799565| 12789079 [2191287]
Alfalfa: 12096628 12100191 12093768 12114109 12121803| 12093768 [2074883]
Alfalfa_HALF: 12028618 12013585 12022435 12034921 12092433| 12013585 [2077426]
D:\_KA45F~1\_w>hash_benchmark.bat Sonnets
3228 lines read
8192 elements in the table (13 bits)
x17 unrolled: 603 607 581 581 581| 581 [ 589]
FNV1A_unrolled: 546 519 518 517 535| 517 [ 570]
Alfalfa: 561 556 573 554 555| 554 [ 570]
Alfalfa_HALF: 587 607 577 576 594| 576 [ 543]
D:\_KA45F~1\_w>

FNV1A_unrolled a.k.a. 'FNV1A_Hash_4_OCTETS_MULless' follows:

?HashFNV1A_unrolled@@YAIPBDK@Z PROC ; HashFNV1A_unrolled
; 753 : //const UINT PRIME = 31;
; 754 : UINT hash = 2166136261;
; 755 : const CHAR * p = str;
  00210 8b 54 24 04 mov edx, DWORD PTR _str$[esp-4]
  00214 56 push esi
  00215 57 push edi
; 756 :
; 757 : // Reduce the number of multiplications by unrolling the loop
; 758 : for (SIZE_T ndwords = wrdlen / sizeof(DWORD); ndwords; --ndwords) {
  00216 8b 7c 24 10 mov edi, DWORD PTR _wrdlen$[esp+4]
  0021a 8b f7 mov esi, edi
  0021c c1 ee 02 shr esi, 2
  0021f b9 c5 9d 1c 81 mov ecx, -2128831035 ; 811c9dc5H
  00224 85 f6 test esi, esi
  00226 74 13 je SHORT $LN4@HashFNV1A_
$LL6@HashFNV1A_:
; 759 : //hash = (hash ^ *(DWORD*)p) * PRIME;
; 760 : hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
  00228 8b 02 mov eax, DWORD PTR [edx]
  0022a 33 c1 xor eax, ecx
  0022c 8b c8 mov ecx, eax
  0022e c1 e1 05 shl ecx, 5
  00231 2b c8 sub ecx, eax
; 761 :
; 762 : p += sizeof(DWORD);
  00233 83 c2 04 add edx, 4
  00236 83 ee 01 sub esi, 1
  00239 75 ed jne SHORT $LL6@HashFNV1A_
$LN4@HashFNV1A_:
; 763 : }
; 764 :
; 765 : // Process the remaining bytes
; 766 : for (SIZE_T i = 0; i < (wrdlen & (sizeof(DWORD) - 1)); i++) {
  0023b 83 e7 03 and edi, 3
  0023e 8b f7 mov esi, edi
  00240 76 12 jbe SHORT $LN1@HashFNV1A_
$LL3@HashFNV1A_:
; 767 : //hash = (hash ^ *p++) * PRIME;
; 768 : hash = ((hash ^ *p)<<5) - (hash ^ *p);
  00242 0f be 02 movsx eax, BYTE PTR [edx]
  00245 33 c1 xor eax, ecx
  00247 8b c8 mov ecx, eax
  00249 c1 e1 05 shl ecx, 5
  0024c 2b c8 sub ecx, eax
; 769 : p++;
  0024e 42 inc edx
  0024f 83 ee 01 sub esi, 1
  00252 75 ee jne SHORT $LL3@HashFNV1A_
$LN1@HashFNV1A_:
; 770 : }
; 771 :
; 772 : return (hash>>16) ^ hash;
  00254 8b c1 mov eax, ecx
  00256 c1 e8 10 shr eax, 16 ; 00000010H
  00259 5f pop edi
  0025a 33 c1 xor eax, ecx
  0025c 5e pop esi
; 773 : }
  0025d c3 ret 0
?HashFNV1A_unrolled@@YAIPBDK@Z ENDP ; HashFNV1A_unrolled

Alfalfa_HALF follows:

?HashAlfalfa_HALF@@YAIPBDK@Z PROC ; HashAlfalfa_HALF
; 738 : {
  00260 53 push ebx
  00261 55 push ebp
; 739 : UINT hash = 12;
; 740 : UINT hashBUFFER;
; 741 : SIZE_T i;
; 742 : for(i = 0; i < (wrdlen & -4); i += 4) {
  00262 8b 6c 24 10 mov ebp, DWORD PTR _wrdlen$[esp+4]
  00266 56 push esi
  00267 8b dd mov ebx, ebp
  00269 33 d2 xor edx, edx
  0026b 83 e3 fc and ebx, -4 ; fffffffcH
  0026e 57 push edi
  0026f 8b 7c 24 14 mov edi, DWORD PTR _key$[esp+12]
  00273 b9 0c 00 00 00 mov ecx, 12 ; 0000000cH
  00278 76 44 jbe SHORT $LN4@HashAlfalf
  0027a 8d 9b 00 00 00 00 npad 6
$LL6@HashAlfalf:
; 743 : //hash = (( ((hash<<5)-hash) + key[i] )<<5) - ( ((hash<<5)-hash) + key[i] ) + (key[i+1]);
; 744 : hashBUFFER = ((hash<<5)-hash) + key[i];
; 745 : hash = (( hashBUFFER )<<5) - ( hashBUFFER ) + (key[i+1]);
; 746 : //hash = (( ((hash<<5)-hash) + key[i+2] )<<5) - ( ((hash<<5)-hash) + key[i+2] ) + (key[i+3]);
; 747 : hashBUFFER = ((hash<<5)-hash) + key[i+2];
; 748 : hash = (( hashBUFFER )<<5) - ( hashBUFFER ) + (key[i+3]);
  00280 0f be 34 17 movsx esi, BYTE PTR [edi+edx]
  00284 8b c1 mov eax, ecx
  00286 c1 e0 05 shl eax, 5
  00289 2b c1 sub eax, ecx
  0028b 0f be 4c 17 01 movsx ecx, BYTE PTR [edi+edx+1]
  00290 03 f0 add esi, eax
  00292 8b c6 mov eax, esi
  00294 c1 e0 05 shl eax, 5
  00297 2b c6 sub eax, esi
  00299 03 c1 add eax, ecx
  0029b 8b c8 mov ecx, eax
  0029d c1 e1 05 shl ecx, 5
  002a0 2b c8 sub ecx, eax
  002a2 0f be 44 17 02 movsx eax, BYTE PTR [edi+edx+2]
  002a7 03 c8 add ecx, eax
  002a9 8b c1 mov eax, ecx
  002ab c1 e0 05 shl eax, 5
  002ae 2b c1 sub eax, ecx
  002b0 0f be 4c 17 03 movsx ecx, BYTE PTR [edi+edx+3]
  002b5 83 c2 04 add edx, 4
  002b8 03 c8 add ecx, eax
  002ba 3b d3 cmp edx, ebx
  002bc 72 c2 jb SHORT $LL6@HashAlfalf
$LN4@HashAlfalf:
; 749 : }
; 750 : for(SIZE_T j = 0; j < (wrdlen & 3); j += 1) {
  002be 33 c0 xor eax, eax
  002c0 83 e5 03 and ebp, 3
  002c3 8b f5 mov esi, ebp
  002c5 76 1d jbe SHORT $LN1@HashAlfalf
  002c7 03 d7 add edx, edi
  002c9 8d a4 24 00 00 00 00 npad 7
$LL3@HashAlfalf:
; 751 : hash = ((hash<<5)-hash) + key[i+j];
  002d0 0f be 3c 02 movsx edi, BYTE PTR [edx+eax]
  002d4 8b d9 mov ebx, ecx
  002d6 c1 e3 05 shl ebx, 5
  002d9 2b d9 sub ebx, ecx
  002db 03 fb add edi, ebx
  002dd 40 inc eax
  002de 8b cf mov ecx, edi
  002e0 3b c6 cmp eax, esi
  002e2 72 ec jb SHORT $LL3@HashAlfalf
$LN1@HashAlfalf:
  002e4 5f pop edi
  002e5 5e pop esi
; 752 : }
; 753 : return hash ^ (hash >> 16);
  002e6 8b c1 mov eax, ecx
  002e8 c1 e8 10 shr eax, 16 ; 00000010H
  002eb 5d pop ebp
  002ec 33 c1 xor eax, ecx
  002ee 5b pop ebx
; 754 : }
  002ef c3 ret 0
?HashAlfalfa_HALF@@YAIPBDK@Z ENDP ; HashAlfalfa_HALF

Peter please update(if you find them useful) above functions, they are now finished.

I agree that the difference between, say, SBox and CRC-32 (in software) is negligible, but try to compare Weinberger and Murmur2. Factor 2 difference is noticeable for an end user. Also remember that I excluded some really bad functions from the test (e.g., Two chars). So, the choice of hash function matters, and we should explore different architectures. Fully agree with you about the best function, which is Murmur2; Larson and x65599 are nice, too.
Georgi, the updated FNV1A_unrolled is much better on the Wikipedia test, congratulations! Regarding the slots with maximum collisions: the Red Dragon Book proposes the following formula for evaluating hash function quality:
$$\frac{\sum_{j=0}^{m-1} b_j (b_j + 1) / 2}{(n / 2m)(n + 2m - 1)}$$
where b_j is the number of items in the j-th slot, m is the number of slots, and n is the total number of items. The sum of b_j(b_j + 1) / 2 estimates the number of slots your program should visit to find the required value (I used a counter for the number of visited slots in an earlier version of the benchmark, which is similar). The denominator (n / 2m)(n + 2m − 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula should give 1. In reality, a good function is somewhere between 0.95 and 1.05. If it's more, there is a high number of collisions (slow!). If it's less, the function gives fewer collisions than the randomly distributing function, so AFAIK it's not bad.
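A minimal C sketch of this quality measure, assuming the per-slot item counts have already been collected into an array. This is not the count_quality.py script, and the name `hash_quality` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* b[j] is the number of items in slot j, m is the number of slots.
   Returns the ratio described above: about 1.0 for a function that
   behaves like a random one, larger when there are too many collisions. */
double hash_quality(const size_t *b, size_t m)
{
    assert(b != NULL && m > 0);
    double n = 0.0;        /* total number of items */
    double visited = 0.0;  /* sum of b_j (b_j + 1) / 2 */
    for (size_t j = 0; j < m; ++j) {
        double bj = (double)b[j];
        n += bj;
        visited += bj * (bj + 1.0) / 2.0;
    }
    assert(n > 0.0);       /* the ratio is undefined for an empty table */
    double ideal = (n / (2.0 * (double)m)) * (n + 2.0 * (double)m - 1.0);
    return visited / ideal;
}
```

For two items in two slots, the ratio is 0.8 when they land in different slots and 1.2 when they share one, which shows the direction of the scale on a tiny case.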
Here are the results for some of our functions:
Murmur2 is the best, again. I wrote a Python 3.x script (count_quality.py) for generating such charts.
I also like how it's nicely visible that numbers, prefix and postfix are the best stress tests for a lot of functions.
Regarding "factor 2": it's the fact that your "small" measurements can't point to a single function which stays *always* on top. That's why I rejected those that certainly make more than factor 2 (that score 4) or exhibit problems. The remaining ones make less (score 3 means: 20%-100%) and even that not often -- note that in your PM big test all functions:
Alfalfa, Alfalfa_DWORD, Alfalfa_HALF, Bernstein, CRC-32, K&R, Larson, Murmur2, Partow, Ramakrishna, Sedgewick, x65599, FNV-1a, MaPrime2c, One_At_Time, SBox, lookup3
are not more different than 20% from the best one, and most not more different than 10%! And that's for big inputs. For small inputs, there's practically no chance that you'll notice the difference. For the big inputs too, it's not worth the invested time to consider them: 3 seconds or 2.6 seconds? Who sees that? Most programmers make orders of magnitude bigger obvious overheads than that.
Therefore I like any direction that would either demonstrate more obvious time differences or provide some other relevant criterion for evaluation.
Regarding i5 measurements: I suggest you (as I always did): don't use the micrometer to measure the distance between cities. Produce the tests that take enough time to be able to measure them using independent clock (QueryPerf..) by simply making the test running long enough (don't measure too small inputs, they deceive, or if you do need to measure small inputs do a lot of small different inputs -- like always using a new part of wikipedia words, and the new numbers values etc). Then, if it's still needed, make the explicit overhead of some operation to make the meaningful differences more obvious. For example, make the cost of fetching the values further in the same chain higher (before they are compared with the input value). That's reasonable to expect as soon as your hash table and values in it are much bigger than the CPU cache. See what you get then... Then add allocations -- see if that makes any other differences irrelevant (nice to know) etc...
If you add allocations or some other "overhead", the result will be completely unpredictable, because you will be measuring not only your hash function, but also the allocator performance. I'm moving in another direction: trying to eliminate all side effects on Core i5 (on Pentium M, they are already eliminated).
Very large in-memory hash tables are rarely used in real-world programs. Large databases are typically implemented with B-trees, but I'm interested in small hash tables such as a symbol table in compiler.
Do we see graph for all n? I thought it's for only one n? It would be interesting to see how the results vary as n changes.
> m = 2 × round_to_next_power_of_two(n)
if you target the "real life" scenarios, how do you expect your compiler to know n in advance to determine m? AFAIK compilers typically have fixed m or rehash as little as possible and then you can vary only n. If m is not fixed then you have to add "rehashing" too to your measurements.
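For clarity, the quoted sizing rule can be sketched as below, assuming that "next power of two" means the smallest power of two not less than n (so a power of two maps to itself). `table_size_for` is a hypothetical helper name. Note that it needs n before the table is created, which is exactly the problem raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= n.
   Assumption: an exact power of two is returned unchanged. */
size_t round_to_next_power_of_two(size_t n)
{
    assert(n > 0 && n <= ((size_t)-1 >> 1));
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* m = 2 * round_to_next_power_of_two(n) */
size_t table_size_for(size_t n)
{
    assert(n <= ((size_t)-1 >> 2));  /* keeps the doubling from overflowing */
    return 2 * round_to_next_power_of_two(n);
}
```

This agrees with the benchmark output quoted earlier in the thread: 12561874 lines give 33554432 elements (25 bits), and 3228 lines give 8192 elements (13 bits).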
And what kind of compiler do you target? Modern C++ compilers now have very long (decorated) identifiers and an awful lot of values thanks to libraries and approaches like boost. All Win32 and framework headers were also quite big before, but some orders of magnitude smaller. It makes sense to measure for a lot of values. Long ones too.
> trying to eliminate all side effects on Core i5
I'd just make the test to last at least a few seconds of calculation for each measurement -- if I'd measure for small tables, I'd feed them with various inputs (now with the big set of words, there's enough to use, for every pass the new ones!) and measure the total time. I'm quite sure that gives much more repeatable results, and it can show better differences too. It's simple: if you use too little to measure you measure more noise than the signal. And of course I wouldn't measure the ticks but the "real" time between the start and the end of every such calculation.
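A rough sketch of that kind of measurement: hash the whole key set many times and time the total, instead of timing single calls. It uses the standard `clock()`, which reports processor time, only as a portable stand-in; an independent wall clock can be substituted. `hash_kr` is the K&R function from the article (initial value 0, multiplier 31); the other names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*hash_fn)(const char *key, size_t len);

typedef struct {
    double seconds;     /* total time for all passes */
    uint32_t checksum;  /* sum of all hash values, keeps the calls alive */
} timing_result;

/* K&R multiplicative hash, used here only as an example function to time. */
uint32_t hash_kr(const char *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash = 31u * hash + (unsigned char)key[i];
    return hash;
}

/* Hashes every key `passes` times and measures the total, not single calls. */
timing_result time_hash_passes(hash_fn fn, const char *const *keys,
                               const size_t *lens, size_t nkeys, size_t passes)
{
    assert(fn != NULL && keys != NULL && lens != NULL);
    timing_result r = { 0.0, 0 };
    clock_t start = clock();
    for (size_t pass = 0; pass < passes; ++pass)
        for (size_t i = 0; i < nkeys; ++i)
            r.checksum += fn(keys[i], lens[i]);
    clock_t end = clock();
    r.seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return r;
}
```

The number of passes would be chosen so that one measurement lasts a few seconds; feeding a different slice of the word list on every pass, as suggested above, is left out of this sketch.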
Ace,
- 'HashFNV1A_unrolled' is a pure FNV-1A variant;
- they are different, don't you see the XOR, generally I don't read enough, that is I like to experience the things myself, so a pseudo-plagiarism is possible.
HashKernighanRitchie:
hash = 31 * hash + key[i];
HashFNV1A_unrolled:
hash = ((hash ^ *(DWORD*)str)<<5) - (hash ^ *(DWORD*)str);
And whether it is 15, 17, 31 or 33, the choice is a kind of gambling.
The first improved function is 'FNV1A_Hash_WHIZ': FNV1A_Hash_4_OCTETS with FNV1_32_PRIME = 1607 or a nearby value such as 1579.
The second improved function is 'FNV1A_unrolled_Final': the post/second cycle of 'FNV1A_unrolled', just unrolled.
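To make the difference between the two lines above concrete, here is a small sketch. The function names are invented for this illustration. It shows that the shift-and-subtract form is simply a multiplication by 31 modulo 2^32, and that the real difference from the K&R step is the order of operations (XOR a whole DWORD before multiplying, instead of adding one byte after multiplying):

```c
#include <stdint.h>

/* One round in the K&R style: multiply by 31, then add the next byte. */
uint32_t step_kr(uint32_t hash, uint8_t c)
{
    return 31u * hash + c;
}

/* One round in the FNV-1a style: XOR the input first,
   then multiply by 31 written as (x << 5) - x. */
uint32_t step_xor_shift(uint32_t hash, uint32_t chunk)
{
    uint32_t x = hash ^ chunk;
    return (x << 5) - x;
}

/* The same round with an explicit multiplication;
   the result is identical modulo 2^32. */
uint32_t step_xor_mul(uint32_t hash, uint32_t chunk)
{
    return (hash ^ chunk) * 31u;
}
```

Whether the compiler emits shl/sub or imul for either form depends on the compiler and the constant, so writing the shift by hand is a hint rather than a guarantee.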
'FNV1A_Hash_WHIZ' follows:
#define FNV1_32_INIT ((UINT)2166136261)
#define FNV1_32_PRIME ((UINT)1607)
#define FNV_32A_OP(hash, octet) \
(((UINT)(hash) ^ (unsigned char)(octet)) * FNV1_32_PRIME)
#define FNV_32A_OP32(hash, octet) \
(((UINT)(hash) ^ (UINT)(octet)) * FNV1_32_PRIME)
UINT FNV1A_Hash_WHIZ(const char *str, SIZE_T wrdlen)
{
UINT hash32;
const char *p;
hash32 = FNV1_32_INIT;
p=str;
for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
hash32 = FNV_32A_OP32(hash32, (UINT)*(UINT *)p);
}
if (wrdlen & -2) {
hash32 = FNV_32A_OP32(hash32, *(UINT*)p&0xFFFF);
p++;p++;
}
if (wrdlen & 1)
hash32 = FNV_32A_OP(hash32, *p);
return hash32 ^ (hash32 >> 16);
}
'FNV1A_unrolled_Final' follows:
UINT HashFNV1A_unrolled_Final(const CHAR *str, SIZE_T wrdlen)
{
//const UINT PRIME = 31;
UINT hash = 2166136261;
const CHAR * p = str;
/*
// Reduce the number of multiplications by unrolling the loop
for (SIZE_T ndwords = wrdlen / sizeof(DWORD); ndwords; --ndwords) {
//hash = (hash ^ *(DWORD*)p) * PRIME;
hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
p += sizeof(DWORD);
}
*/
for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
}
// Process the remaining bytes
/*
for (SIZE_T i = 0; i < (wrdlen & (sizeof(DWORD) - 1)); i++) {
//hash = (hash ^ *p++) * PRIME;
hash = ((hash ^ *p)<<5) - (hash ^ *p);
p++;
}
*/
if (wrdlen & -2) {
hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
p++;p++;
}
if (wrdlen & 1)
hash = ((hash ^ *p)<<5) - (hash ^ *p);
return (hash>>16) ^ hash;
}
I also made a new FNV-1A derivative hasher targeted at case-insensitive Latin letters. Perhaps it is not useful for anything else, but the idea of doing ONE multiplication (shift + subtraction) per 6 characters is worth thinking about, do you agree?
At first the name was 'Sixtine', a la S. King's 'Christine' monster, but I also wanted the name to recall the case insensitivity.
// Tuned for lowercase-and-uppercase letters i.e. 26 ASCII symbols 65-90 and 97-122 decimal.
UINT Sixtinsensitive(const CHAR *str, SIZE_T wrdlen)
{
UINT hash = 2166136261;
UINT hashBUFFER_EAX, hashBUFFER_BH, hashBUFFER_BL;
const CHAR * p = str;
// Ox41 = 065 'A' 010 [0 0001]
// Ox5A = 090 'Z' 010 [1 1010]
// Ox61 = 097 'a' 011 [0 0001]
// Ox7A = 122 'z' 011 [1 1010]
// Reduce the number of multiplications by unrolling the loop
for(; wrdlen >= 6; wrdlen -= 6, p += 6) {
//hashBUFFER_AX = (*(DWORD*)(p+0)&0xFFFF);
hashBUFFER_EAX = (*(DWORD*)(p+0)&0x1F1F1F1F);
hashBUFFER_BL = (*(p+4)&0x1F);
hashBUFFER_BH = (*(p+5)&0x1F);
//6bytes-in-4bytes or 48bits-to-30bits
// Two times next:
//3bytes-in-2bytes or 24bits-to-15bits
//EAX BL BH
//[5bit][3bit][5bit][3bit][5bit][3bit][5bit][3bit]
// 5th[0..15] 13th[0..15]
// BL lower 3 BL higher 2bits
// OR or XOR no difference
hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x07)<<5); // BL lower 3bits of 5bits
hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x18)<<(2+8)); // BL higher 2bits of 5bits
hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x07)<<(5+16)); // BH lower 3bits of 5bits
hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x18)<<((2+8)+16)); // BH higher 2bits of 5bits
//hash = (hash ^ hashBUFFER_EAX)*1607; //What a mess: <<7 becomes imul but <<5 not!?
hash = ((hash ^ hashBUFFER_EAX)<<5) - (hash ^ hashBUFFER_EAX);
//1607:[2118599]
// 127:[2121081]
// 31:[2139242]
// 17:[2150803]
// 7:[2166336]
// 5:[2183044]
//8191:[2200477]
// 3:[2205095]
// 257:[2206188]
}
// Post-Variant #1:
for(; wrdlen; wrdlen--, p++) {
hash = ((hash ^ (*p&0x1F))<<5) - (hash ^ (*p&0x1F));
}
/*
// Post-Variant #2:
for(; wrdlen >= 2; wrdlen -= 2, p += 2) {
hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
}
if (wrdlen & 1)
hash = ((hash ^ *p)<<5) - (hash ^ *p);
*/
/*
// Post-Variant #3:
for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
}
if (wrdlen & -2) {
hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
p++;p++;
}
if (wrdlen & 1)
hash = ((hash ^ *p)<<5) - (hash ^ *p);
*/
return (hash>>16) ^ hash;
}
?Sixtinsensitive@@YAIPBDK@Z PROC ; Sixtinsensitive
; 891 : {
00300 53 push ebx
00301 55 push ebp
; 892 : UINT hash = 2166136261;
; 893 : UINT hashBUFFER_EAX, hashBUFFER_BH, hashBUFFER_BL;
; 894 : const CHAR * p = str;
; 895 :
; 896 : // Ox41 = 065 'A' 010 [0 0001]
; 897 : // Ox5A = 090 'Z' 010 [1 1010]
; 898 : // Ox61 = 097 'a' 011 [0 0001]
; 899 : // Ox7A = 122 'z' 011 [1 1010]
; 900 :
; 901 : // Reduce the number of multiplications by unrolling the loop
; 902 : for(; wrdlen >= 6; wrdlen -= 6, p += 6) {
00302 8b 6c 24 10 mov ebp, DWORD PTR _wrdlen$[esp+4]
00306 57 push edi
00307 8b 7c 24 10 mov edi, DWORD PTR _str$[esp+8]
0030b bb c5 9d 1c 81 mov ebx, -2128831035 ; 811c9dc5H
00310 83 fd 06 cmp ebp, 6
00313 72 5c jb SHORT $LN4@Sixtinsens
00315 b8 ab aa aa aa mov eax, -1431655765 ; aaaaaaabH
0031a f7 e5 mul ebp
0031c c1 ea 02 shr edx, 2
0031f 56 push esi
$LL6@Sixtinsens:
; 903 : //hashBUFFER_AX = (*(DWORD*)(p+0)&0xFFFF);
; 904 : hashBUFFER_EAX = (*(DWORD*)(p+0)&0x1F1F1F1F);
; 905 : hashBUFFER_BL = (*(p+4)&0x1F);
; 906 : hashBUFFER_BH = (*(p+5)&0x1F);
00320 0f be 77 05 movsx esi, BYTE PTR [edi+5]
00324 0f be 4f 04 movsx ecx, BYTE PTR [edi+4]
00328 83 e6 1f and esi, 31 ; 0000001fH
; 907 : //6bytes-in-4bytes or 48bits-to-30bits
; 908 : // Two times next:
; 909 : //3bytes-in-2bytes or 24bits-to-15bits
; 910 : //EAX BL BH
; 911 : //[5bit][3bit][5bit][3bit][5bit][3bit][5bit][3bit]
; 912 : // 5th[0..15] 13th[0..15]
; 913 : // BL lower 3 BL higher 2bits
; 914 : // OR or XOR no difference
; 915 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x07)<<5); // BL lower 3bits of 5bits
; 916 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BL&0x18)<<(2+8)); // BL higher 2bits of 5bits
; 917 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x07)<<(5+16)); // BH lower 3bits of 5bits
; 918 : hashBUFFER_EAX = hashBUFFER_EAX ^ ((hashBUFFER_BH&0x18)<<((2+8)+16)); // BH higher 2bits of 5bits
; 919 : //hash = (hash ^ hashBUFFER_EAX)*1607; //What a mess: <<7 becomes imul but <<5 not!?
; 920 : hash = ((hash ^ hashBUFFER_EAX)<<5) - (hash ^ hashBUFFER_EAX);
0032b 8b c6 mov eax, esi
0032d 83 e0 18 and eax, 24 ; 00000018H
00330 c1 e0 05 shl eax, 5
00333 83 e1 1f and ecx, 31 ; 0000001fH
00336 83 e6 07 and esi, 7
00339 33 c6 xor eax, esi
0033b 8b f1 mov esi, ecx
0033d c1 e0 0b shl eax, 11 ; 0000000bH
00340 83 e1 07 and ecx, 7
00343 83 e6 18 and esi, 24 ; 00000018H
00346 33 c6 xor eax, esi
00348 c1 e0 05 shl eax, 5
0034b 33 c1 xor eax, ecx
0034d 8b 0f mov ecx, DWORD PTR [edi]
0034f 81 e1 1f 1f 1f
1f and ecx, 522133279 ; 1f1f1f1fH
00355 c1 e0 05 shl eax, 5
00358 33 c1 xor eax, ecx
0035a 33 c3 xor eax, ebx
0035c 8b c8 mov ecx, eax
0035e c1 e1 05 shl ecx, 5
00361 2b c8 sub ecx, eax
00363 83 ed 06 sub ebp, 6
00366 83 c7 06 add edi, 6
00369 83 ea 01 sub edx, 1
0036c 8b d9 mov ebx, ecx
0036e 75 b0 jne SHORT $LL6@Sixtinsens
00370 5e pop esi
$LN4@Sixtinsens:
; 921 : //1607:[2118599]
; 922 : // 127:[2121081]
; 923 : // 31:[2139242]
; 924 : // 17:[2150803]
; 925 : // 7:[2166336]
; 926 : // 5:[2183044]
; 927 : //8191:[2200477]
; 928 : // 3:[2205095]
; 929 : // 257:[2206188]
; 930 : }
; 931 : // Post-Variant #1:
; 932 : for(; wrdlen; wrdlen--, p++) {
00371 85 ed test ebp, ebp
00373 74 17 je SHORT $LN1@Sixtinsens
$LL3@Sixtinsens:
; 933 : hash = ((hash ^ (*p&0x1F))<<5) - (hash ^ (*p&0x1F));
00375 0f be 07 movsx eax, BYTE PTR [edi]
00378 83 e0 1f and eax, 31 ; 0000001fH
0037b 33 c3 xor eax, ebx
0037d 8b d0 mov edx, eax
0037f c1 e2 05 shl edx, 5
00382 2b d0 sub edx, eax
00384 4d dec ebp
00385 47 inc edi
00386 8b da mov ebx, edx
00388 85 ed test ebp, ebp
0038a 75 e9 jne SHORT $LL3@Sixtinsens
$LN1@Sixtinsens:
; 934 : }
; 935 : /*
; 936 : // Post-Variant #2:
; 937 : for(; wrdlen >= 2; wrdlen -= 2, p += 2) {
; 938 : hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
; 939 : }
; 940 : if (wrdlen & 1)
; 941 : hash = ((hash ^ *p)<<5) - (hash ^ *p);
; 942 : */
; 943 : /*
; 944 : // Post-Variant #3:
; 945 : for(; wrdlen >= 4; wrdlen -= 4, p += 4) {
; 946 : hash = ((hash ^ *(DWORD*)p)<<5) - (hash ^ *(DWORD*)p);
; 947 : }
; 948 : if (wrdlen & -2) {
; 949 : hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
; 950 : p++;p++;
; 951 : }
; 952 : if (wrdlen & 1)
; 953 : hash = ((hash ^ *p)<<5) - (hash ^ *p);
; 954 : */
; 955 : return (hash>>16) ^ hash;
0038c 8b c3 mov eax, ebx
0038e 5f pop edi
0038f c1 e8 10 shr eax, 16 ; 00000010H
00392 5d pop ebp
00393 33 c3 xor eax, ebx
00395 5b pop ebx
; 956 : }
00396 c3 ret 0
?Sixtinsensitive@@YAIPBDK@Z ENDP ; Sixtinsensitive
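The core idea of the function above (six letters squeezed into 30 bits, then one multiplication) can be sketched in a simpler form. Note that this is NOT the bit layout used in Sixtinsensitive, which tucks the 5th and 6th letters into the unused 3-bit gaps of the DWORD; here the six 5-bit codes are simply placed one after another. The names pack6_case_insensitive and mix6 are hypothetical.

```c
#include <stdint.h>

/* Pack six letters into 30 bits, 5 bits per letter.
   Masking with 0x1F drops the case bit, so 'A' (0x41) and 'a' (0x61) both become 1.
   Simplified layout for illustration, not the one used in Sixtinsensitive. */
uint32_t pack6_case_insensitive(const char *p)
{
    uint32_t packed = 0;
    for (int i = 0; i < 6; ++i)
        packed |= (uint32_t)((unsigned char)p[i] & 0x1Fu) << (5 * i);
    return packed;
}

/* One FNV-1a-style round over the packed value:
   XOR, then multiply by 31 written as (x << 5) - x. */
uint32_t mix6(uint32_t hash, const char *p)
{
    uint32_t x = hash ^ pack6_case_insensitive(p);
    return (x << 5) - x;
}
```

The mask also merges any other characters that share their low five bits (for example '1' and 'q'), so this kind of packing is only reasonable when the keys are known to consist of letters.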
Still, I am not happy with the speed of 'Sixtinsensitive', which is about (12445829 - 11075601) / 11075601 * 100% = 12.3% slower than 'FNV1A_unrolled_Final'. If you are interested in boosting such an approach, I will be glad.
D:\_KAZE_new-stuff\_w>hash.exe wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
Sixtinsensitive: 12554437 12465228 12466856 12445829 12447431| 12445829 [2139242]
x17 unrolled: 11927763 11908183 11928229 11923448 11929265| 11908183 [2410605]
FNV1A_Hash_WHIZ: 10642905 10647391 10631697 10633363 10626230| 10626230 [2189360] // FNV1_32_PRIME = 1607
FNV1A_Hash_WHIZ: 10567474 10566509 10587741 10604073 10563141| 10563141 [2144749] // FNV1_32_PRIME = 1999
FNV1A_Hash_WHIZ: 10535248 10555186 10514388 10517062 10518655| 10514388 [2154569] // FNV1_32_PRIME = 1579
FNV1A_unrolled_Final: 11101280 11084731 11105538 11075601 11085023| 11075601 [2252381]
FNV1A_unrolled: 11212976 11226149 11236817 11188433 11262786| 11188433 [2191287]
Alfalfa: 11837124 11864087 11835669 11856730 11841964| 11835669 [2074883]
Alfalfa_HALF: 11978432 11984171 11955814 11966588 11961493| 11955814 [2077426]
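Each row above lists five timings, then the smallest of them after the bar, then a number in brackets (apparently the collision count). The 12.3% figure quoted earlier compares two of those minimums. A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <stddef.h>

/* Smallest of n timings (n >= 1), as in the column after the '|' bar. */
unsigned long best_of(const unsigned long *ticks, size_t n)
{
    unsigned long best = ticks[0];
    for (size_t i = 1; i < n; ++i)
        if (ticks[i] < best)
            best = ticks[i];
    return best;
}

/* How much slower `slow` is than `fast`, in percent of `fast`. */
double percent_slower(unsigned long slow, unsigned long fast)
{
    return ((double)slow - (double)fast) / (double)fast * 100.0;
}
```

Taking the minimum rather than the mean is a common way to discard runs disturbed by other processes; it says nothing about the spread, which is visible only in the five raw columns.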
D:\_KAZE_new-stuff\_w>hash_benchmark.bat
Words
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 99 91 90 90 89| 89 [ 105]
x17 unrolled: 97 92 91 118 91| 91 [ 109]
FNV1A_Hash_WHIZ: 90 84 82 82 81| 81 [ 124]
FNV1A_unrolled_Final: 87 80 79 78 78| 78 [ 115]
FNV1A_unrolled: 90 83 81 80 80| 80 [ 106]
Alfalfa: 94 88 87 86 87| 86 [ 100]
Alfalfa_HALF: 98 111 91 91 90| 90 [ 97]
Win32
1992 lines read
4096 elements in the table (12 bits)
Sixtinsensitive: 524 512 534 510 509| 509 [ 409]
x17 unrolled: 592 617 588 588 589| 588 [ 414]
FNV1A_Hash_WHIZ: 464 437 436 437 456| 436 [ 418]
FNV1A_unrolled_Final: 434 425 424 423 449| 423 [ 414]
FNV1A_unrolled: 433 426 425 424 444| 424 [ 404]
Alfalfa: 573 569 568 587 568| 568 [ 411]
Alfalfa_HALF: 574 587 566 566 565| 565 [ 428]
Numbers
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 66 65 65 65 66| 65 [ 206]
x17 unrolled: 48 48 48 48 48| 48 [ 24]
FNV1A_Hash_WHIZ: 52 51 50 50 50| 50 [ 304]
FNV1A_unrolled_Final: 69 69 69 68 69| 68 [ 420]
FNV1A_unrolled: 70 69 68 68 68| 68 [ 420]
Alfalfa: 45 45 45 45 45| 45 [ 160]
Alfalfa_HALF: 54 53 53 53 53| 53 [ 288]
Prefix
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 165 162 162 162 162| 162 [ 106]
x17 unrolled: 205 204 204 204 204| 204 [ 113]
FNV1A_Hash_WHIZ: 129 146 128 128 128| 128 [ 100]
FNV1A_unrolled_Final: 123 122 122 122 122| 122 [ 101]
FNV1A_unrolled: 125 124 124 123 123| 123 [ 116]
Alfalfa: 199 219 200 200 200| 199 [ 104]
Alfalfa_HALF: 200 199 199 199 199| 199 [ 115]
Postfix
500 lines read
1024 elements in the table (10 bits)
Sixtinsensitive: 164 161 160 160 160| 160 [ 116]
x17 unrolled: 201 200 200 200 200| 200 [ 102]
FNV1A_Hash_WHIZ: 127 126 125 126 125| 125 [ 109]
FNV1A_unrolled_Final: 122 120 120 120 120| 120 [ 110]
FNV1A_unrolled: 120 119 263 118 118| 118 [ 102]
Alfalfa: 198 198 197 197 197| 197 [ 116]
Alfalfa_HALF: 198 197 196 302 196| 196 [ 111]
Variables
1842 lines read
4096 elements in the table (12 bits)
Sixtinsensitive: 433 416 415 415 436| 415 [ 374]
x17 unrolled: 438 434 433 451 433| 433 [ 368]
FNV1A_Hash_WHIZ: 369 361 358 360 383| 358 [ 353]
FNV1A_unrolled_Final: 363 354 353 352 353| 352 [ 352]
FNV1A_unrolled: 367 360 359 357 358| 357 [ 341]
Alfalfa: 440 415 413 413 413| 413 [ 343]
Alfalfa_HALF: 462 438 437 435 436| 435 [ 396]
Sonnets
3228 lines read
8192 elements in the table (13 bits)
Sixtinsensitive: 611 587 587 586 586| 586 [ 542]
x17 unrolled: 618 586 585 585 585| 585 [ 589]
FNV1A_Hash_WHIZ: 523 514 513 514 512| 512 [ 555]
FNV1A_unrolled_Final: 527 519 517 516 517| 516 [ 582]
FNV1A_unrolled: 531 525 522 523 523| 522 [ 570]
Alfalfa: 562 555 555 555 554| 554 [ 570]
Alfalfa_HALF: 589 579 579 577 578| 577 [ 543]
UTF-8
13408 lines read
32768 elements in the table (15 bits)
Sixtinsensitive: 2885 2813 2789 2793 2807| 2789 [ 2414]
x17 unrolled: 2910 2910 2889 2896 2915| 2889 [ 2392]
FNV1A_Hash_WHIZ: 2493 2492 2501 2473 2476| 2473 [ 2403]
FNV1A_unrolled_Final: 2479 2474 2494 2471 2483| 2471 [ 2446]
FNV1A_unrolled: 2508 2520 2547 2693 2470| 2470 [ 2421]
Alfalfa: 2735 2726 2725 2743 2761| 2725 [ 2415]
Alfalfa_HALF: 2927 2892 2900 2943 2933| 2892 [ 2445]
IPv4
3925 lines read
8192 elements in the table (13 bits)
Sixtinsensitive: 543 526 526 525 526| 525 [ 1443]
x17 unrolled: 444 444 443 444 443| 443 [ 829]
FNV1A_Hash_WHIZ: 397 396 394 394 395| 394 [ 1404]
FNV1A_unrolled_Final: 394 395 395 394 394| 394 [ 1419]
FNV1A_unrolled: 400 403 404 402 401| 400 [ 1419]
Alfalfa: 405 419 405 403 403| 403 [ 728]
Alfalfa_HALF: 434 433 432 432 436| 432 [ 813]
D:\_KAZE_new-stuff\_w>
The next 4 keys(words/phrases/sentences/paragraphs) occupy one slot, and the 'Sixtinsensitive' speed performance boosts as the key length increases:
pneumonoultramicroscopicsilicovolcanoconiosis: "A facetious word alleged to mean 'a lung disease caused by the inhalation of very fine silica dust' but occurring chiefly as an instance of a very long word." [OED] Online Etymology Dictionary, (c) 2010 Douglas Harper
Pneumonoultramicroscopicsilicovolcanoconiosis: "A Facetious Word Alleged To Mean 'a Lung Disease Caused By The Inhalation Of Very Fine Silica Dust' But Occurring Chiefly As An Instance Of A Very Long Word." [Oed] Online Etymology Dictionary, (C) 2010 Douglas Harper
PNEUMONOULTRAMICROSCOPICSILICOVOLCANOCONIOSIS: "A FACETIOUS WORD ALLEGED TO MEAN 'A LUNG DISEASE CAUSED BY THE INHALATION OF VERY FINE SILICA DUST' BUT OCCURRING CHIEFLY AS AN INSTANCE OF A VERY LONG WORD." [OED] ONLINE ETYMOLOGY DICTIONARY, (C) 2010 DOUGLAS HARPER
pneumonoultramicroscopicsilicovolcanoconiosis: "a facetious word alleged to mean 'a lung disease caused by the inhalation of very fine silica dust' but occurring chiefly as an instance of a very long word." [oed] online etymology dictionary, (c) 2010 douglas harper
D:\_KAZE_new-stuff\_w>hash.exe 4keys.txt
4 lines read
8 elements in the table (3 bits)
Sixtinsensitive: 7 6 6 6 6| 6 [ 3]
x17 unrolled: 10 10 10 10 10| 10 [ 0]
FNV1A_Hash_WHIZ: 5 5 5 5 5| 5 [ 1]
FNV1A_unrolled_Final: 4 4 4 4 4| 4 [ 0]
FNV1A_unrolled: 4 4 4 4 4| 4 [ 0]
Alfalfa: 10 10 10 10 10| 10 [ 0]
Alfalfa_HALF: 10 10 10 10 10| 10 [ 1]
D:\_KAZE_new-stuff\_w>
Once again, it would bring joy if 'Sixtinsensitive' could outperform 'FNV1A_unrolled'. Personally, I don't get the above result: my expectation was for 'Sixtinsensitive' to whiz with such long keys. Obviously, some things must be improved.
Georgi, here are the results on Pentium M:
This code:
if (wrdlen & -2) {
hash = ((hash ^ (*(DWORD*)p&0xFFFF))<<5) - (hash ^ (*(DWORD*)p&0xFFFF));
p++;p++;
}
should be:
if (wrdlen & sizeof(WORD)) {
hash = ((hash ^ *(WORD*)p)<<5) - (hash ^ *(WORD*)p);
p += sizeof(WORD);
}
With *(DWORD*), a buffer overrun is possible at the end of a memory page. Also note wrdlen & 2 instead of wrdlen & -2.
Generally, Whiz is one of the fastest hash functions at this moment :) It proves that a simpler hash function can be faster despite a higher number of collisions. Thank you very much for your contribution!
Dummy me, you are right: 'if (wrdlen & -2) {' becomes 'if (wrdlen & 2) {'.
The idea was/is to cover the four cases (0, 1, 2, 3) with two IFs.
As for 'With *(DWORD*), a buffer overrun is possible at the end of a memory page.' I knew about it but was fooled by assembly code generated by VS2010 which translates it to a word access:
; 792 : hash32 = FNV_32A_OP32(hash32, *(UINT*)p&0xFFFF);
00360 0f b7 30 movzx esi, WORD PTR [eax]
00363 33 f1 xor esi, ecx
00365 69 f6 47 06 00
00 imul esi, 1607 ; 00000647H
0036b 8b ce mov ecx, esi
In fact, I still don't know how to operate directly on words in C (two bytes; unsigned short int may be 32-bit instead of 16-bit, yes?). I was eager to share the functions, and I'm glad that you fixed the bugs.
But one mystery (caused by alignment!?) remains: why do the Wikipedia results on my Pentium Merom show a very different ranking?
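On the question of 16-bit words: unsigned short is only guaranteed to be at least 16 bits wide, while uint16_t from <stdint.h> is exactly 16 bits wherever it exists. One portable way to read such a value is memcpy into a fixed-width variable. The sketch below applies that to a Whiz-style hash; it is an illustration, not the code used in the benchmark, and the names load_u16, load_u32 and whiz_safe_tail are made up. On little-endian x86 it should produce the same values as the DWORD-and-mask version, without touching memory past the end of the key.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load exactly 2 or 4 bytes without casting the pointer. Compilers
   typically turn these memcpy calls into a single load instruction. */
uint16_t load_u16(const char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
uint32_t load_u32(const char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }

#define WHIZ_INIT  2166136261u
#define WHIZ_PRIME 1607u

/* FNV1A_Hash_WHIZ-style hash that never reads past str + len. */
uint32_t whiz_safe_tail(const char *str, size_t len)
{
    uint32_t hash = WHIZ_INIT;
    const char *p = str;

    for (; len >= 4; len -= 4, p += 4)        /* 4 bytes per multiplication */
        hash = (hash ^ load_u32(p)) * WHIZ_PRIME;
    if (len & 2) {                            /* 2 or 3 bytes left */
        hash = (hash ^ load_u16(p)) * WHIZ_PRIME;
        p += 2;
    }
    if (len & 1)                              /* 1 or 3 bytes left */
        hash = (hash ^ (unsigned char)*p) * WHIZ_PRIME;
    return hash ^ (hash >> 16);
}
```

The loaded values are in native byte order, so the hash of a given key differs between little-endian and big-endian machines, just as with the pointer-cast version.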
I've just run the Wikipedia test on Core i5. Whiz is very fast.
Unfortunately, it's slower on Pentium M. I repeated the test twice on both processors, and the results don't change. Most likely, there is some unlucky difference between microarchitectures, but we should optimize for the newer processors (your Merom and my Lynnfield).
I have some high hopes(due to cheap lowercasing and 4+2 granularity) for 'Sixtinsensitive', already incorporated in Leprechaun with very consistent distribution regardless of FNV1_32_PRIME = 3,5,7,17,31,127,257,8191.
Isn't the reason for your observations the fact that your source strings simply aren't aligned anymore? That's why Murmur and anything that uses "dwords" should be penalized on all architectures that are affected by alignment. I believe if you tried with MIPS or ARM you'd also see the problems of unaligned access. But as you also see now, it's very convenient to have strings not starting on aligned address (you can omit a lot of allocations, copying and also reduce the memory needs).
But you also know my general opinion: small differences in speed (under 10 or 20%, and with changes across the platform) aren't too relevant, there's got to be some more dramatic demonstration of superiority of some function to some other one in order to claim a really "relevant" winner, and the winner should be "overall good" not only "good on the CPU I have at the moment". I'd always prefer to use the function that is "near the best" on more platforms to the function "the best on one CPU and bad on others."
Ace, the strings were not aligned in the earlier version of the program. There was an arena allocator: a large char array and a (not aligned) pointer into this char array, which was increased as the lines were read from file.
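For readers who want to reproduce the aligned and unaligned variants, an arena of that kind could look roughly like this. It is a guess at the general shape, not the benchmark's actual allocator; Arena and arena_store are hypothetical names, and the align parameter is an addition that switches between the two layouts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bump ("arena") allocator for key strings. */
typedef struct {
    char  *base;   /* start of the backing buffer */
    size_t used;   /* bytes handed out so far */
    size_t size;   /* total capacity in bytes */
} Arena;

/* Copy len bytes of key plus a terminating zero into the arena.
   align must be a power of two: 1 packs the strings back to back,
   4 starts every string on a DWORD boundary (assuming base is aligned).
   Returns NULL when the arena is full. */
char *arena_store(Arena *a, const char *key, size_t len, size_t align)
{
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || len + 1 > a->size - start)
        return NULL;
    char *dst = a->base + start;
    memcpy(dst, key, len);
    dst[len] = '\0';
    a->used = start + len + 1;
    return dst;
}
```

With align = 1 most strings start at an odd or otherwise unaligned address, which is the situation that penalizes functions reading a WORD or DWORD at a time on some CPUs.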
I agree that the winner should be overall good, having near the best speed and good number of collisions in all tests. At this time, Murmur2 is the best; hardware-accelerated CRC may be used if the target CPU is known to support it.
[Subnotebook Toshiba PC:]
Number of cores 1 (max 1)
Number of threads 2 (max 2)
Name Intel Atom N450
Codename Pineview-N
Specification Intel(R) Atom(TM) CPU N450 @ 1.66GHz
Package (platform ID) Socket 437 FCBGA8 (0x2)
CPUID 6.C.A
Extended CPUID 6.1C
Core Stepping B0
Technology 45 nm
Core Speed 1662.7 MHz
Multiplier x FSB 10.0 x 166.3 MHz
Rated Bus speed 665.1 MHz
Stock frequency 1666 MHz
Instructions sets MMX, SSE, SSE2, SSE3, SSSE3, EM64T
L1 Data cache 24 KBytes, 6-way set associative, 64-byte line size
L1 Instruction cache 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 512 KBytes, 8-way set associative, 64-byte line size
Channels Single
CAS# latency (CL) 5.0
RAS# to CAS# delay (tRCD) 5
RAS# Precharge (tRP) 5
Cycle Time (tRAS) 13
Row Refresh Cycle Time (tRFC) 44
Command Rate (CR) 2T
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Test\hash>dir
11/07/2010 06:47 AM 84,992 hash.exe
10/27/2010 03:48 PM 146,973,879 wikipedia-en-html.tar.wrd
C:\Test\hash>hash wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
Sixtinsensitive+: 16153226 16031348 16039844 16048697 16026421| 16026421 [2251734] !FASTEST!
Hash_Sixtinsensitive: 16501364 16504662 16506868 16509768 16515127| 16501364 [2139242]
Whiz: 16081498 16080691 16080162 16079881 16085656| 16079881 [2189360] !Second-to-FASTEST!
Bernstein: 16222890 16208653 16195534 16205258 16207576| 16195534 [2074237]
K&R: 16181141 16153089 16147347 16144145 16144279| 16144145 [2083145]
x17 unrolled: 16196693 16200720 16203124 16195429 16196980| 16195429 [2410605]
x65599: 17213371 17215886 17211092 17225140 17218658| 17211092 [2102893]
FNV-1a: 17648084 17649994 17644643 17654437 17650937| 17644643 [2081195]
Sedgewick: 17715483 17711538 17706339 17709057 17711857| 17706339 [2080640]
Weinberger: 19795514 19798517 19792040 19783954 19796407| 19783954 [3541181]
Paul Larson: 16904068 16909317 16906315 16905215 16899096| 16899096 [2080111]
Paul Hsieh: 16731679 16720594 16932967 16731003 16725432| 16720594 [2180206]
One At Time: 18317093 18329453 18319193 18325990 18318877| 18317093 [2087861]
lookup3: 16591932 16600364 16593657 16594726 16605181| 16591932 [2084889]
Arash Partow: 16790714 16782529 16783414 16782986 16791156| 16782529 [2084572]
CRC-32: 16878539 16872606 16866719 16876415 16860548| 16860548 [2075088]
Ramakrishna: 16437073 16449160 16438019 16442447 16452696| 16437073 [2093253]
Fletcher: 74995725 74991965 74983105 75027126 75020605| 74983105 [9063797]
Murmur2: 16742975 16730183 16736149 16741761 16735690| 16730183 [2081476]
Hanson: 17286767 17288260 17273143 17280187 17371322| 17273143 [2129832]
Novak unrolled: 60320056 60322432 60315223 60308709 60325221| 60308709 [6318611]
SBox: 17677039 17685532 17686854 17681897 17680986| 17677039 [2084018]
MaPrime2c: 19286056 19296348 19307574 19294326 19295151| 19286056 [2084467]
C:\Test\hash>
[Notebook Toshiba PC:]
CPU Type: Mobile DualCore Intel Pentium T3400
CPU Alias: Merom-1M
CPU Stepping: M0
Engineering Sample: No
CPUID CPU Name: Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz
CPU Clock: 2161.5 MHz (original: 2166 MHz)
CPU Multiplier: 13x
CPU FSB: 166.3 MHz (original: 166 MHz)
L1 Code Cache: 32 KB per core
L1 Data Cache: 32 KB per core
L2 Cache: 1 MB (On-Die, ECC, ASC, Full-Speed)
Motherboard Name: Toshiba Satellite L305
Memory Timings: 5-5-5-13 (CL-RCD-RP-RAS)
Front Side Bus Properties:
Real Clock: 167 MHz (QDR)
Effective Clock: 666 MHz
Bandwidth: 5332 MB/s
Memory Bus Properties:
Bus Type: Dual DDR2 SDRAM
Bus Width: 128-bit
DRAM:FSB Ratio: 10:5
Real Clock: 333 MHz (DDR)
Effective Clock: 666 MHz
Bandwidth: 10663 MB/s
D:\_KAZE_new-stuff>hash wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
Sixtinsensitive+: 12058258 11973671 11953887 11970400 11973268| 11953887 [2251734]
Hash_Sixtinsensitive: 12387803 12368726 12437268 12419321 12367736| 12367736 [2139242]
Whiz: 10594894 10600128 10582191 10593562 10580720| 10580720 [2189360] !FASTEST!
Bernstein: 12157227 12160397 12163431 12152197 12206038| 12152197 [2074237]
K&R: 12113889 12062050 12066080 12053732 12058839| 12053732 [2083145]
x17 unrolled: 11831265 11851909 11849651 11834030 11842581| 11831265 [2410605] !Second-to-FASTEST!
x65599: 12015548 12059651 12043365 12003264 12014828| 12003264 [2102893]
FNV-1a: 12512030 12476285 12472843 12464470 12481049| 12464470 [2081195]
Sedgewick: 12423386 12430959 12446600 12481711 12417274| 12417274 [2080640]
Weinberger: 15886398 15882493 15873672 15871503 15887145| 15871503 [3541181]
Paul Larson: 11865686 11875179 11880320 11930336 11883977| 11865686 [2080111]
Paul Hsieh: 12743450 12720752 12740266 12722144 12733419| 12720752 [2180206]
One At Time: 13025820 13037695 13023502 13036316 13087593| 13023502 [2087861]
lookup3: 12513213 12510938 12517817 12521024 12528520| 12510938 [2084889]
Arash Partow: 12616358 12613361 12610421 12609956 12596411| 12596411 [2084572]
CRC-32: 12407442 12316574 12336861 12333267 12332221| 12316574 [2075088]
Ramakrishna: 12281227 12301150 12289356 12282034 12288241| 12281227 [2093253]
Fletcher: 50147553 50090287 50105264 50208280 50116095| 50090287 [9063797]
Murmur2: 12188900 12187655 12192879 12190325 12256534| 12187655 [2081476]
Hanson: 12475359 12463011 12446859 12444451 12445178| 12444451 [2129832]
Novak unrolled: 37262357 37364140 37383476 37304713 37319954| 37262357 [6318611]
SBox: 12414330 12413821 12512329 12421929 12411736| 12411736 [2084018]
MaPrime2c: 12973562 12980602 12983752 12983859 12987014| 12973562 [2084467]
D:\_KAZE_new-stuff>
[Desktop PC:]
Number of cores 4 (max 4)
Number of threads 4 (max 4)
Name Intel Core 2 Quad Q9550S
Codename Yorkfield
Specification Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
Package (platform ID) Socket 775 LGA (0x4)
CPUID 6.7.A
Extended CPUID 6.17
Core Stepping E0
Technology 45 nm
Core Speed 2002.2 MHz
Multiplier x FSB 6.0 x 333.7 MHz
Rated Bus speed 1334.8 MHz
Stock frequency 2833 MHz
Instructions sets MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T, VT-x
L1 Data cache 4 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache 4 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 2 x 6144 KBytes, 24-way set associative, 64-byte line size
TDP Limit 65 Watts
Northbridge Intel G41 rev. A3
Southbridge Intel 82801GB (ICH7/R) rev. A1
Memory Type DDR3
Memory Size 4096 MBytes
Channels Dual, (Symmetric)
Memory Frequency 667.4 MHz (1:2)
CAS# latency (CL) 7.0
RAS# to CAS# delay (tRCD) 7
RAS# Precharge (tRP) 7
Cycle Time (tRAS) 21
Row Refresh Cycle Time (tRFC) 60
Command Rate (CR) 2T
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
D:\TESTS\hash>hash wikipedia-en-html.tar.wrd
12561874 lines read
33554432 elements in the table (25 bits)
Sixtinsensitive+: 9053718 8989266 8951519 8952219 8949380| 8949380 [2251734]
Hash_Sixtinsensitive: 9269264 9264150 9263994 9263761 9263358| 9263358 [2139242]
Whiz: 7865326 7856200 7859796 7866475 7852540| 7852540 [2189360] !FASTEST!
Bernstein: 9081460 9069960 9068900 9070469 9079405| 9068900 [2074237]
K&R: 8993801 8992382 8990556 8983363 8986733| 8983363 [2083145]
x17 unrolled: 8682855 8681626 8692004 8684190 8682793| 8681626 [2410605] !Second-to-FASTEST!
x65599: 8793795 8792213 8791230 8790913 8801408| 8790913 [2102893]
FNV-1a: 9391666 9382336 9379497 9379865 9389164| 9379497 [2081195]
Sedgewick: 9091018 9085970 9085732 9094216 9097600| 9085732 [2080640]
Weinberger: 11999347 11988551 11987173 11995011 11991473| 11987173 [3541181]
Paul Larson: 8788831 8797335 8784061 8788220 8785700| 8784061 [2080111]
Paul Hsieh: 9470151 9474939 9471454 9470880 9483763| 9470151 [2180206]
One At Time: 9856512 9865262 9853988 9864387 9856661| 9853988 [2087861]
lookup3: 9346552 9341545 9346329 9338833 9346559| 9338833 [2084889]
Arash Partow: 9341715 9323081 9332539 9338738 9333211| 9323081 [2084572]
CRC-32: 9159684 9134865 9138729 9155572 9155020| 9134865 [2075088]
Ramakrishna: 9183066 9178082 9191092 9184270 9187847| 9178082 [2093253]
Fletcher: 31111801 31116622 31094619 31055130 31036218| 31036218 [9063797]
Murmur2: 9013410 9011146 9010973 9016315 9016830| 9010973 [2081476]
Hanson: 9275481 9276946 9284989 9279977 9281620| 9275481 [2129832]
Novak unrolled: 17390159 17391792 17387836 17389504 17391874| 17387836 [6318611]
SBox: 9318679 9317361 9316436 9316997 9319220| 9316436 [2084018]
MaPrime2c: 9819245 9818182 9820444 9816287 9817004| 9816287 [2084467]
D:\TESTS\hash>
For full info with non-proportional fonts: http://encode.ru/threads/1155-A-new-match-searching-structure?p=22923#post22923
Ace, here are the results on Wikipedia, Pentium M:
In short, no difference. The small test is more interesting. Here is the aligned version; the unaligned one can be found above:
The functions that read a WORD at a time (Fletcher and Paul Hsieh) become faster when the strings are aligned. Lookup3 relies on alignment (see its code), so it's also faster in this version.
Georgi, thanks for publishing your results.
- FNV1A_Whiz, FNV1_32_PRIME=709607
- FNV1A_Smaragd, FNV1_32_PRIME=709607
- FNV1A_Peregrine, FNV1_32_PRIME=709607
- FNV1A_Nefertiti, FNV1_32_PRIME=31 (in fact MULless)
The romantic in me dictates this renaming: variants that had no personality now get their own first name (Whiz, Smaragd, Peregrine, Nefertiti) and a family name (FNV1A), not to mention the clarity this brings when referring to them.
To obtain the sources (with the corresponding 32-bit instructions) of the above four (plus Sixtinsensitive+, Sixtinsensitive and four Alfalfa variants), download the PDF booklet at: http://www.sanmayce.com/Downloads/_Kaze_10-HASHERS.pdf
I expect Peregrine (using one 64-bit memory access), when compiled for 64-bit, to outspeed Nefertiti (formerly 'FNV1A_unrolled_Final').
On the other hand, Smaragd (using one 32-bit memory access but with 2 passes) is slower than Nefertiti but has fewer collisions.
And Whiz is just a lucky prodigy.
I think Peregrine is the most promising hasher, because for longer strings, with a granularity of 8 (full-fledged when compiled as 64-bit), it will fly for sure.
I want more people to experience the hash-benchmark fun, so here is my newest test package (131,981,551 bytes):
http://www.sanmayce.com/Downloads/_KAZE_hash_test_r1.rar
It looks like this:
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>dir
11/11/2010 07:52 AM 195,935 hash.cod
11/11/2010 07:52 AM 86,528 hash.exe
11/11/2010 07:31 AM <DIR> Peter_source
11/11/2010 07:52 AM 1,748 Runme.bat
11/11/2010 07:52 AM 4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt
11/11/2010 07:52 AM 388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd
11/11/2010 07:52 AM 1,121,365 Word-list_00,105,982_English_Spell-Check_High-Quality.wrd
11/11/2010 07:52 AM 4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd
11/11/2010 07:52 AM 7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv
11/11/2010 07:52 AM 146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd
11/11/2010 07:52 AM 278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>Runme.bat
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv" 1>"Word-list_00,584,879.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,584,879.txt" 1>"Word-list_00,584,879_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,584,879.txt" 1>"Word-list_00,584,879_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Sentence-list_00,032,359_English_The_Holy_Bible.txt" 1>"Sentence-list_00,032,359.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Sentence-list_00,032,359.txt" 1>"Sentence-list_00,032,359_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Sentence-list_00,032,359.txt" 1>"Sentence-list_00,032,359_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd" 1>"Word-list_00,038,936.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,038,936.txt" 1>"Word-list_00,038,936_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,038,936.txt" 1>"Word-list_00,038,936_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,105,982_English_Spell-Check_High-Quality.wrd" 1>"Word-list_00,105,982.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,105,982.txt" 1>"Word-list_00,105,982_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,105,982.txt" 1>"Word-list_00,105,982_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd" 1>"Word-list_00,351,114.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_00,351,114.txt" 1>"Word-list_00,351,114_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_00,351,114.txt" 1>"Word-list_00,351,114_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd" 1>"Word-list_12,561,874.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_12,561,874.txt" 1>"Word-list_12,561,874_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_12,561,874.txt" 1>"Word-list_12,561,874_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd" 1>"Word-list_22,202,980.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+75 "Word-list_22,202,980.txt" 1>"Word-list_22,202,980_SPEED.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>sort /+85 "Word-list_22,202,980.txt" 1>"Word-list_22,202,980_COLLISIONS.txt"
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>echo Done.
Done.
D:\_KAZE_new-stuff\_KAZE_hash_test_r1>dir
11/11/2010 07:52 AM 195,935 hash.cod
11/11/2010 07:52 AM 86,528 hash.exe
11/11/2010 07:31 AM <DIR> Peter_source
11/11/2010 07:52 AM 1,748 Runme.bat
11/11/2010 08:30 AM 2,937 Sentence-list_00,032,359.txt
11/11/2010 08:30 AM 2,937 Sentence-list_00,032,359_COLLISIONS.txt
11/11/2010 07:52 AM 4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt
11/11/2010 08:30 AM 2,937 Sentence-list_00,032,359_SPEED.txt
11/11/2010 08:30 AM 2,938 Word-list_00,038,936.txt
11/11/2010 08:30 AM 2,938 Word-list_00,038,936_COLLISIONS.txt
11/11/2010 07:52 AM 388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd
11/11/2010 08:30 AM 2,938 Word-list_00,038,936_SPEED.txt
11/11/2010 08:30 AM 2,939 Word-list_00,105,982.txt
11/11/2010 08:30 AM 2,939 Word-list_00,105,982_COLLISIONS.txt
11/11/2010 07:52 AM 1,121,365 Word-list_00,105,982_English_Spell-Check_High-Quality.wrd
11/11/2010 08:30 AM 2,939 Word-list_00,105,982_SPEED.txt
11/11/2010 08:30 AM 2,940 Word-list_00,351,114.txt
11/11/2010 08:30 AM 2,940 Word-list_00,351,114_COLLISIONS.txt
11/11/2010 07:52 AM 4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd
11/11/2010 08:30 AM 2,940 Word-list_00,351,114_SPEED.txt
11/11/2010 08:30 AM 2,940 Word-list_00,584,879.txt
11/11/2010 08:30 AM 2,940 Word-list_00,584,879_COLLISIONS.txt
11/11/2010 07:52 AM 7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv
11/11/2010 08:30 AM 2,940 Word-list_00,584,879_SPEED.txt
11/11/2010 08:47 AM 2,943 Word-list_12,561,874.txt
11/11/2010 08:47 AM 2,943 Word-list_12,561,874_COLLISIONS.txt
11/11/2010 08:47 AM 2,943 Word-list_12,561,874_SPEED.txt
11/11/2010 07:52 AM 146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd
11/11/2010 09:19 AM 2,943 Word-list_22,202,980.txt
11/11/2010 09:19 AM 2,943 Word-list_22,202,980_COLLISIONS.txt
11/11/2010 09:19 AM 2,943 Word-list_22,202,980_SPEED.txt
11/11/2010 07:52 AM 278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd
On an Intel T3400 (Merom, 2.16GHz), the contents of 'Sentence-list_00,032,359_SPEED.txt', i.e. the 32,359 sentences of The_Holy_Bible:
65536 elements in the table (16 bits)
32359 lines read
Fletcher: 28567 28709 28329 28521 28596| 28329 [ 7209]
FNV1A_Nefertiti: 30434 29919 30281 30112 30354| 29919 [ 6878]
FNV1A_Peregrine: 30652 30654 30774 30711 30728| 30652 [ 6838]
FNV1A_Whiz: 31917 31798 31608 31624 31934| 31608 [ 6874]
Murmur2: 32442 31898 31622 31843 31838| 31622 [ 6786]
Sixtinsensitive+: 34803 35073 35052 35265 35171| 34803 [ 6839]
SBox: 35864 36651 35840 35964 36036| 35840 [ 6839]
Novak unrolled: 37072 36698 36464 36601 37234| 36464 [ 6826]
Sixtinsensitive: 38126 38161 38783 38085 38243| 38085 [ 6876]
Paul Hsieh: 41114 40952 40467 40903 41335| 40467 [ 6874]
FNV1A_Smaragd: 40618 40934 40921 40550 40621| 40550 [ 6849]
lookup3: 42606 42204 42536 42584 42619| 42204 [ 6805]
Alfalfa_QWORD: 48411 47007 47281 47491 47252| 47007 [ 6943]
CRC-32: 47458 48495 47733 47692 48641| 47458 [ 6891]
Hanson: 49364 50769 49310 49310 49223| 49223 [ 19602]
Alfalfa: 53467 52707 53514 52711 52163| 52163 [ 6943]
x65599: 52965 53324 52546 52944 52932| 52546 [ 6859]
Paul Larson: 53238 52846 52558 53073 52954| 52558 [ 6889]
x17 unrolled: 53407 53882 53568 53693 53760| 53407 [ 6827]
Alfalfa_HALF: 53931 54075 54072 53659 53691| 53659 [ 6821]
Alfalfa_DWORD: 56396 56912 57265 57019 56184| 56184 [ 6943]
FNV-1a: 61401 61344 61129 60464 61180| 60464 [ 6840]
Bernstein: 62958 62738 63534 62466 62337| 62337 [ 6858]
K&R: 63326 63728 62776 62787 62992| 62776 [ 6785]
Sedgewick: 64195 63916 64174 63471 64562| 63471 [ 6858]
MaPrime2c: 63779 64134 63573 63849 64284| 63573 [ 6950]
Ramakrishna: 68307 69456 68476 68773 67562| 67562 [ 6943]
Arash Partow: 75811 75134 74359 74849 75068| 74359 [ 6845]
One At Time: 75324 76130 75891 75592 76803| 75324 [ 6937]
Weinberger: 102085 103031 102324 101752 101774| 101752 [ 6871]
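The table sizes printed in these logs are consistent with a simple sizing rule: take the smallest power of two that is not below the number of lines read, then double it (32359 lines give 16 bits, 22202980 lines give 26 bits). The benchmark's own sizing code is not shown in this thread, so the sketch below is only an assumption that happens to reproduce the logged numbers; table_bits_for is a hypothetical name, and the behaviour for an exact power of two is a guess.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest b with 2^b >= n_keys, plus one extra bit.
   Assumption: this is inferred from the logged table sizes, not taken
   from the benchmark source. */
unsigned table_bits_for(size_t n_keys)
{
    assert(n_keys > 0);
    unsigned bits = 0;
    while (((size_t)1 << bits) < n_keys)
        ++bits;
    return bits + 1;
}
```

With such a rule the load factor stays between 25% and 50%, which matches the logs, e.g. 351114 keys in 1048576 slots.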
On an Intel T3400 (Merom, 2.16GHz), the contents of 'Word-list_22,202,980_COLLISIONS.txt', i.e. the words of wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar:
67108864 elements in the table (26 bits)
22202980 lines read
Alfalfa_HALF: 21589312 21582860 21557249 21529387 21529700| 21529387 [ 3286890]
Alfalfa: 21797854 21768913 21771533 21790962 21787538| 21768913 [ 3288684]
Alfalfa_QWORD: 22137983 22122623 22102676 22106176 22115088| 22102676 [ 3288684]
Alfalfa_DWORD: 21922825 21938602 21896015 21906405 21938735| 21896015 [ 3288684]
Bernstein: 22069765 22066886 22078223 22068073 22070701| 22066886 [ 3290766]
K&R: 21767970 21769291 21786733 21808794 21781952| 21767970 [ 3290941]
Paul Larson: 22217132 22208663 22221180 22257973 22232433| 22208663 [ 3296692]
FNV-1a: 24783327 24754595 24761221 24768850 24773851| 24754595 [ 3297552]
Murmur2: 24429581 24436845 24439565 24473986 24420091| 24420091 [ 3297709]
SBox: 24497356 24495252 24474408 24482514 24486617| 24474408 [ 3298021]
FNV1A_Smaragd: 24892728 24657954 24713885 24642826 24653872| 24642826 [ 3298433]
CRC-32: 24315125 24364015 24358932 24370230 24357086| 24315125 [ 3298998]
lookup3: 25152714 25163446 25180860 25174309 25171522| 25152714 [ 3299369]
MaPrime2c: 25468907 25497087 25515137 25496127 25511109| 25468907 [ 3299747]
Sedgewick: 23187075 23188783 23201984 23182620 23187676| 23182620 [ 3302263]
One At Time: 25752155 25745719 25763766 25769312 25760894| 25745719 [ 3304908]
Ramakrishna: 22571194 22588985 22583130 22584062 22567869| 22567869 [ 3321824]
x65599: 22869457 22871793 22868825 22899146 22893458| 22868825 [ 3325064]
Arash Partow: 23330844 23304012 23325130 23319561 23321201| 23304012 [ 3325683]
FNV1A_Peregrine: 24709308 24696956 24739156 24542282 24027705| 24027705 [ 3333193]
FNV1A_Whiz: 23260698 23284708 23261485 23231186 23274844| 23231186 [ 3369088]
Sixtinsensitive: 23749686 23754606 23752685 23771534 23756371| 23749686 [ 3373923]
Hanson: 24678894 24686360 24668732 24692393 24679870| 24668732 [ 3408497]
Paul Hsieh: 25165181 25195490 25185265 25191703 25191274| 25165181 [ 3498543]
FNV1A_Nefertiti: 23178958 23204612 23158585 23201471 23168530| 23158585 [ 3505371]
Sixtinsensitive+: 23695832 23697515 23670644 23686530 23677100| 23670644 [ 3507772]
x17 unrolled: 21089676 21072607 21073703 21093963 21080539| 21072607 [ 3830652]
Weinberger: 27205172 27183550 27221767 27207192 27186851| 27183550 [ 5732660]
Novak unrolled: 76893278 76881102 76828362 76905661 76913653| 76828362 [10591108]
Fletcher: 61012927 60973684 60990148 60993752 60974823| 60973684 [14915258]
On an Intel Atom N450 (Pineview-N, 1.66GHz), the contents of 'Word-list_00,351,114_SPEED.txt', i.e. the spell-checker's words:
1048576 elements in the table (20 bits)
351114 lines read
FNV1A_Nefertiti: 306194 306573 306192 308890 306516| 306192 [ 52963]
FNV1A_Whiz: 309750 309663 315782 309936 309856| 309663 [ 52966]
Sixtinsensitive+: 311474 311330 311115 314129 313496| 311115 [ 53040]
Alfalfa_HALF: 312465 312981 312385 316337 312773| 312385 [ 52454]
K&R: 313431 321837 312990 319829 313343| 312990 [ 52642]
FNV1A_Peregrine: 319362 329116 322006 319367 319748| 319362 [ 52551]
Bernstein: 321816 321772 321580 324893 322049| 321580 [ 52770]
x17 unrolled: 322271 321899 321849 324666 322026| 321849 [ 53556]
Paul Hsieh: 326436 322757 322904 322812 323075| 322757 [ 52729]
Sixtinsensitive: 325249 325427 325026 331674 325273| 325026 [ 53081]
Murmur2: 326730 326620 326377 332203 326445| 326377 [ 52738]
FNV1A_Smaragd: 329728 326406 329167 326707 326948| 326406 [ 52774]
lookup3: 334076 327371 327638 328289 328143| 327371 [ 52868]
Alfalfa: 329238 329437 329020 331666 330027| 329020 [ 52594]
Arash Partow: 333027 330317 330431 338750 330578| 330317 [ 52887]
Ramakrishna: 337692 331050 330770 330642 331796| 330642 [ 52764]
CRC-32: 335909 332821 332915 333260 332676| 332676 [ 52931]
Paul Larson: 335006 341362 335552 335411 335263| 335006 [ 52970]
Alfalfa_DWORD: 335984 336655 335764 341768 336394| 335764 [ 52594]
x65599: 337815 338067 337403 339999 337224| 337224 [ 52988]
Novak unrolled: 338616 338544 340872 339003 338570| 338544 [ 70274]
Alfalfa_QWORD: 342953 343383 343121 345655 346083| 342953 [ 52594]
Hanson: 345076 345541 344846 347805 345012| 344846 [ 57741]
Sedgewick: 346944 346191 348871 346527 346158| 346158 [ 52920]
FNV-1a: 352768 352508 357287 352658 352651| 352508 [ 52829]
SBox: 353603 353294 359148 353330 353175| 353175 [ 52688]
One At Time: 370592 369297 368441 368550 368128| 368128 [ 52836]
MaPrime2c: 398537 400557 398662 398581 398967| 398537 [ 52435]
Weinberger: 415567 418102 416656 415559 416279| 415559 [ 103386]
Fletcher: 440953 440887 440794 443035 441821| 440794 [ 182747]
On an Intel Atom N450 (Pineview-N, 1.66GHz), the contents of 'Word-list_22,202,980_SPEED.txt', i.e. the words of wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar:
67108864 elements in the table (26 bits)
22202980 lines read
K&R: 27212349 27196471 27200039 27186453 27196511| 27186453 [ 3290941]
Alfalfa_HALF: 27356819 27307195 27297075 27307918 27303211| 27297075 [ 3286890]
x17 unrolled: 27339582 27350110 27357010 27339325 27356507| 27339325 [ 3830652]
Bernstein: 27868098 27885616 27865719 27879787 27863864| 27863864 [ 3290766]
Ramakrishna: 28665899 28666251 28682862 28664667 28671544| 28664667 [ 3321824]
Alfalfa: 28888653 28886944 28877923 28887710 28891342| 28877923 [ 3288684]
FNV1A_Nefertiti: 29213855 29223509 29198537 29215353 29230921| 29198537 [ 3505371]
Alfalfa_DWORD: 29369155 29377827 29371609 29385218 29372532| 29369155 [ 3288684]
Paul Larson: 29416290 29420142 29438150 29438707 29445835| 29416290 [ 3296692]
Sixtinsensitive+: 29620313 29611448 29599723 29620233 29601234| 29599723 [ 3507772]
Arash Partow: 29601009 29607096 29607259 29604831 29600309| 29600309 [ 3325683]
Alfalfa_QWORD: 29766556 29768872 29761910 29768865 29750096| 29750096 [ 3288684]
Sixtinsensitive: 30910025 30227158 30218696 30253983 30229414| 30218696 [ 3373923]
FNV1A_Whiz: 30529924 30573778 30522564 30513371 30532538| 30513371 [ 3369088]
x65599: 30639384 30626734 30637721 30629312 30667952| 30626734 [ 3325064]
FNV1A_Peregrine: 31062527 31065744 31063962 31074164 31060828| 31060828 [ 3333193]
Sedgewick: 31231177 31217657 31227213 31231472 31210256| 31210256 [ 3302263]
Paul Hsieh: 31504060 31514350 31516370 31506898 31500243| 31500243 [ 3498543]
lookup3: 31603211 31576806 31546659 31551113 31562355| 31546659 [ 3299369]
FNV1A_Smaragd: 31788377 31580190 31580953 31575398 31584602| 31575398 [ 3298433]
Murmur2: 31588844 31591137 31578842 31585076 31581288| 31578842 [ 3297709]
CRC-32: 31993878 31986891 31976616 31993457 31978443| 31976616 [ 3298998]
Hanson: 32697671 32708803 32703678 32697383 32699428| 32697383 [ 3408497]
FNV-1a: 33097899 33081494 33091676 33097100 33087453| 33081494 [ 3297552]
Weinberger: 33148065 33152666 33133844 33143483 33153828| 33133844 [ 5732660]
SBox: 33418326 33433409 33412257 33432103 33433903| 33412257 [ 3298021]
One At Time: 34447974 34462104 34469829 34453808 34453369| 34447974 [ 3304908]
MaPrime2c: 36544157 36541591 36553914 36532887 36536317| 36532887 [ 3299747]
Fletcher: 85211946 85207760 85213725 85234580 85240698| 85207760 [14915258]
Novak unrolled: 122659701 122684073 122688489 122710846 122677634| 122659701 [10591108]
On an Intel Q9550 (Yorkfield, 2.83GHz), the contents of 'Word-list_00,351,114_SPEED.txt', i.e. the spell-checker's words:
1048576 elements in the table (20 bits)
351114 lines read
FNV1A_Whiz: 126979 126965 126909 127074 126892| 126892 [ 52966]
FNV1A_Nefertiti: 127558 127674 127561 127462 127655| 127462 [ 52963]
Novak unrolled: 133056 131005 130946 130748 130608| 130608 [ 70274]
FNV1A_Peregrine: 131261 131375 132194 131400 131466| 131261 [ 52551]
FNV1A_Smaragd: 135540 132354 132447 132105 134366| 132105 [ 52774]
Sixtinsensitive+: 140261 133781 132937 132972 133011| 132937 [ 53040]
Alfalfa: 133692 134306 134139 133886 133746| 133692 [ 52594]
x17 unrolled: 138124 135728 135652 138535 136985| 135652 [ 53556]
Alfalfa_HALF: 136099 135873 135958 136045 135852| 135852 [ 52454]
Alfalfa_DWORD: 139382 136591 137320 136955 136703| 136591 [ 52594]
SBox: 136660 136629 136652 139976 137095| 136629 [ 52688]
CRC-32: 136776 136682 137740 136886 137013| 136682 [ 52931]
Sixtinsensitive: 137325 139569 137416 138149 137460| 137325 [ 53081]
Murmur2: 137708 137451 137466 137370 137645| 137370 [ 52738]
Paul Larson: 137963 137860 137821 137806 137844| 137806 [ 52970]
Alfalfa_QWORD: 138891 140860 138630 138520 139282| 138520 [ 52594]
Hanson: 138905 138985 141471 140417 139221| 138905 [ 57741]
x65599: 139179 141376 138957 139922 139037| 138957 [ 52988]
K&R: 142387 142494 142280 142263 142276| 142263 [ 52642]
Bernstein: 143426 143500 143558 144583 143804| 143426 [ 52770]
Paul Hsieh: 144100 144173 145563 144810 144435| 144100 [ 52729]
Sedgewick: 145909 145885 145971 145638 146653| 145638 [ 52920]
lookup3: 147183 147415 148043 149721 147718| 147183 [ 52868]
FNV-1a: 147512 147545 147397 148011 149087| 147397 [ 52829]
Ramakrishna: 148659 156452 148559 148661 148236| 148236 [ 52764]
Fletcher: 149293 149231 149835 151863 149492| 149231 [ 182747]
Arash Partow: 149436 149511 149354 149420 149339| 149339 [ 52887]
MaPrime2c: 152965 153024 152778 153176 152732| 152732 [ 52435]
One At Time: 159041 158835 164564 159598 158996| 158835 [ 52836]
Weinberger: 197287 197401 197641 199104 199606| 197287 [ 103386]
On an Intel Q9550 (Yorkfield, 2.83GHz), the contents of 'Word-list_22,202,980_SPEED.txt', i.e. the words of wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar:
67108864 elements in the table (26 bits)
22202980 lines read
x17 unrolled: 16711968 16714788 16712787 16715049 16715557| 16711968 [ 3830652]
Alfalfa_HALF: 16925846 16921816 16935959 16918072 16920664| 16918072 [ 3286890]
Alfalfa: 16949669 16948259 16945026 16947268 16949278| 16945026 [ 3288684]
Alfalfa_DWORD: 17087653 17089499 17090082 17092342 17092248| 17087653 [ 3288684]
K&R: 17217212 17215582 17214916 17211997 17210891| 17210891 [ 3290941]
Alfalfa_QWORD: 17284333 17281698 17296440 17284594 17287022| 17281698 [ 3288684]
Paul Larson: 17287965 17288993 17286715 17288186 17290720| 17286715 [ 3296692]
Bernstein: 17441366 17441744 17446211 17446642 17448115| 17441366 [ 3290766]
x65599: 17516305 17512376 17512059 17510499 17514638| 17510499 [ 3325064]
FNV1A_Whiz: 17578315 17575074 17574692 17572083 17571654| 17571654 [ 3369088]
FNV1A_Nefertiti: 17812040 17810394 17815863 17800634 17802965| 17800634 [ 3505371]
Sedgewick: 17813196 17805720 17809107 17808987 17809341| 17805720 [ 3302263]
Ramakrishna: 17821031 17818277 17816051 17817365 17820101| 17816051 [ 3321824]
FNV1A_Smaragd: 18156291 18032451 18056304 18014531 18010227| 18010227 [ 3298433]
FNV1A_Peregrine: 18037980 18041105 18047264 18049949 18050438| 18037980 [ 3333193]
Arash Partow: 18159192 18218352 18156661 18148943 18148120| 18148120 [ 3325683]
Sixtinsensitive+: 18273937 18273231 18272381 18276826 18272754| 18272381 [ 3507772]
CRC-32: 18299132 18303268 18300555 18307328 18296881| 18296881 [ 3298998]
Murmur2: 18312461 18309171 18315023 18312814 18306135| 18306135 [ 3297709]
Sixtinsensitive: 18326606 18325448 18319761 18326255 18320762| 18319761 [ 3373923]
SBox: 18454359 18453094 18457853 18448201 18451512| 18448201 [ 3298021]
Hanson: 18592317 18585036 18585941 18597290 18595878| 18585036 [ 3408497]
Paul Hsieh: 18931940 18935308 18939618 18942478 18940287| 18931940 [ 3498543]
lookup3: 18978329 18970650 18969792 18967985 18983381| 18967985 [ 3299369]
FNV-1a: 19055865 19051343 19051202 19054812 19049558| 19049558 [ 3297552]
MaPrime2c: 19497746 19470760 19474411 19474490 19476430| 19470760 [ 3299747]
One At Time: 19826720 19775854 19767551 19765074 19763605| 19763605 [ 3304908]
Weinberger: 21866786 21864914 21867700 21862238 21866417| 21862238 [ 5732660]
Fletcher: 44235533 44193774 44142011 44138717 44158624| 44138717 [14915258]
Novak unrolled: 52067982 52067088 52068005 52064574 52068712| 52064574 [10591108]
Any suggestions (adding more files, functions, ...) are appreciated.
Something strange that I cannot explain is going on: on the Wikipedia word lists a significant slowdown appears out of nowhere. Any ideas about this mystery?
For longer keys, here comes Jester, an unrolled Whiz:
UINT FNV1A_Hash_Jester(const char *str, SIZE_T wrdlen)
{
    const UINT PRIME = 709607;
    UINT hash32 = 2166136261;
    const char *p = str;
    // Idea comes from Igor Pavlov's 7zCRC, thanks.
    /*
    for(; wrdlen && ((unsigned)(ptrdiff_t)p&3); wrdlen -= 1, p++) {
        hash32 = (hash32 ^ *p) * PRIME;
    }
    */
    for(; wrdlen >= 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
        hash32 = (hash32 ^ *(DWORD *)p) * PRIME;
        hash32 = (hash32 ^ *(DWORD *)(p+4)) * PRIME;
    }
    // Cases: 0,1,2,3,4,5,6,7
    if (wrdlen & sizeof(DWORD)) {
        hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
        p += sizeof(DWORD);
    }
    if (wrdlen & sizeof(WORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
    }
    if (wrdlen & 1)
        hash32 = (hash32 ^ *p) * PRIME;
    return hash32 ^ (hash32 >> 16);
}
262144 elements in the table (18 bits)
105982 lines read
FNV1A_Jester: 52198 52248 51275 52071 50515| 50515 [ 18774]
FNV1A_Whiz: 51168 52620 51802 52121 50774| 50774 [ 18774]
1048576 elements in the table (20 bits)
351114 lines read
FNV1A_Jester: 237708 235011 234586 234747 235305| 234586 [ 52966]
FNV1A_Whiz: 239372 239256 238010 238354 239165| 238010 [ 52966]
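The Jester listing reads the key through DWORD and WORD pointer casts, which relies on the CPU tolerating unaligned loads (x86 does) and is formally undefined behaviour in C when the pointer is misaligned. Below is a minimal sketch of the same mixing steps with memcpy-based loads; it assumes fixed-width types from <stdint.h> in place of UINT, DWORD and WORD, and the names load32, load16 and fnv1a_jester_portable are hypothetical, not part of the benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned-safe loads in host byte order; compilers usually turn these
   into a single load instruction on x86. */
static uint32_t load32(const char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint16_t load16(const char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

uint32_t fnv1a_jester_portable(const char *str, size_t wrdlen)
{
    assert(str != NULL || wrdlen == 0);
    const uint32_t PRIME = 709607u;
    uint32_t hash32 = 2166136261u;
    const char *p = str;

    /* Main loop: two 4-byte mixes per iteration, as in the listing. */
    for (; wrdlen >= 8; wrdlen -= 8, p += 8) {
        hash32 = (hash32 ^ load32(p)) * PRIME;
        hash32 = (hash32 ^ load32(p + 4)) * PRIME;
    }
    /* Tail: the remaining 0..7 bytes are consumed as 4 + 2 + 1. */
    if (wrdlen & 4) { hash32 = (hash32 ^ load32(p)) * PRIME; p += 4; }
    if (wrdlen & 2) { hash32 = (hash32 ^ load16(p)) * PRIME; p += 2; }
    if (wrdlen & 1)
        hash32 = (hash32 ^ (uint32_t)*p) * PRIME; /* plain char, sign-extended where char is signed, as in the listing */
    return hash32 ^ (hash32 >> 16);
}
```

On x86 this should produce the same values as the pointer-cast version, since memcpy reads the bytes in the same host order; on a big-endian machine both versions would give different hashes than on x86.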
http://encode.ru/threads/1160-Fastest-non-secure-hash-function!
Hi Peter,
Having read some criticism, I added a test on enwik8: some 900,000 lines with all kinds of keys in them.
You are welcome to visit the home of FNV1A_Jesteress:
http://www.sanmayce.com/Fastest_Hash/index.html
Regards
Georgi's page contains an interesting link to:
http://cbloomrants.blogspot.com/2010/11/11-19-10-hashes-and-cache-tables.html
It's written by the guy behind http://www.cbloom.com and if I understand correctly he is also interested in using hashes for compression algorithms, not only in compiler-needed hash tables.
Georgi, I've added Jesteress to the test; thank you very much for your efforts. Jesteress is even faster than Whiz! :) The number of collisions in the Numbers test remains high, though.
Ace, thanks for the link. He benchmarked STLport, a complex implementation that grows the hash table, uses jump tables and prime numbers for table size. Something similar to what you wanted to do.
I finally got stable results on Core i5 by increasing the number of runs. There are some potentially useful ideas for statistical tests (chi-square, etc.) on the Murmur page.
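A chi-square test of this kind can be sketched as follows. This is a generic textbook statistic over bucket occupancy, not code from the Murmur page; the function name and signature are illustrative, and the caller is assumed to have already counted how many keys landed in each bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chi-square statistic of bucket occupancy:
   sum over buckets of (observed - expected)^2 / expected,
   where expected = n_keys / n_buckets for a uniform hash. */
double chi_square(const uint32_t *bucket_counts, size_t n_buckets, size_t n_keys)
{
    assert(n_buckets > 0 && n_keys > 0);
    double expected = (double)n_keys / (double)n_buckets;
    double chi2 = 0.0;
    for (size_t i = 0; i < n_buckets; ++i) {
        double d = (double)bucket_counts[i] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

For a hash that spreads keys uniformly, the statistic is expected to be close to n_buckets - 1; a much larger value points to clustering that a plain collision count may hide.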
Peter, I do not care about numbers, IPs and other numeric datasets; anyway, just to make all the fans happy, here comes Meiyan (Beauty, Charming Eyes or, most precisely, SOULFUL EYES):
FNV1A_Jesteress gives way to FNV1A_Meiyan because of better mixing: the last DWORD is split into two WORD passes to avoid losing the carries.
#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Meiyan(const char *str, SIZE_T wrdlen)
{
    const UINT PRIME = 709607;
    UINT hash32 = 2166136261;
    const char *p = str;
    // Idea comes from Igor Pavlov's 7zCRC, thanks.
    /*
    for(; wrdlen && ((unsigned)(ptrdiff_t)p&3); wrdlen -= 1, p++) {
        hash32 = (hash32 ^ *p) * PRIME;
    }
    */
    for(; wrdlen >= 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
        hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME;
    }
    // Cases: 0,1,2,3,4,5,6,7
    if (wrdlen & sizeof(DWORD)) {
        // hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
        // p += sizeof(DWORD);
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
    }
    if (wrdlen & sizeof(WORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
    }
    if (wrdlen & 1)
        hash32 = (hash32 ^ *p) * PRIME;
    return hash32 ^ (hash32 >> 16); // same final fold as FNV1A_Jester and FNV1A_Mantis
}
Thus, as in one of my favorite movies, where Meiyan transforms (only physically) from an ugly imp into a princess, here the ugly (because of its numeric ugliness) Jesteress similarly evolves into a sub-fantastic entity.
Intel Core i5 gives:
Intel Merom gives:
500 lines read
1024 elements in the table (10 bits)
FNV1A_Meiyan: 92 80 78 77 77| 77 [ 102]
FNV1A_Jesteress: 86 80 79 79 78| 78 [ 110]
Win32
1992 lines read
4096 elements in the table (12 bits)
FNV1A_Meiyan: 1850 1847 1832 1877 1865| 1832 [ 408]
FNV1A_Jesteress: 1863 1714 1715 1716 1710| 1710 [ 397]
Numbers
500 lines read
1024 elements in the table (10 bits)
FNV1A_Meiyan: 55 50 50 50 50| 50 [ 125]
FNV1A_Jesteress: 56 56 56 56 56| 56 [ 300]
Prefix
500 lines read
1024 elements in the table (10 bits)
FNV1A_Meiyan: 123 115 115 115 115| 115 [ 106]
FNV1A_Jesteress: 116 114 113 113 113| 113 [ 102]
Postfix
500 lines read
1024 elements in the table (10 bits)
FNV1A_Meiyan: 120 112 111 111 111| 111 [ 112]
FNV1A_Jesteress: 110 109 107 106 106| 106 [ 106]
Variables
1842 lines read
4096 elements in the table (12 bits)
FNV1A_Meiyan: 396 349 397 347 347| 347 [ 350]
FNV1A_Jesteress: 352 344 403 351 344| 344 [ 366]
Sonnets
3228 lines read
8192 elements in the table (13 bits)
FNV1A_Meiyan: 540 512 508 510 510| 508 [ 588]
FNV1A_Jesteress: 561 513 509 509 508| 508 [ 585]
UTF-8
13408 lines read
32768 elements in the table (15 bits)
FNV1A_Meiyan: 2530 2436 2430 2426 2435| 2426 [ 2377]
FNV1A_Jesteress: 2420 2403 2934 2410 2404| 2403 [ 2427]
IPv4
3925 lines read
8192 elements in the table (13 bits)
FNV1A_Meiyan: 421 398 397 398 394| 394 [ 768]
FNV1A_Jesteress: 402 406 401 403 403| 401 [ 1499]
D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash enwik8
919074 lines read
2097152 elements in the table (21 bits)
FNV1A_Meiyan: 1070854 1053328 1061062 1049874 1063539| 1049874 [ 315896]
FNV1A_Jesteress: 1056365 1065933 1048484 1048852 1049749| 1048484 [ 316210]
D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash enwik9
10920423 lines read
33554432 elements in the table (25 bits)
FNV1A_Meiyan: 13539353 13374523 13366537 13362240 13389443| 13362240 [ 4562577]
FNV1A_Jesteress: 13289862 13337087 13293203 13296861 13525346| 13289862 [ 4564891]
D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd"
12561874 lines read
33554432 elements in the table (25 bits)
FNV1A_Meiyan: 13010074 12874395 12885821 12923525 12927757| 12874395 [ 2111271]
FNV1A_Jesteress: 12906637 12879373 12871222 13125286 12886406| 12871222 [ 2121868]
D:\WorkTemp\_KAZE_hash_test_r2_enwik>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd"
22202980 lines read
67108864 elements in the table (26 bits)
FNV1A_Meiyan: 25300254 24965950 24905897 25175291 25028349| 24905897 [ 3345260]
FNV1A_Jesteress: 24844250 24846907 24835438 24832771 25086445| 24832771 [ 3355676]
Meiyan: "Hell that no fury like a beauty scorned. Haven't you heard?"
Meiyan: "No wonder they say all good people are tricky."
Ha-ha, right said Meiyan.
Meiyan: "I'm not garbage. Don't just throw me away. I'm not just a thing. I have a name! My name is Meiyan."
Peter,
I knew where the weakness (regarding collisions) was; it is no more, i.e. it is fixed with the arrival of FNV1A_Mantis, the strongest of all my FNV1A variants.
Now FNV1A_Meiyan sits between FNV1A_Jesteress and FNV1A_Mantis.
Personally, my favorite (because my way of using her differs a lot from that of other people) is still FNV1A_Jesteress, despite her apparent collision drawback for keys of 8 to 10 chars, which is due to the single BYTE (8+1) or WORD (8+2) mix.
I tried (not seriously) to fix it in FNV1A_Meiyan, so here comes FNV1A_Mantis (00562-004a0=194 bytes, the fattest so far):
// Mantis has two (three to be exact) gears: it operates as WORD based FNV1A for 1..15 lengths and as QWORD based FNV1A for 16.. lengths.
// I see the instant mantis' grasping-and-devouring as MONSTROUS QUADRO-BYTE-PAIRs BAITs (IN-MIX) while target secured within FIRM-GRIP of forelimbs (PRE-MIX & POST-MIX).
// Word 'mantical' (Of or relating to the foretelling of events by or as if by supernatural means) comes from Greek mantikos, from the Greek word mantis, meaning "prophet, seer."
// The Greeks, who made the connection between the upraised front legs of a mantis waiting for its prey and the hands of a prophet in prayer, used the name mantis to mean "the praying mantis."
#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Mantis(const char *str, SIZE_T wrdlen)
{
    const UINT PRIME = 709607;
    UINT hash32 = 2166136261;
    const char *p = str;
    // Cases: 0,1,2,3,4,5,6,7
    if (wrdlen & sizeof(DWORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        //wrdlen -= sizeof(DWORD);
    }
    if (wrdlen & sizeof(WORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        //wrdlen -= sizeof(WORD);
    }
    if (wrdlen & 1) {
        hash32 = (hash32 ^ *p) * PRIME;
        p += sizeof(char);
        //wrdlen -= sizeof(char);
    }
    wrdlen -= p-str;
    // The goal is to avoid the weak range [8, 8+2, 8+1] that is 8..10 in practice 1..15 i.e. 1..8+4+2+1, thus amending FNV1A_Meiyan and FNV1A_Jesteress.
    // FNV1A_Jesteress: fastest strong
    // FNV1A_Meiyan : faster stronger
    // FNV1A_Mantis : fast strongest
    for(; wrdlen > 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
        hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME;
    }
    // Caution: for keys shorter than 8 bytes wrdlen is already 0 here, so the four WORD reads below run past the end of the key.
    hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME;
    hash32 = (hash32 ^ *(WORD*)(p+1*sizeof(WORD))) * PRIME;
    hash32 = (hash32 ^ *(WORD*)(p+2*sizeof(WORD))) * PRIME;
    hash32 = (hash32 ^ *(WORD*)(p+3*sizeof(WORD))) * PRIME;
    return hash32 ^ (hash32 >> 16);
}
Ok, now some heavy hash hustle with fixed-length ASCII strings; in my opinion this is the benchmark most relevant and closest to practice (here: fundamental match finding).
At the link below, this summer I approached LZ match finding in a naive way, by counting the repetitions (in OSHO.TXT, a file rich in English text) with Building-Blocks_DUMPER:
http://encode.ru/threads/1134-Dummy-Static-Windowless-Dictionary-Text-Decompressor?p=22653&viewfull=1#post22653
Length of Building-Blocks / Quantity of ALL(with overlapping) Building-Blocks / Quantity of DISTINCT(with overlapping) Building-Blocks / Quantity of REPETITIVE(with overlapping) Building-Blocks
Volume in drive D is H320_Vol5
Volume Serial Number is 0CB3-C881
Directory of D:\_KAZE_new-stuff\r3
12/02/2010 08:15 AM <DIR> .
12/02/2010 08:15 AM <DIR> ..
12/02/2010 08:15 AM 45,501 Building-Blocks_DUMPER.c
12/02/2010 08:15 AM 79,360 Building-Blocks_DUMPER.exe
12/02/2010 08:15 AM 223,954 hash.cod
12/02/2010 08:15 AM 56,186 hash.cpp
12/02/2010 08:15 AM 81,920 hash.exe
12/02/2010 08:15 AM 206,908,949 OSHO.TXT
12/02/2010 08:15 AM 394 RUNME.BAT
7 File(s) 207,396,264 bytes
2 Dir(s) 6,020,390,912 bytes free
D:\_KAZE_new-stuff\r3>Building-Blocks_DUMPER.exe
Building-Blocks_DUMPER rev.2, written by Kaze.
Note: This revision converts CR to $ and LF to # in order to have lines patternlen long ending with LF.
Sorting 206908947 Pointers to Building-Blocks 3 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB003.txt ...
Sorting 206908946 Pointers to Building-Blocks 4 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB004.txt ...
Sorting 206908945 Pointers to Building-Blocks 5 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB005.txt ...
Sorting 206908944 Pointers to Building-Blocks 6 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB006.txt ...
Sorting 206908943 Pointers to Building-Blocks 7 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB007.txt ...
Sorting 206908942 Pointers to Building-Blocks 8 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB008.txt ...
Sorting 206908941 Pointers to Building-Blocks 9 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB009.txt ...
Sorting 206908940 Pointers to Building-Blocks 10 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB010.txt ...
Sorting 206908939 Pointers to Building-Blocks 11 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB011.txt ...
Sorting 206908938 Pointers to Building-Blocks 12 chars in size ...
Allocated memory for pointers-to-words in MB: 790
Writing Sorted Building-Blocks to BB012.txt ...
Building-Blocks_DUMPER total time: 2658329 clocks
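The log suggests that the dumper sorts one pointer per starting position (206908949 bytes give 206908947 blocks of 3 chars) and then scans the sorted order. A minimal sketch of that idea follows; it is not the DUMPER source. The names BlockStats and count_blocks are hypothetical, and "repetitive" is assumed here to mean distinct blocks that occur more than once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort has no context argument, so the block length lives in a static
   variable; this makes the sketch non-reentrant. */
static size_t g_block_len;

static int cmp_blocks(const void *a, const void *b)
{
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;
    return memcmp(pa, pb, g_block_len);
}

typedef struct { size_t all, distinct, repetitive; } BlockStats;

/* Counts overlapping blocks of block_len bytes in text.
   Returns 0 on success, -1 if the pointer array cannot be allocated. */
int count_blocks(const char *text, size_t text_len, size_t block_len, BlockStats *out)
{
    assert(block_len > 0);
    out->all = out->distinct = out->repetitive = 0;
    if (text_len < block_len)
        return 0;

    size_t n = text_len - block_len + 1;       /* one block per start offset */
    const char **ptrs = malloc(n * sizeof *ptrs);
    if (!ptrs)
        return -1;
    for (size_t i = 0; i < n; ++i)
        ptrs[i] = text + i;

    g_block_len = block_len;
    qsort(ptrs, n, sizeof *ptrs, cmp_blocks);

    out->all = n;
    size_t run = 1;                            /* length of the current run of equal blocks */
    for (size_t i = 1; i <= n; ++i) {
        if (i < n && memcmp(ptrs[i], ptrs[i - 1], block_len) == 0) {
            ++run;
            continue;
        }
        ++out->distinct;
        if (run > 1)
            ++out->repetitive;
        run = 1;
    }
    free(ptrs);
    return 0;
}
```

Sorting pointers instead of copies keeps memory at one pointer per position, which is in line with the 790 MB the log reports for about 207 million 4-byte pointers.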
Volume in drive D is H320_Vol5
Volume Serial Number is 0CB3-C881
Directory of D:\_KAZE_new-stuff\r3
12/02/2010 08:55 AM <DIR> .
12/02/2010 08:55 AM <DIR> ..
12/02/2010 08:18 AM 185,944 BB003.txt
12/02/2010 08:22 AM 1,240,095 BB004.txt
12/02/2010 08:25 AM 5,134,092 BB005.txt
12/02/2010 08:29 AM 15,652,966 BB006.txt
12/02/2010 08:33 AM 38,425,216 BB007.txt
12/02/2010 08:37 AM 80,608,464 BB008.txt
12/02/2010 08:41 AM 150,061,720 BB009.txt
12/02/2010 08:46 AM 252,913,397 BB010.txt
12/02/2010 08:52 AM 392,490,228 BB011.txt
12/02/2010 08:59 AM 569,430,745 BB012.txt
12/02/2010 08:15 AM 45,501 Building-Blocks_DUMPER.c
12/02/2010 08:15 AM 79,360 Building-Blocks_DUMPER.exe
12/02/2010 08:15 AM 223,954 hash.cod
12/02/2010 08:15 AM 56,186 hash.cpp
12/02/2010 08:15 AM 81,920 hash.exe
12/02/2010 08:15 AM 206,908,949 OSHO.TXT
12/02/2010 08:15 AM 394 RUNME.BAT
17 File(s) 1,713,539,131 bytes
2 Dir(s) 4,511,014,912 bytes free
D:\_KAZE_new-stuff\r3>dir
Volume in drive D is H320_Vol5
Volume Serial Number is 0CB3-C881
Directory of D:\_KAZE_new-stuff\r3
12/02/2010 09:08 AM <DIR> .
12/02/2010 09:08 AM <DIR> ..
12/02/2010 08:18 AM 185,944 BB003.txt
12/02/2010 08:22 AM 1,240,095 BB004.txt
12/02/2010 08:25 AM 5,134,092 BB005.txt
12/02/2010 08:29 AM 15,652,966 BB006.txt
12/02/2010 08:33 AM 38,425,216 BB007.txt
12/02/2010 08:37 AM 80,608,464 BB008.txt
12/02/2010 08:41 AM 150,061,720 BB009.txt
12/02/2010 08:46 AM 252,913,397 BB010.txt
12/02/2010 08:52 AM 392,490,228 BB011.txt
12/02/2010 08:59 AM 569,430,745 BB012.txt
12/02/2010 08:15 AM 45,501 Building-Blocks_DUMPER.c
12/02/2010 08:15 AM 79,360 Building-Blocks_DUMPER.exe
12/02/2010 08:15 AM 223,954 hash.cod
12/02/2010 08:15 AM 56,186 hash.cpp
12/02/2010 08:15 AM 81,920 hash.exe
11/01/2009 12:00 AM 202,688,536 IP-COUNTRY-REGION-CITY.CSV
12/02/2010 09:07 AM 1,636 LONG2DOT.BAS
12/02/2010 09:07 AM 43,110 LONG2DOT.EXE
12/02/2010 08:15 AM 206,908,949 OSHO.TXT
12/02/2010 08:15 AM 394 RUNME.BAT
20 File(s) 1,916,272,413 bytes
2 Dir(s) 4,510,916,608 bytes free
D:\_KAZE_new-stuff\r3>LONG2DOT.EXE IP-COU~1.csv IPs.TXT
LONG2DOT.EXE, revision 001.
Written by Svalqyatchx 'Kaze'.
Example: liner ip.csv ip.txt
Output file: IPS.TXT
Lines: 2995394
LONG2DOT: Done.
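The source of LONG2DOT is not shown. Judging only by its name and by the IPS.TXT output, it presumably turns the integer IP addresses of the CSV into dotted-quad strings. A minimal sketch of such a conversion, assuming the most significant byte is the first octet; long_to_dotted is a hypothetical name, and the CSV parsing is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the dotted-quad form of ip into buf (at most buf_size bytes,
   including the terminating NUL) and returns the text length.
   Assumption: the top byte of ip is the first octet. */
int long_to_dotted(uint32_t ip, char *buf, size_t buf_size)
{
    assert(buf != NULL && buf_size > 0);
    return snprintf(buf, buf_size, "%u.%u.%u.%u",
                    (unsigned)(ip >> 24),
                    (unsigned)((ip >> 16) & 0xFF),
                    (unsigned)((ip >> 8) & 0xFF),
                    (unsigned)(ip & 0xFF));
}
```

A 16-byte buffer is enough for any address, since the longest form, "255.255.255.255", has 15 characters plus the terminator.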
8388608 elements in the table (23 bits)
2995394 lines read
FNV1A_Meiyan: 2320681 2325209 2329864 2329377 2330092| 2320681 [ 593723] FNV1A_Jesteress: 2365681 2373005 2366059 2364893 2364232| 2364232 [ 691369] FNV1A_Mantis: 2497424 2440808 2442037 2431852 2434459| 2431852 [ 481132] Hanson: 2443092 2452243 2440873 2450233 2454445| 2440873 [ 534251] CRC-32: 2461652 2465529 2465261 2461036 2488104| 2461036 [ 472854] FNV1A_Smaragd: 2478378 2483251 2480562 2478002 2479434| 2478002 [ 480914] Alfalfa_Rollick: 2499324 2497969 2500407 2487340 2484967| 2484967 [ 604098] Novak unrolled: 2521693 2523052 2517898 2523059 2509677| 2509677 [ 657377] Murmur2: 2531026 2523656 2519732 2527580 2524589| 2519732 [ 476330] K&R: 2525950 2522337 2567966 2563603 2548883| 2522337 [ 474011] Alfalfa: 2545089 2550223 2539683 2552206 2556244| 2539683 [ 475434] FNV1A_Jester: 2562152 2555758 2567615 2558522 2567506| 2555758 [ 689339] Alfalfa_DWORD: 2576601 2559221 2572886 2569249 2567150| 2559221 [ 475434] Alfalfa_HALF: 2568949 2565813 2571174 2569459 2567984| 2565813 [ 480071] FNV1A_Peregrine: 2578517 2587847 2580838 2584245 2583035| 2578517 [ 546915] x17 unrolled: 2631251 2633371 2590646 2598510 2602628| 2590646 [ 475528] SBox: 2598938 2597768 2598215 2601608 2635853| 2597768 [ 476681] Paul Larson: 2603197 2617318 2619513 2617793 2612003| 2603197 [ 475575] Sedgewick: 2613270 2616042 2615704 2612230 2613473| 2612230 [ 477931] FNV-1a: 2636697 2644461 2644682 2642210 2633728| 2633728 [ 477067] Weinberger: 2645404 2650330 2649391 2640028 2646150| 2640028 [ 1159267] Bernstein: 2650060 2657143 2658727 2657347 2655746| 2650060 [ 474048] Paul Hsieh: 2659945 2664320 2664344 2668886 2660429| 2659945 [ 543835] FNV1A_Whiz: 2675262 2665171 2677243 2678088 2704615| 2665171 [ 689339] Alfalfa_QWORD: 2688756 2686666 2695489 2683671 2682588| 2682588 [ 475434] MaPrime2c: 2748198 2737837 2730565 2707478 2702163| 2702163 [ 477151] lookup3: 2710216 2706776 2711884 2718018 2717402| 2706776 [ 476566] Arash Partow: 2723308 2726973 2725111 2727244 2728437| 2723308 [ 478246] 
Sixtinsensitive: 2724973 2731232 2729760 2726649 2731038| 2724973 [ 582793] Ramakrishna: 2769904 2793210 2788144 2770267 2750457| 2750457 [ 476020] FNV1A_Nefertiti: 2812872 2811425 2809344 2778330 2788216| 2778330 [ 763451] Sixtinsensitive+: 2817990 2812481 2813852 2805091 2815720| 2805091 [ 716367] One At Time: 2830002 2828003 2823084 2827213 2821480| 2821480 [ 477667] x65599: 2906449 2906157 2908646 2904777 2912365| 2904777 [ 654463] Fletcher: 44224682 44243371 44320905 44280935 44190506| 44190506 [ 2856890]
D:\_KA45F~1\r3\RESULTS>sort IP.TXT /+85
8388608 elements in the table (23 bits)
2995394 lines read
CRC-32: 2461652 2465529 2465261 2461036 2488104| 2461036 [ 472854] K&R: 2525950 2522337 2567966 2563603 2548883| 2522337 [ 474011] Bernstein: 2650060 2657143 2658727 2657347 2655746| 2650060 [ 474048] Alfalfa: 2545089 2550223 2539683 2552206 2556244| 2539683 [ 475434] Alfalfa_DWORD: 2576601 2559221 2572886 2569249 2567150| 2559221 [ 475434] Alfalfa_QWORD: 2688756 2686666 2695489 2683671 2682588| 2682588 [ 475434] x17 unrolled: 2631251 2633371 2590646 2598510 2602628| 2590646 [ 475528] Paul Larson: 2603197 2617318 2619513 2617793 2612003| 2603197 [ 475575] Ramakrishna: 2769904 2793210 2788144 2770267 2750457| 2750457 [ 476020] Murmur2: 2531026 2523656 2519732 2527580 2524589| 2519732 [ 476330] lookup3: 2710216 2706776 2711884 2718018 2717402| 2706776 [ 476566] SBox: 2598938 2597768 2598215 2601608 2635853| 2597768 [ 476681] FNV-1a: 2636697 2644461 2644682 2642210 2633728| 2633728 [ 477067] MaPrime2c: 2748198 2737837 2730565 2707478 2702163| 2702163 [ 477151] One At Time: 2830002 2828003 2823084 2827213 2821480| 2821480 [ 477667] Sedgewick: 2613270 2616042 2615704 2612230 2613473| 2612230 [ 477931] Arash Partow: 2723308 2726973 2725111 2727244 2728437| 2723308 [ 478246] Alfalfa_HALF: 2568949 2565813 2571174 2569459 2567984| 2565813 [ 480071] FNV1A_Smaragd: 2478378 2483251 2480562 2478002 2479434| 2478002 [ 480914] FNV1A_Mantis: 2497424 2440808 2442037 2431852 2434459| 2431852 [ 481132] Hanson: 2443092 2452243 2440873 2450233 2454445| 2440873 [ 534251] Paul Hsieh: 2659945 2664320 2664344 2668886 2660429| 2659945 [ 543835] FNV1A_Peregrine: 2578517 2587847 2580838 2584245 2583035| 2578517 [ 546915] Sixtinsensitive: 2724973 2731232 2729760 2726649 2731038| 2724973 [ 582793] FNV1A_Meiyan: 2320681 2325209 2329864 2329377 2330092| 2320681 [ 593723] Alfalfa_Rollick: 2499324 2497969 2500407 2487340 2484967| 2484967 [ 604098] x65599: 2906449 2906157 2908646 2904777 2912365| 2904777 [ 654463] Novak unrolled: 2521693 2523052 2517898 2523059 2509677| 2509677 [ 657377] 
FNV1A_Jester: 2562152 2555758 2567615 2558522 2567506| 2555758 [ 689339] FNV1A_Whiz: 2675262 2665171 2677243 2678088 2704615| 2665171 [ 689339] FNV1A_Jesteress: 2365681 2373005 2366059 2364893 2364232| 2364232 [ 691369] Sixtinsensitive+: 2817990 2812481 2813852 2805091 2815720| 2805091 [ 716367] FNV1A_Nefertiti: 2812872 2811425 2809344 2778330 2788216| 2778330 [ 763451] Weinberger: 2645404 2650330 2649391 2640028 2646150| 2640028 [ 1159267] Fletcher: 44224682 44243371 44320905 44280935 44190506| 44190506 [ 2856890]
D:\_KA45F~1\r3\RESULTS>type BB.TXT
46486 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 34624 36535 34148 34050 35542| 34050 [ 13] FNV1A_Meiyan: 27019 24998 24931 24732 24517| 24517 [ 11] FNV1A_Jesteress: 25405 24216 24112 24240 24471| 24112 [ 11] FNV1A_Jester: 24355 23938 23833 23914 24672| 23833 [ 11] FNV1A_Smaragd: 24712 24679 24311 25139 24759| 24311 [ 11] FNV1A_Peregrine: 34158 33475 32550 33495 33119| 32550 [ 11] FNV1A_Whiz: 24603 24586 24919 26505 24570| 24570 [ 11] FNV1A_Nefertiti: 16523 16228 16047 16162 16373| 16047 [ 6690] FNV-1a: 25285 25305 25102 25242 25163| 25102 [ 12] Sixtinsensitive+: 19649 19808 19619 20769 19910| 19619 [ 31572] Sixtinsensitive: 19995 19031 18857 19831 18853| 18853 [ 31823] Alfalfa_Rollick: 5618 5769 5732 5604 5692| 5604 [ 5182] Alfalfa: 6177 5877 5878 5916 6015| 5877 [ 5182] Alfalfa_HALF: 5829 5808 5901 6096 5934| 5808 [ 10593] Alfalfa_DWORD: 5971 6091 5964 6193 6017| 5964 [ 5182] Alfalfa_QWORD: 6096 6116 6103 5976 6019| 5976 [ 5182] Bernstein: 6743 6508 6616 6860 6642| 6508 [ 12855] K&R: 5660 5396 5552 5467 5352| 5352 [ 10593] x17 unrolled: 7591 7647 7664 7728 7722| 7591 [ 26729] x65599: 9569 9762 9512 9765 9692| 9512 [ 35] Sedgewick: 9453 9310 9669 9504 9511| 9310 [ 11] Weinberger: 7267 7209 7155 7128 7184| 7128 [ 27452] Paul Larson: 6588 6306 6481 6336 6469| 6306 [ 14] Paul Hsieh: 33372 33097 33243 32862 33022| 32862 [ 30] One At Time: 31891 31652 31955 32240 33917| 31652 [ 28] lookup3: 34724 34649 35148 33936 34738| 33936 [ 28] Arash Partow: 7987 7909 7933 8171 7999| 7909 [ 11] CRC-32: 32050 34561 32286 34037 32238| 32050 [ 11] Ramakrishna: 6800 6765 6726 6766 6693| 6693 [ 11920] Fletcher: 20746 21133 20787 20814 21500| 20746 [ 42071] Murmur2: 28144 28965 27766 27079 27476| 27079 [ 30] Hanson: 32326 32349 32356 32086 32453| 32086 [ 33] Novak unrolled: 22379 22860 23012 22888 22864| 22379 [ 43340] SBox: 32514 32536 31759 31992 32705| 31759 [ 26] MaPrime2c: 30483 30545 31196 31047 30942| 30483 [ 11]
248019 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 179028 174553 177981 175523 174781| 174553 [ 427] FNV1A_Meiyan: 123632 122620 120915 119542 119672| 119542 [ 365] FNV1A_Jesteress: 116618 116521 114923 115622 115284| 114923 [ 1938] FNV1A_Jester: 114802 114067 118031 114699 115439| 114067 [ 1938] FNV1A_Smaragd: 122969 125695 121732 123395 122146| 121732 [ 365] FNV1A_Peregrine: 182437 178080 179506 184654 181708| 178080 [ 365] FNV1A_Whiz: 117534 115430 116303 115853 117752| 115430 [ 1938] FNV1A_Nefertiti: 105310 106214 106906 108339 106486| 105310 [ 13655] FNV-1a: 117850 115653 116143 113840 115203| 113840 [ 411] Sixtinsensitive+: 141117 138681 140142 138261 137625| 137625 [ 145278] Sixtinsensitive: 112584 112133 113127 115259 113846| 112133 [ 139136] Alfalfa_Rollick: 44101 44551 44016 44380 44286| 44016 [ 13663] Alfalfa: 48331 46665 46135 46336 45574| 45574 [ 13663] Alfalfa_HALF: 42777 40995 41830 41363 41640| 40995 [ 22302] Alfalfa_DWORD: 48295 47415 49170 46909 48033| 46909 [ 13666] Alfalfa_QWORD: 49057 47531 49249 50390 48969| 47531 [ 13663] Bernstein: 44504 43530 43335 45011 42987| 42987 [ 37847] K&R: 35828 35036 34723 36075 35468| 34723 [ 22302] x17 unrolled: 42351 41905 42011 43212 41050| 41050 [ 98876] x65599: 65130 64455 65439 64865 65220| 64455 [ 492] Sedgewick: 67421 66574 68726 69338 67572| 66574 [ 460] Weinberger: 39373 38517 38971 39534 39034| 38517 [ 97808] Paul Larson: 46692 47034 47393 46918 46783| 46692 [ 312] Paul Hsieh: 181643 180929 179932 180131 181471| 179932 [ 481] One At Time: 176486 178151 175074 179022 175743| 175074 [ 585] lookup3: 186712 187299 187604 191075 188396| 186712 [ 480] Arash Partow: 64996 68647 66338 66992 65996| 64996 [ 560] CRC-32: 175958 176258 177073 178955 174797| 174797 [ 864] Ramakrishna: 46777 48286 46497 46997 45853| 45853 [ 30316] Fletcher: 136146 133758 146270 133606 134351| 133606 [ 49303] Murmur2: 173469 171420 172774 184955 173339| 171420 [ 451] Hanson: 171761 174920 173488 172920 171958| 171761 [ 481] Novak unrolled: 443212 446049 441939 439340 
441672| 439340 [ 238439] SBox: 171345 169739 169607 172310 169658| 169607 [ 448] MaPrime2c: 168898 170270 169093 170387 168243| 168243 [ 556]
855682 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 635469 619881 616091 617357 615924| 615924 [ 5518] FNV1A_Meiyan: 415088 419401 417240 417605 416751| 415088 [ 5330] FNV1A_Jesteress: 415868 417644 417151 418238 416844| 415868 [ 5968] FNV1A_Jester: 416369 415653 417404 418992 417513| 415653 [ 5968] FNV1A_Smaragd: 439764 441560 445192 443800 445971| 439764 [ 5330] FNV1A_Peregrine: 633723 638664 636564 629140 635454| 629140 [ 5330] FNV1A_Whiz: 419561 419168 415653 416053 422086| 415653 [ 5968] FNV1A_Nefertiti: 346358 346958 342009 343926 346074| 342009 [ 30631] FNV-1a: 569274 565082 568014 565177 564142| 564142 [ 5560] Sixtinsensitive+: 459193 459959 458521 455978 454754| 454754 [ 397985] Sixtinsensitive: 434941 436744 435405 435906 439194| 434941 [ 399637] Alfalfa_Rollick: 221477 221568 223042 223021 222469| 221477 [ 26670] Alfalfa: 225888 228010 227636 225752 224918| 224918 [ 26670] Alfalfa_HALF: 204340 201225 203866 202078 203984| 201225 [ 35000] Alfalfa_DWORD: 228464 229762 231969 227542 228853| 227542 [ 26671] Alfalfa_QWORD: 236423 235609 235735 234026 235571| 234026 [ 26670] Bernstein: 206162 206211 205870 205911 204922| 204922 [ 59573] K&R: 175157 176214 177255 176008 178668| 175157 [ 34582] x17 unrolled: 183717 184090 182407 182280 185108| 182280 [ 190251] x65599: 320093 320066 317655 318289 320661| 317655 [ 4932] Sedgewick: 303817 303858 305741 304128 306254| 303817 [ 4428] Weinberger: 168826 170675 170027 169820 171513| 168826 [ 177873] Paul Larson: 245727 247790 246504 247241 246850| 245727 [ 5499] Paul Hsieh: 649603 649342 654617 650496 655266| 649342 [ 30488] One At Time: 630109 633306 630020 630335 632801| 630020 [ 5476] lookup3: 655841 654395 659685 654746 657823| 654395 [ 5491] Arash Partow: 269233 269534 268099 268478 267374| 267374 [ 6272] CRC-32: 619892 618408 613697 611186 616191| 611186 [ 4450] Ramakrishna: 224644 226707 225684 225891 225163| 224644 [ 45786] Fletcher: 354516 353851 353636 355539 355643| 353636 [ 656967] Murmur2: 603138 605572 597462 602469 612482| 597462 [ 5387] 
Hanson: 617870 655715 616389 618870 625239| 616389 [ 5502] Novak unrolled: 3830475 3728574 3726761 3732577 3731494| 3726761 [ 827051] SBox: 617466 619852 616287 620911 622378| 616287 [ 5531] MaPrime2c: 608066 606798 609259 607319 609889| 606798 [ 5418]
2236138 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 1693726 1672778 1662022 1667975 1663712| 1662022 [ 36643] FNV1A_Meiyan: 1398600 1396525 1397032 1398618 1402294| 1396525 [ 38870] FNV1A_Jesteress: 1293633 1292957 1286705 1289538 1285456| 1285456 [ 52366] FNV1A_Jester: 1292520 1286581 1292729 1292907 1290687| 1286581 [ 52366] FNV1A_Smaragd: 1621731 1628939 1624750 1622775 1623560| 1621731 [ 38870] FNV1A_Peregrine: 1745616 1750922 1752428 1753970 1752587| 1745616 [ 38870] FNV1A_Whiz: 1405758 1414269 1403700 1401918 1396313| 1396313 [ 52366] FNV1A_Nefertiti: 1289850 1261247 1258932 1260898 1261091| 1258932 [ 120192] FNV-1a: 1567310 1564706 1568154 1567700 1564842| 1564706 [ 37073] Sixtinsensitive+: 1476991 1482146 1482924 1474038 1479225| 1474038 [ 874695] Sixtinsensitive: 1607388 1598777 1610932 1603888 1607572| 1598777 [ 853608] Alfalfa_Rollick: 762068 777119 760775 768934 762752| 760775 [ 61964] Alfalfa: 778235 769760 767723 777981 771836| 767723 [ 61964] Alfalfa_HALF: 736108 708959 712845 713131 714714| 708959 [ 77952] Alfalfa_DWORD: 782912 782056 781509 782052 781405| 781405 [ 61957] Alfalfa_QWORD: 801646 806199 803009 799110 800906| 799110 [ 61964] Bernstein: 714618 718075 721649 718863 718033| 714618 [ 94387] K&R: 637258 635055 644195 644498 639929| 635055 [ 79819] x17 unrolled: 611861 610664 611586 610798 613531| 610664 [ 261696] x65599: 1017236 1011981 1017402 1011745 1014835| 1011745 [ 37963] Sedgewick: 977710 974922 976937 977692 986235| 974922 [ 36010] Weinberger: 665221 655858 659006 660126 659379| 655858 [ 256510] Paul Larson: 828451 828535 829061 826011 824462| 824462 [ 36486] Paul Hsieh: 1757052 1764429 1759602 1764186 1765198| 1757052 [ 75554] One At Time: 1751552 1745773 1752822 1742071 1750032| 1742071 [ 36793] lookup3: 1775213 1782342 1788541 1776350 1779976| 1775213 [ 36983] Arash Partow: 975995 975288 973406 972132 974392| 972132 [ 36388] CRC-32: 1675912 1674248 1679114 1686786 1672200| 1672200 [ 37609] Ramakrishna: 783869 781304 774580 777867 781566| 774580 [ 82766] Fletcher: 
1729741 1746034 1739662 1743231 1741643| 1729741 [ 695965] Murmur2: 1673253 1681208 1690136 1679592 1705678| 1673253 [ 36934] Hanson: 1709350 1719499 1688917 1686299 1710949| 1686299 [ 37167] Novak unrolled: 12807728 12814669 12812825 12829304 12809311| 12807728 [ 2152234] SBox: 1690921 1688167 1682024 1685474 1696212| 1682024 [ 36476] MaPrime2c: 1687514 1683882 1690934 1687003 1690998| 1683882 [ 37112]
4803152 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 3925306 3863029 3859598 3865504 3854072| 3854072 [ 167297] FNV1A_Meiyan: 3192046 3195965 3179265 3184785 3189121| 3179265 [ 168339] FNV1A_Jesteress: 3176691 3219146 3182731 3173671 3149210| 3149210 [ 170503] FNV1A_Jester: 3155641 3146388 3141771 3149879 3150006| 3141771 [ 170503] FNV1A_Smaragd: 3666858 3654821 3662842 3665221 3658382| 3654821 [ 168339] FNV1A_Peregrine: 3867947 3860289 3854550 3860177 3857658| 3854550 [ 168339] FNV1A_Whiz: 3182447 3195344 3196748 3185454 3184057| 3182447 [ 170503] FNV1A_Nefertiti: 3046587 3035378 3034303 3032712 3030773| 3030773 [ 215095] FNV-1a: 3640750 3650958 3639683 3654937 3646469| 3639683 [ 168058] Sixtinsensitive+: 3253634 3254085 3240195 3271119 3274314| 3240195 [ 1446295] Sixtinsensitive: 3320076 3295173 3299524 3298457 3312531| 3295173 [ 1433399] Alfalfa_Rollick: 2053934 2053399 2053120 2058154 2054123| 2053120 [ 195034] Alfalfa: 2081500 2073797 2072266 2068358 2077348| 2068358 [ 195034] Alfalfa_HALF: 1950333 1947412 1954580 1958729 1950574| 1947412 [ 214254] Alfalfa_DWORD: 2096353 2095004 2098951 2088296 2094393| 2088296 [ 195036] Alfalfa_QWORD: 2156447 2162179 2169995 2169146 2166216| 2156447 [ 195034] Bernstein: 1955300 1967808 1970984 1964697 1961368| 1955300 [ 227590] K&R: 1797894 1801834 1799419 1799418 1805518| 1797894 [ 211850] x17 unrolled: 1698527 1693994 1691649 1694846 1692182| 1691649 [ 432040] x65599: 2588406 2597619 2593762 2588259 2583001| 2583001 [ 168139] Sedgewick: 2523482 2521966 2543139 2544848 2558079| 2521966 [ 172013] Weinberger: 2208798 2219657 2188612 2187141 2191627| 2187141 [ 771987] Paul Larson: 2192456 2185158 2190565 2189394 2193449| 2185158 [ 166691] Paul Hsieh: 3943296 3943258 3937383 3948230 3936017| 3936017 [ 238759] One At Time: 4000288 3997946 4001915 3991514 4003329| 3991514 [ 167708] lookup3: 3962573 3961724 3971915 3965543 3973480| 3961724 [ 167908] Arash Partow: 2344027 2340630 2358305 2344931 2342474| 2340630 [ 165769] CRC-32: 3854146 3859270 3865907 3861287 3841605| 
3841605 [ 166743] Ramakrishna: 2124189 2133556 2123155 2125564 2141054| 2123155 [ 220086] Fletcher: 2922879 2927199 2931088 2917889 2913200| 2913200 [ 3262980] Murmur2: 3801662 3800472 3803887 3807522 3798062| 3798062 [ 167779] Hanson: 3818130 3830602 3831806 3825353 3832675| 3818130 [ 169743] Novak unrolled: 24990613 25009525 24991605 25198933 25036322| 24990613 [ 4562018] SBox: 3835014 3826787 3823619 3822488 3823134| 3822488 [ 167630] MaPrime2c: 3863293 3869203 3873062 3865968 3880730| 3863293 [ 167237]
8956496 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 7369727 7247741 7261370 7253739 7266209| 7247741 [ 574438] FNV1A_Meiyan: 5847582 5886598 5930016 5881373 5854930| 5847582 [ 1028978] FNV1A_Jesteress: 5836792 5850374 5854357 5860963 5850093| 5836792 [ 1028978] FNV1A_Jester: 5769096 5750931 5764104 5758504 5764662| 5750931 [ 913448] FNV1A_Smaragd: 7574255 7570837 7558324 7565386 7571800| 7558324 [ 574438] FNV1A_Peregrine: 7647868 7636216 7672854 7713413 7620732| 7620732 [ 913448] FNV1A_Whiz: 5811576 5802315 5804006 5807046 5795549| 5795549 [ 913448] FNV1A_Nefertiti: 6992685 7015027 7011395 7016150 6997868| 6992685 [ 896124] FNV-1a: 7426811 7423883 7422975 7425308 7492168| 7422975 [ 572119] Sixtinsensitive+: 7490178 7494893 7567449 7527364 7506432| 7490178 [ 2560474] Sixtinsensitive: 6603978 6604723 6613926 6612349 6612198| 6603978 [ 2277956] Alfalfa_Rollick: 5270257 5276405 5265746 5261441 5276458| 5261441 [ 1633539] Alfalfa: 4773629 4768585 4773963 4765614 4767022| 4765614 [ 598532] Alfalfa_HALF: 4551844 4549514 4562170 4564350 4556019| 4549514 [ 613540] Alfalfa_DWORD: 4934446 4942344 4953421 4915923 4916364| 4915923 [ 598520] Alfalfa_QWORD: 5140486 5153957 5146400 5159958 5143577| 5140486 [ 598520] Bernstein: 4621485 4622047 4609047 4626567 4616239| 4609047 [ 632926] K&R: 4615881 4608413 4612256 4608572 4608237| 4608237 [ 609893] x17 unrolled: 4080313 4077832 4055702 4069213 4080348| 4055702 [ 868872] x65599: 5683539 5666888 5694646 5737104 5727959| 5666888 [ 574269] Sedgewick: 5651701 5627789 5631242 5624196 5614715| 5614715 [ 570722] Weinberger: 6571422 6584821 6580232 6565814 6570873| 6565814 [ 2462847] Paul Larson: 4983728 4986882 4984806 4982408 4977153| 4977153 [ 573492] Paul Hsieh: 7963669 7934035 7959702 7964171 7986804| 7934035 [ 609781] One At Time: 8281677 8236676 8181044 8174620 8171247| 8171247 [ 572492] lookup3: 8131042 8098702 8119699 8131122 8127732| 8098702 [ 571014] Arash Partow: 5694880 5689350 5703258 5687727 5695461| 5687727 [ 569873] CRC-32: 7737260 7735351 7731257 7817579 
7805965| 7731257 [ 570199] Ramakrishna: 4960288 4975901 4964380 4955695 4947370| 4947370 [ 629279] Fletcher: 8703788 8735616 8710538 8720857 8710367| 8703788 [ 2646718] Murmur2: 7772966 7779928 7763846 7781060 7776226| 7763846 [ 570799] Hanson: 7982700 8000687 8098676 7959213 7977307| 7959213 [ 581505] Novak unrolled: 36453591 36501725 36504178 36452614 35493766| 35493766 [ 8271712] SBox: 7534769 7522317 7502772 7527539 7516409| 7502772 [ 572545] MaPrime2c: 7631536 7603833 7626193 7617020 7676645| 7603833 [ 572590]
15006172 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 13805664 13605789 13605469 13595929 13624790| 13595929 [ 1560014] FNV1A_Meiyan: 11637701 11612695 11614641 11671544 11646487| 11612695 [ 1656865] FNV1A_Jesteress: 11769885 11633601 11644545 11629472 11618162| 11618162 [ 1656865] FNV1A_Jester: 11646312 11644806 11627881 11614360 11653957| 11614360 [ 1599151] FNV1A_Smaragd: 13708741 13801274 13802221 13726471 13700392| 13700392 [ 1560794] FNV1A_Peregrine: 13762367 13768122 13775727 13756456 13755180| 13755180 [ 1599151] FNV1A_Whiz: 11692615 11707611 11830349 11673884 11728753| 11673884 [ 1599151] FNV1A_Nefertiti: 11818117 11783640 11799180 11792281 11818414| 11783640 [ 1642435] FNV-1a: 14438052 14415112 14424435 14534772 14397478| 14397478 [ 1559756] Sixtinsensitive+: 12537062 12533209 12528706 12526817 12539785| 12526817 [ 3634999] Sixtinsensitive: 12359837 12363844 12352257 12369827 12430243| 12352257 [ 3573121] Alfalfa_Rollick: 9596214 9576997 9596140 9585540 9586623| 9576997 [ 1796113] Alfalfa: 9631180 9631644 9646953 9631102 9645931| 9631102 [ 1585513] Alfalfa_HALF: 9309513 9308152 9309767 9386163 9309212| 9308152 [ 1599087] Alfalfa_DWORD: 9784801 9779266 9768980 9776590 9778093| 9768980 [ 1585512] Alfalfa_QWORD: 10243282 10241185 10250929 10247302 10227501| 10227501 [ 1585512] Bernstein: 10167814 10222175 10258834 10199647 10188531| 10167814 [ 1613382] K&R: 9476524 9501514 9493102 9506902 9485744| 9476524 [ 1601151] x17 unrolled: 8488170 8457047 8465708 8474736 8470738| 8457047 [ 1874514] x65599: 11919751 12111993 11896090 11916367 11906404| 11896090 [ 1560060] Sedgewick: 11883738 11902481 11876890 11862498 11874301| 11862498 [ 1561162] Weinberger: 14886178 14897714 15010720 14889111 14912039| 14886178 [ 4913597] Paul Larson: 9985066 9988315 9993121 9980663 9987270| 9980663 [ 1558905] Paul Hsieh: 14493691 14487566 14472722 14616287 14481414| 14472722 [ 1601618] One At Time: 15014211 15011538 14998070 15011306 15034171| 14998070 [ 1559519] lookup3: 14432190 14404073 14484438 14466880 14432023| 
14404073 [ 1559611] Arash Partow: 10727471 10704561 10702966 10736381 10695439| 10695439 [ 1561948] CRC-32: 13970409 14002291 14020990 14111270 14031718| 13970409 [ 1559521] Ramakrishna: 10870117 10871609 10863313 10869283 10873759| 10863313 [ 1617827] Fletcher: 11197838 11199418 11188781 11211803 11207085| 11188781 [ 8696395] Murmur2: 14182899 14158526 14076046 14113060 14073436| 14073436 [ 1558392] Hanson: 14374202 14351157 14381037 14363612 14357884| 14351157 [ 1589703] Novak unrolled: 42308739 42114355 42204100 42355087 42168342| 42114355 [13102749] SBox: 14636086 14655729 14614969 14637515 14624832| 14614969 [ 1560519] MaPrime2c: 14980645 14880612 14869258 14870841 14866759| 14866759 [ 1557844]
22992127 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 24055313 23752703 23627674 23635254 23651268| 23627674 [ 3525152] FNV1A_Meiyan: 21484521 21485590 21539671 21508878 21479985| 21479985 [ 3611850] FNV1A_Jesteress: 21471437 21479105 21470194 21441619 21588275| 21441619 [ 3611850] FNV1A_Jester: 21669658 21666314 21623420 21652606 21653595| 21623420 [ 3654062] FNV1A_Smaragd: 24030079 23964112 23916920 23921989 23915584| 23915584 [ 3525152] FNV1A_Peregrine: 24544415 24646189 24553231 24515824 24578414| 24515824 [ 3654062] FNV1A_Whiz: 21591734 21588433 21718497 21647686 21668049| 21588433 [ 3654062] FNV1A_Nefertiti: 22574103 22547113 22552888 22612894 22634642| 22547113 [ 3739373] FNV-1a: 24375206 24398812 24382207 24420587 24455063| 24375206 [ 3527537] Sixtinsensitive+: 23638104 23647702 23623564 23634566 23623324| 23623324 [ 5880454] Sixtinsensitive: 22370814 22356235 22289947 22309625 22289803| 22289803 [ 5652221] Alfalfa_Rollick: 18203747 18184287 18302923 18218867 18196632| 18184287 [ 3613229] Alfalfa: 18409370 18403101 18403470 18391784 18407508| 18391784 [ 3544682] Alfalfa_HALF: 17942649 17848766 17844460 17813733 17825962| 17813733 [ 3558121] Alfalfa_DWORD: 18700603 18698421 18747674 18674743 18655766| 18655766 [ 3544682] Alfalfa_QWORD: 19368576 19378340 19362669 19361865 19346436| 19346436 [ 3544682] Bernstein: 18359820 18240404 18241044 18224922 18250261| 18224922 [ 3574833] K&R: 17517888 17518595 17556678 17588614 17500726| 17500726 [ 3561685] x17 unrolled: 16621547 16643586 16639577 16625549 16634323| 16621547 [ 3832933] x65599: 20818696 20817156 20729503 20772917 20736959| 20729503 [ 3528251] Sedgewick: 20434750 20463676 20561028 20480465 20457107| 20434750 [ 3525723] Weinberger: 26213981 26230236 26190034 26320772 26191497| 26190034 [ 7239926] Paul Larson: 20373400 20412562 20383213 20385771 20417860| 20373400 [ 3523757] Paul Hsieh: 25233187 25148714 25200371 25140735 25108527| 25108527 [ 3551794] One At Time: 26273020 26147008 26154750 26122919 26156684| 26122919 [ 3525042] lookup3: 25438420 
25540246 25435166 25411436 25469256| 25411436 [ 3526141] Arash Partow: 20932539 20950835 20953799 20893649 20888163| 20888163 [ 3528077] CRC-32: 24446832 24401820 24418762 24552466 24398535| 24398535 [ 3529021] Ramakrishna: 19294474 19292354 19285851 19301630 19361718| 19285851 [ 3580093] Fletcher: 29491893 29393892 29382254 29383070 29559628| 29382254 [ 7785869] Murmur2: 24637145 24616643 24636645 24621049 24614172| 24614172 [ 3523751] Hanson: 25108618 24953984 24937324 24961635 24952696| 24937324 [ 3607402] Novak unrolled: 45287398 45201809 45162351 45247704 45070137| 45070137 [17947092] SBox: 24623427 24650569 24603930 24770727 24644057| 24603930 [ 3523461] MaPrime2c: 25086850 25030747 25112970 25110324 25208499| 25030747 [ 3525183]
32707519 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 39663161 38969335 38933207 38856246 38873958| 38856246 [ 6818433] FNV1A_Meiyan: 37322456 37169246 37140526 37202370 37295058| 37140526 [ 6848898] FNV1A_Jesteress: 37151332 37169251 37148774 37289624 37146213| 37146213 [ 6848898] FNV1A_Jester: 37740139 37803620 37784132 37714810 37710040| 37710040 [ 6844433] FNV1A_Smaragd: 38917671 38764848 38805502 38781550 38871100| 38764848 [ 6820479] FNV1A_Peregrine: 39110328 39137217 39238079 39198853 39122895| 39110328 [ 6844433] FNV1A_Whiz: 37999460 38084763 38004072 38034855 38018002| 37999460 [ 6844433] FNV1A_Nefertiti: 35477881 35314767 35317648 35312868 35476589| 35312868 [ 6862897] FNV-1a: 39935647 39901778 39919333 40023990 39893528| 39893528 [ 6820338] Sixtinsensitive+: 37211159 37340669 37211783 37221348 37240185| 37211159 [ 8918604] Sixtinsensitive: 36839819 36755880 36778962 36790829 36880758| 36755880 [ 8890228] Alfalfa_Rollick: 30997547 30981009 31010693 31102597 31006573| 30981009 [ 6874377] Alfalfa: 31737062 31762501 31742817 31887839 31774054| 31737062 [ 6837237] Alfalfa_HALF: 31115319 31097943 31254094 31134033 31107714| 31097943 [ 6844423] Alfalfa_DWORD: 31877000 31892971 31991164 31877463 31879958| 31877000 [ 6837236] Alfalfa_QWORD: 32856874 32982880 32898091 32835536 32852958| 32835536 [ 6837236] Bernstein: 31919522 31814902 31776473 31806821 31801622| 31776473 [ 6857971] K&R: 31123177 30974104 30959249 30974944 31050042| 30959249 [ 6850978] x17 unrolled: 29386189 29363602 29375920 29349031 29463899| 29349031 [ 7098408] x65599: 35127704 35115087 35096492 35247998 35097892| 35096492 [ 6819716] Sedgewick: 34917368 34883533 35086999 34888583 34912648| 34883533 [ 6818912] Weinberger: 41254124 41321017 41222180 41258375 41281811| 41222180 [10307895] Paul Larson: 32874893 32887489 32870621 32829068 32969357| 32829068 [ 6817853] Paul Hsieh: 40409490 40325257 40516598 40411823 40368034| 40325257 [ 6855160] One At Time: 42473111 42621639 42489644 42481293 42668990| 42473111 [ 6818900] lookup3: 39942929 
39938528 39943634 40094568 39925246| 39925246 [ 6816040] Arash Partow: 34247173 34315764 34318592 34220246 34266975| 34220246 [ 6820854] CRC-32: 39786257 39800688 39757461 39725372 39891213| 39725372 [ 6818991] Ramakrishna: 33337713 33359240 33358901 33467731 33378484| 33337713 [ 6864715] Fletcher: 34382645 34384318 34476525 34384583 34358066| 34358066 [17501262] Murmur2: 39472463 39601819 39483558 39451986 39443158| 39443158 [ 6817202] Hanson: 40538844 40402486 40392937 40502717 40417510| 40392937 [ 7000005] Novak unrolled: 44247384 44242070 44412707 44241831 44227650| 44227650 [20559205] SBox: 40115423 40006667 39980360 40095793 39988604| 39980360 [ 6821829] MaPrime2c: 41060106 41075050 41175522 41060267 41057558| 41057558 [ 6819946]
43802365 lines read
67108864 elements in the table (26 bits)
FNV1A_Mantis: 59952876 58803469 58923244 58792350 58989388| 58792350 [11635004] FNV1A_Meiyan: 55465475 55474404 55508029 55424802 55438444| 55424802 [11659429] FNV1A_Jesteress: 52497350 52380658 52576751 52399304 52419711| 52380658 [11818547] FNV1A_Jester: 53425354 53300316 53324168 53372656 53241243| 53241243 [12062644] FNV1A_Smaragd: 59908693 59741994 59788332 59969316 59756263| 59741994 [11635004] FNV1A_Peregrine: 60292033 60116021 60306069 60136755 60224361| 60116021 [11677586] FNV1A_Whiz: 57514713 57300395 57518028 57355838 57384306| 57300395 [12062644] FNV1A_Nefertiti: 58579235 58369723 58497589 58368938 58472111| 58368938 [12199626] FNV-1a: 61350065 61227879 61371281 61221694 61322523| 61221694 [11630492] Sixtinsensitive+: 58750571 58831447 58919152 58808174 58837582| 58750571 [14097063] Sixtinsensitive: 60824218 60843024 60732440 60798747 60949721| 60732440 [14027127] Alfalfa_Rollick: 50700991 50861491 50757394 50707789 50866710| 50700991 [11667363] Alfalfa: 51331604 51254984 51374576 51204869 51215921| 51204869 [11646579] Alfalfa_HALF: 50498639 50357422 50475334 50435161 50427230| 50357422 [11652948] Alfalfa_DWORD: 51926525 51882880 51921051 51988651 51929979| 51882880 [11646593] Alfalfa_QWORD: 53368751 53431091 53401526 53457158 53399570| 53368751 [11646573] Bernstein: 51780948 51834772 51740616 51786817 51980943| 51740616 [11662585] K&R: 50720934 50823237 50784556 50750716 50848267| 50720934 [11655087] x17 unrolled: 48435214 48430906 48575140 48432371 48455008| 48430906 [11861222] x65599: 56005015 55872129 55909899 55947120 55903420| 55872129 [11632515] Sedgewick: 55947483 55828349 55855731 55938120 55820960| 55820960 [11633458] Weinberger: 64107358 63999034 64133733 64036367 64080080| 63999034 [15481171] Paul Larson: 53084388 53004950 53146557 53031828 53055844| 53004950 [11630978] Paul Hsieh: 61529775 61391502 61527462 61366620 61415225| 61366620 [11651683] One At Time: 65024854 64981867 65087212 64972332 65096548| 64972332 [11634172] lookup3: 62310133 
62428054 62303782 62457776 62342284| 62303782 [11632483] Arash Partow: 56572844 56691783 56562409 56631840 56602057| 56562409 [11628687] CRC-32: 60661602 60789672 60664234 60804397 60621855| 60621855 [11633685] Ramakrishna: 53972634 53880488 53896432 53976477 53845704| 53845704 [11671893] Fletcher: 74218474 74084565 74190736 74047460 74184412| 74047460 [17365212] Murmur2: 60389939 60448616 60500681 60306495 60454486| 60306495 [11630747] Hanson: 62505541 62568248 62495065 62509340 62576766| 62495065 [11992494] Novak unrolled: 43245062 43296038 43220401 43189639 43173652| 43173652 [19028002] SBox: 61389499 61282368 61384062 61288236 61365819| 61282368 [11633664] MaPrime2c: 62738819 62847730 62865690 62807543 62915332| 62738819 [11628836]
D:\_KA45F~1\r3\RESULTS>
To make your testbed more versatile, I suggest adding the Heavy-IP dataset (IPs.TXT, 2,995,394 keys) as a table next to the en-wikipedia dataset; I think millions of words and IPs are a must-show basis. I don't like words such as 'untrustworthy' (speaking of some critics) being connected with my name.
As for the match-finding test, I think the critics should offer first and shoot next.
Regards.
Oh, I forgot to give the testbed for all the dumps above:
http://www.sanmayce.com/Downloads/_KAZE_hash_test_r3.7z
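For anyone post-processing the dumps above, here is a minimal C sketch that parses one record. It assumes each record has the shape `Name: t1 t2 t3 t4 t5| best [ collisions]`, where the five numbers are repeated timing runs, the value after `|` is the smallest of them, and the bracketed value is the collision count. That reading is inferred from the dumps themselves (the units are not stated); `DumpRecord`, `parse_dump_record` and `min_of_runs` are hypothetical names, not part of the testbed.

```c
#include <assert.h>
#include <stdio.h>

/* One record of a dump (field meanings are an assumption, see above). */
typedef struct {
    char name[32];           /* hash function name, may contain spaces */
    unsigned long runs[5];   /* five timing runs, units unknown */
    unsigned long best;      /* value printed after '|' */
    unsigned long collisions;/* value printed in brackets */
} DumpRecord;

/* Parses "Name: t1 t2 t3 t4 t5| best [ collisions]".
   Returns 1 on success, 0 if the text is not a full record. */
int parse_dump_record(const char *line, DumpRecord *r)
{
    assert(line != NULL && r != NULL);
    int n = sscanf(line, " %31[^:]: %lu %lu %lu %lu %lu | %lu [ %lu ]",
                   r->name, &r->runs[0], &r->runs[1], &r->runs[2],
                   &r->runs[3], &r->runs[4], &r->best, &r->collisions);
    return n == 8;
}

/* Smallest of the five runs, to cross-check the printed best value. */
unsigned long min_of_runs(const DumpRecord *r)
{
    unsigned long m = r->runs[0];
    for (int i = 1; i < 5; ++i)
        if (r->runs[i] < m)
            m = r->runs[i];
    return m;
}
```

Comparing `min_of_runs(&r)` with `r.best` is a cheap sanity check that a record was not damaged when the dump was pasted.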
Thanks, but Mantis reads beyond the buffer boundary if wrdlen < 15. For example, if wrdlen == 7:
Yes Peter, again a stupid error on my side, caused by the hurry I am in these days. I saw it yesterday, 3 hours after updating my site, and fixed it by wrapping the tail in 'if (wrdlen) {}'.
I will post again in an hour. Last night I also reran the 200MB test, with a big table in mind for the weekend.
Mantis is a very good predator/hash; I will be glad if you test it on your machine.
Thanks.
The Mantis source and the benchmark dumps (http://www.sanmayce.com/Downloads/_KAZE_hash_test_r3.7z, 224,916,121 bytes) are fixed now;
here is the bug-free Mantis:
#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Mantis(const char *str, SIZE_T wrdlen)
{
    const UINT PRIME = 709607;
    UINT hash32 = 2166136261;
    const char *p = str;
    // Cases: 0,1,2,3,4,5,6,7
    if (wrdlen & sizeof(DWORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        //wrdlen -= sizeof(DWORD);
    }
    if (wrdlen & sizeof(WORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
        //wrdlen -= sizeof(WORD);
    }
    if (wrdlen & 1) {
        hash32 = (hash32 ^ *p) * PRIME;
        p += sizeof(char);
        //wrdlen -= sizeof(char);
    }
    wrdlen -= p-str;
    // The goal is to avoid the weak range [8, 8+2, 8+1] that is 8..10 in practice 1..15 i.e. 1..8+4+2+1, thus amending FNV1A_Meiyan and FNV1A_Jesteress.
    // FNV1A_Jesteress: fastest strong
    // FNV1A_Meiyan   : faster stronger
    // FNV1A_Mantis   : fast strongest
    if (wrdlen) {
        for(; wrdlen > 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) {
            hash32 = (hash32 ^ (ROL(*(DWORD *)p,5)^*(DWORD *)(p+4))) * PRIME;
        }
        hash32 = (hash32 ^ *(WORD*)(p+0*sizeof(WORD))) * PRIME;
        hash32 = (hash32 ^ *(WORD*)(p+1*sizeof(WORD))) * PRIME;
        hash32 = (hash32 ^ *(WORD*)(p+2*sizeof(WORD))) * PRIME;
        hash32 = (hash32 ^ *(WORD*)(p+3*sizeof(WORD))) * PRIME;
    } // Bug Fixed!
    return hash32 ^ (hash32 >> 16);
}
Mantis results on Pentium M:
Core i5:
Generally, Mantis has a similar number of collisions to Meiyan, but it is slower.
Thanks for testing.
The link below is my attempt to present the Heavy-Hash-Hustle dumps in a more digestible fashion:
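The fixed Mantis listing reads the key through WORD and DWORD pointer casts, so it depends on unaligned loads being allowed. As a hedged sketch only (not the author's code), the same steps can be written with `memcpy` loads, which makes it easier to check that no byte past `len` is touched. The names `mantis_sketch`, `load16`, `load32` and `rol32` are made up for this illustration, and the single leftover byte is read as `unsigned char`, which can differ from the listing for bytes of 0x80 and above on compilers where `char` is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned loads in native byte order, like the pointer casts in the listing. */
static uint32_t load16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t load32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t rol32(uint32_t x, unsigned n)  { return (x << n) | (x >> (32 - n)); }

/* Sketch of the fixed Mantis steps; reads exactly len bytes of key. */
uint32_t mantis_sketch(const void *key, size_t len)
{
    assert(key != NULL || len == 0);
    const uint32_t PRIME = 709607u;
    uint32_t h = 2166136261u;
    const unsigned char *start = key;
    const unsigned char *p = start;

    /* Consume len % 8 bytes first: 4, then 2, then 1. */
    if (len & 4) {
        h = (h ^ load16(p)) * PRIME; p += 2;
        h = (h ^ load16(p)) * PRIME; p += 2;
    }
    if (len & 2) { h = (h ^ load16(p)) * PRIME; p += 2; }
    if (len & 1) { h = (h ^ *p) * PRIME; p += 1; } /* unsigned byte: an assumption */
    len -= (size_t)(p - start);

    /* What remains is a multiple of 8. The guard is the fix discussed above:
       without it, a key shorter than 8 bytes falls into the 8-byte tail. */
    if (len) {
        for (; len > 8; len -= 8, p += 8)
            h = (h ^ (rol32(load32(p), 5) ^ load32(p + 4))) * PRIME;
        h = (h ^ load16(p + 0)) * PRIME;
        h = (h ^ load16(p + 2)) * PRIME;
        h = (h ^ load16(p + 4)) * PRIME;
        h = (h ^ load16(p + 6)) * PRIME;
    }
    return h ^ (h >> 16);
}
```

A quick way to exercise the guard is to hash the same 7-byte key from two buffers that differ only after byte 7: the results must match, because nothing past the key is read.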
http://www.sanmayce.com/Fastest_Hash/index.html#Heavy-Hash-Hustle
Intel Core 2 Quad Q9550S Yorkfield 2.83GHz 12MB L2 Cache:
8388608 elements in the table (23 bits)
2995394 lines read
FNV1A_Meiyan: 1914155 1914404 1914122 1914486 1913915| 1913915 [ 593723]
FNV1A_Jesteress: 1935195 1937318 1937469 1936312 1935495| 1935195 [ 691369]
Alfalfa_Rollick: 1980953 1981207 1980671 1981201 1981856| 1980671 [ 604098]
FNV1A_Mantis: 2016612 1995406 1993654 1993662 1993258| 1993258 [ 481137]
Novak unrolled: 2010097 2009996 2009829 2010971 2009834| 2009829 [ 657377]
Hanson: 2012466 2012191 2012537 2012350 2012236| 2012191 [ 534251]
FNV1A_Smaragd: 2030879 2028917 2029161 2029013 2028666| 2028666 [ 480914]
CRC-32: 2034011 2033842 2035153 2033904 2034696| 2033842 [ 472854]
FNV1A_Jester: 2050708 2050776 2050694 2050668 2051040| 2050668 [ 689339]
K&R: 2064544 2064406 2065423 2065249 2065812| 2064406 [ 474011]
Murmur2: 2066808 2066666 2066382 2066497 2066478| 2066382 [ 476330]
Alfalfa: 2068149 2067824 2068182 2068081 2067452| 2067452 [ 475434]
Alfalfa_HALF: 2071077 2071411 2071349 2070671 2071446| 2070671 [ 480071]
Alfalfa_DWORD: 2081631 2081774 2081485 2081846 2081789| 2081485 [ 475434]
FNV1A_Peregrine: 2098860 2098795 2098753 2098718 2098887| 2098718 [ 546915]
x17 unrolled: 2112134 2114144 2112556 2112551 2112255| 2112134 [ 475528]
Paul Larson: 2121042 2121087 2120977 2120391 2121282| 2120391 [ 475575]
FNV1A_Whiz: 2127393 2127963 2127276 2126858 2127638| 2126858 [ 689339]
SBox: 2137206 2137208 2137045 2136962 2136563| 2136563 [ 476681]
Sedgewick: 2137897 2138151 2137352 2138251 2137971| 2137352 [ 477931]
Bernstein: 2165058 2164180 2164633 2164803 2164633| 2164180 [ 474048]
FNV-1a: 2169036 2169173 2168907 2168979 2168931| 2168907 [ 477067]
FNV1A_Nefertiti: 2184810 2184848 2184522 2184738 2185078| 2184522 [ 763451]
Alfalfa_QWORD: 2190122 2189991 2189488 2189910 2189894| 2189488 [ 475434]
Paul Hsieh: 2196409 2195461 2195511 2195236 2195676| 2195236 [ 543835]
Weinberger: 2203173 2203305 2202832 2202739 2202064| 2202064 [ 1159267]
Sixtinsensitive: 2225863 2225928 2226657 2226719 2226205| 2225863 [ 582793]
Arash Partow: 2241218 2241227 2241293 2240949 2240652| 2240652 [ 478246]
lookup3: 2246638 2246955 2247440 2247833 2247080| 2246638 [ 476566]
MaPrime2c: 2250515 2250361 2251148 2251678 2252040| 2250361 [ 477151]
Ramakrishna: 2268889 2268536 2268740 2268861 2268980| 2268536 [ 476020]
Sixtinsensitive+: 2272421 2272439 2272689 2272420 2272411| 2272411 [ 716367]
x65599: 2341902 2342241 2341970 2341930 2341731| 2341731 [ 654463]
One At Time: 2360163 2360112 2360203 2359913 2359343| 2359343 [ 477667]
Fletcher: 24424563 24401091 24400993 24395252 24394052| 24394052 [ 2856890]
Intel Atom N450 1.66GHz 512KB L2 Cache:
8388608 elements in the table (23 bits)
2995394 lines read
FNV1A_Meiyan: 3155958 3159537 3157158 3158773 3156719| 3155958 [ 593723]
Weinberger: 3248750 3246228 3248741 3249811 3243717| 3243717 [ 1159267]
FNV1A_Jesteress: 3253953 3256344 3264163 3254169 3254820| 3253953 [ 691369]
FNV1A_Mantis: 3314496 3281163 3284368 3283819 3281818| 3281163 [ 481137]
CRC-32: 3343937 3345737 3356535 3347699 3345348| 3343937 [ 472854]
FNV1A_Peregrine: 3376694 3378179 3386379 3379572 3377683| 3376694 [ 546915]
Alfalfa_HALF: 3379134 3382696 3380707 3377835 3392481| 3377835 [ 480071]
FNV1A_Smaragd: 3379196 3382352 3380635 3379201 3379668| 3379196 [ 480914]
lookup3: 3424892 3430226 3426356 3421977 3426662| 3421977 [ 476566]
Murmur2: 3430503 3430565 3429001 3429150 3427817| 3427817 [ 476330]
Hanson: 3433796 3434865 3435452 3435932 3432672| 3432672 [ 534251]
Paul Hsieh: 3435899 3433042 3435868 3436050 3436606| 3433042 [ 543835]
Alfalfa_Rollick: 3465457 3459388 3456972 3458956 3458066| 3456972 [ 604098]
Bernstein: 3487977 3491575 3494922 3486510 3487834| 3486510 [ 474048]
FNV1A_Jester: 3504276 3504202 3510988 3505340 3503647| 3503647 [ 689339]
Sixtinsensitive: 3515471 3519793 3523615 3517541 3519258| 3515471 [ 582793]
FNV1A_Whiz: 3533310 3534248 3531524 3534347 3531857| 3531524 [ 689339]
x17 unrolled: 3542761 3543107 3541781 3540489 3542713| 3540489 [ 475528]
Arash Partow: 3542980 3542590 3542073 3542242 3545691| 3542073 [ 478246]
Sixtinsensitive+: 3546786 3546044 3546339 3546185 3544348| 3544348 [ 716367]
K&R: 3564368 3562589 3565386 3565243 3563323| 3562589 [ 474011]
FNV-1a: 3576419 3588693 3579410 3576049 3579704| 3576049 [ 477067]
Novak unrolled: 3595877 3582905 3580473 3580509 3582652| 3580473 [ 657377]
Alfalfa: 3583829 3584782 3582429 3584607 3581127| 3581127 [ 475434]
FNV1A_Nefertiti: 3593590 3588775 3591870 3593603 3590078| 3588775 [ 763451]
Alfalfa_DWORD: 3622713 3620781 3622630 3622695 3626486| 3620781 [ 475434]
Ramakrishna: 3651378 3652073 3652729 3652338 3651156| 3651156 [ 476020]
Sedgewick: 3661915 3656998 3656072 3656996 3662433| 3656072 [ 477931]
Paul Larson: 3706139 3694095 3692079 3694297 3694153| 3692079 [ 475575]
Alfalfa_QWORD: 3696531 3702283 3701765 3703357 3699967| 3696531 [ 475434]
SBox: 3785574 3779069 3780873 3782789 3785080| 3779069 [ 476681]
One At Time: 3955247 3954321 3955469 3963892 3953817| 3953817 [ 477667]
x65599: 3998490 4006600 3997098 3994519 3996586| 3994519 [ 654463]
MaPrime2c: 4191055 4193915 4193048 4204844 4193416| 4191055 [ 477151]
Fletcher: 77210862 77226344 77294433 77235843 77205175| 77205175 [ 2856890]
My wish is to cover all meaningful (at least for LZ) lengths, that is 3..66 bytes, but a different approach is needed because of the HUGE size of the dataset:
Speaking of the very precious OSHO.TXT (precious for its English usage and original thoughts, with hundreds of books included), I propose one simple way of achieving Building-Blocks hashing:
loading the 197MB file itself and hashing chunks of 3..66 bytes at each position (i.e. with a one-byte increment).
Another thing I want to share regarding collision managing:
approaches (rehashing, chains, ...) without definite goals, i.e. without context, are like kata (detailed choreographed patterns of movements), whereas the real fight is an extension/mix of kata and techniques with complex timing, which includes awareness of the timing of outer things, not just your own. I mean, if enough resources (free RAM) are given, not exploiting them while talking about speed as this-and-that is a dead end.
For example, I tested (now commented out) an FNV1A variant hash function in Leprechaun which outperforms (hash time + lookup time) FNV1A_Jesteress by (1,8??,???-1,6??,???)/1,6??,???*100% = 12.5%, but that is entirely due to the B-tree used as collision manager at the final stage.
This very hasher does not perform well when other techniques are used, though.
The point is, speed is something beyond all limitations imposed, it must be chased for each niche relentlessly.
One real-world phenomenon is Mr. Bolt: his fantastic technique is constantly being improved, or, as he says in one interview, he and his trainer are working on making one of the fastest starts in 100M races even faster. It is just amazing for the tallest sprinter to have one of the most explosive starts as well. And even more amazing is the will for improvement.
I have a sambo practitioner buddy who said about Bolt's 100/200M records: "What technique? It is just left-right left-right!"
Of course I disagreed. Neglecting the basic/fundamental stances leads to nasty future slips (a kind of 'Oh! What happened?', i.e. a lack of deeper understanding and technique improvement).
I fully agree with:
http://cbloomrants.blogspot.com/2010/11/11-29-10-useless-hash-test.html
Looking at my 3chars..12chars Building-Block test, I see a strong candidate for a future ultimate testbed.
Hi Peter,
the day before yesterday a documentary on the History Channel about Nikola Tesla (an outstanding man, not only a pragmatic visionary) inspired me to tune an almost forgotten hasher.
Here comes FNV1A_Tesla: a suitable hasher for keys [much] longer than 15 bytes (the case of 3+grams phrases).
That is the very hash function I was talking about previously (the one with the 64bit x 32bit -> 32bit multiplication and loss of carry).
In all the tests below, FNV1A_Tesla outspeeds all my FNV1A variants.
Surprisingly, the bad collision rate doesn't affect its speed; I have been struck [again] by the fact that the brutal loss of data doesn't hurt the lookup (when the keys are not ONLY of weak-range lengths).
Here the speed/dispersion trade-off was made in favor of speed of course.
The function itself:
//#define ROL(x, n) (((x) << (n)) | ((x) >> (32-(n))))
UINT FNV1A_Hash_Tesla(const char *str, SIZE_T wrdlen)
{
    const UINT PRIME = 709607;
    UINT hash32 = 2166136261;
    //unsigned long long hash64 = 2166136261; // Change with a bigger one!
    const char *p = str;
    //unsigned long long QWORD1,QWORD2; //64bit=QWORD

    for(; wrdlen >= 2*2*sizeof(DWORD); wrdlen -= 2*2*sizeof(DWORD), p += 2*2*sizeof(DWORD)) {
        hash32 = (hash32 ^ (ROL(*(unsigned long long *)(p+0),5-0)^*(unsigned long long *)(p+8))) * PRIME; // loss of carry!
        //hash64 = (hash64 ^ (ROL(QWORD1,5-0)^QWORD2)) * PRIME;
        //hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME;
        //hash32 = (hash32 ^ (ROL(*(DWORD *)(p+8),5-0)^*(DWORD *)(p+12))) * PRIME;
    }
    //hash32 = hash64 ^ (hash64 >> 32);

    // Cases: 0,1,2,3,4,5,6,7,... 15
    if (wrdlen & (2*sizeof(DWORD))) {
        hash32 = (hash32 ^ (ROL(*(DWORD *)p,5-0)^*(DWORD *)(p+4))) * PRIME;
        //hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
        //hash32 = (hash32 ^ *(DWORD*)(p+4)) * PRIME;
        p += 2*sizeof(DWORD);
    }
    if (wrdlen & sizeof(DWORD)) {
        hash32 = (hash32 ^ *(DWORD*)p) * PRIME;
        p += sizeof(DWORD);
    }
    if (wrdlen & sizeof(WORD)) {
        hash32 = (hash32 ^ *(WORD*)p) * PRIME;
        p += sizeof(WORD);
    }
    if (wrdlen & 1)
        hash32 = (hash32 ^ *p) * PRIME;

    return hash32 ^ (hash32 >> 16);
}
I am curious what amendments can be made. It is revision 1.
My 64bit knowledge/experience is next to nothing, so it would be nice if somebody refined it, especially for 64bit compilers.
Some tests (on my Intel Merom 2.16GHz, Windows XP 32bit, VS2008 32bit compiler):
Volume in drive D is H320_Vol5
Volume Serial Number is 0CB3-C881
Directory of D:\_KAZE_new-stuff\VivaNicolaTesla 03/16/2011 07:54 AM <DIR> .. 03/16/2011 07:54 AM <DIR> . 03/16/2011 07:39 AM 218,698 hash.cod 03/16/2011 07:39 AM 65,440 hash.cpp 03/16/2011 07:39 AM 87,552 hash.exe 03/16/2011 07:39 AM 8,390 BuildLog.htm 11/14/2010 02:39 PM 7,000,453 Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv 12/03/2010 07:30 AM 42,892,307 IPS.TXT 11/14/2010 02:39 PM 4,347,243 Sentence-list_00,032,359_English_The_Holy_Bible.txt 03/15/2011 12:10 PM 104,857,601 100MB_as_one_line.TXT 03/16/2011 07:54 AM 409,829,386 googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt 11/14/2010 02:39 PM 4,024,146 Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd 11/14/2010 02:39 PM 388,308 Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd 11/14/2010 02:39 PM 146,973,879 Word-list_12,561,874_wikipedia-en-html.tar.wrd 11/14/2010 02:39 PM 278,013,406 Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd 13 File(s) 998,706,809 bytes 2 Dir(s) 2,947,858,432 bytes free D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_22,202,980_wikipedia-de-en-es-fr-it-nl-pt-ro-html.tar.wrd"22202980 lines read
67108864 elements in the table (26 bits)
FNV1A_Hash_Tesla: 23890797 23614936 23680579 23684698 23606344| 23606344 [ 3457538] FNV1A_Mantis: 24848068 24866965 24863372 25003541 24859437| 24848068 [ 3298270] FNV1A_Meiyan: 23836095 23832986 23858019 23818340 23992756| 23818340 [ 3345260] FNV1A_Jesteress: 23737579 23756975 23731544 23743013 23743112| 23731544 [ 3355676] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_12,561,874_wikipedia-en-html.tar.wrd"12561874 lines read
33554432 elements in the table (25 bits)
FNV1A_Hash_Tesla: 12464491 12317094 12331733 12323774 12346488| 12317094 [ 2141464] FNV1A_Mantis: 12946943 12933482 12932442 12942204 13009030| 12932442 [ 2082213] FNV1A_Meiyan: 12583824 12417170 12440549 12465760 12441958| 12417170 [ 2111271] FNV1A_Jesteress: 12388725 12369952 12378952 12377230 12380569| 12369952 [ 2121868] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,351,114_English_Spell-Check_Unknown-Quality.wrd"351114 lines read
1048576 elements in the table (20 bits)
FNV1A_Hash_Tesla: 252801 238573 234905 233660 237413| 233660 [ 53107] FNV1A_Mantis: 253479 251841 249576 250135 254282| 249576 [ 52712] FNV1A_Meiyan: 234582 238797 235336 239111 238004| 234582 [ 52910] FNV1A_Jesteress: 235985 236268 234515 236577 234398| 234398 [ 52684] FNV1A_Jester: 236458 237823^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd"38936 lines read
131072 elements in the table (17 bits)
FNV1A_Hash_Tesla: 9787 9429 9419 9397 9567| 9397 [ 5176] FNV1A_Mantis: 10181 10205 10302 12326 11283| 10181 [ 5185] FNV1A_Meiyan: 9591 9563 9530 9493 9603| 9493 [ 5224] FNV1A_Jesteress: 15021 10163 9524 9533 9476| 9476 [ 5182] FNV1A_Jester: 9637 9482 9586 9580 9588| 9482 [ 5200] FNV1A_Smaragd: 11148 10041 10030 10048 10077| 10030 [ 5194] FNV1A_Peregrine: 10117 10179 9974 9968 10116| 9968 [ 5277] FNV1A_Whiz: 9616 9877 9679 9614 10118| 9614 [ 5200] FNV1A_Nefertiti: 10428 9939 9870 9925 10214| 9870 [ 5381] FNV-1a: 11328 11370 11388 11335 11216| 11216 [ 5321] Sixtinsensitive+: 10182 10274 9987 10234 10065| 9987 [ 5209] Sixtinsensitive: 10847 10800 10723 12666 10626| 10626 [ 5347] Alfalfa_Rollick: 10855 10116 10006 10077 10019| 10006 [ 5242] Alfalfa: 10422 10409 10427 10586 10946| 10409 [ 5252] Alfalfa_HALF: 10676 10529 10602 10584 10553| 10529 [ 5231] Alfalfa_DWORD: 10679 10883 11059 11093 11386| 10679 [ 5252] Alfalfa_QWORD: 10759 10692 10777 10792 10745| 10692 [ 5252] Bernstein: 11311^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Word-list_00,584,879_Russian_Spell-Check_Unknown-Quality.slv"584879 lines read
2097152 elements in the table (21 bits)
FNV1A_Hash_Tesla: 393365 373464 374534 375388 373557| 373464 [ 81232] FNV1A_Mantis: 412156 414563 411944 414678 415841| 411944 [ 74643] FNV1A_Meiyan: 383821 387854 387760 390291 387271| 383821 [ 75377] FNV1A_Jesteress: 382663 383224 381564 380962 382974| 380962 [ 75404] FNV1A_Jester: 384581^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash "Sentence-list_00,032,359_English_The_Holy_Bible.txt"32359 lines read
65536 elements in the table (16 bits)
FNV1A_Hash_Tesla: 27520 26722 27432 28257 26136| 26136 [ 6937] FNV1A_Mantis: 28386 28415 28554 28248 27951| 27951 [ 6925] FNV1A_Meiyan: 27598 27359 27334 27319 27338| 27319 [ 6897] FNV1A_Jesteress: 27888 27308 27483 27310 27274| 27274 [ 6883] FNV1A_Jester: 31040 31119 31379 30732 30596| 30596 [ 6874] FNV1A_Smaragd: 40554 40719 42560 40805 40459| 40459 [ 6849] FNV1A_Peregrine: 30687 30474 30474 31499 32991| 30474 [ 6838] FNV1A_Whiz: 32561 31988 31396 31424 31415| 31396 [ 6874] FNV1A_Nefertiti: 30534 30860 30331 30011 30143| 30011 [ 6878] FNV-1a: 60627 60952 61696 61693 61678| 60627 [ 6840] Sixtinsensitive+: 35231 35090 35459 35186 37219| 35090 [ 6839] Sixtinsensitive: 38508^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash 100MB_as_one_line.TXT1 lines read
4 elements in the table (2 bits)
FNV1A_Hash_Tesla: 194019 197848 195058 193316 193382| 193316 [ 0] FNV1A_Mantis: 234425 236411 236747 234573 234684| 234425 [ 0] FNV1A_Meiyan: 240160 240099 242438 240194 242915| 240099 [ 0] FNV1A_Jesteress: 243338 241873 239935 239735 239124| 239124 [ 0] FNV1A_Jester: 331840^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash IPS.TXT2995394 lines read
8388608 elements in the table (23 bits)
FNV1A_Hash_Tesla: 2289107 2226568 2226390 2234596 2222701| 2222701 [ 691369] FNV1A_Mantis: 2469897 2466911 2466878 2471249 2465728| 2465728 [ 481137] FNV1A_Meiyan: 2290118 2285787 2284061 2291056 2286210| 2284061 [ 593723] FNV1A_Jesteress: 2331767 2324661 2327224 2326173 2325028| 2324661 [ 691369] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>hash googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt17981107 lines read
67108864 elements in the table (26 bits)
FNV1A_Hash_Tesla: 19108584 18921663 19047468 18902535 18913730| 18902535 [ 4218589] FNV1A_Mantis: 20408048 20413041 20428172 20421550 20570044| 20408048 [ 2208686] FNV1A_Meiyan: 19589642 19586573 19584081 19590689 19588464| 19584081 [ 2209364] FNV1A_Jesteress: 19482570 19662021 19476235 19490673 19499351| 19476235 [ 2208081] FNV1A_Jester: ^C D:\_KAZE_new-stuff\VivaNicolaTesla>type googlebooks-eng-us-all-4gram-20090715-graffith_A_distinct.txt ... a_bacillus_and_a a_bacillus_belonging_to a_bacillus_closely_related a_bacillus_closely_resembling a_bacillus_described_by a_bacillus_discovered_by a_bacillus_found_in a_bacillus_from_the a_bacillus_has_been a_bacillus_identical_with a_bacillus_in_the a_bacillus_isolated_from a_bacillus_known_as a_bacillus_obtained_from a_bacillus_of_the a_bacillus_or_a a_bacillus_resembling_that a_bacillus_resembling_the a_bacillus_similar_to a_bacillus_that_is a_bacillus_to_which a_bacillus_which_has a_bacillus_which_he a_bacillus_which_is a_bacillus_which_may a_bacillus_which_they a_bacillus_which_was a_bacillus_whose_growth a_bacillus_with_rounded a_back_alley_and a_back_alley_behind a_back_alley_in a_back_alley_of a_back_alley_off a_back_alley_or a_back_alley_somewhere a_back_alley_that a_back_alley_to a_back_alley_where a_back_alley_with ... D:\_KAZE_new-stuff\VivaNicolaTesla>Regards
I am hunting for an extremely fast integer->integer hashing method for working with a large array of hashtables, in particular, where there are a high number of key (re)inserts, key (re)deletions, and value (re)updates within each hashtable, as large volumes of data are processed. Currently writing in C, but open to inlining assembly if it offers nice gains.
By the will of the hash(ing) gods ... show me the way!
Testing avalanche on integer hash functions
http://baagoe.org/en/wiki/Avalanche_on_integer_hash_functions
I'd love to use some hashes in PHP but not have to enable or install an extension to do so, naturally speed and efficiency would suffer, but the "portability" of the code makes the trade off worth it for my needs. I'd love to have lookup3/SuperFastHash ported to a php function, even One-At-A-Time would be great!
How about adding some CityHash hashes to the comparison?
Thanks Peter! I've found your blog extremely useful. Your discussion with @Sanmayce led me to your blog.
Keep up a good job!
Cuckoo hashing: neither linear probing nor chaining
http://en.wikipedia.org/wiki/Cuckoo_hashing
I found the following to be an interesting hash function
http://code.google.com/p/xxhash/
Additional info can be found here:
http://fastcompression.blogspot.ca/2012/04/selecting-checksum-algorithm.html
I concur with the general conclusion that Adler-32 should not be used as a hash function. It "fills up" with information from the input very slowly, which is definitely not what you want in a hash function, especially when used on short strings.
However the observation above: "The second problem is that the characters are not "weighted" (multiplied by different numbers), so that Adler-32("01") = Adler-32("10"), that's why it fails the Numbers test. Ditto for anagrams in Shakespeare's sonnets: Adler-32("heart") = Adler-32("earth")." is not correct. The Adler-32 of "01" is 0x00930062, whereas the Adler-32 of "10" is 0x00940062. Similarly, the Adler-32 of "heart" is 0x061c0215, and the Adler-32 of "earth" is 0x06280215.
The leading zeros in each half-word of those last two are indicative of the slow filling that I referred to.
I'm sorry, I've corrected the mistake in the article and added a link to your comment. The checksum actually has no collisions for the numbers, but when the values are reduced modulo the hash table size, they collide. For example, if the hash table size is 2^16:
Adler-32("01") = 0x00930062
Adler-32("10") = 0x00940062
Adler-32("01") mod 2^16 = 0x0062
Adler-32("10") mod 2^16 = 0x0062
Adler-32("heart") = 0x061c0215
Adler-32("earth") = 0x06280215
Adler-32("heart") mod 2^16 = 0x0215
Adler-32("earth") mod 2^16 = 0x0215
It helps if you XOR the lower and the higher part, but there are still a lot of collisions:
As you said, the function should be used as a checksum, not as a hash function.
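The values quoted above can be reproduced with a minimal sketch. It assumes the standard Adler-32 definition (two running sums modulo 65521); the helper names reduce_low16 and reduce_fold16 are made up for this illustration and are not taken from the benchmark code.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32: sum a of the bytes (starting at 1) and sum b of the
   intermediate values of a, both modulo 65521. */
static uint32_t adler32(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Plain reduction to a 2^16-slot table: keeps only the low half (a),
   which is just an unweighted byte sum, so anagrams collide. */
static uint32_t reduce_low16(uint32_t h)
{
    return h & 0xFFFFu;
}

/* XOR-folding (hypothetical helper): mixes the high half into the
   low half before truncating. */
static uint32_t reduce_fold16(uint32_t h)
{
    return (h ^ (h >> 16)) & 0xFFFFu;
}
```

With these helpers, "01" and "10" both reduce to 0x0062 under reduce_low16, while reduce_fold16 separates them; this shows why folding helps on this pair, but says nothing about the collisions that remain on larger sets.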
Hi Peter,
just wanted to see how the fastest (regarding linear speed) hasher FNV1A_Tesla (64bit) rewritten down to 32bit would behave, so here comes my new favorite FNV1A_Yorikke - the fastest 32bit hasheress.
She outspeeds both FNV1A_Jesteress and FNV1A_Meiyan featuring collisions comparable to CRC32.
I wonder how the new approach (hashing two lines) behaves on i5/i7, my expectations are that FNV1A_Yorikke is gonna scream.
I hate the fact that I still cannot play with an i7 machine, so the following results are obtained on my laptop T7500 2200MHz:
500 lines read
1024 elements in the table (10 bits)
Jesteress: 76 [ 110]
Meiyan: 76 [ 102]
Yorikke: 76 [ 108]
x17 unrolled: 88 [ 109]
FNV-1a: 92 [ 124]
Larson: 93 [ 99]
CRC-32: 85 [ 101]
Murmur2: 86 [ 103]
SBox: 81 [ 91]
Murmur2A: 95 [ 114]
Murmur3: 100 [ 101]
XXHfast32: 97 [ 110]
XXHstrong32: 96 [ 109]
Win32
1992 lines read
4096 elements in the table (12 bits)
Jesteress: 401 [ 397]
Meiyan: 405 [ 409]
Yorikke: 403 [ 431]
x17 unrolled: 577 [ 415]
FNV-1a: 576 [ 428]
Larson: 565 [ 416]
CRC-32: 527 [ 426]
Murmur2: 467 [ 415]
SBox: 482 [ 431]
Murmur2A: 515 [ 433]
Murmur3: 517 [ 380]
XXHfast32: 473 [ 420]
XXHstrong32: 488 [ 429]
Numbers
500 lines read
1024 elements in the table (10 bits)
Jesteress: 51 [ 300]
Meiyan: 47 [ 125]
Yorikke: 46 [ 86]
x17 unrolled: 45 [ 24]
FNV-1a: 53 [ 108]
Larson: 41 [ 16]
CRC-32: 45 [ 64]
Murmur2: 53 [ 104]
SBox: 50 [ 116]
Murmur2A: 62 [ 102]
Murmur3: 67 [ 104]
XXHfast32: 67 [ 102]
XXHstrong32: 70 [ 102]
Prefix
500 lines read
1024 elements in the table (10 bits)
Jesteress: 110 [ 102]
Meiyan: 111 [ 106]
Yorikke: 107 [ 94]
x17 unrolled: 206 [ 113]
FNV-1a: 195 [ 94]
Larson: 195 [ 99]
CRC-32: 177 [ 107]
Murmur2: 138 [ 106]
SBox: 149 [ 108]
Murmur2A: 149 [ 112]
Murmur3: 148 [ 103]
XXHfast32: 117 [ 103]
XXHstrong32: 123 [ 102]
Postfix
500 lines read
1024 elements in the table (10 bits)
Jesteress: 108 [ 106]
Meiyan: 108 [ 112]
Yorikke: 106 [ 111]
x17 unrolled: 201 [ 102]
FNV-1a: 195 [ 105]
Larson: 195 [ 105]
CRC-32: 174 [ 94]
Murmur2: 136 [ 111]
SBox: 148 [ 91]
Murmur2A: 146 [ 109]
Murmur3: 148 [ 105]
XXHfast32: 115 [ 106]
XXHstrong32: 123 [ 112]
Variables
1842 lines read
4096 elements in the table (12 bits)
Jesteress: 337 [ 366]
Meiyan: 341 [ 350]
Yorikke: 338 [ 359]
x17 unrolled: 418 [ 368]
FNV-1a: 418 [ 374]
Larson: 429 [ 366]
CRC-32: 400 [ 338]
Murmur2: 384 [ 383]
SBox: 371 [ 347]
Murmur2A: 424 [ 365]
Murmur3: 433 [ 334]
XXHfast32: 406 [ 347]
XXHstrong32: 405 [ 355]
Sonnets
3228 lines read
8192 elements in the table (13 bits)
Jesteress: 494 [ 585]
Meiyan: 501 [ 588]
Yorikke: 496 [ 552]
x17 unrolled: 565 [ 589]
FNV-1a: 558 [ 555]
Larson: 593 [ 583]
CRC-32: 560 [ 563]
Murmur2: 562 [ 566]
SBox: 516 [ 526]
Murmur2A: 627 [ 544]
Murmur3: 651 [ 555]
XXHfast32: 602 [ 491]
XXHstrong32: 598 [ 491]
UTF-8
13408 lines read
32768 elements in the table (15 bits)
Jesteress: 2391 [ 2427]
Meiyan: 2445 [ 2377]
Yorikke: 2427 [ 2392]
x17 unrolled: 2786 [ 2392]
FNV-1a: 2860 [ 2446]
Larson: 2979 [ 2447]
CRC-32: 2770 [ 2400]
Murmur2: 2724 [ 2399]
SBox: 2640 [ 2442]
Murmur2A: 3037 [ 2369]
Murmur3: 3100 [ 2376]
XXHfast32: 2946 [ 2494]
XXHstrong32: 2936 [ 2496]
IPv4
3925 lines read
8192 elements in the table (13 bits)
Jesteress: 576 [ 819]
Meiyan: 590 [ 807]
Yorikke: 588 [ 821]
x17 unrolled: 796 [ 804]
FNV-1a: 855 [ 796]
Larson: 817 [ 789]
CRC-32: 787 [ 802]
Murmur2: 698 [ 825]
SBox: 722 [ 804]
Murmur2A: 762 [ 804]
Murmur3: 776 [ 818]
XXHfast32: 789 [ 829]
XXHstrong32: 809 [ 829]
3333 Latin Powers
3333 lines read
8192 elements in the table (13 bits)
Jesteress: 763 [ 576]
Meiyan: 770 [ 583]
Yorikke: 779 [ 579]
x17 unrolled: 1345 [ 564]
FNV-1a: 1299 [ 604]
Larson: 1301 [ 581]
CRC-32: 1192 [ 613]
Murmur2: 956 [ 600]
SBox: 996 [ 576]
Murmur2A: 1023 [ 576]
Murmur3: 1032 [ 583]
XXHfast32: 843 [ 596]
XXHstrong32: 882 [ 571]
~3 million IPs (dot format)
2995394 lines read
8388608 elements in the table (23 bits)
Jesteress: 2027663 [691369]
Meiyan: 2033983 [593723]
Yorikke: 1952199 [476699]
x17 unrolled: 2357193 [475528]
FNV-1a: 2410596 [477067]
Larson: 2369252 [475575]
CRC-32: 2298651 [472854]
Murmur2: 2298675 [476330]
SBox: 2412474 [476681]
Murmur2A: 2376168 [475493]
Murmur3: 2346091 [476845]
XXHfast32: 2365397 [476358]
XXHstrong32: 2372267 [476358]
Russian ASCII
584879 lines read
2097152 elements in the table (21 bits)
Jesteress: 322585 [75404]
Meiyan: 325962 [75377]
Yorikke: 324935 [74661]
x17 unrolled: 311773 [75124]
FNV-1a: 370532 [74184]
Larson: 327605 [74389]
CRC-32: 360723 [74307]
Murmur2: 359362 [74234]
SBox: 368927 [74645]
Murmur2A: 375582 [74456]
Murmur3: 371407 [74612]
XXHfast32: 370156 [74572]
XXHstrong32: 371014 [74603]
Wikipedia en
12561874 lines read
33554432 elements in the table (25 bits)
Jesteress: 10606801 [2121868]
Meiyan: 10691456 [2111271]
Yorikke: 10710077 [2084954]
x17 unrolled: 10336797 [2410605]
FNV-1a: 11551149 [2081195]
Larson: 10837339 [2080111]
CRC-32: 11464031 [2075088]
Murmur2: 11379472 [2081476]
SBox: 11530201 [2084018]
Murmur2A: 11762919 [2081370]
Murmur3: 11708730 [2082084]
XXHfast32: 11576405 [2084164]
XXHstrong32: 11570909 [2084514]
Wikipedia de-en-es-fr-it-nl-pt-ro
22202980 lines read
67108864 elements in the table (26 bits)
Jesteress: 19823910 [3355676]
Meiyan: 19998537 [3345260]
Yorikke: 20036224 [3300245]
x17 unrolled: 19252660 [3830652]
FNV-1a: 21626497 [3297552]
Larson: 20162419 [3296692]
CRC-32: 21396006 [3298998]
Murmur2: 21249648 [3297709]
SBox: 21487107 [3298021]
Murmur2A: 21675364 [3300445]
Murmur3: 21264598 [3299700]
XXHfast32: 21028361 [3301160]
XXHstrong32: 21033394 [3302256]
100MB as one line
1 lines read
4 elements in the table (2 bits)
Jesteress: 198199 [ 0]
Meiyan: 198333 [ 0]
Yorikke: 176506 [ 0]
x17 unrolled: 953166 [ 0]
FNV-1a: 924509 [ 0]
Larson: 950111 [ 0]
CRC-32: 764957 [ 0]
Murmur2: 339978 [ 0]
SBox: 512374 [ 0]
Murmur2A: 339648 [ 0]
Murmur3: 303091 [ 0]
XXHfast32: 168528 [ 0]
XXHstrong32: 217354 [ 0]
5,000,000 Knight Tours
5000000 lines read
16777216 elements in the table (24 bits)
Jesteress: 5912178 [676877]
Meiyan: 5917649 [676877]
Yorikke: 5762697 [677478]
x17 unrolled: 136105460 [4868928]
FNV-1a: 13574050 [2080003]
Larson: 42509864 [4475748]
CRC-32: 9373072 [676997]
Murmur2: 6755138 [675965]
SBox: 11414715 [2079523]
Murmur2A: 6882708 [676417]
Murmur3: 6673330 [676857]
XXHfast32: 5864132 [675637]
XXHstrong32: 6154955 [675834]
In the last test we hash 5 million 128-byte-long lines (a kind of super-heavy-prefix test).
It resembles having 5 million very similar 128-char-long tweets that share one long prefix.
Obviously we need more versatile tests (like my 'Knight Tours') in order to find-and-fix possible weak points.
Here x17 unrolled, FNV-1a, Larson and SBox fail to keep up with the others.
In the next-to-last test (linear hash speed, 100MB as one line) XXHfast32 outspeeds FNV1A_Yorikke by (176506-168528)/168528*100% = 4.7%; is this also the case on i5/i7 machines?
In my view FNV1A_Yorikke has heart full of soul or "fine and clean like polished gold" (just as B.Traven describes her), yet, more torture is needed.
You all are welcome to download my (your benchmarker is used) latest (all above tests included) hash package:
If we had to hash some 5,000,000,000+ tweets, things would rapidly get ugly.
It would be useful to add more fault-finding tests, so that we can rely on some hasher for "real world" (i.e. heavy) loads.
Regards
Very informative analysis. Sincerely thank you Peter. Made my job of finding a good hash function so much easier with the research you have conducted in this topic.
Hi Peter,
Thanks for the article. It would be great if you ran your tests on Linux too. You will be surprised by the results.
For me the best hash function here is “Yorikke”.
Thank You Sanmayce!
Thank you Mr. Norton,
I wrote Knight-tour_r8dump_Yorikke.c (downloadable at link below) which hashes Knight-Tours on the fly in order to monitor the fattest slot(s).
This torture-test uses 27bit hash table i.e. 134,217,728 slots.
http://www.sanmayce.com/Fastest_Hash/index.html#KT_torture
As of last night the test had reached 550,000,000 Knight-Tours:
FNV1A_Yorikke has 005 slots with MAX_depthness 019 (number of x's).
CRC32 has 001 slot with MAX_depthness 021 (number of x's).
In other words MAX_depthness 019 means maximum 019 layers or maximum 019 keys sharing one slot.
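A minimal sketch of how such "fattest slot" figures could be gathered. This is not the code of Knight-tour_r8dump_Yorikke.c; DepthStats and fattest_slots are hypothetical names, and the sketch assumes the slot index of every key has already been computed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t max_depth;    /* largest number of keys sharing one slot */
    uint32_t slots_at_max; /* how many slots reach that depth */
} DepthStats;

/* slot_of_key[i] is the slot index (< num_slots) that key i hashes to.
   Returns {0, 0} if the counter array cannot be allocated. */
static DepthStats fattest_slots(const uint32_t *slot_of_key,
                                size_t num_keys, size_t num_slots)
{
    DepthStats st = {0, 0};
    uint32_t *depth = calloc(num_slots, sizeof *depth);
    if (depth == NULL)
        return st;

    /* Count how many keys land in each slot. */
    for (size_t i = 0; i < num_keys; ++i)
        depth[slot_of_key[i]]++;

    /* Find the deepest slot and count the slots that tie with it. */
    for (size_t s = 0; s < num_slots; ++s) {
        if (depth[s] > st.max_depth) {
            st.max_depth = depth[s];
            st.slots_at_max = 1;
        } else if (depth[s] == st.max_depth && st.max_depth > 0) {
            st.slots_at_max++;
        }
    }
    free(depth);
    return st;
}
```

In the terms used above, a result of max_depth 19 with slots_at_max 5 would correspond to "005 slots with MAX_depthness 019".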
I will wait until all the 5,000,000,000 Knight-Tours are hashed, maybe a week more.
This is the heaviest torture-hash-test, no?
Also I am very glad of Mr. Noll's retweet:
Dear Sanmayce,
I would like to thank you for your intention to share your amazing real McCoy hash functions!
When you finish the test, I hope people will see the beauty of “Yorikke” as I do.
In our “hostile” 24/7 *nix environment the speed counts and “Yorikke” rocks!
It’s a treasure and state of the art work.
Thanks again and all the best!
Some idea...
unsigned long maFastPrime1Hash(char *str, unsigned int len)
{
    unsigned int hash = len, i = 0, t, k;
    long rem = len;
    unsigned char trail;
    const unsigned char * data = (const unsigned char *)str;

    while (rem >= 4) {
        k = *(unsigned int*)data;
        k += i++;
        hash ^= k;
        hash *= 171717;
        data += 4;
        rem -= 4;
    }
    /* Note: the condition rem >= 0 also processes the byte at data[len],
       which is the terminating NUL when str is a C string. */
    while (rem >= 0) {
        trail = *(unsigned char*)data;
        trail += i++;
        hash ^= trail;
        hash *= 171717;
        data++;
        rem--;
    }
    return hash;
}
by Alexander Myasnikov
An idea to speed it up (at the cost of a few more collisions):
unsigned long maRushPrime1Hash(char *str, unsigned int len)
{
    unsigned int hash = len, i = 0, k;
    long rem = len;
    const unsigned char * data = (const unsigned char *)str;

    while (rem >= 4) {
        k = *(unsigned int*)data;
        k += i++;
        hash ^= k;
        hash *= 171717;
        data += 4;
        rem -= 4;
    }
    /* Pack the 1..3 trailing bytes into one word and mix it once. */
    switch (rem) {
    case 3:
        k = (unsigned long)(data[0]) | (unsigned long)(data[1] << 8) | (unsigned long)(data[2] << 16);
        k += i++;
        hash ^= k;
        hash *= 171717;
        break;
    case 2:
        k = (unsigned long)(data[0]) | (unsigned long)(data[1] << 8);
        k += i++;
        hash ^= k;
        hash *= 171717;
        break;
    case 1:
        k = (unsigned long)(data[0]);
        k += i++;
        hash ^= k;
        hash *= 171717;
        break;
    }
    return hash;
}
by Alexander Myasnikov
amsoftware.narod.ru (amsoftware at ya.ru)
Unfortunately, the hash functions don't perform well on my benchmark. The results of "IPv4" and "Numbers" tests are especially bad:
Some collision and speed tests on different data sets (Russian)
http://amsoftware.narod.ru/algo2.html
xxHash has been updated recently :
http://code.google.com/p/xxhash/
Hi, I've created the following hash function for my own use, and found that with randomly generated strings I repeatedly get 0 collisions. I tested repeatedly with 10 million such random strings, always with 0 collisions, and also ran the same sets of strings through the djb function, which averaged more than 200 000 collisions.
I lack testing sources/techniques and would be very interested in the results if you would be so kind as to test it.
Here's the function:
unsigned long slash_hash(const std::string s)
{
    union {
        unsigned long t;
        unsigned char b[sizeof(long)];
    };
    unsigned long i=0, n=s.length(), p=0, d=sizeof(long);
    t=0L;
    while (i<n) {
        b[p++] += s[i] << (i/d);
        if (p>=d) p=0;
        i++;
    }
    return t;
}
PS to my previous post:
I'm using a 64-bit machine and sizeof(long) == 8 bytes
and I think the collisions will start in earnest with
strings that exceed 63 bytes in length (then (i/d)>=8)
and the shift s[i] may result in 0 - just a guess.
(sorry - I feel like a spammer;)
Here is a revised/improved version of the function:
uint64_t slash_hash(const char *s)
//uint32_t slash_hash(const char *s)
{
    union {
        uint64_t h;
        uint8_t u[8];
    };
    int i=0, p=0;
    h=0;
    while (*s) {
        u[p++%8] += i + (*s++ << (i/8) % 8);
        i++;
    }
    return h;
    //return (h+(h>>32));
}
I tested it against MurmurHash2 with the same (my) data-sets and the results were about the same. I would really appreciate a comparative test - the function, for its simplicity, seems to work remarkably well :)
It looks like SipHash is becoming the new standard for hashing short messages. I'm curious to see how it performs on your benchmarks.
https://131002.net/siphash/
A typical hash map in web apps is a JSON object. It has short attribute names and not a lot of entries.
Let's assume the average size of an identifier is 12 chars, and the number of entries is, say, in the range 16-32 (certainly less than 256).
Clearly, we need a hashing function optimized for this case.
In this regard, it would be interesting to see statistics for Pearson algo in your research.
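For reference, a minimal sketch of Pearson hashing, which yields an 8-bit value and so fits a table of fewer than 256 entries. The permutation table here is filled by a simple seeded shuffle; the shuffle, the seed parameter and the names pearson_init and pearson_hash are assumptions made for this illustration, and any permutation of 0..255 would do.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t pearson_table[256];

/* Fill the table with a permutation of 0..255 using a Fisher-Yates
   shuffle driven by a small linear congruential generator.
   (Illustrative choice only; it is not part of Pearson's definition.) */
static void pearson_init(uint32_t seed)
{
    for (int i = 0; i < 256; ++i)
        pearson_table[i] = (uint8_t)i;
    for (int i = 255; i > 0; --i) {
        seed = seed * 1664525u + 1013904223u;
        int j = (int)((seed >> 16) % (uint32_t)(i + 1));
        uint8_t tmp = pearson_table[i];
        pearson_table[i] = pearson_table[j];
        pearson_table[j] = tmp;
    }
}

/* Pearson hash: one table lookup per input byte, 8-bit result. */
static uint8_t pearson_hash(const char *key, size_t len)
{
    uint8_t h = 0;
    for (size_t i = 0; i < len; ++i)
        h = pearson_table[h ^ (uint8_t)key[i]];
    return h;
}
```

pearson_init must be called once before hashing. Because the table is a permutation, two keys of the same length that differ only in the last byte cannot collide; how the function ranks in the benchmark above is not something this sketch can answer.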
NOTE:
For short identifiers, the amortized cost of the last iteration's branch misprediction is very high, approx. 1 cycle/byte, so it would probably make sense
to have N specialized versions (for string lengths 1, 2, ... say, 24) and use an indirect jump (costs 5 cycles); this would translate to savings of 1 cycle per char,
comparable with the cost of the computation per char itself.
What do you think?
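The proposal can be sketched as a table of function pointers indexed by key length. The sketch below uses a plain FNV-1a step and only covers lengths 0..4 to stay short; the names (STEP, by_length, hash_dispatch, MAX_SPECIALIZED) are hypothetical, and whether this actually saves cycles is exactly the open question and would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u
/* One FNV-1a step: XOR in a byte, then multiply by the prime. */
#define STEP(h, c) (((h) ^ (uint8_t)(c)) * FNV_PRIME)

typedef uint32_t (*hash_fn)(const char *key, size_t len);

/* Generic version with a data-dependent loop exit. */
static uint32_t hash_generic(const char *k, size_t len)
{
    uint32_t h = FNV_OFFSET;
    for (size_t i = 0; i < len; ++i)
        h = STEP(h, k[i]);
    return h;
}

/* Fully unrolled versions: no loop, hence no loop-exit branch. */
static uint32_t hash_len0(const char *k, size_t len)
{ (void)k; (void)len; return FNV_OFFSET; }
static uint32_t hash_len1(const char *k, size_t len)
{ (void)len; return STEP(FNV_OFFSET, k[0]); }
static uint32_t hash_len2(const char *k, size_t len)
{ (void)len; return STEP(STEP(FNV_OFFSET, k[0]), k[1]); }
static uint32_t hash_len3(const char *k, size_t len)
{ (void)len; return STEP(STEP(STEP(FNV_OFFSET, k[0]), k[1]), k[2]); }
static uint32_t hash_len4(const char *k, size_t len)
{ (void)len; return STEP(STEP(STEP(STEP(FNV_OFFSET, k[0]), k[1]), k[2]), k[3]); }

#define MAX_SPECIALIZED 4
static const hash_fn by_length[MAX_SPECIALIZED + 1] = {
    hash_len0, hash_len1, hash_len2, hash_len3, hash_len4
};

/* Indirect call selected by length; longer keys use the generic loop. */
static uint32_t hash_dispatch(const char *k, size_t len)
{
    hash_fn f = (len <= MAX_SPECIALIZED) ? by_length[len] : hash_generic;
    return f(k, len);
}
```

Note that the range check on len is itself a branch, although a well-predicted one if nearly all keys are short.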
Choosing a Good Hash Function, Part 3
http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/
Hi guys,
I am happy to share my latest best.
A remainderless variant for 16[+] bytes keys appeared while I was playing with Yorikke and wanting to try an old idea of mine - to reduce the branching.
When the GOLDEN Yorikke was Interleaved&Interlaced a DIAMANT appeared - it's time for a new generation slasher: FNV1A_Yoshimura.
Up to now the TOP benchmark results I was able to gather:
On AMD (Phenom II X6 1600T, 4000MHz) FNV1A_YoshimitsuTRIAD reigns with 11.360MB per clock or 11360/1024 = 11.093GB/s.
It is worth attempting to explore the Jesteress-Yorikke 8-16 GAP (i.e. 1 hash line 4+4 vs 2 hash lines 4+4) in order to lessen their collisions further.
Simply put, 3 hash lines 4 bytes each, 12 bytes per loop.
The 'non power of 2' workaround I see as one MONOLITH function with no remainder mixing at all.
Idea #1 is to exterminate all the nasty IFs outside the main loop; I believe such a branchless etude will outperform Jesteress.
The idea #2 is to STRESS memory by fetching not-so-adjacent areas.
For example:
Key: hash_with_overlapping_aye_aye
Key_Length: 29
Loop #1 of 3:
Hash line 1: hash
Hash line 2: h_ov
Hash line 3: ping
Loop #2 of 3:
Hash line 1: _wit
Hash line 2: erla
Hash line 3: _aye
Loop #3 of 3:
Hash line 1: h_ov
Hash line 2: ppin
Hash line 3: _aye
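The chunk layout in this example can be reproduced with a short sketch. The offset formulas are assumptions chosen to match the 29-byte example (the loop count and the last line's offset mirror how Second_Line_Offset is derived in FNV1A_Yoshimura; placing the middle line at half that offset is a guess); Layout, triad_layout and triad_chunk are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t loops;   /* iterations of the main loop */
    size_t line[3]; /* starting offset of each hash line */
} Layout;

/* Compute where the three hash lines start. Assumes len >= 12. */
static Layout triad_layout(size_t len)
{
    Layout l;
    l.loops   = len / 12 + 1;        /* 29 bytes -> 3 loops */
    l.line[0] = 0;                   /* first line starts at the key start */
    l.line[2] = len - l.loops * 4;   /* last line ends at the key end */
    l.line[1] = l.line[2] / 2;       /* assumption: middle line halfway */
    return l;
}

/* Copy the 4-byte chunk that hash line `line` reads in iteration `loop`
   into out, NUL-terminated so it can be printed or compared. */
static void triad_chunk(const char *key, const Layout *l,
                        size_t loop, int line, char out[5])
{
    memcpy(out, key + l->line[line] + 4 * loop, 4);
    out[4] = '\0';
}
```

For the 29-byte key this gives line offsets 0, 8 and 17, so the three lines read bytes 0..11, 8..19 and 17..28: every byte is covered at least once, the overlaps absorb the remainder, and no separate tail handling is needed.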
I don't know the internals; whether the cache lines are 32/64/128 bytes long is a secondary concern.
Well, the key is too short; in reality the above key may span only 1|2 cache lines. If the key is longer than 4 cache lines (assuming 32 bytes each), e.g. 128+2 bytes, then it may span 5|6 lines.
My dummy/clueless premise is that it is possible (in future systems) to access RAM effectively in such a manner.
Does someone know whether this type of access has any practical value on today's CPUs?
Of course, it is worth trying to "interleave" in that way all the short string hashers, yes?
Anyway the 3 lines are in stack, for the time being let's see how 'INTERLEAVED' Yorikke, which I called FNV1A_Yoshimura, behaves.
// [North Star One-Sword School]
// - My name is Kanichiro Yoshimura.
// I'm a new man. Just so you'll know who I am...
// Saito-sensei.
// - What land are you from?
// - 'Land'?
// - Yes.
// - I was born in Morioka, in Nanbu, Oshu.
// It's a beautiful place.
// Please...
// Away to the south is Mt Hayachine...
// with Mt Nansho and Mt Azumane to the west.
// In the north are Mt Iwate and Mt Himekami.
// Out of the high mountains flows the Nakatsu River...
// through the castle town into the Kitakami below Sakuranobaba.
// Ah, it's pretty as a picture!
// There's nowhere like it in all Japan!
// /Paragon Kiichi Nakai in the paragon piece-of-art 'The Wolves of Mibu' aka 'WHEN THE LAST SWORD IS DRAWN'/
// As I said on one Japanese forum, Kiichi Nakai deserves an award worth his weight in gold, nah-nah, in DIAMONDS!
#include <stdint.h>
#include <stdlib.h> // _rotl (declared here by MSVC)
uint32_t FNV1A_Hash_Yoshimura(const char *str, uint32_t wrdlen)
{
    const uint32_t PRIME = 709607;
    uint32_t hash32 = 2166136261;
    uint32_t hash32B = 2166136261;
    const char *p = str;
    uint32_t Loop_Counter;
    uint32_t Second_Line_Offset;

    if (wrdlen >= 2*2*sizeof(uint32_t)) {
        // Keys of 16+ bytes: two 8-byte lines per iteration, the second one
        // shifted so that its last read ends exactly at the end of the key.
        Second_Line_Offset = wrdlen-((wrdlen>>4)+1)*(2*4); // ((wrdlen>>1)>>3)
        Loop_Counter = (wrdlen>>4);
        //if (wrdlen%16) Loop_Counter++;
        Loop_Counter++;
        for(; Loop_Counter; Loop_Counter--, p += 2*sizeof(uint32_t)) {
            // revision 1:
            //hash32 = (hash32 ^ (_rotl(*(uint32_t *)(p+0),5) ^ *(uint32_t *)(p+4))) * PRIME;
            //hash32B = (hash32B ^ (_rotl(*(uint32_t *)(p+0+Second_Line_Offset),5) ^ *(uint32_t *)(p+4+Second_Line_Offset))) * PRIME;
            // revision 2:
            hash32 = (hash32 ^ (_rotl(*(uint32_t *)(p+0),5) ^ *(uint32_t *)(p+0+Second_Line_Offset))) * PRIME;
            hash32B = (hash32B ^ (_rotl(*(uint32_t *)(p+4+Second_Line_Offset),5) ^ *(uint32_t *)(p+4))) * PRIME;
        }
    } else {
        // Cases: 0,1,2,3,4,5,6,7,...,15
        if (wrdlen & 2*sizeof(uint32_t)) {
            hash32 = (hash32 ^ *(uint32_t*)(p+0)) * PRIME;
            hash32B = (hash32B ^ *(uint32_t*)(p+4)) * PRIME;
            p += 4*sizeof(uint16_t);
        }
        // Cases: 0,1,2,3,4,5,6,7
        if (wrdlen & sizeof(uint32_t)) {
            hash32 = (hash32 ^ *(uint16_t*)(p+0)) * PRIME;
            hash32B = (hash32B ^ *(uint16_t*)(p+2)) * PRIME;
            p += 2*sizeof(uint16_t);
        }
        if (wrdlen & sizeof(uint16_t)) {
            hash32 = (hash32 ^ *(uint16_t*)p) * PRIME;
            p += sizeof(uint16_t);
        }
        if (wrdlen & 1)
            hash32 = (hash32 ^ *p) * PRIME;
    }
    hash32 = (hash32 ^ _rotl(hash32B,5) ) * PRIME;
    return hash32 ^ (hash32 >> 16);
}
To reproduce the quick test below, here is the package: http://www.sanmayce.com/Fastest_Hash/DOUBLOON_hash_micro-package_r2.zip
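The offset arithmetic in the 16+ byte branch of FNV1A_Hash_Yoshimura can be checked in isolation. The sketch below copies only the two formulas for Loop_Counter and Second_Line_Offset from the function; the helper names loop_count, second_line_offset and lines_cover_key are hypothetical. It tests that the two lines together touch every byte of the key and never read past its end.

```c
#include <stdint.h>

/* Same formulas as in the long-key branch of FNV1A_Hash_Yoshimura. */
static uint32_t loop_count(uint32_t wrdlen) {
    return (wrdlen >> 4) + 1;
}

static uint32_t second_line_offset(uint32_t wrdlen) {
    return wrdlen - loop_count(wrdlen) * (2 * 4);
}

/* Returns 1 if, for a key of wrdlen >= 16 bytes, the first line
   [0, span) and the second line [off, off + span) stay inside the key
   and leave no byte unread; returns 0 otherwise. */
static int lines_cover_key(uint32_t wrdlen) {
    if (wrdlen < 16)
        return 0;                                /* handled by the short-key branch */
    uint32_t span = loop_count(wrdlen) * 8;      /* bytes read by each line */
    uint32_t off  = second_line_offset(wrdlen);
    return span <= wrdlen          /* first line stays inside the key   */
        && off + span == wrdlen    /* second line ends at the last byte */
        && off <= span;            /* no gap between the two lines      */
}
```

For a 29-byte key this gives 2 iterations and a second-line offset of 13, so the lines read bytes 0..15 and 13..28. The three overlapping bytes are hashed twice, which is how the function gets away without a remainder step.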
The results on my 'Bonboniera' T7500, throwing mostly 16+ long keys at the "awful greedy country samurai":
Intel 12.1:
3333 lines read
8192 elements in the table (13 bits)
Jesteress: 493 [ 576]
Meiyan: 515 [ 583]
Yorikke: 458 [ 579]
Yoshimura: 379 [ 593] !!! SIGNIFICANTLY fastEST !!!
Yoshimitsu: 497 [ 609]
YoshimitsuTRIAD: 489 [ 615]
FNV-1a: 969 [ 604]
Larson: 947 [ 581]
CRC-32: 894 [ 613]
Murmur2: 656 [ 600]
Murmur3: 711 [ 583]
XXHfast32: 504 [ 596]
XXHstrong32: 528 [ 571]
1000 lines read
2048 elements in the table (11 bits)
Jesteress: 268 [ 205]
Meiyan: 268 [ 205]
Yorikke: 224 [ 207]
Yoshimura: 235 [ 187] ??? the slowest of all the four Yo*, something to ponder about ???
Yoshimitsu: 225 [ 225]
YoshimitsuTRIAD: 221 [ 219]
FNV-1a: 1125 [ 225]
Larson: 1131 [ 212]
CRC-32: 919 [ 230]
Murmur2: 439 [ 222]
Murmur3: 497 [ 223]
XXHfast32: 250 [ 223]
XXHstrong32: 309 [ 192]
32359 lines read
65536 elements in the table (16 bits)
Jesteress: 12249 [ 6883]
Meiyan: 12369 [ 6897]
Yorikke: 11000 [ 6872]
Yoshimura: 9876 [ 6908] !!! fastEST, yet with high collisions !!!
Yoshimitsu: 11489 [ 6937]
YoshimitsuTRIAD: 11094 [ 6843]
FNV-1a: 39491 [ 6840]
Larson: 39714 [ 6889]
CRC-32: 34264 [ 6891]
Murmur2: 17678 [ 6786]
Murmur3: 19626 [ 6850]
XXHfast32: 10383 [ 6859]
XXHstrong32: 12708 [ 6887]
Microsoft 16:
3333 lines read
8192 elements in the table (13 bits)
Jesteress: 756 [ 576]
Meiyan: 781 [ 583]
Yorikke: 776 [ 579]
Yoshimura: 740 [ 593]
Yoshimitsu: 781 [ 609]
YoshimitsuTRIAD: 803 [ 615]
FNV-1a: 1306 [ 604]
Larson: 1304 [ 581]
CRC-32: 1204 [ 613]
Murmur2: 983 [ 600]
Murmur3: 1031 [ 583]
XXHfast32: 859 [ 596]
XXHstrong32: 883 [ 571]
1000 lines read
2048 elements in the table (11 bits)
Jesteress: 463 [ 205]
Meiyan: 464 [ 205]
Yorikke: 422 [ 207]
Yoshimura: 442 [ 187]
Yoshimitsu: 431 [ 225]
YoshimitsuTRIAD: 423 [ 219]
FNV-1a: 1311 [ 225]
Larson: 1319 [ 212]
CRC-32: 1148 [ 230]
Murmur2: 648 [ 222]
Murmur3: 637 [ 223]
XXHfast32: 451 [ 223]
XXHstrong32: 496 [ 192]
32359 lines read
65536 elements in the table (16 bits)
Jesteress: 20162 [ 6883]
Meiyan: 20124 [ 6897]
Yorikke: 19101 [ 6872]
Yoshimura: 17801 [ 6908]
Yoshimitsu: 19616 [ 6937]
YoshimitsuTRIAD: 19370 [ 6843]
FNV-1a: 47142 [ 6840]
Larson: 48009 [ 6889]
CRC-32: 42964 [ 6891]
Murmur2: 25741 [ 6786]
Murmur3: 25654 [ 6850]
XXHfast32: 18179 [ 6859]
XXHstrong32: 20557 [ 6887]
Mr. Norton, you are most welcome. Maybe you have already spotted that I don't target 64bit stamps at all yet; if you are interested, I can write r.3 of my 'Tesla' function using these (Interleaving & Interlacing) nifty techniques. I believe it will outperform itself, being the fastest 64bit hash in the m^2 testbench.
I thank Przemyslaw Skibinski and Maciej Adamczyk (m^2) for their 64bit testbench which I included along with yours (Peter) in the benchmark:
http://www.sanmayce.com/Fastest_Hash/DOUBLOON_hash_micro-package_r3.zip
Results below are for the Intel 12.1 32bit executable on my laptop:
1 lines read
4 elements in the table (2 bits)
Jesteress: 17 [ 0]
Meiyan: 17 [ 0]
Yorikke: 10 [ 0]
Yoshimura: 10 [ 0]
Yoshimitsu: 9 [ 0]
YoshimitsuTRIAD: 10 [ 0]
FNV-1a: 131 [ 0]
Larson: 131 [ 0]
CRC-32: 102 [ 0]
Murmur2: 35 [ 0]
Murmur3: 42 [ 0]
XXHfast32: 11 [ 0]
XXHstrong32: 21 [ 0]
...
// hash_I 100MB_as_one_line.TXT:
1 lines read
4 elements in the table (2 bits)
Jesteress: 128304 [ 0]
Meiyan: 128253 [ 0]
Yorikke: 108994 [ 0]
Yoshimura: 94794 [ 0]
Yoshimitsu: 101738 [ 0]
YoshimitsuTRIAD: 104999 [ 0]
FNV-1a: 850704 [ 0]
Larson: 853118 [ 0]
CRC-32: 663362 [ 0]
Murmur2: 239559 [ 0]
Murmur3: 287138 [ 0]
XXHfast32: 104313 [ 0]
XXHstrong32: 153782 [ 0]
// hash_I 200MB_as_one_line.TXT:
1 lines read
4 elements in the table (2 bits)
Jesteress: 257028 [ 0]
Meiyan: 256955 [ 0]
Yorikke: 218383 [ 0]
Yoshimura: 190907 [ 0]
Yoshimitsu: 203801 [ 0]
YoshimitsuTRIAD: 210256 [ 0]
FNV-1a: 1702024 [ 0]
Larson: 1709548 [ 0]
CRC-32: 1327211 [ 0]
Murmur2: 479303 [ 0]
Murmur3: 574363 [ 0]
XXHfast32: 208887 [ 0]
XXHstrong32: 307942 [ 0]
I wrote revision 3 of FNV1A_Tesla as a 64bit counterpart of FNV1A_Yoshimura and included both in the 64bit linear speed test by Przemyslaw and Maciej. The results follow (I threw the 200MB file at the hashers):
As console screenshots:
http://www.sanmayce.com/Fastest_Hash/64bit_page1_.png
http://www.sanmayce.com/Fastest_Hash/64bit_page2_.png
As console text dumps:
FNV1A_Yoshimura is simply DIAMANTINE.