Hash tables are popular data structures for storing key-value pairs. A hash function is used to map the key value (usually a string) to array index. The functions are different from cryptographic hash functions, because they should be much faster and don't need to be resistant to preimage attack. Hashing in large databases is also left out from this article; the benchmark includes medium-size hash tables such as:
- symbol table in a parser,
- IP address table for filtering network traffic,
- the dictionary in a word counting program or a spellchecker.
There are two classes of the functions used in hash tables:
- multiplicative hash functions, which are simple and fast, but have a high number of collisions;
- more complex functions, which have better quality, but take more time to calculate.
Hash table benchmarks usually include theoretical metrics such as the number of collisions or distribution uniformity (see, for example, hash function comparison in the Red Dragon book). Obviously, you will have a better distribution with more complex functions, so they are winners in these benchmarks.
The question is whether using complex functions gives you a faster program. The complex functions require more operations per one key, so they can be slower. Is the price of collisions high enough to justify the additional operations?
Multiplicative hash functions
Any multiplicative hash function is a special case of the following algorithm:
UINT HashMultiplicative(const CHAR *key, SIZE_T len) {
UINT hash = INITIAL_VALUE;
for(UINT i = 0; i < len; ++i)
hash = M * hash + key[i];
return hash % TABLE_SIZE;
}
(Sometimes XOR operation is used instead of addition, but it does not make much difference.) The hash functions differ only by values of INITIAL_VALUE and multiplier (M). For example, the popular Bernstein's function uses INITIAL_VALUE of 5381 and M of 33; Kernighan and Ritchie's function uses INITIAL_VALUE of 0 and M of 31.
A multiplicative function works by adding together the letters weighted by powers of multiplier. For example, the hash for the word TONE will be:
INITIAL_VALUE * M^4 + 'T' * M^3 + 'O' * M^2 + 'N' * M + 'E'
Let's enter several similar strings and watch the output of the functions:
Bernstein Kernighan (M=33) (M=31) too b88af17 1c154 top b88af18 1c155 tor b88af1a 1c157 tpp b88af39 1c174 a000 7c9312d6 2cd22f a001 7c9312d7 2cd230 a002 7c9312d8 2cd231 a003 7c9312d9 2cd232 a004 7c9312da 2cd233 a005 7c9312db 2cd234 a006 7c9312dc 2cd235 a007 7c9312dd 2cd236 a008 7c9312de 2cd237 a009 7c9312df 2cd238 a010 7c9312f7 2cd24e a 2b606 61 aa 597727 c20 aaa b885c68 17841
Too and top are different in the last letter only. The letter P is the next one after O, so the values of hash function are different by 1 (1c154 and 1c155, b88af17 and b88af18). Ditto for a000..a009.
Now let's compare top with tpp. Their hashes will be:
INITIAL_VALUE * M^3 + 'T' * M^2 + 'O' * M + 'P' INITIAL_VALUE * M^3 + 'T' * M^2 + 'P' * M + 'P'
The hashes will be different by M * ('P' - 'O') = M
. Similarly, when the first letters are different by x, their hashes will be different by x * M^2
.
When there are less than 33 possible letters, Bernstein's function will pack them into a number (similar to Radix40 packing scheme). For example, hash table of size 333 will provide perfect hashing (without any collisions) for all three-letter English words written in small letters. In practice, the words are longer and hash tables are smaller, so there will be some collisions (situations when different strings have the same hash value).
If the string is too long to fit into the 32-bit number, the first letters will still affect the value of the hash function, because the multiplication is done modulo 2^32 (in a 32-bit register), and the multiplier is chosen to have no common divisors with 2^32 (in other words, it must be odd), so the bits will not be just shifted away.
There are no exact rules for choosing the multiplier, only some heuristics:
- the multiplier should be large enough to accommodate most of the possible letters (e.g., 3 or 5 is too small);
- the multiplier should be fast to calculate with shifts and additions [e.g., 33 * hash can be calculated as (hash << 5) + hash];
- the multiplier should be odd for the reason explained above;
- prime numbers are good multipliers.
Complex hash functions
These functions do a good job of mixing together the bits of the source word. The change in one input bit changes a half of the bits in the output (see Avalanche_effect), so the result looks completely random:
Paul Hsieh One At Time too 3ad11d33 3a9fad1e top 78b5a877 4c5dd09a tor c09e2021 f2aa9d35 tpp 3058996d d5e9e480 a000 7552599f ed3859d8 a001 3cc1d896 fef7fd57 a002 c6ff5c9b 08a610b3 a003 dcab7b0c 1a88b478 a004 780c7202 3621ebaa a005 7eb63e3a 47db8f1d a006 6b0a7a17 b901717b a007 cb5cb1ab caec1550 a008 5c2a15c0 e58d4a92 a009 33339829 f75aee2d a010 eb1f336e bd097a6b a 115ea782 ca2e9442 aa 008ad357 7081738e aaa 7dfdc310 ae4f22ec
To achieve this behavior, the hash functions perform a lot of shifts, XORs, and additions. But do we need a complex function? What is faster: tolerating the collisions and resolving them with chaining, or avoiding them with a more complex function?
Test conditions
The benchmark uses separate chaining algorithm for collision resolution. Memory allocation and other "heavy" functions were excluded from the benchmarked code. The RDTSC instruction was used for benchmarking. The test was performed on Pentium-M and Core i5 processors.
The benchmark inserts some keys in the table, then looks them up in the same order as they were inserted. The test data include:
- the list of common words from Wiktionary (500 items);
- the list of Win32 functions from Colorer syntax highlight scheme (1992 items);
- 500 names from a000 to a499 (imitates the names in auto-generated source code);
- the list of common words with a long prefix and postfix;
- all variable names from WordPress 2.3.2 source code in wp-includes folder (1842 names);
- list of all words in Sonnets by W. Shakespeare (imitates a word counting program; 3228 words);
- list of all words in La Peau de chagrin by Balzac (in French, UTF-8 encoding);
- search engine IP addresses (binary).
Results
Core i5 processor
Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
iSCSI CRC | 65 | [105] | 329 | [415] | 36 | [112] | 84 | [106] | 83 | [92] | 280 | [368] | 408 | [584] | 1964 | [2388] | 322 | [838] | 1.01 | [1.78] |
Meiyan | 64 | [102] | 328 | [409] | 45 | [125] | 87 | [106] | 85 | [112] | 274 | [350] | 411 | [588] | 1972 | [2377] | 353 | [768] | 1.05 | [1.87] |
Murmur2 | 72 | [103] | 378 | [415] | 48 | [104] | 109 | [106] | 106 | [111] | 315 | [383] | 450 | [566] | 2183 | [2399] | 399 | [834] | 1.21 | [1.74] |
XXHfast32 | 78 | [110] | 372 | [420] | 57 | [102] | 88 | [103] | 88 | [106] | 315 | [347] | 473 | [491] | 2323 | [2494] | 463 | [838] | 1.23 | [1.71] |
SBox | 70 | [91] | 389 | [431] | 46 | [116] | 124 | [108] | 123 | [91] | 304 | [347] | 430 | [526] | 2182 | [2442] | 377 | [836] | 1.23 | [1.78] |
Larson | 72 | [99] | 401 | [416] | 34 | [16] | 143 | [99] | 141 | [105] | 312 | [366] | 451 | [583] | 2230 | [2447] | 349 | [755] | 1.25 | [1.10] |
XXHstrong32 | 79 | [109] | 385 | [429] | 58 | [102] | 93 | [102] | 92 | [112] | 321 | [355] | 474 | [491] | 2332 | [2496] | 464 | [838] | 1.25 | [1.72] |
Sedgewick | 73 | [107] | 417 | [414] | 36 | [48] | 143 | [103] | 143 | [103] | 319 | [348] | 446 | [570] | 2246 | [2437] | 349 | [782] | 1.26 | [1.33] |
Novak unrolled | 76 | [113] | 404 | [399] | 43 | [90] | 127 | [118] | 125 | [113] | 322 | [342] | 459 | [581] | 2284 | [2430] | 379 | [969] | 1.26 | [1.68] |
CRC-32 | 70 | [101] | 429 | [426] | 40 | [64] | 146 | [107] | 143 | [94] | 320 | [338] | 443 | [563] | 2231 | [2400] | 357 | [725] | 1.28 | [1.41] |
Murmur3 | 78 | [101] | 391 | [380] | 54 | [104] | 108 | [103] | 107 | [105] | 331 | [334] | 492 | [555] | 2360 | [2376] | 433 | [783] | 1.28 | [1.69] |
x65599 | 74 | [111] | 407 | [382] | 45 | [203] | 144 | [107] | 144 | [122] | 316 | [379] | 449 | [560] | 2221 | [2373] | 349 | [846] | 1.29 | [2.45] |
FNV-1a | 74 | [124] | 408 | [428] | 47 | [108] | 144 | [94] | 144 | [105] | 309 | [374] | 440 | [555] | 2193 | [2446] | 376 | [807] | 1.30 | [1.77] |
Murmur2A | 79 | [114] | 410 | [433] | 53 | [102] | 117 | [112] | 114 | [109] | 337 | [365] | 494 | [544] | 2377 | [2369] | 429 | [772] | 1.31 | [1.73] |
Fletcher | 71 | [131] | 352 | [406] | 80 | [460] | 104 | [127] | 100 | [108] | 312 | [507] | 481 | [1052] | 2477 | [4893] | 388 | [1359] | 1.31 | [4.62] |
K&R | 73 | [106] | 429 | [437] | 47 | [288] | 149 | [94] | 149 | [106] | 324 | [360] | 450 | [561] | 2266 | [2365] | 343 | [831] | 1.32 | [3.00] |
Paul Hsieh | 80 | [114] | 410 | [420] | 54 | [118] | 123 | [101] | 121 | [100] | 336 | [341] | 496 | [600] | 2351 | [2380] | 433 | [847] | 1.33 | [1.83] |
Bernstein | 75 | [114] | 428 | [412] | 49 | [288] | 150 | [100] | 150 | [102] | 324 | [353] | 460 | [572] | 2312 | [2380] | 351 | [703] | 1.34 | [2.99] |
x17 unrolled | 78 | [109] | 446 | [415] | 43 | [24] | 156 | [113] | 153 | [102] | 344 | [368] | 472 | [589] | 2361 | [2392] | 373 | [829] | 1.37 | [1.19] |
lookup3 | 83 | [101] | 459 | [412] | 55 | [97] | 140 | [101] | 137 | [95] | 359 | [361] | 526 | [550] | 2480 | [2392] | 427 | [834] | 1.42 | [1.65] |
MaPrime2c | 79 | [103] | 459 | [426] | 50 | [106] | 155 | [91] | 155 | [106] | 349 | [349] | 486 | [550] | 2493 | [2406] | 406 | [865] | 1.42 | [1.73] |
Ramakrishna | 80 | [108] | 513 | [409] | 44 | [91] | 189 | [125] | 186 | [103] | 370 | [360] | 483 | [528] | 2565 | [2383] | 380 | [840] | 1.51 | [1.66] |
One At Time | 85 | [105] | 562 | [421] | 58 | [110] | 221 | [97] | 220 | [103] | 392 | [364] | 511 | [545] | 2659 | [2346] | 459 | [795] | 1.72 | [1.75] |
Arash Partow | 83 | [101] | 560 | [435] | 71 | [420] | 215 | [98] | 212 | [85] | 392 | [355] | 507 | [570] | 2638 | [2372] | 407 | [779] | 1.72 | [3.88] |
Weinberger | 87 | [104] | 590 | [422] | 37 | [100] | 254 | [111] | 273 | [117] | 398 | [364] | 541 | [712] | 2734 | [2547] | 419 | [744] | 1.78 | [1.75] |
Hanson | 73 | [118] | 417 | [649] | 45 | [112] | 123 | [118] | 1207 | [499] | 318 | [435] | 448 | [592] | 2324 | [2890] | 370 | [833] | 2.70 | [2.46] |
Pentium-M processor
Words | Win32 | Numbers | Prefix | Postfix | Variables | Sonnets | UTF-8 | IPv4 | Avg | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Meiyan | 80 | [102] | 426 | [409] | 56 | [125] | 123 | [106] | 121 | [112] | 354 | [350] | 525 | [588] | 2443 | [2377] | 445 | [768] | 1.02 | [1.87] |
Novak unrolled | 90 | [113] | 517 | [399] | 56 | [90] | 169 | [118] | 164 | [113] | 398 | [342] | 575 | [581] | 2716 | [2430] | 482 | [969] | 1.18 | [1.68] |
Fletcher | 84 | [131] | 444 | [406] | 102 | [460] | 140 | [127] | 133 | [108] | 374 | [507] | 592 | [1052] | 2891 | [4893] | 513 | [1359] | 1.21 | [4.62] |
SBox | 88 | [91] | 552 | [431] | 57 | [116] | 181 | [108] | 178 | [91] | 414 | [347] | 560 | [526] | 2814 | [2442] | 472 | [836] | 1.22 | [1.78] |
Murmur2 | 97 | [103] | 532 | [415] | 65 | [104] | 165 | [106] | 162 | [111] | 434 | [383] | 622 | [566] | 2948 | [2399] | 537 | [834] | 1.25 | [1.74] |
CRC-32 | 90 | [101] | 565 | [426] | 55 | [64] | 198 | [107] | 192 | [94] | 427 | [338] | 590 | [563] | 2842 | [2400] | 469 | [725] | 1.26 | [1.41] |
x17 unrolled | 93 | [109] | 593 | [415] | 52 | [24] | 214 | [113] | 208 | [102] | 434 | [368] | 593 | [589] | 2867 | [2392] | 486 | [829] | 1.30 | [1.19] |
lookup3 | 94 | [101] | 565 | [412] | 70 | [97] | 189 | [101] | 182 | [95] | 432 | [361] | 631 | [550] | 2943 | [2392] | 572 | [834] | 1.32 | [1.65] |
K&R | 93 | [106] | 619 | [437] | 58 | [288] | 221 | [94] | 218 | [106] | 442 | [360] | 587 | [561] | 2961 | [2365] | 447 | [831] | 1.33 | [3.00] |
Larson | 95 | [99] | 631 | [416] | 49 | [16] | 231 | [99] | 228 | [105] | 455 | [366] | 599 | [583] | 3027 | [2447] | 469 | [755] | 1.35 | [1.10] |
XXHfast32 | 108 | [110] | 546 | [420] | 86 | [102] | 139 | [103] | 136 | [106] | 459 | [347] | 681 | [491] | 3259 | [2494] | 717 | [838] | 1.35 | [1.71] |
Murmur3 | 108 | [101] | 561 | [380] | 74 | [104] | 167 | [103] | 165 | [105] | 468 | [334] | 700 | [555] | 3259 | [2376] | 604 | [783] | 1.36 | [1.69] |
Bernstein | 97 | [114] | 622 | [412] | 61 | [288] | 225 | [100] | 222 | [102] | 448 | [353] | 609 | [572] | 3053 | [2380] | 469 | [703] | 1.37 | [2.99] |
XXHstrong32 | 108 | [109] | 558 | [429] | 86 | [102] | 150 | [102] | 147 | [112] | 460 | [355] | 682 | [491] | 3262 | [2496] | 714 | [838] | 1.38 | [1.72] |
x65599 | 99 | [111] | 628 | [382] | 61 | [203] | 234 | [107] | 232 | [122] | 459 | [379] | 630 | [560] | 3097 | [2373] | 471 | [846] | 1.40 | [2.45] |
Paul Hsieh | 106 | [114] | 576 | [420] | 82 | [118] | 183 | [101] | 178 | [100] | 456 | [341] | 678 | [600] | 3154 | [2380] | 670 | [847] | 1.41 | [1.83] |
Sedgewick | 101 | [107] | 667 | [414] | 52 | [48] | 245 | [103] | 242 | [103] | 478 | [348] | 630 | [570] | 3204 | [2437] | 475 | [782] | 1.42 | [1.33] |
Murmur2A | 113 | [114] | 598 | [433] | 78 | [102] | 183 | [112] | 178 | [109] | 488 | [365] | 719 | [544] | 3380 | [2369] | 651 | [772] | 1.44 | [1.73] |
FNV-1a | 102 | [124] | 660 | [428] | 62 | [108] | 239 | [94] | 237 | [105] | 473 | [374] | 627 | [555] | 3140 | [2446] | 516 | [807] | 1.44 | [1.77] |
MaPrime2c | 108 | [103] | 705 | [426] | 65 | [106] | 255 | [91] | 254 | [106] | 508 | [349] | 674 | [550] | 3413 | [2406] | 542 | [865] | 1.54 | [1.73] |
Ramakrishna | 108 | [108] | 728 | [409] | 61 | [91] | 278 | [125] | 272 | [103] | 511 | [360] | 660 | [528] | 3378 | [2383] | 517 | [840] | 1.56 | [1.66] |
Arash Partow | 106 | [101] | 739 | [435] | 93 | [420] | 280 | [98] | 275 | [85] | 514 | [355] | 671 | [570] | 3332 | [2372] | 543 | [779] | 1.65 | [3.88] |
One At Time | 118 | [105] | 830 | [421] | 81 | [110] | 321 | [97] | 319 | [103] | 578 | [364] | 741 | [545] | 3809 | [2346] | 657 | [795] | 1.82 | [1.75] |
Weinberger | 119 | [104] | 956 | [422] | 54 | [100] | 375 | [111] | 379 | [117] | 614 | [364] | 745 | [712] | 3973 | [2547] | 560 | [744] | 1.89 | [1.75] |
Hanson | 86 | [118] | 531 | [649] | 55 | [112] | 168 | [118] | 1722 | [499] | 393 | [435] | 549 | [592] | 2742 | [2890] | 463 | [833] | 2.60 | [2.46] |
Each cell includes the execution time, then the number of collisions in square brackets. Execution time is expressed in thousands of clock cycles (a lower number is better). Avg column contains the average normalized execution time (and the number of collisions).
The function by Kernighan and Ritchie is from their famous book "The C programming Language", 3rd edition; Weinberger's hash and the hash with multiplier 65599 are from the Red Dragon book. The latter function is used in gawk, sdbm, and other Linux programs. x17 is the function by Peter Kankowski (multiplier = 17; 32 is subtracted from each letter code).
As you can see from the table, the function with the lowest number of collisions is not always the fastest one.
Results on a large data set (list of all words in English Wikipedia, 12.5 million words, from the benchmark by Georgi 'Sanmayce'):
Core i5 processor
Wikipedia | Avg | |||
---|---|---|---|---|
iSCSI CRC | 5725944 | [2077725] | 1.00 | [1.00] |
Meiyan | 5829105 | [2111271] | 1.02 | [1.02] |
Murmur2 | 6313466 | [2081476] | 1.10 | [1.00] |
Larson | 6403975 | [2080111] | 1.12 | [1.00] |
Murmur3 | 6492620 | [2082084] | 1.13 | [1.00] |
x65599 | 6479417 | [2102893] | 1.13 | [1.01] |
FNV-1a | 6599423 | [2081195] | 1.15 | [1.00] |
SBox | 6964673 | [2084018] | 1.22 | [1.00] |
Hanson | 7007689 | [2129832] | 1.22 | [1.03] |
CRC-32 | 7016147 | [2075088] | 1.23 | [1.00] |
Sedgewick | 7060691 | [2080640] | 1.23 | [1.00] |
XXHfast32 | 7078804 | [2084164] | 1.24 | [1.00] |
K&R | 7109841 | [2083145] | 1.24 | [1.00] |
XXHstrong32 | 7168788 | [2084514] | 1.25 | [1.00] |
Bernstein | 7247096 | [2074237] | 1.27 | [1.00] |
lookup3 | 7342986 | [2084889] | 1.28 | [1.01] |
Murmur2A | 7376650 | [2081370] | 1.29 | [1.00] |
Paul Hsieh | 7387317 | [2180206] | 1.29 | [1.05] |
x17 unrolled | 7410443 | [2410605] | 1.29 | [1.16] |
Ramakrishna | 8172670 | [2093253] | 1.43 | [1.01] |
One At Time | 8338799 | [2087861] | 1.46 | [1.01] |
MaPrime2c | 8428492 | [2084467] | 1.47 | [1.00] |
Arash Partow | 8503299 | [2084572] | 1.49 | [1.00] |
Weinberger | 9416340 | [3541181] | 1.64 | [1.71] |
Novak unrolled | 21289919 | [6318611] | 3.72 | [3.05] |
Fletcher | 22235133 | [9063797] | 3.88 | [4.37] |
Pentium-M processor
Wikipedia | Avg | |||
---|---|---|---|---|
x17 unrolled | 11321744 | [2410605] | 1.00 | [1.16] |
K&R | 11666050 | [2083145] | 1.03 | [1.00] |
Bernstein | 11833902 | [2074237] | 1.05 | [1.00] |
Larson | 11888751 | [2080111] | 1.05 | [1.00] |
Sedgewick | 12111839 | [2080640] | 1.07 | [1.00] |
x65599 | 12144777 | [2102893] | 1.07 | [1.01] |
Arash Partow | 12235396 | [2084572] | 1.08 | [1.00] |
Ramakrishna | 12185834 | [2093253] | 1.08 | [1.01] |
Meiyan | 12269691 | [2111271] | 1.08 | [1.02] |
CRC-32 | 12604152 | [2075088] | 1.11 | [1.00] |
Murmur2 | 12713455 | [2081476] | 1.12 | [1.00] |
SBox | 12716574 | [2084018] | 1.12 | [1.00] |
Hanson | 12627597 | [2129832] | 1.12 | [1.03] |
lookup3 | 12791917 | [2084889] | 1.13 | [1.01] |
FNV-1a | 12868991 | [2081195] | 1.14 | [1.00] |
Murmur3 | 12916960 | [2082084] | 1.14 | [1.00] |
XXHfast32 | 12936106 | [2084164] | 1.14 | [1.00] |
XXHstrong32 | 12950650 | [2084514] | 1.14 | [1.00] |
Murmur2A | 13068746 | [2081370] | 1.15 | [1.00] |
Paul Hsieh | 12992315 | [2180206] | 1.15 | [1.05] |
MaPrime2c | 13348580 | [2084467] | 1.18 | [1.00] |
One At Time | 13662010 | [2087861] | 1.21 | [1.01] |
Weinberger | 14592843 | [3541181] | 1.29 | [1.71] |
Fletcher | 37410790 | [9063797] | 3.30 | [4.37] |
Novak unrolled | 37769882 | [6318611] | 3.34 | [3.05] |
Some functions were excluded from the benchmark because of very bad performance:
- Adler-32 (slow filling, not suitable as a hash function);
- TwoChars (bad for machine-generated names and variable names that are similar to each other, disastrous for large data sets such as Wikipedia).
The number of collisions depending on the hash table size (for the same data set, thanks to Ace for the idea):
Red Dragon Book proposes the following formula for evaluating hash function quality:
![sum from j=0 to m-1: b_j(b_j+1)/2 / [(n/2m)(n+2m-1)]](/media/hash_red_dragon_formula.png)
where bj is the number of items in j-th slot, m is the number of slots, and n is the total number of items. The sum of bj(bj + 1) / 2 estimates the number of slots your program should visit to find the required value. The denominator (n / 2m)(n + 2m − 1) is the number of visited slots for an ideal function that puts each item into a random slot. So, if the function is ideal, the formula should give 1. In reality, a good function is somewhere between 0.95 and 1.05. If it's more, there is a high number of collisions (slow!). If it's less, the function gives less collisions than the randomly distributing function, which is not bad.
Here are the results for some of our functions:
Conclusion
Complex functions by Paul Hsieh and Bob Jenkins are tuned for long keys, such as the ones in postfix and prefix tests. Note that they do not provide the best number of collisions for these tests, but do have the best time, which means that the functions are faster than the others because of loop unrolling. At the same time, they are suboptimal for short keys (words and sonnets tests).
For a word counting program, a compiler, or another application that typically handles short keys, it's often advantageous to use a simple multiplicative function such as x17 or Larson's hash. However, these functions perform badly on long keys.
Novak showed bad results on the large data set. Jesteress has a high number of collisions in numbers test.
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Download the source code (152 KB, MSVC++)
Variations
XORing high and low part
For table size less than 2^16, we can improve the quality of hash function by XORing high and low words, so that more letters will be taken into account:
return hash ^ (hash >> 16);
Subtracting a constant
x17 hash function subtracts a space from each letter to cut off the control characters in the range 0x00..0x1F. If the hash keys are long and contain only Latin letters and numbers, the letters will be less frequently shifted out, and the overall number of collisions will be lower. You can even subtract 'A' when you know that the keys will be only English words.
Using larger multipliers for a compiler
Paul Hsieh noted that large multipliers may provide better results for the hash table in a compiler, because a typical source code contains a lot of one-letter variable names (i, j, s, etc.), and they will collide if the multiplier is less than the number of letters in the alphabet.
The test confirms this assumption: the function by Kernighan & Ritchie (M = 33) has lower number of collisions than x17 (M = 17), but the latter is still faster (see Variables column in the table above).
Setting hash table size to a prime number
A test showed that the number of collisions will usually be lower if you use a prime, but the calculations modulo prime take much more time than the calculations for a power of 2, so this method is impractical. Even replacing division with multiplication by reciprocal values do not help here:
Words | Win32 | Numbers | Prefix | Postfix | Variables | Shakespeare | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bernstein % 2K | 145 | [261] | 880 | [889] | 426 | [8030] | 326 | [214] | 316 | [226] | 649 | [697] | 874 | [1131] |
Bernstein % prime | 186 | [221] | 1049 | [995] | 445 | [5621] | 364 | [194] | 357 | [217] | 805 | [800] | 1123 | [1051] |
Bernstein optimized mod | 160 | [221] | 960 | [995] | 416 | [5621] | 341 | [194] | 334 | [217] | 722 | [800] | 969 | [1051] |
x17 % 2K | 137 | [193] | 847 | [1002] | 81 | [340] | 314 | [244] | 300 | [228] | 641 | [863] | 832 | [1012] |
x17 % prime | 173 | [256] | 1010 | [1026] | 104 | [324] | 356 | [246] | 339 | [216] | 760 | [760] | 1046 | [1064] |
x17 optimized mod | 155 | [256] | 915 | [1026] | 96 | [324] | 330 | [246] | 315 | [216] | 691 | [760] | 930 | [1064] |
Implementing open addressing vs. separate chaining
With open addressing, most hash functions show awkward clustering behavior in "Numbers" test:
Bernst. | K&R | x17 unroll | x65599 | FNV | Univ | Weinb. | Hsieh | One-at | Lookup3 | Partow | CRC | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
OA | 426 | 81 | 84 | 207 | 88 | 91 | 273 | 110 | 103 | 92 | 1042 | 79 |
[8030] | [20810] | [340] | [3158] | [207] | [480] | [4360] | [342] | [267] | [205] | [20860] | [96] | |
32-bit | 179 | 69 | 74 | 114 | 86 | 80 | 125 | 105 | 99 | 92 | 347 | 82 |
[8030] | [20810] | [340] | [3158] | [207] | [480] | [4360] | [342] | [267] | [205] | [20860] | [96] | |
chain | 92 | 68 | 73 | 82 | 88 | 84 | 73 | 107 | 99 | 95 | 149 | 84 |
[500] | [500] | [24] | [258] | [124] | [48] | [100] | [138] | [131] | [108] | [1530] | [64] |
You can avoid the worst case by using chaining for collision resolution. However, chaining requires more memory for the next item pointers, so the performance improvement does not come for free. A custom memory allocator should be usually written, because calling malloc() for a large number of small structures is suboptimal.
Some implementations (e.g., hash table in Python interpreter) store a full 32-bit hash with the item to speed up the string comparison, but this is less effective than chaining.
215 comments
Ten recent comments are shown below. Show all comments
Peter, I see no ways to better the 32bit code hashers, so the fastest known to me 32bit hasher in 64bit code is:
https://forum.thefreedictionary.com/postsm1118964_MASAKARI--The-people-s-choice--General-Purpose-Grade--English-wordlist.aspx#1118964
What is the size of Bucket used in all the hash function mentioned in the graph named (hash function quality using red dragon book) in this blog?. It is requested to answer the query please.
Hello Mohit, thank you, it's a good question. The number of buckets (slots) were 2 * the number of items rounded to the next multiple of two. For example, in the "numbers" test there are 500 items in the table, so the number of buckets (hash table size) is 2 * 512 = 1024. Hope this helps
Thanks for your response
I just wanted to know the exact value for the uniform distribution of different hash function in the graph named (hash function quality using red dragon book), Can you please provide the exact values for which you plotted the graph.
It would be very helpful of you.
Can you please provide me with the exact value in the hash function quality graph just for numbers dataset? It would be really helpful of you.
Thanks in advance @Peter Kankowski.
Where’s the Wikipedia wordlist (or where does one get it from), and, more importantly, the OA test code?
I’m looking for a good OA spread/avalanche combo that’s cheap enough but doesn’t invoke UB or IB in C and is extremely portable (so it has to read by bytes, which makes a CRC surprisingly expensive, 2.39 to one-at-a-time’s 2.21 on my test borrowed windows system (don’t normally have one but your source is for it…))
Here are some interesting hash functions that can be added to your comparison suite:
I would like to present my latest research:
https://github.com/matteo65/ZedmeeHash a new hashing function with very interesting features
I would like to see this tried with sha-ni too