Abstract:
Benchmark program for hash tables and comparison of 15 popular hash functions.

Created 1 year ago by Peter Kankowski
Last changed 1 day ago
Contributors: Nils, Ace, and Won
Filed under Algorithms

Hash functions: An empirical comparison

Hash tables are popular data structures for storing key-value pairs. A hash function is used to map the key value (usually a string) to array index. The functions are different from cryptographic hash functions, because they should be much faster and don't need to be resistant to preimage attack. There are two classes of the functions used in hash tables:

Hash table benchmarks usually include theoretical metrics such as the number of collisions or distribution uniformity (see, for example, hash function comparison in the Red Dragon book). Obviously, you will have a better distribution with more complex functions, so they are winners in these benchmarks.

The question is whether using complex functions gives you a faster program. The complex functions require more operations per one key, so they can be slower. Is the price of collisions high enough to justify the additional operations?

Multiplicative hash functions

Any multiplicative hash function is a special case of the following algorithm:

UINT HashMultiplicative(const CHAR *key, SIZE_T len) {
   UINT hash = INITIAL_VALUE;
   for(UINT i = 0; i < len; ++i)
      hash = M * hash + key[i];
   return hash % TABLE_SIZE;
}

(Sometimes XOR operation is used instead of addition, but it does not make much difference.) The hash functions differ only by values of INITIAL_VALUE and multiplier (M). For example, the popular Bernstein's function uses INITIAL_VALUE of 5381 and M of 33; Kernighan and Ritchie's function uses INITIAL_VALUE of 0 and M of 31.

A multiplicative function works by adding together the letters weighted by powers of multiplier. For example, the hash for the word TONE will be:

  INITIAL_VALUE * M^4  +  'T' * M^3  +  'O' * M^2  +  'N' * M  +  'E'

Let's enter several similar strings and watch the output of the functions:

     Bernstein Kernighan
        (M=33)     (M=31)
 too   b88af17     1c154
 top   b88af18     1c155
 tor   b88af1a     1c157
 tpp   b88af39     1c174
a000  7c9312d6    2cd22f
a001  7c9312d7    2cd230
a002  7c9312d8    2cd231
a003  7c9312d9    2cd232
a004  7c9312da    2cd233
a005  7c9312db    2cd234
a006  7c9312dc    2cd235
a007  7c9312dd    2cd236
a008  7c9312de    2cd237
a009  7c9312df    2cd238
a010  7c9312f7    2cd24e
   a     2b606        61
  aa    597727       c20
 aaa   b885c68     17841

Too and top are different in the last letter only. The letter P is the next one after O, so the values of hash function are different by 1 (1c154 and 1c155, b88af17 and b88af18). Ditto for a000..a009.

Now let's compare top with tpp. Their hashes will be:

  INITIAL_VALUE * M^3 + 'T' * M^2 + 'O' * M + 'P'
  INITIAL_VALUE * M^3 + 'T' * M^2 + 'P' * M + 'P'

The hashes will be different by M * ('P' - 'O') = M. Similarly, when the first letters are different by x, their hashes will be different by x * M^2.

When there are less than 33 possible letters, Bernstein's function will pack them into a number (similar to Radix40 packing scheme). For example, hash table of size 333 will provide perfect hashing (without any collisions) for all three-letter English words written in small letters. In practice, the words are longer and hash tables are smaller, so there will be some collisions (situations when different strings have the same hash value).

If the string is too long to fit into the 32-bit number, the first letters will still affect the value of the hash function, because the multiplication is done modulo 2^32 (in a 32-bit register), and the multiplier is chosen to have no common divisors with 2^32 (in other words, it must be odd), so the bits will not be just shifted away.

There are no exact rules for choosing the multiplier, only some heuristics:

Complex hash functions

These functions do a good job of mixing together the bits of the source word. The change in one input bit changes a half of the bits in the output (see Avalanche_effect), so the result looks completely random:

     Paul Hsieh One At Time
 too   3ad11d33  3a9fad1e  
 top   78b5a877  4c5dd09a  
 tor   c09e2021  f2aa9d35  
 tpp   3058996d  d5e9e480  
a000   7552599f  ed3859d8  
a001   3cc1d896  fef7fd57  
a002   c6ff5c9b  08a610b3  
a003   dcab7b0c  1a88b478  
a004   780c7202  3621ebaa  
a005   7eb63e3a  47db8f1d  
a006   6b0a7a17  b901717b  
a007   cb5cb1ab  caec1550  
a008   5c2a15c0  e58d4a92  
a009   33339829  f75aee2d  
a010   eb1f336e  bd097a6b  
   a   115ea782  ca2e9442  
  aa   008ad357  7081738e  
 aaa   7dfdc310  ae4f22ec

To achieve this behavior, the hash functions perform a lot of shifts, XORs, and additions. But do we need a complex function? What is faster: tolerating the collisions and resolving them with chaining, or avoiding them with a more complex function?

Test conditions

The benchmark uses separate chaining algorithm for collision resolution. Memory allocation and other "heavy" functions were excluded from the benchmarked code. The RDTSC instruction was used for benchmarking. The test was performed on a Pentium-M processor.

The benchmark inserts some keys in the table, then looks them up in the same order as they were inserted. The test data include:

Results

WordsWin32NumbersPrefixPostfixVariablesSonnetsSum
Fletcher129[131]693[406]190[460]233[127]221[108]579[507]878[1052]2923[2791]
Murmur2140[103]796[415]83[104]263[106]258[111]628[383]873[566]3041[1788]
x17 unrolled135[93]847[421]73[24]302[96]296[104]634[368]856[557]3143[1663]
x65599139[111]857[382]82[203]319[107]314[122]641[379]861[560]3213[1864]
Paul Larson138[99]872[416]68[16]320[99]314[105]647[366]872[583]3231[1684]
K&R145[106]884[437]88[288]323[94]316[106]657[360]898[561]3311[1952]
lookup3153[101]848[412]95[97]291[101]283[95]672[361]977[550]3319[1717]
Bernstein146[114]894[412]91[288]326[100]321[102]658[353]895[572]3331[1941]
Paul Hsieh158[114]853[420]108[118]281[101]274[100]677[341]992[600]3343[1794]
Ramakrishna151[108]971[409]80[91]371[125]360[103]704[360]926[528]3563[1724]
Sedgewick154[107]970[414]84[48]367[103]362[103]706[348]960[570]3603[1693]
FNV-1a157[124]977[428]88[108]367[94]362[105]711[374]955[555]3617[1788]
CRC-32154[101]987[426]84[64]371[107]362[94]718[338]959[563]3635[1693]
Arash Partow154[101]1009[435]148[420]376[98]366[85]731[355]952[570]3736[2064]
One At Time160[105]1019[421]99[110]384[97]377[103]745[364]992[545]3776[1745]
Weinberger170[104]1220[422]73[100]474[111]470[117]834[364]1056[712]4297[1930]
Two chars126[182]750[1265]498[490]305[351]228[310]1112[1454]1708[2706]4727[6758]
Hanson131[118]799[649]74[112]264[118]4418[499]595[435]813[592]7094[2523]

Each cell includes the execution time, then the number of collisions in square brackets. Execution time is expressed in thousands of clock cycles (a lower number is better). Sum column contains the total execution time and the total number of collisions for all test files.

The function by Kernighan and Ritchie is from their famous book "The C programming Language", 3rd edition; Weinberger's hash and the hash with multiplier 65599 are from the Red Dragon book. The latter function is used in gawk, sdbm, and other Linux programs. x17 is the function by Peter Kankowski (multiplier = 17; 32 is subtracted from each letter code).

As you can see from the table, the function with the lowest number of collisions is not always the fastest one. For example, compare CRC-32 and Larson's hash in the "numbers" test.

Conclusion

Complex functions by Paul Hsieh and Bob Jenkins are tuned for long keys, such as the ones in postfix and prefix tests. Note that they do not provide the best number of collisions for these tests, but do have the best time, which means that the functions are faster than the others because of loop unrolling. At the same time, they are suboptimal for short keys (common words and sonnets tests).

For a word counting program, a compiler, or another application that typically handles short keys, it's often advantageous to use a simple multiplicative function such as x17 or Larson's hash. However, these functions perform badly on long keys.

Fletcher's checksum has the best total time, but a large number of collisions in sonnets test.

Murmur2 is the only of the complex hash functions that provides good performance for all kinds of keys. It can be recommended as a general-purpose hashing function.

Download the source code (80 KB, MSVC++)

Variations

XORing high and low part

For table size less than 2^16, we can improve the quality of hash function by XORing high and low words, so that more letters will be taken into account:

   return hash ^ (hash >> 16);

Subtracting a constant

x17 hash function subtracts a space from each letter to cut off the control characters in the range 0x00..0x1F. If the hash keys are long and contain only Latin letters and numbers, the letters will be less frequently shifted out, and the overall number of collisions will be lower. You can even subtract 'A' when you know that the keys will be only English words.

Using larger multipliers for a compiler

Paul Hsieh noted that large multipliers may provide better results for the hash table in a compiler, because a typical source code contains a lot of one-letter variable names (i, j, s, etc.), and they will collide if the multiplier is less than the number of letters in the alphabet.

The test confirms this assumption: the function by Kernighan & Ritchie (M = 33) has lower number of collisions than x17 (M = 17), but the latter is still faster (see Variables column in the table above).

Setting hash table size to a prime number

A test showed that the number of collisions will usually be lower if you use a prime, but the calculations modulo prime take much more time than the calculations for a power of 2, so this method is impractical. Even replacing division with multiplication by reciprocal values do not help here:

WordsWin32NumbersPrefixPostfixVariablesShakespeare
Bernstein % 2K145[261]880[889]426[8030]326[214]316[226]649[697]874[1131]
Bernstein % prime186[221]1049[995]445[5621]364[194]357[217]805[800]1123[1051]
Bernstein optimized mod160[221]960[995]416[5621]341[194]334[217]722[800]969[1051]
x17 % 2K137[193]847[1002]81[340]314[244]300[228]641[863]832[1012]
x17 % prime173[256]1010[1026]104[324]356[246]339[216]760[760]1046[1064]
x17 optimized mod155[256]915[1026]96[324]330[246]315[216]691[760]930[1064]

Implementing open addressing vs. separate chaining

With open addressing, most hash functions show awkward clustering behavior in "Numbers" test:

Bernst.K&Rx17 unrollx65599FNVUnivWeinb.HsiehOne-atLookup3PartowCRC
OA4268184207889127311010392104279
[8030][20810][340][3158][207][480][4360][342][267][205][20860][96]
32-bit17969741148680125105999234782
[8030][20810][340][3158][207][480][4360][342][267][205][20860][96]
chain92687382888473107999514984
[500][500][24][258][124][48][100][138][131][108][1530][64]

You can avoid the worst case by using chaining for collision resolution. However, chaining requires more memory for the next item pointers, so the performance improvement does not come for free. A custom memory allocator should be usually written, because calling malloc() for a large number of small structures is suboptimal.

Some implementations (e.g., hash table in Python interpreter) store a full 32-bit hash with the item to speed up the string comparison, but this is less effective than chaining.

79 comments

Ten recent comments are shown below. Show all comments

Peter Kankowski, 2 days ago

Thanks for your comment. MSVC generates the same machine code for

h = h ^ (g >> 24);
h = h ^ g;
and
h ^= g >> 24 ^ g;

Operator ^ has lower precedence than >> (^ is lower in the precedence table). You probably changed something else in the source code that affected performance.

I tried h ^ (h >> 16) for K&R function, and it was slower this way:

WordsWin32NumbersPrefixPostfixVariablesShakespeare
without XOR146[106]884[437]88[288]323[94]317[106]657[360]905[561]
with XOR144[94]894[432]91[288]330[110]322[105]660[352]906[573]

For Larson's hash, I just forgot to try it (XOR version is faster). Thank you for reminding!

ace, 2 days ago
> ^ is lower in the precedence table

You are certainly right, my error! This was without thinking from my side, as I was used to see x >> y + z where the author wanted to >> before +. However ^ is really weaker than >>.

To excuse myself, this lapse occurred as I wanted to make my small contribution:

D. R. Hanson uses in two of his books, since around 1997 up to now, e.g. here:

http://code.google.com/p/cii/source/browse/tags/v20/src/atom.c
the following hash function:
for (h = 0, i = 0; i < len; i++)
    h = (h<<1) + scatter[(unsigned char)str[i]];

And it's very poor! Your tests immediately demonstrate the (this time, real) error.

ace, 2 days ago
(is there any page with the new syntax for formatting comments?)
Peter Kankowski, 2 days ago

No problem.

The comments are plain text now. I've rewritten strchr.com from scratch, as I promised to do earlier. Real minimalism: around 1500 lines of PHP code written in a couple of weeks, no bloated CMSes or "web frameworks". All your comments were saved, as usual. Later, there will be a WYSIWYG editor for comments and a button to publish your own article :)

, 2 days ago
Congratulation for the new implementation!

Xkcd recently brought to my attention "Collatz conjecture" (http://en.wikipedia.org/wiki/Collatz_conjecture)

Using programming language notation, giving the following Python function (where numbers are of unlimited number of bits):
def collatz( n ):
    while 1:
        n = 3 * n + 1 if n & 1 else n >> 1
        if n == 1: break
the conjecture states that for every n the loop will eventually terminate.

One way to look at it is: the 3*n+1 would be an acceptable hash function step where each input would be a single 1. (of course using unlimited number of bits is not for practical hash functions). The n >> 1 branch removes all zero LS bits, if they exist, else the "hash function" is applied (note also that by the construction of the loop every "hash" step results in at least one zero LSb). So the conjecture can be restated "applying 3*n+1 transformation step over the number with unlimited number of bits and the transformation which removes LS zero bits from the number in one moment there would be exactly 1 set bit in the number."

Of course I don't see any practical aspect of all this, except how a lot of experiments (in wikipedia article) show the "bit randomizing" properties of the multiplication with the odd number.

And that's what's wrong in Hanson's hash: he uses an even multiplier and in every step he loses information about previous steps(!) Using the "scatter" mapping trick of input byte doesn't help at all -- the multiplier is the engine, here the broken one.
ace, 2 days ago
(I guess that's the bug in the new web code, allowing empty name?)
Peter Kankowski, 2 days ago
Thanks, it's a good example of seemingly simple but unsolved problem. I will fix the empty name bug.
ace, 2 days ago

I suggest adding the function used by Hanson to your set of hashes in hash.cpp. The results nicely demonstrate how a faulty hash function can be designed, demonstrated in the books and used for a long time:

static unsigned long scatter[] = {
2078917053, 143302914, 1027100827, 1953210302, 755253631, 2002600785,
1405390230, 45248011, 1099951567, 433832350, 2018585307, 438263339,
813528929, 1703199216, 618906479, 573714703, 766270699, 275680090,
1510320440, 1583583926, 1723401032, 1965443329, 1098183682, 1636505764,
980071615, 1011597961, 643279273, 1315461275, 157584038, 1069844923,
471560540, 89017443, 1213147837, 1498661368, 2042227746, 1968401469,
1353778505, 1300134328, 2013649480, 306246424, 1733966678, 1884751139,
744509763, 400011959, 1440466707, 1363416242, 973726663, 59253759,
1639096332, 336563455, 1642837685, 1215013716, 154523136, 593537720,
704035832, 1134594751, 1605135681, 1347315106, 302572379, 1762719719,
269676381, 774132919, 1851737163, 1482824219, 125310639, 1746481261,
1303742040, 1479089144, 899131941, 1169907872, 1785335569, 485614972,
907175364, 382361684, 885626931, 200158423, 1745777927, 1859353594,
259412182, 1237390611, 48433401, 1902249868, 304920680, 202956538,
348303940, 1008956512, 1337551289, 1953439621, 208787970, 1640123668,
1568675693, 478464352, 266772940, 1272929208, 1961288571, 392083579,
871926821, 1117546963, 1871172724, 1771058762, 139971187, 1509024645,
109190086, 1047146551, 1891386329, 994817018, 1247304975, 1489680608,
706686964, 1506717157, 579587572, 755120366, 1261483377, 884508252,
958076904, 1609787317, 1893464764, 148144545, 1415743291, 2102252735,
1788268214, 836935336, 433233439, 2055041154, 2109864544, 247038362,
299641085, 834307717, 1364585325, 23330161, 457882831, 1504556512,
1532354806, 567072918, 404219416, 1276257488, 1561889936, 1651524391,
618454448, 121093252, 1010757900, 1198042020, 876213618, 124757630,
2082550272, 1834290522, 1734544947, 1828531389, 1982435068, 1002804590,
1783300476, 1623219634, 1839739926, 69050267, 1530777140, 1802120822,
316088629, 1830418225, 488944891, 1680673954, 1853748387, 946827723,
1037746818, 1238619545, 1513900641, 1441966234, 367393385, 928306929,
946006977, 985847834, 1049400181, 1956764878, 36406206, 1925613800,
2081522508, 2118956479, 1612420674, 1668583807, 1800004220, 1447372094,
523904750, 1435821048, 923108080, 216161028, 1504871315, 306401572,
2018281851, 1820959944, 2136819798, 359743094, 1354150250, 1843084537,
1306570817, 244413420, 934220434, 672987810, 1686379655, 1301613820,
1601294739, 484902984, 139978006, 503211273, 294184214, 176384212,
281341425, 228223074, 147857043, 1893762099, 1896806882, 1947861263,
1193650546, 273227984, 1236198663, 2116758626, 489389012, 593586330,
275676551, 360187215, 267062626, 265012701, 719930310, 1621212876,
2108097238, 2026501127, 1865626297, 894834024, 552005290, 1404522304,
48964196, 5816381, 1889425288, 188942202, 509027654, 36125855,
365326415, 790369079, 264348929, 513183458, 536647531, 13672163,
313561074, 1730298077, 286900147, 1549759737, 1699573055, 776289160,
2143346068, 1975249606, 1136476375, 262925046, 92778659, 1856406685,
1884137923, 53392249, 1735424165, 1602280572
};


UINT hashHanson( const CHAR* s, SIZE_T L )
{
    UINT h = 0;
    for ( SIZE_T i = 0; i < L; i++ ) {
        unsigned char c = s[i];
        h = ( h << 1 ) + scatter[ c ];
    }
    return h;
}

The serious weakness is visible for your "Postfix" set (the strings with the same long postfix).

Peter Kankowski, 1 day ago
Thanks. Just cannot remember why I commented out HashHanson in test.cpp :) It's included now, and the table in article is sorted by the total running time.
ace, 1 day ago
Finally, just for completeness, I propose one more function. I named it Novak Hash, and it's basically what should have been done to still do the "scatter" approach and to implement the function right (that is, not losing significant information about the previous values in every step and reducing the size of the table):

unsigned char* rijndaelSBox = (unsigned char*)
"\x63\x7c\x77\x7b\xf2\x6b\x6f\xc5"
"\x30\x01\x67\x2b\xfe\xd7\xab\x76"
"\xca\x82\xc9\x7d\xfa\x59\x47\xf0"
"\xad\xd4\xa2\xaf\x9c\xa4\x72\xc0"
"\xb7\xfd\x93\x26\x36\x3f\xf7\xcc"
"\x34\xa5\xe5\xf1\x71\xd8\x31\x15"
"\x04\xc7\x23\xc3\x18\x96\x05\x9a"
"\x07\x12\x80\xe2\xeb\x27\xb2\x75"
"\x09\x83\x2c\x1a\x1b\x6e\x5a\xa0"
"\x52\x3b\xd6\xb3\x29\xe3\x2f\x84"
"\x53\xd1\x00\xed\x20\xfc\xb1\x5b"
"\x6a\xcb\xbe\x39\x4a\x4c\x58\xcf"
"\xd0\xef\xaa\xfb\x43\x4d\x33\x85"
"\x45\xf9\x02\x7f\x50\x3c\x9f\xa8"
"\x51\xa3\x40\x8f\x92\x9d\x38\xf5"
"\xbc\xb6\xda\x21\x10\xff\xf3\xd2"
"\xcd\x0c\x13\xec\x5f\x97\x44\x17"
"\xc4\xa7\x7e\x3d\x64\x5d\x19\x73"
"\x60\x81\x4f\xdc\x22\x2a\x90\x88"
"\x46\xee\xb8\x14\xde\x5e\x0b\xdb"
"\xe0\x32\x3a\x0a\x49\x06\x24\x5c"
"\xc2\xd3\xac\x62\x91\x95\xe4\x79"
"\xe7\xc8\x37\x6d\x8d\xd5\x4e\xa9"
"\x6c\x56\xf4\xea\x65\x7a\xae\x08"
"\xba\x78\x25\x2e\x1c\xa6\xb4\xc6"
"\xe8\xdd\x74\x1f\x4b\xbd\x8b\x8a"
"\x70\x3e\xb5\x66\x48\x03\xf6\x0e"
"\x61\x35\x57\xb9\x86\xc1\x1d\x9e"
"\xe1\xf8\x98\x11\x69\xd9\x8e\x94"
"\x9b\x1e\x87\xe9\xce\x55\x28\xdf"
"\x8c\xa1\x89\x0d\xbf\xe6\x42\x68"
"\x41\x99\x2d\x0f\xb0\x54\xbb\x16";

unsigned long NovakHashUnrolled( char* s, int L )
{
unsigned long h = 0;
int i;
unsigned char* t = (unsigned char*)s;
for ( h = 0, i = 0; i < ( L & ~1 ); i += 2 ) {
h = ( h << 1 ) + h + rijndaelSBox[ t[ i ] ];
h = ( h << 1 ) + h + rijndaelSBox[ t[ i + 1 ] ];
}
if ( L & 1 )
h = ( h << 1 ) + h + rijndaelSBox[ t[ L - 1 ] ];
return h;
}

unsigned long NovakHash( const char* s, int L )
{
int i; unsigned long h = 0;
unsigned char* t = (unsigned char*)s;
for ( i = 0; i < L; i++ )
h = ( h << 1 ) + h + rijndaelSBox[ t[ i ] ];
return h;
}

I just consider it "how Hanson's function should have been done," I still believe that functions that don't use any tables are better in almost all real-life circumstances.
Your name:
Comment: