Intel® Architecture Instruction Set Extensions Programming Reference (PDF reference). The description of new instructions in the upcoming Haswell processor, including transactional memory support, hardware random number generator, and 256-bit vector integer operations. The transactional memory instructions should be useful for GIL (global interpreter lock) in Python and Ruby. They tried to eliminate it with software TM, but it was too slow.
Peter, I've seen the sse2 implementation by Dmitry.
For the short string your implementation is the fastest (1.4x faster than the glibc's implementation and 1.5x faster than the Dmitry's one).
For the long string your implementation is the slowest (7...
A ternary structure for storing dictionaries is proposed. The structure is based on ternary search trie that is "compressed" into a DAG by linking together equal subtrees. By using it, you can eliminate affix stripping and implement a faster spelling corrector.