Intel® Architecture Instruction Set Extensions Programming Reference (PDF reference). The description of new instructions in the upcoming Haswell processor, including transactional memory support, hardware random number generator, and 256-bit vector integer operations. The transactional memory instructions should be useful for GIL (global interpreter lock) in Python and Ruby. They tried to eliminate it with software TM, but it was too slow.
Brad, thank you very much for noticing. Your version has one instruction less in the loop. However, len += 4 can be executed in parallel with s += 4 on a superscalar processor. As far as I remember, both versions have the same speed for this reason.
A ternary structure for storing dictionaries is proposed. The structure is based on ternary search trie that is "compressed" into a DAG by linking together equal subtrees. By using it, you can eliminate affix stripping and implement a faster spelling corrector.