Sometimes you need to check such conditions:
if(TOP <= y && y < BOTTOM)
Or such ones:
if('a' <= ch && ch <= 'z')
Two branches in this code burdens the branch predictor, adding unnecessary entries to its history table:
cmp eax, 'a'
jl bypass
cmp eax, 'z'
jg bypass
; if statement body
bypass:
How can this code be improved?
The trick
First, subtract the code of the letter 'a' from ch:
On the chart, the interval from 'a' to 'z' is highlighted with blue. After subtraction, this interval shifts down to zero. Now it covers 26 numbers from 0 to 25. All you need to do now is to check if the difference is less than 26:
if(unsigned(ch - 'a') < 26)
Note that you should use unsigned comparison here, because the interval [0; 'a'–1] (it’s marked with light gray color on the chart) shifts to [–'a'; –1] in case of signed numbers or to [256–'a';255] in case of unsigned ones. If the signed comparison is used, all character codes below 'a' will pass the test, while they should not:
char ch; // char is signed by default
if(ch - 'a' < 26) // Wrong! E.g., any figure ('0'..'9') passes the test
You should cast to unsigned or specify an unsigned type for the literal 26:
if(unsigned(ch - 'a') < 26)
or
if(ch - 'a' < 26U)
Effect on generated code
This unsigned comparison is translated to the following code:
sub eax, 'a'
cmp eax, 26
jae bypass
; if statement body
bypass:
Now, you got rid of the second branch; the code also became 2 bytes shorter. Sometimes, this trick can remove the branches completely:
// This is translated to a code with two branches:
bool b = ch <= 'a' && ch <= 'z';
// This is translated to a code without any branches:
bool b = unsigned(ch - 'a') < 26;
In the second example, the compiler can use SETB or SBB instruction to get the boolean result of comparison:
sub eax, 'a'
cmp eax, 26
sbb eax, eax ; -1 if eax < 26, otherwise 0
neg eax ; 1 if eax < 26, otherwise 0
On modern superscalar processors, this code will be faster, because there is no penalty for a mispredicted branch.
Macro
And, finally, let’s write a macro that will check if a point belongs to an interval:
#define InRange(ch, a, b) (unsigned((ch) - (a)) <= (b) - (a))
if(InRange(ch, 'A', 'Z')) // an upper-case Latin letter
…
else if(InRange(ch, '0', '9')) // a figure
…
Questions to the reader
- Explain why the following conditions are not equivalent:
ch - '0' < 10U ch < 10U + '0'
5 comments
Great!
But this techinque better for assemly-coding
Of course, and for the learning and much better understanding 'human' optimization methods.
I think that, because VC7-8 generate very optimal code, depends on command-line switches.
For example:
Compiler generate this:
Good result, isn't it?