Too often people believe that they can "optimize" their program by using, say, i++
instead of i = i + 1
. Such tricks might be important twenty years ago, but modern compilers are clever enough to do this job without your help. Here are some of the things that your compiler can do for you.
Integer arithmetic optimizations
Paul Hsieh suggests the following tricks:
// Non-optimized: Optimized:
// Using bitwise AND to calculate Y modulo N, where N is a power of two
x = y % n; x = y & (n - 1);
x = y % 32; x = y & 31;
// Using shifts instead of multiplication by power of two
x = y * 8; x = y << 3;
A test with MSVC++, GCC, Borland C++, Pelles C, and Digital Mars showed that all these compilers generated the same optimized code for both snippets. So you should not obfuscate your code by writing
bitmap_size = width * height << 2;
instead of
bitmap_size = width * height * sizeof(COLORREF);
Modern compilers optimize multiplication and division by an arbitrary constant. For example, your compiler can use a combination of shifts, additions, and subtractions instead of the slower IMUL instruction:
// MSVC++ 2005, full optimization, favor fast code (/Ox /Ot)
// GCC 3.2, -O3 (generates the same code as MSVC++):
// (x * 7)
lea ecx, dword[eax*8]
sub ecx, eax
// (x * 5)
lea ecx, dword[eax+eax*4]
// (x * 8)
lea ecx, dword[eax*8]
Pelles C and Digital Mars C++ are unable to optimize some cases:
// Pelles C 4.5, maximize speed more (-Ot -Ox)
// Digital Mars 8.33
// (x * 7)
imul ecx,eax,7
Digital Mars and Borland C++ use SHL, not LEA for multiplication by a power of two:
// Digital Mars 8.33
// Borland C++ 5.5.1, optimize for speed (-O2)
// (x * 8)
shl eax,3
MSVC++ and GCC achieve better results, because the shift instruction is slow on Pentium IV. In any case, replacing multiplications with shifts obfuscates your source code and gives a little or no improvement in speed.
Constant folding
To improve readability, you can use arithmetic expressions for constants:
#define AVG_MONTH_LENGTH 365.25 / 12
const double AVG_MONTH_LENGTH = 365.25 / 12;
It doesn't matter if you use #define or const double: in both cases, the value of AVG_MONTH_LENGTH will be calculated at compile time. You don't have to divide 365.25 by 12 with a calculator, and then write the result in your source code file. Use your compiler instead.
// Another example of constant folding
WORD inline BSWAP(WORD x) {
return x >> 8 | x << 8;
// In MSVC++, you can use _byteswap_ushort (non-portable)
}
const WCHAR IDEOGRAPHIC_SPACE = 0x3000;
if(unicode_character == BSWAP(IDEOGRAPHIC_SPACE) || unicode_character == BSWAP(' '))
// ...
MSVC++, GCC, and Borland C++ can deduce that BSWAP(IDEOGRAPHIC_SPACE) == 0x0030 and BSWAP(' ') == 0x2000, and compare the unicode_character with these values. Obviously, the source code is much more readable than the one with "magic numbers":
if(unicode_character == 0x0030 || unicode_character == 0x2000)
// ...
Other tricks
Sometimes a compiler can optimize your code in a way that you did not think about. For example, if you always use a function with the same constant argument, MSVC++ 2005 avoids passing this parameter and propagate the constant into a function body. This makes the function code smaller and faster.
RedHat's site lists all optimizations performed by GCC. This compiler also "knows" many clever tricks; in some areas, it's even better than MSVC++. (See also a similar list from a compiler performance analysis tool.)
Generally, you don't need to use mysterious constants in your code. Your compiler can do this work for you, while you will enjoy reading clean and simple source code.
13 comments
For example if we want to do in high language:
x = a / 10;
r = a % 10;
The compiler will make something like this:
So modulo and dividing operation is done without DIV instruction (only one MUL)... This piece of code may be useful when doing binary to decimal conversion function in assembler.
Also, your code is for unsigned division; the code for signed division is slightly longer (and slower). That is one of reasons why I prefer to use UINT variables wherever it's possible.
Well, Peter, thank you for the example.
2. If you #define an arithmetic expression you should always put it in parentheses. You don't know what context it will be evaluated in so this would be better (also corrected length of year in days)
#define (AVG_MONTH_LENGTH 365.24 / 12)
Tim
Nope. Twenty years ago we laughed at the fools who did this, and called it optimization. We laughed at any attempt at optimization that didn't begin with re-assessing the chosen algorithms.
There aren't that many people who should be doing this anyway - I want the author of my video driver, kernel and standard libraries to pay attention to optimization. The rest of us/you/everyone have no business doing this.
And embedded systems are where the optimization matter the most.
Would it be possible for you to test this in the ARM compiler too?
Thanks,
Sarav
You actually demonstrated that to *not* be true for Pelles C and Digital Mars C++, unless of course those compilers would convert shifts back into a slower multiplication, which I doubt.
Also, when dealing with bitfields or boolean logic, bitshifts can be significantly easier to read than the equivalent multiplication, so it's not always a misguided attempt to micro-optimize.
https://www.in.redhat.com/software/gnupro/technical/gnupro_gcc.php3