You should not obfuscate you code with low-level optimizations: your compiler can do them for you.

Created by Peter Kankowski
Last changed
Contributors: peter_k, Tim J, Bobby, and Dragon
Filed under Low-level code optimization

Share on social sitesReddit Digg Delicious Buzz Facebook Twitter

What your compiler can do for you

Too often people believe that they can "optimize" their program by using, say, i++ instead of i = i + 1. Such tricks might be important twenty years ago, but modern compilers are clever enough to do this job without your help. Here are some of the things that your compiler can do for you.

Integer arithmetic optimizations

Paul Hsieh suggests the following tricks:

// Non-optimized:                 Optimized:

// Using bitwise AND to calculate Y modulo N, where N is a power of two
   x = y % n;                     x = y & (n - 1);
   x = y % 32;                    x = y & 31;

// Using shifts instead of multiplication by power of two
   x = y * 8;                     x = y << 3;

A test with MSVC++, GCC, Borland C++, Pelles C, and Digital Mars showed that all these compilers generated the same optimized code for both snippets. So you should not obfuscate your code by writing

bitmap_size = width * height << 2;

instead of

bitmap_size = width * height * sizeof(COLORREF);

Modern compilers optimize multiplication and division by an arbitrary constant. For example, your compiler can use a combination of shifts, additions, and subtractions instead of the slower IMUL instruction:

// MSVC++ 2005, full optimization, favor fast code (/Ox /Ot)
// GCC 3.2, -O3 (generates the same code as MSVC++):

// (x * 7)
lea ecx, dword[eax*8]
sub ecx, eax

// (x * 5)
lea ecx, dword[eax+eax*4]

// (x * 8)
lea ecx, dword[eax*8]

Pelles C and Digital Mars C++ are unable to optimize some cases:

// Pelles C 4.5, maximize speed more (-Ot -Ox)
// Digital Mars 8.33
// (x * 7)
imul ecx,eax,7

Digital Mars and Borland C++ use SHL, not LEA for multiplication by a power of two:

// Digital Mars 8.33
// Borland C++ 5.5.1, optimize for speed (-O2)
// (x * 8)
shl eax,3

MSVC++ and GCC achieve better results, because the shift instruction is slow on Pentium IV. In any case, replacing multiplications with shifts obfuscates your source code and gives a little or no improvement in speed.

Constant folding

To improve readability, you can use arithmetic expressions for constants:

#define AVG_MONTH_LENGTH 365.25 / 12
const double AVG_MONTH_LENGTH = 365.25 / 12;

It doesn't matter if you use #define or const double: in both cases, the value of AVG_MONTH_LENGTH will be calculated at compile time. You don't have to divide 365.25 by 12 with a calculator, and then write the result in your source code file. Use your compiler instead.

// Another example of constant folding
WORD inline BSWAP(WORD x) {
    return x >> 8 | x << 8;
    // In MSVC++, you can use _byteswap_ushort (non-portable)

if(unicode_character == BSWAP(IDEOGRAPHIC_SPACE) || unicode_character == BSWAP(' '))
// ...

MSVC++, GCC, and Borland C++ can deduce that BSWAP(IDEOGRAPHIC_SPACE) == 0x0030 and BSWAP(' ') == 0x2000, and compare the unicode_character with these values. Obviously, the source code is much more readable than the one with "magic numbers":

if(unicode_character == 0x0030 || unicode_character == 0x2000)
// ...

Other tricks

Sometimes a compiler can optimize your code in a way that you did not think about. For example, if you always use a function with the same constant argument, MSVC++ 2005 avoids passing this parameter and propagate the constant into a function body. This makes the function code smaller and faster.

RedHat's site lists all optimizations performed by GCC. This compiler also "knows" many clever tricks; in some areas, it's even better than MSVC++. (See also a similar list from a compiler performance analysis tool.)

Generally, you don't need to use mysterious constants in your code. Your compiler can do this work for you, while you will enjoy reading clean and simple source code.

Peter Kankowski
Peter Kankowski

About the author

Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP (on Windows platform). He can be reached at kankowski@narod.ru.


Ten recent comments are shown below. Show all comments

Paul B,
> Such tricks might be important twenty years ago

Nope. Twenty years ago we laughed at the fools who did this, and called it optimization. We laughed at any attempt at optimization that didn't begin with re-assessing the chosen algorithms.

There aren't that many people who should be doing this anyway - I want the author of my video driver, kernel and standard libraries to pay attention to optimization. The rest of us/you/everyone have no business doing this.
Peter Kankowski,
Thank you, I fixed the colors and #define statement. It was some kind of bug with colors in the blog software.
Well you missed out the ARM compiler which is used more often for compiling embedded systems code in the commercial world.

And embedded systems are where the optimization matter the most.

Would it be possible for you to test this in the ARM compiler too?

Peter Kankowski,
Sorry, I have no experience with ARM processors and embedded systems. See the headline of this blog: "Code optimization, assembly language and C programming for Win32". If it's interesting for you, you can do the tests yourself.
Sarav, anybody interested in ARM technology or who just tried to type "arm compiler" in Google (not Microsoft's!) search box knows that GCC can be used for ARM CPUs and that it gives very good results.
"In any case, using shift instead of multiplication in source code gives you no benefit: you compiler will generate the same code anyway."

You actually demonstrated that to *not* be true for Pelles C and Digital Mars C++, unless of course those compilers would convert shifts back into a slower multiplication, which I doubt.

Also, when dealing with bitfields or boolean logic, bitshifts can be significantly easier to read than the equivalent multiplication, so it's not always a misguided attempt to micro-optimize.
Peter Kankowski,
Thanks, there was some contradiction; I corrected the statement. Shifts are certainly useful for bitfields (and they were designed for that purpose). My point was that replacing multiplication by a power of 2 to shift, as Paul Hsieh suggested, is a bad idea.
RedHat’s site lists all optimizations performed by GCC.


Peter Kankowski,
Thank you very much, I fixed the link.

when you #define something

you should always remember to add parentheses.

Your name:

Please ignore this field: