What your compiler can do for you

Too often people believe that they can "optimize" their program by using, say, i++ instead of i = i + 1. Such tricks might be important twenty years ago, but modern compilers are clever enough to do this job without your help. Here are some of the things that your compiler can do for you.

Integer arithmetic optimizations

Paul Hsieh suggests the following tricks:

// Non-optimized:                 Optimized:

// Using bitwise AND to calculate Y modulo N, where N is a power of two
   x = y % n;                     x = y & (n - 1);
   x = y % 32;                    x = y & 31;

// Using shifts instead of multiplication by power of two
   x = y * 8;                     x = y << 3;

A test with MSVC++, GCC, Borland C++, Pelles C, and Digital Mars showed that all these compilers generated the same optimized code for both snippets. So you should not obfuscate your code by writing

bitmap_size = width * height << 2;

instead of

bitmap_size = width * height * sizeof(COLORREF);

Modern compilers optimize multiplication and division by an arbitrary constant. For example, your compiler can use a combination of shifts, additions, and subtractions instead of the slower IMUL instruction:

// MSVC++ 2005, full optimization, favor fast code (/Ox /Ot)
// GCC 3.2, -O3 (generates the same code as MSVC++):

// (x * 7)
lea ecx, dword[eax*8]
sub ecx, eax

// (x * 5)
lea ecx, dword[eax+eax*4]

// (x * 8)
lea ecx, dword[eax*8]

Pelles C and Digital Mars C++ are unable to optimize some cases:

// Pelles C 4.5, maximize speed more (-Ot -Ox)
// Digital Mars 8.33
// (x * 7)
imul ecx,eax,7

Digital Mars and Borland C++ use SHL, not LEA for multiplication by a power of two:

// Digital Mars 8.33
// Borland C++ 5.5.1, optimize for speed (-O2)
// (x * 8)
shl eax,3

MSVC++ and GCC achieve better results, because the shift instruction is slow on Pentium IV. In any case, replacing multiplications with shifts obfuscates your source code and gives a little or no improvement in speed.

Constant folding

To improve readability, you can use arithmetic expressions for constants:

#define AVG_MONTH_LENGTH 365.25 / 12
const double AVG_MONTH_LENGTH = 365.25 / 12;

It doesn't matter if you use #define or const double: in both cases, the value of AVG_MONTH_LENGTH will be calculated at compile time. You don't have to divide 365.25 by 12 with a calculator, and then write the result in your source code file. Use your compiler instead.

// Another example of constant folding
WORD inline BSWAP(WORD x) {
    return x >> 8 | x << 8;
    // In MSVC++, you can use _byteswap_ushort (non-portable)
}
const WCHAR IDEOGRAPHIC_SPACE = 0x3000;

if(unicode_character == BSWAP(IDEOGRAPHIC_SPACE) || unicode_character == BSWAP(' '))
// ...

MSVC++, GCC, and Borland C++ can deduce that BSWAP(IDEOGRAPHIC_SPACE) == 0x0030 and BSWAP(' ') == 0x2000, and compare the unicode_character with these values. Obviously, the source code is much more readable than the one with "magic numbers":

if(unicode_character == 0x0030 || unicode_character == 0x2000)
// ...

Other tricks

Sometimes a compiler can optimize your code in a way that you did not think about. For example, if you always use a function with the same constant argument, MSVC++ 2005 avoids passing this parameter and propagate the constant into a function body. This makes the function code smaller and faster.

RedHat's site lists all optimizations performed by GCC. This compiler also "knows" many clever tricks; in some areas, it's even better than MSVC++. (See also a similar list from a compiler performance analysis tool.)

Generally, you don't need to use mysterious constants in your code. Your compiler can do this work for you, while you will enjoy reading clean and simple source code.

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Created 16 years ago by Peter Kankowski
Last changed 15 years ago
Contributors: peter_k, Tim J, Bobby, and Dragon

13 comments

peter_k, 17 years ago

Most compilers can also speed up div function when the divider is know in the compilation time.

For example if we want to do in high language:



x = a / 10;

r = a % 10;

The compiler will make something like this:

; before ecx = a
mov edx, 0CCCCCCCDh
mov eax, ecx
mul edx
shr edx, 3
lea eax, [edx+edx*4]
add eax, eax
sub ecx, eax
; after that ecx = a % 10 and edx = a / 10

So modulo and dividing operation is done without DIV instruction (only one MUL)... This piece of code may be useful when doing binary to decimal conversion function in assembler.

Peter Kankowski, 17 years ago

Yes, and, as I heard, finding the magic number (0CCCCCCCDh in your example) for some divisors is not an easy task. Henry Warren wrote the whole chapter about it in his book.

Also, your code is for unsigned division; the code for signed division is slightly longer (and slower). That is one of reasons why I prefer to use UINT variables wherever it's possible.

Well, Peter, thank you for the example.

Tim J, 16 years ago

1. Dark blue on dark gray is not a clever color for text. Almost impossible to read.

2. If you #define an arithmetic expression you should always put it in parentheses. You don't know what context it will be evaluated in so this would be better (also corrected length of year in days)

#define (AVG_MONTH_LENGTH 365.24 / 12)

Tim

Inferis’ Mind Dump » Blog Archive » links for 2008-04-01, 16 years ago

[...] smallcode » Blog Archive » What your compiler can do for you Cool. (tags: code optimisation compiler c++) Share with Shareomatic! [...]

Paul B, 16 years ago

> Such tricks might be important twenty years ago

Nope. Twenty years ago we laughed at the fools who did this, and called it optimization. We laughed at any attempt at optimization that didn't begin with re-assessing the chosen algorithms.

There aren't that many people who should be doing this anyway - I want the author of my video driver, kernel and standard libraries to pay attention to optimization. The rest of us/you/everyone have no business doing this.

Peter Kankowski, 16 years ago

Thank you, I fixed the colors and #define statement. It was some kind of bug with colors in the blog software.

Sarav, 16 years ago

Well you missed out the ARM compiler which is used more often for compiling embedded systems code in the commercial world.

And embedded systems are where the optimization matter the most.

Would it be possible for you to test this in the ARM compiler too?

Thanks,
Sarav

Peter Kankowski, 16 years ago

Sorry, I have no experience with ARM processors and embedded systems. See the headline of this blog: "Code optimization, assembly language and C programming for Win32". If it's interesting for you, you can do the tests yourself.

ace, 16 years ago

Sarav, anybody interested in ARM technology or who just tried to type "arm compiler" in Google (not Microsoft's!) search box knows that GCC can be used for ARM CPUs and that it gives very good results.

Bobby, 15 years ago

"In any case, using shift instead of multiplication in source code gives you no benefit: you compiler will generate the same code anyway."

You actually demonstrated that to *not* be true for Pelles C and Digital Mars C++, unless of course those compilers would convert shifts back into a slower multiplication, which I doubt.

Also, when dealing with bitfields or boolean logic, bitshifts can be significantly easier to read than the equivalent multiplication, so it's not always a misguided attempt to micro-optimize.

Peter Kankowski, 15 years ago

Thanks, there was some contradiction; I corrected the statement. Shifts are certainly useful for bitfields (and they were designed for that purpose). My point was that replacing multiplication by a power of 2 to shift, as Paul Hsieh suggested, is a bad idea.

Dragon, 15 years ago

RedHat’s site lists all optimizations performed by GCC.

https://www.in.redhat.com/software/gnupro/technical/gnupro_gcc.php3