Reverse Polish Notation and Expression Compiler

Reverse Polish Notation

By convention, we put an operator (+) between two numbers:

2 + 2 = 4

But you also can put it before or after the operands:

+ 2 2
2 2 +

The first way is called prefix notation (LISP uses it), and the second is Reverse Polish Notation (it was invented by Jan Łukasiewicz, a Polish mathematician).

Here are more complex examples of RPN:

	Reverse polish notation	Common (infix) notation
(a)	7 4 + 3 −	7 + 4 − 3
(b)	1 2 * 3 +	1 * 2 + 3
(c)	1 2 + 3 *	(1 + 2) * 3
(d)	12 3 / 2 /	12 / 3 / 2
(e)	1 2 * 3 4 * +	1 * 2 + 3 * 4
(f)	5 9 2 * +	5 + 9 * 2

You can read the example (a) as "7 is taken; 4 is added to it; 3 is subtracted from it". So, it will be written as 7 4 + 3 −. And (b) will be: "1 is taken; 2 is multiplied by it; 3 is added to it". (It sounds better in a language with free order of words.) Spelling the expression is a good exercise that helps to understand RPN. Try to read aloud the expressions (c) and (d) in your native language.

If RPN is new for you, you should also try the RPN calculator (or its real-world analog, if you can get one).

The examples (b) and (c) show that there is no need for parentheses in RPN. All operators have the same priority; they are performed in the order they are written.

RPN interpreters use operand stack for evaluation (you can read about it in the Wikipedia article). The lines (e) and (f) are examples of more complex expressions that can be evaluated on stack.

In fact, x87 FPU instructions are an advanced form of RPN, where the terms are written as separate instructions:

(a)          (b)          (e)          (f)
fld 7        fld 1        fld 1        fld 5
fld 4        fld 2        fld 2        fld 9
faddp        fmulp        fmulp        fld 2
fld 3        fld 3        fld 3        fmulp
fsubp        faddp        fld 4        faddp
                          fmulp
                          faddp

Here "fld 7" is used as a short way to write

seven dq 7.0
fld seven

Many VMs (including CLR and Java VM) use a similar notation.

Expression compiler

Now you know enough to integrate code generator with the expression evaluator. Instead of doing calculations, ExprEval will generate the code by calling ExprCode's methods. For example, the ParseFactors function will be:

// Parse multiplication and division
void ParseFactors() {
    ParseAtom();
    for(;;) {
        // Save the operation and the position
        EVAL_CHAR op = *_cur_char;
        if(op != '/' && op != '*')
            return;
        _cur_char++;
        ParseAtom();
        // Emit the saved operation
        _code.EmitOperation(op == '*' ? OP_MUL : OP_DIV);
    }
}

With the code generator class, writing a compiler it's not harder than writing an interpreter.

Catching division by zero

There is no portable way to create an exception handler for division by zero (MSVC++ CRT contains the _matherr function, but it's Microsoft-specific). However, you can check if the result is −∞ or +∞, which will catch the division by zero:

bool InfinifyOrNan(double num) {
    unsigned second_byte = *((unsigned*)&num + 1);
    const unsigned EXPONENT_MASK = 0x7FF00000;
    return (second_byte & EXPONENT_MASK) == EXPONENT_MASK;
}

In IEEE floating point format, the value is infinity if its exponent contains "all ones" pattern.

Changes and improvements

From the ParseFactors code, you can see another change in ExprEval: the pointer to a current character is now a class member (_cur_char). This simple modification saves space on stack and makes the program faster.

If you define DUMP_RPN, the program will dump the compiled code. You can use this feature to study RPN. For example, try to compile the expressions (a)-(f) above.

Recursive descent algorithm used in this series implements a top-down approach to parsing. Shunting yard is an example of another approach (bottom-up). The recursive descent algorithm is easier to understand, though it uses more memory on stack for a complex grammar.

Download the expression compiler, 41 KB

5 comments

ac, 17 years ago

Re: InfinifyOrNan function

http://msdn.microsoft.com/library/en-us/vclib/html/_crt__finite.asp

Peter Kankowski, 17 years ago

Thank you, I browsed the Floating point functions page in MSDN, but overlooked this function. However, __finite is not portable, so I will stay with my InfinifyOrNan, because it works with GCC as well as MSVC++.

John McPherson, 15 years ago

This reminds me very much of the forth programming language. Forth is a stack oriented language, push two numbers onto the stack and then operator and forth returns the result… Link: http://en.wikipedia.org/wiki/Forth_(programming_language)

Regards,

John McPherson

Phil Berdeski, 13 years ago

I am in Harbor Springs, Michigan for this two weeks, north of 45 degrees North Latitude, looking for a printf format code for unsigned long long to print a 64-bit register with 2^64-1, using gcc. Siberia ... wow! /s/ Berdeski

Peter Kankowski, 13 years ago

Warm greetings from Siberia :) The code for GCC is %llu according to Wikipedia. For MSVC++ compiler, it is %I64u.

Reverse Polish Notation and Expression Compiler