Reverse Polish Notation
By convention, we put an operator (+) between two numbers:
2 + 2 = 4
But you also can put it before or after the operands:
+ 2 2 2 2 +
The first way is called prefix notation (LISP uses it), and the second is Reverse Polish Notation (it was invented by Jan Łukasiewicz, a Polish mathematician).
Here are more complex examples of RPN:
Reverse polish notation | Common (infix) notation | |
---|---|---|
(a) | 7 4 + 3 − | 7 + 4 − 3 |
(b) | 1 2 * 3 + | 1 * 2 + 3 |
(c) | 1 2 + 3 * | (1 + 2) * 3 |
(d) | 12 3 / 2 / | 12 / 3 / 2 |
(e) | 1 2 * 3 4 * + | 1 * 2 + 3 * 4 |
(f) | 5 9 2 * + | 5 + 9 * 2 |
You can read the example (a) as "7 is taken; 4 is added to it; 3 is subtracted from it". So, it will be written as 7 4 + 3 −. And (b) will be: "1 is taken; 2 is multiplied by it; 3 is added to it". (It sounds better in a language with free order of words.) Spelling the expression is a good exercise that helps to understand RPN. Try to read aloud the expressions (c) and (d) in your native language.
If RPN is new for you, you should also try the RPN calculator (or its real-world analog, if you can get one).
The examples (b) and (c) show that there is no need for parentheses in RPN. All operators have the same priority; they are performed in the order they are written.
RPN interpreters use operand stack for evaluation (you can read about it in the Wikipedia article). The lines (e) and (f) are examples of more complex expressions that can be evaluated on stack.
In fact, x87 FPU instructions are an advanced form of RPN, where the terms are written as separate instructions:
(a) (b) (e) (f)
fld 7 fld 1 fld 1 fld 5
fld 4 fld 2 fld 2 fld 9
faddp fmulp fmulp fld 2
fld 3 fld 3 fld 3 fmulp
fsubp faddp fld 4 faddp
fmulp
faddp
Here "fld 7" is used as a short way to write
seven dq 7.0
fld seven
Many VMs (including CLR and Java VM) use a similar notation.
Expression compiler
Now you know enough to integrate code generator with the expression evaluator. Instead of doing calculations, ExprEval will generate the code by calling ExprCode's methods. For example, the ParseFactors function will be:
// Parse multiplication and division
void ParseFactors() {
ParseAtom();
for(;;) {
// Save the operation and the position
EVAL_CHAR op = *_cur_char;
if(op != '/' && op != '*')
return;
_cur_char++;
ParseAtom();
// Emit the saved operation
_code.EmitOperation(op == '*' ? OP_MUL : OP_DIV);
}
}
With the code generator class, writing a compiler it's not harder than writing an interpreter.
Catching division by zero
There is no portable way to create an exception handler for division by zero (MSVC++ CRT contains the _matherr function, but it's Microsoft-specific). However, you can check if the result is −∞ or +∞, which will catch the division by zero:
bool InfinifyOrNan(double num) {
unsigned second_byte = *((unsigned*)&num + 1);
const unsigned EXPONENT_MASK = 0x7FF00000;
return (second_byte & EXPONENT_MASK) == EXPONENT_MASK;
}
In IEEE floating point format, the value is infinity if its exponent contains "all ones" pattern.
Changes and improvements
From the ParseFactors code, you can see another change in ExprEval: the pointer to a current character is now a class member (_cur_char). This simple modification saves space on stack and makes the program faster.
If you define DUMP_RPN, the program will dump the compiled code. You can use this feature to study RPN. For example, try to compile the expressions (a)-(f) above.
Recursive descent algorithm used in this series implements a top-down approach to parsing. Shunting yard is an example of another approach (bottom-up). The recursive descent algorithm is easier to understand, though it uses more memory on stack for a complex grammar.
Download the expression compiler, 41 KB
Recommended reading
- The second chapter of the Red Dragon book contains a example of translator from infix notation to RPN using resursive descent parsing.
- Another article on infix to RPN translation using shunting yard algorithm.
- Shunting yard algorithm on Wikipedia.
The previous part: Code generator
The next part: Variables in expression compiler
5 comments
http://msdn.microsoft.com/library/en-us/vclib/html/_crt__finite.asp
This reminds me very much of the forth programming language. Forth is a stack oriented language, push two numbers onto the stack and then operator and forth returns the result… Link: http://en.wikipedia.org/wiki/Forth_(programming_language)
Regards,
John McPherson
I am in Harbor Springs, Michigan for this two weeks, north of 45 degrees North Latitude, looking for a printf format code for unsigned long long to print a 64-bit register with 2^64-1, using gcc. Siberia ... wow! /s/ Berdeski
%llu
according to Wikipedia. For MSVC++ compiler, it is%I64u
.