Abstract:

Created by Peter Kankowski
Last changed
Filed under Interpreters and compilers

Share on social sitesReddit Digg Delicious Buzz Facebook Twitter

Introduction: Writing a simple expression evaluator

Let's make an expression evaluator that takes strings like "2 * 3 + 1.5", evaluates them, and returns the result (7.5). The four arithmetic operators and parentheses are allowed in the expression. You can use this simple program instead of your desktop calculator, or you can embed it in a larger application (for example, it can be useful in accounting software).

You may think about reading an expression sequentially and do the calculations (for the example above: 2 * 3 = 6, then 6 + 1.5 = 7.5). But, if you do so, you will get wrong results for expressions like "3 – 1 / 2" or "1 + 3 * (25 – 4)". The operators have different priorities; that's why you need a more complex algorithm.

Recursive descent parsing

A popular approach (called recursive descent parsing) views an expression as a hierarchical structure. On the top level, an expression consists of several summands:

1.5 + 2 + 3.4 * (25 – 4) / 2 – 8. Summands are 1.5, 2, 3.4 * (25 – 4) / 2, and 8

Summands are separated by "+" and "–" signs. They include single numbers (1.5, 2, 8) and more complex expressions [3.4 * (25 – 4) / 2]. (Strictly speaking, 8 is not a summand here, but a subtrahend. For the purposes of this article, we will refer to the parts of expression separated by "–" as summands, though a mathematician would say it's a misnomer.)

Each summand, in turn, consist of several factors (we will call "factors" the things separated by "*" and "/"):

3.4 * (25 – 4) / 2. Factors are 3.4, (25 – 4), and 2. In '1.5', there is one factor, 1.5

The summand "1.5" can be viewed as a degenerated case: it constists of one factor, 1.5. The summand "3.4 * (25 – 4) / 2" consists of three factors.

There are 2 types of factors: atoms and subexpressions in parenthesis. In our simple expression evaluator, an atom is just a number (in more complex translators, atoms can also include variable names). Subexpressions can be parsed in the same way as the whole expression (you again found the summands inside the brackets, then factors, and then atoms).

The hierarchy of summands and factors

So, we have an hierarchy, where atoms constitute the lowest level, then they are combined into factors, and the factors are combined into summands. A natural way to parse such hierarchical expression is to use recursion:

double ParseAtom(char*& expr) {
    // Read the number from string
    char* end_ptr;
    double res = strtod(expr, &end_ptr);
    // Advance the pointer and return the result
    expr = end_ptr;
    return res;
}

// Parse multiplication and division
double ParseFactors(char*& expr) {
    double num1 = ParseAtom(expr);
    for(;;) {
        // Save the operation
        char op = *expr;
        if(op != '/' && op != '*')
            return num1;
        expr++;
        double num2 = ParseAtom(expr);
        // Perform the saved operation
        if(op == '/')
            num1 /= num2;
        else
            num1 *= num2;
    }
}

// Parse addition and subtraction
double ParseSummands(char*& expr) {
    double num1 = ParseFactors(expr);
    for(;;) {
        char op = *expr;
        if(op != '-' && op != '+')
            return num1;
        expr++;
        double num2 = ParseFactors(expr);
        if(op == '-')
            num1 -= num2;
        else
            num1 += num2;
    }
}

double EvaluateTheExpression(char* expr) {
    return ParseSummands(expr);
};

The hierarchical structure is directly reflected in this code.

Comments to the code

We need to save the operation, because we cannot calculate the result when the second operand is still unknown (e.g., you cannot evaluate "2*3" when you have read only "2*"). The for(;;) loop allows us to parse several summands, e.g., in "1+2+3".

The program uses double data type, which is okay for most applications. If you want to use this code for accounting, you should change to fixed-point data type. Don't use floating-point arithmetic when dealing with money. (If you need an explanation, read about Currency type in Visual Basic 6.0. Also peruse this example in C# and this article from IBM.)

Error handling, parentheses and unary minus parsing are also added to the program below. Only three error messages are used: "parentheses don't match", "invalid character", and "division by zero". When the interpreter sees an unexpected character anywhere, it returns the "invalid character" error code. The position in string where the error occurred is also shown. If there are several errors, only one of them will be reported.

Download the expression evaluator, 40 KB

Recommended reading

 

The next part: Code generator
Peter Kankowski
Peter Kankowski

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Created by Peter Kankowski
Last changed

12 comments

Ten recent comments are shown below. Show all comments

ac,
To anybody interested to learn the compiler theory properly I suggest "Engineering a Compiler", by Keith Cooper and Linda Torczon. Extremely well organized book.

"Red Dragon" was for a long time the most referred book, but it's not so convenient for a beginner. BTW there's even a "second edition" of it, a "Purple Dragon", thoroughly updated, I guess mostly by Monica S. Lam, but I still find "Engineering a Compiler" a better start.

After "Engineering a Compiler" (or together with it) a second book for somebody who wants to do some serious compiler-like code is certainly "A Retargetable C Compiler: Design and Implementation" by Chris Fraser and
David Hanson, which is an excellent "case study". EAC book gives you enough of "really-why" and ARCC book enough of "high-quality-how".
ac,
One good online book which nicely demonstrates practical compiler construction is:

Nicholas Wirth's "Compiler Construction"
http://www.oberon.ethz.ch/WirthPubl/CBEAll.pdf

The code and the target are Oberon. Oberon is the language designed to be small and explainable so it's convenient to demonstrate the concepts, and it's not hard to understand.
James Juno,
Excellent and concise introduction. As a trained physicist, I've not spent the time a computer scientist has in this area and this short description has proved invaluable for my current in-house project. Thank you very much!
Mohit,
Nice work. Here is a Java based version of Expression Evaluator on similar lines http://code.google.com/p/xpressionengine/
Xupicor,

Pretty nice, although I like Reverse Polish Notation better. Just combine the RPN expression evaluation algorithm with Djikstra's infix to postfix(RPN) algorithm, and you have expression evaluator that is highly configurable - you can implement whatever operators you want (very easy, just add a new operator with its symbol, precedence, left or right association, number of arguments and function that will process the data), have functions as operators, have variables (very easy to add), strings and string variables (a bit more work, but still fairly easy) and at a bit higher level you can put language constructs like loops.

Voila, you've got yourself a very basic scripting language without ever knowing about flex or bison. ;)

Peter Kankowski,
Thanks, I already did what you described in the next parts of the article.
asim sajjad,

good efforts!i appriacet all ur works

Conor,

Of all the examples Ive perused, this is the easiest to get your head around recursive descent parsing. Also, I have to mention Game Scripting Mastery to anyone without a deep knowledge of this subject (grammars etc). Look for a torrent on google if you're interested.

UncleMarvo,

You're a star! Thank you!

Superb, clear code, and written as a Class too.

Adam Novagen,

Thank you so much for this extremely clear, concise, simple and VALUABLE article! Thanks to this - and the code you provided - I was able to reverse-engineer a perfectly operational version for BlitzMax in a single day... And I hardly even know any C++! This here is a treasure, and makes it so easy to grasp that I was even able to add exponents into the heirarchy with no bother at all!

Thank you!

Your name:


Comment:


Please ignore this field: