Adding variables to expression evaluator

In the previous part of expression evaluator series, we completed a simple compiler for arithmetic expressions. Now let's add support for variables and improve the code.

Lexer

The first improvement is separating lexer from parser. In the previous version, splitting the expression into tokens was intermixed with token processing. Now, there is a separate function, NextToken, which retrieves the next token. A token may be:

one of the following operators: + - * / = ( );
a floating-point number;
a name of variable.

How can you return all these kind of data types from NextToken? The function returns a char, which can be an operator or two special values (TOKEN_NUM and TOKEN_VAR). In the latter case, the calling function should look for the value in _number and _var variables.

Variables

The names and addresses of variables are stored in a hash table. A simple fixed-size (1024 elements) table and Bernstein hash function were used. Both provide good performance for this task.

The code to process variables is now very simple. When we encounter a variable name, we check if it is an assignment or usage of previously assigned variable. If the variable is assigned, we emit the code to calculate the expression and store the result in memory. If the value of variable is used in the expression, we just load it from memory:

if(_cur_token == TOKEN_VAR) {
    EVAL_CHAR* var_name = _op_start;
    if( NextToken() == '=') {
        double** stored_to = _var;
        NextToken();
        ParseSummands();
        *stored_to = _code.EmitStoreNumber(ExprCode::P_LEAVE);
    } else {
        if(NULL == *_var) {
            _op_start = var_name;
            ReportError(EEE_UNDEF_VAR);
        }
        _code.EmitLoadVariable(*_var);
    }
    if(negative)
        _code.EmitOperation(OP_CHANGE_SIGN);
}

The evaluator accepts C-style assignments like a = b = 2 or 3 + b = 2 (which means b = 2 and print 3 + b). We could implement Basic-style assignment, which allows assignment only to one variable in an expression, but this would require more complex code. We would have to push token back if the variable is the first token, but there is no assignment (e.g., a + 5). So, simpler C-style assignments were chosen.

As an exercise, you can implement pre-defined variables. For example, add pi variable for "π" number.

Conclusion

This code illustrates some techniques widely used in compilers. Most translators (including commercial C++ compilers) store variables in hash tables and separate lexer from parser. By studying this toy example, you can learn something about "real" compilers.

Download the expression evaluator, 43 KB

The previous part: Reverse Polish notation

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Created 15 years ago by Peter Kankowski
Last changed 13 years ago

16 comments

Ten recent comments are shown below. Show all comments

Keny Ordaz, 13 years ago

Thank you, Petya! Do you think that implementing FCOMP could be easy? I mean for introducing > or < or ==, and to result 0 for false and 1 for true.

Thanks again!

ace, 13 years ago

Keny, your request sounds like you've got a homework assignment and now you expect somebody to do it for you.

Peter Kankowski, 13 years ago

IMHO, Keny asked an interesting question. The obvious solution is:

; st0 and st1 contain the numbers to compare.
; If they are equal, 1.0 is returned; if not, 0.0 is returned.
fcompp
fnstsw ax
shr ax, 14
and ax, 1
mov [tmp], ax
fild word[tmp]

However, FSTSW is slow (see Agner Fog's manuals). A better solution uses FCOMIP and FCMOVcc instructions (available on Pentium Pro and higher):

fcomip st0, st1
fstp st0
fld1
fldz
fcmove st0, st1
fstp st1

It's both shorter (less code bytes) and faster.

ace, 13 years ago

I understood the question as a request for the expression evaluator which evaluates relational operators.

Peter Kankowski, 13 years ago

The expression evaluator should generate this code snippet for == operation. For < and >, it should be modified (FCMOVB and FCMOVNBE instead of FCMOVE).

Darko Nikolic, 13 years ago

Thank you for this very good article series.

I have a questions considering the STRINGS or ARRAYS, because you have shown how to deal with numbers INTEGERS and DOUBLES, but ARRAYS of any type are also interesting.

How to allocate a large STRING or any ARRAY type variable in .data segment?

Should I allocate more pages from memory to hold this?

Can I dynamically allocate additional memory if needed during run-time?

Thank you very much for your answers!

Regards,

Darko

Peter Kankowski, 13 years ago

Please read the latest article about dynamic arrays.

Darko Nikolic, 13 years ago

My question was connected to my current work.

I am building a scripting engine and execution of native code from memory buffer sounds very good, considering the speed.

It looks like that Google Chrome V8 JavaScript engine is built on this idea.

So, that is what I was wondering while reading your articles.

Can this be applied to the scripting subsystem?

It looks like it can, and that it should be done in this manner.

Thank you again.

Regards,

Darko

Peter Kankowski, 13 years ago

Darko, sorry, I misunderstood your question. You can define operations on strings and arrays in the expression evaluator. For example, you can write functions for allocating, concatenating, and slicing the strings. The expression compiler will generate calls to these functions. String literals can be stored in a preallocated array, and string variables can be allocated at runtime.

In a dynamically typed scripting language, you will also have to check types of the variables, generate function prologues and epilogues, allocate local variables on stack, etc. It's more complex than a p-code virtual machine, but IMHO, it's possible. Few years ago, I wanted to write a simple version of such compiler myself.

Darko Nikolic, 13 years ago

Thank you, Peter, for your answer.

Regards,

Darko

Adding variables to expression evaluator

Lexer

Variables

Conclusion

About the author

16 comments

Featured pages

Recent comments