In the previous part of expression evaluator series, we completed a simple compiler for arithmetic expressions. Now let's add support for variables and improve the code.
Lexer
The first improvement is separating lexer from parser. In the previous version, splitting the expression into tokens was intermixed with token processing. Now, there is a separate function, NextToken, which retrieves the next token. A token may be:
- one of the following operators:
+ - * / = ( ); - a floating-point number;
- a name of variable.
How can you return all these kind of data types from NextToken? The function returns a char, which can be an operator or two special values (TOKEN_NUM and TOKEN_VAR). In the latter case, the calling function should look for the value in _number and _var variables.
Variables
The names and addresses of variables are stored in a hash table. A simple fixed-size (1024 elements) table and Bernstein hash function were used. Both provide good performance for this task.
The code to process variables is now very simple. When we encounter a variable name, we check if it is an assignment or usage of previously assigned variable. If the variable is assigned, we emit the code to calculate the expression and store the result in memory. If the value of variable is used in the expression, we just load it from memory:
if(_cur_token == TOKEN_VAR) {
EVAL_CHAR* var_name = _op_start;
if( NextToken() == '=') {
double** stored_to = _var;
NextToken();
ParseSummands();
*stored_to = _code.EmitStoreNumber(ExprCode::P_LEAVE);
} else {
if(NULL == *_var) {
_op_start = var_name;
ReportError(EEE_UNDEF_VAR);
}
_code.EmitLoadVariable(*_var);
}
if(negative)
_code.EmitOperation(OP_CHANGE_SIGN);
}
The evaluator accepts C-style assignments like a = b = 2 or 3 + b = 2 (which means b = 2 and print 3 + b). We could implement Basic-style assignment, which allows assignment only to one variable in an expression, but this would require more complex code. We would have to push token back if the variable is the first token, but there is no assignment (e.g., a + 5). So, simpler C-style assignments were chosen.
As an exercise, you can implement pre-defined variables. For example, add pi variable for "π" number.
Conclusion
This code illustrates some techniques widely used in compilers. Most translators (including commercial C++ compilers) store variables in hash tables and separate lexer from parser. By studying this toy example, you can learn something about "real" compilers.
Download the expression evaluator, 43 KB
The previous part: Reverse Polish notation
I've thoroughly enjoyed the education on machine code, which I'm applying to AIMGP.
And what is AIMGP?
It's an AI search algorithm where you generate machine code "organisms", test their fitness through execution of the raw machine code(on a set of test data), then perform crossover/mutation etc on the fitter organisms.
One of the more complex parts is learning machine code and generating it on the fly at run time.
A parser/lexer/tree generator written with the goal to be better than all yacc-based-and-like tools, written in Java but with the C target possible. BCD license.
http://www.antlr.org/
I haven't tried it. The C runtime has around 10000 lines of C (!) which I consider a little too much for something small. But maybe it's good for experimenting or doing something big.
p.s.
I love Red Dragon book (it is one of my fav. computer books) and i don't think that it is so "boring". I highly recomend it even as a starter.