Abstract:
How programming languages may look in the future.

Created by Peter Kankowski
Last changed
Filed under Interpreters and compilers


The future of compilers

In the past, there was a clear difference between compiled and interpreted languages. The former were statically typed, often had manual memory management, and came with powerful optimizing compilers (C, Pascal, Ada). The latter were dynamically (often weakly) typed, interpreted, and used for scripting and learning (BASIC, AWK, Unix shells). Today the distinction is blurred: interpreters have become fast enough by using just-in-time compilation (JavaScript, Lua), and compiled languages have gained convenient features that make them more usable for scripting (automatic memory management, type inference). In this article, I try to imagine how programming languages will look in the future.

No explicit types

First, there will be no syntactic difference between statically and dynamically typed languages. You will not have to specify a type for any variable explicitly. A simple language implementation will use dynamic typing, while a more sophisticated one will statically infer the types to generate better code. For example, to count the number of unique elements in an array, you now have to write:

// C++
#include <map>
#include <string>
#include <vector>
using namespace std;

int count_unique_words(const vector<string>& words) {
	map<string, bool> unique_words;

	for (vector<string>::const_iterator it = words.begin(); it != words.end(); ++it)
		unique_words[*it] = true;

	return unique_words.size();
}

Type inference can be used in C++0x to deduce the type of the iterator, but not the types of unique_words or words. The language is verbose and nagging.

// Python
def count_unique_words(words):
	unique_words = {}
	for word in words:
		unique_words.setdefault(word, True)
	return len(unique_words)

Python code is close to ideal, but the interpreter has to check the variable types at runtime. In the future, such code will be as efficient as C++ thanks to type inference. If the count_unique_words function is called twice, once with an array of integers and once with an array of strings, the compiler will generate two separate specialized functions (similar to C++ templates).

Choosing the optimal data structure

High-level scripting languages used to be slower because their interpreters have to use the most general type required by the language semantics. For example, if arrays are always associative (as in PHP and AWK), the interpreter has to use a hash table even for arrays indexed by small integers. That's why Python and Perl introduced a special type for associative arrays (Python also has optimized tuples and iterators). Lua solves the problem by splitting a table into two parts (an array and a hash table) and dynamically choosing in which part to put each element.

In the future, compilers will be able to statically analyze the code and choose the most appropriate data structure. For example, if an array is only used as a stack (via push and pop calls), a linked list may be better than a hash table or a dynamic array.

Background incremental compilation

For most programming languages, the compiler is a separate command-line program. The programmer types the source code in an IDE, then waits while it is compiled:

Sequentially: programmer writes the code, compiler transforms it to machine code

Modern IDEs such as Visual Studio 2010 analyze your code, check its syntax, and remember symbols for IntelliSense tooltips. A full compiler can be implemented in the same way: it can try to compile the function you are typing now (or have just finished typing). If any global definitions change, it recompiles the affected code. Global optimizations can be enabled by storing the call graph and recompiling the affected functions; for example, if the current function is inlined, all functions that call it must be recompiled. The compiler can store their ASTs to speed up the process.

In parallel: programmer writes the code and compiler transforms it to machine code piece-by-piece

This incremental compiler will have to do more work than current compilers, because previous versions of compiled functions will often be thrown away. However, it will compile the code while it is being edited, so the programmer will not have to wait for compilation. Multicore processors can compile several functions simultaneously (already implemented in MSVC++). The symbol table, which is updated as you type, can be used for IntelliSense-like tooltips. Furthermore, unit tests can be run in the background after compilation finishes.

Conclusion

The ideas described above will make a compiler more complex, but current compilers are already very sophisticated, so we can expect incremental compilation and intelligent choice of data structures to be implemented in the near future.

Peter Kankowski

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, as well as in x86 assembly language, Python, and PHP.


1 comment

It's no future!

The future will remain the same! The future will continue exactly as it is... that is, Java will keep leading the programming market.

And why? Because Java was simply the programming language chosen by the IT industry; it was voted the "darling" among all available languages. With this election, Java has at its disposal the best brains in the field, massive capital investments, and everything else required to maintain "the leadership" of the software production area.

This "phenomenon" is not observed only in the IT market: all other industries also have their "elected ones", with no apparent reason and nothing to justify it, just the fact that a group met and elected someone as "the darling of the area"!
