A toy compiler of a Scala subset

I got curious about the inner workings of compilers a long time ago.

But only recently did I get a chance to devote more time and effort to the topic. As a result, I finally wrote a compiler.

Why write another toy compiler? Honestly, in terms of education or interesting ideas it hardly has anything to offer over existing mini compilers. Still, a few things should set this one apart.

For starters, the language. Many toy/teaching languages have pretty much only functions and simple values. They seem too primitive to me. Our language is called Cool 2020, and it’s a Scala subset. Cool 2020 is concise and easy to implement but retains many features of grown-up programming languages: static typing, classes, dynamic dispatch, and even the simplest form of pattern matching.

Then there is a question: what flavour of assembly language should the compiler emit? Compiler courses usually pick MIPS assembly as the target and run the result in an emulator. They surely have good reasons for doing this, but to me, being able to run the emitted assembly as a native OS executable feels infinitely more satisfying. So, we’re going to emit x86-64 assembly for Windows and Linux.

Go to the project's repository to explore the source code.

Complexity of toy compiler design

This is not a tutorial

But why write a compiler?!

What are we going to compile?

Why a Scala subset?

How are we going to compile?

Compile-time errors

Compilation stages

Lexing and parsing

Semantic analysis

Assembly generation
Runtime library

What can our runtime do?

What can’t our runtime do?

Executable linking

Intermediate representations and optimizations

If you want to learn about compilers

Advanced topics

This is not a tutorial

Scroll to the end for links to a couple of awesome tutorials. Even if I’d tried to make this a tutorial, it couldn’t have begun to compare with those. A side effect of working on this write-up is that I definitely gained a new appreciation for just how talented the authors of the linked tutorials are.

Instead, let’s talk about the design decisions and difficulties that came up while planning and writing the code. We can also discuss the big picture of the project’s structure. Plus, while coding up the compiler, I collected a number of tips, war stories, and interesting opinions from compiler gurus that seem worth sharing. If you’re curious about any of this stuff, I’m happy — read on!

But why write a compiler?!

A long time ago in a company far, far away I joined a C++ project.

We used Visual Studio 6.0 to work on it. Soon after I joined, my teammates introduced me to Visual Assist, a plugin that improves and extends what Visual Studio can do with C++ code. At least back then, the plugin really impressed me: I felt like it was powered by a general artificial intelligence engine, no less.

While Windows was our primary platform, we supported building and running the project on Linux. To edit the code we used KDevelop or Eclipse CDT. At the time, even an out-of-the-box install of Visual Studio left these two in the dust in terms of working with C++ code, and Visual Assist was simply decades ahead. On the other hand — as I learned soon enough — if you asked people to suggest an IDE for Linux, they would respond that the Unix command line was the most advanced IDE ever.

So, a light bulb went on in my head: I had a chance to contribute big-time to the open-source community! Linus Torvalds himself would shake my hand. Maybe Red Hat would even give me some stock options as a present. All I had to do was improve an existing Linux C++ IDE so it was at least as good as its Windows counterparts. Easy-peasy!.. This was probably the first time I started wondering how IDEs and compilers “understand” code.

Well, life had different plans. I changed jobs and switched to C# at the new company. So C++ IDEs, compilers, and Linux had to take a back seat for a long time.

Finally, in 2019 I decided I couldn’t postpone wrapping my head around compiler construction any longer. What is the best way to learn? To apply whatever you’re learning in practice, of course. So, this project was kicked off.

What are we going to compile?

Let’s first get a quick taste of our programming language and dive into details a bit later. Here’s a small complete program in Cool 2020.

class QuickSort() extends IO() {
  def quicksort(array: ArrayAny, lo: Int, hi: Int): Unit = {
    if (lo < hi) {
      var p: Int = partition(array, lo, hi);
      quicksort(array, lo, p - 1);
      quicksort(array, p + 1, hi)
    } else ()
  };

  def partition(array: ArrayAny, lo: Int, hi: Int): Int = {
    var pivot: Int = array.get(lo) match { case i: Int => i };
    var p: Int = lo;
    var i: Int = lo + 1;
    while (i <= hi) {
      if (((array.get(i)) match { case i: Int => i }) <= pivot)
        array_swap(array, i, { p = p + 1; p })
      else ();
      i = i + 1
    };

    array_swap(array, p, lo);
    p
  };

  def array_swap(array: ArrayAny, p: Int, q: Int): Unit = {
    var tmp: Any = array.get(p);
    array.set(p, array.get(q));
    array.set(q, tmp)
  };

  def out_array(array: ArrayAny): Unit = {
    var i: Int = 0;
    while (i < array.length()) {
      array.get(i) match {
        case i: Int => out_int(i); out_string(" ")
      };
      i = i + 1
    };
    out_nl()
  };

  {
    var array: ArrayAny = new ArrayAny(5);
    array.set(0, 30);
    array.set(1, 20);
    array.set(2, 50);
    array.set(3, 40);
    array.set(4, 10);

    out_array(array);
    quicksort(array, 0, array.length() - 1);
    out_array(array)
  };
}

class Main() {
  { new QuickSort() };
}

Use this link to run the code.

Alex Aiken came up with Classroom Object Oriented Language, or COOL, for his Compilers course. While the course was one of my primary study resources, I picked another language for the toy compiler — Cool 2020. The names are similar but the languages are fairly different. While COOL is an invention of Alex Aiken, Cool 2020 is a Scala subset (with a couple of minor incompatibilities). John Boyland extracted and documented it.

Why a Scala subset?

Firstly, edX hosts the Compilers course, and their Honor Code has a clause that prohibits posting “answers to problems that are being used to assess learner performance”. Does this clause cover the code a student writes doing the course’s exercises? I’m not sure, but I’d like to steer clear of violating it as much as possible.

Secondly, the course gives some amount of starter code and structure for its assignments. The student is supposed to fill in the missing pieces. Doing that is a ton of work, but it’s still a bit more help and guidance than I wanted for this particular project.

Out of the other educational programming languages I found on the Internet, Cool 2020 seemed to be the best fit. We’ve seen a sample program earlier; you can also take a look at an implementation of Conway’s Game of Life and the grammar in the project’s repository. Please note that the grammar ignores operator precedence for the sake of simplicity. The precedence of operations is described here.

Going with a Scala subset brings us some advantages. One, as Scala is quite a modern language, Cool 2020 too has features typical of modern programming languages. Thanks to this, we get to think about and solve problems similar to the problems developers of modern programming languages solve, though, obviously, on a hugely smaller scale.

An example of such a feature is the way we write constructors in Cool 2020. The constructor’s parameters appear in parentheses just after the class name, and the constructor’s code can be split across a number of code blocks appearing in curly braces inside the class body.

Let’s take a look at a bit of code illustrating this feature. The constructor of the Greeter class below has one parameter, name. When the compiler sees this constructor parameter, it automatically adds an attribute name to the class and emits the code to assign the parameter to the attribute. The method speak then uses the attribute name.

class Greeter(var name: String) {
  var io: IO = new IO();

  def speak(): Unit = {
    io.out_string("Hello, " + name + "!")
  };
}

class Main() { { new Greeter("World").speak() }; }

Two, we can use Scala tools. Syntax highlighting in editors and in this document is obviously very nice to have. Scastie runs Scala code right in the browser, which is really useful for us too, as we can run Cool 2020 programs in Scastie.

Here are the bits of code we’ll need to run Cool 2020 in Scastie.

class ArrayAny(len: Int) {
  var _array = new Array[Any](len);

  def length(): Int = _array.length;
  def get(i: Int): Any = _array(i);
  def set(i: Int, value: Any): Unit = _array(i) = value;
}

class IO {
  def out_string(s: String): Unit = print(s);
  def out_int(i: Int): Unit = print(i);
  def out_nl(): Unit = println();
}

object Program {
  def main(args: Array[String]): Unit = {
    new Main()
  }
}

That’s it about Cool 2020 for the moment. If you happen to be looking for a language to implement in your own compiler, make sure to also take a look at ChocoPy . It’s very well documented, has static typing, classes, dynamic dispatch, nested functions, etc. At the same time ChocoPy manages to be concise and possible to implement in a limited time frame.

How are we going to compile?

Here’s how. (An animated GIF shows a Cool 2020 program being compiled and run.)

Well, to reproduce what happens in the gif, we’ll need to write a compiler. And to do that we need to pick a programming language to write the compiler in.

In 2017 I came across the Programming Languages course on Coursera. Part A is devoted to Standard ML (SML). To say the instructor is great would be a huge understatement. The code we wrote for assignments used only immutable values, recursion instead of loops, and partial application. It didn’t make me a hardcore functional programming proponent, but it seriously broadened my horizons. On top of that, we used pattern matching extensively, which, in combination with discriminated unions, impressed me the most: using these two to model a lot of problems felt really natural.

The course left me wanting to write more SML. At work we develop in C#, which means I’m most familiar and comfortable with the .NET platform. Basically, it made all kinds of sense to me to try SML’s closest relative on .NET — the F# programming language. So, I tried F# in a couple of smaller projects and started looking for an opportunity to use it for something bigger.

A toy compiler was exactly the bigger project I was looking for. Add to this the fact that the Internet is full of comments praising SML-family languages for just how good they are for writing compilers. Then notice that the pretty much classic Tiger book has its code samples in SML. And you’ll see how it was a no-brainer to settle on F# for this project.

Now that we’ve decided on F#, let’s get one thing out of the way: please don’t expect to find purely functional code in the repository. While writing the code, I quickly realised that trying to stick to functional programming concepts while learning about compilers was a little more than I could handle at the same time. So, there is mutable state, loops, and classes in the code.

Compile-time errors

Before we get down to coding, let’s do a bit of planning. One thing to decide on is what our compiler is supposed to do when it encounters an error in the user’s code.

The simplest thing to do is to output a message telling the user about the error and stop translation as soon as the compiler finds the first error. As a result, if there are multiple errors in the code, the user will have to restart compilation multiple times to catch them all. It doesn’t make for a good user experience.

A very different approach exists. When the compiler detects an error it informs the user, but instead of stopping, it keeps analysing the code. The goal is to find and report as many errors as possible in a single run. So that the user can fix all of them and then —  ideally — restart compilation only once.

This approach has one downside worth discussing: it requires the compiler’s code to be noticeably more complicated, as the compiler has to use heuristics to reduce cascading errors.

What’s a cascading error? The Scala code sample below contains one error. The closing parenthesis is missing in line 2.

/* 1 */ class CascadingError {
/* 2 */      def force(: Unit = {
/* 3 */      };
/* 4 */  }

But the Scala compiler diagnoses two errors. It gets “confused” after seeing the first one, which is real. As a result, it reports a second error in line 3, where there is no actual problem with the code — this is a cascading error.

CascadingError.scala:2: error: identifier expected but ':' found.
   def force(: Unit = {
CascadingError.scala:3: error: ':' expected but ';' found.

In some cases, cascading errors make diagnostic messages so noisy that users start looking for a way to make the compiler stop after the first encountered error. Trying to figure out which errors are real and which are just cascading and will go away once the user fixes the real ones is simply not worth it in such cases.

To try and get the best of both worlds, some compilers combine the two error handling strategies we just discussed. A great example of going this way is the C# compiler. Here’s a short description Eric Lippert gave on StackOverflow :

Briefly, the way the compiler works is it tries to get the program through a series of stages...

The idea is that if an early stage gets an error, we might not be able to successfully complete a later stage without (1) going into an infinite loop, (2) crashing, or (3) reporting crazy "cascading" errors . So what happens is, you get one error, you fix it, and then suddenly the next stage of compilation can run, and it finds a bunch more errors.


The up side of this design is that (1) you get the errors that are the most "fundamental" first, without a lot of noisy, crazy cascading errors, and (2) the compiler is more robust because it doesn't have to try to do analysis on programs where the basic invariants of the language are broken. The down side is of course your scenario: that you have fifty errors, you fix them all, and suddenly fifty more appear. [emphasis added]

Let’s see how this idea translates into the C# compiler’s source code. The method CompileAndEmit of the class Microsoft.CodeAnalysis.CommonCompiler organizes the work related to compilation of C# and VB.NET. Notice the invocations of compilation.GetDiagnostics: each one takes a different value of the enum CompilationStage as its argument. Every invocation is followed by a number of checks: did the translation stage complete successfully, should the translation stop or move on to the next stage?

Compilation stages

We’re going to follow the combined error handling strategy as best we can. Of course, our code is going to be much simpler than the C# compiler’s. Let’s split compilation into the following stages: lexing, parsing, semantic analysis, assembly generation, and executable linking.

Within one compilation stage, we’re going to try and diagnose as many errors in the user’s code as possible. But the next stage will only start if we don’t detect any errors at the current one. Say the compiler found syntactic errors at the parsing stage; in that case, semantic analysis wouldn’t even start.

The same idea applies to the lexing and parsing stages. If there are portions of the program’s text we didn’t manage to turn into tokens while lexing, parsing isn’t going to start; the compiler will stop instead.

To see these ideas reflected in the project’s code, take a look at CompileToAsmStep.fs and the method Invoke of EmitExeHandler.
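
To give a feel for the shape of that gating logic, here’s a minimal sketch. All names here are hypothetical and invented for illustration — they don’t match the repository’s actual code. Each stage function returns its output plus a list of diagnostics, and the next stage runs only if that list is empty.

// A sketch of stage gating. Stage functions are passed in as parameters,
// so the sketch stays generic; all names are hypothetical.
let runStage (stage: 'a -> 'b * string list)
             (input: 'a)
             (next: 'b -> Result<'c, string list>) =
    let output, diags = stage input
    if List.isEmpty diags then next output
    else Error diags // don't start the next stage

let compile tokenize parse translate (source: string) =
    runStage tokenize source (fun tokens ->
        runStage parse tokens (fun ast ->
            runStage translate ast Ok))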

Also, we simplify things a lot here compared to the C# compiler, which performs lexing and parsing at the same time. If the C# compiler’s lexer cannot recognize a token, the parser handles this scenario and keeps going anyway. But semantic analysis will not start if lexical or syntactic errors exist in the code — translation terminates.

Lexing and parsing

First things first, we need a lexer and a parser.

The lexer’s goal is to take in source code as a sequence of characters and recognize tokens in it. A token can be a number literal, a string literal, an identifier, a language keyword, an operator, a punctuation mark, etc. In our specific case, we’ll throw away any whitespace and comments.
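
In code, tokens are naturally modelled as a discriminated union — pattern matching then makes consuming them pleasant. Here’s a minimal sketch; the real token type would be richer and would also record each token’s source location for error reporting.

// A simplified sketch of a token type as an F# discriminated union.
type Token =
    | IntLiteral of int
    | StrLiteral of string
    | Id of string            // identifiers: array, quicksort, ...
    | KwClass | KwDef | KwVar | KwWhile | KwIf | KwElse | KwMatch | KwCase
    | Plus | Minus | LessEq | Assign
    | LParen | RParen | LBrace | RBrace | Colon | Semi | EOF

// Lexing `var i: Int = 0;` would yield:
// [KwVar; Id "i"; Colon; Id "Int"; Assign; IntLiteral 0; Semi]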

Next, the parser will try to convert its input sequence of tokens into an abstract syntax tree (AST).

An AST represents the structure of a program’s text in the form of a tree. As far as compilation goes, it’s hard to do anything useful with a character sequence, but a tree is a much more appropriate data structure to work with. That’s why we’ll organize the next translation stages around the AST.

A program’s text

class Main() {
  var io: IO = new IO();

  {
    io.out_string("Hello, World!");
    io.out_nl()
  };
}

The AST, serialized into JSON

{ "classes": [
  { "name": "Main",
    "varformals": [],
    "extends": { "type": "Any",
                 "actuals": [] },
    "body": [
      { "kind": "attribute",
        "name": "io", "type": "IO",
        "value": { "kind": "new",
                   "type": "IO",
                   "actuals": [] } },
      { "kind": "block",
        "statements": [
          { "kind": "stmt",
            "expr": { "kind": "dispatch",
                      "receiver": "io",
                      "method": "out_string",
                      "actuals": [ "\"Hello, World!\"" ] } },
          { "kind": "expr",
            "expr": { "kind": "dispatch",
                      "receiver": "io",
                      "method": "out_nl",
                      "actuals": [] } }
        ] } ] } ] }

There are a number of ways to get an AST from source code: parser generators such as GNU Bison and ANTLR; parser combinator libraries such as Parsec; or writing a lexer and a parser from scratch. Which one should we go with? Well, if we have a chance to gain experience at least remotely similar to writing an enterprise-grade compiler, we should take that chance.

As far as I can tell, the teams of most contemporary enterprise-grade compilers built their lexers and recursive-descent parsers by hand. Some of them are: GCC, Clang, D, Rust, Swift, Go, Java, V8, TypeScript, C#, VB.NET.

Here’s how one of the C# compiler’s developers explained  their decision to implement their lexer and parser manually.

Hello, I work on the C# compiler and we use a handwritten recursive-descent parser. Here are a few of the more important reasons for doing so:

The story of GCC’s parser seems to be quite a telling example. GCC started out using the Bison parser generator, but in 2004 the developers decided to switch to a handwritten recursive-descent parser. Apparently, migrating turned out to be easier than trying to get around Bison’s limitations. In the changelog, you can read the following.

A hand-written recursive-descent C++ parser has replaced the YACC-derived C++ parser from previous GCC releases. The new parser contains much improved infrastructure needed for better parsing of C++ source codes, handling of extensions, and clean separation (where possible) between proper semantics analysis and parsing.

Now that we’ve considered all of the info above, the choice is clear: we’re going to write our lexer and parser by hand. In fact, it’s not difficult.

In his book Crafting Interpreters, Robert Nystrom explains writing lexers and recursive-descent parsers really, really well. Our parser is going to be recursive-descent too.
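
To show how approachable the technique is, here’s a toy recursive-descent parser for the grammar expr ::= term (('+' | '-') term)*, term ::= INT | '(' expr ')'. It reuses the Token sketch from earlier and, for brevity, evaluates as it parses instead of building AST nodes. Everything here is illustrative — it is not the repository’s Parser.fs.

// A toy recursive-descent parser: one function per grammar rule,
// branching on the current token.
let rec parseExpr tokens : int * Token list =
    let left, rest = parseTerm tokens
    let rec loop acc rest =
        match rest with
        | Plus :: tail ->
            let right, rest' = parseTerm tail
            loop (acc + right) rest'
        | Minus :: tail ->
            let right, rest' = parseTerm tail
            loop (acc - right) rest'
        | _ -> acc, rest
    loop left rest

and parseTerm tokens : int * Token list =
    match tokens with
    | IntLiteral n :: rest -> n, rest
    | LParen :: rest ->
        let value, rest' = parseExpr rest
        (match rest' with
         | RParen :: tail -> value, tail
         | _ -> failwith "expected ')'")
    | _ -> failwith "expected an integer or '('"

// parseExpr [IntLiteral 1; Plus; IntLiteral 2; Minus; IntLiteral 3; EOF]
// returns (0, [EOF]).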

The lexer’s code is in Lexer.fs; if you feel like untangling it, consider starting from the method GetNext. The parser’s code is in Parser.fs, and the method Parser.Parse is its entry point.


Semantic analysis

Once parsing completes, the next compilation stage is semantic analysis.

Eric Lippert wrote a very interesting blog post, How many passes?. It lists some of the passes the C# compiler performs and gives a bit of detail on each. Spoilers ahead! There are more than twenty passes, and many of them are related to semantic analysis.

We’re going to perform semantic analysis over three passes. The method Translator.Translate  is responsible for starting the analysis passes one after another and checking their results.  

The first semantic pass is the simplest. The only thing it’s going to do is create a map where the key is a class’ name and the value is the AST node corresponding to that class. We’ll need this map to look up class names during the second pass.

In case we meet two or more classes with the same name, we’ll report an error and stop compilation.

The first semantic pass is implemented by the function collect_class_nodes from Translator.fs.
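
At its heart, this pass is just a fold over the class nodes with a duplicate check. Here’s a sketch with a deliberately stripped-down ClassNode type; the names are hypothetical, not the actual code of collect_class_nodes.

// A sketch of the first pass: map class names to their AST nodes,
// collecting a diagnostic for every duplicate.
type ClassNode = { Name: string (* attributes, methods, ... *) }

let collectClassNodes (classes: ClassNode list) =
    let folder (map, diags) (c: ClassNode) =
        if Map.containsKey c.Name map
        then map, sprintf "Class '%s' is already defined" c.Name :: diags
        else Map.add c.Name c map, diags
    let map, diags = List.fold folder (Map.empty, []) classes
    map, List.rev diags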

In the scope of the second semantic pass, we’ll go over class definitions, their attributes, and their methods, but we will not “look” inside the methods’ bodies. Along the way, we’ll form a class metadata map, where the key is again a class’ name, but the value this time is the list of the class’ attributes and methods with all the associated types. This map will come in handy while performing the third semantic pass. The second pass also checks the class definitions themselves for consistency.

If the second pass finds any semantic problems, the compiler informs the user about every one of them and stops.

The second semantic pass is implemented by the class ClassSymbolCollector. Translator.Translate sets up the second pass and starts it with a call to ClassSymbolCollector.Collect.

We’re at a point where we have the class metadata map, so it’s time to start the third semantic pass. What do we need the class metadata map for? Say the third pass is looking at a method invocation instance.some_method(). Thanks to the metadata map, we can make sure some_method is actually defined in the class of instance.

The third semantic pass analyses methods’ bodies. It makes sure the code doesn’t try to assign a string value to an Int variable, pass an Int argument to a method that expects a String, and so on.

Another thing the third pass needs is a symbol table. A symbol table maps names from the current lexical scope to a collection of details on this name: does the name refer to a variable, a parameter, or an attribute; what’s its type, etc. If a name is not defined or not visible in the current scope, it won’t be present in the symbol table. Check out the class SymbolTable  to see the code.
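
Conceptually, a symbol table is a stack of scopes: entering a block pushes a scope, leaving it pops one, and name resolution walks from the innermost scope outwards. Here’s a small sketch; the types are simplified stand-ins, not the repository’s SymbolTable.

// A sketch of a scoped symbol table: a stack of maps, innermost first.
type Symbol = { Name: string; Type: string; Kind: string } // Kind: "var", "param", "attribute"

type ScopedSymbolTable() =
    let mutable scopes : Map<string, Symbol> list = [ Map.empty ]

    member _.EnterScope() = scopes <- Map.empty :: scopes
    member _.LeaveScope() = scopes <- List.tail scopes

    member _.Add(sym: Symbol) =
        scopes <- Map.add sym.Name sym (List.head scopes) :: List.tail scopes

    // Walks the scopes from innermost to outermost; returns None
    // if the name is not defined or not visible in the current scope.
    member _.Resolve(name: string) : Symbol option =
        scopes |> List.tryPick (Map.tryFind name)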

Apart from diagnosing semantic errors, the third pass could build a new tree of the same shape as the AST but extended with additional info: the type of each node, the list and descriptions of parameters and the return type for method nodes, etc. To keep the amount of code down and its complexity in check, we’re going to skip generating an intermediate tree. Instead, we’ll combine the third semantic pass and assembly generation.

The class ExprTranslator contains the third pass’ code. ExprTranslator.Translate is the entry point.


Assembly generation

OK, the parser built an AST for us. Along the way, the compiler made sure the code doesn’t contain syntactic or semantic errors. Now we’re going to walk the AST and emit assembly in the process. We’ll turn the assembly into an executable a bit later.

The Compilers course and many other learning materials have MIPS as their target for assembly generation. Surely, they have very good reasons for that, and the fundamentals of generating assembly don’t depend on the particular instruction set architecture anyway. But in practice, I personally find generating x86-64 assembly much, much more motivating and rewarding. You need an emulator to run MIPS assembly — that is, unless you own an SGI workstation or a Tesla Model S. In contrast, executables produced from x86-64 assembly code can run natively on Windows or Linux right on a boring desktop PC.

Producing native executables has one more practical advantage: it makes it much simpler to automate compiling and running test programs and checking the results of their work.

Here’s an example of a test program. If the results of building and executing a test program don’t match the DIAG and OUT comments respectively, we consider the test failed.

class Main() {
  var io: IO = new IO();

  { io.out_string("Hello, Cool2020!") };
}
// DIAG: Build succeeded: Errors: 0. Warnings: 0
// OUT: Hello, Cool2020!


Emitting assembly corresponding to control flow statements is rather straightforward. For example, imagine we had to organize a loop using only if  and goto  statements. To a first approximation, we organize loops in assembly in the same way.
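
For instance, a while loop becomes a label, a condition check with a conditional jump past the loop, and an unconditional jump back. Here’s a sketch in AT&T syntax, assuming (just for illustration) that the counter i lives in %rbx and the bound n in %rcx:

loop_start:                  # while (i < n) {
    cmpq   %rcx, %rbx        #   compare i (%rbx) with n (%rcx)
    jge    loop_end          #   if i >= n, exit the loop
    # ... the loop's body ...
    jmp    loop_start        #   go re-check the condition
loop_end:                    # }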

Translating expressions is more interesting. A program in assembly language is a sequence of instructions, each with zero to three operands. There is no assembly instruction that can evaluate even the simplest expression like a + b + c + d. And expressions can be way more complex: they can contain parentheses, different math operations, method invocations...

So, what can we do? The idea is to “pretend” that instead of one complex expression we have multiple simple expressions. An expression is simple if we can trivially translate it into assembly. Let’s take a look at a bit of pseudocode.

Complex expression

a + b + c + d

Multiple simple expressions

var tmp0: Int;
var tmp1: Int;
var result: Int;

tmp0 = a;
tmp1 = b;
tmp0 = tmp0 + tmp1;
// tmp0 = a + b

tmp1 = c;
tmp0 = tmp0 + tmp1;
// tmp0 = a + b + c
tmp1 = d;
tmp0 = tmp0 + tmp1;
// tmp0 = a + b + c + d
result = tmp0      
// result = a + b + c + d

To better understand how this idea applies to practice, let’s follow a detailed example.

We’re translating a + b + c + d.

+
├── +
│   ├── +
│   │   ├── a
│   │   └── b
│   └── c
└── d

Let’s perform a post-order (LRN) traversal of this tree. When the post-order function finishes visiting both subtrees, it will print an assembly instruction corresponding to the current tree node. The post-order function will return the name of the register holding the result of the subtree’s expression evaluation.

If we trace the tree’s post-order traversal, we’ll see the following (using AT&T syntax ).

# Entering the root node `+`
# The node has two subtrees: `[+ (+ a b) c]` and `d`
# Recursively invoking the post-order function for the left subtree
#   Entering the `+` node of the subtree `[+ (+ a b) c]`
#   The node has two subtrees: `(+ a b)` and `c`
#   Recursively invoking the post-order function for the left subtree
#     Entering the `+` node of the subtree `(+ a b)`
#     The node has two leaf nodes as its subtrees: `a` and `b`
#     Recursively invoking the post-order function for the left subtree
#       Entering the `a` node
#       Picking an unused register %rbx to load the value `a` into it
#       Printing the assembly instruction
mov    a,   %rbx
#       Returning the name of the register %rbx
#       %rbx contains the value at the address `a`
#       Leaving the `a` node
#     Recursively invoking the post-order function for the right subtree
#       Entering the `b` node
#       Picking an unused register %r10 to load the value `b` into it
#       Printing the assembly instruction
mov    b,   %r10
#       Returning the name of the register %r10
#       %r10 contains the value at the address `b`
#       Leaving the `b` node
#     Printing the assembly instruction for (+ a b)
#     %rbx = a
#     %r10 = b
add     %r10, %rbx # %rbx = %rbx + %r10
#     Marking %r10 as unused
#     Returning the name of the register %rbx
#     %rbx contains the value of `a + b`
#     Leaving the `+` node of the subtree `(+ a b)`
#   Recursively invoking the post-order function for the right subtree

#     Entering the `c` node of the subtree `[+ (+ a b) c]`
#     Picking an unused register %r10 to load the value `c` into it
#     Printing the assembly instruction
mov    c,    %r10
#     Returning the name of the register %r10
#     %r10 contains the value at the address `c`
#     Leaving the `c` node
#   Printing the assembly instruction for `[+ (+ a b) c]`
#   %rbx = a + b
#   %r10 = c
add     %r10, %rbx # %rbx = %rbx + %r10
#   Marking %r10 as unused
#   Returning the name of the register %rbx
#   %rbx contains the value of `(a + b) + c`
#   Leaving the `+` node of the subtree `[+ (+ a b) c]`
# Recursively invoking the post-order function for the right subtree

#   Entering the `d` node of the tree `(+ [+ (+ a b) c] d)`
#   Picking an unused register %r10 to load the value `d` into it
#   Printing the assembly instruction
mov    d,    %r10
#   Returning the name of the register %r10
#   %r10 contains the value at the address `d`
#   Leaving the `d` node
# Printing the assembly instruction for `(+ [+ (+ a b) c] d)`
# %rbx = (a + b) + c
# %r10 = d
add     %r10, %rbx
# Marking %r10 as unused
# Returning the name of the register %rbx
# %rbx contains the value of `((a + b) + c) + d`
# Leaving the root `+` node

Yippee, we translated an expression into assembly code! The %rbx  register will contain the expression’s value at runtime. Let’s feast our eyes on the result one last time.


Expression

a + b + c + d

Assembly (using AT&T syntax)

mov     a,    %rbx
mov     b,    %r10
add     %r10, %rbx
mov     c,    %r10
add     %r10, %rbx
mov     d,    %r10
add     %r10, %rbx

The method Translator.Translate kicks off assembly generation by invoking ProgramTranslator.Translate. There are multiple classes responsible for emitting assembly: ProgramTranslator, ClassTranslator, ExprTranslator, and AsmBuilder.
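
Condensed into F#, the post-order traversal we just traced looks something like the sketch below. The Expr type and the register pool are simplified, hypothetical stand-ins, not the repository’s actual types.

// A sketch of the post-order emitter. freeRegs is a pool of unused
// registers; each call returns the register holding the subtree's
// value plus the updated pool.
type Expr =
    | Var of string
    | Add of Expr * Expr

let rec emit (asm: System.Text.StringBuilder) freeRegs expr : string * string list =
    match expr with
    | Var name ->
        // A leaf: pick an unused register and load the variable into it.
        let reg, rest = List.head freeRegs, List.tail freeRegs
        asm.AppendLine(sprintf "mov     %s,    %%%s" name reg) |> ignore
        reg, rest
    | Add (left, right) ->
        let leftReg, pool = emit asm freeRegs left
        let rightReg, pool' = emit asm pool right
        asm.AppendLine(sprintf "add     %%%s, %%%s" rightReg leftReg) |> ignore
        // The right operand's register becomes unused again.
        leftReg, rightReg :: pool'

// emit sb ["rbx"; "r10"; "r11"]
//      (Add (Add (Add (Var "a", Var "b"), Var "c"), Var "d"))
// prints the seven instructions above and returns ("rbx", ...).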

Runtime library

A number of built-in classes are available in Cool 2020. Additionally, although they are not directly accessible from user code, there are procedures for allocating memory, copying objects, and several procedures for aborting a program in various invalid scenarios. Another important element of our runtime is the entry point that transfers control to user code; we’ll discuss it in more detail later.

All the code described above lives in the runtime library (or simply runtime).

Stanford’s Compilers course gives students a pre-made runtime library. It targets the MIPS architecture, assumes a Unix-like operating system and implements the classes built into the Cool language.

At the same time, we decided to emit x86-64 assembly code and target Windows and Linux, and our language is a Scala subset — its built-in classes differ from the built-in classes of Stanford’s Cool.

By now, the attentive reader will have noticed that the runtime library from the Compilers course is not going to work for us as is. Instead, we can study its code and apply the gained insights to roll our own runtime library.

Compiler Explorer (aka godbolt.org) is a great help in writing relatively complex assembly code. For example, I used it to implement the string-to-number routine: I implemented the algorithm in C and used Compiler Explorer to translate it into assembly code. It took a lot of cleaning up to arrive at the final version of the code, but writing it from scratch would’ve been much, much harder for me.

What can our runtime do?

The runtime is made up of three parts. One is rt_common.s. Just as the name suggests, it’s common code that doesn’t depend on the platform. rt_common.s contains the process entry point — the routine main. It’s responsible for invoking the platform-dependent initialization procedure, creating an instance of the class Main, and invoking its constructor. Exiting the process is main’s responsibility too.

Here’s a look at the entry point’s code.

Assembly (using AT&T syntax)

  .global main
main:
  call   .Platform.init

  # A class 'Main' must be present
  # in every Cool2020 program.
  # Create a new instance of 'Main'.
  movq   $Main_proto_obj, %rdi
  call   .Runtime.copy_object

  # 'Main..ctor' is a Cool2020
  # program's entry point.
  # Pass a reference to the newly
  # created 'Main' instance in %rdi.
  # Invoke the constructor.
  movq   %rax, %rdi
  call   Main..ctor

  # Exit the process with code 0.
  xorq   %rdi, %rdi
  jmp    .Platform.exit_process

The same logic in pseudocode

def main() = {
  Platform.init();
  new Main();
  Platform.exit_process(0)
}

Also, rt_common.s contains the code of constructors and methods of the built-in classes.

The platform-specific rt_windows.s  and rt_linux.s  each contain five procedures.

What else is necessary to make the runtime cross-platform? The end-of-line constant, of course. Both rt_windows.s and rt_linux.s define a label .Platform.ascii_new_line pointing at the platform’s end-of-line character sequence.

rt_windows.s and rt_linux.s are fully self-contained. That is, they don’t depend on the standard C library or any other library. Instead, they work directly with the Win API on Windows and with system calls on Linux.

So how do we make a Win API call from assembly code? First, we follow The Microsoft x64 Calling Convention . Second, it is very important to understand and keep in mind the concept of shadow space .
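
Here’s roughly what such a call looks like once both points are taken care of. This is a sketch: GetStdHandle takes STD_OUTPUT_HANDLE (the constant -11) in %rcx and returns the handle in %rax.

# Calling a Win API function (sketch, AT&T syntax).
subq   $32, %rsp          # reserve the 32-byte shadow space
                          # (a multiple of 16, so stack alignment is preserved)
movq   $-11, %rcx         # 1st argument in %rcx: STD_OUTPUT_HANDLE
call   GetStdHandle       # the returned handle is in %rax
addq   $32, %rsp          # release the shadow space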

Things are a bit simpler on Linux: we don’t have to worry about the shadow space when performing a system call. We still need to stick to the platform’s calling conventions — for Linux it’s System V ABI . One place where you can find the list of system calls, their parameters and short descriptions is Linux System Call Table  from Chromium OS Docs.
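
For example, writing a string to stdout boils down to a handful of instructions. In this sketch, msg and len are hypothetical labels for the buffer and its length; write is syscall number 1 on x86-64 Linux.

# write(fd = 1, buf = msg, count = len) as a raw system call (sketch).
movq   $1, %rax           # syscall number: 1 = write
movq   $1, %rdi           # 1st argument: fd = stdout
leaq   msg(%rip), %rsi    # 2nd argument: pointer to the buffer
movq   $len, %rdx         # 3rd argument: number of bytes to write
syscall                   # the result (or a negative errno) is in %rax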

What can’t our runtime do?

It cannot make coffee. And it cannot collect garbage. At the moment, Cool 2020 programs allocate memory but never free it up in the course of their execution. Eventually, we’ll add a generational garbage collector inspired by the one from Stanford’s Compilers course to the runtime. And for now, we’ll just pretend we already have a garbage collector — fake it till you make it!

There are officially no plans to add coffee-making routines.

Executable linking

Whew! We took a Cool 2020 program as the input and emitted assembly code as the output. The runtime’s code that the emitted assembly depends on is also in place. Not that much left to do to finally turn all of these into an executable.

For starters, we need an assembler that translates assembly code into machine code packaged into an object file. In other words, the assembler assembles assembly — programming is not without poetry, don’t you think?

Although object files contain machine code — i.e., code that can be directly executed on the target CPU — we cannot run them just yet. We’ll need to combine the object files generated from the user code and from the runtime code into an executable using a linker.

As we’re aiming to make the compiler cross-platform, it makes sense to go for the assembler and linker from GNU Binutils — as and ld.

On Linux, it’s a package basically every distro has. To give a specific example, on Ubuntu the package name is binutils.

Installing GNU Binutils on Windows is really simple too: MinGW includes the utilities in its distribution. If you don’t know what MinGW is, don’t bother right now. The important thing here is that after installing it, we’ll have as and ld. Item 3 on this page explains where to get a MinGW graphical installer and how to use it. (Advanced users can opt for an alternative way of obtaining MinGW: install MSYS2. You now have bash and pacman. Say pacman -S mingw-w64-x86_64-toolchain to get the latest version of MinGW.)

On Windows, remember to add the path to as and ld to the PATH  environment variable. Otherwise, our compiler won’t find them and will fail to generate executables.

EmitExeHandler.Invoke is the method that runs as and ld and checks the results of their work.
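
Under the hood, the invocations look roughly like the commands below. The file names are made up for illustration, and the exact flags the compiler passes may differ; -e main tells ld to use main — defined in rt_common.s — as the entry symbol, since we don’t link against the C runtime.

# Assemble the emitted code and the runtime into object files.
as program.s   -o program.o
as rt_common.s -o rt_common.o
as rt_linux.s  -o rt_linux.o

# Link the object files into an executable.
ld -e main program.o rt_common.o rt_linux.o -o program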

And so, we have reached the point where our compiler can create executables from Cool 2020 programs! Now we have everything we need to record that GIF from "How are we going to compile?" .

Intermediate representations and optimizations

Our compiler does not generate an intermediate representation (IR) of the program and does not perform any optimizations. Intermediate representations and code optimizations are extremely important, probably the most important, topics in compiler development, but working on them also takes a lot of time and effort. For this reason, I opted for taking them out of this project’s scope, at least for now. Hopefully, it doesn’t take away from studying the fundamentals of compilers. Better still, understanding IR and optimizations is much easier when you have a solid understanding of the basics.

Anyway, intermediate representations and optimizations are exceptionally important and very interesting topics in compiler development. That’s why the teams behind enterprise-grade compilers, as well as computer scientists, work really hard on them. How fast compiled programs run and how much memory they consume directly depends on the optimizations compilers perform. There are a couple of links explaining these things in more detail in the “Advanced topics” section.

If you want to learn about compilers

I cannot recommend the book Crafting Interpreters by Robert Nystrom highly enough. Don’t let the word Interpreters in the book’s title discourage you: compilers and interpreters have a lot in common, and until you get to the topics specific to interpreters, you’ll gain a ton of useful knowledge. The author has a real talent for explaining things in a very clear and precise manner.

Alex Aiken, Compilers. This is basically a classic on compilers, created by Stanford professor Alex Aiken. As you would expect, compared to Crafting Interpreters, the course’s lectures focus much more on the theoretical aspects. To give an example, where Crafting Interpreters only mentions that LR(1) parsing exists, the Compilers course goes into detail about it.

For some reason, the Compilers course, just as many other compiler study materials, discusses emitting assembly code targeting the MIPS architecture. But x86-64 assembly is a better fit for our specific project. For this reason, the freely available book Introduction to Compilers and Language Design by Douglas Thain is very useful to us: among other things, it explains how to work with x86-64 assembly and how to translate different constructs of high-level languages into it.

Who’s gonna watch a Fortnite stream on Twitch when Immo Landwerth is writing a compiler live on YouTube?! Of course, recordings of the live streams are available as well, and you can explore the code in the project’s GitHub repo.

Advanced topics

A talk on intermediate representations (IR). No previous knowledge is required. What’s so cool about this talk is that it’s built around one of the most widely used IRs — LLVM IR. 2019 EuroLLVM Developers’ Meeting: V. Bridgers & F. Piovezan, “LLVM IR Tutorial - Phis, GEPs ...”

A free self-guided online course from Cornell University. It goes in depth on IR and optimizations but is very accessible at the same time. Again, I cannot recommend it highly enough. CS 6120: Advanced Compilers: The Self-Guided Online Course.