Crafting Interpreters: A Tree-walk Interpreter in Rust - Part 2

21 minute read

Published:

Introduction

This is the second part. The first part is available here.

This blog is not a rewritten version of the original book. I only want to record the important things I need to know. This blog is more like my log book of what I should know and what I have done. You can use it as your guide too, but I apologize for my bad grammar and vocab choosing in advance. The full code can be found here.

Review the grammar rules from the previous chapter

expression → equality ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" ;
  • (...)* : * means having at least one (one or more of elements in (...))
  • (...)? : ? means optional (zero or more of elements in (...))

Chapter 8: Statements and State

Problems This Chapter Addresses

  1. The “Calculator Problem”

Problem: The previous interpreter could only evaluate expressions like a calculator - it couldn’t build up programs from smaller pieces or remember anything between operations.

Why this matters: Real programming languages need to let you:

  • Break complex problems into smaller parts
  • Reuse values and computations
  • Build systems that maintain state over time
  1. No Way to See Results

Problem: We could evaluate expressions, but had no way to display results to users. // Impact: Without output, you can’t interact with programs or see what they’re doing.

  1. No Memory Between Operations

Problem: Each expression was evaluated in isolation - no way to store a result and use it later.

This is a grammar rules for this chapter

program → declaration* EOF ;

declaration → varDecl | statement ;

varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;

statement → exprStmt | printStmt | block ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
block → "{" declaration* "}" ;

expression → assignment ;
assignment → IDENTIFIER "=" assignment | equality ;
// Almost the same as before
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
// add IDENTIFIER to primary
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" | IDENTIFIER ;

Here is only an updated part from the previous chapter.

program → declaration* EOF ;
declaration → varDecl | statement ;
statement → exprStmt | printStmt | block ;

varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
block → "{" declaration* "}" ;

// Expression grammar updated:
expression → assignment ;
assignment → IDENTIFIER "=" assignment | equality ;
primary → "true" | "false" | "nil" | NUMBER | STRING 
        | "(" expression ")" | IDENTIFIER ;

However, in my implementation, I add varDecl and block to be the statement to simplify the implementation.

  • Stmt::Var creates a variable.
  • Expr::Variable uses a variable.
var x = 10;    // This uses Stmt::Var (creates the variable)
print x;       // This uses Expr::Variable (accesses the variable)

Statement VS Expression

Statement vs Expression Distinction

  • Expressions: Produce values (2 + 3, a, a = 5)
  • Statements: Perform actions (print x;, var a = 5;, { ... })

AST of var x = 10; that declares a variable.

Stmt::Var {
    name: Token("x"),
    initializer: Some(Box::new(Expr::Literal { value: 10 }))
}

AST of print x; that uses a variable.

Stmt::Print {
    expression: Box::new(Expr::Variable { name: Token("x") })
}

The difference between expression and expression statement is that the expression statement is an expression with ; in the end. The expression returns a value, while the expression statement does not.

Implementation Mismatch

There is a little mismatch between the rules and the implementation of block.

declaration → varDecl | statement ;
statement → exprStmt | printStmt | block ;

So, the correct implementation in parser.rs of declaration() and statement() is

fn declaration(&mut self) -> Result<Stmt> {
    if self.match_tokens(&[TokenType::Var]) {
        self.var_declaration()
    } else {
        // Remove the block handling from here
        self.statement()
    }
}

fn statement(&mut self) -> Result<Stmt> {
    if self.match_tokens(&[TokenType::Print]) {
        self.print_statement()
    } else if self.match_tokens(&[TokenType::LeftBrace]) {
        // Block handling belongs here
        Ok(Stmt::block(self.block()?))
    } else {
        self.expression_statement()
    }
}

However, the current implementation still produces a correct result.

Introducing Environment

An environment is introduced now because I have rules for declaring variables. We need to keep record of them. For the block statement, if a variable cannot be found in the current environment, it will be searched by traversing up through its parent environment until it is found. if not, report an error.

pub fn get(&self, name: &str) -> Result<Value> {
        // TODO: Get variable value, return error if undefined
        // Hint: Use anyhow! macro for error
        
        if let Some(value) = self.values.get(name) { // don't confuse HashMap's get and Environment's get
            Ok(value.clone())
        }
        else if let Some(enclosing) = &self.enclosing { // If cannot find in the current env, find it in the parent env
            enclosing.get(name)
        }
        else {
            Err(anyhow!("Undefined variable '{}'.", name))
        }
    }

What does fn synchronize do

fn synchronize is used to recover the parser after finding syntax errors and continue parsing the rest of the program.

Example: test.lox

var a = 5;
var bad syntax here !@#;    // Syntax error
var b = 10;                 // Should still be parsed
print b;                    // Should still be parsed

Chapter 9: Control Flow

This chapter adds the control flow, including if-statement and loop-statement.

A full grammar of this chapter.

program → declaration* EOF ;

declaration → varDecl | statement ;

varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;

statement → exprStmt | forStmt | ifStmt | printStmt | whileStmt | block ;
exprStmt → expression ";" ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" ) expression? ";" expression? ")" statement ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
printStmt → "print" expression ";" ;
whileStmt → "while" "(" expression ")" statement ;
block → "{" declaration* "}" ;

expression → assignment ;
assignment → IDENTIFIER "=" assignment | logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" | IDENTIFIER ;

Here is only an updated part from the previous chapter.

statement → exprStmt | ifStmt | printStmt | whileStmt | block ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
whileStmt → "while" "(" expression ")" statement ;

// Logical operators added to expression hierarchy:
assignment → IDENTIFIER "=" assignment | logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;

// For loops added:
statement → exprStmt | forStmt | ifStmt | printStmt | whileStmt | block ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" ) 
                    expression? ";" 
                    expression? ")" statement ;

Why adding Logical while it looks the same as Binary?

The reason I add

Logical { 
        left: Box<Expr>,
        operator: Token,
        right: Box<Expr>,
    },

because I want to shortcut some operations, as opposed to Binary that does not have this feature. If the evaluation result of the left expression of or is true, I can return true immediately without evaluating the right expression. In the same way, if the left expression’s evaluation result is false, return false immediately. Reminder that evaluation can be done from .accept(...).

fn visit_logical_expr(&mut self, _expr: &Expr, left: &Expr, operator: &Token, right: &Expr) -> Result<Value> {
    // TODO: Implement short-circuiting logic
    // For "or": if left is truthy, return left, otherwise return right
    // For "and": if left is falsy, return left, otherwise return right
    
    let left_value = left.accept(self)?;
    
    match operator.token_type {
        TokenType::Or => {
            if left_value.is_truthy() {
                // TODO: Return left_value (short-circuit)
                Ok(left_value)
            } else {
                // TODO: Evaluate and return right
                right.accept(self)
            }
        }
        TokenType::And => {
            if !left_value.is_truthy() {
                // TODO: Return left_value (short-circuit)  
                Ok(left_value)
            } else {
                // TODO: Evaluate and return right
                right.accept(self)
            }
        }
        _ => Err(anyhow!("Unknown logical operator: {:?}", operator.token_type)),
    }
}

For-Loop Sugar

A for-loop is, in fact, translated into while-loop and executed by the same way as while-loop.

  • Parser: Recognizes syntax, builds AST, performs transformations (desugaring)
  • Interpreter: Executes AST nodes, doesn’t need to know about original syntax So, for-loop to while-loop is done in parser.rs.

Chapter 10: Functions

This chapter is difficult. Adding a function feature introduces the closure feature and the difficulty to manage the environment.

This chapter implements user-defined functions in the Lox language, completing the transition from a calculator to a full programming language. It adds:

  • Function declarations (fun name(params) { body })
  • Function calls (functionName(arguments))
  • Return statements (return value;)
  • Local scopes and closures
  • Native functions (like clock())

Here is the full grammar rules.

program → declaration* EOF ;

declaration → funDecl | varDecl | statement ;

funDecl → "fun" function ;
function → IDENTIFIER "(" parameters? ")" block ;
parameters → IDENTIFIER ( "," IDENTIFIER )* ;

varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;

statement → exprStmt | forStmt | ifStmt | printStmt | returnStmt | whileStmt | block ;
exprStmt → expression ";" ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" ) expression? ";" expression? ")" statement ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
printStmt → "print" expression ";" ;
returnStmt → "return" expression? ";" ;
whileStmt → "while" "(" expression ")" statement ;
block → "{" declaration* "}" ;

expression → assignment ;
assignment → IDENTIFIER "=" assignment | logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | call ;
call → primary ( "(" arguments? ")" )* ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" | IDENTIFIER ;
arguments → expression ( "," expression )* ;

Here is the updated rules from the previous chapter.

declaration → funDecl | varDecl | statement ;
funDecl → "fun" function ;
function → IDENTIFIER "(" parameters? ")" block ;
parameters → IDENTIFIER ( "," IDENTIFIER )* ;

// Function calls added to expression hierarchy:
unary → ( "!" | "-" ) unary | call ;
call → primary ( "(" arguments? ")" )* ;
arguments → expression ( "," expression )* ;

// Return statements:
statement → exprStmt | forStmt | ifStmt | printStmt 
          | returnStmt | whileStmt | block ;
returnStmt → "return" expression? ";" ;

Native Function

Native function is a function that is implemented in the host language (which is Rust in my case) instead of Lox language. This interpreter has one native function, clock().

Why return uses Err to implement?

The return feature of Lox uses Err to immediately return from a function.

  1. When return executes
fn visit_return_stmt(&mut self, ...) -> Result<()> {
    let val = if let Some(v) = value {
        v.accept(self)?
    } else {
        Value::Nil
    };
    
    // "Throw" the return value as an error
    Err(ReturnValue { value: val }.into())
}
  1. The error propagates up automatically
// In visit_while_stmt
while condition.accept(self)?.is_truthy() {
    body.accept(self)?;  // ← If this is a return, the ? operator 
                         //   immediately exits this function!
}
  1. The function caller catches it:
pub fn call_lox_function(&mut self, function: &LoxFunction, arguments: Vec<Value>) -> Result<Value> {
    // ... setup code ...
    
    let result: anyhow::Result<Value> = (|| {
        for statement in &function.declaration().body {
            statement.accept(self)?;  // ← Return "error" propagates out of this loop
        }
        Ok(Value::Nil)  // If no return, function returns nil
    })();

    // ... cleanup code ...

    match result {
        Err(err) => {
            // Check if it's actually a return value
            if let Some(return_val) = err.downcast_ref::<ReturnValue>() {
                Ok(return_val.value.clone())  // Convert "error" back to normal value
            } else {
                Err(err)  // Real error, propagate it
            }
        }
        Ok(_) => Ok(Value::Nil),
    }
}

Using EnvironmentArena (Environment Pool) instead of Environment

This interpreter changes to use EnvironmentArena instead of Environment.

#[derive(Debug)]
pub struct EnvironmentArena {
    environments: Vec<Environment>, // The "parking lot" for environments
}

#[derive(Debug, Clone)]
pub struct Environment {
    enclosing: Option<EnvId>, // Just an ID, not a direct pointer!
    values: HashMap<String, Value>,
}

pub type EnvId = usize; // Environment ID - just a number!

All environment access goes through the arena.

// Get a variable's value, walking up the chain if needed
pub fn get(&self, env_id: EnvId, name: &str) -> Result<Value> {
    let mut current = env_id;
    loop {
        // Check current environment
        if let Some(value) = self.environments[current].values.get(name) {
            return Ok(value.clone());
        }
        
        // Move to parent environment
        if let Some(parent) = self.environments[current].enclosing {
            current = parent;
        } else {
            return Err(anyhow!("Undefined variable '{}'.", name));
        }
    }
}

Example

var global = "I'm global";

fun outer() {
    var outerVar = "I'm in outer";
    
    fun inner() {
        var innerVar = "I'm in inner";
        print global;    // Needs to walk up the chain
        print outerVar;  // Needs to walk up the chain
        print innerVar;  // Found locally
    }
    
    inner();
}

outer();

The arena contents will be

// Arena contents:
environments[0] = Environment {  // Global scope
    enclosing: None,
    values: {
        "global" -> "I'm global",
        "outer" -> Function(...)
    }
}

environments[1] = Environment {  // outer() function scope
    enclosing: Some(0),  // Points to global
    values: {
        "outerVar" -> "I'm in outer",
        "inner" -> Function(...)
    }
}

environments[2] = Environment {  // inner() function scope
    enclosing: Some(1),  // Points to outer scope
    values: {
        "innerVar" -> "I'm in inner"
    }
}

When inner() tries to access global

// Looking up "global" from environment 2
let mut current = 2; // Start in inner() scope

// Check environments[2] - not found
// Move to parent: current = environments[2].enclosing = Some(1)
current = 1;

// Check environments[1] - not found  
// Move to parent: current = environments[1].enclosing = Some(0)
current = 0;

// Check environments[0] - FOUND! Return "I'm global"

Problem when implementing an interpreter to support a recursion function.

It is tricky to write a call_lox_function to enable recursion function.

pub fn call_lox_function(&mut self, function: &LoxFunction, arguments: Vec<Value>) -> Result<Value> {
        ...

        let current_env = self.environment; // Remember current environment
        // Create new environment with function's closure as parent
        let call_env = self.arena.create_env_with_enclosing(function.closure());
        // Add function to its own environment for recursion
        self.arena.define(call_env, function.name().to_string(), Value::Function(function.clone()));
        // Bind parameters to arguments
        for (param, arg) in function.declaration().params.iter().zip(arguments.iter()) {
            self.arena.define(call_env, param.lexeme.clone(), arg.clone());
        }
        
        self.environment = call_env; // Switch to function's environment

        let result: anyhow::Result<Value> = (|| {
            for statement in &function.declaration().body {
                statement.accept(self)?;
            }
            Ok(Value::Nil)
        })();

        self.environment = current_env; // Restore previous environment

        match result {
            Err(err) => {
                if let Some(return_val) = err.downcast_ref::<ReturnValue>() {
                    Ok(return_val.value.clone())
                } else {
                    Err(err)
                }
            }
            Ok(_) => Ok(Value::Nil),
        }
    }

Take a closer look at the beginning of this function.

let current_env = self.environment; // Remember current environment
// Create new environment with function's closure as parent
let call_env = self.arena.create_env_with_enclosing(function.closure());
// Add function to its own environment for recursion --IMPORTANT--
self.arena.define(call_env, function.name().to_string(), Value::Function(function.clone()));

I have to add the current function name to the current environment as well to allow recursion. Otherwise, when calling a recursion function, the function will not be able to find its own function.

Why using closure in call_lox_function?

One more important point is that the result is a result of closure function.

let result: anyhow::Result<Value> = (|| {
            for statement in &function.declaration().body {
                statement.accept(self)?;
            }
            Ok(Value::Nil)
        })();

The reason to use the closure here is that if there is anything wrong when evaluating the statement (again, by using .accept(...)), the following part of call_lox_function will still be executed. This is important because no matter what is the result of the statementS evaluation, the previous environment must always be restored. In other words, I want

// ...
self.environment = current_env; // Restore previous environment

match result {
    Err(err) => {
        if let Some(return_val) = err.downcast_ref::<ReturnValue>() {
            Ok(return_val.value.clone())
        } else {
            Err(err)
        }
    }
    Ok(_) => Ok(Value::Nil),
}

to be restored no matter what the result of let result: anyhow::Result<Value> = (|| {... is.

Current Bug

Finally, this implementation still has a bug, which will be resolved in the next chapter. When I run

var a = "global";  // Global environment: {a: "global"}
{
  fun showA() {     // What gets captured here?
    print a;
  }
  
  showA();          // First call
  var a = "block";  // What happens here?
  showA();          // Second call
}

It gives

global
block

instead of

global
global

When the block starts to be executed, execute_block is called.

...
let block_env = self.arena.create_env_with_enclosing(global_env);
// block_env contents: {} (empty)
// block_env.enclosing: global_env
...

When showA() is called, the visit_function_stmt(...) does

fn visit_function_stmt(&mut self, _stmt: &Stmt, name: &Token, params: &[Token], body: &[Stmt]) -> Result<()> {
        // TODO: Create function object and store in environment
        // 1. Create FunctionDeclaration
        // 2. Capture current environment as closure
        // 3. Create LoxFunction
        // 4. Store in environment with function name
        let declaration = FunctionDeclaration {
            name: name.clone(),
            params: params.to_vec(),
            body: body.to_vec(),
        };
        
        // Create function with current environment as closure (just store the ID!)
        let function = LoxFunction::new(declaration, self.environment);
        
        // Define function in current environment
        self.arena.define(self.environment, name.lexeme.clone(), Value::Function(function));
        Ok(())
    }

When the first showA() is called,

  1. Looks up “a” in block_env: {} (not found)
  2. Looks up “a” in block_env.enclosing (global): {a: "global"}
  3. Prints “global”

When var a = "block" executes,

impl StmtVisitor<Result<()>> for Interpreter {
    fn visit_var_stmt(&mut self, _stmt: &Stmt, name: &Token, initializer: &Option<Box<Expr>>) -> Result<()> {
        let value = if let Some(init) = initializer {
            init.accept(self)? // Evaluates "block" -> Value::String("block")
        } else {
            Value::Nil
        };

        // HERE IS WHERE IT HAPPENS! 👇
        self.arena.define(self.environment, name.lexeme.clone(), value);
        //                ^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^   ^^^^^
        //                current env ID   "a"                 Value::String("block")
        // This MODIFIES block_env: {} becomes {a: "block"}
        Ok(())
    }
}
// This MODIFIES block_env: {} becomes {a: "block"}
impl EnvironmentArena {
    pub fn define(&mut self, env_id: EnvId, name: String, value: Value) {
        // THIS IS THE ACTUAL MUTATION! 👇
        self.environments[env_id].values.insert(name, value);
        //                        ^^^^^^ This HashMap gets modified!
    }
}

So, when the second showA() is called, it uses a = "block" instead.

  1. Looks up “a” in block_env: {a: "block"} (found!)
  2. Prints “block” (never looks at parent)

which is not what I want from the static scoping. The issue is that var a = "block" modifies the same environment that the function captured. Even though the function was declared before that variable, the environment it captured gets modified after the fact.

pub struct Environment {
    enclosing: Option<EnvId>,
    values: HashMap<String, Value>, // This HashMap gets modified!
}

Chapter 11: Resolving and Binding

– Not yet finished –