Crafting Interpreters: A Tree-walk Interpreter in Rust - Part 2
Published:
Introduction
This is the second part. The first part is available here.
This blog is not a rewritten version of the original book. I only want to record the important things I need to know. This blog is more like my log book of what I should know and what I have done. You can use it as your guide too, but I apologize for my bad grammar and vocab choosing in advance. The full code can be found here.
Review the grammar rules from the previous chapter
expression → equality ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" ;
(...)*
:*
means having at least one (one or more of elements in(...)
)(...)?
:?
means optional (zero or more of elements in(...)
)
Chapter 8: Statements and State
Problems This Chapter Addresses
- The “Calculator Problem”
Problem: The previous interpreter could only evaluate expressions like a calculator - it couldn’t build up programs from smaller pieces or remember anything between operations.
Why this matters: Real programming languages need to let you:
- Break complex problems into smaller parts
- Reuse values and computations
- Build systems that maintain state over time
- No Way to See Results
Problem: We could evaluate expressions, but had no way to display results to users. // Impact: Without output, you can’t interact with programs or see what they’re doing.
- No Memory Between Operations
Problem: Each expression was evaluated in isolation - no way to store a result and use it later.
This is a grammar rules for this chapter
program → declaration* EOF ;
declaration → varDecl | statement ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
statement → exprStmt | printStmt | block ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
block → "{" declaration* "}" ;
expression → assignment ;
assignment → IDENTIFIER "=" assignment | equality ;
// Almost the same as before
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
// add IDENTIFIER to primary
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" | IDENTIFIER ;
Here is only an updated part from the previous chapter.
program → declaration* EOF ;
declaration → varDecl | statement ;
statement → exprStmt | printStmt | block ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
block → "{" declaration* "}" ;
// Expression grammar updated:
expression → assignment ;
assignment → IDENTIFIER "=" assignment | equality ;
primary → "true" | "false" | "nil" | NUMBER | STRING
| "(" expression ")" | IDENTIFIER ;
However, in my implementation, I add varDecl
and block
to be the statement to simplify the implementation.
Stmt::Var
creates a variable.Expr::Variable
uses a variable.
var x = 10; // This uses Stmt::Var (creates the variable)
print x; // This uses Expr::Variable (accesses the variable)
Statement VS Expression
Statement vs Expression Distinction
- Expressions: Produce values (
2 + 3
,a
,a = 5
) - Statements: Perform actions (
print x;
,var a = 5;
,{ ... }
)
AST of var x = 10;
that declares a variable.
Stmt::Var {
name: Token("x"),
initializer: Some(Box::new(Expr::Literal { value: 10 }))
}
AST of print x;
that uses a variable.
Stmt::Print {
expression: Box::new(Expr::Variable { name: Token("x") })
}
The difference between expression and expression statement is that the expression statement is an expression with ;
in the end. The expression returns a value, while the expression statement does not.
Implementation Mismatch
There is a little mismatch between the rules and the implementation of block
.
declaration → varDecl | statement ;
statement → exprStmt | printStmt | block ;
So, the correct implementation in parser.rs
of declaration()
and statement()
is
fn declaration(&mut self) -> Result<Stmt> {
if self.match_tokens(&[TokenType::Var]) {
self.var_declaration()
} else {
// Remove the block handling from here
self.statement()
}
}
fn statement(&mut self) -> Result<Stmt> {
if self.match_tokens(&[TokenType::Print]) {
self.print_statement()
} else if self.match_tokens(&[TokenType::LeftBrace]) {
// Block handling belongs here
Ok(Stmt::block(self.block()?))
} else {
self.expression_statement()
}
}
However, the current implementation still produces a correct result.
Introducing Environment
An environment is introduced now because I have rules for declaring variables. We need to keep record of them. For the block statement, if a variable cannot be found in the current environment, it will be searched by traversing up through its parent environment until it is found. if not, report an error.
pub fn get(&self, name: &str) -> Result<Value> {
// TODO: Get variable value, return error if undefined
// Hint: Use anyhow! macro for error
if let Some(value) = self.values.get(name) { // don't confuse HashMap's get and Environment's get
Ok(value.clone())
}
else if let Some(enclosing) = &self.enclosing { // If cannot find in the current env, find it in the parent env
enclosing.get(name)
}
else {
Err(anyhow!("Undefined variable '{}'.", name))
}
}
What does fn synchronize
do
fn synchronize
is used to recover the parser after finding syntax errors and continue parsing the rest of the program.
Example: test.lox
var a = 5;
var bad syntax here !@#; // Syntax error
var b = 10; // Should still be parsed
print b; // Should still be parsed
Chapter 9: Control Flow
This chapter adds the control flow, including if-statement
and loop-statement
.
A full grammar of this chapter.
program → declaration* EOF ;
declaration → varDecl | statement ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
statement → exprStmt | forStmt | ifStmt | printStmt | whileStmt | block ;
exprStmt → expression ";" ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" ) expression? ";" expression? ")" statement ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
printStmt → "print" expression ";" ;
whileStmt → "while" "(" expression ")" statement ;
block → "{" declaration* "}" ;
expression → assignment ;
assignment → IDENTIFIER "=" assignment | logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" | IDENTIFIER ;
Here is only an updated part from the previous chapter.
statement → exprStmt | ifStmt | printStmt | whileStmt | block ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
whileStmt → "while" "(" expression ")" statement ;
// Logical operators added to expression hierarchy:
assignment → IDENTIFIER "=" assignment | logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
// For loops added:
statement → exprStmt | forStmt | ifStmt | printStmt | whileStmt | block ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" )
expression? ";"
expression? ")" statement ;
Why adding Logical
while it looks the same as Binary
?
The reason I add
Logical {
left: Box<Expr>,
operator: Token,
right: Box<Expr>,
},
because I want to shortcut some operations, as opposed to Binary
that does not have this feature. If the evaluation result of the left expression of or
is true, I can return true immediately without evaluating the right expression. In the same way, if the left expression’s evaluation result is false, return false immediately. Reminder that evaluation can be done from .accept(...)
.
fn visit_logical_expr(&mut self, _expr: &Expr, left: &Expr, operator: &Token, right: &Expr) -> Result<Value> {
// TODO: Implement short-circuiting logic
// For "or": if left is truthy, return left, otherwise return right
// For "and": if left is falsy, return left, otherwise return right
let left_value = left.accept(self)?;
match operator.token_type {
TokenType::Or => {
if left_value.is_truthy() {
// TODO: Return left_value (short-circuit)
Ok(left_value)
} else {
// TODO: Evaluate and return right
right.accept(self)
}
}
TokenType::And => {
if !left_value.is_truthy() {
// TODO: Return left_value (short-circuit)
Ok(left_value)
} else {
// TODO: Evaluate and return right
right.accept(self)
}
}
_ => Err(anyhow!("Unknown logical operator: {:?}", operator.token_type)),
}
}
For-Loop Sugar
A for-loop
is, in fact, translated into while-loop
and executed by the same way as while-loop
.
- Parser: Recognizes syntax, builds AST, performs transformations (desugaring)
- Interpreter: Executes AST nodes, doesn’t need to know about original syntax So,
for-loop
towhile-loop
is done inparser.rs
.
Chapter 10: Functions
This chapter is difficult. Adding a function feature introduces the closure feature and the difficulty to manage the environment.
This chapter implements user-defined functions in the Lox language, completing the transition from a calculator to a full programming language. It adds:
- Function declarations (
fun name(params) { body }
) - Function calls (
functionName(arguments)
) - Return statements (
return value;
) - Local scopes and closures
- Native functions (like
clock()
)
Here is the full grammar rules.
program → declaration* EOF ;
declaration → funDecl | varDecl | statement ;
funDecl → "fun" function ;
function → IDENTIFIER "(" parameters? ")" block ;
parameters → IDENTIFIER ( "," IDENTIFIER )* ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
statement → exprStmt | forStmt | ifStmt | printStmt | returnStmt | whileStmt | block ;
exprStmt → expression ";" ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" ) expression? ";" expression? ")" statement ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
printStmt → "print" expression ";" ;
returnStmt → "return" expression? ";" ;
whileStmt → "while" "(" expression ")" statement ;
block → "{" declaration* "}" ;
expression → assignment ;
assignment → IDENTIFIER "=" assignment | logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | call ;
call → primary ( "(" arguments? ")" )* ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" | IDENTIFIER ;
arguments → expression ( "," expression )* ;
Here is the updated rules from the previous chapter.
declaration → funDecl | varDecl | statement ;
funDecl → "fun" function ;
function → IDENTIFIER "(" parameters? ")" block ;
parameters → IDENTIFIER ( "," IDENTIFIER )* ;
// Function calls added to expression hierarchy:
unary → ( "!" | "-" ) unary | call ;
call → primary ( "(" arguments? ")" )* ;
arguments → expression ( "," expression )* ;
// Return statements:
statement → exprStmt | forStmt | ifStmt | printStmt
| returnStmt | whileStmt | block ;
returnStmt → "return" expression? ";" ;
Native Function
Native function is a function that is implemented in the host language (which is Rust in my case) instead of Lox language. This interpreter has one native function, clock()
.
Why return
uses Err
to implement?
The return
feature of Lox uses Err
to immediately return from a function.
- When
return
executes
fn visit_return_stmt(&mut self, ...) -> Result<()> {
let val = if let Some(v) = value {
v.accept(self)?
} else {
Value::Nil
};
// "Throw" the return value as an error
Err(ReturnValue { value: val }.into())
}
- The error propagates up automatically
// In visit_while_stmt
while condition.accept(self)?.is_truthy() {
body.accept(self)?; // ← If this is a return, the ? operator
// immediately exits this function!
}
- The function caller catches it:
pub fn call_lox_function(&mut self, function: &LoxFunction, arguments: Vec<Value>) -> Result<Value> {
// ... setup code ...
let result: anyhow::Result<Value> = (|| {
for statement in &function.declaration().body {
statement.accept(self)?; // ← Return "error" propagates out of this loop
}
Ok(Value::Nil) // If no return, function returns nil
})();
// ... cleanup code ...
match result {
Err(err) => {
// Check if it's actually a return value
if let Some(return_val) = err.downcast_ref::<ReturnValue>() {
Ok(return_val.value.clone()) // Convert "error" back to normal value
} else {
Err(err) // Real error, propagate it
}
}
Ok(_) => Ok(Value::Nil),
}
}
Using EnvironmentArena (Environment Pool) instead of Environment
This interpreter changes to use EnvironmentArena
instead of Environment
.
#[derive(Debug)]
pub struct EnvironmentArena {
environments: Vec<Environment>, // The "parking lot" for environments
}
#[derive(Debug, Clone)]
pub struct Environment {
enclosing: Option<EnvId>, // Just an ID, not a direct pointer!
values: HashMap<String, Value>,
}
pub type EnvId = usize; // Environment ID - just a number!
All environment access goes through the arena.
// Get a variable's value, walking up the chain if needed
pub fn get(&self, env_id: EnvId, name: &str) -> Result<Value> {
let mut current = env_id;
loop {
// Check current environment
if let Some(value) = self.environments[current].values.get(name) {
return Ok(value.clone());
}
// Move to parent environment
if let Some(parent) = self.environments[current].enclosing {
current = parent;
} else {
return Err(anyhow!("Undefined variable '{}'.", name));
}
}
}
Example
var global = "I'm global";
fun outer() {
var outerVar = "I'm in outer";
fun inner() {
var innerVar = "I'm in inner";
print global; // Needs to walk up the chain
print outerVar; // Needs to walk up the chain
print innerVar; // Found locally
}
inner();
}
outer();
The arena contents will be
// Arena contents:
environments[0] = Environment { // Global scope
enclosing: None,
values: {
"global" -> "I'm global",
"outer" -> Function(...)
}
}
environments[1] = Environment { // outer() function scope
enclosing: Some(0), // Points to global
values: {
"outerVar" -> "I'm in outer",
"inner" -> Function(...)
}
}
environments[2] = Environment { // inner() function scope
enclosing: Some(1), // Points to outer scope
values: {
"innerVar" -> "I'm in inner"
}
}
When inner()
tries to access global
// Looking up "global" from environment 2
let mut current = 2; // Start in inner() scope
// Check environments[2] - not found
// Move to parent: current = environments[2].enclosing = Some(1)
current = 1;
// Check environments[1] - not found
// Move to parent: current = environments[1].enclosing = Some(0)
current = 0;
// Check environments[0] - FOUND! Return "I'm global"
Problem when implementing an interpreter to support a recursion function.
It is tricky to write a call_lox_function
to enable recursion function.
pub fn call_lox_function(&mut self, function: &LoxFunction, arguments: Vec<Value>) -> Result<Value> {
...
let current_env = self.environment; // Remember current environment
// Create new environment with function's closure as parent
let call_env = self.arena.create_env_with_enclosing(function.closure());
// Add function to its own environment for recursion
self.arena.define(call_env, function.name().to_string(), Value::Function(function.clone()));
// Bind parameters to arguments
for (param, arg) in function.declaration().params.iter().zip(arguments.iter()) {
self.arena.define(call_env, param.lexeme.clone(), arg.clone());
}
self.environment = call_env; // Switch to function's environment
let result: anyhow::Result<Value> = (|| {
for statement in &function.declaration().body {
statement.accept(self)?;
}
Ok(Value::Nil)
})();
self.environment = current_env; // Restore previous environment
match result {
Err(err) => {
if let Some(return_val) = err.downcast_ref::<ReturnValue>() {
Ok(return_val.value.clone())
} else {
Err(err)
}
}
Ok(_) => Ok(Value::Nil),
}
}
Take a closer look at the beginning of this function.
let current_env = self.environment; // Remember current environment
// Create new environment with function's closure as parent
let call_env = self.arena.create_env_with_enclosing(function.closure());
// Add function to its own environment for recursion --IMPORTANT--
self.arena.define(call_env, function.name().to_string(), Value::Function(function.clone()));
I have to add the current function name to the current environment as well to allow recursion. Otherwise, when calling a recursion function, the function will not be able to find its own function.
Why using closure in call_lox_function
?
One more important point is that the result
is a result of closure function.
let result: anyhow::Result<Value> = (|| {
for statement in &function.declaration().body {
statement.accept(self)?;
}
Ok(Value::Nil)
})();
The reason to use the closure here is that if there is anything wrong when evaluating the statement (again, by using .accept(...)
), the following part of call_lox_function
will still be executed. This is important because no matter what is the result of the statementS evaluation, the previous environment must always be restored. In other words, I want
// ...
self.environment = current_env; // Restore previous environment
match result {
Err(err) => {
if let Some(return_val) = err.downcast_ref::<ReturnValue>() {
Ok(return_val.value.clone())
} else {
Err(err)
}
}
Ok(_) => Ok(Value::Nil),
}
to be restored no matter what the result of let result: anyhow::Result<Value> = (|| {...
is.
Current Bug
Finally, this implementation still has a bug, which will be resolved in the next chapter. When I run
var a = "global"; // Global environment: {a: "global"}
{
fun showA() { // What gets captured here?
print a;
}
showA(); // First call
var a = "block"; // What happens here?
showA(); // Second call
}
It gives
global
block
instead of
global
global
When the block starts to be executed, execute_block
is called.
...
let block_env = self.arena.create_env_with_enclosing(global_env);
// block_env contents: {} (empty)
// block_env.enclosing: global_env
...
When showA()
is called, the visit_function_stmt(...)
does
fn visit_function_stmt(&mut self, _stmt: &Stmt, name: &Token, params: &[Token], body: &[Stmt]) -> Result<()> {
// TODO: Create function object and store in environment
// 1. Create FunctionDeclaration
// 2. Capture current environment as closure
// 3. Create LoxFunction
// 4. Store in environment with function name
let declaration = FunctionDeclaration {
name: name.clone(),
params: params.to_vec(),
body: body.to_vec(),
};
// Create function with current environment as closure (just store the ID!)
let function = LoxFunction::new(declaration, self.environment);
// Define function in current environment
self.arena.define(self.environment, name.lexeme.clone(), Value::Function(function));
Ok(())
}
When the first showA()
is called,
- Looks up “a” in block_env:
{}
(not found) - Looks up “a” in block_env.enclosing (global):
{a: "global"}
- Prints “global”
When var a = "block"
executes,
impl StmtVisitor<Result<()>> for Interpreter {
fn visit_var_stmt(&mut self, _stmt: &Stmt, name: &Token, initializer: &Option<Box<Expr>>) -> Result<()> {
let value = if let Some(init) = initializer {
init.accept(self)? // Evaluates "block" -> Value::String("block")
} else {
Value::Nil
};
// HERE IS WHERE IT HAPPENS! 👇
self.arena.define(self.environment, name.lexeme.clone(), value);
// ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^ ^^^^^
// current env ID "a" Value::String("block")
// This MODIFIES block_env: {} becomes {a: "block"}
Ok(())
}
}
// This MODIFIES block_env: {} becomes {a: "block"}
impl EnvironmentArena {
pub fn define(&mut self, env_id: EnvId, name: String, value: Value) {
// THIS IS THE ACTUAL MUTATION! 👇
self.environments[env_id].values.insert(name, value);
// ^^^^^^ This HashMap gets modified!
}
}
So, when the second showA()
is called, it uses a = "block"
instead.
- Looks up “a” in block_env:
{a: "block"}
(found!) - Prints “block” (never looks at parent)
which is not what I want from the static scoping. The issue is that var a = "block"
modifies the same environment that the function captured. Even though the function was declared before that variable, the environment it captured gets modified after the fact.
pub struct Environment {
enclosing: Option<EnvId>,
values: HashMap<String, Value>, // This HashMap gets modified!
}
Chapter 11: Resolving and Binding
– Not yet finished –