Adding a standard library to your interpreter
Introduction
As part of my debugger 1 which I’ve been working on for a while now, I’ve been hard at work towards implementing a command-line interface similar to lldb/gdb. A big part of the way you interact with these debuggers is via function calls for core functionalities such as viewing registers and breakpoints. In this post, I will walk through how I implemented basic function calling ability as well as a simple len function which returns the length of a string provided as an argument.
To quickly familiarise yourself with the structure of my command-line interface, I’ve been following along Crafting Interpreters 2, having hand-translated the tutorial from Java to C++. The interpreter uses a top-down recursive descent parser which works by writing a function per grammar rule and these functions call each other recursively to match the input.
If you’ve read through Crafting Interpreters before, you will be familiar with what an expression and a statement is. For those who haven’t, an expression is something which produces a value, say, an integer 5 or a string ‘hello’. On the other hand, a statement is something which tells an interpreter to perform an action, such as variable assignment or a function call. In my code, these are implemented with the visitor pattern as you can see below.
// Expression
class IExprVisitor {
public:
virtual ~IExprVisitor() = default;
virtual Object visitBinaryExpr(const Binary& expr) = 0;
virtual Object visitGroupingExpr(const Grouping& expr) = 0;
virtual Object visitLiteralExpr(const Literal& expr) = 0;
virtual Object visitUnaryExpr(const Unary& expr) = 0;
virtual Object visitVariableExpr(const Variable& expr) = 0;
virtual Object visitAssignExpr(const Assign& expr) = 0;
};
// Statement
class IStmntVisitor {
public:
virtual ~IStmntVisitor() = default;
virtual Object visitExprStmnt(const ExprStmnt& stmnt) = 0;
virtual Object visitVarStmnt(const VarStmnt& stmnt) = 0;
// Print is already a built-in function as per the tutorial
virtual Object visitPrintStmnt(const PrintStmnt& stmnt) = 0;
};
Here, Object represents any value that can be returned as a result of a statement or that an expression represents. It’s been made to be somewhat similar to Java’s Object and is defined like so, where std::monostate represents a null value.
using Object = std::variant<std::monostate, std::string, float, int, double, bool>;
Design
To implement function calling, I first made a Callable abstract class.
class Callable {
public:
virtual ~Callable() = default;
virtual Object call(Interpreter& interpreter, std::vector<Object> args) = 0;
virtual int arity() const = 0;
[[nodiscard]] virtual std::string str() const = 0;
};
For those not familiar with maths terminology, arity simply refers to the number of arguments the function can accept. It’s probably also worth noting that default arguments and optional arguments are outside the scope of my implementation as these were deemed unnecessary for my use case. The call method will be derived by a child class to implement what the function should execute. This method takes an Interpreter reference and a list of arguments where Interpreter derives the Statement and Expression visitor classes. Lastly, the str method will be used to print the function object.
This class will serve as a template for implementing all other functions after the scanner has correctly identified which function to call.
I then expanded the definition of Object to include the Callable class. This allows me to reuse existing logic and visitor pattern to go through the functions and evaluate them. I also used a shared pointer here as I might need to store different functions as pointers to the base class and functions could live in different places, therefore using a smart pointer allows for easier memory management.
using Object = std::variant<std::monostate, std::string, float, int, double, bool, std::shared_ptr<Callable>>;
Scanning
Firstly, I added support for call statements in the statement visitor by adding the following line to IStmntVisitor. This allowed me to reuse the same pattern and functionality already built into my interpreter to support function calling.
class IStmntVisitor {
public:
...
virtual Object visitCallStmnt(const CallStmnt& stmnt) = 0;
};
Next, I implemented a call statement which will represent function calls in the AST. In here, Token is a type which contains the token type, the lexeme (string representation of the value) and the literal (actual value). This is just a simple class which has an accept method to work with the visitor pattern my interpreter is already using. It is worth noting that currently, my implementation supports only one argument which can also be an expression that gets evaluated. In the future, I’d use a std::vector of expressions so that I can support functions with multiple arguments.
// Stmnt is a abstract class which contains the `accept` function
class CallStmnt final : public Stmnt {
public:
Token m_fn;
// TODO: Use std::vector to support multiple arguments
std::unique_ptr<Expr> m_args;
CallStmnt(Token fn, std::unique_ptr<Expr> expr)
: m_fn(std::move(fn)), m_args(std::move(expr)) {}
Object accept(IStmntVisitor* visitor) override {
return visitor->visitCallStmnt(*this);
}
};
After this, I adapted the scanner so that it’s able to pick up the functions whilst going through the text generating tokens.
std::vector<Token> Scanner::identifier() {
...
if (type == TokenType::FUN) {
function(text);
return;
}
}
Next, I implemented function which creates the function object and then stores it in a token. For now, I only implemented the len function which means I’m able to get away with manually creating the object for it. However, when implementing multiple functions, I would need to make use of a macro which takes the function name and appends ‘Fn’ to it in order to dynamically generate the class name of the function. As for m_keywords, this is a hashmap of strings to a token type and for every function, I define its name here and explicitly state that it’s a function. Lastly, addToken here stores the LenFn object inside the token’s m_literal field. This is done so that I can later retrieve the function during interpreter execution.
// add to m_keywords
{"len", TokenType::FUN}};
void Scanner::function(const std::string& name) {
if (m_keywords.contains(name))
addToken(TokenType::FUN, std::make_shared<LenFn>(LenFn()));
}
Another alternative I considered here was registering functions directly in the environment. If you aren’t aware what an environment in this context is, it refers to the mapping between a variable name and its value. This would make adding new functions easier as all it would involve is one extra line in the Interpreter constructor, however, this would also mean that function names could potentially get overshadowed by variable declarations with the same name and that’s just not what I’m looking for.
Interpreter::Interpreter() {
// m_env.define("len", std::make_shared<Callable>(LenFn()));
};
This is not to say that the method I chose is perfect as currently adding a new function requires the following
- Make a new class deriving
Callable - Register it as a function keyword in
m_keywords - Adapt scanner to differentiate between new function and existing functions
In my opinion, a middle ground between registering the function in the environment and having it as a keyword would be ideal as it would reduce the complexity mentioned above. Implementing that would look something like this
Interpreter::Interpreter() {
// m_env.define("len", std::make_shared<Callable>(LenFn()), protected = true);
};
Parsing
After I made the changes to the scanner, I can now move on to the parser where as the name implies, the tokens get parsed into a format which can be executed by the interpreter. Firstly, I added a check to differentiate from other type of statements and make sure I correctly identify that the parser is looking at a function as opposed to a variable declaration. funStmnt refers to the function which will actually parse and generate a call statement.
std::unique_ptr<Stmnt> Parser::statement() {
if (match(TokenType::FUN)) return funStmnt();
...
}
Here, I first defined the arguments as expressions, as, for example I may wish to get the length of a string expression such as “hello” + “world”. As my interpreter currently doesn’t deviate too much from the one described in the book, I still expect a semicolon at the end of the line, however, this could be easily removed and is not actually necessary in my case. Lastly, I generate a call statement as shown earlier with the correct token and argument information.
std::unique_ptr<Stmnt> Parser::funStmnt() {
std::unique_ptr<Expr> args = expression();
consume(TokenType::SEMICOLON, "Expect ';' after function call.");
return std::make_unique<CallStmnt>(std::move(this->m_tokens[0]),
std::move(args));
}
Interpreting
Okay, here comes the final part, actually executing the code. For this, I modified the interpreter to recognise the function calls which you can see below.
Object Interpreter::visitCallStmnt(const CallStmnt& stmnt) {
Object arg = evaluate(stmnt.m_args);
auto fn = std::get_if<std::shared_ptr<Callable>>(&stmnt.m_fn.m_literal);
// TODO: Check arity against vector size
std::vector<Object> argList = {};
argList.emplace_back(std::move(arg));
if (fn) std::cout << stringify(fn->get()->call(*this, argList)) << std::endl;
return std::monostate{};
}
Firstly, the arguments get evaluated to ensure that they have been processed before operating on them. Next, I get the Callable object which contains the actual function information as well as what its supposed to execute. Lastly, I execute the function and print out its result. It’s worth noting that currently this example does not support using function results as arguments for other functions as it returns std::monostate. As for stringify, this simply formats an Object using std::format. Using this approach requires specialising std::format for the Object type which I’ve done like so.
template <>
struct std::formatter<Object> {
constexpr auto parse(std::format_parse_context& ctx) { return ctx.begin(); }
auto format(const Object& obj, std::format_context& ctx) const {
auto visitor = [&ctx](const auto& value) -> std::format_context::iterator {
using T = std::decay_t<decltype(value)>;
if constexpr (std::is_same_v<T, std::monostate>) {
return std::format_to(ctx.out(), "emtpy");
} else if constexpr (std::is_same_v<T, std::string>) {
return std::format_to(ctx.out(), "{}", value);
} else if constexpr (std::is_same_v<T, bool>) {
return std::format_to(ctx.out(), "{}", value ? "true" : "false");
} else if constexpr (std::is_same_v<T, double>) {
std::string text = std::format("{}", value);
if (text.ends_with(".0")) {
text = text.substr(0, text.length() - 2);
}
return std::format_to(ctx.out(), "{}", text);
} else if constexpr (std::is_same_v<T, std::shared_ptr<Callable>>) {
return std::format_to(ctx.out(), "{}", value->str());
}
else {
return std::format_to(ctx.out(), "{}", value);
}
};
return std::visit(visitor, obj);
};
std::string Interpreter::stringify(const Object& object) {
return std::format("{}", object);
}
Writing a function
After all that is done, I can finally implement a function. Following with my example, I implemented the len function which just returns the length of a string. Here I check if there is a mismatch between the number of arguments expected and the number of arguments provided. For the sake of getting a working example quickly, this was implemented within the function logic itself, however, for future use, this should be extracted out to visitCallStmnt to avoid repeating the same logic in other functions that would be defined. Lastly, I check the type of the argument to ensure its what is expected. Once again, this logic could be refactored and moved out to visitCallStmnt but this requires a bit more thinking and is not as straightforward.
class LenFn : public Callable {
public:
int arity() const override { return 1; }
[[nodiscard]] std::string str() const override { return "<native fn: len>"; }
Object call(Interpreter& interpreter, std::vector<Object> args) override {
if (args.size() != 1)
throw RuntimeError("len() requires exactly one argument");
auto* val = std::get_if<std::string>(&args[0]);
if (!val) throw RuntimeError("len() can only be called on a string");
return (int)val->length();
}
};
At last, I can try executing my function and see its result. For this specific function, its just a simple wrapper around the already existing C++ functionality, however, now I have the ability to further expand this and execute my own logic the way I see fit. I can also see that argument evaluation works fine as I am able to get the length of the expression "hello" + "world". Lastly, I can see my “type checking” in action as the interpreter will return an error message should I try and call len on anything other than a string.
> len "hello";
5
> len "hello" + "world";
10
> len 5;
len() can only be called on a string
Conclusion
To achieve a standard library, all that is left for me to do is support multiple arguments and implement more functions. After that, I’ll be looking at implementing core debugging functionality such as single stepping and register access. If I happen to go through any interesting challenges whilst implementing these features, you’ll be sure to see another post here :). As always, if you wish to get in touch with me, you can reach me at dm.leirbag@tulas