RustPython - From Vision To Architecture

In the previous blog post, we gave you a first impression of what RustPython looks like and what it aims to achieve. This time, we’ll take a look under the hood to find out how the interpreter is structured and how its individual components interact with each other. 🧐

Main architecture

Choosing the main architectural styles or patterns for an application is an important decision. The main patterns provide the system with a way to meet its desired quality attributes. As is the case for any system, the main architectural patterns found in RustPython describe its structural organisation.

Since RustPython employs several distinct modules to achieve its interpretation of Python, two architectural patterns can be identified. The first is the layers pattern (in blue) and describes how the modules in the RustPython repository work together. The second pattern is the interpreter pattern (in grey), which describes what the internal dependencies of the interpreter look like and how they work together. The figure below combines those two patterns, separating them by alternating colours (blue and grey). The figure also shows how supplied user input (for example extra arguments) relates to the execution occurring in the VM.

Figure: Main architecture of RustPython.

Layers pattern

The layers pattern describes how an application can be decomposed into components that each execute a distinct subtask at a given level of abstraction. In the case of RustPython, this comes down to three steps: parsing, compiling and executing. Each step is handled by its own component operating on a given input: the parser, compiler and VM respectively. Applying a layers pattern has several advantages. The main one that applies to RustPython is that the modules can be used separately. This allows for (1) easy modification of higher layers without having to change lower layers and (2) reusability of individual modules.
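To make the three layers concrete, here is a minimal sketch of a parse, compile and execute pipeline for a toy language that only knows integer sums. Every name in it (Ast, Instruction, parse, compile, execute) is a hypothetical stand-in chosen for illustration and is not taken from the RustPython code base.

```rust
// Toy stand-ins for the three layers; none of these types are RustPython's.

/// Output of the parser layer: a tiny AST for sums such as "1 + 2 + 3".
#[derive(Debug, PartialEq)]
enum Ast {
    Num(i64),
    Add(Box<Ast>, Box<Ast>),
}

/// Output of the compiler layer: instructions for a stack machine.
#[derive(Debug, PartialEq)]
enum Instruction {
    Push(i64),
    Add,
}

/// Parser layer: source text -> AST (left-associative sums only).
fn parse(source: &str) -> Result<Ast, String> {
    let mut ast: Option<Ast> = None;
    for term in source.split('+') {
        let n = term
            .trim()
            .parse::<i64>()
            .map_err(|e| format!("bad number {:?}: {}", term.trim(), e))?;
        ast = Some(match ast {
            None => Ast::Num(n),
            Some(lhs) => Ast::Add(Box::new(lhs), Box::new(Ast::Num(n))),
        });
    }
    ast.ok_or_else(|| "empty input".to_string())
}

/// Compiler layer: AST -> bytecode, emitted in post-order.
fn compile(ast: &Ast) -> Vec<Instruction> {
    let mut code = Vec::new();
    emit(ast, &mut code);
    code
}

fn emit(ast: &Ast, code: &mut Vec<Instruction>) {
    match ast {
        Ast::Num(n) => code.push(Instruction::Push(*n)),
        Ast::Add(lhs, rhs) => {
            emit(lhs, code);
            emit(rhs, code);
            code.push(Instruction::Add);
        }
    }
}

/// VM layer: bytecode -> result, using an operand stack.
fn execute(code: &[Instruction]) -> Result<i64, String> {
    let mut stack: Vec<i64> = Vec::new();
    for instr in code {
        match instr {
            Instruction::Push(n) => stack.push(*n),
            Instruction::Add => {
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                stack.push(lhs + rhs);
            }
        }
    }
    stack.pop().ok_or_else(|| "empty program".to_string())
}
```

Each function only depends on the data type produced by the layer before it, so in this sketch the parser could be swapped out or reused without touching the VM.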

Interpreter pattern

The interpreter pattern describes how the internal dependencies of RustPython work together to evaluate Python source code with the given input. As visible in the figure, this pattern is represented by the grey items, which transform source code into an abstract syntax tree, then into bytecode, and finally execute it. The interpreter pattern allows for highly dynamic behaviour, which is the main reason it works well with a layered pattern. This also makes writing an interpreter easier, since programmers can work on specific parts separately.

Now that you have an idea of the main architecture of RustPython, we’re going to take a look at it from several different views, to get a more detailed understanding of how the interpreter operates.

Views

A view is simply a perspective on the application at a certain level of abstraction. The views that we’re going to cover are:

  • The containers view: here we’ll talk about what a ‘container’ is exactly and which ones are present in the RustPython environment.
  • The components view: here we’ll go into detail on the main components that can be identified in the RustPython application.
  • The connectors view: the previously-mentioned components interact with each other in certain ways. In this section, we’ll be describing how those interactions are realised.

The views are inspired by the C4 website: the official website of one of the most commonly used techniques for modelling the architecture of software systems.

Containers view

Let’s first briefly explain what a ‘container’ is exactly. The C4 website describes a container as ‘a separately runnable/deployable unit (e.g. a separate process space) that executes code or stores data’. In other words: an environment in which the application in question will be run. RustPython can be run in a broad range of environments: from a developer running Python code on their own computer, to a server running Python apps headlessly, to a browser via WebAssembly (WASM).

Components view

As we’ve mentioned in the Main architecture section, RustPython consists of three main components: the parser, compiler, and VM. Since RustPython is capable of both running a REPL and compiling complete source files, there are of course several entry points into the application. We’re going to go over the main entry point of each of the components.

Parser

The main entry point of the parser component has the following signature:

pub fn parse(source: &str, mode: Mode) -> Result<ast::Mod, ParseError>

The source parameter contains the raw code that should be parsed. For the REPL, this is simply the line that the user typed into the shell, whereas when running a file, it’s a single string containing all the lines in the file. The mode parameter signifies in which mode the application is running (REPL/script/module). The function returns the parsed AST, or a ParseError if the source could not be parsed.

Compiler

The main entry point of the compiler component has the following signature:

pub fn compile_top(
    ast: &ast::Mod,
    source_path: String,
    opts: CompileOpts,
) -> CompileResult<CodeObject>

The ast parameter contains the parsed AST of the code to compile. The source_path and opts parameters also have an intuitive explanation: the path to the source file and an object containing some compilation options. This function returns the compiled bytecode.

VM

The main entry point of the VM component has the following signature:

pub fn run_code_obj(&self, code: PyCodeRef, scope: Scope) -> PyResult

The code parameter again contains the bytecode, together with some additional information the VM needs. The scope parameter contains all the global and local variables visible in the current scope. This function actually executes the bytecode and returns the result.
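As a rough illustration of how bytecode and a scope interact, the sketch below runs a handful of stack instructions against a scope in which local names shadow global ones. The Scope, Instruction and run_code names are assumptions made for this example; they are far simpler than RustPython’s actual Scope, PyCodeRef and run_code_obj and do not mirror their real definitions.

```rust
use std::collections::HashMap;

/// Hypothetical scope: local names shadow global ones on lookup.
#[derive(Debug, Default)]
struct Scope {
    globals: HashMap<String, i64>,
    locals: HashMap<String, i64>,
}

impl Scope {
    /// Look a name up in the locals first, then fall back to the globals.
    fn load(&self, name: &str) -> Option<i64> {
        self.locals
            .get(name)
            .or_else(|| self.globals.get(name))
            .copied()
    }
}

/// Hypothetical bytecode for a stack machine with named variables.
#[derive(Debug)]
enum Instruction {
    LoadConst(i64),
    LoadName(String),
    StoreName(String),
    Add,
}

/// Execute the instructions and return whatever is left on top of the stack.
fn run_code(code: &[Instruction], scope: &mut Scope) -> Result<Option<i64>, String> {
    let mut stack: Vec<i64> = Vec::new();
    for instr in code {
        match instr {
            Instruction::LoadConst(n) => stack.push(*n),
            Instruction::LoadName(name) => {
                let value = scope
                    .load(name)
                    .ok_or_else(|| format!("name '{}' is not defined", name))?;
                stack.push(value);
            }
            Instruction::StoreName(name) => {
                let value = stack.pop().ok_or("stack underflow")?;
                // Assignments always go to the local namespace in this sketch.
                scope.locals.insert(name.clone(), value);
            }
            Instruction::Add => {
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                stack.push(lhs + rhs);
            }
        }
    }
    Ok(stack.pop())
}
```

Passing the scope in from the outside, rather than creating it inside the VM, is what lets a REPL keep variables alive between lines in a design like this one.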

Connectors view

Now that we’ve talked about the individual components, let’s talk about how they’re connected. First and foremost, since RustPython is ‘only’ an interpreter, all components are simply connected through function calls. Secondly, as you may have noticed in the Components view section, the components expect and return two main kinds of objects. The first is defined in the ast directory, which contains the types and functions with which the entire source code can be represented as an AST. The second is defined in the bytecode directory, which represents the AST mapped to actual bytecode instructions.

Development

For the development of its system, RustPython uses quite a modularized approach. Unlike most Cargo-based projects, most of RustPython’s modules, like ast and bytecode, are contained as completely separate projects (each with its own project file) within the main repository. This allows them to be used separately while still being contained within the main git repository.

Contrary to how we visualize it in the Main architecture section, the code is called in the following order:

  • rustpython calls vm
  • vm calls compiler
  • compiler calls parser
  • parser generates the AST as defined in ast

The calls are made this way so that each module can act separately if called upon without context. For instance, the compiler can still be called with just the Python code instead of always having to first manually call the parser.
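The call order above can be sketched with four small modules. This is a toy, assuming a made-up language in which a program is a whitespace-separated list of integers to be summed; the module names follow the list above, but every function and type inside them is hypothetical and not RustPython’s API.

```rust
mod ast {
    /// The AST shared by parser and compiler: a list of integers to sum.
    #[derive(Debug, PartialEq)]
    pub struct Sum(pub Vec<i64>);
}

mod parser {
    use super::ast::Sum;

    /// Generates the AST as defined in `ast`.
    pub fn parse(source: &str) -> Result<Sum, String> {
        source
            .split_whitespace()
            .map(|tok| tok.parse::<i64>().map_err(|e| format!("{:?}: {}", tok, e)))
            .collect::<Result<Vec<_>, _>>()
            .map(Sum)
    }
}

mod compiler {
    use super::ast::Sum;
    use super::parser;

    #[derive(Debug, PartialEq)]
    pub enum Instruction {
        Push(i64),
        Add,
    }

    /// Compile an AST that the caller has already parsed.
    pub fn compile_ast(sum: &Sum) -> Vec<Instruction> {
        let mut code = vec![Instruction::Push(0)];
        for n in &sum.0 {
            code.push(Instruction::Push(*n));
            code.push(Instruction::Add);
        }
        code
    }

    /// Compile straight from source: the compiler calls the parser itself.
    pub fn compile(source: &str) -> Result<Vec<Instruction>, String> {
        parser::parse(source).map(|ast| compile_ast(&ast))
    }
}

mod vm {
    use super::compiler::{self, Instruction};

    /// Execute already-compiled bytecode on an operand stack.
    pub fn run(code: &[Instruction]) -> Result<i64, String> {
        let mut stack: Vec<i64> = Vec::new();
        for instr in code {
            match instr {
                Instruction::Push(n) => stack.push(*n),
                Instruction::Add => {
                    let rhs = stack.pop().ok_or("stack underflow")?;
                    let lhs = stack.pop().ok_or("stack underflow")?;
                    stack.push(lhs + rhs);
                }
            }
        }
        stack.pop().ok_or_else(|| "empty program".to_string())
    }

    /// Evaluate source: the VM calls the compiler, which calls the parser.
    pub fn eval(source: &str) -> Result<i64, String> {
        compiler::compile(source).and_then(|code| run(&code))
    }
}
```

A front end such as the rustpython binary would then only need to call vm::eval, while a tool that merely wants bytecode could call compiler::compile directly without knowing that a parser exists.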

Code maintenance

Code maintenance is one of the weaker points of the project at the moment. While code is well tested, there is a severe lack of documentation, which can lead to some maintenance issues down the line. The main reason for this is that the project is pushing as hard as possible for ‘making stuff work’ to have a useful product as soon as possible. The functional nature of Rust, the adherence to styling conventions, as well as the emphasis on clean code alleviate this issue somewhat.

The project is clearly a community effort, with currently 174 contributors and the openness to accept pull requests from anyone. First-time contributions are very welcome and opportunities are easily found: just open up the test coverage of CPython, find tests that are skipped, unskip them locally and make them work! This process is also outlined in an excellent post on their blog.

The process for building and developing the project is outlined in the DEVELOPMENT.md, and for ease of use, there is even an online gitpod with a correctly set up Docker environment integrated in the project.

Runtime requirements

Since RustPython is, not too surprisingly, written in Rust, your machine needs to be able to run Rust code. Luckily, Rust is supported on most mainstream platforms and more support is on its way. By implementing the VM themselves, instead of compiling to the target machine’s instruction set, the developers make it easier to support a broader range of platforms. The only other requirement is the ability to compile Rust’s standard library (std), though work is being done to remove this requirement so that running on microcontrollers becomes possible.

Key attributes

Two crucial internal attributes, which are important for the developers of RustPython, are usability and maintainability. The realisation of these attributes can be derived from RustPython’s architecture, whose most visible aspect is the clear separation of distinct modules in the repository. While the current documentation of RustPython is still in an early state, separating these modules allows for easier documentation in the future. Allowing developers to work on these modules separately also increases the maintainability of the project.

API principles

As the purpose of an interpreter such as RustPython is only to evaluate a given Python expression, its Application Programming Interface (API) should be kept small and simple. Two API design principles stand out to ensure that.

The first is the small interfaces principle, which means the list of commands available to the user should be kept as small as possible. In the case of RustPython, the interface exposes the parse_program(), compile_program() and eval() functions, which parse, compile and evaluate a given expression respectively. The second notable API design principle is the balance of usability and reusability principle. For RustPython, there are several examples of this. The AST is one such example: it is contained in its own module and shared by multiple other modules.