RustPython - Architecture.MD
Brief introduction
For our last essay, we decided to create an architecture.md file for the RustPython repository. Since the current documentation coverage of the RustPython project is less than 10%, we feel that creating an architecture.md will provide the most benefit. It will help new contributors understand the code architecture of RustPython, and it also narrows the gap between occasional and core contributors.
For more information on architecture.md files, we refer to this excellent blog post, or this example for RustAnalyzer.
Architecture
This document contains a high-level architectural overview of RustPython, which makes it a good starting point for getting to know the codebase.
RustPython is an Open Source (MIT) Python 3 interpreter written in Rust, available as both a library and a shell environment. Using Rust to implement the Python interpreter enables Python to be used as a programming language for Rust applications. Moreover, it allows Python to be immediately compiled in the browser using WebAssembly, meaning that anyone could easily run their Python code in the browser. For a more detailed introduction to RustPython, have a look at this blog post.
RustPython consists of several components, which are described in the sections below. Take a look at this video for a brief walk-through of the components of RustPython. For a more elaborate introduction to one of these components, the parser, see this blog post.
Have a look at these websites for a demo of RustPython running in the browser using WebAssembly:
If, after reading this, you are interested in contributing to RustPython, take a look at these sources to learn how and where to start:
- https://github.com/RustPython/RustPython/blob/master/DEVELOPMENT.md
- https://rustpython.github.io/guideline/2020/04/04/how-to-contribute-by-cpython-unittest.html
Bird’s eye view
A high-level overview of the workings of RustPython is visible in the figure below, showing how Python source files are interpreted.
Main architecture of RustPython.
The RustPython interpreter can be decoupled into three distinct modules: the parser, compiler and VM.
- The parser is responsible for converting the source code into tokens, and deriving an Abstract Syntax Tree (AST) from it.
- The compiler converts the generated AST to bytecode.
- The VM then executes the bytecode, given user-supplied input parameters, and returns its result.
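The three stages above can be illustrated with a minimal sketch. It uses CPython's standard library (ast.parse, compile and exec) as a stand-in, on the assumption that the same parse, compile and execute split applies; it does not call any RustPython code, and the variable names are purely illustrative.

```python
import ast

source = "result = base * 2 + 1"

# Stage 1 (parser): source text -> Abstract Syntax Tree.
tree = ast.parse(source, filename="<demo>")

# Stage 2 (compiler): AST -> bytecode, wrapped in a code object.
code = compile(tree, filename="<demo>", mode="exec")

# Stage 3 (VM): execute the bytecode with user-supplied input.
namespace = {"base": 20}
exec(code, namespace)

print(type(tree).__name__, type(code).__name__, namespace["result"])
```

Each stage only consumes the artifact produced by the previous one, which is what allows the parser, compiler and VM to live in separate modules.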
Entry points
The entry points are as follows:
- The ‘main’ method of the application: run, located in src/lib.rs:70. This method will call the compiler, which in turn will call the parser, and pass the compiled bytecode to the VM.
- Parser: parse, located in parser/src/parser.rs:74.
- Compiler: compile_top, located in compiler/src/compile.rs:90.
- VM: run_code_obj, located in vm/src/vm.rs:459.
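The call chain between these entry points can be sketched as follows. The function names are borrowed from the list above, but the bodies and signatures are hypothetical stand-ins built on CPython's standard library; they only mirror the described nesting (run calls the compiler, the compiler calls the parser, and the result is handed to the VM) and are not RustPython's actual Rust signatures.

```python
import ast
from types import CodeType


def parse(source: str) -> ast.Module:
    """Stand-in for the parser entry point: source text -> AST."""
    return ast.parse(source, filename="<demo>")


def compile_top(source: str) -> CodeType:
    """Stand-in for the compiler entry point; it invokes the parser itself."""
    tree = parse(source)
    return compile(tree, filename="<demo>", mode="exec")


def run_code_obj(code: CodeType, scope: dict) -> dict:
    """Stand-in for the VM entry point: execute a code object in a scope."""
    exec(code, scope)
    return scope


def run(source: str) -> dict:
    """Stand-in for the application's main method."""
    code = compile_top(source)      # compiler (which calls the parser)
    return run_code_obj(code, {})   # VM


scope = run("greeting = 'hello ' + 'world'")
print(scope["greeting"])
```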
Modules
Here we give a brief overview of each module and its function. For more details on the separate crates, please take a look at their respective READMEs.
Bytecode
This module (a single file at the time of writing) holds the representation of bytecode for RustPython.
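To get a feel for what a bytecode representation contains, the sketch below lists the instructions CPython generates for a small expression. This is an analogy only: RustPython defines its own instruction set in this module, and the exact CPython opcode names vary between Python versions.

```python
import dis

# Compile a small expression into a code object (CPython, not RustPython).
code = compile("x + 1", "<demo>", "eval")

# Each instruction has an opcode name and an optional argument.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# The same code object can then be executed with a value for x.
value = eval(code, {"x": 41})
print(value)
```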
Compiler
Python compilation to bytecode. The interface is exposed through the porcelain crate with compile and compile_symtable, while the inner workings are defined in compiler/src/compile.rs, which is mostly an adaptation of the CPython implementation.
Derive
Rust language extensions and macros specific to RustPython. Here we can find the definition of PyModule and PyClass, along with useful macros like py_compile!.
Parser
All the functionality required for parsing Python source code to an abstract syntax tree (AST):
- Lexical Analysis
- Parsing
Python relies heavily on whitespace and indentation to organize code. Before the source reaches LALRPOP, the crate used for parsing, it is therefore first preprocessed by a lexer, which makes sure that Indent and Dedent tokens occur at the correct locations. Then, the parser recursively generates an AST for the code, which can be processed by the compiler.
Lib
Python side of the standard library, copied over (with care) from the CPython source code.
Lib/test
CPython test suite, which can be used to compare with CPython in terms of functionality and performance. Many of these files have been modified to fit the state of RustPython at the time they were added, in one of three ways:
- The test has been commented out completely if the parser could not create a valid code object (SyntaxError).
- A test has been marked as unittest.skip("TODO: RustPython") if it led to a crash of RustPython.
- A test has been marked as unittest.expectedFailure, with TODO: RustPython left on top, if the test can run but failed.
Note: This is a recommended route for starting to contribute. To get started, please take a look at this blog post.
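The two marker styles described above look roughly like this in a test file. The test class and test bodies are made up for illustration; only the decorators and the TODO: RustPython convention come from the description above.

```python
import unittest


class ExampleTests(unittest.TestCase):
    # Marker for a test that would crash the interpreter: never executed.
    @unittest.skip("TODO: RustPython")
    def test_crashes_interpreter(self):
        raise RuntimeError("this body is not run while the test is skipped")

    # TODO: RustPython
    @unittest.expectedFailure
    def test_not_yet_supported(self):
        # Runs, but is expected to fail until the feature is implemented.
        self.assertEqual(1, 2)

    def test_already_working(self):
        self.assertEqual(1 + 1, 2)


# Run the tests programmatically and summarize the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), len(result.expectedFailures), result.wasSuccessful())
```

A contribution typically consists of making such a test pass and then removing its marker.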
VM
Python VM
- builtins: all the builtin functions
- obj: the builtin types
- stdlib: the parts of the standard library implemented in Rust
src
The RustPython executable is implemented here; it is the interface through which users come into contact with the library. Some things to note:
- The CLI is defined in src/lib.rs@run.
- The interface and helper for the REPL are defined in this package, but the actual REPL can be found in vm/rustyline.rs.
WASM
Crate for the WebAssembly build, which compiles the RustPython package to a format that can be run in any modern browser.
extra_tests
Integration and snippet tests for the entire project.