Theia: Quality and Evolution

Overview

Previously, we looked at Theia’s vision and architecture¹ ². In this post, we will focus on the software quality and evolution of the system.

The open-source nature of Theia enhances the quality of the software in multiple ways. More than 200 developers from all over the world bring different talents and experiences to the project³ ⁴. Contributions take many forms, from reporting bugs and asking questions to submitting feature requests and creating pull requests. Before developers contribute to Theia, they are expected to read the contribution guidelines. These include coding guidelines, which are defined to maintain the consistency and quality of the code⁵. Moreover, Theia uses a set of ESLint⁶ rules to promote proper coding style, as is further explained in the quality section⁷.

The code quality is maintained through pull requests. Before any code is changed, the pull request is thoroughly reviewed following the pull request guidelines⁸. Anyone can review a pull request, but a core developer needs to approve it before it is merged. In addition, code robustness is maintained through Continuous Integration (CI) pipelines, which verify the code’s functionality through builds and tests.

In the following sections, we elaborate on the CI and test processes, hotspot components, and code quality, and analyse the technical debt present in the system.

Continuous Integration

Continuous integration (CI) is the practice of automating the integration of code changes from multiple contributors into a single software project⁹. Committing code often enables fast detection of errors and reduces the amount of code a developer needs to debug when finding the source of an error¹⁰. Multiple developers work on Theia at the same time. They create separate branches or fork the repository to work on features or to fix issues, and merge their changes into the master branch when the work is complete.

Theia uses custom continuous integration (CI) and continuous delivery (CD) workflows in its repository, executed by GitHub Actions runners¹¹. A ‘runner’ is a server that has the GitHub Actions runner application installed. Runners listen for available jobs, run a job, and report the progress, logs, and results back to GitHub. These automatic jobs are triggered whenever a developer pushes to a branch or creates a pull request. If the code does not pass all required tests, it is flagged as broken and is prevented from being merged into the working repository.

These CI configurations are specified in the ci-cd.yml and build.yml files, which define the sets of jobs together with constraints on how and in which order they run.

Hotspot Components

We analyse the hotspots by counting the number of commits to each component in the past year, since a larger number of commits means a higher change frequency. We then selected the top 25% of modules with the most commits.
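The counting and selection steps described above can be sketched as follows. This is a minimal illustration, not the script we actually ran: the `CommitTouch` record, the function names, and the assumption that the input has already been extracted from the Git history (one record per commit and touched component) are all hypothetical.

```typescript
// Hypothetical input record: one entry per (commit, component) pair,
// for example derived from the file paths listed in the Git history.
interface CommitTouch {
    commit: string;     // commit hash
    component: string;  // e.g. "packages/core"
}

// Count the number of distinct commits that touched each component.
function countCommits(touches: CommitTouch[]): Map<string, number> {
    const seen = new Map<string, Set<string>>();
    for (const { commit, component } of touches) {
        let commits = seen.get(component);
        if (!commits) {
            commits = new Set<string>();
            seen.set(component, commits);
        }
        commits.add(commit); // a Set ignores repeated (commit, component) pairs
    }
    const counts = new Map<string, number>();
    seen.forEach((commits, component) => counts.set(component, commits.size));
    return counts;
}

// Rank components by commit count and keep the top `fraction` (at least one).
// Ties are broken alphabetically so the result is deterministic.
function topHotspots(counts: Map<string, number>, fraction = 0.25): [string, number][] {
    const ranked = Array.from(counts.entries())
        .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
    const keep = Math.max(1, Math.ceil(ranked.length * fraction));
    return ranked.slice(0, keep);
}
```

Counting distinct commits per component, rather than changed files, keeps one large commit from dominating the ranking. Rounding the cut-off up is a design choice in this sketch; rounding down would select slightly fewer modules.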

As the figure shows, most of the hotspots are concentrated in the packages module, which includes runtime packages, the core package, and extensions¹². In addition, a small share of the hotspots lies in examples, which provides some example applications for Theia.

The hotspot components in packages are all extensions for Theia, which indicates that Theia’s main work in the past year has been to improve the extension framework and provide more extensions. The most active component, plugin-ext, is an extension that contributes functionality for the plugin API. Theia’s plugins can be loaded at runtime and allow users to customize the IDE without needing to learn any framework. The high activity of plugin-ext indicates that Theia has been adding new plugins.

The core extension is the main extension for all Theia-based applications and provides the main framework for all dependent extensions. It is always active because it is related to all parts of the system.

The monaco extension integrates the Monaco code editor¹³, which is the editor that powers VS Code¹⁴. Its activity indicates that Theia is implementing the VS Code API.

We also counted the number of commits to different modules in the past three months to observe recent hotspot components. The top 25% most active modules are shown below.

**Figure:** commits in the past 3 months

The hotspots in the past three months are roughly the same as those in the past year. The plugin-ext and core components are still the two most active. However, it is worth noting that the activity level of some components has changed slightly. For example, preferences has been more active than monaco in the last three months, which is the opposite of the situation in the past year. This may indicate that the monaco extension has become more stable.

We have analysed Theia’s roadmap in a previous post¹, in which we mentioned that Theia is still working on improving its VS Code API compatibility. There are 282 issues with the label “vscode” waiting to be resolved¹⁵. Because a Theia plugin is similar to a VS Code extension, most VS Code extensions are implemented as plugins. Therefore, plugin-ext will remain a hotspot component in the future. We can see some examples in the pull requests that implement the VS Code API, in which the changed files are mainly in the plugin-ext module¹⁶ ¹⁷ ¹⁸. In the meantime, the core module will also remain a hotspot, as it affects all parts of the system. It is also possible that other components become active if they start to implement a specific API. For example, we have noticed that Theia has not implemented a few VS Code API functions relating to debugging¹⁹, so debug may become one of the hotspots in the future.

Due to the high frequency of code updates in hotspot components, it is easy for contributors to focus on implementing functionality without guaranteeing code quality. We analysed the code of the hotspot components and found a few issues in plugin-ext, core, and bulk-edit. For example, deeply nested logic is a common issue in hotspot components. It raises the likelihood of programming mistakes because it increases the cognitive load on the programmer reading the code. Therefore, the code of the hotspots also needs long-term quality improvement and maintenance.
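To illustrate why deep nesting raises cognitive load, the sketch below shows a nested function next to an equivalent version flattened with guard clauses. The `Edit` type and both functions are hypothetical and written only for this illustration; they are not taken from Theia’s codebase.

```typescript
// Hypothetical type, for illustration only.
interface Edit {
    uri?: string;
    text?: string;
}

// Deeply nested version: every condition adds an indentation level, and the
// reader has to keep all enclosing conditions in mind to follow each branch.
function describeEditNested(edit: Edit | undefined): string {
    if (edit) {
        if (edit.uri) {
            if (edit.text !== undefined) {
                return `${edit.uri}: ${edit.text.length} chars`;
            } else {
                return "missing text";
            }
        } else {
            return "missing uri";
        }
    } else {
        return "no edit";
    }
}

// Flattened version with guard clauses: the same behaviour, but each failure
// case is handled and dismissed before the main logic is reached.
function describeEdit(edit: Edit | undefined): string {
    if (!edit) {
        return "no edit";
    }
    if (!edit.uri) {
        return "missing uri";
    }
    if (edit.text === undefined) {
        return "missing text";
    }
    return `${edit.uri}: ${edit.text.length} chars`;
}
```

Both functions return the same result for every input. The second one can be read top to bottom without tracking which `else` belongs to which `if`.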

Test Processes

To overcome the shortcomings of conventional web test tools, such as low speed, instability, and difficult development, the core developers of Theia decided to develop their own integration testing framework²⁰ based on existing frameworks like Mocha²¹. In Theia’s testing framework, tests are written against the application APIs via dependency injection instead of browser elements, ensuring that APIs are tested completely and at the right time²². These tests are also designed to execute quickly. In general, Theia highlights five API testing principles²²: Information Hiding, Completeness, Extensibility, Convenience, and Robustness, which correspond to the API principles we discussed in the previous essay².
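The idea of testing against application APIs obtained through dependency injection, rather than against browser elements, can be sketched as below. Everything here is an assumption made for illustration: the tiny `Container` class stands in for a real dependency injection container, and `EditorService` is a hypothetical API, not one of Theia’s actual services.

```typescript
// A minimal hand-rolled container; a stand-in for a real DI container.
class Container {
    private readonly factories = new Map<string, () => unknown>();

    bind<T>(id: string, factory: () => T): void {
        this.factories.set(id, factory);
    }

    get<T>(id: string): T {
        const factory = this.factories.get(id);
        if (!factory) {
            throw new Error(`No binding for ${id}`);
        }
        return factory() as T;
    }
}

// Hypothetical application API under test.
interface EditorService {
    open(uri: string): void;
    activeUri(): string | undefined;
}

class InMemoryEditorService implements EditorService {
    private current: string | undefined;

    open(uri: string): void {
        this.current = uri;
    }

    activeUri(): string | undefined {
        return this.current;
    }
}

// Wire the application the same way for tests as for production code.
function createTestContainer(): Container {
    const container = new Container();
    const editor = new InMemoryEditorService(); // shared instance (singleton binding)
    container.bind<EditorService>("EditorService", () => editor);
    return container;
}

// The test resolves the service from the container and asserts on API state.
// It never queries or clicks DOM elements.
function testOpenSetsActiveEditor(): boolean {
    const editors = createTestContainer().get<EditorService>("EditorService");
    editors.open("file:///workspace/README.md");
    return editors.activeUri() === "file:///workspace/README.md";
}
```

Because the test talks to the service directly, it does not depend on element selectors or rendering timing, which is where conventional browser-driven tests tend to become slow and unstable.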

As the unit and integration tests are run²² ²³ ²⁴, the test suite of Theia should collect the coverage data and output it to an HTML report. We executed the tests for Theia’s browser example on the master branch at commit 6afae440e13d126174db7b560b2ef3ad15b9d5ef.

Most tests passed, but some errors were detected.

However, due to some misconfigurations in the project, the test coverage results are not available yet²⁵. The core developers are looking into this issue.

Quality Culture

To understand Theia’s quality culture, let’s go over the discussions in issues and pull requests.

The issues in Theia are well labeled. Some issues related to code quality are labeled as quality. Under the quality label, some issues address duplicate code (issue#5812, issue#9191, issue#9139), code clean-up (issue#8713), or detailed checks, such as a comment for a piece of code (issue#8211). Solving these issues improves the code’s reusability, maintainability, and readability. In addition, Theia has several issues related to unit and API tests, such as improving the CI process (issue#8814), improving the test suite (issue#8360, issue#8183), and adding missing test cases (issue#7408, issue#7581, issue#7681).

For each pull request (PR), the contributor needs to confirm that they thoroughly tested the changes. The contributors of PR#9207 and PR#9199 used some extra test plugins for testing, while the contributor of PR#8971 used Theia’s test suite. Sometimes the developer directly runs the program to see whether the bug is fixed (PR#9212).

After the contributors submit their pull requests, the CI checks mentioned in the previous section are run to ensure there are no breaking changes. To merge a pull request, approval from a reviewer with write access is required. An unsatisfactory pull request, such as PR#8514, will not be merged. Following that, a new pull request that met the basic requirements was still asked by the reviewer to change several parts (PR#9022). We can see that there are always active discussions between the contributors and the core developers under each pull request, especially on the core modules (PR#8910, PR#8969). The pull requests for core modules are also carefully tested by reviewers. In PR#9169, the reviewer ran both the unit tests and the API tests to check the change. In PR#9175, the reviewer used the same test plugin the contributor provided to reproduce the test result.

Another code quality control method in the Theia community is the use of ESLint⁶. ESLint is a static analysis tool for JavaScript that allows developers to write custom rules for their codebase. Theia defines its own set of ESLint rules in the project root that, among other things, discourage the use of the null keyword, unused expressions, and trailing whitespace. These rules help maintain a consistent coding style and best practices among all developers.
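As a rough sketch of what rules of this kind could look like, the object below expresses a small ESLint-style configuration in TypeScript. It is an assumption made for illustration, not a copy of Theia’s configuration: `no-unused-expressions` and `no-trailing-spaces` are standard ESLint rule names, `no-null/no-null` assumes the eslint-plugin-no-null plugin, and the `LintConfig` type and the chosen severities are hypothetical.

```typescript
// Severity levels accepted by ESLint rules.
type Severity = "off" | "warn" | "error";

// Hypothetical, simplified shape of a lint configuration.
interface LintConfig {
    plugins: string[];
    rules: Record<string, Severity>;
}

// Assumed configuration covering the three kinds of rule mentioned in the text.
const lintConfig: LintConfig = {
    plugins: ["no-null"], // assumption: eslint-plugin-no-null
    rules: {
        "no-null/no-null": "error",       // discourage the null keyword
        "no-unused-expressions": "error", // flag expressions whose value is never used
        "no-trailing-spaces": "error",    // flag whitespace at the end of a line
    },
};

// Code written in the style these rules encourage: absence is modelled with
// undefined rather than null.
function findLabel(labels: string[], index: number): string | undefined {
    return index >= 0 && index < labels.length ? labels[index] : undefined;
}
```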

From the above discussion, we can conclude that Theia has a good quality culture.

Technical Debt

Technical debt is a metaphor used to describe the amount of rework needed to realign a system’s current implementation with its original design choices and vision. Robert C. Martin, author of Clean Code²⁶, argues that debt occurs when developers make suboptimal design decisions in the short term to meet time or budget constraints²⁷. However, in practice, many technical debt analysis tools, like SonarQube²⁸ and Crucible²⁹, are built around the assumption that developers will follow suboptimal coding practices when under external pressure. Martin Fowler, a developer and influential author, argues that the debt metaphor is still useful in these more analytical contexts³⁰, and that such analysis tools can encourage better coding style and more unit testing, and can find bugs that developers have missed. To illustrate this, and to gain a deeper understanding of Theia, we use SonarQube’s default set of rules to analyse the project codebase and discuss the results below.

In this figure we can see an overview of Theia’s performance in terms of code quality markers. SonarQube rates Theia on reliability, security, and maintainability, and also collates bugs, code smells, duplications, and coverage. It also estimates, in hours, how long improving the codebase would take. Based on this overview alone, we could conclude that Theia does not perform well on these metrics. However, this would be misleading. To explain why, we zoom in on the security issues.

Here, the JavaScript rules flag the use of a cryptographic hash to remind the developers to check that the hash they are using has not been deprecated. Although this contributes to the number of security issues reported by SonarQube, it is not actually an issue, since Theia uses SHA-256, which is safe to use.
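For readers unfamiliar with this kind of finding, the sketch below shows a typical SHA-256 call using Node.js’s built-in `crypto` module. The `sha256Hex` helper is hypothetical and only assumed to resemble the flagged code; the source does not show where or how Theia computes the hash.

```typescript
import { createHash } from "crypto";

// Hypothetical helper: hash a string with SHA-256 and return the hex digest.
// A static analyser cannot judge intent, so it may flag any call like this
// for manual review of the chosen algorithm.
function sha256Hex(content: string): string {
    return createHash("sha256").update(content, "utf8").digest("hex");
}
```

A reviewer resolves such a finding by confirming the algorithm name passed to `createHash`. With `"sha256"`, the finding can be marked as safe.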

To return to the earlier overview, SonarQube also misestimates the test coverage of the Theia project, marking it at 0%. Theia does provide unit tests; they are simply not visible in the SonarQube pipeline. Of course, SonarQube was still successful in uncovering bugs and bad coding practices throughout the codebase that developers would benefit from addressing. Some examples are below.

While the Theia project does not use SonarQube to maintain code quality, largely because it lacked the TypeScript support that developers needed at the time³¹ ³², the project does implement methods of verifying code and maintaining consistent style, including CI tests and ESLint rules, as discussed earlier in the essay.

Overall, static analysis of Theia does point towards a need for some code tidying and for an expansion of the ESLint rules in use to improve future submissions. Nonetheless, the current code quality gates, as discussed here, have been performing very well at keeping technical debt manageable and maintaining a clean codebase.