DESOSA 2021

pip - The architecture

In our last blog post, we described the product vision of pip. In this blog post, we dive deeper into the architecture of pip. We discuss several architectural views of pip, as well as some of the design choices that were made.

Main architectural style

pip uses a component-based architectural style. A component-based architectural style focuses on decomposing the design into individual functional or logical components, each of which exposes a well-defined communication interface containing methods, events, and properties.

If we look into the source code of pip, we see two folders: _internal and _vendor. The _vendor folder contains the external libraries that pip depends on, bundled with pip itself. Each of these dependencies can be seen as a component. The _internal folder contains pip's own components, along with folders that exist only to organise the codebase.

Figure: The internal folder of pip

pip's code is divided into several components, each with a specific task and set of responsibilities. The figure above shows all folders inside pip's internal code. Some of these folders represent a component, whilst others exist only to organise the codebase. The components are elaborated upon in the components view, but first the container view is discussed.

Container view

The container view of a project shows the decomposition of a system from the runtime environment point of view. Each container in this view is a separate unit of deployment that can be independently evolved.

The container view of pip is fairly simple. Outside the container there are two elements: on one side the package indexes, such as the Python Package Index (PyPI)[1], and on the other side the end user. pip sits between the indexes and the end user.

Figure: The container view of pip

Inside the container are the local repository and the core. The local repository consists of all packages currently installed on the machine. The core fetches packages from outside pip, manages them (downloading, updating and deleting) and provides the interface for the user. The next section (components view) goes more in depth into the core of pip.
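To make the "local repository" more tangible, the sketch below lists the installed distributions using the standard library's importlib.metadata (Python 3.8+). It is not pip's own code (pip has its own internal metadata layer), and the function name installed_packages is hypothetical; it simply reads the same on-disk package metadata that the container view calls the local repository.

```python
from __future__ import annotations

from importlib.metadata import distributions


def installed_packages() -> dict[str, str]:
    """Map each installed distribution's name to its version.

    The data comes from the *.dist-info (or older *.egg-info) metadata
    on disk, i.e. the "local repository" of the container view.
    """
    packages: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.version
        if name and version:  # skip entries with incomplete metadata
            packages[name] = version
    return packages


if __name__ == "__main__":
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```

Note that this lookup needs no network access, which is why local-only operations keep working when an index is unreachable.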

Components view

The components view of a project shows the structural decomposition of the software along with the dependencies between the components. The figure below shows the components of the core of pip and the dependencies between each component.

Figure: The components view of pip

The core of pip can be divided into four main components, all written in Python. The first is the command line interface (CLI), through which the user interacts with the application; all of pip's functionality is accessed through CLI commands. The second is the configuration file handler, which, as the name suggests, handles pip's configuration files and the settings they contain, such as the desired version control settings. The third is the resolver, which resolves dependency issues between packages, for example a package that requires a specific version of another package. The last is the package finder/index, which determines which packages to install and where to install them from.
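To illustrate the resolver's job, here is a deliberately tiny backtracking resolver over a hypothetical in-memory index. pip's real resolver is far more involved; the INDEX data, the resolve function and the "set of allowed versions" requirement format are all simplifying assumptions. The sketch tries the newest allowed version of a package first and falls back to older ones when a conflict appears.

```python
from __future__ import annotations

# Hypothetical in-memory index: package -> version -> {dependency: allowed versions}.
INDEX: dict[str, dict[str, dict[str, set[str]]]] = {
    "app": {"1.0": {"lib": {"1.0", "2.0"}, "util": {"1.0"}}},
    "lib": {
        "1.0": {"util": {"1.0"}},
        "2.0": {"util": {"2.0"}},  # the newest lib conflicts with app's util requirement
    },
    "util": {"1.0": {}, "2.0": {}},
}


def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(
    constraints: dict[str, set[str]], pinned: dict[str, str] | None = None
) -> dict[str, str] | None:
    """Pin one version per package so every constraint holds, or return None."""
    pinned = pinned or {}
    todo = [name for name in constraints if name not in pinned]
    if not todo:
        return pinned  # every constrained package has a version: done
    name = todo[0]
    candidates = sorted(
        (v for v in INDEX.get(name, {}) if v in constraints[name]),
        key=version_key,
        reverse=True,  # try the newest allowed version first
    )
    for version in candidates:
        new_constraints = dict(constraints)
        conflict = False
        for dep, allowed in INDEX[name][version].items():
            merged = new_constraints.get(dep, allowed) & allowed
            if not merged or (dep in pinned and pinned[dep] not in merged):
                conflict = True
                break
            new_constraints[dep] = merged
        if not conflict:
            result = resolve(new_constraints, {**pinned, name: version})
            if result is not None:
                return result
    return None  # no candidate works here; the caller backtracks
```

For example, resolve({"app": {"1.0"}}) skips lib 2.0 (it needs util 2.0, while app needs util 1.0) and settles on lib 1.0 and util 1.0.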

The core of pip is connected to the local repository, which, as mentioned in the previous section, holds the packages installed on the machine. The core needs access to it to determine which packages need updates and which packages depend on each other.

The dependencies between components are also visible in the figure above. The components within the core depend on each other. The CLI waits for a user command that triggers some operation, which makes it dependent on the user. Commands such as installing packages require all other components and introduce a dependency on the online repository: the configuration file handler needs to determine where the packages will be installed, the resolver needs to search for possible conflicts with the installed packages, and the package finder/index needs to find the specific package to install. Finding that package requires searching PyPI[1] or other indexes, hence the dependency outside the application shown in the figure. The next section discusses how the components interact with each other.
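pip's configuration files use an INI format in which a [global] section applies to every command and a section named after a command (for example [install]) applies only to that command, overriding the global values. The sketch below shows that precedence with the standard library's configparser. The effective_options function and the sample values are hypothetical, and real pip additionally merges several configuration files, environment variables and command line options.

```python
from __future__ import annotations

import configparser

# Hypothetical pip-style configuration file contents.
SAMPLE_CONFIG = """
[global]
timeout = 60
index-url = https://pypi.org/simple

[install]
timeout = 10
"""


def effective_options(config_text: str, command: str) -> dict[str, str]:
    """Merge [global] options with the section for ``command``.

    Options in the command-specific section override global ones.
    """
    # interpolation=None keeps values such as URLs containing '%' verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    options: dict[str, str] = {}
    for section in ("global", command):  # later sections win
        if parser.has_section(section):
            options.update(parser.items(section))
    return options
```

With this sample, the install command sees a timeout of 10, while any other command falls back to the global value of 60.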

Connectors view

The connectors view depicts how the components are interconnected and which connectors are chosen for these connections. As the figure in the previous section shows, all inner components of the core are interconnected. Because all components are written in Python, they integrate easily with one another: each component is fairly straightforward and calls other components directly when needed. No special connectors, such as message queues, are used, since everything in the application is driven by a single command, and each command performs sequential actions on different components.
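Because the connectors are ordinary in-process function calls, an install command can be pictured as a short chain of calls from one component to the next. The sketch below is hypothetical: the component functions are stubs with made-up names and fixed data, not pip's internal API. It only illustrates the synchronous, sequential call chain described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: str


# Hypothetical stub components; each one is an ordinary function.
def load_configuration() -> dict[str, str]:
    return {"index-url": "https://pypi.org/simple"}


def find_candidates(name: str, index_url: str) -> list[Candidate]:
    # A real finder would query index_url; this stub returns fixed data.
    return [Candidate(name, "1.0"), Candidate(name, "2.0"), Candidate(name, "1.5")]


def choose_best(candidates: list[Candidate]) -> Candidate:
    # Stand-in for the resolver: simply pick the highest version.
    return max(candidates, key=lambda c: tuple(int(p) for p in c.version.split(".")))


def install(name: str) -> str:
    # Connector = direct, synchronous call; no queue or event bus in between.
    config = load_configuration()
    candidates = find_candidates(name, config["index-url"])
    best = choose_best(candidates)
    return f"would install {best.name}=={best.version}"
```

Each step blocks until the previous one returns, which keeps the control flow easy to follow at the cost of any concurrency between components.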

Development view

The development view of a project shows the static organisation of the software and a mapping between the logical view of a system and the code[2]. This section describes what the components look like and how they are connected from a development point of view, with the goal of describing the decisions that were made when organising the software.

The internal code can be divided into multiple components, which are described in the components view. Some of these components make use of dependencies in the vendor code. Connecting components with each other, or using vendor code, is straightforward: it only takes a simple import at the top of a Python file. Within the internal code, only one component depends on an external resource: the PyPI Simple API. The package finder component uses this API to retrieve the index pages for given package names.
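The Simple API is specified in PEP 503: for each project, the index serves an HTML page at a URL built from the normalized project name (for PyPI, https://pypi.org/simple/<name>/), containing one link per distribution file. The sketch below uses only the standard library and is not pip's actual implementation; it normalizes a project name as PEP 503 prescribes and extracts the file links from such a page. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin


def normalize(name: str) -> str:
    """Normalize a project name as specified in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(index_url: str, name: str) -> str:
    """URL of a project's page, e.g. https://pypi.org/simple/foo-bar/."""
    return urljoin(index_url.rstrip("/") + "/", normalize(name) + "/")


class LinkCollector(HTMLParser):
    """Collect (filename, absolute URL) pairs from <a> tags on a project page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data: str) -> None:
        if self._href is not None and data.strip():
            # Relative hrefs are resolved against the page URL.
            self.links.append((data.strip(), urljoin(self.page_url, self._href)))
            self._href = None

    def handle_endtag(self, tag: str) -> None:
        if tag == "a":
            self._href = None


def parse_project_page(html: str, page_url: str) -> list[tuple[str, str]]:
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Fetching the page itself (for example with urllib.request) is left out so the sketch stays offline; that network call is exactly the run-time dependency on the index discussed in the run time view.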

For some components, the maintainers decided to write the code themselves rather than use an existing dependency. The resolver component is an example, as described in pip's documentation[3]: using the already existing dependency would have reduced performance and added an extra dependency on C.

To build and use pip from source instead of the version bundled with Python, developers can follow the instructions in pip's “getting started guide”[4], which consists of a few simple commands.

Run time view

The run time view describes how components interact with each other during run time and which dependencies are important. Whereas the previous section described the interaction between components from a development point of view, this section describes it from a run time point of view. As described in earlier sections, the CLI is the entry point for user input at run time.

As shown in the figure below, every process in pip starts with a user giving input to the CLI. Once the CLI parser has parsed this input, the CLI interacts with other components, which convert the parsed input into concrete actions that pip then executes. Because pip is a plain Python program that performs these steps sequentially, no complex processes are needed to realise this interaction. The only run time dependencies, both needed by the package finder component, are a network connection and an available Python Package Index. Without them, users cannot install or update packages, which makes these two run time dependencies very important for pip. A broad overview of the flow of pip at run time can be found in the pip overview documentation[5].

Figure: General flow of pip during run time
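The "parse first, then dispatch to a component" flow can be sketched with the standard library's argparse: the parser turns raw arguments into a structured object, and a dispatch table hands that object to the code that carries out the command. pip itself builds its CLI on optparse with its own command classes, so the program name and handler functions below are hypothetical stand-ins.

```python
from __future__ import annotations

import argparse
from collections.abc import Callable


def handle_install(args: argparse.Namespace) -> str:
    # Hypothetical handler: would call the finder, resolver and installer.
    return f"install {' '.join(args.packages)}"


def handle_list(args: argparse.Namespace) -> str:
    # Hypothetical handler: would only read the local repository.
    return "list installed packages"


HANDLERS: dict[str, Callable[[argparse.Namespace], str]] = {
    "install": handle_install,
    "list": handle_list,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pip-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    install = sub.add_parser("install")
    install.add_argument("packages", nargs="+")
    sub.add_parser("list")
    return parser


def main(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)  # 1. the CLI parses the user input
    return HANDLERS[args.command](args)  # 2. parsed input becomes a concrete action
```

For instance, main(["install", "requests"]) routes to handle_install, whereas main(["list"]) never touches the network-dependent components.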

To improve run time performance, pip caches the results of some functions in the package finder component. One maintainer noticed that a few of these functions depend only on a few immutable parameters. As described in the pull request that introduced this change[6], pip can therefore cache these functions to speed up finding the correct packages. At run time, pip can reuse cached results for packages it has already looked up, which according to the pull request improved the performance of installing packages by 50 percent[6]. However, this introduced a new issue, since the PackageFinder class might be mutable. A new issue was therefore opened to make the cached function static, so that it depends only on the state of PackageFinder rather than on the object itself. This resulted in a major refactor, since every function called by the cached function must also be static or be called on an immutable object.
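The trade-off described above can be reproduced with the standard library's functools.lru_cache, which keys its cache on the (hashable) call arguments. In the sketch below, MutableFinder caches a method of a mutable object and keeps returning a stale answer after its data changes, while find_all_versions caches a module-level function whose inputs are all immutable, in the spirit of the refactor described above. Both classes and the index data are hypothetical simplifications, not pip's PackageFinder.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass


class MutableFinder:
    """Hypothetical finder whose state can change after results are cached."""

    def __init__(self, index: dict[str, tuple[str, ...]]) -> None:
        self.index = index

    @functools.lru_cache(maxsize=None)
    def find_all(self, name: str) -> tuple[str, ...]:
        # Cache key is (self, name): later edits to self.index are never seen.
        return self.index.get(name, ())


@dataclass(frozen=True)
class FinderState:
    """Immutable, hashable snapshot of the data the lookup depends on."""

    index: tuple[tuple[str, tuple[str, ...]], ...]


@functools.lru_cache(maxsize=None)
def find_all_versions(state: FinderState, name: str) -> tuple[str, ...]:
    # Every argument is immutable, so a cached answer cannot go stale.
    for project, versions in state.index:
        if project == name:
            return versions
    return ()
```

Caching a method also keeps every instance it was called on alive for as long as the cache exists, another reason to move cached logic into a function over immutable inputs.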

Architecture and key quality attributes

There are 12 main quality attributes, or non-functional requirements, that are most widely discussed, as described by A. Silva[7]. Many more exist, but we mainly discuss the few that are most prominently visible in pip.

Since pip is an open-source project, much of its focus is on maintainability. The maintainers want the codebase to be as maintainable as possible, so that collaborators from all over the world can make small and large changes while the project as a whole remains easy to maintain. This is achieved mainly through thorough documentation, extensive testing, and continuous integration checks on pull requests that verify code quality. Contributors also cannot simply merge new code into the project: a merge requires extensive analysis and review by core collaborators (often several of them). Additionally, pip's component-based architecture, described above, further eases maintainability.

pip is also well known for its ease of use, small size and completeness. It is a simple command line installer whose most commonly used command (pip install <package>) consists of only three words. Installing pip takes only a couple of seconds, even on older devices, while it still offers additional commands for more advanced users. This makes pip extremely usable, portable and functional.

Since pip is purely a bridge script between PyPI and Python that runs locally, it has no downtime of its own; any downtime depends entirely on the servers hosting the packages for PyPI. If PyPI were down, pip would temporarily be less useful, as install, search and other commands that require access to the PyPI servers would not work. However, commands such as deleting packages or listing currently installed packages would still work, since pip is merely a script run locally on the user's device.

References


  1. Python Package Index, https://pypi.org/

  2. Pautasso, C. (2020). Software Architecture visual lecture notes.

  3. pip documentation of internals, https://pip.pypa.io/en/latest/development/architecture/anatomy/

  4. pip getting started guide, https://pip.pypa.io/en/latest/development/getting-started/

  5. pip overview, https://pip.pypa.io/en/latest/development/architecture/overview/

  6. pip pull request introducing caching, https://github.com/pypa/pip/pull/9078

  7. Silva, A. (2017, December 27). How to Write Meaningful Quality Attributes for Software Development. https://www.codementor.io/@antoniopfesilva/how-to-write-meaningful-quality-attributes-for-software-development-ez8y90wyo

Pip
Authors
Martin Li
Quentin Lee
Thijmen Langendam
Wang Hao Wang