DESOSA 2021

pip - Variability analysis

The previous three essays described the product vision, architecture and quality assessment of pip. This essay discusses the variability features of pip. pip is highly configurable, yet it runs on all major operating systems and on every Python 3 version it supports. This makes it interesting to take a closer look at pip's variability features and at how its developers manage them. This essay discusses how pip handles different hardware and operating systems, and how users can configure commands to suit their personal needs.

Variability model

This section describes the variability model of pip. Software variability is the ability of a software system to be efficiently extended, changed, customised or configured for use in a particular context [1].

Variability features and their limitations

Software variability occurs at three levels: platform, hardware and software, and all three apply to pip. pip works on the three major operating systems, Windows, macOS and Linux, which is platform variability. Every computer has different hardware (an Intel or AMD CPU, a different amount of RAM, et cetera), so pip also has to deal with hardware variability. Finally, pip itself should be configurable, which results in software variability. The latter is discussed in the most detail, since it is the most prominent kind of variability in pip.

Hardware and platform variability features

pip has many variability features. One of the main ones is that pip is compatible with all kinds of hardware and operating systems. Since pip does not execute heavy processes, it can run on practically any hardware device, as long as the operating system has Python installed. The only hardware-related limitation is that some commands, such as installing from a remote index, require a network connection.

Another variability feature of pip is its compatibility with different versions of Python. At the time of writing (pip 21.0.1), pip supports Python 3.6 and later. Support for Python 2 (and for Python 3.5) was dropped in pip 21.0, but older versions of pip still support Python 2, so users can roll back to those versions. The code base of pip and the software quality processes described in the previous essay [2] show that pip runs its tests against multiple versions of Python 3 in its Continuous Integration (CI) pipeline. It can therefore be expected that recent versions of Python 3 will remain supported for a long time.

Software variability features

As mentioned in an earlier essay [3], pip retrieves packages from the Python Package Index (PyPI) by default, but it is not limited to that index. This is a huge advantage, since users are not restricted to the packages available on one index. Users can point pip to a different index with the --index-url parameter, which replaces PyPI, or add indexes next to it with --extra-index-url. The index can be an online repository similar to PyPI or a local directory on the user's computer laid out in the same format, referenced with a file:// URL.
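To make the difference between the two index parameters concrete, the sketch below builds the argument list for an install command. The helper build_install_command and the example URLs are hypothetical; only the --index-url and --extra-index-url flags and the `python -m pip install` invocation come from pip itself.

```python
import sys
from typing import List, Optional, Sequence


def build_install_command(
    package: str,
    index_url: Optional[str] = None,
    extra_index_urls: Sequence[str] = (),
) -> List[str]:
    """Return the argument list for `python -m pip install` (hypothetical helper)."""
    # Run pip through the current interpreter so the package lands in its environment.
    cmd = [sys.executable, "-m", "pip", "install"]
    if index_url is not None:
        # --index-url replaces the default base index (PyPI).
        cmd += ["--index-url", index_url]
    for url in extra_index_urls:
        # --extra-index-url adds an index that is searched next to the base index.
        cmd += ["--extra-index-url", url]
    cmd.append(package)
    return cmd


if __name__ == "__main__":
    command = build_install_command(
        "SomePackage",
        index_url="file:///home/user/simple",  # hypothetical local PEP 503 directory
        extra_index_urls=["https://example.org/simple"],  # hypothetical extra index
    )
    print(command)
    # subprocess.run(command, check=True) would perform the actual installation.
```

Building a list rather than a shell string avoids quoting problems when URLs contain special characters.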

Building on the previous feature, pip also supports installing directly from version control systems such as Git and Subversion (svn) [4]. For example, a package can be installed from a Git repository with the following command: pip3 install git+ssh://git.example.com/MyProject#egg=MyProject. In this command, git and ssh can be replaced by other supported version control systems and network protocols. The limitation of this functionality is that pip supports only a few version control systems (Git, Mercurial, Subversion and Bazaar) and only a fixed set of protocols for each of them.
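The structure of such a requirement, `<vcs>+<protocol>://<location>#egg=<project>`, can be sketched as a small builder that rejects combinations it does not know. The protocol table is an illustrative subset chosen for this sketch, not pip's authoritative list, and vcs_requirement is a hypothetical helper.

```python
from typing import Dict, Set

# Illustrative subset of the protocols pip accepts per VCS; see pip's VCS
# support documentation for the authoritative list.
SUPPORTED_PROTOCOLS: Dict[str, Set[str]] = {
    "git": {"https", "ssh", "file"},
    "hg": {"https", "ssh", "file"},
    "svn": {"https", "ssh", "svn"},
    "bzr": {"https", "ssh", "sftp"},
}


def vcs_requirement(vcs: str, protocol: str, location: str, project: str) -> str:
    """Compose a `<vcs>+<protocol>://<location>#egg=<project>` requirement."""
    protocols = SUPPORTED_PROTOCOLS.get(vcs)
    if protocols is None:
        raise ValueError(f"unsupported version control system: {vcs!r}")
    if protocol not in protocols:
        raise ValueError(f"protocol {protocol!r} is not in this sketch's list for {vcs}")
    return f"{vcs}+{protocol}://{location}#egg={project}"


if __name__ == "__main__":
    print(vcs_requirement("git", "ssh", "git.example.com/MyProject", "MyProject"))
```

The explicit lookup table mirrors the limitation described above: only known VCS and protocol pairs are accepted.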

Variability in user commands

pip can install packages in different ways. The two main approaches are to specify the packages in the command itself, or to list them in a requirements file. Specifying a package in the command looks like this: pip3 install <PACKAGENAME>. Installing the packages listed in a requirements.txt file looks like this: pip3 install -r requirements.txt. If a subcommand is mistyped, pip suggests the closest matching command (for example, "maybe you meant install").
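A requirements file is essentially one requirement per line with optional comments. The simplified reader below illustrates this; it is not pip's parser. It strips comments the way pip does ('#' at the start of a line or after whitespace, so URL fragments such as #egg= survive) but, as a stated simplification, skips option lines like -r, -e or --index-url and ignores line continuations.

```python
import re
from typing import List

# A comment is '#' at the start of the line or preceded by whitespace.
COMMENT = re.compile(r"(^|\s+)#.*$")


def read_requirements(text: str) -> List[str]:
    """Return the requirement specifiers in requirements-file text (simplified)."""
    requirements = []
    for raw in text.splitlines():
        line = COMMENT.sub("", raw).strip()
        if not line:
            continue  # blank or comment-only line
        if line.startswith("-"):
            continue  # options such as -r, -e or --index-url are skipped here
        requirements.append(line)
    return requirements


EXAMPLE = """
# project dependencies
requests>=2.0   # HTTP client
SomeProject >=1.2,<2.0
--index-url https://example.org/simple
"""

if __name__ == "__main__":
    print(read_requirements(EXAMPLE))
```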

Users can also choose which version of a package to install. With the command pip3 install SomePackage==1.0.4, pip installs version 1.0.4 of that package. The same can be achieved through a requirements file as mentioned above, where requirement specifiers determine which versions of a package are acceptable. Besides exact versions, requirement specifiers can express ranges such as SomeProject >=1.2,<2.0, for which pip installs the newest version that is at least 1.2 and lower than 2.0. A limitation of choosing package versions is that a package may depend on versions of other packages that conflict with the user's requirement specifiers. In that case the resolver refuses to install the packages and returns an error message, and the user has to resolve the conflict manually.
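How a range such as >=1.2,<2.0 selects a version can be sketched with tuple comparisons. This is a deliberately simplified stand-in for PEP 440 matching (which pip implements through the packaging library): it only handles purely numeric versions and the six plain comparison operators, and the helper names are hypothetical.

```python
import operator
from typing import Callable, Dict, Iterable, Optional, Tuple

Version = Tuple[int, ...]

# Only plain comparison operators; PEP 440 also defines ~=, === and wildcards.
OPERATORS: Dict[str, Callable[[Version, Version], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Version:
    """Parse a purely numeric release such as '1.0.4' into (1, 0, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a: Version, b: Version) -> Tuple[Version, Version]:
    # Treat missing components as zero so that 1.2 == 1.2.0.
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, specifier: str) -> bool:
    """Check a version against comma-separated clauses such as '>=1.2,<2.0'."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # OPERATORS lists two-character operators first, so '>=' wins over '>'.
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                left, right = _pad(candidate, parse_version(clause[len(symbol):]))
                if not compare(left, right):
                    return False  # every clause must hold
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def best_match(available: Iterable[str], specifier: str) -> Optional[str]:
    """Pick the newest available version that satisfies the specifier."""
    matches = [v for v in available if satisfies(v, specifier)]
    return max(matches, key=parse_version, default=None)
```

For example, best_match(["1.1", "1.2", "1.9.3", "2.0"], ">=1.2,<2.0") picks "1.9.3", the newest version inside the range.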

pip also lets users opt in to new functionality or re-enable deprecated functionality with the --use-feature <feature> and --use-deprecated <feature> parameters. This allows users to fall back to the old behaviour of a feature if an issue occurs with its current version. The main use case is replacing the current dependency resolver with the legacy resolver (--use-deprecated=legacy-resolver) when the current resolver causes dependency issues [5].
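The idea behind --use-deprecated can be sketched as a gate: the flag values arrive as a list (one entry per occurrence of the flag) and decide which implementation is used. This is a hypothetical illustration, not pip's code; the set of accepted values and the returned resolver names are assumptions of the sketch.

```python
from typing import Iterable

# Values this sketch accepts for --use-deprecated (assumption; pip's own
# list differs between versions).
KNOWN_DEPRECATED = {"legacy-resolver"}


def choose_resolver(use_deprecated: Iterable[str] = ()) -> str:
    """Return which resolver to run, given the --use-deprecated values."""
    values = set(use_deprecated)
    unknown = values - KNOWN_DEPRECATED
    if unknown:
        raise ValueError(f"unknown deprecated feature(s): {sorted(unknown)}")
    # Opting back into deprecated behaviour swaps the implementation;
    # without the flag the current resolver is used.
    return "legacy" if "legacy-resolver" in values else "new"


if __name__ == "__main__":
    print(choose_resolver())                     # no flag given
    print(choose_resolver(["legacy-resolver"]))  # --use-deprecated=legacy-resolver
```

Validating the values up front means a typo in a feature name fails loudly instead of silently falling back to the default.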

pip also allows the user to compute the hash of a local package archive with pip3 hash [options] <file> .... This is a convenient way to obtain a hash digest that protects against remote tampering, for example when installing in hash-checking mode. pip computes the digest with Python's hashlib library and supports several algorithms (sha256, sha384 and sha512), but it deliberately rejects weaker hash functions such as md5, sha1 and sha224 to avoid a false sense of security.
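What pip hash does can be approximated with hashlib directly. The minimal sketch below (not pip's implementation) streams a file through a strong hash function and prints the result in the --hash=<algorithm>:<digest> form that pip hash outputs for use in requirements files. hash_archive is a hypothetical name.

```python
import hashlib
import os
import tempfile

# The algorithms `pip hash` accepts; weaker ones are refused.
STRONG_HASHES = ("sha256", "sha384", "sha512")


def hash_archive(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return a '--hash=<algorithm>:<hexdigest>' line for the file at path."""
    if algorithm not in STRONG_HASHES:
        raise ValueError(f"refusing weak or unknown hash algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as archive:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return f"--hash={algorithm}:{digest.hexdigest()}"


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False, suffix=".tar.gz") as tmp:
        tmp.write(b"example archive contents")
    try:
        print(hash_archive(tmp.name))
    finally:
        os.remove(tmp.name)
```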

All of the variability features above rely on user-defined parameters, and specifying the same parameters for every command would be a huge hassle. pip offers two solutions: users can set the parameters as environment variables, or write them in configuration files. Both let the user predefine parameters, so they do not have to be passed on the command line every time. An environment variable must follow the format PIP_<UPPER_LONG_NAME>, where the long option name is written in upper case with dashes replaced by underscores; for example, --index-url becomes PIP_INDEX_URL. Configuration files can be created at three levels: global, user and virtualenv [6] level. This is very useful, since a user can easily create a different configuration for each project. When a user does not want to use these predefined parameters, the --isolated parameter makes pip ignore environment variables and user configuration.
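The naming rule for environment variables is mechanical, so it can be written down as a short function. The sketch below derives the variable name from a long option and looks it up; env_var_name and option_from_env are hypothetical helpers, and the environment is passed in as a mapping so it can be tested without touching the real one.

```python
import os
from typing import Mapping, Optional


def env_var_name(long_option: str) -> str:
    """Map a long option such as '--index-url' to its PIP_<UPPER_LONG_NAME> variable."""
    name = long_option.lstrip("-")
    # Upper-case the name and replace dashes with underscores.
    return "PIP_" + name.upper().replace("-", "_")


def option_from_env(long_option: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value predefined for an option in the environment, if any."""
    return environ.get(env_var_name(long_option))


if __name__ == "__main__":
    print(env_var_name("--index-url"))
    print(option_from_env("--timeout", {"PIP_TIMEOUT": "60"}))
```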

Overall, the variability features can be divided into four subcategories. The first two are the hardware and platform variability features: pip supports any hardware that runs an operating system with a supported version of Python 3, and it supports all major operating systems, with its manual listing the commands for each of them [7]. The other two subcategories are the variability of commands through parameters and through user configuration. Many commands accept parameters that tweak their behaviour, and users can set default values for these parameters in configuration files or environment variables.

Feature model

The feature model is a tree-structured model that displays the different features of pip and their relations [8]. It shows how the variability features relate to the different components of pip and how a user can configure them. The figure below depicts the feature model of pip. All software-specific variability features relate either to the configuration of commands or to the specification of indexes. The hardware and platform variability features are excluded from the figure.

Figure: pip’s feature model

Variability management

As mentioned in the section on features and limitations, the latest version of pip has dropped support for Python 2 and therefore only works with Python 3. pip does not pick a Python version by itself: it runs with the interpreter it was installed into and installs packages for that interpreter. On systems with several interpreters, running python3 -m pip makes the target explicit. With older versions of pip, which still support Python 2, users typically selected the Python version to install packages for through the two commands pip and pip3, for Python 2 and Python 3 respectively.
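Because pip targets the interpreter that runs it, checking which environment it will install into comes down to inspecting sys.executable and sys.version_info. The sketch below assumes a minimum of Python 3.6, matching pip 21.x at the time of writing; is_supported and pip_command are hypothetical helpers.

```python
import sys
from typing import List, Sequence

# Assumption for this sketch: pip 21.x requires Python 3.6 or newer.
MINIMUM_PYTHON = (3, 6)


def is_supported(version: Sequence[int], minimum: Sequence[int] = MINIMUM_PYTHON) -> bool:
    """Compare (major, minor) against the minimum supported Python version."""
    return tuple(version[:2]) >= tuple(minimum)


def pip_command(python: str = sys.executable) -> List[str]:
    """`<python> -m pip` always installs into the environment of <python>."""
    return [python, "-m", "pip"]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("supported by pip 21.x:", is_supported(sys.version_info))
    print("invoke pip as:", " ".join(pip_command()))
```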

Each variability feature is exposed through command-line parameters, and there are two ways to manage these features. The first is to specify them as parameters on the command line. This is cumbersome, however, since users often specify the same parameters for every command. Therefore, pip also allows users to manage these variability features through configuration files.
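pip's configuration files use an INI layout with a [global] section and optional per-command sections such as [install], where a command section takes priority over [global] for that command. The sketch below reads such a file with Python's configparser to show the lookup order; the option values are made up, and lookup is a hypothetical helper rather than pip's own loader.

```python
import configparser
from typing import Optional

EXAMPLE_CONFIG = """
[global]
timeout = 60
index-url = https://example.org/simple

[install]
timeout = 10
"""


def lookup(config_text: str, command: str, option: str) -> Optional[str]:
    """Return an option value, preferring the command's section over [global]."""
    # Interpolation is disabled so values containing '%' (e.g. in URLs) are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    for section in (command, "global"):
        if parser.has_option(section, option):
            return parser.get(section, option)
    return None


if __name__ == "__main__":
    print(lookup(EXAMPLE_CONFIG, "install", "timeout"))   # from [install]
    print(lookup(EXAMPLE_CONFIG, "download", "timeout"))  # falls back to [global]
```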

Implementing variability

Realising the variability in pip's software system requires some code. As mentioned before, hardware, platform and software variability apply to pip. Hardware variability requires no additional programming, since the operating system and the Python interpreter abstract the hardware away. To support multiple platforms, pip depends on nothing but Python 3 itself: the libraries it relies on are vendored, i.e. shipped inside pip. Handling configuration files is straightforward: pip looks for configuration files in a fixed set of locations (global, user and virtualenv level) and sets the parameters according to their contents. pip then checks whether the user has specified parameters in the command itself. If so, these override the values from the configuration files for that command.
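The override order can be summarised as: configuration files are read from global to user to virtualenv (site) level, each overriding the previous one; environment variables override the files; and command-line parameters override everything. The sketch below merges plain dictionaries in that order and, in isolated mode, skips user configuration and environment variables. It is an illustrative model with a hypothetical effective_options function, not pip's Configuration class.

```python
from typing import Dict, Mapping

# Later levels override earlier ones.
LEVELS = ("global", "user", "site")


def effective_options(
    config_files: Mapping[str, Mapping[str, str]],
    environ: Mapping[str, str],
    cli: Mapping[str, str],
    isolated: bool = False,
) -> Dict[str, str]:
    """Merge option sources: config levels < environment < command line."""
    options: Dict[str, str] = {}
    for level in LEVELS:
        if isolated and level == "user":
            continue  # --isolated ignores user configuration
        options.update(config_files.get(level, {}))
    if not isolated:
        for key, value in environ.items():
            if key.startswith("PIP_"):
                # PIP_INDEX_URL -> index-url
                options[key[len("PIP_"):].lower().replace("_", "-")] = value
    options.update(cli)  # explicit parameters always win
    return options


if __name__ == "__main__":
    files = {"global": {"timeout": "60"}, "user": {"timeout": "30"}}
    print(effective_options(files, {"PIP_TIMEOUT": "5"}, {"timeout": "1"}))
```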

Since none of these variability features requires a heavy mechanism, they can all be bound at run time, when pip starts and reads its configuration files, environment variables and command-line parameters, at little cost. It should also not be too difficult to add new variability features to pip. As shown in the install.py file [9], developers can add a command-line parameter with the add_option method, which introduces a new variability feature to a command.
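pip's commands are built on Python's optparse module, so adding a parameter amounts to one more add_option call on an option group. The sketch below reproduces that pattern in a standalone parser; the --example-feature flag is hypothetical, and the --index-url default is PyPI's simple index.

```python
import optparse


def build_parser() -> optparse.OptionParser:
    """Build a parser whose command options live in a separate option group."""
    parser = optparse.OptionParser(prog="install-sketch")
    cmd_opts = optparse.OptionGroup(parser, "Command Options")
    # A string parameter with a default value.
    cmd_opts.add_option(
        "--index-url",
        dest="index_url",
        metavar="URL",
        default="https://pypi.org/simple",
        help="Base URL of the package index.",
    )
    # A new variability feature is one more add_option call (hypothetical flag).
    cmd_opts.add_option(
        "--example-feature",
        dest="example_feature",
        action="store_true",
        default=False,
        help="Hypothetical flag added for illustration.",
    )
    parser.add_option_group(cmd_opts)
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args(["--example-feature", "SomePackage"])
    print(options.example_feature, options.index_url, args)
```

Keeping command-specific options in their own group also keeps the generated help output organised as more features are added.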

Adding new variability features to a command does, however, increase its complexity. Ideally, every variability feature is tested in combination with every other feature, and these tests pass on all supported operating systems and Python 3 versions. pip does have tests for these variability features, but, as mentioned in the previous essay [2], they result in CI/CD pipelines that take a few hours to complete. This slows down the development lifecycle of pip, especially when contributors open pull requests (PRs).
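The growth of such a test matrix can be made concrete with itertools.product. With the illustrative values below (three operating systems and four Python versions, not pip's actual CI configuration), n independent on/off features give 12 × 2^n runs, so every added feature doubles the matrix. build_matrix is a hypothetical helper.

```python
import itertools
from typing import Iterator, Sequence, Tuple

# Illustrative values, not pip's actual CI matrix.
OPERATING_SYSTEMS = ("windows", "macos", "linux")
PYTHON_VERSIONS = ("3.6", "3.7", "3.8", "3.9")


def build_matrix(features: Sequence[str]) -> Iterator[Tuple[str, str, Tuple[bool, ...]]]:
    """Yield every (os, python, feature toggles) combination to test."""
    # Each boolean feature can be on or off: 2 ** len(features) toggle combinations.
    toggles = itertools.product((False, True), repeat=len(features))
    return itertools.product(OPERATING_SYSTEMS, PYTHON_VERSIONS, toggles)


if __name__ == "__main__":
    for n in range(1, 7):
        runs = sum(1 for _ in build_matrix(["feature"] * n))
        print(f"{n} features -> {runs} runs")
```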

References


  1. A taxonomy of variability realization techniques, https://onlinelibrary-wiley-com.tudelft.idm.oclc.org/doi/abs/10.1002/spe.652

  2. pip - Quality and Technical Debt, https://2021.desosa.nl/projects/pip/posts/2021-03-22-quality-and-technical-debt/

  3. pip - The architecture, https://2021.desosa.nl/projects/pip/posts/2021-03-15-pips-architecture/

  4. pip install VCS support documentation, https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support

  5. --use-feature documentation, https://pip.pypa.io/en/stable/user_guide/#fixing-conflicting-dependencies

  6. virtualenv documentation, https://virtualenv.pypa.io/en/latest/

  7. pip documentation v21.0.1, https://pip.pypa.io/en/stable/

  8. Feature-Oriented Software Product Lines Concepts and Implementation, https://link-springer-com.tudelft.idm.oclc.org/book/10.1007/978-3-642-37521-7

  9. pip install.py file, https://github.com/pypa/pip/blob/master/src/pip/_internal/commands/install.py

Pip
Authors
Martin Li
Quentin Lee
Thijmen Langendam
Wang Hao Wang