In this series of four blog posts, we analyse the software architecture behind the NetworkX package. This first post describes the vision behind the NetworkX project and gives insight into the project's future aspirations.
First, we characterize the problems that NetworkX strives to solve in the domain of network analysis, and give a timeline and context of the project's inception. We then explore the problem domain, outlining the external environment related to NetworkX. Next, we give an overview of the current use cases, the stakeholders involved and the future roadmap for the project. Finally, we shed light on the ethical considerations of the project, given its open-source nature.
1. NetworkX vision
NetworkX is a Python package that allows for the creation, manipulation, and study of the structure, dynamics and functions of complex networks. The primary aim of the project is to be “the reference library for network science algorithms in Python” [1].
Moreover, NetworkX aims to provide tools to study the structure and dynamics of social, biological and infrastructure networks while providing interfaces and graphical implementations to be used in other applications.
NetworkX offers quick and easy setup of the development environment, the ability to work with large nonstandard data sets with almost no modification to user data, extensive documentation, and consistent naming of conceptually identical arguments across its functions and methods. This makes it an easy-to-use reference library for both newcomers and seasoned network science researchers.
Below is a list of the values the project promotes and fosters [2], which highlight the key attributes of the NetworkX package and community:
- Inclusive to newcomers and senior developers alike.
- Open source and community-driven.
- A strong focus on graph data structures and algorithms.
In the following section we give insight into the timeline and context in which NetworkX has operated.
2. Timeline and Context
NetworkX was first devised in May 2002 by Aric Hagberg, Dan Schult, and Pieter Swart and was finally released as a package in April 2005 [3]. The growth of NetworkX is closely connected to other open-source computation and visualization libraries, for example, SciPy and Matplotlib.
More recently, in October 2020, one of the core developers of NetworkX suggested that contributors should add more examples to the gallery to demonstrate packages that can be used in conjunction with NetworkX, such as igraph, scikit-learn and PySAL.
Another powerful network package for Python comparable to NetworkX is NetworKit [4]. NetworKit is a toolkit for large-scale network analysis with a focus on parallelism and scalability, whereas NetworkX focuses more on general graph computations. Both packages rely heavily on external contributions, and many contributors are active in both.
3. Problem analysis
Now we will perform a problem analysis on the package, to give insight into the various domains in which NetworkX operates.
3.1 The problem domain
The underlying domain model for NetworkX, in the format presented in the lecture notes by Cesare Pautasso [5], can be shown as follows:
- A graph is formed of nodes and edges
- Nodes and edges can carry values such as weightings or arbitrary data structures
- Graphs can be stored and shared in various formats such as XML, JSON and YAML
- Graphs contain many properties, such as the degree of nodes, number of edges and distance between nodes
- Algorithms can be applied to graphs to extract further information, or to improve certain aspects of the graph (e.g. edge removal)
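The domain model above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the NetworkX documentation: the node names, the `label` attribute and the weights are made up for the example.

```python
import networkx as nx

# A small undirected graph; node names and weights are illustrative only.
G = nx.Graph()
G.add_node("a", label="start")        # nodes can carry arbitrary attributes
G.add_edge("a", "b", weight=1.0)      # edges can carry weights
G.add_edge("b", "c", weight=2.5)
G.add_edge("a", "c", weight=5.0)

# Properties of the graph.
degree_of_b = G.degree["b"]           # number of edges incident to "b"
edge_count = G.number_of_edges()
# Weighted distance between two nodes.
distance_a_c = nx.shortest_path_length(G, "a", "c", weight="weight")

# A simple "improvement" step: remove the heaviest edge.
heaviest = max(G.edges(data="weight"), key=lambda edge: edge[2])
G.remove_edge(heaviest[0], heaviest[1])
```

Here the weighted distance from `"a"` to `"c"` goes through `"b"` (1.0 + 2.5) because the direct edge is heavier, and that direct edge is the one removed in the last step.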
3.2 System capabilities
The primary purpose of NetworkX is to allow users, who do not necessarily have technical experience, to easily perform large-scale, complex network analysis. The main capabilities that NetworkX offers the end user are:
- Create new networks
- Perform network analysis
- Visualize results
- Import/Export networks.
Users can use NetworkX to build a new network from the ground up by defining it programmatically. Simple networks can be built from atomic nodes and edges, while more complex networks can use Python objects as nodes (any hashable object can be a node) and attach arbitrary data to nodes and edges.
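As a rough sketch of building a network programmatically, the example below mixes plain integer nodes with object nodes. The `Station` class and its fields are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

import networkx as nx


@dataclass(frozen=True)  # frozen dataclasses are hashable, so they can be nodes
class Station:           # hypothetical class, for illustration only
    name: str
    zone: int


central = Station("Central", 1)
harbour = Station("Harbour", 2)

G = nx.Graph()
G.add_edge(1, 2)                                        # simple, atomic nodes
G.add_edge(central, harbour, line="blue", minutes=7)    # object nodes, edge data

# Edge data is looked up through the two endpoint nodes.
travel_time = G[central][harbour]["minutes"]
```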
Analysis can be easily performed once a network has been built or imported. NetworkX provides a variety of built-in algorithms to perform different analyses, such as clustering, shortest paths between two nodes and cycle detection. Additionally, the open-source nature of NetworkX makes it easy to combine with external tools. An example of this is the use of visualisation tools such as Matplotlib and Gephi to display the network for exploratory analysis.
Finally, NetworkX allows users to share networks easily. Created networks can be imported and exported in a wide variety of formats, such as edge lists and adjacency lists, or in specific formats such as JSON, GraphML and YAML. This allows the user to process their networks with other software.
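The three analyses named above might look as follows on a toy graph. This is a sketch on made-up data: a triangle (1, 2, 3) with one extra node attached to node 3.

```python
import networkx as nx

# Triangle 1-2-3 plus a pendant node 4 attached to node 3.
G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])

# Clustering coefficient per node (fraction of neighbour pairs that are linked).
clustering = nx.clustering(G)

# Shortest path between two nodes, as a list of nodes.
path = nx.shortest_path(G, source=1, target=4)

# Cycle detection: a basis of the cycles in an undirected graph.
cycles = nx.cycle_basis(G)
```

On this graph the only shortest path from node 1 to node 4 runs through node 3, and the cycle basis contains a single cycle over the triangle's three nodes.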
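A hedged sketch of three of these exchange routes is shown below: a weighted edge list, JSON via the node-link representation, and GraphML. The graph contents and the file name `example.graphml` are illustrative assumptions. Each format is written and then read back to show the round trip.

```python
import json
import os
import tempfile

import networkx as nx

G = nx.Graph()
G.add_edge("a", "b", weight=1.5)
G.add_edge("b", "c", weight=2.0)

# Edge list: one "source target weight" line per edge.
lines = list(nx.generate_edgelist(G, data=["weight"]))
from_edgelist = nx.parse_edgelist(lines, nodetype=str, data=[("weight", float)])

# JSON, using the node-link dictionary representation.
payload = json.dumps(nx.node_link_data(G))
from_json = nx.node_link_graph(json.loads(payload))

# GraphML, written to and read back from a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.graphml")
    nx.write_graphml(G, path)
    from_graphml = nx.read_graphml(path)
```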
3.3 Key quality attributes
Considering the use cases, stakeholder needs and project values, we can derive the following quality attributes, or non-functional requirements, that the system must possess:
- Robust data handling
- Persistent data structures and variables
- Immutable user data (NetworkX should not change user’s data in any way)
- Implementations that are easy to read and understand, prioritised over efficiency
- Quick setup and easy to install for the novice user
- Compatible with NumPy arrays and SciPy sparse matrices for algorithms that more naturally use arrays and matrices or where time or space requirements are significantly lower
- Thorough and consistent documentation.
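The NumPy compatibility requirement can be illustrated with a small round trip between a graph and its adjacency matrix. This is a sketch on a made-up three-node path graph. It also shows the immutability requirement in a small way, since converting the graph leaves the original untouched.

```python
import networkx as nx
import numpy as np

G = nx.path_graph(3)                            # nodes 0 - 1 - 2

# Dense adjacency matrix, with rows and columns in the given node order.
A = nx.to_numpy_array(G, nodelist=[0, 1, 2])

# Array-based computation: entry (i, j) of A @ A counts walks of length two.
two_step = A @ A

# Build a new graph back from the matrix; G itself is not modified.
H = nx.from_numpy_array(A)
```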
These non-functional requirements exemplify the following quality attribute types according to ISO/IEC 25010 [6], which the project tries to fulfil:
- Interoperability: Mainly from the perspective of other Python libraries dealing with various data structures and formats.
- Usability: This attribute is probably the most prominent, since the system aims to reduce the effort required to manipulate and study networks through ready-made methods and functions, and to let beginners start using it quickly.
- Maintainability and Modifiability: Enforced through the use of standard data structures at the lowest levels and consistent naming, which in turn allows changes and fixes to be implemented easily. Even if a breaking change is introduced, it is usually localised to a few methods or functions because of how the NetworkX code is structured.
- Supportability: Because of its focus on being user-friendly, the project aims to provide sufficient documentation, logging and informative error messages, though most of this is handled by Python directly.
3.4 Stakeholder analysis
In the original paper [7], the authors state their vision of a general-purpose tool for graph analysis and a platform for creating more advanced graph algorithms. Therefore, all stakeholders require the package's abstractions to remain consistent over the years while its functionality expands robustly.
- Primary stakeholders: The developers, both core developers and volunteer contributors, and anyone who has to deal with complex network data. NetworkX provides these stakeholders with a flexible platform to design handcrafted solutions for their particular domains while relying upon industry-standard graph algorithms and visualization techniques. A non-exhaustive list of examples includes:
- Software developers
- Data scientists
- Behavioural analysts
- Secondary Stakeholders: A non-exhaustive list of those people benefiting from the use of NetworkX:
3.5 Project Roadmap
After investigating the issues raised for the milestone of NetworkX 2.7, the inferred project road map is as follows:
- Add more graph algorithms for cuts, traversals and pathfinding.
- Implement more efficient executions of existing graph algorithms (parallelism, CUDA) while still maintaining the general-purpose nature.
- Support more graph formats for reading and writing.
- Maintain closeness to the general-purpose roots as there is another package for high-performance graph processing.
3.6 Ethical Considerations
NetworkX has a well-defined code of conduct. The code distils the community's common understanding of a collaborative, shared environment and its goals. It encourages everyone involved in the NetworkX community to be open, empathetic, collaborative, inquisitive and careful in their choice of words. This is supported by well-structured guidance on how to conduct a good review and become a reliable developer.
In line with its inclusive and cooperative values, the NetworkX community strongly advocates mentorship and kindness in volunteer work, starting from a commitment to building simple, readable implementations for the entire community, and it welcomes and honours diversity in all its forms. This fairness keeps the gap between a novice and a core developer small and keeps the focus on constructive effort towards developing ideas.
NetworkX has continued to live up to solid and reasonable standards by treating those who contribute to the community as valuable. The official NetworkX Enhancement Proposals (NXEPs) [12] have been produced to clarify responsibilities, proposals and decision-making processes. They state that decisions about the future of the project are made through discussion with all members of the community, which maintains consensus across different groups.
In addition to public discussions on contributions, NetworkX has a well-organized council to investigate and respond to complaints. The NetworkX Steering Council commits to protecting the identity of the reporter and treating the content of complaints as confidential. For any obvious breaches like personal threats or violent, sexist or racist language, NetworkX has the right to immediately disconnect the originator from their communication channels.
In conclusion, we have analysed the NetworkX project from a high-level, holistic perspective. We see that NetworkX has far-reaching effects in the modern era, facilitated by a well-thought-out software architecture, a disciplined development approach, and a strong inclination towards inclusivity. As a result of its embrace of open-source development, it has an active developer community, which gives it solid ground to improve in the future.
Staudt, Christian et al. “NetworKit: An Interactive Tool Suite for High-Performance Network Analysis”. arxiv.org. (2014).
Williams, Hywel T.P. et al. “Network analysis reveals open forums and echo chambers in social media discussions of climate change”. Global Environmental Change 32. (2015): 126-138.
Gratton, Caterina et al. “Focal Brain Lesions to Critical Locations Cause Widespread Disruption of the Modular Organization of the Brain”. Journal of Cognitive Neuroscience 24. (2012): 1275-85.
Anthony, Simon et al. “Global patterns in coronavirus diversity”. Virus Evolution 3. (2017).
López Peña, Javier et al. “A network theory analysis of football strategies”. (2012).