NetworkX - Architectural Style
In the second blog of this series, we explore the architectural elements and relationships apparent within NetworkX, peering through the lens of various different architectural views orginiated from the C4 model1.
To recap, NetworkX is a python library for complex-network analysis, which has varied use-cases and stakeholders.
1. Main Architectural Style
NetworkX primarily utilises the Model View Controller (MVC) pattern. The MVC pattern is a commonly used design pattern in software architecture.
In MVC, the model represents the core functionality and data, the view displays information about the model that is displayed to the user, and the controllers are the methods that are used to modify the model. Essentially, MVC allows for an abstraction of internal methods to the user, so they are only concerned with the view and modifying the model using controllers.
In NetworkX, the models correspond to a graph type, such as a directed graph. Each graph model contains controllers, in the form of methods that facilitate any modifications (e.g inserting/removing an edge/node). Finally, the view component in NetworkX can display the graph in different forms, for example as a Matplotlib figure.
2. Containers View
Like most Python packages, the main maintainers of NetworkX push each new version to PyPI2 (python package index) through a continuous deployment(CD) pipeline. From PyPI, users can download either the stable or development version through the standard Python package manager pip
3. Additionally, customized versions of the source code for manual installation are available through GitHub and PyPI.
Due to the dependence on the scientific computing and graphics provided by the Python ecosystem, NetworkX also provides a one-time download command that contains all optional dependencies, using pip. Some dependencies, such as gdal
, also require C or C++ source codes to be compiled correctly.
By conforming to the mainstream package deployment process in the python community, the cost of building packages and dependencies by end-users themselves is greatly reduced.
What is the final (and very important) step in the deployment process? Testing! NetworkX adopts the package pytest[5] to assist in testing the unpacked source distribution or installed package.
3. Components & Connectors view
The main components that make up the functional core of NetworkX organised as follows:
- Core Classes: describe how the network is stored and accessed.
- Algorithms: commonly used complex-network algorithms
- Linear Algebra: common mathematical techniques used for graph algorithms
- Drawing: module for effective visualisation of graph for analytical purposes.
- Generators: generates graphs with particular properties from matrices, dictionaries, lists, plaintext, etc.
- ReadWrite: IO module
In addition, five types of connectors can be defined to describe the relationship between two components. The Core module is the central component that the other functional modules heavily depend on. Connectors cannot execute in parallel; an Analyse connector normally starts after a Generator calls the constructive or operational actions of the core module. Moreover, there is no need for linear execution since a Plot connector can happen anytime once the Generator finishes the call.
4 Development view
Due to length limitations, we will only show the main design decisions of NetworkX that will highlight the developers’ thought process and aim while developing the library.
4.1 Core Classes
These define the functional backbone of NetworkX. These organise storing and accessing graphs
4.1.1 Graph Types
The parent class of all the graphs is Graph
. It allows storing at most a single edge between two nodes and has the general functions for simple graph operations. Graph
is specialised into:
DiGraph
: directed graphs with self-loops, but no parallel-edges.MultiGraph
andMultiDiGraph
: to store multiple edges between two nodes (parallel edges). Both undirected and directed possible for efficient representation and access.- Ordered variants: ordered variants for the above graph types for compatibility with
python <3.6
where dictionaries are not inherently ordered4.
4.1.2 AtlasView and AdjacencyView
The data in the graph needs to be accessed consistently. AtlasView
provides a read-only view into a dict-of-dict data structure. The inner level of dict is read-write, but the outer level is read-only. It is used to access node and edge information for a general graph.
>>> G = nx.Graph() # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G.add_edge("a", "b", weight=7)
>>> G["a"]
AtlasView({'b': {'weight': 7}})
AtlasView
is specialised into AdjacencyView
which is a read-only view into a dict-of-dict-of-dict data structure. Again, the inner level of dict is read-write, but the outer levels are read-only. It can be used to access node and edge information for a MultiDiGraph
for instance.
>>> G = nx.MultiDiGraph()
.... # add nodes and edges
>>> G[1] # adjacency dict-like view keyed by neighbour to edge attributes
AdjacencyView({2: {0: {'weight': 4}, 1: {'color': 'blue'}}})
4.1.3 Degree Views
These are used for obtaining the degree information for the nodes, along with directional information (in/out).DiDegreeView
is the parent for all Degree Views used for Graph algorithms i.e. it is specialised/simplified for other graph types. It provides a dict of tuples in the form of {<node>: <degree>, … }
pairs.
>>> G.degree
DegreeView({0: 4, 1: 4, 2: 4, 3: 4, 4: 4})
Additional functionality includes read-only lookup of node degree, calling with optional parameter nbunch
(for getting only a subset of nodes) and weight
(use edge weights to compute degree). It has specializations for various graph types (directed and multigraphs) and filtering specifically in/out edges.
4.1.4 Edge Views
Edge Views can be used for getting a list of edges in a graph in the form of [ (<node_1>, <node_2>), … ]
.
>>> G = nx.path_graph(4)
>>> G.edges()
EdgeView([(0, 1), (1, 2), (2, 3)])
We see an interesting design decision where the developers chose to put OutEdgeView
at the parent for all Edge Views.
4.2 Algorithms and Linear Algebra
Now that we have a good idea of how the graph information is stored and accessed we can focus on how it is used. There are many use cases for complex network analysis and these require a plethora of algorithms that are independent of each other. Over years of iterations, the developers have been consistent with how they handle this. Every category of algorithms is coded independently without a class hierarchy, as simple python functions.
These algorithms makeup 170 python scripts with some complex or more structured algorithms divided into 17 categories5. Some examples:
- Shortest path search
- Graph cut metrics
- Graph colouring
There also exists a dedicated module, linalg
, for linear algebra, required for certain algorithms. This module is concerned with using the adjacency for solving common equations e.g. Ax = b
, getting the Laplacian 6, etc.
4.3 Diagrams
Now we know how the network information is stored, accessed and the opportunities we can exploit to use the network data. The next step is to visualise the network to get clear insights.
NetworkX enables the user to generate various layouts: how the nodes are positioned in a multidimensional space. These ‘layout strategies’ take into consideration network properties such as edge weights and node degrees. Some examples are:
fruchterman_reingold_layout
: A force-directed algorithm treating edges as springs holding nodes close, while treating nodes as repelling objects (Anti-Gravity Force).kamada_kawai_layout
: An algorithm to position nodes using a path-length cost-function.spectral_layout
: position nodes using the eigenvectors of the graph’s unnormalized Laplacian. This shows possible clusters which are an approximation of the ratio cut.
Besides the layout, the drawing functions under the drawing
module enable changing the edge width, edge colour, node size, node colour, colouring the edges and nodes according to a given value, and more.
5. Runtime view
When a Python program is executed, it is contained in a single runtime environment which is handled and managed by the python executable itself. If the started process does not utilize multithreading or multiprocessing functions, then internal data is largely shared between defined components within the process. Now, considering that NetworkX is just a library of classes and functions, the implementation of multithreading/processing is primarily left to the user, as NetworkX does not natively implement it.
However, most classes have restrictions in place for the reading and modification of data, as the python memory manager restricts access to the internal memory heap containing all objects and data structures 8. Even data structures defined by other libraries, supported by NetworkX, utilize the same approach. The only clear interaction constraints imposed by NetworkX’s architecture during runtime is that once the data is read, it cannot be altered by the class on its own (e.g. for performance or rounding floats), only when explicitly specified by the user.
A typical application scenario is to start by constructing a suitable graph type, explore the use of nodes and edges, analyze the graph, and finally output the result.
- Construct a graph object:
G=nx.Graph()
. - Operate on the nodes and edges: addition(
add_node()
,add_edge()
), deletion(remove_node()
,remove_edge()
), modification(update()
), and accessing(adjacency()
,edges()
). All operations are supposed to be atomic or result in failure. - Analyze the graph with supported algorithms.
- Visualise the resulting network.
More relevant use cases can be found on the website9, which can help users familiarize themselves with the general execution scenarios for NetworkX.
6. Realisation of key quality attributes
The architectural design present in NetworkX has enabled the realization of the key quality attributes emphasized by the project’s vision and goals10 which primarily aims at being inclusive to newcomers and senior developers alike, and be user-friendly in regard to usability with a string focus on graph data structures and algorithms. We use the ISO/IEC 2501011 definition of key quality attributes.
Considering interoperability, NetworkX aimed at providing interfaces to their various methods and functions capable of handling different data. This meant the underlying approach to data must be viewed from an object-lens which facilitates an abstraction layer for different logical operations, data castings and generators. Additionally, since this project is a python library, it must also consider other popular libraries, like NumPy, and their data structures with additional castings and converters. High interoperability allows the library to handle a variety of data as well as integrate well with other libraries. The downside is that it increases complexity with a possible hit to performance.
Usability of the library is mainly expressed through numerous functions allowing for various network manipulations, effective polymorphism, and rich documentation. While this does not directly shape underlying software architecture, it still influences how different methods and functions interact within the library. The MVC architecture enables simple and familiar usage patterns allowing users to pick and choose wanted controllers.
Maintainability and Modifiability are mainly expressed through the development approach rather than technological decisions. At the code level, NetworkX uses standard python primitives in order to improve maintainability. Additionally, they have consistent naming conventions and function interfaces. These development decisions primarily influence the time and effort required to have meaningful results, and also enable easier maintainability and modifiability considering the project is open-source.
Lastly, Supportability is reinforced by other quality attributes: Usability, Maintainability and Modifiability. Additionally, it is enhanced by thorough and consistent documentation, examples for users, and rich logging and error handling.
7. API Design
NetworkX provides a well-documented API12, that follows a variety of different API design principles13:
- Explicit interface principle: The endpoints within NetworkX are stateless, meaning subsequent API calls do not have an effect on each other.
- Information hiding: Explicit knowledge of the inner workings of each endpoint are not required to use them. The user does not need to know how to calculate the number of nodes mathematically, in order to use the
number_of_nodes()
function. - Least surprise: Consistent and clear naming and usage of endpoints. For example:
add_node()
andadd_edge()
are used to add nodes and edges to a graph respectively. Users can infer what other endpoints would be.
Furthermore, although explicit versioning per endpoint is not implemented, endpoints that are deprecated are still kept in the project to allow for backwards compatibility for projects that still utilise them, and new endpoints are created under a different name. The quality expected in API design for contributors is documented in the API documentation13.
8. Conclusion
Hope you have enjoyed exploring the architecture of NetworkX from different ‘Views’. Its robust, modular architectural design is the key to its versatility and also allows effective development changes.