Stellar - Distribution Analysis

Figure: Stellar Logo

Stellar’s vision is to integrate blockchain technology with the current financial system. The primary differentiation between the two is the fact that blockchain is decentralised. It is a form of a distributed system. Thus, it is worth analysing how the distributed system works, what information it exchanges between its components, whether there is any redundant information in case of a failure, how reliable this distributed system is, and what trade-offs are made within the system.

Concrete Information Exchanged

As explained in our previous posts, Stellar comprises a decentralised peer-to-peer network, Stellar Core, and the Horizon API, which enables developers to communicate with the network. We first focus on the exchange of information between the nodes which form the Stellar Core.

During a node’s setup, the developer needs to register the node so other validators can include this node in their set of trusted validators. The developer can choose to select trusted validator nodes themselves or let the Stellar Core automatically decide on trusted validators. Stellar Core can use the reputation of validators for this. Consequently, nodes do not need to communicate their existence to other nodes. However, they get to know each other when a node asks another validator to validate a network transaction.

The peer-to-peer network keeps a distributed ledger in sync. Transactions are commands to modify the state of this ledger. These commands can range from sending payments to changing the settings on accounts. In order to keep the ledger in sync, nodes have to communicate the transactions with each other. Nodes communicate the changes on the ledger’s state to validators they trust and later vote on how the state of the ledger should change 1. A more comprehensive explanation of this process can be found below.

When a user submits a transaction through an application, this application can either directly connect to the Stellar Core or use the Horizon API. When an application connects to the Horizon API, HTTP requests are sent between them, conforming to a standard format. The Horizon API then sends these requests to a node in the network using the Stellar Go SDK. An application can, however, also directly use the Stellar SDK and thereby directly communicate with a node in the network.

Federated Byzantine Agreement (FBA)

The Stellar consensus protocol (SCP), the underlying algorithm used in the Stellar network, implements Federated Byzantine Agreement (FBA). FBA is in turn derived from Byzantine Agreement (BA). In a BA, the objective is to have a subset of nodes reach an agreement on a particular slot. A viable subset to reach an agreement is called a quorum. The benefit of only agreeing within a quorum and not the entire set of nodes is that the system can still come to an agreement even when a node fails. One of the requirements of BA is the unanimous agreement of membership by the nodes, which means that each node needs to be verified before it is added to the system.

FBA works a bit different. Instead of the need to verify nodes upfront, FBA has open membership and decentralised control. Because it has open membership, it also does not have predefined quorums. Instead, FBA uses a principle called quorum slices. A quorum slice is a set of nodes that a particular node finds acceptable for an agreement. Each node has at least one quorum slice but can have many more. In FBA, a quorum is defined as a non-empty set of nodes containing at least one quorum slice for each of its non-faulty members 2.

Before explaining how FBA works, two concepts need to be introduced first. First is the concept of intertwined nodes. Two nodes are intertwined when every quorum of one node intersects with every quorum of the other in at least one non-faulty node. In an FBA, agreement can be guaranteed between two nodes when this property holds. Due to how the SCP is set up, this property is guaranteed and is therefore considered safe. The second concept is an intact quorum. A quorum is considered intact when every quorum member is intertwined even when all other nodes not in the quorum are faulty 3. The SCP guarantees liveness for intact sets while nodes themselves might not know which nodes are in an intact set.

To reach a consensus, SCP uses a procedure called balloting. Balloting consists of sequential rounds of voting on a ballot until one is accepted. In Stellar’s case, a ballot can be seen as a set of transactions; therefore, it is important that all nodes accept the same ballot and keep the same state. A ballot round has two stages. The first is a preparation phase, where nodes try to create the content for the ballot. This content is based on the transactions that are taking place. The second phase is the commit phase, where nodes try to agree on the ballot’s content. If the ballot is accepted, it is considered the new state. If it is rejected, the network goes to the next balloting round and tries to reach another agreement.

For voting, SCP uses a protocol which is called “federated voting”. In federated voting, nodes can cast their vote together with their quorum slices. Because all nodes also specify their quorum slices, quorums will be discovered. Quorum overlap ensure that intertwined nodes cannot disagree on a particular ballot.

Figure: Federated voting (Simplified SCP, Referenced 27 Mar, http://www.scs.stanford.edu/~dm/blog/simplified-scp.html)

The concept of quorum slices can seem a bit abstract. Therefore we provide a few examples below to illustrate the impact of being in a quorum slice.

If a node v votes for a statement a and all other nodes in a quorum with v also voted for a then v will accept a. When all nodes in a quorum have accepted a, each member will confirm a. It is also possible that v has not voted for a but another statement that contradicts a. If v then contains a node in every quorum slice that has accepted a, v will also accept a. This is because v cannot be part of a quorum that will accept the alternative statement, because there are nodes accepting a in every quorum slice. A set of nodes with at least one node in every quorum slice of v is called a v-blocking set, because it has the power to block votes by v.

Fault Tolerance

One methodology for achieving fault tolerance is (Practical) Byzantine Fault Tolerance (PBFT). PBFT can be used to maintain reliability. The methodology is quite simple and makes assumptions on binary indications of maliciousness, on top of possible failures 4. Its low algorithmic complexity and relative safety make it practical for distributed systems 5. There are, however, some properties of PBFT that can be seen as a problem for networks such as Stellar. One of these properties is that a network using PBFT needs to keep the number of malicious nodes below (approximately, for large systems) 33 per cent 6. This is something that seems easy to assure, yet this cannot be guaranteed, especially if incentives to perform malicious behaviours become great enough to convince previously trustworthy nodes. The larger problem, however, comes from the fact that all nodes need to come to an agreement on the set of trusted validators. This step of the methodology causes the acceptance to the system to become centralised 7. In order to prevent this centralisation, Stellar implemented the Federated Byzantine Agreement (FBA) introduced above 7. The (SCP) protocol developed by Stellar prevents the need for a centralised trusted validator list, instead allowing each node to establish a set of validators to trust, or ‘node quorum slides’ 7. The overlap of trusted sets of nodes, introduced in the dedicated section on FBA, means that trusted nodes occur in sets of trusted validators of multiple network nodes. This causes the emergence of a quorum; consisting of the total overlapping set of quorum slices 6. The protocol both ensures that nodes can only become part of the validation steps when other nodes choose to trust them and that no central authority forms which can control trust. As long as there do not exist any disjoint quorum slices, no contradictory transactions can be recorded. Otherwise, the network will cease operations until consensus is finally reached 7 8. As such, the forms of faults in the Stellar network, faulty or malicious nodes, are prevented from causing discrepancies, as individual trusted nodes within a quorum slice cause a cascading wave of agreement jumping to other slices before long, as long as the operation is found to be consistent with the ledger 8.

CAP Theorem Trade-offs

CAP Theorem states that out of the three topics: consistency, availability and partition tolerance, only two can be guaranteed 9.

The SCP was designed to ensure consistency. Once a transaction is confirmed, it can not be changed. The cost of this consistency can be reduced availability. Whilst the availability can be maintained to a relatively high standard, when the choice has to be made by the protocol, consistency is preferred over availability. This can be seen in practice in the May 15th 2019 outage, where the network was down for 67 minutes 10.

The SCP does have partition tolerance, ensured through the earlier explained Quorem Slices of the protocol. This means that so-called “forks” in the network are not possible. When two partitions are created (for example, due to a partial outage), at least one of the partitions will not be able to settle transactions. However, this situation is often prevented because of the redundancy built into the protocol using quorum intersections. Again, in the end, SCP chooses consistency and partition tolerance over availability: when it is not possible to settle the transactions, the network will become unavailable in one of the partitions until the connection is restored 3.

References


  1. Stellar transactions, Referenced 29 Mar, https://developers.stellar.org/docs/glossary/transactions/ ↩︎

  2. Simplified SCP, Referenced 27 Mar, http://www.scs.stanford.edu/~dm/blog/simplified-scp.html ↩︎

  3. Stellar Consensus Protocol Whitepaper, Referenced 25 Mar, https://www.stellar.org/papers/stellar-consensus-protocol ↩︎

  4. Medium: How Does Distributed Consensus Work?, Referenced 28 Mar, https://medium.com/s/story/lets-take-a-crack-at-understanding-distributed-consensus-dad23d0dc95 ↩︎

  5. PBFT, Referenced 28 Mar, https://scholar.google.com/scholar_lookup?title=Practical%20byzantine%20fault%20tolerance&publication_year=1999&author=M.%20Castro&author=B.%20Liskov ↩︎

  6. TU Delft Repository: Comparative Study of Byzantine Fault tolerance Consensus Algorithms on Permissioned Blockchains, Referenced 28 Mar, https://repository.tudelft.nl/islandora/object/uuid:01083a4a-900b-4cf9-9746-cb9258c11d9e/datastream/OBJ/download ↩︎

  7. Stellar Development Foundation: The stellar consensus protocol: A federated model for internet-level consensus, Referenced 28 Mar, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.696.93&amp=&rep=rep1&amp=&type=pdf ↩︎

  8. Stellar consensus, Referenced 28 Mar, https://support.blockchain.com/hc/en-us/articles/360019105391-Stellar-consensus ↩︎

  9. CAP Theorem, Referenced 25 Mar, https://en.wikipedia.org/wiki/CAP_theorem ↩︎

  10. Stellar outage May 15th 2019, Referenced 22 Mar, https://medium.com/stellar-developers-blog/may-15th-network-halt-a7b933103984 ↩︎