— CH. 1 · FOUNDATIONS AND REQUIREMENTS —

Consensus (computer science)

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating processes to reach consensus, or agree on some data value that is needed during computation. Example applications of consensus include agreeing on which transactions to commit to a database and in which order, state machine replication, and atomic broadcasts. Real-world applications that often require consensus include cloud computing, clock synchronization, PageRank, opinion formation, smart power grids, state estimation, control of UAVs (and multiple robots/agents in general), load balancing, blockchain, and others.
The consensus problem requires agreement among a number of processes (or agents) on a single data value. Some of the processes (agents) may fail or be unreliable in other ways, so consensus protocols must be fault-tolerant or resilient. The processes must put forth their candidate values, communicate with one another, and agree on a single consensus value.
Protocols that solve consensus problems are designed to deal with a limited number of faulty processes. These protocols must satisfy several requirements to be useful. For instance, a trivial protocol could have all processes output the binary value 1. This is not useful, so the requirement is strengthened: the output must depend on the input. That is, the output value of a consensus protocol must be the input value of some process. Another requirement is that a process may decide upon an output value only once, and this decision is irrevocable. A process is called correct in an execution if it does not experience a failure. A consensus protocol tolerating halting failures must satisfy the following properties.
Termination: Eventually, every correct process decides some value.
Integrity: If all the correct processes proposed the same value v, then any correct process must decide v.
Agreement: Every correct process must agree on the same value.
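The properties can be checked mechanically against the outcome of a single finished execution. The sketch below does that. The function name `check_consensus` and the dictionary layout are hypothetical choices made for illustration. Termination is an "eventually" property, so on a finite trace the sketch can only ask whether every correct process has decided by the end. A fourth entry also checks the earlier non-triviality requirement that the output be some process's input.

```python
def check_consensus(proposals: dict, decisions: dict, correct: set) -> dict:
    """Check the consensus properties on one finished execution (sketch).

    proposals: process id -> proposed (input) value
    decisions: process id -> decided value; absent if that process never decided
    correct:   ids of processes that did not fail
    """
    decided = {decisions[p] for p in correct if p in decisions}
    correct_inputs = {proposals[p] for p in correct}
    return {
        # Termination: every correct process decided (checked on a finite trace)
        "termination": all(p in decisions for p in correct),
        # Integrity: if all correct inputs are equal, that value must be decided
        "integrity": len(correct_inputs) != 1 or decided <= correct_inputs,
        # Agreement: correct processes never decide different values
        "agreement": len(decided) <= 1,
        # Non-triviality: the output must be the input of some process
        "validity": decided <= set(proposals.values()),
    }


# Process 2 crashed; the two correct processes both proposed and decided 1,
# so all four properties hold.
print(check_consensus({0: 1, 1: 1, 2: 0}, {0: 1, 1: 1}, correct={0, 1}))
```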
There are two types of failures a process may undergo, a crash failure or a Byzantine failure. A crash failure occurs when a process abruptly stops and does not resume. Byzantine failures are failures in which absolutely no conditions are imposed. For example, they may occur as a result of the malicious actions of an adversary. A process that experiences a Byzantine failure may send contradictory or conflicting data to other processes, or may sleep and then resume activity after a lengthy delay. Of the two types of failures, Byzantine failures are far more disruptive.
Thus, a consensus protocol tolerating Byzantine failures must be resilient to every possible error that can occur. A stronger version of consensus tolerating Byzantine failures is obtained by strengthening the Integrity constraint:
Integrity: If a correct process decides v, then v must have been proposed by some correct process.
The consensus problem may be considered in the case of asynchronous or synchronous systems. Real-world communications are often inherently asynchronous. Even so, it is often more practical and easier to model synchronous systems, because asynchronous systems raise additional difficulties. One example is that a slow process cannot be distinguished from a crashed one. In synchronous systems, it is assumed that all communications proceed in rounds. In one round, a process may send all the messages it requires, while receiving all messages from other processes. In this manner, no message from one round may influence any messages sent within the same round.
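The round structure can be made concrete with a small lockstep simulator. This is a minimal sketch that assumes reliable, failure-free delivery. The helper names `run_synchronous`, `broadcast` and `keep_min` are hypothetical. Every process computes its outgoing messages from its state at the start of the round, and delivery happens only after all sends. As a result, nothing sent in a round can affect another message sent in that same round.

```python
def run_synchronous(states, send, receive, rounds):
    """Lockstep round simulator (sketch, no failures).

    states:  list of per-process states, indexed by process id
    send:    send(pid, state) -> {dest: message}
    receive: receive(pid, state, inbox) -> new state; inbox is [(src, message)]
    """
    n = len(states)
    for _ in range(rounds):
        # Phase 1 of the round: every process sends, based only on its current state.
        outboxes = [send(p, states[p]) for p in range(n)]
        # Delivery: messages reach their destinations only after all sends are done.
        inboxes = [[] for _ in range(n)]
        for src, outbox in enumerate(outboxes):
            for dest, msg in outbox.items():
                inboxes[dest].append((src, msg))
        # Phase 2 of the round: every process updates its state from its inbox.
        states = [receive(p, states[p], inboxes[p]) for p in range(n)]
    return states


def broadcast(pid, state):
    return {dest: state for dest in range(4)}  # send own value to all 4 processes


def keep_min(pid, state, inbox):
    return min([state] + [msg for _, msg in inbox])


print(run_synchronous([7, 3, 9, 5], broadcast, keep_min, rounds=1))  # [3, 3, 3, 3]
```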
In a fully asynchronous message-passing distributed system in which at least one process may have a crash failure, the famous 1985 FLP impossibility result of Fischer, Lynch and Paterson proves that no deterministic algorithm can guarantee consensus. This impossibility result relies on worst-case scheduling scenarios. Such scenarios are unlikely to occur in practice except in adversarial situations, such as an intelligent denial-of-service attacker in the network. In most normal situations, process scheduling has a degree of natural randomness.
Randomized consensus algorithms can circumvent the FLP impossibility result by achieving both safety and liveness with overwhelming probability, even under worst-case scheduling scenarios such as an intelligent denial-of-service attacker in the network. The FLP result does not state that consensus can never be reached. It states only that, under the model's assumptions, no algorithm can always reach consensus in bounded time. In practice, the worst-case executions the result relies on are highly unlikely to occur.
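To show how a random coin breaks the symmetry that FLP exploits, here is a toy simulation loosely following the crash-failure version of Michael Ben-Or's 1983 randomized protocol (listed in the sources). It rests on several simplifying assumptions:

- No process actually crashes.
- Asynchrony is modeled crudely: in each phase, every process hears from a random subset of n - f senders.
- `None` stands for the "no majority" proposal.

It is an illustration, not a faithful or verified implementation.

```python
import random
from collections import Counter


def ben_or(inputs, f, seed=0, max_rounds=1000):
    """Toy simulation of Ben-Or-style randomized binary consensus (crash model, n > 2f).

    Returns (decisions, rounds_used); decisions[p] is None if p never decided.
    """
    n = len(inputs)
    assert n > 2 * f, "the crash-failure variant needs n > 2f"
    rng = random.Random(seed)
    x = list(inputs)          # current estimate of each process
    decided = [None] * n
    for rnd in range(1, max_rounds + 1):
        # Phase 1: each process reports its estimate and sees n - f reports.
        # It proposes v only if v is held by a strict majority of all n processes.
        proposals = []
        for p in range(n):
            seen = [x[q] for q in rng.sample(range(n), n - f)]
            value, count = Counter(seen).most_common(1)[0]
            proposals.append(value if count > n / 2 else None)
        # Phase 2: each process sees n - f proposals. Hearing f + 1 votes for v
        # means deciding v. Any vote for v means adopting v. No votes at all
        # means flipping a fair coin.
        for p in range(n):
            seen = [proposals[q] for q in rng.sample(range(n), n - f)]
            votes = Counter(v for v in seen if v is not None)
            if votes:
                value, count = votes.most_common(1)[0]
                if count >= f + 1 and decided[p] is None:
                    decided[p] = value
                x[p] = value
            else:
                x[p] = rng.randint(0, 1)  # the random coin flip
        if all(d is not None for d in decided):
            return decided, rnd
    return decided, max_rounds


print(ben_or([0, 1, 0, 1, 1], f=2, seed=3))
```

Safety never depends on the coin. At most one value can hold a strict majority, so all non-`None` proposals in a round agree. The coin only affects how many rounds pass before processes converge.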
Some forms of failure can be handled by a synchronous consensus protocol. For instance, the loss of a communication link may be modeled as a process that has suffered a Byzantine failure. An f-resilient anonymous synchronous protocol solves the Byzantine Generals problem if n > 3f, where f is the number of failures and n is the number of processes.
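The bound n > 3f translates directly into sizing arithmetic. The helper names below are hypothetical. They show the largest number of Byzantine processes a given n tolerates under this bound, and the smallest n needed for a target f.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n > 3f."""
    return (n - 1) // 3


def min_processes(f: int) -> int:
    """Smallest n satisfying n > 3f."""
    return 3 * f + 1


# Prints the pairs (3, 0), (4, 1), (7, 2), (10, 3): one extra tolerated fault
# costs three extra processes.
for n in (3, 4, 7, 10):
    print(n, max_byzantine_faults(n))
```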
The Paxos consensus algorithm by Leslie Lamport, and variants of it such as Raft, are used pervasively in widely deployed distributed and cloud computing systems. These algorithms typically depend on timing assumptions and on an elected leader to make progress. They tolerate only crashes, not Byzantine failures. Google has implemented a distributed lock service library called Chubby. Chubby maintains lock information in small files, which are stored in a replicated database to achieve high availability in the face of failures. The database is implemented on top of a fault-tolerant log layer based on the Paxos consensus algorithm.
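Crash-tolerant protocols in the Paxos family rely on majority quorums. A value counts as chosen once more than half of the replicas accept it. Any two majorities share at least one replica, so two different values cannot both be chosen. The sketch below checks that intersection property by brute force. It also shows the consequence that n replicas survive up to (n - 1) // 2 crashes. The helper names are hypothetical, and this is not code from Paxos, Raft or Chubby.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size that is more than half of n replicas."""
    return n // 2 + 1


def crash_faults_tolerated(n: int) -> int:
    """A majority stays reachable while at most (n - 1) // 2 replicas have crashed."""
    return (n - 1) // 2


def is_chosen(acks: set, n: int) -> bool:
    """A value counts as chosen once a majority of replicas has accepted it."""
    return len(acks) >= majority(n)


def majorities_intersect(n: int) -> bool:
    """Brute-force check that any two majority quorums share at least one replica."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


print(majority(5), crash_faults_tolerated(5), majorities_intersect(5))  # 3 2 True
```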
An example of a polynomial-time binary consensus protocol that tolerates Byzantine failures is the Phase King algorithm by Garay and Berman. The algorithm solves consensus in a synchronous message passing model with n processes and up to f failures, provided n > 4f. In the phase king algorithm, there are f + 1 phases, with 2 rounds per phase. Each process keeps track of its preferred output, which is initially equal to the process's own input value. In the first round of each phase, each process broadcasts its own preferred value to all other processes. It then receives the values from all processes and determines the majority value and its count.
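The following single-machine simulation sketches the algorithm. Round 1 follows the description above. Round 2 follows the standard presentation as we understand it: the phase's king broadcasts its majority value, and a process keeps its own majority only if it saw that value more than n/2 + f times. Otherwise it adopts the king's value. Two further points are assumptions made for illustration: process i acts as king in phase i, and Byzantine processes send an independent random bit to each receiver.

```python
import random
from collections import Counter


def phase_king(inputs, faulty, seed=0):
    """Simulate Phase King binary consensus with n > 4f (sketch).

    inputs: list of 0/1 input values, indexed by process id
    faulty: set of process ids treated as Byzantine (random bit per receiver)
    Returns {process id: decided value} for the correct processes.
    """
    n, f = len(inputs), len(faulty)
    assert n > 4 * f, "Phase King needs n > 4f"
    rng = random.Random(seed)
    pref = list(inputs)  # preferred output, initially the process's own input

    def sent_by(sender, honest_value):
        # A Byzantine sender may tell each receiver something different.
        return rng.randint(0, 1) if sender in faulty else honest_value

    for phase in range(f + 1):  # f + 1 phases, so at least one king is correct
        king = phase            # assumption: process i is the king of phase i
        # Round 1: everyone broadcasts its preference; each receiver records
        # the majority value it saw (ties -> 0) and how often it saw it.
        maj, mult = [0] * n, [0] * n
        for r in range(n):
            counts = Counter(sent_by(s, pref[s]) for s in range(n))
            maj[r] = 1 if counts[1] > counts[0] else 0
            mult[r] = counts[maj[r]]
        # Round 2: the king broadcasts its majority value; a strong local
        # majority (more than n/2 + f) overrides the king.
        for r in range(n):
            king_value = sent_by(king, maj[king])
            pref[r] = maj[r] if mult[r] > n / 2 + f else king_value
    return {p: pref[p] for p in range(n) if p not in faulty}


print(phase_king([1, 0, 1, 1, 0, 0, 1, 0, 1], faulty={2, 5}))
```

The threshold n/2 + f is where n > 4f matters. Once all correct processes agree on v, each of them sees v at least n - f times, and n - f > n/2 + f holds exactly when n > 4f. So a later faulty king cannot undo the agreement.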
Consensus algorithms traditionally assume that the set of participating nodes is fixed and given at the outset: that is, that some prior (manual or automatic) configuration process has permissioned a particular known group of participants who can authenticate each other as members of the group. In the absence of such a well-defined, closed group with authenticated members, a Sybil attack against an open consensus group can defeat even a Byzantine consensus algorithm, simply by creating enough virtual participants to overwhelm the fault tolerance threshold.
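The Sybil arithmetic is short. Assume the n > 3f threshold from above. Also assume the honest participants are fixed and every fake identity counts as a Byzantine process. Under those assumptions, an attacker who creates s identities needs only honest + s ≤ 3s, that is, s ≥ honest / 2. The helper names below are hypothetical.

```python
def byzantine_safe(n: int, f: int) -> bool:
    """The n > 3f threshold for Byzantine consensus."""
    return n > 3 * f


def sybils_needed(honest: int) -> int:
    """Fewest fake identities s that break n > 3f when n = honest + s and f = s.

    honest + s > 3s holds exactly when honest > 2s, so the attacker succeeds
    as soon as 2s >= honest, i.e. s = ceil(honest / 2).
    """
    return (honest + 1) // 2


for honest in (4, 10, 100):
    s = sybils_needed(honest)
    # s identities break the threshold; one fewer would not
    assert not byzantine_safe(honest + s, s) and byzantine_safe(honest + s - 1, s - 1)
    print(honest, s)
```

In an open network, creating identities is nearly free. So even a large honest population offers no protection unless joining is made costly in some way, which is the role the following mechanisms play.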
Bitcoin introduced the first permissionless consensus protocol. In it, participants compete to solve cryptographic hash puzzles and probabilistically earn the right to commit blocks, and the associated rewards, in proportion to their invested computational effort. Bitcoin combines proof of work, a difficulty adjustment function and a reorganization function to achieve permissionless consensus in its open peer-to-peer network. To extend bitcoin's blockchain or distributed ledger, miners attempt to solve a cryptographic puzzle. The probability of finding a solution is proportional to the computational effort expended, measured in hashes per second.
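A toy hash puzzle and difficulty adjustment illustrate the mechanism. This is a sketch, not Bitcoin's actual formats. It uses a single SHA-256 over arbitrary bytes plus an 8-byte nonce, whereas Bitcoin hashes a structured block header. Difficulty is expressed as a number of required leading zero bits (the `difficulty_bits` parameter is hypothetical). The `retarget` helper mirrors the commonly described Bitcoin rule in simplified form: every 2016 blocks, the target is rescaled toward one block per 600 seconds, by at most a factor of 4 either way.

```python
import hashlib


def pow_hash(block_data: bytes, nonce: int) -> int:
    """SHA-256 of the data plus an 8-byte nonce, read as a 256-bit integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Brute-force a nonce whose hash falls below the target (None if not found)."""
    target = 1 << (256 - difficulty_bits)  # smaller target = harder puzzle
    for nonce in range(max_nonce):
        if pow_hash(block_data, nonce) < target:
            return nonce
    return None


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash, however long the search took."""
    return pow_hash(block_data, nonce) < (1 << (256 - difficulty_bits))


def retarget(old_target: int, actual_timespan_s: int,
             expected_timespan_s: int = 2016 * 600) -> int:
    """Rescale the target by how long the last window of blocks actually took.

    Too slow -> larger target (easier); too fast -> smaller target (harder).
    The adjustment is clamped to a factor of 4 in either direction.
    """
    clamped = min(max(actual_timespan_s, expected_timespan_s // 4),
                  expected_timespan_s * 4)
    return old_target * clamped // expected_timespan_s


nonce = mine(b"example block", difficulty_bits=12)
print(nonce, verify(b"example block", nonce, 12))
```

With 12 bits, a solution takes about 2^12 = 4096 hash attempts on average. Each extra bit doubles the expected work, while verification still costs a single hash. This asymmetry is what lets every node cheaply check the work of others.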
Other cryptocurrencies (e.g. Ethereum, NEO, STRATIS, ...) use proof of stake. In proof of stake, nodes compete to append blocks and earn associated rewards in proportion to stake, that is, existing cryptocurrency allocated and locked (staked) for some time period. One advantage of proof of stake over proof of work is that it avoids the high energy consumption demanded by the latter. For example, bitcoin mining in 2018 was estimated to consume non-renewable energy at a level similar to that of entire nations such as the Czech Republic or Jordan. By contrast, the total energy consumption of Ethereum, the largest proof of stake network, is just under that of 205 average US households.
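The core idea of stake-weighted selection fits in a few lines: the chance of being picked to append the next block is proportional to stake. This toy model uses a seeded `random.Random` and `random.choices`. The validator names and stakes are made up. Real proof-of-stake protocols, including Ethereum's, use protocol-level randomness, committees and penalties that this sketch ignores entirely.

```python
import random
from collections import Counter


def pick_proposer(stakes: dict, rng: random.Random) -> str:
    """Choose a block proposer with probability proportional to stake (toy model)."""
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]


rng = random.Random(42)
stakes = {"alice": 60, "bob": 30, "carol": 10}  # hypothetical validators
tally = Counter(pick_proposer(stakes, rng) for _ in range(10_000))
print(tally)  # roughly 6000 / 3000 / 1000; exact counts depend on the seed
```

Unlike proof of work, no puzzle is solved here. The scarce resource is locked stake, not hashing power, which is where the energy saving described above comes from.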
To solve the consensus problem in a shared-memory system, concurrent objects must be introduced. A concurrent object, or shared object, is a data structure that helps concurrent processes communicate to reach an agreement. Traditional implementations using critical sections risk failing if some process dies inside the critical section or sleeps for an intolerably long time. Researchers therefore defined wait-freedom: the guarantee that every process completes its operation in a finite number of its own steps, regardless of how slow or faulty the other processes are.
The consensus number of a concurrent object is defined to be the maximum number of processes in the system which can reach consensus by the given object in a wait-free implementation. Objects with a consensus number of k can implement any object with a consensus number of k or lower, but cannot implement any objects with a higher consensus number. The consensus numbers form what is called Herlihy's hierarchy of synchronization objects.
According to the hierarchy, read/write registers cannot solve consensus even in a 2-process system. Data structures like stacks and queues can solve consensus between two processes, but no more. However, some concurrent objects are universal, meaning their consensus number is infinite. They can solve consensus among any number of processes, and they can simulate any other object through a sequence of operations.
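Two classic constructions illustrate the two ends of the hierarchy described above:

- Compare-and-swap (CAS), a standard example of a universal object: the first process to swap its proposal into an empty register fixes the decision for everyone.
- A queue pre-filled with "WIN" then "LOSE", which decides between exactly two processes: whoever dequeues "WIN" keeps its own proposal, and the other adopts it.

This is a sketch under an important caveat. Python has no user-level hardware CAS, so a `threading.Lock` stands in for the atomicity of a single instruction, and `queue.Queue` is lock-based internally too. The emulation is therefore not itself wait-free; it only models what the atomic objects provide. The class names are hypothetical.

```python
import queue
import threading

_EMPTY = object()  # sentinel meaning "nobody has decided yet"


class CASRegister:
    """Emulated atomic compare-and-swap register (lock stands in for hardware CAS)."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            old = self._value
            if old is expected:
                self._value = new
            return old  # caller learns whether its swap took effect


class CASConsensus:
    """Consensus for any number of threads: the first successful CAS fixes the value."""

    def __init__(self):
        self._reg = CASRegister(_EMPTY)

    def decide(self, proposal):
        old = self._reg.compare_and_swap(_EMPTY, proposal)
        return proposal if old is _EMPTY else old


class QueueConsensus2:
    """Two-process consensus (pids 0 and 1) from a queue pre-filled with WIN, LOSE."""

    def __init__(self):
        self._queue = queue.Queue()
        self._queue.put("WIN")
        self._queue.put("LOSE")
        self._proposed = [None, None]

    def decide(self, pid, proposal):
        self._proposed[pid] = proposal          # publish own proposal first
        if self._queue.get_nowait() == "WIN":   # the first dequeuer wins
            return proposal
        return self._proposed[1 - pid]          # the loser adopts the winner's value
```

The queue protocol does not extend to three processes. A process that dequeues "LOSE" cannot tell which of the other two won. CAS has no such limit, because the winning value itself stays in the register for everyone to read.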

Common questions

What is the consensus problem in computer science?

The consensus problem requires agreement among a number of processes or agents on a single data value. This fundamental issue in distributed computing ensures overall system reliability even when some processes fail or behave unreliably.

When was the FLP impossibility result published and who discovered it?

Fischer, Lynch and Paterson published the famous FLP impossibility result in 1985. It proves that no deterministic algorithm can guarantee consensus in an asynchronous system where at least one process may crash. The result relies on worst-case scheduling scenarios, which are unlikely to occur except in adversarial situations such as denial-of-service attacks.

How does Bitcoin achieve permissionless consensus using proof of work?

Bitcoin introduced the first permissionless consensus protocol using proof of work where participants compete to solve cryptographic hash puzzles. Miners earn the right to commit blocks and receive rewards proportional to their invested computational effort through this probabilistic mechanism.

Why do Ethereum and other cryptocurrencies use proof of stake instead of proof of work?

Proof of stake systems consume significantly less energy than proof of work, because nodes compete to append blocks based on existing cryptocurrency allocated and locked for a time period rather than on computational effort. For example, bitcoin mining in 2018 was estimated to consume non-renewable energy comparable to that of entire nations. Ethereum's total consumption, by contrast, is just under that of 205 average US households.

What is Herlihy's hierarchy of synchronization objects regarding consensus numbers?

Herlihy's hierarchy defines the consensus number as the maximum number of processes that can reach consensus by a given object in a wait-free implementation. Read/write registers cannot solve consensus even in two-process systems. Universal objects, by contrast, have an infinite consensus number and can solve consensus among any number of processes.

All sources

43 references cited across the entry

1. Distributed Systems: Concepts and Design. George Coulouris et al. Addison-Wesley, 2001. [book]
2. Authenticated algorithms for Byzantine agreement. D. Dolev et al. 1983. [journal]
3. Byzantine Agreement with authentication. Li Gong et al. 1995. [journal]
4. Replication. M. K. Aguilera. 2010. [book]
5. Impossibility of distributed consensus with one faulty process. M. J. Fischer et al. 1985. [journal]
6. Time- and Space-Efficient Randomized Consensus. James Aspnes. May 1993. [journal]
7. Principles of Distributed Systems. Zarko Milosevic. 2009. [book]
8. The Weak Byzantine Generals Problem. L. Lamport. 1983. [journal]
9. The Consensus Problem in Unreliable Distributed Systems (A Brief Survey). Michael J Fischer. [web]
10. The Byzantine Generals Problem. L. Lamport et al. 1982. [journal]
11. Reaching Agreement in the Presence of Faults. Leslie Lamport. April 1980. [journal]
12. Distributed Computing. Hagit Attiya. Wiley, 2004. [book]
13. Interactive Theorem Proving. Benjamin Bisping et al. Springer International Publishing, 2016. [citation]
14. Cerberus: The Radix Consensus Protocol. Mohammad Jalalzai et al. 2023. [journal]
15. Cloture Votes: n/4-resilient Distributed Consensus in t + 1 rounds. Piotr Berman et al. 1993. [journal]
16. The Chubby lock service for loosely-coupled distributed systems. Burrows, M. USENIX Association, Berkeley, CA, USA, 2006. [conference]
17. Paxos Made Live – An Engineering Perspective. C. Tushar et al. ACM Press, New York, NY, USA, 2007. [conference]
18. Resilient Asymptotic Consensus in Robust Networks. Heath J. LeBlanc. April 2013. [journal]
19. Consensus of second-order multi-agent systems in the presence of locally bounded faults. S. M. Dibaji. May 2015. [journal]
20. Resilient consensus of second-order agent networks: Asynchronous update rules with delays. S. M. Dibaji. July 2017. [journal]
21. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. Michael Ben-Or. 1983. [conference]
22. Cerberus: Minimalistic Multi-shard Byzantine-resilient Transaction Processing. Jelle Hellings et al. 2023. [journal]
23. An Efficient Algorithm for Byzantine Agreement without Authentication. Danny Dolev et al. 1982. [journal]
24. An optimal probabilistic protocol for synchronous Byzantine agreement. Pesech Feldman et al. 1997. [journal]
25. Advances in Cryptology - CRYPTO 2006. Jonathan Katz et al. 2006. [book]
26. Practical Byzantine Fault Tolerance. Miguel Castro et al. 1999. [conference]
27. The honey badger of BFT protocols. Andrew Miller et al. October 2016. [conference]
28. Efficient Synchronous Byzantine Consensus. Ittai Abraham et al. September 11, 2017. [web]
29. Byzantine agreement made trivial. Sylvio Micali. CSAIL, MIT, March 19, 2018. [web]
30. ALGORAND. Jing Chen et al. 2016. [arXiv]
31. Bitcoin is an energy hog. Where is all that electricity coming from? Umair Irfan. June 18, 2019. [web]
32. The Merge - Implications on the Electricity Consumption and Carbon Footprint of the Ethereum Network. September 7, 2022. [web]
33. Electricity consumption per capita worldwide in 2022, by selected country. [web]
34. The Ripple Protocol Consensus Algorithm. David Schwartz et al. 2014. [web]
35. 2017 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Maria Borge et al. 2017. [book]
36. Who Watches the Watchmen? A Review of Subjective Approaches for Sybil-resistance in Proof of Personhood Protocols. Divya Siddarth et al. 13 October 2020. [arXiv]
37. An Offline Foundation for Online Accountable Pseudonyms. Bryan Ford et al. April 2008. [conference]
38. Social Informatics. Gal Shahaf et al. October 2020. [book]
39. CanDID: Can-Do Decentralized Identity with Legacy Compatibility, Sybil-Resistance, and Accountability. Deepak Maram et al. 28 Sep 2020. [web]
40. UniqueID: Decentralized Proof-of-Unique-Human. Mohammad Javad Hajialikhani et al. 20 June 2018. [arXiv]
41. Wait-Free Synchronization. Maurice Herlihy. January 1991. [journal]
42. Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing. Damien Imbs et al. Association for Computing Machinery, 25 July 2010. [book]
43. Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing. Faith Fich et al. Association for Computing Machinery, 25 July 2004. [book]

Consensus (computer science)

1. Foundations And Requirements

2. Failure Models And Systems

3. Impossibility Results And Solutions

4. Classic Algorithms And Protocols

5. Permissioned Versus Permissionless Networks

6. Shared Memory And Hierarchy
