
(This post is part of the Understanding GHOSTDAG series)

This post aims to kick off the discussion on familiar grounds. Toward this end, I chose a starting point I believe most readers are familiar with: the Byzantine Generals problem. I know, I know, you’ve read a thousand blog posts and Medium posts already, but trust me, this will be different. The generals will only be a starting point to a discussion about Sybilness, decentralization, security, and scaling that will introduce many concepts that will accompany us throughout the book. Besides, I’ve read the same posts you have, and I can confidently say that I explain it better.

The Byzantine Generals Problem

Many of us have heard this story: a bunch of Byzantine generals are entrenched around a fortress and need to decide whether they should attack or retreat. The catch: they must either all attack or all retreat; a situation where some generals attack while others retreat will surely lead to a great loss. The other catch: the generals cannot hold a meeting to confer; they can only send messages to each other.

The solution seems simple enough: let the majority decide. If most generals want to attack, all generals attack. Otherwise, all generals retreat. The problem with this solution is that some generals might be treacherous and want the military to lose. Say there are three generals: the first wants to fight, the second wants to retreat, and the third wants to lose. The first general sends the message “attack” to everyone, and the second general sends the message “retreat” to everyone. Now the third general foils the army: he sends “attack” to one general and “retreat” to the other. The consequence? The first general will see two “attack” messages and attack, while the second will see two “retreat” messages and retreat. The third general got his way.
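To make the failure concrete, here is a minimal Python sketch (not from the original post) of the three-generals scenario just described; the inbox lists and the `decide` helper are purely illustrative:

```python
from collections import Counter

def decide(own_vote, received):
    """Each general decides by majority over his own vote and the messages he received."""
    votes = Counter([own_vote] + received)
    return votes.most_common(1)[0][0]

# General 1 wants to attack, General 2 wants to retreat, General 3 is treacherous.
# Honest generals send their true vote to everyone; the traitor sends conflicting votes.
general_1_inbox = ["retreat", "attack"]   # from General 2 and from the traitor
general_2_inbox = ["attack", "retreat"]   # from General 1 and from the traitor

print(decide("attack", general_1_inbox))   # -> attack
print(decide("retreat", general_2_inbox))  # -> retreat
# The army splits: plain majority voting fails against a single traitor.
```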

This example shows that the “majority wins” protocol cannot withstand even a single fault. A single treacherous general suffices to create scenarios where the protocol fails.

So can we do better? Can we devise a protocol that can tolerate faults at all? This question is at the heart of a field of research called Byzantine fault tolerance (BFT), named after the Byzantine generals problem. It was inaugurated in 1980 in a seminal paper by Pease, Shostak, and Lamport called Reaching Agreement in the Presence of Faults. In this paper, they provided a protocol that can handle f faulty nodes given there are at least 3f+1 nodes in total, and proved that it is optimal. In other words, BFT can only protect us as long as less than a third of the nodes are faulty (the paper also shows how the protocol can be slightly improved to work when exactly one-third of the nodes are faulty, using digital signatures). Today, this result is commonly referred to simply as the “3f+1 theorem”.
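As a quick illustration of the bound (a hedged sketch, not part of the paper itself), a committee of n nodes can tolerate at most ⌊(n−1)/3⌋ faults:

```python
# For n nodes, the largest number of faulty nodes a BFT protocol can tolerate
# is f = floor((n - 1) / 3), i.e. n must be at least 3f + 1.
def max_faults(n: int) -> int:
    return (n - 1) // 3

for n in (3, 4, 7, 10, 100):
    print(n, max_faults(n))
# 3 nodes tolerate no faults (our three generals), 4 tolerate one, 100 tolerate 33.
```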

The protocols proposed by Pease et al. are theoretically significant but impractical: they simply require too much time and storage. In 1999, Miguel Castro and Barbara Liskov introduced Practical Byzantine Fault Tolerance (PBFT), an efficient BFT protocol suitable for large loads. Ever since, there has been a Cambrian explosion of BFT protocols with different properties, such as Tendermint, Aardvark, and RBFT, each proposing different improvements and trade-offs.

A notable feature of some BFT protocols is that while 34% of faulty players can harm the consensus, their meddling can be detected retroactively. This property is leveraged in proof-of-stake consensus, where participants are required to put some money at stake. This money is burned if malfeasance on their part is detected, a mechanism often called slashing. Proof-of-stake cannot prevent a 34% collusion from interfering with the protocol, so it compromises by setting economic deterrents instead.

PSL’s Protocol*

So, how does Pease et al.’s BFT protocol work, and what makes it impractical?

We start with a protocol for four nodes: let us call them John, Paul, George, and Ringo. These four coleopterans are voting on what to call their new album. The protocol goes in two rounds: in the first round, each beatle tells all the others what he wants to call the album, and in the second round, each tells everyone what the other beatles told him. Note that the communication between the beatles is still one-on-one, so if one of them decides to lie, he can tell different lies to different beatles.

By the end of the second round, each beatle should have three messages of the form “John told me that he wants to call the album Rubber Soul” and nine messages of the form “George told me that Ringo told him that he wants to call the album Revolver”.

Let us take the view of George trying to figure out what John wants to call the album. In the first round, John told George his preference directly. In the second round, Paul and Ringo told George what John supposedly told them. George will attribute to John the majority opinion among these three reports.

Note that if John told George that he prefers Rubber Soul, but both Paul and Ringo told George that John prefers Revolver, George will still assume that John wants to call the album Revolver.
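In code, George's rule might look like the following hypothetical helper (a sketch of the majority step described above, with illustrative names):

```python
from collections import Counter

def attribute_preference(direct_message, relayed_messages):
    """Majority over what John said directly and what the others relayed about him."""
    reports = Counter([direct_message] + relayed_messages)
    return reports.most_common(1)[0][0]

# John tells George "Rubber Soul", but Paul and Ringo both relay "Revolver":
print(attribute_preference("Rubber Soul", ["Revolver", "Revolver"]))  # -> Revolver
```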

George repeats this process for each of his colleagues to finally figure out what most beatles seem to want to name the album. John, Paul, and Ringo do the same.

The key observation here is the following: if at least three of the beatles followed the protocol honestly, then all beatles will reach the same conclusion.

Proving this is indeed the case requires nothing but some simple (if a bit tedious) bookkeeping. It is clear that no problems arise if all four are honest, so let us assume John wants to name the album “Rubber Soul” at all costs. He can tell George that Ringo and Paul also want to name it “Rubber Soul”, but if they both prefer “Revolver”, that would not help. Ringo will tell George that he prefers “Revolver”, and Paul will also tell George that Ringo wants to call the album “Revolver”. Together, they will prevent John’s deception. This is only one possible cheating strategy, but listing all of them and verifying that a single cheating beatle fails in each is an easy task that shall remain an exercise for the reader.
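Here is a small, self-contained simulation of the four-beatle protocol under the cheating strategy just described. The preferences, function names, and the assumption that John lies uniformly by always claiming “Rubber Soul” are illustrative choices, not part of the original post:

```python
from collections import Counter

PREFS = {"John": "Rubber Soul", "Paul": "Revolver",
         "George": "Abbey Road", "Ringo": "Revolver"}
LIAR = "John"  # John claims that everyone wants "Rubber Soul"

def round_one(sender):
    # What `sender` tells the others about his own preference.
    return "Rubber Soul" if sender == LIAR else PREFS[sender]

def round_two(relay, subject):
    # What `relay` reports about what `subject` told him in round one.
    return "Rubber Soul" if relay == LIAR else round_one(subject)

def conclusion(receiver):
    # For every other beatle, take the majority of the direct message and the relays.
    others = [name for name in PREFS if name != receiver]
    verdict = {}
    for subject in others:
        reports = [round_one(subject)]
        reports += [round_two(relay, subject) for relay in others if relay != subject]
        verdict[subject] = Counter(reports).most_common(1)[0][0]
    return verdict

for beatle in ("Paul", "George", "Ringo"):
    print(beatle, conclusion(beatle))
# Every honest beatle attributes the same preference to each colleague,
# despite John's lies.
```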

This protocol can be extended to n = 3f+1 beatles (or any other coleopterans) by adding more rounds. So by the end of the, say, fourth round, each player will hold messages such as “John told George that Ringo told him that Yoko told him that she prefers to call the album Let It Be”. Otherwise, everything remains the same. A standard proof by induction shows that this protocol provides tolerance against f faulty nodes, given that it runs for at least f rounds.

From the above, we can read off the impracticality of the algorithm. First, the number of rounds increases linearly, which is terrible. The Ethereum network has more than 750,000 validators, meaning that if they used PSL’s protocol, each validation phase would have to run for 250,000 rounds. Worse yet, the number of messages each validator needs to store is exponential: in the first round, they get n-1 messages, in the second round, they get (n-1)(n-2) messages, and so on until the last round, where they get (n-1)·(n-2)·…·(n-f) = Θ(n^f) messages. This becomes prohibitive when the number of nodes is in the dozens, not to mention hundreds of thousands.
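The growth is easy to see numerically; the following back-of-the-envelope sketch (illustrative arithmetic only) computes the per-round message count:

```python
# In round k, each player stores (n-1)*(n-2)*...*(n-k) new messages.
def messages_in_round(n: int, k: int) -> int:
    count = 1
    for i in range(1, k + 1):
        count *= n - i
    return count

n = 10                      # a tiny committee
f = (n - 1) // 3            # tolerable faults, so also the number of rounds
print(messages_in_round(n, f))    # 9 * 8 * 7 = 504 messages in the final round
print(messages_in_round(31, 10))  # already around 10**14 messages for just 31 nodes
```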
