(This post is part of the Understanding GHOSTDAG series)
In this post, we dive in some depth into the mathematics of proving Bitcoin’s security. Providing a completely formal proof requires some background work that is beyond reasonable for our scope, but with little work we can explain most ideas in the proof, and leave few leaps of faith the stubborn reader is welcome to fill.
The analysis is divided into two parts. We first consider an attacker who wants to double-spend a specific transaction. That is, we consider how the probability of a revert of that specific transaction decays with the number of confirmations.
We then consider a more chaotic adversary. The adversary aims to revert any transaction with some predetermined number of confirmations . She does not care what particular transaction is reverted and, unlike the previous attacker, can change the attacked transaction throughout the attack. That is, we consider the probability that there ever is any double-spend.
We only consider the no-orphans case. The more general case is essentially obtained by replacing with whenever applicable (where is the honest network’s orphan rate).
Reverting a Particular Transaction
The attacker aims to revert the transaction tx after accumulating confirmations. The most straightforward way to attempt this is to mine a parallel chain and post it if it becomes long enough. That is, we want to arrive at a situation like this:
Note that arriving at this situation:
is not enough. The entire point of a double-spend is to revert a transaction after the merchant has accepted it. And if they only accept after confirmations, the above will not work.
The first leap of faith we have to make is that strategy is optimal. The attacker can try other tricks and stunts, but at the end of the day, none will work as well. This makes life much simpler for us, as any bound on the success probability of this attack immediately implies that same bound for any attack.
Simple Random Walks on
The mathematical model we use is called a Random Walk on .
Don’t let the notation scare you, is simply the set of integers. By a random walk we mean that we start from some number, and then move to another number with some probability.
We only need the so-called simple -biased random walk. It always starts at , and for some it either moves one step to the right with probability or one step to the left with probability (what makes this walk simple is that it can only move to adjacent numbers, and that the probability to move right/left never changes). This is illustrated by this figure:
For example, for we get what’s often called a drunk walk, as it fits the story of a drunk who chooses their next step completely randomly.
Margulis’ Theorem*
This cool story has absolutely nothing to do with anything we are doing here and can be completely skipped.
As the legend goes, the probability theorist Grigory Margulis was once walking through a park divided into a lattice of square, relatively secluded “rooms”, kind of like a chess board.
While mindlessly meandering through the park, he ran into a young couple. Wishing to give them some privacy, he walked past them and kept wandering, but he soon ran into them again, and again, and again. Concerned they would consider him some stalker, he went home. But he could not help but ask himself: why did we keep running into each other? Would this have happened if the garden was infinite?
So he sat down to work it out, and found a very interesting result. He considered the set , where each element is not just one integer but a list of integers, and each step changes one of these integers by one. So for example, a drunk walk on can be imagined like walking on an infinite chessboard, where at each step you skip to one of the four adjacent squares (diagonals not allowed!) with equal probability. In there is an infinite stack of infinite chess boards, and each square has ladders to the squares above and below it, and so on.
Margulis asked himself: if I start at any point, and move randomly, what is the probability that I keep getting to where I started? He then proved an astonishing theorem: this probability is for , and for . A common allegory for this theorem is that “a drunk frog will eventually find its way home, but a drunk bird will remain lost forever”.
For those curious about what causes this huge difference between and , I will point out that this eventually follows from the fact that the series converges if and only if . This only made you more curious? Great! Go read this book.
Double Spending as a Random Walk
We are going to model the optimal attack as a random walk. The number we are currently at is the advantage of the adversary. That is, it is the difference between the length of the secret and honest chains. Some examples:
We consider the start point of the attack as the situation on the top left of the figure, the two conflicting blocks are the only blocks above the split. This scenario doesn’t necessarily happen: the honest network might have created a confirmation for tx before the adversary managed to create a block containing tx’ (like in the top right of the figure). However, making this assumption works in favor of the adversary, so it does not invalidate our calculation.
At each point, the probability that the attacker creates the next block is , and the probability that the honest network creates the next block is . Hence, the adversary's advantage is modeled the -biased random walks above.
The attack is successful if two conditions are satisfied simultaneously: the attacker advantage is positive, and the transaction tx has at least confirmations. Both these conditions can be recast in terms of random walk: they simply mean that we have reached a positive number after at least left moves were made.
The second leap of faith is that the probability this ever happens is
This establishes the security that we want.
Intuitively, this bound makes a lot of sense. By the time the honest network made confirmations, the adversary is expected to make confirmation. So we have that . So to get to a positive there must be a point in time adversary where there were more right moves than left moves.
There are many ways this could happen. The most straightforward one is that the adversary makes consecutive blocks while the honest network makes none. One can see that for say , the probability of this is , which is indeed reducing exponentially with .
Of course, this is just an illustration. An actual bound must go over all possible scenarios.
Computing the Bound*
Actually computing the bound is not difficult, but it requires some mathematical sophistication.
We first note that it suffices to ask what is the probability that we hit after at least left moves. This is simpler than asking about any positive number, but actually captures all possibilities, as reaching any positive number requires hitting first.
Let be the number of left moves made before we won. Because we finished on , there were exactly right moves. The probability of any specific sequence of left moves and right move is
The naive thing to do is to multiply this by the number of such sequences to get the probability of hitting one while making exactly left moves. But we need to be more careful than that! If we do this, we also count scenarios where we hit after only left moves. When we sum over all possible values of , this will double count many scenarios, ruining our bound!
We must only count scenarios where is only hit on the last move. The number of such scenarios is the th Catalan number .
The Catalan numbers have many interesting interpretations. A particularly useful one is that it counts the number of ways to traverse the edges of a chess-board from the corner of a chessboard to the diametrically opposed corner only going in two directions and without going over the diagonal. For example, the following illustrates why :
So we have that for every the probability of hitting after exactly moves is , so the probability that we ever hit becomes the infinite sum
A sum of positive numbers is always larger than any number it contains, in particular the first number. So we get that the probability is at least (easy exercise: ), which is bad since this is a constant number and it definitely does not decay exponentially in any way, so what did we miss?
Ah yes! We only care about scenarios with at least left moves! So we actually want to compute (or at least bound) the sum
But how do we do that? Here we appeal to another fancy tool called generating functions. For any sequence of numbers , their generating function is given by . By using recursive properties of the Catalan numbers, one can show that its generating function is
We can write the expression we want to compute as
Proving that the above is shell remain an exercise for the reader (hint: Lagrange remainder), for illustration we note that for this comes out . For we have
This completes the proof.
Stochastic Attacks
Will we never see a double spend?
The definition of security prohibits attacks that revert a specific transaction with meaningful likelihood, given that it accumulated sufficient confirmations. But fails to consider attackers just out to harm the network.
Say the goal of an -attacker is simply to show that a transaction can be reverted after transactions. The probability of successfully reverting a specific transaction is , and we know that due to the nature of probabilistic finality.
Consider the following strategy: try to double-spend a transaction on the most recent block. If at some points the honest network accumulates confirmations, abort and restart.
The probability of success of each attempt is some , which is slightly smaller than (do you understand why?) but still positive. Hence, the probability the attempt fails is , and while it might be very close to , it is not .
The success probability of each attempt is independent of previous attempts, making the probability that consecutive attempts all failed . In other words, the probability the attack is successful within attempts is
This does not look good, it seems that success becomes exponentially likely with the number of attempts. To quantify this intuition, the following graph shows the success probability for a given number of attempts for different values of :
We see that even for relatively small values of , the required number of attempts for say, succeeding with probability , is far from being galactical. The graphs also indicate (and this can be easily shown mathematically) that the number of attempts required for some constant success probability grows like
It is important to accept that this reality is unavoidable. In a permissionless network, it is always possible to attempt the same stunt indefinitely, so what is our saving grace?
We recall that , so it is still the case that the expected number of attempts in a successful attack grows exponentially. The meaning of this is open to interpretation. The way I see it, it is just a matter of coming to terms with the fact that double-spends on Bitcoin are inevitable in light of persistent enough trolls. Moreover, for the attack to actually be harmful, it has to transact a meaningful amount to an innocent party, making such attacks exponentially expensive.
There is more to say about the disruption such attacks make for confirmation of the other transactions in the reorged blocks, and their consequences for orphan rates, but the bottom line is that they are inconsequential. The details are left for the reader.
Enjoyed reading this article?
More articles like thisComments
No comments yet!