(This post is part of the Understanding GHOSTDAG series)

In this post, we dive in some depth into the mathematics of proving Bitcoin’s security. Providing a completely formal proof requires some background work that is beyond reasonable for our scope, but with a little work we can explain *most* of the ideas in the proof, and leave a few leaps of faith the stubborn reader is welcome to fill in.

The analysis is divided into two parts. We first consider an attacker who wants to double-spend a *specific* transaction. That is, we consider how the probability of reverting that specific transaction decays with the number of confirmations.

We then consider a more chaotic adversary. The adversary aims to revert *any* transaction with some predetermined number of confirmations $C$. She does not care which particular transaction is reverted and, unlike the previous attacker, can change the attacked transaction throughout the attack. That is, we consider the probability that there is ever *any* double-spend.

We only consider the no-orphans case. The more general case is essentially obtained by replacing $(1-\alpha)$ with $(1-\alpha)(1-\delta)$ whenever applicable (where $\delta$ is the honest network’s orphan rate).

## Reverting a Particular Transaction

The attacker aims to revert the transaction *tx* after accumulating $C$ confirmations. The most straightforward way to attempt this is to mine a parallel chain and post it if it becomes long enough. That is, we want to arrive at a situation like this:

Note that arriving at this situation:

is not enough. The entire point of a double-spend is to revert a transaction **after** the merchant has accepted it. And if they only accept after $C$ confirmations, the above will not work.

The first leap of faith we have to make is that this strategy is *optimal*. The attacker *can* try other tricks and stunts, but at the end of the day, none will work as well. This makes life much simpler for us, as any bound on the success probability of *this* attack immediately implies the same bound for *any* attack.

## Simple Random Walks on $\mathbb{Z}$

The mathematical model we use is called a Random Walk on $\mathbb{Z}$.

Don’t let the notation scare you, $\mathbb{Z}$ is simply the set of integers. By a *random walk* we mean that we start from some number, and then move to *another number* with some probability.

We only need the so-called *simple* $\alpha$-biased random walk. It always starts at $0$, and for some $\alpha$ it either moves one step to the right with probability $\alpha$ or one step to the left with probability $1-\alpha$ (what makes this walk *simple* is that it can only move to adjacent numbers, and that the probability to move right/left never changes). This is illustrated by this figure:
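To make this concrete, here is a minimal Python sketch of a simple $\alpha$-biased random walk (the function name and parameters are my own illustrative choices, not from the text):

```python
import random

def biased_walk(alpha, steps, seed=None):
    """Simulate a simple alpha-biased random walk on the integers.

    The walk starts at 0; each step moves +1 (right) with probability
    alpha and -1 (left) with probability 1 - alpha.
    """
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += 1 if rng.random() < alpha else -1
        path.append(position)
    return path

# alpha = 1/2 gives the "drunk walk"; alpha < 1/2 drifts left on average,
# at an expected rate of (2 * alpha - 1) per step.
print(biased_walk(alpha=0.25, steps=10, seed=42))
```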

For example, for $\alpha=1/2$ we get what’s often called a *drunk walk,* as it fits the story of a drunk who chooses their next step completely randomly.

### Pólya’s Theorem*

This cool story has absolutely nothing to do with anything we are doing here and can be completely skipped.

As the legend goes, the mathematician George Pólya was once walking through a park divided into a lattice of square, relatively secluded “rooms”, kind of like a chess board.

While mindlessly meandering through the park, he ran into a young couple. Wishing to give them some privacy, he walked past them and kept wandering, but he soon ran into them again, and again, and again. Concerned they would consider him some stalker, he went home. But he could not help but ask himself: *why* did we keep running into each other? Would this have happened if the park were infinite?

So he sat down to work it out, and found a very interesting result. He considered the set $\mathbb{Z}^n$, where each element is not just one integer but a list of $n$ integers, and each step changes one of these integers by one. So for example, a drunk walk on $\mathbb{Z}^2$ can be imagined as walking on an infinite chessboard, where at each step you skip to one of the four adjacent squares (diagonals not allowed!) with equal probability. In $\mathbb{Z}^3$ there is an infinite stack of infinite chessboards, where each square has ladders to the squares above and below it, and so on.

Pólya asked himself: if I start at any point and move randomly, what is the probability that I keep returning to where I started? He then proved an astonishing theorem: this probability is $1$ for $n=1,2$, and $0$ for $n\ge 3$. A common allegory for this theorem is that “a drunk frog will eventually find its way home, but a drunk bird will remain lost forever”.

For those curious about what causes this huge difference between $n=2$ and $n=3$, I will point out that it eventually follows from the fact that the series $\sum_{k=1}^\infty k^{-s}$ converges if and only if $s>1$. This only made you more curious? Great! Go read this book.

## Double Spending as a Random Walk

We are going to model the optimal attack as a random walk. The number $N$ we are currently at is the *advantage* of the adversary. That is, it is the difference between the lengths of the secret and honest chains. Some examples:

We consider the start point of the attack to be the situation on the top left of the figure: the two conflicting blocks are the only blocks above the split. This scenario doesn’t *necessarily* happen: the honest network might have created a confirmation for *tx* before the adversary managed to create a block containing *tx’* (like in the top right of the figure). However, making this assumption works *in favor* of the adversary, so it does not invalidate our calculation.

At each point, the probability that the attacker creates the next block is $\alpha$, and the probability that the honest network creates the next block is $1-\alpha$. Hence, the adversary’s advantage is modeled by the $\alpha$-biased random walk described above.

The attack is successful if two conditions are satisfied simultaneously: the attacker’s advantage is positive, and the transaction *tx* has at least $C$ confirmations. Both of these conditions can be recast in terms of the random walk: they simply mean that we have reached a positive number after at least $C$ left moves were made.

The second leap of faith is that the probability this ever happens is $O\left(\left(\frac{\alpha}{1-\alpha}\right)^C\right)\text{.}$

This establishes the security that we want.

Intuitively, this bound makes a lot of sense. By the time the honest network has made $C$ confirmations, the adversary is expected to have made $\frac{\alpha}{1-\alpha}C$ confirmations, so the expected advantage is $N=\frac{\alpha}{1-\alpha}C-C$ (which is negative for $\alpha<1/2$). So to get to a positive advantage, there must be a point in time where the adversary made $1-N = 1+C\frac{1-2\alpha}{1-\alpha}$ more right moves than left moves.

There are many ways this could happen. The most straightforward one is that the adversary makes $1-N$ consecutive blocks while the honest network makes none. One can see that for, say, $\alpha=1/4$, the probability of this is $\alpha^{1+\frac{2}{3}C}$, which indeed decays exponentially with $C$.

Of course, this is just an illustration. An actual bound must go over *all* possible scenarios.
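Before going over all scenarios analytically, we can sanity-check the model empirically. The following is a minimal Monte Carlo sketch (the function names, trial counts, and give-up threshold are my own illustrative choices): a trial counts as a success if the attacker’s advantage is positive at some point after the honest network has made at least $C$ left moves.

```python
import random

def attack_succeeds(alpha, C, max_steps, give_up, rng):
    """One trial of the attack as a random walk: success means the walk
    reaches a positive position after at least C left moves were made."""
    position, left_moves = 0, 0
    for _ in range(max_steps):
        if rng.random() < alpha:
            position += 1
        else:
            position -= 1
            left_moves += 1
        if position >= 1 and left_moves >= C:
            return True
        if position <= give_up:  # truncation: deep deficits almost never recover
            return False
    return False

def estimate(alpha, C, trials=20000, max_steps=2000, give_up=-40, seed=0):
    """Monte Carlo estimate of the attack's success probability."""
    rng = random.Random(seed)
    wins = sum(attack_succeeds(alpha, C, max_steps, give_up, rng)
               for _ in range(trials))
    return wins / trials

print("simulated success probability:", estimate(alpha=0.25, C=6))
```

The give-up threshold slightly underestimates the true probability, but recovering from a deficit of 40 blocks is astronomically unlikely, so the error is negligible.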

## Computing the Bound*

Actually computing the bound is not difficult, but it requires some mathematical sophistication.

We first note that it suffices to ask what is the probability that we hit $1$ after at least $C$ left moves. This is simpler than asking about *any* positive number, but actually captures all possibilities, as reaching any positive number requires hitting $1$ first.

Let $k$ be the number of left moves made before we won. Because we finished at $1$, there were exactly $k+1$ right moves. The probability of any *specific* sequence of $k$ left moves and $k+1$ right moves is $\alpha^{k+1}(1-\alpha)^k\text{.}$

The naive thing to do is to multiply this by the number of such sequences to get the probability of hitting $1$ while making exactly $k$ left moves. But we need to be more careful than that! If we do this, we also count scenarios where we already hit $1$ earlier in the sequence, after fewer left moves. When we sum over all possible values of $k$, this will double count many scenarios, ruining our bound!

We must only count scenarios where $1$ is only hit on the *last* move. The number of such scenarios is the $k$th Catalan number $C_k$.
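For small $k$, this count can be verified by brute force. The sketch below (helper names are mine) enumerates all orderings of $k$ left moves and $k+1$ right moves that first hit $1$ on the very last move, and compares the count with the standard closed form $C_k = \binom{2k}{k}/(k+1)$:

```python
from math import comb

def catalan(k):
    """k-th Catalan number, via the closed form C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

def first_hit_count(k):
    """Count orderings of k left moves and k + 1 right moves that hit +1
    for the first time on the very last move (brute force, small k only)."""
    def walk(pos, lefts, rights):
        if lefts == 0 and rights == 0:
            return 1 if pos == 1 else 0
        if pos >= 1:  # hit +1 too early: excluded from the count
            return 0
        total = 0
        if rights > 0:
            total += walk(pos + 1, lefts, rights - 1)
        if lefts > 0:
            total += walk(pos - 1, lefts - 1, rights)
        return total
    return walk(0, k, k + 1)

print([catalan(k) for k in range(6)])          # [1, 1, 2, 5, 14, 42]
print([first_hit_count(k) for k in range(6)])  # matches the Catalan numbers
```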

The Catalan numbers have many interesting interpretations. A particularly useful one is that $C_k$ counts the number of ways to traverse the edges of a $k\times k$ chessboard from one corner to the diametrically opposed corner, only going in two directions and without going over the diagonal. For example, the following illustrates why $C_3=5$:

So we have that for every $k$, the probability of hitting $1$ after exactly $k$ left moves is $\alpha^{k+1}(1-\alpha)^k C_k$, so the probability that we *ever* hit $1$ becomes the infinite sum $\sum_{k=0}^{\infty}\alpha^{k+1}\left(1-\alpha\right)^{k}C_{k}\text{.}$

A sum of positive numbers is always at least as large as any term it contains, in particular the first one. So we get that the probability is at least $\alpha$ (easy exercise: $C_0 =1$), which is bad, since this is a constant and it definitely does not decay exponentially in any way. So what did we miss?

Ah yes! We only care about scenarios with at least $C$ left moves! So we actually want to compute (or at least bound) the sum $\sum_{k=C}^{\infty}\alpha^{k+1}\left(1-\alpha\right)^{k}C_{k}\text{.}$

But how do we do that? Here we appeal to another fancy tool called *generating functions*. For any sequence of numbers $A_n$, the *generating* function is given by $a(x) = \sum_{n=0}^{\infty} A_n x^n$. By using recursive properties of the Catalan numbers, one can show that their generating function is $\sum_{k=0}^{\infty}x^{k}C_{k}=\frac{1-\sqrt{1-4x}}{2x}\text{.}$

We can write the expression we want to compute as $\begin{aligned}\sum_{k=C}^{\infty}\alpha^{k+1}\left(1-\alpha\right)^{k}C_{k} & =\alpha\sum_{k=C}^{\infty}\left(\alpha\left(1-\alpha\right)\right)^{k}C_{k}\\& =\alpha\sum_{k=0}^{\infty}\left(\alpha\left(1-\alpha\right)\right)^{k}C_{k}-\alpha\sum_{k=0}^{C-1}\left(\alpha\left(1-\alpha\right)\right)^{k}C_{k}\\& =\alpha\frac{1-\sqrt{1-4\alpha\left(1-\alpha\right)}}{2\alpha\left(1-\alpha\right)}-\alpha\sum_{k=0}^{C-1}\left(\alpha\left(1-\alpha\right)\right)^{k}C_{k}\\& =\frac{\alpha}{1-\alpha}-\alpha\sum_{k=0}^{C-1}\left(\alpha\left(1-\alpha\right)\right)^{k}C_{k}\end{aligned}\text{.}$ (In the last step we used that $1-4\alpha\left(1-\alpha\right)=\left(1-2\alpha\right)^{2}$, and that $\alpha<1/2$, so the square root equals $1-2\alpha$.)

Proving that the above is $O\left(\left(\frac{\alpha}{1-\alpha}\right)^{C}\right)$ shall remain an exercise for the reader (hint: Lagrange remainder). For illustration, we note that for $C=0$ this comes out to $\frac{\alpha}{1-\alpha}$, and for $C=1$ we have $\frac{\alpha}{1-\alpha}-\alpha=\frac{\alpha}{1-\alpha}\left(1-(1-\alpha)\right)=\frac{\alpha^{2}}{1-\alpha}<\left(\frac{\alpha}{1-\alpha}\right)^{2}\text{.}$
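The closed form derived above is easy to check numerically. The sketch below (names are mine) evaluates $\frac{\alpha}{1-\alpha}-\alpha\sum_{k=0}^{C-1}(\alpha(1-\alpha))^{k}C_{k}$ and compares it against directly summing (a long truncation of) the tail $\sum_{k\ge C}\alpha^{k+1}(1-\alpha)^{k}C_{k}$:

```python
from math import comb

def catalan(k):
    """k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)

def tail_closed_form(alpha, C):
    """The derivation above: the full sum via the generating function at
    x = alpha * (1 - alpha), minus the first C terms."""
    x = alpha * (1 - alpha)
    head = sum(x ** k * catalan(k) for k in range(C))
    return alpha / (1 - alpha) - alpha * head

def tail_direct(alpha, C, terms=200):
    """Direct (truncated) summation of sum_{k >= C} alpha^(k+1) (1-alpha)^k C_k."""
    return sum(alpha ** (k + 1) * (1 - alpha) ** k * catalan(k)
               for k in range(C, C + terms))

alpha = 0.25
for C in range(4):
    print(C, tail_closed_form(alpha, C), tail_direct(alpha, C))
```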

This completes the proof.

## Will We *Never* See a Double-Spend?

The definition of security prohibits attacks that revert a *specific* transaction with meaningful likelihood, given that it accumulated sufficient confirmations. But it fails to consider attackers who are just out to harm the network.

Say the goal of an $\alpha$-attacker is simply to show that a transaction *can* be reverted after $C$ confirmations. The probability of successfully reverting a specific transaction is $\varepsilon=\varepsilon(C,\alpha)$, and we know that $\varepsilon>0$ due to the nature of probabilistic finality.

Consider the following strategy: try to double-spend a transaction on the most recent block. If at some point the honest network accumulates $C+10$ confirmations, abort and restart.

The probability of success of each attempt is some $\varepsilon'$, which is slightly smaller than $\varepsilon$ (do you understand why?) but still positive. Hence, the probability that an attempt fails is $(1-\varepsilon')$, and while it might be very close to $1$, it is not $1$.

The success probability of each attempt is independent of previous attempts, making the probability that $t$ consecutive attempts all fail $(1-\varepsilon')^t$. In other words, the probability that the attack is successful within $t$ attempts is $1-(1-\varepsilon')^t\text{.}$

This does not look good: success becomes overwhelmingly likely as the number of attempts grows, with the failure probability decaying exponentially. To quantify this intuition, the following graph shows the success probability for a given number of attempts for different values of $\varepsilon'$:

We see that even for relatively small values of $\varepsilon'$, the required number of attempts for, say, succeeding with probability $1/2$ is far from galactic. The graphs also indicate (and this can easily be shown mathematically) that the number of attempts required for some constant success probability grows like $O(1/\varepsilon')$.
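These quantities are straightforward to compute. A small sketch (function names are mine): the number of attempts needed for a target success probability follows from solving $1-(1-\varepsilon')^t \ge p$ for $t$.

```python
import math

def success_within(eps, t):
    """Probability that at least one of t independent attempts succeeds,
    each attempt succeeding with probability eps."""
    return 1 - (1 - eps) ** t

def attempts_needed(eps, target=0.5):
    """Smallest t such that success_within(eps, t) >= target.
    For small eps this grows like O(1 / eps)."""
    return math.ceil(math.log(1 - target) / math.log(1 - eps))

for eps in (1e-2, 1e-3, 1e-4):
    print(eps, attempts_needed(eps))
```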

It is important to accept that this reality is *unavoidable*. In a permissionless network, it is always possible to attempt the same stunt indefinitely, so what is our saving grace?

We recall that $\varepsilon=O\left(e^{-C}\right)$, so it is still the case that the expected number of attempts in a successful attack grows exponentially with $C$. The meaning of this is open to interpretation. The way I see it, it is just a matter of coming to terms with the fact that double-spends on Bitcoin are inevitable in light of persistent enough trolls. Moreover, for the attack to actually be *harmful*, it has to transact a *meaningful amount* to an *innocent party*, making such attacks exponentially expensive.

There is more to say about the disruption such attacks cause to the confirmation of the other transactions in the reorged blocks, and about their consequences for orphan rates, but the bottom line is that these effects are inconsequential. The details are left to the reader.
