This is an incomplete post!

This post has not been edited and some parts have not been written yet. It is delayed until I finish migrating the book to its new home.

(This post is part of the Understanding GHOSTDAG series)

Introduction

We begin our GHOSTDAG journey by introducing its its imaginary friend PHANTOM. The PHANTOM protocol, despite what you might have heard, cannot be implemented. So why present it at all? Because it is simple. It is the purest manifestation of using $k$ -clusters to generalize Bitcoin. Understanding it will greatly clarify the goals we ultimately manage to achieve by GHOSTDAG.

Considering PHANTOM also tells us a great deal about the limitations we have to work around. PHANTOM is intractable in a very strong sense (namely, it is NP-hard), implying that an efficient implementation thereof is highly unlikely (as its very existence would annihilate most of computer science). If we want to transform these ideas, we will have to find a way to approximate them in a way that retains all the good properties of $k$ -clusters, while affording an efficient implementation.

In the next post, we provide a self-contained exposition of PHANTOM. This starts with introducing the notion of $k$ -clusters, followed by a discussion of their similarities to chain, and a discussion of the $k$ parameter itself, its role, and how it should be chosen. The following posts will bridge the gap from PHANTOM to GHOSTDAG from top to bottom, we explain

How a coloring can be computed given that, for any block, we can tell whether coloring it blue will break the $k$ -cluster property, and how a topological ordering can be extracted from that coloring.
How can we tell whether coloring a block blue will break the $k$ -cluster property, given that we can perform efficient reachability queries: in other words, for any two blocks $B$ and $C$ we can quickly determine whether $B\in C.Anticone$ .
How to implement reachability queries.

Having established all of that, we will also discuss how to handle reorgs, and how to implement the UTXO model over this whole thing.

In the next chapter, we will discuss the security of the resulting protocol in great depth.

Part 1: PHANTOM

The PHANTOM protocol is ridiculously simple. It relies entirely on two notions called, a $k$ -cluster, and topological sorting. With these two notions in hand, the PHANTOM chain ordering rule reduces to “find a maximal $k$ -cluster, and provide a topological sort of the blocks that gives precedence to blue blocks”. That’s it. Of course, the devil is in the requirement to “find a maximal $k$ -cluster”, that turns out to be NP hard.

Maximal $k$ -Clusters

The notion of $k$ -clusters is going to plague your nights from now on, so you better pay attention.

A $k$ -cluster is a DAG with the following property: any vertex $v$ satisfies that $|Anticone(v)|\le k$ . For example, the following DAG is a $3$ -cluster, but is not a $2$ -cluster since $|Anticone(v)|=3$ :

(Exercise: find an example of a $k$ -cluster such that it is possible to remove a vertex such that the result is connected but not a $k$ -cluster)

(Exercise: prove that a $k$ -cluster that is not connected has at most $2k$ vertices)

Given a DAG $G$ with vertices $V$ , let $U\subset V$ . The subDAG induced by $W$ is the graph obtained by taking only the vertices of $W$ and any arrows between them. For example:

Here I marked a few vertices in pink, and then marked all arrows between pink vertices in pink as well. A crucial subtlety is that I did not get to choose the arrows, they are implied by the vertices.

The combinatorial problem that concerns us is the following: given a DAG, find the largest possible induced subgraph that is a $k$ -cluster.

Here “largest possible” means containing the most vertices. There are situations where there are several different maximal graphs, i.e. there are two different $k$ -clusters of size six but no $k$ -cluster larger than six. In this case, we are tasked with returning any one of them. Also note that the value $k$ is not at our control, it is given to us as a parameter to the problem.

As an example, consider the following induced subgraph:

One can easily verify that it is a $3$ -cluster, and not-so-easily verify that it is in fact a maximal $3$ -cluster.

Unfortunately, the problem of finding a maximal $k$ -cluster is intractable for $k\ge 2$ .

Note that a $0$ -cluster is simply a chain, so finding a $0$ -cluster just means finding the longest chain. That is, the maximal $k$ -cluster problem is a generalization of HCR. The purpose of the rest of this chapter is to convince you that it is the correct generalization.

(Exercise: proof that PHANTOM with $k=0$ will not accept a transaction that will be rejected by HCR)

The $k$ Parameter

We keep saying that GHOSTDAG returns a $k$ -cluster, but we never specified what $k$ actually is. Some other things eerily missing from the discussion so far are the block rate and network delay. How is it possible that the operation of PHANTOM does not rely on the block rate and network delay? The network delay and block rate, as well as another parameter $\delta$ called the confidence, are all encoded into how $k$ is chosen.

Remember that when we discussed blockChains we surmised that no chain selection rule is actually secure against $0.5$ attackers. In practice, there is only security against $(1-\delta)/2$ attackers for some $\delta$ that is hopefully small. In the blockChain world $\delta$ is implicit. We can observe the orphan rates and try to conclude what the actual security is (which is not too easy, given that orphans are not retransmitted to the entire network), but it is not dictated a-priori.

PHANTOM and GHOSTDAG are slightly different in that regard, they make the choice of $\delta$ explicit. We first define how much confidence we require from the protocol, and then, we ask ourselves how to choose $k$ such that the honest network produces a $k$ -cluster $1-\delta$ of the time.

In other words, given $\delta$ , as well as the block rate $\lambda$ and the network delay bound $D$ , we ask ourselves: how large does $k$ has to be such that a network producing $\lambda$ blocks per second and transmitting them with a delay $D$ seconds will create more than $k$ parallel blocks at most $\delta$ of the time. There is an explicit formula for $k$ , but it is not very pleasant:

$k=\min\left\{ m\in\mathbb{N}\mid e^{-2D\lambda}\sum_{j=m+1}^{\infty}\frac{\left(2D\lambda\right)^{j}}{j!}<\delta\right\} \text{,}$ the point is that we can easily compute $k$ from $\delta, D, \lambda$ .

Remark: In practice, the correct choice of $k$ is even more complicated, because there is another phenomenon we have to bound that rises from a certain non-monotonocity property of GHOSTDAG. We will not get into this here.

The GHOSTDAG security proof establishes that if $k$ is chosen as we prescribed, as long as the actual network delay stays below $D$ , the network is secure against $(1-\lambda)/2$ attackers. For example, for $\lambda = 1 Block/Sec$ , $D = 10 Sec$ and $\delta = 0.05$ we get $k=18$ . That is, assuming a latency bound of $10$ seconds we can operate at one block per second with $k=18$ .

We will defer a deeper discussion of the confirmation times to a future post, but for now we mention that this $k$ is exactly the reason increasing block rates does not decrease confirmation times: the number of required confirmations grows with $k$ , and $k$ grows with the block rate.

Deeper Look Into $k$ *

The formula for $k$ is actually not so bad once you understand what it is all about.

In the appendix we examined the distribution of block creation and concluded that if we create a block once per $\lambda’$ seconds, then the probability to see $n$ blocks within $D’$ seconds is $p_n = \mathbb{P}\left[Poi\left(\frac{D'}{\lambda'}\right)\right]=\frac{\left(D'/\lambda'\right)e^{-D'/\lambda'}}{n!}\text{.}$

In the appendix, we did math like probability theorists, so we used $\lambda’$ to indicate the block delay whereas in the GHOSTDAG paper, and in this post so far, we used $\lambda$ for the block rate. This two values are obviously reciprocals of each other, that is $\lambda = 1/\lambda’$ .

The time period $D’$ that we want to look at is not the network delay, but the round trip time, which is just another fancy name for the network delay twice. It’s the time it would take the network to learn of the new block and report back their own parallel blocks to the originator. We want to consider this back-propagation time as well, so we set $D’=2D$ . The reshaped formula becomes: $\mathbb{P}\left[Poi\left(2D\lambda\right)\right]=\frac{\left(2D\lambda\right)e^{-2D\lambda}}{n!}\text{.}$

We want to make sure that the probability that at least $k$ blocks are generated in time $2D$ is at most $\delta$ .

For any $m$ we have that the probability that at least $m$ blocks were created, is just the sum of the probabilities that exactly $j$ blocks were created, where $j$ goes over all values at least $m$ . In other words, we want to find the least $m$ such that $\sum_{j=m}^\infty p_j < \delta\text{.}$

Since the values $p_j$ are probabilities of events that are disjoint (it is impossible that during the last minute, both exactly three blocks and exactly four blocks were created ) and exhaustive (the number of blocks created in the last minute must be some natural number, possibly zero), we get that $p_j\ge 0$ for any $k=0,1,\ldots$ , and that $\sum_{j=0}^\infty p_j = 1\text{,}$ which imply together that $\lim_{m\to\infty}\sum_{j=m}^{\infty}p_{j}=0$ and in particular, that we can find such $m$ .

From Chains to $k$ -Clusters

The intuition that leads to $k$ -clusters is as follows: in order to perform a double-spend attack, an adversary must withhold the blocks they create, at least for a while. If they release the blocks they create too soon, the network might be “convinced” that they are the better blocks and switch sides too soon, ruining the double spend attack.

If we take the discussion back to chains, we can imagine a scenario where $C=6$ . That is, a merchant will not consider a transaction confirmed before there’s a six blocks long chain piled over it. Assume that shortly after the adversary started cultivating her side chain, she found herself with three blocks while the honest network only produced two:

If the attacker publishes her blocks before the honest network accumulated $C$ blocks, then the network will reorg to her side chain before the merchant accepted the transaction, and the attack will have failed.

In the blockDAG world things look a bit different. Blocks can point at several other blocks. In particular, the adversary can point at her own blocks as well as the honest network, so it might look a bit like this:

The point is that while the attacker can point at honest blocks, they will not point back at her blocks. So as time goes by, more blue blocks accumulate in the anticone of the block containing $tx’$ , until their number becomes more than $k$ and we are guaranteed that, barring a reorg, the block containing $tx’$ will never be blue. If the honest network has more hashrate, then it is very likely (as we have analyzed for chains) that by the time the honest network achieves $C$ confirmation the DAG would look like this:

We see that the block containing $tx’$ has $C+1$ blue blocks in its anticone, and it is not going to be blue any time soon.

This example is not too convincing, because we could have used the longest chain rule as well. But that’s just because we assume the honest network is arranged like a chain. Look what happens when we DAG it up:

The attacker has created the longest chain (because she has the privilege to nicely arranged all her blocks into a chain, while the honest network does not), but the block containing the contradiction is not part of the maximal $k$ -cluster).

The actual analysis is obviously more involved, and requires taking into account the weight that the network lends to the honest blocks, and carefully show that they do not allow accumulating an advantage over time. However, that is the gist of it: revealing your blocks early will end the attack prematurely, while withholding to them for a while will exclude them from the maximal $k$ -cluster.

In a magical world where we know how to find a maximal $k$ -cluster this leads to the following ordering rule:

Find a maximal $k$ -cluster
Output a topological ordering that gives precedence to elements in the $k$ -cluster

The idea behind this protocol is that hopefully, any attempt at double-spend would produce poorly connected blocks that will necessarily be outside the maximal $k$ -cluster, ensuring that they will never precede any transaction that is in the $k$ -cluster. I am saying “hopefully” because actually proving it for PHANTOM is quite difficult and quite unnecessary as well. I remind you that the PHANTOM algorithm cannot be implemented. The GHOSTDAG rule is not just much easier to implement, but also much easier to reason about. It’s constructive nature provides structure that does not exist in the very abstract PHANTOM, and we exploit this structure to reduce many of our reasonings from DAGs to chains. The security proof of GHOSTDAG is essentially just that: proving that if GHOSTDAG is not secure, then neither is HCR, contradicting established results from the Bitcoin literature.

Part 2: Recursive, Greedy Approximation

GHOSTDAG is a recursive and greedy algorithm that approximates PHANTOM. So before we dive into it, we better have some understanding of what these words mean.

Recursive Algorithms

A recursive algorithm is an algorithm whose output on a given input is defined in terms of previous inputs. Complicated? An example should clear it up.

Most chances you have heard of the Fibonacci sequence $1,1,2,3,5,8,\ldots$ . The $n$ th Fibonnaci number $F_n$ is defined as follows:
$F_n = \begin{cases}1 & n=1\\1 & n=2\\F_{n-1}+F_{n-2} & n>2\end{cases}$

What should draw your eyes here is that $F_n$ is defined in terms of previous Fibonacci numbers.

Indeed, if we wanted to write a function that computes the Fibonacci number, we could write it like this:

def fibonacci(n):
  if n == 1:
    return 1
  if n == 2:
    return 2
  return fibonacci(n-1) + fibonacci(n-2)

This is a bad implementation, for many reasons (mostly because it is easy to find a closed formula $F_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n}-\left(\frac{1-\sqrt{5}}{2}\right)^{n}\right]\text{,}$ see here for example), but it works and demonstrates the point: the fibonacci function calls itself.

The GHOSTDAG protocol is given a block $B$ in a DAG, and has to compute $BluePast(B)$ : that is, all the blocks in $Past(B)$ that $B$ thinks are blue. Recall that in GHOSTDAG, blocks can disagree on what blocks are blue. Indeed, it is possible that $C \in Past(B_1) \cap Past(B_2)$ and $C\in BluePast(B_1)$ , yet $C\notin BluePast(B_2)$ .

This seems highly problematic, since it means that to even track all this information, we have to store the entire blue set of each block, which would take huge amounts of space. Worse yet, we have to keep the entire history to compute the blue past of any incoming block.

That’s where recursion comes to the rescue! Instead of computing the entire past, we a block can inherit the blue past of one of its children, and only add to it whatever it thinks should be added.

Greedy Algorithms

A greedy algorithm is an algorithm that does not look for the best possible solution, but is satisfied by making the most profitable next step.

Imagine you are a robber, and you broke into a museum with enough room in your knapsack to steal ten pounds of antiques. The greediest way to fill your bag will be to keep loading it with the most expensive item that fits, until there are no such items.

You look around and see two Victorian era statuettes of Charlie Lee and Craig Wright, each weighing five pounds and worth 500 kas. You also see three antique Ming dynasty vases, each weighing three pounds and worth 390 kas.

If you follow the greedy algorithm we described, you will end up with two statues with a total worth of 1000 kas, which is not optimal, since by grabbing the three vases, you would have made off with 1170 kas worth of antiques.

I bet at least some of you are already jumping in your seats dying to tell me “but Shai, you were just optimizing over the wrong quantity. If you’d check for value per pound you would have found that each statue is worth 100 kas per pound, while the vases go for 130 kas per pound”, and you would be correct. In this case, there is another greedy algorithm that does find an optimal solution.

However, with some work, one can also find a counterexample for that algorithm as well (see exercises). In fact, even if we assume in advance that all the museums treasury has uniform value per pound, and even that their weights are always in whole pounds, you could still find a counterexample for any algorithm you could find. Why am I so sure? Because this problem, called the knapsack problem, is also NP-hard.

Another example of a greedy algorithm is when we walk a city we don’t know, and want to avoid taking routes that are too steep. If we are at a junction, where the left road seems rather flat, while the right one leads to a huge ascent, we are more likely to take the left turn, despite no guarantees that just around the corner, the flat road will lead us to an even steeper ascent.

But of course, the best example of using a greedy algorithm in real life is after we finish our walk and are sitting to enjoy a cup of coffee, for which we pay using a digital currency that relies on a greedy algorithm called GHOSTDAG.

Approximation Algorithms

Note that the greedy algorithm for the knapsack problem might have not have optimized our profit, but it still did a rather good job. Yeah, 1000 kas is not 1170 kas, but it is quite approximate. We would have been much more upset if we’d have finished the night with 100 kas.

This is an example of approximation, a common theme in computer science that realized that sometimes, finding a “close to optimal” solution is the best compromise between optimality and tractability. Computer science is plagued with examples of NP-hard problems that admit algorithms guaranteed to provide a near-optimal solution in some sense.

For example, the knapsack problem might not be solvable. But given $n$ items, there is an algorithm that runs in time $O(n^2)$ and outputs a choice of items whose total value is at least, say, $3/4$ of the best solution. (We can actually say more: for any $\varepsilon >0$ the FPTAS algorithm can find a solution whose value is at least $(1-\varepsilon)$ of the optimal solution in time $O(n^2/\varepsilon)$ . So if, for example, you want to guarantee your solution gives you at least $(1-1/n)$ of the best solution, you could do it in quartic time of $O(n^3)$ .) However, that algorithm is not greedy. In the exercises we will see a greedy $1/2$ -approximation algorithm for the knapsack problem.

GHOSTDAG is also an approximation algorithm. What does it return? An approximately maximal $k$ -cluster. In what sense is the output “approximating”? In the sense that we can prove that it provides the security we require, despite not necessarily outputting a maximal $k$ -cluster.

Now let me stress: the security itself is not approximate in anyway. GHOSTDAG is not “approximately secure”. The output is an approximation of a maximal $k$ -cluster, but we can prove that this approximation is enough to fully establish the security. In other words, the security definition we subject GHOSTDAG to has nothing to do with how it is implemented, and is fully agnostic to any greediness or approximation.

Set Cover: A Greedy Approximation Example*

So far we haven’t actually seen an approximation algorithm. We’ve seen the knapsack problem, but no approximate solution.

The following is a nice textbook example called the set cover problem. In this problem we are given a set $U$ and a collection of subsets $S=\{S_1,\ldots,S_n\}$ of $U$ , with the promise that each $u\in U$ appears in at least one $S_k$ . Our goal is to find the smallest possible $S’\subset S$ such that each $u\in U$ is contained in at least one element of $S’$ .

For example, say we need to arrange an activity day for a bunch of picky children. Let $U$ be the list of guests, and let each $S_i$ could represent an activity and the list of children who will be willing to participate in it. We want to find the smallest assortment of activities such that each child would agree to participate in at least one activity.

Another example of an instance of set cover is trying to find the smallest amount of ellipses in the following picture such that each star is contained in at least one ellipse:

It does not take a wild stretch of the imagination to realize that the significance of the set-cover problem goes beyond picky children, and it arises naturally in many real world problems such as resource allocations.

Unfortunately, the set cover problem is also NP-hard. So to find practical solutions to problems that could be restated as a set cover problem, we could really use a good way to approximate it.

Fortunately, there is a very simple algorithm that does just that:

Start with an empty list $L$
Let $U’$ be the union of all elements of $L$ (the list of children willing to participate in an activity in $L$ )
While $U’ \ne U$ (while there is a child that will not participate in any activity in $L$ ):
- Insert into $L$ the element $S_k$ that maximizes $|S_k \setminus U’|$ (add the activity with the most children that are not willing to participate in any activity already chosen)

The promise that $\bigcup_k S_k = U$ (that all children want to participate in at least one activity) implies that the algorithm will terminate after at most $n$ iterations (since each iteration decreases the number of children without an activity by at least one).

The algorithm above is an example of a greedy algorithm. A greedy algorithm always makes the steps that looks best “for now”, disregarding the possibility that showing some restraint now could pay off in the future. The reason I am pointing it out is that GHOSTDAG is also a greedy algorithm.

For example, running the algorithm on the example above could end in the following result:

outputting five ellipses that cover all stars. Which is good, but not optimal, as they could be covered using only four ellipses:

One can prove that if there are $n$ kids, and the best solution requires $k$ activities, then the above algorithm will return at most $k\cdot \ln n$ activities, I will explain how to prove this in the exercises. (One can also show that if $P\ne NP$ it is intractable to provide a better approximation, I will not explain how to prove this).

Part 3: High-Level View of GHOSTDAG

We now deliberate on the recursive structure of GHOSTDAG.

Before diving into the details, we need to set up some terminology. For each block we want to compute its blue past, denoted $B.BluePast$ . That is, we want to know which blocks are blue in $B$ ’s opinion. Given that, we can define the blue score, $B.BlueScore$ , which is just the number of blue blocks in its past.

Remark: in practice, we can’t just look at the size of the blue past. Since there is difficulty adjustment, we have to consider the weight of each blocks in terms of the amount of work required to create it, exactly like we had to for Bitcoin. Hence, in the actual implementation we use another metric called blue accumulated work (BAW).

Given any block $B$ , we say that $C$ is the selected parent of $B$ , and denote $C = B.SP$ , if $C$ is the parent of $B$ with the highest blue score (if there are several possible choices for $C$ , use a tie breaking rule, the Kaspa implementation goes by higher hash).

Given a block $B$ , its merge set is the set of all blocks in its past that are not in its selected parents closed past: $B.MergeSet = B.Past\setminus B.SP.\overline{Past}\text{.}$

Note that all the blocks in $B.MergeSet$ are also in $B.SP.AntiCone$ which means that we can color at most $k$ of them blue if we want the blue set of $B$ to be a $k$ -cluster. This is very nice, because it puts a constant limit on how many blue blocks each block is adding, making the storage required for each block fixed.

Once GHOSTDAG computed what blocks it should add to the blue set, we can use that information to output a topological sorting, whereby making GHOSTDAG into an ordering rule. The way it does so is by following the template for iterative topological sorting we described in the last chapter: it computes a topological sorting of $MergeSet(B)$ that gives precedence to blue blocks, and appends it to the topological sorting of $Past(B)$ .

The Recursive Structure of GHOSTDAG

We can summarize the entire recursive structure into GHOSTDAG into a single statement: $B.BluePast = B.BlueAntiCone \cup B.SP.\overline{BluePast}\text{.}$

To understand how to exploit this structure, we will start with a simple example: checking that one block is in the blue past of the other. Say that we want to implement a function inBluePast(C, B) that checks whether $B$ thinks $C$ is blue. We could do so like this:

def inBluePast(C, B): #highly inefficient
  if C in B.BlueMergeSet:
    return True
  if B is Genesis:
    return False
  return inBluePast(C, B.SP)

This function is very straightforward:

If $C$ is in $B.BlueMergeSet$ then $B$ thinks its blue so we return True.
Otherwise, if $B$ is $Genesis$ then return False, since from the point of view of $Genesis$ , no block but $Genesis$ is blue. (Note that if $C = Genesis$ then we have $C \in Genesis.BlueMergeSet$ (because every block is in its own blue merge set), so the previous line will return True.
If none of the above hold, we need to dive deeper to answer the question, so we replace $B$ with $B.SP$ and repeat the process

The recursive nature is that we are allowed to switch $B$ with $B.SP$ : since $B$ inherits $B.SP.BluePast$ , if we find out that $B.SP$ thinks $C$ is blue, we can conclude that so does $B$ . We can apply this deeper and deeper to $B.SP.SP$ , $B.SP.SP.SP$ and so on until we find the blue merge set accommodating $C$ , or hit $Genesis$ .

This implementation might work, but it is extremely inefficient. The worst offense is that if $C$ happens to be red, the algorithm will make its way all the way down to genesis, regardless of how high $C$ is up the DAG. This means that the worst case for this code will increase linearly with time, which is very undesirable.

With some reflection we realize that we’ve actually done a lot of unnecessary work because we did not use all the information available to us. In particular, for any block $B$ we can check if $C \in B.MergeSet$ and if it is, check if it is blue. This will prevent us from going below the first block on the selected chain of $B$ that knows of $C$ . This would look something like this:

inBluePast(C, B): #somewhat inefficient
  if not C in B.Past:
    return False
  if C == B:
    return True
  if C in B.MergeSet:
    return (C in B.BlueMergeSet)
  if B is Genesis:
    return False
  return inBluePast(C, B.SP)

In this implementation, we first put a stop to the entire thing if $C \notin B.Past$ , as we should. This implies that the recursion will never go deeper than $C$ , since it will terminate as soon as we hit a chain block that is parallel to $C$ . This is still not a very good implementation, since there are edge cases that can cause it to run linearly long. For example, consider running inBluePast(C, B) on the following DAG:

There are better implementations still that can alleviate the problem, but we do not concern ourselves with them. Why not? Because they do not happen, either organically or maliciously. Checking whether a block is in the blue past of another block is not an operation that can be triggered by an adversary for arbitrary blocks, and in the honest majority setting, the depth difference between $B$ and $C$ is bound.

There is actually a much more important yet nuanced detail here, it is hiding in plain sight in the very first line of the function: if not C in B.Past. We are so used at this point to statements like “well, if $C$ is in the past of $B$ then…” that I presume most readers did not stop to ask themselves: “wait… how do we actually check this?” And the answer is that it is very hard to do correctly. In my years on board the Kaspa wagon, I have talked to several teams who tried to implement GHOSTDAG but couldn’t make an implementation that’s even remotely adequate in terms of efficiency. A typical attempt at implementing GHOSTDAG chokes completely when trying to compute the ordering for a DAG of width $5$ and $k=18$ , and the performance bottleneck is exactly that: checking whether one block is in the past of the other. We call such a check a reachability query. GHOSTDAG is very dense in reachability queries, with the number of such queries as high as $O(k^2)$ for each incoming block. Devising a clever way to implement them efficiently is key to efficiently implementing a DAG protocol.

In the next post, we will see how to compute blue merge sets reasonably efficiently given that we know how to implement efficient reachability queries, and further down the episode we explain how reachability queries are implemented.

Extracting the Topological Sorting

The block $B$ doesn’t only inherit the blue blocks in $B.SelectedParent.Past$ , but also the topological sorting of $B.Past$ . We can then compute a topological sorting of $B.MergeSet$ and append it to the inherited topological ordering to obtain an ordering of $B.Past$ , as we explained in detail in the previous chapter.

There is actually a lot of freedom in choosing the ordering, the only property we require of the ordering is that if a blue block can appear before a red block then it has to. So how can we pick such an ordering?

A common mistake is to try and sort “layer by layer”: start with the blocks whose parents are all outside $B.MergeSet$ , and place the blue ones before the red ones. Now take the blocks whose parents were already sorted and do the same thing. This will provide a topological sorting, but it might not give precedence to a blue block even though it could. For example consider this DAG:

The “layer-by-layer” approach will start with the layer $A,B,C,D$ , since $D$ is red, it will place it last, say it sorted them as $A\prec B\prec C \prec D$ , it would then move to the next layer (containing only $E$ ) and then to the last one (containing only $F$ ), and we’ll end up with the ordering $A\prec B\prec C\prec D\prec E\prec F$ .

The problem with that is that $A\prec B\prec C\prec E\prec D\prec F$ is also a topoloical sorting, but one that prefers $E$ to $D$ , as we want. The blue block $E$ is parallel to the red block $D$ , and yet it appears after it in the ordering, deprived of its precedence simply because its parents are in the anticone.

A correct approach will be something like this: start with the blue blocks in the first layer, then go on to all blue blocks whose parents were already sorted, and so on. Keep doing so until you are stuck, that is, when all unsorted blue blocks have unsorted parents. Once you get stuck, add a single red block, and go back to working on blue blocks. Keep going until all blocks are sorted.

So in the example above, we would start with the blocks $A,B,C$ and sort them somehow. We would then notice that all of $E$ ’s parents are sorted, so we can add $E$ to the ordering. We are now stuck, as the only unsorted blue block is $F$ , but $F$ has an unsorted parent $D$ . So we add $D$ to the ordering and keep going. The resulting ordering is indeed possibly $A\prec B\prec C\prec E \prec D\prec F$ . We ensured that no blue blocks are deprived their precedence by only adding red blocks when we absolutely have to.

This is not a very interesting example, and I recommend that you try to manually compute the ordering for more complicated cases, a few are provided in the exercises.

What makes the algorithm above ambiguous is the order we traverse the blocks. Different traversal orderings can provide one of many different topological orderings that all admit the required blue-precedence property. For example, the ordering $B\prec A\prec C\prec E \prec D\prec F$ is equally valid.

There is a very important nuance here: from a theoretical perspective, the security argument for GHOSTDAG will work for any ordering that fits the mould. However, when designing an actual network all of the above must be specified exactly. All implementations in the same network must follow the same ordering. Otherwise, nodes will not agree on which blocks are blue, and the network will split. In other words, if you want to implement your own network that uses the GHOSTDAG protocol, feel free to choose any suitable ordering. However, if you want to implement your own Kaspa client, you will have to comply with the exact ordering used in the Kaspa reference implementations.

The Virtual Block

We explained above how to extract the topological ordering from the point of view of a particular block, but what is the ordering in consensus? The POV of what block should be used as the objective truth?

The most immediate response might be “go with the bluest tip”, but that’s not good. We want to delay the other tips, but not ignore them.

What we do instead is to ask ourselves “if an honest miner mined a block over the DAG we observe, what would be the ordering according to this block?”. We call this block the virtual block.

Currently, it seems like the virtual block simply points at all tips and then computes GHOSTDAG, but that’s not completely true, for reasons we will see in the future:

A block could be invalid, and pointing at an invalid block will make your block invalid. Blocks that are invalid because they are malformed, don’t satisfy the difficulty requirement, etc., or any other violation that can be read off the header alone are trashed immediately, so we just disregard them. However, blocks could also be invalid due to their data. The handling of such blocks is different and could create situations where the valid block with the highest blue score is not the selected child. We explain this in detail when we discuss implementing the UTXO model over GHOSTDAG.
Actually, it’s not true that a miner would point at all tips. It is true for the version of GHOSTDAG we have seen so far, but in the prunable version, some more consensus rules are required, that actually sometimes require the miner to skip some of the tips. We want the ordering to remain truthful to that limitation.

More generally, we want to use a virtual block since we want the node to extract an ordering in the same way that a miner would. This couples the security of mining with the security of running a node, which is what we want.

Part 4: Picking Blue Blocks

We reduced computing GHOSTDAG to sampling blue blocks out of the merge set, which is the next problem on our list.

I have stated several times that “GHOSTDAG is a greedy algorithm”. Well, this is the greedy part. The way we choose the blocks in $B.BlueMergeSet$ is by traversing $B.MergeSet$ and, for each block, ask ourselves “can we color this block blue without destroying the $k$ -cluster property?”. We do not ask ourselves “would we have managed to add more blocks if we traversed them in a different ordering?”.

This means that, at the highest level, computing $B.BlueMergeSet$ looks like this:

def computeBlueMergeSet(B):
  B.BlueMergeSet = {B.SelectedParent}
  for C in B.MergeSet:
    if canAdd(B,C):
       B.BlueMergeSet.add(C)

First note that we go over $B.MergeSet$ in some arbitrary order. We do not need to specify exactly what this order is. In the Kaspa implementation this freedom was leveraged to choose an ordering that makes implementation as efficient as possible.

Obviously, we encapsulated most of the heavy lifting into the canAdd function. The rest of this post will deal with figuring out this function.

Checking Blue Candidates

So to reiterate, we want to create a function canAdd(B, C) that is given a block $B$ and a block $C$ and tells us if $C$ can be added to $B.BlueMergeSet$ without violating the $k$ -cluster property.

To do so, we need to do two checks. First, we need to check that there aren’t already $k$ blue blocks in $C.AntiCone$ . Second, we need to check that if $D$ is a blue block in $C.AntiCone$ then there aren’t already $D$ blue blocks in $C.AntiCone$ .

The first step is demonstrated here for $k=2$ :

The block $C$ cannot be added because there are $k+1$ blue blocks in its anticone.

For the second step, consider this similar situation:

Here $A$ is the only blue block in $C.AntiCone$ , so $C$ can be blue without violating the $k$ -cluster property. However, since $C \in A.AntiCone$ and there are already blue blocks in $A.AntiCone$ other than $C$ , adding $C$ will cause $A$ to violate the $k$ -cluster property.

On the surface this doesn’t seem very complicated: we can traverse $B.AntiCone$ and for each block colored blue, keep track of how many blue blocks are already in its anticone, never letting their amount suppress $k$ . Since there could be at most $k$ blue blocks, this will require at most $k^2$ reachability queries, and the complexity is fine, right?

So here’s the problem: we can’t just look inside the anticone. There are situations where we have to look into $B.SP.Past$ . Consider this scenario:

Here (for $k=2$ ) we obviously are not allowed to color $C$ blue. However, we cannot learn this from $B.MergeSet$ alone.

Checking how many blue blocks are in $C.AntiCone$ is a reasonable task, it is the other way around that makes things cumbersome. So now we have to go through blue blocks in $B.SP.Past$ and check how much blue blocks they have in their anticone (including those outside $B.SP.Past$ ? That doesn’t bode well.

This becomes much easier if we break the work further into two steps: First, implement the function under the assumption that for any blue block $C \in B.Past we can magically know how many blue blocks are both in$ B.BluePast $and in$ C.AntiCone$. Second, find a nice way to track this quantity that does not rely on magic.

Assuming that blue anticone sizes are tracked for us, we can implement canAdd(B, C) as follows:

def canAdd(B, C, blues = 0):
  if B in C.Past or B == C:
    return true
  for D in B.BlueMergeSet:
    if D in C.Anticone:
      if D.BluesCount == k or blues == k:
        return false
      blues = blues + 1
  return canAdd(B.SP, C, blues)

This code is both crucial and nuanced, so let us take the time to pick it apart.

First, the syntax blues = 0 in the first line just means that the program will assume blues is set to 0 unless we tell it otherwise, like we do in the last line. We use the variable blues to count how many blue blocks we have already noticed in C.anticone, and using it as a parameter allows us to pass it to each recursive call.

We first check if $B \in C.Past$ . We will come back to this check in a while, for now we only need to notice that this will never happen during the first recursive call, since computeBlueMergeSet(B), only calls canAdd(B, C) for $C \in B.BlueMergeSet \subset B.\overline{Past}$ . Second, we check if B==C, this is just a technicality that is required because we defined $BlueMergeSet$ to include $B$ (if we hadn’t included $B$ , we could remove this line, but would have had to make a similar check elsewhere).

The block starting with for D in B.BlueMergeSet is where we iterate over blue blocks in $B.BlueMergeSet$ . Whenever we find a blue block $D$ that is in $C.Anticone$ we check that neither $C$ nor $D$ already has $k$ blue blocks in its anticone. If one of the checks fails, we return false. Otherwise, we increment blues to account for $D$ being a blue block in $C.Anticone$ .

If we reached the last line, this means that we have finished traversing $B.BlueMergeSet$ without exceeding the $k$ blue blocks limit in $C.Anticone$ . So we now recursively call canAdd, asking it to repeat the process, starting with $B.SP$ .

At the top of the following recursive call, things are a bit different. Now it is B == C that could never happen, yet B in C.Past is definitely possible. If indeed $B\in C.Past$ there is no point to keep going since all of the remaining blue blocks are in $C.Past$ , so they are not in $B.Anticone$ . After this check, things continue as perscribed.

So how long does it take this algorithm to terminate? Here’s the nice thing: it will never recurse more than $k + 1$ times. Why is that? Well, because each recursion that is not satisfying B in C.Past increases blues by at least $1$ , and once blues exceeds $k$ is game over.

Tracking Blue Anticone Sizes*

The missing link (excluding reachability queries) is the ability to track how many blue blocks are in each block’s anticone.

This is not very difficult, but is nuanced, and the correct solution is saddled on the recursive structure of GHOSTDAG.

The first (wrong) idea that comes to mind is to simply store, for each block, the number of blue blocks in its anticone.

This idea fails because the blue coloring is subjective. Consider for example this situation for $k=2$ , where bold arrows denote the selected chain:

Here we see that there are two blue blocks in $A.Anticone$ , $B$ is one of them and $C$ isn’t.

Three blocks later, we can find ourselves in a completely different reality:

Note how everything has changed. Not only did the number of blue blocks in $A.Anticone$ was reduced from two to one, but $B$ has changed from blue to red while $C$ has changed from red to blue!

What made this possible is the reorg: the selected tip in the first scenario is no longer on the selected chain in the second.

This means that we have to take subjectivity into account. So what, each block will track the blue anticone sizes of all blocks in its past? That’s not good, since the amount of information each block will have to carry grows. OK, so what if each block only tracks the blue anticone of blocks in its mergeset? No dice. The problem here is that blocks in $B.BlueMergeSet$ could be in $C.Anticone$ for some $C\in B.SP.Past$ . For example, here for $k=3$ :

the block $A$ is in $B.SP.MergeSet$ . However, $B.SP$ does not know that $C$ exists.

The trick here is very similar to what we do with the blue past. Each block inherits the blue sizes from its selected parent, and only keeps track of the blue blocks whose blue anticone changed. Now, why does this work nicely?

First, note that since each block only tracks this information from their point of view, blocks never change color. It follows that the blue anticones of blue blocks can only increase. Second, note that the number of blue anticone changed by a single block is bound by a constant: any block in $B.MergeSet$ that we color blue can be in the anticones of at most $k$ blue blocks, we are adding at most $k$ blue blocks, so trivially obtain a bound of $k^2$ .

Remark: We can actually improve this bound to $O(k)$ . The key observation is that if $C$ has many blue parallel blocks in $B.BlueMergeSet$ then it can only have few blue parallel blocks outside $B.BlueMergeSet$ . However, if $C$ has few parallel blocks in $B.BlueMergeSet$ , it must share non-parallel blocks outside $B.BlueMergeSet$ with other blocks in $B.BlueMergeSet$ . The two extremes for $k=2$ are here in the following illustration:

In the top DAG, all elements of $B.BlueMergeSet$ are parallel to each other so they cannot be parallel to any other blue blocks. In the right DAG, none of them are parallel to each other so they are arranged in chain, but any block that is parallel to the top of the chain must also be parallel to the rest of the chain, so there could be at most $k$ of them.

Once we have that in place, to check the number of blue blocks in $C.Anticone$ from $B$ ’s point of view, we traverse down the selected chain until we find the first block that records $C.BlueAnticoneSize$ . Since we went back in time, this would actually be the latest update of this quantity, giving us the size we wanted. For example here:

If we want to compute how many blue blocks are in $A$ ’s anticone from $B$ ’s perspective, we start at $B$ and see that the information is already available there. However, if we want to find that information for $D$ , we start from $B$ , notice that there is no record, proceed to $B.SP$ , and find that there is a record. If we found that we traversed into a selected chain block in $C.Past$ , then we know that this information isn’t recorded anywhere along the selected chain, which means that $C$ is not a blue block at all! We ignore this edge case and assume this check is only performed for blue blocks.

So how deep can this recursion go? Well, if we choose $C$ to be a very low block it could theoretically take us all the way to genesis, just like inBluePast(C, B). And just like inBluePast(C, B), we are protected by the fact that an adversary cannot just ask a node to compute this function for arbitrary inputs. Yes, the complexity over all blocks is bad, but when we remember that inBluePast(C, B) and the newly christened blueAnticoneSize(C, B) are only used as subroutines for GHOSTDAG, we can limit the complexity analysis to inputs that are actually fed to these functions.

To translate the above discussion to code, we will do two things: we will modify computeBlueMergeSet(B) and canAdd(B, C) to keep track of this information, and we will add a helper function inBluePast(C, B) that uses this information. We have to be careful though, since we use this information as we compute it, we must avoid querying inBluePast(C, B) on blocks that were not yet updated.

We will use a standard data structure called a dictionay (or map) that has keys that are mapped to values, and the has keyword allows us to check if the dictionary has a particular key.

The modification of computeBlueMergeSet(B) is the simplest, all we do is add a line that creates an empty dictionary

def computeBlueMergeSet(B):
  B.BlueMergeSet = {B.SelectedParent}
  B.BlueSizes = {} #empty dictionary
  for C in B.MergeSet:
    if canAdd(B,C):
       B.BlueMergeSet.add(C)

The revised canAdd(B, C) function is trickier. It needs to track not only the number of blue blocks, in $C.Anticone$ , but the blue blocks themselves. We can’t update their anticone sizes on-the-fly, because we might find out later that $C$ is red. We also have to keep passing the block $B$ we started from, because that’s where we’ll have to update the anticone sizes if we ever return true.

def canAdd(B, C, original_B = B, blues = {}):
  if B in C.Past or B == C:
    original_B.BlueSizes[C] = blues.size()
    for D in blues:
      if original_B.BlueSizes has D:
         original_B.BlueSizes[D] += 1
      else:
         original_B.BlueSizes[D] = blueAnticoneSize(D, B) + 1
    return true
  for D in B.BlueMergeSet:
    if D in C.Anticone:
      if blues.size() == k or blueAnticoneSize(D, B) == k:
        return false
      blues.insert(D)
      if 
  return canAdd(B.SP, C, Original_B, blues)

Finally, we implement blueAnticoneSize(D, B):

def blueAnticoneSize(D, B):
  assert(B not in D.Past)
  if B.BlueSizes has D:
    return B.BlueSizes(D)
  else:
    return blueAnticoneSize(D, B.SP)

Part 5: Reachability Queries I — From Trees to DAGs

Reachability queries are the complexity bottleneck of GHOSTDAG, and being able to implement them efficiently is one of the reasons for Kaspa’s impressive performance. Like in most great algorithms (such as GHOSTDAG itself), the core idea is very salient, and the intricacies follow from adapting it to reality.

The reachability problem can be stated this way: given two block $B,C$ , determine whether $C\in B.Past$ .

To better understand the difficulty, let us look at two naive solutions, that are at two extremes of the time vs. space trade-off.

The first solution is, given $B, C$ just start traversing $B$ ’s past until you reach $C$ (and then you output true) or $Genesis$ (and then you output false). This solution is very efficient in terms of space, in fact, it does not require any permanent storage at all. However, it takes linear time. As time progresses, each such query will take longer.

You can argue that this process can be trimmed quite a lot: if $C \in B.Past$ then necessarily $C.BlueScore < B.BlueScore$ , so we can terminate our search whenever we got to a block whose blue score is less than $C.BlueScore$ , as it will never lead us to $C$ . However, that’s not enough. By posting two blocks, an attacker can force GHOSTDAG to make arbitrarily long reachability queries:

In this scenario, when we try to compute GHOSTDAG for $D$ , we will have to find out whether $D \in B.Past$ , which will lead us all the way down to $C$ , regardless of how many blocks there are between $B$ and $C$ .

In other words, this solution enables a CPU attack and is thus unacceptable. We will discuss this more soon.

At the other extreme, for each block we could simply keep a list of all blocks in past. Implemented correctly, that will allow answering any reachability query in constant time. The problem is that it requires quadratic storage. For example, for a chain of length $n$ we have that the block at height $k$ has past of size $k-1$ . Hence, even if the storage requirements are a measly $1$ byte per block reference (which they aren’t), the required storage will be $1 + \ldots + n$ bytes. I guess most of you know the formula $1 + \ldots + n = \frac{1}{2}n(n+1)$ , but even without it, we note that of $1,\ldots,n$ the largest $n/2$ numbers are at least $n/2$ so we have that $1 + \ldots + n > n^2/4$ . That is, the required storage grows quadratically fast.

Remark: The truth is we don’t need the entire past of all blocks, but only the pruned past. But a solution that grows quadratically with the BPS and the size of the pruning window isn’t a hit either. A simple back-of-the-envelope I will not repeat here shows that 10BPS with 36 hours pruning would require about 16GB of storage. That might not sound like a lot in terms of hard drive, but the real problem is having to random access such a huge data set thousands of times a second. Loading the entire thing to memory is not practical, and constantly page it will completely constipate the client.

Reducing from DAGs to Trees

By now you should be very used to the method of solving a problem by identifying its core, and showing how solving it can solve the entire thing. This case is no different.

But first, let us introduce some definitions.

First, recall the notion of a tree. A tree is a simple form of DAG where there is a root node, and each block only has one parent. For example, the set of all valid Bitcoin blocks ever created is a tree. Recall that, generally, a tree looks like this:

Given a DAG, a spanning tree is a choice of the edges that forms a tree that is coincides to all vertices.

For example given the following DAG:

we can find many ways to choose a spanning tree, including these two:

Note that any tip of the DAG must be a tip of the spanning tree, but not the other way around.

Remark: note that a spanning tree is not an induced subgraph.

The GHOSTDAG ordering affords a natural spanning tree for the DAG: choose only edges that goes from blocks to their selected parents. Since every block has exactly one selected parent, the result is a spanning tree. We call that spanning tree $SelectedTree$ . For each block $B$ , let $B.TreeFuture$ be the set of vertices from which $B$ is reachable via the tree edges. Note that this is not the same as $B.Future$ . For example, here:

we have that $C \in B.Future$ but $C \notin B.TreeFuture$ (because we can reach $B$ from $C$ by following arrows, but not by only following red arrows).

Note that for any two blocks $B$ and $C$ , proclaiming that $B\in C.TreeFuture$ is the same as declaring that $C$ is along the selected chain starting from $B$ .

The problem we currently want to solve is this: assume we have a magic box called $InTreeFuture$ that is given two blocks $B,C$ and tells us whether $B \in C.TreeFuture$ . Also assume that $InTreeFuture$ takes a fixed short time to respond, say a microsecond. Can we use $TreeOracle$ to efficiently implement reachability calls for DAGs?

Note that when we say “efficient”, we are also limiting the number of $InTreeFuture$ queries. Our goal is to keep the number of require oracle calls fixed.

Remarks: It should be pointed out that in the graph theoretic sense, there’s nothing special about $SelectedTree$ and we could have used any spanning tree. However, using the selected tree is mutually beneficial: on one hand, GHOSTDAG naturally provides us with this tree, and on the other, being able to quickly answer whether one block is in the selected chain of another is very useful.

Apparently, there is a rather easy way to do it.

For each block $C$ we define the future covering set to contain all blocks $B$ such that $C$ is in the anticone of $B$ : $C.FCS = \{B\mid C\in B.MergeSet\}\text{.}$

We argue that the following function will return True if and only if $C \in Past(B)$ :

def inPastOf(C, B):
  if InTreeFuture(B, C):
    return True
  for D in C.FCS:
    if B == D or InTreeFuture(B, D):
      return True
  return False

It is straightforward to show that if inPastOf(C, B) returns True then $C \in B.Past$ . The other way around is a bit more tricky.

The check InTreeFuture(B, C) succeeds if and only if $C$ is in the selected chain of $B$ . Otherwise we let $D$ be the last block we traverse that is in $C.Future$ . For example:

The not-so-interesting case is $B = D$ , which happens when $C\in B.MergeSet$ . Otherwise, we are in a situation similar to the example above. In this case the following holds:

$C \in D.Past$ while $C \notin D.SP.Past$ . In other words, $C \in D.MergeSet$ , and in other other words $D \in C.FCS$ .
We reached $D$ by only traversing to selected children, meaning that $B \in D.TreeFuture$ .

From the first fact, we get that the code will eventually evaluate InTreeFuture(B, D), and from the second we get that it will return True, whereby inPastOf(C, B) will return True.

For another demonstration, in the following diagram I colored blocks in $C.TreeFuture$ purple and blocks in $C.FCS$ green:

Note that all blocks in $C.Future$ are either purple, green, or in the Tree future of a green block.

Security and Bounded Merge

When discussing security notions, I harped about the fact that a real-world blockchain is actually subject to many different security notions, and that adding a new component requires sufficiently reasoning about its security. This is an example of this. When you add a new component to the system, you need to formulate what adequate security means. In many cases, you don’t have to mathematically formalize it rigorously prove its hold. But at the very least, both the security notion itself and any claims that it is satisfied must hold to scrutiny.

In this case, the relevant security notions are those of CPU attacks and storage attacks. Simply stated, we want to make sure that an (at least) 49% attacker can not cause nodes to waste time and space. That the amount of RAM, hard-drive and CPU cycles an adversary can cause nodes to require must be at most proportional to the amount of work spent into the attack.

So how can we reason that the above solution is secure? The question is that we can’t. You see, there is a storage attack.

We first note that if we sum the sizes of all future cover sets, we get the same than if we sum the sizes of all anticones: In the first case we go over all anticones and for each add the number of blocks inside, and in the second case we go over all blocks and count the number of anticones that they are in.

It turns out that an attacker can easily create blocks in such a way that the sum of sizes of their anticones is quadratic in the number of blocks.

Say that an adversary creates $2n$ block and arranges them like this:

All of the blocks $C_1,\ldots, C_n$ will have the same selected parent (why?), so let us assume it is $B_1$ . It follows that all of the blocks $B_2,\ldots,B_n$ are in the anticones of all the blocks $C_1,\ldots,C_n$ . Hence, $B_2.FCS$ to $B_n.FCS$ all contain references to all blocks $C_1,\ldots,C_n$ . This comes out to a total of $n(n-1)$ references. Hence, the storage required for the $2n$ attacker blocks quadratic in $n$ , and computing reachability queries to one of these blocks will require linear time.

What saves us is something called bounded merge which, as the name suggests, means there is some constant $L$ such that any block whose merge set has more than $L$ blocks is invalid. This modification of the GHOSTDAG rules was introduced as part of the design of the pruning protocol, but it turned out very useful for reachability as well, as it solves our problem. If there are $N$ blocks in the DAG, we would have to store at most $L\cdot N$ references, and inPastOf makes at most $L$ calls to InTreeFuture, giving us the constant time and linear space we needed.

The bounded merge rule might seem like a deus ex machina, but that’s just because it is not yet the time to discuss it in depth. It will make much more sense when we revisit it in the second part of the book.

Part 6: Reachability Queries II - Tree Reachability

Like all solutions we’ve seen so far, the tree reachability algorithm is a simple core idea carefully nuanced to cover all problematic cases.

Reachability for Static Trees

Our starting point is the 1989 paper Efficient Management of Transitive Relationships in Large Data and Knowledge Bases by Agrawal, Borgida and Jagadish.

Remark: One might wonder why the paper is called the way it is and what any of this has to do with transitivity. The word transitive means that if some relation holds from $A$ to $B$ and from $B$ to $C$ then it also holds from $A$ to $C$ . An anti example would be the relation “is a parent of”. It can (and in a tree, must) happen that $A$ is a parent of $B$ and $B$ is a parent of $C$ but $A$ is not a parent of $C$ . On the other hand, if $A \in B.Future$ and $B \in C.Future$ then definitely $A \in C.Future$ so the relation “is in the future of” is definitely transitive. Not that the first relation is the one we can compute and the second is the one we want to compute. But there is a special relationship between these relations: the latter is the transitive closure if the former. Given a relation $\to$ we construct its transitive closure $\overset{\text{tc}}{\to}$ by stating that $A\overset{\text{tc}}{\to}B$ if there are $C_1,\ldots,C_n$ such that $A\to C_1 \to \ldots \to C_n \to B$ . Showing that if $\to$ is the relation “is a parent of” then $\overset{\text{tc}}{\to}$ is the relation “is in the past of” is easy, and shell remain an exercise. The bottom line is that this paper shows how a relation in the form of a tree can be preprocessed to efficiently query its transitive closure (actually that’s just the start, as they then extend their methods to general DAGs, but in ways that are highly incompatible with dynamic trees).

The basic idea is this: for any block $B$ we provide an interval made of two positive indices, $L_B \le R_B$ . We think about the interval as the set of all numbers from $L_B$ to $R_B$ , which is why we call it an “interval”. To stress this, we will often denote such an interval as $\left[L_B\ R_B\right]$

Our goal is to set the intervals in a way that has two properties:

If $C \in B.TreeFuture$ then $B$ ’s interval strictly contains $C$ ’s interval: $L_B \le L_C \le R_C < R_B$ . The last inequality is strict to assure that no two blocks have the same interval.
If $B \in C.TreeAntiCone$ then their intervals are disjoint: either $L_C<R_B$ or $L_B<R_C$ .

With these intervals set in place, we can answer tree reachability queries quite easily: $C \in B.TreeFuture$ if and only if $L_B \le L_C \le R_C < R_B$ .

For example, consider this tree:

It could be endowed with intervals like this:

When queried on whether $B\in C.TreePast$ all we need to note is that $L_B = 1 < 3 = L_C$ and that $R_C = 3 < R_B$ and conclude that the answer is yes. Similarly, if we are asked whether $A \in C.TreePast$ we can note that $C_R =3 < 6 = A_L$ and answer no.

Do notice that there are no two blocks with the same right index. This follows from the requirement that if $B \in C.TreePast$ then $R_B < R_C$ .

Preprocessing the tree takes time $O(n)$ , but it only requires storing a constant amount of data per tree node, and can respond to reachability queries in constant time.

That sounds terrific. This algorithm satisfies all the criteria we sought after. So why aren’t we done? Right. As the title suggests, this solution is good for a static tree, but what do we do when new block emerges? The most head-in-the-wall approach is to scrap and recompute all the intervals, but this approach suffers from a problem so egregious I’ll let this meme explain it for me:

The rest of the post is dedicated to the exciting task of refining this method to accommodate a dynamically changing tree. But before we do that, there is an important refrain: if you look it up, you will find that the problem of maintaining reachability in dynamic DAGs (or as it is more commonly called in the literature, “managing transitive relationships of dynamic confluently persistent data structures,” blech) is wide open. In fact, I have never ran into an article that dares to even hypothesize whether a solution is possible or not. You will find that the solution we propose is very much tailored to Kaspa’s measurements. It’s efficiency and security rely on combinatorial properties of the database that are derived from the security properties of GHOSTDAG (even the DAG-to-tree reduction relied on bounded merge). Loosely, we rely on the fact that the tree becomes constantly higher while mostly maintaining bounded width.

Introducing Slack

If you are a slacker, you came to the right place, because we are going to be giving a lot of slack in this section.

Let us revisit this example:

The indices in this example are tightly packed. Not surprising, as there are nine blocks in the tree and nine indices in $\left[1\ 9\right]$ . Given that no to blocks have the same right index, it follows that nine indices cannot accommodate more than nine blocks.

But what if we had indexed the tree as such:

Now we suddenly get that room to squeeze a block in. This extra room a block has is called slack. The block $\left[1 7\right]$ has a slack of one.

Suppose we wanted to add a child to $\left[1 7\right]$ . No problem! We can furnish the unused index $6$ to obtain:

OK, but what if instead, we wanted the new block to point at $\left[5\ 5\right]$ ? This block has no slack, but we can move slack from $\left[1\ 7\right]$ . Zooming in only on the relevant part of the tree, we could perform these steps:

Such maneuvers are called reindexing. The entire problem of dynamical trees can be boiled down to when and how to reindex. Actually, most of the steps of the reindexing are determined by the position of the block and the current indices. Most of the decisions are regarding how to redistribute slack. The actual algorithm requires more sophisticated reindexing methods than the one demonstrated above.

But before going there we should ask, how much slack can we have? Recall that the two indices are positive integers. In particular, we use eight bit unsinged integers. This means that we can use any integer from $0$ (though I find it more pleasant to start my indices from $1$ ) to $2^{64}-1 = 18,446,744,073,709,551,615$ .

How large is that number? Well, if we assume our network creates one thousand blocks per second, it would take more than half a billion years to make as many blocks. Now recall that the indexation is not in consensus — whenever you restart a node, it restarts the indexing from stretch. So assuming that many block will never be created during a single session — is completely within reason.

Using the entire gamut of an eight bit integer gives us enough room to accommodate all the blocks we would ever want to.

Exponential Slack Distribution

So how do we divide the slack? First, we give the entire range to the root, and then we permeate it up to its children. We then repeat the process recursively.

How much slack does the root give its children? It might seem curious now, but we give the children all the slack. The root only keeps just enough indices to accommodate the subtree. Since this is a recursive process, by the time we are done only the tips will have any slack. Why does that make sense? Because we expect new blocks to only appear near the top of the tree. Of course, we need to handle new blocks that point directly at old blocks (and do so in a way that does not create an attack vector), but this is not the typical case.

We further take the idea of cultivating the blocks that are more likely to have new children by giving them an exponentially larger piece of the pie, er, slack. That is, if child $A$ has three times more children than child $B$ , we make sure that it will obtain $2^3 =8$ times more slack. The justification is that we want to give to each tree slack proportional to the probability that, of all trees, it would be the one to contain the GHOSTDAG selected chain. We will see in the next chapter that this probability decays exponentially with the size difference between the chains, similar to chains.

You might ask yourselves why even bother defining this intricate process. After all, isn’t the indexing built dynamically? And doesn’t that just mean we are adding one block at a time? The answer is that every once in a while we will have to reindex, and some of these reindexations are deep and require permeating the new intervals up the tree. That's when the full process becomes relevant.

Deeper Look*

While the description above is sufficient for a conceptual understanding of reachability, it would be instructive to specify it more explicitly and work through a couple of examples.

Say we are now at block $B$ , which has $k$ children and an interval of length $N$ (that is, $R_B-L_B+1 = N$ ). For simplicity, let us just assume that the interval is $\left[1\ N\right]$ (if it isn’t we just need to right shift all of the intervals by $R_B - 1$ ).

For each of the children $B_1,\ldots,B_k$ , set $n_i = |B_i.\overline{FutureTree}|$ . We have that $B$ has a slack of $S = N-(n_1 + \ldots + n_k +1) = N - |B.\overline{FutureTree}|\text{.}$ The child node $B_i$ will receive $n_i$ intervals to accommodate its own tree, as well as a cut of the slack proportional to $2^{n_i}$ . That is, if we set $p_i = 2^{n_i} / \sum_{j=1}^k 2^{n_j}$ and give $B_i$ an interval of size $I_i = n_i + S\cdot p_i$

Remark: If $S\cdot p_i$ is not an integer we round it somehow, it doesn’t really matter how as long as we are careful that no index is contained in the intervals of two children, moving a singe index around will make no difference.

(exercise: Can we set $n_i = |B_i.FutureTree|$ instead of $n_i = |B_i.\overline{FutureTree}|$ ?)

(exercise: verify that $\sum_{j=1}^n I_j = N-1$ )

To wrap our head around the details, lets work through two examples.

Say we want to endow this tree with indices from an interval of size $N=1$ .

We have that $n_1 = |B_1.\overline{TreeFuture}| = 2$ and $n_2 = 1$ , so the total slack is $S=1000 - (2+1+1) = 996$ .

We have that $2^{n_1} = 2^2=4$ and $2^{n_2} = 2$ so that $\sum_{j=1}^k 2^{n_j} = 2 + 4 = 6$ . Hence $p_1 = 2^{n_1}/6 = 2/3$ and $p_2 = 2^{n_2}/6 = 1/3$ .

So the first interval would have the devilish size $I_1 = n_1 + S\cdot p_1 = 2 + 996\cdot{2}/{3} = 666$ and the second interval would be of size $I_2 = n_2 + S\cdot p_2 = 1 + 996\cdot{1}/{3} = 333\text{.}$ One can easily check that $I_1 + I_2 = 999 = N-1$ as expected.

We set $L_{B_1} = 1, R_{B_1} = I_1, L_{B_2} = R_{B_1} + 1, R_{B_2} = L_{B_2} + I_2 - 1$ to obtain the following indexation:

We now repeat the process for the children. Since $B_2$ has no children we know in advance that we can skip it, so let us take the liberty to assume $B_2$ was processed first and have now moved on to $B_1$ . This places us at this situation:

This is a trivial case, but we should verify at least once that our algorithm produces the expected solution even for trivial cases.

So we have that $N=666$ , and that $n_1 = 1$ , so $S = 664$ and clearly $p_1 = 1$ . Hence we furnish $B_1$ with an interval of size $n_1 + S\cdot p_1 = 1 + 664\cdot 1 = 665$ , and we end up with the desired indexation:

For a bit more intricate example we will endow $N = 10006$ indices on the following tree:

We have $n_1 = 3$ , $n_2 = 4$ and $n_3 = 1$ so $S = 10006 - (3+4+1+1) = 9997$ . We have $2^{n_1} + 2^{n_2} + 2^{n_3} = 8 + 16 + 2 = 26$ so $p_1 = 8/26 = 4/13$ , $p_2 = 16/26 = 8/13$ and $p_3 = 1/13$ . So
$I_1 = n_1 + S\cdot p_1 = 3 + 9997\cdot \frac{4}{13} = 3079\text{,}$
$I_2 = n_2 + S\cdot p_2 = 4 + 9997\cdot \frac{8}{13} = 6156\text{,}$
$I_3 = n_3 + S\cdot p_3 = 1 + 9997\cdot \frac{1}{13} = 770\text{.}$

We can of course check that $I_1 + I_2 + I_3 = 3079 + 6156 + 770 = 10005 = N-1$ as expected.

Applying these intervals we get the following indexation:

We now progress to process $B_1$ . Since there is only one child can quickly deduce that it gets the entire interval except the last index, and move on to $B_2$ . This puts us in this situation:

We have $N = 6156$ , $n_1 = 1$ and $n_2 = 2$ so $S = 6152$ . We already computed in the previous example that for two children with subtrees of sizes $1$ and $2$ we have $p_1 = 2/3$ and $p_2 = 1/3$ . We try to compute the first interval to find that $I_1 = 2 + 6152\cdot\frac{2}{3} = 2050\frac{2}{3}$ . To deal with the fractional integer we can decide to round up, and to round down for the next interval, effectively saying “OK, $I_1$ and $I_2$ are fighting over this last index, so I’ll give it to $I_1$ ). Hence, we set $I_1 = 2 + 6152\cdot\frac{2}{3} = 2051$ . A similar computation will find that $I_2 = 4104\frac{1}{3}$ which we will round to $4104$ , ending up with the following indexation:

We can now move to $B$ ’s next child, after which, we will traverse their children’s children and so on.

BFS Vs. DFS

The two most common methods to traverse a tree (or graphs in general) are by a breadth-first-search (BFS) or a depth-first-search (DFS). In BFS, you start from the root, and then traverse all if its children, and then traverse all of their children, and so on. In DFS, you start from the root, go to one of its children, and then go to one of their children and so on, until you hit a leaf, and then you backtrack.

This is best illustrated with a drawing:

Usually, depth first search is easier to implement and maintain, making it preferable when there are no other considerations.

However, if you followed the examples, you might have noticed that children were processed breadth first, and not depth first.

The reason for that is that in our context, height of the tree might be too large for a depth first search. The thing is that when you do a DFS, you have to keep a stack of all the vertices you already started but still haven’t finished processing. This stack tells you where you should return after you finished processing all children. Lets add stacks to the example above to see what this looks like:

As you can see, the stack size at each point is proportional to the height of this point. Hence, traversing even moderately high trees (say, with height in the thousands) could cause a stack overflow and crash the node.

Remark: You might think to yourself that you have implemented DFS a million times using recursion, and you never had to maintain any stack. The reason you didn’t have to maintain a stack is that using recursion makes the compiler maintain it for you. Whenever a function is called, a new function frame is created containing instructions such as where to return when the function is done. This frame is not removed from the stack until the function returns. So despite all of this happening under the hood, the deeper you go, the more function frames are accumulated on the stack, and an overflow will quickly follow.

A First Working Solution

We are positioned to create a first solution to reachability that actually works. In the next post, we will point out some problems with this solution and improve it to respond to them, leading us to the final algorithm.

We start by applying the interval $\left[1\ 2^{64}\right]$ to the Genesis block, and propagating the intervals (using the exponential slack distribution we described above, though it doesn’t really matter right now).

To explain how we handle new incoming blocks, we need to address a few cases.

The first case is the simplest one, where the new block points at a block that has slack (so it is necessarily a leaf):

Note that since $B$ is a leaf, saying that “ $B$ has slack” is the same as saying that $L_B<R_B$ , hence the interval $\left[L_B (R_B-1)\right]$ makes sense.

Now we note a useful fact: a block $B$ satisfies that $R_B - L_B + 1 > |B.\overline{TreeFuture}|$ if and only if $B$ has a child with slack. Why? Well, because $R_B - L_B + 1$ is just the number of indices in $B$ ’s interval, that is, the number of children at can accommodate. Obviously, if $B$ can accommodate more children than it does accommodate, then somewhere in the tree above $B$ there is some free slack.

Why is this useful? Say we want to insert a new child to some block $C$ , and $C$ doesn’t have slack. We can do the following: start moving from $C$ to $B$ until you hit the first block that has some slack above it, and the reindex from there. This will make sure that you do the shallowest reindex possible.

For example, if we run into this situation:

we will traverse our way down to $\left[1001\ 2000\right]$ and start reindexing from there.

You are encouraged to verify that the result would be the following:

The last question remaining is what happens if we reach all the way down to the root and still have no slack? Well, that would mean that since you synced your node $2^{64}$ blocks have been created, which means you have been staring at the screen for literal billions of years, so you might want to consider touching some grass.

Part 7: Reachability Queries III - Deamortizing the Complexity

Let us assume for a moment that the network produces two blocks per level. That is, the reachability tree looks like this:

Note that this is not too different from the way the tree looks in practice most of the time: one main chain with many frizzes coming out of it. A frizzy chain.

Let us see how the solution we proposed last post handles a frizzy chain.

If we assume no reorgs of depth more than $1$ happened (that is, each incoming block pointed either at the selected tip or its selected parent), the indexation of the tree above would look like this.

And we see that after only measly $64$ steps we will find ourselves in this situation:

(In the exercises, you will show that even if we allow reorgs up to depth $d$ , we will arrive at a similar situation within $64d$ steps).

How could this happen? How could we run out of slack so fast? The answer is in the frizzes! You see, when we just received the first two blocks after genesis, we could know which would eventually remain on the selected chain, so we divided the slack equally. It turns out that all future blocks had the same of the two blocks on their selected chain. And the second block? Despite having half of the entire slack, never had to accommodate a single block. Half the slack wasted just like that.

Every time this happens, we lose another half of our slack to the frizz! Those who know exponential (for example, by reading about them in the appendix) understand that even though $2^64$ is huge, halving it many times will drain it very quickly.

So here we are. What will happen next?

When the next block arrives, we find ourselves in this situation:

It is not hard to see that $\left[1\ 8\right]$ is the most shallow block that has slack for the new kid, and after reindexation we might find ourselves here:

The next incoming block does not pose a lot of problem:

But the block after that:

Now we see that the shallowest block that has slack is $\left[1\ 16\right]$ .

The pattern continues this way: after the reorg we will have enough room to accommodate six more blocks, but the seventh block will force a reorg down to $\left[1\ 32\right]$ . After this reorg we’ll have room for fourteen more blocks, after which we will have to reindex all the way down to $\left[1\ 64\right]$ .

If we keep going we will see that after the $n$ th reindex we can accommodate $2^{n-1} -2$ blocks and then we will have to reindex to $\left[1\ 2^n\right]$ . How deep is this reindex? Since any two new blocks increase the height by $1$ , we get that the depth of the reindex is about half of the blocks created, though half as often. Yet, at 10 BPS, it takes about three minutes to create 2000 blocks. So at some point will will see a depth 500 reindex. Three minutes later we’ll see a depth 1000 reindex, six minutes later we’ll see a depth 2000 reindex, 12 minutes later a depth 4000 reindex and so on.

Now it is true that in practice the graph has more varied and wide structure, making these spikes more smooth (yet more often!), and still, this is far from desirable for a node that needs to run on very restricted hardware. The purpose of this post is to smooth out these spikes.

Remark: In complexity theoretic terms, the current algorithm has great amortized complexity: it behaves really well when averaged over periods of, say, an hour. But it has unbounded worst-case complexity, and the bad cases actually happen in practice. Our goal is to deamortize the algorithm, however, we cannot make it have bounded worst-case complexity. One can show that no matter how you distribute the indexes, there will always be some inputs, some way to feed the algorithm blocks, that will require a linear depth ordering. So what are we actually achieving? Recall that I said that this solution is not general but appeals to the security properties of GHOSTDAG. This is where this magic happens. While we can’t make spikes impossible, we can make them highly unlikely assuming the network has honest majority. This provides a huge, quantifiable improvement over the current situation, where spikes are a certainty even if all miners are honest.

Shallow Reindex Root

The first observation is that scenarios like the example above aren’t created by pointing into deep blocks. They always start from the tips. In real honest situations that remains approximately true: while blocks will appear that do not point at a tip, it is extremely unlikely that an honest block will point, say, 100 blocks below the tips.

The idea is that instead of placing the infinite interval $\left[1\ 2^{64}\right]$ on the genesis, we place it on a sufficiently deep block, we call that block the reindex root. Every time this blocks become too deep, we move the reindex root to a new, shallower block. Advancing the reindex root requires a bit of work, but it will prevent huge spikes, as it bounds the depth of a reindex.

More precisely, say that the reindex root is $R$ , and that we just added a new block $B$ that points at the selected tip $B’$ , so $B$ becomes the selected tip. If we find that the distance from $B$ to $R$ is $101$ (note that we can’t “skip” $101$ , at each step we only add one block so the distance from the reindex root to the selected tip can grow by at most $1$ ), we move the reindex root to $R^{+}$ , the only child of $R$ that is in $B$ ’s selected chain:

What is the problem with that? Right, that now we can only answer reachability queries for blocks in $R.TreeFuture$ .

We overcome this by leaving a little slack to the rest of the tree. How little? In Kaspa’s case the number would be $4000 \approx 2^{12}$ , but we will just call it $L$ for leftovers. So if $M$ is the number of times the reindex root was incremented so far. Besides leaving the old $R$ with $L$ slack, we also leave it with enough indexes for itself and all its children besides $R$ . which we will call $R.SideTree$ .

To better follow the details, let me explain this illustration to you:

The top of the drawing shows the indexation after $M$ increments of $R$ . In each of these increments, we provided the old $R$ with indices for its side tree and $L$ slack. I denoted by $K$ the sum of sizes of side trees that occurred during the increments:

When incrementing from the $R$ in the top drawing to the $R$ in the bottom drawing, we relinquish enough indices for the old $R$ and its side tree, as well as $L$ slack, giving the new $R$ an interval that is smaller by $k+L$ .

One thing of note is that we still think of $\left[1\ (2^{64}-(M\cdot L+K))\right]$ as an “infinite” interval. This is easily justified if I can convince you of two things: that $2^{63}$ is large enough, and that realistically it holds that $M\cdot L + K < 2^{63}$ , assuring that $2^{64} - (M\cdot L + K) > 2^{63}$ .

The former is pretty obvious. We said that in one thousand BPS it will take the network half a billion years to create $2^{64}$ blocks. Since $2^{63}$ is half of that, it will only take a quarter billion years.

For the latter, say that $N$ blocks were created since you started your node. Then clearly $K \le N$ (as it is the sum of number of blocks in disjoint trees), and also $M\le N$ (as each increment requires at least one new block). So we get the loose-but-sufficient bound $M\cdot L + K \le N\cdot L+ N = L(N+1)$ . So to ever reach a point where $M\cdot L + K > 2^{63}$ you would first reach a point where $L(N+1) > 2^{63}$ or equivalently that $N \ge 2^{63} / L$ . Recall that we chose $L = 4000 < 2^{12}$ so we get that $N > 2^{63}/2^{12} = 2^{51}$ .

We said that in 1000BPS, it would take a quarter billion years to create $2^{63}$ blocks, so creating $2^{51}$ will take “only” one four-thousandth of that, which is a measly 65 thousand years. So, eh, if you intend to run a node for longer just config a cronjob that’ll restart it once every ten thousand years ago and you should have plenty of safety margin.

Borrowing Slack

Say that a new block $B$ was added, and it is not the new selected tip, how do we handle it?

The readily available answer is “we traverse down its selected chain until we find a block with enough slack, and reindex from there”.

If $B\in R.TreeFuture$ then, as we noted, we will find some slack before reaching $S$ , unless the node has been running for tens of thousands of years without ever being restarted. But what if it isn’t?

Let $S$ be the latest block on the selected chain with $B \in S.TreeFuture$ . Since $S$ is on the selected chain, it means that at some point it was the reindex root (at least for now, we’ll soon introduce new ways to change the reindex root), and when it passed the torch to one of its children, it was given $L$ slack for its side tree, and we are now trying to add $B$ to that very same side tree.

If less than $L$ blocks were added to the side tree since passing the torch, then we will find some slack therein. But what will happen at addition $L+1$ ? In this case we start traversing from $S$ back towards $R$ looking for the first block $T$ that has some free slack. There are three options here:

$T\in R.Past$
$T = R$
$R$ doesn’t have enough slack

We are versed enough by know to know the third option can only happen if our node has been running consecutively for so long that the sun exploded, so we disregard this case.

The first case looks like this:

What we do in this case is to take $T$ ’s slack, move it to $S$ , and then propagate it back up to $T$ using the exponential slack distribution we introduced in the previous chapter.

Note that since we only moved around slack related to $T$ ’s side tree, this has no bearings on $R$ ’s indexation. This is a much lighter operation than reindexing the entire thing. The mechanics of propagating the indices correctly while keeping all intervals contiguous requires some care, but is not essentially difficult. It essentially requires to handle separate handle the cases where the slack is cut from either the left or the right side of the interval. Since we always assume a block never relinquishes its largest index, both cases require a different treatment.

In the second case, the slacks of all side trees from $S$ to $R$ have been depleted:

In this case we have to back-propagate some slack from $R$ to $S$ . And since $R$ ’s indices have been tempered with, we will have to reindex from $R$ too. To justify this, I only need to convince you that this scenario is very rare, even assuming a malicious minority.

The reason for that is that every time the selected chain is extended by one block, the slack below it is increased by four thousand. Even if the average width of the DAG is $100$ , we still get that the slack grows $40$ times faster than the chain. In order to trigger a reorg by slack back-propagation, blocks whose selected chain meets the honest chain at depth more than $100$ have to be created $40$ times faster than blocks in $R.TreeFuture$ . By the time it would take to deplete a single block below $R$ , forty more blocks would appear that require depletion before triggering a full reindex.

Remark: We can restarte this in more quantitative terms. Assume for simplicity that all blocks that are not in $R.TreeFuture$ (where $R$ is the reindexing root at the time they were added) are malicious, that the slack leftover size is $L$ , and that $(1-\varepsilon)$ of the time the width of the honest DAG is at most $W$ , what hashrate fraction $\alpha$ does an adversary need to assure that deep reorgs happen at least $\varepsilon$ of the time? The adversary needs to create at least $L$ blocks for each $W$ blocks created by the network, giving us the equation $(1-\alpha)\cdot W \le \alpha\cdot L$ . Rearranging, we get the condition $\alpha \ge \frac{1}{2}\left(1+\frac{L-W}{L+W}\right)\text{.}$ We see that for any $L>W$ a majority is required. If $L \gg W$ we get that almost all hashrate is required. Given that an adversary with $\alpha>1/2$ can disrupt the network in much more sinister ways than slightly increasing the CPU requirements for running a node, this is hardly a concern.

(Exercise: further analysis)

Reindexing Below the Reindex Root

The only case that requires our attention is that where $B \notin R.TreeFuture$ but $B$ is the new selected tip.

In this scenario we finally find that the reindexing root is no longer as the selected chain! Hence, if we do not reindex, the tree that grows the fastest is also the tree that has to make do with a measly slack of $L$ . If not handled gracefully (e.g. if upon reorging the reindex root out of the selected chain the node throws an exception and halts), this scenario could easily crash the entire network.

Now, one can argue that $100$ blocks reorgs are extremely rare, but:

In 10BPS, a 51% adversary creates on average at least $5$ blocks per second, so it will take not much longer than twenty seconds to garner such an attack. A network that can be compromised by twenty seconds of a 51% attack is not secure.
You don’t even really need 51% for this attack. Even with a much smaller fraction there is some chance you manage to create a $100$ blocks deep reorg. Hence, a smaller attacker can keep trying until they succeed in a stochastic attack. In the context of blockChains, we discussed stochastic attacks and analyzed how they relate the desired depth and the fraction of the adversary to the time it would take for an attack to succeed. The bottom line is that small adversaries can successfully perform a shallow attack in a somewhat reasonable time frame. However, recall that the adversary can’t control what attempt will succeed, so they can’t target a specific transaction. However, if a $100$ blocks reorg is suddenly enough to crash the entire network, this will give malicious actors more incentives to attempt such attacks.
We can’t discount the possibility that, say due to some east European power targeting the underwater cables that carry the internet across continents, a several minutes split might happen.

With all of the above in mind, we really want to handle this situation nicely. The most straightforward thing to do is to simply reindex: let $A$ be the latest common ancestor of $B$ and $R$ . We then set $S$ to be the block at depth $100$ from $B$ . $S$ will be the block to inherit $R$ ’s place as the new reindex block.

Assuming the process we are about to describe provides the expected slacks to the blocks in $S.SelectedChain$ , we get that this also holds for the blocks in $A.SelectedChain$ , so all we need to do is to provide slack to the blocks going from $A$ to $S$ (not including $S$ ) and propagate them to their children (which will also include $R.TreeFuture$ ), we then fully reindex from $S$ and we’re done.

However, there is another improvement we should make. In the scenario that a 100 blocks reorg happens, it is possible that the chain will also reorg back. A computationally inferior adversary can even take advantage of the situation to cause a few rounds of back-and-forth. The GHOSTDAG protocol has liveness, so any balancing an attack will be short lived, but we still don’t want it to be leveraged to cause many nodes to suffocate from constant deep reorgs.

But we actually don’t need to. Recall that the blocks outside $R.TreeFuture$ have slack we can use. So what we do is to set some threshold $T$ and withhold reindexing until one of the two happens: the selected tip is heavier than the best tip in $R.TreeFuture$ by at least $T$ blocks, or a new block was added to the future of the selected tip and there’s no slack for it. In some scenarios, it is possible that the first reorg will happen pretty fast, because it could be that by the time the reorg happened, most of the relevant slack was depleted. However, after that first reorg $L$ is provided to the side tree and if $T<L$ we know that it would take at least $T$ more blocks before the next reorg, thwarting the balance attack.

It might seem weird that we use a threshold, given my strong position against using thresholds when designing consensus protocols. The reason this is not a problem is because the indexing is not in consensus. Thresholds are bad for consensus because nodes can disagree on whether the threshold was reached, causing them to reach different states. The reachability indexation, however, does not affect the state at all. We don’t care if different nodes arrive at different indexations.

Optimized Processing

We conclude the topic of reachability by pointing out an important subtlety in how the reachability tree is constructed. The ulterior motive of this section is to be a teaser for our next topic: implementing the UTXO model over GHOSTDAG.

Throughout the posts, we assumed that the tree parent of each block is its selected parent. It not always is. The tree parent of each block is the block in its past with the largest blue past. It the codebase this is referred to as “headers selected tip”, but I’ll refer to it as the bluest parent.

If you find the distinction confusing this only means you’ve been following along! The way we presented GHOSTDAG, the bluest parent is the selected parent.

Well, here’s the thing. Some of the verifications steps for checking whether a block is valid are very heavy. In particular, anything that requires processing the block data. To truly verify that a block is valid, we need to compute the UTXO set from its point of view, so we could check if all the transactions therein are valid. Not only do we not want to do that for all blocks, but in some cases we want to validate blocks using the headers alone.

That’s why block verification is actually divided into two parts: consensus and data. The consensus checks are everything that can be verified from the headers alone (hence the name “headers selected tips”). Any block that passes the consensus level validation will be included in the DAG. We only do the data level checks for selected chain blocks. If, when trying to add a block, we find that it fails the data checks, we put it in an intermediate status called “disqualified from chain”. The implementation of the UTXO model over Kaspa was built around the idea that data validation should only be required for chain blocks, and that is a big part of the next topic.

In the scenario where the bluest parent of some block happens to be disqualified from chain, it would not be the same as its selected parent.

Part 7: UTXO Over GHOSTDAG

(create cryptographic appendix where I introduce digital signatures, random oracles and hashes, and Merkle trees.

The PHANTOM Protocol (Understanding GHOSTDAG Chapter 1E, post 1)