1Compact Ranges
2==============
3
4This document introduces **compact ranges**, a mental model and technique for
5reasoning about Merkle trees[^1] and proofs. We present the definition, the
6properties, and the applications of this technique.
7
8[^1]: In this doc, by Merkle trees we mean a subclass known as "history trees"
9introduced by Crosby and Wallach in "Efficient Data Structures for
10Tamper-Evident Logging"
11[paper](https://static.usenix.org/event/sec09/tech/full_papers/crosby.pdf).
12
13## Definition
14
15We call a tree node **perfect** if the subtree that it roots is a perfect
16binary tree. For example, in the picture below nodes `3`, `1.2`, `3.1` and
17`4.0` are perfect. A perfect node at level `H` has exactly `2^H` leaves in its
18subtree, and we say that it *covers* these leaves. Correspondingly, we call all
19the other nodes, such as `3.2` and `5.0`, *ephemeral*. Perfect nodes are
20immutable, i.e. the corresponding subtree contents and hash don't change when
21new leaves are appended to the Merkle tree. Ephemeral nodes, on the other hand,
22are modified each time a new leaf is added, until they become perfect.
23
24For a range `[L, R)` of leaves in a Merkle tree, a **compact range** is the
25minimal set of perfect nodes that cover these, and only these, leaves. For
26example, in the picture below, the range `[2, 9)` is covered by perfect nodes
27`[1.1, 2.1, 8]`, and the range `[12, 16)` is covered by a single perfect node
28`2.3`.
29
30<p align="center">
31 <img src="images/compact_ranges.svg" width="60%">
32</p>
33
34Note that, in the picture above, range `[0, 21)` could be covered by a single
35node `5.0`. However, we only use the perfect nodes for a compact range, so it
36rather consists of `[4.0, 2.4, 20]`. The idea is that nodes of the compact
37range are immutable, regardless of the tree size. For simplicity, when we talk
38about compact ranges, we can assume that the ephemeral nodes don’t exist.
39
40Compact ranges have many useful properties, some of which are elaborated in
41sections below. The basic property is that the number of nodes in a compact
42range `[L, R)` is `O(log(R-L))`, or `O(log N)` more generally. A compact range
43is always unique, and its shape is determined using a few bitwise operations on
44`L` and `R`.
45
46## Merging Compact Ranges
47
48The core property that makes compact ranges widely usable is that they are
49“mergeable”. Two compact ranges, `[L, M)` and `[M, R)`, can be efficiently
50merged into an `[L, R)` range. Consider the picture below for an intuitive
51understanding of how it works.
52
53<p align="center">
54 <img src="images/compact_ranges_merge.svg" width="60%">
55</p>
56
57Given two compact ranges, `[2, 9)` and `[9, 16)`, each represented by a set of
58node hashes (three green and three cyan nodes correspondingly), we “merge” two
59sibling nodes by computing their parent’s hash any time they are both present
60in the set of nodes. This process repeats until there are no siblings in the
61set. As a result, we get hashes of nodes `[1.1, 2.1, 3.1]` which, as it turns
62out, represent a compact range of `[2, 16)`.
63
64Note that, when merging two compact ranges, the set of “new” nodes (marked in
65yellow) that are generated as a side effect, forms a partial path towards the
66root of the tree. It can be proven that this is always the case, which is a
67convenient property for implementations.
68
69Merging two compact ranges can be implemented in `O(log(R-L))`, or more
70generally `O(log N)` time. This follows from the observation in the paragraph
71above, and the fact that the size of the resulting `[L, R)` compact range is
72limited by the same estimate.
73
74## Merkle Tree Proofs
75
76A compact range `[L, R)`, if represented by the cryptographic hashes of the
77corresponding nodes, can be considered a commitment to the contents of the
78leaves in this range. The ability to merge compact ranges is effectively the
79ability to merge commitments.
80
81### Root Hash
82
83For a Merkle tree with `N` leaves, the compact range `[0, N)` represents a
84succinct state of the entire tree.
85
86Often applications further reduce the size of this state down to a single hash.
87For example, in a tree with 21 leaves, as in the picutures above, this would be
88a hash of the ephemeral node `5.0`. We refer to this as the **root hash**.
89
90Merkle tree proofs verification uses the root hash (directly or indirectly) as
91the trust anchor. Usually the root hash is cryptographically signed and
92committed to by a server.
93
94### Inclusion Proofs Revisited
95
96An **inclusion proof** helps a client to verify that, in a Merkle tree of the
97given state / root hash, a certain leaf position `i` matches the given value.
98
99Intuitively, such a proof is the information that the client combines with the
100leaf hash in order to compute the root hash, which is then compared with the
101trusted one. The root hashes should match iff the leaf hash is correct (except
102a negligible probability of hash collisions).
103
104More specifically, for a leaf at index `i`, the server can give the compact
105ranges `[0, i)` and `[i+1, N)`, which the client can verify by merging with a
106middle range `[i, i+1)` formed simply as the hash of this leaf. The result
107will be a compact range `[0, N)`, which can be compared with the trusted root
108hash.
109
110<p align="center">
111 <img src="images/inclusion_proof.svg" width="60%">
112</p>
113
114In the example above, an inclusion proof for leaf `6` consists of compact
115ranges `[0, 6)` and `[7, 16)`.
116
117Note that the same nodes `[3.1, 2.0, 1.2, 7]` can be viewed as the siblings of
118the path going from the root to the leaf. This is a classic "vertical" way of
119thinking about Merkle tree proofs. It is used, for example, in [RFC
1206962](https://datatracker.ietf.org/doc/html/rfc6962#section-2.1).
121
122### Arbitrary Inclusion Proofs
123
124In the previous section we established that an inclusion proof can be
125decomposed into two compact ranges at both sides from the leaf in question.
126Note that these compact ranges represent the "complementary" part of the Merkle
127tree, i.e. the entire range minus the leaf. We can generalize this observation:
128an inclusion proof for an **arbitrary subset** of leaves is a commitment to its
129complementary part of the tree.
130
131For example, consider the case when we want to prove the authenticity of values
132within the range `[6, 13)`, as shown in the picture below.
133
134<p align="center">
135 <img src="images/inclusion_proof_range.svg" width="60%">
136</p>
137
138To do so, the server can provide two compact ranges: `[0, 6)` and `[13, 16)`.
139The client will then construct the middle compact range locally (based on the
140leaf hashes of values between `6` and `12` that they know), and merge it with
141the two boundary compact ranges. Then they compute the root hash from the
142resulting compact range, and compare it against the trusted root hash.
143
144This construction is called a **range inclusion proof**.
145
146In a more general case, to prove the inclusion of an arbitrary subset of
147entries, the server needs to provide a compact range for each of the “gaps” in
148leaves. The verifier will then construct compact ranges for each contiguous
149part of the leaves in question, and merge them with all the "gap" compact
150ranges provided by the prover.
151
152It is easy to see that a range inclusion proof takes `O(log N)` hashes of
153space. The general case, **multi-entry inclusion proof**, is less
154straightforward: depending on the number of leaves in question, and how close
155they are to each other, the proof size varies between `O(log N)` to `O(N)`. The
156multi-entry proof is always optimal though, and thus more efficient than many
157individual entry inclusion proofs which could cost `O(N log N)`.
158
159### Consistency Proofs
160
161A consistency proof (or proof of the append-only property) proves to a client
162that one tree state is the result of appending some entries to another state.
163
164<p align="center">
165 <img src="images/consistency_proof.svg" width="60%">
166</p>
167
168The definition of the consistency proof already contains a hint on how to model
169it with compact ranges. Suppose a client knows a compact range of the old tree,
170like `[0, 6)` in the picture above, and the server wants to prove that another
171state with 16 entries is an extension of the first 6 entries. The server
172provides a compact range of the appended entries `[6, 16)`. The client can then
173merge `[0, 6)` with `[6, 16)`, and compare the resulting compact range `[0,
17416)` with the advertised root hash.
175
176If the client does not have the compact range of the old tree, it can be
177provided by the server too. The classic consistency proof
178[algorithm](https://datatracker.ietf.org/doc/html/rfc6962#section-2.1.2) in RFC
1796962 doesn’t assume that the client has a mergeable commitment. So, instead of
180just compact range `[6, 16)`, it roughly[^2] consists of both the old compact range
181and the one that covers the appended entries.
182
183[^2]: In RFC 6962 the proof size has fewer nodes in most cases, e.g. the
184perfect nodes on the right border of the tree are replaced by a single
185ephemeral node.
186
187## Applications
188
189In addition to proofs, compact ranges can be used in various ways, such as
190constructing the Merkle tree in a distributed fashion.
191
192### Distributed Tree Construction
193
194Applications that operate large Merkle trees (such as [Certificate
195Transparency](https://certificate.transparency.dev/)), need to compute and/or
196store the nodes of the tree, for example, to be able to serve Merkle tree
197proofs or check the integrity of the whole data structure. It can be expensive
198or infeasible to do on a single server when the rate of updates or the volume
199of data is high.
200
201Since the Merkle tree is a highly regular recursive structure, its geometry can
202be split into pieces of a controllable size that a single server can handle.
203See the picture below for an intuition how.
204
205<p align="center">
206 <img src="images/distributed_tree.svg" width="80%">
207</p>
208
209The construction work is split across a number of "leaf" workers, each piece of
210work is based on a relatively small range of leaves. When the corresponding
211part of the tree is processed, the worker sends its compact range up to the
212coordinator. The coordinator merges the received compact ranges in order to
213construct the remaining top part of the tree.
214
215The number of tree nodes decreases exponentially from level to level. This is
216key to tuning performance of the algorithm. If it is requierd to handle a rate
217`O(T)` of processing leaves, it corresponds to only a rate of `O(T / 2^H)` at
218level `H`. By picking a reasonable batch size `B` for "leaf" workers, we allow
219the coordinator to receieve a rate of `O(T / B)` messages (containing just the
220compact ranges). A batch size of around `O(sqrt(T))` entries can help leveling
221the load on the coordinator with the load on "leaf" workers.
222
223The described approach can be used to implement a scalable tamper-evident log
224based on Merkle trees.
225
226### Updatable Proofs
227
228Logs based on Merkle trees are rarely static - usually they grow in append-only
229fashion. Proofs built for one state of the tree get outdated with later states.
230When proofs are represented using compact ranges, they can be updated by
231merging in the compact range between the previous and the next tree size.
232
233For example, an inclusion proof for range `[L, R)` in a tree of size `N` can be
234represented with a pair of compact ranges `[0, L)` and `[R, N)`, as described
235in the section on [arbitrary inclusion proofs](#arbitrary-inclusion-proofs).
236When the tree moves to a state of size `N+delta`, it is possible to update this
237inclusion proof by merging the compact range `[R, N)` with `[N, N+delta)`.
238
239Updating a proof incrementally is more efficient in terms of space / bandwidth
240than fetching an entire proof each time. For example, consistency proofs in a
241tree growing to size `N` take `O(log N)` space, and fetching them `O(N)` times
242takes `O(N log N)` in total. Incremental updates with compact ranges will total
243up as `O(N)` since each of the tree nodes is used in at most one compact range
244in the series.
245
246### Witness
247
248A witness is an entity that watches the state of the log and attests to its
249consistency. As described in the [Root Hash](#root-hash) section above, the
250tree state can be represented, for example, as a single root hash, or a compact
251range `[0, N)`.
252
253The benefit of storing a compact range is in the ability to incrementally
254update it as the tree state evolves. See the picture below.
255
256<p align="center">
257 <img src="images/witness.svg" width="80%">
258</p>
259
260Similarly to the [updatable proofs](#updatable-proofs), the accumulated space /
261bandwidth complexity of updating the state `O(N)` times is `O(N)`, as opposed
262to `O(N log N)` if the witness only stores the single root hash.
View as plain text