A Merkle tree is a binary tree that makes it possible to efficiently and securely verify the contents of large data structures. The concept of this tree was proposed and patented in 1982 by Ralph Merkle, an American cryptographer.
The length of the hash remains the same for any amount of input data. Leaf vertices contain hashes from blocks of data, whereas inner vertices include hashes from the addition of values in two child vertices. In turn, the root vertex comprises the hash of the entire data set.
Initially, there is a set of data that is not related to each other in any way. For each block of data (in the context of a transaction blockchain), we need to compute a hash of the block. The number of data blocks should be
2^N, because as the tree is built, the previous blocks are hashed in pairs at each iteration. A pair of hashes is then grouped together and a new hash is created based on them. Hashing will continue as long as there is something to group at each level, and it will stop when there is only one pair left from which the root hash – the Merkle root – is obtained.
H(1-2) = keccak256(H(1) + H(2))
H(3-4) = keccak256(H(3) + H(4))
H(5-6) = keccak256(H(5) + H(6))
H(7-8) = keccak256(H(7) + H(8))
H(1-2-3-4) = keccak256(H(1-2) + H(3-4))
H(5-6-7-8) = keccak256(H(5-6) + H(7-8))
H(1-2-3-4-5-6-7-8) = keccak256(H(1-2-3-4) + H(5-6-7-8))
Once we have the root, we can validate the data. If the root hash matches the original hash, everything is fine, the data is valid. It is also possible to efficiently check whether certain data is present in the tree structure.
Let’s say we want to check that the
H2 hash is present in the tree. The proof will be an array of data:
H5-6-7-8. Thanks to these hashes, we can reconstruct the root hash:
H(1-2-3-4-5-6-7-8) = keccak256(keccak256(keccak256(H(1) + H(2)) + H(3-4)) + H(5-6-7-8))
Then we compare the resulting hash with the original one. If they match, it means that the hash
H2 is present in the tree.
Let’s now see how to implement a Merkle tree and data validation in Solidity.
For simplicity, we will write the array of transactions directly to the blockchain, as well as the array of hashes. Since we will be using the built-in
keccak256 function, which returns a hash of 32 bytes, the datatype of the hashes array will be
bytes32. The array will also have a dynamic length. In general, the length of the hashes array will not be the same as the length of the transactions array, because it will hold the hashes produced. We could calculate the length in advance, but for the sake of code simplification we won’t. If you like, you can make each of these arrays
public and read the hashes of the transactions.
We will produce
hashes in the constructor. The first step is to count the hashes for blocks with
transactions in the loop. Then, at each level of the tree, the number of hashes is reduced by a factor of 2. Since we will be storing the hashes in the hashes array, the index offsets must be stored separately. The iterator in the inner loop is incremented by 2 because we will be taking a couple of elements when calculating and adding hashes. In
abi.encodePacked this time we will add 2 hashes each to which we will set the indices correctly: for the left element – the value of the current shift offset + the index at the current tree level, and for the right element – the same + 1.
So, we have built the tree, but the interest consists in validating the data. To do this, let’s write a
validateTransaction function that takes a transaction and its index. Unfortunately, Solidity does not have a
find method for arrays, so it is easier to just pass the transaction index.
First, let’s calculate the hash of the transaction and create an array of proofs. We find the number of proofs depending on the length of the transaction array and set the length of
proofsIndexes. After it, at each level of the tree, we find the necessary element, depending on the index, taking the right or left element, bearing in mind the offsets in the
hashes. Then we go through the array of proof indices, checking the index for parity and calculating the hash depending on whether the element is left or right in the pair. Finally, we compare the resulting hash with the root hash and return the result.
There are several applications of a Merkle tree in blockchain.
- Transaction storage: In Bitcoin, for example, a transaction block stores only the merkle_root of its transactions. This reduces the size of the block and the load on the network. After accumulating a certain number of transactions, it is possible to delete old transactions to save space, but the blocks themselves remain unchanged because they only store the root of the tree.
- Mining: A bitcoin block consists of two parts – the block header and the list of transactions. To avoid having to recompute the transaction hashes each time, they are lined up in a Merkle tree, after which the block header is hashed rather than the entire block, and a random nonce number is taken. When the block is sent to other nodes, the root of the transaction list is calculated, and if it does not match the root in the header, the block is rejected.
- Verification: The essence of verification is auditing without having to download the entire blockchain. The process is greatly simplified and also allows the existence of data to be confirmed without disclosing information about the data, which can be useful for verifying sensitive information.
Without Merkle trees, blockchains would be too cumbersome, and Merkle proofs allow for cost-effective verification of data authenticity.
Merkle trees are actively used in blockchains (e.g., Bitcoin, Ethereum) to store transactions efficiently, but this is not the only application. For example, after the collapse of the FTX exchange, many exchanges implemented a Proof-of-Reserve mechanism, which utilizes a Merkle tree.