Merkle Prover
This section explains the prover of the Merkle commitment scheme, focusing on how columns are committed to compute a Merkle root and how the Merkle tree layers are constructed, as well as how to generate Merkle proofs (decommitments).
Stwo represents the computation trace using multiple columns of different sizes. Rather than committing to each column with a separate Merkle tree, Stwo uses a modified Merkle tree structure in which all columns, regardless of size, are committed to using a single Merkle tree. We describe the structure of this tree in this section.
MerkleProver Structure
The `MerkleProver` struct represents a prover for a Merkle commitment scheme. It stores all layers of the Merkle tree, where each layer contains the hash values at that level.
```rust
pub struct MerkleProver<B: MerkleOps<H>, H: MerkleHasher> {
    /// Layers of the Merkle tree.
    /// The first layer is the root layer.
    /// The last layer is the largest layer.
    /// See [MerkleOps::commit_on_layer] for more details.
    pub layers: Vec<Col<B, H::Hash>>,
}
```
The Commit Process
The core of the Merkle prover is the `commit` function, which takes a set of columns as input and outputs a `MerkleProver` containing all layers of the Merkle tree. Each column must have a length that is a power of 2.

Below is the complete implementation of the `commit` function:
```rust
pub fn commit(columns: Vec<&Col<B, BaseField>>) -> Self {
    let _span = span!(Level::TRACE, "Merkle", class = "MerkleCommitment").entered();
    if columns.is_empty() {
        return Self {
            layers: vec![B::commit_on_layer(0, None, &[])],
        };
    }
    let columns = &mut columns
        .into_iter()
        .sorted_by_key(|c| Reverse(c.len()))
        .peekable();
    let mut layers: Vec<Col<B, H::Hash>> = Vec::new();

    let max_log_size = columns.peek().unwrap().len().ilog2();
    for log_size in (0..=max_log_size).rev() {
        // Take columns of the current log_size.
        let layer_columns = columns
            .peek_take_while(|column| column.len().ilog2() == log_size)
            .collect_vec();

        layers.push(B::commit_on_layer(log_size, layers.last(), &layer_columns));
    }
    layers.reverse();
    Self { layers }
}
```
Let's walk through the function step by step:

- **Sort Columns by Length:** Columns are sorted in descending order of length, so the columns with the most elements come first. This ensures that the largest columns are committed first.
- **Layer-by-Layer Commitment:** For each layer (from largest to smallest), the function collects all columns of the current size and computes the hashes for that layer using the `commit_on_layer` function.
  - For the largest layer, there are no previous layer hashes, so each hash is computed directly from the column values.
  - For subsequent layers, each hash is computed from the previous layer's hashes and the current layer's column values.
- **Reverse Layers:** After all layers are computed, the list of layers is reversed so that the root layer is at the beginning.
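As a usage sketch, committing to a few columns with the CPU backend and the Blake2s hasher looks roughly as follows. The module paths and the `root` accessor are taken from recent versions of `stwo-prover` and may differ slightly between versions:

```rust
use stwo_prover::core::backend::cpu::CpuBackend;
use stwo_prover::core::fields::m31::BaseField;
use stwo_prover::core::vcs::blake2_merkle::Blake2sMerkleHasher;
use stwo_prover::core::vcs::prover::MerkleProver;

fn main() {
    // Two columns of length 4 and one of length 2 (lengths must be powers of 2).
    let col1: Vec<BaseField> = (0..4).map(BaseField::from_u32_unchecked).collect();
    let col2: Vec<BaseField> = (4..8).map(BaseField::from_u32_unchecked).collect();
    let col3: Vec<BaseField> = (8..10).map(BaseField::from_u32_unchecked).collect();

    // Commit to all columns in a single Merkle tree.
    let prover =
        MerkleProver::<CpuBackend, Blake2sMerkleHasher>::commit(vec![&col1, &col2, &col3]);

    // layers[0] is the root layer (a single hash); the last layer is the largest.
    assert_eq!(prover.layers.len(), 3);
    assert_eq!(prover.layers[0].len(), 1);
    println!("root = {:?}", prover.root());
}
```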
Example: Commit Process
Suppose the input column data consists of three columns, as shown in Figure 1: two columns of length 4, which we denote $col_1$ and $col_2$, and one column of length 2, denoted $col_3$.

For this example, the columns are already in the sorted order $col_1, col_2, col_3$ (from longest to shortest). We will now compute the hashes stored at each layer.
- **First Layer (Leaves):** The hashes are computed directly from the values of the length-4 columns: $h_{2,i} = H(col_1[i], col_2[i])$ for $i = 0, 1, 2, 3$.
- **Second Layer:** The next layer uses the previous hashes and the values from $col_3$: $h_{1,j} = H(h_{2,2j}, h_{2,2j+1}, col_3[j])$ for $j = 0, 1$.
- **Root:** The root is computed as $h_{0,0} = H(h_{1,0}, h_{1,1})$.
The resulting Merkle tree therefore has three layers: the four leaf hashes $h_{2,0}, \dots, h_{2,3}$, the two intermediate hashes $h_{1,0}, h_{1,1}$, and the root.
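To see how the mixed-size layers fit together, here is a small self-contained sketch that mirrors the three hashing steps above. It uses the standard library's `DefaultHasher` as a stand-in for the real Merkle hasher (Stwo uses e.g. Blake2s), so the hash values are illustrative only:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hash: combines an optional pair of child hashes with column values.
fn hash_node(children: Option<(u64, u64)>, values: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    children.hash(&mut h);
    values.hash(&mut h);
    h.finish()
}

fn main() {
    // Two length-4 columns and one length-2 column, as in the example.
    let col1 = [1u32, 2, 3, 4];
    let col2 = [5u32, 6, 7, 8];
    let col3 = [9u32, 10];

    // First layer (log size 2): hash the length-4 columns element-wise.
    let layer2: Vec<u64> =
        (0..4).map(|i| hash_node(None, &[col1[i], col2[i]])).collect();

    // Second layer (log size 1): combine child hashes with col3's values.
    let layer1: Vec<u64> = (0..2)
        .map(|j| hash_node(Some((layer2[2 * j], layer2[2 * j + 1])), &[col3[j]]))
        .collect();

    // Root (log size 0): hash the two remaining children; no columns at this size.
    let root = hash_node(Some((layer1[0], layer1[1])), &[]);
    println!("root = {root}");
}
```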
The Decommit Process
The decommitment process enables the prover to generate a Merkle proof for a set of queried indices, allowing the verifier to check that specific elements are included in the committed Merkle tree.
The output is a `MerkleDecommitment` struct, which contains the hash and column values required for the verifier to reconstruct the Merkle root at the queried positions. Its definition is shown below:
```rust
pub struct MerkleDecommitment<H: MerkleHasher> {
    /// Hash values that the verifier needs but cannot deduce from previous computations, in the
    /// order they are needed.
    pub hash_witness: Vec<H::Hash>,
    /// Column values that the verifier needs but cannot deduce from previous computations, in the
    /// order they are needed.
    /// This complements the column values that were queried. These must be supplied directly to
    /// the verifier.
    pub column_witness: Vec<BaseField>,
}
```
The `decommit` function implemented for the `MerkleProver` takes as input:

- `queries_per_log_size`: A map from log size to a vector of query indices for columns of that size.
- `columns`: The column data that was committed to in the Merkle tree.

It returns:

- A vector of queried values, ordered as they are opened (from largest to smallest layer).
- A `MerkleDecommitment` containing the hash and column witnesses needed for verification.
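For example, to query index 0 of the columns of log size 2 and index 1 of the columns of log size 1 (the query pattern used in the example below), the map can be built as follows:

```rust
use std::collections::BTreeMap;

let queries_per_log_size: BTreeMap<u32, Vec<usize>> =
    BTreeMap::from([(2, vec![0]), (1, vec![1])]);
```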
Below is the complete implementation of the `decommit` function:
```rust
pub fn decommit(
    &self,
    queries_per_log_size: &BTreeMap<u32, Vec<usize>>,
    columns: Vec<&Col<B, BaseField>>,
) -> (Vec<BaseField>, MerkleDecommitment<H>) {
    // Prepare output buffers.
    let mut queried_values = vec![];
    let mut decommitment = MerkleDecommitment::empty();

    // Sort columns by layer.
    let mut columns_by_layer = columns
        .iter()
        .sorted_by_key(|c| Reverse(c.len()))
        .peekable();

    let mut last_layer_queries = vec![];
    for layer_log_size in (0..self.layers.len() as u32).rev() {
        // Prepare write buffer for queries to the current layer. This will propagate to the
        // next layer.
        let mut layer_total_queries = vec![];

        // Each layer node is a hash of column values as well as previous layer hashes.
        // Prepare the relevant columns and previous layer hashes to read from.
        let layer_columns = columns_by_layer
            .peek_take_while(|column| column.len().ilog2() == layer_log_size)
            .collect_vec();
        let previous_layer_hashes = self.layers.get(layer_log_size as usize + 1);

        // Queries to this layer come from queried nodes in the previous layer and queried
        // columns in this one.
        let mut prev_layer_queries = last_layer_queries.into_iter().peekable();
        let mut layer_column_queries =
            option_flatten_peekable(queries_per_log_size.get(&layer_log_size));

        // Merge previous layer queries and column queries.
        while let Some(node_index) =
            next_decommitment_node(&mut prev_layer_queries, &mut layer_column_queries)
        {
            if let Some(previous_layer_hashes) = previous_layer_hashes {
                // If the left child was not computed, add it to the witness.
                if prev_layer_queries.next_if_eq(&(2 * node_index)).is_none() {
                    decommitment
                        .hash_witness
                        .push(previous_layer_hashes.at(2 * node_index));
                }
                // If the right child was not computed, add it to the witness.
                if prev_layer_queries
                    .next_if_eq(&(2 * node_index + 1))
                    .is_none()
                {
                    decommitment
                        .hash_witness
                        .push(previous_layer_hashes.at(2 * node_index + 1));
                }
            }

            // If the column values were queried, return them.
            let node_values = layer_columns.iter().map(|c| c.at(node_index));
            if layer_column_queries.next_if_eq(&node_index).is_some() {
                queried_values.extend(node_values);
            } else {
                // Otherwise, add them to the witness.
                decommitment.column_witness.extend(node_values);
            }

            layer_total_queries.push(node_index);
        }

        // Propagate queries to the next layer.
        last_layer_queries = layer_total_queries;
    }

    (queried_values, decommitment)
}
```
Let's break down the function step by step:

- **Sort Columns by Length:** As in the `commit` function, columns are sorted in descending order of length.
- **Layer-by-Layer Decommitment:** For each layer (from largest to smallest):
  - Collect all columns of the current size (`layer_columns`) and the previous layer's hashes (`previous_layer_hashes`).
  - Retrieve the queries for the current layer (`layer_column_queries`) and the previous layer's queries (`prev_layer_queries`). These two sorted query streams are merged node by node (see the sketch after this list).
  - For each node index to be decommitted in this layer:
    - Check whether the verifier can compute the child hashes of the current node; if not, add the missing child hashes to the `hash_witness` of the decommitment.
    - If the node index is queried, fetch the corresponding column values and append them to `queried_values`.
    - If it is not queried, add the column values to the `column_witness` of the decommitment.
  - The set of node indices decommitted in this layer is propagated as queries to the next layer.
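The merge of the two query streams is performed by the `next_decommitment_node` helper. A minimal sketch of such a merge is shown below, assuming both iterators yield indices in increasing order: the next node to visit is the smallest index among the parents of pending previous-layer queries and the pending column queries of the current layer.

```rust
use std::iter::Peekable;

/// Returns the next node index to decommit, or `None` when both streams are
/// exhausted. `prev_queries` yields queried node indices of the previous
/// (larger) layer, whose parents live in the current layer; `layer_queries`
/// yields column query indices in the current layer. Both must be sorted.
fn next_decommitment_node(
    prev_queries: &mut Peekable<impl Iterator<Item = usize>>,
    layer_queries: &mut Peekable<impl Iterator<Item = usize>>,
) -> Option<usize> {
    prev_queries
        .peek()
        .map(|q| *q / 2) // Parent of a queried previous-layer node.
        .into_iter()
        .chain(layer_queries.peek().copied())
        .min()
}
```

A previous-layer query at index $q$ forces its parent $\lfloor q/2 \rfloor$ to be recomputed, which is why the previous-layer indices are halved before the merge.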
Example: Decommit Process
For the column data in Figure 1, consider the query map $\{2: [0],\ 1: [1]\}$, which maps a log size to a vector of query indices for columns of that size. This corresponds to querying the following elements:

- The 0th element of the columns of size 4: $col_1[0]$ and $col_2[0]$.
- The 1st element of the column of size 2: $col_3[1]$.

Because columns of equal length are committed together, the same indices are opened together in the decommitment. For example, for the query at index 0, both $col_1[0]$ and $col_2[0]$ are opened together.
Below is a walkthrough of the main loop in the `decommit` function, showing the state of the key variables for each `layer_log_size`:
- **`layer_log_size = 2` (columns of size 4):**
  - `layer_columns`: $[col_1, col_2]$
  - `previous_layer_hashes`: `None` (first layer)
  - `prev_layer_queries`: empty
  - `layer_column_queries`: $[0]$
  - For `node_index = 0`:
    - `node_values`: $(col_1[0], col_2[0])$
    - Queried, so append to `queried_values`: $[col_1[0], col_2[0]]$
    - Add `node_index = 0` to `layer_total_queries` (to propagate to the next layer)
  - At this stage: `queried_values` $= [col_1[0], col_2[0]]$, `decommitment` is empty.

- **`layer_log_size = 1` (columns of size 2):**
  - `layer_columns`: $[col_3]$
  - `previous_layer_hashes`: $[h_{2,0}, h_{2,1}, h_{2,2}, h_{2,3}]$
  - `prev_layer_queries`: $[0]$
  - `layer_column_queries`: $[1]$
  - For `node_index = 0`:
    - The left child $h_{2,0}$ was already computed, but the right child was not, so add $h_{2,1}$ to `hash_witness` in the decommitment.
    - `node_values`: $(col_3[0])$
    - Not queried, so append to `column_witness` in the decommitment.
    - Add `node_index = 0` to `layer_total_queries`.
  - At this stage: `queried_values` $= [col_1[0], col_2[0]]$, `hash_witness` $= [h_{2,1}]$, `column_witness` $= [col_3[0]]$, `layer_total_queries` $= [0]$.
  - For `node_index = 1`:
    - Neither child was computed, so add $h_{2,2}$ and $h_{2,3}$ to `hash_witness` in the decommitment.
    - `node_values`: $(col_3[1])$
    - Queried, so append to `queried_values`.
    - Add `node_index = 1` to `layer_total_queries`.
  - At this stage: `queried_values` $= [col_1[0], col_2[0], col_3[1]]$, `hash_witness` $= [h_{2,1}, h_{2,2}, h_{2,3}]$, `column_witness` $= [col_3[0]]$, `layer_total_queries` $= [0, 1]$ (to propagate to the next layer).

- **`layer_log_size = 0` (root):**
  - `layer_columns`: empty
  - `previous_layer_hashes`: $[h_{1,0}, h_{1,1}]$
  - `prev_layer_queries`: $[0, 1]$
  - `layer_column_queries`: empty
  - For `node_index = 0`, both children $h_{1,0}$ and $h_{1,1}$ were already computed, so no values are added to `queried_values`, `hash_witness`, or `column_witness` in this layer.

Final output:

- `queried_values`: $[col_1[0], col_2[0], col_3[1]]$
- `decommitment`:
  - `hash_witness`: $[h_{2,1}, h_{2,2}, h_{2,3}]$
  - `column_witness`: $[col_3[0]]$
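Putting it all together, the walkthrough above corresponds to the following end-to-end sketch (under the same module-path assumptions as the commit example earlier; the concrete column values are arbitrary):

```rust
use std::collections::BTreeMap;

use stwo_prover::core::backend::cpu::CpuBackend;
use stwo_prover::core::fields::m31::BaseField;
use stwo_prover::core::vcs::blake2_merkle::Blake2sMerkleHasher;
use stwo_prover::core::vcs::prover::MerkleProver;

fn main() {
    let col1: Vec<BaseField> = (0..4).map(BaseField::from_u32_unchecked).collect();
    let col2: Vec<BaseField> = (4..8).map(BaseField::from_u32_unchecked).collect();
    let col3: Vec<BaseField> = (8..10).map(BaseField::from_u32_unchecked).collect();
    let columns = vec![&col1, &col2, &col3];

    let prover = MerkleProver::<CpuBackend, Blake2sMerkleHasher>::commit(columns.clone());

    // Query index 0 at log size 2 and index 1 at log size 1, as in the walkthrough.
    let queries = BTreeMap::from([(2, vec![0]), (1, vec![1])]);
    let (queried_values, decommitment) = prover.decommit(&queries, columns);

    // col1[0], col2[0], col3[1], in opening order.
    assert_eq!(queried_values.len(), 3);
    // h_{2,1}, h_{2,2}, h_{2,3}.
    assert_eq!(decommitment.hash_witness.len(), 3);
    // col3[0].
    assert_eq!(decommitment.column_witness.len(), 1);
}
```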
In the next section, we describe how the verifier uses the Merkle decommitment to check the inclusion of the queried values.