Introduction
Stwo is a state-of-the-art framework for creating STARK proofs that provides the following features:
- A frontend designed to be flexible, allowing you to express your own constraints
- A backend that leverages Circle STARKs over the Mersenne-31 prime field for fast prover performance
- Seamless integration with Cairo
This book will guide you through the process of creating your own constraints and proving them using Stwo, and it also provides in-depth explanations of Stwo's inner workings.
Why Use a Proof System?
At its core, a proof system can prove the validity of a statement \(C(x)=y\), where \(C\) is a representation of some logic, \(x\) is an input, and \(y\) is the output of that logic. (Assuming we are only dealing with logic that can be expressed as a computation, we will henceforth refer to this logic as a computation.) This means that we can verify the validity of a statement either by directly running the computation or by verifying a proof produced by the proof system. The second option benefits the verifier in time and space: verifying the proof can be faster than running the computation, and the proof can be smaller than the input to the statement.
This property of a proof system is often referred to as succinctness, and it is exactly why proof systems have seen wide adoption in the blockchain space, where on-chain computation is much more expensive than off-chain computation. Using a proof system, a large batch of computation that would otherwise be executed on-chain can be replaced with a proof of its execution, which is then verified on-chain. This way, proof generation can be handled off-chain on large machines, and only the proof verification needs to happen on-chain.
But there are applications of proof systems beyond blockchains. Since a proof is verifiable as well as succinct, it can also serve as auxiliary data to verify that an untrusted party's computation was done correctly. For example, when we delegate computation to an untrusted server, we can ask it to return, along with the result, a proof that the result indeed came from running a specific computation. Another example is asking a server running an ML model to prove that it ran inference on the correct model. The size of the accompanying proof and the time to verify it are negligible compared to the cost of running the computation, but we gain the guarantee that the computation was done correctly.
Another optional feature of proof systems is zero-knowledge, which means that the proof reveals nothing about the computation other than its validity. In general, the output \(y\) of the computation \(C(x)=y\) will be public (i.e. revealed to the verifier), but the input \(x\) may be split into public and private parts, so that the verifier does not learn anything about the private part. With this feature, the intermediate values computed by the prover while computing \(C(x)\) will also be hidden from the verifier.
Why Stwo?
Before we dive into why we should choose Stwo, let's define some terminology. When we talked about proof systems in the previous section, we were only referring to the part that takes a statement and outputs a proof. In reality, however, we first need to structure the statement in a way that it can be proven by the proof system. This structuring part is often referred to as the frontend, and the proof system is commonly called the backend.
With that out of the way, let's dive into some of the advantages of using Stwo.
First, Stwo is a standalone framework that provides both the frontend and the backend, handling the entire proving process. Other frameworks provide only a frontend or only a backend; that modularity has its advantages, since it lets you pick and choose a frontend or backend to your liking, but a single integrated frontend and backend reduces the complexity of the system and is easier to maintain.
In addition, Stwo's frontend structures statements in the Algebraic Intermediate Representation (AIR), a representation that is especially useful for proving repetitive statements (e.g. the CPU of a VM, which essentially repeats the same fetch-decode-execute loop over and over again).
Stwo's backend is also optimized for prover performance. This is largely due to three factors.
- It implements STARKs, or hash-based SNARKs, which boast a faster prover compared to elliptic-curve-based SNARKs like Groth16 or PLONK. This improvement comes mainly from running the majority of the computation in a small prime field (32 bits); elliptic-curve-based SNARKs, on the other hand, need to use big prime fields (e.g. 254-bit prime fields), which incur a lot of overhead since most computation does not require that many bits.
- Even amongst STARK backends, Stwo provides state-of-the-art prover performance by using the Mersenne-31 prime field (modulo \(2^{31} - 1\)), which is faster than other popular 32-bit prime fields like BabyBear (modulo \(2^{31} - 2^{27} + 1\)). We suggest going through this post for a breakdown of why this is the case.
- Finally, Stwo offers various CPU and GPU optimizations that improve prover performance, as shown in Figure 1 below. It can also be compiled to WASM, allowing for fast proving in web environments.

One of the drawbacks of STARKs is that they have a larger proof size compared to elliptic curve-based SNARKs. One way to mitigate this drawback is by batching multiple proofs together to form a single proof.
On zero-knowledge:
As of this writing, Stwo does not provide the "zero-knowledge" feature. "Zero-knowledge" here refers to the fact that the proof should not reveal any information beyond the validity of the statement. This does not hold for Stwo, which reveals to the verifier commitments to its witness values without hiding them (e.g. by adding randomness). These commitments reveal some information about the witness values, which may be used in conjunction with other information to infer the witness values.
AIR Development
This section is intended for developers who want to create custom proofs using Stwo (proofs of custom VMs, ML inference, etc.). It assumes that the reader is familiar with Rust and has some background knowledge of cryptography (e.g. finite fields). It also assumes that the reader is familiar with the concept of zero-knowledge proofs and knows what they want to create a zero-knowledge proof for, but it does not assume any firsthand experience with zero-knowledge proof systems.
All the code that appears throughout this section is available here.
Writing a Simple AIR

Welcome to the guide for writing AIRs in Stwo!
In this "Writing a Simple AIR" section, we will go through the process of writing a simple AIR from scratch. This requires some understanding of the proving lifecycle in Stwo, so we added a diagram showing a high-level overview of the whole process. As we go through each step, please note that the diagram may contain more steps than the code. This is because there are steps that are abstracted away by the Stwo implementation, but is necessary for understanding the code that we need to write when creating an AIR.
Hello World
Let's first set up a Rust project with Stwo.
$ cargo new stwo-example
We need to specify the nightly Rust compiler to use Stwo.
$ echo -e "[toolchain]\nchannel = \"nightly-2025-01-02\"" > rust-toolchain.toml
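This creates a `rust-toolchain.toml` file with the following contents:

```toml
[toolchain]
channel = "nightly-2025-01-02"
```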
Now let's edit the `Cargo.toml` file as follows:
```toml
[package]
name = "stwo-examples"
version = "0.1.0"
edition = "2021"
license = "MIT"

[dependencies]
stwo-prover = { git = "https://github.com/starkware-libs/stwo.git", rev = "92984c060b49d0db05e021883755fac0a71a2fa7" }
num-traits = "0.2.17"
itertools = "0.12.0"
rand = "0.8.5"
```
We are all set!
Writing a Spreadsheet

In order to create a proof, we first need to create a table of rows and columns. This is no different from writing integers into an Excel spreadsheet, as we can see in Figure 2.

But there is a slight caveat to consider when creating the table. Stwo implements SIMD operations to speed up the prover on the CPU, and this requires providing the table cells in chunks of 16 rows. Simply put, this is because Stwo supports 16 lanes of 32-bit integers, which means that the same instruction can run simultaneously on 16 different data elements.

Thus, for our table, we will need to create 14 dummy rows to bring the total number of rows to 16, as shown in Figure 2. For the sake of simplicity, however, we will omit the dummy rows in the diagrams of the following sections.

Given all that, let's create this table using Stwo.
```rust
use stwo_prover::core::{
    backend::{
        simd::{column::BaseColumn, m31::N_LANES},
        Column,
    },
    fields::m31::M31,
};

fn main() {
    let num_rows = N_LANES;

    let mut col_1 = BaseColumn::zeros(num_rows as usize);
    col_1.set(0, M31::from(1));
    col_1.set(1, M31::from(7));

    let mut col_2 = BaseColumn::zeros(num_rows as usize);
    col_2.set(0, M31::from(5));
    col_2.set(1, M31::from(11));
}
```
As mentioned above, we instantiate the `num_rows` of our table as `N_LANES=16` to accommodate SIMD operations. Then we create a `BaseColumn` of `N_LANES=16` rows for each column and populate the first two rows with our values, leaving the rest as dummy values.
Note that the values in the `BaseColumn` need to be of type `M31`, which refers to the Mersenne-31 prime field that Stwo uses. This means that the integers in the table must be in the range \([0, 2^{31}-1)\).
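As a quick sanity check of the wrap-around behavior, here is a minimal sketch using the same `M31` API as above (we assume `M31::from` reduces its argument modulo \(2^{31}-1\)):

```rust
use stwo_prover::core::fields::m31::M31;

fn main() {
    // P = 2^31 - 1, so the largest field element is P - 1 = 2147483646.
    let p_minus_1 = M31::from(2147483646);
    // Adding 1 wraps around to 0, because P ≡ 0 (mod P).
    assert_eq!(p_minus_1 + M31::from(1), M31::from(0));
}
```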
Now that we have our table, let's move on!
From Spreadsheet to Trace Polynomials

In the previous section, we created a table (aka spreadsheet). In this section, we will convert the table into something called trace polynomials.

In STARKs, the computation trace is represented as evaluations of a polynomial over some domain. Typically this domain is a coset of a multiplicative subgroup. But since the multiplicative subgroup of M31 is not smooth, Stwo instead works over the circle group, a subgroup of the degree-2 extension of M31 (as explained in the Mersenne Primes and Circle Group sections). Thus the domain in Stwo is formed of points \((x_i, y_i)\) on the circle curve. Interpolating a polynomial over the computation trace, using the points on the circle curve as the domain, gives a bivariate trace polynomial \(f_j(x, y)\).
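Concretely, the circle curve is the set of points satisfying the unit-circle equation over the field:

\[
C(\mathbb{F}_p) = \{ (x, y) \in \mathbb{F}_p^2 : x^2 + y^2 = 1 \}, \qquad p = 2^{31} - 1,
\]

a group with exactly \(p + 1 = 2^{31}\) elements, which gives us the smooth, power-of-two-sized domains that FFT-style interpolation requires.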
We will explain why using a polynomial representation is useful in the next section, but for now, let's see how we can create trace polynomials for our code. Note that we are building upon the code from the previous section, so there's not much new code here.
```rust
fn main() {
    // --snip--

    // Convert table to trace polynomials
    let domain = CanonicCoset::new(log_num_rows).circle_domain();
    let _trace: ColumnVec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>> =
        vec![col_1, col_2]
            .into_iter()
            .map(|col| CircleEvaluation::new(domain, col))
            .collect();
}
```
Here, `domain` refers to the \((x_i, y_i)\) values used to interpolate the trace polynomials. For example, \((x_1, y_1), (x_2, y_2)\) in Figure 2 are the domain values for our example. Note that when creating the domain, we set `log_num_rows` to the log of the actual number of rows used in the table. In our example, we set it to 4 since Stwo requires that we use at least 16 rows. For background on what `CanonicCoset` and `.circle_domain()` mean, you can refer to the Circle Group section.
Now that we have created 2 trace polynomials for our 2 columns, let's move on to the next section where we commit to those polynomials!
Committing to the Trace Polynomials

Now that we have created the trace polynomials, we need to commit to them.
As we can see in Figure 1, Stwo commits to the trace polynomials by first expanding the trace polynomials (i.e. adding more evaluations) and then committing to the expanded evaluations using a Merkle tree. The rate of expansion (commonly referred to as the blowup factor) is a parameter of the FRI protocol and for the purposes of this tutorial, we will use the default value.
```rust
const CONSTRAINT_EVAL_BLOWUP_FACTOR: u32 = 1;

fn main() {
    // --snip--

    // Config for FRI and PoW
    let config = PcsConfig::default();

    // Precompute twiddles for evaluating and interpolating the trace
    let twiddles = SimdBackend::precompute_twiddles(
        CanonicCoset::new(
            log_num_rows + CONSTRAINT_EVAL_BLOWUP_FACTOR + config.fri_config.log_blowup_factor,
        )
        .circle_domain()
        .half_coset,
    );

    // Create the channel and commitment scheme
    let channel = &mut Blake2sChannel::default();
    let mut commitment_scheme =
        CommitmentSchemeProver::<SimdBackend, Blake2sMerkleChannel>::new(config, &twiddles);

    // Commit to the preprocessed trace
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals(vec![]);
    tree_builder.commit(channel);

    // Commit to the size of the trace
    channel.mix_u64(log_num_rows as u64);

    // Commit to the original trace
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals(trace);
    tree_builder.commit(channel);
}
```
We begin with some setup. First, we create a default `PcsConfig` instance, which sets the values for the FRI and PoW operations. Setting non-default values relates to the security of the proof, which is out of scope for this tutorial.

Next, we precompute twiddles, which are factors multiplied during the FFT over a particular domain. Notice that the log size of the domain is set to `log_num_rows + CONSTRAINT_EVAL_BLOWUP_FACTOR + config.fri_config.log_blowup_factor`, which is the maximum log size of the domain needed throughout the proving process. For committing to the trace polynomials, we only need to add `config.fri_config.log_blowup_factor`, but as we will see in the next section, we also need to commit to a polynomial of a higher degree, which is why we also add `CONSTRAINT_EVAL_BLOWUP_FACTOR`.
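Plugging in the numbers for our example (assuming the default `PcsConfig` uses a FRI `log_blowup_factor` of 1; check `PcsConfig::default()` in your revision of Stwo):

\[
\underbrace{4}_{\texttt{log\_num\_rows}} + \underbrace{1}_{\texttt{CONSTRAINT\_EVAL\_BLOWUP\_FACTOR}} + \underbrace{1}_{\texttt{log\_blowup\_factor}} = 6,
\]

so the largest domain used anywhere in the proving process has \(2^6 = 64\) points.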
The final setup is creating a commitment scheme and a channel. The commitment scheme will be used to commit to the trace polynomials as Merkle trees, while the channel will be used to keep a running hash of all data in the proving process (i.e. the transcript of the proof). This is part of the Fiat-Shamir transformation, where randomness can be generated safely even in a non-interactive setting. Here, we use the `Blake2sChannel` and `Blake2sMerkleChannel` for the channel and commitment scheme, respectively, but we could also use the `Poseidon252Channel` and `Poseidon252MerkleChannel` pair.
Now that we have our setup, we can commit to the trace polynomials. But before we do so, we need to first commit to an empty vector called a preprocessed trace, which doesn't do anything but is required by Stwo. Then, we need to commit to the size of the trace, which is a vital part of the proof system that the prover should not be able to cheat on. Once we have done these, we can finally commit to the original trace polynomials.
Now that we have committed to the trace polynomials, we can move on to how we can create constraints over the trace polynomials!
Evaluating Constraints Over Trace Polynomials

Proving Spreadsheet Functions
When we want to perform computations over the cells in a spreadsheet, we don't want to manually fill in the computed values. Instead, we leverage spreadsheet functions to autofill cells based on a given computation.
We can do the same thing with our table, except in addition to autofilling cells, we can also create a constraint that the result was computed correctly. Remember that the purpose of using a proof system is that the verifier can verify a computation was executed correctly without having to execute it themselves? Well, that's exactly why we need to create a constraint.
Now let's say we want to add a new column `C` to our spreadsheet that computes the product of the previous columns plus the first column. We can set `C1` as `A1 * B1 + A1`, as in Figure 2. We can then constrain the value of the third column by creating an equation that must equal 0: `col1_row1 * col2_row1 + col1_row1 - col3_row1 = 0`. For example, with `A1 = 1` and `B1 = 5`, we get `C1 = 1 * 5 + 1 = 6`, and indeed `1 * 5 + 1 - 6 = 0`.

Identical Constraints Every Row
Obviously, as can be seen in Figure 2, our new constraint is satisfied for every row in the table. This means that we can substitute creating a constraint for each row with a single constraint over the columns, i.e. the trace polynomials.
Thus, `col1_row1 * col2_row1 + col1_row1 - col3_row1 = 0` becomes \(f_1(x) \cdot f_2(x) + f_1(x) - f_3(x) = 0\).
The idea that all rows must have the same constraint may seem restrictive, compared to say a spreadsheet where we can define different functions for different rows. However, we will show in later sections how to handle such use-cases.
(Spoiler alert: it involves selectors and components)
Composition Polynomial
We will now give a name to the polynomial that expresses the constraint: a composition polynomial.
\(C(x) = f_1(x) \cdot f_2(x) + f_1(x) - f_3(x)\)
Basically, in order to prove that the constraints are satisfied, we need to show that the composition polynomial evaluates to 0 over the original domain (i.e. the domain of size the number of rows in the table).
But first, as can be seen in Figure 1, we need to expand the evaluations of the trace polynomials by a factor of 2. This is because when you multiply two trace polynomials to compute the constraint polynomial, the degree of the constraint polynomial will be the sum of the degrees of the trace polynomials. To adjust for this increase in degree, we double the number of evaluations.
Once we have the expanded evaluations, we can evaluate the composition polynomial. Checking that the composition polynomial evaluates to 0 over the original domain is done in FRI, so once again we need to expand the composition polynomial evaluations by a factor of 2 and commit to them.
In the actual Stwo code, we commit not to the composition polynomial, but to the quotient polynomial. The quotient polynomial is the composition polynomial divided by the vanishing polynomial, i.e., a polynomial that evaluates to 0 in the original domain. However, we intentionally omit this detail for the sake of simplicity.
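Schematically, in univariate notation, the committed object is

\[
q(x) = \frac{C(x)}{V(x)}, \qquad V(x) = 0 \quad \text{for all } x \text{ in the original domain},
\]

so \(q\) is a polynomial precisely when \(C\) vanishes over the entire original domain.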
We'll see in the code below how this is implemented.
Code
struct TestEval {
log_size: u32,
}
impl FrameworkEval for TestEval {
fn log_size(&self) -> u32 {
self.log_size
}
fn max_constraint_log_degree_bound(&self) -> u32 {
self.log_size + CONSTRAINT_EVAL_BLOWUP_FACTOR
}
fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
let col_1 = eval.next_trace_mask();
let col_2 = eval.next_trace_mask();
let col_3 = eval.next_trace_mask();
eval.add_constraint(col_1.clone() * col_2.clone() + col_1.clone() - col_3.clone());
eval
}
}
```rust
fn main() {
    // --snip--

    let mut col_3 = BaseColumn::zeros(num_rows);
    col_3.set(0, col_1.at(0) * col_2.at(0) + col_1.at(0));
    col_3.set(1, col_1.at(1) * col_2.at(1) + col_1.at(1));

    // Convert table to trace polynomials
    let domain = CanonicCoset::new(log_num_rows).circle_domain();
    let trace: ColumnVec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>> =
        vec![col_1, col_2, col_3]
            .into_iter()
            .map(|col| CircleEvaluation::new(domain, col))
            .collect();

    // --snip--

    // Create a component
    let _component = FrameworkComponent::<TestEval>::new(
        &mut TraceLocationAllocator::default(),
        TestEval {
            log_size: log_num_rows,
        },
        QM31::zero(),
    );
}
```
First, we add a new column `col_3` that contains the result of the computation: `col_1 * col_2 + col_1`.
Then, to create a constraint over the trace polynomials, we first create a `TestEval` struct that implements the `FrameworkEval` trait and add our constraint logic in the `FrameworkEval::evaluate` function. Note that this function is called for every row in the table, so we only need to define the constraint once.
Inside `FrameworkEval::evaluate`, we call `eval.next_trace_mask()` three consecutive times, retrieving the cell values of all three columns (see Figure 3 below for a visual representation). Once we have retrieved all three column values, we add a constraint of the form `col_1 * col_2 + col_1 - col_3`, which should equal 0.

We also need to implement `FrameworkEval::max_constraint_log_degree_bound(&self)` for `FrameworkEval`. As mentioned in the Composition Polynomial section, we need to expand the trace polynomial evaluations because the degree of our composition polynomial is higher than that of the trace polynomials. Expanding by the lowest value, `CONSTRAINT_EVAL_BLOWUP_FACTOR=1`, is sufficient for our example since we only have one multiplication gate, so we return `self.log_size + CONSTRAINT_EVAL_BLOWUP_FACTOR`. For those interested in how to set this value in general, we leave a detailed note below.
What value should `max_constraint_log_degree_bound(&self)` return?

`self.log_size + max(1, ceil(log2(max_degree - 1)))`, where `max_degree` is the maximum degree over all defined constraint polynomials. For example:

- degree 1-3: `self.log_size + 1`
- degree 4-5: `self.log_size + 2`
- degree 6-9: `self.log_size + 3`
- degree 10-17: `self.log_size + 4`
- ...
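Applied to our example: the constraint `col_1 * col_2 + col_1 - col_3` has `max_degree = 2`, so the bound is

\[
\texttt{self.log\_size} + \max(1, \lceil \log_2(2 - 1) \rceil) = \texttt{self.log\_size} + 1,
\]

which matches the `self.log_size + CONSTRAINT_EVAL_BLOWUP_FACTOR` we return above.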
Now that we know the degree of the composition polynomial, we can also explain the following code:
```rust
// Precompute twiddles for evaluating and interpolating the trace
let twiddles = SimdBackend::precompute_twiddles(
    CanonicCoset::new(
        log_num_rows + CONSTRAINT_EVAL_BLOWUP_FACTOR + config.fri_config.log_blowup_factor,
    )
    .circle_domain()
    .half_coset,
);
```
Why is the `log_size` of the domain set to `log_num_rows + CONSTRAINT_EVAL_BLOWUP_FACTOR + config.fri_config.log_blowup_factor` here? As we can see in Figure 1, once we have the composition polynomial, we need to expand it once more before committing to it for the FRI step. Thus, the maximum size of the domain needed in the entire proving process is the FRI blowup factor times the degree of the composition polynomial.
Using the new `TestEval` struct, we can create a new `FrameworkComponent::<TestEval>` component, which the prover will use to evaluate the constraint. For now, we can ignore the other parameters of the `FrameworkComponent::<TestEval>` constructor.
We now move on to the final section where we finally create and verify a proof.
Finally, we can break down what an Algebraic Intermediate Representation (AIR) means.
Algebraic means that we are using polynomials to represent the constraints.
Intermediate Representation means that this is a modified representation of our statement so that it can be proven by a proof system.
So AIR is just another way of saying that we are representing statements to be proven as constraints over polynomials.
Proving and Verifying an AIR

We're finally ready to take the final step: prove and verify an AIR!
Since the code is relatively short, let us present it first and then go over the details.
```rust
fn main() {
    // --snip--

    // Prove
    let proof = prove(&[&component], channel, commitment_scheme).unwrap();

    // Verify
    let channel = &mut Blake2sChannel::default();
    let commitment_scheme = &mut CommitmentSchemeVerifier::<Blake2sMerkleChannel>::new(config);

    let sizes = component.trace_log_degree_bounds();
    commitment_scheme.commit(proof.commitments[0], &sizes[0], channel);
    channel.mix_u64(log_num_rows as u64);
    commitment_scheme.commit(proof.commitments[1], &sizes[1], channel);

    verify(&[&component], channel, commitment_scheme, proof).unwrap();
}
```
Prove
As you can see, there is only a single line of code added to create the proof. The `prove` function performs the FRI and PoW operations under the hood. Technically, the constraint-related steps in Figure 1 were also not performed in the previous section; they are only performed once `prove` is called.
Verify
In order to verify our proof, we need to check that the constraints are satisfied using the commitments from the proof. To do that, we set up a `Blake2sChannel` and a `CommitmentSchemeVerifier::<Blake2sMerkleChannel>`, along with the same `PcsConfig` that we used when creating the proof. Then, we recreate the running-hash channel by passing the Merkle tree commitments and the `log_num_rows` to the `CommitmentSchemeVerifier` instance via `commit` (remember, the order is important!). Finally, we can verify the proof using the `verify` function.
Try setting the dummy values in the table to 1 instead of 0. Does it fail? If so, can you see why?
Congratulations! We have come full circle. We now know how to create a table, convert it to trace polynomials, commit to them, create constraints over the trace polynomials, and prove and verify the constraints (i.e. an AIR). In the following sections, we will go over some more complicated AIRs to explain Stwo's other features.
Preprocessed Trace
This section and the following sections are intended for developers who have completed the Writing a Simple AIR section or are already familiar with the workflow of creating an AIR. If you have not gone through the Writing a Simple AIR section, we recommend doing so first, as the following sections gloss over a lot of boilerplate code.
For those of you who have completed the Writing a Simple AIR tutorial, you should now be familiar with the concept of a trace as a table of integers that are filled in by the prover (we will now refer to this as the original trace).
In addition to the original trace, Stwo also has a concept of a preprocessed trace, which is a table whose values are fixed and therefore cannot be arbitrarily chosen by the prover. In other words, these are columns whose values are known in advance of creating a proof and essentially agreed upon by both the prover and the verifier.
One of the use cases of the preprocessed trace is as a selector for different constraints. Remember that in an AIR, the same constraints are applied to every row of the trace? If we go back to the spreadsheet analogy, this means that we can't create a spreadsheet that runs different computations for different rows. We can get around this issue by composing multiple constraints using a selector column as part of the preprocessed trace. For example, let's say we want to create a constraint that runs different computations for the first 2 rows and the next 2 rows. We can do this by using a preprocessed trace that has value 1 for the first 2 rows and 0 for the next 2 rows, essentially as a selector for the first 2 rows. The resulting single constraint composes the two different constraints by adding them together: \((1 - \text{preprocessed\_trace}) \cdot \text{constraint\_1} + \text{preprocessed\_trace} \cdot \text{constraint\_2} = 0\)

Another use case is to use the preprocessed trace for expressing constant values used in the constraints. For example, when creating a hash function in an AIR, we often need to use round constants, which the verifier needs to be able to verify or the resulting hash may be invalid. We can also "look up" the constant values as an optimization technique, which we will discuss in more detail in the next section.
In this section, we will explore how to implement a preprocessed trace as a selector. We will implement the simplest form: a single `isFirst` column, where the value is 1 for the first row and 0 for all other rows.
Boilerplate code is omitted for brevity. Please refer to the full example code for the full implementation.
```rust
struct IsFirstColumn {
    pub log_size: u32,
}

#[allow(dead_code)]
impl IsFirstColumn {
    pub fn new(log_size: u32) -> Self {
        Self { log_size }
    }

    pub fn gen_column(&self) -> CircleEvaluation<SimdBackend, M31, BitReversedOrder> {
        let mut col = BaseColumn::zeros(1 << self.log_size);
        col.set(0, M31::from(1));
        CircleEvaluation::new(CanonicCoset::new(self.log_size).circle_domain(), col)
    }

    pub fn id(&self) -> PreProcessedColumnId {
        PreProcessedColumnId {
            id: format!("is_first_{}", self.log_size),
        }
    }
}
```
First, we need to define an `IsFirstColumn` struct that will be used as a preprocessed trace. We use the `gen_column()` function to generate a `CircleEvaluation` struct that is 1 for the first row and 0 for all other rows. The `id()` function is needed to identify this column when evaluating the constraints.
```rust
fn main() {
    ...

    // Create and commit to the preprocessed trace
    let is_first_column = IsFirstColumn::new(log_size);

    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals(vec![is_first_column.gen_column()]);
    tree_builder.commit(channel);

    // Commit to the size of the trace
    channel.mix_u64(log_size as u64);

    // Create and commit to the original trace
    let trace = gen_trace(log_size);
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals(trace);
    tree_builder.commit(channel);

    ...
}
```
Then, in our main function, we will create and commit to the preprocessed and original traces. For those of you who are curious about why we need to commit to the trace, please refer to the Committing to the Trace Polynomials section.
```rust
struct TestEval {
    is_first_id: PreProcessedColumnId,
    log_size: u32,
}

impl FrameworkEval for TestEval {
    fn log_size(&self) -> u32 {
        self.log_size
    }

    fn max_constraint_log_degree_bound(&self) -> u32 {
        self.log_size + CONSTRAINT_EVAL_BLOWUP_FACTOR
    }

    fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
        let is_first = eval.get_preprocessed_column(self.is_first_id.clone());

        let col_1 = eval.next_trace_mask();
        let col_2 = eval.next_trace_mask();
        let col_3 = eval.next_trace_mask();

        // If is_first is 1, then the constraint is col_1 * col_2 - col_3 = 0
        // If is_first is 0, then the constraint is col_1 * col_2 + col_1 - col_3 = 0
        eval.add_constraint(
            (col_1.clone() * col_2.clone() - col_3.clone()) * is_first.clone()
                + (col_1.clone() * col_2.clone() + col_1.clone() - col_3.clone())
                    * (E::F::from(M31::from(1)) - is_first.clone()),
        );

        eval
    }
}
```
Now that we have the traces, we need to create a struct that contains the logic for evaluating the constraints. As mentioned before, we use the `is_first_id` field to retrieve the row value of the `IsFirstColumn` struct. Then, we compose two constraints using the `IsFirstColumn` row value as a selector and add them together.
If you're unfamiliar with how `max_constraint_log_degree_bound(&self)` should be implemented, please refer to this note.
```rust
fn main() {
    ...

    // Create a component
    let component = FrameworkComponent::<TestEval>::new(
        &mut TraceLocationAllocator::default(),
        TestEval {
            is_first_id: is_first_column.id(),
            log_size,
        },
        QM31::zero(),
    );

    // Prove
    let proof = prove(&[&component], channel, commitment_scheme).unwrap();

    // Verify
    let channel = &mut Blake2sChannel::default();
    let commitment_scheme = &mut CommitmentSchemeVerifier::<Blake2sMerkleChannel>::new(config);

    let sizes = component.trace_log_degree_bounds();
    commitment_scheme.commit(proof.commitments[0], &sizes[0], channel);
    channel.mix_u64(log_size as u64);
    commitment_scheme.commit(proof.commitments[1], &sizes[1], channel);

    verify(&[&component], channel, commitment_scheme, proof).unwrap();
}
```
Finally, we create a `FrameworkComponent` using the `TestEval` struct and then prove and verify the component.
Static Lookups
In the previous section, we showed how to create a preprocessed trace. In this section, we will introduce the concept of static lookups, where we will create columns that look up values from a preprocessed trace.
Readers who are unfamiliar with the concept of lookups can refer to the Lookups section for a quick introduction.
Specifically, we will implement a range-check. A range-check is a technique used to check that a certain value is within a given range. This proves especially useful in proof systems like Stwo that use finite fields, because it allows checking for underflow and overflow.
A range-check checks that all values in a column are within a certain range. For example, as in Figure 1, we can check that all values in the lookup columns are between 0 and 3. We do this by creating a multiplicity column that counts the number of times each value in the preprocessed trace appears in the lookup columns.
Then, we create two LogUp columns. The first column contains in each row a fraction whose numerator is the multiplicity and whose denominator is the random linear combination of the value in the range-check column. For example, for row 1, the fraction should be \(\dfrac{2}{X-0}\). Note that in Figure 1, the numerator is actually \(-2\), i.e. we negate the multiplicity, because we want the fractions in the first column to cancel those in the second column when summed.

The second column contains fractions whose numerator is always 1 and whose denominator is the random linear combination of the value in the lookup column. Note that a single row batches the two lookup columns by adding their two fractions together.

If you stare at the LogUp columns hard enough, you'll notice that if we add all the fractions in the two columns together, we get 0. This is no coincidence! The prover will provide the sum of the LogUp columns, and the verifier will check in the open that this value is indeed 0.
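Schematically, with range-check values \(t_i\), multiplicities \(m_i\), and looked-up values \(v_j\), the claim being checked is

\[
\sum_i \frac{-m_i}{X - t_i} + \sum_j \frac{1}{X - v_j} = 0,
\]

where \(X\) stands for the random value behind the linear combination. The identity holds because each looked-up value \(v_j\) appears among the \(t_i\) with a multiplicity counting exactly its occurrences.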
Now let's move on to the implementation.
```rust
struct RangeCheckColumn {
    pub log_size: u32,
}

#[allow(dead_code)]
impl RangeCheckColumn {
    pub fn new(log_size: u32) -> Self {
        Self { log_size }
    }

    pub fn gen_column(&self) -> CircleEvaluation<SimdBackend, M31, BitReversedOrder> {
        let col = BaseColumn::from_iter((0..(1 << self.log_size)).map(|i| M31::from(i)));
        CircleEvaluation::new(CanonicCoset::new(self.log_size).circle_domain(), col)
    }

    pub fn id(&self) -> PreProcessedColumnId {
        PreProcessedColumnId {
            id: format!("range_check_{}_bits", self.log_size),
        }
    }
}
```
First, we need to create the range-check column as a preprocessed column. This should look familiar to the code from the previous section.
```rust
fn gen_trace(log_size: u32) -> Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>> {
    // Create a table with random values
    let mut rng = rand::thread_rng();
    let mut lookup_col_1 =
        BaseColumn::from_iter((0..(1 << log_size)).map(|_| M31::from(rng.gen_range(0..16))));
    let mut lookup_col_2 =
        BaseColumn::from_iter((0..(1 << log_size)).map(|_| M31::from(rng.gen_range(0..16))));

    let mut multiplicity_col = BaseColumn::zeros(1 << log_size);
    lookup_col_1
        .as_mut_slice()
        .iter()
        .chain(lookup_col_2.as_mut_slice().iter())
        .for_each(|value| {
            let index = value.0 as usize;
            multiplicity_col.set(index, multiplicity_col.at(index) + M31::from(1));
        });

    // Convert table to trace polynomials
    let domain = CanonicCoset::new(log_size).circle_domain();
    vec![
        lookup_col_1.clone(),
        lookup_col_2.clone(),
        multiplicity_col.clone(),
    ]
    .into_iter()
    .map(|col| CircleEvaluation::new(domain, col))
    .collect()
}
```
Next, we create the trace columns. The first two columns contain random values in the range \([0, 15]\), and the third column counts, for each value in the range-check column, how many times it appears in the lookup columns.
```rust
relation!(LookupElements, 1);

fn gen_logup_trace(
    log_size: u32,
    range_check_col: &BaseColumn,
    lookup_col_1: &BaseColumn,
    lookup_col_2: &BaseColumn,
    multiplicity_col: &BaseColumn,
    lookup_elements: &LookupElements,
) -> (
    Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>>,
    SecureField,
) {
    let mut logup_gen = LogupTraceGenerator::new(log_size);

    let mut col_gen = logup_gen.new_col();
    for simd_row in 0..(1 << (log_size - LOG_N_LANES)) {
        let numerator: PackedSecureField = PackedSecureField::from(multiplicity_col.data[simd_row]);
        let denom: PackedSecureField = lookup_elements.combine(&[range_check_col.data[simd_row]]);
        col_gen.write_frac(simd_row, -numerator, denom);
    }
    col_gen.finalize_col();

    let mut col_gen = logup_gen.new_col();
    for simd_row in 0..(1 << (log_size - LOG_N_LANES)) {
        let lookup_col_1_val: PackedSecureField =
            lookup_elements.combine(&[lookup_col_1.data[simd_row]]);
        let lookup_col_2_val: PackedSecureField =
            lookup_elements.combine(&[lookup_col_2.data[simd_row]]);
        // 1 / denom1 + 1 / denom2 = (denom1 + denom2) / (denom1 * denom2)
        let numerator = lookup_col_1_val + lookup_col_2_val;
        let denom = lookup_col_1_val * lookup_col_2_val;
        col_gen.write_frac(simd_row, numerator, denom);
    }
    col_gen.finalize_col();

    logup_gen.finalize_last()
}
```
```rust
fn main() {
    ...

    // Draw random elements to use when creating the random linear combination of lookup values in the LogUp columns
    let lookup_elements = LookupElements::draw(channel);

    // Create and commit to the LogUp columns
    let (logup_cols, claimed_sum) = gen_logup_trace(
        log_size,
        &range_check_col,
        &trace[0],
        &trace[1],
        &trace[2],
        &lookup_elements,
    );

    ...
}
```
Now we need to create the LogUp columns.
First, note that we are creating a `LookupElements` instance using the `relation!` macro. This macro creates an API for performing random linear combinations. Under the hood, it creates two random values \(z, \alpha\) that can create a random linear combination of an arbitrary number of elements. In our case, we only need to combine one value (a value in \([0,15]\)), which is why we pass `1` to the macro.
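Schematically, for values \(v_0, \ldots, v_{k-1}\), the combination has the following shape (a sketch of the convention; the exact form and signs may differ between Stwo revisions):

\[
\mathrm{combine}(v_0, \ldots, v_{k-1}) = \sum_{i=0}^{k-1} \alpha^i v_i - z.
\]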
Inside `gen_logup_trace`, we create a `LogupTraceGenerator` instance. This is a helper class for creating LogUp columns. Every time we create a new column, we call `new_col()` on the `LogupTraceGenerator` instance.
You may notice that we are iterating over the `BaseColumn` in chunks of 16, or `1 << LOG_N_LANES` values. This is because we are using the `SimdBackend`, which runs 16 lanes simultaneously, so we need to preserve this structure. The `Packed` in `PackedSecureField` means that it packs 16 values into a single value.
You may also notice that we are using a `SecureField` instead of just the `Field`. This is because the random values created in `LookupElements` live in the degree-4 extension field \(\mathbb{F}_{p^4}\). Interested readers can refer to the Mersenne Primes section for more details.
Once we have set the fractions for each `simd_row`, we call `finalize_col()` to finalize the column. This process turns the LogUp columns from individual fractions into cumulative sums of the fractions, as shown in Figure 2.

Finally, we call `finalize_last()` on the `LogupTraceGenerator` instance to finalize the LogUp columns; this returns the LogUp columns as well as the sum of all the fractions in them.
```rust
struct TestEval {
    range_check_id: PreProcessedColumnId,
    log_size: u32,
    lookup_elements: LookupElements,
}

impl FrameworkEval for TestEval {
    fn log_size(&self) -> u32 {
        self.log_size
    }

    fn max_constraint_log_degree_bound(&self) -> u32 {
        self.log_size + CONSTRAINT_EVAL_BLOWUP_FACTOR
    }

    fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
        let range_check_col = eval.get_preprocessed_column(self.range_check_id.clone());

        let lookup_col_1 = eval.next_trace_mask();
        let lookup_col_2 = eval.next_trace_mask();
        let multiplicity_col = eval.next_trace_mask();

        eval.add_to_relation(RelationEntry::new(
            &self.lookup_elements,
            -E::EF::from(multiplicity_col),
            &[range_check_col],
        ));
        eval.add_to_relation(RelationEntry::new(
            &self.lookup_elements,
            E::EF::one(),
            &[lookup_col_1],
        ));
        eval.add_to_relation(RelationEntry::new(
            &self.lookup_elements,
            E::EF::one(),
            &[lookup_col_2],
        ));

        eval.finalize_logup_batched(&vec![0, 1, 1]);

        eval
    }
}
```
The last piece of the puzzle is creating the constraints. We use the same `TestEval` struct as in the previous sections, but the `evaluate` function looks slightly different. Instead of calling `add_constraint` on the `EvalAtRow` instance, we call `add_to_relation`, which recreates the fractions we added in the LogUp columns using the values in the range-check, lookup, and multiplicity columns.
Once we add the fractions as constraints, we call the `finalize_logup_batched` function, which indicates how we want to batch the fractions. In our case, we added 3 fractions but want the last two batched together, so we pass `&vec![0, 1, 1]`.
```rust
// Verify
assert_eq!(claimed_sum, SecureField::zero());

let channel = &mut Blake2sChannel::default();
let commitment_scheme = &mut CommitmentSchemeVerifier::<Blake2sMerkleChannel>::new(config);

let sizes = component.trace_log_degree_bounds();
commitment_scheme.commit(proof.commitments[0], &sizes[0], channel);
channel.mix_u64(log_size as u64);
commitment_scheme.commit(proof.commitments[1], &sizes[1], channel);
commitment_scheme.commit(proof.commitments[2], &sizes[2], channel);

verify(&[&component], channel, commitment_scheme, proof).unwrap();
```
When we verify the proof, as promised, we check that the `claimed_sum`, the sum of the fractions in the LogUp columns, is 0.
And that's it! We have successfully created a static lookup for a range-check.
Dynamic Lookups
In the last section, we implemented a static lookup. A dynamic lookup is the same as a static lookup except that the values that are being looked up are not known before the proving process (i.e. they are not preprocessed columns but trace columns).
In this section, we will implement one of the simplest dynamic lookups: a permutation check.
A permutation check simply checks that two sets of values have the same elements, but not necessarily in the same order. For example, the values `[1, 2, 3]` and `[3, 1, 2]` are a permutation of each other, but `[1, 2, 3]` and `[1, 2]` are not.
If you went through the previous section, you should have a good intuition for how to implement this. First, create two LogUp columns, where the first contains the values of the original set with multiplicity \(1\) and the second contains the values of the second set with multiplicity \(-1\). Then, check that the `claimed_sum`, the sum of the fractions in the two LogUp columns, is \(0\).
We can optimize further by batching the two columns into a single LogUp column so that a LogUp column row looks something like \(\frac{1}{col_1} - \frac{1}{col_2}\).
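Schematically, with original values \(a_i\) and permuted values \(b_i\), the check is

\[
\sum_i \left( \frac{1}{X - a_i} - \frac{1}{X - b_i} \right) = 0,
\]

which holds (with high probability over the random \(X\)) exactly when \(\{a_i\}\) and \(\{b_i\}\) are equal as multisets.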
```rust
fn gen_trace(log_size: u32) -> Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>> {
    // Create a table with values [0, 1 << log_size)
    let mut rng = rand::thread_rng();
    let values = (0..(1 << log_size)).map(|i| i).collect::<Vec<_>>();
    let original_col = BaseColumn::from_iter(values.iter().map(|v| M31::from(*v)));

    // Create a random permutation of the values
    let mut permutation = values.clone();
    permutation.shuffle(&mut rng);
    let permuted_col = BaseColumn::from_iter(permutation.iter().map(|v| M31::from(*v)));

    // Convert table to trace polynomials
    let domain = CanonicCoset::new(log_size).circle_domain();
    vec![original_col, permuted_col]
        .into_iter()
        .map(|col| CircleEvaluation::new(domain, col))
        .collect()
}

fn gen_logup_trace(
    log_size: u32,
    original_col: &BaseColumn,
    permuted_col: &BaseColumn,
    lookup_elements: &LookupElements,
) -> (
    Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>>,
    SecureField,
) {
    let mut logup_gen = LogupTraceGenerator::new(log_size);

    let mut col_gen = logup_gen.new_col();
    for row in 0..(1 << (log_size - LOG_N_LANES)) {
        // 1 / original - 1 / permuted = (permuted - original) / (original * permuted)
        let original_val: PackedSecureField = lookup_elements.combine(&[original_col.data[row]]);
        let permuted_val: PackedSecureField = lookup_elements.combine(&[permuted_col.data[row]]);
        col_gen.write_frac(
            row,
            permuted_val - original_val,
            original_val * permuted_val,
        );
    }
    col_gen.finalize_col();

    logup_gen.finalize_last()
}
```
```rust
fn main() {
    ...

    // Create and commit to the trace columns
    let trace = gen_trace(log_size);
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals(trace.clone());
    tree_builder.commit(channel);

    // Draw random elements to use when creating the random linear combination of lookup values in the LogUp columns
    let lookup_elements = LookupElements::draw(channel);

    // Create and commit to the LogUp columns
    let (logup_cols, claimed_sum) =
        gen_logup_trace(log_size, &trace[0], &trace[1], &lookup_elements);
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals(logup_cols);
    tree_builder.commit(channel);

    ...
}
```
Looking at the code above, we can see that it is very similar to the implementation in the previous section. Instead of creating a preprocessed column, we create a trace column that contains the values `[0, 1 << log_size)` in order. Then, we create a random permutation of those values and set it as the second trace column. Note that this is equivalent to "looking up" every value in the first trace column once; and since every value is looked up exactly once, we do not need a separate multiplicity column.

Then, we create a LogUp column that contains the values \(\frac{1}{\text{original}} - \frac{1}{\text{permuted}}\).
```rust
struct TestEval {
    log_size: u32,
    lookup_elements: LookupElements,
}

impl FrameworkEval for TestEval {
    fn log_size(&self) -> u32 {
        self.log_size
    }

    fn max_constraint_log_degree_bound(&self) -> u32 {
        self.log_size + CONSTRAINT_EVAL_BLOWUP_FACTOR
    }

    fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
        let original_col = eval.next_trace_mask();
        let permuted_col = eval.next_trace_mask();

        eval.add_to_relation(RelationEntry::new(
            &self.lookup_elements,
            E::EF::one(),
            &[original_col],
        ));
        eval.add_to_relation(RelationEntry::new(
            &self.lookup_elements,
            -E::EF::one(),
            &[permuted_col],
        ));

        eval.finalize_logup_in_pairs();

        eval
    }
}
```
The `TestEval` struct is also very similar to the one in the previous section. The only difference is that we call `add_to_relation` twice and batch the two entries together by calling `finalize_logup_in_pairs()` on the `TestEval` instance. This is equivalent to calling the `finalize_logup_batched` function with `&vec![0, 0]`.
Local Row Constraints
- Change the order of elements in `BaseColumn` in place via `bit_reverse_coset_to_circle_domain_order` before creating a `CircleEvaluation` instance.
- The previous-row offset in the first row points to the last row, so you may need to disable the constraint for the first row.
Until now, we have only considered constraints that apply over values in a single row. But what if we want to express constraints over multiple rows? For example, we may want to ensure that the difference between the values in two adjacent rows is always the same.
Turns out we can implement this as an AIR constraint, as long as the same constraints are applied to all rows. In this section, we will see how to implement this.
We will build upon the example in the previous section, where we created two columns and proved that they are permutations of each other by asserting that the second column looks up every value in the first column exactly once.

Here, we will create two columns and prove not only that they are permutations of each other, but also that the second column is a sorted version of the first column.

More specifically, the sorted column will contain the values \([0, \text{num\_rows})\) in order, which means that the difference between each row and the previous row should be \(1\).
We will go through three iterations, fixing an issue in each iteration.
First Try
```rust
fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
    let unsorted_col = eval.next_trace_mask();
    let [sorted_col_prev_row, sorted_col_curr_row] =
        eval.next_interaction_mask(ORIGINAL_TRACE_IDX, [-1, 0]);

    // New constraint
    eval.add_constraint(
        E::F::one() - (sorted_col_curr_row.clone() - sorted_col_prev_row.clone()),
    );

    eval.add_to_relation(RelationEntry::new(
        &self.lookup_elements,
        E::EF::one(),
        &[unsorted_col],
    ));
    eval.add_to_relation(RelationEntry::new(
        &self.lookup_elements,
        -E::EF::one(),
        &[sorted_col_curr_row],
    ));

    eval.finalize_logup_in_pairs();

    eval
}
```
The logic for creating the trace and LogUp columns is the same as in the previous section, so we omit it for brevity.
What does change is the `evaluate` function, where we call `next_interaction_mask` on the `EvalAtRow` instance. This function can retrieve values at arbitrary row offsets, which means we can access the previous row's value using the offset `-1`. Since we call this function with offsets `[-1, 0]`, we retrieve the values of both the previous and current rows.

Once we have these values, we can assert that the difference between the current and previous rows is always `1` with the constraint `E::F::one() - (sorted_col_curr_row.clone() - sorted_col_prev_row.clone())`.
But this will fail with a `ConstraintsNotSatisfied` error. Can you see why? (You can try running it yourself here.)
Second Try
The issue was that when calling `evaluate` on the first row of our trace, the previous-row value wraps around to the last row, because there are no negative indices. This means that in our example, we are expecting the constraint \(0 - 15 = 1\) to hold, which is clearly not true.
To fix this, we can use the `IsFirstColumn` preprocessed column that we created in the Preprocessed Trace section. So we will copy over the same code for creating the preprocessed column and modify our new constraint as follows:
```rust
let is_first_col = eval.get_preprocessed_column(self.is_first_id.clone());

eval.add_constraint(
    (E::F::one() - is_first_col.clone())
        * (E::F::one() - (sorted_col_curr_row.clone() - sorted_col_prev_row.clone())),
);
```
Now, we have a constraint that is disabled for the first row, which is exactly what we want.
Still, however, this will fail with the same `ConstraintsNotSatisfied` error. (You can run it here.)
Third Try
When we were creating `CircleEvaluation` instances from our `BaseColumn` instances, the order of the elements we supplied was actually not the order that Stwo understands them in. Instead, Stwo assumes the values are in bit-reversed, circle domain order. It's not important to understand what this order is specifically, but it does mean that when Stwo tries to find the `-1` offset while calling `evaluate`, it looks for the previous value assuming the values are in that different order. Thus, every time we create a `CircleEvaluation` instance, we need to convert the order of the values in the `BaseColumn` to bit-reversed circle domain order beforehand.
```rust
impl IsFirstColumn {
    ...
    pub fn gen_column(&self) -> CircleEvaluation<SimdBackend, M31, BitReversedOrder> {
        let mut col = BaseColumn::zeros(1 << self.log_size);
        col.set(0, M31::from(1));
        //////////////////////////////////////////////////////////////
        // Convert the column to bit-reversed circle domain order
        bit_reverse_coset_to_circle_domain_order(col.as_mut_slice());
        //////////////////////////////////////////////////////////////
        CircleEvaluation::new(CanonicCoset::new(self.log_size).circle_domain(), col)
    }
    ...
}

fn gen_trace(log_size: u32) -> Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>> {
    // Create a table with random values
    let mut rng = rand::thread_rng();
    let sorted_values = (0..(1 << log_size)).map(|i| i).collect::<Vec<_>>();
    let mut unsorted_values = sorted_values.clone();
    unsorted_values.shuffle(&mut rng);

    let mut unsorted_col = BaseColumn::from_iter(unsorted_values.iter().map(|v| M31::from(*v)));
    let mut sorted_col = BaseColumn::from_iter(sorted_values.iter().map(|v| M31::from(*v)));

    // Convert table to trace polynomials
    let domain = CanonicCoset::new(log_size).circle_domain();
    ////////////////////////////////////////////////////////////////////
    // Convert the columns to bit-reversed circle domain order
    bit_reverse_coset_to_circle_domain_order(unsorted_col.as_mut_slice());
    bit_reverse_coset_to_circle_domain_order(sorted_col.as_mut_slice());
    ////////////////////////////////////////////////////////////////////
    vec![unsorted_col, sorted_col]
        .into_iter()
        .map(|col| CircleEvaluation::new(domain, col))
        .collect()
}
```
And voilà, we have successfully implemented the constraint. You can run it here.
Components
So now that we know how to create a self-contained AIR, the inevitable question arises: How do we make this modular?
Fortunately, Stwo provides an abstraction called a component that allows us to create independent AIRs and compose them together. In other proving frontends, this is also commonly referred to as a chip, but the idea is the same.
One of the most common use cases of components is to separate frequently used functions (e.g. a hash function) from the main component into a separate component and reuse it, avoiding trace column bloat. Even if the function is not frequently used, it could be useful to separate it into a component to avoid the degree of the constraints becoming too high. This second point is possible because when we create a new component and connect it to the old component, we do it by using lookups, which means that the constraints of the new component are not added to the degree of the old component.
Example
To illustrate how to use components, we will create two components, where the main component calls a hash function component. For simplicity, instead of an actual hash function, the second component will compute \(x^5 + 1\) from an input \(x\). This component will have three columns in total, `[input, intermediate, output]`, corresponding to the values \([x, x^3, x^5 + 1]\). Our main component, on the other hand, will have two columns, `[input, output]`, corresponding to the values \([x, x^5 + 1]\).
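For example, with input \(x = 2\), a row of the computing component's trace would contain

\[
[x, \; x^3, \; x^5 + 1] = [2, \; 8, \; 33],
\]

and the component's two constraints check that \(8 - 2 \cdot 2 \cdot 2 = 0\) and \(33 - 8 \cdot 2 \cdot 2 - 1 = 0\).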
We'll now refer to the main component as the scheduling component and to the hash function component as the computing component: the main component essentially schedules the hash function component to run its function on a given input, and the hash function component computes on the provided input. As can be seen in Figure 1, the inputs and outputs of each component are connected by lookups.

Design

When we implement this in Stwo, the traces of each component will look like Figure 2 above. Each component has its own original and LogUp traces, and the inputs and outputs of each component are connected by lookups. Since the scheduling component adds the inputs as positive values and the outputs as negative values, while the computing component adds the inputs as negative values and the outputs as positive values, the verifier can simply check that the sum of the two LogUp columns is zero.
Code
```rust
fn main() {
    // --snip--

    // Create trace columns
    let scheduling_trace = gen_scheduling_trace(log_size);
    let computing_trace = gen_computing_trace(log_size, &scheduling_trace[0], &scheduling_trace[1]);

    // Statement 0
    let statement0 = ComponentsStatement0 { log_size };
    statement0.mix_into(channel);

    // Commit to the trace columns
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals([scheduling_trace.clone(), computing_trace.clone()].concat());
    tree_builder.commit(channel);

    // Draw random elements to use when creating the random linear combination of lookup values in the LogUp columns
    let lookup_elements = ComputationLookupElements::draw(channel);

    // Create LogUp columns
    let (scheduling_logup_cols, scheduling_claimed_sum) = gen_scheduling_logup_trace(
        log_size,
        &scheduling_trace[0],
        &scheduling_trace[1],
        &lookup_elements,
    );
    let (computing_logup_cols, computing_claimed_sum) = gen_computing_logup_trace(
        log_size,
        &computing_trace[0],
        &computing_trace[2],
        &lookup_elements,
    );

    // Statement 1
    let statement1 = ComponentsStatement1 {
        scheduling_claimed_sum,
        computing_claimed_sum,
    };
    statement1.mix_into(channel);

    // Commit to the LogUp columns
    let mut tree_builder = commitment_scheme.tree_builder();
    tree_builder.extend_evals([scheduling_logup_cols, computing_logup_cols].concat());
    tree_builder.commit(channel);

    let components = Components::new(&statement0, &lookup_elements, &statement1);

    let stark_proof = prove(&components.component_provers(), channel, commitment_scheme).unwrap();

    let proof = ComponentsProof {
        statement0,
        statement1,
        stark_proof,
    };

    // --snip--
}
```
The code above for proving the components should look pretty familiar by now. Since we need to do everything twice, we create structs like `ComponentsStatement0`, `ComponentsStatement1`, `Components`, and `ComponentsProof`, but the main logic is the same.
Let's take a closer look at how the LogUp columns are generated.
```rust
fn gen_scheduling_logup_trace(
    log_size: u32,
    scheduling_col_1: &CircleEvaluation<SimdBackend, M31, BitReversedOrder>,
    scheduling_col_2: &CircleEvaluation<SimdBackend, M31, BitReversedOrder>,
    lookup_elements: &ComputationLookupElements,
) -> (
    Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>>,
    SecureField,
) {
    // --snip--

    let scheduling_input: PackedSecureField =
        lookup_elements.combine(&[scheduling_col_1.data[row]]);
    let scheduling_output: PackedSecureField =
        lookup_elements.combine(&[scheduling_col_2.data[row]]);
    // 1 / input - 1 / output = (output - input) / (input * output)
    col_gen.write_frac(
        row,
        scheduling_output - scheduling_input,
        scheduling_input * scheduling_output,
    );

    // --snip--
}

fn gen_computing_logup_trace(
    log_size: u32,
    computing_col_1: &CircleEvaluation<SimdBackend, M31, BitReversedOrder>,
    computing_col_3: &CircleEvaluation<SimdBackend, M31, BitReversedOrder>,
    lookup_elements: &ComputationLookupElements,
) -> (
    Vec<CircleEvaluation<SimdBackend, M31, BitReversedOrder>>,
    SecureField,
) {
    // --snip--

    let computing_input: PackedSecureField =
        lookup_elements.combine(&[computing_col_1.data[row]]);
    let computing_output: PackedSecureField =
        lookup_elements.combine(&[computing_col_3.data[row]]);
    // 1 / output - 1 / input = (input - output) / (input * output)
    col_gen.write_frac(
        row,
        computing_input - computing_output,
        computing_input * computing_output,
    );

    // --snip--
}
```
As you can see, the LogUp values of the input and output columns of both the scheduling and computing components are batched together. In the scheduling component, the output LogUp value is subtracted from the input LogUp value, while in the computing component, the input LogUp value is subtracted from the output LogUp value. This means that when the LogUp sums from both components are added together, they cancel out and equal zero.
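Schematically, writing \(\mathit{in}_i\) and \(\mathit{out}_i\) for the combined input and output values of row \(i\):

\[
\underbrace{\sum_i \left( \frac{1}{\mathit{in}_i} - \frac{1}{\mathit{out}_i} \right)}_{\text{scheduling}} + \underbrace{\sum_i \left( \frac{1}{\mathit{out}_i} - \frac{1}{\mathit{in}_i} \right)}_{\text{computing}} = 0.
\]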
Next, let's check how the constraints are created.
impl FrameworkEval for SchedulingEval {
// --snip--
fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
let input_col = eval.next_trace_mask();
let output_col = eval.next_trace_mask();
eval.add_to_relation(RelationEntry::new(
&self.lookup_elements,
E::EF::one(),
&[input_col],
));
eval.add_to_relation(RelationEntry::new(
&self.lookup_elements,
-E::EF::one(),
&[output_col],
));
eval.finalize_logup_in_pairs();
eval
}
}
impl FrameworkEval for ComputingEval {
// --snip--
fn evaluate<E: EvalAtRow>(&self, mut eval: E) -> E {
let input_col = eval.next_trace_mask();
let intermediate_col = eval.next_trace_mask();
let output_col = eval.next_trace_mask();
eval.add_constraint(
intermediate_col.clone() - input_col.clone() * input_col.clone() * input_col.clone(),
);
eval.add_constraint(
output_col.clone()
- intermediate_col.clone() * input_col.clone() * input_col.clone()
- E::F::one(),
);
eval.add_to_relation(RelationEntry::new(
&self.lookup_elements,
-E::EF::one(),
&[input_col],
));
eval.add_to_relation(RelationEntry::new(
&self.lookup_elements,
E::EF::one(),
&[output_col],
));
eval.finalize_logup_in_pairs();
eval
}
}
As you can see, we define the LogUp constraints for each component, and we also add two constraints that make sure the computations \(x^3\) and \(x^5 + 1\) are correct.
fn main() {
// --snip--
// Verify claimed sums
assert_eq!(
scheduling_claimed_sum + computing_claimed_sum,
SecureField::zero()
);
// Unpack proof
let statement0 = proof.statement0;
let statement1 = proof.statement1;
let stark_proof = proof.stark_proof;
// Create channel and commitment scheme
let channel = &mut Blake2sChannel::default();
let commitment_scheme = &mut CommitmentSchemeVerifier::<Blake2sMerkleChannel>::new(config);
let log_sizes = statement0.log_sizes();
// Preprocessed columns.
commitment_scheme.commit(stark_proof.commitments[0], &log_sizes[0], channel);
// Commit to statement 0
statement0.mix_into(channel);
// Trace columns.
commitment_scheme.commit(stark_proof.commitments[1], &log_sizes[1], channel);
// Draw lookup element.
let lookup_elements = ComputationLookupElements::draw(channel);
// Commit to statement 1
statement1.mix_into(channel);
// Interaction columns.
commitment_scheme.commit(stark_proof.commitments[2], &log_sizes[2], channel);
// Create components
let components = Components::new(&statement0, &lookup_elements, &statement1);
verify(
&components.components(),
channel,
commitment_scheme,
stark_proof,
)
.unwrap();
Finally, we verify the components!
Additional Examples
Example | Preprocessed Columns | Trace Columns | LogUp Columns |
---|---|---|---|
Permutation argument check | | unordered list, ordered list | 1 / unordered list - 1 / ordered list |
Range check (0 <= a < 2^bits) | [0, 2^bits) rows | lookup columns, multiplicities column | 1 / lookup, multiplicity / preprocessed |
Comparator check (a > b) | [0, 2^bits) rows | a - b, multiplicities column | 1 / (a - b), multiplicity / preprocessed |
IsZero check (a == 0) | | a, a_inv | |
XOR operations | valid XOR operations | lookup columns, multiplicities column | |
Selectors | columns of 0s and 1s | | |
Checking whether current row is first row or not | single column (first row = 1, other rows = 0) | | |
Connecting multiple components (output of Component A is input of Component B) | | | 1 / output, 1 / input * (-1) |
Public Input/Output | | | 1 / input + 1 / output |
Above is a list of additional examples that you can implement as an AIR using Stwo, some of which we have already implemented in the previous sections.
Cairo AIR
The following sections cover how Cairo is expressed as an AIR and proved using Stwo. The explanation is based on this commit of the Stwo-Cairo repository.
Overview of Cairo
This is an informal overview of Cairo. For a more formal explanation, please refer to the original Cairo paper.
Let's start by understanding how Cairo works. Essentially, Cairo is a Turing-complete CPU architecture specifically designed to enable efficient proofs of execution using STARKs. In particular, Cairo uses a read-only memory model instead of the more common read-write memory model and does not use any general-purpose registers.
Non-Deterministic Read-Only Memory
A read-only memory model is one where each address in memory can have only a single value throughout the program's execution. This contrasts with the more common read-write memory model, where an address can have multiple values at different points during execution.
The memory is also non-deterministic: the prover provides the values of the memory cells as witness values, and they do not need further constraints beyond ensuring that each address has a single value throughout the program's execution.
Registers
In physical CPUs, accessing memory is expensive compared to accessing registers due to physical proximity. This is why instructions typically operate over registers rather than directly over memory cells. In Cairo, accessing memory and accessing registers incur the same cost, so Cairo instructions operate directly over memory cells. Thus, the three registers used in Cairo do not store instructions or operand values like in physical CPUs, but rather pointers to the memory cells where the instructions and operands are stored:
- `pc` is the program counter, which points to the current Cairo instruction
- `ap` is the allocation pointer, which points to the current available memory address
- `fp` is the frame pointer, which points to the current frame in the "call stack"
Cairo Instructions
Let's now see what a Cairo instruction looks like.

As the figure above from the Cairo paper shows, an instruction is 64 bits, where the first three 16-bit integers are signed offsets to the operands `dst`, `op0`, and `op1`.
The next 15 bits are flags. The `dst_reg` and `op0_reg` 1-bit flags indicate whether to use the `ap` or the `fp` register as the base for the `dst` and `op0` operands. The `op1_src` flag supports a wider range of base values for the `op1` operand: `op0`, `pc`, `fp`, and `ap`. The `res_logic` flag indicates how to compute the `res` operand: `op1`, `op0 + op1`, or `op0 * op1`. The `pc_update` and `ap_update` flags show how to update the `pc` and `ap` registers after computing the operands. The `opcode` flag indicates whether this instruction belongs to a predefined opcode (e.g., `CALL`, `RET`, `ASSERT_EQ`) and also defines how the `ap` and `fp` registers should be updated.
For a more detailed explanation of the flags, please refer to Section 4.5 of the Cairo paper.
Finally, the last bit is fixed to 0, but as we will see in the next section, this design is modified in the current version of Cairo to support opcode extensions.
Opcodes and Opcode Extensions
In Cairo, an opcode refers to what the instruction should do. Cairo defines a set of common CPU operations as specific opcodes (e.g., `ADD`, `MUL`, `JUMP`, `CALL`), and the current version of Cairo also defines a new set of opcodes used to improve the performance of heavy computations such as Blake2s hashing and QM31 addition and multiplication.
Since the 64-bit instruction structure is not flexible enough to support this extended set of opcodes, Cairo extends the instruction size to 72 bits and uses the last 9 bits as the opcode extension value.

As of this commit, the following opcode extension values are supported:
- `0`: Stone (original opcodes)
- `1`: Blake
- `2`: BlakeFinalize
- `3`: QM31Operation
Even if an instruction does not belong to any predefined set of opcodes, it is considered a valid opcode as long as it adheres to the state-transition function defined in Section 4.5 of the Cairo paper. In Stwo Cairo, this is referred to as a generic opcode.
Basic Building Blocks
This section covers the basic building blocks used to build the Cairo AIR.
Felt252 to M31
Cairo works over the prime field \(P = 2^{251} + 17 \cdot 2^{192} + 1\), while Stwo works over the prime field \(M31 = 2^{31} - 1\). Thus, in order to represent the execution of Cairo with Stwo, we need to decompose the 252-bit integers into 31-bit integers. The Cairo AIR chooses to use the 9-bit decomposition, so a single 252-bit integer will result in 28 9-bit limbs.
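As a concrete illustration, here is a minimal sketch of this decomposition, assuming the felt is given as 32 little-endian bytes (this is illustrative bit arithmetic, not the actual Stwo-Cairo code):

```rust
/// A minimal sketch (not the actual Stwo-Cairo code) of splitting a 252-bit
/// felt, given as 32 little-endian bytes, into 28 9-bit limbs.
fn felt_to_limbs(bytes: [u8; 32]) -> [u32; 28] {
    let mut limbs = [0u32; 28];
    for (i, limb) in limbs.iter_mut().enumerate() {
        let bit = 9 * i; // starting bit offset of this limb
        let (byte, shift) = (bit / 8, bit % 8);
        // Read 16 bits starting at `byte` and keep the 9 bits we need.
        let window = bytes[byte] as u32 | ((bytes[byte + 1] as u32) << 8);
        *limb = (window >> shift) & 0x1FF;
    }
    limbs
}

fn main() {
    let mut bytes = [0u8; 32];
    bytes[0] = 0xFF;
    bytes[1] = 0x03; // the felt 0x3FF = 1023
    let limbs = felt_to_limbs(bytes);
    assert_eq!(limbs[0], 511); // low 9 bits
    assert_eq!(limbs[1], 1);   // next 9 bits
}
```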
Range checks
Range-checks are used pervasively in the Cairo AIR. They ensure that witness values are within a certain range, most commonly within a certain bit length. For example, in the Felt252 to M31 section, we saw that a 252-bit integer is decomposed into 28 9-bit limbs, so we need to verify that each limb is in the range \(0 \leq \text{limb} < 2^{9}\).
This is done by using a preprocessed column that contains the entire range of possible values for the bit length. For example, for a 9-bit range check, the column will contain the values from 0 to \(2^9 - 1\). We also have another column that contains the number of times the range-check was invoked for each valid value and we use lookups to check that each range-check is valid. For a more practical example, please refer to the Static Lookups section.
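In LogUp terms (covered in detail in the Lookups section later in this book), the range check passes when the fractions contributed by the looked-up values are exactly balanced by the multiplicity-weighted fractions of the preprocessed range column:

\[ \sum_{i} \frac{1}{X - v_i} \;-\; \sum_{j=0}^{2^{9}-1} \frac{m_j}{X - j} \;=\; 0 \]

where \( v_i \) are the values being range-checked, \( m_j \) is the number of times the value \( j \) was looked up, and \( X \) is a random challenge drawn by the verifier.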
Main Components
Now that we have a basic understanding of how Cairo works and the basic building blocks that are used to build the Cairo AIR, let's take a look at the main components of the Cairo AIR.
For readers who are unfamiliar with the concepts of components and lookups, we suggest going over the Components section of the book.
Fetch, Decode, Execute
Cairo follows the common CPU architecture of fetching, decoding, and executing an instruction in a single CPU step. Below is a high-level diagram of what a single CPU step looks like in Cairo.

Since we need to prove the correctness of all CPU steps, the Cairo AIR writes the results of fetching, decoding, and executing an instruction at every CPU step into a trace and proves that the constraints over the trace are satisfied—i.e., consistent with the semantics of Cairo.
Let's keep this in mind while we go over the main components of the Cairo AIR.
1. Memory Component
The first component we need is a `Memory` component, which implements the non-deterministic read-only memory model of Cairo.
In the Cairo AIR, instead of mapping a memory address directly to a value, we first map the `address` to an `id` and then map the `id` to a `value`. This is done to classify the memory values into two groups: `Small` and `Big`, where `Small` values are 72-bit integers and `Big` values are 252-bit integers. Since many memory values do not exceed the `Small` size, this allows us to save cost on unnecessary padding.
As a result, the `Memory` component is actually two components: `MemoryAddressToId` and `MemoryIdToValue`.
The constraints for the `MemoryAddressToId` and `MemoryIdToValue` components are as follows:
- An `address` must appear once and only once in the `MemoryAddressToId` component.
- Each `(address, id, value)` tuple must be unique.
The first constraint is implemented by using a preprocessed column that contains the sequence of numbers `[0, MAX_ADDRESS)` and using this as the address values (in the actual code, the memory address starts at 1, so we need to add 1 to the sequence column).
The second constraint is guaranteed because the `address` value is always unique.
A short explainer on how the `id` value is computed: the `id` is a 31-bit integer that starts at 0 and is incremented by 1 for every unique memory access. For example, if the addresses `[5, 1523, 142]` were accessed in that order, the resulting `(address, id)` pairs will be `(5, 0)`, `(1523, 1)`, and `(142, 2)`.
Since an `id` value needs to encode whether the corresponding `value` is `Small` or `Big`, we use the MSB as a flag (0 for `Small` and 1 for `Big`). Thus, an `id` that corresponds to a `Small` value can occupy the space `[0, 2^30)`, while an `id` that corresponds to a `Big` value can occupy the space `[2^30, 2^31)`.
In reality, the size of each table will be `[0, NUM_SMALL_VALUES_ACCESSED)` and `[2^30, 2^30 + NUM_BIG_VALUES_ACCESSED)`, where `NUM_SMALL_VALUES_ACCESSED` and `NUM_BIG_VALUES_ACCESSED` are values provided by the prover. To make sure that the `id` values are created correctly, we can use the preprocessed column as the `id` values.
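The following is a hypothetical sketch of how such ids could be assigned while tracing execution; the names and structure are illustrative, not Stwo-Cairo's actual code:

```rust
use std::collections::HashMap;

const BIG_FLAG: u32 = 1 << 30; // MSB of the 31-bit id marks a `Big` value

/// Hypothetical id allocator: each distinct address gets the next sequential
/// id in either the `Small` or the `Big` id space.
struct IdAllocator {
    ids: HashMap<u32, u32>, // address -> id
    next_small: u32,        // counts up from 0
    next_big: u32,          // counts up from 2^30 (via the MSB flag)
}

impl IdAllocator {
    fn id_for(&mut self, address: u32, is_big: bool) -> u32 {
        if let Some(&id) = self.ids.get(&address) {
            return id; // repeated access reuses the same id
        }
        let id = if is_big {
            self.next_big += 1;
            BIG_FLAG | (self.next_big - 1)
        } else {
            self.next_small += 1;
            self.next_small - 1
        };
        self.ids.insert(address, id);
        id
    }
}

fn main() {
    let mut alloc = IdAllocator { ids: HashMap::new(), next_small: 0, next_big: 0 };
    assert_eq!(alloc.id_for(5, false), 0);
    assert_eq!(alloc.id_for(1523, false), 1);
    assert_eq!(alloc.id_for(142, true), BIG_FLAG); // first `Big` id: 2^30
}
```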
2. VerifyInstruction Component
The `VerifyInstruction` component is responsible for accessing the instruction from the `Memory` component and decomposing the retrieved value. As mentioned in the Felt252 to M31 section, a 252-bit integer is stored as 28 9-bit limbs, so we need to decompose the limbs and concatenate the pieces to get the values we need. For example, in order to get the 3 16-bit offset values, we need to decompose the first 6 limbs into chunks of `[9, [7, 2], [9], [5, 4], [9], [3, 6]]` bits and concatenate them as follows: `[[9, 7], [2, 9, 5], [4, 9, 3]]`. Then, the remaining 6-bit value and the next limb correspond to the 15-bit flags, and the next (8th) limb is the opcode extension value. The other 20 limbs should all be zeros. At the end, we will have decomposed the instruction value into 3 16-bit offsets, 2 chunks of flags, and a 9-bit opcode extension.
Note that the decomposition will be constrained by range checking that each integer is within its corresponding range.
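A sketch of how the pieces could be reassembled from the limbs (illustrative bit arithmetic; the AIR expresses the same relations as constraints rather than as imperative code):

```rust
/// Illustrative reassembly of the 72-bit instruction parts from the first
/// 8 of the 28 9-bit limbs.
fn decode_instruction(limbs: &[u32; 28]) -> ([u32; 3], u32, u32) {
    // Each 16-bit offset spans one full limb plus parts of neighboring limbs.
    let dst_off = limbs[0] | ((limbs[1] & 0x7F) << 9);                           // 9 + 7 bits
    let op0_off = (limbs[1] >> 7) | (limbs[2] << 2) | ((limbs[3] & 0x1F) << 11); // 2 + 9 + 5 bits
    let op1_off = (limbs[3] >> 5) | (limbs[4] << 4) | ((limbs[5] & 0x07) << 13); // 4 + 9 + 3 bits
    // The remaining 6 bits of limb 5 plus limb 6 hold the 15 flag bits.
    let flags = (limbs[5] >> 3) | (limbs[6] << 6);
    // The 8th limb is the 9-bit opcode extension.
    let opcode_extension = limbs[7];
    // The other 20 limbs must all be zero for a valid instruction.
    assert!(limbs[8..].iter().all(|&l| l == 0));
    ([dst_off, op0_off, op1_off], flags, opcode_extension)
}

fn main() {
    let mut limbs = [0u32; 28];
    limbs[0] = 0x1FF; // set the low 9 bits of dst_off
    let ([dst, _, _], _, _) = decode_instruction(&limbs);
    assert_eq!(dst, 0x1FF);
}
```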
3. Opcode Component
Since every Cairo instruction can be mapped to a specific `Opcode`, we can check that a Cairo instruction is executed by checking that the corresponding `Opcode` component was executed correctly. You can think of the `Opcode` component as the main component that uses the `VerifyInstruction` and `Memory` components.
We define a single `Opcode` component for each predefined opcode, plus a `GenericOpcode` component, which is used for all instructions that do not map to any of the predefined opcodes.
The following is a list of constraints that an `Opcode` component needs to verify:
1. The offsets and flag values are correct, using the `VerifyInstruction` component.
2. The instruction is correctly mapped to the current `Opcode` component, using the flags.
3. The operand values `op0`, `op1`, and `dst` computed with the registers and the offsets are correct, using the `Memory` component.
4. The operation for the current `Opcode` component is done correctly.
5. The state transition of the three registers (`pc`, `ap`, `fp`) is done correctly, using the flags.
Of these constraints, items 2 and 4 are self-contained—meaning they do not require any other components to be verified. We will explore how the remaining three are verified using lookups between the different components.
Bringing It All Together
The following figure shows how each of the main components are connected to each other using lookups.
If we look at the right-hand side first, we can see the main components of the Cairo AIR. The boxes in each component correspond to lookups within that component, and boxes with the same fill color correspond to lookups of the same values. Some lookups are yielded (i.e., subtracted), while others are used (i.e., added). Yielded lookups have red edges; used lookups have blue edges.
On the left-hand side, we can see lookups that are computed directly by the verifier from data provided by the prover as part of the proof.
Each component has a claimed sum value that corresponds to the sum of all the lookups in the component. These claimed sums are provided by the prover as part of the proof, and the verifier adds them together along with the lookups on the left-hand side. If the total sum is zero, the verifier is convinced that the proof is valid.

Memory Lookups
The memory lookups correspond to looking up the `(address, id)` and `(id, value)` pairs. In the `Memory` component, each of the lookups is multiplied by a witness value `mult`, which indicates the number of times each memory address was accessed. Since the memory accesses are added to the total sum and the same amount is subtracted from the total sum in the `Memory` component, the total sum for memory lookups should equal zero.
Note that the verifier also adds lookups for the Cairo program, which is necessary to ensure that the correct program is actually stored in memory and is properly executed.
Instruction Lookups
Once the `VerifyInstruction` component retrieves the instruction value using the `(pc_addr, pc_id)` and `(pc_id, pc_val)` lookups, it subtracts the lookup of the tuple `(pc, dst_off, op0_off, op1_off, flags1, flags2, opcode_extension)`, which holds the decomposed values of the instruction. This lookup also has a `mult` witness because the `VerifyInstruction` component has a single row for each unique `pc` value (i.e., the Cairo instruction stored at the `pc` address). The same `pc` value can be invoked multiple times throughout the program, and the `mult` value represents the number of times it is invoked.
Since the same tuple lookup is added to the total sum whenever an `Opcode` component uses the same instruction, the total sum for instruction lookups should equal zero.
Register Lookups
After computing over the operands, a Cairo instruction updates the register values based on the flags. In an `Opcode` component, this update logic is verified using the columns `pc`, `ap`, `fp`, `new_pc`, `new_ap`, `new_fp`, and by constraining the values with the flags.
In addition to checking that each state transition is done correctly, we also need to make sure that the initial register values (i.e., before running the program) and the final register values (i.e., after running the program) satisfy the Cairo semantics. For example, the final `pc` must point to an instruction that runs the `JUMPREL 0` opcode, which is an infinite loop (you can check the rest of the semantics enforced here).
Once we have verified that the initial and final register values are correct and each state transition is done correctly, we do a final check that the register values are all connected: the register values used by the second instruction are the same as the new register values of the first instruction, and so on. We check this by adding the lookup of the `(pc, ap, fp)` tuple and subtracting the lookup of the `(new_pc, new_ap, new_fp)` tuple for each `Opcode` row. When we add up all the lookups, the total sum for register lookups should equal `(init_pc, init_ap, init_fp) - (final_pc, final_ap, final_fp)`. Thus, the verifier can compute the lookups of the initial and final register values, subtract the first, add the second, and check that the total sum is zero.
ADD Opcode Walkthrough
To better understand how `Opcode` components work, let's walk through the implementation of the `AddOpcode` component.

Above is the list of all the columns used in the `AddOpcode` component. Note that the `dst_val`, `op0_val`, and `op1_val` columns are actually 28 columns each (to support 28 9-bit limbs), but we show them as single columns for brevity.
To reiterate from the Main Components section, an `Opcode` component verifies the following:
1. The offsets and flag values are correct, using the `VerifyInstruction` component.
2. The instruction is correctly mapped to the current `Opcode` component, using the flags.
3. The operand values `op0`, `op1`, and `dst` computed with the registers and the offsets are correct, using the `Memory` component.
4. The operation for the current `Opcode` component is done correctly.
5. The state transition of the three registers (`pc`, `ap`, `fp`) is done correctly, using the flags.
An instruction should contain 15 bits of flags, but only 5 bits are represented in the `AddOpcode` columns. Can you see why?
Hint: Check out how a Cairo instruction is pattern-matched to an `ADD` opcode here.
Hint: Check out how the decomposition of the flags is verified here.
Items 1 and 3 should be familiar, as we already covered them in the Main Components section. For item 2, you can check the specs for the `ADD` opcode here. For item 5, the specs for a valid state transition can be found in Section 4.5 of the Cairo paper.
In this section, we will focus on how item 4 is implemented.
Adding two 252-bit integers
Assuming that the operands `op0`, `op1`, and `dst` are correctly accessed from the `Memory` table, we now check that the addition of two 252-bit integers is done correctly, i.e., `op0 + op1 = dst`. As noted in the Felt252 to M31 section, a 252-bit integer is stored as 28 9-bit limbs, so we need to check addition for each of the 28 limbs.
We will incrementally build up to the final constraint.
Limb-wise Addition and Carry
To verify that the two sets of limbs are correctly added, we check limb-wise addition. Since each limb can create a carry, we also add the carry from the previous limb (except for the first limb). Thus, the constraints look like this:
carry_limb_1 = (op0[0] + op1[0] - dst[0]) / 2^9
carry_limb_1 * (carry_limb_1 - 1) = 0
carry_limb_2 = (op0[1] + op1[1] + carry_limb_1 - dst[1]) / 2^9
carry_limb_2 * (carry_limb_2 - 1) = 0
...
op0[27] + op1[27] + carry_limb_27 - dst[27] = 0
We divide `op0[0] + op1[0] - dst[0]` by `2^9` since this quantity is either `2^9` (if a carry exists) or `0` (if no carry exists). Dividing by `2^9` yields `1` or `0`, respectively. To check that the carry is either `0` or `1`, we add the constraint `carry_limb_1 * (carry_limb_1 - 1) = 0`. For the final limb, we simply check that the addition is correct.
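As a quick sanity check with concrete numbers (an illustrative example, not taken from the implementation): if `op0[0] = 300` and `op1[0] = 400`, the true sum `700` overflows 9 bits, so `dst[0] = 700 - 2^9 = 188` and `carry_limb_1 = (300 + 400 - 188) / 2^9 = 1`, which indeed satisfies `carry_limb_1 * (carry_limb_1 - 1) = 0`.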
Handling overflow beyond the 252-bit prime field
We also need to account for addition overflowing the 252-bit prime field `P` (i.e., `op0 + op1 = dst + P`). To check this, we introduce a witness variable `sub_p_bit`, a 1-bit value set to 1 if there is an overflow. Note that since `P = 2^251 + 17 * 2^192 + 1`, we only subtract in the three limbs where `P` has a non-zero limb: limbs `0`, `22`, and `27`.
Now let's revisit the constraints:
sub_p_bit * (sub_p_bit - 1) = 0
carry_limb_1 = (op0[0] + op1[0] - dst[0] - sub_p_bit) / 2^9
carry_limb_1 * (carry_limb_1 - 1) = 0
...
carry_limb_23 = (op0[22] + op1[22] + carry_limb_22 - dst[22] - sub_p_bit * 136) / 2^9
carry_limb_23 * (carry_limb_23 - 1) = 0
...
op0[27] + op1[27] + carry_limb_27 - dst[27] - sub_p_bit * 256 = 0
First, we verify that `sub_p_bit` is a bit. Then, we subtract `sub_p_bit` from limb 0, `sub_p_bit * 136` from limb 22, and `sub_p_bit * 256` from limb 27. (Note that `136` and `256` are the values of `P` in limbs 22 and 27, respectively.)
A caveat of this approach is that subtracting `sub_p_bit` can introduce an underflow, i.e., `(op0[0] + op1[0] - dst[0] - sub_p_bit) / 2^9 = -1`. This means that `carry_limb_1` can be `-1` as well as `0` or `1`. Thus, we update the constraint for all carries to the following:
...
carry_limb_1 * (carry_limb_1 - 1) * (carry_limb_1 + 1) = 0
...
Optimization
To optimize the number of constraints, we can combine all the constraints for each limb into a single constraint. Naively checking that `carry_limb_1 = (op0[0] + op1[0] - dst[0] - sub_p_bit) / 2^9` would require a dedicated column for `carry_limb_1`. Instead, we keep `carry_limb_1` as an intermediate value and inline the equation when computing the next carry. For example, the second-limb carry is computed as follows:
carry_limb_2 = (op0[1] + op1[1] + ((op0[0] + op1[0] - dst[0] - sub_p_bit) / 2^9) - dst[1]) / 2^9
This way, we avoid allocating extra columns for the carries and proceed until the last limb, where we check that the giant equation is correct.
This is possible because the computation does not involve any multiplication of witness values, so the constraint degree does not blow up.
Final constraint
One last note: in the implementation, we replace division by `2^9` with multiplication by `2^22`, which is equivalent in the M31 field since `2^9` and `2^22` are multiplicative inverses, i.e., `2^9 * 2^22 = 1 mod (2^31 - 1)`.
So the final constraint looks like this:
sub_p_bit * (sub_p_bit - 1) = 0
carry_limb_1 = (op0[0] + op1[0] - dst[0] - sub_p_bit) * 2^22 // intermediate representation
carry_limb_1 * (carry_limb_1 - 1) * (carry_limb_1 + 1) = 0
...
carry_limb_27 = (op0[26] + op1[26] + carry_limb_26 - dst[26]) * 2^22 // intermediate representation
carry_limb_27 * (carry_limb_27 - 1) * (carry_limb_27 + 1) = 0
op0[27] + op1[27] + carry_limb_27 - dst[27] - sub_p_bit * 256 = 0
Enabler column
Finally, we introduce the last column in the `AddOpcode` component: the `enabler` column. As the name suggests, this column enables or disables the constraints for the current row. It is used to support the dummy rows added to the end of the table to make the number of rows a power of two. In other words, it is set to `1` for all valid `ADD` opcode calls and `0` for all dummy rows.
How Does It Work?
This section is for those who want an in-depth explanation of various components of Stwo.
Mersenne Primes
Proof systems typically rely on finite field operations, where efficient field arithmetic is crucial for optimizing proof generation. In STARK protocols, there is no direct dependency between the security level of the proof system and the field size. This allows the use of small fields with highly efficient arithmetic, such as Mersenne prime fields.
A Mersenne prime is defined as a prime number that is one less than a power of two, expressed as \( p = 2^k -1 \).
Consider the Mersenne prime field \( \mathbb{F}_p \) where \( p = 2^{31} - 1 \). Our objective is to perform field multiplication \( a \cdot b \), where \( a, b \in \mathbb{F}_p \). This operation involves a 31-bit integer multiplication, producing a 62-bit intermediate result, which is then reduced modulo \( p \).
Let \( x = a \cdot b \), where \( a, b \) are 31-bit values, resulting in a 62-bit product \( x \). We can decompose \( x \) into two 31-bit values \( h \) (the high bits) and \( s \) (the low bits), such that \( x = 2^{31} \cdot h + s \), as shown in the following figure.
To perform modular reduction, we start with: \[ x \equiv (2^{31} \cdot h + s) \mod (2^{31} - 1) \] Substituting \( 2^{31} \equiv 1 \mod (2^{31} - 1) \) gives: \[ x \equiv (h + s) \mod (2^{31} - 1) \]
Since \( h \) and \( s \) are both 31-bit values, they can be directly represented as field elements. Consequently, modular reduction is performed with a single field addition. This makes arithmetic over Mersenne primes exceptionally fast, making them an ideal choice for our STARK protocol.
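A minimal sketch of this reduction in Rust (illustrative only; Stwo's actual implementation is more heavily optimized):

```rust
const P: u64 = (1u64 << 31) - 1; // the Mersenne prime 2^31 - 1

/// Illustrative M31 multiplication: one 62-bit product, then reduction
/// using 2^31 ≡ 1 (mod P), i.e. x = 2^31 * h + s ≡ h + s.
fn m31_mul(a: u64, b: u64) -> u64 {
    let x = a * b;       // up to 62 bits
    let h = x >> 31;     // high 31 bits
    let s = x & P;       // low 31 bits
    let sum = h + s;     // at most 32 bits, so fold once more
    let folded = (sum >> 31) + (sum & P);
    if folded == P { 0 } else { folded }
}

fn main() {
    // (P - 1) is -1 in the field, and (-1) * (-1) = 1.
    assert_eq!(m31_mul(P - 1, P - 1), 1);
}
```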
However, we instantiate STARK protocols over an FFT-friendly field, meaning a field that contains a multiplicative subgroup of order that is a large power of two (commonly referred to as a smooth subgroup).
\[ |\mathbb{F}_p^*| = p-1 = 2^k-2\]
As shown above, Mersenne prime fields lack a smooth subgroup whose order is a large power of two: since \( p - 1 = 2^k - 2 = 2 \cdot (2^{k-1} - 1) \) and \( 2^{k-1} - 1 \) is odd, the largest power of two dividing \( p - 1 \) is just \( 2 \). In other words, there does not exist a sufficiently large \( n \) such that \( 2^n \mid p - 1 \).
Extensions of Mersenne Prime Field
To make Mersenne prime fields compatible with STARKs, we use a degree-2 extension of \( \mathbb{F}_p \), defined as follows:
\[ \mathbb{F}_{p^2} = \mathbb{F}_p[X]/(X^2 + 1) \]
This extension forms a field of size \( p^2 \), whose elements can be represented as \( (a, b) \) or \[ a + i \cdot b \] where \( a, b \in \mathbb{F}_p \) and \( i \) is a root of the polynomial \( X^2 + 1 \), i.e., \( i^2 + 1 = 0 \).
The order of the multiplicative group of this extended field is calculated as follows:
\[ |\mathbb{F}_{p^2}^*| = p^2 - 1 = (p-1) \cdot (p+1)\]
For Mersenne primes of the form \( p = 2^k - 1 \), this becomes:
\[ |\mathbb{F}_{p^2}^*| = (2^k-2) \cdot (2^k)\]
As shown above, \( 2^k \, | \, |\mathbb{F}_{p^2}^*| \) i.e. \( \mathbb{F}_{p^2}^* \) contains a subgroup of size that is a large power of two. This makes it suitable for instantiating STARKs. This subgroup is what we refer to as the Circle group (explored further in the next section).
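Arithmetic in this extension is exactly complex-number arithmetic with coefficients in \( \mathbb{F}_p \). A sketch of multiplication, assuming a simple `%`-based reduction for brevity (not the optimized reduction shown earlier):

```rust
const P: u64 = (1u64 << 31) - 1;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Multiplication in F_{p^2} = F_p[i]/(i^2 + 1):
/// (a + ib)(c + id) = (ac - bd) + i(ad + bc).
fn cm31_mul((a, b): (u64, u64), (c, d): (u64, u64)) -> (u64, u64) {
    (sub(mul(a, c), mul(b, d)), add(mul(a, d), mul(b, c)))
}

fn main() {
    // i * i should equal -1, i.e. (P - 1, 0).
    assert_eq!(cm31_mul((0, 1), (0, 1)), (P - 1, 0));
}
```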
Secure Field
For the soundness of the protocol, it is crucial that the verifier samples random challenges from a sufficiently large field to ensure that an adversary cannot guess or brute-force the challenges and generate a proof that passes verification without knowledge of the witness.
If we use \( p = 2^{31} - 1 \), then 31-bit random challenges are not sufficient to maintain the security of the protocol. To address this, the verifier draws random challenges from a degree-4 extension of \( \mathbb{F}_{p} \), which is equivalent to a degree-2 extension of \( \mathbb{F}_{p^2} \), denoted as \[ \mathbb{F}_{p^4} = \mathbb{F}_{p^2}[X]/(X^2 - 2 - i) \]
The elements of \( \mathbb{F}_{p^4} \) can be represented as \( (r, s) \) or \[ r + u \cdot s \] where \( r, s \in \mathbb{F}_{p^2} \) and \( u \) is the root of the polynomial \( X^2 - 2 - i \) i.e. \( u^2 - 2 - i = 0\).
Alternatively, the elements of \( \mathbb{F}_{p^4} \) can also be represented as four elements of \( \mathbb{F}_{p} \) i.e. \( ((a, b), (c, d)) \) or \[ (a + i \cdot b) + (c + i \cdot d) \cdot u \]
where \( a, b, c, d \in \mathbb{F}_p \). With four elements from \( \mathbb{F}_{p} \), the challenge space consists of 124-bit values, offering a sufficiently large \( 2^{124} \) possibilities to sample a random challenge.
Circle Group
As discussed in the previous section, Mersenne prime fields \( \mathbb{F}_p \) lack a smooth subgroup whose order is a large power of two. This property makes such fields unsuitable for instantiating STARK protocols. To address this, we consider extensions of \( \mathbb{F}_p \) that have smooth subgroups, which are suitable for performing FFTs and implementing the FRI protocol.
For a field extension \( F \) of \( \mathbb{F}_p \), we define the circle curve \( C(F) \) as the set of points \( (x, y) \in F^2 \) satisfying the relation: \[ x^2 + y^2 = 1 \]
In the Stwo implementation, a point on the circle is defined as follows:
pub struct CirclePoint<F> {
pub x: F,
pub y: F,
}
The set \( C(F) \) forms a cyclic group under the operation defined by: \[ (x,y) + (x', y') = (xx' - yy', xy' + x'y) \]
Here, the group is defined additively, which differs from the multiplicative notation used in the Circle STARKs paper. In this documentation, we adopt the additive notation for consistency with the implementation. The above group operation is implemented as:
fn add(self, rhs: Self) -> Self::Output {
let x = self.x.clone() * rhs.x.clone() - self.y.clone() * rhs.y.clone();
let y = self.x * rhs.y + self.y * rhs.x;
Self { x, y }
}
The identity element in this group is \( (1, 0) \), implemented as:
pub fn zero() -> Self {
Self {
x: F::one(),
y: F::zero(),
}
}
Negation in the circle group corresponds to the conjugation map \( J \), defined by:
\[ J(x, y) = (x, -y) \]
This is analogous to conjugation of complex numbers. In Stwo, the conjugate of a `CirclePoint` is computed as:
pub fn conjugate(&self) -> CirclePoint<F> {
Self {
x: self.x.clone(),
y: -self.y.clone(),
}
}
The circle group \( C(\mathbb{F}_p) \) has exactly \( p + 1 \) points, which, as discussed earlier, is a large power of two and can thus be used in STARK protocol instantiations. This result is proven in Lemma 1 of the Circle STARKs paper.
In the Stwo implementation, the generator \( g \) of the group \( C(\mathbb{F}_p) \) is defined as: \[ g = (2, 1268011823) \] where \( p = 2^{31} - 1 \). Subgroups of \( C(\mathbb{F}_p) \) of size \( 2^n \) can be generated using the following function:
pub fn subgroup_gen(n: u32) -> CirclePoint<F> {
assert!(n <= M31_CIRCLE_LOG_ORDER); // M31_CIRCLE_LOG_ORDER = 31
let s = 1 << (M31_CIRCLE_LOG_ORDER - n);
M31_CIRCLE_GEN.mul(s) // M31_CIRCLE_GEN = g = (2, 1268011823)
}
To generate a subgroup \( \langle g_n \rangle \) of size \( 2^n \), the function computes \( 2^{31 - n} \cdot g \), i.e., it adds the generator \( g \) to itself \( 2^{31 - n} \) times, implemented efficiently with the double-and-add method shown below:
pub fn mul(&self, mut scalar: u128) -> CirclePoint<F> {
let mut res = Self::zero();
let mut cur = self.clone();
while scalar > 0 {
if scalar & 1 == 1 {
res = res + cur.clone();
}
cur = cur.double();
scalar >>= 1;
}
res
}
Hence, the point \( 2^{31-n} \cdot g \) serves as a generator of a subgroup \( \langle g_n \rangle \) of order \( 2^n \).
Circle Domain
In a STARK protocol, the computation trace is interpolated as a low-degree polynomial over a domain using FFT. For Circle STARKs, this domain consists of points on the circle curve and is referred to as the circle domain. The circle domain \( D \) is constructed as the union of two disjoint cosets: \[ D = q + \langle g_{n-1} \rangle \cup -q + \langle g_{n-1} \rangle \] Here, \( \langle g_{n-1} \rangle \) is a subgroup of size \( 2^{n-1} \), and \( q \) is the coset offset. This union is also called the twin-coset. The second coset in the union can be viewed as the negation (or conjugation) of the first: \[ J(q + \langle g_{n-1} \rangle) = -q + \langle g_{n-1} \rangle \] Therefore, it suffices to store only the half coset \( q + \langle g_{n-1} \rangle \), and generate the full domain via its conjugates. The circle domain is defined in Stwo as:
pub struct CircleDomain {
pub half_coset: Coset,
}
The following animation shows a circle domain of size 8. It is constructed from the half coset \( q + \langle g_2 \rangle \) of size 4 (shown as red points) and its negation \( -q + \langle g_2 \rangle \) (shown as blue points).
To iterate over all points in the circle domain, we can iterate over the half coset and its conjugates:
pub fn iter(&self) -> CircleDomainIterator {
self.half_coset
.iter()
.chain(self.half_coset.conjugate().iter())
}
Canonic Coset
For a specific choice of offset \( q \), the twin-coset \( D \) becomes a coset of a larger subgroup. In particular, if \( q \) is a generator of a subgroup of order \( 2^{n+1} \), then: \[ D = q + \langle g_n \rangle = q + \langle g_{n-1} \rangle \cup -q + \langle g_{n-1} \rangle \] This result is proven in Proposition 1 of the Circle STARKs paper. Such domains are called standard position cosets, or canonic cosets. They are implemented as follows:
pub struct CanonicCoset {
pub coset: Coset,
}
Here, `CanonicCoset` represents the full coset \( q + \langle g_n \rangle \), while `CircleDomain` is represented by its half coset \( q + \langle g_{n-1} \rangle \). Thus, to compute the `CircleDomain` from the `CanonicCoset`, we first calculate the half coset \( q + \langle g_{n-1} \rangle \), which is then used to initialize the `CircleDomain`, as shown below:
pub fn circle_domain(&self) -> CircleDomain {
CircleDomain::new(self.half_coset())
}
The following animation shows a canonic coset of size 8. It is constructed from the coset \( \langle g_3 \rangle \) of size 8 followed by an offset by \( q \), where \( q \) is the generator of subgroup \( \langle g_4 \rangle \).
We can verify whether a given `CircleDomain` is canonic by checking the step size of the half coset against the initial coset offset. In the `CircleDomain` implementation, only the half coset \( q + \langle g_{n-1} \rangle \) is explicitly stored. If the `CircleDomain` is canonic, \( q \) must be a generator of the subgroup \( \langle g_{n+1} \rangle \), which has order \( 2^{n+1} \), i.e., \( q = 2^{31 - (n+1)} \cdot g \). Recall that the generator of the subgroup \( \langle g_{n-1} \rangle \) is \( 2^{31 - (n-1)} \cdot g \).
Thus, the step size between consecutive elements in the half coset is \( 2^{31 - (n-1)} \cdot g \), and the initial point is \( q = 2^{31 - (n+1)} \cdot g \). Therefore, the ratio between the step size and the initial coset offset is:
\[
\frac{2^{31 - (n-1)}}{2^{31 - (n+1)}} = 2^2 = 4
\]
This means that in a canonic coset, the step size is exactly four times the initial coset offset. This condition is used to check whether a `CircleDomain` is canonic, as shown below:
pub fn is_canonic(&self) -> bool {
self.half_coset.initial_index * 4 == self.half_coset.step_size
}
In the next section, we will dive into polynomials defined over the circle.
Lookups
Lookups are simply a way to connect one part of the table to another. When we "look up" a value, we are doing nothing more than creating a constraint that allows us to use that value in another part of the table without breaking soundness.
Design
We will walk through four steps to incrementally build up the design of lookups.
Step 1: Suppose we want to have two columns with the same values.
We can do this by creating two columns with the exact same values and adding a constraint over them: `col_1 - col_2 = 0`.
Step 2: We want to check that the two columns have the same values but in a different order.
We can use the idea that two sets of values will have the same cumulative product if they are indeed permutations of each other. So we add new columns, `col_1_cumprod` for `col_1` and `col_2_cumprod` for `col_2`, which contain the running cumulative product of `col_1` and `col_2`, respectively. The new constraints will check that each of these new columns does indeed contain the cumulative product values and that their last values are the same. We can optimize this by creating just one new column that keeps a running cumulative product of the fraction `col_1 / col_2`.
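A sketch of this single-column optimization over M31 (the helper names are illustrative, and the random verifier challenge that must be mixed into the values for soundness is omitted for brevity):

```rust
const P: u64 = (1u64 << 31) - 1;

fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Modular exponentiation by squaring.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse via Fermat's little theorem: a^(P-2) = a^(-1) mod P.
fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Running cumulative product of col_1 / col_2. If the last entry is 1,
/// the columns are permutations of each other (in a real protocol the
/// values are first shifted by a random verifier challenge).
fn running_product(col_1: &[u64], col_2: &[u64]) -> Vec<u64> {
    let mut acc = 1;
    col_1
        .iter()
        .zip(col_2)
        .map(|(&a, &b)| {
            acc = mul(acc, mul(a, inv(b)));
            acc
        })
        .collect()
}

fn main() {
    let cumprod = running_product(&[1, 2, 3], &[3, 1, 2]);
    assert_eq!(*cumprod.last().unwrap(), 1);
}
```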
Step 3: We want to check that all values in `col_2` are in `col_1`, but each value appears an arbitrary number of times.
(Note that this is a generalization of the second step: in the second step, all values in `col_2` appear exactly once in `col_1`.)
Supporting this third step is actually pretty simple: when creating the running cumulative product, we need to raise each value in `col_1` to its multiplicity, i.e., the number of times it appears in `col_2`. The rest of the constraints do not need to be changed.
Step 4: We want to check that all values in `[col_2, col_3, ...]` are in `col_1`, with arbitrary multiplicities.
Finally, we want many more columns to contain values from `col_1`. To support this, we can use the same idea as in the third step: when creating the running cumulative product, we raise each value in `col_1` to the power of the number of times it appears in `[col_2, col_3, ...]`.
In summary, lookups support the following use-cases:
- Prove equality: we want to prove that the values of the first column are equal to the values of the second column.
- Prove permutation: we want to prove that the values of the first column are a permutation of the values of the second column.
- Prove permutation with multiplicities: we want to prove that each value of the first column appears a certain number of times over multiple columns.
Technique: LogUp
LogUp is a technique used to constrain lookups. It's a successor to Plookup, and is especially useful for proving permutation with multiplicities. Here, we'll briefly explain why this is the case.
Plookup and its variants use a technique called the Grand Product Check to prove permutation.
$$ \prod_{i=0}^{n-1} (X - a_i) = \prod_{i=0}^{n-1} (X - b_i) $$
In the equation above, we can check that the set \( \{a_0, \ldots, a_{n-1}\} \) is a permutation of the set \( \{b_0, \ldots, b_{n-1}\} \) by setting \( X \) to a random value provided by the verifier.
However, this becomes inefficient when we have multiplicities since we need to encode the multiplicities as powers of each lookup polynomial, and thus the degree of the polynomial increases linearly with the number of multiplicities.
$$ \prod_{i=0}^{n-1} (X - a_i) = \prod_{i=0}^{n-1} (X - b_i)^{m_i} $$
On the other hand, LogUp uses the derivative of the Grand Product Check:
$$ \sum_{i=0}^{n-1} \frac{1}{X - a_i} = \sum_{i=0}^{n-1} \frac{m_i}{X - b_i} $$
In this approach, each lookup polynomial is represented as a rational function with the multiplicity as the numerator. This transformation is significant because the degree of the polynomial remains constant regardless of the number of multiplicities, making LogUp more efficient for handling multiple lookups of the same value.
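The "derivative" here is the logarithmic derivative: taking \( \frac{d}{dX} \log \) of both sides of the Grand Product Check with multiplicities turns products into sums and exponents \( m_i \) into numerators:

$$ \frac{d}{dX} \log \prod_{i=0}^{n-1} (X - b_i)^{m_i} = \sum_{i=0}^{n-1} \frac{m_i}{X - b_i} $$

which is exactly the right-hand side of the LogUp identity above.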
Implementation
The following figures show an implementation of lookups in Stwo that looks up values from a preprocessed trace and constrains them using the LogUp technique.



Awesome Stwo
We refer you to the Awesome Stwo repo for additional resources on Stwo and a list of awesome projects building on Stwo.
Benchmarks Report
- Overview of zkVM and Proof Systems
zkVM / Proof System | Architecture | Frontend | Backend | Security Bits |
---|---|---|---|---|
RISC Zero | RISC-V | Rust | STARK-based | 96 bits |
SP1 | RISC-V | Rust | STARK-based | 100 bits |
OpenVM | RISC-V | Rust | STARK-based | 100 bits |
Jolt | RISC-V | Rust | Lookup-based | - |
Stone | Cairo VM | Cairo | STARK-based | 100 bits |
Stwo | Cairo VM | Cairo | STARK-based | 96 bits |
- All benchmarks presented here are CPU-only and do not utilize any GPU acceleration.
- Stone benchmarks were generated using the `dynamic` layout with these configurations.
- The benchmarks for SP1, R0, and OpenVM use the compressed or succinct prover type, which aggregates all the STARK proofs into a single STARK proof.
- Benchmarks that ran out of memory are indicated by 💾, and benchmarks that errored during proof generation are indicated by ❌ in the tables.
Time and Commit Hash
- Commit Hash: f27856ec17fbd9e85ec31cba9c2cb9e96d8dd08f
- Timestamp: Thursday, July 03, 2025 20:52:50 UTC
System Information
OS Version
Ubuntu 24.04.2 LTS
CPU Info
- Architecture: x86_64
- CPU(s): 48
- Model name: AMD EPYC-Rome Processor
- Thread(s) per core: 2
- Core(s) per socket: 24
- Socket(s): 1
- L3 cache: 16 MiB (1 instance)
Memory Info
- MemTotal: 184.25 GB
- MemFree: 166.19 GB
- MemAvailable: 178.69 GB
Fibonacci
Benchmark: `n` Fibonacci iterations.
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
32768 | 3.208 | 15.003 | 34.291 | 17.128 | 56.325 | 13.198 |
65536 | 5.183 | 18.252 | 42.403 | 29.241 | 95.468 | 11.372 |
131072 | 9.17 | 24.982 | 57.978 | 52.298 | 💾 | 12.886 |
262144 | 15.771 | 39.327 | 92.229 | 86.743 | 💾 | 12.587 |
524288 | 28.16 | 54.812 | 158.128 | 167.607 | 💾 | 16.166 |
1048576 | 54.42 | 72.028 | 320.082 | 306.386 | 💾 | 18.612 |
2097152 | 107.89 | 128.215 | 515.567 | 605.92 | 💾 | 31.047 |
4194304 | ❌ | 222.332 | 1042.58 | 1206.76 | 💾 | 60.939 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
32768 | 53.0 | 108 | 142 | 23 | 92.0 | 41 |
65536 | 77.0 | 107 | 139 | 10 | 113.0 | 16 |
131072 | 58.0 | 101 | 140 | 23 | 💾 | 18 |
262144 | 103.0 | 100 | 141 | 22 | 💾 | 89 |
524288 | 56.0 | 83 | 140 | 24 | 💾 | 14 |
1048576 | 104.0 | 83 | 57 | 23 | 💾 | 414 |
2097152 | 98.0 | 84 | 57 | 10 | 💾 | 367 |
4194304 | ❌ | 83 | 57 | 23 | 💾 | 10 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
32768 | 187.615 | 1315.53 | 1708.31 | 223.234 | 125.608 | 789.578 |
65536 | 197.799 | 1315.53 | 1708.31 | 223.234 | 129.8 | 783.63 |
131072 | 208.399 | 1315.53 | 1708.31 | 223.234 | 💾 | 783.834 |
262144 | 219.415 | 1315.53 | 1708.31 | 223.234 | 💾 | 802.678 |
524288 | 230.847 | 1315.53 | 1708.31 | 223.234 | 💾 | 801.194 |
1048576 | 242.695 | 1315.53 | 852.937 | 223.234 | 💾 | 808.038 |
2097152 | 254.959 | 1315.53 | 852.937 | 223.234 | 💾 | 819.122 |
4194304 | ❌ | 1315.53 | 816.649 | 223.234 | 💾 | 866.882 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo |
---|---|---|---|---|---|
32768 | 196974 | 168808 | 166215 | 229390 | 262143 |
65536 | 393582 | 332648 | 330055 | 458766 | 524287 |
131072 | 786798 | 660328 | 657735 | 💾 | 1048575 |
262144 | 1573230 | 1315688 | 1313095 | 💾 | 2097151 |
524288 | 3146041 | 2626408 | 2623815 | 💾 | 4194303 |
1048576 | 6291822 | 5247848 | 5245255 | 💾 | 8388607 |
2097152 | 12583293 | 10490728 | 10488135 | 💾 | 16777215 |
4194304 | ❌ | 20976488 | 20973895 | 💾 | 33554431 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
32768 | 6.44 | 5.12 | 7 | 2.3 | 59.44 | 11.08 |
65536 | 5.83 | 6.42 | 5.43 | 4.58 | 118.54 | 11.46 |
131072 | 8.79 | 9.18 | 5.43 | 9.15 | 💾 | 12.2 |
262144 | 14.73 | 14.3 | 6.57 | 9.17 | 💾 | 14.01 |
524288 | 26.95 | 15.12 | 12.22 | 9.17 | 💾 | 17.57 |
1048576 | 50.85 | 25.94 | 12.22 | 9.17 | 💾 | 24.57 |
2097152 | 98.93 | 35.61 | 12.42 | 9.18 | 💾 | 38.62 |
4194304 | ❌ | 51.25 | 12.43 | 9.18 | 💾 | 72.26 |
Sha2
Benchmark: Sha256 hash of `n` bytes. For Stone, the Cairo implementation of sha256 by cartridge was used; for the other zkVMs, the sha2 Rust crate was used.
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
256 | 1.657 | 12.71 | 28.943 | 11.196 | 10.835 | 17.978 | 34.019 | 8.316 | 44.301 |
512 | 1.711 | 13.184 | 29.561 | 11.306 | 19.88 | 15.051 | 33.991 | 11.33 | 43.728 |
1024 | 2.173 | 13.863 | 30.609 | 17.136 | 31.021 | 16.833 | 33.83 | 11.201 | 44.897 |
2048 | 3.324 | 15.021 | 34.184 | 28.934 | 31.294 | 17.726 | 34.531 | 17.174 | 44.683 |
4096 | ❌ | 16.88 | 39.38 | 52.487 | 66.539 | 14.084 | 34.418 | 28.982 | 47.701 |
8192 | ❌ | 24.037 | 50.539 | 86.688 | 116.467 | 14.979 | 35.564 | 52.48 | 48.995 |
16384 | ❌ | 37.958 | 78.057 | 167.856 | 💾 | 21.402 | 38.267 | 110.887 | 52.865 |
32768 | ❌ | 52.566 | 129.842 | 300.572 | 💾 | 16.555 | 46.209 | 184.873 | 60.148 |
65536 | ❌ | 72.136 | 249.403 | 589.84 | 💾 | 18.384 | 60.138 | 357.716 | 79.957 |
131072 | ❌ | 136.592 | 532.796 | 1172.21 | 💾 | 27.871 | 87.29 | 709.726 | 124.996 |
262144 | ❌ | 257.02 | 916.603 | 2338.03 | 💾 | 27.362 | 121.022 | 1415.34 | 219.654 |
524288 | ❌ | 513.726 | 1790.29 | 4673.32 | 💾 | 43.242 | 233.959 | 2825.46 | 393.678 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
256 | 50.0 | 108 | 141 | 10 | 428.0 | 10 | 83 | 22 | 142 |
512 | 49.0 | 113 | 144 | 20 | 418.0 | 11 | 106 | 23 | 141 |
1024 | 49.0 | 102 | 140 | 23 | 426.0 | 11 | 90 | 15 | 141 |
2048 | 75.0 | 97 | 140 | 22 | 440.0 | 11 | 109 | 22 | 142 |
4096 | ❌ | 106 | 139 | 22 | 454.0 | 11 | 99 | 23 | 141 |
8192 | ❌ | 109 | 141 | 22 | 525.0 | 11 | 87 | 22 | 142 |
16384 | ❌ | 82 | 144 | 22 | 💾 | 42 | 92 | 23 | 142 |
32768 | ❌ | 94 | 140 | 23 | 💾 | 49 | 102 | 15 | 142 |
65536 | ❌ | 83 | 141 | 23 | 💾 | 143 | 95 | 23 | 141 |
131072 | ❌ | 83 | 57 | 24 | 💾 | 441 | 83 | 24 | 141 |
262144 | ❌ | 83 | 56 | 24 | 💾 | 236 | 83 | 23 | 142 |
524288 | ❌ | 83 | 56 | 23 | 💾 | 12 | 82 | 24 | 141 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
256 | 172.143 | 1315.56 | 1708.31 | 223.482 | 104.104 | 986.394 | 1315.56 | 223.482 | 1784.35 |
512 | 172.143 | 1315.56 | 1708.31 | 223.482 | 114.152 | 1000.61 | 1315.56 | 223.482 | 1784.35 |
1024 | 181.495 | 1315.56 | 1708.31 | 223.482 | 117.224 | 1011.16 | 1315.56 | 223.482 | 1784.35 |
2048 | 191.263 | 1315.56 | 1708.31 | 223.482 | 120.04 | 1011.71 | 1315.56 | 223.482 | 1784.35 |
4096 | ❌ | 1315.56 | 1708.31 | 223.482 | 126.632 | 1007.77 | 1315.56 | 223.482 | 1784.35 |
8192 | ❌ | 1315.56 | 1708.31 | 223.482 | 130.824 | 1013.42 | 1315.56 | 223.482 | 1784.35 |
16384 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1005.31 | 1315.56 | 223.482 | 1784.35 |
32768 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1018.49 | 1315.56 | 223.482 | 1784.35 |
65536 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1017.72 | 1315.56 | 223.482 | 1784.35 |
131072 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 1037.15 | 1315.56 | 223.482 | 1784.35 |
262144 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 1045.93 | 1315.56 | 223.482 | 1784.35 |
524288 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 1092.14 | 1315.56 | 223.482 | 1784.35 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo | sp1-precompile | r0-precompile |
---|---|---|---|---|---|---|---|
256 | 35325.0 | 33911 | 51280 | 14791.0 | 131071 | 11492 | 31643 |
512 | 60893.0 | 52743 | 90848 | 29233.0 | 131071 | 15820 | 55467 |
1024 | 112029 | 90407 | 169984 | 46295.0 | 131071 | 24476 | 103133 |
2048 | 214301 | 165735 | 328256 | 80419.0 | 131071 | 41788 | 198463 |
4096 | ❌ | 316391 | 644800 | 160489 | 262143 | 76412 | 389106 |
8192 | ❌ | 617703 | 1277888 | 308807 | 524287 | 145660 | 770392 |
16384 | ❌ | 1220327 | 2544064 | 💾 | 1048575 | 284156 | 1532981 |
32768 | ❌ | 2425575 | 5076416 | 💾 | 2097151 | 561148 | 3058159 |
65536 | ❌ | 4836071 | 10141120 | 💾 | 4194303 | 1115132 | 6108532 |
131072 | ❌ | 9657063 | 20270528 | 💾 | 8388607 | 2223100 | 12209241 |
262144 | ❌ | 19299047 | 40529344 | 💾 | 16777215 | 4439036 | 24410676 |
524288 | ❌ | 38583015 | 81046976 | 💾 | 33554431 | 8870908 | 48813546 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
256 | 5.09 | 4.29 | 5.45 | 1.43 | 9.63 | 10.79 | 7.84 | 1.43 | 5.44 |
512 | 4.9 | 4.42 | 5.44 | 1.42 | 18.9 | 10.82 | 8.18 | 1.42 | 5.45 |
1024 | 4.9 | 4.54 | 5.44 | 2.3 | 35.74 | 10.86 | 8.01 | 1.42 | 5.44 |
2048 | 4.91 | 5.02 | 5.44 | 4.58 | 35.73 | 10.87 | 8.1 | 2.3 | 5.44 |
4096 | ❌ | 6.28 | 5.44 | 9.15 | 71.46 | 11.03 | 8.29 | 4.59 | 5.44 |
8192 | ❌ | 8.6 | 5.44 | 9.17 | 142.68 | 11.34 | 8.53 | 9.16 | 5.44 |
16384 | ❌ | 13.32 | 5.44 | 9.18 | 💾 | 11.94 | 8.79 | 9.17 | 5.44 |
32768 | ❌ | 15.31 | 8.78 | 9.18 | 💾 | 13.48 | 8.63 | 9.18 | 5.44 |
65536 | ❌ | 29.19 | 16.64 | 9.18 | 💾 | 16.52 | 12.9 | 9.18 | 5.44 |
131072 | ❌ | 35.4 | 21.6 | 9.2 | 💾 | 22.5 | 21.26 | 9.19 | 6.89 |
262144 | ❌ | 45.34 | 22.77 | 9.21 | 💾 | 34.52 | 34.15 | 9.21 | 12.86 |
524288 | ❌ | 54.88 | 25.59 | 9.25 | 💾 | 63.89 | 49.57 | 9.23 | 24.78 |
Sha2-Chain
Benchmark: Sha256 hash of 32 bytes for `n` iterations.
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
8 | 2.164 | 12.798 | 29.444 | 11.303 | 11.083 | 11.488 | 34.606 | 6.862 | 43.511 |
16 | 2.973 | 13.875 | 30.349 | 11.252 | 11.115 | 18.583 | 34.339 | 8.352 | 44.295 |
32 | 4.912 | 14.62 | 33.814 | 17.156 | 11.991 | 11.938 | 34.203 | 8.305 | 43.893 |
64 | 8.285 | 17.617 | 38.294 | 28.948 | 10.631 | 14.158 | 33.862 | 11.289 | 44.197 |
128 | 14.469 | 24.728 | 49.582 | 52.457 | 19.206 | 10.335 | 35.134 | 11.315 | 45.006 |
256 | 25.909 | 37.471 | 73.574 | 74.818 | 26.184 | 14.057 | 36.234 | 17.422 | 42.793 |
512 | 50.064 | 52.895 | 126.086 | 144.234 | 56.529 | 22.012 | 40.643 | 29.057 | 46.573 |
1024 | 96.901 | 71.752 | 231.888 | 285.012 | 95.878 | 20.846 | 50.02 | 52.522 | 52.862 |
2048 | ❌ | 132.275 | 516.789 | 574.819 | 💾 | 14.061 | 77.808 | 110.071 | 61.761 |
4096 | ❌ | 227.325 | 905.706 | 1110.77 | 💾 | 17.799 | 103.011 | 191.208 | 81.617 |
8192 | ❌ | 444.028 | 1814.44 | 2210.89 | 💾 | 18.607 | 193.572 | 365.446 | 122.31 |
16384 | ❌ | 872.638 | 3596.93 | 4404.24 | 💾 | 20.569 | 319.33 | 723.963 | 212.16 |
32768 | ❌ | 1698.85 | 7067.26 | 8791.74 | 💾 | 25.041 | 628.712 | 1443.04 | 388.56 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
8 | 49.0 | 82 | 140 | 22 | 432.0 | 10 | 83 | 23 | 141 |
16 | 55.0 | 101 | 140 | 23 | 398.0 | 10 | 93 | 20 | 142 |
32 | 86.0 | 108 | 140 | 22 | 402.0 | 10 | 94 | 21 | 141 |
64 | 102.0 | 88 | 140 | 23 | 375.0 | 10 | 87 | 22 | 141 |
128 | 78.0 | 108 | 141 | 23 | 413.0 | 10 | 84 | 21 | 141 |
256 | 60.0 | 83 | 139 | 22 | 422.0 | 11 | 83 | 23 | 141 |
512 | 60.0 | 84 | 140 | 22 | 420.0 | 37 | 83 | 22 | 142 |
1024 | 103.0 | 83 | 139 | 23 | 456.0 | 41 | 83 | 22 | 141 |
2048 | ❌ | 83 | 57 | 23 | 💾 | 40 | 83 | 10 | 141 |
4096 | ❌ | 86 | 56 | 22 | 💾 | 119 | 83 | 23 | 141 |
8192 | ❌ | 83 | 56 | 22 | 💾 | 116 | 83 | 10 | 142 |
16384 | ❌ | 84 | 56 | 24 | 💾 | 182 | 86 | 23 | 141 |
32768 | ❌ | 83 | 56 | 24 | 💾 | 262 | 83 | 23 | 141 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
8 | 181.495 | 1315.56 | 1708.31 | 223.482 | 109.544 | 952.094 | 1315.56 | 223.482 | 1784.35 |
16 | 191.263 | 1315.56 | 1708.31 | 223.482 | 106.728 | 971.154 | 1315.56 | 223.482 | 1784.35 |
32 | 201.447 | 1315.56 | 1708.31 | 223.482 | 108.776 | 955.554 | 1315.56 | 223.482 | 1784.35 |
64 | 212.047 | 1315.56 | 1708.31 | 223.482 | 106.568 | 962.798 | 1315.56 | 223.482 | 1784.35 |
128 | 223.063 | 1315.56 | 1708.31 | 223.482 | 112.36 | 946.158 | 1315.56 | 223.482 | 1784.35 |
256 | 234.495 | 1315.56 | 1708.31 | 223.482 | 117.992 | 969.698 | 1315.56 | 223.482 | 1784.35 |
512 | 246.343 | 1315.56 | 1708.31 | 223.482 | 124.168 | 947.738 | 1315.56 | 223.482 | 1784.35 |
1024 | 258.607 | 1315.56 | 1708.31 | 223.482 | 129.512 | 948.874 | 1315.56 | 223.482 | 1784.35 |
2048 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 959.414 | 1315.56 | 223.482 | 1784.35 |
4096 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 966.074 | 1315.56 | 223.482 | 1784.35 |
8192 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 976.142 | 1315.56 | 223.482 | 1784.35 |
16384 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 986.186 | 1315.56 | 223.482 | 1784.35 |
32768 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 1001.51 | 1315.56 | 223.482 | 1784.35 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo | sp1-precompile | r0-precompile |
---|---|---|---|---|---|---|---|
8 | 69290.0 | 48118 | 45952 | 3008.0 | 65535 | 13495 | 14624 |
16 | 136282 | 85686 | 83432 | 5872.0 | 65535 | 20287 | 20776 |
32 | 270266 | 160822 | 158392 | 11600.0 | 65535 | 33871 | 33080 |
64 | 538234 | 311094 | 308312 | 23056.0 | 65535 | 61039 | 57688 |
128 | 1074187 | 611638 | 608152 | 45968.0 | 65535 | 115375 | 106904 |
256 | 2146059 | 1212726 | 1207832 | 91792.0 | 131071 | 224047 | 205336 |
512 | 4289803 | 2414902 | 2407192 | 183440 | 262143 | 441391 | 402200 |
1024 | 8577291 | 4819254 | 4805912 | 366736 | 524287 | 876079 | 795928 |
2048 | ❌ | 9627958 | 9603352 | 💾 | 1048575 | 1745455 | 1583384 |
4096 | ❌ | 19245366 | 19198232 | 💾 | 2097151 | 3484207 | 3158296 |
8192 | ❌ | 38480182 | 38387992 | 💾 | 4194303 | 6961711 | 6308120 |
16384 | ❌ | 76949814 | 76767512 | 💾 | 8388607 | 13916719 | 12607768 |
32768 | ❌ | 153889078 | 153526552 | 💾 | 16777215 | 27826735 | 25207064 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile |
---|---|---|---|---|---|---|---|---|---|
8 | 5.1 | 4.31 | 5.44 | 1.42 | 9.1 | 10.74 | 7.81 | 1.44 | 5.44 |
16 | 4.91 | 4.54 | 5.44 | 1.42 | 9.04 | 10.74 | 8.04 | 1.43 | 5.44 |
32 | 5.92 | 4.98 | 5.44 | 2.3 | 9.04 | 10.75 | 8.5 | 1.43 | 5.44 |
64 | 8.69 | 6.19 | 5.44 | 4.58 | 9.05 | 10.77 | 8.05 | 1.42 | 5.44 |
128 | 14.39 | 8.81 | 5.44 | 9.14 | 17.87 | 10.78 | 8.29 | 1.42 | 5.44 |
256 | 26.36 | 13.27 | 5.44 | 9.17 | 29.66 | 10.83 | 8.52 | 2.3 | 5.44 |
512 | 48.32 | 14.7 | 8.57 | 9.17 | 59.33 | 10.95 | 8.63 | 4.59 | 5.44 |
1024 | 93.03 | 27.2 | 16.24 | 9.17 | 118.62 | 11.18 | 11.08 | 9.15 | 5.44 |
2048 | ❌ | 40.4 | 20.59 | 9.18 | 💾 | 11.64 | 18.61 | 9.17 | 5.44 |
4096 | ❌ | 44.04 | 20.6 | 9.19 | 💾 | 12.88 | 35.18 | 9.17 | 5.44 |
8192 | ❌ | 55.14 | 20.62 | 9.2 | 💾 | 15.31 | 53.3 | 9.17 | 6.64 |
16384 | ❌ | 58.47 | 20.66 | 9.2 | 💾 | 20.08 | 52.23 | 9.18 | 12.34 |
32768 | ❌ | 60.22 | 20.8 | 9.27 | 💾 | 29.71 | 63.03 | 9.18 | 23.74 |
Sha3
Benchmark: Keccak256 hash of `n` bytes. For Stone, both the stdlib implementation of Keccak256 and the builtin were benchmarked.
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
256 | 1.719 | 13.328 | 31.471 | 11.348 | 20.266 | 19.743 | 60.343 | 33.9 | 57.925 | 18.361 |
512 | 2.109 | 13.627 | 32.773 | 17.205 | 30.972 | 10.897 | 60.387 | 36.853 | 59.354 | 17.21 |
1024 | 3.15 | 15.275 | 34.777 | 17.179 | 65.925 | 23.258 | 60.193 | 42.789 | 58.748 | 18.037 |
2048 | 4.924 | 17.564 | 39.929 | 29.094 | 115.746 | 16.661 | 60.696 | 43.047 | 59.419 | 16.58 |
4096 | ❌ | 24.096 | 45.142 | 52.54 | 💾 | 11.693 | 60.073 | 54.792 | 61.503 | 27.567 |
8192 | ❌ | 37.851 | 60.927 | 110.08 | 💾 | 16.392 | 61.477 | 78.203 | 63.798 | 43.043 |
16384 | ❌ | 50.158 | 94.74 | 225.876 | 💾 | 15.422 | 63.416 | 136.103 | 70.003 | 86.217 |
32768 | ❌ | 66.943 | 165.212 | 422.471 | 💾 | 16.528 | 70.234 | 251.263 | 79.62 | 151.873 |
65536 | ❌ | 109.098 | 308.693 | 841.045 | 💾 | 21.269 | 84.047 | 437.697 | 102.599 | 💾 |
131072 | ❌ | 207.231 | 753.494 | 1671.26 | 💾 | 32.631 | 96.357 | 856.136 | 151.591 | 💾 |
262144 | ❌ | 367.867 | 1526.34 | 3310.6 | 💾 | 59.078 | 133.31 | 1691.49 | 252.934 | 💾 |
524288 | ❌ | 701.99 | 3048.73 | 6615.82 | 💾 | 106.153 | 191.352 | 3376.32 | 454.067 | 💾 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
256 | 77.0 | 107 | 139 | 23 | 505.0 | 11 | 116 | 24 | 141 | 269.0 |
512 | 66.0 | 107 | 139 | 23 | 494.0 | 11 | 108 | 23 | 141 | 213.0 |
1024 | 58.0 | 84 | 139 | 22 | 495.0 | 11 | 88 | 23 | 141 | 244.0 |
2048 | 61.0 | 109 | 140 | 24 | 544.0 | 11 | 104 | 22 | 141 | 217.0 |
4096 | ❌ | 110 | 140 | 22 | 💾 | 11 | 114 | 22 | 142 | 229.0 |
8192 | ❌ | 108 | 139 | 23 | 💾 | 11 | 89 | 22 | 142 | 254.0 |
16384 | ❌ | 83 | 140 | 24 | 💾 | 38 | 91 | 23 | 142 | 262.0 |
32768 | ❌ | 83 | 140 | 24 | 💾 | 139 | 88 | 22 | 141 | 273.0 |
65536 | ❌ | 82 | 140 | 24 | 💾 | 198 | 111 | 23 | 141 | 💾 |
131072 | ❌ | 83 | 57 | 24 | 💾 | 201 | 88 | 22 | 141 | 💾 |
262144 | ❌ | 83 | 56 | 23 | 💾 | 631 | 88 | 23 | 141 | 💾 |
524288 | ❌ | 83 | 56 | 22 | 💾 | 191 | 88 | 23 | 141 | 💾 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
256 | 170.823 | 1315.56 | 1708.31 | 223.482 | 113.64 | 1002.79 | 1477.23 | 223.482 | 1787.55 | 108.008 |
512 | 180.175 | 1315.56 | 1708.31 | 223.482 | 119.272 | 991.494 | 1477.23 | 223.482 | 1787.55 | 107.752 |
1024 | 189.943 | 1315.56 | 1708.31 | 223.482 | 126.92 | 1001.15 | 1477.23 | 223.482 | 1787.55 | 110.312 |
2048 | 200.127 | 1315.56 | 1708.31 | 223.482 | 130.536 | 1001.4 | 1477.23 | 223.482 | 1787.55 | 110.056 |
4096 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1016.95 | 1477.23 | 223.482 | 1787.55 | 114.92 |
8192 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1012.45 | 1477.23 | 223.482 | 1787.55 | 119.848 |
16384 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1027.4 | 1477.23 | 223.482 | 1787.55 | 126.088 |
32768 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1014.61 | 1477.23 | 223.482 | 1787.55 | 132.872 |
65536 | ❌ | 1315.56 | 1708.31 | 223.482 | 💾 | 1037.78 | 1477.23 | 223.482 | 1787.55 | 💾 |
131072 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 1073.61 | 1477.23 | 223.482 | 1787.55 | 💾 |
262144 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 1154.34 | 1477.23 | 223.482 | 1787.55 | 💾 |
524288 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 1179.28 | 1477.23 | 223.482 | 1787.55 | 💾 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo | sp1-precompile | r0-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|
256 | 46732 | 47653 | 63651 | 19431 | 131071 | 15337 | 36390 | 998 |
512 | 90274 | 84714 | 121925 | 38611 | 131071 | 20082 | 63910 | 1762 |
1024 | 177358 | 158823 | 238473 | 59058 | 262143 | 29559 | 118950 | 3286 |
2048 | 351574 | 307036 | 471569 | 117859 | 524287 | 48508 | 229030 | 6334 |
4096 | ❌ | 586913 | 921296 | 💾 | 1048575 | 86015 | 448102 | 12241 |
8192 | ❌ | 1146654 | 1820750 | 💾 | 2097151 | 161016 | 886246 | 24055 |
16384 | ❌ | 2266147 | 3619658 | 💾 | 2097151 | 311029 | 1762534 | 47683 |
32768 | ❌ | 4505110 | 7217474 | 💾 | 4194303 | 611032 | 3515110 | 94930 |
65536 | ❌ | 8999653 | 14429571 | 💾 | 8388607 | 1211497 | 7021350 | 💾 |
131072 | ❌ | 17988714 | 28853765 | 💾 | 16777215 | 2412402 | 14038604 | 💾 |
262144 | ❌ | 35966823 | 57702153 | 💾 | 33554431 | 4814199 | 28068169 | 💾 |
524288 | ❌ | 71923036 | 115398929 | 💾 | 67108863 | 9617788 | 56132428 | 💾 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
256 | 5.43 | 4.5 | 5.44 | 1.42 | 19.02 | 10.82 | 11.56 | 3.46 | 5.49 | 11.13 |
512 | 4.9 | 4.54 | 5.44 | 2.3 | 35.74 | 10.85 | 11.59 | 3.45 | 5.49 | 11.07 |
1024 | 4.9 | 5.01 | 5.44 | 2.3 | 71.41 | 10.97 | 11.55 | 3.45 | 5.49 | 11.07 |
2048 | 5.8 | 6.16 | 5.44 | 4.59 | 142.54 | 11.21 | 11.62 | 3.45 | 5.49 | 11.07 |
4096 | ❌ | 8.7 | 5.44 | 9.17 | 💾 | 11.9 | 11.79 | 4.59 | 5.49 | 21.97 |
8192 | ❌ | 13.65 | 5.44 | 9.17 | 💾 | 13.19 | 11.96 | 9.16 | 5.49 | 43.91 |
16384 | ❌ | 12.24 | 5.85 | 9.18 | 💾 | 13.73 | 11.87 | 9.17 | 5.49 | 87.17 |
32768 | ❌ | 24.03 | 10.77 | 9.18 | 💾 | 17 | 12.23 | 9.17 | 5.48 | 174.24 |
65536 | ❌ | 39 | 20.53 | 9.19 | 💾 | 23.54 | 12.22 | 9.18 | 5.49 | 💾 |
131072 | ❌ | 52.23 | 21.23 | 9.2 | 💾 | 42.08 | 17.09 | 9.18 | 8.19 | 💾 |
262144 | ❌ | 52.28 | 22.77 | 9.2 | 💾 | 78.97 | 27.25 | 9.21 | 15.42 | 💾 |
524288 | ❌ | 57.67 | 22.93 | 9.27 | 💾 | 151.16 | 55.68 | 9.22 | 29.87 | 💾 |
Sha3-Chain
Benchmark Keccak256 hash of 32 bytes for n iterations.
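The chained variant feeds each 32-byte digest back in as the next input, so the work grows linearly in n and cannot be parallelized across iterations. A minimal sketch, again assuming the `sha3` crate:

```rust
use sha3::{Digest, Keccak256};

// Iterate Keccak256 n times over its own 32-byte output,
// starting from an arbitrary seed.
fn keccak256_chain(seed: [u8; 32], n: usize) -> [u8; 32] {
    let mut state = seed;
    for _ in 0..n {
        state = Keccak256::digest(state).into();
    }
    state
}
```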
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
8 | 3.151 | 14.691 | 31.427 | 17.2 | 66.291 | 13.552 | 59.266 | 34.114 | 58.806 | 16.827 |
16 | 5.142 | 17.431 | 34.818 | 28.941 | 115.28 | 18.47 | 60.353 | 37.322 | 59.329 | 18.279 |
32 | 8.57 | 23.603 | 43.135 | 52.437 | 💾 | 13.475 | 60.409 | 36.973 | 58.913 | 27.208 |
64 | 15.082 | 37.422 | 57.017 | 74.883 | 💾 | 14.55 | 60.377 | 42.978 | 59.971 | 43.739 |
128 | 29.25 | 50.151 | 90.378 | 132.931 | 💾 | 18.699 | 59.433 | 54.814 | 61.487 | 85.956 |
256 | 55.496 | 66.364 | 156.377 | 261.039 | 💾 | 22.329 | 61.843 | 78.503 | 65.217 | 151.067 |
512 | 105.689 | 104.036 | 297.659 | 515.447 | 💾 | 29.078 | 65.634 | 92.151 | 72.255 | 💾 |
1024 | ❌ | 192.377 | 715.326 | 1037.74 | 💾 | 52.14 | 73.4 | 176.441 | 88.201 | 💾 |
2048 | ❌ | 357.885 | 1474.13 | 2034.16 | 💾 | 94.358 | 91.595 | 348.058 | 115.792 | 💾 |
4096 | ❌ | 681.138 | 2909.76 | 4083.82 | 💾 | ❌ | 109.247 | 665.897 | 171.808 | 💾 |
8192 | ❌ | 1316.41 | 5795.09 | 8122.75 | 💾 | ❌ | 183.78 | 1286.1 | 510.264 | 💾 |
16384 | ❌ | 2597.86 | 11684.6 | 16241.2 | 💾 | ❌ | 353.935 | 2565.25 | 870.635 | 💾 |
32768 | ❌ | 5192.04 | 23068.5 | 32449.9 | 💾 | ❌ | 657.383 | 5090.45 | 1712.06 | 💾 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
8 | 83 | 107 | 139 | 23 | 526 | 11 | 122 | 20 | 142 | 252 |
16 | 114 | 86 | 139 | 23 | 555 | 11 | 116 | 23 | 142 | 254 |
32 | 75 | 111 | 139 | 23 | 💾 | 11 | 88 | 23 | 142 | 256 |
64 | 79 | 102 | 139 | 23 | 💾 | 11 | 116 | 23 | 142 | 265 |
128 | 57 | 99 | 139 | 23 | 💾 | 11 | 99 | 22 | 141 | 270 |
256 | 64 | 83 | 139 | 22 | 💾 | 42 | 90 | 20 | 141 | 288 |
512 | 118 | 83 | 139 | 23 | 💾 | 44 | 103 | 23 | 141 | 💾 |
1024 | ❌ | 83 | 56 | 23 | 💾 | 49 | 106 | 23 | 144 | 💾 |
2048 | ❌ | 84 | 56 | 23 | 💾 | 572 | 88 | 24 | 142 | 💾 |
4096 | ❌ | 83 | 56 | 23 | 💾 | ❌ | 88 | 23 | 141 | 💾 |
8192 | ❌ | 84 | 56 | 23 | 💾 | ❌ | 87 | 22 | 56 | 💾 |
16384 | ❌ | 83 | 55 | 20 | 💾 | ❌ | 88 | 24 | 57 | 💾 |
32768 | ❌ | 84 | 57 | 23 | 💾 | ❌ | 88 | 22 | 55 | 💾 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
8 | 191.263 | 1315.56 | 1708.31 | 223.482 | 124.616 | 995.554 | 1477.23 | 223.482 | 1787.55 | 109.8 |
16 | 201.447 | 1315.56 | 1708.31 | 223.482 | 129.96 | 1006.302 | 1477.23 | 223.482 | 1787.55 | 108.264 |
32 | 212.047 | 1315.56 | 1708.31 | 223.482 | 💾 | 1028.558 | 1477.23 | 223.482 | 1787.55 | 115.944 |
64 | 223.063 | 1315.56 | 1708.31 | 223.482 | 💾 | 1023.458 | 1477.23 | 223.482 | 1787.55 | 122.856 |
128 | 234.495 | 1315.56 | 1708.31 | 223.482 | 💾 | 1032.058 | 1477.23 | 223.482 | 1787.55 | 123.304 |
256 | 246.343 | 1315.56 | 1708.31 | 223.482 | 💾 | 1031.81 | 1477.23 | 223.482 | 1787.55 | 131.944 |
512 | 258.607 | 1315.56 | 1708.31 | 223.482 | 💾 | 1084.746 | 1477.23 | 223.482 | 1787.55 | 💾 |
1024 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | 1155.054 | 1477.23 | 223.482 | 1787.55 | 💾 |
2048 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 1182.062 | 1477.23 | 223.482 | 1787.55 | 💾 |
4096 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | ❌ | 1477.23 | 223.482 | 1787.55 | 💾 |
8192 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | ❌ | 1477.23 | 223.482 | 852.937 | 💾 |
16384 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | ❌ | 1477.23 | 223.482 | 852.937 | 💾 |
32768 | ❌ | 1315.56 | 852.937 | 223.482 | 💾 | ❌ | 1477.23 | 223.482 | 816.649 | 💾 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo | sp1-precompile | r0-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|
8 | 203331 | 147841 | 147328 | 57648 | 262143 | 18577 | 27805 | 3480 |
16 | 404363 | 284353 | 286232 | 115185 | 524287 | 25825 | 43693 | 6904 |
32 | 806427 | 557377 | 564040 | 💾 | 1048575 | 40321 | 75469 | 13752 |
64 | 1610555 | 1103425 | 1119656 | 💾 | 2097151 | 69313 | 139021 | 27448 |
128 | 3218828 | 2195521 | 2230888 | 💾 | 4194303 | 127297 | 266125 | 54840 |
256 | 6435340 | 4379713 | 4453352 | 💾 | 8388607 | 243265 | 520333 | 109624 |
512 | 12868364 | 8748097 | 8898280 | 💾 | 16777215 | 475201 | 1028749 | 💾 |
1024 | ❌ | 17484865 | 17788136 | 💾 | 33554431 | 939073 | 2050355 | 💾 |
2048 | ❌ | 34958401 | 35567848 | 💾 | 67108863 | 1866817 | 4093575 | 💾 |
4096 | ❌ | 69905473 | 71127272 | 💾 | ❌ | 3722305 | 8174812 | 💾 |
8192 | ❌ | 139799617 | 142246120 | 💾 | ❌ | 7433281 | 16338156 | 💾 |
16384 | ❌ | 279587905 | 284483816 | 💾 | ❌ | 14855233 | 32669621 | 💾 |
32768 | ❌ | 559164481 | 568959208 | 💾 | ❌ | 29699137 | 65327886 | 💾 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo | sp1-precompile | r0-precompile | openvm-precompile | stone-precompile |
---|---|---|---|---|---|---|---|---|---|---|
8 | 5.09 | 5.14 | 5.44 | 2.3 | 71.49 | 10.95 | 11.61 | 3.45 | 5.49 | 11.07 |
16 | 5.85 | 6.23 | 5.44 | 4.58 | 142.53 | 11.14 | 11.43 | 3.45 | 5.49 | 11.07 |
32 | 8.88 | 9.03 | 5.44 | 9.14 | 💾 | 11.83 | 11.95 | 3.45 | 5.49 | 21.98 |
64 | 14.89 | 13.13 | 5.44 | 9.17 | 💾 | 13.2 | 11.6 | 3.45 | 5.49 | 43.91 |
128 | 27.13 | 12.02 | 5.7 | 9.18 | 💾 | 15.6 | 11.49 | 4.57 | 5.49 | 87.26 |
256 | 51.24 | 21.98 | 10.48 | 9.18 | 💾 | 20.66 | 11.14 | 9.13 | 5.49 | 174.37 |
512 | 100.0 | 39.84 | 19.95 | 9.18 | 💾 | 36.36 | 11.92 | 9.16 | 5.49 | 💾 |
1024 | ❌ | 41.18 | 19.95 | 9.19 | 💾 | 67.57 | 11.24 | 9.16 | 5.49 | 💾 |
2048 | ❌ | 47.4 | 19.96 | 9.2 | 💾 | 128.39 | 14.17 | 9.16 | 6.28 | 💾 |
4096 | ❌ | 59.51 | 20.31 | 9.22 | 💾 | ❌ | 30.3 | 9.17 | 11.39 | 💾 |
8192 | ❌ | 61.16 | 20.5 | 9.26 | 💾 | ❌ | 53.85 | 9.18 | 20.72 | 💾 |
16384 | ❌ | 61.43 | 20.75 | 9.34 | 💾 | ❌ | 57.23 | 9.2 | 20.8 | 💾 |
32768 | ❌ | 63.58 | 20.93 | 9.49 | 💾 | ❌ | 63.67 | 9.22 | 20.87 | 💾 |
BLAKE
Benchmark Blake2s256 hash of n bytes. For the Stwo benchmarks, a dedicated opcode for BLAKE was used, as illustrated in this Cairo source file. SP1 and R0 do not support precompiles for BLAKE.
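The workload has the same shape as the Sha3 benchmark, with Blake2s256 in place of Keccak256. A minimal sketch, assuming the `blake2` crate (the actual guests use each framework's own BLAKE path, e.g. Stwo's dedicated opcode):

```rust
use blake2::{Blake2s256, Digest};

// Hash an n-byte buffer once with Blake2s256 (32-byte digest).
fn blake2s256_of_n_bytes(n: usize) -> [u8; 32] {
    let input = vec![0u8; n];
    Blake2s256::digest(&input).into()
}
```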
Prover Time (s)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
2048 | 14.272 | 28.817 | 35.206 |
4096 | 14.773 | 52.33 | 34.752 |
8192 | 17.678 | 65.67 | 30.911 |
16384 | 24.308 | 126.219 | 32.214 |
32768 | 37.485 | 240.843 | 30.324 |
65536 | 55.119 | 476.273 | 38.614 |
131072 | 101.786 | 947.306 | 35.269 |
262144 | 162.863 | 1888.15 | 33.434 |
Verifier Time (ms)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
2048 | 102 | 10 | 11 |
4096 | 105 | 23 | 11 |
8192 | 111 | 22 | 11 |
16384 | 99 | 24 | 11 |
32768 | 104 | 23 | 12 |
65536 | 81 | 22 | 12 |
131072 | 81 | 23 | 37 |
262144 | 81 | 22 | 48 |
Proof Size (KB)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
2048 | 1315.56 | 223.482 | 1117.14 |
4096 | 1315.56 | 223.482 | 1138.83 |
8192 | 1315.56 | 223.482 | 1134.37 |
16384 | 1315.56 | 223.482 | 1121.81 |
32768 | 1315.56 | 223.482 | 1144.87 |
65536 | 1315.56 | 223.482 | 1158.19 |
131072 | 1315.56 | 223.482 | 1141.67 |
262144 | 1315.56 | 223.482 | 1148.66 |
Cycle Count
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
2048 | 100997 | 264515 | 65535 |
4096 | 193061 | 521763 | 65535 |
8192 | 377189 | 1036259 | 65535 |
16384 | 745445 | 2065251 | 65535 |
32768 | 1481957 | 4123235 | 131071 |
65536 | 2954981 | 8239203 | 262143 |
131072 | 5901029 | 16471139 | 524287 |
262144 | 11793125 | 32935011 | 1048575 |
Peak Memory (GB)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
2048 | 4.55 | 4.58 | 19.98 |
4096 | 5 | 9.14 | 19.94 |
8192 | 6.09 | 9.17 | 19.98 |
16384 | 8.83 | 9.18 | 20.02 |
32768 | 13.83 | 9.18 | 20.17 |
65536 | 16.59 | 9.18 | 20.5 |
131072 | 38 | 9.19 | 20.88 |
262144 | 45.31 | 9.19 | 21.97 |
BLAKE-Chain
Benchmark Blake2s256 hash of 32 bytes for n iterations. For the Stwo benchmarks, a dedicated opcode for BLAKE was used, as illustrated in this Cairo source file. SP1 and R0 do not support precompiles for BLAKE.
Prover Time (s)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
128 | 17.319 | 28.76 | 30.885 |
256 | 24.279 | 52.074 | 32.974 |
512 | 38.5 | 86.181 | 33.26 |
1024 | 53.129 | 166.706 | 30.723 |
2048 | 89.423 | 316.029 | 38.153 |
4096 | 145.299 | 628.118 | 31.717 |
8192 | 259.23 | 1260.14 | 37.422 |
16384 | 495.878 | 2521.17 | 35.82 |
Verifier Time (ms)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
128 | 105 | 22 | 11 |
256 | 104 | 22 | 11 |
512 | 108 | 23 | 11 |
1024 | 84 | 23 | 11 |
2048 | 81 | 23 | 41 |
4096 | 85 | 23 | 40 |
8192 | 81 | 23 | 49 |
16384 | 81 | 22 | 39 |
Proof Size (KB)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
128 | 1315.56 | 223.482 | 1119.07 |
256 | 1315.56 | 223.482 | 1121.09 |
512 | 1315.56 | 223.482 | 1128.54 |
1024 | 1315.56 | 223.482 | 1119.77 |
2048 | 1315.56 | 223.482 | 1127.81 |
4096 | 1315.56 | 223.482 | 1127.26 |
8192 | 1315.56 | 223.482 | 1134 |
16384 | 1315.56 | 223.482 | 1143.83 |
Cycle Count
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
128 | 376249 | 356507 | 65535 |
256 | 742841 | 702491 | 65535 |
512 | 1476025 | 1394459 | 131071 |
1024 | 2942393 | 2778395 | 262143 |
2048 | 5875129 | 5546267 | 524287 |
4096 | 11740601 | 11082011 | 1048575 |
8192 | 23471545 | 22153499 | 2097151 |
16384 | 46933433 | 44296475 | 4194303 |
Peak Memory (GB)
n | sp1 | r0 | stwo-precompile |
---|---|---|---|
128 | 6.16 | 4.58 | 19.99 |
256 | 9.12 | 9.15 | 20.01 |
512 | 13.17 | 9.17 | 20.13 |
1024 | 15.96 | 9.17 | 20.39 |
2048 | 32.88 | 9.17 | 20.72 |
4096 | 44.6 | 9.19 | 21.54 |
8192 | 50.57 | 9.19 | 23.31 |
16384 | 53.25 | 9.21 | 27.19 |
Matrix Multiplication
Benchmark multiplication of two matrices of size n x n.
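The proven computation is the classic cubic-time product of two n x n matrices, so prover cost should scale roughly with n^3. A minimal sketch (the u32 element type and wrapping arithmetic are assumptions):

```rust
// Naive O(n^3) multiplication of two n x n matrices.
fn matmul(a: &[Vec<u32>], b: &[Vec<u32>]) -> Vec<Vec<u32>> {
    let n = a.len();
    let mut c = vec![vec![0u32; n]; n];
    for i in 0..n {
        for k in 0..n {
            let aik = a[i][k];
            for j in 0..n {
                // i-k-j loop order keeps the inner loop cache-friendly
                c[i][j] = c[i][j].wrapping_add(aik.wrapping_mul(b[k][j]));
            }
        }
    }
    c
}
```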
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
4 | 1.196 | 13.179 | 29.508 | 6.763 | 10.922 | 12.72 |
8 | 1.648 | 12.862 | 30.119 | 8.21 | 9.879 | 19.566 |
16 | 4.979 | 15.252 | 33.467 | 17.135 | 26.384 | 16.531 |
32 | 27.702 | 26.491 | 58.102 | 65.806 | 💾 | 14.655 |
64 | ❌ | 135.352 | 292.125 | 480.724 | 💾 | 21.199 |
128 | ❌ | 881.29 | 2671.68 | 3809.76 | 💾 | 76.061 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
4 | 44 | 110 | 140 | 23 | 111 | 10 |
8 | 51 | 110 | 140 | 22 | 120 | 10 |
16 | 86 | 102 | 139 | 23 | 139 | 10 |
32 | 84 | 109 | 140 | 25 | 💾 | 18 |
64 | ❌ | 85 | 140 | 22 | 💾 | 41 |
128 | ❌ | 84 | 56 | 23 | 💾 | 45 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
4 | 144.647 | 1315.53 | 1708.31 | 223.234 | 109.032 | 815.674 |
8 | 169.607 | 1315.53 | 1708.31 | 223.234 | 108.52 | 833.086 |
16 | 198.911 | 1315.53 | 1708.31 | 223.234 | 116.456 | 844.238 |
32 | 233.591 | 1315.53 | 1708.31 | 223.234 | 💾 | 844.914 |
64 | ❌ | 1315.53 | 1708.31 | 223.234 | 💾 | 876.354 |
128 | ❌ | 1315.53 | 852.937 | 223.234 | 💾 | 981.178 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo |
---|---|---|---|---|---|
4 | 7181 | 8430 | 5820 | 2181 | 32767 |
8 | 42753 | 23951 | 21352 | 13521 | 32767 |
16 | 314782 | 139844 | 137280 | 94569 | 131071 |
32 | 2457710 | 1044394 | 1041904 | 💾 | 1048575 |
64 | ❌ | 8211062 | 8208720 | 💾 | 8388607 |
128 | ❌ | 65306638 | 65304592 | 💾 | 67108863 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
4 | 5.43 | 4.45 | 5.43 | 1.42 | 9.09 | 10.73 |
8 | 4.91 | 4.29 | 5.43 | 1.43 | 9.03 | 10.73 |
16 | 5.8 | 5.12 | 5.43 | 2.29 | 29.68 | 10.82 |
32 | 26.45 | 10.34 | 5.43 | 9.17 | 💾 | 11.61 |
64 | ❌ | 39.28 | 19.69 | 9.17 | 💾 | 19.92 |
128 | ❌ | 60.01 | 20.09 | 9.19 | 💾 | 101.6 |
Elliptic Curve Addition
Benchmark n point doublings on secp256k1.
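The measured operation is n successive doublings starting from the secp256k1 generator. A minimal sketch, assuming the `k256` crate as a stand-in for each framework's secp256k1 arithmetic:

```rust
use k256::{elliptic_curve::Group, ProjectivePoint};

// Double a secp256k1 point n times, starting from the generator.
fn double_n_times(n: usize) -> ProjectivePoint {
    let mut p = ProjectivePoint::GENERATOR;
    for _ in 0..n {
        p = p.double();
    }
    p
}
```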
Prover Time (s)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
256 | 50.41 | 70.731 | 158.631 | 283.941 | 18.869 | 11.993 |
512 | 96.814 | 116.11 | 292.691 | 531.309 | 30.529 | 19.268 |
1024 | ❌ | 211.962 | 582.356 | 1037.16 | 65.693 | 15.356 |
2048 | ❌ | 394.022 | 1048.01 | 2072.757 | 114.312 | 14.318 |
4096 | ❌ | 741.169 | 2029.72 | 4079.796 | 💾 | 14.407 |
8192 | ❌ | 1426.64 | 4046.32 | 8181.789 | 💾 | 16.635 |
16384 | ❌ | 2844.32 | 7953.88 | ❌ | 💾 | 24.382 |
32768 | ❌ | 5628.39 | 15793.7 | ❌ | 💾 | 28.304 |
Verifier Time (ms)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
256 | 67 | 83 | 140 | 22 | 188 | 10 |
512 | 60 | 83 | 140 | 23 | 181 | 10 |
1024 | ❌ | 83 | 56 | 23 | 187 | 34 |
2048 | ❌ | 83 | 56 | 25 | 238 | 39 |
4096 | ❌ | 83 | 57 | 23 | 💾 | 41 |
8192 | ❌ | 83 | 56 | 11 | 💾 | 36 |
16384 | ❌ | 83 | 56 | ❌ | 💾 | 130 |
32768 | ❌ | 83 | 55 | ❌ | 💾 | 15 |
Proof Size (KB)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
256 | 249.399 | 1315.56 | 1708.31 | 223.482 | 111.592 | 918.41 |
512 | 261.663 | 1315.56 | 1708.31 | 223.482 | 120.04 | 897.182 |
1024 | ❌ | 1315.56 | 852.937 | 223.482 | 127.784 | 889.49 |
2048 | ❌ | 1315.56 | 852.937 | 223.482 | 131.976 | 896.15 |
4096 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 901.758 |
8192 | ❌ | 1315.56 | 816.649 | 223.482 | 💾 | 910.154 |
16384 | ❌ | 1315.56 | 852.937 | ❌ | 💾 | 929.282 |
32768 | ❌ | 1315.56 | 816.649 | ❌ | 💾 | 931.042 |
Cycle Count
n | jolt | sp1 | r0 | stone | stwo |
---|---|---|---|---|---|
256 | 4760760 | 4715656 | 4711913 | 59936 | 131071 |
512 | 9290168 | 9169544 | 9165801 | 119840 | 262143 |
1024 | ❌ | 18077320 | 18073577 | 239648 | 262143 |
2048 | ❌ | 35892872 | 35889129 | 479264 | 524287 |
4096 | ❌ | 71523976 | 71520233 | 💾 | 1048575 |
8192 | ❌ | 142786184 | 142782441 | 💾 | 2097151 |
16384 | ❌ | 285310600 | ❌ | 💾 | 4194303 |
32768 | ❌ | 570359432 | ❌ | 💾 | 8388607 |
Peak Memory (GB)
n | jolt | sp1 | openvm | r0 | stone | stwo |
---|---|---|---|---|---|---|
256 | 48.62 | 23.6 | 10.37 | 9.18 | 17.93 | 10.9 |
512 | 93.9 | 47.38 | 19.8 | 9.19 | 35.7 | 11.08 |
1024 | ❌ | 44.31 | 19.83 | 9.19 | 71.47 | 11.27 |
2048 | ❌ | 55.82 | 19.84 | 9.19 | 142.68 | 11.81 |
4096 | ❌ | 62.49 | 19.86 | 9.22 | 💾 | 13.22 |
8192 | ❌ | 63.14 | 19.94 | 9.26 | 💾 | 16.01 |
16384 | ❌ | 66.48 | 19.95 | ❌ | 💾 | 21.62 |
32768 | ❌ | 66.97 | 19.96 | ❌ | 💾 | 32.64 |