Thanks to Paul Gafni from RISC Zero, Vanishree Rao from Fermah, and Ratan Kaliani from Succinct for their thoughtful insights, discussion, and review for this piece.
Intro
Earlier this summer, Vanishree Rao, Co-Founder of Fermah, sparked intense debate on Twitter with a provocative assertion. The point she made was essentially that, over time, general purpose zkVMs will be replaced by custom zk circuits tailored to specific use cases.
The tweet prompted intense debate among leading zk founders, investors, and academic cryptographers. The responses, ranging from indignant to supportive and everything in between, are illustrative of a broader ambiguity about the future of the zkVM space. What value do general purpose zkVMs provide? Will that value change as new cryptographic primitives and approaches emerge? In this piece, we dive into these questions and more as we attempt to offer our opinion on the future of zkVMs.
What is a zkVM?
A zero-knowledge virtual machine (zkVM) is a virtual machine that takes a computer program and its corresponding input, executes the program, and produces a zero-knowledge proof (zkp) that the program was executed correctly. Let’s dive deeper into what this means.
Preliminaries: zero-knowledge proof systems
The zero knowledge property of a zero knowledge proof allows the prover (the entity creating the zkp) to generate a proof of correct program execution without revealing the inputs to the program, known as the "witness" in cryptographic terminology. Essentially, these proofs do not divulge any information about the witness that the prover claims to possess, while proving that the statement is true. Illustrated below, we see that the verifier can verify the proof without having access to the witness.
In order to implement zkps for the execution of a program, we need a proof system consisting of two parts: the frontend and the backend.
The frontend performs arithmetization: it converts code, which may contain arbitrarily complex statements and expressions, into a sequence of mathematical statements. These statements take the form of an arithmetic circuit consisting of wires and gates, expressed in a chosen constraint system, as visualized in the diagram below.
The backend converts the generated statements into a proof the verifier can check, using cryptographic techniques such as Interactive Oracle Proofs (IOPs) and commitment schemes.
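To make the frontend step concrete, here is a toy sketch in Rust of what arithmetization produces. Nothing below corresponds to a real proving framework; it simply shows how the claim "I know x such that x^3 + x + 5 = 35" breaks down into one addition or multiplication per gate, each of which becomes a simple equality over a finite field.

```rust
// Toy arithmetization sketch -- illustrative only, not a real frontend.
// The claim "I know x such that x^3 + x + 5 = 35" becomes four gates,
// each a single addition or multiplication over a small prime field.
const P: u64 = 97; // toy prime modulus

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

// Each line is one "gate"; a frontend would emit these as constraints
// over wire values rather than executing them directly.
fn circuit_is_satisfied(x: u64) -> bool {
    let t1 = mul(x, x);   // gate 1: t1 = x * x
    let t2 = mul(t1, x);  // gate 2: t2 = t1 * x   (= x^3)
    let t3 = add(t2, x);  // gate 3: t3 = t2 + x
    let out = add(t3, 5); // gate 4: out = t3 + 5
    out == 35             // output constraint
}

fn main() {
    assert!(circuit_is_satisfied(3));  // witness x = 3 satisfies every gate
    assert!(!circuit_is_satisfied(4)); // any other witness fails
    println!("toy circuit satisfied by x = 3");
}
```

A real frontend never executes the gates; it emits them as polynomial identities that the backend can then commit to and prove.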
In the case of L2 rollups, for example, we want to convince the verifier, smart contracts on the L1, that our public input x, some batch of transactions, together with a witness w, the private data (for example, signatures) attesting to the validity of those transactions, satisfies some function F. Here, F encodes the validity checks: the signatures can be verified, there is no double spend, and so on. Informally, the frontend compiles the program F along with x and w into a format from which the backend can generate a proof via a commitment scheme.
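In code terms, the relation being proven has roughly the following shape. This is a hypothetical sketch: the types, field names, and helper checks are illustrative placeholders, not any production rollup's actual circuit.

```rust
// Hypothetical shape of the relation F(x, w) that a rollup proves.
// Every name here is an illustrative placeholder.
struct Tx { from: [u8; 20], to: [u8; 20], amount: u64, signature: Vec<u8> }
struct PublicInput { old_state_root: [u8; 32], new_state_root: [u8; 32], txs: Vec<Tx> }
struct Witness { balances: Vec<([u8; 20], u64)> } // pre-state needed to re-check the batch

/// F(x, w): true iff every signature verifies, no account double-spends,
/// and applying the batch to the old state yields the claimed new root.
fn f(x: &PublicInput, w: &Witness) -> bool {
    x.txs.iter().all(verify_signature)
        && no_double_spend(&x.txs, &w.balances)
        && apply_batch(x.old_state_root, &x.txs, w) == x.new_state_root
}

// Stubs standing in for the real checks.
fn verify_signature(_tx: &Tx) -> bool { true }
fn no_double_spend(_txs: &[Tx], _balances: &[([u8; 20], u64)]) -> bool { true }
fn apply_batch(root: [u8; 32], _txs: &[Tx], _w: &Witness) -> [u8; 32] { root }
```

The prover's job is to convince the verifier that f(x, w) returns true while revealing only x; the frontend turns a function like this into constraints, and the backend turns those constraints into the proof.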
How is a zkVM different from a custom circuit?
The fundamental difference between a custom circuit and a zkVM lies in the frontend. In the custom circuit approach, a circuit is created for each specific program we want to prove. In the zkVM approach, an entire virtual machine (VM) is represented as circuits, and the program we want to prove is provided as an input to the zkVM.
Returning to the rollup example, we want to process transactions off-chain and submit a single proof of validity on-chain. In an Ethereum zk rollup, for instance, we can do this via a zk Ethereum Virtual Machine (zkEVM), where the proof we submit attests to correct execution of the EVM over the transactions submitted to the rollup.
A custom circuit approach for designing a zkEVM would be to write circuit logic for the EVM in a circuit-specific language like Circom to represent the EVM software as wires and gates of a circuit.
The implementation of zkEVMs is notoriously difficult. As Vitalik put it in a 2022 blog post, the EVM was not designed around ZK-friendliness; a zkEVM must emulate the stack-based architecture as well as the opcodes used by the EVM. Another challenge is making computationally intensive data structures, such as the EVM's Merkle Patricia Trie, run inside the circuit.
The zkVM approach instead executes the EVM inside a microprocessor architecture, i.e. the VM. That is, the VM is the part written as circuits (hence the zkVM), and the EVM is an input to the zkVM. Zeth, for example, runs a Rust-based Ethereum execution engine like reth inside a zkVM to prove that a given Ethereum block is valid. The obvious benefit is that if the execution client is updated, the updated implementation can simply be run in the zkVM, without having to modify the circuits.
It is now even possible to convert optimistic (OP) rollups into zk rollups using zkVMs. The RISC Zero team released Zeth, and the Succinct team OP Succinct, which allows any OP Stack chain to become a zk rollup in about an hour. Essentially, the rollup's state transition function is run inside the zkVM to generate proofs of the rollup's states, simply by deploying a smart contract. As a result, transactions can be finalized in roughly 10 minutes, the time it takes to generate the proof, rather than the seven-day fraud proof window of standard OP rollups.
We can already see a glimpse of the extensibility and versatility of zkVMs. Indeed, zkVMs are machines that not only scale Ethereum and other blockchains but also have applications in various domains such as IoT, cloud computing, supply chain management, healthcare, and entertainment.
Lifecycle of a zkVM
Let’s now dive deeper into the architecture of zkVMs. While the nuances in design may differ, the high-level lifecycles of zkVMs are more or less similar.
- A program written in a general programming language such as Rust is compiled into bytecode for a specific instruction set architecture (ISA)
- The VM, constructed as a universal circuit, executes the machine code and generates an execution trace
- The frontend translates the circuit into polynomial constraints and the execution traces into a collection of polynomials via a process called arithmetization
- The mathematical statements are converted into a proof by the backend using IOPs and commitment schemes
- The verifier is able to verify the proof, confirming the integrity of the execution
Components
The main components of a zkVM are 1) a proof system and 2) an instruction set architecture (ISA). The prover runs a program compiled to the ISA inside the proof system, and the execution produces a proof for the verifier.
Proof system
The backbone of any zkVM is the proof system. Essentially, the proof system enables steps 2-5 of the lifecycle shown above.
As mentioned, a proof system consists of a frontend that converts code into arithmetic circuits and a backend that converts the resulting mathematical statements into concise proofs. The specific choice of frontend and backend defines the proof system for a zkVM, depending on the arithmetization scheme, polynomial commitment scheme (PCS), and interactive oracle proof (IOP) or probabilistically checkable proof (PCP) used. There is now a wide variety of options for developers to choose from.
For our purposes, we focus on the two most widely used proof systems today: zkSNARKs and zkSTARKs.
zkSNARKs
First introduced in a 2012 paper, the term zkSNARK stands for Zero-Knowledge Succinct Non-Interactive Argument of Knowledge. Let’s break down each of these pieces one by one:
Zero-knowledge: does not reveal any knowledge about the witness
Succinct: the size of the proof is very small and fast to verify
Non-Interactive: the interaction (the prover sending the verifier the proof) only happens once
Argument of Knowledge: a computationally bounded prover cannot forge a proof of a false statement, and an honest proof will always be accepted
An important caveat in SNARKs stems from the "non-interactiveness." As the name suggests, non-interactiveness removes the need for the prover to interact with the verifier. To achieve this, SNARKs require a one-time initialization process called a trusted setup, in which a trusted, honest party generates random parameters for the prover and verifier to use.
One of the most popular zkSNARKs is Groth16, preferred for its concise proofs and fast verification. It has the smallest proof size, roughly 128 bytes, and the fastest verification of the widely used systems. The main drawback is that its trusted setup is circuit-specific, meaning a new trusted setup must be performed for each unique circuit or program we want to prove.
PLONK systems are a leading alternative to Groth16. While they still require a trusted setup, they differ by introducing a universal and updatable trusted setup. This universality means one setup can be reused across different circuits and updated over time, reducing the trust placed in any single ceremony, at the expense of longer proof generation and verification times than Groth16.
It is critical that the trusted setup is done right, meaning the secret randomness used to generate the parameters (the so-called toxic waste) is destroyed and never learned by the prover or anyone else. Otherwise, fraudulent proofs can be generated, defeating the entire purpose of the proof system. So while the trusted setup enables efficient proofs, it also introduces a trust assumption and a potential vulnerability.
It would be nicer if there were a way to remove the trusted setup process entirely, so that we don't have to place this kind of trust in the proof system at all. That's where STARKs come into play.
zkSTARKs
zkSTARK stands for Zero-Knowledge Scalable Transparent Argument of Knowledge, and was introduced in a 2018 paper by Ben-Sasson et al. STARKs today are deployed almost exclusively in non-interactive form, which effectively makes them SNARKs as well.
We note the differences:
Scalable: prover time scales nearly linearly and verifier time grows only polylogarithmically as the computation gets larger
Transparent: There is no trusted setup (unlike SNARKs)
The transparency of STARKs removes the need for a trusted setup or any trusted third party. But of course, this comes at a cost: namely, larger proof sizes and longer verification times. A common way to instantiate STARKs is with the FRI protocol, which many zkVM solutions use today.
Let’s dive deeper into the trade-offs of these proof systems.
Key Trade-Offs of STARKs vs. SNARKs
What are the desiderata of a zk proof system in the blockchain context? It must satisfy the demands of limited block space and minimize latency for usability. This translates to a short proof size and low time and cost for both the prover and verifier. While there are many subtleties to these considerations, e.g. trusted setups, quantum resistance, etc., we start with the major properties, as best represented in the table below:
Will there be one killer proof system that governs all? We see that like most things, there are tradeoffs: currently, the choice of proof system depends on use case.
Fast proving generally comes at the cost of larger proofs and slower verification. To obtain smaller proofs, the prover must use more resources (measured in cores and cycles for VMs) to compress them; shorter proofs verify faster, at the cost of prover efficiency. Indeed, zkSTARKs take less time to generate, but produce larger proofs that take longer to verify than zkSNARKs. This in turn means STARKs cost more gas to verify onchain. To exemplify, verifying a PLONK proof (a type of SNARK) onchain takes about 290k gas and a Groth16 proof around 230k gas, while STARK verification is estimated to cost more than four times as much as SNARKs.
While proof size, generation time, and verification speed form the foundation of zk system performance, several additional factors beyond these core metrics influence adoption and effectiveness:
Security implications:
- Quantum Resistance: Unlike most SNARKs, STARKs rely on hash functions considered quantum-resistant, providing long-term security assurances. With major quantum breakthroughs continuing to pick up steam, this could be a critical issue in the not too distant future.
Scalability:
- As the size of the computation increases, the relative efficiency of STARKs compared to SNARKs improves, allowing proof generation for larger computations.
Ecosystem Maturity:
- Recall that SNARKs were developed six years ahead of STARKs. In turn, this has given SNARKs a head start in terms of adoption. As such, SNARKs tend to have more robust developer libraries and tools, making them more accessible. However, as the zk landscape is quickly evolving, we can expect STARKs to see rapid creation of tools and optimizations.
Using the excellent table from awesome ZKP repo, we summarize all trade-offs discussed up to this point:
Now that we have a way to convert software into math and prove its execution, we turn to the missing piece: choosing the VM architecture that we want to convert into a circuit.
ISA
An Instruction Set Architecture (ISA), like the RISC-V architecture, is the interface between hardware and software. An ISA defines the set of instructions that a processor can execute, specifying the format and encoding of these instructions, as well as how they interact with memory and registers.
Recall the first two steps of the lifecycle of a zkVM:
- A program written in a general programming language such as Rust is compiled into bytecode for a specific instruction set architecture (ISA)
- The VM, constructed as a universal circuit, executes the machine code and generates an execution trace
In order to compile the program into bytecode and execute it on a computer, we need two things: a compiler to compile the program, and an ISA that can execute the compiled program. A VM is simply a software emulation of an ISA, such as the x86 or ARM architectures running on the laptop you are using to read this post.
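As a toy illustration of what "emulating an ISA" and "generating an execution trace" mean, here is a made-up three-instruction machine; it is far simpler than RISC-V and exists only to show that each step of execution appends one row of machine state to a trace.

```rust
// Toy VM sketch: a made-up 3-instruction ISA, purely for illustration.
#[derive(Clone, Copy, Debug)]
enum Instr { Addi { rd: usize, rs: usize, imm: u64 }, Mul { rd: usize, rs1: usize, rs2: usize }, Halt }

/// One row of the execution trace: the full machine state at each step.
#[derive(Debug)]
struct TraceRow { pc: usize, regs: [u64; 4] }

fn execute(program: &[Instr]) -> Vec<TraceRow> {
    let (mut pc, mut regs, mut trace) = (0usize, [0u64; 4], Vec::new());
    loop {
        trace.push(TraceRow { pc, regs });
        match program[pc] {
            Instr::Addi { rd, rs, imm } => { regs[rd] = regs[rs].wrapping_add(imm); pc += 1; }
            Instr::Mul { rd, rs1, rs2 } => { regs[rd] = regs[rs1].wrapping_mul(regs[rs2]); pc += 1; }
            Instr::Halt => break,
        }
    }
    trace
}

fn main() {
    // Compute (0 + 5) * (0 + 7) into register 3 and print the trace that
    // a prover would go on to arithmetize.
    let program = [
        Instr::Addi { rd: 1, rs: 0, imm: 5 },
        Instr::Addi { rd: 2, rs: 0, imm: 7 },
        Instr::Mul { rd: 3, rs1: 1, rs2: 2 },
        Instr::Halt,
    ];
    for row in execute(&program) { println!("{:?}", row); }
}
```

Each TraceRow becomes one row of the table that the frontend later turns into polynomial constraints.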
With Cairo, StarkWare was the first team to create its own minimal, zk-tailored assembly language that takes a Cairo program and runs it in circuits. RISC Zero took a similar approach using the more widely known RISC-V architecture for its ISA. RISC-V is an open-source architecture with a minimal number of instructions that supports various programming languages like Rust and C. RISC-V itself is a family of instruction sets, with a base instruction set and support for extensions such as integer multiplication. Given its greater flexibility compared to Cairo, the RISC-V architecture has become popular in many other VMs, including Succinct's SP1 and a16z's open-source zkVM Jolt.
As illustrated above, there are two fundamentally different types of ISAs: general purpose, like RISC-V, and zk-tailored, like Cairo. Custom ISAs can be designed to minimize resource usage by eliminating structures that are unnecessary in the VM setting. Recall that general ISAs are designed with hardware in mind, so general-purpose registers can be inefficient in the zkVM setting. Eliminating unused registers or combining them reduces the number of columns in the constraint system, which in turn improves prover costs. However, this comes at the cost of building a dedicated compiler toolchain to support general programming languages. RISC-V, by contrast, is already supported as a target by mature compiler toolchains such as LLVM, which lets programs written in general programming languages be compiled into bytecode that RISC-V can execute.
The choice of ISA matters because instruction complexity directly affects the size of the circuit, and thus the speed and efficiency of proof generation. The complexity of arithmetizing an ISA can be understood along two axes: cost per instruction and number of instructions. If the instruction set is too limited, or we optimize too aggressively to lower the cost per instruction, we limit the expressivity of programs. On the other hand, the higher the ISA's complexity, the higher the performance overhead for the prover. Finally, the two variables trade off against each other: increasing the cost per instruction can decrease the number of instructions needed, and vice versa.
We end this section with a comparison of zkVM ISAs made by the Lita Foundation.
Circuits and zkVMs in practice
Let’s zoom out and examine the implications of all of this. Why do we use zkVMs today? When we discuss zkVMs and circuits, the goals of both of these approaches are similar in that they aim to provide two things: privacy and verifiable compute.
Privacy
Privacy ensures that your data is hidden. We have seen how zk rollups can privately attest to the validity of transactions. There are a plethora of intriguing privacy applications, such as shielding wallet assets and protecting transaction information in protocols like Aleo, Mina and Zcash.
Many exciting and promising ideas are emerging in digital identity, but major adoption in this field requires fast client-side generation of the zkp or a secure way to transfer the data, such as with zkTLS.
Verifiable Compute
Verifiable computation ensures that we can ascertain the integrity of a computation from its proof. This requires proofs to be both sound and complete: fraudulent proofs, which purport to have executed the program without actually doing so, will be rejected, while genuine proofs of execution will always be accepted.
But perhaps the most interesting property, and the one that makes these guarantees practical, is the succinctness of the zkp. The predominant product-market fit for zkps in the blockchain world today has been scaling, enabled precisely by this succinctness.
Succinctness refers to the small proof size and the short time to verify it. How does it help scale blockchains? Let’s first consider the properties of blockchains.
Blocks have a fixed size, meaning there is a limit to how many transactions we can natively store per block. One way to work around this is to compress many transactions into a single succinct proof posted on-chain. Furthermore, because validators perform identical tasks to verify transactions and update state, we can turn validating complex batches of transactions into a simpler, faster task: verifying one proof of the aggregated transactions instead of re-executing each individual transaction. This approach greatly reduces the time it takes validators to reach consensus. Indeed, zkps can be verified in milliseconds, even for computations that would otherwise require hours to execute natively.
Another prime example is the "ZK coprocessor," a term coined by Axiom, which enhances the scalability of smart contracts by allowing them to trustlessly delegate historical on-chain data access and computation using ZK proofs. Instead of performing all operations in the EVM, developers can delegate expensive operations to the ZK coprocessor, which operates off-chain, and only post the results on-chain. By separating data access and computation from blockchain consensus, it offers a novel approach to scaling smart contracts.
Why use a zkVM over circuits?
Now that we understand not just how zkVMs work but also the value they provide, we can finally get to answering this piece's original question: why use a zkVM instead of a circuit? Or, with a little more color: if custom zk circuits are more performant than zkVMs, and both offer the same benefits of privacy and verifiable compute, why choose a zkVM over a circuit? Contrary to intuition, we see two main cases in favor of zkVMs: developer experience and simplicity.
Developer experience
An amazing part of zkVMs is that the complexity of writing circuits and generating proofs is abstracted away from the developer. A developer who wants their application to inherit the benefits of zk can write their app's code in a "legacy" programming language of their choice, like Rust or C, and the zkVM takes care of the complex zk machinery. This means zkVMs let you not only write your program in Rust but also import existing libraries and crates written in Rust and integrate them into your zk program, without rewriting them as circuits. Not having to touch the advanced cryptography underpinning zk opens up a whole new design space for developers.
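To make this tangible, here is roughly what a guest program looks like. The snippet is modeled loosely on the RISC Zero Rust SDK (v1.x-era); the entry-point macro and env API differ across zkVMs and SDK versions, so treat it as a sketch rather than reference code.

```rust
// Guest program: ordinary Rust that runs inside the zkVM.
// Modeled loosely on the RISC Zero guest API; names vary across zkVMs.
#![no_main]
risc0_zkvm::guest::entry!(main);

use risc0_zkvm::guest::env;

fn main() {
    // Read a private input supplied by the host.
    let n: u32 = env::read();

    // Arbitrary Rust logic -- no circuits or constraint systems in sight.
    let result: u64 = (1..=u64::from(n)).product();

    // Commit the result to the public journal; the proof will attest that
    // this exact program produced it.
    env::commit(&result);
}
```

The host side is similarly small: it supplies the input, asks the SDK's prover for a receipt, and hands that receipt to anyone who wants to verify it.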
We have seen how complex the underlying proof system and circuits can be. zkVMs allow developers to build applications without grappling with the intricacies of the underlying cryptographic systems. This parallels the way HTTPS enabled the creation of secure web applications without requiring deep knowledge of encryption algorithms.
A standardized zkVM approach can incentivize rapid prototyping and accelerate the adoption of zk. For example, consider someone with deep knowledge of finance who wants to build a privacy-preserving credit scoring app. The pool of people with overlapping knowledge of credit scores and circuits is likely extremely limited. The engineering effort, hiring cost, and auditing involved in building custom circuits are also notoriously expensive: some say the time and cost required to bring a circuit-based ZK project to market is on the order of two years and $100 million. In the case of Succinct's zk-Tendermint, what took months to build in custom circuits was completed in a matter of hours using the VM approach.
Simplicity and security
Another benefit of zkVMs is that the ISAs used are simple and well-maintained, which in turn reduces the potential of vulnerabilities making their way into new applications. For example, RISC-V only has 40 base integer instructions, simplifying the constraint system for building an entire virtual machine. This leads to a more compact codebase that allows for more manageable development and, crucially, easier auditing (e.g. the entire codebase for Jolt is under 25,000 lines of code).
That is, simplicity leads to enhanced security. A clearly defined system is inherently easier to reason about and secure. zkVMs create a standardized execution environment that is less vulnerable to bugs and exploits.
The benefits of zkVMs extend beyond the security of the virtual machine itself to the programs built on top of it. For instance, Rust based programs that run in a zkVM only need to be evaluated in Rust; the security of the underlying circuit is abstracted away. Custom circuit-based approaches, on the other hand, often require independent evaluation of the circuit for each program change. zkVMs provide a consistent foundation that allows for easier maintenance and updates to programs without compromising security.
Current problems
But why have we not seen an explosion of zkVM adoption? What will it take for the zk industry to really take off?
Despite their benefits, zkVMs currently face significant performance challenges that limit their practical applications. While the verifier computation is much cheaper than the native computation, the prover computation introduces substantial overhead compared to native execution, ranging from 6 to 9 orders of magnitude depending on the implementation.
For example, the Jolt prover, which claims to be 6x+ faster than other currently deployed zkVMs, is still roughly 500,000x slower than native execution of a RISC-V program. In other words, proving one step of the emulated RISC-V CPU requires on the order of 500,000 cycles of work on the machine doing the proving. More generally, prover costs are estimated to run 1,000 to 100,000 times higher than native execution, depending on the workload.
Perhaps this latency explains why zkVMs are primarily utilized for blockchain scalability, and are yet to find fit more universally. Traditional web2 applications require near-instantaneous response times, making the prolonged generation time of zkps impractical to use. In contrast, the blockchain ecosystem, with its greater acceptance of latency, provides a more suitable environment for zkp scaling solutions.
Still, there is a general understanding amongst zk builders that the cost of proof generation needs to come down if widespread zkVM adoption is to be achieved. As such, the primary focus of zkVM development has been minimizing cost of proof generation for scaling extensive computational cycles, particularly for proving for EVM rollups, OP rollups, and proof aggregation.
Solutions
Significant effort is being put into making zk proving faster, with four main approaches seen today: precompiles, recursion, lookup arguments, and hardware improvements.
Precompiles
Precompiles, also known as gadgets or chiplets, are essentially special-purpose circuits used to accelerate common, standardized cryptographic operations such as hash functions, elliptic curve arithmetic, or storage access. Instead of running these functions in the CPU circuit, these computations are handled in a separate circuit, allowing the CPU to simply look up the values. Just like specialized chips alongside CPUs, they bring significant performance improvements for specific operations. Precompiles are generally built as a modification to the ISA, allowing developers to access them through libraries and APIs.
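Concretely, zkVMs usually expose precompiles behind the same library interfaces developers already use. In RISC Zero and SP1, for instance, hashing goes through patched versions of standard crates such as sha2, so guest code like the sketch below stays unchanged while the heavy lifting is routed to the accelerator circuit; the patching mechanism and the set of supported crates vary by project and version.

```rust
// Guest-side sketch: hashing with the ordinary sha2 crate API.
// Inside a zkVM, the project's patched sha2 crate (declared in Cargo.toml)
// dispatches this call to the SHA-256 precompile circuit instead of proving
// every round of the hash as general RISC-V instructions.
use sha2::{Digest, Sha256};

fn commitment_of(data: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(data);
    hasher.finalize().into()
}

fn main() {
    let digest = commitment_of(b"hello zkvm");
    println!("{:02x?}", digest);
}
```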
Recursion
Recursion refers to the ability to verify a proof inside a circuit, which effectively means that multiple proof circuits can be chained together. This chainability enables zkVMs to scale massively in two ways: 1) continuations and 2) STARK-to-SNARK proving. Continuations allow a large computation to be split into smaller pieces, executed independently, and aggregated into a single proof, saving time and compute cost. In this way, a STARK circuit typically bounded by around 16 million cycles can be extended to upwards of 10 billion cycles, enabling very complex computations such as those found in zkML. Furthermore, recursion is often used to convert STARKs to SNARKs. STARKs are currently preferred for continuations but bring the downside of large proofs. Recursion allows us to shrink the STARK proof by verifying it inside a SNARK circuit; the resulting SNARK proof is smaller and faster to verify, enabling onchain verification.
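The sketch below illustrates the continuation pattern with placeholder types and functions; nothing here corresponds to a specific zkVM's API. A long trace is split into segments, each segment is proven independently (and in parallel if desired), and the segment proofs are folded into one.

```rust
// Continuation/recursion sketch with hypothetical types and functions.
struct Segment { first_cycle: u64, last_cycle: u64 }
struct Proof { covers: (u64, u64) }

// Prove one segment in isolation (in practice a STARK prover, here a stub).
fn prove_segment(seg: &Segment) -> Proof {
    Proof { covers: (seg.first_cycle, seg.last_cycle) }
}

// Recursively verify two proofs "inside a circuit" and emit one proof that
// covers both ranges (stubbed: a real join runs a verifier in-circuit).
fn join(a: Proof, b: Proof) -> Proof {
    assert_eq!(a.covers.1, b.covers.0, "segments must be adjacent");
    Proof { covers: (a.covers.0, b.covers.1) }
}

fn main() {
    let total_cycles = 64_000_000u64;
    let segment_size = 16_000_000u64; // e.g. a per-segment cycle limit

    // Split the execution into fixed-size segments...
    let segments: Vec<Segment> = (0..total_cycles / segment_size)
        .map(|i| Segment { first_cycle: i * segment_size, last_cycle: (i + 1) * segment_size })
        .collect();

    // ...prove each one (independently parallelizable), then fold them.
    let proofs: Vec<Proof> = segments.iter().map(prove_segment).collect();
    let root = proofs.into_iter().reduce(join).expect("at least one segment");
    println!("one proof covering cycles {:?}", root.covers);
}
```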
Lookup Arguments
Many SNARK systems today use a lookup argument, a protocol that allows a prover to first commit to a large vector and then prove that every entry of the vector is contained in some predetermined table. In the context of zkVMs, it is used to commit to pre-computed lookup tables for operations such as bitwise operations, range checks, assembly instructions, and memory accesses. During proving, values are checked against the lookup table rather than recomputed. By replacing these operations with efficient lookups, the constraints for the ISA become simpler, reducing circuit complexity and improving prover efficiency. Most existing zkVM projects use polynomial constraints as the core building block, with additional lookup arguments to boost performance.
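As a toy illustration of the idea (not a real lookup argument, which replaces the explicit membership check with a succinct polynomial protocol): the prover precomputes a table for, say, 8-bit XOR, and the constraint system then only needs to check that every (a, b, out) triple in the trace appears in that table, rather than re-deriving XOR bit by bit.

```rust
use std::collections::HashSet;

// Toy lookup sketch. A real lookup argument (e.g. LogUp, Lasso) proves the
// same membership claim succinctly; here we just check it directly.
fn main() {
    // Precomputed table of every valid 8-bit XOR triple (a, b, a ^ b).
    let table: HashSet<(u8, u8, u8)> = (0..=255u8)
        .flat_map(|a| (0..=255u8).map(move |b| (a, b, a ^ b)))
        .collect();

    // "Execution trace" entries the prover commits to.
    let trace = [(0x0fu8, 0xf0u8, 0xffu8), (0x12, 0x34, 0x26)];

    // Instead of re-proving XOR gate by gate, each trace row is simply
    // required to be a member of the table.
    assert!(trace.iter().all(|row| table.contains(row)));
    println!("all trace rows found in the lookup table");
}
```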
Hardware
Finally, we can accelerate proof generation with better hardware. Many options are being experimented with, including GPUs and FPGAs. While FPGAs could eventually outperform GPUs in efficiency and cost, the abundance of GPUs today has led many projects to pursue hardware acceleration through GPUs. GPUs enable parallel processing, which is especially suitable for STARKs, where many tasks can be pipelined. For instance, STARK-based systems can parallelize hashing and field arithmetic to reduce the cost of the most expensive bottlenecks in STARKs. While beyond the scope of this piece, hardware companies like Ingonyama, Irreducible, Fabric Cryptography, and more are making significant advances in this domain.
A note on benchmarking zkVMs
Our next section is primarily focused on exploring the major players in the zkVM industry today. However, before diving in, we wanted to preface our comparison section with a note on benchmarking in the zkVM space:
Benchmarking zkVMs is a complex and often misleading process due to the multitude of variables involved at different layers of the zkVM stack. Projects often claim significant performance improvements over competitors, but these comparisons can be conflicting. For instance, project A might show a 30x performance boost by comparing against project B’s implementations without precompiles, while project B might achieve similar gains over project A through GPU enablement.
It is critical to understand that an apples-to-apples performance comparison is quite challenging. Results can be skewed by many factors such as precompiles, hardware acceleration, configuration, task size, recursion, and proof size. Experts like Justin Thaler stress the importance of level-playing-field evaluations, suggesting that comparing RISC-V zkVMs without precompiles provides a more accurate performance assessment. Others might argue that we should measure end-to-end latency for real-world use cases like validating Ethereum blocks. Comparing different ISAs adds another layer of complexity, making direct comparisons challenging or impossible in some cases.
As zkVM use cases become clearer and tooling matures, some confounding factors in benchmarking may diminish, but the need for standardized, third-party benchmarking in the zkVM space is evident. Until then, understanding the specific context and parameters of each benchmark is essential for accurately interpreting performance claims. In the following sections, we have done our best to properly contextualize all benchmarks reported.
Ok, with that out of the way, we are now ready to explore the different zkVM implementations today. We dive deeper into more mature projects like RISC Zero and Succinct as well as emergent projects such as Lita, Jolt, and Nexus. While many of the projects have multiple product offerings like prover networks and aggregation layers for generating proofs, we focus specifically on their zkVM products given the focus of this piece.
The zkVM landscape
RISC Zero - RISC Zero zkVM
TLDR - RISC Zero was one of the first teams building in the zkVM space to integrate the RISC-V architecture. They differentiate themselves on optimization for recursion and GPU proving. Their zkVM can support boundless compute on GPUs via continuations. Further, they provide stellar support for developer accessibility – 70% of the top 1,000 Rust crates work with their zkVM without any additional configuration. Finally, they’ve derived even greater performance by starting to leverage precompiles to optimize specific, common workloads.
Architecture
The architecture of the RISC Zero zkVM consists of three circuits: two STARKs and one SNARK. There is a RISC-V circuit that proves each segment using a FRI-based prover, a recursion circuit that uses another FRI-based prover for aggregation, and a Groth16 circuit that converts the aggregated proof into a SNARK. The zkVM goes through the following lifecycle:
1. The program is executed, generating a number of “segments”
2. Each segment is proven using a STARK-based (FRI) prover
3. All the segment proofs are aggregated into a single proof, also using a FRI-based prover
4. The FRI proof is shrunk using Groth16
The core of the proof system for RISC-V proving is STARK-based, which the team calls 0STARK; in particular, it uses DEEP-ALI and FRI. The system first uses DEEP-ALI to construct and validate the constraint and validity polynomials. FRI is then employed to prove that these polynomials are indeed of low degree, which is crucial for the soundness of the overall proof. STARK proofs are used for the first two circuits because of their efficiency for horizontally scalable recursive proving. The final Groth16 STARK-to-SNARK circuit was chosen because it is cheap to verify onchain and has robust tooling available, aiding engineering velocity.
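A host-side sketch of this pipeline, modeled loosely on the RISC Zero Rust SDK around v1.x, is shown below. The methods::{GUEST_ELF, GUEST_ID} constants stand in for the identifiers a project's build step generates, and method names change between releases, so treat this as an assumption-laden illustration rather than reference code.

```rust
// Host-side sketch, loosely modeled on the RISC Zero Rust SDK (v1.x-era).
// GUEST_ELF / GUEST_ID stand in for build-generated identifiers.
use risc0_zkvm::{default_prover, ExecutorEnv};
use methods::{GUEST_ELF, GUEST_ID};

fn main() {
    // Package the guest's input.
    let env = ExecutorEnv::builder().write(&10u32).unwrap().build().unwrap();

    // Steps 1-2 above (segmented execution and per-segment FRI proofs)
    // happen behind this one call; the SDK exposes further options for
    // aggregating segments and wrapping the result in a Groth16 proof
    // (steps 3-4) for cheap onchain verification.
    let receipt = default_prover().prove(env, GUEST_ELF).unwrap().receipt;

    // Anyone can verify the receipt against the guest's image ID.
    receipt.verify(GUEST_ID).unwrap();
    let result: u64 = receipt.journal.decode().unwrap();
    println!("verified guest output: {result}");
}
```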
There are several notable features that make RISC Zero’s zkVM performant.
- Continuations
Recall that continuations refer to a mechanism for splitting a large computation into several segments. As can be seen in the diagram above, this renders the zkVM capable of generating proofs for arbitrarily complex computations while limiting runtime memory requirements to the size of a segment. Continuations also allow the proof generation for each segment to be parallelized, reducing latency.
- Proof composition
Proof composition leverages recursion to aggregate and verify proofs inside a zkVM program. As seen in the architecture, there are three zk circuits. This architecture is possible because of proof composition: namely, the STARK proof can be verified inside a SNARK circuit for the STARK-to-SNARK step. Furthermore, the aggregation step enables privacy preservation, since each of the different segments can be proven privately.
- Precompiles
The RISC Zero team acknowledges that most of the clear performance boost comes from integrating accelerator circuits, or precompiles, rather than from choosing a new proof system. Currently, they support precompiles for SHA-256 and 256-bit modular multiplication, which improve costs by up to 10x for cryptography-heavy Rust crates. While these circuits are specific to particular workloads (making the gains hard to generalize), precompiles and proof composition can bring up to a 600% performance boost. In Q3 of 2024, RISC Zero aims to launch precompiles for more cryptographic functions such as RSA.
Performance
In recent benchmarks, RISC Zero outperformed SP1 in testing when RISC Zero optimized for GPU performance versus SP1’s CPU performance. RISC Zero’s zkVM showed significantly cheaper costs and faster prover times both on AWS and consumer devices.
Specifically, we highlight benchmarks on small-cycle tasks like hashing operations (SHA2) and larger-cycle tasks like proving the Tendermint light client, run on GPU-optimized hardware: a cloud AWS g6.xlarge instance as well as consumer devices such as a MacBook Pro M3 and a CUDA-enabled RTX 4090. Note that the speeds in the table below reflect end-to-end latency, including the STARK-to-SNARK step, for their most performant release, R0 zkVM 1.0.0-rc.5. We also note that the operations made use of the SHA2 precompile.
As can be seen, for small-cycle operations like SHA2, the fastest proof came from a consumer PC with an NVIDIA RTX 4090 GPU. Further details about speed and cost, including the number of cycles and RAM usage across each step of the RISC Zero zkVM, can be found in this datasheet.
Interestingly, the RISC Zero team notes that performance gains come from focusing more on engineering problems than proof system design. That is, they claim that the current 0STARK design is sufficiently fast – instead of upgrading to new proof systems like Jolt, they are focusing on integrating more precompiles and optimizing GPU kernels.
Succinct - SP1
TLDR - Succinct’s SP1 zkVM is one of the most widely adopted zkVMs in the space. Their tech underpins OP Succinct rollups (zk rollup leveraging the OP stack), Polygon’s Agglayer, and is used to secure Celestia’s bridge to Ethereum. While sharing overlap with RISC Zero on the use of recursion and RISC-V architecture, SP1 differs in leveraging the Plonky3 proof system, which helps optimize for recursion. SP1 places an emphasis on customizability for precompiles, allowing anyone to fork SP1 to add and test precompiles to tailor to their needs.
Architecture
The following is the lifecycle of SP1:
1. An LLVM-compilable program is converted to a RISC-V ELF file using the CLI tool
2. The execution trace is generated as multiple tables. For instance, the CPU instructions represent multiple tables and each precompile for a specific operation represents one table.
3. Each shard is proven using a STARK + FRI prover, and the shard proofs are then aggregated into a global proof
4. SP1 can be configured to convert the variable-size STARK proofs into constant-size proofs for on-chain verification, by recursively combining proofs and wrapping the final STARK in a Groth16 SNARK
The underlying proof system is derived from Plonky3, a modular and scalable toolkit created by Polygon Labs for implementing polynomial IOPs like PLONK and STARKs. SP1 is configured specifically to optimize for hardware and recursion. In particular, arithmetization is done via AIR, polynomial commitments via batched FRI, and lookups via the LogUp lookup argument.
We also highlight several of the notable features from SP1:
- Shared challenges
To handle large-scale computations for recursion, SP1 employs a shared-challenges technique that segments extensive computations into manageable shards, which are then interconnected to form a comprehensive global proof. To maintain consistency between shards, each shard records starting and ending program counters in its proof, so adjacent shards can be checked against each other, ensuring consistency across the shards. On top of the proof system for segmenting computation, SP1's stack is tailored to support recursion: the RISC-V ISA is modified to be modular, written in a custom Rust-based DSL, and paired with a custom recursion compiler.
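A toy sketch of the chaining idea is below; the types and checks are illustrative placeholders, not SP1's actual data structures. Each shard proof exposes its starting and ending program counter, and the global proof simply checks that consecutive shards line up.

```rust
// Toy sketch of shard chaining; types and checks are illustrative only.
struct ShardProof { start_pc: u32, end_pc: u32 /* plus the actual STARK */ }

/// The aggregation step checks, among other things, that each shard picks up
/// exactly where the previous one stopped, so the shards jointly describe
/// one continuous execution.
fn shards_are_consistent(shards: &[ShardProof]) -> bool {
    shards.windows(2).all(|pair| pair[0].end_pc == pair[1].start_pc)
}

fn main() {
    let shards = vec![
        ShardProof { start_pc: 0x1000, end_pc: 0x4000 },
        ShardProof { start_pc: 0x4000, end_pc: 0x9000 },
        ShardProof { start_pc: 0x9000, end_pc: 0xa00c },
    ];
    assert!(shards_are_consistent(&shards));
    println!("shards chain into one continuous execution");
}
```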
- Cross-table communication and Precompiles
One of the core ideas of SP1 is to assign one operation to each table (chip) and enable communication between tables. For example, if the main table requires a SHA-256 computation, this fact is "requested" from the table dedicated to SHA-256 via a lookup protocol based on logarithmic derivatives, called LogUp.
Hence, SP1 can easily incorporate a series of performant precompiles that accelerate hash functions and cryptographic signature verification. SP1 currently supports a wide range of precompiles, including hash functions such as SHA-256 and Keccak-256 for Merkle proofs, and elliptic curve operations over curves like Ed25519, Secp256k1, and BLS12-381 that are commonly used for signatures. This flexibility decreases RISC-V cycle counts by 5-10x on real-world workloads such as Tendermint light client proving or Reth proving.
- GPU
95% of the SP1’s prover’s compute bottleneck can be parallelized via GPU, as shown in the diagram below. For comparison, the CPU prover for SP1 reaches 150Khz of RISC-V cycles per second whereas GPU allows for over 1 MHZ cycles per second.
Performance
Succinct’s recent benchmarking shows that with GPUs and precompiles enabled, SP1 is cheaper and faster than RISC Zero on real-world tasks such as Tendermint and Reth transaction proving, at the cost of a larger proof size. The tests were conducted on AWS instances as well as Lambda Labs GPUs. The end-to-end tests included STARK-to-SNARK proving for constant proof size.
The best costs were achieved on the Lambda A6000 48GB GPU. For reference, the cost for Reth 30M on an AWS G6.16XLARGE L4 instance was around $1.7. We note the best performance in terms of cost and time below.
In the case of Reth 30M, SP1 showed a 10x improvement in proof generation speed and on-demand cost on the Lambda GPU, and nearly a 36x boost on the AWS R7I.16XLARGE, highlighting the efficacy of its precompile-oriented design. While SP1's cost and speed are impressive, RISC Zero's proof size is almost 7x smaller than SP1's. Moving forward, SP1 is expected to improve with a new recursion architecture, new arithmetization, and a two-phase prover approach.
Lita - Valida
TLDR - Valida is a relatively new zkVM created by the Lita Foundation team. Instead of using existing solutions, Lita created their own zk-friendly ISA and compiler tool chain from scratch in order to tune their offering specifically to the world of zk. Right now, their zkVM is relatively narrow in scope - it only supports proving programs in the C language. On the horizon is expanding to offer more programming languages and leveraging new technology from hardware providers like Fabric Cryptography to further increase performance.
Architecture
As shown in the diagram above, source code is converted to the LLVM intermediate representation (LLVM IR), which is then converted into Valida VM bytecode via the custom compiler. Valida's custom compiler was created to complement the Valida VM, which does not have existing toolchain support the way RISC-V does.
The Valida VM implements the Valida ISA, a RISC-V-inspired instruction set optimized for efficient ZK proving. To minimize overall computational cost, several instructions are combined so that the number of branches is minimized. Specifically, the VM employs a Harvard architecture with separate program code and main memory, and features a CPU with multiple coprocessors connected via communication buses. Notably, it eschews general-purpose registers and a dedicated stack, operating directly on memory, with extensibility for lookup arguments. The design is modular, allowing for the addition of precompiles or coprocessors, and is being developed as an open-source project to encourage community contributions. Further differences between the general-purpose RISC-V ISA and the Valida ISA's design can be found here.
Valida’s proof system leverages Plonky3: the AIR builder is used for arithmetization and FRI for the backend polynomial commitment schemes. The Lita team has also been integrating another polynomial commitment scheme called Brakedown, which can encode witness data at approximately 1.2 GB/s.
Performance
Valida has demonstrated impressive performance in initial speed benchmarks. In CPU tests, Valida proving ran between 1.19 and 54 times faster than multi-core RISC Zero proving, and was between 19 and 1,600 times more CPU-efficient, in some cases achieving zk-proof generation that is multiple orders of magnitude faster and cheaper.
Lita Foundation released benchmarks on Valida's performance, comparing it against RISC Zero, SP1, and Jolt with a focus on two metrics: CPU efficiency (which measures the cost efficiency of the computation) and wall-clock time (which measures user waiting time). The experiment was conducted on two main C test programs: Fibonacci sequence and SHA-256 computations. Valida was tested in both single-core and multi-core configurations, while RISC Zero, SP1, and Jolt were tested on multi-threaded CPU.
Note that performance comparison between RISC-V and custom ISA can be tricky, as you are comparing two fundamentally different architectures, and there are many variables such as proof size, computation size, etc. As Valida develops, it would be interesting to see their performance on bigger computational tasks that require continuations and proof shrinking for onchain verification.
a16z Crypto - Jolt
TLDR - Jolt (Just One Lookup Table) is an open-source zkVM frontend for the RISC-V ISA developed by the a16z crypto engineering team. Underpinning Jolt is a unique approach to lookups called Lasso, which differs from STARK-based proof systems. Lasso is powered by the sum-check protocol, an interactive proof system based on multivariate polynomials. Unlike FRI-based protocols, it embraces many rounds of interaction (made non-interactive in practice via Fiat-Shamir), which leads to much greater prover efficiency. And, importantly, Jolt is simple: the codebase is under 25,000 lines of Rust, and individual CPU instructions can be written in under 50 lines of code.
Architecture
The high-level lifecycle of Jolt is similar in that the input to the Jolt zkVM is the execution trace of a program; the differences come from the proof system. The innovation is the idea of the "lookup singularity": the notion that the optimal circuit performs only lookups rather than relying on polynomial constraints. Each executed instruction then requires only a single lookup into its evaluation table, effectively reducing the size of the circuit and removing the computationally expensive bottleneck of polynomial commitment schemes for zkps.
To implement this, Jolt uses a lookup table that contains the entire evaluation table for each RISC-V instruction. As RISC-V instructions can take 32-bit inputs, these tables get enormous, on the order of 2^32 entries or more. This is where the Lasso lookup argument and the sum-check protocol come in. Instead of storing the entire table, Lasso decomposes the lookup table into smaller, more manageable subtables, so any lookup into the big table can be answered by performing a handful of lookups into vastly smaller tables. Because of this decomposition, prover time does not depend on the size of the table but only on the number of lookup operations needed. This means that, instead of having to handpick primitive CPU instructions to optimize for the constraint system, Jolt can support complex instruction sets. That is also why there is less overhead for Jolt to extend to richer ISAs like RV64IM (which supports 64-bit rather than 32-bit data types).
Each state transition in the program execution is verified using table lookups. The CPU instruction is fetched from memory using a table lookup, which is verified via minimal R1CS constraints. The fetched instruction is decoded using another table lookup for opcodes, and is executed using table lookups. As can be seen, this significantly reduces the circuit size for Jolt while being able to fully represent every CPU instruction for RISC-V.
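The decomposition trick is easiest to see for a bitwise instruction. The toy sketch below (illustrative only, not Lasso itself) answers a 32-bit AND lookup using four lookups into a small 8-bit-by-8-bit subtable, which is how a table that would be astronomically large can be replaced by tables small enough to materialize.

```rust
use std::collections::HashMap;

// Toy version of table decomposition (not Lasso itself): a 32-bit AND lookup
// answered by four lookups into one 8-bit x 8-bit subtable of 2^16 entries.
fn main() {
    // Materialize the small subtable: (a_byte, b_byte) -> a_byte & b_byte.
    let subtable: HashMap<(u8, u8), u8> = (0..=255u8)
        .flat_map(|a| (0..=255u8).map(move |b| ((a, b), a & b)))
        .collect();

    let (x, y) = (0xDEAD_BEEFu32, 0x0F0F_0F0Fu32);

    // Split each 32-bit operand into byte-sized chunks, look each pair up in
    // the subtable, and recombine the chunk results into the full answer.
    let mut result = 0u32;
    for i in 0..4 {
        let (xa, yb) = (((x >> (8 * i)) & 0xff) as u8, ((y >> (8 * i)) & 0xff) as u8);
        let chunk = subtable[&(xa, yb)] as u32;
        result |= chunk << (8 * i);
    }

    assert_eq!(result, x & y);
    println!("{:#010x} & {:#010x} = {:#010x} via subtable lookups", x, y, result);
}
```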
The initial Jolt implementation uses an elliptic-curve-based commitment scheme called Hyrax for its simplicity and transparency. Hyrax will be replaced with the Binius commitment scheme for multilinear polynomials. The Binius commitment scheme is especially useful for systems like Jolt, where committed values are small. Incorporating Binius is projected to result in an additional 5x speedup for Jolt.
Performance
The early results for Jolt seem promising: Jolt can achieve a 5x+ speedup thanks to fewer and smaller committed values. The metrics below were measured with no precompiles or recursion enabled for RISC Zero or SP1. The benchmarks were run on an AWS r7g.16xlarge ARM machine with 64 CPU cores and 512 GiB of DDR5 RAM, proving SHA-2. The 1, 4, and 16 refer to segment sizes for SP1.
As with Valida, while the early results seem promising, Jolt currently lacks support for recursion and continuations, and has a small upper bound on the maximum cycle count (~16 million). Hence, an apples-to-apples comparison with SP1 and RISC Zero, which do support recursion and GPUs, and thus real-world scenarios such as proving Ethereum blocks, is currently infeasible. These are not fundamental flaws of Jolt; rather, we expect to see improvements and optimizations in the near future. Notably, the roadmap includes continuations, recursion, and precompiles.
Nexus - Nexus zkVM
Nexus’ zkVM is a highly modular, open-source zkVM written in Rust. Given its modular nature, Nexus mixes cutting-edge components in its zkVM design: Jolt for the frontend, HyperNova for the backend, and its own custom ISA (the Nexus Virtual Machine), a modified version of RISC-V. Lastly, Nexus makes heavy use of folding schemes, a technique that allows for performant aggregation of proofs.
Architecture
The novelty of Nexus comes from folding schemes: a technique for Incrementally Verifiable Computation (IVC), which allows proofs to be combined and aggregated incrementally. Folding schemes let the prover build up a proof for the entire computation first, and only then create a zkSNARK over the result. The idea is that for a long-running computation, we recursively prove the execution from step i to step i+1 while verifying that the proof from step i-1 to step i was correct. The Nexus prover currently supports folding schemes in the Nova family, such as SuperNova and HyperNova, and will use HyperNova specifically in Nexus 2.0. HyperNova supports both Plonkish and R1CS constraints as well as lookup arguments and the sum-check protocol, making it suitable for use with Jolt.
The flow of the Nexus prover can be seen below. At each step of the program's execution (each application of the step function F), IVC allows the proof to accumulate without producing a SNARK. The aggregated IVC proof, which is quite large and hard to verify as-is, is then compressed by feeding it into a zkSNARK circuit, removing the need for SNARK recursion at every step, effectively decreasing proof generation cost and keeping proofs small for fast verification.
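Below is a highly simplified sketch of the IVC pattern with placeholder types: it "folds" claims by mixing a stand-in digest rather than with a real folding scheme, and compress_to_snark is a stub. The point is the control flow, where each step folds into an accumulator and only the final accumulator is turned into a SNARK.

```rust
// IVC/folding control-flow sketch with placeholder types and stub crypto.
#[derive(Clone, Default)]
struct Accumulator { folded_steps: u64, digest: u64 }

struct StepClaim { input: u64, output: u64 }

// The step function F of the long-running computation (here: one iteration
// of a toy Collatz-style update).
fn step(x: u64) -> u64 { if x % 2 == 0 { x / 2 } else { 3 * x + 1 } }

// "Fold" a new step claim into the running accumulator. A real folding
// scheme (Nova, HyperNova) combines the algebraic instances; here we just
// mix a stand-in digest so the control flow is visible.
fn fold(acc: Accumulator, claim: StepClaim) -> Accumulator {
    let digest = acc.digest ^ claim.input.rotate_left(7) ^ claim.output;
    Accumulator { folded_steps: acc.folded_steps + 1, digest }
}

// Only the final accumulator is compressed into a succinct proof (stub).
fn compress_to_snark(acc: &Accumulator) -> String {
    format!("snark(steps={}, digest={:#x})", acc.folded_steps, acc.digest)
}

fn main() {
    let (mut x, mut acc) = (27u64, Accumulator::default());
    for _ in 0..100 {
        let next = step(x);
        acc = fold(acc, StepClaim { input: x, output: next });
        x = next;
    }
    println!("{}", compress_to_snark(&acc));
}
```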
Performance
While open source, the Nexus zkVM 2.0 is currently under development, and hence is not yet at its most performant stage. Benchmarks for the current zkVM show that proving the computation of the 10th Fibonacci number takes around 9,928 cycles, with less than a minute of proving time and under 2 seconds of verification time.
We look forward to the innovations folding schemes can bring in the future. Their roadmap for optimizations includes adding precompiles, modularized compilers, and further proof compression.
Conclusion
Having now defined zkVMs, detailed their core components, discussed the tradeoffs, and explored various implementations, we can finally get around to understanding the debate that provided the inspiration for this piece.
At the core of the debate over zkVMs versus custom ZK circuits lies the question of viability and future prospects for these two approaches.
Vanishree Rao argued that zkVMs primarily serve as a bootstrapping tool, enabling faster time-to-market and reduced development costs for startups. However, once a startup achieves product market fit, she claimed that the natural decision is to transition to circuits, as custom circuits will always outperform zkVMs in terms of efficiency.
This stance faced pushback from the zkVM community, who emphasized some of the benefits we discussed above, namely reduced security vulnerabilities, better developer accessibility, and the increasingly competitive performance of zkVMs. Wei Dai, a Research Partner at 1kx, suggested that enhancements such as precompiles could keep zkVMs competitive, while Uma Roy, Co-Founder of Succinct, claimed that their SP1 zkVM actually boasts superior performance compared to circuits.
So what do we think? Positioning the debate as a binary choice between custom zk circuits and zkVMs could be an oversimplification. As we have seen, the distinction between generalized virtual machines and custom circuits is becoming increasingly blurred, particularly with the introduction of precompiles. We have seen that all the covered projects today already have or plan to incorporate precompiles into their zkVMs. The addition of custom ISAs further blurs the boundary between zk circuits and zkVMs, as they are starting to allow zkVMs to even exceed circuits in performance for specific computations.
Given such advancements already underway in the zkVM field, will projects ever resort to custom circuits in the long run? There will always be a small subset of projects that require the utmost in performance and are willing to invest in custom circuit development when a zkVM cannot meet their performance needs. But we are already seeing that the vast majority of zero-knowledge use cases today can be effectively addressed with zkVMs. Consider the zkEVM developed by Ethereum's Privacy and Scaling Explorations (PSE) team. Because the zkEVM's performance is so critical, the PSE team started with a custom circuit approach. However, they recently pivoted to a general purpose zkVM approach. Why? Despite the performance overhead, the benefits of developer experience and auditability were too appealing, and they made the switch to a zkVM.
At the end of the day, we feel lucky to find ourselves at quite an exciting time in the field of zk. The arms race amongst zkVM builders is forcing teams to innovate and evolve their offerings at breakneck pace. Some teams are pushing the boundaries of performance while others are expanding the flexibility and accessibility of their systems to increase the potential reach of this technology. The "end game" is likely to be a diverse ecosystem where different approaches coexist, each optimized for particular use cases and requirements. ✦
Legal Disclosure: This document, and the information contained herein, has been provided to you by Hyperedge Technology LP and its affiliates (“Symbolic Capital”) solely for informational purposes. This document may not be reproduced or redistributed in whole or in part, in any format, without the express written approval of Symbolic Capital. Neither the information, nor any opinion contained in this document, constitutes an offer to buy or sell, or a solicitation of an offer to buy or sell, any advisory services, securities, futures, options or other financial instruments or to participate in any advisory services or trading strategy. Nothing contained in this document constitutes investment, legal or tax advice or is an endorsement of any of the digital assets or companies mentioned herein. You should make your own investigations and evaluations of the information herein. Any decisions based on information contained in this document are the sole responsibility of the reader. Certain statements in this document reflect Symbolic Capital’s views, estimates, opinions or predictions (which may be based on proprietary models and assumptions, including, in particular, Symbolic Capital’s views on the current and future market for certain digital assets), and there is no guarantee that these views, estimates, opinions or predictions are currently accurate or that they will be ultimately realized. To the extent these assumptions or models are not correct or circumstances change, the actual performance may vary substantially from, and be less than, the estimates included herein. None of Symbolic Capital nor any of its affiliates, shareholders, partners, members, directors, officers, management, employees or representatives makes any representation or warranty, express or implied, as to the accuracy or completeness of any of the information or any other information (whether communicated in written or oral form) transmitted or made available to you. Each of the aforementioned parties expressly disclaims any and all liability relating to or resulting from the use of this information. Certain information contained herein (including financial information) has been obtained from published and non-published sources. Such information has not been independently verified by Symbolic Capital and, Symbolic Capital, does not assume responsibility for the accuracy of such information. Affiliates of Symbolic Capital may have owned or may own investments in some of the digital assets and protocols discussed in this document. Except where otherwise indicated, the information in this document is based on matters as they exist as of the date of preparation and not as of any future date, and will not be updated or otherwise revised to reflect information that subsequently becomes available, or circumstances existing or changes occurring after the date hereof. This document provides links to other websites that we think might be of interest to you. Please note that when you click on one of these links, you may be moving to a provider’s website that is not associated with Symbolic Capital. These linked sites and their providers are not controlled by us, and we are not responsible for the contents or the proper operation of any linked site. The inclusion of any link does not imply our endorsement or our adoption of the statements therein. We encourage you to read the terms of use and privacy statements of these linked sites as their policies may differ from ours. 
The foregoing does not constitute a “research report” as defined by FINRA Rule 2241 or a “debt research report” as defined by FINRA Rule 2242 and was not prepared by Symbolic Capital Partners LLC. For all inquiries, please email info@symbolic.capital. © Copyright Hyperedge Capital LP 2024. All rights reserved.