Vitalik New Article: The Future and Challenges of ZK-EVM

2023-12-14 04:16:43

Original title: “What might an “enshrined ZK-EVM” look like?”

Original author: Vitalik

Original compilation: Luccy, BlockBeats

*Editor’s note: On December 13, Vitalik Buterin, co-founder of ETH Place, published an article that delved into the possible implementation of the “ZK-EVM” (Zero-Knowledge Ethereum Virtual Machine) and discussed the design of different versions of the ZK-EVM. *

*Vitalik’s ZK-EVM concept aims to reduce the duplication of Ethereum protocol features by Layer-2 projects and improve their efficiency in validating Layer-1 Ethereum blocks. The article points out that current Layer-2 EVM protocols (such as Optimistic Rollups and ZK Rollups) rely on EVM verification mechanisms, but this also means that they must trust a large codebase. Once there is a vulnerability in the codebase, these virtual machines can be at risk of being attacked. *

*In addition, he mentioned support for “almost-EVM”, which allows L2 VMs to use ZK-EVM within the protocol with only minor differences from the EVM, while also providing flexibility for some customization of the EVM. *

Layer-2 EVM protocols on ETH, including op rollups and ZK rollups, rely on EVM validation. However, this requires them to trust a large code base, and if that code stock is in the loophole, these VMs are at risk of being hacked. In addition, even ZK-EVMs, which want to be fully equivalent to the L1 EVM, need some form of governance to replicate changes from the L1 EVM into their own EVM implementations.

This situation is not ideal, as these projects are replicating the functionality that already exists in the ETH Fang protocol, and ETH Fang governance is already responsible for upgrading and fixing bugs: ZK-EVM basically does the same work as validating L1 ETH blocks! In addition, in the next few years, we expect light clients to become more and more powerful, and soon to fully validate L1 ETH Fang execution using ZK-SNARKs. At this point, the ETH network will effectively have a built-in ZK-EVM. So the question arises: why not make this ZK-EVM localization available to the rollup project?

This article will describe several possible versions of the “enshrined ZK-EVM” and explore the trade-offs and design challenges, as well as the reasons for not choosing a particular direction. The advantages of implementing protocol features should be weighed against the benefits left to the ecosystem and the benefits of keeping the underlying protocol simple.

What are the key attributes that we can expect from the ZK-EVM that is enshrined as a standard?

Basic function: Validate ETH blocks. The protocol feature (which is still uncertain whether it is opcode, precompilation, or some other mechanism) should at least accept the pre-state root, block, and post-state root as input, and verify that the post-state root is indeed the result of executing a block on top of the pre-state root.

Compatibility with ETH multi-client concept. This means that we want to avoid pursuing a single attestation system and instead allow different clients to use different attestation systems. This, in turn, means a few points:

· Data availability requirements: For any EVM execution that uses the enshrined ZK-EVM proofs, we want to be able to guarantee that the underlying data is usable so that provers using a different attestation system can re-attest the execution, and clients that rely on that attestation system can validate these newly generated proofs.

· Proof exists outside of the EVM and chunk data structure: The ZK-EVM feature does not directly use SNARKs as input within the EVM, as different clients expect different types of SNARKs. Instead, it might be similar to blob validation: transactions can contain statements that need to be proven (pre-state, block body, post-state), the contents of which can be accessed by opcodes or precompilations, and the client-side consensus rules will check the data availability and the existence of proofs for each claim made in the block, respectively.

Auditability. If any execution is proven, we want the underlying data to be usable so that in case anything goes wrong, users and developers can inspect it. In practice, this adds another reason for the importance of data availability requirements.

Upgradeability. If we find a vulnerability in a particular ZK-EVM solution, we want to be able to fix it quickly. This means that there is no need for a hard fork to fix it. This adds another reason to prove the importance of existing outside of the EVM and block data structures.

Networks that support almost-EVM. One of the attractions of L2 is the ability to innovate the execution layer and scale the EVM. If the virtual machine (VM) of a given L2 is only slightly different from the EVM, it would be nice if L2 could still use the same native in-protocol ZK-EVM as the EVM, relying only on its own code for the same parts of the EVM and their own code for different parts. This can be achieved by designing a ZK-EVM feature that allows the caller to specify some opcodes, addresses, or bitfields that are handled by an externally provided table, rather than by the EVM itself. We can also make gas costs open to customization to a limited extent.

“Open” and “closed” multi-client systems

The “multi-client concept” is probably the most asserted requirement on this list. There is the option of abandoning this idea and concentrating on a ZK-SNARK solution, which will simplify the design, but at the cost of a larger “philosophical shift” on ETH Workshop (as this would actually be a departure from ETH Workshop’s long-term multiclient philosophy) and the introduction of greater risks. In the long-term future, for example, if formal verification techniques get better, it may be better to choose this path, but for now, the risks seem too great.

Another option is a closed multiclient system where a fixed set of attestation systems is known within the protocol. For example, we might decide to use three ZK-EVMs: PSE ZK-EVM, Polygon ZK-EVM, and Kakarot. A block needs to carry proof of two of these three in order to be considered valid. This is better than a single proof system, but it makes the system less adaptable because users have to maintain validators for every known proof system, there is bound to be a political governance process to incorporate new proof systems, and so on.

This has led me to prefer an open, multi-client system, where proofs are placed “outside of blocks” and verified separately by clients. Individual users can validate blocks with the client they want, as long as there is at least one prover who generates the proof of the system. The attestation system will gain influence by convincing users to run them, not by convincing the protocol governance process. However, this approach does have more complexity costs that we’ll see.

What are the key features we want the ZK-EVM implementation to have?

In addition to the basic correct functionality and safety guarantees, the most important feature is speed. While it is possible to design a ZK-EVM feature within a protocol that is asynchronous and returns only one answer for each claim after a delay of N time slots, the problem that everything that happens in each block is self-contained would be easier if we could reliably guarantee that proofs would be generated in a matter of seconds.

Although it takes many minutes or hours to generate proofs of ETH blocks today, we know that there is no theoretical reason to prevent mass parallelization: we can always combine enough GPUs to prove the different parts of the block execution separately and then use recursive SNARKs to put those proofs together. In addition, hardware acceleration through FPGAs and ASICs may help to further optimize proofs. However, actually getting to this point is a significant engineering challenge that should not be underestimated.

What might the ZK-EVM features within the protocol look like?

Similar to EIP-4844 Blob transactions, we introduce a new transaction type that includes ZK-EVM claims:

Similar to EIP-4844, the object passed in the mempool will be a modified version of the transaction:

The latter can be converted to the former, but not the other way around. We’ve also expanded the block sidecar object (introduced in EIP-4844) to include a list of proofs made in a block.

Note that in practice, we may want to split the sidecar into two separate sidecars, one for blobs and one for proofs, and set up a separate subnet for each proof type (and an additional subnet for blobs).

On top of the consensus layer, we added a validation rule that a block will only be accepted if the client sees a valid proof of each claim in the block. The proof must be a ZK-SNARK attestation statement, i.e., the concatenation of transaction_and_witness_blobs is the serialization of the (Block,Witness) pair, the execution block is valid on pre_state_root using Witness (i), and (ii) outputs the correct post_state_root. Potentially, clients can choose to wait for M-of-N for multiple attestation types.

There is a philosophical note here that the block execution itself can be treated as something that only needs to be checked along with one of the triples (σpre, σpost, Proof) provided in the ZKEVMClaimTransaction object. As a result, a user’s ZK-EVM implementation can replace its execution client, which will still be used by (i) provers and block builders, and (ii) nodes that care about indexing and storing data for local use.

Verification and re-attestation

Let’s say you have two ETH clients, one of which uses PSE ZK-EVM and the other uses Polygon ZK-EVM. Suppose that by this point, both implementations have evolved to the point where they can prove ETH block execution in under 5 seconds, and for each proof system, there are enough independent volunteers running hardware to generate proofs.

Unfortunately, since individual attestation systems are not formalized, they cannot be incentivized in the protocol, however, we anticipate that the cost of running a proof-of-proof node will be lower compared to the cost of research and development, so we can simply fund proof-of-proof nodes through a common body funding for public goods.

Let’s say someone publishes a ZKEvmClaimNetworkTransaction, unless they only publish a proof of the PSE ZK-EVM version. Seeing this, the Proof node of the Polygon ZK-EVM calculates and republishes the object with the Proof of Polygon ZK-EVM.

This increases the total maximum latency between the earliest honest node accepting a block and the latest honest node accepting the same block from δ to 2 δ+Tprove (assuming Tprove < 5 seconds here).

The good news, however, is that if we take single-slot determinism, we can almost certainly “pipeline” this extra latency along with the multi-round consensus latency inherent to SSFs. For example, in this 4-slot proposal, the “head vote” step may only need to check the basic block validity, but the “freeze and confirm” step will require the existence of a proof.

Extension: Support “almost-EVM”

A desirable goal of the ZK-EVM feature is to support “almost-EVM”: EVMs with a few extra features. This could include new precompilation, new opcodes, allowing contracts to be written in EVMs or completely different VMs (e.g., in Arbitrum Stylus), or even multiple parallel EVMs with synchronous cross-communication.

Some modifications can be supported in a simple way: we can define a language that allows ZKEVMClaimTransaction to pass the full description of the modified EVM rules. This can be used to:

· Custom gas cost table (users are not allowed to reduce gas costs, but can increase them)

· Disable certain opcodes

· Set the block number (this will mean that there will be different rules depending on the hard fork)

· Set a flag that activates the full set of EVM changes that have been standardized for L2 use instead of L1 use, or other simpler changes

In order to allow users to add new functionality by introducing new precompiled (or opcodes) in a more open way, we can add a content containing precompiled input/output transcripts to a part of the blob of ZKEVMClaimNetworkTransaction:

The EVM execution will be modified as follows. Array inputs will be initialized as empty. Each time the address in used_precompile_addresses is called, we append the InputsRecord(callee_address, gas, input_calldata) object to the inputs and set the call’s RETURNDATA to outputs[i]。 Finally, we check that used_precompile_addresses has been called len(outputs) a total of times, and that inputs_commitments matches the result of the SSZ serialization of the generated blob commitments to the inputs. The purpose of exposing inputs_commitments is to facilitate external SNARKs to prove the relationship between inputs and outputs.

Note the asymmetry between the input and output, where the input is stored in a hash and the output is stored in the bytes that must be provided. This is because the execution needs to be done by a client that only sees the input and understands the EVM. EVM executions have already generated inputs for them, so they only need to check if the generated inputs match the declared inputs, which only requires hash checking. However, the output must be provided to them in full form, so it must be data available.

Another useful feature might be to allow “privileged transactions”, i.e., transactions that initiate calls from any sender account. These transactions can run between two other transactions, or when calling precompile in another (and possibly privileged) transaction. This can be used to allow non-EVM mechanisms to call back to the EVM.

This design can be modified to support new or modified opcodes, in addition to new or modified precompilation. Even with only precompilation, this design is very powerful. For example:

· By setting used_precompile_addresses, including a list of regular account addresses with certain flags set in their account objects in the state, and making an SNARK to prove that it was built correctly, you can support Arbitrum Stylus-style features where the code for the contract can be written in EVM or WASM (or other VMs). Privileged transactions can be used to allow WASM accounts to be called back to the EVM.

· By adding an external check to ensure that the input/output transcriptions and privileged transactions performed by multiple EVMs are matched in the correct way, you can demonstrate a parallel system of multiple EVMs communicating with each other over a synchronization channel.

· The fourth type of ZK-EVM can work by having multiple implementations: one that converts Solidity or other high-level languages directly into SNARK-friendly VMs, and another that compiles it into EVM code and executes it in the enshrined ZK-EVM. The second (and inevitably slower) implementation only runs if the error prover sends a transaction stating that there is an error, and if they are able to provide a transaction that is handled differently by the two, they can be rewarded.

· A purely asynchronous VM can be achieved by making all calls return zero and mapping the calls to privileged transactions that are added to the end of the block.

Extension: Support for stateful attesters

One of the challenges with the above design is that it is completely stateless, which makes it less efficient in terms of data utilization. By supporting stateful compression, ERC 20 sends can save up to 3x more space than when using stateless compression alone, using ideal data compression.

In addition, stateful EVMs do not need to provide witness data. In both cases, the principle is the same: it is a waste to ask for data to be available, because we already know that the data is usable because it was input or produced by a previous execution of the EVM.

If we want the ZK-EVM feature to be stateful, then we have two options:

Require σpre to be either empty, or a list of data available for a pre-declared key-value pair, or some previously executed σpost.
Add the blob commitment of a block-generated receipt R to the (σpre,σpost, Proof) tuple. Any previously generated or used blob promises, including those representing blocks, witnesses, receipts, or even regular EIP-4844 blob transactions, which may have some time limit, can be referenced in ZKEVMClaimTransaction and accessed during its execution (possibly through a series of instructions: "Insert byte N of commitment i at position j of block + witness data… N+k− 1 」）。

(1) Basically saying: instead of solidifying stateless EVM verification, we’re going to solidify EVM child chains. (2) Essentially creating a built-in minimal stateful compression algorithm that uses a previously used or generated blob as a dictionary. Both of these add the burden of storing more information to the proof node, which is only the proof node, and in case (2), it is easier to limit this burden to a certain amount of time than in case (1).

The debate between closed multi-proof systems and off-chain data

A closed multi-prover system avoids many of the above complexities by fixing a certain number of proofs in an M-of-N structure. In particular, closed multi-prover systems don’t need to worry about ensuring that data is on-chain. In addition, a closed multi-attester system will allow ZK-EVM proofs to be executed off-chain, making them compatible with EVM plasma solutions.

However, closed multi-prover systems increase governance complexity and reduce auditability, which is a trade-off between the high cost and these benefits.

If we establish ZK-EVM as a protocol feature, what will be the ongoing role of the “L2 project”?

The protocol will handle the EVM verification features that the L2 teams currently implement themselves, but the L2 project will still be responsible for a number of important features:

Fast pre-confirmation: Single-slot finality can slow down L1 slots, and L2 projects are already providing their users with “pre-confirmation” backed by L2’s own security, with much lower latency than one slot. This service will continue to be a pure L2 liability.

MEV mitigation strategies: This may include encrypted mempools, reputation-based sequential selection, and other features that L1s are reluctant to implement.

Extensions to the EVM: L2 projects can incorporate substantial extensions to the EVM to provide significant value to their users. This includes both “almost-EVMs” and fundamentally different approaches like Arbitrum Stylus’ WASM support and SNARK’s friendly Cairo language.

Convenience for users and developers: L2 teams do a lot of work to attract users and projects to their ecosystem and make them feel welcome, and they are compensated by acquiring MEV and congestion fees within their network. This relationship is here to stay.

Link to original article

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.