Ethereum Vision Planning: How The Purge Balances Permanence and Complexity

2025-07-18 01:42:00

The Possible Future of Ethereum: The Purge

One of the challenges facing Ethereum is that, by default, the bloat and complexity of any blockchain protocol tend to grow over time. This happens in two places:

Historical data: Any transaction made or any account created at any point in history must be stored permanently by all clients and downloaded by any new client that wants to fully sync with the network. This causes client load and sync time to keep increasing, even if the chain's capacity stays the same.
Protocol Functionality: Adding new features is much easier than removing old ones, leading to increased code complexity over time.

For Ethereum to sustain itself in the long term, we need strong counter-pressure against both of these trends, reducing complexity and bloat over time. At the same time, we need to preserve one of the key properties that makes blockchains great: permanence. You can put an NFT, a love letter in transaction calldata, or a smart contract containing a million dollars on the chain, go into a cave for ten years, and come out to find it still there waiting for you to read and interact with. For DApps to feel comfortable going fully decentralized and removing their upgrade keys, they need to be confident that their dependencies will not upgrade in ways that break them, especially the L1 itself.

Balancing these two needs, and minimizing or reversing bloat, complexity, and decay while preserving continuity, is absolutely possible if we put our minds to it. Living organisms can do it: while most age over time, a lucky few do not. Even social systems can have extremely long lifespans. On a few occasions, Ethereum has already shown success: proof of work is gone, the SELFDESTRUCT opcode is mostly gone, and beacon chain nodes already store old data for only up to six months. Finding this path for Ethereum in a more general way, and moving toward a stable long-term outcome, is the ultimate challenge for Ethereum's long-term scalability, technical sustainability, and even security.

The Purge: key goals

Reduce client storage requirements by reducing or removing the need for every node to permanently store all history, and perhaps eventually even state.
Reduce protocol complexity by eliminating unnecessary features.

Table of Contents:

History expiry
State expiry
Feature cleanup

History expiry

What problem does it solve?

At the time of writing, a fully synced Ethereum node requires roughly 1.1 TB of disk space for the execution client, plus several hundred more GB for the consensus client. The vast majority of this is history: data about historical blocks, transactions, and receipts, most of which is many years old. This means that the size of a node keeps growing by hundreds of GB each year even if the gas limit does not increase at all.

What is it and how does it work?

A key simplifying property of the history storage problem is that, because each block points to the previous block via a hash link (and other structures), having consensus on the present is enough to have consensus on history. As long as the network has consensus on the latest block, any historical block, transaction, or state (account balance, nonce, code, storage) can be provided by any single actor along with a Merkle proof, and the proof allows anyone else to verify its correctness. While consensus is an N/2-of-N trust model, history is a 1-of-N trust model.
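
The 1-of-N property can be illustrated with a minimal sketch of Merkle proof checking. This assumes a plain binary SHA-256 tree over a handful of made-up "blocks"; Ethereum's actual structures (Merkle Patricia tries, SSZ trees) differ in detail, and the helper names here are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs,
    ordered from the leaf level up to the root.
    """
    layer = [h(leaf) for leaf in leaves]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(layer[-1])  # toy rule: duplicate the last node on odd layers
        sibling = index ^ 1
        proof.append((layer[sibling], sibling < index))
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return layer[0], proof

def verify(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# One untrusted peer supplies the data and the proof;
# the verifier only needs the root it already agrees on.
blocks = [f"block-{i}".encode() for i in range(5)]
root, proof = merkle_root_and_proof(blocks, 3)
print(verify(blocks[3], proof, root))   # honest data
print(verify(b"forged", proof, root))   # tampered data
```

The verifier never has to trust the peer: a single honest holder of the data is enough, because forged data fails the check against the agreed root.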

This gives us many options for how to store history. One natural option is a network in which each node stores only a small percentage of the data. This is how torrent networks have worked for decades: while the network as a whole stores and distributes millions of files, each participant stores and distributes only a few of them. Perhaps counterintuitively, this approach does not necessarily even decrease the robustness of the data. If making node running more affordable gets us to a network with 100,000 nodes, where each node stores a random 10% of the history, then each piece of data would be replicated 10,000 times: exactly the same replication factor as a 10,000-node network where each node stores everything.
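
The replication arithmetic above can be checked with a short sketch. The function name is hypothetical, and the calculation assumes every node picks its share of history uniformly at random, so the result is the expected number of copies of each piece.

```python
def replication_factor(num_nodes: int, percent_stored: int) -> int:
    """Expected number of copies of each piece of history.

    Each of num_nodes nodes independently stores percent_stored percent
    of the data, so on average that share of the nodes holds any given piece.
    """
    return num_nodes * percent_stored // 100

partial = replication_factor(100_000, 10)   # many nodes, each storing 10%
full = replication_factor(10_000, 100)      # fewer nodes, each storing everything
print(partial, full)
```

Both configurations yield the same expected replication, which is the point of the paragraph: spreading data thinly over more nodes need not weaken redundancy.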

Ethereum has already started to move away from the model of all nodes storing all history forever. Consensus blocks (i.e., the parts related to proof-of-stake consensus) are only stored for about 6 months. Blobs are only stored for about 18 days. EIP-4444 aims to introduce a one-year storage period for historical blocks and receipts. The long-term goal is to have a harmonized period (possibly around 18 days) during which each node is responsible for storing everything, and then to have a peer-to-peer network made up of Ethereum nodes that stores older data in a distributed way.

Erasure codes can be used to increase robustness while keeping the replication factor the same. In fact, blobs are already erasure-coded in order to support data availability sampling. The simplest solution may well be to reuse this erasure coding and put execution and consensus block data into blobs as well.
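
To make the idea of erasure coding concrete, here is a toy sketch using a single XOR parity chunk. This is deliberately much simpler than the polynomial-based coding used for blobs, and it tolerates the loss of only one chunk; the function names and sample data are illustrative assumptions.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Append one parity chunk: the XOR of all equal-length data chunks."""
    parity = reduce(xor_bytes, chunks)
    return list(chunks) + [parity]

def recover(coded, missing_index):
    """Rebuild the single missing chunk by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(coded) if i != missing_index]
    return reduce(xor_bytes, survivors)

data = [b"hist", b"ory!", b"data"]
coded = encode(data)            # 4 chunks stored instead of 3
print(recover(coded, 1))        # chunk 1 is rebuilt from the other three
```

Here any one lost chunk is recoverable with 4/3 storage overhead, whereas plain replication needs 2x overhead to survive one loss. Stronger codes extend the same trade-off to many lost chunks.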

How does it relate to existing research?

EIP-4444;
Torrents and EIP-4444;
Portal Network;
Portal Network and EIP-4444;
Distributed storage and retrieval of SSZ objects in Portal;
How to increase gas limit (Paradigm).

What is left to do, and what are the trade-offs?

The main remaining work involves building and integrating a concrete distributed solution for storing history: at least execution history, but ultimately also consensus and blobs. The simplest solutions are (i) to simply introduce an existing torrent library and (ii) an Ethereum-native solution called the Portal Network. Once either of these is introduced, we can turn EIP-4444 on. EIP-4444 itself does not require a hard fork, but it does require a new network protocol version. For this reason, there is value in enabling it for all clients at the same time; otherwise, clients risk malfunctioning when they connect to other nodes expecting to download the full history but do not actually get it.

The main trade-off involves how hard we try to keep "ancient" historical data available. The easiest solution is to simply stop storing ancient history tomorrow and rely on existing archive nodes and various centralized providers for replication. This is easy, but it weakens Ethereum's position as a place to make permanent records. The harder but safer path is to first build and integrate a torrent network for storing history in a distributed way. Here, "how hard we try" has two dimensions:

1. How hard do we try to make sure that a maximally large set of nodes really is storing all the data?
2. How deeply do we integrate the historical storage into the protocol?

A maximally paranoid approach for (1) would involve proof of custody: requiring each proof-of-stake validator to store some percentage of history and regularly checking cryptographically that they do so. A more moderate approach is to set a voluntary standard for what percentage of history each client stores.

For (2), a basic implementation involves only the work that is already done today: Portal already stores ERA files containing the entire Ethereum history. A more thorough implementation would actually hook this up to the syncing process, so that anyone who wants to sync a full-history-storing node or an archive node can do so by syncing directly from the Portal Network, even if no other archive nodes are online.

How does it interact with other parts of the roadmap?

If we want to make it extremely easy to run or spin up a node, then reducing history storage requirements is arguably even more important than statelessness: of the 1.1 TB a node needs, about 300 GB is state and the remaining roughly 800 GB is history. The vision of an Ethereum node running on a smartwatch and taking only a few minutes to set up is achievable only if both statelessness and EIP-4444 are implemented.

Limiting history storage also makes it more viable for newer Ethereum node implementations to support only recent versions of the protocol, which allows them to be much simpler. For example, many lines of code can now be safely removed because the empty storage slots created during the 2016 DoS attacks have all been removed. Now that the switch to proof of stake is ancient history, clients can safely remove all proof-of-work-related code.

State expiry

What problem does it solve?

Even if we remove the need for clients to store history, a client's storage requirement will continue to grow by about 50 GB per year because of ongoing growth of the state: account balances and nonces, contract code and contract storage. Users can pay a one-time cost to impose a burden on present and future Ethereum clients forever.

State is much harder to "expire" than history, because the EVM is fundamentally designed around the assumption that once a state object is created, it will always be there and can be read by any transaction at any time. If we introduce statelessness, there is an argument that this problem may not be so bad: only a specialized class of block builders would need to actually store state, and all other nodes (even inclusion list producers!) could run statelessly. However, there is also an argument that we do not want to lean on statelessness too much, and that eventually we may want to expire state to keep Ethereum decentralized.

![Vitalik: The possible future of Ethereum, The Purge](https://img-cdn.gateio.im/webp-social/moments-a97b8c7f7927e17a3ec0fa46a48c9f24.webp)

What is it and how does it work?

Today, when you create a new state object (which can happen in one of three ways: (i) sending ETH to a new account, (ii) creating a new account with code, (iii) setting a previously untouched storage slot), that state object stays in the state forever. What we want instead is for objects to expire automatically over time. The key challenge is doing this in a way that accomplishes three goals:

Efficiency: Running the expiry process should not require large amounts of extra computation.
User-friendliness: If someone goes into a cave for five years and comes back, they should not lose access to their ETH, ERC20s, NFTs, CDP positions...
Developer-friendliness: Developers should not have to switch to a completely unfamiliar mental model. In addition, applications that are ossified today and no longer updated should continue to work reasonably well.

The problem is easy to solve if we do not have to satisfy these goals. For example, you could have each state object also store a counter for its expiry date (which could be extended by burning ETH, possibly automatically any time it is read or written), and have a process that loops through the state to remove expired state objects. However, this introduces extra computation (and even storage requirements), and it definitely does not satisfy the user-friendliness requirement. Developers would also have a hard time reasoning about edge cases in which storage values sometimes reset to zero. If you make the expiry timer contract-wide, this makes life technically easier for developers, but it makes the economics harder: developers would have to think about how to "pass through" the ongoing costs of storage to their users.
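
The naive per-object design described above can be sketched as follows, mainly to show where its drawbacks come from. All names and constants (`EXTENSION_BLOCKS`, `TOUCH_FEE`, `NaiveExpiringState`) are hypothetical and do not correspond to any actual proposal's parameters.

```python
from dataclasses import dataclass

EXTENSION_BLOCKS = 100  # hypothetical lifetime granted by each touch
TOUCH_FEE = 1           # hypothetical amount of ETH burned per touch

@dataclass
class StateObject:
    value: int
    expires_at: int     # the extra per-object counter (extra storage)

class NaiveExpiringState:
    def __init__(self):
        self.objects = {}
        self.burned = 0

    def write(self, key, value, block):
        self.burned += TOUCH_FEE
        self.objects[key] = StateObject(value, block + EXTENSION_BLOCKS)

    def read(self, key, block):
        obj = self.objects.get(key)
        if obj is None or obj.expires_at <= block:
            return 0    # expired values silently read as zero
        self.burned += TOUCH_FEE
        obj.expires_at = block + EXTENSION_BLOCKS
        return obj.value

    def sweep(self, block):
        """Loop over the whole state and delete expired objects (extra computation)."""
        expired = [k for k, o in self.objects.items() if o.expires_at <= block]
        for k in expired:
            del self.objects[k]
        return len(expired)

state = NaiveExpiringState()
state.write("balance", 5, block=0)
state.sweep(block=100)
print(state.read("balance", block=101))  # the value is gone and reads as zero
```

The sketch makes the three failures visible: `sweep` walks the entire state, every object carries an extra counter, and a user who stays away past the lifetime comes back to a zeroed value.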

These are problems that the Ethereum core development community struggled with for many years, including proposals like "blockchain rent" and "regenesis". Eventually, we combined the best parts of these proposals and converged on two categories of "known least bad solutions":

Partial state expiry solutions
Address-period-based state expiry proposals

Partial state expiry

Partial state expiry proposals all work along the same principle. We split the state into chunks. Everyone permanently stores a "top-level map" of which chunks are empty or non-empty. The data within each chunk is stored only if it has been accessed recently. There is a "resurrection" mechanism for bringing a chunk back if it is no longer stored.
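
The chunk-and-resurrect principle can be sketched roughly as below. This is a toy model under stated assumptions: the top-level map is assumed to hold a SHA-256 commitment per non-empty chunk, and resurrection is assumed to work by supplying the full chunk data and checking it against that commitment. The class and method names are hypothetical, and real proposals use tree-based proofs rather than whole-chunk hashes.

```python
import hashlib

def commit(chunk: dict) -> bytes:
    """Toy commitment: hash of the chunk's sorted key/value pairs."""
    return hashlib.sha256(repr(sorted(chunk.items())).encode()).digest()

class PartialExpiryState:
    def __init__(self, expiry_blocks: int):
        self.expiry_blocks = expiry_blocks
        self.top_level = {}    # chunk_id -> commitment, stored by everyone forever
        self.chunks = {}       # chunk_id -> data, kept only while recently accessed
        self.last_access = {}  # chunk_id -> block of the most recent access

    def _live_chunk(self, chunk_id, block):
        if chunk_id not in self.chunks:
            raise LookupError("chunk expired; resurrect it first")
        self.last_access[chunk_id] = block
        return self.chunks[chunk_id]

    def read(self, chunk_id, key, block):
        if chunk_id not in self.top_level:
            return 0           # chunk is known to be empty
        return self._live_chunk(chunk_id, block).get(key, 0)

    def write(self, chunk_id, key, value, block):
        if chunk_id not in self.top_level:
            self.chunks[chunk_id] = {}   # first write creates the chunk
        chunk = self._live_chunk(chunk_id, block)
        chunk[key] = value
        self.top_level[chunk_id] = commit(chunk)

    def expire(self, block):
        """Drop chunk data not touched recently; commitments are kept."""
        for chunk_id in list(self.chunks):
            if block - self.last_access[chunk_id] >= self.expiry_blocks:
                del self.chunks[chunk_id]

    def resurrect(self, chunk_id, data, block):
        """Anyone can restore a chunk by supplying data matching the commitment."""
        if commit(data) != self.top_level[chunk_id]:
            raise ValueError("data does not match the stored commitment")
        self.chunks[chunk_id] = dict(data)
        self.last_access[chunk_id] = block

state = PartialExpiryState(expiry_blocks=10)
state.write("chunk-1", "alice", 7, block=0)
state.expire(block=10)                              # data dropped, commitment kept
state.resurrect("chunk-1", {"alice": 7}, block=11)  # restored from a saved copy
print(state.read("chunk-1", "alice", block=11))
```

Because the commitment never expires, a user returning after a long absence loses nothing as long as a copy of the chunk data exists somewhere; nodes only pay ongoing storage for chunks that are actually in use.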

![Vitalik: The Possible Future of Ethereum, The Purge](https://img-cdn.gateio.im/webp-social/moments-5cd0e9908a04986f83c85cabecd4a0ae.webp)
