Shielded Upgradability

Last week, we deployed Penumbra Testnet 64, "Titan". Unlike our most recent releases, which landed headline features like bringing shielded transactions to the web and privacy to IBC, this release focused on laying less user-visible but nonetheless critical foundations for a mainnet launch of Penumbra. In particular, Titan brings support for shielded chain upgrades, allowing participants in the Penumbra network to use the chain to coordinate on new software versions and to seamlessly preserve users' private data across a chain upgrade boundary.

A common theme in Penumbra's development is that solving on-chain privacy requires reimagining every part of the stack. Why? Because on-chain privacy moves end-user data off-chain and onto end-user devices, and that means those end-user devices must be responsible for processing and storing that data. Every shielded transaction is a micro-rollup.

Chain upgrades appear simple, but are no exception to this theme. On a transparent chain, all state is public. This means that if a chain upgrade needs different on-chain data structures, the chain upgrade can simply change the data as part of the upgrade. Fundamentally, this is no different than doing a schema migration as part of deploying a centralized service, except that each node operator does the same schema migration to their replica of the chain state.
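To make the transparent-chain case concrete, here is a minimal sketch of such a migration. Everything in it is hypothetical (the state layout, the `migrate` function, and the `state_root` digest standing in for an app hash); it only illustrates that every operator can run the same deterministic rewrite over public state and arrive at the same result.

```python
import hashlib
import json


def state_root(state: dict) -> str:
    """Toy stand-in for an app hash: a digest of the canonically encoded state."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def migrate(state: dict) -> dict:
    """Hypothetical upgrade: replace bare integer balances with structured records."""
    return {
        account: {"amount": balance, "locked": 0}
        for account, balance in state.items()
    }


# Two node operators hold the same public state (insertion order may differ).
replica_a = {"alice": 10, "bob": 5}
replica_b = {"bob": 5, "alice": 10}

# Each operator runs the same deterministic migration on their own replica.
migrated_a = migrate(replica_a)
migrated_b = migrate(replica_b)
```

Because the migration reads only public data, both replicas end up with the same digest. None of this is available when the data lives on user devices.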

Private State as Constitutional Governance

On Penumbra, however, all user data is private, and recorded off-chain. Shielded transactions only publish cryptographic commitments to new state fragments (akin to UTXOs) and encrypted payload blobs visible only to the sender and receiver. End-user devices scan minified versions of these payload blobs to detect and download relevant transactions, and then decrypt and locally index their own transaction data.
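The shape of this can be sketched with a toy. The code below is not Penumbra's cryptography and is not secure: it assumes a single shared key between sender and receiver, uses a hash-based keystream in place of a real cipher, and uses a blinded SHA-256 hash in place of a real commitment scheme. The names `publish` and `try_decrypt` are made up for illustration. It only shows the split between what is published (a commitment and an opaque blob) and what a client recovers by trial decryption.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. For illustration only."""
    blocks = []
    for counter in range((length + 31) // 32):
        blocks.append(hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest())
    return b"".join(blocks)[:length]


def publish(key: bytes, fragment: bytes) -> dict:
    """What becomes public: a blinded commitment plus an encrypted payload."""
    blinding = os.urandom(32)
    nonce = os.urandom(16)
    plaintext = blinding + fragment
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    return {
        "commitment": hashlib.sha256(plaintext).digest(),
        "nonce": nonce,
        "ciphertext": ciphertext,
        "tag": hmac.new(key, nonce + ciphertext, hashlib.sha256).digest(),
    }


def try_decrypt(key: bytes, record: dict):
    """Trial decryption: return the fragment if this key matches, else None."""
    expected = hmac.new(key, record["nonce"] + record["ciphertext"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        return None  # not addressed to this key; nothing is learned
    stream = _keystream(key, record["nonce"], len(record["ciphertext"]))
    plaintext = bytes(a ^ b for a, b in zip(record["ciphertext"], stream))
    if hashlib.sha256(plaintext).digest() != record["commitment"]:
        return None  # payload does not open the published commitment
    return plaintext[32:]  # strip the blinding factor


shared_key = os.urandom(32)
record = publish(shared_key, b"10 tokens")
```

A client holding `shared_key` recovers `b"10 tokens"` from `record`; any other key gets `None`.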

Because user data is not recorded on-chain, a chain upgrade cannot alter it with a migration. Moreover, the encrypted payload blobs must be preserved exactly as-is, in order to allow clients to sync their private state. In effect, all user data is fundamentally immutable on a shielded chain.

Ever since on-chain governance was developed and deployed, there has been debate about what its scope and limits should be. The Cosmos ecosystem has seen different projects take different approaches, from the more libertarian bent of Celestia, which "rejects kings, presidents, and voting", to the majoritarian approach of Juno, which voted to confiscate tokens (then) worth $35M from an account the community felt had acted improperly.

Private state provides a way to resolve this tension, preserving the community's ability to coordinate on protocol changes while protecting individual protocol users from the tyranny of the majority. Private state is constitutionally protected, because it is technically impossible to modify by governance votes or even protocol changes. Protocol changes that affect the interpretation of private state cannot discriminate against specific users, and must affect all users of the protocol equally. On the other hand, public shared state can be modified, allowing the protocol to evolve over time, and the community can coordinate on-chain to signal approval or disapproval of code changes, counterbalancing the power of any set of protocol developers (who, in the absence of on-chain governance, might be able to form an informal cabal to push through code changes).

Backwards-Compatible Formats

The first technical challenge with shielded chain upgrades is that we need end-user data formats to have strong guarantees about backwards compatibility. The only way for users to understand their own activity is to download and decrypt their transactions. So, client software must be able to parse all previous transaction data, meaning that we can only make backwards-compatible changes to our transaction format. Luckily, Penumbra transactions are protobufs, which were designed around exactly this kind of compatibility guarantee.
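As a rough illustration of why protobufs fit here, the sketch below hand-decodes the protobuf wire format for a made-up message. The schemas, field names, and field numbers are hypothetical and are not Penumbra's transaction format. It shows the two properties that matter: a newer reader fills in defaults for fields that old data never set, and an older reader skips fields it does not know.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, schema: dict) -> dict:
    """Decode known fields; unknown fields are skipped, missing ones keep defaults."""
    out = {name: default for name, default in schema.values()}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (bytes, strings, submessages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 1:  # fixed 64-bit
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 5:  # fixed 32-bit
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in schema:
            out[schema[field_number][0]] = value
    return out


# Hypothetical schemas: version 2 adds field 2 without renumbering field 1.
SCHEMA_V1 = {1: ("amount", 0)}
SCHEMA_V2 = {1: ("amount", 0), 2: ("memo", b"")}

OLD_TX = bytes.fromhex("089601")                    # field 1 = 150
NEW_TX = bytes.fromhex("089601120774657374696e67")  # field 1 = 150, field 2 = "testing"

old_data_new_reader = decode(OLD_TX, SCHEMA_V2)
new_data_old_reader = decode(NEW_TX, SCHEMA_V1)
```

The first case is the one that matters most for a shielded chain: today's client must still be able to read every transaction ever written.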

This backwards compatibility issue doesn't just affect the transaction data, though. It also affects the way private state is committed to on-chain. For instance, each time a transaction creates a new output note, a commitment to that output note is added to an append-only Merkle tree. Then a future transaction can privately spend the note, by proving (in ZK) that the note's commitment was previously included in the tree of valid state commitments, but not revealing which one. Great. But what happens if we want to evolve the note format, or add a new kind of data? What if we don't just want to record fungible tokens, but a private interchain account controlled from within the shielded pool?
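The inclusion check at the heart of this can be sketched in the clear. The toy below uses a plain SHA-256 binary Merkle tree with zero-padding, which is an assumption for illustration and not the hash or tree layout Penumbra uses; in the real protocol the path verification happens inside a zero-knowledge proof, so neither the leaf nor the path is revealed.

```python
import hashlib

EMPTY = b"\x00" * 32  # padding for levels with an odd number of nodes


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def merkle_root_and_path(leaves: list, index: int):
    """Return (root, path); path holds (sibling, sibling_is_left) from leaf to root."""
    level = list(leaves)
    path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(EMPTY)
        sibling = index ^ 1  # the other child of the same parent
        path.append((level[sibling], sibling < index))
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


def verify(leaf: bytes, path: list, root: bytes) -> bool:
    """Recompute the root from a leaf and its authentication path."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling, node) if sibling_is_left else h(node, sibling)
    return node == root


# Three toy note commitments appended in order.
commitments = [hashlib.sha256(note).digest() for note in (b"note-0", b"note-1", b"note-2")]
root, path = merkle_root_and_path(commitments, 2)
```

Here `verify(commitments[2], path, root)` succeeds, while a commitment that was never appended fails against the same root.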

We've been thinking about this part of the problem for a while, and actually laid the foundation for it last year, based on our work implementing shielded swaps -- another example of how focusing on the specific shielded DEX use case has helped us build the foundation for general-purpose interchain privacy.

Our approach has two parts:

1. Rather than a note commitment tree, as in Zcash, Penumbra has a generic state commitment tree (SCT). The SCT commits to arbitrary state fragments, but makes no assumptions about what kind of states they are. Each state fragment can only be used once, enforced using a global nullifier set. The SCT is implemented using a tiered structure, allowing clients to accelerate sync, skipping over sections of the tree that aren't relevant to them, and ensuring that their tree sync work only scales with their activity, not with the total on-chain activity.
2. Client scanning processes StatePayload messages that use a Protobuf oneof to enumerate possible kinds of encrypted state fragments. Because oneofs are non-exhaustive and identified by field number, we can add more data types or evolve existing ones in a backwards-compatible way.
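The two parts can be sketched together in a toy model. Nothing here mirrors the real SCT or StatePayload definitions: a list stands in for the tree, a set for the nullifier set, the fragment kinds are hypothetical strings rather than oneof field numbers, and the nullifier derivation is deliberately simplistic.

```python
import hashlib


def commit(kind: str, body: bytes) -> bytes:
    """Toy commitment: a hash over the fragment kind and body (no blinding)."""
    return hashlib.sha256(kind.encode() + b"\x00" + body).digest()


class ToyShieldedState:
    """A list and a set standing in for the SCT and the global nullifier set."""

    def __init__(self):
        self.commitments = []    # append-only; agnostic to fragment kind
        self.nullifiers = set()  # one entry per consumed fragment

    def append(self, commitment: bytes) -> int:
        self.commitments.append(commitment)
        return len(self.commitments) - 1  # position of the new fragment

    def consume(self, commitment: bytes) -> None:
        # Real nullifiers are derived so they cannot be linked to commitments;
        # this toy derivation is linkable and only demonstrates single use.
        nullifier = hashlib.sha256(b"nullifier" + commitment).digest()
        if nullifier in self.nullifiers:
            raise ValueError("state fragment already consumed")
        self.nullifiers.add(nullifier)


def understood(payloads: list, known_kinds: set) -> list:
    """Client side: keep payload kinds this client version knows, skip the rest."""
    return [p for p in payloads if p["kind"] in known_kinds]


state = ToyShieldedState()
payloads = [
    {"kind": "note", "body": b"10 tokens"},
    {"kind": "swap", "body": b"swap claim"},
    {"kind": "interchain_account", "body": b"hypothetical future kind"},
]
# Every kind of fragment lands in the same commitment structure.
positions = [state.append(commit(p["kind"], p["body"])) for p in payloads]
```

A client that only knows `{"note", "swap"}` keeps the first two payloads and ignores the third, while all three commitments sit side by side in the same structure.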

Together, this allows Penumbra to evolve over time and expand to new use-cases without having to rebuild large parts of the system, while allowing different kinds of state fragments to share the same anonymity set.

Historical Data Access

The second challenge is providing clients with access to historical transaction data. Penumbra is built on CometBFT/Tendermint, which was originally designed for the needs of the Cosmos SDK, where these problems are not a concern. When performing a chain upgrade, CometBFT accepts a snapshot of the application state, and discards all historical block and transaction data. In effect, the upgrade starts a new chain whose first block height is one greater than the old chain's last block. A migration of the chain state can be performed during the upgrade.
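The handoff described above can be pictured as building a fresh genesis for the post-upgrade chain. This is only a sketch of the general shape: the key names are modeled on CometBFT's genesis file, but the helper function, the example values, and the assumption that the migrated state is passed through `app_state` are illustrative and not a description of pd's actual upgrade procedure.

```python
def post_upgrade_genesis(chain_id: str, last_height: int,
                         migrated_state: dict, app_hash: str) -> dict:
    """Hypothetical helper: describe the chain that resumes after an upgrade."""
    return {
        "chain_id": chain_id,
        # The new chain's first block follows directly after the old chain's last.
        "initial_height": str(last_height + 1),
        # Digest of the migrated application state (placeholder value below).
        "app_hash": app_hash,
        # Snapshot of application state carried across the boundary.
        "app_state": migrated_state,
    }


genesis = post_upgrade_genesis(
    chain_id="example-testnet",           # placeholder name
    last_height=1_000,                    # placeholder halt height
    migrated_state={"schema_version": 2}, # placeholder state
    app_hash="00" * 32,                   # placeholder digest
)
```

Note what is absent: nothing in this structure carries the old chain's blocks or transactions, which is exactly the gap a shielded chain has to fill.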

To explain how we made this work, we need to describe Penumbra's client sync process. Data management has always been our primary concern, so this was the very first thing we designed, all the way back in Testnet 1. As the chain records new shielded state fragments, it assembles them into a data structure called a CompactBlock. The CompactBlock is a minified version of a full block, stripped of everything except the minimal data required for clients to be able to scan and detect transactions.

Full nodes stream CompactBlocks to clients using gRPC, and the clients scan them locally. If the client does not detect any relevant data, the SCT's tiered design allows it to fast-forward the update and throw away the CompactBlock data. But when it does detect data addressed to that user, the client needs to fetch the full transaction contents to understand the context, so it downloads the full block, extracts the user's transactions, and indexes them locally. This process is very fast. On the live Penumbra testnets, where we're currently running an ongoing on-chain auction for setup ceremony slots, we regularly see pcli stream and scan at rates upwards of 10,000 CompactBlocks per second, or 50,000x faster than real time.
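The control flow of this loop can be sketched as follows. The types and the `sync` function are hypothetical, and detection is reduced to a plain recipient-tag comparison, whereas a real client detects its data cryptographically. The last lines work through the arithmetic behind the quoted speedup; the five-second block interval is inferred from the two figures above, not stated independently.

```python
from dataclasses import dataclass, field


@dataclass
class CompactBlock:
    height: int
    recipients: list  # toy detection data: one recipient tag per state fragment


@dataclass
class ClientView:
    synced_height: int = 0
    skipped_blocks: int = 0
    transactions: list = field(default_factory=list)


def sync(view, me, compact_blocks, fetch_full_block):
    """Scan compact blocks; fetch a full block only when something is detected."""
    for block in compact_blocks:
        if me not in block.recipients:
            view.skipped_blocks += 1  # nothing relevant: fast-forward and discard
        else:
            for tx in fetch_full_block(block.height):
                if tx["to"] == me:
                    view.transactions.append(tx)  # index locally
        view.synced_height = block.height
    return view


FULL_BLOCKS = {
    1: [{"to": "bob", "memo": "rent"}],
    2: [{"to": "alice", "memo": "coffee"}, {"to": "bob", "memo": "lunch"}],
    3: [],
}
stream = [CompactBlock(h, [tx["to"] for tx in txs]) for h, txs in FULL_BLOCKS.items()]
view = sync(ClientView(), "alice", stream, FULL_BLOCKS.__getitem__)

# 10,000 blocks/s at 50,000x real time implies one block roughly every 5 seconds.
blocks_per_second = 10_000
speedup = 50_000
implied_block_interval_s = speedup / blocks_per_second
```

In this run the client skips two of the three blocks outright and fetches full contents only for block 2.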

While handling the CompactBlocks was easy (they're recorded as part of the chain state, so we just leave the historical data in place and continue the sequence post-upgrade), handling extended transaction fetching was trickier. In the end, we built a transaction cache into pd that copies all of the transaction data into pd's state, so we can serve it directly to clients without needing to proxy requests through to cometbft.
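A toy model of the idea, with every name hypothetical (the `ToyNode` class, the `tx_cache/` key prefix, and the method names are not pd's actual storage layout): transactions are written to two places, and only the copy held in application state survives the upgrade boundary.

```python
class ToyNode:
    def __init__(self):
        self.consensus_blocks = {}  # consensus engine's block store, dropped at upgrade
        self.app_state = {}         # application state, carried across upgrades

    def commit_block(self, height: int, txs: list) -> None:
        self.consensus_blocks[height] = list(txs)
        # Cache a copy of the transactions inside application state.
        self.app_state[f"tx_cache/{height}"] = list(txs)

    def upgrade(self) -> None:
        # Historical block and transaction data is discarded by the upgrade.
        self.consensus_blocks = {}

    def transactions_at(self, height: int) -> list:
        # Serve from application state, with no dependency on the block store.
        return self.app_state.get(f"tx_cache/{height}", [])


node = ToyNode()
node.commit_block(1, [b"tx-a", b"tx-b"])
node.commit_block(2, [b"tx-c"])
node.upgrade()
node.commit_block(3, [b"tx-d"])  # the sequence continues post-upgrade
```

After the upgrade, a client syncing from scratch can still ask for the transactions at height 1.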

In the long term, we want to move this data completely off-chain, and make use of decentralized infrastructure to host it. Our upgrade support leaves the door open for us to do that in the future, by changing what the chain's public state is. In the short term, though, we need Penumbra to be usable today, so we currently rely on full nodes to supply clients with transaction contents.

What's Next

We plan to exercise this logic in the new year, shifting from our previous model, where each testnet deployment was reset, to one where we upgrade from testnet to testnet, practicing chain upgrades that preserve user data, and ensuring that chain upgrade functionality is tested in real-world conditions.

Stay tuned for more updates on our pathway to mainnet. Slowly, and then all at once.