Shielded Upgradability

Last week, we deployed Penumbra Testnet 64, "Titan". Unlike our most recent releases, which landed headline features like bringing shielded transactions to the web and privacy to IBC, this release focused on laying less user-visible, but nonetheless critical, foundations for a mainnet launch of Penumbra. In particular, Titan brings support for shielded chain upgrades, allowing participants in the Penumbra network to use the chain to coordinate on new software versions, and to seamlessly preserve users' private data across a chain upgrade boundary.

A common theme in Penumbra's development is that solving on-chain privacy requires reimagining every part of the stack. Why? Because on-chain privacy moves end-user data off-chain and onto end-user devices, and that means those end-user devices must be responsible for processing and storing that data. Every shielded transaction is a micro-rollup.

Chain upgrades appear simple, but are no exception to this theme. On a transparent chain, all state is public. This means that if a chain upgrade needs different on-chain data structures, the chain upgrade can simply change the data as part of the upgrade. Fundamentally, this is no different than doing a schema migration as part of deploying a centralized service, except that each node operator does the same schema migration to their replica of the chain state.

Private State as Constitutional Governance

On Penumbra, however, all user data is private, and recorded off-chain. Shielded transactions only publish cryptographic commitments to new state fragments (akin to UTXOs) and encrypted payload blobs visible only to the sender and receiver. End-user devices scan minified versions of these payload blobs to detect and download relevant transactions, and then decrypt and locally index their own transaction data.

Because user data is not recorded on-chain, a chain upgrade cannot alter it with a migration. Moreover, the encrypted payload blobs must be preserved exactly as-is, in order to allow clients to sync their private state. In effect, all user data is fundamentally immutable on a shielded chain.

Ever since on-chain governance was developed and deployed, there has been debate about what its scope and limits should be. The Cosmos ecosystem has seen different projects take different approaches, from the more libertarian bent of Celestia, which "rejects kings, presidents, and voting", to the majoritarian approach of Juno, which voted to confiscate tokens (then) worth $35M from an account the community felt had acted improperly.

Private state provides a way to resolve this tension, preserving the community's ability to coordinate on protocol changes while protecting individual protocol users from the tyranny of the majority. Private state is constitutionally protected, because it is technically impossible to modify by governance votes or even protocol changes. Protocol changes that affect the interpretation of private state cannot discriminate against specific users, and must affect all users of the protocol equally. On the other hand, public shared state can be modified, allowing the protocol to evolve over time, and the community can coordinate on-chain to signal approval or disapproval of code changes, counterbalancing the power of any set of protocol developers (who, in the absence of on-chain governance, might be able to form an informal cabal to push through code changes).

Backwards-Compatible Formats

The first technical challenge with shielded chain upgrades is that we need end-user data formats to have strong guarantees about backwards compatibility. The only way for users to understand their own activity is to download and decrypt their transactions. So, client software must be able to parse all previous transaction data, meaning that we can only make backwards-compatible changes to our transaction format. Luckily, Penumbra transactions are protobufs, which were designed around exactly this kind of compatibility guarantee.

This backwards compatibility issue doesn't just affect the transaction data, though. It also affects the way private state is committed to on-chain. For instance, each time a transaction creates a new output note, a commitment to that output note is added to an append-only Merkle tree. Then a future transaction can privately spend the note, by proving (in ZK) that the note's commitment was previously included in the tree of valid state commitments, but not revealing which one. Great. But what happens if we want to evolve the note format, or add a new kind of data? What if we don't just want to record fungible tokens, but a private interchain account controlled from within the shielded pool?
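The inclusion check described above can be sketched in a few lines. This is a toy model only: SHA-256 stands in for whatever circuit-friendly hash a shielded chain would use, the tree has a small fixed depth, and the check runs in the clear, whereas a real spend proves the same statement in zero knowledge without revealing the leaf position. All function names here are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """Hash helper. SHA-256 is a stand-in, not the chain's actual hash."""
    return hashlib.sha256(b"".join(parts)).digest()


EMPTY = h(b"empty-leaf")  # padding value for unused leaf slots


def build_levels(leaves, depth):
    """Return every level of a fixed-depth tree, leaves first, root last."""
    level = list(leaves) + [EMPTY] * (2**depth - len(leaves))
    levels = [level]
    for _ in range(depth):
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels, index):
    """Collect the sibling hash at each level for the leaf at `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling differs only in the low bit
        index //= 2
    return path


def verify(root, commitment, index, path):
    """Recompute the root from a commitment and its authentication path."""
    node = commitment
    for sibling in path:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root


# Three hypothetical note commitments appended to a depth-2 tree.
commitments = [h(b"note", bytes([i])) for i in range(3)]
levels = build_levels(commitments, depth=2)
root = levels[-1][0]
path = auth_path(levels, 1)
print(verify(root, commitments[1], 1, path))
```

Because the verifier only ever sees opaque commitments, nothing in this check depends on what the committed data is, which is what makes it possible to generalize beyond notes.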

We've been thinking about this part of the problem for a while, and actually laid the foundation for it last year, based on our work implementing shielded swaps -- another example of how focusing on the specific shielded DEX use case has helped us build the foundation for general-purpose interchain privacy.

Our approach has two parts:

  1. Rather than a note commitment tree, as in Zcash, Penumbra has a generic state commitment tree (SCT). The SCT commits to arbitrary state fragments, but makes no assumptions about what kind of states they are. Each state fragment can only be used once, enforced using a global nullifier set.  The SCT is implemented using a tiered structure, allowing clients to accelerate sync, skipping over sections of the tree that aren't relevant to them, and ensuring that their tree sync work only scales with their activity, not with the total on-chain activity.
  2. Client scanning processes StatePayload messages that use a Protobuf oneof to enumerate possible kinds of encrypted state fragments. Because oneofs are non-exhaustive and identified by field number, we can add more data types or evolve existing ones in a backwards-compatible way.

Together, these two choices allow Penumbra to evolve over time and expand to new use cases without having to rebuild large parts of the system, while letting different kinds of state fragments share the same anonymity set.
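The two parts of the approach can be illustrated with a toy model. The sketch below is not Penumbra's wire format: the field numbers, the `StatePayload` fields, and the `NullifierSet` class are hypothetical stand-ins. It shows only the shape of the idea, namely that a client built against an older schema skips payload kinds it does not recognize instead of failing, and that a global set of nullifiers rejects a second use of the same state fragment.

```python
from dataclasses import dataclass

# Hypothetical field numbers standing in for the variants of a Protobuf oneof.
# A newer protocol version could add 3, 4, ... without changing 1 and 2.
KNOWN_KINDS = {1: "note", 2: "swap"}


@dataclass(frozen=True)
class StatePayload:
    kind: int    # which oneof variant is set, identified by field number
    body: bytes  # encrypted state fragment, interpreted according to `kind`


def scan(payloads):
    """Split payloads into ones this client understands and ones it skips."""
    understood, skipped = [], []
    for p in payloads:
        if p.kind in KNOWN_KINDS:
            understood.append((KNOWN_KINDS[p.kind], p.body))
        else:
            skipped.append(p)  # unknown variant: ignore, do not error
    return understood, skipped


class NullifierSet:
    """Global set enforcing that each state fragment is consumed at most once."""

    def __init__(self):
        self._seen = set()

    def spend(self, nullifier: bytes) -> bool:
        """Record a nullifier; return False if it was already revealed."""
        if nullifier in self._seen:
            return False
        self._seen.add(nullifier)
        return True


# An old client scanning a stream that contains a newer, unknown kind (7).
stream = [StatePayload(1, b"a"), StatePayload(7, b"b"), StatePayload(2, b"c")]
understood, skipped = scan(stream)

nullifiers = NullifierSet()
first = nullifiers.spend(b"nf-1")
second = nullifiers.spend(b"nf-1")
```

Note that the nullifier check never looks at the kind of fragment being consumed, which mirrors the point that the commitment tree makes no assumptions about what it commits to.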

Historical Data Access

The second challenge is providing clients with access to historical transaction data. Penumbra is built on CometBFT/Tendermint, which was originally designed for the needs of the Cosmos SDK, where these problems are not a concern. When performing a chain upgrade, CometBFT accepts a snapshot of the application state, and discards all historical block and transaction data. In effect, the upgrade starts a new chain whose first block height is one greater than the old chain's last block. A migration of the chain state can be performed during the upgrade.

To explain how we made this work, we need to describe Penumbra's client sync process. Data management has always been our primary concern, so this was the very first thing we designed, all the way back in Testnet 1. As the chain records new shielded state fragments, it assembles them into a data structure called a CompactBlock. The CompactBlock is a minified version of a full block, stripped of everything except the minimal data required for clients to be able to scan and detect transactions.

Full nodes stream CompactBlocks to clients using gRPC, where they are scanned locally. If the client does not detect any relevant data, the tiered design of the SCT allows it to fast-forward the update and throw away the CompactBlock data. But when it does detect data addressed to that user, the client needs to fetch the full transaction contents to understand the context, so it downloads the full block, extracts the user's transactions, and indexes them locally. This process is very fast. On the live Penumbra testnets, where we're currently running an ongoing on-chain auction for setup ceremony slots, we regularly see pcli stream and scan at rates upwards of 10,000 CompactBlocks per second, or 50,000x faster than real time.
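The scan loop described above can be sketched as follows. This is a simplified model, not pcli's implementation: a plain tag comparison stands in for cryptographic detection, and `CompactBlock`, `sync`, and `fetch_full_block` are hypothetical names. The `speedup` helper reproduces the quoted figure under the assumption of a block interval of about 5 seconds, which is what 10,000 blocks per second and 50,000x together imply.

```python
from dataclasses import dataclass, field


@dataclass
class CompactBlock:
    height: int
    # (detection tag, payload) pairs; the tag is a stand-in for trial detection
    fragments: list = field(default_factory=list)


def sync(blocks, my_tag, fetch_full_block):
    """Scan compact blocks, fetching full blocks only where something matches."""
    indexed = {}
    fetched = 0
    for block in blocks:
        if not any(tag == my_tag for tag, _ in block.fragments):
            continue  # nothing for this user: fast-forward and discard
        fetched += 1
        full = fetch_full_block(block.height)
        indexed[block.height] = [tx for tx in full if tx["tag"] == my_tag]
    return indexed, fetched


def speedup(blocks_per_second, block_interval_seconds):
    """Ratio of scan rate to the chain's real-time block production rate."""
    return blocks_per_second * block_interval_seconds


# Toy chain: only block 2 contains anything addressed to "alice".
full_blocks = {
    1: [{"tag": "bob", "data": "x"}],
    2: [{"tag": "alice", "data": "y"}, {"tag": "bob", "data": "z"}],
    3: [],
}
compact = [
    CompactBlock(h, [(tx["tag"], b"") for tx in txs])
    for h, txs in full_blocks.items()
]
indexed, fetched = sync(compact, "alice", full_blocks.__getitem__)
```

In this model the client's download cost is proportional to `fetched`, the number of blocks that actually contain its data, rather than to the length of the chain.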

While handling the CompactBlocks was easy (they're recorded as part of the chain state, so we just leave the historical data in place and continue the sequence post-upgrade), handling extended transaction fetching was trickier. In the end, we built a transaction cache into pd, and copy all of the transaction data into pd's state, so we can serve it directly to clients, without needing to proxy requests through to cometbft.
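A toy model may help make the division of responsibility concrete. Everything below is hypothetical (`NodeState`, `ConsensusStore`, and the function names are illustrative, and copying transactions at commit time is an assumption about when the cache is filled). The point it illustrates is that data held in application state survives an upgrade that wipes the consensus engine's block store, and that heights continue from the old chain's last block.

```python
class NodeState:
    """Toy stand-in for application state that is carried across an upgrade."""

    def __init__(self):
        self.compact_blocks = {}  # height -> compact block summary
        self.tx_cache = {}        # height -> list of raw transactions


class ConsensusStore:
    """Toy stand-in for the consensus engine's historical block store."""

    def __init__(self):
        self.blocks = {}

    def reset(self):
        self.blocks.clear()  # history is discarded at the upgrade boundary


def commit_block(state, store, height, txs):
    """Record a block in both stores (assumed here to happen at commit time)."""
    store.blocks[height] = list(txs)
    state.compact_blocks[height] = {"height": height, "n_txs": len(txs)}
    state.tx_cache[height] = list(txs)


def upgrade(state, store):
    """Discard consensus history; return the first height of the new chain."""
    store.reset()
    return max(state.compact_blocks) + 1


def get_transactions(state, height):
    """Serve transactions from application state, not the consensus store."""
    return state.tx_cache.get(height, [])


state, store = NodeState(), ConsensusStore()
for height, txs in [(1, [b"tx-a"]), (2, [b"tx-b", b"tx-c"]), (3, [])]:
    commit_block(state, store, height, txs)

next_height = upgrade(state, store)
commit_block(state, store, next_height, [b"tx-d"])
```

After the upgrade, a client asking for block 2 is still answered from `tx_cache`, even though the consensus store no longer holds that block.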

In the long term, we want to move this data completely off-chain, and make use of decentralized infrastructure to host it. And our upgrade support leaves the door open for us to do that in the future, by changing what the chain's public state is. In the short term, though, we need Penumbra to be usable today, so we currently rely on full nodes to supply clients with transaction contents.

What's Next

We plan to exercise this logic in the new year, shifting from our previous model, where each testnet deployment was reset, to one where we upgrade from testnet to testnet, practicing chain upgrades that preserve user data, and ensuring that chain upgrade functionality is tested in real-world conditions.

Stay tuned for more updates on our pathway to mainnet. Slowly, and then all at once.