Penumbra is a fully private proof-of-stake network providing privacy to the Cosmos ecosystem.
Penumbra integrates privacy with proof-of-stake through a novel private delegation mechanism that provides staking derivatives, tax-efficient staking, and on-chain governance with private voting. Penumbra connects to the Cosmos ecosystem via IBC, acting as an ecosystem-wide shielded pool and allowing private transactions in any IBC-compatible asset. Users can also swap these assets using ZSwap, a private decentralized exchange supporting sealed-bid batch auctions and Uniswap-v3-style concentrated liquidity. Sealed-bid batch auctions prevent front-running, provide better execution, and reveal only the net flow across a pair of assets in each block. Liquidity positions are created anonymously, allowing traders to approximate their desired trading function without revealing their individual beliefs about prices.
This website is a living document containing unfinished public research, and is subject to revision. The updates section has a list of design updates.
If you’re interested in technical discussion about these ideas, why not
 join the discord,
 check out the repo and issue tracker,
 or follow the project on Twitter for updates.
Slides from a presentation at the ZKValidator Cosmos Privacy Showcase can be found here.
Overview
Private Transactions
Penumbra records all value in a single multi-asset shielded pool based on the Zcash Sapling design, but allows private transactions in any kind of IBC asset. Inbound IBC transfers shield value as it moves into the zone, while outbound IBC transfers unshield value.
Unlike Zcash, Penumbra has no notion of transparent transactions or a transparent value pool; instead, inbound IBC transfers are analogous to t2z Zcash transactions, outbound IBC transfers are analogous to z2t Zcash transactions, and the entire Cosmos ecosystem functions analogously to Zcash’s transparent pool.
Unlike the Cosmos Hub or many other chains built on the Cosmos SDK, Penumbra has no notion of accounts. Only validators have any kind of long-term identity, and this identity is only used (in the context of transactions) for spending the validator’s commission.
Private Staking
In a proof-of-stake system like the Cosmos Hub, stakeholders delegate staking tokens by bonding them to validators. Validators participate in Tendermint consensus with voting power determined by their delegation size, and delegators receive staking rewards in exchange for taking on the risk of being penalized for validator misbehavior (slashing).
Integrating privacy and proof of stake poses significant challenges. If delegations are public, holders of the staking token must choose between privacy on the one hand and staking rewards and participation in consensus on the other hand. Because the majority of the stake will be bonded to validators, privacy becomes an uncommon, opt-in case. But if delegations are private, issuing staking rewards becomes very difficult, because the chain no longer knows the amount and duration of each address’ delegations.
Penumbra sidesteps this problem using a new mechanism that eliminates staking rewards entirely, treating unbonded and bonded stake as separate assets, with an epoch-varying exchange rate that prices in what would be a staking reward in other systems. This mechanism ensures that all delegations to a particular validator are fungible, and can be represented by a single token representing a share of that validator’s delegation pool, in effect a first-class staking derivative. Although delegation fungibility is key to enabling privacy, as a side effect, delegators do not realize any income while their stake is bonded, only a capital gain (or loss) on unbonding.
The total amount of stake bonded to each validator is part of the public chain state and determines consensus weight, but the bonded stake itself is just another token to be recorded in a multi-asset shielded pool. This provides accountability for validators and privacy and flexibility for delegators, who can trade and transact with their bonded stake just like they can with any other token.
It also provides an alternate perspective on the debate between fixed-supply and inflation-based rewards. Choosing the unbonded token as the numéraire, delegators are rewarded by inflation for taking on the risk of validator misbehavior, and the token supply grows over time. Choosing the bonded token as the numéraire, non-delegators are punished by depreciation for not taking on any risk of misbehavior, and the token supply is fixed.
Private Governance
Like the Cosmos Hub, Penumbra supports on-chain governance with delegated voting. Unlike the Cosmos Hub, Penumbra’s governance mechanism supports secret ballots. Penumbra users can anonymously propose votes by escrowing a deposit of bonded stake. Stakeholders vote by proving ownership of their bonded stake prior to the beginning of the voting period and encrypting their votes to a threshold key controlled by the validator set. Validators sum encrypted votes and decrypt only the per-epoch totals.
Private DEX
Penumbra provides private, sealed-bid batch swaps using ZSwap. ZSwap allows users to privately swap between any pair of assets. Individual swaps do not reveal trade amounts. Instead, all swaps in each block are executed in a single batch. Only the total amount in each batch is revealed, and only after the batch has been finalized. This prevents front-running and provides better execution, but also provides long-term privacy for individual swaps. Users can also provide liquidity by anonymously creating Uniswap-v3-style concentrated liquidity positions. These positions reveal the amount of liquidity and the bounds in which it is concentrated, but are not otherwise linked to any identity, so that (with some care) users can privately approximate arbitrary trading functions without revealing their specific views about prices.
Concepts and Mechanisms
This section provides an overview of the concepts involved in Penumbra’s mechanism design.
Epochs and Threshold Decryption
Penumbra organizes blocks into epochs. The number of blocks in each epoch is chosen relative to the block interval so that epochs last approximately 24 hours. For instance, an 8second block interval would imply $86400/8=10800$ blocks per epoch.
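A minimal sketch of this calculation (the 24-hour target comes from the text above; the helper name is ours):

```python
EPOCH_DURATION_SECS = 24 * 60 * 60  # epochs target approximately 24 hours

def blocks_per_epoch(block_interval_secs: int) -> int:
    """Blocks per epoch for a given block interval, per the 24-hour target."""
    return EPOCH_DURATION_SECS // block_interval_secs

# An 8-second block interval implies 86400 / 8 = 10800 blocks per epoch.
assert blocks_per_epoch(8) == 10800
```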
Most state changes (e.g., transfers from one address to another) are applied at block boundaries. However, state changes related to delegation and consensus are generally applied only at epoch boundaries.
The validator set and its consensus weighting are determined on a per-epoch basis and only change at epoch boundaries, except in the case of slashing, which removes a validator from the validator set immediately.
Penumbra requires a homomorphic encryption scheme operating on i64 values that supports threshold decryption and distributed key generation. This scheme is used to allow transactions to include encrypted values that can be aggregated and then decrypted, with the validators revealing only the aggregate value, not the value from any individual transaction.
The choice to restrict validator changes to epoch boundaries plays a key role in Penumbra’s design, in two ways:
 it allows the validators to share control of a single threshold key over the entire epoch, allowing aggregation of values in different blocks;
 it provides a time interval over which changes to validators’ delegations can be aggregated, enhancing delegator privacy.
Homomorphic Threshold Decryption
This encryption scheme only needs to work on i64 values, not arbitrary data such as an entire transaction. Penumbra does not use threshold decryption to unseal entire encrypted transactions, because Penumbra transactions are constructed not to reveal any unnecessary information.
At the beginning of each epoch, the validator set performs distributed key generation for a homomorphic encryption scheme to produce a decryption key collectively controlled by the validators (on an equal basis, not a stakeweighted basis) and includes the encryption key in the first block of the epoch.
Because this key is only available after the first block of each epoch, some transactions cannot occur in the first block itself. Assuming a block interval similar to the Cosmos Hub’s, this implies an ~8-second processing delay once per day, a reasonable tradeoff against the complexity of phased setup procedures.
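To illustrate the aggregate-then-decrypt pattern, the sketch below uses a toy, non-threshold Paillier cryptosystem with small fixed primes. This is purely illustrative: it is not the scheme, key sizes, or threshold/DKG protocol Penumbra would use, but it shows how ciphertexts from many transactions can be combined so that only the sum is ever decrypted:

```python
import math
import random

# Toy Paillier keypair with small fixed primes (illustration only; a real
# deployment would use large random primes and a *threshold* decryption key).
p, q = 1000003, 1000033
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    """Encrypt a small non-negative integer under the public key (n, g)."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Decrypt with the (here non-threshold) private key (lam, mu)."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Each transaction contributes an encrypted amount; anyone can multiply
# ciphertexts together to homomorphically add the underlying plaintexts.
amounts = [100, 250, 37]
aggregate = 1
for c in (encrypt(a) for a in amounts):
    aggregate = (aggregate * c) % n2

assert decrypt(aggregate) == sum(amounts)  # only the total is revealed
```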
Notes, Nullifiers, and Trees
Penumbra’s transaction model is derived from the Sapling shielded transaction design, with modifications to support multiple asset types, and several extensions to support additional functionality. The Sapling model is in turn derived from Bitcoin’s unspent transaction output (UTXO) model, in which value is recorded in transaction outputs that record the conditions under which they can be spent.
In Penumbra, value is recorded in notes, which function similarly to UTXOs. Each note specifies (either directly or indirectly) a type of value, an amount of value of that type, a spending key that authorizes spending the note’s value, and a unique nullifier derived from the note’s contents.
However, unlike UTXOs, notes are not recorded as part of the public chain state. Instead, the chain contains a note commitment tree, an incremental Merkle tree containing (public) commitments to (private) notes. Creating a note involves creating a new note commitment and proving that it is well-formed. Spending a note involves proving that the spent note was previously included in the note commitment tree, using the spending key to demonstrate spend authorization, and revealing the nullifier, which prevents the same note from being spent twice.
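This lifecycle can be sketched with toy stand-ins (an assumption-laden model: plain hashes stand in for the real commitment and nullifier derivations, a list stands in for the incremental Merkle tree, and zero-knowledge proofs are not modeled at all):

```python
import hashlib
from dataclasses import dataclass

def h(tag: bytes, *parts: bytes) -> bytes:
    """Toy hash standing in for the real commitment/nullifier derivations."""
    return hashlib.sha256(tag + b"".join(parts)).digest()

@dataclass(frozen=True)
class Note:
    asset: str      # type of value
    amount: int     # amount of value of that type
    spend_key: str  # stands in for the real spend authorization key

    def commitment(self) -> bytes:
        # public commitment to the private note contents
        return h(b"commit", self.asset.encode(),
                 self.amount.to_bytes(8, "big"), self.spend_key.encode())

    def nullifier(self) -> bytes:
        # unique per note; revealing it marks the note as spent
        return h(b"nullify", self.commitment(), self.spend_key.encode())

commitment_tree: list[bytes] = []  # stands in for the note commitment tree
nullifier_set: set[bytes] = set()

def create(note: Note) -> None:
    commitment_tree.append(note.commitment())

def spend(note: Note) -> bool:
    # spending requires (1) inclusion of the commitment, (2) a fresh nullifier
    if note.commitment() not in commitment_tree:
        return False
    if note.nullifier() in nullifier_set:
        return False  # double spend rejected
    nullifier_set.add(note.nullifier())
    return True

note = Note("pATOM", 100, "alice-key")
create(note)
assert spend(note) is True
assert spend(note) is False  # second spend of the same note fails
```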
Asset types are uniquely identified by an ADR001 name for compatibility with IBC, and Penumbra supports IBC transfers to transfer value into and out of the zone, allowing Penumbra’s shielded pool to privately record any asset in the Cosmos ecosystem. ADR001 precisely specifies the entire pathway to the origin of any token, but when this precision is not required, we denote the Penumbra bearer tokens for asset XYZ as pXYZ; e.g., transferring ATOM from the Hub to Penumbra results in pATOM.
Transactions
Transactions describe an atomic collection of changes to the ledger state. Each transaction consists of a sequence of descriptions for various actions. Each description adds or subtracts (typed) value from the transaction’s value balance, which must net to zero. Penumbra uses Sapling’s spend description, which spends a note and adds to the transaction’s value balance, and output description, which creates a new note and subtracts from the transaction’s value balance, and adds many new descriptions to support additional functionality:
 FeeDescriptions declare and burn transaction fees;
 TransferDescriptions transfer value out of Penumbra by IBC, consuming value from the transaction’s value balance and producing an ICS20 FungibleTokenPacketData;
 DelegateDescriptions convert unbonded stake to bonded stake, consuming unbonded stake from the transaction’s value balance and producing new notes recording bonded stake;
 UndelegateDescriptions convert bonded stake to unbonded stake, consuming bonded stake from the transaction’s value balance and producing new notes recording unbonded stake;
 CommissionDescriptions are used by validators to sweep commission on staking rewards into shielded notes, adding unbonded stake to the transaction’s value balance;
 ProposalDescriptions are used to propose measures for on-chain governance and supply a deposit, consuming bonded stake from the transaction’s value balance and producing a new note that holds the deposit in escrow;
 VoteDescriptions perform private voting for on-chain governance and declare a vote, consuming bonded stake from the transaction’s value balance and producing a new note with the same amount of bonded stake;
 SwapDescriptions consume and burn tokens of one type from a transaction’s value balance and produce a swap commitment that permits the user to later mint tokens of a different type;
 SweepDescriptions allow a user who burned tokens of one type to mint tokens of a different type at a chain-specified clearing price, adding the new tokens to a transaction’s value balance;
 DepositDescriptions deposit funds into the liquidity pool for a trading pair and produce liquidity pool shares, consuming value of the traded types and producing value of the shares’ type;
 WithdrawDescriptions redeem liquidity pool shares and withdraw funds from a liquidity pool, consuming the shares’ type and producing value of the traded types.
TODO
 add links to specifications in the Cryptography section
Staking and Delegation
As described in the overview, integrating privacy with proof-of-stake poses significant challenges. Penumbra sidesteps these challenges using a novel mechanism that eliminates staking rewards entirely, treating unbonded and bonded stake as separate assets, with an epoch-varying exchange rate that prices in what would be a staking reward in other systems. This mechanism ensures that all delegations to a particular validator are fungible, and can be represented by a single token representing a share of that validator’s delegation pool, in effect a first-class staking derivative.
This section describes the staking and delegation mechanism in detail:
 Staking Tokens describes the different staking tokens;
 Validator Rewards and Fees describes mechanics around validator commissions and transaction fees;
 Voting Power describes how validators’ voting power is calculated;
 Delegation describes how users bond stake to validators;
 Undelegation describes how users unbond stake from validators;
 Example Staking Dynamics contains a worked example illustrating the mechanism design.
Staking Tokens
Penumbra’s staking token, denoted PEN, represents units of unbonded stake. Penumbra treats stake bonded with a particular validator as a distinct asset, denoted PENb, with an epoch-varying exchange rate between PEN and PENb that prices in what would be a staking reward in other systems. This ensures that all delegations to a particular validator are fungible, and can be represented by a single token, in effect a first-class staking derivative that represents fractional ownership of that validator’s delegation pool.
Stake bonded with different validators is not fungible, as different validators may have different commission rates and different risk of misbehavior. Hence PENb is a shorthand for a class of assets (one per validator), rather than a single asset. PENb bonded to a specific validator $v$ can be denoted PENb(v) when it is necessary to be precise.
Each flavor of PENb is its own first-class token, and like any other token can be transferred between addresses, traded, sent over IBC, etc. Penumbra itself does not attempt to pool risk across validators, but nothing prevents third parties from building stake pools composed of these assets.
The base reward rate for bonded stake is a parameter $r_{e}$ indexed by epoch. This parameter can be thought of as a “Layer 1 Base Operating Rate”, or “L1BOR”, in that it serves as a reference rate for the entire chain. Its value is set on a per-epoch basis by a formula involving the ratio of bonded and unbonded stake, increasing when there is relatively less bonded stake and decreasing when there is relatively more. This formula should be decided and adjusted by governance.
Each validator declares a commission percentage $c_{v,e}∈[0,1]$, also indexed by epoch, which is subtracted from the base reward rate to get a validatorspecific reward rate $r_{v,e}=(1−c_{v,e})r_{e}.$
The base exchange rate between PEN and PENb is given by the function $ψ(e)=\prod_{0\le i<e}(1+r_{i}),$ which measures the cumulative depreciation of unbonded PEN relative to bonded PENb from genesis up to epoch $e$. However, because PENb is not a single asset but a family of per-validator assets, this is only a base rate.
The actual exchange rate between PEN and PENb(v) bonded to validator $v$ accounts for commissions by substituting the validator-specific rate $r_{v,e}$ in place of the base rate $r_{e}$ to get $ψ_{v}(e)=\prod_{0\le i<e}(1+r_{v,i}).$
Delegating $x$ unbonded PEN to validator $v$ at epoch $e_{1}$ results in $x/ψ_{v}(e_{1})$ PENb(v). Undelegating $y$ PENb(v) from validator $v$ at epoch $e_{2}$ results in $yψ_{v}(e_{2})$ PEN. Thus, delegating at epoch $e_{1}$ and undelegating at epoch $e_{2}$ results in a return of $ψ_{v}(e_{2})/ψ_{v}(e_{1})=\prod_{e_{1}\le e<e_{2}}(1+r_{v,e}),$ i.e., the staking reward compounded only over the period during which the stake was bonded.
Discounting newly bonded stake by the cumulative depreciation of unbonded stake since genesis means that all bonded stake can be treated as if it had been bonded since genesis, which allows newly unbonded stake to always be inflated by the cumulative appreciation since genesis. This mechanism avoids the need to track the age of any particular delegation to compute its rewards, and makes all shares of each validator’s delegation pool fungible.
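The round trip through the exchange rate can be checked numerically with a short sketch (the per-epoch rates below are illustrative, not protocol values):

```python
def psi(rates):
    """Cumulative exchange rate prod(1 + r_i) over the given per-epoch rates."""
    acc = 1.0
    for r in rates:
        acc *= 1 + r
    return acc

# validator-specific rates r_{v,i} for epochs 0..4 (illustrative values)
r_v = [0.0006, 0.0006, 0.0005, 0.0006, 0.0004]

e1, e2 = 1, 4
x = 10_000.0            # PEN delegated at epoch e1
y = x / psi(r_v[:e1])   # PENb(v) received, discounted back to genesis
out = y * psi(r_v[:e2]) # PEN received on undelegation at epoch e2

# The return compounds only over the bonded period [e1, e2):
expected = x * psi(r_v[e1:e2])
assert abs(out - expected) < 1e-9
```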
Validator Rewards and Fees
Validators declare a commission percentage $c_{v,e}∈[0,1]$, which determines the spread between the base reward rate $r_{e}$ and the reward rate for their delegators $r_{v,e}=(1−c_{v,e})r_{e}$, or equivalently $r_{e}=r_{v,e}+c_{v,e}r_{e}$.
Validator rewards are distributed in the first block of each epoch. In epoch $e$, a validator $v$ whose delegation pool has size $y_{v}$ PENb receives a commission of size $y_{v}c_{v,e}r_{e}ψ_{v}(e−1)$ PEN, issued to the validator’s address.
To see why this is the reward amount, suppose validator $v$ has a delegation pool of size $y_{v}$ PENb. In epoch $e−1$, the value of the pool is $y_{v}ψ_{v}(e−1)$ PEN. In epoch $e$, the base reward rate $r_{e}$ causes the value of the pool to increase to $(1+r_{e})y_{v}ψ_{v}(e−1).$ Splitting $r_{e}$ as $r_{e}=r_{v,e}+c_{v,e}r_{e}$, this becomes $y_{v}(1+r_{v,e})ψ_{v}(e−1)+c_{v,e}r_{e}y_{v}ψ_{v}(e−1).$
The value in the first term, $y_{v}(1+r_{v,e})ψ_{v}(e−1)$, corresponds to the $r_{v,e}$ portion and accrues to the delegators. Since $(1+r_{v,e})ψ_{v}(e−1)=ψ_{v}(e)$, this is exactly $y_{v}ψ_{v}(e)$, the new PEN-denominated value of the delegation pool.
The value in the second term, $c_{v,e}r_{e}y_{v}ψ_{v}(e−1)$, corresponds to the $c_{v,e}r_{e}$ portion and accrues to the validator as commission. Validators can self-delegate the resulting PEN or use it to fund their operating expenses.
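This accounting identity can be checked numerically (a sketch with illustrative constant rates, so that $ψ_{v}(e)=(1+r_{v})^{e}$):

```python
r = 0.0006          # base reward rate r_e (held constant for the check)
c = 0.1             # validator commission c_{v,e}
r_v = (1 - c) * r   # validator-specific reward rate
y = 10_000.0        # delegation pool size in PENb

e = 90
psi_v = lambda e: (1 + r_v) ** e  # validator exchange rate from genesis

pool_before = y * psi_v(e - 1)         # PEN value of the pool in epoch e-1
pool_after = (1 + r) * pool_before     # value after applying the base rate
delegators = y * psi_v(e)              # first term: accrues to the pool
commission = c * r * y * psi_v(e - 1)  # second term: accrues to the validator

# delegators' share plus commission accounts for the whole reward
assert abs(pool_after - (delegators + commission)) < 1e-9
```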
Transaction fees are denominated in PEN and are burned, so that the value of the fees accrues equally to all stakeholders.
TODO
 allow transaction fees in PENb with appropriate discounting, but only in transactions (e.g., undelegations, voting) that otherwise reveal the flavor of PENb.
Voting Power
The size of each validator’s delegation pool (i.e., the amount of PENb of that validator’s flavor) is public information and determines the validator’s voting power. However, these sizes cannot be used directly, because they are based on validator-specific conversion rates $ψ_{v}$ and are therefore incommensurate.
Voting power is calculated using the adjustment function $θ_{v}(e)=ψ_{v}(e)/ψ(e)$, so that a validator $v$ whose delegation pool has $y_{v}$ PENb in epoch $e$ has voting power $y_{v}θ_{v}(e)$.
The validator-specific reward rate $r_{v,e}=r_{e}−c_{v,e}r_{e}$ adjusts the base reward rate to account for the validator’s commission. Since $ψ_{v}(e)=\prod_{0\le i<e}(1+r_{v,i})$ and $ψ(e)=\prod_{0\le i<e}(1+r_{i})$, the adjustment function $θ_{v}(e)=\frac{ψ_{v}(e)}{ψ(e)}=\prod_{0\le i<e}\frac{1+r_{v,i}}{1+r_{i}}$ accounts for the compounded effect of the validator’s commission on the size of the delegation pool.
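Under the simplifying assumption of constant rates, $θ_{v}$ has a closed form that is easy to check numerically (a sketch; the pool size and commission below mirror the worked example later in this document):

```python
def theta(r: float, c: float, e: int) -> float:
    """Adjustment factor theta_v(e) = psi_v(e)/psi(e), assuming constant rates."""
    r_v = (1 - c) * r
    return ((1 + r_v) / (1 + r)) ** e

r = 0.0006
# A zero-commission validator's pool needs no adjustment:
assert theta(r, 0.0, 90) == 1.0

# A 10%-commission validator's pool is discounted slightly; with a pool of
# 10053.483 PENb at epoch 90, the voting power comes out near 9999.37.
power = 10_053.483 * theta(r, 0.1, 90)
assert abs(power - 9_999.37) < 0.02
```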
Delegation
The delegation process bonds stake to a validator, converting unbonded PEN to bonded PENb. Delegations may be performed in any block, but only take effect in the next epoch.
Delegations are accomplished by creating a transaction with a DelegateDescription. The delegate description specifies a validator $v$, consumes $x$ PEN from the transaction’s balance, produces a new shielded note with $y=x/ψ_{v}(e)$ PENb associated to that validator, and includes an encryption $Enc_{D}(y)$ of the delegation amount to the validators’ shared decryption key $D$. Here $e$ is the index of the next epoch, when the delegation will take effect.
In the last block of epoch $e−1$, the validators sum the encrypted bonding amounts $Enc_{D}(y_{i})$ from all delegate descriptions for validator $v$ in the entire epoch to obtain an encryption of the total bonding amount $Enc_{D}(∑_{i}y_{i})$, and decrypt it to obtain the total bonding amount $∑_{i}y_{i}$ without revealing any individual transaction’s bonding amount $y_{i}$. These totals are used to update the size of each validator’s delegation pool for the next epoch.
Revealing only the total amount of newly bonded stake in each epoch helps avoid linkability. For instance, if the size of each delegation were revealed directly, a delegation of size $a$ followed by an undelegation of size $b$ could be correlated if an observer notices that there are some epochs $e_{1},e_{2}$ such that $a\frac{ψ_{v}(e_{2})}{ψ_{v}(e_{1})}=b.$
This risk is still present when only the total amount (the minimum disclosure required for consensus) is revealed, because there may be few (or no) other delegations to the same validator in the same epoch. Some care should be taken in client implementations and user behavior to mitigate the effects of this information disclosure, e.g., by splitting delegations (and undelegations and votes) into multiple transactions in different epochs involving randomized sub-portions of the stake. However, the best mitigation would simply be to have many users.
TODO
 specify the exact data in a delegate description
Undelegation
The undelegation process unbonds stake from a validator, converting bonded PENb to unbonded PEN. Undelegations may be performed in any block, but only settle after the undelegation has exited the unbonding queue.
The unbonding queue is a FIFO queue allowing only a limited amount of stake to be unbonded in each epoch, according to an unbonding rate selected by governance. Undelegations are inserted into the unbonding queue in FIFO order. Unlike delegations, where only the total amount of newly bonded stake is revealed, undelegations reveal the precise amount of newly unbonded stake, allowing the unbonding queue to function.
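The queue’s behavior can be sketched as follows (an illustrative model: the per-epoch limit, the all-or-nothing application of entries, and the bookkeeping are our assumptions, not specified mechanics):

```python
from collections import deque

class UnbondingQueue:
    """FIFO queue releasing at most `limit` stake per epoch (a sketch)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.queue: deque[tuple[str, int]] = deque()  # (validator, amount)

    def push(self, validator: str, amount: int) -> None:
        self.queue.append((validator, amount))

    def process_epoch(self, slashed: set[str]) -> list[tuple[str, int]]:
        """At an epoch boundary: discard slashed validators' undelegations,
        then apply entries in FIFO order up to the per-epoch limit."""
        self.queue = deque(e for e in self.queue if e[0] not in slashed)
        applied, released = [], 0
        while self.queue and released + self.queue[0][1] <= self.limit:
            entry = self.queue.popleft()
            applied.append(entry)
            released += entry[1]
        return applied

q = UnbondingQueue(limit=1000)
q.push("victoria", 600)
q.push("william", 600)
q.push("victoria", 300)
# Only the first entry fits under this epoch's limit (600 + 600 > 1000):
assert q.process_epoch(slashed=set()) == [("victoria", 600)]
# Next epoch, the remaining entries both fit (600 + 300 <= 1000):
assert q.process_epoch(slashed=set()) == [("william", 600), ("victoria", 300)]
```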
Undelegations are accomplished by creating a transaction with an UndelegateDescription. Undelegation descriptions have different behavior depending on whether or not the validator was slashed.
In the unslashed case, the undelegate description spends a note with value $y$ PENb, reveals $y$, and produces $yψ_{v}(e)$ PEN for the transaction’s balance, where $e$ is the index of the current epoch. The nullifiers revealed by undelegate descriptions are not immediately included in the nullifier set, and new notes created by a transaction containing an undelegate description are not immediately included in the note commitment tree. Instead, the transaction is placed into the unbonding queue to be applied later. In the first block of each epoch, transactions are applied if the corresponding validator remains unslashed, until the unbonding limit is reached.
If a validator is slashed, any undelegate transactions currently in the unbonding queue are discarded. Because the nullifiers for the notes those transactions spent were not included in the nullifier set, the notes remain spendable, allowing a user to create a new undelegation description.
Undelegations from a slashed validator are settled immediately. The undelegate description spends a note with value $y$ PENb and produces $syψ_{v}(e_{s})$ PEN, where $1−s$ is the slashing penalty and $e_{s}$ is the epoch at which the validator was slashed. The remaining value, $(1−s)yψ_{v}(e_{s})$, is burned.
Because pending undelegations from a slashed validator are discarded without applying their nullifiers, those notes can be spent again in a post-slashing undelegation description. This causes linkability between the discarded undelegations and the post-slashing undelegations, but this is not a concern because slashing is a rare and unplanned event which already imposes worse losses on delegators.
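The settlement arithmetic can be sketched as follows (illustrative values; the helper name is ours, and `s` is the fraction of value retained after the slashing penalty $1−s$):

```python
def settle_slashed_undelegation(y: float, psi_v_es: float, s: float):
    """Settle an undelegation of y PENb from a validator slashed at epoch e_s.

    Returns (pen_out, pen_burned): the delegator receives s*y*psi_v(e_s) PEN,
    and the remaining (1-s)*y*psi_v(e_s) PEN is burned.
    """
    value = y * psi_v_es
    return s * value, (1 - s) * value

# e.g., a 5% slashing penalty (s = 0.95) on 10000 PENb at psi_v(e_s) = 1.05:
pen_out, burned = settle_slashed_undelegation(y=10_000, psi_v_es=1.05, s=0.95)
assert abs(pen_out - 9_975.0) < 1e-9  # 0.95 * 10000 * 1.05
assert abs(burned - 525.0) < 1e-9     # 0.05 * 10000 * 1.05
```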
Example Staking Dynamics
To illustrate the dynamics of this system, consider a toy scenario with three delegators, Alice, Bob, and Charlie, and two validators, Victoria and William. Tendermint consensus requires at least four validators and no one party controlling more than $1/3$ of the stake, but this example uses only a few parties just to illustrate the dynamics.
For simplicity, the base reward rate and commission rates are fixed over all epochs at $r=0.0006$ and $c_{v}=0$, $c_{w}=0.1$. The PEN and PENb holdings of participants $a,b,c,…$ are denoted by $x_{a}$, $y_{a}$, etc., respectively.
Alice starts with $y_{a}=10000$ PENb bonded to Victoria, Bob starts with $y_{b}=10000$ PENb bonded to William, and Charlie starts with $x_{c}=20000$ unbonded PEN.

At genesis, Alice, Bob, and Charlie respectively have fractions $25%$, $25%$, and $50%$ of the total stake, and fractions $50%$, $50%$, and $0%$ of the total voting power.

At epoch $e=1$, Alice, Bob, and Charlie’s holdings remain unchanged, but their unrealized notional values have changed.
 Victoria charges zero commission, so $ψ_{v}(1)=ψ(1)=1.0006$. Alice’s $y_{a}=10000$ PENb(v) is now worth $10006$ PEN.
 William charges $10%$ commission, so $ψ_{w}(1)=1.00054$. Bob’s $y_{b}=10000$ PENb(w) is now worth $10005.4$ PEN, and William receives $0.6$ PEN.
 William can use the commission to cover expenses, or self-delegate. In this example, we assume that validators self-delegate their entire commission, to illustrate the staking dynamics.
 William self-delegates $0.6$ PEN, to get $0.6/ψ_{w}(2)=0.6/1.00054^{2}=0.59935…$ PENb(w) in the next epoch, epoch $2$.

At epoch $e=90$:
 Alice’s $y_{a}=10000$ PENb(v) is now worth $10554.67$ PEN.
 Bob’s $y_{b}=10000$ PENb(w) is now worth $10497.86$ PEN.
 William’s self-delegation of accumulated commission has resulted in $y_{w}=53.483$ PENb(w).
 Victoria’s delegation pool remains at size $10000$ PENb(v). William’s delegation pool has increased to $10053.483$ PENb(w). However, their respective adjustment factors are now $θ_{v}(90)=1$ and $θ_{w}(90)=0.99462$, so the voting powers of their delegation pools are respectively $10000$ and $9999.37$. The slight loss of voting power for William’s delegation pool occurs because William self-delegates rewards with a one-epoch delay, thus missing one epoch of compounding.
 Charlie’s unbonded $x_{c}=20000$ PEN remains unchanged, but its value relative to Alice and Bob’s stake has declined. William’s commission transfers stake from Bob, whose voting power has slightly declined relative to Alice’s.
 The distribution of stake between Alice, Bob, Charlie, and William is now $25.67%$, $25.54%$, $48.65%$, and $0.14%$ respectively. The distribution of voting power is $50%$, $49.74%$, $0%$, and $0.27%$ respectively.
 Charlie decides to bond his stake, split evenly between Victoria and William, to get $10000/ψ_{v}(91)=9468.80$ PENb(v) and $10000/ψ_{w}(91)=9520.60$ PENb(w).

At epoch $e=91$:
 Charlie now has $9468.80$ PENb(v) and $9520.60$ PENb(w), worth $20000$ PEN.
 For the same amount of unbonded stake, Charlie gets more PENb(w) than PENb(v), because the exchange rate $ψ_{w}$ prices in the cumulative effect of commission since genesis, but Charlie isn’t charged for commission during the time he didn’t delegate to William.
 William’s commission for this epoch is now $1.233$ PEN, up from $0.633$ PEN in the previous epoch.
 The distribution of stake between Alice, Bob, Charlie, and William is now $25.68%$, $25.54%$, $48.64%$, and $0.14%$ respectively. Because all stake is now bonded, except William’s commission for this epoch, which is insignificant, the distribution of voting power is identical to the distribution of stake.

At epoch $e=180$:
 Alice’s $y_{a}=10000$ PENb(v) is now worth $11140.12$ PEN.
 Bob’s $y_{b}=10000$ PENb(w) is now worth $11020.52$ PEN.
 Charlie’s $y_{c,v}=9468.80$ PENb(v) is now worth $10548.37$ PEN, and his $y_{c,w}=9520.60$ PENb(w) is now worth $10492.20$ PEN.
 William’s self-delegation of accumulated commission has resulted in $y_{w}=158.77$ PENb(w), worth $176.30$ PEN.
 The distribution of stake and voting power between Alice, Bob, Charlie, and William is now $25.68%$, $25.41%$, $48.51%$, and $0.40%$ respectively.
This scenario was generated with a model in this Google Sheet.
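For readers without access to the sheet, the headline figures can also be reproduced directly from the exchange-rate formulas (a sketch assuming the constant rates of the example; figures agree with the text to the displayed precision):

```python
r = 0.0006           # base reward rate, fixed across epochs
r_w = (1 - 0.1) * r  # William's rate after his 10% commission

psi = lambda rate, e: (1 + rate) ** e  # constant-rate exchange rate

# Epoch-90 valuations of the initial 10000-PENb delegations:
alice = 10_000 * psi(r, 90)    # bonded to Victoria (zero commission)
bob = 10_000 * psi(r_w, 90)    # bonded to William (10% commission)
assert abs(alice - 10_554.67) < 0.02
assert abs(bob - 10_497.86) < 0.02

# Charlie bonds 10000 PEN to each validator, effective at epoch 91:
assert abs(10_000 / psi(r, 91) - 9_468.80) < 0.02
assert abs(10_000 / psi(r_w, 91) - 9_520.60) < 0.02
```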
Transfers into Penumbra
IBC transfer mechanics are specified in ICS20. The FungibleTokenPacketData packet describes the transfer:
FungibleTokenPacketData {
    denomination: string,
    amount: uint256,
    sender: string,
    receiver: string,
}
The sender and receiver fields are used to specify the sending account on the source chain and the receiving account on the destination chain. However, for inbound transfers, the destination chain is Penumbra, which has no accounts. Instead, token transfers into Penumbra create an OutputDescription describing a new shielded note with the given amount and denomination, and insert an encoding of the description itself into the receiver field.
Governance
Like the Cosmos Hub, Penumbra supports on-chain governance with delegated voting. Unlike the Cosmos Hub, Penumbra’s governance mechanism supports secret ballots. Penumbra users can anonymously propose votes by escrowing a deposit of bonded stake. Stakeholders vote by proving ownership of their bonded stake prior to the beginning of the voting period and encrypting their votes to a threshold key controlled by the validator set. Validators sum encrypted votes and decrypt only the per-epoch totals.
TODO
 emergency vs normal proposals, where emergency proposals are approved as soon as 2/3 of validators vote yes
 is “liquid democracy”-style voting possible? currently the only supported delegation is to validators, but it could be useful to support delegation to third parties. one way to do this would be to allow the proxy voter to create votes on behalf of their voting delegators, but this would require that the proxy spend and create their notes, which is a huge power. this could be minimized by creating a special “voting key” capability, but nothing ensures that the proxy voter actually creates valid notes when spending them, so they could still destroy their voting delegators’ power. another
Proposal Creation
Penumbra users can propose votes by escrowing a minimum amount of any flavor of PENb. They do this by creating a transaction with a ProposalDescription, which consumes some amount of PENb from the transaction’s balance and creates a new escrow note with the same amount. The escrow note is not included in the note commitment tree until voting completes. The voting period begins in the next epoch and lasts for a fixed number of epochs (e.g., 7).
TODO
 work out details enabling joint escrow (proposal crowdfunding)
 need proposition selection (only one proposition can be voted on at a time)
Private Voting
Votes are the same as on the Cosmos Hub: `Yes`, `No`, `NoWithVeto`, and `Abstain`. `NoWithVeto` is the same as `No` but also votes that the proposer should lose their deposit. The intended cultural norm is that `No` should be used to indicate disagreement with good-faith proposals and `NoWithVeto` should be used to deter spam proposals.
- this isn’t exactly the same as on the Hub, where there’s a distinct veto mechanic; is that mechanic good to copy?
By default, stakeholders’ votes are delegated to the validator their stake is bonded to. Validators’ votes are public, signed by the validator’s key, and act as a default vote for their entire delegation pool.
Stakeholders vote by proving ownership of some amount of bonded stake (their voting power) prior to the beginning of the voting period.
To do this, they create a transaction with a `VoteDescription`. This description reveals the validator $v$ and one of the four votes above, consumes $y$ `PENb(v)` from the transaction’s balance, produces a new note with $y$ `PENb`, and includes $Enc_{D}(y)$, an encryption of the voting power to the validators’ decryption key.
Descriptions that spend notes contain a proof-of-inclusion for the note they spend. This establishes that a commitment to the spent note was previously included in the note commitment tree, an incremental Merkle tree whose current root (“anchor”) is included in each block. Normally, these proofs are created and validated with respect to a recent anchor. However, transactions containing `VoteDescription`s are always validated with respect to the anchor in the last block before the voting period begins. This prevents double-voting, by ensuring that only notes created before voting began can be used to vote.
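The anchoring rule can be illustrated with a minimal Merkle proof-of-inclusion sketch. The hash function (SHA-256) and the binary tree shape here are stand-ins; Penumbra's actual note commitment tree uses a SNARK-friendly hash.

```python
# Minimal Merkle proof-of-inclusion sketch: proofs are checked against a
# fixed anchor (the root at the last block before voting began), so notes
# created afterwards cannot be used to vote.
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def merkle_root(leaves):
    level = [hashlib.sha256(l).digest() for l in leaves]
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Return the sibling path for leaves[index] (power-of-two leaf count)."""
    level = [hashlib.sha256(l).digest() for l in leaves]
    path = []
    while len(level) > 1:
        sibling = index ^ 1
        path.append((level[sibling], index % 2 == 0))
        index //= 2
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return path

def verify(anchor, leaf, path):
    node = hashlib.sha256(leaf).digest()
    for sibling, node_is_left in path:
        node = h(node, sibling) if node_is_left else h(sibling, node)
    return node == anchor

# Notes committed before voting began; the anchor is the root at that block.
notes = [b"note0", b"note1", b"note2", b"note3"]
anchor = merkle_root(notes)
proof = merkle_proof(notes, 2)
assert verify(anchor, b"note2", proof)
# A note created after the anchor was fixed cannot be proven against it.
assert not verify(anchor, b"late-note", proof)
```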
TODO
- vote acceptance thresholds / veto behavior
- encrypt voting power for each vote choice so that the vote does not reveal it
Counting Votes
At the end of each epoch, validators collect the encrypted votes from each delegation pool, aggregate the encrypted votes into encrypted tallies, and decrypt the tallies. At the end of the voting period, these per-epoch, per-validator tallies are combined as follows. For each validator $v$, the per-epoch tallies from that validator’s delegation pool are summed to obtain voting power subtotals for each vote option. Then, these subtotals are summed to determine the amount of the delegation pool that voted. The validator’s vote acts as a default vote for the rest of the delegation pool. Finally, these per-validator subtotals are multiplied by the voting power adjustment function $θ_{v}(e)$ to obtain the final vote totals.
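The combination arithmetic can be sketched as follows, under one plausible reading of the description above: per-epoch tallies have already been decrypted, the validator's public vote fills in the unvoted remainder of its pool, and `theta` stands in for the voting power adjustment function $θ_{v}(e)$. Exactly when $θ$ is applied, and all names here, are assumptions for illustration.

```python
# Sketch of combining decrypted per-epoch tallies for one validator's
# delegation pool. Names and the placement of theta are illustrative.
OPTIONS = ("yes", "no", "no_with_veto", "abstain")

def combine(epoch_tallies, pool_size, validator_vote, theta):
    """epoch_tallies: {epoch: {option: amount of pool stake voting that way}}
    theta: voting power adjustment function theta_v(e) for this validator."""
    totals = {opt: 0.0 for opt in OPTIONS}
    voted = 0.0
    for epoch, tally in epoch_tallies.items():
        for opt, amount in tally.items():
            totals[opt] += theta(epoch) * amount
            voted += amount
    # The validator's public vote is the default for stake that did not vote.
    last_epoch = max(epoch_tallies)
    totals[validator_vote] += theta(last_epoch) * (pool_size - voted)
    return totals

theta = lambda e: 1.0   # identity adjustment, for illustration
tallies = {1: {"yes": 10.0}, 2: {"no": 5.0}}
result = combine(tallies, pool_size=100.0, validator_vote="abstain", theta=theta)
assert result == {"yes": 10.0, "no": 5.0, "no_with_veto": 0.0, "abstain": 85.0}
```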
If the vote was not vetoed, the escrowed note from the `ProposalDescription` is included in the note commitment tree, so that it can be spent by the proposer. Otherwise, it is not, and the funds are burned.
ZSwap
Penumbra provides private, sealed-bid batch swaps using ZSwap. ZSwap allows users to privately swap between any pair of assets. Individual swaps do not reveal trade amounts. Instead, all swaps in each block are executed in a single batch. Only the total amount in each batch is revealed, and only after the batch has been finalized. This prevents front-running and provides better execution, but also provides long-term privacy for individual swaps. Users can also provide liquidity by anonymously creating Uniswap-v3-style concentrated liquidity positions. These positions reveal the amount of liquidity and the bounds in which it is concentrated, but are not otherwise linked to any identity, so that (with some care) users can privately approximate arbitrary trading functions without revealing their specific views about prices.
Frequent batch auctions as a market design response
Budish, Cramton, and Shim (2015) analyze trading in traditional financial markets using the predominant continuous-time limit order book market design, and find that high-frequency trading arises as a response to mechanical arbitrage opportunities created by flawed market design:
These findings suggest that while there is an arms race in speed, the arms race does not actually affect the size of the arbitrage prize; rather, it just continually raises the bar for how fast one has to be to capture a piece of the prize... Overall, our analysis suggests that the mechanical arbitrage opportunities and resulting arms race should be thought of as a constant of the market design, rather than as an inefficiency that is competed away over time.
— Eric Budish, Peter Cramton, John Shim, The High-Frequency Trading Arms Race: Frequent Batch Auctions as a Market Design Response
Because these mechanical arbitrage opportunities arise from the market design even in the presence of symmetrically observed public information, they do not improve prices or produce value, but create arbitrage rents that increase the cost of liquidity provision^{1}. Instead, the authors suggest changing from a continuous-time model to a discrete-time model and performing frequent batch auctions, executing all orders that arrive in the same discrete time step in a single batch with a uniform price.
In the blockchain context, while projects like Uniswap have demonstrated the power and success of CFMMs for decentralized liquidity provision, they have also highlighted the mechanical arbitrage opportunities created by the mismatch between continuous-time market designs and the state update model of the underlying blockchain, which operates in discrete batches (blocks). Each prospective state update is broadcast to all participants to be queued in the mempool, but is only committed as part of the next block, and while it is queued for inclusion in the consensus state, other participants can bid to manipulate its ordering relative to other state updates (for instance, front-running a trade).
This manipulation is enabled by two features:

1. Although trades are always committed in a batch (a block), they are performed individually, making them dependent on miners’ choices of the ordering of trades within a block;

2. Because trades disclose the trade amount in advance of execution, all other participants have the information necessary to manipulate them.
ZSwap addresses the first problem by executing all swaps in each block in a single batch: the amounts in each swap are first aggregated, and the aggregate is then executed against the CFMM as a single trade.
ZSwap addresses the second problem by having users encrypt their swap amounts using a homomorphic threshold decryption key controlled by the validators, who aggregate the encrypted amounts and decrypt only the batch trade. This prevents front-running prior to block inclusion, and provides privacy for individual trades (up to the size of the batch) afterwards.
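To see why batching removes ordering games, consider a toy constant-product pool: summing all swaps in a block and executing them as one trade yields a single uniform clearing price, whereas sequential execution gives each trader a price that depends on the ordering. Plain numbers stand in for the encrypted amounts here, and the no-fee pool math is a sketch, not ZSwap's actual trading function.

```python
# Batch execution against a constant-product pool (x * y = k): all swaps in
# a block are summed and executed as one trade, so the ordering of the
# individual swaps cannot change anyone's price.
def swap_exact_in(x, y, dx):
    """Trade dx of asset X into a pool with reserves (x, y); no fee."""
    k = x * y
    new_x = x + dx
    new_y = k / new_x
    return new_x, new_y, y - new_y   # updated reserves and output amount

# Two swaps in the same block, executed as a single batch trade.
x0, y0 = 1000.0, 1000.0
batch_in = 10.0 + 30.0
_, _, batch_out = swap_exact_in(x0, y0, batch_in)
clearing_price = batch_out / batch_in   # uniform price for the whole batch

# Executed sequentially instead, the second trader's price depends on the
# first trade: exactly the ordering-dependence that batching removes.
x1, y1, out_a = swap_exact_in(x0, y0, 10.0)
_, _, out_b = swap_exact_in(x1, y1, 30.0)
assert out_a / 10.0 > clearing_price > out_b / 30.0
```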
Users do not experience additional trading latency from the batch auction design, because the batch auctions occur in every block, which is already the minimum latency for finalized state updates. A longer batch latency could help privacy for market-takers by increasing the number of swaps in each batch, but would impair other trading and impose a worse user experience.
Private, sealed-bid batch auctions
A key challenge in the design of any private swap mechanism is that zero-knowledge proofs only allow privacy for user-specific state, not for global state, because they don’t let you prove statements about things that you don’t know. While users can prove that their user-specific state was updated correctly without revealing it, they cannot do so for other users’ state.
Instead of solving this problem, ZSwap sidesteps the need for users to do so. At a high level, swaps work as follows: users privately burn funds of one kind in a coordinated way that allows the chain to compute a per-block clearing price, and mint or burn liquidity pool reserves. Later, users privately mint funds of the other kind, proving that they previously burned funds and that the minted amount is consistent with the computed price and the burned amount. No interaction or transfer of funds between users or the liquidity pool reserves is required. Users do not transact with each other. Instead, the chain permits them to transmute one asset type to another, provably updating their private state without interacting with any other users’ private state.
This mechanism is described in more detail in the Sealed-Bid Batch Auctions section.
Concentrated Liquidity
ZSwap uses the concentrated liquidity mechanism introduced in Uniswap v3 to make liquidity provision more capital-efficient. Uniswap v3’s insight is that existing constant-product market makers like Uniswap v2 allocate liquidity inefficiently, spreading it uniformly over the entire range of possible prices for a trading pair. Instead, allowing liquidity providers (LPs) to restrict their liquidity to a price range of their choosing provides a mechanism for market allocation of liquidity, concentrating it into the range of prices at which the assets in the pair actually trade.
Liquidity providers create positions that tie a quantity of liquidity to a specific price range. Within that price range, the position acts as a constant-product market maker with larger “virtual” reserves. At each price, the pool aggregates liquidity from all positions that contain that price, and tracks which positions remain (or become) active as the price moves. By creating multiple positions, LPs can approximate arbitrary trading functions.
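The virtual-reserves idea can be sketched numerically. The formulas below follow the standard Uniswap v3 relations between liquidity $L$, a price range $[p_{lo}, p_{hi}]$, and real reserves; the position list and all names are illustrative.

```python
# Concentrated-liquidity bookkeeping sketch: a position commits liquidity L
# to a price range, and at any price the pool sums the liquidity of all
# positions whose range contains it.
from math import sqrt

def position_reserves(L, p, p_lo, p_hi):
    """Real reserves (x, y) a position holds at current price p."""
    sp, sa, sb = sqrt(p), sqrt(p_lo), sqrt(p_hi)
    if p <= p_lo:                      # entirely in asset X
        return L * (1 / sa - 1 / sb), 0.0
    if p >= p_hi:                      # entirely in asset Y
        return 0.0, L * (sb - sa)
    return L * (1 / sp - 1 / sb), L * (sp - sa)

def active_liquidity(positions, p):
    """Total liquidity available at price p."""
    return sum(L for L, p_lo, p_hi in positions if p_lo <= p < p_hi)

# Three anonymous positions; only (L, range) is public, not who opened them.
positions = [(500.0, 0.8, 1.2), (300.0, 0.9, 1.1), (200.0, 1.1, 1.5)]
assert active_liquidity(positions, 1.0) == 800.0  # first two ranges cover 1.0
assert active_liquidity(positions, 1.3) == 200.0

x_amt, y_amt = position_reserves(500.0, 1.0, 0.8, 1.2)
assert x_amt > 0 and y_amt > 0   # an in-range position holds both assets
```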
In Uniswap v3, all positions are public and tied to the identity of the LP who created them. In ZSwap, positions are also public, but they are opened and closed anonymously. This means that while the aggregate of all LPs’ trading functions is public, the trading function of each individual LP is not, so LPs can approximate their desired trading functions without revealing their specific views about prices, as long as they take care to avoid linking their positions (e.g., by timing or amount).
Handling of concentrated liquidity is described in more detail in the Concentrated Liquidity section.
Liquidity Mining and Fees
ZSwap supports liquidity mining, described in more detail in the Liquidity Mining section.
Because ZSwap positions are created anonymously, fees and liquidity mining rewards cannot be paid out to the position’s owner, as in Uniswap v3. Instead, fees and rewards accrue to the position, and are claimed when the position is closed. This process is described in the Closing Positions section.
On HFT, N.B. their footnote 5:
A point of clarification: our claim is not that markets are less liquid today than before the rise of electronic trading and HFT; the empirical record is clear that trading costs are lower today than in the pre-HFT era, though most of the benefits appear to have been realized in the late 1990s and early 2000s... Rather, our claim is that markets are less liquid today than they would be under an alternative market design that eliminated sniping.
Sealed-Bid Batch Auctions
ZSwap’s sealed-bid batch auctions conceptually decompose into two parts: the market-making function itself, whose design is inspired by Uniswap v3, and the batching procedure, which allows multiple users’ swaps to be batched into a single trade executed against the trading function. This section focuses on the batching procedure, leaving the mechanics of the trading function for later.
A key challenge in the design of any private swap mechanism is that zero-knowledge proofs only allow privacy for user-specific state, not for global state, because [they don’t let you prove statements about things that you don’t know][barrywhitehatprivateuniswap]. While users can prove that their user-specific state was updated correctly without revealing it, they cannot do so for other users’ state.
Instead of solving this problem, ZSwap sidesteps the need for users to do so. Rather than have users transact with each other, the chain permits them to transmute one asset type to another, provably updating their private state without interacting with any other users’ private state. This works as follows.

1. Users create transactions with `SwapDescription`s that privately burn their input assets and encrypt the amounts to the validators. This description identifies the trading pair $(t_{1},t_{2})$, consumes $Δ=(Δ_{1},Δ_{2})$ of types $(t_{1},t_{2})$ from the transaction balance, and contains an encryption of the trade inputs $Enc_{D}(Δ)=(Enc_{D}(Δ_{1}),Enc_{D}(Δ_{2}))$ and a swap commitment (analogous to a note commitment) that binds to $Δ$ and a nullifier. Rational traders will choose either $Δ_{1}=0$ or $Δ_{2}=0$, i.e., trade one asset type for the other, but the description provides two inputs so that different swap directions cannot be distinguished.
2. Validators sum the encrypted amounts $Enc_{D}(Δ_{(i)})$ of all swaps in the batch to obtain an encryption of the combined inputs $Enc_{D}(∑_{i}Δ_{(i)})$, then decrypt to obtain the batch input $Δ=∑_{i}Δ_{(i)}$ without revealing any individual transaction’s input $Δ_{(i)}$. Then they execute $Δ$ against the trading pool, updating the pool state and obtaining the effective (inclusive of fees) clearing prices $p_{t_{1},t_{2}}$ and $p_{t_{2},t_{1}}$.

3. In a future block, users who created `SwapDescription`s obtain assets of the new types by creating a `SweepDescription`. This description reveals the block height $h$ containing the swap description, the trading pair $(t_{1},t_{2})$, and the nullifier $n$ of a swap commitment. Then, it proves inclusion of a swap commitment with nullifier $n$ and trade inputs $Δ$ in the note commitment tree for that block, and proves that $Λ_{1}=Δ_{2}p_{t_{1},t_{2}}$ and $Λ_{2}=Δ_{1}p_{t_{2},t_{1}}$. Finally, it adds $(Λ_{1},Λ_{2})$ of types $(t_{1},t_{2})$ to the transaction’s balance (e.g., to be consumed by an `OutputDescription` that creates a new note).
Although sweep descriptions do not reveal the amounts, or which swap’s outputs they claim, they do reveal the block and trading pair, so their anonymity set is considerably smaller than an ordinary shielded value transfer. For this reason, client software should create and submit a transaction with a sweep description immediately after observing that its transaction with a swap description was included in a block, rather than waiting for some future use of the new assets. This ensures that future shielded transactions involving the new assets are not trivially linkable to the swap.
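The arithmetic relating swap inputs, the batch, and sweep outputs can be sketched as follows. For simplicity all swaps here go the same direction ($Δ_{2}=0$) against a no-fee constant-product pool; each user later claims $Λ = Δ · p$ at the uniform clearing price, so the claims exactly exhaust the batch output. This is a one-directional sketch, not the full two-directional netting.

```python
# Per-user claims from the batch clearing price: each SwapDescription
# contributes Delta_i, the batch total is executed once against the pool,
# and every user later claims Lambda_i = Delta_i * p at the same price.
r1, r2 = 1_000_000.0, 1_000_000.0          # pool reserves (t1, t2)
deltas = [100.0, 250.0, 650.0]             # each user's t1 input
batch = sum(deltas)
out2 = r2 - (r1 * r2) / (r1 + batch)       # t2 paid out by the pool
p_21 = out2 / batch                        # clearing price, t1 -> t2
lambdas = [d * p_21 for d in deltas]       # each user's claimable output
assert abs(sum(lambdas) - out2) < 1e-9     # claims exactly exhaust the batch
```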
This design reveals only the net flow across a trading pair in each batch, not the amounts of any individual swap. However, this provides no protection if the batch contains a single swap, and limited protection when there are only a few other swaps. This is likely to be an especially severe problem until the protocol has a significant base of active users, so it is worth examining the impact of amount disclosure and potential mitigations.
Assuming that all amounts are disclosed, an attacker could attempt to deanonymize parts of the transaction graph by tracing amounts, using strategies similar to those in An Empirical Analysis of Anonymity in Zcash. That research attempted to deanonymize transactions by analyzing movement between Zcash’s transparent and shielded pools, with some notable successes (e.g., identifying transactions associated with the sale of stolen NSA documents). Unlike Zcash, where opt-in privacy means the bulk of the transaction graph is exposed, Penumbra does not have a transparent pool, and the bulk of the transaction graph is hidden, but there are several potential places to try to correlate amounts:
- IBC transfers into Penumbra are analogous to `t2z` transactions and disclose types, amounts, and accounts on the source chain;
- IBC transfers out of Penumbra are analogous to `z2t` transactions and disclose types, amounts, and accounts on the destination chain;
- Each unbonding discloses the precise amount of newly unbonded stake and the validator;
- Each epoch discloses the net amount of newly bonded stake for each validator;
- Liquidity pool deposits disclose the precise type and amount of newly deposited reserves;
- Liquidity pool withdrawals disclose the precise type and amount of newly withdrawn reserves.
The existence of the swap mechanism potentially makes correlation by amount more difficult, by expanding the search space from all amounts of one type to all combinations of all amounts of any tradeable type and all historic clearing prices. However, assuming rational trades may cut this search space considerably.^{1}
TODO
- consider the effect of multi-asset support on correlations more deeply.
Thanks to Guillermo Angeris for this observation.
Cryptography
This section describes the cryptographic design of Penumbra and the primitives it uses.
Primitives
Penumbra uses the following cryptographic primitives, described in the following sections:

- The Proof System section describes the choice of proving curve (BLS12-377) and proof system (Groth16, and potentially PLONK in the future);
- The `decaf377` section describes `decaf377`, a parameterization of the Decaf construction defined over the BLS12-377 scalar field, providing a prime-order group that can be used inside or outside of a circuit;
- The Poseidon for BLS12-377 section describes parameter selection for an instantiation of Poseidon, a SNARK-friendly sponge construction, over the BLS12-377 scalar field;
- The `zk555` section describes `zk555`, an instantiation of the STROBE protocol framework for use inside (and outside) of circuits.
Proof System
Penumbra needs SNARK proofs. Because the choice of proving system and proving curve can’t be cleanly separated from the rest of the system choices (e.g., the native field of the proving system determines which embedded curves are available, and how circuit programming is done), large parts of the rest of the system design are blocked on the choice of proving system.
Goals
- Near-term implementation availability. Creating fast and high-quality implementations of elliptic curves and proof systems is fun, but I don’t want the project to block on it.
- High performance for fixed functionality. Penumbra currently intends to support fixed functionality; programmability is a good future goal but isn’t a near-term objective. The fixed functionality should be as performant as possible.
- Longer-term flexibility. The choice should ideally not preclude many future choices for later functionality. More precisely, it should not impose high switching costs on future choices.
- Recursion capability. Penumbra doesn’t currently make use of recursion, but there are a lot of cool things it could enable in the future, so it would be good not to preclude them.
Avoiding setup ceremonies is beneficial for operational reasons, but not for security reasons: a decentralized setup procedure is sufficient for security.
Options
Proof systems:
- Groth16:
  - Pros: high performance, very small proofs, mature system
  - Cons: requires a setup for each proof statement
- PLONK:
  - Pros: universal setup, still fairly compact proofs, seems to be a point of convergence with useful extensions (plookup, SHPLONK, etc.)
  - Cons: bigger proofs, worse constants than Groth16
- Halo 2:
  - Pros: no setup, arbitrary-depth recursion
  - Cons: bigger proof sizes; only usable with the Pallas/Vesta curves, which don’t support pairings
Curve choices:

- BLS12-381:
  - Pros: very mature, used by Sapling already
  - Cons: no easy recursion

- BLS12-377:
  - Pros: constructed as part of Zexe to support depth-1 recursion using a bigger parent curve; deployed in Celo, to be deployed in Zexe
  - Cons: ?

- Pallas/Vesta:
  - Pros: none other than support for Halo 2’s arbitrary recursion
  - Cons: the lack of pairings means they cannot be used for any pairing-based SNARK
Thoughts
Although the choice of proof system (Groth16, PLONK, Halo, Pickles, ...) is not completely separable from the choice of proving curve (e.g., pairing-based SNARKs require pairing-friendly curves), to the extent that it is, the choice of proof system is relatively less important than the choice of proving curve, because it is easier to encapsulate.
The choice of proving curve determines the scalar field of the arithmetic circuit, which determines which curves are efficient to implement in the circuit, which determines which cryptographic constructions can be performed in the circuit, which determines what kind of key material the system uses, which propagates all the way upwards to user-visible details like the address format. While swapping out a proof system using the same proving curve can be encapsulated within an update to a client library, swapping out the proving curve is extremely disruptive and essentially requires all users to generate new addresses and migrate funds.
This means that, in terms of proof system flexibility, the Pallas/Vesta curves are relatively disadvantaged compared to pairing-friendly curves like BLS12-381 or BLS12-377, because they cannot be used with any pairing-based SNARK, or any other pairing-based construction. Realistically, choosing them is committing to using Halo 2.
Choosing BLS12-377 instead of BLS12-381 opens the possibility of depth-1 recursion later, without meaningfully restricting the near-term proving choices. For this reason, BLS12-377 seems like the best choice of proving curve.
Penumbra’s approach is to first create a useful set of fixed functionality, and generalize to custom, programmable functionality only later. Compared to Sapling, there is more functionality (not just `Spend` and `Output` but `Delegate`, `Undelegate`, `Vote`, ...), meaning that there are more proof statements. Using Groth16 means that each of these statements needs to have its own proving and verification key, generated through a decentralized setup.
So the advantages of a universal setup (as in PLONK) over per-statement setup (as in Groth16) would be:
1. The setup can be used for additional fixed functionality later;
2. Client software does not need to maintain distinct proving/verification keys for each statement.
The lack of (2) is a definite downside of per-statement setup, but the impact is a little unclear. As a point of reference, the Sapling spend and output parameters are 48MB and 3.5MB respectively. The size of the spend circuit could be improved using a SNARK-friendly hash function.
With regard to (1), if functionality were being developed in many independent pieces, doing many setups would impose a large operational cost. But doing a decentralized setup for a dozen proof statements simultaneously does not seem substantially worse than doing a decentralized setup for a single proof statement. So the operational concern is related to the frequency of groups of new statements, not the number of statements in a group. Adding a later group of functionality is easy if the first group used a universal setup. But if it didn’t, the choice of per-statement setup initially doesn’t prevent the use of a universal setup later, as long as the new proof system can be implemented using the same curve.
Because Penumbra plans to have an initial set of fixed functionality, and performance is a concern, Groth16 seems like a good choice, and leaves the door open for a future universal SNARK. Using BLS12-377 opens the door to future recursion, albeit only of depth 1.
The `decaf377` group
Penumbra, like many other zero-knowledge protocols, requires a cryptographic group that can be used inside of an arithmetic circuit. This is accomplished by defining an “embedded” elliptic curve whose base field is the scalar field of the proving curve used by the proof system.
The Zexe paper, which defined BLS12-377, also defined (but did not name) a cofactor-4 Edwards curve defined over the BLS12-377 scalar field for exactly this purpose. However, non-prime-order groups are a leaky abstraction, forcing all downstream constructions to pay attention to correct handling of the cofactor. Although it is usually possible to do so safely, it requires additional care, and the optimal technique for handling the cofactor is different inside and outside of a circuit.
Instead, applying the Decaf construction to this curve gives `decaf377`, a clean abstraction that provides a prime-order group complete with hash-to-group functionality and whose encoding and decoding functions integrate validation. Although it imposes a modest additional cost in the circuit context, as discussed in Costs and Alternatives, the construction works the same way inside and outside of a circuit and imposes no costs for lightweight, software-only applications, making it a good choice for general-purpose applications.
Implementation
A work-in-progress implementation of `decaf377` can be found here.
Costs and Alternatives
Arithmetic circuits have a different cost model than software. In the software cost model, software executes machine instructions, but in the circuit cost model, relations are certified by constraints. Unfortunately, while Decaf is a clearly superior choice in the software context, in the circuit context it imposes some additional costs, which must be weighed against its benefits.
At a high level, Decaf implements a prime-order group using a non-prime-order curve by constructing a group quotient. Internally, group elements are represented by curve points, with a custom equality check so that equivalent representatives are considered equal, an encoding function that encodes equivalent representatives as identical bitstrings, and a decoding function that only accepts canonical encodings of valid representatives.
To do this, the construction defines a canonical encoding on a Jacobi quartic curve mod its 2-torsion (a subgroup of size 4) by making two independent sign choices. Then, it uses an isogeny to transport this encoding from the Jacobi quartic to a target curve that will be used to actually implement the group operations. This target curve can be an Edwards curve or a Montgomery curve. The isogenies are only used for deriving the construction. In implementations, all of these steps are collapsed into a single set of formulas that perform encoding and decoding on the target curve.
In other words, one way to think about the Decaf construction is as some machinery that transforms two sign choices into selection of a canonical representative. Ristretto adds extra machinery to handle cofactor 8 by making an additional sign choice.
In the software cost model, where software executes machine instructions, this construction is essentially free, because the costs of both the Decaf and conventional Edwards encodings are dominated by the cost of computing an inverse or an inverse square root, and the cost of the sign checks is insignificant.
However, in the circuit cost model, where relations are certified by various constraints, this is no longer the case. On the one hand, certifying a square root or an inverse just requires checking that $y^{2}=x$ or that $xy=1$, which is much cheaper than actually computing $y=\sqrt{x}$ or $y=1/x$. On the other hand, performing a sign check involves bit-constraining a field element, requiring hundreds of constraints.
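The asymmetry between certifying and computing can be seen even outside a circuit: verifying $y^{2}=x$ is a single multiplication, while computing a square root takes a full exponentiation. The sketch below uses the shortcut available when $q≡3\pmod{4}$; the modulus is a toy value.

```python
# Certifying a square root (one multiplication) vs. computing one (a full
# modular exponentiation). Toy modulus, for illustration only.
q = 19  # small prime with q = 3 (mod 4)

def certify_sqrt(x, y):
    """Cheap: one constraint's worth of work."""
    return (y * y) % q == x % q

def compute_sqrt(x):
    """Expensive: y = x^((q+1)/4) mod q; a general field needs Tonelli-Shanks."""
    y = pow(x, (q + 1) // 4, q)
    if (y * y) % q != x % q:
        raise ValueError("x is not a square mod q")
    return y

y = compute_sqrt(5)
assert certify_sqrt(5, y)
```

In a circuit, only the cheap check is constrained; the prover supplies $y$ as a witness, having computed it outside the circuit.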
Sign checks
The definition of which finite field elements are considered nonnegative is essentially arbitrary. The Decaf paper suggests three possibilities:

- using the least significant bit, defining $x$ to be nonnegative if the least absolute residue for $x$ is even;

- using the most significant bit, defining $x$ to be nonnegative if the least absolute residue for $x$ is in the range $0≤x≤(q−1)/2$;

- for fields where $q≡3\pmod{4}$, using the Legendre symbol, which distinguishes between square and nonsquare elements.
Using the Legendre symbol is very appealing in the circuit context, since it has an algebraic definition and, at least in the case of square elements, very efficient certification. For instance, if square elements are chosen to be nonnegative, then certifying that $x$ is nonnegative requires only one constraint, $x=y^{2}$. However, the reason for the restriction to $q≡3\pmod{4}$ fields is that $1$ and $−1$ should have different signs, which can only be the case if $−1$ is nonsquare. Unfortunately, many SNARK-friendly curves, including BLS12-377, are specifically chosen so that $q≡1\pmod{2^{α}}$ for as large a power $α$ as possible (e.g., $α=47$ in the case of BLS12-377).
This leaves us with either the LSB or MSB choices. The least significant bit is potentially simpler for implementations, since it is actually the low bit of the encoding of $x$, while the most significant bit isn’t, because it measures from $(q−1)/2$, not a bit position $2^{k}$, so it seems to require a comparison or range check to evaluate. However, these choices are basically equivalent, in the following sense:
Lemma.^{1} The most significant bit of $x$ is $0$ if and only if the least significant bit of $2x$ is $0$.

Proof. The MSB of $x$ is $0$ if and only if $2x≤q−1$, but this means that $2x$, which is even, is the least absolute residue, so the LSB of $2x$ is also $0$. On the other hand, the MSB of $x$ is $1$ if and only if $x>(q−1)/2$, i.e., if $x≥(q−1)/2+1$, i.e., if $2x≥q−1+2=q+1$. This means that the least absolute residue of $2x$ is $2x−q$; since $2x$ is even and $q$ is odd, this is odd and has LSB $1$. $□$
This means that transforming an LSB check to an MSB check or vice versa requires multiplication by $2$ or $1/2$, which costs at most one constraint.
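The lemma and the cheap LSB-to-MSB conversion can be checked exhaustively over a small field. The modulus here is illustrative, and "LSB" means the parity of the canonical residue in $[0,q−1]$, matching the proof above.

```python
# Exhaustive check of the lemma: the MSB of x (is x > (q-1)/2 ?) is 0
# exactly when the LSB of 2x mod q is 0. Toy modulus.
q = 101  # any odd prime works; small enough to check every element

def msb(x):
    return 0 if x <= (q - 1) // 2 else 1

def lsb(x):
    return x % 2   # parity of the canonical residue in [0, q-1]

for x in range(q):
    assert msb(x) == lsb((2 * x) % q)
```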
Checking the MSB requires checking whether a value is in the range $[0,(q−1)/2]$. Using Daira Hopwood’s optimized range constraints, the range check costs $73C$^{2}. However, the input to the range check is a bit-constrained unpacking of a field element, not a field element itself. This unpacking costs $253C$.
Checking the LSB is no less expensive, because although the check only examines one bit, the circuit must certify that the bit-encoding is canonical. This requires checking that the value is in the range $[0,q−1]$, which also costs $73C$, and as before, the unpacking costs $253C$.
In other words, checking the sign of a field element costs $253C+73C=326C$, or c. $73C$ in the case where the field element is already bit-encoded for other reasons. These checks are the dominant cost for encoding and decoding, which both require two sign checks. Decoding from bits costs c. $73C+326C≈400C$, decoding from a field element costs c. $326C+326C≈650C$, and encoding costs c. $650C$ regardless of whether the output is encoded as bits or as a field element.
Alternative approaches to handling cofactors
Decaf constructs a prime-order group whose encoding and decoding methods perform validation. A more conventional alternative approach is to use the underlying elliptic curve directly, restrict to its prime-order subgroup, and do subgroup validation separately from encoding and decoding. If this validation is done correctly, it provides a prime-order group. However, because validation is an additional step, rather than an integrated part of the encoding and decoding methods, this approach is necessarily more brittle, because each implementation must be sure to do both steps.
In the software cost model, there is no reason to use subgroup validation, because it is both more expensive and more brittle than Decaf or Ristretto. However, in the circuit cost model, there are cheaper alternatives, previously analyzed by Daira Hopwood in the context of Ristretto for JubJub (1, 2).
Multiplication by the group order.
The first validation method is to do a scalar multiplication and check that $[q]P=1$. Because the prime order is fixed, this scalar multiplication can be performed more efficiently using a hardcoded sequence of additions and doublings.
Cofactor preimage.
The second validation method provides a preimage $Q=[1/4]P$ in affine coordinates $(x,y)$. Because the image of $[4]:E→E$ is the prime-order subgroup, checking that $Q$ satisfies the curve equation and that $P=[4]Q$ checks that $P$ is in the prime-order subgroup.
In the software context, computing $[1/4]P$ and computing $[q]P$ cost about the same, although both are an order of magnitude more expensive than decoding. But in the circuit context, the prover can compute $Q=[1/4]P$ outside of the circuit and use only a few constraints to check the curve equation and two doublings. These costs round to zero compared to sign checks, so the validation is almost free.
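A multiplicative-group analogue makes the cost comparison concrete: in $Z_{p}^{*}$ with $p=2q+1$, the prime-order subgroup is the squares (cofactor 2 rather than 4), "multiplication by the group order" becomes a full exponentiation $x^{q}\bmod p$, and the "cofactor preimage" check is a single multiplication once the prover supplies the preimage. All parameters are toy values.

```python
# Two subgroup-validation methods, in a multiplicative-group analogue of the
# elliptic-curve setting described above. Toy parameters.
p = 2039   # safe prime, p = 2q + 1
q = 1019   # order of the subgroup of squares

def validate_by_order(x):
    """Check x^q == 1: analogous to [q]P = identity; a full exponentiation."""
    return pow(x, q, p) == 1

def validate_by_preimage(x, y):
    """Check that y is a square root of x: one multiplication, with the
    expensive preimage y computed by the prover outside the circuit."""
    return (y * y) % p == x % p

x = pow(7, 2, p)                     # an element of the prime-order subgroup
assert validate_by_order(x)
assert validate_by_preimage(x, 7)
assert not validate_by_order(p - 1)  # an order-2 element fails validation
```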
The standard “compressed Edwards y” format encodes a point $(x,y)$ by the $y$-coordinate and a sign bit indicating whether $x$ is nonnegative. In software, the costs of encoding and decoding are about the same, and dominated by taking an inverse square root. In circuits, the costs of encoding and decoding are also about the same, but they are instead dominated by a sign check that matches the sign of the recovered $x$-coordinate with the supplied sign bit. This costs c. $325C$ as above.
Comparison and discussion
This table considers only approximate costs.
| Operation | decaf377 | Compressed Ed + Preimage |
|-----------------------|--------|--------------------------|
| Decode (from bits) | 400C | 400C |
| Decode (from $F_{q}$) | 650C | 325C |
| Encode (to bits) | 750C | 750C |
| Encode (to $F_{q}$) | 750C | 325C |
When decoding from or encoding to field elements, the marginal cost of Decaf compared to the compressed Edwards + cofactor preimage is an extra bit-unpacking and range check. While this effectively doubles the number of constraints, the marginal cost of c. $325C$ is still small relative to other operations like a scalar multiplication, which at 6 constraints per bit is approximately $1500C$.
When decoding from or encoding to bits, the marginal cost of Decaf disappears. When the input is already bit-constrained, Decaf’s first sign check can reuse the bit constraints, saving c. $250C$, but the compressed Edwards encoding must range-check the bits (which Decaf already does), costing c. $75C$ extra. Similarly, in encoding, Decaf’s second sign check produces bit-constrained variables for free, while the compressed Edwards encoding spends c. $250C+75C$ bit-constraining and range-checking them.
However, in the software context, the prime-order validation check costs approximately 10x more than the cost of either encoding. Many applications require use of the embedded group both inside and outside of the circuit, and uses outside of the circuit may have additional resource constraints (for instance, a hardware token creating a signature authorizing delegated proving, etc.).
Performing validation as an additional, optional step also poses additional risks. While a specification may require it to be performed, implementations that skip the check will appear to work fine, and the history of invalid-point attacks (where implementations should, but don’t, check that point coordinates satisfy the curve equation) suggests that structuring validation as an integral part of encoding and decoding is a safer design. This may not be a concern for a specific application with a single, high-quality implementation that doesn’t make mistakes, but it’s less desirable for a general-purpose construction.
In summary, Decaf provides a construction that works the same way inside and outside of a circuit and integrates validation with the encoding, imposing only a modest cost for use in circuits and no costs for lightweight, software-only applications, making it a good choice for general-purpose constructions.
I have no idea whether this is common knowledge; I learned of this fact from its use in Mike Hamburg’s Ed448Goldilocks implementation.
The value 73 is computed as:

```python
from itertools import groupby

def cost(k):
    return min(k - 1, 2)

def constraints(bound):
    costs = [cost(len(list(g)) + 1) for (c, g) in groupby(bound.bits()[:-1]) if c == 1]
    return sum(costs)

constraints(ZZ((q - 1) / 2))
```
Inverse Square Roots
As in the [internet-draft], the decaf377 functions are defined in terms of the following function, which computes the square root of a ratio of field elements, with the special behavior that if the input is nonsquare, it returns the square root of a related field element, to allow reuse of the computation in the hash-to-group setting.
- TODO: actually specify this procedure.
Fix $ζ$ as 2841681278031794617739547238867782961338435681360110683443920362658525667816. Then $ζ$ is a nonsquare $2^{47}$th root of unity.
- TODO: possibly change this value to be convenient for implementations, rather than something that’s convenient for SAGE.
Define `sqrt_ratio_zeta(u,v)` as:

- (True, $\sqrt{\frac{u}{v}}$) if $u$ and $v$ are nonzero, and $\frac{u}{v}$ is square;
- (True, $0$) if $u$ is zero;
- (False, $0$) if $v$ is zero;
- (False, $\sqrt{ζ\frac{u}{v}}$) if $u$ and $v$ are nonzero, and $\frac{u}{v}$ is nonsquare.
Since $ζ$ is nonsquare, if $\frac{u}{v}$ is nonsquare, $ζ\frac{u}{v}$ is square. Note that unlike the similar function in the ristretto255/decaf448 [internet-draft], this function does not make any claims about the sign of its output.
- TODO: describe an efficient implementation using 2020/1407.
Decoding
Decoding works as follows:

1. Decode `s_bytes` to a field element $s$, rejecting if the encoding is noncanonical. (If the input is already a field element in the circuit case, skip this step.)
2. Check that $s$ is nonnegative, or reject (sign check 1).
3. $u_{1}←1−s^{2}$.
4. $u_{2}←u_{1}^{2}−4ds^{2}$.
5. `(was_square, v) = sqrt_ratio_zeta(1, u_2 * u_1^2)`, rejecting if `was_square` is false.
6. $v←−v$ if $2su_{1}v$ is negative (sign check 2).
7. $(x,y)←(2sv^{2}u_{1}u_{2},(1+s^{2})vu_{2})$.

The resulting coordinates are the affine Edwards coordinates of an internal representative of the group element.
- TODO: simplify formulas using the numerator instead of $1$.
Encoding
Given a representative in extended coordinates $(X,Y,Z,T)$, encoding works as follows.

1. $u_{1}←(X+T)(Y−T)$.
2. `(_ignored, v) = sqrt_ratio_zeta(1, u_1 * (1 - d) * x^2)`.
3. $u_{2}←∣vu_{1}∣$ (sign check 1).
4. $u_{3}←u_{2}Z−T$.
5. $s←∣(−1−d)vu_{3}X∣$ (sign check 2).
6. Set `s_bytes` to be the canonical little-endian encoding of $s$.
Group Hash
- TODO: specify; the choice of quadratic nonresidue needs to match the $ζ$ used in the inverse square root.
Test Vectors
Small generator multiples
This table has hex encodings of small multiples of the generator $B$:

| Element | Hex encoding |
|---------|--------------|
| $0$ | `0000000000000000000000000000000000000000000000000000000000000000` |
| $B$ | `0800000000000000000000000000000000000000000000000000000000000000` |
| $[2]B$ | `b2ecf9b9082d6306538be73b0d6ee741141f3222152da78685d6596efc8c1506` |
| $[3]B$ | `2ebd42dd3a2307083c834e79fb9e787e352dd33e0d719f86ae4adb02fe382409` |
| $[4]B$ | `6acd327d70f9588fac373d165f4d9d5300510274dffdfdf2bf0955acd78da50d` |
| $[5]B$ | `460f913e516441c286d95dd30b0a2d2bf14264f325528b06455d7cb93ba13a0b` |
| $[6]B$ | `ec8798bcbb3bf29329549d769f89cf7993e15e2c68ec7aa2a956edf5ec62ae07` |
| $[7]B$ | `48b01e513dd37d94c3b48940dc133b92ccba7f546e99d3fc2e602d284f609f00` |
| $[8]B$ | `a4e85dddd19c80ecf5ef10b9d27b6626ac1a4f90bd10d263c717ecce4da6570a` |
| $[9]B$ | `1a8fea8cbfbc91236d8c7924e3e7e617f9dd544b710ee83827737fe8dc63ae00` |
| $[10]B$ | `0a0f86eaac0c1af30eb138467c49381edb2808904c81a4b81d2b02a2d7816006` |
| $[11]B$ | `588125a8f4e2bab8d16affc4ca60c5f64b50d38d2bb053148021631f72e99b06` |
| $[12]B$ | `f43f4cefbe7326eaab1584722b1b4860de554b23a14490a03f3fd63a089add0b` |
| $[13]B$ | `76c739a33ffd15cf6554a8e705dc573f26490b64de0c5bd4e4ac75ed5af8e60b` |
| $[14]B$ | `200136952d18d3f6c70347032ba3fef4f60c240d706be2950b4f42f1a7087705` |
| $[15]B$ | `bcb0f922df1c7aa9579394020187a2e19e2d8073452c6ab9b0c4b052aa50f505` |
Poseidon for BLS12-377
zk555, a STROBE for circuits
Fuzzy Message Detection
By design, privacy-preserving blockchains like Penumbra don’t reveal metadata about the sender or receiver of a transaction. However, this means that users must scan the entire chain to determine which transactions relate to their addresses. This imposes large bandwidth and latency costs on users who do not maintain online replicas of the chain state, as they must “catch up” each time they come online by scanning all transactions that have occurred since their last activity.
Alternatively, users could delegate scanning to a third party, who would monitor updates to the chain state on their behalf and forward only a subset of those updates to the user. This is possible using viewing keys, as in Zcash, but viewing keys represent the capability to view all activity related to a particular address, so they can only be delegated to trusted third parties.
Instead, it would be useful to be able to delegate only a probabilistic detection capability. Analogous to a Bloom filter, this would allow a detector to identify all transactions related to a particular address (no false negatives), while also identifying unrelated transactions with some false positive probability. Unlike viewing capability, detection capability would not include the ability to view the details of a transaction, only a probabilistic association with a particular address. This is the problem of fuzzy message detection (FMD), analyzed by Beck, Len, Miers, and Green in their paper Fuzzy Message Detection, which proposes a cryptographic definition of fuzzy message detection and three potential constructions.
This section explores how Penumbra could make use of fuzzy message detection:

In Sender and Receiver FMD, we propose a generalization of the original definition where the false positive probability is set by the sender instead of the receiver, and discuss why this is useful.

In Constructing SFMD, we realize the new definition using a variant of one of the original FMD constructions, and extend it in two ways:
- to support diversified detection, allowing multiple, publicly unlinkable addresses to be scanned by a single detection key;
- to support arbitrarily precise detection with compact, constant-size keys.

The resulting construction can then be integrated into the Sapling key hierarchy.

In Parameter Considerations, we discuss how the false positive rates should be chosen.
Acknowledgements
Thanks to George Tankersley and Sarah Jamie Lewis for discussions on this topic (and each independently suggesting the modifications to realize SFMD), to Gabrielle Beck for discussions about the paper and ideas about statistical attacks, and to Guillermo Angeris for pointers on analyzing information disclosure.
Sender and Receiver FMD
The goal of detection capability is to be able to filter the global stream of state updates into a local stream of state updates that includes all updates related to a particular address, without identifying precisely which updates those are. The rate of updates on this filtered stream should be bounded below, to ensure that there is a large enough anonymity set, and bounded above, so that users processing the stream have a constant and manageable amount of work to process it and catch up with the current chain state. This means that the detection precision must be adaptive to the global message rates: if the false positive rate is too low, the filtered stream will have too few messages, and if it is too high, it will have too many messages.
However, this isn’t possible using the paper’s original definition of fuzzy message detection, because the false positive probability is chosen by the receiver, who has no way to know in advance what probability will produce a correctly sized stream of messages. One way to address this problem is to rename the original FMD definition as Receiver FMD (RFMD), and tweak it to obtain Sender FMD (SFMD), in which the sender chooses the detection probability.
Receiver FMD
The paper’s original definition of fuzzy message detection is as a tuple of algorithms `(KeyGen, Flag, Extract, Test)`. The receiver uses `KeyGen` to generate a root key and a flag key.[^1] A sender uses the receiver’s flag key as input to `Flag` to produce a flag ciphertext. The `Extract` algorithm takes the root key and a false positive rate $p$ (chosen from some set of supported rates), and produces a detection key. This detection key can be applied to a flag ciphertext using `Test` to produce a detection result.
This scheme should satisfy certain properties, formalizations of which can be found in the paper:

**Correctness.** Valid matches must always be detected by `Test`; i.e., there are no false negatives.

**Fuzziness.** Invalid matches should produce false positives with probability approximately $p$, as long as the flag ciphertexts and detection keys were honestly generated.

**Detection Ambiguity.** An adversarial detector must be unable to distinguish between a true positive and a false positive, as long as the flag ciphertexts and detection keys were honestly generated.
In this original definition, the receiver has control over how much detection precision they delegate to a third party, because they choose the false positive probability when they extract a detection key from their root key. This fits with the idea of attenuating credentials, and intuitively, it seems correct that the receiver should control how much information they reveal to their detector. But the information revealed to their detector is determined by both the false positive probability and the amount of other messages that can function as cover traffic. Without knowing the extent of other activity on the system, the receiver has no way to make a principled choice of the detection precision to delegate.
Sender FMD
To address this problem, we generalize the original definition (now Receiver FMD) to Sender FMD, in which the false positive probability is chosen by the sender.
Like RFMD, SFMD consists of a tuple of algorithms `(KeyGen, Flag, Extract, Test)`. Like RFMD, `KeyGen` generates a root key and a flag key, and `Test` takes a detection key and a flag ciphertext and produces a detection result. Unlike RFMD, `Extract` produces a detection key directly from the root key, and `Flag` takes the false positive rate $p$ and the receiver’s flag key to produce a flag ciphertext. As discussed below, SFMD can be realized using tweaks to either of the RFMD constructions in the original paper.
In RFMD, flag ciphertexts are universal with respect to the false positive rate, which is applied to the detection key; in SFMD, the false positive rate is applied to the flag ciphertext and the detection key is universal. (This means there is no meaningful difference in capability between the root key and the detection key, so the distinction is maintained just for ease of comparison between the two variants).
Unlike RFMD, SFMD allows detection precision to be adaptive, by having senders use a (consensusdetermined) false positive parameter. This parameter should vary as the global message rates vary, so that filtered message streams have a bounded rate, and it should be the same for all users, so that messages cannot be distinguished by their false positive rate.
[^1]: The paper calls these the secret and public keys respectively, but we avoid this in favor of capability-based terminology that names keys according to the precise capability they allow.
Constructing SFMD
The original FMD paper provides three constructions of RFMD. The first two realize functionality for restricted false-positive probabilities of the form $p=2^{−n}$; the third supports arbitrary fractional probabilities using a much more complex and expensive construction.
The first two schemes, RFMD1 and RFMD2, are constructed similarly to a Bloom filter: the `Flag` procedure encrypts a number of `1` bits, and the `Test` procedure uses information in a detection key to check whether some subset of them are `1`, returning `true` if so and `false` otherwise. The false positive probability is controlled by extracting only a subset of the information in the root key into the detection key, so that it can only check a subset of the bits encoded in the flag ciphertext. We focus on RFMD2, which provides more compact ciphertexts and chosen-ciphertext security.
In this section, we:
- recall the construction of RFMD2, changing notation from the paper;
- adapt RFMD2 to SFMD2, which provides sender FMD instead of receiver FMD;
- extend SFMD2 to support diversified detection, allowing multiple, unlinkable flag keys to be detected by a single detection key;
- extend SFMD2 to use deterministic derivation, allowing 32-byte flag and detection keys to support arbitrary-precision false positive probabilities;
- summarize the resulting construction and describe how it can be integrated with the Sapling key hierarchy.
The RFMD2 construction
First, we recall the paper’s construction of RFMD2, changing from multiplicative to additive notation and adjusting variable names to be consistent with future extensions.
The construction supports restricted false positive values $p=2^{−n}$ for $n≤γ$, a global parameter determining the minimum false positive rate. $G$ is a group of prime order $q$, and $H_{1}:G^{3}→\{0,1\}$ and $H_{2}:G×\{0,1\}^{γ}→Z_{q}$ are hash functions.
RFMD2/KeyGen
Choose a random generator $B ∈ G$. For $i=1,…,γ$, choose random $x_{i} ∈ Z_{q}$ and compute $X_{i}=[x_{i}]B$. Return the root key $rk←(x_{1},…,x_{γ})$ and the flag key $fk←(B,X_{1},…,X_{γ})$.
RFMD2/Extract
On input $p=2^{−n}$ and root key $rk$, parse $(x_{1},…,x_{γ})←rk$ and return $(x_{1},…,x_{n})$ as the detection key.
RFMD2/Flag
On input $fk$, first parse $(B,X_{1},…,X_{γ})←fk$, then proceed as follows:

- Choose random $r ∈ Z_{q}$ and compute $P←[r]B$.
- Choose random $z ∈ Z_{q}$ and compute $Q←[z]B$.
- For each $i=1,…,γ$, compute
  - a key bit $k_{i}←H_{1}(P∣∣[r]X_{i}∣∣Q)$;
  - a ciphertext bit $c_{i}←k_{i}⊕1$.
- Compute $m←H_{2}(P∣∣c_{1}∣∣⋯∣∣c_{γ})$.
- Compute $y←(z−m)r^{−1}$.

Return the flag ciphertext $fc←(P,y,c_{1},…,c_{γ})$.
RFMD2/Test
On input $dk$, $fc$, first parse $(x_{1},…,x_{n})←dk$, $(P,y,c_{1},…,c_{γ})←fc$, then proceed as follows:

- Compute $m←H_{2}(P∣∣c_{1}∣∣⋯∣∣c_{γ})$.
- Recompute $Q$ as $Q←[y]P+[m]B$.
- For each $i=1,…,n$, compute
  - a key bit $k_{i}←H_{1}(P∣∣[x_{i}]P∣∣Q)$;
  - a plaintext bit $b_{i}←c_{i}⊕k_{i}$.

If all plaintext bits $b_{i}=1$, return $1$ (match); otherwise, return $0$.
To see that the `Test` procedure detects true positives, first note that since $P=[r]B$ and $y=(z−m)r^{−1}$,
$$Q = [y]P+[m]B = [(z−m)r^{−1}][r]B+[m]B = [z]B,$$
so the detector will recompute the $Q$ used by the sender; then, since $[r]X_{i}=[r][x_{i}]B=[x_{i}]P$, the detector’s $k_{i}←H_{1}(P∣∣[x_{i}]P∣∣Q)$ is the same as the sender’s $k_{i}←H_{1}(P∣∣[r]X_{i}∣∣Q)$.
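This derivation can be checked numerically. The sketch below uses a hypothetical toy group, the order-11 subgroup of $Z_{23}^{*}$, written multiplicatively, so that $[k]P$ becomes `pow(P, k, 23)` and group addition becomes multiplication mod 23; all scalars are arbitrary toy values.

```python
# Toy check of Q = [y]P + [m]B = [z]B, in the order-11 subgroup of Z_23^*.
# The group is written multiplicatively here, so [k]P is pow(P, k, p) and
# group addition is multiplication mod p. All parameters are toy values.
p, q, B = 23, 11, 4          # B generates the subgroup of prime order q

def smul(k, P):              # scalar multiplication [k]P
    return pow(P, k, p)

r, z, m = 3, 8, 6            # arbitrary nonzero scalars mod q
P = smul(r, B)               # nonce commitment P = [r]B
y = (z - m) * pow(r, -1, q) % q   # y = (z - m) r^{-1}

# The receiver's recomputed Q = [y]P + [m]B equals the sender's [z]B.
assert smul(y, P) * smul(m, B) % p == smul(z, B)
```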
On the other hand, if the ciphertext was created with a different flag key, the resulting $k_{i}$ will be random bits, giving the desired $2^{−n}$ false positive probability.
One way to understand the components of the construction is as follows: the message consists of $γ$ bits of a Bloom filter, each encrypted using hashed ElGamal with a common nonce $r$ and nonce commitment $P=[r]B$. The points $P,Q$ are included in the ElGamal hash, ensuring that any change to either will result in a random bit on decryption. The pair $(Q,y)$ acts as a public key and signature for a one-time signature on the ciphertext as follows: the sender constructs the point $Q$ as the output of a chameleon hash with basis $(P,B)$ on input $(0,z)$, then uses the hash trapdoor (i.e., their knowledge of the discrete log relation $P=[r]B$) to compute a collision $(y,m)$ involving $m$, a hash of the rest of the ciphertext. The collision $y$ acts as a one-time signature with public key $Q$.
The fact that the generator $B$ was selected at random in each key generation is not used by the construction and doesn’t seem to play a role in the security analysis; in fact, the security proof in Appendix F has the security game operate with a common generator. In what follows, we instead assume that $B$ is a global parameter.
From RFMD2 to SFMD2
Changing this construction to the SFMD model is fairly straightforward: rather than having the sender encrypt $γ$ bits of the Bloom filter and only allow the detector to decrypt $n≤γ$ of them, we have the sender only encrypt $n≤γ$ bits of the Bloom filter and allow the detector to decrypt all potential bits. As noted in the previous section, this means that in the SFMD context, there is no separation of capability between the root key and the detection key. For this reason, we skip the `Extract` algorithm and (conceptually) merge the root key into the detection key.
The SFMD2 construction then works as follows:
SFMD2/KeyGen
For $i=1,…,γ$, choose random $x_{i} ∈ Z_{q}$ and compute $X_{i}=[x_{i}]B$. Return the detection key $dk←(x_{1},…,x_{γ})$ and the flag key $fk←(X_{1},…,X_{γ})$.
SFMD2/Flag
On input flag key $fk$, first parse $(X_{1},…,X_{n})←fk$, then proceed as follows:

- Choose random $r ∈ Z_{q}$ and compute $P←[r]B$.
- Choose random $z ∈ Z_{q}$ and compute $Q←[z]B$.
- For each $i=1,…,n$, compute
  - a key bit $k_{i}←H_{1}(P∣∣[r]X_{i}∣∣Q)$;
  - a ciphertext bit $c_{i}←k_{i}⊕1$.
- Compute $m←H_{2}(P∣∣n∣∣c_{1}∣∣⋯∣∣c_{n})$.
- Compute $y←(z−m)r^{−1}$.

Return the flag ciphertext $fc←(P,y,c_{1},…,c_{n})$.
SFMD2/Test
On input detection key $dk$ and flag ciphertext $fc$, first parse $(x_{1},…,x_{γ})←dk$ and $(P,y,c_{1},…,c_{n})←fc$, then proceed as follows:

- Compute $m←H_{2}(P∣∣n∣∣c_{1}∣∣⋯∣∣c_{n})$.
- Recompute $Q$ as $Q←[y]P+[m]B$.
- For each $i=1,…,n$, compute
  - a key bit $k_{i}←H_{1}(P∣∣[x_{i}]P∣∣Q)$;
  - a plaintext bit $b_{i}←c_{i}⊕k_{i}$.

If all plaintext bits $b_{i}=1$, return $1$ (match); otherwise, return $0$.
The value of $n$ is included in the ciphertext hash to ensure that the encoding of the ciphertext bits is nonmalleable. Otherwise, the construction is identical.
Diversified detection: from SFMD2 to SFMD2d
The Sapling design provides viewing keys that support diversified addresses: a collection of publicly unlinkable addresses whose activity can be scanned by a common viewing key. For integration with Sapling, as well as other applications, it would be useful to support diversified detection: a collection of publicly unlinkable diversified flag keys that share a common detection key. This collection is indexed by data called a diversifier; each choice of diversifier gives a different diversified flag key corresponding to the same detection key.
In the Sapling design, diversified addresses are implemented by selecting the basepoint of the key agreement protocol as the output of a groupvalued hash function whose input is the diversifier. Because the key agreement mechanism only makes use of the secret key (the incoming viewing key) and the counterparty’s ephemeral public key, decryption is independent of the basepoint, and hence the choice of diversifier, allowing a common viewing key to decrypt messages sent to any member of its family of diversified transmission (public) keys.
A similar approach can be applied to SFMD2, but some modifications to the tag mechanism are required to avoid use of the basepoint in the `Test` procedure.
Let $D$ be the set of diversifiers, and let $H_{3}:D→G$ be a groupvalued hash function. At a high level, our goal is to replace use of a common basepoint $B$ with a diversified basepoint $B_{d}=H_{3}(d)$ for some $d∈D$.
Our goal is to have diversified flag keys, so we first replace $B$ by $B_{d}$ in `KeyGen`, so that $X_{i}=[x_{i}]B_{d}$ and the components of the flag key are parameterized by the diversifier $d$. The sender will compute the key bits as
$$k_{i}←H_{1}(P∣∣[r]X_{i}∣∣Q),$$
and the receiver computes them as
$$k_{i}←H_{1}(P∣∣[x_{i}]P∣∣Q).$$
If $P=[r]B_{d}$ instead of $P=[r]B$, then $[x_{i}]P=[r]X_{i}$ and the middle component will match.
But then the sender needs to construct $Q$ as $Q←[z]B_{d}$ in order to have a known discrete log relation between $P$ and $Q$. This is a problem, because $Q$ is recomputed by the receiver, but if the receiver must recompute it as $Q←[y]P+[m]B_{d}$, detection is bound to the choice of diversifier, and we need detection to be independent of the choice of diversifier.
The root of this problem is that the chameleon hash uses basis $(P,B)$, so the receiver’s computation involves the basepoint $B$. To avoid it, we can use a basis $(P_{1},P_{2})$, where $P_{i}←[r_{i}]B$. $P_{1}$ is used in place of $P$, but $Q$ is computed as $[z]P_{2}$, i.e., as the chameleon hash $⟨(0,z),(P_{1},P_{2})⟩$. The sender’s collision then becomes $y←(z−m)\frac{r_{2}}{r_{1}}$, allowing the detector to recompute $Q$ as
$$
Q ← [y]P_{1}+[m]P_{2} = [(z−m)\tfrac{r_{2}}{r_{1}}][r_{1}]B+[m]P_{2} = [z−m]P_{2}+[m]P_{2} = [z]P_{2}.
$$
The point $P_{2}$ must be included in the flag ciphertext, increasing its size, but detection no longer involves use of a basepoint (diversified or otherwise).
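The two-basis trick can be checked numerically in a toy group (the order-11 subgroup of $Z_{23}^{*}$, written multiplicatively so that $[k]P$ is `pow(P, k, 23)`; all scalars are arbitrary toy values):

```python
# Numeric check of the two-basis chameleon hash: the receiver's
# recomputation [y]P1 + [m]P2 equals the sender's Q = [z]P2 without
# using the basepoint. Toy group: order-11 subgroup of Z_23^*.
p, q, B = 23, 11, 4

def smul(k, P):
    return pow(P, k, p)

r1, r2, z, m = 3, 5, 8, 6                 # arbitrary nonzero scalars mod q
P1, P2 = smul(r1, B), smul(r2, B)         # P_i = [r_i]B
Q = smul(z, P2)                           # sender's Q = [z]P2
y = (z - m) * r2 * pow(r1, -1, q) % q     # collision y = (z - m) r2 / r1

assert smul(y, P1) * smul(m, P2) % p == Q  # [y]P1 + [m]P2 = [z]P2
```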
Concretely, this mechanism works as follows, splitting out generation of flag keys (diversifier-dependent) from generation of detection keys:
SFMD2d/KeyGen
For $i=1,…,γ$, choose random $x_{i} ∈ Z_{q}$, and return the detection key $dk←(x_{1},…,x_{γ})$.
SFMD2d/Diversify
On input detection key $dk$ and diversifier $d$, first parse $(x_{1},…,x_{γ})←dk$. Then compute the diversified base $B_{d}←H_{3}(d)$, and use it to compute $X_{i}=[x_{i}]B_{d}$. Return the diversified flag key $fk_{d}←(X_{1},…,X_{γ})$.
SFMD2d/Flag
On input diversified flag key $fk_{d}$ and diversifier $d$, first parse $(X_{1},…,X_{n})←fk_{d}$, then proceed as follows:

- Compute the diversified base $B_{d}←H_{3}(d)$.
- Choose random $r_{1} ∈ Z_{q}$ and compute $P_{1}←[r_{1}]B_{d}$.
- Choose random $r_{2} ∈ Z_{q}$ and compute $P_{2}←[r_{2}]B_{d}$.
- Choose random $z ∈ Z_{q}$ and compute $Q←[z]P_{2}$.
- For each $i=1,…,n$, compute
  - a key bit $k_{i}←H_{1}(P_{1}∣∣P_{2}∣∣[r_{1}]X_{i}∣∣Q)$;
  - a ciphertext bit $c_{i}←k_{i}⊕1$.
- Compute $m←H_{2}(P_{1}∣∣P_{2}∣∣n∣∣c_{1}∣∣⋯∣∣c_{n})$.
- Compute $y←(z−m)\frac{r_{2}}{r_{1}}$.

Return the flag ciphertext $fc←(P_{1},P_{2},y,c_{1},…,c_{n})$.
SFMD2d/Test
On input detection key $dk$ and flag ciphertext $fc$, first parse $(x_{1},…,x_{γ})←dk$ and $(P_{1},P_{2},y,c_{1},…,c_{n})←fc$, then proceed as follows:

- Compute $m←H_{2}(P_{1}∣∣P_{2}∣∣n∣∣c_{1}∣∣⋯∣∣c_{n})$.
- Recompute $Q$ as $Q←[y]P_{1}+[m]P_{2}$.
- For each $i=1,…,n$, compute
  - a key bit $k_{i}←H_{1}(P_{1}∣∣P_{2}∣∣[x_{i}]P_{1}∣∣Q)$;
  - a plaintext bit $b_{i}←c_{i}⊕k_{i}$.

If all plaintext bits $b_{i}=1$, return $1$ (match); otherwise, return $0$.
Deterministic flag key generation
One remaining obstacle is the size of the flag keys. A flag key supporting false positive probabilities down to $2^{−γ}$ requires $γ$ group elements; for instance, supporting probabilities down to $2^{−14}=1/16384$ with a 256-bit group requires 448-byte flag keys. It would be much more convenient to have smaller keys.
One way to do this is to use a deterministic key derivation mechanism similar to BIP32 to derive a sequence of child keypairs from a single parent keypair, and use that sequence as the components of a flag / detection keypair. To do this, we define a hash function $H_{4}:G×Z→Z_{q}$. Then given a parent keypair $(x,X=[x]B)$, child keys can be derived as
$$
x_{i} ← x+H_{4}(X∣∣i), \qquad X_{i} ← X+[H_{4}(X∣∣i)]B,
$$
with $X_{i}=[x_{i}]B$. Unlike BIP32, there is only one sequence of keypairs, so there is no need for a chain code to namespace different levels of derivation, although of course the hash function $H_{4}$ should be domain-separated.
Applying this to SFMD2, the scalar $x$ becomes the compact detection key, and the point $X$ becomes the compact flag key. The full detection key $(x_{1},…,x_{γ})$ can be derived from the compact detection key $x$, and the full flag key $(X_{1},…,X_{γ})$ can be derived from the compact flag key $X$. In addition, it is no longer necessary to select the minimum false positive probability as a global parameter $γ$ prior to key generation, as the compact keys can be extended to arbitrary length (and thus arbitrary false positive probabilities).
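The consistency of this derivation, and the capability collapse it implies, can be checked in a toy group (the order-11 subgroup of $Z_{23}^{*}$, multiplicative; $H_{4}$ is a hypothetical SHA-256 stand-in):

```python
import hashlib

# Toy check of the BIP32-style derivation: x_i = x + H4(X||i) and
# X_i = X + [H4(X||i)]B satisfy X_i = [x_i]B. Toy group: order-11
# subgroup of Z_23^* (multiplicative); H4 is a SHA-256 stand-in.
p, q, B = 23, 11, 4

def smul(k, P):
    return pow(P, k, p)

def H4(X, i):
    # hypothetical domain-separated hash G x Z -> Z_q
    data = b"H4" + X.to_bytes(4, "little") + i.to_bytes(4, "little")
    return int.from_bytes(hashlib.sha256(data).digest(), "little") % q

x = 7                        # compact detection key (toy value)
X = smul(x, B)               # compact flag key X = [x]B

for i in range(1, 6):
    h = H4(X, i)
    x_i = (x + h) % q                  # child detection key component
    X_i = X * smul(h, B) % p           # child flag key: X + [H4(X||i)]B
    assert X_i == smul(x_i, B)         # consistent: X_i = [x_i]B

# Knowing X and any child secret recovers the parent: x = x_i - H4(X||i).
assert (x_i - H4(X, 5)) % q == x
```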
Deterministic derivation is only possible in the SFMD setting, not the RFMD setting, because SFMD does not require any separation of capability between the component keys used for each bit of the Bloom filter. Public deterministic derivation does not allow separation of capability: given $X$ and any child secret $x_{i}$, it’s possible to recover the parent secret $x$ and therefore the entire subtree of secret keys. Thus, it cannot be used for RFMD, where the detection key is created by disclosing only some of the bits of the root key.
Unfortunately, this optimization does not seem to be possible to combine with diversified detection. The detection key components $x_{i}$ must be derived from the parent secret and (the hash of) public data, but diversified detection requires that different public data (flag keys) have the same detection key components.
- TODO: Is there some clever way to sidestep this issue? It seems somewhat fundamental.
Sapling Integration
- Pick diversified detection XOR deterministic flag key generation.
- Addresses include a flag key / compact flag key.
  - 32-byte compact flag keys are probably workable; 320–448-byte full flag keys are probably not.
Parameter Considerations

How should the false positive rate be determined? In some epoch, let $p$ be the false positive rate, $N$ be the total number of messages, $M$ be the number of true positives for some detection key, and $D$ be the number of detections for that detection key. Then
$$E[D]=M+p(N−M)=pN+M(1−p),$$
and ideally $p$ should be chosen so that:

- $E[D]$ is bounded above;
- when $M$ is within the range of “normal use”, $E[D]$ is close enough to $pN$ that it’s difficult for a detector to distinguish (what does this mean exactly?).
- The notion of detection ambiguity only requires that true and false positives be ambiguous in isolation. In practice, however, a detector has additional context: the total number of messages, the number of detected messages, and the false positive probability. What’s the right notion in this context?
- What happens when an adversary manipulates $N$ (diluting the global message stream) or $M$ (by sending extra messages to a target address)? There is some analogy here to flashlight attacks, although with the critical difference that flashlight attacks on decoy systems degrade privacy of the transactions themselves, whereas here the scope is limited to transaction detection.
Homomorphic Threshold Decryption
A sketch of one construction is as follows.
Write a value $v∈[−2^{63},2^{63})$ in radix $2^{11}$ with signed coefficients, i.e., as $v=v_{0}+v_{1}2^{11}+⋯+v_{5}2^{55}$ with $v_{i}∈[−2^{11},2^{11})$. Encode each coefficient $v_{i}$ as $[v_{i}]B$ and use ElGamal encryption to form the ciphertext
$$Enc_{D}(v)=(Enc_{D}([v_{0}]B),…,Enc_{D}([v_{5}]B)).$$
Each ElGamal ciphertext consists of two group elements; if group elements can be encoded in 32 bytes, this gives a 384-byte ciphertext. To decrypt $Enc_{D}(v)$, use ElGamal decryption to obtain the group elements $([v_{0}]B,…,[v_{5}]B)$, and then use a lookup table to recover $v_{i}$ from $[v_{i}]B$, or fail if the value is unknown.
This can in principle be done inside of a zkSNARK circuit if the underlying group is an embedded elliptic curve, together with certification that the ciphertext was correctly formed with inrange coefficients.
Addition and subtraction of ciphertexts are done componentwise, using the homomorphic property of ElGamal encryptions and the fact that $[v_{i}]B+[w_{i}]B=[v_{i}+w_{i}]B$.
Adding $n=2^{k}$ ciphertexts of values whose coefficients were bounded as $∣v_{i}∣≤2^{11}$ results in a sum whose coefficients $w_{i}$ are bounded as $∣w_{i}∣≤n2^{11}=2^{11+k}$. Choosing $k=7$ and accounting for sign means that a lookup table of size $2⋅2^{11+7}=2^{19}$ is sufficient to decrypt sums of up to 128 well-formed ciphertexts. Sums of more than 128 ciphertexts can be handled by summing blocks of 128, decrypting the block sum, and summing the plaintexts.
Unfortunately, this method reveals slightly more information about a sum of encrypted summands than would be ideal. Ideally, it would reveal only the sum of the encrypted summands, but in fact it reveals the sum of each radix-$2^{11}$ chunk, without carrying between them. Carrying collapses information about the summands, but that information is revealed by this scheme. This seems unlikely to be a problem in practice, but it is worth quantifying.
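The coefficient encoding can be sketched without the encryption layer. This is an assumption-laden toy: plain Python integers stand in for field and group elements, and digits are taken balanced in $[-2^{10},2^{10})$, which satisfies the stated bound on the $v_{i}$.

```python
# Sketch of the signed radix-2^11 coefficient encoding. The ElGamal layer
# is omitted; plain integers stand in for field/group elements. Digits are
# balanced in [-2^10, 2^10), within the stated bound |v_i| < 2^11.
R = 2**11

def decompose(v):
    """Write v in [-2^63, 2^63) as v_0 + v_1*2^11 + ... + v_5*2^55."""
    digits = []
    for _ in range(6):
        d = ((v + R // 2) % R) - R // 2   # balanced digit, d ≡ v (mod 2^11)
        digits.append(d)
        v = (v - d) >> 11                 # exact: v - d divisible by 2^11
    assert v == 0, "input out of range"
    return digits

def recompose(digits):
    return sum(d * 2**(11 * i) for i, d in enumerate(digits))

# Round-trip check; componentwise sums of 128 such decompositions keep
# every coefficient within the 2^19-entry lookup table bound.
for v in (-2**63, 2**63 - 1, 0, 123456789, -987654321):
    assert recompose(decompose(v)) == v
```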
TODO
- the bounds above are a ballpark estimation; refine them and make them precise
- work out integration with ABCI++ protocol phases
Project Updates
These updates describe design progress on Penumbra:

- 2020-04-20: Project update log and issue tracker
- 2020-04-16: ABCI domain types
- 2020-04-12: tower-abci
- 2020-03-26: Design presentation
2020-06-07: ZSwap
Replaces the previous sketch DEX design with ZSwap, a new design that integrates Uniswap v3’s concentrated liquidity mechanism.
2020-05-11: Cryptography design
Creates a new Primitives section for the cryptographic primitives Penumbra will use.
Adds a rationale for the choice of proof system.
Adds an overview of the decaf377 construction with a comparison of its relative cost to other encodings, sketches the encoding and decoding functions, and adds test vectors. Specifying the square root algorithm and the hash-to-group map go together.
Also puts in placeholders for a Poseidon instance and a STROBE-with-Poseidon construction.
2020-04-20: Project update log and issue tracker
Created this project update section of the notes, backfilled it, and published the repo to use as an issue tracker for design details.
2020-04-16: ABCI domain types
The first design iteration of tower-abci used raw protobuf-generated types from the tendermint-proto crate. However, because these types are generated from protobuf definitions, they're much less ergonomic: they don't follow Rust naming conventions, they can't be well-organized into modules, they can't be easily documented, etc. More importantly, because protobufs require that all fields be optional and have default values, using raw proto types means that validation checks are scattered throughout the codebase.
To fix this, tendermint-rs PR #862 adds a complete model for the ABCI protocol as a set of Rust domain types for each request and response method, and provides validating conversions between the domain types and the raw proto types.
Using this code in tower-abci makes the interface significantly better, with the bulk of the kvstore example application fitting on one screen of code.
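The domain-type pattern looks roughly like the following sketch. The struct and field names here are illustrative stand-ins, not the actual tendermint-rs types: a raw proto-style struct with defaulted fields, an ergonomic domain type, and a validating `TryFrom` conversion between them.

```rust
use std::convert::TryFrom;

// As generated from protobuf: every field has a default value.
struct RawInfoResponse {
    version: String,        // defaults to ""
    app_version: u64,
    last_block_height: i64, // protobuf int64; could be negative
}

// Ergonomic domain type: invariants hold by construction.
struct InfoResponse {
    version: String,
    app_version: u64,
    last_block_height: u64, // validated: never negative
}

// Validation happens once, at the conversion boundary, instead of
// being scattered through the application.
impl TryFrom<RawInfoResponse> for InfoResponse {
    type Error = String;
    fn try_from(raw: RawInfoResponse) -> Result<Self, String> {
        if raw.version.is_empty() {
            return Err("missing version".to_string());
        }
        let last_block_height = u64::try_from(raw.last_block_height)
            .map_err(|_| "negative block height".to_string())?;
        Ok(InfoResponse {
            version: raw.version,
            app_version: raw.app_version,
            last_block_height,
        })
    }
}
```

Application code then works only with the domain type, and any invalid raw message is rejected at the boundary.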
2020-04-12: tower-abci
Penumbra will use ABCI to talk to Tendermint. ABCI is the interface between Tendermint (a consensus engine for BFT replication of a state machine), and an arbitrary application (the state machine to be replicated). The ABCI interface consists of a set of requests and responses the consensus engine makes to drive the application state.
This means that Penumbra needs an ABCI interface. Existing Rust interfaces to ABCI require application developers to write their own ad hoc synchronization logic to handle concurrent ABCI requests. This makes it difficult to get up and running, and difficult to correctly handle concurrent requests.
To address this, tower-abci is a new design for an asynchronous ABCI interface using Tower. Tower is a library of modular components for building networking clients and servers. Tower defines a core abstraction, the `Service` trait, which represents an asynchronous function with backpressure, and then provides combinators that allow generic composition of additional behavior, e.g., timeouts, buffering, load-shedding, rate-limiting, instrumentation, etc.
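As a rough, synchronous caricature of that abstraction (the real Tower `Service` trait is asynchronous and uses `poll_ready` for backpressure; everything below is illustrative, not Tower's actual API):

```rust
// Simplified, synchronous stand-in for Tower's Service trait.
trait Service<Request> {
    type Response;
    fn call(&mut self, req: Request) -> Self::Response;
}

// A generic middleware: wraps any inner service and counts calls,
// without the inner service knowing. Tower's timeout, buffer, and
// rate-limit layers compose in the same way.
struct CountCalls<S> {
    inner: S,
    calls: u32,
}

impl<S, R> Service<R> for CountCalls<S>
where
    S: Service<R>,
{
    type Response = S::Response;
    fn call(&mut self, req: R) -> S::Response {
        self.calls += 1;
        self.inner.call(req)
    }
}

// A trivial leaf service.
struct Echo;
impl Service<String> for Echo {
    type Response = String;
    fn call(&mut self, req: String) -> String {
        req
    }
}
```

Because the middleware is generic over the inner service, behavior like instrumentation can be layered onto any service without modifying it.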
The tower-abci crate has two parts:
- An ABCI server, which listens for connections and forwards ABCI requests to one of four user-provided `Service`s, each responsible for processing one category of requests (consensus, mempool, info, or snapshot).
- Middleware that splits a single `Service` implementing all of ABCI into four cloneable component services, each implementing one category of requests. The component services use message-passing to share access to the main service, which processes requests with the following category-based prioritization: `ConsensusRequest`s sent to the `Consensus` service; `MempoolRequest`s sent to the `Mempool` service; `SnapshotRequest`s sent to the `Snapshot` service; `InfoRequest`s sent to the `Info` service.

Because the ABCI server takes one service per category, users can apply Tower layers to the services they pass to the ABCI `Server` to add category-specific behavior, such as load-shedding, buffering, etc.
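A stripped-down sketch of the message-passing split follows. It uses a single FIFO channel and omits the category-based prioritization described above; all names are illustrative, not tower-abci's actual API.

```rust
use std::sync::mpsc;
use std::thread;

// Request categories, as in ABCI.
enum Request {
    Consensus(String),
    Mempool(String),
    Info(String),
    Snapshot(String),
}

// A cloneable handle: each clone can stand in for one category's
// component service, all sharing access to the single application.
#[derive(Clone)]
struct Handle {
    tx: mpsc::Sender<(Request, mpsc::Sender<String>)>,
}

impl Handle {
    fn call(&self, req: Request) -> String {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send((req, reply_tx)).unwrap();
        reply_rx.recv().unwrap()
    }
}

// One worker thread owns the application; handles forward requests to
// it over the channel, so no shared-state synchronization is needed.
fn spawn_app() -> Handle {
    let (tx, rx) = mpsc::channel::<(Request, mpsc::Sender<String>)>();
    thread::spawn(move || {
        for (req, reply) in rx {
            let resp = match req {
                Request::Consensus(m) => format!("consensus: {}", m),
                Request::Mempool(m) => format!("mempool: {}", m),
                Request::Info(m) => format!("info: {}", m),
                Request::Snapshot(m) => format!("snapshot: {}", m),
            };
            let _ = reply.send(resp);
        }
    });
    Handle { tx }
}
```

The real middleware additionally prioritizes requests by category rather than processing them strictly in arrival order.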
These parts can be combined in different ways to provide different points on the tradeoff curve between implementation complexity and performance:
- At the lowest level of complexity, application developers can implement an ABCI application entirely synchronously. To do this, they implement `Service<Request>` so that `Service::call` performs request processing and returns a ready future. Then they use `split::service` to create four component services that share access to their application, and use those to construct the ABCI `Server`. The application developer does not need to manage synchronization of shared state between different clones of their application, because there is only one copy of their application.
- At the next level of complexity, application developers can implement an ABCI application partially synchronously. As before, they implement `Service<Request>` to create a single ABCI application, but instead of processing all requests in the body of `Service::call`, they can defer processing of some requests by immediately returning a future that will be executed on the caller's task. Although all requests are still received by the application task, not all request processing needs to happen on the application task.
- At the highest level of complexity, application developers can implement multiple distinct `Service`s and manually control synchronization of shared state between them, then use these to construct the ABCI `Server`.
Because these use the same interfaces in different ways, application developers can move gradually along this curve according to their performance requirements, starting with a synchronous application, then refactoring it to do some processing asynchronously, then doing more processing asynchronously, then splitting out one standalone service, then using entirely distinct services, etc.
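The middle level, deferring work to the caller's task, can be caricatured synchronously with a closure standing in for the returned future (illustrative only, not the actual interface):

```rust
// `call` does cheap bookkeeping on the application task, then returns
// deferred work (a closure here; a future in the real async version)
// that the caller executes on its own task.
struct App {
    requests_seen: u64,
}

impl App {
    fn call(&mut self, input: u64) -> Box<dyn FnOnce() -> u64> {
        // Runs on the application task:
        self.requests_seen += 1;
        // Runs wherever the caller executes it:
        Box::new(move || input * 2)
    }
}
```

The application task touches its own state synchronously, while the expensive part of request processing runs elsewhere.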
2020-03-26: Design presentation
Initial publication of these notes, and presentation at the ZKValidator Cosmos Privacy+ZKP showcase.
The slides for the presentation can be found here, and a recording is published here.