Ethereum Client Diversity Part 1: Consensus & Finalization
This is the first of a series of three articles about Ethereum client diversity from an operational perspective, covering the risks associated with running different types of clients. The first part explains how the Ethereum consensus works around finalization and offers a way to think about it at a higher level. The second and third parts detail the different scenarios and their consequences for validators, depending on which side of a fork they are on whenever finalization issues arise. Based on this, we highlight the strategy we currently have in place at Kiln around diversity.
We hope this series will motivate some decisions from other actors to ensure the Ethereum network is in a healthy position. It can also help our customers understand why and how we tackle diversity: doing the right thing for the network while ensuring minimal risks for their positions.
Ethereum Consensus & Finalization
To understand the issues at stake under the client diversity discussions, it’s important to understand how the protocol works at the consensus level, especially around slot finalization.
Slots and Epochs
The Ethereum consensus layer (a.k.a. the beacon chain) is composed of slots which happen every 12 seconds. A slot can be seen as a unit of time during which a selected validator (the proposer) has to create and propagate a block. Slots are grouped in logical entities called epochs, each epoch containing 32 slots.
graph LR subgraph Epoch 1 slot-0 slot-1 slots-epoch1 slot-31 end subgraph Epoch 2 slot-32 slot-33 slots-epoch2 slot-63 end slot-0 -.- slot-1 -.- slots-epoch1[...] -.- slot-31 -.- slot-32 -.- slot-33 -.- slots-epoch2[...] -.- slot-63
At the beginning of each new slot, the selected proposer broadcasts a block proposal, and a subset of 1/32 of all validators is responsible for verifying the block and voting for it (attestation). As a result, during an epoch, all active validators are expected to cast one vote.
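The slot and epoch arithmetic described here can be sketched in a few lines of Python. This is a minimal illustration rather than client code: the constants and the two `compute_*` helpers are redefined locally, using the names from the consensus specification, while `is_checkpoint_slot` is a hypothetical helper added for this example. Note that this sketch counts epochs from 0 (so slot 32 opens epoch 1), whereas the diagrams in this article number epochs starting at 1.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def compute_epoch_at_slot(slot: int) -> int:
    """Return the (zero-based) epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


def compute_start_slot_at_epoch(epoch: int) -> int:
    """Return the first slot of the given (zero-based) epoch."""
    return epoch * SLOTS_PER_EPOCH


def is_checkpoint_slot(slot: int) -> bool:
    """Hypothetical helper: True if the slot is the first slot of its epoch."""
    return slot % SLOTS_PER_EPOCH == 0


# Duration of one epoch in seconds: 32 slots of 12 seconds each.
EPOCH_DURATION_SECONDS = SECONDS_PER_SLOT * SLOTS_PER_EPOCH
```

With these values an epoch lasts 384 seconds, that is 6.4 minutes, which is also the interval at which each active validator is expected to attest.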
The Canonical chain
Proposers are expected to propose a block on top of what they consider to be the head of the chain, that is, what they think is the current latest valid block of the chain. This is done via the parent_root field of the block proposal payload. If the previous block is not received in time or is invalid, the proposer considers it missed and builds the next block on top of the block before. This has strong implications because each validator can receive blocks at different times and, as a result, the tip of the chain can fork into different branches.
For example:
- The validator proposing a block on slot N+2 may not have seen the block on slot N+1 so considers it missed, and it bases its block on top of block-0,
- The validator proposing a block on slot N+3 however, saw the block at slot N+1, but not the one at N+2 so considers it missed, and it bases its block on top of block-1.
graph RL classDef missed stroke:#f00 subgraph slot-0[slot N] block-0 end subgraph slot-1[slot N+1] missed-block-1[missed block]:::missed block-1 end subgraph slot-2[slot N+2] block-2 missed-block-2[missed block]:::missed end subgraph slot-3[slot N+3] block-3 end block-1 -.-> block-0 block-2 -.- missed-block-1 -.-> block-0 block-3 -.- missed-block-2 -.-> block-1
In turn, attesters vote for the block they see and consider to be the head of the chain for the slot they are assigned to. This leads to a situation where the tree has branches with different weights, depending on the validators that back each branch: the weight of a branch is the sum of the effective balances of the validators that voted for it. Forks can also happen for reasons other than latency: if there is a bug in a consensus node or in an execution node, or an attack on the network, a proposed block may not be verifiable by voters and is then dismissed from their local view.
Flattening this tree structure into a linked list (the canonical chain) is the job of the LMD-Ghost algorithm (Latest-Message-Driven, a variant of the GHOST algorithm specific to Ethereum), which dynamically updates its view depending on the voting weights it sees and other criteria. The more the chain progresses, the more votes and weight a block gathers, and the less likely it is to be re-balanced. This provides a coherent view of the network and is what makes it possible for explorers like Etherscan to show a linear history of the chain, without any trees.
graph LR classDef green fill:#006400,color:#ffffff block_30[block 30]:::green --- block_31[block 31]:::green --- block_32[block 32]:::green --- block_33[block 33]:::green block_30 -.- block_31'[block 31' ] -.- block_32'[block 32'] block_31 -.- block_32''[block 32'' ] -.- block_33''[block 33''] block_33'' -.- block_34''[block 34'']
Each validator in the network runs the LMD-Ghost based on its local view of the network, and depending on when each vote is received, what the validator considers to be the canonical chain can change if a branch gets more traction: this is a re-organization (a.k.a re-org).
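To make the weighting idea more concrete, here is a deliberately simplified sketch of a greedy heaviest-subtree walk in Python. It is not the fork choice of any client: the function name `get_head`, the data layout (plain dictionaries keyed by block name) and the tie-break on block name are assumptions made for this example, and the "other criteria" used by real implementations are ignored entirely.

```python
from collections import defaultdict
from typing import Dict, Optional


def get_head(
    parents: Dict[str, Optional[str]],
    latest_votes: Dict[str, str],
    balances: Dict[str, int],
    start: str,
) -> str:
    """Walk from `start` to a leaf, always picking the heaviest child."""
    # Build the child lists from the parent pointers (the parent_root links).
    children = defaultdict(list)
    for block, parent in parents.items():
        if parent is not None:
            children[parent].append(block)

    def weight(block: str) -> int:
        # A block's weight is the balance of validators whose latest vote
        # points at this block or at any of its descendants.
        own = sum(balances[v] for v, voted in latest_votes.items() if voted == block)
        return own + sum(weight(child) for child in children[block])

    head = start
    while children[head]:
        # Ties are broken on the block name, an arbitrary choice for this sketch.
        head = max(children[head], key=lambda b: (weight(b), b))
    return head


# A small tree: A <- B <- C on one branch, A <- B' on the other.
parents = {"A": None, "B": "A", "B'": "A", "C": "B"}
balances = {"v1": 32, "v2": 32, "v3": 32}

# Two validators out of three back B', so B' is the local head.
votes = {"v1": "C", "v2": "B'", "v3": "B'"}
head_before = get_head(parents, votes, balances, "A")

# A newer vote from v3 arrives for C: the heaviest branch changes (a re-org).
votes["v3"] = "C"
head_after = get_head(parents, votes, balances, "A")
```

In this toy example the local head moves from B' to C once the third validator's latest vote is taken into account, which is the kind of view change described above as a re-org.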
This introduces a downside from the users’ perspective: a transaction can appear in a block that gets re-orged and is therefore no longer included in the canonical version of the chain. The advantage is that the network always provides a live view (no chain halt, there will always be progress made), with a probabilistic view of which version of the chain is most likely to finalize at any point. To provide stronger guarantees, the Ethereum consensus adds on top of LMD-Ghost additional logic that brings finalization: Casper the Friendly Finality Gadget, also known as FFG.
Justification and Finalization
Finalization is the process by which Ethereum guarantees that the history of the chain before a certain slot, called a checkpoint, can’t be re-organized without burning at least 1/3 of the staked value. A checkpoint is the first slot of an epoch with its corresponding block root. Finalization happens on epoch boundaries for efficiency reasons: validators already cast one vote per epoch for the current head of the network, so they vote for justification in the same payload.
The justification vote is a link between two checkpoint slots:
- the source checkpoint: the last justified checkpoint as seen by the validator
- the target checkpoint: the checkpoint of the current epoch according to the canonical view of the validator
If the validator sees 2/3 of the network (by staked value) casting a vote from its current justified checkpoint to its canonical target, the validator marks the target as the new justified epoch. In normal operations, most of the time there is 1 epoch difference between the source epoch and the target epoch, but under latency conditions for instance, the network can be slower and larger gaps can be created. When two consecutive epochs are justified and the former is a child of a finalized epoch, the former is considered finalized.
graph LR classDef justified fill:#8B8000 classDef finalized fill:#006400 subgraph epoch 2 subgraph slot_32 block_a["block A"] end end subgraph epoch 3 subgraph slot_64 block_b["block B"] end end subgraph epoch 4 subgraph slot_96 block_c["block C"] end end subgraph epoch 5 subgraph slot_128 block_d["block D"] end end slot_32:::justified -.-> slot_64 slot_64 -.- slot_96 slot_96 -.- slot_128
This is from the canonical view of a validator: epoch 2 is justified, validator receives votes to link epoch 2 to epoch 3 (with source S=2, root=A and target T=3, root=B).
graph LR classDef justified fill:#8B8000 classDef finalized fill:#006400 subgraph epoch 2 subgraph slot_32 block_a["block A"] end end subgraph epoch 3 subgraph slot_64 block_b["block B"] end end subgraph epoch 4 subgraph slot_96 block_c["block C"] end end subgraph epoch 5 subgraph slot_128 block_d["block D"] end end slot_32:::justified -.- 2/3 -.- slot_64:::justified slot_64 -.- slot_96 slot_96 -.- slot_128
Validator received 2/3 of votes with source S=2, root=A and target T=3, root=B: it marks epoch 3 as justified.
graph LR classDef justified fill:#8B8000 classDef finalized fill:#006400 subgraph epoch 2 subgraph slot_32 block_a["block A"] end end subgraph epoch 3 subgraph slot_64 block_b["block B"] end end subgraph epoch 4 subgraph slot_96 block_c["block C"] end end subgraph epoch 5 subgraph slot_128 block_d["block D"] end end slot_32:::finalized --- slot_64:::justified slot_64 -.-> slot_96 slot_96 -.- slot_128
There are now two justified epochs in a row (2 and 3): the validator marks the first one, epoch 2, as finalized, while gathering FFG votes from source S=3, root=B to target T=4, root=C.
graph LR classDef justified fill:#8B8000 classDef finalized fill:#006400 subgraph epoch 2 subgraph slot_32 block_a["block A"] end end subgraph epoch 3 subgraph slot_64 block_b["block B"] end end subgraph epoch 4 subgraph slot_96 block_c["block C"] end end subgraph epoch 5 subgraph slot_128 block_d["block D"] end end slot_32:::finalized --- slot_64:::justified slot_64 -.- 2/3 -.- slot_96:::justified slot_96 -.- slot_128
Validator received 2/3 of votes for source S=3, root=B and target T=4, root=C: it marks epoch 4 as justified.
graph LR classDef justified fill:#8B8000 classDef finalized fill:#006400 subgraph epoch 2 subgraph slot_32 block_a["block A"] end end subgraph epoch 3 subgraph slot_64 block_b["block B"] end end subgraph epoch 4 subgraph slot_96 block_c["block C"] end end subgraph epoch 5 subgraph slot_128 block_d["block D"] end end slot_32:::finalized --- slot_64:::justified slot_64:::finalized --- slot_96:::justified slot_96 -.-> slot_128
Epochs 3 and 4 are both justified, that’s two in a row: the validator marks epoch 3 as finalized.
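The walkthrough above can be condensed into a small Python sketch. It is a simplification under stated assumptions: `Checkpoint` here is a local dataclass rather than the specification's container, `update_checkpoints` is a hypothetical helper that processes the votes of a single epoch, votes are plain `(source, target, weight)` tuples, and only the simplest finalization case (two consecutive justified epochs) is handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str


def is_supermajority(link_weight: int, total_weight: int) -> bool:
    """True if the link gathered at least 2/3 of the total staked weight."""
    return 3 * link_weight >= 2 * total_weight


def update_checkpoints(
    justified: Checkpoint,
    finalized: Checkpoint,
    votes: List[Tuple[Checkpoint, Checkpoint, int]],
    total_weight: int,
) -> Tuple[Checkpoint, Checkpoint]:
    """Return the new (justified, finalized) pair after one epoch of votes."""
    weights = {}
    for source, target, weight in votes:
        # Only links starting from our current justified checkpoint count.
        if source == justified:
            weights[target] = weights.get(target, 0) + weight

    for target, weight in weights.items():
        if is_supermajority(weight, total_weight):
            # Two justified epochs in a row: the former becomes finalized.
            if target.epoch == justified.epoch + 1:
                finalized = justified
            justified = target
            break
    return justified, finalized


older = Checkpoint(1, "older")
a, b, c = Checkpoint(2, "A"), Checkpoint(3, "B"), Checkpoint(4, "C")

# Epoch 2 is justified; 70% of the stake votes for the link A -> B.
state = update_checkpoints(a, older, [(a, b, 70)], total_weight=100)
# Only 60% vote for B -> C: nothing changes.
stalled = update_checkpoints(*state, [(b, c, 60)], total_weight=100)
# 67% vote for B -> C: epoch 4 is justified and epoch 3 finalized.
final = update_checkpoints(*state, [(b, c, 67)], total_weight=100)
```

The integer comparison `3 * link_weight >= 2 * total_weight` avoids floating-point rounding when testing the 2/3 threshold.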
Casper Slashing Rules
Casper slashing rules are here to enforce the justification and finalization process: they ensure that a validator which committed to finalize a certain view of the network can’t change its mind about it without being heavily penalized. Because they limit the actions an operator can take during a major consensus or execution client issue, it is crucial to understand them. In part two of this series, we will reference these slashing rules when discussing the options for validators switching forks in the case of a faulty execution or consensus client.
There are two rules to follow which are expressed in the Casper the Friendly Finality Gadget paper at the heart of the Ethereum consensus:
Note: there are other slashing rules in Ethereum (to prevent proposing two different blocks at the same height, or to prevent voting on multiple heads at the same height). We are only concerned with the finalization process here, as this is what matters in the context of diversity.
Rule 1: No Multiple Votes on the Same Target
This is the first rule of the FFG specification: a validator that vouched for a certain root as its target vote can’t vouch for another one at the same epoch. This can happen, for instance, if an operator switches its validator client to another beacon node with a different view of the network (i.e. a different output of LMD-Ghost on the target slot): in such a case, the validator may cast a second vote for the same slot but with a different target. This is usually prevented by the use of local anti-slashing databases.
Here are a few examples of slashable votes that break the first rule:
graph LR subgraph slot_32 block_A["Block A"] end subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end slot_32 -.- slot_64 slot_64 --> slot_96 slot_64 --> slot_96' slot_64 -.- slot_96 slot_64 -.- slot_96' linkStyle 1,2 stroke-width:2px,fill:none,stroke:red;
Here the validator sent two attestations targeting different blocks at the same height.
graph LR subgraph slot_32 block_A["Block A"] end subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end slot_32 -.- slot_64 slot_64 --> slot_96 slot_64 -.- slot_96 slot_32 --> slot_96' slot_64 -.- slot_96' linkStyle 1,3 stroke-width:2px,fill:none,stroke:red;
In this case the two attestations are from different sources and they target different blocks but at the same height.
graph LR subgraph slot_32 block_A["Block A"] end subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end slot_32 -.- slot_64 slot_64 -.- slot_96 slot_32 --> slot_96' slot_64 --> slot_96' slot_64 -.- slot_96' linkStyle 3,2 stroke-width:2px,fill:none,stroke:red;
Here the two attestations are from a different source but they target the same block.
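The three cases above can be checked with a short Python predicate. This is a sketch restricted to the FFG part of a vote: `Vote` and `is_double_vote` are names chosen for this example, and the real attestation data also carries a slot, a committee index and a head vote, which are left out here. Epoch numbers follow the article's diagrams (slot 32 is epoch 2, slot 64 is epoch 3, slot 96 is epoch 4).

```python
from typing import NamedTuple


class Vote(NamedTuple):
    source_epoch: int
    source_root: str
    target_epoch: int
    target_root: str


def is_double_vote(first: Vote, second: Vote) -> bool:
    """True if two distinct votes share the same target epoch."""
    return first != second and first.target_epoch == second.target_epoch


# Case 1: same source, two different target blocks at the same height.
case_1 = is_double_vote(Vote(3, "B", 4, "C"), Vote(3, "B", 4, "C'"))
# Case 2: different sources, different target blocks at the same height.
case_2 = is_double_vote(Vote(3, "B", 4, "C"), Vote(2, "A", 4, "C'"))
# Case 3: different sources, same target block.
case_3 = is_double_vote(Vote(2, "A", 4, "C'"), Vote(3, "B", 4, "C'"))
# Re-sending the exact same vote is not a double vote.
same_vote = is_double_vote(Vote(3, "B", 4, "C"), Vote(3, "B", 4, "C"))
```

All three cases evaluate to true because the two votes differ while sharing a target epoch, regardless of which field differs.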
Rule 2: No Surrounding Votes
The second rule is about surrounding votes. As we saw before, an FFG vote can be seen as a link: if a validator publishes two links where one strictly surrounds the other, the validator is slashed. This can happen during execution or consensus client incidents where the network splits into multiple views, for instance if you migrate your validator client to another beacon node type or execution client type which has a different interpretation of the network. Here again, a local anti-slashing database prevents this (it would likely prevent your validator from voting at all here).
graph LR subgraph slot_32 block_A["Block A"] end subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end subgraph slot_128 block_D["Block D"] end subgraph slot_128' block_D'["Block D'"] end subgraph slot_128'' block_D''["Block D''"] end slot_32 -.- slot_64 slot_64 -.- slot_96 slot_64 --> slot_96' slot_64 -.- slot_96' slot_96 -.- slot_128 slot_96 -.- slot_128' slot_96' -.- slot_128'' slot_32 --> slot_128'' linkStyle 7,2 stroke-width:2px,fill:none,stroke:red;
Here the first attestation from slot 64 to slot 96’ is surrounded by the attestation from slot 32 to slot 128’’.
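As an illustration, the surround check and a toy in-memory version of a local anti-slashing check can be written as follows. The names `Link`, `surrounds` and `is_safe_to_sign` are assumptions made for this sketch, not the API of any validator client, and the check is intentionally conservative: it refuses any second vote on an already signed target epoch, even an identical one. Epoch numbers follow the article's diagrams (slot 32 is epoch 2, slot 128 is epoch 5).

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    source_epoch: int
    target_epoch: int


def surrounds(outer: Link, inner: Link) -> bool:
    """True if `outer` strictly surrounds `inner`."""
    return (
        outer.source_epoch < inner.source_epoch
        and inner.target_epoch < outer.target_epoch
    )


def is_safe_to_sign(history: List[Link], candidate: Link) -> bool:
    """Refuse a vote that conflicts with anything signed before."""
    for signed in history:
        # Rule 1 (conservative): never sign twice for the same target epoch.
        if signed.target_epoch == candidate.target_epoch:
            return False
        # Rule 2: neither link may surround the other.
        if surrounds(signed, candidate) or surrounds(candidate, signed):
            return False
    return True


# The validator already voted from slot 64 (epoch 3) to slot 96' (epoch 4).
history = [Link(3, 4)]

surrounding = is_safe_to_sign(history, Link(2, 5))  # slot 32 -> slot 128''
next_epoch = is_safe_to_sign(history, Link(4, 5))   # normal follow-up vote
same_source = is_safe_to_sign(history, Link(3, 5))  # same source, later target
```

Only the first candidate is rejected: the link from epoch 2 to epoch 5 surrounds the earlier link from epoch 3 to epoch 4, as in the diagram above.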
Implicit Rule: Conflicting Checkpoints
In FFG, validators perform justification and finalization by accounting for attestations that match their view (i.e. their source and their target). This means that an alien vote from a different source to the same target as the validator won’t be accounted for. In other words, finality is achieved on a link (i.e. a correlated pair of source and target) with the target descending from the source. This prevents a validator from casting a vote with a source on one branch and a target on another branch.
Even though such a vote wouldn’t be slashed as it does not violate the surrounding rule, it will be dismissed by all nodes on the network. Such a vote is nonsense: it would mean that a part of the blockchain finalized on one branch and that the next finalized part, on another branch, does not descend from a known block. This has implications for what an operator can do whenever a finality issue arises: you can’t always simply upgrade your stack once a vote has been cast on a specific branch.
Here are a few examples that break the coherency rule while technically not being slashable offences:
graph LR subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end subgraph slot_128'' block_D'["Block D'"] end slot_64 -.- slot_96 slot_64 -.- slot_96' slot_96' -.- slot_128'' slot_96 --> slot_128'' linkStyle 3 stroke-width:2px,fill:none,stroke:orange;
Here block D’ does not descend from block C, it doesn’t make sense from a chain level to have such a link and this vote will be dismissed by validators on the network as none will understand it.
graph LR subgraph slot_32 block_A["Block A"] end subgraph slot_64 block_B["Block B"] end subgraph slot_64' block_B'["Block B'"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end subgraph slot_128 block_D["Block D"] end subgraph slot_128' block_D'["Block D'"] end subgraph slot_128'' block_D''["Block D''"] end slot_32 -.- slot_64 slot_32 -.- slot_64' slot_64 -.- slot_96 slot_64 -.- slot_96' slot_96 -.- slot_128 slot_96 -.- slot_128' slot_96' -.- slot_128'' slot_96 --> slot_128'' linkStyle 7 stroke-width:2px,fill:none,stroke:orange;
Similar situation here, block D’’ does not descend from block C and the vote will be dismissed by other validators.
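The coherency condition boils down to an ancestry check, which can be sketched in Python as follows. The `parents` dictionary and the helper names `is_descendant` and `is_coherent_link` are assumptions made for this example, mirroring the first diagram of this section.

```python
from typing import Dict, Optional


def is_descendant(parents: Dict[str, Optional[str]], ancestor: str, block: str) -> bool:
    """True if `block` is `ancestor` itself or can reach it via parent links."""
    current: Optional[str] = block
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def is_coherent_link(parents: Dict[str, Optional[str]], source: str, target: str) -> bool:
    """A link only makes sense if the target descends from the source."""
    return is_descendant(parents, source, target)


# B has two children, C and C'; D' is built on top of C'.
parents = {"B": None, "C": "B", "C'": "B", "D'": "C'"}

cross_branch = is_coherent_link(parents, "C", "D'")   # source on another branch
same_branch = is_coherent_link(parents, "C'", "D'")   # target descends from source
```

The cross-branch link from C to D' fails the check, while the link from C' to D' passes it.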
Valid Votes
Now that we've discussed the rules for slashing, let's turn to some of the possible valid votes that validators can cast around justification and finalization. The most common situation is when the blockchain runs smoothly: validator votes follow the chain, with each target vote becoming the source of the next vote:
graph LR subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_128 block_D["Block D"] end subgraph slot_160 block_E["Block E"] end slot_64 -.- slot_96 slot_96 -.- slot_128 slot_128 -.- slot_160 slot_64 --> slot_96 slot_96 --> slot_128 slot_128 --> slot_160 linkStyle 3,4,5 stroke-width:2px,fill:none,stroke:green;
When the network runs under degraded network conditions or part of the validators is down for instance, it’s possible for validators to cast multiple votes with the same source, as long as the targets are not at the same height. In this example, not enough validators witnessed the votes from S=64,T=96 to justify it so they try to justify the next epoch on the next round:
graph LR subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_128 block_D["Block D"] end slot_64 -.- slot_96 slot_96 -.- slot_128 slot_64 --> slot_96 slot_64 --> slot_128 linkStyle 2,3 stroke-width:2px,fill:none,stroke:green;
Validators can also be offline, as in this example where the voting validator missed a vote between slot 96 and slot 128 and got a small penalty. This is fine, as Ethereum is designed to tolerate downtime:
graph LR subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_160 block_E["Block E"] end subgraph slot_128 block_D["Block D"] end subgraph slot_160 block_E["Block E"] end slot_64 -.- slot_96 slot_96 -.- slot_128 slot_128 -.- slot_160 slot_64 --> slot_96 slot_128 --> slot_160 linkStyle 3,4 stroke-width:2px,fill:none,stroke:green;
In the following case, the validator voted for the wrong target with T=96’, this can happen for instance due to a software bug in the execution or the consensus client. The operator likely upgraded its software and jumped back on the expected branch at slot 128:
graph LR subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end subgraph slot_160 block_E["Block E"] end subgraph slot_128 block_D["Block D"] end subgraph slot_160 block_E["Block E"] end slot_64 -.- slot_96 slot_64 -.- slot_96' slot_96 -.- slot_128 slot_128 -.- slot_160 slot_64 --> slot_96' slot_128 --> slot_160 linkStyle 4,5 stroke-width:2px,fill:none,stroke:green;
Here the validator committed to S=96’,T=128’, likely due to a consensus bug. This is fine provided there were enough votes on the other branch to justify S=96, allowing the operator to upgrade and vote for S=128:
graph LR subgraph slot_64 block_B["Block B"] end subgraph slot_96 block_C["Block C"] end subgraph slot_96' block_C'["Block C'"] end subgraph slot_128' block_D'["Block D'"] end subgraph slot_160 block_E["Block E"] end subgraph slot_128 block_D["Block D"] end subgraph slot_160 block_E["Block E"] end slot_64 -.- slot_96 slot_64 -.- slot_96' slot_96' -.- slot_128' slot_96 -.- slot_128 slot_128 -.- slot_160 slot_96' --> slot_128' slot_128 --> slot_160 linkStyle 5,6 stroke-width:2px,fill:none,stroke:green;
In part II of this series we will reference these slashing rules when discussing the options for validators switching forks in the case of a buggy client.
Inactivity Leak
Inactivity leak kicks in if there are 4 consecutive epochs that aren’t finalized. This usually means at least 1/3 of the network is struggling to vote correctly (either because they are offline, or because they wrongly voted and the network is stuck as there isn’t a 2/3 majority to finalize). In this scenario, the Ethereum consensus enters into a special mode where:
- nodes that do not attest during an epoch get a penalty that quadratically increases with time
- nodes that do attest get 0 rewards
Modelling the penalty accumulation during the first 48 hours from the point of view of a single validator:
Zooming out a bit, after a bit more than 50 days the entire balance of the stake is burned; when reaching an effective balance below 16 ETH, validators are automatically exited and have to go through the exit queue, which can take hours to weeks depending on the number of exits. Hopefully, the network reaches finalization again before getting to such scales:
The intent of this slowly ramping-up failure mode is to leave enough time for operators and client developers to address the issue, in case it can be fixed, before dramatically burning more and more ETH until the consensus can finalize again. As the misbehaving nodes (the ones not voting) are heavily penalized, the total staked value they represent will at some point be less than 1/3 of the total staked value, at which point the rest of the network holds enough staked value to reach the 2/3 needed for finality again. As we’ll see in the next articles of this series, a bug affecting more than 1/3 of the network can have dramatic consequences.
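For readers who want to experiment with the shape of this penalty, here is a rough Python model of a single validator that stays offline during an inactivity leak. It rests on several assumptions that should be checked against the specification before relying on it: the constants `INACTIVITY_SCORE_BIAS = 4` and `INACTIVITY_PENALTY_QUOTIENT = 2**24` are taken to be the mainnet values, the effective balance is approximated by rounding the balance down to a whole ETH (no hysteresis), regular attestation penalties are ignored, and the function name `simulate_offline_validator` is hypothetical.

```python
from typing import List

GWEI_PER_ETH = 10**9
EPOCHS_PER_DAY = 225  # 86400 seconds / (32 slots * 12 seconds)

# Assumed mainnet values, to be verified against the specification.
INACTIVITY_SCORE_BIAS = 4
INACTIVITY_PENALTY_QUOTIENT = 2**24
MAX_EFFECTIVE_BALANCE = 32 * GWEI_PER_ETH


def simulate_offline_validator(
    epochs: int, balance: int = 32 * GWEI_PER_ETH
) -> List[int]:
    """Return the balance (in Gwei) after each epoch spent offline in a leak."""
    score = 0
    history = []
    for _ in range(epochs):
        # The inactivity score grows every epoch the validator fails to attest.
        score += INACTIVITY_SCORE_BIAS
        # Simplified effective balance: whole ETH, capped at 32 ETH.
        effective = min(balance - balance % GWEI_PER_ETH, MAX_EFFECTIVE_BALANCE)
        # The per-epoch penalty grows linearly with the score, so the
        # cumulative loss grows roughly quadratically with time.
        penalty = effective * score // (
            INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT
        )
        balance = max(balance - penalty, 0)
        history.append(balance)
    return history


# Balances over the first 48 hours of a leak.
history = simulate_offline_validator(2 * EPOCHS_PER_DAY)
loss_day_1 = 32 * GWEI_PER_ETH - history[EPOCHS_PER_DAY - 1]
loss_day_2 = history[EPOCHS_PER_DAY - 1] - history[-1]
```

In this model the loss over the second day is larger than over the first one, which reflects the ramp-up described above. The absolute figures depend on the simplifications listed and should not be read as exact.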
Conclusion
In this article, we have described the process of finalization: how the Ethereum network chooses the canonical version of the chain and how it secures it via slashing rules. We have also introduced the Inactivity Leak, a state the network enters after 4 consecutive epochs without finalization. In this state, penalties are accumulated by validators until the network finalizes again.
With this knowledge, we are now equipped to delve into the complexities of execution and consensus layer diversity issues and their potential implications, which we will do in the next two articles.
Appendix
The Attestation Vote
The vote is a signature of the validator over a structure called the attestation data with the following payload, where the beacon_block_root, source and target fields respectively represent the head vote (output of LMD-Ghost), the source vote and the target vote (FFG Casper votes):
class AttestationData(Container):
    slot: Slot
    index: CommitteeIndex
    # LMD GHOST vote
    beacon_block_root: Root
    # FFG vote
    source: Checkpoint
    target: Checkpoint

class Checkpoint(Container):
    epoch: Epoch
    root: Root
The Block Proposal
The block proposal is a beacon block where the parent_root points to the hash of the block on top of which it is built (its parent):
class BeaconBlock(Container):
    slot: Slot
    proposer_index: ValidatorIndex
    parent_root: Root
    state_root: Root
    body: BeaconBlockBody
Coherent Links
Even though the slashing rules do not explicitly enforce that a target vote descends from a source vote, this is implicitly enforced in the Ethereum specification, in its process_attestation part: only attestations that have a source matching the current justified view of the validator are stored into current_epoch_attestations:
def process_attestation(state: BeaconState, attestation: Attestation) -> None:
    [...]
    if data.target.epoch == get_current_epoch(state):
        assert data.source == state.current_justified_checkpoint
        state.current_epoch_attestations.append(pending_attestation)
    else:
        assert data.source == state.previous_justified_checkpoint
        state.previous_epoch_attestations.append(pending_attestation)
The current_epoch_attestations list is later used to check whether the votes weigh more than 2/3 of the total, leading to justification and finalization. This means that a vote from a source other than the current justified view of the node won’t be taken into account.
Thanks to Sébastien Rannou (mxs) for writing this post, as well as the Ethereum Foundation, Thorsten Behrens and Emmanuel Nalepa for their support.
Find the second post of the series here.