Ethereum Client Diversity Part 1: Consensus & Finalization


This is the first in a series of three articles about Ethereum client diversity from an operational perspective, covering the risks associated with running different types of clients. This first part explains how Ethereum consensus reaches finalization and offers a way to think about it at a higher level. The second and third parts detail the scenarios and consequences for validators depending on which side of a fork they are on when finalization issues arise. Based on this, we highlight the strategy we currently have in place at Kiln around diversity.

We hope this series will motivate other actors to make decisions that keep the Ethereum network in a healthy position. It can also help our customers understand why and how we tackle diversity: doing the right thing for the network while minimizing the risks to their positions.

Ethereum Consensus & Finalization

To understand what is at stake in the client diversity discussions, it’s important to understand how the protocol works at the consensus level, especially around slot finalization.

Slots and Epochs

The Ethereum consensus layer (a.k.a. the beacon chain) is composed of slots, which occur every 12 seconds. A slot can be seen as a unit of time during which a selected validator (the proposer) has to create and propagate a block. Slots are grouped into logical entities called epochs, each containing 32 slots.

    graph LR

  subgraph Epoch 1
      slot-0
      slot-1
      slots-epoch1
      slot-31
  end

  subgraph Epoch 2
      slot-32
      slot-33
      slots-epoch2
      slot-63
  end

  slot-0 -.- slot-1 -.- slots-epoch1[...] -.- slot-31 -.- slot-32 -.- slot-33 -.- slots-epoch2[...] -.- slot-63


At the beginning of each new slot, the selected proposer broadcasts a block proposal, and a subset of 1/32 of all validators is responsible for verifying the block and voting for it (attestation). As a result, all active validators are expected to cast one vote during each epoch.
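The slot and epoch arithmetic described above can be sketched in a few lines. This is only an illustration: the helper names are ours, and epochs are counted from 0 here (as in the consensus specification), whereas the diagrams in this article label them starting from 1.

```python
SECONDS_PER_SLOT = 12   # one slot every 12 seconds
SLOTS_PER_EPOCH = 32    # 32 slots per epoch


def epoch_of(slot: int) -> int:
    """Epoch (0-based) that a given slot belongs to."""
    return slot // SLOTS_PER_EPOCH


def first_slot_of(epoch: int) -> int:
    """First slot of an epoch, i.e. the slot of its checkpoint."""
    return epoch * SLOTS_PER_EPOCH


def epoch_duration_seconds() -> int:
    """Wall-clock duration of one epoch."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH


print(epoch_of(31), epoch_of(32))   # slots 31 and 32 sit on each side of an epoch boundary
print(first_slot_of(2))             # 64
print(epoch_duration_seconds())     # 384 seconds, i.e. 6.4 minutes
```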

The Canonical chain

Proposers are expected to propose a block on top of what they consider to be the head of the chain, that is, what they think is the current latest valid block of the chain. This is done via the parent_root field of the block proposal payload. If the previous block is not received in time or is invalid, the proposer considers it missed and builds the next block on top of the block before it. This has strong implications because each validator can receive blocks at different times and, as a result, the tip of the chain can fork into different branches.

For example:

  • The validator proposing a block on slot N+2 may not have seen the block on slot N+1 so considers it missed, and it bases its block on top of block-0,
  • The validator proposing a block on slot N+3 however, saw the block at slot N+1, but not the one at N+2 so considers it missed, and it bases its block on top of block-1.
graph RL

  classDef missed stroke:#f00

	subgraph slot-0[slot N]
	   block-0
	end
	
	subgraph slot-1[slot N+1]
	   missed-block-1[missed block]:::missed
	   block-1
	end

	subgraph slot-2[slot N+2]
	   block-2
	   missed-block-2[missed block]:::missed
	end

	subgraph slot-3[slot N+3]
	   block-3
	end
	
block-1 -.-> block-0
block-2 -.- missed-block-1 -.-> block-0
block-3 -.- missed-block-2 -.-> block-1
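
The fork in the example above can be reproduced with a toy block tree. This is a minimal sketch: the `Block` class and `chain_tips` helper are hypothetical names used for illustration, the block roots are plain strings rather than real hashes, and the value of `N` is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    root: str                    # stand-in for the block root (a hash in reality)
    slot: int
    parent_root: Optional[str]   # None only for the first block of this toy tree


def chain_tips(blocks: List[Block]) -> List[str]:
    """Roots of blocks that no other block builds on (the tips of each branch)."""
    parents = {b.parent_root for b in blocks}
    return sorted(b.root for b in blocks if b.root not in parents)


N = 100  # arbitrary slot number
blocks = [
    Block("block-0", N, None),
    Block("block-1", N + 1, "block-0"),
    Block("block-2", N + 2, "block-0"),  # proposer did not see block-1
    Block("block-3", N + 3, "block-1"),  # proposer did not see block-2
]

print(chain_tips(blocks))  # ['block-2', 'block-3']: two competing branches
```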


In turn, attesters vote for the block they see and consider to be the head of the chain for the slot they are assigned to. This leads to a situation where the tree has branches with different weights depending on the validators backing them: the weight of a branch is the sum of the effective balances of the validators that voted for it. Forks can also happen for reasons other than latency: if there is a bug in a consensus node or an execution node, or an attack on the network, a proposed block may not be verifiable by some voters and is dismissed from their local view.

Flattening this tree structure into a linked list (the canonical chain) is the job of the LMD-Ghost algorithm (Latest-Message-Driven, a variant of the GHOST algorithm specific to Ethereum), which dynamically updates its view depending on the voting weights it sees and other criteria. The further the chain progresses, the more votes and weight a branch gathers, and the less likely it is to be re-organized. This provides a coherent view of the network and is what makes it possible for explorers like Etherscan to show a linear history of the chain, without any trees.

graph LR
  classDef green fill:#006400,color:#ffffff
  
  block_30[block 30]:::green --- block_31[block 31]:::green --- block_32[block 32]:::green --- block_33[block 33]:::green
  block_30 -.- block_31'[block 31' ] -.- block_32'[block 32']
  block_31 -.- block_32''[block 32'' ] -.- block_33''[block 33'']
  block_33'' -.- block_34''[block 34'']
  


Each validator in the network runs the LMD-Ghost based on its local view of the network, and depending on when each vote is received, what the validator considers to be the canonical chain can change if a branch gets more traction: this is a re-organization (a.k.a re-org).
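To make the idea concrete, here is a heavily simplified sketch of a latest-message-driven fork choice, using the block names from the diagram above. It is not the specification's algorithm: the function name, the string block identifiers, the equal balances and the tie-break on the block name are all assumptions made for illustration, and the real rule includes additional criteria that are omitted here.

```python
from collections import defaultdict
from typing import Dict, Optional


def lmd_ghost_head(parent_of: Dict[str, Optional[str]],
                   latest_votes: Dict[str, str],
                   balances: Dict[str, int],
                   root: str) -> str:
    """Walk down from `root`, always following the heaviest child."""
    children = defaultdict(list)
    for block, parent in parent_of.items():
        if parent is not None:
            children[parent].append(block)

    # A vote for a block also counts for all of its ancestors.
    weight = defaultdict(int)
    for validator, block in latest_votes.items():
        while block is not None:
            weight[block] += balances[validator]
            block = parent_of[block]

    head = root
    while children[head]:
        # Ties are broken on the block name here (an arbitrary choice).
        head = max(children[head], key=lambda b: (weight[b], b))
    return head


parent_of = {
    "30": None, "31": "30", "32": "31", "33": "32",
    "31'": "30", "32'": "31'",
    "32''": "31", "33''": "32''", "34''": "33''",
}
balances = {"v1": 32, "v2": 32, "v3": 32, "v4": 32}

votes = {"v1": "33", "v2": "33", "v3": "32'", "v4": "34''"}
print(lmd_ghost_head(parent_of, votes, balances, "30"))  # 33

# v2 and v3 later vote for the other branch: only their latest message counts.
votes.update({"v2": "34''", "v3": "34''"})
print(lmd_ghost_head(parent_of, votes, balances, "30"))  # 34'' (a re-org)
```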

This introduces a downside from the users’ perspective: a transaction can appear in a block that gets re-orged and is therefore no longer included in the canonical version of the chain. The advantage is that the network always provides a live view (no chain halt, there is always progress being made) with a probabilistic view of which version of the chain is most likely to finalize at any point. To provide stronger guarantees, the Ethereum consensus adds on top of LMD-Ghost additional logic that brings finalization: Casper the Friendly Finality Gadget, also known as FFG.

Justification and Finalization

Finalization is the process by which Ethereum guarantees that the history of the chain before a certain point, called a checkpoint, can’t be re-organized without burning at least 1/3 of the staked value. A checkpoint is the first slot of an epoch with its corresponding block root. Finalization happens on epoch boundaries for efficiency reasons: validators already cast one vote per epoch for the current head of the chain, and in the same payload they also vote for justification.

The justification vote is a link between two checkpoint slots:

  • the source checkpoint: the last justified checkpoint as seen by the validator
  • the target checkpoint: the canonical view of the validator

If the validator sees votes representing at least 2/3 of the stake linking its current justified checkpoint to its canonical target, it marks the target as the new justified checkpoint. In normal operation there is usually a one-epoch difference between the source and target epochs, but under high latency, for instance, the network can be slower and larger gaps can appear. When two consecutive epochs are justified and the earlier one descends from a finalized epoch, the earlier one is considered finalized.

graph LR
  classDef justified fill:#8B8000
  classDef finalized fill:#006400

	subgraph epoch 2
		subgraph slot_32
		  block_a["block A"]
		end
	end

	subgraph epoch 3
     subgraph slot_64
       block_b["block B"]
     end
	end

	subgraph epoch 4
    subgraph slot_96
		  block_c["block C"]
		end
	end

	subgraph epoch 5
		subgraph slot_128
		  block_d["block D"]
		end
	end

	slot_32:::justified -.-> slot_64
  slot_64 -.- slot_96
	slot_96 -.- slot_128
  

This is from the canonical view of a validator: epoch 2 is justified, and the validator receives votes linking epoch 2 to epoch 3 (source S=2, root=A and target T=3, root=B).

graph LR
  classDef justified fill:#8B8000
  classDef finalized fill:#006400

	subgraph epoch 2
		subgraph slot_32
		  block_a["block A"]
		end
	end

	subgraph epoch 3
     subgraph slot_64
       block_b["block B"]
     end
	end

	subgraph epoch 4
    subgraph slot_96
		  block_c["block C"]
		end
	end

	subgraph epoch 5
		subgraph slot_128
		  block_d["block D"]
		end
	end

	slot_32:::justified -.- 2/3 -.- slot_64:::justified
  slot_64 -.- slot_96
	slot_96 -.- slot_128
  

The validator received 2/3 of the votes with source S=2, root=A and target T=3, root=B: it marks epoch 3 (the target) as justified.

graph LR
  classDef justified fill:#8B8000
  classDef finalized fill:#006400

	subgraph epoch 2
		subgraph slot_32
		  block_a["block A"]
		end
	end

	subgraph epoch 3
     subgraph slot_64
       block_b["block B"]
     end
	end

	subgraph epoch 4
    subgraph slot_96
		  block_c["block C"]
		end
	end

	subgraph epoch 5
		subgraph slot_128
		  block_d["block D"]
		end
	end
	slot_32:::finalized --- slot_64:::justified
  slot_64 -.-> slot_96
	slot_96 -.- slot_128
  

There are now two justified epochs in a row, so the validator marks the first one (epoch 2) as finalized, while gathering FFG votes from source S=3, root=B to target T=4, root=C.

graph LR
  classDef justified fill:#8B8000
  classDef finalized fill:#006400

	subgraph epoch 2
		subgraph slot_32
		  block_a["block A"]
		end
	end

	subgraph epoch 3
     subgraph slot_64
       block_b["block B"]
     end
	end

	subgraph epoch 4
    subgraph slot_96
		  block_c["block C"]
		end
	end

	subgraph epoch 5
		subgraph slot_128
		  block_d["block D"]
		end
	end

	slot_32:::finalized --- slot_64:::justified
  slot_64 -.- 2/3 -.- slot_96:::justified
	slot_96 -.- slot_128

The validator received 2/3 of the votes for source S=3, root=B and target T=4, root=C: it marks epoch 4 (the target) as justified.

graph LR
  classDef justified fill:#8B8000
  classDef finalized fill:#006400

	subgraph epoch 2
		subgraph slot_32
		  block_a["block A"]
		end
	end

	subgraph epoch 3
     subgraph slot_64
       block_b["block B"]
     end
	end

	subgraph epoch 4
    subgraph slot_96
		  block_c["block C"]
		end
	end

	subgraph epoch 5
		subgraph slot_128
		  block_d["block D"]
		end
	end

	slot_32:::finalized --- slot_64:::justified
  slot_64:::finalized --- slot_96:::justified
	slot_96 -.-> slot_128
  

Epochs 3 and 4 are both justified, which makes two in a row, so the validator marks epoch 3 as finalized.
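The walkthrough above can be replayed with a small model of a validator's local FFG accounting. This is a sketch under simplifying assumptions, not the specification's logic: the `FFGView` class and its method names are hypothetical, only the simplest finalization case (source and target one epoch apart) is handled, and the check that the finalized checkpoint descends from the previous finalized one is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str


class FFGView:
    """Local justification/finalization accounting of a single node (toy model)."""

    def __init__(self, balances: Dict[str, int], justified: Checkpoint):
        self.balances = balances
        self.total = sum(balances.values())
        self.justified = justified
        self.finalized: Optional[Checkpoint] = None
        self.voters = {}  # (source, target) link -> set of validators seen voting for it

    def on_vote(self, validator: str, source: Checkpoint, target: Checkpoint) -> None:
        # Votes whose source is not our current justified checkpoint are ignored.
        if source != self.justified or target.epoch <= source.epoch:
            return
        voters = self.voters.setdefault((source, target), set())
        voters.add(validator)
        weight = sum(self.balances[v] for v in voters)
        if weight * 3 >= self.total * 2:           # at least 2/3 of the stake
            if target.epoch == source.epoch + 1:   # two justified epochs in a row
                self.finalized = source
            self.justified = target


A, B, C = Checkpoint(2, "A"), Checkpoint(3, "B"), Checkpoint(4, "C")
view = FFGView({"v1": 32, "v2": 32, "v3": 32}, justified=A)

view.on_vote("v1", A, B)
view.on_vote("v2", A, B)   # 2/3 reached: B is justified, A is finalized
print(view.justified, view.finalized)

view.on_vote("v1", B, C)
view.on_vote("v2", B, C)   # 2/3 reached: C is justified, B is finalized
print(view.justified, view.finalized)
```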

Casper Slashing Rules

Casper slashing rules enforce the justification and finalization process: they ensure that a validator which committed to finalizing a certain view of the network can’t change its mind without being heavily penalized. Because they limit the actions an operator can take during a major consensus or execution client issue, it is crucial to understand them. In part two of this series, we will reference these slashing rules when discussing the options for validators switching forks in the case of a faulty execution or consensus client.

There are two rules to follow which are expressed in the Casper the Friendly Finality Gadget paper at the heart of the Ethereum consensus:

Extract of the Casper the Friendly Finality Gadget paper around FFG slashing rules.

Note: there are other slashing rules in Ethereum (to prevent proposing two different blocks at the same height, or to prevent voting on multiple heads at the same height). We are only concerned with the finalization process here, as this is what matters in the context of diversity.

Rule 1: No Multiple Votes on the Same Target

This is the first rule of the FFG specification: a validator that vouched for a certain root as its target vote can’t vouch for another one at the same epoch. This can happen, for instance, if an operator switches its validator client to another beacon node that has a different view of the network (i.e. a different output of LMD-Ghost on the target slot). In such a case, the validator may cast a vote for the same slot but with a different target. This is usually prevented by local anti-slashing databases.

Here are a few examples of slashable votes that break the first rule:

graph LR
	subgraph slot_32
   	block_A["Block A"]
	end

	subgraph slot_64
   	block_B["Block B"]
	end

	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

  slot_32 -.- slot_64
  slot_64 --> slot_96
  slot_64 --> slot_96'
  slot_64 -.- slot_96
  slot_64 -.- slot_96'
 

  linkStyle 1,2 stroke-width:2px,fill:none,stroke:red;
  

Here the validator sent two attestations targeting different blocks at the same height.

graph LR
	subgraph slot_32
   	block_A["Block A"]
	end

	subgraph slot_64
   	block_B["Block B"]
	end

	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

  slot_32 -.- slot_64
  slot_64 --> slot_96
  slot_64 -.- slot_96
  slot_32 --> slot_96'
  slot_64 -.- slot_96'

  linkStyle 1,3 stroke-width:2px,fill:none,stroke:red;
  

In this case the two attestations are from different sources and they target different blocks but at the same height.

graph LR
	subgraph slot_32
   	block_A["Block A"]
	end

	subgraph slot_64
   	block_B["Block B"]
	end

	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

  slot_32 -.- slot_64
  slot_64 -.- slot_96
  slot_32 --> slot_96'
  slot_64 --> slot_96'
  slot_64 -.- slot_96'

  linkStyle 3,2 stroke-width:2px,fill:none,stroke:red;
  

Here the two attestations are from a different source but they target the same block.
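The three examples above can be checked with a small predicate. This sketch only looks at the FFG part of a vote (source and target), whereas the specification compares the whole attestation data; the `Vote` class and the `is_double_vote` name are ours, and the epoch numbers follow the labelling of the earlier diagrams (slot 32 is epoch 2, slot 64 is epoch 3, slot 96 is epoch 4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str


@dataclass(frozen=True)
class Vote:
    source: Checkpoint
    target: Checkpoint


def is_double_vote(v1: Vote, v2: Vote) -> bool:
    """Two different votes with targets at the same epoch break rule 1."""
    return v1 != v2 and v1.target.epoch == v2.target.epoch


A = Checkpoint(2, "A")
B = Checkpoint(3, "B")
C = Checkpoint(4, "C")
C_PRIME = Checkpoint(4, "C'")

print(is_double_vote(Vote(B, C), Vote(B, C_PRIME)))        # True: same source, different targets
print(is_double_vote(Vote(B, C), Vote(A, C_PRIME)))        # True: different sources and targets
print(is_double_vote(Vote(A, C_PRIME), Vote(B, C_PRIME)))  # True: different sources, same target
print(is_double_vote(Vote(B, C), Vote(B, C)))              # False: the same vote sent twice
```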

Rule 2: No Surrounding Votes

The second rule is about surrounding votes. As we saw before, an FFG vote can be seen as a link; if a validator publishes two links where one strictly surrounds the other, the validator is slashed. This can happen during execution or consensus client incidents where the network splits into multiple views, for example if you migrate your validator client to another type of beacon node or execution client that has a different interpretation of the network. Here too, a local anti-slashing database prevents this (it would likely prevent your validator from voting at all).

graph LR
	subgraph slot_32
   	block_A["Block A"]
	end

	subgraph slot_64
   	block_B["Block B"]
	end

	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

	subgraph slot_128'
   	block_D'["Block D'"]
	end

	subgraph slot_128''
   	block_D''["Block D''"]
	end

  slot_32 -.- slot_64
  slot_64 -.- slot_96
  slot_64 --> slot_96'
  slot_64 -.- slot_96'
  slot_96 -.- slot_128
  slot_96 -.- slot_128'
  slot_96' -.- slot_128''
  slot_32 --> slot_128''

  linkStyle 7,2 stroke-width:2px,fill:none,stroke:red;
  

Here the first attestation from slot 64 to slot 96’ is surrounded by the attestation from slot 32 to slot 128’’.
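The surround condition only depends on the source and target epochs of the two links, so it can be sketched as follows. The `Link` type and function names are hypothetical, and the epoch numbers follow the labelling of the earlier diagrams (slot 32 is epoch 2, slot 64 is epoch 3, slot 96 is epoch 4, slot 128 is epoch 5).

```python
from typing import NamedTuple


class Link(NamedTuple):
    source_epoch: int
    target_epoch: int


def surrounds(outer: Link, inner: Link) -> bool:
    """True if `outer` starts strictly before and ends strictly after `inner`."""
    return (outer.source_epoch < inner.source_epoch
            and inner.target_epoch < outer.target_epoch)


def is_surround_violation(a: Link, b: Link) -> bool:
    """Either link surrounding the other breaks rule 2."""
    return surrounds(a, b) or surrounds(b, a)


first = Link(source_epoch=3, target_epoch=4)    # slot 64 -> slot 96'
second = Link(source_epoch=2, target_epoch=5)   # slot 32 -> slot 128''

print(is_surround_violation(first, second))             # True
print(is_surround_violation(Link(3, 4), Link(3, 5)))    # False: same source, not strictly surrounded
print(is_surround_violation(Link(3, 4), Link(4, 5)))    # False: consecutive links
```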

Implicit Rule: Conflicting Checkpoints

In FFG, validators perform justification and finalization by accounting for attestations that match their view (i.e. their source and their target). This means that an alien vote from a different source to the same target as the validator won’t be accounted for. In other words, finality is achieved on a link (i.e. a correlated pair of source and target) with the target descending from the source. This prevents a validator from casting a vote whose source is on one branch and whose target is on another.

Even though such a vote wouldn’t be slashed, as it does not violate the surrounding rule, it will be dismissed by all nodes on the network. Such a vote is nonsensical: it would mean that part of the blockchain finalized on one branch, and that the next block, on another branch, does not descend from a known block. This has implications for what an operator can do when a finality issue arises: you can’t always simply upgrade your stack once a vote has been cast on a specific branch.

Here are a few examples that break the coherency rule while technically not being slashable offences:

graph LR
	subgraph slot_64
   	block_B["Block B"]
	end
	
	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

	subgraph slot_128''
   	block_D'["Block D'"]
	end

  slot_64 -.- slot_96
  slot_64 -.- slot_96'
  slot_96' -.- slot_128''
  slot_96 --> slot_128''

  linkStyle 3 stroke-width:2px,fill:none,stroke:orange;
  

Here block D’ does not descend from block C. Such a link doesn’t make sense at the chain level, and this vote will be dismissed by validators on the network as none of them will understand it.

graph LR
	subgraph slot_32
   	block_A["Block A"]
	end

	subgraph slot_64
   	block_B["Block B"]
	end

	subgraph slot_64'
   	block_B'["Block B'"]
	end

	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

	subgraph slot_128'
   	block_D'["Block D'"]
	end

	subgraph slot_128''
   	block_D''["Block D''"]
	end

  slot_32 -.- slot_64
  slot_32 -.- slot_64'
  slot_64 -.- slot_96
  slot_64 -.- slot_96'
  slot_96 -.- slot_128
  slot_96 -.- slot_128'
  slot_96' -.- slot_128''
  slot_96 --> slot_128''

  linkStyle 7 stroke-width:2px,fill:none,stroke:orange;
  

Similar situation here, block D’’ does not descend from block C and the vote will be dismissed by other validators.
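Whether a link is coherent boils down to an ancestry check between its target and its source. The sketch below models the first example above with a plain parent map; the `descends_from` helper and the string block names are assumptions made for illustration.

```python
from typing import Dict, Optional


def descends_from(parent_of: Dict[str, Optional[str]], block: str, ancestor: str) -> bool:
    """Walk the parent pointers from `block` and report whether `ancestor` is reached."""
    current: Optional[str] = block
    while current is not None:
        if current == ancestor:
            return True
        current = parent_of.get(current)
    return False


# Block tree of the first example: C and C' both build on B, D' builds on C'.
parent_of = {"B": None, "C": "B", "C'": "B", "D'": "C'"}

print(descends_from(parent_of, "D'", "C"))    # False: the link C -> D' is incoherent
print(descends_from(parent_of, "D'", "C'"))   # True: the link C' -> D' is coherent
```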


Valid Votes

Now that we've discussed the rules for slashing, let's turn to some of the possible valid votes that validators can cast around justification and finalization. The most common situation, when the blockchain runs smoothly, is that validator votes follow the chain, with each target vote becoming the source of the next vote:

graph LR
	subgraph slot_64
   	block_B["Block B"]
	end
	
	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

  slot_64 -.- slot_96
  slot_96 -.- slot_128
  slot_128 -.- slot_160
  
  slot_64 --> slot_96
  slot_96 --> slot_128
  slot_128 --> slot_160

  linkStyle 3,4,5 stroke-width:2px,fill:none,stroke:green;
  

When the network runs under degraded conditions, or when some validators are down for instance, it’s possible for validators to cast multiple votes with the same source, as long as the targets are not at the same height. In this example, not enough validators witnessed the votes from S=64,T=96 to justify it so they try to justify the next epoch on the next round:

graph LR
	subgraph slot_64
   	block_B["Block B"]
	end
	
	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

  slot_64 -.- slot_96
  slot_96 -.- slot_128
  
  slot_64 --> slot_96
  slot_64 --> slot_128

  linkStyle 2,3 stroke-width:2px,fill:none,stroke:green;
  


Validators can also be offline, as in this example where the voting validator missed a vote between slot 96 and slot 128 and got a small penalty. This is fine, as Ethereum is designed to tolerate downtime:

graph LR
	subgraph slot_64
   	block_B["Block B"]
	end
	
	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

  slot_64 -.- slot_96
  slot_96 -.- slot_128
  slot_128 -.- slot_160
  
  
  slot_64 --> slot_96
  slot_128 --> slot_160

  linkStyle 3,4 stroke-width:2px,fill:none,stroke:green;
  



In the following case, the validator voted for the wrong target with T=96’. This can happen, for instance, because of a software bug in the execution or the consensus client. The operator likely upgraded its software and jumped back on the expected branch at slot 128:

graph LR
	subgraph slot_64
   	block_B["Block B"]
	end
	
	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

  slot_64 -.- slot_96
  slot_64 -.- slot_96'
  slot_96 -.- slot_128
  slot_128 -.- slot_160
  
  
  slot_64 --> slot_96'
  slot_128 --> slot_160

  linkStyle 4,5 stroke-width:2px,fill:none,stroke:green;
  



Here the validator committed to S=96’, T=128’, likely due to a consensus bug. This is fine provided there were enough votes on the other branch to justify S=96, allowing the operator to upgrade and vote for S=128:

graph LR
	subgraph slot_64
   	block_B["Block B"]
	end
	
	subgraph slot_96
   	block_C["Block C"]
	end

	subgraph slot_96'
   	block_C'["Block C'"]
	end

	subgraph slot_128'
   	block_D'["Block D'"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

	subgraph slot_128
   	block_D["Block D"]
	end

	subgraph slot_160
   	block_E["Block E"]
	end

  slot_64 -.- slot_96
  slot_64 -.- slot_96'
  slot_96' -.- slot_128'
  slot_96 -.- slot_128
  slot_128 -.- slot_160
  
  
  slot_96' --> slot_128'
  slot_128 --> slot_160

  linkStyle 5,6 stroke-width:2px,fill:none,stroke:green;
  


In part II of this series we will reference these slashing rules when discussing the options for validators switching forks in the case of a buggy client.
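As an illustration of how a local anti-slashing database relates to the two rules and to the valid votes above, here is a toy in-memory version. It is a sketch under assumptions, not how any client implements it: the `SlashingProtection` class is hypothetical, votes are reduced to their epochs and target root, and the epoch numbers follow the labelling of the earlier diagrams (slot 64 is epoch 3, slot 96 is epoch 4, and so on).

```python
from typing import List, NamedTuple


class Vote(NamedTuple):
    source_epoch: int
    target_epoch: int
    target_root: str


def conflicts(a: Vote, b: Vote) -> bool:
    """True if signing both votes would break rule 1 or rule 2."""
    double = a != b and a.target_epoch == b.target_epoch
    a_surrounds_b = a.source_epoch < b.source_epoch and b.target_epoch < a.target_epoch
    b_surrounds_a = b.source_epoch < a.source_epoch and a.target_epoch < b.target_epoch
    return double or a_surrounds_b or b_surrounds_a


class SlashingProtection:
    """Remembers every signed vote and refuses to sign a conflicting one."""

    def __init__(self) -> None:
        self.history: List[Vote] = []

    def try_sign(self, vote: Vote) -> bool:
        if any(conflicts(old, vote) for old in self.history):
            return False
        self.history.append(vote)
        return True


db = SlashingProtection()
print(db.try_sign(Vote(3, 4, "C'")))   # True: vote on the wrong branch, still valid
print(db.try_sign(Vote(5, 6, "E")))    # True: back on the expected branch later on
print(db.try_sign(Vote(3, 4, "C")))    # False: second target at epoch 4 (rule 1)
print(db.try_sign(Vote(2, 7, "F")))    # False: surrounds the earlier votes (rule 2)
```

Keeping the full history makes the check easy to read; it also shows why such a database can leave a validator unable to vote at all once it has signed on a given branch.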

Inactivity Leak

The inactivity leak kicks in when 4 consecutive epochs aren’t finalized. This usually means at least 1/3 of the network is struggling to vote correctly (either because they are offline, or because they wrongly voted and the network is stuck as there isn’t a 2/3 majority to finalize). In this scenario, the Ethereum consensus enters into a special mode where:

  • nodes that do not attest during an epoch get a penalty that quadratically increases with time
  • nodes that do attest get 0 rewards
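The quadratic growth of the penalty can be sketched with a simplified model of a single non-attesting validator. The constants below are assumptions based on our reading of the consensus specification (an inactivity score growing by 4 per missed epoch and a penalty quotient of 2**24), and the model uses the actual balance instead of the effective balance and ignores regular attestation penalties, so the figures it prints are only indicative.

```python
INACTIVITY_SCORE_BIAS = 4              # assumed value, see the lead-in
INACTIVITY_PENALTY_QUOTIENT = 2 ** 24  # assumed value, see the lead-in
GWEI_PER_ETH = 10 ** 9
EPOCHS_PER_DAY = 24 * 3600 // (32 * 12)  # 225 epochs of 32 slots of 12 seconds


def leaked_gwei(epochs_offline: int, balance_gwei: int = 32 * GWEI_PER_ETH) -> int:
    """Total penalty accumulated by a validator that never attests during the leak."""
    score = 0
    leaked = 0
    for _ in range(epochs_offline):
        score += INACTIVITY_SCORE_BIAS  # the score grows linearly with time...
        penalty = balance_gwei * score // (INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT)
        penalty = min(penalty, balance_gwei)
        balance_gwei -= penalty
        leaked += penalty               # ...so the accumulated penalty grows quadratically
    return leaked


for days in (1, 2, 7):
    eth = leaked_gwei(days * EPOCHS_PER_DAY) / GWEI_PER_ETH
    print(f"{days} day(s) offline: about {eth:.3f} ETH leaked in this model")
```

Doubling the duration of the leak roughly quadruples the accumulated penalty while the balance is still close to its initial value.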

Modelling the penalty accumulation during the first 48 hours from the point of view of a single validator:

Penalties accumulated by non-attesting validators during the first 50 hours of an inactivity leak event.

Zooming out a bit, after a little more than 50 days the entire balance of the stake is burned. When their effective balance falls below 16 ETH, validators are automatically exited and have to go through the exit queue, which can take hours to weeks depending on the number of exits. Hopefully the network reaches finalization again long before getting to that point:

Penalties accumulated by non-attesting validators during the first 50 days of an inactivity leak event.

The intent of this slowly ramping-up failure mode is to leave enough time for operators and client developers to address the issue, if it can be fixed, before burning more and more ETH until the consensus can finalize again. As the misbehaving nodes (the ones not voting) are heavily penalized, the total staked value they represent will at some point fall below 1/3 of the total staked value, at which point the rest of the network holds the 2/3 majority needed to finalize again. As we’ll see in the next articles of this series, a bug affecting more than 1/3 of the network can have dramatic consequences.

Conclusion

In this article, we have described the process of finalization: how the Ethereum network chooses the canonical version of the chain and how it secures it via slashing rules. We have also introduced the inactivity leak, a state the network enters after 4 consecutive epochs without finalization. In this state, penalties are accumulated by validators until the network finalizes again.

With this knowledge, we are now equipped to delve into the complexities of execution and consensus layer diversity issues and their potential implications, which we will do in the next two articles.

Appendix

The Attestation Vote

The vote is a signature of the validator over a structure called the attestation data with the following payload, where beacon_block_root, source and target respectively represent the head vote (output of LMD-Ghost), the source vote and the target vote (Casper FFG votes):

class AttestationData(Container):
   slot: Slot
   index: CommitteeIndex
   # LMD GHOST vote
   beacon_block_root: Root
   # FFG vote
   source: Checkpoint
   target: Checkpoint
   
class Checkpoint(Container):
   epoch: Epoch
   root: Root

The Block Proposal

The block proposal is a beacon block where the parent_root points to the hash of the block on top of which it is built (its parent):

class BeaconBlock(Container):
   slot: Slot
   proposer_index: ValidatorIndex
   parent_root: Root
   state_root: Root
   body: BeaconBlockBody


Coherent Links

Even though the slashing rules do not explicitly require the target of a vote to descend from its source, this is implicitly enforced in the Ethereum specification, in the process_attestation function: only attestations whose source matches the validator’s current justified view are stored in current_epoch_attestations:


def process_attestation(state: BeaconState, attestation: Attestation) -> None:

[...]

   if data.target.epoch == get_current_epoch(state):
       assert data.source == state.current_justified_checkpoint
       state.current_epoch_attestations.append(pending_attestation)
   else:
       assert data.source == state.previous_justified_checkpoint
       state.previous_epoch_attestations.append(pending_attestation)


The current_epoch_attestations list is later used to check whether the votes weigh more than 2/3 of the total, leading to justification and finalization. This means that a vote from a source other than the node’s current justified view won’t be taken into account.

Thanks to Sébastien Rannou (mxs) for writing this post, as well as the Ethereum Foundation, Thorsten Behrens and Emmanuel Nalepa for their support.

Find the second post of the series here.
