Learnings from running Web3Signer at scale on Holesky

At Kiln, we have been running Web3Signer at scale on Holesky, the new Ethereum testnet launched in September 2023.

💡 Developed by ConsenSys, Web3Signer is a remote signing API for Ethereum. It lets validators sign blocks and attestations securely while minimizing the risk of slashing penalties.

While implementing Web3Signer at scale on Holesky, we encountered several challenges and gathered valuable insights, particularly with the distribution of validators across diverse geographic locations. This blog post delves into our experiences and findings from that setup.

Overview

This post describes the infrastructure setup for running validation at scale with Web3Signer, which we put in place at Kiln to run our ~100k Holesky validators.


One of the goals was to assess the feasibility of running validators from multiple geographical locations while preserving Web3Signer's guarantees. We initially adopted a straightforward approach; this post covers the pitfalls it exposed and the enhancements we eventually made.

We extend our heartfelt gratitude to the ConsenSys team for their unwavering support, in particular for providing custom Web3Signer flags that let us tweak its threading model.

Architecture

The overall architecture looks as follows:

Each validator client loads a subset of validation keys, establishes connections to a beacon/exec pair and gets signatures from a fleet of Web3Signer instances linked to an anti-slashing database. This configuration is quite common when using Web3Signer. 

However, what sets our setup apart from classic ones is the scale at which it runs, and the fact that validator, beacon, and execution nodes run in different geographical locations.
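For reference, a signing request from a validator client boils down to a single HTTP call to Web3Signer's Eth2 signing endpoint. Below is a minimal Java sketch of that call; the signer URL, public key, and the (abbreviated) payload are illustrative placeholders, not our production values.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SignWithWeb3Signer {
    public static void main(String[] args) throws Exception {
        // Placeholders: the signer URL and validator key depend on your deployment.
        String signerUrl = "http://web3signer.internal:9000";
        String validatorPubkey = "0x<validator-public-key>";

        // Abbreviated payload: a real request also carries fork_info and the full
        // attestation data, which Web3Signer needs for its anti-slashing checks.
        String body = "{\"type\": \"ATTESTATION\", \"signing_root\": \"0x<signing-root>\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(signerUrl + "/api/v1/eth2/sign/" + validatorPubkey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}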

Impact of geographical spread

TL;DR: You likely want to favor having a Web3Signer instance as close to the database as possible.

In the above architecture, the Web3Signer instances can be positioned:

  • close to the validator clients and far away from the database
  • close to the database and far away from the validator clients

We initially assumed both placements would yield similar signing latencies, since the overall end-to-end distance is the same, but that's not the case.

Whenever a validator client reaches out to Web3Signer, it sends a single HTTP request, so there is only one round trip. On the other hand, upon receiving the signing request, Web3Signer opens a transaction to the database: it first locks the validator row to check it, then issues multiple sub-queries from within the transaction handler. This means increasing the latency between Web3Signer and the database has roughly a 5x impact on the overall signing latency, because several sequential round trips happen on that leg:
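Here is a back-of-envelope sketch of the effect, in Java for illustration. The one-way latencies and the figure of 5 sequential database round trips per signature are assumptions chosen to match the ~5x impact described above, not measured values.

public class SigningLatencyEstimate {
    public static void main(String[] args) {
        int dbRoundTrips = 5; // assumed sequential queries inside the anti-slashing transaction

        // Placement A: Web3Signer next to the database, far from the validator client.
        double nearDb = 2 * 50.0 + dbRoundTrips * 2 * 1.0;   // ~110 ms
        // Placement B: Web3Signer next to the validator client, far from the database.
        double nearVc = 2 * 1.0  + dbRoundTrips * 2 * 50.0;  // ~502 ms

        System.out.printf("signer near DB: ~%.0f ms, signer near VC: ~%.0f ms%n", nearDb, nearVc);
    }
}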

Increased latency between the Web3Signer instances and the database leads to longer queueing delays and, eventually, timeouts if Web3Signer's threading model is not tuned (see next section).

Threading model of Web3Signer and latency

TL;DR: You may want to tweak the threading model of Web3Signer.

Our initial setup scheduled Web3Signer instances near the validator clients, which increased their latency to the database. At the beginning of each epoch, when the signing load is at its peak, we observed:

  • a significant number of missed attestations
  • validators seeing timeouts on signing requests
  • nominal resource usage by beacons, execution clients, and validators
  • little to no load on our database, as well as our Web3Signer instances
  • no saturation on the network side

These observations suggested that there was contention in the way Web3Signer processed incoming requests. After a thorough investigation, we concluded that we needed to tweak the worker pool size of Vertx, the Java framework Web3Signer uses to dispatch request processing. Vertx excels at handling asynchronous operations concurrently, spreading them across multiple Unix threads.

However, performing some categories of blocking operations from a handler can block the event loop. We suspect this is what happens around the SQL transactions used for anti-slashing. As we couldn't tweak the Vertx configuration, we coordinated with the ConsenSys team to build a version of Web3Signer allowing the -Xworker-thread-pool value to be tuned, which increases the number of Unix threads available to workers. Tuning this parameter has a noticeable impact on performance, especially when latency to the database is high.
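To illustrate what this parameter controls, here is how a worker pool is sized in a plain Vertx application. This is not Web3Signer's actual code; we simply assume the custom flag maps onto something equivalent to VertxOptions.setWorkerPoolSize.

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class WorkerPoolSizing {
    public static void main(String[] args) {
        // Vertx defaults to 20 worker threads. If each signing request holds a
        // worker for the whole blocking anti-slashing transaction, at most ~20
        // signatures are processed concurrently, regardless of CPU or database headroom.
        VertxOptions options = new VertxOptions()
                .setWorkerPoolSize(200)                    // analogous to a large -Xworker-thread-pool
                .setMaxWorkerExecuteTime(2_000_000_000L);  // warn after 2s of blocking work (nanoseconds)

        Vertx vertx = Vertx.vertx(options);
        System.out.println("started " + vertx + " with a 200-thread worker pool");
    }
}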

There are Prometheus metrics that are worth checking to get an insight into this:

  • http_vertx_worker_queue_delay: time spent by requests in queues before being processed
  • http_vertx_worker_pool_completed_total: number of queries processed by Web3Signer

With high latency to the database

The greater the distance between the database and the Web3Signer instances, the more significant this issue becomes, because the blocking SQL transaction takes longer and holds up other requests. We got to the point of observing improvements while running with as many as 200 Unix threads on a single CPU, without a noticeable increase in CPU load. This suggests there is room for improvement at the Web3Signer level:

Note: the delays in this case are not on the database side; they come from incoming requests waiting in Web3Signer's processing queue, with no contention at the underlying database level.

In this extreme example, with a latency on the order of 50 ms between Web3Signer and the database, requests quickly accumulate in queues awaiting processing, as Web3Signer can only handle a few at a time due to the blocking SQL transaction.

With low latency to the database

When there is low latency between the database and the Web3Signer instances, average performance is generally satisfactory, and tweaking the worker pool size only marginally improves it:

Note: with latencies < 0.02 ms, these values are multiple orders of magnitude lower than in the previous graph and depict a well-functioning Web3Signer.

However, zooming in on the 99th percentile, we still observe a clear improvement in queue waiting times when using a large value for -Xworker-thread-pool:

We could not (yet) correlate this with a clear improvement in our attestation rate, because there is currently too much noise on Holesky. However, we believe it has the potential to save ~50 ms at the 99th percentile on more stable networks, or to mitigate the impact of an increase in database latency.

Takeaways

The current state of Web3Signer's request processing reveals opportunities for improvement, as it currently delays and queues incoming requests even in situations where there is no resource contention in the pipeline. Yielding back to the Vertx scheduler around the blocking code inside the signing transaction might allow other concurrent requests to progress.
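To make that idea concrete, here is a generic Vertx sketch of the pattern, as we imagine it could look (this is not Web3Signer's code): the blocking database work is offloaded with executeBlocking so the event loop keeps accepting requests, and ordered=false lets independent requests progress concurrently.

import io.vertx.core.Vertx;

public class YieldAroundBlockingWork {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        vertx.createHttpServer()
             .requestHandler(req ->
                 // Offload the blocking anti-slashing transaction to the worker pool;
                 // ordered=false avoids serializing unrelated signing requests.
                 vertx.executeBlocking(promise -> {
                     // placeholder for the locking SELECT, sub-queries and INSERT
                     promise.complete("0x<signature>");
                 }, false).onComplete(ar ->
                     req.response().end(ar.succeeded() ? ar.result().toString() : "signing failed")))
             .listen(9000);
    }
}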

In the meantime, we think raising the -Xworker-thread-pool value above 20 could prove beneficial should an incident increase our latency to the database, helping Web3Signer cope with it.

Ingress load balancing

TL;DR: At scale, you likely want an ingress load-balancer.

We initially didn't use an ingress, relying instead on the default load-balancing mechanism in Kubernetes. With this approach, each validator client connects to a random Web3Signer instance and then keeps that HTTP connection for all of its signatures. This is problematic because there is no guarantee that the random selection at socket-opening time results in a balanced distribution; we observed a high level of disparity between the load processed by each Web3Signer instance:
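To illustrate why balancing at connection-opening time drifts out of balance, here is a tiny simulation with illustrative numbers (not our actual topology): each client picks one signer at random and then sticks to it.

import java.util.Arrays;
import java.util.Random;

public class ConnectionPinningSim {
    public static void main(String[] args) {
        int clients = 30, signers = 6;   // illustrative numbers only
        int[] load = new int[signers];
        Random rng = new Random();

        // Service-style balancing: each long-lived connection is assigned once,
        // at socket-opening time, and all subsequent signing requests stay
        // pinned to that instance.
        for (int c = 0; c < clients; c++) {
            load[rng.nextInt(signers)]++;
        }

        // With per-request balancing each signer would serve clients/signers = 5
        // clients worth of load; random pinning typically leaves some instances
        // well above the average while others sit nearly idle.
        System.out.println("clients pinned per signer: " + Arrays.toString(load));
    }
}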

As we saw before, the waiting queue can quickly become a bottleneck due to the threading model; hence, validators that all happen to connect to the same instance can experience latencies up to 5 times higher and potentially time out, while others are fine. Using an ingress, without any extra configuration, balances the load on a per-request basis:

On the plus side, we get additional metrics from the ingress that can quickly point out issues, such as the overall QPS (which spikes at every epoch):

As well as latency distribution graphs which can pinpoint underlying issues on the Web3Signer queues or at the database level:

Conclusion

Our journey running Web3Signer at scale on Holesky highlighted potential areas of optimization in its request processing. Fine-tuning parameters like -Xworker-thread-pool can provide better performance, especially when faced with unexpected latency issues.

Additionally, implementing an ingress load-balancer at scale ensures a more balanced and efficient distribution of requests. These insights reflect the importance of continuous assessment and adjustment when operating in a dynamic environment like blockchain technology.

Thanks to Sébastien Rannou for writing this article, as well as the Ethereum Foundation for their support.

Reach out to start staking with Kiln

About Kiln

Kiln is the leading enterprise-grade staking platform, enabling institutional customers to stake assets, and to whitelabel staking functionality into their offering. Kiln runs validators on all major PoS blockchains, with over $2.2 billion crypto assets being programmatically staked, and running over 3% of the Ethereum network on a multi-cloud, multi-region infrastructure. Kiln also provides a validator-agnostic suite of products for fully automated deployment of validators, and reporting and commission management, enabling custodians, wallets, and exchanges to streamline staking operations across providers. Kiln is also SOC2 Type 2 certified.
