The AMM test: a true look at L1 performance
https://medium.com/dragonfly-research/the-amm-test-a-no-bs-look-at-l1-performance-4c8c2129d581
Haseeb Qureshi, Mar 2022
Multichain is now a reality. Ethereum’s lack of scalability has caused a mass migration to a new generation of L1s. Most of these L1s use the EVM (the Ethereum Virtual Machine), which makes them compatible with Ethereum wallets and developer tools. But Solana has completely rebuilt its stack from the ground up. Solana claims to be the fastest blockchain in existence. So it raises the question: Just how much faster is Solana than the EVM chains?
First, we need to agree on how we measure performance. Since time immemorial, new blockchains have thrown around claims about how much more performant they are than Ethereum. It’s an old pastime. You’ll see lots of numbers bandied about and hastily assembled charts, comparing self-reported TPS (transactions per second). Unfortunately, these TPS numbers usually come from their own marketing materials, which are almost always BS.
Most benchmarks released by L1s themselves measure TPS of simple value transfers — i.e., transferring coins from one account to another. Simple transfers are extremely cheap and thus produce big numbers, and everyone loves big numbers. But no blockchain is actually bottlenecked on transfers like this, and this kind of activity doesn’t reflect real-world usage patterns. Furthermore, many of these numbers are generated on devnets or testnets rather than on mainnet. We don’t care about what someone’s software can do in the abstract: we care about what is possible on current mainnets.
In reality, there’s no single agreed-upon way to benchmark TPS. That’s often the case in benchmarking; it’s a messy and fraught field, full of misleading marketing, overfitting / “teaching to the test,” and cheating.
Okay, fine. So how should we actually measure L1 performance?
That’s a tricky question, because performance has multiple dimensions.
First, performance is always a tradeoff against decentralization. Testnets and devnets, which are highly centralized, can produce incredible numbers compared to what’s possible in mainnet environments. And many mainnets cut corners on decentralization, which squeezes out additional performance.
But let’s say we want to ignore decentralization and purely focus on performance. Well, benchmarking blockchain performance is notoriously hard because most new chains have very poor data visibility.
7 years in, Ethereum performance is highly studied and very well-understood. But as you start exploring newer chains, most of them have much less tooling, poor observability, and are constantly evolving. By the time you read this, these benchmarks will probably be out of date.
Furthermore, benchmarking is always arbitrary and riddled with pitfalls. The best you can do is pick a benchmark that measures something valuable, and then qualify your results as carefully as you can. That’s what we’ll be attempting to do here.
But what do we even mean by performance? There are two aspects to performance: throughput and latency.
You can visualize blockchain performance like water flowing through a pipe. The transactions are the water — you want lots of transactions flowing through the pipe at once. But the length of the pipe is what determines its latency — if it takes a long time for a transaction to get confirmed, even if lots of transactions can get confirmed at once, that’s not ideal.
Latency can be subdivided into block time (how long between blocks) and time to finality (how long until a block definitely won’t be reverted). Block time and time to finality are easy to measure.
But to actually measure throughput you need a standard unit of measure. Throughput of what?
Instead of token transfers, we looked at one of the top gas guzzlers on Ethereum: Uniswap V2, and turned it into a very simple benchmark. If you filled an entire block with Uniswap V2-style trades, how many trades per second would clear?
We chose this benchmark because 1) it’s simple and easy to measure, 2) every blockchain has a Uniswap V2-style AMM live in production, 3) it’s typical of common smart contract usage patterns.
For most chains that have a gas model, this back-of-the-envelope exercise should be straightforward. First, find the block gas limit and the block time to derive the gas/sec throughput of the chain; next, find a Uniswap v2-style AMM and pick a SwapETHforTokens equivalent transaction to get the gas cost of one swap; lastly, divide the first number by the second to arrive at how many tx/sec the chain would achieve if its blocks were stuffed full of identical AMM trades.
Note: this is not a perfect benchmark! It’s idiosyncratic, it doesn’t account for parallelizable transactions (since Uniswap trades on the same pool must be linearized), and it’s not representative of every usage pattern. But smart contract usage is always power-law distributed, and the most used Dapps tend to be AMMs, so within a suite of benchmarks, we believe this is illustrative in getting a holistic view of performance.
So without further ado, let’s go down the list.
Ethereum (Uniswap v2) trades per second: 9.19 average, 18.38 max (due to EIP-1559)
Block time average: 13.2s (PoW, so blocks are mined randomly in a Poisson process)
Time to finality: 66 seconds (approximate, ETH blocks aren’t truly final)
Assumptions and Methodology: at the 15M gas target, which is what Ethereum achieves at equilibrium with EIP-1559, Ethereum can do 9.19 trades per second; at the 30M gas limit it can achieve 18.38 trades per second (but fees would increase exponentially if it stayed here). We used this swapExactETHForTokens transaction as a representative on-chain 1-hop trade. Assuming block producers can perfectly stuff a 15M gas limit block with Uniswap trades that cost 123,658 gas each, that means we can get 15M/123,658 = ~121.3 swaps into a single block. If we assume blocks arrive every 13.2 seconds, that means Ethereum processes 121.3/13.2s = ~9.19 Uniswap v2 swaps per second.
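The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the text: the function name `swaps_per_second` is made up here, and the inputs are the figures quoted above (15M gas target, 30M gas limit, 123,658 gas per swap, 13.2s block time).

```python
def swaps_per_second(block_gas: float, gas_per_swap: float, block_time_s: float) -> float:
    """Swaps/sec if every block were stuffed with identical AMM swaps."""
    swaps_per_block = block_gas / gas_per_swap
    return swaps_per_block / block_time_s


# Figures quoted in the text for Ethereum.
GAS_PER_SWAP = 123_658   # one swapExactETHForTokens trade
BLOCK_TIME_S = 13.2      # average PoW block time

average = swaps_per_second(15_000_000, GAS_PER_SWAP, BLOCK_TIME_S)  # EIP-1559 gas target
maximum = swaps_per_second(30_000_000, GAS_PER_SWAP, BLOCK_TIME_S)  # hard gas limit

print(f"{average:.2f} average, {maximum:.2f} max")  # 9.19 average, 18.38 max
```

The same function applies to any chain with a gas model once you plug in its block gas, the gas cost of its representative swap, and its block time.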
We will be using a similar calculation for other EVM chains on the list.
(Note: we are ignoring rollups with this methodology since all smart contract L1s are capable of adding rollups.)
Celo (Ubeswap) trades per second: 24.93 average, 49.86 max (due to EIP-1559)
Block time average: 5s
Time to finality: 5s (Celo uses a PBFT-style protocol that immediately finalizes blocks)
Assumptions: this swap transaction is the representative trade, 10M gas target, and 20M gas limit.
Polygon (Quickswap) trades per second: 47.67 average, 95.33 max (due to EIP-1559)
Block time average: 2.5s
Time to finality: There are two notions of finality on Polygon:
1. Probabilistic: This is similar to most Ethereum-style blockchains, where the canonical chain is the one with the most work done (heaviest). In Polygon’s case, the finality of the Bor layer (which is the block producer layer) depends on the fork with higher difficulty.
2. Provable: This is similar to Tendermint/IBFT, where the super-majority signs on the canonical chain. This happens on the Heimdall layer (which is Polygon’s validators management and state-sync layer), through checkpointing. These checkpoints are submitted to Ethereum.
Reorgs and forks can happen on the Bor layer but not on Heimdall. Checkpoints are snapshots of the Bor chain state. Once a block is included in a submitted checkpoint, it cannot be reorg’d (unless >=⅓ of the validator set is dishonest). Checkpoints are submitted roughly every 25 minutes.
Assumptions: this swap transaction is the representative trade, 15M gas target, and 30M gas limit.
Avalanche C-Chain (Trader Joe) trades per second: 31.65 on average, but due to its elastic block time, at maximum throughput, the Avalanche C-Chain can process enough gas to hit 175.68 trades per second. However, sustaining throughput at that level would cause fees to rise exponentially.
Block time average: 2s (Avalanche is a leaderless protocol with an elastic block time: blocks can be produced at any time, provided enough minimum fees are paid. The Avalanche C-Chain has had periods where >10 blocks were produced within 1 second.)
Time to finality: ~1.75s after the block is produced
Assumptions: this swap transaction is the representative trade, current 8M gas limit.
Avalanche is relatively hard to compare because its block production mechanism is so different from Ethereum and the PoS chains. For Avalanche, there’s a large spread between what it can perform at maximum throughput and what it performs at average throughput. (Chains like Ethereum that have implemented EIP-1559 are capped at 2x their average throughput.)
Binance Smart Chain (PancakeSwap) trades per second: 194.60 (Binance Smart Chain does not use EIP-1559, so this is a flat number)
Block time average: 3s
Time to finality: 75s
Assumptions: this swap transaction is the representative trade, 80M gas limit.
This concludes the benchmarking of the EVM blockchains — the blockchains whose virtual machine is modeled on Ethereum’s. Since all EVM chains use the same gas model, we can look at gas/sec as a benchmark for throughput. The solid bars denote target throughput, and hollow bars represent the limit.
Figure: Gas/sec for EVM chains
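As a rough sketch, the gas/sec figures can be recomputed from the gas targets, gas limits, and average block times listed in the sections above. The dictionary layout and the name `gas_per_second` are illustrative choices made here. For chains where the text gives only a single gas limit (Avalanche C-Chain, Binance Smart Chain), the sketch assumes target and limit are the same number, and for Avalanche it uses only the 2s average block time, so it does not capture the elastic maximum.

```python
# (gas target, gas limit, average block time in seconds), as quoted in the text.
# Assumption for illustration: chains with only one stated limit use it for both.
CHAINS = {
    "Ethereum":            (15_000_000, 30_000_000, 13.2),
    "Celo":                (10_000_000, 20_000_000, 5.0),
    "Polygon":             (15_000_000, 30_000_000, 2.5),
    "Avalanche C-Chain":   (8_000_000,  8_000_000,  2.0),  # average only; block time is elastic
    "Binance Smart Chain": (80_000_000, 80_000_000, 3.0),
}


def gas_per_second(block_gas: float, block_time_s: float) -> float:
    """Gas throughput of a chain whose blocks are always full."""
    return block_gas / block_time_s


# Map each chain to (target gas/sec, limit gas/sec).
rates = {
    name: (gas_per_second(target, bt), gas_per_second(limit, bt))
    for name, (target, limit, bt) in CHAINS.items()
}

# Print chains from lowest to highest target throughput.
for name, (target_rate, limit_rate) in sorted(rates.items(), key=lambda kv: kv[1][0]):
    print(f"{name:<20} target {target_rate / 1e6:6.2f}M gas/s   limit {limit_rate / 1e6:6.2f}M gas/s")
```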
You can imagine that Binance Smart Chain is what happens when you run the EVM at its absolute limit. If you want to get higher performance out of smart contracts, you’ll have to move away from the EVM entirely.
Solana (Orca) trades per second: 273.34
Block time: 590 ms
Time to finality: 13s (Solana also emits much faster “optimistic confirmations” but these are only resistant to ~4.7% corruption. Most Dapps accept this threshold instead.)
Here’s how we calculated this number. This one’s a doozy.
We first wanted to find a “gas limit” equivalent for Solana. You can’t find any number like that on block explorers. We started by asking some Solana developers we knew, but nobody seemed to know definitively if there even was such a limit. So we rolled up our sleeves and went on a trip to find out for ourselves.
We first learned that Solana does have something like gas, called compute units (CU), which is defined here. From our conversations with validators, most seemed to think Solana validation was “racing against the clock to pack as many transactions as they can within the block time,” but the actual limitation is that each block can only contain 48M CUs.
Second, only a limited number of CUs can be spent writing to a single account in a single block. This limit exists to prevent too many transactions from writing to the same account, which would reduce a block’s parallelism. That is exactly what happens during mass congestion, such as during a popular IDO, when all transactions are competing to use a single contract.
The per-account limit is 12M. If you follow this 12M account CU limit, a 590ms block time on mainnet, and a cost of 74,408 CU per Orca swap, we arrive at a theoretical limit of 273.34 swaps/sec.
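Here is a minimal sketch of that calculation, using only the numbers quoted above (12M CU per account, 48M CU per block, 74,408 CU per Orca swap, 590ms block time). The constant and function names are made up for illustration. The last line is a hypothetical ceiling, not a measurement: it assumes swaps of the same cost are spread across enough independent pools that only the block-wide limit binds.

```python
PER_ACCOUNT_CU_LIMIT = 12_000_000  # max CUs writable to one account per block
BLOCK_CU_LIMIT = 48_000_000        # max CUs in one block
CU_PER_ORCA_SWAP = 74_408          # cost of the representative Orca swap
MAINNET_BLOCK_TIME_S = 0.590


def swaps_per_second(cu_budget: float, cu_per_swap: float, block_time_s: float) -> float:
    """Swaps/sec if the whole CU budget of every block went to identical swaps."""
    return cu_budget / cu_per_swap / block_time_s


# All swaps hit the same pool, so the per-account limit is the binding one.
single_pool = swaps_per_second(PER_ACCOUNT_CU_LIMIT, CU_PER_ORCA_SWAP, MAINNET_BLOCK_TIME_S)
print(f"single pool: {single_pool:.2f} swaps/sec")  # 273.34

# Hypothetical ceiling (assumption): same-cost swaps spread over enough
# independent pools that only the block-wide limit applies.
many_pools = swaps_per_second(BLOCK_CU_LIMIT, CU_PER_ORCA_SWAP, MAINNET_BLOCK_TIME_S)
print(f"block-wide ceiling: {many_pools:.2f} swaps/sec")
```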
This number seems lower than expected! For us to trust this number, we’d want to verify this approach empirically.
To confirm that we were measuring its performance correctly, we decided to put Solana directly to the test with a spam attack. We don’t want to spam mainnet for obvious reasons, so we targeted the Solana devnet. Note that Solana’s devnet runs on a smaller cluster and thus has a faster block time than mainnet (380ms vs mainnet’s 590ms), which will increase its performance compared to mainnet. Given a 380ms block time, we would expect the devnet to clear 424.40 swaps per second.
We spammed the Orca SOL-ORCA trading pair on the devnet to see how many Orca swaps we could land in a single block, and then extrapolated to the max throughput.
Figure: In devnet block 106784857, we managed to land 184 Orca swaps
The highest number we managed to hit was 184 swaps in a single block. Assuming a block time of 380 ms, this gives us 484.21 swaps/second on the devnet. (Note that block times are not exact, so there is some jitter in these numbers. If you average across the 3 blocks where we landed the most transactions, it looks more like 381 swaps/second, which seems more reasonable). This seems to confirm that our analytical approach was correct (~10–15% delta), which therefore implies Solana’s mainnet can likely perform about 273 swaps/second on an AMM.
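The comparison between the analytical prediction and the devnet measurement can be reproduced with a short sketch. All inputs are the figures quoted in the text (12M CU per account, 74,408 CU per swap, 380ms devnet block time, 184 swaps in the best block, roughly 381 swaps/second averaged over the top 3 blocks); the variable names are illustrative.

```python
PER_ACCOUNT_CU_LIMIT = 12_000_000
CU_PER_ORCA_SWAP = 74_408
DEVNET_BLOCK_TIME_S = 0.380

# Analytical prediction for the devnet.
predicted = PER_ACCOUNT_CU_LIMIT / CU_PER_ORCA_SWAP / DEVNET_BLOCK_TIME_S

# Best single block observed in the spam test.
observed_best = 184 / DEVNET_BLOCK_TIME_S

# Average over the 3 best blocks, as reported in the text.
observed_top3 = 381.0


def relative_delta(measured: float, expected: float) -> float:
    """Signed difference between measurement and prediction, as a fraction."""
    return (measured - expected) / expected


print(f"predicted:   {predicted:.2f} swaps/sec")       # 424.40
print(f"best block:  {observed_best:.2f} swaps/sec")   # 484.21
print(f"delta (best block): {relative_delta(observed_best, predicted):+.1%}")
print(f"delta (top 3 avg):  {relative_delta(observed_top3, predicted):+.1%}")
```

Both deltas land in the 10–15% range in absolute terms, one above the prediction and one below it, which is consistent with jitter in block times.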
Granted this is only one test run, so here’s our code — we encourage you to play with it and share with us your results.
We’re glossing over a ton of details here, and none of this would’ve been possible without the help of our friends at Blockdaemon. If you want to know the juicy details of what it took to perform this (and a deeper dive into Solana internals with some MEV alpha leaks), check out part 2 where we get into the gory technical details.
You might look at all this and wonder: but I thought Solana is routinely doing 3000 TPS?
The way block explorers measure Solana’s TPS can be misleading: it counts internal consensus messages as transactions, which no other blockchain does. Roughly 80% of Solana’s throughput is consensus messages. Subtracting those, you’re left with ~600 TPS, most of which are Serum trades, which are very cheap. So long as enough other contracts are being touched, Solana can also achieve higher performance in production.
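As a quick sanity check on that adjustment, here is the subtraction spelled out. The two inputs are the rough figures from the text (about 3000 reported TPS, roughly 80% consensus messages); the variable names are illustrative.

```python
REPORTED_TPS = 3000      # headline figure shown by block explorers
CONSENSUS_SHARE = 0.80   # rough share of throughput that is consensus messages

# Remove the consensus messages to estimate user-submitted transactions.
user_tps = REPORTED_TPS * (1 - CONSENSUS_SHARE)
print(f"~{user_tps:.0f} non-consensus TPS")  # ~600
```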
Figure: The AMM test: Uniswap v2 style swaps/sec performance
So what’s the upshot of all this?
First, don’t take this as gospel. Do the math yourself.
Second, remember that all these blockchains are moving targets. They’re continually being optimized, and the technology is evolving rapidly, while any benchmark is a moment-in-time snapshot. We’d love to see more independent organizations creating standardized benchmarks, but this is our best attempt.
Third, notice that the spread in performance between these blockchains is not as big as advertised. The performance difference between Ethereum and the very best chain is about 10–25x, not 100x or 1000x. Nobody is getting that great performance out of linearized VM transactions; that will require a lot more work and optimization.
Fourth, if you want really high performance now you have to look outside the EVM space. We only benchmarked Solana here, but there are other non-EVM L1s like NEAR and Terra that also achieve higher performance. But like Solana, they don’t get to benefit from the tooling and ecosystem around the EVM. (Though NEAR has the Aurora shard, which is EVM-compatible, and other L1s are trying to develop similar virtualized EVM instances.)
Fifth, users aren’t that sensitive to performance considerations on non-Ethereum L1s right now. They care a lot more about the overall strength of an ecosystem, good UX, and low fees. These blockchains are not currently competing on performance because none of them are actually being used to capacity except during rare spikes, such as during IDOs or market meltdowns.
We expect that all of the major L1s will improve in their performance over time, as the dev teams spend more and more time tuning the performance across typical usage patterns. It should be no surprise that in their early days, each of these blockchains is poorly optimized!
But overall I come away with this impression: Ethereum is the MS-DOS of smart contract operating systems, and the current generation of blockchains takes us into the Windows 95 era.
Next-generation blockchains represent a marked improvement, but there’s much further to go from here to get to mainstream adoption.