Over the past few months, we have had trouble supporting our ever-growing data store. The main problem is database performance, especially when answering queries that involve large data sets. Code tuning is the answer to a majority of these issues. However, the underlying infrastructure always imposes a performance bottleneck of its own. In this blog post, I will explain the experiments I ran with different Amazon storage services to find out which is best suited to host a database similar to ours.
Our current database resides on standard EBS drives with Amazon, and there has been a lot of talk about their poor performance because Amazon gives no performance guarantee. Basically, EBS uses shared hardware accessed over the network, which makes the performance of the drives hard to predict. The recommended solution is to switch to provisioned IOPS (PIOPS) drives, where Amazon guarantees a constant throughput in terms of IOPS (input/output operations per second). PIOPS drives are more expensive than standard EBS, but is the read/write performance significantly better?
To answer this question, I ran several tests on various configurations of EBS and PIOPS drives, and some of the results were surprising.
For basic benchmarking, I used the following hardware setup:
- Benchmark server with a standard EBS drive (a standard EBS volume of 50 GB)
- Benchmark server with a provisioned IOPS drive (with exactly the same CPU configuration and drive size as the EBS benchmark server)
To compare the performance of one configuration against another, we used a set of tests comprising:
- Standard drive bandwidth tests using FIO
- Standard benchmark tests using pgbench (a benchmark tool for testing a postgres database)
- Runs of some of our slowest queries on production data
- Inserts into some of our largest tables in production
Face-off between provisioned IOPS and EBS:
Our initial hunch was that the performance from provisioned IOPS would be significantly better than the standard EBS volume. However, some of the tests showed quite contrasting results.
Starting with the easiest setup, I installed a Postgres database on each of the benchmark servers and ran short, standard benchmark tests using pgbench. pgbench is extremely easy to run and requires hardly any setup. The test runs small transactions against each database many times and then reports aggregate results such as transactions per second.
The configuration of pgbench was set to the following:
./pgbench -c 4 -j 2 -T 600 -U postgres
|c:||Number of clients simulated, that is, number of concurrent database sessions|
|j:||Number of worker threads within pgbench.|
|T:||Duration of the test in seconds (600 here). A fixed number of transactions per client would be set with -t instead.|
Note: These settings are not strict and can be tuned to customize each test.
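As a rough illustration of how such a run can be scripted, the Python sketch below builds the command line shown above and pulls the `tps = ...` figures out of pgbench's summary text. The helper names and the sample output are assumptions made for this example. The numbers in the sample are invented and are not measurements from these tests.

```python
import re
import shlex

def pgbench_command(clients=4, jobs=2, seconds=600, user="postgres"):
    """Build the argument list for the pgbench run described above."""
    return ["./pgbench", "-c", str(clients), "-j", str(jobs),
            "-T", str(seconds), "-U", user]

def parse_tps(output):
    """Return every 'tps = <number>' value found in pgbench's summary."""
    return [float(m) for m in re.findall(r"^tps = ([0-9.]+)", output, re.MULTILINE)]

# Illustrative output only, not a measurement from this post.
sample = """\
number of clients: 4
number of threads: 2
duration: 600 s
tps = 123.456789 (including connections establishing)
tps = 123.500000 (excluding connections establishing)
"""

print(shlex.join(pgbench_command()))
print(parse_tps(sample))
```

In a real run, the list returned by `pgbench_command()` could be handed to `subprocess.run(..., capture_output=True, text=True)` and its standard output passed to `parse_tps`.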
The initial test results were rather unexpected: standard EBS delivered far more transactions per second than the PIOPS drive.
|Average Transactions per second (PIOPS)||250|
|Average Transactions per second (EBS)||900|
Perplexed by the pgbench results, I geared the next test towards the basic bandwidth available on the drives, which is the main distinguishing factor between standard EBS and PIOPS. To measure average bandwidth, I used a tool called FIO, which spawns multiple threads that perform basic read and write operations on disk. The results from FIO were more interesting. The image below shows them in more detail: it plots the average bandwidth across 20 different FIO tests on EBS and PIOPS volumes.
The provisioned IOPS drive provides steady read and write bandwidth, and the two are more or less the same. Standard EBS, on the other hand, provides poor and fluctuating read bandwidth but relatively higher write bandwidth. Similar findings have also been posted in Amazon's forum.
The next effort was to measure the performance of 4 provisioned IOPS drives in a RAID 0 configuration. Generally, striping data across drives in this way should improve read/write performance. The next image shows the results of running FIO random read and write tests on each of these drives over 10 iterations.
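The exact FIO settings are not listed here, so the following Python sketch only shows one plausible way to drive such a test and average the reported bandwidth over repeated runs. The block size, file size, job count, runtime and target path are all assumed values, and the JSON snippets are made up and trimmed to the single field the code reads.

```python
import json
import statistics

def fio_command(target, mode, seconds=60):
    """Build a fio invocation; every setting here is an assumed value."""
    return ["fio", "--name=bench", f"--filename={target}", f"--rw={mode}",
            "--bs=4k", "--size=1g", "--numjobs=4", "--direct=1",
            "--ioengine=libaio", "--time_based", f"--runtime={seconds}",
            "--group_reporting", "--output-format=json"]

def bandwidth_kib_s(fio_json, direction):
    """Bandwidth in KiB/s for 'read' or 'write'.

    With --group_reporting the jobs are aggregated into one entry,
    so the first element of "jobs" carries the combined figure.
    """
    job = json.loads(fio_json)["jobs"][0]
    return job[direction]["bw"]

def summarize(outputs, direction):
    """Mean and population standard deviation across several runs."""
    values = [bandwidth_kib_s(text, direction) for text in outputs]
    return statistics.mean(values), statistics.pstdev(values)

# Made-up outputs trimmed to the one field used here; not data from this post.
fake_runs = [json.dumps({"jobs": [{"read": {"bw": bw}}]})
             for bw in (1000, 1200, 1100)]

# The target path is hypothetical.
print(" ".join(fio_command("/mnt/bench/testfile", "randread")))
print(summarize(fake_runs, "read"))
```

Passing `randwrite` as the mode and reading the `write` section would cover the write side in the same way.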
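To see why RAID 0 is expected to help, it is useful to look at how striping spreads consecutive chunks of data across the member drives. The Python sketch below is a simplified model of that mapping, not the actual software RAID implementation. The chunk size of 256 KiB is an assumption, and the 4 drives match the setup described above.

```python
def raid0_locate(offset, disks=4, chunk=256 * 1024):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk_no, within = divmod(offset, chunk)   # which chunk, and where inside it
    stripe, disk = divmod(chunk_no, disks)     # chunks rotate across the disks
    return disk, stripe * chunk + within

def disks_touched(start, length, disks=4, chunk=256 * 1024):
    """Set of disks that a contiguous request of `length` bytes lands on."""
    first = start // chunk
    last = (start + length - 1) // chunk
    return {c % disks for c in range(first, last + 1)}

# A 1 MiB sequential read starting at offset 0 is served by all four disks.
print(disks_touched(0, 1024 * 1024))
# A single 4 KiB random read lands on exactly one disk.
print(disks_touched(300 * 1024, 4 * 1024))
```

Large requests are split across all drives and can proceed in parallel, while small random requests are spread over the drives, so several of them can be in flight at once.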
As evident from the results above, RAID-ing improves performance of the provisioned IOPS drive significantly. However, we still see the superior write performance of EBS.
The question to ask at this point is how these characteristics translate to our real-life database performance. In a data store such as ours, the major performance bottlenecks are caused by large read requests. For example, heavy reports covering weeks of path data across hundreds of paths generate enough traffic to test the read bandwidth of these drives. In this scenario it is sensible to switch to provisioned IOPS with RAID. One more reason for switching to provisioned IOPS is steady throughput: for tests that ran overnight, the deviation from the average was extremely low, as evident from the charts above. On the other hand, it is hard to expect strict quality of service from databases based on standard EBS drives.
However, if your application requires the best write performance, standard EBS is the way to go based on these tests. Even on RAID 0, the provisioned IOPS drives could not match the write performance of standard EBS.
Real application tests are also easy to set up. We took slave backups of the production database and used pgreplay to replay a whole day's SQL queries on the test database.
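One simple way to put a number on the steady-throughput argument is the coefficient of variation, which is the standard deviation divided by the mean. The Python sketch below compares two invented series of bandwidth samples. The values are purely illustrative and are not taken from the charts.

```python
import statistics

def coefficient_of_variation(samples):
    """Sample standard deviation relative to the mean (0.0 means perfectly steady)."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Invented samples in MB/s, for illustration only.
steady = [40, 41, 39, 40, 40]
bursty = [10, 70, 25, 60, 35]

print(round(coefficient_of_variation(steady), 3))
print(round(coefficient_of_variation(bursty), 3))
```

A lower value means the drive stays closer to its average, which matters when a database needs predictable response times.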