---
title: "Benchmarking Crunchy Data for latency"
description: "Benchmarking Crunchy Data for latency"
canonical_url: "https://www.bigbinary.com/blog/crunchy-bridge-vs-digital-ocean"
markdown_url: "https://www.bigbinary.com/blog/crunchy-bridge-vs-digital-ocean.md"
---

# Benchmarking Crunchy Data for latency

Benchmarking Crunchy Data for latency

- Author: Vishnu M
- Published: October 15, 2024
- Categories: Misc

In Rails World 2024, DHH unveiled [Kamal 2](https://kamal-deploy.org/) in his
[opening keynote](https://www.youtube.com/watch?v=-cEn_83zRFw). Now, folks want
to give Kamal a try, but some people are worried about the data. They want to
take one step at a time, and they feel more comfortable if their database is
managed by someone else.

That's where [Crunchy Data](https://www.crunchydata.com/) comes in. They provide
managed Postgres service. Checkout this
[tweet](https://x.com/dhh/status/1840901376182009900) from DHH about Crunchy
Data.

In our internal discussion, one of the BigBinary engineers brought up the issue
of "latency". Since the PostgreSQL server will not be in the same data center,
what would be the latency? How much impact will it have on performance?

We didn't know the answer so we thought we would do some benchmarking.

## Benchmarking

To do the comparison we needed another hosting provider where we can run
PostgreSQL on the same data center. We chose to work with Digital Ocean.

To compare the two services, we wrote a benchmark script in Ruby similar to the
one [Ben Johnson](https://x.com/benbjohnson) wrote in
[Go](https://github.com/benbjohnson/production-sqlite-go/blob/main/postgres-tests/postgres_test.go)
for his [GopherCon talk](https://youtu.be/XcAYkriuQ1o?si=vz-sYjevztb_bnwL)
in 2021.

In this benchmark, we're using Ruby's Benchmark module to measure the
performance of a series of database operations. Here's what the code does:

1. It establishes a connection to the database only once, at the beginning of
   the script. This is done outside the benchmarked operations because
   establishing connections can be a slow operation because of TLS negotiation,
   and we don't want to account for that time in our measurements.

2. It then performs the following operations 10,000 times, measuring each one:

   - Drops a table named 't' if it exists.
   - Creates a new table 't' with two columns: 'id' (an auto-incrementing
     primary key) and 'name' (a text field).
   - Inserts a single row into the table with the name 'jane'.
   - Selects the 'name' from the table where the 'id' is 1 (which should be
     'jane').

3. After all 10,000 iterations, it calculates and prints the average time for
   each operation in microseconds.

```ruby
require "pg"
require "benchmark"

class PostgresBenchmark
  def initialize(connection_string)
    @conn = PG.connect(connection_string)
  end

  def run(iterations = 10_000)
    total_times = Hash.new { |h, k| h[k] = 0 }

    iterations.times do |i|
      puts "Running iteration #{i + 1}" if (i + 1) % 1000 == 0
      times = benchmark_operations
      times.each { |key, time| total_times[key] += time }
    end

    average_times = total_times.transform_values { |time| time / iterations }
    print_results(average_times, iterations)
  ensure
    @conn.close if @conn
  end

  private

  def benchmark_operations
    times = {}

    times[:drop] = Benchmark.measure { @conn.exec("DROP TABLE IF EXISTS t") }.real
    times[:create] = Benchmark.measure { @conn.exec("CREATE TABLE t (id SERIAL PRIMARY KEY, name TEXT)") }.real
    times[:insert] = Benchmark.measure { @conn.exec("INSERT INTO t (name) VALUES ('jane')") }.real
    times[:select] = Benchmark.measure do
      result = @conn.exec("SELECT name FROM t WHERE id = 1")
      raise "Unexpected result" unless result[0]["name"] == "jane"
    end.real

    times
  end

  def print_results(times, iterations)
    total_time = times.values.sum

    puts "\nAVERAGE ELAPSED TIME (over #{iterations} iterations)"
    puts "drop    #{(times[:drop] * 1_000_000).round(2)} microseconds"
    puts "create  #{(times[:create] * 1_000_000).round(2)} microseconds"
    puts "insert  #{(times[:insert] * 1_000_000).round(2)} microseconds"
    puts "select  #{(times[:select] * 1_000_000).round(2)} microseconds"
    puts "TOTAL   #{(total_time * 1_000_000).round(2)} microseconds"
  end
end

if __FILE__ == $0
  connection_string = "<DB_CONNECTION_STRING>"
  benchmark = PostgresBenchmark.new(connection_string)
  benchmark.run(10_000)
end
```

## Database Specifications and Setup

### Digital Ocean

- Region: NYC3 data center
- Specs: 2 vCPU, 4GB Memory
- Price: $60 per month

### Crunchy Data

- Provider: AWS
- Region: us-east-1
- Specs: 2 vCPU, 4GB Memory (hobby-4 plan)
- Price: $70 per month

We provisioned a Digital Ocean droplet in the NYC3 data center and invoked the
benchmark script from the machine. The Digital Ocean database was also in the
same NYC3 data center. For Crunchy Data, the availability zone selected was
`us-east-1` as it was the closest to NYC3.

## Benchmarking results

### Digital Ocean

```js
AVERAGE ELAPSED TIME (over 10000 iterations)
drop    3448.07 microseconds
create  5048.39 microseconds
insert  891.81 microseconds
select  584.17 microseconds
TOTAL   9972.44 microseconds
```

### Crunchy Data

```js
AVERAGE ELAPSED TIME (over 10000 iterations)
drop    10097.89 microseconds
create  16818.63 microseconds
insert  8416.35 microseconds
select  7211.42 microseconds
TOTAL   42544.29 microseconds
```

## Benchmarking analysis

The results of this benchmark do not come as a surprise. Digital Ocean is
performing significantly better than Crunchy Data. This performance difference
can be primarily attributed to network latency.

_Network latency_ refers to the round-trip time (RTT) it takes for the data to
travel from its source to its destination and back again across a network. In
the context of database operations, it's the time taken for a query to be sent
from the client to the database server and for the response to return to the
client.

In our benchmarking, the Digital Ocean database and the client machine invoking
the script were both located in the same data center (NYC3), resulting in
minimal network latency. On the other hand, the Crunchy Data database was hosted
in AWS `us-east-1`, and it had to communicate across a greater physical
distance, adding to latency.

To get a more accurate value for the network latency, we can compare the average
time taken to run the `SELECT` operation. The `SELECT` operation in the script
is a point query. _A point query refers to type of query that retrieves one or
several rows based on a unique key_.

In our script, it retrieves a single `name` value from the table `t` where the
`id` is `1`(which is the primary key), and is very fast to execute. Thus, the
time taken to execute the `SELECT` operation can give us an approximate value
for the network latency.

```
db_time = network_latency + query_execution_time
```

For point queries, the `query_execution_time` is almost zero so all the time
taken is pretty much "network latency".

```
db_time ≈ network_latency
```

If we look at the benchmarking result, then we can see that for "select"
operation time taken by Digital Ocean is "584 microseconds" and for Crunchy Data
it is "7211 microseconds".

The difference in network latency is _6627 microseconds_. That is 6.6
milliseconds.

This value over multiple queries can add up and can have a significant impact on
the overall response time of your application. To put this into perspective, for
a web application that makes 10 sequential database queries to render a page,
this could add up to about _66 milliseconds_ to the page load time. Now, this
could be an acceptable limit if your page loads in 3/4 seconds. However, if you
are trying to load your page in 200 millisecond,s then you need to watch out.

Ensuring that the database is always up and it's properly backed up is a
non-trivial problem. Latency notwithstanding, Crunchy Data takes care of running
the database. This gives us peace of mind and allows us to exit the cloud one
step at a time.

## Links

- [Human page](https://www.bigbinary.com/blog/crunchy-bridge-vs-digital-ocean)
