---
title: "Setting up a high performance Geocoder"
description:
  "A setup guide with side-by-side comparison of Nominatim with Google geocoder
  and how to improve geocoding efficiency"
canonical_url: "https://www.bigbinary.com/blog/setting-up-a-high-performance-geocoder"
markdown_url: "https://www.bigbinary.com/blog/setting-up-a-high-performance-geocoder.md"
---

# Setting up a high performance Geocoder

A setup guide with side-by-side comparison of Nominatim with Google geocoder and
how to improve geocoding efficiency

- Author: Midhun Krishna
- Published: August 21, 2018
- Categories: Ruby, Rails

One of our applications uses geocoding extensively. When we started the project,
we included the excellent
[Geocoder gem](https://github.com/alexreisner/geocoder), and set
[Google](https://developers.google.com/maps/documentation/geocoding/start) as
the geocoding backend. As the application scaled, its geocoding requirements
grew and soon we were looking at geocoding bills worth thousands of dollars.

### An alternative Geocoder

Our search for an alternative geocoder landed us on Nominatim. Written in C,
with a PHP web interface, Nominatim was performant enough for our requirements.
Once set up, Nominatim required 8GB of RAM to run, and this included the RAM for
PostgreSQL (+ PostGIS) as well.

The rest of this post discusses how to set up Nominatim, the tips and tricks we
learned along the way, and how Nominatim compares with the geocoding solution
offered by Google.

### Setting up Nominatim

We started off by looking for Amazon Machine Images with Nominatim already set
up. We could find only one, hosted by
[OpenStreetMap](https://www.openstreetmap.org/), but its magnet link was dead.

Next, we went through the
[official installation document](http://nominatim.org/release-docs/latest/admin/Installation/).
We decided to give Docker a shot and found that there are many Nominatim Docker
builds. We used
[https://github.com/merlinnot/nominatim-docker](https://github.com/merlinnot/nominatim-docker)
since it seemed to follow all the steps mentioned in the official installation
guide.

### Issues faced during Setup

#### Out of Memory Errors

The official documentation recommends using 32GB of RAM for initial import but
we needed to double the memory to 64GB to make it work.

Also, each run generates a large amount of data and Docker caches layers across
builds, so whenever a Docker build failed we ran out of disk space on subsequent
builds.

#### Merging Multiple Regions

We wanted to geocode locations from USA, Mexico, Canada and Sri Lanka. USA,
Mexico and Canada are
[included by default in North America data extract](http://download.geofabrik.de/north-america.html#subregions)
but we had to merge data for Sri Lanka with North America to get it in a format
required for initial import.

The following Dockerfile snippet pre-processes map data for North America and
Sri Lanka into a single `data.osm.pbf` file that can be used directly by the
Nominatim installer.

```dockerfile
RUN curl -L 'http://download.geofabrik.de/north-america-latest.osm.pbf' \
    --create-dirs -o /srv/nominatim/src/north-america-latest.osm.pbf
RUN curl -L 'http://download.geofabrik.de/asia/sri-lanka-latest.osm.pbf' \
    --create-dirs -o /srv/nominatim/src/sri-lanka-latest.osm.pbf

RUN osmconvert /srv/nominatim/src/north-america-latest.osm.pbf \
    -o=/srv/nominatim/src/north-america-latest.o5m
RUN osmconvert /srv/nominatim/src/sri-lanka-latest.osm.pbf \
    -o=/srv/nominatim/src/sri-lanka-latest.o5m

RUN osmconvert /srv/nominatim/src/north-america-latest.o5m \
    /srv/nominatim/src/sri-lanka-latest.o5m \
    -o=/srv/nominatim/src/data.o5m

RUN osmconvert /srv/nominatim/src/data.o5m \
    -o=/srv/nominatim/src/data.osm.pbf
```

#### Slow Search times

Once the installation was done, we tried running simple location
[searches like this one](https://nominatim.openstreetmap.org/search.php?q=New+York&polygon_geojson=1&viewbox=),
but the search timed out. Nominatim can usually provide a lot of diagnostic
information through its web interface when `&debug=true` is appended to the
search query.

```bash
# from
https://nominatim.openstreetmap.org/search.php?q=New+York&polygon_geojson=1&viewbox=
# to
https://nominatim.openstreetmap.org/search.php?q=New+York&polygon_geojson=1&viewbox=&debug=true
```
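When checking many queries this way, it can be handy to build the debug URL programmatically. The following is a minimal sketch, not part of our original setup: the helper name `nominatim_search_url` is hypothetical, and the base URL is the public instance from the example above, which you would replace with your own host.

```ruby
require 'uri'

# Public instance used in the example above; swap in your own Nominatim host.
NOMINATIM_BASE = 'https://nominatim.openstreetmap.org/search.php'

# Hypothetical helper: builds a search URL and appends debug=true on request.
def nominatim_search_url(query, debug: false, base: NOMINATIM_BASE)
  params = { q: query }
  params[:debug] = 'true' if debug

  uri = URI(base)
  uri.query = URI.encode_www_form(params) # encodes spaces as "+"
  uri.to_s
end

puts nominatim_search_url('New York', debug: true)
```

Keeping the flag behind a keyword argument makes it easy to switch between the normal and debug views of the same query.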

We created an
[issue in Nominatim repository](https://github.com/openstreetmap/Nominatim/issues/1023)
and got very prompt replies from the Nominatim maintainers, especially from
[Sarah Hoffmann](https://github.com/lonvia). What fixed the slow searches for us
was running `ANALYZE` on the Nominatim database.

```bash
# runs analyze on the entire nominatim database
psql -d nominatim -c 'ANALYZE VERBOSE'
```

The PostgreSQL query planner
[depends on statistics](https://www.postgresql.org/docs/9.5/static/planner-stats.html)
collected by the
[postgres statistics collector](https://www.postgresql.org/docs/9.1/static/monitoring-stats.html)
when it executes a query. Ours was a fresh installation, so no statistics had
been collected yet, and the query planner took an enormous amount of time to
plan queries. Running `ANALYZE` gathers those statistics.

### Comparing Nominatim and Google Geocoder

We compared results for 2500 addresses and found that Google geocoded 99% of
them. In comparison, Nominatim could geocode only 47%.

This meant we would still need to geocode roughly half of the addresses using
the Google geocoder. We found that we could increase Nominatim's geocoding
efficiency by normalizing the addresses we had.
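One way to combine the two geocoders is to try Nominatim first and fall back to Google only for the addresses it misses. The sketch below illustrates that idea under stated assumptions: `geocode_with_fallback` is a hypothetical helper, and the two lambdas are stand-ins with made-up coordinates, not real geocoder calls. In an application they would wrap calls to the Geocoder gem.

```ruby
# Tries each backend in order and returns the first non-empty result,
# tagged with the name of the backend that produced it.
def geocode_with_fallback(address, backends)
  backends.each do |name, geocoder|
    coordinates = geocoder.call(address)
    if coordinates.is_a?(Array) && !coordinates.empty?
      return { source: name, coordinates: coordinates }
    end
  end
  nil # no backend could geocode the address
end

# Stand-in backends for illustration only (coordinates are made up).
nominatim = ->(address) { address.include?('Main St') ? [40.0, -74.0] : nil }
google    = ->(_address) { [41.0, -75.0] }

# Ruby hashes preserve insertion order, so Nominatim is tried first.
backends = { nominatim: nominatim, google: google }

p geocode_with_fallback('1 Main St', backends)
p geocode_with_fallback('Unknown Rd', backends)
```

Recording which backend answered also makes it possible to measure how many requests still reach the paid geocoder.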

### Address Normalization using libpostal

[Libpostal](https://github.com/openvenues/libpostal) is an address normalizer,
which uses
[statistical natural-language processing](<https://en.wikipedia.org/wiki/Natural-language_processing#Statistical_natural-language_processing_(SNLP)>)
to normalize addresses. Libpostal also has
[Ruby bindings](https://github.com/openvenues/ruby_postal), which made it quite
easy to use for our tests.

Once libpostal and its Ruby bindings were installed (installation is
straightforward and the steps are available on
[ruby_postal's GitHub page](https://github.com/openvenues/ruby_postal)), we gave
libpostal + Nominatim a go.

```ruby
require 'geocoder'
require 'ruby_postal/expand'
require 'ruby_postal/parser'
# titleize and present? come from ActiveSupport (already loaded in Rails)
require 'active_support'
require 'active_support/core_ext'

Geocoder.configure(lookup: :nominatim, nominatim: { host: "nominatim_host:port" })

full_address = [... address for normalization ...]
# libpostal returns several normalized variants of the same address
expanded_addresses = Postal::Expand.expand_address(full_address)
parsed_addresses = expanded_addresses.map do |address|
  Postal::Parser.parse_address(address)
end

parsed_addresses.each do |components|
  # components is of the format
  # [{label: :postcode, value: "12345"}, {label: :city, value: "ny"}, ...]
  parsed_address = [:house_number, :road, :city, :state, :postcode, :country].inject([]) do |acc, key|
    key_value = components.detect { |component| component[:label].to_sym == key }
    acc << key_value[:value].to_s.titleize if key_value
    acc
  end

  # stop at the first variant that Nominatim can geocode
  coordinates = Geocoder.coordinates(parsed_address.join(", "))
  if coordinates.is_a?(Array) && coordinates.present?
    puts "By Libpostal #{coordinates} => #{parsed_address.join(", ")}"
    break
  end
end
```

With this, we were able to improve our geocoding efficiency by over 10
percentage points: the Nominatim + Libpostal combination could geocode ~59% of
addresses, up from 47% with Nominatim alone.
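The component-joining step in the script above can also be isolated into a pure function, which makes it testable without libpostal or a running Nominatim. This is only a sketch: `build_query` and `COMPONENT_ORDER` are hypothetical names, the sample data is made up, and the input shape (an array of `label`/`value` hashes) is our assumption about what `Postal::Parser.parse_address` returns. It uses plain `capitalize` instead of ActiveSupport's `titleize` to stay dependency-free.

```ruby
# Order in which address components are joined into the query string.
COMPONENT_ORDER = %i[house_number road city state postcode country].freeze

# components: array of hashes such as { label: :city, value: "brooklyn" }
# (assumed shape of the libpostal parser output).
def build_query(components)
  COMPONENT_ORDER.filter_map do |label|
    match = components.find { |component| component[:label].to_sym == label }
    # Capitalize each word; components that are missing are skipped.
    match && match[:value].to_s.split.map(&:capitalize).join(' ')
  end.join(', ')
end

# Made-up sample, deliberately out of order.
sample = [
  { label: :postcode, value: '11216' },
  { label: :house_number, value: '781' },
  { label: :road, value: 'franklin ave' },
  { label: :city, value: 'brooklyn' },
  { label: :state, value: 'ny' }
]

puts build_query(sample)
# prints "781, Franklin Ave, Brooklyn, Ny, 11216"
```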

