Works portfolio

Offline geocoder for Brazil

Work

This work was carried out for an undisclosed contractor.

2026 Geocoding

In 2026, I was tasked with creating a VirtualBox machine providing Pelias, an offline geocoder, with data for Brazil. To make the work reproducible, I used Vagrant and versioned the code in a repository: pelias-vagrant.

Pelias relies on three open data sources: Who’s On First for high-level administrative entities, OpenAddresses for authoritative (national or local) street-level address data, and OpenStreetMap for collaborative data about any location. Geonames can also be used as an alternative to OpenAddresses and OpenStreetMap, but it does not seem to be supported out of the box by the Docker install.

The data is processed and then imported into Elasticsearch, which is the core of the geocoder.

Example with Portland

Pelias provides a sample project with data for Portland, which requires downloading and preparing 6GB of data.

In the following figure, captured from the Pelias compare tool, you can see the result of requesting a location for the address “1955, Northwest Raleigh Street, Slabtown, Northwest District, Portland, Multnomah County, Oregon, 97209, USA” (on the example offline geocoder for Portland).

Prepare an offline geocoder (Pelias) with data for Brazil

Pelias gives 2 results with OpenAddresses and 3 results with OpenStreetMap, all with a confidence of 1 but with different coordinates 🤷. Only two of them match the street number, so some post-processing may be needed when using the geocoder.
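One possible form of that post-processing is to keep only the results whose house number matches the one in the query. The sketch below assumes the usual Pelias GeoJSON response, where each feature carries a `properties` object with `housenumber` and `source` fields. The function name and the sample features are hypothetical and made up for illustration, they are not the actual Portland results.

```python
def filter_by_housenumber(features, expected_number):
    """Keep only the features whose housenumber equals expected_number."""
    wanted = str(expected_number).strip().lower()
    return [
        feature
        for feature in features
        if str(feature.get("properties", {}).get("housenumber", "")).strip().lower() == wanted
    ]


# Made-up sample shaped like the `features` list of a Pelias response
# (geometry omitted; house numbers are illustrative only).
sample_features = [
    {"properties": {"source": "openaddresses", "housenumber": "1955", "confidence": 1}},
    {"properties": {"source": "openaddresses", "housenumber": "1953", "confidence": 1}},
    {"properties": {"source": "openstreetmap", "housenumber": "1955", "confidence": 1}},
    {"properties": {"source": "openstreetmap", "housenumber": "1975", "confidence": 1}},
    {"properties": {"source": "openstreetmap", "confidence": 1}},  # street-level result, no number
]

matches = filter_by_housenumber(sample_features, "1955")
for feature in matches:
    print(feature["properties"]["source"], feature["properties"]["housenumber"])
```

Comparing house numbers as normalized strings keeps suffixed numbers such as “12a” working, at the cost of not matching number ranges.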

Brazil

Once the local geocoder worked for Portland, I did the same for Brazil, which generates about 100GB of data. Only about half of it needs to be copied to the machine, so the virtual machine was about 50GB. Preparing the data takes about 12 hours.

In the following figure, you can see the result of requesting a location for the address “Estr. do Mendanha, 3345 - Campo Grande, Rio de Janeiro - RJ, 23092-001, Brazil”. The result from the offline geocoder is the same as the one from geocode.earth.

Prepare an offline geocoder (Pelias) with data for Brazil
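For readers who want to send the same request without the compare tool, here is a minimal sketch that builds the search URL. It assumes the Pelias API is reachable at `http://localhost:4000` (the default port in the Pelias Docker projects, which may differ on the virtual machine) and uses the `/v1/search` endpoint with its `text`, `size` and `boundary.country` parameters. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the Pelias API listens on localhost:4000; adjust for your setup.
PELIAS_BASE_URL = "http://localhost:4000"


def build_search_url(text, size=5, country=None, base_url=PELIAS_BASE_URL):
    """Build a Pelias /v1/search URL for a free-text address query."""
    params = {"text": text, "size": size}
    if country:
        # Restrict results to one country (ISO 3166-1 alpha-2 or alpha-3 code).
        params["boundary.country"] = country
    return f"{base_url}/v1/search?{urlencode(params)}"


url = build_search_url(
    "Estr. do Mendanha, 3345 - Campo Grande, Rio de Janeiro - RJ, 23092-001, Brazil",
    country="BRA",
)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. The response is a GeoJSON FeatureCollection.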

On my first try, the OpenStreetMap data could not be used to generate polylines (which are used to interpolate the location of addresses along a street), because the PBF file for Brazil is too big. I initially overlooked the warning and delivered the virtual machine as it was. I then managed to process the OSM data by using the 5 regional extracts instead of the country-wide file. The client verified on their side that this slight change increased the percentage of correctly geocoded addresses in a small Brazilian city from 60% to 75%. The additional 15% of addresses, now precisely located using OSM data, were previously geocoded from imprecise OpenAddresses data or from the Who’s On First city centroid.
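As an illustration of the regional approach, the sketch below generates the `imports.openstreetmap` part of a `pelias.json` for five extracts. The `download`/`sourceURL` and `import`/`filename` keys follow the sample Pelias Docker projects. The region names and the Geofabrik URL pattern are assumptions on my part (this text does not say which split was used), and the actual configuration in pelias-vagrant may differ.

```python
import json

# Assumption: extracts named after Brazil's five macro-regions, as published
# by Geofabrik. Swap in whichever regional files you actually use.
REGIONS = ["centro-oeste", "nordeste", "norte", "sudeste", "sul"]
BASE_URL = "https://download.geofabrik.de/south-america/brazil"


def openstreetmap_imports(regions=REGIONS, base_url=BASE_URL):
    """Return the download and import entries for one PBF file per region."""
    filenames = [f"{region}-latest.osm.pbf" for region in regions]
    return {
        "download": [{"sourceURL": f"{base_url}/{name}"} for name in filenames],
        "import": [{"filename": name} for name in filenames],
    }


config_fragment = {"imports": {"openstreetmap": openstreetmap_imports()}}
print(json.dumps(config_fragment, indent=2))
```

The printed fragment would still have to be merged by hand into the rest of the project's `pelias.json`, which holds other settings not shown here.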