shao-wang-me/elasticsearch-gnaf

Import G-NAF dataset into Elasticsearch

Python scripts and Jupyter notebooks to import the G-NAF dataset into Elasticsearch via PostgreSQL, along with a script to generate dummy Australian demographic data and import it into Elasticsearch.

The Geocoded National Address File (referred to as G-NAF) is Australia’s authoritative, geocoded address file.

G-NAF is produced by PSMA Australia, and their G-NAF product page links to a Getting Started Guide. You can follow the steps in the guide to import G-NAF into a relational database. Almost everything you need is in the guide, but one thing is missing: the actual method for importing the data files into a database.

I import G-NAF into PostgreSQL first, then from PostgreSQL into Elasticsearch.

copy_gnaf_to_postgres.ipynb generates PostgreSQL COPY commands to import the data files into PostgreSQL. Just copy the generated commands and run them in PostgreSQL.
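The idea can be sketched as below. This is an illustrative sketch only (the real commands come from copy_gnaf_to_postgres.ipynb); the file path and table name are assumptions, but G-NAF data files really are pipe-delimited .psv files with a header row:

```python
def copy_command(psv_path: str, table: str) -> str:
    """Build a PostgreSQL COPY command for one G-NAF data file.

    G-NAF ships its data as pipe-delimited .psv files with a header row,
    which COPY can read directly in CSV mode with a '|' delimiter.
    """
    return (
        f"COPY {table} FROM '{psv_path}' "
        "WITH (FORMAT csv, HEADER true, DELIMITER '|');"
    )

# e.g. for the NSW ADDRESS_DETAIL file (path and table name are examples only):
print(copy_command("/data/G-NAF/NSW_ADDRESS_DETAIL_psv.psv", "address_detail"))
```

One command is generated per data file; running them in psql (or any PostgreSQL client with superuser file access) loads the tables.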

elastic_gnaf.py imports G-NAF from PostgreSQL into Elasticsearch.
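The PostgreSQL-to-Elasticsearch step can be sketched as streaming rows from a query and bulk-indexing them. This is a hedged sketch, not the contents of elastic_gnaf.py: the `addresses` view, field names, and connection settings are assumptions; it uses the well-known psycopg2 and elasticsearch-py libraries:

```python
def to_actions(rows, index="gnaf"):
    """Turn (pid, full_address, lat, lon) rows into Elasticsearch bulk actions."""
    for pid, address, lat, lon in rows:
        yield {
            "_index": index,
            "_id": pid,  # the G-NAF address_detail_pid makes a natural document id
            "_source": {
                "address": address,
                "location": {"lat": lat, "lon": lon},  # geo_point-style field
            },
        }


def run(pg_dsn="dbname=gnaf", es_url="http://localhost:9200"):
    # Assumes psycopg2 and elasticsearch-py are installed, and that a flattened
    # `addresses` view has been built in PostgreSQL (illustrative names).
    import psycopg2
    from elasticsearch import Elasticsearch, helpers

    conn = psycopg2.connect(pg_dsn)
    es = Elasticsearch(es_url)
    # A named (server-side) cursor streams rows instead of loading them all.
    with conn.cursor(name="gnaf_cursor") as cur:
        cur.execute(
            "SELECT address_detail_pid, full_address, latitude, longitude "
            "FROM addresses"
        )
        helpers.bulk(es, to_actions(cur))
```

Streaming with a server-side cursor matters here because G-NAF contains over ten million address records.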

elastic_australian_people.py is not related to G-NAF; it generates dummy Australian demographic data and imports it into Elasticsearch, should you need that. 😊
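Generating dummy demographic records looks roughly like the following. The fields and value lists here are made up for illustration, not taken from elastic_australian_people.py:

```python
import random

STATES = ["NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"]


def dummy_person(rng: random.Random) -> dict:
    """Generate one fake demographic record (fields are illustrative)."""
    return {
        "given_name": rng.choice(["Olivia", "Jack", "Amelia", "Noah"]),
        "surname": rng.choice(["Smith", "Nguyen", "Jones", "Williams"]),
        "age": rng.randint(0, 100),
        "state": rng.choice(STATES),
    }


# A seeded generator makes the dataset reproducible between runs.
rng = random.Random(42)
people = [dummy_person(rng) for _ in range(1000)]
```

Records like these can then be indexed with the same bulk-indexing approach used for the address data.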

See also

  1. data61/gnaf: A set of utilities developed by CSIRO's Data61 to import G-NAF into a relational database, build an Apache Lucene index (the library Elasticsearch uses under the hood), and more.
  2. Building real-time address search with the Australian G-NAF dataset: A blog post from Elastic with a similar goal to this repository, but using F#.
  3. aus-search: Uses Node.js and MongoDB to import G-NAF into Elasticsearch.