Data pipeline utility to transport forum data into Elasticsearch

andrewMacmurray/minecart

 
 

Minecart

A utility to embellish gingerbread CSV forum data and transport it into Elasticsearch. It uses the Google Cloud Natural Language API to add analysis data to each forum post.

Building the utility:

  1. Make sure you have stack installed (install guides here and here).
  2. Build the project with stack build.
  3. Create a postgres database called minecart.
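
Steps 2 and 3 above can be sketched as a small shell function. This is a sketch under assumptions, not the project's own setup script: it assumes the PostgreSQL client tool createdb is on the PATH and that the current user is allowed to create databases. The names run, setup_minecart and DRY_RUN are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper covering steps 2 and 3 of the build list.
setup_minecart() {
  # Step 2: build the project with stack.
  run stack build || return 1
  # Step 3: create the database (assumes createdb is installed and permitted).
  run createdb minecart
}
```

Calling setup_minecart with DRY_RUN=1 prints the two commands without running them, which is a cheap way to check the steps before touching a real database.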

Running

To run the utility:

> stack exec minecart

This shows a list of the actions minecart can perform.

To run an action, replace --action with one of the flags listed under Running Options:

> stack exec minecart -- --action

Running Options

minecart can perform the following actions:

  • --pgsetup adds the tables to the minecart postgres database and loads the forum posts from gb-forum.csv
  • --entities sends a Google Cloud Natural Language request for each post body and stores the returned entities in the database
  • --sentences sends a Google Cloud Natural Language request for each post body and stores the sentence sentiments in the database
  • --elastic sets up an Elasticsearch index and adds the complete post data to it
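
As an illustration of the flags listed above, a wrapper script could check a flag before handing it to minecart. This is only a sketch: the function name is_minecart_action is hypothetical and not part of minecart, and the list of flags is copied from this README rather than read from the program.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the four action flags this README lists.
is_minecart_action() {
  case "$1" in
    --pgsetup|--entities|--sentences|--elastic) return 0 ;;
    *) return 1 ;;
  esac
}
```

A wrapper might then run something like: is_minecart_action "$1" && stack exec minecart -- "$1"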

Running Order

  • --pgsetup must be run first
  • --entities and --sentences are optional (both need the GOOGLE_CLOUD_API_KEY environment variable to be set)
  • finally, run --elastic to set up an Elasticsearch index with the post data
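
The running order above could be scripted roughly as follows. This is a sketch, not something shipped with minecart: the function names minecart_plan and minecart_run_all are hypothetical, and running --entities before --sentences is an assumption, since the README does not fix an order between the two optional steps.

```shell
#!/bin/sh
# Print the actions in the required order, one per line.
# The optional analysis steps are included only when GOOGLE_CLOUD_API_KEY is set.
minecart_plan() {
  printf '%s\n' "--pgsetup"
  if [ -n "${GOOGLE_CLOUD_API_KEY:-}" ]; then
    # Assumed order: entities first, then sentences.
    printf '%s\n' "--entities"
    printf '%s\n' "--sentences"
  fi
  printf '%s\n' "--elastic"
}

# Run every planned action in order, stopping at the first failure.
minecart_run_all() {
  for action in $(minecart_plan); do
    stack exec minecart -- "$action" || return 1
  done
}
```

With these functions defined, calling minecart_run_all runs the whole pipeline. Leaving GOOGLE_CLOUD_API_KEY unset skips the two analysis steps and goes straight from --pgsetup to --elastic.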

Languages

  • Haskell 100.0%