Elasticsearch publisher using Hadoop as source and Spark 1.6 as ETL engine :: Running package for Cloudera CDH 5.9.0 Cluster

jpacerqueira-zz/SparkElasticSearchPublisher

SparkElasticSearchPublisher

This is a pom.xml-based package with:

  • Spark 1.6 jobs to consume personalized social media data

  • Discover personal data profile

  • Transform into an Elastic Search Index with Daily Activity

  • Runs on the previous day's files; data is published with the Spark package org.elasticsearch.spark.sql
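
The "previous day" and "daily activity" conventions above can be sketched in plain Java. This is only an illustration: the base directory, the `dt=` partition layout, the index prefix, and the document type are all assumptions, and none of these names come from the repository.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; every name and path here is illustrative. */
final class DailyRun {
    // Assumed HDFS base directory for the social media files (hypothetical).
    static final String BASE_DIR = "/data/social/raw";
    // Assumed Elasticsearch index prefix and document type (hypothetical).
    static final String INDEX_PREFIX = "daily-activity";
    static final String DOC_TYPE = "activity";

    // Directory names use yyyy-MM-dd; index names use yyyy.MM.dd.
    static final DateTimeFormatter DIR_DAY = DateTimeFormatter.ISO_LOCAL_DATE;
    static final DateTimeFormatter INDEX_DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** The day whose files a run started on runDate should process. */
    static LocalDate previousDay(LocalDate runDate) {
        return runDate.minusDays(1);
    }

    /** Input directory holding the files for the given day. */
    static String inputDir(LocalDate day) {
        return BASE_DIR + "/dt=" + DIR_DAY.format(day);
    }

    /** Target written as "index/type", one index per day of activity. */
    static String esResource(LocalDate day) {
        return INDEX_PREFIX + "-" + INDEX_DAY.format(day) + "/" + DOC_TYPE;
    }

    public static void main(String[] args) {
        LocalDate day = previousDay(LocalDate.now());
        System.out.println("input:    " + inputDir(day));
        System.out.println("resource: " + esResource(day));
    }
}
```

A Spark job could read from the computed input directory and pass the "index/type" string as the write target when publishing through org.elasticsearch.spark.sql. Deriving both values from a single date keeps a re-run for a given day pointed at the same files and the same index.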

TODO:

  • Split jobs into /raw /stage /published :: data jobs - pending
  • New app required to publish only totals from published into Elasticsearch - Done
  • New Elasticsearch apps must only run in elasticsearch.sql context - Done
