Hive JSON Schema Finder

This project is a rough prototype that I wrote to analyze large collections of JSON documents and discover their Apache Hive schema. I've used it to analyze githubarchive.org's log data.
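To give a feel for what "discovering a schema" means, here is a minimal sketch in Java. It is not this project's actual code: the class `HiveSchemaSketch`, its `Type` node, and the `infer` method are hypothetical names made up for illustration. It assumes the JSON has already been parsed into plain `Map`, `List`, `String`, `Number`, and `Boolean` values, and it uses simplified merge rules (mixed integers and floating-point values widen to `double`, any other conflict falls back to `string`).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HiveSchemaSketch {
    /** Hypothetical type node: a primitive name, a struct, or an array. */
    static final class Type {
        String primitive;          // e.g. "string"; null for struct/array
        Map<String, Type> fields;  // non-null only for structs
        Type element;              // non-null only for arrays

        static Type primitive(String name) {
            Type t = new Type();
            t.primitive = name;
            return t;
        }

        /** Derives a type from one already-parsed JSON value. */
        static Type of(Object v) {
            if (v == null) return primitive("void"); // placeholder until a real value is seen
            if (v instanceof Boolean) return primitive("boolean");
            if (v instanceof Integer || v instanceof Long) return primitive("bigint");
            if (v instanceof Number) return primitive("double");
            Type t = new Type();
            if (v instanceof Map) {
                t.fields = new TreeMap<>();
                for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                    t.fields.put(String.valueOf(e.getKey()), of(e.getValue()));
                }
                return t;
            }
            if (v instanceof List) {
                t.element = primitive("void");
                for (Object o : (List<?>) v) t.element = t.element.merge(of(o));
                return t;
            }
            return primitive("string");
        }

        /** Combines two observed types into one that can hold both. */
        Type merge(Type o) {
            if ("void".equals(primitive)) return o;
            if ("void".equals(o.primitive)) return this;
            if (fields != null && o.fields != null) {
                Type t = new Type();
                t.fields = new TreeMap<>(fields);
                for (Map.Entry<String, Type> e : o.fields.entrySet()) {
                    t.fields.merge(e.getKey(), e.getValue(), Type::merge);
                }
                return t;
            }
            if (element != null && o.element != null) {
                Type t = new Type();
                t.element = element.merge(o.element);
                return t;
            }
            if (primitive != null && o.primitive != null) {
                if (primitive.equals(o.primitive)) return this;
                boolean numeric = isNumeric(primitive) && isNumeric(o.primitive);
                return primitive(numeric ? "double" : "string");
            }
            return primitive("string"); // conflicting shapes: simplified fallback
        }

        static boolean isNumeric(String p) {
            return p.equals("bigint") || p.equals("double");
        }

        /** Renders the type using Hive's type syntax. */
        @Override
        public String toString() {
            if (primitive != null) return primitive;
            if (element != null) return "array<" + element + ">";
            StringBuilder sb = new StringBuilder("struct<");
            String sep = "";
            for (Map.Entry<String, Type> e : fields.entrySet()) {
                sb.append(sep).append(e.getKey()).append(':').append(e.getValue());
                sep = ",";
            }
            return sb.append('>').toString();
        }
    }

    /** Folds every document into a single merged type. */
    static Type infer(List<?> docs) {
        Type result = Type.primitive("void");
        for (Object doc : docs) result = result.merge(Type.of(doc));
        return result;
    }

    public static void main(String[] args) {
        List<?> docs = List.of(
            Map.<String, Object>of("id", 1, "actor", Map.of("login", "octocat")),
            Map.<String, Object>of("id", 2.5, "tags", List.of("x")));
        System.out.println(infer(docs));
    }
}
```

For the two sample documents in `main`, the merged result is `struct<actor:struct<login:string>,id:double,tags:array<string>>`: fields seen in only one document are kept, and `id` widens to `double` because it appears as both an integer and a floating-point value.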

To build the project, use Maven (3.0.x) from http://maven.apache.org/.

Build the jar:

% mvn package

Run the program:

% bin/find-json-schema *.json.gz

I've uploaded the discovered schema for githubarchive.org to https://gist.github.com/omalley/5125691.