Apache-drill

Apache drill Docker Image running in Distributed Mode.

This image can be used for any enviroments to spinup a Apache Drill in Distributed mode. Distributed mode needs an external zookeper running available.

prerequisites

Apache Drill
Apache Zookeeper
Docker installed
AWS keys or AWS ec2 instance with Iam role attached with read and list permissions.
AWS S3 bucket

If you are deploying the apache drill to ec2-instance that has iam role attached the

Add the below code to your core-site.xml file before build the Docker Image if your are using AWS IAM role to access s3 storage

<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
<property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
<property>
    <name>fs.s3a.endpoint</name>
    <value>s3.amazonaws.com</value>
</property>
<property>
    <name>fs.s3a.impl.disable.cache</name>
    <value>true</value>
</property>
<property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
</property>
</configuration>

If you are using a AWS Secret keys and Access keys then copy down the below config to your core-site.xml file before buiding the Docker Image

<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
<property>
    <name>fs.s3a.access.key</name>
    <value>ENTER-YOUR-ACCESS-KEY</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>ENTER-YOUR-SECRET-KEY</value>
</property>
<property>
    <name>fs.s3a.endpoint</name>
    <value>s3.amazonaws.com</value>
</property>
<property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
</property>
</configuration>

I am living closer to us-east-1 so my s3 endpoint is s3.amazonaws.com check your closer s3 endpoint in AWS Endpoint Search

Getting started

clone the repository to your local file system.

git clone https://github.com/ravuri96/apache-drill.git

cd apache-drill

To build image:

Clone the repo and build the docker image using the commands below:

docker build -t apachedrill:latest .

To deploy the apache drill docker image:

docker compose up -d

verify the deployment

http://localhost/8047

Apache drill ui

To Pull the Image for dockerhub

docker pull ravuri96/apache-drill

Fun Part

Querying data from S3 bucket using apache drill.

Create a s3 bucket
Add the insurance-data.csv file in the project to s3 bucket.
enable s3 storage plugin

To enable s3 stroge plugin

http://localhost/8047/storage

{
  "type": "file",
  "connection": "s3a://your-bucket-name",
  "config": null,
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "httpd": {
      "type": "httpd",
      "logFormat": "%h %t \"%r\" %>s %b \"%{Referer}i\"",
      "timestampFormat": null
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  },
  "enabled": true
}

copy the above configuration and paste it in your s3 storage plugin and update it.

We have already defined the AWS credentails or IAM Role in core-site.xml so we dont need to define again in storage plugin configuration

All configurations are done

Now go to http://localhost/8047/query

select * from s3.`insurance-data.csv`;

Happy Drilling..!!!

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.github/workflows		.github/workflows
CNAME		CNAME
Dockerfile		Dockerfile
README.md		README.md
_config.yml		_config.yml
core-site.xml		core-site.xml
docker-compose.yml		docker-compose.yml
drill-override.conf.template		drill-override.conf.template
entrypoint.sh		entrypoint.sh
insurance-data.csv		insurance-data.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Apache-drill

Apache drill Docker Image running in Distributed Mode.

Getting started

To build image:

To deploy the apache drill docker image:

verify the deployment

To Pull the Image for dockerhub

Fun Part

Querying data from S3 bucket using apache drill.

Now go to http://localhost/8047/query

About

Releases

Packages

Languages

ravuri96/apache-drill

Folders and files

Latest commit

History

Repository files navigation

Apache-drill

Apache drill Docker Image running in Distributed Mode.

Getting started

To build image:

To deploy the apache drill docker image:

verify the deployment

To Pull the Image for dockerhub

Fun Part

Querying data from S3 bucket using apache drill.

Now go to http://localhost/8047/query

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages