- Clone repo
- Install docker + docker-compose
- Run
docker-compose --compatibility up
- Run
bash bootstrap-superset.sh
- Done! Checkout the service endpoints:
Trino: http://localhost:8080/ui/
(username can be anything)
Minio: http://localhost:9001/
(username: minio_access_key
, password: minio_secret_key
)
Superset: http://localhost:8088/
(username: admin
, password: admin
)
- Go to
data
dropdown and clickdatabases
- Click the
+ database
button - For
Select a database to connect
choosepresto
- In
SQLALCHEMY URI
puttrino://hive@trino-coordinator:8080/hive
- Switch over to
Advanced
tab - In
SQL Lab
select all options - In
Security
selectAllow data upload
-
https://docs.stackable.tech/home/stable/demos/trino-taxi-data.html
-
Dataset : https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
trino-taxi-data
Install the required Stackable operators.
Spin up the following data products:
-
Superset: A modern data exploration and visualization platform. This demo utilizes Superset to retrieve data from Trino via SQL queries and build dashboards on top of that data.
-
Trino: A fast distributed SQL query engine for big data analytics that helps you explore your data universe. This demo uses it to enable SQL access to the data.
-
MinIO: A S3 compatible object store. This demo uses it as persistent storage to store all the data used
-
Hive metastore: A service that stores metadata related to Apache Hive and other services. This demo uses it as metadata storage for Trino.
-
Open policy agent (OPA): An open-source, general-purpose policy engine unifying policy enforcement across the stack. This demo uses it as the authorizer for Trino, which decides which user can query
which data.
Load test data into S3. It contains 2.5 years of New York City taxi trips.
Make data accessible via SQL in Trino.
Create Superset dashboards for visualization of the data.