Daf-Kylo for PDND (Piattaforma Digitale Nazionale Dati), previously DAF (Data & Analytics Framework)

To install and use this repository, you can deploy all of its components onto a shared Cloudera edge node.

Daf-Kylo is a data lake platform built on Apache Hadoop and Spark. It enables self-service data ingestion, data preparation, and data discovery, and integrates best practices around metadata capture, security, and data quality. Apache NiFi provides a flexible data processing framework for building batch or streaming pipeline templates and for enabling self-service features.

What is the PDND (previously DAF)?

PDND stands for "Piattaforma Digitale Nazionale Dati" (Italian Digital Data Platform), previously known as Data & Analytics Framework (DAF).

In brief, it is an attempt to establish a central Chief Data Officer (CDO) for the Italian Government and Public Administration. Its main goal is to promote data exchange among Italian Public Administrations (PAs), to support the diffusion of open data, and to enable data-driven policies. You can find out more about the PDND on the official Digital Transformation Team website.

What is Daf-Kylo?

The Daf-Kylo repository contains the set of components used to deploy and manage the PDND data ingestion process.

Folder /docker contains the Dockerfiles used to build the images of the Daf-Kylo components.

Folder /kubernetes contains the YAML files used to deploy pods and services on Kubernetes.

Folder /kylo contains the Kylo material, such as the API documentation for the integration with the PDND Portal, the Kylo templates, and the Kylo patches.

Folder /nifi contains the NiFi templates and the customized processors used in the ingestion process.

Folder /scripts contains utility scripts for managing pods, logs, and other Kubernetes resources.

Prerequisites

Project dependencies

Project dependencies can be found by clicking on this link.

Project components

The Daf-Kylo project depends on the following components.

  • ActiveMQ version 5.15.1, available here;
  • Elasticsearch version 5.6.4, available here;
  • MariaDB version 10.3, available here;
  • Spark version 2.2.0, available here;
  • Kylo-Services version 9.1.0, available here;
  • Kylo-UI version 9.1.0, available here;
  • NiFi version 1.7.0, available here.

How to install and use Daf-Kylo

macOS and Linux

Installing Daf-Kylo on Unix-like systems requires a package manager such as Homebrew. You can download and install Homebrew by following the instructions on the official Homebrew website. Once Homebrew is installed, a few steps complete the setup. The first step is installing Homebrew Cask. Open a terminal and type the following command:

brew tap caskroom/cask  

Then, update all formulae and Homebrew itself by typing

brew update  

Last, install kubectl, RPM, Make, and Git by typing

brew install kubectl rpm make git  
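
As a quick sanity check, you may want to verify that the tools are available on your PATH (the version output will of course depend on your system):

kubectl version --client
rpm --version
make --version
git --version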

How to build Daf-Kylo

Most of the Docker images require the Kylo code (both source and compiled). To download and compile it using the Makefile, type the following commands (for the production and test environments respectively):

Production

make -f Makefile daf-kylo
make -f Makefile build-kylo  

Test

make -f Makefile.test daf-kylo
make -f Makefile.test build-kylo  

Log in to the Nexus repository

docker login nexus.daf.teamdigitale.it

Build Docker images of the components

Once this is completed, you can build every image (for the production and test environments) by typing the following commands:

Production

make activemq
make mysql 
make kylo-services  
make kylo-ui  
make nifi  

Test

make -f Makefile.test activemq
make -f Makefile.test mysql  
make -f Makefile.test kylo-services  
make -f Makefile.test kylo-ui  
make -f Makefile.test nifi  

Push Docker images to the local artifact repository (Nexus)

Please ensure that the Docker client has been configured and that the images have been tagged correctly. Instructions can be found in:
TeamDigitale onboarding 'Setup Docker'
TeamDigitale onboarding 'Push Docker Image'

Once configuration and tagging are done, an image can be pushed by typing: docker push [repositoryurl:repositoryport/artifact:version]

for instance:

Production

./nexus_push.sh prod [namespace]

Test

./nexus_push.sh test [namespace]
The [namespace] is optional.
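
If you need to push a single image by hand rather than through the script, the generic pattern above expands into a tag-and-push pair such as the following (the image name, tag, and repository path are purely illustrative):

docker tag kylo-services:latest nexus.daf.teamdigitale.it/daf-kylo/kylo-services:latest
docker push nexus.daf.teamdigitale.it/daf-kylo/kylo-services:latest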

Deploy components in kubernetes cluster

Please ensure that kubectl has been configured. Instructions can be found in the TeamDigitale onboarding 'Setup Kubernetes'.

Production

Once configuration is done, you can deploy into the Kubernetes cluster by typing ./playbook.sh [environment] [component] [namespace].

As an example, ./playbook.sh prod activemq [namespace].

Pods can be deleted by typing: ./cleanup.sh [environment] [component] [namespace].

As an example,
./cleanup.sh prod activemq [namespace]

Test

for instance:
./playbook.sh test activemq [namespace]
or delete with ./cleanup.sh [environment] [component] [namespace]

for instance:
./cleanup.sh test activemq [namespace]

MySQL Configuration

By default, the kylo database is not created in the MySQL container, so you have to create it yourself.
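
A minimal way to create it, assuming the MySQL/MariaDB pod is reachable through kubectl and the root password is known (the pod name below is a placeholder), is:

kubectl exec -it <mysql-pod-name> -- mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS kylo;"

Character set and user grants should follow the Kylo installation documentation.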

LDAP Configuration

To configure LDAP authentication, edit the config maps kylo-services.yaml and kylo-ui.yaml as follows. In config-map/kylo-services.yaml the LDAP properties should be:

Production

    security.auth.ldap.server.uri=ldap://idm.daf.gov.it:389/cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.authDn=uid=admin,cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.password=xxxxxx

Test

    security.auth.ldap.server.uri=ldap://idm.teamdigitale.test:389/cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.authDn=uid=application,cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.password=xxxxxx

After these two changes redeploy as follows:

kubectl delete -f config-map/kylo-services.yaml  
kubectl delete -f config-map/kylo-ui.yaml  
  
kubectl apply -f config-map/kylo-services.yaml  
kubectl apply -f config-map/kylo-ui.yaml  

The above example does not take the [namespace] into account.
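
If the config maps live in a dedicated namespace, the same commands can be scoped with the -n flag, for instance:

kubectl delete -f config-map/kylo-services.yaml -n [namespace]
kubectl apply -f config-map/kylo-services.yaml -n [namespace]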

  1. Go to idm.teamdigitale.test and create a user such as dladmin with a password.

After this, you are able to log in to the Kylo UI.

As pointed out above, once this is done the LDAP login will temporarily be replaced by the default login, which lets you log in with the default user dladmin/thinkbig. This step is needed to create users with the same names as those existing in LDAP, in order to grant them permissions (the same functionality for groups is currently being fixed by R&D). Once the users (or groups) have been created, change config-map/kylo-services.yaml and config-map/kylo-ui.yaml back and redeploy again. LDAP is now good to go.

Bootstrap note

When Kylo starts for the first time, it needs Liquibase to create the Kylo database. Make sure that the application.properties in the kylo-services config map contains:

liquibase.enabled=true  
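
One way to check that the property is actually set in the running cluster is to inspect the config map with kubectl (the config map name kylo-services is an assumption, matching the file above):

kubectl get configmap kylo-services -o yaml | grep liquibase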

How to view logs
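
The /scripts folder contains utility scripts for this purpose; alternatively, the logs of a running component can be inspected directly with kubectl (the pod name and namespace below are placeholders):

kubectl get pods -n [namespace]
kubectl logs -f <kylo-services-pod> -n [namespace]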

Custom Processors

Here you can find additional information about custom processors created for the DAF.

How to contribute

Contributions are welcome. Feel free to open issues and submit a pull request at any time, but please read our handbook first.

License

Copyright (c) 2019 Presidenza del Consiglio dei Ministri

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.