jpacerqueira-zz/airflow-executions

Apache Airflow for Kubernetes (K8s) clusters with docker-compose orchestration. Examples include workflows for jobs such as webhooks and web scrapers.


Job Execution: Airflow DAG Selenium Web Scraper

Run the job to download the MP3 content of the BBC Radio 5 podcast Wake Up To Money.

Execute the installation with the automated docker-compose script:

    $ bash -x get-docker-compose-Selenium-executor.sh
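A minimal sketch of the kind of scraping the DAG performs, assuming a headless Chrome driver is installed in the container; the episode-page URL and the .mp3 link filter are placeholder assumptions, not taken from the repository:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")    # no display inside the container
options.add_argument("--no-sandbox")

driver = webdriver.Chrome(options=options)
try:
    # Placeholder URL for the podcast's episode/download listing.
    driver.get("https://www.bbc.co.uk/programmes/b0070lr5/episodes/downloads")
    # Collect candidate MP3 links from anchor tags.
    links = [a.get_attribute("href")
             for a in driver.find_elements(By.TAG_NAME, "a")
             if (a.get_attribute("href") or "").endswith(".mp3")]
    print(links)
finally:
    driver.quit()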

Selenium Plugin

Airflow - Docker Container ecosystem

IMAP Plugin

The purpose of this plugin is to use the Internet Message Access Protocol (IMAP) to retrieve email messages from a given mail server.

Airflow - 'imap_default' Email WebScraper

Airflow - 'imap_default' configuration required

Creating a connection

To create an IMAP connection using the Airflow UI, open the interface, go to the Admin dropdown menu, click 'Connections', and choose 'Create' (a non-UI alternative is sketched after the list). The connection takes the following form:

  • Conn Id: Your connection id
  • Host: The IMAP server URL
  • Login: Your email address
  • Password: Your email password; depending on your mail server, this may not be the same as your everyday email password
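Airflow can also pick up a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> in URI form; a minimal sketch with placeholder credentials (note the URL-encoded '@' in the login):

    export AIRFLOW_CONN_IMAP_DEFAULT='imap://user%40example.com:[email protected]'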

Hooks

The hook is called IMAPHook and can be instantiated with the relevant Airflow connection id.

The hook has a series of methods to connect to a mail server, search for a specific email and download its attachments.

from airflow.hooks import IMAPHook

# Instantiate the hook with the connection id configured in the UI above.
hook = IMAPHook(imap_conn_id='imap_default')
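A hypothetical usage sketch of that pattern; the method names below (connect, find_mail, download_attachment) are illustrative assumptions, not the plugin's confirmed API — consult the plugin source for the real names:

# Hypothetical method names, shown only to illustrate the flow.
hook.connect()
mail = hook.find_mail(mailbox='mail_test',
                      search_criteria={'SUBJECT': 'daily_report'})
hook.download_attachment(mail, local_path='.')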

Operators

IMAPAttachmentOperator

This operator downloads the attachment of an email received the day before the execution date (taken from the Airflow context) and saves it to a local directory.

op = IMAPAttachmentOperator(
    imap_conn_id='imap_default',    # connection configured above
    mailbox='mail_test',            # mailbox to search
    search_criteria={"FROM": "[email protected]",
                     "SUBJECT": "daily_report"},
    local_path='',                  # directory where the attachment is saved
    file_name='',
    task_id='imap_example')

# 'yesterday_ds' mimics the value Airflow injects into the context at runtime.
op.execute(context={'yesterday_ds': '2019-08-04'})
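In a real deployment the operator is scheduled inside a DAG rather than executed by hand; a minimal sketch, with the dag id, dates, schedule, and operator import path as placeholder assumptions:

from datetime import datetime

from airflow import DAG
# Import path assumed from the plugin, mirroring the hook import above.
from airflow.operators import IMAPAttachmentOperator

# Placeholder wiring; only the operator arguments come from the example above.
with DAG(dag_id='imap_attachment_example',
         start_date=datetime(2019, 8, 1),
         schedule_interval='@daily') as dag:
    fetch_report = IMAPAttachmentOperator(
        imap_conn_id='imap_default',
        mailbox='mail_test',
        search_criteria={"FROM": "[email protected]",
                         "SUBJECT": "daily_report"},
        local_path='',
        file_name='',
        task_id='imap_example')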

Requirements

Set up your Linux machine by following the official installation guides for Docker and docker-compose; one common approach is sketched below.
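A sketch of a common install on a Debian/Ubuntu host; the pinned docker-compose version (1.29.2) is only an example, pick the release you actually need:

    # Docker via the official convenience script.
    curl -fsSL https://get.docker.com | sh
    # docker-compose binary from the GitHub releases page.
    sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" \
        -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    docker-compose --version    # verify the install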

Information in Article

Follow the source Selenium DAG article on Towards Data Science (link).
