Python tool for loading Senzing Engine from RabbitMQ, Kafka, or AWS SQS.

senzing-garage/stream-loader

stream-loader

If you are beginning your journey with Senzing, please start with Senzing Quick Start guides.

You are in the Senzing Garage where projects are "tinkered" on. Although this GitHub repository may help you understand an approach to using Senzing, it's not considered to be "production ready" and is not considered to be part of the Senzing product. Heck, it may not even be appropriate for your application of Senzing!

Synopsis

Pulls JSON records from a queue and inserts them into the Senzing Engine.

Overview

The stream-loader.py Python script consumes data from various sources (Kafka, RabbitMQ, AWS SQS) and loads it into the Senzing Engine. The senzing/stream-loader Docker image is a wrapper for use in Docker formations (e.g. docker-compose, Kubernetes).

To see all of the subcommands, run:

$ ./stream-loader.py --help
usage: stream-loader.py [-h]
                        {kafka,kafka-withinfo,rabbitmq,rabbitmq-withinfo,sleep,sqs,sqs-withinfo,url,version,docker-acceptance-test}
                        ...

Load Senzing from a stream. For more information, see
https://github.com/senzing-garage/stream-loader

positional arguments:
  {kafka,kafka-withinfo,rabbitmq,rabbitmq-withinfo,sleep,sqs,sqs-withinfo,url,version,docker-acceptance-test}
                            Subcommands (SENZING_SUBCOMMAND):
    kafka                   Read JSON Lines from Apache Kafka topic.
    kafka-withinfo          Read JSON Lines from Apache Kafka topic. Return info to a queue.
    rabbitmq                Read JSON Lines from RabbitMQ queue.
    rabbitmq-withinfo       Read JSON Lines from RabbitMQ queue. Return info to a queue.
    sleep                   Do nothing but sleep. For Docker testing.
    sqs                     Read JSON Lines from AWS SQS queue.
    sqs-withinfo            Read JSON Lines from AWS SQS queue. Return info to a queue.
    url                     Read JSON Lines from URL-addressable file.
    version                 Print version of program.
    docker-acceptance-test  For Docker acceptance testing.

optional arguments:
  -h, --help            show this help message and exit

Contents

  1. Preamble
    1. Legend
  2. Expectations
  3. Demonstrate using Command Line Interface
    1. Prerequisites for CLI
    2. Download
    3. Environment variables for CLI
    4. Run command
  4. Demonstrate using Docker
    1. Prerequisites for Docker
    2. Database support
    3. External database
    4. Run Docker container
  5. Directives
  6. Configuration
  7. License
  8. References

Preamble

At Senzing, we strive to create GitHub documentation in a "don't make me think" style. For the most part, instructions are copy and paste. Whenever thinking is needed, it's marked with a "thinker" icon 🤔. Whenever customization is needed, it's marked with a "pencil" icon ✏️. If the instructions are not clear, please let us know by opening a new Documentation issue describing where we can improve. Now on with the show...

Legend

  1. 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps there are some choices to be made. Perhaps it's an optional step.
  2. ✏️ - A "pencil" icon means that the instructions may need modification before performing.
  3. ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.

Expectations

  • Space: This repository and demonstration require 6 GB free disk space.
  • Time: Budget 40 minutes to get the demonstration up-and-running, depending on CPU and network speeds.
  • Background knowledge: This repository assumes a working knowledge of:

Demonstrate using Command Line Interface

Prerequisites for CLI

🤔 The following tasks need to be complete before proceeding. These are "one-time tasks" which may already have been completed.

  1. Install system dependencies:
    1. Use apt-based installation for Debian, Ubuntu, and others.
      1. See apt-packages.txt for list
    2. Use yum-based installation for Red Hat, CentOS, openSUSE, and others.
      1. See yum-packages.txt for list
  2. Install Python dependencies:
    1. See requirements.txt for list
      1. Installation hints
  3. The following software programs need to be installed:
    1. senzingapi
  4. 🤔 Optional: Some databases need additional support. For other databases, this step may be skipped.
    1. Db2: See Support Db2.
    2. MS SQL: See Support MS SQL.

Download

  1. Get a local copy of stream-loader.py. Example:

    1. ✏️ Specify where to download file. Example:

      export SENZING_DOWNLOAD_FILE=~/stream-loader.py
    2. Download file. Example:

      curl -X GET \
        --output ${SENZING_DOWNLOAD_FILE} \
        https://raw.githubusercontent.com/Senzing/stream-loader/main/stream-loader.py
    3. Make file executable. Example:

      chmod +x ${SENZING_DOWNLOAD_FILE}
  2. 🤔 Alternative: The entire git repository can be downloaded by following instructions at Clone repository
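
For environments where curl is not available, the download and chmod steps above can be approximated from Python's standard library. This is a minimal sketch, not part of the repository: the function name download_executable is hypothetical, and the executable bits are set for user, group, and other without consulting the umask, which differs slightly from chmod +x.

```python
import os
import shutil
import stat
import urllib.request


def download_executable(url, destination):
    """Download url to destination and mark the file executable.

    Hypothetical helper that mirrors the curl + chmod steps above.
    """
    destination = os.path.expanduser(destination)
    # Stream the response body to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(destination, "wb") as target:
        shutil.copyfileobj(response, target)
    # Add the executable bits on top of whatever mode the file already has.
    mode = os.stat(destination).st_mode
    os.chmod(destination, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return destination
```

A call such as download_executable(url, "~/stream-loader.py"), with url set to the address used in the curl example, would leave the script in the same place as SENZING_DOWNLOAD_FILE.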

Environment variables for CLI

  1. ✏️ Identify the Senzing g2 directory. Example:

    export SENZING_G2_DIR=/opt/senzing/g2
    1. Here's a simple test to see if SENZING_G2_DIR is correct. The following command should return file contents. Example:

      cat ${SENZING_G2_DIR}/g2BuildVersion.json
  2. Set common environment variables. Example:

    export PYTHONPATH=${SENZING_G2_DIR}/python
  3. 🤔 Set operating system specific environment variables. Choose one of the options.

    1. Option #1: For Debian, Ubuntu, and others. Example:

      export LD_LIBRARY_PATH=${SENZING_G2_DIR}/lib:${SENZING_G2_DIR}/lib/debian:$LD_LIBRARY_PATH
    2. Option #2: For Red Hat, CentOS, openSUSE, and others. Example:

      export LD_LIBRARY_PATH=${SENZING_G2_DIR}/lib:$LD_LIBRARY_PATH
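
The same checks can be scripted before launching the loader. The sketch below is illustrative only: read_build_version and library_path are hypothetical helper names, and the code assumes g2BuildVersion.json contains valid JSON. It repeats the cat test and builds an LD_LIBRARY_PATH value in the same order as the two options above.

```python
import json
import os
from pathlib import Path


def read_build_version(g2_dir):
    """Return the parsed g2BuildVersion.json from g2_dir, or None if it is missing."""
    version_file = Path(g2_dir) / "g2BuildVersion.json"
    if not version_file.is_file():
        return None
    with version_file.open(encoding="utf-8") as handle:
        return json.load(handle)


def library_path(g2_dir, debian=False, existing=""):
    """Build an LD_LIBRARY_PATH value in the same order as the options above."""
    parts = [f"{g2_dir}/lib"]
    if debian:
        # Option #1 adds the Debian-specific library directory.
        parts.append(f"{g2_dir}/lib/debian")
    if existing:
        # Keep whatever was already on the path, as $LD_LIBRARY_PATH does.
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    g2_dir = os.environ.get("SENZING_G2_DIR", "/opt/senzing/g2")
    if read_build_version(g2_dir) is None:
        print(f"SENZING_G2_DIR looks wrong: no g2BuildVersion.json in {g2_dir}")
    else:
        print(library_path(g2_dir, debian=True, existing=os.environ.get("LD_LIBRARY_PATH", "")))
```

Unlike the shell form, the sketch leaves off the trailing colon when LD_LIBRARY_PATH was previously empty.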

Run command

  1. Run the command. Example:

    ${SENZING_DOWNLOAD_FILE} --help
  2. For more examples of use, see Examples of CLI.

Demonstrate using Docker

Prerequisites for Docker

🤔 The following tasks need to be complete before proceeding. These are "one-time tasks" which may already have been completed.

  1. The following software programs need to be installed:
    1. docker
  2. Configure Senzing database using Docker

Database support

🤔 Optional: Some databases need additional support. For other databases, these steps may be skipped.

  1. Db2: See Support Db2 instructions to set SENZING_OPT_IBM_DIR_PARAMETER.
  2. MS SQL: See Support MS SQL instructions to set SENZING_OPT_MICROSOFT_DIR_PARAMETER.

External database

🤔 Optional: Use if storing data in an external database. If not specified, the internal SQLite database will be used.

  1. ✏️ Specify database. Example:

    export DATABASE_PROTOCOL=postgresql
    export DATABASE_USERNAME=postgres
    export DATABASE_PASSWORD=postgres
    export DATABASE_HOST=senzing-postgresql
    export DATABASE_PORT=5432
    export DATABASE_DATABASE=G2
  2. Construct Database URL. Example:

    export SENZING_DATABASE_URL="${DATABASE_PROTOCOL}://${DATABASE_USERNAME}:${DATABASE_PASSWORD}@${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DATABASE}"
  3. Construct parameter for docker run. Example:

    export SENZING_DATABASE_URL_PARAMETER="--env SENZING_DATABASE_URL=${SENZING_DATABASE_URL}"
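
The URL construction above can also be expressed in Python, which makes it easier to handle credentials that contain characters such as "@" or "/". This is a sketch under assumptions: build_database_url is a hypothetical helper, and percent-encoding the username and password is a general URL convention that this README does not confirm for Senzing.

```python
from urllib.parse import quote


def build_database_url(protocol, username, password, host, port, database):
    """Assemble protocol://username:password@host:port/database."""
    # Assumption: reserved characters in the credentials are percent-encoded
    # so that they cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{protocol}://{user}:{secret}@{host}:{port}/{database}"


if __name__ == "__main__":
    url = build_database_url(
        "postgresql", "postgres", "postgres", "senzing-postgresql", 5432, "G2"
    )
    # Equivalent of SENZING_DATABASE_URL_PARAMETER in the steps above.
    print(f"--env SENZING_DATABASE_URL={url}")
```

With the example values from step 1, the result matches what the shell export produces, because none of those values needs encoding.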

Run Docker container

Although the docker run command looks complex, it accounts for all of the optional variations described above. Any *_PARAMETER environment variable that is unset expands to nothing, so its line in the docker run command may be left in place or removed.

  1. ✏️ Set environment variables. Example:

    export SENZING_DATA_SOURCE=TEST
    export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
    export SENZING_KAFKA_TOPIC=senzing-kafka-topic
    export SENZING_MONITORING_PERIOD=60
    export SENZING_SUBCOMMAND=kafka
  2. Run Docker container. Example:

    sudo docker run \
      --env SENZING_DATA_SOURCE="${SENZING_DATA_SOURCE}" \
      --env SENZING_KAFKA_BOOTSTRAP_SERVER="${SENZING_KAFKA_BOOTSTRAP_SERVER}" \
      --env SENZING_KAFKA_TOPIC="${SENZING_KAFKA_TOPIC}" \
      --env SENZING_MONITORING_PERIOD="${SENZING_MONITORING_PERIOD}" \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --interactive \
      --rm \
      --tty \
      ${SENZING_DATABASE_URL_PARAMETER} \
      ${SENZING_NETWORK_PARAMETER} \
      ${SENZING_OPT_IBM_DIR_PARAMETER} \
      ${SENZING_OPT_MICROSOFT_DIR_PARAMETER} \
      ${SENZING_RUNAS_USER_PARAMETER} \
      senzing/stream-loader
  3. For more examples of use, see Examples of Docker.
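
To see why unset *_PARAMETER variables are harmless, it can help to assemble the same command as an argument list. The sketch below is illustrative and not part of the repository: build_docker_command is a hypothetical helper, sudo is left out, and shlex.split handles quoting slightly differently from the shell's unquoted expansion.

```python
import shlex

# Variables passed through with --env, in the order used above.
ENV_NAMES = [
    "SENZING_DATA_SOURCE",
    "SENZING_KAFKA_BOOTSTRAP_SERVER",
    "SENZING_KAFKA_TOPIC",
    "SENZING_MONITORING_PERIOD",
    "SENZING_SUBCOMMAND",
]

# Optional fragments that are spliced into the command only when set.
OPTIONAL_PARAMETERS = [
    "SENZING_DATABASE_URL_PARAMETER",
    "SENZING_NETWORK_PARAMETER",
    "SENZING_OPT_IBM_DIR_PARAMETER",
    "SENZING_OPT_MICROSOFT_DIR_PARAMETER",
    "SENZING_RUNAS_USER_PARAMETER",
]


def build_docker_command(env):
    """Return the docker run argument list for the given environment mapping."""
    command = ["docker", "run"]
    for name in ENV_NAMES:
        command += ["--env", f"{name}={env.get(name, '')}"]
    command += ["--interactive", "--rm", "--tty"]
    for name in OPTIONAL_PARAMETERS:
        # shlex.split("") returns [], so an unset parameter adds no arguments.
        command += shlex.split(env.get(name, ""))
    command.append("senzing/stream-loader")
    return command
```

Passing os.environ to build_docker_command and handing the result to subprocess.run would launch the container with whichever optional parameters happen to be set.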

Directives

The stream loader will inspect each incoming JSON message for a "senzingStreamLoader" JSON property name. The "senzingStreamLoader" property value is used to direct the actions of the stream loader. The "senzingStreamLoader" property will be removed from the JSON message before the message is sent to the Senzing Engine.

  1. The format of the "senzingStreamLoader" property value is:

    {
        "action": "<action-identifier>"
    }
  2. The supported "action-identifiers" are:

    1. addRecord
    2. addRecordWithInfo
    3. reevaluateRecord
    4. reevaluateRecordWithInfo
    5. deleteRecord
    6. deleteRecordWithInfo
  3. In a message, it looks like this example:

    {"senzingStreamLoader": {"action": "deleteRecordWithInfo"}, "DATA_SOURCE": "TEST", "RECORD_ID": "242131119", ...}
  4. If no directive exists, the action taken by the stream-loader will be addRecord or addRecordWithInfo, depending on the stream-loader.py subcommand. For subcommands, see Overview.
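
The directive handling described above can be sketched in a few lines. This is an illustration, not the code in stream-loader.py: split_directive is a hypothetical name, and raising ValueError for an unsupported action is an assumption, since this README does not say how such messages are treated.

```python
import json

DIRECTIVE_KEY = "senzingStreamLoader"

SUPPORTED_ACTIONS = {
    "addRecord",
    "addRecordWithInfo",
    "reevaluateRecord",
    "reevaluateRecordWithInfo",
    "deleteRecord",
    "deleteRecordWithInfo",
}


def split_directive(line, default_action="addRecord"):
    """Return (action, record) for one JSON line.

    The directive property is removed from the record, so the returned
    record is what would be passed on to the Senzing Engine.
    """
    record = json.loads(line)
    directive = record.pop(DIRECTIVE_KEY, None) or {}
    # Without a directive, fall back to the default for the subcommand.
    action = directive.get("action", default_action)
    if action not in SUPPORTED_ACTIONS:
        # Assumption: unsupported actions are rejected rather than ignored.
        raise ValueError(f"unsupported action: {action}")
    return action, record


if __name__ == "__main__":
    message = (
        '{"senzingStreamLoader": {"action": "deleteRecordWithInfo"}, '
        '"DATA_SOURCE": "TEST", "RECORD_ID": "242131119"}'
    )
    print(split_directive(message))
```

For a "-withinfo" subcommand, the caller would pass default_action="addRecordWithInfo" so that messages without a directive fall back to the behavior described in item 4.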

Configuration

Configuration values are specified by environment variable or command line parameter.

License

View license information for the software container in this Docker image. Note that this license does not permit further distribution.

This Docker image may also contain software from the Senzing GitHub community under the Apache License 2.0.

Further, as with all Docker images, this likely also contains other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.

References

  1. Development
  2. Errors
  3. Examples
  4. Related artifacts:
    1. DockerHub
    2. Helm Chart