Skip to content

This repository describes how to design and implement Natural Language Processing(NLP)-based service using AWS Serverless, Amazon Comprehend and AWS Cloud Development Kit(CDK)

License

Notifications You must be signed in to change notification settings

aws-samples/aws-serverless-nlp-comprehend-using-aws-cdk

AWS Serverless NLP Comprehend using AWS CDK

This repository describes how to design and implement Natural Language Processing(NLP)-based service using AWS Serverless, Amazon Comprehend and AWS Cloud Development Kit(CDK). This sample specifically illustrates a real-time user review analysis system as an example. All resources and configuration is provided through AWS CDK(typescript codes).

Solution

Amazon Comprehend provides a various solution(APIs) to analyze text within document. If we use this to build Review Analysis System, we can get very easy, fast and high-accuracy AI features. Particulary, If we provide real-time analysis, a combination of AWS CDK and AWS Serverless can make this easier. AWS Serverless can be used in a wide variety of fields from web development to data processing, and configuring and deploying these as IaC(AWS CDK) can maximize development productivity.

Amazon Comprehend

The following features in Amazon Comprehend were applied.

AWS Serverless

The following services in AWS Serverless were applied.

AWS CDK

The following AWS CDK-related open sources were applied.

Architecture

This architecture covers the following features.

  • Serverless Realtime Review API Service
  • Serverless Realtime Review Sentiment Analysis
  • Serverless Review Entity/Syntax Stream-Batch Analysis
  • Serverless Near Realtime Data Processing & Visualization
  • Serverless System Monitoring Dashboard

solution_architecture

Project Directory Structure

  • config: CDK project configuration json file for each deployment stage
  • infra: CDK typescript source codes
  • infra/app-main: CDK project main file
  • infra/stack: CDK Stack classes
  • codes/lambda: python source codes for each lambda function
  • script: utility scripts such as setup/deploy/destroy/simulation
  • script/simulation: simulation test scripts
  • test: test files such as CDK-nag

Implementation

IaC - CDK Project

All the resources described above are implemented and provided through AWS CDK ver2. Because this CDK project is built on top of AWS CDK Project Template for DevOps , please refer to that repository for details.

And other "Using AWS CDK" series can be found at:

Prerequisites - Installation

First of all, AWS Account and IAM User is required. And then the following modules must be installed.

  • AWS CLI: aws configure --profile [profile name]
  • Node.js: node --version
  • AWS CDK: cdk --version
  • jq: jq --version
  • curl: curl --version
  • python: python3 --version

Configuration

Open one of configuration json files in config directory, and update Name/Stage/Accouont/Region/Profile in Project. Accouont/Region/Profile depdends on your AWS Account, and you don't need to change Name/Stage. Additionaly, update email address in Stack/ReviewDashboard/SubscriptionEmails.

{
    "Project": {
        "Name": "ReviewService",                      <----- Optional: your project name, all stacks will be prefixed with [Project.Name+Project.Stage]
        "Stage": "Dev",                               <----- Optional: your project stage, all stacks will be prefixed with [Project.Name+Project.Stage]
        "Account": "your aws account number",         <----- Essential: update according to your AWS Account
        "Region": "your aws region name",             <----- Essential: update according to your target region
        "Profile": "your aws credential profile name" <----- Essential: AWS Profile, keep empty string if you use `default` profile
    },

    "Stack": {
        ...
        ...
        "ReviewDashboard": {
            "Name": "ReviewDashboardStack",

            "DashboardName": "ReviewDashboard",

            "SubscriptionEmails": ["your email address"], <----- Essential: Alarm notification Emails

            "ApiGatewayOverallCallThreshold": 100,        <----- Optional: Alarm Threshold for Overall Call
            "ApiGatewayError4xxCallThreshold": 20,        <----- Optional: Alarm Threshold for 4XX Error Call
            "ApiGatewayError5xxCallThreshold": 20         <----- Optional: Alarm Threshold for 5XX Error Call
        }
    }
}

In this guide, I have chosen config/app-config-dev.json file for convenience of explanation.

Setup AWS CDK Environment

Caution: This solution contains not-free tier AWS services. So be careful about the possible costs.

sh script/setup_initials.sh config/app-config-dev.json

Deploy 4 Stacks

Caution: This solution contains not-free tier AWS services. So be careful about the possible costs.

Execute this single script:

sh script/deploy_stacks.sh config/app-config-dev.json

or you can deploy manually like this:

export AWS_PROFILE=[your profile name]
export APP_CONFIG=config/app-config-dev.json

cdk list

cdk deploy *-ReviewBackendStack
cdk deploy *-ApiGatewayStack --outputs-file script/output/ApiGatewayStack.json
cdk deploy *-ReviewAnalysisStack --outputs-file script/output/ReviewAnalysisStack.json
cdk deploy *-ReviewDashboardStack

Caution: You must match this order for the first deployment. After that, these Stacks can be deployed independently in any order.

Deployment Results

This is a deployment result in CloudFormation.

deploy-cloudformation

How to simulate

Create and confirm a user in Coginto

Execute this single script. This create_user.sh script will create a new user and confirm in Cognito.

sh script/simulation/create_user.sh [aws profile name] [new user id, for example user-01] [new user pw] [cognito user pool id]

where [cognito user pool id] is OutputUserPoolId in script/output/ApiGatewayStack.json.

This is Password Policy in Cognito in api-gateway-stack.ts:

{
    requireSymbols: true,
    minLength: 8,
    requireUppercase: true,
    requireDigits: true
}

cognito-user-creation

Request a lot of reviews

Execute this single script. This request_reviews.py will log in to get Token and request POST REST API using Amazon review data - Toy.

python3 script/simulation/request_reviews.py --profile [aws profile name] --url [APIGatewaty URL + /review] --pool [cognito user pool client id] --id [new user id] --pw [new user pw]

where [APIGatewaty URL] is OutputRestApiUrl in script/output/ApiGatewayStack.json. [cognito user pool client id] is OutputUserPoolClientId in script/output/ApiGatewayStack.json.

Monitoring Dashboard

After a while, go to CloudWatch Dashboard. You can check the metrics that new data is coming in.

API Gateway Widget

dashboard-apigateway

Kinesis Widget

dashboard-kinesis

Athena Queries

Our CDK ReviewAnalysisStack deploy Workgroup and pre-defined queries in Athena. So we can easily execute those queries on demand.

Go to Qthena console, and Query editor menu, and then Saved queries. After changing Workgroup, execute the queries in order(3~7).

qthena-queries

These quries will create the following tables in Athena. We will use sentiment-table/syntax-table/entities-table in QuickSight

athena-tables

QuickSight Dashboard

Our CDK ReviewAnalysisStack just deploy QuickSight role only for QuickSight. So we have to set up QuickSight's DataSource/Analysis/Dashboard manually.

QuickSight Role setting

Go to QuickSight console, and Manage QuickSigh menu, and then Security & permissions. Please change QuickSight-managed role(default) to an existing role which CDK created in ReviewAnalysisStack for us.

where [an existing role] is OutputQuickSightRole in script/output/ReviewAnalysisStack.json

quicksight-setting

QuickSight Analysis

QuickSight Sentiment Analysis quicksight-sentiment

QuickSight Syntax Analysis quicksight-syntax

QuickSight Entities Analysis quicksight-entities

Clean up

sh script/destroy_stacks.sh config/app-config-dev.json

Caution: You must delete S3/DynamoDB manually because of removal policy.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

About

This repository describes how to design and implement Natural Language Processing(NLP)-based service using AWS Serverless, Amazon Comprehend and AWS Cloud Development Kit(CDK)

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published