A lambda to split pre-processed data into, training and validation then uploaded to an S3 bucket. Training and validation data uploaded to the bucket will be used when triggering the training job.
This repository does not create the S3 Bucket, this is created via Terraform found here terraform-aws-machine-learning-pipeline. For more details on the entire flow and how this lambda is deployed, see aws-automlops-serverless-deployment.
The diagram below demonstrates what happens when the lambda is trigger, when a new .csv
object has been uploaded to the S3 Bucket.
graph LR
S0(Start)
T1(Dataset pulled from S3 Bucket)
T2(Random split and sort using Numpy)
T3[["`70% training data
20% validation data
10% test data`"]]
T4("Upload split data into S3 Bucket as `.csv`")
T5("Start training job with training and validation data")
E0(End)
S0-->T1
T1-->T2
T2-->T3
T3-->T4
T4-->T5
T5-->E0
-
Build the docker image locally:
docker build --no-cache -t model_training:local .
-
Run the docker image built:
docker run --platform linux/amd64 -p 9000:8080 model_training:local
-
Send an event to the lambda via curl:
curl "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{<REPLACE_WITH_JSON_BELOW>}'
{ "Records": [ { "eventVersion": "2.0", "eventSource": "aws:s3", "awsRegion": "us-east-1", "eventTime": "1970-01-01T00:00:00.000Z", "eventName": "ObjectCreated:Put", "userIdentity": { "principalId": "EXAMPLE" }, "requestParameters": { "sourceIPAddress": "127.0.0.1" }, "responseElements": { "x-amz-request-id": "EXAMPLE123456789", "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH" }, "s3": { "s3SchemaVersion": "1.0", "configurationId": "testConfigRule", "bucket": { "name": "example-bucket", "ownerIdentity": { "principalId": "EXAMPLE" }, "arn": "arn:aws:s3:::example-bucket" }, "object": { "key": "data/example-bank-file.csv", "size": 515246, "eTag": "0e29c0d99c654bbe83c42097c97743ed", "sequencer": "00656A54CA3D69362D" } } } ] }
The GitHub Action "🚀 Push Docker image to AWS ECR" will check out the repository and push a docker image to the chosen AWS ECR using configure-aws-credentials action. The following repository secrets need to be set:
Secret | Description |
---|---|
AWS_REGION | The AWS Region. |
AWS_ACCOUNT_ID | The AWS account ID. |
AWS_ECR_REPOSITORY | The AWS ECR repository name. |