Terraform module for terraform-aws-emr

Datatamer/terraform-aws-emr

TAMR AWS EMR Terraform Module

This module creates the entire AWS infrastructure required for Tamr to work with AWS EMR. Currently, this module supports 3 patterns of use:

  1. Creation of infrastructure for static HBase cluster
  2. Creation of infrastructure for static Spark cluster
  3. Creation of infrastructure for ephemeral Spark cluster (the cluster itself is not created)
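
As a rough sketch of the second pattern (a static Spark cluster), a call to the root module might look like the following. The module label `emr_spark`, the Git source address, and every ID, bucket name, and file path are placeholders assumed for illustration. Only the argument names are taken from the Inputs section of this README.

```hcl
# Sketch only: all values below are hypothetical placeholders.
module "emr_spark" {
  # Assumed source address; pin a release by appending ?ref=<tag>.
  source = "git::https://github.com/Datatamer/terraform-aws-emr.git"

  # Required inputs
  applications                   = ["Spark"]
  bucket_name_for_logs           = "example-emr-logs"
  bucket_name_for_root_directory = "example-emr-root"
  emr_config_file_path           = "./emr-config.json"
  emr_managed_core_sg_ids        = ["sg-0123456789abcdef0"]
  emr_managed_master_sg_ids      = ["sg-0123456789abcdef1"]
  emr_service_access_sg_ids      = ["sg-0123456789abcdef2"]
  key_pair_name                  = "example-key-pair"
  subnet_id                      = "subnet-0123456789abcdef0"
  vpc_id                         = "vpc-0123456789abcdef0"

  # Optional: defaults to true, set here only to make the pattern explicit.
  create_static_cluster = true
  cluster_name          = "example-spark-cluster"
}
```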

Examples

Minimal

Fully working examples are available for each pattern of use. Running them might require additional resources.

Invokes the root module:

Invokes submodules:

Resources Created

This module creates:

  • 5 Security Groups
    • One security group for EMR Managed Master instance(s)
    • One security group for EMR Managed Core instance(s)
    • One security group for additional ports for Master instance(s)
    • One security group for additional ports for Core instance(s)
    • One service access security group that can be attached to any instance
  • Security group rules. The number of security group rules varies with the number of CIDRs or source SGs provided.
  • 2 IAM Policies:
    • Minimum required EMR service policy
    • Minimum required EMR EC2 policy
  • 2 IAM roles:
    • Tamr EMR service IAM role
    • Tamr EMR EC2 IAM role
  • 1 IAM instance profile for EMR EC2 instances
  • 1 bucket object with the cluster's startup script

If you are creating a static HBase or Spark cluster, this module also creates:

  • 1 EMR Cluster and associated EMR Security Configuration
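
Whether the cluster itself is created is controlled by the `create_static_cluster` input. A minimal sketch of the ephemeral Spark pattern, assuming the same placeholder conventions (hypothetical module label, source address, IDs, and names), sets it to `false` so that only the supporting infrastructure is created:

```hcl
# Sketch only: supporting infrastructure for an ephemeral Spark cluster.
# All values are hypothetical placeholders.
module "emr_ephemeral_spark" {
  # Assumed source address; pin a release by appending ?ref=<tag>.
  source = "git::https://github.com/Datatamer/terraform-aws-emr.git"

  # Create the security groups, IAM resources, and bucket object,
  # but not the EMR cluster or its security configuration.
  create_static_cluster = false

  # Required inputs
  applications                   = ["Spark"]
  bucket_name_for_logs           = "example-emr-logs"
  bucket_name_for_root_directory = "example-emr-root"
  emr_config_file_path           = "./emr-config.json"
  emr_managed_core_sg_ids        = ["sg-0123456789abcdef0"]
  emr_managed_master_sg_ids      = ["sg-0123456789abcdef1"]
  emr_service_access_sg_ids      = ["sg-0123456789abcdef2"]
  key_pair_name                  = "example-key-pair"
  subnet_id                      = "subnet-0123456789abcdef0"
  vpc_id                         = "vpc-0123456789abcdef0"
}
```

With this setting, the `tamr_emr_cluster_id` output is documented to be an empty string.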

Note: For creating the logs and root directory buckets and/or S3-related permissions, use the terraform-aws-s3 module.

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13 |
| aws | >= 3.36, !=4.0.0, !=4.1.0, !=4.2.0, !=4.3.0, !=4.4.0, !=4.5.0, !=4.6.0, !=4.7.0, !=4.8.0 |
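
A calling configuration could mirror these constraints in its own `terraform` block, as in the sketch below. The provider source `hashicorp/aws` is an assumption (it is the usual registry address for the AWS provider); the version strings are copied from the requirements above.

```hcl
terraform {
  # Matches the module's minimum Terraform version.
  required_version = ">= 0.13"

  required_providers {
    aws = {
      # Assumed registry address for the AWS provider.
      source = "hashicorp/aws"
      # Same constraint as the module: 3.36 or later, excluding 4.0.0 through 4.8.0.
      version = ">= 3.36, != 4.0.0, != 4.1.0, != 4.2.0, != 4.3.0, != 4.4.0, != 4.5.0, != 4.6.0, != 4.7.0, != 4.8.0"
    }
  }
}
```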

Providers

No provider.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| applications | List of applications to run on EMR | `list(string)` | n/a | yes |
| bucket_name_for_logs | S3 bucket name for cluster logs. | `string` | n/a | yes |
| bucket_name_for_root_directory | S3 bucket name for storing root directory | `string` | n/a | yes |
| emr_config_file_path | Path to the EMR JSON configuration file. Please include the file name as well. | `string` | n/a | yes |
| emr_managed_core_sg_ids | List of EMR managed core security group ids | `list(string)` | n/a | yes |
| emr_managed_master_sg_ids | List of EMR managed master security group ids | `list(string)` | n/a | yes |
| emr_service_access_sg_ids | List of EMR service access security group ids | `list(string)` | n/a | yes |
| key_pair_name | Name of the Key Pair that will be attached to the EC2 instances | `string` | n/a | yes |
| subnet_id | ID of the subnet where the EMR cluster will be created | `string` | n/a | yes |
| vpc_id | VPC ID of the network | `string` | n/a | yes |
| abac_valid_tags | Valid tags for maintaining resources when using ABAC IAM Policies with Tag Conditions. Make sure tags contain a key value specified here. | `map(list(string))` | `{}` | no |
| additional_policy_arns | List of policy ARNs to attach to EMR EC2 instance profile. | `list(string)` | `[]` | no |
| additional_tags | [DEPRECATED: Use tags instead] Additional tags to be attached to the resources created. | `map(string)` | `{}` | no |
| arn_partition | The partition in which the resource is located. A partition is a group of AWS Regions. Each AWS account is scoped to one partition. The following are the supported partitions: aws - AWS Regions; aws-cn - China Regions; aws-us-gov - AWS GovCloud (US) Regions | `string` | `"aws"` | no |
| bootstrap_actions | Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. | `list(object({ name = string, path = string, args = list(string) }))` | `[]` | no |
| bucket_path_to_logs | Path in logs bucket to store cluster logs e.g. mycluster/logs | `string` | `""` | no |
| cluster_name | Name for the EMR cluster to be created | `string` | `"TAMR-EMR-Cluster"` | no |
| core_bid_price | Bid price for each EC2 instance in the core instance group, expressed in USD. Setting this attribute declares the instance group as a Spot Instance group and implicitly creates a Spot request. Leave this blank to use On-Demand Instances. | `string` | `""` | no |
| core_bid_price_as_percentage_of_on_demand_price | Bid price as percentage of on-demand price for core instances | `number` | `100` | no |
| core_block_duration_minutes | Duration for core spot instances, in minutes | `number` | `0` | no |
| core_ebs_size | The volume size, in gibibytes (GiB). | `string` | `"500"` | no |
| core_ebs_type | Type of volumes to attach to the core nodes. Valid options are gp2, io1, standard and st1 | `string` | `"gp2"` | no |
| core_ebs_volumes_count | Number of volumes to attach to the core nodes | `number` | `1` | no |
| core_instance_fleet_name | Name for the core instance fleet | `string` | `"CoreInstanceFleet"` | no |
| core_instance_on_demand_count | Number of on-demand instances for the core instance fleet | `number` | `1` | no |
| core_instance_spot_count | Number of spot instances for the core instance fleet | `number` | `0` | no |
| core_instance_type | The EC2 instance type of the core nodes | `string` | `"m4.xlarge"` | no |
| core_timeout_action | Timeout action for core instances | `string` | `"SWITCH_TO_ON_DEMAND"` | no |
| core_timeout_duration_minutes | Spot provisioning timeout for core instances, in minutes | `number` | `10` | no |
| create_static_cluster | True if the module should create a static cluster. False if the module should create supporting infrastructure but not the cluster itself. | `bool` | `true` | no |
| custom_ami_id | The ID of a custom Amazon EBS-backed Linux AMI | `string` | `null` | no |
| emr_ec2_instance_profile_name | Name of the new instance profile for EMR EC2 instances | `string` | `"tamr_emr_ec2_instance_profile"` | no |
| emr_ec2_role_name | Name of the new IAM role for EMR EC2 instances | `string` | `"tamr_emr_ec2_role"` | no |
| emr_managed_sg_name | Name for the EMR managed security group | `string` | `"TAMR-EMR-Internal"` | no |
| emr_service_iam_policy_name | Name for the IAM policy attached to the EMR Service role | `string` | `"tamr-emr-service-policy"` | no |
| emr_service_role_name | Name of the new IAM service role for the EMR cluster | `string` | `"tamr_emr_service_role"` | no |
| hadoop_config_path | Path in root directory bucket to upload Hadoop config to | `string` | `"config/hadoop/conf/"` | no |
| hbase_config_path | Path in root directory bucket to upload HBase config to | `string` | `"config/hbase/conf.dist/"` | no |
| master_bid_price | Bid price for each EC2 instance in the master instance group, expressed in USD. Setting this attribute declares the instance group as a Spot Instance group and implicitly creates a Spot request. Leave this blank to use On-Demand Instances. | `string` | `""` | no |
| master_bid_price_as_percentage_of_on_demand_price | Bid price as percentage of on-demand price for master instances | `number` | `100` | no |
| master_block_duration_minutes | Duration for master spot instances, in minutes | `number` | `0` | no |
| master_ebs_size | The volume size, in gibibytes (GiB). | `string` | `"100"` | no |
| master_ebs_type | Type of volumes to attach to the master nodes. Valid options are gp2, io1, standard and st1 | `string` | `"gp2"` | no |
| master_ebs_volumes_count | Number of volumes to attach to the master nodes | `number` | `1` | no |
| master_instance_fleet_name | Name for the master instance fleet | `string` | `"MasterInstanceFleet"` | no |
| master_instance_on_demand_count | Number of on-demand instances for the master instance fleet | `number` | `1` | no |
| master_instance_spot_count | Number of spot instances for the master instance fleet | `number` | `0` | no |
| master_instance_type | The EC2 instance type of the master nodes | `string` | `"m4.xlarge"` | no |
| master_timeout_action | Timeout action for master instances | `string` | `"SWITCH_TO_ON_DEMAND"` | no |
| master_timeout_duration_minutes | Spot provisioning timeout for master instances, in minutes | `number` | `10` | no |
| permissions_boundary | ARN of the policy that will be used to set the permissions boundary for all IAM Roles created by this module | `string` | `null` | no |
| release_label | The release label for the Amazon EMR release. | `string` | `"emr-5.29.0"` | no |
| require_abac_for_subnet | If abac_valid_tags is specified, choose whether or not to require ABAC also for actions related to the subnet | `bool` | `true` | no |
| s3_policy_arns | [DEPRECATED] List of policy ARNs to attach to EMR EC2 instance profile. Use 'additional_policy_arns' instead. | `list(string)` | `[]` | no |
| security_configuration | The name of an EMR Security Configuration | `string` | `null` | no |
| tags | A map of tags to add to all resources. Replaces additional_tags. | `map(string)` | `{}` | no |
| utility_script_bucket_key | Key (i.e. path) to upload the utility script to | `string` | `"util/upload_hbase_config.sh"` | no |
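
To illustrate how some of the optional inputs fit together, the sketch below adds spot capacity to the core fleet, one bootstrap action, and tags. The module label, source address, script location, tag values, and all IDs are hypothetical. The argument names and the shape of `bootstrap_actions` follow the table above.

```hcl
# Sketch only: all values are hypothetical placeholders.
module "emr_with_spot" {
  # Assumed source address; pin a release by appending ?ref=<tag>.
  source = "git::https://github.com/Datatamer/terraform-aws-emr.git"

  # Required inputs
  applications                   = ["Spark"]
  bucket_name_for_logs           = "example-emr-logs"
  bucket_name_for_root_directory = "example-emr-root"
  emr_config_file_path           = "./emr-config.json"
  emr_managed_core_sg_ids        = ["sg-0123456789abcdef0"]
  emr_managed_master_sg_ids      = ["sg-0123456789abcdef1"]
  emr_service_access_sg_ids      = ["sg-0123456789abcdef2"]
  key_pair_name                  = "example-key-pair"
  subnet_id                      = "subnet-0123456789abcdef0"
  vpc_id                         = "vpc-0123456789abcdef0"

  # Core fleet: one on-demand instance plus two spot instances,
  # bidding up to 80% of the on-demand price.
  core_instance_on_demand_count                   = 1
  core_instance_spot_count                        = 2
  core_bid_price_as_percentage_of_on_demand_price = 80
  core_timeout_duration_minutes                   = 15
  core_timeout_action                             = "SWITCH_TO_ON_DEMAND"

  # Each bootstrap action is an object with name, path, and args.
  bootstrap_actions = [
    {
      name = "install-dependencies"
      path = "s3://example-emr-root/bootstrap/install-deps.sh"
      args = ["--verbose"]
    }
  ]

  # Preferred over the deprecated additional_tags input.
  tags = {
    Environment = "dev"
    Owner       = "data-platform"
  }
}
```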

Outputs

| Name | Description |
|------|-------------|
| core_ebs_size | The core EBS volume size, in gibibytes (GiB). |
| core_ebs_type | The core EBS volume type. |
| core_ebs_volumes_count | Number of volumes to attach to the core nodes |
| core_fleet_instance_count | Number of on-demand and spot core instances configured |
| core_instance_type | The EC2 instance type of the core nodes |
| emr_configuration_json | EMR cluster configuration in JSON format |
| emr_ec2_instance_profile_arn | ARN of the EMR EC2 instance profile created |
| emr_ec2_instance_profile_name | Name of the EMR EC2 instance profile created |
| emr_ec2_role_arn | ARN of the EMR EC2 role created for EC2 instances |
| emr_managed_core_sg_ids | List of security group ids of the EMR Core Security Group |
| emr_managed_master_sg_ids | List of security group ids of the EMR Master Security Group |
| emr_managed_sg_id | Security group id of the EMR Managed Security Group for internal communication |
| emr_service_access_sg_ids | List of security group ids of the EMR Service Access Security Group |
| emr_service_role_arn | ARN of the EMR service role created |
| emr_service_role_name | Name of the EMR service role created |
| hbase_config_path | Path in the root directory bucket that HBase config was uploaded to. |
| log_uri | The path to the S3 location where logs for this cluster are stored. |
| master_ebs_size | The master EBS volume size, in gibibytes (GiB). |
| master_ebs_type | Type of volumes to attach to the master nodes. Valid options are gp2, io1, standard and st1 |
| master_ebs_volumes_count | Number of volumes to attach to the master nodes |
| master_fleet_instance_count | Number of on-demand and spot master instances configured |
| master_instance_type | The EC2 instance type of the master nodes |
| release_label | The release label for the Amazon EMR release. |
| subnet_id | ID of the subnet where EMR cluster was created |
| tamr_emr_cluster_id | Identifier for the AWS EMR cluster created. Empty string if only the infrastructure for an ephemeral cluster was set up. |
| tamr_emr_cluster_name | Name of the AWS EMR cluster created |
| upload_config_script_s3_key | The name of the upload config script object in the bucket. |
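
Module outputs can be re-exported or passed to other resources in the usual Terraform way. The fragment below is a sketch that assumes the calling configuration already contains a module block labeled `emr_spark` (a hypothetical label) that invokes this module. The output names on the right-hand side come from the table above.

```hcl
# Assumes a module block labeled "emr_spark" exists in the same configuration.
output "emr_cluster_id" {
  description = "ID of the EMR cluster (empty string for the ephemeral pattern)"
  value       = module.emr_spark.tamr_emr_cluster_id
}

output "emr_log_uri" {
  description = "S3 location where the cluster logs are stored"
  value       = module.emr_spark.log_uri
}

output "emr_ec2_instance_profile_arn" {
  description = "ARN of the instance profile attached to EMR EC2 instances"
  value       = module.emr_spark.emr_ec2_instance_profile_arn
}
```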

References

This repo is based on:

Development

Generating Docs

Run make terraform/docs to generate the section of docs around terraform inputs, outputs and requirements.

Checkstyles

Run make lint. This runs terraform fmt along with a few other checks that detect whitespace issues. NOTE: this requires Docker to be working on the machine running the checks.

Releasing new versions

  • Update version contained in VERSION
  • Document changes in CHANGELOG.md
  • Create a tag in GitHub for the commit associated with the version

License

Apache 2 Licensed. See LICENSE for full details.