From e3916a76b47ea2afc2cc5b3dfae8b0e0bffd5dd7 Mon Sep 17 00:00:00 2001
From: Aditya Choudhari <48932219+adityachoudhari26@users.noreply.github.com>
Date: Mon, 4 Mar 2024 14:41:15 -0800
Subject: [PATCH] feat: Add operator helm release (#98)

* add new infra alongside existing infra
* terraform-docs: automated action
* weave install false while rwo being implemented
* add boolean flag
* terraform-docs: automated action
* better var name
* terraform-docs: automated action
* weave install true test
* add wandb_replicas logic for gke app
* terraform-docs: automated action
* add license and extraenvs
* correct caCertPath for redis
* minimal spec
* re-add bucket, mysql, and redis
* add other wandb env
* add all but extra envs
* add extraEnv
* unneeded comma
* redis cert path follows helm charts
* add depends on
* correct controller image tag
* add envs
* rebase
* chore(release): version 1.23.0 [skip ci]

## [1.23.0](https://github.com/wandb/terraform-google-wandb/compare/v1.22.0...v1.23.0) (2024-02-21)

### Features

* Add support for t-shirt-sized deployments ([#91](https://github.com/wandb/terraform-google-wandb/issues/91)) ([5432961](https://github.com/wandb/terraform-google-wandb/commit/5432961f6688a5eed5a646d7ab772f28844d4bf7)), closes [#92](https://github.com/wandb/terraform-google-wandb/issues/92)

* fix: Backwards compatibility for t-shirt-sized deployments (#101)

* fix: Backwards compatibility for t-shirt-sized deployments
* empty
* empty
* chore(release): version 1.23.1 [skip ci]

### [1.23.1](https://github.com/wandb/terraform-google-wandb/compare/v1.23.0...v1.23.1) (2024-02-21)

### Bug Fixes

* Backwards compatibility for t-shirt-sized deployments ([#101](https://github.com/wandb/terraform-google-wandb/issues/101)) ([f812f81](https://github.com/wandb/terraform-google-wandb/commit/f812f810ec6addd3f8a18fe114d320245d64c9da))

* fix: Backwards compatibility fix to avoid changes in nodegroups. (#102)

* fix: backwards compatibility fix to avoid changes in nodegroups.
* terraform-docs: automated action
* update example

---------

Co-authored-by: github-actions[bot]
Co-authored-by: George Scott

* chore(release): version 1.23.2 [skip ci]

### [1.23.2](https://github.com/wandb/terraform-google-wandb/compare/v1.23.1...v1.23.2) (2024-02-22)

### Bug Fixes

* Backwards compatibility fix to avoid changes in nodegroups. ([#102](https://github.com/wandb/terraform-google-wandb/issues/102)) ([c331853](https://github.com/wandb/terraform-google-wandb/commit/c3318536187b9cd17d9371c64b602e3aa8f5c399))

* rebase
* rebase
* rebase
* terraform-docs: automated action
* pull out ssl certificate id
* specify which https for putput
* remove issuer create tag from ingress
* add inverse gorilla glue logic

---------

Co-authored-by: github-actions[bot]
Co-authored-by: semantic-release-bot
Co-authored-by: Yogesh Garg
Co-authored-by: George Scott
---
 README.md                                  |  6 +-
 examples/public-dns-with-cloud-dns/main.tf |  9 +++
 main.tf                                    | 94 ++++++++++++++++++++--
 modules/app_lb/https/outputs.tf            |  3 +
 modules/app_lb/main.tf                     |  4 +
 modules/app_lb/outputs.tf                  | 14 +++-
 modules/redis/outputs.tf                   |  7 ++
 modules/storage/bucket/outputs.tf          |  4 +
 outputs.tf                                 |  2 +-
 variables.tf                               |  6 ++
 versions.tf                                |  6 +-
 11 files changed, 143 insertions(+), 12 deletions(-)
 create mode 100644 modules/app_lb/https/outputs.tf

diff --git a/README.md b/README.md
index 3b9abc4..1a53b54 100644
--- a/README.md
+++ b/README.md
@@ -62,6 +62,7 @@ resources that lack official modules.
 |------|---------|
 | [terraform](#requirement\_terraform) | ~> 1.0 |
 | [google](#requirement\_google) | ~> 4.82 |
+| [helm](#requirement\_helm) | ~> 2.10 |
 | [kubernetes](#requirement\_kubernetes) | ~> 2.23 |
 
 ## Providers
@@ -75,13 +76,14 @@ No providers.
 | [app\_gke](#module\_app\_gke) | ./modules/app_gke | n/a |
 | [app\_lb](#module\_app\_lb) | ./modules/app_lb | n/a |
 | [database](#module\_database) | ./modules/database | n/a |
-| [gke\_app](#module\_gke\_app) | wandb/wandb/kubernetes | 1.13.0 |
+| [gke\_app](#module\_gke\_app) | wandb/wandb/kubernetes | 1.14.1 |
 | [kms](#module\_kms) | ./modules/kms | n/a |
 | [networking](#module\_networking) | ./modules/networking | n/a |
 | [project\_factory\_project\_services](#module\_project\_factory\_project\_services) | terraform-google-modules/project-factory/google//modules/project_services | ~> 13.0 |
 | [redis](#module\_redis) | ./modules/redis | n/a |
 | [service\_accounts](#module\_service\_accounts) | ./modules/service_accounts | n/a |
 | [storage](#module\_storage) | ./modules/storage | n/a |
+| [wandb](#module\_wandb) | wandb/wandb/helm | 1.2.0 |
 
 ## Resources
 
@@ -100,6 +102,7 @@ No resources.
 | [deletion\_protection](#input\_deletion\_protection) | If the instance should have deletion protection enabled. The database / Bucket can't be deleted when this value is set to `true`. | `bool` | `true` | no |
 | [disable\_code\_saving](#input\_disable\_code\_saving) | Boolean indicating if code saving is disabled | `bool` | `false` | no |
 | [domain\_name](#input\_domain\_name) | Domain for accessing the Weights & Biases UI. | `string` | `null` | no |
+| [enable\_operator](#input\_enable\_operator) | Boolean indicating if the new operator should be enabled | `bool` | `false` | no |
 | [force\_ssl](#input\_force\_ssl) | Enforce SSL through the usage of the Cloud SQL Proxy (cloudsql://) in the DB connection string | `bool` | `false` | no |
 | [gke\_machine\_type](#input\_gke\_machine\_type) | Specifies the machine type to be allocated for the database | `string` | `"n1-standard-4"` | no |
 | [gke\_node\_count](#input\_gke\_node\_count) | n/a | `number` | `2` | no |
@@ -149,4 +152,3 @@ No resources.
 | [standardized\_size](#output\_standardized\_size) | n/a |
 | [url](#output\_url) | The URL to the W&B application |
 
-
diff --git a/examples/public-dns-with-cloud-dns/main.tf b/examples/public-dns-with-cloud-dns/main.tf
index bf2f036..9997d09 100644
--- a/examples/public-dns-with-cloud-dns/main.tf
+++ b/examples/public-dns-with-cloud-dns/main.tf
@@ -18,6 +18,14 @@ provider "kubernetes" {
   token                  = data.google_client_config.current.access_token
 }
 
+provider "helm" {
+  kubernetes {
+    host                   = "https://${module.wandb.cluster_endpoint}"
+    cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
+    token                  = data.google_client_config.current.access_token
+  }
+}
+
 # Spin up all required services
 module "wandb" {
   source = "../../"
@@ -33,6 +41,7 @@ module "wandb" {
 
   wandb_version = var.wandb_version
   wandb_image   = var.wandb_image
+  create_redis  = var.create_redis
 
   use_internal_queue = true
   force_ssl          = var.force_ssl
diff --git a/main.tf b/main.tf
index 1a7da6d..41b764e 100644
--- a/main.tf
+++ b/main.tf
@@ -116,9 +116,9 @@ module "database" {
 }
 
 module "redis" {
-  count             = var.create_redis ? 1 : 0
-  source            = "./modules/redis"
-  namespace         = var.namespace
+  count     = var.create_redis ? 1 : 0
+  source    = "./modules/redis"
+  namespace = var.namespace
   ### here we set the default to 6gb, which is = setting for "small" standard size
   memory_size_gb    = coalesce(try(local.deployment_size[var.size].cache, 6))
   network           = local.network
@@ -138,7 +138,7 @@ locals {
 
 module "gke_app" {
   source  = "wandb/wandb/kubernetes"
-  version = "1.13.0"
+  version = "1.14.1"
 
   license = var.license
 
@@ -156,11 +156,13 @@ module "gke_app" {
   local_restore = var.local_restore
   other_wandb_env = merge({
     "GORILLA_DISABLE_CODE_SAVING"          = var.disable_code_saving,
-    "GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = local.secret_store_source
+    "GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = local.secret_store_source,
+    "GORILLA_GLUE_LIST"                    = var.enable_operator
   }, var.other_wandb_env)
 
-  wandb_image   = var.wandb_image
-  wandb_version = var.wandb_version
+  wandb_image    = var.wandb_image
+  wandb_version  = var.wandb_version
+  wandb_replicas = var.enable_operator ? 0 : 1
 
   resource_limits   = var.resource_limits
   resource_requests = var.resource_requests
@@ -174,3 +176,81 @@ module "gke_app" {
     module.app_gke
   ]
 }
+
+module "wandb" {
+  source  = "wandb/wandb/helm"
+  version = "1.2.0"
+
+  spec = {
+    values = {
+      global = {
+        host    = local.url
+        license = var.license
+
+        extraEnv = merge({
+          "GORILLA_DISABLE_CODE_SAVING"          = var.disable_code_saving,
+          "GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = local.secret_store_source,
+          "TAG_CUSTOMER_NS"                      = var.namespace
+          "OIDC_ISSUER"                          = var.oidc_issuer
+          "OIDC_CLIENT_ID"                       = var.oidc_client_id
+          "OIDC_AUTH_METHOD"                     = var.oidc_auth_method
+        }, var.other_wandb_env)
+
+        bucket = {
+          provider = "gcs"
+          name     = local.bucket
+        }
+
+        mysql = {
+          name     = module.database.database_name
+          user     = module.database.username
+          password = module.database.password
+          database = module.database.database_name
+          host     = module.database.private_ip_address
+          port     = 3306
+        }
+
+        redis = var.create_redis ? {
+          password = module.redis.0.auth_string
+          host     = module.redis.0.host
+          port     = module.redis.0.port
+          caCert   = module.redis.0.ca_cert
+          params = {
+            tls          = true
+            ttlInSeconds = 604800
+            caCertPath   = "/etc/ssl/certs/redis_ca.pem"
+          }
+        } : null
+      }
+
+      app = {
+        extraEnvs = {
+          "GORILLA_GLUE_LIST" = !var.enable_operator
+        }
+      }
+
+      ingress = {
+        annotations = {
+          "kubernetes.io/ingress.class"                 = "gce"
+          "kubernetes.io/ingress.global-static-ip-name" = module.app_lb.address_operator_name
+          "ingress.gcp.kubernetes.io/pre-shared-cert"   = module.app_lb.certificate
+        }
+      }
+
+      redis = { install = false }
+      mysql = { install = false }
+      # weave = { install = false }
+    }
+  }
+
+  operator_chart_version = "1.1.0"
+  controller_image_tag   = "1.10.1"
+
+  # Added `depends_on` to ensure old infrastructure is provisioned first. This addresses a critical scheduling challenge
+  # where the Datadog DaemonSet could fail to provision due to CPU constraints. Ensuring the old infrastructure has priority
+  # mitigates the risk of "insufficient CPU" errors by facilitating controlled pod scheduling across nodes.
+  # TODO: Remove `depends_on` for phase 3
+  depends_on = [
+    module.gke_app
+  ]
+}
diff --git a/modules/app_lb/https/outputs.tf b/modules/app_lb/https/outputs.tf
new file mode 100644
index 0000000..9fd7a71
--- /dev/null
+++ b/modules/app_lb/https/outputs.tf
@@ -0,0 +1,3 @@
+output "certificate" {
+  value = google_compute_managed_ssl_certificate.default.name
+}
diff --git a/modules/app_lb/main.tf b/modules/app_lb/main.tf
index fd8f9ec..ad17e84 100644
--- a/modules/app_lb/main.tf
+++ b/modules/app_lb/main.tf
@@ -2,6 +2,10 @@ resource "google_compute_global_address" "default" {
   name = "${var.namespace}-address"
 }
 
+resource "google_compute_global_address" "operator" {
+  name = "${var.namespace}-operator-address"
+}
+
 # Create a URL map that points to the GKE service
 module "url_map" {
   source = "./url_map"
diff --git a/modules/app_lb/outputs.tf b/modules/app_lb/outputs.tf
index 3a022d6..4a36124 100644
--- a/modules/app_lb/outputs.tf
+++ b/modules/app_lb/outputs.tf
@@ -1,3 +1,15 @@
 output "address" {
   value = google_compute_global_address.default.address
-}
\ No newline at end of file
+}
+
+output "address_operator" {
+  value = google_compute_global_address.operator.address
+}
+
+output "address_operator_name" {
+  value = google_compute_global_address.operator.name
+}
+
+output "certificate" {
+  value = module.https[0].certificate
+}
diff --git a/modules/redis/outputs.tf b/modules/redis/outputs.tf
index a2706ec..0ed3381 100644
--- a/modules/redis/outputs.tf
+++ b/modules/redis/outputs.tf
@@ -10,3 +10,10 @@ output "auth_string" {
   value = google_redis_instance.default.auth_string
 }
 
+output "host" {
+  value = google_redis_instance.default.host
+}
+
+output "port" {
+  value = google_redis_instance.default.port
+}
diff --git a/modules/storage/bucket/outputs.tf b/modules/storage/bucket/outputs.tf
index 268e328..2840b35 100644
--- a/modules/storage/bucket/outputs.tf
+++ b/modules/storage/bucket/outputs.tf
@@ -1,3 +1,7 @@
 output "bucket_name" {
   value = google_storage_bucket.file_storage.name
 }
+
+output "bucket_region" {
+  value = google_storage_bucket.file_storage.location
+}
diff --git a/outputs.tf b/outputs.tf
index 6b39310..e5f836b 100644
--- a/outputs.tf
+++ b/outputs.tf
@@ -1,5 +1,5 @@
 output "address" {
-  value = module.app_lb.address
+  value = var.enable_operator ? module.app_lb.address_operator : module.app_lb.address
 }
 output "bucket_name" {
   value = local.bucket
diff --git a/variables.tf b/variables.tf
index 8d9a935..dca4e82 100644
--- a/variables.tf
+++ b/variables.tf
@@ -235,3 +235,9 @@ variable "size" {
   type    = string
   default = null
 }
+
+variable "enable_operator" {
+  type        = bool
+  description = "Boolean indicating if the new operator should be enabled"
+  default     = false
+}
diff --git a/versions.tf b/versions.tf
index 17f616e..860ec1e 100644
--- a/versions.tf
+++ b/versions.tf
@@ -9,5 +9,9 @@ terraform {
       source  = "hashicorp/kubernetes"
       version = "~> 2.23"
     }
+    helm = {
+      source  = "hashicorp/helm"
+      version = "~> 2.10"
+    }
   }
-}
\ No newline at end of file
+}