feat: Add operator helm release (#98)
* add new infra alongside existing infra

* terraform-docs: automated action

* weave install false while rwo being implemented

* add boolean flag

* terraform-docs: automated action

* better var name

* terraform-docs: automated action

* weave install true test

* add wandb_replicas logic for gke app

* terraform-docs: automated action

* add license and extraenvs

* correct caCertPath for redis

* minimal spec

* re-add bucket, mysql, and redis

* add other wandb env

* add all but extra envs

* add extraEnv

* unneeded comma

* redis cert path follows helm charts

* add depends on

* correct controller image tag

* add envs

* rebase

* chore(release): version 1.23.0 [skip ci]

## [1.23.0](v1.22.0...v1.23.0) (2024-02-21)

### Features

* Add support for t-shirt-sized deployments ([#91](#91)) ([5432961](5432961)), closes [#92](#92)

* fix: Backwards compatibility for t-shirt-sized deployments (#101)

* fix: Backwards compatibility for t-shirt-sized deployments

* empty

* empty

* chore(release): version 1.23.1 [skip ci]

### [1.23.1](v1.23.0...v1.23.1) (2024-02-21)

### Bug Fixes

* Backwards compatibility for t-shirt-sized deployments ([#101](#101)) ([f812f81](f812f81))

* fix: Backwards compatibility fix to avoid changes in nodegroups. (#102)

* fix: backwards compatibility fix to avoid changes in nodegroups.

* terraform-docs: automated action

* update example

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: George Scott <[email protected]>

* chore(release): version 1.23.2 [skip ci]

### [1.23.2](v1.23.1...v1.23.2) (2024-02-22)

### Bug Fixes

* Backwards compatibility fix to avoid changes in nodegroups. ([#102](#102)) ([c331853](c331853))

* rebase

* rebase

* rebase

* terraform-docs: automated action

* pull out ssl certificate id

* specify which https for output

* remove issuer create tag from ingress

* add inverse gorilla glue logic

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: semantic-release-bot <[email protected]>
Co-authored-by: Yogesh Garg <[email protected]>
Co-authored-by: George Scott <[email protected]>
5 people committed Mar 4, 2024
1 parent baa2a2a commit e3916a7
Showing 11 changed files with 143 additions and 12 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -62,6 +62,7 @@ resources that lack official modules.
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.0 |
| <a name="requirement_google"></a> [google](#requirement\_google) | ~> 4.82 |
| <a name="requirement_helm"></a> [helm](#requirement\_helm) | ~> 2.10 |
| <a name="requirement_kubernetes"></a> [kubernetes](#requirement\_kubernetes) | ~> 2.23 |

## Providers
@@ -75,13 +76,14 @@ No providers.
| <a name="module_app_gke"></a> [app\_gke](#module\_app\_gke) | ./modules/app_gke | n/a |
| <a name="module_app_lb"></a> [app\_lb](#module\_app\_lb) | ./modules/app_lb | n/a |
| <a name="module_database"></a> [database](#module\_database) | ./modules/database | n/a |
- | <a name="module_gke_app"></a> [gke\_app](#module\_gke\_app) | wandb/wandb/kubernetes | 1.13.0 |
+ | <a name="module_gke_app"></a> [gke\_app](#module\_gke\_app) | wandb/wandb/kubernetes | 1.14.1 |
| <a name="module_kms"></a> [kms](#module\_kms) | ./modules/kms | n/a |
| <a name="module_networking"></a> [networking](#module\_networking) | ./modules/networking | n/a |
| <a name="module_project_factory_project_services"></a> [project\_factory\_project\_services](#module\_project\_factory\_project\_services) | terraform-google-modules/project-factory/google//modules/project_services | ~> 13.0 |
| <a name="module_redis"></a> [redis](#module\_redis) | ./modules/redis | n/a |
| <a name="module_service_accounts"></a> [service\_accounts](#module\_service\_accounts) | ./modules/service_accounts | n/a |
| <a name="module_storage"></a> [storage](#module\_storage) | ./modules/storage | n/a |
| <a name="module_wandb"></a> [wandb](#module\_wandb) | wandb/wandb/helm | 1.2.0 |

## Resources

@@ -100,6 +102,7 @@ No resources.
| <a name="input_deletion_protection"></a> [deletion\_protection](#input\_deletion\_protection) | If the instance should have deletion protection enabled. The database / Bucket can't be deleted when this value is set to `true`. | `bool` | `true` | no |
| <a name="input_disable_code_saving"></a> [disable\_code\_saving](#input\_disable\_code\_saving) | Boolean indicating if code saving is disabled | `bool` | `false` | no |
| <a name="input_domain_name"></a> [domain\_name](#input\_domain\_name) | Domain for accessing the Weights & Biases UI. | `string` | `null` | no |
| <a name="input_enable_operator"></a> [enable\_operator](#input\_enable\_operator) | Boolean indicating if the new operator should be enabled | `bool` | `false` | no |
| <a name="input_force_ssl"></a> [force\_ssl](#input\_force\_ssl) | Enforce SSL through the usage of the Cloud SQL Proxy (cloudsql://) in the DB connection string | `bool` | `false` | no |
| <a name="input_gke_machine_type"></a> [gke\_machine\_type](#input\_gke\_machine\_type) | Specifies the machine type to be allocated for the database | `string` | `"n1-standard-4"` | no |
| <a name="input_gke_node_count"></a> [gke\_node\_count](#input\_gke\_node\_count) | n/a | `number` | `2` | no |
@@ -149,4 +152,3 @@ No resources.
| <a name="output_standardized_size"></a> [standardized\_size](#output\_standardized\_size) | n/a |
| <a name="output_url"></a> [url](#output\_url) | The URL to the W&B application |
<!-- END_TF_DOCS -->

9 changes: 9 additions & 0 deletions examples/public-dns-with-cloud-dns/main.tf
@@ -18,6 +18,14 @@ provider "kubernetes" {
token = data.google_client_config.current.access_token
}

provider "helm" {
kubernetes {
host = "https://${module.wandb.cluster_endpoint}"
cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
token = data.google_client_config.current.access_token
}
}

# Spin up all required services
module "wandb" {
source = "../../"
@@ -33,6 +41,7 @@ module "wandb" {
wandb_version = var.wandb_version
wandb_image = var.wandb_image


create_redis = var.create_redis
use_internal_queue = true
force_ssl = var.force_ssl
94 changes: 87 additions & 7 deletions main.tf
@@ -116,9 +116,9 @@ module "database" {
}

module "redis" {
- count = var.create_redis ? 1 : 0
- source = "./modules/redis"
- namespace = var.namespace
+ count = var.create_redis ? 1 : 0
+ source = "./modules/redis"
+ namespace = var.namespace
### here we set the default to 6gb, which is = setting for "small" standard size
memory_size_gb = coalesce(try(local.deployment_size[var.size].cache, 6))
network = local.network
@@ -138,7 +138,7 @@ locals {

module "gke_app" {
source = "wandb/wandb/kubernetes"
- version = "1.13.0"
+ version = "1.14.1"

license = var.license

@@ -156,11 +156,13 @@ module "gke_app" {
local_restore = var.local_restore
other_wandb_env = merge({
"GORILLA_DISABLE_CODE_SAVING" = var.disable_code_saving,
- "GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = local.secret_store_source
+ "GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = local.secret_store_source,
+ "GORILLA_GLUE_LIST" = var.enable_operator
}, var.other_wandb_env)

- wandb_image = var.wandb_image
- wandb_version = var.wandb_version
+ wandb_image = var.wandb_image
+ wandb_version = var.wandb_version
+ wandb_replicas = var.enable_operator ? 0 : 1

resource_limits = var.resource_limits
resource_requests = var.resource_requests
@@ -174,3 +176,81 @@ module "gke_app" {
module.app_gke
]
}

module "wandb" {
source = "wandb/wandb/helm"
version = "1.2.0"

spec = {
values = {
global = {
host = local.url
license = var.license

extraEnv = merge({
"GORILLA_DISABLE_CODE_SAVING" = var.disable_code_saving,
"GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = local.secret_store_source,
"TAG_CUSTOMER_NS" = var.namespace
"OIDC_ISSUER" = var.oidc_issuer
"OIDC_CLIENT_ID" = var.oidc_client_id
"OIDC_AUTH_METHOD" = var.oidc_auth_method
}, var.other_wandb_env)

bucket = {
provider = "gcs"
name = local.bucket
}

mysql = {
name = module.database.database_name
user = module.database.username
password = module.database.password
database = module.database.database_name
host = module.database.private_ip_address
port = 3306
}

redis = var.create_redis ? {
password = module.redis.0.auth_string
host = module.redis.0.host
port = module.redis.0.port
caCert = module.redis.0.ca_cert
params = {
tls = true
ttlInSeconds = 604800
caCertPath = "/etc/ssl/certs/redis_ca.pem"
}
} : null
}

app = {
extraEnvs = {
"GORILLA_GLUE_LIST" = !var.enable_operator
}
}

ingress = {
annotations = {
"kubernetes.io/ingress.class" = "gce"
"kubernetes.io/ingress.global-static-ip-name" = module.app_lb.address_operator_name
"ingress.gcp.kubernetes.io/pre-shared-cert" = module.app_lb.certificate
}
}

redis = { install = false }
mysql = { install = false }
# weave = { install = false }
}
}

operator_chart_version = "1.1.0"
controller_image_tag = "1.10.1"

# Added `depends_on` to ensure old infrastructure is provisioned first. This addresses a critical scheduling challenge
# where the Datadog DaemonSet could fail to provision due to CPU constraints. Ensuring the old infrastructure has priority
# mitigates the risk of "insufficient CPU" errors by facilitating controlled pod scheduling across nodes.
# TODO: Remove `depends_on` for phase 3
depends_on = [
module.gke_app
]
}
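
The `enable_operator` conditionals in this hunk can be read as a small truth table. The following is a minimal standalone sketch, not part of the commit: it restates those expressions as locals so both states can be inspected with `terraform console` or `terraform output`. The local and output names (`legacy_replicas`, `legacy_glue_list`, `operator_glue_list`, `rollout_state`) are hypothetical and exist only for illustration.

```hcl
# Standalone sketch: mirrors the enable_operator conditionals from main.tf.
# All local/output names below are hypothetical, chosen for illustration.

variable "enable_operator" {
  type        = bool
  description = "Boolean indicating if the new operator should be enabled"
  default     = false
}

locals {
  # module.gke_app: the legacy deployment is scaled to zero replicas
  # when the operator is enabled.
  legacy_replicas = var.enable_operator ? 0 : 1

  # GORILLA_GLUE_LIST as passed to the legacy deployment (module.gke_app).
  legacy_glue_list = var.enable_operator

  # GORILLA_GLUE_LIST as passed to the operator-managed app (module.wandb);
  # the commit message calls this the "inverse" logic.
  operator_glue_list = !var.enable_operator
}

output "rollout_state" {
  value = {
    legacy_replicas    = local.legacy_replicas
    legacy_glue_list   = local.legacy_glue_list
    operator_glue_list = local.operator_glue_list
  }
}
```

With the default `enable_operator = false`, this evaluates to one legacy replica, `legacy_glue_list = false`, and `operator_glue_list = true`; setting the variable to `true` flips all three.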
3 changes: 3 additions & 0 deletions modules/app_lb/https/outputs.tf
@@ -0,0 +1,3 @@
output "certificate" {
value = google_compute_managed_ssl_certificate.default.name
}
4 changes: 4 additions & 0 deletions modules/app_lb/main.tf
@@ -2,6 +2,10 @@ resource "google_compute_global_address" "default" {
name = "${var.namespace}-address"
}

resource "google_compute_global_address" "operator" {
name = "${var.namespace}-operator-address"
}

# Create a URL map that points to the GKE service
module "url_map" {
source = "./url_map"
14 changes: 13 additions & 1 deletion modules/app_lb/outputs.tf
@@ -1,3 +1,15 @@
output "address" {
value = google_compute_global_address.default.address
- }
+ }

output "address_operator" {
value = google_compute_global_address.operator.address
}

output "address_operator_name" {
value = google_compute_global_address.operator.name
}

output "certificate" {
value = module.https[0].certificate
}
7 changes: 7 additions & 0 deletions modules/redis/outputs.tf
@@ -10,3 +10,10 @@ output "auth_string" {
value = google_redis_instance.default.auth_string
}

output "host" {
value = google_redis_instance.default.host
}

output "port" {
value = google_redis_instance.default.port
}
4 changes: 4 additions & 0 deletions modules/storage/bucket/outputs.tf
@@ -1,3 +1,7 @@
output "bucket_name" {
value = google_storage_bucket.file_storage.name
}

output "bucket_region" {
value = google_storage_bucket.file_storage.location
}
2 changes: 1 addition & 1 deletion outputs.tf
@@ -1,5 +1,5 @@
output "address" {
- value = module.app_lb.address
+ value = var.enable_operator ? module.app_lb.address_operator : module.app_lb.address
}
output "bucket_name" {
value = local.bucket
6 changes: 6 additions & 0 deletions variables.tf
@@ -235,3 +235,9 @@ variable "size" {
type = string
default = null
}

variable "enable_operator" {
type = bool
description = "Boolean indicating if the new operator should be enabled"
default = false
}
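
For callers of the module, opting in might look like the fragment below. This is a hedged sketch rather than the maintainers' documented procedure: the `source` path is the one used by the example in this commit, the elided inputs are whatever the caller already passes, and the calling configuration is assumed to configure a `helm` provider as `examples/public-dns-with-cloud-dns/main.tf` does.

```hcl
# Caller-side sketch (assumption: an existing root configuration that
# already configures the google, kubernetes, and helm providers).

module "wandb" {
  source = "../../" # path taken from the example in this commit; adjust as needed

  # ... existing inputs stay unchanged ...

  # Opt in to the operator-managed deployment introduced by this commit.
  # The variable defaults to false, so omitting it keeps the legacy path.
  enable_operator = true
}

# With enable_operator = true, the module's `address` output resolves to
# the operator's global address instead of the legacy one.
output "address" {
  value = module.wandb.address
}
```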
6 changes: 5 additions & 1 deletion versions.tf
@@ -9,5 +9,9 @@ terraform {
source = "hashicorp/kubernetes"
version = "~> 2.23"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.10"
}
}
- }
+ }
