RFC: k8s storage #77

Open · wants to merge 18 commits into master
Conversation

taylorsilva
Member

Rendered

This is meant for the next iteration of our Kubernetes runtime work; it has nothing to do with how the Concourse Helm chart currently functions.


# Summary

After spiking on a few solutions for storage on Kubernetes, our recommendation is to use an image registry to store **cache objects** for steps.
Member

@cirocosta cirocosta Oct 1, 2020


It might be interesting to check out the approach that kaniko took: it essentially caches layers in a container registry if you don't specify a local path (so, e.g., if you want to run a Job that runs kaniko but don't want to provide it a volume, which would imply concurrency=1).

```
$ k logs -f build-v1.18.6-4tn7w  | grep cache
INFO[0008] Checking for cached layer cirocosta/envtest-cache:8642ec4096d286832017c667e68548ac74436793bd49b1cf59d6a238044f6b0a...
INFO[0008] No cached layer found for cmd RUN set -x &&          apt update -y && apt install -y rsync &&                git clone https://github.com/kubernetes/kubernetes &&                 cd ./kubernetes &&                      git checkout $KUBE_TAG &&            go mod download &&                       make generated_files
INFO[0194] Checking for cached layer cirocosta/envtest-cache:21931b04c009b9e67db0c6c8a1f907ee699dd32ad3fef2aa14efaf6b1a5dc995...
INFO[0313] Found cached layer, extracting to filesystem
INFO[0348] Checking for cached layer cirocosta/envtest-cache:b862d7fa29205a55ec028679dc2bb416b707a1a29d24feb9714075e4be5a5c74...
INFO[0435] Found cached layer, extracting to filesystem
INFO[0447] Checking for cached layer cirocosta/envtest-cache:e6c3ee315be35a8483933a79682375de049bdef0560721ce9962d728993803d0...
INFO[0464] Found cached layer, extracting to filesystem
INFO[0470] Checking for cached layer cirocosta/envtest-cache:e78396635e5a8c3a4b750b11796109b2a6a954b628c6e24a3dfbc1f579ce7024...
INFO[0471] No cached layer found for cmd RUN go install -v ./cmd/envtest
INFO[0508] Pushing layer cirocosta/envtest-cache:e78396635e5a8c3a4b750b11796109b2a6a954b628c6e24a3dfbc1f579ce7024 to cache now
```

(https://github.com/GoogleContainerTools/kaniko/blob/master/pkg/cache/cache.go)

I'm not familiar with the details of the implementation, but it might serve as some inspiration 😁


buildkit seems to allow something similar: https://github.com/moby/buildkit/blob/master/README.md#export-cache
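For illustration, both tools drive this purely through flags; a rough sketch of the invocations (the `registry.example.com` repos are placeholders, not anything from this RFC):

```sh
# kaniko: cache RUN layers in a dedicated registry repo instead of a local volume
/kaniko/executor \
  --context=git://github.com/example/repo.git \
  --destination=registry.example.com/app:latest \
  --cache=true \
  --cache-repo=registry.example.com/app/cache

# buildkit: export the build cache to a registry and import it on the next build
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --output type=image,name=registry.example.com/app:latest,push=true \
  --export-cache type=registry,ref=registry.example.com/app/cache \
  --import-cache type=registry,ref=registry.example.com/app/cache
```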


Hey @cirocosta
We can see this style of local caching being beneficial for pulling the contents of an input for a task. Is that what you meant? We weren't sure how this might be useful for pushing 🤔

Member


Oh, I didn't have any specifics in mind; it was more about showcasing the pattern of using a registry as a place where you just throw layers regardless of what you're building, and then try to reuse those layers (pulling them if they exist) when running another build.

+ would have negligible overhead for steps scheduled on the same node as no input/output stream would be required

### Cons
- Without IaaS-based persistent disks this isn't a viable solution; K8s nodes would need to have large root volumes.

@ari-becker ari-becker Oct 3, 2020


Is this actually an issue? Under Persistent Volumes you note as a con:

IaaS-based volume limits per node prevent this from being a scalable solution

Large root volumes are closer to how Concourse workers work today (i.e. closer to the implicit foundational design assumptions in the Concourse domain that this RFC attempts to deal with), and if there are scalability issues with Persistent Volumes, then doesn't this represent a simpler way (because it builds on prior baggageclaim work) to get started with building the Kubernetes runtime?


Hey @ari-becker,
Yes, this is true. It would be useful to get perspectives such as yours for us to understand what constraints or lack thereof folks are dealing with.

We are assuming that the cons:

  • resizing of nodes with larger root volumes
  • running privileged baggageclaim pods
  • being ok with baggageclaim's current lack of security (this can be enhanced)

create larger challenges for operators.

It sounds like you'd be okay with this setup?


@xtreme-sameer-vohra currently, we already run Concourse workers on a Kubernetes cluster, one where we've architected our workers such that they are restricted to their own underlying nodes (we manually scale both the underlying nodes and the worker StatefulSet in tandem) and the worker pods are privileged. So all of your cons are already present in our deployment and therefore implicitly acceptable.

Maybe you need to clarify what the goals of developing a Kubernetes runtime for Concourse are supposed to be. Is it to phase out the "worker" in favor of ATC directly scheduling Kubernetes Jobs that represent Concourse jobs or steps? Because then running privileged baggageclaim pods is fine; the worker isn't really phased out, it's just simplified down to only the baggageclaim code, and presumably we'd be able to configure ATC to create Jobs with node restrictions / affinity / taints / tolerations to get the Jobs to run on a dedicated worker instance group. Is it the ability to run on common / shared instances? Because then running privileged baggageclaim pods is probably unacceptable. Should untrusted Concourse workloads run on common / shared instances in the first place, even if they're not allowed to be privileged? Ehhh, debatable, and a lot of security experts will tell you no, particularly if the jobs are not configured with PodSecurityPolicies and the like, which in a CI / development environment rarely makes sense.

The main value that we'd get out of a Kubernetes-native runtime would be the ability to set up Cluster Autoscaler so that Kubernetes will add additional nodes (and reduce nodes) as needed according to scheduling workloads, as currently we need to scale workers by hand. Cluster Autoscaler works well with workloads that are scheduled to dedicated instance groups, adding and subtracting nodes from the dedicated instance group that will permit Jobs that are restricted to those instance groups.


Ah yes, we are working on the K8s Runtime RFC that should have preceded this one 😸
The primary objective is to leverage the scaling & operational capabilities of K8s by deploying steps as K8s pods without changing the pipeline schema; the latter would allow pipelines to be portable across runtimes. Along the same lines, we're also assessing:

  • the minimum set of permissions required on K8s to fulfill the same requirements as a current worker (container & storage management)
  • the number of dependencies (besides Postgres) for a Concourse deployment on K8s

As you mentioned, container management would be delegated to K8s. We are spiking on using IaaS PVs + propagating mounts using CSI, hoping to ultimately leverage baggageclaim. We are early in the exploration, but this path might also provide the ability to use volume location for scheduling purposes 💭

The initial draft of the proposal has a worker being mapped to a K8s cluster + namespace. "Instance groups" sounds interesting. Do you see it fitting in naturally with the cluster+namespace mapping, or would you propose something different?


@xtreme-sameer-vohra One of the things that worries me about using an external image registry is the prospect of cross-availability-zone (on AWS, $0.01/GB) and/or VPC egress traffic costs (on AWS, much higher). Currently, all of our workers are in the same availability zone (for CI workers, we're less concerned with the prospect of an AZ-wide outage and more concerned with the traffic costs) and so our traffic costs are much lower.

Effectively using an external image registry while keeping costs under control would mean a) deploying a separate image registry per availability zone alongside the workers, where workers are correctly configured to use the image registry in their availability zone, and b) having the image registry itself use a local disk / IaaS PV (so as not to incur traffic costs when storing layers in S3, for example), so really all the image registry does is consolidate storage usage in one place. This eventually leads to cost savings compared to using a large local disk / IaaS PV per worker, with the tradeoff of additional operational complexity.
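As a hypothetical example of point (b), the stock distribution registry can be pointed at a local disk or PV-backed path rather than S3 just by its storage driver config (the paths here are placeholders):

```sh
# registry config backed by a mounted local disk / IaaS PV instead of S3
cat > /etc/docker/registry/config.yml <<'EOF'
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry   # mount a local disk or PV here
http:
  addr: :5000
EOF
```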

When talking about instance groups, I'm using the Kops nomenclature. An instance group maps to an AWS AutoscalingGroup, where the nodes are of a certain instance type (i.e. CI is generally best-served with burstable nodes, which on AWS are t3 nodes, whereas t3 nodes may not make sense for other workloads). In order to schedule workers on a specific ASG that we set up for them, we use a combination of node affinity to direct Kubernetes to schedule the CI workloads (currently, the worker pods) to those nodes, and taints and tolerations to not allow any other workloads to schedule there. This is orthogonal to Kubernetes namespacing, which is a good idea anyway to improve logical isolation between individual CI workloads.
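A minimal sketch of that pattern, assuming a node label `instance-group=ci-workers` and a `dedicated=ci:NoSchedule` taint on the CI instance group (all names here are placeholders):

```sh
# keep non-CI workloads off the CI nodes
kubectl taint nodes -l instance-group=ci-workers dedicated=ci:NoSchedule

# a CI pod selects those nodes and tolerates the taint
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ci-step
  namespace: ci
spec:
  nodeSelector:
    instance-group: ci-workers
  tolerations:
    - key: dedicated
      operator: Equal
      value: ci
      effect: NoSchedule
  containers:
    - name: step
      image: busybox
      command: ["sh", "-c", "echo hello from a CI node"]
EOF
```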


# Proposal

**TL;DR**: We recommend going with the image registry option because it satisfies all the requirements and gives us a bunch of options to improve performance when compared to the blobstore option. It also provides a very flexible solution that works across multiple runtime workers. [See Details](#image-registry-to-store-artifacts)
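As one rough illustration of what storing a cache object in a registry could look like from the outside (not necessarily how Concourse would implement it), a step's output directory can be packed as a single layer and pushed/pulled with a tool like `crane`; the repository name and tag below are hypothetical:

```sh
# push: pack a step's output directory as a one-layer image in a cache repo
tar -cf output.tar -C ./step-output .
crane append \
  --new_layer output.tar \
  --new_tag registry.example.com/concourse-cache/my-pipeline:cache-key

# pull: flatten the cached image back into a directory for a later step
crane export registry.example.com/concourse-cache/my-pipeline:cache-key - \
  | tar -x -C ./step-input
```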


Trade-study: push/pull from registry using

  1. init & sidecar containers (a rough sketch of this option follows below)
  2. CSI with an independent volume/input (e.g., image populator)
  3. a custom image per combination of step for pulling, and TODO for pushing
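A rough shape of option 1, with an init container pulling cached inputs and a sidecar pushing outputs after the step; the fetcher/pusher images and refs are hypothetical, not part of this RFC:

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: step-with-registry-cache
spec:
  volumes:
    - name: workdir
      emptyDir: {}
  initContainers:
    # hypothetical helper that pulls the cached input image and unpacks it
    - name: fetch-inputs
      image: registry.example.com/cache-helper:latest
      args: ["pull", "registry.example.com/cache/some-input:key", "/work/inputs"]
      volumeMounts:
        - name: workdir
          mountPath: /work
  containers:
    - name: step
      image: registry.example.com/task-image:latest
      workingDir: /work
      volumeMounts:
        - name: workdir
          mountPath: /work
    # hypothetical sidecar that pushes /work/outputs back to the registry
    - name: push-outputs
      image: registry.example.com/cache-helper:latest
      args: ["push", "/work/outputs", "registry.example.com/cache/some-output:key"]
      volumeMounts:
        - name: workdir
          mountPath: /work
EOF
```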

- Without IaaS-based persistent disks this isn't a viable solution; K8s nodes would need to have large root volumes.
- Wouldn't have support for hosting images by default. However, `baggageclaim` could be extended to add the APIs.
- `baggageclaim` itself doesn't have any authentication/authorization or transport security (HTTPS) mechanisms built into it.


Suggested change
+ - `baggageclaim` needs to be run in a privileged container. It would be useful to understand operators' perspective on this constraint. Is this tenable?

Member

@cirocosta cirocosta Oct 21, 2020


As far as I know, TKG clusters created via TMC expect pods to adhere to a pod security policy that's quite restrictive.

But I'd imagine someone installing a CSI plugin (like anything network-related) would expect to grant higher privileges anyway (it being an "infra component"), though not to just any container (i.e., the containers that run the workloads, like tasks, etc.).


Yep, this does make the assumption that installing Concourse requires more than just pod* capabilities.
However, installing a CSI driver would be an operator action, similar to installing Concourse today.

@taylorsilva and I were also discussing that this approach wouldn't preclude us from using IaaS PVs directly for simpler installations where the user doesn't have privileges to install a CSI driver. This would have performance implications (the overhead of creating/destroying volumes and the per-node volume limit), but it would be a very simple solution 🤔
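For that simpler fallback, each step would just claim a dynamically provisioned IaaS volume through a standard StorageClass; a minimal sketch (the class name is a placeholder):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: step-workdir
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard   # IaaS-backed class; placeholder
  resources:
    requests:
      storage: 10Gi
EOF
```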


### Cons
- Wouldn't have support for hosting images by default.
- IaaS-based [volume limits per node](https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits) prevent this from being a scalable solution
Member


Not super related, but I found this idea interesting: argoproj/argo-workflows#4130

@xtreme-sameer-vohra

TODO - Document the permissions required to set up the baggageclaim CSI + StorageClass
