From fa56f8e13fbe016d2d3b079d7042b7b365e78a4c Mon Sep 17 00:00:00 2001
From: Sameer Vohra
Date: Mon, 28 Sep 2020 16:28:34 -0400
Subject: [PATCH 01/18] add proposal for k8s storage

Signed-off-by: Sameer Vohra
Co-authored-by: Taylor Silva
---
 074-k8s-storage/proposal.md | 174 ++++++++++++++++++++++++++++++++++++
 1 file changed, 174 insertions(+)
 create mode 100644 074-k8s-storage/proposal.md

diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md
new file mode 100644
index 00000000..520f49c0
--- /dev/null
+++ b/074-k8s-storage/proposal.md
@@ -0,0 +1,174 @@
+# Terms
+- **cache object**: a BLOB and relevant metadata that Concourse needs to persist. These could be Resource Caches, Task Caches or Build Caches.
+- **worker**: the thing Concourse executes steps on; anything that implements the **worker** interface. Concourse is agnostic of the runtime representation of the worker (e.g. a K8s pod, node or cluster).
+
+# Summary
+
+After spiking on a few solutions for storage on Kubernetes our recommendation is to use an image registry to store **cache objects** for steps.
+
+# Motivation
+
+As we started thinking about the Kubernetes runtime we realized that we need to think about what our storage solution would be before proceeding with any other part of the implementation. Storage has a huge effect on how Concourse interacts with the runtime (Kubernetes). Storage also had a lot of unknowns, we didn't know what the storage landscape on Kubernetes looked like and what options were available to us. Storage also has a huge impact on the perforamnce of the cluster, in regards to storage and initialization of steps.
+
+## Requirements
+An ideal storage solution can do the following :
+
+- allow image fetching by the CRI that k8s is using
+- transfer **cache objects** between steps (whatever represents a step, most likely a pod)
+- cache for resources and tasks
+- stream **cache objects** across worker runtimes (k8s worker sends artifact to garden worker)
+
+## Criteria
+- security
+- performance, aka initialization time (time spent running workloads on a single k8s worker, as well as across workers)
+- resource usage to run this storage solution
+
+# Proposal
+
+**TL;DR**: We recommend going with the image registry option because it satisfies all the requirements and gives us a bunch of options to improve performance when compared to the blobstore option. It also provides a very flexible solution that works across multiple runtime workers. [See Details TODO](#image-registry-to-store-artifacts)
+
+Furthermore, the CSI is a useful interface for building the storage component against. [See Details TODO](#csi)
+
+# Storage Options considered
+## Baggageclaim Daemonset
+### Description
+A privileged baggageclaim pod would manage all the **cache objects** for step pods. The pod can be provided sufficient privilege to create overlay mounts using the `BiDirectional` value for `mountPropagation`. The `volumeMount` object allows specifying a volume `subPath`.
+
+This approach didn't work using GCE PDs or vSphere Volumes ([Issue](https://github.com/kubernetes/kubernetes/issues/95049)). It does work using the `hostPath` option; however, that would require a large root volume and wouldn't be able to leverage IaaS based persistent disks.
+
+The pod would run on all nodes that Concourse would execute steps on.
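The mount plumbing this option depends on can be sketched as follows. The struct below is a minimal, illustrative stand-in for the Kubernetes `v1.VolumeMount` schema (only the fields discussed here), and the volume name and on-disk paths are assumptions for the sketch, not part of the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VolumeMount mirrors just the v1.VolumeMount fields this option relies on.
type VolumeMount struct {
	Name             string `json:"name"`
	MountPath        string `json:"mountPath"`
	SubPath          string `json:"subPath,omitempty"`
	MountPropagation string `json:"mountPropagation,omitempty"`
}

// baggageclaimMount is the mount the privileged daemonset pod would use:
// Bidirectional propagation lets the overlay mounts it creates become
// visible outside its own mount namespace.
func baggageclaimMount() VolumeMount {
	return VolumeMount{
		Name:             "concourse-volumes", // assumed volume name
		MountPath:        "/volumes",
		MountPropagation: "Bidirectional",
	}
}

// stepMount shows a step pod consuming a single cache object via subPath.
func stepMount() VolumeMount {
	return VolumeMount{
		Name:      "concourse-volumes",
		MountPath: "/tmp/build/input",
		SubPath:   "live/some-volume-id/volume", // assumed on-disk layout
	}
}

func main() {
	out, _ := json.Marshal([]VolumeMount{baggageclaimMount(), stepMount()})
	fmt.Println(string(out))
}
```

The key design point is that only the baggageclaim pod needs `Bidirectional` propagation (and therefore privilege); step pods consume sub-paths read-only from their perspective.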
+
+### Pros
++ Leverage baggageclaim
+  + volume streaming between nodes would work using the current Concourse architecture
+  + resource and task caches would also work using the current Concourse architecture
+  + would be able to stream **cache objects** across worker runtimes as it would be mediated via the web
++ Concourse would have complete control over volume lifecycle
++ would have negligible overhead for steps scheduled on the same node as no input/output stream would be required
+
+### Cons
+- Not being able to use IaaS based persistent disks doesn't offer a viable solution. K8s nodes would need to have large root volumes.
+- Wouldn't have support for hosting images by default. However, `baggageclaim` could be extended to add the APIs
+- `baggageclaim` itself doesn't have any authentication/authorization or transport security (https) mechanisms built into it
+
+## Image Registry to store artifacts
+### Description
+Each **cache object** is represented as an image layer for a repository in an image registry. [SPIKE using registry to store artifacts](https://github.com/concourse/concourse/issues/3740). Concourse would require a managed image registry as a dependency. For each step, Concourse would generate an image config and manifest with all the relevant inputs modeled as image layers.
+
+### Pros
+- Would have support for building an image in a step and using it as the image for a subsequent step. This would require the image registry to be accessible by the CRI subsystem on a node
+- Image registries are critical to operating on K8s and as such there are plenty of options, from managed IaaS based solutions such as GCR, ECR and ACR to on-prem solutions like Harbor. Therefore, it would be a safe assumption that a Concourse on K8s user would already have a registry available for use.
+- Could explore further de-coupling by exploring [csi-driver-image-populator](https://github.com/kubernetes-csi/csi-driver-image-populator) when using registries for storing artifacts. It is listed as a sample driver in the CSI docs and its README says it is not production ready. Last commit was Oct 2019. There is also another utility, [imgpkg](https://github.com/k14s/imgpkg), which allows arbitrary data to be stored in images as layers.
+- Leverage performance enhancements to registries such as [pull through cache](https://docs.docker.com/registry/recipes/mirror/)
+- Use a standardized and documented [OCI image-spec protocol](https://github.com/opencontainers/image-spec)
+- LRU based local caching of image layers by the K8s CRI
+- Established ways of securely pushing/pulling blobs from an image registry
+- As this would be a centralized storage solution
+  - it doesn't impact what a K8s based Concourse worker looks like
+  - Simplified GC
+  - Would support streaming across worker runtimes
+
+### Cons
+- Some registries such as GCR don't expose an API to delete layers directly
+- A **cache object** would have to have a fixed, static path in the image file system to be able to reuse the same layer. This would require some additional handling on Concourse to support [input-mapping](https://concourse-ci.org/jobs.html#schema.step.task-step.input_mapping) and [output-mapping](https://concourse-ci.org/jobs.html#schema.step.task-step.output_mapping)
+- Adds extra development overhead to generate new image configs & manifests to leverage **cache object** layers
+- Adds extra initialization overhead. Concourse wouldn't have control over the local caches on K8s nodes, so volumes would always have to be pushed to the centralized registry and pulled at least once when executing a step
+- Potentially adds substantial load on the registry, as Concourse would be creating a new file system layer for every **cache object**
+- There isn't a well-documented approach to setting up an in-cluster secure registry.
The setup requires exposing an in-cluster registry externally with traffic routed via an LB. [Prior spike](https://github.com/concourse/concourse/issues/3796)
+
+## S3 Compatible Blobstore
+### Description
+Each **cache object** is stored in a blobstore. Concourse would require a managed blobstore as a dependency. For each step, Concourse would pull down the relevant blobs for inputs and push blobs for outputs.
+
+### Pros
+- Scales well (GCR uses GCS as the underlying storage)
+- Could explore further de-coupling by exploring a CSI driver
+- Established ways of securely fetching/pushing blobs from a blobstore
+- As this would be a centralized storage solution
+  - it doesn't impact what a K8s based Concourse worker looks like
+  - Simplified GC
+  - Would support streaming across worker runtimes
+
+### Cons
+- Wouldn't have support for hosting images by default.
+- Adds another dependency for Concourse (depending on where Concourse is deployed there might be managed solutions available)
+- Lack of standardized APIs
+- Adds extra initialization overhead. Concourse wouldn't have a local cache, so volumes would always have to be pushed & pulled for steps
+- Concourse would potentially be a heavy user of the blobstore
+
+## Persistent Volumes
+Each **cache object** would be stored in its own persistent volume. Persistent volume snapshots would be used to reference **cache object** versions.
+
+### Pros
+- Would leverage a native k8s offering
+- Maps well to Concourse's use of **cache objects** and offloads the heavy lifting to K8s
+- Potentially wouldn't require volumes to be streamed at all
+
+### Cons
+- Wouldn't have support for hosting images by default.
+- IaaS based limits on [volume limits per node](https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits) prevent this from being a scalable solution
+- CSI Snapshotting feature is optional and not every driver supports it (LINK)
+- As this would NOT be a centralized storage solution, it wouldn't support workers across multiple runtimes or even K8s clusters
+
+## K8s POC (Baggageclaim peer-to-peer)
+Each step would have a sidecar container to populate input **cache objects** and host output **cache objects** via an HTTP API. `beltloader` is used to populate inputs. `baggageclaim` is used to host outputs. `baggageclaim` was also modified to allow **cache objects** to be accessed via the registry APIs (support images).
+
+### Pros
+- No external dependencies are required
+- Supports worker-to-worker streaming, bypassing Concourse web
+
+### Cons
+- the `step` pod's lifecycle is tied to the **cache object** lifecycle (pods have to be kept around until the **cache object** they host is required). This would increase the CPU & memory usage of a cluster.
+- there isn't a simple mechanism to allow the k8s container runtime to securely access the `baggageclaim` endpoints to fetch images
+- As this would NOT be a centralized storage solution, it would require exposing the `baggageclaim` endpoints via `services` to be accessed externally
+- `baggageclaim` itself doesn't have any authentication/authorization or transport security (https) mechanisms built into it
+
+# Other considerations
+## CSI
+The [Container Storage Interface](https://github.com/container-storage-interface/spec/blob/master/spec.md) provides a generic interface for providing storage to containers.
+
+CSI was developed as a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes. With the adoption of the Container Storage Interface, the Kubernetes volume layer becomes truly extensible.
Using CSI, third-party storage providers can write and deploy plugins exposing new storage systems in Kubernetes without ever having to touch the core Kubernetes code. This gives Kubernetes users more options for storage and makes the system more secure and reliable. [Source](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#why-csi)
+
+The CSI spec can be used to wrap every solution listed above. It provides an API through which the chosen solution would be consumed.
+
+### Pros
+- Can be deployed/managed using k8s resources ([hostPath CSI Driver example](https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/docs/deploy-1.17-and-later.md))
+- Allows the storage mechanisms to be swapped more easily
+  - can be an extension point for the Concourse community
+- De-couples Concourse from its usage of storage
+  - the driver could be patched/upgraded independently of Concourse
+- The CSI Spec is quite flexible and has a minimum set of required methods (the other features are opt-in)
+- CSI supports multiple deployment topologies (master, master+node, node)
+- Provides a scheduling extension point for volume-aware scheduling
+
+### Cons
+- extra overhead for development, packaging and deployment
+- the CSI version may be tied to a K8s version
+
+## Fuse
+This might simplify our usage of external storage solutions such as blobstores. There isn't a supported solution in K8s at the moment. However, this would be something worth considering if that were to change. Current [issue requesting K8s development](## Fuse
+https://github.com/kubernetes/kubernetes/issues/7890)
+
+# Open Questions
+
+> Raise any concerns here for things you aren't sure about yet.
+- Do we implement our own version of the csi-image-populator?
+- Should we implement this as a CSI driver?
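For a rough feel of the image-registry option discussed earlier in this proposal, the sketch below assembles the kind of OCI image manifest Concourse would have to generate per step, with each input modeled as a layer. The media types follow the OCI image-spec; the digests, sizes and function names are placeholders, not an actual Concourse design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor is a minimal OCI content descriptor (media type, digest, size).
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// Manifest is a minimal OCI image manifest.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
}

// stepManifest models each step input as one image layer.
func stepManifest(inputs []Descriptor) Manifest {
	return Manifest{
		SchemaVersion: 2,
		Config: Descriptor{
			MediaType: "application/vnd.oci.image.config.v1+json",
			Digest:    "sha256:<config-digest>", // placeholder digest
		},
		Layers: inputs,
	}
}

func main() {
	m := stepManifest([]Descriptor{
		{MediaType: "application/vnd.oci.image.layer.v1.tar+gzip", Digest: "sha256:<input-a>", Size: 1024},
	})
	out, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(out))
}
```

Reusing a layer across steps only works if its digest is stable, which is why the "fixed static path" constraint shows up in the Cons list for that option.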
+
+
+# Answered Questions
+
+> If there were any major concerns that have already (or eventually, through
+> the RFC process) reached consensus, it can still help to include them along
+> with their resolution, if it's otherwise unclear.
+>
+> This can be especially useful for RFCs that have taken a long time and there
+> were some subtle yet important details to get right.
+>
+> This may very well be empty if the proposal is simple enough.
+
+
+# New Implications
+
+> What is the impact of this change, outside of the change itself? How might it
+> change peoples' workflows today, good or bad?

From b33fb2618dc7d2d29c5eae4efaa132005df6317f Mon Sep 17 00:00:00 2001
From: Taylor Silva
Date: Mon, 28 Sep 2020 16:36:22 -0400
Subject: [PATCH 02/18] fix a word and remove quote blocks

---
 074-k8s-storage/proposal.md | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md
index 520f49c0..bf823bf2 100644
--- a/074-k8s-storage/proposal.md
+++ b/074-k8s-storage/proposal.md
@@ -8,7 +8,7 @@ After spiking on a few solutions for storage on Kubernetes our recommendation is
 
 # Motivation
 
-As we started thinking about the Kubernetes runtime we realized that we need to think about what our storage solution would be before proceeding with any other part of the implementation. Storage has a huge effect on how Concourse interacts with the runtime (Kubernetes). Storage also had a lot of unknowns, we didn't know what the storage landscape on Kubernetes looked like and what options were available to us. Storage also has a huge impact on the perforamnce of the cluster, in regards to storage and initialization of steps.
+As we started thinking about the Kubernetes runtime we realized that we need to think about what our storage solution would be before proceeding with any other part of the implementation. Storage has a huge effect on how Concourse interacts with the runtime (Kubernetes). Storage also had a lot of unknowns, we didn't know what the storage landscape on Kubernetes looked like and what options were available to us. Storage also has a huge impact on the performance of the cluster, in regards to storage and initialization of steps.
 
 ## Requirements
 An ideal storage solution can do the following :
@@ -151,24 +151,18 @@ https://github.com/kubernetes/kubernetes/issues/7890)
 
 # Open Questions
 
-> Raise any concerns here for things you aren't sure about yet.
 - Do we implement our own version of the csi-image-populator?
 - Should we implement this as a CSI driver?
 
 
 # Answered Questions
 
-> If there were any major concerns that have already (or eventually, through
-> the RFC process) reached consensus, it can still help to include them along
-> with their resolution, if it's otherwise unclear.
->
-> This can be especially useful for RFCs that have taken a long time and there
-> were some subtle yet important details to get right.
->
-> This may very well be empty if the proposal is simple enough.
+
+# Related Links
+- [Storage Spike](https://github.com/concourse/concourse/issues/6036)
+- [Review k8s worker POC](https://github.com/concourse/concourse/issues/5986)
 
 
 # New Implications
 
-> What is the impact of this change, outside of the change itself? How might it
-> change peoples' workflows today, good or bad?
+Will drive the rest of the Kubernetes runtime work.
From 6bade72b182b5f078c57fc7ee80f505347ad39fa Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Mon, 28 Sep 2020 16:36:54 -0400 Subject: [PATCH 03/18] Remove link TODO's --- 074-k8s-storage/proposal.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index bf823bf2..c35f2350 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -25,9 +25,9 @@ An ideal storage solution can do the following : # Proposal -**TL;DR**: We recommend going with the image registry option because it satisfies all the requirements and gives us a bunch of options to improve performance when compared to the blobstore option. It also provides a very flexible solution that works across multiple runtime workers. [See Details TODO](#image-registry-to-store-artifacts) +**TL;DR**: We recommend going with the image registry option because it satisfies all the requirements and gives us a bunch of options to improve performance when compared to the blobstore option. It also provides a very flexible solution that works across multiple runtime workers. [See Details](#image-registry-to-store-artifacts) -Furthermore, the CSI is a useful interface for building the storage component against. [See Details TODO](#csi) +Furthermore, the CSI is a useful interface for building the storage component against. 
[See Details](#csi) # Storage Options considered ## Baggageclaim Daemonset From a288d45f17544beba482e4f02ca189d74c050674 Mon Sep 17 00:00:00 2001 From: Matthew Pereira Date: Mon, 28 Sep 2020 16:49:59 -0400 Subject: [PATCH 04/18] Update k8s storage proposal.md to fix broken link --- 074-k8s-storage/proposal.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index c35f2350..2ac5de38 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -146,8 +146,7 @@ The CSI spec can be used to wrap every solution listed above. It provides an API - the CSI version may be tied to a K8s version ## Fuse -This might simplify our usage of external storage solutions such as blobstores. There isn't a supported solution in K8s at the moment. However, this would be something worth considering if that were to change. Current [issue requesting K8s development](## Fuse -https://github.com/kubernetes/kubernetes/issues/7890) +This might simplify our usage of external storage solutions such as blobstores. There isn't a supported solution in K8s at the moment. However, this would be something worth considering if that were to change. [Click here to view the current issue requesting K8s development](https://github.com/kubernetes/kubernetes/issues/7890). # Open Questions From df4fc3f3242d93f3134b7ae5347bf48b21289df4 Mon Sep 17 00:00:00 2001 From: Sameer Vohra Date: Tue, 29 Sep 2020 09:22:18 -0500 Subject: [PATCH 05/18] Update proposal.md Add link to drivers --- 074-k8s-storage/proposal.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 2ac5de38..02bb63f4 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -107,7 +107,7 @@ Each **cache object** would be stored in its own persistent volume. Persistent v ### Cons - Wouldn't have support for hosting images by default. 
 - IaaS based limits on [volume limits per node](https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits) prevent this from being a scalable solution
-- CSI Snapshotting feature is optional and not every driver supports it (LINK)
+- CSI Snapshotting feature is optional and not every driver supports it ([Drivers & features they support](https://kubernetes-csi.github.io/docs/drivers.html#production-drivers))
 - As this would NOT be a centralized storage solution, it wouldn't support workers across multiple runtimes or even K8s clusters
 
 ## K8s POC (Baggageclaim peer-to-peer)

From 05f90fd8c37cf5938b2d7abec70b50f512f1cbbc Mon Sep 17 00:00:00 2001
From: Sameer Vohra
Date: Mon, 19 Oct 2020 16:49:39 -0400
Subject: [PATCH 06/18] Add current storage spike of baggageclaim + csi

---
 074-k8s-storage/proposal.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md
index 02bb63f4..6d26b40b 100644
--- a/074-k8s-storage/proposal.md
+++ b/074-k8s-storage/proposal.md
@@ -96,6 +96,14 @@ Each **cache object** is stored in a blobstore. Concourse would require a manana
 - Adds extra initialization overhead. Concourse wouldn't have a local cache, so volumes would always have to be pushed & pulled for steps
 - Concourse would potentially be a heavy user of the blobstore
 
+## Baggageclaim + CSI Implementation
+### Description
+TODO
+### Pros
+TODO
+### Cons
+TODO
+
 ## Persistent Volumes
 Each **cache object** would be stored in its own persistent volume. Persistent volume snapshots would be used to reference **cache object** versions.
From 51ac1a37bcf69185e59f294603c3ad777458123e Mon Sep 17 00:00:00 2001
From: Sameer Vohra
Date: Wed, 21 Oct 2020 17:21:44 -0400
Subject: [PATCH 07/18] Change recommendation to baggageclaim + CSI driver

Signed-off-by: Sameer Vohra
Co-authored-by: Taylor Silva
---
 074-k8s-storage/proposal.md | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md
index 6d26b40b..198784db 100644
--- a/074-k8s-storage/proposal.md
+++ b/074-k8s-storage/proposal.md
@@ -25,16 +25,15 @@ An ideal storage solution can do the following :
 
 # Proposal
 
-**TL;DR**: We recommend going with the image registry option because it satisfies all the requirements and gives us a bunch of options to improve performance when compared to the blobstore option. It also provides a very flexible solution that works across multiple runtime workers. [See Details](#image-registry-to-store-artifacts)
+**TL;DR**: After [spiking on the CSI driver interface](https://github.com/concourse/concourse/issues/6133), we now recommend creating a CSI driver based on baggageclaim.
 
 Furthermore, the CSI is a useful interface for building the storage component against. [See Details](#csi)
 
 # Storage Options considered
-## Baggageclaim Daemonset
-### Description
-A privileged baggageclaim pod would manage all the **cache objects** for step pods. The pod can be provided sufficient privilege to create overlay mounts using the `BiDirectional` value for `mountPropagation`. The `volumeMount` object allows specifying a volume `subPath`.
-
-This approach didn't work using GCE PDs or vSphere Volumes ([Issue](https://github.com/kubernetes/kubernetes/issues/95049)). It does work using the `hostPath` option; however, that would require a large root volume and wouldn't be able to leverage IaaS based persistent disks.
+## Baggageclaim + CSI Implementation
+### Description
+A privileged baggageclaim pod would manage all the **cache objects** for step pods.
The baggageclaim pod can be provided sufficient privilege to create overlay mounts and have those mounts propagate back to the host using the `BiDirectional` value for `mountPropagation`.
 
 The pod would run on all nodes that Concourse would execute steps on.
 
 ### Pros
 + Leverage baggageclaim
   + volume streaming between nodes would work using the current Concourse architecture
   + resource and task caches would also work using the current Concourse architecture
-  + would be able to stream **cache objects** across worker runtimes as it would be mediated via the web
-+ Concourse would have complete control over volume lifecycle
++ Web can manage storage via native k8s storage objects
+  + Concourse would have complete control over volume lifecycle
+  + Operator can query the k8s api to observe all volumes
 + would have negligible overhead for steps scheduled on the same node as no input/output stream would be required
++ Disk where volumes are managed can be backed by any other CSI driver (as long as baggageclaim can make overlay mounts on it)
+  + Can leverage tools in k8s to manage the disk that baggageclaim is writing to
 
 ### Cons
-- Not being able to use IaaS based persistent disks doesn't offer a viable solution. K8s nodes would need to have large root volumes.
+- CSI drivers are meant to guarantee storage capacity; baggageclaim does not currently do this, as it provides unbounded disk space
+  - This CSI driver will not be meant for usage outside of Concourse
 - Wouldn't have support for hosting images by default. However, `baggageclaim` could be extended to add the APIs
+  - Crazy Idea 1: somehow load the image into the CRI that's running on the node
+  - Crazy Idea 2: Can a volumeMount override the root path (`/`)?
- `baggageclaim` itself doesn't have any authentication/authorization or transport security (https) mechanisms built into it + - k8s has networking tools that we can leverage to ensure only authorized clients can talk to it ## Image Registry to store artifacts ### Description @@ -96,14 +101,6 @@ Each **cache object** is stored in a blobstore. Concourse would require a manana - Adds extra initialization overhead. Concourse wouldn't have a local cache, so volumes would always have to be pushed & pulled for steps - Concourse would potentially be heavy user of the blobstore -## Baggageclaim + CSI Implementation -### Description -TODO -### Pros -TODO -### Cons -TODO - ## Persistent Volumes Each **cache object** would be stored in its own persistent volume. Persistent volume snapshots would be used to reference **cache object** versions. From ac4b2fde7f5b5258c0a9481db700c95dd281da2d Mon Sep 17 00:00:00 2001 From: Sameer Vohra Date: Fri, 23 Oct 2020 17:12:28 -0400 Subject: [PATCH 08/18] Update open and answered questions Signed-off-by: Sameer Vohra Co-authored-by: Taylor Silva --- 074-k8s-storage/proposal.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 198784db..2f37fa90 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -155,16 +155,24 @@ This might simplify our usage of external storage solutions such as blobstores. # Open Questions -- Do we implement our own version of the csi-image-populator? -- Should we implement this as a CSI driver? +- When we need to have a volume available on multiple k8s nodes, how do we do this in a baggageclaim CSI driver? + - Would it make sense to support `ReadWriteMany` as the volume's `accessMode` instead of `ReadWriteOnce`? +- What does the Concourse database model for volumes look like with a k8s worker running a baggageclaim CSI driver? +- How will the CSI driver stream a volume between k8s nodes? 
+  - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)?
+- What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) _StatefulSet appears to fit our use case best_
+- For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes?
 
 
 # Answered Questions
+- Should we implement this as a CSI driver? **Yes, we should; decided after doing the CSI Driver POC Spike**
+- Do we implement our own version of the csi-image-populator? **Yes, but based on baggageclaim instead of image layers**
 
 
 # Related Links
 - [Storage Spike](https://github.com/concourse/concourse/issues/6036)
 - [Review k8s worker POC](https://github.com/concourse/concourse/issues/5986)
+- [CSI Driver POC Spike](https://github.com/concourse/concourse/issues/6133)
 
 
 # New Implications

From 4d0886453fde62ce9d198cf91ec755d5b63a0260 Mon Sep 17 00:00:00 2001
From: Sameer Vohra
Date: Mon, 26 Oct 2020 17:24:42 -0400
Subject: [PATCH 09/18] Adding specific details about CSI implementation

Signed-off-by: Sameer Vohra
Co-authored-by: Taylor Silva
---
 074-k8s-storage/proposal.md | 48 +++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md
index 2f37fa90..22c7494e 100644
--- a/074-k8s-storage/proposal.md
+++ b/074-k8s-storage/proposal.md
@@ -29,6 +29,54 @@
 
 Furthermore, the CSI is a useful interface for building the storage component against. [See Details](#csi)
 
+## Level Setting
+
+### What does Baggageclaim do?
+
+Baggageclaim comes as two components: a client and a server communicating over an HTTP REST API. The server component manages volumes within a specified directory on the host.
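A minimal sketch of that directory-backed bookkeeping, assuming a `live/<handle>` layout (the real baggageclaim layout may differ):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listVolumes recovers the set of volume handles purely from the directory
// layout, with no database, mirroring the idea that the filesystem itself
// is the source of truth.
func listVolumes(root string) ([]string, error) {
	entries, err := os.ReadDir(filepath.Join(root, "live"))
	if err != nil {
		return nil, err
	}
	handles := make([]string, 0, len(entries))
	for _, e := range entries {
		if e.IsDir() {
			handles = append(handles, e.Name())
		}
	}
	// os.ReadDir returns entries sorted by filename.
	return handles, nil
}

// demo creates a throwaway layout with two volumes and queries it.
func demo() []string {
	root, err := os.MkdirTemp("", "bc")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	for _, h := range []string{"input-1", "output-1"} {
		if err := os.MkdirAll(filepath.Join(root, "live", h, "volume"), 0o755); err != nil {
			panic(err)
		}
	}
	handles, err := listVolumes(root)
	if err != nil {
		panic(err)
	}
	return handles
}

func main() {
	fmt.Println(demo()) // prints: [input-1 output-1]
}
```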
+Volumes are created based on one of three strategies:
+- Empty Strategy: creates an empty volume
+- COW Strategy: creates a volume based on an existing volume
+- Import Strategy: creates a volume based on a local directory or tarball
+
+Baggageclaim keeps track of all volumes by querying the filesystem structure, therefore no database component is needed. Volumes are assigned an ID that is passed in by the baggageclaim client.
+
+The [HTTP REST API](https://github.com/concourse/baggageclaim/blob/ea9252e4fcca101f32971cfb5ff47c3355c7c91e/api/handler.go#L26-L38) allows a baggageclaim client to:
+- Create volumes with one of the above strategies
+- Destroy volumes
+- Query for a list of all volumes
+- Query for the properties of a single volume
+- Stream the contents of a volume
+
+Supports multiple filesystem drivers (overlay, btrfs, naive\*). Overlay is the recommended driver to use as it's the most stable. All drivers support all features of baggageclaim.
+
+\* _naive simply `cp`'s files into new directories and isn't really a "filesystem"_
+
+### What's a CSI Driver?
+
+In a container orchestration (CO) system, such as Kubernetes or Cloud Foundry, you need a way to provide storage to containers. This could be ephemeral (lifecycle is tied to the container) or persistent (exists outside of the container's lifecycle) storage. In order to support many different storage providers, the [CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md) was made to allow COs to have a consistent way to ask for storage for containers.
+
+A CSI driver is an implementation of a gRPC interface. The CO communicates with a CSI driver over a unix domain socket.
+
+A CSI driver is made up of two components, both serving different parts of the CSI's gRPC interface:
+
+**Controller Plugin**: Can be run anywhere. Serves the **Controller and Identity Service**.
+
+**Node Plugin**: Must be run on the Node where the storage requested by the CO is to be provisioned.
Serves the **Node and Identity Service**. (Yes, the identity service is served by both plugins.)
+
+The full list of functions for the interface is available in the [CSI Spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#rpc-interface). There is some flexibility as to how you architect your CSI driver. The [CSI spec has some examples](https://github.com/container-storage-interface/spec/blob/master/spec.md#architecture). The volume lifecycle the CSI driver is expected to follow is also [diagrammed in the CSI Spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#volume-lifecycle).
+
+When implementing a CSI driver for Kubernetes it is helpful to understand when certain CSI functions are called. CSI functions are typically called after the creation/modification of some Kubernetes API objects. It's important to note here that a CSI driver **does not know anything about Kubernetes API objects**. In order for CSI functions to be called at the right time, a CSI driver depends on various sidecar containers that monitor for certain Kubernetes Storage objects. These sidecars are provided by the Kubernetes team; a list of these ["helper containers" is available in this Kubernetes CSI Design document](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#recommended-mechanism-for-deploying-csi-drivers-on-kubernetes).
+
+[The CSI Driver Spike](https://github.com/concourse/concourse/issues/6133#issuecomment-708471004) contains some notes that show which CSI functions are called when certain Kubernetes objects are created.
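As a dependency-free sketch of the driver's shape: the interfaces below are simplified stand-ins for the gRPC services generated from the CSI spec (a real driver implements those generated interfaces and serves them over a unix domain socket), and the plugin name is a placeholder.

```go
package main

import "fmt"

// IdentityServer stands in for the CSI Identity service, which both the
// Controller and Node plugins must serve.
type IdentityServer interface {
	GetPluginInfo() (name, version string)
}

// NodeServer stands in for the part of the CSI Node service that makes a
// volume available at a target path on the node.
type NodeServer interface {
	NodePublishVolume(volumeID, targetPath string) error
}

type baggageclaimDriver struct{}

// Compile-time checks that the driver serves both stand-in services.
var (
	_ IdentityServer = &baggageclaimDriver{}
	_ NodeServer     = &baggageclaimDriver{}
)

func (d *baggageclaimDriver) GetPluginInfo() (string, string) {
	// Plugin name is a made-up placeholder.
	return "baggageclaim.csi.concourse.example", "0.0.1"
}

func (d *baggageclaimDriver) NodePublishVolume(volumeID, targetPath string) error {
	// A real implementation would look up the baggageclaim volume and
	// bind-mount its overlay mount at targetPath for the step pod.
	fmt.Printf("bind-mount volume %s at %s\n", volumeID, targetPath)
	return nil
}

func main() {
	var id IdentityServer = &baggageclaimDriver{}
	name, _ := id.GetPluginInfo()
	fmt.Println("serving identity for", name)
}
```

The split matters for deployment: the Identity/Controller half can run anywhere (e.g. the StatefulSet mentioned in the Open Questions), while `NodePublishVolume` must run on the node that hosts the volume.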
+ +## Proposed Implementation of Baggageclaim as a CSI Driver + +Follow the recommended deployment strategy from the Kubernetes team [described in this design document](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#recommended-mechanism-for-deploying-csi-drivers-on-kubernetes) with the following differences: +- no `external-resizer` container +- no `external-snapshotter` container + + # Storage Options considered ## Baggageclaim + CSI Implementation From 27da6c6feda35d5f02d40c3fe9804689810cb8b4 Mon Sep 17 00:00:00 2001 From: Sameer Vohra Date: Mon, 26 Oct 2020 17:58:45 -0400 Subject: [PATCH 10/18] Starting the implementation details section Signed-off-by: Sameer Vohra Co-authored-by: Taylor Silva --- 074-k8s-storage/proposal.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 22c7494e..c7c8d02f 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -72,9 +72,13 @@ When implementing a CSI driver for Kubernetes it is helpful to understand when c ## Proposed Implementation of Baggageclaim as a CSI Driver +Targeting Kubernetes Version 1.19 + Follow the recommended deployment strategy from the Kubernetes team [described in this design document](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#recommended-mechanism-for-deploying-csi-drivers-on-kubernetes) with the following differences: -- no `external-resizer` container -- no `external-snapshotter` container +- no `external-resizer` container. Not planning to support resizing. +- no `external-snapshotter` container. We will use the `CLONE_VOLUME` feature to create COW volumes in baggageclaim instead of trying to use snapshots. +- An extra volume must be mounted for each Pod in the DaemonSet. 
This volume, which should be very large, will be used by baggageclaim to store the volumes that it creates on each Kubernetes node. +- We plan to not guarantee the requested storage capacity because we have no idea how much space any given step in Concourse will use. Kubernetes will force us to specify a storage request but our CSI driver will ignore this value. # Storage Options considered From 8816255d21b47460683dc4438f244f08848b959c Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Tue, 27 Oct 2020 16:38:56 -0400 Subject: [PATCH 11/18] Continuing to outline the proposal Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 37 ++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index c7c8d02f..6f1de4f4 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -39,7 +39,7 @@ Volumes are created based on one of three strategies: - COW Strategy: creates a volume based on an existing volume - Import Strategy: creates a volume based on a local directory or tarball -Baggageclaim keeps track of all volumes by querying the filesystem structure, therefore no database component is needed. Volumes are assigned an ID that is passed in by the baggageclaim client. +Baggageclaim keeps track of all volumes by querying the filesystem structure, therefore no database component is needed. Volumes are assigned an ID that is passed in by the baggageclaim client when the `CreateVolume` request is made. The [HTTP REST API](https://github.com/concourse/baggageclaim/blob/ea9252e4fcca101f32971cfb5ff47c3355c7c91e/api/handler.go#L26-L38) allows a baggageclaim client to: - Create volumes with one of the above strategies @@ -54,7 +54,7 @@ Supports multiple filesystem drivers (overlay, btrfs, naive\*). Overlay is the r ### What's a CSI Driver?
-In a container orchestration (CO) system, such as Kubernetes or Cloud Foundry, you need a way to provide storage to containers. This could be ephemeral (lifecycle is tied to the container) or persistent (exists outside of the container's lifecycle) storage. In order to support many different storage providers the [CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md) was made to allow COs to have a consistent way to ask for storage for containers. +In a container orchestration (CO) system, such as Kubernetes or Cloud Foundry, you need a way to provide storage to containers. This could be ephemeral (lifecycle is tied to the container) or persistent (operates outside of the container's lifecycle) storage. In order to support many different storage providers the [CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md) was made to allow COs to have a consistent way to ask for storage for containers. A CSI driver is an implementation of a gRPC interface. The CO communicates with a CSI driver over a unix domain socket. @@ -77,11 +77,38 @@ Targeting Kubernetes Version 1.19 Follow the recommended deployment strategy from the Kubernetes team [described in this design document](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#recommended-mechanism-for-deploying-csi-drivers-on-kubernetes) with the following differences: - no `external-resizer` container. Not planning to support resizing. - no `external-snapshotter` container. We will use the `CLONE_VOLUME` feature to create COW volumes in baggageclaim instead of trying to use snapshots. -- An extra volume must be mounted for each Pod in the DaemonSet. This volume, which should be very large, will be used by baggageclaim to store the volumes that it creates on each Kubernetes node.
-- We plan to not guarantee the requested storage capacity because we have no idea how much space any given step in Concourse will use. Kubernetes will force us to specify a storage request but our CSI driver will ignore this value. +- An extra volume must be mounted for each replica Pod in the DaemonSet. This volume, which should be very large, will be used by baggageclaim to store the volumes that it creates on each Kubernetes node. +- We plan to **not guarantee** the requested storage capacity because we have no idea how much space any given step in Concourse will use. Kubernetes will force us to specify a storage request but our CSI driver will ignore this value. This goes against the CSI spec. +Let's go over some use cases to get an understanding of how the implementation may work. -# Storage Options considered + +### Creating An Empty Volume + +A user creates a PVC: +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: empty-pvc +spec: + accessModes: + - ReadWriteOnce # the only accessMode we will support + volumeMode: Filesystem + resources: + requests: + storage: 1Gi # can be any value, we ignore this + storageClassName: baggageclaim +``` +The [`external-provisioner`](https://github.com/kubernetes-csi/external-provisioner) will call `Controller.CreateVolume`.
In this case `CreateVolume` will: + + +### Creating A Cloned Volume + +### Streaming Volumes Inside A Kubernetes Cluster + +### Streaming Volumes To An External Baggageclaim + +# Alternative Storage Options considered ## Baggageclaim + CSI Implementation ### Description From 88b5eddd550b92343267970dccf2b136fdd2695a Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Fri, 30 Oct 2020 14:19:15 -0400 Subject: [PATCH 12/18] edits Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 142 +++++++++++++++++++++++++++++------- 1 file changed, 114 insertions(+), 28 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 6f1de4f4..b2b44a33 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -89,7 +89,7 @@ A user creates a PVC: apiVersion: v1 kind: PersistentVolumeClaim metadata: - name: empty-pvc + name: volume-guid # The volume ID that Concourse Web keeps track of spec: accessModes: - ReadWriteOnce # the only accessMode we will support @@ -99,16 +99,127 @@ spec: storage: 1Gi # can be any value, we ignore this storageClassName: baggageclaim ``` -The [`external-provisioner`](https://github.com/kubernetes-csi/external-provisioner) will call `Controller.CreateVolume`. In this case `CreateVolume` will: +The [`external-provisioner`](https://github.com/kubernetes-csi/external-provisioner) will call `Controller.CreateVolume`. In this case `CreateVolume` will generate an ID for tracking the volume. + +With the PVC "created" (from the perspective of Kubernetes), a user can now reference the PVC in a Pod. + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: output +spec: + containers: + - name: output + ... + volumeMounts: + - name: artifact-name + mountPath: /tmp/artifact-name + volumes: + - name: artifact-name + persistentVolumeClaim: + claimName: volume-guid # must match the PVC's name above +``` + +`Controller.PublishVolume` will get called. This will be a no-op. + +`NodeStageVolume` will get called. This will be a no-op.
+ +`NodePublishVolume` will get called. Baggageclaim will create a volume based on the `EmptyStrategy`. + +```go +volume, err := ns.bagClient.CreateVolume(ns.logger, req.VolumeId, baggageclaim.VolumeSpec{ + Strategy: baggageclaim.EmptyStrategy{}, + Properties: map[string]string{}, +}) +``` + +The volume will then be mounted at the path provided in the `NodePublishVolumeRequest`: + +```go +mounter := mount.New("") +path := volume.Path() +targetPath := req.GetTargetPath() +options := []string{"bind"} +glog.V(4).Infof("concourse: mounting baggageclaim volume at %s", path) +if err := mounter.Mount(path, targetPath, "", options); err != nil { + return nil, err +} +```
+ +```go +if req.GetVolumeContentSource() != nil { + volumeSource := req.VolumeContentSource + switch volumeSource.Type.(type) { + case *csi.VolumeContentSource_Volume: + if srcVolume := volumeSource.GetVolume(); srcVolume != nil { + volumeContext["sourceVolumeID"] = srcVolume.GetVolumeId() + } + default: + status.Errorf(codes.InvalidArgument, "%v not a proper volume source", volumeSource) + } +} +``` + + ### Streaming Volumes Inside A Kubernetes Cluster ### Streaming Volumes To An External Baggageclaim -# Alternative Storage Options considered +# Open Questions + +- When we need to have a volume available on multiple k8s nodes, how do we do this in a baggageclaim CSI driver? + - Would it make sense to support `ReadWriteMany` as the volume's `accessMode` instead of `ReadWriteOnce`? +- What does the Concourse database model for volumes look like with a k8s worker running a baggageclaim CSI driver? +- How will the CSI driver stream a volume between k8s nodes? + - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)? +- What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) _StatefulSet appears to fit our usecase best_ +- For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? + + +# Answered Questions + +- Should we implement this as a CSI driver? **Yes we do after doing the CSI Driver POC Spike** +- Do we implement our own version of the csi-image-populator? **Yes but based on baggageclaim instead of image layers** + +# Related Links +- [Storage Spike](https://github.com/concourse/concourse/issues/6036) +- [Review k8s worker POC](https://github.com/concourse/concourse/issues/5986) +- [CSI Driver POC Spike](https://github.com/concourse/concourse/issues/6133) + + +# New Implications + +Will drive the rest of the Kubernetes runtime work. 
+ +--- + +# Appendix - Alternative Storage Options considered + +The following are some of the other Kubernetes storage options that we considered. ## Baggageclaim + CSI Implementation ### Description @@ -232,28 +343,3 @@ The CSI spec can be used to wrap every solution listed above. It provides an API ## Fuse This might simplify our usage of external storage solutions such as blobstores. There isn't a supported solution in K8s at the moment. However, this would be something worth considering if that were to change. [Click here to view the current issue requesting K8s development](https://github.com/kubernetes/kubernetes/issues/7890). -# Open Questions - -- When we need to have a volume available on multiple k8s nodes, how do we do this in a baggageclaim CSI driver? - - Would it make sense to support `ReadWriteMany` as the volume's `accessMode` instead of `ReadWriteOnce`? -- What does the Concourse database model for volumes look like with a k8s worker running a baggageclaim CSI driver? -- How will the CSI driver stream a volume between k8s nodes? - - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)? -- What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) _StatefulSet appears to fit our usecase best_ -- For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? - - -# Answered Questions - -- Should we implement this as a CSI driver? **Yes we do after doing the CSI Driver POC Spike** -- Do we implement our own version of the csi-image-populator?
**Yes but based on baggageclaim instead of image layers** - -# Related Links -- [Storage Spike](https://github.com/concourse/concourse/issues/6036) -- [Review k8s worker POC](https://github.com/concourse/concourse/issues/5986) -- [CSI Driver POC Spike](https://github.com/concourse/concourse/issues/6133) - - -# New Implications - -Will drive the rest of the Kubernetes runtime work. From 349287e0b74450578afbf2a3d17dcad222d2deb6 Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Fri, 30 Oct 2020 15:36:02 -0400 Subject: [PATCH 13/18] more edits Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index b2b44a33..1a038a61 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -134,8 +134,9 @@ volume, err := ns.bagClient.CreateVolume(ns.logger, req.VolumeId, baggageclaim.V Properties: map[string]string{}, }) ``` +_This could also be done in `NodeStageVolume`_ -The volume will then be mounted at the path provided in the `NodePublishVolumeRequest`: +Still in `NodePublishVolume`, the volume will then be mounted at the path provided in the `NodePublishVolumeRequest`: ```go mounter := mount.New("") path := volume.Path() targetPath := req.GetTargetPath() options := []string{"bind"} @@ -148,6 +149,8 @@ if err := mounter.Mount(path, targetPath, "", options); err != nil { } } +The volume has been successfully provided to Kubernetes by this point. + ### Creating A Cloned Volume ```yaml @@ -184,6 +187,24 @@ if req.GetVolumeContentSource() != nil { } } +`Controller.PublishVolume` will get called. This will be a no-op. + +`NodeStageVolume` will get called. This will be a no-op. + +`NodePublishVolume` will get called. Baggageclaim will create a volume based on the `COWStrategy`, fetching the parent volume from `VolumeContext`.
+ +```go +id, _ := volumeContext["sourceVolumeID"] +sourceVolume, _, _ := ns.bagClient.LookupVolume(ns.logger, id) + +volume, err := ns.bagClient.CreateVolume(ns.logger, req.VolumeId, baggageclaim.VolumeSpec{ + Strategy: baggageclaim.COWStrategy{Parent: sourceVolume}, + Properties: map[string]string{}, +}) +``` +_This could also be done in `NodeStageVolume`_ + +Still in `NodePublishVolume`, the volume will then be mounted at the path provided in the `NodePublishVolumeRequest`. The volume, populated with data from the parent PVC, has been successfully provided to Kubernetes by this point. ### Streaming Volumes Inside A Kubernetes Cluster From ddea43cc7645028192e5d4d48568bcd60dd80a17 Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Fri, 30 Oct 2020 16:02:01 -0400 Subject: [PATCH 14/18] add some questions Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 1a038a61..39b2b622 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -31,6 +31,8 @@ ## Level Setting +Before getting into the meat of the proposal let's first level set our understanding of [baggageclaim]() and the [CSI spec](). + ### What does Baggageclaim do? Baggageclaim comes as two components: a client and a server communicating over an HTTP REST API. The server component manages volumes within a specified directory on the host. @@ -219,6 +221,7 @@ Still in `NodePublishVolume`, the volume will then be mounted at the path provid - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)? - What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.)
_StatefulSet appears to fit our usecase best_ - For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? +- Does the CSI driver need to be aware of each Concourse cluster that is using it? Another way of phrasing this question: can/should the CSI driver support multiple concourse installations? Do we need to do anything special to support this if we decide yes? # Answered Questions From c21e215b13ac3c1621afd6e03784f7a3f9d9934b Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Fri, 30 Oct 2020 16:06:07 -0400 Subject: [PATCH 15/18] Add question about streaming files Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 1 + 1 file changed, 1 insertion(+) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 39b2b622..29e2ecc2 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -219,6 +219,7 @@ Still in `NodePublishVolume`, the volume will then be mounted at the path provid - What does the Concourse database model for volumes look like with a k8s worker running a baggageclaim CSI driver? - How will the CSI driver stream a volume between k8s nodes? - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)? + - How will we stream single files in a volume? (i.e. when Concourse needs to read a task config from the artifact of a get step) - What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) _StatefulSet appears to fit our usecase best_ - For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? - Does the CSI driver need to be aware of each Concourse cluster that is using it? Another way of phrasing this question: can/should the CSI driver support multiple concourse installations? Do we need to do anything special to support this if we decide yes? 
From fc2fd1f8f76820131a41e24298f7f87309eff722 Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Fri, 30 Oct 2020 16:10:48 -0400 Subject: [PATCH 16/18] more q's Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 29e2ecc2..585dae95 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -219,10 +219,11 @@ Still in `NodePublishVolume`, the volume will then be mounted at the path provid - What does the Concourse database model for volumes look like with a k8s worker running a baggageclaim CSI driver? - How will the CSI driver stream a volume between k8s nodes? - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)? - - How will we stream single files in a volume? (i.e. when Concourse needs to read a task config from the artifact of a get step) + - How will we stream single files in a volume? (i.e. when Concourse needs to read a task config from the artifact of a get step) (maybe this is an open question for the runtime RFC) - What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) _StatefulSet appears to fit our usecase best_ - For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? - Does the CSI driver need to be aware of each Concourse cluster that is using it? Another way of phrasing this question: can/should the CSI driver support multiple concourse installations? Do we need to do anything special to support this if we decide yes? +- Do we need to modify baggageclaim for any reason? 
# Answered Questions From 421d0ae2c87162d9db75a8444fefb35f99e4c1fd Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Fri, 30 Oct 2020 17:37:53 -0400 Subject: [PATCH 17/18] streaming questions Signed-off-by: Taylor Silva --- 074-k8s-storage/proposal.md | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 585dae95..7e1eef65 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -78,15 +78,15 @@ Targeting Kubernetes Version 1.19 Follow the recommended deployment strategy from the Kubernetes team [described in this design document](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#recommended-mechanism-for-deploying-csi-drivers-on-kubernetes) with the following differences: - no `external-resizer` container. Not planning to support resizing. -- no `external-snapshotter` container. We will use the `CLONE_VOLUME` feature to create COW volumes in baggageclaim instead of trying to use snapshots. +- no `external-snapshotter` container. The baggageclaim CSI driver will use the `CLONE_VOLUME` feature to create COW volumes in baggageclaim instead of trying to use snapshots. - An extra volume must be mounted for each replica Pod in the DaemonSet. This volume, which should be very large, will be used by baggageclaim to store the volumes that it creates on each Kubernetes node. -- We plan to **not guarantee** the requested storage capacity because we have no idea how much space any given step in Concourse will use. Kubernetes will force us to specify a storage request but our CSI driver will ignore this value. This goes against the CSI spec. +- The baggageclaim CSI driver will **not guarantee** the requested storage capacity because Concourse has no idea how much space any given step will use, so Concourse expects the CSI driver to operate under this assumption.
Kubernetes will force us to specify a storage request but the baggageclaim CSI driver will ignore this value. This goes against the CSI spec. Let's go over some use cases to get an understanding about how the implementation may work. ### Creating An Empty Volume -A user creates a PVC: +Concourse creates a PVC: ```yaml apiVersion: v1 kind: PersistentVolumeClaim @@ -104,7 +104,7 @@ spec: The [`external-provisioner`](https://github.com/kubernetes-csi/external-provisioner) will call `Controller.CreateVolume`. In this case `CreateVolume` will generate an ID for tracking the volume. -With the PVC "created" (from the perspective of Kubernetes), a user can now reference the PVC in a Pod. +With the PVC "created" (from the perspective of Kubernetes), Concourse can now reference the PVC in a Pod. ```yaml apiVersion: v1 @@ -155,6 +155,7 @@ The volume has been successfully provided to Kubernetes by this point. ### Creating A Cloned Volume +Concourse creates a PVC that references another PVC in the `spec.dataSource` field: ```yaml apiVersion: v1 kind: PersistentVolumeClaim @@ -168,7 +169,7 @@ spec: requests: storage: 1Gi storageClassName: baggageclaim - dataSource: # will only support cloning other baggageclaim volumes + dataSource: name: some-other-pvc kind: PersistentVolumeClaim ``` @@ -210,18 +211,23 @@ Still in `NodePublishVolume`, the volume will then be mounted at the path provid ### Streaming Volumes Inside A Kubernetes Cluster -### Streaming Volumes To An External Baggageclaim +### Streaming Volumes To And From An External Baggageclaim +Concourse needs to be able to stream volumes between workers. Our entire storage solution needs to take this into consideration. This use-case could be tackled by the CSI driver itself or by an external component such as Concourse web nodes. + +Here are two potential paths Concourse could take to address this use-case. More ideas are welcomed! +1. Use the Kubernetes API. Streaming is facilitated by Concourse Web nodes. 
+ - Use the same packages/functions that are used by `kubectl cp` to stream volume contents out (entire volume or single files) + - Populate a volume with content from another worker somehow, maybe via a pod acting like a get step +2. Expose Baggageclaim's streaming endpoints for Concourse web to reach similar to how Concourse currently connects to Baggageclaim on workers. Still facilitated by Concourse Web Nodes but have the CSI driver play a more direct role in the streaming of bits. # Open Questions -- When we need to have a volume available on multiple k8s nodes, how do we do this in a baggageclaim CSI driver? +- When the CSI driver needs to have a volume available on multiple k8s nodes, how does the CSI driver accomplish this? - Would it make sense to support `ReadWriteMany` as the volume's `accessMode` instead of `ReadWriteOnce`? - What does the Concourse database model for volumes look like with a k8s worker running a baggageclaim CSI driver? - How will the CSI driver stream a volume between k8s nodes? - What is the recommended way for a CSI controller to maintain state and know which volume is on which node(s)? - - How will we stream single files in a volume? (i.e. when Concourse needs to read a task config from the artifact of a get step) (maybe this is an open question for the runtime RFC) -- What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) _StatefulSet appears to fit our usecase best_ -- For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? + - How will Concourse stream single files in a volume? (i.e. when Concourse needs to read a task config from the artifact of a get step) - Does the CSI driver need to be aware of each Concourse cluster that is using it? Another way of phrasing this question: can/should the CSI driver support multiple concourse installations? Do we need to do anything special to support this if we decide yes? 
- Do we need to modify baggageclaim for any reason? @@ -230,6 +236,8 @@ Still in `NodePublishVolume`, the volume will then be mounted at the path provid - Should we implement this as a CSI driver? **Yes we do after doing the CSI Driver POC Spike** - Do we implement our own version of the csi-image-populator? **Yes but based on baggageclaim instead of image layers** +- What is the recommended way to deploy the CSI driver? (e.g. StatefulSet, DaemonSet, etc.) **StatefulSet appears to fit our use case best** +- For volume streaming, should we go for the in-cluster P2P solution or stick with streaming through the Concourse web nodes? **We will do in-cluster P2P between Kubernetes nodes and stick with going through the Concourse web nodes for streaming to external workers as our first pass** # Related Links - [Storage Spike](https://github.com/concourse/concourse/issues/6036) - [Review k8s worker POC](https://github.com/concourse/concourse/issues/5986) - [CSI Driver POC Spike](https://github.com/concourse/concourse/issues/6133) From 00a363a2e43a345bba8205ff78fe87eeae4fa513 Mon Sep 17 00:00:00 2001 From: Taylor Silva Date: Tue, 1 Dec 2020 14:56:35 -0500 Subject: [PATCH 18/18] Add question about a baggageclaim image registry --- 074-k8s-storage/proposal.md | 1 + 1 file changed, 1 insertion(+) diff --git a/074-k8s-storage/proposal.md b/074-k8s-storage/proposal.md index 7e1eef65..a4366601 100644 --- a/074-k8s-storage/proposal.md +++ b/074-k8s-storage/proposal.md @@ -230,6 +230,7 @@ Here are two potential paths Concourse could take to address this use-case. More - How will Concourse stream single files in a volume? (i.e. when Concourse needs to read a task config from the artifact of a get step) - Does the CSI driver need to be aware of each Concourse cluster that is using it? Another way of phrasing this question: can/should the CSI driver support multiple concourse installations? Do we need to do anything special to support this if we decide yes? - Do we need to modify baggageclaim for any reason? +- Adding image registry endpoints to baggageclaim so we can still support using images from a previous step?
We were thinking we could add the endpoints like Ciro did in the POC. Our baggageclaim pods will already be on each k8s node in a privileged volume. Would it be wrong to add a cert for each baggageclaim registry so that the CRIs on all nodes can reach our baggageclaim image registry? [Trow has figured out how to add a cert](https://github.com/ContainerSolutions/trow/blob/7e3187edbdc8c37c836a2cdc133fd6c68289b11e/quick-install/install.sh#L138-L146) # Answered Questions