What conventions should we adopt for setting default values for resources? #286

jsanda · 2021-01-27T16:25:21Z

jsanda
Jan 27, 2021
Maintainer

Type of question
We have (or had) multiple tickets that involve setting defaults for things like container memory and heap size. There have been scattered discussions on the topic, so I thought it would be good to capture them in a single place and figure out what conventions we want to adopt.

For reference here are some of the related tickets:

The discussion primarily focuses on Cassandra and Stargate, but it does apply to the other applications we deploy. Here are several questions that we need to consider:

- Should we specify defaults for memory, JVM heap, cpu?
- If we do not specify defaults, what settings will the application use?
  - In some cases, the application defaults are not good. Cassandra's default for the heap new generation size for example is typically bad.
- If we specify defaults, what type of defaults should we use?
  - Should they be minimum requirements?
  - Should they target a dev environment?
- Is it ok if we specify default settings in some places and not others? Would it be confusing for example if we specify default heap settings for Cassandra and not for Stargate (or vice versa)?

As we are quickly approaching a 1.0 release, I am not necessarily suggesting we have all this figured out in the 1.0 time frame. I do want to start the discussion now though, as it is coming up in multiple places.

I also want to note that there is a desire to be opinionated in k8ssandra. That is a way we can provide value add with the project.

jdonenine · 2021-01-27T16:31:14Z

jdonenine
Jan 27, 2021

I would definitely support the idea that we should be opinionated and provide best-practice-based defaults where we can.

0 replies

jakerobb · 2021-01-27T16:48:25Z

jakerobb
Jan 27, 2021

In my opinion:

- We should specify defaults where the underlying thing we're configuring has unreasonable or inconsistent defaults by itself.
  - Cassandra's defaults, as John mentioned above, are what I'm thinking of when I say "unreasonable"
  - The JVM's behavior when heap size is not specified is what I'm thinking of when I say "inconsistent" -- it will run with different specs on different machines.
- We should include defaults for any setting required to get up and running.
- Our defaults should target a developer running k8ssandra on a local machine using kind/k3d/minikube.
  - One C*, one Stargate, ingress enabled, smallest reasonable memory footprint, etc.
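
To make the "inconsistent" point concrete, here is a minimal sketch of how heap sizes end up varying by machine when nothing is specified. The formulas are assumptions: a HotSpot default of roughly a quarter of visible memory, and my recollection of the heap calculation in the stock cassandra-env.sh script shipped with Cassandra 3.x. The function names and sample machines are hypothetical, so treat the numbers as illustrative rather than authoritative.

```python
from typing import Tuple


def jvm_default_max_heap_mb(system_memory_mb: int) -> int:
    """Assumed HotSpot behavior with no -Xmx: about a quarter of visible memory."""
    return system_memory_mb // 4


def cassandra_script_defaults_mb(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Return (max_heap_mb, new_gen_mb) per the assumed cassandra-env.sh logic."""
    half = min(system_memory_mb // 2, 1024)       # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)    # quarter of RAM, capped at 8 GB
    max_heap = max(half, quarter)                 # the larger of the two wins
    new_gen = min(100 * cpu_cores, max_heap // 4)  # 100 MB per core, at most heap/4
    return max_heap, new_gen


if __name__ == "__main__":
    # Hypothetical machines: (memory in MB, CPU cores).
    for memory_mb, cores in [(8192, 4), (16384, 8), (65536, 16)]:
        heap_mb, new_gen_mb = cassandra_script_defaults_mb(memory_mb, cores)
        print(memory_mb, cores, jvm_default_max_heap_mb(memory_mb), heap_mb, new_gen_mb)
```

The same unmodified configuration produces a different heap on each machine, which is the variation an explicit default in the charts would remove.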

My rationale for this is that one of the biggest obstacles to adoption of both Kubernetes and Cassandra is friction. Getting an environment configured and running at all, let alone well, is surprisingly hard. The more we do to streamline that experience, the more we are setting k8ssandra up to succeed. I would argue that streamlining the experience of running Cassandra on Kubernetes is k8ssandra's prime directive and sole reason for existence.

Minimizing friction for the developer is a priority over minimizing friction for use in production (or intermediate testing environments) because no matter what we choose, it's highly unlikely to be perfect for anyone.

In addition to defaults built into the charts, I think we should provide a small handful of "profiles": values.yaml override files with reasonable settings for a few different use scenarios. When people are ready to deploy k8ssandra beyond dev, they can use these files as a starting point and as examples. The files should be commented heavily with reasons and explanations for each setting we change from the defaults, including URLs of more detailed docs where appropriate.

I would further argue that at least a first stab at reasonable defaults for dev use is a worthy goal for 1.0. The profiles can come later.

0 replies

parham-pythian · 2021-01-27T18:23:55Z

parham-pythian
Jan 27, 2021

I like the idea of having best practices baked in, and config files defined by profile. I've seen this done as small, medium, large, for example.

When creating similar configuration defaults for Docker-based local clusters to be run on a variety of small/big laptops, I've defined the heap as low as 100M to allow for a nice number of nodes to play with (functional testing only). So I'll throw that number out as a suggestion for a small configuration.

0 replies

jsanda · 2021-01-27T18:43:43Z

jsanda
Jan 27, 2021
Maintainer Author

Our defaults should target a developer running k8ssandra on a local machine using kind/k3d/minikube.

I think this is reasonable, but I do want to point out that dev requirements != minimum requirements.

Our defaults should target a developer running k8ssandra on a local machine using kind/k3d/minikube.

I am split on this. On the one hand, I absolutely agree that we want to provide a smooth developer experience. On the other hand, I want to make sure we also give adequate attention to our target prod environments. One project I worked on in the past used an RDBMS backend. The project/product supported both Postgres and Oracle. Most prod deployments used Oracle. Most devs used Postgres because it is open source and easier to set up and use for development. We had a lot more Oracle-related bugs than Postgres-related bugs. The fact that more customers were running Oracle had a lot to do with that. If Oracle had been used more during development, some of those bugs would have been caught sooner.

One C*, one Stargate, ingress enabled, smallest reasonable memory footprint, etc.

I am 👎 on having the default dev environment consist of a one-node C* cluster. Cassandra is designed to run as a multi-node cluster. There are times when I think a single C* node is fine, like with some automated tests. In general though, I think we should be running multi-node clusters.

0 replies

jsanda · 2021-01-27T18:50:05Z

jsanda
Jan 27, 2021
Maintainer Author

In addition to defaults built into the charts, I think we should provide a small handful of "profiles": values.yaml override files with reasonable settings for a few different use scenarios.

This is an interesting idea. I am not sure how well we could achieve something like this with Helm. I would need to investigate a bit. I am pretty sure it is much more feasible though with helmfile.

While profiles is an intriguing idea, I am not sure if I would want to implement that on top of Helm.

0 replies

jsanda · 2021-01-27T18:53:56Z

jsanda
Jan 27, 2021
Maintainer Author

When creating similar configuration defaults for Docker-based local clusters to be run on a variety of small/big laptops, I've defined the heap as low as 100M to allow for a nice number of nodes to play with (functional testing only). So I'll throw that number out as a suggestion for a small configuration.

This topic is near and dear to me 😄 There are a lot of automated test scenarios where I do want multiple C* nodes, but I really only need them up and running and want to use as few resources as possible. This might be a good discussion for the Cassandra mailing list.

0 replies

jakerobb · 2021-01-27T19:29:26Z

jakerobb
Jan 27, 2021

I am 👎 on having the default dev environment consist of a one node C* cluster. Cassandra is designed to run as a multi-node cluster. There are times when I think a single C* node is fine, like with some automated tests. In general though, I think we should be running multi-node clusters

Understood; I just want to caution against conflating our defaults with what we (k8ssandra devs) work against most often. I'd want to have automated tests that run against the defaults as well as all "profiles". Or are you saying that a developer building an app on her laptop should be running multiple nodes against which to test her code?

This is an interesting idea. I am not sure how well we could achieve something like this with Helm. I would need to investigate a bit. I am pretty sure it is much more feasible though with helmfile.

While profiles is an intriguing idea, I am not sure if I would want to implement that on top of Helm.

Perhaps I'm missing something, but my suggestion was just that we bundle some customized values. helmfile seems interesting, but goes well beyond what I'm suggesting, and I'm not sure why it would be necessary.

I'll attempt to flesh out what I'm seeing in my head here for clarity. The codebase would look like this:

+ k8ssandra/
   - charts/
   - cmd/
   - docs/
   - ...
   + profiles/
      - small.yaml
      - medium.yaml
      - large.yaml

(To be clear: I'm not specifically advocating for small/medium/large as what we should name the profiles, nor for any sort of arbitrary description of size in general -- I'd be inclined to name them descriptively, e.g. three-nodes-with-stargate.yaml / seven-nodes-auth-disabled.yaml, or perhaps in terms of what they accomplish, e.g. high-availability-all-features-enabled.yaml / minimum-footprint.yaml. If things get complicated, we could organize them more using directory structure.)

Then, in the docs, we'd enumerate and describe each profile, and instruct the user to install them thus:

helm install mycluster k8ssandra/k8ssandra -f profiles/medium.yaml

Should they wish to fine tune, we would suggest (via the docs) that they copy a suitable profile, modify it to their taste, and then provide that at the command line. In this way we're not so much "building profiles on top of Helm" as we are using Helm to its fullest extent.
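
The layering described above (chart defaults, then a profile, then a user's own tweaks) can be sketched as a recursive merge in which later layers win. This is a simplified model of how Helm combines values files, not Helm's actual implementation: it ignores details such as null values deleting keys. The keys in the sample dictionaries are hypothetical and not taken from the k8ssandra charts.

```python
from copy import deepcopy
from typing import Any, Dict

Values = Dict[str, Any]


def merge_values(base: Values, override: Values) -> Values:
    """Return a new dict where override wins; nested maps merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale, not combined.
            merged[key] = deepcopy(value)
    return merged


def effective_values(*layers: Values) -> Values:
    """Apply layers left to right, so the rightmost layer has the final say."""
    result: Values = {}
    for layer in layers:
        result = merge_values(result, layer)
    return result


# Hypothetical keys, for illustration only.
CHART_DEFAULTS = {"cassandra": {"size": 1, "heap": {"size": "500M"}}, "stargate": {"enabled": True}}
MEDIUM_PROFILE = {"cassandra": {"size": 3, "heap": {"size": "4G", "newGenSize": "1G"}}}
USER_TWEAKS = {"cassandra": {"heap": {"size": "8G"}}}

if __name__ == "__main__":
    print(effective_values(CHART_DEFAULTS, MEDIUM_PROFILE, USER_TWEAKS))
```

Because the profile only names the settings it changes, anything it leaves out still falls through to the chart defaults.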

It would then be fairly straightforward for us to add/remove/modify profiles, develop against any of them (or the defaults), design an automated test suite which executes against all profiles (presumably with a mechanism to suppress certain tests for certain profiles, such that we're not testing multi-node stuff on single-node clusters), and refer to the profiles in our docs.
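
One possible shape for the suppression mechanism is for each test to declare what it needs and for the runner to skip it on profiles that cannot satisfy that. The sketch below is only an illustration under assumptions: the class names, fields, and test names are hypothetical, and the two profile names are borrowed from the examples earlier in this comment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Profile:
    name: str
    cassandra_nodes: int
    stargate_enabled: bool


@dataclass(frozen=True)
class Scenario:
    """A test plus the minimum environment it needs in order to be meaningful."""
    name: str
    min_nodes: int = 1
    needs_stargate: bool = False


def applicable(scenario: Scenario, profile: Profile) -> bool:
    """True when the profile provides everything the scenario requires."""
    enough_nodes = profile.cassandra_nodes >= scenario.min_nodes
    stargate_ok = profile.stargate_enabled or not scenario.needs_stargate
    return enough_nodes and stargate_ok


def plan(profiles: List[Profile], scenarios: List[Scenario]) -> Dict[str, List[str]]:
    """Map each profile name to the scenario names that should run against it."""
    return {p.name: [s.name for s in scenarios if applicable(s, p)] for p in profiles}


PROFILES = [
    Profile("minimum-footprint", cassandra_nodes=1, stargate_enabled=False),
    Profile("three-nodes-with-stargate", cassandra_nodes=3, stargate_enabled=True),
]
SCENARIOS = [
    Scenario("basic_read_write"),
    Scenario("node_replacement", min_nodes=3),
    Scenario("stargate_rest_api", needs_stargate=True),
]

if __name__ == "__main__":
    for profile_name, names in plan(PROFILES, SCENARIOS).items():
        print(profile_name, names)
```

Declaring requirements on the test side means adding or removing a profile does not require touching every test.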

Some (most?) users would presumably not be working from a clone of our repo, so I'm assuming in those cases that they'd simply download the profiles from GitHub via links they'd find in our docs.

0 replies

JeremiahDJordan · 2021-01-28T22:32:14Z

JeremiahDJordan
Jan 28, 2021

In some cases, the application defaults are not good. Cassandra's default for the heap new generation size for example is typically bad.

One aside here. If you think Cassandra does a bad job at picking defaults, then we should fix that upstream, not just override that with new defaults.

1 reply

jakerobb Jan 31, 2021

Agreed -- but we should set our own default until such time as that change gets included. Also, we support a handful of already-released versions of C* -- 3.11.6 is never going to have a good default, which means we need one regardless.
