Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CephObjectStoreUser Reference Secret #11563

Open
jameshearttech opened this issue Jan 18, 2023 · 44 comments · May be fixed by #14001
Open

CephObjectStoreUser Reference Secret #11563

jameshearttech opened this issue Jan 18, 2023 · 44 comments · May be fixed by #14001
Assignees
Labels
Projects

Comments

@jameshearttech
Copy link

jameshearttech commented Jan 18, 2023

Is this a bug report or feature request?

  • Feature Request

What should the feature do:

  • CephObjectStoreUser should optionally be able to reference a secret where S3 key is defined. This enables us to specify the accesskey and accesssecret rather than those values being randomly generated. When the secret changes the CephObjectStoreUser should be updated.

What is use case behind this feature:

  • Define CephObjectStoreUser S3 key properties (e.g., accesskey and accesssecret) so we can use those same values to configure access to Ceph object storage bucket in an automated way (e.g., using ArgoCD and External Secrets Operator).

Environment:

$ kubectl get nodes -o wide
NAME          STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
dev-master0   Ready    control-plane   2d23h   v1.26.0   10.69.2.30    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-master1   Ready    control-plane   2d23h   v1.26.0   10.69.2.31    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-master2   Ready    control-plane   2d23h   v1.26.0   10.69.2.32    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker0   Ready    <none>          2d23h   v1.26.0   10.69.2.33    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker1   Ready    <none>          2d23h   v1.26.0   10.69.2.34    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker2   Ready    <none>          2d23h   v1.26.0   10.69.2.35    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker3   Ready    <none>          2d23h   v1.26.0   10.69.2.36    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
@divaspathak
Copy link

Hello @jameshearttech, I want to work on this issue. Can you please assign it to me?

@subhamkrai
Copy link
Contributor

@jameshearttech we do store accesskey and accesssecret. what are you requesting here? you suggesting to objectstoreuser should point to someSecret for this values?

@parth-gr
Copy link
Member

Probably they are asking similar to this #11336

@thotz
Copy link
Contributor

thotz commented Jan 23, 2023

Probably they are asking similar to this #11336

Yes I guess so, @jameshearttech can you please check above PR owner about its status?

@jameshearttech
Copy link
Author

jameshearttech commented Jan 23, 2023

@jameshearttech we do store accesskey and accesssecret. what are you requesting here? you suggesting to objectstoreuser should point to someSecret for this values?

Yes, exactly.

https://rook-io.slack.com/archives/CK9CF5H2R/p1674075543129569

In my case everything is committed to Git. Storing the accesskey and accesssecret in the CephObjectUserStore resource would potentially leak those sensitive values.

Btw, I don't think the docs reflect spec.accessKey and spec.secretKey if those are already implemented.

@zhucan
Copy link
Member

zhucan commented Mar 14, 2023

is there any new progress? @jameshearttech @thotz @divaspathak @subhamkrai @parth-gr

@thotz
Copy link
Contributor

thotz commented Mar 14, 2023

I thought this issue was related to #11336

@jameshearttech
Copy link
Author

jameshearttech commented Mar 14, 2023

Yeah, they are definitely related. If it makes sense, feel free to close this one.

EDIT: Oh, wait, I see the other issue was closed. The other issue has a lot of work done on it. In my case I was able to use block instead of bucket. Regardless, I think there should be a solution to this issue.

EDIT2: In my last edit I was setting up Loki and I thought that it only worked with object storage, but I found I could use file storage; however, that only works with single binary mode. We are looking to move to simple scaling mode, which requires object storage so this is now a block for me. Likewise, we are looking at implementing Thanos, which also requires object storage.

@devops-42
Copy link

devops-42 commented Mar 23, 2023

Hi,

I'd appreciate the approach of referring to existing k8s secrets. This would decouple the direct communication from Ceph to a KMS like Vault. One could implement K8s secrets by themselve and create CephObjectStoreUser instances referencing to an existing K8s secret.

@jhoblitt
Copy link
Contributor

jhoblitt commented Apr 3, 2023

I'm interested in this feature as well. I have rgw clients which are external to k8s. It would be great if I could use something like external-secrets operator to sync rgw user creds. This would be an alternative to using ldap auth.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@jhoblitt
Copy link
Contributor

This is a feature request we should probably keep open.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@travisn travisn added good-first-issue Simple issues that are good for getting started with Rook. and removed wontfix labels Aug 11, 2023
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@jameshearttech
Copy link
Author

jameshearttech commented Oct 19, 2023

Please do not close feature request. Afaik this has not been implemented.

@travisn
Copy link
Member

travisn commented Oct 19, 2023

A few notes on implementing this feature for someone to pick it up:
In the createOrUpdateCephUser() method, on line 298 if the GetUser() returns ErrNoSuchUser, then we need to:

  1. Query if the K8s secret exists where Rook would later store the user creds.
  2. If the secret exists and the expected keys exist (see example below), then set those values on the r.userConfig struct that is being passed to the CreateUser() method.
  3. Assuming Ceph accepts the user secret as input, the user would be created with the pre-defined keys.

Here is an example user secret which is named rook-ceph-object-user-<store>-<user>:

% kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o yaml
apiVersion: v1
kind: Secret
data:
  AccessKey: U0lVNVJQTVg0VFIwMTI1QkxGSjc=
  SecretKey: aHdnbGdIVmlDcEJqTHN3WUJDakJqWlBRQ2tzeEM3dkRuRDdXMTNySw==
  Endpoint: aHR0cDovL3Jvb2stY2VwaC1yZ3ctbXktc3RvcmUucm9vay1jZXBoLnN2Yzo4MA==

If this secret is created in advance of the reconcile, then Rook would pick it up.

Note that since this issue was originally opened, now the user and secret could be created in any namespace since #12730 was implemented.

@subhamkrai subhamkrai self-assigned this Oct 20, 2023
@github-actions github-actions bot removed the wontfix label Oct 20, 2023
@subhamkrai subhamkrai removed their assignment Dec 20, 2023
@subhamkrai
Copy link
Contributor

Removing the assigning as I'm focused on some other issue and I don't want to hold this to my name for long

@subhamkrai subhamkrai added this to To do in v1.13 via automation Dec 20, 2023
@subhamkrai
Copy link
Contributor

Also added it to project board 1.13 to get more attention

parth-gr added a commit to parth-gr/rook that referenced this issue Apr 19, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 19, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 19, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 19, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 22, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 22, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 22, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 23, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 23, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 23, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 29, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
parth-gr added a commit to parth-gr/rook that referenced this issue Apr 30, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
@BlaineEXE
Copy link
Member

I've been investigating key rotation for the COSI design updates as well, and after speaking with Josh on a call and collecting more information, I think we might want to revisit the design discussion for this Rook feature as well. It's tiring to get 90% of the way to completion with design and code and then have to rework it, but I think it may be worthwhile in this case.

From my discussion with Josh, the biggest gap that I think we have with the implementation proposed is that users don't have a good way to determine when the key rotation has completed. Josh has over a dozen k8s clusters that he manages, and that may grow to 2 dozen in not too long. In large environments like that, being able to know definitively when a key has been rotated is important. Status/monitoring/feedback is also a major tenet of the k8s design philosophy as well, and I missed it when proposing a design for this feature.

I'd like to talk about the details more in a Rook huddle next week, but the summary of my proposed new design is to not reuse the current secret for both input and output. I think it's simpler for developers and for users if we use a dedicated input secret, and Rook always uses the existing secret for output.

As for monitoring, I propose to achieve that by recording the metadata.generation field of the user input secret as a status field on the COSUser CR. Users can then easily use k8s tooling to compare the two values to know when the key rotation is complete.

I'd like to discuss this as a general idea before we start changing the code, so we don't burn ourselves out when reworking this. I think we can reuse a large part of the work we've already done on specifying the secret formatting, so hopefully we can limit that aspect. If that sounds good, we can iron out smaller details regarding COSUser spec/status API fields.

@parth-gr
Copy link
Member

parth-gr commented May 8, 2024

So for implementation:

  1. Keep the original rgw secret as it is for the status.

  2. If a user wants to provide their own credentials or do key rotation.
    i) Provide the user secret name in the objectuser CR
    CR will have a spec,

        spec:
          userSecretName: my-secret
    

    ii) Create or update the secret with this name

         apiVersion: v1
         kind: Secret
         metadata:
              name: my-secret
         data:
           AccessKey: U0lVNVJQTVg0VFIwMTI1QkxGSjc=
           SecretKey: aHdnbGdIVmlDcEJqTHN3WUJDakJqWlBRQ2tzeEM3dkRuRDdXMTNySw==
           Endpoint: aHR0cDovL3Jvb2stY2VwaC1yZ3ctbXktc3RvcmUucm9vay1jZXBoLnN2Yzo4MA==
    

    iii) The data fields remain the same as it was before
    iv) If the reconciliation of cephobjectstore user was successful and credentials were updated,
    Add the status to the cephobjectstore user CR, my-user secret creds successfully updated
    v) Also update the original rgw secret with the AccessKey and SecretKey.

@BlaineEXE please review is that what we need to do?

@BlaineEXE
Copy link
Member

The key reason to make this change is to make sure we are also planning a way for users to get feedback about the key rotation process. We need to also plan the status fields. I would suggest this to be something like this.

status:
  # ...
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
    secretGeneration: <the generation of the secret Rook last used to apply creds>

Having secretName and secretGeneration are both important. The first allows the user to change the secret used, and ensure Rook applied the correct one. The second allows the user to know when the current secret matches what Rook has applied.

From my discussion with Josh, I think it is also still good to plan for a way that users can have multiple active credentials. Josh explained this to be important to him, as well as other admins. It allows a migration window for end users so that key rotation isn't something that causes application downtime.

I think we should still plan to include the Credentials field in both the output secret and input secret to allow for this. (We could use multiple input credentials, but I think having a single input secret will mean that status comparison is easier for users.)

Also, I have another concern: Because the resource is called CephObjectStoreUser, I think we should refrain from choosing userSecretName. IMO, this naming overlap allows too much room for misunderstanding.

I also think we might consider using the a new section and the inputCredentials.secretName pattern for the input spec. This helps clarify for users that we are inputting credentials, and that it takes a secret name. I don't think this is strictly necessary, but by creating a new section, it allows us flexibility in the future to add more fields to inputCredentials if we need to, without bloating the main spec. I'm curious to hear @travisn 's preferences here. I could go either way.

If we keep it in the top level, inputCredentialsSecretName would be the most clear naming, though inputSecretName would probably be fine. I would steer clear of credentialsSecretName since that might also be interpreted as the config changes the name of the secret Rook uses for output.

@travisn
Copy link
Member

travisn commented May 9, 2024

If we're going to use a different secret for input, that needs to be on the spec, right? Then the status would just have the generation of that secret that has been evaluated, something like this?

spec:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
status:
  inputCredentials:
    secretGeneration: <the generation of the secret Rook last used to apply creds>

And I'm not clear what the proposal is for specifying multiple sets of creds? For simplicity it's nice to have the secret values match the output secret values, but if multiple sets of creds need to be specified, how are additional creds specified?

@parth-gr
Copy link
Member

@BlaineEXE user secret should not provide Endpoint data, right?

parth-gr added a commit to parth-gr/rook that referenced this issue May 15, 2024
CephObjectStoreUser should optionally be able to reference
a secret where S3 key is defined. This enables us to
specify the accesskey and accesssecret rather than those
values being randomly generated.

Closes: rook#11563

Signed-off-by: parth-gr <[email protected]>
@BlaineEXE
Copy link
Member

BlaineEXE user secret should not provide Endpoint data, right?

Correct. The input secret should only contain credentials.

If we're going to use a different secret for input, that needs to be on the spec, right? Then the status would just have the generation of that secret that has been evaluated, something like this?

spec:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
status:
  inputCredentials:
    secretGeneration: <the generation of the secret Rook last used to apply creds>

And I'm not clear what the proposal is for specifying multiple sets of creds? For simplicity it's nice to have the secret values match the output secret values, but if multiple sets of creds need to be specified, how are additional creds specified?

The secret name should also be reflected in the status in case the user changes the secret name in the spec. If a user creates a new secret each time, then the generation may always be 1, and there will be ambiguity as to whether the correct credentials are applied.

spec:
  inputCredentials:
    secretName: <the name of the secret the user wishes credentials to be applied from>
status:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
    secretGeneration: <the generation of the secret Rook last used to apply creds>

@parth-gr
Copy link
Member

@BlaineEXE some more questions on the status field,

status:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
    secretGeneration: <the generation of the secret Rook last used to apply creds>

So wanted to understand what should be the status on different events,

  1. If the reconciliation is started and secret exists -> secretName: my-secret, secretGeneration:-1
  2. If the reconciliation is completed succesfully and secret exists -> secretName: my-secret, secretGeneration: (generation value of secret)
  3. If the reconciliation is completed with a failure and secret exists -> secretName: my-secret, secretGeneration: (not sure?)
  4. If the reconciliation is completed succesfully and the secret doesn't exists -> secretName: -1, secretGeneration:-1
  5. If the reconciliation is started and the secret doesn't exists -> secretName: -1, secretGeneration:-1
  6. If the reconciliation is completed with failure and the secret doesn't exists -> secretName: -1, secretGeneration:-1

@BlaineEXE
Copy link
Member

These are great questions. :)

The status should reflect what the current reconcile has completed, unless the status item is specifically intended to reflect something about the ongoing reconcile. Since we aren't telling the user about the reconcile, it means we should update this status only after modifying things related to it.

When planning for the behavior of when this particular data gets updated in the status, we need to consider all cases:

  1. user changes inputCredentials.secretName to a different value
    • special case is when the user first sets this: i.e., "" -> "something"
    • different special case is when the user removes this: i.e., "something" -> ""
  2. user modifies the secret itself
  3. nothing secret-related has changed (i.e. a CephObjectStoreUser change that is unrelated to the secret)

Reconciliation routines work best when the logic used in code isn't unique for any of the cases. In this case, there is no need for us to use special logic for any of the cases. But we do need to consider all 3 of the cases I mentioned when deciding how to implement the behavior.

  • If the reconciliation is started and secret exists -> secretName: my-secret, secretGeneration:-1

We should not update these status items like this on every reconcile. In the no.3 case where nothing secret-related has changed, changing the status like this will only confuse users.

IMO, we can consider changing the status to be this immediately before Rook attempts to update the credentials, for cases no.1 and no.2. Doing so would be an indicator of what Rook is attempting to do (update the creds), and communicates that the underlying user credential status is undefined. However, once we update these statuses to reflect undefined internal state, we can't go back.

In my experience, sometimes the RGW will time out or will return an error but will still have made some internal changes. I think it would be good for Rook to indicate undefined state like this, but we should only set it immediately before RGW credentials are going to be modified.

  • If the reconciliation is completed succesfully and secret exists -> secretName: my-secret, secretGeneration: (generation value of secret)

Yes. But let's also be clear that this also assumes the user has specified an input secret. And be sure to remember that this should not be the values of the secret at the time the status is applied. The fields should reflect the secret name and generation that were applied to Ceph, even if they are behind the current secret name/value.

  • If the reconciliation is completed with a failure and secret exists -> secretName: my-secret, secretGeneration: (not sure?)

Let's also be clear that this also assumes the user has specified an input secret.

Failures can have different sources. If the reconcile fails before Rook attempts to update the credentials, there is no need to change these status fields.

If there is an error while applying the user secret data to the user, then we should not update the fields either. This is because we have already (presumably) set the status such that it indicates an undefined result. When RGW says that it failed to do the update, we don't truly know what the internal state is.

  • If the reconciliation is completed succesfully and the secret doesn't exists -> secretName: -1, secretGeneration:-1

Mostly yes, but I would use more straightforward wording: if the reconciliation is completed, and the user hasn't specified inputCredentials.secretName. Also, in the case where the user hasn't set (or has recently removed the inputs), the status should just be the default value for both items, so: secretName: "", secretGeneration: 0

  • If the reconciliation is started and the secret doesn't exists -> secretName: -1, secretGeneration:-1

And the user has specified an input secret.

This is an interesting case. When thinking about this, I am trying to determine what will be the best for the user experience. Presumably, the user will have an alert that checks whether the status Rook is outputting matches the secret used for input creds.

With that case in mind, it probably is good to set these such that they will throw an error Rook can't clear up the issue.

However, case no.1 is the complicating one. What happens if the user changes the value from "secret-1" to "secret-2"?

From a behavioral standpoint, Rook isn't going to be able to apply creds from a non-existent secret. That means that the underlying user's credentials are known by Rook, and they are known to be the currently-set values. Rook should not update the status in this case. The status should remain at secretName: "secret-1", secretGeneration: <previous-value>

  • If the reconciliation is completed with failure and the secret doesn't exists -> secretName: -1, secretGeneration:-1

This shouldn't happen, and we don't need to handle this condition. If the secret doesn't exist, that is already a failure.


I am trying to focus on and push all of the Rook devs to write less complex and fully-idempotent reconciliation routines and reconciliation logic. With that in mind, I will propose some pseudocode below that I think matches the enumerated cases above.

These are good focus points:

  • use as few conditional branches as possible
  • conditional branches should be as short as possible (keep happy path on the left)
  • conditionals should largely be based on (1) input config, and (2) error status
  • avoid conditionals based on edge conditions where possible
    • this specially goes for the edge condition "if this is the first reconcile"; it should be possible to encode logic for the first, Nth, and last times a feature is used using the same code logic
  • reconciliation routines should always refresh their current "view" of the inputs and system states every reconcile -- reconciles should not store state/status that gets used in future reconciles
ReconcileCredentials()
  if spec.inputCredentials.inputSecret is set:
     if inputSecret doesn't exist:
       return error // we know that the internal state is still the previously set secret

     secretObj, error = getSecret(inputSecret)
     if error:
       return error // we know that the internal state is still the previously set secret
        
     // once rook begins to update the user creds, we assume an undefined internal state
     // even if RGW returns failure (or timeout), sometimes there is a partial internal update
     set status.secretName = inputSecret, status.secretGeneration=-1 
     
     error = setUserCreds(user, creds(secretObj.data))
     if error:
        return error // no need to change status since undefined state will indicate a problem
  endif
  
  // regardless of whether the input secret is used, Rook always gets the current internal creds to apply to the reference secret
  currentCreds = getUserCreds(user)
  error = updateReferenceSecret(currentCreds)
  if error:
    // in case where inputSecret is present, there is no need to change status since undefined state will indicate a problem
    return error 
  
  if spec.inputCredentials.inputSecret is not set:
    set status.secretName = "", staus.secretGeneration=0
    return success
    
  if currentCreds != creds(inutSecret.data):
    return error(input secret was not applied successfully) // keep undefined status to indicate a problem
    
  set status.secretName = inputSecret, status.secretGeneration = secretObj.generation
  
  return success

Notice that we don't actually have to have conditionals for every one of the cases you have mentioned. The code itself is simpler. But it is still a good exercise to enumerate all of the different input and error conditions beforehand, so we know whether the code we produce is doing the right thing in all cases. It is also good practice to indicate in code comments what Rook is choosing to do (or not to do) based on the enumerated conditions we planned.

@BlaineEXE
Copy link
Member

BlaineEXE commented May 20, 2024

I did some looking locally, and it looks like not all k8s resources utilize the metadata.generation field. Secrets in particular seem to not update this. We will have to use another metric for determining if Rook has applied the current version.

The next best candidate is resourceVersion. This is documented to be

the version of that resource as stored in the underlying persistence layer

That doesn't make any special guarantees that the field won't change when the user hasn't made any updates, but it seems like the persistently stored version shouldn't change randomly, especially of a Secret, which doesn't have a status field. For resources that have a Status, I would bet that a Status change changes the resourceVersion but doesn't change the generation.

In my test cluster, I haven't seen a secret randomly get a new resourceVersion after 30+ minutes. And I have observed the resourceVersion change when I add labels/annotations to the secret.

I think that bodes well for a plan to use resourceVersion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
v1.14
To do
Development

Successfully merging a pull request may close this issue.

10 participants