CephObjectStoreUser Reference Secret #11563

jameshearttech · 2023-01-18T23:09:29Z

Is this a bug report or feature request?

Feature Request

What should the feature do:

CephObjectStoreUser should optionally be able to reference a secret where S3 key is defined. This enables us to specify the accesskey and accesssecret rather than those values being randomly generated. When the secret changes the CephObjectStoreUser should be updated.

What is use case behind this feature:

Define CephObjectStoreUser S3 key properties (e.g., accesskey and accesssecret) so we can use those same values to configure access to Ceph object storage bucket in an automated way (e.g., using ArgoCD and External Secrets Operator).

Environment:

$ kubectl get nodes -o wide
NAME          STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
dev-master0   Ready    control-plane   2d23h   v1.26.0   10.69.2.30    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-master1   Ready    control-plane   2d23h   v1.26.0   10.69.2.31    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-master2   Ready    control-plane   2d23h   v1.26.0   10.69.2.32    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker0   Ready    <none>          2d23h   v1.26.0   10.69.2.33    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker1   Ready    <none>          2d23h   v1.26.0   10.69.2.34    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker2   Ready    <none>          2d23h   v1.26.0   10.69.2.35    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15
dev-worker3   Ready    <none>          2d23h   v1.26.0   10.69.2.36    <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.6.15

The text was updated successfully, but these errors were encountered:

divaspathak · 2023-01-23T05:44:15Z

Hello @jameshearttech, I want to work on this issue. Can you please assign it to me?

subhamkrai · 2023-01-23T06:59:00Z

@jameshearttech we do store accesskey and accesssecret. what are you requesting here? you suggesting to objectstoreuser should point to someSecret for this values?

parth-gr · 2023-01-23T07:22:12Z

Probably they are asking similar to this #11336

thotz · 2023-01-23T10:36:53Z

Probably they are asking similar to this #11336

Yes I guess so, @jameshearttech can you please check above PR owner about its status?

jameshearttech · 2023-01-23T15:37:37Z

@jameshearttech we do store accesskey and accesssecret. what are you requesting here? you suggesting to objectstoreuser should point to someSecret for this values?

Yes, exactly.

https://rook-io.slack.com/archives/CK9CF5H2R/p1674075543129569

In my case everything is committed to Git. Storing the accesskey and accesssecret in the CephObjectUserStore resource would potentially leak those sensitive values.

Btw, I don't think the docs reflect spec.accessKey and spec.secretKey if those are already implemented.

zhucan · 2023-03-14T03:46:43Z

is there any new progress? @jameshearttech @thotz @divaspathak @subhamkrai @parth-gr

thotz · 2023-03-14T15:21:06Z

I thought this issue was related to #11336

jameshearttech · 2023-03-14T18:09:37Z

Yeah, they are definitely related. If it makes sense, feel free to close this one.

EDIT: Oh, wait, I see the other issue was closed. The other issue has a lot of work done on it. In my case I was able to use block instead of bucket. Regardless, I think there should be a solution to this issue.

EDIT2: In my last edit I was setting up Loki and I thought that it only worked with object storage, but I found I could use file storage; however, that only works with single binary mode. We are looking to move to simple scaling mode, which requires object storage so this is now a block for me. Likewise, we are looking at implementing Thanos, which also requires object storage.

devops-42 · 2023-03-23T10:43:03Z

Hi,

I'd appreciate the approach of referring to existing k8s secrets. This would decouple the direct communication from Ceph to a KMS like Vault. One could implement K8s secrets by themselve and create CephObjectStoreUser instances referencing to an existing K8s secret.

jhoblitt · 2023-04-03T23:05:36Z

I'm interested in this feature as well. I have rgw clients which are external to k8s. It would be great if I could use something like external-secrets operator to sync rgw user creds. This would be an alternative to using ldap auth.

github-actions · 2023-06-10T20:02:04Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

jhoblitt · 2023-06-10T22:46:40Z

This is a feature request we should probably keep open.

github-actions · 2023-08-11T20:01:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions · 2023-10-19T20:02:10Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

jameshearttech · 2023-10-19T21:22:22Z

Please do not close feature request. Afaik this has not been implemented.

travisn · 2023-10-19T21:46:49Z

A few notes on implementing this feature for someone to pick it up:
In the createOrUpdateCephUser() method, on line 298 if the GetUser() returns ErrNoSuchUser, then we need to:

Query if the K8s secret exists where Rook would later store the user creds.
If the secret exists and the expected keys exist (see example below), then set those values on the r.userConfig struct that is being passed to the CreateUser() method.
Assuming Ceph accepts the user secret as input, the user would be created with the pre-defined keys.

Here is an example user secret which is named rook-ceph-object-user-<store>-<user>:

% kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o yaml
apiVersion: v1
kind: Secret
data:
  AccessKey: U0lVNVJQTVg0VFIwMTI1QkxGSjc=
  SecretKey: aHdnbGdIVmlDcEJqTHN3WUJDakJqWlBRQ2tzeEM3dkRuRDdXMTNySw==
  Endpoint: aHR0cDovL3Jvb2stY2VwaC1yZ3ctbXktc3RvcmUucm9vay1jZXBoLnN2Yzo4MA==

If this secret is created in advance of the reconcile, then Rook would pick it up.

Note that since this issue was originally opened, now the user and secret could be created in any namespace since #12730 was implemented.

subhamkrai · 2023-12-20T14:33:29Z

Removing the assigning as I'm focused on some other issue and I don't want to hold this to my name for long

subhamkrai · 2023-12-20T14:34:02Z

Also added it to project board 1.13 to get more attention

CephObjectStoreUser should optionally be able to reference a secret where S3 key is defined. This enables us to specify the accesskey and accesssecret rather than those values being randomly generated. Closes: rook#11563 Signed-off-by: parth-gr <[email protected]>

BlaineEXE · 2024-05-03T17:34:57Z

I've been investigating key rotation for the COSI design updates as well, and after speaking with Josh on a call and collecting more information, I think we might want to revisit the design discussion for this Rook feature as well. It's tiring to get 90% of the way to completion with design and code and then have to rework it, but I think it may be worthwhile in this case.

From my discussion with Josh, the biggest gap that I think we have with the implementation proposed is that users don't have a good way to determine when the key rotation has completed. Josh has over a dozen k8s clusters that he manages, and that may grow to 2 dozen in not too long. In large environments like that, being able to know definitively when a key has been rotated is important. Status/monitoring/feedback is also a major tenet of the k8s design philosophy as well, and I missed it when proposing a design for this feature.

I'd like to talk about the details more in a Rook huddle next week, but the summary of my proposed new design is to not reuse the current secret for both input and output. I think it's simpler for developers and for users if we use a dedicated input secret, and Rook always uses the existing secret for output.

As for monitoring, I propose to achieve that by recording the metadata.generation field of the user input secret as a status field on the COSUser CR. Users can then easily use k8s tooling to compare the two values to know when the key rotation is complete.

I'd like to discuss this as a general idea before we start changing the code, so we don't burn ourselves out when reworking this. I think we can reuse a large part of the work we've already done on specifying the secret formatting, so hopefully we can limit that aspect. If that sounds good, we can iron out smaller details regarding COSUser spec/status API fields.

parth-gr · 2024-05-08T11:38:38Z

So for implementation:

Keep the original rgw secret as it is for the status.
If a user wants to provide their own credentials or do key rotation.
i) Provide the user secret name in the objectuser CR
CR will have a spec,
```
    spec:
      userSecretName: my-secret
```
ii) Create or update the secret with this name
```
     apiVersion: v1
     kind: Secret
     metadata:
          name: my-secret
     data:
       AccessKey: U0lVNVJQTVg0VFIwMTI1QkxGSjc=
       SecretKey: aHdnbGdIVmlDcEJqTHN3WUJDakJqWlBRQ2tzeEM3dkRuRDdXMTNySw==
       Endpoint: aHR0cDovL3Jvb2stY2VwaC1yZ3ctbXktc3RvcmUucm9vay1jZXBoLnN2Yzo4MA==
```
iii) The data fields remain the same as it was before
iv) If the reconciliation of cephobjectstore user was successful and credentials were updated,
Add the status to the cephobjectstore user CR, my-user secret creds successfully updated
v) Also update the original rgw secret with the AccessKey and SecretKey.

@BlaineEXE please review is that what we need to do?

BlaineEXE · 2024-05-08T16:14:14Z

The key reason to make this change is to make sure we are also planning a way for users to get feedback about the key rotation process. We need to also plan the status fields. I would suggest this to be something like this.

status:
  # ...
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
    secretGeneration: <the generation of the secret Rook last used to apply creds>

Having secretName and secretGeneration are both important. The first allows the user to change the secret used, and ensure Rook applied the correct one. The second allows the user to know when the current secret matches what Rook has applied.

From my discussion with Josh, I think it is also still good to plan for a way that users can have multiple active credentials. Josh explained this to be important to him, as well as other admins. It allows a migration window for end users so that key rotation isn't something that causes application downtime.

I think we should still plan to include the Credentials field in both the output secret and input secret to allow for this. (We could use multiple input credentials, but I think having a single input secret will mean that status comparison is easier for users.)

Also, I have another concern: Because the resource is called CephObjectStoreUser, I think we should refrain from choosing userSecretName. IMO, this naming overlap allows too much room for misunderstanding.

I also think we might consider using the a new section and the inputCredentials.secretName pattern for the input spec. This helps clarify for users that we are inputting credentials, and that it takes a secret name. I don't think this is strictly necessary, but by creating a new section, it allows us flexibility in the future to add more fields to inputCredentials if we need to, without bloating the main spec. I'm curious to hear @travisn 's preferences here. I could go either way.

If we keep it in the top level, inputCredentialsSecretName would be the most clear naming, though inputSecretName would probably be fine. I would steer clear of credentialsSecretName since that might also be interpreted as the config changes the name of the secret Rook uses for output.

travisn · 2024-05-09T21:02:20Z

If we're going to use a different secret for input, that needs to be on the spec, right? Then the status would just have the generation of that secret that has been evaluated, something like this?

spec:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
status:
  inputCredentials:
    secretGeneration: <the generation of the secret Rook last used to apply creds>

And I'm not clear what the proposal is for specifying multiple sets of creds? For simplicity it's nice to have the secret values match the output secret values, but if multiple sets of creds need to be specified, how are additional creds specified?

parth-gr · 2024-05-15T14:31:30Z

@BlaineEXE user secret should not provide Endpoint data, right?

CephObjectStoreUser should optionally be able to reference a secret where S3 key is defined. This enables us to specify the accesskey and accesssecret rather than those values being randomly generated. Closes: rook#11563 Signed-off-by: parth-gr <[email protected]>

BlaineEXE · 2024-05-15T17:11:11Z

BlaineEXE user secret should not provide Endpoint data, right?

Correct. The input secret should only contain credentials.

If we're going to use a different secret for input, that needs to be on the spec, right? Then the status would just have the generation of that secret that has been evaluated, something like this?
spec:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
status:
  inputCredentials:
    secretGeneration: <the generation of the secret Rook last used to apply creds>
And I'm not clear what the proposal is for specifying multiple sets of creds? For simplicity it's nice to have the secret values match the output secret values, but if multiple sets of creds need to be specified, how are additional creds specified?

The secret name should also be reflected in the status in case the user changes the secret name in the spec. If a user creates a new secret each time, then the generation may always be 1, and there will be ambiguity as to whether the correct credentials are applied.

spec:
  inputCredentials:
    secretName: <the name of the secret the user wishes credentials to be applied from>
status:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
    secretGeneration: <the generation of the secret Rook last used to apply creds>

parth-gr · 2024-05-20T13:42:27Z

@BlaineEXE some more questions on the status field,

status:
  inputCredentials:
    secretName: <the name of the secret that Rook last used to apply creds>
    secretGeneration: <the generation of the secret Rook last used to apply creds>

So wanted to understand what should be the status on different events,

If the reconciliation is started and secret exists -> secretName: my-secret, secretGeneration:-1
If the reconciliation is completed succesfully and secret exists -> secretName: my-secret, secretGeneration: (generation value of secret)
If the reconciliation is completed with a failure and secret exists -> secretName: my-secret, secretGeneration: (not sure?)
If the reconciliation is completed succesfully and the secret doesn't exists -> secretName: -1, secretGeneration:-1
If the reconciliation is started and the secret doesn't exists -> secretName: -1, secretGeneration:-1
If the reconciliation is completed with failure and the secret doesn't exists -> secretName: -1, secretGeneration:-1

BlaineEXE · 2024-05-20T17:28:06Z

These are great questions. :)

The status should reflect what the current reconcile has completed, unless the status item is specifically intended to reflect something about the ongoing reconcile. Since we aren't telling the user about the reconcile, it means we should update this status only after modifying things related to it.

When planning for the behavior of when this particular data gets updated in the status, we need to consider all cases:

user changes inputCredentials.secretName to a different value
- special case is when the user first sets this: i.e., "" -> "something"
- different special case is when the user removes this: i.e., "something" -> ""
user modifies the secret itself
nothing secret-related has changed (i.e. a CephObjectStoreUser change that is unrelated to the secret)

Reconciliation routines work best when the logic used in code isn't unique for any of the cases. In this case, there is no need for us to use special logic for any of the cases. But we do need to consider all 3 of the cases I mentioned when deciding how to implement the behavior.

If the reconciliation is started and secret exists -> secretName: my-secret, secretGeneration:-1

We should not update these status items like this on every reconcile. In the no.3 case where nothing secret-related has changed, changing the status like this will only confuse users.

IMO, we can consider changing the status to be this immediately before Rook attempts to update the credentials, for cases no.1 and no.2. Doing so would be an indicator of what Rook is attempting to do (update the creds), and communicates that the underlying user credential status is undefined. However, once we update these statuses to reflect undefined internal state, we can't go back.

In my experience, sometimes the RGW will time out or will return an error but will still have made some internal changes. I think it would be good for Rook to indicate undefined state like this, but we should only set it immediately before RGW credentials are going to be modified.

If the reconciliation is completed succesfully and secret exists -> secretName: my-secret, secretGeneration: (generation value of secret)

Yes. But let's also be clear that this also assumes the user has specified an input secret. And be sure to remember that this should not be the values of the secret at the time the status is applied. The fields should reflect the secret name and generation that were applied to Ceph, even if they are behind the current secret name/value.

If the reconciliation is completed with a failure and secret exists -> secretName: my-secret, secretGeneration: (not sure?)

Let's also be clear that this also assumes the user has specified an input secret.

Failures can have different sources. If the reconcile fails before Rook attempts to update the credentials, there is no need to change these status fields.

If there is an error while applying the user secret data to the user, then we should not update the fields either. This is because we have already (presumably) set the status such that it indicates an undefined result. When RGW says that it failed to do the update, we don't truly know what the internal state is.

If the reconciliation is completed succesfully and the secret doesn't exists -> secretName: -1, secretGeneration:-1

Mostly yes, but I would use more straightforward wording: if the reconciliation is completed, and the user hasn't specified inputCredentials.secretName. Also, in the case where the user hasn't set (or has recently removed the inputs), the status should just be the default value for both items, so: secretName: "", secretGeneration: 0

If the reconciliation is started and the secret doesn't exists -> secretName: -1, secretGeneration:-1

And the user has specified an input secret.

This is an interesting case. When thinking about this, I am trying to determine what will be the best for the user experience. Presumably, the user will have an alert that checks whether the status Rook is outputting matches the secret used for input creds.

With that case in mind, it probably is good to set these such that they will throw an error Rook can't clear up the issue.

However, case no.1 is the complicating one. What happens if the user changes the value from "secret-1" to "secret-2"?

From a behavioral standpoint, Rook isn't going to be able to apply creds from a non-existent secret. That means that the underlying user's credentials are known by Rook, and they are known to be the currently-set values. Rook should not update the status in this case. The status should remain at secretName: "secret-1", secretGeneration: <previous-value>

If the reconciliation is completed with failure and the secret doesn't exists -> secretName: -1, secretGeneration:-1

This shouldn't happen, and we don't need to handle this condition. If the secret doesn't exist, that is already a failure.

I am trying to focus on and push all of the Rook devs to write less complex and fully-idempotent reconciliation routines and reconciliation logic. With that in mind, I will propose some pseudocode below that I think matches the enumerated cases above.

These are good focus points:

use as few conditional branches as possible
conditional branches should be as short as possible (keep happy path on the left)
conditionals should largely be based on (1) input config, and (2) error status
avoid conditionals based on edge conditions where possible
- this specially goes for the edge condition "if this is the first reconcile"; it should be possible to encode logic for the first, Nth, and last times a feature is used using the same code logic
reconciliation routines should always refresh their current "view" of the inputs and system states every reconcile -- reconciles should not store state/status that gets used in future reconciles

ReconcileCredentials()
  if spec.inputCredentials.inputSecret is set:
     if inputSecret doesn't exist:
       return error // we know that the internal state is still the previously set secret

     secretObj, error = getSecret(inputSecret)
     if error:
       return error // we know that the internal state is still the previously set secret
        
     // once rook begins to update the user creds, we assume an undefined internal state
     // even if RGW returns failure (or timeout), sometimes there is a partial internal update
     set status.secretName = inputSecret, status.secretGeneration=-1 
     
     error = setUserCreds(user, creds(secretObj.data))
     if error:
        return error // no need to change status since undefined state will indicate a problem
  endif
  
  // regardless of whether the input secret is used, Rook always gets the current internal creds to apply to the reference secret
  currentCreds = getUserCreds(user)
  error = updateReferenceSecret(currentCreds)
  if error:
    // in case where inputSecret is present, there is no need to change status since undefined state will indicate a problem
    return error 
  
  if spec.inputCredentials.inputSecret is not set:
    set status.secretName = "", staus.secretGeneration=0
    return success
    
  if currentCreds != creds(inutSecret.data):
    return error(input secret was not applied successfully) // keep undefined status to indicate a problem
    
  set status.secretName = inputSecret, status.secretGeneration = secretObj.generation
  
  return success

Notice that we don't actually have to have conditionals for every one of the cases you have mentioned. The code itself is simpler. But it is still a good exercise to enumerate all of the different input and error conditions beforehand, so we know whether the code we produce is doing the right thing in all cases. It is also good practice to indicate in code comments what Rook is choosing to do (or not to do) based on the enumerated conditions we planned.

BlaineEXE · 2024-05-20T17:36:17Z

I did some looking locally, and it looks like not all k8s resources utilize the metadata.generation field. Secrets in particular seem to not update this. We will have to use another metric for determining if Rook has applied the current version.

The next best candidate is resourceVersion. This is documented to be

the version of that resource as stored in the underlying persistence layer

That doesn't make any special guarantees that the field won't change when the user hasn't made any updates, but it seems like the persistently stored version shouldn't change randomly, especially of a Secret, which doesn't have a status field. For resources that have a Status, I would bet that a Status change changes the resourceVersion but doesn't change the generation.

In my test cluster, I haven't seen a secret randomly get a new resourceVersion after 30+ minutes. And I have observed the resourceVersion change when I add labels/annotations to the secret.

I think that bodes well for a plan to use resourceVersion.

jameshearttech added the feature label Jan 18, 2023

github-actions bot added the wontfix label Jun 10, 2023

travisn removed the wontfix label Jun 12, 2023

parth-gr mentioned this issue Jun 20, 2023

Create OBC with predefined credentials red-hat-storage/odf-operator#323

Closed

github-actions bot added the wontfix label Aug 11, 2023

travisn added good-first-issue Simple issues that are good for getting started with Rook. and removed wontfix labels Aug 11, 2023

github-actions bot added the wontfix label Oct 19, 2023

subhamkrai self-assigned this Oct 20, 2023

github-actions bot removed the wontfix label Oct 20, 2023

subhamkrai removed their assignment Dec 20, 2023

subhamkrai added this to To do in v1.13 via automation Dec 20, 2023

parth-gr mentioned this issue May 20, 2024

doc: Update the roadmap for 1.15 #14229

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CephObjectStoreUser Reference Secret #11563

CephObjectStoreUser Reference Secret #11563

jameshearttech commented Jan 18, 2023 •

edited

divaspathak commented Jan 23, 2023

subhamkrai commented Jan 23, 2023

parth-gr commented Jan 23, 2023

thotz commented Jan 23, 2023

jameshearttech commented Jan 23, 2023 •

edited

zhucan commented Mar 14, 2023

thotz commented Mar 14, 2023

jameshearttech commented Mar 14, 2023 •

edited

devops-42 commented Mar 23, 2023 •

edited

jhoblitt commented Apr 3, 2023 •

edited

github-actions bot commented Jun 10, 2023

jhoblitt commented Jun 10, 2023

github-actions bot commented Aug 11, 2023

github-actions bot commented Oct 19, 2023

jameshearttech commented Oct 19, 2023 •

edited

travisn commented Oct 19, 2023

subhamkrai commented Dec 20, 2023

subhamkrai commented Dec 20, 2023

BlaineEXE commented May 3, 2024

parth-gr commented May 8, 2024 •

edited

BlaineEXE commented May 8, 2024

travisn commented May 9, 2024

parth-gr commented May 15, 2024

BlaineEXE commented May 15, 2024

parth-gr commented May 20, 2024

BlaineEXE commented May 20, 2024

BlaineEXE commented May 20, 2024 •

edited

CephObjectStoreUser Reference Secret #11563

CephObjectStoreUser Reference Secret #11563

Comments

jameshearttech commented Jan 18, 2023 • edited

divaspathak commented Jan 23, 2023

subhamkrai commented Jan 23, 2023

parth-gr commented Jan 23, 2023

thotz commented Jan 23, 2023

jameshearttech commented Jan 23, 2023 • edited

zhucan commented Mar 14, 2023

thotz commented Mar 14, 2023

jameshearttech commented Mar 14, 2023 • edited

devops-42 commented Mar 23, 2023 • edited

jhoblitt commented Apr 3, 2023 • edited

github-actions bot commented Jun 10, 2023

jhoblitt commented Jun 10, 2023

github-actions bot commented Aug 11, 2023

github-actions bot commented Oct 19, 2023

jameshearttech commented Oct 19, 2023 • edited

travisn commented Oct 19, 2023

subhamkrai commented Dec 20, 2023

subhamkrai commented Dec 20, 2023

BlaineEXE commented May 3, 2024

parth-gr commented May 8, 2024 • edited

BlaineEXE commented May 8, 2024

travisn commented May 9, 2024

parth-gr commented May 15, 2024

BlaineEXE commented May 15, 2024

parth-gr commented May 20, 2024

BlaineEXE commented May 20, 2024

BlaineEXE commented May 20, 2024 • edited

jameshearttech commented Jan 18, 2023 •

edited

jameshearttech commented Jan 23, 2023 •

edited

jameshearttech commented Mar 14, 2023 •

edited

devops-42 commented Mar 23, 2023 •

edited

jhoblitt commented Apr 3, 2023 •

edited

jameshearttech commented Oct 19, 2023 •

edited

parth-gr commented May 8, 2024 •

edited

BlaineEXE commented May 20, 2024 •

edited