fix: add stored status explicitly for logs #704

Open
wants to merge 1 commit into
base: main
from

Conversation

avinal
Member

@avinal avinal commented Feb 5, 2024

Changes

  • added IsStored to LogStatus
  • update the API Version to v1alpha3
  • v1alpha2 will be removed later
  • this will serve as a clear indication of whether the logs have been stored, partially stored, or not stored at all.
  • this can be used to mitigate the race condition between pruning of runs and log storage.
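
As a rough illustration of how the flag could gate pruning, here is a minimal sketch. `canPrune` is a hypothetical helper, not part of this PR; `LogStatus` carries only the fields shown in the diff under review.

```go
package main

import "fmt"

// LogStatus carries the fields shown in this PR's diff; IsStored is the new one.
type LogStatus struct {
	Path     string `json:"path,omitempty"`
	Size     int64  `json:"size"`
	IsStored bool   `json:"isStored"`
}

// canPrune is a hypothetical helper: a finished run is eligible for pruning
// only once its logs are reported as stored.
func canPrune(runDone bool, status LogStatus) bool {
	return runDone && status.IsStored
}

func main() {
	fmt.Println(canPrune(true, LogStatus{Size: 128}))                 // false
	fmt.Println(canPrune(true, LogStatus{Size: 128, IsStored: true})) // true
}
```

The pruner would skip the run in the first case and retry later, which is the window the race condition currently leaves open.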

Signed-off-by: Avinal Kumar [email protected]

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you review them:

  • Has Docs included if any changes are user facing
  • Has Tests included if any functionality added or changed
  • Tested your changes locally (if this is a code change)
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user-facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings)
  • Release notes contain the string "action required" if the change requires additional action from users switching to the new release

Release Notes

action required: - the log status contains an extra field called `is_stored` to denote if the logs have been correctly stored or not
- Breaking Change: API Version is updated to v1alpha3 from v1alpha2 

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Feb 5, 2024
@tekton-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from avinal after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 5, 2024
@avinal
Member Author

avinal commented Feb 5, 2024

/kind flake

@tekton-robot tekton-robot added the kind/flake Categorizes issue or PR as related to a flakey test label Feb 5, 2024
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/api/server/v1alpha2/logs.go 68.3% 66.7% -1.6

@avinal avinal marked this pull request as draft February 5, 2024 13:52
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2024
@avinal avinal marked this pull request as ready for review February 6, 2024 13:34
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2024
@tekton-robot tekton-robot added release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 7, 2024
@avinal
Member Author

avinal commented Feb 14, 2024

As discussed in the WG meeting, I have tested it for partial log store, and it seems there is only one issue (not ATM) we need to take care of.

  • With file-based storage, if the logs have been partially stored, we need a mechanism to clear them, because the server will try to store the logs again and the partially stored ones would be orphaned. We could also develop a mechanism to resume from where it broke.

In case of S3 or GCS the partial logs will be discarded.

@avinal
Member Author

avinal commented Feb 14, 2024

@sayan-biswas
Contributor

Please also keep in mind this is an API change: previously stored data will not have the field isStored, and with the current design GetLog will fail even if the log is there.
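
To make the compatibility concern concrete, here is a minimal sketch of reading a record written before the field existed. The pointer type and the fallback rule (path set and size non-zero) are assumptions for illustration, not the behaviour of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedStatus is a hypothetical decoding type. The pointer distinguishes
// "field absent" (old record) from an explicit false.
type storedStatus struct {
	Path     string `json:"path,omitempty"`
	Size     int64  `json:"size"`
	IsStored *bool  `json:"isStored,omitempty"`
}

// logAvailable trusts isStored when present. For older records without the
// field it falls back to "has a path and a non-zero size" (an assumption).
func logAvailable(raw []byte) (bool, error) {
	var s storedStatus
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	if s.IsStored != nil {
		return *s.IsStored, nil
	}
	return s.Path != "" && s.Size > 0, nil
}

func main() {
	old := []byte(`{"path":"a/b.log","size":42}`)
	cur := []byte(`{"path":"a/b.log","size":42,"isStored":false}`)
	fmt.Println(logAvailable(old)) // true <nil>
	fmt.Println(logAvailable(cur)) // false <nil>
}
```

With a plain `bool` field, both records above would decode to `false`, which is the failure mode described in this comment.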

@ramessesii2
Member

It seems we need to add WIP tag to this PR if it's still being worked on. Thanks.

@tekton-robot tekton-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 11, 2024
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2024
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/api/server/v1alpha3/auth/impersonation/impersonation.go Do not exist 75.6%
pkg/api/server/v1alpha3/auth/nop.go Do not exist 0.0%
pkg/api/server/v1alpha3/auth/rbac.go Do not exist 75.0%
pkg/api/server/v1alpha3/lister/aggregator.go Do not exist 28.0%
pkg/api/server/v1alpha3/lister/filter.go Do not exist 92.3%
pkg/api/server/v1alpha3/lister/limit.go Do not exist 100.0%
pkg/api/server/v1alpha3/lister/lister.go Do not exist 15.8%
pkg/api/server/v1alpha3/lister/offset.go Do not exist 100.0%
pkg/api/server/v1alpha3/lister/order.go Do not exist 86.5%
pkg/api/server/v1alpha3/lister/page_token.go Do not exist 69.2%
pkg/api/server/v1alpha3/log/file.go Do not exist 78.6%
pkg/api/server/v1alpha3/log/gcs.go Do not exist 42.9%
pkg/api/server/v1alpha3/log/log.go Do not exist 72.7%
pkg/api/server/v1alpha3/log/s3.go Do not exist 38.2%
pkg/api/server/v1alpha3/logs.go Do not exist 67.4%
pkg/api/server/v1alpha3/ordering.go Do not exist 100.0%
pkg/api/server/v1alpha3/pagination.go Do not exist 100.0%
pkg/api/server/v1alpha3/record/record.go Do not exist 63.3%
pkg/api/server/v1alpha3/records.go Do not exist 87.1%
pkg/api/server/v1alpha3/result/result.go Do not exist 76.0%
pkg/api/server/v1alpha3/results.go Do not exist 88.9%
pkg/api/server/v1alpha3/server.go Do not exist 68.2%
pkg/api/server/v1alpha3/summary.go Do not exist 0.0%

Contributor

@khrm khrm left a comment

Why are we adding stored status for logs?

@khrm
Contributor

khrm commented Mar 13, 2024

Can you have two commits here? One for API and other for status. It's easier to review then.

@gabemontero
Contributor

Why are we adding stored status for logs?

It is in relation to fixing #514 @khrm as the use of labels or annotations had undesirable side effects, but that should be explicitly stated in the PR's description, which as I look now, I don't see it.

Good catch @khrm

@gabemontero
Contributor

Can you have two commits here? One for API and other for status. It's easier to review then.

I agree ... fwiw @avinal what @khrm is calling for has been a best practice employed in various k8s / golang based projects.

@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/api/server/v1alpha3/auth/impersonation/impersonation.go Do not exist 75.6%
pkg/api/server/v1alpha3/auth/nop.go Do not exist 0.0%
pkg/api/server/v1alpha3/auth/rbac.go Do not exist 75.0%
pkg/api/server/v1alpha3/lister/aggregator.go Do not exist 28.0%
pkg/api/server/v1alpha3/lister/filter.go Do not exist 92.3%
pkg/api/server/v1alpha3/lister/limit.go Do not exist 100.0%
pkg/api/server/v1alpha3/lister/lister.go Do not exist 15.8%
pkg/api/server/v1alpha3/lister/offset.go Do not exist 100.0%
pkg/api/server/v1alpha3/lister/order.go Do not exist 86.5%
pkg/api/server/v1alpha3/lister/page_token.go Do not exist 69.2%
pkg/api/server/v1alpha3/log/file.go Do not exist 78.6%
pkg/api/server/v1alpha3/log/gcs.go Do not exist 42.9%
pkg/api/server/v1alpha3/log/log.go Do not exist 72.7%
pkg/api/server/v1alpha3/log/s3.go Do not exist 38.2%
pkg/api/server/v1alpha3/logs.go Do not exist 67.4%
pkg/api/server/v1alpha3/ordering.go Do not exist 100.0%
pkg/api/server/v1alpha3/pagination.go Do not exist 100.0%
pkg/api/server/v1alpha3/record/record.go Do not exist 63.3%
pkg/api/server/v1alpha3/records.go Do not exist 87.1%
pkg/api/server/v1alpha3/result/result.go Do not exist 76.0%
pkg/api/server/v1alpha3/results.go Do not exist 88.9%
pkg/api/server/v1alpha3/server.go Do not exist 68.2%
pkg/api/server/v1alpha3/summary.go Do not exist 0.0%
pkg/watcher/reconciler/dynamic/dynamic.go 64.2% 62.3% -1.9

@avinal
Member Author

avinal commented Mar 14, 2024

@khrm @gabemontero I have broken the commits in two and also updated the PR description.

Member

@ramessesii2 ramessesii2 left a comment

lgtm

cmd/api/main.go Outdated
@@ -180,19 +180,19 @@ func main() {
recovery.StreamServerInterceptor(recovery.WithRecoveryHandler(recoveryHandler)),
),
)
- v1alpha2pb.RegisterResultsServer(gs, v1a2)
+ v1alpha3pb.RegisterResultsServer(gs, v1a2)
Contributor

do we need to register a result server for v1alpha2 in addition to v1alpha3 so the api server can serve clients accessing either version @avinal @sayan-biswas ?

per the wg call @avinal you need to test this at least manually .... if an e2e could be created that does curls against both versions, even better ..... one could argue that an e2e for accessing both versions should be a requirement for this PR
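
A check along these lines could be sketched with the standard library alone. The handler below is a plain-HTTP stand-in, not the real gRPC or gateway wiring, and the `/apis/results.tekton.dev/<version>/` prefix and resource path are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux registers one hypothetical handler under both version prefixes,
// standing in for a server that serves v1alpha2 and v1alpha3 side by side.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	for _, v := range []string{"v1alpha2", "v1alpha3"} {
		version := v
		mux.HandleFunc("/apis/results.tekton.dev/"+version+"/", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, "served by %s", version)
		})
	}
	return mux
}

// statusFor sends an in-memory GET to the handler and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	for _, v := range []string{"v1alpha2", "v1alpha3"} {
		path := "/apis/results.tekton.dev/" + v + "/parents/default/results"
		fmt.Println(v, statusFor(mux, path))
	}
}
```

An e2e version would issue the same two requests against a deployed API server instead of an in-memory handler.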

Member Author

I did the tests, but I think something is going wrong. If I update the server with the new code on the same DB, the logs are not being stored. I will try to find the reason and fix it.

Contributor

We change the API version for breaking change only. Otherwise, you need to add proto for both v1alpha2 and v1alpha3. Adding a field or method isn't a breaking change. Renaming or removing them is a breaking change.

In this PR, I don't find any new field or method in proto. Only an internal behaviour change which I don't think requires a breaking change.
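
The point about additive changes can be illustrated with a JSON analogue: a reader built against the older shape simply ignores a field it does not know. Protobuf decoders likewise skip unknown fields. `oldStatus` and `decodeOld` are hypothetical names used only for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldStatus stands in for a client compiled before isStored existed.
type oldStatus struct {
	Path string `json:"path,omitempty"`
	Size int64  `json:"size"`
}

// decodeOld decodes a payload with the older type; encoding/json ignores
// object keys that have no matching struct field.
func decodeOld(raw []byte) (oldStatus, error) {
	var s oldStatus
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	s, err := decodeOld([]byte(`{"path":"a/b.log","size":42,"isStored":true}`))
	fmt.Println(s.Path, s.Size, err) // a/b.log 42 <nil>
}
```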

Member Author

Should we roll back to just migrating the DB?

Contributor

Why do we need to migrate the DB? Isn't it part of json field?

Contributor

Why don't we update the Log Object version? Why is it necessary for it to have the same version as API?

Member Author

It is not, we discussed in the call that to keep things in sync we can update the whole API. Do you think we should rediscuss the context and implementation for this in the next call?

Contributor

Yes, we should rediscuss this.

Contributor

adding object specific versioning is atypical at least from a k8s perspective

is there precedent in tekton @khrm that I'm missing?

that said @avinal if you and @sayan-biswas have not had a chance to get together and sort out the issues in your testing that we discussed, at this point I'm up for any path which is both a) not too crazy, and b) does not require a migrator

Contributor

We are updating the Object version here as well as the API. That object is internal to Results. That was the context of my comment. Let's discuss this in WG call.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2024
Contributor

@khrm khrm left a comment

Does this change require a proto change? We are changing the version of the Log object in the DB.
Just changing https://github.com/tektoncd/results/blob/84acda76c4cc0193eaa63d443148e912d2c192ba/pkg/apis/v1alpha3/types.go should work.
Should TaskRunLog stored in DB mirror GRPC version?

@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 22, 2024
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/api/server/v1alpha2/logs.go 68.4% 67.5% -0.9
pkg/apis/v1alpha3/types.go Do not exist 0.0%
pkg/apis/v1alpha3/types.go Do not exist 0.0%
pkg/apis/v1alpha3/types.go Do not exist 100.0%
pkg/apis/v1alpha3/types.go Do not exist 100.0%
pkg/apis/v1alpha3/types.go Do not exist 66.7%

pkg/api/server/v1alpha2/logs.go Outdated
@@ -112,13 +112,13 @@ func TestParseName(t *testing.T) {
}

func TestToStorage(t *testing.T) {
- log := &v1alpha2.Log{
+ log := &v1alpha3.Log{
Contributor

should we not have at least one v1alpha2 test case to validate that behavior?

Member Author

The tests won't work because we no longer have v1alpha2. To simulate it, we have to retain the v1alpha2 types as well.

Value: jsonutil.AnyBytes(t, &v1alpha2.Log{
Spec: v1alpha2.LogSpec{
Resource: v1alpha2.Resource{
Type: v1alpha3.LogRecordType,
Contributor

shouldn't we have at least one v1alpha2 test case to validate that behavior?

@@ -320,7 +320,7 @@ func toJSON(v any) []byte {
}

func TestToLogProto(t *testing.T) {
- wantType := "results.tekton.dev/v1alpha2.Log"
+ wantType := "results.tekton.dev/v1alpha3.Log"
Contributor

will convert.go ever encounter v1alpha2?

@gabemontero
Contributor

also @avinal - just noticed something when looking at @khrm 's event PR .... should we be bumping the annotation for logs to v1alpha3 ?

See https://github.com/tektoncd/results/pull/748/files#diff-83680fa39b0bf1807ef48e2ec46f89b5ef89abf69a628ba8fac9a2e01554e473R12

Contributor

@gabemontero gabemontero left a comment

Some additional thoughts occurred to me @avinal @sayan-biswas @khrm @enarha @ramessesii2 while I was considering this change for helping with fixing the memory leak and canceled context issues with sufficient latency / performance characteristics

let's add this to my prior ask for additional unit tests

also @avinal are you going to be able to spend time on wrapping this PR up this week? if not, let me know and I'll see about getting this over the finish line

thanks

Size int64 `json:"size"`
Path string `json:"path,omitempty"`
Size int64 `json:"size"`
IsStored bool `json:"isStored"`
Contributor

let's add an errorOnStoreCode int `json:"errorOnStore"` field and an errorOnStoreMsg string `json:"errorOnStoreMsg"` field

that way we can tell if

  • an attempt to store has not yet completed
  • the store was successful
  • the store failed on an error
  • the precise error is retryable or not

Member Author

Yes, that seems a better idea. Should I replace IsStored or add these new fields?

Contributor

let's keep IsStored and add the other two
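
A minimal sketch of how the three fields together could express the states listed in this thread. The field names are capitalized here because `encoding/json` only marshals exported fields. Treating code `0` as "no error", the `State` and `Retryable` helpers, and the caller-supplied set of retryable codes are all assumptions for illustration; the thread does not define which codes are retryable.

```go
package main

import "fmt"

// LogStatus extends the struct from the diff with the two proposed fields.
type LogStatus struct {
	Path             string `json:"path,omitempty"`
	Size             int64  `json:"size"`
	IsStored         bool   `json:"isStored"`
	ErrorOnStoreCode int    `json:"errorOnStore"`
	ErrorOnStoreMsg  string `json:"errorOnStoreMsg"`
}

// StoreState is a hypothetical summary of the three fields.
type StoreState string

const (
	StorePending StoreState = "pending" // no success and no error recorded yet
	StoreDone    StoreState = "stored"  // store completed successfully
	StoreFailed  StoreState = "failed"  // store ended with an error
)

// State assumes an error code of 0 means "no error recorded".
func (s LogStatus) State() StoreState {
	switch {
	case s.IsStored:
		return StoreDone
	case s.ErrorOnStoreCode != 0:
		return StoreFailed
	default:
		return StorePending
	}
}

// Retryable reports whether a failed store carries a code the caller
// considers retryable.
func (s LogStatus) Retryable(retryable map[int]bool) bool {
	return s.State() == StoreFailed && retryable[s.ErrorOnStoreCode]
}

func main() {
	retryable := map[int]bool{14: true} // hypothetical code set
	failed := LogStatus{ErrorOnStoreCode: 14, ErrorOnStoreMsg: "backend unavailable"}
	fmt.Println(LogStatus{}.State(), failed.State(), failed.Retryable(retryable))
}
```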

@avinal
Member Author

avinal commented May 13, 2024

also @avinal - just noticed something when looking at @khrm 's event PR .... should we be bumping the annotation for logs to v1alpha3 ?

Yes, that was the specific change we needed to differentiate v1alpha3 logs from v1alpha2.

- adding log stored status explicitly in the Log object
improves the detection for partial or no storage of logs
- it might help mitigate the race condition between pruning
the runs and storing the logs.

Signed-off-by: Avinal Kumar <[email protected]>

rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/api/server/v1alpha2/logs.go 68.4% 67.5% -0.9
pkg/apis/v1alpha3/types.go Do not exist 0.0%
pkg/apis/v1alpha3/types.go Do not exist 0.0%
pkg/apis/v1alpha3/types.go Do not exist 50.0%
pkg/apis/v1alpha3/types.go Do not exist 100.0%
pkg/apis/v1alpha3/types.go Do not exist 66.7%

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2024
@tekton-robot

@avinal: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
kind/flake Categorizes issue or PR as related to a flakey test needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants