do not lose flush error on server side update log #756

Merged


gabemontero
Contributor

Changes

/kind bug

This addresses an intermittent server-side update log error that I discussed with @sayan-biswas, @avinal, @khrm, and @enarha last month:

{"level":"error","ts":1713278748.4461074,"caller":"v1alpha2/logs.go:103","msg":"operation error S3: UploadPart, https response error StatusCode: 404, RequestID: 732R1N8N4J9RSB06, HostID: lsBFw/50Pfgee1X946YoNjrGdEfnafH1KmsVxQdqZXNGqNDuk2Vdka8vSm13Kx3h88Vbyq9HM7A=, api error NoSuchUpload: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.","stacktrace":
"github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog.func1\n
\t/opt/app-root/src/pkg/api/server/v1alpha2/logs.go:103\n
github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog\n
\t/opt/app-root/src/pkg/api/server/v1alpha2/logs.go:156\n
github.com/tektoncd/results/proto/v1alpha2/results_go_proto._Logs_UpdateLog_Handler\n
\t/opt/app-root/src/proto/v1alpha2/results_go_proto/api_grpc.pb.go:686\n
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/recovery.StreamServerInterceptor.func1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/recovery/interceptors.go:48\n
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49\n
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:121\n
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49\n
github.com/grpc-ecosystem/go-grpc-middleware/auth.StreamServerInterceptor.func1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/auth/auth.go:66\n
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49\n
github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/server_interceptors.go:53\n
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49\n
github.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tags/interceptors.go:39\n
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49\n
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n
\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:58\n
google.golang.org/grpc.(*Server).processStreamingRPC\n
\t/opt/app-root/src/vendor/google.golang.org/grpc/server.go:1673\n
google.golang.org/grpc.(*Server).handleStream\n
\t/opt/app-root/src/vendor/google.golang.org/grpc/server.go:1787\n
google.golang.org/grpc.(*Server).serveStreams.func2.1\n
\t/opt/app-root/src/vendor/google.golang.org/grpc/server.go:1016"}

If we get an error on the flush, let's return it to the client in case a retry is possible.
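
The idea of surfacing a flush failure to the caller can be sketched in Go as below. This is a minimal illustration, not the code in `logs.go`: the `flusher` interface, the two stub types, and `updateLog` are hypothetical names invented for the example. It assumes the flush happens in a deferred function and uses a named return value so the deferred function can replace a nil result with the flush error.

```go
package main

import (
	"errors"
	"fmt"
)

// flusher is a hypothetical stand-in for a buffered log writer.
type flusher interface {
	Flush() error
}

// okFlusher always flushes successfully.
type okFlusher struct{}

func (okFlusher) Flush() error { return nil }

// failingFlusher simulates a storage backend that rejects the flush.
type failingFlusher struct{}

func (failingFlusher) Flush() error { return errors.New("NoSuchUpload") }

// updateLog is a hypothetical handler. Because err is a named return
// value, the deferred function can set it after the body has returned,
// so a failed flush is reported instead of being silently dropped.
func updateLog(f flusher) (err error) {
	defer func() {
		if flushErr := f.Flush(); flushErr != nil && err == nil {
			err = fmt.Errorf("flush failed: %w", flushErr)
		}
	}()
	// The work of receiving and buffering log chunks would go here.
	return nil
}

func main() {
	fmt.Println(updateLog(okFlusher{}))
	fmt.Println(updateLog(failingFlusher{}))
}
```

With a return value like this, a client that sees the error can decide whether to retry the upload.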

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you review them:

  • [n/a] Has Docs included if any changes are user facing
  • [n/a] Has Tests included if any functionality added or changed
  • [/] Tested your changes locally (if this is a code change)
  • [/] Follows the commit message standard
  • [/] Meets the Tekton contributor standards (including functionality, content, code)
  • [/] Has a kind label. You can add a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [n/a] Release notes block below has been updated with any user-facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings)
  • [n/a] Release notes contain the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

@tekton-robot tekton-robot added kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesnt merit a release note. labels May 14, 2024
@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 14, 2024
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/api/server/v1alpha2/logs.go | 68.4% | 57.1% | -11.3 |

Contributor

@khrm khrm left a comment


/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 16, 2024
@enarha
Contributor

enarha commented May 16, 2024

It feels to me that we are masking the "real" error which got us to handleReturn in the first place. I mean when we try to flush the log and fail. Maybe at least concatenate the error messages like "got flushErr while handling otherError"?

@gabemontero
Contributor Author

> It feels to me that we are masking the "real" error which got us to handleReturn in the first place. I mean when we try to flush the log and fail. Maybe at least concatenate the error messages like "got flushErr while handling otherError"?

Yeah, I was wondering about that as well after I submitted the PR, @enarha. I'll look into doing this and will post a comment when I've updated it.

rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED
@gabemontero gabemontero force-pushed the fix-update-log-svr-side-lose-flush-err branch from b06ec84 to 1b916ab Compare May 20, 2024 14:40
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label May 20, 2024
@gabemontero
Contributor Author

> > It feels to me that we are masking the "real" error which got us to handleReturn in the first place. I mean when we try to flush the log and fail. Maybe at least concatenate the error messages like "got flushErr while handling otherError"?
>
> Yeah, I was wondering about that as well after I submitted the PR, @enarha. I'll look into doing this and will post a comment when I've updated it.

I've updated the PR to aggregate the errors, @enarha. PTAL.
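
One way to aggregate the two errors along the lines suggested above is sketched here. It is an assumption-laden illustration rather than the merged change: `aggregate`, `errOriginal`, and `errFlush` are hypothetical names, and the message format simply follows the "got flushErr while handling otherError" wording from the review. It relies on `fmt.Errorf` accepting more than one `%w` verb, which requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this illustration.
var (
	errOriginal = errors.New("stream receive failed")
	errFlush    = errors.New("NoSuchUpload")
)

// aggregate combines the error that triggered the return path with an
// error from the subsequent flush, so neither one masks the other.
func aggregate(origErr, flushErr error) error {
	switch {
	case flushErr == nil:
		return origErr
	case origErr == nil:
		return flushErr
	default:
		// Wrapping both errors keeps each visible to errors.Is and errors.As.
		return fmt.Errorf("got %w while handling %w", flushErr, origErr)
	}
}

func main() {
	err := aggregate(errOriginal, errFlush)
	fmt.Println(err)
	fmt.Println(errors.Is(err, errOriginal), errors.Is(err, errFlush))
}
```

`errors.Join` would be a shorter alternative if a fixed message format is not needed; it also skips nil arguments.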

@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/api/server/v1alpha2/logs.go | 68.4% | 63.4% | -5.0 |

Contributor

@enarha enarha left a comment


/approve

@tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enarha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2024
Contributor

@khrm khrm left a comment


/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2024
@tekton-robot tekton-robot merged commit 0a020d9 into tektoncd:main May 21, 2024
6 checks passed
@gabemontero gabemontero deleted the fix-update-log-svr-side-lose-flush-err branch May 21, 2024 14:38
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesnt merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants