-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexecerror: do not catch runtime panics in crdb_test builds #124098
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This came up during a postmortem. I'm not sure yet how much noise it will create in test failures, but curious to hear people's thoughts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it's an interesting idea, and I'm in favor. I wouldn't expect this to have much noise - looks like only a single test needed an adjustment since it relies on the runtime panic.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/colexecerror/error_test.go
line 43 at r1 (raw file):
// Use the release-build panic-catching behavior instead of the // crdb_test-build behavior. defer func(prev bool) {
nit: it might be cleaner to have this logic in colexecerror/error.go
and only have a single exported method. I'm thinking about something like this:
var testingKnobShouldCatchPanic bool
// ProductionBehaviorForTests makes the panic-catcher behave in tests
// in the same way as it does in production builds.
func ProductionBehaviorForTests() func() {
old := testingKnobShouldCatchPanic
testingKnobShouldCatchPanic = true
return func() {
testingKnobShouldCatchPanic = old
}
}
...
// in tests
defer colexecerror.ProductionBehaviorForTests()()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/sql/colexecerror/error_test.go
line 43 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it might be cleaner to have this logic in
colexecerror/error.go
and only have a single exported method. I'm thinking about something like this:var testingKnobShouldCatchPanic bool // ProductionBehaviorForTests makes the panic-catcher behave in tests // in the same way as it does in production builds. func ProductionBehaviorForTests() func() { old := testingKnobShouldCatchPanic testingKnobShouldCatchPanic = true return func() { testingKnobShouldCatchPanic = old } } ... // in tests defer colexecerror.ProductionBehaviorForTests()()
Nice, this is much cleaner! Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/colexecerror/error_test.go
line 43 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Nice, this is much cleaner! Done.
nit: I think one set of parenthesis is missing when using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will panics injected for testing cause problems with this?
Reviewed 1 of 4 files at r1, 1 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will panics injected for testing cause problems with this?
- Panics injected by calling
crdb_internal.force_panic('foo')
from SQL will continue to crash the node, as they did before. - Panics injected outside the vectorized execution engine will continue to behave as they did before (either crashing the node if they are not caught somewhere else, or becoming an error if they are caught and wrapped with StorageError or InternalError or ExpectedError).
This should only affect panics injected within the vectorized execution engine that are not wrapped, which should be rare (and I believe represent bugs in the vectorized engine).
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and I believe represent bugs in the vectorized engine).
Actually, what I meant to say, since we're talking about injected panics is not that they represent bugs, but that they should be handled by adding the call to colexecerror.ProductionBehaviorForTests()
. And these should only be tests within the colexec* packages, which I think I've found.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have TestVectorizedPanics
linter that requires that all code within the vectorized engine (which has ./pkg/sql/col*
prefix in the path) to use either InternalError
or ExpectedError
. If we have an unwrapped panic, I think it's fair for us to crash in test builds.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
@mgartner ok if I bors this? |
I'll go ahead, and we can revert this if it breaks something. bors r=yuzefovich |
124098: colexecerror: do not catch runtime panics in crdb_test builds r=yuzefovich a=michae2 CatchVectorizedRuntimeError performs its stack-walking in order to catch *all* panics originating from within the vectorized execution engine, including runtime panics. The purpose of this is to limit the impact of a bug in vectorized execution to failure of a single execution rather than crashing the node. During tests, however, we'd like to make these runtime panics louder, since we believe they represent bugs in vectorized execution. So let's not catch panics under crdb_test builds unless they're wrapped. Epic: None Release note: None Co-authored-by: Michael Erickson <[email protected]>
Build failed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only affect panics injected within the vectorized execution engine that are not wrapped, which should be rare (and I believe represent bugs in the vectorized engine).
Thanks for explaining! SGTM.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
CatchVectorizedRuntimeError performs its stack-walking in order to catch *all* panics originating from within the vectorized execution engine, including runtime panics. The purpose of this is to limit the impact of a bug in vectorized execution to failure of a single execution rather than crashing the node. During tests, however, we'd like to make these runtime panics louder, since we believe they represent bugs in vectorized execution. So let's not catch panics under crdb_test builds unless they're wrapped. Epic: None Release note: None
bors r=yuzefovich,mgartner |
CatchVectorizedRuntimeError performs its stack-walking in order to catch all panics originating from within the vectorized execution engine, including runtime panics. The purpose of this is to limit the impact of a bug in vectorized execution to failure of a single execution rather than crashing the node.
During tests, however, we'd like to make these runtime panics louder, since we believe they represent bugs in vectorized execution. So let's not catch panics under crdb_test builds unless they're wrapped.
Epic: None
Release note: None