all: Be more stringent with timer resets/stops (ref #9417) #9422
I found some concerning things while looking through our timer handling to see if we leaked timers. I didn't find any leaked tickers, which is good, and most timers were stopped, which is also fine. I added the few stops that were missing, but these should not be critical, as a normal timer will be collected anyway when it expires. One place called time.After() in a loop, which is not great; that was cleaned up.
However, there were some inconsistencies, mostly around how we reset timers. The documentation is quite clear on what not to do (although it's less clear on the consequences if you do it anyway): Reset should be invoked only on stopped or expired timers with drained channels.
This is impossible to guarantee when doing Reset() from outside the loop that processes the timer in question. Any place where we saved a timer in a struct, looped on a select over it, and did resets from another goroutine would run afoul of this rule.
To handle this better I've rewritten those places to use "local" timers (i.e., timer is declared in the function that loops over it, nobody else can touch it, and we make sure it gets stopped with a defer) and a channel to trigger reset from within that loop instead.
I also added a couple of convenience functions to reset a timer properly and stop a timer properly. Using the stop helper is less critical; nothing bad will happen with a plain (*timer).Stop(), unless it's for something like a timeout where we want to make sure the timeout isn't triggered after the stop because there was already an undrained value in the channel. In that case the new StopAndDrain() makes sure the channel is drained.
I used a non-blocking receive in the convenience functions because it's sometimes hard to know whether the channel has already been drained when calling StopAndDrain(); we just want to make sure it is drained afterwards.