all: Be more stringent with timer resets/stops (ref #9417) #9422

calmh · 2024-02-15T13:08:23Z

I found some concerning things while looking through our timer handling to see if we leaked timers. I didn't find any leaks of tickers, which is good, and most timers were stopped which is also fine. There were some missing stops that I added, but these should not be critical as a normal timer will anyway be collected when it expires. One place created time.After() in a loop which is not great, that was cleaned up.

However, there were some inconsistencies mostly around our use of resetting timers. The documentation is quite clear on what not to do (although it's less clear on the consequences if you do it anyway):

For a Timer created with NewTimer, Reset should be invoked only on stopped or expired timers with drained channels.

If a program has already received a value from t.C, the timer is known to have expired and the channel drained, so t.Reset can be used directly. If a program has not yet received a value from t.C, however, the timer must be stopped and—if Stop reports that the timer expired before being stopped—the channel explicitly drained:

if !t.Stop() {
<-t.C
}
t.Reset(d)

(https://pkg.go.dev/time#Timer.Reset)

This is impossible to guarantee when doing Reset() from outside the loop that processes the timer in question. Any place where we saved a timer in a struct, looped on a select over it, and did resets from another goroutine would run afoul of this rule.

To handle this better I've rewritten those places to use "local" timers (i.e., timer is declared in the function that loops over it, nobody else can touch it, and we make sure it gets stopped with a defer) and a channel to trigger reset from within that loop instead.

I also added a couple of convenience functions to reset a timer properly and stop a timer properly. Using the stop method is less critical, nothing bad will happen by using just a normal (*timer).Stop(), unless it's for something like a timeout where we want to make sure the timeout isn't trigger after doing the stop because there was already an undrained value in the channel. Then using the new StopAndDrain() will make sure the channel is drained.

I used the non-blocking receive in the convenience functions because it's sometimes hard to know whether the channel is already drained when calling StopAndDrain, we just want to make sure it is drained afterwards.

imsodin

This was pretty casual review (causing a few surprised grins) until I reached folder.go o.O

imsodin · 2024-02-16T19:05:07Z

lib/events/events.go

- if runningTests {
- // Make the behavior stable when running tests to avoid randomly
- // varying test coverage. This ensures, in practice if not in
- // theory, that the timer fires and we take the true branch of the
- // next if.
- runtime.Gosched()
- }


imsodin · 2024-02-16T19:13:34Z

lib/model/folder.go

 if (err != nil || !success) && f.pullPause < 60*f.pullBasePause() {
 // Back off from retrying to pull
 f.pullPause *= 2
+
+ // Pulling failed, try again later.
+ delay := f.pullPause + time.Since(t0)
+ l.Infof("Folder %v isn't making sync progress - retrying in %v.", f.Description(), stringutil.NiceDurationString(delay))
+ timeutil.ResetTimer(pullFailTimer, delay)
 }


I don't think resetting the fail timer should be inside this condition. Resp. it should be split: Reset the fail timer for (err != nil || !success), and inside that condition and before resting the timer have another if condition to increase the pause.

Also I think f.pullPause is now only used in this loop/serve function, so can be removed from the struct too.

imsodin · 2024-02-16T19:16:09Z

lib/model/folder.go

 }

+ case <-pullFailTimer.C:
+ f.SchedulePull()


Afaik you also need to schedule a pull instead of calling .pull() in the next case. Resp. maybe it's safer to pass the fail timer into pull and do the resetting there, that way one can't call pull and forget to handle the timer.

imsodin · 2024-02-16T19:24:40Z

lib/model/folder.go

 case next := <-f.scanDelay:
- l.Debugln(f, "Delaying scan")
- f.scanTimer.Reset(next)
-
- case <-f.scanScheduled:
- l.Debugln(f, "Scan was scheduled")
- f.scanTimer.Reset(0)
+ if next > 0 {
+  l.Debugln(f, "Delaying scan")
+ } else {
+  l.Debugln(f, "Scan was scheduled")
+ }
+ timeutil.ResetTimer(scanTimer, next)


That scheduling is magic, just a tiny little bit.

calmh added 11 commits February 15, 2024 09:23

all: More stringent timer handling (ref syncthing#9417)

d64bd51

wip

8cc4bca

wip

b9b179d

wip

33546d9

wip

8fc1905

wip

bc64d32

wip

36af7a9

wip

0c79e67

wip

b125aac

wip

abfc5ee

wip

b1150bb

imsodin reviewed Feb 16, 2024

View reviewed changes

calmh closed this May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

all: Be more stringent with timer resets/stops (ref #9417) #9422

all: Be more stringent with timer resets/stops (ref #9417) #9422

calmh commented Feb 15, 2024 •

edited

imsodin left a comment

imsodin Feb 16, 2024

imsodin Feb 16, 2024

imsodin Feb 16, 2024

imsodin Feb 16, 2024

all: Be more stringent with timer resets/stops (ref #9417) #9422

all: Be more stringent with timer resets/stops (ref #9417) #9422

Conversation

calmh commented Feb 15, 2024 • edited

imsodin left a comment

Choose a reason for hiding this comment

imsodin Feb 16, 2024

Choose a reason for hiding this comment

imsodin Feb 16, 2024

Choose a reason for hiding this comment

imsodin Feb 16, 2024

Choose a reason for hiding this comment

imsodin Feb 16, 2024

Choose a reason for hiding this comment

calmh commented Feb 15, 2024 •

edited