Exponential wait time increase if Raft Snapshot fails #1664
Actually, it wouldn't need to change the busy timeout. Just exec the SQL statement (
An alternative idea, which might be more elegant, would be to use channels (or condition variables) to signal when there are Store.Request() or Store.Query() functions active. Whenever the number of those requests goes from non-zero to zero, a window has opened up in which a snapshot could take place. Use that window to block more read requests, check if a snapshot is needed due to WAL size, and do it if necessary.
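A minimal sketch of that gating idea in Go, with hypothetical names (this is not rqlite's actual Store API): a counter of in-flight requests plus a condition variable that lets the snapshotter claim the window the moment the count reaches zero.

```go
package gate

import "sync"

// SnapshotGate tracks active requests and lets a snapshotter claim the
// moment the active count drops to zero.
type SnapshotGate struct {
	mu       sync.Mutex
	cond     *sync.Cond
	active   int  // in-flight Request()/Query() calls
	wantSnap bool // a snapshot is waiting for a window
}

func NewSnapshotGate() *SnapshotGate {
	g := &SnapshotGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// Enter is called at the start of Store.Request()/Store.Query().
// It blocks while a snapshot has claimed the window.
func (g *SnapshotGate) Enter() {
	g.mu.Lock()
	for g.wantSnap {
		g.cond.Wait()
	}
	g.active++
	g.mu.Unlock()
}

// Exit is called when the request finishes. When the count reaches zero
// it wakes any snapshotter waiting in Snapshot().
func (g *SnapshotGate) Exit() {
	g.mu.Lock()
	g.active--
	if g.active == 0 {
		g.cond.Broadcast()
	}
	g.mu.Unlock()
}

// Snapshot blocks new requests, waits for in-flight ones to drain, runs
// fn (e.g. "checkpoint if the WAL is too big"), then reopens the gate.
func (g *SnapshotGate) Snapshot(fn func()) {
	g.mu.Lock()
	g.wantSnap = true
	for g.active > 0 {
		g.cond.Wait()
	}
	g.mu.Unlock()

	fn()

	g.mu.Lock()
	g.wantSnap = false
	g.cond.Broadcast()
	g.mu.Unlock()
}
```

The WAL-size check lives inside the function passed to Snapshot(), so a window that doesn't actually need a checkpoint is released immediately.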
I like the alternative idea. The only thing I would add is some hint, visible to clients, indicating whether a snapshot is desired right now, perhaps as an addition to one of the existing endpoints. Exposing that would be useful both for analytics and for potentially deferring non-urgent requests, and would guarantee snapshotting completes.
Building on that, if additional 503s that aren't opt-in aren't desirable, it could instead be accomplished with a precondition on the request, similar to freshness (I can't think of a good name for it). This would allow the 503s to be opt-in, which could be friendlier for environments where client retries aren't robustly implemented (e.g., short-term projects, one-off scripts, etc.).
Yeah, I like that. Expose the information, so it's there for diagnostics and power users if needed.
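As a sketch of how the opt-in precondition might look as HTTP middleware (the parameter name reject_on_snapshot is invented purely for illustration; rqlite has no such parameter today, and snapshotPending would be fed by whatever mechanism tracks that a snapshot wants a window):

```go
package main

import (
	"net/http"
	"sync/atomic"
)

var snapshotPending atomic.Bool // true while a snapshot is waiting for a window

func withSnapshotPrecondition(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		optIn := r.URL.Query().Get("reject_on_snapshot") == "true"
		if optIn && snapshotPending.Load() {
			// The client asked to be rejected rather than block; it can
			// retry on another node that isn't snapshotting right now.
			w.Header().Set("Retry-After", "1")
			http.Error(w, "snapshot pending", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r) // clients that don't opt in keep today's behaviour
	})
}
```

A client that opts in might request /db/query?level=none&reject_on_snapshot=true&q=...; anyone who omits the parameter sees no change.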
Additional context: my interpretation of the sqlite forum post you made (https://sqlite.org/forum/forumpost/d6ac6bf752) is that it's not just one read query that slows it down, it's this scenario:

req6 |           xxxxx
req5 |         xxxxx
req4 |       xxxxx
req3 |     xxxxx
req2 |   xxxxx
req1 | xxxxx
walL | 111112233445566
-----------------
time ->
reqN = incoming none-level request N
walL = WAL snapshotting is prevented due to lock,
indicating the oldest owner of the lock
(1,2,3,4,5,6 = request number)

As I understand it, this can be resolved in only one of two ways:
req6 | BBxxxxx
req5 | BBBBxxxxx
req4 | xxxxx
req3 | Bxxxxx
req2 | xxxxx
req1 | xxxxx
walL | 11111223344 55566
snap | F BF BBBS F
-------------------
time ->
reqN = incoming none-level request N; x means running,
B means blocked
walL = WAL snapshotting is prevented due to lock,
indicating the oldest owner of the lock
(1,2,3,4,5,6 = request number)
snap = WAL snapshot attempt; B = blocked, F = fail,
S = success

Depending on the system this could be pretty decent; you don't get failures on your none-level queries, but in the extreme case they might take a very long time to respond, because they spend most of the request time blocked.
req6 | F
req5 | F
req4 | xxxxx
req3 | F
req2 | F
req1 | xxxxx
walL | 11111 44444
snap | FRRRS FRRS
-------------------
time ->
reqN = incoming none-level request N; x means running,
F means failed due to snapshotting
walL = WAL snapshotting is prevented due to lock,
indicating the oldest owner of the lock
(1,2,3,4,5,6 = request number)
snap = WAL snapshot attempt; F = fail, R = causing rejections
to new queries, S = success

Depending on the system this could also be pretty decent; if you have a large enough cluster, presumably at least one node is not snapshotting at any given time, so standard retrying on other nodes will find a place to put your request. Requests are fast in one sense: they either fail instantly, or are dominated by the actual time the query itself requires. Personally, I prefer this latter approach, since it allows for more control later (e.g., prioritizing queries): rqlite manages the failures, rather than SQLite's locks, which treat all queries equally.
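A rough sketch of that last timeline, again with hypothetical names rather than rqlite's actual Store: after one failed checkpoint, new none-level reads are rejected (the "R" ticks above) until the in-flight readers drain and the checkpoint succeeds, instead of blocking the new reads.

```go
package store

import (
	"errors"
	"sync"
	"time"
)

var ErrSnapshotPending = errors.New("snapshot pending; retry on another node")

type Store struct {
	mu        sync.Mutex
	rejecting bool // set after a failed checkpoint attempt
}

// QueryNone is the none-consistency read path in this sketch. Reads that
// started before rejecting was set simply run to completion and drain.
func (s *Store) QueryNone(q string) error {
	s.mu.Lock()
	if s.rejecting {
		s.mu.Unlock()
		return ErrSnapshotPending // fails instantly, as in the diagram
	}
	s.mu.Unlock()
	// ... run the read against SQLite ...
	return nil
}

// Snapshot tries the checkpoint once; on failure it flips to rejecting
// mode so existing readers can drain, then retries until it succeeds.
func (s *Store) Snapshot(checkpoint func() error) error {
	if err := checkpoint(); err == nil {
		return nil // the immediate-"S" case
	}
	s.mu.Lock()
	s.rejecting = true // start emitting "R"s to new queries
	s.mu.Unlock()
	defer func() {
		s.mu.Lock()
		s.rejecting = false
		s.mu.Unlock()
	}()
	for {
		if err := checkpoint(); err == nil {
			return nil
		}
		time.Sleep(50 * time.Millisecond) // simple poll; real code could signal instead
	}
}
```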
Started with #1643
8.x introduced a degenerate behaviour that was not present in the 7.x series. It goes like this: a long-running read holds a SQLite read lock, which blocks the WAL checkpoint that snapshotting requires, so the snapshot fails. Meanwhile the WAL keeps growing, so the next snapshot attempt, when it does run, takes even longer, and the wait increases with every failure.
This is not really solvable in a fundamental way, not without wild ideas like keeping two copies of the SQLite database under rqlite. The benefit of 8.x is much more efficient snapshotting generally. The cost is that reads can block writes.
I'm not happy with writes being blocked this long by long-running reads. I don't like it because in SQLite reads do not block writes when the WAL is enabled (except while a RESTART or TRUNCATE checkpoint is running). So I would like to minimize the amount of time a long-running read can block writes, at least initially.
I think it would be better if the snapshotting process waited up to 500ms the first time, and doubled that wait time on every failure. Next time it would wait 1 sec, then 2, then 4, and so on. This could be achieved by changing the busy timeout on the connection used for checkpointing (the read-write connection), just before the checkpoint operation is attempted. It would revert to 5 seconds (so existing write timeouts are otherwise maintained) before the checkpoint function returns. And if checkpointing failed, rqlite would remember to double the timeout for the next attempt. Once snapshotting succeeds, the timeout returns to 500ms.
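A sketch of what that could look like against a dedicated read-write connection, using database/sql (the wiring here is hypothetical; rqlite's real checkpointing code differs). Note that a blocked wal_checkpoint reports the blockage through the busy column of its result row once the busy timeout expires, rather than through an error:

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

const (
	minBusyTimeout     = 500 * time.Millisecond
	defaultBusyTimeout = 5 * time.Second // the existing write timeout
)

// Checkpointer wraps the read-write connection used for checkpointing.
// It assumes the *sql.DB is pinned to a single underlying connection
// (e.g. via db.SetMaxOpenConns(1)), so the PRAGMA applies to the
// connection that actually runs the checkpoint.
type Checkpointer struct {
	db      *sql.DB
	timeout time.Duration
}

func NewCheckpointer(db *sql.DB) *Checkpointer {
	return &Checkpointer{db: db, timeout: minBusyTimeout}
}

func (c *Checkpointer) setBusyTimeout(d time.Duration) error {
	_, err := c.db.Exec(fmt.Sprintf("PRAGMA busy_timeout=%d", d.Milliseconds()))
	return err
}

// Checkpoint runs a TRUNCATE checkpoint under the current (backed-off)
// timeout, reverting to the 5s default before returning so normal write
// behaviour is unchanged.
func (c *Checkpointer) Checkpoint() error {
	if err := c.setBusyTimeout(c.timeout); err != nil {
		return err
	}
	defer c.setBusyTimeout(defaultBusyTimeout)

	var busy, logFrames, checkpointed int
	err := c.db.QueryRow("PRAGMA wal_checkpoint(TRUNCATE)").
		Scan(&busy, &logFrames, &checkpointed)
	if err == nil && busy == 0 {
		c.timeout = minBusyTimeout // success: reset the backoff
		return nil
	}
	c.timeout *= 2 // wait 500ms, then 1s, 2s, 4s, ... on repeated failure
	if err != nil {
		return err
	}
	return fmt.Errorf("checkpoint blocked by a reader")
}
```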
That said, I'm not sure this is strictly better. At the end of the day snapshotting has to eventually happen, and every time a snapshot is missed, the next snapshot, if it succeeds, will take longer. Perhaps a more aggressive growth factor (quadrupling, for example) would help.
cc @Tjstretchalot for any opinion.