Restore state from disk #4
Labels
help wanted
Extra attention is needed
Comments
I have started a WIP of the index reopening here: 1fd9baf. I haven't had time to finish it so far.
Closed
Right now, there is no API for restoring the state of a log from disk… The process can't treat its data as volatile.
Imagine you have to restart a process: the commit log should be quickly readable from a folder on disk, because you don't want to lose everything you had already ingested.
Since the end-to-end functionality is complex, we can start by breaking up the tasks top-down:
CommitLog
When opening a commit_log, a folder should be given. Within that folder, files should be organised into tuples of same-named files (index/log); subdirectories and anything else should be ignored. It would be good to read the files in order of creation or, even better, by their offset (name).
API for reopening a directory
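A minimal sketch of the directory pass described above, to make the pairing rule concrete. It assumes files are named after their base offset with `.log` and `.index` extensions (for example `00000000000000000100.log`); that naming scheme, the `SegmentFiles` struct, and both function names are hypothetical and not taken from the current code.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical pair of files that share the same base offset in their name.
#[derive(Debug, PartialEq)]
pub struct SegmentFiles {
    pub base_offset: u64,
    pub log: PathBuf,
    pub index: PathBuf,
}

/// Groups candidate paths into (log, index) pairs, sorted by base offset.
/// Assumption: the file stem is the offset, the extension is `log` or `index`.
pub fn pair_segment_files(paths: &[PathBuf]) -> Vec<SegmentFiles> {
    // offset -> (log path, index path); a BTreeMap keeps offsets ordered.
    let mut found: BTreeMap<u64, (Option<PathBuf>, Option<PathBuf>)> = BTreeMap::new();
    for path in paths {
        let stem = match path.file_stem().and_then(|s| s.to_str()) {
            Some(s) => s,
            None => continue,
        };
        // Names that are not a number are ignored.
        let offset = match stem.parse::<u64>() {
            Ok(o) => o,
            Err(_) => continue,
        };
        match path.extension().and_then(|e| e.to_str()) {
            Some("log") => found.entry(offset).or_default().0 = Some(path.clone()),
            Some("index") => found.entry(offset).or_default().1 = Some(path.clone()),
            _ => {}
        }
    }
    // Keep only offsets for which both files exist.
    found
        .into_iter()
        .filter_map(|(base_offset, pair)| match pair {
            (Some(log), Some(index)) => Some(SegmentFiles { base_offset, log, index }),
            _ => None,
        })
        .collect()
}

/// Lists regular files directly inside `dir` (sub directories are skipped)
/// and pairs them with `pair_segment_files`.
pub fn scan_dir(dir: &Path) -> std::io::Result<Vec<SegmentFiles>> {
    let mut files = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            files.push(entry.path());
        }
    }
    Ok(pair_segment_files(&files))
}
```

Sorting by the offset parsed from the name, rather than by creation time, keeps the order stable even if files are copied or restored from a backup. A log without a matching index (or the reverse) is silently dropped here; a real implementation would probably want to report it instead.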
Segment
When opening a segment, both the log and index file paths should be given for a full check. Each file check is performed by the index/log structs, but the segment should ensure that the returned struct is either open for writing or closed …
File level
Index File
This is the trickiest part, since the index is the reference for where data is stored in the files themselves. I would say this is the first part to be implemented.
The procedure must reopen a given file, and check its content / space left.
The index is truncated to its maximum size on creation (filled with empty bytes). That's good because it spares space on disk and in memory, but a bit bad because when reopening we have to figure out where we stopped writing to it. If we just look at the file size, it will always report the max_size defined beforehand, so we need to find the first empty byte to actually make sense of it.
There are several ways of doing so; what I've mainly seen implemented is a binary search within the file to look up entries.
One idea is to read the file in reverse until you find the first “existing” (non-empty) byte, set that as the end of the file, and then sanity-check the result by dividing the used length by the default entry size (20).
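Both approaches can be sketched over an in-memory view of the index (for example a memory-mapped file exposed as `&[u8]`). This is only an illustration under two assumptions that the real format may not satisfy: every written entry contains at least one non-zero byte, and unwritten space is all zeros. The function names are hypothetical.

```rust
/// Default entry size mentioned above.
pub const ENTRY_SIZE: usize = 20;

/// Reverse scan: finds the last non-zero byte and returns the used length
/// in bytes, rounded up to a whole entry (an entry may end in zero bytes).
pub fn used_len_reverse_scan(buf: &[u8]) -> usize {
    match buf.iter().rposition(|&b| b != 0) {
        // No non-zero byte at all: nothing was written.
        None => 0,
        Some(last) => {
            let entries = last / ENTRY_SIZE + 1;
            // Clamp in case the buffer length is not a multiple of ENTRY_SIZE.
            (entries * ENTRY_SIZE).min(buf.len())
        }
    }
}

/// Binary search: returns the number of written entries, i.e. the index of
/// the first all-zero slot. Relies on written slots forming a prefix.
pub fn used_entries_binary_search(buf: &[u8]) -> usize {
    let total = buf.len() / ENTRY_SIZE;
    let (mut lo, mut hi) = (0, total);
    // Invariant: slots before `lo` are written, slots from `hi` on are empty.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let slot = &buf[mid * ENTRY_SIZE..(mid + 1) * ENTRY_SIZE];
        if slot.iter().all(|&b| b == 0) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    lo
}
```

The binary search only touches a logarithmic number of slots, which matters for large indexes, while the reverse scan is simpler and makes the "divide by 20" sanity check trivial. Note that a first entry pointing at offset 0 and position 0 could be all zeros, in which case neither approach can tell it apart from empty space.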
Log
There isn’t too much to do here other than open the file and check if it is still "open" (meaning that it has space left for writes). That's done by properly checking the size; empty bytes shouldn't count.
Important: for the file implementations, check the reference links.
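As a rough sketch of that check, the code below scans backwards in chunks for the last non-zero byte and derives an open/closed state from it. It assumes the log file is preallocated to its maximum size and zero-filled, so the file length is the capacity; `LogState`, `used_len` and `log_state` are hypothetical names. It works on anything that is `Read + Seek`, which includes `std::fs::File`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical state of a reopened log file.
#[derive(Debug, PartialEq)]
pub enum LogState {
    /// Space is left; writes would resume at `write_pos`.
    Open { write_pos: u64, remaining: u64 },
    /// No space left for writes.
    Closed,
}

/// Returns the used length: the position just past the last non-zero byte.
/// Reads the file from the end in fixed-size chunks.
pub fn used_len<R: Read + Seek>(file: &mut R) -> io::Result<u64> {
    const CHUNK: u64 = 4096;
    let mut end = file.seek(SeekFrom::End(0))?;
    let mut buf = vec![0u8; CHUNK as usize];
    while end > 0 {
        let start = end.saturating_sub(CHUNK);
        let len = (end - start) as usize;
        file.seek(SeekFrom::Start(start))?;
        file.read_exact(&mut buf[..len])?;
        if let Some(i) = buf[..len].iter().rposition(|&b| b != 0) {
            return Ok(start + i as u64 + 1);
        }
        // Whole chunk was empty, continue with the previous one.
        end = start;
    }
    Ok(0)
}

/// Compares the used length with the file length (assumed to be the capacity).
pub fn log_state<R: Read + Seek>(file: &mut R) -> io::Result<LogState> {
    let capacity = file.seek(SeekFrom::End(0))?;
    let write_pos = used_len(file)?;
    Ok(if write_pos < capacity {
        LogState::Open { write_pos, remaining: capacity - write_pos }
    } else {
        LogState::Closed
    })
}
```

A record whose payload ends in zero bytes would be undercounted by this scan, so cross-checking the result against the last index entry is probably safer than trusting the log file alone.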
Questions still open here:
Vec<Segment>?
Acceptance criteria
At the end of this task, we should be able to reopen a log from disk following the above instructions/considerations.
References:
Kafka Log
Kafka Index
https://github.com/travisjeffery/jocko/blob/master/commitlog/commitlog.go#L95-L133
https://github.com/zowens/commitlog/blob/master/src/index.rs#L203-L226