Restore state from disk #4

Open
marceloboeira opened this issue May 27, 2019 · 1 comment

Comments

@marceloboeira
Owner

marceloboeira commented May 27, 2019

Right now, there is no API for restoring the state of a log from disk, and the process can’t afford to be volatile with its data.

Imagine you have to restart a process: the commit-log should be quickly readable from a folder on disk. You don't want to lose all the log that you had already ingested.

Since the end-to-end functionality is complex, we can start by breaking up the tasks top-down:

CommitLog

When opening a commit_log, a folder should be given. The files within the folder should be organised into pairs of same-named files (index/log); sub directories and anything else should be ignored. It would be good to read the files in order of creation or, even better, by their offset (the file name).

~/foo/commit-log
  /00000000000001.idx
  /00000000000001.log
  /00000000000002.idx
  /00000000000003.log
  /00000000000005.idx --> ignored (missing log file)
  /a.txt --> ignored
  /b/c.txt --> ignored
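The pairing rule above could be sketched roughly like this. This is only an illustration, not the crate's API: `SegmentFiles` and `scan_segments` are hypothetical names, and the sketch assumes that file names are plain decimal offsets and that the extensions are exactly `idx` and `log`.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical pair of files that make up one segment on disk.
#[derive(Debug, PartialEq)]
pub struct SegmentFiles {
    pub base_offset: u64,
    pub index: PathBuf,
    pub log: PathBuf,
}

/// Lists the complete index/log pairs found directly in `dir`,
/// ordered by base offset (the numeric file name).
pub fn scan_segments(dir: &Path) -> io::Result<Vec<SegmentFiles>> {
    // base offset -> (index path, log path); a BTreeMap keeps keys sorted.
    let mut found: BTreeMap<u64, (Option<PathBuf>, Option<PathBuf>)> = BTreeMap::new();

    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue; // sub directories are ignored
        }
        // Only .idx and .log files are of interest.
        let is_index = match path.extension().and_then(|e| e.to_str()) {
            Some("idx") => true,
            Some("log") => false,
            _ => continue,
        };
        // The file name must be a plain decimal offset, e.g. 00000000000001.
        let base_offset = match path
            .file_stem()
            .and_then(|s| s.to_str())
            .filter(|s| s.bytes().all(|b| b.is_ascii_digit()))
            .and_then(|s| s.parse::<u64>().ok())
        {
            Some(n) => n,
            None => continue,
        };
        let slot = found.entry(base_offset).or_default();
        if is_index {
            slot.0 = Some(path);
        } else {
            slot.1 = Some(path);
        }
    }

    // Keep only offsets that have both files; incomplete pairs are dropped.
    Ok(found
        .into_iter()
        .filter_map(|(base_offset, pair)| match pair {
            (Some(index), Some(log)) => Some(SegmentFiles { base_offset, index, log }),
            _ => None,
        })
        .collect())
}
```

Sorting by the parsed offset rather than by creation time keeps the order stable even if files are copied or restored from a backup.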

API for reopening a directory

let mut c = CommitLog::open("~/foo");
// everything should work the same down here

c.write()
c.read()

Segment

When opening a segment, both the log and the index file paths should be given for a full check. Each file check is performed by the index/log structs, but the segment should ensure that the returned struct will be open for writing or closed …

File level

Index File

Trickiest part, since it is the reference for where data is stored on the files themselves. I would say this is the first part to be implemented.

The procedure must reopen a given file, and check its content / space left.

The index is truncated on creation (filled with empty bytes). That’s good because it spares space on disk and in memory, but a bit bad because when reopening we have to figure out where we stopped writing to it. If we just look at the file size, it will always report the max_size defined beforehand, so we need to find the first empty byte to actually make sense of it.

There are several ways of doing so; what I’ve mainly seen implemented is a binary search within the file to look up entries.
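One possible reading of the binary search approach, as a rough sketch: search for the first entry slot that is still all zeros. The function name `written_entries` is made up for illustration, and the sketch assumes that entries are appended in order, that each takes 20 bytes, and that a written entry is never entirely zero (which may not hold for a real entry layout).

```rust
/// Entry size in bytes, taken from the issue text.
const ENTRY_SIZE: usize = 20;

/// Returns how many entries have been written to a zero-filled index buffer.
/// Assumption: a written entry always contains at least one non-zero byte.
pub fn written_entries(index: &[u8]) -> usize {
    // True when the i-th entry slot contains only zero bytes.
    let is_empty = |i: usize| {
        index[i * ENTRY_SIZE..(i + 1) * ENTRY_SIZE]
            .iter()
            .all(|&b| b == 0)
    };

    let mut lo = 0;
    let mut hi = index.len() / ENTRY_SIZE;
    // Invariant: slots before `lo` are written, slots from `hi` on are empty.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_empty(mid) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    lo
}
```

This touches only O(log n) slots, which matters if the index is large or memory-mapped; the byte offset where writing should resume would then be `written_entries(index) * ENTRY_SIZE`.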

One idea is to read the file in reverse until you find the first “existing” byte, set that as the end of the file, and then do a quick check on the entries' size by trying to divide the used length by the default entry size (20).
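A minimal sketch of the reverse scan, working on the index contents as a byte slice. The name `used_len` is hypothetical. Rounding up to a whole entry (instead of rejecting a length that is not a multiple of 20) is an assumption of this sketch: the last entry may legitimately end in zero bytes, so the last non-zero byte does not have to sit on an entry boundary.

```rust
/// Entry size in bytes, taken from the issue text.
const ENTRY_SIZE: usize = 20;

/// Returns the number of bytes in use at the start of a zero-filled index buffer.
pub fn used_len(index: &[u8]) -> usize {
    // Scan from the back for the last non-zero byte; `end` is one past it.
    let end = index
        .iter()
        .rposition(|&b| b != 0)
        .map_or(0, |i| i + 1);

    // Round up to a whole number of entries, since the final entry
    // may end with zero bytes that belong to it.
    let rounded = (end + ENTRY_SIZE - 1) / ENTRY_SIZE * ENTRY_SIZE;

    // Never report more than the buffer actually holds.
    rounded.min(index.len())
}
```

Compared with a binary search this is linear in the amount of empty space, but it makes no assumption about what a written entry looks like beyond its last entry containing some non-zero byte.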

Log

There isn’t too much to do here other than open the file and check if it is still "open" (meaning that it has space left for writes). That's done by properly checking the size, empty bytes shouldn't count.

Important: for the file implementations, check the reference links.


Questions still open here:

  • Should we keep storing segments as Vec<Segment>?
    • How memory-efficient is it?
    • Should we look at BTrees for efficiency?
  • Should we store index offsets combined with the commit-log base offset (base offset + index offset), or keep the two separate?
    • I’ve seen other implementations where all the offsets were global: the base (filename) plus the offset within the file. I never got to dive into the tradeoffs. It seems that if you don’t do it, searching is a bit more troublesome later on when reading. (I’ll probably move this to another issue.)
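To make the two questions a bit more concrete, here is a small sketch of how a lookup by global offset might look with a sorted Vec versus a BTreeMap, plus the conversion between global and segment-relative offsets. All function names are hypothetical, and segments are reduced to their base offsets for brevity.

```rust
use std::collections::BTreeMap;

/// Sorted Vec variant: finds the base offset of the segment that would
/// contain `offset`, given the base offsets in ascending order.
pub fn find_in_vec(bases: &[u64], offset: u64) -> Option<u64> {
    // Number of segments whose base offset is <= `offset`.
    let n = bases.partition_point(|&b| b <= offset);
    n.checked_sub(1).map(|i| bases[i])
}

/// BTreeMap variant: the segment with the greatest base offset <= `offset`.
pub fn find_in_btree<S>(segments: &BTreeMap<u64, S>, offset: u64) -> Option<(&u64, &S)> {
    segments.range(..=offset).next_back()
}

/// Converts a global offset into an offset relative to a segment's base.
/// Returns None when the offset lies before the segment starts.
pub fn to_relative(base: u64, global: u64) -> Option<u64> {
    global.checked_sub(base)
}

/// Converts a segment-relative offset back into a global offset.
pub fn to_global(base: u64, relative: u64) -> u64 {
    base + relative
}
```

Both lookups are O(log n). A sorted Vec is compact and cheap when segments are only ever appended at the end, while a BTreeMap makes removing old segments from the front simpler; which one fits better likely depends on how retention ends up being implemented.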

Acceptance criteria

At the end of this task, we should be able to reopen a log from disk following the above instructions/considerations.

References:

@marceloboeira marceloboeira added the help wanted Extra attention is needed label May 27, 2019
@marceloboeira marceloboeira added this to the 0.1.0 milestone May 27, 2019
@marceloboeira
Owner Author

I have started a WIP of the index reopening here: 1fd9baf. I haven't had time to go through with it so far.
