Last sprint, we had a bunch of database corruptions that were really hard to debug.
One way we avoided this kind of issue on other projects was by running a complete scan of the database in the tests after each insertion/commit.
Milli contains a lot of databases, but we should still implement the same kind of function.
I believe @dureuill already made something like that for the filters. It could be a good first step to integrate it into our test suite.
And then grow the number of checks over time.
This strategy has already been implemented in the index scheduler and arroy.
I think the most aggressive checks are made in arroy, here’s the code: https://github.com/meilisearch/arroy/blob/19e0a07d40fd2b7685b70c941fee00400e9dda24/src/reader.rs#L361-L441
But as a TLDR, here’s what I actually check:
All the trees must be valid
That means I can start from the root of a tree and follow every node down to every leaf without ever encountering an unknown node or item ID.
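As a sketch of what such a traversal check can look like, here is a toy version (the `Node` model and function names below are illustrative stand-ins, not arroy's actual types):

```rust
use std::collections::{HashMap, HashSet};

// Toy node model for illustration only; arroy's real node types are richer.
pub enum Node {
    Split { left: u32, right: u32 }, // internal node referencing two children
    Leaf { item: u32 },              // leaf wrapping one user-provided item id
}

// Walk a tree from its root and fail on the first reference to a node id that
// does not exist, or on a node reached twice (which would mean a cycle).
// On success, return the set of item ids reachable from the root.
pub fn assert_tree_is_valid(
    nodes: &HashMap<u32, Node>,
    root: u32,
) -> Result<HashSet<u32>, String> {
    let mut items = HashSet::new();
    let mut visited = HashSet::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        if !visited.insert(id) {
            return Err(format!("node {id} reached twice: the tree contains a cycle"));
        }
        match nodes.get(&id) {
            None => return Err(format!("tree references unknown node id {id}")),
            Some(Node::Leaf { item }) => {
                items.insert(*item);
            }
            Some(Node::Split { left, right }) => {
                stack.push(*left);
                stack.push(*right);
            }
        }
    }
    Ok(items)
}

// Tiny well-formed tree used in the demo below.
pub fn sample_tree() -> HashMap<u32, Node> {
    HashMap::from([
        (0, Node::Split { left: 1, right: 2 }),
        (1, Node::Leaf { item: 10 }),
        (2, Node::Leaf { item: 11 }),
    ])
}

fn main() {
    let items = assert_tree_is_valid(&sample_tree(), 0).unwrap();
    println!("items reachable from the root: {items:?}");
}
```

The returned item set is what the next check consumes.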
All the item IDs are used exactly once in every tree
While traversing a tree, I also ensure that every item ID the user provides is in the tree.
So, there is no extraneous ID and no missing ID.
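A minimal sketch of that comparison, assuming the traversal above already collected the item ids found in one tree (names are hypothetical):

```rust
use std::collections::HashSet;

// Compare the item ids collected while traversing one tree against the ids
// the user actually provided: nothing extraneous, nothing duplicated,
// nothing missing.
pub fn assert_items_match(
    found_in_tree: &[u32],
    provided: &HashSet<u32>,
) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &item in found_in_tree {
        if !provided.contains(&item) {
            return Err(format!("extraneous item id {item}: the user never provided it"));
        }
        if !seen.insert(item) {
            return Err(format!("item id {item} appears more than once in the tree"));
        }
    }
    match provided.iter().find(|id| !seen.contains(*id)) {
        Some(missing) => Err(format!("item id {missing} is missing from the tree")),
        None => Ok(()),
    }
}

fn main() {
    let provided = HashSet::from([10, 11, 12]);
    assert_items_match(&[10, 11, 12], &provided).unwrap();
    println!("every provided item id is used exactly once");
}
```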
The node of one tree cannot be used in another tree
A tree should never contain a reference to another tree; if that happens, it means something got corrupted somewhere, and we re-used a wrong node ID in an incremental build process.
It could be really hard to fix if caught too late.
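One way to express this check, assuming each tree's traversal produced the set of node ids it owns (a sketch, not arroy's code):

```rust
use std::collections::{HashMap, HashSet};

// Every tree must own a disjoint set of node ids. A shared id means a node
// from one tree leaked into another, e.g. through a reused id during an
// incremental build.
pub fn assert_trees_are_disjoint(trees: &[HashSet<u32>]) -> Result<(), String> {
    let mut owner: HashMap<u32, usize> = HashMap::new();
    for (tree, nodes) in trees.iter().enumerate() {
        for &node in nodes {
            if let Some(previous) = owner.insert(node, tree) {
                return Err(format!(
                    "node {node} is used by both tree {previous} and tree {tree}"
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    let trees = vec![HashSet::from([0, 1, 2]), HashSet::from([3, 4])];
    assert_trees_are_disjoint(&trees).unwrap();
    println!("no node is shared between trees");
}
```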
Nothing is left unknown in the database
After checking all of that, I go through an exhaustive list of everything in the database.
If I find anything else in the database, that’s a bug. It means that some nodes are leaked in the database, and over multiple indexing processes, the database could grow for no reason or, worse, cause corruption.
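The final sweep can be sketched like this, assuming the previous checks accumulated the set of all reachable ids (again a toy stand-in, not the real database iteration):

```rust
use std::collections::HashSet;

// Final sweep: every key present in the database must have been reached while
// validating the trees. Anything else is a leaked node that would make the
// database grow for no reason across indexing runs.
pub fn assert_no_leaked_entries(
    db_keys: &[u32],
    reachable: &HashSet<u32>,
) -> Result<(), String> {
    match db_keys.iter().find(|key| !reachable.contains(*key)) {
        Some(leaked) => Err(format!("entry {leaked} belongs to no tree: it leaked")),
        None => Ok(()),
    }
}

fn main() {
    let reachable = HashSet::from([0, 1, 2]);
    assert_no_leaked_entries(&[0, 1, 2], &reachable).unwrap();
    println!("every database entry is accounted for");
}
```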
This check is called everywhere
In the end, in arroy, every time I update a database, I snapshot its content on disk just to be sure it never changes in an unexpected way.
And the function in charge of snapshotting the database calls the assert validity function:
https://github.com/meilisearch/arroy/blob/19e0a07d40fd2b7685b70c941fee00400e9dda24/src/tests/mod.rs#L41
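The pattern is easy to reproduce in any test suite: make the snapshot helper the single choke point that asserts validity. Here is a minimal sketch where the `Database` type and its methods are hypothetical stand-ins, not arroy's actual API:

```rust
use std::collections::HashSet;

// Hypothetical stand-in database: an ordered list of (node id, value) pairs.
struct Database {
    entries: Vec<(u32, u32)>,
}

impl Database {
    // Stand-in validity check: here it only rejects duplicate node ids, but
    // in a real implementation this is where all the checks above would run.
    fn check_validity(&self) -> Result<(), String> {
        let mut seen = HashSet::new();
        for (id, _) in &self.entries {
            if !seen.insert(id) {
                return Err(format!("duplicate node id {id}"));
            }
        }
        Ok(())
    }

    // Snapshotting always asserts validity first, so a corrupted database can
    // never silently pass a snapshot test.
    fn snapshot(&self) -> String {
        self.check_validity().expect("the database is corrupted");
        self.entries.iter().map(|(id, v)| format!("{id}: {v}\n")).collect()
    }
}

fn main() {
    let db = Database { entries: vec![(0, 42), (1, 7)] };
    print!("{}", db.snapshot());
}
```

Because every test that inspects the database goes through `snapshot`, the validity assertion runs after each mutation for free.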