
Database thing statistics #7064

Merged: 13 commits into vaticle:3.0 on May 15, 2024
Conversation

@flyingsilverfin (Member) commented May 10, 2024

Usage and product changes

We implement the architecture and most of the machinery required for tracking data statistics. The statistics will primarily be used for query planning. We achieved several design goals:

  1. Not scanning the entire storage to update statistics
  2. Allowing access to old versions of the statistics, which supports time travel/MVCC reads even at very old versions without too much performance degradation
  3. Not writing statistics to the RocksDB storage layer, since updating the statistics keys on every transaction would degrade performance, and statistics can/should be a primarily in-memory structure

However, we accept the trade-off that the statistics are not always up-to-date. The update frequency is a parameter we can optimise.

In the end, there is a single database-wide statistics struct, which is immutable and updated periodically. We update it by scanning the data WAL records written since the last update, summing their count deltas, and then atomically replacing the Statistics struct held by the database. The statistics are also checkpointed into the WAL, which lets us time-travel to older snapshots and find a relatively accurate statistics entry from near that version.
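A minimal sketch of that update loop in Rust, assuming hypothetical names and shapes (`Statistics`, `CommitRecord`, `update_statistics`, and their fields are illustrative, not the actual API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type TypeID = u64;

// Hypothetical, simplified shape of the database-wide statistics; the real
// struct also tracks role playings, relation index entries, and more.
#[derive(Clone, Default)]
struct Statistics {
    sequence_number: u64,               // WAL sequence number this snapshot reflects
    thing_counts: HashMap<TypeID, u64>, // instance counts per thing type
}

// A data commit record read back from the WAL, carrying count deltas.
struct CommitRecord {
    sequence_number: u64,
    count_deltas: HashMap<TypeID, i64>,
}

struct Database {
    // Readers clone the Arc cheaply; the updater swaps in a fresh snapshot.
    statistics: RwLock<Arc<Statistics>>,
}

impl Database {
    fn statistics(&self) -> Arc<Statistics> {
        self.statistics.read().unwrap().clone()
    }

    // Periodic update: scan the WAL records committed since the last
    // statistics snapshot, sum their count deltas, and atomically replace
    // the immutable Statistics struct held by the database.
    fn update_statistics(&self, wal_records: impl Iterator<Item = CommitRecord>) {
        let current = self.statistics();
        let mut next = (*current).clone();
        for record in wal_records.filter(|r| r.sequence_number > current.sequence_number) {
            for (&type_id, &delta) in &record.count_deltas {
                let count = next.thing_counts.entry(type_id).or_insert(0);
                *count = count.saturating_add_signed(delta);
            }
            next.sequence_number = record.sequence_number;
        }
        *self.statistics.write().unwrap() = Arc::new(next);
    }
}
```

Keeping the snapshot immutable behind an `Arc` means readers never block on the periodic update: they simply keep using the previous snapshot until the swap happens.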

This means we are solidifying the requirement that WAL cleaning and MVCC compaction are tied to the same time-scale: both are required for going back to previous data versions correctly.

Future work
We could find that reading from the WAL to update statistics becomes a bottleneck. We can solve several problems at once by extracting the commit data "cache" from the IsolationManager into the DurabilityClient, which can then be shared across isolation and statistics operations.

Implementation

Architecture

  • Promote WAL and Checkpoint management into Database

    • we move Checkpoint and associated commit_replay methods into a new module: //storage/recovery
    • we also add an (ultimately unused) system to 'extend' a checkpoint with additional data
  • Split database creation and loading into two separate entry points, and rearrange the corresponding methods all the way down into Storage, WAL, and Checkpointing. We also update corresponding tests.

  • Given a database directory, each module creates its own subdirectory:

    • MVCCStorage creates db-name/storage
    • WAL creates db-name/wal
    • Checkpoint creates db-name/checkpoint
  • Introduce //concept/thing/Statistics, which stores thing statistics: counts of instances of each type, of role playings, of relation index entries, and so on.

  • We checkpoint statistics into the WAL using a new record type, and load the most recent one on bootup. This also helps solve the MVCC time-travel problem, where going back in time could otherwise lead to mismatched statistics being used (or even statistics that are not relevant for a different schema!). We probably allow reusing the existing Statistics if the "old" sequence number being opened is no more than N (~100) versions behind the statistics version (see the sketch below).

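A sketch of that reuse rule, reusing the `Statistics` shape from the earlier sketch; `STATISTICS_REUSE_WINDOW` and `statistics_for` are hypothetical names:

```rust
// Illustrative threshold from the description above: reuse the live
// Statistics if the snapshot being opened is at most ~100 versions behind it.
const STATISTICS_REUSE_WINDOW: u64 = 100;

enum StatisticsSource {
    Live,
    WalCheckpointAtOrBefore(u64),
}

// Hypothetical lookup: given the sequence number a transaction opens at,
// decide between the live Statistics and an older WAL-checkpointed entry.
fn statistics_for(open_sequence_number: u64, live: &Statistics) -> StatisticsSource {
    if live.sequence_number.saturating_sub(open_sequence_number) <= STATISTICS_REUSE_WINDOW {
        StatisticsSource::Live
    } else {
        // Scan the WAL backwards for the statistics checkpoint record
        // nearest to (at or before) the requested version.
        StatisticsSource::WalCheckpointAtOrBefore(open_sequence_number)
    }
}
```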

  • Statistics catch-up/synchronisation is implemented by reading data commit records from the WAL.

    • For this we re-create write snapshots from commit data read from disk. However, this constructor intentionally returns a narrower API, which means we cannot write to or commit a re-created write snapshot.
    • We add CommitType to the CommitRecord generated by CommittableSnapshots. This allows deserialising and recreating the correct type of snapshot (data or schema) from a WAL entry.
  • Refactor out the //storage/durability module into //durability package, which contains a simplified Service trait

    • We then create the DurabilityClient trait, which is now used throughout the code base wherever DurabilityService was used before
    • The intent is to allow extracting Durability onto remote machine(s) using a Calvin-style partitioned WAL, if we ever want to. We would use the client to communicate with a set of durability servers and to manage collecting ordering information, etc.
    • For now, we only have a WALClient, which wraps a WAL but conforms to the DurabilityClient trait
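A rough sketch of how these pieces could fit together; the trait methods, signatures, and `CommitType` variants below are illustrative assumptions, not the actual APIs:

```rust
// The commit type we record in each CommitRecord, so that a WAL entry can
// be deserialised back into the right kind of snapshot.
enum CommitType {
    Data,
    Schema,
}

// Simplified durability service trait (the real one lives in //durability).
trait DurabilityService {
    fn sequenced_write(&self, record_type: u8, bytes: &[u8]) -> u64; // returns the sequence number
    fn iter_from(&self, sequence_number: u64) -> Box<dyn Iterator<Item = (u64, Vec<u8>)> + '_>;
}

// The client abstraction used throughout the code base; today it only wraps
// a local WAL, but it leaves room for a set of remote durability servers.
trait DurabilityClient {
    fn write(&self, record_type: u8, bytes: &[u8]) -> u64;
    fn replay_from(&self, sequence_number: u64) -> Box<dyn Iterator<Item = (u64, Vec<u8>)> + '_>;
}

// The only implementation for now: a thin wrapper over the local WAL.
struct WALClient<S: DurabilityService> {
    wal: S,
}

impl<S: DurabilityService> DurabilityClient for WALClient<S> {
    fn write(&self, record_type: u8, bytes: &[u8]) -> u64 {
        self.wal.sequenced_write(record_type, bytes)
    }

    fn replay_from(&self, sequence_number: u64) -> Box<dyn Iterator<Item = (u64, Vec<u8>)> + '_> {
        self.wal.iter_from(sequence_number)
    }
}
```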

UX

We create a more consistent and comprehensive error structure for the cases where any of the storage, WAL, or checkpoint directories are not present on bootup.

The presence or absence of the storage directory is irrelevant to bootup/recovery (the same path is taken either way). Its presence simply optimises the recovery process, since we have to copy fewer files from the checkpoint.

  • If the WAL is present on bootup but no checkpoint is provided, we replay the WAL from scratch.
  • If the WAL is present on bootup and a checkpoint is provided, we replace the storage with the checkpoint and replay the WAL from the checkpoint onwards.
  • If the db-name database directory is present, but no WAL is present, this is an error state.
  • If the WAL directory is present but data required from the WAL is missing for any reason (for example, deleted or cleaned up), this is an error. This can happen when replaying the WAL from the start because no checkpoint exists, or when a checkpoint is provided but the required replay point is no longer available in the WAL.
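A sketch of this decision logic; `RecoveryError` and the replay helpers are hypothetical names, and the real recovery code of course inspects directories rather than taking booleans:

```rust
// Hypothetical error type for the bootup states described above.
enum RecoveryError {
    DatabaseDirectoryWithoutWAL,
    RequiredWALDataMissing,
}

// Sketch of the bootup decision: what to do given which of the WAL and
// checkpoint directories are present.
fn recover(wal_present: bool, checkpoint_present: bool) -> Result<(), RecoveryError> {
    match (wal_present, checkpoint_present) {
        // A database directory without a WAL is an error state.
        (false, _) => Err(RecoveryError::DatabaseDirectoryWithoutWAL),
        // WAL but no checkpoint: replay the WAL from the very beginning.
        (true, false) => replay_wal_from_start(),
        // WAL and checkpoint: restore storage from the checkpoint, then
        // replay only the WAL records written since that checkpoint.
        (true, true) => {
            restore_storage_from_checkpoint()?;
            replay_wal_since_checkpoint()
        }
    }
}

// Each of these returns RecoveryError::RequiredWALDataMissing if records
// they need were deleted or cleaned up; bodies are stubbed for the sketch.
fn replay_wal_from_start() -> Result<(), RecoveryError> { Ok(()) }
fn restore_storage_from_checkpoint() -> Result<(), RecoveryError> { Ok(()) }
fn replay_wal_since_checkpoint() -> Result<(), RecoveryError> { Ok(()) }
```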


@flyingsilverfin changed the title from Statistics to Database thing statistics on May 15, 2024
@flyingsilverfin merged commit 46aa227 into vaticle:3.0 on May 15, 2024
0 of 2 checks passed
@flyingsilverfin deleted the statistics branch on May 15, 2024 at 15:47
@flyingsilverfin added this to the 3.0.0 milestone on May 30, 2024