
Database thing statistics #7064

Merged: 13 commits into vaticle:3.0 on May 15, 2024
Conversation

@flyingsilverfin (Member) commented May 10, 2024

Usage and product changes

We implement the architecture and most of the machinery required for tracking data statistics. The statistics will primarily be used for query planning. We achieved several design goals:

  1. Not scanning the entire storage to update statistics
  2. Allowing access to old versions of the statistics, which supports time travel/MVCC reads even at very old versions without too much performance degradation
  3. Not writing statistics to the RocksDB storage layer, since updating the statistics keys on every transaction would degrade performance, and statistics can/should be a primarily in-memory structure

However, we accept the trade-off that the statistics are not always up-to-date. The update frequency is a parameter we can optimise.

In the end, there is a single database-wide statistics struct, which is immutable and updated periodically. We update it by scanning the data WAL records written since the last update, summing their count deltas, and then atomically replacing the Statistics struct held by the database. The statistics are also checkpointed into the WAL, which lets us time-travel to older snapshots and find a relatively accurate statistics entry from near that version.
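A minimal sketch of that update loop in Rust, assuming hypothetical names and shapes (`Statistics`, `CommitRecord`, `update_statistics`, and their fields are illustrative, not the actual API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type TypeID = u64;

// Hypothetical, simplified shape of the database-wide statistics; the real
// struct also tracks role playings, relation index entries, and more.
#[derive(Clone, Default)]
struct Statistics {
    sequence_number: u64,               // WAL sequence number this snapshot reflects
    thing_counts: HashMap<TypeID, u64>, // instance counts per thing type
}

// A data commit record read back from the WAL, carrying count deltas.
struct CommitRecord {
    sequence_number: u64,
    count_deltas: HashMap<TypeID, i64>,
}

struct Database {
    // Readers clone the Arc cheaply; the updater swaps in a fresh snapshot.
    statistics: RwLock<Arc<Statistics>>,
}

impl Database {
    fn statistics(&self) -> Arc<Statistics> {
        self.statistics.read().unwrap().clone()
    }

    // Periodic update: scan the WAL records committed since the last
    // statistics snapshot, sum their count deltas, and atomically replace
    // the immutable Statistics struct held by the database.
    fn update_statistics(&self, wal_records: impl Iterator<Item = CommitRecord>) {
        let current = self.statistics();
        let mut next = (*current).clone();
        for record in wal_records.filter(|r| r.sequence_number > current.sequence_number) {
            for (&type_id, &delta) in &record.count_deltas {
                let count = next.thing_counts.entry(type_id).or_insert(0);
                *count = count.saturating_add_signed(delta);
            }
            next.sequence_number = record.sequence_number;
        }
        *self.statistics.write().unwrap() = Arc::new(next);
    }
}
```

Keeping the snapshot immutable behind an `Arc` means readers never block on the periodic update: they simply keep using the previous snapshot until the swap happens.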

This means we are solidifying the requirement that WAL cleaning and MVCC compaction are tied to the same time-scale: both are required for going back to previous data versions correctly.

Future work
We could find that reading from the WAL to update statistics becomes a bottleneck. We can solve several problems at once by extracting the commit data "cache" from the IsolationManager into the DurabilityClient, which can then be shared across isolation and statistics operations.

Implementation

Architecture

  • Promote WAL and Checkpoint management into Database

    • we move Checkpoint and associated commit_replay methods into a new module: //storage/recovery
    • we also add an (ultimately unused) system to 'extend' a checkpoint with additional data
  • Split database creation and loading into two separate entry points, and rearrange the corresponding methods all the way down into Storage, WAL, and Checkpointing. We also update corresponding tests.

  • Given a database directory, each module creates its own subdirectory:

    • MVCCStorage creates db-name/storage
    • WAL creates db-name/wal
    • Checkpoint creates db-name/checkpoint
  • Introduce //concept/thing/Statistics, which stores thing statistics: counts of instances of each type, of role playings, of relation index entries, and so on.

  • We checkpoint statistics into the WAL using a new record type, and load the most recent one on bootup. This also helps solve the MVCC time-travel problem, where going back in time could otherwise lead to mismatched statistics being used (or even statistics that are not relevant for a different schema!). We probably allow reusing the existing Statistics if the "old" sequence number being opened is no more than N (~100) versions behind the statistics version (see the sketch below).

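A sketch of that reuse rule, reusing the `Statistics` shape from the earlier sketch; `STATISTICS_REUSE_WINDOW` and `statistics_for` are hypothetical names:

```rust
// Illustrative threshold from the description above: reuse the live
// Statistics if the snapshot being opened is at most ~100 versions behind it.
const STATISTICS_REUSE_WINDOW: u64 = 100;

enum StatisticsSource {
    Live,
    WalCheckpointAtOrBefore(u64),
}

// Hypothetical lookup: given the sequence number a transaction opens at,
// decide between the live Statistics and an older WAL-checkpointed entry.
fn statistics_for(open_sequence_number: u64, live: &Statistics) -> StatisticsSource {
    if live.sequence_number.saturating_sub(open_sequence_number) <= STATISTICS_REUSE_WINDOW {
        StatisticsSource::Live
    } else {
        // Scan the WAL backwards for the statistics checkpoint record
        // nearest to (at or before) the requested version.
        StatisticsSource::WalCheckpointAtOrBefore(open_sequence_number)
    }
}
```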

  • Statistics catch-up/synchronisation is implemented by reading data commit records from the WAL.

    • For this we re-create write snapshots from commit data read from disk. However, this constructor intentionally returns a narrower API, which means we cannot write to or commit a re-created write snapshot.
    • We add CommitType to the CommitRecord generated by CommittableSnapshots. This allows deserialising and recreating the correct type of snapshot (data or schema) from a WAL entry.
  • Refactor out the //storage/durability module into //durability package, which contains a simplified Service trait

    • We then create the DurabilityClient trait, which is now used throughout the code base wherever DurabilityService was used before
    • The intent is to allow extracting Durability onto remote machine(s) using a Calvin-style partitioned WAL, if we ever want to. We would use the client to communicate with a set of durability servers and to manage collecting ordering information, etc.
    • For now, we only have a WALClient, which wraps a WAL but conforms to the DurabilityClient trait
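A rough sketch of how these pieces could fit together; the trait methods, signatures, and `CommitType` variants below are illustrative assumptions, not the actual APIs:

```rust
// The commit type we record in each CommitRecord, so that a WAL entry can
// be deserialised back into the right kind of snapshot.
enum CommitType {
    Data,
    Schema,
}

// Simplified durability service trait (the real one lives in //durability).
trait DurabilityService {
    fn sequenced_write(&self, record_type: u8, bytes: &[u8]) -> u64; // returns the sequence number
    fn iter_from(&self, sequence_number: u64) -> Box<dyn Iterator<Item = (u64, Vec<u8>)> + '_>;
}

// The client abstraction used throughout the code base; today it only wraps
// a local WAL, but it leaves room for a set of remote durability servers.
trait DurabilityClient {
    fn write(&self, record_type: u8, bytes: &[u8]) -> u64;
    fn replay_from(&self, sequence_number: u64) -> Box<dyn Iterator<Item = (u64, Vec<u8>)> + '_>;
}

// The only implementation for now: a thin wrapper over the local WAL.
struct WALClient<S: DurabilityService> {
    wal: S,
}

impl<S: DurabilityService> DurabilityClient for WALClient<S> {
    fn write(&self, record_type: u8, bytes: &[u8]) -> u64 {
        self.wal.sequenced_write(record_type, bytes)
    }

    fn replay_from(&self, sequence_number: u64) -> Box<dyn Iterator<Item = (u64, Vec<u8>)> + '_> {
        self.wal.iter_from(sequence_number)
    }
}
```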

UX

We create a more consistent and comprehensive error structure for the cases where any of the storage, WAL, or checkpoint directories are not present on bootup.

The presence or absence of the storage directory is irrelevant to bootup/recovery (the same path is taken either way). Its presence simply optimises the recovery process, since we have to copy fewer files from the checkpoint.

  • If the WAL is present on bootup but no checkpoint is provided, we replay the WAL from scratch.
  • If the WAL is present on bootup and a checkpoint is provided, we replace the storage with the checkpoint and replay the WAL from the checkpoint onwards.
  • If the db-name database directory is present, but no WAL is present, this is an error state.
  • If the WAL directory is present but data required from the WAL is missing for any reason (for example, deleted or cleaned up), this is an error. This can happen when replaying the WAL from the start because no checkpoint exists, or when a checkpoint is provided but the required replay point is no longer available in the WAL.
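A sketch of this decision logic; `RecoveryError` and the replay helpers are hypothetical names, and the real recovery code of course inspects directories rather than taking booleans:

```rust
// Hypothetical error type for the bootup states described above.
enum RecoveryError {
    DatabaseDirectoryWithoutWAL,
    RequiredWALDataMissing,
}

// Sketch of the bootup decision: what to do given which of the WAL and
// checkpoint directories are present.
fn recover(wal_present: bool, checkpoint_present: bool) -> Result<(), RecoveryError> {
    match (wal_present, checkpoint_present) {
        // A database directory without a WAL is an error state.
        (false, _) => Err(RecoveryError::DatabaseDirectoryWithoutWAL),
        // WAL but no checkpoint: replay the WAL from the very beginning.
        (true, false) => replay_wal_from_start(),
        // WAL and checkpoint: restore storage from the checkpoint, then
        // replay only the WAL records written since that checkpoint.
        (true, true) => {
            restore_storage_from_checkpoint()?;
            replay_wal_since_checkpoint()
        }
    }
}

// Each of these returns RecoveryError::RequiredWALDataMissing if records
// they need were deleted or cleaned up; bodies are stubbed for the sketch.
fn replay_wal_from_start() -> Result<(), RecoveryError> { Ok(()) }
fn restore_storage_from_checkpoint() -> Result<(), RecoveryError> { Ok(()) }
fn replay_wal_since_checkpoint() -> Result<(), RecoveryError> { Ok(()) }
```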


@flyingsilverfin changed the title from Statistics to Database thing statistics on May 15, 2024
@flyingsilverfin merged commit 46aa227 into vaticle:3.0 on May 15, 2024
0 of 2 checks passed
@flyingsilverfin deleted the statistics branch on May 15, 2024 at 15:47
@flyingsilverfin added this to the 3.0.0 milestone on May 30, 2024