Epic: Stable Row Ids #2307

Open
1 of 16 tasks
wjones127 opened this issue May 6, 2024 · 0 comments

Motivation

When we compact data files, the row ids change. This forces us to update the index files on every compaction. Updating the index files invalidates them in the cache, which degrades query performance. If row ids stayed stable when rows were moved, this would not happen.

Scope

This epic makes row ids stable when rows are moved, for example during compaction. It does not make them stable across updates: a row that is updated will be deleted and re-appended under a new id.

A future epic will cover "primary keys", at which point row ids will be stable across updates as well as moves. That work is out of scope for now to keep the workload of this epic manageable.

Design

In very simple terms:

  1. Add row ids as auto-incrementing u64 ids. The manifest will track max_row_id and assign new ids in a process similar to how fragment ids are assigned during the commit loop.
  2. Each fragment's metadata will contain a small row id index that maps from row id to row address. (Row address is what we currently call _rowid.) In most cases, such as after an append, this will be a simple range of values: (max_row_id + 1)..(physical_rows + max_row_id + 1).
  3. Deletion files will be superseded by tombstones contained in the row id index. This cuts down on the total number of files to manage.
  4. A new feature flag will be introduced to make sure older readers don't try to interpret these new row ids.
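
To make points 1 to 3 concrete, here is a minimal sketch, not the actual implementation. Every name in it (`Manifest`, `RowIdSequence`, `RowIdIndex`, `assign_row_ids`, `compact`) is hypothetical. It also assumes a row address packs the fragment id into the high 32 bits and the row offset into the low 32 bits. The point it illustrates is that compaction changes a row's address but not its id.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Assumed row address encoding: fragment id in the high 32 bits,
/// row offset within the fragment in the low 32 bits.
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Hypothetical stand-in for the manifest's row id counter.
#[derive(Debug, Default)]
struct Manifest {
    /// Highest row id assigned so far; `None` before any row exists.
    max_row_id: Option<u64>,
}

impl Manifest {
    /// Reserve ids for a newly committed fragment:
    /// (max_row_id + 1)..(physical_rows + max_row_id + 1).
    fn assign_row_ids(&mut self, physical_rows: u64) -> Range<u64> {
        let start = self.max_row_id.map_or(0, |m| m + 1);
        let end = start + physical_rows;
        if physical_rows > 0 {
            self.max_row_id = Some(end - 1);
        }
        start..end
    }
}

/// Row ids in physical order within one fragment.
enum RowIdSequence {
    /// Common case after an append: a contiguous range.
    Range(Range<u64>),
    /// General case, e.g. after compaction: explicit ids.
    List(Vec<u64>),
}

impl RowIdSequence {
    /// Physical offset of `row_id` in the fragment, if present.
    fn position(&self, row_id: u64) -> Option<u32> {
        match self {
            RowIdSequence::Range(r) => r.contains(&row_id).then(|| (row_id - r.start) as u32),
            RowIdSequence::List(ids) => ids.iter().position(|&id| id == row_id).map(|p| p as u32),
        }
    }

    fn to_vec(&self) -> Vec<u64> {
        match self {
            RowIdSequence::Range(r) => r.clone().collect(),
            RowIdSequence::List(ids) => ids.clone(),
        }
    }
}

/// Per-fragment row id index: maps row id -> row address and carries
/// tombstones instead of a separate deletion file.
struct RowIdIndex {
    fragment_id: u32,
    sequence: RowIdSequence,
    tombstones: BTreeSet<u64>,
}

impl RowIdIndex {
    fn from_append(fragment_id: u32, ids: Range<u64>) -> Self {
        Self { fragment_id, sequence: RowIdSequence::Range(ids), tombstones: BTreeSet::new() }
    }

    /// Row address for a live row id; `None` if absent or tombstoned.
    fn lookup(&self, row_id: u64) -> Option<u64> {
        if self.tombstones.contains(&row_id) {
            return None;
        }
        self.sequence.position(row_id).map(|off| row_address(self.fragment_id, off))
    }

    /// Tombstone a row; returns true if the row existed and was live.
    fn delete(&mut self, row_id: u64) -> bool {
        self.sequence.position(row_id).is_some() && self.tombstones.insert(row_id)
    }
}

/// Rewrite several fragments into one, dropping tombstoned rows.
/// Row ids are carried over unchanged; only the addresses move.
fn compact(old: &[RowIdIndex], new_fragment_id: u32) -> RowIdIndex {
    let mut ids = Vec::new();
    for idx in old {
        ids.extend(idx.sequence.to_vec().into_iter().filter(|id| !idx.tombstones.contains(id)));
    }
    RowIdIndex { fragment_id: new_fragment_id, sequence: RowIdSequence::List(ids), tombstones: BTreeSet::new() }
}
```

In this sketch, appending two fragments of 3 and 2 rows assigns the ranges 0..3 and 3..5. If row id 1 is then deleted and both fragments are compacted into a new fragment, row id 4 still resolves, now to offset 3 of the new fragment. An index keyed on row ids would therefore not need to be rewritten.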

Plan

wjones127 added the epic label (a collection of issues with a certain theme) May 6, 2024
wjones127 self-assigned this May 6, 2024