feat: migrate IVF_PQ indices when vector column is casted #2102

Open
wants to merge 5 commits into
base: main


wjones127
Contributor

wjones127 commented Mar 21, 2024

When a user calls alter_columns() to change the data type of a vector column, we can attempt to migrate the vector index to the new data type as part of the same transaction. This will allow users to easily migrate from f32-based vectors to f16 ones.

Closes #1978
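To make the idea in the description concrete, here is a minimal sketch of how a cast and its index handling could be planned as one unit. Every name in it (`FloatType`, `IndexKind`, `IndexAction`, `plan_index_action`, `CastTransaction`, `plan_cast`) is hypothetical and is not Lance's API. It also assumes that an index which cannot be migrated is dropped, which the description does not state.

```rust
/// Hypothetical floating-point element types for a vector column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FloatType {
    F16,
    F32,
    F64,
}

/// Hypothetical index kinds. Only IVF_PQ is named in this PR's title.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexKind {
    IvfPq,
    Other,
}

/// What happens to an index when its column is cast.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexAction {
    /// The data type did not change, so the index is still valid.
    Keep,
    /// Rewrite the index for the new type in the same transaction.
    Migrate { to: FloatType },
    /// Assumption: an index that cannot be migrated is dropped.
    Drop,
}

/// Decide the action for one index on the column being cast.
pub fn plan_index_action(kind: IndexKind, from: FloatType, to: FloatType) -> IndexAction {
    if from == to {
        return IndexAction::Keep;
    }
    match kind {
        IndexKind::IvfPq => IndexAction::Migrate { to },
        IndexKind::Other => IndexAction::Drop,
    }
}

/// Everything one cast commits together: the new column type and the
/// per-index actions, so readers never see one without the other.
#[derive(Debug, PartialEq, Eq)]
pub struct CastTransaction {
    pub column: String,
    pub new_type: FloatType,
    pub index_actions: Vec<(String, IndexAction)>,
}

/// Build the single transaction for casting `column` from `from` to `to`.
pub fn plan_cast(
    column: &str,
    from: FloatType,
    to: FloatType,
    indices: &[(&str, IndexKind)],
) -> CastTransaction {
    let index_actions = indices
        .iter()
        .map(|(name, kind)| (name.to_string(), plan_index_action(*kind, from, to)))
        .collect();
    CastTransaction {
        column: column.to_string(),
        new_type: to,
        index_actions,
    }
}
```

In this sketch, calling `plan_cast("vector", FloatType::F32, FloatType::F16, ...)` with one IVF_PQ index produces a single `CastTransaction` that carries both the new column type and a `Migrate` action for that index, which is the "same transaction" property the description is after.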


ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.


codecov-commenter commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 72.72727%, with 72 lines in your changes missing coverage. Please review.

Project coverage is 80.97%. Comparing base (944aca8) to head (500b266).
Report is 8 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| rust/lance/src/index/vector/pq.rs | 51.28% | 18 Missing and 1 partial ⚠️ |
| rust/lance/src/index/vector/ivf.rs | 83.33% | 5 Missing and 11 partials ⚠️ |
| rust/lance/src/index.rs | 66.66% | 6 Missing and 5 partials ⚠️ |
| rust/lance/src/dataset/transaction.rs | 76.66% | 7 Missing ⚠️ |
| rust/lance/src/index/vector/fixture_test.rs | 0.00% | 5 Missing ⚠️ |
| rust/lance/src/index/vector.rs | 20.00% | 4 Missing ⚠️ |
| rust/lance-index/src/scalar/btree.rs | 0.00% | 3 Missing ⚠️ |
| rust/lance-index/src/scalar/flat.rs | 0.00% | 3 Missing ⚠️ |
| rust/lance/src/dataset.rs | 94.87% | 1 Missing and 1 partial ⚠️ |
| rust/lance/src/index/vector/hnsw.rs | 66.66% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2102      +/-   ##
==========================================
- Coverage   81.07%   80.97%   -0.10%     
==========================================
  Files         160      160              
  Lines       47328    47533     +205     
  Branches    47328    47533     +205     
==========================================
+ Hits        38370    38490     +120     
- Misses       6768     6822      +54     
- Partials     2190     2221      +31     
| Flag | Coverage Δ |
|---|---|
| unittests | 80.97% <72.72%> (-0.10%) ⬇️ |

Flags with carried forward coverage won't be shown.


wjones127 changed the title from "wip: handle migrating indices" to "feat: migrate IVF_PQ indices when vector column is casted" on Mar 22, 2024
wjones127 marked this pull request as ready for review on March 27, 2024, 02:01
@@ -67,6 +67,8 @@ pub trait ProductQuantizer: Send + Sync + std::fmt::Debug {

    fn dimension(&self) -> usize;

    fn metric_type(&self) -> MetricType;
Contributor


Would it be better to use DistanceType here?
cc @westonpace

    .map(|(offset, length)| {
        index
            .sub_index
            .load(reader.clone(), *offset, *length as usize)
Contributor


I'm working on a new index format. In it, load now means loading the whole index, while load_partition means loading the index as a sub-index.
The latter requires a partition_id so that it can load the sub-index using the partition metadata of the given partition.
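To illustrate the distinction drawn in this comment, here is a rough sketch of a `load` that reads a whole index next to a `load_partition` that uses per-partition metadata to read one sub-index. `ToyIndex`, `PartitionMeta`, and both signatures are hypothetical stand-ins; the new index format mentioned above is not part of this PR, and its real signatures may differ.

```rust
/// Hypothetical per-partition metadata: where a partition's bytes live.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionMeta {
    pub offset: usize,
    pub length: usize,
}

/// A toy stand-in for an index, holding only raw bytes.
#[derive(Debug, PartialEq, Eq)]
pub struct ToyIndex {
    pub codes: Vec<u8>,
}

impl ToyIndex {
    /// `load`: read the whole file as one index.
    pub fn load(file: &[u8]) -> Self {
        ToyIndex { codes: file.to_vec() }
    }

    /// `load_partition`: read one partition as a sub-index, using the
    /// metadata for `partition_id` to find its byte range.
    /// Returns `None` for an unknown id or an out-of-bounds range.
    pub fn load_partition(
        file: &[u8],
        partitions: &[PartitionMeta],
        partition_id: usize,
    ) -> Option<Self> {
        let meta = partitions.get(partition_id)?;
        let end = meta.offset.checked_add(meta.length)?;
        let bytes = file.get(meta.offset..end)?;
        Some(ToyIndex { codes: bytes.to_vec() })
    }
}
```

Under this shape, the call in the hunk above, which passes an explicit offset and length, would instead pass a partition id and leave the range lookup to the index.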


Successfully merging this pull request may close these issues.

Migrate index when casting a vector column
4 participants