Top-level index concept with stable ID #2317

Open
rpgreen opened this issue May 9, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@rpgreen

rpgreen commented May 9, 2024

Introduce UUIDs for indexes that do not change during indexing and remain stable for a given (index_type, index_name, column_name).
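
A minimal sketch of the requested invariant, in Python. `IndexIdRegistry` and its `id_for` method are hypothetical names, not part of Lance: the registry mints a UUID the first time it sees a given (index_type, index_name, column_name) key and returns that same UUID on every later lookup, for example when a delta index is added.

```python
import uuid


class IndexIdRegistry:
    """Hypothetical registry mapping (index_type, index_name, column_name) to a stable UUID."""

    def __init__(self) -> None:
        self._ids: dict[tuple[str, str, str], uuid.UUID] = {}

    def id_for(self, index_type: str, index_name: str, column_name: str) -> uuid.UUID:
        key = (index_type, index_name, column_name)
        # Mint a UUID only on first sight of this key; later calls
        # (e.g. during incremental indexing) return the same value.
        if key not in self._ids:
            self._ids[key] = uuid.uuid4()
        return self._ids[key]


registry = IndexIdRegistry()
initial = registry.id_for("IVF_PQ", "vec_idx", "vector")
after_delta = registry.id_for("IVF_PQ", "vec_idx", "vector")
other = registry.id_for("BTREE", "id_idx", "id")
```

Keying on the full triple means an index of a different type, or on a different column, gets its own UUID even if it shares a name.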

@wjones127 added the enhancement (New feature or request) label on May 9, 2024
@wjones127
Contributor

Background

  • Right now, "indices" in the Lance manifest refer to segments, or pieces, of an index. For example, creating a vector index creates one index segment. If you then add data and index it incrementally with a delta index, the manifest holds two "indices" that together make up the full vector index. That full vector index has no entity of its own in Lance; it is just the aggregation of all index segments that share the same name within the same table.
  • A related problem: because an Index is attached to a single file, there is no such thing as an empty index, so it is impossible to add an index to a newly created empty table. This strikes many users as odd, since they expect to be able to create a table, add an index, and then start inserting data.
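
To make the current model concrete, here is a simplified Python sketch of how a full index exists only as a grouping of same-named segments. `ManifestIndexEntry`, `fragment_ids`, and `logical_indexes` are illustrative names, not Lance's actual manifest types. The sketch also shows why an empty table cannot have an index: with no segments, there is nothing to group.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestIndexEntry:
    """Simplified stand-in for one "index" entry in today's manifest (really a segment)."""

    segment_id: uuid.UUID
    name: str
    fragment_ids: frozenset[int]


def logical_indexes(entries: list[ManifestIndexEntry]) -> dict[str, list[ManifestIndexEntry]]:
    """Group entries by name: each group is the implicit "full" index."""
    groups: dict[str, list[ManifestIndexEntry]] = defaultdict(list)
    for entry in entries:
        groups[entry.name].append(entry)
    return dict(groups)


base = ManifestIndexEntry(uuid.uuid4(), "vec_idx", frozenset({0, 1}))
delta = ManifestIndexEntry(uuid.uuid4(), "vec_idx", frozenset({2}))
full = logical_indexes([base, delta])
empty_table = logical_indexes([])
```

The two segments carry different UUIDs, so nothing in this model identifies the full `vec_idx` index itself.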

Indexes

We probably need a terminology change:

  • Index -> IndexSegment
  • Index becomes a top-level index configuration
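
One possible shape for the proposed split, sketched in Python under the assumption that an `Index` owns its configuration plus a list of `IndexSegment`s. All field and method names here (`index_id`, `params`, `add_segment`, `indexed_fragments`) are hypothetical. Because the `Index` entry exists independently of its segments, it can be created on an empty table with zero segments.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class IndexSegment:
    """One physical piece of an index, covering a set of data fragments."""

    segment_id: uuid.UUID
    fragment_ids: frozenset[int]


@dataclass
class Index:
    """Top-level index configuration; hypothetical fields, not Lance's schema."""

    name: str
    index_type: str
    column: str
    params: dict[str, Any] = field(default_factory=dict)
    index_id: uuid.UUID = field(default_factory=uuid.uuid4)
    segments: list[IndexSegment] = field(default_factory=list)

    def add_segment(self, fragment_ids) -> IndexSegment:
        # Each build (initial or delta) appends a segment; index_id never changes.
        segment = IndexSegment(uuid.uuid4(), frozenset(fragment_ids))
        self.segments.append(segment)
        return segment

    def indexed_fragments(self) -> set[int]:
        # Union of the fragments covered by every segment.
        covered: set[int] = set()
        for segment in self.segments:
            covered |= segment.fragment_ids
        return covered


idx = Index(name="id_idx", index_type="BTREE", column="id")  # empty table: no segments yet
original_id = idx.index_id
idx.add_segment([0])
idx.add_segment([1, 2])
```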

Users should be able to specify an index configuration up front, which will be saved into the Index entry. The Index will have a UUID in addition to a name, so that it can be distinguished from earlier indexes with the same name.
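
A minimal sketch, assuming a hypothetical `IndexCatalog` that stores each up-front configuration keyed by index name (this is not Lance's API, and the config keys are illustrative). Re-creating an index under an existing name stores the new configuration with a freshly generated UUID, so a reader holding the old UUID can tell that the index it knew has been replaced.

```python
import uuid
from typing import Any


class IndexCatalog:
    """Hypothetical catalog: index name -> (UUID, saved configuration)."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[uuid.UUID, dict[str, Any]]] = {}

    def create_index(self, name: str, config: dict[str, Any]) -> uuid.UUID:
        # Always mint a new UUID: a same-named replacement is a different index.
        index_id = uuid.uuid4()
        self._entries[name] = (index_id, dict(config))
        return index_id

    def lookup(self, name: str) -> tuple[uuid.UUID, dict[str, Any]]:
        return self._entries[name]

    def is_current(self, name: str, index_id: uuid.UUID) -> bool:
        # True only if the name still refers to the index with this UUID.
        entry = self._entries.get(name)
        return entry is not None and entry[0] == index_id


catalog = IndexCatalog()
old_id = catalog.create_index("vec_idx", {"index_type": "IVF_PQ", "column": "vector"})
new_id = catalog.create_index(
    "vec_idx", {"index_type": "IVF_PQ", "column": "vector", "num_partitions": 64}
)
```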

There are some rules still to figure out about what is allowed for indexing. BTree indices don't really require any up-front training, so they would be a good candidate to demonstrate creating an index without any data and then updating it incrementally. However, anything with IVF requires some training or algorithm to create and split clusters, which needs special care and design.
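
The distinction above could be captured along these lines. This is a sketch only: `plan_build`, `BuildPlan`, the `min_training_rows` threshold of 256, and the option of deferring IVF training until enough rows exist are all assumptions for illustration, not a decided design.

```python
from enum import Enum


class BuildPlan(Enum):
    CREATE_EMPTY = "create an empty index, then update it incrementally"
    TRAIN_NOW = "train on the existing rows, then build"
    DEFER_TRAINING = "save the configuration only; train once enough rows exist"


def plan_build(index_type: str, num_rows: int, min_training_rows: int = 256) -> BuildPlan:
    """Hypothetical rule for what an index build can do on a table of num_rows rows."""
    if index_type == "BTREE":
        # No up-front training: an empty index is a valid starting point.
        return BuildPlan.CREATE_EMPTY
    if index_type.startswith("IVF"):
        # IVF needs trained cluster centroids before any row can be assigned
        # to a partition, so an empty (or tiny) table cannot be indexed yet.
        if num_rows >= min_training_rows:
            return BuildPlan.TRAIN_NOW
        return BuildPlan.DEFER_TRAINING
    raise ValueError(f"no build rule for index type {index_type!r}")
```

Deferring training keeps the "create table, add index, insert data" workflow possible for IVF indexes, at the cost of queries falling back to unindexed scans until training happens.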

@wjones127 changed the title from "Stable index IDs" to "Top-level index concept with stable ID" on May 9, 2024