feat: row id index structures (experimental) #2303
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2303      +/-   ##
==========================================
- Coverage   80.75%   80.01%    -0.75%
==========================================
  Files         192      197        +5
  Lines       56303    54302     -2001
  Branches    56303    54302     -2001
==========================================
- Hits        45469    43448     -2021
- Misses       8201     8342      +141
+ Partials     2633     2512      -121
```
I don't want to block further work on operations, so I'm leaving the performance testing where it is now. We should have someone make improvements in parallel with other work to update the query and write code paths.

**Index size**

Comparison of the size (in bytes) of the index for 1 million original row ids that are sorted, with some random percentage of rows deleted. The zero-deletion sorted case is using a

**Access speed**

Right now it's roughly in the same ballpark as a hashmap, but somewhat slower. Still, I think 100 ns is pretty decent. The only case where we are faster is when there are no deleted rows.
```rust
// TODO: what would it take to store this in a LanceV2 file?
// Or would flatbuffers be better for this?
```
Leaving this TODO for a future PR. Would appreciate input on how we want to support this. For now, protobuf seems like the easiest.
This looks like an encoding for an array of u64 values. The problem is probably less with lance v2 and more with Arrow. Our file reader returns arrow arrays at the moment. I can't think of any good way to stuff this structure into an arrow Array. Maybe this could be done with a union array, but I'm generally scared of those.
That being said, you can always put this in a file metadata buffer too, either as protobuf or as an encoded array. One advantage of using an encoded array, once the bit packing PR is done, is that we can pack into bits-per-value other than 16/32/64 (e.g. 23 or 12), although this would incur an encode/decode cost which might not make sense if the array is short.
We can also add to_bytes / from_bytes methods for the primitive encodings. This would let you store it anywhere you can place a buffer.
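A minimal sketch of what such a to_bytes / from_bytes round-trip could look like. This is hypothetical, not the lance-encoding API: the trait name, the U32Array struct, and the little-endian layout are all illustrative assumptions.

```rust
// Hypothetical trait sketching the to_bytes / from_bytes idea: anything
// that can round-trip through a flat buffer can be stored in a file
// metadata buffer.
trait BufferSerializable: Sized {
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(bytes: &[u8]) -> Self;
}

// Illustrative base-plus-deltas encoding (names are assumptions).
struct U32Array {
    base: u64,
    deltas: Vec<u32>,
}

impl BufferSerializable for U32Array {
    fn to_bytes(&self) -> Vec<u8> {
        // Fixed-width little-endian layout: 8-byte base, then 4 bytes per delta.
        let mut out = self.base.to_le_bytes().to_vec();
        for delta in &self.deltas {
            out.extend_from_slice(&delta.to_le_bytes());
        }
        out
    }

    fn from_bytes(bytes: &[u8]) -> Self {
        let base = u64::from_le_bytes(bytes[..8].try_into().unwrap());
        let deltas = bytes[8..]
            .chunks_exact(4)
            .map(|chunk| u32::from_le_bytes(chunk.try_into().unwrap()))
            .collect();
        Self { base, deltas }
    }
}

fn main() {
    let original = U32Array { base: 1_000_000, deltas: vec![1, 5, 9] };
    let restored = U32Array::from_bytes(&original.to_bytes());
    assert_eq!(restored.base, original.base);
    assert_eq!(restored.deltas, original.deltas);
}
```

Because the layout is fixed-width, there is no varint encode/decode cost, at the price of not packing below 32 bits per delta.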
> less with lance v2 and more with Arrow
Yeah, that's what I was thinking too. I think it's likely very important we keep the in-memory and on-disk format aligned to minimize serialization.
This looks good. I have a few questions, but we can find issues as we start to use these structures too, so I don't think we need to find everything right now.
```proto
message U32Array {
  uint64 base = 1;
  /// The deltas are stored as 32-bit unsigned integers.
  /// (we use bytes instead of uint32 to avoid overhead of varint encoding)
```
Curious, did you actually notice this overhead?
I did not. I could try to quickly measure it.
```rust
for row_id in iter.by_ref() {
    first_10.push(row_id);
    if first_10.len() > 10 {
        break;
    }
}

while let Some(row_id) = iter.next_back() {
    last_10.push(row_id);
    if last_10.len() > 10 {
        break;
    }
}
```
If there are 15 row ids will there be overlap?
They pull off the same double-ended iterator, so I don't think there should be any duplicates.
Ah, I see. I have to wrap my head around "double-ended iterator"; I'm not used to using it. I was thinking you were just starting with a forward iterator and then creating a backward iterator. I didn't realize you are reusing the iterator.
Yeah, they are interesting. I learned while writing this that they are passed through a surprising number of combinators. For example, if x is a DoubleEndedIterator, then x.enumerate() and x.enumerate().cycle() are too.
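The shared-iterator point can be sketched in a few lines. This is an illustrative standalone example, not the PR's code: both loops pull from the same double-ended iterator, so with 15 row ids the front and back halves cannot overlap.

```rust
fn main() {
    // One iterator consumed from both ends: the question above was whether
    // 15 items would produce overlap. They cannot, because every call to
    // next() or next_back() removes an item from the same shared sequence.
    let mut iter = 0u64..15;

    let mut first_10 = Vec::new();
    let mut last_10 = Vec::new();

    // Take up to 10 from the front; by_ref() keeps ownership of `iter`.
    for row_id in iter.by_ref().take(10) {
        first_10.push(row_id);
    }

    // Take up to 10 from the back of whatever remains.
    while let Some(row_id) = iter.next_back() {
        last_10.push(row_id);
        if last_10.len() >= 10 {
            break;
        }
    }

    assert_eq!(first_10, (0..10).collect::<Vec<u64>>());
    // Only 5 items remained, and they come out back-to-front.
    assert_eq!(last_10, vec![14, 13, 12, 11, 10]);
}
```

The front loop drains ids 0 through 9, so the back loop only ever sees the remaining five; no id appears in both vectors.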
rust/lance-table/src/rowids.rs
```rust
pub fn len(&self) -> u64 {
    self.iter().count() as u64
}
```
The fact that this is O(N) (is it?) is slightly surprising. I would expect this value to be cached (and maybe computed at construction) or worst case at least O(# segments)
I'll change this to sum over the segments.
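A sketch of the O(#segments) version discussed above. The Segment and RowIdSequence types here are simplified stand-ins, not the real lance-table structures: only the per-segment length matters for the illustration.

```rust
// Stand-in for a segment of row ids; only its length is relevant here.
struct Segment {
    count: u64,
}

impl Segment {
    fn len(&self) -> u64 {
        self.count
    }
}

// Stand-in for the sequence type whose len() was O(number of row ids).
struct RowIdSequence(Vec<Segment>);

impl RowIdSequence {
    /// O(number of segments) instead of O(number of row ids):
    /// sum the cached per-segment lengths rather than counting every id.
    pub fn len(&self) -> u64 {
        self.0.iter().map(|segment| segment.len()).sum()
    }
}

fn main() {
    let seq = RowIdSequence(vec![Segment { count: 100 }, Segment { count: 42 }]);
    assert_eq!(seq.len(), 142);
}
```

Caching the total at construction time would make this O(1), at the cost of keeping the cached value in sync with mutations.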
```rust
// If the last element of this sequence and the first element of the next
// sequence are ranges, we might be able to combine them into a single
// range.
```
Technically there is no guarantee that other follows self, and so the reverse could be true. The last element of other could be the first element of self, and we could merge those too (I guess any range in other could merge with any range in self). I'm guessing we just care about optimizing this case because it is quite common?
> The last element of other could be the first element of self and we could merge those too (I guess any range in other could merge with any range in self)
Not sure I follow. Remember order matters in these sequences: 0..10 + 10..20 == 0..20, but 10..20 + 0..10 != 0..20. But yeah, there are probably other things we can combine. I just chose this as one common one.
```rust
// Often, the row ids will already be provided in the order they appear.
// So the optimal way to search will be to cycle through rather than
// restarting the search from the beginning each time.
let mut segment_iter = self.0.iter().enumerate().cycle();
```
I'm a little confused, but might just be missing something. You call cycle here, which makes me think you plan to iterate through the segments more than once (e.g. as described by your comment). However, in your loop (row_ids.into_iter()...) it seems like you will call segment_iter.next() at most self.0.len() times, which means you aren't looping through multiple times.
Perhaps the thing you are missing is that we are re-using the same segment_iter in the row_ids.into_iter().for_each() loop. This means each new search for a row id will pick up right after the last one we found. The idea is that this is more efficient in the common case where the row ids we are searching for are in the same order they appear in the segment. Instead of restarting our search at the beginning of the segment for each row id, we can just keep going from where we left off. This makes the sorted case O(max(n, m)) instead of O(n * m) (n being the length of the segment and m being the number of row ids we are searching for).

The reason we limit each search to self.0.len() is so that if we are passed a non-existent row id we don't loop forever.
```rust
#[derive(PartialEq, Eq, Clone)]
pub struct Bitmap {
    pub data: Vec<u8>,
```
Could you use BooleanBuffer from arrow?
This is a good idea. There are some details with serialization to figure out, but I think it would work well and eliminate the need for this file. I'm going to leave this as a TODO for now.
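For reference, a minimal sketch of what a Vec<u8>-backed bitmap like the one under review typically does. This is illustrative, not the PR's implementation (the real struct may track length differently, and BooleanBuffer would eventually replace it); bits are stored LSB-first within each byte, matching Arrow's validity-bitmap convention.

```rust
// Simplified Vec<u8>-backed bitmap, LSB-first within each byte
// (the same bit order Arrow uses for validity bitmaps).
#[derive(PartialEq, Eq, Clone)]
pub struct Bitmap {
    pub data: Vec<u8>,
    pub len: usize,
}

impl Bitmap {
    pub fn new_empty(len: usize) -> Self {
        // One byte holds eight bits; round up.
        Self { data: vec![0u8; (len + 7) / 8], len }
    }

    pub fn set(&mut self, i: usize) {
        self.data[i / 8] |= 1 << (i % 8);
    }

    pub fn get(&self, i: usize) -> bool {
        self.data[i / 8] & (1 << (i % 8)) != 0
    }
}

fn main() {
    let mut bitmap = Bitmap::new_empty(10);
    bitmap.set(3);
    bitmap.set(9);
    assert!(bitmap.get(3));
    assert!(!bitmap.get(4));
    assert!(bitmap.get(9));
}
```

Because the bit order matches Arrow's, swapping the backing storage for BooleanBuffer later should not change the serialized bytes.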
These are experimental indices that map from stable row ids to row addresses. It's possible we will make some improvements to the serialization format or performance before stabilizing, but I'd like to defer that work so we can unblock work on the stable row ids.
These row id indices are optimized for storage size (in-memory and on-disk) and access speed.
Closes: #2308