[WIP] Compressed posting lists #4143

Draft: wants to merge 1 commit into base branch dev

xzfc (Contributor) commented Apr 30, 2024

This PR adds a draft implementation of the new sparse vector index format.

Comparison with the current implementation

Baseline: current dev branch, c173a9f.

Memory usage

| Metric | Old | New |
|---|---|---|
| Full data on-disk size | 23.0G | 15.4G |
| Index only (inverted_index.data) on-disk size | 13.6G | 5.9G |
| RAM consumption | 13.9G | 6.8G |
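
For reference, the relative savings implied by these numbers can be worked out as below. This is only a sketch over the figures reported above; the `measurements` dictionary and the `reduction_percent` helper are illustrative names, not part of the PR.

```python
# (old, new) sizes in the "G" units reported in the memory usage table.
measurements = {
    "Full data on-disk size": (23.0, 15.4),
    "Index only (inverted_index.data) on-disk size": (13.6, 5.9),
    "RAM consumption": (13.9, 6.8),
}


def reduction_percent(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, in percent."""
    return (old - new) / old * 100.0


# Relative saving for each reported metric.
reductions = {
    name: reduction_percent(old, new)
    for name, (old, new) in measurements.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% smaller")
```

With the values above this comes to roughly 33% less total on-disk data, about 57% less for the index file alone, and about 51% less RAM.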

Bench in sparse crate

New:

search/random_50k         time:   [1.0777 ms 1.0874 ms 1.0981 ms]
search/random_500k        time:   [8.9879 ms 9.1699 ms 9.3507 ms]
search/msmarco_1M         time:   [13.931 ms 14.254 ms 14.581 ms]
search/msmarco_full_0.25  time:   [32.163 ms 33.040 ms 33.934 ms]

Old:

search/random_50k         time:   [776.59 µs 781.35 µs 787.21 µs]
search/random_500k        time:   [7.2040 ms 7.3156 ms 7.4284 ms]
search/msmarco_1M         time:   [14.409 ms 14.760 ms 15.118 ms]
search/msmarco_full_0.25  time:   [33.352 ms 34.323 ms 35.312 ms]
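
Since the two runs mix units (µs and ms), a small script helps when comparing them. The sketch below parses the bench lines quoted above, takes the middle value of each bracket (the point estimate, with the outer values being the bounds of the confidence interval), and prints the new/old ratio. The names `estimates_ms`, `UNIT_TO_MS`, `NEW` and `OLD` are illustrative and not part of the PR.

```python
import re
import unicodedata

# Bench output copied from the description: [lower bound, estimate, upper bound].
NEW = """
search/random_50k         time:   [1.0777 ms 1.0874 ms 1.0981 ms]
search/random_500k        time:   [8.9879 ms 9.1699 ms 9.3507 ms]
search/msmarco_1M         time:   [13.931 ms 14.254 ms 14.581 ms]
search/msmarco_full_0.25  time:   [32.163 ms 33.040 ms 33.934 ms]
"""

OLD = """
search/random_50k         time:   [776.59 µs 781.35 µs 787.21 µs]
search/random_500k        time:   [7.2040 ms 7.3156 ms 7.4284 ms]
search/msmarco_1M         time:   [14.409 ms 14.760 ms 15.118 ms]
search/msmarco_full_0.25  time:   [33.352 ms 34.323 ms 35.312 ms]
"""

# Conversion factors to milliseconds. "\u03bcs" is what the micro sign
# becomes after NFKC normalization, so both spellings of "µs" are covered.
UNIT_TO_MS = {"ns": 1e-6, "\u03bcs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1e3}

LINE = re.compile(r"^(\S+)\s+time:\s+\[(.+)\]$")


def estimates_ms(report: str) -> dict[str, float]:
    """Map each benchmark name to its point estimate in milliseconds."""
    result = {}
    for line in report.strip().splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue
        name, body = match.groups()
        tokens = body.split()  # value unit value unit value unit
        value = float(tokens[2])  # the middle pair is the estimate
        unit = unicodedata.normalize("NFKC", tokens[3])
        result[name] = value * UNIT_TO_MS[unit]
    return result


new_ms = estimates_ms(NEW)
old_ms = estimates_ms(OLD)
ratios = {name: new_ms[name] / old_ms[name] for name in old_ms}

for name, ratio in ratios.items():
    print(f"{name}: new/old = {ratio:.2f}")
```

On these numbers the synthetic random benches are slower with the new format (ratios of roughly 1.39 and 1.25), while the two msmarco benches come out slightly faster (roughly 0.97 and 0.96). The msmarco intervals overlap between the old and new runs, so that small difference may be within noise.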

MSMARCO

Results of https://github.com/qdrant/sparse-vectors-benchmark.
Times are in ms.

Full dataset

| Latency (ms) | Old | New |
|---|---|---|
| min | 101.8 | 106.6 |
| 50p | 248.9 | 228.6 |
| 95p | 358.7 | 330.5 |
| 99p | 403.8 | 382.2 |
| 999p | 463.4 | 450.2 |
| max | 507.8 | 491.1 |

1M dataset

| Latency (ms) | Old | New |
|---|---|---|
| min | 17.59 | 17.71 |
| 50p | 36.24 | 31.63 |
| 95p | 50.44 | 43.46 |
| 99p | 58.26 | 50.56 |
| 999p | 70.89 | 59.28 |
| max | 90.39 | 66.70 |
