Although this guide is written specifically for HNSW indexes, it can be generalized to work for any index type.
Building an index without the CONCURRENTLY modifier will lock the table against writes. For general advice about indexes, check out this guide.
To speed up queries, it is ideal to create an HNSW index on your embedding column.
The general structure for creating an HNSW index follows this pattern:
CREATE INDEX <index name> ON <table name> USING hnsw (<vector column> <search type>);
Search can be one of three types:
| operator | description | search type |
| --- | --- | --- |
| `<->` | Euclidean distance | `vector_l2_ops` |
| `<#>` | negative inner product | `vector_ip_ops` |
| `<=>` | cosine distance | `vector_cosine_ops` |
Queries can only utilize the index if they use the operator matching the index's search type. If you are unsure which search type to prioritize, vector_cosine_ops is the most commonly used. You can check out our guide for more info. The folks at Crunchy Data also wrote an explainer that you may find useful.
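As an illustration, assuming a hypothetical table named documents with a vector column named embedding, a cosine-distance index could be created like this:

```sql
-- Example only: "documents", "embedding", and the index name are placeholders.
-- CONCURRENTLY avoids locking the table against writes during the build.
CREATE INDEX CONCURRENTLY documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops);
```

A query ordering by `embedding <=> '<query vector>'` would then be able to use this index, since `<=>` matches `vector_cosine_ops`.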
Applying an index can be slow and computationally expensive, so there are a few preparations that should be made beforehand:
1. Make sure your pgvector is the latest available version on Supabase.
Versions 0.6 and later have accelerated HNSW build speeds. You can check your current version on the Dashboard's Extensions page, and perform a software upgrade in the Infrastructure Settings if necessary.
2. Setting up an external connection
The Dashboard has an internal time limit of ~2 minutes for queries. Indexing a large table will almost always take more time, so it is necessary to execute your code through an external interface, such as psql.
You can install psql on macOS and Windows by following these links and instructions.
For Linux (Debian) you can run the following:
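A minimal sketch for Debian-based distributions, assuming the client tools come from the standard `postgresql-client` package:

```shell
# Install the PostgreSQL client tools (includes psql) on Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y postgresql-client
```

You can verify the install with `psql --version`.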
Once installed, you can find your psql connection string in the Database Settings; running it in the terminal starts a psql session.
If your network can use IPv6, consider using the direct connection string instead of Supavisor. It's not mandatory, but for tasks that run a long time, it's best to reduce network complexity. To check if your network is compatible, use this cURL command to request your IPv6 address:
curl -6 https://ifconfig.co/ip
If an address is returned, you should be able to use your direct connection string found in the Database Settings:
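For illustration, a direct connection command looks roughly like the following; the project ref and password are placeholders, so copy the real string from your Database Settings rather than constructing it by hand:

```shell
# Illustrative only -- <project-ref> and <password> are placeholders.
psql "postgresql://postgres:<password>@db.<project-ref>.supabase.co:5432/postgres"
```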
3. Increase memory for index creation (optional)
The maintenance_work_mem variable limits the maximum amount of memory that maintenance operations, such as vacuuming, altering, and indexing tables, can use. In your session, try setting it to a reasonably high value:
set maintenance_work_mem to '<several GB>'; -- e.g. '4GB'
Inspect value to make sure it has been set:
show maintenance_work_mem;
4. Increase cores for index creation (optional)
The max_parallel_maintenance_workers variable limits the number of cores that maintenance operations, including indexing tables, can use. In your session, try setting it to roughly 1/2 to 2/3 of your compute's core count:
set max_parallel_maintenance_workers to <integer>;
Inspect value to make sure it has been set:
show max_parallel_maintenance_workers;
5. Setting a custom timeout
You should increase the query timeout for your connection:
set statement_timeout = '999min';
Inspect value to make sure it has been set:
show statement_timeout;
6. Consider temporarily upgrading your compute size (optional)
If your task is particularly long, you can speed it up by boosting your computing power temporarily. Compute size is charged by the hour, so you can increase it for an hour or two to finish your task faster, then scale it back afterward. Here is a list of compute add-ons. If you want to temporarily upgrade, you can find the add-ons for your project in your Dashboard's Add-Ons Settings.
7. Consider increasing disk size (optional)
HNSW indexes can produce temporary files during their construction that may consume several GB of disk. Consider increasing the disk size in the Database Settings to accommodate short-term disk stress.
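Putting the session settings together, an indexing session might look like the following sketch. The table, column, index name, and values are illustrative, not prescriptive; tune them to your compute size:

```sql
-- Run inside a psql session connected to your database.
set maintenance_work_mem to '4GB';           -- step 3: more memory for the build
set max_parallel_maintenance_workers to 4;   -- step 4: ~1/2 to 2/3 of available cores
set statement_timeout = '999min';            -- step 5: allow a long-running build

-- CONCURRENTLY avoids locking the table against writes during the build.
CREATE INDEX CONCURRENTLY documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops);
```

Note that CREATE INDEX CONCURRENTLY cannot run inside an explicit transaction block, so issue it as a standalone statement.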