【ef_search】set hnsw.ef_search = 1001 failed #560

Closed
frg01 opened this issue May 17, 2024 · 2 comments
Comments

@frg01

frg01 commented May 17, 2024

Why is the upper bound of hnsw.ef_search only 1,000? My table has tens of thousands of rows, but a query that goes through the index returns at most 1,000 of them. That does not make sense to me.

1. 
CREATE TABLE items_update4 (id bigserial, embedding vector(3));

2. 
CREATE OR REPLACE FUNCTION public.add_vector_data(
    tb_name varchar,
    row_cnt integer,
    beg_id integer,
    vector_dimen integer,
    vector_v_mul integer
)
    RETURNS void
    LANGUAGE plpgsql
    COST 100
    VOLATILE
AS $BODY$
declare
    v_row_idx integer := 0;
    embedding varchar := '';
    v_strSql varchar := '';
begin
    while v_row_idx < row_cnt loop
        -- Build a quoted vector literal such as '[0.1,0.2,0.3]' with random components in [0, vector_v_mul).
        embedding := '''[' || array_to_string(ARRAY(SELECT random() * vector_v_mul FROM generate_series(1, vector_dimen)), ',') || ']''';
        -- Insert one row per iteration with id = beg_id + v_row_idx.
        v_strSql := 'insert into ' || tb_name || '(id, embedding) values (' || (beg_id + v_row_idx) || ',' || embedding || ');';
        EXECUTE v_strSql;
        v_row_idx := v_row_idx + 1;
        -- Report progress every 1,000 rows.
        if v_row_idx % 1000 = 0 then
            raise notice 'row_cnt: %', v_row_idx;
        end if;
    end loop;
end
$BODY$;
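
The steps define add_vector_data but do not show the call that fills the table. A minimal sketch of that step follows; the row count, starting id, and scale factor are assumed values chosen to match "tens of thousands" of rows and the vector(3) column, and none of them appear in the report itself.

```sql
-- Assumed values, not from the original report: 20,000 rows, ids starting at 1,
-- 3 dimensions (to match vector(3)), components scaled into [0, 10).
SELECT public.add_vector_data('items_update4', 20000, 1, 3, 10);

-- Confirm the table holds more rows than the 1,000-row ceiling under discussion.
SELECT COUNT(*) FROM items_update4;
```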

3.
SELECT COUNT(*)
FROM (
    SELECT * 
    FROM items_update4 
    ORDER BY embedding <-> '[1,2,3]' 
    LIMIT 10000
) AS subquery;

4. 
create index items_update4_spann_1 on items_update4 using hnsw (embedding vector_l2_ops);

5.
SELECT COUNT(*)
FROM (
    SELECT * 
    FROM items_update4 
    ORDER BY embedding <-> '[1,2,3]' 
    LIMIT 10000
) AS subquery;

Without an index, the query can return any number of rows. Once the index is created, it returns at most the number of rows set by hnsw.ef_search.
If you know why, please tell me.
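
The difference between steps 3 and 5 can be checked with a few extra statements. This is a sketch only: it assumes the HNSW index from step 4 exists, that the planner actually picks it (which the EXPLAIN is there to confirm), and that the pgvector version is the one discussed in this thread.

```sql
-- Show the current candidate-list size (pgvector documents a default of 40).
SHOW hnsw.ef_search;

-- Raise it to the documented maximum; a value of 1001 is rejected as out of range.
SET hnsw.ef_search = 1000;

-- Check whether the plan uses the HNSW index or falls back to a sequential scan.
EXPLAIN
SELECT * FROM items_update4 ORDER BY embedding <-> '[1,2,3]' LIMIT 10000;

-- With an HNSW index scan, the count is capped by hnsw.ef_search rather than by LIMIT.
SELECT COUNT(*)
FROM (
    SELECT *
    FROM items_update4
    ORDER BY embedding <-> '[1,2,3]'
    LIMIT 10000
) AS subquery;
```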

@jkatz
Contributor

jkatz commented May 17, 2024

@frg01 What is your use case where you're trying to return more than 1000 entries in a single query?

@ankane
Member

ankane commented May 20, 2024

Hi @frg01, hnsw.ef_search is limited to 1,000 to limit memory usage. IVFFlat is recommended if you need more than 500 results (docs).
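
For reference, a rough sketch of the IVFFlat route on the table from the report. The index name items_update4_ivfflat and the lists and probes values are illustrative assumptions (lists is picked as roughly rows / 1000 for about 20,000 rows), not settings given in this thread.

```sql
-- Drop the HNSW index from step 4 so the planner cannot choose it.
DROP INDEX IF EXISTS items_update4_spann_1;

-- Hypothetical index name; lists = 20 is an assumed value for about 20,000 rows.
CREATE INDEX items_update4_ivfflat ON items_update4
    USING ivfflat (embedding vector_l2_ops) WITH (lists = 20);

-- Probe more lists per query: more candidate rows and better recall, at the cost of speed.
SET ivfflat.probes = 10;

SELECT * FROM items_update4 ORDER BY embedding <-> '[1,2,3]' LIMIT 10000;
```

An IVFFlat scan can only return rows from the lists it probes, so if the query still comes back short, ivfflat.probes likely needs to be raised. Setting it equal to lists makes the search exact, at which point the planner stops using the index.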

ankane closed this as completed May 20, 2024