Skip to content

Searching with HNSW index

itsmeabd edited this page Apr 5, 2023 · 1 revision

HNSW

HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layer navigation structure for an image according to certain rules. In this structure, the upper layers are more sparse and the distances between nodes are farther; the lower layers are denser and he distances between nodes are closer. The search starts from the uppermost layer, finds the node closest to the target in this layer, and then enters the next layer to begin another search. After multiple iterations, it can quickly approach the target position.

In order to improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to M. In addition, you can use efConstruction (when building index) or ef (when searching targets) to specify a search range.

  • Index building parameters

    Parameter Description Range
    M Maximum degree of the node [4, 64]
    efConstruction Search scope [8, 512]
  • Search parameters

    Parameter Description Range
    ef Search scope [top_k, 32768]

For example:

In python

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
# create a collection
collection_name = "milvus_HSNW"
default_fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=d)
]default_schema = CollectionSchema(fields=default_fields, description="test collection")
print(f"\nCreate collection...")
collection = Collection(name= collection_name, schema=default_schema)

# insert data
import random
vectors = [[random.random() for _ in range(8)] for _ in range(10)]
entities = [vectors]
mr = collection.insert(entities)
print(collection.num_entities) 


# create index
collection.create_index(field_name=field_name,
                        index_params={'index_type': 'HNSW',
                                      'metric_type': 'L2',
                                      'params': {
                                        "M": 16,              # int. 4~64
                                        "efConstruction": 40  # int. 8~512
                                      }})
collection.load()
                                     
# search 
top_k = 10
results = collection.search(vectors[:5], anns_field="vector", param={
                "ef": 64          
              }, limit=top_k)# show results
              
for result in results:
  print(result.ids)
  print(result.distance)