Add new "AI-powered search" section #2788

Merged
merged 15 commits into from
Jun 13, 2024
34 changes: 34 additions & 0 deletions .code-samples.meilisearch.yaml
@@ -1188,6 +1188,40 @@ index_settings_tutorial_api_put_setting_1: |-
index_settings_tutorial_api_task_1: |-
  curl \
    -X GET 'http://localhost:7700/tasks/TASK_UID'
get_embedders_1: |-
  curl \
    -X GET 'http://localhost:7700/indexes/INDEX_NAME/settings/embedders'
update_embedders_1: |-
  curl \
    -X PATCH 'http://localhost:7700/indexes/INDEX_NAME/settings' \
    -H 'Content-Type: application/json' \
    --data-binary '{
      "embedders": {
        "default": {
          "source": "openAi",
          "apiKey": "anOpenAiApiKey",
          "model": "text-embedding-3-small",
          "documentTemplate": "A document titled '\''{{doc.title}}'\'' whose description starts with {{doc.overview|truncatewords: 20}}"
        }
      }
    }'
reset_embedders_1: |-
  curl \
    -X DELETE 'http://localhost:7700/indexes/INDEX_NAME/settings/embedders'
search_parameter_guide_hybrid_1: |-
  curl -X POST 'http://localhost:7700/indexes/INDEX_NAME/search' \
    -H 'content-type: application/json' \
    --data-binary '{
      "q": "kitchen utensils",
      "hybrid": {
        "semanticRatio": 0.9,
        "embedder": "default"
      }
    }'
search_parameter_guide_vector_1: |-
  curl -X POST 'http://localhost:7700/indexes/INDEX_NAME/search' \
    -H 'content-type: application/json' \
    --data-binary '{ "vector": [0, 1, 2] }'
get_search_cutoff_1: |-
  curl \
    -X GET 'http://localhost:7700/indexes/movies/settings/search-cutoff-ms'
27 changes: 27 additions & 0 deletions assets/datasets/kitchenware.json
@@ -0,0 +1,27 @@
[
{
"id": 0,
"name": "Wooden spoon",
"price": 1.50
},
{
"id": 1,
"name": "Microwave lid",
"price": 1.00
},
{
"id": 2,
"name": "Wooden chopping board",
"price": 9.50
},
{
"id": 3,
"name": "Plastic chopping board",
"price": 1.50
},
{
"id": 4,
"name": "Rolling pin",
"price": 2.50
}
]
16 changes: 16 additions & 0 deletions config/sidebar-learn.json
@@ -61,6 +61,22 @@
}
]
},
{
"title": "AI-powered search",
"slug": "ai-powered-search",
"routes": [
{
"source": "learn/ai_powered_search/getting_started_with_ai_search.mdx",
"label": "Getting started with AI-powered search",
"slug": "getting_started_with_ai_search"
},
{
"source": "learn/ai_powered_search/difference_full_text_ai_search.mdx",
"label": "Differences between full-text and AI-powered search",
"slug": "difference_full_text_ai_search"
}
]
},
{
"title": "Analytics",
"slug": "analytics",
30 changes: 30 additions & 0 deletions learn/ai_powered_search/difference_full_text_ai_search.mdx
@@ -0,0 +1,30 @@
---
title: Differences between full-text and AI-powered search — Meilisearch documentation
description: "Meilisearch offers two types of search: full-text search and AI-powered search. This article explains their differences and intended use cases."
---

# Differences between full-text and AI-powered search <NoticeTag type="experimental" label="experimental" />

Meilisearch offers two types of search: full-text search and AI-powered search. This article explains their differences and intended use cases.

## Full-text search

This is Meilisearch's default search type. When performing a full-text search, Meilisearch checks the indexed documents for acceptable matches to a set of search terms. It is a fast and reliable search method.

For example, when searching for `"pink sandals"`, full-text search will only return clothing items explicitly mentioning these two terms. Searching for `"pink summer shoes for girls"` is likely to return fewer and less relevant results.

## AI-powered search

AI-powered search is Meilisearch's newest search method. It returns results based on a query's meaning and context.

AI-powered search uses third-party embedding providers such as OpenAI and Hugging Face to generate vector embeddings representing the meaning and context of both query terms and documents. It then compares these vectors to find semantically similar search results.
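
To make "compares these vectors" more concrete: a common way to compare embeddings is cosine similarity, where values close to `1` mean two vectors point in nearly the same direction. This article does not say which distance metric Meilisearch uses internally, so the `awk` sketch below only illustrates the general idea; `cosine_similarity` is a hypothetical helper that takes two comma-separated vectors.

```sh
# Cosine similarity of two comma-separated vectors of equal length:
# dot(a, b) / (|a| * |b|), printed with four decimals.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] ^ 2; nb += y[i] ^ 2
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine_similarity '1,1' '1,0'   # prints 0.7071
```

Real embeddings have hundreds or thousands of dimensions, but the computation is the same.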

When using AI-powered search, Meilisearch returns both full-text and semantic results by default. This is also called hybrid search.

With AI-powered search, a query such as `"pink sandals"` still returns precise matches, and longer queries such as `"cute pink summer shoes for girls"` also return relevant results, including light-colored open shoes.

## Use cases

Full-text search is a reliable choice that works well in most scenarios. It is fast, less resource-intensive, and requires no extra configuration. It is best suited for situations where you need precise matches to a query and your users are familiar with the relevant keywords.

AI-powered search combines the flexibility of semantic search with the performance of full-text search. Most searches, whether short and precise or long and vague, will return very relevant search results. In most cases, AI-powered search will offer your users the best search experience, but will require extra configuration. AI-powered search may also entail extra costs if you use a third-party service such as OpenAI to generate vector embeddings.
106 changes: 106 additions & 0 deletions learn/ai_powered_search/getting_started_with_ai_search.mdx
@@ -0,0 +1,106 @@
---
title: Getting started with AI-powered search — Meilisearch documentation
description: AI-powered search is an experimental technology that uses LLMs to retrieve search results. This tutorial shows you how to configure an OpenAI embedder and perform your first search.
---

# Getting started with AI-powered search <NoticeTag type="experimental" label="experimental" />

[AI-powered search](https://meilisearch.com/solutions/vector-search?utm_campaign=vector-search&utm_source=aipowered-tutorial), sometimes also called vector search or hybrid search, is an experimental technology that uses [large language models](https://en.wikipedia.org/wiki/Large_language_model) to retrieve search results based on the meaning and context of a query.

This tutorial will walk you through configuring AI-powered search in your Meilisearch project. You will activate the vector store setting, generate document embeddings with OpenAI, and perform your first search.

## Requirements

- A running Meilisearch project
- An [OpenAI API key](https://platform.openai.com/api-keys)
- A command-line console

## Create a new index

Create a `kitchenware` index and add [this kitchenware products dataset](/assets/datasets/kitchenware.json) to it. If necessary, consult the quick start for instructions on how to configure a basic Meilisearch installation.
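
If you prefer the command line, the sketch below sends the dataset with curl. It assumes you saved the file locally as `kitchenware.json`, that Meilisearch runs at `http://localhost:7700` (overridable through a hypothetical `MEILI_URL` variable), and that the instance has no master key; otherwise, add an `Authorization: Bearer` header. `documents_url` and `add_documents` are illustrative helper names. Meilisearch creates the `kitchenware` index automatically if it does not exist yet.

```sh
# Hypothetical helper: build the documents route for index $1.
# The base URL defaults to http://localhost:7700 unless MEILI_URL is set.
documents_url() {
  printf '%s/indexes/%s/documents' "${MEILI_URL:-http://localhost:7700}" "$1"
}

# POST a local JSON array of documents (file $2) to index $1.
add_documents() {
  curl -X POST "$(documents_url "$1")" \
    -H 'Content-Type: application/json' \
    --data-binary @"$2"
}

# Usage: add_documents kitchenware kitchenware.json
```

Meilisearch answers with a summarized task object; the documents are indexed asynchronously.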

## Activate AI-powered search

First, activate the AI-powered search experimental feature. Exactly how to do that depends on whether you are using [Meilisearch Cloud](#meilisearch-cloud-projects) or [self-hosting Meilisearch](#self-hosted-instances).

### Meilisearch Cloud projects

If you are using Meilisearch Cloud, navigate to your project overview and find "Experimental features". Then check the "AI-powered search" box.

![A section of the project overview interface titled "Experimental features". The image shows a few options, including "Vector store".](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/images/vector-search/01-cloud-vector-store.png)

<Capsule intent="note" title="Meilisearch Cloud AI-powered search waitlist">
To ensure proper scaling of Meilisearch Cloud's latest AI-powered search offering, you must enter the waitlist before activating vector search. You will not be able to activate vector search in the Cloud interface or via the `/experimental-features` route until your sign-up has been approved.
</Capsule>

### Self-hosted instances

Use [the `/experimental-features` route](/reference/api/experimental_features) to activate vector search during runtime:

```sh
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "vectorStore": true
  }'
```
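
To confirm the flag took effect, you can read the current experimental features back from the same route with `GET`. The sketch below assumes the response is a JSON object containing a `vectorStore` field, as in the `PATCH` body above; `vector_store_enabled` and `check_vector_store` are hypothetical helper names.

```sh
# Succeeds if the JSON read from stdin contains "vectorStore": true.
vector_store_enabled() {
  grep -q '"vectorStore": *true'
}

# Fetch the instance's experimental features and check the flag.
check_vector_store() {
  curl -sS "${MEILI_URL:-http://localhost:7700}/experimental-features" | vector_store_enabled
}

# Usage: check_vector_store && echo "vectorStore is on"
```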

## Generate vector embeddings with OpenAI

Next, configure Meilisearch to generate vector embeddings for all documents in your dataset. Embeddings are mathematical representations of the meanings of words and sentences in your documents. Meilisearch relies on external providers to generate these embeddings; this tutorial uses OpenAI.

Use the `embedders` index setting of the [update `/settings` endpoint](/reference/api/settings) to configure a default [OpenAI](https://platform.openai.com/) embedder:

```sh
curl \
  -X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "default": {
        "source": "openAi",
        "apiKey": "OPEN_AI_API_KEY",
        "model": "text-embedding-3-small",
        "documentTemplate": "An object used in a kitchen named '\''{{doc.name}}'\''"
      }
    }
  }'
```

Because the request body is wrapped in single quotes, each `'\''` sequence closes the quoted string, adds a literal single quote, and reopens it. This way the template Meilisearch receives keeps the quotes around `{{doc.name}}`.

Replace `OPEN_AI_API_KEY` with your [OpenAI API key](https://platform.openai.com/api-keys). You may use any key tier for this tutorial, but prefer [Tier 2 keys](https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-two) for optimal performance in production environments.
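
Pasting the key directly into the command can leave it in your shell history. One way around this, sketched below, is to build the request body with a heredoc and read the key from an environment variable. `OPENAI_API_KEY` and `embedder_settings` are assumed names, not something Meilisearch requires. An unquoted heredoc expands `$1` but leaves `{{doc.name}}` and the single quotes untouched, so no extra shell escaping is needed.

```sh
# Print the embedder settings body; $1 is the OpenAI API key.
# Assumes the key contains no double quotes or backslashes.
embedder_settings() {
  cat <<EOF
{
  "embedders": {
    "default": {
      "source": "openAi",
      "apiKey": "$1",
      "model": "text-embedding-3-small",
      "documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
    }
  }
}
EOF
}

# Usage (requires a running instance and an exported OPENAI_API_KEY):
# curl -X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
#   -H 'Content-Type: application/json' \
#   --data-binary "$(embedder_settings "$OPENAI_API_KEY")"
```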

### `documentTemplate`

`documentTemplate` is a short [Liquid template](https://shopify.github.io/liquid/). Text inside double curly braces (`{{ }}`) refers to a document field in dot notation: `doc` is the document itself, and the string after the dot is a document attribute. Meilisearch replaces each pair of braces and its contents with the corresponding field value.

The resulting text is the prompt OpenAI uses to generate document embeddings.

For example, kitchenware documents have three fields: `id`, `name`, and `price`. If your `documentTemplate` is `"An object used in a kitchen named '{{doc.name}}'"`, the text Meilisearch will send to the embedder when indexing the first document is `"An object used in a kitchen named 'Wooden spoon'"`.
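
To make the substitution concrete, here is a toy sketch that replaces only the literal placeholder `{{doc.name}}` using `sed`. It is not a Liquid implementation (no filters such as `truncatewords`, no other fields), and `render_name_template` is a hypothetical name. It assumes the value contains no `|`, `&`, or backslash characters, which `sed` would interpret.

```sh
# Toy stand-in for template rendering: replaces every {{doc.name}}
# in the template ($1) with the given value ($2).
render_name_template() {
  printf '%s\n' "$1" | sed "s|{{doc\.name}}|$2|g"
}

render_name_template "An object used in a kitchen named '{{doc.name}}'" "Wooden spoon"
# prints: An object used in a kitchen named 'Wooden spoon'
```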

For the best results, always provide a `documentTemplate`. Keep your templates short and only include highly relevant information. This ensures optimal indexing performance and search result relevancy.

## Perform an AI-powered search

Perform AI-powered searches with `q` and `hybrid` to retrieve search results using the default embedder you configured in the previous step:

```sh
curl \
  -X POST 'http://localhost:7700/indexes/kitchenware/search' \
  -H 'content-type: application/json' \
  --data-binary '{
    "q": "kitchen utensils made of wood",
    "hybrid": {
      "embedder": "default",
      "semanticRatio": 0.7
    }
  }'
```

With `semanticRatio` set to `0.7`, Meilisearch returns a mix of semantic and full-text matches, prioritizing results that match the query's meaning and context. Values of `semanticRatio` greater than `0.5` favor results based on the meaning and context of the search; values lower than `0.5` favor full-text matches.
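
If you script searches with different ratios, a small helper can check that `semanticRatio` stays within the `0.0` to `1.0` range before building the request body. The sketch below does this with `awk`; `hybrid_search_body` is a hypothetical name, and the query is assumed to contain no double quotes or backslashes, since it is inserted into the JSON without escaping.

```sh
# Print a hybrid search body for query $1 and semanticRatio $2,
# or fail if the ratio is not a number between 0.0 and 1.0.
hybrid_search_body() {
  if ! awk -v r="$2" 'BEGIN { exit !(r + 0 == r && r >= 0 && r <= 1) }'; then
    echo "semanticRatio must be a number between 0.0 and 1.0" >&2
    return 1
  fi
  printf '{"q":"%s","hybrid":{"embedder":"default","semanticRatio":%s}}\n' "$1" "$2"
}

# Usage against the tutorial index:
# curl -X POST 'http://localhost:7700/indexes/kitchenware/search' \
#   -H 'content-type: application/json' \
#   --data-binary "$(hybrid_search_body 'kitchen utensils made of wood' 0.7)"
```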

## Conclusion

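You have seen how to set up and perform AI-powered searches with Meilisearch and OpenAI. For more in-depth information, consult the reference for [embedders](/reference/api/settings#embedders-experimental) and the [`hybrid` search parameter](/reference/api/search#hybrid-search-experimental).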

AI-powered search is an experimental Meilisearch feature and is undergoing active development. [Join the discussion on GitHub](https://github.com/orgs/meilisearch/discussions/677).
44 changes: 44 additions & 0 deletions reference/api/search.mdx
@@ -54,6 +54,8 @@ By default, [this endpoint returns a maximum of 1000 results](/learn/advanced/kn
| **[`matchingStrategy`](#matching-strategy)** | String | `last` | Strategy used to match query terms within documents |
| **[`showRankingScore`](#ranking-score)** | Boolean | `false` | Display the global ranking score of a document |
| **[`attributesToSearchOn`](#customize-attributes-to-search-on-at-search-time)** | Array of strings | `["*"]` | Restrict search to the specified attributes |
| **[`hybrid`](#hybrid-search-experimental)** | Object | `null` | Return results based on query keywords and meaning |
| **[`vector`](#vector-experimental)** | Array of numbers | `null` | Search using a custom query vector |

[Learn more about how to use each search parameter](#search-parameters).

@@ -162,6 +164,8 @@ By default, [this endpoint returns a maximum of 1000 results](/learn/advanced/kn
| **[`matchingStrategy`](#matching-strategy)** | String | `last` | Strategy used to match query terms within documents |
| **[`showRankingScore`](#ranking-score)** | Boolean | `false` | Display the global ranking score of a document |
| **[`attributesToSearchOn`](#customize-attributes-to-search-on-at-search-time)** | Array of strings | `["*"]` | Restrict search to the specified attributes |
| **[`hybrid`](#hybrid-search-experimental)** | Object | `null` | Return results based on query keywords and meaning |
| **[`vector`](#vector-experimental)** | Array of numbers | `null` | Search using a custom query vector |

[Learn more about how to use each search parameter](#search-parameters).

@@ -248,6 +252,8 @@ This is not necessary when using the `POST` route or one of our [SDKs](/learn/wh
| **[`matchingStrategy`](#matching-strategy)** | String | `last` | Strategy used to match query terms within documents |
| **[`showRankingScore`](#ranking-score)** | Boolean | `false` | Display the global ranking score of a document |
| **[`attributesToSearchOn`](#customize-attributes-to-search-on-at-search-time)** | Array of strings | `["*"]` | Restrict search to the specified attributes |
| **[`hybrid`](#hybrid-search-experimental)** | Object | `null` | Return results based on query keywords and meaning |
| **[`vector`](#vector-experimental)** | Array of numbers | `null` | Search using a custom query vector |

### Query (q)

@@ -1074,3 +1080,41 @@ The following query returns documents whose `overview` includes `"adventure"`:
<CodeSamples id="search_parameter_guide_attributes_to_search_on_1" />

Results would not include documents containing `"adventure"` in other fields such as `title` or `genre`, even if these fields were present in the `searchableAttributes` list.

### Hybrid search (experimental)

**Parameter**: `hybrid`<br />
**Expected value**: An object with two fields: `embedder` and `semanticRatio`<br />
**Default value**: `null`

Configures Meilisearch to return search results based on a query's meaning and context.

`hybrid` must be an object. It accepts two fields: `embedder` and `semanticRatio`.

`embedder` must be a string indicating an embedder configured with the `/settings` endpoint. If you don't specify an embedder, Meilisearch uses the index's only embedder when it has exactly one, or the embedder named `default` when it has several.

`semanticRatio` must be a number between `0.0` and `1.0` indicating the proportion between keyword and semantic search results. `0.0` causes Meilisearch to only return keyword results. `1.0` causes Meilisearch to only return meaning-based results. Defaults to `0.5`.

<Capsule intent="warning">
Meilisearch will return an error if you use `hybrid` before activating your instance's `vectorStore` and [configuring an embedder](/reference/api/settings#embedders-experimental).
</Capsule>

#### Example

<CodeSamples id="search_parameter_guide_hybrid_1" />

### Vector (experimental)

**Parameter**: `vector`<br />
**Expected value**: Array of numbers<br />
**Default value**: `null`

Uses a custom vector to perform a search query. `vector` must be an array of numbers.

`vector` is mandatory when performing searches with `userProvided` embedders. You may also use `vector` to override an embedder's automatic vector generation.

`vector` dimensions must match the dimensions of the embedder.
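
To catch a dimension mismatch before sending a request, you can count the vector's components on the client side. A minimal sketch, assuming the vector is passed as a comma-separated string and that you already know your embedder's dimensions; `vector_search_body` is a hypothetical helper.

```sh
# Print a vector search body for the comma-separated numbers in $1,
# or fail if their count differs from the expected dimensions in $2.
vector_search_body() {
  dims=$(printf '%s\n' "$1" | awk -F',' '{ print NF }')
  if [ "$dims" -ne "$2" ]; then
    echo "expected $2 dimensions, got $dims" >&2
    return 1
  fi
  printf '{"vector":[%s]}\n' "$1"
}

# Usage: vector_search_body '0,1,2' 3   # body for a 3-dimensional embedder
```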

#### Example

<CodeSamples id="search_parameter_guide_vector_1" />

<Capsule intent="warning">
Meilisearch will return an error if you use `vector` before activating your instance's `vectorStore` and [configuring a custom embedder](/reference/api/settings#embedders-experimental).
</Capsule>