Add sparse_vector query (#108254)

---------

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Benjamin Trent <[email protected]>
3 people committed May 22, 2024
1 parent 582c6c0 commit 7f35f1b
Showing 37 changed files with 2,025 additions and 357 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/108254.yaml
@@ -0,0 +1,5 @@
pr: 108254
summary: Add `sparse_vector` query
area: Machine Learning
type: enhancement
issues: []
21 changes: 11 additions & 10 deletions docs/reference/mapping/types/sparse-vector.asciidoc
@@ -1,11 +1,12 @@
[[sparse-vector]]
=== Sparse vector field type

++++
<titleabbrev>Sparse vector</titleabbrev>
++++

A `sparse_vector` field can index features and weights so that they can later be used to query
documents in queries with a <<query-dsl-text-expansion-query,`text_expansion`>> query.
A `sparse_vector` field can index features and weights so that they can later be used to query documents in queries with a <<query-dsl-sparse-vector-query, `sparse_vector`>>.
This field can also be used with a legacy <<query-dsl-text-expansion-query,`text_expansion`>> query.

`sparse_vector` is the field type that should be used with <<elser-mappings, ELSER mappings>>.

@@ -23,16 +24,16 @@ PUT my-index
}
--------------------------------------------------

See <<semantic-search-elser, semantic search with ELSER>> for a complete example on adding documents
to a `sparse_vector` mapped field using ELSER.
See <<semantic-search-elser, semantic search with ELSER>> for a complete example on adding documents to a `sparse_vector` mapped field using ELSER.

NOTE: `sparse_vector` fields can not be included in indices that were *created* on {es} versions between 8.0 and 8.10

NOTE: `sparse_vector` fields only support single-valued fields and strictly positive
values. Multi-valued fields and negative values will be rejected.
NOTE: `sparse_vector` fields only support single-valued fields and strictly positive values.
Multi-valued fields and negative values will be rejected.

NOTE: `sparse_vector` fields do not support querying, sorting or aggregating. They may
only be used within <<query-dsl-text-expansion-query,`text_expansion`>> queries.
NOTE: `sparse_vector` fields do not support querying, sorting or aggregating.
They may only be used within specialized queries.
The recommended query to use on these fields are <<query-dsl-sparse-vector-query, `sparse_vector`>> queries.
They may also be used within <<query-dsl-text-expansion-query,`text_expansion`>> queries.

NOTE: `sparse_vector` fields only preserve 9 significant bits for the precision, which
translates to a relative error of about 0.4%.
NOTE: `sparse_vector` fields only preserve 9 significant bits for the precision, which translates to a relative error of about 0.4%.
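The value constraints in the notes above can be illustrated with a small Python sketch. It is not part of the commit. `check_weights` and `keep_9_significant_bits` are hypothetical helper names, and the truncation model (zeroing the low 15 bits of a 32-bit float, which leaves the implicit leading bit plus 8 stored mantissa bits) is an assumption about how 9 significant bits are retained; the documentation text only states the bit count and the approximate 0.4% error.

```python
import struct


def check_weights(weights):
    """Reject weights that are not strictly positive, mirroring the note above."""
    for token, weight in weights.items():
        if weight <= 0:
            raise ValueError(
                f"weight for {token!r} must be strictly positive, got {weight}"
            )
    return weights


def keep_9_significant_bits(value):
    """Assumed model of the precision loss: zero the low 15 bits of the float32 encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    # 0xFFFF8000 keeps the sign bit, 8 exponent bits and the top 8 mantissa bits.
    (truncated,) = struct.unpack(">f", struct.pack(">I", bits & 0xFFFF8000))
    return truncated


# Hypothetical token weights, made up for illustration.
weights = check_weights({"comfortable": 1.0039, "shoes": 3.14159, "running": 0.5})
for token, weight in weights.items():
    stored = keep_9_significant_bits(weight)
    relative_error = abs(weight - stored) / weight
    print(f"{token}: {weight} -> {stored} (relative error {relative_error:.4%})")
```

Under this model the relative error stays below 2^-8, roughly 0.39%, which is consistent with the "about 0.4%" figure in the note.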
34 changes: 16 additions & 18 deletions docs/reference/query-dsl.asciidoc
@@ -5,50 +5,46 @@
--

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries.
Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of
clauses:
Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:

Leaf query clauses::

Leaf query clauses look for a particular value in a particular field, such as the
<<query-dsl-match-query,`match`>>, <<query-dsl-term-query,`term`>> or
<<query-dsl-range-query,`range`>> queries. These queries can be used
by themselves.
<<query-dsl-range-query,`range`>> queries.
These queries can be used by themselves.

Compound query clauses::

Compound query clauses wrap other leaf *or* compound queries and are used to combine
multiple queries in a logical fashion (such as the
<<query-dsl-bool-query,`bool`>> or <<query-dsl-dis-max-query,`dis_max`>> query),
or to alter their behaviour (such as the
Compound query clauses wrap other leaf *or* compound queries and are used to combine multiple queries in a logical fashion (such as the
<<query-dsl-bool-query,`bool`>> or <<query-dsl-dis-max-query,`dis_max`>> query), or to alter their behaviour (such as the
<<query-dsl-constant-score-query,`constant_score`>> query).

Query clauses behave differently depending on whether they are used in
<<query-filter-context,query context or filter context>>.

[[query-dsl-allow-expensive-queries]]
Allow expensive queries::
Certain types of queries will generally execute slowly due to the way they are implemented, which can affect
the stability of the cluster. Those queries can be categorised as follows:
Certain types of queries will generally execute slowly due to the way they are implemented, which can affect the stability of the cluster.
Those queries can be categorised as follows:

* Queries that need to do linear scans to identify matches:
** <<query-dsl-script-query,`script` queries>>
** queries on <<number,numeric>>, <<date,date>>, <<boolean,boolean>>, <<ip,ip>>,
<<geo-point,geo_point>> or <<keyword,keyword>> fields
that are not indexed but have <<doc-values,doc values>> enabled
<<geo-point,geo_point>> or <<keyword,keyword>> fields that are not indexed but have <<doc-values,doc values>> enabled

* Queries that have a high up-front cost:
** <<query-dsl-fuzzy-query,`fuzzy` queries>> (except on
<<wildcard-field-type,`wildcard`>> fields)
<<wildcard-field-type,`wildcard`>> fields)
** <<query-dsl-regexp-query,`regexp` queries>> (except on
<<wildcard-field-type,`wildcard`>> fields)
<<wildcard-field-type,`wildcard`>> fields)
** <<query-dsl-prefix-query,`prefix` queries>> (except on
<<wildcard-field-type,`wildcard`>> fields or those without
<<index-prefixes,`index_prefixes`>>)
<<wildcard-field-type,`wildcard`>> fields or those without
<<index-prefixes,`index_prefixes`>>)
** <<query-dsl-wildcard-query,`wildcard` queries>> (except on
<<wildcard-field-type,`wildcard`>> fields)
<<wildcard-field-type,`wildcard`>> fields)
** <<query-dsl-range-query,`range` queries>> on <<text,`text`>> and
<<keyword,`keyword`>> fields
<<keyword,`keyword`>> fields

* <<joining-queries,Joining queries>>

@@ -82,6 +78,8 @@ include::query-dsl/term-level-queries.asciidoc[]

include::query-dsl/text-expansion-query.asciidoc[]

include::query-dsl/sparse-vector-query.asciidoc[]

include::query-dsl/minimum-should-match.asciidoc[]

include::query-dsl/multi-term-rewrite.asciidoc[]
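
For orientation, here is a hedged Python sketch of what a request body for the new `sparse_vector` query might look like. The included `sparse-vector-query.asciidoc` page is not shown in this excerpt, so the parameter names `field`, `inference_id`, `query` and `query_vector` are assumptions to verify against that page. The helper name `sparse_vector_query`, the field name `ml.tokens` and the endpoint ID `my-elser-endpoint` are hypothetical.

```python
import json


def sparse_vector_query(field, *, inference_id=None, query=None, query_vector=None):
    """Build a `sparse_vector` query body; parameter names are assumptions."""
    clause = {"field": field}
    if query_vector is not None:
        # Precomputed token weights supplied directly by the caller.
        clause["query_vector"] = dict(query_vector)
    elif inference_id is not None and query is not None:
        # Let an inference endpoint expand the query text into token weights.
        clause["inference_id"] = inference_id
        clause["query"] = query
    else:
        raise ValueError("pass either query_vector, or both inference_id and query")
    return {"query": {"sparse_vector": clause}}


body = sparse_vector_query(
    "ml.tokens",
    inference_id="my-elser-endpoint",
    query="how is the weather in Jamaica?",
)
print(json.dumps(body, indent=2))
```

The sketch only builds and prints the JSON body; sending it to a cluster's `_search` endpoint is left out so that it runs without an Elasticsearch instance.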