How to create a boolean query with both "should" and "must" clauses? #258

Open
yattias opened this issue Jun 28, 2021 · 4 comments

yattias commented Jun 28, 2021

Questions

Hi @barseghyanartur. First, thanks for this great package. It has been extremely useful.

I was unable to find an answer to my question in the docs or by examining the source code, so I figured I'd ask here.

What I need is a boolean query where one part is in a must clause and the other is in a should clause. More specifically, I would like to generate the following query:

  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [ SOME_FIELDS ],
            "operator": "and",
            "query": "SOME QUERY TERMS"
          }
        }
      ],
      "should": [
        {
          "term": {
            "SPECIFIC_FIELD": "SOME QUERY TERMS"
          }
        }
      ]
    }
  },

The reason for the above is to boost a phrase match.

However, whenever I combine the following backends:

filter_backends = [
    PhraseSearchFilterBackend,  # Custom
    MultiMatchSearchFilterBackend,
]

my term query ends up in a must clause, even though I specify matching="should" pretty much everywhere.

I even traced this all the way into base.py, where I confirmed matching="should", yet the final query still ends up entirely in "must".

Any ideas what I'm doing wrong?


yattias commented Jun 28, 2021

For reference, here is my configuration:

class PaperDocumentView(DocumentViewSet):
    document = PaperDocument
    permission_classes = [ReadOnly]
    serializer_class = PaperDocumentSerializer
    pagination_class = LimitOffsetPagination
    lookup_field = 'id'
    filter_backends = [
        PhraseSearchFilterBackend,
        MultiMatchSearchFilterBackend,
        CompoundSearchFilterBackend,
        FacetedSearchFilterBackend,
        FilteringFilterBackend,
        PostFilterFilteringFilterBackend,
        DefaultOrderingFilterBackend,
        OrderingFilterBackend,
        HighlightBackend,
    ]

    search_fields = {
        'doi': {'boost': 3, 'fuzziness': 1},
        'title': {'boost': 2, 'fuzziness': 1},
        'raw_authors.full_name': {'boost': 1, 'fuzziness': 1},
        'abstract': {'boost': 1, 'fuzziness': 1},
        'hubs_flat': {'boost': 1, 'fuzziness': 1},
    }

    multi_match_search_fields = {
        'doi': {'boost': 3, 'fuzziness': 1},
        'title': {'boost': 2, 'fuzziness': 1},
        'raw_authors.full_name': {'boost': 1, 'fuzziness': 1},
        'abstract': {'boost': 1, 'fuzziness': 1},
        'hubs_flat': {'boost': 1, 'fuzziness': 1},
    }

    multi_match_options = {
        'operator': 'and'
    }

    post_filter_fields = {
        'hubs': 'hubs.name',
    }

    faceted_search_fields = {
        'hubs': 'hubs.name'
    }

    filter_fields = {
        'publish_date': 'paper_publish_date'
    }

    ordering = ('_score', '-hot_score', '-discussion_count', '-paper_publish_date')

    ordering_fields = {
        'publish_date': 'paper_publish_date',
        'discussion_count': 'discussion_count',
        'score': 'score',
        'hot_score': 'hot_score',
    }

    highlight_fields = {
        'raw_authors.full_name': {
            'field': 'raw_authors',
            'enabled': True,
            'options': {
                'pre_tags': ["<mark>"],
                'post_tags': ["</mark>"],
                'fragment_size': 1000,
                'number_of_fragments': 10,
            },
        },
        'title': {
            'enabled': True,
            'options': {
                'pre_tags': ["<mark>"],
                'post_tags': ["</mark>"],
                'fragment_size': 2000,
                'number_of_fragments': 1,
            },
        },
        'abstract': {
            'enabled': True,
            'options': {
                'pre_tags': ["<mark>"],
                'post_tags': ["</mark>"],
                'fragment_size': 5000,
                'number_of_fragments': 1,
            },
        }
    }


yattias commented Jun 30, 2021

Wondering if someone here can help 🙏

Sachin-Kahandal commented

Override the base class's get_queryset() and define your own queries there, rather than using the search filter backends:

from elasticsearch_dsl import Search
from elasticsearch_dsl.query import Bool, Match, MatchPhrase, MultiMatch

def get_queryset(self):
    # Read the search term from the request's query string
    text_raw = self.request.GET.get("search", "")
    # Example clauses; the field names are placeholders, substitute your own
    query0 = MultiMatch(query=text_raw, fields=["title", "abstract"])
    query1 = Match(title=text_raw)
    query2 = MatchPhrase(title=text_raw)
    # Add further queries as needed; put required clauses in must=[...]
    q1 = Bool(should=[query0, query1, query2])
    queryset = Search(using=self.client, index=self.index, doc_type=self.document._doc_type.name).query(q1)
    return queryset

This gives you finer control over your queries.


barseghyanartur commented Jul 21, 2022

This question comes up regularly. I'll add it to the FAQ, but TL;DR:

If you need a combination of ANDs and ORs, use SimpleQueryStringSearchFilterBackend. Check the examples and the docs.
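
For context, Elasticsearch's simple_query_string syntax uses + for AND, | for OR, - for negation, double quotes for phrases, and parentheses for grouping. Below is a minimal sketch of building a request URL that carries such an expression. The endpoint path /api/papers/ is hypothetical, and the search_simple_query_string parameter name is assumed to be the backend's default, so check it against your installed version.

```python
from urllib.parse import urlencode


def simple_query_url(base_url, expression, param="search_simple_query_string"):
    """Return base_url with the expression URL-encoded under `param`."""
    return f"{base_url}?{urlencode({param: expression})}"


# The phrase "deep learning" must match, AND either "graphs" or "networks".
expression = '"deep learning" +(graphs | networks)'
print(simple_query_url("/api/papers/", expression))
```

Encoding the expression matters here because a bare + in a query string is decoded as a space, which would silently drop the AND operator.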
