Allow multiple metadata keys on RedisVectorStoreFilterType #5015 #5028

Open
wants to merge 3 commits into
base: main


mauriciocirelli (Contributor)

Fixes #5015


vercel bot commented Apr 9, 2024

The latest updates on your projects. Learn more about Vercel for Git.

Name | Status | Updated (UTC)
langchainjs-api-refs | ✅ Ready | Apr 9, 2024 3:09pm
langchainjs-docs | ✅ Ready | Apr 9, 2024 3:09pm

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and auto:bug (Related to a bug, vulnerability, unexpected error with an existing feature) labels on Apr 9, 2024
jacoblee93 (Collaborator)

Thanks for the PR!

I think this could be nice, but could you please look into making it backwards compatible?

jacoblee93 added the hold (On hold) label on Apr 11, 2024
mauriciocirelli (Contributor, Author) commented Apr 12, 2024

Hi @jacoblee93,

Thank you for taking the time to look into this.
I can certainly make it more backwards compatible. I'd just like to clarify a couple of things:

There are two 'incompatibilities' in my original PR:

  1. Dropping the metadataKey parameter
    I can bring it back, so that it adds the metadata as a JSON string to the Redis document, keeping its original functionality. I'm not sure that is desirable, though, because the metadata would be duplicated: a 'metadataKey' field containing a JSON string plus one field per metadata key, according to the schema.

Originally:

metadata = { a:"A", b: "B", c:"C" };

redis: JSON.stringify(metadata);

Now:

metadata = { a:"A", b: "B", c:"C" };
metadataSchema = {
 a: SchemaFieldTypes.TEXT,
 b: SchemaFieldTypes.TEXT,
 c: SchemaFieldTypes.TEXT,
};
redis: { a:"A", b: "B", c:"C" };

Suggestion:

metadata = { a:"A", b: "B", c:"C" };
metadataSchema = {
 a: SchemaFieldTypes.TEXT,
 b: SchemaFieldTypes.TEXT,
 c: SchemaFieldTypes.TEXT,
 metadataKey: SchemaFieldTypes.TEXT,
};
redis: { metadataKey: JSON.stringify(metadata), a:"A", b: "B", c:"C" };
  2. The filter no longer accepts an array of strings:
    We can accept an array of filters and OR them, as originally coded. The behavior would be different, though, because we would be ORing the Redis filter strings in the array instead of possible values of the 'metadataKey' field.

Originally:

filter = [ "A", "B", "C"];
filterString = '@metadataKey{A|B|C}'

Suggestion:

filter = [ "@a:{A}", "@b:{B}", "@c:{C}"];
filterString = '@a:{A} | @b:{B} | @c:{C}'

Note that the old filter array would result in a filter string that does not make sense anymore:

filter = [ "a", "b", "c"];
filterString = 'a|b|c'

This is why I made it a breaking change: at least the developer would know that things are different now and would have to change their code to get the correct behavior. Even if we keep the interfaces backwards compatible, the runtime behavior would still be very different.
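As a minimal sketch of the storage layout in the first point (not the code in this PR), the function below flattens a metadata object into per-field Redis hash values and, only when a legacy key is given, also keeps the whole object as a JSON string under that key, as in the "Suggestion" example. The names `toRedisHashFields` and `MetadataFieldType` are hypothetical, and plain string literals stand in for node-redis `SchemaFieldTypes`.

```typescript
// Hypothetical sketch: these names are illustrative, not LangChain.js APIs.
// String literals stand in for node-redis SchemaFieldTypes values.
type MetadataFieldType = "TEXT" | "TAG" | "NUMERIC";
type Metadata = Record<string, string | number>;

/**
 * Flattens metadata into Redis hash fields.
 * - Each key declared in `schema` becomes its own (indexable) field.
 * - If `legacyMetadataKey` is given, the whole metadata object is also
 *   stored under that key as a JSON string, like the pre-PR layout.
 */
function toRedisHashFields(
  metadata: Metadata,
  schema: Record<string, MetadataFieldType>,
  legacyMetadataKey?: string
): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const key of Object.keys(schema)) {
    if (key in metadata) {
      // Redis stores hash values as strings, so numbers are converted here.
      fields[key] = String(metadata[key]);
    }
  }
  if (legacyMetadataKey !== undefined) {
    fields[legacyMetadataKey] = JSON.stringify(metadata);
  }
  return fields;
}

const metadata: Metadata = { a: "A", b: "B", c: "C" };
const schema: Record<string, MetadataFieldType> = {
  a: "TEXT",
  b: "TEXT",
  c: "TEXT",
};

// Schema fields only: { a: "A", b: "B", c: "C" }
console.log(toRedisHashFields(metadata, schema));
// Schema fields plus the legacy JSON string under "metadata"
console.log(toRedisHashFields(metadata, schema, "metadata"));
```

Making the legacy JSON field opt-in keeps the duplication mentioned above out of new indexes while preserving the old layout for callers that still pass a metadata key.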

Please let me know your thoughts and how we can make it work.

Thank you.
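The second point can be sketched the same way. The hypothetical helper `orFilters` ORs an array of complete RediSearch expressions. Unlike the `'@a:{A} | @b:{B} | @c:{C}'` string above, it wraps each expression in parentheses, an assumption made so that a composite clause such as `@a:{A} @b:{B}` (an implicit AND) stays grouped. It also shows why an old-style array still compiles but changes meaning at runtime.

```typescript
// Hypothetical helper (not part of LangChain.js): ORs an array of complete
// RediSearch filter expressions, as suggested in the comment above.
function orFilters(filters: string[]): string {
  if (filters.length === 0) {
    // "*" matches every document in a RediSearch query.
    return "*";
  }
  // Parenthesize each expression so that a composite clause such as
  // "@a:{A} @b:{B}" (an implicit AND) stays grouped when ORed with others.
  return filters.map((f) => `(${f})`).join(" | ");
}

// New-style array of field expressions:
console.log(orFilters(["@a:{A}", "@b:{B}", "@c:{C}"]));
// (@a:{A}) | (@b:{B}) | (@c:{C})

// An old-style array of bare values still type-checks as string[],
// but no longer targets any particular metadata field:
console.log(orFilters(["a", "b", "c"]));
// (a) | (b) | (c)
```

Bare terms such as `a` are full-text terms matched against the index's TEXT fields rather than clauses on one metadata field, which is the silent change in behavior described in the comment.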

mauriciocirelli (Contributor, Author) commented Apr 12, 2024

Also, I have looked into other VectorStore implementations, such as PgVector.

It also stores the metadata as a JSON object in a single column (metadataColumnName) in the database.

It would have similar limitations, as we would not be able to filter by metadata fields in a reasonable way.

bsbodden

@mauriciocirelli this is a nice PR. With the addition of the RediSearch schema definition you get the full power of the search engine, but that means (as you pointed out above) that assembling the filterString gets a little more complex, since it has to handle all the possible types you can declare in the search index schema for the metadata. At a minimum, you would have to handle the difference in syntax between TAG fields and TEXT fields. See this project for an idea of how to assemble the expressions and handle ANDs and ORs: https://github.com/redis/redis-om-node/tree/main/lib/search
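To illustrate the TAG/TEXT difference mentioned here, the sketch below formats one equality clause per metadata key based on the field type declared in the schema, then joins the clauses with spaces, which RediSearch treats as AND. The function names are hypothetical and this is not how redis-om structures its query builder. The TAG escaping rule (a backslash before punctuation and spaces) follows the RediSearch query syntax documentation, but the exact set of characters that must be escaped may vary by Redis version and query dialect.

```typescript
// Hypothetical sketch (not LangChain.js or redis-om code): formats equality
// clauses according to the RediSearch field type declared in the schema.
type FieldType = "TAG" | "TEXT" | "NUMERIC";

// TAG values escape punctuation and spaces with a backslash.
function escapeTag(value: string): string {
  return value.replace(/[,.<>{}\[\]"':;!@#$%^&*()\-+=~|\/\\ ]/g, "\\$&");
}

function clause(field: string, type: FieldType, value: string | number): string {
  switch (type) {
    case "TAG":
      // Exact tag match: @field:{value}
      return `@${field}:{${escapeTag(String(value))}}`;
    case "TEXT":
      // Full-text match (tokenized, not exact): @field:(value)
      return `@${field}:(${value})`;
    case "NUMERIC":
      // Equality expressed as a closed range: @field:[v v]
      return `@${field}:[${value} ${value}]`;
  }
}

// ANDs one clause per metadata key that the schema declares.
function andClauses(
  metadata: Record<string, string | number>,
  schema: Record<string, FieldType>
): string {
  return Object.entries(metadata)
    .filter(([key]) => key in schema)
    .map(([key, value]) => clause(key, schema[key], value))
    .join(" "); // space-separated clauses are intersected (AND)
}

console.log(andClauses({ source: "blog", year: 2024 }, { source: "TAG", year: "NUMERIC" }));
// @source:{blog} @year:[2024 2024]
```

Even this small example needs three syntaxes and an escaping rule, which is the complexity the comment warns about once arbitrary schemas are allowed.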

For backwards compatibility, off the top of my head: if 'metadataKey' is present, use that as the backwards-compatibility trigger and do what was done before. If metadataSchema is set and a string is passed, just assume it is a RediSearch expression and pass it along. This might be the easiest approach, and it puts the complexity on the query/expression building rather than requiring the framework to provide a query builder. My 2 cents!
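The compatibility rule proposed here could look roughly like the sketch below: a string filter is passed through untouched when `metadataSchema` is set, and an array filter falls back to the legacy single-field behavior when `metadataKey` is set. `buildFilterString`, `FilterOptions`, and especially the legacy `@key:(A|B)` output format are assumptions for illustration, not the current RedisVectorStore code.

```typescript
// Hypothetical sketch of the compatibility rule proposed above; the option
// names mirror the discussion, but this is not the LangChain.js implementation.
interface FilterOptions {
  metadataKey?: string;
  metadataSchema?: Record<string, string>;
}

type Filter = string | string[];

function buildFilterString(filter: Filter | undefined, opts: FilterOptions): string {
  if (filter === undefined || filter.length === 0) {
    return "*"; // no filter: match all documents
  }
  if (opts.metadataSchema !== undefined && typeof filter === "string") {
    // New path: treat the string as a ready-made RediSearch expression.
    return filter;
  }
  if (opts.metadataKey !== undefined && Array.isArray(filter)) {
    // Legacy path. ASSUMED format: OR the values against the single
    // metadata field; the real pre-PR query string may differ.
    return `@${opts.metadataKey}:(${filter.join("|")})`;
  }
  throw new Error("Unsupported filter for the configured metadata options");
}

// Pass-through of a hand-built expression:
console.log(buildFilterString("@a:{A} @b:{B}", { metadataSchema: { a: "TAG", b: "TAG" } }));
// @a:{A} @b:{B}

// Legacy array of values:
console.log(buildFilterString(["A", "B"], { metadataKey: "metadata" }));
// @metadata:(A|B)
```

Throwing on any other combination keeps an old-style array from being silently reinterpreted as raw RediSearch expressions, which addresses the runtime-behavior concern raised earlier in the thread.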

mauriciocirelli (Contributor, Author) commented Apr 16, 2024

Hi @bsbodden

Regarding backwards compatibility, I think that's easy and reasonable.
Regarding the filter expression, I think it should just be a string when a metadataSchema is provided.
Building the expression from an object and its schema is not trivial, and I think it is well beyond the scope of LangChain.

In my opinion, it is not reasonable to expect LangChain to do this for every kind of integration.

There are helper libraries for that, and developers should know how to build a Redis query (or use redis-om, for example) just as they have to know how to build Postgres queries (or use external ORM libraries) or queries for any other third-party VectorStore they want to integrate with LangChain.

That's just my opinion. If you are all okay with that, I can update the PR with these changes.

Development

Successfully merging this pull request may close these issues.

Allow multiple metadata keys on RedisVectorStoreFilterType
3 participants