Contribute the WordLift Vector Store #13028

Open: ziodave wants to merge 28 commits into main

Conversation

ziodave
Contributor

@ziodave ziodave commented Apr 22, 2024

Description

Add support for the WordLift Vector Store.

Fixes #12107

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot (bot) added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Apr 22, 2024

@logan-markewich
Collaborator

@ziodave Seems like the tests aren't working. Have you tried running them locally?

@ziodave ziodave marked this pull request as draft April 24, 2024 05:11
@ziodave
Contributor Author

ziodave commented Apr 24, 2024

> @ziodave Seems like the tests aren't working. Have you tried running them locally?

@logan-markewich yes, sorry. I converted the PR to draft until we fix it. One question: we recreated the ubuntu-latest-unit-tester in our organization's GH Runners so that the GH Actions can run on our fork: https://github.com/wordlift/llama_index/actions/runs/8800952266.

Oddly enough, the tests pass there. Is there a special configuration we need to apply to the GH Runner? This is basically the configuration we used:
[image: runner configuration]

cc @EthanWordlift

@logan-markewich
Collaborator

logan-markewich commented Apr 25, 2024

@ziodave I don't think any special config is needed.

The errors in the tests seem unrelated to the environment, though:

        # Get the key to use for the operation.
>       key = await self.key_provider.for_query(query)
E       TypeError: object str can't be used in 'await' expression
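
For readers following the thread, this kind of error can be reproduced outside the PR with a minimal sketch. Only `for_query` and the awaited call come from the traceback above; the class names `SyncKeyProvider` and `AsyncKeyProvider`, and the helper functions, are hypothetical and not taken from the PR's code. The sketch assumes the failure happens because the awaited method is a plain function returning a `str`, which is one plausible cause but not confirmed by the thread.

```python
import asyncio


class SyncKeyProvider:
    """Hypothetical provider whose method is a plain (non-async) function."""

    def __init__(self, key: str):
        self.key = key

    def for_query(self, query: str) -> str:
        # Returns a str directly, so the caller has nothing awaitable.
        return self.key


class AsyncKeyProvider:
    """Hypothetical provider whose method is a coroutine function."""

    def __init__(self, key: str):
        self.key = key

    async def for_query(self, query: str) -> str:
        return self.key


async def get_key(provider, query: str) -> str:
    # Mirrors the failing line: the result of for_query() is awaited.
    return await provider.for_query(query)


def try_get_key(provider) -> str:
    """Run get_key() and report a TypeError as text instead of raising."""
    try:
        return asyncio.run(get_key(provider, "example query"))
    except TypeError as exc:
        return f"TypeError: {exc}"


sync_result = try_get_key(SyncKeyProvider("secret"))    # awaiting a str fails
async_result = try_get_key(AsyncKeyProvider("secret"))  # awaiting a coroutine works
print(sync_result)
print(async_result)
```

The exact wording of the `TypeError` message varies between Python versions, so the sketch prints it rather than asserting on it.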

@ziodave
Contributor Author

ziodave commented Apr 25, 2024

@logan-markewich we're on it, we'll update the PR soon. Thanks!

@ziodave ziodave marked this pull request as ready for review April 25, 2024 12:11
@ziodave
Contributor Author

ziodave commented Apr 25, 2024

@logan-markewich we're ready for review 🙏

    def __init__(self, key: str):
        self.key = key

    async def for_add(self, nodes: List[BaseNode]) -> str:
Collaborator

Curious why these methods have to be async? Or what this is even for, actually (they all return the same thing 👀)
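
To make the question concrete, here is a hedged sketch of one situation in which async, per-operation methods could matter. It is an assumption for illustration only, not the PR author's stated rationale: `FixedKeyProvider` mimics the "always return the same key" behavior under review, while the hypothetical `LookupKeyProvider` resolves a different key per operation after a simulated asynchronous lookup. The `for_add` and `for_query` method names come from the PR excerpts; everything else is invented for the example.

```python
import asyncio
from typing import Any, Dict, List, Protocol, Tuple


class KeyProvider(Protocol):
    """Structural interface: one async method per operation."""

    async def for_add(self, nodes: List[Any]) -> str: ...

    async def for_query(self, query: Any) -> str: ...


class FixedKeyProvider:
    """Returns the same key for every operation (the case under review)."""

    def __init__(self, key: str):
        self.key = key

    async def for_add(self, nodes: List[Any]) -> str:
        return self.key

    async def for_query(self, query: Any) -> str:
        return self.key


class LookupKeyProvider:
    """Hypothetical: resolves a key per operation via an async lookup."""

    def __init__(self, keys: Dict[str, str]):
        self.keys = keys

    async def _lookup(self, name: str) -> str:
        await asyncio.sleep(0)  # stand-in for network or secret-store I/O
        return self.keys[name]

    async def for_add(self, nodes: List[Any]) -> str:
        return await self._lookup("write")

    async def for_query(self, query: Any) -> str:
        return await self._lookup("read")


async def demo(provider: KeyProvider) -> Tuple[str, str]:
    # Callers await both methods the same way, whichever provider is used.
    return (await provider.for_add([]), await provider.for_query("q"))


fixed_keys = asyncio.run(demo(FixedKeyProvider("k")))
lookup_keys = asyncio.run(demo(LookupKeyProvider({"write": "w", "read": "r"})))
print(fixed_keys, lookup_keys)
```

If no provider ever needs I/O, plain synchronous methods (or a single key attribute) would be simpler, which seems to be the point of the review question.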

Collaborator

So if I'm understanding properly, I can't use just any embedding model with WordLift; it has to be a specific one.

I'm unfamiliar with WordLift, but what's the reason for that restriction?

node_id=node.node_id,
embeddings=node.get_embedding(),
text=node.get_content(metadata_mode=MetadataMode.NONE) or "",
metadata={},
Collaborator

Any reason not to store the node metadata here?

for node in nodes:
    node_dict = node.dict()
    metadata: Dict[str, Any] = node_dict.get("metadata", {})
    entity_id = metadata.get("entity_id", None)
Collaborator

Is an entity ID required? What happens if it's not provided?

(I really should go read more about WordLift haha)
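
One way to make the missing-ID case explicit is sketched below. This is a hypothetical illustration, not the PR's behavior: the thread does not say what happens when `entity_id` is absent. Plain dictionaries stand in for the output of `node.dict()`, and the helper names `extract_entity_id` and `split_by_entity_id` are invented for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

NodeDict = Dict[str, Any]


def extract_entity_id(node_dict: NodeDict) -> Optional[str]:
    """Return metadata['entity_id'] if present, else None."""
    # 'or {}' also covers a metadata value that is explicitly None.
    metadata = node_dict.get("metadata") or {}
    return metadata.get("entity_id")


def split_by_entity_id(
    node_dicts: List[NodeDict],
) -> Tuple[List[NodeDict], List[NodeDict]]:
    """Separate nodes that carry an entity_id from those that do not."""
    with_id: List[NodeDict] = []
    without_id: List[NodeDict] = []
    for node_dict in node_dicts:
        if extract_entity_id(node_dict) is not None:
            with_id.append(node_dict)
        else:
            without_id.append(node_dict)
    return with_id, without_id


# Stand-ins for node.dict() output; the entity ID value is made up.
sample = [
    {"id_": "a", "metadata": {"entity_id": "https://example.org/entity/1"}},
    {"id_": "b", "metadata": {}},
    {"id_": "c"},
]
with_id, without_id = split_by_entity_id(sample)
print([n["id_"] for n in with_id], [n["id_"] for n in without_id])
```

Separating the two groups up front would let the store decide deliberately whether to reject, skip, or assign an ID to nodes that lack one, rather than failing later in the request.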

from manager_client.rest import RESTResponseType


class VectorSearchQueriesApi:
Collaborator

Any plans to publish this as its own package, like a client SDK? It's fine if it lives here for now, though.

Collaborator

@logan-markewich logan-markewich left a comment

This looks mostly good to me, just worried about a few UX things that might trip up users. I wonder if we can smooth out some of this or not?

@ziodave
Contributor Author

ziodave commented Apr 27, 2024

> This looks mostly good to me, just worried about a few UX things that might trip up users. I wonder if we can smooth out some of this or not?

Hello @logan-markewich, yes, we'll review your comments and be back soon with updates and answers.

Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: add support for the WordLift VectorStore
4 participants