Embeddings support in the SDK #1475

Merged
merged 3 commits into from
May 22, 2024

@levkk levkk commented May 22, 2024

SDK

  1. Added pgml.embed() support to Builtins, in both single and batch modes.

@levkk levkk requested a review from SilasMarvin May 22, 2024 16:17
.map(|v| {
    v.as_str()
        .with_context(|| "only text embeddings are supported")
        .unwrap()
Contributor

You can collect into a Result<Vec<_>> and use ? on it, instead of panicking when a value is not a str.

Contributor

Something like

let texts = texts
            .0
            .as_array()
            .with_context(|| "embed_batch takes an array of texts")?
            .iter()
            .map(|v| {
                v.as_str()
                    .with_context(|| "only text embeddings are supported")
                    .map(|s| s.to_string())
            })
            .collect::<anyhow::Result<Vec<String>>>()?;

I think will work.

Contributor Author

I don't think we want that. The API accepts an array of strings, so if someone passes in an array of objects that can be cast to a string, they will get embeddings of integers, or whatever else they passed in, which will make them think that everything is fine.

Contributor

We are still calling as_str here so if it was an object, it wouldn't be treated as a string. I was merely suggesting using Result::Err instead of unwrap. But all is good.
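To make the point above concrete, here is a minimal sketch using only the standard library. The `Value` enum, `EmbedError`, and `texts_from_values` are hypothetical stand-ins (not the SDK's actual types); `Value::as_str` is written to mirror the behavior of `serde_json::Value::as_str`, which returns `Some` only for real strings and never casts other types. Collecting an iterator of `Result`s into a `Result<Vec<_>, _>` stops at the first error, so a non-string input becomes an `Err` for the caller rather than a panic or a silently converted value.

```rust
/// Hypothetical stand-in for a JSON value (e.g. serde_json::Value),
/// so this sketch compiles without external crates.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Number(f64),
}

impl Value {
    /// Returns Some only for actual strings; numbers are NOT cast to text.
    fn as_str(&self) -> Option<&str> {
        match self {
            Value::String(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Hypothetical error type, used here in place of anyhow::Error.
#[derive(Debug, PartialEq)]
struct EmbedError(&'static str);

/// Converts every value to an owned String, or returns the first error.
fn texts_from_values(values: &[Value]) -> Result<Vec<String>, EmbedError> {
    values
        .iter()
        .map(|v| {
            v.as_str()
                .map(|s| s.to_string())
                .ok_or(EmbedError("only text embeddings are supported"))
        })
        // Collecting into Result<Vec<_>, _> short-circuits on the first Err.
        .collect()
}
```

With this shape, an input of all strings yields `Ok` with the texts, while an input containing a `Value::Number` yields `Err(EmbedError(..))`, so integers are rejected rather than embedded.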

Contributor

SilasMarvin commented May 22, 2024

A test for both in Rust, Python, and JavaScript would be nice

Contributor Author

levkk commented May 22, 2024

> A test for both in Rust, Python, and JavaScript would be nice

Do these tests run in CI, or only locally? They require a PostgresML installation to be running somewhere, right?

Contributor Author

levkk commented May 22, 2024

I can't seem to find instructions for setting up the JS environment for testing this. I did the Python one from memory, but it would be nice to have a short README for developers. I'm going to skip the JS tests for now and come back to them when I have a moment.

@levkk levkk merged commit 03ca54e into master May 22, 2024
1 check passed
@levkk levkk deleted the levkk-pgml-embed-sdk branch May 22, 2024 17:16