feat: http(s) support for file readers #754

Draft
wants to merge 2 commits into
base: main

Conversation

himself65
Member

Fixes: #495

changeset-bot bot commented Apr 22, 2024

⚠️ No Changeset found

Latest commit: eac554a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel bot commented Apr 22, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| llama-index-ts-docs | ✅ Ready | Apr 24, 2024 8:24pm |

fs: GenericFileSystem = defaultFS,
): Promise<Document[]> {
const dataBuffer = await fs.readRawFile(file);
const blob = new Blob([dataBuffer]);
- return [new ImageDocument({ image: blob, id_: file })];
+ return [new ImageDocument({ image: blob, id_: `${file}` })];
Collaborator

If file is a URL, you can also use
[new ImageDocument({ image: file, id_: `${file}` })];
because image can be a URL.
(I guess this only makes sense for URLs with an http(s):// prefix, as we might send the image's URL to an LLM.)

Member Author

The file system can also read URLs if they use the file: scheme, like

fs.readFile(new URL('file:/path/to/file'))

So it could be an error to pass a file: URL or a non-public URL to an LLM.
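
To illustrate the point about the file: scheme, here is a minimal sketch using Node's built-in fs module (not the GenericFileSystem from this PR). The helper name readThroughFileUrl and the temp-file layout are assumptions made up for the example.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical helper: writes a temp file, then reads it back via a file: URL.
function readThroughFileUrl(contents: string): { protocol: string; text: string } {
  const dir = mkdtempSync(join(tmpdir(), "file-url-demo-"));
  const filePath = join(dir, "sample.txt");
  writeFileSync(filePath, contents);

  // Node's fs functions accept URL objects, but only with the file: protocol.
  const fileUrl = pathToFileURL(filePath);
  const text = readFileSync(fileUrl, "utf8");

  return { protocol: fileUrl.protocol, text };
}

const result = readThroughFileUrl("hello");
console.log(result.protocol, result.text);
```

A URL object is therefore not proof that the resource is reachable from outside the local machine, which is why forwarding it to an LLM as-is can fail.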

Collaborator

Yes, you're right. I forgot that an http(s):// URL can still be non-public, and we can't find out whether it is public or not, so it's better to use Blob as a general case.
It might be worth adding a dedicated function for reading a public URL, though.
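
One possible shape for the entry check of such a dedicated function, as a rough sketch only: the name assertHttpUrl is hypothetical and not part of this PR, and the check covers the scheme only. As noted above, it cannot tell whether the URL is actually public; that stays the caller's responsibility.

```typescript
// Hypothetical guard: accepts only http(s) URLs and returns a parsed URL object.
// It does NOT verify that the URL is publicly reachable.
function assertHttpUrl(input: string | URL): URL {
  // new URL() throws a TypeError for strings that are not absolute URLs.
  const url = typeof input === "string" ? new URL(input) : input;
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new TypeError(`Expected an http(s) URL, got protocol "${url.protocol}"`);
  }
  return url;
}

console.log(assertHttpUrl("https://example.com/cat.png").hostname);
```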

Member Author

so there are many cases:

  1. string
    1.1 base64 ✅
    1.2 http(s) ✅
    1.3 others
  2. URL
    2.1 image URL
      2.1.1 non-public
      2.1.2 public ✅
    2.2 non-image URL
  3. blob
    3.1 image blob (png/jpg...) ✅
    3.2 non-image blob (docx, pdf...)
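
The cases above could be told apart roughly as in the following sketch. The names ImageInputKind and classifyImageInput are hypothetical, and base64 strings are assumed to arrive as data: URLs, which the thread does not confirm. Public versus non-public, and image versus non-image URLs, cannot be decided without fetching, so the sketch only looks at the scheme.

```typescript
// Hypothetical labels for the input cases listed above.
type ImageInputKind =
  | "base64-string"
  | "http-string"
  | "other-string"
  | "http-url"
  | "other-url"
  | "image-blob"
  | "other-blob";

function classifyImageInput(input: string | URL | Blob): ImageInputKind {
  if (typeof input === "string") {
    // Assumption: base64 images are passed as data: URLs.
    if (/^data:image\/[\w.+-]+;base64,/.test(input)) return "base64-string";
    if (/^https?:\/\//i.test(input)) return "http-string";
    return "other-string";
  }
  if (input instanceof URL) {
    // Only the scheme is checked; reachability is unknown at this point.
    const isHttp = input.protocol === "http:" || input.protocol === "https:";
    return isHttp ? "http-url" : "other-url";
  }
  // Blob.type holds the MIME type given at construction, or "" if none was set.
  return input.type.startsWith("image/") ? "image-blob" : "other-blob";
}

console.log(classifyImageInput(new Blob(["x"], { type: "image/png" })));
```

A Blob created without a type, as in the diff above (new Blob([dataBuffer])), would land in "other-blob" under this sketch, so a real implementation would need another way to detect the image format.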

2 participants