Add ingest folders with symlink #748

Open
wants to merge 1 commit into
base: main
from

spaceymonk

Hi, I wanted to propose a simple change to the ingest script.

Reason behind it: I store my documents in a separate storage, and instead of copying the files I symlink them into the SOURCE_DOCUMENTS folder. This works when I link documents one by one (i.e. giving the full path for each file). But my documents are in nested folders, and linking whole directories does not work because the followlinks flag of os.walk defaults to False (see the Python documentation).

The PR solves this problem.
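A minimal sketch of the behaviour the PR relies on, assuming the ingest script collects files with os.walk (followlinks is a parameter of os.walk). collect_files and make_demo_tree are hypothetical helpers for illustration, not the PR's code, and the demo assumes a POSIX system where os.symlink needs no extra privileges.

```python
import os
import tempfile


def collect_files(source_dir: str, follow_symlinks: bool) -> list[str]:
    """Return the paths (relative to source_dir) of every file os.walk reaches."""
    found = []
    # With followlinks=False (the os.walk default) a symlinked directory is
    # listed in `dirs` but never descended into; followlinks=True walks it.
    for root, _dirs, files in os.walk(source_dir, followlinks=follow_symlinks):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), source_dir))
    return sorted(found)


def make_demo_tree(base: str) -> str:
    """Hypothetical layout: real documents live in base/storage/nested, and
    SOURCE_DOCUMENTS holds only a directory symlink to base/storage.
    Returns the SOURCE_DOCUMENTS path."""
    nested = os.path.join(base, "storage", "nested")
    os.makedirs(nested)
    with open(os.path.join(nested, "report.txt"), "w") as f:
        f.write("hello")
    source_documents = os.path.join(base, "SOURCE_DOCUMENTS")
    os.makedirs(source_documents)
    # Same effect as running `ln -s <base>/storage Documents` inside SOURCE_DOCUMENTS.
    os.symlink(os.path.join(base, "storage"),
               os.path.join(source_documents, "Documents"),
               target_is_directory=True)
    return source_documents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = make_demo_tree(tmp)
        print(collect_files(src, follow_symlinks=False))  # → []
        print(collect_files(src, follow_symlinks=True))   # → ['Documents/nested/report.txt']
```

With the default flag the symlinked folder is silently skipped; passing followlinks=True is the one-argument change that makes nested, symlinked directories visible to the ingest step.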

@NitkarshChourasia left a comment

Perfect.
But have you tested it?

@NitkarshChourasia

What do you mean by separate storage?

Could you be more specific?

@NitkarshChourasia

Do you mean a different hard disk/SSD, or something else?

@NitkarshChourasia

@spaceymonk I'm just asking for clarification.

@spaceymonk (Author)

  • I've tested it, of course.
  • By storage I meant another directory in the filesystem that I can symlink into the SOURCE_DOCUMENTS folder. For example, I store my documents under ~/Documents, and if I want to run localGPT on them, I have to copy or move all the files into SOURCE_DOCUMENTS.

With this change, I just run ln -s ~/Documents/ Documents inside SOURCE_DOCUMENTS, and the script automatically detects and ingests all the files under ~/Documents/.

One problem might occur, as the documentation states: if you create a loop in your path, i.e. symlink a parent directory inside one of its child directories, the walk may run into an infinite loop, because os.walk does not keep track of the directories it has already visited.
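If loops are a concern, one common mitigation (not part of this PR) is to remember each directory's resolved path and prune any directory that was already visited. A minimal sketch, assuming the same os.walk-based traversal; walk_files_no_loops and make_loop_tree are hypothetical names, and the demo assumes a POSIX system.

```python
import os
import tempfile


def walk_files_no_loops(source_dir: str) -> list[str]:
    """Walk like os.walk(followlinks=True), but prune any directory whose
    resolved path was already visited, so symlink loops terminate."""
    found = []
    visited = set()
    for root, dirs, files in os.walk(source_dir, followlinks=True):
        real_root = os.path.realpath(root)
        if real_root in visited:
            # Reached an already-walked directory through a symlink:
            # clear `dirs` in place so os.walk does not descend further.
            dirs[:] = []
            continue
        visited.add(real_root)
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), source_dir))
    return sorted(found)


def make_loop_tree(base: str) -> str:
    """Hypothetical tree with a loop: docs/child/up is a symlink back to docs."""
    docs = os.path.join(base, "docs")
    child = os.path.join(docs, "child")
    os.makedirs(child)
    with open(os.path.join(docs, "a.txt"), "w") as f:
        f.write("a")
    os.symlink(docs, os.path.join(child, "up"), target_is_directory=True)
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(walk_files_no_loops(make_loop_tree(tmp)))  # → ['a.txt']
```

Pruning works because os.walk (top-down by default) only descends into the names left in `dirs`. A side effect of this design is that a directory reachable through several symlinks is ingested only once, which avoids duplicate documents.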

@NitkarshChourasia

NitkarshChourasia commented Apr 6, 2024 via email

Labels: None yet
Projects: None yet
Development: no linked issues
2 participants