
Add Support for text files with other extensions .org (org mode) or .md (markdown) #1415

Open
nausher opened this issue May 2, 2024 · 3 comments

Comments


nausher commented May 2, 2024

I have quite a few notes created in Emacs Org-mode or Obsidian. They are plain-text files in Org-mode or Markdown format, saved with a .org or .md extension instead of .txt.

I uploaded these files to Danswer and they were reported as 'indexed', but none of my search queries return any information from them.

Can support be added for text files with a non-'.txt' extension?


nausher commented May 2, 2024

I believe the change for this could be as simple as adding ".org" to this line in backend/danswer/connectors/file/utils.py:
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx"]
changed to:
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx", ".org"]
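
To illustrate what the extra entry would change, here is a minimal sketch of an allow-list check. The thread does not show how Danswer actually consumes `_VALID_FILE_EXTENSIONS`, so the helper `has_valid_extension` and the case-insensitive comparison are assumptions made for illustration only.

```python
from pathlib import Path

# The list as proposed in this comment; ".org" is the only addition.
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx", ".org"]


def has_valid_extension(file_name: str) -> bool:
    """Hypothetical helper: True if the file's suffix is in the allow-list."""
    # Path.suffix gives the final extension including the dot, e.g. ".org".
    # Lowercasing is an assumption, so that "todo.ORG" is treated like "todo.org".
    return Path(file_name).suffix.lower() in _VALID_FILE_EXTENSIONS


if __name__ == "__main__":
    for name in ["notes.org", "README.md", "todo.ORG", "archive.tar.gz"]:
        print(name, has_valid_extension(name))
```

With a check of this shape, a `.org` file is rejected by the original list and accepted once ".org" is appended, which would match the behaviour described in this issue if the real filter works similarly.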


zarlor commented May 7, 2024

Hmm... I have ingested .md files without issue. I think it might not read them as formatted files, mind you, but it does seem to accept them and they are searchable for me as text files, at least.


nausher commented May 8, 2024

@zarlor - the issue now seems to be limited to ".org" files. The code has a filter that accepts files with the extensions ".md" and ".mdx".
