Merge pull request #701 from GSK-FS/issue-47-ingest-html-documents
HTML ingest
PromtEngineer committed Jan 6, 2024
2 parents 0b4fa33 + 7bc14c8 commit e8d294e
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions constants.py
@@ -6,6 +6,7 @@
# https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/excel.html?highlight=xlsx#microsoft-excel
from langchain.document_loaders import CSVLoader, PDFMinerLoader, TextLoader, UnstructuredExcelLoader, Docx2txtLoader
from langchain.document_loaders import UnstructuredFileLoader, UnstructuredMarkdownLoader
from langchain.document_loaders import UnstructuredHTMLLoader


# load_dotenv()
@@ -43,6 +44,7 @@

# https://python.langchain.com/en/latest/_modules/langchain/document_loaders/excel.html#UnstructuredExcelLoader
DOCUMENT_MAP = {
".html": UnstructuredHTMLLoader,
".txt": TextLoader,
".md": UnstructuredMarkdownLoader,
".py": TextLoader,
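
The diff only shows the new `.html` entry being registered in `DOCUMENT_MAP`; it does not show how the map is consumed. As a minimal sketch, assuming the ingest code picks a loader class by file extension, the lookup might resemble the following. The stub classes and the `pick_loader` helper are hypothetical stand-ins so the example runs without LangChain installed, and lowercasing the extension is an assumption of this sketch rather than something the commit shows.

```python
import os


# Hypothetical stand-ins for the LangChain loader classes named in the diff,
# so that this sketch runs without LangChain installed.
class StubTextLoader:
    def __init__(self, file_path):
        self.file_path = file_path


class StubHTMLLoader:
    def __init__(self, file_path):
        self.file_path = file_path


# Same shape as DOCUMENT_MAP in constants.py: file extension -> loader class.
DOCUMENT_MAP = {
    ".html": StubHTMLLoader,
    ".txt": StubTextLoader,
    ".py": StubTextLoader,
}


def pick_loader(path, document_map=DOCUMENT_MAP):
    """Return a loader instance for `path`, or None if the extension is unmapped.

    Hypothetical helper: the commit does not show the real lookup code.
    """
    # Assumption: extensions are matched case-insensitively.
    extension = os.path.splitext(path)[1].lower()
    loader_class = document_map.get(extension)
    if loader_class is None:
        return None
    return loader_class(path)
```

With a lookup of this shape, supporting a new file type only requires one import and one dictionary entry, which is consistent with the two added lines in this commit.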
