I want to create a corpus of academic texts on a subject, usable for research and citation. The citations must include the book title plus enough drill-down information to identify the source, e.g., "[book name], Section 2, Chapter 3." I think the way to do that is with a sequence of document uploads, each with metadata that includes variables for book, section, and chapter.
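The shape of that metadata scheme can be sketched in plain Python (the dict layout and helper names here are my own illustration, not any particular tool's API):

```python
# Illustrative sketch: pair each document's text with book/section/chapter
# metadata so a retrieved chunk can always be turned back into a citation.
# make_document and format_citation are hypothetical helper names.

def make_document(text, book, section, chapter):
    """Bundle the text with the citation metadata the retriever should return."""
    return {
        "page_content": text,
        "metadata": {"book": book, "section": section, "chapter": chapter},
    }

def format_citation(doc):
    """Render metadata in the '[book name], Section X, Chapter Y' style."""
    m = doc["metadata"]
    return f'{m["book"]}, Section {m["section"]}, Chapter {m["chapter"]}'

doc = make_document("Text of the passage...", "Example Book", 2, 3)
print(format_citation(doc))  # Example Book, Section 2, Chapter 3
```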
Questions:
- Can I use one document-loader node and invoke it sequentially, manually adjusting the metadata on each pass?
- Do I have to use a separate loader node for each document in a single upsert?
- Is there a way to use the folder upload instead, with each document in the folder getting unique metadata values? How would I assign the metadata?
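One generic way to give each file in a folder its own metadata is a sidecar manifest that maps filenames to metadata, applied in a single loading pass. This is a sketch of the idea, not a built-in feature of any loader node; the manifest shape and `load_folder` helper are assumptions:

```python
# Hypothetical sketch: a manifest maps each filename in the upload folder
# to its own book/section/chapter metadata, so one folder pass still
# assigns unique metadata per document.
import pathlib

manifest = {
    "ch03.txt": {"book": "Example Book", "section": 2, "chapter": 3},
    "ch04.txt": {"book": "Example Book", "section": 2, "chapter": 4},
}

def load_folder(folder, manifest):
    """Read every file named in the manifest and attach its metadata."""
    docs = []
    for name, meta in manifest.items():
        path = pathlib.Path(folder) / name
        docs.append({"page_content": path.read_text(), "metadata": meta})
    return docs
```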
I've also considered creating a private website in which each page is its own document, with the page title serving as the metadata. I could then use a web scraper node to load the entire site at once. Since I've never used a web scraper node, I have a few questions:
- Is the page title associated with the page content, such that I could query the model about a source and it would return the page title as metadata?
- If so, would that give me one concatenated metadata string rather than the three separate values (book, section, chapter)?
- Is there an alternative way to embed the metadata within the page content itself?
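One scraper-agnostic option is to embed the citation as a conventional header line at the top of each page's text, so it survives even if the scraper keeps only the body content. The `Source:` convention below is my own assumption:

```python
# Hypothetical sketch: prepend a parseable citation line to each page's
# text, then recover the three separate values from it at query time.

HEADER = "Source: {book} | Section {section} | Chapter {chapter}\n\n"

def with_embedded_citation(text, book, section, chapter):
    """Prefix the page text with a citation header line."""
    return HEADER.format(book=book, section=section, chapter=chapter) + text

def parse_citation(page):
    """Recover book/section/chapter from the first line of a page."""
    first_line = page.splitlines()[0]
    book, section, chapter = first_line[len("Source: "):].split(" | ")
    return {
        "book": book,
        "section": int(section.split()[-1]),
        "chapter": int(chapter.split()[-1]),
    }
```

This keeps the three values distinct rather than collapsing them into a single title string.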
Lastly, is there a better strategy I haven't thought of?
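One further strategy worth comparing: encode book, section, and chapter in each filename and derive the metadata at load time, which avoids both per-upload manual edits and a separate manifest. The naming convention `Book__s<section>__c<chapter>.txt` here is purely my own assumption:

```python
# Hypothetical sketch: derive citation metadata from a filename that
# follows the convention "Book__s<section>__c<chapter>.txt".
import re

PATTERN = re.compile(r"(?P<book>.+)__s(?P<section>\d+)__c(?P<chapter>\d+)\.txt$")

def metadata_from_filename(name):
    """Parse book/section/chapter out of a conventionally named file."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"filename does not follow the convention: {name}")
    return {
        "book": m.group("book").replace("_", " "),
        "section": int(m.group("section")),
        "chapter": int(m.group("chapter")),
    }

print(metadata_from_filename("Example_Book__s2__c3.txt"))
# {'book': 'Example Book', 'section': 2, 'chapter': 3}
```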