I want to create a corpus of academic texts on a subject, usable for research and citation. The citations must include the book title plus enough drill-down information to identify the source, e.g., "[book name], Section 2, Chapter 3." I think the way to do that is with a sequence of document uploads, each with metadata that includes variables for book, section, and chapter.
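The shape of that metadata scheme can be sketched in plain Python (the dict layout and helper names here are my own illustration, not any particular tool's API):

```python
# Illustrative sketch: pair each document's text with book/section/chapter
# metadata so a retrieved chunk can always be turned back into a citation.
# make_document and format_citation are hypothetical helper names.

def make_document(text, book, section, chapter):
    """Bundle the text with the citation metadata the retriever should return."""
    return {
        "page_content": text,
        "metadata": {"book": book, "section": section, "chapter": chapter},
    }

def format_citation(doc):
    """Render metadata in the '[book name], Section X, Chapter Y' style."""
    m = doc["metadata"]
    return f'{m["book"]}, Section {m["section"]}, Chapter {m["chapter"]}'

doc = make_document("Text of the passage...", "Example Book", 2, 3)
print(format_citation(doc))  # Example Book, Section 2, Chapter 3
```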
Questions:
- Can I use one document-loader node and invoke it sequentially, manually adjusting the metadata on each pass?
- Do I have to use a separate loader node for each document in a single upsert?
- Is there a way to use the folder upload instead, with each document in the folder getting unique metadata values? How would I assign the metadata?
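One generic way to give each file in a folder its own metadata is a sidecar manifest that maps filenames to metadata, applied in a single loading pass. This is a sketch of the idea, not a built-in feature of any loader node; the manifest shape and `load_folder` helper are assumptions:

```python
# Hypothetical sketch: a manifest maps each filename in the upload folder
# to its own book/section/chapter metadata, so one folder pass still
# assigns unique metadata per document.
import pathlib

manifest = {
    "ch03.txt": {"book": "Example Book", "section": 2, "chapter": 3},
    "ch04.txt": {"book": "Example Book", "section": 2, "chapter": 4},
}

def load_folder(folder, manifest):
    """Read every file named in the manifest and attach its metadata."""
    docs = []
    for name, meta in manifest.items():
        path = pathlib.Path(folder) / name
        docs.append({"page_content": path.read_text(), "metadata": meta})
    return docs
```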
I've also considered creating a private website in which each page is its own document, with the page title serving as the metadata. I could then use a web scraper node to load the entire site at once. Since I've never used a web scraper node, I have a few questions:
- Is the page title associated with the page content, such that I could query the model about a source and it would return the page title as metadata?
- If so, would that give me one concatenated metadata string rather than the three separate values (book, section, chapter)?
- Is there an alternative way to embed the metadata within the page content itself?
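One scraper-agnostic option is to embed the citation as a conventional header line at the top of each page's text, so it survives even if the scraper keeps only the body content. The `Source:` convention below is my own assumption:

```python
# Hypothetical sketch: prepend a parseable citation line to each page's
# text, then recover the three separate values from it at query time.

HEADER = "Source: {book} | Section {section} | Chapter {chapter}\n\n"

def with_embedded_citation(text, book, section, chapter):
    """Prefix the page text with a citation header line."""
    return HEADER.format(book=book, section=section, chapter=chapter) + text

def parse_citation(page):
    """Recover book/section/chapter from the first line of a page."""
    first_line = page.splitlines()[0]
    book, section, chapter = first_line[len("Source: "):].split(" | ")
    return {
        "book": book,
        "section": int(section.split()[-1]),
        "chapter": int(chapter.split()[-1]),
    }
```

This keeps the three values distinct rather than collapsing them into a single title string.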
Lastly, is there a better strategy I haven't thought of?
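One further strategy worth comparing: encode book, section, and chapter in each filename and derive the metadata at load time, which avoids both per-upload manual edits and a separate manifest. The naming convention `Book__s<section>__c<chapter>.txt` here is purely my own assumption:

```python
# Hypothetical sketch: derive citation metadata from a filename that
# follows the convention "Book__s<section>__c<chapter>.txt".
import re

PATTERN = re.compile(r"(?P<book>.+)__s(?P<section>\d+)__c(?P<chapter>\d+)\.txt$")

def metadata_from_filename(name):
    """Parse book/section/chapter out of a conventionally named file."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"filename does not follow the convention: {name}")
    return {
        "book": m.group("book").replace("_", " "),
        "section": int(m.group("section")),
        "chapter": int(m.group("chapter")),
    }

print(metadata_from_filename("Example_Book__s2__c3.txt"))
# {'book': 'Example Book', 'section': 2, 'chapter': 3}
```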