Curate scraped HTML for large language models to build more robust generative AI applications. Convert HTML to Markdown using regex and BeautifulSoup4, then filter out irrelevant content with Jina Embeddings.
html
markdown
converter
asynchronous
regex
embeddings
curation
similarity-score
similarity-threshold
jina
ai-assistant
llm
retrieval-augmented-generation
custom-gpt
context-curation
context-converter
Updated Jan 27, 2024 · Python
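The filtering step (topics `similarity-score` and `similarity-threshold`) can be sketched as follows: each Markdown chunk is compared to a query embedding with cosine similarity, and only chunks scoring at or above a threshold are kept. This is a minimal sketch under stated assumptions, not the repository's code. The vectors are hand-written toy stand-ins for real Jina Embeddings output, and the helper names (`cosine_similarity`, `filter_by_similarity`) and the 0.75 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.75,  # hypothetical cut-off; tune per dataset
) -> List[Tuple[str, float]]:
    """Return (text, score) for chunks whose similarity meets the threshold, best first.

    Hypothetical helper: `chunks` holds (markdown_text, embedding) pairs, where the
    embeddings would come from an embedding model such as Jina Embeddings.
    """
    kept = []
    for text, vec in chunks:
        score = cosine_similarity(query_vec, vec)
        if score >= threshold:
            kept.append((text, score))
    # Most relevant chunks first, so the LLM context starts with the best material.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real embeddings.
    query = [1.0, 0.0, 0.0]
    chunks = [
        ("## Installation\nRun the installer.", [0.9, 0.1, 0.0]),
        ("Cookie banner: accept all", [0.0, 1.0, 0.0]),
        ("## Usage\nCall the converter.", [0.7, 0.0, 0.7]),
    ]
    for text, score in filter_by_similarity(query, chunks):
        print(f"{score:.3f}  {text.splitlines()[0]}")
```

With these toy vectors only the "Installation" chunk passes (about 0.994); the "Usage" chunk scores about 0.707 and is dropped at 0.75, and the cookie banner scores 0. Lowering the threshold keeps more context at the cost of more noise, which is the trade-off the threshold controls.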