🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Updated Jun 14, 2024 - TypeScript
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Convert HTML to Markdown and Markdown to HTML
Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
Clients to use with the hosted spider service - spider.cloud
CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
Transform your HTML into clean, easy-to-read markdown with html2md.
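A lightweight, powerful one-click HTML-to-Markdown tool open-sourced by the helloworld developer community. It converts articles from multiple platforms in one click and saves them locally.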
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
Script that fetches page content from a URL and turns it into Markdown
🛏 An HTML to Markdown converter written in JavaScript
Website scraper that extracts text, converts it to Markdown (.md), and organizes the output into a directory structure
A CLI tool that converts exported Medium posts (html) to Jekyll/Hugo compatible markdown with front matter.
A simple Swift package that converts HTML into Markdown
Table/List to Markdown - A simple GM userscript to extract tables and lists from any website and save them as Markdown.
reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages on the CLI.
Article title, authors, date and body extraction dataset.