# html2text

Here are 27 public repositories matching this topic...

AbdellatifCHE / Collect_Store_Search

The goal is to build a solution that crawls a news website (The Guardian) for articles, cleans the responses, stores them in a hosted MongoDB database (MongoDB Atlas), and then makes them searchable via an API.

python mongodb pymongo nltk scrapy html2text lemmatization

Updated Mar 3, 2020
Python

MattJeanLouis / scrap_web

A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract the content of every page linked from a given URL, convert it to Markdown, and display it. It provides an interactive summary of the visited URLs and shows the extracted content in an easy-to-read format.

markdown open-source interactive python3 web-application web-scraping data-extraction html2text beautifulsoup4 streamlit

Updated May 23, 2023
Python

cycloidio / docker-image-html2text

Dockerized html2text command-line tool

docker tool html2text

Updated Mar 18, 2019
Makefile

cycloidio / docker-image-python-html2text

Dockerized Python html2text command-line tool

html docker tool text html2text

Updated Mar 15, 2019
Makefile

sophiaken / Web-Scraping-Project-Python

Scraped the web with an automated Python script that extracted content from Wikipedia pages and built a clean dataset from it.

pandas-dataframe python3 html2text beautifulsoup4 scrapper-script

Updated Jun 19, 2020
Python

afeiship / next-html2text

Strip HTML to plain text for next.

html text strip html2text

Updated Mar 5, 2021
JavaScript

puhoy / readability_cli

A CLI tool that fetches a webpage's main content and prints it as Markdown.

markdown html-to-markdown python3 readability html2text readability-lxml readability-cli fetch-webpages

Updated Oct 31, 2020
Python

erayon / PubMed

This project builds a robust classifier that determines, from a document's abstract, whether the document belongs to the cancer class.

html xml sklearn nltk xgboost beautifulsoup html2text svm-classifier

Updated Nov 7, 2017
HTML

LukaszNiewinski / Microservice-for-retrieving-img-and-text

Microservice for collecting text and images for data science purposes.

python api docker flask service docker-compose scrapy html2text

Updated Nov 22, 2022
Python

rubix1138 / html2text

html2text Search Command for Splunk

python splunk html2text splunk-enterprise splunk-application splunk-searches

Updated Mar 4, 2019
Python

hcq0618 / html-files-to-markdown-files

Batch-convert HTML files to Markdown files.

html html2text mardown

Updated May 17, 2019
Python

masroore / php-html2text

A PHP package to convert HTML into a plain text format

html html-parser html2text

Updated Jun 13, 2022
PHP

BrenoFariasdaSilva / Python

My Python code.

python adb python3 pip shellscript html2text pip3 dagster pydriller ppadb

Updated May 3, 2024
Python

gsdefender / packtpub_telegram_bot

Receive Packt Publishing Ltd. Free Learning updates in Telegram every day

telegram telegram-bot selenium packtpub html2text selenium-python

Updated May 16, 2020
Python

importcjj / go-readability

Go package that cleans an HTML page for better readability.

go html golang text extractor text-extraction readability html2text html-extractor

Updated Aug 1, 2023
HTML

gereoffy / deepspam2

DeepSpam milter v2

nlp email-parsing spam-filtering html2text spam-detection neural

Updated Feb 17, 2024
Python

susilthapa / knowledge-retrieval-with-imgs

AI chat app that responds with data in Markdown format, including text and images. Tutorial from: https://youtu.be/qKtM2AlDTs8

python html2text beautifulsoup4 opeanai streamlit langchain llama-index

Updated Aug 20, 2023
Python

AndyTheFactory / article-extraction-dataset

Article title, authors, date and body extraction dataset.

text-mining news html-to-markdown scraping corpus news-aggregator text-extraction dataset web-scraping readability datasets scraping-websites html2text news-crawler corpus-builder corpus-tools article-extractor text-cleaning text-preprocessing

Updated Mar 26, 2024
HTML

x28 / inscriptis-java

inscriptis - HTML to text conversion library for Java

java converter library html2text

Updated Aug 4, 2022
Java

pH-7 / Html2Text

A very simple (but efficient) "HTML to plain text" converter ✍️

php converter php7 text plain-text html2text convertor text-converter email-text-parsing htmltotext symfony-mailer text-convertor

Updated Jun 11, 2023
PHP
