text-extraction

yasminsarkhosh / machine-learning-bsc-thesis-2024

This GitHub repository hosts the notebooks and tools developed as part of this thesis to automate the extraction, processing, and analysis of data from the MICCAI 2023 conference, aiding in the systematic review and providing a structured foundation for further research in this crucial area.

data-science machine-learning data-visualization text-extraction artificial-intelligence healthcare medical-imaging data-analysis datasets annotation-framework data-quality demographic-analysis medical-image-processing miccai pdf-data-extraction medical-ai healthcare-ai miccai2023 medical-ai-project

Updated May 15, 2024
Jupyter Notebook

TYPO3-Solr / ext-tika

A TYPO3 CMS extension that provides Apache Tika functionality

search php metadata cms cms-extension tika language-detection typo3 typo3-cms-extension file-indexing text-extraction

Updated May 16, 2024
PHP

edhou20 / Medical-Texts-NLP-Clustering-

nlp clustering text-extraction dimensionality-reduction vectorization unsupervised-learning

Updated May 13, 2024
Python

real0x0a1 / ocr-opencv

OCR with Tesseract and OpenCV: Extract text from images effortlessly. Preprocess with OpenCV for accuracy. Display results and save output. Easy integration for document digitization and data entry automation.

python opencv machine-learning automation ocr image-processing tesseract text-extraction document-digitization data-entry-automation

Updated May 13, 2024
Python

ICIJ / datashare

A self-hosted search engine for documents.

docker elasticsearch extract text-extraction named-entity-recognition web-gui datashare investigative-journalism

Updated May 15, 2024
Java

bookieio / breadability

A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative)

python text-mining text-extraction html-parsing html-extraction html-extractor

Updated May 9, 2024
HTML

miso-belica / jusText

Heuristic-based boilerplate removal tool

python text-extraction html-parser html-parsing

Updated May 9, 2024
Python

nguyen-tho / ID-card-extract-module

deep-learning text-extraction id-card transformer-ocr

Updated May 9, 2024
Python

unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure Go)

golang pdf signing text-extraction pdf-generator pdf-generation pdf-reader pdf-manipulation pdf-library pdf-document-processor pdf-compression pdf-sign pdf-reports

Updated May 1, 2024
Go

gamemaker1 / office-text-extractor

Yet another library to extract text from MS Office and PDF files

pdf parser xlsx text-extraction ms-office docx pptx ms-excel ms-word ms-powerpoint get-text

Updated Apr 19, 2024
TypeScript

dataiku / dss-plugin-tesseract-ocr

Dataiku DSS plugin to perform optical character recognition (OCR) using the Tesseract engine.

ocr tesseract text-extraction tesseract-ocr optical-character-recognition dataiku dss-plugin

Updated Apr 18, 2024
Python

chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Updated Apr 14, 2024
Python

zanachka / dateparser

Python parser for human-readable dates

text-extraction html-extraction

Updated Apr 12, 2024
Python

Banner-19 / Extraction-and-Analysis-of-Text

The objective is to analyze text content from a list of URLs. This involves extracting article titles and text, then performing natural language processing to generate metrics like sentiment, readability, and word usage. Finally, the results are stored for further analysis or visualization.

nlp data-science text-analysis python3 text-extraction nltk data-analytics data-analysis

Updated Apr 11, 2024
Jupyter Notebook

Here are 209 public repositories matching this topic...

flairNLP / fundus

abhinaba-ghosh / any-text

adbar / trafilatura

zanachka / extruct

MRGRD56 / textractor-translator

miso-belica / sumy

yasminsarkhosh / machine-learning-bsc-thesis-2024
