The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Updated May 11, 2024 - Java
Research Data Management Platform (RDMP) is an open source application for the loading, linking, anonymisation, and extraction of datasets stored in relational databases.
DocILE: Document Information Localization and Extraction Benchmark
Extract internal monitoring data from application logs for collection in a time series database
Dump Discord's cache and identify files
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Extract files from any kind of container format
Go Library for Queuing and Extracting Archives: Rar, Zip, 7zip, Gz, Tar, Tgz, Bz2, Tbz2
Analysis over years for real estate across Canada
Assemblyline 4 File extraction service
Web data extraction can be effectively performed using CSS selectors. https://rapidapi.com/hotbrainstech-hotbrainstech-default/api/crawler-extractor
A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
Provides functions to read and write from/to an object or array using a simple string notation
Automatically extract zip files that contain other rar files. Also copies files with ensured integrity
⚡️ Build quick LLM pipelines for your image applications
A robust web archive analytics toolkit
A Python tool to extract GoBit .pak files
A tool written in Python to perform a bibliographic analysis of the NIME proceedings archive and other similar corpora.
🦜⛏️ Did you say you like data?