tika

Incremental crawling capabilities for Apache Tika. Crawl content from file systems, HTTP(S) sources (web crawling), IMAP(S) servers, or your own arbitrary data sources. LeechCrawler provides additional Tika parsers that implement these crawling capabilities.

metadata incremental tika crawling extraction

Updated Apr 15, 2024
Java

liquidinvestigations / hoover-snoop2

Processing system for the search engine service in Liquid Investigations.

docker elasticsearch django tika celery tesseract-ocr

Updated Apr 9, 2024
Python

lguberan / LuceneFx

Tiny unofficial JavaFX demo application for Apache Lucene and Tika.

tika javafx lucene

Updated Apr 6, 2024
Java

DDansAbelenda / doc-clusterizer

DocClusterizer is a Java desktop application that analyzes and clusters documents by content similarity. It uses the Lucene and Tika libraries to process file types such as txt, pdf, docx, and pptx.

tika javafx java-8 lucene kmeans-clustering linkage document-clustering kmeans-algorithm lucene-analyzer unsupervised-clustering fuzzycmeans

Updated Apr 6, 2024
Java

rse / tika-server

Apache Tika Server as a Background Service in Node.js

server service tika apache process background

Updated Apr 3, 2024
JavaScript

tika

Here are 141 public repositories matching this topic...

apache / tika

albertus82 / extfix

apache / tika-docker

OpenSextant / Xponents

dadoonet / fscrawler

apache / tika-helm

Dimous / tsundoku

TYPO3-Solr / ext-tika

kestra-io / plugin-tika

EricLondon / Docker-Rails-Tika-Elasticsearch

shelfio / tika-text-extract

sarbanandabhikkhu / tipitaka-xml

bcgov / nr-bcws-opensearch

ICIJ / extract

quarkiverse / quarkus-tika

DFKI / leechcrawler

liquidinvestigations / hoover-snoop2

lguberan / LuceneFx

DDansAbelenda / doc-clusterizer

rse / tika-server
