tika

Here are 141 public repositories matching this topic...

zhurlik / doc-search

A search tool for documents on the local host, built on Tika with OCR, Elasticsearch, and Kibana.

elasticsearch kibana tika tesseract-ocr tika-server

Updated Jan 23, 2021
Java

dataiku / dss-plugin-nlp-extraction

WORK IN PROGRESS - Dataiku DSS plugin to extract text data from documents

ocr tika tesseract text-recognition speech-to-text optical-character-recognition dataiku document-extraction dss-plugin

Updated Jan 11, 2021
Makefile

contribution-jhipster-uga / sample-jhipster-docpreview

A simple monolithic application that demonstrates extracting PDF page images with Apache Tika, storing the image files on the local filesystem, and displaying the pages with the ngx-swiper-wrapper library.

pdf spring tika jhipster image-storage ngx-swiper-wrapper

Updated May 9, 2023
Java

Slvkelevra / information-retrieval-system

Information retrieval system for documents.

java information-retrieval tika apache lucene

Updated Feb 15, 2022
HTML

sarbanandabhikkhu / DhammaChakka

Early Buddhist texts from the Tipitaka (Tripitaka). Suttas (sutras) with the Buddha's teachings on mindfulness, insight, wisdom, and meditation.

tika buddhism pali tipitaka sutta vinaya dhammachakka abhidhamma atthakatha

Updated Jul 6, 2023
JavaScript

msafwankarim / lufin

LuFIn (Lucene File Indexer)

tika lucene fileindex

Updated Nov 6, 2023
Java

lguberan / LuceneFx

Tiny unofficial JavaFX demo application for Apache Lucene and Tika.

tika javafx lucene

Updated Apr 6, 2024
Java

tirthmehta / Apache-Solr-based-Web-Search-Engine

Deployment of a search engine utilizing Apache Solr, Apache Tika and spelling correction programs.

python java php solr tika

Updated Jul 28, 2017

sesam-community / content-extractor

Extract textual information using the Apache Tika library from JSON streams

docker tika transform sesam

Updated Apr 25, 2017
Java

gcpetri / SiteMap-Python

Extracts GPS coordinates from PDF files and Points/Polygons from KMZ files to create a master KML file. 🌎

pyqt5 tika geolocation python3 geology

Updated Jul 7, 2021
HTML

tusharkm / search_engine_using_lucene

A Java application that uses Lucene and Tika to search documents and display the part of each document in which the match is found, along with precision and recall values.

java search-engine tika lucenesearch

Updated Aug 20, 2017
Java

voltek62 / Rwahoo

Create the ultimate scraper with Apache Tika for R

cran r tika

Updated Mar 23, 2018
R

stainlessai / grails-tika

A plugin for using Apache Tika in Grails/Micronaut projects

tika micronaut grails4

Updated Nov 5, 2019
Groovy

frytoli / geotopic-parser-enabled-tika-docker

Containerized (Docker) GeoTopicParser-enabled Apache Tika Server with Lucene Geo Gazetteer.

docker tika gazetteer tika-server geo-gazetteer

Updated Apr 5, 2021
Dockerfile

DDansAbelenda / doc-clusterizer

DocClusterizer is a Java desktop application designed to analyze and cluster documents based on their content similarity. The application utilizes Lucene and Tika libraries to process various file extensions such as txt, pdf, docx, and pptx.