A document search tool for files on the local host, built on Tika with OCR, Elasticsearch, and Kibana
Updated Jan 23, 2021 - Java
WORK IN PROGRESS - Dataiku DSS plugin to extract text data from documents
A simple monolithic application that demonstrates extracting the page images of a PDF document with Apache Tika, storing the image files on the local filesystem, and displaying the pages with the ngx-swiper-wrapper library.
Information retrieval system for documents.
Early Buddhist texts from the Tipitaka (Tripitaka). Suttas (sutras) with the Buddha's teachings on mindfulness, insight, wisdom, and meditation.
Extracts GPS coordinates from PDF files and Points/Polygons from KMZ files to create a master KML file. 🌎
A Java application that uses Lucene and Tika to search documents and display the part of the document in which the match is found, along with precision and recall values.
Containerized (Docker) GeoTopicParser-enabled Apache Tika Server with Lucene Geo Gazetteer.
DocClusterizer is a Java desktop application that analyzes and clusters documents by content similarity. It uses the Lucene and Tika libraries to process file formats such as txt, pdf, docx, and pptx.
The Information Retrieval Laboratories
Information retrieval system for indexing and searching files stored on disk, with support for the Romanian language
POC: azure-functions (kotlin, gradle, tika)