PDF-Querying-using-TF-IDF-from-Scratch

Given a set of PDFs and a query, the most relevant PDF can be found with the help of TF-IDF. The code implements TF-IDF without using any library.

Explanation

The code uses only the pdfminer and glob libraries: pdfminer to read the PDFs and glob to traverse a directory for PDF files. TF-IDF is computed manually, without any library. To understand the code, please read the comments in it.
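As a rough illustration of the reading step, here is a minimal sketch that collects the text of every PDF in a folder. It assumes the pdfminer.six package and its `extract_text` helper, which may differ from the pdfminer calls used in PDF_querying.py. The function name `read_pdfs` and the `extract` parameter are hypothetical and exist only for this example.

```python
import glob
import os


def read_pdfs(folder, extract=None):
    """Return {file name: extracted text} for every .pdf file in folder."""
    if extract is None:
        # Assumption: pdfminer.six is installed. The import is deferred so
        # a different extractor can be passed in instead.
        from pdfminer.high_level import extract_text as extract
    texts = {}
    # glob finds the PDF paths; sorted() keeps the order reproducible.
    for path in sorted(glob.glob(os.path.join(folder, "*.pdf"))):
        texts[os.path.basename(path)] = extract(path)
    return texts
```

Calling `read_pdfs("PDFfiles")` would then return one text string per PDF in the sample folder.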

PDF Files

A sample folder with a few PDFs is uploaded so you can try out the code.

PDF_querying.py

Reads the PDF files using the pdfminer library
Extracts the words from each PDF
Takes the query as input from the user
Computes TF-IDF for the PDFs and the query
Ranks the PDFs that share words with the query
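The steps above can be sketched with the standard library alone. This is not the code from PDF_querying.py: the tokenizer, the TF and IDF formulas (relative frequency and log(N / document frequency)), and the scoring rule (sum of TF-IDF weights over the query words) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(tokens):
    """Relative frequency of each word within one document."""
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def inverse_document_frequency(docs_tokens):
    """log(N / df) for each word, where df counts documents containing it."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def rank(docs, query):
    """Return (name, score) pairs sorted from most to least relevant."""
    names = list(docs)
    docs_tokens = [tokenize(docs[name]) for name in names]
    idf = inverse_document_frequency(docs_tokens)
    query_words = set(tokenize(query))
    scores = {}
    for name, tokens in zip(names, docs_tokens):
        tf = term_frequency(tokens)
        # Assumed scoring rule: add up the TF-IDF weight of every query word.
        scores[name] = sum(tf.get(w, 0.0) * idf.get(w, 0.0) for w in query_words)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = {
        "a.pdf": "cats chase mice",
        "b.pdf": "dogs chase cats",
        "c.pdf": "birds sing songs",
    }
    for name, score in rank(sample, "mice"):
        print(name, round(score, 3))
```

With this IDF variant, a word that occurs in every document gets a weight of zero, so it does not affect the ranking.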

text querying.py

The text of the documents is initially provided as strings
The rest of the process is the same as in PDF_querying.py

shreyansh-kothari/PDF-Querying-using-TF-IDF-from-Scratch

Folders and files

Latest commit

History

Repository files navigation

PDF-Querying-using-TF-IDF-from-Scratch

Explanation

PDF Files

PDF_querying.py

text querying.py

About

Topics

Resources

Stars

Watchers

Forks

Languages