PDF Parser

git clone https://github.com/simonkeng/pdf_parser.git

docker build -t pdf_parser .

Usage:

Run the container and execute the Python script, passing in a document:

docker run -i -t pdf_parser bash -c "python pdf_rip.py test_data.pdf"

You can also extract from multiple files: place all your PDFs in one folder and copy it into your Docker container.

docker cp pdfs/ 609d09bb400f:/tmp/pdfs/

Replace 609d09bb400f with your container ID. Now run the batch script within a new container:

docker run -i -t pdf_parser bash -c "python batch.py /tmp/pdfs/"
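The contents of batch.py are not shown here, so the following is only a minimal sketch of what a folder-based batch run could look like, not the repository's actual implementation. The names `find_pdfs` and `run_batch` are hypothetical, and `extract` stands in for whatever per-file extraction the project performs.

```python
from pathlib import Path
from typing import Callable, Dict, List


def find_pdfs(folder: str) -> List[Path]:
    """Return every .pdf file directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Match the extension case-insensitively so report.PDF is also picked up.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def run_batch(folder: str, extract: Callable[[Path], str]) -> Dict[str, str]:
    """Apply `extract` (a hypothetical per-file function) to each PDF.

    A failure on one file is recorded instead of aborting the whole batch.
    """
    results: Dict[str, str] = {}
    for pdf in find_pdfs(folder):
        try:
            results[pdf.name] = extract(pdf)
        except Exception as exc:
            results[pdf.name] = f"ERROR: {exc}"
    return results
```

Recording errors per file, rather than raising, is one possible design choice: a single corrupt PDF then does not prevent the rest of the folder from being processed.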

This command returns a container ID. To confirm that it ran and to check its status:

docker logs <containerID>
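If you would rather check the logs from a script, the same command can be wrapped in Python. This is a sketch under assumptions: the helper names `docker_logs_command` and `fetch_logs` are hypothetical, the `docker` CLI is assumed to be on your PATH, and `--tail` is used only to limit how much output is returned.

```python
import subprocess
from typing import List


def docker_logs_command(container_id: str, tail: int = 50) -> List[str]:
    """Build the `docker logs` argument list for one container."""
    if not container_id:
        raise ValueError("container_id must not be empty")
    return ["docker", "logs", "--tail", str(tail), container_id]


def fetch_logs(container_id: str, tail: int = 50) -> str:
    """Run `docker logs` and return its combined stdout and stderr.

    Raises subprocess.CalledProcessError if docker exits with an error,
    for example when the container ID is unknown.
    """
    result = subprocess.run(
        docker_logs_command(container_id, tail),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `print(fetch_logs("609d09bb400f"))` would print the last 50 log lines of that container, again replacing 609d09bb400f with your own container ID.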
