Grobid Returned None #1112
Comments
Hi @tinotendamarufetu, could you share some more information about your Grobid deployment? Which server are you running (native or Docker)? Which OS are you using? Do you have any logs from the server?
I am running it in Docker on Kali Linux and trying to use the processfulltext PDF client. Below are the logs:
INFO [2024-05-09 13:27:28,047] org.eclipse.jetty.setuid.SetUIDListener: Opened application@51e3d37e{HTTP/1.1,[http/1.1]}{0.0.0.0:8070}
INFO [2024-05-09 13:28:35,645] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@7bf018dd{/,null,AVAILABLE}
INFO [2024-05-09 13:28:35,658] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@23b1aa9{/,null,AVAILABLE}
Hello @tinotendamarufetu, this is not related to Grobid, but to the client. Are you using https://github.com/kermitt2/grobid_client_python ? If so, the command-line usage is:
grobid_client --input path_to_input_dir --output path_to_output_dir processFulltextDocument
Or see https://github.com/kermitt2/grobid_client_python?tab=readme-ov-file#using-the-client-in-your-python for how to call it from a Python script. There is no method "process" in this client that returns the resulting XML TEI string.
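For anyone who needs the TEI XML back as a string rather than written to an output directory, here is a minimal sketch that calls the Grobid REST service directly instead of going through the client. It assumes the server is reachable at http://localhost:8070 (the port shown in the logs above) and that the PDF is sent to /api/processFulltextDocument in a multipart field named input. The helper names build_multipart and fetch_tei are hypothetical and not part of Grobid or the Python client.

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, boundary=None):
    """Encode a single file as a multipart/form-data body using only the stdlib."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + data + tail, content_type


def fetch_tei(pdf_path, server="http://localhost:8070", timeout=120):
    """POST one PDF to the Grobid service and return the response body as text.

    Assumption: the server URL and timeout are illustrative defaults.
    """
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart(
            "input", os.path.basename(pdf_path), fh.read()
        )
    request = urllib.request.Request(
        f"{server}/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # so a server-side failure surfaces as an exception rather than None.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_tei("some.pdf") with the Docker container running should then return the TEI document as a string, which the calling script can save or parse itself.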
Kindly assist. I am unable to run the Grobid fulltext processor. Below are the errors from my log:
2024-05-06 16:04:21,633 ERROR: Error processing after-appointment.pdf: Grobid returned None
2024-05-06 16:04:21,634 ERROR: Error processing appointment-checklist.pdf: Grobid returned None
2024-05-06 16:04:21,634 ERROR: Error processing A_FS_HCP_COVID19_PPE_card.pdf: Grobid returned None
2024-05-06 16:04:21,634 ERROR: Error processing CDC_COVID_Posters_GeneralAudience_card.pdf: Grobid returned None
2024-05-06 16:04:21,635 ERROR: Error processing How-Protein-Subunit-Vaccines-Work.pdf: Grobid returned None
2024-05-06 16:04:21,635 ERROR: Error processing protect-yourself-and-your-baby-print.pdf: Grobid returned None
2024-05-06 16:04:21,636 ERROR: Error processing temp-log-ultra-cold-storage-fahrenheit.pdf: Grobid returned None
2024-05-06 16:04:21,636 ERROR: Error processing Test-Soon-Treat-Early.pdf: Grobid returned None
This is my PDF processing code:
# Process each PDF in the input directory
for filename in os.listdir(input_dir):
    if filename.endswith(".pdf"):
        pdf_path = os.path.join(input_dir, filename)
        output_path = os.path.join(output_dir, f"{os.path.splitext(filename)[0]}.xml")
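The rest of the loop is not shown in the snippet above. As a rough sketch of how it could be completed so that a None result is logged and skipped instead of being written out, the function below takes the extraction step as a parameter. Both process_directory and the extract callable are hypothetical names for illustration: extract stands in for whatever call actually returns the TEI XML string (or None) for one PDF path.

```python
import logging
import os


def process_directory(input_dir, output_dir, extract):
    """Run `extract` on every PDF in input_dir and save non-empty results.

    `extract` is a hypothetical callable: it takes a PDF path and returns
    the TEI XML as a string, or None (or "") when nothing came back.
    Returns the list of XML files that were written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for filename in sorted(os.listdir(input_dir)):
        if not filename.lower().endswith(".pdf"):
            continue
        pdf_path = os.path.join(input_dir, filename)
        output_path = os.path.join(output_dir, f"{os.path.splitext(filename)[0]}.xml")
        tei = extract(pdf_path)
        if not tei:
            # Log and move on rather than writing an empty or "None" file
            logging.error("Error processing %s: no TEI returned", filename)
            continue
        with open(output_path, "w", encoding="utf-8") as fh:
            fh.write(tei)
        written.append(output_path)
    return written
```

Passing the extraction step in as a callable keeps the directory handling separate from the Grobid call, so the loop can be checked with a stub before pointing it at a running server.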