PDF only partially parsed #163

guillaume-millot · 2024-04-30T11:36:17Z

I parsed the PDF below using LlamaParse:
Allianz_2017_CbCR_7.pdf

Unfortunately, on page 1, only the left column got parsed:

pratiksinghchauhan · 2024-05-01T09:53:53Z

I can confirm this issue: LlamaParse misses a lot of text in the documents. Comparing the results of LlamaParse with Marker, I noticed that LlamaParse fails to parse around 40-60% of the text in a PDF, depending on the file. I must say, whatever LlamaParse does parse is superior to any other PDF-to-markdown converter out there, but this issue makes it unusable. Looking forward to a quick resolution from the team.

guillaume-millot · 2024-05-19T17:58:35Z

It seems the "markdown" format is broken:

With markdown format selected in the UI

That's all we get (not much text).

With markdown format unselected

We get a lot more text.
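
For anyone who wants to put a number on the gap between the two outputs, here is a minimal sketch of a word-level coverage check. It assumes you have already saved both results as strings; the function names `word_counts` and `coverage` and the sample strings are hypothetical, not part of LlamaParse, and the sample text is made up rather than taken from the PDF above.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Return a lowercased word multiset, ignoring punctuation such as markdown markers."""
    return Counter(re.findall(r"\w+", text.lower()))


def coverage(candidate: str, reference: str) -> float:
    """Fraction of the reference's words (with multiplicity) that also appear in the candidate."""
    ref = word_counts(reference)
    if not ref:
        # Nothing to cover, so treat the candidate as complete.
        return 1.0
    cand = word_counts(candidate)
    # Counter intersection keeps the minimum count of each shared word.
    matched = sum((ref & cand).values())
    return matched / sum(ref.values())


# Placeholder strings standing in for the two parse results.
text_output = "Revenue Profit Employees Germany 100 20 5"
markdown_output = "# Revenue\nGermany 100"

ratio = coverage(markdown_output, text_output)
print(f"markdown output covers {ratio:.0%} of the plain-text output")
```

This only compares word multisets, so it says nothing about reading order or table structure; it is just a quick way to check whether one output is missing a large share of the text found in the other.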

guillaume-millot changed the title ~~Table not parsed~~ PDF only partially parsed Apr 30, 2024

logan-markewich added the pdf_debug label May 14, 2024
