PDF only partially parsed #163

Open
guillaume-millot opened this issue Apr 30, 2024 · 2 comments

Comments


guillaume-millot commented Apr 30, 2024

I parsed the PDF below using LlamaParse:
Allianz_2017_CbCR_7.pdf

Unfortunately, on page 1, only the left column got parsed:
image

guillaume-millot changed the title from "Table not parsed" to "PDF only partially parsed" on Apr 30, 2024
@pratiksinghchauhan

I can confirm this issue: LlamaParse misses a lot of text in documents. Comparing LlamaParse's results with Marker's, I noticed that LlamaParse fails to parse around 40-60% of the text in a PDF, depending on the file. I must say, whatever LlamaParse does parse is superior to any other PDF-to-Markdown converter out there, but this issue makes it unusable. Looking forward to a quick resolution from the team.


guillaume-millot commented May 19, 2024

It seems the "markdown" format is broken:

With markdown format selected in UI
image

That's all we get (not much text).

With markdown format unselected
image

We get a lot more text.
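
For anyone who wants to put a number on the gap between the two outputs, here is a minimal sketch that uses only the Python standard library. It assumes you have already saved both results as strings; the helper name `word_coverage` and the sample strings are made up for illustration and do not come from the PDF or from LlamaParse itself.

```python
import re
from collections import Counter


def word_coverage(reference: str, candidate: str) -> float:
    """Return the fraction of word occurrences in `reference` that also occur in `candidate`.

    Hypothetical helper: `reference` would be the fuller output (e.g. plain text),
    `candidate` the output suspected of missing content (e.g. markdown).
    """
    def tokenize(text: str) -> list[str]:
        # Lowercase and keep only word characters, so markdown symbols are ignored.
        return re.findall(r"\w+", text.lower())

    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    total = sum(ref_counts.values())
    if total == 0:
        # Nothing to cover, so treat the candidate as complete.
        return 1.0
    # Count each reference word at most as often as it appears in the candidate.
    matched = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    return matched / total


# Placeholder strings standing in for the two parser outputs.
text_output = "Allianz Group revenue by country Germany France Italy"
markdown_output = "# Allianz Group\n\nrevenue by country"
print(f"markdown covers {word_coverage(text_output, markdown_output):.0%} of the text output")
```

A low ratio only shows that words are missing from the markdown result; it does not say which region of the page was dropped, so it is best used alongside a visual check like the screenshots above.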
