[BUG] - {Lattice} SpreadsheetExtractionAlgorithm failing in capturing rows and cells #21

Open
GuroGuru opened this issue Dec 22, 2021 · 0 comments

GuroGuru commented Dec 22, 2021

Describe the bug
While extracting a PDF, I noticed that some tables were split in two because a row was not captured, as if it had been treated as a blank line. In other tables, the last cell in a row was skipped.

Screenshots
The screenshots are not from the original PDF, but they should illustrate the problem.

Given a table with a schema similar to the image below:
1-schema

I expected to capture the entire table at once:
2-expected

However, because the 5th row was skipped, I ended up with two distinct capture groups.
3-extracted
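
To make the split concrete, here is a toy model of the symptom. It is not the actual SpreadsheetExtractionAlgorithm code; `group_rows` and the 8-row sample table are made up for illustration, and the only assumption is that a row which is not captured ends the current table, much like a blank line would.

```python
from typing import List, Optional

# None stands for a row the extractor failed to capture (hypothetical model).
Row = Optional[List[str]]


def group_rows(rows: List[Row]) -> List[List[List[str]]]:
    """Group consecutive captured rows into tables.

    A missing row closes the current table, so one skipped row in the
    middle of a table produces two separate capture groups.
    """
    tables: List[List[List[str]]] = []
    current: List[List[str]] = []
    for row in rows:
        if row is None:
            # The skipped row behaves like a blank line between tables.
            if current:
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables


# Made-up table: 8 rows, 3 columns.
full = [[f"r{r}c{c}" for c in range(1, 4)] for r in range(1, 9)]

# Same table, but the 5th row is not captured.
missed_fifth = [row if i != 4 else None for i, row in enumerate(full)]

print(len(group_rows(full)))                       # 1 table, as expected
print([len(t) for t in group_rows(missed_fifth)])  # [4, 3]: two capture groups
```

With the 5th row missing, the model returns a 4-row group and a 3-row group instead of one 8-row table, which matches the shape of what I am seeing.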

The other problem is that some cells are sometimes skipped. One example is the 1st capture, which is missing data in its last 3 rows:
4-skipped-data

Cells are sometimes skipped even when the table is not split.

GuroGuru changed the title from "[BUG] - {Stream or Lattice} {Description}" to "[BUG] - {Lattice} SpreadsheetExtractionAlgorithm failing in capturing rows and cells" on Dec 22, 2021