How to extract figures from a PDF? #63

Open
Myfootnotsmelly opened this issue Dec 19, 2023 · 2 comments
@Myfootnotsmelly

After setup, I tried:
1. doc.figures
2. json.dump

Both showed only each figure box's position and its metadata. How can I get the figure itself out of the PDF?

@kyleclo
Collaborator

kyleclo commented Mar 13, 2024

Hey @Myfootnotsmelly, sorry, it looks like a bug was introduced; addressing it in this pull request: #73

@kyleclo kyleclo self-assigned this Mar 13, 2024
@kyleclo
Collaborator

kyleclo commented Mar 18, 2024

Hihi, please take a look at my response to Issue #70.

Yes, figures are represented by bounding boxes.

If you want the image crop of the figures, here's how you'd do it:

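# get the bounding box of a figure
figure_box = doc.figures[0].boxes[0]

# index of the page that contains the figure
# (assumes the box exposes its page index as `.page`)
page_id = figure_box.page

# get the image of that page and its dimensions
page_image = doc.images[page_id]
page_w, page_h = page_image.pilimage.size

# convert the box to absolute pixel coordinates
figure_box_xy = figure_box.to_absolute(page_width=page_w, page_height=page_h).xy_coordinates

# crop the page image using PIL and keep the result
figure_image = page_image.pilimage.crop(figure_box_xy)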
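For anyone wondering what the conversion step does, here is a minimal standalone sketch. It assumes figure boxes store page-relative coordinates (fractions of page width and height), which is what the `to_absolute(page_width=..., page_height=...)` call suggests but which this thread does not confirm. `RelativeBox` and `to_pixel_corners` are hypothetical names made up for illustration, not part of the library.

```python
import math
from typing import NamedTuple, Tuple


class RelativeBox(NamedTuple):
    """Hypothetical stand-in for a figure box.

    Assumption: l, t, w, h are fractions of the page size in [0, 1],
    with (l, t) the top-left corner.
    """
    l: float
    t: float
    w: float
    h: float


def to_pixel_corners(
    box: RelativeBox, page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) pixel corners for a relative box."""
    # Scale the fractional corners by the page dimensions.
    x0 = box.l * page_width
    y0 = box.t * page_height
    x1 = (box.l + box.w) * page_width
    y1 = (box.t + box.h) * page_height

    # Round outward so the crop never clips the figure,
    # then clamp to the page so the corners stay inside the image.
    left = max(0, min(page_width, math.floor(x0)))
    upper = max(0, min(page_height, math.floor(y0)))
    right = max(0, min(page_width, math.ceil(x1)))
    lower = max(0, min(page_height, math.ceil(y1)))
    return (left, upper, right, lower)
```

The returned tuple is in the (left, upper, right, lower) order that PIL's `Image.crop` expects, so it could be passed straight to `crop` on a page image, and the cropped image could then be written out with PIL's `save`.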
