Trouble getting bookmarks from a pdf document #736

Open
BSevault opened this issue Nov 20, 2023 Discussed in #735 · 3 comments
Labels
bug, document-reading (Related to reading documents), help wanted

Comments

@BSevault

BSevault commented Nov 20, 2023

Discussed in #735

Originally posted by BSevault November 20, 2023
Hello.

I'm working with an internal PDF that is generated programmatically, and I'm having difficulty getting its bookmarks using PdfPig, although I managed to get bookmarks from other PDF documents. I need the bookmarks and the pages they point to in order to split the original PDF into multiple PDFs according to the bookmarks.
When I debug my code, calling TryGetBookmarks on my PdfDocument returns true, but the Bookmarks object contains nothing: its length is 0.
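
For the splitting step described above, one possible way to turn bookmark start pages into page ranges is sketched below. `BookmarkSplitPlanner`, `PlanRanges`, and the `totalPages` parameter are hypothetical names, not part of PdfPig. The sketch assumes each bookmark's 1-based start page is already known and that a section runs until the page before the next bookmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BookmarkSplitPlanner
{
    // Hypothetical helper: turns (title, 1-based start page) pairs into inclusive page ranges.
    public static List<(string Title, int FirstPage, int LastPage)> PlanRanges(
        IEnumerable<(string Title, int Page)> bookmarks, int totalPages)
    {
        // Sort by start page so each section ends right before the next one starts.
        var ordered = bookmarks.OrderBy(b => b.Page).ToList();
        var ranges = new List<(string Title, int FirstPage, int LastPage)>();

        for (int i = 0; i < ordered.Count; i++)
        {
            int first = ordered[i].Page;
            int last = i + 1 < ordered.Count ? ordered[i + 1].Page - 1 : totalPages;

            // Two bookmarks on the same page would give last < first; clamp to a one-page range.
            ranges.Add((ordered[i].Title, first, Math.Max(first, last)));
        }

        return ranges;
    }
}
```

For example, bookmarks starting on pages 1 and 138 in a document assumed to have 200 pages would give the ranges 1 to 137 and 138 to 200.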

Previously, I managed to get the bookmarks of the same PDF with itext7, but I cannot use it anymore due to licensing issues.

My guess is that it is related to the way outlines and bookmarks are formed in the PDF's internal structure.

I tried to get the objects containing "/Title" in the PDF, but I failed.

Does anyone have an idea of how to get this done?

Here is an extract of the PDF's bookmark objects:

2 0 obj
<<
/Type/Catalog
/Pages 3 0 R
/Outlines 22048 0 R
>>
endobj

...

22048 0 obj
<<
/Count 304
/First 22049 0 R
/Last 22201 0 R
>>
endobj

22049 0 obj
<<
/Title(TITLE0)
/Parent 22048 0 R
/Next 22201 0 R
/First 22050 0 R
/Last 22200 0 R
/Count 151
>>
endobj

22050 0 obj
<<
/Title(SUBTITLE1 /Page 1)
/A 22353 0 R
/Parent 22049 0 R
/Next 22051 0 R
>>
endobj

22353 0 obj
<<
/S/GoTo
/D[8 0 R /Fit]
>>
endobj

22051 0 obj
<<
/Title(SUBTITLE2 /Page 138)
/A 22354 0 R
/Parent 22049 0 R
/Next 22052 0 R
/Prev 22050 0 R
>>
endobj

22354 0 obj
<<
/S/GoTo
/D[338 0 R /Fit]
>>
... and so on...
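
To make the linked-list shape of the extract easier to follow, here is a small self-contained sketch that walks the /First and /Next links using an in-memory model instead of a real PDF. `OutlineItem`, `OutlineWalker`, and `ExtractFromIssue` are hypothetical names, and only the objects shown in the extract are modelled, so the walk stops wherever a link points at an elided object. It is not how PdfPig reads outlines.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Minimal stand-in for an outline dictionary: only the keys used for traversal.
public sealed record OutlineItem(string? Title, int? First = null, int? Next = null);

public static class OutlineWalker
{
    // Depth-first walk: visit an item, then its /First child chain, then its /Next sibling.
    public static List<(int Level, string Title)> Walk(
        IReadOnlyDictionary<int, OutlineItem> objects, int rootNumber)
    {
        var result = new List<(int Level, string Title)>();
        var seen = new HashSet<int>(); // guards against cyclic /First or /Next links
        if (objects.TryGetValue(rootNumber, out var root))
        {
            WalkSiblings(objects, root.First, 0, seen, result);
        }
        return result;
    }

    private static void WalkSiblings(
        IReadOnlyDictionary<int, OutlineItem> objects, int? start, int level,
        HashSet<int> seen, List<(int Level, string Title)> result)
    {
        for (int? current = start; current.HasValue;)
        {
            // Stop at objects that are missing (elided from the extract) or already visited.
            if (!objects.TryGetValue(current.Value, out var item) || !seen.Add(current.Value))
            {
                break;
            }

            result.Add((level, item.Title ?? "(untitled)"));
            WalkSiblings(objects, item.First, level + 1, seen, result);
            current = item.Next;
        }
    }

    // Object numbers and links copied from the extract; objects not shown there are omitted.
    public static IReadOnlyDictionary<int, OutlineItem> ExtractFromIssue() =>
        new Dictionary<int, OutlineItem>
        {
            [22048] = new OutlineItem(null, First: 22049),
            [22049] = new OutlineItem("TITLE0", First: 22050, Next: 22201),
            [22050] = new OutlineItem("SUBTITLE1 /Page 1", Next: 22051),
            [22051] = new OutlineItem("SUBTITLE2 /Page 138", Next: 22052),
        };
}
```

Calling `OutlineWalker.Walk(OutlineWalker.ExtractFromIssue(), 22048)` visits TITLE0 at level 0 and the two subtitles at level 1, then stops because objects 22052 and 22201 are not part of the extract.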
@BobLd
Collaborator

BobLd commented Nov 20, 2023

Hi @BSevault, thanks for creating the issue. Would you be able to share a sample document, and maybe a snippet of how you extract the bookmarks?

@BSevault
Author

Thanks for your answer, @BobLd.

Unfortunately, I cannot share the document I'm working on since it's confidential. I can only share an extract of the internal structure of the PDF.

The code I used to get bookmarks:

// PdfDocument is opened through the static Open method rather than a constructor.
using PdfDocument pdfDocument = PdfDocument.Open("path/to/my/document.pdf");
bool hasBookmarks = pdfDocument.TryGetBookmarks(out Bookmarks bookmarks);

hasBookmarks is true, but bookmarks.Roots.Count = 0.

I tried to get the tokens I need using PdfTokenScanner, but without success.
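
As a possible way to inspect the raw outline tokens without the bookmarks provider, the sketch below follows /Outlines, /First, and /Next through the document structure and prints each /Title. It assumes that `document.Structure.Catalog.CatalogDictionary`, `Structure.GetObject`, and `DictionaryToken.TryGet` behave as in recent PdfPig releases; these calls have not been checked against the confidential file, and `RawOutlineDump`, `Resolve`, and `Dump` are hypothetical names.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Tokens;

public static class RawOutlineDump
{
    // Arbitrary cap on visited items so a cyclic /First or /Next chain cannot loop forever.
    private static int remaining = 10_000;

    public static void Main(string[] args)
    {
        // The default path is a placeholder; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "path/to/my/document.pdf";
        using PdfDocument document = PdfDocument.Open(path);

        DictionaryToken catalog = document.Structure.Catalog.CatalogDictionary;
        if (!catalog.TryGet(NameToken.Create("Outlines"), out IToken outlines))
        {
            Console.WriteLine("The catalog has no /Outlines entry.");
            return;
        }

        if (Resolve(document, outlines) is DictionaryToken root
            && root.TryGet(NameToken.Create("First"), out IToken first))
        {
            Dump(document, first, 0);
        }
    }

    // Follows an indirect reference (for example "22049 0 R") to the object it points at.
    private static IToken Resolve(PdfDocument document, IToken token) =>
        token is IndirectReferenceToken reference
            ? document.Structure.GetObject(reference.Data).Data
            : token;

    private static void Dump(PdfDocument document, IToken start, int level)
    {
        IToken current = start;
        while (current != null && remaining-- > 0
               && Resolve(document, current) is DictionaryToken item)
        {
            // Titles may be literal or hex strings; both expose their text as IDataToken<string>.
            string title = item.TryGet(NameToken.Create("Title"), out IToken titleToken)
                           && Resolve(document, titleToken) is IDataToken<string> text
                ? text.Data
                : "(no title)";
            Console.WriteLine(new string(' ', level * 2) + title);

            // Children hang off /First; siblings are chained through /Next.
            if (item.TryGet(NameToken.Create("First"), out IToken child))
            {
                Dump(document, child, level + 1);
            }

            current = item.TryGet(NameToken.Create("Next"), out IToken next) ? next : null;
        }
    }
}
```

If this prints the expected titles while `bookmarks.Roots` stays empty, the difference would point at how destinations or actions are resolved rather than at the outline links themselves, though that is only a guess without the file.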

@EliotJones
Member

I tried looking into this but without the document I'm getting nowhere. I have pushed a PowerShell script to enable setting a single target framework:

path\to\clonedrepo\tools> .\set-dotnet-version.ps1

This will make it easier to build and run the project locally. Without access to the source file, the only approach I can think of is to debug it locally, where you can access the file. See LocalTests for running against a single file from the file system: https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.Tests/Integration/LocalTests.cs

@EliotJones added the bug, help wanted, and document-reading (Related to reading documents) labels on Feb 18, 2024