Trouble getting bookmarks from a pdf document #736

BSevault · 2023-11-20T13:18:42Z

Discussed in #735

Originally posted by BSevault November 20, 2023
Hello.

I'm working with an internal PDF that is generated programmatically, and I'm having difficulty getting its bookmarks using PdfPig, although I managed to get bookmarks from other PDF documents. I need the bookmarks and the pages they point to in order to split the original PDF into multiple PDFs according to the bookmarks.
When I debug my code and call TryGetBookmarks on my PdfDocument, it returns true, but the Bookmarks object contains nothing: its length is 0.

Previously, I managed to get the bookmarks of the same PDF with itext7, but I cannot use it anymore due to licence issues.

My guess is that it is related to the way outlines and bookmarks are formed in the internal pdf's structure.

I tried to get the object containing "/Title" in the pdf but I failed.

Does anyone have an idea how to get this done?

Here is an extract of the PDF's bookmark objects:

2 0 obj
<<
/Type/Catalog
/Pages 3 0 R
/Outlines 22048 0 R
>>
endobj

...
...
...

22048 0 obj
<<
/Count 304
/First 22049 0 R
/Last 22201 0 R
>>
endobj

22049 0 obj
<<
/Title(TITLE0)
/Parent 22048 0 R
/Next 22201 0 R
/First 22050 0 R
/Last 22200 0 R
/Count 151
>>
endobj

22050 0 obj
<<
/Title(SUBTITLE1 /Page 1)
/A 22353 0 R
/Parent 22049 0 R
/Next 22051 0 R
>>
endobj

22353 0 obj
<<
/S/GoTo
/D[8 0 R /Fit]
>>
endobj

22051 0 obj
<<
/Title(SUBTITLE2 /Page 138)
/A 22354 0 R
/Parent 22049 0 R
/Next 22052 0 R
/Prev 22050 0 R
>>
endobj

22354 0 obj
<<
/S/GoTo
/D[338 0 R /Fit]
>>

... and so on...

BobLd · 2023-11-20T20:47:07Z

Hi @BSevault, thanks for creating the issue. Would you be able to share a sample document, and maybe a snippet of how you extract the bookmarks?

BSevault · 2023-11-21T08:44:51Z

Thanks for your answer, @BobLd.

Unfortunately, I cannot share the document I'm working on since it's confidential. I can only share an extract of the internal structure of the PDF.

The code I used to get bookmarks:

PdfDocument pdfDocument = PdfDocument.Open("path/to/my/document.pdf");
bool hasBookmarks = pdfDocument.TryGetBookmarks(out Bookmarks bookmarks);

hasBookmarks is true, but bookmarks.Roots.Count = 0.

I tried to get the tokens I need using PdfTokenScanner, but without success.
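For anyone trying to narrow this down on a file they cannot share, a minimal diagnostic sketch along the following lines might help. It prints what TryGetBookmarks returns and then dumps the raw /Outlines entry from the catalog, so the two can be compared against the extract above (where each item points to its page through an /A GoTo action). The class name BookmarkDiagnostics and the placeholder path are hypothetical, and the PdfPig members used here (PdfDocument.Open, Bookmarks.GetNodes, Structure.Catalog.CatalogDictionary, Structure.GetObject) are assumed to be available in the installed version; they may differ between releases.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Outline;
using UglyToad.PdfPig.Tokens;

// Hypothetical helper, not part of PdfPig.
public static class BookmarkDiagnostics
{
    public static void Main(string[] args)
    {
        // Placeholder path; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "path/to/my/document.pdf";

        using (PdfDocument document = PdfDocument.Open(path))
        {
            // 1. What the high-level bookmark API reports.
            bool hasBookmarks = document.TryGetBookmarks(out Bookmarks bookmarks);
            Console.WriteLine($"TryGetBookmarks: {hasBookmarks}, roots: {bookmarks?.Roots.Count ?? 0}");

            if (hasBookmarks)
            {
                foreach (BookmarkNode node in bookmarks.GetNodes())
                {
                    // Nodes that resolve to a page in this document expose a page number.
                    string target = node is DocumentBookmarkNode pageNode
                        ? $"page {pageNode.PageNumber}"
                        : node.GetType().Name;
                    Console.WriteLine($"{new string(' ', node.Level * 2)}{node.Title} -> {target}");
                }
            }

            // 2. What the catalog actually holds for /Outlines.
            DictionaryToken catalog = document.Structure.Catalog.CatalogDictionary;
            if (catalog.TryGet(NameToken.Create("Outlines"), out IToken outlines))
            {
                Console.WriteLine($"/Outlines entry: {outlines}");

                if (outlines is IndirectReferenceToken reference)
                {
                    // Resolve the indirect reference to the outline root dictionary.
                    ObjectToken root = document.Structure.GetObject(reference.Data);
                    Console.WriteLine($"Outline root: {root.Data}");
                }
            }
            else
            {
                Console.WriteLine("The catalog has no /Outlines entry.");
            }
        }
    }
}
```

If the second part prints an outline root with /First and /Last entries while the first part reports zero roots, that would suggest the outline objects are readable and the nodes are being dropped somewhere during bookmark construction, which is the kind of detail that can be reported without sharing the document itself.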

EliotJones · 2024-01-10T21:37:24Z

I tried looking into this but without the document I'm getting nowhere. I have pushed a PowerShell script to enable setting a single target framework:

path\to\clonedrepo\tools> .\set-dotnet-version.ps1

This will make it easier to build and run the project locally. Without access to the source file, the only approach I can think of is to debug it locally where you can access the file. See LocalTests for running against a single file from the file system: https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.Tests/Integration/LocalTests.cs

EliotJones added the bug, help wanted, and document-reading labels · Feb 18, 2024