PowerPoint PDF data loaded into PDF-Lib does not open in Adobe Acrobat Pro DC #1206

ahaganDEV · 2022-04-05T09:56:04Z

What were you trying to do?

Given PDF file data retrieved from our API, pass it into PDF-Lib to manipulate it (draw stamps, etc.), merge the results into one PDF document, and write that document to disk. The resulting PDF should then open in Adobe Acrobat Pro DC.

How did you attempt to do it?

Initially we receive PDF data from an API that returns it as a Uint8Array.
Load the data into PDF-Lib:
const embedDoc = await PDFDocument.load(pdfFileData);
Embed the pages into the document:

const pages = embedDoc.getPages();
for (const page of pages) {
  // Exclude blank pages (pages with no content stream)
  if (!page.node.Contents()) {
    continue;
  }

  const newPage = embedDoc.addPage([page.getWidth(), page.getHeight()]);

  const embedPage = await embedDoc.embedPage(page);

  newPage.drawPage(embedPage, {
    x: 0,
    y: 0,
    xScale: 1,
    yScale: 1,
    width: page.getWidth(),
    height: page.getHeight(),
  });
}

Later on in the process, merge multiple PDF Documents together:

for (const pdfFile of this.pdfFiles) {
  const copiedPages = await mergedPdf.copyPages(
    pdfFile,
    pdfFile.getPageIndices()
  );
  copiedPages.forEach((page) => {
    mergedPdf.addPage(page);
  });
}

Save and then output the merged file to disk:

const pdfData = await mergedPdf.save();
// Forward slash: in a JS string literal '\m' collapses to 'm', which would drop the separator
fs.writeFileSync('my-path/myfile.pdf', pdfData);

What actually happened?

The generated PDF can be opened in the native PDF readers on Windows, macOS and Ubuntu. However, when trying to open it in Adobe Acrobat Pro DC, it fails to open and reports an error.

When run through this PDF checker tool (https://www.pdf-online.com/osa/repair.aspx), it outputs the following error:

0x80410306 - E - The "Length" key of the stream object is wrong.
    - Object No.: 10
    - File: Generated_Merged_File.pdf

When repaired, this PDF can then be opened in Adobe Acrobat Pro DC.
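
For anyone who wants to look for the same kind of mismatch locally, here is a rough sketch in plain Node.js with no pdf-lib calls. The helper name `findLengthMismatches` is made up for illustration. It assumes that a stream's /Length is written as a direct integer in the stream dictionary; indirect references such as `10 0 R` are skipped, so treat it as a quick sanity check and not a real validator.

```javascript
// Hypothetical helper, not part of pdf-lib.
// For each stream whose dictionary holds a direct integer /Length, check that
// the `endstream` keyword sits right after the declared number of bytes.
function findLengthMismatches(pdfBytes) {
  // latin1 keeps a one-to-one mapping between bytes and string characters
  const text = Buffer.from(pdfBytes).toString('latin1');
  const mismatches = [];
  const streamStart = />>\s*stream(\r\n|\n)/g;
  let match;
  while ((match = streamStart.exec(text)) !== null) {
    // Naively treat everything since the previous `obj` keyword as the dictionary
    const dictStart = Math.max(0, text.lastIndexOf('obj', match.index));
    const dict = text.slice(dictStart, match.index);
    // Direct integers only; the lookahead rejects indirect references like "10 0 R"
    const length = /\/Length\s+(\d+)(?![\d\s]*R)/.exec(dict);
    if (!length) continue;
    const declared = Number(length[1]);
    const dataStart = match.index + match[0].length;
    // An optional end-of-line marker may precede `endstream`
    const tail = text.slice(dataStart + declared, dataStart + declared + 11);
    if (!/^(\r\n|\r|\n)?endstream/.test(tail)) {
      mismatches.push({ offset: match.index, declared });
    }
  }
  return mismatches;
}
```

Passing it the Uint8Array returned by `mergedPdf.save()` should list the offsets of any streams whose declared length does not land on `endstream`.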

When opened in RUPS, here is the basic structure and the stream length of the above object:

Here is the RUPS view of the repaired PDF (notice the differing stream length highlighted):

What did you expect to happen?

The PDF file opens correctly in Adobe Acrobat Pro DC.

How can we reproduce the issue?

Here is the original PowerPoint PDF file that is retrieved from our API (this PDF itself opens fine in Adobe Acrobat)
simple_ppt.pdf

Here is the generated PDF after it is passed through PDF-Lib and has gone through the merge process (this does NOT open in Adobe Acrobat)
Generated_Merged_File.pdf

Here is the output of the repaired PDF using the tool https://www.pdf-online.com/osa/repair.aspx (this does open in Adobe Acrobat)
Generated_Merged_File.pdf_recovered.pdf

Example code snippets are shown above.

Version

1.16.0

What environment are you running pdf-lib in?

Node

Checklist

My report includes a Short, Self Contained, Correct (Compilable) Example.
I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

No response

Trapfether · 2022-04-16T00:48:31Z

I have determined that the issue is caused by license text that is embedded with the font. The license text contains PDF keywords that confuse the stream parser in pdf-lib.

I'm working on an improvement that would make pdf-lib resilient to this particular issue.
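
To illustrate the general failure mode, here is a toy sketch. It is not pdf-lib's actual parser and not the change in the PR; the function names `readByKeyword` and `readByLength` and the sample text are invented for the example. It shows how a reader that stops at the first `endstream` keyword gets cut short when the stream data itself contains that keyword, while a reader that trusts a direct integer /Length does not.

```javascript
// Toy stream body that contains PDF keywords inside its own data,
// similar in spirit to license text embedded in a font program.
const payload = 'License: do not emit endstream or endobj in this text.';
const objectText =
  `10 0 obj\n<< /Length ${payload.length} >>\nstream\n${payload}\nendstream\nendobj\n`;

// Fragile: treat the first `endstream` after the stream keyword as the end.
function readByKeyword(text) {
  const start = text.indexOf('stream\n') + 'stream\n'.length;
  const end = text.indexOf('endstream', start);
  // Drop the end-of-line marker that normally precedes `endstream`
  return text.slice(start, end).replace(/\r?\n$/, '');
}

// Sturdier: read exactly the number of bytes the dictionary declares.
// Assumes /Length is a direct integer and the data is single-byte text.
function readByLength(text) {
  const declared = Number(/\/Length\s+(\d+)/.exec(text)[1]);
  const start = text.indexOf('stream\n') + 'stream\n'.length;
  return text.slice(start, start + declared);
}

console.log(readByKeyword(objectText) === payload); // false: cut off at the embedded keyword
console.log(readByLength(objectText) === payload);  // true
```

A real parser cannot always rely on /Length alone (it may be an indirect reference or simply wrong), which is why some keyword-based fallback is usually still needed.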

ahaganDEV · 2022-05-09T11:12:31Z

@Trapfether I see your PR has been open for over 3 weeks now. Do you know if it is likely to be merged and released soon?

Trapfether · 2022-05-11T01:25:05Z

@ahaganDEV a new release of pdf-lib is cut every few months as needed. I haven't yet received any contact or feedback from the maintainer, so I doubt it will be released soon.

In the meantime, you can apply my changes to your local copy, depending on how you use pdf-lib. I use the browser-based version, so I run the build myself and use the resulting files.

If you're using the backend version, you can use npm link, or maintain your own repository and install the package from that repository instead of this one. However, you would want to check back periodically and switch back to this repository once the change has been merged, so that you also get later patches.

ahaganDEV added bug needs-triage labels Apr 5, 2022

Trapfether linked a pull request Apr 16, 2022 that will close this issue

Make stream parser resilient to text in streams #1215

Open
