support for tagged PDFs (PDF/UA, PDF/A) #832

Open
tobwen opened this issue Mar 15, 2024 · 5 comments
@tobwen

tobwen commented Mar 15, 2024

feature request

Would it be possible to add support for tagged PDFs, e.g. as used in PDF/UA (accessibility)?

PDF specification

All the specifications have been released for free (no costs!):

additional information

When creating accessible PDFs, some processes such as retagging or moving tags (e.g. footnotes) must be done manually via Acrobat or a few compatible editors. It would be great if this could be done via JSON, for example.
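
To make the request more concrete, here is a minimal sketch of what moving a tag via JSON could look like. Everything in it is an assumption for illustration: the schema (`tag`, `id`, `kids`) and the helpers `find` and `move_tag` are hypothetical and are not an existing pdfcpu format or API.

```python
import json

# Hypothetical schema (an assumption, not a pdfcpu format):
# every structure element has a "tag", an "id" and a list of "kids".
TREE_JSON = """
{
  "tag": "Document", "id": "doc", "kids": [
    {"tag": "P",    "id": "p1",  "kids": []},
    {"tag": "Note", "id": "fn1", "kids": []},
    {"tag": "P",    "id": "p2",  "kids": []}
  ]
}
"""


def find(node, node_id, parent=None):
    """Depth-first search; returns (element, parent) or (None, None)."""
    if node["id"] == node_id:
        return node, parent
    for kid in node["kids"]:
        elem, owner = find(kid, node_id, node)
        if elem is not None:
            return elem, owner
    return None, None


def move_tag(root, node_id, new_parent_id, index=None):
    """Detach element node_id and reinsert it under new_parent_id.

    index=None appends the element as the last kid.
    """
    elem, old_parent = find(root, node_id)
    new_parent, _ = find(root, new_parent_id)
    if elem is None or new_parent is None:
        raise KeyError("unknown element id")
    if old_parent is None:
        raise ValueError("the root element cannot be moved")
    if find(elem, new_parent_id)[0] is not None:
        raise ValueError("cannot move an element into itself or its own subtree")
    # Remove by identity so an equal-looking sibling is never dropped by mistake.
    old_parent["kids"] = [k for k in old_parent["kids"] if k is not elem]
    kids = new_parent["kids"]
    kids.insert(len(kids) if index is None else index, elem)


tree = json.loads(TREE_JSON)
move_tag(tree, "fn1", "doc")  # move the footnote after the last paragraph
print([kid["id"] for kid in tree["kids"]])  # ['p1', 'p2', 'fn1']
```

A real implementation would also have to keep the marked-content references and the parent tree of the PDF in sync; the sketch only shows the tree edit itself.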

I can provide all kinds of demo files to support this endeavor.

@hhrutter
Collaborator

Hello!

I put this in the pipeline.
Can you provide maybe one or two small, representative demo files?

@tobwen
Author

tobwen commented Mar 15, 2024

I put this in the pipeline. Can you provide maybe one or two small, representative demo files?

Sure, I'll create some nice demo files.

I'm not in a position to fund the whole thing (at least not beyond the occasional pizza or maybe a small pizza oven), but I will do my best to contribute details, data and testing time.

@tobwen
Author

tobwen commented Apr 20, 2024

hint

I've handcrafted a tagged file. It isn't PDF/UA-valid, since I didn't embed the font and didn't add things like XMP metadata, but that makes it easier to understand how tagged PDFs work.

If you're interested, I can also supply full-featured files with complicated tables, but for those it is more difficult to create a "plain" version.

untagged PDF

%PDF-1.4
%µ¶

1 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /Kids [ 3 0 R ]
  /Count 1
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /MediaBox [ 0 0 612 792 ]
  /Resources <<
    /Font <<
      /F0 4 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Helvetica
>>
endobj

5 0 obj
<<
  /Length 89
>>
stream
BT
/F0 12 Tf
100 700 Td
(Hello, this is line 1.) Tj
0 -20 Td
(And this is line 2.) Tj
ET

endstream
endobj

xref
0 6
0000000000 65535 f 
0000000016 00000 n 
0000000070 00000 n 
0000000136 00000 n 
0000000291 00000 n 
0000000368 00000 n 

trailer
<<
  /Size 6
  /Root 1 0 R
>>
startxref
510
%%EOF
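
For anyone handcrafting a similar file: the `/Length` value of object 5 is the number of bytes between `stream` and `endstream`, not counting the end-of-line marker before `endstream`. A minimal sketch of that arithmetic, assuming LF line endings and ASCII-only text (the names `STREAM` and `stream_length` are made up for this example):

```python
# Content stream of object 5, assuming LF line endings and ASCII-only text.
STREAM = (
    "BT\n"
    "/F0 12 Tf\n"
    "100 700 Td\n"
    "(Hello, this is line 1.) Tj\n"
    "0 -20 Td\n"
    "(And this is line 2.) Tj\n"
    "ET\n"
)


def stream_length(data: str) -> int:
    """Return the byte count that belongs in the /Length entry."""
    return len(data.encode("latin-1"))


print(stream_length(STREAM))  # 89
```

With CRLF line endings every line would be one byte longer, and both `/Length` and the xref offsets would have to be adjusted.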

tagged PDF

%PDF-1.4
%µ¶

1 0 obj
<<
  /Pages 2 0 R
  /StructTreeRoot 5 0 R
  /Type /Catalog
>>
endobj

2 0 obj
<<
  /Count 1
  /Kids [ 3 0 R ]
  /Type /Pages
>>
endobj

3 0 obj
<<
  /Contents 10 0 R
  /MediaBox [ 0 0 612 792 ]
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F0 4 0 R
    >>
  >>
  /StructParents 0
  /Type /Page
>>
endobj

4 0 obj
<<
  /BaseFont /Helvetica
  /Subtype /Type1
  /Type /Font
>>
endobj

5 0 obj
<<
  /K 6 0 R
  /ParentTree 9 0 R
  /ParentTreeNextKey 1
  /Type /StructTreeRoot
>>
endobj

6 0 obj
<<
  /K [ 7 0 R 8 0 R ]
  /Lang (en-US)
  /P 5 0 R
  /S /Document
  /T ()
>>
endobj

7 0 obj
<<
  /K 0
  /P 6 0 R
  /Pg 3 0 R
  /S /P
  /T ()
>>
endobj

8 0 obj
<<
  /K 1
  /P 6 0 R
  /Pg 3 0 R
  /S /P
  /T ()
>>
endobj

9 0 obj
<<
  /Nums [ 0 [ 7 0 R 8 0 R ] ]
>>
endobj

10 0 obj
<<
  /Length 139
>>
stream
q
BT
/P<</MCID 0>>BDC
/F0 12 Tf
100 700 Td
(Hello, this is line 1.)Tj
EMC
/P<</MCID 1>>BDC
20 TL
0 -20 Td
(And this is line 2.)Tj
EMC
ET
Q

endstream
endobj

xref
0 11
0000000000 65535 f 
0000000016 00000 n 
0000000094 00000 n 
0000000160 00000 n 
0000000335 00000 n 
0000000412 00000 n 
0000000512 00000 n 
0000000605 00000 n 
0000000673 00000 n 
0000000741 00000 n 
0000000793 00000 n 

trailer
<<
  /Size 11
  /Root 1 0 R
>>
startxref
987
%%EOF
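
How the pieces of the tagged file connect: the page carries `/StructParents 0`, the parent tree (object 9) maps key 0 to an array indexed by MCID, and each array entry points to the structure element (objects 7 and 8) that owns that marked-content sequence. The sketch below follows that chain for this file only. It assumes an uncompressed stream with plain literal strings and `Tj`, and the names `CONTENT`, `PARENT_TREE`, `text_by_mcid` and `struct_elem_text` are made up for this example; it is not a general PDF parser.

```python
import re

# Operators relevant to tagging, trimmed from the content stream of object 10.
CONTENT = """\
/P<</MCID 0>>BDC
(Hello, this is line 1.)Tj
EMC
/P<</MCID 1>>BDC
(And this is line 2.)Tj
EMC
"""

# Parent tree from object 9: key = /StructParents value of the page,
# list index = MCID, list value = object number of the structure element.
PARENT_TREE = {0: [7, 8]}

# One marked-content sequence: tag, MCID, and everything up to the next EMC.
MC_SEQ = re.compile(r"/(\w+)\s*<<\s*/MCID\s+(\d+)\s*>>\s*BDC(.*?)EMC", re.S)
# A literal string shown with Tj (no escapes or nested parentheses handled).
SHOWN = re.compile(r"\(([^()\\]*)\)\s*Tj")


def text_by_mcid(content):
    """Map each MCID to (tag, text shown inside that sequence)."""
    result = {}
    for match in MC_SEQ.finditer(content):
        tag, mcid, body = match.groups()
        result[int(mcid)] = (tag, "".join(SHOWN.findall(body)))
    return result


def struct_elem_text(content, struct_parents):
    """Map structure element object numbers to the text they own."""
    owners = PARENT_TREE[struct_parents]
    return {
        owners[mcid]: text
        for mcid, (_tag, text) in text_by_mcid(content).items()
    }


print(struct_elem_text(CONTENT, 0))
```

Real-world files usually compress their streams and use `TJ` arrays, hex strings and escapes, so this only illustrates the lookup chain, not a robust extraction.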

@hhrutter
Collaborator

Please upload the file for analysis if you can.
Thanks for using pdfcpu 💚

@tobwen
Author

tobwen commented Apr 20, 2024

Please upload the file for analysis if you can.

untagged.pdf
tagged.pdf
