How can I use OpenPDF to make the exported PDF support Khmer language versions? #1156
Thank you very much for your answer; it has helped me make great progress on Khmer PDFs! The output looks almost correct, but I noticed a small issue that OpenPDF may not handle well. An example image is below. The OpenPDF version I am using is 1.3.43. `public class HelloWorld {`
@wang0331, could you provide a smaller example containing only the incorrect letters? Please compare the output of OpenPDF's LayoutProcessor with the output of HarfBuzz hb-view; see https://github.com/harfbuzz/harfbuzz/releases/tag/8.4.0
Thank you very much for your reply. For a minimal example, please refer to this: I compared it with the output of iText 8 + pdfCalligraph, and that output was clearly correct.
@wang0331, so you are not talking about displaying the characters in the PDF, but about extracting text from the PDF file in a PDF viewer. This task is quite complicated, and the extracted characters seem incorrect even with the current source code on GitHub. OpenPDF (master branch, compiled on 2024-05-11): Only the output with the experimental option
Analysis:

Font used: https://fonts.google.com/specimen/Siemreap

- 68 uni17A0
- 111 uni17CD
- 165 uni17D2_uni179C.zz02

Glyph 165 is a ligature and corresponds to two Unicode characters (U+17D2 and U+179C). The method java.awt.font.GlyphVector.getGlyphCharIndex does not return this correspondence, and I don't see a way to store a one-to-many mapping. So if the PDF text shown in a PDF viewer is selected and copied, the last character is lost.
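The one-to-many correspondence that `getGlyphCharIndex` cannot express is something the PDF format itself allows: in a ToUnicode CMap, a `bfchar` entry may map a single glyph code to a UTF-16BE string of several characters. The following is a minimal sketch, assuming Identity-H encoding (2-byte glyph IDs used as character codes); the class `ToUnicodeSketch` and its methods are hypothetical and not OpenPDF's API or how OpenPDF actually builds its CMaps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of OpenPDF): writes ToUnicode CMap "bfchar" entries,
// where one 2-byte glyph code may map to a string of several UTF-16 code units.
class ToUnicodeSketch {

    /** Formats one entry: <glyph id as 4 hex digits> <UTF-16BE hex of the text>. */
    static String bfchar(int glyphId, String text) {
        StringBuilder dst = new StringBuilder();
        for (char c : text.toCharArray()) {
            dst.append(String.format("%04X", (int) c));
        }
        return String.format("<%04X> <%s>", glyphId, dst);
    }

    /** Builds one bfchar section in insertion order (real CMaps allow at most 100 entries per section). */
    static String bfcharSection(Map<Integer, String> glyphToText) {
        StringBuilder sb = new StringBuilder();
        sb.append(glyphToText.size()).append(" beginbfchar\n");
        for (Map.Entry<Integer, String> e : glyphToText.entrySet()) {
            sb.append(bfchar(e.getKey(), e.getValue())).append('\n');
        }
        sb.append("endbfchar\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Glyph IDs and names from the Siemreap analysis above.
        Map<Integer, String> glyphToText = new LinkedHashMap<>();
        glyphToText.put(68, "\u17A0");        // uni17A0
        glyphToText.put(111, "\u17CD");       // uni17CD
        glyphToText.put(165, "\u17D2\u179C"); // uni17D2_uni179C.zz02: one glyph, two code points
        System.out.print(bfcharSection(glyphToText));
    }
}
```

Running `main` prints the entry `<00A5> <17D2179C>` for the ligature, so a viewer copying glyph 165 would get both characters back.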
Thank you for your patient answer, @vk-github18! But I think you may have misunderstood me. I didn't try to copy text from the PDF; I just tried to correctly export Khmer text copied from Microsoft Office Word. I am unable to export the given minimal example correctly with Java 8 and OpenPDF 1.3, but the results with versions 1.4 and 2.0 are acceptable. If you can successfully export this minimal Khmer example with 1.3, please share your OpenPDF 1.3 code.
@wang0331, I tested the minimal example with:

- OpenJDK Java 1.8.0
- OpenPDF branch 1.3-Java8

[ttx/GlyphOrder]

The method java.awt.Font.layoutGlyphVector() in Java 1.8 seems to return incorrect results; Java 11 and newer return correct results.
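To see the difference between JDK versions directly, one can print what `Font.layoutGlyphVector()` returns and run the same program on Java 8 and Java 11+. This is a minimal diagnostic sketch; the class `GlyphDump` is hypothetical, the font path is passed on the command line, and the default sample text (U+17A0 U+17D2 U+179C U+17CD) is an assumption built from the glyph names in the analysis above, not the original minimal example.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.io.File;

// Hypothetical diagnostic (not part of OpenPDF): prints the glyph codes and char indices
// that java.awt.Font.layoutGlyphVector() returns, for comparing Java 8 with Java 11+.
class GlyphDump {

    /** One line per glyph: "<glyph code> <char index>". */
    static String format(int[] glyphCodes, int[] charIndices) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < glyphCodes.length; i++) {
            sb.append(glyphCodes[i]).append(' ').append(charIndices[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a TrueType font file, e.g. the Siemreap font.
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(12f);
        // Assumed sample text; pass your own text as args[1].
        String text = args.length > 1 ? args[1] : "\u17A0\u17D2\u179C\u17CD";
        char[] chars = text.toCharArray();
        FontRenderContext frc = new FontRenderContext(null, true, true);
        GlyphVector gv = font.layoutGlyphVector(frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        int n = gv.getNumGlyphs();
        int[] codes = gv.getGlyphCodes(0, n, null);
        int[] charIndices = gv.getGlyphCharIndices(0, n, null);
        System.out.println("java.version=" + System.getProperty("java.version"));
        System.out.print(format(codes, charIndices));
    }
}
```

The glyph codes can then be matched against the ttx GlyphOrder table (for example, 165 for uni17D2_uni179C.zz02).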
Using OpenJDK Java 1.8.0 and OpenPDF branch 1.3-Java8, I used … See https://github.com/LibrePDF/OpenPDF/wiki/Multi-byte-character-language-support-with-TTF-fonts
@vk-github18 However, there may still be issues with the exported PDF. Can you share your code example for that JDK version? I want to check whether I missed some details myself.
@wang0331, sure, here is the example file: Running under Linux:
@vk-github18 If I don't use … If I use … Can I conclude that, with JDK 1.8 and OpenPDF 1.3.x, I am unable to export Khmer text fully correctly?
@wang0331, I don't see a simple solution for Java 1.8.
Using FOP, your examples look as follows:
I tried using OpenPDF with appropriate fonts to export Khmer text, but the rendered result is not entirely correct. OpenPDF and Apache FOP seem to solve the problem of drawing a single glyph formed from several combined characters, but the drawing order is wrong.
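The thread does not say which characters end up in the wrong place, but one well-known reordering step in Khmer shaping is that the vowel signs E, AE and AI (U+17C1 to U+17C3) are stored after the consonant cluster yet drawn to its left. The following is a simplified, character-based sketch of that single step, assuming a hypothetical class `KhmerPreBaseSketch`; it is not OpenPDF's, FOP's or any real shaper's algorithm, which reorders glyphs rather than characters and handles many more cases.

```java
// Simplified sketch (not a real shaping engine): reorders Khmer text from logical (typing)
// order into the visual order in which pre-base vowel signs are drawn.
// Not handled: split vowels (U+17BE..U+17C0, U+17C4, U+17C5) and pre-base COENG + RO.
class KhmerPreBaseSketch {
    static final char COENG = '\u17D2';

    static boolean isConsonant(char c) { return c >= '\u1780' && c <= '\u17A2'; }

    // Dependent vowels, signs and COENG, which attach to the preceding base (U+17B6..U+17D3).
    static boolean isCombining(char c) { return c >= '\u17B6' && c <= '\u17D3'; }

    // Vowel signs E, AE, AI: stored after the consonant but drawn to its left.
    static boolean isPreBaseVowel(char c) { return c == '\u17C1' || c == '\u17C2' || c == '\u17C3'; }

    static String toVisualOrder(String logical) {
        StringBuilder out = new StringBuilder();
        int clusterStart = 0; // index in `out` where the current cluster begins
        for (char c : logical.toCharArray()) {
            boolean afterCoeng = out.length() > 0 && out.charAt(out.length() - 1) == COENG;
            // A subscript consonant (after COENG) or a combining mark stays in the current cluster.
            boolean attaches = isCombining(c) || (afterCoeng && isConsonant(c));
            if (!attaches) {
                clusterStart = out.length();
            }
            if (isPreBaseVowel(c)) {
                out.insert(clusterStart, c); // move the vowel in front of its cluster
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String logical = "\u1780\u17C1"; // KA followed by vowel sign E, as typed
        for (char c : toVisualOrder(logical).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println(); // prints U+17C1 U+1780
    }
}
```

In a PDF, the ToUnicode mapping (or an ActualText span) is what keeps the logical order recoverable for copy and paste, while the glyphs themselves are drawn in visual order.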
iText 8 together with the iText add-on pdfCalligraph is known to export Khmer PDFs correctly.
This is my example code, based on: https://github.com/LibrePDF/OpenPDF/wiki/Multi-byte-character-language-support-with-TTF-fonts
```java
public class HelloWorld {
    // ...
}
```
I hope to receive support from the maintainers.
```xml
<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.xmlgraphics</groupId>
    <artifactId>fop</artifactId>
    <version>2.9</version>
</dependency>
<dependency>
    <groupId>org.apache.xmlgraphics</groupId>
    <artifactId>xmlgraphics-commons</artifactId>
    <version>2.9</version>
</dependency>
```
From Wang Xueren