Incorrect hyphenation in situations including dashes for some languages (e.g. polish) #3235

Closed
1 task done
jakubkaczor opened this issue Jan 20, 2024 · 5 comments · Fixed by #4058
Labels
bug Something isn't working text Text layout, shaping, internationalization, etc.

Comments

@jakubkaczor

jakubkaczor commented Jan 20, 2024

Description

In some languages (Polish[1], for example), when a phrase consisting of words joined with a hyphen is broken across lines at that hyphen, the hyphen must stay at the end of the line and also be repeated at the beginning of the next line. Typst doesn't do this. A minimal example follows.

#set text(lang: "pl")
#set par(justify: true)

#{100 * [biało-czerwony ]}

[Image: incorrect compilation results]

The correct hyphenation would be

biało-
-czerwony
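
The rule can be sketched in a few lines of Python. This is only an illustration of the expected output, not how Typst lays out text. The function name `break_at_hyphen` and its `repeat_hyphen` parameter are made up for this example.

```python
def break_at_hyphen(word: str, repeat_hyphen: bool = True) -> tuple[str, str]:
    """Split a hyphenated compound at its first hyphen.

    Returns (end_of_line, start_of_next_line). With repeat_hyphen=True the
    hyphen stays at the end of the first line and is repeated on the next.
    """
    head, sep, tail = word.partition("-")
    if not sep:
        raise ValueError(f"no hyphen in {word!r}")
    return head + "-", ("-" + tail if repeat_hyphen else tail)


# Behavior requested in this issue for Polish:
print(break_at_hyphen("biało-czerwony"))         # ('biało-', '-czerwony')
# Behavior without the repeated hyphen:
print(break_at_hyphen("biało-czerwony", False))  # ('biało-', 'czerwony')
```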

I wasn't sure whether to file this as a bug report or a feature request. I settled on a bug caused by a missing feature.


1: https://web.archive.org/web/20240120154340/https://www.ortograf.pl/zasady-pisowni/lacznik-zasady-pisowni

Reproduction URL

No response

Operating system

Linux

Typst version

  • I am using the latest version of Typst
@jakubkaczor
Author

jakubkaczor commented Jan 20, 2024

I would also like to link the English documentation for the LaTeX polski package, which provides macros for working around this issue. It also defines macros for the en-dash (ppauza) and em-dash (pauza), as there are additional rules regarding them, such as a ban on breaking a line before them. I am not sure whether these additional rules are a problem in Typst, but I will make the title more general so that a separate issue isn't needed if they are.

@jakubkaczor jakubkaczor changed the title Incorrect hyphenation on words joined with hyphen in some languages (e.g. in polish) Incorrect hyphenation in situations including dashes for some languages (e.g. in polish) Jan 20, 2024
@jakubkaczor jakubkaczor changed the title Incorrect hyphenation in situations including dashes for some languages (e.g. in polish) Incorrect hyphenation in situations including dashes for some languages (e.g. polish) Jan 20, 2024
@Enivex Enivex added the text Text layout, shaping, internationalization, etc. label Jan 21, 2024
@Omikhleia

Omikhleia commented Jan 31, 2024

To share something back: after seeing this issue, SILE implemented the intended behavior for Polish in v0.14.15.

Along the way, we discussed and noted (but have left unaddressed as of this writing) which other languages might need the same behavior: possibly Czech, Slovak, Portuguese, Spanish, and Basque.

Just to be clear (one never knows): my intent is to share information back, not to brag about what SILE does or doesn't do, in the open-minded spirit that we all need good solutions for language-specific concerns.

@tomas-vl

tomas-vl commented Feb 2, 2024

I can confirm that this behavior is desired in Czech, Slovak, Lower Sorbian (probably also Upper Sorbian), and Croatian as well, as described in their orthographic manuals (i.e. Pravidla českého pravopisu, Pravidlá slovenského pravopisu, Dolnoserbski pšawopis, Hrvatski pravopis):

  • Czech: Jestliže se rozdělují části slova nebo výrazů, mezi nimiž se píše spojovník, do dvou řádků, píše se spojovník na konci řádku i na začátku řádku následujícího. (If parts of a word or phrase joined by a hyphen are divided across two lines, the hyphen is written at the end of the line and at the beginning of the following line.)
  • Slovak: Ak zložené slová písané so spojovníkom rozdeľujeme na mieste tohto rozdeľovacieho znamienka, spojovník píšeme na konci prvého aj na začiatku nasledujúceho riadka (spojovník zopakujeme). (If compound words written with a hyphen are divided at the hyphen, the hyphen is written both at the end of the first line and at the beginning of the next line (the hyphen is repeated).)
  • Lower Sorbian: Mit Bindestrich zusammengesetzte Wörter werden so getrennt, daß der Bindestrich der Zusammensetzung gleichzeitig Silbentrennungszeichen ist. Er kann aber auch zu Beginn der folgenden Zeile vor dem ersten Buchstaben stehen. (Compound words are hyphenated in such a way that the hyphen of the compound is also the hyphenation mark. It can also be placed at the beginning of the following line before the first letter.) (Excuse my German translation, please.)
  • Croatian: Ako se riječi koje se pišu sa spojnicom ipak prenose u novi redak, spojnica se zapisuje i na početku novoga retka. (If words containing a hyphen are split on a new line, the hyphen is also written at the beginning of the new line.)
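
One way a layout engine could act on this list is a per-language lookup. The sketch below is hypothetical. The language codes are my assumption (ISO 639-1 where one exists, ISO 639-3 `dsb` for Lower Sorbian), and the names `REPEAT_HYPHEN_LANGS` and `repeats_hyphen` are invented for illustration.

```python
# Languages reported so far in this thread as repeating the hyphen.
# The codes are an assumption of this sketch, not taken from Typst.
REPEAT_HYPHEN_LANGS = frozenset({
    "pl",   # Polish (original report)
    "cs",   # Czech
    "sk",   # Slovak
    "dsb",  # Lower Sorbian
    "hr",   # Croatian
})


def repeats_hyphen(lang: str) -> bool:
    """True if `lang` is in the set of languages listed above."""
    return lang.lower() in REPEAT_HYPHEN_LANGS


print(repeats_hyphen("cs"))  # True
print(repeats_hyphen("en"))  # False (not in the set)
```

A set keyed by language code keeps the rule opt-in, so languages nobody has documented keep their current behavior.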

@gabriel-araujjo
Contributor

gabriel-araujjo commented Apr 27, 2024

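Portuguese orthography also has this hyphenation rule.

According to Base XX of the "Acordo Ortográfico da Língua Portuguesa de 1990":

  1. Na translineação de uma palavra composta ou de uma combinação de palavras em que há um hífen ou mais, se a partição coincide com o final de um dos elementos ou membros, deve, por clareza gráfica, repetir-se o hífen no início da linha imediata: ex- -alferes, serená- -los-emos ou serená-los- -emos, vice- -almirante. (When a compound word or a combination of words containing one or more hyphens is divided across lines, and the break coincides with the end of one of its elements or members, the hyphen must, for graphic clarity, be repeated at the beginning of the next line: ex- -alferes, serená- -los-emos or serená-los- -emos, vice- -almirante.) [1][2]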

Below are short excerpts from practical articles that explain how to hyphenate such words.

When separating words with hyphens, where the line break coincides with a hyphen, it is important to repeat the hyphen at the beginning of the next line. Examples:

  • Guarda-/-chuva
  • Micro-/-ondas
  • Anti-/-inflamatório [3]

How to hyphenate words that already have a hyphen?

Can you put two hyphens? Does this exist?

The answer is yes!

It works like this:

Existem jovens pobres, que puxam as mangas, trabalham, e se tornam bem-
-sucedidos na vida. E também existem jovens que, dentro da família, são mal-
-acostumados e acabam não se esforçando e não tendo sucesso.

One hyphen is on the top line and another on the bottom line! [4]

When the word already has a hyphen, and it occurs at the end of a line, the hyphen is repeated at the beginning of the next line: deu-/-te; arco-/-da-/-velha; cor-/-de-/-rosa. [5]
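
For compounds with more than one hyphen, every interior hyphen is a candidate break point, as in serená- -los-emos and serená-los- -emos above. Here is a small Python sketch of that enumeration. The function name `hyphen_breaks` is hypothetical, and this is not the code from the commits referenced below.

```python
def hyphen_breaks(word: str) -> list[tuple[str, str]]:
    """List every way to break `word` at one hyphen, repeating the hyphen."""
    breaks = []
    for i, ch in enumerate(word):
        # Only interior hyphens are break opportunities.
        if ch == "-" and 0 < i < len(word) - 1:
            breaks.append((word[: i + 1], "-" + word[i + 1:]))
    return breaks


for first, second in hyphen_breaks("serená-los-emos"):
    print(first, "/", second)
# serená- / -los-emos
# serená-los- / -emos
```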

Footnotes

  1. https://www2.senado.leg.br/bdsf/bitstream/handle/id/508145/000997415.pdf

  2. http://www.priberam.pt/docs/AcOrtog90.pdf

  3. https://plataforma.hexag.online/blog-noticias/translineacao-o-que-e

  4. https://guiadoestudante.abril.com.br/redacao/saiba-o-que-e-translineacao-e-como-aplicar-na-redacao

  5. https://www.jn.pt/artes/dossiers/portugues-atual/regras-de-translineacao-3291887.html/

gabriel-araujjo added a commit to gabriel-araujjo/typst that referenced this issue May 2, 2024
@gabriel-araujjo
Contributor

In Spanish we must repeat the hyphen too, according to the RAE [1].

§ 4.1.1.1.2

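e) Cuando, al dividir un compuesto o cualquier otra expresión formada por varias palabras unidas con guion (v. § 4.1.1.2), este signo coincida con el final de línea, deberá escribirse otro guion al comienzo del renglón siguiente: léxico-/ -semántico, crédito-/ -vivienda, calidad-/-precio. Con ello se evita que quien lee pueda considerar que la palabra o expresión dividida se escribe sin guion. (When, in dividing a compound or any other expression formed by several words joined with a hyphen (see § 4.1.1.2), this sign coincides with the end of the line, another hyphen must be written at the beginning of the following line: léxico-/ -semántico, crédito-/ -vivienda, calidad-/-precio. This prevents the reader from assuming that the divided word or expression is written without a hyphen.)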

... except when the next word is capitalized. E.g.: Ruiz-/ Giménez must not repeat the hyphen on the next line.

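La repetición del guion a comienzo de línea es innecesaria en el caso de los antropónimos y topónimos compuestos, ya que la mayúscula inicial del segundo componente indica de forma suficiente que el guion no es meramente indicativo de final de línea: Ruiz-/ Giménez no podría interpretarse más que como la partición de Ruiz-Giménez, y nunca de ⊗RuizGiménez, pues, como se indica en el capítulo IV, § 4.3.1 y 5.2, la mayúscula intercalada no se usa en español más allá de siglas y nombres comerciales. (Repeating the hyphen at the beginning of the line is unnecessary for compound personal names and place names, since the initial capital of the second component is enough to show that the hyphen is not merely an end-of-line marker: Ruiz-/ Giménez could only be read as the division of Ruiz-Giménez, and never of ⊗RuizGiménez, because, as indicated in chapter IV, § 4.3.1 and 5.2, internal capitals are not used in Spanish beyond acronyms and trade names.)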
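
The capitalization exception could be expressed as a check on the first character after the hyphen. This is a rough Python sketch under that assumption. The function name `break_spanish` is made up, and treating "starts with an uppercase letter" as a stand-in for "compound personal or place name" is a simplification of the RAE wording.

```python
def break_spanish(word: str) -> tuple[str, str]:
    """Break a Spanish compound at its first hyphen.

    The hyphen is repeated on the next line unless the second component
    starts with an uppercase letter (used here as a proxy for compound
    personal names and place names).
    """
    head, sep, tail = word.partition("-")
    if not sep or not tail:
        raise ValueError(f"cannot break {word!r} at a hyphen")
    repeat = not tail[0].isupper()
    return head + "-", ("-" + tail if repeat else tail)


print(break_spanish("léxico-semántico"))  # ('léxico-', '-semántico')
print(break_spanish("Ruiz-Giménez"))      # ('Ruiz-', 'Giménez')
```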

Footnotes

  1. https://www.rae.es/ortografía/como-signo-de-división-de-palabras-a-final-de-línea

gabriel-araujjo added a commit to gabriel-araujjo/typst that referenced this issue May 4, 2024
- Czech
- Croatian
- Lower Sorbian
- Polish
- Portuguese
- Slovak
- Spanish

Fix typst#3235
gabriel-araujjo added six more commits with the same message to gabriel-araujjo/typst that referenced this issue between May 8 and May 15, 2024 (May 8, May 11, May 14 twice, May 15 twice).