Multilingual bibliography #126

Open
wants to merge 16 commits into
base: main
from

@nrydanov nrydanov commented Jan 11, 2024

Currently, there is no support for the language field of an entry in CSL or BibLaTeX. This PR adds the ability to parse and use multiple languages in a single bibliography, depending on the value of each entry's language parameter.

@nrydanov nrydanov force-pushed the feature/language-specific-terms branch from 485f99e to 4f7bf0a Compare January 11, 2024 22:17
@nrydanov nrydanov changed the title Multimodal bibliography entries Multilangual bibliography Jan 11, 2024
@nrydanov nrydanov changed the title Multilangual bibliography Multilingual bibliography Jan 12, 2024
@nrydanov nrydanov marked this pull request as ready for review January 12, 2024 08:28
src/csl/mod.rs (outdated, resolved)
src/csl/mod.rs (outdated, resolved)
src/csl/mod.rs (resolved)
src/csl/mod.rs (outdated, resolved)
src/csl/mod.rs (outdated, resolved)
src/csl/rendering/mod.rs (outdated, resolved)
src/csl/rendering/names.rs (outdated, resolved)
Comment on lines 5 to 22
lazy_static! {
    static ref LANGUAGE_CODE_MAPPING: HashMap<&'static str, &'static str> =
        HashMap::from([
            ("english", "en"),
            ("german", "ge"),
            ("french", "fr"),
            ("russian", "ru"),
            ("italian", "it"),
            ("chinese", "cn"),
            ("japanese", "jp"),
            ("ukranian", "ua")
        ]);
}

/// This function returns mapping for required language
pub fn get_mapping(s: &str) -> Option<&str> {
    return LANGUAGE_CODE_MAPPING.get(s).copied();
}
Member

This mapping seems incomplete and strangely used. As I see it, it should only be required for BibLaTeX as CSL defines its language field in terms of ISO-639-1 codes with regions. We interpret this as RFC 5646 language tags.

According to the BibLaTeX manual, the langid field (p. 28) controls the language of a citation. Its value must be a Babel/Polyglossia language identifier, which includes RFC 5646 language tags. The normalization to RFC 5646 belongs in the biblatex crate, from where we can then create 1:1 LocaleCode structs in the interop module. You may want to use an external dependency in the biblatex crate for the language names.
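
For illustration, here is a minimal sketch of what such a normalization step might look like. The function name `normalize_langid` is hypothetical and not part of the biblatex crate, only a handful of Babel names are covered, and matching case-insensitively is an assumption. Note that the ISO 639-1 codes for German, Chinese, Japanese, and Ukrainian are `de`, `zh`, `ja`, and `uk`.

```rust
/// Hypothetical helper (not part of the biblatex crate): maps a few
/// Babel/Polyglossia language names to ISO 639-1 language subtags.
fn normalize_langid(name: &str) -> Option<&'static str> {
    // Assumption: names are compared case-insensitively after trimming.
    match name.trim().to_ascii_lowercase().as_str() {
        "english" => Some("en"),
        "german" | "ngerman" => Some("de"),
        "french" => Some("fr"),
        "russian" => Some("ru"),
        "italian" => Some("it"),
        "chinese" => Some("zh"),
        "japanese" => Some("ja"),
        "ukrainian" => Some("uk"),
        // Unknown names are left for the caller to handle.
        _ => None,
    }
}
```

A real implementation would need a far more complete table, which is why an external dependency for the language names may be the better route.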

Author

Ok, will change this soon.

Author

@nrydanov nrydanov Mar 24, 2024

Could you please help me understand how and where you convert LanguageIdentifier into LocaleCode that is used everywhere?

Update: I found out that the CLI doesn't perform this conversion at all. For now, I've added the following lines to the reference command parsing:

            for entry in &bibliography {
                let mut item = CitationItem::with_entry(entry);
                item.locale = Some(LocaleCode(String::from(
                    entry.language().unwrap().language.as_str(),
                )));
                driver.citation(CitationRequest::new(
                    vec![item],
                    &style,
                    locale.clone(),
                    &locales,
                    None,
                ))
            }

It works, but I don't really understand where this conversion should actually be done. It seems that it should happen somewhere earlier than this.

src/lang/mod.rs (outdated, resolved)
tests/citeproc.rs (resolved)
@nrydanov nrydanov marked this pull request as ready for review March 24, 2024 13:43
@nrydanov nrydanov requested a review from reknih March 24, 2024 13:44
@nrydanov
Author

Hello anyone?...

Member

@reknih reknih left a comment

While reviewing this PR, I have investigated how this fits into the CSL ecosystem.

citeproc.js and its use in Zotero is the reference implementation for CSL. When using Zotero to generate a bibliography, the app will ask the user to explicitly choose a language for that bibliography and will ignore the language field of each entry.

Screenshot of Zotero's Create Bibliography from Items modal with a language dropdown menu

This corresponds with our current behavior of using the locale in the BibliographyRequest. Merging this PR would lead to a divergence between Hayagriva and Zotero/citeproc.js and violate the CSL spec.

There is an extension to CSL called CSL-M which aims, among other things, to accommodate multilingual bibliographies. In it, a style file may contain multiple locale-specific layout elements for bibliographies and citations, as well as a locale-less fallback layout. Consider the following example:

<bibliography>
  <layout suffix="." locale="da de">
    <text macro="bibliography" />
  </layout>
  <layout suffix=".">
    <text macro="bibliography" />
  </layout>
</bibliography>

This file changes how citeproc.js retrieves term localizations:

  • An item with the language da-DK would be rendered with the locale="da de" layout and receive Danish terms.
  • An item with the language de-DE would be rendered with the locale="da de" layout but use Danish terms as well. This is because the locale attribute defines a fallback order for terms locales for all entries rendered with it.
  • An item with the language ja-JP does not match any locale-specific layout and would be rendered with the last, locale-less layout. Assuming the user specified en-US as their default locale in the dropdown, the citation would use American English terms rather than Japanese ones.
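
The selection rules above can be sketched roughly as follows. This models only the behavior described in this comment, not the actual citeproc.js algorithm. The `Layout` struct and the `select_layout` function are hypothetical names, and matching on the primary language subtag is an assumption.

```rust
/// Hypothetical model of a CSL-M `<layout>`: `locales` holds the entries of
/// the whitespace-separated `locale` attribute and is empty for the fallback.
struct Layout {
    locales: Vec<&'static str>,
}

/// Primary language subtag of a tag, e.g. "da-DK" -> "da".
fn primary(tag: &str) -> &str {
    tag.split('-').next().unwrap_or(tag)
}

/// Returns the index of the chosen layout and the locale used for terms.
fn select_layout<'a>(
    layouts: &[Layout],
    item_lang: &str,
    default_locale: &'a str,
) -> (usize, &'a str) {
    for (i, layout) in layouts.iter().enumerate() {
        // Assumption: a layout matches if any listed locale shares the
        // item's primary language subtag.
        if layout.locales.iter().any(|l| primary(l) == primary(item_lang)) {
            // Terms come from the first locale in the layout's list.
            return (i, layout.locales[0]);
        }
    }
    // No match: use the locale-less layout and the user's default locale.
    let fallback = layouts
        .iter()
        .position(|l| l.locales.is_empty())
        .unwrap_or(0);
    (fallback, default_locale)
}
```

With the two layouts from the example, `da-DK` and `de-DE` both select the first layout with Danish terms, while `ja-JP` falls through to the second layout with the default locale.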

You can try this using the Juris-M fork of Zotero, previously known as Multilingual Zotero (MLZ) that focusses on CSL-M support. This behavior requires the citation processor to supply multiple styles, one of which has multiple layouts (known as polyglot) and one with just one layout. You can see this in the CSL-M style repo, for example with jm-chicago-fullnote-bibliography-polyglot.csl and jm-chicago-fullnote-bibliography.csl.

I do not personally find the CSL-M implementation the best since it requires a proliferation of styles as well as manually listing each supported language in a style. However, I also do not think that we should unconditionally choose to violate the spec. One possible solution is a configuration option in BibliographyRequest to always prefer the terms from the language field of an entry. Another is to implement CSL-M, which would be a larger effort and require changes in citationberg.

What do you think would be the best course of action? Let's discuss it here before you push more code.

On a different note, I noticed that this PR still contains formatting changes to unchanged code, which I cannot reproduce. Please revert them to maintain a clean diff!

@@ -105,12 +105,17 @@ impl<'a, T: EntryLike + Hash + PartialEq + Eq + Debug> BibliographyDriver<'a, T>
let mut entry_set = IndexSet::new();
Member

Suggested change
- let mut entry_set = IndexSet::new();
+ let mut item_set = IndexSet::new();

@@ -320,8 +320,11 @@ fn main() {

let mut driver = BibliographyDriver::new();
for entry in &bibliography {
let mut item = CitationItem::with_entry(entry);
let id = entry.language().unwrap_or_default();
item.locale = Some(LocaleCode(String::from(id.language.as_str())));
Member

Instead of doing this, CitationItem::with_entry should use EntryLike's resolve_standard_variable feature to set the CitationItem's locale.
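
Roughly, the idea could look like the sketch below. All types here are simplified stand-ins written for this example. Hayagriva's real `EntryLike`, `CitationItem`, and `LocaleCode` have different definitions and signatures, so treat this as an outline of the control flow only.

```rust
// Simplified stand-ins; hayagriva's real types and signatures differ.
#[derive(Debug, Clone, PartialEq)]
struct LocaleCode(String);

#[allow(dead_code)]
enum StandardVariable {
    Language,
    Title,
}

trait EntryLike {
    /// Stand-in for the real method, which takes more arguments.
    fn resolve_standard_variable(&self, variable: StandardVariable) -> Option<String>;
}

#[allow(dead_code)]
struct CitationItem<'a, T: EntryLike> {
    entry: &'a T,
    locale: Option<LocaleCode>,
}

impl<'a, T: EntryLike> CitationItem<'a, T> {
    /// The constructor resolves the locale itself, so callers such as the
    /// CLI no longer need to set `item.locale` by hand.
    fn with_entry(entry: &'a T) -> Self {
        let locale = entry
            .resolve_standard_variable(StandardVariable::Language)
            .map(LocaleCode);
        Self { entry, locale }
    }
}

/// Hypothetical entry type used only to exercise the sketch.
struct DemoEntry {
    language: Option<&'static str>,
}

impl EntryLike for DemoEntry {
    fn resolve_standard_variable(&self, variable: StandardVariable) -> Option<String> {
        match variable {
            StandardVariable::Language => self.language.map(String::from),
            StandardVariable::Title => None,
        }
    }
}
```

An entry without a language simply yields `locale: None`, which avoids the `unwrap` and the default-value fallback in the loop above.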

@nrydanov
Author

Thanks for your answer!

I will study your findings and reply a bit later. At the moment, my domain knowledge is definitely lacking.

@nrydanov
Author

nrydanov commented Apr 17, 2024

Ok, I got it.

Well, I think the best idea for now is to just add a configuration option. Let me explain why:

  1. First of all, the primary goal of this PR is just to add support for the language field of entries.
  2. I would consider big changes to the project only if
    • it's clear that those changes would make things much better
    • there's enough time and resources to do it
    • the person implementing the feature completely understands what they are doing

I wouldn't say that this situation meets even one of these points, because

  • you said that you personally "do not find the CSL-M implementation the best", and there's no one in this project who can judge it better
  • I work on this PR in my free time, so I'm not sure I will be ready for such hard work, at least alone
  • I would need to do much more research if we want to achieve something bigger than a small feature

Also, I think a configuration option is a good idea because I personally didn't care about any CSL specs before today, and as an end user I don't care about them at all.
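
To make the proposal concrete, here is a rough sketch of how such an option could decide which locale supplies the terms. The enum `TermLocaleSource` and the function `term_locale` are hypothetical names invented for this example and are not existing hayagriva API.

```rust
/// Hypothetical configuration option; not an existing hayagriva type.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum TermLocaleSource {
    /// Spec-conformant default: always use the locale of the request.
    Request,
    /// Opt-in: prefer the entry's language, fall back to the request locale.
    PreferEntry,
}

/// Picks the locale whose terms are used when rendering one entry.
fn term_locale<'a>(
    source: TermLocaleSource,
    request_locale: &'a str,
    entry_language: Option<&'a str>,
) -> &'a str {
    match source {
        TermLocaleSource::Request => request_locale,
        TermLocaleSource::PreferEntry => entry_language.unwrap_or(request_locale),
    }
}
```

Keeping `Request` as the default would preserve the current spec-conformant behavior unless a user opts in.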

@nrydanov
Author

nrydanov commented Apr 17, 2024

I'll also review the formatting changes and revert them. I still can't understand why we have different formatting rules ;)

I'll also squash the commits when the PR is done.

@reknih
Member

reknih commented May 8, 2024

Sounds good. I'll review the change once it hits this PR.
