markup: add --citeproc to pandoc converter #9953

Open

shoeffner

Adds the citeproc filter to the pandoc converter.

There are several PRs for this feature already. However, I think
simply adding --citeproc is the cleanest way to enable this feature,
with the option to flesh it out later, e.g., in #7529.

Some PRs and issues attempt adding more config options to Hugo which
indirectly configure pandoc, but I think simply configuring Pandoc via
Pandoc itself is simpler, as it is already possible with two YAML
blocks -- one for Hugo, and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!

There are other useful options, e.g., #4800 attempts to use nocite,
which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:

    ## Bibliography

Other useful options are csl: ... and link-citations: true, which
set the path to a custom CSL file and create HTML links between the
references and the bibliography.

The following issues and PRs are related:

Note:
This PR also adds a small piece of unrelated documentation to the external helpers, briefly mentioning MathJax. It certainly needs improvement, but that is out of scope for this PR.

@CLAassistant

CLAassistant commented May 30, 2022

CLA assistant check
All committers have signed the CLA.

@bep
Member

bep commented May 30, 2022

@shoeffner thanks for this. This PR is certainly easier to reason about (because of its size) than some of the others I've seen.

That said, the test failure on Linux shows that this needs some rethinking, as that flag is not available in the version we test on, fetched via:

sudo apt-get install -y pandoc

I assume this flag was added recently? I'm not sure how to handle this.

@shoeffner
Author

The Linux build seems to use pandoc 2.5, but --citeproc is only supported since 2.11 (https://pandoc.org/releases.html#pandoc-2.11-2020-10-11). On macOS, 2.18 is installed, so I didn't notice. Before 2.11, --filter pandoc-citeproc was the way to go.

If I remember correctly, pandoc-citeproc had to be installed manually, but I will check it out in a container and see what I can do!

@shoeffner
Author

choco and brew seem to install 2.18, so Windows and macOS should not need any specific handling.

On Linux we have everything from 2.5 (Ubuntu 20.04, e.g. GH actions) to 2.17 (Arch Linux). The most common version seems to be 2.9 (Debian stable, Ubuntu 22.04, and other distros), where pandoc-citeproc must be installed separately.

This makes the pull request much more complicated than what I was aiming for, so feel free to close it again. My current approach is now:

if pandoc >= 2.11
    add --citeproc
else if pandoc-citeproc available
    add --filter pandoc-citeproc
otherwise
    keep the old behavior

If citations are not supported (because of the pandoc version or a missing pandoc-citeproc dependency), SupportsCitations is false and the corresponding tests are skipped. I also added a remark to the docs.
However, I have not been able to test all configurations yet (i.e., Linux with newer pandoc, older pandoc, with and without pandoc-citeproc; I also did not test Windows), so I have marked the PR as "WIP".
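
A minimal Go sketch of the fallback described above, not the code from this PR. The function name `citeprocArgs` and its parameters are hypothetical; it assumes the pandoc version has already been detected and that the caller knows whether a `pandoc-citeproc` executable is available.

```go
package main

import "fmt"

// citeprocArgs returns the extra pandoc arguments for citation processing.
// major and minor are the detected pandoc version; hasFilter reports whether
// a pandoc-citeproc executable was found. All names are hypothetical.
func citeprocArgs(major, minor int, hasFilter bool) []string {
	switch {
	case major > 2 || (major == 2 && minor >= 11):
		// pandoc 2.11 and later ship citeproc as a built-in option.
		return []string{"--citeproc"}
	case hasFilter:
		// Older pandoc needs the separately installed filter.
		return []string{"--filter", "pandoc-citeproc"}
	default:
		// No citation support: keep the old behavior, add nothing.
		return nil
	}
}

func main() {
	fmt.Println(citeprocArgs(2, 18, false))
	fmt.Println(citeprocArgs(2, 9, true))
	fmt.Println(citeprocArgs(2, 5, false))
}
```

In this sketch, a nil result would correspond to SupportsCitations being false, so callers could use `len(args) > 0` as the support check.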

An alternative would be to check only the version and ignore pandoc-citeproc. This would be much simpler, and whoever wants to use citeproc might also want other recent pandoc features anyway. But it puts some burden on users, who would have to install a newer pandoc version.

What do you think?

@shoeffner shoeffner changed the title markup: add --citeproc to pandoc converter WIP: markup: add --citeproc to pandoc converter May 30, 2022
@shoeffner shoeffner force-pushed the citeproc branch 3 times, most recently from c71d9f8 to c000c6c Compare June 4, 2022 00:37
@shoeffner
Author

shoeffner commented Jun 4, 2022

I rebased the commits and squashed them.
Unless there is anything else or the CI fails again, the PR is done for the moment, so I will also remove the WIP label.

@shoeffner shoeffner changed the title WIP: markup: add --citeproc to pandoc converter markup: add --citeproc to pandoc converter Jun 4, 2022
@shoeffner shoeffner changed the title markup: add --citeproc to pandoc converter WIP: markup: add --citeproc to pandoc converter Jun 5, 2022
@shoeffner shoeffner changed the title WIP: markup: add --citeproc to pandoc converter markup: add --citeproc to pandoc converter Jun 5, 2022
@shoeffner
Author

I updated the PR again. I had made a mistake (err != nil instead of err == nil), which caused the tests to skip and never activated pandoc. It should now work as expected.
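
To illustrate the kind of check where this inversion matters, here is a small Go sketch. It assumes availability is tested with `exec.LookPath` from the standard library, which the thread does not confirm; the helper name `hasExecutable` is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// hasExecutable reports whether name can be found on PATH.
// exec.LookPath returns a nil error on success, so the check must be
// err == nil; writing err != nil inverts the result.
func hasExecutable(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

func main() {
	for _, name := range []string{"pandoc", "pandoc-citeproc"} {
		fmt.Printf("%s available: %v\n", name, hasExecutable(name))
	}
}
```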

@shoeffner
Author

Rebased on master.

@shoeffner
Author

Sorry for the late reaction. I fixed it to use argsv := []any{"--version"} instead of collections.StringSliceToInterfaceSlice....
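
For context, a hedged Go sketch of how the output of `pandoc --version` might be turned into a version string. It is not taken from the PR. It assumes the first output line looks like `pandoc 2.9.2.1` (or `pandoc.exe 2.18` on Windows), and the function name `extractPandocVersion` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionLine matches a first line such as "pandoc 2.9.2.1".
// The optional ".exe" suffix is an assumption about Windows builds.
var versionLine = regexp.MustCompile(`^pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)`)

// extractPandocVersion returns the dotted version found on the first line of
// the given `pandoc --version` output, and false if the line does not match.
func extractPandocVersion(output string) (string, bool) {
	first := strings.TrimSpace(strings.SplitN(output, "\n", 2)[0])
	m := versionLine.FindStringSubmatch(first)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	sample := "pandoc 2.9.2.1\nCompiled with pandoc-types 1.20\n"
	v, ok := extractPandocVersion(sample)
	fmt.Println(v, ok)
}
```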

@shoeffner
Author

Rebased on master.

@shoeffner
Author

Okay, comparing the versions as strings was a stupid idea: as strings, 2.5 sorts after 2.11. Now the version has a proper type and a function to compare two instances.

3c5e31f#diff-f662bf93836a7230b77cdb532095a04622eb9eda3054124e8ba9399786870efaR88-R94

Additionally, I included some test cases for those comparisons. I hope it will work now ;-)
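
The pitfall and one possible fix can be sketched in Go as follows. This is an illustrative sketch, not the type from the linked commit; the names `version`, `parseVersion`, and `compare` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds the numeric components of a dotted version such as 2.9.2.1.
type version []int

// parseVersion splits s on dots and converts each component to an integer.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimSpace(s), ".")
	v := make(version, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid version component %q in %q", p, s)
		}
		v = append(v, n)
	}
	return v, nil
}

// compare returns -1, 0, or 1 if v is lower than, equal to, or higher than o.
// Missing trailing components count as 0, so 2.11 equals 2.11.0.
func (v version) compare(o version) int {
	for i := 0; i < len(v) || i < len(o); i++ {
		var a, b int
		if i < len(v) {
			a = v[i]
		}
		if i < len(o) {
			b = o[i]
		}
		if a < b {
			return -1
		}
		if a > b {
			return 1
		}
	}
	return 0
}

func main() {
	a, b := "2.5", "2.11"
	// String comparison is lexicographic and looks at '5' versus '1'.
	fmt.Println("string compare says 2.5 < 2.11:", a < b)

	older, _ := parseVersion(a)
	minimum, _ := parseVersion(b)
	fmt.Println("numeric compare says 2.5 < 2.11:", older.compare(minimum) < 0)
}
```

Comparing component by component as integers is what makes 2.5 rank below 2.11, which a plain string comparison gets wrong.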

@shoeffner
Author

Unsupported pandoc should now cause the tests to SKIP instead of causing a panic.

@shoeffner
Author

Rebased on the current tip of the master branch.

@jacobmerson

@shoeffner thanks for putting this PR together. It is quite useful. Do you have a sense of whether this, or something like it, will end up getting merged? Being able to put citations in text is quite important.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension gohugoio#6101
  Bundles multiple requests; this PR tackles citation parsing.
- WIP: Bibliography with Pandoc gohugoio#4800
  Passes the frontmatter to Pandoc and still uses
  `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc gohugoio#7529
  That PR is much more extensive and might eventually supersede this PR,
  but I think `--bibliography` and `--citeproc` should be independent
  options (`--bibliography` should be optional and citeproc can always be
  specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. gohugoio#8610
  Similar to gohugoio#7529, gohugoio#8610 adds a new config option to Hugo.
  I think passing `--citeproc` and letting the users decide on the
  metadata they want to pass to pandoc is better, albeit uglier.

@shoeffner
Author

Thanks for getting back to this. I rebased the changes on master.

I don't know whether this will be merged or whether the pipelines still work, and I can't speak to the progress on alternative approaches, as I haven't followed the current developments much. Maybe @bep knows more about it.
