Enhancement: repairing latin binomials #2

Open
jrjhealey opened this issue Aug 9, 2018 · 2 comments

@jrjhealey

Hi Jaime,

Possible enhancement for you!

If pybtex doesn't already correct this, it would be good if this could also incorporate the fix for correctly italicising Latin binomials (a fairly simple search-and-replace to switch HTML italics tags to TeX format tags). There's an old script online (below) which does essentially this, but it isn't the best Python in the world...
Inspired by:

https://twitter.com/MendeleySupport/status/776001527664156672

and

https://itskathylam.wordpress.com/2016/01/12/dealing-with-italics-in-bibtex-files-exported-from-mendeley/

#!/usr/bin/python
 
# By: Kathy Lam
# Date: January 11, 2016
# Purpose: Replace all instances of "<i>" with "\textit{"
#          and "</i>" with "}" in bibtex file generated by Mendeley
 
oldbib = open("bibliography.bib", "r")
newbib = open("new_bibliography.bib", "w")
 
for line in oldbib:
    if line.startswith("title"):
        if "<i>" in line:
            fixed_open_tags = line.replace("<i>", "\\textit{")
            fixed_both = fixed_open_tags.replace("</i>", "}")
            newbib.write(fixed_both)
        else:
            newbib.write(line)
    else:
        newbib.write(line)
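
A tidier take on the same idea might look like the sketch below. It is not part of the original script: the function names `fix_italics` and `fix_bibfile` are made up for illustration, and stripping leading whitespace before the `title` check is an assumption so that indented fields are caught too.

```python
def fix_italics(line: str) -> str:
    """Swap HTML italics tags for \\textit{...} on BibTeX title lines only."""
    # Assumption: the field name may be indented and in any letter case.
    if line.lstrip().lower().startswith("title"):
        return line.replace("<i>", "\\textit{").replace("</i>", "}")
    return line


def fix_bibfile(src: str, dst: str) -> None:
    """Copy src to dst, fixing italics tags on title lines along the way."""
    # Context managers close both files even if an error occurs midway.
    with open(src, encoding="utf-8") as oldbib, \
            open(dst, "w", encoding="utf-8") as newbib:
        for line in oldbib:
            newbib.write(fix_italics(line))
```

Calling `fix_bibfile("bibliography.bib", "new_bibliography.bib")` would mirror the file names used in the script above.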

If there was some logic to catch and handle duplicate entries that would be really useful too (a problem I end up with quite often).

Cheers!

Joe

@jaimergp
Owner

Hi Joe! Thanks for the feedback.

I'd say we should regex against some common HTML code in titles (italics, subscript, and superscript, mainly). Do you have any examples at hand?
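
As a rough sketch of what that regex pass could look like (an illustration only, not what the project implements): the `TAG_TO_LATEX` mapping and the `html_to_latex` name are hypothetical, and the choice of LaTeX commands is an assumption. Older LaTeX installations may need the `fixltx2e` package for `\textsubscript`.

```python
import re

# Hypothetical mapping from HTML tag names to LaTeX commands.
TAG_TO_LATEX = {
    "i": r"\textit",
    "em": r"\emph",
    "sub": r"\textsubscript",
    "sup": r"\textsuperscript",
}

# Matches an opening or closing tag for any of the names above.
_TAG_RE = re.compile(r"<(/?)(i|em|sub|sup)>", re.IGNORECASE)


def html_to_latex(text: str) -> str:
    """Replace simple, attribute-free HTML tags with LaTeX commands."""
    def _swap(match: re.Match) -> str:
        closing, tag = match.group(1), match.group(2).lower()
        # A closing tag becomes "}", an opening tag becomes "\command{".
        return "}" if closing else TAG_TO_LATEX[tag] + "{"

    # A function replacement sidesteps backslash escaping in the output.
    return _TAG_RE.sub(_swap, text)


print(html_to_latex("Growth of <i>E. coli</i> in H<sub>2</sub>O"))
# prints: Growth of \textit{E. coli} in H\textsubscript{2}O
```

Tags with attributes (for example `<i class="...">`) are deliberately not handled here.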

For the duplicate entries, let's create a separate issue.

@jrjhealey
Author

jrjhealey commented Aug 13, 2018

Yep ok good idea! I'll open another issue for duplicates.

I'll commit a folder of different examples that I come up with to my fork of the repo, and then make a PR so you can test against them too perhaps?

So far I've thought of an example each of:

  • Italicised text
  • Sub/superscripts (chemical formulae etc.)
  • Duplicated bib entries.
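
For the duplicated entries, one possible check is to group citation keys by a normalised title. This is only a sketch under assumptions: it takes titles that have already been parsed into a dict keyed by citation key, the names `normalise_title` and `find_duplicates` are hypothetical, and matching on title alone will miss duplicates whose titles differ.

```python
import re
from collections import defaultdict


def normalise_title(title: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", title.lower())


def find_duplicates(titles_by_key: dict) -> list:
    """Return groups of citation keys whose normalised titles match."""
    groups = defaultdict(list)
    for key, title in titles_by_key.items():
        groups[normalise_title(title)].append(key)
    # Only groups with more than one key are duplicates.
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


entries = {
    "Smith2016": "The genome of {E. coli}",
    "Smith2016a": "The Genome of E. coli.",
    "Jones2018": "Something else",
}
print(find_duplicates(entries))
# prints: [['Smith2016', 'Smith2016a']]
```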

In my experience it's quite good at converting special characters in names etc., so that's probably enough to cover 90% of the troublesome refs.

Edit:

It looks like sub/superscripts might be difficult, as Mendeley (which I export my bib files from) just coerces them to normal case letters/numbers (they have no HTML around them).
