🐛 Bug: Support MathJax custom tags #50

Open
ljrk0 opened this issue Jul 20, 2022 · 2 comments
Labels
bug Something isn't working enhancement New feature or request plugin


ljrk0 commented Jul 20, 2022

Describe the bug
MathJax is a JavaScript library that lets authors add "custom tags" such as $...$ to HTML, which are then turned into, e.g., MathML or whatever else the browser supports.

Depending on the Markdown implementation, math is either not supported at all or supported directly through the same syntax. Either way, it probably makes the most sense to keep $...$ expressions intact and not escape the strings inside them. A simple filter for that would certainly work, but MathJax allows configuring delimiters other than $...$ for inline math and $$...$$ for display math, e.g., in the article https://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/:

<script>
window.MathJax = {
  tex: {
    tags: "ams",
    inlineMath: [ ['$','$'], ['\\(', '\\)'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
  },
  options: {
    skipHtmlTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
  },
  loader: {
    load: ['[tex]/amscd']
  }
};
</script>

Honoring such a configuration would necessitate parsing JavaScript, though ...
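For simple configurations like the one above, a full JavaScript parser may not be strictly needed. The following is only a rough heuristic sketch in Python (not the library's language, and not an existing API): the hypothetical helper `extract_delimiters` pulls the delimiter pairs listed under a key such as `inlineMath` out of the script text with regular expressions. It assumes plain quoted string literals without escaped quotes, and it would be fooled by comments or by configs built dynamically.

```python
import re

# One ['open', 'close'] pair, optionally followed by a comma.
PAIR = re.compile(r"""\s*\[\s*(['"])(.*?)\1\s*,\s*(['"])(.*?)\3\s*\]\s*,?""", re.S)


def extract_delimiters(script_text, key):
    """Return the (open, close) pairs listed under `key` in a MathJax config.

    Heuristic only: this scans the text, it does not parse JavaScript.
    """
    start = re.search(re.escape(key) + r"\s*:\s*\[", script_text)
    if start is None:
        return []
    pairs = []
    pos = start.end()
    while True:
        match = PAIR.match(script_text, pos)
        if match is None:
            break
        # A JS string literal writes one backslash as "\\"; undo that.
        opener = match.group(2).replace("\\\\", "\\")
        closer = match.group(4).replace("\\\\", "\\")
        pairs.append((opener, closer))
        pos = match.end()
    return pairs
```

Called with the script text above and the key `inlineMath`, this should yield the `$`...`$` and `\(`...`\)` pairs. Anything it cannot read would have to fall back to a default or user-supplied delimiter list.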

HTML Input

some formula: $\lambda$

Generated Markdown

some formula: $\\lambda$

Expected Markdown

some formula: $\lambda$

Additional context
This filter (or "unfilter") could be activated only if MathJax is detected and disabled otherwise. Further, as mentioned earlier, more sophisticated parsing of the HTML could be used to detect the precise math delimiters in use, or the delimiters could at least be made configurable.
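To make the requested behavior concrete, here is a minimal Python sketch of such an "unfilter" with configurable delimiters. The names `escape_markdown` and `escape_outside_math`, and the set of characters that get escaped, are assumptions for illustration and not the converter's actual escaping rules. Text outside math spans is escaped, while anything between a matching opener and closer is passed through untouched.

```python
import re

# Assumed, simplified set of Markdown characters to escape.
MARKDOWN_SPECIALS = re.compile(r"([\\`*_\[\]#])")

DEFAULT_DELIMITERS = (("$$", "$$"), ("$", "$"), ("\\(", "\\)"))


def escape_markdown(text):
    """Backslash-escape the characters matched by MARKDOWN_SPECIALS."""
    return MARKDOWN_SPECIALS.sub(r"\\\1", text)


def escape_outside_math(text, delimiters=DEFAULT_DELIMITERS):
    """Escape Markdown everywhere except inside math delimiters."""
    # Try longer openers first so "$$" wins over "$".
    ordered = sorted(delimiters, key=lambda pair: len(pair[0]), reverse=True)
    out = []
    pos = 0
    plain_start = 0
    while pos < len(text):
        for opener, closer in ordered:
            if not text.startswith(opener, pos):
                continue
            end = text.find(closer, pos + len(opener))
            if end == -1:
                continue  # no closing delimiter: treat as plain text
            out.append(escape_markdown(text[plain_start:pos]))
            out.append(text[pos:end + len(closer)])  # math span, kept verbatim
            pos = end + len(closer)
            plain_start = pos
            break
        else:
            pos += 1
    out.append(escape_markdown(text[plain_start:]))
    return "".join(out)
```

With the HTML input above, `escape_markdown` alone reproduces the generated output with the doubled backslash, while `escape_outside_math` leaves the formula as written. The sketch ignores escaped dollar signs and will misread prose such as prices ("$5 and $6") as math, which is one reason detection of MathJax or explicit configuration would still be needed.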

@ljrk0 ljrk0 added the bug Something isn't working label Jul 20, 2022
@JohannesKaufmann JohannesKaufmann added enhancement New feature or request plugin labels Jul 23, 2022
@JohannesKaufmann
Owner

I don't think reading the content between the $ signs will always work, since the math can also be rendered server-side. Luckily, it seems that both MathJax and KaTeX (also) support the <math> tag.

So a math plugin would need to support both methods:

it will typically have a $\lambda$-expression as argument.
<mjx-assistive-mml unselectable="on" display="inline">
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>λ</mi>
  </math>
</mjx-assistive-mml>
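As a rough illustration of the second method, the Python sketch below walks the HTML with the standard library's `html.parser` and collects the text content of every `<math>` element. The class name `MathMLCollector` is hypothetical, and the sketch only detects math and gathers its text; it does not attempt the much harder job of converting MathML back to LaTeX.

```python
from html.parser import HTMLParser


class MathMLCollector(HTMLParser):
    """Collect the text content of each <math> element in a document."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the current <math>
        self.current = []   # text pieces of the <math> being read
        self.found = []     # one string per completed <math> element

    def handle_starttag(self, tag, attrs):
        if tag == "math" or self.depth:
            self.depth += 1

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.found.append("".join(self.current))
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data.strip())


def collect_math(html):
    """Return the text of every <math> element found in `html`."""
    collector = MathMLCollector()
    collector.feed(html)
    collector.close()
    return collector.found
```

For the `<mjx-assistive-mml>` snippet above, `collect_math` returns a single entry containing `λ`. The depth counter assumes well-formed markup in which every element inside `<math>` is closed.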

I won't add this plugin anytime soon, as it would be a lot of work. But this plugin should exist! Ideally maintained by someone better at math than me 😅

I'm planning a v2 of the library. Maybe I will add it then...


You could already help by collecting various snippets from websites you encounter. They should cover a variety of uses (e.g. client-side rendering, server-side rendering, different libraries, content that looks like math but is NOT, ...)

See this file as an example. It follows this pattern:

<!-- https://example.com/page1 -->
<div>snippet 1</div>

<hr />

<!-- https://example.com/page1 -->
<p>snippet 2</p>

<hr />

...
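A collection in this pattern is easy to read back when writing tests. The Python sketch below uses a hypothetical helper `parse_snippets` that splits such a file on `<hr />` and pairs each snippet with the URL in its leading comment. It assumes that every snippet starts with exactly one URL comment and that the snippets themselves contain no `<hr>` elements.

```python
import re

# Matches a leading "<!-- url -->" comment followed by the snippet body.
HEADER = re.compile(r"<!--\s*(\S+)\s*-->\s*(.*)", re.S)


def parse_snippets(text):
    """Split a snippet file into (source_url, html) pairs."""
    snippets = []
    for chunk in re.split(r"<hr\s*/?>", text):
        match = HEADER.match(chunk.strip())
        if match:
            snippets.append((match.group(1), match.group(2).strip()))
    return snippets
```

Chunks without a leading URL comment, such as the empty tail after the last `<hr />`, are skipped.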

Author

ljrk0 commented Jul 25, 2022

Thanks for implementing #49 so quickly!

Yeah, MathJax supports LaTeX-style input, MathML, and AsciiMath. Converting MathML to Markdown, however, is probably quite a lot of work. Simply "passing through" dollar signs, if so configured in the scripts, may work well enough for most use cases, though?

I've just noticed that pandoc can do just the thing:

pandoc --from=html+tex_math_dollars+tex_math_single_backslash+tex_math_double_backslash \
       --to=markdown \
       --output=foo.md \
       input.html

You can also choose --to=html to convert, e.g., $\lambda. \dots$ to:

<span class="math inline"><em>λ</em><em>i</em>.…</span>

That works well enough for my use cases for now. Adding real $ support is quite tricky, especially when it comes to finding the closing delimiter etc.

Regardless, I will collect examples I stumble upon :)
