
Unicode multiplication sign not a binary operator #3203

Open
mycarrhiza opened this issue Mar 5, 2024 · 2 comments
Labels
  • Accepted: Issue has been reproduced by MathJax team
  • Code Example: Contains an illustrative code example, solution, or work-around
  • Merged: Merged into develop branch
  • Test Needed
  • v3
Milestone: v4.0

Comments

@mycarrhiza

The symbol × is not properly rendered: ${a × b} \neq {a \mathbin{×} b}$. The fix used here is $a \mathbin{×} b$.

Obviously this isn't a huge problem, but it's not ideal either; I've run across other symbols that should clearly be mathbin or mathrel but whose Unicode characters MathJax does not classify correctly.

In the meantime, can someone help with these questions:

  • How can I tell MathJax to always use a certain math class for a symbol?
  • Is there a way to determine which class MathJax is using for a given symbol?
@dpvc
Member

dpvc commented Mar 6, 2024

MathJax doesn't go out of its way to handle Unicode input, so this is one of those cases where things don't work as one would like. For characters that don't have a specific meaning defined in MathJax (the way ^ and _ do), two places determine which TeX class is used: the RANGES list at

https://github.com/mathjax/MathJax-src/blob/8565f9da973238e4c9571a86a4bcb281b1d98d9b/ts/core/MmlTree/OperatorDictionary.ts#L83

and the operator dictionary (OPTABLE) starting at

https://github.com/mathjax/MathJax-src/blob/8565f9da973238e4c9571a86a4bcb281b1d98d9b/ts/core/MmlTree/OperatorDictionary.ts#L171

The RANGES list breaks the Unicode code points into groups of characters that share the same TeX class and the same MathML element type. This is a rough grouping, not a perfect mapping of characters to classes; it is meant to get most things into an acceptable form. For example, it allows Latin and Greek letters to be put into mi elements, while arrows are put into mo elements. Because these are broad groups, not every character in a given range is necessarily classified correctly.

In your case, the times symbol is U+00D7, and that ends up falling within

```ts
[0x00C0, 0x024F, TEXCLASS.ORD, 'mi'], // Latin-1 Supplement, Latin Extended-A, Latin Extended-B
```

which consists mostly of Latin letters but does include a few symbols that are not separated out, such as U+00D7 (×) and U+00F7 (÷). As a result, these two characters get class ORD and are placed in an mi, as though they were letters like the rest of that block. That is why you are getting the wrong spacing.
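To make the range lookup concrete, here is a minimal sketch of a range-table search in plain JavaScript. It is not MathJax's actual `getRange()`; the table is a hypothetical three-entry excerpt in the same `[start, end, texClass, node]` shape as the entry quoted above, and TeX classes are written as strings instead of MathJax's `TEXCLASS` constants. Only the Latin-1 / Latin Extended entry comes from MathJax; the Greek and Arrows entries are illustrative assumptions.

```javascript
// Hypothetical stand-in for MathJax's TEXCLASS constants (strings for readability).
const TEXCLASS = { ORD: 'ORD', BIN: 'BIN', REL: 'REL' };

// Illustrative excerpt of a ranges table: [start, end, texClass, nodeType].
// Only the 0x00C0-0x024F entry is quoted from MathJax; the others are assumptions.
const RANGES = [
  [0x00C0, 0x024F, TEXCLASS.ORD, 'mi'], // Latin-1 Supplement, Latin Extended-A/B
  [0x0370, 0x03FF, TEXCLASS.ORD, 'mi'], // Greek and Coptic (illustrative)
  [0x2190, 0x21FF, TEXCLASS.REL, 'mo'], // Arrows (illustrative)
];

// Return the first range containing the character's code point, or null.
function getRange(text) {
  const n = text.codePointAt(0);
  for (const range of RANGES) {
    if (n >= range[0] && n <= range[1]) {
      return range;
    }
  }
  return null;
}

// U+00D7 (215) lies between 0x00C0 (192) and 0x024F (591), so it is
// treated like a Latin letter: class ORD, placed in an <mi>.
console.log(getRange('\u00D7')); // [ 192, 591, 'ORD', 'mi' ]
```

The same happens to ÷ (U+00F7 = 247), which is why both characters end up spaced like letters.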

It would be possible to refine the RANGES list to handle these two characters better, but there is another approach that can be used, which relies on the operator dictionary. The operator dictionary is how MathML decides what the spacing should be around the various operators in mo elements. MathJax augments this to include what TeX class to use for each entry in the table. It turns out that there are entries for both U+00D7 and U+00F7, but because the operator table only applies to mo elements, and the RANGES table puts these two characters into mi instead, the table never gets used for them.

So an alternative to adjusting the RANGES table is to have the getRange() function first check whether the character being looked up is in the OPTABLE and, if so, return its TeX class and indicate that an mo is needed; otherwise, it looks through the RANGES table for the value to use. That way, anything for which MathJax already has better data in the OPTABLE is automatically placed in an mo, even if RANGES would have assigned some other node type.
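The lookup order just described can be modeled in standalone JavaScript, outside MathJax. This is a simplified sketch under stated assumptions, not MathJax's code: the `OPTABLE` is a hypothetical excerpt with `infix`/`prefix`/`postfix` maps whose entries carry the TeX class at index 2 (the index the v3 configuration that follows reads via `def[2]`), and its spacing numbers are placeholders.

```javascript
// Hypothetical stand-ins; real MathJax uses numeric TEXCLASS constants.
const TEXCLASS = { ORD: 'ORD', BIN: 'BIN' };

// Illustrative ranges table: [start, end, texClass, nodeType].
const RANGES = [
  [0x00C0, 0x024F, TEXCLASS.ORD, 'mi'], // Latin-1 Supplement, Latin Extended-A/B
];

// Illustrative operator dictionary excerpt. Entry shape assumed to be
// [lspace, rspace, texClass]; the spacing values are placeholders.
const OPTABLE = {
  infix: { '\u00D7': [4, 4, TEXCLASS.BIN], '\u00F7': [4, 4, TEXCLASS.BIN] },
  prefix: {},
  postfix: {},
};

// Range-only lookup (the original behavior).
function rangeLookup(text) {
  const n = text.codePointAt(0);
  return RANGES.find(([start, end]) => n >= start && n <= end) || null;
}

// Operator-table-first lookup: if the character has any operator entry,
// report its TeX class and request an <mo>; otherwise fall back to RANGES.
function getRange(text) {
  const def = OPTABLE.infix[text] || OPTABLE.prefix[text] || OPTABLE.postfix[text];
  return def ? [0, 0, def[2], 'mo'] : rangeLookup(text);
}

console.log(rangeLookup('\u00D7').slice(2)); // [ 'ORD', 'mi' ]
console.log(getRange('\u00D7').slice(2));    // [ 'BIN', 'mo' ]
console.log(getRange('\u00E9').slice(2));    // [ 'ORD', 'mi' ] (é has no operator entry)
```

Letters such as é still go through RANGES unchanged; only characters the operator table already knows about are rerouted to an mo.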

A MathJax configuration for v3 that does that is the following:

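```javascript
MathJax = {
  startup: {
    ready() {
      const OperatorDictionary = MathJax._.core.MmlTree.OperatorDictionary;
      // Keep a reference to the original range-based lookup.
      const {getRange, OPTABLE} = OperatorDictionary;
      // Replace getRange() so that characters with an operator-dictionary entry
      // are placed in an mo with the dictionary's TeX class (def[2]);
      // everything else falls back to the original RANGES lookup.
      OperatorDictionary.getRange = function (text) {
        const def = OPTABLE.infix[text] || OPTABLE.prefix[text] || OPTABLE.postfix[text];
        return (def ? [0, 0, def[2], 'mo'] : getRange(text));
      };
      MathJax.startup.defaultReady();
    }
  }
};
```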

This will be added in v4 (I am making a PR for it), but the code above will not work with v4. For v4 (now out in beta), you could use

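```javascript
MathJax = {
  startup: {
    ready() {
      const {RANGES} = MathJax._.core.MmlTree.OperatorDictionary;
      const {TEXCLASS} = MathJax._.core.MmlTree.MmlNode;
      // Replace the Latin-1 Supplement / Latin Extended entry (index 2 in RANGES)
      // with three entries that split U+00D7 out into its own mo range.
      RANGES.splice(
        2, 1,
        [0x00C0, 0x00D6, TEXCLASS.ORD, 'mi'],
        [0x00D7, 0x00D7, TEXCLASS.BIN, 'mo'],
        [0x00D8, 0x024F, TEXCLASS.ORD, 'mi']
      );
      MathJax.startup.defaultReady();
    }
  }
};
```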

This special-cases U+00D7 so that it is placed in an mo, which in turn causes the operator dictionary values to be used. U+00F7 could be handled similarly, if needed.
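Handling U+00F7 as well means cutting the Latin-1 range at more than one code point. The helper below is hypothetical (it is not part of MathJax) and only builds the replacement entries. It assumes the `[start, end, texClass, node]` shape used above, and that the code points are passed in ascending order and all lie inside the range being split.

```javascript
// Hypothetical helper: split one [start, end, texClass, node] range so that each
// listed code point gets its own single-character entry with a new class/node.
// Assumes `points` is sorted ascending and every point lies within the range.
function splitOut(range, points, texClass, node) {
  const [start, end, oldClass, oldNode] = range;
  const result = [];
  let from = start;
  for (const p of points) {
    if (p > from) {
      result.push([from, p - 1, oldClass, oldNode]); // part before the point
    }
    result.push([p, p, texClass, node]);             // the special-cased point
    from = p + 1;
  }
  if (from <= end) {
    result.push([from, end, oldClass, oldNode]);     // remainder after the last point
  }
  return result;
}

// Split the Latin-1 / Latin Extended range around × (U+00D7) and ÷ (U+00F7).
const parts = splitOut([0x00C0, 0x024F, 'ORD', 'mi'], [0x00D7, 0x00F7], 'BIN', 'mo');
console.log(parts.length); // 5
```

Inside the v4 `ready()` function above, the result could then be spliced in with something like `RANGES.splice(2, 1, ...splitOut(RANGES[2], [0x00D7, 0x00F7], TEXCLASS.BIN, 'mo'))`, assuming, as that configuration does, that index 2 holds the Latin-1 entry.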

dpvc added the Ready for Development, Accepted, v3, and Code Example labels on Mar 6, 2024
dpvc added this to the v4.0 milestone on Mar 6, 2024
@mycarrhiza
Author

Awesome, thanks for the explanation. And thanks for including the v4 workaround as well! (I just started using it a couple of days ago.)

dpvc added a commit to mathjax/MathJax-src that referenced this issue Mar 6, 2024
dpvc added a commit to mathjax/MathJax-src that referenced this issue Mar 16, 2024
Allow unknown characters to use operator table to determine class and node type.  (mathjax/MathJax#3203)
dpvc added the Merged label and removed the Ready for Review label on Mar 16, 2024