Plugin to detect math in pasted LaTeX-like text #18

benrbray · 2021-04-01T13:57:16Z

Currently, consumers of prosemirror-math can set up custom paste behavior for their own configuration, but it would be helpful if we provided some tools to make it easier.

So, prosemirror-math should export an optional Plugin that detects dollar signs (or another user-configurable math delimiter) in pasted plain text and converts the encompassed text to an inline or block math node.
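A minimal sketch of the detection step follows. It assumes the plugin first splits the pasted string into plain-text and math segments, with delimiters the user can configure. The `Segment` shape, the `Delimiters` configuration, and `splitPastedText` are hypothetical and not part of prosemirror-math. Turning the segments into nodes (for example from ProseMirror's `clipboardTextParser` editor prop) is not shown.

```typescript
// Hypothetical segment shape: plain text, or the body of an inline/display math span.
type MathKind = "inline" | "display";

type Segment =
  | { kind: "text"; content: string }
  | { kind: MathKind; content: string };

// Hypothetical user-configurable [open, close] delimiter pairs.
interface Delimiters {
  inline: [string, string];
  display: [string, string];
}

const DOLLAR_DELIMITERS: Delimiters = {
  inline: ["$", "$"],
  display: ["$$", "$$"],
};

// If the opener starts at index i and a closer follows with a non-empty body,
// return that body and the index just past the closer; otherwise null.
function matchAt(
  text: string,
  i: number,
  [open, close]: [string, string],
): { body: string; next: number } | null {
  if (!text.startsWith(open, i)) return null;
  const start = i + open.length;
  const end = text.indexOf(close, start);
  if (end <= start) return null; // no closer (-1) or an empty body
  return { body: text.slice(start, end), next: end + close.length };
}

// Naive left-to-right split with NO heuristics: every delimiter pair becomes math.
function splitPastedText(
  text: string,
  delims: Delimiters = DOLLAR_DELIMITERS,
): Segment[] {
  const segments: Segment[] = [];
  let buffer = "";
  const flush = () => {
    if (buffer) segments.push({ kind: "text", content: buffer });
    buffer = "";
  };
  let i = 0;
  while (i < text.length) {
    // Try display delimiters first so "$$" is not read as an empty inline pair.
    let kind: MathKind = "display";
    let hit = matchAt(text, i, delims.display);
    if (!hit) {
      kind = "inline";
      hit = matchAt(text, i, delims.inline);
    }
    if (hit) {
      flush();
      segments.push({ kind, content: hit.body });
      i = hit.next;
    } else {
      buffer += text[i];
      i++;
    }
  }
  flush();
  return segments;
}
```

Run on the issue's own example, `splitPastedText("Billy has $4 and Sally has $3")` produces an inline math segment `"4 and Sally has "`. That is why some criterion for non-math dollar signs is needed on top of plain delimiter matching.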

However, handling non-math dollar signs will be tricky, since they are unlikely to be escaped already in the pasted text. So we will need some common-sense criterion to decide whether a given dollar sign delimits math. For example:

There are usually no spaces before a closing math dollar sign, so no math node should be detected in "Billy has $4 and Sally has $3". When such text is pasted, prosemirror-math should escape those dollar signs automatically.
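One concrete form such a criterion could take is the rule Pandoc documents for its `tex_math_dollars` extension. Under that rule, the opening `$` needs a non-space character immediately to its right. The closing `$` needs a non-space character immediately to its left and must not be followed immediately by a digit. The sketch below applies that rule to inline math only. Pairing each candidate opener with just the next dollar sign is a simplifying assumption, and `splitInlineMath` and its helpers are hypothetical names. How the rejected dollars get escaped is left open.

```typescript
type Segment = { kind: "text" | "math"; content: string };

// Opening "$": must be followed immediately by a non-space character.
function canOpen(text: string, i: number): boolean {
  return text[i] === "$" && i + 1 < text.length && !/\s/.test(text[i + 1]);
}

// Closing "$": must be preceded by a non-space character and not followed by a digit.
function canClose(text: string, i: number): boolean {
  if (text[i] !== "$" || i === 0 || /\s/.test(text[i - 1])) return false;
  const next = text[i + 1];
  return next === undefined || !/[0-9]/.test(next);
}

function splitInlineMath(text: string): Segment[] {
  const out: Segment[] = [];
  let buffer = "";
  const flush = () => {
    if (buffer) out.push({ kind: "text", content: buffer });
    buffer = "";
  };
  let i = 0;
  while (i < text.length) {
    if (canOpen(text, i)) {
      // Candidate closer: the very next dollar sign (simplifying assumption).
      const j = text.indexOf("$", i + 1);
      if (j > i + 1 && canClose(text, j)) {
        flush();
        out.push({ kind: "math", content: text.slice(i + 1, j) });
        i = j + 1;
        continue;
      }
    }
    // Not a usable delimiter: keep the character as literal text.
    buffer += text[i];
    i++;
  }
  flush();
  return out;
}
```

With this rule, "Billy has $4 and Sally has $3" stays a single text segment because the second dollar sign has a space before it. A price range such as "costs $5-$10" also stays literal, since the candidate closer is followed by a digit.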

bohrium · 2021-04-01T19:26:52Z

Here is an interesting but probably too-rare-to-worry-about case:

        The interstate highway system cost roughly $10^13 (in _today_'s dollars) to build; $\exp(2\pi i)=1$.

Here, a greedy approach based on common-sense heuristics might classify "10^13 (in _today_'s dollars) to build; " as LaTeX. That span would absorb the middle dollar sign and so prevent "\exp(2\pi i)=1" from ever being considered. This is the sort of interaction that dynamic programming handles well, but the case might be too rare to worry about.
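The dynamic-programming idea can be sketched as follows. It assumes, as in TeX, that an inline math span cannot contain an unescaped `$`, so each opening dollar sign can only pair with the next one. The goal is a set of non-overlapping pairs with the highest total score. `choosePairs` and the `Scorer` callback are hypothetical; any span-scoring function, such as the hand-weighted features proposed in the next comment, could be plugged in.

```typescript
// Hypothetical span scorer: higher means "more likely to be math".
type Scorer = (inner: string) => number;

// Choose non-overlapping pairs of *adjacent* dollar signs that maximise the
// total score, keeping only pairs that score above zero. Returns the
// [open, close] indices of the chosen dollar signs.
function choosePairs(text: string, score: Scorer): Array<[number, number]> {
  const dollars: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "$") dollars.push(i);
  }
  const n = dollars.length;
  // best[k] = highest total achievable using only dollars k..n-1.
  const best: number[] = new Array(n + 2).fill(0);
  const pairHere: boolean[] = new Array(n).fill(false);
  for (let k = n - 2; k >= 0; k--) {
    const s = score(text.slice(dollars[k] + 1, dollars[k + 1]));
    const skip = best[k + 1]; // dollar k stays literal
    const take = s > 0 ? s + best[k + 2] : -Infinity; // dollars k, k+1 delimit math
    pairHere[k] = take > skip;
    best[k] = Math.max(skip, take);
  }
  // Walk forward, replaying the decisions recorded above.
  const pairs: Array<[number, number]> = [];
  let k = 0;
  while (k < n - 1) {
    if (pairHere[k]) {
      pairs.push([dollars[k], dollars[k + 1]]);
      k += 2;
    } else {
      k += 1;
    }
  }
  return pairs;
}
```

As a toy illustration, take a scorer that gives 1 to any span and 10 to spans containing a backslash. A greedy scan accepting the first positive span would take the "10^13 ... build; " span. `choosePairs` instead prefers the single `\exp(2\pi i)=1` span, for a total of 10 versus 1.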

Some ideas for features with which to classify a candidate span between two dollar signs (weights set by intuition, not learned from data):
a. LaTeX-y characters: 8 * (number of backslashes) + 1 * (number of underscores, carets, or curly braces)
b. numeric content: 1 * (number of digit characters) + 3 * (number of plus signs, minus signs, and equals signs)
c. bracket consistency: -13 * (1 if the curly-brace pattern is unbalanced (in the Catalan sense) else 0)
d. spacing context: -5 * (number of non-whitespace characters immediately outside the dollar signs)
e. word count: -2 * (number of space-delimited blocks between the dollar signs)
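The five features can be transcribed into a scoring function. This is one reading of the weights above, not a tested implementation. The function names, the `open`/`close` index parameters, and treating escaped `\{` `\}` like ordinary braces are all assumptions.

```typescript
// Hypothetical per-feature breakdown, so individual terms can be inspected.
interface FeatureScores { a: number; b: number; c: number; d: number; e: number; total: number; }

// Count the matches of a global regex in a string.
function count(s: string, re: RegExp): number {
  return (s.match(re) || []).length;
}

// True when every "}" closes an earlier "{" and none are left open
// (a balanced, Catalan-style sequence). Escaped \{ \} are not special-cased.
function bracesBalanced(s: string): boolean {
  let depth = 0;
  for (let i = 0; i < s.length; i++) {
    if (s[i] === "{") depth++;
    else if (s[i] === "}" && --depth < 0) return false;
  }
  return depth === 0;
}

// `open` and `close` are the indices of the two dollar signs in `text`.
function featureScore(text: string, open: number, close: number): FeatureScores {
  const inner = text.slice(open + 1, close);
  const a = 8 * count(inner, /\\/g) + count(inner, /[_^{}]/g); // LaTeX-y characters
  const b = count(inner, /[0-9]/g) + 3 * count(inner, /[+\-=]/g); // numeric content
  const c = bracesBalanced(inner) ? 0 : -13; // bracket consistency
  // Spacing context: the character just before the opener and just after the closer.
  const outside = [text[open - 1], text[close + 1]].filter(
    (ch) => ch !== undefined && !/\s/.test(ch),
  ).length;
  const d = -5 * outside;
  const e = -2 * inner.split(/\s+/).filter((w) => w.length > 0).length; // word count
  return { a, b, c, d, e, total: a + b + c + d + e };
}
```

On "Billy has $4 and Sally has $3", the span between the two dollar signs scores a = 0, b = 1, c = 0, d = -5, e = -8, for a total of -12.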

In fact, simply thresholding a + e >= 1 would probably work well.
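The suggested shortcut keeps only the LaTeX-character count (a) and the word-count penalty (e). Below is a minimal sketch; `looksLikeMath` and its helpers are hypothetical names, and the argument is the text between a candidate pair of dollar signs.

```typescript
// Feature a: 8 per backslash, 1 per underscore, caret, or curly brace.
function latexiness(inner: string): number {
  const backslashes = (inner.match(/\\/g) || []).length;
  const others = (inner.match(/[_^{}]/g) || []).length;
  return 8 * backslashes + others;
}

// Feature e: -2 per whitespace-delimited block between the dollar signs.
function wordPenalty(inner: string): number {
  return -2 * inner.split(/\s+/).filter((w) => w.length > 0).length;
}

// Hypothetical classifier: treat the span as math when a + e >= 1.
function looksLikeMath(inner: string): boolean {
  return latexiness(inner) + wordPenalty(inner) >= 1;
}
```

Both spans from the example above come out as hoped: `\exp(2\pi i)=1` scores 16 - 4 = 12, while "10^13 (in _today_'s dollars) to build; " scores 3 - 12 = -9. One consequence is that a single-word span needs at least 3 from feature a. So `$x$` (score -2) would stay literal, while `$x_{i}$` (score 1) would become math.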

benrbray added the "enhancement" (New feature or request) label Apr 1, 2021
