Plugin to detect math in pasted LaTeX-like text #18

Open
benrbray opened this issue Apr 1, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

benrbray commented Apr 1, 2021

Currently, consumers of prosemirror-math can set up custom paste behavior for their own configuration, but it would be helpful if we provided some tools to make it easier.

So, prosemirror-math should export an optional Plugin that detects dollar signs (or another user-configurable math delimiter) in pasted plain text and converts the encompassed text to an inline or block math node.

However, handling non-math dollar signs will be tricky, since they are unlikely to be escaped already in the pasted text. So, we will need to apply some common-sense criteria to decide whether a dollar sign delimits math or not. For example,

  • A closing math dollar sign is usually not preceded by a space, so no math node should be detected in the example Billy has $4 and Sally has $3. Instead, prosemirror-math should automatically escape these dollar signs on paste.
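A minimal sketch of the spacing rule above, assuming single-dollar inline delimiters only (block math with `$$` is not handled). The function names `escapeNonMathDollars` and `findClosingDollar` are hypothetical and are not part of the prosemirror-math API. The extra requirement that an opening dollar sign be followed by a non-space character is an added assumption that mirrors the closing rule.

```typescript
// Hypothetical helper: returns the index of the closing "$" that pairs with
// the "$" at `open`, or -1 if the spacing rule says this is not math.
function findClosingDollar(text: string, open: number): number {
  // Assumption: an opening "$" must be followed by a non-space character.
  const next = text.charAt(open + 1);
  if (next === "" || /\s/.test(next)) return -1;
  for (let j = open + 1; j < text.length; j++) {
    const ch = text.charAt(j);
    if (ch === "\\") {
      j++; // skip the escaped character (e.g. "\$")
      continue;
    }
    if (ch === "$") {
      // Rule from the issue: no space right before a closing "$".
      const nonEmpty = j > open + 1;
      return nonEmpty && !/\s/.test(text.charAt(j - 1)) ? j : -1;
    }
  }
  return -1;
}

// Hypothetical helper: keeps "$...$" spans that pass the spacing rule and
// escapes every other unescaped "$" as "\$".
function escapeNonMathDollars(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (ch === "\\" && i + 1 < text.length) {
      out += ch + text.charAt(i + 1); // leave existing escapes untouched
      i += 2;
    } else if (ch !== "$") {
      out += ch;
      i += 1;
    } else {
      const close = findClosingDollar(text, i);
      if (close === -1) {
        out += "\\$"; // not math: escape the dollar sign
        i += 1;
      } else {
        out += text.slice(i, close + 1); // math: copy the span verbatim
        i = close + 1;
      }
    }
  }
  return out;
}
```

With this sketch, pasting the Billy and Sally sentence would escape both dollar signs, while a span such as `$x^2$` would be left intact for conversion to a math node.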
@benrbray benrbray added the enhancement New feature or request label Apr 1, 2021

bohrium commented Apr 1, 2021

Here is an interesting but probably too-rare-to-worry-about case:

        The interstate highway system cost roughly $10^13 (in _today_'s dollars) to build; $\exp(2\pi i)=1$.

Here, a greedy approach based on common-sense heuristics might classify 10^13 (in _today_'s dollars) to build; as LaTeX. That match would consume the middle dollar sign and so prevent \exp(2\pi i)=1 from being considered. This is the sort of interaction that dynamic programming handles well, though the case may be too rare to worry about.
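One way the dynamic-programming idea could look, as a sketch rather than a proposed implementation: treat every dollar sign as either literal or as one end of a math span, and pick the set of non-overlapping spans with the highest total score. The function name `bestMathSpans` and the `score` callback are hypothetical; any scoring heuristic can be plugged in, and escaped dollar signs are ignored here for brevity.

```typescript
type Span = [number, number]; // [index of opening "$", index of closing "$"]

// Hypothetical helper: chooses non-overlapping "$...$" spans that maximize
// the summed score. Runs in O(n^2) calls to `score` for n dollar signs.
function bestMathSpans(text: string, score: (inner: string) => number): Span[] {
  const dollars: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text.charAt(i) === "$") dollars.push(i);
  }
  const n = dollars.length;

  // best[k]: best total score using only dollar signs k..n-1.
  // choice[k]: index of the partner of dollar k, or -1 if k stays literal.
  const best: number[] = [];
  const choice: number[] = [];
  for (let i = 0; i <= n; i++) {
    best.push(0);
    choice.push(-1);
  }

  for (let k = n - 1; k >= 0; k--) {
    best[k] = best[k + 1]; // option 1: dollar k is a literal dollar sign
    for (let j = k + 1; j < n; j++) {
      // option 2: dollar k opens a span that dollar j closes
      const inner = text.slice(dollars[k] + 1, dollars[j]);
      const total = score(inner) + best[j + 1];
      if (total > best[k]) {
        best[k] = total;
        choice[k] = j;
      }
    }
  }

  // Walk the recorded choices to recover the selected spans.
  const spans: Span[] = [];
  let k = 0;
  while (k < n) {
    const j = choice[k];
    if (j === -1) {
      k += 1;
    } else {
      spans.push([dollars[k], dollars[j]]);
      k = j + 1;
    }
  }
  return spans;
}
```

Because a span is only chosen when it strictly improves the total, a text whose candidate spans all score at or below zero yields no math spans at all.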

Some ideas for features with which to classify (weights set by intuition, not learned programmatically from data):
a. LaTeX-y characters: 8 * (number of backslashes) + 1 * (number of underscores, carets, or curly braces)
b. numeric content: 1 * (number of digit characters) + 3 * (number of plus signs, minus signs, and equals signs)
c. bracket consistency: -13 * (1 if the curly braces are unbalanced (in the Catalan-number sense) else 0)
d. spacing context: -5 * (number of non-white-space characters immediately outside the dollar signs)
e. word count: -2 * (number of (space-delimited) blocks between the dollar signs)

In fact, simply thresholding a + e >= 1 would probably work well.
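The features above could be computed roughly as follows. This is a sketch under stated assumptions: the names `mathFeatures`, `looksLikeMath`, and `MathFeatures` are hypothetical, `open` and `close` are the indices of the two dollar signs in the pasted text, and the weights are the intuition-based ones listed in the comment, not values learned from data.

```typescript
interface MathFeatures {
  a: number; // LaTeX-y characters
  b: number; // numeric content
  c: number; // bracket consistency
  d: number; // spacing context
  e: number; // word count
}

// Counts non-overlapping matches of a global regex in a string.
function countMatches(s: string, re: RegExp): number {
  return (s.match(re) || []).length;
}

// Returns true if curly braces never close before opening and all get closed.
function bracesBalanced(s: string): boolean {
  let depth = 0;
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if (ch === "{") depth += 1;
    if (ch === "}") depth -= 1;
    if (depth < 0) return false;
  }
  return depth === 0;
}

// Hypothetical helper: scores the candidate span between the dollar signs
// at indices `open` and `close` of `text`.
function mathFeatures(text: string, open: number, close: number): MathFeatures {
  const inner = text.slice(open + 1, close);
  // charAt returns "" when out of range, which counts as "no character".
  const outside = [text.charAt(open - 1), text.charAt(close + 1)];
  const crowded = outside.filter((ch) => /\S/.test(ch)).length;
  const words = inner.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return {
    a: 8 * countMatches(inner, /\\/g) + countMatches(inner, /[_^{}]/g),
    b: countMatches(inner, /[0-9]/g) + 3 * countMatches(inner, /[+\-=]/g),
    c: bracesBalanced(inner) ? 0 : -13,
    d: -5 * crowded,
    e: -2 * words,
  };
}

// The simple threshold suggested above: a + e >= 1.
function looksLikeMath(f: MathFeatures): boolean {
  return f.a + f.e >= 1;
}
```

On the highway example, the first candidate span has three LaTeX-y characters and six space-delimited blocks, so a + e = 3 - 12 = -9 and it is rejected, while the \exp(2\pi i)=1 span has two backslashes and two blocks, so a + e = 16 - 4 = 12 and it is accepted.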
