Converting HTML to markdown doesn't appear to preserve HTML entities #431

Open
sebpowell opened this issue Mar 23, 2023 · 2 comments

sebpowell commented Mar 23, 2023

Consider the following HTML example:

<p>I think &amp;</p>

When I try converting this to Markdown using Turndown, I get the following output:

I think &

I guess I would expect Turndown to preserve HTML entities and to output something like this instead:

I think &amp;

I couldn't see an option to turn this on, so I assume I need to use something like https://www.npmjs.com/package/html-entities. Am I missing anything obvious?

Here's the config I'm using:

import Turndown from "turndown";

const INITIAL_TURNDOWN_OPTIONS: Turndown.Options = {
  headingStyle: "atx",
  hr: "---",
  bulletListMarker: "-",
  codeBlockStyle: "fenced",
  fence: "```",
  emDelimiter: "_",
  strongDelimiter: "**",
  linkStyle: "inlined",
};

Any help much appreciated!


bjones1 commented Jun 2, 2023

Using the CommonMark dingus, entering either "I think &" or "I think &amp;" as Markdown renders to <p>I think &amp;</p>. So the HTML entity in the HTML source doesn't need to be preserved in the resulting Markdown for it to render properly. Are you asking for a way to preserve HTML entities even when they aren't needed to render correctly?
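
As a rough illustration of when a bare ampersand is safe to leave alone, here is a minimal sketch. `isBareAmpersandSafe` and `ENTITY_LIKE` are hypothetical names, not part of Turndown or any CommonMark library, and the pattern only approximates CommonMark's entity syntax (a real parser also checks named references against the HTML5 entity list), so it errs on the side of reporting "not safe".

```typescript
// Hypothetical helper: not part of Turndown or any CommonMark library.
// Approximates the shape of an entity or numeric character reference:
//   &name;   &#123;   &#x1F;
// A "&" that does not start such a sequence is treated as literal text
// by a Markdown renderer, so it can be emitted unchanged.
const ENTITY_LIKE = /&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]{1,7}|#[xX][0-9A-Fa-f]{1,6});/;

function isBareAmpersandSafe(text: string): boolean {
  // Safe means: no substring would be decoded as an entity on rendering.
  return !ENTITY_LIKE.test(text);
}

// "I think &" contains no entity-like sequence, so it is safe as-is.
const safeExample = isBareAmpersandSafe("I think &"); // true
// "&amp; is an ampersand" would be decoded on rendering, so it is not.
const unsafeExample = isBareAmpersandSafe("&amp; is an ampersand"); // false
```

The check is deliberately conservative: text such as "AT&T; etc." is flagged even though `&T;` is not a real entity.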


Aloso commented Jul 18, 2023

@bjones1 it does need to be preserved in this case:

&lt;br&gt;

which is converted to

<br>

and in this case:

&amp;amp; is an ampersand

and in this case:

A big &nbsp; space

and in this case:

&nbsp; &nbsp; Not a code block
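
The four cases above could be handled by re-encoding a small set of characters in each text node's decoded content. The following is a minimal sketch under that assumption. `reencodeEntities` is a hypothetical helper, not a Turndown option or API, and it has not been tested against Turndown itself.

```typescript
// Hypothetical helper, not part of Turndown's API.
// Input: the decoded text of an HTML text node (entities already resolved
// by the HTML parser), before any Markdown syntax is added around it.
function reencodeEntities(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities below are not double-encoded
    .replace(/</g, "&lt;") // keeps "<br>" from being read as raw HTML
    .replace(/>/g, "&gt;")
    .replace(/\u00A0/g, "&nbsp;"); // keeps non-breaking spaces distinct from plain spaces
}

// Decoded text for the four cases listed above.
const decodedSamples: string[] = [
  "<br>",
  "&amp; is an ampersand",
  "A big \u00A0 space",
  "\u00A0 \u00A0 Not a code block",
];

// Yields:
//   "&lt;br&gt;"
//   "&amp;amp; is an ampersand"
//   "A big &nbsp; space"
//   "&nbsp; &nbsp; Not a code block"
const reencoded: string[] = decodedSamples.map(reencodeEntities);
```

Because every `>` is encoded, this should only be applied to text content, not to finished Markdown, where it would break blockquote markers. One possible place to hook it in is Turndown's `escape` method, which the README describes as overridable. That wiring is not shown here.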
