Bug: <br> is converted into two new lines (\n\n) #40
Comments
This is expected behavior. A line break in Markdown requires two newline characters. A single newline character will not render as a line break; instead, it will render as a space.
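To make the difference between a blank line, a soft break, and a hard break concrete, here is a toy Go sketch of only those paragraph rules. It is not a CommonMark implementation and not part of html-to-markdown; `renderParagraphs` is a hypothetical helper, and everything else CommonMark handles (lists, code, escaping, backslash hard breaks) is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// renderParagraphs is a toy subset of CommonMark paragraph rendering:
//   - a blank line ("\n\n") separates paragraphs;
//   - a line ending in two or more spaces becomes <br /> (hard break);
//   - any other newline inside a paragraph is a soft break, kept as "\n",
//     which browsers display as a space.
func renderParagraphs(md string) string {
	var out []string
	for _, block := range strings.Split(md, "\n\n") {
		lines := strings.Split(block, "\n")
		var b strings.Builder
		for i, line := range lines {
			last := i == len(lines)-1
			if !last && strings.HasSuffix(line, "  ") {
				// Hard break: drop the trailing spaces and emit <br />.
				b.WriteString(strings.TrimRight(line, " "))
				b.WriteString("<br />\n")
				continue
			}
			b.WriteString(line)
			if !last {
				// Soft break: the newline is kept as-is.
				b.WriteString("\n")
			}
		}
		out = append(out, "<p>"+b.String()+"</p>")
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Printf("%q\n", renderParagraphs("Line 1\n\nLine 2"))   // "<p>Line 1</p>\n<p>Line 2</p>"
	fmt.Printf("%q\n", renderParagraphs("Line 1\nLine 2"))     // "<p>Line 1\nLine 2</p>"
	fmt.Printf("%q\n", renderParagraphs("Line 1  \nLine 2"))   // "<p>Line 1<br />\nLine 2</p>"
}
```

Only the third input produces a visible line break inside a single paragraph; the first produces two paragraphs, and the second renders as "Line 1 Line 2".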
According to this page (https://www.markdownguide.org/basic-syntax), a newline in Markdown should be formatted as follows:

I have also seen implementations where <br> and <p></p> are converted to one and two newlines respectively (as prologic recommends). I don't know if there is a real standard for this. However, <br> must be treated differently from <p></p> so as not to lose information when converting from HTML to Markdown.
Take this HTML as the input:
<p>Line 1<br />Line 2</p>
With html-to-markdown and the normal CommonMark behaviour for "br" (two newlines), we get:
Line 1

Line 2
With CommonMark (see playground), this renders as:
<p>Line 1</p>
<p>Line 2</p>
If you add a custom rule for "br" that just returns a single newline with:
return String("\n")
you get this output:
Line 1
Line 2
With CommonMark (see playground), this renders as:
<p>Line 1
Line 2</p>
If we compare the different implementations (see babelmark), this behaviour is mostly shared between implementations. The Markdown rendering on github.com works differently, however 🤷‍♂️

If we want to be extra precise, the html-to-markdown library would also need to support hard line breaks. However, that would require some other changes. So for now, the current behaviour is going to stay as it is; changing it would break the output for other implementations. You are free to change the behaviour, though, by writing a very simple custom rule.
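Since the disagreement comes down to which string the "br" rule returns, here is a minimal, stdlib-only Go sketch that makes that choice pluggable. It is not the html-to-markdown API: `BrRule`, `convertParagraph`, `defaultBr`, and `singleNewlineBr` are hypothetical names, and the converter only understands a flat `<p>...<br />...</p>` fragment.

```go
package main

import (
	"fmt"
	"strings"
)

// BrRule returns the Markdown emitted for a single <br> tag.
// Hypothetical type for illustration; not part of html-to-markdown.
type BrRule func() string

// defaultBr mirrors the default behaviour described in the thread: two newlines.
func defaultBr() string { return "\n\n" }

// singleNewlineBr mirrors the custom rule `return String("\n")` from the thread.
func singleNewlineBr() string { return "\n" }

// convertParagraph handles only a flat <p>text<br />text</p> input.
// It is a toy: nesting, attributes, and escaping are not supported.
func convertParagraph(html string, br BrRule) string {
	inner := strings.TrimSuffix(strings.TrimPrefix(html, "<p>"), "</p>")
	// Accept the common spellings of the void br element.
	r := strings.NewReplacer("<br />", br(), "<br/>", br(), "<br>", br())
	return r.Replace(inner)
}

func main() {
	input := "<p>Line 1<br />Line 2</p>"
	fmt.Printf("%q\n", convertParagraph(input, defaultBr))       // "Line 1\n\nLine 2"
	fmt.Printf("%q\n", convertParagraph(input, singleNewlineBr)) // "Line 1\nLine 2"
}
```

With the real library you would register a custom rule instead; this toy only mirrors the two outputs compared in the comment.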
Then can we have the GitHub Flavored Markdown plugin use single line breaks, please? I'd presume the change would be minimal, i.e. changing the output from
Thanks
There are other renderers (like the GitHub Flavored Markdown extension from goldmark) that also implement the spec, and I don't want to break those. Right now, it seems like it's only github.com that causes the problem...
What about an additional built-in rule for these line breaks? @suntong seems to be against the idea of altering the behavior of this project's GFM plugin or adding a new parameter to accomplish this.
@suntong I doubt you want a PR for this, but: ImportTaste/html2md@082a6fb works well for me. I really don't think @JohannesKaufmann is going to budge.
NP, I'd love to, since it works well for you, and also because I agree with you that such a feature might never be accepted here. So, send the PR pls.
Expanding on a previous comment (#40 (comment)): From the official Markdown specification:
Converting this HTML to Markdown:
Should be this Markdown (with two spaces at the end of the first two lines where the <br /> tags were):
Using + to visualize spaces, it would look like this:
Though the Markdown itself should use spaces, not +. This works on GitHub, CommonMark, and 27 implementations on babelmark. The reference Markdown (note the two spaces at the end of the first two lines):
GitHub: Line 1
CommonMark (web demo):
babelmark (web demo):

It seems like the solution for how html-to-markdown should handle <br /> tags is to convert them to two spaces and a newline (\x20\x20\n) rather than one newline (\n) or two newlines (\n\n). This behavior is defined by the official Markdown specification and seems well supported across implementations.
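A minimal Go sketch of the two-spaces-plus-newline approach proposed above, assuming a flat inline fragment. `convertBr` and `hardBreakLines` are hypothetical helpers (not part of html-to-markdown), and the three-line input is an assumed example, since the HTML from the comment was not preserved. `hardBreakLines` reports which lines end in two or more spaces, which CommonMark treats as a hard line break when another line of the same paragraph follows.

```go
package main

import (
	"fmt"
	"strings"
)

// hardBreak is the Markdown for a <br />: two trailing spaces, then a newline.
const hardBreak = "  \n"

// convertBr replaces each <br> spelling in an inline HTML fragment with a
// hard line break. Toy sketch: any other tags are passed through untouched.
func convertBr(fragment string) string {
	return strings.NewReplacer("<br />", hardBreak, "<br/>", hardBreak, "<br>", hardBreak).Replace(fragment)
}

// hardBreakLines returns the 1-based numbers of lines that end in two or more
// spaces and are followed by another line (the last line cannot hard-break).
func hardBreakLines(markdown string) []int {
	var out []int
	lines := strings.Split(markdown, "\n")
	for i, line := range lines {
		if i < len(lines)-1 && strings.HasSuffix(line, "  ") {
			out = append(out, i+1)
		}
	}
	return out
}

func main() {
	md := convertBr("Line 1<br />Line 2<br />Line 3")
	fmt.Printf("%q\n", md)          // "Line 1  \nLine 2  \nLine 3"
	fmt.Println(hardBreakLines(md)) // [1 2]
	// Show the trailing spaces as "+", as in the comment.
	fmt.Println(strings.ReplaceAll(md, hardBreak, "++\n"))
}
```

Unlike "\n" (a soft break) or "\n\n" (a paragraph break), this output keeps both lines in one paragraph and still renders a visible break.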
Describe the bug
In my testing I've found that the HTML tag <br /> gets turned into two new lines (\n\n).
Example:
HTML Input
Generated Markdown
Expected Markdown
Additional context
Is there any way to control this behaviour? I get that this might be interpreted as a "paragraph", but I would only expect that if there are two <br /> tags or an actual paragraph (<p>...</p>). Thanks!
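One way to get the behaviour the reporter expected (a lone <br /> stays a line break, while two or more consecutive <br /> tags become a paragraph break) is to collapse runs of br tags before handling single ones. This is a regex-based Go sketch on an inline fragment, not real HTML parsing, and `convertBreaks` is a hypothetical helper. The replacement for a lone <br /> is a parameter because the thread proposes both "\n" and two spaces plus "\n".

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Two or more consecutive <br> tags, optionally separated by whitespace.
	brRun = regexp.MustCompile(`(?:<br\s*/?>\s*){2,}`)
	// Any remaining single <br> tag.
	brOne = regexp.MustCompile(`<br\s*/?>`)
)

// convertBreaks turns each run of two or more <br> tags into a paragraph
// break ("\n\n") and each remaining lone <br> into lineBreak.
func convertBreaks(fragment, lineBreak string) string {
	out := brRun.ReplaceAllLiteralString(fragment, "\n\n")
	return brOne.ReplaceAllLiteralString(out, lineBreak)
}

func main() {
	fmt.Printf("%q\n", convertBreaks("Line 1<br />Line 2", "\n"))       // "Line 1\nLine 2"
	fmt.Printf("%q\n", convertBreaks("Line 1<br /><br />Line 2", "\n")) // "Line 1\n\nLine 2"
	fmt.Printf("%q\n", convertBreaks("A<br>B<br/><br/>C", "  \n"))      // "A  \nB\n\nC"
}
```

Runs are handled first so that a double <br /> is not turned into two separate line breaks.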