Fix parsing Markdown in HTML #135

zhouzi · 2019-03-20T15:59:45Z

A user reported an issue where the Markdown content of an HTML node (within a Markdown file) is not parsed. The goal of this PR is to fix that.

We already parse HTML in Markdown, so it would make sense to parse Markdown in HTML as well. Here's an example of Markdown containing HTML that itself contains Markdown, which GitHub parses properly:

This div contains Markdown with a link and some bold content.

Source:

<div style="text-align: justify">

This div contains Markdown with a [link](https://www.google.com) and some **bold content**.

</div>

Note that the line breaks matter. The following:

<div style="text-align: justify">
This div contains Markdown with a [link](https://www.google.com) and some **bold content**.
</div>

Yields:

This div contains Markdown with a [link](https://www.google.com) and some **bold content**.
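
To make the difference between the two inputs concrete, here is a minimal TypeScript sketch of the check. `hasBlankLinePadding` is a hypothetical helper, not something from this project or from any library, and it follows the rule as worded in this thread (blank line after the opening tag and before the closing tag) rather than the full CommonMark algorithm.

```typescript
// Hypothetical helper: reports whether the content of an HTML block is
// separated from its opening and closing tags by blank lines.
// Assumes the opening and closing tags each sit on their own line.
function hasBlankLinePadding(block: string): boolean {
  const lines = block.trim().split(/\r?\n/);
  // Need at least: opening tag, blank line, content, blank line, closing tag.
  if (lines.length < 5) {
    return false;
  }
  const afterOpen = lines[1] || "";
  const beforeClose = lines[lines.length - 2] || "";
  return afterOpen.trim() === "" && beforeClose.trim() === "";
}

const padded = [
  '<div style="text-align: justify">',
  "",
  "Some **bold content**.",
  "",
  "</div>",
].join("\n");

const unpadded = [
  '<div style="text-align: justify">',
  "Some **bold content**.",
  "</div>",
].join("\n");

const paddedResult = hasBlankLinePadding(padded); // true: inner Markdown should be parsed
const unpaddedResult = hasBlankLinePadding(unpadded); // false: content stays raw
```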

zhouzi · 2019-03-22T08:55:07Z

I tried an approach that didn't work, so I thought I'd share the blockers. I've been relying on the CommonMark spec, and more specifically on one of its examples for HTML blocks.

Ideally, we should parse Markdown in HTML blocks whose content starts and ends with a line break. The problem is that we clean the input string with htmlclean, which removes those line breaks. This sanitization is required so that we don't interpret meaningless code formatting, which would lead to unwanted whitespace and nodes.
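
A rough illustration of the blocker, in TypeScript. `naiveClean` is a hypothetical stand-in that simply strips whitespace next to tags; it is not htmlclean's actual logic. It only shows why the blank-line signal cannot be recovered once that whitespace is gone.

```typescript
// Naive stand-in for an HTML cleaner. This is NOT how htmlclean works
// internally; it only mimics the effect described above (line breaks
// around tags disappear).
function naiveClean(html: string): string {
  return html
    .replace(/>\s+/g, ">") // drop whitespace right after a tag
    .replace(/\s+</g, "<") // drop whitespace right before a tag
    .trim();
}

const withBlankLines = "<div>\n\nSome **bold** text.\n\n</div>";
const withoutBlankLines = "<div>\nSome **bold** text.\n</div>";

// Both inputs collapse to "<div>Some **bold** text.</div>", so a parser
// running after the cleaning step can no longer tell them apart.
const sameAfterCleaning =
  naiveClean(withBlankLines) === naiveClean(withoutBlankLines);
```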

I am now thinking about cleaning the HTML through the HTML parser itself. I'll give it a shot.

Soreine · 2019-03-27T09:28:52Z

I'm not sure that we can properly clean the HTML chunk by chunk (through the parser). htmlclean needs context to know what can be removed.
However, we could do a first parsing pass where we detect divs that start and end with a line break and mark them (for example with an HTML attribute), so that we can treat their innerHTML as Markdown in the parser?
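
One possible shape for that first pass, sketched in TypeScript under a few assumptions: the attribute name `data-markdown` and the function `markMarkdownDivs` are made up for illustration, the pass runs before any cleaning, and nested divs or `>` characters inside attribute values are not handled. A real implementation would more likely rely on an HTML parser than on a regular expression.

```typescript
// Hypothetical attribute used to flag divs whose content is Markdown.
const MARKER = "data-markdown";

// First pass, meant to run BEFORE cleaning: flag every <div> whose opening
// tag is followed by a blank line and whose closing tag is preceded by one.
// Regex-based sketch only; it does not handle nested divs.
function markMarkdownDivs(source: string): string {
  const pattern =
    /<div((?:\s[^>]*)?)>(\r?\n[ \t]*\r?\n[\s\S]*?\r?\n[ \t]*\r?\n[ \t]*<\/div>)/g;
  // $1 = existing attributes, $2 = everything from the first line break
  // up to and including the closing tag.
  return source.replace(pattern, `<div$1 ${MARKER}>$2`);
}

const markedExample = markMarkdownDivs(
  '<div style="text-align: justify">\n\nSome **bold content**.\n\n</div>'
);
// markedExample now starts with:
// <div style="text-align: justify" data-markdown>
```

Since the marker is an attribute rather than whitespace, the idea is that it would survive the cleaning step, and the parser could then check for it to decide whether to treat the div's innerHTML as Markdown.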

zhouzi added 3 commits March 20, 2019 16:45

Remove failing test · d9b03cd

Add failing test · 892f003

Fix test case · db58c77

zhouzi added the wip label Mar 20, 2019

zhouzi self-assigned this Mar 20, 2019
