Quadratic output growth with CommonMark renderer #377

nwellnhof · 2021-02-10T11:07:33Z

Found by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30784

Reduced test case:

$ printf '>>>>a\nb\nc\nd' |build/src/cmark -t commonmark
> > > > a
> > > > b
> > > > c
> > > > d
$ python3 -c 'print(">"*10000+"a\n"*10000)' |build/src/cmark -t commonmark |wc -c
200020000

The text was updated successfully, but these errors were encountered:

jgm · 2021-02-10T16:43:21Z

Not sure why this should be considered a bug. The commonmark being produced is correct (normalized to avoid laziness), it's just that in normalizing to avoid laziness, we have to fill in a 10k * 10k square of > characters.

nwellnhof · 2021-02-10T17:51:07Z

If you increase the input size, you get:

$ python3 -c 'print(">"*40000+"a\n"*20000)' |build/src/cmark -t commonmark >/dev/null
[cmark] cmark_strbuf_grow requests buffer with size > 1073741823, aborting
Aborted

That's an 80 KB input document taking several seconds and more than 1 GB of RAM to process before finally aborting. This could be considered a DoS vulnerability. At least, this issue makes it impossible to safely process untrusted data with the CommonMark renderer unless you severely limit the input size.

jgm · 2021-02-10T20:25:03Z

At least, this issue makes it impossible to safely process untrusted data with the CommonMark renderer unless you severely limit the input size.

I agree, but...It's just a fact that commonmark normalization can increase size like that.

I guess the only solution I can think of is to abort or truncate when block quote nesting passes some reasonable limit. (Maybe in the parser, maybe in the writer?)

nwellnhof · 2021-02-11T10:08:11Z

Maybe the CommonMark renderer can switch to lazy continuations after a certain nesting level? Another idea is to stop rendering without aborting if the output reaches a certain size.

jgm · 2021-08-30T18:53:14Z

There's a request to fix this on https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30784
Now that I look at that issue, is the problem really a quadratic output growth?
Would that cause xrealloc to abort?
I tried running the test data locally and it finished in 1.25 s. does the fuzzer include an abort after a specific time, or something?

nwellnhof · 2021-08-30T19:53:24Z

The test case doesn't trigger a timeout. But the output buffer of the renderer also grows quadratically, so the issue manifests when the buffer grows beyond 1 GB and our strbuf code aborts. Basically an OOM condition.

If you want to reproduce the test case from OSS-Fuzz, note that these are not Markdown files but contain a bit of header data for the fuzzer. It often works to just pipe them into cmark, but in general they should be run through the fuzzer binary to reproduce the issue reliably.

Anyway, I'm pretty sure I found root cause. The test cases generated from fuzzing aren't fun to work with, so I wouldn't waste more time on that.

jgm · 2021-08-31T04:08:22Z

Got it.

Maybe the CommonMark renderer can switch to lazy continuations after a certain nesting level?

I have a feeling this would be a bit complex to implement, but I haven't looked in detail. It would affect both lists and block quotes.

Another idea is to stop rendering without aborting if the output reaches a certain size.

How were you thinking this would be implemented? Would adding to a strbuf at a certain size just be made a no-op?

Ideally, we'd exit cleanly with an error status. What would it take to do this (and, would the fuzzer accept this)?

The cmark_render_* functions may now return NULL instead of aborting if the output exceeds the maximum internal buffer size. This fix isn't bullet-proof but should help with smaller documents that trigger quadratic output growth with the Commonmark renderer. See commonmark#377.

nwellnhof · 2021-09-16T16:15:07Z

I have a feeling this would be a bit complex to implement, but I haven't looked in detail. It would affect both lists and block quotes.

Yes, lists can also trigger quadratic output growth:

$ printf ' - - - - a\nb\nc\nd'  |build/src/cmark -t commonmark
  -   -   -   - a
                b
                c
                d
$ python3 -c 'print("- "*10000+"a\n"*10000)' |build/src/cmark -t commonmark |wc -c
400020000

How were you thinking this would be implemented? Would adding to a strbuf at a certain size just be made a no-op?

This would just hide the error condition, so I don't think it's a good idea.

Ideally, we'd exit cleanly with an error status. What would it take to do this (and, would the fuzzer accept this)?

I think the easiest solution to add a check in each iteration of the main loop in cmark_render. If the size of the output buffer is close to the limit, the function could simply return NULL. Consequently, some of the public cmark_render_* functions will return NULL as well. This is a slight change of the public API.

This approach isn't bullet-proof when handling really large documents but should stop the fuzzer from aborting.

Tentative fix: nwellnhof@b7e2451

jgm · 2021-09-16T17:13:16Z

So the idea is to implement a very primitive kind of error handling: a null return value means "an error occurred"?

nwellnhof · 2021-09-17T10:07:03Z

Yes, that's the simplest fix I could come up with.

But it turns out that switching to lazy continuation lines isn't too hard: nwellnhof@86bbd43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quadratic output growth with CommonMark renderer #377

Quadratic output growth with CommonMark renderer #377

nwellnhof commented Feb 10, 2021

jgm commented Feb 10, 2021

nwellnhof commented Feb 10, 2021

jgm commented Feb 10, 2021

nwellnhof commented Feb 11, 2021

jgm commented Aug 30, 2021

nwellnhof commented Aug 30, 2021

jgm commented Aug 31, 2021

nwellnhof commented Sep 16, 2021

jgm commented Sep 16, 2021

nwellnhof commented Sep 17, 2021

Quadratic output growth with CommonMark renderer #377

Quadratic output growth with CommonMark renderer #377

Comments

nwellnhof commented Feb 10, 2021

jgm commented Feb 10, 2021

nwellnhof commented Feb 10, 2021

jgm commented Feb 10, 2021

nwellnhof commented Feb 11, 2021

jgm commented Aug 30, 2021

nwellnhof commented Aug 30, 2021

jgm commented Aug 31, 2021

nwellnhof commented Sep 16, 2021

jgm commented Sep 16, 2021

nwellnhof commented Sep 17, 2021