Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic output growth with CommonMark renderer #377

Open
nwellnhof opened this issue Feb 10, 2021 · 10 comments
Open

Quadratic output growth with CommonMark renderer #377

nwellnhof opened this issue Feb 10, 2021 · 10 comments

Comments

@nwellnhof
Copy link
Contributor

Found by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30784

Reduced test case:

$ printf '>>>>a\nb\nc\nd' |build/src/cmark -t commonmark
> > > > a
> > > > b
> > > > c
> > > > d
$ python3 -c 'print(">"*10000+"a\n"*10000)' |build/src/cmark -t commonmark |wc -c
200020000
@jgm
Copy link
Member

jgm commented Feb 10, 2021

Not sure why this should be considered a bug. The commonmark being produced is correct (normalized to avoid laziness), it's just that in normalizing to avoid laziness, we have to fill in a 10k * 10k square of > characters.

@nwellnhof
Copy link
Contributor Author

If you increase the input size, you get:

$ python3 -c 'print(">"*40000+"a\n"*20000)' |build/src/cmark -t commonmark >/dev/null
[cmark] cmark_strbuf_grow requests buffer with size > 1073741823, aborting
Aborted

That's an 80 KB input document taking several seconds and more than 1 GB of RAM to process before finally aborting. This could be considered a DoS vulnerability. At least, this issue makes it impossible to safely process untrusted data with the CommonMark renderer unless you severely limit the input size.

@jgm
Copy link
Member

jgm commented Feb 10, 2021

At least, this issue makes it impossible to safely process untrusted data with the CommonMark renderer unless you severely limit the input size.

I agree, but...It's just a fact that commonmark normalization can increase size like that.

I guess the only solution I can think of is to abort or truncate when block quote nesting passes some reasonable limit. (Maybe in the parser, maybe in the writer?)

@nwellnhof
Copy link
Contributor Author

Maybe the CommonMark renderer can switch to lazy continuations after a certain nesting level? Another idea is to stop rendering without aborting if the output reaches a certain size.

@jgm
Copy link
Member

jgm commented Aug 30, 2021

There's a request to fix this on https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30784
Now that I look at that issue, is the problem really a quadratic output growth?
Would that cause xrealloc to abort?
I tried running the test data locally and it finished in 1.25 s. does the fuzzer include an abort after a specific time, or something?

@nwellnhof
Copy link
Contributor Author

The test case doesn't trigger a timeout. But the output buffer of the renderer also grows quadratically, so the issue manifests when the buffer grows beyond 1 GB and our strbuf code aborts. Basically an OOM condition.

If you want to reproduce the test case from OSS-Fuzz, note that these are not Markdown files but contain a bit of header data for the fuzzer. It often works to just pipe them into cmark, but in general they should be run through the fuzzer binary to reproduce the issue reliably.

Anyway, I'm pretty sure I found root cause. The test cases generated from fuzzing aren't fun to work with, so I wouldn't waste more time on that.

@jgm
Copy link
Member

jgm commented Aug 31, 2021

Got it.

Maybe the CommonMark renderer can switch to lazy continuations after a certain nesting level?

I have a feeling this would be a bit complex to implement, but I haven't looked in detail. It would affect both lists and block quotes.

Another idea is to stop rendering without aborting if the output reaches a certain size.

How were you thinking this would be implemented? Would adding to a strbuf at a certain size just be made a no-op?

Ideally, we'd exit cleanly with an error status. What would it take to do this (and, would the fuzzer accept this)?

nwellnhof added a commit to nwellnhof/cmark that referenced this issue Sep 16, 2021
The cmark_render_* functions may now return NULL instead of aborting
if the output exceeds the maximum internal buffer size.

This fix isn't bullet-proof but should help with smaller documents
that trigger quadratic output growth with the Commonmark renderer.
See commonmark#377.
@nwellnhof
Copy link
Contributor Author

I have a feeling this would be a bit complex to implement, but I haven't looked in detail. It would affect both lists and block quotes.

Yes, lists can also trigger quadratic output growth:

$ printf ' - - - - a\nb\nc\nd'  |build/src/cmark -t commonmark
  -   -   -   - a
                b
                c
                d
$ python3 -c 'print("- "*10000+"a\n"*10000)' |build/src/cmark -t commonmark |wc -c
400020000

How were you thinking this would be implemented? Would adding to a strbuf at a certain size just be made a no-op?

This would just hide the error condition, so I don't think it's a good idea.

Ideally, we'd exit cleanly with an error status. What would it take to do this (and, would the fuzzer accept this)?

I think the easiest solution to add a check in each iteration of the main loop in cmark_render. If the size of the output buffer is close to the limit, the function could simply return NULL. Consequently, some of the public cmark_render_* functions will return NULL as well. This is a slight change of the public API.

This approach isn't bullet-proof when handling really large documents but should stop the fuzzer from aborting.

Tentative fix: nwellnhof@b7e2451

@jgm
Copy link
Member

jgm commented Sep 16, 2021

So the idea is to implement a very primitive kind of error handling: a null return value means "an error occurred"?

@nwellnhof
Copy link
Contributor Author

Yes, that's the simplest fix I could come up with.

But it turns out that switching to lazy continuation lines isn't too hard: nwellnhof@86bbd43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants