Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow, long compile times with large repetition values #596

Open
tadman opened this issue Apr 27, 2022 · 2 comments
Open

Stack overflow, long compile times with large repetition values #596

tadman opened this issue Apr 27, 2022 · 2 comments

Comments

@tadman
Copy link

tadman commented Apr 27, 2022

When implementing a spec based on an RFC, I wrote a rule like text{0,998}, as the spec (RFC5322) dictates that lines cannot exceed 998 characters.

This dramatically increased compile times from negligible to many seconds. Changing this to text* and doing that validation step in other code eliminates the problem.

By varying the repeat count in the rule and running tests on an Mac Mini (M1 2020):

  • text{0,2048} = 36.04s
  • text{0,998} = 8.77s
  • text{0,512} = 2.94s
  • text{0,256} = 1.12s
  • text{0,128} = 0.62s

Beyond a certain point it's just a "stack overflow":

  • text{0,3000} = fatal runtime error: stack overflow

Is this a known limitation of the implementation of limited repeat?

Sample grammar:

CRLF = _{ "\r"? ~ "\n" } // Relaxed definition

body = { (text{0,2600} ~ CRLF)* ~ text{0,2600} ~ CRLF? }
text = { '\u{01}'..'\u{09}' | '\u{0b}'..'\u{0c}' | '\u{0e}'..'\u{7f}' }

I tried this in the snippet generator but I think it can't handle it because of this issue.

@huacnlee
Copy link
Member

huacnlee commented Feb 11, 2023

Reason is here:

https://github.com/pest-parser/pest/blob/v2.5.3/meta/src/optimizer/unroller.rs#L52

If we define a repeat_min_max e.g.: text = { a{1,255} }, that logic will create 255 AST node in optimizer

@tadman
Copy link
Author

tadman commented Feb 17, 2023

Is there a recommended way of dealing with length-limited fields when parsing? I might be doing it wrong, but I'd like to be able to do this at the parser level so it can just cut out and quit if fed egregiously out of spec data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants