re2 error: re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35 #2739

Open
rgmz opened this issue Apr 24, 2024 · 4 comments

@rgmz
Contributor

rgmz commented Apr 24, 2024

TruffleHog Version

Trace Output

re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35

Expected Behavior

The chunk data should be scanned.

Actual Behavior

TruffleHog prints the re2 error shown above with no surrounding context, so it is unclear what caused it and whether any chunks were skipped.

Steps to Reproduce

The error seems semi-random so it's difficult to reproduce. Additionally, the log comes directly from re2.cc, meaning there is no context associated with it.

Additional Context

google/re2#186

@rgmz rgmz added the bug label Apr 24, 2024
@zricethezav
Collaborator

Maybe providing an option for users to pick which regex engine they want, re2 or the default, would be worthwhile, since re2 is a drop-in replacement for the standard regexp package.
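
A minimal sketch of what such an engine option could look like, assuming a small shared interface and a lookup table keyed by engine name. The names `matcher`, `engines`, and `compileWith` are hypothetical and are not TruffleHog's API. Only the standard library engine is registered here; an re2-backed entry would be added wherever that dependency is available.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcher is the small subset of the regexp API this sketch relies on.
// (Hypothetical name, not taken from TruffleHog.)
type matcher interface {
	MatchString(s string) bool
	FindAllString(s string, n int) []string
}

// compileFunc builds a matcher from a pattern.
type compileFunc func(pattern string) (matcher, error)

// compileStd compiles a pattern with Go's standard regexp package.
func compileStd(pattern string) (matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// engines maps a user-facing engine name to its compile function.
// An "re2" entry is deliberately absent: it is an assumption that one
// would be registered where the re2 binding is linked in.
var engines = map[string]compileFunc{
	"default": compileStd,
}

// compileWith looks up the requested engine and compiles the pattern with it.
func compileWith(engine, pattern string) (matcher, error) {
	compile, ok := engines[engine]
	if !ok {
		return nil, fmt.Errorf("unknown regex engine %q", engine)
	}
	return compile(pattern)
}

func main() {
	m, err := compileWith("default", `AKIA[0-9A-Z]{16}`)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	fmt.Println(m.MatchString("key=AKIAABCDEFGHIJKLMNOP"))
}
```

Keeping the interface this narrow means callers do not need to know which engine produced the matcher, and an unknown engine name fails at startup rather than partway through a scan.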

@rgmz
Contributor Author

rgmz commented May 1, 2024

It couldn't hurt given #2354. I think this specific error is caused by the configured max_mem for re2 being smaller than TruffleHog's maximum diff size.

// defaultMaxDiffSize is the maximum size for a diff. Larger diffs will be cut off.
defaultMaxDiffSize = 2 * 1024 * 1024 * 1024 // 2GB

@dustin-decker
Contributor

The whole diff is never scanned in one pass; we use a sliding-window-with-overlap chunker to break the data into more manageable chunks:

// ChunkSize is the maximum size of a chunk.
ChunkSize = 10 * 1024
// PeekSize is the size of the peek into the previous chunk.
PeekSize = 3 * 1024
// TotalChunkSize is the total size of a chunk with peek data.
TotalChunkSize = ChunkSize + PeekSize

Looks like the default max_mem is 8MB, so I'm guessing we have an expensive regex on some data?
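
To make the sizes concrete, here is a rough sketch of a sliding-window-with-overlap chunker using the constants quoted above. This is not TruffleHog's actual chunker; `chunkWithPeek` is a hypothetical helper, and it assumes each window starts ChunkSize bytes after the previous one and carries up to PeekSize bytes of overlap with its neighbor.

```go
package main

import "fmt"

const (
	chunkSize = 10 * 1024 // mirrors ChunkSize above
	peekSize  = 3 * 1024  // mirrors PeekSize above
)

// chunkWithPeek splits data into windows of at most chunkSize+peekSize bytes.
// Windows start chunkSize bytes apart, so adjacent windows share up to
// peekSize bytes. (Hypothetical helper for illustration only.)
func chunkWithPeek(data []byte) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize + peekSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	data := make([]byte, 25*1024)
	for i, c := range chunkWithPeek(data) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

Under these assumptions no window exceeds 13 KiB, which is far below an 8MB budget, so the size of the scanned input alone seems unlikely to explain the error.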

@rgmz
Contributor Author

rgmz commented May 14, 2024

Unfortunately, this seems to be a transient error. I've attempted to re-scan orgs/repos where I encountered it but haven't been able to reproduce it (so far).

It might be possible for wasilibs/go-re2 to catch failures from the underlying RE2::Match method and log additional context.

https://github.com/google/re2/blob/b7e96b34c0945fccb8b5282404f82c7ab0843717/re2/re2.cc#L772-L777
