Detection Evasion with Unicode #212
Comments
Interesting post! I've seen this solved a few ways, one of them being what you suggest. The preprocessing/replacement part can be tricky, as it could break functionality if you incorrectly replace a piece of Unicode.
Problem
Hi! I just read an interesting article on how bad actors can evade text-based static analysis tools using Unicode. Since PEP 3131, Python has allowed programmers to use non-ASCII characters "to define classes and functions with names in their native languages". As a consequence, there are now many ways an identifier like eval can be written. (See: https://lingojam.com/BoldTextGenerator)
Proposal
Guarddog could preprocess all source files by converting any Unicode to ASCII. According to the PEP, "All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC."
Alternatively, Guarddog could define a new heuristic that warns if non-ASCII characters are found.
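Both options can be sketched as below. The helper names are hypothetical, not part of Guarddog's API, and the first one carries the breakage risk noted in the comment above: folding the whole file also rewrites string literals, so a real implementation would likely normalize identifiers only (e.g. via the tokenize module).

```python
import unicodedata

def nfkc_preprocess(source: str) -> str:
    # Hypothetical preprocessing pass: fold the whole file to NFKC
    # before the sourcecode rules run. Caveat: this also rewrites
    # string literals, which may change program behavior.
    return unicodedata.normalize("NFKC", source)

def non_ascii_positions(source: str) -> list:
    # Hypothetical heuristic: report every non-ASCII character with
    # its offset, which a new rule could surface as a warning.
    return [(i, ch) for i, ch in enumerate(source) if ord(ch) > 0x7F]
```

For example, nfkc_preprocess('𝐞𝐯𝐚𝐥("1 + 1")') yields a string containing plain "eval", and non_ascii_positions reports the four bold letters.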
Test
Use the generator above to convert e and obtain 𝐞, substitute it into tests/analyzer/sourcecode/code-execution.py, and then run the rules with the following command:
semgrep --metrics off --test --config guarddog/analyzer/sourcecode tests/analyzer/sourcecode
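The expected failure mode can also be checked directly, without Semgrep: a naive ASCII pattern (a stand-in for a text-based rule, not Guarddog's actual rule) misses the obfuscated call until the source is normalized.

```python
import re
import unicodedata

# Obfuscated call using MATHEMATICAL BOLD letters in place of "eval".
sample = '𝐞𝐯𝐚𝐥("1 + 1")'

# The raw source contains no ASCII "eval", so the pattern misses it...
assert re.search(r"\beval\(", sample) is None

# ...but after NFKC preprocessing the same pattern matches.
normalized = unicodedata.normalize("NFKC", sample)
assert re.search(r"\beval\(", normalized) is not None
```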