Detection Evasion with Unicode #212

Open
QuinceyJames opened this issue Mar 28, 2023 · 1 comment

Comments

@QuinceyJames
Contributor

Problem

Hi! I just read an interesting article on how bad actors can evade text-based static analysis tools using Unicode. Since PEP 3131, Python has allowed non-ASCII characters in identifiers so that developers can "define classes and functions with names in their native languages". As a consequence, there are now many ways that built-in names like eval can be written. (See: https://lingojam.com/BoldTextGenerator)

Proposal

Guarddog could preprocess all source files by converting Unicode lookalike characters to their ASCII equivalents before running its rules. This mirrors what Python itself does: according to the PEP, "All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC."
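As a rough illustration of the normalization the PEP describes, the sketch below uses the standard library's `unicodedata.normalize`. The helper name `normalize_identifier` is hypothetical and is not part of Guarddog.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the NFKC form of a name (hypothetical helper, not Guarddog code)."""
    return unicodedata.normalize("NFKC", name)


# U+1D41E is MATHEMATICAL BOLD SMALL E, so this string renders as a bold "e" plus "val".
disguised = "\U0001D41Eval"

print(disguised == "eval")                        # False: the raw text differs
print(normalize_identifier(disguised) == "eval")  # True: NFKC folds it to ASCII
```

A text-based rule that looks for the literal string `eval` would only match the second form.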

Alternatively, Guarddog could define a new heuristic that warns if non-ASCII characters are found.
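One possible shape for such a heuristic, sketched with the standard `tokenize` module so that only identifiers are flagged and string literals or comments are ignored. The function name `find_non_ascii_identifiers` is an assumption made for illustration; this is not how Guarddog's Semgrep rules are implemented.

```python
import io
import tokenize


def find_non_ascii_identifiers(source: str):
    """Return (line, column, name) for each NAME token with non-ASCII characters.

    Hypothetical helper for illustration only.
    """
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


sample = (
    "\U0001D41Eval(\"print('malicious print statement')\")\n"  # disguised eval call
    "print('caf\u00e9')\n"                                       # non-ASCII only in a string
)

for line, col, name in find_non_ascii_identifiers(sample):
    print(f"warning: non-ASCII identifier {name!r} at line {line}, column {col}")
```

Restricting the check to NAME tokens keeps legitimate non-ASCII text in strings and comments from producing warnings, at the cost of still flagging code that uses native-language identifiers as PEP 3131 intended.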

Test

  1. Generate a bolded Unicode variant of the letter e to obtain 𝐞
  2. Append tests/analyzer/sourcecode/code-execution.py with the following code:
    # ruleid: code-execution
    𝐞val("print('malicious print statement')")
  3. From the root of the project, run semgrep --metrics off --test --config guarddog/analyzer/sourcecode tests/analyzer/sourcecode
  4. Verify all of the unit tests pass
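To see why the snippet in step 2 is interesting, the following sketch (using only the standard `ast` module, independent of Semgrep and Guarddog) shows that a plain text search does not find `eval` in the source, while Python's own parser resolves the name to `eval`.

```python
import ast

# Same statement as in step 2; \U0001D41E is the bold "e".
source = "\U0001D41Eval(\"print('malicious print statement')\")"

# A naive text search for the ASCII name finds nothing.
print("eval" in source)

# The parser normalizes identifiers to NFKC, so the call target is plain "eval".
call = ast.parse(source).body[0].value
print(call.func.id)
```

The code is only parsed here, never executed.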
@zmallen
Contributor

zmallen commented Apr 10, 2023

Interesting post!

I've seen this solved a few ways, one of them being what you suggest. The preprocessing/replacement part can be tricky, as it could break functionality if you incorrectly replace a piece of Unicode.
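A small sketch of the kind of breakage this could cause, assuming the preprocessing step naively applies NFKC to the whole file rather than to identifiers only. The function name `normalize_whole_file` is hypothetical.

```python
import unicodedata


def normalize_whole_file(source: str) -> str:
    """Naively NFKC-normalize an entire source file (hypothetical, for illustration)."""
    return unicodedata.normalize("NFKC", source)


# \u00b2 is SUPERSCRIPT TWO, used here inside a string literal.
original = 'label = "x\u00b2"'
rewritten = normalize_whole_file(original)

print(original)
print(rewritten)  # the string literal now reads "x2", so program data has changed
```

Limiting the replacement to identifier tokens would avoid rewriting string contents like this.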

3 participants
@zmallen @QuinceyJames and others