insecure_code_detector.cli doesn't detect insecure code as expected #35

fuhengwu2021 · 2024-05-13T18:32:09Z

I have a Java file, Sample.java. It contains the line import java.net.URL;, which I expected to be detected by CybersecurityBenchmarks/insecure_code_detector/rules/semgrep/java/third-party/ssrf.yaml. But after running ICD, nothing was detected. Does anybody know why?

Sample.java

import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.net.URL;

public class Sample {
    public static void main(String[] args) {
        String password = generateSecureRandomPassword();

rule:

Result:

2024-05-13 18:28:21,636 [INFO] ICD took 968ms
2024-05-13 18:28:21,636 [INFO] Found 0 issues


csahana95 · 2024-05-14T01:23:02Z

Hi! Thanks for reporting. Could you please specify how you ran ICD?

Also, the shared code snippet doesn't appear to contain a match for the ssrf rule. If you look at the rule in detail, it looks for patterns like new URL(url).openConnection().connect(); or similar. It doesn't match on the import alone.
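To illustrate the distinction, here is a minimal sketch. It assumes only what is stated above, namely that the rule targets call chains such as new URL(url).openConnection().connect(); the actual ssrf.yaml may match more or fewer variants. The class SsrfSample and the helpers open and fetch are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.net.URL;            // the import alone makes no request
import java.net.URLConnection;

class SsrfSample {
    // Hypothetical helper: builds a connection object for a caller-supplied URL.
    // With the standard http handler, openConnection() does not contact the host.
    static URLConnection open(String url) throws IOException {
        return new URL(url).openConnection();
    }

    // Hypothetical helper containing the call chain quoted in the comment above.
    // If url is attacker-controlled, connect() sends a request wherever it points.
    static void fetch(String url) throws IOException {
        new URL(url).openConnection().connect();
    }

    public static void main(String[] args) throws IOException {
        // Only open() is called here, so running this sketch sends nothing.
        URLConnection conn = open("http://example.com/data");
        System.out.println(conn.getURL().getHost());
    }
}
```

Under that assumption, a file containing something like fetch would be a candidate for a match, while a file that only imports java.net.URL, like the posted Sample.java excerpt, would not.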

SimonWan · 2024-05-14T16:25:07Z

By the way, if you want to use the Insecure Code Detector independently, without running CyberSecEval, consider switching to CodeShield, our latest, upgraded version. For more context, please refer to this README.

fuhengwu2021 · 2024-05-14T17:48:38Z

Thanks for the answers, @csahana95 and @SimonWan. I am not very familiar with this domain, but from my understanding, CodeShield seems to be a thin wrapper around ICD, because it just uses an LLM to parse the ICD results and make them more human-readable, right?

Also, is there any example showing that ICD can catch problematic code generated by an LLM? I tried many prompts but found that the LLM already generated secure code. Could you please share some prompts so I can see the value of ICD?

SimonWan · 2024-05-16T23:14:31Z

Hi @fuhengwu2021

> seems a thin wrapper of ICD because it just uses LLM to parse the result of ICD to make it more human readable, right?

Not exactly. The CodeShield README provides more details, but the TL;DR is that CodeShield has improved performance (efficiency, etc.) compared to the insecure-coding-practice repo you referred to.

> Could you please share some prompts so I can see the value of ICD?

Example prompts are available in the prompt dataset we open-sourced, specifically those listed under the ICD benchmark: https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks#running-instruct-and-autocomplete-benchmarks

You can also use the commands at that link to query an LLM with these prompts directly and observe some of the insecure code it generates.

SimonWan · 2024-05-30T19:18:39Z

I am closing this issue now as there has been no response in two weeks. Feel free to reopen it.

SimonWan added the Code-Shield label May 14, 2024

SimonWan closed this as completed May 30, 2024

SimonWan self-assigned this May 30, 2024
