Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CuckooFilter returns false negative #68

Open
tri2820 opened this issue Mar 28, 2024 · 3 comments
Open

CuckooFilter returns false negative #68

tri2820 opened this issue Mar 28, 2024 · 3 comments

Comments

@tri2820
Copy link

tri2820 commented Mar 28, 2024

I'm using CuckooFilter to test the membership of the following items. The filter returns false - definitely not in the list - for 2 items. These cases are false negatives which should not happen.

Is this a bug or am I missing something?
Version: "bloom-filters": "^3.0.1"

import { CuckooFilter } from "bloom-filters";
const items = [
  "https://www.youtube.com/watch?v=HJjxN05ewEc",
  "https://www.youtube.com/watch?v=BZNUo7orS3k",
  "https://www.youtube.com/watch?v=SD-McWZz_pk",
  "https://www.youtube.com/watch?v=De4QjH9fpgo",
  "https://www.youtube.com/watch?v=Hzko-cjHhTg",
  "https://www.youtube.com/watch?v=vqR-8lgOmBE",
  "https://www.youtube.com/watch?v=j6u0LH67YLk",
  "https://www.youtube.com/watch?v=B2z8ikGLRh8",
  "https://www.youtube.com/watch?v=N3ftBeP16TA",
  "https://www.youtube.com/watch?v=38RBRPwODUk",
  "https://www.youtube.com/watch?v=Ry8nSUfX6fY",
  "https://www.youtube.com/watch?v=-KrYohUJvYw",
  "https://www.youtube.com/watch?v=zRpl7Pr0fs4",
  "https://www.youtube.com/watch?v=uYYiypp6WaY",
  "https://www.youtube.com/watch?v=EPap21FBGbE",
  "https://www.youtube.com/watch?v=zN2_0WC7UfU",
  "https://www.youtube.com/watch?v=DNVwnkgTzbY",
];

const errorRate = 0.04; // 4 % error rate
const filter = CuckooFilter.from(items, errorRate);
console.log(filter.has("hello")); // return false (ok)
console.log(filter.has("https://www.youtube.com/watch?v=HJjxN05ewEc")); // return true (ok)
console.log(filter.has("https://www.youtube.com/watch?v=38RBRPwODUk")); // return false (false negative)
console.log(filter.has("https://www.youtube.com/watch?v=-KrYohUJvYw")); // return false (false negative)
@folkvir
Copy link
Collaborator

folkvir commented Mar 29, 2024

Hi! Thank your for sharing this behavior. That is actually weird, it seems that with a lot of rounds (100 000) we get a 66% of false negative and without the prefix https://www.youtube.com/watch?v= we get 33%. When dropping the error rate to 3% there is now no error at all what should be expected!. 🙄 Increasing the error rate up to 10% does not increase the number of false negative it stays at 33% without the prefix and 66% with the prefix. So for now no explanation on this, we need to investigate more but @Callidon and me are very busy so it may take longer than expected.

For now; try to use 0.03 and to remove the prefix https://www.youtube.com/watch?v.

Reproductible code on runkit (https://npm.runkit.com/bloom-filters)

const { CuckooFilter } = require("bloom-filters");
let items = [
  "https://www.youtube.com/watch?v=HJjxN05ewEc",
  "https://www.youtube.com/watch?v=BZNUo7orS3k",
  "https://www.youtube.com/watch?v=SD-McWZz_pk",
  "https://www.youtube.com/watch?v=De4QjH9fpgo",
  "https://www.youtube.com/watch?v=Hzko-cjHhTg",
  "https://www.youtube.com/watch?v=vqR-8lgOmBE",
  "https://www.youtube.com/watch?v=j6u0LH67YLk",
  "https://www.youtube.com/watch?v=B2z8ikGLRh8",
  "https://www.youtube.com/watch?v=N3ftBeP16TA",
  "https://www.youtube.com/watch?v=38RBRPwODUk",
  "https://www.youtube.com/watch?v=Ry8nSUfX6fY",
  "https://www.youtube.com/watch?v=-KrYohUJvYw",
  "https://www.youtube.com/watch?v=zRpl7Pr0fs4",
  "https://www.youtube.com/watch?v=uYYiypp6WaY",
  "https://www.youtube.com/watch?v=EPap21FBGbE",
  "https://www.youtube.com/watch?v=zN2_0WC7UfU",
  "https://www.youtube.com/watch?v=DNVwnkgTzbY",
];



const without_prefix = false;
const prefix = "https://www.youtube.com/watch?v="
const errorRate = 0.04; // 4 % error rate

if (without_prefix) {
    items = items.map(e => e.replace(prefix, ""));
}

const round = 100000
let c_false = 0

const filter = CuckooFilter.from(items, errorRate);
for (let i=0; i<round; i++) {
    let val = filter.has(`${without_prefix ? '' : prefix}HJjxN05ewEc`);
    if (!val) {
        c_false++;
    }
        
    val = filter.has(`${without_prefix ? '' : prefix}38RBRPwODUk`);
    if (!val) {
        c_false++;
    }
    val = filter.has(`${without_prefix ? '' : prefix}-KrYohUJvYw`);
    if (!val) {
        c_false++;
    }
}

console.log('Number of false positive:', c_false)
console.log('Rate:', c_false / (round * 3))

@tri2820
Copy link
Author

tri2820 commented Mar 30, 2024

Hi @folkvir, from the code it seems for each round we are just checking membership using '.has'.

We already know 2 out of 3 hard-coded items there would return false, so it would always return 66% no matter how many rounds we run.

@folkvir
Copy link
Collaborator

folkvir commented Apr 1, 2024

All the 3 items in my loop are part of the filter. The .has should return true and should return a 4% of false negatives.
I didn't check it yet but it could be due to the capacity of the filter. At 95% full the cuckoo filter can give 10% of errors. [1]

folkvir added a commit that referenced this issue May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants