Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions about MurmurHash and FNV #251

Open
Cylrx opened this issue Dec 22, 2022 · 4 comments
Open

Questions about MurmurHash and FNV #251

Cylrx opened this issue Dec 22, 2022 · 4 comments
Labels

Comments

@Cylrx
Copy link

Cylrx commented Dec 22, 2022

Hi all, I have a couple of noob questions that I hope someone could answer:

  1. Benchmarks on this GitHub page show that MurmurHash3 runs at over 3000 MB/s, while my implementation can only reach 40 MB/s. When I did some calculations, my number seems reasonable given that a 3GHz processor can only run 3*10^9 cycles per second, which translates to roughly 60 MB/s assuming that each hash takes around 50 cycles (as stated on the GitHub page). How are hash functions able to achieve such high speeds? Are there any compiler optimizations that could be contributing to this performance?

  2. In my tests, I have found that MurmurHash3 is significantly slower than FNV1a (40MB/s vs 80MB/s). However, according to the benchmarks, MurmurHash is 4 times as fast as FNV. How is this the case, as it appears that MurmurHash3 have lengthier code and more instructions than FNV1a.

I would greatly appreciate it if someone could answer my questions!

@avaneev
Copy link
Contributor

avaneev commented Dec 25, 2022

Code length is rarely a factor. Branching or the number of "pivot points" to reach the result matters most in hash functions.

@rurban
Copy link
Owner

rurban commented Dec 25, 2022

Code length is a huge factor in most use cases. Esp. Hash tables.

@Cylrx
Copy link
Author

Cylrx commented Dec 25, 2022

Thanks for answering! I actually figured that one out, I missed the fact that MurmurHash3 hashes more bits each time than FNV. But I still don't understand how all these test results were all significantly faster than my implementation (which is my first question).

@avaneev
Copy link
Contributor

avaneev commented Dec 25, 2022

Code length is a huge factor in most use cases. Esp. Hash tables.

Have you actually did any comparisons of attribute always_inline vs just inline? Compilers are extremely selective of what they actually inline. They won't even inline a 2-line function if the number of inlinings are estimated to be large. Matter of fact, the larger the application code as a whole is, the slower it runs. Code also consumes memory and cache bandwidth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants