Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interfaces to popular Rust hash libraries #276

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

tgross35
Copy link

@tgross35 tgross35 commented Sep 12, 2023

Fixes #69, #94

This adds bindings to Rust implementations of the following algorithms:

  • adler
  • ahash
  • ascon
  • ascona
  • blake2b512
  • blake2s256
  • blake3
  • crc32
  • fnvhash
  • fxhash32
  • fxhash64
  • hasher_hasher
  • highway
  • md5
  • ripemd128
  • ripemd160
  • ripemd256
  • ripemd320
  • sha1
  • sha224
  • sha256
  • sha384
  • sha512
  • sha512-224
  • sha512-256
  • cshake128
  • cshake256
  • keccak224
  • keccak256
  • keccak256-full
  • keccak384
  • keccak512
  • sha3-224
  • sha3-256
  • sha3-384
  • sha3-512
  • shake128
  • shake256
  • turboshake128
  • turboshake256
  • twox xxhash32
  • twox xxhash64
  • siphash 1-3
  • siphash 2-4
  • siphash128 1-3
  • siphash128 2-4
  • whirlpool
  • wyhash
  • xxhash3
  • xxhash128
  • xxhash32
  • xxhash64

C interfaces are in rust-hashes/rust_hashes.h. You can build librust_hashes.a with cargo run --release (output is in target/release). I assumed here that *out is at least 512 bits, keccak256full_rstest is a special case that needs 1600 bits if run.

I am not sure how to tie these in to the rest of the system. If you could do that it would be great, or just tell me how.

@tgross35 tgross35 mentioned this pull request Sep 12, 2023
@rurban
Copy link
Owner

rurban commented Sep 12, 2023

Looks great, but I'll need a few days to integrate and test these. Still at home with covid for a few more days

@tgross35
Copy link
Author

Thanks for the update, sounds good. Hope you feel better soon.

@tgross35 tgross35 force-pushed the add-rust branch 6 times, most recently from 411993e to a479f0b Compare September 13, 2023 02:22
@tgross35
Copy link
Author

tgross35 commented Sep 13, 2023

I think I got it all linked up, the tests run now. Few things I wasn't sure about:

  • Please review the cmake configuration
  • How does verification and quality in HashInfo need to be set?
  • Does anything need to be done to indicate hash algorithms that do not use the seed? Or hashes that use a smaller/bigger seed?
  • Some of these tests have very large hashes, e.g. Keccak-256 CryptoNight, and are probably pretty slow. Is there a way to adjust the maximum timing of these hashes so as not to exceed limits, or disable them by default?
  • Are MSAN/LeakSanitizer/ControlFlowIntegrity run at any point? Rust supports these (adding -Zsanitizer=msan to RUSTFLAGS during build) but only one at a time

I plan to expose Rust's HashMap API so that can be tested directly rather than going via unordered_map but haven't gotten to that yet

Non GHA CI probably does not work yet, may need some help there.

rurban and others added 2 commits September 13, 2023 08:24
@rurban
Copy link
Owner

rurban commented Sep 13, 2023

See the branch tgross35-add-rust with my cmake cleanups.
but still I get a linker error: (fc38)

/usr/lib64/ccache/g++ -std=c++11 -march=native -DRUST_ENABLED -DLTO -O3 -DNDEBUG -flto=auto -fno-fat-lto-objects CMakeFiles/SMHasher.dir/main.cpp.o -o SMHasher libSMHasherSupport.a -lhighwayhash -lahash_c /home/rurban/Software/smhasher/rust-hashes/target/release/librust_hashes.a
/usr/bin/ld: /home/rurban/Software/smhasher/rust-hashes/target/release/librust_hashes.a(rust_hashes-db6e9d89cc853d1d.rust_hashes.372886951253229d-cgu.0.rcgu.o): in function rust_eh_personality': /builddir/build/BUILD/rustc-1.72.0-src/library/std/src/personality/gcc.rs:251: multiple definition of rust_eh_personality'; /usr/local/lib/libahash_c.a(ahash_c-11c0a330212a2bf2.ahash_c.6drpysfk-cgu.1.rcgu.o):/builddir/build/BUILD/rustc-1.50.0-src/library/panic_unwind/src/gcc.rs:282: first defined here

@rurban
Copy link
Owner

rurban commented Sep 13, 2023

How does verification and quality in HashInfo need to be set?

I"ll do this manually. verification can be tested via --verbose --test=VerifyAll

Does anything need to be done to indicate hash algorithms that do not use the seed? Or hashes that use a smaller/bigger seed?

So far we went with 32bit seeds only, as 64bit seeds are too hard to test.
self-seeded hashes get verification 0. Generally all crypto hashes don't support a seed per se, so I added seed support to them manually. (mixing it into the IV).

Some of these tests have very large hashes, e.g. Keccak-256 CryptoNight, and are probably pretty slow. Is there a way to adjust the maximum timing of these hashes so as not to exceed limits, or disable them by default?

I adjusted some test values for these overly slow crypto hashes, or disable them completely when they bring nothing to the table. in our case Rust vs C++ would be interesting though, esp. when parallized.

Are MSAN/LeakSanitizer/ControlFlowIntegrity run at any point? Rust supports these (adding -Zsanitizer=msan to RUSTFLAGS during build) but only one at a time

Not yet, but should be done somewhen. I've only added asan, ubsan. I don't expect them to fail msan/cfi, but who knows. Quality was surprisingly mixed.

@tgross35
Copy link
Author

tgross35 commented Sep 13, 2023

Thanks for the fixes, I just enabled LTO and fixed RUSTFLAGS since I it seemed like that was not getting set properly (diff).

Are you able to disable the ahash_c interface and see if that works? I think there are probably some Rust symbol conflicts between that staticlib and this one since they're currently getting linked at the same time

@rurban
Copy link
Owner

rurban commented Sep 13, 2023

Yes it clashes only with the ahash_c. Maybe just add ahash to librust_hashes

@tgross35
Copy link
Author

Thanks for these detailed responses, I just saw them

Does anything need to be done to indicate hash algorithms that do not use the seed? Or hashes that use a smaller/bigger seed?

So far we went with 32bit seeds only, as 64bit seeds are too hard to test. self-seeded hashes get verification 0. Generally all crypto hashes don't support a seed per se, so I added seed support to them manually. (mixing it into the IV).

This makes it sound like there are tests that need to sweep the entire seed space? I am not too familiar with the test algorithms. All the hashes that take a u64 or larger seed currently use the provided seed and zero-extend it to the needed size, but I can change them to just ignore the seed if that is preferred.

Some of these tests have very large hashes, e.g. Keccak-256 CryptoNight, and are probably pretty slow. Is there a way to adjust the maximum timing of these hashes so as not to exceed limits, or disable them by default?

I adjusted some test values for these overly slow crypto hashes, or disable them completely when they bring nothing to the table. in our case Rust vs C++ would be interesting though, esp. when parallized.

I suspect the C++ implementations will win out much of the time :) Many of these Rust implementations come from the rustcrypto org which generally forbids handwritten asm - meaning it's trivial to get them to compile on obscure targets and is generally significantly less UB, but also more difficult to get that last 10% of possible performance.

Rust code does tend to optimize a bit better than C when neither have handwrittern instructions (near-ideal restrict keyword use definitely helps the compiler) so I am curious to see the results.

Are MSAN/LeakSanitizer/ControlFlowIntegrity run at any point? Rust supports these (adding -Zsanitizer=msan to RUSTFLAGS during build) but only one at a time

Not yet, but should be done somewhen. I've only added asan, ubsan. I don't expect them to fail msan/cfi, but who knows. Quality was surprisingly mixed.

Rust also has an interpreter miri that is used for UB detection and can detect some things that asan/msan can't. An experiment that runs smhasher's input suite on the rust algorithms via miri would be interesting.

Yes it clashes only with the ahash_c. Maybe just add ahash to librust_hashes

It is there currently, under the symbol ahash_rs. I see now that they use a different implementation for the seeds https://github.com/tkaitchuck/aHash/blob/master/smhasher/ahash-cbindings/src/lib.rs#L10, this implementation can be updated to match

Add ntdll to link list
@rurban
Copy link
Owner

rurban commented Sep 13, 2023

The old ahash has much better seeds. The new one has tons of bad seeds. So either remove it, or match it.

@rurban
Copy link
Owner

rurban commented Sep 13, 2023

Ad rust restrict: rust depends on llvm, and this still has a broken restrict implementation with small inlinable loops.

@rurban
Copy link
Owner

rurban commented Sep 13, 2023

All the hashes that take a u64 or larger seed currently use the provided seed and zero-extend it to the needed size, but I can change them to just ignore the seed if that is preferred.

zero-extension is fine. no seed makes them almost unusable for us.

@tgross35
Copy link
Author

I adjusted this implementation to use the same invocation as the current ahash. Is repeating the seed preferred to zero extending? I don't understand the exact vector.

Seems to be some linking errors on Windows. 64 bit github CI can't find some .libs and 32 bit on appveyor can't find any symbols - think I just might need to have it regenerate the bindings on that target. There is also a duplicate symbol error on aarch64, seems like blake3 uses the C implementation only on that platform. Think maybe the easiest thing there is just to rename those symbols in rust's output staticlib.

@rurban
Copy link
Owner

rurban commented Sep 15, 2023

Have a look at the results in the tgross35-add-rust branch.
Most crypto branches have lots of failures. So far I haven't found anything good to keep

@tgross35
Copy link
Author

Great to have results, thank you for the update.

What does the "invalid hash bit width" failure mean? Is this the failure you were referring to, or did you mean algorithmic failures? I wonder if I could have hooked something up wrong, or if some of this would be different if I repeated the keys rather than zero-extending.

Also, that branch does not have the recent change to ahash that I added in 5c30579 to make it match the previous implementation.

@tgross35
Copy link
Author

The blake hashes also take some optional input variables, I just used defaults since I was not entirely sure how to use them. I can update the usage if you have suggestions

@rurban
Copy link
Owner

rurban commented Sep 16, 2023

Great to have results, thank you for the update.

What does the "invalid hash bit width" failure mean? Is this the failure you were referring to, or did you mean algorithmic failures?

This is just the 123 typo. Will be tested in the next round. Testing is extremely slow on my home laptop.

I wonder if I could have hooked something up wrong

Also, that branch does not have the recent change to ahash that I added in 5c30579 to make it match the previous implementation.

Yes, that's missing. Should be the same as the standalone ahash.
More worrying are the crypto troubles. I'll look into them later

@rurban
Copy link
Owner

rurban commented Sep 17, 2023

I've tested the new ahash_rs, it"s even worse than the old.
worse momentchi2, and 2x bad seeds. let's keep the old, please

@tgross35
Copy link
Author

I've tested the new ahash_rs, it"s even worse than the old. worse momentchi2, and 2x bad seeds. let's keep the old, please

That is weird, it should be doing the exact same thing. Do you know what version of the library it was testing against before?

I can run some of these on my home computer which should be faster, just haven't gotten the chance.

@ogxd
Copy link

ogxd commented Nov 17, 2023

Hello! Any progress on this? I have this PR pending for adding gxhash but I think it might be better to simply add it as a rust dependency as I've put a lot more effort into this rust version, and is available on crates.io

@tgross35
Copy link
Author

Hey @ogxd,

I haven't yet taken another look at this so no, there are no updates. I am happy to add gxhash to this list before it gets closer to being mergeable, but I don't think you need to hold off on adding the C implementation as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add fxhash (Rust)
3 participants