Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea for a new test: how "wide" can a hash go #120

Open
dumblob opened this issue Apr 8, 2020 · 2 comments
Open

Idea for a new test: how "wide" can a hash go #120

dumblob opened this issue Apr 8, 2020 · 2 comments

Comments

@dumblob
Copy link

dumblob commented Apr 8, 2020

I've stolen the term "wide" from the http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/ and I quote:

Incidentally, I was confused for a long time about the distinction between PRNGs and hash functions—they seem to do the same thing, i.e. create an “unpredictable” but deterministic output by jumbling up input bits. It wasn’t until I started playing around with GPU PRNGs that I realized the difference: PRNGs are designed for going deep, and hashes for going wide.

Could we construct a test showing how "wide" the hash goes? Such test could also be named "how prone the hash is to create correlation patterns".

It's probably not that much important for hash quality in general, but maybe it might find some important use in the future once it'll be measured 😉.

@peteroupc
Copy link

peteroupc commented Apr 8, 2020

The article you linked to compares two PRNGs:

  • A linear conguential generator (LCG) with a small multiplier and a modulus of 2^32.
  • A 32-bit xorshift PRNG.

Both kinds of PRNGs are linear, so they tend to produce correlated sequences of numbers across nearby seeds. Nonlinear PRNGs as well as nonlinear hash functions are less likely to do so. Generally, two PRNGs (e.g., a PRNG initialized with one seed and the same PRNG initialized with a nearby seed) can be tested for correlation by interleaving their outputs. Something similar can be done for hash functions. See also my notes on testing PRNGs.

However, for the use in hash tables and other data lookup applications, hash functions are designed to map data into a sequence of bits in a way that avoids collisions ("better than random").

(Note: Despite what the article states, LCGs are not always "bad news"; at larger sizes and when their lower bits are chopped off, LCGs can have exceptional randomness quality. Meanwhile, Xorshift generators by themselves fail linear complexity tests, but this is not much of an issue for casual games, which is the focus of the article.)

@o0101
Copy link
Contributor

o0101 commented Apr 23, 2020

In this case, there's probably already a test (or two) in SMHasher that measures "wideness" described in the article,

set up a lot of independent PRNG instances with different seeds so we can draw from each of them in parallel

(read hash for PRNG, in the above),

being the Seed test and MomentChi2 tests, which I think test some sort of correlation on seeded and non-seeded runs on the same key, and also on nearby seeds on the same key. Or at least they used to (MomentChi2 I think got updated in the last few months, tho maybe that was only a code re-write).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants