Clarify performance considerations for mappers like scale_to_z_score_per_key and key_vocabulary_filename #251

Open
cyc opened this issue Oct 26, 2021 · 0 comments

cyc commented Oct 26, 2021

If I understand correctly, mappers such as scale_to_z_score_per_key have performance implications that depend on whether key_vocabulary_filename is set or left unset. The documentation makes it sound as though the choice is simply a matter of whether the keys fit into memory. Please correct me if I am wrong, but it seems that with key_vocabulary_filename=None the lookups are done in memory via map_per_key_reductions, which can be very inefficient when there are more than a handful of keys. Setting key_vocabulary_filename, on the other hand, creates a StaticHashTable, and lookups become much more efficient.
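To make the contrast concrete, here is a minimal plain-Python sketch of the two lookup patterns described above. It is not TensorFlow Transform code and does not reproduce how map_per_key_reductions or StaticHashTable are implemented; the function names are hypothetical, and the zero-variance handling is an assumption made only for this illustration. It shows only why scanning every known key for each row scales with rows × keys, while a hash lookup scales with rows.

```python
import math
from collections import defaultdict


def per_key_stats(keys, values):
    """Return {key: (mean, std)} using the population variance."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    stats = {}
    for k, vs in groups.items():
        mean = sum(vs) / len(vs)
        var = sum((v - mean) ** 2 for v in vs) / len(vs)
        stats[k] = (mean, math.sqrt(var))
    return stats


def zscore_linear_scan(keys, values, stats):
    """Find each row's key by scanning all known keys: O(rows * keys)."""
    key_list = list(stats)
    stat_list = [stats[k] for k in key_list]
    out = []
    for k, v in zip(keys, values):
        # Compare the row's key against every known key until it matches.
        idx = next(i for i, candidate in enumerate(key_list) if candidate == k)
        mean, std = stat_list[idx]
        # Assumption for this sketch: zero variance maps to 0.0.
        out.append((v - mean) / std if std > 0 else 0.0)
    return out


def zscore_hash_lookup(keys, values, stats):
    """Find each row's key in a hash table: O(rows) on average."""
    out = []
    for k, v in zip(keys, values):
        mean, std = stats[k]
        out.append((v - mean) / std if std > 0 else 0.0)
    return out


keys = ["a", "a", "b", "b", "b"]
values = [1.0, 3.0, 10.0, 20.0, 30.0]
stats = per_key_stats(keys, values)
scan = zscore_linear_scan(keys, values, stats)
table = zscore_hash_lookup(keys, values, stats)
```

Both functions return the same scaled values; only the cost of finding each row's statistics differs, and that cost is the part I am asking about.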

If my understanding is correct, it would be good to note this in the docs so that other people can decide what is best for their use case.
