Add argument to Profiler for samples ratio #1094
Labels: New Feature (A feature addition not currently in the library)
Comments
Hey @carlsonp! Thanks for opening the issue and for the idea. This makes a ton of sense, and I think it fits perfectly as a feature into the … There are two existing features for CSV and Parquet, although they are not percentages: … I think something like percentage sampling would be a nice addition to the readers: read the data in, sampled as desired, and pass the pre-sampled data to the profiler.
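A minimal sketch of what "percentage sampling in the reader" could look like, assuming a CSV source and using only the Python standard library. `read_csv_sampled` is a hypothetical helper, not part of any existing reader API. It uses Bernoulli sampling: each row is kept independently with probability `ratio` during a single pass, so the file is only read once.

```python
import csv
import io
import random
from typing import Iterable, List, Optional


def read_csv_sampled(lines: Iterable[str], ratio: float,
                     seed: Optional[int] = None) -> List[dict]:
    """Hypothetical reader: keep each data row with probability `ratio`.

    Reads the input in a single pass, so it never has to be loaded twice.
    The number of kept rows is random, with expected value ratio * n.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded RNG for reproducible samples
    reader = csv.DictReader(lines)  # first line is treated as the header
    # Bernoulli sampling: an independent coin flip per row.
    # rng.random() is in [0, 1), so ratio=1.0 keeps every row.
    return [row for row in reader if rng.random() < ratio]


if __name__ == "__main__":
    data = io.StringIO("a,b\n1,x\n2,y\n3,z\n")
    print(read_csv_sampled(data, ratio=0.5, seed=0))
```

The kept rows could then be loaded into, for example, a pandas DataFrame and handed to the profiler. The trade-off of this design is that the sample size is only approximately `ratio` times the row count; an exact-size sample needs the total row count up front.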
Today, there seem to be two settings for adjusting the sample size: `samples_per_update` and `min_true_samples`. I can load my file via Pandas and get the number of rows if I want to profile the whole thing. For example: …

I was just thinking it would be nice to add an additional flag like `samples_ratio`, a value between 0 and 1 denoting the fraction of the data you want to sample. That way you wouldn't essentially have to load the data twice; you could just say you want X percent loaded as samples and it would go from there.
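A rough sketch of how a `samples_ratio` value in [0, 1] might be turned into a concrete sample size, assuming the total row count is known. Both `samples_from_ratio` and `sample_rows` are hypothetical helpers for illustration, not existing Profiler arguments or library functions. The resulting count is the kind of number one would otherwise compute by hand and pass as `samples_per_update`.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def samples_from_ratio(total_rows: int, samples_ratio: float) -> int:
    """Hypothetical helper: convert a 0-1 ratio into a row count."""
    if not 0.0 <= samples_ratio <= 1.0:
        raise ValueError("samples_ratio must be between 0 and 1")
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    # round() absorbs float error such as 100 * 0.29 == 28.999999999999996.
    count = round(total_rows * samples_ratio)
    # A positive ratio on non-empty data should never yield zero samples.
    if samples_ratio > 0 and total_rows > 0:
        count = max(count, 1)
    return count


def sample_rows(rows: Sequence[T], samples_ratio: float,
                seed: Optional[int] = None) -> List[T]:
    """Draw an exact-size simple random sample without replacement."""
    k = samples_from_ratio(len(rows), samples_ratio)
    return random.Random(seed).sample(rows, k)
```

For example, `samples_from_ratio(1000, 0.25)` gives 250. The design choice here is an exact-size sample, which requires knowing the row count first; a single-pass reader would instead keep each row with probability `samples_ratio`, accepting a sample size that is only approximately right.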