Create some pre-canned standard datasets like the h2o groupby dataset #256
Comments
This is a great idea. Here's what I would propose. The API would look something like the following:
We could also expose documentation of the available datasets through the API itself, so that we would not have to revise the docs every time we add a new dataset, along the lines of how dbutils is self-describing.
Initial datasets would be a) the datasets described in the documentation as examples, and b) a curated set of datasets such as the h2o one you reference. To others reading this: feel free to suggest datasets.
This looks like a great proposal! Thanks!
Expected Behavior
It'd be nice to make some "standard" datasets easily accessible, so users don't have to figure out how to create them from scratch. For example, the datasets used in the h2o benchmarks here.
Here are three rows from the h2o groupby dataset:
It would be nice if I could generate this dataset as follows:
Current Behavior
I am guessing that there is some way to generate this dataset with the current API, but it might take me a little while to figure it out.
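For readers who want something to experiment with in the meantime, here is a rough standard-library sketch of rows shaped like the h2o groupby data. It does not use this project's API, and the function name `make_h2o_groupby_like_rows` is hypothetical. The column layout (three string ids, three integer ids, and values `v1`, `v2`, `v3`) and the value ranges are assumptions based on my recollection of the h2o benchmark's generator, so check them against the benchmark's own script before relying on them.

```python
import random
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]


def make_h2o_groupby_like_rows(n: int, k: int, seed: int = 0) -> List[Row]:
    """Generate `n` rows with `k` low-cardinality groups (hypothetical helper).

    Assumed layout: id1/id2 have k distinct values, id3 has about n/k,
    id4/id5 are integers in 1..k, id6 is an integer in 1..n/k.
    """
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= n")
    rng = random.Random(seed)   # seeded for reproducible output
    high_card = max(n // k, 1)  # cardinality of the "large" id columns
    rows: List[Row] = []
    for _ in range(n):
        rows.append({
            "id1": f"id{rng.randint(1, k):03d}",
            "id2": f"id{rng.randint(1, k):03d}",
            "id3": f"id{rng.randint(1, high_card):010d}",
            "id4": rng.randint(1, k),
            "id5": rng.randint(1, k),
            "id6": rng.randint(1, high_card),
            "v1": rng.randint(1, 5),                # assumed range 1..5
            "v2": rng.randint(1, 15),               # assumed range 1..15
            "v3": round(rng.uniform(0, 100), 6),    # assumed float in [0, 100]
        })
    return rows
```

A list of dicts like this can be turned into a DataFrame by whatever engine is at hand; a built-in dataset in the library would of course generate the data in a distributed way rather than row by row on the driver.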