
Handling datasets using Hugging Face datasets and Hub #2

Open
dvsrepo opened this issue Sep 29, 2021 · 5 comments

Comments


dvsrepo commented Sep 29, 2021

Hi,

Love this initiative, congrats!

Would it be possible to integrate the datasets into the Hugging Face Hub? Besides the technical effort, would there be any copyright or licensing issues? If not, I'd be happy to help out with this.

JieyuZ2 (Owner) commented Sep 30, 2021

Hi,

Thank you!! Would you mind waiting until after the ICLR deadline? I'll be back soon!

dvsrepo (Author) commented Oct 8, 2021

Thanks for your quick response! That's perfect; ping me if you'd like me to help out.

JieyuZ2 (Owner) commented Oct 9, 2021

@dvsrepo Hey, I think it's a good idea! Though I'm not familiar with the Hugging Face Hub. One potential issue is that each dataset comes with a matrix of weak labels; could that also be incorporated, or only the raw data?

dvsrepo (Author) commented Oct 15, 2021

Hi @JieyuZ2, I don't think that should be a problem.

Just to be sure: can the datasets be instantiated from the JSON files here? https://drive.google.com/drive/folders/1v55IKG2JN9fMtKJWU48B_5_DcPWGnpTq?usp=sharing

And do they follow the format described here?

https://github.com/JieyuZ2/wrench/wiki/Dataset:-Format-and-Usage

Or are there additional matrix data files?

JieyuZ2 (Owner) commented Oct 15, 2021

@dvsrepo Yes, the additional matrix data is stored in the "weak_labels" field of the JSON. There are no other data files.
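For anyone picking this up, here is a minimal sketch of flattening one split into per-example records, keeping each example's row of the weak-label matrix as a list field. It assumes (not confirmed in this thread) that a split file is a JSON object keyed by example id, and that each value holds "label", "data" and "weak_labels". `load_wrench_split` is a hypothetical helper name, not part of WRENCH or `datasets`.

```python
import json
from typing import Any, Dict, List, Tuple


def load_wrench_split(text: str) -> Tuple[List[Dict[str, Any]], List[List[int]]]:
    """Hypothetical helper: parse one split (assumed layout: {id: {"label", "data", "weak_labels"}})."""
    raw = json.loads(text)
    records: List[Dict[str, Any]] = []
    # json.loads preserves key order, so rows follow the order of the file.
    for ex_id, ex in raw.items():
        records.append({
            "id": ex_id,
            "label": ex["label"],
            "data": ex["data"],                     # raw input fields (e.g. text)
            "weak_labels": list(ex["weak_labels"]),  # this example's row of the matrix
        })
    # Stacking the per-example rows recovers the (n_examples x n_labeling_functions) matrix.
    matrix = [r["weak_labels"] for r in records]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("examples have different numbers of weak labels")
    return records, matrix


sample = (
    '{"0": {"label": 1, "data": {"text": "good movie"}, "weak_labels": [1, -1, 1]},'
    ' "1": {"label": 0, "data": {"text": "bad plot"}, "weak_labels": [-1, 0, 0]}}'
)
records, matrix = load_wrench_split(sample)
print(matrix)  # [[1, -1, 1], [-1, 0, 0]]
```

Because each record carries its own row, the matrix survives as a sequence column rather than needing a separate file. In recent versions of the Hugging Face `datasets` library, a list of such records can be passed to `datasets.Dataset.from_list`.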
