Transforming datasets #31

Open
cifkao opened this issue Dec 25, 2020 · 0 comments
Labels
enhancement New feature or request

cifkao commented Dec 25, 2020

MusPy seems to have a really smooth pipeline for (down)loading a dataset and iterating over it or converting it to a PyTorch or TensorFlow dataset using one of the pre-defined representations. What I would like to do, and what doesn't currently seem easy, is to create a new dataset object by transforming an existing one. An example use case would be to download the Lakh dataset, filter it by some criteria, split it into short segments, apply some data augmentation, and then use the result to train a PyTorch model. Maybe something like this:

lmd_split = lmd.transform(filter_and_split_fn, "data/lmd_split")  # transforms and saves dataset or reuses existing result
lmd_aug = lmd_split.transform(aug_fn, "data/lmd_aug")
lmd_aug.to_pytorch_dataset(representation="pianoroll")

where each transform function takes a single Music object and returns a list of Music objects (possibly empty, which would cover filtering).
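To make the "transforms and saves dataset or reuses existing result" behaviour concrete, here is a minimal sketch of how such a step could work. It is not MusPy code: `transform_to_folder` and the `.done` marker file are hypothetical names, and plain dicts serialized as JSON stand in for Music objects.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def transform_to_folder(
    items: Iterable[dict],
    fn: Callable[[dict], List[dict]],
    out_dir: str,
) -> List[Path]:
    """Apply fn to every item and save the results, or reuse a finished run.

    Hypothetical helper: plain dicts stand in for Music objects.
    """
    out = Path(out_dir)
    done_marker = out / ".done"  # written only after a complete run
    if done_marker.exists():
        # A previous run finished, so reuse its output without calling fn.
        return sorted(out.glob("*.json"))

    out.mkdir(parents=True, exist_ok=True)
    for stale in out.glob("*.json"):  # leftovers from an interrupted run
        stale.unlink()

    count = 0
    for item in items:
        # fn may return zero items (filter), one (map), or many (split).
        for result in fn(item):
            (out / f"{count:06d}.json").write_text(json.dumps(result))
            count += 1
    done_marker.touch()
    return sorted(out.glob("*.json"))
```

Because the function returns a list, one callback can filter, map, and split in a single pass. The marker file is just one possible way to tell a finished run from an interrupted one.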

Another (even more general, but maybe less efficient) possibility would be to create a new dataset from a generator, e.g.:

def g():
    for music in lmd:
        yield music.transpose(1)

lmd_aug = FolderDataset.from_generator(g(), "data/lmd_aug")
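The generator variant could be sketched roughly as follows. Again this is an illustration under stated assumptions rather than MusPy's API: `save_from_generator` and `transposed` are hypothetical names, and dicts with a `pitches` list stand in for Music objects.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def save_from_generator(items: Iterable[dict], out_dir: str) -> int:
    """Write every yielded item to its own JSON file; return how many were saved.

    Hypothetical stand-in for the proposed from_generator constructor.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for item in items:
        (out / f"{count:06d}.json").write_text(json.dumps(item))
        count += 1
    return count


def transposed(source: Iterable[dict], semitones: int) -> Iterator[dict]:
    """Yield a transposed copy of each item, leaving the source untouched."""
    for item in source:
        yield {**item, "pitches": [p + semitones for p in item["pitches"]]}
```

A generator is consumed one item at a time, so this form is harder to parallelize than a per-item transform function, which may be the efficiency trade-off mentioned above.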
@salu133445 salu133445 added the enhancement New feature or request label Dec 26, 2020