Problems with remove_duplicate #32
Comments
Yes, the current implementation of … I think a simple equality check is sufficient, as we can do this independently for each time step by finding all objects that have the same … However, we might need a policy (possibly given as an argument) to decide which object, the former or the latter, to keep. For example, if we have …
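To make the "policy as an argument" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the `Note` class is a stand-in rather than the MusPy class, and `dedupe_with_policy`, its `key` callable and its `keep` argument are names invented for illustration. The caller decides what counts as "the same" through `key`.

```python
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical stand-in for a note object, not the real MusPy class.
    time: int
    pitch: int
    velocity: int


def dedupe_with_policy(items, key, keep="first"):
    """Drop items that share a key, keeping the first or last one seen.

    `key` is a caller-supplied function (an assumption of this sketch)
    that maps an item to a hashable value identifying "the same" object.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}
    for item in items:
        k = key(item)
        # With keep="last" a later item overwrites the earlier one; the
        # dict still remembers the position of the first occurrence.
        if keep == "last" or k not in chosen:
            chosen[k] = item
    return list(chosen.values())


notes = [Note(0, 60, 64), Note(0, 60, 100), Note(1, 62, 80)]
same_onset_and_pitch = lambda n: (n.time, n.pitch)
kept_first = dedupe_with_policy(notes, same_onset_and_pitch, keep="first")
kept_last = dedupe_with_policy(notes, same_onset_and_pitch, keep="last")
```

Here `kept_first` retains the note with velocity 64 and `kept_last` the one with velocity 100; in both cases the survivor sits where the first occurrence was.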
Right, that would work.
Hmm, I wasn't even thinking of this case. What I had in mind were exact duplicates that don't get sorted next to each other because some other object at the same time step ends up between them.
Actually, this doesn't work for objects that don't have a time attribute (e.g. tracks). Right now …
I think the high-level idea is to do it for each time step if an object has a time attribute, otherwise do it for all list items.
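A rough sketch of that high-level idea, using only an equality check: objects are bucketed by their `time` attribute when they have one, and objects without it all share a single bucket. The `Note` and `Track` classes and the function name `remove_duplicates_grouped` are hypothetical stand-ins, not MusPy code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical object with a time attribute.
    time: int
    pitch: int


@dataclass
class Track:
    # Hypothetical object without a time attribute.
    name: str


def remove_duplicates_grouped(items):
    """Remove exact duplicates using only ==, keeping the first occurrence.

    Items are compared only against already-kept items in the same
    bucket: one bucket per time step, or a single shared bucket (None)
    for objects that have no time attribute.
    """
    kept_per_bucket = defaultdict(list)
    kept = []
    for item in items:
        bucket = getattr(item, "time", None)
        if not any(item == other for other in kept_per_bucket[bucket]):
            kept_per_bucket[bucket].append(item)
            kept.append(item)
    return kept


notes = remove_duplicates_grouped(
    [Note(0, 60), Note(0, 62), Note(0, 60), Note(1, 60)]
)
tracks = remove_duplicates_grouped([Track("a"), Track("b"), Track("a")])
```

No sorting is involved, so the duplicate `Note(0, 60)` is caught even though `Note(0, 62)` sits between the two copies, and the input order is preserved.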
The `remove_duplicate` method has two problems:

The `sort` method is not sufficient for this purpose, because it only sorts by time. To catch all duplicates, one would need to sort by all attributes (including non-primitive ones, which are not comparable). (For example, duplicate tracks will get removed iff they are adjacent.)

Since making all MusPy objects (of the same type) comparable in a meaningful way would be complicated, the only reasonable solution seems to be to remove duplicates without sorting. For that, it would be sufficient to make objects hashable. Alternatively, JSON strings can be used as keys for a quick solution.
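The "JSON strings as keys" quick fix could look roughly like the sketch below. It assumes each object can be turned into a plain dictionary; here that is done with `dataclasses.asdict` on a hypothetical `Note` dataclass, and the function name `remove_duplicates` is also illustrative. How real MusPy objects would be serialized is not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Note:
    # Hypothetical stand-in for a MusPy-like object.
    time: int
    pitch: int
    duration: int


def remove_duplicates(items):
    """Return items without exact duplicates, keeping first occurrences.

    No sorting is needed: each item is serialized to a canonical JSON
    string, which is hashable and so can be stored in a set.
    """
    seen = set()
    unique = []
    for item in items:
        # sort_keys=True makes the string independent of key order.
        key = json.dumps(asdict(item), sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


notes = [Note(0, 60, 4), Note(0, 64, 4), Note(0, 60, 4)]
deduped = remove_duplicates(notes)
```

Because membership is checked through a set, non-adjacent duplicates are removed in a single pass and the original order is kept. Making the objects hashable directly would remove the serialization step, at the cost of defining `__hash__` consistently with `__eq__` for every class.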