Problems with remove_duplicate #32

cifkao · 2020-12-26T17:54:44Z

The remove_duplicate method has two problems:

It requires the list to be sorted, which is not documented.
The sorting done by the sort method is not sufficient for this purpose, because it only sorts by time. To catch all duplicates, one would need to sort by all attributes (including non-primitive ones, which are not comparable). (For example duplicate tracks will get removed iff they are adjacent.)

Since making all MusPy objects (of the same type) comparable in a meaningful way would be complicated, the only reasonable solution seems to be to remove duplicates without sorting. For that, it would be sufficient to make objects hashable. Or JSON strings can be used as keys for a quick solution.

The text was updated successfully, but these errors were encountered:

salu133445 · 2020-12-26T20:32:35Z

Yes, the current implementation of remove_duplicate only works as expected if the sort method sorts objects with all attributes, which is not the case for now. I agree that making MusPy objects comparable would be complicated and misleading in some way.

I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.

However, we might need a policy (possibly given as an argument) to decide which object, the former one or the latter one, to keep. For example, if we have tempos = [Tempo(time=0, qpm=100), Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)], we probably want to keep the last one, but the result can be either [Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)] or [Tempo(time=0, qpm=100)], which depends on user's preference.

cifkao · 2020-12-26T20:54:27Z

I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.

Right, that would work.

However, we might need a policy (possibly given as an argument) to decide which object, the former one or the latter one, to keep. For example, if we have tempos = [Tempo(time=0, qpm=100), Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)], we probably want to keep the last one, but the result can be either [Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)] or [Tempo(time=0, qpm=100)], which depends on user's preference.

Hmm, I wasn't even thinking of this case, what I had in mind were exactly duplicate objects that don't get sorted next to each other because there is some other object at the same time step which will end up between them.

cifkao · 2020-12-27T10:49:20Z

I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.

Actually, this doesn't work for objects that don't have a time attribute (e.g. tracks). Right now remove_duplicate will consider such objects, but there is no way to sort them.

salu133445 · 2020-12-27T17:59:42Z

I think the high-level idea is to do it for each time step if an object has a time attribute, otherwise do it for all list items.

cifkao mentioned this issue Dec 26, 2020

Merging objects #33

Merged

salu133445 added the enhancement New feature or request label Dec 27, 2020

This was referenced Jan 8, 2021

Slow but correct implementation of remove_duplicate #46

Merged

Add _apply_list_op, each, map, filter to ComplexBase #47

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problems with remove_duplicate #32

Problems with remove_duplicate #32

cifkao commented Dec 26, 2020 •

edited

salu133445 commented Dec 26, 2020

cifkao commented Dec 26, 2020

cifkao commented Dec 27, 2020

salu133445 commented Dec 27, 2020

Problems with remove_duplicate #32

Problems with remove_duplicate #32

Comments

cifkao commented Dec 26, 2020 • edited

salu133445 commented Dec 26, 2020

cifkao commented Dec 26, 2020

cifkao commented Dec 27, 2020

salu133445 commented Dec 27, 2020

cifkao commented Dec 26, 2020 •

edited