Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stomp calculates wrong MP vectors for two time series comparison #40

Open
MichaelDoron opened this issue Apr 17, 2019 · 7 comments
Open

Comments

@MichaelDoron
Copy link

Reproduction:

from matrixprofile import *

a = np.random.rand(500)
b = np.random.rand(500)
mp_a_1 = matrixProfile.stomp(a,10,a)[0]
mp_a_2 = matrixProfile.stomp(a,10)[0]
mp_a_b = matrixProfile.stomp(a,10,b)[0]

assert np.max(np.abs(mp_a_b[0])) > 0, 'stomp returns 0-filled vectors when tsB != tsA'
assert (mp_a_1[0] == mp_a_2[0]).all(), 'stomp returns different vectors when tsB = tsA and when tsB = None'

@Ofer-Idan
Copy link
Contributor

Not sure what your first assertion is checking for (it seems to only check the first element in mp_a_b) - mp_a_b is not actually 0-filled (you can test it with mp_a_b.any())

But I'm seeing the same issue with comparing STOMP(A) != STOMP(A,A)

As far as I can tell, the issue is within STOMPDistanceProfile (I haven't tried to check this with STAMP) in the selfJoin check:
if selfJoin: trivialMatchRange = (int(max(0,idx - np.round(m/2,0))),int(min(idx + np.round(m/2+1,0),n))) distanceProfile[trivialMatchRange[0]:trivialMatchRange[1]] = np.inf

@Ofer-Idan
Copy link
Contributor

Put in a simple fix for STOMP(A) != STOMP(A,A).
I'm looking into the bigger issue where STOMP(A,B) produces weird results.

@vanbenschoten
Copy link
Collaborator

@Ofer-Idan any update on the above?

@JaKasb
Copy link

JaKasb commented Jun 15, 2019

Never compare floating point numbers for exact equality.
Use numpy.allclose() instead.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html
https://floating-point-gui.de/errors/comparison/

@Ofer-Idan
Copy link
Contributor

@vanbenschoten the PR for STOMP(A) != STOMP(A,A) was merged, so @MichaelDoron's original issue should be resolved (and this issue closed).
@MichaelDoron can you please pull the latest and see if this can be closed please?

(I haven't had a chance to look into my other STOMP issues in a while as I moved to a different project, but that should be a separate issue altogether)

Thanks!

@MichaelDoron
Copy link
Author

Hey, thanks for looking into this.
The second assertion is solved, mp_a_1 is now equal to mp_a_2.
However, the first assertion (the one checking whether stomp returns 0-filled vectors when tsB != tsA) is still failing - mp_a_b seems to be filled with zeros.

@aouyang1
Copy link
Contributor

dot[0] = dot_first[order]
I believe is the culprit here. Using the precomputed cache to populate the 0th index of the next row's distance profile makes the assumption that we're doing a self-join. In the event that we are comparing two time series that are not the same, this assumption becomes invalidated. What worked for me was to recompute the dot product between the query and the the 0th subsequence in the b timeseries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants