TDE multi array benchmark requires further investigation #131
Comments
It is also possible that FFTW does well when the number of parallel timeseries is small (try 1, 2, 4, 8). FFTW also has more options worth exploring, such as whether it may destroy its inputs (`FFTW_DESTROY_INPUT`), and whether it should spend time searching for the best execution plan (`FFTW_MEASURE`) or just estimate one from the sizes (`FFTW_ESTIMATE`).
Also, you can add a |
Initial results from running the TDE multi array benchmark (jb/fftw/bm_time_delay_estimator_many) do not confirm our assumption.
For example:
float:aligned:single summary min=110us, p25=111us, p50=111us, p75=111us, p90=113us, p99=114us, p99.9=118us, max=198us, N=10000
float:aligned:many summary min=113us, p25=113us, p50=113us, p75=113us, p90=115us, p99=116us, p99.9=123us, max=250us, N=10000
So, further investigation is required.
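To put a number on the gap, the two summary lines above can be parsed and compared percentile by percentile. The following is a minimal sketch in Python, chosen only for brevity; `parse_summary` and `relative_slowdown` are hypothetical helpers, not part of the benchmark code.

```python
import re

# The two summary lines quoted above, copied verbatim.
SUMMARIES = """\
float:aligned:single summary min=110us, p25=111us, p50=111us, p75=111us, p90=113us, p99=114us, p99.9=118us, max=198us, N=10000
float:aligned:many summary min=113us, p25=113us, p50=113us, p75=113us, p90=115us, p99=116us, p99.9=123us, max=250us, N=10000
"""


def parse_summary(line):
    """Return (test_name, stats); timings are in microseconds, N is a count."""
    name, _, rest = line.partition(" summary ")
    stats = {}
    # Matches key=value pairs such as "p99.9=118us" or "N=10000".
    for key, value in re.findall(r"([\w.]+)=(\d+)(?:us)?", rest):
        stats[key] = int(value)
    return name, stats


def relative_slowdown(single, many, key):
    """Fraction by which the 'many' case is slower than 'single' for one statistic."""
    return (many[key] - single[key]) / single[key]


results = dict(parse_summary(line) for line in SUMMARIES.splitlines())
single = results["float:aligned:single"]
many = results["float:aligned:many"]

for key in ("min", "p50", "p99", "max"):
    print(f"{key}: {relative_slowdown(single, many, key):+.1%}")
```

On these figures the median of the `many` case is about 1.8% slower than `single` (113us vs. 111us), and both runs report the same sample count, N=10000. The sample count alone does not say how much work each sample performs.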
We will start by answering the following questions:
Are we running more iterations in one case than in the other?
Why is one path slower than the other? Use callgrind to find out.
Did we make sure FFTW is given the right options with respect to memory alignment?
Is the test fair with respect to FFTW initialization?
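The iteration-count and initialization questions above come down to how the timing loop is structured. As an illustration only, here is a generic harness sketched in Python rather than the project's own code; `benchmark`, `setup`, and `run` are hypothetical names. One-time initialization (for FFTW, plan creation and buffer allocation) stays outside the timed region, and both cases are given the same number of samples and iterations per sample.

```python
import time


def benchmark(setup, run, samples=1000, iterations_per_sample=1, warmup=10):
    """Time run(state) and return one per-iteration timing (ns) per sample.

    setup() is called exactly once and is never timed, so one-time
    initialization cost cannot leak into the measurements.
    """
    state = setup()
    # Untimed warm-up calls so the first timed sample is not an outlier.
    for _ in range(warmup):
        run(state)
    timings_ns = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(iterations_per_sample):
            run(state)
        elapsed = time.perf_counter_ns() - start
        timings_ns.append(elapsed / iterations_per_sample)
    return timings_ns
```

Calling `benchmark` with identical `samples`, `iterations_per_sample`, and `warmup` values for the single and many cases makes the two timing distributions directly comparable.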