TDE multi array benchmark requires further investigation #131

GFariasR · 2017-02-21T01:58:28Z

Initial results from the TDE multi array benchmark (jb/fftw/bm_time_delay_estimator_many) do not confirm our assumption.

For example, the many variant is slightly slower than the single variant at every percentile:
float:aligned:single summary min=110us, p25=111us, p50=111us, p75=111us, p90=113us, p99=114us, p99.9=118us, max=198us, N=10000

float:aligned:many summary min=113us, p25=113us, p50=113us, p75=113us, p90=115us, p99=116us, p99.9=123us, max=250us, N=10000

So, further investigation is required.

We will start by answering the following questions:
Are we running more iterations in one case than in the other?
Why is one path slower than the other? Use callgrind to find out.
Did we make sure FFTW is given the right options with respect to memory alignment?
Is the test fair with respect to FFTW initialization?
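For the memory alignment question, a quick sanity check is to confirm that the buffers handed to FFTW really sit on the boundary we expect. The sketch below is a minimal, library-independent check; the helper name is_aligned and the 32-byte boundary are assumptions for illustration, not part of the benchmark or of FFTW. FFTW's own allocator, fftw_malloc, is the usual way to get suitably aligned buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the benchmark or of FFTW): returns true
// when `p` sits on an `alignment`-byte boundary.
// `alignment` must be a nonzero power of two.
inline bool is_aligned(void const* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Convenience wrapper for float buffers, such as the timeseries arrays.
inline bool is_aligned(float const* p, std::size_t alignment) {
  return is_aligned(static_cast<void const*>(p), alignment);
}
```

Calling is_aligned(buffer, 32) on each input and output buffer in both the aligned and unaligned test cases would show whether the two cases really differ in the way their names suggest.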

coryan · 2017-02-21T02:33:56Z

It is also possible that FFTW does well when the number of parallel timeseries is small (try 1, 2, 4, 8).

FFTW also has more options, such as whether it may destroy its inputs (FFTW_DESTROY_INPUT vs. FFTW_PRESERVE_INPUT), and whether it should spend time searching for the best execution plan (FFTW_MEASURE) or just estimate one based on the sizes (FFTW_ESTIMATE).

coryan · 2017-02-21T14:38:19Z

Also, you can add a void iteration_setup(); member function to your fixture class to reset the array/vector to well-known values before each iteration. The time taken to run iteration_setup() is not included in the performance results.
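As a rough illustration of the idea, here is a self-contained sketch of a fixture with iteration_setup() and a toy harness that keeps the reset outside the timed region. Everything except the iteration_setup() name is an assumption made for this example: example_fixture, run(), setup_calls() and time_iterations() are hypothetical and are not the project's actual microbenchmark classes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical fixture: run() modifies its input in place, so without a
// reset every iteration would operate on different data.
class example_fixture {
public:
  explicit example_fixture(std::size_t size) : data_(size) {}

  // Reset the input to well-known values. Intended to be called before
  // each iteration, outside the timed region.
  void iteration_setup() {
    for (std::size_t i = 0; i != data_.size(); ++i) {
      data_[i] = static_cast<float>(i % 7);
    }
    ++setup_calls_;
  }

  // The body being measured.
  float run() {
    float sum = 0;
    for (auto& x : data_) {
      x *= 0.5f;
      sum += x;
    }
    return sum;
  }

  int setup_calls() const { return setup_calls_; }

private:
  std::vector<float> data_;
  int setup_calls_ = 0;
};

// Toy harness (an assumption, not the real one): only run() is timed.
template <typename Fixture>
std::vector<std::chrono::nanoseconds> time_iterations(Fixture& fixture,
                                                      int iterations) {
  assert(iterations >= 0);
  std::vector<std::chrono::nanoseconds> samples;
  samples.reserve(static_cast<std::size_t>(iterations));
  volatile float sink = 0;  // keeps the compiler from discarding run()
  for (int i = 0; i != iterations; ++i) {
    fixture.iteration_setup();  // not timed
    auto start = std::chrono::steady_clock::now();
    sink = fixture.run();
    auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));
  }
  (void)sink;
  return samples;
}
```

With a reset like this, both the single and many cases start every iteration from identical inputs, which removes one possible source of unfairness between them.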
