Inconsistencies in benchmark results #19
Replies: 9 comments 17 replies
-
I ended up testing with this file (New-Zealand-business-demography-statistics-At-February-2020), which is more representative of the kind of data loaded in data science work. It is 120 MB in size and has 5,429,253 rows of 5 columns. The benchmark is running right now with 10,000 repetitions for each method. I will post the results when the benchmark is done.
-
I just re-ran the benchmark from the updated Dockerfile with the new compile flags and the same GCC version 9.3.0.
{'filename': 'sample.csv', 'repetitions': 10000, 'timer': 'real'}
PANDAS_READ_CSV = 12.0711
NUMPY_FROMFILE = 1.9925
NUMPY_LOADTXT = 0.5585
NUMPY_GENFROMTXT = 1.9629
CSV = 0.157
CSV_LIST = 0.159
CSV_MAP = 0.1627
FASTER_THAN_CSV = 5.7969
I will retry with
-
I think it is OK to leave it here; I have seen it used many times.
-
I fixed it. I ran it more than 20 times, and with GCC 10.2 and Nim 1.5 it always takes half the time of the fastest alternative.
-
The results are much better now on the same machine with
{'filename': 'sample.csv', 'repetitions': 1000, 'timer': 'real'}
PANDAS_READ_CSV = 1.2201
NUMPY_FROMFILE = 0.1994
NUMPY_LOADTXT = 0.0569
NUMPY_GENFROMTXT = 0.1938
CSV = 0.0159
CSV_LIST = 0.0158
FASTER_THAN_CSV = 0.0149
I will try again with GCC 10.2 and Nim 1.5, but there is clearly a great improvement already.
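To compare this run with the earlier 10,000-repetition run, it may help to divide each figure by the repetition count. The sketch below assumes the reported numbers are total seconds over all repetitions, which the thread does not state explicitly. `per_repetition_ms` is a hypothetical helper, not part of the benchmark script, and only three of the methods are copied in.

```python
# Figures copied from the two runs posted in this thread.
# Assumption: each value is total wall-clock seconds over all repetitions.
RUN_10K = {"repetitions": 10000, "PANDAS_READ_CSV": 12.0711, "CSV": 0.157, "FASTER_THAN_CSV": 5.7969}
RUN_1K = {"repetitions": 1000, "PANDAS_READ_CSV": 1.2201, "CSV": 0.0159, "FASTER_THAN_CSV": 0.0149}


def per_repetition_ms(run):
    """Return {method: milliseconds per repetition} for one run (hypothetical helper)."""
    reps = run["repetitions"]
    return {name: 1000.0 * total / reps for name, total in run.items() if name != "repetitions"}


before = per_repetition_ms(RUN_10K)
after = per_repetition_ms(RUN_1K)
for name in before:
    print(f"{name:16s} {before[name]:.4f} ms -> {after[name]:.4f} ms")
```

Under that assumption, the pandas and csv figures are essentially unchanged per repetition, while FASTER_THAN_CSV drops from roughly 0.58 ms to roughly 0.015 ms.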
-
I re-ran the benchmark on the same system but with a tweaked Docker image to force the installation of the latest versions of GCC and Nim.
root@56056555239b:/tmp# gcc --version
gcc (Ubuntu 10.3.0-1ubuntu1~20.04~2) 10.3.0
root@56056555239b:/tmp# nim --version
Nim Compiler Version 1.5.1 [Linux: amd64]
Compiled at 2021-05-15
Copyright (c) 2006-2021 by Andreas Rumpf
root@56056555239b:/tmp# ./run-benchmark.sh
{'filename': 'sample.csv', 'repetitions': 1000, 'timer': 'real'}
PANDAS_READ_CSV = 1.0157
NUMPY_FROMFILE = 0.1679
NUMPY_LOADTXT = 0.0473
NUMPY_GENFROMTXT = 0.1908
CSV = 0.0136
CSV_LIST = 0.0137
FASTER_THAN_CSV = 0.0067
Now with the latest compiler versions it is twice as fast as the second fastest 👍🏿 🥇
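The "twice as fast" figure can be checked directly from the output above. This is a minimal sketch: `parse_results` is a hypothetical helper written for this post, and it assumes the output keeps the `NAME = value` shape shown in the thread.

```python
import re

# Output copied from the run above (GCC 10.3.0, Nim 1.5.1).
OUTPUT = """\
PANDAS_READ_CSV = 1.0157
NUMPY_FROMFILE = 0.1679
NUMPY_LOADTXT = 0.0473
NUMPY_GENFROMTXT = 0.1908
CSV = 0.0136
CSV_LIST = 0.0137
FASTER_THAN_CSV = 0.0067
"""


def parse_results(text):
    """Parse 'NAME = value' lines into a dict (hypothetical helper)."""
    pattern = re.compile(r"^(\w+)\s*=\s*([0-9.]+)\s*$", re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(text)}


results = parse_results(OUTPUT)
# Sort ascending by time so the first entry is the fastest method.
ranked = sorted(results.items(), key=lambda item: item[1])
(fastest, t1), (runner_up, t2) = ranked[0], ranked[1]
speedup = t2 / t1
print(f"{fastest} is {speedup:.2f}x faster than {runner_up}")
```

For these numbers the ratio is 0.0136 / 0.0067, or about 2.03, which matches the claim.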
-
I had a look at the optimization you made. Basically you discarded the
Did the compiler flags you changed a bit earlier have any effect? Or is the bulk of the improvement from refactoring
I guess there has been a change in the behavior of the Nim compiler since you first ran the benchmark.
-
Btw @juancarlospaco, you should add a description for the PyPI page.
-
This works.
-
Hi @juancarlospaco.
I have been running the benchmark quite a few times to generate the plot, and I noticed that on this AWS c5.4xlarge instance with 16 CPUs and 32 GB of RAM running Ubuntu 20.04, the results differ from those in the table in the README.
When I run the benchmarks on this machine I get the following results:
I believe faster-than-csv should be the fastest by far.
Is there anything that changed in the Nim compiler, or in the optimization options of either the Nim compiler or the C/C++ compiler, since the last time you ran the benchmark?
I used the results in the README table to generate the graph, but my own benchmarking does not agree with those results.
Can you re-run the benchmark and let me know if you get the same behavior as me?
I am currently running the benchmark on a much bigger file: 24 MB, 35 columns with mixed float, int, str and bool entries, and 142,695 rows.
I will update this thread with the results of this benchmark.
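For anyone who wants to reproduce this kind of measurement without the repository's script, here is a rough sketch using only the standard library. It is not the benchmark used in this thread: `make_sample_csv` and `read_with_csv` are hypothetical helpers, the data is random and held in memory rather than read from disk, and only the `csv` module is timed.

```python
import csv
import io
import random
import timeit


def make_sample_csv(rows, seed=0):
    """Build an in-memory CSV with float, int, str and bool columns (hypothetical data)."""
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["f", "i", "s", "b"])
    for _ in range(rows):
        writer.writerow([rng.random(), rng.randint(0, 999), rng.choice("abc"), rng.random() < 0.5])
    return buffer.getvalue()


def read_with_csv(text):
    """Parse every row with the standard-library csv reader."""
    return list(csv.reader(io.StringIO(text)))


sample = make_sample_csv(rows=1000)
# Repeat the measurement and keep the minimum, which is least affected by noise.
timings = timeit.repeat(lambda: read_with_csv(sample), number=20, repeat=5)
print(f"best of 5: {min(timings):.4f} s for 20 reads")
```

Taking the minimum over several repeats is one common way to reduce the run-to-run variation that can make results differ between machines.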