CSV Benchmark
Yasuhiro Yamada edited this page Mar 1, 2023 · 4 revisions
Task: replace every character in the 2nd field with "@" in a CSV file of approximately 120 MiB.
Tool | 1st run | 2nd run | 3rd run | Average (real time) |
---|---|---|---|---|
teip + tr | 3.253s | 3.462s | 3.447s | 3.387s |
awk | 5.143s | 4.946s | 4.792s | 4.960s |
teip + awk | 5.099s | 4.987s | 6.069s | 5.385s |
Please note that teip parses CSV in accordance with RFC 4180, whereas AWK does not: with FS=, AWK splits on every comma, including commas inside quoted fields.
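To make the parsing difference concrete, here is a minimal sketch that runs the benchmark's AWK command on a single record whose 2nd field contains a quoted comma. The sample record is hypothetical and is not taken from the benchmark file; the benchmark data itself evidently has no such fields in column 2, since all three checksums below match.

```shell
# Hypothetical one-line sample (not from the benchmark data):
# the 2nd field is quoted and contains a comma.
sample='id,"Doe, John",42'

# Same AWK program as used in the benchmark. FS=, splits on every comma,
# so the quoted field is cut in two and only its first half is rewritten.
awk_out=$(printf '%s\n' "$sample" | awk '{gsub(".","@",$2);print}' FS=, OFS=,)
printf '%s\n' "$awk_out"
# → id,@@@@, John",42
```

An RFC 4180 parser treats `"Doe, John"` as one field, which is why the two approaches can diverge on inputs of this kind even though they agree on the test file.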
- Platform: AWS t3.medium (vCPU x 2, Memory 4 GiB)
- Storage: EBS volume gp2 / 200 GiB (600 IOPS)
$ wget https://github.com/greymd/test_files/raw/v1.0.0/xsv/1000000_Sales_Records.csv.gz
$ zcat 1000000_Sales_Records.csv.gz | awk '{print}' > test.csv # Filtered by AWK to add trailing newline
$ du -hs test.csv
120M test.csv
$ ./target/release/teip --csv -f 2 -- tr '[:print:]' '@' < test.csv > teip_result.csv
$ ./target/release/teip --csv -f 2 -- awk '{gsub(".", "@");print}' < test.csv > teip_awk_result.csv
$ awk '{gsub(".","@",$2);print}' FS=, OFS=, < test.csv > awk_result.csv
$ md5sum *_result.csv
4328c75307064d3bfc3743a24c83513b awk_result.csv
4328c75307064d3bfc3743a24c83513b teip_result.csv
4328c75307064d3bfc3743a24c83513b teip_awk_result.csv
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time ./target/release/teip --csv -f 2 -- tr '[:print:]' '@' < test.csv > /dev/null
real 0m3.253s
user 0m3.407s
sys 0m0.146s
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time ./target/release/teip --csv -f 2 -- tr '[:print:]' '@' < test.csv > /dev/null
real 0m3.462s
user 0m3.624s
sys 0m0.152s
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time ./target/release/teip --csv -f 2 -- tr '[:print:]' '@' < test.csv > /dev/null
real 0m3.447s
user 0m3.680s
sys 0m0.121s
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub(".","@",$2);print}' FS=, OFS=, < test.csv > /dev/null
real 0m5.143s
user 0m5.021s
sys 0m0.072s
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub(".","@",$2);print}' FS=, OFS=, < test.csv > /dev/null
real 0m4.946s
user 0m4.815s
sys 0m0.093s
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub(".","@",$2);print}' FS=, OFS=, < test.csv > /dev/null
real 0m4.792s
user 0m4.683s
sys 0m0.091s
$ time ./target/release/teip --csv -f 2 -- awk '{gsub(".", "@");print}' < test.csv > /dev/null
real 0m5.099s
user 0m4.840s
sys 0m0.187s
$ time ./target/release/teip --csv -f 2 -- awk '{gsub(".", "@");print}' < test.csv > /dev/null
real 0m4.987s
user 0m4.863s
sys 0m0.121s
$ time ./target/release/teip --csv -f 2 -- awk '{gsub(".", "@");print}' < test.csv > /dev/null
real 0m6.069s
user 0m5.819s
sys 0m0.071s
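The averages in the summary table can be re-derived from the `real` times recorded above. The following is only a sketch of that arithmetic; the comma-separated layout and the `averages` variable name are choices made for this illustration.

```shell
# Wall-clock ("real") times copied from the runs above, in seconds.
# Each input line: tool name, then the three measurements.
averages=$(printf '%s\n' \
  'teip + tr,3.253,3.462,3.447' \
  'awk,5.143,4.946,4.792' \
  'teip + awk,5.099,4.987,6.069' |
  awk -F, '{printf "%s: %.3fs\n", $1, ($2 + $3 + $4) / 3}')
printf '%s\n' "$averages"
# → teip + tr: 3.387s
# → awk: 4.960s
# → teip + awk: 5.385s
```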