This repository has been archived by the owner on Jul 12, 2019. It is now read-only.

major slowdown after transitioning to hdf5r package #2

Open
mkoohafkan opened this issue Nov 17, 2017 · 3 comments

@mkoohafkan
Owner

mkoohafkan commented Nov 17, 2017

Test case: compare the master branch (h5-based) to the hdf5r_transition branch (hdf5r-based).

library(microbenchmark)
ras.file = system.file("sample-data/SampleQuasiUnsteady.hdf", package = "RAStestR")
microbenchmark(vol.change.cum <- read_sediment(ras.file, "Vol Bed Change Cum"))

microbenchmark results:

Unit: milliseconds
  expr       min        lq      mean    median        uq      max neval
    h5    100.51  104.1777  110.2355  106.9395  111.2867 197.3422   100
 hdf5r  5685.172  5761.895  5860.317  5813.281  5908.393 6482.691   100

Benchmark for single table reads:

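get_dataset_hdf5r = function(f, table.path) {
  # open the file, then the dataset at table.path
  x = hdf5r::H5File$new(f)
  g = x$open(table.path)
  # read the full dataset into memory
  res = g$read()
  # release the dataset and file handles
  g$close()
  x$close()
  res
}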

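get_dataset_h5 = function(f, table.path, type = "double") {
  # open the file, then the dataset at table.path
  x = h5::h5file(f)
  g = h5::openDataSet(x, table.path, type)
  # read the full dataset into memory
  res = h5::readDataSet(g)
  # release the dataset and file handles
  h5::h5close(g)
  h5::h5close(x)
  res
}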

myfile = system.file("sample-data/SampleQuasiUnsteady.hdf", package = "RAStestR")
mytable = "Results/Sediment/Output Blocks/Sediment/Sediment Time Series/Cross Sections/Vol Bed Change Cum"

microbenchmark(
  get_dataset_hdf5r(myfile, mytable),
  get_dataset_h5(myfile, mytable)
)

microbenchmark results:

Unit: milliseconds
                               expr      min       lq     mean   median       uq       max neval
 get_dataset_hdf5r(myfile, mytable) 3.898140 3.995330 4.196941 4.093295 4.254240  6.102233   100
    get_dataset_h5(myfile, mytable) 1.558758 1.607431 2.190121 1.677407 1.758579 50.932394   100
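
To put the two benchmarks side by side, here is a quick base-R calculation on the reported medians. The numbers are copied from the tables above; the variable names and the `slowdown` helper are illustrative and not part of RAStestR.

```r
# Median timings in milliseconds, copied from the two benchmark tables above.
full_read   <- c(h5 = 106.9395, hdf5r = 5813.281)  # read_sediment() call
single_read <- c(h5 = 1.677407, hdf5r = 4.093295)  # one table read

# Hypothetical helper: how many times slower hdf5r is than h5.
slowdown <- function(ms) unname(ms["hdf5r"] / ms["h5"])

round(slowdown(full_read), 1)    # about 54
round(slowdown(single_read), 1)  # about 2.4
```

On these figures, the slowdown for the full `read_sediment` call is much larger than the per-read overhead alone. That suggests a single dataset read is not the whole story, though the benchmarks do not identify the cause.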
@mkoohafkan
Owner Author

See hhoeflin/hdf5r#85.

@mkoohafkan
Owner Author

mkoohafkan commented Nov 17, 2017

After optimizing with commit 5b97da3:

Unit: milliseconds
  expr       min       lq     mean   median       uq      max neval
 hdf5r  1463.447 1481.152 1515.812 1498.303 1538.885 1692.653   100

Nearly a 4x improvement (mean 5860 ms down to 1516 ms), but still about 14x slower than h5 (mean 110 ms).

@mkoohafkan
Owner Author

It doesn't look like this will get resolved. Waiting for HDFql support for compound datasets, at which point the backend will transition to hdfqlr.
