Thank you for reaching out and helping us improve Vaex!
Description
Add support in vaex.DataFrame.export_hdf5(...) for columns whose elements are variable-length lists/arrays/etc. and other HDF5 "special types" (see https://docs.h5py.org/en/stable/special.html). Ideally, the following example would run and produce an appropriate HDF5 file:
```python
import vaex
import numpy as np

# Generate 1000 lists of random length (0-99) to form a ragged list-of-lists
rng = np.random.default_rng()
lol = [list(range(rng.integers(0, 100))) for _ in range(1000)]
# Ragged rows become a 1-D NumPy array of dtype object
lol = np.array(lol, dtype=object)
# To vaex
df = vaex.from_arrays(_primary=lol)
# Export to a file (currently fails because the column has dtype object)
df.export_hdf5("test.hdf5")
```
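In the example above, the ragged column is a one-dimensional NumPy array of dtype `object`. That is the dtype the HDF5 writer refuses in the traceback under "Additional context".

Until variable-length support exists, one common workaround is to store such a column as two fixed-dtype arrays:

- the concatenated values;
- an offsets array of length n + 1.

This is the same layout Apache Arrow uses for list arrays, and it does not depend on vaex or on HDF5 vlen types. The sketch below assumes numeric rows. The helper names `ragged_to_offsets` and `offsets_to_ragged` are hypothetical, not a vaex or h5py API. Writing the two arrays to disk (for example as separate h5py datasets) is not shown.

```python
import numpy as np


def ragged_to_offsets(rows, dtype="int64"):
    """Flatten variable-length rows into (values, offsets).

    Row i is values[offsets[i]:offsets[i + 1]]; offsets starts at 0 and
    has len(rows) + 1 entries.
    """
    lengths = np.fromiter((len(r) for r in rows), dtype=np.int64, count=len(rows))
    offsets = np.zeros(len(rows) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])  # running total of row lengths
    if offsets[-1] > 0:
        values = np.concatenate([np.asarray(r, dtype=dtype) for r in rows])
    else:
        values = np.empty(0, dtype=dtype)  # no elements at all
    return values, offsets


def offsets_to_ragged(values, offsets):
    """Rebuild the list of per-row arrays from (values, offsets)."""
    return [values[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


values, offsets = ragged_to_offsets([[0, 1, 2], [], [0]])
print(values.tolist(), offsets.tolist())  # [0, 1, 2, 0] [0, 3, 3, 4]
```

Both arrays have an ordinary numeric dtype, so any HDF5 writer that handles plain arrays can store them. The cost is that readers must know to recombine them.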
Here, the column _primary (built from lol, a list-of-lists) holds variable-length lists; the elements could equally be other variable-length objects. HDF5 and h5py support such data (see https://docs.h5py.org/en/stable/special.html). I've confirmed in Python 3.10 that the following method writes lists-of-lists correctly (it is just a fragment of something I'm writing):
```python
import h5py
import numpy as np


def write_list(self, group: str, dataset: str, _list: list, **kwargs):
    """
    Write the provided list within [group, dataset] in the file located at self.path.

    Behaviour
    ----
    If [group, dataset] exists, it is deleted from the group and a new dataset is made. Note that this
    only removes the data from the HDF5 file's tree; it does not reclaim file space. Special behaviour
    arises when the elements of your list are not all the same size;
    see https://docs.h5py.org/en/stable/special.html.

    **kwargs
    ----
    _vtype: str (optional, default False)
        If your list is made up of lists or other elements of varying length, you must specify
        the element dtype, e.g., "int32" or "float64". The list-of-lists is converted to a
        list-of-arrays before being written.

    :param group: Parent key
    :param dataset: Child key
    :param _list: list
    :return: True on success.
    """
    with h5py.File(self.path, 'a') as f:
        if group not in f:
            f.create_group(group)
        if dataset in f[group]:
            del f[group][dataset]
        _vtype = kwargs.get("_vtype", False)
        if _vtype is not False:
            # Variable-length dataset: each row is stored as a 1-D array of _vtype
            _dtype = h5py.vlen_dtype(np.dtype(_vtype))
            _list = [np.array(d, dtype=_vtype) for d in _list]
            f.create_dataset(name=group + "/" + dataset, dtype=_dtype, data=_list)
        else:
            f.create_dataset(name=group + "/" + dataset, data=_list)
    return True
```
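`write_list` requires the caller to pass `_vtype` for ragged data. An export option that handled such columns without an explicit dtype would need to choose one element dtype per column.

The following is a minimal sketch of that step. It assumes every row is a flat numeric sequence. `infer_vlen_base_dtype` is a hypothetical helper, not part of h5py or vaex.

```python
import functools

import numpy as np


def infer_vlen_base_dtype(rows):
    """Guess one element dtype for a column of variable-length numeric rows.

    Empty rows are skipped: np.asarray([]) is float64 and would otherwise
    promote an all-integer column to float. Raises ValueError for rows that
    are not flat numeric sequences.
    """
    seen = {}  # dict keeps first-seen order, so promotion order is deterministic
    for row in rows:
        arr = np.asarray(row)
        if arr.ndim != 1:
            raise ValueError(f"expected 1-D rows, got shape {arr.shape}")
        if arr.size == 0:
            continue
        if arr.dtype.kind not in "biuf":  # bool, signed/unsigned int, float
            raise ValueError(f"unsupported element dtype {arr.dtype}")
        seen[arr.dtype] = None
    if not seen:
        return np.dtype("float64")  # arbitrary default for an all-empty column
    # Promote the distinct dtypes pairwise, in first-seen order
    return functools.reduce(np.promote_types, seen)


print(infer_vlen_base_dtype([[1, 2], [], [0.5]]))  # float64
```

The returned dtype could then be used as `_vtype`, which `write_list` passes to `h5py.vlen_dtype(...)`. Skipping empty rows is a deliberate choice: an empty row carries no type information.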
Is your feature request related to a problem? Please describe.
Not as far as I am aware.
Additional context
When vaex attempts to export a column of variable-length objects, it raises the following error:
```
Traceback (most recent call last):
  File "A:\straszaks\pycharm_tpa\DBKnowPy-sstrasza\class_DB.py", line 430, in <module>
    DB().test()
  File "A:\straszaks\pycharm_tpa\DBKnowPy-sstrasza\class_DB.py", line 428, in test
    self._export()
  File "A:\straszaks\pycharm_tpa\DBKnowPy-sstrasza\class_DB.py", line 211, in _export
    self.FileLookup.export_hdf5(os.path.join(self.Root, self.Name + "_FileLookup.hdf5"), progress=False)
  File "C:\Users\sstrasza\Documents\miniforge3\lib\site-packages\vaex\dataframe.py", line 6949, in export_hdf5
    writer.layout(self, progress=progressbar_layout)
  File "C:\Users\sstrasza\Documents\miniforge3\lib\site-packages\vaex\hdf5\writer.py", line 85, in layout
    raise TypeError(f"Cannot export column of type: {dtype} (column {name})")
TypeError: Cannot export column of type: object (column _keys)
```
vaex.DataFrame.export_hdf5 should provide an option for the user to declare which DataFrame columns contain variable-length types (or other HDF5 "special" types). vaex could then export those columns to HDF5 instead of raising the error above.
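One possible shape for the requested option is a mapping from column names to element dtypes.

The sketch below covers only the preparation step such an option might perform before writing. `prepare_columns_for_hdf5` and its `vlen_columns` argument are hypothetical and not part of vaex's API. Object columns that are not listed still raise an error worded like the one in the traceback.

```python
import numpy as np


def prepare_columns_for_hdf5(columns, vlen_columns=None):
    """Split columns into fixed-dtype arrays and variable-length columns.

    columns:      mapping of column name -> NumPy array
    vlen_columns: mapping of column name -> element dtype (e.g. "int32")
                  for object columns whose rows are variable-length arrays
    Returns (fixed, vlen): fixed maps name -> array unchanged; vlen maps
    name -> (list of per-row arrays, element dtype).
    """
    vlen_columns = vlen_columns or {}
    fixed, vlen = {}, {}
    for name, data in columns.items():
        data = np.asarray(data)
        if data.dtype != object:
            fixed[name] = data  # ordinary column, export as today
        elif name in vlen_columns:
            base = np.dtype(vlen_columns[name])
            # Convert each row to a typed 1-D array, as write_list does
            vlen[name] = ([np.asarray(row, dtype=base) for row in data], base)
        else:
            # Undeclared object columns are still rejected
            raise TypeError(f"Cannot export column of type: object (column {name})")
    return fixed, vlen


ragged = np.array([[0, 1], [], [2]], dtype=object)
fixed, vlen = prepare_columns_for_hdf5(
    {"x": np.arange(3), "_primary": ragged}, vlen_columns={"_primary": "int32"}
)
```

Each entry in `vlen` could then be written with `h5py.vlen_dtype(base)`, as in `write_list`. Requiring an explicit declaration keeps the current error for unexpected object columns instead of guessing their contents.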