Error serializing dataframes with timestamps #599

Open
ektar opened this issue Feb 2, 2022 · 1 comment · May be fixed by #600

ektar commented Feb 2, 2022

I hit an "error serializing datetime" failure (the same happens with timestamps and similar types), described below, when serializing a complex class with several datetime and other parameters plus several dataframes. The issue seems to be in how pandas DataFrames are (de)serialized: the "to_dict" function is called (see here) instead of pandas' own JSON functions. In a DataFrame with timestamps, those values are simply dumped into the resulting dict as-is, and serializing that dict then fails because timestamps aren't JSON serializable without extra handling.
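
The failure mode described above can be reproduced without param or pandas. The sketch below is only an illustration: a plain `datetime.datetime` stands in for `pandas.Timestamp` (an assumption made to keep it dependency-free), and `encode_datetime` is a hypothetical helper name, not part of param.

```python
import datetime as dt
import json

# A record shaped like one row of a to_dict() result. A plain datetime
# stands in for pandas.Timestamp here (assumption, for illustration only).
record = {"a": dt.datetime(2000, 1, 1), "b": 1, "c": 1.0}

# Without extra handling, the json module rejects the datetime value.
try:
    json.dumps(record)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)


def encode_datetime(obj):
    """Hypothetical fallback for json.dumps: render datetimes as ISO 8601."""
    if isinstance(obj, dt.datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# With a `default` hook the same record serializes, but the result is a
# plain string with no type information attached.
encoded = json.dumps(record, default=encode_datetime)
print(encoded)
```

A hook like this gets past the error, but nothing in the output says that "a" was a datetime, so the reader would have to know that out of band.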

I found that monkey patching the functions to use pandas' own to_json/read_json gives the desired behavior. Using the "table" orient option also puts the schema in the output, which allows a better serde round trip.
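
To show what the "table" orient adds, here is a minimal pandas-only sketch of the round trip, independent of param. The `io.StringIO` wrapper is an addition not in the original report; it is used because newer pandas releases deprecate passing a literal JSON string to `read_json`.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({
    "a": [pd.Timestamp("2000-01-01")] * 3,
    "b": [1, 2, 3],
    "c": [1.0, 2.0, 3.0],
})

# orient="table" produces a dict with two keys: "schema" and "data".
payload = json.loads(df.to_json(orient="table"))

# The schema records a type for every column (and for the index).
field_types = {field["name"]: field["type"] for field in payload["schema"]["fields"]}
print(field_types)

# Reading back with the same orient uses the schema to restore dtypes.
restored = pd.read_json(io.StringIO(json.dumps(payload)), orient="table")
print(restored.dtypes)
```

Because the column types travel with the data, the datetime column comes back as a datetime column rather than as strings.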

If the code below looks OK, I'll be happy to submit a PR to add it.

Versions:
pandas: 1.4.0
json: 2.0.9
param: 1.12.0
python 3.10
osx

Description of expected behavior and the observed behavior:

Serialize and deserialize pandas dataframes with timestamps

Complete, minimal, self-contained example code that reproduces the issue

import datetime as dt
import json
import pandas as pd
import param

class TestClass(param.Parameterized):
    df_param = param.DataFrame()

df = pd.DataFrame({'a': [dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 1)],
                   'b': [1, 2, 3],
                   'c': [1.0, 2.0, 3.0]})

test = TestClass(df_param=df)

test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())

Stack traceback and/or browser JavaScript console output

Traceback (most recent call last):
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3251, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-212f05d3f170>", line 15, in <module>
    test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/parameterized.py", line 2087, in serialize_parameters
    return serializer.serialize_parameters(self_or_cls, subset=subset)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/serializer.py", line 104, in serialize_parameters
    return cls.dumps(components)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/site-packages/param/serializer.py", line 81, in dumps
    return json.dumps(obj)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/carlsoer/opt/anaconda3/envs/covid-model/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Timestamp is not JSON serializable

Example fix:

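def df_serialize(cls, value):
    # Let pandas encode the frame; orient='table' embeds a schema with the column types
    return json.loads(value.to_json(orient='table'))

def df_deserialize(cls, value):
    # Re-encode the dict as a JSON string so pandas can rebuild the frame from the schema
    import pandas as pd
    return pd.read_json(json.dumps(value), orient='table')

# Monkey patch param.DataFrame to use the pandas-based functions
param.DataFrame.serialize = df_serialize
param.DataFrame.deserialize = df_deserialize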

test_serde_dict = TestClass.param.deserialize_parameters(test.param.serialize_parameters())
test_serde = TestClass(**test_serde_dict)

print(repr(test_serde))
print(repr(test))

output:

TestClass(df_param=           a  b    c
0 2000-01-01  1  1.0
1 2000-01-01  2  2.0
2 2000-01-01  3  3.0, name='TestClass00002')
TestClass(df_param=           a  b    c
0 2000-01-01  1  1.0
1 2000-01-01  2  2.0
2 2000-01-01  3  3.0, name='TestClass00002')
@jlstevens
Contributor

Thanks for reporting this!

I'll need to think about it a little bit more, but at a glance the approach you propose would be a better solution than what is currently implemented. A PR would indeed be welcome!

@ektar ektar linked a pull request Feb 3, 2022 that will close this issue
@MridulS MridulS added this to the v1.12.1 milestone Feb 7, 2022
@maximlt maximlt modified the milestones: v1.12.x, v2.x Apr 4, 2023