Use pickle instead of JSON for serialization #138

Open
coderforlife opened this issue Jan 16, 2022 · 1 comment

Comments

@coderforlife

Why can't pickle be used instead of JSON? It supports a much wider range of types, and the advantages of JSON don't really apply here:

  • JSON is human-readable - not necessary here since it is just a transport between kernel and client
  • JSON is portable between non-Python applications (or between different versions of Python) - not important here since both kernel and client are running Python and presumably using the exact same Python version (in cases where they aren't [if that is even possible], you just need to make sure an earlier pickle protocol version is used)
  • JSON is secure in that it cannot cause arbitrary code to execute - not important here since you are already running large amounts of arbitrary code (the notebook cells along with any injected code)
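
On the "earlier pickle version" point in the second bullet, here is a minimal sketch of pinning an older pickle protocol so that two different Python versions could still exchange data. The payload and the choice of protocol 2 are assumptions for illustration only, not something testbook does today.

```python
import pickle

# Hypothetical payload; any picklable object would do.
payload = {"answer": 42, "values": (1, 2, 3), "raw": b"\x00\xff"}

# Protocol 2 can be read by every Python 3 release. The default protocol is
# newer and may not load on an older interpreter.
data = pickle.dumps(payload, protocol=2)

# Pickles written with protocol 2 or later begin with the PROTO opcode (0x80)
# followed by the protocol number, so the version can be checked on receipt.
protocol_used = data[1]

restored = pickle.loads(data)
```

Unlike JSON, the round trip keeps the tuple a tuple and the bytes value as bytes.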

As an example, I made some extra methods that I monkey-patch on, and so far they have worked in a much wider range of cases than the current value() method supports:

import ast
import pickle

def get_value(self, expression):
    """
    Gets a value computed with an expression in the notebook. The value must be pickle-able.

    Raises TestbookRuntimeError if there is a problem running the code.
    """
    output = self.inject(f"import pickle\npickle.dumps({expression})", pop=True).outputs[0]
    # Instead of ast.literal_eval, one could use (with text = output.data['text/plain']):
    #   text[2:-1].encode('latin1').decode('unicode-escape').encode('latin1')
    return pickle.loads(ast.literal_eval(output.data['text/plain']))

def set_variable(self, varname, value):
    """
    Sets a variable's value in the notebook.
    The varname must be a string containing a valid Python variable name.
    The value can be any value that can be pickled.
    """
    # !r embeds the bytes literal explicitly (avoids a BytesWarning under `python -b`)
    self.inject(f"import pickle\n{varname} = pickle.loads({pickle.dumps(value)!r})", pop=True)

You can then even do tb.get_value('_') which will get the output of the last executed cell. I have been able to use this for numpy arrays, Pandas DataFrames and Series, and other types as well that the JSON serialization balks at.
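
For reference, this is roughly what the text/plain round trip in get_value() amounts to, sketched without testbook or a kernel. The sample value is made up, and repr() stands in for what the kernel would report as the cell output; that is my assumption about the transport, not testbook's actual code path.

```python
import ast
import json
import pickle

# Hypothetical value containing types that the json module rejects.
value = {"tags": {"a", "b"}, "blob": b"\x00\x01", "z": 3 + 4j}

# JSON cannot serialize sets, bytes, or complex numbers.
try:
    json.dumps(value)
    json_ok = True
except TypeError:
    json_ok = False

# "Kernel" side: the cell output is the repr of the pickled bytes,
# i.e. a string that looks like a bytes literal.
text_plain = repr(pickle.dumps(value))

# "Client" side: turn the literal back into bytes safely, then unpickle.
restored = pickle.loads(ast.literal_eval(text_plain))
```

ast.literal_eval only accepts literals, so the parsing step itself cannot run code; the unpickling step can, which is the trade-off already discussed above.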

I wouldn't add the get_value() method to your class; instead, I would replace all usages of JSON with pickling. I only added separate methods so as not to interfere with any of the methods already there.

Some changes may need to be made to ref(), since it seems to return references only for things that are not JSON-serializable. It seems like it should always return a reference and never a value (though the TestBookReference object would need to support more magic methods for some people). One problem is that functions can sometimes be pickled, yet unpickling them might still fail: pickle stores a function by reference (its module and qualified name), so loading fails if that function cannot be imported on the receiving side.
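
A small sketch of that function problem, using only the standard library. The module name notebook_ns and the function double are hypothetical stand-ins for code defined in a notebook; removing the module from sys.modules is my way of simulating a client process that never had it.

```python
import pickle
import sys
import types

# Simulate a namespace that exists only on the "kernel" side.
mod = types.ModuleType("notebook_ns")  # hypothetical module name
exec("def double(x):\n    return 2 * x", mod.__dict__)
sys.modules["notebook_ns"] = mod

# Pickling succeeds, but the payload holds only the module and function
# names, not the function's code.
data = pickle.dumps(mod.double)
stored_by_reference = b"notebook_ns" in data and b"double" in data

# Simulate the "client" side, where that module cannot be imported.
del sys.modules["notebook_ns"]
try:
    pickle.loads(data)
    unpickle_ok = True
except ImportError:
    unpickle_ok = False
```

So a successful pickle.dumps() on the kernel side says nothing about whether pickle.loads() will work on the client.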

@tbenthompson

You could go a step further here and use cloudpickle, which would allow serializing a much wider range of objects including, for example, classes that are defined inside the notebook. cloudpickle is a common solution for interprocess communication of arbitrary objects in Python.
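
A rough sketch of the difference, assuming cloudpickle is installed (it is a third-party package, so the import is guarded here). The lambda is just a stand-in for something defined interactively in a notebook.

```python
import pickle

try:
    import cloudpickle  # third-party; assumed to be installed
except ImportError:
    cloudpickle = None

# Stand-in for a function defined interactively in a notebook cell.
square = lambda x: x * x

# Plain pickle looks functions up by name, and a lambda has no usable name.
try:
    pickle.dumps(square)
    plain_ok = True
except (pickle.PicklingError, AttributeError):
    plain_ok = False

# cloudpickle serializes the function by value; the result can be loaded
# with the standard pickle.loads.
if cloudpickle is not None:
    restored = pickle.loads(cloudpickle.dumps(square))
    result = restored(7)
else:
    result = None
```

As far as I know, the receiving side still needs cloudpickle importable for the load to succeed, so both kernel and client environments would need it installed.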
