We should try to understand better what we can serialize and what we cannot, so I'm making this issue to collect problems with our serialization.
There are five categories of problems:
If you run into problems like the above, please post a minimal example here or create a new issue for it.
Here is one that fits into category (3):

```python
import re

b = re.compile(r"\d+\.\d*")
```
We actually can handle this case. If you try ray.get(ray.put(b)), it works. The thing that is failing when you pass b into a remote function is something else. I'm looking into it.
Ok, the issue here is that pickle.dumps(b) succeeds, but pickle.dumps(type(b)) fails, so we can't do the usual register_class approach because that involves pickling the type.
That said, it seems like we shouldn't actually need to do register_class if we're pickling things.
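To make the failure mode concrete, here is a small sketch (hedged: the exact behavior depends on the Python version; this thread predates Python 3.7, where the pattern type became importable as `re.Pattern`):

```python
import pickle
import re

# Pickling the compiled pattern itself round-trips fine, because the
# re module registers pickle support for pattern objects via copyreg.
pattern = re.compile(r"\d+\.\d*")
restored = pickle.loads(pickle.dumps(pattern))
assert restored.match("3.14") is not None

# The failure described above was in pickling the pattern's *type*.
# On old Pythons the type was _sre.SRE_Pattern, which pickle could not
# serialize by reference (the name is not an attribute of the _sre
# module), so pickle.dumps(type(pattern)) raised PicklingError there.
```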
more resources: https://github.com/jsonpickle/jsonpickle/tree/master/tests and https://github.com/jsonpickle/jsonpickle/issues
Ideally we would port all the tests to Ray.
Is category (2) a real thing? Cloudpickle is supposed to be more general. It is true that pickle "succeeds" at serializing some things that cloudpickle fails at, e.g.,

```python
class Foo(object):
    def __init__(self):
        super

cloudpickle.dumps(Foo)  # PicklingError: Could not pickle object as excessively deep recursion required.
pickle.dumps(Foo)       # b'\x80\x03c__main__\nFoo\nq\x00.'
```
However, this is misleading. Pickle only "succeeds" because it serializes the class by reference and doesn't capture the class definition (you couldn't unpickle it in a process where `Foo` isn't defined).
After #550, I think category (3) should more or less be nonexistent. I'm probably wrong about this, but we'll see.
Category (4) is a big problem. E.g., #319.
I just remembered an important class of types that fall in category (3), which is subtypes of standard types like lists/dicts/tuples (see #512 for an example).
For example, if we serialize a collections.defaultdict, then it will be deserialized as a plain dictionary. This presumably happens for many subclassed types.
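A minimal sketch of the type-loss problem described above (the `naive_roundtrip` function below is a hypothetical stand-in for a serializer that only understands base dicts):

```python
import pickle
from collections import defaultdict

d = defaultdict(int)
d["a"] += 1

# Plain pickle preserves both the subclass and the default factory.
restored = pickle.loads(pickle.dumps(d))
assert type(restored) is defaultdict
assert restored["missing"] == 0  # default factory still works

def naive_roundtrip(obj):
    # Stands in for a serializer that converts subclasses of dict
    # to a base dict: the subclass and its default factory are lost.
    return dict(obj)

lossy = naive_roundtrip(d)
assert type(lossy) is dict
```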
I opened https://issues.apache.org/jira/browse/ARROW-1059 with the goal of making it easier to expand the kinds of objects that can be serialized using the Arrow IPC machinery. Objects that don't fit the set of "fast storage" types (tensors and the other supported primitive types) could then be stored as pickles, and pyarrow could register a pickle deserializer so that it can understand this user "message" type.
Ray currently fails to serialize xarray objects #748 (this is category (3) above).
Ray also fails to serialize tensorflow-related objects, like Tensor.
Do you have a plan to fix this?
I'm working on reinforcement learning algorithms, and sometimes I need to broadcast a tf model or Tensor, and it fails.
This is already possible with a helper function; see the docs.
@programmerlwj the underlying problem is that TensorFlow objects can't easily be serialized (e.g., with pickle). The preferable solution is to only ship the neural net weights or the gradient (e.g., as a list or dictionary of numpy arrays).
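The suggested pattern, sketched without TensorFlow (the `TinyModel` class below is a hypothetical stand-in for a real model): extract the weights as a plain dictionary of arrays, ship only that, and load it back on the other side.

```python
import pickle

class TinyModel:
    """Stand-in for a TF model; only its weights need to travel."""
    def __init__(self):
        self.weights = {"w": [0.1, 0.2], "b": [0.0]}

    def get_weights(self):
        # A plain dict of lists/arrays -- cheap and safe to serialize.
        return self.weights

    def set_weights(self, weights):
        self.weights = weights

sender = TinyModel()
payload = pickle.dumps(sender.get_weights())  # ship only the weights

receiver = TinyModel()
receiver.set_weights(pickle.loads(payload))
assert receiver.get_weights() == sender.get_weights()
```

With a real model, the unserializable graph/session objects stay put; only the numeric state crosses process boundaries.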
This is a bit of a doozy, but I would really like to be able to use ray with the excellent sacred library - they belong together, like peanut butter and jelly. (I'm specifically referring to ray tune here.)
The sacred.Experiment object is a nightmare to serialize, though. pickle fails to serialize it in its simplest incarnation, and real-world experiments are usually significantly more complicated.
```python
import pickle
import cloudpickle
import dill
from sacred.experiment import Experiment

test_exp = Experiment('test')

pickle.dumps(test_exp)       # nope
cloudpickle.dumps(test_exp)  # nope
dill.dumps(test_exp)         # also nope
```
Is there any hope for ever serializing an Experiment?
If not, there is an early-stage discussion about an OO-based interface to sacred that might provide a path forward.
Update: I have managed to get tune running without having to serialize the Experiment, and I am no longer sure that it is required.
We encountered many difficulties with serialization. #3917
In all cases, we wrote our own `__reduce__` method to pickle our objects for other purposes. Can we configure Ray to use our custom serialization?
Thank you!
@JessicaSchrouff yes definitely, can you see if ray.register_custom_serializer does what you want? https://ray.readthedocs.io/en/latest/api.html#ray.register_custom_serializer
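For reference, the `__reduce__` approach mentioned above works with plain pickle out of the box; a minimal sketch (the `Scaler` class is hypothetical):

```python
import pickle

class Scaler:
    """Holds a lambda, which plain pickle cannot serialize; __reduce__
    tells pickle how to rebuild the object from picklable state."""
    def __init__(self, factor):
        self.factor = factor
        self.fn = lambda x: x * factor  # unpicklable attribute

    def __reduce__(self):
        # Rebuild by calling Scaler(factor) on deserialization,
        # so the lambda is recreated rather than serialized.
        return (Scaler, (self.factor,))

s = pickle.loads(pickle.dumps(Scaler(3)))
assert s.fn(4) == 12
```

Without `__reduce__`, `pickle.dumps` would try to serialize the instance `__dict__`, hit the lambda, and fail.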
@pcmoritz
I'm having some trouble serializing spacy objects (NLP models):
pickle cannot serialize them, but cloudpickle and dill do.
Would you have a recommendation, or should I go with a custom serialization function?
Great work!
@alaameloh Can you share an example that you would like to get working?
```python
custom_model = spacy.load('path_to_model')

result_ids = []
for doc in documents:
    result_ids.append(processing_function.remote(doc, custom_model))
```
Here Ray falls back to using pickle to serialize the spacy model. However, with the spacy version I'm currently working with (2.0.16), pickle doesn't work (but cloudpickle does). It gets stuck afterwards and crashes after some time (due to memory, I believe).
Depending on the model, the loading time for spacy would be an inconvenience if I simply loaded the model inside processing_function and executed it on every call.
@alaameloh hm, when we say "falling back to pickle" I think we actually mean "cloudpickle". Can you provide a runnable code snippet that we can use to reproduce the issue?
@robertnishihara
```python
import spacy
import ray

@ray.remote
def processing(nlp, text):
    processed = nlp(text)
    return processed

ray.init()

texts_list = [
    'this is just a random text to illustrate the serializing use case for ray',
    'Ray is a flexible, high-performance distributed execution framework. To launch a Ray cluster, '
    'either privately, on AWS, or on GCP, follow these instructions.',
    'This document describes what Python objects Ray can and cannot serialize into the object store. '
    'Once an object is placed in the object store, it is immutable.',
]

processed_id = []
nlp = spacy.load('en')
for text in texts_list:
    processed_id.append(processing.remote(nlp, text))
processed_list = ray.get(processed_id)
```
```
WARNING: Falling back to serializing objects of type <class 'pathlib.PosixPath'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.vocab.Vocab'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.tokenizer.Tokenizer'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.pipeline.DependencyParser'> by using pickle. This may be inefficient.
WARNING: Falling back to serializing objects of type <class 'spacy.pipeline.EntityRecognizer'> by using pickle. This may be inefficient.
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0408 14:25:37.543133 10242 node_manager.cc:245] Last heartbeat was sent 961 ms ago
Traceback (most recent call last):
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/remote_function.py", line 71, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/remote_function.py", line 121, in _remote
    resources=resources)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 591, in submit_task
    args_for_local_scheduler.append(put(arg))
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 2233, in put
    worker.put_object(object_id, value)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 359, in put_object
    self.store_and_register(object_id, value)
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/worker.py", line 293, in store_and_register
    self.task_driver_id))
  File "/home/alaameloh/anaconda3/envs/poc/lib/python3.7/site-packages/ray/utils.py", line 437, in _wrapper
    return orig_attr(*args, **kwargs)
  File "pyarrow/_plasma.pyx", line 493, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 345, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowMemoryError: malloc of size 134217728 failed
```
Wanted to add a couple of notes here for problems that I am experiencing related to serialization.
```python
import ray

ray.init()

def manager_fun_list(x_list, fun):
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
    return res

@ray.remote
def manager_fun_one(x, fun):
    return fun(x)
```
Now let's test this code:
```python
import unittest

from .dummy import manager_fun_list

class TestDummy(unittest.TestCase):
    x_list = [1, 2, 3, 4, 5]

    def myfun(self, x):
        return x * 2

    def test_managed_myfun(self):
        manager_fun_list(self.x_list, self.myfun)
```
Running the test results in `TypeError: Cannot serialize socket object`, likely because of the problem with self / super discussed above.
Now, here's something interesting that I can't seem to get past when trying to create unit tests for your recommended workaround (https://ray.readthedocs.io/en/latest/serialization.html#last-resort-workaround):
```python
import pickle

import ray

ray.init()

def manager_fun_list(x_list, fun):
    fun = pickle.dumps(fun)
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
    return res

@ray.remote
def manager_fun_one(x, fun):
    fun = pickle.loads(fun)
    return fun(x)
```
```python
import unittest

from .dummy import manager_fun_list

def myfun(x):
    return x * 2

class TestDummy(unittest.TestCase):
    x_list = [1, 2, 3, 4, 5]

    def test_managed_myfun(self):
        res = manager_fun_list(self.x_list, myfun)
        self.assertEqual(res, [2, 4, 6, 8, 10])
```
And the result... is.... ??? huh?
```
ERROR: test_managed_myfun (tests.test_manager_fun_list.TestDummy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/.../.../tests/test_manager_fun_list.py", line 12, in test_managed_myfun
    res = manager_fun_list(self.x_list, myfun)
  File "/workspaces/.../.../dummy.py", line 7, in manager_fun_list
    res = ray.get([manager_fun_one.remote(x, fun) for x in x_list])
  File "/usr/local/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=11502, host=662e97277f24)
  File "/workspaces/.../.../dummy.py", line 12, in manager_fun_one
    fun = pickle.loads(fun)
ModuleNotFoundError: No module named 'tests'
```
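What is likely happening here: plain `pickle` serializes a module-level function *by reference*, recording the defining module and attribute name in the byte stream. At `pickle.loads` time the worker re-imports that module, and if the import fails (the worker has no `tests` package on its path), you get exactly this `ModuleNotFoundError`. A small demonstration using a stdlib function:

```python
import pickle
from os.path import join

# The pickle stream for a module-level function contains only a
# reference -- the defining module's name and the attribute name --
# not the function's code.
data = pickle.dumps(join)
assert join.__module__.encode() in data  # e.g. b"posixpath"
assert b"join" in data

# Unpickling re-imports the module and looks the name up; if that
# import fails on the receiving process, loads() raises
# ModuleNotFoundError, as in the traceback above.
assert pickle.loads(data) is join
```

This is also why cloudpickle exists: it can serialize functions by value, so the receiving process doesn't need to import the defining module.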
These are all fixed, I think.