Can't pickle mxnet Modules
print pickle.__version__   # => $Revision: 72223 $
print mx.__version__       # => 0.12.0
TypeError Traceback (most recent call last)
<ipython-input-36-024323561f98> in <module>()
1 import pickle
----> 2 pickle.dumps(mlp_model)
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in dumps(obj, protocol)
1378 def dumps(obj, protocol=None):
1379 file = StringIO()
-> 1380 Pickler(file, protocol).dump(obj)
1381 return file.getvalue()
1382
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
423
424 if state is not None:
--> 425 save(state)
426 write(BUILD)
427
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
667 for k, v in items:
668 save(k)
--> 669 save(v)
670 write(SETITEM)
671 return
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
/home/ubuntu/anaconda2/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
68 else:
69 if base is self.__class__:
---> 70 raise TypeError, "can't pickle %s objects" % base.__name__
71 state = base(self)
72 args = (self.__class__, base, state)
TypeError: can't pickle module objects
import mxnet as mx
import pickle

# Simple MLP: two hidden ReLU layers plus a softmax output.
net = mx.sym.Variable('data')
net = mx.sym.flatten(net)
net = mx.sym.FullyConnected(net, num_hidden=128)
net = mx.sym.Activation(net, act_type="relu")
net = mx.sym.FullyConnected(net, num_hidden=64)
net = mx.sym.Activation(net, act_type="relu")
net = mx.sym.FullyConnected(net, num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

mlp_model = mx.mod.Module(symbol=net, context=mx.gpu())
pickle.dumps(mlp_model)  # raises TypeError: can't pickle module objects
Why do you want to pickle the whole Module object?
You can save your symbol with: https://mxnet.incubator.apache.org/api/python/symbol.html#mxnet.symbol.Symbol.save
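For example (a minimal sketch; file names are illustrative, and save_params only works after the parameters have been bound and initialized/trained):

net.save('mlp-symbol.json')          # network definition as JSON
mlp_model.save_params('mlp.params')  # weights; valid only after binding/fit

# Rebuild later without pickle:
sym = mx.sym.load('mlp-symbol.json')
restored = mx.mod.Module(symbol=sym, context=mx.cpu())
# then restored.bind(...) and restored.load_params('mlp.params')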
For example, I use parallel processing to distribute my training jobs, and joblib uses pickle for its multiprocessing.
Honestly, I don't think Module can be pickled. MXNet has a lot of C++ state inside.
For multi-core CPU processing (if you don't use a GPU), MXNet supports configuration through environment variables: https://mxnet.incubator.apache.org/how_to/env_var.html
You can also use NNPACK to parallelize training operations on the CPU.
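As an illustration of the environment-variable route (values here are illustrative; these variables are read when mxnet is imported, so set them first):

import os
os.environ['MXNET_CPU_WORKER_NTHREADS'] = '4'  # CPU worker thread pool
os.environ['OMP_NUM_THREADS'] = '4'            # OpenMP threads for operators
import mxnet as mx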
If you use the old mxnet.model interface instead, it can be pickled.
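Something like this (a hedged sketch of the pre-Module FeedForward API, reusing the `net` symbol from above; whether a plain pickle round-trip succeeds depends on the MXNet version):

import pickle
import mxnet as mx

ff = mx.model.FeedForward(symbol=net, ctx=mx.cpu(), num_epoch=1)
blob = pickle.dumps(ff)    # serialize the whole model object
ff2 = pickle.loads(blob)   # and restore it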
Training the model is actually not the longest part of my pipeline (sometimes it adds an insignificant amount of time); a lot of other numpy-based machinery happens outside of it, and parallelization helps immensely with that. I have to run multiple processes with different seeds.
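One possible workaround (a sketch, not from this thread): instead of pickling the Module, pass picklable ingredients to each worker and rebuild the Module inside the child process.

from joblib import Parallel, delayed
import mxnet as mx

def train_one(symbol_json, seed):
    mx.random.seed(seed)                 # different seed per process
    sym = mx.sym.load_json(symbol_json)  # rebuild the graph locally
    mod = mx.mod.Module(symbol=sym, context=mx.cpu())
    # ... bind, init_params, and fit here ...
    return seed

results = Parallel(n_jobs=4)(
    delayed(train_one)(net.tojson(), s) for s in range(4))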
Proposed labels: "Feature Request", "Module", "Python"
I am using Windows 7, Anaconda Navigator 1.9.2, Python 3.6.6, and Jupyter Notebook 5.7.0, trying to learn the code from chapter 5 of the Gluon crash course.
I already added:
import pickle
I got stuck at:
for data, label in train_data:
print(data.shape, label.shape)
break
---
AttributeError Traceback (most recent call last)
<ipython-input-8-91a66f98d1d2> in <module>()
----> 1 for data, label in train_data:
2 print(data.shape, label.shape)
3 break
E:\Anaconda\envs\mxnet\lib\site-packages\mxnet\gluon\data\dataloader.py in __iter__(self)
282 # multi-worker
283 return _MultiWorkerIter(self._num_workers, self._dataset,
--> 284 self._batchify_fn, self._batch_sampler)
285
286 def __len__(self):
E:\Anaconda\envs\mxnet\lib\site-packages\mxnet\gluon\data\dataloader.py in __init__(self, num_workers, dataset, batchify_fn, batch_sampler)
142 args=(self._dataset, self._key_queue, self._data_queue, self._batchify_fn))
143 worker.daemon = True
--> 144 worker.start()
145 workers.append(worker)
146
E:\Anaconda\envs\mxnet\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect
E:\Anaconda\envs\mxnet\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):
E:\Anaconda\envs\mxnet\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):
E:\Anaconda\envs\mxnet\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)
E:\Anaconda\envs\mxnet\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
AttributeError: Can't pickle local object 'Dataset.transform_first.<locals>.base_fn'
Please help. Thanks.
@sliawatimena Can you file a separate issue on this repository, and can you also provide a minimal reproducible example to help debug it?
From the stack trace you've posted, it is unclear where you are using pickle.
Also, are you using pickle.dump or pickle.load?
Dear @piyushghai,
I just copied the code from "5. Train the neural network"; steps 1-5 are okay. In step 6, the error message is as in my previous post.
From googling: this looks like a Windows-specific problem with Python multiprocessing and Jupyter Notebook. On Windows, multiprocessing starts workers with spawn, which pickles the DataLoader's state (including the locally defined transform function) to send to each worker, and that local function cannot be pickled. Please help.
Thanks.
Suryadi
This issue helped me: https://github.com/apache/incubator-mxnet/issues/10562
Change num_workers=4 to num_workers=0 (with num_workers=0 the data is loaded in the main process, so nothing needs to be pickled):
train_data = gluon.data.DataLoader(
    mnist_train, batch_size=batch_size, shuffle=True, num_workers=0)
Also, do the same for the validation data.
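For example (variable names assumed from the crash course; gluon is mxnet.gluon):

valid_data = gluon.data.DataLoader(
    mnist_valid, batch_size=batch_size, shuffle=False, num_workers=0)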
Hope this helps.