Pysyft: `del` of ObjectPointers on Python shutdown throws errors

Created on 14 May 2020 · 7Comments · Source: OpenMined/PySyft

Describe the bug
On execution of even a very basic code such as

import torch as th

import syft as sy

hook = sy.TorchHook(th)
alice = sy.VirtualWorker(hook, id="alice")
bob = sy.VirtualWorker(hook, id="bob")
crypto_provider = sy.VirtualWorker(hook, id="james")

a = torch.ones(1, 5)
a = a.encrypt(workers=[alice, bob], crypto_provider=crypto_provider)

a error is thrown when python shuts down.

Exception ignored in: <bound method ObjectPointer.__del__ of [PointerTensor | me:51874166013 -> alice:36055571191]>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/syft/generic/pointers/object_pointer.py", line 347, in __del__
  File "/usr/local/lib/python3.6/dist-packages/syft/workers/base.py", line 267, in send_msg
  File "/usr/local/lib/python3.6/dist-packages/syft/serde/serde.py", line 40, in serialize
ImportError: sys.meta_path is None, Python is likely shutting down
Exception ignored in: <bound method ObjectPointer.__del__ of [PointerTensor | me:83736240700 -> bob:18435096342]>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/syft/generic/pointers/object_pointer.py", line 347, in __del__
  File "/usr/local/lib/python3.6/dist-packages/syft/workers/base.py", line 267, in send_msg
  File "/usr/local/lib/python3.6/dist-packages/syft/serde/serde.py", line 40, in serialize
ImportError: sys.meta_path is None, Python is likely shutting down

To Reproduce
Steps to reproduce the behavior:

Go to https://colab.research.google.com/drive/1CSQGnxDZo0TwMjERRpLYKTtDmQoIu9LK?usp=sharing or run given snippet pasted in a .py file from your terminal/IDE
See error

Expected behavior
Python must terminate without throwing an error

Status Status Type Type

Source

Syzygianinfern0

Most helpful comment

I have seen this happen before, though it may not be consistent. I think the cause is PointerTensors or ObjectPointers going out of scope, having their __del__ method called (which attempts to send a message to the owner of the referenced object), and having that collide with Python shut down activities. There's not likely to be a quick fix for this though, because our GC implementation fundamentally relies on Python's GC, which offers no guarantees for when or whether the __del__ method will be executed.

Long term, the solution is probably to come up with a better way to do distributed GC.

karlhigley on 15 May 2020

👍2

All 7 comments

@karlhigley not able to replicate the error, both in google colab and in local terminal.

shubham3121 on 14 May 2020

Long term, the solution is probably to come up with a better way to do distributed GC.

karlhigley on 15 May 2020

👍2

@tudorcebere I've been thinking a lot about refactoring workers in order to make Protocols work, and I think we might want to consider having separate threads for message sending/receiving, message processing, and Plan/Protocol execution. If we did that, we'd need to be able to pass messages between threads, so we'd probably use thread-safe queues.

And if we had _that_, then maybe the __del__ hook that gets called for garbage collection when objects go out of scope could add a delete message to the queue instead of directly trying to serialize and send it. That would turn this issue into "Do we want to make sure outgoing delete messages are processed when Python shuts down? If so, how?"

Instead of sending outgoing delete messages when we shut down, it might make more sense for workers to GC remaining objects that came from workers they're no longer in contact with? Not sure, but seems possible.

karlhigley on 28 May 2020

Thinking maybe we should create a milestone for async/multi-threaded workers and assign this issue to that milestone. Anyone else have thoughts on that? I can't see a good way to address this without some form of concurrency, but that doesn't mean there isn't one. 🤔

karlhigley on 28 May 2020

👍1

@karlhigley I think this might be an awesome idea, workers really need some love, I like the idea of making send and receive on separate threads (could this help async workers as well?). This could be a step forward the actor model as well and the stack forwarding project. (maybe we would like to stick with some custom actor model?). I am not familiar with the GC behavior, but in my mind, the idea of adding a del message when an object goes out of scope could work really nice, (should make everything more transparent as well).

tudorcebere on 29 May 2020

👍1

The current GC behavior does send a delete message, but since our comms methods are currently synchronous and blocking, that means that garbage collection is a blocking operation. 🙁

karlhigley on 29 May 2020

This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.