from mongoengine import *
conn = connect('db_name', username='', password='', host='localhost', port=27017)
conn.close()
print(Collection.objects.count())  # Why does this still work?
Interesting, I tested this scenario too. Why does closing the connection not work? Does this mean it leaves a connection behind?
It's because there are references to the connection all over the place.
I think this solves the problem:
from mongoengine import connection
connection._dbs = {}
connection._connections = {}
connection._connection_settings = {}
# needed if you've done anything with Collection (saved/queried/etc.)
Collection._collection = None
This also solves problems using multiprocessing and getting warned about the dangers of forking.
@Clyde-fare
When should I use your code? For every query?
It works! Thank you so much! I'm using MongoEngine with multiprocessing and ran into the same issue. The forked child reaches debug_1 at first, but if I run any query in the child process it reaches debug_2 (the child is using the parent's topology instead of its own). I tried various ways to close and delete the connection, but the parent's connection still appeared after forking and triggered the warning from the topology:
pymongo/topology.py
def open(self):
    """Start monitoring, or restart after a fork.

    No effect if called multiple times.

    .. warning:: Topology is shared among multiple threads and is protected
      by mutual exclusion. Using Topology from a process other than the one
      that initialized it will emit a warning and may result in deadlock. To
      prevent this from happening, MongoClient must be created after any
      forking.

    """
    if self._pid is None:
        self._pid = os.getpid()  # debug_1
    else:
        if os.getpid() != self._pid:  # debug_2
            warnings.warn(
                "MongoClient opened before fork. Create MongoClient only "
                "after forking. See PyMongo's documentation for details: "
                "http://api.mongodb.org/python/current/faq.html#"
                "is-pymongo-fork-safe")

    with self._lock:
        self._ensure_opened()
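The check in `open()` boils down to remembering the PID that first used the object and comparing it on later calls. Here is a minimal stand-alone sketch of that pattern using only the standard library; `PidGuard` is a made-up name, not a PyMongo class, and the fork is simulated by overwriting the stored PID.

```python
import os
import warnings


class PidGuard:
    """Remembers which process first opened it and warns if another one does."""

    def __init__(self):
        self._pid = None

    def open(self):
        if self._pid is None:
            # First use: record the owning process (the debug_1 branch).
            self._pid = os.getpid()
            return True
        if os.getpid() != self._pid:
            # Used from a different process, e.g. a forked child (debug_2).
            warnings.warn("opened before fork; create it after forking")
            return False
        return True


guard = PidGuard()
first_ok = guard.open()

# Simulate "inherited from a parent" by pretending another PID owns the guard.
guard._pid = -1
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    second_ok = guard.open()
```

A child that inherits an already-opened object carries the parent's stored PID, which is why it lands in the warning branch unless the object is recreated after the fork.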
@Clyde-fare I found that disconnect() already cleans up _dbs, _connections and _connection_settings. The code below works for my case:
import os
from multiprocessing import Pool

from mongoengine.connection import connect, disconnect
from tqdm import tqdm


def save_data_to_database_per_dir(dir_path):
    # Each worker process opens its own connection after the fork.
    connect('db', host=...)
    # save data to db
    ....


def save_data_to_database(root, overwrite, process_num=10):
    count_before = MyClass.objects().count()
    # Drop the parent's connection before forking the workers.
    disconnect()
    MyClass._collection = None  # this line is NECESSARY! I wasted one day here...
    dir_paths = [os.path.join(root, category, 'data_files') for category in os.listdir(root)]
    with Pool(process_num) as p:
        r = list(tqdm(p.imap(save_data_to_database_per_dir, dir_paths), total=len(dir_paths)))
    count_after = MyClass.objects().count()
    tqdm.write('%d exist, %d newly saved.' % (count_before, count_after - count_before))
fyi, I'm currently working on fixing the disconnect method.
Hi, new to MongoDB. I seem to still be unable to close a connection:
disconnect('test')
for test_thingy in Test.objects:
    logger.info(test_thingy.to_json())
These logger calls still print the items in the test collection. My expectation is that disconnecting should prevent them from being accessed.