When saving a model during unit testing with unittest I get warnings on unclosed files.
This happens with custom models as well as with models downloaded from public repository.
import spacy
from tempfile import TemporaryDirectory
from unittest import TestCase
class TestSaveSpacyModel(TestCase):
def test_save(self):
nlp = spacy.load("de_core_news_sm")
with TemporaryDirectory() as temp_dir:
nlp.to_disk(temp_dir)
Running with unittest results in:
Process finished with exit code 0
C:\***\lib\site-packages\spacy\language.py:900: ResourceWarning: unclosed file <_io.TextIOWrapper name='C:\\***\\Temp\\tmpdintxtwz\\meta.json' mode='w' encoding='cp1252'>
srsly.json_dumps(self.meta)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
C:\***\lib\site-packages\spacy\util.py:645: ResourceWarning: unclosed file <_io.BufferedWriter name='C:\\***\\Temp\\tmpdintxtwz\\tagger\\model'>
writer(path / key)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
C:\***\lib\site-packages\spacy\util.py:645: ResourceWarning: unclosed file <_io.BufferedWriter name='C:\\***\\Temp\\tmpdintxtwz\\vocab\\vectors'>
writer(path / key)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Ran 1 test in 1.457s
OK
This seems to be a bug similar as to #1706, which was related to loading models and has been fixed. However, it seems that saving a model also does not properly close files.
Since this can be muted easily, this is not much of an issue for me. Still though, might as well report it.
```
Thanks for the report! Definitely something to look into. If you feel like helping out with a PR, that would be most welcome :-)
Sure, I'll have a look.
As a side note, the reason this may not have been seen yet appears to be this bug in pytest:
https://github.com/pytest-dev/pytest/issues/5676
As suggested by the warning messages, some of the file descriptors declared in to_disk methods of classes Language, Vectors and Tagger are currently not properly closed after use. As far as I can see this seems to be the case for classes Pipe and EntityLinker, too.
I could try to fix the latter two here as well, or should this be done somewhere else?
If it's the same issue, I think it's fine to copy the fix across. The code probably got copied around originally...
Here' s another question: The interface class Pipes requires a Model class to be attached as class attribute. In the implementations present in the same module, this is required to follow the interface class thinc.neural.Model (according to doc strings). Can I therefore assume that this is alwasys the case?
The reason I am asking is the following line in Pipes.to_disk (which seems to raise the warning):
if self.model not in (None, True, False):
serialize["model"] = lambda p: p.open("wb").write(self.model.to_bytes())
In particular, can I assume that model will always implement the method to_disk?
Afaik this is the case for thinc.model.Model which, I assume, is the same as thinc.neural.Model.
Short answer: Yes - you can assume that a pipe's model has a to_bytes() method.
Longer answer: when looking at the Thinc code, things get tricky. You're currently working on spaCy's master, which uses Thinc 7, which is the old version and can still be seen here: https://github.com/explosion/thinc/tree/v7.x/thinc. There, you'll find that Model is indeed in thinc.neural.
Meanwhile, Thinc 8 has been released which has Model in thinc.model. We are adjusting spaCy to this new version on the develop branch. But I would hope/assume that the fix you're creating, will be easily portable between the two.
Thank you for responding so fast. I see, I did not pay attention to the version of Thinc spacy is currently using. Thanks for pointing that out.
Actually, I was hoping to use Model.to_disk instead of Model.to_bytes (which is currently used). My question was not really clear there, sorry.
It seems that for a fix it is only necessary to use to_disk which is implemented in both versions, Thinc 7 and 8, such that it closes the file after writing to it. So this should work for both versions without larger modifications.
Yea, I think you're right. The Vocab is also written with its to_disk method so I think that'd make sense!
I think this is similar to the issue when i'm invoking nlp.to_disk('/path/to/thingie')
Exception ignored in: <_io.FileIO name='/path/to/thingie/meta.json' mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.TextIOWrapper name='/path/to/thingie/meta.json' mode='w' encoding='UTF-8'>
Exception ignored in: <_io.FileIO name='/path/to/thingie/tagger/model' mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.BufferedWriter name='/path/to/thingie/tagger/model'>
Exception ignored in: <_io.FileIO name='/path/to/thingie/vocab/vectors' mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.BufferedWriter name='/path/to/thingie/vocab/vectors'>
Python version: 3.7.7
Spacy versions: 2.2.2, 2.2.3, 2.2.4
Which is weird. The only thing I think changed was minor update to python? I was using 2.2.3 a month ago, and then today it's decided to show this error. Note: I'm using it, not unittesting.
@lfiedler : thanks again for your contributions, this will be included in spaCy 2.3 onwards !
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.