I'm trying to create an object detector model using the IG02 cars dataset. All code I'm using is the same as what is given in the docs just with the addition of configuring to use a GPU. When it gets to exporting the model to a .mlmodel format, I get the following error:
Traceback (most recent call last):
File "/home/w012ldf/turicreate/createModel.py", line 28, in <module>
model.export_coreml('cars.mlmodel')
File "/opt/anaconda2/lib/python2.7/site-packages/turicreate/toolkits/object_detector/object_detector.py", line 1188, in export_coreml
del net._children[24]
File "/opt/anaconda2/lib/python2.7/collections.py", line 85, in __delitem__
dict_delitem(self, key)
KeyError: 24
srun: error: fry: task 0: Exited with exit code 1
My environment:
I am running this code with Singularity container and SLURM on a Linux server using a Tesla P100 GPU for training. The recipe file for building my container is here. Quick notes from the recipe file, I'm using Python 2.7 (have tried using 3.6 as well) and Turi Create 5.2.
It's able to save the model as a .model but is unable to export to .mlmodel. I was able to run the same code on my MacBook Pro 2017 with a different dataset a few weeks ago and get a mlmodel but now that I'm trying to train on a Linux server with a GPU it is throwing this error.
@loganfrank Can you share the .model with us. That would make it easier to reproduce.
Of course! The zip file of the .model was too large to insert into this comment so I uploaded it to my github.
Curious if there's been any discovery or recent developments on this issue. We've been running into the exact same error while training on either a Tesla K80 or T4 on Google Colab with Python 2 & 3 runtimes, TC 5.4+, CUDA 10, and mxnet-cu100 1.3.1+. The error below occurs regardless of how long we train the model:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-13-c5156deb5717> in <module>()
3
4 # Export the trained model to CoreML format
----> 5 res = model.export_coreml(coreml_model_name)
/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/object_detector/object_detector.py in export_coreml(self, filename, include_non_maximum_suppression, iou_threshold, confidence_threshold)
1216 assert (self._model[23].name == 'pool5' and
1217 self._model[24].name == 'specialcrop5')
-> 1218 del net._children[24]
1219 net._children[23] = op
1220
KeyError: 24
We've provided a link to the zipped .model file associated with this exact failure:
https://s3.amazonaws.com/skafos.example.data/ObjectDetection/ObjectDetection.model.zip
@loganfrank we were able to get this working on Google Colab this week using the following dependencies:
mxnet-cu91==1.1.0turicreate==5.2numpy==1.16.3Feel free to check out our public repo here https://github.com/skafos/colab-example-models/tree/master/ObjectDetection to see it work/ give it a try
Looking at the issue -- it appears that the mxnet version was upgraded after installing TuriCreate. This isn't currently tested / supported. We're working on that.
Closing now as this is not a supported configuration currently with mxnet v 1.13.1. We'll be working to upgrade mxnet support to a newer version soon.
Most helpful comment
Curious if there's been any discovery or recent developments on this issue. We've been running into the exact same error while training on either a Tesla K80 or T4 on Google Colab with Python 2 & 3 runtimes, TC 5.4+, CUDA 10, and mxnet-cu100 1.3.1+. The error below occurs regardless of how long we train the model:
We've provided a link to the zipped
.modelfile associated with this exact failure:https://s3.amazonaws.com/skafos.example.data/ObjectDetection/ObjectDetection.model.zip