The kernel is killing when I try to save the model.
The code responsible for this is:
dataset = tc.SFrame({'rating': ratings_vector, 'text': reviews_vector}) # {'rating': ratings_vector, 'text': reviews_vector}
model = tc.sentence_classifier.create(dataset, 'rating', features=['text']) # l1_penalty=1.0 ## no l_1penalty ? odd.
# model.predict
model.save('MySentenceClassifier.model')
model.export_coreml('MySentenceClassifier.mlmodel')
This gives an output of:
(sent_turi_2_7/) DN0a237d22:aclImdb airphoenix$ python imdb_sent_turi.py
building model from ratings and dataset, with text features
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
You can set ``validation_set=None`` to disable validation tracking.
WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples : 23718
Number of classes : 8
Number of feature columns : 1
Number of unpacked features : 243873
Number of coefficients : 1707118
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1 | 3 | 0.000042 | 3.592440 | 0.639514 | 0.207488 |
| 2 | 5 | 1.000000 | 170.172835 | 0.954844 | 0.273791 |
| 3 | 6 | 1.000000 | 349.994020 | 0.981870 | 0.306552 |
| 4 | 7 | 1.000000 | 530.304889 | 0.991146 | 0.322933 |
| 5 | 8 | 1.000000 | 700.272448 | 0.996079 | 0.330733 |
| 6 | 9 | 1.000000 | 866.660870 | 0.998440 | 0.330733 |
| 7 | 10 | 1.000000 | 1031.851763 | 0.999410 | 0.335413 |
| 8 | 11 | 1.000000 | 1213.579933 | 0.999831 | 0.334633 |
| 9 | 12 | 1.000000 | 1420.516521 | 0.999831 | 0.333073 |
| 10 | 13 | 1.000000 | 1605.898038 | 0.999831 | 0.333073 |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
Killed: 9
I have tried both 2.7 and 3.6. If I run something small, like the below:
dummy_ratings = [1, 0]
dummy_reviews = ['good', 'bad']
dataset = tc.SFrame({'rating': dummy_ratings, 'text': dummy_reviews}) # {'rating': ratings_vector, 'text': reviews_vector}
model = tc.sentence_classifier.create(dataset, 'rating', features=['text']) # l1_penalty=1.0 ## no l_1penalty ? odd.
# model.predict
model.save('dummy_SentenceClassifier.model')
it works fine:
(sent_turi) DN0a237d22:aclImdb airphoenix$ python compile.py
WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples : 2
Number of classes : 2
Number of feature columns : 1
Number of unpacked features : 2
Number of coefficients : 3
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+-------------------+
| Iteration | Passes | Elapsed Time | Training-accuracy |
+-----------+----------+--------------+-------------------+
| 1 | 2 | 1.036190 | 1.000000 |
| 2 | 3 | 1.042693 | 1.000000 |
+-----------+----------+--------------+-------------------+
SUCCESS: Optimal solution found.
Anything I can do?
Is this happening during save or export_to_coreml? I suspect its the latter. Save should work on data that size. Can you confirm?
I will run the longer one presently, but for now, the dummy model exports fine.
dummy_ratings = [1, 0]
dummy_reviews = ['good', 'bad']
dataset = tc.SFrame({'rating': dummy_ratings, 'text': dummy_reviews}) # {'rating': ratings_vector, 'text': reviews_vector}
model = tc.sentence_classifier.create(dataset, 'rating', features=['text']) # l1_penalty=1.0 ## no l_1penalty ? odd.
# model.predict
print('SAVING MODEL...')
model.save('dummy_SentenceClassifier.model')
print('EXPORTING COREML...')
model.export_coreml('dummy_SentenceClassifier.mlmodel')
Number of examples : 2
Number of classes : 2
Number of feature columns : 1
Number of unpacked features : 2
Number of coefficients : 3
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+-------------------+
| Iteration | Passes | Elapsed Time | Training-accuracy |
+-----------+----------+--------------+-------------------+
| 1 | 2 | 1.056459 | 1.000000 |
| 2 | 3 | 1.061228 | 1.000000 |
+-----------+----------+--------------+-------------------+
SUCCESS: Optimal solution found.
SAVING MODEL...
EXPORTING COREML...
Saving valid model to path dummy_SentenceClassifier.mlmodel
This
print('building model from ratings and dataset, with text features')
dataset = tc.SFrame({'rating': ratings_vector, 'text': reviews_vector}) # {'rating': ratings_vector, 'text': reviews_vector}
model = tc.sentence_classifier.create(dataset, 'rating', features=['text']) # l1_penalty=1.0 ## no l_1penalty ? odd.
# model.predict
print('SAVING MODEL...')
model.save('MySentenceClassifier.model')
print('MODEL SAVED!')
print('EXPORTING COREML...')
model.export_coreml('MySentenceClassifier.mlmodel')
print('EXPORTED COREML!')
is not reaching the print('SAVING MODEL...') line.
Logistic regression:
--------------------------------------------------------
Number of examples : 23741
Number of classes : 8
Number of feature columns : 1
Number of unpacked features : 244369
Number of coefficients : 1710590
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1 | 3 | 0.000042 | 3.221630 | 0.634640 | 0.185862 |
| 2 | 5 | 1.000000 | 171.659689 | 0.955099 | 0.277204 |
| 3 | 6 | 1.000000 | 340.046417 | 0.983067 | 0.304210 |
| 4 | 7 | 1.000000 | 808.600929 | 0.991702 | 0.319301 |
| 5 | 8 | 1.000000 | 981.538927 | 0.996083 | 0.335187 |
| 6 | 9 | 1.000000 | 1155.803319 | 0.998442 | 0.345512 |
| 7 | 10 | 1.000000 | 1311.784861 | 0.999452 | 0.345512 |
| 8 | 11 | 1.000000 | 1509.004454 | 0.999789 | 0.344718 |
| 9 | 12 | 1.000000 | 1731.304582 | 0.999789 | 0.347895 |
| 10 | 13 | 1.000000 | 1896.446011 | 0.999832 | 0.347101 |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
Killed: 9
I have run this on another machine. Same problem.
I can confirm this is also happening to me.
Code:
data.save("detail_images.sframe")
data.explore()
dataBuffer = turi.SFrame("detail_images.sframe")
trainingBuffers, testingBuffers = dataBuffer.random_split(0.9)
turi.config.set_runtime_config('TURI_FILEIO_MAXIMUM_CACHE_CAPACITY', 2*1024*1024*1024)
model = turi.image_classifier.create(trainingBuffers, target="productNumber", model="resnet-50", max_iterations=10)
evaluations = model.evaluate(testingBuffers)
print evaluations["accuracy"]
print('SAVING MODEL...')
model.save("detail_images.model")
print('EXPORTING MODEL...')
model.export_coreml("DetailImagesClassifier.mlmodel")
Output:
Materializing SFrame...
Done.
[10:01:15] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[10:01:15] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
Resizing images...
Performing feature extraction on resized images...
Completed 512/1321
Completed 1024/1321
Completed 1321/1321
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
You can set ``validation_set=None`` to disable validation tracking.
WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
WARNING: Detected extremely low variance for feature(s) '__image_features__' because all entries are nearly the same.
Proceeding with model training using all features. If the model does not provide results of adequate quality, exclude the above mentioned feature(s) from the input dataset.
Logistic regression:
--------------------------------------------------------
Number of examples : 1261
Number of classes : 446
Number of feature columns : 1
Number of unpacked features : 2048
Number of coefficients : 911805
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1 | 3 | 0.000793 | 27.941375 | 0.016653 | 0.000000 |
| 2 | 5 | 1.000000 | 53.585154 | 0.193497 | 0.000000 |
| 3 | 6 | 1.000000 | 70.705666 | 0.434576 | 0.000000 |
| 4 | 7 | 1.000000 | 87.500613 | 0.616971 | 0.000000 |
| 5 | 8 | 1.000000 | 103.658185 | 0.714512 | 0.016667 |
| 6 | 9 | 1.000000 | 120.545977 | 0.840603 | 0.066667 |
| 7 | 10 | 1.000000 | 139.111506 | 0.881840 | 0.116667 |
| 8 | 11 | 1.000000 | 155.651223 | 0.922284 | 0.150000 |
| 9 | 12 | 1.000000 | 176.977315 | 0.946868 | 0.183333 |
| 10 | 13 | 1.000000 | 192.442671 | 0.962728 | 0.183333 |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
Killed: 9
This is actually only about 1/3 of our full dataset which contains roughly 4500 images to be scanned.
Any ideas on why Terminal is killing the process?
@bfeher: Can you updgrade to the latest turicreate
pip install turicreate --upgrade
@srikris I upgraded to turicreate v4.3.1 and it seems to work :)
Successfully installed decorator-4.3.0 mxnet-1.1.0.post0 turicreate-4.3.1
I was given this output, and the Turi Create Visualizer crashed, but it did complete.
Materializing SFrame...
Done.
[10:18:36] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[10:18:36] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
Resizing images...
Performing feature extraction on resized images...
Completed 512/1320
57:65: execution error: Turi Create Visualization got an error: AppleEvent timed out. (-1712)
Completed 1024/1320
Completed 1320/1320
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
You can set ``validation_set=None`` to disable validation tracking.
WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
WARNING: Detected extremely low variance for feature(s) '__image_features__' because all entries are nearly the same.
Proceeding with model training using all features. If the model does not provide results of adequate quality, exclude the above mentioned feature(s) from the input dataset.
Logistic regression:
--------------------------------------------------------
Number of examples : 1257
Number of classes : 448
Number of feature columns : 1
Number of unpacked features : 2048
Number of coefficients : 915903
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1 | 3 | 0.000796 | 27.975578 | 0.031026 | 0.000000 |
| 2 | 5 | 1.000000 | 52.112428 | 0.193317 | 0.000000 |
| 3 | 6 | 1.000000 | 67.687941 | 0.429594 | 0.000000 |
| 4 | 7 | 1.000000 | 83.256735 | 0.603819 | 0.015873 |
| 5 | 8 | 1.000000 | 98.695507 | 0.751790 | 0.031746 |
| 6 | 9 | 1.000000 | 114.265765 | 0.812251 | 0.063492 |
| 7 | 10 | 1.000000 | 129.814082 | 0.872713 | 0.063492 |
| 8 | 11 | 1.000000 | 145.254219 | 0.887828 | 0.063492 |
| 9 | 12 | 1.000000 | 160.750995 | 0.925219 | 0.111111 |
| 10 | 13 | 1.000000 | 176.391701 | 0.944312 | 0.063492 |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
Resizing images...
Performing feature extraction on resized images...
Completed 147/147
0.122448979592
SAVING MODEL...
EXPORTING MODEL...
The Turi Create Visualizer seems to crash on launch.
@bfeher Sorry about that. Is there any way you can share those images with us. I think there is probably an image in there that is causing the crash.
@srikris
Thank you for your help :)
I can confirm that the Visualizer is crashing for other image sets as well, not just my own. Before upgrading to 4.3.1 I did not have this problem.
This can be recreated using the dataset of images from this tutorial: https://www.appcoda.com/core-ml-model-with-python/
Here is the link to the image dataset from the tutorial above: https://drive.google.com/file/d/1ZLigrn7YcETalcj2qK6UqXceDdOV3244/view?usp=sharing
Please let me know if you need any more information.
@bfeher The fix is now available in Turi Create 4.3.2 on PyPI and GitHub
Confirmed. Thank you :)