Hi guys,
Finally I came out with a solution to my problem. Turi is working finally great but there is just one problem: if you will rename a folder with slash such as "example/human" and you will recall it in python as 'example:human' Turi will start to have problem during the exportation of the model and instead to spend several seconds on a base of 200 pics and 21 classes it will stay for 2 hours.
I don't know what can be the problem but try to fix it. Have a good easter 馃憤
@PietroMessineo Can you please provide some Python code that will reproduce this issue? (Even if you are unable to provide the data, we can probably recreate similar enough data, given a description of the columns + types). Thanks!
Hi @znation,
Actually I'm using three different python, but I'm encountering the problem during the export, below the code:
import turicreate as tc
**#Load the data**
data = tc.SFrame('alfa.sframe')
**# Make a train-test split**
train_data, test_data = data.random_split(0.8)
tc.config.set_runtime_config('TURI_FILEIO_MAXIMUM_CACHE_CAPACITY', 2*1024*1024*1024)
**# Create a model**
model = tc.image_classifier.create(train_data, target='label', model='squeezenet_v1.1', max_iterations=50)
**# Save predictions to an SFrame (class and corresponding class-probabilities)**
predictions = model.classify(test_data)
**# Evaluate the model and save the results into a dictionary**
results = model.evaluate(test_data)
print "Accuracy : %s" % results['accuracy']
print "Confusion Matrix : \n%s" % results['confusion_matrix']
**# Save the model for later usage in Turi Create**
model.save('Alfa2.model')
@PietroMessineo How does the folder with a slash in the name relate to the Python code you provided? I don't see where that code uses a folder with a slash in the name.
Hi @znation, in macOS the slash is recognize as semicolon so the code is as follow:
import turicreate as tc
**# load data**
image_data = tc.image_analysis.load_images('alfaset', with_path=True)
labels = ['A:a', 'B:b', 'C:c', 'D:d', 'E:e', 'F:f', 'G:g', 'H:h','I:i','J:j','K:k','L:l','M:m','N:n','O:o','P:p','Q:q','R:r','S:s','T:t','U:u','V:v','W:w','X:x','Y:y','Z:z']
def get_label(path, labels=labels):
for label in labels:
if label in path:
return label
image_data['label'] = image_data['path'].apply(get_label)
**# save data**
image_data.save('alfa.sframe')
**# explore**
image_data.explore()
The problem is when I'm lunching the code attached on the previous comment, it's staying for too long before to import but when I'm renaming the folder with "aa" instead "a:A" the export time is 100 times faster.
@PietroMessineo With the above code (both data preparation, and training and model saving) with 83 images and 5 classes (my own dataset), I get the following performance results:
# With slashes in filenames/labels
python repro.py 260.04s user 18.68s system 131% cpu 3:32.28 total
# With underscores instead of slashes
python repro_noslash.py 259.83s user 18.08s system 133% cpu 3:28.50 total
I don't see a performance difference between the two cases. Are you able to reproduce this consistently (even with the same data) with slashes vs. any other character, or is it possible something else changed as well that changed the performance drastically?
Hi @znation that鈥檚 pretty strange. I tried it multiple times... will you be interested to try my dataset with small amount of images? Like 10 for each class.
@PietroMessineo Yes, I'd like to try your dataset to see if I can repro with that, if that's ok with you. You can either share it on this thread (if GitHub will allow a file that large), or upload elsewhere and e-mail me a link (email is in my profile). Thanks!
@znation Thank you! Check your email 馃憤
After investigating with the dataset provided privately by @PietroMessineo, I have determined the underlying issue here is RAM usage. On my 8 GB RAM machine, I see this dataset using up to 30 GB of RAM (including 22 GB of swap), which effectively hangs the process and makes the machine unresponsive since swap is so slow (and we are so heavily relying on it). This seems like a bug - we should attempt to stay within a reasonable amount of RAM usage (relative to total system RAM) even on a large training set.
@PietroMessineo We finally identified the issue. If you need a temporary work around, please revert to turicreate==4.1
That鈥檚 good to hear, I鈥檒l try it as soon as possible. Keep me updated
Verified that the fix works (with the original repro script/data). That script now completes on my machine (8 GB RAM) without hanging.
@znation that鈥檚 great! Is it already out the last version?
@PietroMessineo The fix has now been released. pip install turicreate==4.3.1.
Most helpful comment
After investigating with the dataset provided privately by @PietroMessineo, I have determined the underlying issue here is RAM usage. On my 8 GB RAM machine, I see this dataset using up to 30 GB of RAM (including 22 GB of swap), which effectively hangs the process and makes the machine unresponsive since swap is so slow (and we are so heavily relying on it). This seems like a bug - we should attempt to stay within a reasonable amount of RAM usage (relative to total system RAM) even on a large training set.