I'm trying to figure out how to train TensorFlow on ImageNet.
The instructions under https://github.com/tensorflow/models/tree/master/research/slim list "ImageNet" as one of the supported datasets and ask to run bazel build slim/download_and_preprocess_imagenet, but that target is missing.
The closest one I found was research/inception/inception:download_and_preprocess_imagenet, however that fails with bazel-bin/research/inception/inception/download_and_preprocess_imagenet: line 79: bazel-bin/research/inception/inception/download_and_preprocess_imagenet.runfiles/inception/inception/data/preprocess_imagenet_validation_data.py: No such file or directory
It seems that bazel now places things inside download_and_preprocess_imagenet.runfiles/__main__/research, while the script assumes download_and_preprocess_imagenet.runfiles/research.
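One way to confirm where bazel actually put a script is to search the runfiles tree for it by name. The sketch below does that in Python. `find_in_runfiles` is a hypothetical helper name, not part of bazel or the models repository, and it assumes the runfiles tree is an ordinary directory on disk.

```python
import os


def find_in_runfiles(runfiles_root, filename):
    """Return the sorted paths under runfiles_root whose basename is filename.

    Hypothetical helper for illustration; not part of bazel or tensorflow/models.
    """
    matches = []
    # os.walk lists symlinked files alongside regular ones in `filenames`,
    # so entries that are only links into the source tree are found too.
    for dirpath, _dirnames, filenames in os.walk(runfiles_root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Calling it with the runfiles directory from the error message and "preprocess_imagenet_validation_data.py" should show whether the file sits under the __main__ prefix or somewhere else entirely.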
Before I go down the rabbit hole of fixing things to get ImageNet training going, is this the right starting point, or is there something better?
Ah, it looks like the slim script moved to download_and_convert_imagenet.sh, but the docs didn't get updated.
Running that script in place of the original downloads the dataset as expected, but then fails with
./bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/download_imagenet.sh: line 99: ./bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/imagenet_lsvrc_2015_synsets.txt: No such file or directory
I checked and that file exists...the error is in this line of download_imagenet.sh
done < "${SYNSETS_FILE}"
Not sure what that line does...perhaps it gets confused by the path containing symlinks
@nathansilberman @sguada might have some ideas.
Still waiting to verify that the whole procedure works, but I was able to get past that error by copying the file from its old symlinked location,
./bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/imagenet_lsvrc_2015_synsets.txt, to a new location, /tmp/synsets.txt, and then modifying download_imagenet.sh to read
done < /tmp/synsets.txt instead of done < "${SYNSETS_FILE}"
@joel-shor also mentioned that he had issues with symlinks before in that script, so maybe that script just needs an extra copy step
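The extra copy step could look roughly like the sketch below, which resolves the symlink first and then writes a regular file. `materialize` is a hypothetical name, and the sketch assumes the symlink theory is right. The path in the error message is also relative, so a changed working directory is another possible cause; that is a guess the thread does not confirm.

```python
import os
import shutil


def materialize(src, dst):
    """Copy the file that src ultimately points to into dst as a regular file."""
    # realpath follows any chain of symlinks and returns an absolute path.
    real_src = os.path.realpath(src)
    if not os.path.isfile(real_src):
        raise IOError("no regular file behind %s (resolved to %s)" % (src, real_src))
    shutil.copyfile(real_src, dst)
    return dst
```

For the case above, src would be the imagenet_lsvrc_2015_synsets.txt path inside the runfiles tree and dst would be /tmp/synsets.txt.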
Copying the synsets file gets past that part, but now it crashes because the script references google3:
Finished processing 544546 XML files.
Skipped 0 XML files not in ImageNet Challenge.
Skipped 0 bounding boxes not in ImageNet Challenge.
Wrote 615299 bounding boxes from 544546 annotated images.
Finished.
Finished downloading and preprocessing the ImageNet data.
Traceback (most recent call last):
  File "/home/ubuntu/models/research/slim/bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/build_imagenet_data.py", line 95, in <module>
    import google3
ModuleNotFoundError: No module named 'google3'
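An alternative to deleting the import outright is to make it optional, so the same file still loads where google3 does exist. This is only a sketch: `optional_import` is a hypothetical helper, and it assumes nothing later in build_imagenet_data.py actually needs the module.

```python
import importlib


def optional_import(name):
    """Import module `name` if it is installed, otherwise return None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# google3 is missing on public installs (see the traceback above), so this
# yields None there instead of crashing at import time.
google3 = optional_import("google3")
```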
btw, fixing things above lets things convert again. To summarize:
Maybe after https://github.com/tensorflow/models/pull/3112 is merged, part 3. won't apply anymore
It works! Thank you for the workaround, @yaroslavvb.
@yaroslavvb I did exactly the same thing as you did for steps 2 and 3 and was able to make TFRecord files for ImageNet. I then started slim training for resnet50 and mobilenetv1, as well as darknet-19 (my own implementation using slim), but I get accuracy of around 60%, when it should be at least 70%.
Did you get the accuracy claimed in the papers for any of the nets in slim?
@yaroslavvb I followed steps 1 and 2. After I applied the fixes you mentioned, the script uncompressed a bunch of training data and then gave the following error:
Uncompressing individual train tar-balls in the training data.
Processing: n01440764
Finished processing: n01440764
Processing: n01443537
Finished processing: n01443537
Processing: n01484850
Finished processing: n01484850
Processing: n01491361
Finished processing: n01491361
Processing: n01494475
Finished processing: n01494475
Processing: n01496331
Finished processing: n01496331
.......
Organizing the validation data into sub-directories.
Traceback (most recent call last):
  File "bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/preprocess_imagenet_validation_data.py", line 54, in <module>
    from six.moves import xrange
ImportError: No module named six.moves
I am using Python 3.5 and installed tensorflow-gpu 1.5 from source on Ubuntu 16.04.
After running sudo easy_install six and rerunning the same command, the previous error is gone, but I get the following error:
--> skipped 0 boxes and 0 XML files.
--> processed 535001 of 544546 XML files.
--> skipped 0 boxes and 0 XML files.
--> processed 540001 of 544546 XML files.
--> skipped 0 boxes and 0 XML files.
Finished processing 544546 XML files.
Skipped 0 XML files not in ImageNet Challenge.
Skipped 0 bounding boxes not in ImageNet Challenge.
Wrote 615299 bounding boxes from 544546 annotated images.
Finished.
Finished downloading and preprocessing the ImageNet data.
Traceback (most recent call last):
  File "/home/kazem/models/research/slim/bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/build_imagenet_data.py", line 98, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow
@kazemSafari the pre-processing script doesn't work on Python 3. I think you were actually using Python 2, which also explains why "six" was not found. I use conda environments to keep things separate, although that fails sometimes (conda install xyz once replaced my Python 3 binary with Python 2). To summarize: 1. don't sudo install anything, 2. use virtual environments like conda to switch between versions, 3. use sanity checks like which and strace -Ttf -e open to see which Python distribution is actually being used. In your case, I suspect sudo easy_install six changed your default Python2 environment to Python3, which doesn't have TensorFlow installed. Such is the sad state of affairs of open-source Python nowadays.
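The same sanity check can be done from inside Python, which shows the interpreter the script will really run under. The sketch below is illustrative: `check_environment` is a hypothetical helper, and the default module list is simply the two imports that failed in the tracebacks above.

```python
import importlib
import sys


def check_environment(modules=("six", "tensorflow")):
    """Report which interpreter is running and whether each module imports."""
    report = {
        "executable": sys.executable,
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
    }
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except ImportError as err:
            # Record the reason so a wrong interpreter is easy to spot.
            report[name] = "missing: %s" % err
    return report
```

Running python -c "import sys; print(sys.executable)" with the environment activated is the one-line version of the same idea.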
@yaroslavvb I installed the TensorFlow 1.5 binary for Python 2 using virtualenv. I activated the virtualenv and then ran the same commands from the ~/models/research/slim directory:
export DATA_DIR=/media/kazem/ssd_1tb/imagenet-data
bazel-bin/download_and_convert_imagenet ${DATA_DIR}
It worked fine:
...
2018-02-17 16:49:37.672096 [thread 3]: Wrote 160146 images to 160146 shards.
2018-02-17 16:49:37.775175: Finished writing all 1281167 images in data set.
Thank you so much. My only question is does this code create tf_record files for the training set as well as the validation set? 
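A quick way to check is to count the output files per split in the data directory. The sketch below assumes the shards are named like train-00000-of-01024 and validation-00000-of-00128; that naming is an assumption about build_imagenet_data.py and is not confirmed anywhere in this thread. `count_shards` is a hypothetical helper.

```python
import collections
import os
import re

# Assumed shard naming: <split>-<index>-of-<total>, e.g. train-00000-of-01024.
SHARD_PATTERN = re.compile(r"^(?P<split>[A-Za-z_]+)-(?P<index>\d+)-of-(?P<total>\d+)$")


def count_shards(data_dir):
    """Return a dict mapping each split name to its number of shard files."""
    counts = collections.Counter()
    for name in os.listdir(data_dir):
        match = SHARD_PATTERN.match(name)
        if match:
            counts[match.group("split")] += 1
    return dict(counts)
```

If both a training and a validation key come back with non-zero counts, both sets were converted. If the files live in a subdirectory of DATA_DIR, point the function at that subdirectory instead.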
Here is a summary of all the steps needed, based on @yaroslavvb's corrections:
0i) The ImageNet preprocessing code depends on the six package under Python 2, and the part of the code that runs after preprocessing needs TensorFlow installed for Python 2.
Install TensorFlow (1.5 is the current version) for Python 2, for instance by installing the binary in a virtualenv and activating that environment. The preprocessing code only works in Python 2!
0ii) Install bazel.
0iii) Check whether the tf.slim installation is successful:
python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once"
0iv) Get the tf.slim model code by cloning the models repository:
cd $HOME
git clone https://github.com/tensorflow/models/
0v) Test the tf.slim model code installation:
cd $HOME/models/research/slim
python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"
0vi) Change to the slim directory:
cd ~/models/research/slim 
1i) Build the preprocessing script.
bazel build :download_and_convert_imagenet
2) Make corrections
2i) Copy ./bazel-bin/download_and_convert_imagenet.runfiles/__main__/datasets/imagenet_lsvrc_2015_synsets.txt to /tmp/synsets.txt, then modify download_imagenet.sh to read done < /tmp/synsets.txt instead of done < "${SYNSETS_FILE}".
2ii) Remove the google3 references in build_imagenet_data.py.
3) Specify where to place the ImageNet data
export DATA_DIR=/media/kazem/ssd_1tb/imagenet-data
4) Run the preprocessing script
bazel-bin/download_and_convert_imagenet ${DATA_DIR}
(python 2.7, tensorflow 1.1.0) I followed your advice to move the txt file to another place. It worked well at first, but then this problem came up:
File "/data1/liusuyuan/resnet/models/research/inception/bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception/inception/data/build_imagenet_data.py", line 175, in _bytes_feature
value = six.binary_type(value, encoding='utf-8')
TypeError: str() takes at most 1 argument (2 given)
2018-03-16 22:01:40.901723: Finished writing all 1281167 images in data set.
@yaroslavvb any suggestions?
Well, I solved the problem. I removed two lines of code in build_imagenet_data.py, lines 174 and 175. The function _bytes_feature becomes:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
Looks like my PR makes it work for py3 but fail for py2.
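A conversion that should behave the same on both interpreters can avoid the two-argument six.binary_type call that raised the TypeError above. The sketch below uses only built-ins. `to_bytes` is a hypothetical helper, and wiring it into _bytes_feature is an assumption, not what the PR does.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string on both Python 2 and Python 3."""
    # On Python 2, `bytes` is an alias for `str`, so plain strings pass through.
    if isinstance(value, bytes):
        return value
    # Text (unicode on Python 2, str on Python 3) is encoded explicitly.
    return value.encode(encoding)
```

Inside _bytes_feature, the value passed to tf.train.BytesList would then be [to_bytes(value)] rather than [value].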
@SuyuanLiu After deleting the two lines, the scripts finally worked! Thank you~
One question: will this have any effect on the subsequent training and testing?
@Mrxiaoyuer I just ran these model checkpoints: inception_resnet_v2, inception_v3, inception_v4, resnet_v2_50, resnet_v2_101, resnet_v2_152, to test an image, and it looks good. I didn't train them or do anything more. Sorry I can't help you.
This thread authored by @yaroslavvb has shed tremendous light on _ImageNet_ training protocols, for the community. Thank you.