I just used git to download the code, and TensorFlow was installed from a nightly build; then I compiled TensorFlow from source.
# python cifar10_multi_gpu_train.py --num_gpus=4
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/***/Downloads/models/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
File "cifar10_multi_gpu_train.py", line 273, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "cifar10_multi_gpu_train.py", line 269, in main
train()
File "cifar10_multi_gpu_train.py", line 171, in train
loss = tower_loss(scope)
File "cifar10_multi_gpu_train.py", line 78, in tower_loss
logits = cifar10.inference(images)
File "/home/***/Downloads/models/tutorials/image/cifar10/cifar10.py", line 207, in inference
wd=0.0)
File "/home/***/Downloads/models/tutorials/image/cifar10/cifar10.py", line 137, in _variable_with_weight_decay
weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss')
AttributeError: 'module' object has no attribute 'mul'
@drpngx It looks like cl/141623422 (Dec 9th) didn't get synced to tensorflow/models to rename tf.mul -> tf.multiply. Possibly because December 9th was also the day that https://github.com/tensorflow/models/commit/86ecc9730d751c1f72e3bfecac958166390f4125 moved cifar10 from tensorflow/tensorflow to tensorflow/models.
I'm also curious why tf.mul is throwing AttributeError since the old name is only supposed to be deprecated.
We don't sync tensorflow/models. The models are probably all broken by now. @aselle would know better about tf.mul. I think we actually removed mul, sub, etc. to get a clean slate for the next release.
@weigei123 the new symbol is tf.multiply, would you mind submitting a pull request to fix this?
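For reference, the change is just the rename at cifar10.py line 137 in _variable_with_weight_decay. A minimal standalone sketch of the renamed op (the variable shape and the 0.004 weight-decay value here are only illustrative; cifar10.py passes different values per layer):

import tensorflow as tf

# tf.mul was removed on recent nightlies in favor of tf.multiply, so the
# weight-decay term is built with the new name. Everything else is unchanged.
var = tf.Variable(tf.truncated_normal([3, 3], stddev=0.1), name='weights')
weight_decay = tf.multiply(tf.nn.l2_loss(var), 0.004, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
print(tf.get_collection('losses'))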
We don't sync tensorflow/models.

@drpngx Thanks for your help; I'd be happy to submit a pull request. But I ran into another error.
:P
# python cifar10_multi_gpu_train.py --num_gpus=4
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/***/Downloads/models/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/***/Downloads/models/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/***/Downloads/models/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/***/Downloads/models/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
File "cifar10_multi_gpu_train.py", line 273, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "cifar10_multi_gpu_train.py", line 269, in main
train()
File "cifar10_multi_gpu_train.py", line 210, in train
variables_averages_op = variable_averages.apply(tf.trainable_variables())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
colocate_with_primary=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 101, in create_slot
return _create_slot_var(primary, val, '')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 55, in _create_slot_var
slot = variable_scope.get_variable(scope, initializer=val, trainable=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 987, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 889, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 347, in get_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 332, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 656, in _get_single_variable
"VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
That error seems to be what #892 is addressing and is hopefully fixed by PR #909. Closing this issue out as a duplicate of #892.
@asimshankar This scope issue is discussed in tensorflow/tensorflow#6220. I have submitted PR #911 to resolve these two issues.
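For anyone hitting the same ValueError before the PR lands: the per-tower name_scope does not by itself make later towers (or the ExponentialMovingAverage slot creation) reuse the variables created by tf.get_variable() in the first tower. A rough sketch of the usual workaround in the tower loop of cifar10_multi_gpu_train.py (tower_loss, FLAGS.num_gpus, and cifar10.TOWER_NAME are from that file; the exact change in PR #911 may differ):

# Sketch only: wrap the tower loop in an explicit variable_scope and mark it
# for reuse after building each tower, so subsequent towers and the
# moving-average slots resolve to the same tf.get_variable() variables.
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                loss = tower_loss(scope)
                # Reuse variables for the remaining towers.
                tf.get_variable_scope().reuse_variables()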
@wookayin Thanks a lot. It works.
: )
But each GPU's utilization (GPU-Util) stays below 25% (I have 4x TITAN X (Pascal)).
What do you think is causing the low GPU usage?
Yes, mine (4x Pascal Titan X) also shows low utilization (15%~20%). I presume this is because the Pascal Titan X consumes data faster (!) than the input pipeline can feed it into the queue (including the time for I/O and the session.run() overhead), but I'm not 100% confident.
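If the input queue really is the bottleneck, one thing worth trying is raising the number of preprocessing threads (and queue capacity) in the batching call in cifar10_input.py. A sketch assuming the tf.train.shuffle_batch pipeline in _generate_image_and_label_batch (image, label, batch_size, and min_queue_examples are the names used in that file; the thread count below is just a starting point to tune):

# More preprocessing threads can help keep fast GPUs fed; tune for your CPU.
num_preprocess_threads = 32
images, label_batch = tf.train.shuffle_batch(
    [image, label],
    batch_size=batch_size,
    num_threads=num_preprocess_threads,
    capacity=min_queue_examples + 3 * batch_size,
    min_after_dequeue=min_queue_examples)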