Models: cifar10_eval issues - tensorflow 0.12

Created on 12 Jan 2017 · 25 Comments · Source: tensorflow/models

tensorflow/models/tutorials/image/cifar10/cifar10_eval.py

When I run cifar10_eval.py alongside the cifar10_train.py script in TensorFlow 0.12, it crashes the main training script with the error:

Out of range: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 46)

It does this with both the GPU training code and the CPU code - but it doesn't happen with TensorFlow 0.11. Any ideas?

bug

Most helpful comment

@al3xsh is using the 0.12.1 release version, and @weigei123 is using a 1.0 snapshot from the master branch. tf.mul is removed only on master, so it does not affect the 0.12.x series.

All 25 comments

Could you please provide the full stack trace? Also, did you modify the tutorial scripts?

The full stack trace (or at least as much as is captured in my command window) is:

thRecordReader", "loc:@tower_0/input_producer"], _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/FixedLengthRecordReader, tower_0/input_producer)]]
W tensorflow/core/framework/op_kernel.cc:975] Out of range: Read less bytes than requested
     [[Node: tower_0/ReaderRead = ReaderRead[_class=["loc:@tower_0/FixedLengthRecordReader", "loc:@tower_0/input_producer"], _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/FixedLengthRecordReader, tower_0/input_producer)]]
[... the same "Read less bytes than requested" warning and tower_0/ReaderRead node line repeat many more times ...]
2017-01-13 16:10:26.439793: step 530, loss = 3.09 (8574.8 examples/sec; 0.015 sec/batch)
2017-01-13 16:10:26.590086: step 540, loss = 3.26 (8112.0 examples/sec; 0.016 sec/batch)
2017-01-13 16:10:26.743346: step 550, loss = 3.00 (8532.6 examples/sec; 0.015 sec/batch)
2017-01-13 16:10:26.898753: step 560, loss = 3.00 (8184.1 examples/sec; 0.016 sec/batch)
2017-01-13 16:10:27.055726: step 570, loss = 3.05 (7889.9 examples/sec; 0.016 sec/batch)
2017-01-13 16:10:27.207754: step 580, loss = 2.94 (8160.6 examples/sec; 0.016 sec/batch)
2017-01-13 16:10:27.372556: step 590, loss = 3.01 (7219.4 examples/sec; 0.018 sec/batch)
2017-01-13 16:10:27.563653: step 600, loss = 3.09 (7959.2 examples/sec; 0.016 sec/batch)
2017-01-13 16:10:28.046971: step 610, loss = 3.08 (6326.3 examples/sec; 0.020 sec/batch)
2017-01-13 16:10:28.231482: step 620, loss = 3.05 (7315.9 examples/sec; 0.017 sec/batch)
2017-01-13 16:10:28.423616: step 630, loss = 2.90 (8359.0 examples/sec; 0.015 sec/batch)
2017-01-13 16:10:28.581104: step 640, loss = 2.67 (7338.5 examples/sec; 0.017 sec/batch)
2017-01-13 16:10:28.751448: step 650, loss = 3.10 (7431.2 examples/sec; 0.017 sec/batch)
2017-01-13 16:10:28.922110: step 660, loss = 2.81 (7299.7 examples/sec; 0.018 sec/batch)
2017-01-13 16:10:29.089983: step 670, loss = 2.99 (7354.7 examples/sec; 0.017 sec/batch)
2017-01-13 16:10:29.262036: step 680, loss = 2.94 (7145.2 examples/sec; 0.018 sec/batch)
W tensorflow/core/framework/op_kernel.cc:975] Out of range: RandomShuffleQueue '_2_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 39)
     [[Node: tower_0/shuffle_batch = QueueDequeueMany[_class=["loc:@tower_0/shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_175)]]
W tensorflow/core/framework/op_kernel.cc:975] Out of range: RandomShuffleQueue '_2_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 39)
     [[Node: tower_0/shuffle_batch = QueueDequeueMany[_class=["loc:@tower_0/shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_175)]]
W tensorflow/core/framework/op_kernel.cc:975] Out of range: RandomShuffleQueue '_2_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 39)
     [[Node: tower_0/shuffle_batch = QueueDequeueMany[_class=["loc:@tower_0/shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_175)]]
Traceback (most recent call last):
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 39)
     [[Node: tower_0/shuffle_batch = QueueDequeueMany[_class=["loc:@tower_0/shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_175)]]
     [[Node: local4/biases/read/_185 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_397_local4/biases/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 273, in <module>
    tf.app.run()
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 269, in main
    train()
  File "cifar10_multi_gpu_train.py", line 239, in train
    _, loss_value = sess.run([train_op, loss])
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 39)
     [[Node: tower_0/shuffle_batch = QueueDequeueMany[_class=["loc:@tower_0/shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_175)]]
     [[Node: local4/biases/read/_185 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_397_local4/biases/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'tower_0/shuffle_batch', defined at:
  File "cifar10_multi_gpu_train.py", line 273, in <module>
    tf.app.run()
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 269, in main
    train()
  File "cifar10_multi_gpu_train.py", line 171, in train
    loss = tower_loss(scope)
  File "cifar10_multi_gpu_train.py", line 75, in tower_loss
    images, labels = cifar10.distorted_inputs()
  File "/home/alex/code/tensorflow/tensorflow_012/official_cifar10/cifar10.py", line 159, in distorted_inputs
    batch_size=FLAGS.batch_size)
  File "/home/alex/code/tensorflow/tensorflow_012/official_cifar10/cifar10_input.py", line 202, in distorted_inputs
    shuffle=True)
  File "/home/alex/code/tensorflow/tensorflow_012/official_cifar10/cifar10_input.py", line 127, in _generate_image_and_label_batch
    min_after_dequeue=min_queue_examples)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 917, in shuffle_batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
    timeout_ms=timeout_ms, name=name)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/alex/development/anaconda3/envs/tensorflow-012/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 39)
     [[Node: tower_0/shuffle_batch = QueueDequeueMany[_class=["loc:@tower_0/shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_175)]]
     [[Node: local4/biases/read/_185 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_397_local4/biases/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

And I made some small modifications to the tutorial scripts to make them work:

1) changed "contrib.deprecated." summaries to use "tf.summary."
2) changed the "strided_slice" call in cifar10_input.py to provide the missing "strides" argument (strides=[1]) - see the sketch after this list
3) added to "cifar10_input.py"

if eval_data:
    read_input.label.set_shape((1,))

to fix "ValueError: All shapes must be fully defined"

I have tried making sure that the cifar10_eval.py script runs on the CPU (whilst the training is done on the GPU) by setting export CUDA_VISIBLE_DEVICES="" before running the evaluation.

I also only see this problem with the GPU-enabled version of TensorFlow - it doesn't happen with the CPU-only version.

Regards,

Alex

+cc @shlens, who has worked on cifar10_eval.py in the past.

Thank you for taking the time to troubleshoot and fix the script. In that case, would you be interested in sending your changes as a pull request?

Hi @jart,

Many thanks. I have submitted my changes to the cifar10 files as pull request #909.

Thanks,

Alex

@al3xsh It may be a bit abrupt, but I still want to ask whether the changes have been tested with TF 0.12.1 or not.
When I tried your code, I got the same error as I showed in #901.

@weigei123 I have tested against TF 0.12.1 and it works for me :-)

One of the changes was to replace image_summary with tf.summary.image() - so that warning message should have gone away.
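
For reference, the rename is just a move into the tf.summary namespace; a minimal sketch (the images tensor here is a placeholder, not the tutorial's actual batch):

import tensorflow as tf

images = tf.zeros([1, 24, 24, 3])       # placeholder for a batch of images
# tf.image_summary('images', images)    # old (pre-0.12) summary API
tf.summary.image('images', images)      # replacement used in the updated scripts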

Could you try the code from https://github.com/al3xsh/models in its entirety and tell me if you still get those errors?

Do you get the same errors from the standard cifar10_train script?

Regards,

Alex

@al3xsh TensorFlow 1.0.0-alpha has removed tf.mul and the new symbol is tf.multiply. In 0.12.1, this change already seems to have taken effect.
AttributeError: 'module' object has no attribute 'mul'
It was the first error I got when I tried your code.
The second one is
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
@wookayin has fixed these two bugs in #911.
Anyway, thanks for your help.
: )

@weigei123

Out of interest, how did you install TensorFlow? I installed via pip, and in my environment tf.mul is still supported!

[screenshot]

Anyway, glad your issue is fixed! I'm looking forward to TensorFlow releases being stable :)

Cheers,

Alex

[screenshot]
I installed TensorFlow by compiling from source. It seems something is ... different.
: P
Cheers,
Weigei123

@al3xsh is using the 0.12.1 release version, and @weigei123 is using a 1.0 snapshot from the master branch. tf.mul is removed only on master, so it does not affect the 0.12.x series.
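
If a script needs to run on both 0.12.x and a 1.0 snapshot, one simple workaround (just a sketch, not something the tutorial does) is to pick whichever symbol exists:

import tensorflow as tf

# Use tf.multiply on 1.0 snapshots; fall back to tf.mul on 0.12.x.
multiply = tf.multiply if hasattr(tf, 'multiply') else tf.mul
product = multiply(tf.constant(2.0), tf.constant(3.0))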

@weigei123 @wookayin can you successfully run cifar10_eval.py while running cifar10_multi_gpu_train.py? Are you both using the version 1.0 snapshot built from source?

The error I am getting suggests to me that the cifar10_eval script is pulling images out of the same queue as the training script (though I may be mistaken!). I am a bit too new to TensorFlow to fully understand what is happening!

@al3xsh: I recommend installing TensorFlow from the nightly binaries or from source in order to run these tutorial models.

As per @nealwu that should solve the issue. Marking as closed.

Actually I just ran into this same issue myself recently. If you go through the tutorial at https://www.tensorflow.org/versions/master/tutorials/deep_cnn, it gives you the following warning:

Be careful not to run the evaluation and training binary on the same GPU or else you might run out of memory. Consider running the evaluation on a separate GPU if available or suspending the training binary while running the evaluation on the same GPU.

Assuming you're running Linux, you can suspend the training binary with Ctrl+z in order to run the evaluation script, and you can resume the training later with fg.

I also encountered this bug. I compiled 0.12.1 from source. The only thing I can do now is run the eval script from another machine in the cluster.

UPDATE: well, actually, sometimes running across different machines won't work either. What I do now is start the eval script first, then the train script.

Actually, running two train scripts with different train_dir values but the same data_dir generates the same error.

@al3xsh does this bug still happen in 1.0? I haven't had a chance to try it, because I need to compile TensorFlow from source for it to run on my machines...

@nealwu running the eval script with CUDA_VISIBLE_DEVICES='' or simply running it on another machine won't help, as long as the eval and train scripts fetch data from the same directory.

@al3xsh ok. I think I've found the culprit. Undoing https://github.com/tensorflow/tensorflow/pull/5145/files should do.

@zym1010 Thanks for this!

I'm testing it now - it seems to have fixed it!

Cheers!

Hi @zym1010 and @al3xsh, what problem did that PR cause for you two? I want to know whether it's something we should actually revert.

@nealwu check #1083. Essentially, that PR makes the train and eval scripts unable to run concurrently, which to me totally defeats the purpose of having separate train and eval scripts.

@nealwu I have a question about suspending and resuming the training binary, since I only have one GPU. The problem is that GPU memory is still being used when I suspend the training with Ctrl+z, so when I run the eval script I run out of memory ("failed to allocate 157.88M (165543936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY"). Do you have any other suggestion? Thank you in advance.

Yes, unfortunately the Ctrl+z I posted above may not actually work. Does the training script save checkpoints? It should ideally be able to resume training if quit and restarted.

@florpi instead of suspending the training, you could run the evaluation on the CPU. If you do:

$ export CUDA_VISIBLE_DEVICES=""

before running the evaluation script, then it won't use the GPU and will run on the CPU instead. This should get round the CUDA_ERROR_OUT_OF_MEMORY ...

I find that an element of GPU management is necessary when using TensorFlow - otherwise it'll try to grab all the GPUs you've got on your system even if it is only going to use one of them!
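
For example, here is a minimal sketch of two alternative ways to keep the evaluation process from grabbing the GPU (the option values are illustrative, not something the tutorial scripts set):

import os
import tensorflow as tf

# Option 1: hide the GPU from this process entirely (same effect as the
# shell export above); set it before any session is created.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

# Option 2: keep the GPU visible but allocate memory only as needed,
# instead of grabbing it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)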

Cheers,

Alex

Thank you very much, that does the trick and it's not so slow :)
