Keras-retinanet: WARNING:tensorflow:Your input ran out of data; interrupting training.

Created on 30 Aug 2020 · 16 Comments · Source: fizyr/keras-retinanet

Hello,
While training with train.py on Google Colab, the following error occurred:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.
Thanks for your help.


norbertsk9
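To make the warning concrete, here is a minimal pure-Python model of what it describes. It does not use TensorFlow or Keras: one_pass_batches and count_trained_batches are hypothetical stand-ins, and itertools.cycle plays the role of tf.data.Dataset.repeat(). Training asks for steps_per_epoch * epochs batches, and a source that makes only one pass over the data runs dry once its batches are used up.

```python
from itertools import cycle, islice


def one_pass_batches(num_images, batch_size):
    """Hypothetical finite source: yields each batch of image indices exactly once."""
    for start in range(0, num_images, batch_size):
        yield list(range(start, min(start + batch_size, num_images)))


def count_trained_batches(batches, steps_per_epoch, epochs):
    """Hypothetical training loop: draws steps_per_epoch * epochs batches, stops early if the source is empty."""
    required = steps_per_epoch * epochs
    consumed = sum(1 for _ in islice(batches, required))
    if consumed < required:
        print(f"Input ran out of data: got {consumed} of {required} batches")
    return consumed


# 100 images with batch size 2 give only 50 batches in a single pass,
# but 100 steps per epoch for 5 epochs asks for 500.
finite = count_trained_batches(one_pass_batches(100, 2), steps_per_epoch=100, epochs=5)

# Repeating the data (cycle stands in for Dataset.repeat()) lets all 500 be drawn.
repeated = count_trained_batches(cycle(one_pass_batches(100, 2)), steps_per_epoch=100, epochs=5)
```

Here finite is 50 and repeated is 500; the two workarounds discussed in this thread amount to either lowering the number of requested batches or making the source repeat.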


Hi, I have the same error.
Some time ago, I used the script from here: https://github.com/curiousily/Deep-Learning-For-Hackers/blob/master/15.object-detection.ipynb

Now, when I run it, the same error as yours appears. I can see that most of the packages have been upgraded.
I am trying to downgrade some components, but so far with no success.

micocw on 1 Sep 2020

I have the same problem.

BS-98 on 2 Sep 2020

I am having the same problem. Has anyone figured out what might be causing this?

medic873 on 6 Sep 2020

No. Problem still exists. I am using tensorflow==2.3.0 and keras==2.4.3.

BS-98 on 7 Sep 2020

I am using Keras 2.4.3 and Tensorflow 2.3.0 as well....

But I have noticed that I have a Paperspace GPU server that runs this code just fine. It is running TensorFlow 1.14.0 and Keras 2.3.1.

I have tried downgrading my TensorFlow to 1.14.0 and Keras to 2.3.1, but then I get a different set of errors. I will post them here in a few minutes, once I recreate them lol.

medic873 on 7 Sep 2020

So I just did the following

pip uninstall keras-resnet
pip uninstall keras-retinanet
pip uninstall Keras-Preprocessing
pip uninstall Keras-Applications
pip uninstall tensorflow
pip uninstall tensorflow-gpu

Then
pip install tensorflow==1.14.0
pip install tensorflow-gpu==1.14.0

Then I reran this code
pip install numpy --user
pip install . --user
python setup.py build_ext --inplace

And reran my model. I got an error saying keras-retinanet requires at least TensorFlow 2.2, which shocks me since I have it running on a Paperspace GPU server with TensorFlow 1.14.0.

But anyway, I then ran pip uninstall tensorflow and pip uninstall tensorflow-gpu, followed by pip install tensorflow==2.2 and pip install tensorflow-gpu==2.2.

I then tried to run the model again and got this new error: "UnboundLocalError: local variable 'retval_' referenced before assignment"

After that, I uninstalled tensorflow and tensorflow-gpu again, installed tensorflow 2.3.0 again, and am still getting the error:

"WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset"

So I am kind of at a loss. Just not sure what to try next :|

medic873 on 7 Sep 2020

I also tried to downgrade TensorFlow and Keras, but it doesn't have any effect.

norbertsk9 on 7 Sep 2020

So I just tried creating a new conda environment and using the pip list from my paperspace GPU server.

Here is the pip list from that server:
Python version: 3.7.5
Package Version

absl-py 0.7.1
apturl 0.5.2
asn1crypto 0.24.0
astor 0.8.0
attrs 19.1.0
Automat 0.6.0
backcall 0.1.0
bleach 3.1.0
blinker 1.4
Brlapi 0.6.6
certifi 2018.1.18
chardet 3.0.4
click 6.7
cloud-init 19.1
colorama 0.3.7
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
cryptography 2.1.4
cupshelpers 1.0
cycler 0.10.0
Cython 0.29.21
decorator 4.4.0
defer 1.0.6
defusedxml 0.6.0
distro-info 0.18ubuntu0.18.04.1
entrypoints 0.3
gast 0.2.2
google-pasta 0.1.7
grpcio 1.22.0
h5py 2.9.0
html5lib 0.999999999
httplib2 0.9.2
hyperlink 17.3.1
idna 2.6
incremental 16.10.1
ipykernel 5.1.1
ipython 7.6.1
ipython-genutils 0.2.0
ipywidgets 7.5.0
jedi 0.14.1
Jinja2 2.10.1
joblib 0.13.2
jsonpatch 1.16
jsonpointer 1.10
jsonschema 3.0.1
jupyter 1.0.0
jupyter-client 5.3.1
jupyter-console 6.0.0
jupyter-core 4.5.0
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
keras-resnet 0.1.0
keras-retinanet 0.5.1
keyring 10.6.0
keyrings.alt 3.0
kiwisolver 1.1.0
language-selector 0.1
launchpadlib 1.10.6
lazr.restfulclient 0.13.5
lazr.uri 1.0.3
linecache2 1.0.0
louis 3.5.0
lxml 4.5.2
macaroonbakery 1.1.3
Mako 1.0.7
Markdown 3.1.1
MarkupSafe 1.1.1
matplotlib 3.1.1
mistune 0.8.4
nbconvert 5.5.0
nbformat 4.4.0
netifaces 0.10.4
notebook 6.0.0
numpy 1.16.4
oauth 1.0.1
oauthlib 2.0.6
olefile 0.45.1
opencv-python 4.1.0.25
PAM 0.4.2
pandas 0.25.0
pandocfilters 1.4.2
parso 0.5.1
pbr 3.1.1
pexpect 4.7.0
pickleshare 0.7.5
Pillow 6.1.0
pip 20.2
progressbar 2.5
progressbar2 3.51.4
prometheus-client 0.7.1
prompt-toolkit 2.0.9
protobuf 3.9.0
ptyprocess 0.6.0
pyasn1 0.4.2
pyasn1-modules 0.2.1
pycairo 1.16.2
pycrypto 2.6.1
pycups 1.9.73
Pygments 2.4.2
pygobject 3.26.1
PyJWT 1.5.3
pymacaroons 0.13.0
PyNaCl 1.1.2
pyOpenSSL 17.5.0
pyparsing 2.4.1.1
PyQt5 5.10.1
pyRFC3339 1.0
pyrsistent 0.15.3
pyserial 3.4
python-apt 1.6.5+ubuntu0.2
python-dateutil 2.8.0
python-debian 0.1.32
python-utils 2.4.0
pytz 2019.1
pyxdg 0.25
PyYAML 5.1.1
pyzmq 18.0.2
qtconsole 4.5.2
reportlab 3.4.0
requests 2.18.4
requests-unixsocket 0.1.5
scikit-learn 0.21.2
scipy 1.3.0
screen-resolution-extra 0.0.0
SecretStorage 2.3.1
Send2Trash 1.5.0
service-identity 16.0.0
setuptools 41.0.1
simplegeneric 0.8.1
simplejson 3.13.2
sip 4.19.8
six 1.12.0
ssh-import-id 5.7
system-service 0.3
systemd-python 234
tensorboard 1.14.0
tensorflow 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
terminado 0.8.2
testpath 0.4.2
testresources 2.0.0
Theano 1.0.4
torch 1.1.0
torchvision 0.3.0
tornado 6.0.3
traceback2 1.4.0
traitlets 4.3.2
Twisted 17.9.0
ubuntu-drivers-common 0.0.0
ufw 0.36
unattended-upgrades 0.1
unittest2 1.1.0
urllib3 1.22
usb-creator 0.3.3
virtualenv 15.1.0
wadllib 1.3.2
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.15.5
wheel 0.33.4
widgetsnbextension 3.5.0
wrapt 1.11.2
xkit 0.0.0
zope.interface 4.3.2

I created a new conda environment, installed numpy 1.16.4, then installed tensorflow 1.14.0 and tensorflow-gpu 1.14.0, and I think I had to do a pip install keras-retinanet.

After that, I reran my code and it no longer gave me the error saying the dataset needs to repeat.

But now TensorFlow is not utilizing my GPUs :( so it kind of defeats the purpose lol

medic873 on 7 Sep 2020

Has someone solved the issue?

norbertsk9 on 13 Sep 2020

I found two workarounds.

1. Use the --steps argument when training.
   --steps should be less than or equal to the number of images in your dataset divided by the batch size.
   For example:
   - Your dataset has 1000 images and the batch size is 1: --steps 1000
   - Your dataset has 1000 images and the batch size is 2: --steps 500
2. Change the default value of steps to None and do not pass --steps when training:
   https://github.com/fizyr/keras-retinanet/blob/8536cab6baafa8ae3beaa4f62e01cbad872e9884/keras_retinanet/bin/train.py#L436
   TensorFlow (Keras) will then calculate the proper number of steps automatically.

hansoli68 on 14 Sep 2020
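The rule in the first workaround is plain floor division. A small sketch, assuming "dataset size" means the number of images (not annotation rows); max_steps is a hypothetical helper, not part of keras-retinanet:

```python
def max_steps(num_images, batch_size):
    """Largest --steps value one pass over the data can satisfy: floor(images / batch size)."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return num_images // batch_size


# The two examples from the comment, plus one that does not divide evenly.
for images, batch in [(1000, 1), (1000, 2), (1001, 2)]:
    print(f"{images} images, batch size {batch}: --steps {max_steps(images, batch)}")
```

Rounding down means a leftover partial batch is never requested, which keeps the value on the safe side of the limit.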


The solution proposed by @hansoli68 works, but note that for the first version, --steps must be equal to the total number of unique images in your training set (divided by batch size). I was making the mistake of using the total number of training labels, but some images have more than one training label, and my run failed until I determined the number of unique images.

If the second approach of setting steps=None works, that seems more foolproof. In fact, as currently constructed, it apparently doesn't even make sense for steps to be a mutable parameter?

mooratov on 25 Sep 2020
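A sketch for counting unique images rather than labels, assuming the keras-retinanet CSV annotation format of one row per bounding box (path/to/image.jpg,x1,y1,x2,y2,class_name, with empty box fields for an image without objects) and assuming every listed image is used for training. The helper count_unique_images and the sample rows are hypothetical.

```python
import csv
import io


def count_unique_images(annotation_lines):
    """Count distinct image paths; an image with several boxes appears on several rows."""
    paths = set()
    for row in csv.reader(annotation_lines):
        if row:  # csv.reader yields [] for blank lines
            paths.add(row[0])
    return len(paths)


# Hypothetical annotations: 4 rows (labels) but only 3 distinct images.
annotations = io.StringIO(
    "img_001.jpg,10,20,50,60,cat\n"
    "img_001.jpg,70,80,120,140,dog\n"  # second box on the same image
    "img_002.jpg,5,5,40,40,cat\n"
    "img_003.jpg,,,,,\n"  # image with no objects
)
num_images = count_unique_images(annotations)
batch_size = 1
print(f"{num_images} unique images -> --steps {num_images // batch_size}")
```

Counting rows here would give 4 and overshoot the data, which matches the failure described in the comment.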


If you are using ImageDataGenerator, try changing the batch_size argument of the flow_from_directory function, like this:

Instantiate and set up the ImageDataGenerators:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

training_generator = ImageDataGenerator(rescale=1./255, rotation_range=7, horizontal_flip=True,
                                        shear_range=0.2, height_shift_range=0.07, zoom_range=0.2)

test_generator = ImageDataGenerator(rescale=1./255)

Set up the training and test datasets. Here, inside flow_from_directory, set batch_size to 1 if you want to use all files in your training and test datasets:

training_base = training_generator.flow_from_directory('path_to_directory', target_size=(100, 100), batch_size=1,
                                                       class_mode='binary')

test_base = test_generator.flow_from_directory('path_to_directory', target_size=(100, 100), batch_size=1,
                                               class_mode='binary')

After that, set steps_per_epoch to the total number of files in your training set divided by the batch_size you set (in this case 1). Do the same for validation_steps, but use the total number of files in your test set divided by the batch size:

# fit_generator is deprecated in TF 2.x; Model.fit accepts generators directly
classifier.fit(training_base, steps_per_epoch=5216 // 1, epochs=5, validation_data=test_base,
               validation_steps=624 // 1)

Hope it helps. Sorry for my English.

Andre-Vitorino on 2 Oct 2020
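Rather than hard-coding 5216 and 624, the step counts can be derived from the files on disk. A minimal sketch, assuming the <directory>/<class_name>/<image> layout that flow_from_directory reads; count_images, steps_for and the extension list are hypothetical and may not match exactly what Keras accepts. Rounding up assumes the iterator also yields a final, smaller batch; if unsure, rounding down with // is the conservative choice.

```python
from pathlib import Path

# Hypothetical extension whitelist; Keras keeps its own list of accepted formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}


def count_images(directory):
    """Count image files inside the class subdirectories of `directory`."""
    return sum(
        1
        for class_dir in Path(directory).iterdir()
        if class_dir.is_dir()  # files directly in `directory` are not in any class
        for f in class_dir.rglob("*")
        if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
    )


def steps_for(num_files, batch_size):
    """Batches needed to see every file once, counting a final partial batch (integer ceil)."""
    return (num_files + batch_size - 1) // batch_size


# Usage (paths are placeholders):
# steps_per_epoch = steps_for(count_images('path_to_train_directory'), batch_size=1)
# validation_steps = steps_for(count_images('path_to_test_directory'), batch_size=1)
```

With batch_size=1 this reproduces the hand-computed 5216 and 624; with a larger batch size, such as 32, it gives 163 and 20 without having to fall back to batch_size=1.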

I did as @Andre-Vitorino suggested. Then I don't get that error, but it looks like training only works because in each epoch the network looks at the same original images: the validation accuracy doesn't change regardless of what learning rate is set. So it makes training run but doesn't solve the underlying problem, which is that the images are not augmented.

emoen on 14 Oct 2020

I also used @Andre-Vitorino's approach and it solved the problem; however, the accuracy shows the same value.

hashimi1998 on 3 Nov 2020

It worked for me when I changed the steps_per_epoch value. Training was failing after step 2429, and my dataset has 2430 images in total, so I changed it to steps_per_epoch = 2429.
Then it started running without any error.

hrithiksagar on 3 Nov 2020

Thank you
