Hello,
While running train.py in Google Colab, the following warning appeared:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.
Thanks for your help.
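The warning describes an input that is exhausted before the requested number of steps has been served. A minimal sketch of that situation in plain Python, with no TensorFlow dependency: `run_epoch` and the three-batch dataset are hypothetical, and `itertools.cycle` only stands in for the `repeat()` call the warning mentions; it is an analogy, not TensorFlow's implementation.

```python
import itertools


def run_epoch(batches, steps):
    """Pull `steps` batches from an iterator; return how many were delivered."""
    delivered = 0
    for _ in range(steps):
        try:
            next(batches)
        except StopIteration:
            break  # the input ran out of data before the epoch finished
        delivered += 1
    return delivered


data = ["batch_0", "batch_1", "batch_2"]  # hypothetical 3-batch dataset

# A finite input cannot serve 5 steps: only 3 batches are delivered.
print(run_epoch(iter(data), steps=5))             # 3

# A repeating input keeps yielding batches, so all 5 steps are served.
print(run_epoch(itertools.cycle(data), steps=5))  # 5
```

Either the number of steps has to shrink to what the input can deliver, or the input has to repeat.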
Hi, I have the same error.
Some time ago, I used the script from here: https://github.com/curiousily/Deep-Learning-For-Hackers/blob/master/15.object-detection.ipynb
Now, when I run it, I get the same error as you. I can see that most of the packages have been upgraded since then.
I am trying to downgrade some components, but so far no success.
I have the same problem.
I am having the same problem. Has anyone figured out what might be causing this?
No. Problem still exists. I am using tensorflow==2.3.0 and keras==2.4.3.
I am using Keras 2.4.3 and Tensorflow 2.3.0 as well....
But I have noticed that I have a paperspace gpu server that runs this code just fine. It is running Tensorflow 1.14.0 and keras version 2.3.1
I have tried downgrading my tensorflow to 1.14.0 and keras to 2.3.1 but I get a different set of errors then. I will post what they are here in a few minutes. Once I recreate it again lol.
So I just did the following
pip uninstall keras-resnet
pip uninstall keras-retinanet
pip uninstall Keras-Preprocessing
pip uninstall Keras-Applications
pip uninstall tensorflow
pip uninstall tensorflow-gpu
Then
pip install tensorflow==1.14.0
pip install tensorflow-gpu==1.14.0
Then I reran this code
pip install numpy --user
pip install . --user
python setup.py build_ext --inplace
And reran my model. I got an error saying keras-retinanet requires at least tensorflow 2.2, which surprises me since I have it running on a Paperspace GPU server with tensorflow 1.14.0.
Anyway, I then ran pip uninstall tensorflow and pip uninstall tensorflow-gpu, followed by pip install tensorflow==2.2 and pip install tensorflow-gpu==2.2.
I then tried to run the model again and got this new error: "UnboundLocalError: local variable 'retval_' referenced before assignment"
After that, I uninstalled tensorflow and tensorflow-gpu again, installed tensorflow 2.3.0 again, and I am still getting the warning:
"WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset"
So I am kind of at a loss. Just not sure what to try next :|
I also tried downgrading tensorflow and keras, but it had no effect.
So I just tried creating a new conda environment and using the pip list from my paperspace GPU server.
Here is the pip list from that server.
PYTHON VERSION 3.7.5
Package Version
absl-py 0.7.1
apturl 0.5.2
asn1crypto 0.24.0
astor 0.8.0
attrs 19.1.0
Automat 0.6.0
backcall 0.1.0
bleach 3.1.0
blinker 1.4
Brlapi 0.6.6
certifi 2018.1.18
chardet 3.0.4
click 6.7
cloud-init 19.1
colorama 0.3.7
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
cryptography 2.1.4
cupshelpers 1.0
cycler 0.10.0
Cython 0.29.21
decorator 4.4.0
defer 1.0.6
defusedxml 0.6.0
distro-info 0.18ubuntu0.18.04.1
entrypoints 0.3
gast 0.2.2
google-pasta 0.1.7
grpcio 1.22.0
h5py 2.9.0
html5lib 0.999999999
httplib2 0.9.2
hyperlink 17.3.1
idna 2.6
incremental 16.10.1
ipykernel 5.1.1
ipython 7.6.1
ipython-genutils 0.2.0
ipywidgets 7.5.0
jedi 0.14.1
Jinja2 2.10.1
joblib 0.13.2
jsonpatch 1.16
jsonpointer 1.10
jsonschema 3.0.1
jupyter 1.0.0
jupyter-client 5.3.1
jupyter-console 6.0.0
jupyter-core 4.5.0
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
keras-resnet 0.1.0
keras-retinanet 0.5.1
keyring 10.6.0
keyrings.alt 3.0
kiwisolver 1.1.0
language-selector 0.1
launchpadlib 1.10.6
lazr.restfulclient 0.13.5
lazr.uri 1.0.3
linecache2 1.0.0
louis 3.5.0
lxml 4.5.2
macaroonbakery 1.1.3
Mako 1.0.7
Markdown 3.1.1
MarkupSafe 1.1.1
matplotlib 3.1.1
mistune 0.8.4
nbconvert 5.5.0
nbformat 4.4.0
netifaces 0.10.4
notebook 6.0.0
numpy 1.16.4
oauth 1.0.1
oauthlib 2.0.6
olefile 0.45.1
opencv-python 4.1.0.25
PAM 0.4.2
pandas 0.25.0
pandocfilters 1.4.2
parso 0.5.1
pbr 3.1.1
pexpect 4.7.0
pickleshare 0.7.5
Pillow 6.1.0
pip 20.2
progressbar 2.5
progressbar2 3.51.4
prometheus-client 0.7.1
prompt-toolkit 2.0.9
protobuf 3.9.0
ptyprocess 0.6.0
pyasn1 0.4.2
pyasn1-modules 0.2.1
pycairo 1.16.2
pycrypto 2.6.1
pycups 1.9.73
Pygments 2.4.2
pygobject 3.26.1
PyJWT 1.5.3
pymacaroons 0.13.0
PyNaCl 1.1.2
pyOpenSSL 17.5.0
pyparsing 2.4.1.1
PyQt5 5.10.1
pyRFC3339 1.0
pyrsistent 0.15.3
pyserial 3.4
python-apt 1.6.5+ubuntu0.2
python-dateutil 2.8.0
python-debian 0.1.32
python-utils 2.4.0
pytz 2019.1
pyxdg 0.25
PyYAML 5.1.1
pyzmq 18.0.2
qtconsole 4.5.2
reportlab 3.4.0
requests 2.18.4
requests-unixsocket 0.1.5
scikit-learn 0.21.2
scipy 1.3.0
screen-resolution-extra 0.0.0
SecretStorage 2.3.1
Send2Trash 1.5.0
service-identity 16.0.0
setuptools 41.0.1
simplegeneric 0.8.1
simplejson 3.13.2
sip 4.19.8
six 1.12.0
ssh-import-id 5.7
system-service 0.3
systemd-python 234
tensorboard 1.14.0
tensorflow 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
terminado 0.8.2
testpath 0.4.2
testresources 2.0.0
Theano 1.0.4
torch 1.1.0
torchvision 0.3.0
tornado 6.0.3
traceback2 1.4.0
traitlets 4.3.2
Twisted 17.9.0
ubuntu-drivers-common 0.0.0
ufw 0.36
unattended-upgrades 0.1
unittest2 1.1.0
urllib3 1.22
usb-creator 0.3.3
virtualenv 15.1.0
wadllib 1.3.2
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.15.5
wheel 0.33.4
widgetsnbextension 3.5.0
wrapt 1.11.2
xkit 0.0.0
zope.interface 4.3.2
I created a new conda environment, installed numpy 1.16.4, then tensorflow 1.14.0 and tensorflow-gpu 1.14.0, and I think I also had to do a pip install keras-retinanet.
After that, I reran my code and it no longer gave me the warning about needing to repeat the dataset.
But now tensorflow is not utilizing my GPUs :( so that kind of defeats the purpose lol
Has anyone solved this issue?
I found two workarounds.
1. Pass the --steps argument when training. The value must be less than or equal to the size of your dataset divided by the batch size. For example:
   1000 images, batch size 1: --steps 1000
   1000 images, batch size 2: --steps 500
2. Change the default value of steps to None and do not pass --steps when training:
   https://github.com/fizyr/keras-retinanet/blob/8536cab6baafa8ae3beaa4f62e01cbad872e9884/keras_retinanet/bin/train.py#L436
   TensorFlow (Keras) will then calculate the proper number of steps automatically.
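The arithmetic behind the first workaround can be sketched as follows. The helper name `max_steps` is hypothetical and not part of keras-retinanet. Floor division is an assumption chosen as the conservative reading of "smaller than or equal to".

```python
def max_steps(num_images, batch_size):
    """Largest whole number of steps that does not exceed num_images / batch_size."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    # Floor division never asks for more full batches than the dataset holds.
    return num_images // batch_size


print(max_steps(1000, 1))  # 1000
print(max_steps(1000, 2))  # 500
```

The result is the value to pass as --steps; any smaller positive value also satisfies the rule.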
The solution proposed by @hansoli68 works, but note that for the first workaround, --steps must be computed from the total number of unique images in your training set (divided by batch size). I was making the mistake of using the total number of training labels, but some images have more than one training label, and my run failed until I determined the number of unique images.
If the second approach of setting steps=None works, that seems more foolproof. In fact, as currently constructed, it apparently doesn't even make sense to have steps be a mutable parameter?
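Counting unique images rather than labels can be sketched like this. It assumes an annotation CSV with one row per label, the image path in the first column, and no header row. The function name and the sample rows are hypothetical.

```python
import csv
import io


def count_unique_images(csv_file):
    """Count distinct image paths in an annotation file with one row per label.

    Assumes the image path is the first column and there is no header row.
    """
    reader = csv.reader(csv_file)
    return len({row[0] for row in reader if row})


# Hypothetical annotations: three labels spread over two images.
sample = io.StringIO(
    "img_001.jpg,10,20,110,120,cat\n"
    "img_001.jpg,30,40,130,140,dog\n"
    "img_002.jpg,15,25,115,125,cat\n"
)
num_images = count_unique_images(sample)
print(num_images)  # 2
```

Here the label count is 3 but the image count is 2, so with a batch size of 1 the steps value would be 2, not 3.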
If you are using ImageDataGenerator, try changing the batch_size argument of flow_from_directory, like this:
# Instantiate and set up ImageDataGenerator (import path assumes TensorFlow 2.x)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
training_generator = ImageDataGenerator(rescale=1./255, rotation_range=7, horizontal_flip=True,
                                        shear_range=0.2, height_shift_range=0.07, zoom_range=0.2)
test_generator = ImageDataGenerator(rescale=1./255)
# Set up the training and test databases.
# In flow_from_directory, set batch_size to 1 if you want to use all files in your training and test databases.
training_base = training_generator.flow_from_directory('path_to_directory', target_size=(100, 100), batch_size=1,
                                                       class_mode='binary')
test_base = test_generator.flow_from_directory('path_to_directory', target_size=(100, 100), batch_size=1,
                                               class_mode='binary')
# Set steps_per_epoch to the total number of files in your training database divided by batch_size (1 in this case).
# Do the same for validation_steps, using the total number of files in the test database.
# classifier is the model you have already built and compiled.
classifier.fit_generator(training_base, steps_per_epoch=5216 // 1, epochs=5, validation_data=test_base,
                         validation_steps=624 // 1)
Hope it helps. Sorry for my English.
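For batch sizes other than 1, the same step counts can be derived rather than typed in by hand. This is a sketch under one assumption: the generator counts a final partial batch as a step, so one pass takes ceil(files / batch_size) batches. The helper `steps_for` and the batch size of 32 are hypothetical; the file counts 5216 and 624 come from the comment above.

```python
import math


def steps_for(num_files, batch_size):
    """Batches needed for one pass over num_files, counting a final partial batch."""
    return math.ceil(num_files / batch_size)


print(steps_for(5216, 1))   # 5216, the steps_per_epoch used above
print(steps_for(624, 1))    # 624, the validation_steps used above
print(steps_for(5216, 32))  # 163
print(steps_for(624, 32))   # 20, because 19 full batches leave 16 files over
```

If the generator drops the partial batch instead, floor division is the safer choice.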
I did as @Andre-Vitorino suggested. The error goes away, but it looks like training only works because the network sees the same original images in every epoch: the validation accuracy doesn't change regardless of the learning rate. So it makes training run but doesn't solve the underlying problem, which is that the images are not augmented.
I also tried your approach and it solved the problem; however, the accuracy stays at the same value.
It worked for me when I changed steps_per_epoch. The error appeared after step 2429, and my dataset has 2430 images in total.
I set steps_per_epoch = 2429
and then it ran without any error.
Thank you