https://github.com/tensorflow/models
I am starting a training job on Google Cloud for my object detection dataset. The job stops after ~7 minutes with this error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/model_main.py", line 26, in <module>
    from object_detection import model_lib
  File "/root/.local/lib/python2.7/site-packages/object_detection/model_lib.py", line 28, in <module>
    from object_detection import exporter as exporter_lib
  File "/root/.local/lib/python2.7/site-packages/object_detection/exporter.py", line 23, in <module>
    from object_detection.builders import model_builder
  File "/root/.local/lib/python2.7/site-packages/object_detection/builders/model_builder.py", line 39, in <module>
    from object_detection.utils import tf_version
  File "/root/.local/lib/python2.7/site-packages/object_detection/utils/tf_version.py", line 17, in <module>
    from tensorflow.python import tf2  # pylint: disable=import-outside-toplevel
ImportError: cannot import name tf2
Local training works fine, however, but it is really slow and would take at least a week.
Install TensorFlow 1.14 with pip, along with all other libraries for the API, the model repository, pycocotools, and protobuf 3.11.4. Testing the API installation works fine.
Create the dataset, including tfrecord files, the training pipeline, the Google Cloud YAML file, ...
Run the Google Cloud training job with:
gcloud ai-platform jobs submit training balls200_training_260520a \
  --runtime-version 1.12 \
  --job-dir=gs://200balls_model/train \
  --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,tmp/pycocotools/pycocotools-2.0.tar.gz \
  --module-name object_detection.model_main \
  --region us-central1 \
  --config /home/ubuntu/Documents/200balls_modeltraining/cloud.yml \
  -- \
  --model_dir=gs://200balls_model/train \
  --pipeline_config_path=gs://200balls_model/pipeline.config
Get the error shown above.
Uninstall TensorFlow 1.14 and install 1.15, as it is required for the API.
The training job should run without any errors, as I'm using TF 1.15, which is required for the Object Detection API.
What is the output of pip list?
@mihaimaruseac
Package Version
---------------------------------- -----------
absl-py 0.9.0
actionlib 1.11.13
adium-theme-ubuntu 0.3.4
angles 1.9.12
astor 0.8.1
attrs 19.3.0
autobahn 0.10.3
backports.functools-lru-cache 1.6.1
backports.shutil-get-terminal-size 1.0.0
backports.ssl-match-hostname 3.4.0.2
backports.weakref 1.0.post1
base-local-planner 1.14.7
beautifulsoup4 4.4.1
bleach 3.1.5
bondpy 1.8.3
bzr 2.7.0
cairocffi 0.7.2
CairoSVG 1.0.19
camera-calibration 1.12.23
camera-calibration-parsers 1.11.13
catkin 0.7.20
catkin-pkg 0.4.20
catkin-pkg-modules 0.4.20
cffi 1.14.0
chardet 2.3.0
click 7.1.2
configobj 5.0.6
configparser 4.0.2
contextlib2 0.6.0.post1
controller-manager 0.13.5
controller-manager-msgs 0.13.5
crcmod 1.7
cryptography 2.9.2
cv-bridge 1.12.8
cycler 0.10.0
Cython 0.29.17
decorator 4.4.2
defusedxml 0.4.1
detection-utils 1.0.0
diagnostic-analysis 1.9.3
diagnostic-common-diagnostics 1.9.3
diagnostic-updater 1.9.3
dlib 19.19.0
docutils 0.12
dynamic-reconfigure 1.5.50
ecdsa 0.13
empy 3.3.2
entrypoints 0.3
enum34 1.1.10
face-recognition 1.3.0
face-recognition-models 0.3.0
funcsigs 1.0.2
functools32 3.2.3.post2
futures 3.3.0
gast 0.2.2
gazebo-plugins 2.5.20
gazebo-ros 2.5.20
gencpp 0.6.0
geneus 2.2.6
genlisp 0.4.16
genmsg 0.5.11
gennodejs 2.0.1
genpy 0.6.7
google-pasta 0.2.0
grpcio 1.29.0
h5py 2.10.0
html5lib 0.999
httplib2 0.9.1
idna 2.0
image-geometry 1.12.8
importlib-metadata 1.6.0
interactive-markers 1.11.5
ipaddress 1.0.23
ipykernel 4.10.1
ipython 5.10.0
ipython-genutils 0.2.0
ipywidgets 7.5.1
Jinja2 2.11.2
joblib 0.9.4
joint-state-publisher 1.12.15
joint-state-publisher-gui 1.12.15
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 5.3.4
jupyter-console 5.2.0
jupyter-core 4.6.3
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
keyring 7.3
kiwisolver 1.1.0
laser-geometry 1.6.5
launchpadlib 1.10.3
lazr.restfulclient 0.13.4
lazr.uri 1.0.3
llvmlite 0.31.0
lxml 3.5.0
lz4 0.7.0
Markdown 3.1.1
MarkupSafe 1.1.1
matplotlib 2.2.5
mercurial 3.7.3
message-filters 1.12.14
mistune 0.8.4
mock 3.0.5
mpi4py 1.3.1
msgpack-python 0.4.6
nbconvert 5.6.1
nbformat 4.4.0
ndg-httpsclient 0.4.0
netifaces 0.10.4
nose 1.3.7
notebook 5.7.9
numba 0.47.0
numpy 1.16.4
oauth 1.0.1
object-detection 0.1
opt-einsum 2.3.2
packaging 20.3
PAM 0.4.2
pandas 0.24.2
pandocfilters 1.4.2
paramiko 1.16.0
pathlib2 2.3.5
pexpect 4.8.0
pickleshare 0.7.5
Pillow 3.1.2
pip 20.1.1
pluginlib 1.11.3
ply 3.7
prometheus-client 0.7.1
prompt-toolkit 1.0.18
protobuf 3.11.3
psutil 3.4.2
ptyprocess 0.6.0
pyasn1 0.1.9
pyasn1-modules 0.0.7
pycocotools 2.0.0
pycparser 2.20
pycrypto 2.6.1
pycurl 7.43.0
pydot 1.0.29
Pygments 2.1
pygobject 3.20.0
pygpgme 0.3
PyMySQL 0.7.2
PyOpenGL 3.0.2
pyOpenSSL 19.1.0
pyparsing 2.0.3
pyrsistent 0.16.0
pyserial 3.0.1
Pyste 0.9.10
python-dateutil 2.8.1
python-qt-binding 0.3.7
python-snappy 0.5
pytz 2014.10
PyYAML 3.11
pyzmq 19.0.1
qt-dotgraph 0.3.17
qt-gui 0.3.17
qt-gui-cpp 0.3.17
qt-gui-py-common 0.3.17
qtconsole 4.7.4
QtPy 1.9.0
requests 2.9.1
resource-retriever 1.12.6
roman 2.0.0
rosapi 0.11.6
rosbag 1.12.14
rosboost-cfg 1.14.6
rosbridge-library 0.11.6
rosbridge-server 0.11.6
rosclean 1.14.6
roscreate 1.14.6
rosdep 0.19.0
rosdep-modules 0.19.0
rosdistro 0.8.2
rosdistro-modules 0.8.2
rosgraph 1.12.14
rosinstall 0.7.8
rosinstall-generator 0.1.19
roslaunch 1.12.14
roslib 1.14.6
roslint 0.11.0
roslz4 1.12.14
rosmake 1.14.6
rosmaster 1.12.14
rosmsg 1.12.14
rosnode 1.12.14
rosparam 1.12.14
rospkg 1.2.6
rospkg-modules 1.2.6
rospy 1.12.14
rosserial-client 0.7.7
rosserial-python 0.7.7
rosservice 1.12.14
rostest 1.12.14
rostopic 1.12.14
rosunit 1.14.6
roswtf 1.12.14
RPi.GPIO 0.7.0
rqt-action 0.4.9
rqt-bag 0.4.12
rqt-bag-plugins 0.4.12
rqt-console 0.4.9
rqt-dep 0.4.9
rqt-graph 0.4.11
rqt-gui 0.5.0
rqt-gui-py 0.5.0
rqt-image-view 0.4.14
rqt-launch 0.4.8
rqt-logger-level 0.4.8
rqt-moveit 0.5.7
rqt-msg 0.4.8
rqt-nav-view 0.5.7
rqt-plot 0.4.8
rqt-pose-view 0.5.8
rqt-publisher 0.4.8
rqt-py-common 0.5.0
rqt-py-console 0.4.8
rqt-reconfigure 0.5.1
rqt-robot-dashboard 0.5.7
rqt-robot-monitor 0.5.8
rqt-robot-steering 0.5.9
rqt-runtime-monitor 0.5.7
rqt-rviz 0.5.10
rqt-service-caller 0.4.8
rqt-shell 0.4.9
rqt-srv 0.4.8
rqt-tf-tree 0.6.0
rqt-top 0.4.8
rqt-topic 0.4.11
rqt-web 0.4.8
rviz 1.12.17
scandir 1.10.0
scikit-learn 0.20.4
scipy 1.2.3
SecretStorage 2.1.3
Send2Trash 1.5.0
sensor-msgs 1.12.7
service-identity 16.0.0
setuptools 20.7.0
simplegeneric 0.8.1
simplejson 3.8.1
singledispatch 3.4.0.3
six 1.14.0
slim 0.1
smach 2.0.1
smach-ros 2.0.1
smclib 1.8.3
subprocess32 3.5.4
tensorboard 1.15.0
tensorflow 1.15.0
tensorflow-estimator 1.15.1
termcolor 1.1.0
terminado 0.8.3
testpath 0.4.4
tf 1.11.9
tf-conversions 1.11.9
tf2-geometry-msgs 0.5.20
tf2-kdl 0.5.20
tf2-py 0.5.20
tf2-ros 0.5.20
topic-tools 1.12.14
tornado 4.2.1
tqdm 4.46.0
traitlets 4.3.3
trollius 2.0.1
Twisted 16.0.0
txaio 1.0.0
unity-lens-photos 1.0
urllib3 1.13.1
vcstools 0.1.42
wadllib 1.3.2
wcwidth 0.1.9
webencodings 0.5.1
Werkzeug 1.0.1
wheel 0.29.0
widgetsnbextension 3.5.1
wrapt 1.12.1
wstool 0.1.17
wxPython 3.0.2.0
wxPython-common 3.0.2.0
xacro 1.11.3
xcffib 0.3.6
zipp 1.2.0
zope.interface 4.1.3
Just to be sure, can you create a new virtualenv and install TF1.15 in it directly and then try again?
Also, when posting multiline output, please use ``` to mark the content. 3 backticks in Markdown, not just 1.
It looks like you are using `--runtime-version 1.12` for your training job instead of 1.15. Can you try changing that and report back if the error still persists?
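For anyone hitting the same `ImportError`, here is a minimal sketch of a check you could run before submitting, assuming the mismatch is between the `--runtime-version` flag and the TensorFlow version installed locally. `extract_runtime_version` and `runtime_matches_local` are hypothetical helper names made up for this example; they are not part of gcloud or TensorFlow.

```python
import re


def extract_runtime_version(command):
    """Return the value passed to --runtime-version, or None if the flag is absent."""
    match = re.search(r"--runtime-version[=\s]+(\S+)", command)
    return match.group(1) if match else None


def major_minor(version):
    """Reduce a version string such as '1.15.0' to the tuple (1, 15)."""
    return tuple(int(part) for part in version.split(".")[:2])


def runtime_matches_local(command, local_tf_version):
    """True if the job's runtime version has the same major.minor as the local TF."""
    runtime = extract_runtime_version(command)
    if runtime is None:
        return False
    return major_minor(runtime) == major_minor(local_tf_version)
```

Locally you could pass `tensorflow.__version__` as `local_tf_version`. With the command from this issue and a local 1.15.0 install, the check returns `False`, which matches the mismatch pointed out above.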
@tombstone So I edited my command to: gcloud ai-platform jobs submit training balls200_training_270520d --runtime-version 1.15 .....
That solved the ImportError! Thanks a lot! (However, I didn't edit the cloud.yml file.)
But now I'm getting errors because the label map file can't be found: NotFoundError: /home/ubuntu/models/research/object_detection/200balls_model/200balls.pbtxt; No such file or directory
I already tried changing the path to "~/models/..." and "home/ubuntu/models/..." in my pipeline.config file.
The pbtxt file is in that directory, though.
I solved that one by uploading the label map file to the Google Cloud Storage bucket and editing the path accordingly in the pipeline.config file.
Now I'm getting a similar error:
NotFoundError: /home/models/research/object_detection/200balls_model; No such file or directory
I have no idea where that path comes from; it's not in the pipeline file.
I've now solved all the errors by uploading all the files, including the pre-trained model checkpoint, to the storage bucket and editing the paths to those files in the pipeline.config file.
Thanks for the help!!
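The fix described above (every file the job reads has to live in the bucket) can be checked before submitting. This is a rough sketch, assuming the paths in pipeline.config sit in the `label_map_path`, `input_path` and `fine_tune_checkpoint` fields; the field list and the `find_local_paths` helper are assumptions for illustration, not something the Object Detection API provides.

```python
import re

# Fields assumed to hold file paths in pipeline.config; extend as needed.
PATH_FIELDS = ("label_map_path", "input_path", "fine_tune_checkpoint")

# Matches e.g.  label_map_path: "some/path"  anywhere in the config text.
_FIELD_PATTERN = re.compile(
    r'\b(%s)\s*:\s*"([^"]*)"' % "|".join(PATH_FIELDS)
)


def find_local_paths(config_text):
    """Return (field, path) pairs whose path does not start with gs://."""
    return [
        (field, path)
        for field, path in _FIELD_PATTERN.findall(config_text)
        if not path.startswith("gs://")
    ]
```

An empty result means every matched path already points at Cloud Storage. Any pair that comes back is a path the cloud workers are unlikely to be able to open, since they do not see the local filesystem of the machine that submitted the job.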
@sohartma Glad you were able to get past it. Closing