Models: GCP Pet training failed

Created on 3 Apr 2019 · 4 Comments · Source: tensorflow/models

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:

  1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
  2. The form below must be filled out.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • What is the top-level directory of the model you are using:
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below):
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:
  • Exact command to reproduce:

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

All 4 comments

"The replica ps 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/object_detection/model_main.py", line 26, in <module>
from object_detection import model_lib
File "/root/.local/lib/python2.7/site-packages/object_detection/model_lib.py", line 27, in <module>
from object_detection import eval_util
File "/root/.local/lib/python2.7/site-packages/object_detection/eval_util.py", line 28, in <module>
from object_detection.metrics import coco_evaluation
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_evaluation.py", line 20, in <module>
from object_detection.metrics import coco_tools
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_tools.py", line 47, in <module>
from pycocotools import coco
File "/root/.local/lib/python2.7/site-packages/pycocotools/coco.py", line 49
import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt
^
SyntaxError: invalid syntax

The replicas ps 1 and ps 2 also exited with a non-zero status of 1 (Termination reason: Error), each with the same traceback ending in the same SyntaxError at pycocotools/coco.py, line 49.

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=344299365784&resource=ml_job%2Fjob_id%2Froot_object_detection_pets_04_03_2019_15_37_21&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22root_object_detection_pets_04_03_2019_15_37_21%22"

My job always fails at this point. Do you guys have any ideas?
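The traceback points at a syntax error in the packaged pycocotools, not at the training itself: line 49 of coco.py reads import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt, i.e. three statements that were meant to be on separate lines were joined by literal "n" characters. One way to catch this before submitting a job is to parse the patched file locally. The sketch below is only an illustration and not part of the Object Detection API; the helper name has_valid_syntax is hypothetical, and it relies solely on Python's built-in compile(), which parses the code without running it (so matplotlib does not need to be installed).

```python
import sys


def has_valid_syntax(source, filename="coco.py"):
    """Return True if source parses as Python, False on a SyntaxError.

    compile() only parses and byte-compiles; nothing is imported or executed.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        print("%s line %s: %s" % (filename, err.lineno, err.msg))
        return False
    return True


# Line 49 as it appears in the failing traceback (newlines turned into "n").
BROKEN = "import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt\n"

# What the patch was meant to produce: three separate statements.
INTENDED = (
    "import matplotlib\n"
    "matplotlib.use('Agg')\n"
    "import matplotlib.pyplot as plt\n"
)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Check a real file, e.g. the patched pycocotools/coco.py.
        with open(sys.argv[1]) as f:
            sys.exit(0 if has_valid_syntax(f.read(), sys.argv[1]) else 1)
    # No file given: demonstrate on the two snippets above.
    print(has_valid_syntax(BROKEN))    # reports the error, then False
    print(has_valid_syntax(INTENDED))  # True
```

Run against the real file (for example, python check_coco.py pycocotools/coco.py, where check_coco.py is a hypothetical name for the script), a non-zero exit status would flag the broken package locally instead of on the training replicas.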

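@tonychen257, the problem was solved for me with https://stackoverflow.com/questions/51430391/tensorflow-object-detection-training-error-with-tpu/51433826. The important part is to change this line in models/research/object_detection/dataset_tools/create_pycocotools_package.sh:

sed "s/import matplotlib\.pyplot as plt/import matplotlib\nmatplotlib\.use\(\'Agg\'\)\nimport matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated

to the following, split across 3 lines:

sed "s/import matplotlib\.pyplot as plt/import matplotlib\\
matplotlib\.use\(\'Agg\'\)\\
import matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated

This is needed because "\n" does not work everywhere: some sed implementations (BSD sed on macOS, for example) do not turn \n in the replacement text into a newline, which is how line 49 of coco.py ended up as import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt. A backslash followed by an actual line break (written as \\ at the end of each line inside the double-quoted shell string) is the portable way to put a newline into the replacement.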
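If sed's escaping keeps causing trouble, the same edit can be made without sed. The patch inserts matplotlib.use('Agg') before pyplot is imported, presumably so that pycocotools uses matplotlib's non-interactive Agg backend on machines without a display. Below is a minimal Python sketch, assuming it is run from the same directory as the sed command so that it reads pycocotools/coco.py and writes coco.py.updated; the script name patch_coco.py and the functions patch_coco_source and patch_coco_file are hypothetical and not part of the repository. It does a literal string replacement equivalent to the s/.../g substitution.

```python
import sys

# Literal text matched by the sed expression in create_pycocotools_package.sh.
OLD = "import matplotlib.pyplot as plt"

# Replacement: select the Agg backend before pyplot is imported. The line
# breaks are real Python newlines, so no sed escaping is involved.
NEW = "import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt"


def patch_coco_source(source):
    """Return source with every occurrence of OLD replaced by NEW (like s///g)."""
    return source.replace(OLD, NEW)


def patch_coco_file(in_path, out_path):
    """Read in_path, patch it, and write the result to out_path."""
    with open(in_path) as src:
        patched = patch_coco_source(src.read())
    with open(out_path, "w") as dst:
        dst.write(patched)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same input and output as the sed call, e.g.:
    #   python patch_coco.py pycocotools/coco.py coco.py.updated
    patch_coco_file(sys.argv[1], sys.argv[2])
```

Like the sed command, this is not idempotent: running it twice on an already patched file would insert the matplotlib.use('Agg') line a second time, so it should be applied to the freshly downloaded coco.py only.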

Hi there,
We are checking to see if you still need help with this, as this seems to be an old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you no longer need help with this issue, please consider closing it.
