Models: FEELVOS cannot run

Created on 27 Feb 2019 · 22 comments · Source: tensorflow/models

When I run bash train.sh, it reports many obvious errors. Even after I fix some of them, it still fails to run. Please update the code so that it is usable.

support


sydney0zq


Hi,

I just took a quick look: the problem is that the deeplab code that feelvos depends on has changed. I'm not sure I'll be able to fix this before the ICCV deadline. Until then, you could try reverting the deeplab code to an older version; feelvos should then hopefully work.

pvoigtlaender on 28 Feb 2019

Hi sydney0zq,

The problem should be resolved now. Please sync to HEAD.

Cheers,

aquariusjay on 28 Feb 2019

@pvoigtlaender @aquariusjay Thanks a lot.

sydney0zq on 1 Mar 2019

Hi aquariusjay,
When I run eval.sh, I run into a problem: it stops at vis_video.py line 480.
The log only contains "Visualizing batch 1 / 30", which suggests that something goes wrong at line 469, res = sess.run(ops), so no new seq_ious data is ever appended to all_ious.
I am not sure whether this problem is related to the TensorFlow version. I use TensorFlow 1.12.0 and Python 3.6.

INFO:tensorflow:Visualizing batch 1 / 30
Traceback (most recent call last):
  File "/home/ci/models/research/feelvos/vis_video.py", line 500, in <module>
    tf.app.run()
  File "/home/ci/pj/py36tf12/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ci/models/research/feelvos/vis_video.py", line 480, in main
    all_ious = np.concatenate(all_ious, axis=0)
ValueError: need at least one array to concatenate

Celiali on 15 Mar 2019
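The ValueError at the end of this traceback can be reproduced in isolation: np.concatenate raises it whenever it is given an empty sequence, which is what happens when no batch ever appends a seq_ious array to all_ious. The sketch below needs only NumPy; safe_concatenate_ious is a hypothetical helper, not a function from vis_video.py, that turns the empty case into a message pointing at the earlier failure.

```python
import numpy as np


def safe_concatenate_ious(all_ious):
    """Concatenate per-sequence IoU arrays, with a clearer error when empty.

    Hypothetical helper: vis_video.py calls np.concatenate(all_ious, axis=0)
    directly, which raises "need at least one array to concatenate" when the
    list is empty.
    """
    if not all_ious:
        raise RuntimeError(
            "all_ious is empty: no sequence was processed, so the real error "
            "happened earlier (for example inside sess.run(ops)).")
    return np.concatenate(all_ious, axis=0)


# Reproduce the original error with an empty list.
try:
    np.concatenate([], axis=0)
except ValueError as err:
    print("ValueError:", err)

# With at least one array, concatenation works as expected.
print(safe_concatenate_ious([np.array([0.8, 0.7]), np.array([0.9])]))
```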


The code was written for Python 2 and will probably need some small changes to run with Python 3 (I'll look into this when I get some time, but that will happen in a few weeks at the earliest). If this is an option for you, please try running it with Python 2 for now.
(I'm not entirely sure whether the specific error you report is related to the Python version, but at least eval.sh does run for me with Python 2.)

pvoigtlaender on 15 Mar 2019

Hi @pvoigtlaender,

After running into the same issue described by Celiali, I tried Python 2 and still get the error:

INFO:tensorflow:Visualizing batch 1 / 30
Traceback (most recent call last):
  File "/workdir/models/research/feelvos/vis_video.py", line 504, in <module>
    tf.app.run()
  File "/venv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/workdir/models/research/feelvos/vis_video.py", line 484, in main
    all_ious = np.concatenate(all_ious, axis=0)
ValueError: need at least one array to concatenate

I'm using TensorFlow 1.12.0 with CUDA 9.0.

Any ideas?

Thank you

jeremy-cv on 29 Mar 2019

Hi @jeremy-cv,

First of all, it seems that you either don't have the most recent version of the code or you have modified it, since the line numbers in your traceback do not match the code, which makes debugging harder.
As I cannot reproduce the problem, it would be good if you could try to gather some additional information.

It seems that in your run, all_ious is empty when np.concatenate is called.
Please check what happens in the loop

    for batch in range(num_batches):

and especially in these lines:

          seq_ious = _process_seq_data(segmentation_dir, embeddings_dir,
                                       seq_name_val, pred_labels_val,
                                       gt_labels_val, all_embeddings_val)
          all_ious.append(seq_ious)

If they are executed correctly, all_ious should no longer be empty.
In particular, please check whether _process_seq_data is executed at all and, if so, what it returns.

pvoigtlaender on 3 Apr 2019
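One way to follow this advice is to log each batch and catch the exception around the session call, so the real cause is printed instead of surfacing later as the empty-list error. The sketch below is framework-free and hypothetical: run_batch stands in for sess.run(ops) and process_seq for _process_seq_data; neither name nor the logging scheme comes from vis_video.py.

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO)


def collect_ious(num_batches, run_batch, process_seq):
    """Collect per-sequence IoUs batch by batch, logging any failure.

    Hypothetical stand-ins: run_batch(batch) plays the role of sess.run(ops)
    and returns the fetched values; process_seq(*values) plays the role of
    _process_seq_data and returns that sequence's IoUs.
    """
    all_ious = []
    for batch in range(num_batches):
        logging.info("Visualizing batch %d / %d", batch + 1, num_batches)
        try:
            values = run_batch(batch)
        except Exception:
            # Print the real cause instead of failing later in np.concatenate.
            logging.error("run_batch failed on batch %d:\n%s",
                          batch + 1, traceback.format_exc())
            break
        seq_ious = process_seq(*values)
        logging.info("Batch %d returned %r", batch + 1, seq_ious)
        all_ious.append(seq_ious)
    return all_ious


# Usage with fake stand-ins: the second batch simulates an out-of-memory error.
def fake_run(batch):
    if batch == 1:
        raise MemoryError("simulated out-of-memory")
    return ("seq_%d" % batch, [0.5])


def fake_process(seq_name, ious):
    return ious


print(collect_ious(3, fake_run, fake_process))
```

If the process is killed outright (for example by the operating system), no Python exception is raised and a wrapper like this cannot log anything; in that case the last "Visualizing batch" line is the only clue.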

Hi @pvoigtlaender,

Indeed, I have changed the code and added prints to see what is happening.

The program exits at line 469:
res = sess.run(ops)
called from python2.7/site-packages/tensorflow/python/platform/app.py:
_sys.exit(main(argv))

Hence, it seems that _process_seq_data is not even called.

When I print ops, I get the following: [<tf.Tensor 'concat_1:0' shape=(?, ?, ?) dtype=int32>, <tf.Tensor 'Squeeze:0' shape=(?, ?, ?) dtype=int32>, <tf.Tensor 'strided_slice_4:0' shape=() dtype=string>]

FYI, I'm running the code on an NVIDIA 1080 Ti (12 GB). Perhaps that is not enough memory and causes the program to exit? Any advice on reducing memory consumption without hurting accuracy too much is welcome.

Thanks for your help.

jeremy-cv on 3 Apr 2019

12 GB should be sufficient, I think.
What you can try is setting up correlation cost (explained in the README), which should reduce memory consumption quite a bit and also make it much faster.

pvoigtlaender on 3 Apr 2019

Yes, I did, but I still get the issue.

jeremy-cv on 3 Apr 2019

Did the same issue also happen without using correlation cost?

pvoigtlaender on 3 Apr 2019

Yes, same issue without correlation cost.

FYI, I can run the script train.sh without any issue.

jeremy-cv on 4 Apr 2019



Same issue as @Celiali! FYI, like @jeremy-cv, I can run train.sh without any issue.

mtroym on 9 Apr 2019



Hi developers, could you share the Python environment you used, e.g. python==2.7, tensorflow==1.13 ...?

mtroym on 11 Apr 2019

Hi,

The setup I tested it with (I just ran it again with a fresh checkout, and it worked) is the following:
Python 2.7
TensorFlow 1.12.0

Could you please check whether it works for you in that setup?

pvoigtlaender on 11 Apr 2019
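A quick way to answer the environment question is to print the interpreter and package versions from the same Python that runs train.sh or eval.sh. This is a minimal sketch assuming Python 3.8+ for importlib.metadata (on older interpreters it reports "unknown"); report_environment is a hypothetical helper, and the package list is only an example. Names are written with underscores to match the installed metadata directories (pip itself uses hyphens, e.g. tensorflow-estimator).

```python
import platform

try:
    # importlib.metadata is available from Python 3.8 on.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    PackageNotFoundError = None
    version = None


def report_environment(packages=("tensorflow", "tensorflow_gpu",
                                 "tensorflow_estimator", "numpy")):
    """Map "python" and each distribution name to its installed version.

    Missing distributions map to None; on interpreters without
    importlib.metadata every package is reported as "unknown".
    """
    info = {"python": platform.python_version()}
    for name in packages:
        if version is None:
            info[name] = "unknown"
            continue
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None
    return info


for key, value in sorted(report_environment().items()):
    print("%s: %s" % (key, value))
```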

Hi, @pvoigtlaender

It didn't work because another error occurred.
This is the message:

Traceback (most recent call last):
  File "[path to my file]/models/research/feelvos/train.py", line 25, in <module>
    from feelvos import model
  File "[path to my file]/models/research/feelvos/model.py", line 58, in <module>
    from deeplab import model
  File "[path to my file]/models/research/deeplab/model.py", line 55, in <module>
    from deeplab.core import dense_prediction_cell
...
  File "/Users/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 71, in <module>
    from tensorflow.python.estimator import model_fn as model_fn_lib
ImportError: cannot import name model_fn

It seems the deeplab model could not be loaded.
BTW, could you tell me the git commit of the deeplab model in the models/deeplab/ repo that you used? Maybe my version does not meet deeplab's requirements.

I appreciate your help!

Thanks.

mtroym on 14 Apr 2019


Hi,
I ran into the same problem. It seems to be caused by tensorflow-estimator. I uninstalled it and the problem was solved.

jiady1990 on 15 Apr 2019

Thanks @jiady1990
I reinstalled tensorflow-estimator==1.10.12 and tensorflow==1.12.0 under Python 2.7, but the program still exits at res = sess.run(ops), as in @jeremy-cv's earlier report (line 469, with _process_seq_data never being called).

I will try applying correlation cost next.

mtroym on 18 Apr 2019

@jeremy-cv @MTonyM Have you solved the problem?

kaixu93 on 10 May 2019

Hi @kaixu93,

I tried running it on another GPU machine (Tesla P100, 16 GB) and it worked. I suspect 12 GB of GPU memory is not enough, but I can't be 100% sure.

jeremy-cv on 13 May 2019

@jeremy-cv But train.sh works on a 12 GB GPU, so why would eval.sh not have enough memory?

kaixu93 on 13 May 2019
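One possible explanation, which is an assumption and not confirmed anywhere in this thread: if evaluation processes whole frames while training uses smaller crops, any step that builds a dense pixel-to-pixel distance matrix needs memory that grows with the square of the number of positions. The sketch below only does that arithmetic; pairwise_distance_bytes, the stride of 4, float32 storage and the example sizes (a 465x465 crop and a 480x854 frame, the DAVIS 480p resolution) are illustrative choices, not values read from the FEELVOS code.

```python
def pairwise_distance_bytes(height, width, stride=4, bytes_per_value=4):
    """Bytes for a dense N x N matrix over all feature-map positions.

    Hypothetical estimate: N is the number of positions in a feature map
    downsampled by `stride` (rounded up), and every pair of positions
    stores one value of `bytes_per_value` bytes (4 for float32).
    """
    rows = -(-height // stride)  # ceiling division
    cols = -(-width // stride)
    n = rows * cols
    return n * n * bytes_per_value


GIB = 1024 ** 3
for name, (h, w) in [("465x465 crop", (465, 465)),
                     ("480x854 frame", (480, 854))]:
    print("%s: %.2f GiB" % (name, pairwise_distance_bytes(h, w) / GIB))
```

Under these assumptions the full frame has about 1.9 times as many positions as the crop, so a single such matrix needs about 3.5 times as much memory, before counting any other activations.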

Hi there,
We are checking to see if you still need help on this, as this seems to be an old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.