Hello :) I am training a Transformer on multiple problems. Training on a single problem runs without errors, but when I train on two problems, t2t-trainer raises the error in the title.
The error is raised at line 1010 in _train_model of tensorflow/contrib/learn/python/learn/estimators/estimator.py.
How can I resolve this?
I tried adding more swap (100 GB), decreasing the max_case numbers, and a few other things. Could anyone please help?
(My environment: 46 GB RAM, 100 GB swap, GTX 1080.)
Thank you in advance,
Jae
Please provide the command-line you used to start the trainer and the stack trace of the error. Thanks.
Hi,
Please find the command-line call and the stack trace for this error below. Note that training a single problem with the same parameters works fine; the error only appears when we train multiple problems together.
Command-line call:
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=wmt_ende_tokens_32k-wmt_enfr_tokens_32k \
--model=transformer \
--hparams_set=transformer_base_single_gpu \
--output_dir=$TRAIN_DIR
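In the command above, the two problems are passed as a single --problems value joined by "-". The following is a minimal sketch, assuming that hyphen-separated convention and assuming individual problem names never contain "-" themselves (the names in this thread use only underscores), of how such a value splits into separate problem names. The helper split_problems is hypothetical and is not part of tensor2tensor.

```python
from typing import List


def split_problems(problems_flag: str) -> List[str]:
    """Split a hyphen-joined --problems value into individual problem names.

    Hypothetical helper: assumes '-' is the separator between problems and
    never appears inside a single problem name.
    """
    # Drop empty pieces so stray separators such as "a--b" do not yield "".
    return [name for name in problems_flag.split("-") if name]


if __name__ == "__main__":
    flag = "wmt_ende_tokens_32k-wmt_enfr_tokens_32k"
    print(split_problems(flag))
```

Run as a script, this prints ['wmt_ende_tokens_32k', 'wmt_enfr_tokens_32k'], i.e. two problems, which is the multi-problem case that triggers the error.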
Stack trace:
Traceback (most recent call last):
File "/home/dilip/anaconda2/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/dilip/anaconda2/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/dilip/tensor2tensor/tensor2tensor/utils/trainer_utils.py", line 247, in run
run_locally(exp_fn(output_dir))
File "/home/dilip/tensor2tensor/tensor2tensor/utils/trainer_utils.py", line 540, in run_locally
exp.train_and_evaluate()
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 502, in train_and_evaluate
self.train(delay_secs=0)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 280, in train
hooks=self._train_monitors + extra_hooks)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 677, in _call_train
monitors=hooks)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 296, in new_func
return func(*args, **kwargs)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 458, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1010, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 518, in run
run_metadata=run_metadata)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 862, in run
run_metadata=run_metadata)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run
return self._sess.run(*args, **kwargs)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 972, in run
run_metadata=run_metadata)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run
return self._sess.run(*args, **kwargs)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/dilip/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[0] does not have value
This was caused by summaries not working correctly inside tf.cond. It should be fixed in 1.1.0; please take a look and reopen if you see this again!