virtualenv --python=python3.6 env
source env/bin/activate
git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
git checkout v0.6.0
Downloaded the v0.6.0 pretrained checkpoint and extracted it into the DeepSpeech directory:
https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-checkpoint.tar.gz
pip install -r requirements.txt
pip install tensorflow-gpu==1.14.0
pip3 install $(python3 util/taskcluster.py --decoder)
Continuing training from a release model:
mkdir fine_tuning_checkpoints
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ./deepspeech-0.6.0-checkpoint --epochs 3 --train_files ./data/csv_files/train.csv --dev_files ./data/csv_files/dev.csv --test_files ./data/csv_files/test.csv --learning_rate 0.0001
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W1206 06:45:41.998423 140389067556672 deprecation.py:323] From /media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from ./deepspeech-0.6.0-checkpoint/best_dev-233784
I1206 06:45:42.020016 140389067556672 saver.py:1280] Restoring parameters from ./deepspeech-0.6.0-checkpoint/best_dev-233784
Traceback (most recent call last):
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
[[{{node save_1/RestoreV2}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
[[node save_1/RestoreV2 (defined at DeepSpeech.py:495) ]]
Original stack trace for 'save_1/RestoreV2':
File "DeepSpeech.py", line 965, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 938, in main
train()
File "DeepSpeech.py", line 495, in train
best_dev_saver = tfv1.train.Saver(max_to_keep=1)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
self.build()
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
build_restore=build_restore)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1296, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1614, in object_graph_key_mapping
object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 965, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 938, in main
train()
File "DeepSpeech.py", line 554, in train
loaded = try_loading(session, best_dev_saver, 'best_dev_checkpoint', 'best validation')
File "DeepSpeech.py", line 403, in try_loading
saver.restore(session, checkpoint_path)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
[[node save_1/RestoreV2 (defined at DeepSpeech.py:495) ]]
Original stack trace for 'save_1/RestoreV2':
File "DeepSpeech.py", line 965, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 938, in main
train()
File "DeepSpeech.py", line 495, in train
best_dev_saver = tfv1.train.Saver(max_to_keep=1)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
self.build()
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
build_restore=build_restore)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
How can I resolve this issue? I followed the instructions, so why is this happening?
@MuruganR96 Can you share your pip list output?
Package Version
-------------------- ------------
absl-py 0.8.1
astor 0.8.0
attrdict 2.0.1
audioread 2.1.8
bcrypt 3.1.7
beautifulsoup4 4.8.1
bs4 0.0.1
certifi 2019.11.28
cffi 1.13.2
chardet 3.0.4
cryptography 2.8
cycler 0.10.0
decorator 4.4.1
ds-ctcdecoder 0.6.0
gast 0.3.2
google-pasta 0.1.8
grpcio 1.25.0
h5py 2.10.0
idna 2.8
joblib 0.14.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
librosa 0.7.1
llvmlite 0.30.0
Markdown 3.1.1
matplotlib 3.1.2
numba 0.46.0
numpy 1.15.4
pandas 0.25.3
paramiko 2.7.0
pip 19.3.1
pkg-resources 0.0.0
progressbar2 3.47.0
protobuf 3.11.1
pycparser 2.19
PyNaCl 1.3.0
pyparsing 2.4.5
python-dateutil 2.8.1
python-utils 2.3.0
pytz 2019.3
pyxdg 0.26
requests 2.22.0
resampy 0.2.2
scikit-learn 0.22
scipy 1.3.3
setuptools 42.0.2
six 1.13.0
SoundFile 0.10.3.post1
soupsieve 1.9.5
sox 1.3.7
tensorboard 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
urllib3 1.25.7
webrtcvad 2.0.10
Werkzeug 0.16.0
wheel 0.33.6
wrapt 1.11.2
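A quick way to compare a `pip list` dump like the one above with the versions pinned in the install steps is a small parser. This is only a sketch: the helper names `parse_pip_list` and `check_pins` are made up for illustration, and the pins are the ones used earlier in this thread (tensorflow-gpu 1.14.0, ds-ctcdecoder 0.6.0).

```python
def parse_pip_list(text):
    """Parse the two-column text printed by `pip list` into {name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        name, version = parts
        packages[name.lower()] = version
    return packages


def check_pins(packages, pins):
    """Return {name: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = packages.get(name.lower())
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
Package Version
-------------------- ------------
ds-ctcdecoder 0.6.0
numpy 1.15.4
tensorflow-gpu 1.14.0
"""

# Pins taken from the install commands in this thread.
PINS = {"tensorflow-gpu": "1.14.0", "ds-ctcdecoder": "0.6.0"}
print(check_pins(parse_pip_list(SAMPLE), PINS))  # an empty dict means every pin matches
```

For the listing in this thread both pins match, so the package versions themselves do not look like the cause.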
Weird. I remember this error from loading a CuDNN checkpoint on a non-CuDNN setup, can you check that? In the release notes we also document the flag to use in that case, can you test with it?
@lissyx sir, I think this is the flag you mentioned:
--cudnn_checkpoint: path to a checkpoint created using --use_cudnn_rnn.
Specifying this flag allows one to convert a CuDNN RNN checkpoint to a
checkpoint capable of running on a CPU graph.
(default: '')
command:
CUDA_VISIBLE_DEVICES=2 python3 DeepSpeech.py --n_hidden 2048 --cudnn_checkpoint ./deepspeech-0.6.0-checkpoint --epochs 3 --train_files ./data/csv_files/train.csv --dev_files ./data/csv_files/dev.csv --test_files ./data/csv_files/test.csv --learning_rate 0.0001
I Converting CuDNN RNN checkpoint from ./deepspeech-0.6.0-checkpoint
Traceback (most recent call last):
File "DeepSpeech.py", line 965, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 938, in main
train()
File "DeepSpeech.py", line 525, in train
ckpt = tfv1.train.load_checkpoint(FLAGS.cudnn_checkpoint)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 65, in load_checkpoint
"given directory %s" % ckpt_dir_or_file)
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./deepspeech-0.6.0-checkpoint
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1
NVIDIA TITAN RTX
@lissyx sir, what is the problem with the v0.6.0 pretrained checkpoint files?
Did I make a mistake on my side?
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./deepspeech-0.6.0-checkpoint
Are you sure about the path ? How about --checkpoint_dir as well ?
@lissyx sir, I added both --checkpoint_dir and --cudnn_checkpoint.
CUDA_VISIBLE_DEVICES=2 python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ./deepspeech-0.6.0-checkpoint --cudnn_checkpoint ./deepspeech-0.6.0-checkpoint --epochs 3 --train_files ./data/csv_files/train.csv --dev_files ./data/csv_files/dev.csv --test_files ./data/csv_files/test.csv --learning_rate 0.0001
same error :)
I Converting CuDNN RNN checkpoint from ./deepspeech-0.6.0-checkpoint
Traceback (most recent call last):
File "DeepSpeech.py", line 965, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 938, in main
train()
File "DeepSpeech.py", line 525, in train
ckpt = tfv1.train.load_checkpoint(FLAGS.cudnn_checkpoint)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 65, in load_checkpoint
"given directory %s" % ckpt_dir_or_file)
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./deepspeech-0.6.0-checkpoint
Well, have you checked what the error states ?
Yes, everything is fine. The pretrained checkpoint files are already present in the ./deepspeech-0.6.0-checkpoint folder, but it still shows:
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./deepspeech-0.6.0-checkpoint
That does not clearly answer the question. What is the content of ./deepspeech-0.6.0-checkpoint/?
@lissyx sir,
ls deepspeech-0.6.0-checkpoint
best_dev-233784.data-00000-of-00001 best_dev-233784.index best_dev-233784.meta best_dev_checkpoint flags.txt
@lissyx I found the problem. This call:
ckpt = tfv1.train.load_checkpoint(FLAGS.cudnn_checkpoint)
was not picking up the checkpoint from the directory, so nothing was loaded. I tested it like this:
ckpt = tfv1.train.load_checkpoint("/media/user1/storage-1/Murugan/DeepSpeech/deepspeech-0.6.0-checkpoint/best_dev-233784")
I Initializing missing Adam moment tensors.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:17 | Steps: 202 | Loss: 16.245042
Traceback (most recent call last):
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Not enough time for target transition sequence (required: 89, available: 53)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[{{node tower_0/CTCLoss}}]]
[[Mean_8/_91]]
(1) Invalid argument: Not enough time for target transition sequence (required: 89, available: 53)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[{{node tower_0/CTCLoss}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 971, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 944, in main
train()
File "DeepSpeech.py", line 637, in train
train_loss, _ = run_set('train', epoch, train_init_op)
File "DeepSpeech.py", line 605, in run_set
feed_dict=feed_dict)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Not enough time for target transition sequence (required: 89, available: 53)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[node tower_0/CTCLoss (defined at DeepSpeech.py:231) ]]
[[Mean_8/_91]]
(1) Invalid argument: Not enough time for target transition sequence (required: 89, available: 53)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[node tower_0/CTCLoss (defined at DeepSpeech.py:231) ]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node tower_0/CTCLoss:
tower_0/raw_logits (defined at DeepSpeech.py:196)
tower_0/DeserializeSparse (defined at DeepSpeech.py:220)
Input Source operations connected to node tower_0/CTCLoss:
tower_0/raw_logits (defined at DeepSpeech.py:196)
tower_0/DeserializeSparse (defined at DeepSpeech.py:220)
Original stack trace for 'tower_0/CTCLoss':
File "DeepSpeech.py", line 971, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 944, in main
train()
File "DeepSpeech.py", line 474, in train
gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
File "DeepSpeech.py", line 301, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
File "DeepSpeech.py", line 231, in calculate_mean_edit_distance_and_loss
total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/ctc_ops.py", line 176, in ctc_loss
ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 335, in ctc_loss
name=name)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
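The CTC error in this traceback (required: 89, available: 53) says that one training sample has a transcript that needs more time steps than its audio provides. A rough way to find such rows in a training CSV is sketched below. Several things here are assumptions and not taken from this thread: 16 kHz 16-bit mono WAV files with a 44-byte header, a 20 ms feature step, the usual `wav_filename,wav_filesize,transcript` columns, and the common CTC feasibility bound of one step per character plus one blank between repeated characters. TensorFlow's exact check may differ slightly, so treat the result as a list of candidates to inspect.

```python
import csv
import io

SAMPLE_RATE = 16000      # assumption: 16 kHz audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
WAV_HEADER_BYTES = 44    # assumption: canonical WAV header size
WIN_STEP_MS = 20         # assumption: 20 ms feature window step


def estimated_frames(wav_filesize):
    """Rough number of feature frames for a WAV file of the given size in bytes."""
    samples = max(wav_filesize - WAV_HEADER_BYTES, 0) // BYTES_PER_SAMPLE
    duration_ms = samples * 1000 // SAMPLE_RATE
    return duration_ms // WIN_STEP_MS


def required_steps(transcript):
    """Minimum CTC time steps: one per character plus one blank per repeated pair."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats


def too_long_rows(csv_file):
    """Yield (wav_filename, required, available) for rows CTC cannot align."""
    for row in csv.DictReader(csv_file):
        available = estimated_frames(int(row["wav_filesize"]))
        required = required_steps(row["transcript"])
        if required > available:
            yield row["wav_filename"], required, available


# Hypothetical two-row CSV: the first clip is about 1 s long, the second about 0.2 s.
SAMPLE_CSV = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "ok.wav,32044,hello world\n"
    "short.wav,6444,this transcript is far too long\n"
)
for name, required, available in too_long_rows(SAMPLE_CSV):
    print(name, "required:", required, "available:", available)
```

To scan a real file, pass an open handle instead of the sample, for example `too_long_rows(open("data/csv_files/train.csv", newline=""))`, and check the reported clips by hand before removing them.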
Hard-coding the checkpoint path in the load_checkpoint call is not the appropriate way. Please use --cudnn_checkpoint.
Try adding a checkpoint symlink that links to best_dev_checkpoint.
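Some background on why the symlink helps: when given a directory, `tf.train.load_checkpoint` looks for a state file named `checkpoint`, while the release archive only ships `best_dev_checkpoint`. A minimal sketch of creating the link from Python follows. The helper name `link_checkpoint_state` is made up for illustration, and the demo writes stand-in content to a temporary directory and does not touch the real checkpoint.

```python
import os
import tempfile


def link_checkpoint_state(checkpoint_dir, source_name="best_dev_checkpoint"):
    """Create `checkpoint` as a symlink to `source_name` inside checkpoint_dir.

    Returns the path of the `checkpoint` entry. Does nothing if it already exists.
    """
    source = os.path.join(checkpoint_dir, source_name)
    target = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(source):
        raise FileNotFoundError("no {} in {}".format(source_name, checkpoint_dir))
    if not os.path.lexists(target):
        # Relative link, so the directory can be moved without breaking it.
        os.symlink(source_name, target)
    return target


# Demo on a throwaway directory that mimics the release archive layout.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "best_dev_checkpoint"), "w") as f:
        # Stand-in content, only here so the file exists.
        f.write('model_checkpoint_path: "best_dev-233784"\n')
    link = link_checkpoint_state(demo_dir)
    print(os.path.basename(link), "->", os.readlink(link))
```

For the directory in this thread the call would be `link_checkpoint_state("./deepspeech-0.6.0-checkpoint")`. The shell equivalent is `ln -s best_dev_checkpoint checkpoint`, run inside that directory.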
@lissyx sir, I have a doubt. As I understand it:
The --cudnn_checkpoint flag is only needed when converting a CuDNN RNN checkpoint to a CPU-capable graph.
If your system is capable of using CuDNN RNN, you can just specify the CuDNN RNN checkpoint normally with --checkpoint_dir.
I am in the second case: my system is capable of using CuDNN RNN, so --checkpoint_dir alone should be enough for me.
So why do I need --cudnn_checkpoint?
@lissyx I am a bit confused. Please clarify :)
This is what I asked you in the beginning, if your setup was properly done for CuDNN. The error obviously suggests it's not the case.
@lissyx sir, might my CuDNN setup be wrong?
@lissyx sir, how can I resolve this issue? What did I do wrong here? :)
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
[[node save_1/RestoreV2 (defined at DeepSpeech.py:495) ]]
Ok, I can't keep repeating over and over the same things. I told you: the error is because it cannot resume using CuDNN. Check your setup if it is supposed to work.
You have to specify --use_cudnn_rnn, it's not enabled by default.
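To summarise the two cases discussed in this thread, here is a small sketch of which restore flags go with which setup. The function name `restore_flags` and its `cudnn_works` argument are hypothetical, and only the flag names come from the thread. Whether CuDNN really works on a given machine still has to be verified separately.

```python
def restore_flags(checkpoint_dir, cudnn_works):
    """Return the DeepSpeech.py flags for restoring a CuDNN RNN checkpoint.

    cudnn_works: True if TensorFlow can actually use CuDNN on this machine.
    """
    if cudnn_works:
        # Same kind of graph as the one that wrote the checkpoint: restore directly.
        return ["--checkpoint_dir", checkpoint_dir, "--use_cudnn_rnn", "true"]
    # No usable CuDNN: ask DeepSpeech.py to convert the checkpoint for a CPU-capable graph.
    return ["--cudnn_checkpoint", checkpoint_dir]


print(" ".join(restore_flags("./deepspeech-0.6.0-checkpoint", cudnn_works=True)))
print(" ".join(restore_flags("./deepspeech-0.6.0-checkpoint", cudnn_works=False)))
```

The two branches are kept mutually exclusive on purpose, since --cudnn_checkpoint is only meant for the conversion case.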
@reuben sir, I tried --use_cudnn_rnn true.
CUDA_VISIBLE_DEVICES=2,3 python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ./deepspeech-0.6.0-checkpoint --epochs 3 --train_files ./data/csv_files/train.csv --dev_files ./data/csv_files/dev.csv --test_files ./data/csv_files/test.csv --learning_rate 0.0001 --use_cudnn_rnn true
Not resolved yet. :(
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
[[{{node save_1/CudnnRNNCanonicalToParams}}]]
Traceback (most recent call last):
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
[[{{node save_1/CudnnRNNCanonicalToParams}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 972, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 945, in main
train()
File "DeepSpeech.py", line 561, in train
loaded = try_loading(session, best_dev_saver, 'best_dev_checkpoint', 'best validation')
File "DeepSpeech.py", line 403, in try_loading
saver.restore(session, checkpoint_path)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
[[node save_1/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:495) ]]
Original stack trace for 'save_1/CudnnRNNCanonicalToParams':
File "DeepSpeech.py", line 972, in <module>
absl.app.run(main)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 945, in main
train()
File "DeepSpeech.py", line 495, in train
best_dev_saver = tfv1.train.Saver(max_to_keep=1)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
self.build()
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
build_restore=build_restore)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 744, in restore
restored_tensors)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 221, in tf_canonical_to_opaque
opaque_params = self._cu_canonical_to_opaque(cu_weights, cu_biases)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 271, in _cu_canonical_to_opaque
direction=self._direction)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 917, in cudnn_rnn_canonical_to_params
seed=seed, seed2=seed2, name=name)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
Resolved. I used this flag, as suggested in their link:
--use_allow_growth true
CUDA_VISIBLE_DEVICES=2,3 python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir deepspeech-0.6.0-checkpoint/ --epochs 3 --train_files data/train_18-11-2019.csv --dev_files data/dev_18-11-2019.csv --test_files data/test_18-11-2019.csv --learning_rate 0.0001 --use_cudnn_rnn true --use_allow_growth true
Thanks @lissyx @reuben
@MuruganR96 I'm having the same issue that you faced, even though I am following your solution. This is the error that I am getting. @lissyx @reuben sir, your attention is also requested. Thanks!
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from deepspeech-0.6.0-checkpoint/best_dev-233784
I1218 12:42:33.675762 139888463619840 saver.py:1280] Restoring parameters from deepspeech-0.6.0-checkpoint/best_dev-233784
E Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:118) with these attrs: [seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=240]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.0-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.0-checkpoint/best_dev-233784.
Have you passed the cudnn flags?
Your error seems to suggest your cudnn installation is wrong.
Yes sir, I have tried both methods: --checkpoint_dir and --cudnn_checkpoint.
@l192423 Please check your setup then; you lack the CUDNN kernels. Either your CUDA installation is broken or incomplete, or your TensorFlow is, or both.
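The "Registered devices" line in the error output is the quickest clue here. A minimal sketch for pulling it out of a saved log follows; the function names are made up for illustration, and the only assumption is that the log contains a line in the same format as the one pasted above.

```python
import re


def registered_devices(log_text):
    """Return the device names from a 'Registered devices: [...]' line, or []."""
    match = re.search(r"Registered devices:\s*\[([^\]]*)\]", log_text)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def has_gpu(devices):
    # Only an exact 'GPU' entry counts; 'XLA_CPU' and 'XLA_GPU' are separate devices.
    return "GPU" in devices


log = "E Registered devices: [CPU, XLA_CPU]"
devices = registered_devices(log)
print(devices, has_gpu(devices))
```

With the line from the error above, the list contains only CPU devices, so TensorFlow never registered a GPU in that process.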
OK, I will check my setup. Another thing: when I start training without the checkpoint, everything is fine and training continues without any issue. If my TensorFlow or CUDNN installation is broken, then why does it work when I am not using the checkpoint?
This is my current training status without the checkpoint:
Epoch 0 | Training | Elapsed Time: 1 day, 21:40:02 | Steps: 3786 | Loss: 88.452532
Are you enabling cudnn in this case?
In this case I am just passing the --checkpoint_dir flag and starting training without any existing checkpoint.
But when I use this command with the checkpoint:
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /home/neeha/Tayyab/DeepSpeech/deepspeech-0.6.0-checkpoint/ --export_tflite --export_dir /home/neeha/Tayyab/export --epochs 1 --train_files /home/neeha/Tayyab/CV/clips/train.csv --dev_files /home/neeha/Tayyab/CV/clips/dev.csv --test_files /home/neeha/Tayyab/CV/clips/test.csv --learning_rate 0.0001 --use_cudnn_rnn true --use_allow_growth true
I receive this error:
E Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:118) with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=240]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in /home/neeha/Tayyab/DeepSpeech/deepspeech-0.6.0-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /home/neeha/Tayyab/DeepSpeech/deepspeech-0.6.0-checkpoint/best_dev-233784.
So we are back to square one: your TensorFlow / CUDNN setup is broken.
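A likely explanation for why training works without the checkpoint, though it is not confirmed in this thread, is that a run without --use_cudnn_rnn builds a graph with no CuDNN ops, so it can proceed on CPU alone. One way to check the installation directly is sketched below. The helper names are made up for illustration; tf.test.is_built_with_cuda() and tf.test.is_gpu_available() are TensorFlow calls available in 1.14.

```python
def diagnose(built_with_cuda, gpu_available):
    """Turn the two TensorFlow checks into a short message (hypothetical helper)."""
    if not built_with_cuda:
        return "TensorFlow build has no CUDA support (CPU-only package?)"
    if not gpu_available:
        return "CUDA build, but no usable GPU (check driver, CUDA and cuDNN libraries)"
    return "GPU is usable"


def check_tensorflow():
    """Run the checks against the TensorFlow installed in this environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this environment"
    return diagnose(tf.test.is_built_with_cuda(), tf.test.is_gpu_available())


if __name__ == "__main__":
    print(check_tensorflow())
```

Run it inside the same virtualenv used for training. If the first message appears, the CPU-only tensorflow package is probably installed instead of tensorflow-gpu.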
Most helpful comment
You have to specify --use_cudnn_rnn; it's not enabled by default.