Keras: Embedding with TensorFlow very slow: converts indices to dense gradients

Created on 13 Nov 2016 · 6 comments · Source: keras-team/keras

I've noticed that the Embedding layer with the TensorFlow backend converts sparse gradient updates to dense ones, which kills performance and gobbles up memory. This makes it unusable for large-scale problems with a large Embedding layer.

Below are two scripts that build a model with a single large embedding layer, first with Keras and then with TensorFlow directly. In Keras it takes about 2.3 seconds / batch and uses > 9 GB of memory while training. In TensorFlow it takes only about 20 ms / batch (100X faster) and uses < 4 GB of memory.

This is using TensorFlow 0.11.0rc2 and the master branch of Keras.

import numpy as np

from keras.layers import Embedding, Input
from keras.models import Model

# a model with just one Embedding layer
token_ids = Input(batch_shape=(128, 20),
                  dtype='int32', name='token_ids')
token_embedding = Embedding(793471, 512, mask_zero=False,
                            input_length=20)(token_ids)

model = Model(input=[token_ids], output=token_embedding)
model.compile(loss='mse', optimizer='sgd')

X = np.random.randint(0, 793471, (128, 20)).astype(np.int32)
y = np.random.rand(128, 20, 512)

# run one batch first to build the training function before timing
model.train_on_batch(X, y)

# now time
%timeit model.train_on_batch(X, y)

Outputs:

Using TensorFlow backend.
/Users/matthewp/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:87: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 406257152 elements. This may consume a large amount of memory.
  "This may consume a large amount of memory." % num_elements)
1 loop, best of 3: 2.32 s per loop

With TensorFlow:

import numpy as np
import tensorflow as tf

token_ids = tf.placeholder(tf.int32, [128, 20])
W = tf.Variable(tf.zeros([793471, 512]))
token_embedding = tf.gather(W, token_ids)
y_ = tf.placeholder(tf.float32, [128, 20, 512])
loss = tf.reduce_mean((token_embedding - y_) ** 2)

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

init = tf.initialize_all_variables()

X = np.random.randint(0, 793471, (128, 20)).astype(np.int32)
y = np.random.rand(128, 20, 512)

with tf.Session() as sess:
    sess.run(init)
    sess.run(train_step, feed_dict={token_ids: X, y_: y})
    %timeit sess.run(train_step, feed_dict={token_ids: X, y_: y})

Outputs:

10 loops, best of 3: 20.8 ms per loop


All 6 comments

After some debugging, it turns out this is due to the Keras optimizers and the way they compute gradient updates. The sparse gradient updates produced by the embedding layer need to be handled differently than the dense gradient updates. TensorFlow optimizers provide a mechanism for this (the _apply_sparse and _apply_dense methods). The problem goes away when using a TensorFlow optimizer, e.g.:

model.compile(loss='mse', 
              optimizer=TFOptimizer(tf.train.GradientDescentOptimizer(0.1)))

Outputs:

100 loops, best of 3: 17.7 ms per loop
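To see the sparse gradient directly, here is a minimal sketch (not from the original report; it reuses the shapes from the script above) showing that the gradient of the gather with respect to the embedding matrix is a tf.IndexedSlices, and that converting it to a plain tensor, which the Keras optimizers effectively do when they treat it as dense, is what produces the warning shown above:

import tensorflow as tf

# same shapes as the script above
W = tf.Variable(tf.zeros([793471, 512]))
token_ids = tf.placeholder(tf.int32, [128, 20])
token_embedding = tf.gather(W, token_ids)
loss = tf.reduce_mean(token_embedding ** 2)

# the gradient w.r.t. W is sparse: an IndexedSlices holding only the gathered rows
grad_W, = tf.gradients(loss, [W])
print(type(grad_W))    # tf.IndexedSlices, not a dense Tensor
print(grad_W.values)   # per-row gradient values
print(grad_W.indices)  # which rows of W they belong to

# converting it to a plain tensor materializes the full 793471 x 512 matrix
# and emits the "Converting sparse IndexedSlices to a dense Tensor" warning
dense_grad = tf.convert_to_tensor(grad_W)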

I applied the approach you suggested and training got much faster, thanks! Keras cannot save the model, though:

\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\models.py:114: UserWarning: TensorFlow optimizers do not make it possible to access optimizer attributes or optimizer state after instantiation. As a result, we cannot save the optimizer as part of the model save file.You will have to compile your model again after loading it. Prefer using a Keras optimizer instead (see keras.io/optimizers).

Also, when running this on an AWS machine I got the following error:


E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Invalid argument: AttrValue must not have reference type value of float_ref
for attr 'tensor_type'
; NodeDef: embedding_1/embeddings/_77 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_142_embedding_1/embeddings", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/cpu:0"](^Adagrad/learning_rate/_59, ^Adagrad/update_embedding_1/embeddings/UnsortedSegmentSum, ^Adagrad/update_embedding_1/embeddings/Unique); Op<name=_Recv; signature= -> tensor:tensor_type; attr=tensor_type:type; attr=tensor_name:string; attr=send_device:string; attr=send_device_incarnation:int; attr=recv_device:string; attr=client_terminated:bool,default=false; is_stateful=true>
[[Node: embedding_1/embeddings/_77 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_142_embedding_1/embeddings", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/cpu:0"](^Adagrad/learning_rate/_59, ^Adagrad/update_embedding_1/embeddings/UnsortedSegmentSum, ^Adagrad/update_embedding_1/embeddings/Unique)]]
Traceback (most recent call last):
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
return fn(*args)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
status, run_metadata)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: AttrValue must not have reference type value of float_ref
for attr 'tensor_type'
; NodeDef: embedding_1/embeddings/_77 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_142_embedding_1/embeddings", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/cpu:0"](^Adagrad/learning_rate/_59, ^Adagrad/update_embedding_1/embeddings/UnsortedSegmentSum, ^Adagrad/update_embedding_1/embeddings/Unique); Op<name=_Recv; signature= -> tensor:tensor_type; attr=tensor_type:type; attr=tensor_name:string; attr=send_device:string; attr=send_device_incarnation:int; attr=recv_device:string; attr=client_terminated:bool,default=false; is_stateful=true>
[[Node: embedding_1/embeddings/_77 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_142_embedding_1/embeddings", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/cpu:0"](^Adagrad/learning_rate/_59, ^Adagrad/update_embedding_1/embeddings/UnsortedSegmentSum, ^Adagrad/update_embedding_1/embeddings/Unique)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "Trainer.py", line 30, in
Train()
File "Trainer.py", line 24, in Train
validation_data=testGenerator.generate())
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/keras/engine/training.py", line 1876, in fit_generator
class_weight=class_weight)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/keras/engine/training.py", line 1620, in train_on_batch
outputs = self.train_function(ins)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2075, in __call__
feed_dict=feed_dict)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/anaconda2/envs/python35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: AttrValue must not have reference type value of float_ref
for attr 'tensor_type'
; NodeDef: embedding_1/embeddings/_77 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_142_embedding_1/embeddings", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/cpu:0"](^Adagrad/learning_rate/_59, ^Adagrad/update_embedding_1/embeddings/UnsortedSegmentSum, ^Adagrad/update_embedding_1/embeddings/Unique); Op<name=_Recv; signature= -> tensor:tensor_type; attr=tensor_type:type; attr=tensor_name:string; attr=send_device:string; attr=send_device_incarnation:int; attr=recv_device:string; attr=client_terminated:bool,default=false; is_stateful=true>
[[Node: embedding_1/embeddings/_77 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_142_embedding_1/embeddings", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/cpu:0"](^Adagrad/learning_rate/_59, ^Adagrad/update_embedding_1/embeddings/UnsortedSegmentSum, ^Adagrad/update_embedding_1/embeddings/Unique)]]

I used the following optimizer -

optimizer = opt.TFOptimizer(tf.train.AdagradOptimizer(0.01))

Keras can save the model, just not its optimizer. It's no big deal, just recompile the model after loading.
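For reference, a minimal sketch of that pattern ('model.h5' is just a placeholder path, and model is the model built earlier in this thread):

import tensorflow as tf
from keras.models import load_model
from keras.optimizers import TFOptimizer

# saving works: architecture and weights go into the file, only the TF optimizer is skipped
model.save('model.h5')

# the model comes back uncompiled, so compile it again after loading
model = load_model('model.h5')
model.compile(loss='mse',
              optimizer=TFOptimizer(tf.train.AdagradOptimizer(0.01)))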

Anyone interested in adding support for sparse gradient updates in Keras optimizers?


Thanks @fchollet, but look at the bigger issue when running this on a machine with a GPU and the line

optimizer = opt.TFOptimizer(tf.train.AdagradOptimizer(0.01))

it fails with the same InvalidArgumentError ("AttrValue must not have reference type value of float_ref") traceback already posted above.

When I ran it with optimizer = opt.Adagrad() it runs fine, but becomes very slow due to the conversion to dense gradient updates when the embedding is large...
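Not a fix from this thread, only a speculative sketch: the _Recv in the traceback is moving the ref-typed embedding variable from the GPU to the sparse-update ops placed on the CPU, so one thing worth trying is pinning the large embedding variable to the CPU so that it and its update ops can live on the same device:

import tensorflow as tf
from keras.layers import Embedding, Input
from keras.models import Model
from keras.optimizers import TFOptimizer

token_ids = Input(batch_shape=(128, 20), dtype='int32', name='token_ids')

# build the Embedding layer inside a CPU device scope so the 793471 x 512
# weight matrix is created on (and updated on) the CPU
with tf.device('/cpu:0'):
    token_embedding = Embedding(793471, 512, input_length=20)(token_ids)

model = Model(input=[token_ids], output=token_embedding)
model.compile(loss='mse',
              optimizer=TFOptimizer(tf.train.AdagradOptimizer(0.01)))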

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
