I am getting a ResourceExhaustedError: OOM when doing an embedding using Google's Transformer architecture, which embeds text into 512-dimensional vectors.
The data I'm trying to embed has 5,000 records, which adds up to 40 MB of data.
GPU used: Tesla K80 in a GCP instance.
CPUs: 4 (15 GB RAM)
TensorFlow: tensorflow-gpu (3.0.1)
Here is the code snippet:
with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    message_embeddings = session.run(embed(test_cleansed_data))
ResourceExhaustedError Traceback (most recent call last)
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, args)
1326 try:
-> 1327 return fn(args)
1328 except errors.OpError as e:
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1311 return self._call_tf_sessionrun(
-> 1312 options, feed_dict, fetch_list, target_list, run_metadata)
1313
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1419 self._session, options, feed_dict, fetch_list, target_list,
-> 1420 status, run_metadata)
1421
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
515 compat.as_text(c_api.TF_Message(self.status.status)),
--> 516 c_api.TF_GetCode(self.status.status))
517 # Delete the underlying status object from memory otherwise it stays alive
ResourceExhaustedError: OOM when allocating tensor with shape[4096000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = SoftmaxT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_2/self_attention/multihead_attention/q/Tensordot/Shape/_453 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1054_...rdot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
ResourceExhaustedError Traceback (most recent call last)
1 with tf.Session() as session:
2 session.run([tf.global_variables_initializer(), tf.tables_initializer()])
----> 3 message_embeddings = session.run(embed(test_cleansed_data))
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
903 try:
904 result = self._run(None, fetches, feed_dict, options_ptr,
--> 905 run_metadata_ptr)
906 if run_metadata:
907 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1138 if final_fetches or final_targets or (handle and feed_dict_tensor):
1139 results = self._do_run(handle, final_targets, final_fetches,
-> 1140 feed_dict_tensor, options, run_metadata)
1141 else:
1142 results = []
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1319 if handle is None:
1320 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1321 run_metadata)
1322 else:
1323 return self._do_call(_prun_fn, handle, feeds, fetches)
~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1338 except KeyError:
1339 pass
-> 1340 raise type(e)(node_def, op, message)
1341
1342 def _extend_graph(self):
ResourceExhaustedError: OOM when allocating tensor with shape[4096000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = SoftmaxT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_2/self_attention/multihead_attention/q/Tensordot/Shape/_453 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1054_...rdot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax', defined at:
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in
app.launch_new_instance()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
self.io_loop.start()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/asyncio/base_events.py", line 438, in run_forever
self._run_once()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once
handle._run()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(self._args)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(args, *kwargs)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
self.run()
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
yield gen.maybe_future(dispatch(args))
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
user_expressions, allow_stdin,
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(args, *kwargs)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2819, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2845, in _run_cell
return runner(coro)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
coro.send(None)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3185, in run_ast_nodes
if (yield from self.run_code(code, result)):
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "
message_embeddings = session.run(embed(test_cleansed_data))
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow_hub/module.py", line 247, in __call__
name=name)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow_hub/native_module.py", line 514, in create_apply_graph
import_scope=relative_scope_name)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1927, in import_meta_graph
*kwargs)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 741, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(args, **kwargs)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 577, in import_graph_def
op_def=op_def)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = SoftmaxT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_2/self_attention/multihead_attention/q/Tensordot/Shape/_453 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1054_...rdot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
If someone can give a solution for this, that would be great!
TA
Arav
I have faced similar issues. The only thing you can do is create tensors that fit in your memory. This error can be solved by reducing the text vector dimension, but that will lower your model's accuracy. You can instead try reducing the batch size, which will not affect the model much and will not throw this error.
For more help, you need to share your code/model architecture so that one can understand how much memory you are actually allocating to tensors and where the memory usage can be reduced.
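As a rough sketch of what batching the embedding call could look like (assuming embed accepts a batch of strings as in your snippet and test_cleansed_data is a list of strings; the batch size of 256 is just a starting value to tune):

import numpy as np
import tensorflow as tf

text_input = tf.placeholder(dtype=tf.string, shape=[None])   # fed one batch at a time
embedded_text = embed(text_input)                            # embed(...) from your snippet
batch_size = 256                                             # reduce this until the OOM disappears

with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    batch_results = []
    for start in range(0, len(test_cleansed_data), batch_size):
        batch = test_cleansed_data[start:start + batch_size]
        batch_results.append(session.run(embedded_text, feed_dict={text_input: batch}))

message_embeddings = np.vstack(batch_results)                # one array with all embeddings

Feeding a placeholder also keeps the module from being re-applied to the graph on every call, which is another common source of memory growth.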
It is not actually a model built by me; it is a pre-existing text embedding model from Google based on the Transformer architecture (for more info: https://www.learnopencv.com/universal-sentence-encoder/?ck_subscriber_id=272164240). Basically, I use this embedding technique to get the vectors and then use them for clustering.
Anyway, I'll try changing the batch size and get back.
Thanks
Arav
@Aravinviju I looked at the model. Basically, it is a model that can generate word embeddings given text as input. At one point I too needed to use word and character embeddings as part of my model, but I came to know that I would need a very high-spec PC for that task, which wasn't possible for me.
I also wanted to develop such a model in order to learn how it works, but that wasn't possible. So I used pretrained word and character embeddings and then passed them to a bidirectional LSTM so that they learn contextual information based on the problem set.
Gensim is the most popular library for this purpose, though I used pymagnitude in my model. Both are very easy to use and can help you out.
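For example, a minimal gensim sketch (the vector file name here is just an example; use whichever pretrained vectors you download):

from gensim.models import KeyedVectors

# Hypothetical file name; substitute the pretrained vectors you actually use.
word_vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

tokens = "some irregular input text".lower().split()
token_embeddings = [word_vectors[t] for t in tokens if t in word_vectors]
# token_embeddings can then be passed to a bidirectional LSTM layer.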
BTW, reducing the batch size will definitely work, but the right value still depends on your specs. Use a binary-search-like method to find a batch size that works on your device.
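Something along these lines (run_one_batch here is a hypothetical function that embeds a single batch of the given size and raises ResourceExhaustedError when it doesn't fit):

import tensorflow as tf

def find_max_batch_size(run_one_batch, low=1, high=4096):
    """Binary-search the largest batch size that runs without OOM."""
    best = low
    while low <= high:
        mid = (low + high) // 2
        try:
            run_one_batch(mid)    # hypothetical: embed one batch of size `mid`
            best = mid            # it fits, so try something larger
            low = mid + 1
        except tf.errors.ResourceExhaustedError:
            high = mid - 1        # OOM, so try something smaller
    return best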
Thank you. Update me and I will be here to help.
@ParikhKadam
Thanks a lot for your reply!
Yes indeed, but this particular embedding is better than the basic ones.
In the initial phase I tried pre-trained embeddings, both Gensim (for machine learning) and a word embedding model from Google (https://www.cs.york.ac.uk/nlp/extvec/) for a CNN classification model. The current use case I'm working on requires understanding the meaning of irregular text, for which I needed sentence and paragraph embeddings. That is exactly what Google's DAN and Transformer models do (same link provided before), which gives the model a better understanding and better results too.
On the point you suggested about reducing the batch size: in this process I'm not exactly training a model, so I wasn't sure how to set a batch size for the embedding. But I still split my dataset, sent the data for embedding in batches, collected the embedding results as lists, and finally joined them into one large file.
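Roughly something like this (a simplified sketch of the joining step; batch_results is the list of per-batch arrays):

import numpy as np

# batch_results: list of (batch_size, 512) arrays from the batched embedding runs
all_embeddings = np.vstack(batch_results)
np.save("message_embeddings.npy", all_embeddings)  # one file with all 5,000 vectors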
In terms of the device, as I stated in the question details, it's a GCP instance with a Tesla K80 GPU, for which even 1 GB of text data should be easy enough to process. I was only processing 100 MB of data at a time, but since it converts the text into 512-dimensional vectors, the effective batch size grows correspondingly and the GPU couldn't handle it.
Thanks for your help @ParikhKadam
Will get back if I need anything else!
For now I'll close this issue!
Cheers
Arav
@Aravinviju You're welcome. Happy to help.
@ParikhKadam, thank you for resolving this query. This issue is closed.